Grammar testing proposal

Fri Sep 15 20:06:09 UTC 2006

Colin, great idea. One issue might be specifying language-independent
actions. This could get tough if in the future we support non-C-like
languages. For example, there was mention of supporting Ruby.

Perhaps TXL (http://www.txl.ca/) might be useful. It could be used to define
a mini toy language and to write transformations to the host languages.
I'm connected to that project, though, so I'm biased about whether it's
appropriate :)

-Adrian

Colin Fleming wrote:
> Hi all,
> 
> I've been thinking about various ways to test Ragel and the generated
> grammars, and here's what I've come up with. I'm really interested in
> any feedback. I'm currently developing a couple of grammars that I'm
> primarily interested in using with Java. The Java generation is still
> a bit experimental, so I'd like to be able to use acceptance tests
> that confirm that a) the grammar works as expected, b) the results are
> consistent across Java/C++/whatever, and c) the results are also
> consistent across different code generation strategies.
> 
> This last one is probably currently more useful to Adrian than anyone,
> but I'm probably going to reimplement rlcodegen in Java shortly, so it
> will be great for testing that as well as testing code generation
> implementations for any new languages, or new code generation
> strategies.
> 
> So, I propose a parser class generator that will take a raw Ragel
> grammar and generate an rl file for whichever of the supported
> languages the user requests. This rl file will generate a basic
> parsing class, with the standard methods: init, execute, finish. The
> Ragel syntax would be slightly extended to specify features of the
> generated class, and these extensions stripped out when the rl file is
> written. This would probably be pretty generally useful too, since
> a lot of people just want a support class that they can integrate into
> a larger project, I imagine.
> 
> The whole point of this thing is testing, so unit test data and
> expected values would be encoded in the source file. Either a test
> class or just the parser could be generated, or both.
> 
> An example is worth a thousand words, so here goes:
> 
> %%{
>   # Variables for the generated class, initialised in init() method
>   # public vars generate getters
>   public int val = 0;
>   private boolean neg = false;
> 
>   action see_neg {
>     neg = true;
>   }
> 
>   action add_digit {
>     val = val * 10 + (fc - '0');
>   }
> 
>   main :=
>     ( '-'@see_neg | '+' )? ( digit @add_digit )+
>     '\n' @{ fbreak; };
> 
>   test {
>     input "1\n";
>     output "1";
>   }
> 
>   test {
>     input "213 3213\n";
>     output "unexpected char ' ' in input";
>     failure;
>   }
> }%%
> 
> Obviously one concern here is overloading the Ragel syntax; maybe a
> prefix would be good to highlight the new keywords as preprocessor
> directives.
> 
> A few more thoughts:
> 
> It would be good to be able to specify variables of the alphabet type:
> public alphtype character;
> 
> It would also be interesting to track the states the machine moves
> through on each run; they could be compared to ensure that the
> different strategies behave identically.
> 
> I'm also not sure about having the test code in with the actual
> grammar, but I guess an include directive would make that easier.
> 
> Any thoughts or ideas?
> 
> Cheers,
> Colin
> 
>
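
For reference, the two test blocks in Colin's example, together with the
state-tracking idea, could be checked by hand roughly as follows. This is a
Python sketch of the behaviour, not generated Ragel output. The state
numbering, the run and check names, the "unexpected end of input" message,
and the rule that the output is val with the sign applied (with neg starting
out false) are all assumptions; only the inputs, the expected outputs and the
failure flag come from the example.

```python
# Hand-written stand-in for the machine
#   ( '-' @see_neg | '+' )? ( digit @add_digit )+ '\n' @{ fbreak; }
# State numbers are arbitrary and chosen for this sketch only.
START, SIGNED, DIGITS, DONE = range(4)


def run(data):
    """Run the machine over data and return (ok, output, trace)."""
    val, neg = 0, False
    state = START
    trace = [state]              # every state visited, in order
    for ch in data:
        if state == START and ch in "+-":
            neg = (ch == "-")    # see_neg
            state = SIGNED
        elif state != DONE and "0" <= ch <= "9":
            val = val * 10 + (ord(ch) - ord("0"))   # add_digit
            state = DIGITS
        elif state == DIGITS and ch == "\n":
            state = DONE
            trace.append(state)
            break                # fbreak
        else:
            return False, "unexpected char '%s' in input" % ch, trace
        trace.append(state)
    if state != DONE:
        return False, "unexpected end of input", trace
    return True, str(-val if neg else val), trace


# (input, expected output, expect failure) taken from the test blocks.
TESTS = [
    ("1\n", "1", False),
    ("213 3213\n", "unexpected char ' ' in input", True),
]


def check(tests):
    """Return a list of (input, passed) pairs, one per test."""
    results = []
    for data, expected, failure in tests:
        ok, output, _trace = run(data)
        results.append((data, output == expected and ok == (not failure)))
    return results


if __name__ == "__main__":
    for data, passed in check(TESTS):
        print("%r: %s" % (data, "pass" if passed else "FAIL"))
```

The trace returned by run is the part that would be compared between code
generation strategies: two backends built from the same machine should
report the same sequence of states for the same input.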