Grammar testing proposal

Colin Fleming colin.flem... at coreproc.com
Fri Sep 15 15:18:50 UTC 2006


Hi all,

I've been thinking about various ways to test Ragel and the generated
grammars, and here's what I've come up with. I'm really interested in
any feedback. I'm currently developing a couple of grammars that I
mainly intend to use with Java. The Java generation is still a bit
experimental, so I'd like to be able to use acceptance tests that
confirm that a) the grammar works as expected, b) the results are
consistent across Java/C++/whatever, and c) the results are also
consistent across different code generation strategies.

This last one is probably most useful to Adrian right now, but I'm
planning to reimplement rlcodegen in Java shortly, so it will be great
for testing that, as well as code generation for any new languages or
new code generation strategies.

So, I propose a parser class generator that will take a raw Ragel
grammar and generate an rl file for whichever of the supported
languages the user requests. This rl file will generate a basic
parsing class with the standard methods init, execute and finish. The
Ragel syntax would be slightly extended to specify features of the
generated class, and these extensions would be stripped out when the
rl file is written. This would probably be generally useful too: I
imagine a lot of people just want a support class that they can
integrate into a larger project.
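The stripping step could look roughly like the following Java sketch. It assumes the extension syntax used in the example below (whole-line `public`/`private` field declarations and `test { ... }` blocks); the class and method names are hypothetical, and it does not try to cover every corner of Ragel's syntax (for instance, a `test {` appearing inside an action body would be misread as a test block).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical stripping step: pulls the proposed extensions (field
 * declarations and test blocks) out of an extended grammar, leaving plain
 * Ragel behind. The recognised syntax is an assumption, not a settled design.
 */
final class ExtensionStripper {
    // A whole line such as "public int val = 0;" or "private boolean neg;".
    private static final Pattern FIELD_DECL =
        Pattern.compile("\\s*(public|private)\\s+\\w+\\s+\\w+(\\s*=[^;]*)?;\\s*");
    // The keyword "test" followed by an opening brace.
    private static final Pattern TEST_START = Pattern.compile("\\btest\\s*\\{");

    final List<String> fields = new ArrayList<>();
    final List<String> tests = new ArrayList<>();

    /** Returns the grammar with extensions removed; collects them as a side effect. */
    String strip(String source) {
        List<String> kept = new ArrayList<>();
        for (String line : removeTestBlocks(source).split("\n", -1)) {
            if (FIELD_DECL.matcher(line).matches()) {
                fields.add(line.trim());
            } else {
                kept.add(line);
            }
        }
        return String.join("\n", kept);
    }

    private String removeTestBlocks(String src) {
        StringBuilder out = new StringBuilder();
        Matcher m = TEST_START.matcher(src);
        int pos = 0;
        while (m.find(pos)) {
            int close = matchingBrace(src, m.end() - 1); // m.end() - 1 is the '{'
            tests.add(src.substring(m.start(), close + 1));
            out.append(src, pos, m.start());
            pos = close + 1;
        }
        out.append(src, pos, src.length());
        return out.toString();
    }

    /** Index of the '}' closing the '{' at open; braces inside "..." are ignored. */
    private static int matchingBrace(String s, int open) {
        int depth = 0;
        for (int i = open; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"') {
                // Skip a double-quoted literal, honouring backslash escapes.
                for (i++; i < s.length() && s.charAt(i) != '"'; i++) {
                    if (s.charAt(i) == '\\') i++;
                }
            } else if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                return i;
            }
        }
        throw new IllegalArgumentException("unterminated test block at offset " + open);
    }
}
```

The collected `fields` and `tests` would then feed the class and test-class generators, while the returned text goes to Ragel unchanged.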

Since the whole point of this is testing, unit test data and expected
values would be encoded in the source file. The generator could emit
just the parser, a test class, or both.
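A generated test class might boil down to a loop like the one below: build a fresh parser per test, feed it the input, and compare the accept/reject outcome and the output text against the test block. This is only a Java sketch; the `GeneratedParser` interface, its `output()` hook, and the other names are assumptions about what the generator could emit, not anything Ragel provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Assumed shape of a generated parser: init/execute/finish as proposed, plus a test hook. */
interface GeneratedParser {
    void init();
    boolean execute(String data);   // false once the machine hits an error state
    boolean finish();               // true if the machine ended in a final state
    String output();                // result text, or the error message on failure
}

/** One "test { ... }" block: input, expected output, and whether failure is expected. */
final class GrammarTest {
    final String input;
    final String expected;
    final boolean failure;

    GrammarTest(String input, String expected, boolean failure) {
        this.input = input;
        this.expected = expected;
        this.failure = failure;
    }
}

final class GrammarTestRunner {
    /** Runs each test on a fresh parser and returns a description of every mismatch. */
    static List<String> run(Supplier<GeneratedParser> factory, List<GrammarTest> tests) {
        List<String> problems = new ArrayList<>();
        for (GrammarTest t : tests) {
            GeneratedParser p = factory.get();
            p.init();
            // finish() is only consulted if execute() did not already fail.
            boolean accepted = p.execute(t.input) && p.finish();
            String label = "input \"" + t.input + "\": ";
            if (accepted == t.failure) {
                problems.add(label + (t.failure
                    ? "expected failure, but the parse succeeded"
                    : "expected success, but the parse failed"));
            } else if (!p.output().equals(t.expected)) {
                problems.add(label + "expected output \"" + t.expected
                    + "\", got \"" + p.output() + "\"");
            }
        }
        return problems;
    }
}
```

Returning a list of mismatches rather than stopping at the first one makes it easy to run the same tests against several backends and report all of the differences at once.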

An example is worth a thousand words, so here goes:

%%{
  # Variables for the generated class, initialised in init() method
  # public vars generate getters
  public int val = 0;
  private boolean neg = false;

  action see_neg {
    neg = true;
  }

  action add_digit {
    val = val * 10 + (fc - '0');
  }

  main :=
    ( '-' @see_neg | '+' )? ( digit @add_digit )+
    '\n' @{ fbreak; };

  test {
    input "1\n";
    output "1";
  }

  test {
    input "213 3213\n";
    output "unexpected char ' ' in input";
    failure;
  }
}%%
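To make the proposal concrete, here is a hand-written Java sketch of roughly what the class generated from this example might look like. It is not Ragel output: real generated code would use Ragel's table- or goto-driven execution, whereas this uses an explicit switch over hand-numbered states. The class name `SignedIntParser` and the `getError()` accessor are assumptions. `getVal()` follows the "public vars generate getters" rule, and `neg` gets no getter because it is private. As in the grammar, `neg` is recorded but never applied to `val`.

```java
/** Hypothetical hand-written equivalent of the example grammar; not real Ragel output. */
final class SignedIntParser {
    private static final int START = 0, AFTER_SIGN = 1, IN_DIGITS = 2, DONE = 3, ERROR = -1;

    private int cs;          // current state
    private int val;         // "public int val = 0;"  -> field plus getter
    private boolean neg;     // "private boolean neg = false;" -> field only
    private String error;    // assumed: message reported when a test expects failure

    public void init() {
        cs = START;
        val = 0;
        neg = false;
        error = null;
    }

    /** Runs the machine over data; state persists between calls. Returns false on error. */
    public boolean execute(String data) {
        for (int p = 0; p < data.length() && cs != ERROR; p++) {
            char fc = data.charAt(p);
            boolean digit = fc >= '0' && fc <= '9';
            switch (cs) {
                case START:
                    if (fc == '-') { neg = true; cs = AFTER_SIGN; }   // action see_neg
                    else if (fc == '+') cs = AFTER_SIGN;
                    else if (digit) { addDigit(fc); cs = IN_DIGITS; }
                    else fail(fc);
                    break;
                case AFTER_SIGN:
                    if (digit) { addDigit(fc); cs = IN_DIGITS; }
                    else fail(fc);
                    break;
                case IN_DIGITS:
                    if (digit) addDigit(fc);
                    else if (fc == '\n') { cs = DONE; return true; }  // fbreak: stop early
                    else fail(fc);
                    break;
                default:                                              // DONE: no transitions out
                    fail(fc);
            }
        }
        return cs != ERROR;
    }

    /** True only if the machine ended in its final state. */
    public boolean finish() {
        return cs == DONE;
    }

    public int getVal() { return val; }

    public String getError() { return error; }

    private void addDigit(char fc) {                                  // action add_digit
        val = val * 10 + (fc - '0');
    }

    private void fail(char fc) {
        cs = ERROR;
        error = "unexpected char '" + fc + "' in input";
    }
}
```

Because `execute` keeps its state between calls, the input can be fed in chunks (for example `"+7"` followed by `"2\n"` leaves `val` at 72), which is how Ragel machines are normally driven.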

Obviously one concern here is overloading the Ragel syntax. Maybe a
prefix would help to mark the new keywords as preprocessor
directives.

A few more thoughts:

It would be good to be able to specify variables of the alphabet type:
public alphtype character;

It would also be interesting to track the states the machine moves
through on each run; the traces could then be compared to ensure that
the different strategies behave identically.
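Comparing traces could be as simple as having each implementation record its current state after every transition and then reporting the first step where two recordings disagree. This Java sketch assumes the different strategies share the same state numbering for a given machine; the `TraceCompare` class and its methods are hypothetical.

```java
import java.util.List;

/** Hypothetical helper for comparing state traces from two code generation strategies. */
final class TraceCompare {
    /** Index of the first step where the traces differ, or -1 if they are identical. */
    static int firstDivergence(List<Integer> a, List<Integer> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            if (!a.get(i).equals(b.get(i))) return i;
        }
        // Same prefix: they only differ if one trace is longer than the other.
        return a.size() == b.size() ? -1 : n;
    }

    /** Human-readable summary for a test report. */
    static String describe(List<Integer> a, List<Integer> b) {
        int i = firstDivergence(a, b);
        if (i < 0) return "traces match";
        String sa = i < a.size() ? String.valueOf(a.get(i)) : "<end>";
        String sb = i < b.size() ? String.valueOf(b.get(i)) : "<end>";
        return "step " + i + ": " + sa + " vs " + sb;
    }
}
```

Comparing whole traces catches strategies that reach the same final result by different paths, which an output-only comparison would miss.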

I'm also not sure about keeping the test code alongside the actual
grammar, but I guess an include directive would make it easy to keep
them in separate files.

Any thoughts or ideas?

Cheers,
Colin


