[ragel-users] combining ragel and lemon

Mark Olesen Mark.Olesen at esi-group.com
Wed Aug 21 13:16:57 EDT 2019

Hi Adrian,

Thanks for the feedback. After some more digging, it seems it makes 
little difference whether I use a push or a pull model for a simple syntax. 
The ragel portion of the code is fairly simple: it just emits a token type 
or a number, not much else.
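For comparison, a minimal sketch of the pull arrangement, assuming a hand-written C++ stand-in for the ragel-generated scanner: a hypothetical `next_token()` consumes exactly one token and returns, and a driver loop hands each token on. The `TokenKind` enum and `Token` struct are illustrative only; with lemon, the loop body would pass each token to the generated parse function instead of collecting it.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical token kinds; a real port would use the codes lemon generates.
enum class TokenKind { Number, Plus, Minus, Star, Slash, LParen, RParen, End, Error };

struct Token {
    TokenKind kind;
    double value = 0.0;  // only meaningful when kind == Number
};

// Stand-in for a scanner that returns after one token ("pull" model).
// 'pos' is the resume point between calls, playing the role of ragel's p.
Token next_token(const std::string& s, std::size_t& pos) {
    while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos]))) ++pos;
    if (pos >= s.size()) return {TokenKind::End};

    const char c = s[pos];
    if (std::isdigit(static_cast<unsigned char>(c))) {
        const char* begin = s.c_str() + pos;
        char* end = nullptr;
        const double v = std::strtod(begin, &end);  // consumes at least one digit
        pos += static_cast<std::size_t>(end - begin);
        return {TokenKind::Number, v};
    }

    ++pos;  // every remaining token is a single character
    switch (c) {
        case '+': return {TokenKind::Plus};
        case '-': return {TokenKind::Minus};
        case '*': return {TokenKind::Star};
        case '/': return {TokenKind::Slash};
        case '(': return {TokenKind::LParen};
        case ')': return {TokenKind::RParen};
        default:  return {TokenKind::Error};
    }
}

// Driver loop: pull one token at a time. With lemon, push_back would be
// replaced by a call into the generated parser.
std::vector<Token> tokenize(const std::string& s) {
    std::vector<Token> out;
    std::size_t pos = 0;
    for (;;) {
        const Token t = next_token(s, pos);
        out.push_back(t);
        if (t.kind == TokenKind::End || t.kind == TokenKind::Error) break;
    }
    return out;
}
```

The only difference from the push style is where the parser call lives: here it sits in one loop rather than in each scanner action.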

As my initial go, I'm trying to extend the simple calculator example with 
some functions, etc. To sidestep the longest-match issue, it seems 
okay to simply bind the opening '(' into the token. E.g.,

   'log' space* '(' => { EMIT_TOKEN(LOG); };

And on the lemon side, accept it by requiring the closing RPAREN:

exp(lhs) ::= LOG exp(a) RPAREN.
     { lhs = std::log(a); }

This deals adequately with incomplete input such as
     " log( "  and  " log(10"
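To show what binding the '(' into the token buys, here is a plain C++ stand-in (not ragel-generated code) for what the pattern `'log' space* '('` accepts; `match_log_open` is a hypothetical name. Because the '(' is part of the match, a bare `log` never becomes a LOG token, so the grammar only needs to look for the closing RPAREN.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Mirrors the ragel pattern  'log' space* '('  (hypothetical helper).
// Returns the length of the match starting at pos, or 0 if the input there
// is not "log", optional whitespace, then '('.
std::size_t match_log_open(const std::string& s, std::size_t pos) {
    static const std::string kw = "log";
    if (pos > s.size() || s.compare(pos, kw.size(), kw) != 0) return 0;
    std::size_t i = pos + kw.size();
    while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i]))) ++i;
    if (i < s.size() && s[i] == '(') return i + 1 - pos;  // '(' is consumed too
    return 0;
}
```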

This disambiguation crutch fails for predefined constants such as 
pi and M_PI. If I use a simple match,

     'pi'  @{ EMIT_NUMBER(M_PI); };

It fails for input like "2*pie" and "pi2". Here I can only resort to 
either a lex failure (i.e., pe != eof afterwards) or letting lemon 
notice the syntax error.
Neither feels particularly right. I started trying to add 
constraints, e.g.,

     ('pi' | 'M_PI') :> (delim | ')') @{ EMIT_NUMBER(M_PI); };

but that consumes the next character instead of doing a lookahead (I can't 
figure out how to manage that) and falls apart for the eof case. Do I 
somehow need to push the parse point back before the delimiter and pop it 
again (and handle eof too)?
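A plain C++ sketch (not ragel code) of the behaviour being asked for: match the constant's name, then inspect the following character without consuming it, treating end of input as a valid boundary. The names `is_boundary` and `match_keyword` are hypothetical, and the boundary rule used here (any character that cannot continue an identifier) is an assumption standing in for the `(delim | ')')` class.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Assumed boundary rule: anything that cannot continue an identifier.
// End of input also counts as a boundary, which covers the eof case.
bool is_boundary(const std::string& s, std::size_t i) {
    if (i >= s.size()) return true;
    const unsigned char c = static_cast<unsigned char>(s[i]);
    return !(std::isalnum(c) || c == '_');
}

// Returns the number of characters consumed (the keyword only), or 0.
// The character after the keyword is peeked at but never consumed.
std::size_t match_keyword(const std::string& s, std::size_t pos, const std::string& kw) {
    if (pos > s.size() || s.compare(pos, kw.size(), kw) != 0) return 0;
    return is_boundary(s, pos + kw.size()) ? kw.size() : 0;
}
```

With this check, "pi", "pi)" and "pi+1" yield the constant, while "pie" and "pi2" do not match it at all.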

If I can get this stage worked out, I might be able to move to the next 
phase with different parse states. The code that I am attempting to port 
is currently flex/bison (not particularly pretty) and I would like to 
avoid a bison build dependency.



Perhaps I'm attempting too much and it's not really possible with 

Thanks for any hints.


On 8/16/19 9:57 PM, Adrian Thurston wrote:
> Hi Mark,
> You can return from the scanner pattern action if you like. You can also 
> craft a machine that that just attempts to match one token, then return 
> the token. If you want to stay faithful to lex semantics, you have to 
> take some care to implement the "longest-match" characteristic yourself.
> But from what I've seen, lemon lets you pass in one token at a time. 
> This is just one example I found, but there seems to be more.
> https://github.com/eloraiby/rl-json-parser/blob/master/json-parser/lexer.rl
> Adrian
> On 2019-08-16 11:07, Mark Olesen wrote:
>> I've used ragel in a few places already for parsing, but now I'm looking
>> to port over a medium sized chunk of flex/bison. Perhaps going for a
>> ragel/lemon combination.
>> According to the lemon docs (https://www.hwaci.com/sw/lemon/lemon.html)
>> the parser is the one calling the lexer. But this seems to be the same
>> as ragel would like to be doing: match a pattern, call an action.
>> The few examples of ragel/lemon that I've found (the classic calculator,
>> or a json parser) seem to be handling this by running ragel as the
>> scanner, and calling the lemon parser in its actions.
>> Some other examples I've seen use re2c for the lexing part. This appears
>> to fit better with what lemon expects, but there must be some way to lex
>> a single token and return from ragel I suspect.
>> I would be thankful for ideas.
>> Cheers,
>> /mark
>> _______________________________________________
>> ragel-users mailing list
>> ragel-users at colm.net
>> http://www.colm.net/cgi-bin/mailman/listinfo/ragel-users

Principal Engineer, ESI-OpenCFD
Engineering System International GmbH | Einsteinring 24 | 85609 Munich
Mob. +49 171 9710 149
www.openfoam.com | www.esi-group.com | mark.olesen at esi-group.com
