[ragel-users] simple parser for #include statements

Adrian Thurston thurston at colm.net
Wed Apr 25 18:18:08 UTC 2018


Oh, I see. In that case you could use dnl as the default rule, but be 
sure to add it to the end of each pattern. That would guarantee an anchor 
at the beginning of the line. A question then arises, though: do you want 
to allow comments ahead of include statements?
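
A rough sketch of that idea, written in Python rather than Ragel only 
because it is easy to run. It assumes dnl means "discard through the 
newline"; the names INCLUDE, REST_OF_LINE and find_includes are made up 
for illustration. Because every alternative consumes through the end of 
the line, each new match necessarily starts at the beginning of a line. 
It does not handle multi-line comments, so an include inside a /* */ 
block would still be reported.

```python
import re

# Hypothetical stand-ins for the scanner rules; not taken from the thread.
# Both patterns consume through the newline (or end of input), so the
# next match always begins at the start of a line.
INCLUDE = re.compile(r'[ \t]*#[ \t]*include[ \t]*"([^"\n]*)"[^\n]*(?:\n|\Z)')
REST_OF_LINE = re.compile(r'[^\n]*(?:\n|\Z)')  # plays the role of a default dnl rule


def find_includes(text):
    """Return the quoted names of include statements that start a line."""
    names = []
    pos = 0
    while pos < len(text):
        m = INCLUDE.match(text, pos)
        if m:
            names.append(m.group(1))
        else:
            # Default rule: throw away the rest of the current line.
            m = REST_OF_LINE.match(text, pos)
        pos = m.end()
    return names
```

An include that appears after other text on the line, for example behind 
a // comment, is swallowed by the default rule and never reported.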

On 2018-04-25 11:37, Mark Olesen wrote:
> Hi Adrian,
> 
> Your explanation starts to make some sense. Using the 'any' machine
> instead of my 'dnl' machine should give a similar speed (the position
> of the looping and the test for '\n' has just shifted about a bit).
> 
> However, if I rewrite it:
> 
> %%{
>     main := |*
>     space*;
> 
>     white* '#' white* 'include' white*
>     (dquot dqarg >buffer %process dquot) dnl;
> 
>     '//' dnl;               # 1-line comment
>     '/*' any* :>> '*/';     # Multi-line comment
> 
>     any                     # Discard
>     *|;
> }%%
> 
> How do I ensure that the '#include' is properly anchored? This is what
> I was attempting with the 'dnl' machine: an attempt to enforce
> line-based processing, but combined with swallowing multi-line
> comments.
> 
> As a regex, I'd specify my match like this:
> 
>    /^\s*#\s*include\s+"(.*?)".*$/
> 
> For my ragel machine, should I be doing something different such as
> having a begin-of-line state that I initialize into and reset every
> time I cross a newline?
> With vague hand waving:
> 
> %%{
>     main := |*
> 
>     '#' white* 'include' white*
>     (dquot dqarg >buffer %process dquot) dnl;
> 
>     '//' dnl;               # 1-line comment
>     '/*' any* :>> '*/';     # Multi-line comment
> 
>     (space %isbol | any %notbol)  # Discard
>     *|;
> }%%
> 
> Not that I really understand what I'd do next with this.
> 
> Cheers,
> /mark
> 
> 
> On 04/25/18 15:45, Adrian Thurston wrote:
>> Hi Mark,
>> 
>> So the thing to remember here is that a scanner will always try for 
>> the longest match possible, and only in the case of matches of equal 
>> length will it choose the pattern that appears ahead of the others. So 
>> in this case the dnl at the end is taking precedence over the comment 
>> rules. It doesn't interfere with the include matching rule because it 
>> also has a dnl at the end.
>> 
>> For the catch all you want to use just the any machine. It will go one 
>> char at a time and this may seem less efficient, but ragel does its 
>> best to optimize this.
>> 
>> With regard to the slightly tighter machine that you mentioned, it 
>> would be interesting to see before and after grammars in full to see 
>> what's going on. On their own they produce the same machine, but in 
>> the context of something larger there might be something preventing 
>> it, or it could be a missed opportunity for optimization.
>> 
>> -Adrian
> 
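
The longest-match rule described in the quoted message can be illustrated 
with a small Python sketch. This is not how Ragel is implemented; RULES 
and next_token are hypothetical names, and the regular expressions only 
approximate the comment, dnl and any machines under the assumption that 
dnl discards through the newline.

```python
import re

# Illustrative rules in priority order: on a tie, the earlier one wins.
RULES = [
    ("comment", re.compile(r'/\*.*?\*/', re.S)),  # ends at the first '*/'
    ("dnl", re.compile(r'[^\n]*\n')),             # assumed: discard through newline
    ("any", re.compile(r'.', re.S)),              # single-character catch-all
]


def next_token(text, pos, rules=RULES):
    """Pick the longest match at pos; on equal length the earlier rule wins."""
    best_name, best_end = None, pos
    for name, pattern in rules:
        m = pattern.match(text, pos)
        # Strict '>' means a later rule only wins by matching more input.
        if m and m.end() > best_end:
            best_name, best_end = name, m.end()
    return best_name, best_end
```

On the input "/* a */ x\n" the dnl rule matches the whole line and so 
beats the comment rule, which stops at the closing '*/'. With the 
single-character catch-all in place of dnl, the comment rule is the 
longest match and wins.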


