[ragel-users] simple parser for #include statements

Mark Olesen Mark.Olesen at esi-group.com
Wed Apr 25 15:37:07 UTC 2018


Hi Adrian,

Your explanation starts to make some sense. Using the 'any' machine instead
of my 'dnl' machine should give similar speed (the position of the looping
and testing for '\n' has just shifted about a bit).

However, if I rewrite it:

%%{
     main := |*
     space*;

     white* '#' white* 'include' white*
     (dquot dqarg >buffer %process dquot) dnl;

     '//' dnl;               # 1-line comment
     '/*' any* :>> '*/';     # Multi-line comment

     any                     # Discard
     *|;
}%%

How do I ensure that the '#include' is properly anchored? That is what I
was attempting with the 'dnl' machine: enforcing line-based processing
while still swallowing multi-line comments.

As a regex, I'd specify my match like this:

    /^\s*#\s*include\s+"(.*?)".*$/
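
To make the intent of that regex concrete, here is a minimal Python sketch
that applies it line by line. The names INCLUDE_RE, find_includes and the
sample text are made up for illustration. It also shows why the regex alone
is not enough: it knows nothing about multi-line comments, so the include
inside the comment is still reported.

```python
import re

# The regex from above. MULTILINE makes ^ and $ anchor at every line,
# not only at the start and end of the whole input.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"(.*?)".*$', re.MULTILINE)


def find_includes(text):
    """Return the quoted names of all line-anchored #include directives."""
    return INCLUDE_RE.findall(text)


# Hypothetical sample input
sample = '''\
#include "a.H"
   #  include "b.H"  // trailing comment
int x; #include "c.H"
// #include "d.H"
/* start of comment
#include "e.H"
*/
'''

# "c.H" and "d.H" are rejected by the anchoring, but "e.H" slips through
# because the regex cannot see the surrounding /* ... */ comment.
print(find_includes(sample))  # ['a.H', 'b.H', 'e.H']
```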

For my ragel machine, should I be doing something different, such as
having a begin-of-line state that I initialize into and reset every time
I cross a newline?

With vague hand waving:

%%{
     main := |*

     '#' white* 'include' white*
     (dquot dqarg >buffer %process dquot) dnl;

     '//' dnl;               # 1-line comment
     '/*' any* :>> '*/';     # Multi-line comment

     (space %isbol | any %notbol)  # Discard
     *|;
}%%

Not that I really understand what I'd do next with this.

Cheers,
/mark


On 04/25/18 15:45, Adrian Thurston wrote:
> Hi Mark,
> 
> So the thing to remember here is that a scanner will always try for the 
> longest match possible, and only in the case of matches of equal length 
> will it choose the pattern that appears ahead of the others. So in this 
> case the dnl at the end is taking precedence over the comment rules. It 
> doesn't interfere with the include matching rule because it also has a 
> dnl at the end.
> 
> For the catch all you want to use just the any machine. It will go one 
> char at a time and this may seem less efficient, but ragel does its best 
> to optimize this.
> 
> With regard to the slightly tighter machine that you mentioned, it would
> be interesting to see before and after grammars in full to see what's 
> going on. On their own they produce the same machine, but in the context 
> of something larger there might be something preventing it, or it could 
> be a missed opportunity for optimization.
> 
> -Adrian
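
The longest-match rule described above can be modelled with a toy scanner.
This is only a sketch in Python: it mimics the selection rule, not how ragel
actually builds its machines. The function scan, the rule lists and the
pattern used for 'dnl' (everything up to and including the newline) are
assumptions made for the illustration.

```python
import re


def scan(rules, text):
    """Tokenize text: the longest match wins, ties go to the earlier rule."""
    pos, out = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # Strictly greater: a later rule never wins on equal length.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        out.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return out


COMMENT = re.compile(r'/\*.*?\*/', re.DOTALL)

# Catch-all written as 'dnl' (assumed: discard up to and including newline)
with_dnl = [('comment', COMMENT), ('discard', re.compile(r'[^\n]*\n'))]

# Catch-all written as 'any' (a single character)
with_any = [('comment', COMMENT), ('discard', re.compile(r'.', re.DOTALL))]

text = '/* a */ x\n'

# With 'dnl' the catch-all matches the whole line, which is longer than the
# comment, so the comment rule never fires.
print(scan(with_dnl, text))     # [('discard', '/* a */ x\n')]

# With 'any' the catch-all is only one character long, so the comment wins.
print(scan(with_any, text)[0])  # ('comment', '/* a */')
```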


