[ragel-users] simple parser for #include statements

Adrian Thurston thurston at colm.net
Wed Apr 25 18:24:18 UTC 2018


That should read: the end of *every* pattern.
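
A rough sketch of why this anchors things, in Python rather than Ragel. It imitates a longest-match scanner in which every rule ends with dnl, so each token stops right after a newline and the next one can only start at the beginning of a line. The definitions of dnl and of the quoted argument are assumptions (they are not shown in this message), and Python's backtracking regexes only approximate what the Ragel machine does.

```python
import re

# Assumption: dnl ("discard to newline") eats the rest of the line plus '\n'.
DNL = r'[^\n]*(?:\n|\Z)'
WS = r'[ \t]*'  # assumption: 'white' means spaces and tabs only

# Every rule ends in DNL, so every token stops right after a newline and
# the next token therefore always starts at the beginning of a line.
RULES = [
    ("include", re.compile(WS + '#' + WS + 'include' + WS + r'"([^"\n]*)"' + DNL)),
    ("line_comment", re.compile(WS + '//' + DNL)),
    ("block_comment", re.compile(WS + r'/\*.*?\*/' + DNL, re.S)),
    ("default", re.compile(DNL)),  # dnl as the default rule
]

def scan_includes(text):
    """Return the quoted names of all include lines the scanner accepts."""
    found, pos = [], 0
    while pos < len(text):
        best_name, best = None, None
        for name, rx in RULES:
            m = rx.match(text, pos)
            # Longest match wins; on a tie the earlier rule is kept.
            if m and (best is None or m.end() > best.end()):
                best_name, best = name, m
        if best_name == "include":
            found.append(best.group(1))
        pos = best.end()  # the default rule always consumes at least one char
    return found

sample = (
    'int x; #include "no.h"\n'            # '#' not at start of line: skipped
    '#include "yes.h"\n'
    '  # include "also.h" // trailing\n'
    '/* c\n#include "hidden.h"\n*/\n'     # swallowed by the block comment
    '/* c */ #include "late.h"\n'         # comment ahead of include: skipped
)
print(scan_includes(sample))  # ['yes.h', 'also.h']
```

The last sample line is the case raised in the quoted message below: with dnl on the end of the comment rule, an include that follows a comment on the same line is discarded along with the rest of that line.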

On 2018-04-25 14:18, Adrian Thurston wrote:
> Oh I see. In that case you could use dnl as the default rule, but be
> sure to add it to the end of the pattern. That would guarantee an anchor
> at the beginning of the line. A question then arises though: do you want
> to allow comments ahead of include statements?
> 
> On 2018-04-25 11:37, Mark Olesen wrote:
>> Hi Adrian,
>> 
>> Your explanation starts to make some sense. Using the 'any' machine
>> instead of my 'dnl' machine should give similar speed (the position of
>> the looping and testing for '\n' has just shifted about a bit).
>> 
>> However, if I rewrite it:
>> 
>> %%{
>>     main := |*
>>     space*;
>> 
>>     white* '#' white* 'include' white*
>>     (dquot dqarg >buffer %process dquot) dnl;
>> 
>>     '//' dnl;               # 1-line comment
>>     '/*' any* :>> '*/';     # Multi-line comment
>> 
>>     any;                    # Discard
>>     *|;
>> }%%
>> 
>> How do I ensure that the '#include' is properly anchored? This is what
>> I was attempting with the 'dnl' machine: an attempt to enforce
>> line-based processing, but combined with swallowing multi-line
>> comments.
>> 
>> As a regex, I'd specify my match like this
>> 
>>    /^\s*#\s*include\s+"(.*?)".*$/
>> 
>> For my ragel machine, should I be doing something different such as
>> having a begin-of-line state that I initialize into and reset every
>> time I cross a newline?
>> With vague hand waving:
>> 
>> %%{
>>     main := |*
>> 
>>     '#' white* 'include' white*
>>     (dquot dqarg >buffer %process dquot) dnl;
>> 
>>     '//' dnl;               # 1-line comment
>>     '/*' any* :>> '*/';     # Multi-line comment
>> 
>>     (space %isbol | any %notbol);  # Discard
>>     *|;
>> }%%
>> 
>> Not that I really understand what I'd do next with this.
>> 
>> Cheers,
>> /mark
>> 
>> 
>> On 04/25/18 15:45, Adrian Thurston wrote:
>>> Hi Mark,
>>> 
>>> So the thing to remember here is that a scanner will always try for 
>>> the longest match possible, and only in the case of matches of equal 
>>> length will it choose the pattern that appears ahead of the others. 
>>> So in this case the dnl at the end is taking precedence over the 
>>> comment rules. It doesn't interfere with the include matching rule 
>>> because it also has a dnl at the end.
>>> 
>>> For the catch all you want to use just the any machine. It will go 
>>> one char at a time and this may seem less efficient, but ragel does 
>>> its best to optimize this.
>>> 
>>> With regard to the slightly tighter machine that you mentioned, it 
>>> would be interesting to see before and after grammars in full to see 
>>> what's going on. On their own they produce the same machine, but in 
>>> the context of something larger there might be something preventing 
>>> it, or it could be a missed opportunity for optimization.
>>> 
>>> -Adrian
>> 
>> _______________________________________________
>> ragel-users mailing list
>> ragel-users at colm.net
>> http://www.colm.net/cgi-bin/mailman/listinfo/ragel-users

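For cross-checking a scanner against the regex quoted in the thread, a small Python sketch may help. It applies that regex line by line via MULTILINE; the function name is made up for illustration. Note that a bare regex like this has no notion of comments, so an include sitting inside a multi-line comment is still reported, which is the part the scanner's comment rules are meant to handle.

```python
import re

# The regex quoted in the thread, compiled with MULTILINE so that ^ and $
# anchor at line boundaries rather than only at the ends of the input.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"(.*?)".*$', re.MULTILINE)

def includes_by_regex(text):
    """Return every quoted include name found by the line-anchored regex."""
    return INCLUDE_RE.findall(text)

sample = (
    'int x; #include "no.h"\n'            # '#' not at start of line: no match
    '#include "yes.h"\n'
    '  # include "also.h" // trailing\n'
    '/* c\n#include "hidden.h"\n*/\n'     # inside a comment, but still matched
)
print(includes_by_regex(sample))  # ['yes.h', 'also.h', 'hidden.h']
```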

