[ragel-users] Re: Newbie advice

Adrian Thurston thurs... at cs.queensu.ca
Mon Aug 27 17:37:10 UTC 2007


Hi Dave,

Counting works in this case, but it doesn't work for arbitrary terminating patterns. Consider a pattern that begins to match, fails, and then restarts half-way through, so that the two potential matches overlap. You can't get an accurate count. In more general terms, the problem is that the machine is matching concurrently but using a single instance of context data.
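
To make the overlap problem concrete, here is a small Python sketch. It is a simplified model, not Ragel's actual execution: it assumes a fixed, non-empty terminating string, a counting action that fires once per character while any candidate match is alive, and a reset when every candidate has failed. The function name and the "private count" bookkeeping are hypothetical, added only for comparison.

```python
def find_marker(text, marker):
    """Match `marker` concurrently at every offset of `text`.

    Returns (shared_count, private_count) at the first complete match,
    or None if the marker never matches. `marker` must be non-empty.
    """
    shared_count = 0   # single context variable shared by all candidates
    threads = []       # one (position in marker, private count) per candidate
    for ch in text:
        threads.append((0, 0))  # a match may begin at any character
        # Keep only the candidates that can consume this character.
        threads = [(pos + 1, own + 1)
                   for pos, own in threads if marker[pos] == ch]
        if threads:
            shared_count += 1   # the counting action fires once per character
        else:
            shared_count = 0    # every candidate failed, so start over
        for pos, own in threads:
            if pos == len(marker):
                return shared_count, own
    return None


print(find_marker("xaab", "aab"))   # no overlap: both counts agree
print(find_marker("xaaab", "aab"))  # overlap: the shared count is too high
```

With "xaaab" the first candidate fails on the third 'a' while a second candidate, started one character later, is still alive. The shared counter keeps running across both, whereas a count kept per candidate gives the true marker length.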

This is where the pure state machine model breaks down and scanners begin to shine, because they delay their pattern actions until after the pattern matches (made possible by backtracking).
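
As a rough illustration of the scanner idea, here is a Python analogue built on the `re` module rather than on a Ragel scanner. The two patterns mirror the marker_line and any_line definitions from the quoted message; the function name is hypothetical. The point is only that each action runs after a whole token has matched, so the marker never has to be counted or stripped.

```python
import re

# Patterns modelled on marker_line and any_line from the quoted grammar.
MARKER_LINE = re.compile(r"--\r?\n")
ANY_LINE = re.compile(r"[^\r\n]*\r?\n")


def split_sections(text):
    """Return the section bodies, each terminated by a '--' marker line."""
    sections = []
    body = []
    pos = 0
    while pos < len(text):
        # Try the marker first; its action runs only once it has fully matched.
        m = MARKER_LINE.match(text, pos)
        if m:
            sections.append("".join(body))
            body = []
        else:
            m = ANY_LINE.match(text, pos)
            if m is None:
                break  # unterminated trailing text is ignored in this sketch
            body.append(m.group())
        pos = m.end()
    return sections


print(split_sections("a\nb\r\n--\r\nc\n--\n"))
```

Because a failed marker attempt has no side effects, falling back to the any-line pattern at the same position is safe, which is the role backtracking plays in a scanner.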

Adrian
-----Original Message-----
From: Dave Dribin <ddribin at gmail.com>

Date: Mon, 27 Aug 2007 16:53:24 
To: ragel-users <ragel-users at googlegroups.com>
Subject: [ragel-users] Re: Newbie advice



Thanks, again.  Can you see anything wrong with the following
approach?  To make it more interesting, I want to handle both Unix and
Windows newlines:

    newline = '\r'? '\n' @onNewline;
    any_line = [^\r\n]* newline;
    marker_line = '--' newline;
    section_body = (any_line - marker_line)*;

    section = (section_body marker_line $countMarker) $onChar @onSection;

Thus, onChar buffers up the entire section, including the full marker
line.  But countMarker counts the number of characters used in the
variable-length marker, and onSection strips that many characters from
the buffer.

This seems to handle the general case of a variable-length terminating
marker.  The downside is that it buffers up extra characters only to
yank them off at the end.  As long as the terminating marker is
fairly short, I don't see this being a major issue.
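
For reference, the buffer-then-strip idea can be modelled outside Ragel in a few lines of Python. This is only a sketch under assumptions: the function name is hypothetical, the current line's length stands in for whatever countMarker accumulates, and the marker is recognised by comparing the finished line instead of by state machine transitions.

```python
def split_by_strip(text):
    """Buffer every character, then strip the marker when a section ends."""
    sections = []
    buffer = []  # role of onChar: holds the whole section, marker included
    line = []    # characters on the current line, a stand-in for countMarker
    for ch in text:
        buffer.append(ch)
        line.append(ch)
        if ch == "\n":
            if "".join(line) in ("--\n", "--\r\n"):
                # Role of onSection: drop as many characters as the marker used.
                del buffer[len(buffer) - len(line):]
                sections.append("".join(buffer))
                buffer = []
            line = []
    return sections


print(split_by_strip("a\nb\r\n--\r\nc\n--\n"))
```

The extra work is bounded by the marker length (3 or 4 characters here), which matches the observation that the cost stays small while the marker is short.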

Is it possible for countMarker to set an "ignore" flag that onChar
checks?  This would require that the countMarker action be called
before onChar, though, and I don't know if that is guaranteed.

-Dave





