[ragel-users] Re: Inline scanner

Adrian Thurston thurs... at cs.queensu.ca
Thu Jul 5 05:42:25 UTC 2007


Hi Carlos,

I'm not yet convinced that a new feature is necessary to solve this
problem. It seems to me that it's more a matter of coding technique. But
of course I could be wrong ... I just need to know how an inline scanner
is different and better than the code I sent.

One difference I can think of (you described this previously) is that
the inline scanner is entered immediately upon moving to the start state
(as opposed to the first character out of the start state).
Unfortunately this is not compatible with the current run-time model, in
which actions take place only on transitions over characters. Anything
that involves changing the run-time model I have to consider very carefully.
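
To make the constraint concrete, here is a minimal sketch in plain C++ (not
Ragel output; run_machine, Trace and the two-state layout are names made up
for this illustration). Every action hangs off a transition, so nothing can
run until a character has actually been consumed:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical trace of the actions that fired, for illustration only.
struct Trace {
    std::vector<std::string> events;
};

inline bool is_space(char c) { return c == ' ' || c == '\t'; }

// Two states: 0 = start (between words), 1 = inside a word.
// Actions are attached to transitions, never to states, so merely
// arriving in a state executes no user code.
Trace run_machine(const std::string &input) {
    Trace trace;
    int cs = 0; // current state, starts in the start state
    for (std::size_t p = 0; p < input.size(); ++p) {
        char c = input[p];
        if (cs == 0 && !is_space(c)) {
            // Transition 0 -> 1 over a non-space character.
            trace.events.push_back("enter_word@" + std::to_string(p));
            cs = 1;
        } else if (cs == 1 && is_space(c)) {
            // Transition 1 -> 0 over a space character.
            trace.events.push_back("leave_word@" + std::to_string(p));
            cs = 0;
        }
        // All remaining transitions are self-loops with no action.
    }
    return trace;
}
```

Calling run_machine("") leaves the event list empty even though the machine
sits in the start state the whole time. An inline scanner that starts on
entry to a state would need a hook that this model does not have.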

-Adrian

Carlos Antunes wrote:
> Hi Adrian!
> 
> Thanks for the idea and code!
> 
> I was now able to reduce ragel's memory usage to 330Mbytes with 24212
> states. Compilation time is now roughly 2m45s. I'm still adding stuff
> so I don't know how things will progress.
> 
> In any case, is there any particular reason you resist the
> implementation of a "longest match with backtracking" feature? I am
> asking because this feature, as you know, is the default in pretty
> much any regex lib/app out there. I still think it would be useful in
> ragel, without the need to match and call "external" scanners (which
> tends to break the continuity of the grammar.)
> 
> Thanks!
> 
> Carlos
> 
> On 7/4/07, Adrian Thurston <thurs... at cs.queensu.ca> wrote:
>> Hey Carlos, I think this does what you want. It moves the processing of
>> whitespace out of the main machine and should reduce the number of states.
>>
>> When a whitespace character is seen there is a call to a scanner which
>> consumes whitespace. When the whitespace scanner sees non-whitespace it
>> holds it and returns. When it sees the end-of-header pattern ('\n' with
>> no continuation) it holds the '\n' and returns. This held '\n' is then
>> read by the end of header string and the header terminates.
>>
>> Cheers,
>>  Adrian
>>
>> #include <iostream>
>> #include <stdlib.h>
>> #include <stdio.h>
>> #include <string.h>
>>
>> using namespace std;
>>
>> %%{
>>         machine sipws;
>>         write data;
>> }%%
>>
>> void sipws( const char *str )
>> {
>>         // pe is one past the terminating NUL, which main's trailing 0 consumes.
>>         const char *p = str, *pe = str + strlen(str) + 1;
>>         int cs;
>>         int stack[1];
>>         int top, act;
>>         const char *tokstart, *tokend;
>>
>>         %%{
>>                 ws_scan := |*
>>                         # Consume spaces.
>>                         [ \t]+;
>>
>>                         # Consume line continuations
>>                         '\r'? '\n' [ \t]+;
>>
>>                         # An end of header. Holds the \n so the end pattern can match.
>>                         '\r'? '\n' => {
>>                                 cerr << "returning from ws (done) " << (p-str) << endl;
>>                                 fhold; fret;
>>                         };
>>
>>                         # Any other character: hold it and return.
>>                         any => {
>>                                 cerr << "returning from ws (cont)" << endl;
>>                                 fhold; fret;
>>                         };
>>                 *|;
>>
>>                 # A word is any non-whitespace.
>>                 word = [^ \t\r\n]+;
>>
>>                 # Whitespace machine: holds the character and jumps to the whitespace
>>                 # scanner for processing.
>>                 ws = [ \t\r\n] @{
>>                         cerr << "going to whitespace " << (p-str) << endl;
>>                         fhold; fcall ws_scan;
>>                 };
>>
>>                 # A newline immediately after coming back from the whitespace scanner
>>                 # signifies the end of a header.
>>                 ws_end = ws '\n';
>>
>>                 header = [a-z]+ ':' ws? word (ws word)* ws_end;
>>
>>                 main := header+ 0;
>>
>>                 # Initialize and execute.
>>                 write init;
>>                 write exec;
>>         }%%
>>
>>         if ( cs < sipws_first_final )
>>                 cerr << "sipws: there was an error at position " << (p-str) << endl;
>> }
>>
>>
>> #define BUFSIZE 1024
>>
>> int main()
>> {
>>         sipws(
>>                 "hr: asdf ljfa ljd\n"
>>                 "       cont\n"
>>                 "new:asiei\n"
>>         );
>>         return 0;
>> }
>>
>>
>>
> 
> 
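
For readers without Ragel at hand, the hold-and-return behaviour described in
the quoted message can be sketched by hand in plain C++. This is only an
approximation under stated assumptions: skip_ws and parse_headers are
hypothetical names, the header name is not validated against [a-z]+, and a
value without any word is accepted, unlike in the grammar above. skip_ws
stands in for ws_scan: it consumes spaces, tabs and line continuations, and
leaves the terminating character (including the end-of-header '\n')
unconsumed for the caller.

```cpp
#include <cstddef>
#include <string>
#include <vector>

inline bool is_ws(char c) {
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

// Stand-in for ws_scan. Returns the index of the first "held" character.
std::size_t skip_ws(const std::string &s, std::size_t p) {
    while (p < s.size()) {
        char c = s[p];
        if (c == ' ' || c == '\t') { ++p; continue; }   // plain spaces
        std::size_t q = p;
        if (s[q] == '\r' && q + 1 < s.size()) ++q;      // optional '\r'
        if (s[q] == '\n' && q + 1 < s.size() &&
            (s[q + 1] == ' ' || s[q + 1] == '\t')) {
            p = q + 2;                                  // line continuation
            continue;
        }
        if (s[q] == '\n') return q;  // end of header: hold the '\n'
        return p;                    // any other character: hold it
    }
    return p;
}

// Collects the words of each header value, one inner vector per header.
std::vector<std::vector<std::string>> parse_headers(const std::string &s) {
    std::vector<std::vector<std::string>> headers;
    std::size_t p = 0;
    while (p < s.size()) {
        std::size_t colon = s.find(':', p);  // simplified name matching
        if (colon == std::string::npos) break;
        p = colon + 1;
        std::vector<std::string> words;
        for (;;) {
            p = skip_ws(s, p);
            if (p >= s.size()) break;
            if (s[p] == '\n') { ++p; break; }  // held newline ends the header
            std::size_t start = p;
            while (p < s.size() && !is_ws(s[p])) ++p;
            if (p == start) return headers;    // stray '\r': treat as an error
            words.push_back(s.substr(start, p - start));
        }
        headers.push_back(words);
    }
    return headers;
}
```

Fed the test string from the quoted main(), parse_headers yields two headers:
the first with the words asdf, ljfa, ljd and cont (the continuation line is
folded in), the second with asiei.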
