[ragel-users] Re: Inline scanner

Carlos Antunes cmantu... at gmail.com
Thu Jul 5 03:08:36 UTC 2007


Hi Adrian!

Thanks for the idea and code!

I have now been able to reduce Ragel's memory usage to 330 MB, with
24212 states. Compilation time is now roughly 2m45s. I'm still adding
stuff, so I don't know how things will progress.

In any case, is there any particular reason you resist the
implementation of a "longest match with backtracking" feature? I am
asking because this feature, as you know, is the default in pretty
much any regex lib/app out there. I still think it would be useful in
Ragel, since it would remove the need to match and call "external"
scanners (which tends to break the continuity of the grammar).

Thanks!

Carlos

On 7/4/07, Adrian Thurston <thurs... at cs.queensu.ca> wrote:
> Hey Carlos, I think this does what you want. It moves the processing of
> whitespace out of the main machine and should reduce the number of states.
>
> When a whitespace character is seen there is a call to a scanner which
> consumes whitespace. When the whitespace scanner sees non-whitespace it
> holds it and returns. When it sees the end-of-header pattern ('\n' with
> no continuation) it holds the '\n' and returns. This held '\n' is then
> matched by the end-of-header pattern and the header terminates.
>
> Cheers,
>  Adrian
>
> #include <iostream>
> #include <stdlib.h>
> #include <stdio.h>
> #include <string.h>
>
> using namespace std;
>
> %%{
>         machine sipws;
>         write data;
> }%%
>
> void sipws( const char *str )
> {
>         // pe points one past the terminating NUL so that main's trailing 0 can match.
>         const char *p = str, *pe = str + strlen(str) + 1;
>         int cs;
>         int stack[1];
>         int top, act;
>         const char *tokstart, *tokend;
>
>         %%{
>                 ws_scan := |*
>                         # Consume spaces.
>                         [ \t]+;
>
>                         # Consume line continuations
>                         '\r'? '\n' [ \t]+;
>
>                         # An end of header. Holds the \n so the end pattern can match.
>                         '\r'? '\n' => {
>                                 cerr << "returning from ws (done) " << (p-str) << endl;
>                                 fhold; fret;
>                         };
>
>                         # Any other character, hold it and return.
>                         any => {
>                                 cerr << "returning from ws (cont)" << endl;
>                                 fhold; fret;
>                         };
>                 *|;
>
>                 # A word is any non-whitespace.
>                 word = [^ \t\r\n]+;
>
>                 # Whitespace machine: holds the character and jumps to the whitespace
>                 # scanner for processing.
>                 ws = [ \t\r\n] @{
>                         cerr << "going to whitespace " << (p-str) << endl;
>                         fhold; fcall ws_scan;
>                 };
>
>                 # A newline immediately after coming back from the whitespace scanner
>                 # signifies the end of a header.
>                 ws_end = ws '\n';
>
>                 header = [a-z]+ ':' ws? word (ws word)* ws_end;
>
>                 main := header+ 0;
>
>                 # Initialize and execute.
>                 write init;
>                 write exec;
>         }%%
>
>         if ( cs < sipws_first_final )
>                 cerr << "sipws: there was an error at position " << (p-str) << endl;
> }
>
>
> #define BUFSIZE 1024
>
> int main()
> {
>         sipws(
>                 "hr: asdf ljfa ljd\n"
>                 "       cont\n"
>                 "new:asiei\n"
>         );
>         return 0;
> }
>
>
>


-- 
"We hold [...] that all men are created equal; that they are
endowed [...] with certain inalienable rights; that among
these are life, liberty, and the pursuit of happiness"
        -- Thomas Jefferson


