Problem with a scanner dropping the first character of an identifier.

Patrick O'Grady patr... at baymotion.com
Tue Mar 20 22:22:58 UTC 2007


Hi, all--

I've been struggling with a little self-test fixture which uses Ragel to 
scan some input.  Here's the test program:


#include <stdio.h>

%%{
    machine scanner ;

    ids := |*

        identifier = [a-zA-Z_][a-zA-Z0-9_]* ;

        identifier
                =>  {   printf("Got identifier: %.*s.\n",
                               (int)(tokend - tokstart), tokstart);
                        fret ;
                    }
                ;

        (' '|'\n'|'\r')*
                =>  { fret; }
                ;

        any
                =>  { printf("Ignored.\n"); fret; }
                ;
    *| ;

    main := ( any %{ fhold; fcall ids; } )* ;
}%%




int main(void)
{
    unsigned cs ;
    char const * p ;
    char const * pe ;
    char const * tokstart ;
    char const * tokend ;
    unsigned act ;
    unsigned stack[100] ;
    unsigned top ;
    char const s[] = "Once upon a time." ;

    %%write data ;

    %%write init ;

    p = s ;
    /* sizeof(s) counts the terminating NUL, so the NUL is scanned too. */
    pe = &(s[sizeof(s)]);

    %%write exec ;

    %% write eof ;

    return 0 ;
}


I'm compiling with Ragel 5.19 and MSVC, and I get the following output:

Got identifier: nce.
Got identifier: upon.
Got identifier: a.
Got identifier: time.
Ignored.
Ignored.

Everything here is as expected except the first identifier, which should be
"Once", not "nce"; the scanner seems to have skipped the leading 'O'.
First, is there a better way to get a list of all the tokens in the input?
Second, does anyone have any clues about this misbehavior?  Thanks in advance.

-patrick






More information about the ragel-users mailing list