Problem with a scanner dropping the first character of an identifier.

Patrick O'Grady patr... at baymotion.com
Wed Mar 21 03:50:09 UTC 2007


Adrian--

Thanks very much; that cleared up my problem--I knew it had to be
something simple.  As for buffer management, the arguments you
presented are exactly why I'm interested in Ragel: it is one of the
very few parser generator tools that doesn't rely on the thread's call
stack to keep its state information, which lets me drive a state
machine as events arrive, on whichever thread actually receives each
event.  I'm also very interested in how state machines can be
inherited and augmented, hopefully without having to edit the original
source for those machines.  Ragel is a terrific tool--and thanks again
for your help.  If I did have one request, it would be to add a mode
where, instead of using pointer arithmetic to advance through the
buffer, the pointer is adjusted through function calls or #define
macros that I could override.  I'd use this to parse across
non-contiguous buffers.
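
To make the non-contiguous-buffer idea concrete, here is a minimal
hand-written sketch in C.  It is not Ragel output and not an existing
Ragel mode; every name in it (ident_scanner, scanner_feed, token_log,
demo, and so on) is hypothetical.  It assumes identifiers of at most 64
characters and keeps all scanner state in a caller-owned struct, so
input can be fed one chunk at a time and a token may straddle a chunk
boundary.  No malloc is used.

```c
#include <stdio.h>
#include <string.h>

enum { TOK_MAX = 64, LOG_MAX = 8 };   /* assumed limits for this sketch */

/* All scanner state lives here, not on the call stack. */
typedef struct {
    int    in_ident;      /* 0 = between tokens, 1 = inside an identifier */
    char   tok[TOK_MAX];  /* text gathered so far; survives across chunks */
    size_t len;
} ident_scanner;

typedef void (*token_fn)(const char *text, size_t len, void *user);

static int is_start(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

static int is_part(char c)
{
    return is_start(c) || (c >= '0' && c <= '9');
}

static void scanner_init(ident_scanner *s) { s->in_ident = 0; s->len = 0; }

/* Report the pending identifier, if any, and return to the idle state. */
static void scanner_flush(ident_scanner *s, token_fn cb, void *user)
{
    if (s->in_ident)
        cb(s->tok, s->len, user);
    s->in_ident = 0;
    s->len = 0;
}

/* Feed one chunk [p, pe).  May be called again later with an unrelated
 * buffer; the caller must serialize calls that share one scanner. */
static void scanner_feed(ident_scanner *s, const char *p, const char *pe,
                         token_fn cb, void *user)
{
    for (; p != pe; ++p) {
        char c = *p;
        if (s->in_ident ? is_part(c) : is_start(c)) {
            if (s->len < TOK_MAX)          /* extra characters are dropped */
                s->tok[s->len++] = c;
            s->in_ident = 1;
        } else {
            scanner_flush(s, cb, user);
        }
    }
}

typedef struct { char text[LOG_MAX][TOK_MAX + 1]; int count; } token_log;

/* Callback: print the token and keep a NUL-terminated copy of it. */
static void log_token(const char *text, size_t len, void *user)
{
    token_log *tl = user;
    if (tl->count < LOG_MAX) {
        memcpy(tl->text[tl->count], text, len);
        tl->text[tl->count][len] = '\0';
        tl->count++;
    }
    printf("Got identifier: %.*s.\n", (int)len, text);
}

/* Feed "Once upon a time." as four separate chunks; returns token count. */
static int demo(token_log *tl)
{
    const char *chunks[] = { "On", "ce up", "on a ti", "me." };
    ident_scanner s;
    size_t i;

    tl->count = 0;
    scanner_init(&s);
    for (i = 0; i < sizeof chunks / sizeof chunks[0]; ++i)
        scanner_feed(&s, chunks[i], chunks[i] + strlen(chunks[i]),
                     log_token, tl);
    scanner_flush(&s, log_token, tl);   /* end of input */
    return tl->count;
}
```

Calling demo() from a main() reports Once, upon, a and time even though
three of the four identifiers are split across chunks.  Copying token
text into the scanner struct is only one possible design; it trades a
fixed size limit for not having to keep earlier chunks alive.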

-patrick




On Mar 20, 6:38 pm, Adrian Thurston <thurs... at cs.queensu.ca> wrote:
> Hi Patrick,
>
> In the main machine the use of % causes the action to be executed on the
> following character. If you change the action embedding operator to @ or
> $ the action will be executed immediately and you should get the results
> you want.
>
> Using tokstart and tokend is the only way to retrieve token text. One of
> the goals of Ragel is to have a tool which generates code with no
> dependencies, including malloc. This is why I have a "hands-off"
> approach to buffer and token-data management. Whenever possible I prefer
> to leave this up to the user, as she is in the best position to decide
> how memory management is to be done.
>
> Cheers,
>  Adrian
>
> Patrick O'Grady wrote:
> > Hi, all--
>
> > I've been struggling with a little self-test fixture which uses Ragel to
> > scan some input.  Here's the test program:
>
> > #include <stdio.h>
>
> > %%{
> >     machine scanner ;
>
> >     ids := |*
>
> >         identifier = [a-zA-Z_][a-zA-Z0-9_]* ;
>
> >         identifier
> >                 =>  {   printf("Got identifier: %.*s.\n", (int)(tokend - tokstart), tokstart);
> >                         fret ;
> >                     }
> >                 ;
>
> >         (' '|'\n'|'\r')*
> >                 =>  { fret; }
> >                 ;
>
> >         any
> >                 =>  { printf("Ignored.\n"); fret; }
> >                 ;
> >     *| ;
>
> >     main := ( any %{ fhold; fcall ids; } )* ;
> > }%%
>
> > int main()
> > {
> >     unsigned cs ;
> >     char const * p ;
> >     char const * pe ;
> >     char const * tokstart ;
> >     char const * tokend ;
> >     unsigned act ;
> >     unsigned stack[100] ;
> >     unsigned top ;
>
> >     %%write data ;
>
> >     %%write init ;
>
> >     char const s[] = "Once upon a time." ;
>
> >     p = s ;
> >     pe = &(s[sizeof(s)]);
>
> >     %%write exec ;
>
> >     %% write eof ;
>
> >     return 0 ;
> > }
>
> > I'm compiling with Ragel 5.19/MSVC, and I get the following output:
>
> > Got identifier: nce.
> > Got identifier: upon.
> > Got identifier: a.
> > Got identifier: time.
> > Ignored.
> > Ignored.
>
> > Everything here is as expected, except the first identifier, which should be
> > "Once", not "nce"--it seems to have skipped over the first 'O'.  First--is
> > there a better way to get a list of all the tokens in the input?  Anyone
> > have any clues about this misbehavior?  Thanks in advance.
>
> > -patrick
>
> > >
>
>
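
The missing 'O' in the quoted output can be traced with a toy model of
the pointer arithmetic, sketched below in plain C.  This is not
Ragel-generated code and the function names are hypothetical.  It
assumes that fhold moves p back by one and that the driver loop
advances p by one after the action before the called scanner reads its
first character; that assumption is consistent with the output shown in
the thread but is not taken from Ragel's documentation.

```c
#include <stdio.h>

/* Offset at which the called scanner starts reading, given the offset
 * of the character being processed when the embedded action runs. */
static int scanner_start(int p_at_action)
{
    int p = p_at_action;
    p -= 1;   /* fhold */
    p += 1;   /* assumed: the driver loop advances p after the action */
    return p;
}

/* With %, the action runs on the character after the one `any` matched. */
static int start_with_leaving_action(int matched)
{
    return scanner_start(matched + 1);
}

/* With @ (or $), the action runs on the matched character itself. */
static int start_with_finishing_action(int matched)
{
    return scanner_start(matched);
}

/* Print where the scanner would start for each operator, with `any`
 * having matched the first character of the input. */
static void show(const char *input)
{
    printf("%%: scanner starts at '%c'\n",
           input[start_with_leaving_action(0)]);
    printf("@: scanner starts at '%c'\n",
           input[start_with_finishing_action(0)]);
}
```

For the input "Once upon a time." the model puts the scanner at 'n'
with % and at 'O' with @, which matches the "nce" in the output above.
Following Adrian's suggestion, the main machine would presumably read
main := ( any @{ fhold; fcall ids; } )* ;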



More information about the ragel-users mailing list