[ragel-users] Problem with a scanner dropping the first character of an identifier.

Adrian Thurston thurs... at cs.queensu.ca
Wed Mar 21 02:38:04 UTC 2007


Hi Patrick,

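In the main machine, the use of % defers the action until the machine is
left, so it is executed on the following character. By then the first
character has already been passed, and fhold only backs up one character.
If you change the action embedding operator to @ or $, the action will be
executed on the matched character itself and you should get the results
you want.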
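
As a minimal sketch of that change, assuming the rest of the test program
stays exactly as posted, the main machine could be written with @ in place
of %. Since `any` matches a single character, @ and $ should behave the
same here:

```ragel
# Fire the action on the character that `any` matches, so that fhold
# hands that same character to the ids scanner.
main := ( any @{ fhold; fcall ids; } )* ;
```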

Using tokstart and tokend is the only way to retrieve token text. One of
the goals of Ragel is to have a tool which generates code with no
dependencies, including malloc. This is why I have a "hands-off"
approach to buffer and token-data management. Whenever possible I prefer
to leave this up to the user, as she is in the best position to decide
how memory management is to be done.
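
To illustrate what that can look like on the user side, here is a small
sketch of copying token text into a caller-supplied buffer without any
allocation. The helper name copy_token and its truncation policy are
assumptions made for this example and are not part of Ragel; it relies only
on tokstart pointing at the first character of the token and tokend
pointing one past the last.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Ragel: copy the token text delimited by
   tokstart (inclusive) and tokend (exclusive) into a caller-supplied buffer.
   Returns the number of characters copied. The result is NUL-terminated
   whenever cap > 0, and is truncated if the token does not fit. */
static size_t copy_token(const char *tokstart, const char *tokend,
                         char *out, size_t cap)
{
    size_t len;

    if (cap == 0)
        return 0;                   /* no room even for the terminator */
    len = (size_t)(tokend - tokstart);
    if (len > cap - 1)
        len = cap - 1;              /* leave room for the terminator */
    memcpy(out, tokstart, len);
    out[len] = '\0';
    return len;
}
```

Inside a scanner action this might be called as
copy_token(tokstart, tokend, name, sizeof name) with a local char array
name. Whether to truncate, reject, or grow the buffer is exactly the kind of
policy decision that is left to the user.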

Cheers,
 Adrian

Patrick O'Grady wrote:
> Hi, all--
> 
> I've been struggling with a little self-test fixture which uses Ragel to 
> scan some input.  Here's the test program:
> 
> 
> #include <stdio.h>
> 
> %%{
>     machine scanner ;
> 
>     ids := |*
> 
>         identifier = [a-zA-Z_][a-zA-Z0-9_]* ;
> 
>         identifier
>                 =>  {   printf("Got identifier: %.*s.\n", (int)(tokend - tokstart), tokstart);
>                         fret ;
>                     }
>                 ;
> 
>         (' '|'\n'|'\r')*
>                 =>  { fret; }
>                 ;
> 
>         any
>                 =>  { printf("Ignored.\n"); fret; }
>                 ;
>     *| ;
> 
>     main := ( any %{ fhold; fcall ids; } )* ;
> }%%
> 
> 
> 
> 
> int main()
> {
>     unsigned cs ;
>     char const * p ;
>     char const * pe ;
>     char const * tokstart ;
>     char const * tokend ;
>     unsigned act ;
>     unsigned stack[100] ;
>     unsigned top ;
> 
>     %%write data ;
> 
>     %%write init ;
> 
>     char const s[] = "Once upon a time." ;
> 
>     p = s ;
>     pe = &(s[sizeof(s)]);
> 
>     %%write exec ;
> 
>     %% write eof ;
> 
>     return 0 ;
> }
> 
> 
> I'm compiling with Ragel 5.19/MSVC, and I get the following output.
> 
> Got identifier: nce.
> Got identifier: upon.
> Got identifier: a.
> Got identifier: time.
> Ignored.
> Ignored.
> 
> Everything here is as expected, except the first identifier, which should be 
> "Once", not "nce"--it seems to have skipped over the first 'O'.  First--is 
> there a better way to get a list of all the tokens in the input?  Anyone 
> have any clues about this misbehavior?  Thanks in advance.
> 
> -patrick
> 
> 
> 
> 
> 