[ragel-users] Action code for simple tokenizer?

Seamus Abshere seamus at abshere.net
Mon Jun 13 23:09:50 UTC 2011


Dear friends who have been using ragel for more than a year,

I bet Kevin and I are facing a similar problem that you have all faced, 
namely that as a software project matures, common ground between its 
founding users and new users erodes. Fresh code examples keep interest 
alive and prevent people from re-inventing the wheel. Please do speak up!

How about an authoritative Ruby code example for Ragel Guide 6.7 section 
4.2.4 (Longest-Match Kleene Star)? It's "useful when writing simple 
tokenizers"... that sounds like a great way to bridge the gap.

Since all the code examples are in C, it's not clear what you would use 
in Ruby instead of ts and te.

Best,
Seamus

On 6/13/11 12:42 PM, Kevin T. Ryan wrote:
> Hey -
>
> Just started using the library myself.  Easiest way to think about it
> (at least, it was for me) is that you are defining the machine in the
> section you noted below from the guide.  Until you initialize and
> execute it, it doesn't "do anything".  Thus, in some part of your
> script you need:
>
> %% write data; # sets up all the static data needed by the tokenizer
>
> Then (somewhere else in all likelihood), you need to initialize and
> execute the machine.  So, for example:
>
> int main(int argc, char* argv[]) {
>      int cs; // you can use this to check the status of the machine
>      char* p = "Your text to tokenize";
>      char* pe = p + strlen(p);
>
>      %% write init;
>      %% write exec; # executes the machine on the input pointed to by 'p'
>
>      if (cs == <machine_name>_error)
>          fprintf(stderr, "Error\n");
>      return 0;
> }
>
>> What might action A look like? How does it use p, pe, etc.? Ditto for B.
>
> Maybe action 'A' is used to print a match when it ends (the '%' in
> front of the A makes it a leaving action, so it runs as the machine
> moves on from the match).  For example:
>
> action A { printf("Found alpha\n"); }
> action B { printf("Found int\n"); }
>
> If you need to print out the total string, you might combine it with a
> 'mark' action.  Eg:
>
> action mark {
>     mark = p; /* mark must now be declared as a char* in 'main' */
> }
> <  as before>
> lower ( lower | digit )*>mark %A |
>
> And do the same for the integer portion of the machine.  You could
> then change your print function to do something like:
>
> printf("Found alpha: %.*s\n", (int)(p - mark), mark); // print out the alpha found
>
>> PS. I think this would address a big question for
>> ragel/parsing/lexing/tokenizing newbies, namely, how would an **expert**
>> implement a **simple** tokenizer?
>
> You may also want to look at machines that are 'special' for lexing
> (viz., machine := |* *|;).  BTW, I'm very new to this myself - so
> hopefully I didn't screw anything up too much!
>
> PS - I'm actually trying to write up a tutorial which I'll share with
> the list for feedback once it's done.  I think I have a much better
> grasp of what's going on now, but I think writing it out would
> actually help my understanding too.
>
> Good luck,
>
> ktr
>
> _______________________________________________
> ragel-users mailing list
> ragel-users at complang.org
> http://www.complang.org/mailman/listinfo/ragel-users
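
For anyone following along, the mark idea quoted above can be tried in plain C without Ragel at all. This is only a rough hand-written sketch, not Ragel output and not the guide's example: tokenize(), emit() and the kind enum are names made up for illustration, and I am assuming tokens are separated by single spaces. It records the token start on the way in, then prints the text between mark and p on the way out, which is what the mark and %A/%B actions do.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token kinds for this sketch; not part of Ragel. */
enum kind { ALPHA, INT };

/* Stand-in for the leaving actions A and B: print the text in [mark, p). */
static void emit(enum kind k, const char *mark, const char *p)
{
    /* %.*s takes an int precision, so the pointer difference is cast. */
    printf("Found %s: %.*s\n", k == ALPHA ? "alpha" : "int",
           (int)(p - mark), mark);
}

/* Hand-written stand-in for: lower (lower | digit)* | digit+ | ' '
   Returns the number of tokens found, or -1 on an unexpected character. */
static int tokenize(const char *p)
{
    const char *pe = p + strlen(p);
    int count = 0;

    while (p < pe) {
        const char *mark = p;   /* entering action: remember the token start */
        unsigned char c = (unsigned char)*p;

        if (islower(c)) {
            while (p < pe && (islower((unsigned char)*p) ||
                              isdigit((unsigned char)*p)))
                p++;
            emit(ALPHA, mark, p);   /* leaving action A */
            count++;
        } else if (isdigit(c)) {
            while (p < pe && isdigit((unsigned char)*p))
                p++;
            emit(INT, mark, p);     /* leaving action B */
            count++;
        } else if (c == ' ') {
            p++;                    /* separators carry no action */
        } else {
            return -1;              /* roughly the machine's error state */
        }
    }
    return count;
}
```

Calling tokenize("abc 123 x9") from main should report three tokens (abc, 123, x9). A Ragel-generated scanner would replace the if/else chain with table- or goto-driven code, but the mark and p bookkeeping is the same.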

-- 
Seamus Abshere
123 N Blount St Apt 403
Madison, WI 53703
1 (201) 566-0130



