[ragel-users] Action code for simple tokenizer?

Seamus Abshere seamus at abshere.net
Mon Jun 13 23:09:50 UTC 2011

Dear friends who have been using ragel for more than a year,

I bet Kevin and I are facing a similar problem that you have all faced, 
namely that as a software project matures, common ground between its 
founding users and new users erodes. Fresh code examples keep interest 
alive and prevent people from re-inventing the wheel. Please do speak up!

How about an authoritative Ruby code example for Ragel Guide 6.7 section 
4.2.4 (Longest-Match Kleene Star)? It's "useful when writing simple
tokenizers"... that sounds like a great way to bridge the gap.

Since all the code examples are in C, it's not clear what you would use
in Ruby instead of ts and te (the token start and token end pointers).


On 6/13/11 12:42 PM, Kevin T. Ryan wrote:
> Hey -
> Just started using the library myself.  Easiest way to think about it
> (at least, it was for me) is that you are defining the machine in the
> section you noted below from the guide.  Until you initialize and
> execute it, it doesn't "do anything".  Thus, in some part of your
> script you need:
> %% write data; # sets up all the static data needed by the tokenizer
> Then (somewhere else in all likelihood), you need to initialize and
> execute the machine.  So, for example:
> #include <stdio.h>
> #include <string.h>
>
> int main(void) {
>      int cs; // you can use this to check the status of the machine
>      const char* p = "Your text to tokenize";
>      const char* pe = p + strlen(p);
>      %% write init;
>      %% write exec; # runs the machine over the input from p up to pe
>      if (cs == <machine_name>_error)
>          fprintf(stderr, "Error\n");
>      return 0;
> }
>> What might action A look like? How does it use p, pe, etc.? Ditto for B.
> Maybe action 'A' is used to print a match when it ends (the '%' in
> front of the A makes it a leaving action, which runs on the transition
> out of the machine's final state).  For example:
> action A { printf("Found alpha\n"); }
> action B { printf("Found int\n"); }
> If you need to print out the whole matched string, you might combine
> it with a 'mark' action.  E.g.:
> action mark { mark = p; /* mark must now be declared in main() as a const char* */ }
> <  as before>
> ( lower ( lower | digit )* ) >mark %A |  # parens so >mark also covers the first letter
> And do the same for the integer portion of the machine.  You could
> then change your print function to do something like:
> printf("Found alpha: %.*s\n", (int)(p - mark), mark); // print out the alpha found
>> PS. I think this would address a big question for
>> ragel/parsing/lexing/tokenizing newbies, namely, how would an **expert**
>> implement a **simple** tokenizer?
> You may also want to look at scanners, the machines that are 'special'
> for lexing (e.g., machine := |* ... *|;).  BTW, I'm very new to this
> myself - so hopefully I didn't screw anything up too much!
> PS - I'm actually trying to write up a tutorial which I'll share with
> the list for feedback once it's done.  I think I have a much better
> grasp of what's going on now, but I think writing it out would
> actually help my understanding too.
> Good luck,
> ktr
> _______________________________________________
> ragel-users mailing list
> ragel-users at complang.org
> http://www.complang.org/mailman/listinfo/ragel-users

Seamus Abshere
123 N Blount St Apt 403
Madison, WI 53703
1 (201) 566-0130
