[ragel-users] Action code for simple tokenizer?

Adrian Thurston thurston at complang.org
Wed Jun 15 15:28:09 UTC 2011


In the Ruby code generator one also uses ts and te, except that they are
integer offsets against 'data' instead of pointers. Aside from that, the
assumptions and use cases are all the same.
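
To make the pointer/offset difference concrete, here is a minimal sketch in
plain C. It is not Ragel-generated code, and the helper names
token_from_pointers and token_from_offsets are hypothetical, chosen only for
illustration. It assumes the usual scanner convention that the token occupies
the half-open range from ts up to, but not including, te.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* C convention: ts and te are pointers into the input buffer.
 * The token is the half-open range [ts, te). Copies it into out as a
 * NUL-terminated string (truncating if needed) and returns the number
 * of characters copied. */
static size_t token_from_pointers(const char *ts, const char *te,
                                  char *out, size_t out_size)
{
    assert(ts <= te && out_size > 0);
    size_t len = (size_t)(te - ts);
    if (len >= out_size)
        len = out_size - 1;
    memcpy(out, ts, len);
    out[len] = '\0';
    return len;
}

/* Ruby-style convention: ts and te are integer offsets against data.
 * Adding them to data recovers the pointers used in the C convention. */
static size_t token_from_offsets(const char *data, size_t ts, size_t te,
                                 char *out, size_t out_size)
{
    return token_from_pointers(data + ts, data + te, out, out_size);
}
```

With data pointing at "abc 123", the pointer pair (data + 4, data + 7) and the
offset pair (4, 7) name the same token, "123". In Ruby the equivalent
extraction would be a slice of data between the two offsets.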

I would like to use only C in the manual. Ragel supports a number of 
languages, but it was originally designed for C and I would like the 
manual to reflect that.

On 11-06-13 04:09 PM, Seamus Abshere wrote:
> Dear friends who have been using ragel for more than a year,
>
> I bet Kevin and I are facing a similar problem that you have all faced,
> namely that as a software project matures, common ground between its
> founding users and new users erodes. Fresh code examples keep interest
> alive and prevent people from re-inventing the wheel. Please do speak up!
>
> How about an authoritative Ruby code example for Ragel Guide 6.7 section
> 4.2.4 (Longest-Match Kleene Star)? It's "useful when writing simple
> tokenizers"... that sounds like a great way to bridge the gap.
>
> Since all the code examples are in C, it's not clear what you would use
> in Ruby instead of ts and te.
>
> Best,
> Seamus
>
> On 6/13/11 12:42 PM, Kevin T. Ryan wrote:
>> Hey -
>>
>> Just started using the library myself. Easiest way to think about it
>> (at least, it was for me) is that you are defining the machine in the
>> section you noted below from the guide. Until you initialize and
>> execute it, it doesn't "do anything". Thus, in some part of your
>> script you need:
>>
>> %% write data; # sets up all the static data needed by the tokenizer
>>
>> Then (somewhere else in all likelihood), you need to initialize and
>> execute the machine. So, for example:
>>
>> int main(int argc, char* argv[]) {
>>     int cs; // you can use this to check the status of the machine
>>     char* p = "Your text to tokenize";
>>     char* pe = p + strlen(p);
>>
>>     %% write init;
>>     %% write exec; # this will execute the machine given the input provided by 'p'
>>
>>     if (cs == <machine_name>_error)
>>         fprintf(stderr, "Error\n");
>>     return 0;
>> }
>>
>>> What might action A look like? How does it use p, pe, etc.? Ditto for B.
>>
>> Maybe action 'A' is used to print a match when it ends (the '%' in
>> front of the A makes it a leaving action, which runs on the
>> transitions out of the machine it is attached to). For example:
>>
>> action A { printf("Found alpha\n"); }
>> action B { printf("Found int\n"); }
>>
>> If you need to print out the total string, you might combine it with a
>> 'mark' action. Eg:
>>
>> action mark {
>>     mark = p; /* mark must now be declared in main() as a char* */
>> }
>> < as before>
>> ( lower ( lower | digit )* ) >mark %A |
>>
>> And do the same for the integer portion of the machine. You could
>> then change your print function to do something like:
>>
>> printf("Found alpha: %.*s\n", (int)(p - mark), mark); // print out the alpha found
>>
>>> PS. I think this would address a big question for
>>> ragel/parsing/lexing/tokenizing newbies, namely, how would an **expert**
>>> implement a **simple** tokenizer?
>>
>> You may also want to look at machines that are 'special' for lexing
>> (viz., machine := |* *|;). BTW, I'm very new to this myself - so
>> hopefully I didn't screw anything up too much!
>>
>> PS - I'm actually trying to write up a tutorial which I'll share with
>> the list for feedback once it's done. I think I have a much better
>> grasp of what's going on now, but I think writing it out would
>> actually help my understanding too.
>>
>> Good luck,
>>
>> ktr
>>
>> _______________________________________________
>> ragel-users mailing list
>> ragel-users at complang.org
>> http://www.complang.org/mailman/listinfo/ragel-users
>

-- 
Dr. Adrian D. Thurston
http://www.complang.org/thurston/
