[ragel-users] Parsing PokerHand-History file (kind of log file with actions)

ragel-user at jgoettgens.de ragel-user at jgoettgens.de
Sat Jul 23 16:00:19 UTC 2011


Jens!

It looks as if the “poker” language mainly consists of a simple list of
key-value pairs, so a plain tokenizer might suffice. Compared to tools like
flex, or the scanner part of ANTLR, Ragel's focus is more on states and
transitions than on entire tokens. When matching arbitrary text you will
probably also need to look at the longest-match Kleene star operator
(expr**). With Ragel you can do many things “on the fly”, but if you are just
transforming a list into a different format, you may not need this power.
Likewise you wouldn't need the power of an LR or LL(*) parser (though LL(*)
grammars are very easy to code and the speed penalty might be acceptable).
You could use the tokenizer of the C runtime library and subsequently match
keywords using the output from gperf. Simple and not slow.
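The tokenizer-plus-keyword-lookup idea can be sketched roughly as follows.
This is only an illustration under assumptions: the "key=value" input layout
and the keyword names are made up (I don't know your actual hand-history
format), and a plain table search stands in for the perfect-hash lookup that
gperf would generate from the same keyword list.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical keyword set, invented for this sketch. */
enum keyword { KW_UNKNOWN = 0, KW_PLAYER, KW_ACTION, KW_AMOUNT };

static const struct { const char *name; enum keyword id; } keywords[] = {
    { "player", KW_PLAYER },
    { "action", KW_ACTION },
    { "amount", KW_AMOUNT },
};

/* Stand-in for a gperf-generated lookup: linear search over the table. */
enum keyword lookup_keyword(const char *s)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(s, keywords[i].name) == 0)
            return keywords[i].id;
    return KW_UNKNOWN;
}

/* Split one "key=value key=value ..." line in place with strtok() and
   print each pair. Returns the number of recognised keywords.
   Assumes values contain no whitespace. */
int tokenize_line(char *line)
{
    int known = 0;
    for (char *tok = strtok(line, " \t\r\n"); tok != NULL;
         tok = strtok(NULL, " \t\r\n")) {
        char *eq = strchr(tok, '=');
        if (eq == NULL)
            continue;               /* not a key=value token, skip it */
        *eq = '\0';                 /* tok is now the key, eq + 1 the value */
        enum keyword kw = lookup_keyword(tok);
        if (kw != KW_UNKNOWN)
            known++;
        printf("%-8s -> %s (keyword id %d)\n", tok, eq + 1, (int)kw);
    }
    return known;
}
```

Call tokenize_line() on a writable buffer holding one input line; strtok()
modifies the buffer and keeps hidden state, so this form is not reentrant.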

I am using both techniques to remotely control an Asterisk PBX server
(telephony system) using the AMI protocol
(http://www.voip-info.org/wiki/view/Asterisk+manager+API). The AMI protocol
shares a lot with your "poker" language. The main difference is that I am
dealing with a real-time system (asynchronous communication, timing issues,
network problems, etc.) and I know that a valid input stream always has
terminating characters (or I insert a "timeout" token at the socket level),
so there is no need for expr**. Unfortunately not all Asterisk modules follow
the AMI protocol exactly (instead of calling this a protocol violation, view
it as an extension of the protocol) and there are a couple of exceptions that
make the handwritten code rather ugly by now. This is where Ragel starts to
shine.
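To show what "always has terminating characters" buys you, here is a minimal
sketch of framing an AMI-style stream. It assumes the usual AMI layout of
"Key: Value" lines ending in CRLF, with an empty line closing each packet.
The function names are made up for this example and this is not my production
code; the modules that deviate from the protocol are exactly what it does not
handle.

```c
#include <stddef.h>
#include <string.h>

/* Returns the length of the first complete packet in buf, including the
   terminating blank line, or 0 if no complete packet has arrived yet. */
size_t packet_length(const char *buf, size_t len)
{
    static const char term[] = "\r\n\r\n";
    for (size_t i = 0; i + 4 <= len; i++)
        if (memcmp(buf + i, term, 4) == 0)
            return i + 4;
    return 0;
}

/* Splits one "Key: Value" line in place (CRLF already stripped).
   The key stays in line; the return value points at the value,
   or is NULL if the line contains no colon. */
char *split_header(char *line)
{
    char *colon = strchr(line, ':');
    if (colon == NULL)
        return NULL;
    *colon++ = '\0';
    while (*colon == ' ')           /* skip blanks after the colon */
        colon++;
    return colon;
}
```

The reader loop appends socket data to a buffer and calls packet_length()
until it returns non-zero; a timeout at the socket level covers the case
where the terminator never arrives.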

There are also various text transformation tools out there. I think it could
be possible to transform your key-value lists into SQL code without writing a
single line of source code (if you don't count the transformation
specification as code).

If you have a well-behaved source, the system-supplied tokenizer (+ gperf)
is probably preferable, otherwise Ragel. Ragel is more fun, though. There's
a Graphviz installer for your Windows machine, and starting from your simple
example you could add some output to all available actions to see what's
going on during execution. It won't take long before you get a feeling
for how things work and where you must be careful.

Of course, if Adrian had a lot of spare time left over, he could add an 
"instrumentation" option to Ragel by adding diagnostic code to all states 
and transitions (essentially adding any allowable action with some code). In
the simplest form there would be just some console output. A better solution
would be to fire events through a socket, and with a little extra work to
control the input, one could write a nice graphics tool to visualize the FSM
and its transitions. This would be helpful for beginners, and more generally
would be useful as a teaching tool for a college-level CS class.
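To make the "simplest form" concrete, here is a rough sketch of a
hand-written toy state machine with a trace hook on every transition. It is
not Ragel output and not an existing Ragel option; the states, the
"key=value;" input and all names are invented for illustration.

```c
#include <stdio.h>

enum state { ST_KEY, ST_VALUE, ST_ERROR };

/* Hook called once per transition; could just as well write to a socket. */
typedef void (*trace_fn)(enum state from, char input, enum state to);

/* Simplest instrumentation: one line of console output per transition. */
void trace_to_console(enum state from, char input, enum state to)
{
    printf("state %d --'%c'--> state %d\n", (int)from, input, (int)to);
}

/* One transition of a toy machine accepting "key=value;" repeated. */
enum state step(enum state s, char c)
{
    switch (s) {
    case ST_KEY:   return c == '=' ? ST_VALUE : (c == ';' ? ST_ERROR : ST_KEY);
    case ST_VALUE: return c == ';' ? ST_KEY : (c == '=' ? ST_ERROR : ST_VALUE);
    default:       return ST_ERROR;
    }
}

/* Runs the machine over input and returns the final state.
   trace may be NULL to switch the instrumentation off. */
enum state run(const char *input, trace_fn trace)
{
    enum state s = ST_KEY;
    for (const char *p = input; *p != '\0' && s != ST_ERROR; p++) {
        enum state next = step(s, *p);
        if (trace != NULL)
            trace(s, *p, next);
        s = next;
    }
    return s;
}
```

Calling run("a=1;b=2;", trace_to_console) prints one line per input
character; swapping in a different trace_fn is where a socket-based
visualizer would hook in.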

Happy tokenizing,
jg

