is this possible?

Steve Horne stephenhorne... at aol.com
Thu Aug 30 20:58:40 UTC 2007


That's a very vague requirement!

Let's see...

> i would like to parse packets[bit patterns] and get out token which
> represent the function..

Ragel generates a state machine, which it uses to extract tokens from
input. In principle, the state machine takes as input a stream of
characters, though in practice those 'characters' can be anything you
can represent using integers (the precise type can be specified to get
more bits per character).

It isn't clear from your description whether the state machine you are
talking about would be the one Ragel generates for you, or some other
one, but either case is fine. You basically supply your own 'template'
code for the Ragel state machine, so you can get your input character
stream any way you want and use your output tokens in any way you
want.

You could even use the output from one Ragel machine as the input for
another one.

The only doubt I have is your mention of 'bit patterns'. As you can
handle input any way you want, there is in principle no difficulty,
but to the best of my knowledge you cannot use the current state to
determine how to decode the next input. You could not take variable-
bit-width characters from the input, for instance.

You certainly can handle something like UTF-8 decoding, though - you
just have a machine that takes 8-bit 'characters' as input and outputs
unicode codepoints as its result (possibly sending them to another
Ragel machine that tokenises the unicode character sequence).
Actually, there's no reason why you can't convert UTF-8 to unicode
codepoints in a single Ragel machine, but it's probably easier to
cascade two machines to do the job.

Equally, you could process a Huffman-encoded input if you really want
to. You couldn't handle the input in multi-bit chunks, but you could
handle it one bit at a time. The only condition is that the particular
encoding would have to be fixed. This would work because Huffman
encoding is a variable-width encoding, designed in such a way that you
can find the end of each character using a finite state machine.

The Ragel manual is very good, by the way. Just read the section on
the 'interface to the host program' and I think all your questions
will be answered.

However, just on the off chance, I'll also recommend that you take a
look at the SMC project...

http://smc.sourceforge.net/

I've not actually used it myself, whereas I use Ragel a lot, but
depending on your state machine requirements it could be more
appropriate.

Ragel has a far more powerful model for specifying finite state
machines in that it supports regular grammar handling features (like
regular expressions, but better) as well as explicit specification of
transitions, and the two forms can be mixed as needed. It also allows
backtracking scanners (like flex - the scanner generator, not the
Adobe web framework thing).

Where SMC appears to win out is if you already have a pure state-
transition model (e.g. based on a UML state diagram) and you want to
handle events that are like class methods, with parameters.



More information about the ragel-users mailing list