r.lp... at gmail.com
Fri Apr 6 07:25:10 UTC 2007
I want my Ragel state machine to process Unicode text encoded as UTF-8.
There are some Unicode ranges that I want to transition on, e.g.
range = [0x0ED0-0x0ED9];
but I don't know how to express this in a minimal way with an unsigned char
alphabet (i.e. I don't think it can be done directly in ragel's expression
My brain isn't in the best condition, but the two approaches I have thought of are:
1.) use a script to write out the set of strings in the range and leave it to
ragel to minimise the states (or something like this)
2.) use ragel's semantic conditions somehow.. (assemble utf-32 version and use
But before I attempt either, has anyone had to do anything similar? Or are
there any suggestions I could use?
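
To make approach 1 concrete, here is a rough sketch of the kind of script I have in mind. It is only an illustration, not something I have run through Ragel: it walks the code points in a range, encodes each one as UTF-8 using Python's built-in str.encode, and prints a Ragel alternation of byte sequences, leaving the state minimisation to Ragel. The names utf8_alternatives, ragel_definition and lao_digit are made up for this example, and the output assumes the machine declares "alphtype unsigned char;" so that byte values above 0x7F are inside the alphabet.

```python
def utf8_alternatives(first, last):
    """Return one Ragel concatenation string per code point in [first, last]."""
    alternatives = []
    for code_point in range(first, last + 1):
        # Surrogate code points have no valid UTF-8 encoding, so skip them.
        if 0xD800 <= code_point <= 0xDFFF:
            continue
        encoded = chr(code_point).encode("utf-8")
        # Join the bytes with Ragel's explicit concatenation operator.
        alternatives.append(" . ".join("0x%02X" % byte for byte in encoded))
    return alternatives


def ragel_definition(name, first, last):
    """Format the alternatives as a Ragel named definition.

    `name` is a hypothetical machine name chosen by the caller.
    """
    parts = ["( %s )" % alt for alt in utf8_alternatives(first, last)]
    return "%s =\n      %s;" % (name, "\n    | ".join(parts))


if __name__ == "__main__":
    # The range from the question, assuming an unsigned char alphabet.
    print(ragel_definition("lao_digit", 0x0ED0, 0x0ED9))
```

Every code point in this particular range shares the same two leading bytes (U+0ED0 encodes as E0 BB 90), so I would expect the ten alternatives to collapse to very few states after minimisation. For a large range, writing out one alternative per code point would get unwieldy.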
Thanks, and have a good Easter