'string' ranges

Paul r.lp... at gmail.com
Fri Apr 6 07:25:10 UTC 2007


Hello all,

I want my Ragel state machine to process Unicode text encoded as UTF-8.
There are some Unicode ranges that I want to transition on, e.g.

range = 0x0ED0..0x0ED9;

but I don't know how to express this in a minimal way with an unsigned char
alphabet (i.e. I don't think it can be done directly in Ragel's expression
syntax, because each code point in the range becomes a multi-byte UTF-8
sequence).
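Every code point in this range encodes to three UTF-8 bytes: U+0ED0 is E0 BB 90 and U+0ED9 is E0 BB 99. Because the first two bytes are shared, this particular range collapses to the single byte-level concatenation `0xE0 0xBB 0x90..0x99`. For arbitrary ranges, here is a minimal Python sketch of the usual splitting technique (the same idea as `Utf8Sequences` in Rust's `regex-syntax` crate, not a Ragel feature): cut the code-point range until each piece is a plain concatenation of byte ranges. The helper names `utf8_sequences` and `to_ragel` are hypothetical, inputs are assumed to be valid code points, and the printed expression assumes the machine uses `alphtype unsigned char`.

```python
# Hypothetical helpers (not part of Ragel): turn a code-point range into
# UTF-8 byte-range sequences and print them as a Ragel union.
# Assumes lo/hi are valid code points and `alphtype unsigned char` in Ragel.

def utf8_sequences(lo, hi):
    """Return a list of sequences; each is a list of (byte_lo, byte_hi) pairs."""
    out = []
    _split(lo, hi, out)
    return out


def _split(lo, hi, out):
    if lo > hi:
        return
    # Surrogates U+D800..U+DFFF have no UTF-8 encoding, so cut them out.
    if lo <= 0xDFFF and hi >= 0xD800:
        _split(lo, 0xD7FF, out)
        _split(0xE000, hi, out)
        return
    # Split wherever the encoded length changes (1, 2, 3 or 4 bytes).
    for top in (0x7F, 0x7FF, 0xFFFF):
        if lo <= top < hi:
            _split(lo, top, out)
            _split(top + 1, hi, out)
            return
    # Same length from here on: split until every trailing 6-bit group is
    # either shared by lo and hi or spans its full 0x00..0x3F range.
    n = len(chr(lo).encode("utf-8"))
    for i in range(1, n):
        m = (1 << (6 * i)) - 1
        if (lo & ~m) != (hi & ~m):
            if (lo & m) != 0:
                _split(lo, lo | m, out)
                _split((lo | m) + 1, hi, out)
                return
            if (hi & m) != m:
                _split(lo, (hi & ~m) - 1, out)
                _split(hi & ~m, hi, out)
                return
    # The encodings of lo..hi are now exactly the per-byte ranges
    # between the bytes of lo and the bytes of hi.
    out.append(list(zip(chr(lo).encode("utf-8"), chr(hi).encode("utf-8"))))


def to_ragel(seqs):
    """Format sequences as a union of concatenated byte ranges."""
    alts = []
    for seq in seqs:
        parts = [f"0x{a:02X}" if a == b else f"0x{a:02X}..0x{b:02X}"
                 for a, b in seq]
        alts.append("( " + " ".join(parts) + " )")
    return " | ".join(alts)


print("range = " + to_ragel(utf8_sequences(0x0ED0, 0x0ED9)) + ";")
# prints: range = ( 0xE0 0xBB 0x90..0x99 );
```

Wider ranges come out as several alternatives, one per piece the splitter produces, which stays far smaller than listing every code point.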

My brain isn't in the best condition, but here are the two approaches I have
thought of:
1.) use a script to write out the UTF-8 byte string for every code point in
the range and leave it to Ragel to minimise the states (or something like this)
2.) use Ragel's semantic conditions somehow (assemble the UTF-32 value and use
integer comparison)
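Both approaches can be sketched in plain Python (not Ragel syntax; all helper names are hypothetical). `union_of_strings` is approach 1 taken literally: one alternative per code point, relying on Ragel's state minimisation to merge the shared prefixes, and assuming `alphtype unsigned char`. `in_range_per_char` models approach 2: it rebuilds each code point the way per-byte actions could (the lead byte supplies the high bits, each continuation byte shifts in six more) and then applies the integer test a semantic condition might perform. It assumes well-formed UTF-8 input.

```python
# Hypothetical sketches of the two approaches; plain Python, not Ragel.

def union_of_strings(lo, hi):
    """Approach 1: a Ragel union with one byte string per code point.
    Assumes the Ragel machine uses `alphtype unsigned char`."""
    alts = []
    for cp in range(lo, hi + 1):
        if 0xD800 <= cp <= 0xDFFF:  # surrogates have no UTF-8 encoding
            continue
        encoded = chr(cp).encode("utf-8")
        alts.append("( " + " ".join(f"0x{b:02X}" for b in encoded) + " )")
    return " | ".join(alts)


def in_range_per_char(data, lo, hi):
    """Approach 2: assemble each code point from its bytes, then compare.
    Assumes `data` is well-formed UTF-8; returns one bool per code point."""
    results = []
    cp = 0
    remaining = 0  # continuation bytes still expected
    for b in data:
        if remaining == 0:
            # Lead byte: keep its payload bits, note how many bytes follow.
            if b < 0x80:
                cp, remaining = b, 0
            elif b >> 5 == 0b110:
                cp, remaining = b & 0x1F, 1
            elif b >> 4 == 0b1110:
                cp, remaining = b & 0x0F, 2
            else:
                cp, remaining = b & 0x07, 3
        else:
            # Continuation byte: shift in six more bits.
            cp = (cp << 6) | (b & 0x3F)
            remaining -= 1
        if remaining == 0:
            # Integer comparison on the assembled code point.
            results.append(lo <= cp <= hi)
    return results


# Ten alternatives, ( 0xE0 0xBB 0x90 ) through ( 0xE0 0xBB 0x99 ).
print("range = " + union_of_strings(0x0ED0, 0x0ED9) + ";")
print(in_range_per_char("a\u0ed5\u0eda".encode("utf-8"), 0x0ED0, 0x0ED9))
# prints: [False, True, False]
```

The generated union grows linearly with the number of code points, which is harmless for a ten-character range like this one but gets bulky for wide Unicode classes; the decode-and-compare route presumably keeps the machine small at the cost of arithmetic in actions.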

But before I attempt either, has anyone had to do anything similar? Or are 
there any suggestions I could use?

Thanks, and have a good Easter

 - Paul


