[ragel-users] 'string' ranges

Adrian Thurston thurs... at cs.queensu.ca
Fri Apr 6 14:38:27 UTC 2007


Hi Paul,

I think that, in theory, enumerating all the possibilities with a script
and then leaving the rest to the minimization routine would work, though
it might end up taking forever to compile.
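
A minimal sketch of that enumeration idea, in Python. It is an illustration
only and has not been run through Ragel; the function name
enumerate_utf8_alternatives and the machine name range_0ed0 are made up
here, and concatenation is written by juxtaposition as in the snippets in
this message. It emits one alternative per code point, so large ranges
produce very large definitions:

```python
def enumerate_utf8_alternatives(first, last):
    """Return a Ragel alternation with the UTF-8 bytes of each code point
    in the inclusive range [first, last] (hypothetical helper)."""
    alternatives = []
    for cp in range(first, last + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates have no UTF-8 encoding
        encoded = chr(cp).encode("utf-8")
        # One byte literal per encoded byte, e.g. "0xE0 0xBB 0x90".
        alternatives.append(" ".join(f"0x{b:02X}" for b in encoded))
    return " |\n\t".join(alternatives)


# Print a definition for the range from the original question.
print("range_0ed0 =\n\t" + enumerate_utf8_alternatives(0x0ED0, 0x0ED9) + ";")
```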

Semantic conditions could be made to work, but I would advise trying to
express the ranges directly first. If that doesn't work well (or if it
generates too many states), you could go the semantic condition route.

To express them directly, assemble the ranges byte by byte and then
section by section. I've never done this in a real program, so if you try
it out (or one of the other techniques), would you mind sending a message
to the list to say how it went?

alphtype unsigned char;

# 0x0ED0-0x0ED9
r1 = 0x0E ( 0xD0 .. 0xD9 );

# 0x0A07-0x0D40
r2 =
	0x0A ( 0x07 .. 0xFF ) |
	( 0x0B | 0x0C ) any |
	0x0D ( 0x00 .. 0x40 );
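
Note that the definitions above split each value into a high byte and a low
byte. In UTF-8 itself, U+0ED0 is encoded as the three bytes E0 BB 90, so on
real UTF-8 input the byte values differ, although the same byte-by-byte,
section-by-section assembly applies. A sketch of how such definitions could
be generated, in Python; the names utf8_sequences and to_ragel are
hypothetical, and the output has not been tested against Ragel:

```python
def utf8_sequences(lo, hi):
    """Yield lists of (min_byte, max_byte) pairs that together match the
    UTF-8 encodings of the code points lo..hi (inclusive)."""
    if lo > hi:
        return
    # Surrogates cannot be encoded in UTF-8: cut them out of the range.
    if lo <= 0xDFFF and hi >= 0xD800:
        if lo < 0xD800:
            yield from utf8_sequences(lo, 0xD7FF)
        if hi > 0xDFFF:
            yield from utf8_sequences(0xE000, hi)
        return
    if hi <= 0x7F:  # single-byte range, nothing to split
        yield [(lo, hi)]
        return
    # Split where the encoded length changes (1, 2, 3, 4 bytes).
    for bound in (0x7F, 0x7FF, 0xFFFF):
        if lo <= bound < hi:
            yield from utf8_sequences(lo, bound)
            yield from utf8_sequences(bound + 1, hi)
            return
    # Split until every continuation byte below a differing byte spans
    # its full 0x80..0xBF range.
    for i in range(1, 4):
        m = (1 << (6 * i)) - 1
        if (lo & ~m) != (hi & ~m):
            if (lo & m) != 0:
                yield from utf8_sequences(lo, lo | m)
                yield from utf8_sequences((lo | m) + 1, hi)
                return
            if (hi & m) != m:
                yield from utf8_sequences(lo, (hi & ~m) - 1)
                yield from utf8_sequences(hi & ~m, hi)
                return
    # Now the range is a simple per-byte box between the two encodings.
    yield list(zip(chr(lo).encode("utf-8"), chr(hi).encode("utf-8")))


def to_ragel(name, lo, hi):
    """Format the sequences as a definition in the notation used above."""
    def unit(a, b):
        return f"0x{a:02X}" if a == b else f"( 0x{a:02X} .. 0x{b:02X} )"
    alts = [" ".join(unit(a, b) for a, b in seq)
            for seq in utf8_sequences(lo, hi)]
    return f"{name} =\n\t" + " |\n\t".join(alts) + ";"


print(to_ragel("r1", 0x0ED0, 0x0ED9))
print(to_ragel("r2", 0x0A07, 0x0D40))
```

For the second range this produces three alternatives, one per section,
which mirrors the hand-written r2 above.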

You're probably aware of this, but I'll mention it just to put it out
there: for a really simple solution you can always process in two
passes. First expand the input to fixed-width characters, then change the
alphabet type to short or int and process with Ragel.
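
A rough sketch of the first pass, in Python, assuming the whole input fits
in memory. The names expand_to_code_points and in_range are hypothetical,
and in_range is only a stand-in for what the Ragel machine would test once
its alphabet type is int:

```python
import array


def expand_to_code_points(utf8_bytes):
    """First pass: decode UTF-8 into one fixed-width integer per character."""
    # decode() raises UnicodeDecodeError on malformed input.
    text = utf8_bytes.decode("utf-8")
    # "L" is an unsigned integer type of at least 32 bits.
    return array.array("L", (ord(ch) for ch in text))


def in_range(cp):
    """Stand-in for the second pass: a plain integer range comparison,
    which is what a range like 0x0ED0 .. 0x0ED9 becomes on a wide alphabet."""
    return 0x0ED0 <= cp <= 0x0ED9


sample = "a\u0ED5b".encode("utf-8")
code_points = expand_to_code_points(sample)
print([hex(cp) for cp in code_points])
print([in_range(cp) for cp in code_points])
```

The cost of this approach is the extra buffer and the extra pass over the
input; the benefit is that each range stays a single transition.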

Regards,
 Adrian


Paul wrote:
> Hello all,
> 
> I want my Ragel state machine to process Unicode text encoded as UTF-8.
> There are some Unicode ranges that I want to transition on, e.g.
> 
> range = [0x0ED0-0x0ED9];
> 
> but I don't know how to express this in a minimal way with an unsigned char 
> alphabet (i.e. I don't think it can be done directly in ragel's expression 
> syntax).
> 
> My brain isn't in the best condition, but the two approaches I have thought 
> of:
> 1.) use a script to write out the set of strings in the range and leave it to 
> ragel to minimise the states (or something like this)
> 2.) use ragel's semantic conditions somehow.. (assemble utf-32 version and use 
> integer comparison)
> 
> But before I attempt either, has anyone had to do anything similar? Or are 
> there any suggestions I could use?
> 
> Thanks, and have a good Easter
> 
>  - Paul
> 
