Problem with long alphabet type

Thu Sep 13 17:49:31 UTC 2007

Thanks for the quick answer!  I just compiled the newest version of
the documentation and I didn't see anything about the long alphabet
type in the section about semantic conditions.  Also, when I try to
build a more complex machine, rlgen-cd crashes using the default code
style -- strangely, it doesn't crash with the goto style, but then the
output is incorrect.

I'd really appreciate this feature, since it would make it a lot
easier to write scanners for Unicode-aware languages.  With the
current tools, I pretty much have two options:

1) Pretend Unicode is 16-bit, and risk offending those unfortunate
ancient Greeks who want to use musical notation in their identifiers,
or

2) Write the machine to deal with encoded streams (e.g. UTF-8) and
hope that I never have to support multiple encodings.
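To see what option 2 involves, here is a minimal UTF-8 encoder sketch in C. The function name utf8_encode is made up for illustration. A code point from the Ancient Greek Musical Notation block, such as U+1D200, does not fit in 16 bits (which is what breaks option 1) and becomes a four-byte sequence, so a byte-oriented machine has to match that whole sequence instead of a single character.

```c
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-8 (hypothetical helper).
   Returns the number of bytes written to out (1..4), or 0 if cp is
   beyond U+10FFFF or is a UTF-16 surrogate. */
int utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes: 1110xxxx + 2 continuation */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 4 bytes: 11110xxx + 3 continuation bytes (beyond 16 bits) */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For example, utf8_encode(0x1D200, buf) returns 4 and fills buf with F0 9D 88 80, while utf8_encode(0xE9, buf) returns 2 with C3 A9.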

It occurs to me that since Unicode doesn't (for now) take up the full
32-bit range, there might be an application for a kind of
"intermediate" alphabet type between short and long -- that way the
storage could still be long, and characters could be allocated in the
range 0x110000..0x7FFFFFFF or something.  Does that sound like it
might work?
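To make the idea concrete, here is a minimal sketch under stated assumptions: real characters are Unicode scalar values (0 through 0x10FFFF) stored in a signed 32-bit long, and each condition combination gets its own full copy of the Unicode range above 0x10FFFF. The names ALPH_MAX, cond_char, and the other helpers are hypothetical, and this is not how Ragel lays out its condition space; it only estimates how much room such an intermediate type would leave.

```c
/* Hypothetical "intermediate" alphabet: characters are 0..0x10FFFF,
   storage is a signed 32-bit long, so 0x110000..0x7FFFFFFF is spare. */
#define ALPH_MAX   0x10FFFFL        /* largest real character */
#define BLOCK_SIZE (ALPH_MAX + 1)   /* size of one copy of the alphabet */
#define STORE_MAX  0x7FFFFFFFL      /* largest positive 32-bit long */

/* Whole alphabet copies that fit above the real characters; each copy
   could stand for one combination of condition values. */
long spare_blocks(void)
{
    return (STORE_MAX - ALPH_MAX) / BLOCK_SIZE;
}

/* Map character c under condition combination k (1-based) into the
   spare range. Returns -1 if c or k is out of range. */
long cond_char(long c, long k)
{
    if (c < 0 || c > ALPH_MAX || k < 1 || k > spare_blocks())
        return -1;
    return k * BLOCK_SIZE + c;
}

/* Recover the underlying character and combination number. */
long cond_char_base(long cc) { return cc % BLOCK_SIZE; }
long cond_char_set(long cc)  { return cc / BLOCK_SIZE; }
```

Under these assumptions there is room for 1926 full copies of the Unicode range, so cond_char(0x41, 1) yields 0x110041. A real implementation would more likely allocate only the characters a machine actually uses rather than whole copies.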

On Sep 13, 8:30 am, "Carlos Antunes" <cmantu... at gmail.com> wrote:
> On 9/13/07, Adrian Thurston <thurs... at cs.queensu.ca> wrote:
>
> > The semantic condition feature requires that Ragel be able to allocate
> > characters from the alphabet space. Ragel uses these allocated
> > characters to express "character c with cond1 true" or "c with cond1
> > false." But internally Ragel uses longs to store characters and so if
> > your alphabet type is long there is no more room left in the alphabet
> > space to allocate from.
>
> Maybe ragel6 "needs" to work with structs as a datatype instead? Maybe
> have a switch to turn on/off the use of longs/structs? Just a
> thought...
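A toy model of the limitation Adrian describes: if "character c with cond1 true/false" keys have to be carved out of the values above the alphabet's largest character, and characters are stored in a long, there is nothing left to carve out once the alphabet type is long itself. The cond_space type and its functions are hypothetical and only mimic the idea; they are not Ragel's internal API.

```c
#include <limits.h>

/* Hypothetical allocator: values up to alph_max are real characters;
   values above it, up to LONG_MAX (the storage type), are free for
   condition characters. */
typedef struct {
    long alph_max;   /* largest real character */
    long next_free;  /* next unallocated value, valid while !exhausted */
    int  exhausted;  /* 1 once no storage values remain */
} cond_space;

void cond_space_init(cond_space *s, long alph_max)
{
    s->alph_max = alph_max;
    s->exhausted = (alph_max == LONG_MAX);   /* long alphabet: no room */
    s->next_free = s->exhausted ? LONG_MAX : alph_max + 1;
}

/* Reserve n consecutive values; store the first in *first.
   Returns 1 on success, 0 if the storage type has too few values left. */
int cond_space_alloc(cond_space *s, long n, long *first)
{
    if (n <= 0 || s->exhausted || LONG_MAX - s->next_free < n - 1)
        return 0;
    *first = s->next_free;
    if (LONG_MAX - s->next_free == n - 1)
        s->exhausted = 1;        /* block ends exactly at LONG_MAX */
    else
        s->next_free += n;       /* cannot overflow: checked above */
    return 1;
}
```

With a 0..255 alphabet, reserving 512 values (every character with cond1 true and with cond1 false) succeeds starting at 256; with alph_max equal to LONG_MAX, even a single value cannot be reserved.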