[ragel-users] Re: Problem with long alphabet type

Adrian Thurston thurs... at cs.queensu.ca
Thu Sep 13 18:46:19 UTC 2007


Hi, I just committed the necessary changes.

Could you post the code which causes rlgen-cd to crash?

There is a data structure in common.h called key which encapsulates the character type. The core of Ragel uses this structure heavily, so the interface to it can't really change, but the type inside it can. I once changed this to long long to get a bigger key, but Ragel slowed down considerably. Maybe a good bignum library would work well. But we also need to consider how the backend changes in response to a larger key type. Also, the abstraction may be bypassed in places, so I would need to audit the code.
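
For readers unfamiliar with the idea, here is a minimal sketch of a key wrapper of the kind described above. It is not Ragel's actual common.h; the names KeyStorage, Key and getVal are hypothetical and chosen only for illustration. The point it shows is that callers only see the interface, so the stored type can be swapped in one place.

```cpp
#include <cassert>

// Hypothetical sketch, NOT Ragel's real common.h.
// Assumption: this typedef is the single place where the storage
// type of a key is chosen (long here, long long would be the
// "bigger key" experiment mentioned in the message).
typedef long KeyStorage;

class Key {
public:
    Key() : value(0) {}
    explicit Key(KeyStorage v) : value(v) {}

    // Read access to the raw value, e.g. for emitting tables.
    KeyStorage getVal() const { return value; }

    // Callers compare keys through the interface only, so they do
    // not depend on what KeyStorage actually is.
    friend bool operator<(const Key &a, const Key &b) { return a.value < b.value; }
    friend bool operator==(const Key &a, const Key &b) { return a.value == b.value; }

private:
    KeyStorage value;
};
```

Code that reaches into the raw value directly instead of going through such an interface is the kind of bypass that an audit would have to find.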

Adrian

-----Original Message-----
From: Elmin <matty.noble at gmail.com>

Date: Thu, 13 Sep 2007 10:49:31 
To: ragel-users <ragel-users at googlegroups.com>
Subject: [ragel-users] Re: Problem with long alphabet type



Thanks for the quick answer!  I just compiled the newest version of
the documentation and I didn't see anything about the long alphabet
type in the section about semantic conditions.  Also, when I try to
build a more complex machine, rlgen-cd crashes using the default code
style -- strangely, it works with the goto style, but then the output
is incorrect.

I'd really appreciate this feature, since it would make it a lot
easier to write scanners for Unicode-aware languages.  With the
current tools, I pretty much have two options:

1) Pretend Unicode is 16-bit, and risk offending those unfortunate
ancient Greeks who want to use musical notation in their identifiers,
or

2) Write the machine to deal with encoded streams (e.g. UTF-8) and
hope that I never have to support multiple encodings.

It occurs to me that since Unicode doesn't (for now) take up the full
32-bit range, there might be an application for a kind of
"intermediate" alphabet type between short and long -- that way the
storage could still be long, and characters could be allocated in the
range 0x10FFFF..0x8FFFFFFF or something.  Does that sound like it
might work?
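
The "intermediate" alphabet idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not how Ragel allocates condition characters: it assumes keys are stored in a signed 32-bit value, that the user-visible alphabet is exactly the Unicode code points, and the names encodeCond and decodeCond are hypothetical. Note that 0x10FFFF is itself a valid code point, so the sketch starts the private range at 0x110000, and it caps the range at 0x7FFFFFFF, the largest signed 32-bit value.

```cpp
#include <cassert>
#include <cstdint>

using std::int64_t;

// Assumptions (not taken from Ragel): Unicode alphabet, signed 32-bit storage.
const int64_t kAlphabetSize = 0x110000;      // code points 0..0x10FFFF
const int64_t kCondBase     = kAlphabetSize; // first value above Unicode
const int64_t kStorageMax   = 0x7FFFFFFF;    // largest signed 32-bit value

// Map (code point, condition bits) to a private key above the Unicode
// range. Each distinct condBits value gets its own copy of the alphabet.
// Returns -1 if the inputs are invalid or the key does not fit in storage.
int64_t encodeCond(int64_t codePoint, int64_t condBits) {
    if (codePoint < 0 || codePoint >= kAlphabetSize) return -1;
    if (condBits < 0 || condBits > kStorageMax / kAlphabetSize) return -1;
    int64_t key = kCondBase + condBits * kAlphabetSize + codePoint;
    return key <= kStorageMax ? key : -1;
}

// Inverse of encodeCond for keys at or above kCondBase.
void decodeCond(int64_t key, int64_t &codePoint, int64_t &condBits) {
    int64_t offset = key - kCondBase;
    condBits  = offset / kAlphabetSize;
    codePoint = offset % kAlphabetSize;
}
```

With these numbers there is room for many copies of the alphabet above the Unicode range, whereas a full-width long alphabet leaves no values free to allocate from.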

On Sep 13, 8:30 am, "Carlos Antunes" <cmantu... at gmail.com> wrote:
> On 9/13/07, Adrian Thurston <thurs... at cs.queensu.ca> wrote:
>
>
>
> > The semantic condition feature requires that Ragel be able to allocate
> > characters from the alphabet space. Ragel uses these allocated
> > characters to express "character c with cond1 true" or "c with cond1
> > false." But internally Ragel uses longs to store characters and so if
> > your alphabet type is long there is no more room left in the alphabet
> > space to allocate from.
>
> Maybe ragel6 "needs" to work with structs as a datatype instead? Maybe
> have a switch to turn on/off the use of longs/structs? Just a
> thought...





