[ragel-users] Reproducable crash

Adrian Thurston thurs... at cs.queensu.ca
Wed Jan 24 00:00:16 UTC 2007


Hi Alex, it appears that the attachment didn't make it. Could you resend?

> This is obviously misspelled. Besides that, is there a way to avoid
> this? I really am scanning any text that may have optional special
> codes in it at random places, and empty text is perfectly fine by me.

It's not normally necessary to include the empty case in something that's
repeated. In theory it creates an infinite loop: if Ragel actions strictly
adhered to the rules of automata theory, then action foo would be executed an
infinite number of times in between each "bar". But Ragel can only
approximate this behaviour, and the approximation is a little wonky in the
case of repeating the empty word.

main := ( "" %foo | "bar" )+ '\n';

If you look at the Graphviz drawing of the above, you'll see that foo is
executed once on the first and second 'b', but not on any subsequent 'b'.
This is inconsistent at best, so Ragel yields a warning. A better
implementation isn't really worth it because you can usually just factor the
empty case out of the repetition.
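
For illustration, here is a minimal sketch of that factoring. It assumes the
intent is "zero or more bar, then a newline", with foo wanted only when
nothing at all precedes the newline. That reading of the use case is an
assumption, and foo is the same user-defined action as in the example above.

```ragel
# Variant 1: no action needed for empty text. The star already accepts
# zero occurrences, so the empty alternative can simply be dropped.
main := "bar"* '\n';

# Variant 2: keep foo, but attach it to an empty alternative that sits
# outside the repetition, so the empty word is never what gets looped.
main := ( "" %foo | "bar"+ ) '\n';
```

Only one of the two definitions of main would appear in a real
specification. Both accept the same strings as the original expression, and
neither repeats a machine that matches the empty word.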

> I don't want to use the full tokenizing machinery since I have the
> entire buffer available at once, but I would like beginning positions
> of machines.

While providing pointers to machines would be useful, here are my reasons
for opting not to.

Ragel is often used to parse text that arrives in blocks. When pointers get
invalidated by moving to the next block, something needs to be done. There
are many options for dealing with this and they depend on how input arrives.
Rather than make any assumptions, I feel it is better to leave it up to the
user, at least in Ragel. I think automatic pointers are something that
could be done in a higher-level kind of program. This is something I intend
to work on in the future (in the broader context of source transformation
systems).

Also, since it would be wasteful to automatically save pointers for every
named machine, the machines for which pointers are saved would need to be
explicitly declared. They can't be extracted from the host language because
the host language is not parsed. I think doing >{ptr = p;} is not much more
typing than a declaration :)
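
As a rough sketch of that idiom, assuming a C host program that includes
<stdio.h> and declares a const char *ptr alongside the usual p, pe and cs
variables (the machine name starts, the word machine and the emit action
are made up for illustration):

```ragel
%%{
    machine starts;

    # Entering action: runs on the first character of the machine,
    # so p points at the beginning of the match.
    action mark { ptr = p; }

    # Leaving action: runs on the character that follows the match,
    # so the matched text is the half-open range [ptr, p).
    action emit { fwrite( ptr, 1, p - ptr, stdout ); }

    word = [a-z]+ >mark %emit;

    main := word ( ' '+ word )* '\n';
}%%
```

With the entire buffer available at once, ptr stays valid for the whole run.
With block-wise input it would have to be adjusted or the text copied before
the buffer is reused, which is exactly the part left up to the user.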

Regards,
 Adrian



