Multi-char terminators

Colin Fleming colin.flem... at coreproc.com
Thu Oct 5 19:58:05 UTC 2006


Hi all,

As part of parsing XML, I have the following rules for CData sections:

CDStart = '<![CDATA[';

CDEnd = ']]>';

CData = (Char* -- CDEnd) $each_char;

CDSect = CDStart CData CDEnd;

where each_char is a simple action that appends fc to a buffer. The
problem is that the buffer always ends with ]], because the machine
cannot tell whether it should leave the CData machine until it reads
the >. I work around this with the following:

CDSect = CDStart CData CDEnd %trim_content;

where trim_content strips the last two characters of the buffer, but
it's a bit ugly. It also wouldn't work if the terminator were some
variable-length production. Is there any general way to handle this
case?

Cheers,
Colin



More information about the ragel-users mailing list