[ragel-users] short strings, including some which are 1-letter prefixes of other

Andrew Dalke dalke at dalkescientific.com
Mon Dec 7 17:01:13 UTC 2009


Hi all,

I'm updating a parser I wrote a couple of years ago, which parses a molecular format called SMILES. Molecules contain atoms and bonds, and each atom includes its element name as an abbreviation (the element symbol).

Consider C and Cl as two such abbreviations. One is a prefix of the other. I had

  is_raw_atom = (
      #
      'B' % raw_atom_B_5_action |
      'C' % raw_atom_C_6_action |
      'Cl' % raw_atom_Cl_17_action |
        ...

and that worked for what I was doing before, but now I'm trying to get error handling to work. Suppose the input is "CQ". I want raw_atom_C_6_action to run and then an error to be reported.

Ragel doesn't do that. It reports the error while still in the state after the 'C', and raw_atom_C_6_action never runs, because the machine never transitions out of that final state.
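To make that failure mode concrete, here is a minimal Python simulation. It is not Ragel-generated code. It is a hand-built DFA for a hypothetical `atom+` machine over just B, C and Cl, where each alternative carries only a leaving action. The string "C" stands in for raw_atom_C_6_action. The state names and the `run` helper are invented for illustration. The sketch assumes the leaving action rides on the transitions that leave the final state to start the next atom, and it ignores end-of-input handling.

```python
# Hypothetical hand-built DFA for:  atom+  where
#   atom = 'B' %B | 'C' %C | 'Cl' %Cl
# Each entry maps a character to (next_state, actions fired on that transition).
# A leaving action like %C sits on the transitions that leave the final
# state reached after 'C' and begin the next atom.
TRANSITIONS = {
    "start": {"B": ("after_B", []), "C": ("after_C", [])},
    "after_B": {"B": ("after_B", ["B"]), "C": ("after_C", ["B"])},
    "after_C": {
        # 'l' continues the longer symbol 'Cl', so %C does not fire here.
        "l": ("after_Cl", []),
        "B": ("after_B", ["C"]),
        "C": ("after_C", ["C"]),
    },
    "after_Cl": {"B": ("after_B", ["Cl"]), "C": ("after_C", ["Cl"])},
}


def run(data):
    """Return (actions fired, index of the failing character or None)."""
    state = "start"
    fired = []
    for i, ch in enumerate(data):
        trans = TRANSITIONS[state].get(ch)
        if trans is None:
            # No transition out of `state`: the machine errors here, and any
            # leaving action pending on `state` is never executed.
            return fired, i
        state, actions = trans
        fired.extend(actions)
    return fired, None


print(run("CCl"))  # (['C'], None): C fires when the second 'C' starts a new atom
print(run("CQ"))   # ([], 1): error at 'Q', and the C action never fired
```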

What I did in my current update (in addition to changing the action names) is this:

  aliphatic_organic = (
    'B'  %is_aliphatic_B  %err(is_aliphatic_B)  |
    'C'  %is_aliphatic_C  %err(is_aliphatic_C)  |
    'N'  >is_aliphatic_N |
       ...
    'Cl' %is_aliphatic_Cl %err(is_aliphatic_Cl) |
    'Br' @is_aliphatic_Br |
       ...
  );
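The effect of adding `%err(...)` next to each leaving action can be sketched as a small Python simulation. It is not what Ragel generates. It assumes `%err(...)` acts as an error action on the final states: it runs when the next character has no transition out of that final state. Only B, C and Cl are modelled, and all names (`TRANSITIONS`, `FINAL_ERROR_ACTIONS`, `run`) are hypothetical.

```python
# Hypothetical hand-built DFA for:  atom+  where
#   atom = 'B' %B %err(B) | 'C' %C %err(C) | 'Cl' %Cl %err(Cl)
# Leaving actions sit on transitions that leave a final state to start the next atom.
TRANSITIONS = {
    "start": {"B": ("after_B", []), "C": ("after_C", [])},
    "after_B": {"B": ("after_B", ["B"]), "C": ("after_C", ["B"])},
    "after_C": {
        "l": ("after_Cl", []),  # still inside 'Cl': no action yet
        "B": ("after_B", ["C"]),
        "C": ("after_C", ["C"]),
    },
    "after_Cl": {"B": ("after_B", ["Cl"]), "C": ("after_C", ["Cl"])},
}

# Assumed model of %err(...): the action to run on the error transition
# out of each final state.
FINAL_ERROR_ACTIONS = {"after_B": "B", "after_C": "C", "after_Cl": "Cl"}


def run(data):
    """Return (actions fired, index of the failing character or None)."""
    state = "start"
    fired = []
    for i, ch in enumerate(data):
        trans = TRANSITIONS[state].get(ch)
        if trans is None:
            # Error transition: run the final-state error action, if this
            # state has one, before reporting the error at this character.
            if state in FINAL_ERROR_ACTIONS:
                fired.append(FINAL_ERROR_ACTIONS[state])
            return fired, i
        state, actions = trans
        fired.extend(actions)
    return fired, None


print(run("CQ"))   # (['C'], 1): C fires, then the error at 'Q'
print(run("ClQ"))  # (['Cl'], 2)
print(run("Q"))    # ([], 0): 'start' is not final, so nothing fires
```

In this model the action for a prefix symbol such as C runs on whichever comes first: a transition that starts the next atom, or the error transition out of its final state. Both paths fire the same action.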


It works, but is it correct and proper? I did see the |* ... *| scanner construct, which is designed for things like this, but I didn't want the backtracking.

Best regards,


				Andrew
				dalke at dalkescientific.com