[ragel-users] Priority issues when doing a street name parser

William Lachance wrlach at gmail.com
Thu Sep 24 19:23:44 UTC 2009

(sorry about the duplicated mail-- stupid gmail sent my message before
it was ready) :)

Hi Adrian,

Thanks for the quick response. Trying to unpack what you're saying--
do you mean I should try to define a scanner (as defined in section
6.3 of the manual) which tries the various possibilities for street
names (in order from most preferred to least)?

So one might have

main := |*


I was also looking a little more at regular expressions, and it seems
that Perl-compatible regular expressions have special quantifiers that
let you control how greedily a match consumes input. For example:


"*? Matches the previous atom zero or more times, while consuming as
little input as possible." This seems like exactly what I need (a quick
test indicates it gives the desired behaviour). Would it not be
possible for Ragel to do this sort of thing?
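
To make the quick test concrete, here is a minimal sketch using Python's
re module, which supports the same lazy "*?" quantifier. It is not Ragel
and not my real grammar: the SUFFIX and DIRECTION lists are short,
hypothetical stand-ins for the long alternations, and build_pattern and
parse_address are names made up for this illustration.

```python
import re

# Hypothetical, abbreviated lists; the real suffix/direction alternations are much longer.
SUFFIX = r"(?:street|st|avenue|ave|road|rd)"
DIRECTION = r"(?:nw|ne|sw|se|north|south|east|west|n|s|e|w)"


def build_pattern(quantifier):
    """Build the address pattern; `quantifier` is "*" (greedy) or "*?" (lazy)."""
    return re.compile(
        r"""
        ^(?:(?P<number>\d+)[a-z]?\s+)?         # optional street number plus optional letter
        (?P<name>[a-z]+(?:\s+[a-z]+){q})       # street name: one or more words
        (?:\s+(?P<suffix>{suffix}))?           # optional suffix
        (?:\s+(?P<direction>{direction}))?     # optional direction
        $
        """.format(q=quantifier, suffix=SUFFIX, direction=DIRECTION),
        re.IGNORECASE | re.VERBOSE,
    )


GREEDY = build_pattern("*")   # name swallows every remaining word
LAZY = build_pattern("*?")    # name takes as few words as the overall match allows


def parse_address(text, pattern=LAZY):
    """Return a dict of address components, or None if the text does not match."""
    match = pattern.match(text.strip())
    return match.groupdict() if match else None
```

With the lazy pattern, "Bella Vista Avenue NW" comes out as name
"Bella Vista", suffix "Avenue", direction "NW". With the greedy pattern
the name swallows the whole string and the suffix and direction are left
empty, which is essentially the problem I am seeing with my Ragel machine.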


2009/9/23 Adrian Thurston <thurston at complang.org>:
> Hi William,
> I think what you need is a traditional lexer. See section 6.3 of the manual.
> -Adrian
> William Lachance wrote:
>> Hi,
>> I'm trying to construct a parser for street addresses using Ragel.
>> That is to say, a machine that will take a free form address like
>> "5553 Barrington Street NW" and parse out the individual components
>> (street number, name, suffix, direction). Everything was going
>> swimmingly until I started to try to add support for street names with
>> multiple tokens in them (e.g. "Bella Vista Avenue NW")
>> Right now my main machine looks like this:
>> streetNumber = (digit+ >getStartStr %endNumber);
>> streetName = (alpha+ (space+ alpha+)*) >getStartStr %endName;
>> suffixFull = space+ suffix;
>> dirFull = space+ direction;
>> main := (streetNumber alpha? space+)? streetName suffixFull? dirFull?;
>> The suffix and dir expressions are really long and boring
>> concatenations like this:
>> directionWest = ("w"i|"west"i) >getStartStr %endDirWest;
>> Anyway, the problem with this simple regular expression is that it
>> doesn't give up on parsing the streetName when it begins parsing the
>> direction and suffix. So in the above example, it will correctly parse
>> "Bella Vista", but then overwrite it with "Avenue", and later "NW". I
>> thought that perhaps adding a few ":>>"'s (to stop the processing of
>> the streetname when suffixes and directions appear) would help:
>> main := (streetNumber alpha? space+)? streetName :>> suffixFull? :>> dirFull? 0;
>> Unfortunately, that seems to have the side effect of terminating
>> parsing of the street name prematurely (bringing us back to square
>> one).
>> It _seems_ like what I'm doing should be straightforward. Basically
>> the rule should be: "keep on parsing the street until you find a token
>> that unambiguously matches a suffix and/or direction; at that point,
>> stop, only keeping the previous tokens". Surely there's a way of
>> expressing that in Ragel?
> _______________________________________________
> ragel-users mailing list
> ragel-users at complang.org
> http://www.complang.org/mailman/listinfo/ragel-users

William Lachance
wrlach at gmail.com
