[ragel-users] Priority issues when doing a street name parser

William Lachance wrlach at gmail.com
Thu Sep 24 19:20:25 UTC 2009


Hi Adrian,

Trying to unpack what you're saying: do you mean I should define a
scanner (as described in section 6.3 of the manual) which tries the
various possibilities for street names, in order from most preferred
to least?

So one might have

main := |*
   streetNameWithSuffixAndDirection;
   streetNameWithDirection;




2009/9/23 Adrian Thurston <thurston at complang.org>:
> Hi William,
>
> I think what you need is a traditional lexer. See section 6.3 of the manual.
>
> -Adrian
>
> William Lachance wrote:
>> Hi,
>>
>> I'm trying to construct a parser for street addresses using Ragel.
>> That is to say, a machine that will take a free form address like
>> "5553 Barrington Street NW" and parse out the individual components
>> (street number, name, suffix, direction). Everything was going
>> swimmingly until I started to try to add support for street names with
>> multiple tokens in them (e.g. "Bella Vista Avenue NW")
>>
>> Right now my main machine looks like this:
>>
>> streetNumber = (digit+ >getStartStr %endNumber);
>> streetName = (alpha+ (space+ alpha+)*) >getStartStr %endName;
>> suffixFull = space+ suffix;
>> dirFull = space+ direction;
>> main := (streetNumber alpha? space+)? streetName suffixFull? dirFull?;
>>
>> The suffix and dir expressions are really long and boring
>> concatenations like this:
>>
>> directionWest = ("w"i|"west"i) >getStartStr %endDirWest;
>>
>> Anyway, the problem with this simple regular expression is that it
>> doesn't give up on parsing the streetName when it begins parsing the
>> direction and suffix. So in the above example, it will correctly parse
>> "Bella Vista", but then overwrite it with "Avenue", and later "NW". I
>> thought that perhaps adding a few ":>>"'s (to stop the processing of
>> the streetname when suffixes and directions appear) would help:
>>
>> main := (streetNumber alpha? space+)? streetName :>> suffixFull? :>> dirFull? 0;
>>
>> Unfortunately, that seems to have the side effect of terminating
>> parsing of the street name prematurely (bringing us back to square
>> one).
>>
>> It _seems_ like what I'm doing should be straightforward. Basically
>> the rule should be: "keep on parsing the street until you find a token
>> that unambiguously matches a suffix and/or direction; at that point,
>> stop, only keeping the previous tokens". Surely there's a way of
>> expressing that in Ragel?
>>
>
> _______________________________________________
> ragel-users mailing list
> ragel-users at complang.org
> http://www.complang.org/mailman/listinfo/ragel-users
>
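
The rule quoted above ("keep on parsing the street until you find a
token that unambiguously matches a suffix and/or direction; at that
point, stop, only keeping the previous tokens") can be sketched outside
Ragel to pin down the intended behaviour. This is plain Python, not a
Ragel scanner, and it takes a shortcut: it splits on whitespace and
peels the optional direction and suffix off the end instead of scanning
left to right. The SUFFIXES and DIRECTIONS sets and the parse_address
name are hypothetical stand-ins for the much longer lists in the real
grammar.

```python
import re

# Hypothetical, deliberately tiny vocabularies; the suffix and direction
# expressions in the original grammar are much longer.
SUFFIXES = {"street", "st", "avenue", "ave", "road", "rd"}
DIRECTIONS = {"n", "s", "e", "w", "ne", "nw", "se", "sw",
              "north", "south", "east", "west"}

# Digits followed by an optional single letter, mirroring
# "streetNumber alpha?" in the grammar; only the digits are captured.
NUMBER = re.compile(r"(\d+)[A-Za-z]?")


def parse_address(text):
    """Split a free-form address into number, name, suffix and direction."""
    tokens = text.split()
    result = {"number": None, "name": None, "suffix": None, "direction": None}

    # Optional leading street number, e.g. "5553" or "12b".
    if len(tokens) > 1:
        match = NUMBER.fullmatch(tokens[0])
        if match:
            result["number"] = match.group(1)
            tokens.pop(0)

    # Peel the optional direction, then the optional suffix, off the end,
    # always leaving at least one token for the street name.
    if len(tokens) > 1 and tokens[-1].lower() in DIRECTIONS:
        result["direction"] = tokens.pop()
    if len(tokens) > 1 and tokens[-1].lower() in SUFFIXES:
        result["suffix"] = tokens.pop()

    result["name"] = " ".join(tokens)
    return result


if __name__ == "__main__":
    print(parse_address("5553 Barrington Street NW"))
    print(parse_address("Bella Vista Avenue NW"))
```

Working from the end keeps a multi-token name such as "Bella Vista"
intact, because only the last one or two tokens are ever considered as
suffix or direction. Whether the same effect is best expressed in Ragel
with a scanner or with priorities is exactly the open question in this
thread.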



-- 
William Lachance
wrlach at gmail.com



