[ragel-users] Re: Simple URL parser

Adrian Thurston thurs... at cs.queensu.ca
Fri May 23 18:24:05 UTC 2008


Hey

hsanson wrote:
> 1 - If you look at my code (attached), you will see that I parse the scheme,
> hostname, port and path parts of the URL using almost identical actions.
> The only difference is the variables used to store the parsed strings. Is
> there a way to pass variables to the actions, so that I can write a single
> function and pass the variable to store into as a parameter?

Not currently. This is the single most requested feature for Ragel, and
I'm hoping to do something about it soon.

> 2 - For some reason the last part is never parsed. If I pass a URL like
> "http://hostname", the parser gives me the scheme "http" but not the
> hostname. If I pass a URL like "http://hostname:8080", the parser outputs
> the scheme and hostname but not the port. And if I provide a path, as in
> "http://hostname:8080/file.html", the parser gives me everything except the
> path. As you can see, the last string never gets parsed, and I don't know
> why.

If you're using 5.X (which you seem to be) then you should do one of the
following:

1. Also embed the *_write actions as EOF actions and add a "write eof"
statement after all the processing.

2. Send an additional character (such as 0) after the URL.

If you're using 6.X then just add "char *eof = pe;" after the
declarations of p and pe.
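
To illustrate why the last field goes missing, here is a minimal hand-written
C sketch. It is not Ragel output and not the attached grammar; the struct, the
function names and the flush_at_eof flag are all hypothetical. It assumes each
field is stored only when the character that ends it is seen, which is how an
action that runs on leaving a sub-machine behaves. The final field has no such
character, so it is stored only if something extra runs at end of input. That
extra step is roughly what the EOF actions (5.X) or the eof variable (6.X)
provide.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result holder; the field names are illustrative only. */
struct url_parts {
    char scheme[16];
    char host[64];
    char port[8];
    char path[128];
};

/* Copy the half-open range [start, end) into dst, truncating if needed. */
static void store(char *dst, size_t cap, const char *start, const char *end)
{
    size_t n = (size_t)(end - start);
    assert(cap > 0);
    if (n >= cap)
        n = cap - 1;
    memcpy(dst, start, n);
    dst[n] = '\0';
}

/*
 * Hand-written stand-in for a generated scanner. A field is stored only when
 * the character that terminates it is seen. If flush_at_eof is nonzero, the
 * field still open at the end of input is stored as well.
 */
static void parse_url(const char *p, const char *pe, int flush_at_eof,
                      struct url_parts *out)
{
    enum { SCHEME, HOST, PORT, PATH } field = SCHEME;
    const char *mark = p;   /* start of the field currently being read */

    memset(out, 0, sizeof *out);
    for (; p < pe; p++) {
        switch (field) {
        case SCHEME:
            if (*p == ':' && pe - p >= 3 && p[1] == '/' && p[2] == '/') {
                store(out->scheme, sizeof out->scheme, mark, p);
                p += 2;                 /* skip the two slashes */
                mark = p + 1;
                field = HOST;
            }
            break;
        case HOST:
            if (*p == ':' || *p == '/') {
                store(out->host, sizeof out->host, mark, p);
                field = (*p == ':') ? PORT : PATH;
                mark = (*p == ':') ? p + 1 : p;  /* path keeps its slash */
            }
            break;
        case PORT:
            if (*p == '/') {
                store(out->port, sizeof out->port, mark, p);
                field = PATH;
                mark = p;
            }
            break;
        case PATH:
            break;                      /* path runs to the end of input */
        }
    }

    if (flush_at_eof) {                 /* the "EOF action" step */
        switch (field) {
        case SCHEME: break;             /* no "://" seen, nothing to store */
        case HOST: store(out->host, sizeof out->host, mark, pe); break;
        case PORT: store(out->port, sizeof out->port, mark, pe); break;
        case PATH: store(out->path, sizeof out->path, mark, pe); break;
        }
    }
}
```

Calling parse_url on "http://hostname" with flush_at_eof set to 0 leaves the
host empty, matching the symptom described above; with flush_at_eof set to 1
the host is filled in.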

> 3 - Is there a better/cleaner way to do this?

Looks good to me. One thing to be aware of is that RFCs are sometimes
ambiguous, so if you work directly from them you can run into problems.
See chapter four of the manual for more info on resolving ambiguities.

-Adrian


