[ragel-users] Simple URL parser

Adrian Thurston thurs... at cs.queensu.ca
Fri May 23 18:07:43 UTC 2008


Hey Horacio,

The host action is executed on every character of the host name because
the finishing operator is used on a repeated machine. Finishing actions
are executed every time the state machine moves into a final state,
which is every character in the case of a single character that is
repeated. Use % instead.

-Adrian

hsanson wrote:
> To learn how to use Ragel I am implementing a simple URL parser that
> receives something like "http://www.ragel.com:8080/file.txt" and
> returns each part (scheme, hostname, port, path) as strings. As I
> understand doing this with Ragel should be a breeze.
> 
> Still there is something I am not getting right and would like some
> advice, see code below:
> 
> The scheme part seems to work so I assume my understanding of Ragel is
> not that bad. The problem is with the hostname and port parts. The
> hostname action gets called for each character on the hostname, that
> is not the intended behavior and the port action never gets called.
> 
> Any tips to take me back on track would be greatly appreciated.
> 
> Horacio
> 
> //###################################
> #include <string.h>
> #include <stdio.h>
> #include <stdlib.h>
> 
> typedef struct {
>     char *scheme;
>     char *hostname;
>     char *service;
>     char *path;
>     char *uri;
> } suj_url;
> 
> %%{
>     machine uri_parser;
> 
> # Actions
>     action mark_start {
>         start = fpc;
>         printf("Mark start at %c\n", fc);
>     }
> 
>     action scheme {
>         size_t len = fpc - start + 1;
>         url->scheme = calloc(len,sizeof(char));
>         strncpy(url->scheme, start, len);
>         url->scheme[len]='\0';
>         printf("scheme: %s\n",url->scheme);
>     }
> 
>     action host {
>         size_t len = fpc - start + 1;
>         url->hostname = calloc(len,sizeof(char));
>         strncpy(url->hostname, start, len);
>         url->hostname[len]='\0';
>         printf("host: %s\n",url->hostname);
>     }
> 
>     action port {
>         size_t len = fpc - start + 1;
>         url->service = calloc(len,sizeof(char));
>         strncpy(url->service, start, len);
>         url->service[len]='\0';
>         printf("service: %s\n",url->service);
>     }
> # Grammar
>     escaped = ("%" xdigit xdigit);
>     scheme = ("http"i | "rtsp"i | "rtp"i) >mark_start @scheme;
>     port   = (":" digit+) >mark_start %port;
>     host   = (any* -- ("/" | ":")) >mark_start @host;
> 
>     uri = (scheme "://" host  port ) . '\0';
> 
> # Main
>     main := uri;
> 
> }%%
> 
> %%write data;
> 
> suj_url * suj_url_new(char *uri)
> {
>     suj_url *url;
>     char *start;
>     char *end;
> 
>     int cs;
>     %% write init;
> 
>     char *p = uri;
>     char *pe = p + strlen(uri);
> 
>     url = calloc(1,sizeof(url));
>     url->uri = calloc(strlen(uri),sizeof(char));
>     strncpy(url->uri,uri, strlen(uri));
> 
>     %% write exec;
> 
>     return url;
> }
> 
> int main(int argc, char **argv)
> {
>     suj_url *url;
>     url = suj_url_new("rtp://www.ragel.org:8080");
> }
> 
> 
> > 
> 



More information about the ragel-users mailing list