[ragel-users] Failed to convert URL parser regular expression to Ragel

徐亮 lxu4net at gmail.com
Mon Jan 9 11:02:11 UTC 2012


I have posted a question about this on StackOverflow:
<http://stackoverflow.com/questions/8784903/failed-to-convert-url-parser-regular-expression-to-ragel>

I found a URL-parsing regular expression in RFC 2396 and RFC 3986:

    ^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
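
For comparison, the regular expression can be exercised directly with `std::regex`. This is only a minimal sketch: the `UriParts` struct and the `split_uri` function are hypothetical names introduced for illustration, and the group numbers (2, 4, 5, 7, 9) follow from counting the parentheses in the expression above.

```cpp
#include <regex>
#include <string>

// Hypothetical container for the five components captured by the regex.
struct UriParts {
  std::string scheme;
  std::string authority;
  std::string path;
  std::string query;
  std::string fragment;
};

// Hypothetical helper: splits a URI reference with the regular expression
// quoted above. Slashes need no escaping in ECMAScript syntax.
UriParts split_uri(const std::string& uri) {
  static const std::regex re(
      R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");

  UriParts parts;
  std::smatch m;
  if (std::regex_match(uri, m, re)) {
    parts.scheme    = m[2].str();  // text before the first ":"
    parts.authority = m[4].str();  // text after "//"
    parts.path      = m[5].str();  // everything up to "?" or "#"
    parts.query     = m[7].str();  // text after "?"
    parts.fragment  = m[9].str();  // text after "#"
  }
  return parts;
}
```

Under these assumptions, `split_uri("/pub/ietf/uri")` leaves the scheme and authority empty and puts the whole input in `path`, which is the result I would expect the Ragel machine to report for a path-only input.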

I converted it to Ragel:

    %%{
      # RFC 3986 URI Generic Syntax (January 2005)
      machine url_parser;

      action pchar         { printf("%c", fc); }
      action scheme        { printf("scheme\n"); }
      action scheme_end    { printf("\nscheme_end\n"); }
      action authority     { printf("authority\n"); }
      action authority_end { printf("\nauthority_end\n"); }
      action path          { printf("path\n"); }
      action path_end      { printf("\npath_end\n"); }
      action query         { printf("query\n"); }
      action query_end     { printf("\nquery_end\n"); }
      action fragment      { printf("fragment\n"); }
      action fragment_end  { printf("\nfragment_end\n"); }

      scheme    = (any - [:/?#])+ >scheme    $pchar %scheme_end ;
      authority = (any - [/?#])*  >authority $pchar %authority_end ;
      path      = (any - [?#])*   >path      $pchar %path_end ;
      query     = (any - [#])*    >query     $pchar %query_end ;
      fragment  = (any)*          >fragment  $pchar %fragment_end ;
      main     := (( scheme ":" )?) <: (( "//" authority )?) <: path
                  ( "?" query )? ( "#" fragment )?;
    }%%

    #include <cstdio>
    #include <cstdlib>
    #include <string>

    /** Data **/
    %% write data;

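    int main(int argc, char **argv) {
      // Guard against a missing argument before reading argv[1].
      if (argc < 2) {
        fprintf(stderr, "usage: %s <uri>\n", argv[0]);
        return EXIT_FAILURE;
      }
      std::string str(argv[1]);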
      char const* p = str.c_str();
      char const* pe = p + str.size();
      char const* eof = pe;
      int cs = 0;

      %% write init;
      %% write exec;

      return p - str.c_str();
    }

It works when I input an absolute URI:

    liangxu@dev64:~$ ./uri_test "http://www.ics.uci.edu/pub/ietf/uri/?c=www&rot=1&e=%20%20"
    scheme
    http
    scheme_end
    authority
    www.ics.uci.edu
    authority_end
    path
    /pub/ietf/uri/
    path_end
    query
    c=www&rot=1&e=%20%20
    query_end

It also succeeds when I input only an authority and a path:

    liangxu@dev64:~$ ./uri_test "//www.ics.uci.edu/pub/ietf/uri/?c=www&rot=1&e=%20%20"
    authority
    www.ics.uci.edu
    authority_end
    path
    /pub/ietf/uri/
    path_end
    query
    c=www&rot=1&e=%20%20
    query_end

But it fails when I input only a path:

    liangxu@dev64:~$ ./uri_test "/pub/ietf/uri"

What's wrong?

-- 
Liang Xu