[ragel-users] A problem with ragel

Pei Deng dpcmain at gmail.com
Wed Mar 7 20:03:12 UTC 2012


Hi all,

I am trying to use Ragel to parse the URI ABNF from RFC 3986, but I have
run into an error.

First, I took the ABNF from the RFC and converted it to Ragel form, as
below:
#-----------------------------------------------------------------------------------------------------
%%{
machine URI;

action ActionStartScheme { StartScheme(fpc); }
action ActionEndScheme { EndScheme(fpc); }

action ActionStartAuthority { StartAuthority(fpc); }
action ActionEndAuthority { EndAuthority(fpc); }

action ActionStartUser { StartUser(fpc); }
action ActionEndUser { EndUser(fpc); }

action ActionStartHost { StartHost(fpc); }
action ActionEndHost { EndHost(fpc); }

action ActionStartPort { StartPort(fpc); }
action ActionEndPort { EndPort(fpc); }

action ActionStartPath { StartPath(fpc); }
action ActionEndPath { EndPath(fpc); }

action ActionStartQuery { StartQuery(fpc); }
action ActionEndQuery { EndQuery(fpc); }

action ActionStartFragment { StartFragment(fpc); }
action ActionEndFragment { EndFragment(fpc); }

action All { printf("%c", fc); }

#ALPHA = %x41-5A / %x61-7A
ALPHA = 0x41..0x5A | 0x61..0x7A;

#CR = %x0D
CR = 0x0D;

#DIGIT = %x30-39
DIGIT = 0x30..0x39;

#DQUOTE = %x22
DQUOTE = 0x22;

#HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
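# ABNF string literals are case-insensitive, so lowercase a-f must match too.
HEXDIG = DIGIT | "A" | "B" | "C" | "D" | "E" | "F"
       | "a" | "b" | "c" | "d" | "e" | "f";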

#LF = %x0A
LF = 0x0A;

#SP = %x20
SP = 0x20;

#unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
unreserved = ALPHA | DIGIT | "-" | "." | "_" | "~";

#dec-octet = ( DIGIT ) / ( %x31-39 DIGIT ) / ( "1" 2DIGIT ) / ( "2" %x30-34
# DIGIT ) / ( "25" %x30-35 )
dec_octet = ( DIGIT ) | ( 0x31..0x39 DIGIT ) | ( "1" DIGIT{2} ) | ( "2"
0x30..0x34 DIGIT ) | ( "25" 0x30..0x35 );

#pct-encoded = "%" HEXDIG HEXDIG
pct_encoded = "%" HEXDIG HEXDIG;

#gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
gen_delims = ":" | "/" | "?" | "#" | "[" | "]" | "@";

#sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" /
# "="
sub_delims = "!" | "$" | "&" | "'" | "(" | ")" | "*" | "+" | "," | ";" |
"=";

#reserved = gen-delims / sub-delims
reserved = gen_delims | sub_delims;

#pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
pchar = unreserved | pct_encoded | sub_delims | ":" | "@";

#query = *( pchar / "/" / "?" )
query = ( ( pchar | "/" | "?" )* );

#fragment = *( pchar / "/" / "?" )
fragment = ( pchar | "/" | "?" )*;

#segment = *pchar
segment = pchar*;

#segment-nz = 1*pchar
segment_nz = pchar+;

#segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
segment_nz_nc = ( unreserved | pct_encoded | sub_delims | "@" )+;

#path-empty = 0<pchar>
path_empty = "";

#path-noscheme = segment-nz-nc *( "/" segment )
path_noscheme = ( segment_nz_nc ( "/" segment )* ) $All;

#path-rootless = segment-nz *( "/" segment )
path_rootless = ( segment_nz ( "/" segment )* ) $All;

#path-absolute = "/" [ segment-nz *( "/" segment ) ]
path_absolute = ( "/" ( segment_nz ( "/" segment )* )? ) $All;

#path-abempty = *( "/" segment )
path_abempty = ( ( "/" segment )* ) $All;

#path = path-abempty / path-absolute / path-noscheme / path-rootless /
# path-empty
path = path_abempty | path_absolute | path_noscheme | path_rootless |
path_empty;

#reg-name = *( unreserved / pct-encoded / sub-delims )
reg_name = ( unreserved | pct_encoded | sub_delims )*;

#IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
IPv4address = dec_octet "." dec_octet "." dec_octet "." dec_octet;

#h16 = 1*4HEXDIG
h16 = HEXDIG{1,4};

#ls32 = ( h16 ":" h16 ) / IPv4address
ls32 = ( h16 ":" h16 ) | IPv4address;

#IPv6address = ( 6( h16 ":" ) ls32 ) / ( "::" 5( h16 ":" ) ls32 ) / ( [ h16
# ] "::" 4( h16 ":" ) ls32 ) / ( [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
# ) / ( [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32 ) / ( [ *3( h16 ":" )
# h16 ] "::" h16 ":" ls32 ) / ( [ *4( h16 ":" ) h16 ] "::" ls32 ) / ( [ *5(
# h16 ":" ) h16 ] "::" h16 ) / ( [ *6( h16 ":" ) h16 ] "::" )
IPv6address = ( ( h16 ":" ){6} ls32 ) | ( "::" ( h16 ":" ){5} ls32 ) | (
h16? "::" ( h16 ":" ){4} ls32 ) | ( ( ( h16 ":" ){0,1} h16 )? "::" ( h16
":" ){3} ls32 ) | ( ( ( h16 ":" ){0,2} h16 )? "::" ( h16 ":" ){2} ls32 ) |
( ( ( h16 ":" ){0,3} h16 )? "::" h16 ":" ls32 ) | ( ( ( h16 ":" ){0,4} h16
)? "::" ls32 ) | ( ( ( h16 ":" ){0,5} h16 )? "::" h16 ) | ( ( ( h16 ":"
){0,6} h16 )? "::" );

#IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
IPvFuture = "v" HEXDIG+ "." ( unreserved | sub_delims | ":" )+;

#IP-literal = "[" ( IPv6address / IPvFuture  ) "]"
IP_literal = "[" ( IPv6address | IPvFuture  ) "]";

#port = *DIGIT
port = DIGIT*;

#host = IP-literal / IPv4address / reg-name
host = IP_literal | IPv4address | reg_name;

#userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
userinfo = ( unreserved | pct_encoded | sub_delims | ":" )*;

#authority = [ userinfo "@" ] host [ ":" port ]
authority = ( userinfo "@" )? host ( ":" port )?;

#scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
scheme = ALPHA ( ALPHA | DIGIT | "+" | "-" | "." )*;

#relative-part = ( "//" authority path-abempty ) / ( path-absolute ) / (
# path-noscheme ) / ( path-empty )
relative_part = ( "//" authority path_abempty ) | ( path_absolute ) | (
path_noscheme ) | ( path_empty );

#relative-ref = relative-part [ "?" query ] [ "#" fragment ]
relative_ref = relative_part ( "?" query )? ( "#" fragment )?;

#hier-part = ( "//" authority path-abempty ) / ( path-absolute ) / (
# path-rootless ) / ( path-empty )
hier_part = ( "//" authority path_abempty ) | ( path_absolute) | (
path_rootless ) | ( path_empty );

#absolute-URI = scheme ":" hier-part [ "?" query ]
absolute_URI = scheme ":" hier_part ( "?" query )?;

#URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
URI = scheme ":" hier_part ( "?" query )? ( "#" fragment )?;

#URI-reference = URI / relative-ref
URI_reference = URI | relative_ref;

main := URI_reference | absolute_URI;
}%%
#-----------------------------------------------------------------------------------------------------
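
The helper functions called from the actions (StartPath, EndPath, and so
on) are not shown above, and the grammar as posted declares those actions
without attaching them to any expression; only $All is embedded. As a
minimal sketch, assuming the helpers just record a mark pointer and a
length, and assuming the actions were attached with Ragel's entering (>)
and leaving (%) operators, the path pair could look like this in C. The
names path_start, path_len and CopyPath are hypothetical and not part of
the posted code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical capture state; not part of the posted grammar. */
const char *path_start = NULL;
size_t path_len = 0;

/* Called with fpc when the path begins: remember where it starts. */
void StartPath(const char *p)
{
    path_start = p;
    path_len = 0;
}

/* Called with fpc when the path ends: the length is the pointer distance. */
void EndPath(const char *p)
{
    if (path_start != NULL && p >= path_start)
        path_len = (size_t)(p - path_start);
}

/* Copy the captured path into out as a NUL-terminated string.
 * Returns 0 on success, -1 if nothing was captured or out is too small. */
int CopyPath(char *out, size_t cap)
{
    if (path_start == NULL || path_len + 1 > cap)
        return -1;
    memcpy(out, path_start, path_len);
    out[path_len] = '\0';
    return 0;
}
```

With helpers like these, the path is taken from the input buffer between
the two marks instead of being printed character by character.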

I want to extract the path from a URI. For example, for the input:
"http://www.complang.org/ragel/examples/clang.rl"
the output should be:
"/ragel/examples/clang.rl"
but I get:
"http//ragel/examples/clang.rl"

I have read the Ragel documentation, but I cannot see what I am doing wrong.
I have been working on this problem for several days :-(, so any help would
be much appreciated. Thanks very much :-)

P.S.
All I need is to extract some parts of a URI.
I have seen XuLang's question on this mailing list, but I want to implement
the full RFC 3986 ABNF, not just a common regex match.



-- 

*Regards,*
*Deng Pei*
*Software Engineering Institute*
*Email: dpcmain at gmail.com*
*Address: East China Normal University, Shanghai, China 200062*