[ragel-users] confused about scanning

𝄆 Rob Harris 𝄇 rob.harris at gmail.com
Fri Aug 5 13:13:25 UTC 2011


All, help. I've R'd TFM all week trying to figure this out, but am still
confused (so please pardon the potential n00bness.)

I have to parse a config file for an app I'm working on, whose format is
basically:
group MyGroup {
  tcpclient( host: foo, port: 49152 );
  udp( host: bar, port: 49152 ) > tcpserver( port: 11111 );
  udp:foo:49152.nonblocking = true;
}

From what I've read on the Intertubes, it seems that the SOP for processing
this is to define a main := which matches a particular line of the text
and then, upon matching, calls another machine to "scan" the message.
However, I'm not sure how to do that, because it seems that regardless of
whether I define main as a matcher or a scanner, executing the parser always
consumes the text as it matches.

For instance, when I parse the group definition, I can simply match on the
word "group", pass the rest of the line (up to the {) in to the scanner, and
get 'MyGroup' out relatively easily. However, when I try to parse the first
encapsulated line, I don't know whether I'm dealing with a string of the
first-line form or the third-line form (or whether the command is "chained",
as in the second line) until I've done a Kleene star match of the entire
line (up to the ;). By that point the parser has already consumed the
entire line, and when I pass it into a scanner the pointers are already at
the next line.

Do I need to store the starting pointer before the first main scan (and if
so, how?), and then how would I tell the downstream scanner where to start?
I thought of making a number of nested C++ "parser objects", but that just
seems inherently wrong.
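
To make the pointer question concrete, here is a minimal hand-written C++
sketch (no Ragel involved) of the save-and-rewind idea I'm asking about. The
names Kind, Statement, classify and next_statement are hypothetical, made up
purely for illustration. It assumes the whole config is in one buffer, that
';', '>' and '=' never appear inside quoted values, and it ignores the
surrounding "group ... { }" wrapper.

```cpp
#include <cctype>
#include <string>

// Hypothetical statement kinds for the three line forms in the example.
enum class Kind { Instantiation, Chain, Assignment };

struct Statement {
    Kind kind;
    std::string text;  // the statement without its trailing ';'
};

// Second pass: re-read [begin, end) to decide which form it is.
// A '>' or '=' outside parentheses is what distinguishes the forms.
Kind classify(const char* begin, const char* end) {
    int depth = 0;
    for (const char* q = begin; q != end; ++q) {
        if (*q == '(') ++depth;
        else if (*q == ')') --depth;
        else if (depth == 0 && *q == '>') return Kind::Chain;
        else if (depth == 0 && *q == '=') return Kind::Assignment;
    }
    return Kind::Instantiation;
}

// First pass: remember where the statement starts (mark), advance p to the
// terminating ';', then hand [mark, p) to the second pass. On success p is
// left just past the ';'. Returns false if no complete statement remains.
bool next_statement(const char*& p, const char* pe, Statement& out) {
    while (p != pe && std::isspace(static_cast<unsigned char>(*p))) ++p;
    const char* mark = p;              // saved start pointer
    while (p != pe && *p != ';') ++p;  // consume up to the terminator
    if (p == pe) return false;
    out.kind = classify(mark, p);      // rewind: second pass starts at mark
    out.text.assign(mark, p);
    ++p;                               // step over ';'
    return true;
}
```

Calling next_statement in a loop with p at the first line inside the braces
would hand each statement to the second pass from its saved start. What I
can't see is how to express the same mark-then-rescan step between main and
a called scanner in Ragel.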

Below my signature is the Ragel spec I've written so far--just enough to
hopefully pass the first two cases. Again, I don't know if I'm only a
character or so off or if my mindset is completely wrong. Any help would be
appreciated.

--
Rob Harris
  Technological Pragmatist
  rob period harris shift-2 gmail decimal-point com
  "The universe tends towards maximum irony." --Jamie Zawinski

 %%{
    machine sas_scanner;
    ml_comment = '/*' ( any )* :>> '*/';
    sl_comment = '//' [^\n]* '\n';
    comment    = ml_comment | sl_comment;
    wspace     = comment | space+ ;
    # Require at least one digit so these can never match the empty string.
    integer    = [0-9]+;
    float      = [0-9]+ '.' [0-9]+;
    identifier = [a-zA-Z][a-zA-Z0-9]*;
    fqsm       = [a-zA-Z] ( [a-zA-Z0-9:][a-zA-Z0-9_] )*;
    sqstring   = '\'' [^\n]* :>> '\'';
    dqstring   = '\"' [^\n]* :>> '\"';
    strvalue   = ( integer | float | identifier | sqstring | dqstring );
    action DEBUG { fprintf( stderr, "state: %4d, char: %c\n", cs, *p ); }
    action RESET { reset(); }
    action CRLF  { std::cout << std::endl << std::endl; }
    action NAME  { m_name.append( 1, fc ); }
    action KEY   {  m_key.append( 1, fc ); }
    action VAL   {  m_val.append( 1, fc ); }
    action QKV
    {
      printf( "[%s]=>[%s]\n", m_key.c_str(), m_val.c_str());
      m_kvMap[ m_key ] = m_val;
      m_key.clear();
      m_val.clear();
    }
    action SNAME { printf( "NAME: [%s]\n", m_name.c_str() ); }
    kvpair = ( identifier space* ':' space* strvalue );
    kvlist = ( space+ | kvpair | ',' space+ kvpair );
    instantiation = ( identifier '(' kvlist* ')' );

    instantiation_chain = (
      instantiation $NAME ( space* '>' space* instantiation )*
      ) $NAME >RESET ';' @SNAME;

    inst_chain_scanner :=
    |*
      space+;
      identifier => { diff(); };
      strvalue => { diff(); };
    *|;

    group_name = ( 'g' 'r' 'o' 'u' 'p' );
    group_id = ( identifier - group_name ) @NAME;
    group_line = ( group_name space+ group_id :>> space* '{' );

    group_scanner :=
    |*
      space+ => { m_name.clear(); };
      group_name;
      group_id => { printf( ">> %s\n", m_name.c_str() ); };
      '{' => { fret; };
    *|;

    main :=
    |*
      wspace+;
      group_name => { fcall group_scanner; };
      instantiation_chain => { fcall inst_chain_scanner; };
    *|;
 }%%