[ragel-users] confused about scanning

Kevin T. Ryan kevin.t.ryan at gmail.com
Fri Aug 12 02:45:12 UTC 2011


I took what I think you were trying to accomplish (at least to some
degree) and tried to develop a state machine based on the
specifications as I understood them.  I did this partially to help me
understand Ragel a little bit better, so I hope you don't mind that I
didn't use much of what you provided in your original email.  Some
notes related to your email though:

- I don't think you want a scanner for what you are trying to
accomplish.  A scanner spits out things like "number" or "string" or
"operator" without regard to how those things are put together.  I
think you want something that understands the structure of what can
happen where.  For example, udp:foo = true "sets" udp:foo equal to
true, but a scanner would just kick out "identifier" [for udp:foo],
"operator" or "equals" for '=', and "keyword" or "identifier" for
'true'.
- I think you might be able to accomplish some of what you were
intending (even with a scanner) by using fgoto instead of fcall
(although I'm not entirely sure as I didn't fully grasp your code).
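
To make the scanner point concrete, here is a rough hand-written
sketch in plain C (not Ragel) of what a scanner-style pass reports for
that line.  The names next_token and print_tokens and the TOK_* kinds
are made up for illustration, and the rule that an identifier may
contain ':' and '.' is just my assumption based on your sample input:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical token kinds; a real scanner would define its own. */
enum token { TOK_IDENTIFIER, TOK_OPERATOR, TOK_END };

/* Reads one token starting at *pp, copies its text into buf and
 * advances *pp past it.  Knows nothing about how tokens combine. */
enum token next_token(const char **pp, char *buf, size_t bufsz)
{
    const char *p = *pp;
    size_t n = 0;

    assert(bufsz >= 2);            /* room for one char plus the NUL */

    while (*p == ' ')              /* skip blanks between tokens */
        p++;

    if (*p == '\0') {
        buf[0] = '\0';
        *pp = p;
        return TOK_END;
    }

    if (isalpha((unsigned char)*p)) {
        /* Assumed rule: letters, digits, ':' and '.' form one identifier. */
        while ((isalnum((unsigned char)*p) || *p == ':' || *p == '.')
               && n + 1 < bufsz)
            buf[n++] = *p++;
        buf[n] = '\0';
        *pp = p;
        return TOK_IDENTIFIER;
    }

    /* Anything else is reported as a one-character operator. */
    buf[n++] = *p++;
    buf[n] = '\0';
    *pp = p;
    return TOK_OPERATOR;
}

/* Prints one "kind: text" line per token. */
void print_tokens(const char *line)
{
    char text[64];
    enum token t;

    while ((t = next_token(&line, text, sizeof text)) != TOK_END)
        printf("%s: %s\n",
               t == TOK_IDENTIFIER ? "identifier" : "operator", text);
}
```

Calling print_tokens("udp:foo = true") prints "identifier: udp:foo",
"operator: =" and "identifier: true", one per line.  Nothing in that
output says that the first identifier is being assigned the third; that
knowledge has to live in whatever consumes the tokens, which is why I
went with a state chart instead.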

The following is in C (not C++), but I think it should be easy to
follow.  Note that the last section of my 'main' (checking for errors)
was very helpful in letting me know when I screwed something up (e.g.,
forgot a specific char in a machine).  All I'm doing is printing stuff
out, but you could adapt it to your needs.  Hope this helps (PS - you
may want to read
http://zedshaw.com/essays/ragel_state_charts.html - I also found it
very helpful in getting started with Ragel):


#include <stdio.h>
#include <string.h>

%%{
    machine sas_scanner;

    action init {
        printf("group: ");
        start = fpc+1;
    }
    action args {
        /* %.*s expects an int precision, so cast the pointer difference */
        printf("call: %.*s\n", (int)(fpc-start), start);
        start = fpc+1;
    }
    action pr {
        printf("%.*s\n", (int)(fpc-start), start);
        start = fpc+1;
    }
    action kwd {
        printf("  %.*s = ", (int)(fpc-start), start);
        start = fpc+1;
    }
    action nl {
        printf("\n");
        start = fpc+1;
    }
    action reset { start = fpc; }
    action chain {
        printf("- Chained call -\n");
        start = fpc+1;
    }
    action prset {
        printf(" Set: %.*s ->", (int)(fpc-start), start);
        start = fpc+1;
    }

    main := (
        start: (
            "group " @init -> group_name
        ),

        group_name: (
            alpha+ -> group_name |
            " " @pr -> group_name |
            "{" -> details
        ),

        details: (
            '('  @args  -> arguments |
            [:.]        -> details |
            '>'  @chain -> details |
            alpha+      -> details |
            '\n' @reset -> details |
            ';'  @reset -> details |
            '='  @prset -> set     |
            ' '         -> details |
            digit+      -> details |
            '}'         -> final
        ),

        arguments: (
            ',' @pr    -> arguments |
            alpha+     -> arguments |
            ':' @kwd   -> arguments |
            ' '        -> arguments |
            digit+     -> arguments |
            ')' @pr    -> details
        ),

        set: (
            alpha+  -> set |
            ' '     -> set |
            ';' @pr -> details
        )
    );
}%%

%% write data;

int main () {
    const char* to_parse =
        "group MyGroup {\n"
        "   tcpclient( host: foo, port: 49152 );\n"
        "   udp( host: bar, port: 49152 ) > tcpserver( port: 11111 );\n"
        "   udp:foo:49152.nonblocking = true;\n"
        "}";

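    int cs;  /* current state ('act' is only needed by scanners) */
    const char* p = to_parse;
    const char* pe = to_parse + strlen(to_parse);

    /* Beginning of the text that the next action will print. */
    const char* start = to_parse;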

    %% write init;
    %% write exec;

    if (cs == sas_scanner_error) {
        printf("Error parsing @ %s\n", p);
    }
    return 0;
}
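
One thing the check in 'main' does not catch: if the input simply
stops early (say the closing '}' is missing), the machine ends in a
state that is neither the error state nor a final one.  Ragel's
"write data" also emits a sas_scanner_first_final constant, and states
below it are not accepting.  Here is a toy sketch of that three-way
outcome in plain C (not Ragel).  The TOY_* values, toy_run and classify
are made up for illustration; the toy machine only accepts '{',
some non-brace text, then '}':

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the constants Ragel generates; values are arbitrary. */
enum { TOY_ERROR = 0, TOY_START = 1, TOY_BODY = 2, TOY_FIRST_FINAL = 3 };

enum outcome { PARSE_OK, PARSE_ERROR, PARSE_INCOMPLETE };

/* Runs the toy machine over [p, pe) and returns the state it ends in. */
int toy_run(const char *p, const char *pe)
{
    int cs = TOY_START;

    assert(p != NULL && pe != NULL && p <= pe);

    for (; p < pe && cs != TOY_ERROR; p++) {
        switch (cs) {
        case TOY_START:
            cs = (*p == '{') ? TOY_BODY : TOY_ERROR;
            break;
        case TOY_BODY:
            if (*p == '}')
                cs = TOY_FIRST_FINAL;
            else if (*p == '{')
                cs = TOY_ERROR;      /* no nesting in this toy */
            break;
        default:
            cs = TOY_ERROR;          /* nothing may follow the '}' */
            break;
        }
    }
    return cs;
}

/* Same shape as the check in 'main', plus the "ran out of input" case. */
enum outcome classify(int cs)
{
    if (cs == TOY_ERROR)
        return PARSE_ERROR;
    if (cs < TOY_FIRST_FINAL)
        return PARSE_INCOMPLETE;
    return PARSE_OK;
}
```

In the real program the equivalent would be an extra branch comparing
cs against sas_scanner_first_final after the existing error test.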

---------------------
Kevin T. Ryan
http://blog.gridmule.com/



On Fri, Aug 5, 2011 at 9:13 AM, 𝄆 Rob Harris 𝄇 <rob.harris at gmail.com> wrote:
>
> All, help. I've R'd TFM all week trying to figure this out, but am still
> confused (so please pardon the potential n00bness.)
>
> I have to parse a config file for an app I'm working on, whose format is
> basically of the format:
> group MyGroup {
>   tcpclient( host: foo, port: 49152 );
>   udp( host: bar, port: 49152 ) > tcpserver( port: 11111 );
>   udp:foo:49152.nonblocking = true;
> }
>
> From what I've read on the Intertubes, it seems that the SOP for processing
> this is to define a main := which will match a particular line of the text
> and then upon matching call another machine to "scan" the message.
> However, I'm not sure how to do that because it seems that regardless of
> whether I define main as a matcher or a scanner, executing the parser always
> seems to consume the text as it matches. For instance, when I parse the
> group definition, I can simply match on the word "group" and then pass the
> rest of the line (up to the {) in to the scanner and I can get 'MyGroup' out
> relatively easily. However, when I try to parse the first encapsulated line,
> I don't know whether I'm dealing with a string of the first line form or
> third line form (or if the command is "chained" as in the second line) until
> I've done a kleene star match of the entire line (up to the ;) at which
> point it seems that the parser has already consumed the entire line and when
> I pass it into a scanner the pointers are already at the next line. Do I
> need to store the starting pointer before the first main scan (and if so,
> how?) and then how would I tell the downstream scanner where to start? I
> thought of making a number of nested c++ "parser objects" but that just seems
> inherently wrong.
>
> Below is what I've written so far--just enough to hopefully pass the first
> two cases. Again, I don't know if I'm only a character or so off or if my
> mindset is completely off. Any help would be appreciated.
>
> --
> Rob Harris
>   Technological Pragmatist
>   rob period harris shift-2 gmail decimal-point com
>   "The universe tends towards maximum irony." --Jamie Zawinsky
>
>  %%{
>     machine sas_scanner;
>     ml_comment = '/*' ( any )* :>> '*/';
>     sl_comment = '//' [^\n]* '\n';
>     comment    = ml_comment | sl_comment;
>     wspace     = comment | space+ ;
>     integer    = [0-9]*;
>     float      = [0-9]* '.' [0-9]*;
>     identifier = [a-zA-Z][a-zA-Z0-9]*;
>     fqsm       = [a-zA-Z] ( [a-zA-Z0-9:][a-zA-Z0-9_] )*;
>     sqstring   = '\'' [^\n]* :>> '\'';
>     dqstring   = '\"' [^\n]* :>> '\"';
>     strvalue   = ( integer | float | identifier | sqstring | dqstring );
>     action DEBUG { fprintf( stderr, "state: %4d, char: %c\n", cs, *p ); }
>     action RESET { reset(); }
>     action CRLF  { std::cout << std::endl << std::endl; }
>     action NAME  { m_name.append( 1, fc ); }
>     action KEY   {  m_key.append( 1, fc ); }
>     action VAL   {  m_val.append( 1, fc ); }
>     action QKV
>     {
>       printf( "[%s]=>[%s]\n", m_key.c_str(), m_val.c_str());
>       m_kvMap[ m_key ] = m_val;
>       m_key.clear();
>       m_val.clear();
>     }
>     action SNAME { printf( "NAME: [%s]\n", m_name.c_str() ); }
>     kvpair = ( identifier space* ':' space* strvalue );
>     kvlist = ( space+ | kvpair | ',' space+ kvpair );
>     instantiation = ( identifier '(' kvlist* ')' );
>
>     instantiation_chain = (
>       instantiation $NAME ( space* '>' space* instantiation )*
>       ) $NAME >RESET ';' @SNAME;
>
>     inst_chain_scanner :=
>     |*
>       space+;
>       identifier => { diff(); };
>       strvalue => { diff(); };
>     *|;
>
>     group_name = ( 'g' 'r' 'o' 'u' 'p' );
>     group_id = ( identifier - group_name ) @NAME;
>     group_line = ( group_name space+ group_id :>> space* '{' );
>
>     group_scanner :=
>     |*
>       space+ => { m_name.clear(); };
>       group_name;
>       group_id => { printf( ">> %s\n", m_name.c_str() ); };
>       '{' => { fret; };
>     *|;
>
>     main :=
>     |*
>       wspace+;
>       group_name => { fcall group_scanner; };
>       instantiation_chain => { fcall inst_chain_scanner; };
>     *|;
>
> _______________________________________________
> ragel-users mailing list
> ragel-users at complang.org
> http://www.complang.org/mailman/listinfo/ragel-users
>
