[ragel-users] Detect keywords with a ragel scanner

Alec Tica alexandru.tica at gmail.com
Mon Jul 18 18:16:37 UTC 2011


Thanks a lot Adrian! It's working beautifully!
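
For the archive, here is a rough plain-Ruby sketch of the idea in Adrian's reply quoted below. It is not Ragel and not generated code; it only imitates how a scanner block picks a pattern: the longest match wins, and on a tie the pattern listed first wins. The names RULES and scan_keywords are made up for this illustration, and the regular expressions are simplified stand-ins for the string and comment definitions in my original post.

```ruby
require 'strscan'

# Hypothetical rule table imitating a scanner block (|* ... *|).
# The longest match wins; among equal-length matches the rule listed
# first wins, which is why :select is placed ahead of :word.
RULES = [
  [:squoted, /'[^']*'/],
  [:dquoted, /"[^"]*"/],
  [:comment, %r{/\*.*?\*/}m],   # multi-line comment
  [:comment, /--[^\n]*/],       # single-line comment
  [:select,  /select/],         # the keyword
  [:word,    /\w+/],            # generic pattern: any other word
  [:other,   /./m]              # whitespace, ';' and anything else
].freeze

# Returns [start, end) offsets of every standalone "select".
def scan_keywords(text)
  found = []
  ss = StringScanner.new(text)
  until ss.eos?
    best_name = nil
    best_len = 0
    RULES.each do |name, pattern|
      len = ss.match?(pattern)   # match length at the current position, or nil
      # Strict '>' keeps the earlier rule when two rules match the same length.
      if len && len > best_len
        best_name = name
        best_len = len
      end
    end
    found << [ss.pos, ss.pos + best_len] if best_name == :select
    ss.pos += best_len           # :other always matches, so best_len >= 1
  end
  found
end

data = 'unselect 1 from dual;select 2 from dual;/*comment*/select 3 from dual;select'
scan_keywords(data).each { |s, e| puts "found at #{s}-#{e}" }
```

On the sample string this reports three matches and skips "unselect", because the generic word rule matches all eight characters there and so beats the six-character keyword rule. For the bare word "select" both rules match six characters and the keyword rule wins because it is listed first.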

On Mon, Jul 18, 2011 at 4:49 AM,  <thurston at complang.org> wrote:
> Hi Talek, what you should do is include the tail items in the scanner as patterns of their own and add a generic pattern that covers any word that is not 'select'. If you specify 'select' ahead of the generic pattern, it will be matched in favour of the generic pattern only when the word is exactly 'select'.
>
> Adrian
> -----Original Message-----
> From: Alec Tica <alexandru.tica at gmail.com>
> Sender: ragel-users-bounces at complang.org
> Date: Fri, 15 Jul 2011 00:20:42
> To: <ragel-users at complang.org>
> Reply-To: ragel-users at complang.org
> Subject: [ragel-users] Detect keywords with a ragel scanner
>
> Hi,
>
> I'm new to Ragel and I'm trying to figure out how to solve what
> seems to be a very simple problem. Let's say I have the following
> text:
>
> "select 1 from dual;select 2 from dual;/*comment*/select 3 from dual;select"
>
> I want to detect all "select" keywords using a scanner but taking into
> consideration the word boundaries. "select" is a keyword only if:
>
> 1. it starts at the very beginning of the text, or is preceded by
> whitespace, a comment or a statement separator (;)
> 2. it ends at the very end of the text, or is followed by whitespace,
> a comment or a statement separator (;)
> 3. it is not within quotes
> 4. it is not part of a comment
>
> So far I have:
>
> <code>
> %%{
>  machine example;
>
>  action is_eof {
>    true if p == eof - 1
>  }
>
>  # eof
>  EOF = zlen when is_eof;
>
>  # strings
>  squoted_string = ['] ( (any - [''])** ) ['];
>  dquoted_string = '"' ( any )* :>> '"';
>
>  # comments
>  ml_comment = '/*' ( any )* :>> '*/';
>  sl_comment = '--' ( any )* :>> ('\n' | EOF);
>  comment = ml_comment | sl_comment;
>
>  tail = space | comment | ';' | EOF;
>
>  # keyword
>  select = 'select' . tail;
>
>  main := |*
>    squoted_string;
>    dquoted_string;
>    comment;
>    select => { puts "found at #{ts}-#{te}" };
>    any;
>  *|;
>
> }%%
>
> %% write data;
>
> data = 'unselect 1 from dual;select 2 from dual;/*comment*/select 3 from dual;select'
> # convert the provided string into an array of character codes
> stream_data = data.unpack("c*") if(data.is_a?(String))
> eof = stream_data.length
>
> %% write init;
> %% write exec;
> </code>
>
> Of course, the above scanner incorrectly matches the "select" inside
> the word "unselect" in the data. In any case, I feel that I'm not on
> the right track, so I'd like to ask for your advice.
>
> Many thanks in advance!
>
> --
> talek
>
> _______________________________________________
> ragel-users mailing list
> ragel-users at complang.org
> http://www.complang.org/mailman/listinfo/ragel-users
>



-- 
talek



