[ragel-users] Detect keywords with a ragel scanner

thurston at complang.org
Mon Jul 18 01:49:24 UTC 2011


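Hi Talek, what you should do is include the tail items (whitespace, comments and ';') in the scanner as patterns of their own, and add a generic pattern that covers any word. If you specify 'select' ahead of the generic pattern, it will be matched in favour of the generic pattern only when the word is exactly 'select'; a longer word such as 'unselect' goes to the generic pattern, because the scanner always takes the longest match.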

Adrian
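
A minimal sketch of the suggestion above, reusing the Ruby host code from the original message quoted below. The machine name `keywords`, the `word` definition (letters, digits and underscore) and the variable `text` are assumptions made for illustration and are not part of the thread; the simplified string and comment patterns are also assumptions, not a full SQL lexer.

```ragel
%%{
  machine keywords;

  # Strings and comments are tokens of their own, so a 'select' inside
  # them is consumed as part of the string or comment.
  squoted_string = "'" [^']* "'";
  dquoted_string = '"' [^"]* '"';
  ml_comment     = '/*' any* :>> '*/';
  sl_comment     = '--' [^\n]*;

  # Generic word (assumed character set: letters, digits, underscore).
  word = ( alnum | '_' )+;

  main := |*
    squoted_string;
    dquoted_string;
    ml_comment;
    sl_comment;
    # Listed before 'word', so it wins only when the whole word is 'select'.
    'select' => { puts "found at #{ts}-#{te}" };
    word;      # 'unselect', 'selected', 'dual', '1', ...
    space+;
    any;       # ';' and any other single character
  *|;
}%%

%% write data;

text = 'unselect 1 from dual;select 2 from dual;/*comment*/select 3 from dual;select'
data = text.unpack("c*")   # the generated code reads integers from 'data'
eof  = data.length         # lets the scanner flush the last token at end of input

%% write init;
%% write exec;
```

Saved under a hypothetical name such as keywords.rl, this would be compiled with `ragel -R keywords.rl -o keywords.rb` and run with `ruby keywords.rb`. With this ordering the scanner is expected to report the three standalone keywords and to skip "unselect", which the longer `word` match consumes whole. Note that any non-word character, not only whitespace, a comment or ';', acts as a boundary here, which is looser than the rules in the question.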
-----Original Message-----
From: Alec Tica <alexandru.tica at gmail.com>
Sender: ragel-users-bounces at complang.org
Date: Fri, 15 Jul 2011 00:20:42 
To: <ragel-users at complang.org>
Reply-To: ragel-users at complang.org
Subject: [ragel-users] Detect keywords with a ragel scanner

Hi,

I'm new to Ragel and I'm trying to figure out how to solve an
apparently very simple problem. Let's say I have the following
text:

"select 1 from dual;select 2 from dual;/*comment*/select 3 from dual;select"

I want to detect all "select" keywords using a scanner, taking word
boundaries into consideration. "select" is a keyword only if:

1. it starts at the very beginning of the text, or is preceded by
whitespace, a comment or a statement separator (;)
2. it ends at the very end of the text, or is followed by whitespace,
a comment or a statement separator (;)
3. it is not within quotes
4. it is not part of a comment

So far I have:

<code>
%%{
  machine example;

  action is_eof {
    true if p == eof - 1
  }

  # eof
  EOF = zlen when is_eof;

  # strings
  squoted_string = ['] ( (any - [''])** ) ['];
  dquoted_string = '"' ( any )* :>> '"';

  # comments
  ml_comment = '/*' ( any )* :>> '*/';
  sl_comment = '--' ( any )* :>> ('\n' | EOF);
  comment = ml_comment | sl_comment;

  tail = space | comment | ';' | EOF;

  # keyword
  select = 'select' . tail;

  main := |*
    squoted_string;
    dquoted_string;
    comment;
    select => { puts "found at #{ts}-#{te}" };
    any;
  *|;

}%%

%% write data;

data = 'unselect 1 from dual;select 2 from dual;/*comment*/select 3 from dual;select'
# convert the provided string in a stream of chars
stream_data = data.unpack("c*") if(data.is_a?(String))
eof = stream_data.length

%% write init;
%% write exec;
</code>

Of course, the above scanner incorrectly matches the "select" inside
the word "unselect" in the data. In any case, I feel that I'm not on
the right track, so I'd like to ask for your advice.

Many thanks in advance!

-- 
talek

_______________________________________________
ragel-users mailing list
ragel-users at complang.org
http://www.complang.org/mailman/listinfo/ragel-users