[ragel-users] Re: tuning/optimizing scanners

Fri Oct 5 19:04:03 UTC 2007

Hi Chuck,

Those switches are new in 5.24. They pass a number of tests, but have not been heavily tested.
By mutually exclusive I meant that the patterns do not overlap. For example,

'a';
'b';

are mutually exclusive, but

'a';
any;

are not, because any also matches 'a'. On second thought, I'm not sure refactoring would be worth the effort in your case: the time is probably dominated by the basic iterations of the driver, and the gains from mutually exclusive patterns are very small.

Adrian
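
The distinction above can be checked mechanically. The following is a minimal sketch in plain Ruby, not Ragel: each pattern is modelled as a Regexp that accepts exactly one character, and the helper name mutually_exclusive? and the 7-bit ASCII alphabet are assumptions made for illustration only.

```ruby
# Illustrative sketch only; this is plain Ruby, not Ragel.
# Each scanner pattern is modelled as a Regexp matching exactly one character.
ALPHABET = (0..127).map(&:chr).freeze  # assumption: 7-bit ASCII input

PATTERN_A   = /\Aa\z/    # stands in for 'a'
PATTERN_B   = /\Ab\z/    # stands in for 'b'
PATTERN_ANY = /\A.\z/m   # stands in for any (the m flag lets . match newline)

# Hypothetical helper: two patterns are mutually exclusive when no single
# input character is accepted by both of them.
def mutually_exclusive?(first, second)
  ALPHABET.none? { |ch| first.match?(ch) && second.match?(ch) }
end

puts mutually_exclusive?(PATTERN_A, PATTERN_B)    # prints true
puts mutually_exclusive?(PATTERN_A, PATTERN_ANY)  # prints false
```

The second call prints false because the character "a" is accepted by both patterns, which is the overlap described above.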

-----Original Message-----
From: Chuck Remes <cremes.devlist at mac.com>

Date: Fri, 5 Oct 2007 13:04:29 
To: ragel-users at googlegroups.com
Subject: [ragel-users] Re: tuning/optimizing scanners

The -F1 option is what I was missing! Initially I stayed away from  
those switches due to a note in the docs about not all of them being  
supported for the ruby target. I wanted correctness before I wanted  
speed.

Generating the code with that option resulted in a significant  
performance improvement. My baseline testcase went from 50 seconds  
(wall clock time) to 28 seconds. That's more than adequate for right  
now.

Now that I have my feet wet with ragel I'll be more comfortable  
trying things like generating the C code and interfacing to it from  
my ruby code.

Adrian, thanks for a great tool.

Another question before I drop this line of inquiry. What did you  
mean by "make the patterns mutually exclusive?" So I can understand  
it better, please provide an example of a non-exclusive set and a  
mutually exclusive set of patterns.

cr

On Oct 5, 2007, at 11:45 AM, Adrian Thurston wrote:

> Hmmm, the -F1 option should be the fastest and you may get some marginal
> speedups if you make the patterns mutually exclusive and as greedy as
> possible, but I suppose I'd have to suggest using C if real speed is
> what you're after.
>
> Adrian
>
> Chuck Remes wrote:
>> Adrian,
>>
>> I am using ragel 5.24 so I can have ruby support.
>>
>>
>> On Oct 5, 2007, at 11:13 AM, Adrian Thurston wrote:
>>
>>> Hi Chuck,
>>>
>>> The parsing methodology looks fine to me. There is no undue
>>> backtracking.
>>>
>>> What version of Ragel are you using?
>>>
>>> -Adrian
>>>
>>> Chuck Remes wrote:
>>>> I've written a log parsing tool using ragel and ruby. I'm using the
>>>> scanner construct to perform the parsing, but things appear to be
>>>> running very slowly. I fear I may have chosen the wrong methodology
>>>> to parse the log. (And yes, I know ruby isn't the quickest language
>>>> out there...) :-)
>>>>
>>>> The log in question is a set of key/value pairs that look like this
>>>> (this is one line):
>>>>
>>>> Oct  1 09:50:33.37204 [29193]: {market = ICE | type = order |
>>>> order_id = 4 | buy = 1 | price = 80.83 | volume = 1 | date =
>>>> 2007-10-01 | time = 09:50:33.37201 | metadata = {l={f=Quote|g=4|j=1|
>>>> sid=8290182729}|ac=289182|cf=2881|ca= 289182}}
>>>>
>>>> I'm uninterested in the date and other data at the line start, so I
>>>> throw it away. I primarily search for the key (e.g. 'market = ') and
>>>> then fgoto another machine to parse the value. Upon hitting a pipe
>>>> character, I fgoto main again and look for another key. I pasted in a
>>>> section of the machine below to illustrate.
>>>>
>>>> Is this the correct approach? Is there a superior method for rapidly
>>>> parsing long text strings? Be gentle with me... I'm new to this
>>>> stuff.
>> [snip]
>
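
For readers following the thread, the key/value structure of the log line quoted above can be sketched in plain Ruby. This is not Ragel and not Chuck's machine (which was snipped from the thread); it is a hand-written loop that only mirrors the described idea of collecting a field until a top-level pipe character is seen. The method name parse_log_line and the choice to keep nested {...} values as raw strings are assumptions for illustration.

```ruby
# Sketch only: split the top-level "key = value" pairs of one log line.
# Nested {...} values (such as metadata) are kept as raw strings.
def parse_log_line(line)
  start = line.index('{')
  stop  = line.rindex('}')
  return {} unless start && stop && stop > start

  body  = line[(start + 1)...stop]  # text between the outermost braces
  pairs = {}
  depth = 0                         # nesting level inside the body
  field = +''

  # Store the current field as a key/value pair and start a new one.
  flush = lambda do
    key, value = field.split('=', 2)
    pairs[key.strip] = value.strip if value
    field = +''
  end

  body.each_char do |ch|
    depth += 1 if ch == '{'
    depth -= 1 if ch == '}'
    if ch == '|' && depth.zero?
      flush.call                    # a top-level pipe ends the current pair
    else
      field << ch
    end
  end
  flush.call unless field.strip.empty?
  pairs
end

line = 'Oct  1 09:50:33.37204 [29193]: {market = ICE | type = order | ' \
       'order_id = 4 | buy = 1 | price = 80.83 | volume = 1 | date = ' \
       '2007-10-01 | time = 09:50:33.37201 | metadata = {l={f=Quote|g=4|j=1|' \
       'sid=8290182729}|ac=289182|cf=2881|ca= 289182}}'

result = parse_log_line(line)
puts result['market']  # prints ICE
puts result.size       # prints 9
```

Everything before the first opening brace (the date, time and process id) is discarded, matching the description in the original message.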