[ragel-users] Re: tuning/optimizing scanners

Fri Oct 5 18:04:29 UTC 2007

The -F1 option is what I was missing! Initially I stayed away from  
those switches due to a note in the docs about not all of them being  
supported for the ruby target. I wanted correctness before I wanted  
speed.

Generating the code with that option resulted in a significant  
performance improvement. My baseline testcase went from 50 seconds  
(wall clock time) to 28 seconds. That's more than adequate for right  
now.

Now that I have my feet wet with ragel I'll be more comfortable  
trying things like generating the C code and interfacing to it from  
my ruby code.

Adrian, thanks for a great tool.

Another question before I drop this line of inquiry. What did you
mean by "make the patterns mutually exclusive"? So I can understand
it better, could you provide an example of a non-exclusive set and a
mutually exclusive set of patterns?

cr
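
To make the question concrete, here is one possible reading of "mutually exclusive", sketched with plain Ruby regexes rather than Ragel patterns: a set is exclusive when no token is matched by more than one pattern. The pattern names, the keywords and the `matching` helper are assumptions made up for illustration; they are not taken from the thread or from Ragel itself.

```ruby
# Hypothetical token patterns, anchored so each must match the whole token.
OVERLAPPING = {
  keyword: /\A(?:buy|sell)\z/,
  word:    /\A[a-z_]+\z/                    # also matches "buy" and "sell"
}

EXCLUSIVE = {
  keyword: /\A(?:buy|sell)\z/,
  word:    /\A(?!(?:buy|sell)\z)[a-z_]+\z/  # any word except the keywords
}

# Names of every pattern in +set+ that matches +token+.
def matching(set, token)
  set.select { |_name, re| re.match?(token) }.keys
end
```

With the overlapping set, "buy" is claimed by both patterns, so something has to pick a winner; with the exclusive set each token has exactly one owner.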

On Oct 5, 2007, at 11:45 AM, Adrian Thurston wrote:

> Hmmm, the -F1 option should be the fastest and you may get some
> marginal speedups if you make the patterns mutually exclusive and as
> greedy as possible, but I suppose I'd have to suggest using C if real
> speed is what you're after.
>
> Adrian
>
> Chuck Remes wrote:
>> Adrian,
>>
>> I am using ragel 5.24 so I can have ruby support.
>>
>>
>> On Oct 5, 2007, at 11:13 AM, Adrian Thurston wrote:
>>
>>> Hi Chuck,
>>>
>>> The parsing methodology looks fine to me. There is no undue
>>> backtracking.
>>>
>>> What version of Ragel are you using?
>>>
>>> -Adrian
>>>
>>> Chuck Remes wrote:
>>>> I've written a log parsing tool using ragel and ruby. I'm using the
>>>> scanner construct to perform the parsing, but things appear to be
>>>> running very slowly. I fear I may have chosen the wrong methodology
>>>> to parse the log. (And yes, I know ruby isn't the quickest language
>>>> out there...) :-)
>>>>
>>>> The log in question is a set of key/value pairs that look like this
>>>> (this is one line):
>>>>
>>>> Oct  1 09:50:33.37204 [29193]: {market = ICE | type = order |
>>>> order_id = 4 | buy = 1 | price = 80.83 | volume = 1 | date =
>>>> 2007-10-01 | time = 09:50:33.37201 | metadata = {l={f=Quote|g=4|j=1|
>>>> sid=8290182729}|ac=289182|cf=2881|ca= 289182}}
>>>>
>>>> I'm uninterested in the date and other data at the line start, so I
>>>> throw it away. I primarily search for the key (e.g. 'market = ') and
>>>> then fgoto another machine to parse the value. Upon hitting a pipe
>>>> character, I fgoto main again and look for another key. I pasted in a
>>>> section of the machine below to illustrate.
>>>>
>>>> Is this the correct approach? Is there a superior method for rapidly
>>>> parsing long text strings? Be gentle with me... I'm new to this
>>>> stuff.
>> [snip]
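
The original machine was snipped from the thread, so the following is not a reconstruction of it. It is a plain-Ruby sketch of the same key-then-value idea, using `StringScanner` from the standard library instead of a Ragel scanner. The method name `parse_pairs` and the decision to keep nested `{...}` values as raw strings are assumptions made for illustration.

```ruby
require "strscan"

# Parse the "{key = value | key = value | ...}" part of one log line
# into a Hash. Everything before the first "{" is ignored. Nested
# braces (as in the metadata field) are kept as a raw string.
def parse_pairs(line)
  start = line.index("{")
  return {} if start.nil?

  s = StringScanner.new(line[(start + 1)..-1])
  pairs = {}
  until s.eos?
    s.skip(/\s*/)
    key = s.scan(/\w+/) or break      # "look for a key"
    s.skip(/\s*=\s*/) or break

    value = String.new
    depth = 0
    until s.eos?                      # "parse the value"
      ch = s.getch
      if ch == "{"
        depth += 1
      elsif ch == "}"
        break if depth.zero?          # closing brace of the outer record
        depth -= 1
      elsif ch == "|" && depth.zero?
        break                         # top-level pipe: back to key search
      end
      value << ch
    end
    pairs[key] = value.strip
  end
  pairs
end
```

A call such as `parse_pairs(line)["market"]` would then return the value for that key. The brace counter is what stops the pipes inside the metadata value from being treated as field separators.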