[ragel-users] tuning/optimizing scanners

Adrian Thurston thurs... at cs.queensu.ca
Fri Oct 5 16:13:07 UTC 2007


Hi Chuck,

The parsing methodology looks fine to me. There is no undue backtracking.

What version of Ragel are you using?

-Adrian

Chuck Remes wrote:
> I've written a log parsing tool using Ragel and Ruby. I'm using the
> scanner construct to perform the parsing, but things appear to be
> running very slowly. I fear I may have chosen the wrong methodology
> to parse the log. (And yes, I know Ruby isn't the quickest language
> out there...) :-)
> 
> The log in question is a set of key/value pairs that look like this  
> (this is one line):
> 
> Oct  1 09:50:33.37204 [29193]: {market = ICE | type = order |  
> order_id = 4 | buy = 1 | price = 80.83 | volume = 1 | date =  
> 2007-10-01 | time = 09:50:33.37201 | metadata = {l={f=Quote|g=4|j=1| 
> sid=8290182729}|ac=289182|cf=2881|ca= 289182}}
> 
> I'm uninterested in the date and other data at the line start, so I  
> throw it away. I primarily search for the key (e.g. 'market = ') and  
> then fgoto another machine to parse the value. Upon hitting a pipe  
> character, I fgoto main again and look for another key. I pasted in a  
> section of the machine below to illustrate.
> 
> Is this the correct approach? Is there a superior method for rapidly  
> parsing long text strings? Be gentle with me... I'm new to this stuff.
> 
> Unfortunately, each log record is a slightly different format (for a  
> total of about 15 different formats). I also can't plan on the key/ 
> value pairs showing up in the same order every time.
> 
> Any suggestions?
> 
> ----------- snip here ---------------
> 	feedcode_name = [0-9a-zA-Z\-]+;
> 	numbers = [0-9]+;
> 
> #####
> 	feedcode := |*
> 		spaces;
> 
> 		'|' => { fgoto main; };
> 
> 		feedcode_name => { temp[:feedcode] = data[tokstart..tokend-1]; };
> 		any => {puts "ERR: feedcode #{data[tokstart..tokend-1]}"};
> 	*|;
> #####
> 	volume := |*
> 		spaces;
> 
> 		'|' => { fgoto main; };
> 
> 		numbers => { temp[:quantity] = data[tokstart..tokend-1].to_i; };
> 		any => {puts "ERR: volume #{data[tokstart..tokend-1]}"};
> 	*|;
> #####
> 	main := |*
> 		'module = ' => { fgoto module; };
> 
> 		'market = ' => { fgoto market; };
> 
> 		'feedcode = ' => { fgoto feedcode; };
> 
> 		'type = ' => { fgoto type; };
> 
> 		'order_id = ' => { fgoto order_id; };
> 
> 		'buy = ' => { fgoto activity; };
> 
> 		'price = ' => { fgoto price; };
> 
> 		'volume = ' => { fgoto volume; };
> 
> 		'date = ' => { fgoto date; };
> 
> 		'time = ' => { fgoto time; };
> 
> 		( numbers | letters | spaces | '\n' | '{' | '}' | other | any );
> 
> 	*|;
> 
> 
> 
> 
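
For comparison with the scanner quoted above, here is a minimal plain-Ruby sketch of the same idea: find a key, read its value up to the next top-level pipe, then go back to looking for keys. It is not Ragel and it is not the poster's code. It swaps in Ruby's standard StringScanner, and the name parse_record and the brace-depth handling for the nested metadata field are assumptions made for illustration. Because it keys the result by field name, the order of the pairs in a record does not matter.

```ruby
require 'strscan'

# Hypothetical helper (not from the original post): collects the top-level
# "key = value" pairs of one log line into a Hash, in any order.
def parse_record(line)
  start = line.index('{')          # everything before the first brace is discarded
  return {} unless start

  scanner = StringScanner.new(line[(start + 1)..-1])
  fields = {}

  until scanner.eos?
    scanner.skip(/\s+/)
    key = scanner.scan(/\w+/)
    break unless key
    break unless scanner.skip(/\s*=\s*/)

    value = +''
    depth = 0                      # nesting depth of braces inside the value
    until scanner.eos?
      ch = scanner.getch
      if ch == '{'
        depth += 1
      elsif ch == '}'
        break if depth.zero?       # closing brace of the whole record
        depth -= 1
      elsif ch == '|' && depth.zero?
        break                      # a top-level pipe ends the value
      end
      value << ch
    end
    fields[key.to_sym] = value.strip
  end

  fields
end

line = 'Oct  1 09:50:33.37204 [29193]: {market = ICE | type = order | ' \
       'volume = 1 | metadata = {l={f=Quote|g=4}|ac=289182}}'
record = parse_record(line)
puts record[:market]               # prints "ICE"
puts record[:volume].to_i          # prints 1
```

Values are left as strings so that each caller can convert them as needed, for example with to_i for volume. A generated Ragel scanner would normally be expected to do this work in fewer passes over the input, so this sketch is only meant to make the key/value dispatch easy to see.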
