[ragel-users] Maintaining char & line counts in a scanner

Joe Wildish joe at elusive.cx
Sun Apr 25 15:18:26 UTC 2010


Hi Adrian,

Thanks for the response. I agree that the third approach you mention is the most elegant. That said, I've just finished implementing the counters with a second pass (i.e. the first approach).

The reason is that I actually want to record both the starting positions (line and column) *and* the ending positions of each token. I experimented with an entering action that recorded the current position and a final action that wrote the starting and ending positions into the token struct, but I started to confuse myself over how backtracking might come into play, so I opted to take the whole counting malarkey outside the machine :) I may well revisit this, but I have since moved on to the CFG parser, as the second-pass approach works OK... it's just not as "nice" as keeping everything self-contained in the machine.
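For reference, here is a minimal sketch of what the second-pass approach can look like in C. It assumes a hypothetical `token_t` record holding the matched span (`ts`/`te`, named after the token-start and token-end pointers Ragel scanners use) plus start and end positions. Lines and columns are 1-based and columns count bytes; those choices, and the names `pos_t`, `token_t` and `fill_positions`, are illustrative assumptions, not part of Ragel.

```c
/* Hypothetical position type: 1-based line and column, columns counted in bytes. */
typedef struct {
    int line;
    int col;
} pos_t;

/* Hypothetical token record: the matched span plus its start/end positions. */
typedef struct {
    const char *ts;  /* first byte of the token */
    const char *te;  /* one past the last byte of the token */
    pos_t start;     /* position of the first byte */
    pos_t end;       /* position of the last byte (equals start for an empty token) */
} token_t;

/* Second pass: walk the token's text once, advancing *cur.
   On entry *cur is the position of the token's first byte;
   on return it is the position just after the token. */
static void fill_positions(token_t *tok, pos_t *cur)
{
    tok->start = *cur;
    tok->end = *cur;
    for (const char *p = tok->ts; p < tok->te; p++) {
        tok->end = *cur;         /* position of byte *p */
        if (*p == '\n') {
            cur->line++;         /* newline: next byte starts a new line */
            cur->col = 1;
        } else {
            cur->col++;
        }
    }
}
```

Calling this once per token from the scanner's match action, with `cur` carried across tokens, gives both positions without any counting inside the machine, at the cost of reading every token twice.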

A quick question, though: regarding your examples below, are you suggesting that using intersection means backtracking won't occur?

Many thanks,

-Joe

On 23 Apr 2010, at 19:31, Adrian Thurston wrote:

> Hi Joe,
> 
> There are a few approaches to this problem. The simplest is to count the newlines in the matched text in every match action. The downside is that you pass over everything twice.
> 
> If you'd like to avoid a second pass over each token, you can go down the sub-scanner road. Basically, any pattern that can contain a newline, such as a multi-line comment or a string, can be implemented with a sub-scanner. In the main scanner you write a pattern for whatever sequence of characters takes you into, for example, a comment, then jump into a separate scanner for comments. You end up with broken-down comments, though, rather than a single match of the whole comment.
> 
> A third approach is to write patterns that count newlines as they go. This is my favourite approach. The only worry is backtracking. If your scanner patterns backtrack over newlines, then you've got double counting happening. With a well-designed scanner, this isn't normally a problem though. Try something like this:
> 
> counter = ( any | '\n' @inc )*;
> comment = ( '/*' any* :>> '*/' ) & counter;
> 
> Or embed the counting deep:
> 
> comment = '/*' ( any | '\n' @inc )* :>> '*/';
> 
> -Adrian
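The double-counting hazard Adrian describes can be illustrated outside Ragel with a small hand-written C scanner. This is only an analogue, not the code Ragel generates, and the names `scanner_t`, `next_token` and the token kinds are hypothetical. The scanner counts newlines as it consumes characters; when a tentative `/*` comment turns out to be unterminated, it backtracks and re-reads the same input, so it saves the counter at token start and restores it on backtrack to avoid counting those newlines twice.

```c
/* Hypothetical token kinds for a toy scanner. */
enum kind { TK_COMMENT, TK_SLASH, TK_OTHER, TK_EOF };

typedef struct {
    const char *p;   /* current read position */
    const char *pe;  /* end of input */
    int line;        /* 1-based line, counted while scanning */
} scanner_t;

/* Scan one token, counting newlines as characters are consumed.
   A '/' followed by '*' is tentatively scanned as a comment; if no
   closing star-slash follows, the scanner backtracks and returns a
   lone '/', restoring the line counter saved at token start. */
static enum kind next_token(scanner_t *s)
{
    if (s->p >= s->pe)
        return TK_EOF;

    const char *ts = s->p;    /* token start */
    int saved_line = s->line; /* snapshot for backtracking */

    if (s->pe - s->p >= 2 && s->p[0] == '/' && s->p[1] == '*') {
        s->p += 2;
        while (s->p < s->pe) {
            if (s->pe - s->p >= 2 && s->p[0] == '*' && s->p[1] == '/') {
                s->p += 2;
                return TK_COMMENT;
            }
            if (*s->p == '\n')
                s->line++;    /* counted during the tentative match */
            s->p++;
        }
        /* Unterminated comment: backtrack to just after the '/'. */
        s->p = ts + 1;
        s->line = saved_line; /* undo counts from the abandoned match */
        return TK_SLASH;
    }

    if (*s->p == '\n')
        s->line++;
    s->p++;
    return *ts == '/' ? TK_SLASH : TK_OTHER;
}
```

Without the restore, scanning the input `/*` followed by two newlines would end with the counter at 5 instead of 3, because the two newlines are counted once during the abandoned comment match and again when they are re-read as ordinary tokens.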


_______________________________________________
ragel-users mailing list
ragel-users at complang.org
http://www.complang.org/mailman/listinfo/ragel-users