[ragel-users] Scanners inside of scanners? A better way to avoid conflicts?

Erich Ocean er... at atlasocean.com
Tue Jun 17 05:04:31 UTC 2008


Basically, you need one-token lookahead for that, which a straight  
regex can't do. You might be able to use manual state transitions, but  
it's easier to just separate the logic out into another function.

For example, you could capture the inner contents of the emphasized  
string into a buffer and then call into a C function where you can run  
the three (Ragel) machines in sequence, in the priority order you've  
set. Use the first one that works and return into your original  
scanner machine/function. Order them longest-match-first. :-)
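A minimal plain-C sketch of the "run the alternatives in priority order, first one that works wins" fallback described above. It is not RedCloth's code. It assumes the outer scanner has already copied the bytes between the two `_` into a buffer. The Ragel machines are replaced by two hand-written matchers (the hypothetical `match_attrs_then_text` and `match_plain_text`), and `em_parts`, `parse_em` and `render_em` are illustrative names too. The post mentions three machines; two are enough here for the class-versus-plain-text case. Each matcher must consume the whole buffer and writes its result only after it succeeds, so a failed attribute attempt cannot leave a stale class behind.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Spans point into the caller's buffer; a length of 0 means "absent". */
typedef struct {
    const char *cls;  size_t cls_len;
    const char *id;   size_t id_len;
    const char *text; size_t text_len;
} em_parts;

/* A matcher either matches the WHOLE buffer and fills *out, or returns
 * false without touching *out, so a failed attempt leaks no captures. */
typedef bool (*em_matcher)(const char *s, size_t len, em_parts *out);

/* Hypothetical stand-in for "C mtext": "(" class? ("#" id)? ")" text. */
static bool match_attrs_then_text(const char *s, size_t len, em_parts *out)
{
    if (len == 0 || s[0] != '(')
        return false;
    const char *close = memchr(s, ')', len);
    if (close == NULL)
        return false;
    size_t rest = len - (size_t)(close - s) - 1;
    if (rest == 0)                    /* "(...)" is the whole emphasis */
        return false;
    const char *inner = s + 1;
    size_t inner_len = (size_t)(close - inner);
    const char *hash = memchr(inner, '#', inner_len);
    em_parts p = {0};
    p.cls = inner;
    p.cls_len = hash ? (size_t)(hash - inner) : inner_len;
    if (hash) {
        p.id = hash + 1;
        p.id_len = inner_len - p.cls_len - 1;
        if (p.id_len == 0)            /* "#" must be followed by an id */
            return false;
    }
    p.text = close + 1;
    p.text_len = rest;
    *out = p;                         /* commit only after a full match */
    return true;
}

/* Hypothetical stand-in for plain "mtext": the whole buffer is text. */
static bool match_plain_text(const char *s, size_t len, em_parts *out)
{
    em_parts p = {0};
    p.text = s;
    p.text_len = len;
    *out = p;
    return true;
}

/* Try each matcher in priority order; the first full match wins. */
bool parse_em(const char *s, size_t len, em_parts *out)
{
    static const em_matcher matchers[] = {
        match_attrs_then_text,
        match_plain_text,
    };
    for (size_t i = 0; i < sizeof matchers / sizeof matchers[0]; i++)
        if (matchers[i](s, len, out))
            return true;
    return false;
}

/* Render as HTML (no escaping). Returns snprintf's result, or -1. */
int render_em(const char *s, size_t len, char *buf, size_t bufsz)
{
    em_parts p;
    if (!parse_em(s, len, &p))
        return -1;
    return snprintf(buf, bufsz, "<em%s%.*s%s%s%.*s%s>%.*s</em>",
                    p.cls_len ? " class=\"" : "",
                    (int)p.cls_len, p.cls_len ? p.cls : "",
                    p.cls_len ? "\"" : "",
                    p.id_len ? " id=\"" : "",
                    (int)p.id_len, p.id_len ? p.id : "",
                    p.id_len ? "\"" : "",
                    (int)p.text_len, p.text);
}
```

With this split, the inner text of `_(in parens)_` renders as `<em>(in parens)</em>`, because the attribute matcher refuses a parenthesized group with no text after the `)`. The inner text of `_(myclass)SPAN_` yields `<em class="myclass">SPAN</em>`. The sketch does no HTML escaping.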

Best, Erich

On Jun 16, 2008, at 8:15 PM, Jason Garber wrote:

> In RedCloth, we have a problem where an _emphasized_ bit of text can  
> have _(myclass#myid)a CSS class and/or id_ but shouldn't have a  
> class or id if the whole emphasis is in parentheses _(practically  
> speaking)_.  Consider this example (simplified):
> in: "before _(in parens)_ after"
> expected: "<p>before <em>(in parens)</em> after</p>"
> but was: "<p>before <em class=\"in parens\">in parens)</em> after</p>"
> The machine simultaneously pursues the possibility that the
> parenthesized text is the class and the possibility that it's just
> regular parenthesized text inside the em. When the class possibility
> doesn't work out, the final state is in the regular-text branch, but
> the class has already been captured.
> C = "(" ( [^)#]+ >A %{ STORE(class) } )? ( "#" [^)]+ >A %{ STORE(id) } )? ")"
> mtext = ( chars (mspace chars)* ) ;
> em = "_" >X C? mtext >A %T :> "_" ;
> # The >X resets the register from the last match, >A registers the
> # beginning of a string and the STORE saves it away.
> I tried having the class info get written to a buffer that was then  
> captured with a leaving action, which works for the class part, but  
> I again run into the same problem with capturing the text because  
> the right side of the union matches also, so it captures too much  
> text.  Whichever side of the vertical pipe writes last, wins.
> C = "(" ( [^)#]+ >A %{ STORE(class_buf) } )? ( "#" [^)]+ >A %{ STORE(id_buf) } )? ")"
> C_mtext = (C %{ STORE_ATTRIBUTES(); } mtext >A %{STORE(text)} |  
> mtext >B %{ STORE_B(text); });
> # STORE_ATTRIBUTES copies the attributes from their buffers and
> # stores them where they belong.
> results in...
> expected: "<p><span class=\"myclass\">SPAN</span></p>"
> but was: "<p><span class=\"myclass\">(myclass)SPAN</span></p>"
> Really what I want is for it to try the first pattern (with the C)  
> and, if that fails, try the second (the plain ol' mtext).  Sounds  
> like a scanner to me.  Problem is, I'm already inside a scanner, so  
> it won't let me call a scanner from within a pattern.
> Got any ideas?
> Jason
