[ragel-users] Re: Scanners inside of scanners? A better way to avoid conflicts?

Jason Garber j... at jasongarber.com
Tue Jun 17 14:12:39 UTC 2008


Thanks for the help, Erich.

Pardon my ignorance, but how shall I run the three Ragel machines from  
a C function?  Can I run them within the same processing loop or do I  
have to make a new Ragel file with its own main machine and call  
that?  If it needs its own machine, then I have to return several  
pieces of captured information, but if it can stay in the same  
processing loop, then I can just set things in the existing hash.

Sorry if I haven't explained it well enough.  I'm on Skype and AIM if  
you're willing: JasonGarberEMU
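
For reference, here is a minimal C sketch of the approach suggested in the quoted reply below: capture the inner contents of the emphasis into a buffer, try several matchers in priority order, and keep only the results of the first one that matches. The two matcher functions are hand-written stand-ins, not Ragel output, and they simplify mtext to "any non-empty text". In practice each would presumably wrap its own Ragel machine (`%% write init;` and `%% write exec;` inside the function) and report whether `cs` ended in a final state. Every name here (`em_attrs`, `parse_em_contents`, the matchers) is hypothetical and not part of RedCloth or Ragel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result record; the real code would write into the existing hash. */
typedef struct {
    char cls[64];
    char id[64];
    char text[256];
} em_attrs;

/* One matcher per machine. Returns 1 only if the whole buffer [p, pe) matched. */
typedef int (*em_matcher)(const char *p, const char *pe, em_attrs *out);

/* Copy [b, e) into dst, truncating to fit and always NUL-terminating. */
static void copy_span(char *dst, size_t cap, const char *b, const char *e) {
    size_t n = (size_t)(e - b);
    assert(cap > 0);
    if (n >= cap) n = cap - 1;
    memcpy(dst, b, n);
    dst[n] = '\0';
}

/* Stand-in for: "(" class? ("#" id)? ")" followed by non-empty text. */
static int match_attrs_then_text(const char *p, const char *pe, em_attrs *out) {
    const char *b;
    if (p == pe || *p != '(') return 0;
    b = ++p;
    while (p < pe && *p != ')' && *p != '#') p++;
    copy_span(out->cls, sizeof out->cls, b, p);
    if (p < pe && *p == '#') {
        b = ++p;
        while (p < pe && *p != ')') p++;
        if (p == b) return 0;            /* "#" must be followed by an id */
        copy_span(out->id, sizeof out->id, b, p);
    }
    if (p == pe || *p != ')') return 0;  /* unterminated attribute group */
    p++;
    if (p == pe) return 0;               /* nothing left to be the text */
    copy_span(out->text, sizeof out->text, p, pe);
    return 1;
}

/* Stand-in for the plain alternative: any non-empty text. */
static int match_plain_text(const char *p, const char *pe, em_attrs *out) {
    if (p == pe) return 0;
    copy_span(out->text, sizeof out->text, p, pe);
    return 1;
}

/* Try each matcher in priority order. A matcher writes into a scratch record
 * that is copied to *out only on success, so a failed attempt leaves nothing
 * behind. Returns the index of the matcher that won, or -1 if none matched. */
int parse_em_contents(const char *buf, size_t len, em_attrs *out) {
    static const em_matcher matchers[] = { match_attrs_then_text, match_plain_text };
    size_t i;
    for (i = 0; i < sizeof matchers / sizeof matchers[0]; i++) {
        em_attrs scratch;
        memset(&scratch, 0, sizeof scratch);
        if (matchers[i](buf, buf + len, &scratch)) {
            *out = scratch;
            return (int)i;
        }
    }
    return -1;
}
```

With this shape, the outer scanner's action would call `parse_em_contents` on the captured buffer and then store `cls`, `id` and `text` in the existing hash, so nothing extra has to be returned across machines. Because each attempt writes to a scratch record, a class captured by a failed first attempt never leaks into the result.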

On Jun 17, 2008, at 1:04 AM, Erich Ocean wrote:

>
> Jason,
>
> Basically, you need one-token lookahead for that, which a straight
> regex can't do. You might be able to use manual state transitions, but
> it's easier to just separate the logic out into another function.
>
> For example, you could capture the inner contents of the emphasized
> string into a buffer and then call into a C function where you can run
> the three (Ragel) machines in sequence, in the priority order you've
> set. Use the first one that works and return into your original
> scanner machine/function. Order them longest-match-first. :-)
>
> Best, Erich
>
> On Jun 16, 2008, at 8:15 PM, Jason Garber wrote:
>
>> In RedCloth, we have a problem where an _emphasized_ bit of text can
>> have _(myclass#myid)a CSS class and/or id_ but shouldn't have a
>> class or id if the whole emphasis is in parentheses _(practically
>> speaking)_.  Consider this example (simplified):
>>
>> in: "before _(in parens)_ after"
>> expected: "<p>before <em>(in parens)</em> after</p>"
>> but was: "<p>before <em class=\"in parens\">in parens)</em> after</p>"
>> It simultaneously pursues the possibilities that the parenthesized
>> text is the class and that it's just regular parenthesized text
>> inside the em.  When the class possibility doesn't work out, the
>> final state is the regular text part but the class has already been
>> captured.
>>
>> C = "(" ( [^)#]+ >A %{ STORE(class) } )? ("#" [^)]+ >A %{STORE(id)} )? ")" ;
>> mtext = ( chars (mspace chars)* ) ;
>> em = "_" >X C? mtext >A %T :> "_" ;
>> # The >X resets the register from the last match, >A registers the
>> # beginning of a string and the STORE saves it away.
>>
>> I tried having the class info get written to a buffer that was then
>> captured with a leaving action, which works for the class part, but
>> I again run into the same problem with capturing the text because
>> the right side of the union matches also, so it captures too much
>> text.  Whichever side of the vertical pipe writes last, wins.
>>
>> C = "(" ( [^)#]+ >A %{ STORE(class_buf) } )? ("#" [^)]+ >A %{STORE(id_buf)} )? ")" ;
>> C_mtext = (C %{ STORE_ATTRIBUTES(); } mtext >A %{STORE(text)} |
>> mtext >B %{ STORE_B(text); });
>> # STORE_ATTRIBUTES copies the attributes from their buffers and stores
>> # them where they belong.
>>
>> results in...
>> expected: "<p><span class=\"myclass\">SPAN</span></p>"
>> but was: "<p><span class=\"myclass\">(myclass)SPAN</span></p>"
>>
>> Really what I want is for it to try the first pattern (with the C)
>> and, if that fails, try the second (the plain ol' mtext).  Sounds
>> like a scanner to me.  Problem is, I'm already inside a scanner, so
>> it won't let me call a scanner from within a pattern.
>>
>> Got any ideas?
>>
>> Jason


