[ragel-users] Re: Scanners inside of scanners? A better way to avoid conflicts?

Jason Garber j... at jasongarber.com
Wed Jun 18 12:56:13 UTC 2008


Regarding my question about how to call different machines from the
same file, I found a June 7 post on this list that answers it:

> You can pick and choose which machine to start when exec is called by
> setting the cs variable to [machine name]_en_[machine definition].
> e.g:
>
> int cs;
> %%{
>   machine foo;
>   bar := 'bar';
>   baz := 'baz';
> }%%
>
> /* write data emits the foo_en_bar and foo_en_baz constants */
> %%write data;
> %%write init;
> cs = (condition) ? foo_en_bar : foo_en_baz;
> %%write exec;
>
> Take care,
> -Mitchell;
>
Since I'm doing several kinds of attribute parsing, I had to start a
new file anyway, but now all of those machines can live in that one
file and share the same basic structure.
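
A minimal sketch of that structure in plain C, following Erich's
suggestion quoted below (capture the inner contents into a buffer, then
try the alternatives in priority order and use the first that works).
Everything named here is hypothetical: em_attrs, match_with_attrs,
match_plain and parse_em_inner are hand-written stand-ins, not RedCloth
or Ragel-generated code. In a real build each matcher would presumably
wrap a Ragel machine entered through its foo_en_* entry point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Parsed pieces of the text between the two underscores. */
typedef struct {
    char cls[64];   /* CSS class, "" if none */
    char id[64];    /* CSS id, "" if none */
    char text[256]; /* the emphasized text itself */
} em_attrs;

/* Copy len bytes into dst, truncating to fit, always NUL-terminated. */
static void copy_span(char *dst, size_t cap, const char *src, size_t len) {
    assert(cap > 0);
    if (len >= cap) len = cap - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* First choice: "(class#id)text". Fails, storing nothing, unless some
 * text follows the ")". That is the case the original grammar got
 * wrong: the class was stored before the match was known to succeed. */
static int match_with_attrs(const char *s, size_t len, em_attrs *out) {
    if (len == 0 || s[0] != '(') return 0;
    const char *close = memchr(s, ')', len);
    if (close == NULL) return 0;
    size_t rest = len - (size_t)(close + 1 - s);
    if (rest == 0) return 0; /* the whole span was parenthesized */

    const char *hash = memchr(s + 1, '#', (size_t)(close - (s + 1)));
    const char *cls_end = hash ? hash : close;
    copy_span(out->cls, sizeof out->cls, s + 1, (size_t)(cls_end - (s + 1)));
    if (hash)
        copy_span(out->id, sizeof out->id, hash + 1,
                  (size_t)(close - (hash + 1)));
    else
        out->id[0] = '\0';
    copy_span(out->text, sizeof out->text, close + 1, rest);
    return 1;
}

/* Fallback: any non-empty span is plain text with no attributes. */
static int match_plain(const char *s, size_t len, em_attrs *out) {
    if (len == 0) return 0;
    out->cls[0] = '\0';
    out->id[0] = '\0';
    copy_span(out->text, sizeof out->text, s, len);
    return 1;
}

/* Try each matcher in priority order; the first success wins. */
int parse_em_inner(const char *s, size_t len, em_attrs *out) {
    typedef int (*matcher)(const char *, size_t, em_attrs *);
    static const matcher in_priority_order[] = {
        match_with_attrs, match_plain
    };
    size_t n = sizeof in_priority_order / sizeof in_priority_order[0];
    for (size_t i = 0; i < n; i++)
        if (in_priority_order[i](s, len, out)) return 1;
    return 0;
}
```

The point of the sketch is only the ordering: a matcher commits its
captures after it knows it has succeeded, so a failed first choice
leaves nothing behind for the fallback to trip over.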

On Jun 17, 2008, at 1:04 AM, Erich Ocean wrote:

>
> Jason,
>
> Basically, you need one-token lookahead for that, which a straight
> regex can't do. You might be able to use manual state transitions, but
> it's easier to just separate the logic out into another function.
>
> For example, you could capture the inner contents of the emphasized
> string into a buffer and then call into a C function where you can run
> the three (Ragel) machines in sequence, in the priority order you've
> set. Use the first one that works and return into your original
> scanner machine/function. Order them longest-match-first. :-)
>
> Best, Erich
>
> On Jun 16, 2008, at 8:15 PM, Jason Garber wrote:
>
>> In RedCloth, we have a problem where an _emphasized_ bit of text can
>> have _(myclass#myid)a CSS class and/or id_ but shouldn't have a
>> class or id if the whole emphasis is in parentheses _(practically
>> speaking)_.  Consider this example (simplified):
>>
>> in: "before _(in parens)_ after"
>> expected: "<p>before <em>(in parens)</em> after</p>"
>> but was: "<p>before <em class=\"in parens\">in parens)</em> after</p>"
>>
>> It simultaneously pursues the possibilities that the parenthesized
>> text is the class and that it's just regular parenthesized text
>> inside the em.  When the class possibility doesn't work out, the
>> final state is the regular text part but the class has already been
>> captured.
>>
>> C = "(" ( [^)#]+ >A %{ STORE(class) } )?
>>     ( "#" [^)]+ >A %{ STORE(id) } )? ")" ;
>> mtext = ( chars (mspace chars)* ) ;
>> em = "_" >X C? mtext >A %T :> "_" ;
>> # The >X resets the register from the last match, >A registers the
>> # beginning of a string, and STORE saves it away.
>>
>> I tried having the class info get written to a buffer that was then
>> captured with a leaving action, which works for the class part, but
>> I again run into the same problem with capturing the text because
>> the right side of the union matches also, so it captures too much
>> text.  Whichever side of the vertical pipe writes last, wins.
>>
>> C = "(" ( [^)#]+ >A %{ STORE(class_buf) } )?
>>     ( "#" [^)]+ >A %{ STORE(id_buf) } )? ")" ;
>> C_mtext = (C %{ STORE_ATTRIBUTES(); } mtext >A %{STORE(text)} |
>> mtext >B %{ STORE_B(text); });
>> # STORE_ATTRIBUTES copies the attributes from their buffers and
>> # stores them where they belong.
>>
>> results in...
>> expected: "<p><span class=\"myclass\">SPAN</span></p>"
>> but was: "<p><span class=\"myclass\">(myclass)SPAN</span></p>"
>>
>> Really what I want is for it to try the first pattern (with the C)
>> and, if that fails, try the second (the plain ol' mtext).  Sounds
>> like a scanner to me.  Problem is, I'm already inside a scanner, so
>> it won't let me call a scanner from within a pattern.
>>
>> Got any ideas?
>>
>> Jason
