[ragel-users] Re: Feature Request: Inline Scanner

Carlos Antunes cmantu... at gmail.com
Sat Nov 4 21:24:19 UTC 2006


On 11/4/06, Adrian Thurston <thurs... at cs.queensu.ca> wrote:
>
> But I still think containment could be useful. Maybe you'd want to have one
> markup for the whole email address and other markups which give you the user
> and host names.
>

Sure, you could do that but, at the same time, you could also do it
sequentially with pretty much the same end result given that string
concatenation is a pretty simple thing to do. Personally, I don't feel
the need for capture within capture although it could prove useful in
certain contexts.

>
> > Hmm, keep a global variable (ex: alltokstarts)? This 'alltokstarts'
> > var could be defined as min(tokstart, ts1, ts2, ts3, ...).
>
> I wonder if maintaining this could be made fast even when the number of
> variables grows.
>

Hmm, 'alltokstarts' could be updated at the beginning of each capture
with something like min(alltokstarts, ts(n)), no? This would scale
well.
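
A rough C sketch of that update, only to illustrate the idea. The names
capture_state and capture_begin are made up for this example and are not
part of Ragel or its generated code. The cost per capture start is one
comparison, no matter how many ts variables exist. What should happen to
'alltokstarts' when a capture ends is left open here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping, not part of Ragel's generated code. */
typedef struct {
    /* Earliest start among the captures seen so far; NULL if none yet. */
    const char *alltokstarts;
} capture_state;

/* Call when capture n begins, passing its start pointer ts(n).
 * Implements alltokstarts = min(alltokstarts, ts(n)) in O(1). */
static void capture_begin(capture_state *cs, const char *ts_n)
{
    assert(cs != NULL && ts_n != NULL);
    if (cs->alltokstarts == NULL || ts_n < cs->alltokstarts)
        cs->alltokstarts = ts_n;
}
```

When the input buffer is shifted, everything from 'alltokstarts' onward
would be the part to preserve.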

>
> > From the point of view of the FSM, the inline scanner would be a
> > virtual state. Transitions to this virtual state would happen if and
> > only if at least one of the inline scanner patterns matches. If there
> > is no possible match then the FSM would error.
>
> You'll have to bear with me here, I can be thick sometimes!
>

From my point of view you are the expert here. Therefore, if you don't
understand what I'm saying, the blame is totally on me! :-)

>
> From what you're saying it seems like it's not really a scanner
>

No, it's not like a regular scanner that keeps repeatedly trying to
match any of the expressions. I guess I should rename my proposed
'inline scanner' to 'longest-match capture'.

>
> but more like a union because if it finishes when it matches a pattern
> then it won't ever match more than one. Is that right?
>

Well, union with a twist. For example, with:

|> patternA => actionA; patternB => actionB; <|

Once patternA or patternB matches (the longest match wins and, on a tie,
the pattern listed first wins, as with a regular scanner), the capture
machine is done.
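
To make the selection rule concrete, here is a small C sketch. It is an
illustration under simplifying assumptions, not how Ragel would implement
it: each pattern is reduced to a literal prefix, and the function name
longest_match_capture is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Picks the alternative that matches the longest prefix of input.
 * On equal lengths the alternative listed first wins.
 * Returns the index of the winner, or -1 if nothing matches.
 * Patterns are plain literals here, a stand-in for real expressions. */
static int longest_match_capture(const char *const *alts, size_t n_alts,
                                 const char *input, size_t *match_len)
{
    int winner = -1;
    size_t best = 0;

    assert(alts != NULL && input != NULL && match_len != NULL);
    for (size_t i = 0; i < n_alts; i++) {
        size_t len = strlen(alts[i]);
        if (strncmp(input, alts[i], len) != 0)
            continue;               /* this alternative does not match */
        if (winner < 0 || len > best) {
            winner = (int)i;        /* strictly longer, so earlier wins ties */
            best = len;
        }
    }
    *match_len = best;
    return winner;                  /* capture is done after one match */
}
```

With the alternatives "ab", "abc", "ab" and the input "abcd", the second
alternative wins with length 3. With the input "abx", the first one wins
over the identical third one.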

>
> If that's the case then it seems like the criteria for it starting is
> the same as for it finishing.
>

Hmm, not sure I'm following you here. In any case, after I emailed the
list yesterday, I thought a little bit more about using state
embeddings to emulate this functionality and ended up concluding that
it was probably rubbish. But I thought that the state chart paradigm
could be used to illustrate the basic idea. For example, with an
expression like:

pattern = patA |> patB1 => actionB1; patB2 => actionB2; <| patC;

one could have a state chart like so:

pattern = (

    start: ( patA -> matched_patA ),

    matched_patA: ( |> patB1 => actionB1; patB2 => actionB2; <| -> matched_patB ),

    matched_patB: ( patC -> final )

);
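
The same chart, hand-rolled in C as a rough sketch. Everything here is
an assumption made for illustration: patterns are literal strings, the
names run_chart and match_prefix are invented, and the capture step picks
the longest alternative greedily with no backtracking, which a real FSM
might handle differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum chart_state { START, MATCHED_PATA, MATCHED_PATB, FINAL, ERROR };

/* Length of lit if input begins with it, otherwise 0.
 * An empty literal therefore counts as "no match" in this sketch. */
static size_t match_prefix(const char *input, const char *lit)
{
    size_t n = strlen(lit);
    return strncmp(input, lit, n) == 0 ? n : 0;
}

/* Walks start -> matched_patA -> matched_patB -> final.
 * Returns 1 if final is reached, 0 on error.
 * *fired is set to 1 or 2 once the capture step runs actionB1 or
 * actionB2, and stays 0 if the capture step is never completed. */
static int run_chart(const char *p, const char *patA, const char *patB1,
                     const char *patB2, const char *patC, int *fired)
{
    enum chart_state st = START;
    size_t n1, n2;

    assert(p != NULL && fired != NULL);
    *fired = 0;
    while (st != FINAL && st != ERROR) {
        switch (st) {
        case START:
            n1 = match_prefix(p, patA);
            st = n1 ? MATCHED_PATA : ERROR;
            p += n1;
            break;
        case MATCHED_PATA:          /* the |> ... <| capture step */
            n1 = match_prefix(p, patB1);
            n2 = match_prefix(p, patB2);
            if (n1 == 0 && n2 == 0) {
                st = ERROR;         /* no alternative can match */
                break;
            }
            *fired = (n2 > n1) ? 2 : 1;   /* longest wins, first on ties */
            p += (n2 > n1) ? n2 : n1;
            st = MATCHED_PATB;
            break;
        case MATCHED_PATB:
            st = match_prefix(p, patC) ? FINAL : ERROR;
            break;
        default:
            st = ERROR;
            break;
        }
    }
    return st == FINAL;
}
```

For instance, with patA "k=", patB1 "1", patB2 "12" and patC ";", the
input "k=12;" reaches final via actionB2, while "k=1;" reaches final via
actionB1 and "k=9;" ends in error.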

Another thing to consider is whether my initial proposal of strictly
relying on longest-match for capture makes sense. Maybe the programmer
should have a choice?

So, do you think this is something you'd be willing to implement? :-)

Thanks!

Carlos

-- 
"We hold [...] that all men are created equal; that they are
endowed [...] with certain inalienable rights; that among
these are life, liberty, and the pursuit of happiness"
        -- Thomas Jefferson
