Feature Request: Inline Scanner
cmantu... at gmail.com
Fri Nov 3 21:00:58 UTC 2006
On 11/2/06, Adrian Thurston <thurs... at cs.queensu.ca> wrote:
> It seems to me that there are actually two separate features here. One being
> inline scanners and the other being automatic capture/markup of text.
Agreed. The reason I combined the two is that I'm convinced scanners
end up being the "better" ones at capturing text, mostly given their
longest-match paradigm and their natural backtracking.
> Consider that automatic capture/markup could be implemented on arbitrary
> machine definitions and need not be associated with scanners. Scanners
> always do automatic capture by default because the scanner may require
> backtracking up to at most the head of the current pattern. This is solved
> by marking the head of the current pattern so the safety of the backtrack
> can be guaranteed. The pattern markup is more like a bonus.
You are right. As I said above, it is because of this bonus that
scanners end up being natural candidates for capture, IMO.
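To make the point concrete, here is a rough sketch of longest-match scanning with a marked token head. It is plain Python, not Ragel output; the function name `scan` and the restriction to literal patterns are assumptions made for illustration only. The marker `ts` plays the role of the head of the current pattern, and `te` is the last position where some pattern accepted, so a backtrack never has to go earlier than `ts`.

```python
def scan(data, patterns):
    """Longest-match scanner over literal patterns (hypothetical helper)."""
    tokens = []
    ts = 0  # marked head of the current token
    while ts < len(data):
        te = None  # end of the longest match seen so far
        p = ts
        # Advance while the text read so far is still a prefix of some pattern.
        while p < len(data) and any(
            pat.startswith(data[ts:p + 1]) for pat in patterns
        ):
            p += 1
            if data[ts:p] in patterns:
                te = p  # remember the last accepting position
        if te is None:
            raise ValueError(f"no pattern matches at offset {ts}")
        # Backtrack: p may have run past te, but te is never before ts,
        # so the captured text data[ts:te] comes for free.
        tokens.append(data[ts:te])
        ts = te
    return tokens
```

With the patterns `a`, `b` and `abc`, scanning `ababc` first reads `ab` hoping for `abc`, fails on the second `a`, and backtracks to emit `a` alone. The capture falls out of the same two markers that make the backtrack safe.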
> If you use new variables, this allows machines that you capture to overlap
> or be contained in one another.
Yes, but is it really useful to have this kind of overlapping or
containment, in practical terms?
> But then the question arises, how do you know where to preserve the
> input from when you're breaking the stream into buffer blocks?
Hmm, keep a global variable (e.g. alltokstarts)? This 'alltokstarts'
var could be defined as min(tokstart, ts1, ts2, ts3, ...).
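A minimal sketch of that idea, in Python for brevity. The helper `shift_buffer`, the use of `None` for a marker that is not currently active, and the `bytearray` buffer are all assumptions of mine, not anything Ragel provides. Before reading the next block, everything from the earliest active marker onward is moved to the front of the buffer and the markers are rebased.

```python
def alltokstarts(tokstart, *capture_starts):
    """Earliest offset still needed; None marks an inactive marker."""
    active = [m for m in (tokstart, *capture_starts) if m is not None]
    return min(active) if active else None


def shift_buffer(buf, have, markers):
    """Slide the still-needed tail of buf[:have] to the front of buf.

    markers[0] is tokstart, the rest are capture starts (ts1, ts2, ...).
    Returns (new_have, rebased_markers).
    """
    keep = alltokstarts(*markers)
    if keep is None:
        # Nothing is being captured, so the whole block can be dropped.
        return 0, list(markers)
    tail = buf[keep:have]  # slicing a bytearray copies, so no aliasing
    buf[:len(tail)] = tail
    return len(tail), [None if m is None else m - keep for m in markers]
```

The next read would then fill the buffer starting at offset `new_have`. The cost is that one long-lived capture pins everything after it in the buffer.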
> With inline scanners there are a few questions also: What constitutes "would
> begin the machine?" Since there can be a number of patterns in a scanner, is
> it any pattern at all? Or is it a specific pattern.
From the point of view of the FSM, the inline scanner would be a
virtual state. Transitions to this virtual state would happen if and
only if at least one of the inline scanner patterns matches. If there
is no possible match then the FSM would error.
> On the other end what constitutes "finishing the scanner?" Again, any
> pattern at all? I'm not sure about the answers to these questions.
Matching at least one of the patterns specified. The corresponding
action would execute and a transition from the virtual state to the
next state would follow regular FSM rules.
I guess that one way of looking at these virtual states and their
associated inline scanning machinery is to think of them as regular
states with appropriately embedded state actions, no?
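Roughly what I have in mind, sketched in Python rather than Ragel syntax. Every name here (`run_inline_scanner`, `parse`, the three patterns, the surrounding `x=` ... `;` machine) is made up for illustration, and the regex-based matching is only a stand-in for what generated code would do. The virtual state is entered only if at least one pattern matches, the longest match wins, its action runs, and then the outer machine carries on under its regular rules.

```python
import re


def run_inline_scanner(patterns, data, pos):
    """Virtual state: try every (regex, action) pair at pos, run the action
    of the longest match and return the position just after it."""
    best = None
    for regex, action in patterns:
        m = re.compile(regex).match(data, pos)
        if m and (best is None or m.end() > best[0].end()):
            best = (m, action)
    if best is None:
        # No possible match: the enclosing FSM goes to its error state.
        raise ValueError(f"inline scanner: no pattern matches at {pos}")
    m, action = best
    action(m.group())
    return m.end()


def parse(data):
    """Outer machine: literal 'x=', then the inline scanner, then ';'."""
    seen = []
    patterns = [
        (r"[0-9]+", lambda text: seen.append(("int", text))),
        (r"[0-9]+\.[0-9]+", lambda text: seen.append(("float", text))),
        (r"[a-z]+", lambda text: seen.append(("word", text))),
    ]
    if not data.startswith("x="):  # ordinary state before the scanner
        raise ValueError("expected 'x='")
    pos = run_inline_scanner(patterns, data, 2)  # the virtual state
    if data[pos:] != ";":  # ordinary state after the scanner
        raise ValueError("expected ';'")
    return seen
```

For `x=3.14;` both the int and the float pattern match at the scanner's entry point, and the float one wins because it is longer.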
"We hold [...] that all men are created equal; that they are
endowed [...] with certain inalienable rights; that among
these are life, liberty, and the pursuit of happiness"
-- Thomas Jefferson