[ragel-users] Re: Inline scanner

Carlos Antunes cmantu... at gmail.com
Thu Jul 5 16:00:33 UTC 2007

On 7/5/07, Adrian Thurston <thurs... at cs.queensu.ca> wrote:
> I'm not yet convinced that a new feature is necessary to solve this
> problem. It seems to me that it's more a matter of coding technique. But
> of course I could be wrong ... I just need to know how an inline scanner
> is different and better than the code I sent.

Adrian, the code you sent, while very useful in my case, is a hack!
This preprocessing code is necessary because ragel is not able to
efficiently handle a SIP parser in practical terms, with only

Let me give you an example of something very common in the way SIP is defined:

    algorithm_value = /MD5/i @(p_algorithm_value, 1) |
                      /MD5-sess/i @(p_algorithm_value, 1) |
                      token @(p_algorithm_value, 0);

I have to use priorities here to make sure that /MD5/ and /MD5-sess/
aren't evaluated in parallel with token, given that token matches
(almost) everything. With a scanner, this would be a breeze and would
require no priorities.
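
To make the scanner behaviour concrete, here is a minimal Python model of it (a sketch, not Ragel code). It assumes the lex-style rule that the longest match wins and that, on a tie, the pattern listed first is preferred. The names PATTERNS and classify are made up for illustration, and the token character set follows my reading of the SIP "token" rule.

```python
import re

# Stand-in for the SIP "token" rule (assumption: alphanumerics plus
# the punctuation characters - . ! % * _ + ` ' ~).
TOKEN = r"[A-Za-z0-9\-.!%*_+`'~]+"

# Patterns in the order they would be listed inside the scanner.
PATTERNS = [
    ("md5", re.compile(r"MD5", re.IGNORECASE)),
    ("md5_sess", re.compile(r"MD5-sess", re.IGNORECASE)),
    ("token", re.compile(TOKEN)),
]


def classify(text, pos=0):
    """Return (name, lexeme) for the longest match starting at pos.

    On a tie in length, the pattern listed first wins. Returns None
    if no pattern matches at pos.
    """
    best = None
    for name, pattern in PATTERNS:
        m = pattern.match(text, pos)
        # Strictly longer only, so an earlier pattern keeps a tie.
        if m and (best is None or m.end() > best[1].end()):
            best = (name, m)
    if best is None:
        return None
    return best[0], best[1].group()
```

In this model "MD5" and "MD5-sess" are picked out by their own patterns even though token also matches them at the same length, while something like "MD5x" falls through to token because that match is longer. No priorities are involved; the ordering of the patterns does the work.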

Right now, I have:

A)    algorithm = /algorithm/i equal algorithm_value;

Currently, to code with a scanner, I have to do something like this:

B)    algorithm = /algorithm/i sp_optional '=' @{ fcall scan_algorithm_value; };

However, this breaks the "flow" of the grammar. Which is easier to
understand and maintain (and to keep consistent with SIP, in this
case), 'A' or 'B'? I prefer 'A'.

> One difference I can think of (you described this previously) is that
> the inline scanner is entered immediately upon moving to the start state
> (as opposed to the first character out of the start state).
> Unfortunately this is not compatible with the current run-time model, in
> which actions take place only on transitions over characters. Anything
> that involves changing the run-time model I have to consider very carefully.

Assuming I understand the idea you are trying to convey here (there's
a good chance that I don't), isn't it possible to transition directly
to the scanner by looking at all the potential transitions of the
scanner as if it were a "normal" state machine?

For example, let's assume one has the machine:

C)    variable '=' digit+;

With an inline scanner, this would be, for example:

D)    variable '=' |* digit+ *|;

In 'C', you transition from state "matched_equal" to state
"matched_digit" upon seeing a digit. Now, you could do the same in
'D', no? For all practical purposes, the transition process appears to
be the same in both 'C' and 'D'. The fact that in 'D' you would be
operating under the "longest match with backtracking" paradigm does
not appear to affect that initial transition into the scanner. What I
am trying to get at is that maybe you don't really have to change your
run-time model?
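
A toy illustration of what I mean, written as a hand-coded state machine in Python rather than as Ragel output. It is only a sketch under my own assumptions: the state names are the hypothetical labels used above, the run function and its inline_scanner flag are invented for this example, and "variable" is simplified to a run of letters. With a single pattern there is nothing to backtrack over, so the model only shows where the token boundaries get recorded.

```python
def run(text, inline_scanner=False):
    """Walk `variable '=' digit+` and return (transitions, tokens).

    transitions is a list of (from_state, char, to_state) tuples.
    tokens holds the lexeme recorded by the scanner variant, if any.
    """
    state = "start"
    transitions = []
    tokens = []
    token_start = None
    token_end = None

    for i, ch in enumerate(text):
        if state in ("start", "in_variable") and ch.isalpha():
            nxt = "in_variable"
        elif state == "in_variable" and ch == "=":
            nxt = "matched_equal"
        elif state in ("matched_equal", "matched_digit") and ch.isdigit():
            nxt = "matched_digit"
            if inline_scanner:
                # Extra bookkeeping in the scanner variant only: mark
                # the token start on the first digit, extend the end
                # on every digit.
                if state == "matched_equal":
                    token_start = i
                token_end = i + 1
        else:
            break  # no transition on this character: stop here
        transitions.append((state, ch, nxt))
        state = nxt

    if token_start is not None:
        tokens.append(text[token_start:token_end])
    return transitions, tokens
```

Calling run("x=42") and run("x=42", inline_scanner=True) yields the same list of transitions, including the step from "matched_equal" to "matched_digit" on the first digit. The scanner variant differs only in the token start and end it records along the way, which is the point of the question above.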



"We hold [...] that all men are created equal; that they are
endowed [...] with certain inalienable rights; that among
these are life, liberty, and the pursuit of happiness"
        -- Thomas Jefferson
