[ragel-users] Re: Inline scanner

Carlos Antunes cmantu... at gmail.com
Sun Jul 1 20:17:52 UTC 2007


Adrian,

I have been writing a SIP parser using Ragel. The main practical
problem I have run into while implementing it has to do with spaces.

In SIP, a space can be defined as:

sp = ( ( "\r"? "\n" )? [ \t] )+;

On the other hand, a CRLF can be defined as

crlf = "\r"? "\n";

In many cases, we have lines that may end up as:

line = "start" whatever sp? crlf;

with:

whatever = ( sp? something )*;

Now, without priorities, this just doesn't work because of the
backtracking necessary to decide between an sp and a crlf, since both
can begin with the same "\r"? "\n". Also, without priorities, whatever
ends up being evaluated several times because something might start
with spaces (or not). With priorities, I was able to make many of
these things work, but the number of states just explodes beyond
belief. The problem is that it takes 30 minutes to compile each time I
make a modification, and I am not done with all the SIP rules yet.
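
To make the sp/crlf overlap concrete, here is a minimal sketch in
Python (not Ragel) that transcribes the two definitions above as
regular expressions and reports how far each one gets on the same
input. The helper names match_len and after_newline are made up for
this illustration, and Python's backtracking regex engine only stands
in for the state machine Ragel would build.

```python
import re

# Direct transcriptions of the two rules from the message.
SP = re.compile(r"(?:(?:\r?\n)?[ \t])+")   # sp   = ( ( "\r"? "\n" )? [ \t] )+
CRLF = re.compile(r"\r?\n")                # crlf = "\r"? "\n"


def match_len(pattern, text):
    """Number of characters the pattern consumes at the start of text (0 if none)."""
    m = pattern.match(text)
    return m.end() if m else 0


def after_newline(text):
    """Hypothetical helper: how far sp and crlf each get on the same input."""
    return {"sp": match_len(SP, text), "crlf": match_len(CRLF, text)}


if __name__ == "__main__":
    # Line break followed by a space: a folded line, so sp wins.
    print(after_newline("\r\n next"))
    # Line break followed by text: a real end of line, so only crlf matches.
    print(after_newline("\r\nnext"))
```

Both rules consume the same "\r\n" first; only the character after the
line break tells them apart, which is the decision the priorities are
being used to force.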

The bottom line is that the SIP grammar is tough. Ragel has a good
shot at implementing the thing but the pure FSM approach just doesn't
cut it. On the other hand, because inline longest match scanners
aren't available, one has to resort to ugly tricks to call external
scanners.

How would one do the above example with an inline scanner?

sp_optional = |*

  ( ( "\r"? "\n" )? [ \t] )* { fret; };

*|;

line = "start" whatever sp_optional crlf;

whatever = ( sp_optional something )*;

The obvious advantage in this case is that the longest match feature
of the scanner will eliminate the need for priorities and FSM
backtracking; therefore, I posit, the FSM would be simpler and it
wouldn't take 30 minutes to compile as it does now! :)
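
As a rough illustration of the intended behaviour (again in Python,
not Ragel, and not a claim about how Ragel would implement it), the
sketch below consumes sp_optional with a single longest match and only
then looks for something or crlf. The SOMETHING pattern is a
hypothetical stand-in, since the real rule is not given here, and
parse_line is a name invented for this example.

```python
import re

# sp_optional = ( ( "\r"? "\n" )? [ \t] )*  consumed as one longest match.
SP_OPTIONAL = re.compile(r"(?:(?:\r?\n)?[ \t])*")
CRLF = re.compile(r"\r?\n")
# Assumption: "something" is just a run of lowercase letters in this sketch.
SOMETHING = re.compile(r"[a-z]+")


def parse_line(data, pos=0):
    """Parse: "start" ( sp_optional something )* sp_optional crlf.

    Returns (tokens, end_position) on success, or None on failure.
    """
    if not data.startswith("start", pos):
        return None
    pos += len("start")
    tokens = []
    while True:
        # The pattern can match the empty string, so match() never returns None.
        pos = SP_OPTIONAL.match(data, pos).end()
        m = SOMETHING.match(data, pos)
        if m is None:
            break
        tokens.append(m.group())
        pos = m.end()
    # Whitespace is already consumed, so the next thing must be the line end.
    m = CRLF.match(data, pos)
    if m is None:
        return None
    return tokens, m.end()


if __name__ == "__main__":
    # "foo" and "bar" are separated by a folded line ("\r\n" plus a space).
    print(parse_line("start foo\r\n bar \r\nnext"))
```

Because the whitespace is swallowed in one step, the folded line break
after "foo" is never confused with the terminating crlf, and no
priorities are needed to choose between them.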

I'll leave the capture feature for a subsequent email.

Thanks!

Carlos



On 7/1/07, Adrian Thurston <thurs... at cs.queensu.ca> wrote:
> Hi Carlos,
>
> Yeah, I remember that conversation. I thought about those issues and a
> clear design with good value and a nice implementation never became
> apparent to me. Since discussions didn't prove fruitful last time I
> think a mockup would be really helpful to hash out the idea. An example
> of the new syntax used to solve a real problem paired with an
> implementation using the existing syntax (the classic before and after)
> would really make things clear.
>
> Cheers,
>  Adrian
>
> Carlos Antunes wrote:
> > Hi Adrian,
> >
> > It's been a while...
> >
> > It's good to know you are still supporting Ragel. With that being
> > said, is there any chance of having "inline scanner" functionality
> > added? We discussed this quite some time ago. The idea is to have a
> > "longest match with capture" of start and stop of the match inline
> > instead of having to rely on external scanners.
> >
> > Alternatively, what about the ability to automatically jump from a
> > state to a scanner without needing a match followed by an fcall?
> >
> > Thanks!
> >
> > Carlos
> >
> >
> > >
>
>


-- 
"We hold [...] that all men are created equal; that they are
endowed [...] with certain inalienable rights; that among
these are life, liberty, and the pursuit of happiness"
        -- Thomas Jefferson


