Possible bug in scanner

David Balmain dbalm... at gmail.com
Mon Apr 21 08:35:09 UTC 2008


On Apr 21, 9:44 am, Adrian Thurston <thurs... at cs.queensu.ca> wrote:
> I just checked in a fix for this.
>
> This bug affects -T0 and -T1 for all target languages. The problem
> occurs when the last token of a scanner requires some backtracking to match.

Hi Adrian,

Thanks for getting on to this so quickly. Unfortunately it doesn't
seem to have fixed the problem. I'm actually using -G2 as well as the
default (-T0, I believe) and the error occurs in both cases. I've
investigated this a little more and found a workaround:

%%{
    machine Word;

    main := |*
        'a' {PUTS("a: ");};
        [ab]+ . 'c' {PUTS("abc: ");};
-        any;
+        any {};
    *|;

}%%

Note the empty braces after 'any'. With them I get this for what I
believe is the "longest match" switch:

            switch( act ) {//--
                case 1:
                    {{p = ((te))-1;}PUTS("a: ");}
                    break;
                case 3:
                    {{p = ((te))-1;}}
                    break;
                default: break;
            }

Without the empty braces, "case 3:" is missing:

            switch( act ) {//--
                case 1:
                    {{p = ((te))-1;}PUTS("a: ");}
                    break;
                default: break;
            }

So p never gets reset if ( [ab]+ . 'c' ) fails to match. This is
particularly bad if we are at the end of a string, as it will continue
to scan past the null byte, causing a segfault. I've tried to find
where the problem is in the source so I could give you a patch, but it
is taking me a little while to figure things out and I thought you'd
probably be able to fix this straight away. Let me know if there is
anything else I can do to help.
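
To illustrate why the missing reset matters, here is a minimal
hand-written sketch in C. It is not Ragel's generated code, only a
simplified model of the same three patterns. The names scan_token,
count_tokens and the with_any_reset flag are made up for this
illustration; the flag stands in for whether "case 3:" is present.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical action ids, mirroring the "case" labels in the switch. */
enum { ACT_A = 1, ACT_ABC = 2, ACT_ANY = 3 };

static int is_ab(char c) { return c == 'a' || c == 'b'; }

/*
 * Scan one token starting at ts and return the position at which the
 * next token should start. with_any_reset models whether the 'any'
 * pattern has a backtracking case (the "case 3:" above).
 */
static size_t scan_token(const char *s, size_t len, size_t ts,
                         int with_any_reset)
{
    assert(ts < len);

    size_t p = ts;
    /* The first character always matches 'any'; 'a' is listed first. */
    int act = (s[p] == 'a') ? ACT_A : ACT_ANY;
    int first_is_ab = is_ab(s[p]);
    p++;
    size_t te = p;              /* end of the last accepted token */

    if (first_is_ab) {
        /* Look ahead for the longer match [ab]+ 'c'. */
        while (p < len && is_ab(s[p]))
            p++;
        if (p < len && s[p] == 'c')
            return p + 1;       /* ACT_ABC: the longer match succeeded */
    }

    /* The longer match failed: p must go back to the accepted token. */
    if (act == ACT_A || with_any_reset)
        p = te;
    return p;                   /* without the reset, p stays ahead */
}

/* Count the tokens produced for a whole NUL-terminated string. */
static size_t count_tokens(const char *s, int with_any_reset)
{
    size_t len = strlen(s), ts = 0, n = 0;
    while (ts < len) {
        ts = scan_token(s, len, ts, with_any_reset);
        n++;
    }
    return n;
}
```

In this model, count_tokens("bbx", 1) yields three tokens, while
count_tokens("bbx", 0) yields only two because the second 'b' is
skipped after the failed lookahead. The sketch is bounds-checked, so
it only loses input; in the generated scanner the same missing reset
is what lets p run past the end of the string.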

Cheers,


