[ragel-users] Intermediate match wrongly assumed as valid

Daniel Beecham daniel at beecham.se
Sun Mar 1 04:59:38 EST 2020


The problem is that a finite state machine, having read the "3", cannot
know whether the input is "done" or not, only that it is in a certain state.
There are a few options that I know of:

* Add some "end-of-string" sentinel value (like a null terminator)
* Use first_final instead of state transition actions
* Use EOF actions

For the 'first_final' method, you would essentially do what is described
in the Ragel guide:

parser_parse(&parser, str, strlen(str));
if ( parser.cs < %%{ write first_final; }%% ) {
  printf("parsing failed\n");
}
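To make that check concrete without needing ragel installed, here is a minimal sketch in plain C. It hand-writes roughly the machine Ragel would produce for foo = "12345" | "123"; the names (struct parser, foo_next, FOO_FIRST_FINAL, foo_accepts) are hypothetical. The only Ragel property it leans on is the usual state layout: the error state is 0 and all final states are numbered at or above first_final.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hand-written stand-in for the code Ragel would generate for
 *   foo = "12345" | "123";
 * State 0 is the error state and every final state is numbered
 * >= FOO_FIRST_FINAL, mirroring how Ragel typically orders its states. */
enum { FOO_ERROR = 0, FOO_START = 1, FOO_FIRST_FINAL = 5 };

struct parser { int cs; };

static int foo_next(int cs, char c)
{
    switch (cs) {
    case 1: return c == '1' ? 2 : FOO_ERROR;
    case 2: return c == '2' ? 3 : FOO_ERROR;
    case 3: return c == '3' ? 5 : FOO_ERROR;
    case 5: return c == '4' ? 4 : FOO_ERROR; /* "123": final, may go on */
    case 4: return c == '5' ? 6 : FOO_ERROR; /* "1234": NOT final */
    default: return FOO_ERROR;               /* "12345" (6) and error */
    }
}

/* Consumes len bytes; may be called repeatedly with successive chunks. */
static void parser_parse(struct parser *parser, const char *data, size_t len)
{
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len && parser->cs != FOO_ERROR; i++)
        parser->cs = foo_next(parser->cs, data[i]);
}

/* The first_final check: only meaningful once ALL input has been fed. */
static bool foo_accepts(const char *str)
{
    struct parser parser = { FOO_START };
    parser_parse(&parser, str, strlen(str));
    return parser.cs >= FOO_FIRST_FINAL;
}
```

With this, "1234" is rejected: all four bytes are consumed, which is all that a `len == p - data` check looks at, but the machine stops in state 4, which is below first_final.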

The downside is that partial reads are no longer supported, for example
when
* reading larger data sets, like reading network logs from a file
* reading data over a network or some serial communication

A normal loop over such input might look like

char buf[BUFLEN];
ssize_t bytes_read = read(fd, buf, BUFLEN);
while (bytes_read > 0) {
  parser_parse(&parser, buf, bytes_read);
  bytes_read = read(fd, buf, BUFLEN);
}

and a given read might return exactly 3 characters, "123". You don't know
whether the next read will get you "45" or EOF, yet "cs" is already in a
final state.

For the first option, the sentinel, you would define a parser like

foo = ('123' 0) | ('12345' 0);

Then the finishing action would run on the null terminator. You would call
your parser like

parser_parse(&parser, str, strlen(str)+1);

or, when reading from a network, you could do

char buf[BUFLEN];
ssize_t bytes_read = read(fd, buf, BUFLEN);
while (bytes_read > 0) {
  parser_parse(&parser, buf, bytes_read);
  bytes_read = read(fd, buf, BUFLEN);
}
if (0 == bytes_read) {
  // reached EOF: feed the null terminator so the final action can run
  parser_parse(&parser, (char[]){0}, 1);
} else {
  // read() returned -1; inspect errno
}
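Here is a minimal sketch, in plain C, of what the sentinel grammar amounts to, again hand-writing the machine instead of running ragel (struct sent_parser, sent_next, sent_parse and the state numbers are hypothetical). The final state can only be entered on the NUL byte, so the input may arrive in any number of chunks and reaching that state really does mean the input ended.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hand-written equivalent of
 *   foo = ('123' 0) | ('12345' 0);
 * The only way into the final state (7) is by consuming the NUL sentinel. */
enum { SENT_ERROR = 0, SENT_START = 1, SENT_FINAL = 7 };

struct sent_parser { int cs; };

static int sent_next(int cs, char c)
{
    switch (cs) {
    case 1: return c == '1' ? 2 : SENT_ERROR;
    case 2: return c == '2' ? 3 : SENT_ERROR;
    case 3: return c == '3' ? 4 : SENT_ERROR;
    case 4: /* "123" so far: either the sentinel or more digits */
        if (c == '\0') return SENT_FINAL;
        return c == '4' ? 5 : SENT_ERROR;
    case 5: return c == '5' ? 6 : SENT_ERROR;
    case 6: return c == '\0' ? SENT_FINAL : SENT_ERROR;
    default: return SENT_ERROR; /* nothing may follow the sentinel */
    }
}

/* Safe to call once per chunk, so partial reads are fine. */
static void sent_parse(struct sent_parser *parser, const char *data, size_t len)
{
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len && parser->cs != SENT_ERROR; i++)
        parser->cs = sent_next(parser->cs, data[i]);
}
```

Feeding "12" and then "3" leaves cs short of SENT_FINAL; only the extra NUL byte sent on EOF moves it there, while "1234" followed by NUL lands in the error state.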

The downside is that adding null terminators to the parser reduces its
extensibility; it becomes harder to embed the parser as a "sub parser" of
another parser.

EOF actions are run when 'p == pe == eof'. They are essentially the same
as adding a null terminator to the parser, since you need to know in
advance that you've hit the EOF, but the action moves from a final state
transition to an EOF action. I haven't used EOF actions much because I find
them slightly weird to use, but someone else may be able to fill in the
details.
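For comparison, here is a rough C imitation of the EOF approach. It is not Ragel output, and the names (struct eof_parser, eofp_next, eof_parse, done) are hypothetical. The caller sets eof = pe only for the last chunk, which may be empty, and the "done" action runs only if the machine is in a final state when p == pe == eof. The grammar stays free of sentinels, and partial reads still work because being final in the middle of the stream triggers nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hand-written imitation (not Ragel output) of
 *   foo = "12345" | "123";
 * plus an EOF action on its final states. State 0 is the error state and
 * final states are numbered >= EOFP_FIRST_FINAL. */
enum { EOFP_ERROR = 0, EOFP_START = 1, EOFP_FIRST_FINAL = 5 };

struct eof_parser { int cs; bool done; };

static int eofp_next(int cs, char c)
{
    switch (cs) {
    case 1: return c == '1' ? 2 : EOFP_ERROR;
    case 2: return c == '2' ? 3 : EOFP_ERROR;
    case 3: return c == '3' ? 5 : EOFP_ERROR;
    case 5: return c == '4' ? 4 : EOFP_ERROR; /* "123": final */
    case 4: return c == '5' ? 6 : EOFP_ERROR; /* "1234": not final */
    default: return EOFP_ERROR;               /* "12345" (6) or error */
    }
}

/* p / pe / eof convention: eof is NULL for intermediate chunks and equals
 * pe for the last chunk (which may be empty). */
static void eof_parse(struct eof_parser *parser, const char *data, size_t len,
                      bool is_last)
{
    assert(data != NULL);
    const char *p = data, *pe = data + len;
    const char *eof = is_last ? pe : NULL;

    while (p != pe && parser->cs != EOFP_ERROR)
        parser->cs = eofp_next(parser->cs, *p++);

    /* EOF action: runs only when p == pe == eof and the state is final. */
    if (eof != NULL && p == eof && parser->cs >= EOFP_FIRST_FINAL)
        parser->done = true;
}
```

Feeding "12" and "3" as intermediate chunks leaves done false even though cs is final; a final empty chunk with is_last set then runs the action. "1234" passed as the last chunk never sets done.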

On Sat, Feb 29, 2020 at 1:05 AM Iñaki Baz Castillo <ibc at aliax.net> wrote:

> Hi,
>
> After many years using my Ragel based IPv6 parser, I've found a bug. I
> think I've also understood the problem and simplified the code as much
> as possible.
>
> Let's assume this simple grammar:
>
> --------------------
> foo = "12345" | "123";
> -------------------
>
> The parser.rl has a function that receives a char* data pointer and a
> size_t len. It includes the Ragel %% lines as usual. At the end of the
> function it checks:
>
> --------------------
> // Ensure that the parsing has consumed all the given length.
> if (len == p - data)
>   return true;
> else
>   return false;
> --------------------
>
> The problem is that, when the input is "1234", the parser returns true.
>
> I think I understand the problem:
>
> - The parser first matches "123" which is valid.
> - It continues and matches "1234".
> - At this time it has consumed 4 chars.
> - It exits now because there are no more chars in the input.
> - However it did match "123" so the Ragel action was executed.
>
> May I know how to avoid this problem and make the parser function
> return false in this case?
>
> Thanks a lot.
>
>
>
> --
> Iñaki Baz Castillo
> <ibc at aliax.net>
>
> _______________________________________________
> ragel-users mailing list
> ragel-users at colm.net
> http://www.colm.net/cgi-bin/mailman/listinfo/ragel-users
>

