[ragel-users] EOF actions and buffering

Adrian Thurston adrian.thurston at esentire.com
Thu Jan 20 23:43:47 UTC 2011


It is indeed a common problem. It is purposefully left in the hands of 
the programmer. This is due to a few factors.

-There is no general solution that doesn't involve memory allocation.
-There are many different ways to approach memory allocation.
-In many use cases the input always arrives in a single buffer.

When I need a general solution I use automatically growing buffers (to a 
limit). See the DSNPd source for an example. I don't think that buffer 
class is limited, but it should be.

http://svn.complang.org/choicesocial/trunk/dsnpd/parser.rl
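
For readers who cannot reach that source, here is a minimal sketch in C of
a buffer that grows automatically up to a limit. It is not taken from
DSNPd. The names GrowBuf, growbuf_init, growbuf_append, growbuf_clear and
growbuf_free are hypothetical, and so is the 64-byte starting capacity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer with a hard size limit (not from DSNPd). */
typedef struct {
    char  *data;   /* heap storage, NULL until the first append */
    size_t len;    /* bytes currently held */
    size_t cap;    /* bytes allocated */
    size_t limit;  /* the buffer never holds more than this many bytes */
} GrowBuf;

static void growbuf_init(GrowBuf *b, size_t limit)
{
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
    b->limit = limit;
}

/* Append n bytes from src. Returns 0 on success, -1 if the limit would be
 * exceeded or allocation fails. The buffer is unchanged on failure. */
static int growbuf_append(GrowBuf *b, const char *src, size_t n)
{
    assert(b->len <= b->limit);
    if (n == 0)
        return 0;
    if (n > b->limit - b->len)
        return -1;

    size_t need = b->len + n;
    if (need > b->cap) {
        /* Double the capacity until it fits, clamped to the limit. */
        size_t cap = b->cap ? b->cap : 64;
        while (cap < need)
            cap = (cap > b->limit / 2) ? b->limit : cap * 2;
        if (cap > b->limit)
            cap = b->limit;

        char *p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len = need;
    return 0;
}

/* Forget the contents but keep the allocation for the next token. */
static void growbuf_clear(GrowBuf *b)
{
    b->len = 0;
}

static void growbuf_free(GrowBuf *b)
{
    free(b->data);
    growbuf_init(b, b->limit);
}
```

A parser action would append the bytes of a token that is still open when
the input block ends, then clear the buffer once the token completes. A
failed append is the point at which to reject oversized input.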

There are other approaches though. See the Ragel manual (5.9) for a 
short discussion.
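
One such approach, sketched here under assumptions, is to hold no pointers
into the input block at all: a per-character action copies each token byte
into storage owned by the parser, the current state is kept between calls,
and end of input is signalled only on the final block. The code below is a
hand-written stand-in for a generated machine, not Ragel output.
MethodParser, parser_init, parser_feed and the ST_* states are hypothetical
names. The is_last flag plays the role of setting eof to pe on the last
block only.

```c
#include <stddef.h>
#include <string.h>

enum { ST_METHOD, ST_DONE, ST_ERROR };

/* Hypothetical parser for the method token of a request line: [A-Z]+ ' ' */
typedef struct {
    int    cs;          /* current state, kept between calls */
    char   method[16];  /* token storage owned by the parser */
    size_t method_len;
} MethodParser;

static void parser_init(MethodParser *m)
{
    m->cs = ST_METHOD;
    m->method_len = 0;
    m->method[0] = '\0';
}

/* Feed one block of input. Pass is_last = 1 only for the final block.
 * Returns the state after the block has been consumed. */
static int parser_feed(MethodParser *m, const char *p, size_t n, int is_last)
{
    const char *pe = p + n;

    for (; p < pe && m->cs == ST_METHOD; p++) {
        if (*p >= 'A' && *p <= 'Z') {
            if (m->method_len + 1 >= sizeof m->method) {
                m->cs = ST_ERROR;      /* token too long for the storage */
                break;
            }
            /* Per-character action: copy the byte, keep a terminator. */
            m->method[m->method_len++] = *p;
            m->method[m->method_len] = '\0';
        } else if (*p == ' ' && m->method_len > 0) {
            m->cs = ST_DONE;           /* leaving the token */
        } else {
            m->cs = ST_ERROR;
        }
    }

    /* End of input is an error only if it really is the end. */
    if (is_last && m->cs == ST_METHOD)
        m->cs = ST_ERROR;
    return m->cs;
}
```

Feeding "GE" and then "T /f" leaves the parser in ST_DONE with "GET" in
method, the same result as feeding "GET /f" in one call. Treating every
block as the last one would instead report an error after "GE".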

Regards,
  Adrian

On 11-01-20 03:26 PM, Benjamin van der Veen wrote:
> Hello,
>
> I am using Ragel to make an HTTP parser. Feel free to tell me this is a terrible idea. ;)
>
> It seems to me that a common problem faced by users of Ragel is that they do not know in advance where (with respect to the grammar being parsed) the boundaries of buffers that they feed the parser are going to be. For example, I can easily make a Ragel grammar which will parse the following using only entering and leaving actions:
>
> "GET /foo HTTP/1.1\r\nBar: Baz\r\n\r\n"
>
> However the parser breaks if I feed it the same data across multiple buffers (as would be the case when reading chunks of data from a network socket):
>
> "GE"
> "T /f"
> "oo HTTP/1.1\r"
> "\nBar: Baz\r\n\r\n"
>
> I found that this can be mitigated by using EOF-leaving actions (%/some_action) and always setting eof to pe so that the EOF-leaving actions fire. However, I'm finding that it isn't consistent and leads to unexpected behavior in some cases. Note that I am using the regular expression syntax, not the state chart syntax.
>
> What is the recommended approach to this problem? My intuition is that a properly-specified state machine should work regardless of how data is fed to it and Ragel should make this opaque to the user—it seems to me that processing data across multiple buffers would be a very common problem that Ragel would solve for the user, but I may be mistaken.
>
> In general I'm rather confused about how EOF actions are handled and when entering or leaving actions are treated as EOF actions. I've pored over the manual but I feel like it's all predicated on some knowledge that I don't have and am unsure where to look to find. In particular the first two paragraphs of section 3.1.4 (Leaving Actions) are almost completely opaque to me.
>
> Cheers!
> Benjamin van der Veen
> _______________________________________________
> ragel-users mailing list
> ragel-users at complang.org
> http://www.complang.org/mailman/listinfo/ragel-users
>
