[ragel-users] EOF actions and buffering

Thu Jan 20 23:26:40 UTC 2011

Hello,

I am using Ragel to make an HTTP parser. Feel free to tell me this is a terrible idea. ;)

It seems to me that a common problem for Ragel users is that they cannot know in advance where the boundaries of the buffers they feed the parser will fall relative to the grammar being parsed. For example, I can easily write a Ragel grammar that parses the following using only entering and leaving actions:

"GET /foo HTTP/1.1\r\nBar: Baz\r\n\r\n"

However, the parser breaks if I feed it the same data split across multiple buffers (as happens when reading chunks of data from a network socket):

"GE"
"T /f"
"oo HTTP/1.1\r"
"\nBar: Baz\r\n\r\n"

I found that this can be mitigated by using EOF-leaving actions (%/some_action) and always setting eof to pe so that the EOF-leaving actions fire at the end of each buffer. However, the results aren't consistent, and in some cases this leads to unexpected behavior. Note that I am using the regular expression syntax, not the state chart syntax.

What is the recommended approach to this problem? My intuition is that a properly specified state machine should work regardless of how data is fed to it, and that Ragel should handle this transparently for the user. Processing data across multiple buffers seems like a very common problem that Ragel would solve for the user, but I may be mistaken.

In general, I'm rather confused about how EOF actions are handled and when entering or leaving actions are treated as EOF actions. I've pored over the manual, but it all seems predicated on background knowledge that I lack and don't know where to find. In particular, the first two paragraphs of section 3.1.4 (Leaving Actions) are almost completely opaque to me.

Cheers!
Benjamin van der Veen
_______________________________________________
ragel-users mailing list
ragel-users at complang.org
http://www.complang.org/mailman/listinfo/ragel-users