speed vs. re2c?

Adrian Thurston thurs... at cs.queensu.ca
Fri Oct 6 00:45:59 UTC 2006


Hi Joshua,

If I understand you right, then you'll be pleased to know that Ragel is 
already one half of this. You can use it to make parsers which parse an 
entire file in one shot, executing actions (or callbacks, or hooks) along 
the way. See the Mongrel HTTP Server for an example.

http://mongrel.rubyforge.org/

The half that Ragel isn't is JIT. I need some clarification though: by JIT 
regex compiling, do you mean synthesizing a state only when it is needed? This 
is the only way I can see a possible benefit from JIT regex compiling. 
If not, then exactly what part of the process do you envision being done at 
compile time?
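
To make the question concrete, "synthesizing a state only when it is 
needed" could look roughly like the sketch below. It is plain Python, not 
anything Ragel generates. The NFA table for (a|b)*abb and the names LazyDFA, 
step and matches are made up for illustration. Each DFA state is a set of NFA 
states, and a transition is computed and cached the first time the input 
asks for it.

```python
# Hypothetical NFA for the pattern (a|b)*abb, with no epsilon transitions.
# Maps (nfa_state, symbol) -> set of next NFA states.
NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
NFA_START = 0
NFA_ACCEPT = 3


class LazyDFA:
    """Subset construction that builds each DFA transition on first use."""

    def __init__(self):
        self.start = frozenset({NFA_START})
        self.cache = {}  # (dfa_state, symbol) -> dfa_state

    def step(self, state, symbol):
        key = (state, symbol)
        if key not in self.cache:
            # Synthesize the target state only now that the input demands it.
            target = set()
            for s in state:
                target |= NFA.get((s, symbol), set())
            self.cache[key] = frozenset(target)
        return self.cache[key]

    def matches(self, text):
        state = self.start
        for ch in text:
            state = self.step(state, ch)
            if not state:  # dead state: no NFA state is still alive
                return False
        return NFA_ACCEPT in state


dfa = LazyDFA()
print(dfa.matches("ababb"))  # True
print(len(dfa.cache))        # 4 transitions built for this input
```

Only the transitions the input actually exercises get built, so an input 
that touches a small corner of a large machine never pays for the rest.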

Cheers,
  Adrian

Joshua Haberman wrote:
> Cool -- I'm glad to know it's competitive with re2c.  When I went to
> look up what '-G' does, I was also happy to see that there are lots of
> options for how the code is generated.
> 
> Let me explain why I was interested in re2c, and why I'm now interested
> in Ragel.  Many people I've talked to think this idea is crap, so I
> won't be offended if you do too, but I really believe in it.
> 
> Text processing is one of the most common bottlenecks in high-level
> languages.  The regular expression engines that are built into
> languages like Perl, Ruby, Python, etc. are useful for pattern matching
> on isolated strings, but aren't optimal for the case where you want to
> parse a file in a known format, beginning to end.  If you
> designed a library specifically for this use case, you could get lots
> of nice benefits like:
> 
> * its API could be more along the lines of what you want: set up a
> bunch of patterns and rules, then set the library in motion on an input
> stream.
> 
> * you could write an optimized buffering layer that keeps a
> configurable number of trailing tokens in memory at once.
> 
> * you could use a library like Ragel to generate goto-based scanners at
> runtime, so that you could get the performance improvements over the
> table-based scanners that existing regex engines use.  Basically I am
> proposing to use Ragel as the backend for a regex JIT.
> 
> You could compile to C and then use an embedded C compiler (like
> libtcc) to compile to machine code.  Personally I would be more
> interested in generating assembly code directly, since it wouldn't be
> as heavyweight a process and would give you the opportunity to optimize
> better than the C compiler, since you are working within a very narrow
> problem domain.
> 
> I don't know when I'd actually get to this, but I'm very interested in
> seeing it done, and will probably try to use Ragel in this way at some
> point.  What do you think?
> 
> Josh
> 
> 
> 


