colin.flem... at coreproc.com
Thu Sep 14 19:49:59 UTC 2006
Great, thanks! The patch works well for the grammar I sent:
time ragel asciixml.rl -s -M doctypedecl | rlcodegen -V > test.dot
fsm name : AsciiXml
num states: 269
But the full Unicode grammar still takes a while:
time ragel xml.rl -s | rlcodegen -V > test.dot
fsm name : Xml
num states: 445
I guess there's another production creating unreachable states. It's a
vast improvement though, it never even got close to compiling
beforehand. Is there some way I can diagnose what's happening myself?
On 9/14/06, Adrian Thurston <thurs... at cs.queensu.ca> wrote:
> Pretty much every operation can cause unreachable states. For example,
> during union a new start state is created with epsilon transitions drawn to
> the old start states. The epsilon operation effectively copies transition
> lists from the destination state to the source state. If the old start
> states don't have any transitions in from elsewhere in the machine then they
> become unreachable. The on-the-fly cleanup will reap states with no external
> entry points. This gets most unreachables, but just like in garbage
> collection, cycles cause problems.
> In your ragel file the subtraction operation is leaving behind unreachables.
> Subtraction may in fact need a full mark and sweep (I had thought the
> on-the-fly was sufficient). The attached patch will allow you to compile
> without -e.
> Colin Fleming wrote:
> > Ok, thanks for the help! What causes unreachable states? Can I change
> > something about the grammar to avoid them (apart from -e, of course)?
> > Cheers,
> > Colin
> Index: ragel/fsmgraph.cpp
> --- ragel/fsmgraph.cpp (revision 3586)
> +++ ragel/fsmgraph.cpp (working copy)
> @@ -496,6 +496,7 @@
> /* Remove states that have no path to a final state. */
> + removeUnreachableStates();
> bool FsmAp::inEptVect( EptVect *eptVect, StateAp *state )
More information about the ragel-users