Minimisation question

Colin Fleming colin.flem... at coreproc.com
Thu Sep 14 19:49:59 UTC 2006


Great, thanks! The patch works well for the grammar I sent:

time ragel asciixml.rl -s -M doctypedecl | rlcodegen -V > test.dot
fsm name  : AsciiXml
num states: 269

real    0m0.087s
user    0m0.079s
sys     0m0.011s

But the full Unicode grammar still takes a while:

time ragel xml.rl -s | rlcodegen -V > test.dot
fsm name  : Xml
num states: 445

real    0m43.829s
user    0m43.293s
sys     0m0.565s

I guess there's another production creating unreachable states. It's a
vast improvement though, it never even got close to compiling
beforehand. Is there some way I can diagnose what's happening myself?

On 9/14/06, Adrian Thurston <thurs... at cs.queensu.ca> wrote:
> Pretty much every operation can cause unreachable states. For example,
> during union a new start state is created with epsilon transitions drawn to
> the old start states. The epsilon operation effectively copies transition
> lists from the destination state to the source state. If the old start
> states don't have any transitions in from elsewhere in the machine then they
> become unreachable. The on-the-fly cleanup will reap states with no external
> entry points. This gets most unreachables, but just like in garbage
> collection, cycles cause problems.
>
> In your ragel file the subtraction operation is leaving behind unreachables.
> Subtraction may in fact need a full mark and sweep (I had thought the
> on-the-fly was sufficient). The attached patch will allow you to compile
> without -e.
>
> Cheers,
>   Adrian
>
> Colin Fleming wrote:
> > Ok, thanks for the help! What causes unreachable states? Can I change
> > something about the grammar to avoid them (apart from -e, of course)?
> >
> > Cheers,
> > Colin
> >
> >
>
> >
>
> Index: ragel/fsmgraph.cpp
> ===================================================================
> --- ragel/fsmgraph.cpp  (revision 3586)
> +++ ragel/fsmgraph.cpp  (working copy)
> @@ -496,6 +496,7 @@
>
>         /* Remove states that have no path to a final state. */
>         removeDeadEndStates();
> +       removeUnreachableStates();
>  }
>
>  bool FsmAp::inEptVect( EptVect *eptVect, StateAp *state )
>
>
>



More information about the ragel-users mailing list