[ragel-users] When does Ragel mark a state as 'Final'?

Adrian Thurston thurston at complang.org
Sun Nov 24 15:29:59 UTC 2013


Every sub-expression has a set of final states. An FSM operation may add 
or remove final-state status as it builds new machines. So your main 
machine may not have any final states, but they were present as the 
machine was built up, and so you see variations in how the eof embedding 
operators affect the result.

Arian

On 13-07-25 12:31 AM, Solomon Gibbs wrote:
> Hello,
>
> I'm not sure I understand what Ragel considers a "final" state. IIRC
> the User's Guide says that states that are final before machine
> simplification remain final thereafter.
>
>     When exactly is a state final, and how does one recognize this?
>
> I'm using the state machine syntax to implement a string finder --
> find ASCII strings with length greater than n, and print them. This
> means implementing a maximum length matcher, as below.
>
> Despite the fact that the dot output shows no final states, the EOF
> transitions behave differently depending on which flavor of {$%@}eof
> is used. I do not understand why this should be. For example, in the
> "has_string" state below, using %eof instead of @eof causes both the
> "commit_nonstring_eof" and "commit_string_eof" actions to be called
> from one of the generated/synthetic states terminating the matching
> state.
>
> (State graphics for this machine are are available via
> http://stackoverflow.com/questions/17848941/ragel-final-states-and-eof)
>
> action commit_string {    }
>
> action commit_string_eof { }
>
> action commit_nonstring_eof { }
>
> action set_mark { }
>
> action reset {
>     /* Force the machine back into state 1. This happens after
>      * an incomplete match when some graphical characters are
>      * consumed, but not enough for use to keep the string. */
>     fgoto start;
> }
>
> # Matching classes union to 0x00 .. 0xFF
> graphic = (0x09 | 0x20 .. 0x7E);
> non_graphic =  (0x00 .. 0x08 | 0x0A .. 0x1F | 0x7F .. 0xFF);
>
> collector = (
>
>     start: (
>        # Set the mark if we have a graphic character,
>        # otherwise go to non_graphic state and consume input
>        graphic @set_mark ->  has_glyph |
>        non_graphic ->  no_glyph
>     ) $eof(commit_nonstring_eof),
>
>     no_glyph: (
>           # Consume input until a graphic character is encountered
>           non_graphic ->  no_glyph |
>           graphic @set_mark ->  has_glyph
>     ) $eof(commit_nonstring_eof),
>
>     has_glyph: (
>            # We already matched one graphic character to get here
>            # from start or no_glyph. Try to match N-1 before allowing
>                # the string to be committed. If we don't get to N-1,
>                # drop back to the start state
>                graphic{3} $lerr(reset) ->  has_string
>     ) @eof(commit_nonstring_eof),
>
>     has_string: (
>                 # Already consumed our quota of N graphic characters;
>                 # consume input until we run out of graphic characters
>                 # then reset the machine. All exiting edges should commit
>                 # the string. We diferentiate between exiting on a non-graphic
>                 # input that shouldn't be added to the string and exiting
>                 # on a (graphic) EOF that should be added.
>                 graphic* non_graphic ->  start
>     ) %from(commit_string) @eof(commit_string_eof)
>     #) %from(commit_string) %eof(commit_string_eof) // bad
>
> ); #$debug;
>
> main := (collector)+;Hello,
>
> I'm not sure I understand what Ragel considers a "final" state. IIRC
> the User's Guide says that states that are final before machine
> simplification remain final thereafter.
>
>     When exactly is a state final, and how does one recognize this?
>
> I'm using the state machine syntax to implement a string finder --
> find ASCII strings with length greater than n, and print them. This
> means implementing a maximum length matcher, as below.
>
> Despite the fact that the dot output shows no final states, the EOF
> transitions behave differently depending on which flavor of {$%@}eof
> is used. I do not understand why this should be. For example, in the
> "has_string" state below, using %eof instead of @eof causes both the
> "commit_nonstring_eof" and "commit_string_eof" actions to be called
> from one of the generated/synthetic states terminating the matching
> state.
>
> (State graphics for this machine are are available via
> http://stackoverflow.com/questions/17848941/ragel-final-states-and-eof)
>
> action commit_string {    }
>
> action commit_string_eof { }
>
> action commit_nonstring_eof { }
>
> action set_mark { }
>
> action reset {
>     /* Force the machine back into state 1. This happens after
>      * an incomplete match when some graphical characters are
>      * consumed, but not enough for use to keep the string. */
>     fgoto start;
> }
>
> # Matching classes union to 0x00 .. 0xFF
> graphic = (0x09 | 0x20 .. 0x7E);
> non_graphic =  (0x00 .. 0x08 | 0x0A .. 0x1F | 0x7F .. 0xFF);
>
> collector = (
>
>     start: (
>        # Set the mark if we have a graphic character,
>        # otherwise go to non_graphic state and consume input
>        graphic @set_mark ->  has_glyph |
>        non_graphic ->  no_glyph
>     ) $eof(commit_nonstring_eof),
>
>     no_glyph: (
>           # Consume input until a graphic character is encountered
>           non_graphic ->  no_glyph |
>           graphic @set_mark ->  has_glyph
>     ) $eof(commit_nonstring_eof),
>
>     has_glyph: (
>            # We already matched one graphic character to get here
>            # from start or no_glyph. Try to match N-1 before allowing
>                # the string to be committed. If we don't get to N-1,
>                # drop back to the start state
>                graphic{3} $lerr(reset) ->  has_string
>     ) @eof(commit_nonstring_eof),
>
>     has_string: (
>                 # Already consumed our quota of N graphic characters;
>                 # consume input until we run out of graphic characters
>                 # then reset the machine. All exiting edges should commit
>                 # the string. We diferentiate between exiting on a non-graphic
>                 # input that shouldn't be added to the string and exiting
>                 # on a (graphic) EOF that should be added.
>                 graphic* non_graphic ->  start
>     ) %from(commit_string) @eof(commit_string_eof)
>     #) %from(commit_string) %eof(commit_string_eof) // bad
>
> ); #$debug;
>
> main := (collector)+;
>
> _______________________________________________
> ragel-users mailing list
> ragel-users at complang.org
> http://www.complang.org/mailman/listinfo/r

_______________________________________________
ragel-users mailing list
ragel-users at complang.org
http://www.complang.org/mailman/listinfo/ragel-users



More information about the ragel-users mailing list