[ragel-users] When does Ragel mark a state as 'Final'?

Solomon Gibbs solomon.gibbs.lists at gmail.com
Thu Jul 25 04:31:19 UTC 2013


Hello,

I'm not sure I understand what Ragel considers a "final" state. IIRC
the User's Guide says that states that are final before machine
simplification remain final thereafter.

   When exactly is a state final, and how does one recognize this?

I'm using the state machine syntax to implement a string finder --
find ASCII strings with length greater than n, and print them. This
means implementing a maximum length matcher, as below.

Despite the fact that the dot output shows no final states, the EOF
transitions behave differently depending on which flavor of {$%@}eof
is used. I do not understand why this should be. For example, in the
"has_string" state below, using %eof instead of @eof causes both the
"commit_nonstring_eof" and "commit_string_eof" actions to be called
from one of the generated/synthetic states terminating the matching
state.

(State graphics for this machine are are available via
http://stackoverflow.com/questions/17848941/ragel-final-states-and-eof)

action commit_string {    }

action commit_string_eof { }

action commit_nonstring_eof { }

action set_mark { }

action reset {
   /* Force the machine back into state 1. This happens after
    * an incomplete match when some graphical characters are
    * consumed, but not enough for use to keep the string. */
   fgoto start;
}

# Matching classes union to 0x00 .. 0xFF
graphic = (0x09 | 0x20 .. 0x7E);
non_graphic =  (0x00 .. 0x08 | 0x0A .. 0x1F | 0x7F .. 0xFF);

collector = (

   start: (
      # Set the mark if we have a graphic character,
      # otherwise go to non_graphic state and consume input
      graphic @set_mark -> has_glyph |
      non_graphic -> no_glyph
   ) $eof(commit_nonstring_eof),

   no_glyph: (
         # Consume input until a graphic character is encountered
         non_graphic -> no_glyph |
         graphic @set_mark -> has_glyph
   ) $eof(commit_nonstring_eof),

   has_glyph: (
          # We already matched one graphic character to get here
          # from start or no_glyph. Try to match N-1 before allowing
              # the string to be committed. If we don't get to N-1,
              # drop back to the start state
              graphic{3} $lerr(reset) -> has_string
   ) @eof(commit_nonstring_eof),

   has_string: (
               # Already consumed our quota of N graphic characters;
               # consume input until we run out of graphic characters
               # then reset the machine. All exiting edges should commit
               # the string. We diferentiate between exiting on a non-graphic
               # input that shouldn't be added to the string and exiting
               # on a (graphic) EOF that should be added.
               graphic* non_graphic -> start
   ) %from(commit_string) @eof(commit_string_eof)
   #) %from(commit_string) %eof(commit_string_eof) // bad

); #$debug;

main := (collector)+;Hello,

I'm not sure I understand what Ragel considers a "final" state. IIRC
the User's Guide says that states that are final before machine
simplification remain final thereafter.

   When exactly is a state final, and how does one recognize this?

I'm using the state machine syntax to implement a string finder --
find ASCII strings with length greater than n, and print them. This
means implementing a maximum length matcher, as below.

Despite the fact that the dot output shows no final states, the EOF
transitions behave differently depending on which flavor of {$%@}eof
is used. I do not understand why this should be. For example, in the
"has_string" state below, using %eof instead of @eof causes both the
"commit_nonstring_eof" and "commit_string_eof" actions to be called
from one of the generated/synthetic states terminating the matching
state.

(State graphics for this machine are are available via
http://stackoverflow.com/questions/17848941/ragel-final-states-and-eof)

action commit_string {    }

action commit_string_eof { }

action commit_nonstring_eof { }

action set_mark { }

action reset {
   /* Force the machine back into state 1. This happens after
    * an incomplete match when some graphical characters are
    * consumed, but not enough for use to keep the string. */
   fgoto start;
}

# Matching classes union to 0x00 .. 0xFF
graphic = (0x09 | 0x20 .. 0x7E);
non_graphic =  (0x00 .. 0x08 | 0x0A .. 0x1F | 0x7F .. 0xFF);

collector = (

   start: (
      # Set the mark if we have a graphic character,
      # otherwise go to non_graphic state and consume input
      graphic @set_mark -> has_glyph |
      non_graphic -> no_glyph
   ) $eof(commit_nonstring_eof),

   no_glyph: (
         # Consume input until a graphic character is encountered
         non_graphic -> no_glyph |
         graphic @set_mark -> has_glyph
   ) $eof(commit_nonstring_eof),

   has_glyph: (
          # We already matched one graphic character to get here
          # from start or no_glyph. Try to match N-1 before allowing
              # the string to be committed. If we don't get to N-1,
              # drop back to the start state
              graphic{3} $lerr(reset) -> has_string
   ) @eof(commit_nonstring_eof),

   has_string: (
               # Already consumed our quota of N graphic characters;
               # consume input until we run out of graphic characters
               # then reset the machine. All exiting edges should commit
               # the string. We diferentiate between exiting on a non-graphic
               # input that shouldn't be added to the string and exiting
               # on a (graphic) EOF that should be added.
               graphic* non_graphic -> start
   ) %from(commit_string) @eof(commit_string_eof)
   #) %from(commit_string) %eof(commit_string_eof) // bad

); #$debug;

main := (collector)+;

_______________________________________________
ragel-users mailing list
ragel-users at complang.org
http://www.complang.org/mailman/listinfo/ragel-users



More information about the ragel-users mailing list