[ragel-users] Re: syntax improvement, new operators

Erich Ocean er... at atlasocean.com
Fri Feb 9 04:40:11 UTC 2007


Well, take the first User Action example in the Ragel manual, on page
28 (section 3.1.1, Entering Action):

action A {}
main := ( lower* >A ) . ' ';

Let's modify it to add a Pending Out (Leaving) Action, and then make  
that machine optional:

action ENTER_TRANSITION {} # Entering Action
action LEAVE_TRANSITION {} # Pending Out (Leaving) Action

main := ( lower* >ENTER_TRANSITION %LEAVE_TRANSITION )? . ' ';

If the first character recognized by main happens to be a space, then
LEAVE_TRANSITION will be executed, but ENTER_TRANSITION won't.
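To make the ENTER/LEAVE asymmetry concrete, here is a small hand-written Python simulation of that machine. It is a sketch, not code Ragel generates: the state names, the transition table and the run() helper are hypothetical, and the action placement simply follows the behaviour described above (ENTER_TRANSITION only on the lowercase transition out of the start state, LEAVE_TRANSITION on the ' ' transition out of any final state of the optional part, including the start state).

```python
# Hypothetical hand-written simulation (not code generated by Ragel) of
#   main := ( lower* >ENTER_TRANSITION %LEAVE_TRANSITION )? . ' ';
# Assumption: the entering action sits only on the lowercase transition out of
# the start state, while the leaving action sits on the ' ' transition out of
# every final state of the inner machine, including the start state (final
# because lower* accepts the empty string).

START, IN_WORD, DONE = "START", "IN_WORD", "DONE"

# (state, character class) -> (next state, actions executed on that transition)
TRANSITIONS = {
    (START, "lower"): (IN_WORD, ["ENTER_TRANSITION"]),
    (START, "space"): (DONE, ["LEAVE_TRANSITION"]),
    (IN_WORD, "lower"): (IN_WORD, []),
    (IN_WORD, "space"): (DONE, ["LEAVE_TRANSITION"]),
}


def char_class(c: str) -> str:
    """Map a character onto the classes the machine distinguishes."""
    if "a" <= c <= "z":
        return "lower"
    if c == " ":
        return "space"
    return "other"


def run(text: str) -> tuple[bool, list[str]]:
    """Return (accepted, actions executed in order) for the given input."""
    state, log = START, []
    for c in text:
        key = (state, char_class(c))
        if key not in TRANSITIONS:
            return False, log  # no transition: the machine errors out
        state, actions = TRANSITIONS[key]
        log.extend(actions)
    return state == DONE, log


print(run(" "))    # (True, ['LEAVE_TRANSITION'])
print(run("ab "))  # (True, ['ENTER_TRANSITION', 'LEAVE_TRANSITION'])
```

Feeding it a lone space executes only LEAVE_TRANSITION, while "ab " executes both actions in order.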

I think it's confusing to a user that a machine will execute its  
Leaving action (to use your terminology) without first executing its  
Entering.

The confusion goes away if you've learned that the > action will only  
be executed on the first character.

The % action isn't a character action, it's a machine action (to use  
my terminology). So a user would naturally reason that it could be  
executed even though no character was recognized, as in this case:

action FIRST_CHAR {} # Executed on recognition of the first character
action MACHINE_ACCEPT {} # Executed when the machine accepts a match

main := ( lower* >FIRST_CHAR %MACHINE_ACCEPT )? . ' ';

I use the Match/Accept terminology because any given machine can make  
a whole bunch of matches while it's recognizing characters, and the @  
action is executed every single time the machine recognizes a match.  
The % action, on the other hand, is only executed when the machine  
finally accepts one of those matches. The @ action (Match) is a  
character action because it is always and only triggered upon the  
recognition of a character, whereas the Accept action is a machine  
action because it is only ever executed once, when the machine accepts a
match, regardless of whether or not a character has been recognized.  
It's character-independent.
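The Match/Accept distinction can be sketched the same way. This hypothetical simulation of ( lower* @MATCH %ACCEPT ) . ' ' (MATCH and ACCEPT are illustrative action names, not from the manual) assumes @ is placed on every transition that enters a final state of lower* and % on the ' ' transition that leaves one, and counts how often each action runs.

```python
# Hypothetical hand-written simulation (not Ragel output) of
#   main := ( lower* @MATCH %ACCEPT ) . ' ';
# Assumption: @ sits on every transition entering a final state of lower*
# (so it runs once per lowercase character) and % sits on the ' ' transition
# leaving a final state of lower* (so it runs once, even for an empty word).
from collections import Counter

TRANSITIONS = {
    ("START", "lower"): ("IN_WORD", ["MATCH"]),
    ("START", "space"): ("DONE", ["ACCEPT"]),
    ("IN_WORD", "lower"): ("IN_WORD", ["MATCH"]),
    ("IN_WORD", "space"): ("DONE", ["ACCEPT"]),
}


def count_actions(text: str) -> Counter:
    """Count how many times each action runs while accepting `text`."""
    state, counts = "START", Counter()
    for c in text:
        cls = "lower" if "a" <= c <= "z" else "space" if c == " " else "other"
        if (state, cls) not in TRANSITIONS:
            raise ValueError(f"no transition from {state} on {c!r}")
        state, actions = TRANSITIONS[(state, cls)]
        counts.update(actions)
    if state != "DONE":
        raise ValueError("input not accepted")
    return counts


print(count_actions("abc "))  # Counter({'MATCH': 3, 'ACCEPT': 1})
print(count_actions(" "))     # Counter({'ACCEPT': 1})
```

For "abc " the Match action runs three times and the Accept action once; for a lone space, Accept still runs once even though lower* recognized no character.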

Hope this explains some of the reasoning behind the categorization  
and new terminology.

Best, Erich

On Feb 8, 2007, at 7:48 PM, Adrian Thurston wrote:

>
> Hi Erich,
>
> I'm glad to see you are still working with Ragel! By the way, I've
> updated your name in the CREDITS file and elsewhere.
>
>> Character Actions
>> =============
>>
>> > aka First -- This action will be executed on the first character
>> the machine recognizes.
>> $ aka Each -- This action will be executed on each character the
>> machine recognizes.
>> @ aka Match -- This action will be executed on characters the machine
>> recognizes that put the machine into a match state.
>> < aka Continue -- (New) This action will be executed on the next
>> character the machine recognizes when the machine is in a match state.
>
> So it seems that you prefer to express these operators in terms of the
> characters of the input string that is processed. This is distinct from
> my approach, where I talk about the transitions of a regular
> expression's corresponding state machine.
>
> I prefer to express the operators in terms of transitions because I find
> it to be very precise. For example, with "entering transition actions"
> you can go and look at the graphviz drawing and find the transitions
> which take you into the machine. That's me though, and I would very much
> like to hear what others think. Is it better to talk about the
> transitions that the actions are put into, or is it better to talk about
> the characters that are moved over when the actions are executed?
>
> The < operator you have given I find interesting. As I understand it,
> this would embed the action on the transitions which leave final states
> (but stay in the machine). Could you give an example of when it is useful?
>
>
>> Machine Actions
>> ============
>>
>> % aka Accept -- This action will only be executed when the machine
>> accepts a match.
>
> The word "accept" I find to be somewhat ambiguous. It doesn't strike me
> that it means only one of "on the last character" or "on the next
> character." It seems to me that it could easily be interpreted as either
> of those. I chose the word "leaving" for this operator because it's
> clear to me that it means on the next character.
>
>> %\ aka Fail -- (New) This action will only be executed when the
>> machine fails to either: (a) recognize a character, or (b) accept a
>> match.
>
> I'm not quite sure what you mean by (b). I would assume you mean the
> same as above, what is currently known as the leaving (or pending out)
> operator. But then I believe this new operator would be the same as the
> $! operator. Could you clarify?
>
>> %? aka Skip -- (New) This action will be executed instead of Fail when
>> either the Optional operator or the Kleene Star operator is applied to
>> the machine.
>
> I'm not sure I understand this operator. If you write:
>
> ( expr %? skip_act )?
>
> Is it the same as writing the following?
>
> ( expr | "" %skip_act )
>
> Could you give us an example of the kind of problem that motivated these
> operators? Especially the part about setting and clearing external state
> flags to do proper resource acquisition and release. An example would
> really help me to understand the issue.
>
> Regards,
>   Adrian
>
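Adrian's suggested rewrite of the proposed %? operator, ( expr | "" %skip_act ), can be illustrated with a similar hypothetical simulation. Here lower+ stands in for expr, SKIP for skip_act, and the machine is assumed to be followed by ' '. The sketch also assumes the union's start state is final only through the "" branch, so the pending out action lands on the ' ' transition taken directly from the start state.

```python
# Hypothetical hand-written simulation (not Ragel output) of
#   main := ( lower+ | "" %SKIP ) . ' ';
# lower+ stands in for `expr` and SKIP for `skip_act`. Assumption: the union's
# start state is final only via the "" branch, so SKIP sits on the ' '
# transition out of the start state and on no other transition.

TRANSITIONS = {
    ("START", "lower"): ("IN_WORD", []),
    ("START", "space"): ("DONE", ["SKIP"]),
    ("IN_WORD", "lower"): ("IN_WORD", []),
    ("IN_WORD", "space"): ("DONE", []),
}


def actions_for(text: str) -> list[str]:
    """Return the actions executed while accepting `text`, in order."""
    state, log = "START", []
    for c in text:
        cls = "lower" if "a" <= c <= "z" else "space" if c == " " else "other"
        if (state, cls) not in TRANSITIONS:
            raise ValueError(f"no transition from {state} on {c!r}")
        state, actions = TRANSITIONS[(state, cls)]
        log.extend(actions)
    if state != "DONE":
        raise ValueError("input not accepted")
    return log


print(actions_for(" "))    # ['SKIP']
print(actions_for("ab "))  # []
```

Under that assumption SKIP runs only when the optional part consumed no characters.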


