[ragel-users] Default actions that leave the machine

Adrian Thurston thurston at complang.org
Wed Feb 2 22:32:17 UTC 2011


Apparently I don't know how to use my own tool! Let's try this again, 
this time not so rushed on my part :)

         action le {}
         foo = 'hello' $^le;
         main := (
                 any* |
                 foo
         );

Local error actions are local to the named machine they are embedded in, 
not to the enclosing parentheses. That was the mistake I made in my rushed reply.
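
To make the effect concrete, here is a minimal hand-written C sketch of how the machine above is expected to behave. It is not Ragel output. The function name count_local_errors is hypothetical, and the model assumes that le fires once, on the first character foo cannot accept, while main keeps running through the any* alternative. End-of-input handling is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written model (not Ragel output) of
 *     foo  = 'hello' $^le;
 *     main := ( any* | foo );
 * Returns how many times the local error action le would fire.
 * Assumption: end of input is not modelled. */
int count_local_errors(const char *input)
{
    static const char foo[] = "hello";
    size_t pos = 0;     /* how much of foo has matched so far */
    int foo_alive = 1;  /* foo is still a live alternative */
    int le_fired = 0;
    const char *p;

    assert(input != NULL);

    for (p = input; *p != '\0'; p++) {
        if (!foo_alive)
            continue;       /* only any* is left; it takes everything */
        if (foo[pos] != '\0' && *p == foo[pos]) {
            pos++;          /* foo advances alongside any* */
        } else {
            le_fired++;     /* foo has no transition on *p: le runs */
            foo_alive = 0;  /* main carries on in any* */
        }
    }
    return le_fired;
}
```

Under these assumptions "hello" fires le zero times, and "help" fires it once, on the 'p'.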

Thanks,
  Adrian

On 11-02-02 02:17 PM, Murray Henderson wrote:
> Hi Adrian,
>
> Thanks for taking an interest :-).
>
>
> As far as I can tell,
>
>   main = (
>            ('HELLO ' $^parse_error) 'WORLD' |
>            any*
>         );
>
> and
>
>   main = (
>            ('HELLO ' $!parse_error) 'WORLD' |
>            any*
>         );
>
> are equivalent to
>
>
>   main = any*;
>
>
>
>
> Anyway, the real machine I am trying to build currently looks like this:
>
>
> doctype_single_quoted_value = (
>      "'" ([^>]*)
>          >start_token_value
>          %end_token
>      :>>  "'"
> );
>
> doctype_double_quoted_value = (
>      '"' ([^>]*)
>          >start_token_value
>          %end_token
>      :>>  '"'
> );
>
> doctype_quoted_value = (
>      doctype_single_quoted_value |
>      doctype_double_quoted_value
> );
>
> doctype_name = (
>      space+ (any - ('>' | space))+
>          >start_token_doctype_name
>          %end_token
> );
>
> doctype_public = space+ 'PUBLIC' %token_doctype_public space+
>      doctype_quoted_value;
>
> doctype_system = space+ 'SYSTEM' %token_doctype_system space+
>      doctype_quoted_value;
>
> doctype = (
>      '<!DOCTYPE' %token_doctype space*
>      (doctype_name doctype_public? doctype_system?)?
>      space* '>'
> );
>
>
>
> This machine looks about right (in the FSM diagram) except that it
> doesn't handle malformed doctypes.
>
> With the $^^ operator I described, I imagine the machine would look
> like this (given a parse error action, pe):
>
>
>
> doctype = (
>      '<!DOCTYPE' %token_doctype space*
>      ((doctype_name doctype_public? doctype_system?) $^^pe)?
>      space* <: ([^>]+ >pe)? '>'
> );
>
>
> Additionally, I think I might be able to use that imaginary operator
> to make whitespace optional (though with a parse error if the
> whitespace is omitted):
>
> eg:
>
> omittable_space = space+ >^^pe;
> doctype_public = omittable_space 'PUBLIC' %token_doctype_public
>      omittable_space doctype_quoted_value;
>
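
The omittable_space idea above can be sketched by hand in C. This is only an illustration of the intended behaviour, not something Ragel generates. The helper name skip_omittable_space and the parse_errors counter are hypothetical, and isspace is assumed to be a fair stand-in for Ragel's space class.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper modelling omittable_space: consume one or more
 * whitespace characters. If there are none, count a parse error and
 * carry on as though the whitespace had been present. */
const char *skip_omittable_space(const char *p, int *parse_errors)
{
    const char *start = p;

    assert(p != NULL && parse_errors != NULL);

    while (isspace((unsigned char)*p))
        p++;

    if (p == start)
        (*parse_errors)++;  /* whitespace omitted: report it, keep going */

    return p;               /* points at the first non-space character */
}
```

A caller would invoke it before matching 'PUBLIC' and again before the quoted value, then check the counter once the token is complete.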
>
>
>
> I will be using this machine inside multiple scanners, so goto-based
> error recovery would be a pain. Default actions that transition to the
> final state seem like a handy feature for any permissive parser
> (although I realize I am doing something extreme).
>
> I am still thinking about attempting to patch Ragel. It is much more
> complicated than I thought it would be, but it can't hurt for me to
> give it a crack.
>
>
> Still absolutely nowhere near finished, but my work is progressing slowly ;-).
> https://github.com/murrayh/html5rl/blob/master/html5_grammar.rl
>
>
> Cheers,
> Murray
>
>
> On Tue, Feb 1, 2011 at 5:16 PM, Adrian Thurston <thurston at complang.org> wrote:
>> Hi, does this do what you want?
>>
>> main = (
>>           ('HELLO ' $^parse_error) 'WORLD' |
>>           any*
>>        );
>>
>> I'm not sure how that fits into your overall plan. Try it out and we'll
>> discuss further.
>>
>> Regards,
>>   Adrian
>>
>> On 11-01-31 03:50 PM, Murray Henderson wrote:
>>>
>>> Hello,
>>>
>>> Both local and global error actions transition to the error state. I
>>> am using Ragel 6.5. I can try with 6.6 when I get home.
>>>
>>> I made a quick example (based off S. Geist's example):
>>>
>>> http://pastebin.com/06ihRxQg
>>>
>>> Example output:
>>>
>>> HELLO WORLD
>>> read: HELLO WORLD
>>> len: 12, state: 12
>>> HELWORLD
>>> parse error
>>> read: HEL
>>> len: 3, state: 0
>>>
>>>
>>> Cheers,
>>> Murray
>>>
>>>
>>> On Tue, Feb 1, 2011 at 10:02 AM, Adrian Thurston <thurston at complang.org> wrote:
>>>>
>>>> Local error actions don't. Sorry, I should have suggested just those.
>>>>
>>>> On 11-01-31 02:58 PM, Murray Henderson wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> Local and global error actions transition to the error state.
>>>>>
>>>>> I want DEF to transition to the next machine (ie. behave like a final
>>>>> state), not the error state.
>>>>>
>>>>> The parser I am writing is permissive: all input must be accepted (I
>>>>> never want to go to the error state).
>>>>>
>>>>> I do not wish to use manual goto recovery. The parser is large and
>>>>> complex, so such manual tracking is a lot of work and error prone.
>>>>>
>>>>> Cheers,
>>>>> Murray
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Feb 1, 2011 at 4:58 AM, Adrian Thurston
>>>>> <adrian.thurston at esentire.com> wrote:
>>>>>>
>>>>>> Hi, have you looked at ragel's local and global error actions yet?
>>>>>> These may do what you want.
>>>>>>
>>>>>> -Adrian
>>>>>>
>>>>>> On 11-01-26 08:08 PM, Murray Henderson wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I want to embed a default action into a machine that leaves the
>>>>>>> machine (without using a manual jump inside the action).
>>>>>>>
>>>>>>> For simplicity's sake, I will call this operator $^^ (since it is
>>>>>>> similar to the Local Error operator).
>>>>>>>
>>>>>>>
>>>>>>> Example:
>>>>>>>
>>>>>>> action parse_error {}
>>>>>>> helloworld = ('HELLO ' $^^parse_error) 'WORLD';
>>>>>>>
>>>>>>> Non-error inputs include:
>>>>>>> HELLO WORLD
>>>>>>> HELLOWORLD (parse_error action occurs on 'O' -> 'W' transition)
>>>>>>> HELLWORLD (parse_error action occurs on 'L' -> 'W' transition)
>>>>>>> HELWORLD (parse_error action occurs on 'L' -> 'W' transition)
>>>>>>> HEWORLD (parse_error action occurs on 'E' -> 'W' transition)
>>>>>>> HWORLD (parse_error action occurs on 'H' -> 'W' transition)
>>>>>>> WORLD (parse_error action occurs on -> 'W' transition)
>>>>>>>
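
The behaviour listed above can be modelled with a short hand-written C sketch. It is not Ragel output and says nothing about how such an operator would be implemented in Ragel itself. The names hw_result and match_hello_world are hypothetical. The sketch assumes the default transition fires parse_error and leaves the 'HELLO ' machine without consuming the offending character, after which 'WORLD' must match the remainder exactly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: whether the whole input was accepted and
 * how many times the parse_error action fired. */
struct hw_result {
    int accepted;
    int parse_errors;
};

/* Hand-written model of ('HELLO ' $^^parse_error) 'WORLD'.
 * It only mimics the behaviour described in the list above. */
struct hw_result match_hello_world(const char *input)
{
    static const char prefix[] = "HELLO ";
    static const char suffix[] = "WORLD";
    struct hw_result r = { 0, 0 };
    size_t i = 0;  /* position inside prefix */
    const char *p;

    assert(input != NULL);
    p = input;

    /* Walk the 'HELLO ' machine one character at a time. */
    while (prefix[i] != '\0') {
        if (*p != prefix[i]) {
            /* Default transition: fire the action and leave the machine
             * as if it had finished, without consuming *p. A premature
             * end of input counts as a mismatch here. */
            r.parse_errors++;
            break;
        }
        p++;
        i++;
    }

    /* The next machine, 'WORLD', must match the rest exactly. */
    r.accepted = (strcmp(p, suffix) == 0);
    return r;
}
```

With this model all seven inputs in the list are accepted, and every one except HELLO WORLD reports exactly one parse error.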
>>>>>>>
>>>>>>> I can simulate the above behavior with the '?' operator, but that is
>>>>>>> laborious, and there are other ways of using $^^ that I suspect cannot
>>>>>>> be simulated.
>>>>>>>
>>>>>>>
>>>>>>> I want this operator because I am trying to make a liberal parser that
>>>>>>> accepts all possible input (every state must have a default action).
>>>>>>> I am creating an HTML5 parser that uses regular machines for
>>>>>>> tokenizing, and scanners built from the regular machines for parsing.
>>>>>>> Yes, I am mad.
>>>>>>>
>>>>>>> I cannot use manual jumps, because I don't want to jump out of the
>>>>>>> scanners mid-token.
>>>>>>>
>>>>>>>
>>>>>>> I am willing to try and add this operator into Ragel myself. I have
>>>>>>> grabbed the source code and tracked my way to fsmap.cpp, where the new
>>>>>>> operator would be added.
>>>>>>>
>>>>>>> Before I continue...
>>>>>>> Is there already a way to achieve my desired behavior that I am not
>>>>>>> aware of?
>>>>>>> Would such an operator be worthwhile? Is it even possible?
>>>>>>> Is there any knowledge that could be imparted that would help me make
>>>>>>> a patch?
>>>>>>>
>>>>>>> If I do end up making a patch, for symmetry purposes I will make
>>>>>>> global/local and start/any/final etc versions of the operator.
>>>>>>>
>>>>>>> After a brief look through the source, it looks like I would need to
>>>>>>> mod the FsmAp::fillGaps() function, passing in a (separate object for
>>>>>>> each?) final state into the FsmAp::attachNewTrans() instead of NULL.
>>>>>>>
>>>>>>> Ragel is a wonderful program by the way, thank you for creating it.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Murray
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> ragel-users mailing list
>>>>>>> ragel-users at complang.org
>>>>>>> http://www.complang.org/mailman/listinfo/ragel-users
>>>>>>>


