[ragel-users] Default actions that leave the machine

Murray Henderson mail at murrayh.id.au
Wed Feb 2 23:39:18 UTC 2011


Thanks Adrian, I think that technique will make it possible to solve my problem.
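
For readers of the archive, the behaviour requested at the start of this thread (an unexpected character fires parse_error and hands control to the next machine instead of the error state) can be modelled in plain Python. This is only a sketch of the intended semantics for the 'HELLO ' 'WORLD' example. It is not Ragel code and not how Ragel implements error actions, and the names run, first and second are made up for illustration.

```python
def run(text, first="HELLO ", second="WORLD"):
    """Simulate ('HELLO ' with a leave-on-error action) followed by 'WORLD'.

    Returns (accepted, errors). `errors` holds the input offsets at which the
    hypothetical parse_error action would fire. All names are illustrative.
    """
    errors = []
    i = 0
    # Consume as much of the first machine as the input matches.
    while i < len(first) and i < len(text) and text[i] == first[i]:
        i += 1
    if i < len(first):
        # The first machine was left early: record a parse error and let the
        # offending character be handled by the second machine.
        errors.append(i)
    # The second machine is strict: the rest of the input must be 'WORLD'.
    accepted = text[i:] == second
    return accepted, errors


if __name__ == "__main__":
    for sample in ["HELLO WORLD", "HELLOWORLD", "HELWORLD", "WORLD"]:
        print(sample, run(sample))
```

Every input in the original list is accepted, with at most one parse error recorded at the point where 'HELLO ' was abandoned.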


On Thu, Feb 3, 2011 at 9:32 AM, Adrian Thurston <thurston at complang.org> wrote:
> Apparently I don't know how to use my own tool! Let's try this again, this
> time not so rushed on my part :)
>
>        action le {}
>        foo = 'hello' $^le;
>        main := (
>                any* |
>                foo
>        );
>
> Local error actions are local to the named machine they are in, not the
> enclosing (), which is the rushed mistake I made.
>
> Thanks,
>  Adrian
>
> On 11-02-02 02:17 PM, Murray Henderson wrote:
>>
>> Hi Adrian,
>>
>> Thanks for taking an interest :-).
>>
>>
>> As far as I can tell,
>>
>>  main = (
>>           ('HELLO ' $^parse_error) 'WORLD' |
>>           any*
>>        );
>>
>> and
>>
>>  main = (
>>           ('HELLO ' $!parse_error) 'WORLD' |
>>           any*
>>        );
>>
>> are equivalent to
>>
>>
>>  main = any*;
>>
>>
>>
>>
>> Anyway, the real machine I am trying to build currently looks like this:
>>
>>
>> doctype_single_quoted_value = (
>>     "'" ([^>]*)
>>         >start_token_value
>>         %end_token
>>     :>>  "'"
>> );
>>
>> doctype_double_quoted_value = (
>>     '"' ([^>]*)
>>         >start_token_value
>>         %end_token
>>     :>>  '"'
>> );
>>
>> doctype_quoted_value = (
>>     doctype_single_quoted_value |
>>     doctype_double_quoted_value
>> );
>>
>> doctype_name = (
>>     space+ (any - ('>' | space))+
>>         >start_token_doctype_name
>>         %end_token
>> );
>>
>> doctype_public = (
>>     space+ 'PUBLIC' %token_doctype_public
>>     space+ doctype_quoted_value
>> );
>>
>> doctype_system = (
>>     space+ 'SYSTEM' %token_doctype_system
>>     space+ doctype_quoted_value
>> );
>>
>> doctype = (
>>     '<!DOCTYPE' %token_doctype space*
>>     (doctype_name doctype_public? doctype_system?)?
>>     space* '>'
>> );
>>
>>
>>
>> This machine looks about right (in the FSM diagram) except that it
>> doesn't handle malformed doctypes.
>>
>> With the $^^ operator I described, I imagine the machine would look
>> like this (given a parse error action, pe):
>>
>>
>>
>> doctype = (
>>     '<!DOCTYPE' %token_doctype space*
>>     ((doctype_name doctype_public? doctype_system?) $^^pe)?
>>     space* <: ([^>]+ >pe)? '>'
>> );
>>
>>
>> Additionally, I think I might be able to use that imaginary operator
>> to make whitespace optional (though with a parse error if the
>> whitespace is omitted):
>>
>> eg:
>>
>> omittable_space = space+ >^^pe;
>> doctype_public = omittable_space 'PUBLIC' %token_doctype_public
>>     omittable_space doctype_quoted_value;
>>
>>
>>
>>
>> I will be using this machine inside multiple scanners, so goto-based
>> error recovery would be a pain. Default actions that transition to the
>> final state seem like a handy feature for any permissive parser
>> (although I realize I am doing something extreme).
>>
>> I am still thinking about attempting to patch Ragel. It is much more
>> complicated than I thought it would be, but it can't hurt for me to give
>> it a crack.
>>
>>
>> Still absolutely nowhere near finished, but my work is progressing
>> slowly ;-).
>> https://github.com/murrayh/html5rl/blob/master/html5_grammar.rl
>>
>>
>> Cheers,
>> Murray
>>
>>
>> On Tue, Feb 1, 2011 at 5:16 PM, Adrian Thurston <thurston at complang.org> wrote:
>>>
>>> Hi, does this do what you want?
>>>
>>> main = (
>>>          ('HELLO ' $^parse_error) 'WORLD' |
>>>          any*
>>>       );
>>>
>>> I'm not sure how that fits into your overall plan. Try it out and we'll
>>> discuss further.
>>>
>>> Regards,
>>>  Adrian
>>>
>>> On 11-01-31 03:50 PM, Murray Henderson wrote:
>>>>
>>>> Hello,
>>>>
>>>> Both local and global error actions transition to the error state. I
>>>> am using Ragel 6.5. I can try with 6.6 when I get home.
>>>>
>>>> I made a quick example (based off S. Geist's example):
>>>>
>>>> http://pastebin.com/06ihRxQg
>>>>
>>>> Example output:
>>>>
>>>> HELLO WORLD
>>>> read: HELLO WORLD
>>>> len: 12, state: 12
>>>> HELWORLD
>>>> parse error
>>>> read: HEL
>>>> len: 3, state: 0
>>>>
>>>>
>>>> Cheers,
>>>> Murray
>>>>
>>>>
>>>> On Tue, Feb 1, 2011 at 10:02 AM, Adrian Thurston <thurston at complang.org> wrote:
>>>>>
>>>>> Local error actions don't. Sorry I should have suggested just those.
>>>>>
>>>>> On 11-01-31 02:58 PM, Murray Henderson wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Local and global error actions transition to the error state.
>>>>>>
>>>>>> I want DEF to transition to the next machine (ie. behave like a final
>>>>>> state), not the error state.
>>>>>>
>>>>>> The parser I am writing is permissive: all input must be accepted (I
>>>>>> never want to go to the error state).
>>>>>>
>>>>>> I do not wish to use manual goto recovery. The parser is large and
>>>>>> complex, so such manual tracking is a lot of work and error-prone.
>>>>>>
>>>>>> Cheers,
>>>>>> Murray
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 1, 2011 at 4:58 AM, Adrian Thurston <adrian.thurston at esentire.com> wrote:
>>>>>>>
>>>>>>> Hi, have you looked at ragel's local and global error actions yet?
>>>>>>> These may do what you want.
>>>>>>>
>>>>>>> -Adrian
>>>>>>>
>>>>>>> On 11-01-26 08:08 PM, Murray Henderson wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I want to embed a default action into a machine that leaves the
>>>>>>>> machine (without using a manual jump inside the action).
>>>>>>>>
>>>>>>>> For simplicity's sake, I will call this operator $^^ (since it is
>>>>>>>> similar to the Local Error operator).
>>>>>>>>
>>>>>>>>
>>>>>>>> Example:
>>>>>>>>
>>>>>>>> action parse_error {}
>>>>>>>> helloworld = ('HELLO ' $^^parse_error) 'WORLD';
>>>>>>>>
>>>>>>>> Non-error inputs include:
>>>>>>>> HELLO WORLD
>>>>>>>> HELLOWORLD (parse_error action occurs on 'O' -> 'W' transition)
>>>>>>>> HELLWORLD (parse_error action occurs on 'L' -> 'W' transition)
>>>>>>>> HELWORLD (parse_error action occurs on 'L' -> 'W' transition)
>>>>>>>> HEWORLD (parse_error action occurs on 'E' -> 'W' transition)
>>>>>>>> HWORLD (parse_error action occurs on 'H' -> 'W' transition)
>>>>>>>> WORLD (parse_error action occurs on -> 'W' transition)
>>>>>>>>
>>>>>>>>
>>>>>>>> I can simulate the above behavior with the '?' operator, but that is
>>>>>>>> laborious, and there are other ways of using $^^ that I suspect
>>>>>>>> cannot be simulated.
>>>>>>>>
>>>>>>>>
>>>>>>>> I want this operator because I am trying to make a liberal parser
>>>>>>>> that accepts all possible input. (Every state must have a default
>>>>>>>> action.) I am creating an HTML5 parser that uses regular machines
>>>>>>>> for tokenizing, and scanners built from the regular machines for
>>>>>>>> parsing. Yes, I am mad.
>>>>>>>>
>>>>>>>> I cannot use manual jumps, because I don't want to jump out of the
>>>>>>>> scanners mid-token.
>>>>>>>>
>>>>>>>>
>>>>>>>> I am willing to try and add this operator into Ragel myself. I have
>>>>>>>> grabbed the source code and tracked my way to fsmap.cpp, where the
>>>>>>>> new operator would be added.
>>>>>>>>
>>>>>>>> Before I continue...
>>>>>>>> Is there already a way to achieve my desired behavior that I am not
>>>>>>>> aware of?
>>>>>>>> Would such an operator be worthwhile? Is it even possible?
>>>>>>>> Is there any knowledge that could be imparted that would help me
>>>>>>>> make a patch?
>>>>>>>>
>>>>>>>> If I do end up making a patch, for symmetry purposes I will make
>>>>>>>> global/local and start/any/final etc versions of the operator.
>>>>>>>>
>>>>>>>> After a brief look through the source, it looks like I would need to
>>>>>>>> modify the FsmAp::fillGaps() function, passing a final state (a
>>>>>>>> separate object for each?) into FsmAp::attachNewTrans() instead of
>>>>>>>> NULL.
>>>>>>>>
>>>>>>>> Ragel is a wonderful program by the way, thank you for creating it.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Murray
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> ragel-users mailing list
>>>>>>>> ragel-users at complang.org
>>>>>>>> http://www.complang.org/mailman/listinfo/ragel-users
>>>>>>>>



