[ragel-users] How to convert [#x2070-#x218F] to Ragel grammar?

Iñaki Baz Castillo ibc at aliax.net
Fri Nov 20 19:52:11 UTC 2009


On Friday, 20 November 2009, Adrian Thurston wrote:
> Okay, in that case then have a look at the unicode2ragel.rb script in
> the contrib directory.

Hummm, two points:

1) I'm not generating Ruby code but C code (perhaps that doesn't matter and you 
suggested that script for another reason).

2) I don't see that script in the 6.4 sources:

/usr/src/ragel-6.4$ ls -l contrib/
total 36
-rw-r--r-- 1 root root 8320 2009-07-08 02:03 Makefile
-rw-r--r-- 1 ibc  ibc    34 2009-04-05 00:06 Makefile.am
-rw-r--r-- 1 ibc  ibc  8280 2009-05-18 15:56 Makefile.in
-rw-r--r-- 1 ibc  ibc  1386 2009-04-04 23:56 ragel.m4
-rw-r--r-- 1 ibc  ibc    85 2009-04-04 23:56 ragel.make


> It can help with generating ragel definitions for
> you.

The fact is that I just want to create a Ragel grammar (working on single bytes) from 
this original grammar:

  NameStartChar  ::=  ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] |
                      [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | 
                      [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] |
                      [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] |
                      [#x10000-#xEFFFF]
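In the XML specification's notation, `#xN` stands for the character whose Unicode (ISO/IEC 10646) code point is the hexadecimal number N, so each bracket in NameStartChar is a range of code points, not of bytes. A minimal C sketch of the production as a code-point test, assuming the input has already been decoded to code points (the names `CodePointRange`, `name_start_ranges` and `is_name_start_char` are hypothetical, chosen for this example):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* An inclusive range of Unicode code points (not bytes). */
typedef struct {
    uint32_t lo, hi;
} CodePointRange;

/* NameStartChar transcribed from the production above; the single
   characters ":" and "_" become one-element ranges. */
static const CodePointRange name_start_ranges[] = {
    {':', ':'},       {'A', 'Z'},       {'_', '_'},
    {'a', 'z'},       {0xC0, 0xD6},     {0xD8, 0xF6},
    {0xF8, 0x2FF},    {0x370, 0x37D},   {0x37F, 0x1FFF},
    {0x200C, 0x200D}, {0x2070, 0x218F}, {0x2C00, 0x2FEF},
    {0x3001, 0xD7FF}, {0xF900, 0xFDCF}, {0xFDF0, 0xFFFD},
    {0x10000, 0xEFFFF},
};

/* Returns true if the code point cp is a NameStartChar. */
bool is_name_start_char(uint32_t cp) {
    size_t n = sizeof name_start_ranges / sizeof name_start_ranges[0];
    for (size_t i = 0; i < n; i++) {
        if (cp >= name_start_ranges[i].lo && cp <= name_start_ranges[i].hi)
            return true;
    }
    return false;
}
```

A byte-level Ragel machine cannot use these ranges directly; each one first has to be rewritten as the UTF-8 byte sequences that encode it.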

Must I do something "exotic" to achieve this? I've already played with UTF-8 in 
Ragel by using this grammar (from the SIP protocol ABNF):

	UTF8_CONT        = 0x80..0xbf;
	UTF8_NONASCII    = ( 0xc0..0xdf UTF8_CONT ) | ( 0xe0..0xef UTF8_CONT{2} ) |
             ( 0xf0..0xf7 UTF8_CONT{3} ) | ( 0xf8..0xfb UTF8_CONT{4} ) |
             ( 0xfc..0xfd UTF8_CONT{5} );
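For comparison, the UTF8_NONASCII machine reads as the following C check (a hypothetical helper, not part of Ragel or of the SIP grammar): the lead byte selects how many UTF8_CONT bytes must follow. Like the Ragel definition, it accepts the obsolete 5- and 6-byte forms and overlong encodings such as 0xC0 0x80, which RFC 3629 UTF-8 forbids, so it is looser than strict UTF-8:

```c
#include <stddef.h>

/* Returns the byte length of a UTF8_NONASCII match at the start of s
   (n bytes available), or 0 if there is none. */
size_t utf8_nonascii_len(const unsigned char *s, size_t n) {
    size_t conts;
    if (n == 0)
        return 0;
    if (s[0] >= 0xC0 && s[0] <= 0xDF)
        conts = 1;
    else if (s[0] >= 0xE0 && s[0] <= 0xEF)
        conts = 2;
    else if (s[0] >= 0xF0 && s[0] <= 0xF7)
        conts = 3;
    else if (s[0] >= 0xF8 && s[0] <= 0xFB)
        conts = 4;
    else if (s[0] >= 0xFC && s[0] <= 0xFD)
        conts = 5;
    else
        return 0; /* ASCII, a bare continuation byte, 0xFE or 0xFF */
    if (n < conts + 1)
        return 0; /* input ends before the sequence does */
    for (size_t i = 1; i <= conts; i++) {
        if (s[i] < 0x80 || s[i] > 0xBF) /* UTF8_CONT = 0x80..0xbf */
            return 0;
    }
    return conts + 1;
}
```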

Unfortunately I have no idea whether the former grammar ("NameStartChar") 
is related to UTF-8.
In any case, I could simplify this grammar (once I understand what it 
means), even if the result allows some invalid bytes.


However, please let me ask again:
What is #x2FF? Is it "0x2F 0xF0" or "0x02 0xFF"? I need to know this in order 
to work around the problem.
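Assuming the input is UTF-8 (the encoding the SIP grammar above deals with), `#x2FF` is the single code point U+02FF, which is neither 0x2F 0xF0 nor 0x02 0xFF on the wire: UTF-8 encodes it as the two bytes 0xCB 0xBF. The sketch below uses hypothetical helpers `utf8_encode` and `utf8_range`, written for this example rather than taken from Ragel or from unicode2ragel.rb. It encodes code points and splits a code point range into byte-range alternatives that could be pasted into a Ragel definition:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Encodes code point cp (assumed <= 0x10FFFF and not a surrogate) as UTF-8
   into out and returns the number of bytes written (1 to 4). */
int utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp <= 0x7F) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Appends Ragel-style alternatives matching the UTF-8 encodings of the code
   points lo..hi (inclusive) to the NUL-terminated buf of capacity size.
   The range is split until, in each piece, every byte position varies
   independently, so "first..last" per position is exact. */
void utf8_range(uint32_t lo, uint32_t hi, char *buf, size_t size) {
    static const uint32_t len_max[] = {0x7F, 0x7FF, 0xFFFF};
    assert(hi <= 0x10FFFF);
    if (lo > hi)
        return;
    /* Surrogates D800..DFFF have no UTF-8 encoding: leave them out. */
    if (lo <= 0xDFFF && hi >= 0xD800) {
        if (lo < 0xD800)
            utf8_range(lo, 0xD7FF, buf, size);
        if (hi > 0xDFFF)
            utf8_range(0xE000, hi, buf, size);
        return;
    }
    /* Split where the encoded length (1 to 4 bytes) changes. */
    for (int i = 0; i < 3; i++) {
        if (lo <= len_max[i] && len_max[i] < hi) {
            utf8_range(lo, len_max[i], buf, size);
            utf8_range(len_max[i] + 1, hi, buf, size);
            return;
        }
    }
    /* Split until the low 6*i bits of each piece run over their full range. */
    for (int i = 1; i < 4; i++) {
        uint32_t m = (1u << (6 * i)) - 1;
        if ((lo & ~m) != (hi & ~m)) {
            if ((lo & m) != 0) {
                utf8_range(lo, lo | m, buf, size);
                utf8_range((lo | m) + 1, hi, buf, size);
                return;
            }
            if ((hi & m) != m) {
                utf8_range(lo, (hi & ~m) - 1, buf, size);
                utf8_range(hi & ~m, hi, buf, size);
                return;
            }
        }
    }
    unsigned char a[4], b[4];
    int n = utf8_encode(lo, a);
    utf8_encode(hi, b);
    char piece[64];
    int pos = 0;
    for (int k = 0; k < n; k++) {
        const char *sep = k ? " " : "";
        if (a[k] == b[k])
            pos += snprintf(piece + pos, sizeof piece - pos, "%s0x%02x",
                            sep, (unsigned)a[k]);
        else
            pos += snprintf(piece + pos, sizeof piece - pos, "%s0x%02x..0x%02x",
                            sep, (unsigned)a[k], (unsigned)b[k]);
    }
    size_t used = strlen(buf);
    if (used + 3 + strlen(piece) + 1 <= size) { /* pieces that do not fit are dropped */
        if (used > 0)
            strcat(buf, " | ");
        strcat(buf, piece);
    }
}
```

For example, `utf8_range(0x2070, 0x218F, buf, sizeof buf)` on an empty buffer yields `0xe2 0x81 0xb0..0xbf | 0xe2 0x82..0x85 0x80..0xbf | 0xe2 0x86 0x80..0x8f`. Because Ragel's union binds more loosely than concatenation, the result can be wrapped in parentheses and used as one alternative of a byte-level NameStartChar machine; running it over every range of the production gives the rest. The buffer should be sized generously, since pieces that do not fit are silently dropped.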

Thanks a lot.


-- 
Iñaki Baz Castillo <ibc at aliax.net>



