[ragel-users] ragel and encodings

Adrian Thurston thurston at complang.org
Tue May 26 02:47:14 UTC 2009


Some people express multibyte sequences directly in ragel with a char or 
unsigned char alphtype. There is a contributed script in examples called 
unicode2ragel.rb that generates ragel definitions for ranges of unicode 
code points in utf8 or ucs4.
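
To give a rough idea of what "expressing multibyte sequences directly" 
means, here is a minimal Python sketch that turns a range of code points 
into byte-level alternatives suitable for an unsigned char alphtype. It is 
not how unicode2ragel.rb is implemented, only an illustration of the idea. 
The function names (encode_utf8, utf8_sequences, to_ragel) are made up for 
this example, and it does not exclude the surrogate range U+D800..U+DFFF.

```python
def encode_utf8(cp):
    """Encode one code point as a list of UTF-8 byte values."""
    if cp <= 0x7F:
        return [cp]
    if cp <= 0x7FF:
        return [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)]
    if cp <= 0xFFFF:
        return [0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                0x80 | (cp & 0x3F)]
    return [0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
            0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)]


def utf8_sequences(lo, hi):
    """Split the code point range [lo, hi] into a list of sequences,
    each sequence being a list of per-byte (min, max) ranges."""
    # Pure ASCII needs no splitting at all.
    if hi <= 0x7F:
        return [[(lo, hi)]]
    # Split wherever the encoded length changes (1, 2, 3, 4 bytes).
    for limit in (0x7F, 0x7FF, 0xFFFF):
        if lo <= limit < hi:
            return utf8_sequences(lo, limit) + utf8_sequences(limit + 1, hi)
    # Split until every continuation byte either stays fixed or spans
    # its full 0x80..0xBF range, so per-byte ranges describe the set exactly.
    for i in range(1, 4):
        mask = (1 << (6 * i)) - 1
        if (lo & ~mask) != (hi & ~mask):
            if (lo & mask) != 0:
                return (utf8_sequences(lo, lo | mask)
                        + utf8_sequences((lo | mask) + 1, hi))
            if (hi & mask) != mask:
                return (utf8_sequences(lo, (hi & ~mask) - 1)
                        + utf8_sequences(hi & ~mask, hi))
    return [list(zip(encode_utf8(lo), encode_utf8(hi)))]


def to_ragel(lo, hi):
    """Format the range as a ragel expression over byte values."""
    alternatives = []
    for seq in utf8_sequences(lo, hi):
        parts = ["0x%02X" % a if a == b else "0x%02X..0x%02X" % (a, b)
                 for a, b in seq]
        alternatives.append(" . ".join(parts))
    return " | ".join(alternatives)


# Greek and Coptic block, U+0370..U+03FF
print(to_ragel(0x370, 0x3FF))
# prints: 0xCD . 0xB0..0xBF | 0xCE..0xCF . 0x80..0xBF
```

The printed expression can be pasted into a machine definition when the 
alphtype is unsigned char. With a plain (signed) char alphtype the values 
above 0x7F would have to be written as negative numbers instead.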

As a side note, it should probably be in contrib. I'm going to move that 
now for anyone following the SVN directly.

-Adrian

Robert Lemmen wrote:
> On Thu, May 21, 2009 at 11:34:35AM -0400, Wil Macaulay wrote:
>> Depends on your platform, but my approach to this problem (on the Mac)
>> was to detect the encoding, and convert to UTF-8 before parsing. I also
>> converted line-endings (\r\n -> \n) and ensured a newline at the end of
>> the data at the same time.
> 
> how do you handle utf-8 in your ragel code? do you use a single-byte
> alphtype and then handle the utf-8 sequences manually?
> 
> cu  robert



