[ragel-users] Ragel unicode support

Pietro Paolini pietro.paolini at cognitivecredit.com
Tue May 12 05:51:01 EDT 2020


Hi all,

I apologise if this question has been asked already, but I could
not find anything related to it in the mailing list or on Google. I
need to write a lexer which 'understands' Unicode. At the moment
I am happy to stick to UTF-8, but I cannot rule out different
encodings in the future, or at least I'd like to know whether that is
a feasible possibility. One use case I have is matching Unicode
'character blocks', for instance Arabic or CJK, using a succinct form
such as '[電-報]'. Those two glyphs were simply the quickest thing I
could copy and paste; they do not represent the start or end of an
actual interval.
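
To make the 'character block' use case concrete, here is a minimal
Python sketch (not Ragel syntax) of what such a range means at the
byte level. It assumes the CJK Unified Ideographs block, U+4E00 to
U+9FFF, as the interval; the names CJK_FIRST, CJK_LAST, utf8_bounds
and in_block_utf8 are made up for illustration. It relies on UTF-8
preserving code point order under bytewise comparison.

```python
# Illustrative only: boundaries of the CJK Unified Ideographs block.
CJK_FIRST, CJK_LAST = 0x4E00, 0x9FFF


def utf8_bounds(first, last):
    """Return the UTF-8 encodings of the first and last code points."""
    return chr(first).encode("utf-8"), chr(last).encode("utf-8")


def in_block_utf8(encoded, first, last):
    """Check one UTF-8 encoded character against a code point range.

    Assumes `encoded` is exactly one well-formed UTF-8 character.
    Works on raw bytes, as a byte-oriented lexer would: UTF-8 keeps
    code point order under bytewise comparison.
    """
    lo, hi = utf8_bounds(first, last)
    return lo <= encoded <= hi


lo, hi = utf8_bounds(CJK_FIRST, CJK_LAST)
print(lo.hex(" "), "..", hi.hex(" "))  # e4 b8 80 .. e9 bf bf
for ch in "電報a":
    print(ch, in_block_utf8(ch.encode("utf-8"), CJK_FIRST, CJK_LAST))
```

Spelled out as byte ranges, that interval is E4 B8..BF 80..BF or
E5..E9 80..BF 80..BF, which is the kind of expansion I would hope
not to have to write by hand for every block.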

I'd also like to know whether it is possible to collect statistics
about the number of times a regular expression was matched and, more
importantly, how long it took to process a given token. I think I can
'embed' the former into an action's body, but I'd like to double check.
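
A minimal sketch of the bookkeeping I have in mind, in plain Python
rather than Ragel. Everything here is hypothetical: TokenStats,
record and scan are illustrative names, and the loop in scan is only
a stand-in for a generated scanner, with record() playing the role of
the code that would sit in an action's body.

```python
import time
from collections import defaultdict


class TokenStats:
    """Per-rule match counts and accumulated processing time."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total_ns = defaultdict(int)

    def record(self, rule, started_ns):
        # Called when a rule matches: bump the counter and add the
        # time elapsed since the token started.
        self.count[rule] += 1
        self.total_ns[rule] += time.perf_counter_ns() - started_ns


def scan(words, stats):
    """Stand-in for a scanner loop: classify each word, record stats."""
    for word in words:
        started = time.perf_counter_ns()  # token start
        rule = "number" if word.isdigit() else "word"
        stats.record(rule, started)       # what an action would do
    return stats


stats = scan(["12", "ab", "7"], TokenStats())
for rule in sorted(stats.count):
    print(rule, stats.count[rule], "matches,", stats.total_ns[rule], "ns")
```

The counting part is the one I expect to fit in an action; the open
question is where the 'token start' timestamp would be taken.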


Thanks,
Pietro.


