From pietro.paolini at cognitivecredit.com  Tue May 12 05:51:01 2020
From: pietro.paolini at cognitivecredit.com (Pietro Paolini)
Date: Tue, 12 May 2020 10:51:01 +0100
Subject: [ragel-users] Ragel unicode support
Message-ID: <CAD1wxR+BPR0gBkc_QNeVT3EGdZs7pY5R1ZqbSJ-wNgm23FJqZw@mail.gmail.com>

Hi all,

I do apologise if the question has been asked already - but I could
not find anything related to it in the mailing list or on Google. I
would need to write a lexer which 'understands' Unicode, at the moment
I am happy to stick to UTF-8 encoding but I cannot rule out different
encodings in the future, or at least I'd like to know if that is a
feasible possibility. One use case I've is to match Unicode 'character
blocks', for instance Arabic or CJK using a succinct form such as
'[電-報]', those two glyphs were the quickest thing I could copy and
paste and do not represent the start or end of an interval.

I'd also like to know if it is possible to have statistics about the
number of times a regular expression was matched and - more
importantly, how long it took to process a given token. I think I can
'embed' the former into an action's body but I'd like to double check.


Thanks,
Pietro.