From pietro.paolini at cognitivecredit.com Tue May 12 05:51:01 2020 From: pietro.paolini at cognitivecredit.com (Pietro Paolini) Date: Tue, 12 May 2020 10:51:01 +0100 Subject: [ragel-users] Ragel unicode support Message-ID: Hi all, I do apologise if the question has been asked already - but I could not find anything related to it in the mailing list or on Google. I would need to write a lexer which 'understands' Unicode, at the moment I am happy to stick to UTF-8 encoding but I cannot rule out different encodings in the future, or at least I'd like to know if that is a feasible possibility. One use case I've is to match Unicode 'character blocks', for instance Arabic or CJK using a succinct form such as '[電-報]', those two glyphs were the quickest thing I could copy and paste and do not represent the start or end of an interval. I'd also like to know if it is possible to have statistics about the number of times a regular expression was matched and - more importantly, how long it took to process a given token. I think I can 'embed' the former into an action's body but I'd like to double check. Thanks, Pietro.