RFC-2822 recognizer: best way to test it?

Wed May 23 13:43:36 UTC 2007

Hi!

As my first Ragel project I'm writing a recognizer for RFC-2822 email
addresses. All the recognizer has to do is scan an input string and
decide whether or not it conforms to RFC-2822. I'll write a little bit
of background first; but in the end my question is, what's the best
way to test this?

I basically started by taking RFC-2822
(<http://www.ietf.org/rfc/rfc2822.txt>) and taking the rules -- written
in the RFC using Augmented Backus-Naur Form (ABNF) notation
(<http://www.ietf.org/rfc/rfc2234.txt>) -- and rewriting them using
Ragel syntax.

There is one circular dependency in those rules ("comment" needs
"ccontent", but "ccontent" needs "comment") and so for the time being
I've commented out that dependency (in other words, nesting of
comments inside comments isn't yet implemented). If everything works
out OK, as a last step I will use the trick described at
<http://groups.google.com/group/ragel-users/browse_thread/thread/f3fdde1d51c86aaf/e4f2b110236b8660>
to handle the recursion manually.
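
For illustration only, here is one way nested comments can be handled
by hand in plain C, using a depth counter instead of grammar recursion.
This is a simplified sketch and not necessarily the trick from the
linked thread: the function name scan_comment is hypothetical, and the
code only tracks parentheses and backslash escapes without checking
that the other characters are valid "ctext".

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of Ragel or of the RFC).
 * Returns the number of bytes consumed by a possibly nested comment
 * starting at s[0], or 0 if s does not begin with a balanced comment.
 * Simplification: characters other than '(', ')' and '\' are accepted
 * without being checked against the RFC-2822 "ctext" rule.
 */
size_t scan_comment(const char *s, size_t len)
{
    size_t i = 0;
    int depth = 0;

    if (len == 0 || s[0] != '(')
        return 0;

    while (i < len) {
        char c = s[i++];
        if (c == '\\') {
            if (i == len)
                return 0;       /* dangling backslash */
            i++;                /* skip the escaped character */
        } else if (c == '(') {
            depth++;            /* entering a (nested) comment */
        } else if (c == ')') {
            if (--depth == 0)
                return i;       /* outermost comment closed */
        }
    }
    return 0;                   /* ran out of input while still nested */
}
```

Under these assumptions, scan_comment("(a(b)c)", 7) consumes the whole
string, while an unbalanced input such as "(a(b)c" is rejected.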

Running Ragel on the resulting grammar causes it to spin forever, so I've
simplified some of the rules (mostly by commenting out the optional
whitespace) and now it compiles (using C as the output language).
Before I begin tweaking the rules back into conformance with the RFC I
wanted to ask about testing techniques.

What I have is effectively a black box where I stick input in and get
a success or failure message back at the end. Is there any way to break
this down into smaller parts for testing purposes? In other words,
instead of testing that "f... at example.com" passes (it does), can I test
that "example.com" matches a "domain", or even lower, that "foo" is
valid "atext"? Basically, I can test that the whole thing works, but I'd
be much more confident if I could individually test the parts as well.
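
To make the idea of per-rule tests concrete, here is a rough sketch of
a table-driven harness in C. It assumes each rule can be exposed as a
function taking a buffer and a length; the recognizers below are
hand-written stand-ins, not Ragel output, and all names (rule_fn,
match_atext, match_dot_atom_text, count_failures) are hypothetical.
match_dot_atom_text covers only the "dot-atom-text" form of "domain",
ignoring comments, folding whitespace and domain literals.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Common signature assumed for every per-rule recognizer. */
typedef int (*rule_fn)(const char *s, size_t len);

/* RFC-2822 atext: ALPHA / DIGIT / one of the listed specials. */
static int is_atext_char(unsigned char c)
{
    return isalnum(c) ||
           (c != '\0' && strchr("!#$%&'*+-/=?^_`{|}~", c) != NULL);
}

/* Stand-in for the rule 1*atext: one or more atext characters. */
int match_atext(const char *s, size_t len)
{
    size_t i;
    if (len == 0)
        return 0;
    for (i = 0; i < len; i++)
        if (!is_atext_char((unsigned char)s[i]))
            return 0;
    return 1;
}

/* Stand-in for dot-atom-text: 1*atext *("." 1*atext). */
int match_dot_atom_text(const char *s, size_t len)
{
    size_t start = 0, i;
    for (i = 0; i <= len; i++) {
        if (i == len || s[i] == '.') {
            /* every dot-separated segment must be non-empty atext */
            if (!match_atext(s + start, i - start))
                return 0;
            start = i + 1;
        }
    }
    return 1;
}

struct rule_case {
    rule_fn fn;          /* recognizer for the rule under test */
    const char *input;   /* string fed to that recognizer */
    int expected;        /* 1 = should match, 0 = should not */
};

/* Runs every case and returns how many disagreed with expectations. */
int count_failures(void)
{
    static const struct rule_case cases[] = {
        { match_atext,         "foo",          1 },
        { match_atext,         "foo bar",      0 },
        { match_dot_atom_text, "example.com",  1 },
        { match_dot_atom_text, "example..com", 0 },
    };
    size_t i;
    int failures = 0;
    for (i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        const struct rule_case *c = &cases[i];
        if (c->fn(c->input, strlen(c->input)) != c->expected)
            failures++;
    }
    return failures;
}
```

The point of the table is that adding a test for another rule means
adding a row, whatever produces the recognizer behind the function
pointer.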

What's the best methodology here?

Cheers,
Wincent