Real World Email Requirements From RFCs? NOT!

Two days ago I set out to completely understand the RFC documents that specify the requirements for valid email addresses, write the world's greatest PHP regex to validate them, and then shout it out around the Internet from right here. At the time it sounded like a great idea. So I started my quest by digging up the necessary RFCs. I was surprised by what I found.

The first thing I discovered, and quite quickly, was that the RFC documents are very difficult to understand. Just as I was getting the hang of them, I realized that multiple RFCs can define the same requirements, which confused me all over again. Worse, I had no idea how to tell whether the document I was reading was even the current source. It turned out I was studying an RFC that had been obsoleted by a more recent one.

So what did I learn from all of this? Well, I learned how to find the most recent RFC for a given topic, how to see which documents it updates or obsoletes, and how to find the errata that correct mistakes in the original document.

I also learned that RFCs can be very time consuming to read and difficult to understand. I should suggest an RFC on how to write an RFC, one that requires a brief high-level summary of the requirements at the beginning. Then, if there is a question, the reader can refer to the more in-depth description later in the document.

The final thing I learned, specific to the email validation I set out to do in the first place, is that full RFC compliance does not really fit the “real world” validation process I was hoping to achieve. I mean, can anyone honestly say they have encountered an email address that looks like “Chuck\@Burgess”@example.com? I know I haven’t. Nor have I seen any of the other exotic RFC-compliant addresses that such a validation routine would have to accept.

Suffice it to say I have settled on a much better methodology. I decided to blend validation and verification. I will do some minimal checks on the email address to confirm it looks valid, then connect to the SMTP server that handles mail for the address's domain and check its response. This is by far the best way to handle it. While there are certain limitations that need to be worked through (many servers refuse or ignore VRFY, for example), it is much less work and far more accurate to have the actual owner of the domain tell me whether they will accept an email address than to validate it through a compliance engine that may turn out to be inaccurate. After all, we live in a world that is digitally connected. I will just have my server EHLO their server and VRFY the email. More to come on that soon!
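
To make the idea a little more concrete, here is a rough sketch of the two steps described above. It is in Python rather than PHP purely for illustration, and it is an assumption of mine, not a finished routine: the names looks_plausible, interpret_vrfy, and check_with_server are made up for this example, the pattern is deliberately far looser than the RFCs, and the mail host is passed in by hand because looking up a domain's MX record needs a DNS library that is not shown here.

```python
import re
import smtplib

# Deliberately loose, NOT RFC 5322 compliant: a local part with no spaces,
# quotes, backslashes, or "@", followed by a dotted domain name.
_SIMPLE_EMAIL = re.compile(r'[^@\s"\\]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}')


def looks_plausible(address: str) -> bool:
    """Minimal sanity check before bothering a mail server."""
    return _SIMPLE_EMAIL.fullmatch(address) is not None


def interpret_vrfy(code: int) -> str:
    """Map an SMTP VRFY reply code to 'accepted', 'rejected', or 'unknown'."""
    if code in (250, 251):
        return "accepted"   # server confirms (or will forward for) the mailbox
    if code in (550, 551, 553):
        return "rejected"   # mailbox unavailable or name not allowed
    # 252 (cannot verify), 502 (VRFY not implemented), temporary 4xx
    # failures, and anything else tell us nothing about the address.
    return "unknown"


def check_with_server(address: str, mx_host: str, timeout: float = 10.0) -> str:
    """Ask mx_host (assumed to be the domain's mail server) about address."""
    if not looks_plausible(address):
        return "rejected"
    with smtplib.SMTP(mx_host, 25, timeout=timeout) as server:
        server.ehlo()
        code, _message = server.verify(address)  # sends VRFY
    return interpret_vrfy(code)
```

Note that "unknown" is a perfectly normal answer here. A server that declines to VRFY has not said the address is bad, so the caller still has to decide what to do in that case.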

Happy coding!
