PHP Regex for Validating Phone Numbers
So, you want to validate a phone number, huh? That is very different from verifying a phone number. To verify a phone number, you could simply call it and see if it connects to someone. But applications and databases don’t necessarily need to verify the number. In most cases, however, the data should be validated, so that anyone who later sees it knows it is a “good” number. I define a valid number as one that could be assigned under the North American Numbering Plan (NANP).
I have seen numerous attempts to validate a phone number using a simple regex, but they are all wrong. The best they can do is check the number of digits. This method, and any similar variation, is completely useless, even if you check for every possible delimiter, remove spaces, and so on! So how should it be done, you ask? Easy!
First, we need to know the rules for validating. Since I live in the U.S., where NANPA (the North American Numbering Plan Administrator) publishes clearly defined guidelines for valid phone numbers, this task is much easier. Other countries, such as Canada, use the same numbering plan, so this approach works for Canadian numbers too. It is not a valid solution, however, for numbers outside the NANP.
Before I show you how, I want to make a few disclaimers clear so that nobody flames me about them. Keep in mind that we want to validate any number that is assignable within the NANP.
- The phone validation will only validate the phone number according to the NANPA guidelines.
- It does not exclude special numbers (e.g. toll-free or premium-rate numbers)
- It does not exclude area codes or central office codes that are not currently in use
- It does not verify the number based on location or country
- Phone numbers need to be stripped of non-digit characters leaving only numbers
- I assume there are no extensions included in the phone number
- The phone number being passed is assumed to be 10 digits, or 11 digits with a leading 1, regardless of formatting
So, here is the code that validates the phone number. I have included the step that cleanses the phone number before validation, so that we check only the digits and they sit in the correct positions without delimiters.
<?php
$phone = '+1 (801) 555-1234'; # any formatting is fine; 555-1212 itself is excluded below and would print 'Invalid!'
$phone = preg_replace('/\D/', '', $phone); # remove non-digits, leaving "18015551234"
$regex = '/^(?:1)?(?(?!(37|96))[2-9][0-8][0-9](?<!(11)))?[2-9][0-9]{2}(?<!(11))[0-9]{4}(?<!(555(01([0-9][0-9])|1212)))$/';
if (preg_match($regex, $phone)) {
    echo 'Valid!';
} else {
    echo 'Invalid!';
}
?>
The first thing I want to point out is that the phone number can be in any strange format. While it is easy for the human eye to see that the number appears to be valid, a computer could not dial it as written. It does not matter how the phone number is formatted, as long as it contains every digit needed to check whether it is valid. That means it must include all of the components required to place a call: the NPA (area code), the NXX (prefix, exchange, or central office code), and the XXXX (line number), in the form NPA-NXX-XXXX. The cleansing step removes all of the other gibberish.
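The cleansing step and the NPA-NXX-XXXX breakdown can be sketched as a small helper. `split_nanp()` is a hypothetical name, not part of the code above; it enforces the 10-digit or 1+10-digit assumption from the disclaimers and only splits the digits into components, without applying any NANP rules.

```php
<?php
// Hypothetical helper: cleanse the input, accept only the 10-digit or
// 1+10-digit shapes, and split the result into NPA, NXX, and XXXX.
// It does NOT check the NANP rules; that is the regex's job.
function split_nanp(string $input): ?array
{
    $digits = preg_replace('/\D/', '', $input) ?? ''; // drop every non-digit

    if (strlen($digits) === 11 && $digits[0] === '1') {
        $digits = substr($digits, 1); // strip the optional country code
    }
    if (strlen($digits) !== 10) {
        return null; // a component is missing, or there are extra digits
    }

    return [
        'npa'  => substr($digits, 0, 3), // area code
        'nxx'  => substr($digits, 3, 3), // central office code
        'line' => substr($digits, 6, 4), // last four digits
    ];
}

// Example usage
print_r(split_nanp('+1 (801) 555-1234'));
```

Returning `null` rather than throwing keeps the helper usable as a quick pre-check before the validation regex runs.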
To understand whether a phone number can actually be verified (or called), we must know what constitutes a legal phone number. For example, back in the day, all 555 numbers were invalid (e.g. 801-555-1234), which is why the 555 central office code is so common in movies: it prevents an influx of calls from viewers. Today this is no longer true. Only a certain block of 555 numbers is set aside for fictional use: 555-0100 through 555-0199. NANPA considers all other 555 numbers usable. This is why some movies now use phone numbers that start with 1 (e.g. 154-5223), since central office codes cannot start with 1 or 0.
THE DESCRIPTION
So, just in case you have difficulty reading regex (like I do), I will identify what it looks for to validate the phone number. We will use NXX for both the area code and prefix definitions, where N is any digit from 2 through 9 and X is any digit from 0 through 9, and XXXX for the last four digits of the phone number.
AREA CODES (NPA)
(?(?!(37|96))[2-9][0-8][0-9](?<!(11)))?
- N is any digit 2 through 9 (cannot be 0 or 1)
- XX cannot be 11 (e.g. 911, 411, 511)
- N9X codes cannot be used; they are reserved as expansion codes
- The 37X and 96X ranges are set aside for unanticipated purposes
That covers the area code portion of the regex.
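For readers who find regex as hard to read as I do, the same area-code rules can be written as plain PHP. This is a sketch; `is_valid_npa()` is a hypothetical name, and it checks only the rules listed above, not whether the area code is actually in service.

```php
<?php
// Rule-by-rule version of the area-code (NPA) checks, written without the regex.
function is_valid_npa(string $npa): bool
{
    if (preg_match('/\A[0-9]{3}\z/', $npa) !== 1) {
        return false;                 // must be exactly three digits
    }
    if ($npa[0] === '0' || $npa[0] === '1') {
        return false;                 // N: first digit must be 2-9
    }
    if ($npa[1] === '9') {
        return false;                 // N9X: reserved as expansion codes
    }
    if (substr($npa, 1) === '11') {
        return false;                 // N11: 211, 411, 911, ...
    }
    $firstTwo = substr($npa, 0, 2);
    if ($firstTwo === '37' || $firstTwo === '96') {
        return false;                 // 37X and 96X: set aside
    }
    return true;
}

// Example usage
foreach (['801', '911', '290', '372', '965', '123'] as $npa) {
    echo $npa, ': ', is_valid_npa($npa) ? 'valid' : 'invalid', "\n";
}
```

Each rule gets its own early return, so it is easy to see which rule rejected a code, something the single regex cannot tell you.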
CENTRAL OFFICE CODES (NXX, also called the exchange or prefix)
[2-9][0-9]{2}(?<!(11))
- N cannot be 0 or 1
- XX cannot be 11 (N11 combinations such as 411 and 911 are service codes)
PHONE NUMBER (last four digits)
[0-9]{4}(?<!(555(01([0-9][0-9])|1212)))
The only restrictions on the last four digits apply in combination with the central office code:
- The number cannot be 555-0100 through 555-0199
- 555-1212 is reserved for directory assistance (similar to 411)
You may immediately notice an exception to my validation rule. While 555-1212 is a valid phone number and should pass the validation regex, it cannot be assigned. As I said in the first paragraph, I deem a number valid if it can be assigned. Since 555-1212 is a predetermined assignment that will never be given to a residence or business, I exclude it. This also prevents bogus phone numbers from being added to the data set.
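The central office code and last-four checks in the regex can likewise be written without lookbehinds. `is_valid_local()` is a hypothetical name; it takes the NXX and XXXX parts separately and applies only the rules described here, including the 555-1212 exclusion.

```php
<?php
// Rule-by-rule version of the central office code (NXX) and last-four checks.
function is_valid_local(string $nxx, string $line): bool
{
    if (preg_match('/\A[0-9]{3}\z/', $nxx) !== 1 || preg_match('/\A[0-9]{4}\z/', $line) !== 1) {
        return false;                 // NXX is three digits, XXXX is four
    }
    if ($nxx[0] === '0' || $nxx[0] === '1') {
        return false;                 // N: first digit must be 2-9
    }
    if (substr($nxx, 1) === '11') {
        return false;                 // N11 combinations are service codes
    }
    if ($nxx === '555') {
        if ($line === '1212') {
            return false;             // directory assistance, never assigned
        }
        $n = (int) $line;             // "0150" becomes 150
        if ($n >= 100 && $n <= 199) {
            return false;             // 555-0100 through 555-0199: fictional
        }
    }
    return true;
}

// Example usage
var_dump(is_valid_local('555', '1234')); // bool(true)
```

If you agree with the reader comment below that 555-1212 should pass, deleting the `'1212'` branch is the only change needed.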
You may have also noticed that I did not explain the
(?:1)?
at the beginning of the regex. This simply means the number validates with or without a leading 1. The International Telecommunication Union (ITU) has assigned all North American Numbering Plan numbers the country code “1” (see http://www.nanpa.com/about_us/abt_nanp.html), so any NANP number can be dialed with or without the leading 1, depending on how and where you are calling from.
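Putting the pieces together, note one quirk of the regex at the top of the post: the `?` after the area-code group makes that whole group optional, so a bare seven-digit string such as 5551234 also prints Valid!, even though the disclaimers assume 10 or 11 digits. If you want the regex itself to enforce the area code, a stricter variant might look like the following. It is a sketch under that assumption rather than a drop-in replacement: it writes the optional country code as `1?`, uses a plain negative lookahead for the 37X/96X exclusion, and splits the 555 lookbehind into two fixed-length branches. `NANP_STRICT_REGEX` and `is_valid_nanp_strict()` are hypothetical names.

```php
<?php
// Stricter variant (an assumption): the area code is mandatory.
const NANP_STRICT_REGEX = '/^1?(?!37|96)[2-9][0-8][0-9](?<!11)[2-9][0-9]{2}(?<!11)[0-9]{4}(?<!55501[0-9]{2}|5551212)$/';

function is_valid_nanp_strict(string $phone): bool
{
    $digits = preg_replace('/\D/', '', $phone) ?? ''; // same cleansing step as above
    return preg_match(NANP_STRICT_REGEX, $digits) === 1;
}

// Example usage
$samples = ['+1 (801) 555-1234', '555-1234', '801-555-1212', '(372) 555-1234'];
foreach ($samples as $sample) {
    printf("%-20s %s\n", $sample, is_valid_nanp_strict($sample) ? 'Valid!' : 'Invalid!');
}
```

Only the first sample passes: the second has no area code, the third is the excluded 555-1212, and the fourth falls in the reserved 37X range.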
OTHER OPTIONS
There are additional ways to filter phone numbers. As I said at the beginning of this post, this is simply to validate the number against the North American Numbering Plan, which determines whether the number could be assigned and dialed. There may be additional filters you want to apply, such as:
- Check for a valid US or Canadian phone number
- Verify that the central office code is actually assigned within the NPA
- Confirm the NPA is assigned and is in use
- Remove all toll-free or special-service numbers (e.g. 800, 888, 700, 900)
You could build one big regex that also filters out the special-service area codes, like:
$regex = '/^(?:1)?(?!(37|96|[2-9]00|8(22|33|44|55|66|77|80|81|82|88)))[2-9][0-8][0-9](?<!(11))[2-9][0-9]{2}(?<!(11))[0-9]{4}(?<!(555(01([0-9][0-9])|1212)))$/';
But this would change the method from validate to validate_filter. The downside is what happens when the information changes: when an area code is put into use, you have to modify the regex before numbers in it can pass the filter. My advice would be to keep the validation regex by itself. If you need to filter out specific numbers, build a separate filter method that applies the filter rules, using a database or predefined arrays to filter against. This keeps your validation regex clean, reliable, and separate from whatever filter rules you may or may not want to apply.
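Here is a minimal sketch of that separation. The function names `nanp_validate()` and `nanp_filter()` are hypothetical, the validator uses a variant of the regex with a mandatory area code (an assumption), and the blocked list is illustrative only; in practice it would come from your database or configuration.

```php
<?php
// Step 1, validation: NANP rules only (area code required in this sketch).
function nanp_validate(string $digits): bool
{
    $regex = '/^1?(?!37|96)[2-9][0-8][0-9](?<!11)[2-9][0-9]{2}(?<!11)[0-9]{4}(?<!55501[0-9]{2}|5551212)$/';
    return preg_match($regex, $digits) === 1;
}

// Step 2, filtering: business rules kept out of the regex. The caller passes
// the list, so it can change without touching the validator.
// Call this only after nanp_validate() has accepted $digits.
function nanp_filter(string $digits, array $blockedNpas): bool
{
    $npa = strlen($digits) === 11 ? substr($digits, 1, 3) : substr($digits, 0, 3);
    return !in_array($npa, $blockedNpas, true);
}

// Example usage with an illustrative list of toll-free and premium area codes.
$blocked = ['800', '833', '844', '855', '866', '877', '888', '900'];
$phone   = preg_replace('/\D/', '', '1-800-555-1234');

if (!nanp_validate($phone)) {
    echo "Invalid!\n";
} elseif (!nanp_filter($phone, $blocked)) {
    echo "Valid, but filtered out\n";
} else {
    echo "Valid!\n";
}
```

When a new toll-free code comes into service, only the `$blocked` data changes; the validation regex stays untouched.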
In conclusion, as you can see from my musings, validating a phone number can be simple, as long as you follow the rules. Many people, especially developers, tend to assume they know enough about something to create a solution. One thing I have learned from experience is that whenever I think I have found the final solution to something, an exception to the rule usually appears. If you find anything in this information to be inaccurate, please let me know. Happy coding!
b1232456 on December 3rd, 2010
Very useful. I would only write the regex shorter and include the 1212 information number, as it can be dialed:
\+1(?(?!(37|96))[2-9][0-8]\d(?<!(11)))?[2-9]\d\d(?<!(11))\d{4}(?<!(555(01\d\d)))