Phone number regular expression...

  • joemono

    Phone number regular expression...

    Hello everyone!

    First, I apologize if this posting isn't proper "netiquette" for this
    group.

    I've been working with Perl for almost 2 years now. However, my regular
    expression knowledge is pretty limited. I wrote the following expression to
    take (hopefully) any _reasonable_ phone number input and format it as
    (999) 999-9999 x 9999.

    Here's what I've come up with. I would like your comments, if you've got the
    time. I'm really interested in regular expressions, and I want to know if
    what I'm doing is inefficient, slow, etc...

    # area code: optional parenthesis, 3 digits, optional parenthesis
    \({0,1}\s*(\d{3}){0,1}\s*\){0,1}
    # match only if the area code is followed by what looks like a phone
    # number (the same match as the standard 7-digit number below)
    (?=[-| ]*(\d{3}){1}[-| ]*(\d{4}){1})

    # main phone number
    [-| ]*
    (\d{3}){1} # first 3 digits
    [-| ]*
    (\d{4}){0,1} # second 4 digits

    # extension
    [-| |x|X]*
    (\d{3,4}){0,1} # extension

    For example, here's a question I have. Is there a way to use the look-ahead
    match in the area code section _again_ for matching the main number, since
    they are the same? I also know that I could use ? instead of {0,1}
    (correct?), but I always get confused between that and the non-greedy
    quantifier. Does that make sense?
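A minimal sketch of one answer to both questions, in Perl like the rest of the thread. It assumes US-style numbers; the helper name `format_phone` and the sample inputs are illustrative, not from the original post. The seven-digit core is written once with `qr//` and interpolated into the larger pattern, which is the usual Perl way to reuse a sub-pattern. The look-ahead is left out: because the area-code group is optional, the engine backtracks out of it whenever the remaining text cannot form a 3+4-digit number, so in this pattern the guard is not needed for correctness.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The seven-digit core is defined once; a qr// object can be
# interpolated into a larger pattern wherever it is needed.
my $core = qr/(\d{3})[- ]*(\d{4})/;

# "?" means exactly {0,1}. Only a "?" placed after another quantifier
# (??, *?, +?) makes that quantifier non-greedy. Inside a character
# class "|" is a literal pipe, so [- ] is enough for "hyphen or space".
my $phone = qr/
    ^ \s*
    (?: \(? (\d{3}) \)? [- ]* )?               # optional area code
    $core                                      # prefix and line number
    (?: \s* (?i: x | ext\.? )? \s* (\d+) )?    # optional extension, any length
    \s* \z
/x;

# Hypothetical helper: returns "(999) 999-9999 x 9999" style text,
# or undef when the input does not look like a phone number.
sub format_phone {
    my ($input) = @_;
    my ($area, $prefix, $line, $ext) = $input =~ $phone
        or return;    # no match: empty list, so return undef
    my $out = defined $area ? "($area) " : '';
    $out .= "$prefix-$line";
    $out .= " x $ext" if defined $ext;
    return $out;
}

for my $raw ('123-4567', '(310) 123 4567', '310-123-4567 ext 890', '310 123 FUBAR') {
    my $formatted = format_phone($raw);
    print defined $formatted ? "$formatted\n" : "Invalid: $raw\n";
}
```

This version rejects trailing junk such as `123-4567FUBAR` because of the `\z` anchor, and it accepts an extension of any length; both are policy choices rather than requirements.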

    I wrote a script to test it (it generates many different possible phone
    number inputs, and then applies the regular expression), and it _seems_ to
    work. But like I said, I kinda don't know what I'm doing. I've been using
    http://www.perldoc.com/perl5.6/pod/perlre.html heavily. It's pretty useful.

    Here's another question: do people ever have extensions shorter than 3 or
    longer than 4 digits?

    Thanks for your help!

    Joe


  • Purl Gurl

    #2
    Re: Phone number regular expression...

    joemono wrote:

    (snipped)

    > I wrote the following expression to take (hopefully) any _reasonable_
    > phone number input, and format it as
    > (999) 999-9999 x 9999.

    The parameter here is "reasonable" American-style phone numbers.


    > what I'm doing is inefficient, slow, etc...

    (snipped a lot of regex matching)

    Yes, very slow, very inefficient. Do not invoke a
    regex engine unless you have no choice, or a regex
    actually "proves" to be the most efficient method
    found within a collection of tested methods.


    > Is there a way to use the look-ahead match

    Never use look-ahead unless you have no choice.
    Using any style of look-ahead will almost always
    be slow and inefficient compared to other methods.

    Note my "almost always" does not mean "always" as some
    might ignorantly claim. In some cases, a look-ahead
    could be your only choice, or most efficient choice.
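Speed claims like these are easy to check with Perl's core Benchmark module. The sketch below times two simplified patterns (illustrations only, not the original poster's exact regex), one with a look-ahead guard and one without, against the digit-stripping idea, on a few made-up inputs. What `cmpthese` prints depends on your Perl build and your data, so treat it as a starting point for measurement rather than a verdict.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample inputs; substitute real data before drawing conclusions.
my @inputs = ('123-4567', '(310) 123 4567', '310-123-4567 x 890', '310 123 FUBAR');

# Two illustrative patterns that accept the same sample inputs:
# one guarded by a look-ahead, one relying on ordinary backtracking.
my %pattern = (
    lookahead => qr/^\(?(\d{3})?\)?(?=[- ]*\d{3}[- ]*\d{4})[- ]*(\d{3})[- ]*(\d{4})/,
    plain     => qr/^(?:\(?(\d{3})\)?)?[- ]*(\d{3})[- ]*(\d{4})/,
);

# The digit-stripping approach, reduced to "at least 7 digits present".
sub strip_and_count { (my $d = shift) =~ tr/0-9//dc; return length $d }

# Run each candidate for about one CPU second and print a comparison table.
cmpthese(-1, {
    lookahead => sub { my $n = grep { $_ =~ $pattern{lookahead} } @inputs },
    plain     => sub { my $n = grep { $_ =~ $pattern{plain} } @inputs },
    tr_length => sub { my $n = grep { strip_and_count($_) >= 7 } @inputs },
});
```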


    > do people ever have extensions less than 3, or greater than 4 numbers?

    Extensions cannot be predicted. Length of an extension is
    directly controlled by an internal PBX system. An extension
    length can literally be any length.

    What is the length of those extensions you hear during a
    recorded menu selection? Is there more than one extension?
    These types of numbers could be a problem.

    1-800-tru-idiots
    if you are stupid, press 1 now
    *next menu*
    if you are stupid and gullible, press 2 now
    *next menu*
    if you are stupid, gullible and tired of this, press 3 now
    *next menu*
    Thank you for calling America Onlame! You are an idiot! Goodbye!
    *dial tone*

    I count three extensions each with a length of one.

    Your methodology allows parentheses, hyphens and such, then
    tries to match for all possible combinations. This is quite
    inefficient and prone to error.

    Remove all characters except numbers, then work with your data.
    You are interested in phone numbers, are you not? So work with
    numbers, nothing else.

    Keep in mind, regardless of what methodology you employ, there
    is a good chance there will be false positives and false negatives.
    Parsing phone numbers is similar to parsing email addresses; it
    is difficult and unpredictable.

    Look over my method below. This method eliminates all characters
    except numbers, then generates a very uniform output appropriate
    for a data file. Output is also easy on the human eye.


    Ever wonder why people use "spelled" phone numbers, like

    1-800-bite-me

    When someone tries to give me a spelled number, I say,

    "Don't bother. I will not call you."


    Purl Gurl
    --
    Rock Midis! Science Fiction! Amazing Androids!


    I use $test_it to illustrate a non-destructive method, which is
    needed so that invalid numbers can be printed in full. You
    could easily use $_ throughout as well, but this
    defeats "full" printing of an invalid phone number.

    #!perl
    use strict;
    use warnings;

    while (<DATA>)
    {
        chomp;
        my $test_it = $_;
        $test_it =~ s/[^\d+]//g;    # drop everything except digits and "+"

        if ($test_it =~ tr/0-9// == 7)
        {
            # 1234567 -> "123 4567"
            substr ($test_it, 3, 0, " ");
            print "$test_it\n";
        }
        elsif ($test_it =~ tr/0-9// == 10)
        {
            # 3101234567 -> "310 123 4567"
            substr ($test_it, 3, 0, " ");
            substr ($test_it, 7, 0, " ");
            print "$test_it\n";
        }
        elsif ($test_it =~ tr/0-9// > 10)
        {
            # digits beyond the tenth are treated as an extension
            substr ($test_it, 3, 0, " ");
            substr ($test_it, 7, 0, " ");
            substr ($test_it, 12, 0, " ");
            print "$test_it\n";
        }
        else
        { print "Phone Number Appears Invalid: $_\n"; }
    }


    __DATA__
    123-4567
    123 4567
    (310) 123 4567
    310-123-4567
    310-123-4567 ext 890
    310 123 4567 890
    123-4567FUBAR
    310 123 FUBAR



    PRINTED RESULTS:
    ________________

    123 4567
    123 4567
    310 123 4567
    310 123 4567
    310 123 4567 890
    310 123 4567 890
    123 4567
    Phone Number Appears Invalid: 310 123 FUBAR


    • Roy Johnson

      #3
      Re: Phone number regular expression...

      I thought that you made a few odd (either esoteric or not Lazy enough)
      implementation decisions.

      Purl Gurl <purlgurl@purlgurl.net> wrote in message news:<3F762F4F.13CEA620@purlgurl.net>...
      > [...]You could easily use $_ throughout as well, but this
      > defeats "full" printing of an invalid phone number.

      Instead of preserving $_ and working on $test_it, you could have saved
      a copy and then worked on $_ itself.

      You used s/[^\d+]//g instead of tr/0-9//dc to remove all non-digits.

      You used tr/0-9// instead of length.
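Both points can be checked directly. A small sketch with a made-up sample number shows that the two deletion idioms are close but not identical: the class `[^\d+]` also spares plus signs, so the `s///` version keeps a leading `+` while `tr/0-9//dc` removes it. Once only digits remain, `length` gives the same count that `tr/0-9//` would.

```perl
use strict;
use warnings;

# Hypothetical input with an international "+" prefix and a trailing newline.
my $raw = "+1 (310) 123-4567\n";

# tr///dc deletes (d) the complement (c) of 0-9: every non-digit goes,
# including "+" and the newline.
(my $via_tr = $raw) =~ tr/0-9//dc;

# [^\d+] means "anything that is neither a digit nor a plus sign",
# so the "+" survives this substitution.
(my $via_s = $raw) =~ s/[^\d+]//g;

# After tr///dc only digits remain, so length() is the digit count.
my $digits = length $via_tr;

print "$via_tr\n$via_s\n$digits\n";
```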

      The use of the 4-argument version of substr() was neat, but a
      judicious pattern match instead of length-checking makes for tighter
      code:

      while (<DATA>) {
          my $save = $_;
          tr/0-9//dc;    # delete every non-digit, newline included
          if (/(...)?(...)(....)/) {
              printf "%3s %s %s %s\n", $1, $2, $3, $';
          }
          else {
              print "Invalid phone number: $save\n";
          }
      }

      Now let's go back to the issue of stripping all non-numerics. If you
      do that, you can't distinguish 123-4567 x890 from (123) 456 7890.
      Granted, when you dial, the phone doesn't know the difference, but
      there may be some difference in how the person doing the dialing has
      to behave.
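The collision described above is easy to demonstrate. In this sketch (the two inputs are the ones from the paragraph; the variable names are illustrative), stripping non-digits turns both strings into the same ten digits, 1234567890, so the extension and the area code can no longer be told apart.

```perl
use strict;
use warnings;

# Two inputs a human reads differently: a 7-digit number with an
# extension, and a 10-digit number with an area code.
my @inputs = ('123-4567 x890', '(123) 456 7890');

# Strip every non-digit, as the first reply suggests.
my @stripped = map { my $d = $_; $d =~ tr/0-9//dc; $d } @inputs;

print "$inputs[$_] -> $stripped[$_]\n" for 0 .. $#inputs;
```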

      If, instead of stripping the non-digits, you just look for groups of
      digits (optional 3, then mandatory 3 and 4, then optional however
      many) amongst the non-digits, you can address that:

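      #!perl
      while (<DATA>) {
          my $save = $_;
          # optional 3-digit area code, then 3 and 4 digits, then an optional
          # extension of any length, each group separated by non-digits
          if (/^\D*(?:(\d{3})\D+)?(\d{3})\D+(\d{4})(?:\D+(\d+))?/) {
              printf "%3s %s %s %s\n", $1, $2, $3, $4;
          }
          else {
              print "Invalid phone number: $save\n";
          }
      }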

      __DATA__
      123-4567
      123 4567
      123 4567 x890 <-- note
      (310) 123 4567
      310-123-4567
      310-123-4567 ext 890
      310 123 4567 890
      123-4567FUBAR
      310 123 FUBAR


      Output is:
      123 4567
      123 4567
      123 4567 890
      310 123 4567
      310 123 4567
      310 123 4567 890
      310 123 4567 890
      123 4567
      Invalid phone number: 310 123 FUBAR


      • Gunnar Hjalmarsson

        #4
        Re: Phone number regular expression...

        joemono wrote:
        > I wrote the following expression to take (hopefully) any
        > _reasonable_ phone number input, and format it as (999) 999-9999 x
        > 9999.

        Hi Joe,

        I don't know the likelihood in your case that people outside the US
        are asked to enter their phone numbers. The reason why I mention it is
        that I have tried to enter my non-US number at quite a few US based
        web sites, resulting in error messages...

        So, from that experience, I'd say that strict phone number
        checking is sometimes a really bad idea. ;-)

        Gunnar
        (Sweden)

        --
        Gunnar Hjalmarsson
        Email: http://www.gunnar.cc/cgi-bin/contact.pl


        • Purl Gurl

          #5
          Re: Phone number regular expression...

          Roy Johnson wrote:

          > Purl Gurl wrote in message

          > I thought that you made a few odd (either esoteric or not Lazy enough)
          > implementation decisions.

          I have no interest in reading Code Cop Crap.

          It is annoying to open an article only to discover
          this type of troll mule manure you write.

          Respond to the originating author as you should.

          You are wasting your time and the time of readers.


          Purl Gurl


          • Roy Johnson

            #6
            Re: Phone number regular expression...

            Purl Gurl <purlgurl@purlgurl.net> wrote in message news:<3F7DB1D9.A0510387@purlgurl.net>...

            > I have no interest in reading Code Cop Crap.

            Interesting. I have no interest in your critiques of my posts that
            have nothing to do with Perl.

            It's not "trolling" to point out that you're doing bizarre things when
            straightforward methods are available. My code was much more clear
            than yours, as well as being shorter.

            delete $shoulder->{'chip'}
