help with regular expression interpretation

  • hq4ever (at) 012 (dot) net (dot) il

    help with regular expression interpretation


    function testemail($email) {

        $validEmailExpr =
            "^[0-9a-z~!#$%&_-]([.]?[0-9a-z~!#$%&_-])*@[0-9a-z~!#$%&_-]([.]?[0-9a-z~!#$%&_-])*$";

        return eregi($validEmailExpr, $email);
    }

    $email = "foo@bar.gov.mil";
    testemail($email); //return TRUE

    $email = "foo.bar@bar.gov.mil";
    testemail($email); //return TRUE

    $email = "foo..bar@bar.gov.mil";
    testemail($email); //return FALSE - why ??

    $email = "foo.@bar.gov.mil";
    testemail($email); //return FALSE - why ??

    As I understand it, the steps are:

    1. accept exactly 1 char from the class [0-9a-z~!#$%&_-]
    2. then accept 0 or more repetitions of the group ([.]?[0-9a-z~!#$%&_-])
    -- so why is foo..bar not valid input?
    -- shouldn't it be ([.]?[0-9a-z~!#$%&_-]*)* ?

    thank you for your help.

    p.s Sorry if my previous post "can i get the public key of client
    machine using php" didn't fit right into USENET, i am very very new here
    so go easy on me :)
  • zYm3N

    #2
    Re: help with regular expression interpretation

    On Sun, 01 Aug 2004 21:03:44 +0000, hq4ever (at) 012 (dot) net (dot) il
    wrote:

    > function testemail($email) {
    >
    > $validEmailExpr =
    > "^[0-9a-z~!#$%&_-]([.]?[0-9a-z~!#$%&_-])*@[0-9a-z~!#$%&_-]([.]?[0-9a-z~!#$%&_-])*$";
    >
    > return eregi($validEmailExpr, $email);
    > }
    >
    > $email = "foo@bar.gov.mil";
    > testemail($email); //return TRUE
    >
    > $email = "foo.bar@bar.gov.mil";
    > testemail($email); //return TRUE
    >
    > $email = "foo..bar@bar.gov.mil";
    > testemail($email); //return FALSE - why ??

    "]([.]?[0-9"

    Each repetition of that group allows zero or one '.', so two dots in a
    row can't match.

    > $email = "foo.@bar.gov.mil";
    > testemail($email); //return FALSE - why ??

    The last char before the @ has to be one of [0-9a-z~!#$%&_-], not a dot.

    Try to use:

    eregi("^[a-z0-9_]+@([a-z0-9_]+[.])+[a-z]{2,}$", $email);


    --
    zYm3N[@interia.pl]

    ..:: C++ | C | PHP | HTML | Delphi | Pascal
    ..:: >> http://zymen.cjb.net <<
    ..:: http://zymen.cjb.net/cytowanie.html


    • Ian.H

      #3
      Re: help with regular expression interpretation

      On Sun, 01 Aug 2004 20:49:48 +0200, zYm3N wrote:


      [ re: e-mail syntax validation ]

      > Try to use:
      >
      > eregi("^[a-z0-9_]+@([a-z0-9_]+[.])+[a-z]{2,}$", $email);


      foo+bar@my-domain.com


      would fall over drunk in your example.



      Regards,

      Ian


      PS: _ (underscore) is not a valid char within a domain name.
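
To see both points concretely, here is a small sketch. It assumes the dot in the suggested pattern was meant as a literal dot, and it uses preg_match() with the i modifier in place of eregi(). The helper name matches_suggested() is made up for illustration.

```php
<?php
// Hypothetical helper: the pattern suggested above, ported to PCRE.
// Assumption: the "." before ")+" was meant as a literal dot, written [.] here.
function matches_suggested(string $email): bool
{
    return preg_match('/^[a-z0-9_]+@([a-z0-9_]+[.])+[a-z]{2,}$/i', $email) === 1;
}

var_dump(matches_suggested('foobar@example.com'));    // plain address
var_dump(matches_suggested('foo+bar@my-domain.com')); // "+" and "-" are outside both classes
```

The second address fails twice over: the "+" in the local part and the "-" in the domain are both missing from the character classes.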

      --
      Ian.H
      digiServ Network
      London, UK



      • maxim vexler

        #4
        Re: help with regular expression interpretation

        for what input will this "^([.]?[0-9a-z~!#$%&_-])*$" regular expression
        be TRUE ?

        as i understand, it should accept at most one time '.' & then one
        character, no? and so it can be repeated n times, something like
        .f.o.o.b.a.r which should evaluate as TRUE, why is it then that a string
        such as .my-domain.root still evaluates TRUE ?


        • Ian.H

          #5
          Re: help with regular expression interpretation

          On Mon, 02 Aug 2004 02:37:40 +0000, maxim vexler <hq4ever (at) 012 (dot)
          net (dot) il> wrote:
          > for what input will this "^([.]?[0-9a-z~!#$%&_-])*$" regular expression
          > be TRUE ?
          >
          > as i understand, it should accept at most one time '.' & then one
          > character, no? and so it can be repeated n times, something like
          > .f.o.o.b.a.r which should evaluate as TRUE, why is it then that a string
          > such as .my-domain.root still evaluates TRUE ?


          Maxim,

          Although not 100% fail-safe (nothing will be with checks such as this), I
          use this function I wrote normally:


          function validate_email($addy, $return_mx_records = false) {
              if (empty($addy)) return false;

              // Syntax check first; the MX lookup only runs for plausible addresses.
              if (!preg_match(
                  '/^[a-zA-Z0-9&\'\.\-_\+]+\@[a-zA-Z0-9.-]+\.+[a-zA-Z]{2,6}$/',
                  $addy
              )) {
                  return false;
              }

              // array_pop() takes its argument by reference, so it needs a real variable.
              $parts = explode('@', $addy);
              $mx_exists = false;
              $mx_records = array();
              if (getmxrr(array_pop($parts), $mx_records)) {
                  $mx_exists = true;
              }

              if ($mx_exists) {
                  return ($return_mx_records) ? $mx_records : true;
              } else {
                  unset($mx_records);
                  return false;
              }
          }



          USAGE:

          if (!validate_email($_POST['email'])) {
              $email_valid = false;
              /* Do fail stuff */
          }



          HTH =)



          Regards,

          Ian


          PS: Watch for line wrapping.

          --
          Ian.H
          digiServ Network
          London, UK



          • Ian.H

            #6
            Re: help with regular expression interpretation

            On Mon, 02 Aug 2004 02:47:31 +0000, Ian.H wrote:
            > if (!preg_match(
            > '/^[a-zA-Z0-9&\'\.\-_\+]+\@[a-zA-Z0-9.-]+\.+[a-zA-Z]{2,6}$/',
            > $addy
            > )) {
            > return false;
            > }


            Oops, to fix the very issue you mentioned, change the regex to:


            '/^[a-zA-Z0-9&\'\.\-_\+]+\@[^\.][a-zA-Z0-9.-]+\.+[a-zA-Z]{2,6}$/'
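
A quick way to compare the two versions is to run both against a few addresses. This is only a sketch: the patterns are copied unchanged from the posts above, and the sample addresses are made up for illustration.

```php
<?php
// The two patterns from the posts above, copied unchanged.
$before = '/^[a-zA-Z0-9&\'\.\-_\+]+\@[a-zA-Z0-9.-]+\.+[a-zA-Z]{2,6}$/';
$after  = '/^[a-zA-Z0-9&\'\.\-_\+]+\@[^\.][a-zA-Z0-9.-]+\.+[a-zA-Z]{2,6}$/';

// Made-up sample addresses, for illustration only.
$samples = ['foo@example.com', 'foo@.example.com', 'foo@a.com'];

foreach ($samples as $addy) {
    printf("%-18s before=%d after=%d\n", $addy,
        preg_match($before, $addy), preg_match($after, $addy));
}
```

The leading-dot domain is rejected only by the second pattern. One side effect worth knowing: because [^\.] and the class after it each consume at least one character, a single-character name before the final dot (foo@a.com) no longer matches either.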



            Regards,

            Ian

            --
            Ian.H
            digiServ Network
            London, UK



            • Tim Van Wassenhove

              #7
              Re: help with regular expression interpretation

              In article <pan.2004.08.02.02.51.40.641000@bubbleboy.digiserv.net>, Ian.H wrote:
              > On Mon, 02 Aug 2004 02:47:31 +0000, Ian.H wrote:
              >
              >> if (!preg_match(
              >> '/^[a-zA-Z0-9&\'\.\-_\+]+\@[a-zA-Z0-9.-]+\.+[a-zA-Z]{2,6}$/',
              >> $addy
              >> )) {
              >> return false;
              >> }
              >
              >
              > Oops, to fix the very issue you mentioned, change the regex to:
              >
              >
              > '/^[a-zA-Z0-9&\'\.\-_\+]+\@[^\.][a-zA-Z0-9.-]+\.+[a-zA-Z]{2,6}$/'


              If it doesn't allow _all_ valid e-mail addresses it's useless.

              A quick search in this newsgroup will direct you to phpclasses or
              http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html


              --
              Tim Van Wassenhove <http://home.mysth.be/~timvw>


              • maxim vexler

                #8
                Re: help with regular expression interpretation

                You have certainly given me an answer on how to do e-mail validation,
                for which I thank you.

                Still, could you explain why the expression in my question didn't
                return what I expected it to return, just for my own knowledge:

                > for what input will this "^([.]?[0-9a-z~!#$%&_-])*$" regular expression
                > be TRUE ?
                >
                > as i understand, it should accept at most one time '.' & then one
                > character, no? and so it can be repeated n times, something like
                > .f.o.o.b.a.r which should evaluate as TRUE, why is it then that a string
                > such as .my-domain.root still evaluates TRUE ?

                thank you.

                Tim Van Wassenhove wrote:
                > In article <pan.2004.08.02.02.51.40.641000@bubbleboy.digiserv.net>, Ian.H wrote:
                >
                >>On Mon, 02 Aug 2004 02:47:31 +0000, Ian.H wrote:
                >>
                >>
                >>> if (!preg_match(
                >>> '/^[a-zA-Z0-9&\'\.\-_\+]+\@[a-zA-Z0-9.-]+\.+[a-zA-Z]{2,6}$/',
                >>> $addy
                >>> )) {
                >>> return false;
                >>> }
                >>
                >>
                >>Oops, to fix the very issue you mentioned, change the regex to:
                >>
                >>
                >> '/^[a-zA-Z0-9&\'\.\-_\+]+\@[^\.][a-zA-Z0-9.-]+\.+[a-zA-Z]{2,6}$/'
                >
                >
                >
                > If it doesn't allow _all_ valid e-mail addresses it's useless.
                >
                > A quick search in this newsgroup will direct you to phpclasses or
                > http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html
                >


                • Ian.H

                  #9
                  Re: help with regular expression interpretation

                  On Mon, 02 Aug 2004 12:08:53 +0000, maxim vexler <hq4ever (at) 012 (dot)
                  net (dot) il> wrote:
                  > still, could you explain why the expression in my question didn't
                  > returned what i expected it to return, just for my own knowledge :

                  > > for what input will this "^([.]?[0-9a-z~!#$%&_-])*$" regular expression
                  > > be TRUE ?


                  This would allow all your chosen chars above (assuming none of those
                  chars requires escaping, I can't think off the top of my head), but I
                  believe there are two issues with it.

                  First, it tries to validate _starting_ with a . as the first char. For
                  no . at the beginning, you'd need [^\.] rather than [.] (the ^ in this
                  case negates things, so "if not .", but don't confuse that with the
                  first ^ in your pattern, which is the start anchor).

                  The other issue, by the looks of it, is '(...)*': the * here specifies
                  zero or more occurrences, so 'foo@.com' would slide through too.

                  Something else that you might find useful:


                  <http://weitz.de/regex-coach/>


                  I use it normally for more complex regexs, but it's great also for
                  learning regex too IMO.


                  HTH =)



                  Regards,

                  Ian

                  --
                  Ian.H
                  digiServ Network
                  London, UK



                  • Michael Fesser

                    #10
                    Re: help with regular expression interpretation

                    .oO(maxim vexler <hq4ever (at) 012 (dot) net (dot) il>)
                    >for what input will this "^([.]?[0-9a-z~!#$%&_-])*$" regular expression
                    >be TRUE ?

                    foo
                    .foo
                    foo.bar

                    but not

                    foo.
                    ..foo
                    foo..bar

                    It accepts every string with chars of the class [0-9a-z~!#$%&_-], if
                    there's a dot it has to be followed by at least one other char. There
                    can't be a dot at the end or directly followed by another dot.

                    >as i understand, it should accept at most one time '.' & then one
                    >character, no? and so it can be repeated n times, something like
                    >.f.o.o.b.a.r which should evaluate as TRUE, why is it then that a string
                    >such as .my-domain.root still evaluates TRUE ?

                    The dot is optional:

                    [.]?

                    means zero or one chars of the class [.],

                    \.?

                    would do the same in this case.

                    So the pattern could be read like this: Any number (zero or more) of
                    chars of the given class, where each char _may_ be preceded by one dot.
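
The reading above can be checked with a few lines of PHP. This is only a sketch: it uses preg_match() rather than eregi(), so the pattern is wrapped in "/" delimiters, and the trailing i modifier stands in for eregi's case-insensitivity. The test strings are the six listed earlier in this post.

```php
<?php
// Sub-pattern from the question, wrapped in "/" delimiters for PCRE.
$pattern = '/^([.]?[0-9a-z~!#$%&_-])*$/i';
// The same pattern with \.? in place of [.]?
$escaped = '/^(\.?[0-9a-z~!#$%&_-])*$/i';

$results = [];
foreach (['foo', '.foo', 'foo.bar', 'foo.', '..foo', 'foo..bar'] as $s) {
    $results[$s] = preg_match($pattern, $s) === 1;
    $same = $results[$s] === (preg_match($escaped, $s) === 1);
    echo str_pad($s, 10), $results[$s] ? 'match' : 'no match',
        $same ? '' : ' (the two spellings disagree)', "\n";
}
```

Because of the *, the empty string matches as well (zero repetitions of the group).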

                    BTW: You should use the preg* functions (PCRE) instead of the old ereg*
                    functions, they're faster and much more flexible.

                    HTH
                    Micha


                    • maxim vexler

                      #11
                      Re: help with regular expression interpretation

                      what a wonderful tool, thank you very much.

                      Ian.H wrote:
                      > Something else that you might find useful:
                      >
                      >
                      > <http://weitz.de/regex-coach/>
                      >
                      >
                      > I use it normally for more complex regexs, but it's great also for
                      > learning regex too IMO.
                      >
                      >
                      > HTH =)
                      >
                      >
                      >
                      > Regards,
                      >
                      > Ian
                      >


                      • maxim vexler

                        #12
                        Re: help with regular expression interpretation

                        Now I see, I was wrong about the way a regular expression evaluates
                        the text. Thank you for your comments.
                        Why do you say preg* is better than ereg*? I've seen the same tip on
                        php.net, but it didn't say anything about why it is better. I know
                        there are different forms of regular expressions (POSIX & Perl, no?)
                        - is that the case with these functions?
                        - will the syntax be different ?
                        - do they evaluate differently ?
                        - will the syntax be different ?
                        - do they evaluate differently ?

                        thank you, you have been great help to me.

                        Michael Fesser wrote:
                        > .oO(maxim vexler <hq4ever (at) 012 (dot) net (dot) il>)
                        >
                        >
                        >>for what input will this "^([.]?[0-9a-z~!#$%&_-])*$" regular expression
                        >>be TRUE ?
                        >
                        >
                        > foo
                        > .foo
                        > foo.bar
                        >
                        > but not
                        >
                        > foo.
                        > ..foo
                        > foo..bar
                        >
                        > It accepts every string with chars of the class [0-9a-z~!#$%&_-], if
                        > there's a dot it has to be followed by at least one other char. There
                        > can't be a dot at the end or directly followed by another dot.
                        >
                        >
                        >>as i understand, it should accept at most one time '.' & then one
                        >>character, no? and so it can be repeated n times, something like
                        >>.f.o.o.b.a.r which should evaluate as TRUE, why is it then that a string
                        >>such as .my-domain.root still evaluates TRUE ?
                        >
                        >
                        > The dot is optional:
                        >
                        > [.]?
                        >
                        > means zero or one chars of the class [.],
                        >
                        > \.?
                        >
                        > would do the same in this case.
                        >
                        > So the pattern could be read like this: Any number (zero or more) of
                        > chars of the given class, where each char _may_ be preceded by one dot.
                        >
                        > BTW: You should use the preg* functions (PCRE) instead of the old ereg*
                        > functions, they're faster and much more flexible.
                        >
                        > HTH
                        > Micha


                        • Michael Fesser

                          #13
                          Re: help with regular expression interpretation

                          .oO(maxim vexler <hq4ever (at) 012 (dot) net (dot) il>)
                          >why do you say preg* is better then ereg*, i've seen the same tip on
                          >php.net but it didn't said any thing about why is it better ?

                          I haven't done a benchmark, but according to the manual they are faster
                          in most cases. Additionally you can do much more things with them, there
                          are many very useful extensions and features that are not possible with
                          the ereg* functions, for example different modifiers to fine-control the
                          pattern matching, named and conditional subpatterns, assertions ...

                          This allows much more flexible and sometimes more reliable pattern
                          matching.
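
As a rough illustration of those features, here is a sketch that combines a modifier, an assertion and two named subpatterns. The pattern is made up for this example and is not meant as a complete e-mail validator.

```php
<?php
// Illustrative pattern only, not a full e-mail validator.
// (?!.*[.]{2})    negative lookahead assertion: no two dots in a row anywhere
// (?<local>...)   named subpattern for the part before the @
// (?<domain>...)  named subpattern for the part after the @
// trailing "i"    modifier: case-insensitive matching
$pattern = '/^(?!.*[.]{2})(?<local>[0-9a-z~!#$%&_.-]+)@(?<domain>[0-9a-z.-]+)$/i';

if (preg_match($pattern, 'Foo.Bar@bar.gov.mil', $m) === 1) {
    echo $m['local'], ' at ', $m['domain'], "\n";
}
var_dump(preg_match($pattern, 'foo..bar@bar.gov.mil')); // consecutive dots are rejected
```

None of the three constructs has an equivalent in the ereg* functions, which is the kind of flexibility meant above.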

                          >- i know
                          >there are different forms of regular expression (POSIX & Perl no?)
                          >- is it the case with this functions?

                          ereg* is POSIX, preg* is PCRE (Perl Compatible Regular Expressions)

                          >- will the syntax be different ?

                          Slightly. The basic syntax is the same. The main difference is that a
                          PCRE pattern has to be enclosed in delimiters, i.e. a single char before
                          and after the pattern. The ending delimiter may then be followed by some
                          modifiers.
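
For the function from the first post, the change could look roughly like this. It is a sketch, not the only way to do it: "/" is chosen as the delimiter because the pattern itself contains "#", and the i modifier takes over the case-insensitivity of eregi(). Note that eregi() was removed in PHP 7, so the ereg* version no longer runs on current PHP at all.

```php
<?php
// Port of testemail() from the first post to preg_match().
// The pattern is unchanged apart from the "/" delimiters and the "i" modifier.
function testemail(string $email): bool
{
    $validEmailExpr =
        '/^[0-9a-z~!#$%&_-]([.]?[0-9a-z~!#$%&_-])*@[0-9a-z~!#$%&_-]([.]?[0-9a-z~!#$%&_-])*$/i';

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($validEmailExpr, $email) === 1;
}

var_dump(testemail('foo@bar.gov.mil'));      // matches
var_dump(testemail('foo.bar@bar.gov.mil'));  // matches
var_dump(testemail('foo..bar@bar.gov.mil')); // two dots in a row: no match
var_dump(testemail('foo.@bar.gov.mil'));     // dot right before the @: no match
```

The pattern is kept in a single-quoted string so that PHP does not try to interpolate anything after the "$" characters.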

                          Micha
