preg_match doesn't work properly!?

  • chadsspameateremail@yahoo.com

    preg_match doesn't work properly!?

    I might have found a problem with how preg_match works though I'm not
    sure.
    Let's say you have a regular expression that you want to match a string
    of numbers. You might write the code like this:
    preg_match( '/^[0-9]+$/', $TestString );

    OK everything seems fine. However, did you know if you pass the
    following to preg_match: "12345\n" it will return that a match
    occurred?!? Even though the newline is not a valid character in our
    regular expression.

    Here is the test program, *please run the program as written below*:

    <?php
    $TestString = "12345\n";
    print preg_match( '/^[0-9]+$/', $TestString );
    ?>

    You will find it prints 1 even though the newline character isn't a
    valid part of our regular expression. What other characters I wonder
    can be put in a regular expression and have the string match!? Any
    ideas on this? Why is this undocumented behavior present in PHP?!?
    For regular expressions to not work as expected or documented seems
    like a pretty serious bug in PHP. I don't think there is a problem
    with the regular expression.

    Thoughts?
  • chadsspameateremail@yahoo.com

    #2
    Re: preg_match doesn't work properly!?

    I found this link about the topic:


    Apparently '$' isn't the end of the string unless you add the 'D' to
    the end as in:
    print preg_match( '/^[0-9]+$/D', $TestString );

    The page says 'even documented in the PHP manual is that $...' however
    I looked at the preg_match page on php.net and there is no mention of
    this or the /D switch either. Any ideas what the author was referring
    to?

    I am new to PHP but I would certainly consider this a 'gotcha'
    especially since it is relatively undocumented.
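
For anyone who wants to try the D modifier mentioned above, here is a minimal PHP sketch. The helper name `isAllDigits` is hypothetical and not from this thread; the pattern and the modifier are the ones quoted in the post.

```php
<?php
// Hypothetical helper (not from the thread): true only if $subject
// consists entirely of ASCII digits, with no trailing newline allowed.
function isAllDigits(string $subject): bool
{
    // D (PCRE_DOLLAR_ENDONLY) makes `$` match only at the very end
    // of the subject, not before a final newline.
    return preg_match('/^[0-9]+$/D', $subject) === 1;
}

// Same pattern without D, for comparison with the original test program.
$withoutD = preg_match('/^[0-9]+$/', "12345\n");

var_dump($withoutD);               // int(1)
var_dump(isAllDigits("12345"));    // bool(true)
var_dump(isAllDigits("12345\n"));  // bool(false)
```

Comparing the return value with `=== 1` also keeps a `false` return (pattern error) from being mistaken for a match.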


    • Rik Wasmus

      #3
      Re: preg_match doesn't work properly!?

      chadsspameateremail@yahoo.com wrote:
      > I might have found a problem with how preg_match works though I'm not
      > sure.
      > Let's say you have a regular expression that you want to match a string
      > of numbers. You might write the code like this:
      > preg_match( '/^[0-9]+$/', $TestString );
      >
      > OK everything seems fine. However, did you know if you pass the
      > following to preg_match: "12345\n" it will return that a match
      > occurred?!? Even though the newline is not a valid character in our
      > regular expression.
      >
      > Here is the test program, *please run the program as written below*:
      >
      > <?php
      > $TestString = "12345\n";
      > print preg_match( '/^[0-9]+$/', $TestString );
      > ?>
      >
      > You will find it prints 1 even though the newline character isn't a
      > valid part of our regular expression. What other characters I wonder
      > can be put in a regular expression and have the string match!? Any
      > ideas on this? Why is this undocumented behavior present in PHP?!?
      > For regular expressions to not work as expected or documented seems
      > like a pretty serious bug in PHP. I don't think there is a problem
      > with the regular expression.
      >
      > Thoughts?

      '/^[0-9]+$/D'


      D (PCRE_DOLLAR_ENDONLY)
      If this modifier is set, a dollar metacharacter in the pattern matches
      only at the end of the subject string. Without this modifier, a dollar
      also matches immediately before the final character if it is a newline
      (but not before any other newlines). This modifier is ignored if m
      modifier is set. There is no equivalent to this modifier in Perl.


      Yes, I also think this is weird. If I want to match for newlines, I'll
      match for newlines :).
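
The manual text quoted above makes two further claims that are easy to check: only the final newline is tolerated, and D is ignored when m is set. A small sketch, with subject strings chosen purely for illustration (they are not from the thread):

```php
<?php
$pattern = '/^[0-9]+$/';

// `$` matches before a newline only if that newline is the last character.
$finalNewline  = preg_match($pattern, "12345\n");     // one trailing newline
$doubleNewline = preg_match($pattern, "12345\n\n");   // first newline is not final
$innerNewline  = preg_match($pattern, "12345\nabc");  // newline mid-string

// With m set, `$` matches before any newline, and D has no effect.
$multilineWithD = preg_match('/^[0-9]+$/mD', "12345\nabc");

// Prints int(1), int(0), int(0), int(1), one per line.
var_dump($finalNewline, $doubleNewline, $innerNewline, $multilineWithD);
```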
      --
      Rik Wasmus
      ....spamrun finished


      • Paul Lautman

        #4
        Re: preg_match doesn't work properly!?

        chadsspameateremail@yahoo.com wrote:
        > I might have found a problem with how preg_match works though I'm not
        > sure.
        > Let's say you have a regular expression that you want to match a string
        > of numbers. You might write the code like this:
        > preg_match( '/^[0-9]+$/', $TestString );
        >
        > OK everything seems fine. However, did you know if you pass the
        > following to preg_match: "12345\n" it will return that a match
        > occurred?!? Even though the newline is not a valid character in our
        > regular expression.

        Yes, I did, but only because that's what it says in the manual:

        D (PCRE_DOLLAR_ENDONLY)

        If this modifier is set, a dollar metacharacter in the pattern matches only
        at the end of the subject string. Without this modifier, a dollar also
        matches immediately before the final character if it is a newline (but not
        before any other newlines). This modifier is ignored if m modifier is set.
        There is no equivalent to this modifier in Perl.

        > Here is the test program, *please run the program as written below*:
        >
        > <?php
        > $TestString = "12345\n";
        > print preg_match( '/^[0-9]+$/', $TestString );
        > ?>
        >
        > You will find it prints 1 even though the newline character isn't a
        > valid part of our regular expression. What other characters I wonder
        > can be put in a regular expression and have the string match!? Any
        > ideas on this? Why is this undocumented behavior present in PHP?!?

        It isn't since it is documented.

        > For regular expressions to not work as expected or documented seems
        > like a pretty serious bug in PHP.

        If this was the case then I would agree. However since the cause is not that
        it is not in the documentation, but simply that you did not read it in the
        documentation....

        > I don't think there is a problem
        > with the regular expression.

        Neither do I.



        • Lars Eighner

          #5
          Re: preg_match doesn't work properly!?

          In our last episode,
          <150d8afd-d5bc-418e-a243-98ae25bf9016@k30g2000hse.googlegroups.com>, the
          lovely and talented chadsspameateremail@yahoo.com broadcast on
          comp.lang.php:
          > I might have found a problem with how preg_match works though I'm not
          > sure. Let's say you have a regular expression that you want to match a
          > string of numbers. You might write the code like this: preg_match(
          > '/^[0-9]+$/', $TestString );
          >
          > OK everything seems fine. However, did you know if you pass the
          > following to preg_match: "12345\n" it will return that a match
          > occurred?!?

          Right, because it did.

          > Even though the newline is not a valid character in our regular
          > expression.

          Doesn't matter. The whole expression matches before the newline.

          > Here is the test program, *please run the program as written below*:
          >
          > <?php
          > $TestString = "12345\n";
          > print preg_match( '/^[0-9]+$/', $TestString );
          > ?>
          >
          > You will find it prints 1 even though the newline character isn't a
          > valid part of our regular expression.

          It returns 1 (a match exists) because all of the pattern is found
          in $TestString. That is how perl regular expressions work.

          preg_match('/dog/','catisnotadogbubba')

          matches because all of 'dog' is in 'catisnotadogbubba'.

          > What other characters I wonder can be put in a regular expression and have
          > the string match!?

          You can put just about anything in if the pattern matches some part of the
          string.

          > Any ideas on this? Why is this undocumented behavior present in PHP?!?

          Of course it is not undocumented. The manual page makes it perfectly clear
          what a match consists of.

          > For regular expressions to not work as expected or documented seems
          > like a pretty serious bug in PHP. I don't think there is a problem
          > with the regular expression.

          There isn't. There is a serious problem in your understanding of what a
          match is --- or possibly what $ means in a perl regular expression. You
          do know the p in preg_match means perl.

          > Thoughts?

          man perlre

          --
          Lars Eighner <http://larseighner.com/> usenet@larseighner.com
          Countdown: 237 days to go.


          • Guillaume

            #6
            Re: preg_match doesn't work properly!?

            Lars Eighner wrote:
            > There isn't. There is a serious problem in your understanding of what a
            > match is --- or possibly what $ means in a perl regular expression. You
            > do know the p in preg_match means perl.

            First, we're not talking about Perl, but PHP function "preg_replace",
            which uses PCRE syntax, and not Perl syntax.

            Second, PCRE (just like Perl actually O_o) defines ^ and $ as being
            start and end of string/line (cf.
            http://www.pcre.org/pcre.txt "PCRE_MULTILINE") (Perl defines them as
            start/end of string and start/end of line if used with /m).
            POSIX doesn't define them, but that's not the point here.

            Pattern ^[0-9]+$ should not match, because in "12345\n" there is a "\n"
            between the last number and the end of string, basically "between the
            plus and the dollar".

            Regards,
            --
            Guillaume


            • Rik Wasmus

              #7
              Re: preg_match doesn't work properly!?

              On Tue, 27 May 2008 18:47:07 +0200, Lars Eighner <usenet@larseighner.com>
              wrote:
              > In our last episode,
              > <150d8afd-d5bc-418e-a243-98ae25bf9016@k30g2000hse.googlegroups.com>, the
              > lovely and talented chadsspameateremail@yahoo.com broadcast on
              > comp.lang.php:
              >
              >> I might have found a problem with how preg_match works though I'm not
              >> sure. Let's say you have a regular expression that you want to match a
              >> string of numbers. You might write the code like this: preg_match(
              >> '/^[0-9]+$/', $TestString );
              >>
              >> OK everything seems fine. However, did you know if you pass the
              >> following to preg_match: "12345\n" it will return that a match
              >> occurred?!?
              >
              > Right, because it did.
              >
              >> Even though the newline is not a valid character in our regular
              >> expression.
              >
              > Doesn't matter. The whole expression matches before the newline.
              >
              >> Here is the test program, *please run the program as written below*:
              >>
              >> <?php
              >>  $TestString = "12345\n";
              >>  print preg_match( '/^[0-9]+$/', $TestString );
              >> ?>
              >>
              >> You will find it prints 1 even though the newline character isn't a
              >> valid part of our regular expression.
              >
              > It returns 1 (a match exists) because all of the pattern is found
              > in $TestString. That is how perl regular expressions work.
              >
              > preg_match('/dog/','catisnotadogbubba')
              <SNIPPED more>

              With all due respect, you're talking nonsense. You apparently missed that
              the match is anchored to the start & end of string. Nothing of your story
              has any relevance to the op's problem (which he already googled & solved
              himself just before I answered him :) ).
              --
              Rik Wasmus
              ....spamrun finished


              • chadsspameateremail@yahoo.com

                #8
                Re: preg_match doesn't work properly!?

                >You do know the p in preg_match means perl.

                Well I come from a Perl background and that's where the original
                misunderstanding came from. Assuming preg_match operated like a Perl
                regular expression (how stupid could I be?) in a function named after
                Perl...

                I now submit that preg_match should really be named
                klpbnratagybrtdcidreg_match which stands for:
                "Kinda Like Perl But Not Really There Are Gotchas You Better Read The
                Documentation In Detail regular expression" matching. Though maybe
                others have ideas for a shorter name. :)

                Chad. :)


                • chadsspameateremail@yahoo.com

                  #9
                  Re: preg_match doesn't work properly!?

                  Actually, I have to correct myself! Much to my surprise this is
                  actually how Perl works after I tried it out. As documented here:
                  In a regular expression, the caret matches the concept “start of string”, while the dollar sign matches “end of string”


                  So in Perl:

                  my $x = "12345\n";
                  if ( $x =~ /^[0-9]+$/ )
                  {
                      print 1;
                  }
                  else
                  {
                      print 0;
                  }

                  Prints 1 whereas:

                  $x = "12345\n";
                  if ( $x =~ /^[0-9]+\z/ )
                  {
                      print 1;
                  }
                  else
                  {
                      print 0;
                  }

                  Prints 0. So I guess preg_match is a good name... :)
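
The same distinction carries over to PHP, since PCRE also supports the `\Z` and `\z` assertions. A short sketch, assuming the same test string as above (the variable names are just for illustration):

```php
<?php
$subject = "12345\n";

// `$` and \Z both accept an optional final newline (no m or D modifier here).
$dollar = preg_match('/^[0-9]+$/', $subject);
$bigZ   = preg_match('/^[0-9]+\Z/', $subject);

// \z matches only at the very end of the subject, as in the Perl example.
$smallZ = preg_match('/^[0-9]+\z/', $subject);

// Prints int(1), int(1), int(0), one per line.
var_dump($dollar, $bigZ, $smallZ);
```

Using `\z` instead of `$` with the D modifier keeps the strict end-of-string meaning visible in the pattern itself.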


                  • Lars Eighner

                    #10
                    Re: preg_match doesn't work properly!?

                    In our last episode, <g1hha9$7bn$1@biggoron.nerim.net>, the lovely and
                    talented Guillaume broadcast on comp.lang.php:
                    > Lars Eighner wrote:
                    >> There isn't. There is a serious problem in your understanding of what a
                    >> match is --- or possibly what $ means in a perl regular expression. You
                    >> do know the p in preg_match means perl.
                    >
                    > First, we're not talking about Perl, but PHP function "preg_replace",
                    > which uses PCRE syntax, and not Perl syntax.
                    > Second, PCRE (just like Perl actually O_o) defines ^ and $ as being start
                    > and end of string/line (cf. http://www.pcre.org/pcre.txt "PCRE_MULTILINE")
                    > (Perl defines them as start/end of string and start/end of line if used
                    > with /m). POSIX doesn't define them, but that's not the point here.
                    > Pattern ^[0-9]+$ should not match, because in "12345\n" there is a "\n"
                    > between the last number and the end of string, basically "between the
                    > plus and the dollar".

                    This is absurd. $ matches the end of the line. You see that is why a
                    "newline" is called a newline. It is after the end of the line.


                    --
                    Lars Eighner <http://larseighner.com/> usenet@larseighner.com
                    Countdown: 237 days to go.


                    • Lars Eighner

                      #11
                      Re: preg_match doesn't work properly!?

                      In our last episode,
                      <op.ubtp92ky5bnjuv@metallium.lan>,
                      the lovely and talented Rik Wasmus
                      broadcast on comp.lang.php:
                      > On Tue, 27 May 2008 18:47:07 +0200, Lars Eighner <usenet@larseighner.com>
                      > wrote:
                      >> In our last episode,
                      >> <150d8afd-d5bc-418e-a243-98ae25bf9016@k30g2000hse.googlegroups.com>, the
                      >> lovely and talented chadsspameateremail@yahoo.com broadcast on
                      >> comp.lang.php:
                      >>
                      >>> I might have found a problem with how preg_match works though I'm not
                      >>> sure. Let's say you have a regular expression that you want to match a
                      >>> string of numbers. You might write the code like this: preg_match(
                      >>> '/^[0-9]+$/', $TestString );
                      >>>
                      >>> OK everything seems fine. However, did you know if you pass the
                      >>> following to preg_match: "12345\n" it will return that a match
                      >>> occurred?!?
                      >>
                      >> Right, because it did.
                      >>
                      >>> Even though the newline is not a valid character in our regular
                      >>> expression.
                      >>
                      >> Doesn't matter. The whole expression matches before the newline.
                      >>
                      >>> Here is the test program, *please run the program as written below*:
                      >>>
                      >>> <?php
                      >>>  $TestString = "12345\n";
                      >>>  print preg_match( '/^[0-9]+$/', $TestString );
                      >>> ?>
                      >>>
                      >>> You will find it prints 1 even though the newline character isn't a
                      >>> valid part of our regular expression.
                      >>
                      >> It returns 1 (a match exists) because all of the pattern is found
                      >> in $TestString. That is how perl regular expressions work.
                      >>
                      >> preg_match('/dog/','catisnotadogbubba')
                      >
                      > <SNIPPED more>
                      >
                      > With all due respect, you're talking nonsense. You apparently missed that
                      > the match is anchored to the start & end of string. Nothing of your story
                      > has any relevance to the op's problem (which he already googled & solved
                      > himself just before I answered him :) ).

                      $ matches the end of a line. When there is no newline, the end of a string
                      is presumed to be the end of a line. It was not ever anchored to "end of
                      string." Anyone who thinks of ^ and $ as relating to strings instead of
                      lines is asking for trouble.

                      --
                      Lars Eighner <http://larseighner.com/> usenet@larseighner.com
                      Countdown: 237 days to go.


                      • Rik Wasmus

                        #12
                        Re: preg_match doesn't work properly!?

                        On Tue, 27 May 2008 22:15:13 +0200, Lars Eighner <usenet@larseighner.com>
                        wrote:
                        > In our last episode,
                        > <op.ubtp92ky5bnjuv@metallium.lan>,
                        > the lovely and talented Rik Wasmus
                        > broadcast on comp.lang.php:
                        >
                        >> On Tue, 27 May 2008 18:47:07 +0200, Lars Eighner
                        >> <usenet@larseighner.com>
                        >> wrote:
                        >>> In our last episode,
                        >>> <150d8afd-d5bc-418e-a243-98ae25bf9016@k30g2000hse.googlegroups.com>,
                        >>> the lovely and talented chadsspameateremail@yahoo.com broadcast on
                        >>> comp.lang.php:
                        >>>
                        >>>> I might have found a problem with how preg_match works though I'm not
                        >>>> sure. Let's say you have a regular expression that you want to match a
                        >>>> string of numbers. You might write the code like this: preg_match(
                        >>>> '/^[0-9]+$/', $TestString );
                        >>>>
                        >>>> OK everything seems fine. However, did you know if you pass the
                        >>>> following to preg_match: "12345\n" it will return that a match
                        >>>> occurred?!?
                        >>>
                        >>> Right, because it did.
                        >>>
                        >>>> Even though the newline is not a valid character in our regular
                        >>>> expression.
                        >>>
                        >>> Doesn't matter. The whole expression matches before the newline.
                        >>>
                        >>>> Here is the test program, *please run the program as written below*:
                        >>>>
                        >>>> <?php
                        >>>>  $TestString = "12345\n";
                        >>>>  print preg_match( '/^[0-9]+$/', $TestString );
                        >>>> ?>
                        >>>>
                        >>>> You will find it prints 1 even though the newline character isn't a
                        >>>> valid part of our regular expression.
                        >>>
                        >>> It returns 1 (a match exists) because all of the pattern is found
                        >>> in $TestString. That is how perl regular expressions work.
                        >>>
                        >>> preg_match('/dog/','catisnotadogbubba')
                        >>
                        >> <SNIPPED more>
                        >>
                        >> With all due respect, you're talking nonsense. You apparently missed
                        >> that the match is anchored to the start & end of string. Nothing of your
                        >> story has any relevance to the op's problem (which he already googled &
                        >> solved himself just before I answered him :) ).
                        >
                        > $ matches the end of a line. When there is no newline, the end of a
                        > string is presumed to be the end of a line. It was not ever anchored to
                        > "end of string." Anyone who thinks of ^ and $ as relating to strings
                        > instead of lines is asking for trouble.

                        /m
                        Tricks a lot of people, for obvious reasons.
                        'nuff said
                        --
                        Rik Wasmus
                        ....spamrun finished


                        • AnrDaemon

                          #13
                          Re: preg_match doesn't work properly!?

                          Greetings, Lars Eighner.
                          In reply to Your message dated Wednesday, May 28, 2008, 00:11:01,
                          > This is absurd. $ matches the end of the line. You see that is why a
                          > "newline" is called a newline. It is after the end of the line.

                          $ matches the end of the line only when the pattern is in multiline mode.
                          Otherwise it matches the end of the *string* (or right before the last \n
                          at the end of the string).
                          Feel the difference.
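
To see that difference in practice, here is a small PHP sketch. The three-line subject string is an assumption made up for illustration; it does not come from the thread.

```php
<?php
// Illustrative subject: two digit-only lines around a non-digit line.
$subject = "12\nab\n34";

// Default mode: ^ and $ anchor to the subject as a whole, so nothing matches.
$defaultCount = preg_match_all('/^[0-9]+$/', $subject);

// m (PCRE_MULTILINE): ^ and $ also match at each line boundary.
$multilineCount = preg_match_all('/^[0-9]+$/m', $subject, $matches);

var_dump($defaultCount);    // int(0)
var_dump($multilineCount);  // int(2)
var_dump($matches[0]);      // array holding "12" and "34"
```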


                          --
                          Sincerely Yours, AnrDaemon <anrdaemon@freemail.ru>


                          • Guillaume

                            #14
                            Re: preg_match doesn't work properly!?

                            Lars Eighner wrote:
                            > Anyone who thinks of ^ and $ as relating to strings instead of
                            > lines is asking for trouble.

                            Or is reading documentation carefully :p

                            Regards,
                            --
                            Guillaume
