Combining 2 preg matches.

  • frizzle

    Combining 2 preg matches.

    Hi group,

    I have a function which validates a string using preg match.
    A part looks like

    if( !preg_match( '/^([a-z0-9]+(([a-z0-9_-]*)?[a-z0-9])?)$/', $string )
    ||
    preg_match( '/(--|__)+/' ,$string) ) {

    I wonder how I could combine those two into one.
    I tried a few ways of folding the second match into the first one,
    using things like [^__]+, but nothing worked for me.
    The check should reject two (or more) dashes or underscores in a row:
    hello-there = ok
    hello--there != ok

    Any help would be great.

    Frizzle.

  • Csaba Gabor

    #2
    Re: Combining 2 preg matches.

    frizzle wrote:
    > it should prevent double (or more) dashes or underscores behind each
    > other.
    > hello-there = ok
    > hello--there != ok

    Is hello-_there ok?
    Is hello_-there ok?
    Is _hello-there ok?

    If the answer to the above three questions is no, then the following
    should do the trick. Note that this implies that the final character
    could be a - or _:

    if (preg_match('/^([a-z0-9][-_]?)+$/', $string)) { ... }
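
A quick way to see what this single pattern accepts is to run it over a few inputs. This is only a sketch: the sample strings are made up for illustration, and the pattern is used exactly as posted above.

```php
<?php
// Pattern from the post above: every letter or digit may be followed
// by at most one - or _.
$pattern = '/^([a-z0-9][-_]?)+$/';

// Made-up sample inputs, not taken from the original question.
$samples = ['hello-there', 'hello--there', 'hello-_there', '_hello-there', 'hello-'];

$results = [];
foreach ($samples as $s) {
    // preg_match returns 1 on a match, 0 on no match.
    $results[$s] = preg_match($pattern, $s) === 1;
}

foreach ($results as $s => $ok) {
    echo str_pad($s, 14), $ok ? 'ok' : 'rejected', "\n";
}
```

Only hello-there and hello- are accepted here; the second shows the trailing separator mentioned above.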

    Csaba Gabor from New York


    • Chung Leong

      #3
      Re: Combining 2 preg matches.


      frizzle wrote:
      > it should prevent double (or more) dashes or underscores behind each
      > other.
      > hello-there = ok
      > hello--there != ok

      What you need is a lookbehind and a lookahead assertion on the dash and
      underscore, stating that they're acceptable only if there is a letter
      directly before and directly after them:

      /^(?:[a-z]|(?<=[a-z])[-_](?=[a-z]))+$/
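
As a rough illustration of how the two assertions behave, the pattern above can be tried against a handful of strings. The inputs are invented for the example; the pattern is unchanged.

```php
<?php
// A - or _ is accepted only when a letter sits directly on both sides.
// The lookbehind and lookahead inspect the neighbours without consuming them.
$pattern = '/^(?:[a-z]|(?<=[a-z])[-_](?=[a-z]))+$/';

$checks = [];
foreach (['hello-there', 'hello--there', 'hello-_there', '-hello', 'hello-', 'a_b_c'] as $s) {
    $checks[$s] = preg_match($pattern, $s) === 1;
}

var_export($checks);
```

Because the neighbouring letters are only inspected, the b in a_b_c can sit to the right of the first underscore and to the left of the second.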


      • frizzle

        #4
        Re: Combining 2 preg matches.


        Chung Leong wrote:
        > What you need is a lookahead and lookbehind assertion on the dash and
        > underscore, stating that they're acceptable only if there're letters in
        > front and behind them:
        >
        > /^(?:[a-z]|(?<=[a-z])[-_](?=[a-z]))+$/

        Wow, could you explain this a little?
        For example the ?: and ?<= parts.

        (I assume 0-9 should still be included?)

        Frizzle.


        • frizzle

          #5
          Re: Combining 2 preg matches.


          frizzle wrote:
          > wowowow, could you explain a little on this ?
          > like the : and ?<= parts

          Still curious about the explanation, but just letting you know it works
          exactly as it should.

          Frizzle.


          • Rik

            #6
            Re: Combining 2 preg matches.

            frizzle wrote:
            > wowowow, could you explain a little on this ?
            > like the : and ?<= parts

            (?: ... ) is a non-capturing group (useful when you just want to group, and
            don't need the exact matched portion):
            In a regular expression, parentheses group regex tokens together and also create backreferences, which let you reuse part of the match later in the regex or in the replacement text. A non-capturing group does the grouping without creating the backreference.


            (?<= ... ) is a positive lookbehind, and (?= ... ) is a positive lookahead:
            Lookarounds test for a match (positive) or for failure (negative) without actually consuming any characters.


            $regex ='/             #opening delimiter
            ^                      #start of string
            (?:                    #start of non-capturing group
              [a-z]                #any character between a and z
              |                    #OR
              (?<=                 #start of positive lookbehind (is preceded by..)
                [a-z]              #any character between a and z
              )                    #end of positive lookbehind
              [-_]                 #character - or _ (not incorrect, but probably better as [_\-], [_-] or [\-_])
              (?=                  #start of positive lookahead (is followed by..)
                [a-z]              #any character between a and z
              )                    #end of positive lookahead
            )                      #end of non-capturing group
            +                      #1 or more times, greedy
            $                      #end of string
            /x';


            Human translation:
            The entire (1) string consists of 1 or more (2) characters [a-z], and possibly
            single _ or - characters, each enclosed by characters in the range [a-z].

            (1) by anchoring the pattern with ^.....$
            (2) by +

            > (i assume 0-9 should still be included??)

            If you want that, yes, just change every [a-z] to [a-z0-9].

            Use the /i modifier if you want a match to be case-insensitive.
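
Putting both suggestions together, here is a minimal sketch of the same pattern with digits allowed and the i modifier added. The test strings are invented for illustration. In x mode each # comment has to stay on one line, because a wrapped comment would be read as part of the pattern.

```php
<?php
// Extended (x) mode ignores whitespace and # comments inside the pattern.
// The i modifier makes every [a-z0-9] class case-insensitive.
$regex = '/
    ^                    # start of string
    (?:                  # non-capturing group
        [a-z0-9]         # a letter or digit
      |                  # OR
        (?<=[a-z0-9])    # preceded by a letter or digit
        [-_]             # a single dash or underscore
        (?=[a-z0-9])     # followed by a letter or digit
    )+                   # one or more times
    $                    # end of string
/xi';

$results = [];
foreach (['Hello-World_2', '9_lives', 'hello--world', 'hello_'] as $s) {
    $results[$s] = preg_match($regex, $s) === 1;
}

var_export($results);
```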

            Grtz,
            --
            Rik Wasmus



            • Rik

              #7
              Re: Combining 2 preg matches.

              Rik wrote:
              > $regex ='/ #opening delimiter
              > ^ #start of string
              > [...]
              > $ #end of string
              > /x';

              It just occurred to me that, although it is a wonderful example:

              $regex ='/^(?:[a-z]|[a-z][_\-][a-z])+$/';

              ...will do just fine.

              Equally so:
              $regex ='/^(?:[a-z]+(?:[_\-][a-z]+)*)$/';

              Lookahead and lookbehind are unnecessary in this case, and this keeps it simple.

              Grtz,
              --
              Rik Wasmus



              • frizzle

                #8
                Re: Combining 2 preg matches.


                Rik wrote:
                > Lookahead & -behind are unnecessary in this case, and this keeps it simple.

                Wow, thanks for the explanation!
                Nice link there as well. Going right into my bookmarks.

                Frizzle.


                • Chung Leong

                  #9
                  Re: Combining 2 preg matches.

                  Rik wrote:
                  > Lookahead & -behind are unnecessary in this case, and this keeps it simple.

                  Good point. It doesn't make sense to use assertions when you'll capture
                  the matches anyway.


                  • frizzle

                    #10
                    Re: Combining 2 preg matches.


                    Rik wrote:
                    > $regex ='/^(?:[a-z]|[a-z][_\-][a-z])+$/';
                    >
                    > ...will do just fine.

                    Somehow, Rik's solution gave me problems:

                    '/^(?:[a-z0-9]|[a-z0-9][_\-][a-z0-9])+$/' gave problems.
                    '/^(?:[a-z0-9]|(?<=[a-z0-9])[-_](?=[a-z0-9]))+$/' didn't.

                    An example string that gave problems is:
                    really_a_made_up_string

                    So I used Chung's option.

                    Frizzle.


                    • Rik

                      #11
                      Re: Combining 2 preg matches.

                      frizzle wrote:
                      > '/^(?:[a-z0-9]|[a-z0-9][_\-][a-z0-9])+$/'; gave problems.
                      > '/^(?:[a-z0-9]|(?<=[a-z0-9])[-_](?=[a-z0-9]))+$/' didn't.
                      >
                      > An example string that gave problems is:
                      > really_a_made_up_string

                      Ah, I forgot that in [a-z0-9][_\-][a-z0-9] the character on the right is
                      already consumed, so it can't serve as the left-hand character for the
                      second _ in _a_.

                      This one should still work, though:
                      $regex ='/^(?:[a-z0-9]+(?:[_\-][a-z0-9]+)*)$/';
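
For reference, a minimal sketch of this pattern wrapped in a helper. The function name is_valid_name is hypothetical and not from the thread; the pattern itself is the one just above.

```php
<?php
// Hypothetical helper name; only the pattern comes from the post above.
function is_valid_name(string $string): bool
{
    // One run of letters/digits, then zero or more "separator + run" pairs.
    // No lookaround is needed because each separator is always followed
    // by a fresh run of at least one letter or digit.
    return preg_match('/^(?:[a-z0-9]+(?:[_\-][a-z0-9]+)*)$/', $string) === 1;
}

var_dump(is_valid_name('really_a_made_up_string')); // bool(true)
var_dump(is_valid_name('hello'));                   // bool(true)
var_dump(is_valid_name('hello--there'));            // bool(false)
var_dump(is_valid_name('hello-'));                  // bool(false)
```

One thing to keep in mind: in PCRE a plain $ also matches just before a final newline, so "hello\n" would pass as well. If that matters, adding the D modifier or using \z instead of $ should close the gap.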

                      Grtz,
                      --
                      Rik Wasmus



                      • frizzle

                        #12
                        Re: Combining 2 preg matches.


                        Rik wrote:
                        > This one should still be working though:
                        > $regex ='/^(?:[a-z0-9]+(?:[_\-][a-z0-9]+)*)$/';

                        OK, thanks a lot.

                        Frizzle.


                        • Chung Leong

                          #13
                          Re: Combining 2 preg matches.

                          Rik wrote:
                          > Ah, forgot that in [a-z0-9][_\-][a-z0-9] the character on the right is
                          > already matched, so it won't work as a start for the second _ in _a_....

                          You know, I thought that was the problem initially, but then remembered
                          that the regular expression engine does backtracking in order to
                          maximise any match. When it encounters the underscore after assigning
                          the letter to the first subpattern, it's supposed to abandon the
                          previous match, backtrack to the letter, and go down the second branch.


                          • Rik

                            #14
                            Re: Combining 2 preg matches.

                            Chung Leong wrote:
                            > You know, I thought that was the problem initially, but then
                            > remembered that the regular expression engine does backtracking in
                            > order to maximise any match. When it encounters the underscore after
                            > assigning the letter to the first subpattern, it's supposed to abandon
                            > the previous match, backtrack to the letter, and go down the second
                            > branch.

                            Yes and no. The engine does exactly what you say, but there is simply no valid match:

                            With the pattern:
                            '/^(?:[a-z0-9]|[a-z0-9][_\-][a-z0-9])+$/';
                            one states that the entire string can be built from either [a-z0-9] (1) OR
                            [a-z0-9][_\-][a-z0-9] (2); think of them as blocks.

                            Let's examine it (not entirely how it works, but close enough in this
                            instance; a fixed-width font is handy now):

                            positions: 123456789012345678901234567890
                            string:    really_a_made_up_string
                            match1:    111111_   error, let's try the other option.
                            match2:    111112--_ error, no other matches possible.

                            Neither (1) nor (2) can match at the second _, and there are no other
                            ways to split the beginning of the string.
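
The block argument can be checked directly by running both patterns side by side. This is just a sketch; the variable names and the shorter test strings are made up, and the long string is the one from this thread.

```php
<?php
// "Blocks" version: option (2) consumes the letter on each side of the separator.
$blocks = '/^(?:[a-z0-9]|[a-z0-9][_\-][a-z0-9])+$/';
// Lookaround version: the neighbouring letters are only inspected, not consumed.
$lookaround = '/^(?:[a-z0-9]|(?<=[a-z0-9])[-_](?=[a-z0-9]))+$/';

$rows = [];
foreach (['made_up', 'a_b', 'a_b_c', 'really_a_made_up_string'] as $s) {
    $rows[$s] = [
        'blocks'     => preg_match($blocks, $s) === 1,
        'lookaround' => preg_match($lookaround, $s) === 1,
    ];
}

foreach ($rows as $s => $r) {
    printf("%-24s blocks=%-5s lookaround=%s\n",
        $s, var_export($r['blocks'], true), var_export($r['lookaround'], true));
}
```

The two patterns agree until a single character sits between two separators, as in a_b_c or the _a_ part of the long string. There the blocks version has no way to share that character between two blocks, so it fails, while the lookaround version still matches.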

                            Grtz,
                            --
                            Rik Wasmus



                            • Chung Leong

                              #15
                              Re: Combining 2 preg matches.

                              Rik wrote:
                              > There is no possibility for a match with either (1) or (2) at the second _,
                              > and no other options to match instead at the beginning of the string.

                              Ah! I missed the single letter case.
