How make regex that means "contains regex#1 but NOT regex#2" ??

  • seberino@spawar.navy.mil

    How make regex that means "contains regex#1 but NOT regex#2" ??

    I'm looking over the docs for the re module and can't find how to
    "NOT" an entire regex.

    For example.....

    How make regex that means "contains regex#1 but NOT regex#2" ?

    Chris
  • A.T.Hofkamp

    #2
    Re: How make regex that means "contains regex#1 but NOT regex#2" ??

    On 2008-07-01, seberino@spawar.navy.mil <seberino@spawar.navy.mil> wrote:
    > I'm looking over the docs for the re module and can't find how to
    > "NOT" an entire regex.

    (?! R)

    > How make regex that means "contains regex#1 but NOT regex#2" ?

    (\1|(?!\2))

    should do what you want.

    Albert
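
For what it's worth, (?!R) is a zero-width negative lookahead: it consumes nothing and only checks that R does not match starting at the current position. A minimal sketch to see that behaviour (the sample strings are made up for illustration):

```python
import re

# "BBB" starts at position 0, so the lookahead fails and there is no match.
m1 = re.match(r"(?!BBB)", "BBBxyz")

# The lookahead succeeds but consumes nothing, so the match is empty.
m2 = re.match(r"(?!BBB)", "AAAxyz")

# Only position 0 is checked; a "BBB" later in the string is not noticed.
m3 = re.match(r"(?!BBB)\w+", "AAABBB")

print(m1)
print(repr(m2.group()))
print(m3.group())
```

So a lookahead on its own says "not R right here", which is weaker than "R appears nowhere in the string".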


    • Paul McGuire

      #3
      Re: How make regex that means "contains regex#1 but NOT regex#2" ??

      On Jul 1, 2:34 am, "A.T.Hofkamp" <h...@se-162.se.wtb.tue.nl> wrote:
      > On 2008-07-01, seber...@spawar.navy.mil <seber...@spawar.navy.mil> wrote:
      >
      > > I'm looking over the docs for the re module and can't find how to
      > > "NOT" an entire regex.
      >
      > (?! R)
      >
      > > How make regex that means "contains regex#1 but NOT regex#2" ?
      >
      > (\1|(?!\2))
      >
      > should do what you want.
      >
      > Albert

      I think the OP wants both A AND not B, not A OR not B. If the OP wants
      to do re.match(A and not B), then I think this can be done as
      ((?!\2)\1), but if he really wants CONTAINS A and not B, then I think
      this requires 2 calls to re.search. See test code below:

      import re

      def test(restr, instr):
          # re.match only tries to match at the start of instr
          print("%s match %s? %s" %
                (restr, instr, bool(re.match(restr, instr))))

      a = "AAA"
      b = "BBB"

      # Albert's suggestion: A, OR "B does not start here"
      aAndNotB = "(%s|(?!%s))" % (a, b)

      test(aAndNotB, "AAA")
      test(aAndNotB, "BBB")
      test(aAndNotB, "AAABBB")
      test(aAndNotB, "zAAA")
      test(aAndNotB, "CCC")

      # "B does not start here", followed by A
      aAndNotB = "((?!%s)%s)" % (b, a)

      test(aAndNotB, "AAA")
      test(aAndNotB, "BBB")
      test(aAndNotB, "AAABBB")
      test(aAndNotB, "zAAA")
      test(aAndNotB, "CCC")

      def test2(arestr, brestr, instr):
          # two searches: A occurs somewhere and B occurs nowhere
          print("%s contains %s but NOT %s? %s" %
                (instr, arestr, brestr,
                 bool(re.search(arestr, instr) and
                      not re.search(brestr, instr))))

      test2(a, b, "AAA")
      test2(a, b, "BBB")
      test2(a, b, "AAABBB")
      test2(a, b, "zAAA")
      test2(a, b, "CCC")

      Prints:

      (AAA|(?!BBB)) match AAA? True
      (AAA|(?!BBB)) match BBB? False
      (AAA|(?!BBB)) match AAABBB? True
      (AAA|(?!BBB)) match zAAA? True
      (AAA|(?!BBB)) match CCC? True
      ((?!BBB)AAA) match AAA? True
      ((?!BBB)AAA) match BBB? False
      ((?!BBB)AAA) match AAABBB? True
      ((?!BBB)AAA) match zAAA? False
      ((?!BBB)AAA) match CCC? False
      AAA contains AAA but NOT BBB? True
      BBB contains AAA but NOT BBB? False
      AAABBB contains AAA but NOT BBB? False
      zAAA contains AAA but NOT BBB? True
      CCC contains AAA but NOT BBB? False


      As we've all seen before, posters are not always the most precise when
      describing whether they want match vs. search. Given that the OP used
      the word "contains", I read that to mean "search". I'm not an RE pro
      by any means, but I think the behavior that the OP wants is given in
      the last 5 tests, and I don't know how to do that in a single RE.

      -- Paul
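
A single pattern does look possible for the "contains" case if the negative lookahead is anchored at the start of the string, so that it scans the whole string before the search for A begins. A minimal sketch, assuming a and b are plain sub-patterns without backreferences or inline flags of their own; contains_a_not_b is a hypothetical helper name, not something from the re module:

```python
import re

def contains_a_not_b(a, b):
    """Build one pattern that succeeds only when the string contains
    regex `a` somewhere and regex `b` nowhere (hypothetical helper)."""
    # \A pins the lookahead to the start so ".*" can scan the whole string;
    # (?s) lets "." cross newlines. The (?:...) wrappers keep any
    # alternation inside a or b from leaking out.
    return re.compile(r"(?s)\A(?!.*(?:%s)).*(?:%s)" % (b, a))

pat = contains_a_not_b("AAA", "BBB")
results = []
for s in ("AAA", "BBB", "AAABBB", "zAAA", "CCC"):
    found = bool(pat.search(s))
    results.append(found)
    print("%s contains AAA but NOT BBB? %s" % (s, found))
```

On the five sample strings this gives the same True/False answers as the two-call test2 version. Whether it is easier to read than two re.search calls is a separate question.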


      • Reedick, Andrew

        #4
        RE: How make regex that means "contains regex#1 but NOT regex#2" ??


        > -----Original Message-----
        > From: python-list-bounces+jr9445=att.com@python.org
        > [mailto:python-list-bounces+jr9445=att.com@python.org] On Behalf Of
        > seberino@spawar.navy.mil
        > Sent: Tuesday, July 01, 2008 2:29 AM
        > To: python-list@python.org
        > Subject: How make regex that means "contains regex#1 but NOT regex#2" ??
        >
        > I'm looking over the docs for the re module and can't find how to
        > "NOT" an entire regex.
        >
        > For example.....
        >
        > How make regex that means "contains regex#1 but NOT regex#2" ?

        Match 'foo.*bar', except when 'not' appears between foo and bar.


        import re

        s = 'fooAAABBBbar'
        print("Should match:", s)
        # each character after foo must not be followed by 'not'
        m = re.match(r'(foo(.(?!not))*bar)', s)
        if m:
            print(m.groups())

        print()

        s = 'fooAAAnotBBBbar'
        print("Should not match:", s)
        m = re.match(r'(foo(.(?!not))*bar)', s)
        if m:
            print(m.groups())


        == Output ==
        Should match: fooAAABBBbar
        ('fooAAABBBbar', 'B')

        Should not match: fooAAAnotBBBbar




        • Reedick, Andrew

          #5
          RE: How make regex that means "contains regex#1 but NOT regex#2" ??


          > -----Original Message-----
          > From: python-list-bounces+jr9445=att.com@python.org
          > [mailto:python-list-bounces+jr9445=att.com@python.org] On Behalf Of
          > Reedick, Andrew
          > Sent: Tuesday, July 01, 2008 10:07 AM
          > To: seberino@spawar.navy.mil; python-list@python.org
          > Subject: RE: How make regex that means "contains regex#1 but NOT
          > regex#2" ??
          >
          > Match 'foo.*bar', except when 'not' appears between foo and bar.
          >
          >
          > import re
          >
          > s = 'fooAAABBBbar'
          > print("Should match:", s)
          > m = re.match(r'(foo(.(?!not))*bar)', s)
          > if m:
          >     print(m.groups())
          >
          > print()
          >
          > s = 'fooAAAnotBBBbar'
          > print("Should not match:", s)
          > m = re.match(r'(foo(.(?!not))*bar)', s)
          > if m:
          >     print(m.groups())
          >
          >
          > == Output ==
          > Should match: fooAAABBBbar
          > ('fooAAABBBbar', 'B')
          >
          > Should not match: fooAAAnotBBBbar

          Fixed a bug with 'foonotbar'. Conceptually it breaks down into:

          First_half_of_Regex#1(not Regex#2)(any_char_Not_followed_by_Regex#2)*Second_half_of_Regex#1

          However, if possible, I would make it a two pass regex. Match on
          Regex#1, throw away any matches that then match on Regex#2. A two pass
          is faster and easier to code and understand. Easy to understand == less
          chance of a bug. If you're worried about performance, then a) a
          complicated regex may or may not be faster than two simple regexes, and
          b) if you're passing that much data through a regex, you're probably I/O
          bound anyway.
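
The two-pass idea described above could be sketched roughly like this. two_pass is a hypothetical helper name, and 'foo.*bar' and 'not' are simply the running example from this thread:

```python
import re

def two_pass(candidates, wanted, unwanted):
    """Keep the strings that match `wanted` at the start and whose
    matched text does not contain `unwanted` (hypothetical helper)."""
    wanted_re = re.compile(wanted)
    unwanted_re = re.compile(unwanted)
    kept = []
    for s in candidates:
        m = wanted_re.match(s)
        # Pass 1: the string must match regex #1.
        # Pass 2: throw the match away if regex #2 is found inside it.
        if m and not unwanted_re.search(m.group()):
            kept.append(s)
    return kept

ss = ('foobar', 'fooAAABBBbar', 'fooAAAnotBBBbar', 'fooAAAnotbar',
      'foonotBBBbar', 'foonotbar')
kept = two_pass(ss, r'foo.*bar', r'not')
print(kept)
```

One difference to keep in mind: the second pass inspects the whole greedy 'foo.*bar' match, so a string with several 'bar's can be rejected here even though a shorter 'foo...bar' prefix without 'not' exists.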


          import re

          ss = ('foobar', 'fooAAABBBbar', 'fooAAAnotBBBbar', 'fooAAAnotbar',
                'foonotBBBbar', 'foonotbar')

          for s in ss:
              print(s, end=' ')
              # (?!not) right after foo covers the 'foonotbar' case
              m = re.match(r'(foo(?!not)(?:.(?!not))*bar)', s)
              if m:
                  print(m.groups())
              else:
                  print()


          == output ==
          foobar ('foobar',)
          fooAAABBBbar ('fooAAABBBbar',)
          fooAAAnotBBBbar
          fooAAAnotbar
          foonotBBBbar
          foonotbar
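
An alternative spelling puts the lookahead in front of each character instead of behind it, which makes the separate (?!not) after foo unnecessary. This is only a sketch comparing the two forms on the six sample strings from this post; the variable names are made up for illustration:

```python
import re

ss = ('foobar', 'fooAAABBBbar', 'fooAAAnotBBBbar', 'fooAAAnotbar',
      'foonotBBBbar', 'foonotbar')

# Form used in the post: lookahead after foo, then after each character.
after_each = re.compile(r'(foo(?!not)(?:.(?!not))*bar)')

# Variant: check "not does not start here" before consuming each character.
before_each = re.compile(r'(foo(?:(?!not).)*bar)')

results_after = [bool(after_each.match(s)) for s in ss]
results_before = [bool(before_each.match(s)) for s in ss]

for s, x, y in zip(ss, results_after, results_before):
    print(s, x, y)
```

Both patterns accept only 'foobar' and 'fooAAABBBbar' from this list. Agreement on six strings is not a proof that they are equivalent in general.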
