problem with regex

  • abcd

    problem with regex

    I have a regex: '[A-Za-z]:\\([^/:\*\?"<>\|])*'

    when I do, re.compile('[A-Za-z]:\\([^/:\*\?"<>\|])*') ...I get

    sre_constants.error: unbalanced parenthesis

    do i need to escape something else? i see that i have matching
    parenthesis.

    thx

  • Rob Wolfe

    #2
    Re: problem with regex


    abcd wrote:
    > I have a regex: '[A-Za-z]:\\([^/:\*\?"<>\|])*'
    >
    > when I do, re.compile('[A-Za-z]:\\([^/:\*\?"<>\|])*') ...I get
    >
    > sre_constants.error: unbalanced parenthesis
    >
    > do i need to escape something else? i see that i have matching
    > parenthesis.

    You should use a raw string:

    re.compile(r'[A-Za-z]:\\([^/:\*\?"<>\|])*')

    Regards,
    Rob


    • Barry

      #3
      Re: problem with regex

      On 28 Jul 2006 05:45:05 -0700, abcd <codecraig@gmail.com> wrote:
      > I have a regex: '[A-Za-z]:\\([^/:\*\?"<>\|])*'
      >
      > when I do, re.compile('[A-Za-z]:\\([^/:\*\?"<>\|])*') ...I get
      >
      > sre_constants.error: unbalanced parenthesis
      >
      > do i need to escape something else? i see that i have matching
      > parenthesis.
      >
      > thx
      >
      > --

      Try making the argument a raw string:
      re.compile(r'[A-Za-z]:\\([^/:\*\?"<>\|])*')


      • Tim Chase

        #4
        Re: problem with regex

        > when I do, re.compile('[A-Za-z]:\\([^/:\*\?"<>\|])*') ...I get
        >
        > sre_constants.error: unbalanced parenthesis

        Because you're not using raw strings, the escapables become
        escaped, making your regexp something like

        [A-Za-z]:\([^/:\*\?"<>\|])*

        (because it knows what "\\" is, but likely doesn't attribute
        significance to "\?" or "\|", and thus leaves them alone).

        Thus, you have "\(" in your regexp, which is a literal
        open-paren. But you have a ")", which is a "close a grouping"
        paren. The error is indicating that the "close a grouping" paren
        doesn't close some previously opened paren.

        General good practice shoves all this stuff in a raw string:

        r'[A-Za-z]:\\([^/:\*\?"<>\|])*'

        which solves much of the headache.

        -tkc
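
A minimal sketch of the explanation above, assuming Python 3 (the thread itself dates from the Python 2.4 era). The names `non_raw_value`, `raw_value`, and `compiles` are illustrative and do not come from the thread. To avoid invalid-escape warnings on newer interpreters, the value that the non-raw literal ends up holding is written out here as a raw string with a single backslash before the paren.

```python
import re

# What Python actually stores for the original non-raw literal:
# "\\" collapses to one backslash, so the regex engine sees "\(",
# a literal open paren, followed later by an unmatched ")".
non_raw_value = r'[A-Za-z]:\([^/:\*\?"<>\|])*'

# What the raw-string literal stores: "\\" stays as two characters,
# which the regex engine reads as one literal backslash, then a group.
raw_value = r'[A-Za-z]:\\([^/:\*\?"<>\|])*'


def compiles(pattern):
    """Return True if re.compile accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(compiles(non_raw_value))  # False
print(compiles(raw_value))      # True
```

The first pattern fails for the reason given above: the only open paren is escaped, so the closing paren has nothing to close.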





        • abcd

          #5
          Re: problem with regex

          well thanks for the quick replies, but now my regex doesn't work.

          Code:
          import re
          p = re.compile(r'[A-Za-z]:\\([^/:\*?"<>\|])*')
          
          x = p.match("c:\test")
          x is None

          any ideas why? i escape the back-slash, the asterisk *, and the PIPE |
          .....b/c they are regex special characters.


          • abcd

            #6
            Re: problem with regex

            sorry i forgot to escape the question mark...

            import re
            p = re.compile(r'[A-Za-z]:\\([^/:\*?"<>\|])*')

            even when I escape that it still doesn't work as expected.

            p = re.compile(r'[A-Za-z]:\\([^/:\*\?"<>\|])*')

            p.match('c:\test') still returns None.
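
As a side note on the question-mark escape: inside a character class, `*`, `?`, and `|` are already literal in Python's `re` syntax, so the backslashes in front of them are optional. The sketch below, with illustrative names (`escaped_class`, `plain_class`, `samples`) that are not from the thread, checks that both spellings of the class behave the same on a handful of characters.

```python
import re

# The same negated character class, with and without the optional backslashes.
escaped_class = re.compile(r'[^/:\*\?"<>\|]')
plain_class = re.compile(r'[^/:*?"<>|]')

# Excluded characters, two ordinary letters, and a backslash.
samples = '/:*?"<>|ab\\'

# Both patterns should accept or reject each sample character identically.
agree = all(
    bool(escaped_class.match(ch)) == bool(plain_class.match(ch))
    for ch in samples
)
print(agree)  # True
```

So adding or dropping `\?` inside the brackets would not be expected to change whether the path matches.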


            • Tim Chase

              #7
              Re: problem with regex

              > p = re.compile(r'[A-Za-z]:\\([^/:\*?"<>\|])*')
              >
              > x = p.match("c:\test")
              > any ideas why? i escape the back-slash, the asterisk *, and the PIPE |
              > ....b/c they are regex special characters.

              Same problem, only now in the other string:

              >>> s = "c:\test"
              >>> print s
              c:      est

              Your "\t" is interpreted as a tab character. Thus, you want

              s = r"c:\test"

              or

              s = "c:\\test"

              which you'll find should now be successfully found with

              p.match(s)

              -tkc
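
A short sketch of the tab problem described above, assuming Python 3 and the raw-string pattern from earlier in the thread. The variable names `plain`, `raw`, and `escaped` are illustrative only.

```python
import re

p = re.compile(r'[A-Za-z]:\\([^/:\*\?"<>\|])*')

plain = "c:\test"     # "\t" is a single tab character here
raw = r"c:\test"      # raw string: the backslash is kept
escaped = "c:\\test"  # doubled backslash: same value as the raw string

print(len(plain), len(raw))   # 6 7
print(raw == escaped)         # True
print(p.match(plain))         # None
print(p.match(raw).group(0))  # c:\test
```

The length difference makes the issue visible: the non-raw string has no backslash for the pattern to match.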





              • abcd

                #8
                Re: problem with regex

                Sybren Stuvel wrote:
                > Yes, because after the "c:" you expect a backslash, and not a tab
                > character. Read the manual again about raw strings and character
                > escaping, it'll do you good.

                doh. i shall do that.

                thanks.


                • abcd

                  #9
                  Re: problem with regex

                  not sure why this passes:

                  >>> regex = r'[A-Za-z]:\\([^/:\*\?"<>\|])*'
                  >>> p = re.compile(regex)
                  >>> p.match('c:\\test')
                  <_sre.SRE_Match object at 0x009D77E0>
                  >>> p.match('c:\\test?:/')
                  <_sre.SRE_Match object at 0x009D7720>
                  >>>

                  the last example shouldn't give a match


                  • Tim Chase

                    #10
                    Re: problem with regex

                    > >>> regex = r'[A-Za-z]:\\([^/:\*\?"<>\|])*'
                    > >>> p = re.compile(regex)
                    > >>> p.match('c:\\test')
                    > <_sre.SRE_Match object at 0x009D77E0>
                    > >>> p.match('c:\\test?:/')
                    > <_sre.SRE_Match object at 0x009D7720>
                    >
                    > the last example shouldnt give a match

                    Ah, but it should, because it *does* match.

                    >>> m = p.match('c:\\test?:/')
                    >>> m.group(0)
                    'c:\\test'
                    >>> # add a "$" at the end to anchor it
                    >>> # to the end of the line
                    >>> regex = r'[A-Za-z]:\\([^/:\*\?"<>\|])*$'
                    >>> p = re.compile(regex)
                    >>> m = p.match('c:\\test?:/')
                    >>> m
                    >>>

                    By adding the "$" to ensure that you're matching the whole string
                    passed to match() and not just as much as possible given the
                    regexp, you solve the problem you describe.

                    -tkc
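
The anchoring advice above can be sketched as follows, assuming Python 3. The names `unanchored`, `anchored`, and `bad_path` are illustrative. The `fullmatch()` call is an addition not mentioned in the thread; it exists on compiled patterns in Python 3.4 and later and is shown only as an alternative to appending "$".

```python
import re

unanchored = re.compile(r'[A-Za-z]:\\([^/:\*\?"<>\|])*')
anchored = re.compile(r'[A-Za-z]:\\([^/:\*\?"<>\|])*$')

bad_path = 'c:\\test?:/'

# match() only requires a match at the start, so the valid prefix is accepted.
m = unanchored.match(bad_path)
print(m.group(0))                        # c:\test

# With "$" the pattern has to reach the end of the string, so it fails.
print(anchored.match(bad_path))          # None

# fullmatch() gives the same result without changing the pattern.
print(unanchored.fullmatch(bad_path))    # None

# A clean path still matches with the anchor in place.
print(anchored.match('c:\\test') is not None)  # True
```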




                    • Rob Wolfe

                      #11
                      Re: problem with regex


                      abcd wrote:
                      > not sure why this passes:
                      >
                      > >>> regex = r'[A-Za-z]:\\([^/:\*\?"<>\|])*'
                      > >>> p = re.compile(regex)
                      > >>> p.match('c:\\test')
                      > <_sre.SRE_Match object at 0x009D77E0>
                      > >>> p.match('c:\\test?:/')
                      > <_sre.SRE_Match object at 0x009D7720>
                      > >>>
                      >
                      > the last example shouldnt give a match

                      If you want to learn regular expressions, I suggest the great tool
                      redemo.py (a Tk app). You can use it to play with regular expressions
                      until you find the result you are looking for.
                      It can be found in Python 2.4 in Tools\Scripts.

                      Regards,
                      Rob
