re.match and non-alphanumeric characters

  • The Web President

    re.match and non-alphanumeric characters

    Dear all,

    this is really driving me nuts and any help would be extremely
    appreciated.

    I have a string that contains some numeric data. I want to isolate
    these data using re.match, as follows.

    bogus = "IFC(35m)"
    data = re.match(r'(\d+)', bogus)
    print data.group(1)

    I would expect to have "35" printed out to screen, but instead I get
    an error that the regular expression did not match:

    Traceback (most recent call last):
      File "C:\Documents and Settings\Mattia\Desktop\Neeltje\read.py", line 20, in <module>
        print data.group(1)
    AttributeError: 'NoneType' object has no attribute 'group'

    Note that the same holds if I look for "35" straight, instead of
    "\d+". If instead I look for "IFC" it works fine. That is, apparently
    re.match will match only up to the first non-alphanumeric character
    and ignore anything after a "(", "_", "[" and god knows what else.

    I am using Python 2.6 (r26:66721, latest stable version). Am I missing
    something very big and very important?
  • r

    #2
    Re: re.match and non-alphanumeric characters

    On Nov 16, 10:33 am, The Web President <mattia.land...@gmail.com>
    wrote:
    > Dear all,
    >
    > this is really driving me nuts and any help would be extremely
    > appreciated.
    >
    > I have a string that contains some numeric data. I want to isolate
    > these data using re.match, as follows.
    >
    > bogus = "IFC(35m)"
    > data = re.match(r'(\d+)', bogus)
    > print data.group(1)
    >
    > I would expect to have "35" printed out to screen, but instead I get
    > an error that the regular expression did not match:
    >
    > Traceback (most recent call last):
    >   File "C:\Documents and Settings\Mattia\Desktop\Neeltje\read.py", line 20, in <module>
    >     print data.group(1)
    > AttributeError: 'NoneType' object has no attribute 'group'
    >
    > Note that the same holds if I look for "35" straight, instead of
    > "\d+". If instead I look for "IFC" it works fine. That is, apparently
    > re.match will match only up to the first non-alphanumeric character
    > and ignore anything after a "(", "_", "[" and god knows what else.
    >
    > I am using Python 2.6 (r26:66721, latest stable version). Am I missing
    > something very big and very important?

    Try re.search or re.findall. re.match only matches at the beginning
    of a string; I almost never use it.

    >>> re.search('(\d+)', bogus).group()
    '35'
    >>> re.search('(\d+)', bogus).span()
    (4, 6)
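
For anyone skimming the thread, here is a minimal sketch of the three calls side by side on the string from the original post. It is written for Python 3 (the thread itself uses Python 2.6), and the second sample string passed to re.findall is made up for illustration.

```python
import re

bogus = "IFC(35m)"

# re.match only tries the pattern at position 0, where the string has "I",
# not a digit, so there is no match object to call .group() on.
no_match = re.match(r"(\d+)", bogus)
print(no_match)

# re.search scans forward and returns the first match anywhere in the string.
found = re.search(r"(\d+)", bogus)
print(found.group(1), found.span())

# re.findall returns every non-overlapping match as a list of strings.
# The input below is a hypothetical string with two numbers in it.
all_numbers = re.findall(r"\d+", "IFC(35m) to IFC(120m)")
print(all_numbers)
```

re.search is usually the right choice when the position of the data inside the string is not known in advance.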

    • MRAB

      #3
      Re: re.match and non-alphanumeric characters

      On Nov 16, 4:33 pm, The Web President <mattia.land...@gmail.com>
      wrote:
      > Dear all,
      >
      > this is really driving me nuts and any help would be extremely
      > appreciated.
      >
      > I have a string that contains some numeric data. I want to isolate
      > these data using re.match, as follows.
      >
      > bogus = "IFC(35m)"
      > data = re.match(r'(\d+)', bogus)
      > print data.group(1)
      >
      > I would expect to have "35" printed out to screen, but instead I get
      > an error that the regular expression did not match:
      >
      > Traceback (most recent call last):
      >   File "C:\Documents and Settings\Mattia\Desktop\Neeltje\read.py", line 20, in <module>
      >     print data.group(1)
      > AttributeError: 'NoneType' object has no attribute 'group'
      >
      > Note that the same holds if I look for "35" straight, instead of
      > "\d+". If instead I look for "IFC" it works fine. That is, apparently
      > re.match will match only up to the first non-alphanumeric character
      > and ignore anything after a "(", "_", "[" and god knows what else.
      >
      > I am using Python 2.6 (r26:66721, latest stable version). Am I missing
      > something very big and very important?

      re.match() anchors the match at the start of the string. What you need
      is re.search(). It's all in the documentation! :-)
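
The AttributeError in the original post comes from calling .group() on None, which is what both re.match and re.search return when nothing matches. A small sketch of guarding against that, assuming Python 3; the helper name first_number is invented for this example and is not part of the re module.

```python
import re

def first_number(text):
    """Return the first run of digits in text as a string, or None if there is none."""
    found = re.search(r"\d+", text)
    # A failed search returns None, so check before calling .group().
    if found is None:
        return None
    return found.group()

print(first_number("IFC(35m)"))
print(first_number("IFC(m)"))
```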

      • Gabriel Genellina

        #4
        Re: re.match and non-alphanumeric characters

        On Sun, 16 Nov 2008 14:33:42 -0200, The Web President
        <mattia.landoni@gmail.com> wrote:
        > I have a string that contains some numeric data. I want to isolate
        > these data using re.match, as follows.
        >
        > bogus = "IFC(35m)"
        > data = re.match(r'(\d+)', bogus)
        > print data.group(1)
        >
        > I would expect to have "35" printed out to screen, but instead I get
        > an error that the regular expression did not match:

        Source code: Lib/re/ This module provides regular expression matching operations similar to those found in Perl. Both patterns and strings to be searched can be Unicode strings (str) as well as 8-...


        --
        Gabriel Genellina

        • Diez B. Roggisch

          #5
          Re: re.match and non-alphanumeric characters

          The Web President wrote:
          > Dear all,
          >
          > this is really driving me nuts and any help would be extremely
          > appreciated.
          >
          > I have a string that contains some numeric data. I want to isolate
          > these data using re.match, as follows.
          >
          > bogus = "IFC(35m)"
          > data = re.match(r'(\d+)', bogus)
          > print data.group(1)
          >
          > I would expect to have "35" printed out to screen, but instead I get
          > an error that the regular expression did not match:
          >
          > Traceback (most recent call last):
          >   File "C:\Documents and Settings\Mattia\Desktop\Neeltje\read.py", line 20, in <module>
          >     print data.group(1)
          > AttributeError: 'NoneType' object has no attribute 'group'
          >
          > Note that the same holds if I look for "35" straight, instead of
          > "\d+". If instead I look for "IFC" it works fine. That is, apparently
          > re.match will match only up to the first non-alphanumeric character
          > and ignore anything after a "(", "_", "[" and god knows what else.
          >
          > I am using Python 2.6 (r26:66721, latest stable version). Am I missing
          > something very big and very important?

          Yep - re.search. Match matches the whole string. You want searching.


          Diez

          • John Machin

            #6
            Re: re.match and non-alphanumeric characters

            On Nov 17, 4:44 am, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
            > Match matches the whole string.

            *ONLY* if the pattern ends with "$" or r"\Z"

            • Diez B. Roggisch

              #7
              Re: re.match and non-alphanumeric characters

              John Machin wrote:
              > On Nov 17, 4:44 am, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
              >
              >> Match matches the whole string.
              >
              > *ONLY* if the pattern ends with "$" or r"\Z"

              You think so?

              import re

              rex = re.compile("abc.*def")

              if rex.match("abc0123455678def"):
                  print "matched"



              Diez

              • Steve Holden

                #8
                Re: re.match and non-alphanumeric characters

                Diez B. Roggisch wrote:
                > John Machin wrote:
                >> On Nov 17, 4:44 am, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
                >>
                >>> Match matches the whole string.
                >>
                >> *ONLY* if the pattern ends with "$" or r"\Z"
                >
                > You think so?
                >
                > import re
                >
                > rex = re.compile("abc.*def")
                >
                > if rex.match("abc0123455678def"):
                >     print "matched"

                Your test is inconclusive: necessary, but not sufficient.

                >>> rex = re.compile("abc.*def")
                >>> if rex.match("abc0123455678defPLUSEXTRASTUFF"):
                ...     print "Matched"
                ...
                Matched

                regards
                Steve
                --
                Steve Holden +1 571 484 6266 +1 800 494 3119
                Holden Web LLC http://www.holdenweb.com/

                • John Machin

                  #9
                  Re: re.match and non-alphanumeric characters

                  On Nov 17, 10:19 am, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
                  > John Machin wrote:
                  >
                  >> On Nov 17, 4:44 am, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
                  >>
                  >>> Match matches the whole string.
                  >>
                  >> *ONLY* if the pattern ends with "$" or r"\Z"
                  >
                  > You think so?
                  >
                  > import re
                  >
                  > rex = re.compile("abc.*def")
                  >
                  > if rex.match("abc0123455678def"):
                  >     print "matched"

                  OK, I'll try again:

                  The following 3-tuples represent (pattern, string,
                  matched_portion_of_string):
                  ('abc', 'abc', 'abc')
                  ('abc', 'abcdef', 'abc')
                  ('abc$', 'abc', 'abc')
                  ('abc$', 'abcdef', '<no match>')

                  Saying "Match matches the whole string" is incorrect; see the second
                  case. If you want to ensure that the whole string matches the pattern,
                  the pattern needs to be terminated by "$" or "\Z".
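
The four tuples above can be reproduced with a short script. This is a sketch for Python 3, and the helper name matched_portion is made up to mirror the tuple label. It also shows two details not raised in the thread: "$" still matches just before a trailing newline while r"\Z" does not, and re.fullmatch, which was only added in Python 3.4 and so is not available in the 2.6 discussed here, requires the whole string to match without an explicit end anchor.

```python
import re

def matched_portion(pattern, string):
    """Return the text re.match consumed, or '<no match>' if it failed."""
    m = re.match(pattern, string)
    return m.group() if m else "<no match>"

cases = [("abc", "abc"), ("abc", "abcdef"), ("abc$", "abc"), ("abc$", "abcdef")]
results = [(p, s, matched_portion(p, s)) for p, s in cases]
for row in results:
    print(row)

# "$" also matches just before a trailing newline; r"\Z" matches only at the very end.
dollar_with_newline = re.match(r"abc$", "abc\n") is not None
z_with_newline = re.match(r"abc\Z", "abc\n") is not None
print(dollar_with_newline, z_with_newline)

# re.fullmatch (Python 3.4+) succeeds only if the entire string matches.
full_on_prefix = re.fullmatch("abc", "abcdef") is not None
full_on_exact = re.fullmatch("abc", "abc") is not None
print(full_on_prefix, full_on_exact)
```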
