matching exactly a 4 digit number in python

  • harijay

    matching exactly a 4 digit number in python

    Hi
    I am a few months new into python. I have used regexps before in perl
    and java but am a little confused with this problem.

    I want to parse a number of strings and extract only those that
    contain a 4-digit number anywhere inside the string.

    However, the regexp
    p = re.compile(r'\d{4}')

    matches even sentences that contain numbers longer than 4 digits,
    for example it matches "I have 3324234 and more".

    I am very confused. Shouldn't \d{4,} match exactly four-digit
    numbers, so that a sentence with a 5-digit number is not matched?

    Here is my test program output and the test given below
    Thanks for your help
    Harijay

    PyMate r8111 running Python 2.5.1 (/usr/bin/python)
    >>> testdigit.py
    Matched I have 2004 rupees
    Matched I have 3324234 and more
    Matched As 3233
    Matched 2323423414 is good
    Matched 4444 dc sav 2412441 asdf
    SKIPPED random1341also and also
    SKIPPED
    SKIPPED 13
    Matched a 1331 saves
    SKIPPED and and as dad
    SKIPPED A has 13123123
    SKIPPED A 13123
    SKIPPED 123 adn
    Matched 1312 times I have told you
    DONE

    #!/usr/bin/python
    import re

    # Test strings; only some contain a number of exactly four digits.
    x = [" I have 2004 rupees ", " I have 3324234 and more", " As 3233 ",
         "2323423414 is good", "4444 dc sav 2412441 asdf ",
         "random1341also and also", "", "13", " a 1331 saves",
         " and and as dad", " A has 13123123", "A 13123", "123 adn",
         "1312 times I have told you"]

    # As posted, the pattern ends with a space (four digits, then a space),
    # which is consistent with the output shown above.
    p = re.compile(r'\d{4} ')

    for elem in x:
        if re.search(p, elem):
            print("Matched " + elem)
        else:
            print("SKIPPED " + elem)

    print("DONE")
  • Mr.SpOOn

    #2
    Re: matching exactly a 4 digit number in python

    2008/11/21 harijay <harijay@gmail.com>:
    Hi
    I am a few months new into python. I have used regexps before in perl
    and java but am a little confused with this problem.
    >
    I want to parse a number of strings and extract only those that
    contain a 4 digit number anywhere inside a string
    >
    However the regexp
    p = re.compile(r'\d{4}')
    >
    Matches even sentences that have longer than 4 numbers inside
    strings ..for example it matches "I have 3324234 and more"
    Try with this:

    p = re.compile(r'\d{4}$')

    The $ character matches the end of the string. It should work.
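
A minimal sketch of what the `$`-anchored pattern accepts, written in Python 3 syntax (the thread itself uses Python 2.5); the helper name `end_match` is made up for illustration. It suggests that `$` only ties the four digits to the end of the string, so a four-digit number in the middle is missed and the tail of a longer number is still picked up:

```python
import re

# Pattern from the suggestion above: four digits, then end of string.
end_anchored = re.compile(r'\d{4}$')

def end_match(s):
    """Return the text matched by the end-anchored pattern, or None."""
    m = end_anchored.search(s)
    return m.group() if m else None

samples = [
    "As 3233",               # ends with exactly four digits
    " I have 2004 rupees ",  # four-digit number in the middle
    " A has 13123123",       # ends with an eight-digit number
]

for s in samples:
    print(repr(s), "->", end_match(s))
```

Only the first sample is the kind of hit the original question asks for; the third one matches on the last four digits of a longer number.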


    • Mark Tolonen

      #3
      Re: matching exactly a 4 digit number in python


      "harijay" <harijay@gmail. comwrote in message
      news:7424ff80-c645-4b30-b963-67997be29108@j38g2000yqa.googlegroups.com...
      I want to parse a number of strings and extract only those that
      contain a 4 digit number anywhere inside a string
      Try:
      p = re.compile(r'\b\d{4}\b')

      -Mark


      • John Machin

        #4
        Re: matching exactly a 4 digit number in python

        On Nov 22, 8:46 am, harijay <hari...@gmail.com> wrote:
        Hi
        I am a few months new into python. I have used regexps before in perl
        and java but am a little confused with this problem.
        >
        I want to parse a number of strings and extract only those that
        contain a 4 digit number anywhere inside a string
        >
        However the regexp
        p = re.compile(r'\d{4}')
        >
        Matches even sentences that have longer than 4 numbers inside
        strings ..for example it matches "I have 3324234 and more"
        No it doesn't. When used with re.search on that string it matches
        3324, it doesn't "match" the whole sentence.
        >
        I am very confused. Shouldnt the \d{4,} match exactly four digit
        numbers so a 5 digit number sentence should not be matched .
        {4} does NOT mean the same as {4,}.
        {4} is the same as {4,4}
        {4,} means {4,INFINITY}
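
The difference between these quantifiers can be checked directly. This is a small sketch in Python 3 syntax, using the seven-digit run from the example sentence; the variable names are chosen here for illustration:

```python
import re

digits = "3324234"  # seven digits, taken from the example sentence

exactly_four = re.compile(r'\d{4}')    # exactly four digits
four_four = re.compile(r'\d{4,4}')     # same thing, written as a range
four_or_more = re.compile(r'\d{4,}')   # four or more digits, greedy

print(exactly_four.search(digits).group())   # 3324
print(four_four.search(digits).group())      # 3324
print(four_or_more.search(digits).group())   # 3324234
```

Neither form looks at what surrounds the digits, which is why `{4}` alone still finds a hit inside a longer number.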

        Ignoring {4,}:

        You need to specify a regex that says "4 digits followed by (non-digit
        or end-of-string)". Have a try at that and come back here if you have
        any more problems.

        some test data:
        xxx1234
        xxx12345
        xxx1234xxx
        xxx12345xxx
        xxx1234xxx1235xxx
        xxx12345xxx1234xxx


        • George Sakkis

          #5
          Re: matching exactly a 4 digit number in python

          On Nov 21, 4:46 pm, harijay <hari...@gmail.com> wrote:
          Hi
          I am a few months new into python. I have used regexps before in perl
          and java but am a little confused with this problem.
          >
          I want to parse a number of strings and extract only those that
          contain a 4 digit number anywhere inside a string
          >
          However the regexp
          p = re.compile(r'\d{4}')
          >
          Matches even sentences that have longer than 4 numbers inside
          strings ..for example it matches "I have 3324234 and more"
          >
          I am very confused. Shouldnt the \d{4,} match exactly four digit
          numbers so a 5 digit number sentence should not be matched .
          No, why should it ? What you're saying is "give me 4 consecutive
          digits", without specifying what should precede or follow these
          digits. A correct expression is a bit more hairy:

          p = re.compile(r'''
              (?:\D|\b)   # find a non-digit or word boundary..
              (\d{4})     # .. followed by the 4 digits to be matched as group #1..
              (?:\D|\b)   # .. which are followed by non-digit or word boundary
              ''', re.VERBOSE)
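
A short usage sketch for the expression above, in Python 3 syntax. The pattern has the same structure; only the comments are reworded, and the test strings come from the original post. It illustrates why the digits are captured as group 1: the whole match can include the neighbouring non-digit characters.

```python
import re

# Same structure as the pattern above; only the comments are reworded.
p = re.compile(r'''
    (?:\D|\b)   # a non-digit, or a word boundary
    (\d{4})     # the four digits, captured as group 1
    (?:\D|\b)   # a non-digit, or a word boundary
    ''', re.VERBOSE)

m = p.search(" I have 2004 rupees ")
print(repr(m.group(0)))  # whole match, including the neighbouring characters
print(repr(m.group(1)))  # just the digits

# No isolated four-digit run here, so search() returns None.
print(p.search(" I have 3324234 and more"))
```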


          HTH,
          George


          • MRAB

            #6
            Re: matching exactly a 4 digit number in python

            George Sakkis wrote:
            On Nov 21, 4:46 pm, harijay <hari...@gmail.com> wrote:
            >
            >Hi
            >I am a few months new into python. I have used regexps before in perl
            >and java but am a little confused with this problem.
            >>
            >I want to parse a number of strings and extract only those that
            >contain a 4 digit number anywhere inside a string
            >>
            >However the regexp
            >p = re.compile(r'\d{4}')
            >>
            >Matches even sentences that have longer than 4 numbers inside
            >strings ..for example it matches "I have 3324234 and more"
            >>
            >I am very confused. Shouldnt the \d{4,} match exactly four digit
            >numbers so a 5 digit number sentence should not be matched .
            >
            No, why should it ? What you're saying is "give me 4 consecutive
            digits", without specifying what should precede or follow these
            digits. A correct expression is a bit more hairy:
            >
            p = re.compile(r'''
                (?:\D|\b)   # find a non-digit or word boundary..
                (\d{4})     # .. followed by the 4 digits to be matched as group #1..
                (?:\D|\b)   # .. which are followed by non-digit or word boundary
                ''', re.VERBOSE)
            >
            You want to match a sequence of 4 digits: \d{4}
            not preceded by a digit: (?<!\d)
            not followed by a digit: (?!\d)

            which is: re.compile(r'(?<!\d)\d{4}(?!\d)')
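
As a sketch, the lookaround pattern can be run against the test data John Machin posted earlier in the thread (Python 3 syntax; the names `four_digits`, `test_data` and `results` are chosen here for illustration). `findall` is used so that strings with more than one candidate show every hit:

```python
import re

# Four digits, not preceded and not followed by another digit.
four_digits = re.compile(r'(?<!\d)\d{4}(?!\d)')

test_data = [
    "xxx1234",
    "xxx12345",
    "xxx1234xxx",
    "xxx12345xxx",
    "xxx1234xxx1235xxx",
    "xxx12345xxx1234xxx",
]

# Map each test string to the list of isolated four-digit runs found in it.
results = {s: four_digits.findall(s) for s in test_data}
for s, hits in results.items():
    print(s, hits)
```

Because the lookarounds do not consume any characters, the neighbouring text stays out of the match and no capture group is needed.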


            • skip@pobox.com

              #7
              Re: matching exactly a 4 digit number in python

              >I am a few months new into python. I have used regexps before in perl
              >and java but am a little confused with this problem.
              >I want to parse a number of strings and extract only those that
              >contain a 4 digit number anywhere inside a string
              >However the regexp
              >p = re.compile(r'\d{4}')
              >Matches even sentences that have longer than 4 numbers inside strings
              >..for example it matches "I have 3324234 and more"
              Try this instead:
              >>> pat = re.compile(r"(?<!\d)(\d{4})(?!\d)")
              >>> for s in x:
              ...     m = pat.search(s)
              ...     print repr(s),
              ...     print (m is not None) and "matches" or "does not match"
              ...
              ' I have 2004 rupees ' matches
              ' I have 3324234 and more' does not match
              ' As 3233 ' matches
              '2323423414 is good' does not match
              '4444 dc sav 2412441 asdf ' matches
              'random1341also and also' matches
              '' does not match
              '13' does not match
              ' a 1331 saves' matches
              ' and and as dad' does not match
              ' A has 13123123' does not match
              'A 13123' does not match
              '123 adn' does not match
              '1312 times I have told you' matches

              --
              Skip Montanaro - skip@pobox.com - http://smontanaro.dyndns.org/


              • harijay

                #8
                Re: matching exactly a 4 digit number in python

                Thanks John Machin and Mark Tolonen ..
                So I guess the correct one is to use the word boundary metacharacter
                "\b"

                so r'\b\d{4}\b' is what I need since it reads

                a 4 digit number in between word boundaries
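
A hedged comparison of the word-boundary pattern with the lookaround pattern from earlier in the thread, in Python 3 syntax; the helper `found` is made up for illustration. The two agree on most of the test strings but differ when the digits touch letters, because `\b` treats letters and digits alike as word characters:

```python
import re

word_bounded = re.compile(r'\b\d{4}\b')            # word-boundary version
digit_bounded = re.compile(r'(?<!\d)\d{4}(?!\d)')  # lookaround version

def found(pattern, s):
    """Return the matched text, or None if the pattern does not match."""
    m = pattern.search(s)
    return m.group() if m else None

samples = [
    " I have 2004 rupees ",     # digits surrounded by spaces
    "random1341also and also",  # digits glued to letters
    " A has 13123123",          # longer run of digits
]

for s in samples:
    print(repr(s), found(word_bounded, s), found(digit_bounded, s))
```

Which behaviour is wanted depends on whether a string like "random1341also" should count as containing a 4-digit number.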

                Thanks a tonne. This being my second post to comp.lang.python, I
                am always amazed at how helpful everyone on this group is.

                Hari

                On Nov 21, 5:12 pm, John Machin <sjmac...@lexicon.net> wrote:
                On Nov 22, 8:46 am, harijay <hari...@gmail.com> wrote:
                >
                Hi
                I am a few months new into python. I have used regexps before in perl
                and java but am a little confused with this problem.
                >
                I want to parse a number of strings and extract only those that
                contain a 4 digit number anywhere inside a string
                >
                However the regexp
                p = re.compile(r'\d{4}')
                >
                Matches even sentences that have longer than 4 numbers inside
                strings ..for example it matches "I have 3324234 and more"
                >
                No it doesn't. When used with re.search on that string it matches
                3324, it doesn't "match" the whole sentence.
                >
                >
                >
                I am very confused. Shouldnt the \d{4,} match exactly four digit
                numbers so a 5 digit number sentence should not be matched .
                >
                {4} does NOT mean the same as {4,}.
                {4} is the same as {4,4}
                {4,} means {4,INFINITY}
                >
                Ignoring {4,}:
                >
                You need to specify a regex that says "4 digits followed by (non-digit
                or end-of-string)". Have a try at that and come back here if you have
                any more problems.
                >
                some test data:
                xxx1234
                xxx12345
                xxx1234xxx
                xxx12345xxx
                xxx1234xxx1235xxx
                xxx12345xxx1234xxx
