regular expression for integer and decimal numbers

  • gary

    regular expression for integer and decimal numbers

    I want to pick all integers and decimal numbers out of a string.
    Would this be the most correct regular expression to use?

    "\d+\.?\d*"
  • Andrew Durdin

    #2
    Re: regular expression for integer and decimal numbers

    On 23 Sep 2004 17:51:17 -0700, gary <gary.wilson@gmail.com> wrote:
    > I want to pick all integers and decimal numbers out of a string.
    > Would this be the most correct regular expression to use?
    >
    > "\d+\.?\d*"

    That will work for numbers such as 0123 12.345 12. 0.5 -- but it
    won't work for the following:
    0x12AB .5 10e-3 -15 123L
    If you want to handle some of those, then you'll need a more complicated regex.
    If you want to accept numbers of the form .5 but don't care about 12.
    then a better regex would be
    \d*\.?\d+
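
    To make the difference between the two patterns concrete, here is a small Python 3 sketch (not part of the original post; the names ORIGINAL, LEADING_DOT, samples and accepted are illustrative). It uses re.fullmatch to ask whether each token is accepted as a whole, because re.findall would still return partial matches such as the 15 inside -15.

```python
import re

ORIGINAL = r"\d+\.?\d*"     # gary's pattern: digits, optional point, optional digits
LEADING_DOT = r"\d*\.?\d+"  # alternative: optional digits, optional point, digits

# Illustrative tokens taken from the two lists in the post above.
samples = ["0123", "12.345", "12.", "0.5", "0x12AB", ".5", "10e-3", "-15", "123L"]

def accepted(pattern, tokens):
    """Return the tokens that the pattern matches in full."""
    return [t for t in tokens if re.fullmatch(pattern, t)]

print(accepted(ORIGINAL, samples))     # ['0123', '12.345', '12.', '0.5']
print(accepted(LEADING_DOT, samples))  # ['0123', '12.345', '0.5', '.5']
```

    Neither pattern accepts the hex, exponent, signed or long-suffixed forms as whole tokens; a search with either one only picks out the digit runs inside them.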

    • Andrew Dalke

      #3
      Re: regular expression for integer and decimal numbers

      Andrew Durdin wrote:
      > That will work for numbers such as 0123 12.345 12. 0.5 -- but it
      > won't work for the following:
      > 0x12AB .5 10e-3 -15 123L

      This will handle the normal floats including a leading + or -
      and trailing exponent, all optional.

      r"[+-]?((\d+(\.\d*)?)|\.\d+)([eE][+-]?[0-9]+)?"

      Andrew
      dalke@dalkescientific.com
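
      A quick way to try this pattern in Python 3 is sketched below. Because re.findall returns the captured groups rather than the whole match when a pattern contains groups, the sketch rewrites the groups as non-capturing (?:...). That rewrite, and the names FLOAT_RE and find_numbers, are assumptions of this example rather than part of the post.

```python
import re

# Same structure as the pattern above, with non-capturing groups so that
# findall() returns whole matches instead of tuples of groups.
FLOAT_RE = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?[0-9]+)?")

def find_numbers(text):
    """Return every substring of text that looks like an int or a float."""
    return FLOAT_RE.findall(text)

print(find_numbers("0123 12.345 12. 0.5 .5 10e-3 -15"))
# ['0123', '12.345', '12.', '0.5', '.5', '10e-3', '-15']
```

      Hex literals and the L suffix are still not recognised as such: 0x12AB yields '0' and '12'. Note also that the optional sign makes a dash-separated run such as 4-11-36% come out as '4', '-11', '-36'.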

      • Peter Hansen

        #4
        Re: regular expression for integer and decimal numbers

        gary wrote:
        > I want to pick all integers and decimal numbers out of a string.
        > Would this be the most correct regular expression to use?
        >
        > "\d+\.?\d*"

        Examples, including the most extreme cases you want to handle,
        are always a good idea.

        -Peter

        • gary

          #5
          Re: regular expression for integer and decimal numbers

          Peter Hansen <peter@engcorp.com> wrote in message news:<pbadnZrDHOinY87cRVn-jg@powergate.ca>...
          > gary wrote:
          > > I want to pick all integers and decimal numbers out of a string.
          > > Would this be the most correct regular expression to use?
          > >
          > > "\d+\.?\d*"
          >
          > Examples, including the most extreme cases you want to handle,
          > are always a good idea.
          >
          > -Peter

          Here is an example of what I will be dealing with:
          """
          TOTAL FIRST DOWNS 19 21
          By Rushing 11 6
          By Passing 6 10
          By Penalty 2 5
          THIRD DOWN EFFICIENCY 4-11-36% 6-14-43%
          FOURTH DOWN EFFICIENCY 0-1-0% 0-0-0%
          TOTAL NET YARDS 379 271
          Total Offensive Plays (inc. times thrown passing) 58 63
          Average gain per offensive play 6.5 4.3
          NET YARDS RUSHING 264 115
          """

          I can only hope that they were nice and put a leading zero in front of
          numbers less than 1.

          • Bengt Richter

            #6
            Re: regular expression for integer and decimal numbers

            On 25 Sep 2004 13:13:22 -0700, gary.wilson@gmail.com (gary) wrote:

            >Peter Hansen <peter@engcorp.com> wrote in message news:<pbadnZrDHOinY87cRVn-jg@powergate.ca>...
            >> gary wrote:
            >> > I want to pick all integers and decimal numbers out of a string.
            >> > Would this be the most correct regular expression to use?
            >> >
            >> > "\d+\.?\d*"
            >>
            >> Examples, including the most extreme cases you want to handle,
            >> are always a good idea.
            >>
            >> -Peter
            >
            >Here is an example of what I will be dealing with:
            >"""
            >TOTAL FIRST DOWNS 19 21
            > By Rushing 11 6
            > By Passing 6 10
            > By Penalty 2 5
            >THIRD DOWN EFFICIENCY 4-11-36% 6-14-43%
            >FOURTH DOWN EFFICIENCY 0-1-0% 0-0-0%
            >TOTAL NET YARDS 379 271
            > Total Offensive Plays (inc. times thrown passing) 58 63
            > Average gain per offensive play 6.5 4.3
            >NET YARDS RUSHING 264 115
            >"""
            >
            >I can only hope that they were nice and put a leading zero in front of
            >numbers less than 1.

            Are you sure you want to throw away all the info implicit in the structure of that data?
            How about the columns? Will you get other input with more columns? Otherwise if your
            numeric fields are as they appear, maybe just

            >>> def extract(s):
            ...     for a in s.split():
            ...         if not a[0].isdigit(): continue
            ...         if a.endswith('%'):
            ...             for i in map(int, a[:-1].split('-')): yield i
            ...         elif '.' in a: yield float(a)
            ...         else: yield int(a)
            ...
            >>> s = (
            ... """
            ... TOTAL FIRST DOWNS 19 21
            ... By Rushing 11 6
            ... By Passing 6 10
            ... By Penalty 2 5
            ... THIRD DOWN EFFICIENCY 4-11-36% 6-14-43%
            ... FOURTH DOWN EFFICIENCY 0-1-0% 0-0-0%
            ... TOTAL NET YARDS 379 271
            ... Total Offensive Plays (inc. times thrown passing) 58 63
            ... Average gain per offensive play 6.5 4.3
            ... NET YARDS RUSHING 264 115
            ... """
            ... )
            >>> for num in extract(s): print(num, end=' ')
            ...
            19 21 11 6 6 10 2 5 4 11 36 6 14 43 0 1 0 0 0 0 379 271 58 63 6.5 4.3 264 115

            But I doubt that's what you really want ;-)

            Regards,
            Bengt Richter
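
            On the point about keeping the structure of the data, one possible shape is sketched below in Python 3. It assumes every line ends in exactly two value columns and that no value is negative; both are assumptions read off the sample, and the names SAMPLE, to_numbers and parse_rows are made up for this illustration.

```python
SAMPLE = """
TOTAL FIRST DOWNS 19 21
By Rushing 11 6
THIRD DOWN EFFICIENCY 4-11-36% 6-14-43%
Total Offensive Plays (inc. times thrown passing) 58 63
Average gain per offensive play 6.5 4.3
"""

def to_numbers(token):
    """Turn one column such as '19', '6.5' or '4-11-36%' into a tuple of numbers."""
    parts = token.rstrip('%').split('-')
    return tuple(float(p) if '.' in p else int(p) for p in parts)

def parse_rows(text):
    """Map each row label to its two columns (assumes exactly two per line)."""
    rows = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split from the right so the label may contain any number of words.
        label, first, second = line.rsplit(None, 2)
        rows[label.strip()] = (to_numbers(first), to_numbers(second))
    return rows

for label, columns in parse_rows(SAMPLE).items():
    print(label, columns)
```

            Each label then maps to a pair of tuples, for example ((4, 11, 36), (6, 14, 43)) for THIRD DOWN EFFICIENCY, so the numbers stay attached to the statistic and column they came from.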

            • Peter Hansen

              #7
              Re: regular expression for integer and decimal numbers

              gary wrote:
              > Peter Hansen <peter@engcorp.com> wrote in message news:<pbadnZrDHOinY87cRVn-jg@powergate.ca>...
              >>Examples, including the most extreme cases you want to handle,
              >>are always a good idea.
              >
              > Here is an example of what I will be dealing with:
              > """
              > TOTAL FIRST DOWNS 19 21
              > By Rushing 11 6
              > By Passing 6 10
              > By Penalty 2 5
              > THIRD DOWN EFFICIENCY 4-11-36% 6-14-43%
              > FOURTH DOWN EFFICIENCY 0-1-0% 0-0-0%
              > TOTAL NET YARDS 379 271
              > Total Offensive Plays (inc. times thrown passing) 58 63
              > Average gain per offensive play 6.5 4.3
              > NET YARDS RUSHING 264 115
              > """
              >
              > I can only hope that they were nice and put a leading zero in front of
              > numbers less than 1.

              Good example of the input. Now all you need to do is tell
              us exactly what kind of output you would expect to come
              from the routine which you seek. ;-)

              -Peter

              • gary

                #8
                Re: regular expression for integer and decimal numbers

                bokr@oz.net (Bengt Richter) wrote in message news:<cj4tfm$seh$0$216.39.172.122@theriver.com>...
                > On 25 Sep 2004 13:13:22 -0700, gary.wilson@gmail.com (gary) wrote:
                >
                > >Peter Hansen <peter@engcorp.com> wrote in message news:<pbadnZrDHOinY87cRVn-jg@powergate.ca>...
                > >> gary wrote:
                > >> > I want to pick all integers and decimal numbers out of a string.
                > >> > Would this be the most correct regular expression to use?
                > >> >
                > >> > "\d+\.?\d*"
                > >>
                > >> Examples, including the most extreme cases you want to handle,
                > >> are always a good idea.
                > >>
                > >> -Peter
                > >
                > >Here is an example of what I will be dealing with:
                > >"""
                > >TOTAL FIRST DOWNS 19 21
                > > By Rushing 11 6
                > > By Passing 6 10
                > > By Penalty 2 5
                > >THIRD DOWN EFFICIENCY 4-11-36% 6-14-43%
                > >FOURTH DOWN EFFICIENCY 0-1-0% 0-0-0%
                > >TOTAL NET YARDS 379 271
                > > Total Offensive Plays (inc. times thrown passing) 58 63
                > > Average gain per offensive play 6.5 4.3
                > >NET YARDS RUSHING 264 115
                > >"""
                >
                > Are you sure you want to throw away all the info implicit in the structure of that data?
                > How about the columns? Will you get other input with more columns?

                There are several other instances in the files that I am extracting
                data from where the numbers are not so nicely arranged in columns, so
                I am really looking for something that could be used in all instances.
                (http://www.nfl.com/gamecenter/gamebo...020929_TEN@OAK)

                I do however still need to convert everything from string to numbers.
                I was thinking about using the following for that unless someone has a
                better solution:

                >>> def StrToNum(str):
                ...     try: return int(str)
                ...     except ValueError:
                ...         try: return float(str)
                ...         except ValueError: return str
                ...
                >>> statlist = ['10', '6', '2002', 'tampa bay buccaneers',
                ... 'atlanta falcons', 'the georgia dome', '1', '03', 'pm', 'est', 'artificial',
                ... '0', '3', '7', '10', '0', '20', '3', '0', '3', '0', '0', '6', '15',
                ... '14', '5', '2', '9', '10', '1', '2', '4', '13', '31', '3', '14', '21',
                ... '1', '1', '100', '0', '1', '0', '327', '243', '59', '64', '5.5',
                ... '3.8', '74', '70', '26', '22', '2.8', '3.2', '2', '3', '2', '3',
                ... '253', '173', '2', '8', '4', '14', '261', '187', '31', '17', '1',
                ... '38', '17', '4', '7.7', '4.1', '5', '3', '0', '3', '2', '2', '5',
                ... '43.2', '5', '45.6', '0', '0', '0', '0', '0', '0', '31.2', '41.6',
                ... '50', '40', '0', '0', '3', '40', '0', '0', '5', '120', '4', '50', '1',
                ... '0', '6', '35', '6', '41', '1', '1', '0', '0', '2', '0', '0', '0',
                ... '1', '0', '1', '0', '2', '2', '0', '0', '2', '2', '0', '0', '2', '2',
                ... '2', '3', '0', '2', '0', '0', '2', '0', '0', '1', '0', '0', '0', '0',
                ... '0', '0', '20', '6', '29', '34', '30', '26', '3', '37', '9', '59',
                ... '9', '35', '6', '23', 0, 0, '11', '23', '5', '01', '5', '25', '8',
                ... '37', 0, 0, '26']
                >>> [StrToNum(item) for item in statlist]
                [10, 6, 2002, 'tampa bay buccaneers', 'atlanta falcons', 'the georgia
                dome', 1, 3, 'pm', 'est', 'artificial', 0, 3, 7, 10, 0, 20, 3, 0, 3,
                0, 0, 6, 15, 14, 5, 2, 9, 10, 1, 2, 4, 13, 31, 3, 14, 21, 1, 1, 100,
                0, 1, 0, 327, 243, 59, 64, 5.5, 3.7999999999999998, 74, 70, 26, 22,
                2.7999999999999998, 3.2000000000000002, 2, 3, 2, 3, 253, 173, 2, 8, 4,
                14, 261, 187, 31, 17, 1, 38, 17, 4, 7.7000000000000002,
                4.0999999999999996, 5, 3, 0, 3, 2, 2, 5, 43.200000000000003, 5,
                45.600000000000001, 0, 0, 0, 0, 0, 0, 31.199999999999999,
                41.600000000000001, 50, 40, 0, 0, 3, 40, 0, 0, 5, 120, 4, 50, 1, 0, 6,
                35, 6, 41, 1, 1, 0, 0, 2, 0, 0, 0, 1, 0, 1, 0, 2, 2, 0, 0, 2, 2, 0, 0,
                2, 2, 2, 3, 0, 2, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0, 0, 20, 6, 29, 34,
                30, 26, 3, 37, 9, 59, 9, 35, 6, 23, 0, 0, 11, 23, 5, 1, 5, 25, 8, 37,
                0, 0, 26]
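
                The long values in that output, such as 3.7999999999999998, are how older Python versions display the binary float nearest to 3.8; the conversion itself is fine, and Python 2.7 and 3.1 onward show the same value as 3.8. A hedged Python 3 sketch of the same conversion follows. The name to_number and the optional exact flag are assumptions of this example, not the code from the post; it avoids shadowing the built-in str and can return decimal.Decimal when exact decimal values matter.

```python
from decimal import Decimal, InvalidOperation

def to_number(token, exact=False):
    """Convert token to int, then to float (or Decimal), else return it unchanged."""
    try:
        return int(token)
    except (TypeError, ValueError):
        pass
    try:
        # Decimal keeps '3.8' as exactly 3.8; float stores the nearest binary value.
        return Decimal(token) if exact else float(token)
    except (TypeError, ValueError, InvalidOperation):
        return token

print([to_number(t) for t in ["10", "03", "5.5", "3.8", "pm", 0]])
# [10, 3, 5.5, 3.8, 'pm', 0]
print(to_number("3.8", exact=True))
```

                One caveat with any float-based fallback: float also accepts strings such as 'nan' and 'inf', so free-text fields with those spellings would be converted too.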

                Another thing was that I found a negative number, which kind of screws up
                the regexes previously discussed. So I came up with the workaround
                below:

                >>> import re
                >>> str = """
                ... FGs - PATs Had Blocked 0-0 0-0
                ... Net Punting Average -6.3 33.3
                ... TOTAL RETURN YARDAGE (Not Including Kickoffs) 14 257
                ... No. and Yards Punt Returns 1-14 2-157
                ... """
                >>> # replace a number followed by a dash with the number followed by a space
                >>> str = re.sub(r"(\d+)-", r"\1 ", str)
                >>> # regex discussed before, but with an optional negative sign in front
                >>> teamstats = re.findall(r"-?\d+\.?\d*", str)
                >>> teamstats
                ['0', '0', '0', '0', '-6.3', '33.3', '14', '257', '1', '14', '2',
                '157']
                >>> [StrToNum(item) for item in teamstats]
                [0, 0, 0, 0, -6.2999999999999998, 33.299999999999997, 14, 257, 1, 14,
                2, 157]

                Gary
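
                A possible single-pass alternative to the substitute-then-search workaround, sketched for Python 3: a negative lookbehind (?<!\d) treats a dash as a sign only when it does not directly follow a digit. That rule is an assumption that happens to fit this sample, and the names NUMBER_RE and extract_numbers are illustrative.

```python
import re

# '-' counts as a sign only when the previous character is not a digit,
# so the dash in "1-14" acts as a separator while "-6.3" stays negative.
NUMBER_RE = re.compile(r"(?<!\d)-?\d+(?:\.\d+)?")

def extract_numbers(text):
    """Return the numbers in text as ints or floats, in order of appearance."""
    return [float(m) if "." in m else int(m) for m in NUMBER_RE.findall(text)]

stats = """
FGs - PATs Had Blocked 0-0 0-0
Net Punting Average -6.3 33.3
TOTAL RETURN YARDAGE (Not Including Kickoffs) 14 257
No. and Yards Punt Returns 1-14 2-157
"""
print(extract_numbers(stats))
# [0, 0, 0, 0, -6.3, 33.3, 14, 257, 1, 14, 2, 157]
```

                The same rule means a pair like 5-3 can never be read as 5 and -3, which is the intended behaviour for these stat sheets but would be wrong for arithmetic expressions.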

                • gary

                  #9
                  Re: regular expression for integer and decimal numbers

                  Peter Hansen <peter@engcorp.com> wrote in message news:<jfudnfjxPYD6vMvcRVn-uw@powergate.ca>...
                  > Good example of the input. Now all you need to do is tell
                  > us exactly what kind of output you would expect to come
                  > from the routine which you seek. ;-)
                  >
                  > -Peter

                  Well for that particular example something of the form...

                  Cleveland at Cincinnati +8

                  would be nice ;-)

                  • Peter Hansen

                    #10
                    Re: regular expression for integer and decimal numbers

                    gary wrote:
                    > Peter Hansen <peter@engcorp.com> wrote in message news:<jfudnfjxPYD6vMvcRVn-uw@powergate.ca>...
                    >
                    >>Good example of the input. Now all you need to do is tell
                    >>us exactly what kind of output you would expect to come
                    >>from the routine which you seek. ;-)
                    >
                    > Well for that particular example something of the form...
                    >
                    > Cleveland at Cincinnati +8
                    >
                    > would be nice ;-)

                    I know nothing about American football except that it
                    isn't played with a puck, so I don't think I get the joke...

                    -Peter
