regex problem

  • Odd-R.

    regex problem

    Input is a string of four digit sequences, possibly
    separated by a -, for instance like this

    "1234,2222-8888,4567,"

    My regular expression is like this:

    rx1=re.compile(r"""\A(\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,)*\Z""")

    When running rx1.findall("1234,2222-8888,4567,")

    I only get the last match as the result. Isn't
    findall supposed to return all the matches?

    Thanks in advance.


    --
    If you have a fridge, if you have a TV,
    then you have all you need to live

    -Jokke & Valentinerne
  • Thomas Guettler

    #2
    Re: regex problem

    On Tue, 26 Jul 2005 09:57:23 +0000, Odd-R. wrote:

    > Input is a string of four digit sequences, possibly
    > separated by a -, for instance like this
    >
    > "1234,2222-8888,4567,"
    >
    > My regular expression is like this:
    >
    > rx1=re.compile(r"""\A(\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,)*\Z""")

    Hi,

    try it without \A and \Z

    import re
    rx1=re.compile(r"""(\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,)""")
    print(rx1.findall("1234,2222-8888,4567,"))
    # --> ['1234,', '2222-8888,', '4567,']

    Thomas

    --
    Thomas Güttler, http://www.thomas-guettler.de/



    • John Machin

      #3
      Re: regex problem

      Odd-R. wrote:
      > Input is a string of four digit sequences, possibly
      > separated by a -, for instance like this
      >
      > "1234,2222-8888,4567,"
      >
      > My regular expression is like this:
      >
      > rx1=re.compile(r"""\A(\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,)*\Z""")
      >
      > When running rx1.findall("1234,2222-8888,4567,")
      >
      > I only get the last match as the result. Isn't
      > findall supposed to return all the matches?

      For a start, an expression that starts with \A and ends with \Z will
      match the whole string (or not match at all). You have only one match.

      Secondly, as you have a group in your expression, findall returns what
      the group matches. Your expression matches zero or more of what your
      group matches, provided there is nothing else at the start/end of the
      string. The "zero or more" makes the re engine waltz about a bit; when
      the music stopped, the group was matching "4567,".
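
The first two points can be checked directly. The following is a minimal sketch, not part of the original post: it uses the sample input from the question, writes the pattern with \d{4} for brevity, and the names `anchored` and `m` are illustrative only.

```python
import re

text = "1234,2222-8888,4567,"

# Same shape as the original pattern: anchored at both ends,
# with one capturing group repeated zero or more times.
anchored = re.compile(r"\A(\b\d{4},|\b\d{4}-\d{4},)*\Z")

m = anchored.search(text)
print(m.group(0))              # the single match spans the whole string
print(m.group(1))              # a repeated group keeps only its last repetition
print(anchored.findall(text))  # one match, so one group value is returned
```

Because the pattern contains exactly one group, findall reports that group's text rather than the full match, which is why only '4567,' comes back.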

      Thirdly, findall should be thought of as merely a wrapper around a loop
      using the search method -- it finds all non-overlapping matches of a
      pattern. So the clue to get from this is that you need a really simple
      pattern, like the following. You *don't* have to write an expression
      that does the looping.
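
To make the "wrapper around a loop" idea concrete, here is a rough sketch of what such a loop might look like. `findall_by_hand` is a hypothetical helper written for illustration, not how the re module is actually implemented, and it only mirrors findall for patterns that contain no groups.

```python
import re

def findall_by_hand(pattern, text):
    """Collect all non-overlapping matches by calling search repeatedly."""
    results = []
    pos = 0
    while pos <= len(text):
        m = pattern.search(text, pos)
        if m is None:
            break
        results.append(m.group(0))
        # Continue after this match; step one character past an empty match
        # so the loop cannot get stuck.
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return results

simple = re.compile(r"\b\d{4}-\d{4},|\b\d{4},")
text = "1234,2222-8888,4567,"
print(findall_by_hand(simple, text))
print(simple.findall(text))
```

Both calls should print the same list, since the simple pattern describes one item and the loop supplies the repetition.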

      So here's the mean lean no-flab version -- you don't even need the
      parentheses (sorry, Thomas).

      >>> rx1=re.compile(r"""\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,""")
      >>> rx1.findall("1234,2222-8888,4567,")
      ['1234,', '2222-8888,', '4567,']

      HTH,
      John


      • Duncan Booth

        #4
        Re: regex problem

        John Machin wrote:

        > So here's the mean lean no-flab version -- you don't even need the
        > parentheses (sorry, Thomas).
        >
        > >>> rx1=re.compile(r"""\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,""")
        > >>> rx1.findall("1234,2222-8888,4567,")
        > ['1234,', '2222-8888,', '4567,']

        No flab? What about all that repetition of \d? A less flabby version:

        >>> rx1=re.compile(r"""\b\d{4}(?:-\d{4})?,""")
        >>> rx1.findall("1234,2222-8888,4567,")
        ['1234,', '2222-8888,', '4567,']
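
The (?:...) in that pattern matters for findall. As a small illustration (the variable names are made up for this sketch), compare a capturing group with the non-capturing one:

```python
import re

text = "1234,2222-8888,4567,"

# With a capturing group, findall returns the group's text for each match,
# and an empty string where the optional group did not take part.
capturing = re.compile(r"\b\d{4}(-\d{4})?,")
print(capturing.findall(text))      # ['', '-8888', '']

# With a non-capturing group, findall returns each whole match.
non_capturing = re.compile(r"\b\d{4}(?:-\d{4})?,")
print(non_capturing.findall(text))  # ['1234,', '2222-8888,', '4567,']
```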


        • John Machin

          #5
          Re: regex problem

          Duncan Booth wrote:
          > John Machin wrote:
          >
          >> So here's the mean lean no-flab version -- you don't even need the
          >> parentheses (sorry, Thomas).
          >>
          >> >>> rx1=re.compile(r"""\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,""")
          >> >>> rx1.findall("1234,2222-8888,4567,")
          >> ['1234,', '2222-8888,', '4567,']
          >
          > No flab? What about all that repetition of \d? A less flabby version:
          >
          > >>> rx1=re.compile(r"""\b\d{4}(?:-\d{4})?,""")
          > >>> rx1.findall("1234,2222-8888,4567,")
          > ['1234,', '2222-8888,', '4567,']


          OK, good idea to factor out the prefix and follow it with an optional
          -1234. However, optimising re engines do common prefix factoring, *and*
          they rewrite stuff like x{4} as xxxx.

          Cheers,
          John


          • Odd-R.

            #6
            Re: regex problem

            On 2005-07-26, Duncan Booth <duncan.booth@invalid.invalid> wrote:
            > >>> rx1=re.compile(r"""\b\d{4}(?:-\d{4})?,""")
            > >>> rx1.findall("1234,2222-8888,4567,")
            > ['1234,', '2222-8888,', '4567,']

            Thanks all for good advice. However this last expression
            also matches the first four digits when the input is more
            than four digits. To resolve this problem, I first do a
            match of this,

            regex=re.compile(r"""\A(\b\d{4},|\d{4}-\d{4},)*(\b\d{4}|\d{4}-\d{4})\Z""")

            If this matches, I run findall with your expression, and then I get
            the desired result.
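
A sketch of this validate-then-extract approach follows. It is an illustration under assumptions, not the exact code from the post: it assumes every item, including the last, ends with a comma as in the sample input, and the names `ITEM`, `VALID`, `EXTRACT` and `parse` are invented for the example.

```python
import re

# One item: four digits, optionally followed by a hyphen and four more.
ITEM = r"\d{4}(?:-\d{4})?"

# Whole-string check: one or more items, each terminated by a comma.
VALID = re.compile(r"\A(?:%s,)+\Z" % ITEM)

# Extraction: one item at a time, without the trailing comma.
EXTRACT = re.compile(ITEM)

def parse(text):
    """Return the list of items, or None if the string is malformed."""
    if VALID.match(text) is None:
        return None
    return EXTRACT.findall(text)

print(parse("1234,2222-8888,4567,"))
print(parse("12345,2222-8888,"))
```

Keeping the whole-string check separate from the extraction means findall never has to decide whether the input is well formed; it only runs on strings that already passed.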


            --
            If you have a fridge, if you have a TV,
            then you have all you need to live

            -Jokke & Valentinerne
