Regular Expression IGNORECASE different for findall and split?

  • Chris

    Regular Expression IGNORECASE different for findall and split?

    hello,
    I have a question about the re.I option for regular expressions:

    >>> import re
    >>> re.findall('x', '1x2X3', re.I)
    ['x', 'X']

    As expected, this finds both the lowercase and the uppercase x.

    >>> re.split('x', '1x2X3', re.I)
    ['1', '2X3']
    >>> re.split('x', '1x2X3')
    ['1', '2X3']

    I expected ['1', '2', '3'], but in this case re.I behaves exactly as if
    it were not present at all...

    Is that expected behaviour or a fault?
    Running Python 2.4.1 on Windows XP

    thanks for any hint
    chris

  • Peter Otten

    #2
    Re: Regular Expression IGNORECASE different for findall and split?

    Chris wrote:

    > >>> re.split('x', '1x2X3', re.I)
    > ['1', '2X3']

    > I expected ['1', '2', '3'], but in this case re.I behaves exactly as if
    > it were not present at all...

    > Is that expected behaviour or a fault?

    This is expected:

    >>> help(re.split)
    Help on function split in module sre:

    split(pattern, string, maxsplit=0)
        Split the source string by the occurrences of the pattern,
        returning a list containing the resulting substrings.

    The third positional argument is maxsplit, not flags, so you are
    setting maxsplit to

    >>> re.I
    2

    and the pattern is still matched case-sensitively.

    Use re.compile() to get the desired behaviour:

    >>> re.compile("x", re.I).split("1x2X3")
    ['1', '2', '3']
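
A minimal sketch of the same point, assuming a newer Python than the 2.4.1 in the question: from Python 2.7 and 3.1 on, re.split also accepts a flags keyword argument, so the flag can be passed by name instead of landing in the maxsplit slot. The variable names are only for illustration.

```python
import re

text = '1x2X3'

# Passing the flag by keyword keeps it out of the maxsplit position.
# (flags= on re.split assumes Python 2.7 / 3.1 or later.)
by_keyword = re.split('x', text, flags=re.I)

# The compiled-pattern form from the post works on old and new versions alike.
by_compile = re.compile('x', re.I).split(text)

# maxsplit only limits the number of splits; matching stays case-sensitive.
limited = re.split('x', text, maxsplit=2)

print(by_keyword)
print(by_compile)
print(limited)
```

Here by_keyword and by_compile both give ['1', '2', '3'], while limited reproduces the ['1', '2X3'] result from the question.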

    Peter


    • Chris

      #3
      __dict__ of object, Was: Regular Expression IGNORECASE different for findall and split?

      Peter Otten wrote:
      > Chris wrote:
      >
      >> >>> re.split('x', '1x2X3', re.I)
      >> ['1', '2X3']
      >
      >> I expected ['1', '2', '3'], but in this case re.I behaves exactly as if
      >> it were not present at all...
      >
      >> Is that expected behaviour or a fault?
      >
      > This is expected:
      >
      > >>> help(re.split)
      > Help on function split in module sre:
      >
      > split(pattern, string, maxsplit=0)
      > Split the source string by the occurrences of the pattern,
      > returning a list containing the resulting substrings.
      >
      > You are setting maxsplit to
      >
      > >>> re.I
      > 2
      >
      > Use re.compile() to get the desired behaviour:
      >
      > >>> re.compile("x", re.I).split("1x2X3")
      > ['1', '2', '3']
      >
      > Peter

      Thanks, I should have read the docs.

      A more basic follow-up question: I was doing the following before:

      method = 'split' # came from somewhere else of course
      result = re.__dict__[method](REGEX, TXT)

      Precompiling the regex

      r = re.compile(REGEX)

      does give a regex object which has the needed methods:

      print dir(r)
      ['__copy__', '__deepcopy__', 'findall', 'finditer', 'match', 'scanner',
      'search', 'split', 'sub', 'subn']

      But how do I evaluate them without explicitly calling them?

      result = r.__???MAGIC???__[method](TXT)

      Obviously I am not a Python pro ;)

      thanks
      chris


      • Steven Bethard

        #4
        Re: __dict__ of object, Was: Regular Expression IGNORECASE different for findall and split?

        Chris wrote:
        > A more basic follow-up question: I was doing the following before:
        >
        > method = 'split' # came from somewhere else of course
        > result = re.__dict__[method](REGEX, TXT)
        >
        > Precompiling the regex
        >
        > r = re.compile(REGEX)
        >
        > does give a regex object which has the needed methods:
        >
        > print dir(r)
        > ['__copy__', '__deepcopy__', 'findall', 'finditer', 'match',
        > 'scanner', 'search', 'split', 'sub', 'subn']
        >
        > But how do I evaluate them without explicitly calling them?
        >
        > result = r.__???MAGIC???__[method](TXT)
        >
        > Obviously I am not a Python pro ;)

        Use getattr:

        method = 'split'
        result = getattr(re.compile(REGEX), method)(TXT)

        HTH,

        STeVe


        • Fredrik Lundh

          #5
          Re: __dict__ of object,

          Chris <c@cdot.de> wrote:

          > A more basic follow-up question: I was doing the following before:
          >
          > method = 'split' # came from somewhere else of course
          > result = re.__dict__[method](REGEX, TXT)
          >
          > Precompiling the regex
          >
          > r = re.compile(REGEX)
          >
          > does give a regex object which has the needed methods:
          >
          > print dir(r)
          > ['__copy__', '__deepcopy__', 'findall', 'finditer', 'match', 'scanner',
          > 'search', 'split', 'sub', 'subn']
          >
          > But how do I evaluate them without explicitly calling them?
          >
          > result = r.__???MAGIC???__[method](TXT)
          >
          > Obviously I am not a Python pro ;)

          I really don't understand why you think you have to write
          your RE code that way, but the mechanism you're looking
          for is getattr:

          result = getattr(r, method)(TXT)
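
For illustration, a small sketch of what that lookup does. The REGEX and TXT values are made-up placeholders, not taken from the thread. getattr(r, method) returns the bound method with that name, which is then called like any other function.

```python
import re

REGEX = 'x'        # placeholder pattern, assumed for this sketch
TXT = '1x2x3'      # placeholder input, assumed for this sketch

r = re.compile(REGEX)

method = 'split'   # a method name that arrives as a string
result = getattr(r, method)(TXT)

# Looking the method up by name is equivalent to calling it directly.
assert result == r.split(TXT)

# A third argument to getattr supplies a default for unknown names
# instead of raising AttributeError.
missing = getattr(r, 'no_such_method', None)

print(result)
print(missing)
```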

          </F>




          • Chris

            #6
            Re: __dict__ of object, Was: Regular Expression IGNORECASE different for findall and split?

            Fredrik Lundh wrote:
            > Chris <c@cdot.de> wrote:
            >
            >> A more basic follow-up question: I was doing the following before:
            >>
            >> method = 'split' # came from somewhere else of course
            >> result = re.__dict__[method](REGEX, TXT)
            >>
            >> Precompiling the regex
            >>
            >> r = re.compile(REGEX)
            >>
            >> does give a regex object which has the needed methods:
            >>
            >> print dir(r)
            >> ['__copy__', '__deepcopy__', 'findall', 'finditer', 'match', 'scanner',
            >> 'search', 'split', 'sub', 'subn']
            >>
            >> But how do I evaluate them without explicitly calling them?
            >>
            >> result = r.__???MAGIC???__[method](TXT)
            >>
            >> Obviously I am not a Python pro ;)
            >
            > I really don't understand why you think you have to write
            > your RE code that way, but the mechanism you're looking
            > for is getattr:
            >
            > result = getattr(r, method)(TXT)

            Thanks (also to Steven) for the info, that is exactly what I was looking
            for.

            The reason is that I built a small UI in which the user may choose whether
            to do a split or a findall (and maybe later others like match or
            search). So the method name comes in "from the UI". I could of course
            use if/elif/else blocks, but thought getattr would be shorter and
            easier. I was not really aware of getattr, which I had been looking for on
            other occasions before...
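
One way this could look, sketched under the assumption that the method name is an untrusted string from the UI: restrict it to an explicit set of allowed names before handing it to getattr. The function name run_regex_method and the ALLOWED_METHODS set are hypothetical, not part of the app described here.

```python
import re

# Hypothetical allowlist: only these pattern methods may be chosen from the UI.
ALLOWED_METHODS = {'split', 'findall'}

def run_regex_method(method, regex, text, flags=0):
    """Call the named method of a compiled pattern, rejecting unknown names."""
    if method not in ALLOWED_METHODS:
        raise ValueError('unsupported method: %r' % (method,))
    pattern = re.compile(regex, flags)
    return getattr(pattern, method)(text)

print(run_regex_method('split', 'x', '1x2X3', re.I))
print(run_regex_method('findall', 'x', '1x2X3', re.I))
```

Compared with if/elif/else blocks, adding a method such as match or search later only means adding its name to the set.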

            chris


            • Mike Meyer

              #7
              Re: __dict__ of object, Was: Regular Expression IGNORECASE different for findall and split?

              Chris <c@cdot.de> writes:
              > Fredrik Lundh wrote:
              >> Chris <c@cdot.de> wrote:
              >>
              >>> A more basic follow-up question: I was doing the following before:
              >>>
              >>> method = 'split' # came from somewhere else of course
              >>> result = re.__dict__[method](REGEX, TXT)
              >>>
              >>> Precompiling the regex
              >>>
              >>> r = re.compile(REGEX)
              >>>
              >>> does give a regex object which has the needed methods:
              >>>
              >>> print dir(r)
              >>> ['__copy__', '__deepcopy__', 'findall', 'finditer', 'match', 'scanner',
              >>> 'search', 'split', 'sub', 'subn']
              >>>
              >>> But how do I evaluate them without explicitly calling them?
              >>>
              >>> result = r.__???MAGIC???__[method](TXT)
              >>>
              >>> Obviously I am not a Python pro ;)
              >>
              >> I really don't understand why you think you have to write
              >> your RE code that way, but the mechanism you're looking
              >> for is getattr:
              >>
              >> result = getattr(r, method)(TXT)
              >
              > Thanks (also to Steven) for the info, that is exactly what I was
              > looking for.
              >
              > The reason is that I built a small UI in which the user may choose whether
              > to do a split or a findall (and maybe later others like match or
              > search). So the method name comes in "from the UI". I could of course
              > use if/elif/else blocks, but thought getattr would be shorter and
              > easier. I was not really aware of getattr, which I had been looking for on
              > other occasions before...

              So why is the UI returning strings, instead of code objects of some
              kind?

              <mike
              --
              Mike Meyer <mwm@mired.org> http://www.mired.org/home/mwm/
              Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.


              • Chris

                #8
                Re: __dict__ of object, Was: Regular Expression IGNORECASE different for findall and split?

                Mike Meyer wrote:
                > Chris <c@cdot.de> writes:
                >
                >> Fredrik Lundh wrote:
                >>
                >>> Chris <c@cdot.de> wrote:
                >>>
                >>>> A more basic follow-up question: I was doing the following before:
                >>>>
                >>>> method = 'split' # came from somewhere else of course
                >>>> result = re.__dict__[method](REGEX, TXT)
                >>>>
                >>>> Precompiling the regex
                >>>>
                >>>> r = re.compile(REGEX)
                >>>>
                >>>> does give a regex object which has the needed methods:
                >>>>
                >>>> print dir(r)
                >>>> ['__copy__', '__deepcopy__', 'findall', 'finditer', 'match', 'scanner',
                >>>> 'search', 'split', 'sub', 'subn']
                >>>>
                >>>> But how do I evaluate them without explicitly calling them?
                >>>>
                >>>> result = r.__???MAGIC???__[method](TXT)
                >>>>
                >>>> Obviously I am not a Python pro ;)
                >>>
                >>> I really don't understand why you think you have to write
                >>> your RE code that way, but the mechanism you're looking
                >>> for is getattr:
                >>>
                >>> result = getattr(r, method)(TXT)
                >>
                >> Thanks (also to Steven) for the info, that is exactly what I was
                >> looking for.
                >>
                >> The reason is that I built a small UI in which the user may choose whether
                >> to do a split or a findall (and maybe later others like match or
                >> search). So the method name comes in "from the UI". I could of course
                >> use if/elif/else blocks, but thought getattr would be shorter and
                >> easier. I was not really aware of getattr, which I had been looking for on
                >> other occasions before...
                >
                > So why is the UI returning strings, instead of code objects of some
                > kind?
                >
                > <mike

                It is a simple Ajax call to a Python server that does the regex work. Maybe a
                bit contrived, but it was nice to try Python/Ajax on an app that is useful to
                me and makes it easy to try out regular expressions. If you are interested,
                check out http://cthedot.de/retest/

                chris
