\w in regular expression

This topic is closed.
  • Marcello Pietrobon

    \w in regular expression

    Hello,

    I am reading


    But there is an inconsistency.
    In section 2.1, Matching Characters:

    \w
    Matches any alphanumeric character; this is equivalent to the class
    [a-zA-Z0-9_].

    \W
    Matches any non-alphanumeric character; this is equivalent to the
    class [^a-zA-Z0-9_].

    This is fine with me: it is the same as in Perl and consistent with:

    \d
    Matches any decimal digit; this is equivalent to the class [0-9].

    \D
    Matches any non-digit character; this is equivalent to the class
    [^0-9].
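
The equivalences quoted above can be checked directly. The following is a minimal sketch for a current Python 3 interpreter, not the Python 2.3 used in this thread. It assumes the `re.ASCII` flag, because in Python 3 `\w` and `\d` on `str` patterns also match non-ASCII letters and digits unless that flag is given. The variable names are illustrative only.

```python
import re
import string

# The sets the HOWTO text describes.
word_chars = set(string.ascii_letters + string.digits + "_")
digit_chars = set(string.digits)
printable = set(string.printable)

# re.ASCII limits \w, \W, \d and \D to the ASCII ranges quoted above.
matched_by_w = {c for c in printable if re.fullmatch(r"\w", c, re.ASCII)}
matched_by_W = {c for c in printable if re.fullmatch(r"\W", c, re.ASCII)}
matched_by_d = {c for c in printable if re.fullmatch(r"\d", c, re.ASCII)}
matched_by_D = {c for c in printable if re.fullmatch(r"\D", c, re.ASCII)}

print(matched_by_w == word_chars)                # \w is [a-zA-Z0-9_]
print(matched_by_d == digit_chars)               # \d is [0-9]
print(matched_by_w.isdisjoint(matched_by_W))     # \w and \W never overlap
print((matched_by_w | matched_by_W) == printable)
print((matched_by_d | matched_by_D) == printable)
```

Every line should print `True`: each uppercase class is the exact complement of its lowercase counterpart.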

    But in the paragraph 5.1: Splitting Strings
    I find:

    >>> p = re.compile(r'\W+')
    >>> p.split('This is a test, short and sweet, of split().')
    ['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']


    At first I thought it was a typo,
    but on my Python command line:
    Python 2.3 (#46, Jul 29 2003, 18:54:32) [MSC v.1200 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import re
    >>> p = re.compile(r'\W+'); print p
    <_sre.SRE_Pattern object at 0x0090DC38>
    >>> p.split('This is a test, short and sweet, of split().')
    ['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
    >>> p = re.compile(r'\w+'); print p
    <_sre.SRE_Pattern object at 0x0090D140>
    >>> p.split('This is a test, short and sweet, of split().')
    ['', ' ', ' ', ' ', ', ', ' ', ' ', ', ', ' ', '().']
    >>>
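
The two results in the session above are complementary: `split()` discards whatever the pattern matches and returns what lies between the matches. Splitting on `\W+` therefore returns the words, and splitting on `\w+` returns the separators. A minimal sketch in Python 3 syntax (the variable names are illustrative, and the reassembly step assumes a string that starts with a word character and ends with a non-word character, as this one does):

```python
import re

text = 'This is a test, short and sweet, of split().'

# Splitting on runs of NON-word characters leaves the words.
words = re.split(r'\W+', text)

# Splitting on runs of word characters leaves the separators.
separators = re.split(r'\w+', text)

print(words)
print(separators)

# Interleaving separators and words rebuilds the original string, which
# shows that the two patterns partition the text between them.
rebuilt = ''.join(sep + word for sep, word in zip(separators, words))
print(rebuilt == text)
```

The trailing `''` in `words` and the leading `''` in `separators` appear because the text ends with a `\W+` match and begins with a `\w+` match, so there is an empty piece beyond each of them.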



    In other words, is the Python re module not compatible with Perl?


    I also noticed that tools\scripts\redemo.py behaves differently from the Python command line (it is not the only case),
    because it matches 'This' when I use \w+ and not when I use \W+


    ???


    Thank you for any comments,

    Marcello












  • William Park

    #2
    Re: \w in regular expression

    Marcello Pietrobon <teiffel@attglobal.net> wrote:
    > >>> p = re.compile(r'\W+')
    > >>> p.split('This is a test, short and sweet, of split().')
    > ['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
    >
    > At first I thought a typo:

    It's correct. No typo.
    > In other word is Python re module not compatible with Perl ?

    What answer were you expecting in Perl?

    --
    William Park, Open Geometry Consulting, <opengeometry@yahoo.ca>
    Linux solution for data management and processing.


    • stewart

      #3
      Re: \w in regular expression

      Marcello Pietrobon wrote:
      > I also noted that the tools\scripts\redemo.py behaves different than the
      > Python command line ( it is not the only case )
      > because it matches 'This' when I use \w+ and not when I use \W+

      This doesn't sound like an error to me. \w+ matches runs of lowercase and
      uppercase letters, digits, and the underscore, while \W+ matches runs of
      everything else. So of course 'This' will be matched by \w+ and not by
      \W+. It sounds like your impression was that \w+ matched only lowercase
      letters, which is not the case.
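
The point can be reproduced without redemo.py. The sketch below uses `re.match`, `re.search` and `re.findall` as stand-ins; it makes no claim about how redemo.py itself performs its matching, and the sample strings are illustrative. It is written in Python 3 syntax.

```python
import re

text = 'This is a test'

# \w+ matches at the very start of the string, so 'This' is found.
word_match = re.match(r'\w+', text)
print(word_match.group())

# \W+ cannot match at the start, because 'T' is a word character.
nonword_at_start = re.match(r'\W+', text)
print(nonword_at_start)

# Searching the whole string, the first \W+ match is the space after 'This'.
nonword_search = re.search(r'\W+', text)
print(repr(nonword_search.group()), nonword_search.start())

# \w is not limited to lowercase letters: uppercase letters, digits and
# the underscore all belong to the class.
tokens = re.findall(r'\w+', 'ABC abc 123 a_b')
print(tokens)
```

Here `re.match(r'\W+', text)` returns `None`, while `re.search` finds the single space at index 4.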
