Matching horizontal white space

This topic is closed.
  • Magnus.Moraberg@gmail.com

    Matching horizontal white space

    multipleSpaces = re.compile(u'\\h+')

    importantTextString = '\n \n \n \t\t '
    importantTextString = multipleSpaces.sub("M", importantTextString)

    I would have expected consecutive spaces and tabs to be replaced by M
    but nothing is being replaced. If I try the following, then I'm left
    only with M, as expected -

    multipleSpaces = re.compile(u'\\s+') # both vertical and horizontal

    importantTextString = '\n \n \n \t\t '
    importantTextString = multipleSpaces.sub("M", importantTextString)


    What I eventually wish to do is have only single spaces in my text and
    to only have single carriage returns -

    " one two three four

    five


    six

    "

    becoming -

    "one two three four
    five
    six
    "

    Thanks,

    Barry
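
The replies below address the `\h` pattern but not the end goal of single spaces and single line breaks. A minimal sketch of one way to get there, assuming "carriage returns" means line breaks and that whitespace-only lines should be dropped entirely; `normalize_whitespace` is a hypothetical helper name, not something from the thread or the standard library:

```python
import re

# Runs of spaces and tabs: the "horizontal" whitespace from the question.
HORIZONTAL_RUN = re.compile(r'[ \t]+')


def normalize_whitespace(text):
    """Collapse space/tab runs to one space and drop empty lines."""
    # Step 1: every run of spaces and tabs becomes a single space.
    text = HORIZONTAL_RUN.sub(' ', text)
    # Step 2: trim the single space that may remain at either end of a line.
    lines = [line.strip(' ') for line in text.splitlines()]
    # Step 3: keep only non-empty lines and rejoin them with one newline each.
    kept = [line for line in lines if line]
    return '\n'.join(kept) + ('\n' if kept else '')


sample = ' one  two \t three   four\n\nfive\n\n\nsix\n\n'
print(repr(normalize_whitespace(sample)))
# prints 'one two three four\nfive\nsix\n'
```

Splitting into lines first keeps the horizontal and vertical cases separate, which is easier to adjust than a single combined pattern if, say, one blank line between paragraphs should survive.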
  • Fredrik Lundh

    #2
    Re: Matching horizontal white space

    Magnus.Moraberg@gmail.com wrote:
    > multipleSpaces = re.compile(u'\\h+')
    >
    > importantTextString = '\n \n \n \t\t '
    > importantTextString = multipleSpaces.sub("M", importantTextString)

    what's "\\h" supposed to mean?

    > I would have expected consecutive spaces and tabs to be replaced by M
    > but nothing is being replaced.

    if you know what you want to replace, be explicit:

    >>> importantTextString = '\n \n \n \t\t '
    >>> re.compile("[\t ]+").sub("M", importantTextString)
    '\nM\nM\nM'

    </F>


    • John Machin

      #3
      Re: Matching horizontal white space

      On Sep 13, 12:52 am, Fredrik Lundh <fred...@pythonware.com> wrote:
      > Magnus.Morab...@gmail.com wrote:
      > > multipleSpaces = re.compile(u'\\h+')
      > >
      > > importantTextString = '\n  \n  \n \t\t  '
      > > importantTextString = multipleSpaces.sub("M", importantTextString)
      >
      > what's "\\h" supposed to mean?

      Match *h*orizontal whitespace, I guess ... looks like the maintainer
      of the re equivalent in some other language has far too much spare
      time :-)




      • Ben Finney

        #4
        Re: Matching horizontal white space

        Magnus.Moraberg@gmail.com writes:
        > multipleSpaces = re.compile(u'\\h+')
        >
        > importantTextString = '\n \n \n \t\t '
        > importantTextString = multipleSpaces.sub("M", importantTextString)

        Please get into the habit of following the Python coding style guide
        <URL:http://www.python.org/dev/peps/pep-0008>.

        For literal strings that you expect to contain backslashes, it's often
        clearer to use the "raw" string syntax:

        multiple_spaces = re.compile(ur'\h+')

        > I would have expected consecutive spaces and tabs to be replaced by
        > M

        Why, what leads you to expect that? Your regular expression doesn't
        specify spaces or tabs. It specifies "the character 'h', one or more
        times".
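
A small probe of what `\h+` does on whatever interpreter runs it. The snippets in this thread are Python 2 era (the `ur'...'` prefix is not accepted by Python 3). The assumption here is a current Python 3, where, as far as I know, the `re` module rejects unknown letter escapes such as `\h` outright instead of treating them as a literal letter. `probe_backslash_h` is a hypothetical name used only for this illustration:

```python
import re


def probe_backslash_h(text):
    """Apply the thread's pattern to text, or report that re rejected it."""
    try:
        # Raw string, so the regex engine sees a backslash followed by "h".
        pattern = re.compile(r'\h+')
    except re.error as exc:
        # Newer Python 3 releases refuse unknown letter escapes in patterns.
        return 'rejected: %s' % exc
    # Older interpreters accept the pattern; see what it replaces, if anything.
    return pattern.sub('M', text)


result = probe_backslash_h('\n \n \n \t\t ')
print(repr(result))
```

Either way, the spaces and tabs in the sample are never matched, which is consistent with "nothing is being replaced" in the original question.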

        For "space or tab", specify a character class of space and tab:
        >>> multiple_spaces = re.compile(u'[\t ]+')
        >>> important_text_string = u'\n \n \n \t\t '
        >>> multiple_spaces.sub("M", important_text_string)
        u'\nM\nM\nM'
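
If the explicit `[\t ]` class is too narrow, a commonly used idiom gets closer to what `\h` was presumably meant to do. This is a sketch of an alternative, not something proposed in the thread: a negated class that matches whitespace other than line-break characters. Note the assumption that form feed and vertical tab count as "horizontal" here, since this class matches them too.

```python
import re

# Double negative: "not a non-whitespace character, and not CR or LF".
# What remains is whitespace that does not end a line.
horizontal_whitespace = re.compile(r'[^\S\r\n]+')

important_text_string = '\n \n \n \t\t '
replaced = horizontal_whitespace.sub('M', important_text_string)
print(repr(replaced))
# prints '\nM\nM\nM'
```

For plain spaces and tabs the result is the same as with `[\t ]+`; the difference only shows up when the text contains other whitespace characters.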


        You probably want to read the documentation for the Python 're' module
        <URL:http://www.python.org/doc/lib/module-re>. This is standard
        practice when using any unfamiliar module from the standard library.

        --
        \ “If you do not trust the source do not use this program.” |
        `\ —Microsoft Vista security dialogue |
        _o__) |
        Ben Finney
