strip() using strings instead of chars

  • Christoph Zwerschke

    strip() using strings instead of chars

    In Python programs, you will quite frequently find code like the
    following for removing a certain prefix from a string:

    if url.startswith('http://'):
        url = url[7:]

    Similarly for stripping suffixes:

    if filename.endswith('.html'):
        filename = filename[:-5]

    My problem with this is that it's cumbersome and error prone to count
    the number of chars of the prefix or suffix. If you want to change it
    from 'http://' to 'https://', you must not forget to change the 7 to 8.
    If you write len('http://') instead of the 7, you see this is actually
    a DRY problem.

    Things get even worse if you have several prefixes to consider:

    if url.startswith('http://'):
        url = url[7:]
    elif url.startswith('https://'):
        url = url[8:]

    You can't make use of url.startswith(('http://', 'https://')) here.

    Here is another concrete example taken from the standard lib:

    if chars.startswith(BOM_UTF8):
        chars = chars[3:].decode("utf-8")

    This avoids hardcoding the BOM_UTF8, but its length is still hardcoded,
    and the programmer had to know it or look it up when writing this line.
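
A minimal sketch of a length-free variant of the BOM check above, assuming Python 3 and a made-up sample payload (`raw` and `text` are hypothetical names, not taken from the standard library code being quoted):

```python
import codecs

# Hypothetical sample input: UTF-8 encoded text preceded by a byte order mark.
raw = codecs.BOM_UTF8 + "héllo".encode("utf-8")

if raw.startswith(codecs.BOM_UTF8):
    # Derive the slice offset from the constant instead of hardcoding 3.
    text = raw[len(codecs.BOM_UTF8):].decode("utf-8")
else:
    text = raw.decode("utf-8")
```

This still names the constant twice, which is the repetition the post objects to, but the length can no longer drift out of sync with the prefix.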

    So my suggestion is to add another string method, say "stripstr" that
    behaves like "strip", but instead of stripping *characters* strips
    *strings* (similarly for lstrip and rstrip). Then in the case above,
    you could simply write url = url.lstripstr('http://') or
    url = url.lstripstr(('http://', 'https://')).

    The new function would actually subsume the old strip function: you
    would have strip('aeiou') == stripstr(set('aeiou')).
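
To make the difference concrete, here is a rough sketch of what a prefix-stripping `lstripstr` could look like as a plain function. The name comes from the proposal above. The function itself, its argument handling, and the choice to strip repeatedly (so that a set of single characters reproduces `lstrip`) are assumptions for illustration only, not an existing Python API.

```python
def lstripstr(text, prefixes):
    """Hypothetical helper: repeatedly remove any of the given prefixes."""
    if isinstance(prefixes, str):
        prefixes = (prefixes,)
    # Drop empty strings, which would otherwise match forever.
    prefixes = tuple(p for p in prefixes if p)
    stripped = True
    while stripped:
        stripped = False
        for prefix in prefixes:
            if text.startswith(prefix):
                text = text[len(prefix):]
                stripped = True
                break
    return text


# Built-in lstrip treats its argument as a set of characters,
# so it also eats the leading "tt" of the host name.
chars_result = "http://tt.com".lstrip("http://")

# The sketch treats the argument as a whole string.
string_result = lstripstr("http://tt.com", "http://")

# Passing single characters makes the sketch behave like lstrip.
same_as_lstrip = lstripstr("aaeixyz", set("aeiou")) == "aaeixyz".lstrip("aeiou")
```

Whether such a method should strip once or repeatedly is left open in the proposal. Repeating is what the equivalence with `strip` requires, but it also means `'http://http://x'` would lose both prefixes.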

    Instead of a new function, we could also add another parameter to strip
    (lstrip, rstrip) for passing strings or changing the behavior, or we
    could create functions with the signature of startswith and endswith
    which instead of only checking whether the string starts or ends with
    the substring, remove the substring (startswith and endswith have
    additional "start" and "end" index parameters that may be useful).

    Or did I overlook anything and there is already a good idiom for this?

    Btw, in most other languages, "strip" is called "trim" and behaves
    like Python's strip, i.e. considers the parameter as a set of chars.
    There is one notable exception: in MySQL, trim behaves like the stripstr
    proposed above (unlike SQLite, PostgreSQL and Oracle).

    -- Christoph
  • Bruno Desthuilliers

    #2
    Re: strip() using strings instead of chars

    Christoph Zwerschke wrote:
    > In Python programs, you will quite frequently find code like the
    > following for removing a certain prefix from a string:
    >
    > if url.startswith('http://'):
    >     url = url[7:]

    DRY/SPOT violation. Should be written as:

    prefix = 'http://'
    if url.startswith(prefix):
        url = url[len(prefix):]

    (snip)
    > My problem with this is that it's cumbersome and error prone to count
    > the number of chars of the prefix or suffix.

    See above.

    > If you want to change it
    > from 'http://' to 'https://', you must not forget to change the 7 to 8.
    > If you write len('http://') instead of the 7, you see this is actually
    > a DRY problem.

    See above.

    > Things get even worse if you have several prefixes to consider:
    >
    > if url.startswith('http://'):
    >     url = url[7:]
    > elif url.startswith('https://'):
    >     url = url[8:]
    >
    > You can't make use of url.startswith(('http://', 'https://')) here.

    for prefix in ('http://', 'https://'):
        if url.startswith(prefix):
            url = url[len(prefix):]
            break

    For more complex use cases, you may want to consider regexps,
    specifically re.sub:

    >>> import re
    >>> pat = re.compile(r"(^https?://|\.txt$)")
    >>> urls = ['http://toto.com', 'https://titi.com', 'tutu.com',
    ...         'file://tata.txt']
    >>> [pat.sub('', u) for u in urls]
    ['toto.com', 'titi.com', 'tutu.com', 'file://tata']


    Not to dismiss your suggestion, but I thought you might like to know how
    to solve your problem with what's currently available !-)


    • Christoph Zwerschke

      #3
      Re: strip() using strings instead of chars

      Bruno Desthuilliers wrote:
      > DRY/SPOT violation. Should be written as:
      >
      > prefix = 'http://'
      > if url.startswith(prefix):
      >     url = url[len(prefix):]

      That was exactly my point. This formulation is a bit better, but it
      still violates DRY, because you need to type "prefix" two times. It is
      exactly this idiom that I see so often and that I wanted to simplify.
      Your suggestions work, but I somehow feel such a simple task should have
      a simpler formulation in Python, i.e. something like

      url = url.lstripstr(('http://', 'https://'))

      instead of

      for prefix in ('http://', 'https://'):
          if url.startswith(prefix):
              url = url[len(prefix):]
              break

      -- Christoph


      • Marc 'BlackJack' Rintsch

        #4
        Re: strip() using strings instead of chars

        On Fri, 11 Jul 2008 16:45:20 +0200, Christoph Zwerschke wrote:
        > Bruno Desthuilliers wrote:
        >> DRY/SPOT violation. Should be written as:
        >>
        >> prefix = 'http://'
        >> if url.startswith(prefix):
        >>     url = url[len(prefix):]
        >
        > That was exactly my point. This formulation is a bit better, but it
        > still violates DRY, because you need to type "prefix" two times. It is
        > exactly this idiom that I see so often and that I wanted to simplify.
        > Your suggestions work, but I somehow feel such a simple task should have
        > a simpler formulation in Python, i.e. something like
        >
        > url = url.lstripstr(('http://', 'https://'))

        I would prefer a name like `remove_prefix()` instead of a variant with
        `strip` and abbreviations in it.

        Ciao,
        Marc 'BlackJack' Rintsch
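
For readers on newer interpreters: Python 3.9 added `str.removeprefix()` and `str.removesuffix()`, which remove a single given string at most once. They accept one string rather than a tuple, so the multi-prefix case discussed in this thread still needs a loop. A short sketch, assuming Python 3.9 or later and a made-up example URL:

```python
url = "https://example.com/index.html"

# removeprefix() takes exactly one string, so several candidates need a loop.
for prefix in ("http://", "https://"):
    if url.startswith(prefix):
        url = url.removeprefix(prefix)
        break

# removesuffix() returns the string unchanged when the suffix is absent.
name = "index.html".removesuffix(".html")
unchanged = "index.txt".removesuffix(".html")
```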


        • Duncan Booth

          #5
          Re: strip() using strings instead of chars

          Christoph Zwerschke <cito@online.de> wrote:
          > In Python programs, you will quite frequently find code like the
          > following for removing a certain prefix from a string:
          >
          > if url.startswith('http://'):
          >     url = url[7:]

          If I came across this code I'd want to know why they weren't using
          urlparse.urlsplit()...

          > Similarly for stripping suffixes:
          >
          > if filename.endswith('.html'):
          >     filename = filename[:-5]

          ... and I'd want to know why os.path.splitext() wasn't appropriate here.

          > My problem with this is that it's cumbersome and error prone to count
          > the number of chars of the prefix or suffix. If you want to change it
          > from 'http://' to 'https://', you must not forget to change the 7 to 8.
          > If you write len('http://') instead of the 7, you see this is actually
          > a DRY problem.
          >
          > Things get even worse if you have several prefixes to consider:
          >
          > if url.startswith('http://'):
          >     url = url[7:]
          > elif url.startswith('https://'):
          >     url = url[8:]
          >
          > You can't make use of url.startswith(('http://', 'https://')) here.

          No you can't, so you definitely want to be parsing the URL properly. I
          can't actually think of a use for stripping off the scheme without either
          saving it somewhere or doing further parsing of the url.
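
A brief sketch of the parsing approach suggested here, assuming Python 3, where the `urlparse` module mentioned above lives at `urllib.parse`. The example URL and file name are made up:

```python
import os.path
from urllib.parse import urlsplit

# Split the URL into components instead of slicing off a fixed-length prefix.
parts = urlsplit("https://example.com/docs/page.html")
scheme = parts.scheme              # kept for later use rather than discarded
rest = parts.netloc + parts.path   # host plus path, without the scheme

# Split a file name into root and extension without counting characters.
root, ext = os.path.splitext("page.html")
```

Both calls work for any scheme or extension, so nothing needs updating when 'http://' becomes 'https://' or '.html' becomes '.htm'.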


          • Christoph Zwerschke

            #6
            Re: strip() using strings instead of chars

            Duncan Booth wrote:
            >> if url.startswith('http://'):
            >>     url = url[7:]
            >
            > If I came across this code I'd want to know why they weren't using
            > urlparse.urlsplit()...

            Right, such code can have a smell since in the case of urls, file names,
            config options etc. there are specialized functions available. But I'm
            not sure whether the need for removing string prefixes/suffixes in general
            is really so rare that we shouldn't bother to offer a simpler solution.

            -- Christoph


            • Duncan Booth

              #7
              Re: strip() using strings instead of chars

              Christoph Zwerschke <cito@online.de> wrote:
              > Duncan Booth wrote:
              >>> if url.startswith('http://'):
              >>>     url = url[7:]
              >>
              >> If I came across this code I'd want to know why they weren't using
              >> urlparse.urlsplit()...
              >
              > Right, such code can have a smell since in the case of urls, file names,
              > config options etc. there are specialized functions available. But I'm
              > not sure whether the need for removing string prefixes/suffixes in general
              > is really so rare that we shouldn't bother to offer a simpler solution.

              One of the great things about Python is that it resists bloating the
              builtin classes with lots of methods that just seem like a good idea at the
              time. If a lot of people make a case for this function then it might get
              added, but I think it is unlikely given how simple it is to write a
              function to do this for yourself.
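
As an illustration of how small such a do-it-yourself function can be, here is one possible sketch. The names `remove_prefix` and `remove_suffix` follow the naming suggested earlier in the thread. Accepting either a string or a tuple, and removing at most one match, are assumptions rather than anything specified in the discussion.

```python
def remove_prefix(text, prefixes):
    """Hypothetical helper: remove the first matching prefix, at most once."""
    if isinstance(prefixes, str):
        prefixes = (prefixes,)
    for prefix in prefixes:
        if prefix and text.startswith(prefix):
            return text[len(prefix):]
    return text


def remove_suffix(text, suffixes):
    """Hypothetical helper: remove the first matching suffix, at most once."""
    if isinstance(suffixes, str):
        suffixes = (suffixes,)
    for suffix in suffixes:
        # The emptiness check matters: text[:-0] would return ''.
        if suffix and text.endswith(suffix):
            return text[:-len(suffix)]
    return text


url = remove_prefix("https://example.com", ("http://", "https://"))
filename = remove_suffix("index.html", ".html")
```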
