strip() using strings instead of chars

**Bruno Desthuilliers** · Jul 11 '08, 12:35 PM

Re: strip() using strings instead of chars

Christoph Zwerschke wrote:

> In Python programs, you will quite frequently find code like the
> following for removing a certain prefix from a string:
>
>     if url.startswith('http://'):
>         url = url[7:]

DRY/SPOT violation. Should be written as:

    prefix = 'http://'
    if url.startswith(prefix):
        url = url[len(prefix):]

(snip)

> My problem with this is that it's cumbersome and error-prone to count
> the number of chars of the prefix or suffix.

Cf. above.

> If you want to change it
> from 'http://' to 'https://', you must not forget to change the 7 to 8.
> If you write len('http://') instead of the 7, you see this is actually
> a DRY problem.

Cf. above.

> Things get even worse if you have several prefixes to consider:
>
>     if url.startswith('http://'):
>         url = url[7:]
>     elif url.startswith('https://'):
>         url = url[8:]
>
> You can't make use of url.startswith(('http://', 'https://')) here.

    for prefix in ('http://', 'https://'):
        if url.startswith(prefix):
            url = url[len(prefix):]
            break

For more complex use cases, you may want to consider regexps,
specifically re.sub():

    >>> import re
    >>> pat = re.compile(r"(^https?://|\.txt$)")
    >>> urls = ['http://toto.com', 'https://titi.com', 'tutu.com',
    ...         'file://tata.txt']
    >>> [pat.sub('', u) for u in urls]
    ['toto.com', 'titi.com', 'tutu.com', 'file://tata']

Not to dismiss your suggestion, but I thought you might like to know how
to solve your problem with what's currently available !-)

**Christoph Zwerschke** · Jul 11 '08, 02:55 PM

Re: strip() using strings instead of chars

Bruno Desthuilliers wrote:

> DRY/SPOT violation. Should be written as:
>
>     prefix = 'http://'
>     if url.startswith(prefix):
>         url = url[len(prefix):]

That was exactly my point. This formulation is a bit better, but it
still violates DRY, because you need to type "prefix" two times. It is
exactly this idiom that I see so often and that I wanted to simplify.
Your suggestions work, but I somehow feel such a simple task should have
a simpler formulation in Python, e.g. something like

    url = url.lstripstr(('http://', 'https://'))

instead of

    for prefix in ('http://', 'https://'):
        if url.startswith(prefix):
            url = url[len(prefix):]
            break
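The proposed `lstripstr()` method does not exist on `str`. As a rough sketch only, assuming a plain function under that hypothetical name that accepts either a single prefix or a tuple of prefixes (the way `startswith()` does), the loop above could be wrapped once and reused:

```python
def lstripstr(s, prefixes):
    """Hypothetical helper: remove the first matching prefix from s.

    prefixes may be a single string or a tuple of strings, mirroring
    str.startswith(). If no prefix matches, s is returned unchanged.
    """
    if isinstance(prefixes, str):
        prefixes = (prefixes,)
    for prefix in prefixes:
        if s.startswith(prefix):
            # Slice by the length of the prefix that actually matched,
            # so no character counts are hard-coded.
            return s[len(prefix):]
    return s


url = lstripstr('https://example.com', ('http://', 'https://'))
print(url)  # example.com
```

Note that the first matching prefix wins, so if the tuple contains overlapping prefixes (say `'http'` and `'https://'`), their order matters.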

-- Christoph

**Marc 'BlackJack' Rintsch** · Jul 11 '08, 03:15 PM

Re: strip() using strings instead of chars

On Fri, 11 Jul 2008 16:45:20 +0200, Christoph Zwerschke wrote:

> Bruno Desthuilliers wrote:
>
>> DRY/SPOT violation. Should be written as:
>>
>>     prefix = 'http://'
>>     if url.startswith(prefix):
>>         url = url[len(prefix):]
>
> That was exactly my point. This formulation is a bit better, but it
> still violates DRY, because you need to type "prefix" two times. It is
> exactly this idiom that I see so often and that I wanted to simplify.
> Your suggestions work, but I somehow feel such a simple task should have
> a simpler formulation in Python, e.g. something like
>
>     url = url.lstripstr(('http://', 'https://'))

I would prefer a name like `remove_prefix()` instead of a variant with
`strip` and abbreviations in it.
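For readers arriving later: Python 3.9 added `str.removeprefix()` and `str.removesuffix()` (PEP 616), very close to the naming suggested here. They accept a single string, not a tuple, and return the original string unchanged when it does not match. A minimal illustration (the URL and file name are made up):

```python
# Requires Python 3.9 or later (PEP 616).
url = 'https://example.com'
host = url.removeprefix('https://')
print(host)                          # example.com

# No match: the string comes back unchanged.
print(url.removeprefix('ftp://'))    # https://example.com

filename = 'index.html'
stem = filename.removesuffix('.html')
print(stem)                          # index
```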

Ciao,
Marc 'BlackJack' Rintsch

**Duncan Booth** · Jul 11 '08, 05:25 PM

Re: strip() using strings instead of chars

Christoph Zwerschke <cito@online.de> wrote:

> In Python programs, you will quite frequently find code like the
> following for removing a certain prefix from a string:
>
>     if url.startswith('http://'):
>         url = url[7:]

If I came across this code I'd want to know why they weren't using
urlparse.urlsplit()...
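`urlparse.urlsplit()` is the Python 2 spelling; in Python 3 the same function lives in `urllib.parse`. A minimal sketch of splitting a (made-up) URL into its parts, so the scheme is parsed rather than counted off by hand:

```python
from urllib.parse import urlsplit

parts = urlsplit('https://example.com/path/page?q=1')
print(parts.scheme)  # https
print(parts.netloc)  # example.com
print(parts.path)    # /path/page
print(parts.query)   # q=1
```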

> Similarly for stripping suffixes:
>
>     if filename.endswith('.html'):
>         filename = filename[:-5]

... and I'd want to know why os.path.splitext() wasn't appropriate here.
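A short sketch of the `os.path.splitext()` route for the `.html` example (the file names are made up). The function splits off only the last extension, so the check becomes a plain comparison instead of a length-based slice:

```python
import os.path

filename = 'report.html'
root, ext = os.path.splitext(filename)
if ext == '.html':
    filename = root
print(filename)  # report

# Only the final extension is split off:
print(os.path.splitext('archive.tar.gz'))  # ('archive.tar', '.gz')
```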

> My problem with this is that it's cumbersome and error-prone to count
> the number of chars of the prefix or suffix. If you want to change it
> from 'http://' to 'https://', you must not forget to change the 7 to 8.
> If you write len('http://') instead of the 7, you see this is actually
> a DRY problem.
>
> Things get even worse if you have several prefixes to consider:
>
>     if url.startswith('http://'):
>         url = url[7:]
>     elif url.startswith('https://'):
>         url = url[8:]
>
> You can't make use of url.startswith(('http://', 'https://')) here.
No, you can't, so you definitely want to be parsing the URL properly. I
can't actually think of a use for stripping off the scheme without either
saving it somewhere or doing further parsing of the URL.

**Christoph Zwerschke** · Jul 12 '08, 03:15 PM

Re: strip() using strings instead of chars

Duncan Booth wrote:

>> if url.startswith('http://'):
>>     url = url[7:]
>
> If I came across this code I'd want to know why they weren't using
> urlparse.urlsplit()...

Right, such code can have a smell, since in the case of URLs, file names,
config options etc. there are specialized functions available. But I'm
not sure whether the need for removing string prefixes/suffixes in general
is really so rare that we shouldn't bother to offer a simpler solution.

-- Christoph

**Duncan Booth** · Jul 12 '08, 05:45 PM

Re: strip() using strings instead of chars

Christoph Zwerschke <cito@online.de> wrote:

> Duncan Booth wrote:
>
>>> if url.startswith('http://'):
>>>     url = url[7:]
>>
>> If I came across this code I'd want to know why they weren't using
>> urlparse.urlsplit()...
>
> Right, such code can have a smell, since in the case of URLs, file names,
> config options etc. there are specialized functions available. But I'm
> not sure whether the need for removing string prefixes/suffixes in general
> is really so rare that we shouldn't bother to offer a simpler solution.

One of the great things about Python is that it resists bloating the
builtin classes with lots of methods that just seem like a good idea at the
time. If a lot of people make a case for this function then it might get
added, but I think it is unlikely given how simple it is to write a
function to do this for yourself.
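As a sketch of the "write it yourself" option, here are two small helpers; the names `remove_prefix` and `remove_suffix` are hypothetical (borrowed from the naming suggested earlier in the thread), not built-ins. One detail worth handling is the empty suffix: `'x'.endswith('')` is true, and `s[:-0]` is `s[:0]`, i.e. the empty string, so the naive slice would wipe the whole string:

```python
def remove_prefix(s, prefix):
    """Return s without prefix if s starts with it, else s unchanged."""
    if s.startswith(prefix):
        return s[len(prefix):]
    return s


def remove_suffix(s, suffix):
    """Return s without suffix if s ends with it, else s unchanged."""
    # Guard against an empty suffix: s[:-0] would be s[:0], i.e. ''.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


print(remove_prefix('http://example.com', 'http://'))  # example.com
print(remove_suffix('index.html', '.html'))            # index
print(remove_suffix('index.html', ''))                 # index.html
```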
