Trouble sorting lists (unicode/locale related?)

**Shu-Hsien Sheu** · Jul 18 '05, 02:50 AM

Slicing vs .startswith

Hi,

I have a question about the comparison of efficiency of string slicing
and using string.startswi th.
For example, which one of the following would be more efficient, or ,
moreover, more pythonic?

if aa[:3] == 'abc':

vs

if aa.startswith(' abc'):

Thanks!

-shuhsien

**Bob Gailer** · Jul 18 '05, 02:50 AM

Re: Slicing vs .startswith

At 09:17 AM 9/22/2003, Shu-Hsien Sheu wrote:
[color=blue]
>Hi,
>
>I have a question about the comparison of efficiency of string slicing and
>using string.startswi th.
>For example, which one of the following would be more efficient, or ,
>moreover, more pythonic?
>
>if aa[:3] == 'abc':
>
>vs
>
>if aa.startswith(' abc'):[/color]

Here's one way to address this kind of question:
[color=blue][color=green][color=darkred]
>>> import time
>>> def f():[/color][/color][/color]
.... st = time.time()
.... for i in range(500000):
.... if aa.startswith(' abc'):pass
.... print time.time() - st
....[color=blue][color=green][color=darkred]
>>> f()[/color][/color][/color]
1.01200008392[color=blue][color=green][color=darkred]
>>> def f():[/color][/color][/color]
.... st = time.time()
.... for i in range(500000):
.... if aa[:3] == 'abc':pass
.... print time.time() - st
....[color=blue][color=green][color=darkred]
>>> f()[/color][/color][/color]
1.01100003719

Bob Gailer
bgailer@alum.rp i.edu
303 442 2625

---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.506 / Virus Database: 303 - Release Date: 8/1/2003

**Duncan Booth** · Jul 18 '05, 02:50 AM

Re: DSU pattern (was Re: Trouble sorting lists (unicode/locale related?))

Alex Martelli <aleax@aleax.it > wrote in
news:VOEbb.1016 41$hE5.3573494@ news1.tin.it:
[color=blue]
> Duncan Booth wrote:
> ...[color=green]
>> FWIW, I think something like this belongs in the standard library rather
>> than as a method on lists or a new builtin.[/color]
>
> Definitely not a method on lists, we agree on that. Problem is, there
> is no module in the standard library to collect "functions that are often
> useful but not QUITE often enough to warrant being built-ins" (and if
> there was, there are quite a few current built-ins I'd like to relegate
> into such an auxiliary/utilities module), so that finding a good place,
> other than built-ins, for such utility functions, is often a bother.
>[/color]
So someone needs to write a PEP proposing such a module.

--
Duncan Booth duncan@rcp.co.u k
int month(char *p){return(1248 64/((p[0]+p[1]-p[2]&0x1f)+1)%12 )["\5\x8\3"
"\6\7\xb\1\x9\x a\2\0\4"];} // Who said my code was obscure?

**Peter Hansen** · Jul 18 '05, 02:50 AM

Re: Slicing vs .startswith

Shu-Hsien Sheu wrote:[color=blue]
>
> I have a question about the comparison of efficiency of string slicing
> and using string.startswi th.
> For example, which one of the following would be more efficient, or ,
> moreover, more pythonic?
>
> if aa[:3] == 'abc':
>
> vs
>
> if aa.startswith(' abc'):[/color]

The latter is clearly more readable.

**David Eppstein** · Jul 18 '05, 02:50 AM

Re: Slicing vs .startswith

In article <3F6F3E38.5221D 7E7@engcorp.com >,
Peter Hansen <peter@engcorp. com> wrote:
[color=blue][color=green]
> > For example, which one of the following would be more efficient, or ,
> > moreover, more pythonic?
> >
> > if aa[:3] == 'abc':
> >
> > vs
> >
> > if aa.startswith(' abc'):[/color]
>
> The latter is clearly more readable.[/color]

More Pythonic, too, I think. "Readabilit y counts," and "There should be
one-- and preferably only one --obvious way to do it." In this case,
startswith must be the one obvious way, or else why would it exist in
the standard library at all?

--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science

**xtian** · Jul 18 '05, 02:50 AM

Re: Slicing vs .startswith

David Eppstein <eppstein@ics.u ci.edu> wrote in message news:<eppstein-89A196.11515122 092003@news.ser vice.uci.edu>.. .[color=blue]
> In article <3F6F3E38.5221D 7E7@engcorp.com >,
> Peter Hansen <peter@engcorp. com> wrote:
>[color=green][color=darkred]
> > > For example, which one of the following would be more efficient, or ,
> > > moreover, more pythonic?
> > >
> > > if aa[:3] == 'abc':
> > >
> > > vs
> > >
> > > if aa.startswith(' abc'):[/color]
> >
> > The latter is clearly more readable.[/color]
>
> More Pythonic, too, I think. "Readabilit y counts," and "There should be
> one-- and preferably only one --obvious way to do it." In this case,
> startswith must be the one obvious way, or else why would it exist in
> the standard library at all?[/color]

It's also much more maintainable - if in the future that 'abc' needs
to change to 'abcdef', the slicing version requires changes in two
places (just asking for bugs), while startswith requires only one. Of
course, all these things (readability, maintainability and
pythonicism) are fairly closely interrelated.

xtian

**Jeremy Fincher** · Jul 18 '05, 02:50 AM

Re: Slicing vs .startswith

Shu-Hsien Sheu <sheu@bu.edu> wrote in message news:<mailman.1 064245938.12315 .python-list@python.org >...[color=blue]
> For example, which one of the following would be more efficient, or ,
> moreover, more pythonic?
>
> if aa[:3] == 'abc':
>
> vs
>
> if aa.startswith(' abc'):[/color]

Python is about maintainability , and the latter is significantly more
maintainable than the former; if the string you're checking against
changes in size, using .startswith doesn't require you to change your
sourcecode anywhere else.

Jeremy

**Paul** · Jul 18 '05, 02:50 AM

Re: Slicing vs .startswith

However, what if you don't want case sensitivity? For example, to
check if a file is a jpg, I do name[-3:].lower() == 'jpg'. This will
work with both foo.jpg and foo.JPG.

Is this slower than name.lower().en dswith('jpg')? Is there a better
solution altogether?

Paul

Bob Gailer <bgailer@alum.r pi.edu> wrote in message news:<mailman.1 064246454.19668 .python-list@python.org >...[color=blue]
> At 09:17 AM 9/22/2003, Shu-Hsien Sheu wrote:
>[color=green]
> >Hi,
> >
> >I have a question about the comparison of efficiency of string slicing and
> >using string.startswi th.
> >For example, which one of the following would be more efficient, or ,
> >moreover, more pythonic?
> >
> >if aa[:3] == 'abc':
> >
> >vs
> >
> >if aa.startswith(' abc'):[/color]
>
> Here's one way to address this kind of question:
>[color=green][color=darkred]
> >>> import time
> >>> def f():[/color][/color]
> ... st = time.time()
> ... for i in range(500000):
> ... if aa.startswith(' abc'):pass
> ... print time.time() - st
> ...[color=green][color=darkred]
> >>> f()[/color][/color]
> 1.01200008392[color=green][color=darkred]
> >>> def f():[/color][/color]
> ... st = time.time()
> ... for i in range(500000):
> ... if aa[:3] == 'abc':pass
> ... print time.time() - st
> ...[color=green][color=darkred]
> >>> f()[/color][/color]
> 1.01100003719
>
> Bob Gailer
> bgailer@alum.rp i.edu
> 303 442 2625
>
> --[/color]

**Alex Martelli** · Jul 18 '05, 02:50 AM

Re: Slicing vs .startswith

Paul wrote:
[color=blue]
> However, what if you don't want case sensitivity? For example, to
> check if a file is a jpg, I do name[-3:].lower() == 'jpg'. This will
> work with both foo.jpg and foo.JPG.
>
> Is this slower than name.lower().en dswith('jpg')? Is there a better
> solution altogether?[/color]

timeit.py gives me 0.9 microseconds for the less maintainable
name[:-3].lower()=='jpg' vs 1.7 for name.lower().en dswith('jpg')
[and over 3 for re's]. Point is, _do I *care*_? How many millions
of filenames will I check, when the extra overhead of checking
a million filenames in the more maintainable way is less than a
second? How long will it have taken me to get those millions
of filenames into memory in the first place?

If this operation IS on the bottleneck, and a 0.8 microseconds
difference matters, I'll do the slicing -- but 99 times out of 100
I'll do the .endswith instead (I'll _start_ that way 100 times out
of 100, and optimize it iff profiling tells me that matters...).

Alex

**Peter Hansen** · Jul 18 '05, 02:50 AM

Re: Slicing vs .startswith

Paul wrote:[color=blue]
>
> However, what if you don't want case sensitivity? For example, to
> check if a file is a jpg, I do name[-3:].lower() == 'jpg'. This will
> work with both foo.jpg and foo.JPG.
>
> Is this slower than name.lower().en dswith('jpg')? Is there a better
> solution altogether?[/color]

Yes, of course. :-)

import os
if os.path.splitex t(name)[1].lower() == 'jpg':
pass

That also handles the problem with files named "ThisFileIs.Not Ajpg"
being mistreated, as the other solutions do. ;-)

-Peter

**David Eppstein** · Jul 18 '05, 02:50 AM

Re: Slicing vs .startswith

In article <3F6F8306.1ABCC 6D@engcorp.com> ,
Peter Hansen <peter@engcorp. com> wrote:
[color=blue]
> Paul wrote:[color=green]
> >
> > However, what if you don't want case sensitivity? For example, to
> > check if a file is a jpg, I do name[-3:].lower() == 'jpg'. This will
> > work with both foo.jpg and foo.JPG.
> >
> > Is this slower than name.lower().en dswith('jpg')? Is there a better
> > solution altogether?[/color]
>
> Yes, of course. :-)
>
> import os
> if os.path.splitex t(name)[1].lower() == 'jpg':
> pass
>
> That also handles the problem with files named "ThisFileIs.Not Ajpg"
> being mistreated, as the other solutions do. ;-)[/color]

I was about to post the same answer. One minor nit, though: it should be

os.path.splitex t(name)[1].lower() == '.jpg'

It may be a little longer than the other solutions, but it expresses the
meaning more clearly.

--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science

**Terry Reedy** · Jul 18 '05, 02:50 AM

Re: Slicing vs .startswith

"Shu-Hsien Sheu" <sheu@bu.edu> wrote in message
news:mailman.10 64245938.12315. python-list@python.org ...[color=blue]
> Hi,
>
> For example, which one of the following would be more efficient, or[/color]
,[color=blue]
> moreover, more pythonic?
>
> if aa[:3] == 'abc':[/color]

This creates and deletes a temporary object.
[color=blue]
> if aa.startswith(' abc'):[/color]

This makes a function call. As Bob showed for his system, the two
overheads are about the same for a three char prefix. If one were
checking a 30000 byte prefix, the call might win.

Terry J. Reedy

**Peter Hansen** · Jul 18 '05, 02:51 AM

Re: Slicing vs .startswith

David Eppstein wrote:[color=blue]
>
> In article <3F6F8306.1ABCC 6D@engcorp.com> ,
> Peter Hansen <peter@engcorp. com> wrote:
>[color=green]
> > Paul wrote:[color=darkred]
> > >
> > > However, what if you don't want case sensitivity? For example, to
> > > check if a file is a jpg, I do name[-3:].lower() == 'jpg'. This will
> > > work with both foo.jpg and foo.JPG.
> > >
> > > Is this slower than name.lower().en dswith('jpg')? Is there a better
> > > solution altogether?[/color]
> >
> > Yes, of course. :-)
> >
> > import os
> > if os.path.splitex t(name)[1].lower() == 'jpg':
> > pass
> >
> > That also handles the problem with files named "ThisFileIs.Not Ajpg"
> > being mistreated, as the other solutions do. ;-)[/color]
>
> I was about to post the same answer. One minor nit, though: it should be
>
> os.path.splitex t(name)[1].lower() == '.jpg'[/color]

Oops, thanks! I _always_ make that mistake. It just seems that if
os.path.split() does not return any path separators in the components,
then os.path.splitex t() shouldn't return the extension separator...

Luckily we have tests around here to catch that kind of thing. :-)

-Peter

**Peter Otten** · Jul 18 '05, 02:51 AM

Re: DSU pattern (was Re: Trouble sorting lists (unicode/locale related?))

Duncan Booth wrote:
[color=blue]
> Note that if anyone proposes this seriously, it should generate a 3-tuple
> (mapping(item), index, item) rather than the 2-tuple you suggest.
>
> This is because the mapping function could reasonably be used to impose an
> ordering on objects that have no natural order, so you need to be sure
> that the comparison never falls through the the original object even where
> the mapping compares equal. It also has the side effect of making the sort
> stable (although if stability is a goal you might want another option to
> reverse the sort which would use '-index' as the second element and call
> .reverse() on the result).[/color]

So my demo implementation was faulty :-(
Let me try again:

def sort(self, cmpfunc=None, mapfunc=None):
if mapfunc:
list.sort(self, lambda x, y: cmp(mapfunc(x), mapfunc(y)))
else:
list.sort(self, cmpfunc)

Seriously, if sort() were to be thus extended, the dsu pattern would rather
act as a performance enhancement under the hood. I prefer plain C

struct {
PyObject *decorator;
int index; /* iff stability is not guaranteed by the sort algorithm */
PyObject *data;
}

over n-tuples and would hope to reuse the sorting infrastructure already
there in listobject.c to compare only the decorators. I am not familiar
with the C source, so be gentle if that is not feasible.
[color=blue]
> FWIW, I think something like this belongs in the standard library rather
> than as a method on lists or a new builtin.[/color]

If you think of the comparison as a two-step process,

(1) extract or calculate an attribute
(2) wrap it into a comparison function

consider the following use cases:

(a) lambda x, y: cmp(y, x)
(b) lambda x, y: cmp(x.attr.lowe r(), y.attr.lower())

All code following the latter pattern could be rewritten (more clearly, I
think) as

alist.sort(mapf unc=lambda x: attr.lower())

and would automatically benefit from dsu (which could even be turned off
transparently for the client code for very large lists, where the memory
footprint becomes a problem).

The litmus test whether to put it into a utility function or into an already
existing method that needs only a minor and completely backwards-compatible
interface change would then be:

Are (b)-style comparison functions nearly as or more common than (a)-style
functions.

Peter

PS: You and Alex Martelli dismissed my somewhat unfocused idea faster than I
could sort it out. I hope you will find the above useful, though.

**Shu-Hsien Sheu** · Jul 18 '05, 02:52 AM

Re: Slicing vs .startswith

Great thanks for all the answers!
I really enjoy Python and learned a lot here.

-shuhsien

Trouble sorting lists (unicode/locale related?)

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment