delete duplicates in list

  • christof hoeke

    delete duplicates in list

    hello,
    this must have come up before, so i am already sorry for asking but a
    quick googling did not give me any answer.

    i have a list from which i want a simpler list without the duplicates
    an easy but somehow contrived solution would be
    >>> a = [1, 2, 2, 3]
    >>> d = {}.fromkeys(a)
    >>> b = d.keys()
    >>> print b
    [1, 2, 3]

    there should be an easier or more intuitive solution, maybe with a list
    comprehension?

    something like
    >>> b = [x for x in a if x not in b]
    >>> print b
    []

    does not work though.

    thanks for any help
    chris


  • Diez B. Roggisch

    #2
    Re: delete duplicates in list

    Hi,
    > this must have come up before, so i am already sorry for asking but a
    > quick googling did not give me any answer.
    >
    > i have a list from which i want a simpler list without the duplicates
    > an easy but somehow contrived solution would be

    In python 2.3, this should work:

    import sets
    l = [1,2,2,3]
    s = sets.Set(l)

    A set is a container for which the invariant that no object is in it twice
    holds. So it suits your needs.
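
    If a plain list is wanted back (and the original order does not matter), a
    minimal sketch along those lines would be:

    import sets
    l = [1, 2, 2, 3]
    uniq = list(sets.Set(l))    # convert the Set back into a plain list
    print uniq                  # the unique elements, order not guaranteed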

    Regards,

    Diez


    • Bernard Delmée

      #3
      Re: delete duplicates in list

      > >>> a = [1, 2, 2, 3]
      > >>> d = {}.fromkeys(a)
      > >>> b = d.keys()
      > >>> print b
      > [1, 2, 3]

      That, or using a Set (python 2.3+). Actually I seem to recall that
      "The Python CookBook" still advises building a dict as the fastest
      solution - if your elements can be hashed. See the details at



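      For reference, a rough sketch of that dict-based idea (the helper name
      "unique" is ours, not taken from the recipe):

      def unique(seq):
          d = {}
          for x in seq:
              d[x] = 1        # requires hashable elements
          return d.keys()     # duplicates gone, original order not kept
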
      Cheers,

      Bernard.


      • Alex Martelli

        #4
        Re: delete duplicates in list

        christof hoeke wrote:
        ...
        > i have a list from which i want a simpler list without the duplicates

        Canonical is:

        import sets
        simplerlist = list(sets.Set(thelist))

        if you're all right with destroying order, as your example solution suggests.
        But dict.fromkeys(a).keys() is probably faster. Your assertion:
        > there should be an easier or more intuitive solution, maybe with a list
        > comprehension?

        doesn't seem self-evident to me. A list-comprehension might be, e.g:

        [ x for i, x in enumerate(a) if i==a.index(x) ]

        and it does have the advantages of (a) keeping order AND (b) not
        requiring hashable (nor even inequality-comparable!) elements -- BUT
        it has the non-indifferent cost of being O(N*N) while the others
        are about O(N). If you really want something similar to your approach:
        > >>> b = [x for x in a if x not in b]

        you'll have, o horrors!-), to do a loop, so name b is always bound to
        "the result list so far" (in the LC, name b is only bound at the end):

        b = []
        for x in a:
            if x not in b:
                b.append(x)

        However, this is O(N*N) too. In terms of "easier or more intuitive",
        I suspect only this latter solution might qualify.
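
        A small usage sketch of the order trade-off (the example list here is ours):

        a = [3, 1, 2, 1, 3]
        print [ x for i, x in enumerate(a) if i==a.index(x) ]   # [3, 1, 2] -- order kept
        print dict.fromkeys(a).keys()   # same unique elements, in whatever order the dict yields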


        Alex


        • Alex Martelli

          #5
          Re: delete duplicates in list

          Bernard Delmée wrote:
          >> >>> a = [1, 2, 2, 3]
          >> >>> d = {}.fromkeys(a)
          >> >>> b = d.keys()
          >> >>> print b
          >> [1, 2, 3]
          >
          > That, or using a Set (python 2.3+). Actually I seem to recall that
          > "The Python CookBook" still advises building a dict as the fastest
          > solution - if your elements can be hashed. See the details at
          >
          > http://aspn.activestate.com/ASPN/Coo...n/Recipe/52560

          Yep, but a Set is roughly as fast as a dict -- it has a small
          penalty wrt a dict, but, emphasis on small. Still, if you're
          trying to squeeze every last cycle out of a bottleneck, it's
          worth measuring, and perhaps moving from Set to dict.
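
          One hedged way to measure, using the timeit module (also new in 2.3; the
          numbers will of course vary by machine):

          import timeit
          print timeit.Timer('500 in s', 'import sets; s = sets.Set(range(1000))').timeit()
          print timeit.Timer('500 in d', 'd = dict.fromkeys(range(1000))').timeit()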


          Alex


          • Alex Martelli

            #6
            Re: delete duplicates in list

            Diez B. Roggisch wrote:

            >> this must have come up before, so i am already sorry for asking but a
            >> quick googling did not give me any answer.
            >>
            >> i have a list from which i want a simpler list without the duplicates
            >> an easy but somehow contrived solution would be
            >
            > In python 2.3, this should work:
            >
            > import sets
            > l = [1,2,2,3]
            > s = sets.Set(l)
            >
            > A set is a container for which the invariant that no object is in it twice
            > holds. So it suits your needs.

            ...except it's not a LIST, which was part of the specifications given
            by the original poster. It may, of course, be that you've read his
            mind correctly and that, despite his words, he doesn't really care
            whether he gets a list or a very different container:-).


            Alex


            • Diez B. Roggisch

              #7
              Re: delete duplicates in list

              Hi,
              > ...except it's not a LIST, which was part of the specifications given
              > by the original poster. It may, of course, be that you've read his
              > mind correctly and that, despite his words, he doesn't really care
              > whether he gets a list or a very different container:-).

              You're right - mathematically. However, there is no such thing as a set in
              computers - you always end up with some sort of list :)

              So - he'll have a list anyway. But if it respects the order of the list
              parameter... <just checking, standby>

              ... nope:

              >>> l = [1,2,3,2]
              >>> import sets
              >>> sets.Set(l)
              Set([1, 2, 3])
              >>> l = [2,1,2,3,2]
              >>> sets.Set(l)
              Set([1, 2, 3])

              Which is of course reasonable, as the check for existence in the set might
              be performed in O(ln n) instead of O(n).
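
              If the original order matters, one sketch that keeps it while still using a
              Set for the fast membership test (elements must be hashable) would be:

              import sets
              seen = sets.Set()
              result = []
              for x in l:
                  if x not in seen:
                      seen.add(x)          # remember we have seen this element
                      result.append(x)     # keep first occurrences, in order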

              Regards,

              Diez


              • Hannu Kankaanpää

                #8
                Re: delete duplicates in list

                "Diez B. Roggisch" <deets_noospaam@web.de> wrote in message news:<bnplva$g1s$07$1@news.t-online.com>...
                > Hi,
                >
                > > ...except it's not a LIST, which was part of the specifications given
                > > by the original poster. It may, of course, be that you've read his
                > > mind correctly and that, despite his words, he doesn't really care
                > > whether he gets a list or a very different container:-).
                >
                > You're right - mathematically. However, there is no such thing as a set in
                > computers - you always end up with some sort of list :)

                Those are just implementation details. There could be a group of monkeys
                emulating Python under the hood and their implementation of a set
                would be a neural network instead of any kind of sequence, but you
                still wouldn't care as a programmer. The only thing that matters is
                whether the interface stays the same. And it doesn't: the items in a set
                are no longer accessible by their index (among other differences).
                > So - he'll have a list anyway.

                So I can't agree with this. You don't know if his Python virtual machine
                is a group of monkeys. Python is supposed to be a high level language.
                > But if it respects the order of the list
                > parameter... <just checking, standby>
                >
                > ... nope:
                >
                > >>> l = [1,2,3,2]
                > >>> import sets
                > >>> sets.Set(l)
                > Set([1, 2, 3])
                > >>> l = [2,1,2,3,2]
                > >>> sets.Set(l)
                > Set([1, 2, 3])
                >
                > Which is of course reasonable, as the check for existence in the set might
                > be performed in O(ln n) instead of O(n).

                Actually the Set in sets module has an average lookup of O(1), worst
                case O(n) (not 100% sure of worst case, but 99% sure). It's been
                implemented with dictionaries, which in turn are hash tables.


                • Duncan Booth

                  #9
                  Re: delete duplicates in list

                  Alex Martelli <aleax@aleax.it> wrote in
                  news:L7Wnb.65547$e5.2408695@news1.tin.it:
                  > Your assertion:
                  >
                  >> there should be an easier or more intuitive solution, maybe with a list
                  >> comprehension?
                  >
                  > doesn't seem self-evident to me. A list-comprehension might be, e.g:
                  >
                  > [ x for i, x in enumerate(a) if i==a.index(x) ]
                  >
                  > and it does have the advantages of (a) keeping order AND (b) not
                  > requiring hashable (nor even inequality-comparable!) elements -- BUT
                  > it has the non-indifferent cost of being O(N*N) while the others
                  > are about O(N). If you really want something similar to your approach:
                  >
                  >> >>> b = [x for x in a if x not in b]
                  >
                  > you'll have, o horrors!-), to do a loop, so name b is always bound to
                  > "the result list so far" (in the LC, name b is only bound at the end):
                  >
                  > b = []
                  > for x in a:
                  >     if x not in b:
                  >         b.append(x)
                  >
                  > However, this is O(N*N) too. In terms of "easier or more intuitive",
                  > I suspect only this latter solution might qualify.

                  I dunno about more intuitive, but here's a fairly simple list comprehension
                  solution which is O(N) and preserves the order. Of course it's back to
                  requiring hashable elements again:
                  >>> startList = [5,1,2,1,3,4,2,5,3,4]
                  >>> d = {}
                  >>> [ d.setdefault(x, x) for x in startList if x not in d ]
                  [5, 1, 2, 3, 4]

                  And for the 'I must do it on one line' freaks, here's the single expression
                  variant of the above: :^)
                  >>> [ d.setdefault(x, x) for d in [{}] for x in startList if x not in d ]
                  [5, 1, 2, 3, 4]
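
                  Wrapped up as a function for reuse (the name unique_in_order is ours, not
                  from the thread):

                  def unique_in_order(seq):
                      d = {}
                      return [ d.setdefault(x, x) for x in seq if x not in d ]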

                  --
                  Duncan Booth                                       duncan@rcp.co.uk
                  int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
                  "\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?


                  • Alex Martelli

                    #10
                    Re: delete duplicates in list

                    Hannu Kankaanpää wrote:
                    ...
                    >> So - he'll have a list anyway.
                    >
                    > So I can't agree with this. You don't know if his Python virtual machine
                    > is a group of monkeys. Python is supposed to be a high level language.

                    So, in that case those monkeys would surely be perched up on tall trees.

                    >> Which is of course reasonable, as the check for existence in the set
                    >> might be performed in O(ln n) instead of O(n).
                    >
                    > Actually the Set in sets module has an average lookup of O(1), worst
                    > case O(n) (not 100% sure of worst case, but 99% sure). It's been

                    Hmmm -- could you give an example of that worstcase...? a _full_
                    hashtable would give such behavior, but Python's dicts always ensure
                    the underlying hashtables aren't too full...
                    > implemented with dictionaries, which in turn are hash tables.

                    Yep - check it out...:

                    [alex@lancelot bo]$ timeit.py -c -s'import sets' -s'x=sets.Set(xrange(100))' '7 in x'
                    100000 loops, best of 3: 2.3 usec per loop
                    [alex@lancelot bo]$ timeit.py -c -s'import sets' -s'x=sets.Set(xrange(1000))' '7 in x'
                    100000 loops, best of 3: 2.3 usec per loop
                    [alex@lancelot bo]$ timeit.py -c -s'import sets' -s'x=sets.Set(xrange(10000))' '7 in x'
                    100000 loops, best of 3: 2.2 usec per loop
                    [alex@lancelot bo]$ timeit.py -c -s'import sets' -s'x=sets.Set(xrange(100000))' '7 in x'
                    100000 loops, best of 3: 2.2 usec per loop
                    [alex@lancelot bo]$ timeit.py -c -s'import sets' -s'x=sets.Set(xrange(1000000))' '7 in x'
                    100000 loops, best of 3: 2.3 usec per loop

                    see the pattern...?-)


                    Same when something is NOT there, BTW:

                    [alex@lancelot bo]$ timeit.py -c -s'import sets' -s'x=sets.Set(xrange(1000000))' 'None in x'
                    100000 loops, best of 3: 2.3 usec per loop

                    etc, etc.



                    • Michael Hudson

                      #11
                      Re: delete duplicates in list

                      Alex Martelli <aleax@aleax.it> writes:

                      > > Actually the Set in sets module has an average lookup of O(1), worst
                      > > case O(n) (not 100% sure of worst case, but 99% sure). It's been
                      >
                      > Hmmm -- could you give an example of that worstcase...? a _full_
                      > hashtable would give such behavior, but Python's dicts always ensure
                      > the underlying hashtables aren't too full...

                      Try inserting a bunch of instances of

                      class C:
                          def __hash__(self): return 0

                      into a dictionary.
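
                      A rough sketch of what that looks like in practice (the timing harness is
                      ours, purely illustrative):

                      import time

                      class C:
                          def __hash__(self): return 0    # every instance hashes to the same bucket

                      d = {}
                      start = time.time()
                      for i in xrange(2000):
                          d[C()] = None     # each insert walks the whole collision chain
                      print time.time() - start           # total time grows roughly quadratically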

                      Cheers,
                      mwh

                      --
                      Whaaat? That is the most retarded thing I have seen since, oh,
                      yesterday -- Kaz Kylheku, comp.lang.lisp


                      • Thomas Heller

                        #12
                        hashing mutable instances (was Re: delete duplicates in list)

                        Michael Hudson <mwh@python.net> writes:

                        > Alex Martelli <aleax@aleax.it> writes:
                        >
                        >> > Actually the Set in sets module has an average lookup of O(1), worst
                        >> > case O(n) (not 100% sure of worst case, but 99% sure). It's been
                        >>
                        >> Hmmm -- could you give an example of that worstcase...? a _full_
                        >> hashtable would give such behavior, but Python's dicts always ensure
                        >> the underlying hashtables aren't too full...
                        >
                        > Try inserting a bunch of instances of
                        >
                        > class C:
                        >     def __hash__(self): return 0
                        >
                        > into a dictionary.

                        I've thought about using something like this in production code
                        to be able to store mutable instances in a dict.
                        Performance problems aside (since there are only a couple of key/value
                        pairs in the dict), is it such a bad idea?

                        Thomas


                        • Michael Hudson

                          #13
                          Re: hashing mutable instances (was Re: delete duplicates in list)

                          Thomas Heller <theller@python.net> writes:

                          > Michael Hudson <mwh@python.net> writes:
                          >
                          > > Alex Martelli <aleax@aleax.it> writes:
                          > >
                          > >> > Actually the Set in sets module has an average lookup of O(1), worst
                          > >> > case O(n) (not 100% sure of worst case, but 99% sure). It's been
                          > >>
                          > >> Hmmm -- could you give an example of that worstcase...? a _full_
                          > >> hashtable would give such behavior, but Python's dicts always ensure
                          > >> the underlying hashtables aren't too full...
                          > >
                          > > Try inserting a bunch of instances of
                          > >
                          > > class C:
                          > >     def __hash__(self): return 0
                          > >
                          > > into a dictionary.
                          >
                          > I've thought about using something like this in production code
                          > to be able to store mutable instances in a dict.

                          Eek!
                          > Performance problems aside (since there are only a couple of key/value
                          > pairs in the dict), is it such a bad idea?

                          Um, I don't know actually. It pretty much defeats the point of using
                          a hashtable. If your class has some non-mutable parts it'd be an
                          idea to hash them, of course.
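
                          A rough sketch of that idea (class and attribute names are ours, purely
                          illustrative):

                          class Record:
                              def __init__(self, key, data):
                                  self.key = key      # treated as immutable after construction
                                  self.data = data    # freely mutable, never hashed
                              def __hash__(self):
                                  return hash(self.key)
                              def __eq__(self, other):
                                  return isinstance(other, Record) and self.key == other.key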

                          And the performance problems are severe -- inserting just a thousand
                          instances of the class C() into a dictionary takes noticeable time.

                          (What *is* a bad idea is giving such classes an __eq__ method that
                          mutates the dict being inserted into -- I found a bunch of such bugs a
                          couple years back and I think it's still possible to core Python (via
                          stack overflow) by being even more devious).

                          Cheers,
                          mwh

                          --
                          Our Constitution never promised us a good or efficient government,
                          just a representative one. And that's what we got.
                          -- http://www.advogato.org/person/mrorg...html?start=109


                          • Corrado Gioannini

                            #14
                            Re: delete duplicates in list

                            On Wed, Oct 29, 2003 at 10:02:08PM +0100, christof hoeke wrote:
                            > there should be an easier or more intuitive solution, maybe with a list
                            > comprehension?
                            >
                            > something like
                            >
                            > >>> b = [x for x in a if x not in b]
                            > >>> print b
                            > []
                            >
                            > does not work though.
                            if you want a list comp solution, similar to the one you proposed and valid
                            also with python < 2.3 you could do:
                            >>> a = [1, 2, 2, 3]
                            >>> b = []
                            >>> [b.append(x) for x in a if x not in b]
                            [None, None, None]
                            >>> b
                            [1, 2, 3]

                            or even:
                            >>> a = [1, 2, 2, 3]
                            >>> [b.append(x) for b in [[]] for x in a if x not in b]
                            [None, None, None]
                            >>> b
                            [1, 2, 3]

                            Corrado.
                            --
                            Thought is only a flash between two long nights,
                            but this flash is everything.
                            (H. Poincaré)



                            • Bernhard Herzog

                              #15
                              Re: hashing mutable instances

                              Thomas Heller <theller@python.net> writes:

                              > Michael Hudson <mwh@python.net> writes:
                              >
                              >> Try inserting a bunch of instances of
                              >>
                              >> class C:
                              >>     def __hash__(self): return 0
                              >>
                              >> into a dictionary.
                              >
                              > I've thought about using something like this in production code
                              > to be able to store mutable instances in a dict.
                              > Performance problems aside (since there are only a couple of key/value
                              > pairs in the dict), is it such a bad idea?

                              IMO it depends on what equality means for instances. E.g. if two
                              instances are only equal if they're identical, i.e. a == b is equivalent
                              to a is b, then defining __hash__ can be very useful, because then you
                              can use them in dictionaries and mutability doesn't really matter
                              because no change to one instance can make it equal to another
                              instance.

                              I'd define __hash__ to return id(self), though, so that the hash values
                              are different for different instances to reduce collisions.
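
                              A minimal sketch of that variant (the class name is ours):

                              class Mutable:
                                  # default __eq__ is identity comparison, so mutating an
                                  # instance never changes which dict key it matches
                                  def __hash__(self):
                                      return id(self)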

                              This also seems to be what class objects in Python do:
                              >>> class C(object):
                              ...     pass
                              ...
                              >>> hash(C)
                              135622420
                              >>> id(C)
                              135622420
                              >>>

                              Bernhard

                              --
                              Intevation GmbH http://intevation.de/
                              Sketch http://sketch.sourceforge.net/
                              Thuban http://thuban.intevation.org/
