Py2.3: Feedback on Sets

**Terry Reedy** · Jul 18 '05, 01:36 AM

Re: Py2.3: Feedback on Sets

"Raymond Hettinger" <vze4rx4y@verizon.net> wrote in message
news:3b__a.9694$u%2.7778@nwrdny02.gnilink.net...
> "Istvan Albert"
> > > Then just by looking at the docs, it feels a little bit confusing to
> > have discard() and remove() do essentially the same thing but only one
> > of them raising an exception. Which one? I already forgot. I don't know
> > which one I would prefer though.

I agree that this is confusing -- like having both str.find and
str.index. I would prefer one delete function with an optional param
'silent' to switch its 'not there' response from the default (either
True or False, according to what seems to be the more common usage) to
the other choice. (I know, I should have read the draft more carefully
and commented last fall -- but this seems like the sort of redundancy
that Guido wants to remove in 3.0.)
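
For readers who, like Istvan, forget which is which: a minimal sketch of the two behaviours, followed by the single-function alternative suggested above. It assumes the built-in `set` type of later Python versions rather than `sets.Set`; the method names are the same. The `delete` helper and its `silent` flag are hypothetical and only illustrate the suggestion; they are not part of any Python API.

```python
# Built-in set is assumed here; the thread itself discusses sets.Set (Py2.3).
s = {1, 2, 3}

s.discard(99)          # missing element: silently ignored
try:
    s.remove(99)       # missing element: raises KeyError
except KeyError:
    removed = False
else:
    removed = True


def delete(target, elem, silent=False):
    """Hypothetical single entry point with a switchable 'not there' response."""
    if silent:
        target.discard(elem)
    else:
        target.remove(elem)


delete(s, 2)                 # present, so it is removed either way
delete(s, 99, silent=True)   # absent, and no exception is raised
```

Which default `silent` should take is exactly the open question in the paragraph above; the sketch picks `False` arbitrarily.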

Terry J. Reedy

**Gerrit Holl** · Jul 18 '05, 01:37 AM

Re: Py2.3: Feedback on Sets

Raymond Hettinger wrote:
> Subject: Py2.3: Feedback on Sets

> * Do you care that sets can only contain hashable elements?

This is the only disadvantage for me.

For the rest, I am happy about it. I am already using it a lot
in places where I used lists before, but where a Set is much
better (no order, no duplicates, it really *is* a set).

> User feedback is essential to determining the future direction
> of sets (whether it will be implemented in C, change API,
> and/or be given supporting language syntax).

I really like them. I would also like to be able to do
{elem for elem in set if foo(elem)} to construct a subset.
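
Set comprehensions with this syntax were added to the language later (Python 2.7 and 3.0). A minimal sketch, assuming the built-in `set` type; `foo` is a stand-in predicate, since the post leaves it unspecified.

```python
def foo(elem):
    """Stand-in predicate; any function returning a truth value works."""
    return elem % 2 == 0


numbers = set(range(10))

# Set comprehension: keeps only the elements for which foo() is true.
subset = {elem for elem in numbers if foo(elem)}
```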

Gerrit.

--
255. If he sublet the man's yoke of oxen or steal the seed-corn,
planting nothing in the field, he shall be convicted, and for each one
hundred gan he shall pay sixty gur of corn.
-- 1780 BC, Hammurabi, Code of Law
--
Asperger's Syndrome - a personal approach:
http://people.nl.linux.org/~gerrit/
These are times to get involved in politics yourself:
http://www.sp.nl/

**Raymond Hettinger** · Jul 18 '05, 01:37 AM

Re: Py2.3: Feedback on Sets

"Russell E. Owen"
> I don't rely on sets heavily (I do have a few implemented as
> dictionaries with value=None) and am not yet ready to make my users
> upgrade to Python 2.3.
>
> I suspect the upgrade issue will significantly slow the incorporation of
> sets and the other new modules, but that over time they're likely to
> become quite popular. I am certainly looking forward to using sets and
> csv.
>
> I think it'd speed the adoption of new modules if they were explicitly
> written to be compatible with one previous generation of Python (and
> documented as such) so users could manually include them with their code
> until the current generation of Python had a bit more time to be adopted.

Wish granted!

The sets module now will run under Py2.2.
It should be available for download from CVS after 24 hours:

http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Lib/sets.py

Raymond Hettinger

**Raymond Hettinger** · Jul 18 '05, 01:37 AM

Re: Py2.3: Feedback on Sets

"Gary Feldman"
> >* Are the docs clear? Can you suggest improvements?
>
> I haven't used them yet, but since I'm working my way through
> the docs in general, I thought I'd check them out and comment.

All of the issues you found have been fixed (except for the discussion of
what an iterable parameter means -- that will be addressed elsewhere).

Raymond Hettinger

**Raymond Hettinger** · Jul 18 '05, 01:39 AM

Re: Py2.3: Feedback on Sets

"John Smith"
> Suggestion: How about adding Set.isProperSubset() and
> Set.isProperSuperset()?

We have them in operator form: a<b and a>b.
Spelling them out did not seem to add much value.
This is doubly true because some people read it
as s.isProperSubsetOf(t) and others read it as
s.hasTheProperSubset(t).
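
A short sketch of the operator forms, assuming the built-in `set` type of later Python versions, which uses the same comparison operators. The variable names are only illustrative.

```python
a = {1, 2}
b = {1, 2, 3}

is_proper_subset = a < b       # every element of a is in b, and a != b
is_proper_superset = b > a     # the mirror image of the line above
same_set_is_proper = a < a     # a set is never a proper subset of itself
subset_or_equal = a <= a       # the non-strict form accepts equality
```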

Raymond Hettinger

> Thanks for this wonderful module. I've been working on data mining and
> machine learning area using Python. Set operations are very important to me.

Great. You'll love it even more when I implement it in C.

Raymond Hettinger

**Christos TZOTZIOY Georgiou** · Jul 18 '05, 01:46 AM

Re: Py2.3: Feedback on Sets

On Tue, 12 Aug 2003 06:02:17 GMT, rumours say that "Raymond Hettinger"
<vze4rx4y@verizon.net> might have written:

[replying only to those that I have something substantial to say]

>* Is the support for sets of sets necessary for your work
> and, if so, then is the implementation sufficiently
> powerful?

I have used sets in:
- Unix sysadm tasks (comparing usernames between passwd and shadow,
finding common files in sync requests et al)
- a hangman game (when the computer guesses words, to continuously
restrict the possibilities based on the human input)
- an image recognition program (comparing haar coefficients)

These come to mind at the moment, but I have used them even in the
python command line; and mostly I care about intersections.

>* Does the performance meet your expectations?

In the game and image recognition programs I could use more power ;-)

>* Are sets helpful in your daily work or does the need arise
> only rarely?

I use them often; it's a very helpful construct.

>User feedback is essential to determining the future direction
>of sets (whether it will be implemented in C, change API,
>and/or be given supporting language syntax).

Reimplementation in C sounds appropriate, and supporting language syntax
would be nice.

A quick thought, in the spirit of C implementation: there are cases
where I would like to get the intersection of dicts (based on the keys),
without having to create sets from the dict keys and then getting the
relevant values. That is, given dicts a and b, I'd like:

>>> a & b # imaginary

to mean

>>> dict([(x, a[x]) for x in sets.Set(a) & sets.Set(b)]) # real

You may notice that a&b wouldn't be equivalent to b&a.
Perhaps the speed difference would not be much; I'll grow a function in
dictobject.c, run some benchmarks and come back with results for you.
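
In later Python versions (3.x) the key views of a dict support set operations directly, so the intermediate sets are not needed. A minimal sketch with made-up sample data, showing why a&b and b&a would differ: the keys are the same, but the values come from the left operand.

```python
# Sample data is invented for illustration.
a = {"x": 1, "y": 2, "z": 3}
b = {"y": 20, "z": 30, "w": 40}

# dict.keys() returns a view that supports & in Python 3.
common = a.keys() & b.keys()

a_and_b = {k: a[k] for k in common}   # values taken from a
b_and_a = {k: b[k] for k in common}   # same keys, values taken from b
```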

Another thought: it is unfortunate that an intersection *has* to be
through continuous lookups (talking about the ordering of dict keys re
their hash values, I'll have to delve into dictobject.c it seems), even
taking into account the great speed of key lookups... although building
the result dict should account for more processing cycles than the
comparisons; and in some cases doing a dict.copy() and then removing the
uncommon elements would be faster. Hm, food for thought, and no more
than two hours to sleep now.
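
The two strategies mentioned above can be sketched in pure Python as follows. Both function names are hypothetical, and nothing here says which one is faster; that would depend on how many keys the two dicts share and would need measuring.

```python
def intersect_by_lookup(d, e):
    """Build a new dict by testing each key of d against e."""
    return {k: v for k, v in d.items() if k in e}


def intersect_by_copy(d, e):
    """Copy d, then delete the keys that e lacks."""
    result = d.copy()
    for k in d:              # iterate over d, not result, so deleting is safe
        if k not in e:
            del result[k]
    return result


d = {"a": 1, "b": 2, "c": 3}
e = {"b": 0, "c": 0, "z": 0}
by_lookup = intersect_by_lookup(d, e)
by_copy = intersect_by_copy(d, e)
```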

Another slogan: Python keeps your mind awake (and c.l.py keeps your body
away from bed :)
--
TZOTZIOY, I speak England very best,
Microsoft Security Alert: the Matrix began as open source.

**Christos TZOTZIOY Georgiou** · Jul 18 '05, 01:47 AM

Re: Py2.3: Feedback on Sets - diffudict.txt (0/1)

On Wed, 20 Aug 2003 06:10:19 +0300, rumours say that Christos "TZOTZIOY"
Georgiou <tzot@sil-tec.gr> might have written:

>A quick thought, in the spirit of C implementation: there are cases
>where I would like to get the intersection of dicts (based on the keys),
>without having to create sets from the dict keys and then getting the
>relevant values. That is, given dicts a and b, I'd like:
>
>>>> a & b # imaginary
>
>to mean
>
>>>> dict([(x, a[x]) for x in sets.Set(a) & sets.Set(b)]) # real
>
>You may notice that a&b wouldn't be equivalent to b&a.
>Perhaps the speed difference would not be much; I'll grow a function in
>dictobject.c, run some benchmarks and come back with results for you.

I implemented dict.intersect(), and it is *much* faster than the
equivalent Python code.

**********************************************************************

Python 2.4a0 (#3, Aug 20 2003, 16:31:22)
[GCC 3.2 (Mandrake Linux 9.0 3.2-1mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> help(dict.intersect)
Help on method_descriptor:

intersect(...)
    D.intersect(E) -> a subset of D having common keys with E

>>> import sets
>>> odds = dict(zip("abcdefghijklmn", range(1, 55, 2)))
>>> evens = dict(zip("asdfghj", range(2, 55, 2)))
>>>
>>> odds
{'a': 1, 'c': 5, 'b': 3, 'e': 9, 'd': 7, 'g': 13, 'f': 11, 'i': 17, 'h': 15, 'k': 21, 'j': 19, 'm': 25, 'l': 23, 'n': 27}
>>> evens
{'a': 2, 'd': 6, 'g': 10, 'f': 8, 'h': 12, 'j': 14, 's': 4}
>>>
>>>
>>> dict([(k, odds[k]) for k in sets.Set(odds) & sets.Set(evens)])
{'a': 1, 'd': 7, 'g': 13, 'f': 11, 'h': 15, 'j': 19}
>>> odds.intersect(evens)
{'a': 1, 'h': 15, 'j': 19, 'd': 7, 'g': 13, 'f': 11}
>>> dict([(k, evens[k]) for k in sets.Set(odds) & sets.Set(evens)])
{'a': 2, 'd': 6, 'g': 10, 'f': 8, 'h': 12, 'j': 14}
>>> evens.intersect(odds)
{'a': 2, 'h': 12, 'j': 14, 'd': 6, 'g': 10, 'f': 8}
>>>
>>>
>>> my_setup = 'import sets; odds=dict(zip("abcdefghijklmn", range(1, 55, 2))); evens=dict(zip("asdfghj", range(2, 55, 2)))'
>>> from timeit import Timer
>>>
>>> Timer(stmt="odds.intersect(evens)", setup=my_setup).repeat()
[1.3545670509338379, 1.3367550373077393, 1.3366960287094116]
>>> Timer(stmt="evens.intersect(odds)", setup=my_setup).repeat()
[1.321492075920105, 1.2869999408721924, 1.320341944694519]
>>> Timer(stmt="dict([(k, odds[k]) for k in sets.Set(odds) & sets.Set(evens)])", setup=my_setup).repeat()
[63.413245916366577, 63.526772975921631, 63.503224968910217]
>>> Timer(stmt="dict([(k, evens[k]) for k in sets.Set(odds) & sets.Set(evens)])", setup=my_setup).repeat()
[63.498296976089478, 63.49311900138855, 63.425426959991455]

**********************************************************************

A substantial difference, a little under 50x on an Athlon XP 1700 (about
63.5 s against about 1.3 s). Also note the difference in the key order
of the results.

I believe that dicts should grow such a method, perhaps with another
name.
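
The built-in dict in released Python versions has no `intersect` method, so the session above only runs on the patched interpreter. The following is a rough pure-Python sketch of the same comparison for an unpatched Python 3. The `intersect` and `via_sets` helpers are hypothetical stand-ins, built-in `set` replaces `sets.Set`, and the timings will not match the C figures above.

```python
from timeit import Timer

odds = dict(zip("abcdefghijklmn", range(1, 55, 2)))
evens = dict(zip("asdfghj", range(2, 55, 2)))


def intersect(d, e):
    """Hypothetical stand-in for the proposed D.intersect(E)."""
    return {k: d[k] for k in d if k in e}


def via_sets(d, e):
    """The set-based spelling from the session, using built-in set."""
    return dict([(k, d[k]) for k in set(d) & set(e)])


namespace = {"odds": odds, "evens": evens,
             "intersect": intersect, "via_sets": via_sets}

# number is kept small so the sketch finishes quickly; results vary by machine.
direct_times = Timer("intersect(odds, evens)", globals=namespace).repeat(repeat=3, number=10000)
set_times = Timer("via_sets(odds, evens)", globals=namespace).repeat(repeat=3, number=10000)

print(min(direct_times), min(set_times))
```

Taking the minimum of the repeats is the usual way to read `timeit` output, since the higher values mostly reflect interference from other processes.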

Attached is the diff -u for dictobject.c compared to the one in last
night's python-latest.tgz.
--
TZOTZIOY, I speak England very best,
Microsoft Security Alert: the Matrix began as open source.
