Ordering python sets

**bearophileHUGS@lycos.com** · Nov 1 '08, 06:25 PM

Re: Ordering python sets

Lie Ryan:

>Although you said you disagree with the general idea, you actually take the idea two steps further, I take that as an implicit agreement to several parts of the idea.<

Think about a bridge: building half bridge may be bad/useless, while
building it whole may lead to something useful :-)

Bye,
bearophile

**Tim Rowe** · Nov 1 '08, 07:05 PM

Re: Ordering python sets

2008/10/27 <bearophileHUGS @lycos.com>:

Lie Ryan:
>

>>Oh no, the two dict implementation would work _exactly_ the same from the outside, they are transparently interchangeable . Only the performance characteristic differs because of the different implementation. <

>
I don't agree with the general idea. If the operations done by your
data structure have different computational complexity, then they are
fit for different usages. When you program you must know what
computational complexity has each of the operations of your data
structyre, otherwise there's no way to know the complexity of your
whole program, so instead of programming you are just become a mage
that tries magical spells and hopes for the better.

No, when you program you know how you will be using the data
structure, so you can choose the implementation that's right for the
application. That's what the container libraries for other languages
do. At the moment, you just have one implementation, and have to hope
it's right for the job. Adding an *optional* parameter that says, in
effect, "I want this list optimised for writes and reads at both ends"
or "I want this list optimised for fully random reads but writes only
at the end" doesn't *lose* you any information about what you're
programming with. Of course it's essential that the data structure has
identican /functional/ behaviour whatever optimisation you use. Other
languages can enforce that, but Python programmers are used to taking
care of that side of things for themselves.

So I don't accept
so much different data structures to have the same name. That's why
I'll never appreciate the Python list type to be named list instead of
array, despite it supports more or less all the functions you expect
from an abstract list type.

They're not different data structures from the client point of view.
"More or less" all the functions wouldn't be enough.

--
Tim Rowe

**Terry Reedy** · Nov 1 '08, 10:25 PM

Re: Ordering python sets

Lie Ryan wrote:

You need to adjust the current mindset slightly (but in an important way
to understand the "why" behind this idea). The current notion is: list
and dict is a data structure. With this idea, list and dict is an
abstract type, not a data structure. array, linked list, binary tree, red-
black tree, hashed are data structure.

Did you miss my earlier post? Python already has about 20 abstract
types. It has a naming scheme for these, as well for concrete
structures, both built-in and imported. Agitating for names to be
swapped around is a waste of time. You are free, however, to do this in
your own code.

We create a data structure by
passing the data structure's identifier string to a factory function
provided by the abstract type.

Python is object-oriented rather than name-oriented. When one knows the
name of an object at compile time, the Python way is to use it directly
in the code, unquoted. In other words, work with and pass around
objects as much as possible and only use names when they are truly
variable and not known constants.

What you propose is equivalent to using getattr for all attribute
access: "getattr(li st, 'append')" versus "list.appen d".

Terry Jan Reedy

**Lie Ryan** · Nov 4 '08, 08:45 PM

Re: Ordering python sets

On Sun, 02 Nov 2008 02:08:37 +0000, Steven D'Aprano wrote:

On Sat, 01 Nov 2008 18:58:59 +0000, Tim Rowe wrote:
>

>2008/10/27 <bearophileHUGS @lycos.com>:

>>Lie Ryan:
>>>
>>>>Oh no, the two dict implementation would work _exactly_ the same from
>>>>the outside, they are transparently interchangeable . Only the
>>>>performan ce characteristic differs because of the different
>>>>implementat ion.<
>>>
>>I don't agree with the general idea. If the operations done by your
>>data structure have different computational complexity, then they are
>>fit for different usages. When you program you must know what
>>computation al complexity has each of the operations of your data
>>structyre, otherwise there's no way to know the complexity of your
>>whole program, so instead of programming you are just become a mage
>>that tries magical spells and hopes for the better.

>>
>No, when you program you know how you will be using the data structure,
>so you can choose the implementation that's right for the application.

>
I don't understand why you have prefixed that sentence with "No". It
seems like you are agreeing with bearophile's point.

On linguistic note: As a non-native speaker of English, I've never relied
on the correct usage of Yes and No and would instead rely on the
following text. In some languages, situations where English-people
usually use Yes is answered with No and vice versa.

>That's what the container libraries for other languages do. At the
>moment, you just have one implementation, and have to hope it's right
>for the job.

>
To the extent that this is true, it is a sign of the poverty of the
standard Python library. But in fact it isn't true. Python has quite a
decent collection of collection types. Just off the top of my head:
>
tuple, namedtuple, list, set, frozenset, dict, defaultdict, queue, deque
>
set, list and dict are highly optimized for general use, and are very
suitable for building new data structures. And you are wrong to say that
we have to "hope it's right for the job", we can build perfectly
acceptable pure-Python data structures from these building blocks, or
add our own C extensions. You're never forced to use a standard library
data structure, it's a trade-off of your time and effort against runtime
efficiency. And because the standard library types are generally so
excellent, that trade-off is often (but not always) a no-brainer.

There are cases where the algorithm you uses ignited the standard
library's data structure's worst case scenario.

>Adding an *optional* parameter that says, in effect, "I want this list
>optimised for writes and reads at both ends" or "I want this list
>optimised for fully random reads but writes only at the end" doesn't
>*lose* you any information about what you're programming with.

>
It might not lose information, but it does obfuscate it.
>
Right now, we can do this:

<snip>

I've repelled this same argument multiple times. How often do you think
you'd need to do check for an object's implementation? Compare that to
how often we need to ensure that the object we're handling is a list-
replacement.

That's assuming the implementation is available at runtime at all; if
it's not, then you have lost information.

The implementation property is always available, because it's a non-
optional argument when registering the data structure.

>Of course it's essential that the data structure has identican
>/functional/ behaviour whatever optimisation you use.

>
"Essential" ? That's an interesting point. Let's look at an actual
example.
>
Consider one of Lie's initial examples (paraphrased):
>
D1 = dict({1: 'a', 2: 'b'}) # standard hash-table D2 = dict({1: 'a', 2:
'b'}, implementation= "binary tree")
>
You say it's "essential" that the binary tree implementation has
identical functional behaviour to standard hash-table implementation of
dict. Okay, let's see where that leads us. The functional behaviour of
hash-tables is that they are O(1) for reads, while binary trees are
O(log(N)). The only way for them to have identical functional behaviour
is to *artificially* slow down dicts to the lowest common speed. That is
a Bad Thing, and I don't understand why you think it is essential.

I can't think why you consider a function's performance characteristic as
its behavior, a "characteristic " of something means it is a it's property/
character. Not behavior.

we no longer have to
artificially change the behaviour of the type -- be we do have to
artificially cripple the API of the type!

I don't get why you think we've got to restrict it to the lowest common
denominator. We expect people that messes with a field called
"implementation " to know enough about the limitations of the
implementation he chose. Regular python programmers nowadays have to be
aware that the limitation of the (hashed) dict is its key must be
immutable. A true dict/mapping type doesn't have that limitation, we
already have the so-called "artificial " limitations right now. The way it
is now gives the impression that hashed dict's limitations is equal to
dict/mapping's limitation. Instead, it should be: dict is a mapping type
of any object to any object, however the default hashed dict's limitation
is immutable key.

Aside: Also with my way, we could classify a "feature" into three status:
limitation, exact replacement, extension. limitation is minor points
where the implementation simply couldn't fulfill (not fulfilling major
points would fail registration), exact replacement is the regular things,
and extension as features that, if used, might make changing of
implementations harder, so should only be used if that implementation' s
extension is the natural way to write an algorithm.

But maybe you'll argue that Lie's examples are bad examples.

Yes, I agree it is a bad example for this issue. When I wrote that, what
I had in mind is to demonstrate how it'd look like in a simple way. Way
too simple, it seems.

Perhaps
what you have in mind is something like the difference between "dicts
implemented as hash tables with chaining" and "dicts implemented as hash
tables with open addressing". (Aside: what sort of open addressing?
Linear? Quadratic? Something else?) At least now we're talking about
different implementations with the same API and more-or-less the same
functional behaviour.

Not quite. I'm looking for a more radical differences. This is more
apparent in array-list vs bintree-list. One is better for straight
traversal and random access, while the other is better for searching and
inserting.

I'd suggest that 99% of the time, client code simply shouldn't care
about the differences. The existing implementations are Good Enough.
Have you *ever* found yourself saying "I can't use the built-in dict,
because it uses chaining and for my application I need linear open
addressing"? (Or vice versa.)

Yes, that's why we have reasonable default.

I doubt you've every said that. I doubt you know anyone who has said it.
I doubt you know *of* anyone who has said it. And even if you have, then
you are free to go and write your own dict type as a C extension type
and call it dict_with_linea r_openaddressin g and write this:
>
D = dict_with_linea r_openaddressin g(data)
>
instead of:
>
D = dict(data, implementation = "linear open addressing")
>
Yet again Lie's proposal gains you nothing. In fact, as an application
writer, you're better off writing your own type rather than waiting for
it to be added to the built-in implementation of dict.

There are times when it'd be too much work to implement it, test it, and
integrate it. There are times when we know that using alternative
implementation could make our program faster, but simply lack enough time
to implement and test it. I was talking about rapid application
development or even the wilder cousin, quick-one-time-disposable scripts.
(I'm not implying that only rapid programming benefits from this, it is
only one example)

Plus, currently it is impossible, without extensive reading of the whole
docs, to search for other data structure that implements the same
abstract type to list/dict/anything. Their documentations are scattered
around here and there. It's helpful if help(list) could list what
implementations are available (no pun intended).

<snip>

Here's an even better way:
>
class StandardFoo(dat a):
def __init__(self, data):
self.data = foo(data)
def __len__(self):
return len(self.data)
>
class MagicFoo(data):
def __init__(self, data):
self.data = bar(data)
def __len__(self):
return len(self.data[0]) + len(self.data[1:])
>
class GreenFoo(data):
def __init__(self, data):
self.data = baz(data)
def __len__(self):
return self._len
>
>
and let the client code call whichever implementation they want.

You missed the last part. Integrating the implementation to current code.
They are the one that makes the difference.

And I wasn't talking about any of those you mentioned.
It's more like this:

class StandardFoo(dat a):
def __init__(self, data):
self.data = foo(data)
def __len__(self):
return len(self.data)

class MagicFoo(data):
def __init__(self, data):
self.data = bar(data)
def __len__(self):
return len(self.data[0]) + len(self.data[1:])

class GreenFoo(data):
def __init__(self, data):
self.data = baz(data)
def __len__(self):
return self._len

class Foo(AbstractTyp e): pass
Foo.register({' standard': StandardFoo, 'green': GreenFoo, 'magic':
MagicFoo})

[snip]

>They're not different data structures from the client point of view.

>
But of course they are. They have different functional behaviour, and
different APIs.

Unless we artificially cripple the implementations

Then please state the implementation' s limitations on the docs. People
who knows enough to chose non-standard implementation should be
knowledgeable enough about the limitations and extension of the chosen
implementation.

and
make them all identical (which defeats the purpose of having different
implementations !) they are different data structures, and those
differences will be visible to the client code.

Yes, they're different data structure, same Abstract Type.

This scheme is a case of over-generalisation. It's also impractical: who
is going to spend the time and effort to do it? It's hard enough to get
somebody to write a binary tree implementation for the standard library,
without expecting them to *also* write a wrapper for it to make it look
like a dict.

It's hard to write it because everyone knows noone is going to use it
because their implementation would be buried inside 60 feet of soil and
sand somewhere in the dark corners of the docs. And when you said "it's
hard to get somebody to write", do you realize how many people have, in
desperation, write their own implementation of something because they
can't find it in the standard library. And I think the 'register' way
would make it unnecessary to write wrappers as long as you've provided a
compatible data structure the generic register function would handle the
rest.

It's working against the grain of Python, which is well
suited for the one-type = one-API model instead of the one-type = many-
APIs model which Lie is suggesting. And for what benefit? The only
benefit suggested so far is that we can write:
>
dict({1: 'a', 2: 'b'}, implementation= "binary tree")
>
instead of
>
binarytree({1: 'a', 2: 'b'})
>
If you call that a benefit. I don't.

A real benefit is apparent if we have this kind of snippet in our codes
(again, in my habit of oversimplifying : f is handling too many things for
its own good, isinstance is usually evil, and there is a strong smell of
bad design):

def f(listofiterabl e, someiterable):
for lst in listofiterable:
if isinstance(lst, list):
lst.extend(some list)
elif isinstance(lst, [set, dict]):
lst.update(some list)
else:
for n in somelist:
lst += n

if you have that kind of code, and many of them scattered, when you want
to change the data structure, you've got to hunt all those isinstance
down and change them to the new data structure. The other solutions, to
shadow 'list' (i.e. the previous name), doesn't work if we want to use
the original data structure too in other unrelated places. With
implementation, only the object creation would need to be searched and
isinstance(some thing, list) would pass it as an array-list replacement.

**MRAB** · Nov 5 '08, 12:05 AM

Re: Ordering python sets

On Nov 4, 8:40 pm, Lie Ryan <lie.1...@gmail .comwrote:
[snip]

On linguistic note: As a non-native speaker of English, I've never relied
on the correct usage of Yes and No and would instead rely on the
following text. In some languages, situations where English-people
usually use Yes is answered with No and vice versa.
>

[snip]
In English the rule tends to be that "yes" and "no" don't mean "I
agree" and "I disagree" like in other languages, but that "yes"
asserts the positive and "no" asserts the negative:

Positive question:

Does it work? Yes, it works.

Does it work? No, it doesn't work.

Negative question:

Doesn't it work? Yes, it does work. ["does" added for
emphasis]

Doesn't it work? No, it doesn't work.

**Mr.SpOOn** · Nov 6 '08, 01:35 PM

Re: Ordering python sets

On Wed, Nov 5, 2008 at 10:03 PM, Arnaud Delobelle
<arnodel@google mail.comwrote:

Only hashable objects can go in a set. By default a class you define is
not hashable (unless it descends from a hashable class). To remedy this
you can define a __hash__ method in your class. IIRC the only
requirement of a hash function is that two equal objects have the same
hash.

Thanks, now it works.

Ordering python sets

Comment

Comment

Comment

Comment

Comment

Comment