Ordering python sets

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • bearophileHUGS@lycos.com

    #16
    Re: Ordering python sets

    Lie Ryan:
    >Although you said you disagree with the general idea, you actually take the idea two steps further, I take that as an implicit agreement to several parts of the idea.<
    Think about a bridge: building half bridge may be bad/useless, while
    building it whole may lead to something useful :-)

    Bye,
    bearophile

    Comment

    • Tim Rowe

      #17
      Re: Ordering python sets

      2008/10/27 <bearophileHUGS @lycos.com>:
      Lie Ryan:
      >
      >>Oh no, the two dict implementation would work _exactly_ the same from the outside, they are transparently interchangeable . Only the performance characteristic differs because of the different implementation. <
      >
      I don't agree with the general idea. If the operations done by your
      data structure have different computational complexity, then they are
      fit for different usages. When you program you must know what
      computational complexity has each of the operations of your data
      structyre, otherwise there's no way to know the complexity of your
      whole program, so instead of programming you are just become a mage
      that tries magical spells and hopes for the better.
      No, when you program you know how you will be using the data
      structure, so you can choose the implementation that's right for the
      application. That's what the container libraries for other languages
      do. At the moment, you just have one implementation, and have to hope
      it's right for the job. Adding an *optional* parameter that says, in
      effect, "I want this list optimised for writes and reads at both ends"
      or "I want this list optimised for fully random reads but writes only
      at the end" doesn't *lose* you any information about what you're
      programming with. Of course it's essential that the data structure has
      identican /functional/ behaviour whatever optimisation you use. Other
      languages can enforce that, but Python programmers are used to taking
      care of that side of things for themselves.
      So I don't accept
      so much different data structures to have the same name. That's why
      I'll never appreciate the Python list type to be named list instead of
      array, despite it supports more or less all the functions you expect
      from an abstract list type.
      They're not different data structures from the client point of view.
      "More or less" all the functions wouldn't be enough.

      --
      Tim Rowe

      Comment

      • Terry Reedy

        #18
        Re: Ordering python sets

        Lie Ryan wrote:
        You need to adjust the current mindset slightly (but in an important way
        to understand the "why" behind this idea). The current notion is: list
        and dict is a data structure. With this idea, list and dict is an
        abstract type, not a data structure. array, linked list, binary tree, red-
        black tree, hashed are data structure.
        Did you miss my earlier post? Python already has about 20 abstract
        types. It has a naming scheme for these, as well for concrete
        structures, both built-in and imported. Agitating for names to be
        swapped around is a waste of time. You are free, however, to do this in
        your own code.
        We create a data structure by
        passing the data structure's identifier string to a factory function
        provided by the abstract type.
        Python is object-oriented rather than name-oriented. When one knows the
        name of an object at compile time, the Python way is to use it directly
        in the code, unquoted. In other words, work with and pass around
        objects as much as possible and only use names when they are truly
        variable and not known constants.

        What you propose is equivalent to using getattr for all attribute
        access: "getattr(li st, 'append')" versus "list.appen d".

        Terry Jan Reedy

        Comment

        • Lie Ryan

          #19
          Re: Ordering python sets

          On Sun, 02 Nov 2008 02:08:37 +0000, Steven D'Aprano wrote:
          On Sat, 01 Nov 2008 18:58:59 +0000, Tim Rowe wrote:
          >
          >2008/10/27 <bearophileHUGS @lycos.com>:
          >>Lie Ryan:
          >>>
          >>>>Oh no, the two dict implementation would work _exactly_ the same from
          >>>>the outside, they are transparently interchangeable . Only the
          >>>>performan ce characteristic differs because of the different
          >>>>implementat ion.<
          >>>
          >>I don't agree with the general idea. If the operations done by your
          >>data structure have different computational complexity, then they are
          >>fit for different usages. When you program you must know what
          >>computation al complexity has each of the operations of your data
          >>structyre, otherwise there's no way to know the complexity of your
          >>whole program, so instead of programming you are just become a mage
          >>that tries magical spells and hopes for the better.
          >>
          >No, when you program you know how you will be using the data structure,
          >so you can choose the implementation that's right for the application.
          >
          I don't understand why you have prefixed that sentence with "No". It
          seems like you are agreeing with bearophile's point.
          On linguistic note: As a non-native speaker of English, I've never relied
          on the correct usage of Yes and No and would instead rely on the
          following text. In some languages, situations where English-people
          usually use Yes is answered with No and vice versa.
          >That's what the container libraries for other languages do. At the
          >moment, you just have one implementation, and have to hope it's right
          >for the job.
          >
          To the extent that this is true, it is a sign of the poverty of the
          standard Python library. But in fact it isn't true. Python has quite a
          decent collection of collection types. Just off the top of my head:
          >
          tuple, namedtuple, list, set, frozenset, dict, defaultdict, queue, deque
          >
          set, list and dict are highly optimized for general use, and are very
          suitable for building new data structures. And you are wrong to say that
          we have to "hope it's right for the job", we can build perfectly
          acceptable pure-Python data structures from these building blocks, or
          add our own C extensions. You're never forced to use a standard library
          data structure, it's a trade-off of your time and effort against runtime
          efficiency. And because the standard library types are generally so
          excellent, that trade-off is often (but not always) a no-brainer.
          There are cases where the algorithm you uses ignited the standard
          library's data structure's worst case scenario.
          >Adding an *optional* parameter that says, in effect, "I want this list
          >optimised for writes and reads at both ends" or "I want this list
          >optimised for fully random reads but writes only at the end" doesn't
          >*lose* you any information about what you're programming with.
          >
          It might not lose information, but it does obfuscate it.
          >
          Right now, we can do this:
          <snip>

          I've repelled this same argument multiple times. How often do you think
          you'd need to do check for an object's implementation? Compare that to
          how often we need to ensure that the object we're handling is a list-
          replacement.
          That's assuming the implementation is available at runtime at all; if
          it's not, then you have lost information.
          The implementation property is always available, because it's a non-
          optional argument when registering the data structure.
          >Of course it's essential that the data structure has identican
          >/functional/ behaviour whatever optimisation you use.
          >
          "Essential" ? That's an interesting point. Let's look at an actual
          example.
          >
          Consider one of Lie's initial examples (paraphrased):
          >
          D1 = dict({1: 'a', 2: 'b'}) # standard hash-table D2 = dict({1: 'a', 2:
          'b'}, implementation= "binary tree")
          >
          You say it's "essential" that the binary tree implementation has
          identical functional behaviour to standard hash-table implementation of
          dict. Okay, let's see where that leads us. The functional behaviour of
          hash-tables is that they are O(1) for reads, while binary trees are
          O(log(N)). The only way for them to have identical functional behaviour
          is to *artificially* slow down dicts to the lowest common speed. That is
          a Bad Thing, and I don't understand why you think it is essential.
          I can't think why you consider a function's performance characteristic as
          its behavior, a "characteristic " of something means it is a it's property/
          character. Not behavior.
          we no longer have to
          artificially change the behaviour of the type -- be we do have to
          artificially cripple the API of the type!
          I don't get why you think we've got to restrict it to the lowest common
          denominator. We expect people that messes with a field called
          "implementation " to know enough about the limitations of the
          implementation he chose. Regular python programmers nowadays have to be
          aware that the limitation of the (hashed) dict is its key must be
          immutable. A true dict/mapping type doesn't have that limitation, we
          already have the so-called "artificial " limitations right now. The way it
          is now gives the impression that hashed dict's limitations is equal to
          dict/mapping's limitation. Instead, it should be: dict is a mapping type
          of any object to any object, however the default hashed dict's limitation
          is immutable key.

          Aside: Also with my way, we could classify a "feature" into three status:
          limitation, exact replacement, extension. limitation is minor points
          where the implementation simply couldn't fulfill (not fulfilling major
          points would fail registration), exact replacement is the regular things,
          and extension as features that, if used, might make changing of
          implementations harder, so should only be used if that implementation' s
          extension is the natural way to write an algorithm.
          But maybe you'll argue that Lie's examples are bad examples.
          Yes, I agree it is a bad example for this issue. When I wrote that, what
          I had in mind is to demonstrate how it'd look like in a simple way. Way
          too simple, it seems.
          Perhaps
          what you have in mind is something like the difference between "dicts
          implemented as hash tables with chaining" and "dicts implemented as hash
          tables with open addressing". (Aside: what sort of open addressing?
          Linear? Quadratic? Something else?) At least now we're talking about
          different implementations with the same API and more-or-less the same
          functional behaviour.
          Not quite. I'm looking for a more radical differences. This is more
          apparent in array-list vs bintree-list. One is better for straight
          traversal and random access, while the other is better for searching and
          inserting.
          I'd suggest that 99% of the time, client code simply shouldn't care
          about the differences. The existing implementations are Good Enough.
          Have you *ever* found yourself saying "I can't use the built-in dict,
          because it uses chaining and for my application I need linear open
          addressing"? (Or vice versa.)
          Yes, that's why we have reasonable default.
          I doubt you've every said that. I doubt you know anyone who has said it.
          I doubt you know *of* anyone who has said it. And even if you have, then
          you are free to go and write your own dict type as a C extension type
          and call it dict_with_linea r_openaddressin g and write this:
          >
          D = dict_with_linea r_openaddressin g(data)
          >
          instead of:
          >
          D = dict(data, implementation = "linear open addressing")
          >
          Yet again Lie's proposal gains you nothing. In fact, as an application
          writer, you're better off writing your own type rather than waiting for
          it to be added to the built-in implementation of dict.
          There are times when it'd be too much work to implement it, test it, and
          integrate it. There are times when we know that using alternative
          implementation could make our program faster, but simply lack enough time
          to implement and test it. I was talking about rapid application
          development or even the wilder cousin, quick-one-time-disposable scripts.
          (I'm not implying that only rapid programming benefits from this, it is
          only one example)

          Plus, currently it is impossible, without extensive reading of the whole
          docs, to search for other data structure that implements the same
          abstract type to list/dict/anything. Their documentations are scattered
          around here and there. It's helpful if help(list) could list what
          implementations are available (no pun intended).

          <snip>
          Here's an even better way:
          >
          class StandardFoo(dat a):
          def __init__(self, data):
          self.data = foo(data)
          def __len__(self):
          return len(self.data)
          >
          class MagicFoo(data):
          def __init__(self, data):
          self.data = bar(data)
          def __len__(self):
          return len(self.data[0]) + len(self.data[1:])
          >
          class GreenFoo(data):
          def __init__(self, data):
          self.data = baz(data)
          def __len__(self):
          return self._len
          >
          >
          and let the client code call whichever implementation they want.
          You missed the last part. Integrating the implementation to current code.
          They are the one that makes the difference.

          And I wasn't talking about any of those you mentioned.
          It's more like this:

          class StandardFoo(dat a):
          def __init__(self, data):
          self.data = foo(data)
          def __len__(self):
          return len(self.data)

          class MagicFoo(data):
          def __init__(self, data):
          self.data = bar(data)
          def __len__(self):
          return len(self.data[0]) + len(self.data[1:])

          class GreenFoo(data):
          def __init__(self, data):
          self.data = baz(data)
          def __len__(self):
          return self._len

          class Foo(AbstractTyp e): pass
          Foo.register({' standard': StandardFoo, 'green': GreenFoo, 'magic':
          MagicFoo})
          [snip]
          >They're not different data structures from the client point of view.
          >
          But of course they are. They have different functional behaviour, and
          different APIs.
          Unless we artificially cripple the implementations
          Then please state the implementation' s limitations on the docs. People
          who knows enough to chose non-standard implementation should be
          knowledgeable enough about the limitations and extension of the chosen
          implementation.
          and
          make them all identical (which defeats the purpose of having different
          implementations !) they are different data structures, and those
          differences will be visible to the client code.
          Yes, they're different data structure, same Abstract Type.
          This scheme is a case of over-generalisation. It's also impractical: who
          is going to spend the time and effort to do it? It's hard enough to get
          somebody to write a binary tree implementation for the standard library,
          without expecting them to *also* write a wrapper for it to make it look
          like a dict.
          It's hard to write it because everyone knows noone is going to use it
          because their implementation would be buried inside 60 feet of soil and
          sand somewhere in the dark corners of the docs. And when you said "it's
          hard to get somebody to write", do you realize how many people have, in
          desperation, write their own implementation of something because they
          can't find it in the standard library. And I think the 'register' way
          would make it unnecessary to write wrappers as long as you've provided a
          compatible data structure the generic register function would handle the
          rest.
          It's working against the grain of Python, which is well
          suited for the one-type = one-API model instead of the one-type = many-
          APIs model which Lie is suggesting. And for what benefit? The only
          benefit suggested so far is that we can write:
          >
          dict({1: 'a', 2: 'b'}, implementation= "binary tree")
          >
          instead of
          >
          binarytree({1: 'a', 2: 'b'})
          >
          If you call that a benefit. I don't.
          A real benefit is apparent if we have this kind of snippet in our codes
          (again, in my habit of oversimplifying : f is handling too many things for
          its own good, isinstance is usually evil, and there is a strong smell of
          bad design):

          def f(listofiterabl e, someiterable):
          for lst in listofiterable:
          if isinstance(lst, list):
          lst.extend(some list)
          elif isinstance(lst, [set, dict]):
          lst.update(some list)
          else:
          for n in somelist:
          lst += n

          if you have that kind of code, and many of them scattered, when you want
          to change the data structure, you've got to hunt all those isinstance
          down and change them to the new data structure. The other solutions, to
          shadow 'list' (i.e. the previous name), doesn't work if we want to use
          the original data structure too in other unrelated places. With
          implementation, only the object creation would need to be searched and
          isinstance(some thing, list) would pass it as an array-list replacement.

          Comment

          • MRAB

            #20
            Re: Ordering python sets

            On Nov 4, 8:40 pm, Lie Ryan <lie.1...@gmail .comwrote:
            [snip]
            On linguistic note: As a non-native speaker of English, I've never relied
            on the correct usage of Yes and No and would instead rely on the
            following text. In some languages, situations where English-people
            usually use Yes is answered with No and vice versa.
            >
            [snip]
            In English the rule tends to be that "yes" and "no" don't mean "I
            agree" and "I disagree" like in other languages, but that "yes"
            asserts the positive and "no" asserts the negative:

            Positive question:

            Does it work? Yes, it works.

            Does it work? No, it doesn't work.

            Negative question:

            Doesn't it work? Yes, it does work. ["does" added for
            emphasis]

            Doesn't it work? No, it doesn't work.

            Comment

            • Mr.SpOOn

              #21
              Re: Ordering python sets

              On Wed, Nov 5, 2008 at 10:03 PM, Arnaud Delobelle
              <arnodel@google mail.comwrote:
              Only hashable objects can go in a set. By default a class you define is
              not hashable (unless it descends from a hashable class). To remedy this
              you can define a __hash__ method in your class. IIRC the only
              requirement of a hash function is that two equal objects have the same
              hash.
              Thanks, now it works.

              Comment

              Working...