Ordering python sets

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Mr.SpOOn

    Ordering python sets

    Hi,
    I need a structure to represent a set of integers. I also need to
    perform on this set some basic set operations, such as adding or
    removing elements, joining with other sets and checking for the
    presence of specific elements.

    I think that using Python sets would be the best choice, but I also
    need integers to be ordered inside the set and I've just found out
    that, instead, Python sets are unordered collections.

    Is there another convenient structure or shall I use lists and define
    the operations I need?

    Thanks.
  • Steven D'Aprano

    #2
    Re: Ordering python sets

    On Sat, 25 Oct 2008 08:58:18 +0000, Lie Ryan wrote:
    <anotherrandomm using>
    Since python is dynamic language, I think it should be possible to do
    something like this:
    >
    a = list([1, 2, 3, 4, 5], implementation = 'linkedlist')
    b = dict({'a': 'A'}, implementation = 'binarytree')
    c = dict({'a': 'A'}, implementation = 'binarytree')
    Oh I hope not. I think you have mistaken "dynamic" for "chaotic".

    When I see a dict, I want to know that any two dicts work the same way. I
    don't want to have to search the entire project's source code to find out
    if it is a dict implemented as a hash table with O(1) lookups, or a dict
    implemented as a binary tree with O(log N) lookups, or a dict implemented
    as a linear array with O(N) lookups.

    If I wanted that sort of nightmare, I can already do it by shadowing the
    builtin:

    dict = binarytree
    D = dict({'a': 'A'}) # make a binary tree

    There is no possible good that come from this suggestion. The beauty of
    Python is that the built-in data structures (list, dict, set) are
    powerful enough for 99% of uses[1], and for the other 1%, you can easily
    and explicitly use something else.

    But *explicitly* is the point. There's never any time where you can do
    this:

    type(mydict) is dict

    and not know exactly what performance characteristics mydict will have.
    (Unless you shadow dict or type, or otherwise do something that breaks
    the rules.) You never need to ask, "Okay, it's a dict. What sort of dict?"

    If you want a binary tree, ask for a binary tree.






    [1] Your mileage may vary.


    --
    Steven

    Comment

    • Lie Ryan

      #3
      Re: Ordering python sets

      On Sat, 25 Oct 2008 09:21:05 +0000, Steven D'Aprano wrote:
      On Sat, 25 Oct 2008 08:58:18 +0000, Lie Ryan wrote:
      >
      ><anotherrandom musing>
      >Since python is dynamic language, I think it should be possible to do
      >something like this:
      >>
      >a = list([1, 2, 3, 4, 5], implementation = 'linkedlist') b = dict({'a':
      >'A'}, implementation = 'binarytree') c = dict({'a': 'A'},
      >implementati on = 'binarytree')
      >
      Oh I hope not. I think you have mistaken "dynamic" for "chaotic".
      >
      When I see a dict, I want to know that any two dicts work the same way.
      Oh no, the two dict implementation would work _exactly_ the same from the
      outside, they are transparently interchangeable . Only the performance
      characteristic differs because of the different implementation. Actually
      I got this idea from a book about algorithm and data structure, that book
      said that an abstract data type (e.g. dict, set, list) have several
      competing implementations or data structures (e.g. binary tree dict,
      hashed dict, array dict). A data's implementation and interface can be
      separated so that we can switch the data structure's implementation
      without changing the rest of the code. The book is Algorithm Design
      Manual by Skiena.

      hint: read the PS below
      I don't want to have to search the entire project's source code to find
      out if it is a dict implemented as a hash table with O(1) lookups, or a
      dict implemented as a binary tree with O(log N) lookups, or a dict
      implemented as a linear array with O(N) lookups.
      No, you'd only need to look at the dict's creation point (or actually
      much better at projects docs, but not everyone create good docs). The
      alternative you mentioned below, by shadowing built-in, is both a hack
      and is much more likely to be missed.
      If I wanted that sort of nightmare, I can already do it by shadowing the
      builtin:
      >
      dict = binarytree
      D = dict({'a': 'A'}) # make a binary tree
      I DON'T want THAT sort of nightmare you mentioned...
      And it'd be impossible to have two dictionary that have two different
      implementations .
      There is no possible good that come from this suggestion. The beauty of
      Python is that the built-in data structures (list, dict, set) are
      powerful enough for 99% of uses[1], and for the other 1%, you can easily
      and explicitly use something else.
      Oh really? As far as I know, python's list is extremely bad if you're
      inserting data at the beginning of the list (e.g. lst.insert(0) requires
      the whole array be "copied" one address away). This is because python's
      list uses array data structure, making addressing (e.g. a[2]) fast, but
      insertion slow. If, on the other hand, it is implemented using binary
      tree, it'd make insertion O(log n) but addressing is a bit tricky on it.

      The keyword is "tradeoffs" .
      But *explicitly* is the point. There's never any time where you can do
      this:
      Yes, true, explicitly IS the point. How more explicit can you be than:
      dict({foo: bar}, implementation = 'binarytree')
      type(mydict) is dict
      If my memory serves right, binary tree dict and hashed table dict is both
      a dict right? (Duck Typing)
      Only their implementation differs. Implementation is... well,
      "implementa tion detail".
      and not know exactly what performance characteristics mydict will have.
      Oh... why do I need to know what the performance characteristic of mydict
      is? Unless I know what I'm doing.
      (Unless you shadow dict or type, or otherwise do something that breaks
      the rules.) You never need to ask, "Okay, it's a dict. What sort of
      dict?"
      Okay, it's a dict. What sort of dict? Who the hell cares? I don't need to
      know, they all looks and behave the same (Duck Typing)... at least until
      I profile them (since profiling is a deep black magic by itself, it
      cannot be used to discredit switching implementations ).

      Sometimes we need a data type to use a specific data structure that have
      some specific performance characteristic, because we know we'll be doing
      a specific operation a lot more than other operations.

      If you actually need to know what the implementation is currently being
      used, you could implement a dict.implementa tion property.
      If you want a binary tree, ask for a binary tree.
      Yeah, you ask for binary tree EXPLICITLY:
      bintreedict = dict({a:b}, implementation = 'binarytree')

      this:
      regularhasheddi ct = dict({a:b})

      would have a reasonable default.


      PS: I do admit I have used the wrong terms in the last post. I used the
      term "data structure", instead it should be "abstract data type", "data
      structure" is a synonym for "implementation ". In this post, I hope I've
      corrected all of the usage.

      Comment

      • Terry Reedy

        #4
        Re: Ordering python sets

        Lie Ryan wrote:
        >
        <anotherrandomm using>
        Since python is dynamic language, I think it should be possible to do
        something like this:
        >
        a = list([1, 2, 3, 4, 5], implementation = 'linkedlist')
        For this to work, the abstract list would have to know about all
        implementations of the abstraction.
        b = dict({'a': 'A'}, implementation = 'binarytree')
        c = dict({'a': 'A'}, implementation = 'binarytree')
        >
        i.e. basically since a data structure can have different implementations ,
        and different implementations have different performance characteristics ,
        it should be possible to dynamically change the implementation used.
        >
        In the far future, the data structure and its implementation could be
        abstracted even further:
        >
        a = list() # ordered list
        b = set() # unordered list
        c = dict() # unordered dictionary
        d = sorteddict() # ordered dictionary
        >
        Each of the implementations would share a common subset of methods and
        possibly a few implementation dependent method that could only work on
        certain implementations (or is extremely slow except in the correct
        implementation) .
        </anotherrandommu sing>
        The future is 3.0, at least in part, with Abstract Base Classes.
        There are 16 in the collectons module.
        "In addition to containers, the collections module provides some ABCs
        (abstract base classes) that can be used to test whether a class
        provides a particular interface, for example, is it hashable or a
        mapping, and some of them can also be used as mixin classes."

        The ABCs for numbers are in the numbers module.

        tjr

        Comment

        • Lie Ryan

          #5
          Re: Ordering python sets

          On Sat, 25 Oct 2008 18:20:46 -0400, Terry Reedy wrote:
          Lie Ryan wrote:
          >
          >
          ><anotherrandom musing>
          >Since python is dynamic language, I think it should be possible to do
          >something like this:
          >>
          >a = list([1, 2, 3, 4, 5], implementation = 'linkedlist')
          >
          For this to work, the abstract list would have to know about all
          implementations of the abstraction.
          /the exact syntax isn't really important/
          /abstract type and implementation separation is the important point/

          Actually, if I want to force it, that syntax could work using the same
          magic used by event-based syst

          Comment

          • Lie Ryan

            #6
            Re: Ordering python sets

            On Sat, 25 Oct 2008 18:20:46 -0400, Terry Reedy wrote:
            Lie Ryan wrote:
            >
            >
            ><anotherrandom musing>
            >Since python is dynamic language, I think it should be possible to do
            >something like this:
            >>
            >a = list([1, 2, 3, 4, 5], implementation = 'linkedlist')
            >
            For this to work, the abstract list would have to know about all
            implementations of the abstraction.
            # Sorry the last message is truncated because of an "accident"

            /the exact syntax isn't really important/
            /abstract type and implementation separation is the important point/

            Actually, if I want to force it, that syntax could work using the same
            magic used by event-based systems: registration. Although I agree it
            might be a bit cumbersome to do registration for something like this, but
            as I've said before, exact syntax is not really important.

            Comment

            • Steven D'Aprano

              #7
              Re: Ordering python sets

              On Sat, 25 Oct 2008 21:53:10 +0000, Lie Ryan wrote:
              On Sat, 25 Oct 2008 09:21:05 +0000, Steven D'Aprano wrote:
              >
              >On Sat, 25 Oct 2008 08:58:18 +0000, Lie Ryan wrote:
              >>
              >><anotherrando mmusing>
              >>Since python is dynamic language, I think it should be possible to do
              >>something like this:
              >>>
              >>a = list([1, 2, 3, 4, 5], implementation = 'linkedlist') b =
              >>dict({'a': 'A'}, implementation = 'binarytree') c = dict({'a': 'A'},
              >>implementatio n = 'binarytree')
              >>
              >Oh I hope not. I think you have mistaken "dynamic" for "chaotic".
              >>
              >When I see a dict, I want to know that any two dicts work the same way.
              >
              Oh no, the two dict implementation would work _exactly_ the same from
              the outside, they are transparently interchangeable . Only the
              performance characteristic differs because of the different
              implementation.
              Exactly. That was my point.



              [...]
              >I don't want to have to search the entire project's source code to find
              >out if it is a dict implemented as a hash table with O(1) lookups, or a
              >dict implemented as a binary tree with O(log N) lookups, or a dict
              >implemented as a linear array with O(N) lookups.
              >
              No, you'd only need to look at the dict's creation point (or actually
              much better at projects docs, but not everyone create good docs).
              And how do you find an arbitrary object's creation point without
              searching the project's source code?


              >If I wanted that sort of nightmare, I can already do it by shadowing
              >the builtin:
              >>
              >dict = binarytree
              >D = dict({'a': 'A'}) # make a binary tree
              >
              I DON'T want THAT sort of nightmare you mentioned... And it'd be
              impossible to have two dictionary that have two different
              implementations .
              Nonsense.

              dict = binarytree
              D1 = dict({'a': 'A'}) # make a binary tree "dict"
              dict = __builtin__.dic t
              D2 = dict({'a': 'A'}) # make a standard dict
              dict = someothertype
              D3 = dict({'a': 'A'})

              I'm not suggesting this is a good idea. This is a terrible idea. But it
              is not much worse than your idea:

              D1 = dict({'a': 'A'}, implementation= 'binarytree')
              D2 = dict({'a': 'A'}, implementation= 'dict')
              D3 = dict({'a': 'A'}, implementation= 'someothertype' )

              >There is no possible good that come from this suggestion. The beauty of
              >Python is that the built-in data structures (list, dict, set) are
              >powerful enough for 99% of uses[1], and for the other 1%, you can
              >easily and explicitly use something else.
              >
              Oh really? As far as I know, python's list is extremely bad if you're
              inserting data at the beginning of the list
              And how often do you do that?

              And when you do, use a deque. Just call it a deque.


              [...]
              >But *explicitly* is the point. There's never any time where you can do
              >this:
              >
              Yes, true, explicitly IS the point. How more explicit can you be than:
              dict({foo: bar}, implementation = 'binarytree')
              >
              >type(mydict) is dict

              You miss the point. With your plan, you can do this:

              D1 = dict({foo: bar}, implementation = 'binarytree')
              D2 = dict({foo: bar}, implementation = 'dict')
              type(D1) is type(D2)

              and yet D1 and D2 have UTTERLY different performance characteristics . So
              now you need to add ANOTHER test to distinguish dicts-which-are-dicts
              from dicts-which-are-binary-trees:

              D1.implementati on != D2.implementati on

              And why? So you can avoid calling a goose a goose, and call it a duck
              instead.

              If my memory serves right, binary tree dict and hashed table dict is
              both a dict right? (Duck Typing)
              Only their implementation differs. Implementation is... well,
              "implementa tion detail".
              Duck typing refers to *interface*, not implementation. I have no problem
              with you using a type with the same interface as a dict. That's what duck
              typing is all about. Just don't call it a dict!

              >and not know exactly what performance characteristics mydict will have.
              >
              Oh... why do I need to know what the performance characteristic of
              mydict is? Unless I know what I'm doing.
              We spend a lot of time on this site talking about exciting Big Picture Stuff like .NET versus Java, XML strategy, Lock-In, competitive strategy, software design, architecture, and so forth. All thi…



              Because when you do this:

              mydict[key] = 1

              it's important whether each dict lookup is O(1), O(log N) or O(N). For a
              dict with one million items, that means that an implementation based on a
              binary tree does O(20) times more processing than a dict, and an
              implementation based on linear searching does O(1000000) times more
              processing.

              If you think implementation details don't matter, try this:

              s1 = 'c'*(10**6)

              versus

              s2 = ''
              for i in xrange(10**6):
              s2 = 'c' + s2 # defeat optimizer

              >(Unless you shadow dict or type, or otherwise do something that breaks
              >the rules.) You never need to ask, "Okay, it's a dict. What sort of
              >dict?"
              >
              Okay, it's a dict. What sort of dict? Who the hell cares?
              If you don't care, then why are you specifying the implementation type?

              mydict = dict({'foo': 'bar'}, implementation= "surprise me!")

              You can't have it both ways. If you care, then you know enough to want a
              hash table based dict (the standard) or a binary tree or something else.
              So go ahead and use whatever data structure you want. Just don't call it
              a dict.

              But if you don't care, then just use the Python standard data types.

              I don't need
              to know, they all looks and behave the same (Duck Typing)...
              Until you try a simple operation like len(mydict) and it takes three
              minutes because the implementation you choose is really inefficient.

              Sometimes we need a data type to use a specific data structure that have
              some specific performance characteristic, because we know we'll be doing
              a specific operation a lot more than other operations.
              I'm not arguing that you should never use any other data structure. Go
              right ahead and use them all you like.

              >If you want a binary tree, ask for a binary tree.
              >
              Yeah, you ask for binary tree EXPLICITLY: bintreedict = dict({a:b},
              implementation = 'binarytree')

              Or you could do it the right way:

              D1 = binarytree(data )
              D2 = dict(data)
              type(D1), type(D2)
              =returns binarytree, dict

              versus:

              D1 = dict(data, implementation= 'binarytree')
              D2 = dict(data)
              type(D1), type(D2)
              =returns dict, dict



              --
              Steven

              Comment

              • Terry Reedy

                #8
                Re: Ordering python sets

                Lie Ryan wrote:
                On Sat, 25 Oct 2008 18:20:46 -0400, Terry Reedy wrote:
                >>a = list([1, 2, 3, 4, 5], implementation = 'linkedlist')
                >For this to work, the abstract list would have to know about all
                >implementation s of the abstraction.
                >
                /the exact syntax isn't really important/
                /abstract type and implementation separation is the important point/
                >
                Actually, if I want to force it, that syntax could work using the same
                magic used by event-based systems: registration.
                ABCs have registration method. The builtin ABCs have appropriate
                builtin classes preregistered.
                >>import collections as co
                >>mu = co.MutableSeque nce
                >>issubclass(li st, mu)
                True

                I believe user classes that inherit from an ABC are also registered, and
                other can be registered explicitly.
                >Although I agree it
                might be a bit cumbersome to do registration for something like this, but
                as I've said before, exact syntax is not really important.
                Then why do you object to current
                mylist = linkedlist(data )
                and request the harder to write and implement
                mylist = list(data, implementation = 'linkedlist')
                ?

                tjr

                Comment

                • Lie Ryan

                  #9
                  Re: Ordering python sets

                  On Sun, 26 Oct 2008 00:53:18 +0000, Steven D'Aprano wrote:
                  [...]
                  And how do you find an arbitrary object's creation point without
                  searching the project's source code?
                  How is it better using the current way?
                  Asking the .implementation field isn't much harder than asking the type
                  (), and is much more obvious that we're asking the "implementation " being
                  used.
                  >>If I wanted that sort of nightmare, I can already do it by shadowing
                  >>the builtin:
                  >>>
                  >>dict = binarytree
                  >>D = dict({'a': 'A'}) # make a binary tree
                  >>
                  >I DON'T want THAT sort of nightmare you mentioned... And it'd be
                  >impossible to have two dictionary that have two different
                  >implementation s.
                  >
                  Nonsense.
                  >
                  <snip>

                  There is two good arguments for using a plain string to specify the
                  implementation used: 1) Plain-text Configuration, 2) argument passing.

                  1) The current way requires you have a map of the configuration text to a
                  function (i.e. imps = {'linkedlist': linkedlist}), than assign the
                  function to _a global name_ (i.e. lista = imps[configuration[confkey]] --
                  namespace pollution), then instantiate an object using that (ls = lista
                  ([...]))). The implementation way, only requires ls = list([...],
                  implementation = configuration[confkey])

                  2) Easily said, plain text passing is safer and easier than passing
                  function object.
                  >>There is no possible good that come from this suggestion. The beauty
                  >>of Python is that the built-in data structures (list, dict, set) are
                  >>powerful enough for 99% of uses[1], and for the other 1%, you can
                  >>easily and explicitly use something else.
                  >>
                  >Oh really? As far as I know, python's list is extremely bad if you're
                  >inserting data at the beginning of the list
                  >
                  And how often do you do that?
                  It's not even a deque yet, a regular queue is slow using list in python,
                  and any insertion sort requires you to insert elements into arbitrary
                  position. This makes using array for it a O(n**2). On the other hand,
                  using binary tree is a O(log n).
                  [...]
                  >>But *explicitly* is the point. There's never any time where you can do
                  >>this:
                  >>
                  >Yes, true, explicitly IS the point. How more explicit can you be than:
                  >dict({foo: bar}, implementation = 'binarytree')
                  >>
                  >>type(mydict ) is dict
                  >
                  >
                  You miss the point. With your plan, you can do this:
                  >
                  D1 = dict({foo: bar}, implementation = 'binarytree') D2 = dict({foo:
                  bar}, implementation = 'dict') type(D1) is type(D2)
                  >
                  and yet D1 and D2 have UTTERLY different performance characteristics .
                  It does not matter that it have different performance characteristic.
                  So
                  now you need to add ANOTHER test to distinguish dicts-which-are-dicts
                  from dicts-which-are-binary-trees:
                  How often would you need to care about the specific implementation used?
                  Compare with how often you want to know how to work with the object. Most
                  of the time, you only need to know how to work with it (i.e. knowing its
                  interface), only _sometimes_ when it DOES matter speed-wise, you'd need
                  to know and affect its implementation.
                  D1.implementati on != D2.implementati on
                  >
                  And why? So you can avoid calling a goose a goose, and call it a duck
                  instead.
                  I think we have an utterly different view here.
                  I wished that list() becomes an abstract type, instead of a data
                  structure. (binarytreelist and arraylist is some property of a list)

                  While you insisted that list() is a data structure, and abstract type is
                  some obscureness somewhere unknown.
                  >If my memory serves right, binary tree dict and hashed table dict is
                  >both a dict right? (Duck Typing)
                  >Only their implementation differs. Implementation is... well,
                  >"implementatio n detail".
                  >
                  Duck typing refers to *interface*, not implementation.
                  Well said, you've just supported my view.
                  I have no problem
                  with you using a type with the same interface as a dict. That's what
                  duck typing is all about. Just don't call it a dict!
                  why do you think binarytreedict is "less dict" than hasheddict?
                  >>and not know exactly what performance characteristics mydict will
                  >>have.
                  >>
                  >Oh... why do I need to know what the performance characteristic of
                  >mydict is? Unless I know what I'm doing.
                  >
                  http://www.joelonsoftware.com/articl...000000319.html
                  Sometimes it is good to have control of everything. Sometimes (most of
                  the time) I know I can dump some of the details to think on more
                  important matters. To know which parts need more thinking by doing
                  profiling.
                  Because when you do this:
                  >
                  mydict[key] = 1
                  >
                  it's important whether each dict lookup is O(1), O(log N) or O(N). For a
                  dict with one million items, that means that an implementation based on
                  a binary tree does O(20) times more processing than a dict, and an
                  implementation based on linear searching does O(1000000) times more
                  processing.
                  Well, I don't need to know how the dict works if I'm only using it to
                  store twenty items right? (I do know how python's dict/set/list/etc is
                  implemented, but unless that particular dict is central to my program,
                  I'm not going to create my programs around how dict is implemented, I'm
                  going to create it on how my mind works).

                  I don't care how print statement works, unless I'm trying to do something
                  fancy with print. I don't care that print takes a series of electrical
                  signal, then convert it into a series of bytes and pass it into the
                  terminal stdout then the terminal will figure out how to draw the string
                  using the appropriate font at the appropriate place and pass itself to a
                  window manager which would compose the desktop into an image then pass
                  itself to the graphic driver which would convert the image to electrical
                  signal to be sent to the graphic card which would do some fancy tricks
                  before sending it to the monitor, when what I need is only to show the
                  user a simple welcome message.

                  The reason we use high-level language is that we know we don't need to
                  care about some details. Only when the detail matters then we take care
                  of it. If I do care about all the details, I'd use C, no, no, lower, I'd
                  use assembly, no, lower..., I'd design my own CPU, no even lower, I'd
                  design my own non-von-neumann architecture. I don't think I can change
                  the rules of physics though, but if I can... <some more pseudo random
                  bytes>
                  >I don't need
                  >to know, they all looks and behave the same (Duck Typing)...
                  >
                  Until you try a simple operation like len(mydict) and it takes three
                  minutes because the implementation you choose is really inefficient.
                  .... then I know I have chosen the wrong data structure, and I know I need
                  to change it.
                  >>If you want a binary tree, ask for a binary tree.
                  >>
                  >Yeah, you ask for binary tree EXPLICITLY: bintreedict = dict({a:b},
                  >implementati on = 'binarytree')
                  >
                  Or you could do it the right way:
                  <snip>

                  For example, if I have a certain unknown object. What is easier?
                  >>type(URO) # Unidentified Round Object
                  =returns (*&%$^%*

                  or
                  >>type(URO)
                  =returns diningplate

                  only after three thousand years of continuous and laborious researching,
                  at last I found out that (*&%$^%* is an alien dining plate, and it does
                  have all the features and functionality as earth dining plate, although
                  due to it being made of CClKO32C32CH42U it is much more resistant to
                  acidity but would melt when you put carrot on it.

                  Only when I want to eat carrot, would I care that it is an alien dining
                  plate. And only when I want to eat something extremely acidic, I'd chose
                  the alien dining plate over earth dining plate.

                  Comment

                  • Lie Ryan

                    #10
                    Re: Ordering python sets

                    On Sat, 25 Oct 2008 21:50:36 -0400, Terry Reedy wrote:
                    Lie Ryan wrote:
                    >On Sat, 25 Oct 2008 18:20:46 -0400, Terry Reedy wrote:
                    Then why do you object to current
                    mylist = linkedlist(data )
                    and request the harder to write and implement
                    mylist = list(data, implementation = 'linkedlist')
                    I don't wholly object it. I think it's fine but can be improved.

                    I tend to be more inclined on this side though:

                    mylist = list.linkedlist (data) # or list<othersymbo l>linkedlist(da ta)
                    as syntax sugar for
                    mylist = list(data, implementation = 'linkedlist')

                    this kinds of syntax magnify the fact that linkedlist is a choice of
                    implementation for list abstract type. The current syntax makes no
                    indication whatsoever that linkedlist is anything related to list.

                    Comment

                    • bearophileHUGS@lycos.com

                      #11
                      Re: Ordering python sets

                      Glenn Linderman:
                      how does one create a key that corresponds to ascending integer followed by descending character string?
                      (Others may have already answered you because Google groups is very
                      slow.)
                      >>seq = [(10, "abb"), (5, "zul"), (5, "hal"), (2, "of")]
                      >>sorted(seq, key=lambda (n,s): (-n, s), reverse=True)
                      [(2, 'of'), (5, 'zul'), (5, 'hal'), (10, 'abb')]

                      A little harder question is how to create a key that corresponds to
                      ascending string followed by descending string?

                      To do that you can sort the data two times, relying on the stable
                      nature of the Python sort.

                      Another (silly?) solution may be to invent a "negate" function for
                      strings too:
                      >>strneg = lambda txt: txt.translate(i table)
                      >>itable = "".join(chr (i) for i in xrange(255, -1, -1))
                      >>pairs = [('al', 'boo'), ('zul', 'ao'), ('al', 'den'), ('zul', 'o')]
                      >>sorted(pair s, key=lambda (s1,s2): (s1, strneg(s2)))
                      [('al', 'den'), ('al', 'boo'), ('zul', 'o'), ('zul', 'ao')]

                      Bye,
                      bearophile

                      Comment

                      • Peter Otten

                        #12
                        Re: Ordering python sets

                        bearophileHUGS@ lycos.com wrote:
                        Glenn Linderman:
                        >
                        >how does one create a key that corresponds to ascending integer followed
                        >by descending character string?
                        >
                        (Others may have already answered you because Google groups is very
                        slow.)
                        >
                        >>>seq = [(10, "abb"), (5, "zul"), (5, "hal"), (2, "of")]
                        >>>sorted(seq , key=lambda (n,s): (-n, s), reverse=True)
                        [(2, 'of'), (5, 'zul'), (5, 'hal'), (10, 'abb')]
                        >
                        A little harder question is how to create a key that corresponds to
                        ascending string followed by descending string?
                        >
                        To do that you can sort the data two times, relying on the stable
                        nature of the Python sort.
                        >
                        Another (silly?) solution may be to invent a "negate" function for
                        strings too:
                        >
                        >>>strneg = lambda txt: txt.translate(i table)
                        >>>itable = "".join(chr (i) for i in xrange(255, -1, -1))
                        >>>pairs = [('al', 'boo'), ('zul', 'ao'), ('al', 'den'), ('zul', 'o')]
                        >>>sorted(pairs , key=lambda (s1,s2): (s1, strneg(s2)))
                        [('al', 'den'), ('al', 'boo'), ('zul', 'o'), ('zul', 'ao')]
                        Here's a class that can negate arbitrary values:
                        >>class Neg(object):
                        .... def __init__(self, value):
                        .... self.value = value
                        .... def __cmp__(self, other):
                        .... return -cmp(self.value, other.value)
                        ....
                        >>pairs = [('al', 'boo'), ('zul', 'ao'), ('al', 'den'), ('zul', 'o')]
                        >>sorted(pair s, key=lambda (s, t): (s, Neg(t)))
                        [('al', 'den'), ('al', 'boo'), ('zul', 'o'), ('zul', 'ao')]

                        Comment

                        • bearophileHUGS@lycos.com

                          #13
                          Re: Ordering python sets

                          Lie Ryan:
                          >Oh no, the two dict implementation would work _exactly_ the same from the outside, they are transparently interchangeable . Only the performance characteristic differs because of the different implementation. <
                          I don't agree with the general idea. If the operations done by your
                          data structure have different computational complexity, then they are
                          fit for different usages. When you program you must know what
                          computational complexity has each of the operations of your data
                          structyre, otherwise there's no way to know the complexity of your
                          whole program, so instead of programming you are just become a mage
                          that tries magical spells and hopes for the better. So I don't accept
                          so much different data structures to have the same name. That's why
                          I'll never appreciate the Python list type to be named list instead of
                          array, despite it supports more or less all the functions you expect
                          from an abstract list type.

                          Said that, for a high-level language like Python I can see another
                          possible solution. To have a type that contains several kinds of data
                          structures, for example a "dict" that contains a hash implementation,
                          a red-black tree, etc. Such data structure can use a small part of the
                          computational power given to it to collect statistics usages of each
                          object (or each variable, that may contain in succession several
                          ojects of the same type). Such data structure can then at run time
                          adapt dynamically, chosing to use the implementation fitter for the
                          specific usage of each object or variable (the programmer can give
                          hints of course, or sometimes even coerce the usage a specific
                          implementation) . (such statistics can also be saved on disk to be used
                          for the successive run of the program, to help them work well from the
                          start too, and not just after a while). If this capability works well
                          in practice, then it can solve the problem you were talking about, I
                          think.

                          I presume data structures in future high-level languages will be quite
                          more able to adapt themselves to their usages. Today it's the second
                          time I talk about future programming languages :-)

                          Bye,
                          bearophile

                          Comment

                          • greg

                            #14
                            Re: Ordering python sets

                            On approximately 10/27/2008 10:27 AM, came the following characters from
                            the keyboard of Peter Otten:
                            Here's a class that can negate arbitrary values
                            >
                            ... def __init__(self, value):
                            ... self.value = value
                            ... def __cmp__(self, other):
                            ... return -cmp(self.value, other.value)
                            Unfortunately, the __cmp__ method has been eliminated from
                            3.0 as well, so this won't work as-is. You would need to
                            override __lt__, __eq__, etc.

                            --
                            Greg

                            Comment

                            • Lie Ryan

                              #15
                              Re: Ordering python sets

                              On Mon, 27 Oct 2008 13:18:43 -0700, bearophileHUGS wrote:
                              So I don't accept so much different data structures to have the
                              same name
                              You need to adjust the current mindset slightly (but in an important way
                              to understand the "why" behind this idea). The current notion is: list
                              and dict is a data structure. With this idea, list and dict is an
                              abstract type, not a data structure. array, linked list, binary tree, red-
                              black tree, hashed are data structure. We create a data structure by
                              passing the data structure's identifier string to a factory function
                              provided by the abstract type.

                              There are two possible syntax sugar:
                              a = list.bintree([...])
                              b = list([...]) # also for backward compat, have reasonable default

                              In short, the data structures doesn't share the same name, the abstract
                              data-type does. The suggestion is to change the place where we define the
                              what data structure to use.
                              Said that, for a high-level language like Python I can see another
                              possible solution. To have a type that contains several kinds of data
                              structures, for example a "dict" that contains a hash implementation, a
                              red-black tree, etc. Such data structure can use a small part of the
                              computational power given to it to collect statistics usages of each
                              object (or each variable, that may contain in succession several ojects
                              of the same type). Such data structure can then at run time adapt
                              dynamically, chosing to use the implementation fitter for the specific
                              usage of each object or variable (the programmer can give hints of
                              course, or sometimes even coerce the usage a specific implementation) .
                              (such statistics can also be saved on disk to be used for the successive
                              run of the program, to help them work well from the start too, and not
                              just after a while). If this capability works well in practice, then it
                              can solve the problem you were talking about, I think.
                              >
                              I presume data structures in future high-level languages will be quite
                              more able to adapt themselves to their usages. Today it's the second
                              time I talk about future programming languages :-)
                              Although you said you disagree with the general idea, you actually take
                              the idea two steps further, I take that as an implicit agreement to
                              several parts of the idea.

                              Comment

                              Working...