Replacing cmp with key for sorting

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • George Sakkis

    Replacing cmp with key for sorting

    I want to sort sequences of strings lexicographical ly but those with
    longer prefix should come earlier, e.g. for s = ['a', 'bc', 'bd',
    'bcb', 'ba', 'ab'], the sorted sequence is ['ab', 'a', 'ba', 'bcb',
    'bc', 'bd']. Currently I do it with:

    s.sort(cmp=lamb da x,y: 0 if x==y else
    -1 if x.startswith(y) else
    +1 if y.startswith(x) else
    cmp(x,y))

    Can this be done with an equivalent key function instead of cmp ?

    George
  • bearophileHUGS@lycos.com

    #2
    Re: Replacing cmp with key for sorting

    On Nov 3, 6:49 pm, George Sakkis <george.sak...@ gmail.comwrote:
    I want to sort sequences of strings lexicographical ly but those with
    longer prefix should come earlier, e.g. for s = ['a', 'bc', 'bd',
    'bcb', 'ba', 'ab'], the sorted sequence is ['ab', 'a', 'ba', 'bcb',
    'bc', 'bd']. Currently I do it with:
    >
    s.sort(cmp=lamb da x,y: 0 if x==y else
                                        -1 if x.startswith(y) else
                                        +1 if y.startswith(x) else
                                        cmp(x,y))
    >
    Can this be done with an equivalent key function instead of cmp ?
    >
    George
    Your input and output:

    s = ['a', 'bc', 'bd', 'bcb', 'ba', 'ab']
    r = ['ab', 'a', 'ba', 'bcb', 'bc', 'bd']

    To me your lambda looks like an abuse of the inline if expression. So
    I suggest to replace it with a true function, that is more readable:

    def mycmp(x, y):
    if x == y:
    return 0
    elif x.startswith(y) :
    return -1
    elif y.startswith(x) :
    return +1
    else:
    return cmp(x, y)

    print sorted(s, cmp=mycmp)

    It's a peculiar cmp function, I'm thinking still in what situations it
    can be useful.

    To use the key argument given a cmp function I use the simple code
    written by Hettinger:

    def cmp2key(mycmp):
    "Converts a cmp= function into a key= function"
    class K:
    def __init__(self, obj, *args):
    self.obj = obj
    def __cmp__(self, other):
    return mycmp(self.obj, other.obj)
    return K
    print sorted(s, key=cmp2key(myc mp))

    Now I'll look for simpler solutions...

    Bye,
    bearophile

    Comment

    • Alan G Isaac

      #3
      Re: Replacing cmp with key for sorting

      George Sakkis wrote:
      s.sort(cmp=lamb da x,y: 0 if x==y else
      -1 if x.startswith(y) else
      +1 if y.startswith(x) else
      cmp(x,y))


      Probably not what you had in mind ...
      >>s
      ['a', 'bc', 'bd', 'bcb', 'ba', 'ab']
      >>maxlen = max(len(si) for si in s)
      >>def k(si): return si+'z'*(maxlen-len(si))
      ...
      >>sorted(s,key= k)
      ['ab', 'a', 'ba', 'bcb', 'bc', 'bd']

      Cheers,
      Alan Isaac

      Comment

      • Arnaud Delobelle

        #4
        Re: Replacing cmp with key for sorting

        George Sakkis <george.sakkis@ gmail.comwrites :
        I want to sort sequences of strings lexicographical ly but those with
        longer prefix should come earlier, e.g. for s = ['a', 'bc', 'bd',
        'bcb', 'ba', 'ab'], the sorted sequence is ['ab', 'a', 'ba', 'bcb',
        'bc', 'bd']. Currently I do it with:
        >
        s.sort(cmp=lamb da x,y: 0 if x==y else
        -1 if x.startswith(y) else
        +1 if y.startswith(x) else
        cmp(x,y))
        >
        Can this be done with an equivalent key function instead of cmp ?
        Here's an idea:
        >>sorted(s, key=lambda x: x+'z'*(3-len(s)))
        ['ab', 'a', 'ba', 'bcb', 'bc', 'bd']

        The 3 above is the length of the longest string in the list

        Here's another idea, probably more practical:
        >>sorted(s, key=lambda x: tuple(256-ord(l) for l in x), reverse=True)
        ['ab', 'a', 'ba', 'bcb', 'bc', 'bd']

        HTH

        --
        Arnaud

        Comment

        • bearophileHUGS@lycos.com

          #5
          Re: Replacing cmp with key for sorting

          Alan G Isaac:
          Probably not what you had in mind ...
          ...
          >>maxlen = max(len(si) for si in s)
               >>def k(si): return si+'z'*(maxlen-len(si))
          This looks a little better:

          assert isinstance(s, str)
          sorted(s, key=lambda p: p.ljust(maxlen, "\255"))

          If the string is an unicode that may not work anymore.
          I don't know if there are better solutions.

          Bye,
          bearophile

          Comment

          • bearophileHUGS@lycos.com

            #6
            Re: Replacing cmp with key for sorting

            Arnaud Delobelle:
            Here's another idea, probably more practical:
            >sorted(s, key=lambda x: tuple(256-ord(l) for l in x), reverse=True)
            Nice.
            A variant that probably works with unicode strings too:

            print sorted(s, key=lambda x: [-ord(l) for l in x], reverse=True)

            Bye,
            bearophile

            Comment

            • Arnaud Delobelle

              #7
              Re: Replacing cmp with key for sorting

              bearophileHUGS@ lycos.com writes:
              Arnaud Delobelle:
              >Here's another idea, probably more practical:
              >>sorted(s, key=lambda x: tuple(256-ord(l) for l in x), reverse=True)
              >
              Nice.
              A variant that probably works with unicode strings too:
              >
              print sorted(s, key=lambda x: [-ord(l) for l in x], reverse=True)
              Of course that's better! (although mine will work with unicode if yours
              does). It's funny how the obvious escapes me so often. Still I think
              the idea of the 'double reverse' (one letterwise, the other listwise)
              was quite good.

              --
              Arnaud

              Comment

              • bearophileHUGS@lycos.com

                #8
                Re: Replacing cmp with key for sorting

                Arnaud Delobelle:
                It's funny how the obvious escapes me so often.
                In this case it's a well known cognitive effect: the mind of humans
                clings to first good/working solution, not allowing its final tuning.
                For that you may need to think about something else for a short time,
                and then look at your solution with a little "fresher" mind.

                This (ugly) translation into D + my functional-style libs shows why
                Python syntax is a good idea:

                import d.all;
                void main() {
                auto txt = "a bc bd bcb ba ab".split();
                putr( sorted(txt, (string s){ return map((char c){return -
                cast(int)c;}, s);} ).reverse );
                }

                Long Live To Python! :-)

                Bye,
                bearophile

                Comment

                • George Sakkis

                  #9
                  Re: Replacing cmp with key for sorting

                  On Nov 3, 1:51 pm, bearophileH...@ lycos.com wrote:
                  Arnaud Delobelle:
                  >
                  Here's another idea, probably more practical:
                  >>sorted(s, key=lambda x: tuple(256-ord(l) for l in x), reverse=True)
                  >
                  Nice.
                  A variant that probably works with unicode strings too:
                  >
                  print sorted(s, key=lambda x: [-ord(l) for l in x], reverse=True)
                  >
                  Bye,
                  bearophile
                  Awesome! I tested it on a sample list of ~61K words [1] and it's
                  almost 40% faster, from ~1.05s dropped to ~0.62s. That's still >15
                  times slower than the default sorting (0.04s) but I guess there's not
                  much more room for improvement.

                  George

                  [1] http://www.cs.pitt.edu/~kirk/cs1501/...ggle/5desk.txt

                  Comment

                  • bearophileHUGS@lycos.com

                    #10
                    Re: Replacing cmp with key for sorting

                    George Sakkis:
                    but I guess there's not much more room for improvement.
                    That's nonsense, Python is a high level language, so there's nearly
                    always room for improvement (even in programs written in assembly you
                    can generally find faster solutions).
                    If speed is what you look for, and your strings are ASCII then this is
                    much faster:

                    tab = "".join(map(chr , xrange(256)))[::-1]
                    s.sort(key=lamb da x: x.translate(tab ), reverse=True)

                    Bye,
                    bearophile

                    Comment

                    Working...