outline-style sorting algorithm

  • jwsacksteder@ramprecision.com

    outline-style sorting algorithm

    I have a need to sort a list of elements that represent sections of a
    document in dot-separated notation. The built in sort does the wrong thing.
    This seems a rather complex problem and I was hoping someone smarter than me
    had already worked out the best way to approach this. For example, consider
    the following list-
    >>> foo
    ['1.0', '1.0.1', '1.1.1', '1.2', '1.9', '1.10', '1.11', '1.20', '1.20.1',
    '1.30']
    >>> foo.sort()
    >>> foo
    ['1.0', '1.0.1', '1.1.1', '1.10', '1.11', '1.2', '1.20', '1.20.1', '1.30',
    '1.9']

    Obviously 1.20.1 should be after 1.9 if we look at this as dot-delimited
    integers, not as decimal numbers.

    Does anyone have pointers to existing code?








  • Max M

    #2
    Re: outline-style sorting algorithm

    jwsacksteder@ramprecision.com wrote:
    > I have a need to sort a list of elements that represent sections of a
    > document in dot-separated notation. The built in sort does the wrong
    > thing.

    Not really. You are giving it a list of strings, and it sorts those
    alphabetically. That seems like the right thing to me ;-)

    > This seems a rather complex problem and I was hoping someone smarter
    > than me had already worked out the best way to approach this. For
    > example, consider the following list-
    >
    > >>> foo
    > ['1.0', '1.0.1', '1.1.1', '1.2', '1.9', '1.10', '1.11', '1.20', '1.20.1',
    > '1.30']

    You need to convert them to another datatype first. Your best bet here
    would be a list or a tuple, as they can map directly to your data.

    '1.0.1'.split('.') == [1,0,1]

    But lists are a bit easier here.

    foo_as_tuples = [f.split('.') for f in foo]
    foo_as_tuples.sort()

    Then you must convert it back to strings again.

    foo = ['.'.join(f) for f in foo_as_tuples]

    There is a standard way of sorting quickly in Python, called
    decorate-sort-undecorate. It is almost the same example as before:

    decorated = [(itm.split('.'), itm) for itm in foo]
    decorated.sort()
    foo = [d[-1] for d in decorated]
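
For readers trying the decorate-sort-undecorate idea today, here is a minimal sketch written for Python 3. It makes one assumption that goes beyond the snippet above: the decoration converts every dot-separated component to an int, so that the tuples compare numerically. The name `result` is only illustrative.

```python
# Decorate-sort-undecorate, written for Python 3.
# Assumption: every dot-separated component is a non-negative integer.
foo = ['1.0', '1.0.1', '1.1.1', '1.10', '1.11', '1.2', '1.20', '1.20.1',
       '1.30', '1.9']

# Decorate: pair each string with a tuple of its integer components.
decorated = [(tuple(int(part) for part in item.split('.')), item)
             for item in foo]

# Sort: tuples compare element by element, so (1, 9) sorts before (1, 10).
decorated.sort()

# Undecorate: keep only the original string from each pair.
result = [item for _, item in decorated]
print(result)
```

Keeping the original string as the last element of each pair means nothing has to be re-joined after sorting.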

    regards Max M

    • Eric Brunel

      #3
      Re: outline-style sorting algorithm

      Max M wrote:
      > jwsacksteder@ramprecision.com wrote:
      > > I have a need to sort a list of elements that represent sections of a
      > > document in dot-separated notation. The built in sort does the wrong
      > > thing.
      >
      > Not really. You are giving it a list of strings, and it sorts those
      > alphabetically. That seems like the right thing to me ;-)
      >
      > > This seems a rather complex problem and I was hoping someone smarter
      > > than me had already worked out the best way to approach this. For
      > > example, consider the following list-
      > >
      > > >>> foo
      > > ['1.0', '1.0.1', '1.1.1', '1.2', '1.9', '1.10', '1.11', '1.20',
      > > '1.20.1', '1.30']
      >
      > You need to convert them to another datatype first. Your best bet here
      > would be a list or a tuple, as they can map directly to your data.
      >
      > '1.0.1'.split('.') == [1,0,1]
      [snip]

      Nope:
      '1.0.1'.split('.') == ['1', '0', '1']

      So the ints are still represented as strings and it does not solve the OP's
      problem:
      >>> foo = ['1.0', '1.0.1', '1.1.1', '1.2', '1.9', '1.10', '1.11', '1.20',
      ... '1.20.1', '1.30']
      >>> bar = [x.split('.') for x in foo]
      >>> bar
      [['1', '0'], ['1', '0', '1'], ['1', '1', '1'], ['1', '2'], ['1', '9'], ['1',
      '10'], ['1', '11'], ['1', '20'], ['1', '20', '1'], ['1', '30']]
      >>> bar.sort()
      >>> bar
      [['1', '0'], ['1', '0', '1'], ['1', '1', '1'], ['1', '10'], ['1', '11'], ['1',
      '2'], ['1', '20'], ['1', '20', '1'], ['1', '30'], ['1', '9']]

      And the 1.20.something are still before 1.9
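
The reason is visible in a few one-line comparisons. This is a small illustrative sketch (the variable names are made up for the example), runnable in Python 3:

```python
# Strings are compared character by character, integers by value.
string_cmp = '10' < '9'    # True, because '1' sorts before '9'
int_cmp = 10 < 9           # False

# Lists compare element by element, so the element type decides the order.
string_parts_cmp = ['1', '10'] < ['1', '9']   # True: '1.10' lands before '1.9'
int_parts_cmp = [1, 10] < [1, 9]              # False: 1.10 lands after 1.9

print(string_cmp, int_cmp, string_parts_cmp, int_parts_cmp)
```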

      What you need to do is explicitly convert the strings in the list resulting
      from the split to integers. Here is the shortest way to do it:

      >>> bar = [map(int, x.split('.')) for x in foo]
      >>> bar
      [[1, 0], [1, 0, 1], [1, 1, 1], [1, 2], [1, 9], [1, 10], [1, 11], [1, 20], [1,
      20, 1], [1, 30]]

      Now you can sort:
      >>> bar.sort()
      >>> bar
      [[1, 0], [1, 0, 1], [1, 1, 1], [1, 2], [1, 9], [1, 10], [1, 11], [1, 20], [1,
      20, 1], [1, 30]]

      Hooray! We've got the result we wanted. Now, convert the integers back to
      strings and re-join everything:

      >>> foo = ['.'.join(map(str, x)) for x in bar]
      >>> foo
      ['1.0', '1.0.1', '1.1.1', '1.2', '1.9', '1.10', '1.11', '1.20', '1.20.1', '1.30']

      That's what we expected...

      HTH
      --
      - Eric Brunel <eric (underscore) brunel (at) despammed (dot) com> -
      PragmaDev : Real Time Software Development Tools - http://www.pragmadev.com

        • wes weston

          #5
          Re: outline-style sorting algorithm

          jwsacksteder@ramprecision.com wrote:
          > I have a need to sort a list of elements that represent sections of a
          > document in dot-separated notation. The built in sort does the wrong thing.
          > This seems a rather complex problem and I was hoping someone smarter than me
          > had already worked out the best way to approach this. For example, consider
          > the following list-
          >
          > >>> foo
          > ['1.0', '1.0.1', '1.1.1', '1.2', '1.9', '1.10', '1.11', '1.20', '1.20.1',
          > '1.30']
          > >>> foo.sort()
          > >>> foo
          > ['1.0', '1.0.1', '1.1.1', '1.10', '1.11', '1.2', '1.20', '1.20.1', '1.30',
          > '1.9']
          >
          > Obviously 1.20.1 should be after 1.9 if we look at this as dot-delimited
          > integers, not as decimal numbers.
          >
          > Does anyone have pointers to existing code?

          list = ['1.0', '1.0.1', '1.1.1', '1.2', '1.9', '1.10', '1.11', '1.20',
                  '1.20.1', '1.30']

          def parse(str):
              list = str.split('.')
              if len(list) > 0:
                  x1 = int(list[0])
              else:
                  x1 = -1
              if len(list) > 1:
                  x2 = int(list[1])
              else:
                  x2 = -1
              if len(list) > 2:
                  x3 = int(list[2])
              else:
                  x3 = -1
              return x1,x2,x3

          def cmp(x1,x2):
              w1 = parse(x1)
              w2 = parse(x2)
              if w1[0] < w2[0]:
                  return -1
              if w1[0] > w2[0]:
                  return 1

              if w1[1] < w2[1]:
                  return -1
              if w1[1] > w2[1]:
                  return 1

              if w1[2] < w2[2]:
                  return -1
              if w1[2] > w2[2]:
                  return 1

              return 0

          #---------------------------
          if __name__ == "__main__":
              list.sort(cmp)
              for x in list:
                  print x
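
The comparison-function approach above handles at most three components and relies on Python 2's `list.sort(cmp)`. As a hedged sketch only, the same idea can be written for Python 3 with `functools.cmp_to_key` and extended to any depth. The names `parse_section` and `compare_sections` are hypothetical, and the extra entry `'1.20.1.3'` is added purely to exercise a fourth level.

```python
from functools import cmp_to_key

def parse_section(label):
    """Return the dot-separated components of label as a list of ints."""
    return [int(part) for part in label.split('.')]

def compare_sections(a, b):
    """Return -1, 0 or 1, comparing the labels component by component."""
    pa, pb = parse_section(a), parse_section(b)
    for x, y in zip(pa, pb):
        if x != y:
            return -1 if x < y else 1
    # All shared components are equal: the shorter label sorts first,
    # which matches the effect of the -1 padding in the code above.
    return (len(pa) > len(pb)) - (len(pa) < len(pb))

sections = ['1.0', '1.0.1', '1.1.1', '1.10', '1.11', '1.2', '1.20',
            '1.20.1', '1.30', '1.9', '1.20.1.3']
ordered = sorted(sections, key=cmp_to_key(compare_sections))
for label in ordered:
    print(label)
```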

          • Thorsten Kampe

            #6
            Re: outline-style sorting algorithm

            * jwsacksteder@ramprecision.com (2004-04-19 15:08 +0100)
            > I have a need to sort a list of elements that represent sections of a
            > document in dot-separated notation. The built in sort does the wrong thing.
            > This seems a rather complex problem and I was hoping someone smarter than me
            > had already worked out the best way to approach this. For example, consider
            > the following list-
            >
            > >>> foo
            > ['1.0', '1.0.1', '1.1.1', '1.2', '1.9', '1.10', '1.11', '1.20', '1.20.1',
            > '1.30']
            > >>> foo.sort()
            > >>> foo
            > ['1.0', '1.0.1', '1.1.1', '1.10', '1.11', '1.2', '1.20', '1.20.1', '1.30',
            > '1.9']
            >
            > Obviously 1.20.1 should be after 1.9 if we look at this as dot-delimited
            > integers, not as decimal numbers.

            You need some general approach to avoid the DSU thing:

            def funcsort(seq, func):
                """ sort seq by func(item) """
                seq = seq[:]
                seq.sort(lambda x, y: cmp(func(x), func(y)))
                return seq

            funcsort(foo, lambda x: map(int, x.split('.')))
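
A note for anyone running this on Python 3: the built-in `cmp` and the comparison-function argument to `sort` no longer exist there, and `map` returns an iterator that cannot be ordered. A rough equivalent of `funcsort`, assuming only the standard `key` argument of `sorted`, might look like this (`result` is an illustrative name):

```python
def funcsort(seq, func):
    """Return a new list with the items of seq sorted by func(item)."""
    # sorted() copies the input, so the caller's list is left untouched.
    return sorted(seq, key=func)

foo = ['1.0', '1.0.1', '1.1.1', '1.10', '1.11', '1.2', '1.20', '1.20.1',
       '1.30', '1.9']

# Build a list of ints explicitly; a bare map object is not comparable.
result = funcsort(foo, lambda x: [int(part) for part in x.split('.')])
print(result)
```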


            Thorsten
