Re: Ranking algorithm

  • Juha Nieminen

    #1
    Re: Ranking algorithm

    Victor Bazarov wrote:
    > Start with an empty 'std::set' and after inserting another value, check
    > the size, and if it's greater than N, remove the first in the set.
    If N is very large, std::set can be a real memory hog.

    If a more memory-efficient algorithm is needed, then a heap can be
    used. (The exact same type of heap as used in heap sorting.) Simply make
    the element comparison so that the element with the lowest score ends up
    as the first element of the heap. When the heap grows one element larger
    than N, remove this first element.

    Inserting an element in the heap is O(lg n), and removing the first
    element is also O(lg n) (which is the reason why heap sort is
    O(n lg n).) This is the same complexity as with std::set, the difference
    being that the heap is much more memory-efficient: an array to store N+1
    elements is enough.
  • Victor Bazarov

    #2
    Re: Ranking algorithm

    Juha Nieminen wrote:
    Victor Bazarov wrote:
    >Start with an empty 'std::set' and after inserting another value, check
    >the size, and if it's greater than N, remove the first in the set.
    >
    > If N is very large, std::set can be a real memory hog.
    >
    > If a more memory-efficient algorithm is needed, then a heap can be
    > used. (The exact same type of heap as used in heap sorting.) Simply make
    > the element comparison so that the element with the lowest score ends up
    > as the first element of the heap. When the heap grows one element larger
    > than N, remove this first element.
    >
    > Inserting an element in the heap is O(lg n), and removing the first
    > element is also O(lg n) (which is the reason why heap sort is
    > O(n lg n).) This is the same complexity as with std::set, the difference
    > being that the heap is much more memory-efficient: an array to store N+1
    > elements is enough.
    Inserting in the middle would mean either reallocation or moving of the
    trailing elements, wouldn't it? That's why it will actually be slower than
    the set. Right?

    V
    --
    Please remove capital 'A's when replying by e-mail
    I do not respond to top-posted replies, please don't ask


    • Kai-Uwe Bux

      #3
      Re: Ranking algorithm

      Victor Bazarov wrote:
      Juha Nieminen wrote:
      >Victor Bazarov wrote:
      >>Start with an empty 'std::set' and after inserting another value, check
      >>the size, and if it's greater than N, remove the first in the set.
      >>
      > If N is very large, std::set can be a real memory hog.
      >>
      > If a more memory-efficient algorithm is needed, then a heap can be
      >used. (The exact same type of heap as used in heap sorting.) Simply make
      >the element comparison so that the element with the lowest score ends up
      >as the first element of the heap. When the heap grows one element larger
      >than N, remove this first element.
      >>
      > Inserting an element in the heap is O(lg n), and removing the first
      >element is also O(lg n) (which is the reason why heap sort is
      >O(n lg n).) This is the same complexity as with std::set, the difference
      >being that the heap is much more memory-efficient: an array to store N+1
      >elements is enough.
      >
      > Inserting in the middle would mean either reallocation or moving of the
      > trailing elements, wouldn't it?
      No. Juha is talking about heap operations, [25.3.6]. Insertion and deletion
      of an element means swapping at most log(N) elements.
      > That's why it will actually be slower than the set. Right?
      Depends on how costly 2*log(N) swaps are in comparison to allocation and
      deallocation of a node. The number of comparisons is the same.


      Best

      Kai-Uwe Bux


      • Martin Eisenberg

        #4
        Re: Ranking algorithm

        Juha Nieminen wrote:
        > (And in fact it might be possible to heapify an array which is
        > known to be sorted in O(n). I don't know this for sure, though.)
        A sorted array *is* a heap, no?


        Martin

        --
        Whatever you do will be insignificant,
        but it is very important that you do it.
        --Mahatma Gandhi


        • Kai-Uwe Bux

          #5
          Re: Ranking algorithm

          Martin Eisenberg wrote:
          > Juha Nieminen wrote:
          >>
          >> (And in fact it might be possible to heapify an array which is
          >> known to be sorted in O(n). I don't know this for sure, though.)
          >>
          > A sorted array *is* a heap, no?
          No. A heap satisfies the rule

          a[k] >= a[2k+1] and a[k] >= a[2k+2]

          provided the entries do exist. The functions std::make_heap and
          std::sort_heap convert between heaps and sorted sequences.


          Best

          Kai-Uwe Bux


          • Martin Eisenberg

            #6
            Re: Ranking algorithm

            Kai-Uwe Bux wrote:
            > Martin Eisenberg wrote:
            >>
            >> Juha Nieminen wrote:
            >>>
            >>> (And in fact it might be possible to heapify an array which is
            >>> known to be sorted in O(n). I don't know this for sure, though.)
            >>>
            >> A sorted array *is* a heap, no?
            >
            > No. A heap satisfies the rule
            >
            > a[k] >= a[2k+1] and a[k] >= a[2k+2]
            Either I'm being thick or you have overinterpreted my words. I do
            think that sortedness implies heapness (but not the converse).


            Martin

            --
            Let others praise ancient times;
            I am glad I was born in these.
            --Ovid


            • Stephen Horne

              #7
              Re: Ranking algorithm

              On 19 Oct 2008 23:24:52 GMT, Martin Eisenberg
              <martin.eisenberg@udo.edu> wrote:
              > Kai-Uwe Bux wrote:
              >>
              >> Martin Eisenberg wrote:
              >>>
              >>> Juha Nieminen wrote:
              >>>>
              >>>> (And in fact it might be possible to heapify an array which is
              >>>> known to be sorted in O(n). I don't know this for sure, though.)
              >>>>
              >>> A sorted array *is* a heap, no?
              >>
              >> No. A heap satisfies the rule
              >>
              >> a[k] >= a[2k+1] and a[k] >= a[2k+2]
              >
              > Either I'm being thick or you have overinterpreted my words. I do
              > think that sortedness implies heapness (but not the converse).
              Sorted array : 1, 2, 3, 4, 5, 6, 7

              Possible heap for same data : 7, 3, 6, 1, 2, 4, 5

              Valid because 7 >= 3, 7 >= 6,
              3 >= 1, 3 >= 2,
              6 >= 4, 6 >= 5

              Depends what you mean by "sorted" - a heap certainly has ordering
              criteria, but not the x[i] < x[i+1] or similar that "sorted" usually
              refers to.


              • Juha Nieminen

                #8
                Re: Ranking algorithm

                Stephen Horne wrote:
                > If the items are small (or pointers to the real items), you should be
                > more concerned about allocation overhead than about the item itself.
                > For example, assuming allocation rounded to the next 16 byte boundary,
                > the minimum for the above (32 bit pointers, item is pointer) is
                > slightly more than one chunk (the slightly more being the flag -
                > probably a single byte) giving a 15 byte overhead.
                I have studied a bit how the libc memory allocator works on 32-bit
                Linux, and it works like this:

                - Each allocation consumes the requested amount of memory plus 4
                bytes, rounded up to the closest 8-byte boundary.
                - The minimum allocation size is 16 bytes.

                Thus if you allocate, for example, 4 bytes of memory, 16 bytes of RAM
                will be consumed (because it's the minimum allocation size).
                If you allocate 16 bytes, 24 bytes of RAM will be consumed (your 16
                bytes, plus 4 bytes used by the allocator, rounded up to the next 8-byte
                boundary, which is 24). If you allocate 20 bytes, then also 24 bytes of
                RAM will be consumed (20 bytes requested + 4 bytes of ancillary data,
                rounded to 8-byte boundary, which is in this case also 24 bytes).

                Since an std::set element contains three pointers and a flag (most
                probably taking 4 bytes), plus the data itself, it will most probably
                take more than 16 bytes (even if the data itself was just 1 byte). The
                maximum overhead which you get this way is 8 bytes (4 bytes of ancillary
                data, plus possibly 4 bytes due to the 8-byte boundary).

                The libc allocator is not a big memory hog and it's not easy to beat
                it in memory consumption, at least not with a generic allocator.
                However, in many special cases it's relatively easy to beat it in speed
                with a custom allocator.


                • Stephen Horne

                  #9
                  Re: Ranking algorithm

                  On Mon, 20 Oct 2008 01:54:35 -0700 (PDT), James Kanze
                  <james.kanze@gmail.com> wrote:
                  >As Martin said, a sorted array is a heap, but not all heaps are
                  >sorted arrays.
                  I'm obviously having a phase of needing to be told everything
                  two/three times before it sinks in.

                  Thanks


                  • Stephen Horne

                    #10
                    Re: Ranking algorithm

                    On Mon, 20 Oct 2008 15:40:55 GMT, Juha Nieminen
                    <nospam@thanks.invalid> wrote:
                    > Stephen Horne wrote:
                    >> If the items are small (or pointers to the real items), you should be
                    >> more concerned about allocation overhead than about the item itself.
                    >> For example, assuming allocation rounded to the next 16 byte boundary,
                    >> the minimum for the above (32 bit pointers, item is pointer) is
                    >> slightly more than one chunk (the slightly more being the flag -
                    >> probably a single byte) giving a 15 byte overhead.
                    >
                    > I have studied a bit how the libc memory allocator works on 32-bit
                    > Linux, and it works like this:
                    Sorry - I didn't realise that Giuliano had specified which
                    platform/compiler/library he is using, and I wouldn't have known the
                    details for any specific library anyway.
                    > - Each allocation consumes the requested amount of memory plus 4
                    > bytes, rounded up to the closest 8-byte boundary.
                    Odd. Some features of x86 require 16-byte alignment, such as some SSE
                    stuff, IIRC. I thought all memory was required to be allocated such
                    that no alignment criteria can be violated.

                    No doubt I'm just plain wrong.
                    > The libc allocator is not a big memory hog and it's not easy to beat
                    > it in memory consumption, at least not with a generic allocator.
                    > However, in many special cases it's relatively easy to beat it in speed
                    > with a custom allocator.
                    Isn't "custom generic allocator" a contradiction in terms? Apply the
                    special cases both ways, and you'll find some easy space optimisations
                    too. Though in the real world, not easy enough, unless you have a very
                    good reason.


                    • Juha Nieminen

                      #11
                      Re: Ranking algorithm

                      Stephen Horne wrote:
                      >> The libc allocator is not a big memory hog and it's not easy to beat
                      >> it in memory consumption, at least not with a generic allocator.
                      >> However, in many special cases it's relatively easy to beat it in speed
                      >> with a custom allocator.
                      >
                      > Isn't "custom generic allocator" a contradiction in terms?
                      "Custom" as in self-made, ie. not the one provided by the compiler
                      (although it can be a wrapper around the compiler-provided one).

                      "Generic" as in "supports allocating blocks of any size".

                      The only way to beat the Linux libc memory allocator is to remove that
                      4-byte ancillary data from each allocated block. That's not an easy task
                      if you also want to match the speed.
