Re: Ranking algorithm

  • Juha Nieminen

    #1
    Re: Ranking algorithm

    Victor Bazarov wrote:
    > Start with an empty 'std::set' and after inserting another value, check
    > the size, and if it's greater than N, remove the first in the set.
    If N is very large, std::set can be a real memory hog.

    If a more memory-efficient algorithm is needed, then a heap can be
    used. (The exact same type of heap as used in heap sorting.) Simply make
    the element comparison so that the element with the lowest score ends up
    as the first element of the heap. When the heap grows one element larger
    than N, remove this first element.

    Inserting an element in the heap is O(lg n), and removing the first
    element is also O(lg n) (which is the reason why heap sort is
    O(n lg n).) This is the same complexity as with std::set, the difference
    being that the heap is much more memory-efficient: an array to store N+1
    elements is enough.
  • Victor Bazarov

    #2
    Re: Ranking algorithm

    Juha Nieminen wrote:
    Victor Bazarov wrote:
    >Start with an empty 'std::set' and after inserting another value, check
    >the size, and if it's greater than N, remove the first in the set.
    >
    > If N is very large, std::set can be a real memory hog.
    >
    > If a more memory-efficient algorithm is needed, then a heap can be
    > used. (The exact same type of heap as used in heap sorting.) Simply make
    > the element comparison so that the element with the lowest score ends up
    > as the first element of the heap. When the heap grows one element larger
    > than N, remove this first element.
    >
    > Inserting an element in the heap is O(lg n), and removing the first
    > element is also O(lg n) (which is the reason why heap sort is
    > O(n lg n).) This is the same complexity as with std::set, the difference
    > being that the heap is much more memory-efficient: an array to store N+1
    > elements is enough.
    Inserting in the middle would mean either reallocation or moving of the
    trailing elements, wouldn't it? That's why it will actually be slower than
    the set. Right?

    V
    --
    Please remove capital 'A's when replying by e-mail
    I do not respond to top-posted replies, please don't ask


    • Kai-Uwe Bux

      #3
      Re: Ranking algorithm

      Victor Bazarov wrote:
      Juha Nieminen wrote:
      >Victor Bazarov wrote:
      >>Start with an empty 'std::set' and after inserting another value, check
      >>the size, and if it's greater than N, remove the first in the set.
      >>
      > If N is very large, std::set can be a real memory hog.
      >>
      > If a more memory-efficient algorithm is needed, then a heap can be
      >used. (The exact same type of heap as used in heap sorting.) Simply make
      >the element comparison so that the element with the lowest score ends up
      >as the first element of the heap. When the heap grows one element larger
      >than N, remove this first element.
      >>
      > Inserting an element in the heap is O(lg n), and removing the first
      >element is also O(lg n) (which is the reason why heap sort is
      >O(n lg n).) This is the same complexity as with std::set, the difference
      >being that the heap is much more memory-efficient: an array to store N+1
      >elements is enough.
      >
      > Inserting in the middle would mean either reallocation or moving of the
      > trailing elements, wouldn't it?
      No. Juha is talking about heap operations, [25.3.6]. Insertion and deletion
      of an element means swapping at most log(N) elements.
      > That's why it will actually be slower than the set. Right?
      Depends on how costly 2*log(N) swaps are in comparison to allocation and
      deallocation of a node. The number of comparisons is the same.


      Best

      Kai-Uwe Bux


      • Martin Eisenberg

        #4
        Re: Ranking algorithm

        Juha Nieminen wrote:
        > (And in fact it might be possible to heapify an array which is
        > known to be sorted in O(n). I don't know this for sure, though.)
        A sorted array *is* a heap, no?


        Martin

        --
        Whatever you do will be insignificant,
        but it is very important that you do it.
        --Mahatma Gandhi


        • Kai-Uwe Bux

          #5
          Re: Ranking algorithm

          Martin Eisenberg wrote:
          > Juha Nieminen wrote:
          >>
          >> (And in fact it might be possible to heapify an array which is
          >> known to be sorted in O(n). I don't know this for sure, though.)
          >>
          > A sorted array *is* a heap, no?
          No. A heap satisfies the rule

          a[k] >= a[2k+1] and a[k] >= a[2k+2]

          provided the entries do exist. The functions std::make_heap and
          std::sort_heap convert between heaps and sorted sequences.


          Best

          Kai-Uwe Bux


          • Martin Eisenberg

            #6
            Re: Ranking algorithm

            Kai-Uwe Bux wrote:
            > Martin Eisenberg wrote:
            >>
            >> Juha Nieminen wrote:
            >>>
            >>> (And in fact it might be possible to heapify an array which is
            >>> known to be sorted in O(n). I don't know this for sure, though.)
            >>>
            >> A sorted array *is* a heap, no?
            >
            > No. A heap satisfies the rule
            >
            > a[k] >= a[2k+1] and a[k] >= a[2k+2]
            Either I'm being thick or you have overinterpreted my words. I do
            think that sortedness implies heapness (but not the converse).


            Martin

            --
            Let others praise ancient times;
            I am glad I was born in these.
            --Ovid


            • Stephen Horne

              #7
              Re: Ranking algorithm

              On 19 Oct 2008 23:24:52 GMT, Martin Eisenberg
              <martin.eisenberg@udo.edu> wrote:
              > Kai-Uwe Bux wrote:
              >>
              >> Martin Eisenberg wrote:
              >>>
              >>> Juha Nieminen wrote:
              >>>>
              >>>> (And in fact it might be possible to heapify an array which is
              >>>> known to be sorted in O(n). I don't know this for sure, though.)
              >>>>
              >>> A sorted array *is* a heap, no?
              >>
              >> No. A heap satisfies the rule
              >>
              >> a[k] >= a[2k+1] and a[k] >= a[2k+2]
              >
              > Either I'm being thick or you have overinterpreted my words. I do
              > think that sortedness implies heapness (but not the converse).
              Sorted array : 1, 2, 3, 4, 5, 6, 7

              Possible heap for same data : 7, 3, 6, 1, 2, 4, 5

              Valid because 7 >= 3, 7 >= 6,
              3 >= 1, 3 >= 2,
              6 >= 4, 6 >= 5

              Depends what you mean by "sorted" - a heap certainly has ordering
              criteria, but not the x[i] < x[i+1] or similar that "sorted" usually
              refers to.


              • Juha Nieminen

                #8
                Re: Ranking algorithm

                Stephen Horne wrote:
                > If the items are small (or pointers to the real items), you should be
                > more concerned about allocation overhead than about the item itself.
                > For example, assuming allocation rounded to the next 16 byte boundary,
                > the minimum for the above (32 bit pointers, item is pointer) is
                > slightly more than one chunk (the slightly more being the flag -
                > probably a single byte) giving a 15 byte overhead.
                I have studied a bit how the libc memory allocator works on 32-bit
                Linux, and it works like this:

                - Each allocation consumes the requested amount of memory plus 4
                bytes, rounded up to the closest 8-byte boundary.
                - The minimum allocation size is 16 bytes.

                Thus if you allocate, for example, 4 bytes of memory, 16 bytes of RAM
                will be consumed (because it's the minimum allocation size).
                If you allocate 16 bytes, 24 bytes of RAM will be consumed (your 16
                bytes, plus 4 bytes used by the allocator, rounded up to the next 8-byte
                boundary, which is 24). If you allocate 20 bytes, then also 24 bytes of
                RAM will be consumed (20 bytes requested + 4 bytes of ancillary data,
                rounded to 8-byte boundary, which is in this case also 24 bytes).

                Since an std::set element contains three pointers and a flag (most
                probably taking 4 bytes), plus the data itself, it will most probably
                take more than 16 bytes (even if the data itself was just 1 byte). The
                maximum overhead which you get this way is 8 bytes (4 bytes of ancillary
                data, plus possibly 4 bytes due to the 8-byte boundary).

                The libc allocator is not a big memory hog and it's not easy to beat
                it in memory consumption, at least not with a generic allocator.
                However, in many special cases it's relatively easy to beat it in speed
                with a custom allocator.


                • Stephen Horne

                  #9
                  Re: Ranking algorithm

                  On Mon, 20 Oct 2008 01:54:35 -0700 (PDT), James Kanze
                  <james.kanze@gmail.com> wrote:
                  >As Martin said, a sorted array is a heap, but not all heaps are
                  >sorted arrays.
                  I'm obviously having a phase of needing to be told everything
                  two/three times before it sinks in.

                  Thanks


                  • Stephen Horne

                    #10
                    Re: Ranking algorithm

                    On Mon, 20 Oct 2008 15:40:55 GMT, Juha Nieminen
                    <nospam@thanks.invalid> wrote:
                    > Stephen Horne wrote:
                    >> If the items are small (or pointers to the real items), you should be
                    >> more concerned about allocation overhead than about the item itself.
                    >> For example, assuming allocation rounded to the next 16 byte boundary,
                    >> the minimum for the above (32 bit pointers, item is pointer) is
                    >> slightly more than one chunk (the slightly more being the flag -
                    >> probably a single byte) giving a 15 byte overhead.
                    >
                    > I have studied a bit how the libc memory allocator works on 32-bit
                    > Linux, and it works like this:
                    Sorry - I didn't realise that Giuliano had specified which
                    platform/compiler/library he is using, and I wouldn't have known the
                    details for any specific library anyway.
                    > - Each allocation consumes the requested amount of memory plus 4
                    > bytes, rounded up to the closest 8-byte boundary.
                    Odd. Some features of x86 require 16-byte alignment, such as some SSE
                    stuff, IIRC. I thought all memory was required to be allocated such
                    that no alignment criteria can be violated.

                    No doubt I'm just plain wrong.
                    > The libc allocator is not a big memory hog and it's not easy to beat
                    > it in memory consumption, at least not with a generic allocator.
                    > However, in many special cases it's relatively easy to beat it in speed
                    > with a custom allocator.
                    Isn't "custom generic allocator" a contradiction in terms? Apply the
                    special cases both ways, and you'll find some easy space optimisations
                    too. Though in the real world, not easy enough, unless you have a very
                    good reason.


                    • Juha Nieminen

                      #11
                      Re: Ranking algorithm

                      Stephen Horne wrote:
                      >> The libc allocator is not a big memory hog and it's not easy to beat
                      >> it in memory consumption, at least not with a generic allocator.
                      >> However, in many special cases it's relatively easy to beat it in speed
                      >> with a custom allocator.
                      >
                      > Isn't "custom generic allocator" a contradiction in terms?
                      "Custom" as in self-made, ie. not the one provided by the compiler
                      (although it can be a wrapper around the compiler-provided one).

                      "Generic" as in "supports allocating blocks of any size".

                      The only way to beat the Linux libc memory allocator is to remove that
                      4-byte ancillary data from each allocated block. That's not an easy task
                      if you also want to match the speed.
