JosAH · Recognized Expert MVP · Mar 2007

    Old tricks

    Greetings,

    Introduction

    This week's tip describes a few old tricks that are almost forgotten by most
    people around here. Sometimes there's no need for these tricks anymore because
    processors nowadays are so fast and memory comes in abundance. But still, if
    we implement an algorithm that is better, or more efficient, than another one,
    those faster processors run the first algorithm faster than the other one. If
    an algorithm takes less memory to run, we can run larger instances of a problem
    to solve in our larger memory computers.

    Larger and faster computers are still no excuse to implement slow and/or memory
    hogging algorithms. The next paragraphs describe two little problems and the
    way they can be solved. The first paragraph deals with speed and the following
    paragraph deals with memory requirements without sacrificing speed.

    Bit counting

    For a plethora of reasons people want to know how many bits are set in a byte
    or word. A naive approach to count those bits in an int is the following:

    [code=java]
    public int count1(int word) {
        int ones= 0; // the number of one-bits in the 'word'
        for (int mask= 0x00000001; mask != 0; mask<<= 1)
            if ((word&mask) != 0) ones++;
        return ones;
    }
    [/code]

    This is a very simple approach: we check every single bit using the 'mask'
    variable; if the corresponding bit in the word is set we increment the 'ones'
    counter. The method returns the 'ones' variable when the loop has finished.

    No matter the value of the word, the loop loops 32 times because an int contains
    32 bits and all need to be checked.

    We can improve on this a bit: when a one bit has been found, we set it back to
    zero again. If no one bits remain in the word variable we're done. This is how
    it can be implemented:

    [code=java]
    public int count2(int word) {
        int ones= 0; // the number of one-bits in the 'word'
        for (int mask= 0x00000001; word != 0; mask<<= 1)
            if ((word&mask) != 0) {
                word&= ~mask; // remove the bit
                ones++;
            }
        return ones;
    }
    [/code]

    The loop now stops when there are no more bits in the word set to one. The count2
    method certainly runs more efficiently than the previous count1 method. How much
    more efficiently exactly? If we consider a random 32 bit pattern, 50% of those
    bit patterns have their leftmost bit set to one (check it). So for 50% of all the
    possible values of 'word', count2 doesn't run any more efficiently than count1.

    For 2^31 possible values count2 needs 32 iterations, for 2^30 possible values
    count2 needs only 31 iterations, and so on, all the way down to the 2^0 == 1
    value (the value 1) that needs a single iteration. The one remaining value, 0,
    needs zero iterations because the loop body is skipped entirely.

    Mathematically speaking, on average the count2 method needs:

    sum(i= 0, 31: 2^i*(i+1))/2^32 ≈ 31 iterations.
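
    As a quick sanity check (a small sketch of mine, not part of the original tip),
    we can evaluate that sum directly:

    [code=java]
    // 2^i values have their highest one-bit at position i and need i+1 iterations
    long sum= 0;
    for (int i= 0; i < 32; i++)
        sum+= (1L<<i)*(i+1);
    System.out.println((double)sum/(1L<<32)); // prints approximately 31.0
    [/code]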

    Big deal, on average we're saving one single iteration. We must be able to do
    better than this. Have a look at the following notion:

    For every bit pattern with at least one bit set to one, we can represent the
    bit pattern as follows:

    xxx ... xxxx1000 ... 0000

    There are bits 'xxx' to the left of the 1 bit and zero or more 0 bits to the right
    of the 1 bit.

    If we subtract 1 from that bit pattern we obtain the bit pattern:

    xxx ... xxxx0111 ... 1111

    If we bitwise 'and' both bit patterns we get the bit pattern:

    xxx ... xxxx0000 ... 0000

    We have effectively removed the rightmost 1 bit from the original bit pattern.
    We have done this:

    [code=java]
    pattern&= pattern-1;
    [/code]
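
    A concrete example (my own, for illustration) with the pattern 0x68:

    [code=java]
    int pattern= 0x68;             // binary 0110 1000
    int minus1 = pattern-1;        // 0x67:  0110 0111
    int result = pattern&minus1;   // 0x60:  0110 0000, rightmost 1 bit removed
    [/code]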

    We can use this notion in our count3 method:

    [code=java]
    public int count3(int word) {
        int ones= 0; // the number of one-bits in the 'word'
        for (; word != 0; ones++)
            word&= word-1;
        return ones;
    }
    [/code]

    Every pass through the loop the rightmost 1 bit is removed from 'word' until 'word'
    equals zero. On average, a 32 bit word has 16 bits set to 1, so this count3 method
    certainly does better than the count1 and count2 methods: count2 wasn't much
    better than count1, but count3 takes only half of the loop passes.

    Can we do better than this? Sure, mathematically speaking we can: we simply build
    a table with 2^32 elements such that table[i] == j if the bit pattern i contains
    j bits set to 1. Sure, mathematicians are loonies: we can't build a table or array
    with 2^32 elements; even nowadays we need humongous amounts of memory for that.

    We can build such a table for the 256 byte values though; the table would look
    something like this:

    [code=java]
    byte[] table= { 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4 ... 8 };
    [/code]
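
    Writing out all 256 values by hand is tedious; as a sketch (my addition, not
    from the original tip), the table can be built once using the count3 trick:

    [code=java]
    byte[] table= new byte[256];
    for (int i= 0; i < 256; i++) {
        int ones= 0;
        for (int w= i; w != 0; ones++)
            w&= w-1; // remove the rightmost 1 bit
        table[i]= (byte)ones;
    }
    [/code]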

    We could use this table like this:

    [code=java]
    public int count4(int word) {
        int ones= 0;
        // 'byte' is a reserved word in Java, so the loop counter is named 'b'
        for (int b= 0; b < 4; b++) {
            ones+= table[word&0xff];
            word>>>= 8;
        }
        return ones;
    }
    [/code]
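
    Assuming the table above is filled in completely: count4(0xff) returns 8 and
    count4(0x0f0f0f0f) returns 16, using four loop passes either way.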

    This method simply counts all the one-bits in every byte (there are four bytes
    in an int using Java) so this algorithm takes only four loop passes ever. But
    it does need a 256 byte table. Can we do without such a table and still do better
    than the count3 method?

    We need 2 bits to store the number of ones in a two bit wide word. There are
    at most 2 bits equal to 1 in a two bits wide word and the value 2 needs two bits
    by itself. We only need 6 bits to store a value in the range [0, 32]. At most
    32 bits can be equal to one in a 32 bits wide word.

    How can we find the number of bits set to one in a two bit wide word? Easy:

    [code=java]
    int word; // just two bits in here, zero or one
    int ones= (word&1)+(word>>>1);
    [/code]
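
    For the four possible values this yields: 0 -> 0, 1 -> 1, 2 -> 1 and 3 -> 2;
    exactly the number of one-bits in each two bit word.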

    But we can use that little trick in parallel; for a 32 bits wide word we can
    store 16 two bit counters for the 16 two bit words that make up the single 32
    bit word:

    [code=java]
    int word; // 32 bits wide word interpreted as 16 two bit words;
    int ones= (word&0x55555555)+((word&0xaaaaaaaa)>>>1);
    [/code]

    The 'ones' int now contains sixteen small two bit values representing the number
    of bits set to one if we interpret the 32 bit word value as 16 two bit small words.

    All we need to do now is add those 16 two bit values in the 'ones' int and we
    have the number of bits set to 1 in the word int.

    We can add the 16 two bit values stored in one 32 bit word using the same trick:
    instead of the previous masks we use 0x33333333 and 0xcccccccc to mask out the
    16 two bit values and add them together pairwise; we have 4 bits to store each sum.

    But then again we can repeat this little procedure over and over again:

    [code=java]
    public int count5(int word) {
        word= (word&0x55555555)+((word&0xaaaaaaaa)>>> 1); // sum 1 bits
        word= (word&0x33333333)+((word&0xcccccccc)>>> 2); // sum 2 bits
        word= (word&0x0f0f0f0f)+((word&0xf0f0f0f0)>>> 4); // sum 4 bits
        word= (word&0x00ff00ff)+((word&0xff00ff00)>>> 8); // sum 8 bits
        word= (word&0x0000ffff)+((word&0xffff0000)>>>16); // sum 16 bits

        return word;
    }
    [/code]
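
    As a quick check (a sketch of mine; Integer.bitCount has been part of the
    standard library since Java 1.5), count5 agrees with the built-in bit counter:

    [code=java]
    int[] samples= { 0, 1, 0xff, 0x0f0f0f0f, 0xffffffff, 123456789 };
    for (int s : samples) // both methods print the same value for every sample
        System.out.println(count5(s)+" == "+Integer.bitCount(s));
    [/code]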

    This method doesn't use a loop anymore: it uses five steps and a table in disguise; let's see what we've got now:


    • count1: steps: 32
    • count2: steps: 31 on average
    • count3: steps: 16 on average
    • count4: steps: 4 + 256 byte explicit table
    • count5: steps: 5 + 40 byte implicit table
    Now let's have a look at what happens when words are n bits wide; the table will
    look like this then:
    • count1: steps: n
    • count2: steps: n-1 on average
    • count3: steps: n/2 on average
    • count4: steps: n/8 + 256 byte explicit table
    • count5: steps: log2(n) + n/8*2*log2(n) byte implicit table
    When n gets larger and larger (and it will) the last algorithm's number of steps
    grows only logarithmically with n, while the number of steps of the first four
    algorithms grows linearly with n; pick your choice.
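
    As an aside (a sketch of mine, not from the original tip; the name count6 is
    made up): extending count5 to Java's 64 bit long type takes exactly one extra
    step, log2(64) == 6 steps in total:

    [code=java]
    public int count6(long word) {
        word= (word&0x5555555555555555L)+((word&0xaaaaaaaaaaaaaaaaL)>>> 1);
        word= (word&0x3333333333333333L)+((word&0xccccccccccccccccL)>>> 2);
        word= (word&0x0f0f0f0f0f0f0f0fL)+((word&0xf0f0f0f0f0f0f0f0L)>>> 4);
        word= (word&0x00ff00ff00ff00ffL)+((word&0xff00ff00ff00ff00L)>>> 8);
        word= (word&0x0000ffff0000ffffL)+((word&0xffff0000ffff0000L)>>>16);
        word= (word&0x00000000ffffffffL)+((word&0xffffffff00000000L)>>>32);
        return (int)word;
    }
    [/code]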

    Memory manipulation

    We like to move memory around almost continuously; it seems as if we can't make
    up our mind where we want our data to be stored and we keep on dragging those
    bytes around over and over again. For example: I wrote the first paragraph of
    this article after I wrote this second paragraph.

    Effectively I moved the second paragraph further down in memory to make room for
    the first paragraph. And now I'm appending more data to the rest of the text.

    Suppose I have a sequence of memory cells; for the examples I use plain text but
    for the development of the algorithms I'll use a more abstract notion: memory
    consists of cells that are indexed using the index numbers 0, 1, 2, ..., n.

    The first thing I want to be able to do is to swap the content of two cells i and j,
    so this hypothetical memory implements the following interface:

    [code=java]
    public interface Swapeable {
        public void swap(int i, int j);
    }
    [/code]
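
    For concreteness, here's a minimal sketch of an implementation backed by a char
    array (the class name CharMemory is mine, not from this tip):

    [code=java]
    public class CharMemory implements Swapeable {
        private final char[] cells;

        public CharMemory(String s) { cells= s.toCharArray(); }

        public void swap(int i, int j) {
            char tmp= cells[i]; cells[i]= cells[j]; cells[j]= tmp;
        }

        public String toString() { return new String(cells); }
    }
    [/code]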

    Given this simple interface I can reverse chunks of memory, e.g. if this is the
    memory content with two index values i and j:

    Code:
    +---+---+---+---+---+---+---+---+---+---+---+---+
    | H | e | l | l | o |   | W | o | r | l | d | ! |
    +---+---+---+---+---+---+---+---+---+---+---+---+
                  ^                   ^
                  |                   |
                  i                   j
    The following method reverses the content of the memory between index values
    i (inclusive) and j (exclusive):

    [code=java]
    public void reverse(Swapeable s, int i, int j) {
        for (; --j > i; i++)
            s.swap(i, j);
    }
    [/code]
    And the memory content will look like this afterwards:

    Code:
    +---+---+---+---+---+---+---+---+---+---+---+---+
    | H | e | l | o | W |   | o | l | r | l | d | ! |
    +---+---+---+---+---+---+---+---+---+---+---+---+
                  ^                   ^
                  |                   |
                  i                   j
    Well, big deal; that's CS101 and (almost?) everybody can do that. True, but there's
    more we can do using just that simple reverse() method, and not so many people
    realize that fact.

    Suppose I reverse this part:

    Code:
    +---+---+---+---+---+---+---+---+---+---+---+---+
    | H | e | l | l | o |   | W | o | r | l | d | ! |
    +---+---+---+---+---+---+---+---+---+---+---+---+
      ^                                           ^
      |                                           |
      i                                           j
    The result is:

    Code:
    +---+---+---+---+---+---+---+---+---+---+---+---+
    | d | l | r | o | W |   | o | l | l | e | H | ! |
    +---+---+---+---+---+---+---+---+---+---+---+---+
      ^                                           ^
      |                                           |
      i                                           j
    Next I reverse this part:

    Code:
    +---+---+---+---+---+---+---+---+---+---+---+---+
    | d | l | r | o | W |   | o | l | l | e | H | ! |
    +---+---+---+---+---+---+---+---+---+---+---+---+
      ^                   ^
      |                   |
      i                   j
    The result is:

    Code:
    +---+---+---+---+---+---+---+---+---+---+---+---+
    | W | o | r | l | d |   | o | l | l | e | H | ! |
    +---+---+---+---+---+---+---+---+---+---+---+---+
      ^                   ^
      |                   |
      i                   j
    And finally I reverse this part:

    Code:
    +---+---+---+---+---+---+---+---+---+---+---+---+
    | W | o | r | l | d |   | o | l | l | e | H | ! |
    +---+---+---+---+---+---+---+---+---+---+---+---+
                              ^                   ^
                              |                   |
                              i                   j
    Which leads to this:

    Code:
    +---+---+---+---+---+---+---+---+---+---+---+---+
    | W | o | r | l | d |   | H | e | l | l | o | ! |
    +---+---+---+---+---+---+---+---+---+---+---+---+
                              ^                   ^
                              |                   |
                              i                   j
    I have just interchanged two blocks of memory; this was the original content
    of the memory: B1 B2 B3 where B1 == Hello, B2 == <space> and B3 == World. The
    result is B3 B2 B1. The '!' sign wasn't used nor touched. If block B2 had
    contained more than one element I would have had to reverse that block as well.

    Even more important: no additional temporary memory was used except for the single
    memory cell used by the swap() method itself. In the early days, when memory was
    measured in kilobytes or even single bytes instead of gigabytes, this little trick
    was a precious little gem and was used a lot. But what keeps you from still using it?

    In the following pseudo code I use the notation 'b' for a block B reversed.
    The steps needed are:

    1: reverse B1 B2 B3 yielding b3 b2 b1
    2: reverse b3 yielding B3 b2 b1
    3: reverse b2 yielding B3 B2 b1
    4: reverse b1 yielding B3 B2 B1

    Here's the code for it:

    [code=java]
    // assume ibeg <= iend <= jbeg <= jend
    public void move(Swapeable s, int ibeg, int iend, int jbeg, int jend) {
        int ilen= iend-ibeg;  // length of block B1
        int jlen= jend-jbeg;  // length of block B3

        reverse(s, ibeg, jend);           // B1 B2 B3 -> b3 b2 b1
        reverse(s, ibeg, ibeg+jlen);      // b3 -> B3
        reverse(s, ibeg+jlen, jend-ilen); // b2 -> B2
        reverse(s, jend-ilen, jend);      // b1 -> B1
    }
    [/code]
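
    Using the hypothetical CharMemory again, the block swap from the example above
    looks like this:

    [code=java]
    CharMemory mem= new CharMemory("Hello World!");
    move(mem, 0, 5, 6, 11);  // B1 = cells 0 .. 4, B3 = cells 6 .. 10
    System.out.println(mem); // prints: World Hello!
    [/code]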

    I leave it to the reader to check all the index fiddling in that method. This
    method interchanges two blocks of memory; the blocks don't need to be of the
    same size, but the first block needs to be to the left of the second block. It
    doesn't take rocket science to extend this method so the two blocks can be
    located anywhere in memory, i.e. so the first block doesn't need to be located
    to the left of the second block (hint: write a little wrapper method).
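
    For what it's worth, one possible answer to that hint (again a sketch of mine,
    with a made-up name):

    [code=java]
    // accept the two blocks in either order; they must not overlap
    public void moveAnywhere(Swapeable s, int ibeg, int iend, int jbeg, int jend) {
        if (ibeg <= jbeg)
            move(s, ibeg, iend, jbeg, jend);
        else
            move(s, jbeg, jend, ibeg, iend);
    }
    [/code]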

    Concluding remarks

    Those two tricks I described here are proud members of my bag of (old) tricks
    because they come in handy now and then, although I do need to dust them off.
    Maybe I'll describe a couple more (old) tricks in the near future. For now,
    please take the time to understand the details of those two tricks; it's worth it.

    I hope we'll meet again here next week and

    kind regards,

    Jos
  • ashraftaj
    New Member · Feb 2007

    #2
    Thanks a lot, it is really helping.
