JosAH · Recognized Expert MVP · Mar 2007

    Old tricks

    Greetings,

    Introduction

    This week's tip describes a few old tricks that are almost forgotten by most
    people around here. Sometimes there's no need for these tricks anymore because
    processors nowadays are so fast and memory comes in abundance. But still, if
    we implement an algorithm that is better, or more efficient, than another one,
    those faster processors run the first algorithm faster than the other one. If
    an algorithm takes less memory to run, we can run larger instances of a problem
    to solve in our larger memory computers.

    Larger and faster computers are still no excuse to implement slow and/or memory
    hogging algorithms. The next paragraphs describe two little problems and the
    way they can be solved. The first paragraph deals with speed and the following
    paragraph deals with memory requirements without sacrificing speed.

    Bit counting

    For a plethora of reasons people want to know how many bits are set in a byte
    or word. A naive approach to count those bits in an int is the following:

    [code=java]
    public int count1(int word) {
        int ones= 0; // the number of one-bits in the 'word'
        for (int mask= 0x00000001; mask != 0; mask<<= 1)
            if ((word&mask) != 0) ones++;
        return ones;
    }
    [/code]

    This is a very simple approach: we check every single bit using the 'mask'
    variable; if the corresponding bit in the word is set we increment the 'ones'
    counter. The method returns the 'ones' variable when the loop has finished.

    No matter the value of the word, the loop loops 32 times because an int contains
    32 bits and all need to be checked.

    We can improve on this a bit: when a one bit has been found, we set it back to
    zero again. If no one bits remain in the word variable we're done. This is how
    it can be implemented:

    [code=java]
    public int count2(int word) {
        int ones= 0; // the number of one-bits in the 'word'
        for (int mask= 0x00000001; word != 0; mask<<= 1)
            if ((word&mask) != 0) {
                word&= ~mask; // remove the bit
                ones++;
            }
        return ones;
    }
    [/code]

    The loop now stops when there are no more bits in the word set to one. The count2
    method certainly runs more efficiently than the previous count1 method. How much
    more efficiently exactly? If we consider a random 32 bit pattern, 50% of those
    bit patterns have their leftmost bit set to one (check it). So for 50% of all the
    possible values of 'word', count2 doesn't run any more efficiently than count1.

    For 2^31 possible values count2 needs 32 iterations, for 2^30 possible values
    count2 needs only 31 iterations, and so on, all the way down to the 2^0 == 1
    value (the value 1) that needs a single iteration. The one remaining value, 0,
    needs zero iterations because the loop body is skipped entirely.

    Mathematically speaking, on average the count2 method needs:

    sum(i= 0, 31: 2^i*(i+1))/2^32 ≈ 31 iterations.
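
    As a quick sanity check (a small sketch of mine, not part of the original tip),
    we can evaluate that sum directly:

    [code=java]
    // 2^i values have their highest one-bit at position i and need i+1 iterations
    long sum= 0;
    for (int i= 0; i < 32; i++)
        sum+= (1L<<i)*(i+1);
    System.out.println((double)sum/(1L<<32)); // prints approximately 31.0
    [/code]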

    Big deal, on average we're saving one single iteration. We must be able to do
    better than this. Have a look at the following notion:

    For every bit pattern with at least one bit set to one, we can represent the
    bit pattern as follows:

    xxx ... xxxx1000 ... 0000

    There are bits 'xxx' to the left of the 1 bit and zero or more 0 bits to the right
    of the 1 bit.

    If we subtract 1 from that bit pattern we obtain the bit pattern:

    xxx ... xxxx0111 ... 1111

    If we bitwise 'and' both bit patterns we get the bit pattern:

    xxx ... xxxx0000 ... 0000

    We have effectively removed the rightmost 1 bit from the original bit pattern.
    We have done this:

    [code=java]
    pattern&= pattern-1;
    [/code]
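
    A concrete example (my own, for illustration) with the pattern 0x68:

    [code=java]
    int pattern= 0x68;             // binary 0110 1000
    int minus1 = pattern-1;        // 0x67:  0110 0111
    int result = pattern&minus1;   // 0x60:  0110 0000, rightmost 1 bit removed
    [/code]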

    We can use this notion in our count3 method:

    [code=java]
    public int count3(int word) {
        int ones= 0; // the number of one-bits in the 'word'
        for (; word != 0; ones++)
            word&= word-1;
        return ones;
    }
    [/code]

    Every pass through the loop the rightmost 1 bit is removed from 'word' until 'word'
    equals zero. On average, a 32 bit word has 16 bits set to 1, so this count3 method
    certainly does better than the count1 and count2 methods: count2 wasn't much
    better than count1, but count3 takes only half of the loop passes.

    Can we do better than this? Sure, mathematically speaking we can: we simply build
    a table with 2^32 elements such that table[i] == j if the bit pattern i contains
    j bits set to 1. Sure, mathematicians are loonies: we can't build a table or array
    with 2^32 elements; even nowadays we need humongous amounts of memory for that.

    We can build such a table for the 256 byte values though; the table would look
    something like this:

    [code=java]
    byte[] table= { 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4 ... 8 };
    [/code]
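
    Writing out all 256 values by hand is tedious; as a sketch (my addition, not
    from the original tip), the table can be built once using the count3 trick:

    [code=java]
    byte[] table= new byte[256];
    for (int i= 0; i < 256; i++) {
        int ones= 0;
        for (int w= i; w != 0; ones++)
            w&= w-1; // remove the rightmost 1 bit
        table[i]= (byte)ones;
    }
    [/code]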

    We could use this table like this:

    [code=java]
    public int count4(int word) {
        int ones= 0;
        // 'byte' is a reserved word in Java, so the loop counter is named 'b'
        for (int b= 0; b < 4; b++) {
            ones+= table[word&0xff];
            word>>>= 8;
        }
        return ones;
    }
    [/code]
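
    Assuming the table above is filled in completely: count4(0xff) returns 8 and
    count4(0x0f0f0f0f) returns 16, using four loop passes either way.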

    This method simply counts all the one-bits in every byte (there are four bytes
    in an int using Java) so this algorithm takes only four loop passes ever. But
    it does need a 256 byte table. Can we do without such a table and still do better
    than the count3 method?

    We need 2 bits to store the number of ones in a two bit wide word. There are
    at most 2 bits equal to 1 in a two bits wide word and the value 2 needs two bits
    by itself. We only need 6 bits to store a value in the range [0, 32]. At most
    32 bits can be equal to one in a 32 bits wide word.

    How can we find the number of bits set to one in a two bit wide word? Easy:

    [code=java]
    int word; // just two bits in here, zero or one
    int ones= (word&1)+(word>>>1);
    [/code]
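
    For the four possible values this yields: 0 -> 0, 1 -> 1, 2 -> 1 and 3 -> 2;
    exactly the number of one-bits in each two bit word.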

    But we can use that little trick in parallel; for a 32 bits wide word we can
    store 16 two bit counters for the 16 two bit words that make up the single 32
    bit word:

    [code=java]
    int word; // 32 bits wide word interpreted as 16 two bit words;
    int ones= (word&0x55555555)+((word&0xaaaaaaaa)>>>1);
    [/code]

    The 'ones' int now contains sixteen small two bit values representing the number
    of bits set to one if we interpret the 32 bit word value as 16 two bit small words.

    All we need to do now is add those 16 two bit values in the 'ones' int and we
    have the number of bits set to 1 in the word int.

    We can add the 16 two bit values stored in one 32 bit word using the same trick:
    instead of the previous masks we use 0x33333333 and 0xcccccccc to mask out the
    16 two bit values and add them together pairwise; we have 4 bits to store each sum.

    But then again we can repeat this little procedure over and over again:

    [code=java]
    public int count5(int word) {
        word= (word&0x55555555)+((word&0xaaaaaaaa)>>> 1); // sum 1 bits
        word= (word&0x33333333)+((word&0xcccccccc)>>> 2); // sum 2 bits
        word= (word&0x0f0f0f0f)+((word&0xf0f0f0f0)>>> 4); // sum 4 bits
        word= (word&0x00ff00ff)+((word&0xff00ff00)>>> 8); // sum 8 bits
        word= (word&0x0000ffff)+((word&0xffff0000)>>>16); // sum 16 bits

        return word;
    }
    [/code]
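
    As a quick check (a sketch of mine; Integer.bitCount has been part of the
    standard library since Java 1.5), count5 agrees with the built-in bit counter:

    [code=java]
    int[] samples= { 0, 1, 0xff, 0x0f0f0f0f, 0xffffffff, 123456789 };
    for (int s : samples) // both methods print the same value for every sample
        System.out.println(count5(s)+" == "+Integer.bitCount(s));
    [/code]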

    This method doesn't use a loop anymore: it uses five steps and a table in disguise; let's see what we've got now:


    • count1: steps: 32
    • count2: steps: 31 on average
    • count3: steps: 16 on average
    • count4: steps: 4 + 256 byte explicit table
    • count5: steps: 5 + 40 byte implicit table
    Now let's have a look at what happens when words are n bits wide; the table will
    look like this then:
    • count1: steps: n
    • count2: steps: n-1 on average
    • count3: steps: n/2 on average
    • count4: steps: n/8 + 256 byte explicit table
    • count5: steps: log2(n) + n/8*2*log2(n) byte implicit table
    When n gets larger and larger (and it will) the last algorithm's number of steps
    grows only logarithmically with n, while the number of steps of the first four
    algorithms grows linearly with n; pick your choice.
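
    As an aside (a sketch of mine, not from the original tip; the name count6 is
    made up): extending count5 to Java's 64 bit long type takes exactly one extra
    step, log2(64) == 6 steps in total:

    [code=java]
    public int count6(long word) {
        word= (word&0x5555555555555555L)+((word&0xaaaaaaaaaaaaaaaaL)>>> 1);
        word= (word&0x3333333333333333L)+((word&0xccccccccccccccccL)>>> 2);
        word= (word&0x0f0f0f0f0f0f0f0fL)+((word&0xf0f0f0f0f0f0f0f0L)>>> 4);
        word= (word&0x00ff00ff00ff00ffL)+((word&0xff00ff00ff00ff00L)>>> 8);
        word= (word&0x0000ffff0000ffffL)+((word&0xffff0000ffff0000L)>>>16);
        word= (word&0x00000000ffffffffL)+((word&0xffffffff00000000L)>>>32);
        return (int)word;
    }
    [/code]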

    Memory manipulation

    We like to move memory around almost continuously; it seems as if we can't make
    up our mind where we want our data to be stored and we keep on dragging those
    bytes around over and over again. For example: I wrote the first paragraph of
    this article after I wrote this second paragraph.

    Effectively I moved the second paragraph further down in memory to make room for
    the first paragraph. And now I'm appending more data to the rest of the text.

    Suppose I have a sequence of memory cells; for the examples I use plain text but
    for the development of the algorithms I'll use a more abstract notion: memory
    consists of cells that are indexed using the index numbers 0, 1, 2, ..., n.

    The first thing I want to be able to do is to swap the content of two cells i and j,
    so this hypothetical memory implements the following interface:

    [code=java]
    public interface Swapeable {
        public void swap(int i, int j);
    }
    [/code]
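
    For concreteness, here's a minimal sketch of an implementation backed by a char
    array (the class name CharMemory is mine, not from this tip):

    [code=java]
    public class CharMemory implements Swapeable {
        private final char[] cells;

        public CharMemory(String s) { cells= s.toCharArray(); }

        public void swap(int i, int j) {
            char tmp= cells[i]; cells[i]= cells[j]; cells[j]= tmp;
        }

        public String toString() { return new String(cells); }
    }
    [/code]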

    Given this simple interface I can reverse chunks of memory, e.g. if this is the
    memory content with two index values i and j:

    Code:
    +---+---+---+---+---+---+---+---+---+---+---+---+
    | H | e | l | l | o |   | W | o | r | l | d | ! |
    +---+---+---+---+---+---+---+---+---+---+---+---+
                  ^                   ^
                  |                   |
                  i                   j
    The following method reverses the content of the memory between index values
    i (inclusive) and j (exclusive):

    [code=java]
    public void reverse(Swapeable s, int i, int j) {
        for (; --j > i; i++)
            s.swap(i, j);
    }
    [/code]
    And the memory content will look like this afterwards:

    Code:
    +---+---+---+---+---+---+---+---+---+---+---+---+
    | H | e | l | o | W |   | o | l | r | l | d | ! |
    +---+---+---+---+---+---+---+---+---+---+---+---+
                  ^                   ^
                  |                   |
                  i                   j
    Well, big deal; that's CS101 and (almost?) everybody can do that. True, but there's
    more we can do using just that simple reverse() method, and not so many people
    realize that fact.

    Suppose I reverse this part:

    Code:
    +---+---+---+---+---+---+---+---+---+---+---+---+
    | H | e | l | l | o |   | W | o | r | l | d | ! |
    +---+---+---+---+---+---+---+---+---+---+---+---+
      ^                                           ^
      |                                           |
      i                                           j
    The result is:

    Code:
    +---+---+---+---+---+---+---+---+---+---+---+---+
    | d | l | r | o | W |   | o | l | l | e | H | ! |
    +---+---+---+---+---+---+---+---+---+---+---+---+
      ^                                           ^
      |                                           |
      i                                           j
    Next I reverse this part:

    Code:
    +---+---+---+---+---+---+---+---+---+---+---+---+
    | d | l | r | o | W |   | o | l | l | e | H | ! |
    +---+---+---+---+---+---+---+---+---+---+---+---+
      ^                   ^
      |                   |
      i                   j
    The result is:

    Code:
    +---+---+---+---+---+---+---+---+---+---+---+---+
    | W | o | r | l | d |   | o | l | l | e | H | ! |
    +---+---+---+---+---+---+---+---+---+---+---+---+
      ^                   ^
      |                   |
      i                   j
    And finally I reverse this part:

    Code:
    +---+---+---+---+---+---+---+---+---+---+---+---+
    | W | o | r | l | d |   | o | l | l | e | H | ! |
    +---+---+---+---+---+---+---+---+---+---+---+---+
                              ^                   ^
                              |                   |
                              i                   j
    Which leads to this:

    Code:
    +---+---+---+---+---+---+---+---+---+---+---+---+
    | W | o | r | l | d |   | H | e | l | l | o | ! |
    +---+---+---+---+---+---+---+---+---+---+---+---+
                              ^                   ^
                              |                   |
                              i                   j
    I have just interchanged two blocks of memory; this was the original content
    of the memory: B1 B2 B3 where B1 == Hello, B2 == <space> and B3 == World. The
    result is B3 B2 B1. The '!' sign wasn't used nor touched. If block B2 had
    contained more than one element I would have had to reverse that block as well.

    Even more important: no additional temporary memory was used except for the single
    memory cell used by the swap() method itself. In the early days, when memory was
    measured in kilobytes or even single bytes instead of gigabytes, this little trick
    was a precious little gem and was used a lot. But what keeps you from still using it?

    In the following pseudo code I use the notation 'b' for a block B reversed.
    The steps needed are:

    1: reverse B1 B2 B3 yielding b3 b2 b1
    2: reverse b3 yielding B3 b2 b1
    3: reverse b2 yielding B3 B2 b1
    4: reverse b1 yielding B3 B2 B1

    Here's the code for it:

    [code=java]
    // assume ibeg <= iend <= jbeg <= jend
    public void move(Swapeable s, int ibeg, int iend, int jbeg, int jend) {
        int ilen= iend-ibeg;  // length of block B1
        int jlen= jend-jbeg;  // length of block B3

        reverse(s, ibeg, jend);           // B1 B2 B3 -> b3 b2 b1
        reverse(s, ibeg, ibeg+jlen);      // b3 -> B3
        reverse(s, ibeg+jlen, jend-ilen); // b2 -> B2
        reverse(s, jend-ilen, jend);      // b1 -> B1
    }
    [/code]
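
    Using the hypothetical CharMemory again, the block swap from the example above
    looks like this:

    [code=java]
    CharMemory mem= new CharMemory("Hello World!");
    move(mem, 0, 5, 6, 11);  // B1 = cells 0 .. 4, B3 = cells 6 .. 10
    System.out.println(mem); // prints: World Hello!
    [/code]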

    I leave it to the reader to check all the index fiddling in that method. This
    method interchanges two blocks of memory; the blocks don't need to be of the
    same size, but the first block needs to be to the left of the second block. It
    doesn't take rocket science to extend this method so the two blocks can be
    located anywhere in memory, i.e. so the first block doesn't need to be located
    to the left of the second block (hint: write a little wrapper method).
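
    For what it's worth, one possible answer to that hint (again a sketch of mine,
    with a made-up name):

    [code=java]
    // accept the two blocks in either order; they must not overlap
    public void moveAnywhere(Swapeable s, int ibeg, int iend, int jbeg, int jend) {
        if (ibeg <= jbeg)
            move(s, ibeg, iend, jbeg, jend);
        else
            move(s, jbeg, jend, ibeg, iend);
    }
    [/code]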

    Concluding remarks

    Those two tricks I described here are proud members of my bag of (old) tricks
    because they come in handy now and then, although I do need to dust them off.
    Maybe I'll describe a couple more (old) tricks in the near future. For now,
    please take the time to understand the details of those two tricks; it's worth it.

    I hope we'll meet again here next week and

    kind regards,

    Jos
  • ashraftaj
    New Member · Feb 2007

    #2
    Thanks a lot, it is really helping.
