Row-wise vs. column-wise image processing

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Enrique Cruiz

    Row-wise vs. column-wise image processing

    Hello all,

    I am currently implementing a fairly simple algorithm. It scans a
    grayscale image, and computes a pixel's new value as a function of its
    original value. Two passes are made, first horizontally and second
    vertically. The problem I have is that the vertical pass is 3 to 4
    times slower than the horizontal, although the code is _exactly_ the
    same in both cases?!

    The code itself is very simple. There are 2 loops to scan through rows
    and colums (horizontal pass), or columns and rows (vertical pass)
    depending on the pass. The processing part is simple and is contained
    in a single line. 'pixel' is a pointer to the current pixel. The new
    value of the current pixel is first updated, and the pointer is then
    incremented to the next pixel, which is either in the next column or in
    the next row depending on the pass. I should add that the image is
    stored as a large 1-D vector, i.e. the values of each rows are
    contiguous in memory. Here is a simplified version of the code:

    ############### #####
    // HORIZONTAL
    // loop for every row
    ....
    // then for every column
    for(col=firstCo l+1 ; col<=lastCol-1 ; ++col)
    {
    *pixel = (*pixel) * 2 / 3;
    pixel++;
    }

    // VERTICAL
    // loop for every column
    ....
    // then for every row
    for(row=firstRo w+1 ; row<=lastRow-1 ; ++row)
    {
    *pixel = (*pixel) * 2 / 3;
    pixel+=imgWidth ;
    }
    ############### ###

    For this small amount of code, timings are as follow:
    - horizontal = 0.035 sec.
    - vertical =   0.135 sec.

    Now if we simply remove in each case the line updating the pixel
    pointer (i.e. "pixel++;" and "pixel+=imgWidt h;"), the timings then
    becomes equal at 0.081. This simple instruction is responsible for the
    massive loss of time. And I have no idea why.

    My only guess relates to memory management issues. Since the image is
    stored row-wise, the current and next values are physically next in
    memory during the horizontal pass. On the other hand, for the vertical
    pass, the next value is stored in the next row, and the distance
    between them becomes 'image_width'. My guess is that the next pixel
    value in such a case is not close enough to be stored in the processor
    cache or register. The processor has to fetch it from memory, hence the
    massive loss in speed. This is however just a guess.

    I would really appreciate if anybody could enlighten me on this topic.

    Thanks in advance,


    Enrique

  • peter koch

    #2
    Re: Row-wise vs. column-wise image processing



    On Jan 25, 11:59 am, Enrique Cruiz <jni6l03mdo6n.. .@jetable.org>
    wrote:
    Hello all,
    >
    I am currently implementing a fairly simple algorithm. It scans a
    grayscale image, and computes a pixel's new value as a function of its
    original value. Two passes are made, first horizontally and second
    vertically. The problem I have is that the vertical pass is 3 to 4
    times slower than the horizontal, although the code is _exactly_ the
    same in both cases?!
    >
    [snip]
    Modern CPU's are complex beasts doing lots of stuff to improve
    performance. One such thing is caching of memory, and that is what
    happens here: the horisontal pass can exploit the cache much better
    than the vertical pass.

    /Peter

    Comment

    • =?iso-8859-1?q?Erik_Wikstr=F6m?=

      #3
      Re: Row-wise vs. column-wise image processing

      Enrique Cruiz wrote:

      This is not a C++ question as such, so it's kind of off topic, but
      anyway...
      // then for every column
      for(col=firstCo l+1 ; col<=lastCol-1 ; ++col)
      {
      *pixel = (*pixel) * 2 / 3;
      pixel++;
      >
      }
      Normally one would use the loop-variable inside the loop, I suspect
      that this is not the real code you are showing. In the real code there
      might be something else that makes the difference, though I doubt it.
      My only guess relates to memory management issues. Since the image is
      stored row-wise, the current and next values are physically next in
      memory during the horizontal pass. On the other hand, for the vertical
      pass, the next value is stored in the next row, and the distance
      between them becomes 'image_width'. My guess is that the next pixel
      value in such a case is not close enough to be stored in the processor
      cache or register. The processor has to fetch it from memory, hence the
      massive loss in speed. This is however just a guess.
      Probably yes, many factors come into play, the size of the image, the
      size of the caches or the CPU, how much speculation the CPU does and so
      on. A modern CPU often speculates that you don't only want to use the
      piece of memory you try to access but also those pieces close to it.
      This helps in the row-wise run since it allows the CPU to read in the
      memory before it is actually used, in the column-wise run it works
      against you by pulling in more than you need. If you have a smaller
      image you might be able to squeeze it all into the cache which should
      make both runs equally fast.

      --
      Erik Wikström

      Comment

      • Enrique Cruiz

        #4
        Re: Row-wise vs. column-wise image processing

        On 2007-01-25 12:02:42 +0000, "peter koch" <peter.koch.lar sen@gmail.comsa id:
        >
        Modern CPU's are complex beasts doing lots of stuff to improve
        performance. One such thing is caching of memory, and that is what
        happens here: the horisontal pass can exploit the cache much better
        than the vertical pass.
        Thanks. Do you know any resources that discuss similar issues related
        to (agressive) optimization?

        Alexis


        Comment

        • Enrique Cruiz

          #5
          Re: Row-wise vs. column-wise image processing

          On 2007-01-25 12:05:36 +0000, "Erik Wikström"
          <eriwik@student .chalmers.sesai d:
          Probably yes, many factors come into play, the size of the image, the
          size of the caches or the CPU, how much speculation the CPU does and so
          on. A modern CPU often speculates that you don't only want to use the
          piece of memory you try to access but also those pieces close to it.
          This helps in the row-wise run since it allows the CPU to read in the
          memory before it is actually used, in the column-wise run it works
          against you by pulling in more than you need. If you have a smaller
          image you might be able to squeeze it all into the cache which should
          make both runs equally fast.
          Thanks a lot for the informative answer. Do you know good resources
          that discuss similar issues related to optimisation?

          Enrique

          Comment

          • Jim Langston

            #6
            Re: Row-wise vs. column-wise image processing

            "Enrique Cruiz" <jni6l03mdo6nvu u@jetable.orgwr ote in message
            news:2007012514 482643658-jni6l03mdo6nvuu @jetableorg...
            On 2007-01-25 12:05:36 +0000, "Erik Wikström" <eriwik@student .chalmers.se>
            said:
            >Probably yes, many factors come into play, the size of the image, the
            >size of the caches or the CPU, how much speculation the CPU does and so
            >on. A modern CPU often speculates that you don't only want to use the
            >piece of memory you try to access but also those pieces close to it.
            >This helps in the row-wise run since it allows the CPU to read in the
            >memory before it is actually used, in the column-wise run it works
            >against you by pulling in more than you need. If you have a smaller
            >image you might be able to squeeze it all into the cache which should
            >make both runs equally fast.
            >
            Thanks a lot for the informative answer. Do you know good resources that
            discuss similar issues related to optimisation?
            >
            Enrique
            Also, consider that
            pixel++;
            is less assembly than:
            pixel+=imgWidth ;

            I mean, pixel++; may be something as simple as:
            INC [pixel]
            (although it may be more like)
            MOV AX,[pixel]
            INC AX;
            MOV [pixel],AX

            Where pixel+=imgWidth ; would involve some memory swapping and an add.
            Some I've seen are like:
            MOV AX,[pixel];
            MOV BX,[imgWidth];
            ADD AX,BX
            MOV [pixel],DX

            or similar. without looking at the assembly I couldn't say. I learned
            assembly back in the 80x86 days and I"m sure it's changed a bit since then.


            Comment

            Working...