Optimisation needed

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • spasmous

    Optimisation needed

    Hi,

    I have an inner loop that is running too slowly. I've done my best at
    optimising it but am not really experienced at what works best. Can
    anyone suggest any optimisations?

    Thanks for any help!


    for(m=-range;m<=range; m++) // range is either 1 or 2
    {
    for(n=-range;n<=range; n++)
    {
    /* indices */
    int x0 = (int)px - m;
    int y0 = (int)py - n;
    index = x0*kpar->fftrows + y0;

    /* zero out of bounds */
    index *= (int)(index>=0 && index<(kpar->fftrows*kpar->fftrows));

    /* calculate distance */
    float dx = px - (float)x0;
    float dy = py - (float)y0;
    deltad = dx*dx + dy*dy;

    /* calculate coefficient */
    fscale = expf(deltad * norm);

    /* store values */
    str->array_1[index] += val_1 * fscale;
    str->array_2[index] += val_2 * fscale;
    str->array_3[index] += fscale;
    }
    }
  • Ron Natalie

    #2
    Re: Optimisation needed


    "spasmous" <spasmous@yahoo .com> wrote in message news:2f066762.0 401261506.2c352 2a2@posting.goo gle.com...[color=blue]
    > Hi,
    >
    > I have an inner loop that is running too slowly. I've done my best at
    > optimising it but am not really experienced at what works best. Can
    > anyone suggest any optimisations?[/color]

    It would be ncie if you told us the types of some of these things like px, py
    and array_1,2,3.

    Also if you're on a Pentium chip, you may wish to ask in a intel specific
    newsgroup. Most compilers generate absolutely horrific code for float/double
    to int conversion.

    Comment

    • Andre Kostur

      #3
      Re: Optimisation needed

      spasmous@yahoo. com (spasmous) wrote in news:2f066762.0 401261506.2c352 2a2
      @posting.google .com:
      [color=blue]
      > Hi,
      >
      > I have an inner loop that is running too slowly. I've done my best at
      > optimising it but am not really experienced at what works best. Can
      > anyone suggest any optimisations?
      >
      > Thanks for any help!
      >
      >
      > for(m=-range;m<=range; m++) // range is either 1 or 2
      > {
      > for(n=-range;n<=range; n++)
      > {
      > /* indices */
      > int x0 = (int)px - m;
      > int y0 = (int)py - n;
      > index = x0*kpar->fftrows + y0;
      >
      > /* zero out of bounds */
      > index *= (int)(index>=0 && index<(kpar->fftrows*kpar->fftrows));
      >
      > /* calculate distance */
      > float dx = px - (float)x0;
      > float dy = py - (float)y0;
      > deltad = dx*dx + dy*dy;
      >
      > /* calculate coefficient */
      > fscale = expf(deltad * norm);
      >
      > /* store values */
      > str->array_1[index] += val_1 * fscale;
      > str->array_2[index] += val_2 * fscale;
      > str->array_3[index] += fscale;
      > }
      > }[/color]

      Umm... some simple stuff that _might_ help:

      Move the declaration and initialization of x0 outside the inner loop. It
      doesn't need to be recalculated every time.

      Move the declaration and initialization of dx outside the inner loop
      (same reason). I'd probably square it outside the loop too.

      Comment

      • Thomas Matthews

        #4
        Re: Optimisation needed

        spasmous wrote:[color=blue]
        > Hi,
        >
        > I have an inner loop that is running too slowly. I've done my best at
        > optimising it but am not really experienced at what works best. Can
        > anyone suggest any optimisations?
        >
        > Thanks for any help!
        >
        >
        > for(m=-range;m<=range; m++) // range is either 1 or 2
        > {
        > for(n=-range;n<=range; n++)
        > {
        > /* indices */
        > int x0 = (int)px - m;
        > int y0 = (int)py - n;
        > index = x0*kpar->fftrows + y0;
        >
        > /* zero out of bounds */
        > index *= (int)(index>=0 && index<(kpar->fftrows*kpar->fftrows));
        >
        > /* calculate distance */
        > float dx = px - (float)x0;
        > float dy = py - (float)y0;[/color]

        Let us perform some substition:
        Given X0 == px - m; from above,
        dx = px - (px - m);
        dx = px - px + m;
        dx = m;
        Similarly with dy:
        dy = n;
        [color=blue]
        > deltad = dx*dx + dy*dy;[/color]

        This is simplified to:
        deltad = m * m + n * n;
        This simplification removes the variables dx and dy.

        [color=blue]
        > /* calculate coefficient */
        > fscale = expf(deltad * norm);
        >
        > /* store values */
        > str->array_1[index] += val_1 * fscale;
        > str->array_2[index] += val_2 * fscale;
        > str->array_3[index] += fscale;
        > }
        > }[/color]



        --
        Thomas Matthews

        C++ newsgroup welcome message:

        C++ Faq: http://www.parashift.com/c++-faq-lite
        C Faq: http://www.eskimo.com/~scs/c-faq/top.html
        alt.comp.lang.l earn.c-c++ faq:

        Other sites:
        http://www.josuttis.com -- C++ STL Library book
        http://www.sgi.com/tech/stl -- Standard Template Library

        Comment

        • spasmous

          #5
          Re: Optimisation needed

          Andre Kostur <nntpspam@kostu r.net> wrote in message news:<Xns947CA5 6DF71A6nntpspam kosturnet@207.3 5.177.135>...[color=blue]
          >
          > Umm... some simple stuff that _might_ help:
          >
          > Move the declaration and initialization of x0 outside the inner loop. It
          > doesn't need to be recalculated every time.
          >
          > Move the declaration and initialization of dx outside the inner loop
          > (same reason). I'd probably square it outside the loop too.[/color]


          Mm, I think the compiler must be doing that already - it makes no
          difference to the runtime. Thanks for the tip tho' :)

          A quick experiment I did commenting out the expf() call reduced the
          timing by about 75%, so I suspect that's doing the damage. And that's
          also probably already as fast as it can go, being a math library
          function, right?

          Comment

          • Andre Kostur

            #6
            Re: Optimisation needed

            Thomas Matthews <Thomas_Matthew sSpamBotsSuck@s bcglobal.net> wrote in
            news:4015EAAA.6 060203@sbcgloba l.net:
            [color=blue]
            > spasmous wrote:[color=green]
            >> Hi,
            >>
            >> I have an inner loop that is running too slowly. I've done my best at
            >> optimising it but am not really experienced at what works best. Can
            >> anyone suggest any optimisations?
            >>
            >> Thanks for any help!
            >>
            >>
            >> for(m=-range;m<=range; m++) // range is either 1 or 2
            >> {
            >> for(n=-range;n<=range; n++)
            >> {
            >> /* indices */
            >> int x0 = (int)px - m;
            >> int y0 = (int)py - n;
            >> index = x0*kpar->fftrows + y0;
            >>
            >> /* zero out of bounds */
            >> index *= (int)(index>=0 && index<(kpar->fftrows*kpar->fftrows));
            >>
            >> /* calculate distance */
            >> float dx = px - (float)x0;
            >> float dy = py - (float)y0;[/color]
            >
            > Let us perform some substition:
            > Given X0 == px - m; from above,
            > dx = px - (px - m);
            > dx = px - px + m;
            > dx = m;[/color]

            Error. Your replacements aren't identical to the original:

            Note that x0 is assigned to the integral part of px, minus m. Not just
            px. So later on, when dx is calculated... the px mentioned in there is
            the float px, not just the integral part. Thus you can't drop the px term
            from the equations.
            [color=blue]
            > Similarly with dy:
            > dy = n;[/color]

            Same error.
            [color=blue][color=green]
            >> deltad = dx*dx + dy*dy;[/color]
            >
            > This is simplified to:
            > deltad = m * m + n * n;
            > This simplification removes the variables dx and dy.
            >
            >[color=green]
            >> /* calculate coefficient */
            >> fscale = expf(deltad * norm);
            >>
            >> /* store values */
            >> str->array_1[index] += val_1 * fscale;
            >> str->array_2[index] += val_2 * fscale;
            >> str->array_3[index] += fscale;
            >> }
            >> }[/color]
            >
            >
            >[/color]

            Comment

            • Patrik Stellmann

              #7
              Re: Optimisation needed

              [color=blue]
              > for(m=-range;m<=range; m++) // range is either 1 or 2
              > {
              > for(n=-range;n<=range; n++)
              > {[/color]
              The compiler probably optimizes it anyway and for numeric values might
              be no real difference in general but usually ++m is faster than m++
              since the pseudo-code for the first looks like:

              increment m;
              return m;

              and for the second:

              temp = m;
              increment m;
              return temp;

              so in for loops where you are not interested in the result you might
              prefer the prefix ++ operator before the postfix...

              Comment

              • Kevin Goodsell

                #8
                Re: Optimisation needed

                spasmous wrote:[color=blue]
                > Hi,
                >
                > I have an inner loop that is running too slowly. I've done my best at
                > optimising it but am not really experienced at what works best. Can
                > anyone suggest any optimisations?[/color]

                I haven't really looked at your code, but here's a few tips.

                The first thing you should do (and this applies to ALL attempts at
                optimization) is profile the code. This serves 2 purposes: 1) It can
                help you determine if optimization is actually necessary - there's no
                reason to optimize code that's already fast enough. 2) It allows you to
                properly target optimization effort. Never try to guess which part of
                the code is slow, and never try to guess what tweaks will make it
                faster. These things have to be determined empirically.

                The second thing you should do is look for a better algorithm.
                Algorithmic improvements almost always outperform fine-tuning, often
                dramatically.

                The third thing you should probably do is double check your compiler's
                optimization settings. See if there's a more appropriate set of options
                for your purposes.

                Only after doing those should you look at fine-tuning the code. In
                general, hand-optimized code is uglier, more error-prone, more difficult
                to debug, and more difficult to maintain, so it should be avoided until
                there are no other options. Many of the things you are likely to try are
                probably already done by decent optimizing compilers, so make sure you
                profile to see if your changes make a difference - there's no reason to
                use uglier code that's just as slow.

                -Kevin
                --
                My email address is valid, but changes periodically.
                To contact me please use the address from a recent posting.

                Comment

                • NKOBAYE027

                  #9
                  Re: Optimisation needed

                  if the range is only 1 or 2 then you're probably best of to not bother with
                  loops - if speed (not size) is of the essence just write the code out
                  straight.

                  definitely get rid of all those declarations and casts inside the loops -
                  there has to be a better way of representing the data than that
                  hodgepodge...us e register counters in the loops (though most compilers
                  probably optimise to that anyway)

                  depending upon the accuracy of your exp requirements use your own
                  approximation algorithm - just be sure to calculate the factorial
                  denominators beforehand and save them in an array for later use...



                  "spasmous" <spasmous@yahoo .com> wrote in message
                  news:2f066762.0 401261506.2c352 2a2@posting.goo gle.com...[color=blue]
                  > Hi,
                  >
                  > I have an inner loop that is running too slowly. I've done my best at
                  > optimising it but am not really experienced at what works best. Can
                  > anyone suggest any optimisations?
                  >
                  > Thanks for any help!
                  >
                  >
                  > for(m=-range;m<=range; m++) // range is either 1 or 2
                  > {
                  > for(n=-range;n<=range; n++)
                  > {
                  > /* indices */
                  > int x0 = (int)px - m;
                  > int y0 = (int)py - n;
                  > index = x0*kpar->fftrows + y0;
                  >
                  > /* zero out of bounds */
                  > index *= (int)(index>=0 && index<(kpar->fftrows*kpar->fftrows));
                  >
                  > /* calculate distance */
                  > float dx = px - (float)x0;
                  > float dy = py - (float)y0;
                  > deltad = dx*dx + dy*dy;
                  >
                  > /* calculate coefficient */
                  > fscale = expf(deltad * norm);
                  >
                  > /* store values */
                  > str->array_1[index] += val_1 * fscale;
                  > str->array_2[index] += val_2 * fscale;
                  > str->array_3[index] += fscale;
                  > }
                  > }[/color]


                  Comment

                  • Chris Theis

                    #10
                    Re: Optimisation needed


                    "spasmous" <spasmous@yahoo .com> wrote in message
                    news:2f066762.0 401262319.87c10 3@posting.googl e.com...[color=blue]
                    > Andre Kostur <nntpspam@kostu r.net> wrote in message[/color]
                    news:<Xns947CA5 6DF71A6nntpspam kosturnet@207.3 5.177.135>...
                    [SNIP]
                    [color=blue]
                    > A quick experiment I did commenting out the expf() call reduced the
                    > timing by about 75%, so I suspect that's doing the damage. And that's
                    > also probably already as fast as it can go, being a math library
                    > function, right?[/color]

                    Well the exponential function is certainly a bottleneck. However, are you
                    sure that your optimizations are necessary and that this is the right place
                    to apply them. You should profile first and see where the time is lost. If
                    it's the code you gave then think whether the algorithm is the best one. If
                    you can answer both of these questions with yes you might consider
                    "hands-on" optimization. If you're dealing with constants you might consider
                    meta-template programming for loop unrolling and the calculation of the
                    exponential. Still, premature optimization might be an evil thing. Thus,
                    fire up your profiler first!

                    Regards
                    Chris


                    Comment

                    • Andre Kostur

                      #11
                      Re: Optimisation needed

                      spasmous@yahoo. com (spasmous) wrote in
                      news:2f066762.0 401262319.87c10 3@posting.googl e.com:
                      [color=blue]
                      > Andre Kostur <nntpspam@kostu r.net> wrote in message
                      > news:<Xns947CA5 6DF71A6nntpspam kosturnet@207.3 5.177.135>...[color=green]
                      >>
                      >> Umm... some simple stuff that _might_ help:
                      >>
                      >> Move the declaration and initialization of x0 outside the inner loop.
                      >> It doesn't need to be recalculated every time.
                      >>
                      >> Move the declaration and initialization of dx outside the inner loop
                      >> (same reason). I'd probably square it outside the loop too.[/color]
                      >
                      >
                      > Mm, I think the compiler must be doing that already - it makes no
                      > difference to the runtime. Thanks for the tip tho' :)[/color]

                      Yep... that's why I said "might" :) Those are simply moving loop
                      invariants outside of the loop. A decent optimizer should be able to spot
                      those and move them outside on it's own. If you change the actual source
                      code, it makes it more obvious to other people reading your code.

                      Comment

                      • Nick Hounsome

                        #12
                        Re: Optimisation needed


                        "spasmous" <spasmous@yahoo .com> wrote in message
                        news:2f066762.0 401262319.87c10 3@posting.googl e.com...[color=blue]
                        > Andre Kostur <nntpspam@kostu r.net> wrote in message[/color]
                        news:<Xns947CA5 6DF71A6nntpspam kosturnet@207.3 5.177.135>...[color=blue][color=green]
                        > >
                        > > Umm... some simple stuff that _might_ help:
                        > >
                        > > Move the declaration and initialization of x0 outside the inner loop.[/color][/color]
                        It[color=blue][color=green]
                        > > doesn't need to be recalculated every time.
                        > >
                        > > Move the declaration and initialization of dx outside the inner loop
                        > > (same reason). I'd probably square it outside the loop too.[/color]
                        >
                        >
                        > Mm, I think the compiler must be doing that already - it makes no
                        > difference to the runtime. Thanks for the tip tho' :)
                        >
                        > A quick experiment I did commenting out the expf() call reduced the
                        > timing by about 75%, so I suspect that's doing the damage. And that's
                        > also probably already as fast as it can go, being a math library
                        > function, right?[/color]

                        It is quite possible that operations on double are faster than floats as
                        the implementation will be optimized for double and
                        floating point hardware may only actually
                        work on doubles. You may actually be paying for a lot of
                        double/float conversions.

                        I'm not up on how these things are implemented but
                        expf(deltad*nor m) == pow( deltad, expf(norm) )
                        so precalculating expf(norm) may help.


                        Comment

                        • David Fisher

                          #13
                          Re: Optimisation needed

                          "spasmous" <spasmous@yahoo .com> wrote:
                          [color=blue]
                          > I have an inner loop that is running too slowly. I've done my best at
                          > optimising it but am not really experienced at what works best. Can
                          > anyone suggest any optimisations?[/color]
                          [color=blue]
                          > for(m=-range;m<=range; m++) // range is either 1 or 2
                          > {
                          > for(n=-range;n<=range; n++)
                          > {
                          > /* indices */
                          > int x0 = (int)px - m;
                          > int y0 = (int)py - n;
                          > index = x0*kpar->fftrows + y0;
                          >
                          > /* zero out of bounds */
                          > index *= (int)(index>=0 && index<(kpar->fftrows*kpar->fftrows));[/color]

                          Instead of multiplying index by 0 or 1, you could say:

                          if (index < 0 || index > fftrowsSquared)
                          {
                          index = 0;
                          }

                          and define fftrowsSquared as (kpar->fftrows*kpar->fftrows) before the
                          beginning of the first loop.

                          Values of argument to expf():

                          (square(px - (float) ((int)px - m)) + square(py - (float) ((int) py - n))) *
                          norm

                          float pxVal = px - (float) ((int) px)
                          float pyVal = py - (float) ((int) py)

                          square(pxVal - (-range to range)) + square(pyVal - (-range to range))

                          square(pxVal -1) + square(pyVal - 1)
                          square(pxVal -1) + square(pyVal - 1)
                          square(pxVal -1) + square(pyVal - 1)
                          square(pxVal -1) + square(pyVal - 1)

                          [color=blue]
                          >
                          > /* calculate distance */
                          > float dx = px - (float) ((int) px - m);
                          > float dy = py - (float)y0;
                          > deltad = dx*dx + dy*dy;
                          >
                          > /* calculate coefficient */
                          > fscale = expf(deltad * norm);
                          >
                          > /* store values */
                          > str->array_1[index] += val_1 * fscale;
                          > str->array_2[index] += val_2 * fscale;
                          > str->array_3[index] += fscale;
                          > }
                          > }[/color]


                          Comment

                          • David Fisher

                            #14
                            Re: Optimisation needed

                            "David Fisher" <nospam@nospam. nospam.nospam> wrote:

                            Oops, previous mail got sent before it was finished ...

                            I was trying to say, the arguments to expf() (without the multiplication by
                            "norm") look something like this:

                            square(pxVal -1) + square(pyVal + 1) == (square(pxVal) - 2 * pxVal + 1) +
                            (square(pyVal) + 2 * pyVal + 1)
                            square(pxVal - 2) + square(pyVal) == (square(pxVal) - 4 * pxVal + 4) +
                            square(pyVal)
                            etc.

                            where

                            float pxVal = px - (float) ((int) px);
                            float pyVal = py - (float) ((int) py);

                            So the differences in the arguments to expf() are a multiple of pxVal plus a
                            multiple of pyVal plus a constant ...

                            exp(a + b) = exp(a) * exp(b)

                            so you can calculate the following values and use them to find the required
                            values of expf():

                            exp(square(pxVa l))
                            exp(square(pyVa l))
                            exp(norm) => constant ?
                            exp(1) => constant, once only
                            exp(4) => constant, once only
                            exp(pxVal * 2) => same as square(exp(pxVa l)) => just calculate exp(pxVal)
                            once
                            exp(pxVal * -2) => same as sqrt(exp(pxVal) )
                            exp(pxVal * 4) => same as square(square(e xp(pxVal)))
                            exp(pxVal * -4) => same as sqrt(sqrt(exp(p xVal)))
                            etc.

                            You should end up with a set of expressions that are faster to calculate
                            than expf(): square, sqrt, *, etc

                            Hope this is helpful,

                            David F


                            Comment

                            • David Fisher

                              #15
                              Re: Optimisation needed

                              "David Fisher" <nospam@nospam. nospam.nospam> wrote:
                              [color=blue]
                              > values of expf():
                              >
                              > exp(square(pxVa l))
                              > exp(square(pyVa l))
                              > exp(norm) => constant ?
                              > exp(1) => constant, once only
                              > exp(4) => constant, once only
                              > exp(pxVal * 2) => same as square(exp(pxVa l)) => just calculate exp(pxVal)
                              > once
                              > exp(pxVal * -2) => same as sqrt(exp(pxVal) )
                              > exp(pxVal * 4) => same as square(square(e xp(pxVal)))
                              > exp(pxVal * -4) => same as sqrt(sqrt(exp(p xVal)))
                              > etc.
                              >
                              > You should end up with a set of expressions that are faster to calculate
                              > than expf(): square, sqrt, *, etc[/color]

                              Hmm ... still leaves the problem of multiplying by "norm":

                              exp(a * b) == pow(exp(a), b)

                              You wouldn't want to say pow(whatever, norm) for each expression - that
                              would cost as much as calling expf() in the first place ...

                              I guess the solution is to multiply all of the above-listed values by
                              "norm", ie. calculate the values of:

                              exp(square(pxVa l) * norm)
                              exp(4 * norm)
                              etc.

                              (I hope the maths is right :-)

                              David F


                              Comment

                              Working...