Optimisation needed

**Ron Natalie** · Jul 22 '05, 05:44 AM

Re: Optimisation needed

"spasmous" <spasmous@yahoo .com> wrote in message news:2f066762.0 401261506.2c352 2a2@posting.goo gle.com...[color=blue]
> Hi,
>
> I have an inner loop that is running too slowly. I've done my best at
> optimising it but am not really experienced at what works best. Can
> anyone suggest any optimisations?[/color]

It would be ncie if you told us the types of some of these things like px, py
and array_1,2,3.

Also if you're on a Pentium chip, you may wish to ask in a intel specific
newsgroup. Most compilers generate absolutely horrific code for float/double
to int conversion.

**Andre Kostur** · Jul 22 '05, 05:45 AM

Re: Optimisation needed

spasmous@yahoo. com (spasmous) wrote in news:2f066762.0 401261506.2c352 2a2
@posting.google .com:
[color=blue]
> Hi,
>
> I have an inner loop that is running too slowly. I've done my best at
> optimising it but am not really experienced at what works best. Can
> anyone suggest any optimisations?
>
> Thanks for any help!
>
>
> for(m=-range;m<=range; m++) // range is either 1 or 2
> {
> for(n=-range;n<=range; n++)
> {
> /* indices */
> int x0 = (int)px - m;
> int y0 = (int)py - n;
> index = x0*kpar->fftrows + y0;
>
> /* zero out of bounds */
> index *= (int)(index>=0 && index<(kpar->fftrows*kpar->fftrows));
>
> /* calculate distance */
> float dx = px - (float)x0;
> float dy = py - (float)y0;
> deltad = dx*dx + dy*dy;
>
> /* calculate coefficient */
> fscale = expf(deltad * norm);
>
> /* store values */
> str->array_1[index] += val_1 * fscale;
> str->array_2[index] += val_2 * fscale;
> str->array_3[index] += fscale;
> }
> }[/color]

Umm... some simple stuff that _might_ help:

Move the declaration and initialization of x0 outside the inner loop. It
doesn't need to be recalculated every time.

Move the declaration and initialization of dx outside the inner loop
(same reason). I'd probably square it outside the loop too.

**Thomas Matthews** · Jul 22 '05, 05:45 AM

Re: Optimisation needed

spasmous wrote:[color=blue]
> Hi,
>
> I have an inner loop that is running too slowly. I've done my best at
> optimising it but am not really experienced at what works best. Can
> anyone suggest any optimisations?
>
> Thanks for any help!
>
>
> for(m=-range;m<=range; m++) // range is either 1 or 2
> {
> for(n=-range;n<=range; n++)
> {
> /* indices */
> int x0 = (int)px - m;
> int y0 = (int)py - n;
> index = x0*kpar->fftrows + y0;
>
> /* zero out of bounds */
> index *= (int)(index>=0 && index<(kpar->fftrows*kpar->fftrows));
>
> /* calculate distance */
> float dx = px - (float)x0;
> float dy = py - (float)y0;[/color]

Let us perform some substition:
Given X0 == px - m; from above,
dx = px - (px - m);
dx = px - px + m;
dx = m;
Similarly with dy:
dy = n;
[color=blue]
> deltad = dx*dx + dy*dy;[/color]

This is simplified to:
deltad = m * m + n * n;
This simplification removes the variables dx and dy.

[color=blue]
> /* calculate coefficient */
> fscale = expf(deltad * norm);
>
> /* store values */
> str->array_1[index] += val_1 * fscale;
> str->array_2[index] += val_2 * fscale;
> str->array_3[index] += fscale;
> }
> }[/color]

--
Thomas Matthews

C++ newsgroup welcome message:

404 Not Found

http://www.slack.net/~shiva/welcome.txt

C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.l earn.c-c++ faq:

http://www.raos.demon.uk/acllc-c++/faq.html

Other sites:
http://www.josuttis.com -- C++ STL Library book
http://www.sgi.com/tech/stl -- Standard Template Library

**spasmous** · Jul 22 '05, 05:45 AM

Re: Optimisation needed

Andre Kostur <nntpspam@kostu r.net> wrote in message news:<Xns947CA5 6DF71A6nntpspam kosturnet@207.3 5.177.135>...[color=blue]
>
> Umm... some simple stuff that _might_ help:
>
> Move the declaration and initialization of x0 outside the inner loop. It
> doesn't need to be recalculated every time.
>
> Move the declaration and initialization of dx outside the inner loop
> (same reason). I'd probably square it outside the loop too.[/color]

Mm, I think the compiler must be doing that already - it makes no
difference to the runtime. Thanks for the tip tho' :)

A quick experiment I did commenting out the expf() call reduced the
timing by about 75%, so I suspect that's doing the damage. And that's
also probably already as fast as it can go, being a math library
function, right?

**Andre Kostur** · Jul 22 '05, 05:45 AM

Re: Optimisation needed

Thomas Matthews <Thomas_Matthew sSpamBotsSuck@s bcglobal.net> wrote in
news:4015EAAA.6 060203@sbcgloba l.net:
[color=blue]
> spasmous wrote:[color=green]
>> Hi,
>>
>> I have an inner loop that is running too slowly. I've done my best at
>> optimising it but am not really experienced at what works best. Can
>> anyone suggest any optimisations?
>>
>> Thanks for any help!
>>
>>
>> for(m=-range;m<=range; m++) // range is either 1 or 2
>> {
>> for(n=-range;n<=range; n++)
>> {
>> /* indices */
>> int x0 = (int)px - m;
>> int y0 = (int)py - n;
>> index = x0*kpar->fftrows + y0;
>>
>> /* zero out of bounds */
>> index *= (int)(index>=0 && index<(kpar->fftrows*kpar->fftrows));
>>
>> /* calculate distance */
>> float dx = px - (float)x0;
>> float dy = py - (float)y0;[/color]
>
> Let us perform some substition:
> Given X0 == px - m; from above,
> dx = px - (px - m);
> dx = px - px + m;
> dx = m;[/color]

Error. Your replacements aren't identical to the original:

Note that x0 is assigned to the integral part of px, minus m. Not just
px. So later on, when dx is calculated... the px mentioned in there is
the float px, not just the integral part. Thus you can't drop the px term
from the equations.
[color=blue]
> Similarly with dy:
> dy = n;[/color]

Same error.
[color=blue][color=green]
>> deltad = dx*dx + dy*dy;[/color]
>
> This is simplified to:
> deltad = m * m + n * n;
> This simplification removes the variables dx and dy.
>
>[color=green]
>> /* calculate coefficient */
>> fscale = expf(deltad * norm);
>>
>> /* store values */
>> str->array_1[index] += val_1 * fscale;
>> str->array_2[index] += val_2 * fscale;
>> str->array_3[index] += fscale;
>> }
>> }[/color]
>
>
>[/color]

**Patrik Stellmann** · Jul 22 '05, 05:45 AM

Re: Optimisation needed

[color=blue]
> for(m=-range;m<=range; m++) // range is either 1 or 2
> {
> for(n=-range;n<=range; n++)
> {[/color]
The compiler probably optimizes it anyway and for numeric values might
be no real difference in general but usually ++m is faster than m++
since the pseudo-code for the first looks like:

increment m;
return m;

and for the second:

temp = m;
increment m;
return temp;

so in for loops where you are not interested in the result you might
prefer the prefix ++ operator before the postfix...

**Kevin Goodsell** · Jul 22 '05, 05:45 AM

Re: Optimisation needed

spasmous wrote:[color=blue]
> Hi,
>
> I have an inner loop that is running too slowly. I've done my best at
> optimising it but am not really experienced at what works best. Can
> anyone suggest any optimisations?[/color]

I haven't really looked at your code, but here's a few tips.

The first thing you should do (and this applies to ALL attempts at
optimization) is profile the code. This serves 2 purposes: 1) It can
help you determine if optimization is actually necessary - there's no
reason to optimize code that's already fast enough. 2) It allows you to
properly target optimization effort. Never try to guess which part of
the code is slow, and never try to guess what tweaks will make it
faster. These things have to be determined empirically.

The second thing you should do is look for a better algorithm.
Algorithmic improvements almost always outperform fine-tuning, often
dramatically.

The third thing you should probably do is double check your compiler's
optimization settings. See if there's a more appropriate set of options
for your purposes.

Only after doing those should you look at fine-tuning the code. In
general, hand-optimized code is uglier, more error-prone, more difficult
to debug, and more difficult to maintain, so it should be avoided until
there are no other options. Many of the things you are likely to try are
probably already done by decent optimizing compilers, so make sure you
profile to see if your changes make a difference - there's no reason to
use uglier code that's just as slow.

-Kevin
--
My email address is valid, but changes periodically.
To contact me please use the address from a recent posting.

**NKOBAYE027** · Jul 22 '05, 05:45 AM

Re: Optimisation needed

if the range is only 1 or 2 then you're probably best of to not bother with
loops - if speed (not size) is of the essence just write the code out
straight.

definitely get rid of all those declarations and casts inside the loops -
there has to be a better way of representing the data than that
hodgepodge...us e register counters in the loops (though most compilers
probably optimise to that anyway)

depending upon the accuracy of your exp requirements use your own
approximation algorithm - just be sure to calculate the factorial
denominators beforehand and save them in an array for later use...

"spasmous" <spasmous@yahoo .com> wrote in message
news:2f066762.0 401261506.2c352 2a2@posting.goo gle.com...[color=blue]
> Hi,
>
> I have an inner loop that is running too slowly. I've done my best at
> optimising it but am not really experienced at what works best. Can
> anyone suggest any optimisations?
>
> Thanks for any help!
>
>
> for(m=-range;m<=range; m++) // range is either 1 or 2
> {
> for(n=-range;n<=range; n++)
> {
> /* indices */
> int x0 = (int)px - m;
> int y0 = (int)py - n;
> index = x0*kpar->fftrows + y0;
>
> /* zero out of bounds */
> index *= (int)(index>=0 && index<(kpar->fftrows*kpar->fftrows));
>
> /* calculate distance */
> float dx = px - (float)x0;
> float dy = py - (float)y0;
> deltad = dx*dx + dy*dy;
>
> /* calculate coefficient */
> fscale = expf(deltad * norm);
>
> /* store values */
> str->array_1[index] += val_1 * fscale;
> str->array_2[index] += val_2 * fscale;
> str->array_3[index] += fscale;
> }
> }[/color]

**Chris Theis** · Jul 22 '05, 05:45 AM

Re: Optimisation needed

"spasmous" <spasmous@yahoo .com> wrote in message
news:2f066762.0 401262319.87c10 3@posting.googl e.com...[color=blue]
> Andre Kostur <nntpspam@kostu r.net> wrote in message[/color]
news:<Xns947CA5 6DF71A6nntpspam kosturnet@207.3 5.177.135>...
[SNIP]
[color=blue]
> A quick experiment I did commenting out the expf() call reduced the
> timing by about 75%, so I suspect that's doing the damage. And that's
> also probably already as fast as it can go, being a math library
> function, right?[/color]

Well the exponential function is certainly a bottleneck. However, are you
sure that your optimizations are necessary and that this is the right place
to apply them. You should profile first and see where the time is lost. If
it's the code you gave then think whether the algorithm is the best one. If
you can answer both of these questions with yes you might consider
"hands-on" optimization. If you're dealing with constants you might consider
meta-template programming for loop unrolling and the calculation of the
exponential. Still, premature optimization might be an evil thing. Thus,
fire up your profiler first!

Regards
Chris

**Andre Kostur** · Jul 22 '05, 05:46 AM

Re: Optimisation needed

spasmous@yahoo. com (spasmous) wrote in
news:2f066762.0 401262319.87c10 3@posting.googl e.com:
[color=blue]
> Andre Kostur <nntpspam@kostu r.net> wrote in message
> news:<Xns947CA5 6DF71A6nntpspam kosturnet@207.3 5.177.135>...[color=green]
>>
>> Umm... some simple stuff that _might_ help:
>>
>> Move the declaration and initialization of x0 outside the inner loop.
>> It doesn't need to be recalculated every time.
>>
>> Move the declaration and initialization of dx outside the inner loop
>> (same reason). I'd probably square it outside the loop too.[/color]
>
>
> Mm, I think the compiler must be doing that already - it makes no
> difference to the runtime. Thanks for the tip tho' :)[/color]

Yep... that's why I said "might" :) Those are simply moving loop
invariants outside of the loop. A decent optimizer should be able to spot
those and move them outside on it's own. If you change the actual source
code, it makes it more obvious to other people reading your code.

**Nick Hounsome** · Jul 22 '05, 05:46 AM

Re: Optimisation needed

"spasmous" <spasmous@yahoo .com> wrote in message
news:2f066762.0 401262319.87c10 3@posting.googl e.com...[color=blue]
> Andre Kostur <nntpspam@kostu r.net> wrote in message[/color]
news:<Xns947CA5 6DF71A6nntpspam kosturnet@207.3 5.177.135>...[color=blue][color=green]
> >
> > Umm... some simple stuff that _might_ help:
> >
> > Move the declaration and initialization of x0 outside the inner loop.[/color][/color]
It[color=blue][color=green]
> > doesn't need to be recalculated every time.
> >
> > Move the declaration and initialization of dx outside the inner loop
> > (same reason). I'd probably square it outside the loop too.[/color]
>
>
> Mm, I think the compiler must be doing that already - it makes no
> difference to the runtime. Thanks for the tip tho' :)
>
> A quick experiment I did commenting out the expf() call reduced the
> timing by about 75%, so I suspect that's doing the damage. And that's
> also probably already as fast as it can go, being a math library
> function, right?[/color]

It is quite possible that operations on double are faster than floats as
the implementation will be optimized for double and
floating point hardware may only actually
work on doubles. You may actually be paying for a lot of
double/float conversions.

I'm not up on how these things are implemented but
expf(deltad*nor m) == pow( deltad, expf(norm) )
so precalculating expf(norm) may help.

**David Fisher** · Jul 22 '05, 05:47 AM

Re: Optimisation needed

"spasmous" <spasmous@yahoo .com> wrote:
[color=blue]
> I have an inner loop that is running too slowly. I've done my best at
> optimising it but am not really experienced at what works best. Can
> anyone suggest any optimisations?[/color]
[color=blue]
> for(m=-range;m<=range; m++) // range is either 1 or 2
> {
> for(n=-range;n<=range; n++)
> {
> /* indices */
> int x0 = (int)px - m;
> int y0 = (int)py - n;
> index = x0*kpar->fftrows + y0;
>
> /* zero out of bounds */
> index *= (int)(index>=0 && index<(kpar->fftrows*kpar->fftrows));[/color]

Instead of multiplying index by 0 or 1, you could say:

if (index < 0 || index > fftrowsSquared)
{
index = 0;
}

and define fftrowsSquared as (kpar->fftrows*kpar->fftrows) before the
beginning of the first loop.

Values of argument to expf():

(square(px - (float) ((int)px - m)) + square(py - (float) ((int) py - n))) *
norm

float pxVal = px - (float) ((int) px)
float pyVal = py - (float) ((int) py)

square(pxVal - (-range to range)) + square(pyVal - (-range to range))

square(pxVal -1) + square(pyVal - 1)
square(pxVal -1) + square(pyVal - 1)
square(pxVal -1) + square(pyVal - 1)
square(pxVal -1) + square(pyVal - 1)

[color=blue]
>
> /* calculate distance */
> float dx = px - (float) ((int) px - m);
> float dy = py - (float)y0;
> deltad = dx*dx + dy*dy;
>
> /* calculate coefficient */
> fscale = expf(deltad * norm);
>
> /* store values */
> str->array_1[index] += val_1 * fscale;
> str->array_2[index] += val_2 * fscale;
> str->array_3[index] += fscale;
> }
> }[/color]

**David Fisher** · Jul 22 '05, 05:47 AM

Re: Optimisation needed

"David Fisher" <nospam@nospam. nospam.nospam> wrote:

Oops, previous mail got sent before it was finished ...

I was trying to say, the arguments to expf() (without the multiplication by
"norm") look something like this:

square(pxVal -1) + square(pyVal + 1) == (square(pxVal) - 2 * pxVal + 1) +
(square(pyVal) + 2 * pyVal + 1)
square(pxVal - 2) + square(pyVal) == (square(pxVal) - 4 * pxVal + 4) +
square(pyVal)
etc.

where

float pxVal = px - (float) ((int) px);
float pyVal = py - (float) ((int) py);

So the differences in the arguments to expf() are a multiple of pxVal plus a
multiple of pyVal plus a constant ...

exp(a + b) = exp(a) * exp(b)

so you can calculate the following values and use them to find the required
values of expf():

exp(square(pxVa l))
exp(square(pyVa l))
exp(norm) => constant ?
exp(1) => constant, once only
exp(4) => constant, once only
exp(pxVal * 2) => same as square(exp(pxVa l)) => just calculate exp(pxVal)
once
exp(pxVal * -2) => same as sqrt(exp(pxVal) )
exp(pxVal * 4) => same as square(square(e xp(pxVal)))
exp(pxVal * -4) => same as sqrt(sqrt(exp(p xVal)))
etc.

You should end up with a set of expressions that are faster to calculate
than expf(): square, sqrt, *, etc

Hope this is helpful,

David F

**David Fisher** · Jul 22 '05, 05:47 AM

Re: Optimisation needed

"David Fisher" <nospam@nospam. nospam.nospam> wrote:
[color=blue]
> values of expf():
>
> exp(square(pxVa l))
> exp(square(pyVa l))
> exp(norm) => constant ?
> exp(1) => constant, once only
> exp(4) => constant, once only
> exp(pxVal * 2) => same as square(exp(pxVa l)) => just calculate exp(pxVal)
> once
> exp(pxVal * -2) => same as sqrt(exp(pxVal) )
> exp(pxVal * 4) => same as square(square(e xp(pxVal)))
> exp(pxVal * -4) => same as sqrt(sqrt(exp(p xVal)))
> etc.
>
> You should end up with a set of expressions that are faster to calculate
> than expf(): square, sqrt, *, etc[/color]

Hmm ... still leaves the problem of multiplying by "norm":

exp(a * b) == pow(exp(a), b)

You wouldn't want to say pow(whatever, norm) for each expression - that
would cost as much as calling expf() in the first place ...

I guess the solution is to multiply all of the above-listed values by
"norm", ie. calculate the values of:

exp(square(pxVa l) * norm)
exp(4 * norm)
etc.

(I hope the maths is right :-)

David F

Optimisation needed

Optimisation needed

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment