32-bit IEEE float multiplication

  • Andy

    32-bit IEEE float multiplication

    Hi,
    I don't know if this is the correct group to post this, but
    when I multiply a huge floating point value by a really
    small (non-zero) floating point value, I get 0 (zero) for the result.
    This creates a big hole in a 32-bit timer routine I wrote.

    Questions.
    1. Why does this happen?
    2. Is there C macros/functions I can call to tell me
    when two non-zero numbers are multiplied and the
    result will be zero?

    TIA
    Andy
  • Arthur J. O'Dwyer

    #2
    [FAQ] Re: 32-bit IEEE float multiplication


    On Wed, 3 Dec 2003, Andy wrote:
    >
    > I don't know if this is the correct group to post this, but
    > when I multiply a huge floating point value by a really
    > small (non-zero) floating point value, I get 0 (zero) for the result.
    > This creates a big hole in a 32-bit timer routine I wrote.
    >
    > Questions.
    > 1. Why does this happen?

    It's in the FAQ. In general, Computers Are Not Math. Computers,
    even big fast ones, deal exclusively in fixed-width data. That
    means you only get a finite amount of precision to work with.
    Keep dividing X by 2 over and over again, and eventually X will
    get so small that it's indistinguishable from zero. And on a
    computer, that means it's *equal* to zero. It's called "truncation"
    or "underflow" or "overflow" or "rounding error" (depending on
    exactly what's going on), and it's something that every numeric
    programmer should understand.
    Your subject line suggests you understand what a "bit" is; so
    think about it this way. You're using 32-bit numbers -- 32 bits
    can hold 2**32 different values -- so if the number you want to
    compute isn't one of those 2**32 particular values representable
    by your 32-bit numbers, then you'll get rounding error. If your
    number is very close to zero, then it'll *become* zero, just because
    that's the nearest representation the computer can get.
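
    For the curious, here is a minimal sketch of that halving experiment,
    assuming IEEE single precision (the exact count is platform-dependent,
    but on a typical IEEE system it is 150):

    #include <stdio.h>
    int main(void)
    {
        float x = 1.0f;
        int halvings = 0;
        while (x > 0.0f) {      /* keep halving until x underflows to zero */
            x /= 2.0f;
            halvings++;
        }
        printf("x became zero after %d halvings\n", halvings);
        return 0;
    }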
    > 2. Is there C macros/functions I can call to tell me
    > when two non-zero numbers are multiplied and the
    > result will be zero?

    if (a != 0 && b != 0 && a*b == 0)
        puts("duh");

    -Arthur


    • Christian Bau

      #3
      Re: 32-bit IEEE float multiplication

      In article <aed59298.0312031420.690f132d@posting.google.com>,
      bikejog@hotmail.com (Andy) wrote:
      > Hi,
      > I don't know if this is the correct group to post this, but
      > when I multiply a huge floating point value by a really
      > small (non-zero) floating point value, I get 0 (zero) for the result.
      > This creates a big hole in a 32-bit timer routine I wrote.
      >
      > Questions.
      > 1. Why does this happen?
      > 2. Is there C macros/functions I can call to tell me
      > when two non-zero numbers are multiplied and the
      > result will be zero?

      This is unusual. Could you post which compiler is used, and post source
      code that demonstrates the problem? You mean something like

      double result = 1e300 * 1e-300;

      ?


      • Sidney Cadot

        #4
        Re: 32-bit IEEE float multiplication

        Andy wrote:
        > Hi,
        > I don't know if this is the correct group to post this, but
        > when I multiply a huge floating point value by a really
        > small (non-zero) floating point value, I get 0 (zero) for the result.
        > This creates a big hole in a 32-bit timer routine I wrote.
        >
        > Questions.
        > 1. Why does this happen?
        > 2. Is there C macros/functions I can call to tell me
        > when two non-zero numbers are multiplied and the
        > result will be zero?
        >
        > TIA
        > Andy

        This is not really on-topic here, but I wouldn't know where it would be,
        so let's give it a go.

        Assuming 32-bit IEEE floating point numbers (from your subject line),
        their product should be the representable number closest to the
        exact mathematical product, subject to the rounding mode. To quote the
        first paragraph of Section 4 of IEEE 754-1985:

        "Rounding takes a number regarded as infinitely precise and, if
        necessary, modifies it to fit in the destination’s format while
        signaling the inexact exception (7.5). Except for binary <---> decimal
        conversion (whose weaker conditions are specified in 5.6), every
        operation specified in Section 5 shall be performed as if it first
        produced an intermediate result correct to infinite precision and with
        unbounded range, and then rounded that result according to one of the
        modes in this section."

        (and, of course, multiplication is specified in Section 5).

        So if you are working using IEEE 754-compliant floating point, this
        could be the case. To work this out, please show the exact (hex)
        representation of the two numbers being multiplied and the result (is
        the result a single- or double-precision number)?

        Further, it would help to know the hardware or software you're running this on.

        And lastly, the minimal program you can produce to show the problem
        would be of assistance in understanding this.


        Best regards,

        Sidney Cadot


        • Sidney Cadot

          #5
          Re: [FAQ] Re: 32-bit IEEE float multiplication

          Arthur J. O'Dwyer wrote:
          > On Wed, 3 Dec 2003, Andy wrote:
          >
          >> I don't know if this is the correct group to post this, but
          >>when I multiply a huge floating point value by a really
          >>small (non-zero) floating point value, I get 0 (zero) for the result.
          >>This creates a big hole in a 32-bit timer routine I wrote.
          >>
          >>Questions.
          >> 1. Why does this happen?
          >
          >
          > It's in the FAQ. In general, Computers Are Not Math. Computers,
          > even big fast ones, deal exclusively in fixed-width data. That
          > means you only get a finite amount of precision to work with.
          > Keep dividing X by 2 over and over again, and eventually X will
          >> get so small that it's indistinguishable from zero.

          That being true in general, of course, IEEE-754 has some pretty strict
          rules on what to expect, and I think they preclude the behavior that the
          OP describes. Probably a dud, but worth checking out, if only because it
          beats the 'Indian C programmers and "u"' thread hands-down for interestingness.

          Best regards,

          Sidney


          • John Smith

            #6
            Re: [FAQ] Re: 32-bit IEEE float multiplication

            Arthur J. O'Dwyer wrote:
            >
            > think about it this way. You're using 32-bit numbers -- 32 bits
            > can hold 2**32 different values -- so if the number you want to
            > compute isn't one of those 2**32 particular values representable
            > by your 32-bit numbers, then you'll get rounding error. If your
            > number is very close to zero, then it'll *become* zero, just because
            > that's the nearest representation the computer can get.

            I have always been a little hazy on this issue (some fractional base 10
            values not representable in base 2). I have "almost" understood it --
            until hearing it expressed in this way. It is like the understanding you
            acquire after finally hearing the sound of one hand clapping. Thanks,
            Arthur.

            JS

            • Kevin D. Quitt

              #7
              Re: 32-bit IEEE float multiplication

              ftp://ftp.quitt.net/Outgoing/goldbergFollowup.pdf


              --
              #include <standard.disclaimer>
              _
              Kevin D Quitt USA 91387-4454 96.37% of all statistics are made up
              Per the FCA, this address may not be added to any commercial mail list


              • Keith Thompson

                #8
                Re: 32-bit IEEE float multiplication

                bikejog@hotmail.com (Andy) writes:
                > I don't know if this is the correct group to post this, but
                > when I multiply a huge floating point value by a really
                > small (non-zero) floating point value, I get 0 (zero) for the result.
                > This creates a big hole in a 32-bit timer routine I wrote.
                >
                > Questions.
                > 1. Why does this happen?
                > 2. Is there C macros/functions I can call to tell me
                > when two non-zero numbers are multiplied and the
                > result will be zero?

                Floating-point arithmetic is strange, but it's not usually *that*
                strange. I'll call your huge number huge_num, and your small number
                small_num. Presumably they meet the following conditions:

                small_num > 0.0
                small_num < 1.0
                huge_num > 1.0

                I would expect the following to be true, even in the presence of
                rounding errors:

                small_num * 1.0 == small_num
                small_num * X >= small_num, for any X >= 1.0 (barring overflow)
                small_num * huge_num > small_num
                therefore
                small_num * huge_num > 0.0

                I can imagine small_num * huge_num yielding 0.0 if small_num is a
                denorm (maybe), or if small_num and huge_num are of different types.

                Are you sure that the result of the multiplication is 0.0? If you're
                displaying it with something like a printf "%f" format, it may be
                rounding to zero on output, not in the computation.

                What are the actual types and values of small_num and huge_num? Can
                you post a small self-contained program that exhibits the problem?
                Try using printf's "%g" format for output, or even something like
                "%.50g" to display more digits.

                --
                Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
                San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
                Schroedinger does Shakespeare: "To be *and* not to be"
                (Note new e-mail address)


                • Dik T. Winter

                  #9
                  Re: 32-bit IEEE float multiplication

                  In article <ln3cc1e6hy.fsf@nuthaus.mib.org> Keith Thompson <kst-u@mib.org> writes:
                  > Floating-point arithmetic is strange, but it's not usually *that*
                  > strange. I'll call your huge number huge_num, and your small number
                  > small_num. Presumably they meet the following conditions:
                  >
                  > small_num > 0.0
                  > small_num < 1.0
                  > huge_num > 1.0
                  >
                  > I would expect the following to be true, even in the presence of
                  > rounding errors:
                  >
                  > small_num * 1.0 == small_num
                  > small_num * X >= small_num, for any X >= 1.0 (barring overflow)

                  Overflow cannot occur when 0.0 < small_num < 1.0, because the product
                  is smaller than or equal to X (equality is possible due to rounding
                  when small_num is very close to 1.0).
                  > small_num * huge_num > small_num
                  > therefore
                  > small_num * huge_num > 0.0
                  >
                  > I can imagine small_num * huge_num yielding 0.0 if small_num is a
                  > denorm (maybe), or if small_num and huge_num are of different types.

                  Not even when small_num is a denormalised number. Because huge_num > 1.0
                  the product must be larger than or equal to small_num (equality is possible
                  when some rounding occurs and huge_num is very close to 1.0).
                  > Are you sure that the result of the multiplication is 0.0? If you're
                  > displaying it with something like a printf "%f" format, it may be
                  > rounding to zero on output, not in the computation.

                  I think this is the most likely cause.

                  Under IEEE 0.0 can only be the result of a multiplication when one of the
                  operands is 0.0 (obviously) or when both operands are in absolute value
                  smaller than 1.0 and their mathematical product is in absolute value
                  smaller than the smallest representable number.
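
                  As a quick illustration of that last case (a minimal sketch,
                  assuming round-to-nearest IEEE single precision):

                  #include <stdio.h>
                  int main(void)
                  {
                      float a = 1.0e-30f;   /* non-zero */
                      float b = 1.0e-30f;   /* non-zero */
                      float c = a * b;      /* exact result 1e-60 is far below the
                                               smallest denormal (about 1.4e-45),
                                               so it underflows to 0 */
                      printf("%g\n", c);    /* prints 0 */
                      return 0;
                  }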
                  --
                  dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
                  home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/


                  • Arthur J. O'Dwyer

                    #10
                    Re: [FAQ] Re: 32-bit IEEE float multiplication


                    On Thu, 4 Dec 2003, John Smith wrote:
                    >
                    > Arthur J. O'Dwyer wrote:
                    > >
                    > > think about it this way. You're using 32-bit numbers -- 32 bits
                    > > can hold 2**32 different values -- so if the number you want to
                    > > compute isn't one of those 2**32 particular values representable
                    > > by your 32-bit numbers, then you'll get rounding error. If your
                    > > number is very close to zero, then it'll *become* zero, just because
                    > > that's the nearest representation the computer can get.
                    >
                    > I have always been a little hazy on this issue (some fractional base 10
                    > values not representable in base 2). I have "almost" understood it --
                    > until hearing it expressed in this way. It is like the understanding you
                    > acquire after finally hearing the sound of one hand clapping. Thanks,
                    > Arthur.

                    You're welcome!
                    But it occurs to me (after reading some of the other replies)
                    that there is at least one point I should have mentioned in my
                    response, and one more point that you still seem to be slightly
                    "hazy" on:

                    First, as several others have pointed out, if we have

                    double a,b,c; /* initialized somehow */
                    assert(a > 1.0);
                    assert(b > 0.0);
                    assert(b < 1.0);
                    c = a*b;

                    then it is always the case that

                    assert(c != 0.0);

                    However, it is plausible that the OP might have initialized
                    'b' in such a way as to make him *think* it was a small positive
                    value, while in fact it had already been corrupted by round-off
                    error. For example, I think

                    b = 1e-400; /* Really small positive number? -- NOT! */

                    is likely to set 'b' to 0.0 on many common platforms. (If I'm
                    mistaken, just add one more zero to the end of that exponent. ;)
                    So that's where the OP should be looking in this case. (Or he
                    should look and see whether he's using the %f or %g format
                    specifiers to display the numbers, as others have said.)

                    Secondly, you talk about "fractional base 10 values not
                    representable in base 2." Such values do exist (modulo certain
                    objections about the nature of "representable") -- for example,

                    0.1 (base 10) = 0.00011001100110011... (base 2)

                    That is to say, 1/10 has a binary representation that can't be
                    precisely represented with any finite number of bits -- as opposed
                    to, say, 1/2 or 3/16, which can.
                    However, I was talking mostly about numbers that *can* be
                    represented with a finite number of bits -- the problem is just
                    that they require *more* bits than the OP's computer is willing
                    to allocate. For example,

                    pow(2, -10000)

                    is mathematically representable as

                    0.000000... (several thousand more zeroes) ...000001 (base 2),
                    or just
                    1 (base 2) multiplied by 2 to the -10011100010000 (base 2) power.

                    But the OP's 32-bit floating-point format doesn't have enough
                    bits in the exponent to represent this number (we need 14 bits
                    to store "-10000" in binary, and the IEEE format only has 8 bits
                    in its exponent field).
                    So even though the number pow(2, -10000) has a finite binary
                    expansion, it's *still* not perfectly representable by the OP's
                    computer. And so it rounds off to 0.0, and we have problems. :)
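
                    If you want to see that effect directly, here is a minimal
                    sketch (assuming IEEE single and double precision, where the
                    smallest positive values are roughly 1.4e-45 and 4.9e-324):

                    #include <stdio.h>
                    #include <math.h>
                    int main(void)
                    {
                        printf("%g\n", pow(2.0, -100));        /* fine as a double */
                        printf("%g\n", (float)pow(2.0, -200)); /* too small for float: 0 */
                        printf("%g\n", pow(2.0, -10000));      /* too small even for double: 0 */
                        return 0;
                    }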

                    HTH,
                    -Arthur


                    • Dan Pop

                      #11
                      Re: 32-bit IEEE float multiplication

                      In <aed59298.0312031420.690f132d@posting.google.com> bikejog@hotmail.com (Andy) writes:
                      >Hi,
                      > I don't know if this is the correct group to post this, but
                      >when I multiply a huge floating point value by a really
                      >small (non-zero) floating point value, I get 0 (zero) for the result.
                      >This creates a big hole in a 32-bit timer routine I wrote.
                      >
                      >Questions.
                      > 1. Why does this happen?

                      Show us the code. A minimal, but complete program illustrating your
                      problem.
                      > 2. Is there C macros/functions I can call to tell me
                      > when two non-zero numbers are multiplied and the
                      > result will be zero?

                      If the (positive) result is less than FLT_MIN, i.e. it cannot be
                      represented as a floating-point number, it is zero.
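
                      A rough sketch of such a check, using the constants from
                      <float.h> (the helper name is just for illustration, and the
                      FLT_MIN test is conservative, since IEEE denormals reach a
                      little below FLT_MIN before the result truly becomes zero):

                      #include <float.h>
                      #include <math.h>

                      int product_underflows(float a, float b)
                      {
                          /* the product of two floats is exact in double */
                          double p = (double)a * (double)b;
                          return a != 0.0f && b != 0.0f && fabs(p) < FLT_MIN;
                      }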

                      Dan
                      --
                      Dan Pop
                      DESY Zeuthen, RZ group
                      Email: Dan.Pop@ifh.de


                      • Andy

                        #12
                        Re: 32-bit IEEE float multiplication

                        The compiler is Keil for Intel 8051 and derivatives. It's
                        an embedded compiler. It was actually my mistake. I think the
                        code is actually working, but the test code I wrote had a race
                        condition on the variable gzGCLK with the ISR that actually
                        increments this variable. The actual code is appended at the
                        end of my message.
                        But here's a real question, given the code below

                        unsigned long ticks = 0;  /* 32-bit */
                        float fVal;               /* 32-bit IEEE-754 */

                        while (++ticks) {
                            fVal = (float)ticks * 0.004;
                        }

                        1. Can I expect fVal to be non-decreasing as ticks is
                        incremented? (until, of course, ticks wraps around
                        to 0)
                        2. Does fVal cover all the integral values? ie, could it
                        go from say 56 to 60, skipping 57, 58, and 59?
                        3. In this example, each integral value corresponds to 250
                        ticks. Are all intervals between any two consecutive
                        integral values of fVal always 250 ticks? (within tolerance,
                        of course). ie, between fVal of 250 and 251, there are
                        250 ticks. Is this true for all integral intervals of fVal?
                        4. How about for a smaller number like 0.00004?

                        Sorry to have asked so many questions. I'm not too good with
                        this IEEE float thing.

                        /* the actual test code I wrote */
                        typedef unsigned long GCLK_T;
                        #define GCLK_SEC_PER_TICK 0.004

                        extern volatile GCLK_T gzGCLK;  /* incremented in an ISR */

                        GCLK_T RealElapsedTicks(GCLK_T zStart) {
                            return (gzGCLK - zStart);
                        }

                        GCLK_T RealElapsedSecs(GCLK_T zStart) {
                            float fVal;

                            gzGCLK = 0xffffffff; /* ERROR. Race condition with ISR */
                            zStart = 0x00000000;

                            fVal = (float)RealElapsedTicks(zStart) * ((float)GCLK_SEC_PER_TICK);
                            return (fVal);
                        }

                        TIA
                        Andy


                        Christian Bau <christian.bau@cbau.freeserve.co.uk> wrote in message news:<christian.bau-D0D111.23144703122003@slb-newsm1.svr.pol.co.uk>...
                        > In article <aed59298.03120 31420.690f132d@ posting.google. com>,
                        > bikejog@hotmail .com (Andy) wrote:
                        >
                        > > Hi,
                        > > I don't know if this is the correct group to post this, but
                        > > when I multiply a huge floating point value by a really
                        > > small (non-zero) floating point value, I get 0 (zero) for the result.
                        > > This creates a big hole in a 32-bit timer routine I wrote.
                        > >
                        > > Questions.
                        > > 1. Why does this happen?
                        > > 2. Is there C macros/functions I can call to tell me
                        > > when two non-zero numbers are multiplied and the
                        > > result will be zero?
                        >
                        > This is unusual. Could you post which compiler is used, and post source
                        > code that demonstrates the problem? You mean something like
                        >
                        > double result = 1e300 * 1e-300;
                        >
                        > ?


                        • Dik T. Winter

                          #13
                          Re: 32-bit IEEE float multiplication

                          In article <aed59298.0312040655.43db93e4@posting.google.com> bikejog@hotmail.com (Andy) writes:
                          > But here's a real question, given the code below
                          >
                          > unsigned long ticks = 0; /* 32-bit */
                          > float fVal; /* 32-bit IEEE-754 */
                          >
                          > while(++ticks) {
                          > fVal = (float)ticks * 0.004;
                          > }
                          >
                          > 1. Can I expect fVal to be non-decreasing as ticks is
                          > incremented? (until, of course, ticks wraps around
                          > to 0)

                          Yes.
                          > 2. Does fVal cover all the integral values? ie, could it
                          > go from say 56 to 60, skipping 57, 58, and 59?

                          No. It will not go from 56 to 60, but it may go from slightly less than
                          59 to slightly more than 59, skipping 59 itself.
                          > 3. In this example, each integral value corresponds to 250
                          > ticks. Are all intervals between any two consecutive
                          > integral values of fVal always 250 ticks? (within tolerance,
                          > of course). ie, between fVal of 250 and 251, there are
                          > 250 ticks. Is this true for all integral intervals of fVal?

                          Because of the above observation, no, not necessarily.
                          > 4. How about for a smaller number like 0.00004?

                          Similar answer. 0.004 and 0.00004 are not exactly representable as
                          floating point numbers, so rounding occurs both when the representation
                          of those numbers is created and when this value is used in the
                          multiplication. A better way would be to calculate:
                          (float)ticks / 250.0
                          but that may be slower on your system. (250.0 is exactly representable,
                          so IEEE mandates that when ticks is an integer that is a multiple of
                          250 the result should be exact.)

                          Another problem with your code is that when ticks exceeds 2**24 the
                          number is no longer exactly representable as a floating point number,
                          so all bets are off.
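
                          To see that effect concretely, here is a minimal sketch,
                          assuming a 32-bit unsigned long and IEEE single precision:

                          #include <stdio.h>
                          int main(void)
                          {
                              unsigned long ticks = 16777217UL;    /* 2**24 + 1 */
                              float f = (float)ticks;              /* only 24 significand bits:
                                                                      the trailing +1 is lost */
                              printf("%lu -> %.1f\n", ticks, f);   /* typically: 16777217 -> 16777216.0 */
                              return 0;
                          }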

                          To get the best possible answer you need something like:
                          (float)(ticks / 250) + (float)(ticks % 250)/250.0
                          --
                          dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
                          home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/


                          • Dan Pop

                            #14
                            Re: 32-bit IEEE float multiplication

                            In <aed59298.0312040655.43db93e4@posting.google.com> bikejog@hotmail.com (Andy) writes:
                            >The compiler is Keil for Intel 8051 and derivatives. It's
                            >an embedded compiler. It was actually my mistake. I think the
                            >code is actually working, but the test code I wrote had a race
                            >condition on the variable gzGCLK with the ISR that actually
                            >increments this variable. The actual code is appended at the
                            >end of my message.
                            > But here's a real question, given the code below
                            >
                            > unsigned long ticks = 0; /* 32-bit */
                            > float fVal; /* 32-bit IEEE-754 */
                            >
                            > while(++ticks) {
                            > fVal = (float)ticks * 0.004;

                            You're evaluating this expression using double precision, which is
                            probably the last thing you want on an 8051 or any other processor with
                            no hardware support for floating point. The right expression is:

                            fVal = ticks * 0.004f;
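
                            A rough illustration of the difference (just a sketch to make
                            the implicit promotion visible):

                            float fVal;
                            unsigned long ticks = 12345UL;

                            fVal = (float)ticks * 0.004;   /* 0.004 is a double constant, so the
                                                              multiply happens in double and the
                                                              result is converted back to float */
                            fVal = (float)ticks * 0.004f;  /* 0.004f keeps the whole calculation
                                                              in single precision, which is much
                                                              cheaper in software floating point */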
                            > }
                            >
                            > 1. Can I expect fVal to be non-decreasing as ticks is
                            > incremented? (until, of course, ticks wraps around
                            > to 0)

                            Yes.
                            > 2. Does fVal cover all the integral values? ie, could it
                            > go from say 56 to 60, skipping 57, 58, and 59?

                            If fVal ever reaches the value 56, its next values will be 56.004,
                            56.008 and so on. Actually, approximations of these values, because
                            IEEE-754 cannot represent them exactly.
                            > 3. In this example, each integral value corresponds to 250
                            > ticks. Are all intervals between any two consecutive
                            > integral values of fVal always 250 ticks? (within tolerance,
                            > of course). ie, between fVal of 250 and 251, there are
                            > 250 ticks. Is this true for all integral intervals of fVal?

                            0.004 cannot be represented by IEEE-754 exactly, because it is not a
                            finite sum of (negative) powers of two. Therefore, your computations
                            will seldom yield an exact integer value even when the mathematically
                            exact result is an integer.

                            Things would be different for 0.00390625 (256 ticks per integral value)
                            because this value can be represented exactly.
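
                            You can see the difference by printing the stored constants
                            with extra digits, for example:

                            #include <stdio.h>
                            int main(void)
                            {
                                printf("%.20g\n", 0.004f);       /* something like 0.004000000189989805... */
                                printf("%.20g\n", 0.00390625f);  /* exactly 0.00390625, i.e. 2**-8 */
                                return 0;
                            }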
                            > 4. How about for a smaller number like 0.00004?

                            No difference.
                            > Sorry to have asked so many questions. I'm not too good with
                            >this IEEE float thing.

                            You may want to understand it, to avoid surprises. Binary floating point
                            arithmetic doesn't work the same way as normal arithmetic, mostly because
                            most real numbers cannot be represented exactly.

                            Dan
                            --
                            Dan Pop
                            DESY Zeuthen, RZ group
                            Email: Dan.Pop@ifh.de


                            • John Smith

                              #15
                              Re: [FAQ] Re: 32-bit IEEE float multiplication

                              Arthur J. O'Dwyer wrote:
                              > HTH,
                               > -Arthur

                              It does help. Understanding refined. Thanks.

                              JS


