32-bit IEEE float multiplication

  • Andy

    32-bit IEEE float multiplication

    Hi,
    I don't know if this is the correct group to post this, but
    when I multiply a huge floating point value by a really
    small (non-zero) floating point value, I get 0 (zero) for the result.
    This creates a big hole in a 32-bit timer routine I wrote.

    Questions.
    1. Why does this happen?
    2. Is there C macros/functions I can call to tell me
    when two non-zero numbers are multiplied and the
    result will be zero?

    TIA
    Andy
  • Arthur J. O'Dwyer

    #2
    [FAQ] Re: 32-bit IEEE float multiplication


    On Wed, 3 Dec 2003, Andy wrote:
    >
    > I don't know if this is the correct group to post this, but
    > when I multiply a huge floating point value by a really
    > small (non-zero) floating point value, I get 0 (zero) for the result.
    > This creates a big hole in a 32-bit timer routine I wrote.
    >
    > Questions.
    > 1. Why does this happen?

    It's in the FAQ. In general, Computers Are Not Math. Computers,
    even big fast ones, deal exclusively in fixed-width data. That
    means you only get a finite amount of precision to work with.
    Keep dividing X by 2 over and over again, and eventually X will
    get so small that it's indistinguishable from zero. And on a
    computer, that means it's *equal* to zero. It's called "truncation"
    or "underflow" or "overflow" or "rounding error" (depending on
    exactly what's going on), and it's something that every numeric
    programmer should understand.
    Your subject line suggests you understand what a "bit" is; so
    think about it this way. You're using 32-bit numbers -- 32 bits
    can hold 2**32 different values -- so if the number you want to
    compute isn't one of those 2**32 particular values representable
    by your 32-bit numbers, then you'll get rounding error. If your
    number is very close to zero, then it'll *become* zero, just because
    that's the nearest representation the computer can get.
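
    For the curious, here is a minimal sketch of that halving experiment,
    assuming IEEE single precision (the exact count is platform-dependent,
    but on a typical IEEE system it is 150):

    #include <stdio.h>
    int main(void)
    {
        float x = 1.0f;
        int halvings = 0;
        while (x > 0.0f) {      /* keep halving until x underflows to zero */
            x /= 2.0f;
            halvings++;
        }
        printf("x became zero after %d halvings\n", halvings);
        return 0;
    }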
    > 2. Is there C macros/functions I can call to tell me
    > when two non-zero numbers are multiplied and the
    > result will be zero?

    if (a != 0 && b != 0 && a*b == 0)
        puts("duh");

    -Arthur


    • Christian Bau

      #3
      Re: 32-bit IEEE float multiplication

      In article <aed59298.0312031420.690f132d@posting.google.com>,
      bikejog@hotmail.com (Andy) wrote:
      > Hi,
      > I don't know if this is the correct group to post this, but
      > when I multiply a huge floating point value by a really
      > small (non-zero) floating point value, I get 0 (zero) for the result.
      > This creates a big hole in a 32-bit timer routine I wrote.
      >
      > Questions.
      > 1. Why does this happen?
      > 2. Is there C macros/functions I can call to tell me
      > when two non-zero numbers are multiplied and the
      > result will be zero?

      This is unusual. Could you post which compiler is used, and post source
      code that demonstrates the problem? You mean something like

      double result = 1e300 * 1e-300;

      ?


      • Sidney Cadot

        #4
        Re: 32-bit IEEE float multiplication

        Andy wrote:
        > Hi,
        > I don't know if this is the correct group to post this, but
        > when I multiply a huge floating point value by a really
        > small (non-zero) floating point value, I get 0 (zero) for the result.
        > This creates a big hole in a 32-bit timer routine I wrote.
        >
        > Questions.
        > 1. Why does this happen?
        > 2. Is there C macros/functions I can call to tell me
        > when two non-zero numbers are multiplied and the
        > result will be zero?
        >
        > TIA
        > Andy

        This is not really on-topic here, but I wouldn't know where it would be,
        so let's give it a go.

        Assuming 32-bit IEEE floating point numbers (from your subject line),
        their product should be the representable number closest to the
        exact mathematical product, subject to the rounding mode. To quote the
        first paragraph of Section 4 of IEEE 754-1985:

        "Rounding takes a number regarded as infinitely precise and, if
        necessary, modifies it to fit in the destination’s format while
        signaling the inexact exception (7.5). Except for binary <---> decimal
        conversion (whose weaker conditions are specified in 5.6), every
        operation specified in Section 5 shall be performed as if it first
        produced an intermediate result correct to infinite precision and with
        unbounded range, and then rounded that result according to one of the
        modes in this section."

        (and, of course, multiplication is specified in Section 5).

        So if you are working using IEEE 754-compliant floating point, this
        could be the case. To work this out, please show the exact (hex)
        representation of the two numbers being multiplied and the result (is
        the result a single- or double-precision number)?

        Further, it would help to know the hardware or software you're running this on.

        And lastly, the minimal program you can produce to show the problem
        would be of assistance in understanding this.


        Best regards,

        Sidney Cadot


        • Sidney Cadot

          #5
          Re: [FAQ] Re: 32-bit IEEE float multiplication

          Arthur J. O'Dwyer wrote:
          > On Wed, 3 Dec 2003, Andy wrote:
          >
          >> I don't know if this is the correct group to post this, but
          >>when I multiply a huge floating point value by a really
          >>small (non-zero) floating point value, I get 0 (zero) for the result.
          >>This creates a big hole in a 32-bit timer routine I wrote.
          >>
          >>Questions.
          >> 1. Why does this happen?
          >
          >
          > It's in the FAQ. In general, Computers Are Not Math. Computers,
          > even big fast ones, deal exclusively in fixed-width data. That
          > means you only get a finite amount of precision to work with.
          > Keep dividing X by 2 over and over again, and eventually X will
          >> get so small that it's indistinguishable from zero.

          That being true in general, of course, IEEE-754 has some pretty strict
          rules on what to expect, and I think they preclude the behavior that the
          OP describes. Probably a dud, but worth checking out, if only because it
          beats the 'Indian C programmers and "u"' thread hands-down for interestingness.

          Best regards,

          Sidney


          • John Smith

            #6
            Re: [FAQ] Re: 32-bit IEEE float multiplication

            Arthur J. O'Dwyer wrote:
            >
            > think about it this way. You're using 32-bit numbers -- 32 bits
            > can hold 2**32 different values -- so if the number you want to
            > compute isn't one of those 2**32 particular values representable
            > by your 32-bit numbers, then you'll get rounding error. If your
            > number is very close to zero, then it'll *become* zero, just because
            > that's the nearest representation the computer can get.

            I have always been a little hazy on this issue (some fractional base 10
            values not representable in base 2). I have "almost" understood it --
            until hearing it expressed in this way. It is like the understanding you
            acquire after finally hearing the sound of one hand clapping. Thanks,
            Arthur.

            JS

            • Kevin D. Quitt

              #7
              Re: 32-bit IEEE float multiplication

              ftp://ftp.quitt.net/Outgoing/goldbergFollowup.pdf


              --
              #include <standard.disclaimer>
              _
              Kevin D Quitt USA 91387-4454 96.37% of all statistics are made up
              Per the FCA, this address may not be added to any commercial mail list


              • Keith Thompson

                #8
                Re: 32-bit IEEE float multiplication

                bikejog@hotmail.com (Andy) writes:
                > I don't know if this is the correct group to post this, but
                > when I multiply a huge floating point value by a really
                > small (non-zero) floating point value, I get 0 (zero) for the result.
                > This creates a big hole in a 32-bit timer routine I wrote.
                >
                > Questions.
                > 1. Why does this happen?
                > 2. Is there C macros/functions I can call to tell me
                > when two non-zero numbers are multiplied and the
                > result will be zero?

                Floating-point arithmetic is strange, but it's not usually *that*
                strange. I'll call your huge number huge_num, and your small number
                small_num. Presumably they meet the following conditions:

                small_num > 0.0
                small_num < 1.0
                huge_num > 1.0

                I would expect the following to be true, even in the presence of
                rounding errors:

                small_num * 1.0 == small_num
                small_num * X >= small_num, for any X >= 1.0 (barring overflow)
                small_num * huge_num > small_num
                therefore
                small_num * huge_num > 0.0

                I can imagine small_num * huge_num yielding 0.0 if small_num is a
                denorm (maybe), or if small_num and huge_num are of different types.

                Are you sure that the result of the multiplication is 0.0? If you're
                displaying it with something like a printf "%f" format, it may be
                rounding to zero on output, not in the computation.

                What are the actual types and values of small_num and huge_num? Can
                you post a small self-contained program that exhibits the problem?
                Try using printf's "%g" format for output, or even something like
                "%.50g" to display more digits.

                --
                Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
                San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
                Schroedinger does Shakespeare: "To be *and* not to be"
                (Note new e-mail address)


                • Dik T. Winter

                  #9
                  Re: 32-bit IEEE float multiplication

                  In article <ln3cc1e6hy.fsf@nuthaus.mib.org> Keith Thompson <kst-u@mib.org> writes:
                  > Floating-point arithmetic is strange, but it's not usually *that*
                  > strange. I'll call your huge number huge_num, and your small number
                  > small_num. Presumably they meet the following conditions:
                  >
                  > small_num > 0.0
                  > small_num < 1.0
                  > huge_num > 1.0
                  >
                  > I would expect the following to be true, even in the presence of
                  > rounding errors:
                  >
                  > small_num * 1.0 == small_num
                  > small_num * X >= small_num, for any X >= 1.0 (barring overflow)

                  Overflow cannot occur when 0.0 < small_num < 1.0, because the product
                  is smaller than or equal to X (equality is possible due to rounding
                  when small_num is very close to 1.0).
                  > small_num * huge_num > small_num
                  > therefore
                  > small_num * huge_num > 0.0
                  >
                  > I can imagine small_num * huge_num yielding 0.0 if small_num is a
                  > denorm (maybe), or if small_num and huge_num are of different types.

                  Not even when small_num is a denormalised number. Because huge_num > 1.0
                  the product must be larger than or equal to small_num (equality is possible
                  when some rounding occurs and huge_num is very close to 1.0).
                  > Are you sure that the result of the multiplication is 0.0? If you're
                  > displaying it with something like a printf "%f" format, it may be
                  > rounding to zero on output, not in the computation.

                  I think this is the most likely cause.

                  Under IEEE 0.0 can only be the result of a multiplication when one of the
                  operands is 0.0 (obviously) or when both operands are in absolute value
                  smaller than 1.0 and their mathematical product is in absolute value
                  smaller than the smallest representable number.
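
                  As a quick illustration of that last case (a minimal sketch,
                  assuming round-to-nearest IEEE single precision):

                  #include <stdio.h>
                  int main(void)
                  {
                      float a = 1.0e-30f;   /* non-zero */
                      float b = 1.0e-30f;   /* non-zero */
                      float c = a * b;      /* exact result 1e-60 is far below the
                                               smallest denormal (about 1.4e-45),
                                               so it underflows to 0 */
                      printf("%g\n", c);    /* prints 0 */
                      return 0;
                  }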
                  --
                  dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
                  home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/


                  • Arthur J. O'Dwyer

                    #10
                    Re: [FAQ] Re: 32-bit IEEE float multiplication


                    On Thu, 4 Dec 2003, John Smith wrote:
                    >
                    > Arthur J. O'Dwyer wrote:
                    > >
                    > > think about it this way. You're using 32-bit numbers -- 32 bits
                    > > can hold 2**32 different values -- so if the number you want to
                    > > compute isn't one of those 2**32 particular values representable
                    > > by your 32-bit numbers, then you'll get rounding error. If your
                    > > number is very close to zero, then it'll *become* zero, just because
                    > > that's the nearest representation the computer can get.
                    >
                    > I have always been a little hazy on this issue (some fractional base 10
                    > values not representable in base 2). I have "almost" understood it --
                    > until hearing it expressed in this way. It is like the understanding you
                    > acquire after finally hearing the sound of one hand clapping. Thanks,
                    > Arthur.

                    You're welcome!
                    But it occurs to me (after reading some of the other replies)
                    that there is at least one point I should have mentioned in my
                    response, and one more point that you still seem to be slightly
                    "hazy" on:

                    First, as several others have pointed out, if we have

                    double a,b,c; /* initialized somehow */
                    assert(a > 1.0);
                    assert(b > 0.0);
                    assert(b < 1.0);
                    c = a*b;

                    then it is always the case that

                    assert(c != 0.0);

                    However, it is plausible that the OP might have initialized
                    'b' in such a way as to make him *think* it was a small positive
                    value, while in fact it had already been corrupted by round-off
                    error. For example, I think

                    b = 1e-400; /* Really small positive number? -- NOT! */

                    is likely to set 'b' to 0.0 on many common platforms. (If I'm
                    mistaken, just add one more zero to the end of that exponent. ;)
                    So that's where the OP should be looking in this case. (Or he
                    should look and see whether he's using the %f or %g format
                    specifiers to display the numbers, as others have said.)

                    Secondly, you talk about "fractional base 10 values not
                    representable in base 2." Such values do exist (modulo certain
                    objections about the nature of "representable") -- for example,

                    0.1 (base 10) = 0.00011001100110011... (base 2)

                    That is to say, 1/10 has a binary representation that can't be
                    precisely represented with any finite number of bits -- as opposed
                    to, say, 1/2 or 3/16, which can.
                    However, I was talking mostly about numbers that *can* be
                    represented with a finite number of bits -- the problem is just
                    that they require *more* bits than the OP's computer is willing
                    to allocate. For example,

                    pow(2, -10000)

                    is mathematically representable as

                    0.000000... (several thousand more zeroes) ...000001 (base 2),
                    or just
                    1 (base 2) multiplied by 2 to the -10011100010000 (base 2) power.

                    But the OP's 32-bit floating-point format doesn't have enough
                    bits in the exponent to represent this number (we need 14 bits
                    to store "-10000" in binary, and the IEEE format only has 8 bits
                    in its exponent field).
                    So even though the number pow(2, -10000) has a finite binary
                    expansion, it's *still* not perfectly representable by the OP's
                    computer. And so it rounds off to 0.0, and we have problems. :)
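
                    If you want to see that effect directly, here is a minimal
                    sketch (assuming IEEE single and double precision, where the
                    smallest positive values are roughly 1.4e-45 and 4.9e-324):

                    #include <stdio.h>
                    #include <math.h>
                    int main(void)
                    {
                        printf("%g\n", pow(2.0, -100));        /* fine as a double */
                        printf("%g\n", (float)pow(2.0, -200)); /* too small for float: 0 */
                        printf("%g\n", pow(2.0, -10000));      /* too small even for double: 0 */
                        return 0;
                    }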

                    HTH,
                    -Arthur


                    • Dan Pop

                      #11
                      Re: 32-bit IEEE float multiplication

                      In <aed59298.0312031420.690f132d@posting.google.com> bikejog@hotmail.com (Andy) writes:
                      >Hi,
                      > I don't know if this is the correct group to post this, but
                      >when I multiply a huge floating point value by a really
                      >small (non-zero) floating point value, I get 0 (zero) for the result.
                      >This creates a big hole in a 32-bit timer routine I wrote.
                      >
                      >Questions.
                      > 1. Why does this happen?

                      Show us the code. A minimal, but complete program illustrating your
                      problem.
                      > 2. Is there C macros/functions I can call to tell me
                      > when two non-zero numbers are multiplied and the
                      > result will be zero?

                      If the (positive) result is less than FLT_MIN, i.e. it cannot be
                      represented as a floating-point number, it is zero.
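
                      A rough sketch of such a check, using the constants from
                      <float.h> (the helper name is just for illustration, and the
                      FLT_MIN test is conservative, since IEEE denormals reach a
                      little below FLT_MIN before the result truly becomes zero):

                      #include <float.h>
                      #include <math.h>

                      int product_underflows(float a, float b)
                      {
                          /* the product of two floats is exact in double */
                          double p = (double)a * (double)b;
                          return a != 0.0f && b != 0.0f && fabs(p) < FLT_MIN;
                      }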

                      Dan
                      --
                      Dan Pop
                      DESY Zeuthen, RZ group
                      Email: Dan.Pop@ifh.de


                      • Andy

                        #12
                        Re: 32-bit IEEE float multiplication

                        The compiler is Keil for Intel 8051 and derivatives. It's
                        an embedded compiler. It was actually my mistake. I think the
                        code is actually working, but the test code I wrote had a race
                        condition on the variable gzGCLK with the ISR that actually
                        increments this variable. The actual code is appended at the
                        end of my message.
                        But here's a real question, given the code below

                        unsigned long ticks = 0;  /* 32-bit */
                        float fVal;               /* 32-bit IEEE-754 */

                        while (++ticks) {
                            fVal = (float)ticks * 0.004;
                        }

                        1. Can I expect fVal to be non-decreasing as ticks is
                        incremented? (until, of course, ticks wraps around
                        to 0)
                        2. Does fVal cover all the integral values? ie, could it
                        go from say 56 to 60, skipping 57, 58, and 59?
                        3. In this example, each integral value corresponds to 250
                        ticks. Are all intervals between any two consecutive
                        integral values of fVal always 250 ticks? (within tolerance,
                        of course). ie, between fVal of 250 and 251, there are
                        250 ticks. Is this true for all integral intervals of fVal?
                        4. How about for a smaller number like 0.00004?

                        Sorry to have asked so many questions. I'm not too good with
                        this IEEE float thing.

                        /* the actual test code I wrote */
                        typedef unsigned long GCLK_T;
                        #define GCLK_SEC_PER_TICK 0.004

                        extern volatile GCLK_T gzGCLK;  /* incremented in an ISR */

                        GCLK_T RealElapsedTicks(GCLK_T zStart) {
                            return (gzGCLK - zStart);
                        }

                        GCLK_T RealElapsedSecs(GCLK_T zStart) {
                            float fVal;

                            gzGCLK = 0xffffffff; /* ERROR. Race condition with ISR */
                            zStart = 0x00000000;

                            fVal = (float)RealElapsedTicks(zStart) * ((float)GCLK_SEC_PER_TICK);
                            return (fVal);
                        }

                        TIA
                        Andy


                        Christian Bau <christian.bau@cbau.freeserve.co.uk> wrote in message news:<christian.bau-D0D111.23144703122003@slb-newsm1.svr.pol.co.uk>...
                        > In article <aed59298.03120 31420.690f132d@ posting.google. com>,
                        > bikejog@hotmail .com (Andy) wrote:
                        >
                        > > Hi,
                        > > I don't know if this is the correct group to post this, but
                        > > when I multiply a huge floating point value by a really
                        > > small (non-zero) floating point value, I get 0 (zero) for the result.
                        > > This creates a big hole in a 32-bit timer routine I wrote.
                        > >
                        > > Questions.
                        > > 1. Why does this happen?
                        > > 2. Is there C macros/functions I can call to tell me
                        > > when two non-zero numbers are multiplied and the
                        > > result will be zero?
                        >
                        > This is unusual. Could you post which compiler is used, and post source
                        > code that demonstrates the problem? You mean something like
                        >
                        > double result = 1e300 * 1e-300;
                        >
                        > ?


                        • Dik T. Winter

                          #13
                          Re: 32-bit IEEE float multiplication

                          In article <aed59298.0312040655.43db93e4@posting.google.com> bikejog@hotmail.com (Andy) writes:
                          > But here's a real question, given the code below
                          >
                          > unsigned long ticks = 0; /* 32-bit */
                          > float fVal; /* 32-bit IEEE-754 */
                          >
                          > while(++ticks) {
                          > fVal = (float)ticks * 0.004;
                          > }
                          >
                          > 1. Can I expect fVal to be non-decreasing as ticks is
                          > incremented? (until, of course, ticks wraps around
                          > to 0)

                          Yes.
                          > 2. Does fVal cover all the integral values? ie, could it
                          > go from say 56 to 60, skipping 57, 58, and 59?

                          No. It will not go from 56 to 60, but it may go from slightly less than
                          59 to slightly more than 59, skipping 59 itself.
                          > 3. In this example, each integral value corresponds to 250
                          > ticks. Are all intervals between any two consecutive
                          > integral values of fVal always 250 ticks? (within tolerance,
                          > of course). ie, between fVal of 250 and 251, there are
                          > 250 ticks. Is this true for all integral intervals of fVal?

                          Because of the above observation, no, not necessarily.
                          > 4. How about for a smaller number like 0.00004?

                          Similar answer. 0.004 and 0.00004 are not exactly representable as
                          floating point numbers, so rounding occurs both when the representation
                          of those numbers is created and when this value is used in the
                          multiplication. A better way would be to calculate:
                          (float)ticks / 250.0
                          but that may be slower on your system. (250.0 is exactly representable,
                          so IEEE mandates that when ticks is an integer that is a multiple of
                          250 the result should be exact.)

                          Another problem with your code is that when ticks exceeds 2**24 the
                          number is no longer exactly representable as a floating point number,
                          so all bets are off.
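
                          To see that effect concretely, here is a minimal sketch,
                          assuming a 32-bit unsigned long and IEEE single precision:

                          #include <stdio.h>
                          int main(void)
                          {
                              unsigned long ticks = 16777217UL;    /* 2**24 + 1 */
                              float f = (float)ticks;              /* only 24 significand bits:
                                                                      the trailing +1 is lost */
                              printf("%lu -> %.1f\n", ticks, f);   /* typically: 16777217 -> 16777216.0 */
                              return 0;
                          }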

                          To get the best possible answer you need something like:
                          (float)(ticks / 250) + (float)(ticks % 250)/250.0
                          --
                          dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
                          home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/


                          • Dan Pop

                            #14
                            Re: 32-bit IEEE float multiplication

                            In <aed59298.0312040655.43db93e4@posting.google.com> bikejog@hotmail.com (Andy) writes:
                            >The compiler is Keil for Intel 8051 and derivatives. It's
                            >an embedded compiler. It was actually my mistake. I think the
                            >code is actually working, but the test code I wrote had a race
                            >condition on the variable gzGCLK with the ISR that actually
                            >increments this variable. The actual code is appended at the
                            >end of my message.
                            > But here's a real question, given the code below
                            >
                            > unsigned long ticks = 0; /* 32-bit */
                            > float fVal; /* 32-bit IEEE-754 */
                            >
                            > while(++ticks) {
                            > fVal = (float)ticks * 0.004;

                            You're evaluating this expression using double precision, which is
                            probably the last thing you want on an 8051 or any other processor with
                            no hardware support for floating point. The right expression is:

                            fVal = ticks * 0.004f;
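
                            A rough illustration of the difference (just a sketch to make
                            the implicit promotion visible):

                            float fVal;
                            unsigned long ticks = 12345UL;

                            fVal = (float)ticks * 0.004;   /* 0.004 is a double constant, so the
                                                              multiply happens in double and the
                                                              result is converted back to float */
                            fVal = (float)ticks * 0.004f;  /* 0.004f keeps the whole calculation
                                                              in single precision, which is much
                                                              cheaper in software floating point */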
                            > }
                            >
                            > 1. Can I expect fVal to be non-decreasing as ticks is
                            > incremented? (until, of course, ticks wraps around
                            > to 0)

                            Yes.
                            > 2. Does fVal cover all the integral values? ie, could it
                            > go from say 56 to 60, skipping 57, 58, and 59?

                            If fVal ever reaches the value 56, its next values will be 56.004,
                            56.008 and so on. Actually, approximations of these values, because
                            IEEE-754 cannot represent them exactly.
                            > 3. In this example, each integral value corresponds to 250
                            > ticks. Are all intervals between any two consecutive
                            > integral values of fVal always 250 ticks? (within tolerance,
                            > of course). ie, between fVal of 250 and 251, there are
                            > 250 ticks. Is this true for all integral intervals of fVal?

                            0.004 cannot be represented by IEEE-754 exactly, because it is not a
                            finite sum of (negative) powers of two. Therefore, your computations
                            will seldom yield an exact integer value even when the mathematically
                            exact result is an integer.

                            Things would be different for 0.00390625 (256 ticks per integral value)
                            because this value can be represented exactly.
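
                            You can see the difference by printing the stored constants
                            with extra digits, for example:

                            #include <stdio.h>
                            int main(void)
                            {
                                printf("%.20g\n", 0.004f);       /* something like 0.004000000189989805... */
                                printf("%.20g\n", 0.00390625f);  /* exactly 0.00390625, i.e. 2**-8 */
                                return 0;
                            }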
                            > 4. How about for a smaller number like 0.00004?

                            No difference.
                            > Sorry to have asked so many questions. I'm not too good with
                            >this IEEE float thing.

                            You may want to understand it, to avoid surprises. Binary floating point
                            arithmetic doesn't work the same way as normal arithmetic, mostly because
                            most real numbers cannot be represented exactly.

                            Dan
                            --
                            Dan Pop
                            DESY Zeuthen, RZ group
                            Email: Dan.Pop@ifh.de


                            • John Smith

                              #15
                              Re: [FAQ] Re: 32-bit IEEE float multiplication

                              Arthur J. O'Dwyer wrote:
                              > HTH,
                               > -Arthur

                              It does help. Understanding refined. Thanks.

                              JS


