Floating point calculation

  • Vinoth

    Floating point calculation

    I'm working on an ARM (ARM9) system that has neither a floating-point
    coprocessor nor floating-point libraries, but it does support long long int
    (64 bits).
    Can you provide a link that discusses ways to emulate floating-point
    calculations with just long int or long long int? For example, if I have the
    formula X=(1-b)*Y + b*Z in the floating-point domain, I can calculate X with
    just long ints (some precision may be lost in the final division; that's OK).

    Floating Point:
    X=(1-b)*Y + b*Z
    /* 'b' is a floating-point variable with 4 decimal places of precision, in
    the range 0 to 1; X, Y and Z are unsigned int */

    With long int:
    I can emulate the above calculation as:
    X=((10000-10000*b)*Y +10000*b*Z)/10000

    I need a link that discusses this and any similar approaches.

    --
    -Vinoth



  • Gregory Pietsch

    #2
    Re: Floating point calculation


    Vinoth wrote:
    > I'm working on an ARM (ARM9) system that has neither a floating-point
    > coprocessor nor floating-point libraries, but it does support long long int
    > (64 bits).
    > Can you provide a link that discusses ways to emulate floating-point
    > calculations with just long int or long long int? For example, if I have the
    > formula X=(1-b)*Y + b*Z in the floating-point domain, I can calculate X with
    > just long ints (some precision may be lost in the final division; that's OK).
    >
    > Floating Point:
    > X=(1-b)*Y + b*Z
    > /* 'b' is a floating-point variable with 4 decimal places of precision, in
    > the range 0 to 1; X, Y and Z are unsigned int */
    >
    > With long int:
    > I can emulate the above calculation as:
    > X=((10000-10000*b)*Y +10000*b*Z)/10000
    >
    > I need a link that discusses this and any similar approaches.
    >
    > --
    > -Vinoth

    One way of faking floating point that I thought of is to find free
    multi-precision math libraries -- GNU mp and the library that comes
    with GNU bc and dc come to mind -- since those libraries treat the
    numbers as arrays of digits.

    Gregory Pietsch


    • Rob Morris

      #3
      Re: Floating point calculation

      Vinoth wrote:
      > I'm working on an ARM (ARM9) system that has neither a floating-point
      > coprocessor nor floating-point libraries, but it does support long long int
      > (64 bits).
      > Can you provide a link that discusses ways to emulate floating-point
      > calculations with just long int or long long int? For example, if I have the
      > formula X=(1-b)*Y + b*Z in the floating-point domain, I can calculate X with
      > just long ints (some precision may be lost in the final division; that's OK).
      >
      > Floating Point:
      > X=(1-b)*Y + b*Z
      > /* 'b' is a floating-point variable with 4 decimal places of precision, in
      > the range 0 to 1; X, Y and Z are unsigned int */
      >
      > With long int:
      > I can emulate the above calculation as:
      > X=((10000-10000*b)*Y +10000*b*Z)/10000
      >
      > I need a link that discusses this and any similar approaches.
      >
      Hi, I'm a bit loth to step in here (reading this lurking in c.l.c, not a
      C expert), but couldn't you implement floating-point using those long
      longs? Write fmul, fdiv, fadd etc functions that mask off a long long
      into sign, exponent and mantissa and deal with them.

      Multiplication works as in standard index form (multiply the mantissas,
      add the exponents), and for addition you scale the numbers so they have
      the same exponent, add, then return the result to normal form.

      There is an IEEE specification for floating point (e.g. google IEEE
      floating-point) that includes rules for what the bit patterns mean,
      the representation of very small numbers (a special "denormal" case for
      values too close to zero for the normal format), infinities etc. as
      well - probably better than coming up with your own scheme. I don't know
      if this is worth the effort for you, or if there are drawbacks I've not
      thought of, but I don't see why you couldn't do all this in standard C.
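
The multiply rule described above (multiply the mantissas, add the exponents, renormalise) can be sketched with integer operations only. This is a minimal illustration, not a usable library: it assumes the IEEE 754 single-precision layout held in a `uint32_t` (rather than a custom format packed into a long long), it only handles normal, finite, nonzero inputs whose product is also normal, and it truncates instead of rounding. The name `sf_mul` is hypothetical.

```c
#include <stdint.h>

/* Multiply two IEEE 754 single-precision bit patterns with integer
 * arithmetic only. Sketch: no zero, denormal, infinity, NaN, overflow
 * or underflow handling, and the result is truncated, not rounded. */
static uint32_t sf_mul(uint32_t a, uint32_t b)
{
    /* Sign of the product is the XOR of the input signs. */
    uint32_t sign = (a ^ b) & 0x80000000u;

    /* Add the biased exponents and remove one copy of the bias (127). */
    int32_t exp = (int32_t)((a >> 23) & 0xFF)
                + (int32_t)((b >> 23) & 0xFF) - 127;

    /* Restore the implicit leading 1 to get 24-bit significands. */
    uint64_t ma = (a & 0x007FFFFFu) | 0x00800000u;
    uint64_t mb = (b & 0x007FFFFFu) | 0x00800000u;

    /* 24 x 24 -> 48-bit product, scaled back to 23 fraction bits. */
    uint64_t m = (ma * mb) >> 23;

    /* Product of two values in [1, 2) lies in [1, 4): renormalise. */
    if (m & 0x01000000u) {
        m >>= 1;
        exp += 1;
    }

    return sign | ((uint32_t)exp << 23) | ((uint32_t)m & 0x007FFFFFu);
}
```

For instance, `sf_mul(0x3FC00000u, 0x40000000u)` multiplies the bit patterns of 1.5 and 2.0 and returns `0x40400000u`, the pattern for 3.0. A real implementation would need all the special cases the IEEE specification lists.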

      This link seems good:


      HTH, all the best,
      Rob M

      --
      Rob Morris: arr emm four four five [at] cam dot ac dot uk


      • Eric Sosman

        #4
        Re: Floating point calculation



        Vinoth wrote:
        > I'm working on an ARM (ARM9) system that has neither a floating-point
        > coprocessor nor floating-point libraries, but it does support long long int
        > (64 bits).
        > Can you provide a link that discusses ways to emulate floating-point
        > calculations with just long int or long long int? For example, if I have the
        > formula X=(1-b)*Y + b*Z in the floating-point domain, I can calculate X with
        > just long ints (some precision may be lost in the final division; that's OK).
        >
        > Floating Point:
        > X=(1-b)*Y + b*Z
        > /* 'b' is a floating-point variable with 4 decimal places of precision, in
        > the range 0 to 1; X, Y and Z are unsigned int */
        >
        > With long int:
        > I can emulate the above calculation as:
        > X=((10000-10000*b)*Y +10000*b*Z)/10000
        >
        > I need a link that discusses this and any similar approaches.

        Your "emulation" should work fine, if the products and
        sum in the numerator don't grow too large for `long'. If
        you know enough about the ranges of Y and Z to be sure this
        won't happen, all is well. If not, you can use `long long'
        for the intermediate results:

        X = ((10000LL - 10000LL*b) * Y + 10000LL*b * Z) / 10000LL;
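
To make "too large for `long'" concrete, here is a small sketch of the range check. It assumes `long` is 32 bits and signed (typical for 32-bit ARM toolchains, but an assumption, not something stated in the thread) and that the weight stays between 0 and the scale factor. The helper name `blend_fits_in_long32` is hypothetical.

```c
#include <stdint.h>

#define SCALE 10000L  /* decimal scale factor used in the thread */

/* Returns 1 if (SCALE - B)*Y + B*Z cannot exceed a signed 32-bit long
 * for any B in 0..SCALE, else 0. The two weights add up to SCALE, so
 * the numerator is never larger than SCALE * max(Y, Z). */
static int blend_fits_in_long32(uint32_t Y, uint32_t Z)
{
    uint32_t largest = (Y > Z) ? Y : Z;
    return largest <= (uint32_t)(INT32_MAX / SCALE);
}
```

Under these assumptions the limit works out to 214748: any Y or Z above that can overflow a 32-bit numerator, which is where the wider intermediate type becomes necessary.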

        There are a number of possible improvements you may want
        to consider. The first is to get rid of those `10000LL*b'
        computations, which is easy: instead of storing `b' itself,
        store `10000 * b' in a `long' variable called `B':

        X = ((10000LL - B) * Y + (long long)B * Z) / 10000LL;
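
As a self-contained sketch of the pre-scaled form, the function below wraps the expression above. It uses the fixed-width types from `<stdint.h>` in place of `long` and `long long`, takes Y and Z as unsigned (as in the original question), and assumes the caller keeps B in the range 0..10000; the name `blend_decimal` is hypothetical.

```c
#include <stdint.h>

/* B is the weight b pre-scaled by 10000, so b = 0.25 is stored as 2500.
 * Assumes 0 <= B <= 10000. */
static uint32_t blend_decimal(uint32_t Y, uint32_t Z, uint32_t B)
{
    /* 64-bit intermediates: the weighted sum is at most 10000 * 2^32,
     * which fits comfortably in 64 bits. */
    uint64_t numerator = (uint64_t)(10000u - B) * Y + (uint64_t)B * Z;

    /* Integer division truncates, so up to one unit is lost here. */
    return (uint32_t)(numerator / 10000u);
}
```

With Y = 100, Z = 200 and B = 2500 this gives 125, matching 0.75*100 + 0.25*200.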

        Rearranging the expression with a little algebra can
        eliminate one of the multiplications and permit a little more
        of the computation to use plain `long' instead of `long long'
        (which may be faster, especially if `long long' is emulated
        in software):

        X = Y + (long)((Z - Y) * (long long)B / 10000LL);

        If you change the scaling factor from 10000 to something
        that's a power of two, you can replace the division with a
        shift. 16384 (1 << 14) is pretty close to your original
        10000, so assuming that `B' is now `b * 16384' you'd have

        X = Y + (long)( ((Z - Y) * (long long)B) >> 14 );

        There's a potential trap here: if `Z - Y' is negative
        so the product being shifted is also negative, the result
        of the right shift is implementation-defined in C. Since
        you're only concerned with one implementation you could
        check whether it does what you want. If it doesn't, or
        if you want to be sure the code will work elsewhere, too,
        you could make sure that no negative numerators appear:

        if (Z >= Y)
            X = Y + (long)( ((Z - Y) * (long long)B) >> 14 );
        else
            X = Y - (long)( ((Y - Z) * (long long)B) >> 14 );
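
The branchy version can be packaged as a function. This sketch assumes Y and Z are unsigned 32-bit values, as in the original question; in that case the branch does double duty, because computing `Z - Y` when Z is smaller would wrap around to a huge positive number rather than go negative. The names `blend_pow2` and `FRAC_BITS` are hypothetical, and B is assumed to lie in 0..16384.

```c
#include <stdint.h>

#define FRAC_BITS 14  /* B = b * 16384, so b = 0.25 is stored as 4096 */

/* Linear blend X = Y + b*(Z - Y) with a power-of-two scale factor.
 * Only non-negative quantities are ever shifted. */
static uint32_t blend_pow2(uint32_t Y, uint32_t Z, uint32_t B)
{
    if (Z >= Y)
        return Y + (uint32_t)(((uint64_t)(Z - Y) * B) >> FRAC_BITS);

    return Y - (uint32_t)(((uint64_t)(Y - Z) * B) >> FRAC_BITS);
}
```

For example, `blend_pow2(100, 200, 4096)` yields 125 and `blend_pow2(200, 100, 4096)` yields 175.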

        This is about as far as you can go with portable C --
        which is a shame, really, because some machines are capable
        of better. For example, there may be an instruction (or
        instruction sequence) to multiply two 32-bit numbers and
        yield a 64-bit product, but C has no operator that multiplies
        two `long's directly into a `long long'; you have to convert
        an operand first and rely on the compiler to notice. If you
        used 32-bit scaling instead of the 14 bits shown above, the
        second term would simply be the high-order 32 bits of the
        64-bit product and the machine might be able to extract it
        without shifting, but C has no portable way to perform such
        dissections. It's possible that a smart optimizing compiler
        might be able to exploit such capabilities of the machine
        (I'd especially recommend looking into the possibility of
        32-bit scaling), but there are no guarantees.
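
A sketch of what 32-bit scaling might look like in C follows. The usual way to ask for a widening multiply is to convert one operand to a 64-bit type and take the top half of the result; whether the compiler turns that into a single long-multiply instruction depends on the compiler and target, so treat that as something to check in the generated code. The names `mul_frac32` and `blend_frac32` are hypothetical, and Y and Z are again assumed to be unsigned 32-bit values.

```c
#include <stdint.h>

/* B32 is the weight b scaled by 2^32, so b = 0.25 is 0x40000000.
 * Note that b = 1.0 itself is not representable in this format. */
static uint32_t mul_frac32(uint32_t value, uint32_t B32)
{
    /* High-order 32 bits of the 64-bit product. */
    return (uint32_t)(((uint64_t)value * B32) >> 32);
}

/* Linear blend X = Y + b*(Z - Y), branching so that the difference
 * passed to mul_frac32 is never negative. */
static uint32_t blend_frac32(uint32_t Y, uint32_t Z, uint32_t B32)
{
    return (Z >= Y) ? Y + mul_frac32(Z - Y, B32)
                    : Y - mul_frac32(Y - Z, B32);
}
```

For example, `blend_frac32(100, 200, 0x80000000u)` gives 150, the midpoint.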

        What you're doing with the "emulation" is called "fixed-
        point arithmetic," and the techniques can be applied in more
        sophisticated form -- to get a properly-rounded answer, for
        example, or to deal with numbers that have both integer and
        fractional parts. A small amount of research may give you
        some good ideas ...
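
As one illustration of those more sophisticated forms, here is a minimal sketch of a signed format with 16 integer bits and 16 fraction bits (often written Q16.16), with a multiply that rounds to nearest instead of truncating. The type and function names are hypothetical; overflow of the 32-bit result is not checked, and division is used instead of a shift so that negative products do not depend on implementation-defined shift behaviour.

```c
#include <stdint.h>

typedef int32_t q16_16;           /* 16 integer bits, 16 fraction bits */
#define Q_ONE ((int32_t)1 << 16)  /* the value 1.0 in Q16.16 */

/* Addition needs no adjustment: both operands share the same scale. */
static q16_16 q_add(q16_16 a, q16_16 b)
{
    return a + b;
}

/* The raw product carries 32 fraction bits; add half a unit (with the
 * sign of the product) before scaling back, so the result is rounded
 * to nearest rather than truncated. */
static q16_16 q_mul(q16_16 a, q16_16 b)
{
    int64_t p = (int64_t)a * b;
    int64_t half = Q_ONE / 2;

    p += (p >= 0) ? half : -half;
    return (q16_16)(p / Q_ONE);
}
```

With these definitions, 1.5 is stored as `3 * Q_ONE / 2` and 2.5 as `5 * Q_ONE / 2`, and their `q_mul` product is `15 * Q_ONE / 4`, i.e. 3.75.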

        --
        Eric.Sosman@sun.com


        • Vinoth

          #5
          Re: Floating point calculation

          Thanks to all for the information. I'm interested in trying out all basic
          operations on fixed-point arithmetic. Can you point to some free library
          available? Google didn't help much.

          • Eric Sosman

            #6
            Re: Floating point calculation



            Vinoth wrote:
            > Thanks to all for the information. I'm interested in trying out all basic
            > operations on fixed-point arithmetic. Can you point to some free library
            > available? Google didn't help much.

            Wow! You must be an awfully fast reader to have
            studied those "about 26,200" results in less than three
            hours! I'm afraid I can't offer more help than those
            26,200 articles can -- and since you've already found
            them inadequate it follows that I'm inadequate, too.
            Sorry.

            (You might also want to Google for "top-posting."
            The "about 84,400" articles won't take *you* very long,
            and may convey something useful.)

            --
            Eric.Sosman@sun.com


            • druck

              #7
              Re: Floating point calculation

              On 19 May 2005 "Vinoth" <not_a_valid@emailaddress.com> wrote:
              > Thanks to all for the information. I'm interested in trying out all basic
              > operations on fixed-point arithmetic. Can you point to some free library
              > available? Google didn't help much.

              [Previous messages removed]

              Please do not top post to newsgroups.

              ---druck

              --
              The ARM Club Free Software - http://www.armclub.org.uk/free/
              The 32bit Conversions Page - http://www.quantumsoft.co.uk/druck/


              • chris.shore@arm.nospam.com

                #8
                Re: Floating point calculation

                Vinoth wrote:
                >
                > Thanks to all for the information. I'm interested in trying out all basic
                > operations on fixed-point arithmetic. Can you point to some free library
                > available? Google didn't help much.
                >

                App Note 33 on this page:



                contains a basic introduction to implementing fixed-point
                binary arithmetic on ARM cores.

                Chris
