problem returning floating point

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Peter

    problem returning floating point

    I have written this small app to explain an issue I'm having with a larger
    program.

    In the following code I'm taking 10 ints from the keyboard. In the call to
    average() these 10 ints are then added and Divided by the number of ints to
    return the average, a float.

    Can someone see why I get for example: if the total is 227 then divided by
    10 I get a return of 22.00000 instead of the correct answer 22.7.

    Thanks in advance
    Pete

    #include <stdio.h>


    float average(int, int[]); /* Prototype returning float */

    int main()
    {
    int num[10]; /* array to hold 10 ints */
    float avg; /* float to hold the average */
    int i, size=10;

    printf("Enter 10 integers: \n");

    /* Takes ten ints for keyboard */

    for(i=0; i<size; i++)
    {
    scanf("%d", &num[i]);
    }

    avg = average(size, num); /* Call to average() passing size and array */
    printf("%f", avg);
    }

    /* average() calculates total of 10 ints & calculates average, returns float
    ans */

    float average(int count, int numbers[])
    {
    int total=0;
    int i;
    float ans;

    for(i=0; i<count; i++)
    {
    total +=numbers[i];

    /*printf("Total = %d \n", total);*/
    ans = total/count;
    }
    return ans;

    }

  • Richard Heathfield

    #2
    Re: problem returning floating point

    Peter said:
    I have written this small app to explain an issue I'm having with a
    larger program.
    >
    In the following code I'm taking 10 ints from the keyboard. In the call
    to average() these 10 ints are then added and Divided by the number of
    ints to return the average, a float.
    >
    Can someone see why I get for example: if the total is 227 then divided
    by 10 I get a return of 22.00000 instead of the correct answer 22.7.
    227.0 / 10.0 is indeed 22.7, within reason - the nature of floating point
    is such that you can't always get an answer as precisely as you'd like.

    So:

    227.0 / 10.0 is 22.7 (ish)
    227.0 / 10 is 22.7 (ish)
    227 / 10.0 is 22.7 (ish)
    227 / 10 is 22 (precisely)

    Spot the difference?

    <snip>

    One modification will suffice to fix the above problem:
    float average(int count, int numbers[])
    {
    int total=0;
    Change this to:

    float total=0;

    --
    Richard Heathfield <http://www.cpax.org.uk >
    Email: -http://www. +rjh@
    Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
    "Usenet is a strange place" - dmr 29 July 1999

    Comment

    • viza

      #3
      Re: problem returning floating point

      Hi

      On Thu, 06 Nov 2008 07:51:03 +0000, Richard Heathfield wrote:
      Peter said:
      >
      >float average(int count, int numbers[])
      >{
      > int total=0;
      > int i;
      > float ans;
      >>
      > for(i=0; i<count; i++)
      > {
      > total +=numbers[i];
      > ans = total/count;
      > }
      > return ans;
      >>}
      >
      One modification will suffice to fix the above problem:
      >
      >int total=0;
      >
      Change this to:
      >
      float total=0;
      Not strictly wrong, but bad advice.

      As long as the size of the integer used is big enough to hold the sum, it
      is both faster and more accurate to do the addition in an integer.

      A much better change to make is:

      ans = (double)total / (double)count;

      Also, it is massively stupid to calculate the division inside each loop,
      although a half-decent compiler will realise this and optimise it away.

      My version of this function would look like:

      float average( int count, int *numbers ){
      int total= 0;
      int *stop= numbers + count;

      while( numbers < stop )
      total+= *numbers++;

      return (double)total / count;
      }

      I have removed the subscripting in favour of comparing pointers, to
      improve speed, in case this is performance critical.

      In the case where an int couldn't hold the maximum possible sum of the
      ten inputs, I would use a long or long long. In the case where that was
      still not enough, I would use two nested loops, the inner one adding up
      using integers, and the outer one using doubles. Again, this would give
      both increased accuracy and increased speed.

      Comment

      • Richard Tobin

        #4
        Re: problem returning floating point

        In article <BhBQk.2959$i%. 2137@newsfe25.a ms2>,
        viza <tom.viza@gm-il.com.obviousc hange.invalidwr ote:
        >In the case where an int couldn't hold the maximum possible sum of the
        >ten inputs, I would use a long or long long. In the case where that was
        >still not enough, I would use two nested loops, the inner one adding up
        >using integers, and the outer one using doubles. Again, this would give
        >both increased accuracy and increased speed.
        Loss of precision in a case like this arises when a small number is
        added to a large one. A reasonable approach therefore is to add up
        each half of the numbers and then add these together, using the same
        method recursively to add up the halves. Something like this
        (untested):

        double sum_ints(int count, int *numbers)
        {
        if(count == 0)
        return 0.0;
        else if(count == 1)
        return (double)numbers[0];
        else return sum_ints(count/2, numbers) +
        sum_ints(count-count/2, numbers+count/2);
        }

        -- Richard
        --
        Please remember to mention me / in tapes you leave behind.

        Comment

        • Nick Keighley

          #5
          Re: problem returning floating point

          On 6 Nov, 12:19, viza <tom.v...@gm-il.com.obviousc hange.invalid>
          wrote:
          I have removed the subscripting in favour of comparing pointers, to
          improve speed, in case this is performance critical.
          do you have access to a compiler where this makes any difference?

          --
          Nick Keighley

          Comment

          • viza

            #6
            Re: problem returning floating point

            Hi

            On Thu, 06 Nov 2008 04:39:57 -0800, Nick Keighley wrote:
            On 6 Nov, 12:19, viza wrote:
            >
            >I have removed the subscripting in favour of comparing pointers, to
            >improve speed, in case this is performance critical.
            >
            do you have access to a compiler where this makes any difference?
            No, not noticeably, if at all. The use of one less variable makes the C
            look tidier to me, quite aside from what the compiler outputs.

            viza

            Comment

            • Nick Keighley

              #7
              Re: problem returning floating point

              On 6 Nov, 14:20, viza <tom.v...@gm-il.com.obviousc hange.invalid>
              wrote:
              Hi
              >
              On Thu, 06 Nov 2008 04:39:57 -0800, Nick Keighley wrote:
              On 6 Nov, 12:19, viza wrote:
              >
              I have removed the subscripting in favour of comparing pointers, to
              improve speed, in case this is performance critical.
              >
              do you have access to a compiler where this makes any difference?
              >
              No, not noticeably, if at all. The use of one less variable makes the C
              look tidier to me, quite aside from what the compiler outputs.
              I don't see how you are saving a variable. You wrote
              (I fiddled with layout a bit)

              float average( int count, int *numbers )
              {
              int total= 0;
              int *stop= numbers + count;

              while( numbers < stop )
              total+= *numbers++;

              return (double)total / count;
              }

              I'd write

              float average (int count, int *numbers)
              {
              int total = 0;
              int i;

              for (i = 0; i < count; count++)
              total += numbers [i];

              return (double)total / count;
              }

              I don't find your version significantly easier to understand.
              The termination condition seems a little obscure.


              --
              Nick Keighley






              Comment

              • Richard Heathfield

                #8
                Re: problem returning floating point

                viza said:
                Hi
                >
                On Thu, 06 Nov 2008 07:51:03 +0000, Richard Heathfield wrote:
                >Peter said:
                >>
                >>float average(int count, int numbers[])
                >>{
                >> int total=0;
                >> int i;
                >> float ans;
                >>>
                >> for(i=0; i<count; i++)
                >> {
                >> total +=numbers[i];
                >> ans = total/count;
                >> }
                >> return ans;
                >>>}
                >>
                >One modification will suffice to fix the above problem:
                >>
                >>int total=0;
                >>
                >Change this to:
                >>
                >float total=0;
                >
                Not strictly wrong, but bad advice.
                I disagree, but let's see what your grounds are.
                As long as the size of the integer used is big enough to hold the sum,
                We don't know this, so you're already on dodgy ground.
                it is both faster and more accurate to do the addition in an integer.
                Chapter and verse, please. (Math co-processors are commonplace nowadays.)
                >
                A much better change to make is:
                >
                ans = (double)total / (double)count;
                Introducing two spurious casts, and risking overflow on the total.
                Also, it is massively stupid to calculate the division inside each loop,
                although a half-decent compiler will realise this and optimise it away.
                Fine, but he wasn't asking about performance - he was asking about
                precision.
                My version of this function would look like:
                >
                float average( int count, int *numbers ){
                int total= 0;
                int *stop= numbers + count;
                >
                while( numbers < stop )
                total+= *numbers++;
                >
                return (double)total / count;
                }
                /* test driver */
                #include <stdio.h>
                #include <math.h>
                #include <limits.h>

                int main(void)
                {
                int data[] = { INT_MAX, INT_MAX - 1, INT_MAX - 2 };
                float expected = INT_MAX - 1;
                float actual = average(3, data);
                printf("Expecte d: %f\n", expected);
                printf("Actual: %f\n", actual);
                printf("Error: %f\n", fabs(expected - actual));
                return 0;
                }

                Result (on my system):

                Expected: 2147483648.0000 00
                Actual: 715827904.00000 0
                Error: 1431655744.0000 00

                Curiously, on my system, the expected result isn't that hot - but look at
                the actual result!

                If you plug in doubles instead, the expected result is far better:

                Expected: 2147483646.0000 00
                Actual: 715827880.66666 7
                Error: 1431655765.3333 33

                but the error is still colossal.
                I have removed the subscripting in favour of comparing pointers, to
                improve speed, in case this is performance critical.
                In these days of optimising compilers it's unlikely to make any difference
                at all, and it's certainly not guaranteed not to be slower.
                In the case where an int couldn't hold the maximum possible sum of the
                ten inputs, I would use a long
                Might not help - didn't in the above test.
                or long long.
                Might not be available. And of course might not help even if it is.
                In the case where that was
                still not enough, I would use two nested loops, the inner one adding up
                using integers, and the outer one using doubles. Again, this would give
                both increased accuracy and increased speed.
                I doubt it very much. But feel free to demonstrate.

                --
                Richard Heathfield <http://www.cpax.org.uk >
                Email: -http://www. +rjh@
                Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
                "Usenet is a strange place" - dmr 29 July 1999

                Comment

                • viza

                  #9
                  Re: problem returning floating point

                  Hi

                  On Thu, 06 Nov 2008 16:05:28 +0000, Richard Heathfield wrote:
                  viza said:
                  >On Thu, 06 Nov 2008 07:51:03 +0000, Richard Heathfield wrote:
                  >As long as the size of the integer used is big enough to hold the sum,
                  >
                  We don't know this, so you're already on dodgy ground.
                  We don't know that float is either. As soon as you add two of anything
                  in C you make an assumption that the result will fit in the type.
                  >it is both faster and more accurate to do the addition in an integer.
                  >
                  Chapter and verse, please. (Math co-processors are commonplace
                  nowadays.)
                  I did test specifically this function when implementing several image
                  shrinking functions. Integer was faster on both x86 and x86-64, which
                  was all I had to hand. 64-bit integers were even faster then float or
                  double on one 32 bit cpu that I tested.
                  >A much better change to make is:
                  >>
                  > ans = (double)total / (double)count;
                  >
                  Introducing two spurious casts, and risking overflow on the total.
                  Your method fails first. The range where all integers can be represented
                  in a float is much smaller than the range of the same size int. Once you
                  pass that limit then if some of your input ints are small they will be
                  completely lost and if many of them are small (wrt the eventual mean)
                  then the output of the float method will be less precise than in the
                  integer method.

                  int data[] = { INT_MAX, INT_MAX - 1, INT_MAX - 2 };
                  >or long long.
                  Might not be available.
                  Ok, my method simply cannot handle integers of the largest type available
                  on the system which are close to their maximum value. In many situations
                  it is possible to know a priori that this will not be the case, and
                  achieve a more accurate result more quickly by using my method.

                  By your method the OP would discard accuracy when that may not have been
                  necessary or even acceptable, and would have run slower on many systems.

                  Have a nice day.

                  viza

                  Comment

                  • Richard Heathfield

                    #10
                    Re: problem returning floating point

                    viza said:
                    Hi
                    >
                    On Thu, 06 Nov 2008 16:05:28 +0000, Richard Heathfield wrote:
                    >viza said:
                    <snip>
                    >>A much better change to make is:
                    >>>
                    >> ans = (double)total / (double)count;
                    >>
                    >Introducing two spurious casts, and risking overflow on the total.
                    >
                    Your method fails first.
                    Not according to my testing.

                    --
                    Richard Heathfield <http://www.cpax.org.uk >
                    Email: -http://www. +rjh@
                    Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
                    "Usenet is a strange place" - dmr 29 July 1999

                    Comment

                    • user923005

                      #11
                      Re: problem returning floating point

                      On Nov 6, 4:19 am, viza <tom.v...@gm-il.com.obviousc hange.invalid>
                      wrote:
                      Hi
                      >
                      >
                      >
                      >
                      >
                      On Thu, 06 Nov 2008 07:51:03 +0000, Richard Heathfield wrote:
                      Peter said:
                      >
                      float average(int count, int numbers[])
                      {
                        int total=0;
                        int i;
                        float ans;
                      >
                        for(i=0; i<count; i++)
                        {
                          total +=numbers[i];
                          ans = total/count;
                        }
                        return ans;
                      >}
                      >
                      One modification will suffice to fix the above problem:
                      >
                      int total=0;
                      >
                      Change this to:
                      >
                      float total=0;
                      >
                      Not strictly wrong, but bad advice.
                      >
                      As long as the size of the integer used is big enough to hold the sum, it
                      is both faster and more accurate to do the addition in an integer.
                      Utter poppycock. As long as the floating point type has enough
                      precision to hold the totals, a floating point type is exactly as
                      precise as an integral type.
                      So for up to 6 digits with 4 byte float or 15 digits with 8 byte
                      float, the floating point value will hold the exact answer just like
                      an integer would.

                      Also, floating point operations are not appreciably slower and many
                      modern systems have special hardware for floating point to accelerate
                      it.
                      See, for example:


                      One floating point addition per cycle is going to be hard to beat.
                      A much better change to make is:
                      >
                           ans = (double)total / (double)count;
                      >
                      Also, it is massively stupid to calculate the division inside each loop,
                      although a half-decent compiler will realise this and optimise it away.
                      >
                      My version of this function would look like:
                      >
                      float average( int count, int *numbers ){
                        int total= 0;
                        int *stop= numbers + count;
                      >
                        while( numbers < stop )
                          total+= *numbers++;
                      >
                        return (double)total / count;
                      >
                      }
                      >
                      I have removed the subscripting in favour of comparing pointers, to
                      improve speed, in case this is performance critical.
                      More nonsense.
                      In the case where an int couldn't hold the maximum possible sum of the
                      ten inputs, I would use a long or long long.  In the case where that was
                      still not enough, I would use two nested loops, the inner one adding up
                      using integers, and the outer one using doubles.  Again, this would give
                      both increased accuracy and increased speed.
                      All solutions are going to be O(n) and these little tweaky things you
                      are suggesting are for the most part utter nonsense.

                      However, your accumulation only and single division is the right way
                      to do it unless there is a need for a running sum (e.g. if you need
                      the current average after each insertion).

                      Comment

                      • viza

                        #12
                        Re: problem returning floating point

                        Hi

                        On Thu, 06 Nov 2008 13:13:01 -0800, user923005 wrote:
                        On Nov 6, 4:19 am, viza wrote:
                        >As long as the size of the integer used is big enough to hold the sum,
                        >it is both faster and more accurate to do the addition in an integer.
                        Also, floating point operations are not appreciably slower and many
                        modern systems have special hardware for floating point to accelerate
                        it.
                        See, for example:

                        >
                        One floating point addition per cycle is going to be hard to beat.
                        Ok, if you use simd it gets much faster but most compilers can't (or
                        don't by default) use those instructions in a loop like this, so if you
                        are only writing standard C and using a common compiler then integer
                        addition is still much faster.

                        I take your point that both ints and floats can store ints exactly over a
                        certain range; that range is larger for ints though.
                        >I have removed the subscripting in favour of comparing pointers, to
                        >improve speed, in case this is performance critical.
                        >
                        More nonsense.
                        The compiler will probably do that for you but it can't hurt.

                        viza

                        Comment

                        Working...