Feature suggestion: sum() ought to use a compensated summation algorithm

**Arnaud Delobelle** · Jun 27 '08, 04:23 PM

Re: Feature suggestion: sum() ought to use a compensated summation algorithm

Szabolcs Horvát <szhorvat@gmail .comwrites:

[...]

A little research shows that Mathematica uses a "compensate d
summation" algorithm. Indeed, using the algorithm described at

Kahan summation algorithm - Wikipedia

http://en.wikipedia.org/wiki/Kahan_summation_algorithm

gives us a result around ~ 10^-17:
>
def compSum(arr):
s = 0.0
c = 0.0
for x in arr:
y = x-c
t = s+y
c = (t-s) - y
s = t
return s
>
mean = compSum(data)/len(data)
print compSum(x - mean for x in data)/len(data)
>
>
I thought that it would be very nice if the built-in sum() function
used this algorithm by default. Has this been brought up before?
Would this have any disadvantages (apart from a slight performance
impact, but Python is a high-level language anyway ...)?
>
Szabolcs Horvát

sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.

--
Arnaud

**Ivan Illarionov** · Jun 27 '08, 04:23 PM

Re: Feature suggestion: sum() ought to use a compensated summationalgori thm

On Sat, 03 May 2008 18:50:34 +0200, Szabolcs HorvÃ¡t wrote:

I did the following calculation: Generated a list of a million random
numbers between 0 and 1, constructed a new list by subtracting the mean
value from each number, and then calculated the mean again.
>
The result should be 0, but of course it will differ from 0 slightly
because of rounding errors.
>
However, I noticed that the simple Python program below gives a result
of ~ 10^-14, while an equivalent Mathematica program (also using double
precision) gives a result of ~ 10^-17, i.e. three orders of magnitude
more precise.
>
Here's the program (pardon my style, I'm a newbie/occasional user):
>
from random import random
>
data = [random() for x in xrange(1000000)]
>
mean = sum(data)/len(data)
print sum(x - mean for x in data)/len(data)
>
A little research shows that Mathematica uses a "compensate d summation"
algorithm. Indeed, using the algorithm described at
http://en.wikipedia.org/wiki/Kahan_summation_algorithm gives us a result
around ~ 10^-17:
>
def compSum(arr):
s = 0.0
c = 0.0
for x in arr:
y = x-c
t = s+y
c = (t-s) - y
s = t
return s
>
mean = compSum(data)/len(data)
print compSum(x - mean for x in data)/len(data)
>
>
I thought that it would be very nice if the built-in sum() function used
this algorithm by default. Has this been brought up before? Would this
have any disadvantages (apart from a slight performance impact, but
Python is a high-level language anyway ...)?
>
Szabolcs HorvÃ¡t

Built-in sum should work with everything, not just floats. I think it
would be useful addition to standard math module.

If you know C you could write a patch to mathmodule.c and submit it to
Python devs.

--
Ivan

**Nick Craig-Wood** · Jun 27 '08, 04:23 PM

Re: Feature suggestion: sum() ought to use a compensated summation algorithm

Szabolcs Horvát <szhorvat@gmail .comwrote:

I did the following calculation: Generated a list of a million random
numbers between 0 and 1, constructed a new list by subtracting the mean
value from each number, and then calculated the mean again.
>
The result should be 0, but of course it will differ from 0 slightly
because of rounding errors.
>
However, I noticed that the simple Python program below gives a result
of ~ 10^-14, while an equivalent Mathematica program (also using double
precision) gives a result of ~ 10^-17, i.e. three orders of magnitude
more precise.
>
Here's the program (pardon my style, I'm a newbie/occasional user):
>
from random import random
>
data = [random() for x in xrange(1000000)]
>
mean = sum(data)/len(data)
print sum(x - mean for x in data)/len(data)
>
A little research shows that Mathematica uses a "compensate d summation"
algorithm. Indeed, using the algorithm described at

Kahan summation algorithm - Wikipedia

http://en.wikipedia.org/wiki/Kahan_summation_algorithm

gives us a result around ~ 10^-17:

I was taught in my numerical methods course (about 20 years ago now!)
that the way to do this sum with most accuracy is to sum from the
smallest magnitude to the largest magnitude.

Eg

>>from random import random
>>data = [random() for x in xrange(1000000)]
>>mean = sum(data)/len(data)
>>print sum(x - mean for x in data)/len(data)

1.64152134108e-14

>>mean = sum(sorted(data ))/len(data)
>>print sum(x - mean for x in data)/len(data)

5.86725534824e-15

>>>

It isn't as good as the compensated sum but it is easy!

I thought that it would be very nice if the built-in sum() function used
this algorithm by default. Has this been brought up before? Would this
have any disadvantages (apart from a slight performance impact, but
Python is a high-level language anyway ...)?

sum() gets used for any numerical types not just floats...

--
Nick Craig-Wood <nick@craig-wood.com-- http://www.craig-wood.com/nick

**=?ISO-8859-1?Q?Szabolcs_Horv=E1t?=** · Jun 27 '08, 04:23 PM

Re: Feature suggestion: sum() ought to use a compensated summationalgori thm

Arnaud Delobelle wrote:

>
sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.

This occurred to me also, but then I tried

sum(['abc', 'efg'], '')

and it did not work. Or is this just a special exception to prevent the
misuse of sum to join strings? (As I said, I'm only an occasional user.)

Generally, sum() seems to be most useful for numeric types (i.e. those
that form a group with respect to __add__ and __neg__/__sub__), which
may be either exact (e.g. integers) or inexact (e.g. floating point
types). For exact types it does not make sense to use compensated
summation (though it wouldn't give an incorrect answer, either), and
sum() cannot decide whether a user-defined type is exact or inexact.

But I guess that it would still be possible to make sum() use
compensated summation for built-in floating point types (float/complex).

Or, to go further, provide a switch to choose between the two methods,
and make use compensated summation for float/complex by default. (But
perhaps people would consider this last alternative a bit too messy.)

(Just some thoughts ...)

**hdante** · Jun 27 '08, 04:23 PM

Re: Feature suggestion: sum() ought to use a compensated summationalgori thm

On May 3, 3:44 pm, Szabolcs Horvát <szhor...@gmail .comwrote:

Arnaud Delobelle wrote:
>

sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.

>
This occurred to me also, but then I tried
>
sum(['abc', 'efg'], '')
>
and it did not work. Or is this just a special exception to prevent the
misuse of sum to join strings? (As I said, I'm only an occasional user.)
>
Generally, sum() seems to be most useful for numeric types (i.e. those
that form a group with respect to __add__ and __neg__/__sub__), which
may be either exact (e.g. integers) or inexact (e.g. floating point
types). For exact types it does not make sense to use compensated
summation (though it wouldn't give an incorrect answer, either), and
sum() cannot decide whether a user-defined type is exact or inexact.
>
But I guess that it would still be possible to make sum() use
compensated summation for built-in floating point types (float/complex).
>
Or, to go further, provide a switch to choose between the two methods,
and make use compensated summation for float/complex by default. (But
perhaps people would consider this last alternative a bit too messy.)
>
(Just some thoughts ...)

The benefits should be weighted considering the common case. For
example, do you find an error of 10^-14 unbearable ? How many times
someone will add 1.000.000 numbers in a typical scientific application
written in python ?

I believe that moving this to third party could be better. What about
numpy ? Doesn't it already have something similar ?

**Arnaud Delobelle** · Jun 27 '08, 04:23 PM

Re: Feature suggestion: sum() ought to use a compensated summation algorithm

Szabolcs Horvát <szhorvat@gmail .comwrites:

Arnaud Delobelle wrote:

>>
>sum() works for any sequence of objects with an __add__ method, not
>just floats! Your algorithm is specific to floats.

>
This occurred to me also, but then I tried
>
sum(['abc', 'efg'], '')
>
and it did not work. Or is this just a special exception to prevent
the misuse of sum to join strings? (As I said, I'm only an occasional
user.)

I think that's right: anything with an __add__ method, apart from
string, can be sum()ed.

--
Arnaud

**Erik Max Francis** · Jun 27 '08, 04:23 PM

Re: Feature suggestion: sum() ought to use a compensated summationalgori thm

Szabolcs Horvát wrote:

Arnaud Delobelle wrote:

>>
>sum() works for any sequence of objects with an __add__ method, not
>just floats! Your algorithm is specific to floats.

>
This occurred to me also, but then I tried
>
sum(['abc', 'efg'], '')
>
and it did not work. Or is this just a special exception to prevent the
misuse of sum to join strings? (As I said, I'm only an occasional user.)

What you wrote is nonsensical there, no different from 'a' + 1 -- which
is why it quite rightly raises a TypeError.

You're trying to add a list to a string, which is nonsensical. You add
strings to strings, or lists to lists, but mixing them up doesn't make
sense. Python can't guess what you mean when you write something like
['abc', 'def'] + '' -- which is the functional equivalent of your call
to sum -- and so doesn't try. It indicates this by raising a TypeError.

--
Erik Max Francis && max@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 18 N 121 57 W && AIM, Y!M erikmaxfrancis

**Torsten Bronger** · Jun 27 '08, 04:23 PM

Re: Feature suggestion: sum() ought to use a compensated summation algorithm

Hallöchen!

Erik Max Francis writes:

Szabolcs Horvát wrote:
>

>Arnaud Delobelle wrote:

>>>
>>sum() works for any sequence of objects with an __add__ method, not
>>just floats! Your algorithm is specific to floats.

>>
>This occurred to me also, but then I tried
>>
>sum(['abc', 'efg'], '')
>>
>and it did not work. Or is this just a special exception to
>prevent the misuse of sum to join strings? (As I said, I'm only
>an occasional user.)

>
What you wrote is nonsensical there, no different from 'a' + 1 --
which is why it quite rightly raises a TypeError.

No, the above expression should yield ''+'abc'+'efg', look for the
signature of sum in the docs.

Tschö,
Torsten.

--
Torsten Bronger, aquisgrana, europa vetus
Jabber ID: bronger@jabber. org
(See http://ime.webhop.org for further contact info.)

**Erik Max Francis** · Jun 27 '08, 04:23 PM

Re: Feature suggestion: sum() ought to use a compensated summationalgori thm

Torsten Bronger wrote:

No, the above expression should yield ''+'abc'+'efg', look for the
signature of sum in the docs.

You're absolutely right, I misread it. Sorry about that.

--
Erik Max Francis && max@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 18 N 121 57 W && AIM, Y!M erikmaxfrancis

**Ivan Illarionov** · Jun 27 '08, 04:23 PM

Re: Feature suggestion: sum() ought to use a compensated summationalgori thm

On Sat, 03 May 2008 20:44:19 +0200, Szabolcs HorvÃ¡t wrote:

Arnaud Delobelle wrote:

>>
>sum() works for any sequence of objects with an __add__ method, not
>just floats! Your algorithm is specific to floats.

>
This occurred to me also, but then I tried
>
sum(['abc', 'efg'], '')

Interesting, I always thought that sum is like shortcut of
reduce(operator .add, ...), but I was mistaken.

reduce() is more forgiving:
reduce(operator .add, ['abc', 'efg'], '' ) # it works
'abcefg'

--
Ivan

**sturlamolden** · Jun 27 '08, 04:23 PM

Re: Feature suggestion: sum() ought to use a compensated summationalgori thm

On May 3, 10:13 pm, hdante <hda...@gmail.c omwrote:

I believe that moving this to third party could be better. What about
numpy ? Doesn't it already have something similar ?

Yes, Kahan summation makes sence for numpy arrays. But the problem
with this algorithm is optimizing compilers. The programmer will be
forced to use tricks like inline assembly to get around the optimizer.
If not, the optimizer would remove the desired features of the
algorithm. But then we have a serious portability problem.

**Thomas Dybdahl Ahle** · Jun 27 '08, 04:23 PM

Re: Feature suggestion: sum() ought to use a compensated summationalgori thm

On Sat, 2008-05-03 at 21:37 +0000, Ivan Illarionov wrote:

On Sat, 03 May 2008 20:44:19 +0200, Szabolcs HorvÃ¡t wrote:
>

Arnaud Delobelle wrote:

>
sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.

This occurred to me also, but then I tried

sum(['abc', 'efg'], '')

>
Interesting, I always thought that sum is like shortcut of
reduce(operator .add, ...), but I was mistaken.
>
reduce() is more forgiving:
reduce(operator .add, ['abc', 'efg'], '' ) # it works
'abcefg'

Hm, it works for lists:
sum(([1], [2]), [])
[1, 2]

However I find the seccond argument hack ugly.
Does the sum way have any performance advantages over the reduce way?

--
Best Regards,
Med Venlig Hilsen,
Thomas

**Ivan Illarionov** · Jun 27 '08, 04:23 PM

Re: Feature suggestion: sum() ought to use a compensated summationalgori thm

On Sun, 04 May 2008 00:31:01 +0200, Thomas Dybdahl Ahle wrote:

On Sat, 2008-05-03 at 21:37 +0000, Ivan Illarionov wrote:

>On Sat, 03 May 2008 20:44:19 +0200, Szabolcs HorvÃ¡t wrote:
>>

Arnaud Delobelle wrote:
>>
>sum() works for any sequence of objects with an __add__ method, not
>just floats! Your algorithm is specific to floats.
>
This occurred to me also, but then I tried
>
sum(['abc', 'efg'], '')

>>
>Interesting, I always thought that sum is like shortcut of
>reduce(operato r.add, ...), but I was mistaken.
>>
>reduce() is more forgiving:
>reduce(operato r.add, ['abc', 'efg'], '' ) # it works 'abcefg'

>
Hm, it works for lists:
sum(([1], [2]), [])
[1, 2]
>
However I find the seccond argument hack ugly. Does the sum way have any
performance advantages over the reduce way?

Yes, sum() is faster:

$ python -m timeit "" "sum([[1], [2], [3, 4]], [])"
100000 loops, best of 3: 6.16 usec per loop

$ python -m timeit "import operator" \
"reduce(operato r.add, [[1], [2], [3, 4]])"
100000 loops, best of 3: 11.9 usec per loop

--
Ivan

**George Sakkis** · Jun 27 '08, 04:23 PM

Re: Feature suggestion: sum() ought to use a compensated summationalgori thm

On May 3, 7:12 pm, Ivan Illarionov <ivan.illario.. .@gmail.comwrot e:

On Sun, 04 May 2008 00:31:01 +0200, Thomas Dybdahl Ahle wrote:

On Sat, 2008-05-03 at 21:37 +0000, Ivan Illarionov wrote:

On Sat, 03 May 2008 20:44:19 +0200, Szabolcs Horvát wrote:

>

Arnaud Delobelle wrote:

>

sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.

>

This occurred to me also, but then I tried

>

sum(['abc', 'efg'], '')

>

Interesting, I always thought that sum is like shortcut of
reduce(operator .add, ...), but I was mistaken.

>

reduce() is more forgiving:
reduce(operator .add, ['abc', 'efg'], '' ) # it works 'abcefg'

Hm, it works for lists:
sum(([1], [2]), [])
[1, 2]

So it's not reduce() that is more forgiving, it's sum() that makes an
exception for strings only.

However I find the seccond argument hack ugly. Does the sum way have any
performance advantages over the reduce way?

>
Yes, sum() is faster:
>
$ python -m timeit "" "sum([[1], [2], [3, 4]], [])"
100000 loops, best of 3: 6.16 usec per loop
>
$ python -m timeit "import operator" \
"reduce(operato r.add, [[1], [2], [3, 4]])"
100000 loops, best of 3: 11.9 usec per loop

Not really; you measure the import and the attribute access time in
the second case. sum() is 2x faster for adding numbers only:

# Adding integers
python -mtimeit --setup="x=[1]*100" "sum(x)"
100000 loops, best of 3: 4.87 usec per loop
python -mtimeit --setup="from operator import add; x=[1]*100"
"reduce(add ,x)"
100000 loops, best of 3: 10.1 usec per loop

# Adding floats
python -mtimeit --setup="x=[1.0]*100" "sum(x)"
100000 loops, best of 3: 5.14 usec per loop
python -mtimeit --setup="from operator import add; x=[1.0]*100"
"reduce(add ,x)"
100000 loops, best of 3: 10.1 usec per loop

# Adding tuples
python -mtimeit --setup="x=[(1,)]*100" "sum(x,())"
10000 loops, best of 3: 61.6 usec per loop
python -mtimeit --setup="from operator import add; x=[(1,)]*100"
"reduce(add,x,( ))"
10000 loops, best of 3: 66.3 usec per loop

# Adding lists
python -mtimeit --setup="x=[[1]]*100" "sum(x,[])"
10000 loops, best of 3: 79.9 usec per loop
python -mtimeit --setup="from operator import add; x=[[1]]*100"
"reduce(add ,x,[])"
10000 loops, best of 3: 84.3 usec per loop

George

Feature suggestion: sum() ought to use a compensated summation algorithm

Feature suggestion: sum() ought to use a compensated summation algorithm

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment