> [...]
> So it's actually faster to append to a long list than an empty one? That
> certainly would not have been intuitively obvious, now would it?
Maybe not intuitively, but if you know how dynamically growing data
structures are implemented, it's plausible. They overallocate, and the
amount of overallocation is dependent on the current size. Relevant
source snippet from Python 2.6:
/* This over-allocates proportional to the list size, making room
* for additional growth. The over-allocation is mild, but is
* enough to give linear-time amortized behavior over a long
* sequence of appends() in the presence of a poorly-performing
* system realloc().
* The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
*/
new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
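You can watch these reallocation steps from pure Python with sys.getsizeof(), whose result includes the over-allocated slots; the exact lengths at which the jumps happen vary with Python version and platform:

import sys

lst = []
last = sys.getsizeof(lst)
for n in range(1, 30):
    lst.append(None)
    size = sys.getsizeof(lst)
    if size != last:
        # getsizeof() only jumps when the list actually reallocates
        print("reallocation at length %d: %d -> %d bytes" % (n, last, size))
        last = size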
If, on the other hand, we knew beforehand how big the list will get
approximately, we could avoid all these reallocations. No problem with
Python's C API:
PyAPI_FUNC(PyObject *) PyList_New(Py_ssize_t size);

But you can't do it directly from Python, unless you (ab)use ctypes.

-- Gerhard
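The closest pure-Python stand-in for that pre-sizing is to allocate once with [None] * n and fill by index; a rough timing sketch (the exact numbers vary by machine and Python version):

import timeit

n = 100000

def grow():
    lst = []
    for i in range(n):
        lst.append(i)
    return lst

def presized():
    lst = [None] * n        # single allocation up front
    for i in range(n):
        lst[i] = i          # plain index assignment, never reallocates
    return lst

print("append  :", min(timeit.repeat(grow, number=10)))
print("presized:", min(timeit.repeat(presized, number=10)))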
Peter,
> So it's actually faster to append to a long list than an empty one? That
> certainly would not have been intuitively obvious, now would it?
That test only demonstrates that it's faster to append to a list of 1 million
items than to an empty one (and this on a particular platform with a particular
Python version). Different sizes may give different results. I guess this is
because of some internal optimisations (items are probably allocated in
chunks, so sometimes append() involves a realloc, sometimes not).
So the only thing you should remember is that list.append() has a complexity
of O(1), and thus should be considered a constant-time operation for any
length. Just be aware of the note:
[1] = These operations rely on the "Amortized" part of "Amortized Worst Case".
Individual actions may take surprisingly long, depending on the history of
the container.
Also note that 50000 items is a lot for a human being, not for a modern
computer.
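If you want to see the "amortized" part for yourself, a rough sketch is to time each individual append and count the outliers; the cutoff below is arbitrary and the measurement is noisy, but the slow appends roughly track the reallocation points:

import time

lst = []
slow = 0
for i in range(1000000):
    t0 = time.perf_counter()
    lst.append(i)
    if time.perf_counter() - t0 > 1e-05:   # arbitrary cutoff for "slow"
        slow += 1

print("%d of 1000000 appends were noticeably slower" % slow)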
On Monday 30 June 2008 15:52:56, Gerhard Häring wrote:
> Larry Bates wrote:
>> [...]
>> So it's actually faster to append to a long list than an empty one? That
>> certainly would not have been intuitively obvious, now would it?
>
> Maybe not intuitively, but if you know how dynamically growing data
> structures are implemented, it's plausible. They overallocate, and the
> amount of overallocation is dependent on the current size. Relevant
> source snippet from Python 2.6:
>
> /* This over-allocates proportional to the list size, making room
>  * for additional growth. The over-allocation is mild, but is
>  * enough to give linear-time amortized behavior over a long
>  * sequence of appends() in the presence of a poorly-performing
>  * system realloc().
>  * The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
>  */
> new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
>
> If, on the other hand, we knew beforehand how big the list will get
> approximately, we could avoid all these reallocations. No problem with
> Python's C API:
>
> PyAPI_FUNC(PyObject *) PyList_New(Py_ssize_t size);
>
> But you can't do it directly from Python, unless you (ab)use ctypes.
>
> -- Gerhard
Well, as I posted a few days ago, one could envisage, as a pure Python
optimization for dealing with long lists, to replace an algorithm with a lot
of appends by something like this:
mark = object()
datas = [ mark ] * expected_size
# working with the datas while maintaining the effective currently used size
Of course one could even subclass list and redefine __len__, append, and some
other methods to deal with this "allocated by block" list.
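Before the subclass version, here is a minimal sketch of the manual bookkeeping the first form implies (the sizes and workload are made up for illustration):

mark = object()                  # unique sentinel: no real value can be 'is' it
expected_size = 1000

datas = [mark] * expected_size   # one allocation up front
used = 0                         # the effective size, maintained by hand

for value in range(300):         # stand-in for the real workload
    datas[used] = value          # index assignment, never triggers a realloc
    used += 1

print(len(datas[:used]))         # -> 300, the currently used portion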
> On Monday 30 June 2008 15:52:56, Gerhard Häring wrote:
>> Larry Bates wrote:
>> If, on the other hand, we knew beforehand how big the list will get
>> approximately, we could avoid all these reallocations. No problem with
>> Python's C API:
>>
>> PyAPI_FUNC(PyObject *) PyList_New(Py_ssize_t size);
>>
>> But you can't do it directly from Python, unless you (ab)use ctypes.
>>
>> -- Gerhard
>
> Well, as I posted a few days ago, one could envisage, as a pure Python
> optimization for dealing with long lists, to replace an algorithm with a lot
> of appends by something like this:
>
> mark = object()
>
> datas = [ mark ] * expected_size
datas = [None] * expected_size
has been a standard idiom since before object() existed ;-)
and works fine *unless* one wants to add None explicitly
and have that be different from 'unused'.
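A tiny demonstration of that distinction:

mark = object()            # compared by identity, so no real value collides
datas = [mark] * 4

datas[0] = None            # None stored on purpose
print(datas[0] is mark)    # False: slot 0 is in use, even though it holds None
print(datas[1] is mark)    # True: slot 1 is still 'unused'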
>
> # working with the datas while maintaining the effective currently used size
>
> Of course one could even subclass list and redefine __len__, append, and some
> other methods to deal with this "allocated by block" list.
An interesting idea if one does this at least a few times and wants to
use .append and .extend instead of explicit indexing.
One could also make such a subclass a 'no-grow' list if appropriate
(when an attempt to grow it would indicate a bug).
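One sketch of such a 'no-grow' variant (illustrative only; a complete version would also have to block __iadd__, __imul__, and length-changing slice assignment):

class NoGrowList(list):
    """A fixed-size list where any attempt to grow is treated as a bug."""

    def _refuse(self, *args):
        raise RuntimeError("fixed-size list: assign by index instead")

    append = extend = insert = _refuse

buf = NoGrowList([None] * 10)
buf[3] = "ok"            # index assignment is fine
# buf.append("boom")     # would raise RuntimeError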
On Monday 30 June 2008 22:21:35, Terry Reedy wrote:
>> Well, as I posted a few days ago, one could envisage, as a pure Python
>> optimization for dealing with long lists, to replace an algorithm with a
>> lot of appends by something like this:
>> mark = object()
>> datas = [ mark ] * expected_size
>
> datas = [None] * expected_size
> has been a standard idiom since before object() existed ;-)
> and works fine *unless* one wants to add None explicitly
> and have that be different from 'unused'.
Yes, in fact I used a marker because I thought of it primarily as a
terminator for the list (like \0 for strings in C), but it doesn't matter
what object you put in the list, if you know its size at every moment.
A subclass of list will indeed have to override most of the methods of its
parent (not just some, as I assumed before), using extend for reallocation
with some sort of sized iterator, as in my previous example with xrange;
something like this:
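The code that followed is not preserved in this archive; as a sketch of what such a block-allocated subclass might look like (the class name and block size are invented for illustration, and a full version would also need to handle iteration and slicing over only the used part):

class BlockList(list):
    """List that pre-extends its storage one block at a time, so append()
    itself never has to reallocate. Illustrative only."""

    def __init__(self, expected_size=0, block=1024):
        list.__init__(self, [None] * expected_size)
        self._used = 0        # effective size
        self._block = block   # how much to grow by when storage runs out

    def __len__(self):
        return self._used     # report the effective size, not the allocated one

    def append(self, value):
        if self._used == list.__len__(self):
            # out of preallocated slots: grow by one block via extend()
            list.extend(self, [None] * self._block)
        list.__setitem__(self, self._used, value)
        self._used += 1

b = BlockList(expected_size=4, block=4)
for i in range(10):
    b.append(i)
print(len(b))             # -> 10, the effective size
print(list.__len__(b))    # -> 12, the allocated size (grown in blocks of 4)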