Creating a List of Empty Lists

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Skip Montanaro

    #16
    Re: Creating a List of Empty Lists


    Emile> But does intern() intern?

    Yes, I believe it does:
    [color=blue][color=green][color=darkred]
    >>> intern('foo bar')[/color][/color][/color]
    'foo bar'[color=blue][color=green][color=darkred]
    >>> a = 'foo bar'
    >>> a is 'foo bar'[/color][/color][/color]
    0[color=blue][color=green][color=darkred]
    >>> a = intern(a)
    >>> a is 'foo bar'[/color][/color][/color]
    0[color=blue][color=green][color=darkred]
    >>> a is intern('foo bar')[/color][/color][/color]
    1[color=blue][color=green][color=darkred]
    >>> a = 'foo_bar'
    >>> a is 'foo_bar'[/color][/color][/color]
    1

    Emile> does it _only_ apply to strings that look like identifiers?

    Automatic interning, yes. At the point where the system has to decide
    whether to automatically intern a string it knows nothing about the higher
    level context in which the string appears. The optimization was added
    because the strings which are manipulated most often by the interpreter
    happen to be those which represent the program's identifiers (object
    attributes, variables, etc). Consequently, the decision was made some time
    ago to simply intern all strings which look like identifiers (as the snippet
    above shows). It's a bit like driving a tack with a sledge hammer, but it
    does improve interpreter performance.

    Emile> How else might you know?

    It doesn't really matter. The intern() function is available to programmers
    who want to conciously optimize their string handling and is properly
    documented. Automatic interning is not an optimization aimed at the
    programmer, but at the internals of the interpreter. It's just a side
    effect of the crude optimization that if you happen to use a string literal
    in your program which looks like an identifier, it's interned
    automatically. Look at is as a freebie. ;-)

    Skip

    Comment

    • Duncan Booth

      #17
      Re: Creating a List of Empty Lists

      Robin Becker <robin@jessikat .fsnet.co.uk> wrote in
      news:SWfFiLAYAi 0$EwHZ@jessikat .fsnet.co.uk:
      [color=blue]
      > In article <Xns9447973F01B Fduncanrcpcouk@ 127.0.0.1>, Duncan Booth
      ><duncan@NOSPAM rcp.co.uk> writes
      > ......[color=green]
      >>
      >>The recommended way these days is usually:
      >>
      >> a = [ [] for i in range(10) ]
      >>
      >>That still has a loop and works by appending empty lists, but at least
      >>its just a single expression. Also you can incorporate the next stage
      >>of your initialisation quite easily:
      >>
      >> a = [ [b] for b in range(10) ]
      >>[/color]
      > I seem to remember the fastest way to do this was map(list,n*[[]])
      > from a couple of earlier threads, but is that true in 2.3?[/color]

      I think I would prefer to maintain code with the list comprehension rather
      than the map which, to my eyes, isn't immediately obvious. However, it
      would appear that the list comprehension is much faster in any case, so in
      this case I believe the clearest solution is also the fastest.

      C:\>d:\python23 \lib\timeit.py "[ [] for i in range(10) ]"
      10000 loops, best of 3: 28.6 usec per loop

      C:\>d:\python23 \lib\timeit.py "map(list,1 0*[[]])"
      10000 loops, best of 3: 102 usec per loop

      C:\>d:\python23 \lib\timeit.py -s ll=list -s mm=map "mm(ll,10*[[]])"
      10000 loops, best of 3: 91.9 usec per loop

      --
      Duncan Booth duncan@rcp.co.u k
      int month(char *p){return(1248 64/((p[0]+p[1]-p[2]&0x1f)+1)%12 )["\5\x8\3"
      "\6\7\xb\1\x9\x a\2\0\4"];} // Who said my code was obscure?

      Comment

      • Robin Becker

        #18
        Re: Creating a List of Empty Lists

        unfortunately all of these tests are slower than 1 clock tick on my
        machine. I believe the may be biassed.
        --
        Robin Becker

        Comment

        • Robin Becker

          #19
          Re: Creating a List of Empty Lists

          Duncan Booth's prompted me to repeat all the nonsense about empty lists
          of a few years ago. I am amazed at the differences between pythons.
          Lambda is now faster than list!!! I would really like to know why
          list([]) is so much slower than list(()). Clearly comprehensions are now
          fast, but still slower than the corresponding map with a lambda.


          My results all obtained on the same win2k sp4 machine.
          C:\tmp>\python2 0\python ttt.py
          list () = 2.09
          list [] = 1.19
          comprehension = 1.97
          copy = 4.69
          cCopy.copy = 2.09
          lambda z: z[:] = 1.56
          lambda z: list(z) = 2.66
          lambda z: [] = 1.42

          C:\tmp>\python2 1\python ttt.py
          list () = 2.33
          list [] = 1.23
          comprehension = 1.78
          copy = 4.34
          cCopy.copy = 2.22
          lambda z: z[:] = 1.55
          lambda z: list(z) = 2.33
          lambda z: [] = 1.41

          C:\tmp>\python2 2\python ttt.py
          list () = 3.22
          list [] = 1.33
          comprehension = 1.59
          copy = 4.05
          cCopy.copy = 2.13
          lambda z: z[:] = 1.69
          lambda z: list(z) = 2.55
          lambda z: [] = 1.64

          C:\tmp>\python2 3\python ttt.py
          list () = 1.73
          list [] = 3.22
          comprehension = 1.00
          copy = 3.94
          cCopy.copy = 1.59
          lambda z: z[:] = 1.14
          lambda z: list(z) = 4.77
          lambda z: [] = 0.95

          ############### ttt.py
          import time
          s = 'list ()'
          t0=time.time()
          for y in xrange(1000):
          x = map(list,1000*[()])
          t1 = time.time()
          print "%-20s = %.2f" % (s,(t1-t0))
          assert x[0]==[] and x[0] is not x[-1], "%s failed identity" % s

          s = 'list []'
          t0=time.time()
          for y in xrange(1000):
          x = map(list,1000*[[]])
          t1 = time.time()
          print "%-20s = %.2f" % (s,(t1-t0))
          assert x[0]==[] and x[0] is not x[-1], "%s failed identity" % s

          s = "comprehens ion"
          t0=time.time()
          for y in xrange(1000):
          x = [[] for i in xrange(1000)]
          t1 = time.time()
          print "%-20s = %.2f" % (s,(t1-t0))
          assert x[0]==[] and x[0] is not x[-1], "%s failed identity" % s

          from copy import copy
          s = 'copy'
          t0=time.time()
          for y in xrange(1000):
          x = map(copy,1000*[[]])
          t1 = time.time()
          print "%-20s = %.2f" % (s,(t1-t0))
          assert x[0]==[] and x[0] is not x[-1], "%s failed identity" % s

          try:
          from cCopy import copy as ccopy
          s = 'cCopy.copy'
          t0=time.time()
          for y in xrange(1000):
          x = map(ccopy,1000*[[]])
          t1 = time.time()
          print "%-20s = %.2f" % (s,(t1-t0))
          assert x[0]==[] and x[0] is not x[-1], "%s failed identity" % s
          except ImportError:
          pass

          s = 'lambda z: z[:]'
          t0=time.time()
          for y in xrange(1000):
          x = map(lambda z: z[:],1000*[[]])
          t1 = time.time()
          print "%-20s = %.2f" % (s,(t1-t0))
          assert x[0]==[] and x[0] is not x[-1], "%s failed identity" % s

          s = 'lambda z: list(z)'
          t0=time.time()
          for y in xrange(1000):
          x = map(lambda z: list(z),1000*[[]])
          t1 = time.time()
          print "%-20s = %.2f" % (s,(t1-t0))
          assert x[0]==[] and x[0] is not x[-1], "%s failed identity" % s

          s = 'lambda z: []'
          t0=time.time()
          for y in xrange(1000):
          x = map(lambda z: [],xrange(1000))
          t1 = time.time()
          print "%-20s = %.2f" % (s,(t1-t0))
          assert x[0]==[] and x[0] is not x[-1], "%s failed identity" % s
          --
          Robin Becker

          Comment

          • Fredrik Lundh

            #20
            Re: Creating a List of Empty Lists

            Robin Becker wrote:
            [color=blue]
            > C:\tmp>\python2 2\python ttt.py
            > list () = 3.22
            > list [] = 1.33
            > lambda z: list(z) = 2.55[/color]
            [color=blue]
            > C:\tmp>\python2 3\python ttt.py
            > list () = 1.73
            > list [] = 3.22
            > lambda z: list(z) = 4.77[/color]

            is this part of the benchmark stable, or was your computer doing
            something else in the background? I find it a bit disturbing that a
            2.3 optimization would slow down a trivial operation that much...

            </F>




            Comment

            • Duncan Booth

              #21
              Re: Creating a List of Empty Lists

              Robin Becker <robin@jessikat .fsnet.co.uk> wrote in
              news:tOjKkIAu$a 1$Ewoy@jessikat .fsnet.co.uk:
              [color=blue]
              > Duncan Booth's prompted me to repeat all the nonsense about empty lists
              > of a few years ago. I am amazed at the differences between pythons.
              > Lambda is now faster than list!!! I would really like to know why
              > list([]) is so much slower than list(()). Clearly comprehensions are now
              > fast, but still slower than the corresponding map with a lambda.[/color]

              May I ask you to repeat your tests but putting all the code inside
              functions, please? I think it is a bit unfair to compare a list
              comprehension with a map/lambda and force the list comprehension to access
              a global variable every time round the loop.

              I think if you let the list comprehension use a local variable it should
              comfortably beat the best of your lambdas (at least it does on my system).

              By the way, I didn't understand your earlier comment:
              [color=blue]
              > unfortunately all of these tests are slower than 1 clock tick on my
              > machine. I believe the may be biassed.[/color]

              You obviously think something was wrong with my earlier timings, but I
              can't understand from this what the problem was; timeit.py does some
              reasonably clever things to try to give an accurate answer including
              varying the number of times it repeats the test to ensure that the time per
              loop is based on a reasonably large time interval.

              --
              Duncan Booth duncan@rcp.co.u k
              int month(char *p){return(1248 64/((p[0]+p[1]-p[2]&0x1f)+1)%12 )["\5\x8\3"
              "\6\7\xb\1\x9\x a\2\0\4"];} // Who said my code was obscure?

              Comment

              • Robin Becker

                #22
                Re: Creating a List of Empty Lists

                In article <mailman.262.10 70970634.16879. python-list@python.org >,
                Fredrik Lundh <fredrik@python ware.com> writes[color=blue]
                >Robin Becker wrote:
                >[color=green]
                >> C:\tmp>\python2 2\python ttt.py
                >> list () = 3.22
                >> list [] = 1.33
                >> lambda z: list(z) = 2.55[/color]
                >[color=green]
                >> C:\tmp>\python2 3\python ttt.py
                >> list () = 1.73
                >> list [] = 3.22
                >> lambda z: list(z) = 4.77[/color]
                >
                >is this part of the benchmark stable, or was your computer doing
                >something else in the background? I find it a bit disturbing that a
                >2.3 optimization would slow down a trivial operation that much...
                >
                ></F>
                >
                >
                >
                >[/color]
                No this is stable I also find it really weird.
                --
                Robin Becker

                Comment

                • Robin Becker

                  #23
                  Re: Creating a List of Empty Lists

                  In article <Xns944C79E1BBF 8Dduncanrcpcouk @127.0.0.1>, Duncan Booth
                  <duncan@NOSPAMr cp.co.uk> writes
                  .....[color=blue]
                  >You obviously think something was wrong with my earlier timings, but I
                  >can't understand from this what the problem was; timeit.py does some
                  >reasonably clever things to try to give an accurate answer including
                  >varying the number of times it repeats the test to ensure that the time per
                  >loop is based on a reasonably large time interval.
                  >[/color]
                  not at all, I was puzzled by the reversal of earlier stuff. You're right
                  about the comprehension as well, I had thought the variable was
                  internal, but making it local gives the comprehension the edge.

                  C:\tmp>\python2 3\python ttt.py
                  list () = 1.72
                  list [] = 3.27
                  comprehension = 0.81
                  copy = 3.94
                  cCopy.copy = 1.59
                  lambda z: z[:] = 1.14
                  lambda z: list(z) = 4.73
                  lambda z: [] = 0.95
                  f=lambda z: [] = 0.95

                  like /F though I am extremely puzzled about the list result.
                  --
                  Robin Becker

                  Comment

                  • Duncan Booth

                    #24
                    Re: Creating a List of Empty Lists

                    Robin Becker <robin@jessikat .fsnet.co.uk> wrote in
                    news:LIGj15A6Vc 1$EwNO@jessikat .fsnet.co.uk:
                    [color=blue]
                    >
                    > C:\tmp>\python2 3\python ttt.py
                    > list () = 1.72
                    > list [] = 3.27
                    > comprehension = 0.81
                    > copy = 3.94
                    > cCopy.copy = 1.59
                    > lambda z: z[:] = 1.14
                    > lambda z: list(z) = 4.73
                    > lambda z: [] = 0.95
                    > f=lambda z: [] = 0.95
                    >
                    > like /F though I am extremely puzzled about the list result.[/color]

                    I just had a look at listobject.c, in particular the code for list_fill.

                    static int
                    list_fill(PyLis tObject *result, PyObject *v)
                    {
                    PyObject *it; /* iter(v) */
                    int n; /* guess for result list size */
                    int i;

                    n = result->ob_size;

                    /* Special-case list(a_list), for speed. */
                    if (PyList_Check(v )) {
                    if (v == (PyObject *)result )
                    return 0; /* source is destination, we're done */
                    return list_ass_slice( result, 0, n, v);
                    }

                    /* Empty previous contents */
                    if (n != 0) {
                    if (list_ass_slice (result, 0, n, (PyObject *)NULL) != 0)
                    return -1;
                    }

                    /* Get iterator. There may be some low-level efficiency to be gained
                    * by caching the tp_iternext slot instead of using PyIter_Next()
                    * later, but premature optimization is the root etc.
                    */
                    ... and so on
                    }

                    Timing 'python \python23\lib\t imeit.py "list([])"' gave times around
                    4.5uSec per loop, whereas the same with "list(())" gives about 2.3uSec per
                    loop.

                    Putting #ifdef 0/#endif around the 'special case for speed' block makes the
                    first case go from 4.5uS to 2.4uS, and the second case also speeds up
                    marginally for good measure.

                    So it looks suspiciously like a premature optimisation 'for speed' should
                    say 'for reduced speed'.

                    Next I tried:
                    python \python23\lib\t imeit.py -s"r=range(10000 )" "list(r)"

                    and
                    python \python23\lib\t imeit.py -s"r=tuple(range (10000))" "list(r)"

                    For this length of list the optimisation is a clear winner.

                    It looks to me as though the breakeven is around 100 elements in the list.
                    With fewer than 100 elements the optimisation slows things down, with more
                    it speeds them up.

                    I put the tests in a batch file:

                    ------ test.bat --------
                    @echo off
                    @echo Length: 0
                    python \python23\lib\t imeit.py -s"r=[]" "list(r)"
                    python \python23\lib\t imeit.py -s"r=()" "list(r)"
                    @echo Length: 1
                    python \python23\lib\t imeit.py -s"r=range(1) " "list(r)"
                    python \python23\lib\t imeit.py -s"r=tuple(range (1))" "list(r)"
                    @echo Length: 10
                    python \python23\lib\t imeit.py -s"r=range(10 )" "list(r)"
                    python \python23\lib\t imeit.py -s"r=tuple(range (10))" "list(r)"
                    @echo Length: 100
                    python \python23\lib\t imeit.py -s"r=range(10 0)" "list(r)"
                    python \python23\lib\t imeit.py -s"r=tuple(range (100))" "list(r)"
                    @echo Length: 1000
                    python \python23\lib\t imeit.py -s"r=range(1000) " "list(r)"
                    python \python23\lib\t imeit.py -s"r=tuple(range (1000))" "list(r)"
                    @echo Length: 10000
                    python \python23\lib\t imeit.py -s"r=range(10000 )" "list(r)"
                    python \python23\lib\t imeit.py -s"r=tuple(range (10000))" "list(r)"
                    ------------------------
                    With the original code I get:

                    C:\Pythonsrc\py thon\dist\src\P Cbuild>test
                    Length: 0
                    100000 loops, best of 3: 4.38 usec per loop
                    100000 loops, best of 3: 2.32 usec per loop
                    Length: 1
                    100000 loops, best of 3: 5.26 usec per loop
                    100000 loops, best of 3: 2.4 usec per loop
                    Length: 10
                    100000 loops, best of 3: 5.39 usec per loop
                    100000 loops, best of 3: 2.84 usec per loop
                    Length: 100
                    100000 loops, best of 3: 7.44 usec per loop
                    100000 loops, best of 3: 7.26 usec per loop
                    Length: 1000
                    10000 loops, best of 3: 28.5 usec per loop
                    10000 loops, best of 3: 50.4 usec per loop
                    Length: 10000
                    1000 loops, best of 3: 258 usec per loop
                    1000 loops, best of 3: 505 usec per loop

                    With the optimisation code completely removed I get:
                    C:\Pythonsrc\py thon\dist\src\P Cbuild>test
                    Length: 0
                    100000 loops, best of 3: 2.21 usec per loop
                    100000 loops, best of 3: 2.24 usec per loop
                    Length: 1
                    100000 loops, best of 3: 2.34 usec per loop
                    100000 loops, best of 3: 2.36 usec per loop
                    Length: 10
                    100000 loops, best of 3: 2.84 usec per loop
                    100000 loops, best of 3: 2.85 usec per loop
                    Length: 100
                    100000 loops, best of 3: 7.56 usec per loop
                    100000 loops, best of 3: 7.13 usec per loop
                    Length: 1000
                    10000 loops, best of 3: 53.2 usec per loop
                    10000 loops, best of 3: 50.4 usec per loop
                    Length: 10000
                    1000 loops, best of 3: 549 usec per loop
                    1000 loops, best of 3: 534 usec per loop

                    With the optimisation tweaked to ignore 0 length lists entirely and only
                    'optimise' for lists with more than 100 elements:

                    C:\Pythonsrc\py thon\dist\src\P Cbuild>test
                    Length: 0
                    1000000 loops, best of 3: 1.39 usec per loop
                    100000 loops, best of 3: 2.31 usec per loop
                    Length: 1
                    100000 loops, best of 3: 2.28 usec per loop
                    100000 loops, best of 3: 2.33 usec per loop
                    Length: 10
                    100000 loops, best of 3: 2.85 usec per loop
                    100000 loops, best of 3: 2.84 usec per loop
                    Length: 100
                    100000 loops, best of 3: 7.43 usec per loop
                    100000 loops, best of 3: 7.14 usec per loop
                    Length: 1000
                    10000 loops, best of 3: 28.4 usec per loop
                    10000 loops, best of 3: 50.2 usec per loop
                    Length: 10000
                    1000 loops, best of 3: 256 usec per loop
                    1000 loops, best of 3: 532 usec per loop

                    The relevant part of list_fill now looks like this:

                    /* Special-case list(a_list), for speed. */
                    if (PyList_Check(v )) {
                    int vsize = ((PyListObject* )v)->ob_size;
                    if (v == (PyObject *)result || vsize==0)
                    return 0; /* nothing to copy */
                    if (vsize > 100)
                    return list_ass_slice( result, 0, n, v);
                    }


                    --
                    Duncan Booth duncan@rcp.co.u k
                    int month(char *p){return(1248 64/((p[0]+p[1]-p[2]&0x1f)+1)%12 )["\5\x8\3"
                    "\6\7\xb\1\x9\x a\2\0\4"];} // Who said my code was obscure?

                    Comment

                    Working...