benchmarks and questions for new python programmer

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • stormslayer

    benchmarks and questions for new python programmer

    Folks:

    I've been considering a shift to python. I currently use c++builder
    (borland) or perl. I do computational models / statistics
    programming, and was interested in python b/c it

    a. has a library that connects to the R stats package
    b. has code that seems way more readable than anything else

    There is, however, at least for my work, a hard constraint. Execution
    time for large data sets / lots of iterations of a model needs to be
    reasonabe. So, I wrote a program to test it out (appended to the
    bottom of this email). As I said, this is my first python program.
    I'm sure it is terrible, but in many ways, you hope that the
    interpreter / compiler corrects some errors (or the langauge isn't
    that easy to use after all).

    Benchmarks (for the paramter settings in the file):

    C++builder 6.0: 3 seconds
    Perl (ActivePerl 5.6.1.638): 14 seconds
    Python (ActivePython 2.3.2 Build 232): 1 minute 13 seconds

    Note that I didn't fiddle overmuch with any of the programs -- each
    one took about 15 minutes to bang out on the keyboard. I'm hoping
    that there is some obvious mistake in the way I used something in
    python to account for the speed differential. It seems like a really,
    really nice language, but orders of magnitude slower than C/C++ and a
    5x slowdown over Perl seems abusive...

    Thanks for any advice.
    sd


    Python code:

    #!/usr/bin/env python
    import random

    pop=[] # popluation of agents; initially empty list
    N = 10000 # population size
    ep = .05 # mutation rate
    its = 100 # number of times through the main loop
    gold=0 # initial number of gold adopters; (1-gold) = silver
    adopters
    standard=0 # either 1 for gold or 2 for silver
    shifts=0 # number of regime shifts

    def init_all(): # init globals
    global pop, gold, standard
    pop = []
    gold = 0
    for i in range(N):
    if random.randint( 1,2) == 1:
    pop.append(1)
    gold=gold+1
    else: pop.append(2)
    if gold>N/2: standard=1
    else: standard=2 # if tie, silver wins
    # print "Initial Population of Gold Users: ", gold
    # end function init_all

    def one_choose():
    global pop, standard, gold, shifts
    for i in range(its):
    temp = random.randint( 0,N-1)
    tempval = pop[temp]
    old_stand = standard
    if random.random() <ep:
    if random.random() <0.5:
    pop[temp]=1
    if tempval!=pop[temp]:
    gold=gold+1
    if gold>N/2: standard=1
    else:
    pop[temp]=2
    if tempval!=pop[temp]:
    gold=gold-1
    if gold<N/2: standard=2
    if standard!=old_s tand: shifts=shifts+1 # check for regime
    shift after each agent chooses anew
    else:
    if gold>N/2:
    if pop[temp]!=1:
    pop[temp]=1
    gold=gold+1
    if gold>N/2: standard=1
    else:
    if pop[temp]!=2:
    pop[temp]=2
    gold=gold-1
    if gold<N/2: standard=2
    if standard!=old_s tand: shifts=shifts+1 # check for regime
    shift after each agent chooses anew
    # print "Final Population of Gold Users: ", gold
    # end function one_choose

    # start main loop


    for i in range(1000):
    init_all()
    one_choose()
    print "Number of regime shifts: ", shifts
  • Peter Hansen

    #2
    Re: benchmarks and questions for new python programmer

    stormslayer wrote:
    [color=blue]
    > Benchmarks (for the paramter settings in the file):
    >
    > C++builder 6.0: 3 seconds
    > Perl (ActivePerl 5.6.1.638): 14 seconds
    > Python (ActivePython 2.3.2 Build 232): 1 minute 13 seconds
    >
    > Note that I didn't fiddle overmuch with any of the programs -- each
    > one took about 15 minutes to bang out on the keyboard. I'm hoping
    > that there is some obvious mistake in the way I used something in
    > python to account for the speed differential. It seems like a really,
    > really nice language, but orders of magnitude slower than C/C++ and a
    > 5x slowdown over Perl seems abusive...[/color]

    I haven't taken the time to do more than glance at your
    program, but I'll note the following things, in no particular
    order.

    1. Using your first Python program as anything more than a
    learning experience would be nuts. It's highly likely,
    in my experience with C/C++ programmers switching to
    Python, that your code is more like C and less like Python
    and therefore doesn't take much advantage of Python's
    strengths.

    2. Psyco can provide a very large speedup for many Python
    programs, with little or no effort required.

    3. "Orders of magnitude" is not likely quite right. It might
    be up to two (decimal) orders of magnitude slower for certain
    very rare problems, but is probably closer to one order of
    magnitude slower, and possibly less than that in many cases.

    4. Python is about as fast as Perl in general. That you had
    a 5x slowdown suggests either that your benchmark exercises
    one of the few areas where Perl is highly specialized and
    optimized in ways Python cannot be, or (more likely) that
    your first Python code is very "un-Pythonic" and therefore
    shouldn't be taken as typical.

    5. Do a quick search for "Python optimization" and look at
    the first handful of links, including Guido's "Optimizati on
    Anecdote", and implement some of the suggestions there.

    -Peter

    Comment

    • Michael Geary

      #3
      Re: benchmarks and questions for new python programmer

      stormslayer wrote:[color=blue]
      > Benchmarks (for the paramter settings in the file):
      >
      > C++builder 6.0: 3 seconds
      > Perl (ActivePerl 5.6.1.638): 14 seconds
      > Python (ActivePython 2.3.2 Build 232): 1 minute 13 seconds
      >
      > Note that I didn't fiddle overmuch with any of the programs -- each
      > one took about 15 minutes to bang out on the keyboard. I'm hoping
      > that there is some obvious mistake in the way I used something in
      > python to account for the speed differential. It seems like a really,
      > really nice language, but orders of magnitude slower than C/C++ and a
      > 5x slowdown over Perl seems abusive...[/color]

      It's pretty easy to speed up your code by a factor of more than 10.

      First, you need to figure out where the time is going. Profiling is always a
      Good Thing, but in this case, it was pretty obvious just by eyeballing the
      code that the loop in init_all() was the likely culprit. To confirm this, I
      commented out the call to one_choose(), and sure enough, this didn't reduce
      the time much at all. So init_all() is where to concentrate our efforts.

      Here's your init_all() code:
      [color=blue]
      > def init_all(): # init globals
      > global pop, gold, standard
      > pop = []
      > gold = 0
      > for i in range(N):
      > if random.randint( 1,2) == 1:
      > pop.append(1)
      > gold=gold+1
      > else: pop.append(2)
      > if gold>N/2: standard=1
      > else: standard=2 # if tie, silver wins[/color]

      And here is a new version:

      def init_all(): # init globals
      global pop, gold, standard
      pop = [2] * N
      gold = 0
      rand = random.random
      for i in xrange(N):
      if rand() < .5:
      pop[i] = 1
      gold += 1
      if gold>N/2: standard=1
      else: standard=2 # if tie, silver wins

      What's different here?

      * I used xrange(N) instead of range(N). This helps a little bit but not much
      by itself.

      * Instead of using append() repeatedly to extend the array, I preallocated
      the entire array with the "pop = [2] * N" statement. This avoids repeated
      memory allocation and helps a little bit. But still not enough to make a
      huge difference.

      * Finally, I tried tracing into the code in the debugger, and when it
      stepped into the random.randint( ) funciton, I noticed that this is a fairly
      complex little function. All you're using it for is to make a 50-50
      decision, so we can bypass a lot of code by using random.random() intead.
      This is where the big speedup comes from. Also, I grabbed a reference to
      random.random outside the inner loop, which helps a little bit.

      On my machine, your original code runs in 139 seconds. With these changes,
      it runs in 10.9 seconds, which would be about 5.7 seconds on your faster
      machine.

      If that's not fast enough, you can get the psyco module from
      http://psyco.sourceforge.net/ and add these two lines to the top of the
      source file:

      import psyco
      psyco.full()

      With that change, the code runs in 3.6 seconds, which would be about 1.9
      seconds on your machine--considerably faster than the C++ Builder time.

      Good enough? :-)

      -Mike
      [color=blue]
      > Python code:
      >
      > #!/usr/bin/env python
      > import random
      >
      > pop=[] # popluation of agents; initially empty list
      > N = 10000 # population size
      > ep = .05 # mutation rate
      > its = 100 # number of times through the main loop
      > gold=0 # initial number of gold adopters; (1-gold) = silver
      > adopters
      > standard=0 # either 1 for gold or 2 for silver
      > shifts=0 # number of regime shifts
      >
      > def init_all(): # init globals
      > global pop, gold, standard
      > pop = []
      > gold = 0
      > for i in range(N):
      > if random.randint( 1,2) == 1:
      > pop.append(1)
      > gold=gold+1
      > else: pop.append(2)
      > if gold>N/2: standard=1
      > else: standard=2 # if tie, silver wins
      > # print "Initial Population of Gold Users: ", gold
      > # end function init_all
      >
      > def one_choose():
      > global pop, standard, gold, shifts
      > for i in range(its):
      > temp = random.randint( 0,N-1)
      > tempval = pop[temp]
      > old_stand = standard
      > if random.random() <ep:
      > if random.random() <0.5:
      > pop[temp]=1
      > if tempval!=pop[temp]:
      > gold=gold+1
      > if gold>N/2: standard=1
      > else:
      > pop[temp]=2
      > if tempval!=pop[temp]:
      > gold=gold-1
      > if gold<N/2: standard=2
      > if standard!=old_s tand: shifts=shifts+1 # check for regime
      > shift after each agent chooses anew
      > else:
      > if gold>N/2:
      > if pop[temp]!=1:
      > pop[temp]=1
      > gold=gold+1
      > if gold>N/2: standard=1
      > else:
      > if pop[temp]!=2:
      > pop[temp]=2
      > gold=gold-1
      > if gold<N/2: standard=2
      > if standard!=old_s tand: shifts=shifts+1 # check for regime
      > shift after each agent chooses anew
      > # print "Final Population of Gold Users: ", gold
      > # end function one_choose
      >
      > # start main loop
      >
      >
      > for i in range(1000):
      > init_all()
      > one_choose()
      > print "Number of regime shifts: ", shifts[/color]


      Comment

      • Terry Reedy

        #4
        Re: benchmarks and questions for new python programmer


        "stormslaye r" <demarchi@duke. edu> wrote in message
        news:c5f75ecc.0 405161733.4bc95 347@posting.goo gle.com...[color=blue]
        > I've been considering a shift to python. I currently use c++builder
        > (borland) or perl. I do computational models / statistics
        > programming, and was interested in python b/c it
        >
        > a. has a library that connects to the R stats package
        > b. has code that seems way more readable than anything else
        >
        > There is, however, at least for my work, a hard constraint. Execution
        > time for large data sets / lots of iterations of a model needs to be
        > reasonabe.[/color]

        For floating point calculations, and possibly integer, you may want to use
        Numerical Python, or maybe the replacement-in-progress Numarray, or SciPy,
        or PyRex, or Weave, or possibly Boost.Python -- besides the suggestion of
        Psyco (which is what I personally would try first, especially for iterative
        integer work).


        So, I wrote a program to test it out (appended to the[color=blue]
        > bottom of this email). As I said, this is my first python program.
        > I'm sure it is terrible, but in many ways, you hope that the
        > interpreter / compiler corrects some errors (or the langauge isn't
        > that easy to use after all).
        >
        > Benchmarks (for the paramter settings in the file):
        >
        > C++builder 6.0: 3 seconds
        > Perl (ActivePerl 5.6.1.638): 14 seconds
        > Python (ActivePython 2.3.2 Build 232): 1 minute 13 seconds[/color]

        Are you getting the same results from each, so that you know they are at
        least functionally equivalent if not algorithmically equivalent? Without
        identical rngs, this would require storing a sequence of outputs from one
        in a disk file for each program to use.
        [color=blue]
        > Python code:
        >
        > #!/usr/bin/env python
        > import random
        >
        > pop=[] # popluation of agents; initially empty list[/color]

        You might as well comment this out since init rebinds this anyway. Ditto
        for gold and standard
        [color=blue]
        > N = 10000 # population size
        > ep = .05 # mutation rate
        > its = 100 # number of times through the main loop
        > gold=0 # initial number of gold adopters; (1-gold) = silver
        > adopters
        > standard=0 # either 1 for gold or 2 for silver
        > shifts=0 # number of regime shifts
        >
        > def init_all(): # init globals
        > global pop, gold, standard
        > pop = [][/color]

        pop = N*[None], followed by pop[i] = x in loop might be faster.
        so would be pend = pop.append followed by pend(1) and pend(2) in loop
        in either case, rint = random.randint followed by rint(1,2) in loop would
        definitely be so.
        making pop, gold, and standard local vars instead of globals wil be faster.
        then end with
        return pop, gold, standard
        [color=blue]
        > gold = 0
        > for i in range(N):
        > if random.randint( 1,2) == 1:
        > pop.append(1)
        > gold=gold+1[/color]

        gold += 1 might be (should be?) faster here and below
        [color=blue]
        > else: pop.append(2)
        > if gold>N/2: standard=1
        > else: standard=2 # if tie, silver wins
        > # print "Initial Population of Gold Users: ", gold
        > # end function init_all
        >
        > def one_choose():[/color]

        make pop, gold, standard inputs and shifts a local
        same comment about lifting attribute lookups out of loop with
        rint,rdom = randon.randint, random,random
        [color=blue]
        > global pop, standard, gold, shifts
        > for i in range(its):
        > temp = random.randint( 0,N-1)
        > tempval = pop[temp]
        > old_stand = standard
        > if random.random() <ep:
        > if random.random() <0.5:
        > pop[temp]=1
        > if tempval!=pop[temp]:
        > gold=gold+1
        > if gold>N/2: standard=1
        > else:
        > pop[temp]=2
        > if tempval!=pop[temp]:
        > gold=gold-1
        > if gold<N/2: standard=2
        > if standard!=old_s tand: shifts=shifts+1 # check for regime
        > shift after each agent chooses anew
        > else:
        > if gold>N/2:
        > if pop[temp]!=1:
        > pop[temp]=1
        > gold=gold+1
        > if gold>N/2: standard=1
        > else:
        > if pop[temp]!=2:
        > pop[temp]=2
        > gold=gold-1
        > if gold<N/2: standard=2
        > if standard!=old_s tand: shifts=shifts+1 # check for regime
        > shift after each agent chooses anew
        > # print "Final Population of Gold Users: ", gold
        > # end function one_choose
        >
        > # start main loop
        >
        >
        > for i in range(1000):
        > init_all()
        > one_choose()
        > print "Number of regime shifts: ", shifts
        > --
        > http://mail.python.org/mailman/listinfo/python-list
        >[/color]




        Comment

        • Brian Gough

          #5
          Re: benchmarks and questions for new python programmer

          demarchi@duke.e du (stormslayer) writes:
          [color=blue]
          > Note that I didn't fiddle overmuch with any of the programs -- each
          > one took about 15 minutes to bang out on the keyboard. I'm hoping
          > that there is some obvious mistake in the way I used something in
          > python to account for the speed differential. It seems like a really,
          > really nice language, but orders of magnitude slower than C/C++ and a
          > 5x slowdown over Perl seems abusive...[/color]

          In general Perl and Python should have roughly the same performance.

          There is a profiler included with the standard Python distribution.
          You can use it on any script from the command-line,

          $ python /usr/lib/python<version>/profile.py yourscript.py

          with the appropriate path for <version> on your system. It should
          show where your program is spending all its time. See the Python
          Library Reference Manual for details of the profiler.

          --
          Brian Gough

          Network Theory Ltd,
          Publishing the Python Manuals --- http://www.network-theory.co.uk/

          Comment

          • stormslayer

            #6
            Re: benchmarks and questions for new python programmer

            Thanks for all the help. Mike is correct -- the integer random number
            gen from the math library isn't all that good and is causing the
            slowdown. And thanks to Terry for math lib suggestions. You folks
            are great.

            I got a bunch of emails saying that one shouldn't benchmark languages
            this way, or do so without knowing the language really well, or that
            my code wasn't pythonish enough. You folks need to wake up and smell
            the compilers (or interpreters). The whole point of a tool such as a
            compiler or interpreter is to take reasonable code (and in this case,
            my algorithm is certainly that -- the fault wasn't in my code but in a
            python function) and generate reasonable program speeds. Sure,
            experts should be able to fiddle and get something more out of it, but
            the whole attraction of python for me is that it looks like pseudo
            code. If you can't write things the way you think, then it isn't much
            help to use python. It is still the case that unoptimized code in
            borland's C++ is 5x faster than python (and that's for a windows
            program w/ dialog boxes, etc. rather than a console app), and this is
            distressing, but I'll keep trying w/ python. Python is beautiful...

            Thanks again for all the help.
            sd

            Comment

            • Terry Reedy

              #7
              Re: benchmarks and questions for new python programmer


              "stormslaye r" <demarchi@duke. edu> wrote in message
              news:c5f75ecc.0 405170409.59f8f 54@posting.goog le.com...[color=blue]
              > Thanks for all the help. Mike is correct -- the integer random number
              > gen from the math library isn't all that good and is causing the
              > slowdown. And thanks to Terry for math lib suggestions. You folks
              > are great.[/color]

              Thanks, your're welcome.
              [color=blue]
              > I got a bunch of emails[/color]

              Ignore them. Public posts should usually be reponded to publically -- or
              not at all. Dropping negative comments in your mailbox is nasty. Helpful
              comments should be public for the benefit of everyone. I'm glad Michael
              responed publicly and reminded me that some problems, like this one, have a
              better list initializer than the standard default None (so much for
              writing at 2 AM).
              [color=blue]
              > saying that one shouldn't benchmark languages
              > this way, or do so without knowing the language really well, or that
              > my code wasn't pythonish enough.[/color]

              There are occasional trolls who do a dippy 'benchmark', find Python
              lacking, and then post something to the effect that Python is 'bad'. You
              were benchmarking your ability to use three languages. When you found you
              ability with Python inadequate, you asked for help and got at least three
              good responses.
              [color=blue]
              > You folks need to wake up and smell the compilers (or interpreters).[/color]

              I presume 'you folks' refers to the private emailers ;-)
              [color=blue]
              > The whole point of a tool such as a
              > compiler or interpreter is to take reasonable code (and in this case,
              > my algorithm is certainly that -- the fault wasn't in my code but in a
              > python function) and generate reasonable program speeds.[/color]

              You are leaving out programmer speed, which is a large part of overall
              problem solution cost. Python is used by people who value their own time
              *and* who find programming in Python to be faster than in other languages,
              once a reasonable proficiency is obtained. Some whizzes in other languages
              never find the latter to be true, even after a fair trial, and so they
              properly stick with their other languages.

              The quality of implementation of various functions in various modules
              varies. Random.randint( ) is known to be slowish. I presume it could be
              made faster, but that is apparently not an itch of any of the current
              contributors. If you look at the code and see how, I presume a patch would
              be welcome, but you might ask first on the development list about
              undocumented rationales for the current implementation.
              [color=blue]
              > Sure, experts should be able to fiddle and get something more out of it,[/color]

              Here I think you are over-reacting to hitting a bad spot in the road, and
              perhaps forgetting what newbie C++ code can look like, and how much you
              have internalized expert fiddling in the languages you know better.
              [color=blue]
              > but the whole attraction of python for me is that it looks like pseudo[/color]
              code.

              Ditto, as I wrote at least 7 years ago.
              [color=blue]
              > If you can't write things the way you think, then it isn't much
              > help to use python.[/color]

              People who use Python do so at least in part because they find it better
              fits how they think than, for instance, C++. In any case, get it right,
              then make it faster if not fast enough. For some people, Python makes 'get
              it right' easier and faster.
              [color=blue]
              > It is still the case that unoptimized code in
              > borland's C++ is 5x faster than python (and that's for a windows
              > program w/ dialog boxes, etc. rather than a console app), and this is
              > distressing,[/color]

              Remembering when CPU time cost 50-100 times as much as programmer time, I
              once did too. If you focus on total clock time from start to finish, and
              on long term maintainability , you might find it less distressing ;-) If
              you use use Numerical Python, which wraps expert-written Linpack and FFT
              routines, among others, or Psyco (if you can -- it is CPU specific still),
              you migh find 5x less true.
              [color=blue]
              > but I'll keep trying w/ python. Python is beautiful...[/color]

              Be careful, or it might grow on you.

              Terry J. Reedy




              Comment

              • beliavsky@aol.com

                #8
                Re: benchmarks and questions for new python programmer

                demarchi@duke.e du (stormslayer) wrote in message news:<c5f75ecc. 0405170409.59f8 f54@posting.goo gle.com>...

                I found have similar results to yours when comparing the speed of
                Python to Fortran 95 for numerical work -- see for example the
                "Numeric Speed" messge I posted here on 4/30/2004. (Replying to J.
                Carlson, I used 'timethis.exe foo.exe' and 'timethis.exe python
                foo.py', to measure times, where foo.exe and foo.py are a Fortran
                executable and a Python script, and timethis.exe is a timing command
                on Windows).

                In addition to being faster in general for numeric work than Python,
                an advantage of Fortran 95 is that an optimizing compiler such as
                Intel Fortran gives you more freedom in how to write your code without
                sacrificing performance. Some examples of this are that

                (1) An explicit loop to sum the elements of an array in Fortran 95
                takes the same time as the intrinsic function SUM. In Python the
                explicit loop is much slower than the sum in Numeric.

                (2) A Fortran compiler will reduce an expression like x**1 to x, but
                Python actually computes x**1. If your code computes x**ip and ip
                takes on the value of 1, this could make a difference.

                (3) Simple function calls incur little overhead in Fortran, so that
                sqrt(x) is faster than real exponentiation, but in Python the reverse
                can be true. Sqrt(x) OUGHT to be substantially faster.

                (4) Computations done in the main program of a Python script can be
                significantly slower than those done in a function, but I have not
                heard of this making a difference for compiled languages.

                In some respects Python seems like a LOWER-LEVEL language to me than a
                compiled language like Fortran 95 or C++, if you care about
                performance. Good optimizing compilers take what you write and
                reorganize it to run fast, but the Python interpreter is more
                literal-minded, and you need to think more about the most efficient
                way to code something and know more about how the Python interpreter
                works.

                Comment

                Working...