Python is slow?

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • sturlamolden

    Python is slow?

    I have recently been playing with a kd-tree for solving the "post
    office problem" in a 12-dimensional space. This is pure cpu bound
    number crunching, a task for which I suspected Python to be
    inefficient.

    My prototype in Python 2.5 using NumPy required 0.41 seconds to
    construct the tree from 50,000 samples. Unfortunately, searching it
    felt a bit slow, finding the 11 nearest-neighbours of 1,000 points
    took 29.6 seconds (and there were still 49,000 to go). Naturally, I
    blamed this on Python. It would be 100 times faster if I used C++,
    right?

    After having a working Python prototype, I resorted to rewrite the
    program in C++. The Python prototype took an hour to make, debug and
    verify. The same thing in C++ took me almost a day to complete, even
    with a working prototype as model. To my surprise, the resulting beast
    of C++ required 64.3 seconds to construct the same kd-tree. Searching
    the tree was not faster either, 1,000 points required 38.8 seconds. I
    wasted a day, only to find my Python prototype being the faster.

    We may conclude that I'm bad at programming C++, but I suspect that is
    not the case here. Albeit micro-benchmarks may indicate that Python is
    100-200 times slower than C++, they may not be applicable to the real
    world. Python can be very efficient. And when combined with libraries
    like NumPy, beating it's performance with hand-crafted C++ is
    difficult. At least, my 10 years experience programming scientific
    software in various languages was not sufficient to beat my own Python
    prototype with C++.

    That is not to say I have never seen C++ run a lot faster than Python.
    But it tends to be very short pieces of CPU bound code, no more than a
    function or two. But as the problem grows in complexity, C++
    accumulates too much of its own bloat.


  • Robert Singer

    #2
    Re: Python is slow?

    On Tue, 23 Sep 2008 06:23:12 -0700 (PDT), sturlamolden
    <sturlamolden@y ahoo.nowrote:
    >I have recently been playing with a kd-tree for solving the "post
    >office problem" in a 12-dimensional space. This is pure cpu bound
    >number crunching, a task for which I suspected Python to be
    >inefficient.
    Well, python is not a number crunching language. However much we would
    like it to be (we would ? :-). No scripting language is.
    Developing time is shorter, I agree, but when you have, for example a
    problem which takes 8,31 minutes to go through in optimized fortran
    code (like the one I had the other day), then that hardly matters.
    >
    >My prototype in Python 2.5 using NumPy required 0.41 seconds to
    >construct the tree from 50,000 samples. Unfortunately, searching it
    >felt a bit slow, finding the 11 nearest-neighbours of 1,000 points
    >took 29.6 seconds (and there were still 49,000 to go). Naturally, I
    >blamed this on Python. It would be 100 times faster if I used C++,
    >right?

    Not necessarily.
    Before resorting to rewriting the problem try psyco. It speeds up
    things sometimes.
    Also, (I'm not that familiar with python yet, so I don't know how to
    do it in python), try finding the bottlenecks of your calculation. Are
    the loops where most of the processing time is wasted, or disk
    accessing, or ... ?
    >
    >After having a working Python prototype, I resorted to rewrite the
    >program in C++. The Python prototype took an hour to make, debug and
    >verify. The same thing in C++ took me almost a day to complete, even
    >with a working prototype as model. To my surprise, the resulting beast
    >of C++ required 64.3 seconds to construct the same kd-tree. Searching
    >the tree was not faster either, 1,000 points required 38.8 seconds. I
    >wasted a day, only to find my Python prototype being the faster.

    >
    >We may conclude that I'm bad at programming C++, but I suspect that is
    >not the case here. Albeit micro-benchmarks may indicate that Python is
    >100-200 times slower than C++, they may not be applicable to the real
    >world. Python can be very efficient. And when combined with libraries
    >like NumPy, beating it's performance with hand-crafted C++ is
    >difficult. At least, my 10 years experience programming scientific
    >software in various languages was not sufficient to beat my own Python
    >prototype with C++.
    >
    >That is not to say I have never seen C++ run a lot faster than Python.
    >But it tends to be very short pieces of CPU bound code, no more than a
    >function or two. But as the problem grows in complexity, C++
    >accumulates too much of its own bloat.
    >
    Well, personally, I try to combine fortran (being a fortran programmer
    by trade) with python (in the last few years), as I find fortran to
    be, by two grades, more comfortable for solving scientific problems
    then c (or python for that matter, although it has its merits).
    Starting from ith his capabilities for "normal" array handling, to
    optimisation and easy readability, to whatnot.


    Best regards
    Bob

    Comment

    • Grant Edwards

      #3
      Re: Python is slow?

      On 2008-09-23, sturlamolden <sturlamolden@y ahoo.nowrote:

      [...]
      After having a working Python prototype, I resorted to rewrite the
      program in C++. The Python prototype took an hour to make, debug and
      verify. The same thing in C++ took me almost a day to complete, even
      with a working prototype as model. To my surprise, the resulting beast
      of C++ required 64.3 seconds to construct the same kd-tree. Searching
      the tree was not faster either, 1,000 points required 38.8 seconds. I
      wasted a day, only to find my Python prototype being the faster.
      >
      We may conclude that I'm bad at programming C++,
      AFAICT, _everybody_ is bad at programming C++.

      One begins to suspect it's not the fault of the programmers.

      --
      Grant Edwards grante Yow! Finally, Zippy
      at drives his 1958 RAMBLER
      visi.com METROPOLITAN into the
      faculty dining room.

      Comment

      • George Sakkis

        #4
        Re: Python is slow?

        On Sep 23, 9:57 am, Grant Edwards <gra...@visi.co mwrote:
        On 2008-09-23, sturlamolden <sturlamol...@y ahoo.nowrote:
        >
        [...]
        >
        After having a working Python prototype, I resorted to rewrite the
        program in C++. The Python prototype took an hour to make, debug and
        verify. The same thing in C++ took me almost a day to complete, even
        with a working prototype as model. To my surprise, the resulting beast
        of C++ required 64.3 seconds to construct the same kd-tree. Searching
        the tree was not faster either, 1,000 points required 38.8 seconds. I
        wasted a day, only to find my Python prototype being the faster.
        >
        We may conclude that I'm bad at programming C++,
        >
        AFAICT, _everybody_ is bad at programming C++.
        +1 QOTW

        Comment

        • skip@pobox.com

          #5
          Re: Python is slow?

          >We may conclude that I'm bad at programming C++,
          GrantAFAICT, _everybody_ is bad at programming C++.

          GrantOne begins to suspect it's not the fault of the programmers.

          +1 QOTW...

          Skip

          Comment

          • bearophileHUGS@lycos.com

            #6
            Re: Python is slow?

            sturlamolden:

            CPython is generally slow (you can see this from the huge amount of
            solutions invented to solve the speed problem, like Cython, Numpy,
            Psyco, ShedSkin, Weave, Inline, SIP, Boost Python, SWIG, etc etc), but
            for most of the usages Python is used for, it's not a significant
            problem. I know that sounds like a tautology :-)

            Well written C++ code is generally faster or much faster than similar
            Python code, but programming in Python is often simpler, and it
            generally requires less time. So it may happen that to solve a problem
            a Python program that runs in 1 hour that requires 1 hour to be
            written allows you to find the solution in less time than a C++
            program that runs in 5-10 minutes that requires you 3-4 hours to be
            written :-)

            Note that C++ is just one option, but there are other languages
            around, like CLisp or D (or even a modern JavaVM), that are often an
            acceptable compromise between Python and C/C++.

            So you can show us a reduced/minimal working version of your Python
            code, so I/someone may find ways to speed it up, translate it to C or C
            ++ or CLisp or D, etc.

            Note that I have written a kd-tree in both Python (with Psyco
            compulsively) and D, this is the Psyco version:


            Bye,
            bearophile

            Comment

            • sturlamolden

              #7
              Re: Python is slow?

              On Sep 23, 3:44 pm, Robert Singer <rsinger@____.c omwrote:
              Well, python is not a number crunching language. However much we would
              like it to be (we would ? :-).
              No scripting language is.
              Not even Matlab, R, IDL, Octave, SciLab, S-PLUS or Mathematica?

              Before resorting to rewriting the problem try psyco. It speeds up
              things sometimes.
              I did, Psyco did not help.

              Also, (I'm not that familiar with python yet, so I don't know how to
              do it in python), try finding the bottlenecks of your calculation.
              I did use a profiler, there is no particular single bottle-neck.

              Well, personally, I try to combine fortran (being a fortran programmer
              by trade) with python
              Good compilers are too expensive, and gfortran is not good enough yet.



              Comment

              • namekuseijin

                #8
                Re: Python is slow?

                On Sep 23, 10:57 am, Grant Edwards <gra...@visi.co mwrote:
                AFAICT, _everybody_ is bad at programming C++.
                Thankfully, at least Numpy developers are not bad at C programming.

                Comment

                • bearophileHUGS@lycos.com

                  #9
                  Re: Python is slow?

                  sturlamolden:
                  >F# and OCaml look promising though.<
                  I bet on the future of D and Haskell (and maybe Fortress) instead :-)
                  We'll see.

                  >Sure I could show you the code, Python and C++, if I had a place to post it.<
                  I think the Python version suffices. If it's not too much private you
                  may post the single minimal/reduced runnable Python module here, it
                  will be deleted in some time (if you want you can also use a private
                  paste):


                  >It looks very different form yours though.<
                  Is this a good or bad thing? ;-)

                  Bye,
                  bearophile

                  Comment

                  • sturlamolden

                    #10
                    Re: Python is slow?

                    On Sep 23, 8:52 pm, bearophileH...@ lycos.com wrote:
                    I think the Python version suffices. If it's not too much private you
                    may post the single minimal/reduced runnable Python module here, it
                    will be deleted in some time (if you want you can also use a private
                    paste):http://codepad.org/


                    Available 24 hours from now.

                    Is this a good or bad thing? ;-)
                    It's just facinating how different people working on similar problems
                    come up with different looking code.









                    Comment

                    • Robert Kern

                      #11
                      Re: Python is slow?

                      bearophileHUGS@ lycos.com wrote:
                      sturlamolden:
                      >Sure I could show you the code, Python and C++, if I had a place to post it.<
                      >
                      I think the Python version suffices. If it's not too much private you
                      may post the single minimal/reduced runnable Python module here, it
                      will be deleted in some time (if you want you can also use a private
                      paste):
                      http://codepad.org/
                      You could also drop it on the scipy.org wiki in the Cookbook category.

                      --
                      Robert Kern

                      "I have come to believe that the whole world is an enigma, a harmless enigma
                      that is made terrible by our own mad attempt to interpret it as though it had
                      an underlying truth."
                      -- Umberto Eco

                      Comment

                      • J Peyret

                        #12
                        Re: Python is slow?

                        On Sep 23, 8:31 am, bearophileH...@ lycos.com wrote:

                        Guys, this looks like a great data structure/algo for something I am
                        working on.

                        But... where do I find some definitions of the original BK-tree idea?
                        I looked through Amazon
                        and only a few books mention something like BK-Tree and these are
                        mostly conference minutes books, at ungodly prices.

                        I also did a quick Google on it and there isn't that much about the
                        subject.



                        is the one I mostly saw referred.

                        So... 2 questions:

                        1. More bk-tree references? I can follow the code, but some
                        understanding of the background would be nice.

                        2. What, if any, is a good book to understand the basic of fuzzy/
                        string matching? Proximity/affinity problems? Or, more generally, a
                        good book on advanced algorithms?

                        No, I don't wanna read Knuth's just yet, something more modern/easy to
                        follow maybe? Something like 'Programming Collective Intelligence',
                        ISBN 0596529325, would be very nice, though it is perhaps a bit too
                        specific in its applications. Books using Java or C are fine. Lisp,
                        hmmm, well... I have trouble reading its notation, sorry.

                        Cheers

                        JLuc

                        Comment

                        • sturlamolden

                          #13
                          Re: Python is slow?

                          On Sep 23, 9:17 pm, Robert Kern <robert.k...@gm ail.comwrote:
                          You could also drop it on the scipy.org wiki in the Cookbook category.
                          Yes, if I could figure out how to use it...

                          Comment

                          • Robert Kern

                            #14
                            Re: Python is slow?

                            J Peyret wrote:
                            On Sep 23, 8:31 am, bearophileH...@ lycos.com wrote:
                            >
                            Guys, this looks like a great data structure/algo for something I am
                            working on.
                            >
                            But... where do I find some definitions of the original BK-tree idea?
                            Uh, actually we're talking about kd-trees, not BK-trees. kd-trees are for
                            searching through point sets in a k-dimensional space.

                            --
                            Robert Kern

                            "I have come to believe that the whole world is an enigma, a harmless enigma
                            that is made terrible by our own mad attempt to interpret it as though it had
                            an underlying truth."
                            -- Umberto Eco

                            Comment

                            • Robert Singer

                              #15
                              Re: Python is slow?

                              On Tue, 23 Sep 2008 11:07:22 -0700 (PDT), sturlamolden
                              <sturlamolden@y ahoo.nowrote:
                              >On Sep 23, 3:44 pm, Robert Singer <rsinger@____.c omwrote:
                              >
                              >Well, python is not a number crunching language. However much we would
                              >like it to be (we would ? :-).
                              >
                              >No scripting language is.
                              >
                              >Not even Matlab, R, IDL, Octave, SciLab, S-PLUS or Mathematica?
                              >
                              No. And just to avoid eventual useless discussions which might arise,
                              I ment to say that *in general* compiled languages are faster. We can
                              always have discussions whether or not some newer scripting languages
                              like some from the above list, come close, but that usually is just
                              wasted time.

                              Specifically, I cannot say about R, IDL or S-PLUS, since I never used
                              those (not even heard of IDL till now). Octave and Mathematica have
                              been with me for such a short time (we had a few licences for
                              Wolfram's child for one year, but not my part of the company, so ...)
                              that I would rather not give my opinion about those.
                              I've used Matlab and Scilab for a longer time (still do actually -
                              Matlab for measurement data acquisition, and Scilab ... well, it just
                              sits on the disk somewhere actually), and although Matlab is quite
                              fast when disk I/O is involved, it still comes far.

                              >Also, (I'm not that familiar with python yet, so I don't know how to
                              >do it in python), try finding the bottlenecks of your calculation.
                              >
                              >I did use a profiler, there is no particular single bottle-neck.
                              You're talking about your c or your python version of the program?

                              There is always a bottleneck - that's just the part which works most
                              slowly. Try to find the part which takes the longest to execute, try
                              to put it differently. If it cannot be done, go to the next slowest
                              part.

                              >Good compilers are too expensive, and gfortran is not good enough yet.
                              >
                              ?
                              Gfortran is one of the better compilers on the market. There was, just
                              the other day, a nice discussion on comp.lang.fortr an how it is
                              marvellous what a group of enthousiasts managed do in their time, what
                              commercial giants still didn't.
                              May I ask what are your main objections to it ?

                              Best regards
                              Bob

                              Comment

                              Working...