Python and STL efficiency

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • skip@pobox.com

    #31
    Re: Python and STL efficiency


    TimBut beware! For Python2.5 I had to change the code slightly,
    Timbecause it already realized that the expression

    Tim'%s' % 'something'

    Timwill be a constant expression, and evaluates it once only... so I
    Timhad to replace '%s' with a variable, and I got the timings above
    Timwhich show Python2.5 to be slightly faster than Python2.4.

    Shouldn't you then get rid of any compiler optimizations your C++ compiler
    does? Why penalize 2.5 because it recognizes a useful optimization?

    Tim(Next step would be to create a VB version and a Java version of
    Timthe same program, oh and perhaps to try a version that would work
    Timwith Jython... perhaps somehow w/o the 'set')

    I don't recall the example exactly, but couldn't you just create a set class
    that uses a dict under the covers and only implement the methods you need
    for the test?

    Skip

    Comment

    • Maric Michaud

      #32
      Re: Python and STL efficiency

      Le mardi 22 août 2006 12:55, Mc Osten a écrit :
      In fact Python here is faster. Suppose it has a really optimized set
      class...
      Maybe I'm missing something but the posted c++codes are not equivalent IMO to
      what python is doing. I discarded the "slow" version, and tried to get the
      equivalent in c++ of :

      """
      #!/usr/bin/env python

      size = 1000000

      def f():
      a = []
      for i in range(size):
      a.append('What do you know')
      a.append('so long...')
      a.append('chick en crosses road')
      a.append('fool' )
      b = set(a)
      for s in b:
      print s

      import time
      from time import clock

      f_start = clock()
      f()
      f_end = clock()

      print "Elapsed: %f seconds" % (f_end - f_start)
      """

      I came at first with the following, which is still slower than the python
      version :

      """
      void print_occurence _of_unique_stri ngs(){
      vector<stringa;
      const string& s1 = "What do you know?" ;
      const string& s2 = "so long..." ;
      const string& s3 = "chicken crosses road";
      const string& s4 = "fool" ;
      for (long int i=0; i<SIZE ; ++i){
      a.push_back(s1) ;
      a.push_back(s2) ;
      a.push_back(s3) ;
      a.push_back(s4) ;
      }
      set<stringb(a.b egin(), a.end());
      copy(b.begin(), b.end(),
      ostream_iterato r<string>(cout , "\n"));
      }
      """

      Here, all strings, while passed by reference to the vector, are copied one by
      one.
      Then, I tried this, it just overcome the performance of python code, but not
      in the proportion I expected :

      """
      void print_occurence _of_unique_stri ngs_compare_by_ adresses(){
      vector<string*a ;
      string s1 = "What do you know?";
      string s2 = "so long...";
      string s3 = "chicken crosses road";
      string s4 = "fool";
      for (long int i=0; i<SIZE ; ++i){
      a.push_back(&s1 );
      a.push_back(&s2 );
      a.push_back(&s3 );
      a.push_back(&s4 );
      }
      set<stringb;
      for (vector<string* >::iterator it=a.begin(); it!=a.end(); it++)
      b.insert(**it);
      copy(b.begin(), b.end(), ostream_iterato r<string>(cout , "\n"));
      }
      """

      The problem here, is that the strings in the set are compared by value, which
      is not optimal, and I guess python compare them by adress ("s*n is s*n" has
      the same complexity than "s*n == s*n" in CPython, right ?).

      so, finally, the code in c++, about ten times faster than the equivalent in
      python, must be :

      """
      void print_occurence _of_unique_stri ngs_compared_by _address(){
      cout << "print_occurenc e_of_unique_str ings_compared_b y_address" << endl;
      vector<string*a ;
      string s1 = "What do you know?";
      string s2 = "so long...";
      string s3 = "chicken crosses road";
      string s4 = "fool";
      for (long int i=0; i<SIZE ; ++i){
      a.push_back(&s1 );
      a.push_back(&s2 );
      a.push_back(&s3 );
      a.push_back(&s4 );
      }
      set<string*b(a. begin(), a.end());
      set<stringc; // well ordered set (b is ordered by address)
      for (set<string*>:: iterator it=b.begin(); it!=b.end(); it++)
      c.insert(**it);
      copy(c.begin(), c.end(), ostream_iterato r<string>(cout , "\n"));
      }
      """

      the result on my box is :

      maric@redflag2 mar aoû 22 22:24:23:~$ g++ -O3 -o testcpp testcpp.cpp
      maric@redflag2 mar aoû 22 22:24:29:~$ ./testcpp
      print_occurence _of_strings
      What do you know?
      chicken crosses road
      fool
      so long...
      print_occurence _of_unique_stri ngs
      What do you know?
      chicken crosses road
      fool
      so long...
      print_occurence _of_unique_stri ngs_compared_by _address
      What do you know?
      chicken crosses road
      fool
      so long...
      strings : 1.89
      unique strings : 0.87
      compared by address : 0.18
      maric@redflag2 mar aoû 22 22:24:38:~$ python2.4 testpython.py
      so long...
      What do you know
      fool
      chicken crosses road
      Elapsed: 1.680000 seconds
      maric@redflag2 mar aoû 22 22:24:51:~$ g++ -v
      Using built-in specs.
      Target: i486-linux-gnu
      Configured
      with: ../src/configure -v --enable-languages=c,c++ ,java,fortran,o bjc,obj-c++,ada,treelan g --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --program-suffix=-4.1 --enable-__cxa_atexit --enable-clocale=gnu--enable-libstdcxx-debug --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-4.1-1.4.2.0/jre --enable-mpfr --with-tune=i686 --enable-checking=releas e
      i486-linux-gnu
      Thread model: posix
      gcc version 4.1.2 20060613 (prerelease) (Debian 4.1.1-5)

      I've joined the full c++ file as an attachment.

      --
      _____________

      Maric Michaud
      _____________

      Aristote - www.aristote.info
      3 place des tapis
      69004 Lyon
      Tel: +33 426 880 097

      Comment

      • Tim N. van der Leeuw

        #33
        Re: Python and STL efficiency


        skip@pobox.com wrote:
        TimBut beware! For Python2.5 I had to change the code slightly,
        Timbecause it already realized that the expression
        >
        Tim'%s' % 'something'
        >
        Timwill be a constant expression, and evaluates it once only... so I
        Timhad to replace '%s' with a variable, and I got the timings above
        Timwhich show Python2.5 to be slightly faster than Python2.4.
        >
        Shouldn't you then get rid of any compiler optimizations your C++ compiler
        does? Why penalize 2.5 because it recognizes a useful optimization?
        >
        The point is that I was trying to create 400.000 string instances. The
        extra optimization in 2.5 required an extra trick for that.
        The idea is to compare a C++ version which creates 400.000 string
        instances, with a Python version which creates 400.000 string
        instances; then reduce those 400.000 instances to a set of only 4
        unique strings.
        (So I cannot just create a list with strings generated from numbers 1 -
        400.000, and I didn't want to change the original code too much, so I
        just added a trick to make Python allocate a new string each time
        round.)

        I agree that Python2.5 recognized a useful optimization, and didn't
        wish to penalize it for that, however the optimalization was defeating
        the purpose of my code in the first place!

        Cheers,

        --Tim

        Comment

        • Fredrik Lundh

          #34
          Re: Python and STL efficiency

          Maric Michaud wrote:
          The problem here, is that the strings in the set are compared by value, which
          is not optimal, and I guess python compare them by adress ("s*n is s*n" has
          the same complexity than "s*n == s*n" in CPython, right ?).
          wrong.
          timeit -s"s='x'; n=1000" "s*n is n*s"
          1000000 loops, best of 3: 1.9 usec per loop
          timeit -s"s='x'; n=1000" "s*n == n*s"
          100000 loops, best of 3: 4.5 usec per loop

          </F>

          Comment

          • Tim N. van der Leeuw

            #35
            Re: Python and STL efficiency


            Maric Michaud wrote:
            Le mardi 22 août 2006 12:55, Mc Osten a écrit :
            In fact Python here is faster. Suppose it has a really optimized set
            class...
            >
            Maybe I'm missing something but the posted c++codes are not equivalent IMO to
            what python is doing. I discarded the "slow" version, and tried to get the
            equivalent in c++ of :
            >
            Your C++ version got me the following timings (using gcc 3.4.5 as the
            compiler, MinGW version, with -O6):

            LeeuwT@nlshl-leeuwt ~/My Documents/Python
            $ ./testcpp.exe
            print_occurence _of_strings
            What do you know?
            chicken crosses road
            fool
            so long...
            print_occurence _of_unique_stri ngs
            What do you know?
            chicken crosses road
            fool
            so long...
            print_occurence _of_unique_stri ngs_compared_by _address
            What do you know?
            chicken crosses road
            fool
            so long...
            strings : 2.135
            unique strings : 1.103
            compared by address : 0.21


            For reference, Python's best time was 0.39 seconds on the same computer
            (in the 'fast' version, using only 4 unique string instances).

            Hmmm... Can we conclude now that carefully crafted C++ code is about
            twice as fast as casually and intuitively written Python code? ;) (Just
            kidding here of course)

            NB: Your code now tests for address-equality. Does it also still test
            for string-equality? It looks to me that it does, but it's not quite
            clear to me.

            Cheers,

            --Tim

            Comment

            • Mc Osten

              #36
              Re: Python and STL efficiency

              Tim N. van der Leeuw <tim.leeuwvande r@nl.unisys.com wrote:
              NB: Your code now tests for address-equality. Does it also still test
              for string-equality? It looks to me that it does, but it's not quite
              clear to me.
              It does it.

              set<string*b(a. begin(), a.end());
              set<stringc; // well ordered set (b is ordered by address)
              for (set<string*>:: iterator it=b.begin(); it!=b.end(); it++)
              c.insert(**it);
              copy(c.begin(), c.end(), ostream_iterato r<string>(cout , "\n"));

              When we populate the first set, we get rid of all strings with same
              object id/address (it test equality of pointers). Then we populate
              another set (and the default equality test is on strings).

              However, I would have written the code using a proper compare function
              rather than using two sets. In this particular case the number of
              elements of the first set is negligible in respect of the initial vector
              size, thus copying it again does not take a lot of time.
              But such code is optimized for the problem itself: in the real world I
              suppose we would have passed set a proper comparison function that
              checks address and then string equality.


              --
              blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
              site: http://www.akropolix.net/rik0/ | tenetevi riso e
              forum: http://www.akropolix.net/forum/ | bacchette per voi.

              Comment

              • Mc Osten

                #37
                Re: Python and STL efficiency

                Tim N. van der Leeuw <tim.leeuwvande r@nl.unisys.com wrote:
                My conclusion from that is, that the vector<or set<implementat ions
                of GCC are far superior to those of VC++ 6, but that memory allocation
                for GCC 3.4.5 (MinGW version) is far worse than that of MSCRT / VC++ 6.
                (And Python still smokes them both).
                It would be interesting to test it with VC 8 (2005). I have it in my
                Parallels vm, but it looks like something is wrong. The very same code
                takes almost a minute, I suppose there is something wrong with it
                (Python is almost as fast as the python 2.4 on MacOS).



                --
                blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
                site: http://www.akropolix.net/rik0/ | tenetevi riso e
                forum: http://www.akropolix.net/forum/ | bacchette per voi.

                Comment

                • Mc Osten

                  #38
                  Re: Python and STL efficiency

                  Tim N. van der Leeuw <tim.leeuwvande r@nl.unisys.com wrote:
                  And the results of IronPython (1.0rc2) are just in as well:
                  I can't test this one.
                  >
                  And for Python 2.5:
                  LeeuwT@nlshl-leeuwt ~/My Documents/Python
                  $ /cygdrive/c/Python25/python.exe SpeedTest.py
                  Begin Test
                  Number of unique string objects: 4
                  so long...
                  What do you know
                  fool
                  chicken crosses road
                  Number of unique string objects: 400000
                  so long...
                  What do you know
                  fool
                  chicken crosses road
                  Fast - Elapsed: 0.440619 seconds
                  Slow - Elapsed: 1.095341 seconds

                  What the heck... you have a Cray, haven't you?
                  $ /opt/misc/bin/python2.5 -O set_impl.py
                  so long...
                  What do you know
                  fool
                  chicken crosses road
                  so long...
                  What do you know
                  fool
                  chicken crosses road
                  Elapsed: 1.300000 seconds
                  Elapsed: 1.290000 seconds

                  Yes... good optimizer work. The 'slow' code here is faster than the fast
                  one.


                  $ python -O set_impl.py
                  so long...
                  What do you know
                  fool
                  chicken crosses road
                  so long...
                  What do you know
                  fool
                  chicken crosses road
                  Elapsed: 1.360000 seconds
                  Elapsed: 3.800000 seconds
                  (Next step would be to create a VB version and a Java version of the
                  same program, oh and perhaps to try a version that would work with
                  Jython... perhaps somehow w/o the 'set')
                  Ok. I can do the Java version. If I find a RealBasic Set class I can do
                  it. However, I don't remember anything about VB6, and have done nothing
                  with .Net.
                  But I don't think it is that interesting. Java strings are immutable
                  too: I expect it to outperform Python (unless Java Set class sucks). And
                  I don't see the point of taking in VB.
                  A good BASIC implentation is comparable with Pascal or C++ speedwise.
                  (At least this results from Great Language Shootout and Free Basic).

                  --
                  blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
                  site: http://www.akropolix.net/rik0/ | tenetevi riso e
                  forum: http://www.akropolix.net/forum/ | bacchette per voi.

                  Comment

                  • Ray

                    #39
                    Re: Python and STL efficiency


                    Tim N. van der Leeuw wrote:
                    Incidentally, I also have a version compiled with VC++ 6 now... (not
                    yet w/VC++ 7) .. Compiled with release-flags and maximum optimization
                    for speed, here's the result of VC++ 6:
                    <snip>

                    OK, now I'm getting obsessed with this too ;-)

                    I'm using VC++ Express, I didn't care to tweak the optimizations, I
                    merely chose the "Release" configuration for the executable. It's
                    blazing fast, taking only 30+ ms each run.

                    Here's the code:

                    int main(){
                    DWORD begin = ::GetTickCount( );
                    vector<stringa;
                    string c = "What do you know?";
                    string d = "so long...";
                    string e = "chicken crosses road";
                    string f = "fool";
                    for (long int i=0; i<10000 ; ++i){
                    a.push_back(c);
                    a.push_back(d);
                    a.push_back(e);
                    a.push_back(f);
                    }
                    set<stringb(a.b egin(), a.end());
                    unique_copy(b.b egin(), b.end(), ostream_iterato r<string>(cout ,
                    "\n"));
                    DWORD end = ::GetTickCount( );
                    cout << "Ends in " << (end - begin) << " ms.";
                    }

                    And here's the result:

                    \TestSTL\releas e>TestSTL.exe
                    What do you know?
                    chicken crosses road
                    fool
                    so long...
                    Ends in 31 ms.

                    I tried the original version:

                    int main(){
                    DWORD begin = ::GetTickCount( );
                    vector<stringa;
                    for (long int i=0; i<10000 ; ++i){
                    a.push_back("Wh at do you know?");
                    a.push_back("so long...");
                    a.push_back("ch icken crosses road");
                    a.push_back("fo ol");
                    }
                    set<stringb(a.b egin(), a.end());
                    unique_copy(b.b egin(), b.end(), ostream_iterato r<string>(cout ,
                    "\n"));
                    DWORD end = ::GetTickCount( );
                    cout << "Ends in " << (end - begin) << " ms.";
                    }

                    And the result is only 50% slower:

                    \TestSTL\releas e>TestSTL.exe
                    What do you know?
                    chicken crosses road
                    fool
                    so long...
                    Ends in 47 ms.

                    Comment

                    • could.net@gmail.com

                      #40
                      Re: Python and STL efficiency

                      That's to say,
                      python is still much faster?

                      I am a c++ newbie but I think c++ should be faster here.
                      Maybe someone can post this to the c++ maillist and they will tell how
                      to accelerate it.
                      Tim N. van der Leeuw wrote:
                      Mc Osten wrote:
                      Fredrik Lundh <fredrik@python ware.comwrote:
                      Python's memory allocator is also quite fast, compared to most generic
                      allocators...
                      In fact also in the two "slow" versions Python outperforms C++.
                      I didn't notice it in the first place.

                      --
                      blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
                      site: http://www.akropolix.net/rik0/ | tenetevi riso e
                      forum: http://www.akropolix.net/forum/ | bacchette per voi.
                      >
                      Well, I guess I'm getting really obsessed with this. But anyways. I
                      installed MinGW on my Windows-XP (sp2) laptop. It is g++ version 3.4.5
                      -- ancient, yes, but on windows it's the latest available.
                      >
                      I compiled Mc Osten's C++ program (tweaked the output a little) and ran
                      it; ran his version of the python code too.
                      Oh boy; yes indeed the slow python is faster than the fast C++
                      version... Must be something really awful happening in the STL
                      implementation that comes with GCC 3.4!
                      >
                      Here's the output from my console:
                      >
                      LeeuwT@nlshl-leeuwt ~/My Documents/Python
                      $ g++ -O3 -march=pentium-m -o SpeedTest SpeedTest.cpp
                      >
                      LeeuwT@nlshl-leeuwt ~/My Documents/Python
                      $ ./SpeedTest.py
                      Begin Test
                      Number of unique string objects: 4
                      so long...
                      What do you know
                      fool
                      chicken crosses road
                      Number of unique string objects: 40000
                      so long...
                      What do you know
                      fool
                      chicken crosses road
                      Fast - Elapsed: 0.037574 seconds
                      Slow - Elapsed: 0.081520 seconds
                      >
                      LeeuwT@nlshl-leeuwt ~/My Documents/Python
                      $ ./SpeedTest.exe
                      Begin Test
                      What do you know?
                      chicken crosses road
                      fool
                      so long...
                      What do you know?
                      chicken crosses road
                      fool
                      so long...
                      Fast - Elapsed: 2.089 seconds
                      Slow - Elapsed: 6.303 seconds
                      >
                      LeeuwT@nlshl-leeuwt ~/My Documents/Python
                      >
                      >
                      Cheers,
                      >
                      --Tim

                      Comment

                      • Ray

                        #41
                        Re: Python and STL efficiency

                        could.net@gmail .com wrote:
                        That's to say,
                        python is still much faster?
                        Not really, see my test, in my other post in the same thread. I'm using
                        VC++ Express 2005. If we're comparing with Python 2.5 I think it's just
                        fair that for C++ we're using the latest as well.
                        I am a c++ newbie but I think c++ should be faster here.
                        Same here, although that said Python's implementation of those data
                        structure must already be as optimal as mortals can do it.

                        <snip>

                        Comment

                        • Mc Osten

                          #42
                          Re: Python and STL efficiency

                          Ray <ray_usenet@yah oo.comwrote:
                          I'm using VC++ Express, I didn't care to tweak the optimizations, I
                          merely chose the "Release" configuration for the executable. It's
                          blazing fast, taking only 30+ ms each run.
                          Of course it is faster. We are looping 1000000 times, you just 10000.

                          --
                          blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
                          site: http://www.akropolix.net/rik0/ | tenetevi riso e
                          forum: http://www.akropolix.net/forum/ | bacchette per voi.

                          Comment

                          • Mc Osten

                            #43
                            Re: Python and STL efficiency

                            Ray <ray_usenet@yah oo.comwrote:
                            Not really, see my test, in my other post in the same thread. I'm using
                            VC++ Express 2005. If we're comparing with Python 2.5 I think it's just
                            fair that for C++ we're using the latest as well.
                            In your test, you are looping 10000 times, we looped 1000000.
                            In Python tests with 10000 elements, it was about 10 ms.

                            Moreover, we tried various Python and C++ configurations. Most of the
                            tests are done with Python 2.4, not 2.5.
                            And I used gcc4, that is to say the latest on my platform.
                            Same here, although that said Python's implementation of those data
                            structure must already be as optimal as mortals can do it.
                            I think this is the rationale behind it.

                            --
                            blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
                            site: http://www.akropolix.net/rik0/ | tenetevi riso e
                            forum: http://www.akropolix.net/forum/ | bacchette per voi.

                            Comment

                            • Mc Osten

                              #44
                              Re: Python and STL efficiency

                              <could.net@gmai l.comwrote:
                              That's to say,
                              python is still much faster?
                              Yes it is. But of course you can't sat that "Python is faster than C++".
                              We found that the code to do this, written in the most natural way, is a
                              lot faster in Python. However, if you optimze the code, C++ gets almost
                              as fast.

                              In other benchmarks C++ outperforms Python and is 10 or 100 times
                              faster.

                              Maybe someone can post this to the c++ maillist and they will tell how
                              to accelerate it.
                              There are enough C++ experts here to do it. The point is another.

                              --
                              blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
                              site: http://www.akropolix.net/rik0/ | tenetevi riso e
                              forum: http://www.akropolix.net/forum/ | bacchette per voi.

                              Comment

                              • Ray

                                #45
                                Re: Python and STL efficiency


                                Mc Osten wrote:
                                Ray <ray_usenet@yah oo.comwrote:
                                >
                                I'm using VC++ Express, I didn't care to tweak the optimizations, I
                                merely chose the "Release" configuration for the executable. It's
                                blazing fast, taking only 30+ ms each run.
                                >
                                Of course it is faster. We are looping 1000000 times, you just 10000.
                                Certainly--I was not comparing 1000000 against 10000. Referring to the
                                OP's statement: "However, while the python code gave the result almost
                                instantly, the C++ code took several seconds to run!" 30ms sounds like
                                a definite improvement over several seconds!

                                I'll try to tweak it later at home and report here. I'll try out the
                                1000000 too.

                                Cheers
                                Ray
                                >
                                --
                                blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
                                site: http://www.akropolix.net/rik0/ | tenetevi riso e
                                forum: http://www.akropolix.net/forum/ | bacchette per voi.

                                Comment

                                Working...