Python and STL efficiency

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Tim N. van der Leeuw

    #16
    Re: Python and STL efficiency


    Tim N. van der Leeuw wrote:
    Ray wrote:
    Fredrik Lundh wrote:
    in the Python example, the four strings in your example are shared, so
    you're basically copying 40000 pointers to the list.
    >
    in the C++ example, you're creating 40000 string objects.
    >
    </F>
    In which case, Licheng, you should try using the /GF switch. This will
    tell Microsoft C++ compiler to pool identical string literals together.


    :)
    >
    The code still creates a new string - instance each time it tries to
    append a const char* to the vector<string.. .
    >
    You should instead create the string-objects ahead of time, outside of
    the loop.
    >
    Regards,
    >
    --Tim
    Alternatively, slow down the Python implementation by making Python
    allocate new strings each time round:

    a.append('%s' % 'What do you know')


    .... for each of your string-appends. But even then, the python-code is
    still near-instant.

    Cheers,

    --Tim

    Comment

    • Ray

      #17
      Re: Python and STL efficiency


      Fredrik Lundh wrote:
      Ray wrote:
      >
      in the C++ example, you're creating 40000 string objects.
      In which case, Licheng, you should try using the /GF switch. This will
      tell Microsoft C++ compiler to pool identical string literals together.
      >
      in what way does that change the implementation of C++'s string type ?
      Ah, yes what was I thinking? The fact that it stores std::string
      objects escaped my mind somehow. /GF just pools the string literals.
      Thanks for the correction.
      >
      </F>

      Comment

      • Ray

        #18
        Re: Python and STL efficiency


        Tim N. van der Leeuw wrote:
        In which case, Licheng, you should try using the /GF switch. This will
        tell Microsoft C++ compiler to pool identical string literals together.


        :)
        >
        The code still creates a new string - instance each time it tries to
        append a const char* to the vector<string.. .
        Yeah, you're right... I've been programming Java too long :)
        You should instead create the string-objects ahead of time, outside of
        the loop.
        >
        Regards,
        >
        --Tim

        Comment

        • Tim N. van der Leeuw

          #19
          Re: Python and STL efficiency


          Ray wrote:
          Tim N. van der Leeuw wrote:
          In which case, Licheng, you should try using the /GF switch. This will
          tell Microsoft C++ compiler to pool identical string literals together.
          >
          >
          :)
          The code still creates a new string - instance each time it tries to
          append a const char* to the vector<string.. .
          >
          Yeah, you're right... I've been programming Java too long :)
          >
          Took me a while to see that too! Have been programming too much Java /
          Python as well. Anyways, when changing the Python version so that it
          adds 40.000 unique strings to the list (and proving that there are
          indeed 40.000 unique ids in the list, by making a set of all id()s in
          the list and taking the len() of that set), it still takes at most a
          second. I cannot test the speed of the c++ version on my computer, so
          nothing scientific here.

          I'm curious though, if on the OP's machine the slowed-down Python
          version is still faster than the C++ version.


          Cheers,

          --Tim

          Comment

          • Mc Osten

            #20
            Re: Python and STL efficiency

            Tim N. van der Leeuw <tim.leeuwvande r@nl.unisys.com wrote:
            I'm curious though, if on the OP's machine the slowed-down Python
            version is still faster than the C++ version.
            I tested both on my machine (my other post in the thread)

            --
            blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
            site: http://www.akropolix.net/rik0/ | tenetevi riso e
            forum: http://www.akropolix.net/forum/ | bacchette per voi.

            Comment

            • Mc Osten

              #21
              Re: Python and STL efficiency

              Jeremy Sanders <jeremy+complan gpython@jeremys anders.netwrote :
              It must be the debugging, the compiler or a poor STL implementation. With
              gcc 4 it runs instantly on my computer (using -O2), even with 10x the
              number of values.
              $ gcc --version
              i686-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build
              5363)

              I adapted original poster's code and made a function that did not create
              strings each time. The NoisyString is a class we can use to actually
              track copying.

              In fact Python here is faster. Suppose it has a really optimized set
              class...


              Here some results (I know that the fpoint optimizations are useless...
              it's is my "prebuilt" full optimization macro :) ):




              $ g++ -O3 -pipe -O2 -march=pentium-m -msse3 -fomit-frame-pointer
              -mfpmath=sse -o set_impl set_impl.cpp
              $ ./set_impl
              What do you know?
              chicken crosses road
              fool
              so long...
              What do you know?
              chicken crosses road
              fool
              so long...
              Elapsed 5.8
              Elapsed 1.71

              $ g++ -Os -pipe -O2 -march=pentium-m -msse3 -fomit-frame-pointer
              -mfpmath=sse -o set_impl set_impl.cpp
              $ ./set_impl

              What do you know?
              chicken crosses road
              fool
              so long...
              What do you know?
              chicken crosses road
              fool
              so long...
              Elapsed 5.8
              Elapsed 1.71

              $ g++ -O3 -o set_impl set_impl.cpp
              $ ./set_impl
              What do you know?
              chicken crosses road
              fool
              so long...
              What do you know?
              chicken crosses road
              fool
              so long...
              Elapsed 0.47
              Elapsed 0.18

              $ g++ -o set_impl set_impl.cpp
              $ ./set_impl
              What do you know?
              chicken crosses road
              fool
              so long...
              What do you know?
              chicken crosses road
              fool
              so long...
              Elapsed 0.63
              Elapsed 0.33

              $ python -O set_impl.py
              so long...
              What do you know
              fool
              chicken crosses road
              so long...
              What do you know
              fool
              chicken crosses road
              Elapsed: 1.370000 seconds
              Elapsed: 3.810000 seconds



              ------------------- PYTHON CODE ---------------------------------
              #python

              global size
              size = 1000000

              def f():
              a = []
              for i in range(size):
              a.append('What do you know')
              a.append('so long...')
              a.append('chick en crosses road')
              a.append('fool' )
              b = set(a)
              for s in b:
              print s

              def slow_f():
              a = []
              for i in range(size):
              a.append('%s' % 'What do you know')
              a.append('%s' % 'so long...')
              a.append('%s' % 'chicken crosses road')
              a.append('%s' % 'fool')
              b = set(a)
              for s in b:
              print s

              import time
              from time import clock

              f_start = clock()
              f()
              f_end = clock()

              slow_f_start = clock()
              slow_f()
              slow_f_end = clock()

              print "Elapsed: %f seconds" % (f_end - f_start)
              print "Elapsed: %f seconds" % (slow_f_end - slow_f_start)

              ------------------------------------------------------------------


              ----------------- CPP CODE -------------------------------------
              #include <iostream>
              #include <ostream>
              #include <iterator>
              #include <string>
              #include <vector>
              #include <set>
              #include <algorithm>
              #include <ctime>
              using namespace std;


              #define SIZE 1000000

              class NoisyString : public std::string {
              public:
              NoisyString(con st string& cp)
              : string(cp)
              {
              cout << "Fuck I got copied!" << endl;
              }

              NoisyString(con st char* s ) : string(s) {

              }



              };


              void f(){
              vector<stringa;
              for (long int i=0; i<SIZE ; ++i){
              a.push_back("Wh at do you know?");
              a.push_back("so long...");
              a.push_back("ch icken crosses road");
              a.push_back("fo ol");
              }
              set<stringb(a.b egin(), a.end());
              copy(b.begin(), b.end(), ostream_iterato r<string>(cout , "\n"));
              }

              void fast_f(){
              vector<stringa;
              string s1 = "What do you know?" ;
              string s2 = "so long..." ;
              string s3 = "chicken crosses road";
              string s4 = "fool" ;
              for (long int i=0; i<SIZE ; ++i){
              a.push_back(s1) ;
              a.push_back(s2) ;
              a.push_back(s3) ;
              a.push_back(s4) ;
              }
              set<stringb(a.b egin(), a.end());
              copy(b.begin(), b.end(), ostream_iterato r<string>(cout , "\n"));
              }


              int main(){
              clock_t f_start,
              f_end,
              faster_f_start,
              faster_f_end,
              fast_f_start,
              fast_f_end;

              f_start = clock();
              f();
              f_end = clock();

              fast_f_start = clock();
              fast_f();
              fast_f_end = clock();


              cout << "Elapsed " << (f_end - f_start) / double(CLOCKS_P ER_SEC) <<
              endl;
              cout << "Elapsed " << (fast_f_end - fast_f_start) /
              double(CLOCKS_P ER_SEC) << endl;

              }

              -----------------------------------------------------------------------




              --
              blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
              site: http://www.akropolix.net/rik0/ | tenetevi riso e
              forum: http://www.akropolix.net/forum/ | bacchette per voi.

              Comment

              • Fredrik Lundh

                #22
                Re: Python and STL efficiency

                "Mc Osten" wrote:
                In fact Python here is faster. Suppose it has a really optimized set
                class...
                Python's memory allocator is also quite fast, compared to most generic
                allocators...

                </F>



                Comment

                • Mc Osten

                  #23
                  Re: Python and STL efficiency

                  Fredrik Lundh <fredrik@python ware.comwrote:
                  Python's memory allocator is also quite fast, compared to most generic
                  allocators...
                  In fact also in the two "slow" versions Python outperforms C++.
                  I didn't notice it in the first place.

                  --
                  blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
                  site: http://www.akropolix.net/rik0/ | tenetevi riso e
                  forum: http://www.akropolix.net/forum/ | bacchette per voi.

                  Comment

                  • Tim N. van der Leeuw

                    #24
                    Re: Python and STL efficiency


                    Mc Osten wrote:
                    Fredrik Lundh <fredrik@python ware.comwrote:
                    >
                    Python's memory allocator is also quite fast, compared to most generic
                    allocators...
                    >
                    In fact also in the two "slow" versions Python outperforms C++.
                    I didn't notice it in the first place.
                    >
                    But your C++ program outputs times in seconds, right? So all
                    compilations except for the first two give results in less than a
                    second, right? (meaning the optimizations of your standard-compilation
                    give worst results than -O3?)

                    BTW, I don't quite understand your gcc optimizations for the first 2
                    compiles anyways: two -O options with different values. Doesn't that
                    mean the 2nd -O takes preference, and the compilation is at -O2 instead
                    of -O3?

                    Why both -O3 and -O2 at the command-line?

                    Cheers,

                    --Tim

                    --
                    blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
                    site: http://www.akropolix.net/rik0/ | tenetevi riso e
                    forum: http://www.akropolix.net/forum/ | bacchette per voi.

                    Comment

                    • Tim N. van der Leeuw

                      #25
                      Re: Python and STL efficiency


                      Mc Osten wrote:
                      Fredrik Lundh <fredrik@python ware.comwrote:
                      >
                      Python's memory allocator is also quite fast, compared to most generic
                      allocators...
                      >
                      In fact also in the two "slow" versions Python outperforms C++.
                      I didn't notice it in the first place.
                      >
                      --
                      blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
                      site: http://www.akropolix.net/rik0/ | tenetevi riso e
                      forum: http://www.akropolix.net/forum/ | bacchette per voi.
                      Well, I guess I'm getting really obsessed with this. But anyways. I
                      installed MinGW on my Windows-XP (sp2) laptop. It is g++ version 3.4.5
                      -- ancient, yes, but on windows it's the latest available.

                      I compiled Mc Osten's C++ program (tweaked the output a little) and ran
                      it; ran his version of the python code too.
                      Oh boy; yes indeed the slow python is faster than the fast C++
                      version... Must be something really awful happening in the STL
                      implementation that comes with GCC 3.4!

                      Here's the output from my console:

                      LeeuwT@nlshl-leeuwt ~/My Documents/Python
                      $ g++ -O3 -march=pentium-m -o SpeedTest SpeedTest.cpp

                      LeeuwT@nlshl-leeuwt ~/My Documents/Python
                      $ ./SpeedTest.py
                      Begin Test
                      Number of unique string objects: 4
                      so long...
                      What do you know
                      fool
                      chicken crosses road
                      Number of unique string objects: 40000
                      so long...
                      What do you know
                      fool
                      chicken crosses road
                      Fast - Elapsed: 0.037574 seconds
                      Slow - Elapsed: 0.081520 seconds

                      LeeuwT@nlshl-leeuwt ~/My Documents/Python
                      $ ./SpeedTest.exe
                      Begin Test
                      What do you know?
                      chicken crosses road
                      fool
                      so long...
                      What do you know?
                      chicken crosses road
                      fool
                      so long...
                      Fast - Elapsed: 2.089 seconds
                      Slow - Elapsed: 6.303 seconds

                      LeeuwT@nlshl-leeuwt ~/My Documents/Python


                      Cheers,

                      --Tim

                      Comment

                      • Jeremy Sanders

                        #26
                        Re: Python and STL efficiency

                        Mc Osten wrote:
                        Here some results (I know that the fpoint optimizations are useless...
                        it's is my "prebuilt" full optimization macro :) ):
                        Interesting. The opimisation makes no difference to the speed of the C++ one
                        for me. I just get

                        xpc17:~g++4 -O2 test2.cpp
                        xpc17:~./a.out
                        What do you know?
                        chicken crosses road
                        fool
                        so long...
                        What do you know?
                        chicken crosses road
                        fool
                        so long...
                        Elapsed 2.11
                        Elapsed 1.11

                        (This is with an Althon 64 4600+ running Linux).

                        Unfortunately the Python on this computer doesn't have set as it is too old,
                        so I can't compare it.

                        --
                        Jeremy Sanders

                        Comment

                        • Mc Osten

                          #27
                          Re: Python and STL efficiency

                          Tim N. van der Leeuw <tim.leeuwvande r@nl.unisys.com wrote:

                          But your C++ program outputs times in seconds, right? So all
                          compilations except for the first two give results in less than a
                          second, right? (meaning the optimizations of your standard-compilation
                          give worst results than -O3?)
                          Yes. It's in seconds but the benchmark that are one order of magnitudo
                          less than the others have of a different "size" (100000 instead of
                          1000000). That is cut and paste from my terminal... I think it's a
                          mess. I do it all again from scratch.
                          BTW, I don't quite understand your gcc optimizations for the first 2
                          compiles anyways: two -O options with different values. Doesn't that
                          mean the 2nd -O takes preference, and the compilation is at -O2 instead
                          of -O3?
                          Why both -O3 and -O2 at the command-line?
                          I forgot I put -O2 in my $FAST_FLAGS. I don't know what I was thinking
                          about.

                          This the correct version

                          $ g++ -Os -pipe -march=pentium-m -msse3 -fomit-frame-pointer
                          -mfpmath=sse -o set_impl set_impl.cpp

                          $ ./set_impl
                          What do you know?
                          chicken crosses road
                          fool
                          so long...
                          What do you know?
                          chicken crosses road
                          fool
                          so long...
                          Elapsed 6.3
                          Elapsed 2.1

                          $ g++ -O2 -pipe -march=pentium-m -msse3 -fomit-frame-pointer
                          -mfpmath=sse -o set_impl set_impl.cpp
                          $ ./set_impl
                          What do you know?
                          chicken crosses road
                          fool
                          so long...
                          What do you know?
                          chicken crosses road
                          fool
                          so long...
                          Elapsed 5.8
                          Elapsed 1.7

                          $ g++ -O3 -pipe -march=pentium-m -msse3 -fomit-frame-pointer
                          -mfpmath=sse -o set_impl set_impl.cpp
                          $ ./set_impl
                          What do you know?
                          chicken crosses road
                          fool
                          so long...
                          What do you know?
                          chicken crosses road
                          fool
                          so long...
                          Elapsed 5.79
                          Elapsed 1.72

                          $ g++ -pipe -march=pentium-m -msse3 -fomit-frame-pointer -mfpmath=sse
                          -o set_impl set_impl.cpp
                          $ ./set_impl
                          What do you know?
                          chicken crosses road
                          fool
                          so long...
                          What do you know?
                          chicken crosses road
                          fool
                          so long...
                          Elapsed 7.12
                          Elapsed 2.98

                          $ python -O set_impl.py
                          so long...
                          What do you know
                          fool
                          chicken crosses road
                          so long...
                          What do you know
                          fool
                          chicken crosses road
                          Elapsed: 1.370000 seconds
                          Elapsed: 3.800000 seconds

                          --
                          blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
                          site: http://www.akropolix.net/rik0/ | tenetevi riso e
                          forum: http://www.akropolix.net/forum/ | bacchette per voi.

                          Comment

                          • Mc Osten

                            #28
                            Re: Python and STL efficiency

                            Tim N. van der Leeuw <tim.leeuwvande r@nl.unisys.com wrote:
                            Oh boy; yes indeed the slow python is faster than the fast C++
                            version... Must be something really awful happening in the STL
                            implementation that comes with GCC 3.4!
                            And the Python version does the very same number of iterations than the
                            C++ one? I suppose they are looping on arrays of different sizes, just
                            like my "first version".

                            --
                            blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
                            site: http://www.akropolix.net/rik0/ | tenetevi riso e
                            forum: http://www.akropolix.net/forum/ | bacchette per voi.

                            Comment

                            • Tim N. van der Leeuw

                              #29
                              Re: Python and STL efficiency


                              Mc Osten wrote:
                              Tim N. van der Leeuw <tim.leeuwvande r@nl.unisys.com wrote:
                              >
                              Oh boy; yes indeed the slow python is faster than the fast C++
                              version... Must be something really awful happening in the STL
                              implementation that comes with GCC 3.4!
                              >
                              And the Python version does the very same number of iterations than the
                              C++ one? I suppose they are looping on arrays of different sizes, just
                              like my "first version".
                              >
                              Hmmm.. You're quite right. The C++ version had an array size 100.000
                              (your version), the Python version still had an array size 10.000 (as
                              in my modified copy of the original version).

                              When fixing the Python version to have 100.000 items, like the C++
                              version, the Python timings are:

                              Begin Test
                              Number of unique string objects: 4
                              so long...
                              What do you know
                              fool
                              chicken crosses road
                              Number of unique string objects: 400000
                              so long...
                              What do you know
                              fool
                              chicken crosses road
                              Fast - Elapsed: 0.512088 seconds
                              Slow - Elapsed: 1.139370 seconds

                              Still twice as fast as the fastest GCC 3.4.5 compiled version!

                              Incidentally, I also have a version compiled with VC++ 6 now... (not
                              yet w/VC++ 7) .. Compiled with release-flags and maximum optimization
                              for speed, here's the result of VC++ 6:

                              LeeuwT@nlshl-leeuwt ~/My Documents/Python
                              $ ./SpeedTest_VC.ex e
                              Begin Test
                              What do you know?
                              chicken crosses road
                              fool
                              so long...
                              What do you know?
                              chicken crosses road
                              fool
                              so long...
                              Fast - Elapsed: 4.481 seconds
                              Slow - Elapsed: 4.842 seconds

                              So you can see that it's 'slow' version of the code is faster than the
                              'slow' version compiled with GCC, but the 'fast' code is barely faster
                              than the 'slow' code! And the 'fast' version compiled with GCC is much
                              faster than the 'fast' version compiled with VC++ 6!

                              My conclusion from that is, that the vector<or set<implementat ions
                              of GCC are far superior to those of VC++ 6, but that memory allocation
                              for GCC 3.4.5 (MinGW version) is far worse than that of MSCRT / VC++ 6.
                              (And Python still smokes them both).

                              Cheers,

                              --Tim


                              --
                              blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
                              site: http://www.akropolix.net/rik0/ | tenetevi riso e
                              forum: http://www.akropolix.net/forum/ | bacchette per voi.

                              Comment

                              • Tim N. van der Leeuw

                                #30
                                Re: Python and STL efficiency


                                Tim N. van der Leeuw wrote:
                                Mc Osten wrote:
                                Tim N. van der Leeuw <tim.leeuwvande r@nl.unisys.com wrote:
                                Oh boy; yes indeed the slow python is faster than the fast C++
                                version... Must be something really awful happening in the STL
                                implementation that comes with GCC 3.4!
                                And the Python version does the very same number of iterations than the
                                C++ one? I suppose they are looping on arrays of different sizes, just
                                like my "first version".
                                >
                                Hmmm.. You're quite right. The C++ version had an array size 100.000
                                (your version), the Python version still had an array size 10.000 (as
                                in my modified copy of the original version).
                                >
                                When fixing the Python version to have 100.000 items, like the C++
                                version, the Python timings are:
                                >
                                [...]
                                Fast - Elapsed: 0.512088 seconds
                                Slow - Elapsed: 1.139370 seconds
                                >
                                Still twice as fast as the fastest GCC 3.4.5 compiled version!
                                >
                                Incidentally, I also have a version compiled with VC++ 6 now... (not
                                yet w/VC++ 7) .. Compiled with release-flags and maximum optimization
                                for speed, here's the result of VC++ 6:
                                >
                                LeeuwT@nlshl-leeuwt ~/My Documents/Python
                                $ ./SpeedTest_VC.ex e
                                [...]
                                Fast - Elapsed: 4.481 seconds
                                Slow - Elapsed: 4.842 seconds
                                >
                                [...]

                                And the results of IronPython (1.0rc2) are just in as well:

                                IronPython 1.0.60816 on .NET 2.0.50727.42
                                Copyright (c) Microsoft Corporation. All rights reserved.
                                >>>
                                >>import sys
                                >>sys.path.appe nd('c:/documents and settings/leeuwt/my documents/python')
                                >>import SpeedTest
                                >>SpeedTest.run _test()
                                Begin Test
                                Number of unique string objects: 4
                                What do you know
                                so long...
                                chicken crosses road
                                fool
                                Number of unique string objects: 400000
                                What do you know
                                so long...
                                chicken crosses road
                                fool
                                Fast - Elapsed: 1.287923 seconds
                                Slow - Elapsed: 4.516272 seconds
                                >>>

                                And for Python 2.5:
                                LeeuwT@nlshl-leeuwt ~/My Documents/Python
                                $ /cygdrive/c/Python25/python.exe SpeedTest.py
                                Begin Test
                                Number of unique string objects: 4
                                so long...
                                What do you know
                                fool
                                chicken crosses road
                                Number of unique string objects: 400000
                                so long...
                                What do you know
                                fool
                                chicken crosses road
                                Fast - Elapsed: 0.440619 seconds
                                Slow - Elapsed: 1.095341 seconds

                                LeeuwT@nlshl-leeuwt ~/My Documents/Python

                                But beware! For Python2.5 I had to change the code slightly, because it
                                already realized that the expression

                                '%s' % 'something'

                                will be a constant expression, and evaluates it once only... so I had
                                to replace '%s' with a variable, and I got the timings above which show
                                Python2.5 to be slightly faster than Python2.4.

                                (Next step would be to create a VB version and a Java version of the
                                same program, oh and perhaps to try a version that would work with
                                Jython... perhaps somehow w/o the 'set')

                                Cheers,

                                --Tim

                                Comment

                                Working...