Python and STL efficiency

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Licheng Fang

    Python and STL efficiency

    Hi, I'm learning STL and I wrote some simple code to compare the
    efficiency of python and STL.

    //C++
    #include <iostream>
    #include <string>
    #include <vector>
    #include <set>
    #include <algorithm>
    using namespace std;

    int main(){
    vector<stringa;
    for (long int i=0; i<10000 ; ++i){
    a.push_back("Wh at do you know?");
    a.push_back("so long...");
    a.push_back("ch icken crosses road");
    a.push_back("fo ol");
    }
    set<stringb(a.b egin(), a.end());
    unique_copy(b.b egin(), b.end(), ostream_iterato r<string>(cout , "\n"));
    }

    #python
    def f():
    a = []
    for i in range(10000):
    a.append('What do you know')
    a.append('so long...')
    a.append('chick en crosses road')
    a.append('fool' )
    b = set(a)
    for s in b:
    print s

    I was using VC++.net and IDLE, respectively. I had expected C++ to be
    way faster. However, while the python code gave the result almost
    instantly, the C++ code took several seconds to run! Can somebody
    explain this to me? Or is there something wrong with my code?

  • Marc 'BlackJack' Rintsch

    #2
    Re: Python and STL efficiency

    In <1156143136.020 647.294290@i42g 2000cwa.googleg roups.com>, Licheng Fang
    wrote:
    Hi, I'm learning STL and I wrote some simple code to compare the
    efficiency of python and STL.
    >
    //C++
    #include <iostream>
    #include <string>
    #include <vector>
    #include <set>
    #include <algorithm>
    using namespace std;
    >
    int main(){
    vector<stringa;
    for (long int i=0; i<10000 ; ++i){
    a.push_back("Wh at do you know?");
    a.push_back("so long...");
    a.push_back("ch icken crosses road");
    a.push_back("fo ol");
    }
    set<stringb(a.b egin(), a.end());
    unique_copy(b.b egin(), b.end(), ostream_iterato r<string>(cout , "\n"));
    }
    Why are you using `unique_copy` here?
    #python
    def f():
    a = []
    for i in range(10000):
    a.append('What do you know')
    a.append('so long...')
    a.append('chick en crosses road')
    a.append('fool' )
    b = set(a)
    for s in b:
    print s
    >
    I was using VC++.net and IDLE, respectively. I had expected C++ to be
    way faster. However, while the python code gave the result almost
    instantly, the C++ code took several seconds to run! Can somebody
    explain this to me? Or is there something wrong with my code?
    There's a difference in data structures at least. The Python `set` type
    is implemented with a hash algorithm, so the equivalent STL type would be
    `hash_set`. `set` in Python does not store its contents sorted.

    Ciao,
    Marc 'BlackJack' Rintsch

    Comment

    • Licheng Fang

      #3
      Re: Python and STL efficiency


      Marc 'BlackJack' Rintsch wrote:
      In <1156143136.020 647.294290@i42g 2000cwa.googleg roups.com>, Licheng Fang
      wrote:
      >
      Hi, I'm learning STL and I wrote some simple code to compare the
      efficiency of python and STL.

      //C++
      #include <iostream>
      #include <string>
      #include <vector>
      #include <set>
      #include <algorithm>
      using namespace std;

      int main(){
      vector<stringa;
      for (long int i=0; i<10000 ; ++i){
      a.push_back("Wh at do you know?");
      a.push_back("so long...");
      a.push_back("ch icken crosses road");
      a.push_back("fo ol");
      }
      set<stringb(a.b egin(), a.end());
      unique_copy(b.b egin(), b.end(), ostream_iterato r<string>(cout , "\n"));
      }
      >
      Why are you using `unique_copy` here?
      Sorry, that's a typo. Actually I used 'copy'.
      >
      #python
      def f():
      a = []
      for i in range(10000):
      a.append('What do you know')
      a.append('so long...')
      a.append('chick en crosses road')
      a.append('fool' )
      b = set(a)
      for s in b:
      print s

      I was using VC++.net and IDLE, respectively. I had expected C++ to be
      way faster. However, while the python code gave the result almost
      instantly, the C++ code took several seconds to run! Can somebody
      explain this to me? Or is there something wrong with my code?
      >
      There's a difference in data structures at least. The Python `set` type
      is implemented with a hash algorithm, so the equivalent STL type would be
      `hash_set`. `set` in Python does not store its contents sorted.
      >
      Ciao,
      Marc 'BlackJack' Rintsch
      Thank you for your comments. I tested with hash_set, but I didn't see
      much performance improvement. When I increased the loop to 1 million
      times, the python code still ran reasonably fast and the C++ code got
      stuck there. This totally surprised me, because according this page
      http://norvig.com/python-lisp.html, the speed of python is nowhere near
      that of C++.

      Comment

      • Tim N. van der Leeuw

        #4
        Re: Python and STL efficiency


        Licheng Fang wrote:
        Hi, I'm learning STL and I wrote some simple code to compare the
        efficiency of python and STL.
        >
        >
        I was using VC++.net and IDLE, respectively. I had expected C++ to be
        way faster. However, while the python code gave the result almost
        instantly, the C++ code took several seconds to run! Can somebody
        explain this to me? Or is there something wrong with my code?
        Hi,

        I'm no C++ guru so cannot comment on the C++ code itself, however I do
        wonder if you tested your C++ code with other STL implementation such
        as gcc (gcc is available on windows as well, in various versions).

        What could be is that expanding the list in C++ is done in very small
        increments, leading to many re-allocations. Is it possible to
        pre-allocate the vector<with sufficient entries?


        Also, your Python code as quoted, doesn't actually call your function
        f(). If you say that you get results instantly, I assume that you mean
        all 4 strings are actually printed to console?

        (I'm surprised that the console prints things that fast).

        btw, using range() in Python isn't very efficient, I think... Better to
        use xrange().


        Asked a C++ collegue of mine to comment, and he strongly suspects that
        you're actually running it in the .Net runtime (your C++ code contains
        some C#-isms, such as omitting the '.h' in the include <statements).

        Luck,

        --Tim

        Comment

        • Tim N. van der Leeuw

          #5
          Re: Python and STL efficiency


          Marc 'BlackJack' Rintsch wrote:
          In <1156143136.020 647.294290@i42g 2000cwa.googleg roups.com>, Licheng Fang
          wrote:
          >
          Hi, I'm learning STL and I wrote some simple code to compare the
          efficiency of python and STL.
          [...]
          >
          There's a difference in data structures at least. The Python `set` type
          is implemented with a hash algorithm, so the equivalent STL type would be
          `hash_set`. `set` in Python does not store its contents sorted.
          >
          The set should be only 4 items in size, according to my reading of the
          code, so set implementation differences shouldn't lead to drastic
          performance differences.

          Ciao,
          Marc 'BlackJack' Rintsch
          Cheers,

          --Tim

          Comment

          • Marc 'BlackJack' Rintsch

            #6
            Re: Python and STL efficiency

            In <1156148654.420 508.57960@p79g2 000cwp.googlegr oups.com>, Tim N. van der
            Leeuw wrote:
            (your C++ code contains some C#-isms, such as omitting the '.h' in the
            include <statements).
            That's no C#-ism, that's C++. The standard C++ header names don't have a
            trailing '.h'. ``gcc`` prints deprecation warnings if you write the
            names with '.h'.

            Ciao,
            Marc 'BlackJack' Rintsch

            Comment

            • Tim N. van der Leeuw

              #7
              Re: Python and STL efficiency


              Marc 'BlackJack' Rintsch wrote:
              In <1156148654.420 508.57960@p79g2 000cwp.googlegr oups.com>, Tim N. van der
              Leeuw wrote:
              >
              (your C++ code contains some C#-isms, such as omitting the '.h' in the
              include <statements).
              >
              That's no C#-ism, that's C++. The standard C++ header names don't have a
              trailing '.h'. ``gcc`` prints deprecation warnings if you write the
              names with '.h'.
              >
              Ciao,
              Marc 'BlackJack' Rintsch
              We stand corrected.

              --Tim

              Comment

              • Fredrik Lundh

                #8
                Re: Python and STL efficiency

                Licheng Fang wrote:
                I was using VC++.net and IDLE, respectively. I had expected C++ to be
                way faster. However, while the python code gave the result almost
                instantly, the C++ code took several seconds to run! Can somebody
                explain this to me? Or is there something wrong with my code?
                in the Python example, the four strings in your example are shared, so
                you're basically copying 40000 pointers to the list.

                in the C++ example, you're creating 40000 string objects.

                </F>

                Comment

                • Ray

                  #9
                  Re: Python and STL efficiency

                  How did you compile the C++ executable? I assume that it is Release
                  mode? Then are the optimization switches enabled? Is it compiled as
                  Native Win32 or Managed application?

                  I suspect that other than what other posters have suggested about your
                  code, the difference in speed is due to the way you build your C++
                  executable...

                  HTH,
                  Ray

                  Licheng Fang wrote:
                  Hi, I'm learning STL and I wrote some simple code to compare the
                  efficiency of python and STL.
                  >
                  //C++
                  #include <iostream>
                  #include <string>
                  #include <vector>
                  #include <set>
                  #include <algorithm>
                  using namespace std;
                  >
                  int main(){
                  vector<stringa;
                  for (long int i=0; i<10000 ; ++i){
                  a.push_back("Wh at do you know?");
                  a.push_back("so long...");
                  a.push_back("ch icken crosses road");
                  a.push_back("fo ol");
                  }
                  set<stringb(a.b egin(), a.end());
                  unique_copy(b.b egin(), b.end(), ostream_iterato r<string>(cout , "\n"));
                  }
                  >
                  #python
                  def f():
                  a = []
                  for i in range(10000):
                  a.append('What do you know')
                  a.append('so long...')
                  a.append('chick en crosses road')
                  a.append('fool' )
                  b = set(a)
                  for s in b:
                  print s
                  >
                  I was using VC++.net and IDLE, respectively. I had expected C++ to be
                  way faster. However, while the python code gave the result almost
                  instantly, the C++ code took several seconds to run! Can somebody
                  explain this to me? Or is there something wrong with my code?

                  Comment

                  • Peter Otten

                    #10
                    Re: Python and STL efficiency

                    Licheng Fang wrote:
                    Hi, I'm learning STL and I wrote some simple code to compare the
                    efficiency of python and STL.
                    I was using VC++.net and IDLE, respectively. I had expected C++ to be
                    way faster. However, while the python code gave the result almost
                    instantly, the C++ code took several seconds to run! Can somebody
                    explain this to me? Or is there something wrong with my code?
                    Just a guess: immutable strings might be Python's advantage. Due to your
                    "benchmark" 's simplicity you end up with 10000 string instances in C++ and
                    just four str-s (and a lot of pointers) in Python.

                    What happens if you replace 'string' with 'const char *' in C++ ?
                    (Note that this modification is a bit unfair to Python as it would not
                    detect equal strings in different memory locations)

                    Peter

                    Comment

                    • Ray

                      #11
                      Re: Python and STL efficiency

                      Fredrik Lundh wrote:
                      in the Python example, the four strings in your example are shared, so
                      you're basically copying 40000 pointers to the list.
                      >
                      in the C++ example, you're creating 40000 string objects.
                      >
                      </F>
                      In which case, Licheng, you should try using the /GF switch. This will
                      tell Microsoft C++ compiler to pool identical string literals together.


                      :)

                      Comment

                      • Fredrik Lundh

                        #12
                        Re: Python and STL efficiency

                        Ray wrote:
                        >in the C++ example, you're creating 40000 string objects.
                        >
                        In which case, Licheng, you should try using the /GF switch. This will
                        tell Microsoft C++ compiler to pool identical string literals together.
                        in what way does that change the implementation of C++'s string type ?

                        </F>

                        Comment

                        • Tim N. van der Leeuw

                          #13
                          Re: Python and STL efficiency


                          Ray wrote:
                          Fredrik Lundh wrote:
                          in the Python example, the four strings in your example are shared, so
                          you're basically copying 40000 pointers to the list.

                          in the C++ example, you're creating 40000 string objects.

                          </F>
                          >
                          In which case, Licheng, you should try using the /GF switch. This will
                          tell Microsoft C++ compiler to pool identical string literals together.
                          >
                          >
                          :)
                          The code still creates a new string - instance each time it tries to
                          append a const char* to the vector<string.. .

                          You should instead create the string-objects ahead of time, outside of
                          the loop.

                          Regards,

                          --Tim

                          Comment

                          • Jeremy Sanders

                            #14
                            Re: Python and STL efficiency

                            Licheng Fang wrote:
                            I was using VC++.net and IDLE, respectively. I had expected C++ to be
                            way faster. However, while the python code gave the result almost
                            instantly, the C++ code took several seconds to run! Can somebody
                            explain this to me? Or is there something wrong with my code?
                            It must be the debugging, the compiler or a poor STL implementation. With
                            gcc 4 it runs instantly on my computer (using -O2), even with 10x the
                            number of values.

                            If the problem is that C++ has to make lots of new strings, as other posters
                            have suggested, then you could do something like

                            const string foo = "What do you know?";

                            for (long int i=0; i<10000 ; ++i){
                               a.push_back(foo );
                            ...
                            }

                            as many C++ implementations use reference counting for identical strings.

                            Jeremy

                            --
                            Jeremy Sanders

                            Comment

                            • Christophe

                              #15
                              Re: Python and STL efficiency

                              Jeremy Sanders a écrit :
                              Licheng Fang wrote:
                              >
                              >I was using VC++.net and IDLE, respectively. I had expected C++ to be
                              >way faster. However, while the python code gave the result almost
                              >instantly, the C++ code took several seconds to run! Can somebody
                              >explain this to me? Or is there something wrong with my code?
                              >
                              It must be the debugging, the compiler or a poor STL implementation. With
                              gcc 4 it runs instantly on my computer (using -O2), even with 10x the
                              number of values.
                              >
                              If the problem is that C++ has to make lots of new strings, as other posters
                              have suggested, then you could do something like
                              >
                              const string foo = "What do you know?";
                              >
                              for (long int i=0; i<10000 ; ++i){
                              a.push_back(foo );
                              ...
                              }
                              >
                              as many C++ implementations use reference counting for identical strings.
                              >
                              Jeremy
                              >
                              As a matter of fact, do not count on that. Use a vector<string*j ust in
                              case.

                              Comment

                              Working...