Python extension performance

  • David Jones

    Python extension performance

    Hi,

    I am trying to hunt down the difference in performance between some raw
    C++ code and calling the C++ code from Python. My goal is to use Python
    to control a bunch of number crunching code, and I need to show that
    this will not incur a (big) performance hit.

    This post includes a description of my problem, ideas I have for the
    cause, and some things I plan to try next week. If anyone knows the
    real cause, or thinks any of my ideas are way off base, I would
    appreciate hearing about it.

    My C++ function (testfunction) runs in 2.9 seconds when called from a
    C++ program, but runs in 4.3 seconds when called from Python.
    testfunction calculates its own running time with calls to clock(), and
    this is for only one iteration, so none of the time is in the SWIG code
    or Python.
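One cross-check from the Python side (a sketch with a stand-in pure-Python function, since the actual SWIG module isn't shown): compare wall-clock time against CPU time around the single call. C's clock() reports CPU time on Linux, so if the run under the interpreter loses wall-clock time without a matching CPU-time increase, the extra seconds are spent waiting rather than computing. time.process_time() is the modern spelling; Python 2.2 would use time.clock().

```python
import time

def testfunction():
    # Stand-in for the real SWIG-wrapped C++ function from the post:
    # just burn some CPU so there is something to time.
    total = 0.0
    for i in range(1, 200000):
        total += 1.0 / i
    return total

wall_start = time.time()           # wall-clock time
cpu_start = time.process_time()    # CPU time, like C's clock()
result = testfunction()
wall = time.time() - wall_start
cpu = time.process_time() - cpu_start

# If wall is much larger than cpu, the slowdown is waiting (paging,
# other processes), not extra computation.
print("wall %.3fs cpu %.3fs" % (wall, cpu))
```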

    Both the C++ executable and python module were linked from the same
    object files, and linked with the same options. The only difference is
    that the Python module is linked with -shared, and the C++ code is not.

    The computer is an Itanium 2. The code was compiled with the Intel
    Compiler, and uses the Intel Math Libraries. Python is version 2.2
    (with little hope of being able to upgrade) from the Red Hat rpm
    install. When I link the C++ exe, I get some warnings about "log2l not
    implemented" from libimf, but I do not see these when I link the Python .so.

    Some potential causes of my problems:

    - linking to a shared library instead of building a static exe.
    - intel libraries are not being used when I think they are
    - libpython.so was built with gcc, so I am getting some link issues
    - can linking to python affect my memory allocation and deallocation in
    c++??

    Some things I can try:

    - recompile python with the intel compiler and try again
    - compile my extension into a python interpreter, statically
    - segregate the memory allocations from the numerical work and see how
    the C++ and Python versions compare


    --end brain dump

    Dave
  • Jack Diederich

    #2
    Re: Python extension performance

    On Fri, Apr 08, 2005 at 10:14:52PM -0400, David Jones wrote:
    > I am trying to hunt down the difference in performance between some raw
    > C++ code and calling the C++ code from Python. My goal is to use Python
    > to control a bunch of number crunching code, and I need to show that
    > this will not incur a (big) performance hit.
    ...
    > My C++ function (testfunction) runs in 2.9 seconds when called from a
    > C++ program, but runs in 4.3 seconds when called from Python.
    > testfunction calculates its own running time with calls to clock(), and
    > this is for only one iteration, so none of the time is in the SWIG code
    > or Python.
    ...
    > Some potential causes of my problems:
    >
    > - linking to a shared library instead of building a static exe.
    > - intel libraries are not being used when I think they are
    > - libpython.so was built with gcc, so I am getting some link issues
    > - can linking to python affect my memory allocation and deallocation in
    > c++??
    The main overhead of calling C/C++ from python is the function call overhead
    (python creating the stack frame for the call, and then changing the python
    objects into regular ints, char *, etc). You don't mention how many times
    you are calling the function. If it is only once and the difference is 1.4
    seconds then something is really, really, messed up. So I'll guess it is
    hundreds of thousands of times? Let us know.
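For scale, the per-call dispatch cost Jack describes can be measured directly with timeit against any C-implemented function; math.sqrt is used here as a stand-in for a SWIG-wrapped call (SWIG's generated wrapper adds a bit more on top, but the order of magnitude is the same):

```python
import timeit

# math.sqrt is implemented in C, so calling it from Python exercises the
# Python->C dispatch and argument-conversion overhead described above.
n = 1000000
seconds = timeit.timeit("sqrt(2.0)", setup="from math import sqrt", number=n)
per_call = seconds / n

# Per-call overhead is typically a fraction of a microsecond, so a
# single call cannot plausibly account for a 1.4 second difference.
print("per-call overhead: %.9f s" % per_call)
```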
    > Some things I can try:
    > - recompile python with the intel compiler and try again
    > - compile my extension into a python interpreter, statically
    > - segregate the memory allocations from the numerical work and see
    > how the C++ and Python versions compare
    Recompiling with the Intel compiler might help, I hear it is faster than
    GCC for all modern x86 platforms. I think CPython is only tested on GCC
    and windows Visual-C-thingy so you might be SOL. The other two ideas
    seem much harder to do and less likely to show an improvement.

    -jackdied


    • Martin v. Löwis

      #3
      Re: Python extension performance

      David Jones wrote:
      > Both the C++ executable and python module were linked from the same
      > object files, and linked with the same options. The only difference is
      > that the Python module is linked with -shared, and the C++ code is not.
      [...]
      > Some potential causes of my problems:
      >
      > - linking to a shared library instead of building a static exe.

      That is unlikely to be the problem - the shared library should have been
      loaded long before you call the function.
      > - intel libraries are not being used when I think they are

      Very much possible. I would do an strace on each binary (python and
      your stand-alone application) to see what libraries are picked up.
      > - libpython.so was built with gcc, so I am getting some link issues

      Unlikely.
      > - can linking to python affect my memory allocation and deallocation in
      > c++??

      It can - Python grabs a 128k block at startup, and then another one if
      the first one is exhausted. But that should not cause a performance
      difference.

      Other possible explanations:
      - The intel compiler somehow arranges to use multiple processors in the
      code (e.g. through OpenMP); for some reason, your multiple processors
      are not used when this is run in the Python interpreter (and no,
      the GIL would not be an immediate explanation)
      - The Python interpreter (unknowingly) switches the processor to a
      different floating-point operation mode, one which is less efficient
      (but, say, more correct).
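Martin's floating-point-mode theory can be crudely probed from the interpreter: 2.0**-1074 is the smallest positive subnormal IEEE-754 double, and a flush-to-zero mode (which some compiler flags enable) collapses subnormals to zero. Running the same check in the standalone program and under Python would show whether the two end up in different modes. A sketch:

```python
# Crude probe for a flush-to-zero floating-point mode: under the normal
# IEEE-754 mode this produces the smallest positive subnormal double;
# under flush-to-zero, subnormal results collapse to 0.0.
tiny = 2.0 ** -1074
flushed = (tiny == 0.0)
print("subnormals flushed to zero:", flushed)
```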

      Regards,
      Martin


      • David Jones

        #4
        Re: Python extension performance

        Jack Diederich wrote:
        > On Fri, Apr 08, 2005 at 10:14:52PM -0400, David Jones wrote:
        >
        >>I am trying to hunt down the difference in performance between some raw
        >>C++ code and calling the C++ code from Python. My goal is to use Python
        >>to control a bunch of number crunching code, and I need to show that
        >>this will not incur a (big) performance hit.
        >>
        >>My C++ function (testfunction) runs in 2.9 seconds when called from a
        >>C++ program, but runs in 4.3 seconds when called from Python.
        >>testfunction calculates its own running time with calls to clock(), and
        >>this is for only one iteration, so none of the time is in the SWIG code
        >>or Python.
        >>
        >>Some potential causes of my problems:
        >>
        >>- linking to a shared library instead of building a static exe.
        >>- intel libraries are not being used when I think they are
        >>- libpython.so was built with gcc, so I am getting some link issues
        >>- can linking to python affect my memory allocation and deallocation in
        >>c++??
        >
        > The main overhead of calling C/C++ from python is the function call overhead
        > (python creating the stack frame for the call, and then changing the python
        > objects into regular ints, char *, etc). You don't mention how many times
        > you are calling the function. If it is only once and the difference is 1.4
        > seconds then something is really, really, messed up. So I'll guess it is
        > hundreds of thousands of times? Let us know.

        Sorry I was not clearer above; the function is only called one time. I
        have run out of obvious things I may have screwed up. The part that
        bugs me most is that these are built from the same .o files except for
        the .o file that has the wrapper function for python.
        >
        >>Some things I can try:
        >>- recompile python with the intel compiler and try again
        >>- compile my extension into a python interpreter, statically
        >>- segregate the memory allocations from the numerical work and see
        >>how the C++ and Python versions compare
        >
        > Recompiling with the Intel compiler might help, I hear it is faster than
        > GCC for all modern x86 platforms. I think CPython is only tested on GCC
        > and windows Visual-C-thingy so you might be SOL. The other two ideas
        > seem much harder to do and less likely to show an improvement.
        >
        > -jackdied
        >

        By the second option, I meant to compile my extension statically instead
        of using a shared library by unpacking the source rpm and putting my
        code in the Modules/ directory. That is a pretty standard thing to do,
        isn't it?
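For what it's worth, the classic way to do that is roughly what you describe: unpack the source, drop your files into Modules/, and add a line to Modules/Setup before rebuilding. A sketch (the module and file names below are placeholders for the real extension, and C++ sources may need extra Makefile tweaking):

```
# Modules/Setup fragment -- names are placeholders; everything after the
# module name is source files and linker flags.
*static*
testmodule testmodule_wrap.cxx -limf
```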


        Thanks for the comments.

        Dave


