make faster Richards benchmark

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Duncan Lissett

    make faster Richards benchmark

    I'd appreciate any suggestions on how to make faster Python
    implementations of Richards benchmark. Perhaps there are obvious
    problems that can be corrected?


  • Peter Maas

    #2
    Re: make faster Richards benchmark

    Duncan Lissett wrote:[color=blue]
    > I'd appreciate any suggestions on how to make faster Python
    > implementations of Richards benchmark. Perhaps there are obvious
    > problems that can be corrected?[/color]

    The most obvious problem: how does the Richards benchmark work?
    Care to post the source code?

    Mit freundlichen Gruessen,

    Peter Maas

    --
    -------------------------------------------------------------------
    Peter Maas, M+R Infosysteme, D-52070 Aachen, Hubert-Wienen-Str. 24
    Tel +49-241-93878-0 Fax +49-241-93878-20 eMail peter.maas@mplu sr.de
    -------------------------------------------------------------------

    Comment

    • Michel Claveau/Hamster

      #3
      Re: make faster Richards benchmark

      Hi !

      On the site, click on the word "Python #92"




      Comment

      • Josef Meile

        #4
        Re: make faster Richards benchmark

        Duncan Lissett wrote:[color=blue]
        > I'd appreciate any suggestions on how to make faster Python
        > implementations of Richards benchmark. Perhaps there are obvious
        > problems that can be corrected?
        >
        > http://www.lissett.com/ben/bench1.htm[/color]
        What's about including a second python implementation of the Richards
        benchmark using psyco? You don't have to modify your code, you only have
        to add two lines. It would be also interesting to see the differences
        between both source codes.

        Regards,
        Josef

        Comment

        • Wilk

          #5
          Re: make faster Richards benchmark

          dlissett0@yahoo .com (Duncan Lissett) writes:
          [color=blue]
          > I'd appreciate any suggestions on how to make faster Python
          > implementations of Richards benchmark. Perhaps there are obvious
          > problems that can be corrected?
          >
          > http://www.lissett.com/ben/bench1.htm[/color]

          import psyco
          psyco.full()

          2 times faster :-)

          --
          Wilk - http://flibuste.net

          Comment

          • Duncan Booth

            #6
            Re: make faster Richards benchmark

            dlissett0@yahoo .com (Duncan Lissett) wrote in
            news:6748553f.0 405122211.5be5a 150@posting.goo gle.com:
            [color=blue]
            > I'd appreciate any suggestions on how to make faster Python
            > implementations of Richards benchmark. Perhaps there are obvious
            > problems that can be corrected?[/color]

            That should say "the Richards benchmark", named for Martin Richards.
            [color=blue]
            >
            > http://www.lissett.com/ben/bench1.htm[/color]

            Well, if I was going to do a Python implementation of this I would want to
            look at what the benchmark is trying to achieve, and then program it fresh
            from the ground up instead of trying to ape some other language. In
            particular I expect that the classes with 'run' methods would probably
            become generators with most or all of their state held as local variables.

            For example, imagining that all the 'run' methods have first been renamed
            'next', IdleTask becomes (untested):

            def IdleTask(schedu ler, v1, v2):
            for i in range(v2):
            if v1 & 1:
            v1 = (v1 >> 1) ^ 0xD008
            yield scheduler.relea se(DEVICEB)
            else:
            v1 = v1 >> 1
            yield scheduler.relea se(DEVICEA)
            yield scheduler.holdC urrent()

            Next most obvious thing is to junk the linked lists and replace them with
            ordinary Python builtin lists. Pass the relevant list around instead of the
            id constants 'DEVICEA', 'DEVICEB' and you can get rid of all that lookup
            overhead. This involves quite a major restructuring of the scheduler
            though: hence my comment about understanding what the code is trying to
            achieve and starting from the ground up.

            Packet.addTo should look more like:

            def addTo(self, queue):
            queue.append(se lf)

            and then vapourise entirely.


            Minor points:

            Remove the pointless use of the 'global' keyword in various places. Replace
            the traceOn variable with __debug__ so you get the same benefits as
            compiled languages by optimising out the test for the trace statements.

            Remove the pointless set/get methods and just access the members directly.
            If you are writing a benchmark they will cripple performance.

            Avoiding multiple accesses to the same instance variable, or assigning to
            instance variables until you are about to return from a method: use a local
            during the execution of the method.

            Comment

            • Peter Hansen

              #7
              Re: make faster Richards benchmark

              Josef Meile wrote:
              [color=blue]
              > Duncan Lissett wrote:
              >[color=green]
              >> I'd appreciate any suggestions on how to make faster Python
              >> implementations of Richards benchmark. Perhaps there are obvious
              >> problems that can be corrected?
              >>
              >> http://www.lissett.com/ben/bench1.htm[/color]
              >
              > What's about including a second python implementation of the Richards
              > benchmark using psyco? You don't have to modify your code, you only have
              > to add two lines. It would be also interesting to see the differences
              > between both source codes.[/color]

              I get an immediate 38% speedup by doing "import psyco; psyco.full()"
              at the start of the benchmark.

              -Peter

              Comment

              • Peter Maas

                #8
                Re: make faster Richards benchmark

                Michel Claveau/Hamster wrote:[color=blue]
                > On the site, click on the word "Python #92"[/color]

                Thanks. Oh, how silly of me. I didn't recognize these as links,
                still tuned to underscore links.

                Mit freundlichen Gruessen,

                Peter Maas

                --
                -------------------------------------------------------------------
                Peter Maas, M+R Infosysteme, D-52070 Aachen, Hubert-Wienen-Str. 24
                Tel +49-241-93878-0 Fax +49-241-93878-20 eMail peter.maas@mplu sr.de
                -------------------------------------------------------------------

                Comment

                • Duncan Lissett

                  #9
                  Re: make faster Richards benchmark

                  Duncan Booth <me@privacy.net > wrote in message news:<Xns94E868 3D9EEF2duncanrc pcouk@127.0.0.1 >...

                  -snip-[color=blue]
                  > Well, if I was going to do a Python implementation of this I would want to
                  > look at what the benchmark is trying to achieve, and then program it fresh
                  > from the ground up instead of trying to ape some other language. In
                  > particular I expect that the classes with 'run' methods would probably
                  > become generators with most or all of their state held as local variables.
                  >
                  > For example, imagining that all the 'run' methods have first been renamed
                  > 'next', IdleTask becomes (untested):
                  >
                  > def IdleTask(schedu ler, v1, v2):
                  > for i in range(v2):
                  > if v1 & 1:
                  > v1 = (v1 >> 1) ^ 0xD008
                  > yield scheduler.relea se(DEVICEB)
                  > else:
                  > v1 = v1 >> 1
                  > yield scheduler.relea se(DEVICEA)
                  > yield scheduler.holdC urrent()
                  >
                  > Next most obvious thing is to junk the linked lists and replace them with
                  > ordinary Python builtin lists. Pass the relevant list around instead of the
                  > id constants 'DEVICEA', 'DEVICEB' and you can get rid of all that lookup
                  > overhead. This involves quite a major restructuring of the scheduler
                  > though: hence my comment about understanding what the code is trying to
                  > achieve and starting from the ground up.
                  >
                  > Packet.addTo should look more like:
                  >
                  > def addTo(self, queue):
                  > queue.append(se lf)
                  >
                  > and then vapourise entirely.
                  >
                  >
                  > Minor points:
                  >
                  > Remove the pointless use of the 'global' keyword in various places. Replace
                  > the traceOn variable with __debug__ so you get the same benefits as
                  > compiled languages by optimising out the test for the trace statements.[/color]

                  Thanks, these are all interesting suggestions.

                  [color=blue]
                  > Remove the pointless set/get methods and just access the members directly.
                  > If you are writing a benchmark they will cripple performance.
                  >
                  > Avoiding multiple accesses to the same instance variable, or assigning to
                  > instance variables until you are about to return from a method: use a local
                  > during the execution of the method.[/color]

                  The pointless set/get methods are pointless in the other language
                  implementations as well - that's the point. (And I'll cripple that
                  Oberon-2 implementation real soon by enforcing privacy with modules
                  and set/get procedures.)

                  OTOH there's definitely a place for a truly Pythonic implementation
                  here:


                  best wishes, Duncan

                  Comment

                  • Michel Claveau/Hamster

                    #10
                    Re: make faster Richards benchmark

                    >>> Mit freundlichen Gruessen,

                    Désolé, je ne comprend pas l'allemand. Et pourtant, en octobre, je vais
                    aller là :


                    Je présenterai PONX (Python for Paradox) (cf http://ponx.org)


                    @-salutations
                    --
                    Michel Claveau
                    mél : http://cerbermail.com/?6J1TthIa8B
                    site : http://mclaveau.com






                    Comment

                    • Duncan Lissett

                      #11
                      Re: make faster Richards benchmark

                      Wilk <wilkSPAM@OUTfl ibuste.net> wrote in message news:<873c64zpu 7.fsf@blakie.ri ol>...[color=blue]
                      > dlissett0@yahoo .com (Duncan Lissett) writes:
                      >[color=green]
                      > > I'd appreciate any suggestions on how to make faster Python
                      > > implementations of Richards benchmark. Perhaps there are obvious
                      > > problems that can be corrected?
                      > >
                      > > http://www.lissett.com/ben/bench1.htm[/color]
                      >
                      > import psyco
                      > psyco.full()
                      >
                      > 2 times faster :-)[/color]

                      And Simon: "I just tried the benchmark with Psyco, and cut the run
                      time for input=10000 from 8.1 seconds to 3.4"

                      Joseph, Simon, Wilk: thanks for the Psyco suggestion, and special
                      thanks for going ahead and trying it - we get slightly better than x2.

                      Given how little I know about Python, my assumption is that there's
                      scope for x10 improvement by writing better code... a more Pythonic
                      version for http://www.lissett.com/ben/bench3.htm

                      Suggestions appreciated.

                      best wishes, Duncan

                      Comment

                      • Peter Hansen

                        #12
                        Re: make faster Richards benchmark

                        Duncan Lissett wrote:
                        [color=blue]
                        > Wilk <wilkSPAM@OUTfl ibuste.net> wrote in message news:<873c64zpu 7.fsf@blakie.ri ol>...
                        >[color=green]
                        >>dlissett0@yah oo.com (Duncan Lissett) writes:
                        >>
                        >>[color=darkred]
                        >>>I'd appreciate any suggestions on how to make faster Python
                        >>>implementati ons of Richards benchmark. Perhaps there are obvious
                        >>>problems that can be corrected?
                        >>>
                        >>>http://www.lissett.com/ben/bench1.htm[/color]
                        >>
                        >>import psyco
                        >>psyco.full( )
                        >>
                        >>2 times faster :-)[/color]
                        >
                        > And Simon: "I just tried the benchmark with Psyco, and cut the run
                        > time for input=10000 from 8.1 seconds to 3.4"
                        >
                        > Joseph, Simon, Wilk: thanks for the Psyco suggestion, and special
                        > thanks for going ahead and trying it - we get slightly better than x2.[/color]

                        What platform is this on? On WinXP, with an AMD 2500+ chip, and lots
                        of memory etc., I'm still getting only 38% speedup, nowhere near 100%...

                        -Peter

                        Comment

                        • Jack Diederich

                          #13
                          Re: make faster Richards benchmark

                          On Thu, May 13, 2004 at 09:15:58AM -0700, Duncan Lissett wrote:[color=blue]
                          > Duncan Booth <me@privacy.net > wrote in message news:<Xns94E868 3D9EEF2duncanrc pcouk@127.0.0.1 >...[color=green]
                          > > Remove the pointless set/get methods and just access the members directly.
                          > > If you are writing a benchmark they will cripple performance.
                          > >
                          > > Avoiding multiple accesses to the same instance variable, or assigning to
                          > > instance variables until you are about to return from a method: use a local
                          > > during the execution of the method.[/color]
                          >
                          > The pointless set/get methods are pointless in the other language
                          > implementations as well - that's the point. (And I'll cripple that
                          > Oberon-2 implementation real soon by enforcing privacy with modules
                          > and set/get procedures.)[/color]

                          I would leave python and some other languages out of the comparison.
                          You can do near line-for-line translations for languages in the same class
                          - say C++ and Java, or Assembly and C. Requiring the python/perl/ruby
                          versions to look like a C++ program just measures how badly C++ maps to
                          Python, and not much else.

                          I've even seen some line-for-line translations in production use that
                          use python list-of-strings where the orignal version used char arrarys.
                          You can imagine what that does for performance *shudder*.

                          Writing benchmarks is just hard, if you allow people to solve the problem
                          in whatever way they like you end up measuring how good a coder the
                          language A guy is compared to the language B submitter.

                          -jackdied




                          Comment

                          • Peter Hansen

                            #14
                            Re: make faster Richards benchmark

                            Jack Diederich wrote:
                            [color=blue]
                            > On Thu, May 13, 2004 at 09:15:58AM -0700, Duncan Lissett wrote:[color=green]
                            >>The pointless set/get methods are pointless in the other language
                            >>implementatio ns as well - that's the point.[/color]
                            >
                            > I would leave python and some other languages out of the comparison.
                            > You can do near line-for-line translations for languages in the same class
                            > - say C++ and Java, or Assembly and C. Requiring the python/perl/ruby
                            > versions to look like a C++ program just measures how badly C++ maps to
                            > Python, and not much else.
                            >
                            > I've even seen some line-for-line translations in production use that
                            > use python list-of-strings where the orignal version used char arrarys.
                            > You can imagine what that does for performance *shudder*.
                            >
                            > Writing benchmarks is just hard, if you allow people to solve the problem
                            > in whatever way they like you end up measuring how good a coder the
                            > language A guy is compared to the language B submitter.[/color]

                            In the case of the Richards benchmark, it's better to say "specifying
                            benchmarks is hard". The problem that I saw with it is that it
                            describes the problem in terms that are inappropriate for many of the
                            languages it has been translated to, effectively constraining the
                            implementation in ways that are more suitable to certain languages
                            than others.

                            One key example is the requirement for "global registers" which act
                            similarly to how the low-level registers in the CPU have to be handled,
                            specifically by saving and restoring them in a task-local context
                            whenever a task switch occurs. This is clearly what Richards really
                            wanted, but I'd argue that the approach is ill-conceived, especially
                            in this century...

                            On the other hand, if the specification had been more thoroughly
                            thought out, or perhaps modernized is a more appropriate way of looking
                            at it, then it would describe the expected *use and behaviour* of the
                            tasks in terms of black box testability... in other words, it shouldn't
                            matter that "global registers" are used, but merely that the
                            benchmark produces certain results.

                            Without those tests, it's fairly open to interpretation just how
                            much deviation from the original BCPL implementation is appropriate.
                            Duncan L's ideas for a generator-based approach are interesting,
                            though there are probably those who would argue that it wouldn't
                            follow the benchmark specs exactly. (Especially if it showed
                            performance much closer to C. <wink>)

                            -Peter

                            Comment

                            • Duncan Lissett

                              #15
                              Re: make faster Richards benchmark

                              Duncan Booth <me@privacy.net > wrote in message news:<Xns94E868 3D9EEF2duncanrc pcouk@127.0.0.1 >...

                              -snip-[color=blue]
                              > Next most obvious thing is to junk the linked lists and replace them with
                              > ordinary Python builtin lists.[/color]

                              After the Packet links were replace with a Python list the code was
                              ~1% slower...
                              [color=blue]
                              > Replace the traceOn variable with __debug__ so you get the same benefits as
                              > compiled languages by optimising out the test for the trace statements.[/color]

                              Using __debug__ made the code ~1% faster
                              [color=blue]
                              > Remove the pointless set/get methods and just access the members directly.[/color]

                              As a special dispensation to Python, directly accessed Packet and Tcb
                              fields, ~10% faster

                              [color=blue]
                              > Avoiding multiple accesses to the same instance variable, or assigning to
                              > instance variables until you are about to return from a method: use a local
                              > during the execution of the method.[/color]

                              Seems we're pushing into code optimization rather than not coding
                              badly.

                              Being careful about the tests in conditionals, and reordering some
                              conditionals sped things up some more. (I should roll the same changes
                              into the other langauge implementations .)

                              The OO style implementation took 104.4s and now takes 87.1s

                              I took the OO style implementation, made the methods of the scheduler
                              class into functions, made the fields into globals, and made the Task
                              class run methods into top-level functions.

                              The C style implementation took 114.9s and the OO conversion takes
                              80.3s


                              Further suggestions are welcome (yes, I will try pysco later)

                              Comment

                              Working...