Python slow for filter scripts

  • Peter Mutsaers

    Python slow for filter scripts

    Hello,

    Up to now I mostly wrote simple filter scripts in Perl, e.g.

    while(<>) {
        # do something with $_, regexp matching, replacements etc.
        print;
    }

    Now I learned Python and like it much more as a language.

    However, I tried the most simple while(<>) {print;} in Perl versus
    Python, just a copy from stdin to stdout, to see how fast the basic
    filter can be.

    I found that on my (linux) PC, the Python version was 4 times slower.

    Is that normal, does it disqualify Python for simple filter scripts?
  • Terry Reedy

    #2
    Re: Python slow for filter scripts


    "Peter Mutsaers" <plm@gmx.li> wrote in message
    news:9769adc8.0310281233.5fc5d252@posting.google.com...
    > However, I tried the most simple while(<>) {print;} in Perl versus
    > Python, just a copy from stdin to stdout, to see how fast the basic
    > filter can be.
    >
    > I found that on my (linux) PC, the Python version was 4 times slower.

    There are several ways to copy a file in Python. Some are much faster
    than others. I believe 'for line in file: print line' might be the
    fastest with Python code. There is also shutil.copyfile(src, dst) or
    one of its variants. Which did you use?
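
    For reference, here is a minimal Python 3 sketch of the line-by-line
    filter under discussion, assuming the per-line work is a regular
    expression substitution. The name filter_lines and the pattern are
    hypothetical, chosen only for illustration; the thread itself dates
    from Python 2 and its print statement.

```python
import re

# Hypothetical transformation: collapse runs of spaces/tabs into one space.
PATTERN = re.compile(r"[ \t]+")


def filter_lines(src, dst, pattern=PATTERN, repl=" "):
    """Copy src to dst line by line, applying a regex substitution."""
    write = dst.write  # bind once to avoid an attribute lookup per line
    for line in src:
        write(pattern.sub(repl, line))
```

    Used as a filter, this would be called as
    filter_lines(sys.stdin, sys.stdout) after an import sys, which is the
    rough counterpart of Perl's while(<>) { ...; print; } loop.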

    > Is that normal, does it disqualify Python for simple filter scripts?

    I have read that Perl is optimized for file read/write in a way that
    Python is not, so this may not be the most representative comparison
    for your actual app. In any case, the relevance of relative speed
    depends on absolute speed (think about milliseconds versus hours).

    Terry J. Reedy



    • David C. Fox

      #3
      Re: Python slow for filter scripts

      Peter Mutsaers wrote:
      > Hello,
      >
      > Up to now I mostly wrote simple filter scripts in Perl, e.g.
      >
      > while(<>) {
      > # do something with $_, regexp matching, replacements etc.
      > print;
      > }
      >
      > Now I learned Python and like it much more as a language.
      >
      > However, I tried the most simple while(<>) {print;} in Perl versus
      > Python, just a copy from stdin to stdout, to see how fast the basic
      > filter can be.
      >
      > I found that on my (linux) PC, the Python version was 4 times slower.
      >
      > Is that normal, does it disqualify Python for simple filter scripts?

      You don't show the Python script you use, so there's no way for us to
      tell whether it is possible to do it more efficiently.

      Also, what size file did you use? Unless you tried it with a large
      enough file, so that the time was proportional to the file size, you may
      just have measured the difference in the startup time for perl vs. python.

      Finally, the relative performance of two languages on Task X is not a
      very good predictor of their relative performance on Task Y, so you are
      probably better off doing a comparison of the actual task you are
      interested in.
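
      One way to check the startup-time point is to time an interpreter
      that starts and exits without doing any I/O. The sketch below is
      only an illustration under that assumption: startup_seconds is a
      hypothetical helper name, and the numbers it prints depend entirely
      on the machine it runs on.

```python
import subprocess
import sys
import time


def startup_seconds(argv, runs=5):
    """Return the best wall-clock time over `runs` launches of argv."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, stdin=subprocess.DEVNULL,
                       stdout=subprocess.DEVNULL, check=True)
        best = min(best, time.perf_counter() - start)
    return best


# Baseline: launch the current Python interpreter and exit immediately.
python_baseline = startup_seconds([sys.executable, "-c", "pass"])
print("python startup: %.3f s" % python_baseline)
```

      Where perl is installed, the same helper could be given
      ["perl", "-e", "1"]. Subtracting each baseline from the full copy
      timing gives a figure that is closer to the cost that scales with
      file size.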

      David


      • Cameron Laird

        #4
        Re: Python slow for filter scripts

        In article <riCnb.36947$mZ5.185175@attbi_s54>,
        David C. Fox <davidcfox@post.harvard.edu> wrote:
        >Peter Mutsaers wrote:


        • Alex Martelli

          #5
          Re: Python slow for filter scripts

          Peter Mutsaers wrote:
          > Hello,
          >
          > Up to now I mostly wrote simple filter scripts in Perl, e.g.
          >
          > while(<>) {
          > # do something with $_, regexp matching, replacements etc.
          > print;
          > }
          >
          > Now I learned Python and like it much more as a language.
          >
          > However, I tried the most simple while(<>) {print;} in Perl versus
          > Python, just a copy from stdin to stdout, to see how fast the basic
          > filter can be.
          >
          > I found that on my (linux) PC, the Python version was 4 times slower.
          >
          > Is that normal, does it disqualify Python for simple filter scripts?

          It really depends on what you're doing. I tried the following:

          cio.pl:
          while(<>) {
              print;
          }

          cio.py:
          # Python 2 (uses the print statement)
          import sys
          import fileinput
          import shutil

          emit = sys.stdout.write

          def io_1(emit=emit):
              for line in sys.stdin: emit(line)

          def io_2(emit=emit):
              for line in fileinput.input(): emit(line)

          def io_3():
              shutil.copyfileobj(sys.stdin, sys.stdout)


          if __name__=='__main__':
              import __main__

              def usage():
                  sys.stdout = sys.stderr
                  print "Usage: %s N" % sys.argv[0]
                  print "N indicates what stdin->stdout copy function to run"
                  ns = [x[3:] for x in dir(__main__) if x[:3]=='io_']
                  ns.sort()
                  print "valid values for N:", ns
                  print "invalid args:", sys.argv[1:]
                  sys.exit()
              if len(sys.argv) != 2: usage()
              func = getattr(__main__, 'io_'+sys.argv[1], None)
              if func is None: usage()
              # drop N so that fileinput reads stdin, not a file named N
              sys.argv.pop()
              func()

          and I'm specifically reading the King James' Bible (an easily
          available text so you can reproduce my results!) and writing
          either /dev/null or a tempfile on my own Linux box. I see...:

          [alex@lancelot bo]$ ls -l /x/kjv.txt
          -rw-rw-r-- 1 alex alex 4404445 Mar 29 2003 /x/kjv.txt

          [alex@lancelot bo]$ time perl cio.pl </x/kjv.txt >/dev/null
          0.07user 0.01system 0:00.11elapsed 72%CPU (0avgtext+0avgdata 0maxresident)k
          0inputs+0outputs (330major+61minor)pagefaults 0swaps

          [alex@lancelot bo]$ time perl cio.pl </x/kjv.txt >/tmp/kjv
          0.04user 0.06system 0:00.19elapsed 51%CPU (0avgtext+0avgdata 0maxresident)k
          0inputs+0outputs (330major+61minor)pagefaults 0swaps

          So, Perl is taking 80 to 100 milliseconds of CPU time (elapsed time
          depends mostly on what else is going on in the machine, and thus
          on the %CPU available, of course). Let's see Python now:


          [alex@lancelot bo]$ time python cio.py 2 </x/kjv.txt >/dev/null
          0.27user 0.00system 0:00.30elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
          0inputs+0outputs (448major+278minor)pagefaults 0swaps

          [alex@lancelot bo]$ time python cio.py 2 </x/kjv.txt >/tmp/kjv
          0.30user 0.01system 0:00.62elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
          0inputs+0outputs (448major+278minor)pagefaults 0swaps

          Python with fileinput IS slower -- 270 to 300 msecs CPU, about a
          factor of 3. However, that IS mostly fileinput's issue. Videat:


          [alex@lancelot bo]$ time python cio.py 1 </x/kjv.txt >/dev/null
          0.07user 0.03system 0:00.10elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
          0inputs+0outputs (447major+276minor)pagefaults 0swaps

          [alex@lancelot bo]$ time python cio.py 1 </x/kjv.txt >/tmp/kjv
          0.06user 0.07system 0:00.29elapsed 44%CPU (0avgtext+0avgdata 0maxresident)k
          0inputs+0outputs (447major+276minor)pagefaults 0swaps

          a plain line by line copy takes 100-130 msec -- a bit slower than Perl,
          but nothing major. Can we do better yet...?

          [alex@lancelot bo]$ time python cio.py 3 </x/kjv.txt >/dev/null
          0.03user 0.02system 0:00.10elapsed 47%CPU (0avgtext+0avgdata 0maxresident)k
          0inputs+0outputs (447major+275minor)pagefaults 0swaps

          [alex@lancelot bo]$ time python cio.py 3 </x/kjv.txt >/tmp/kjv
          0.02user 0.06system 0:00.16elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
          0inputs+0outputs (447major+275minor)pagefaults 0swaps

          ...sure! Bulk copy, 50-80 msec, FASTER than Perl. Of course, I'm sure
          you can program it faster in Perl, too. After all, cat takes 20-60
          msec CPU, so there's clearly room to do better.


          What kind of files do your scripts most often process? For me, a
          textfile of 4.4 MB is larger than typical. How much do those few
          tens of milliseconds' difference matter? You know your apps, I
          don't, but I _would_ find it rather strange if they "disqualified"
          either language. Anything below about a second is typically fine
          with me, so even the slowest of these programs could still handle
          files of about 6 MB, assuming the 50% CPU it got is pretty typical,
          while still taking no more than about 1 second's elapsed time.


          Of course, you can easily edit my script and play with many other
          I/O methods, until you find one that best suits you. Personally,
          I tend to use fileinput just because it's so handy (like perl's <>),
          not caring all that much about those "wasted" milliseconds... :-)
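
          As a starting point for that kind of experiment, here is a rough
          Python 3 sketch of two of the strategies above (the line-by-line
          copy of io_1 and the bulk copy of io_3), timed on in-memory files.
          The names copy_lines, copy_bulk and cpu_seconds are hypothetical,
          and the synthetic sample is only a stand-in for the 4.4 MB text
          file, so the figures are not comparable with the ones quoted above.

```python
import io
import shutil
import time


def copy_lines(src, dst):
    """Line-by-line copy, in the style of io_1."""
    write = dst.write
    for line in src:
        write(line)


def copy_bulk(src, dst):
    """Chunked copy via shutil.copyfileobj, in the style of io_3."""
    shutil.copyfileobj(src, dst)


def cpu_seconds(copy, data):
    """CPU time taken to copy `data` (bytes) between in-memory files."""
    src, dst = io.BytesIO(data), io.BytesIO()
    start = time.process_time()
    copy(src, dst)
    elapsed = time.process_time() - start
    assert dst.getvalue() == data  # the copy must be lossless
    return elapsed


# Synthetic sample: 100000 lines of 44 bytes each, roughly 4.4 MB.
sample = b"the quick brown fox jumps over the lazy dog\n" * 100000

for copy in (copy_lines, copy_bulk):
    print("%-10s %.3f s" % (copy.__name__, cpu_seconds(copy, sample)))
```

          Timing in-memory copies removes disk and pipe effects, which makes
          the per-line overhead easier to see but also leaves out the
          buffering behaviour of real stdin and stdout.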


          Alex


          • William Park

            #6
            Re: Python slow for filter scripts

            Alex Martelli <aleax@aleax.it> wrote:
            > and I'm specifically reading the King James' Bible (an easily
            > available text so you can reproduce my results!) and writing

            Can you post URL for the Bible?

            --
            William Park, Open Geometry Consulting, <opengeometry@yahoo.ca>
            Linux solution for data management and processing.


            • Alex Martelli

              #7
              Re: Python slow for filter scripts

              William Park wrote:
              > Alex Martelli <aleax@aleax.it> wrote:
              >> and I'm specifically reading the King James' Bible (an easily
              >> available text so you can reproduce my results!) and writing
              >
              > Can you post URL for the Bible?

              I originally got it from some MySQL stuff by Paul DuBois and decided
              it would make a good general dataset for reproducible tests and
              benchmarks. A little googling suggests that it comes from:

              The Unbound Bible (Biola University)


              and specifically from the kjv.zip referenced there under the "King
              James Version" anchor (unzipped, of course).


              Alex


              • Stan Graves

                #8
                Re: Python slow for filter scripts

                Alex Martelli <aleax@aleax.it> wrote in message news:<VVCnb.370402$R32.12250533@news2.tin.it>...
                > It really depends on what you're doing. I tried the following:

                Me too!

                I ran these on an HP-UX 11.00 box. The input was a 560K HTML file that
                I had lying around.

                Overall, python was about 3 times slower than perl...and remarkably
                consistent for the three different methods.

                stang@ettin$ ll ./print.html
                -rw-r--r-- 1 stan users 567154 Oct 29 10:30 ./print.html
                stang@ettin$ time cio.pl < ./print.html > /dev/null

                real 0m0.10s
                user 0m0.06s
                sys 0m0.03s
                stang@ettin$ time cio.pl < ./print.html > /tmp/test

                real 0m0.18s
                user 0m0.06s
                sys 0m0.04s
                stang@ettin$ time cio.py 1 < ./print.html > /dev/null

                real 0m0.85s
                user 0m0.30s
                sys 0m0.11s
                stang@ettin$ time cio.py 1 < ./print.html > /tmp/test

                real 0m0.45s
                user 0m0.29s
                sys 0m0.11s
                stang@ettin$ time cio.py 2 < ./print.html > /dev/null

                real 0m0.76s
                user 0m0.64s
                sys 0m0.11s
                stang@ettin$ time cio.py 2 < ./print.html > /tmp/test

                real 0m0.81s
                user 0m0.64s
                sys 0m0.12s
                stang@ettin$ time cio.py 3 < ./print.html > /dev/null

                real 0m0.43s
                user 0m0.16s
                sys 0m0.10s
                stang@ettin$ time cio.py 3 < ./print.html > /tmp/test

                real 0m0.33s
                user 0m0.17s
                sys 0m0.12s
                stang@ettin$

                --Stan Graves
                stan@SoundInMot ionDJ.com


                • Skip Montanaro

                  #9
                  Re: Python slow for filter scripts


                  Stan> Overall, python was about 3 times slower than perl...and
                  Stan> remarkably consistent for the three different methods.

                  What version of Python did you use? Note that 2.3 is significantly faster
                  than 2.2 in a number of ways.

                  Skip


                  • William Park

                    #10
                    Re: Python slow for filter scripts

                    Alex Martelli <aleax@aleax.it> wrote:
                    > [alex@lancelot bo]$ time perl cio.pl </x/kjv.txt >/dev/null
                    > 0.07user 0.01system 0:00.11elapsed 72%CPU (0avgtext+0avgdata 0maxresident)k
                    > 0inputs+0outputs (330major+61minor)pagefaults 0swaps

                    > [alex@lancelot bo]$ time python cio.py 2 </x/kjv.txt >/dev/null
                    > 0.27user 0.00system 0:00.30elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
                    > 0inputs+0outputs (448major+278minor)pagefaults 0swaps

                    > [alex@lancelot bo]$ time python cio.py 1 </x/kjv.txt >/dev/null
                    > 0.07user 0.03system 0:00.10elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
                    > 0inputs+0outputs (447major+276minor)pagefaults 0swaps

                    > [alex@lancelot bo]$ time python cio.py 3 </x/kjv.txt >/dev/null
                    > 0.03user 0.02system 0:00.10elapsed 47%CPU (0avgtext+0avgdata 0maxresident)k
                    > 0inputs+0outputs (447major+275minor)pagefaults 0swaps

                    But, nothing can beat
                    time cat < kjv.txt > /dev/null
                    :-)

                    --
                    William Park, Open Geometry Consulting, <opengeometry@yahoo.ca>
                    Linux solution for data management and processing.


                    • Bengt Richter

                      #11
                      Re: Python slow for filter scripts

                      On 29 Oct 2003 06:34:22 GMT, William Park <opengeometry@yahoo.ca> wrote:

                      >Alex Martelli <aleax@aleax.it> wrote:
                      >> and I'm specifically reading the King James' Bible (an easily
                      >> available text so you can reproduce my results!) and writing
                      >
                      >Can you post URL for the Bible?
                      >
                      Try Project Gutenberg, at

                      Gutenberg.net


                      or their new host at



                      They have a number of bibles in various languages, and a ton (>10,000 e-texts) of other stuff,
                      also some audio texts, apparently. BTW I read somewhere that the BBC is going to make all their
                      archives, video and audio, freely available on the net, except where there is some legal reason
                      they can't. I guess they're a kind of FEF -- Free Entertainment Foundation (thank you British
                      telly owners ;-)

                      Apparently a new King James e-text is at (long URL, or use their search for "bible" (w/o quotes)
                      and go to entry #16):



                      They also have the Koran, BTW. It's interesting to compare word frequencies, e.g., the 20 most frequent
                      (unless I goofed) in the texts I downloaded:

                      "C:\Info\Linguistics\Gutenberg\bible\bible11.txt"
                      6647: 'LORD'
                      6649: 'him'
                      6856: 'is'
                      6893: 'be'
                      6971: 'they'
                      7249: 'for'
                      7972: 'a'
                      8388: 'his'
                      8854: 'I'
                      8940: 'unto'
                      9666: 'he'
                      9760: 'shall'
                      12353: 'in'
                      12592: 'that'
                      12846: 'And'
                      13429: 'to'
                      34472: 'of'
                      38891: 'and'
                      62135: 'the'

                      "C:\Info\Linguistics\Gutenberg\koran\koran10.txt"
                      1739: 'ye'
                      1752: 'with'
                      1956: 'And'
                      1979: 'for'
                      1991: 'who'
                      2037: 'be'
                      2108: 'not'
                      2186: 'that'
                      2254: 'shall'
                      2366: 'them'
                      2575: 'a'
                      2644: 'they'
                      2799: 'is'
                      2900: 'in'
                      3320: 'God'
                      5144: 'to'
                      6855: 'of'
                      6896: 'and'
                      10982: 'the'

                      Both start with the-and-of-to ;-)
                      (I hope this does not offend anyone ;-)
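
                      A count like the ones above can be sketched in a few lines of
                      Python 3. This is only an illustration: it assumes words are split
                      on whitespace with case preserved (the lists above keep 'And' and
                      'and' apart), and top_words is a hypothetical name. How the
                      original counts treated punctuation is not stated, so results on
                      the same texts may differ.

```python
from collections import Counter


def top_words(text, n=20):
    """Return the n most frequent whitespace-separated words in text
    as (word, count) pairs, most frequent first; case is preserved."""
    return Counter(text.split()).most_common(n)


# Tiny made-up sample, printed least frequent first like the lists above.
sample = "the cat and the dog and the bird"
for word, count in reversed(top_words(sample, 3)):
    print("%d: %r" % (count, word))
```

                      For a real text, the string would come from reading the downloaded
                      file, for example with open(path).read().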

                      Regards,
                      Bengt Richter


                      • Michael Hudson

                        #12
                        Re: Python slow for filter scripts

                        William Park <opengeometry@yahoo.ca> writes:

                        > Alex Martelli <aleax@aleax.it> wrote:
                        > > [alex@lancelot bo]$ time perl cio.pl </x/kjv.txt >/dev/null
                        > > 0.07user 0.01system 0:00.11elapsed 72%CPU (0avgtext+0avgdata 0maxresident)k
                        > > 0inputs+0outputs (330major+61minor)pagefaults 0swaps
                        >
                        > > [alex@lancelot bo]$ time python cio.py 2 </x/kjv.txt >/dev/null
                        > > 0.27user 0.00system 0:00.30elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
                        > > 0inputs+0outputs (448major+278minor)pagefaults 0swaps
                        >
                        > > [alex@lancelot bo]$ time python cio.py 1 </x/kjv.txt >/dev/null
                        > > 0.07user 0.03system 0:00.10elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
                        > > 0inputs+0outputs (447major+276minor)pagefaults 0swaps
                        >
                        > > [alex@lancelot bo]$ time python cio.py 3 </x/kjv.txt >/dev/null
                        > > 0.03user 0.02system 0:00.10elapsed 47%CPU (0avgtext+0avgdata 0maxresident)k
                        > > 0inputs+0outputs (447major+275minor)pagefaults 0swaps
                        >
                        > But, nothing can beat
                        > time cat < kjv.txt > /dev/null
                        > :-)

                        time true ?

                        (thinking of



                        )

                        Cheers,
                        mwh

                        --
                        58. Fools ignore complexity. Pragmatists suffer it. Some can avoid
                        it. Geniuses remove it.
                        -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html


                        • Peter Maas

                          #13
                          Re: Python slow for filter scripts

                          Peter Mutsaers wrote:
                          > I found that on my (linux) PC, the Python version was 4 times slower.
                          >
                          > Is that normal, does it disqualify Python for simple filter scripts?

                          Have a look at the win32 language shootout (http://dada.perl.it/shootout/)
                          with lots of benchmarks for lots of languages, among them python,
                          cygperl (perl with cygwin calls, I assume) and perl (perl with native
                          win32 calls). cygperl is usually faster than python, perl slower. You
                          can also look at Doug Bagley's Linux-based shootout, but it's older
                          (Perl 5.6, Python 2.1) and currently not maintained.

                          My impression is that on average perl is faster than python,
                          not by an order of magnitude but by about 20%.

                          Your test is probably closest to the "reverse file" benchmark
                          where cygperl : python : perl = 0.68 : 1.68 : 12.72 on win32
                          and perl : python = 1.06 : 1.17 on Linux.
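
                          For comparison with a plain copy, here is a minimal Python 3 sketch
                          of what a "reverse file" task might look like, assuming it means
                          reading every line of the input and writing the lines back out in
                          reverse order. That reading is an assumption, since the shootout's
                          exact rules are not quoted here, and reverse_lines is a
                          hypothetical name.

```python
import io


def reverse_lines(src, dst):
    """Write the lines of src to dst in reverse order."""
    lines = src.readlines()  # the whole input must be held in memory
    lines.reverse()
    dst.writelines(lines)


# Small in-memory demonstration.
demo = io.StringIO()
reverse_lines(io.StringIO("one\ntwo\nthree\n"), demo)
print(demo.getvalue(), end="")
```

                          Unlike a stdin-to-stdout copy, this cannot stream: nothing can be
                          written until the last line has been read, so it is only a loose
                          proxy for a simple filter script.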

                          Kind regards,

                          Peter Maas

                          --
                          -------------------------------------------------------------------
                          Peter Maas, M+R Infosysteme, D-52070 Aachen, Hubert-Wienen-Str. 24
                          Tel +49-241-93878-0 Fax +49-241-93878-20 eMail peter.maas@mplusr.de
                          -------------------------------------------------------------------


                          • John J. Lee

                            #14
                            Re: Python slow for filter scripts

                            Peter Maas <fpetermaas@netscape.net> writes:
                            [...]
                            > Your test is probably closest to the "reverse file" benchmark
                            > where cygperl : python : perl = 0.68 : 1.68 : 12.72 on win32
                            > and perl : python = 1.06 : 1.17 on Linux.
                            [...]

                            Why on earth is cygperl faster than perl? Is that really correct?


                            John
