Which is faster?

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • cnb

    Which is faster?

    For a big nbr of it might matter?
    Is av_grade O(n*2) and the first O(n) when it comes to adding or is
    "sum x for x in y" just traversing the list ones, accumulating the
    values, it doesnt first build the list and then travese it for sum?

    def averageGrade(se lf):
    tot = 0
    for review in self.reviews:
    tot += review.grade
    return tot / len(self.review s)

    def av_grade(self):
    return sum(review.grad e for review in self.reviews) / \
    len(self.review s)
  • Fredrik Lundh

    #2
    Re: Which is faster?

    cnb wrote:
    For a big nbr of it might matter?
    Is av_grade O(n*2) and the first O(n) when it comes to adding or is
    "sum x for x in y" just traversing the list ones, accumulating the
    values, it doesnt first build the list and then travese it for sum?
    sum() doesn't build a list, but even if it would do that, it'd still be
    O(n) -- looping over an array twice doesn't change the time complexity.

    to find out which one's actually faster, benchmark them.

    </F>

    Comment

    • Gabriel Genellina

      #3
      Re: Which is faster?

      En Sat, 30 Aug 2008 01:26:35 -0300, cnb <circularfunc@y ahoo.seescribiï ¿½:
      For a big nbr of it might matter?
      Is av_grade O(n*2) and the first O(n) when it comes to adding or is
      "sum x for x in y" just traversing the list ones, accumulating the
      values, it doesnt first build the list and then travese it for sum?
      AFAIK both versions are O(n).
      sum(x for x in y) contains a "generator expression" [1], it does not
      create an intermediate list.
      sum([x for x in y]) contains a "list comprehension" [2] and it does create
      an intermediate list.

      I'd say the version using sum() should be faster, because the iteration is
      implemented inside the interpreter, not in Python code as the other one.
      But you should actually measure their relative performance; use the timeit
      module for that.
      def averageGrade(se lf):
      tot = 0
      for review in self.reviews:
      tot += review.grade
      return tot / len(self.review s)
      >
      def av_grade(self):
      return sum(review.grad e for review in self.reviews) / \
      len(self.review s)
      [1] http://docs.python.org/tut/node11.ht...00000000000000
      [2] http://docs.python.org/tut/node7.html
      --
      Gabriel Genellina

      Comment

      • Steven D'Aprano

        #4
        Re: Which is faster?

        On Fri, 29 Aug 2008 21:26:35 -0700, cnb wrote:
        def averageGrade(se lf):
        tot = 0
        for review in self.reviews:
        tot += review.grade
        return tot / len(self.review s)
        >
        def av_grade(self):
        return sum(review.grad e for review in self.reviews) / \
        len(self.review s)
        Re-writing the functions so they can be tested alone:

        def averageGrade(al ist):
        tot = 0.0
        for x in alist:
        tot += x
        return tot/len(alist)


        def av_grade(alist) :
        return sum(alist)/len(alist)

        >>from timeit import Timer
        >># small amount of items
        .... alist = range(100)
        >>Timer('averag eGrade(alist)',
        .... 'from __main__ import alist, averageGrade'). repeat(number=1 00000)
        [3.9559240341186 523, 3.4910569190979 004, 3.4856188297271 729]
        >>>
        >>Timer('av_gra de(alist)',
        .... 'from __main__ import alist, av_grade').repe at(number=10000 0)
        [2.0255107879638 672, 1.0968310832977 295, 1.0733180046081 543]


        The version with sum() is much faster. How about with lots of data?
        >>alist = xrange(1000000)
        >>Timer('averag eGrade(alist)',
        .... 'from __main__ import alist, averageGrade'). repeat(number=5 0)
        [17.699107885360 718, 18.182793140411 377, 18.651514053344 727]
        >>>
        >>Timer('av_gra de(alist)',
        .... 'from __main__ import alist, av_grade').repe at(number=50)
        [17.125216007232 666, 15.726368904113 77, 16.309713840484 619]

        sum() is still a little faster.



        --
        Steven

        Comment

        • Gabriel Genellina

          #5
          Re: Which is faster?

          En Sat, 30 Aug 2008 03:15:30 -0300, Steven D'Aprano
          <steve@remove-this-cybersource.com .auescribi�:
          On Fri, 29 Aug 2008 21:26:35 -0700, cnb wrote:
          >
          >def averageGrade(se lf):
          > tot = 0
          > for review in self.reviews:
          > tot += review.grade
          > return tot / len(self.review s)
          >>
          >def av_grade(self):
          > return sum(review.grad e for review in self.reviews) / \
          > len(self.review s)
          >
          Re-writing the functions so they can be tested alone:
          >
          def averageGrade(al ist):
          tot = 0.0
          for x in alist:
          tot += x
          return tot/len(alist)
          >
          >
          def av_grade(alist) :
          return sum(alist)/len(alist)
          >
          >
          >>>from timeit import Timer
          >>># small amount of items
          ... alist = range(100)
          >>>Timer('avera geGrade(alist)' ,
          ... 'from __main__ import alist, averageGrade'). repeat(number=1 00000)
          [3.9559240341186 523, 3.4910569190979 004, 3.4856188297271 729]
          >>>>
          >>>Timer('av_gr ade(alist)',
          ... 'from __main__ import alist, av_grade').repe at(number=10000 0)
          [2.0255107879638 672, 1.0968310832977 295, 1.0733180046081 543]
          >
          >
          The version with sum() is much faster. How about with lots of data?
          >
          >>>alist = xrange(1000000)
          >>>Timer('avera geGrade(alist)' ,
          ... 'from __main__ import alist, averageGrade'). repeat(number=5 0)
          [17.699107885360 718, 18.182793140411 377, 18.651514053344 727]
          >>>>
          >>>Timer('av_gr ade(alist)',
          ... 'from __main__ import alist, av_grade').repe at(number=50)
          [17.125216007232 666, 15.726368904113 77, 16.309713840484 619]
          >
          sum() is still a little faster.
          Mmm, in this last test you're measuring the long integer operations
          performance (because the sum exceeds largely what can be represented in a
          plain integer). Long integers are so slow that the difference between both
          loops becomes negligible.

          I've tried again using float values:
          alist = [float(x) for x in xrange(1000000)]
          and got consistent results for any input size (the version using sum() is
          about twice as fast as the for loop)

          --
          Gabriel Genellina

          Comment

          • cnb

            #6
            Re: Which is faster?

            how does doing something twice not change complexity? yes it maybe
            belongs to the same complexity-class but is still twice as slow no?

            Comment

            • Fredrik Lundh

              #7
              Re: Which is faster?

              cnb wrote:
              how does doing something twice not change complexity? yes it maybe
              belongs to the same complexity-class but is still twice as slow no?
              doing two things quickly can still be faster than doing one thing slowly.

              </F>

              Comment

              • Diez B. Roggisch

                #8
                Re: Which is faster?

                cnb schrieb:
                how does doing something twice not change complexity? yes it maybe
                belongs to the same complexity-class but is still twice as slow no?
                Because big O notation is not about constant factors. Or even subterms
                with lower powers.



                Diez

                Comment

                • Lie

                  #9
                  Re: Which is faster?

                  On Aug 30, 5:30 pm, cnb <circularf...@y ahoo.sewrote:
                  how does doing something twice not change complexity? yes it maybe
                  belongs to the same complexity-class but is still twice as slow no?
                  Who is doing something twice? Definitely not sum().

                  sum() does not create intermediate list, and if you pass generator
                  expressions in it you wouldn't make any intermediate list at all, thus
                  simple looping but in interpreter code. sum() that is passed a list
                  comprehension should be faster for extremely small numbers of values
                  to sum, but we don't care about small things, do we? But using
                  intermediate list should be fast enough for even large numbers of
                  values, the time when it can't cope anymore would be when the
                  intermediate list takes half of your memory (how often is that for
                  regular applications?).

                  Comment

                  • Fredrik Lundh

                    #10
                    Re: Which is faster?

                    Lie wrote:
                    >how does doing something twice not change complexity? yes it maybe
                    >belongs to the same complexity-class but is still twice as slow no?
                    >
                    Who is doing something twice? Definitely not sum().
                    nobody's claiming that -- but as mentioned above, even if sum() had done
                    two passes over the source sequence, it'd still be O(n).

                    </F>

                    Comment

                    • Raymond Hettinger

                      #11
                      Re: Which is faster?

                      On Aug 29, 9:26 pm, cnb <circularf...@y ahoo.sewrote:
                      def av_grade(self):
                           return sum(review.grad e for review in self.reviews) / \
                                    len(self.review s)
                      Minor point. Consider making the divisor: float(len(self. reviews)).
                      It would be a bummer to throw-off the average because of floor
                      division.

                      Also consider an itertools approach:
                      sum(itertools.i map(operator.it emgetter('revie w'), self.reviews))

                      BTW, the sum() in Py2.6 is *much* faster than before. It should run
                      circles around any other approach.

                      As Effbot says, if you really care about speed, then just time both
                      approaches. However, it's a good run of thumb that builtins are hard
                      to beat by simulating them in pure python (unless you're using Psyco
                      as an optimizer).


                      Raymond

                      Comment

                      • Steven D'Aprano

                        #12
                        Re: Which is faster?

                        On Sat, 30 Aug 2008 04:11:33 -0300, Gabriel Genellina wrote:
                        Mmm, in this last test you're measuring the long integer operations
                        performance (because the sum exceeds largely what can be represented in
                        a plain integer). Long integers are so slow that the difference between
                        both loops becomes negligible.
                        Nicely caught, and thank you for pointing that out.


                        --
                        Steven

                        Comment

                        Working...