unintuitive dict timings

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Bob van der Poel

    unintuitive dict timings


    There was a thread a few days ago about populating a dict/list. I've got
    a bit of bottleneck in my program so grabbed the ideas and did a timing
    test....

    ----------

    import time
    import random

    # Test of various methods to add an item to a list in a dict.

    def ifelse(key, val):
    if d.has_key(key):
    d[key].append(val)
    else:
    d[key]=[val]

    def get1(key,val):
    l=d.get(key) or []
    l.append(val)
    d[key]=l

    def get2(key,val):
    l=d.get(key, [])
    l.append(val)
    d[key]=l

    for p in (ifelse, get1, get2):
    tm=time.clock()
    d={}
    kc=1000
    print "Function: %s Keycount %s" % (p, kc)
    for t in range(100000):
    key=random.rand range(kc)
    val=random.rand range(999)
    p(key,val)

    print "Time", time.clock()-tm

    ------------

    Before running, my thoughts were that get2() would have been the fastest
    (due to the fact there were less statements) and ifelse() would have
    been the slowest. Surprisingly (to me) ifelse() is the fastest() and
    get2() is the slowest.

    Which begs the question...is there a faster way? BTW, interesting (again
    to me) that the most readable is also the fastest.



    --
    Bob van der Poel ** Wynndel, British Columbia, CANADA **
    EMAIL: bvdpoel@kootena y.com
    WWW: http://www.kootenay.com/~bvdpoel


  • Ludovico Magnocavallo

    #2
    Re: unintuitive dict timings

    Bob van der Poel wrote:
    [color=blue]
    > # Test of various methods to add an item to a list in a dict.
    >
    > def ifelse(key, val):
    > if d.has_key(key):
    > d[key].append(val)
    > else:
    > d[key]=[val][/color]

    You could also add:

    def ifelse2(key, val):
    if key in d:
    d[key].append(val)
    else:
    d[key]=[val]

    which on my machine is faster than either get1 or get2, and slightly
    slower than ifelse.

    Ludo

    Comment

    • Alex Martelli

      #3
      Re: unintuitive dict timings

      Bob van der Poel wrote:
      ...[color=blue]
      > Before running, my thoughts were that get2() would have been the fastest
      > (due to the fact there were less statements) and ifelse() would have
      > been the slowest. Surprisingly (to me) ifelse() is the fastest() and
      > get2() is the slowest.[/color]

      Not necessarily -- pulling out the generation of keys and values to ensure
      every function is tested on the same ones:

      kc=1000
      keys_and_values = [
      (random.randran ge(kc), random.randrang e(999))
      for t in xrange(100000)
      ]
      for p in (get2, get1, ifelse):
      tm=time.clock()
      d={}
      print "Function: %s Keycount %s" % (p, kc)
      for key, val in keys_and_values :
      p(key,val)

      print "Time", time.clock()-tm

      the numbers are a bit too close to call given the precision of measurement:

      [alex@lancelot pop]$ python ti.py
      Function: <function get2 at 0x402dced4> Keycount 1000
      Time 0.43
      Function: <function get1 at 0x402dce9c> Keycount 1000
      Time 0.44
      Function: <function ifelse at 0x402dce64> Keycount 1000
      Time 0.41

      [color=blue]
      > Which begs the question...is there a faster way? BTW, interesting (again
      > to me) that the most readable is also the fastest.[/color]

      def sede(key, val):
      d.setdefault(ke y, []).append(val)

      appears to be very marginally fastest, but again it's a close call:

      [alex@lancelot pop]$ python ti.py
      Function: <function sede at 0x402dcf44> Keycount 1000
      Time 0.37
      Function: <function get2 at 0x402dcf0c> Keycount 1000
      Time 0.45
      Function: <function get1 at 0x402dced4> Keycount 1000
      Time 0.44
      Function: <function ifelse at 0x402dce9c> Keycount 1000
      Time 0.4

      I think you're probably best-advised to stay with what you deem most
      readable: a perhaps-10%-win can hardly justify losing readability. As
      for me, I find setdefault quite readable despite the ugly name. I sure
      wish we DID have a way to tell a method to be lazy wrt one of its
      arguments, tho;-).


      Alex


      Comment

      • Charl P. Botha

        #4
        Re: unintuitive dict timings

        In article <3f89a527_1@dns .sd54.bc.ca>, Bob van der Poel wrote:[color=blue]
        > been the slowest. Surprisingly (to me) ifelse() is the fastest() and
        > get2() is the slowest.
        >
        > Which begs the question...is there a faster way? BTW, interesting (again[/color]

        <pedantic hat on>
        Please don't use "begging the question" when you actually mean "raising the
        question". See here for the *correct* meaning and use of "begging the
        question": http://skepdic.com/begging.html
        <pedantic hat off>

        --
        charl p. botha http://cpbotha.net/ http://visualisation.tudelft.nl/

        Comment

        • Bob van der Poel

          #5
          Re: unintuitive dict timings



          Alex Martelli wrote:
          [color=blue]
          >
          > def sede(key, val):
          > d.setdefault(ke y, []).append(val)
          >
          > appears to be very marginally fastest, but again it's a close call:[/color]

          Ahhh, an example of setdefault(). I looked at this in the docs but it
          really didn't make much sense.... I'll give this a try just to see what
          difference it makes.

          Thanks.

          --
          Bob van der Poel ** Wynndel, British Columbia, CANADA **
          EMAIL: bvdpoel@kootena y.com
          WWW: http://www.kootenay.com/~bvdpoel

          Comment

          • Dave Benjamin

            #6
            Re: unintuitive dict timings

            In article <3f89e8bf$1_1@d ns.sd54.bc.ca>, Bob van der Poel wrote:[color=blue]
            >
            > Alex Martelli wrote:
            >[color=green]
            >>
            >> def sede(key, val):
            >> d.setdefault(ke y, []).append(val)
            >>
            >> appears to be very marginally fastest, but again it's a close call:[/color]
            >
            > Ahhh, an example of setdefault(). I looked at this in the docs but it
            > really didn't make much sense.... I'll give this a try just to see what
            > difference it makes.[/color]

            Another solution would be to subclass dict to make a dictionary that
            automatically creates lists as needed. For example:

            class dict_of_lists(d ict):
            def __getitem__(sel f, key):
            try:
            return dict.__getitem_ _(self, key)
            except KeyError:
            self[key] = []
            return self[key]

            dol = dict_of_lists()
            dol['a'].append(1)
            dol['b'].append(2)
            dol['c'].append(3)
            dol['c'].append(4)
            dol['c'].append(5)
            print dol

            Output: {'a': [1], 'c': [3, 4, 5], 'b': [2]}

            --
            ..:[ dave benjamin (ramenboy) -:- www.ramenfest.com -:- www.3dex.com ]:.
            : d r i n k i n g l i f e o u t o f t h e c o n t a i n e r :

            Comment

            Working...