Summing a 2D list

  • Mark

    Summing a 2D list

    Hi all,

    I have a scenario where I have a list like this:

    User Score
    1 0
    1 1
    1 5
    2 3
    2 1
    3 2
    4 3
    4 3
    4 2

    And I need to add up the score for each user to get something like
    this:

    User Score
    1 6
    2 4
    3 2
    4 8

    Is this possible? If so, how can I do it? I've tried looping through
    the arrays and not had much luck so far.

    Any help much appreciated,

    Mark
  • Diez B. Roggisch

    #2
    Re: Summing a 2D list

    Mark wrote:
    > Hi all,
    >
    > I have a scenario where I have a list like this:
    >
    > User Score
    > 1 0
    > 1 1
    > 1 5
    > 2 3
    > 2 1
    > 3 2
    > 4 3
    > 4 3
    > 4 2
    >
    > And I need to add up the score for each user to get something like
    > this:
    >
    > User Score
    > 1 6
    > 2 4
    > 3 2
    > 4 8
    >
    > Is this possible? If so, how can I do it? I've tried looping through
    > the arrays and not had much luck so far.
    >
    > Any help much appreciated,

    Show us your efforts in code so far. Especially what the actual data looks
    like. Then we can suggest a solution.

    Diez


    • Chris

      #3
      Re: Summing a 2D list

      On Jun 12, 3:48 pm, Mark <markjtur...@gmail.com> wrote:
      > Hi all,
      >
      > I have a scenario where I have a list like this:
      >
      > User            Score
      > 1                 0
      > 1                 1
      > 1                 5
      > 2                 3
      > 2                 1
      > 3                 2
      > 4                 3
      > 4                 3
      > 4                 2
      >
      > And I need to add up the score for each user to get something like
      > this:
      >
      > User            Score
      > 1                 6
      > 2                 4
      > 3                 2
      > 4                 8
      >
      > Is this possible? If so, how can I do it? I've tried looping through
      > the arrays and not had much luck so far.
      >
      > Any help much appreciated,
      >
      > Mark

      user_score = {}
      for record in records:  # each record is a "user score" string
          user, score = record.split()
          score = int(score)  # convert first, or += would concatenate strings
          if user in user_score:
              user_score[user] += score
          else:
              user_score[user] = score

      print '\n'.join(['%s\t%s' % (user, score) for user, score in
                       sorted(user_score.items())])

      You don't mention what data structure you are keeping your records in
      but hopefully this helps you in the right direction.
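
      A minimal Python 3 sketch of the same idea, assuming the records are
      plain "user score" strings. The sample data and the helper name
      sum_scores are made up for illustration. The call to int() is the
      important part: adding the raw strings would concatenate them
      instead of summing them.

```python
# Hypothetical sample data: each record is a "user score" string.
records = ["1 0", "1 1", "1 5", "2 3", "2 1", "3 2", "4 3", "4 3", "4 2"]


def sum_scores(records):
    """Return a dict mapping user (str) to the integer sum of its scores."""
    totals = {}
    for record in records:
        user, score = record.split()
        # int() matters: adding the raw strings would give "015", not 6.
        totals[user] = totals.get(user, 0) + int(score)
    return totals


totals = sum_scores(records)
for user, score in sorted(totals.items()):
    print("%s\t%d" % (user, score))
```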


      • Mark

        #4
        Re: Summing a 2D list

        On Jun 12, 3:02 pm, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
        > Mark wrote:
        >> Hi all,
        >>
        >> I have a scenario where I have a list like this:
        >>
        >> User            Score
        >> 1                 0
        >> 1                 1
        >> 1                 5
        >> 2                 3
        >> 2                 1
        >> 3                 2
        >> 4                 3
        >> 4                 3
        >> 4                 2
        >>
        >> And I need to add up the score for each user to get something like
        >> this:
        >>
        >> User            Score
        >> 1                 6
        >> 2                 4
        >> 3                 2
        >> 4                 8
        >>
        >> Is this possible? If so, how can I do it? I've tried looping through
        >> the arrays and not had much luck so far.
        >>
        >> Any help much appreciated,
        >
        > Show us your efforts in code so far. Especially what the actual data looks
        > like. Then we can suggest a solution.
        >
        > Diez

        Hi Diez, thanks for the quick reply.

        To be honest I'm relatively new to Python, so I don't know too much
        about how all the loop constructs work and how they differ to other
        languages. I'm building an app in Django and this data is coming out
        of a database and it looks like what I put up there!

        This was my (failed) attempt:

        predictions = Prediction.objects.all()
        scores = []
        for prediction in predictions:
            i = [prediction.predictor.id, 0]
            if prediction.predictionscore:
                i[1] += int(prediction.predictionscore)
            scores.append(i)

        I did have another loop in there (I'm fairly sure I need one) but that
        didn't work either. I don't imagine that snippet is very helpful,
        sorry!

        Any tips would be gratefully received!

        Thanks,

        Mark


        • Aidan

          #5
          Re: Summing a 2D list

          Mark wrote:
          > Hi all,
          >
          > I have a scenario where I have a list like this:
          >
          > User Score
          > 1 0
          > 1 1
          > 1 5
          > 2 3
          > 2 1
          > 3 2
          > 4 3
          > 4 3
          > 4 2
          >
          > And I need to add up the score for each user to get something like
          > this:
          >
          > User Score
          > 1 6
          > 2 4
          > 3 2
          > 4 8
          >
          > Is this possible? If so, how can I do it? I've tried looping through
          > the arrays and not had much luck so far.
          >
          > Any help much appreciated,
          >
          > Mark

          does this work for you?


          users = [1,1,1,2,2,3,4,4,4]
          score = [0,1,5,3,1,2,3,3,2]

          d = dict()

          for u,s in zip(users,score):
              if d.has_key(u):
                  d[u] += s
              else:
                  d[u] = s

          for key in d.keys():
              print 'user: %d\nscore: %d\n' % (key,d[key])


          • John Salerno

            #6
            Re: Summing a 2D list

            "Mark" <markjturner@gmail.com> wrote in message
            news:c0461b8e-a60d-43f3-b0ab-6d2030dcd149@79g2000hsk.googlegroups.com...
            On Jun 12, 3:02 pm, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
            Mark wrote:
            ---
            This was my (failed) attempt:

            predictions = Prediction.objects.all()
            scores = []
            for prediction in predictions:
                i = [prediction.predictor.id, 0]
                if prediction.predictionscore:
                    i[1] += int(prediction.predictionscore)
                scores.append(i)
            ---

            Your question sounds like a fun little project, but can you post what the
            actual list of users/scores looks like? Is it a list of tuples like this:

            [(1, 0), (1, 1) ... ]

            Or something else?



            • Diez B. Roggisch

              #7
              Re: Summing a 2D list

              > To be honest I'm relatively new to Python, so I don't know too much
              > about how all the loop constructs work and how they differ to other
              > languages. I'm building an app in Django and this data is coming out
              > of a database and it looks like what I put up there!
              >
              > This was my (failed) attempt:
              >
              > predictions = Prediction.objects.all()
              > scores = []
              > for prediction in predictions:
              >     i = [prediction.predictor.id, 0]
              >     if prediction.predictionscore:
              >         i[1] += int(prediction.predictionscore)
              >     scores.append(i)
              >
              > I did have another loop in there (I'm fairly sure I need one) but that
              > didn't work either. I don't imagine that snippet is very helpful,
              > sorry!

              It is helpful because it tells us what your actual data looks like.

              What you need is to get a list of (predictor, score)-pairs. These you should
              be able to get like this:

              l = [(p.predictor.id, p.predictionscore) for p in predictions]

              Now you need to sort this list - because in the next step, we will aggregate
              the values for each predictor.

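              l.sort()  # equal predictors must be adjacent for the loop below

              result = []
              current_predictor = None
              total_sum = 0
              for predictor, score in l:
                  if predictor != current_predictor:
                      # only if we really have a current_predictor,
                      # the non-existent first one doesn't count
                      if current_predictor is not None:
                          result.append((current_predictor, total_sum))
                      total_sum = 0
                      current_predictor = predictor
                  total_sum += score
              # the last predictor's total is still pending after the loop
              if current_predictor is not None:
                  result.append((current_predictor, total_sum))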

              That should be roughly it.
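
              The same sort-then-aggregate idea can also be sketched with
              itertools.groupby from the standard library, which is a swapped-in
              technique rather than the loop above. This is a Python 3 sketch;
              the sample pairs and the name sum_sorted_groups are assumptions
              made for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (predictor, score) pairs standing in for the database rows,
# deliberately out of order to show why the sort is needed.
pairs = [(1, 0), (4, 3), (1, 1), (2, 3), (1, 5), (2, 1), (3, 2), (4, 3), (4, 2)]


def sum_sorted_groups(pairs):
    """Sort by predictor, then sum each run of equal predictors."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [(predictor, sum(score for _, score in group))
            for predictor, group in groupby(ordered, key=itemgetter(0))]


print(sum_sorted_groups(pairs))
```

              groupby only merges adjacent items with the same key, so skipping
              the sort would produce several partial totals per predictor.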

              Diez


              • Mark

                #8
                Re: Summing a 2D list

                John, it's a QuerySet coming from a database in Django. I don't know
                enough about the structure of this object to go into detail I'm
                afraid.

                Aidan, I got an error trying your suggestion: 'zip argument #2 must
                support iteration', I don't know what this means!

                Thanks to all who have answered! Sorry I'm not being very specific!


                • Aidan

                  #9
                  Re: Summing a 2D list

                  Mark wrote:
                  > John, it's a QuerySet coming from a database in Django. I don't know
                  > enough about the structure of this object to go into detail I'm
                  > afraid.
                  >
                  > Aidan, I got an error trying your suggestion: 'zip argument #2 must
                  > support iteration', I don't know what this means!

                  well, if we can create 2 iterable sequences one which contains the user
                  the other the scores, it should work

                  the error means that the second argument to the zip function was not an
                  iterable, such as a list tuple or string

                  can you show me the lines you're using to retrieve the data sets from
                  the database? then i might be able to tell you how to build the 2 lists
                  you need.

                  > Thanks to all who have answered! Sorry I'm not being very specific!
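
                  To illustrate the zip error described above, here is a small
                  Python 3 sketch. The integer 5 is just a made-up stand-in for
                  "something that is not iterable", and the exact wording of the
                  error message differs between Python versions, so it is printed
                  rather than assumed.

```python
users = [1, 1, 1, 2, 2, 3, 4, 4, 4]

# Passing a non-iterable (here a plain int) as the second argument fails.
try:
    zip(users, 5)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# With two iterables, zip pairs the items up position by position.
scores = [0, 1, 5, 3, 1, 2, 3, 3, 2]
pairs = list(zip(users, scores))
print(pairs[:3])
```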


                  • Aidan

                    #10
                    Re: Summing a 2D list

                    Aidan wrote:
                    > Mark wrote:
                    >> John, it's a QuerySet coming from a database in Django. I don't know
                    >> enough about the structure of this object to go into detail I'm
                    >> afraid.
                    >>
                    >> Aidan, I got an error trying your suggestion: 'zip argument #2 must
                    >> support iteration', I don't know what this means!
                    >
                    > well, if we can create 2 iterable sequences one which contains the user
                    > the other the scores, it should work
                    >
                    > the error means that the second argument to the zip function was not an
                    > iterable, such as a list tuple or string
                    >
                    > can you show me the lines you're using to retrieve the data sets from
                    > the database? then i might be able to tell you how to build the 2 lists
                    > you need.

                    wait you already did...

                    predictions = Prediction.objects.all()
                    pairs = [(p.predictor.id, p.predictionscore) for p in predictions]

                    those 2 lines will build a list of user/score pairs. you can then
                    replace the call to zip with pairs, i.e. loop with "for u,s in pairs:"

                    any luck?
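
                    Put together, the suggestion might look like the Python 3 sketch
                    below. It cannot use Django here, so the Predictor and Prediction
                    namedtuples are hypothetical stand-ins that model only the two
                    attributes mentioned in the thread (predictor.id and
                    predictionscore).

```python
from collections import namedtuple

# Hypothetical stand-ins for the Django objects: only the two attributes
# used in the thread (predictor.id and predictionscore) are modelled.
Predictor = namedtuple("Predictor", "id")
Prediction = namedtuple("Prediction", "predictor predictionscore")

rows = [(1, 0), (1, 1), (1, 5), (2, 3), (2, 1), (3, 2), (4, 3), (4, 3), (4, 2)]
predictions = [Prediction(Predictor(u), s) for u, s in rows]

pairs = [(p.predictor.id, p.predictionscore) for p in predictions]

d = {}
for u, s in pairs:  # pairs takes the place of zip(users, score)
    if u in d:
        d[u] += s
    else:
        d[u] = s

for key in sorted(d):
    print("user: %d\nscore: %d\n" % (key, d[key]))
```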


                    • Gerhard Häring

                      #11
                      Re: Summing a 2D list

                      Mark wrote:
                      > John, it's a QuerySet coming from a database in Django. I don't know
                      > enough about the structure of this object to go into detail I'm
                      > afraid. [...]

                      Then let the database do the summing up. That's what it's there for :-)

                      select user, sum(score) from score_table
                      group by user

                      or something very similar, depending on the actual database schema. I
                      don't know how to do this with Django's ORM, but this is the way to do
                      it in plain SQL.
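
                      As a rough illustration of letting the database do the work,
                      the query can be tried against an in-memory SQLite database
                      using Python's standard sqlite3 module. The table and column
                      names are taken from the query above and are assumptions about
                      the schema, not Mark's actual tables.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions. "user" is
# quoted because it is a reserved word in some databases.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE score_table ("user" INTEGER, score INTEGER)')
conn.executemany(
    "INSERT INTO score_table VALUES (?, ?)",
    [(1, 0), (1, 1), (1, 5), (2, 3), (2, 1), (3, 2), (4, 3), (4, 3), (4, 2)],
)

# GROUP BY collapses the rows per user; SUM adds up each group's scores.
rows = conn.execute(
    'SELECT "user", SUM(score) FROM score_table GROUP BY "user" ORDER BY "user"'
).fetchall()
conn.close()

for user, total in rows:
    print(user, total)
```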

                      -- Gerhard


                      • Gerhard Häring

                        #12
                        Counting things fast - was Re: Summing a 2D list

                        Aidan wrote:
                        > does this work for you?
                        >
                        > users = [1,1,1,2,2,3,4,4,4]
                        > score = [0,1,5,3,1,2,3,3,2]
                        >
                        > d = dict()
                        >
                        > for u,s in zip(users,score):
                        >     if d.has_key(u):
                        >         d[u] += s
                        >     else:
                        >         d[u] = s
                        >
                        > for key in d.keys():
                        >     print 'user: %d\nscore: %d\n' % (key,d[key])

                        I've recently had the very same problem and needed to optimize for the
                        best solution. I've tried quite a few, including:

                        1) using a dictionary with a default value

                        d = collections.defaultdict(lambda: 0)
                        d[key] += value

                        2) Trying out if avoiding object allocation is worth the effort. Using
                        Cython:

                        cdef class Counter:
                            cdef int _counter
                            def __init__(self):
                                self._counter = 0

                            def inc(self):
                                self._counter += 1

                            def __int__(self):
                                return self._counter

                            def __iadd__(self, operand):
                                self._counter += 1
                                return self

                        And no, this was *not* faster than the final solution. This counter
                        class, which is basically a mutable int, is exactly as fast as just
                        using this one (final solution) - tada!

                        counter = {}
                        try:
                            counter[key] += 1
                        except KeyError:
                            counter[key] = 1

                        Using psyco makes this a bit faster still. psyco can't optimize
                        defaultdict or my custom Counter class, though.
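
                        One way to compare the two pure-Python variants above is the
                        standard timeit module. This is only a rough Python 3 sketch:
                        the key list and the function names are invented here, and the
                        timings depend on the machine, the Python version and how many
                        keys repeat, so no figures are claimed.

```python
import collections
import timeit

# Hypothetical input: a list of keys with many repeats.
keys = [i % 100 for i in range(10000)]


def count_defaultdict(keys):
    """Count with a dictionary that supplies a default value of 0."""
    d = collections.defaultdict(lambda: 0)
    for key in keys:
        d[key] += 1
    return dict(d)


def count_try_except(keys):
    """Count with a plain dict, handling the first occurrence via KeyError."""
    counter = {}
    for key in keys:
        try:
            counter[key] += 1
        except KeyError:
            counter[key] = 1
    return counter


for func in (count_defaultdict, count_try_except):
    seconds = timeit.timeit(lambda: func(keys), number=20)
    print("%s: %.4f s" % (func.__name__, seconds))
```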

                        -- Gerhard


                        • Mark

                          #13
                          Re: Summing a 2D list

                          On Jun 12, 3:45 pm, Aidan <awe...@gmail.com> wrote:
                          > Aidan wrote:
                          >> Mark wrote:
                          >>> John, it's a QuerySet coming from a database in Django. I don't know
                          >>> enough about the structure of this object to go into detail I'm
                          >>> afraid.
                          >>>
                          >>> Aidan, I got an error trying your suggestion: 'zip argument #2 must
                          >>> support iteration', I don't know what this means!
                          >>
                          >> well, if we can create 2 iterable sequences one which contains the user
                          >> the other the scores, it should work
                          >>
                          >> the error means that the second argument to the zip function was not an
                          >> iterable, such as a list tuple or string
                          >>
                          >> can you show me the lines you're using to retrieve the data sets from
                          >> the database? then i might be able to tell you how to build the 2 lists
                          >> you need.
                          >
                          > wait you already did...
                          >
                          > predictions = Prediction.objects.all()
                          > pairs = [(p.predictor.id, p.predictionscore) for p in predictions]
                          >
                          > those 2 lines will build a list of user/score pairs.  you can then
                          > replace the call to zip with pairs
                          >
                          > any luck?

                          Thanks Aidan, this works great!

                          Thanks also to everyone else, I'm sure your suggestions would have
                          worked too if I'd been competent enough to do them properly!


                          • Paddy

                            #14
                            Re: Counting things fast - was Re: Summing a 2D list

                            On Jun 12, 4:14 pm, Gerhard Häring <g...@ghaering.de> wrote:
                            > Aidan wrote:
                            >> does this work for you?
                            >>
                            >> users = [1,1,1,2,2,3,4,4,4]
                            >> score = [0,1,5,3,1,2,3,3,2]
                            >>
                            >> d = dict()
                            >>
                            >> for u,s in zip(users,score):
                            >>     if d.has_key(u):
                            >>         d[u] += s
                            >>     else:
                            >>         d[u] = s
                            >>
                            >> for key in d.keys():
                            >>     print 'user: %d\nscore: %d\n' % (key,d[key])
                            >
                            > I've recently had the very same problem and needed to optimize for the
                            > best solution. I've tried quite a few, including:
                            >
                            > 1) using a dictionary with a default value
                            >
                            > d = collections.defaultdict(lambda : 0)
                            > d[key] += value
                            >
                            <<SNIP>>
                            > -- Gerhard

                            This might be faster, by avoiding the lambda:

                            d = collections.defaultdict(int)
                            d[key] += value

                            - Paddy.
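
                            Applied to the original question, the defaultdict(int) variant
                            could be sketched like this in Python 3. The pairs list is
                            hypothetical sample data matching the table in the first post.

```python
import collections

# Hypothetical (user, score) pairs matching the data in the question.
pairs = [(1, 0), (1, 1), (1, 5), (2, 3), (2, 1), (3, 2), (4, 3), (4, 3), (4, 2)]

d = collections.defaultdict(int)  # int() returns 0, so missing keys start at 0
for key, value in pairs:
    d[key] += value

print(sorted(d.items()))
```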


                            • Karsten Heymann

                              #15
                              Re: Summing a 2D list

                              Hi Mark,

                              Mark <markjturner@gmail.com> writes:
                              > I have a scenario where I have a list like this:
                              >
                              > User Score
                              > 1 0
                              > 1 1
                              > 1 5
                              > 2 3
                              > 2 1
                              > 3 2
                              > 4 3
                              > 4 3
                              > 4 2
                              >
                              > And I need to add up the score for each user to get something like
                              > this:
                              >
                              > User Score
                              > 1 6
                              > 2 4
                              > 3 2
                              > 4 8
                              >
                              > Is this possible? If so, how can I do it? I've tried looping through
                              > the arrays and not had much luck so far.

                              Although your problem has already been solved, I'd like to present a
                              different approach which can be quite a bit faster. The most common
                              approach seems to be using a dictionary:

                              summed_up={}
                              for user,vote in pairs:
                                  if summed_up.has_key(user):
                                      summed_up[user]+=vote
                                  else:
                                      summed_up[user]=vote

                              But if the user IDs are compact and the maximum ID is known
                              beforehand, then using a list and encoding the user as the list
                              position is much more elegant:

                              # one slot per ID; index 0 stays unused if IDs start at 1
                              summed_up = [0] * (max_user + 1)
                              for user,vote in pairs:
                                  summed_up[user] += vote

                              I've run a quick and dirty test on these approaches and found that the
                              latter takes only half the time of the first. More precisely, with
                              about 2 million pairs, I got:

                              * dict approach: 2s
                              (4s with "try: ... except KeyError:" instead of the "if")
                              * list approach: 0.9s

                              BTW this was inspired by the book "Programming Pearls", which I read
                              some years ago, where a similar approach saved some orders of
                              magnitude of time (using a bit field instead of a list to store
                              reserved/free phone numbers, IIRC).
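
                              A small Python 3 sketch of the list-based variant, assuming user
                              IDs are small positive integers. The sample pairs are made up,
                              and max_user is computed here only for the demonstration; in the
                              scenario described above it would be known in advance.

```python
# Hypothetical (user, vote) pairs; user ids are small positive integers.
pairs = [(1, 0), (1, 1), (1, 5), (2, 3), (2, 1), (3, 2), (4, 3), (4, 3), (4, 2)]

max_user = max(user for user, _ in pairs)

# One slot per possible id; index 0 is unused because ids start at 1.
summed_up = [0] * (max_user + 1)
for user, vote in pairs:
    summed_up[user] += vote

for user in range(1, max_user + 1):
    print(user, summed_up[user])
```

                              The trade-off is memory: the list needs a slot for every ID up to
                              the maximum, so sparse or very large IDs favour the dictionary.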

                              Yours,
                              Karsten
