dictionary comparison

  • rickle

    dictionary comparison

    I'm trying to compare sun patch levels on a server to those of what sun
    is recommending. For those that aren't familiar with sun patch
    numbering here is a quick run down.

    A patch number shows up like this:
    113680-03
    ^^^^^^ ^^
    patch# revision

    What I want to do is make a list. I want to show what server x has
    versus what sun recommends, and if the patch exists, but the revision
    is different, I want to show that difference.

    Here are some sample patches that sun recommends:
    117000-05
    116272-03
    116276-01
    116278-01
    116378-02
    116455-01
    116602-01
    116606-01

    Here are some sample patches that server x has:
    117000-01
    116272-02
    116272-01
    116602-02

    So there are some that are the same, some that sun recommends that
    server x doesn't have, and some where the patch is the same but the
    revision is different.

    I've thrown the data into dictionaries, but I just can't seem to figure
    out how I should actually compare the data and present it. Here's what
    I have so far (the split is in place because there is actually a lot
    more data in the file, so I split it out so I just get the patch number
    and revision). So I end up with (for example) 116272-01, then split so
    field[0] is 116272 and field[1] is 01.

    def sun():
        sun = open('sun-patchlist', 'r')
        for s in sun:
            sun_fields = s.split(None, 7)
            for sun_field in sun_fields:
                sun_field = sun_field.strip()
            sun_patch = {}
            sun_patch['number'] = sun_fields[0]
            sun_patch['rev'] = sun_fields[1]
            print sun_patch['number'], sun_patch['rev']
        sun.close()

    def serverx():
        serverx = open('serverx-patchlist', 'r')
        for p in serverx:
            serverx_fields = p.split(None, 7)
            for serverx_field in serverx_fields:
                serverx_field = serverx_field.strip()
            serverx_patch = {}
            serverx_patch['number'] = serverx_fields[0]
            serverx_patch['rev'] = serverx_fields[1]
            print serverx_patch['number'], serverx_patch['rev']
        serverx.close()

    if __name__=='__main__':
        sun()
        serverx()


    Right now I'm just printing the data, just to be sure that each
    dictionary contains the correct data, which it does. But now I need
    the comparison and I just can't seem to figure it out. I could
    probably write this in perl or a shell script, but I'm trying really
    hard to force myself to learn Python so I want this to be a python
    script, created with only built-in modules.

    Any help would be greatly appreciated,
    Rick

  • Bill Mill

    #2
    Re: dictionary comparison

    On 5 May 2005 08:19:31 -0700, rickle <devrick88@gmail.com> wrote:
    > [...]

    The first thing you should notice about this code is that you copied a
    good amount of code between functions; this should be a huge warning
    bell that something can be abstracted out into a function. In this
    case, it's the parsing of the patch files.

    Also, you should see that you're creating a new dictionary every
    iteration through the loop, and furthermore, you're not returning it
    at the end of your function. Thus, it's destroyed when the function
    exits and it goes out of scope.

    <snip>

    Anyway, since you at least made an effort, here's some totally
    untested code that should (I think) do something close to what you're
    looking for:

    def parse_patch_file(f):
        # map patch number -> revision, one 'patch-rev' entry per line
        patches = {}
        for line in f:
            patch, rev = line.strip().split('-')
            patches[patch] = rev
        return patches

    def diff_patches(sun, serverx):
        for patch in sun:
            if not serverx.has_key(patch):
                print "Sun recommends patch %s" % patch
        for patch in serverx:
            if not sun.has_key(patch):
                print "Serverx has unnecessary patch %s" % patch

    def diff_revs(sun, serverx):
        for patch, rev in sun.iteritems():
            if serverx.has_key(patch) and rev != serverx[patch]:
                # revisions are strings such as '05', so format them with %s
                print "Sun recommends rev %s of patch %s; serverx has rev %s" \
                    % (rev, patch, serverx[patch])

    if __name__ == '__main__':
        sun = parse_patch_file(open('sun-patchlist'))
        serverx = parse_patch_file(open('serverx-patchlist'))
        diff_patches(sun, serverx)
        diff_revs(sun, serverx)

    Hope this helps.

    Peace
    Bill Mill
    bill.mill at gmail.com


    • Jordan Rastrick

      #3
      Re: dictionary comparison

      rickle wrote:
      > [...]

      Well, it seems that what you're asking is more of a generic programming
      question than anything specific to Python - if you can think of how
      you'd solve this in Perl, for example, then a Python solution along the
      same lines would work just as well. I'm not sure if there was some
      specific issue with Python that was confusing you - if so, perhaps you
      could state it more explicitly.

      To address the problem itself, there are a few things about your
      approach in the above code that I find puzzling. First of all, the
      sun() and serverx() functions are identical, except for the name of the
      file they open. This kind of code duplication is bad practice, in
      Python, Perl, or any other language (even Shell scripting perhaps,
      although I wouldn't really know) - you should definitely use a single
      function that takes a filename as an argument instead.

      Second, you are creating a new dictionary inside every iteration of the
      for loop, one for every patch in the file; each dictionary you create
      contains one patch number and one revision number. This data is
      printed, and thereafter ignored (and thus will be consumed by Python's
      garbage collector). Hence you're not actually storing it for later use.
      I don't know whether this was because you were unsure how to proceed to
      comparing the two datasets; however I think what you probably
      wanted was to have a single dictionary, that keeps track of all the
      patches in the file. You need to define this outside the for loop; and,
      if you want to use it outside the body of the function, you'll need to
      return it. Also, rather than have a dictionary of two values, keyed by
      strings, I'd suggest a dictionary mapping patch numbers to their
      corresponding revision numbers is what you want.

      Once you've got two dictionaries - one for the server's patches, and
      one for Sun's recommended patches - you can compare the two sets of
      data by going through Sun's patches, checking if the server has that
      patch, and if so, calculating the difference in revision numbers.

      So here's a rough idea of how I'd suggest modifying what you've got to
      get the intended result:

      def patchlevels(filename):
          patchfile = open(filename, 'r')
          patch_dict = {}
          for line in patchfile:
              fields = line.split(None, 7)
              for field in fields:
                  field = field.strip()
              number = fields[0]
              rev = fields[1]
              patch_dict[number] = rev
              # print number, patch_dict[number]
          patchfile.close()
          return patch_dict

      if __name__=='__main__':
          sun = patchlevels('sun-patchlist')
          serverx = patchlevels('serverx-patchlist')
          print "Sun recommends:\t\t ", "Server has:\n"
          for patch in sun:
              if patch in serverx:
                  rev = serverx[patch]
                  diff = int(rev) - int(sun[patch])
                  serverhas = "Revision: %s Difference: %s" % (rev, diff)
              else:
                  serverhas = "Does not have this patch"
              print patch, sun[patch], "\t\t", serverhas

      I've tried to stay as close to your code as possible and not introduce
      new material, although I have had to use the inbuilt function int to
      convert the revision numbers from strings to integers in order to
      subtract one from the other; also, I used C printf-style string
      formatting on the line after. I hope it's reasonably obvious what these
      things do.

      For the sample data you gave, this outputs:

      Sun recommends:         Server has:

      116276 01               Does not have this patch
      116378 02               Does not have this patch
      116272 03               Revision: 01 Difference: -2
      116278 01               Does not have this patch
      116602 01               Revision: 02 Difference: 1
      116606 01               Does not have this patch
      116455 01               Does not have this patch
      117000 05               Revision: 01 Difference: -4

      Here negative differences mean the server's version of the patch is out
      of date, whereas zero or positive differences mean it's as recent as
      Sun's recommendation or better. You could change the nature of the
      output to whatever your own preference is easily enough. Or, if you
      want to store the data in some other structure like a list for further
      processing, instead of just printing it, that's also pretty simple to do.

      This code isn't exactly a work of art; I could have put more effort
      into a sensible name for the function and variables, made it more
      'pythonic' (e.g. by using a list-comprehension in place of the
      whitespace stripping for loop ), etc; but I think it achieves the
      desired result, or something close to it, right?

      Let me know if I was on completely the wrong track.
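
As a sketch of the two refinements mentioned in this post (a list comprehension in place of the stripping loop, and results collected in a list rather than printed), something like the following might work. It is written for Python 3, unlike the Python 2 code in the thread, and it assumes each line starts with a patch number and a revision separated by whitespace. The names `parse_fields` and `compare` are made up for illustration.

```python
def parse_fields(lines):
    """Map patch number -> revision, assuming whitespace-separated fields."""
    patches = {}
    for line in lines:
        # The list comprehension replaces the explicit stripping loop.
        fields = [field.strip() for field in line.split(None, 7)]
        if len(fields) < 2:
            continue  # assumption: skip lines that lack both fields
        patches[fields[0]] = fields[1]
    return patches


def compare(sun, serverx):
    """Return a list of (patch, sun_rev, server_rev, difference) tuples.

    server_rev and difference are None when the server lacks the patch.
    """
    results = []
    for patch in sorted(sun):
        if patch in serverx:
            diff = int(serverx[patch]) - int(sun[patch])
            results.append((patch, sun[patch], serverx[patch], diff))
        else:
            results.append((patch, sun[patch], None, None))
    return results


# Small in-memory stand-ins for the two patch list files.
sun = parse_fields(["117000 05", "116272 03", "116276 01"])
serverx = parse_fields(["117000 01", "116272 02", "116272 01"])
for row in compare(sun, serverx):
    print(row)
```

Returning tuples keeps the comparison separate from the presentation, so the same list could feed a report, a filter for out-of-date patches only, or a sort by difference.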


      • rickle

        #4
        Re: dictionary comparison

        Bill and Jordan, thank you both kindly. I'm not too well versed in
        functions in python and that's exactly what I needed. I could see I
        was doing something wrong in my original attempt, but I didn't know how
        to correct it.

        It's working like a charm now, thank you both very much.
        -Rick


        • James Stroud

          #5
          Re: dictionary comparison

          On Thursday 05 May 2005 10:20 am, so sayeth rickle:
          > Bill and Jordan, thank you both kindly. I'm not too well versed in
          > functions in python and that's exactly what I needed. I could see I
          > was doing something wrong in my original attempt, but I didn't know how
          > to correct it.
          >
          > It's working like a charm now, thank you both very much.
          > -Rick

          I thought I'd throw this in to show some things in Python that make such comparisons very easy to write, and also to recommend using the patch as key and version as value in the dict.

          Note that the meat of the code is really about 4 lines, thanks to the sets module and list comprehensions. Everything else is window dressing.

          James

          ====================================

          #!/usr/bin/env python

          from sets import Set

          # pretending these stripped from file
          recc_ary = ["117000-05", "116272-03", "116276-01", "116278-01", "116378-02", "116455-01", "116602-01", "116606-01"]
          serv_ary = ["117000-01", "116272-02", "116272-01", "116602-02"]


          # use patch as value and version as key
          recc_dct = dict([x.split("-") for x in recc_ary])
          serv_dct = dict([x.split("-") for x in serv_ary])

          # use Set to see if patches overlap
          overlap = Set(recc_dct.keys()).intersection(serv_dct.keys())

          # find differences (change comparison operator to <,>,<=,>=, etc.)
          diffs = [patch for patch in overlap if recc_dct[patch] != serv_dct[patch]]

          # print a pretty report
          for patch in diffs:
              print "recommended patch for %s (%s) is not server patch (%s)" % \
                  (patch, recc_dct[patch], serv_dct[patch])
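
The `sets` module used above is not available in Python 3, where the built-in `set` type and dictionary key views cover the same ground. A rough Python 3 equivalent, reusing the sample data and variable names from this post, might look like the following. `missing` is an extra name added here for illustration, and the sorting is only there to make the report order predictable.

```python
recc_ary = ["117000-05", "116272-03", "116276-01", "116278-01",
            "116378-02", "116455-01", "116602-01", "116606-01"]
serv_ary = ["117000-01", "116272-02", "116272-01", "116602-02"]

# Patch number is the key, revision is the value.
# A later duplicate such as 116272-01 overwrites the earlier 116272-02.
recc_dct = dict(x.split("-") for x in recc_ary)
serv_dct = dict(x.split("-") for x in serv_ary)

# Dictionary key views support set operations directly.
overlap = recc_dct.keys() & serv_dct.keys()
missing = recc_dct.keys() - serv_dct.keys()

# Patches present on both sides whose revisions differ.
diffs = sorted(p for p in overlap if recc_dct[p] != serv_dct[p])

for patch in diffs:
    print("recommended rev for %s (%s) is not server rev (%s)"
          % (patch, recc_dct[patch], serv_dct[patch]))
for patch in sorted(missing):
    print("server is missing recommended patch %s" % patch)
```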


          --
          James Stroud
          UCLA-DOE Institute for Genomics and Proteomics
          Box 951570
          Los Angeles, CA 90095



          • Bengt Richter

            #6
            Re: dictionary comparison

            On 5 May 2005 08:19:31 -0700, "rickle" <devrick88@gmail.com> wrote:
            [...]
            In place of sun_rec.splitlines() and x_has.splitlines() you can substitute
            open('sun-patchlist') and open('serverx-patchlist') respectively,
            and you can wrap it all in some routine for your convenience etc.
            But this shows, for each recommended patch, which recommended revs are there, which are missing, and which unrecommended revs are present.
            I added some test data to illustrate. You might want to make the input a little more forgiving about
            e.g. blank lines etc., or raise exceptions for what's not allowed or expected.

            ----< sunpatches.py >--------------------------------------------------------------
            #Here are some sample patches that sun recommends:
            sun_rec = """\
            117000-05
            116272-03
            116276-01
            116278-01
            116378-02
            116455-01
            116602-01
            116606-01
            testok-01
            testok-02
            testok-03
            test_0-01
            test_0-02
            test_0-03
            test_2-01
            test_2-02
            test_2-03
            test23-02
            test23-03
            """

            #Here are some sample patches that server x has:
            x_has = """\
            117000-01
            116272-02
            116272-01
            116602-02
            testok-01
            testok-02
            testok-03
            test_2-01
            test_2-02
            test23-01
            test23-02
            test23-03
            """

            def mkdict(lineseq):
                dct = {}
                for line in lineseq:
                    patch, rev = line.split('-')
                    dct.setdefault(patch, set()).add(rev)
                return dct

            dct_x_has = mkdict(x_has.splitlines()) # or e.g., mkdict(open('sunrecfile.txt'))
            dct_sun_rec = mkdict(sun_rec.splitlines())

            for sunpatch, sunrevs in sorted(dct_sun_rec.items()):
                xrevs = dct_x_has.get(sunpatch, set())
                print 'patch %s: recommended revs %s, missing %s, actual other %s'%(
                    sunpatch, map(str,sunrevs&xrevs) or '(none)',
                    map(str,sunrevs-xrevs) or '(none)', map(str,xrevs-sunrevs) or '(none)')
            ----------------------------------------------------------------------------------
            Result:

            [12:51] C:\pywk\clp>py24 sunpatches.py
            patch 116272: recommended revs (none), missing ['03'], actual other ['02', '01']
            patch 116276: recommended revs (none), missing ['01'], actual other (none)
            patch 116278: recommended revs (none), missing ['01'], actual other (none)
            patch 116378: recommended revs (none), missing ['02'], actual other (none)
            patch 116455: recommended revs (none), missing ['01'], actual other (none)
            patch 116602: recommended revs (none), missing ['01'], actual other ['02']
            patch 116606: recommended revs (none), missing ['01'], actual other (none)
            patch 117000: recommended revs (none), missing ['05'], actual other ['01']
            patch test23: recommended revs ['02', '03'], missing (none), actual other ['01']
            patch test_0: recommended revs (none), missing ['02', '03', '01'], actual other (none)
            patch test_2: recommended revs ['02', '01'], missing ['03'], actual other (none)
            patch testok: recommended revs ['02', '03', '01'], missing (none), actual other (none)

            Oops, didn't put multiple revs in sort order. Oh well, you can do that if you like.
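
Both loose ends mentioned in this post (more forgiving input, and revisions listed in sort order) could be handled roughly as follows. This is a Python 3 sketch rather than a drop-in replacement: `mkdict_lenient` and `fmt` are made-up names, blank lines are skipped, and any other line that does not look like `patch-rev` raises `ValueError`.

```python
def mkdict_lenient(lineseq):
    """Map patch -> set of revisions, skipping blank lines."""
    dct = {}
    for lineno, line in enumerate(lineseq, 1):
        line = line.strip()  # also drops the newline when reading a file
        if not line:
            continue
        patch, sep, rev = line.partition('-')
        if not (patch and sep and rev):
            raise ValueError('line %d: expected patch-rev, got %r'
                             % (lineno, line))
        dct.setdefault(patch, set()).add(rev)
    return dct


def fmt(revs):
    """Format a set of revisions in sorted order, or '(none)' if empty."""
    return ', '.join(sorted(revs)) or '(none)'


# A slice of the test data above, with a blank line and a trailing newline.
sun = mkdict_lenient(['test_2-01', '', 'test_2-03', 'test_2-02\n', 'test23-02'])
has = mkdict_lenient(['test_2-02', 'test_2-01', 'test23-01', 'test23-02'])

for patch, sunrevs in sorted(sun.items()):
    xrevs = has.get(patch, set())
    print('patch %s: recommended revs %s, missing %s, actual other %s' % (
        patch, fmt(sunrevs & xrevs), fmt(sunrevs - xrevs), fmt(xrevs - sunrevs)))
```

Stripping each line also matters when the input is an open file, since the lines then end in a newline that would otherwise become part of the revision string.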

            Regards,
            Bengt Richter


            • Bengt Richter

              #7
              Re: dictionary comparison

              On Thu, 5 May 2005 10:37:23 -0700, James Stroud <jstroud@mbi.ucla.edu> wrote:
              [...]
              We had the same impulse ;-)
              (see my other post in this thread)
              >
              ># use patch as value and version as key
              ??? seems the other way around (as it should be?)

              >recc_dct = dict([x.split("-") for x in recc_ary])
              >serv_dct = dict([x.split("-") for x in serv_ary])
              >
              But what about multiple revs for the same patch?

              Regards,
              Bengt Richter


              • James Stroud

                #8
                Re: dictionary comparison

                On Thursday 05 May 2005 01:18 pm, so sayeth Bengt Richter:
                > On Thu, 5 May 2005 10:37:23 -0700, James Stroud <jstroud@mbi.ucla.edu>
                > wrote: [...]
                > We had the same impulse ;-)
                > (see my other post in this thread)
                >
                > ># use patch as value and version as key
                >
                > ??? seems the other way around (as it should be?)

                Sorry, typo in the comment.

                >
                > >recc_dct = dict([x.split("-") for x in recc_ary])
                > >serv_dct = dict([x.split("-") for x in serv_ary])
                >
                > But what about multiple revs for the same patch?

                My Bad...

                serv_dct = dict([(a,max([z for y,z in [f.split("-") for f in serv_ary] if a==y]))
                for a,b in [g.split("-") for g in serv_ary]])

                ;o)
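
The nested comprehension above rescans `serv_ary` once for every entry. A single pass that keeps the highest revision seen for each patch gives the same result on the sample data. This is a Python 3 sketch, and `highest_revs` is a hypothetical helper name. It compares revisions as integers, on the assumption that they are always numeric.

```python
def highest_revs(entries):
    """Map each patch number to the highest revision among the entries."""
    best = {}
    for entry in entries:
        patch, rev = entry.split("-")
        # Compare numerically so that e.g. "10" beats "9".
        if patch not in best or int(rev) > int(best[patch]):
            best[patch] = rev
    return best


serv_ary = ["117000-01", "116272-02", "116272-01", "116602-02"]
serv_dct = highest_revs(serv_ary)
print(serv_dct)
```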

                James

                --
                James Stroud
                UCLA-DOE Institute for Genomics and Proteomics
                Box 951570
                Los Angeles, CA 90095

