compressed serialization module

  • Mark

    compressed serialization module

    I used pickle and found the file was saved in text format. I wonder
    whether anyone is familiar with a good compact off-the-shelf module
    available that will save in compressed format... or maybe an opinion
    on a smart approach for making a custom one? Appreciate it! I'm a
    bit of a n00b but have been looking around. I found a serialize.py
    but it seems like overkill.

    Mark
  • Joe Strout

    #2
    Re: compressed serialization module

    On Nov 17, 2008, at 10:47 AM, Mark wrote:
    > I used pickle and found the file was saved in text format. I wonder
    > whether anyone is familiar with a good compact off-the-shelf module
    > available that will save in compressed format... or maybe an opinion
    > on a smart approach for making a custom one?

    Well, here's a thought: create a zip file (using the standard zipfile
    module), and pickle your data into that.

    HTH,
    - Joe
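Editor's note: a minimal sketch of Joe's suggestion, written for Python 3 rather than the Python 2.5 used in this thread. The helper names `save_to_zip` and `load_from_zip` and the member name `data.pkl` are made up for illustration. The pickle is still built as an in-memory bytes object before `writestr` stores it, so this does not avoid the intermediate string.

```python
import io
import pickle
import zipfile


def save_to_zip(obj, archive, member="data.pkl"):
    """Pickle obj and store it as one deflate-compressed member of a zip archive."""
    payload = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(member, payload)


def load_from_zip(archive, member="data.pkl"):
    """Read the named member back and unpickle it."""
    with zipfile.ZipFile(archive, "r") as zf:
        return pickle.loads(zf.read(member))


# Round trip through an in-memory buffer; passing a filename works the same way.
buf = io.BytesIO()
save_to_zip({"a": [1, 2, 3]}, buf)
restored = load_from_zip(buf)
```

Only unpickle archives you trust: unpickling can execute arbitrary code.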


    • skip@pobox.com

      #3
      Re: compressed serialization module

      > I used pickle and found the file was saved in text format. I wonder
      > whether anyone is familiar with a good compact off-the-shelf module
      > available that will save in compressed format... or maybe an opinion
      > on a smart approach for making a custom one?

      Joe> Well, here's a thought: create a zip file (using the standard
      Joe> zipfile module), and pickle your data into that.

      Also, specify a pickle binary protocol. Here's a silly example:
      >>> len(pickle.dumps([1,2,3], pickle.HIGHEST_PROTOCOL))
      14
      >>> len(pickle.dumps([1,2,3], 0))
      18

      Skip

      • Nick Craig-Wood

        #4
        Re: compressed serialization module

        skip@pobox.com <skip@pobox.com> wrote:
        > > I used pickle and found the file was saved in text format. I wonder
        > > whether anyone is familiar with a good compact off-the-shelf module
        > > available that will save in compressed format... or maybe an opinion
        > > on a smart approach for making a custom one?
        >
        > Joe> Well, here's a thought: create a zip file (using the standard
        > Joe> zipfile module), and pickle your data into that.
        >
        > Also, specify a pickle binary protocol. Here's a silly example:
        >
        > >>> len(pickle.dumps([1,2,3], pickle.HIGHEST_PROTOCOL))
        > 14
        > >>> len(pickle.dumps([1,2,3], 0))
        > 18
        Or even
        >>> L = range(100)
        >>> a = pickle.dumps(L)
        >>> len(a)
        496
        >>> b = a.encode("bz2")
        >>> len(b)
        141
        >>> c = b.decode("bz2")
        >>> M = pickle.loads(c)
        >>> M == L
        True
        >>>

        --
        Nick Craig-Wood <nick@craig-wood.com> -- http://www.craig-wood.com/nick
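Editor's note: `str.encode("bz2")` is a Python 2 idiom. A rough Python 3 equivalent of Nick's session, assuming the `bz2` module's one-shot `compress`/`decompress` functions stand in for the codec, might look like the sketch below. Sizes are not shown because they differ from the Python 2 numbers above.

```python
import bz2
import pickle

L = list(range(100))
a = pickle.dumps(L, 0)    # protocol 0 is the text format that Python 2 used by default
b = bz2.compress(a)       # stands in for a.encode("bz2")
c = bz2.decompress(b)     # stands in for b.decode("bz2")
M = pickle.loads(c)

print(len(a), len(b), M == L)
```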

        • Mark

          #5
          Re: compressed serialization module


          Thanks guys. This is for serializing to disk. I was hoping to not
          have to use too many intermediate steps, but I couldn't figure out how
          to pickle data into zipfile without using either intermediate string
          or file. That's cool; here's what I'll probably settle on (tested).
          Now I just need to reverse the steps for the open function.

          def saveOjb(self, dataObj):
              fName = self.version + '_' + self.modname + '.dat'
              f = open(fName, 'w')
              dStr = pickle.dumps(dataObj)
              c = dStr.encode("bz2")
              pickle.dump(c, f, pickle.HIGHEST_PROTOCOL)
              f.close()

          I'm glad to see that "encode()" is not one of the string ops on the
          deprecate list (using Python 2.5).

          Thx,
          Mark

          • skip@pobox.com

            #6
            Re: compressed serialization module


            Mark> def saveOjb(self, dataObj):
            Mark>     fName = self.version + '_' + self.modname + '.dat'
            Mark>     f = open(fName, 'w')
            Mark>     dStr = pickle.dumps(dataObj)
            Mark>     c = dStr.encode("bz2")
            Mark>     pickle.dump(c, f, pickle.HIGHEST_PROTOCOL)
            Mark>     f.close()

            Hmmm... Why pickle it twice?

            def saveOjb(self, dataObj):
                fName = self.version + '_' + self.modname + '.dat'
                f = open(fName, 'wb')
                f.write(pickle.dumps(dataObj, pickle.HIGHEST_PROTOCOL).encode("bz2"))
                f.close()

            Skip
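Editor's note: Mark mentions still needing the reverse steps for the open function. A minimal Python 3 sketch of a matching save/load pair follows, assuming `bz2.compress` replaces the Python 2 `encode("bz2")` call. The names `save_obj` and `load_obj` and the explicit `path` argument are illustrative; the thread's method builds the filename from `self.version` and `self.modname` instead.

```python
import bz2
import os
import pickle
import tempfile


def save_obj(path, data_obj):
    """Pickle once, compress, and write the bytes in binary mode."""
    with open(path, "wb") as f:
        f.write(bz2.compress(pickle.dumps(data_obj, pickle.HIGHEST_PROTOCOL)))


def load_obj(path):
    """Reverse the steps: read the bytes, decompress, unpickle."""
    with open(path, "rb") as f:
        return pickle.loads(bz2.decompress(f.read()))


# Usage example with a throwaway file.
path = os.path.join(tempfile.mkdtemp(), "example.dat")
save_obj(path, {"version": "1.0", "values": [1, 2, 3]})
loaded = load_obj(path)
```

Opening the file in binary mode on both sides matters because the compressed pickle is arbitrary bytes, not text.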

            • Mark

              #7
              Re: compressed serialization module

              On Nov 17, 3:08 pm, s...@pobox.com wrote:
              >     Mark> def saveOjb(self, dataObj):
              >     Mark>     fName = self.version + '_' + self.modname + '.dat'
              >     Mark>     f = open(fName, 'w')
              >     Mark>     dStr = pickle.dumps(dataObj)
              >     Mark>     c = dStr.encode("bz2")
              >     Mark>     pickle.dump(c, f, pickle.HIGHEST_PROTOCOL)
              >     Mark>     f.close()
              >
              > Hmmm...  Why pickle it twice?
              >
              >     def saveOjb(self, dataObj):
              >         fName = self.version + '_' + self.modname + '.dat'
              >         f = open(fName, 'wb')
              >         f.write(pickle.dumps(dataObj, pickle.HIGHEST_PROTOCOL).encode("bz2"))
              >         f.close()
              >
              > Skip

              I wasn't sure whether the string object was still a string after
              "encode" is called... at least whether it's still an ascii string.
              And if not, whether it could be used w/ dumps. I tested your
              variation and it works the same. I guess your "write" is doing the
              same as my "dump", but may be more efficient. Thanks.

              • greg

                #8
                Re: compressed serialization module

                Mark wrote:
                > Thanks guys. This is for serializing to disk. I was hoping to not
                > have to use too many intermediate steps

                You should be able to use a gzip.GzipFile
                or bz2.BZ2File and pickle straight into it.

                --
                Greg
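Editor's note: a short Python 3 sketch of Greg's idea using gzip, assuming `gzip.open` in binary mode (it returns a `GzipFile`). The file name and sample data are placeholders. `pickle.dump` writes straight into the compressed stream, so no intermediate string holding the whole pickle is built by hand.

```python
import gzip
import os
import pickle
import tempfile

data = {"numbers": list(range(1000))}
path = os.path.join(tempfile.mkdtemp(), "example.pkl.gz")

# Pickle directly into the compressed file object.
with gzip.open(path, "wb") as f:
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)

# Unpickle directly from it.
with gzip.open(path, "rb") as f:
    restored = pickle.load(f)
```

Swapping `gzip.open` for `bz2.open` should give the bz2 variant with the same structure.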

                • Nick Craig-Wood

                  #9
                  Re: compressed serialization module

                  greg <greg@cosc.canterbury.ac.nz> wrote:
                  > Mark wrote:
                  > > Thanks guys. This is for serializing to disk. I was hoping to not
                  > > have to use too many intermediate steps
                  >
                  > You should be able to use a gzip.GzipFile
                  > or bz2.BZ2File and pickle straight into it.
                  Good idea - that will be much more memory efficient. E.g.
                  >>> import bz2
                  >>> import pickle
                  >>> L = range(100)
                  >>> f = bz2.BZ2File("z.dat", "wb")
                  >>> pickle.dump(L, f)
                  >>> f.close()
                  >>> f = bz2.BZ2File("z.dat", "rb")
                  >>> M = pickle.load(f)
                  >>> f.close()
                  >>> M == L
                  True
                  >>>
                  (Note that basic pickle protocol is likely to be more compressible
                  than the binary version!)

                  --
                  Nick Craig-Wood <nick@craig-wood.com> -- http://www.craig-wood.com/nick

                  • greg

                    #10
                    Re: compressed serialization module

                    Nick Craig-Wood wrote:
                    > (Note that basic pickle protocol is likely to be more compressible
                    > than the binary version!)

                    Although the binary version may be more compact to
                    start with. It would be interesting to compare the
                    two and see which one wins.

                    --
                    Greg

                    • Nick Craig-Wood

                      #11
                      Re: compressed serialization module

                      greg <greg@cosc.canterbury.ac.nz> wrote:
                      > Nick Craig-Wood wrote:
                      > > (Note that basic pickle protocol is likely to be more compressible
                      > > than the binary version!)
                      >
                      > Although the binary version may be more compact to
                      > start with. It would be interesting to compare the
                      > two and see which one wins.
                      It is very data dependent of course, but in this case the binary
                      version wins...

                      However there is exactly the same amount of information in the text
                      pickle and the binary pickle, so in theory a perfect compressor will
                      compress each to exactly the same size ;-)
                      >>> import os
                      >>> import bz2
                      >>> import pickle
                      >>> L = range(1000000)
                      >>> f = bz2.BZ2File("z.dat", "wb")
                      >>> pickle.dump(L, f)
                      >>> f.close()
                      >>> os.path.getsize("z.dat")
                      1055197L
                      >>> f = bz2.BZ2File("z1.dat", "wb")
                      >>> pickle.dump(L, f, -1)
                      >>> f.close()
                      >>> os.path.getsize("z1.dat")
                      524741L
                      >>>
                      Practical considerations might be that bz2 is quite CPU expensive. It
                      also has quite a large overhead, e.g.
                      >>> len("a".encode("bz2"))
                      37

                      So if you are compressing lots of small things, zip is a better
                      protocol:
                      >>> len("a".encode("zip"))
                      9

                      It is also much faster!

                      --
                      Nick Craig-Wood <nick@craig-wood.com> -- http://www.craig-wood.com/nick
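Editor's note: to repeat the comparison from this post on your own data under Python 3, something like the sketch below may help. It assumes `zlib.compress` as the counterpart of the Python 2 `"zip"` codec and `bz2.compress` for `"bz2"`. The helper name `compressed_sizes` is made up, and the list is shorter than the one in the thread so it runs quickly. Which protocol and codec come out ahead depends on the data, so no winner is asserted here.

```python
import bz2
import pickle
import zlib


def compressed_sizes(obj):
    """Return {(protocol, codec): size in bytes} for text and binary pickles."""
    sizes = {}
    for protocol in (0, pickle.HIGHEST_PROTOCOL):
        raw = pickle.dumps(obj, protocol)
        sizes[(protocol, "raw")] = len(raw)
        sizes[(protocol, "zlib")] = len(zlib.compress(raw))
        sizes[(protocol, "bz2")] = len(bz2.compress(raw))
    return sizes


sizes = compressed_sizes(list(range(100000)))
for key in sorted(sizes):
    print(key, sizes[key])

# Fixed overhead of each codec on a one-byte input.
tiny_bz2 = len(bz2.compress(b"a"))
tiny_zlib = len(zlib.compress(b"a"))
print(tiny_bz2, tiny_zlib)
```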
