tarfile.open(mode='w:gz'|'w|gz'|..., fileobj=StringIO()) fails.

  • sebastian.noack@googlemail.com

    tarfile.open(mode='w:gz'|'w|gz'|..., fileobj=StringIO()) fails.

    Hi,

    Is there a way to use tarfile to create a gzip- or bzip2-compressed
    archive in memory, or at least a reason why I cannot?

    You might want to answer "use StringIO", but this isn't as easy as it
    seems. ;) I am using Python 2.5.2, by the way. I think this is a bug,
    at least in this version of Python, but maybe StringIO just isn't
    file-like enough for this quirky tarfile module. That, however, would
    conflict with its documentation:

    "For special purposes, there is a second format for mode: 'filemode|
    [compression]'. open() will return a TarFile object that processes its
    data as a stream of blocks. No random seeking will be done on the
    file. If given, fileobj may be any object that has a read() or write()
    method (depending on the mode)."

    Sounds good, but doesn't work. ;P StringIO provides read() and
    write() methods, among others. But tarfile has problems with the
    StringIO object, especially in this mode.

    I extracted the code from my project into a standalone Python script
    to demonstrate the issue at the lowest level. You can run the script
    below as follows: ./StringIO-tarfile.py file1 [file2] [...]


    #!/usr/bin/env python
    #
    # File: StringIO-tarfile.py
    #

    from StringIO import StringIO
    import tarfile
    import sys

    def create_tar_file(filenames, fileobj, mode, result_cb=lambda f: None):
        # Write the given files into a tar archive on top of fileobj.
        tar_file = tarfile.open(mode=mode, fileobj=fileobj)
        for f in filenames:
            tar_file.add(f)
        # The callback reads from fileobj before the archive is closed.
        result = result_cb(fileobj)
        tar_file.close()
        return result

    if __name__ == '__main__':
        files = sys.argv[1:]
        # All six write modes: 'w:', 'w:gz', 'w:bz2', 'w|', 'w|gz', 'w|bz2'.
        modes = ['w%s%s' % (x, y)
                 for x in (':', '|') for y in ('', 'gz', 'bz2')]

        string_io_cb = lambda f: f.getvalue()

        for mode in modes:
            # Map the mode to a file extension, e.g. 'w|gz' -> '-pipe.tar.gz'.
            ext = mode.replace('w|', '-pipe.tar.').replace('w:', '.tar.')
            ext = ext.rstrip('.')

            # StringIO test.
            content = create_tar_file(files, StringIO(), mode, string_io_cb)
            fd = open('StringIO%s' % ext, 'wb')
            fd.write(content)
            fd.close()

            # file object test.
            fd = open('file%s' % ext, 'wb')
            create_tar_file(files, fd, mode)
            fd.close()


    As test input, I used a directory containing a single text file. As
    you can see below, all tests using plain file objects were successful.
    But when using StringIO, I can only create uncompressed tar files.
    Even though I don't get any errors when creating them, most of the
    files are empty or truncated.


    $ for f in `ls *.tar{,.gz,.bz2}`; do echo -n $f; du -h $f |
        awk '{print " ("$1"B)"}'; tar -tf $f; echo; done

    file-pipe.tar (84KB)
    foo/
    foo/ksp-fosdem2008.txt

    file-pipe.tar.bz2 (20KB)
    foo/
    foo/ksp-fosdem2008.txt

    file-pipe.tar.gz (20KB)
    foo/
    foo/ksp-fosdem2008.txt

    file.tar (84KB)
    foo/
    foo/ksp-fosdem2008.txt

    file.tar.bz2 (20KB)
    foo/
    foo/ksp-fosdem2008.txt

    file.tar.gz (20KB)
    foo/
    foo/ksp-fosdem2008.txt

    StringIO-pipe.tar (76KB)
    foo/
    foo/ksp-fosdem2008.txt
    tar: Unexpected EOF in archive
    tar: Error is not recoverable: exiting now

    StringIO-pipe.tar.bz2 (0B)
    tar: This does not look like a tar archive
    tar: Error exit delayed from previous errors

    StringIO-pipe.tar.gz (0B)
    tar: This does not look like a tar archive
    tar: Error exit delayed from previous errors

    StringIO.tar (76KB)
    foo/
    foo/ksp-fosdem2008.txt

    StringIO.tar.bz2 (0B)
    tar: This does not look like a tar archive
    tar: Error exit delayed from previous errors

    StringIO.tar.gz (4.0KB)

    gzip: stdin: unexpected end of file
    tar: Child returned status 1
    tar: Error exit delayed from previous errors


    Can somebody reproduce this problem? Did I misunderstand the API? If
    I am right, what would be the best workaround? I am thinking about
    using the gzip and bz2 modules directly.

    Regards
    Sebastian Noack
  • Gabriel Genellina

    #2
    Re: tarfile.open(mode='w:gz'|'w|gz'|..., fileobj=StringIO()) fails.

    On Mon, 26 May 2008 17:44:28 -0300, sebastian.noack@googlemail.com
    <sebastian.noack@googlemail.com> wrote:
    > Is there a way to use tarfile to create a gzip- or bzip2-compressed
    > archive in memory, or at least a reason why I cannot?
    >
    > You might want to answer "use StringIO", but this isn't as easy as it
    > seems. ;) I am using Python 2.5.2, by the way. I think this is a bug,
    > at least in this version of Python, but maybe StringIO just isn't
    > file-like enough for this quirky tarfile module. That, however, would
    > conflict with its documentation.
    >
    > def create_tar_file(filenames, fileobj, mode, result_cb=lambda f: None):
    >     tar_file = tarfile.open(mode=mode, fileobj=fileobj)
    >     for f in filenames:
    >         tar_file.add(f)
    >     result = result_cb(fileobj)
    >     tar_file.close()
    >     return result

    It's not a bug: you must extract the StringIO contents *after* closing
    tar_file, or else you won't get the last blocks still pending to be
    written.

    --
    Gabriel Genellina
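
A minimal sketch of the ordering described in #2, not code from the thread. It assumes a current Python 3, where io.BytesIO stands in for StringIO, and it builds the archive member in memory so that no input files are needed. The names build_gzip_archive and hello.txt, and the payload, are made up for illustration.

```python
import io
import tarfile


def build_gzip_archive():
    """Return the buffer contents seen before and after tar.close()."""
    buf = io.BytesIO()
    tar = tarfile.open(mode="w:gz", fileobj=buf)

    # Add one small member straight from memory.
    payload = b"hello world\n"
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

    before = buf.getvalue()  # too early: end-of-archive data not written yet
    tar.close()              # writes the tar trailer and the gzip trailer
    after = buf.getvalue()   # the complete archive
    return before, after


before, after = build_gzip_archive()
print(len(before), len(after))

# The bytes taken after close() can be read back as a gzip-compressed tar.
with tarfile.open(fileobj=io.BytesIO(after), mode="r:gz") as archive:
    names = archive.getnames()
print(names)
```

Only the bytes taken after close() are guaranteed to form a complete archive; the earlier snapshot is at least missing the trailers.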


    • sebastian.noack@googlemail.com

      #3
      Re: tarfile.open(mode='w:gz'|'w|gz'|..., fileobj=StringIO()) fails.

      On May 27, 2:17 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
      wrote:
      > It's not a bug, you must extract the StringIO contents *after* closing
      > tar_file, else you won't get the last blocks pending to be written.

      I looked at tarfile's source code last night after I wrote this
      message and figured that out. But the problem is that TarFile's
      close method closes the underlying file object after the last block
      is written, and once a StringIO is closed you cannot get its content
      anymore. Wtf, why does it close the underlying file? There is
      absolutely no reason for doing this. Are you still sure this isn't a
      bug?

      Regards
      Sebastian Noack
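
The second half of the complaint in #3, that a closed in-memory buffer can no longer be read, can be shown in isolation. This is a small sketch assuming a current Python 3 and io.BytesIO in place of StringIO; it does not involve tarfile at all, and the variable names are illustrative.

```python
import io

buf = io.BytesIO()
buf.write(b"some archive bytes")

data = buf.getvalue()  # works while the buffer is open
buf.close()

error = None
try:
    buf.getvalue()     # the contents are gone once the buffer is closed
except ValueError as exc:
    error = exc
    print("closed buffer:", exc)
```

So whoever closes the buffer first decides whether the caller ever sees the archive bytes.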


      • sebastian.noack@googlemail.com

        #4
        Re: tarfile.open(mode='w:gz'|'w|gz'|..., fileobj=StringIO()) fails.

        I have written a FileWrapper class as a workaround, which works for
        me (see the code below). The FileWrapper object holds an internal
        file-like object and forwards its attributes, but prevents the user
        (in this case tarfile) from closing the internal file, so I can
        still access StringIO's content after closing the TarFile object.

        But this should not be required to create in-memory tar files. It
        is definitely a bug that TarFile closes external file objects
        passed to tarfile.open when the TarFile object is closed. The code
        that opens a file is also the code responsible for closing it.

        Regards
        Sebastian Noack


        #!/usr/bin/env python
        #
        # File: StringIO-tarfile.py
        #

        from StringIO import StringIO
        import tarfile
        import sys

        class FileWrapper(object):
            def __init__(self, fileobj):
                self.file = fileobj
                self.closed = fileobj.closed

            def __getattr__(self, name):
                # Raise AttributeError, if it isn't a file attribute.
                if name not in dir(file):
                    raise AttributeError(name)

                # Get the attribute of the internal file object.
                value = getattr(self.file, name)

                # Raise a ValueError, if the attribute is callable (e.g. an
                # instance method) and the FileWrapper is closed.
                if callable(value) and self.closed:
                    raise ValueError('I/O operation on closed file')
                return value

            def close(self):
                # Only mark the wrapper as closed; the internal file stays open.
                self.closed = True

        def create_tar_file(filenames, fileobj, mode):
            tar_file = tarfile.open(mode=mode, fileobj=fileobj)
            for f in filenames:
                tar_file.add(f)
            tar_file.close()

        if __name__ == '__main__':
            files = sys.argv[1:]
            modes = ['w%s%s' % (x, y)
                     for x in (':', '|') for y in ('', 'gz', 'bz2')]

            for mode in modes:
                ext = mode.replace('w|', '-pipe.tar.').replace('w:', '.tar.')
                ext = ext.rstrip('.')

                # StringIO test: read the contents after the archive is closed.
                stream = FileWrapper(StringIO())
                create_tar_file(files, stream, mode)
                fd = open('StringIO%s' % ext, 'wb')
                fd.write(stream.file.getvalue())
                stream.file.close()
                fd.close()

                # file object test.
                fd = open('file%s' % ext, 'wb')
                create_tar_file(files, fd, mode)
                fd.close()
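
For comparison, the same "ignore close()" idea could be sketched on a current Python 3 by subclassing io.BytesIO instead of wrapping it. This is an assumption-laden variant, not the FileWrapper from the post: KeepOpenBytesIO and its release() method are hypothetical names, and the explicit buf.close() call merely simulates a consumer that closes the buffer it was handed.

```python
import io
import tarfile


class KeepOpenBytesIO(io.BytesIO):
    """Hypothetical helper: a BytesIO that ignores close() until released."""

    def close(self):
        # Ignore close requests from whoever the buffer was handed to.
        pass

    def release(self):
        # Really close the buffer once the caller is done with it.
        super().close()


buf = KeepOpenBytesIO()
tar = tarfile.open(mode="w:bz2", fileobj=buf)
payload = b"hello world\n"
info = tarfile.TarInfo(name="hello.txt")
info.size = len(payload)
tar.addfile(info, io.BytesIO(payload))
tar.close()

buf.close()                  # simulate a consumer closing the buffer
still_open = not buf.closed  # the close request was ignored
data = buf.getvalue()        # so the archive bytes are still available
buf.release()

with tarfile.open(fileobj=io.BytesIO(data), mode="r:bz2") as archive:
    names = archive.getnames()
print(still_open, names)
```

Subclassing keeps every other BytesIO method intact, at the cost of working only for in-memory buffers, whereas the wrapper in the post accepts any file-like object.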


        • Lars Gustäbel

          #5
          Re: tarfile.open(mode='w:gz'|'w|gz'|..., fileobj=StringIO()) fails.

          On Tue, May 27, 2008 at 01:51:47AM -0700, sebastian.noack@googlemail.com wrote:
          > I have written a FileWrapper class as a workaround, which works for
          > me (see the code below). The FileWrapper object holds an internal
          > file-like object and forwards its attributes, but prevents the user
          > (in this case tarfile) from closing the internal file, so I can
          > still access StringIO's content after closing the TarFile object.
          >
          > But this should not be required to create in-memory tar files. It
          > is definitely a bug that TarFile closes external file objects
          > passed to tarfile.open when the TarFile object is closed. The code
          > that opens a file is also the code responsible for closing it.

          You're right, _BZ2Proxy.close() calls the wrapped file object's close() method,
          and that is definitely not the desired behaviour. So if you can do without the
          'bz2' modes for now, your problem is gone; all other modes work fine.

          I fixed it (r63744), so the next beta release will work as expected. Your test
          script helped a lot, thanks.

          Regards,

          --
          Lars Gustäbel
          lars@gustaebel.de

          A casual stroll through a lunatic asylum shows that
          faith does not prove anything.
          (Friedrich Nietzsche)
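
To check whether a given interpreter still closes the caller's buffer, something like the following sketch could be run. It assumes a current Python 3 with the bz2 module available and uses io.BytesIO in place of StringIO; archive_in_memory, list_members and hello.txt are illustrative names, not part of tarfile. On such a Python the buffer should stay open in all six modes, but the recorded flags will show it if a version behaves differently.

```python
import io
import tarfile

MODES = ["w:", "w:gz", "w:bz2", "w|", "w|gz", "w|bz2"]


def archive_in_memory(mode):
    """Write one in-memory member using `mode`; return (buffer_closed, data)."""
    buf = io.BytesIO()
    tar = tarfile.open(mode=mode, fileobj=buf)
    payload = b"hello world\n"
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
    tar.close()
    closed = buf.closed
    # A closed buffer cannot be read, so fall back to empty bytes.
    data = b"" if closed else buf.getvalue()
    return closed, data


def list_members(data):
    """Read an archive back, letting tarfile detect the compression."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as archive:
        return archive.getnames()


results = {}
for mode in MODES:
    closed, data = archive_in_memory(mode)
    names = [] if closed else list_members(data)
    results[mode] = (closed, names)
    print(mode, "buffer closed:", closed, "members:", names)
```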


          • Gabriel Genellina

            #6
            Re: tarfile.open(mode='w:gz'|'w|gz'|..., fileobj=StringIO()) fails.

            On Tue, 27 May 2008 02:43:53 -0300, sebastian.noack@googlemail.com
            <sebastian.noack@googlemail.com> wrote:
            > On May 27, 2:17 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
            > wrote:
            >> It's not a bug, you must extract the StringIO contents *after* closing
            >> tar_file, else you won't get the last blocks pending to be written.
            >
            > I looked at tarfile's source code last night after I wrote this
            > message and figured that out. But the problem is that TarFile's
            > close method closes the underlying file object after the last block
            > is written, and once a StringIO is closed you cannot get its content
            > anymore. Wtf, why does it close the underlying file? There is
            > absolutely no reason for doing this. Are you still sure this isn't a
            > bug?

            Ouch, sorry, I only tried with gzip (which worked fine), not bz2 (which is
            buggy).

            --
            Gabriel Genellina


            • sebastian.noack@googlemail.com

              #7
              Re: tarfile.open(mode='w:gz'|'w|gz'|..., fileobj=StringIO()) fails.

              That is right, only bz2 is affected. I am happy that I could help. ;)

              Regards
              Sebastian Noack
