Predicting space available for a ZIP file

  • bir
    New Member
    • Feb 2007
    • 11

    Predicting space available for a ZIP file

    I am creating a ZIP file in, say, C:\Archive (the location could also be a network drive), and I need to check whether there is enough space for the zipped file to fit in C:\Archive.

    Currently I first create the ZIP file in that same location, C:\Archive, and then compare the size of the file with the available disk space.

    The problem is that by then I have already written the file to that location (C:\Archive). Is there an alternative?
  • docsnyder
    New Member
    • Dec 2006
    • 88

    #2
    @bir

    In general (exceptions may exist), the size of a ZIP archive cannot be predicted. So it seems that you have to create your ZIP archive anyway.

    My advice:

    You should create the ZIP archive in some temporary space that is known to be large enough to hold it. Then determine the file size of the archive via stat() and decide whether it fits into your target directory.
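
    The advice above can be sketched roughly as follows. This is a minimal illustration in Python (not the Perl used later in this thread), using only the standard library. The function name zip_fits_in_target is hypothetical, and the sketch assumes the system temp directory is large enough to hold the archive.

```python
import os
import shutil
import tempfile
import zipfile


def zip_fits_in_target(paths, target_dir):
    """Build the archive in temp space, then compare its size with the
    free space reported for target_dir. Returns (fits, archive_size)."""
    # Assumption: the system temp directory has room for the archive.
    fd, tmp_path = tempfile.mkstemp(suffix=".zip")
    os.close(fd)
    try:
        with zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for path in paths:
                zf.write(path, arcname=os.path.basename(path))
        # stat() the finished archive, as suggested above.
        archive_size = os.stat(tmp_path).st_size
        # Free space on the volume that holds the target directory.
        free_bytes = shutil.disk_usage(target_dir).free
        return archive_size <= free_bytes, archive_size
    finally:
        os.remove(tmp_path)
```

    If the archive fits, it still has to be moved or rebuilt in the target directory afterwards; the free space can change in between, so the check is only a snapshot.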

    Greetz, Doc


    • bir
      New Member
      • Feb 2007
      • 11

      #3
      Doc

      That is what I have done: only after creating the ZIP file do I check the disk space. Aren't there any other solutions available?


      • docsnyder
        New Member
        • Dec 2006
        • 88

        #4
        @Bir
        Originally posted by bir
        Doc

        That is what I have done: only after creating the ZIP file do I check the disk space. Aren't there any other solutions available?

        I guess not! But let's see what we can do ...

        The size of the resulting ZIP archive cannot be determined analytically in advance.

        This means that the algorithm building the ZIP archive has to run in some form in order to obtain the size.

        So, if you do not want to write the archive to disk before its size is known, the algorithm could be run in memory instead, given the constraints you describe.

        A possible solution I can think of is as follows (assuming it is performed on a UNIX system):
        Code:
        zip - <file list> | wc -c
        This builds the archive (giving - as the archive name makes zip write its output to STDOUT instead of to a file) and pipes it to the word count program (wc), which counts the piped bytes (option -c).

        With this solution, no data is written to disk at all!

        This solution also works for huge archives because of the nature of a pipe: data is processed in small chunks (typically 4 KB at a time) and does not accumulate in memory.

        If you are not running a UNIX system, try to map the above approach to your system capabilities.
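
        One way to map the pipe idea onto a system without zip and wc is sketched below in Python (not the Perl used later in this thread). The archive is streamed into a sink that only counts bytes and discards them. ByteCounter and streamed_zip_size are hypothetical names. Because the sink is not seekable, Python's zipfile module appends a small data descriptor after each member, so the count may be slightly larger than the size of the same archive written to a regular file.

```python
import zipfile


class ByteCounter:
    """Write-only sink that discards data and remembers how much it saw."""

    def __init__(self):
        self.count = 0

    def write(self, data):
        self.count += len(data)
        return len(data)

    def flush(self):
        pass  # nothing is buffered, so there is nothing to flush


def streamed_zip_size(paths):
    """Return the number of bytes the streamed archive occupies,
    without keeping the archive on disk or in memory."""
    sink = ByteCounter()
    with zipfile.ZipFile(sink, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            zf.write(path)
    return sink.count
```

        As with the pipe, only one small chunk of compressed data exists at any time, so the approach should also work for large inputs.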

        Greetz, Doc


        • bir
          New Member
          • Feb 2007
          • 11

          #5
          Hi Doc

          I am facing a problem running this script on Windows. The script calculates the compressed file size before writing the archive to disk. I use File::Spec->devnull() to point to the null device, and that is causing a problem: I cannot calculate the compressed size properly, and the messages shown below also come up.

          We are running ActiveState Perl v5.8.8 for Windows.

          Any suggestions please?


          My script is as follows:
          # Example of how to compute compressed sizes
          # $Revision: 1.2 $
          use strict;
          use Archive::Zip qw(:ERROR_CODES );
          use File::Spec;
          my $zip = Archive::Zip->new();
          my $blackHoleDevice = File::Spec->devnull();

          $zip->addFile($_) foreach (<*.pl>);

          # Write and throw the data away.
          # after members are written, the writeOffset will be set
          # to the compressed size.
          $zip->writeToFileNamed($blackHoleDevice);

          my $totalSize = 0;
          my $totalCompressedSize = 0;
          foreach my $member ($zip->members())
          {
              $totalSize += $member->uncompressedSize;
              $totalCompressedSize += $member->_writeOffset;
              print "Member ", $member->externalFileName,
                  " size=", $member->uncompressedSize,
                  ", writeOffset=", $member->_writeOffset,
                  ", compressed=", $member->compressedSize,
                  "\n";
          }

          print "Total Size=", $totalSize, ", total compressed=", $totalCompressedSize, "\n";

          $zip->writeToFileNamed('test.zip');



          Output of the script:
          IO error: seeking to rewrite local header : Invalid seek
           at C:/Perl/site/lib/Archive/Zip/Member.pm line 623
           Archive::Zip::Member::_refreshLocalFileHeader('Archive::Zip::NewFileMember=HASH(0x1bd5ce0)', 'IO::File=GLOB(0x1bd5bb4)') called at C:/Perl/site/lib/Archive/Zip/Member.pm line 909
           Archive::Zip::Member::_writeToFileHandle('Archive::Zip::NewFileMember=HASH(0x1bd5ce0)', 'IO::File=GLOB(0x1bd5bb4)', 1, 0) called at C:/Perl/site/lib/Archive/Zip/Archive.pm line 280
           Archive::Zip::Archive::writeToFileHandle('Archive::Zip::Archive=HASH(0x225550)', 'IO::File=GLOB(0x1bd5bb4)', 1) called at C:/Perl/site/lib/Archive/Zip/Archive.pm line 257
           Archive::Zip::Archive::writeToFileNamed('Archive::Zip::Archive=HASH(0x225550)', 'nul') called at calcSizes.pl line 14
          Member calcSizes.pl size=876, writeOffset=417, compressed=417
          Member copy.pl size=451, writeOffset=, compressed=451
          Member extract.pl size=862, writeOffset=, compressed=862
          Member mailZip.pl size=1572, writeOffset=, compressed=1572
          Member mfh.pl size=619, writeOffset=, compressed=619
          Member readScalar.pl size=752, writeOffset=, compressed=752


          • docsnyder
            New Member
            • Dec 2006
            • 88

            #6
            @bir

            I could not run your code, because I do not have Archive::Zip installed. So I can only offer guesses.

            It appears that the device returned by devnull() may not be seekable (like a pipe). I assume that writeToFileNamed() tries to seek within the given file; your error message says the seek happens when the local header is rewritten.
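
            Whether a given handle can seek is something one can probe directly. The following is a minimal sketch in Python (not the Perl of the script above); can_seek is a hypothetical helper, and it says nothing about how Archive::Zip itself decides this.

```python
import os


def can_seek(handle):
    """Return True if the handle accepts a no-op seek, False otherwise."""
    try:
        handle.seek(0, os.SEEK_CUR)  # stay at the current position
    except (AttributeError, OSError, ValueError):
        return False
    return True
```

            The write end of a pipe is expected to report False here. What the null device reports depends on the platform, so it is worth testing on the system in question.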

            Have you read about the devnull() approach on the internet? Is there somebody out there who can verify that it should work?

            Sorry, can't tell you more.

            Greetz, Doc


            • bir
              New Member
              • Feb 2007
              • 11

              #7
              @DOC

              I am now able to get the compressed sizes of the ZIP file members by simply writing the archive to devnull() through a file handle, and the earlier messages no longer come up. But now I am facing another problem: the sum of the compressed sizes of the members (the files to be zipped) is slightly different from the ZIP file size.

              This difference varies with the volume of files to be zipped.

              1. Why is there a size difference, and how can I calculate it?

              OR

              2. Earlier I got the size of the ZIP file with the -s file test, but now I am writing not to disk but to a handle that points to the null device. How can I get the size from the handle?

              The output now is:
              Member Archive::Zip::NewFileMember=HASH(0x1a4c298) size=719, compressed=367
              Member Archive::Zip::NewFileMember=HASH(0x1ceafdc) size=2079, compressed=383
              Member Archive::Zip::NewFileMember=HASH(0x1ceb1bc) size=719, compressed=367
              Total Size=3517, total compressed=1117

              Device Null Size :0

              Actual zip file Size: 1571

              You can see the difference between the actual file size and the total compressed size. Also, I am not able to get the size of the null device.


              • docsnyder
                New Member
                • Dec 2006
                • 88

                #8
                @Bir

                There is a difference because a ZIP file does not consist of just a sequence of compressed files. Some additional administrative data is required as well (table of contents, offsets, sizes, ...). I do not know whether Archive::Zip provides a method to determine the amount of administrative data within a ZIP file.
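
                To make the administrative data concrete, here is a rough sketch in Python (not Perl, and not an Archive::Zip feature) that splits an existing archive into compressed payload and overhead. The fixed sizes come from the ZIP format: 30 bytes per local file header, 46 bytes per central directory entry and 22 bytes for the end record, each plus variable-length names, extra fields and comments. split_archive_size is a hypothetical name. The sketch assumes a plain archive: no ZIP64, no data descriptors, no encryption, UTF-8 or ASCII names, and identical extra fields in the local and central headers.

```python
import io
import zipfile

LOCAL_HEADER = 30      # fixed part of each local file header
CENTRAL_HEADER = 46    # fixed part of each central directory entry
END_RECORD = 22        # fixed part of the end-of-central-directory record


def split_archive_size(archive):
    """Return (payload, overhead) in bytes for a path or file-like ZIP."""
    payload = 0
    overhead = END_RECORD
    with zipfile.ZipFile(archive) as zf:
        overhead += len(zf.comment)
        for info in zf.infolist():
            name_len = len(info.filename.encode("utf-8"))
            payload += info.compress_size
            # The name is stored twice: once in the local header and once
            # in the central directory.
            # Assumption: both headers carry the same extra field.
            overhead += LOCAL_HEADER + name_len + len(info.extra)
            overhead += (CENTRAL_HEADER + name_len
                         + len(info.extra) + len(info.comment))
    return payload, overhead
```

                For the figures posted in #7, the difference is 1571 - 1117 = 454 bytes for three members. Since each member's name is counted twice, longer stored paths increase the overhead.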

                The null device always has size zero, which is why it is called the null device. Writing to such a device just absorbs the written data and discards it. For this reason, there is no "size" associated with this device.

                Greetz, Doc
