Reading the contents of zip files

  • arnaudk
    Contributor
    • Sep 2007
    • 425

    Reading the contents of zip files

    I have a large number of sequentially-named zip files, each containing a single csv data file which I need to read into my C++ program.

    1) Does anybody know of any open source libraries to handle zip files? (I've seen some unportable, expensive commercial ones only).

    2) I could use a free program like gzip which can decompress zip files. I don't want to decompress each archive to disk and read in the resultant csv file as this would be excruciatingly slow. Instead, I've seen you can pipe the output of gzip to another program. Is it possible, using system() calls to gzip, to capture this piped output as an fstream or some other stream which I can then getline() to read the csv data rows?
  • RRick
    Recognized Expert Contributor
    • Feb 2007
    • 463

    #2
    On Linux, gzip creates a separate compressed file for each file passed to it. This doesn't sound like what you want.

    If you want to combine multiple files into a single compressed file, the tar command is the way to go. For linux, it is the archive workhorse. It will store and extract single or multiple files; supports directories; puts them in their own directories; or sends them to stdout.

    What more could you ask for? :-)


    • gpraghuram
      Recognized Expert Top Contributor
      • Mar 2007
      • 1275

      #3
      Your idea 2 is good.
      I remember that there is a command which lists the files inside a zip archive without extracting it (I don't remember the exact name).
      With your second idea, even if you unzip through a pipe, the files still have to be decompressed.
      I don't think that will boost the speed.

      Raghuram


      • arnaudk
        Contributor
        • Sep 2007
        • 425

        #4
        Thanks for your replies,

        RRick: I don't need to create any archives. I just have a collection of zip files, each containing one single file which I need to read into objects in my program, so gzip would work fine here, with the added advantage that the command syntax is the same on most platforms where it is installed, making my program more portable.

        gpraghuram: Actually, I already know that the name of the file inside the zip file will be the same as the name of the archive (except the extension, of course), so I just have to unzip them. It's true that uncompressing the files will cost some time, but this is unavoidable. What is avoidable, however, is any disk reads/writes, which are notoriously slow compared to keeping everything in (solid-state) memory. Thus, I'd like to avoid creating temporary files on the hard disk, etc. Do you know how I can capture redirected/piped output into a stream?


        • mac11
          Contributor
          • Apr 2007
          • 256

          #5
          Originally posted by arnaudk
          Do you know how I can capture redirected/piped output into a stream?
          I have an idea if you're running Linux. Maybe try using a FIFO (named pipe): have gzip write its output to it and have your program read from it. It should work, but I've never done it.

          Google for "linux fifo" to learn about FIFOs.
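
As an alternative to a named pipe, the POSIX function popen() starts a command and hands back its standard output as a FILE* that can be read directly, so nothing is written to disk. The following is only a minimal sketch under that assumption (Linux or another POSIX system); the helper name readCommandLines is hypothetical and not part of any library.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

// Runs `command` through the shell and returns its standard output split into lines.
// POSIX only: popen() and pclose() come from <stdio.h> on Linux and similar systems.
std::vector<std::string> readCommandLines(const std::string& command) {
    FILE* pipe = popen(command.c_str(), "r");
    if (pipe == nullptr) {
        throw std::runtime_error("popen failed for: " + command);
    }

    std::vector<std::string> lines;
    std::string current;
    char chunk[4096];
    // fgets() returns at most one line per call; long lines arrive in several chunks.
    while (fgets(chunk, sizeof chunk, pipe) != nullptr) {
        current += chunk;
        if (!current.empty() && current.back() == '\n') {
            current.pop_back();          // drop the newline
            lines.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        lines.push_back(current);        // last line without a trailing newline
    }
    pclose(pipe);
    return lines;
}
```

A call such as readCommandLines("gunzip -c xxx.gz") would then return the decompressed rows; for .zip archives, "unzip -p" followed by the archive name writes the extracted data to stdout in the same way, assuming the unzip tool is installed.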


          • RRick
            Recognized Expert Contributor
            • Feb 2007
            • 463

            #6
            On Unix, the trick used to redirect output from one process to the input of another process is called "piping". From the command line it looks something like:
            Code:
            gunzip -c xxx.gz | myProg
            gunzip outputs to stdout which the pipe (|) redirects to the input of myProg.

            All myProg has to do is read stdin to get the info.


            • arnaudk
              Contributor
              • Sep 2007
              • 425

              #7
              I found some useful links after some exhaustive searching. For the benefit of others, I post them here.

              Zlib, C library for reading/writing .gz files. The following are all based on this great library.

              Gzstream, a wrapper for zlib which defines C++ streams for zlib which work just like ifstream, etc.

              However, I need to read .zip archives which are more complex than .gz files because they can contain several files and directory structure. To this end, there is:

              Minizip, an add-on for zlib to handle .zip files. It is also distributed with zlib 1.2.3.

              A C++ wrapper class for Minizip, written by David Godson

              Note that to get them going in VC++ 2005 Express, I had to install the Microsoft SDK, since I got errors that windows.h wasn't found when I tried to compile.

              With zlib and gzstream, I managed to read the contents of .gz files. But I'm still working on reading .zip files...


              • arnaudk
                Contributor
                • Sep 2007
                • 425

                #8
                OK, after a lot of dredging through oodles of lines of C code, I finally managed to dump the buffer filled by the function unzReadCurrentFile in Minizip (which reads a file in a .zip archive) into a string stream which I can use in the rest of my C++ program. So, problem solved.
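
The buffer-to-string-stream step can be sketched with the standard library alone. This is an illustration, not the code from the post: appendChunk and parseRows are hypothetical names, and the CSV handling assumes plain comma-separated fields with no quoting.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Copies `length` raw bytes into the stream; call once per chunk read from the archive.
void appendChunk(std::ostream& out, const char* chunk, std::size_t length) {
    out.write(chunk, static_cast<std::streamsize>(length));
}

// Splits everything accumulated in `data` into rows of comma-separated fields.
// Assumption: fields contain no quotes or embedded commas.
std::vector<std::vector<std::string>> parseRows(std::stringstream& data) {
    std::vector<std::vector<std::string>> rows;
    std::string row;
    while (std::getline(data, row)) {
        if (!row.empty() && row.back() == '\r') {
            row.pop_back();              // tolerate Windows line endings
        }
        std::vector<std::string> fields;
        std::istringstream rowStream(row);
        std::string field;
        while (std::getline(rowStream, field, ',')) {
            fields.push_back(field);
        }
        rows.push_back(fields);
    }
    return rows;
}
```

Because the chunks are concatenated before any parsing happens, a row that straddles two buffer reads is still returned as a single row.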


                • arnaudk
                  Contributor
                  • Sep 2007
                  • 425

                  #9
                  ... and here is the code that does it (didn't use string streams in the end)
                  [CODE=cpp]
                  /*
                  unzips testfile.txt from C:\temp\test.zip
                  and puts it in a string
                  */
                  #include <cstdio>
                  #include <cstdlib> // malloc, free
                  #include <string>
                  #include <iostream>
                  #include "unzip.h" // MiniZip library

                  #define WRITEBUFFERSIZE (5242880) // 5 MB buffer

                  using namespace std;

                  string readZipFile(string zipFile, string fileInZip) {
                      int err = UNZ_OK; // error status
                      uInt size_buf = WRITEBUFFERSIZE; // byte size of buffer to store raw csv data
                      void* buf; // the buffer
                      string sout; // output string
                      char filename_inzip[256]; // for unzGetCurrentFileInfo
                      unz_file_info file_info; // for unzGetCurrentFileInfo

                      unzFile uf = unzOpen(zipFile.c_str()); // open zipfile stream
                      if (uf==NULL) {
                          cerr << "Cannot open " << zipFile << endl;
                          return sout;
                      } // file is open

                      if ( unzLocateFile(uf,fileInZip.c_str(),1) ) { // try to locate file inside zip
                          // third argument of unzLocateFile: 1 = case sensitive, 2 = case-insensitive, 0 = OS default
                          cerr << "File " << fileInZip << " not found in " << zipFile << endl;
                          unzClose(uf);
                          return sout;
                      } // file inside zip found

                      err = unzGetCurrentFileInfo(uf,&file_info,filename_inzip,sizeof(filename_inzip),NULL,0,NULL,0);
                      if (err!=UNZ_OK) {
                          cerr << "Error " << err << " with zipfile " << zipFile << " in unzGetCurrentFileInfo." << endl;
                          unzClose(uf);
                          return sout;
                      } // obtained the necessary details about file inside zip

                      buf = malloc(size_buf); // setup buffer
                      if (buf==NULL) {
                          cerr << "Error allocating memory for read buffer" << endl;
                          unzClose(uf);
                          return sout;
                      } // buffer ready

                      err = unzOpenCurrentFilePassword(uf,NULL); // Open the file inside the zip (password = NULL)
                      if (err!=UNZ_OK) {
                          cerr << "Error " << err << " with zipfile " << zipFile << " in unzOpenCurrentFilePassword." << endl;
                          free(buf); // do not leak the buffer on the error path
                          unzClose(uf);
                          return sout;
                      } // file inside the zip is open

                      // Copy contents of the file inside the zip to the buffer
                      cout << "Extracting : " << filename_inzip << " from " << zipFile << endl;
                      do {
                          err = unzReadCurrentFile(uf,buf,size_buf); // returns number of bytes read, 0 at end of file, <0 on error
                          if (err<0) {
                              cerr << "Error " << err << " with zipfile " << zipFile << " in unzReadCurrentFile" << endl;
                              sout = ""; // empty output string
                              break;
                          }
                          // append the bytes just read to the output string
                          if (err>0) sout.append((char*)buf, err);
                      } while (err>0);

                      err = unzCloseCurrentFile(uf); // close the file inside the zip
                      if (err!=UNZ_OK) {
                          cerr << "Error " << err << " with zipfile " << zipFile << " in unzCloseCurrentFile" << endl;
                          sout = ""; // empty output string
                      }

                      unzClose(uf); // close the zipfile itself
                      free(buf); // free up buffer memory
                      return sout;
                  }

                  int main(int argc, char *argv[]) {
                      string string_buffer = readZipFile("C:/temp/test.zip", "testfile.txt");
                      cout << string_buffer << endl;
                      return 0;
                  }
                  [/CODE]


                  • RRick
                    Recognized Expert Contributor
                    • Feb 2007
                    • 463

                    #10
                    Very nice. I like how simple it is to find and extract the file.

                    Is this based on the zlib library? I believe the latest version is 1.2.3


                    • gpraghuram
                      Recognized Expert Top Contributor
                      • Mar 2007
                      • 1275

                      #11
                      Very good effort, and I would like to express my appreciation for your work.

                      Raghuram


                      • mrviit
                        New Member
                        • Feb 2012
                        • 1

                        #12
                        Thank you very much.
                        But if the path to the zip file or the name of the zip file contains Unicode characters, unzOpen (or unzOpen64) cannot open the zip file. (I have changed the type of the first parameter, zipFile, to wstring.)

                        Can you help me, please?
