GZipStream compressed bytes written

  • mach77
    New Member
    • Sep 2008
    • 3

    When using a GZipStream, is there any way to know how many compressed bytes were written?

    For example:

    // "buffer" is data from a text file
    // "offset" is 0
    // "count" is 100

    stream.Write(buffer, offset, count);

    This is going to write the compressed data to the underlying data store. Is there any way to get the length of this data?

    Note: I know you can use a MemoryStream to get the size, but this is inefficient.
  • PRR
    Recognized Expert Contributor
    • Dec 2007
    • 750

    #2
    Could you explain more? I'm not sure I understood you correctly, but here is what I can explain.

    "count" is the number of bytes that get compressed.

    Code:
    // Read the whole source file. File.ReadAllBytes reads it completely;
    // a single FileStream.Read call is not guaranteed to.
    byte[] myByte = File.ReadAllBytes(@"C:\log.txt");

    // Compress into a second file. Disposing the GZipStream writes the remaining
    // data and the gzip trailer, then closes f2 (leaveOpen is false).
    using (FileStream f2 = new FileStream(@"C:\log123.txt", FileMode.Create))
    using (GZipStream gz = new GZipStream(f2, CompressionMode.Compress, false))
    {
        gz.Write(myByte, 0, myByte.Length);
    }
    To get the length of the data written to the other file, I guess you will have to read it back:

    Code:
    // Read the compressed file back; its length is the compressed size.
    myByte = File.ReadAllBytes(@"C:\log123.txt");
    long compressedSize = myByte.Length;
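Reading the compressed file back is not strictly necessary just to learn its size: once the GZipStream has been closed, the file system already reports it. A minimal sketch, assuming you choose the output path; the class and method names are illustrative, not part of the framework:

```csharp
using System.IO;
using System.IO.Compression;

static class CompressedFileSize
{
    // Hypothetical helper: compresses 'data' into 'path' and returns the size of the
    // resulting gzip file as reported by the file system.
    public static long CompressToFile(byte[] data, string path)
    {
        using (FileStream output = new FileStream(path, FileMode.Create))
        using (GZipStream gz = new GZipStream(output, CompressionMode.Compress))
        {
            gz.Write(data, 0, data.Length);
        } // disposing gz writes the gzip trailer and closes the file

        // FileInfo.Length asks the file system; no need to read the bytes back.
        return new FileInfo(path).Length;
    }
}
```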
    There is also a Length property on GZipStream, but it is not supported, so it won't help here. From the documentation:
    "GZipStream.Length Property: This property is not supported and always throws a NotSupportedException."

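The documented behavior is easy to confirm: GZipStream reports CanSeek as false, and reading its Length throws. A small check (the class name is made up for illustration):

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class GZipLengthDemo
{
    // Returns true when GZipStream is non-seekable and its Length throws NotSupportedException.
    public static bool LengthIsUnsupported()
    {
        using (MemoryStream ms = new MemoryStream())
        using (GZipStream gz = new GZipStream(ms, CompressionMode.Compress))
        {
            if (gz.CanSeek) return false; // a seekable stream would be expected to report Length
            try
            {
                _ = gz.Length; // documented to always throw
                return false;
            }
            catch (NotSupportedException)
            {
                return true;
            }
        }
    }
}
```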

    Code:
    byte[] more = System.Text.Encoding.Unicode.GetBytes("Prr");

    using (MemoryStream ms = new MemoryStream())
    {
        using (GZipStream GZ = new GZipStream(ms, CompressionMode.Compress, false))
        {
            GZ.Write(more, 0, more.Length);
            // string sss = ms.Length.ToString();
            // sss = ms.ToArray().Length.ToString();
            // Don't measure here: GZipStream writes additional data (including the
            // gzip trailer) when it is disposed, so these values would be too small.
        }

        // ToArray() still works even though disposing GZ also closed ms.
        byte[] bb = ms.ToArray();

        string len = bb.Length.ToString(); // the complete compressed length
    }
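The effect described in that comment can be observed directly by passing leaveOpen: true, so the MemoryStream can still be measured after the GZipStream is disposed. A sketch with illustrative names; the exact "before" figure depends on the framework's internal buffering, but the "after" figure should be larger because of the data and trailer written on dispose:

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

static class TrailerDemo
{
    // Returns the MemoryStream length just before and just after the GZipStream is disposed.
    public static (long beforeDispose, long afterDispose) Measure(string text)
    {
        byte[] input = Encoding.Unicode.GetBytes(text);
        using (MemoryStream ms = new MemoryStream())
        {
            long before;
            // leaveOpen: true keeps 'ms' usable after the GZipStream is disposed.
            using (GZipStream gz = new GZipStream(ms, CompressionMode.Compress, leaveOpen: true))
            {
                gz.Write(input, 0, input.Length);
                before = ms.Length; // compressed data may still be buffered inside gz
            }
            return (before, ms.Length); // now includes the remaining data and the gzip trailer
        }
    }
}
```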


    • mach77
      New Member
      • Sep 2008
      • 3

      #3
      In my example, "count" was the uncompressed size. So 100 uncompressed bytes pass into the stream, but the stream compresses them before sending them to the underlying data source. I want to know how many bytes are sent to the underlying source.


      • Plater
        Recognized Expert Expert
        • Apr 2007
        • 7872

        #4
        Look at the underlying stream's size?

        [code=c#]
        FileStream fs = new FileStream(@"c:\tempzip.zip", FileMode.Create);
        System.IO.Compression.GZipStream gz = new System.IO.Compression.GZipStream(fs, System.IO.Compression.CompressionMode.Compress);
        byte[] fred = Encoding.ASCII.GetBytes("Billy joe has a lot of bottles of rum");
        gz.Write(fred, 0, fred.Length);
        // Bytes still buffered inside gz are not counted until it is flushed or closed.
        Int64 SizeOfCompressedBytes = fs.Length;
        [/code]
        If you are doing multiple writes and want to know how much came from each write, you could probably implement logic that looks at the change in the .Length property.
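That idea can be sketched by recording how much the underlying stream grows after each Write, plus one final entry for the growth caused by closing the GZipStream. This assumes a MemoryStream as the underlying stream and leaveOpen: true; PerWriteDelta is a hypothetical helper:

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

static class PerWriteDelta
{
    // Returns one entry per chunk (growth of the base stream after that Write)
    // followed by one entry for the growth caused by closing the GZipStream.
    public static List<long> Deltas(IEnumerable<byte[]> chunks, out long totalLength)
    {
        var deltas = new List<long>();
        using (MemoryStream ms = new MemoryStream())
        {
            long last = 0;
            using (GZipStream gz = new GZipStream(ms, CompressionMode.Compress, leaveOpen: true))
            {
                foreach (byte[] chunk in chunks)
                {
                    gz.Write(chunk, 0, chunk.Length);
                    deltas.Add(ms.Length - last); // may be 0: gz buffers data internally
                    last = ms.Length;
                }
            }
            deltas.Add(ms.Length - last); // closing: buffered data plus the gzip trailer
            totalLength = ms.Length;
        }
        return deltas;
    }
}
```

Many per-write entries may be 0 or small because GZipStream buffers internally; the closing entry always includes at least the 8-byte gzip trailer.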


        • mldisibio
          Recognized Expert New Member
          • Sep 2008
          • 191

          #5
          Just to reiterate what dirtBag noted, closing a GZipStream flushes the buffers and writes some additional end-of-stream bytes (the gzip trailer) that the decompressor needs. Therefore, the most accurate byte count can only be obtained after the GZipStream is closed or disposed.

          Also, a warning: if you check the underlying stream's length after each of a series of writes, keep in mind that because of buffering, not all of the bytes written so far have necessarily reached the underlying stream unless the buffer has been flushed to it.
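On modern .NET (this sketch assumes .NET 5 or later), calling Flush() on a compressing GZipStream pushes the data buffered so far to the underlying stream, which makes intermediate Length readings more meaningful. Behavior on older frameworks may differ, so treat this as something to test on your own runtime; the class name is illustrative:

```csharp
using System.IO;
using System.IO.Compression;

static class FlushDemo
{
    // Returns the base stream length right after Write and right after Flush.
    // Assumption: on .NET 5+, GZipStream.Flush writes buffered compressed data to the base stream.
    public static (long afterWrite, long afterFlush) Measure(byte[] data)
    {
        using (MemoryStream ms = new MemoryStream())
        using (GZipStream gz = new GZipStream(ms, CompressionMode.Compress))
        {
            gz.Write(data, 0, data.Length);
            long afterWrite = ms.Length; // may not include everything written yet
            gz.Flush();
            return (afterWrite, ms.Length);
        }
    }
}
```

Each mid-stream flush emits an extra sync marker, so flushing very often slightly worsens the compression ratio.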


          • mldisibio
            Recognized Expert New Member
            • Sep 2008
            • 191

            #6
            Also, if you are worried about reading a large file into a memory stream, an alternative is to count the bytes in chunks, using a buffer size of your liking:

            Code:
            const int BUFFER_SIZE = 4096;
            using (FileStream inputStream = new FileStream(@"C:\someFile.zip", FileMode.Open))
            {
                byte[] readBuffer = new byte[BUFFER_SIZE];
                long totalBytes = 0;
                int bytesRead;
                do
                {
                    // read up to BUFFER_SIZE bytes from inputStream into the buffer
                    bytesRead = inputStream.Read(readBuffer, 0, BUFFER_SIZE);
                    totalBytes += bytesRead;
                }
                // until no more bytes are read from the input stream
                while (bytesRead > 0);

                Console.WriteLine("{0} : {1} bytes.", inputStream.Name, totalBytes);
            }


            • mach77
              New Member
              • Sep 2008
              • 3

              #7
              The examples given are all pretty straightforward, but what happens when you don't have access to the underlying stream, or the underlying stream throws an exception when you call Length on it?
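One way around both problems is to put a counting pass-through stream between the GZipStream and the real destination, so the count is taken as bytes flow through rather than read from the destination's Length. CountingStream is a hypothetical class written for this purpose, not a framework type; a minimal sketch:

```csharp
using System;
using System.IO;

// Hypothetical write-only wrapper: forwards every write to the inner stream and
// counts the bytes, so it never needs the inner stream's Length.
sealed class CountingStream : Stream
{
    private readonly Stream _inner;
    public long BytesWritten { get; private set; }

    public CountingStream(Stream inner) =>
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));

    public override void Write(byte[] buffer, int offset, int count)
    {
        _inner.Write(buffer, offset, count);
        BytesWritten += count; // count only after the inner write succeeded
    }

    public override void Flush() => _inner.Flush();
    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}
```

Because GZipStream buffers internally, BytesWritten trails the input while you are writing and only matches the final compressed size after the GZipStream has been closed. Create the GZipStream with leaveOpen: true if you want to keep using the counter (and the destination) afterwards.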

              Comment

              • Plater
                Recognized Expert Expert
                • Apr 2007
                • 7872

                #8
                Well, you always have access to the underlying stream via the .BaseStream property. I'm not sure which stream types would throw an exception on the Length property, but I guess it could happen.

                What is your underlying stream type?
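Streams that cannot seek (GZipStream itself, NetworkStream and similar) generally throw NotSupportedException from Length, so checking CanSeek first avoids the exception. A one-line helper, with an illustrative name:

```csharp
using System.IO;

static class StreamLength
{
    // Returns the stream's length, or null when it cannot report one.
    // Non-seekable streams typically throw NotSupportedException from Length,
    // and a disposed MemoryStream reports CanSeek == false.
    public static long? TryGetLength(Stream s) => s.CanSeek ? s.Length : (long?)null;
}
```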


                • mldisibio
                  Recognized Expert New Member
                  • Sep 2008
                  • 191

                  #9
                  That is exactly what dirtBag pointed out. GZipStream does not support Length or Position while open, so neither will its BaseStream.

                  As you can see, the GZip compression implementation does not give real-time feedback on the incremental bytes written... which is what you want.

                  Unless you write your own compression algorithm, using the GZipStream class means you will only know how many compressed bytes were written by retrieving the length of the entire compressed stream after it has been fully written and closed.

                  At least that is my conclusion. If someone else can show otherwise, please correct me.


                  • Plater
                    Recognized Expert Expert
                    • Apr 2007
                    • 7872

                    #10
                    I have had zero trouble using the .Length property on the BaseStream.


                    • mldisibio
                      Recognized Expert New Member
                      • Sep 2008
                      • 191

                      #11
                      OK, you are correct, Plater: BaseStream.Length is available. My statement above about the BaseStream is incorrect.

                      However, that is only while the wrapper (GZipStream) stream is open... when the GZipStream is closed, so is the base stream (unless it was created with leaveOpen set to true).

                      And because of this, BaseStream.Length is not completely accurate until the compression stream is flushed/closed.

                      So for mach77, you can calculate bytes written by examining the increase in BaseStream.Length... but the last length reported will not equal the final length of the compressed stream.

                      I have provided two test methods to show this; you will need to provide a text file.
                      Code:
                      using System;
                      using System.IO;
                      using System.IO.Compression;
                      
                      namespace bytes {
                      
                        class test {
                      
                          static void Main() {
                            GZipCompressAllAtOnce(@"C:\Temp\someFile.txt");
                            GZipCompressIncremental(@"C:\Temp\someFile.txt");
                          }
                      
                          public static void GZipCompressAllAtOnce(string filename) {
                            byte[] fileBuffer;
                            int bytesRead;
                            MemoryStream ms;
                            GZipStream compressedzipStream;
                            using (FileStream infile = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read)) {
                              fileBuffer = new byte[infile.Length];
                              bytesRead = infile.Read(fileBuffer, 0, fileBuffer.Length);
                            }
                            using (ms = new MemoryStream()) {
                              using (compressedzipStream = new GZipStream(ms, CompressionMode.Compress, true)) {
                                compressedzipStream.Write(fileBuffer, 0, bytesRead);
                                Console.WriteLine("Underlying Stream Length: {0}", compressedzipStream.BaseStream.Length);
                              }
                              Console.WriteLine("Original size: {0}, Compressed size: {1}", fileBuffer.Length, ms.Length);
                            }
                          }
                      
                          public static void GZipCompressIncremental(string filename) {
                            byte[] fileBuffer;
                            int bytesRead;
                            int BUFFERSIZE = 512;
                            MemoryStream ms;
                            using (FileStream infile = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read)) {
                              fileBuffer = new byte[BUFFERSIZE];
                              using (ms = new MemoryStream()) {
                                using (GZipStream compressedzipStream = new GZipStream(ms, CompressionMode.Compress, true)) {
                                  do {
                                    bytesRead = infile.Read(fileBuffer, 0, BUFFERSIZE);
                                    compressedzipStream.Write(fileBuffer, 0, bytesRead);
                                    Console.WriteLine("Underlying Stream Length: {0}", compressedzipStream.BaseStream.Length);
                                  } while (bytesRead > 0);
                                }
                                Console.WriteLine("Original size: {0}, Compressed size: {1}", infile.Length, ms.Length);
                              }
                            }
                          }
                      
                        }
                      }
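Whether the base stream closes depends on the leaveOpen argument: both test methods above pass true, which is why ms.Length can still be read after the GZipStream is disposed. With false (the behavior of the two-argument constructor), MemoryStream.Length throws ObjectDisposedException afterwards, although ToArray() still works on a closed MemoryStream. A sketch with an illustrative name:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class LeaveOpenDemo
{
    // Compresses 'text' with leaveOpen: false, so disposing the GZipStream also
    // disposes the MemoryStream. Reports whether ms.Length threw afterwards, and
    // the compressed size obtained through ToArray().
    public static (bool lengthThrew, int compressedSize) Run(string text)
    {
        byte[] input = Encoding.UTF8.GetBytes(text);
        MemoryStream ms = new MemoryStream();
        using (GZipStream gz = new GZipStream(ms, CompressionMode.Compress, leaveOpen: false))
        {
            gz.Write(input, 0, input.Length);
        }

        bool lengthThrew = false;
        try { _ = ms.Length; }
        catch (ObjectDisposedException) { lengthThrew = true; }

        return (lengthThrew, ms.ToArray().Length);
    }
}
```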


                      • Plater
                        Recognized Expert Expert
                        • Apr 2007
                        • 7872

                        #12
                        Well, the OP's question was how to know how many compressed bytes were written, which can be done with the Length property as mentioned. Sort of.
                        The compressed streams get a preamble and a postamble, as mentioned.
                        But I think it works like this:

                        CompressedStream after 3 writes
                        [preamble][compressedbytes][postamble]

                        And *NOT* like this:
                        CompressedStream after 3 writes
                        [preamble][compressedbytes][postamble][preamble][compressedbytes][postamble][preamble][compressedbytes][postamble]

                        I have not confirmed this, however.
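This can be checked against the gzip format (RFC 1952): a member starts with the magic bytes 0x1f 0x8b and ends with an 8-byte trailer whose last 4 bytes store the uncompressed size (ISIZE), little-endian. If each write produced its own member, the final ISIZE would match only the last write; if it matches the total of all writes, a single trailer covers them all. A sketch of that check, with illustrative names:

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

static class MemberLayout
{
    // Compresses three separate writes through ONE GZipStream and returns the
    // compressed bytes plus the total number of uncompressed bytes written.
    public static (byte[] compressed, int totalInput) CompressThreeWrites()
    {
        byte[][] parts =
        {
            Encoding.ASCII.GetBytes("first write, "),
            Encoding.ASCII.GetBytes("second write, "),
            Encoding.ASCII.GetBytes("third write")
        };
        int total = 0;
        using (MemoryStream ms = new MemoryStream())
        {
            using (GZipStream gz = new GZipStream(ms, CompressionMode.Compress, leaveOpen: true))
            {
                foreach (byte[] p in parts)
                {
                    gz.Write(p, 0, p.Length);
                    total += p.Length;
                }
            }
            return (ms.ToArray(), total);
        }
    }

    // RFC 1952: the last 4 bytes of a member are ISIZE, the uncompressed size
    // modulo 2^32, stored little-endian (decoded by hand, independent of CPU byte order).
    public static uint ReadIsize(byte[] data)
    {
        int n = data.Length;
        return (uint)(data[n - 4] | (data[n - 3] << 8) | (data[n - 2] << 16) | (data[n - 1] << 24));
    }
}
```

For a single GZipStream instance the check should find the magic bytes at offset 0 and an ISIZE equal to the sum of all writes, which matches the first layout.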
