GZip Compression :(

  • Carlo Razzeto

    GZip Compression :(

    Hello there,

    I'm having an odd issue with GZIP compression (having followed example code
    found on MSDN). Basically, after running through the compression routine I
    end up with a byte array several times larger than the source text file,
    full of zero data. Below is the code used to do the compression. It's part
    of a web service that retrieves a file, with an option to compress the data
    before Base64-encoding it. In the following code, all undeclared variables
    are properties: Compress represents a compress attribute specified in the
    XML request, and FileName is a path to the file on the server, relative to
    the webroot.

    Response.ContentType = "text/xml"
    If Not File.Exists(Server.MapPath(FileName)) Then
        Throw New GetBinaryFileException(FileName, _
            GetBinaryFileException.GetBinaryFileError.FileNotFound)
    End If

    Dim FileData() As Byte = Nothing
    Dim FStream As New FileStream(Server.MapPath(FileName), _
        FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
    If Compress Then
        Dim TempData(FStream.Length - 1) As Byte
        FStream.Read(TempData, 0, FStream.Length)
        Dim MStream As New MemoryStream
        Dim Compressor As New GZipStream(MStream, _
            CompressionMode.Compress, True)
        Compressor.Write(TempData, 0, TempData.Length)

        ReDim FileData(MStream.Length - 1)
        Dim BytesRead As Integer = MStream.Read(FileData, 0, _
            MStream.Length)
        MStream.Close()
        MStream.Dispose()
        Compressor.Close()
        Compressor.Dispose()
    Else
        ReDim FileData(FStream.Length - 1)
        FStream.Read(FileData, 0, FStream.Length)
    End If
    FStream.Close()
    FStream.Dispose()

    Dim Base64 As String = Convert.ToBase64String(FileData)

    Dim FileDataNode As XmlNode = _
        XmlExchangeLib.GetOrSetXmlNode("FileData", Root)
    XmlExchangeLib.AddAttributeWithValue(FileDataNode, "Compressed", _
        Compress.ToString().ToLower())
    FileDataNode.InnerText = Base64
    XmlResponse.Save(Response.OutputStream)

  • Marc Gravell

    #2
    Re: GZip Compression :(

    First, you need to make sure that you close the zip-stream (compressor)
    before looking at the memory-stream - it won't have finished writing yet;
    second, you then either need to rewind the memory stream, or just use
    ToArray() to get the full contents.
    Third - Read (on the file stream) is not strictly guaranteed to get
    everything - and even if it did it isn't very efficient. But
    File.ReadAllBytes would be a more reliable way of reading the entire file at
    once.

    You might also be allocating the FileData array one too short - I'm not sure
    (VB...)

    Marc
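
The three points above can be sketched roughly as follows. This is a minimal C# sketch, not the original poster's eventual fix: the class and method names (GzipSketch, Compress, Decompress, EncodeFile) are invented for illustration, and Stream.CopyTo assumes .NET 4 or later.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class GzipSketch
{
    // Compresses a byte array with gzip and returns the complete result.
    public static byte[] Compress(byte[] input)
    {
        using (MemoryStream dest = new MemoryStream())
        {
            // leaveOpen: true keeps dest usable after the compressor is disposed.
            using (GZipStream zip = new GZipStream(dest, CompressionMode.Compress, true))
            {
                zip.Write(input, 0, input.Length);
            } // Disposing the compressor flushes the remaining data and the gzip trailer.

            // ToArray copies the whole buffer regardless of the current Position,
            // so no rewind is needed.
            return dest.ToArray();
        }
    }

    // Inverse of Compress, used here only to check the round trip.
    public static byte[] Decompress(byte[] input)
    {
        using (MemoryStream src = new MemoryStream(input))
        using (GZipStream zip = new GZipStream(src, CompressionMode.Decompress))
        using (MemoryStream dest = new MemoryStream())
        {
            zip.CopyTo(dest); // assumes .NET 4 or later
            return dest.ToArray();
        }
    }

    // Hypothetical helper: read the whole file in one call, optionally
    // compress it, then Base64-encode the result.
    public static string EncodeFile(string path, bool compress)
    {
        byte[] data = File.ReadAllBytes(path);
        if (compress)
        {
            data = Compress(data);
        }
        return Convert.ToBase64String(data);
    }
}
```

The order matters: the compressor is disposed before ToArray is called. Reading the memory stream while the GZipStream is still open is what tends to produce a truncated or empty result.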



    • Carlo Razzeto

      #3
      Re: GZip Compression :(

      The last point is the one I know for a fact is fine: in VB an array
      declaration takes the upper bound rather than the element count, so you
      declare it as length - 1. But I'll take a look at the rest of the points.
      Thanks very much for your thoughts, most helpful...

      • Carlo Razzeto

        #4
        Re: GZip Compression :(

        Thanks for the advice. I switched to auto-closing the zip stream and using
        ToArray on the memory stream, and it seems to be pulling bytes. Now my only
        concern is that I'm getting back a byte array much larger than my original
        26-byte text file :(

        • Family Tree Mike

          #5
          Re: GZip Compression :(

          Just to make sure...

          You are talking about "before" the step of going to Base64, correct? The
          Base64 step will bloat the data by a factor of about 1.33 (closer to 1.37
          with MIME-style line breaks, plus header data), if I recall correctly.
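
The Base64 growth mentioned above can be worked out exactly. The sketch below assumes the encoding is done with Convert.ToBase64String and no line-break option, as in the original code; Base64SizeSketch and EncodedLength are illustrative names only.

```csharp
using System;

public static class Base64SizeSketch
{
    // Length of the string Convert.ToBase64String returns for byteCount input
    // bytes: every group of 3 bytes (the last group padded with '=') becomes
    // 4 characters, and no line breaks are inserted.
    public static int EncodedLength(int byteCount)
    {
        return 4 * ((byteCount + 2) / 3);
    }

    // Compares the formula against the real encoder for a given size.
    public static bool MatchesEncoder(int byteCount)
    {
        string encoded = Convert.ToBase64String(new byte[byteCount]);
        return encoded.Length == EncodedLength(byteCount);
    }
}
```

For the sizes in this thread, the 26-byte file would encode to 36 characters and a 132-byte compressed array to 176 characters, so the Base64 step alone does not account for the jump from 26 to 132 bytes.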

          • Carlo Razzeto

            #6
            Re: GZip Compression :(

            Raw byte array size (prior to conversion to a Base64 string). I read in 26
            bytes and typically get back 132 bytes worth of "compressed" data.

            • Marc Gravell

              #7
              Re: GZip Compression :(

              I wouldn't bother compressing 26 bytes... gzip itself has header overhead
              etc. This also isn't enough space to actually get many useful compression
              opportunities. Finally, it depends on what the data is: if it is fairly
              random (a complex image, a security token, etc) then it simply won't
              compress.

              Marc
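
Both effects described above (fixed overhead on tiny inputs, and random data not compressing) can be checked with a small experiment. This is a rough sketch with made-up names (GzipOverheadSketch, CompressedSize, RepetitiveSample, RandomSample). The exact sizes depend on the .NET version, so none are quoted here. The gzip format itself adds at least a 10-byte header and an 8-byte trailer around the deflate data.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class GzipOverheadSketch
{
    // Returns the number of bytes gzip produces for the given input.
    public static int CompressedSize(byte[] input)
    {
        using (MemoryStream dest = new MemoryStream())
        {
            using (GZipStream zip = new GZipStream(dest, CompressionMode.Compress, true))
            {
                zip.Write(input, 0, input.Length);
            } // Dispose first so the trailer is included in the count.
            return (int)dest.Length;
        }
    }

    // Highly repetitive data: the same 9 characters over and over.
    public static byte[] RepetitiveSample(int length)
    {
        const string pattern = "Hi hi hi ";
        byte[] data = new byte[length];
        for (int i = 0; i < length; i++)
        {
            data[i] = (byte)pattern[i % pattern.Length];
        }
        return data;
    }

    // Pseudo-random data with a fixed seed (arbitrary value) for repeatability.
    public static byte[] RandomSample(int length)
    {
        byte[] data = new byte[length];
        new Random(12345).NextBytes(data);
        return data;
    }
}
```

Calling CompressedSize on RepetitiveSample(100000) should give a result far below 100000, while RandomSample(100000) should come back at least as large as the input. A 26-byte input with no repetition, such as the lowercase alphabet, should come back larger than 26 bytes because of the fixed framing.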



              • Marc Gravell

                #8
                Re: GZip Compression :(

                Demo; outputs "125"; compression just isn't going to help you with very
                short inputs:

                using (MemoryStream dest = new MemoryStream()) {
                    using (GZipStream zip = new GZipStream(dest,
                            CompressionMode.Compress, true))
                    using (StreamWriter writer = new StreamWriter(zip)) {
                        writer.Write("Hi hi hi");
                        writer.Close();
                        zip.Close();
                    }
                    Console.WriteLine(dest.Length);
                }

                Marc



                • Carlo Razzeto

                  #9
                  Re: GZip Compression :(

                  Ah, yeah, I hadn't been considering the compression headers. Thanks for
                  reminding me of that, so that makes sense. IRL this code isn't going to be
                  used to compress 25-byte files, more like PDF files of several KB to a MB
                  or two, so it should be fine. Thanks,

                  Carlo

                  • Marc Gravell

                    #10
                    Re: GZip Compression :(

                    One approach would be to use the first byte to indicate whether compression
                    is on (and which kind), i.e. 0x00 = none, 0x01 = gzip, etc. I use this trick
                    quite happily: pick a cutoff under which you won't even bother trying to
                    compress; otherwise try compressing it and see if it got shorter (even
                    some non-trivial data gets longer when "compressed"). Worth consideration
                    perhaps... And in reverse, check the first byte: if 0, return the rest of
                    the stream vanilla; if 1, gunzip it; etc...

                    Marc
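
The flag-byte scheme described above might look roughly like this. It is a sketch under stated assumptions, not Marc's actual code: the 0x00 and 0x01 values come from the post, but the 256-byte cutoff, the class name FlaggedPayloadSketch and the Pack/Unpack method names are all invented for illustration, and Stream.CopyTo assumes .NET 4 or later.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class FlaggedPayloadSketch
{
    const byte FlagNone = 0x00;   // payload stored as-is
    const byte FlagGzip = 0x01;   // payload is gzip-compressed
    const int MinSizeToTry = 256; // assumed cutoff; tune for real data

    // Prefixes the data with a flag byte, compressing only when it helps.
    public static byte[] Pack(byte[] data)
    {
        if (data.Length >= MinSizeToTry)
        {
            byte[] zipped = Gzip(data);
            if (zipped.Length < data.Length)
            {
                return Prefix(FlagGzip, zipped);
            }
        }
        return Prefix(FlagNone, data);
    }

    // Reads the flag byte and returns the original data.
    public static byte[] Unpack(byte[] packed)
    {
        if (packed.Length == 0) throw new ArgumentException("Missing flag byte.");
        byte[] body = new byte[packed.Length - 1];
        Buffer.BlockCopy(packed, 1, body, 0, body.Length);
        switch (packed[0])
        {
            case FlagNone: return body;
            case FlagGzip: return Gunzip(body);
            default: throw new InvalidDataException("Unknown compression flag.");
        }
    }

    static byte[] Prefix(byte flag, byte[] body)
    {
        byte[] result = new byte[body.Length + 1];
        result[0] = flag;
        Buffer.BlockCopy(body, 0, result, 1, body.Length);
        return result;
    }

    static byte[] Gzip(byte[] data)
    {
        using (MemoryStream dest = new MemoryStream())
        {
            using (GZipStream zip = new GZipStream(dest, CompressionMode.Compress, true))
            {
                zip.Write(data, 0, data.Length);
            } // Dispose before ToArray so the output is complete.
            return dest.ToArray();
        }
    }

    static byte[] Gunzip(byte[] data)
    {
        using (MemoryStream src = new MemoryStream(data))
        using (GZipStream zip = new GZipStream(src, CompressionMode.Decompress))
        using (MemoryStream dest = new MemoryStream())
        {
            zip.CopyTo(dest); // assumes .NET 4 or later
            return dest.ToArray();
        }
    }
}
```

With this layout the packed result is never more than one byte longer than the input, and the receiver does not need a separate "Compressed" attribute to know how to decode it.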

