MD5CryptoServiceProvider Hashing a split file

  • John Smith

    MD5CryptoServiceProvider Hashing a split file

    Hi,

    I am very new to C# and the .NET Framework. I am trying to hash (using
    MD5CryptoServiceProvider) a source that is split into several files.

    When the source is in one file, I can produce the correct MD5 hash.

    My issue is: how can I reproduce the correct hash when the source is split
    across several files?

    Thanks :)



  • Arne Vajhøj

    #2
    Re: MD5CryptoServiceProvider Hashing a split file

    John Smith wrote:
    > I am very new to C# and the .NET Framework. I am trying to hash (using
    > MD5CryptoServiceProvider) a source that is split into several files.
    >
    > When the source is in one file, I can produce the correct MD5 hash.
    >
    > My issue is: how can I reproduce the correct hash when the source is split
    > across several files?

    A hash is calculated based on the byte content.

    Why does it make a difference whether those bytes are read
    from a single file or from multiple files?

    Arne


    • John Smith

      #3
      Re: MD5CryptoServiceProvider Hashing a split file

      Arne Vajhøj wrote:
      > A hash is calculated based on the byte content.
      >
      > Why does it make a difference whether those bytes are read
      > from a single file or from multiple files?

      Thanks Arne.

      I think I might not have explained myself well. Let me rephrase: I have no
      clue how to do it. :?

      I think the best way is to show you my problem with some quick example code:

      ------------------------------------------------------------
      MD5CryptoServiceProvider oMD5 = new MD5CryptoServiceProvider();
      string sRet;

      string s1 = "First String Sample";
      string s2 = "Second String Sample";
      string s3 = s1 + s2;

      // Hash of s1 alone
      byte[] bBytes = System.Text.ASCIIEncoding.ASCII.GetBytes(s1);
      sRet = BitConverter.ToString(oMD5.ComputeHash(bBytes)).Replace("-", string.Empty);
      System.Diagnostics.Debug.WriteLine(sRet);

      // Hash of s2 alone
      bBytes = System.Text.ASCIIEncoding.ASCII.GetBytes(s2);
      sRet = BitConverter.ToString(oMD5.ComputeHash(bBytes)).Replace("-", string.Empty);
      System.Diagnostics.Debug.WriteLine(sRet);

      // Hash of the concatenation s1 + s2
      bBytes = System.Text.ASCIIEncoding.ASCII.GetBytes(s3);
      sRet = BitConverter.ToString(oMD5.ComputeHash(bBytes)).Replace("-", string.Empty);
      System.Diagnostics.Debug.WriteLine(sRet);
      ------------------------------------------------------------

      The output hashes are as follows:
      s1 = 1EC25881AD012D4CA6E73D1986AE93FB
      s2 = D8D46AC432C7251F863C2D5B91FE48FC
      s3 = 9E158DDEE697EBAEC2A036F459B02448

      What I want is to hash s1, get the result, and then continue
      hashing s2 to get the final s3 result.

      Right now the only way I know of getting the s3 hash is to first
      concatenate the strings and then run the result through ComputeHash.

      This isn't much of an issue when the input is a small string, but
      hashing several files is a different matter. These files can be large,
      and the only way I know of doing it is to combine all the files into a
      single temporary file and then pass the stream to ComputeHash.

      Surely there has to be a better method.

      Any advice?

      Thanks

      • Arne Vajhøj

        #4
        Re: MD5CryptoServiceProvider Hashing a split file

        John Smith wrote:
        > I think the best way is to show you my problem with some quick example code:

        Example code is always good.

        > These files can be large, and the only way I know of doing it is to
        > combine all the files into a single temporary file and then
        > pass the stream to ComputeHash.

        You cannot "add" MD5 checksums.

        But if you use TransformBlock and TransformFinalBlock instead
        of ComputeHash, then you should be able to process small
        chunks (like 1 MB or 10 MB) at a time - even coming from
        multiple files.

        Arne



        • Arne Vajhøj

          #5
          Re: MD5CryptoServiceProvider Hashing a split file

          Arne Vajhøj wrote:
          > You cannot "add" MD5 checksums.
          >
          > But if you use TransformBlock and TransformFinalBlock instead
          > of ComputeHash, then you should be able to process small
          > chunks (like 1 MB or 10 MB) at a time - even coming from
          > multiple files.

          Example:

          using System;
          using System.Text;
          using System.Security.Cryptography;

          namespace E
          {
              public class Program
              {
                  public static void Main(string[] args)
                  {
                      MD5CryptoServiceProvider md5 = new MD5CryptoServiceProvider();

                      // One-shot hashes of s1, s2 and s1 + s2 for comparison.
                      string s1 = "First String Sample";
                      Console.WriteLine(BitConverter.ToString(md5.ComputeHash(Encoding.UTF8.GetBytes(s1))).Replace("-", ""));
                      string s2 = "Second String Sample";
                      Console.WriteLine(BitConverter.ToString(md5.ComputeHash(Encoding.UTF8.GetBytes(s2))).Replace("-", ""));
                      string s3 = s1 + s2;
                      Console.WriteLine(BitConverter.ToString(md5.ComputeHash(Encoding.UTF8.GetBytes(s3))).Replace("-", ""));

                      // Incremental hash: feed s1, then finish with s2.
                      // The result should match the one-shot hash of s3.
                      md5.Initialize();
                      byte[] garbage = new byte[1000000]; // receives a copy of the input; not used afterwards
                      md5.TransformBlock(Encoding.UTF8.GetBytes(s1), 0, Encoding.UTF8.GetByteCount(s1), garbage, 0);
                      md5.TransformFinalBlock(Encoding.UTF8.GetBytes(s2), 0, Encoding.UTF8.GetByteCount(s2));
                      Console.WriteLine(BitConverter.ToString(md5.Hash).Replace("-", ""));

                      Console.ReadKey();
                  }
              }
          }

          (it may be possible to optimize it a bit, but it should
          show the concept)
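
The same concept can be carried over from strings to the split files in the original question. The following is only a minimal sketch under stated assumptions: the class name SplitFileHasher, the method HashParts, and the 1 MB buffer size are hypothetical choices for illustration, and MD5.Create() is swapped in for MD5CryptoServiceProvider because it exposes the same TransformBlock and TransformFinalBlock calls. It assumes the parts are passed in the order in which they were split.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class SplitFileHasher
{
    // Hypothetical helper: returns the MD5 of the concatenated contents of
    // the given files, read in the order supplied, as an uppercase hex string.
    public static string HashParts(string[] partPaths)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] buffer = new byte[1024 * 1024]; // read 1 MB at a time

            foreach (string path in partPaths)
            {
                using (FileStream stream = File.OpenRead(path))
                {
                    int read;
                    while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        // Passing null as the output buffer skips the copy of the input.
                        md5.TransformBlock(buffer, 0, read, null, 0);
                    }
                }
            }

            // All data has been fed in; finish with an empty final block.
            md5.TransformFinalBlock(buffer, 0, 0);
            return BitConverter.ToString(md5.Hash).Replace("-", "");
        }
    }
}
```

A call such as SplitFileHasher.HashParts(new[] { "data.001", "data.002" }) (file names made up here) should then give the same value as hashing the unsplit file, without building a temporary combined file.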

          Arne


          • John Smith

            #6
            Re: MD5CryptoServiceProvider Hashing a split file

            Arne Vajhøj wrote:
            > (it may be possible to optimize it a bit, but it should
            > show the concept)

            Ahhhh. I wish I had seen the code earlier. I actually figured it out after you pointed me to TransformBlock.
            Thanks Arne, you've been a great help. Saved me a lot of time.

            I still have one final issue, and I don't think it can be solved (easily): working out the hash at each stage.

            So hash for s1
            So hash for s1 + s2
            So hash for s1 + s2 + s3
            etc...

            It seems that I can use TransformBlock, but I am unable to get the current "total" hash of the chunks processed so far.

            The only way I can think of doing it is to make a copy of the md5 object, which to my understanding is a pain in the butt in C#.

            Have any suggestions?

            Thx for all the help

            • Arne Vajhøj

              #7
              Re: MD5CryptoServiceProvider Hashing a split file

              John Smith wrote:
              > I still have one final issue, and I don't think it can be solved (easily):
              > working out the hash at each stage.
              >
              > So hash for s1
              > So hash for s1 + s2
              > So hash for s1 + s2 + s3
              > etc...
              >
              > It seems that I can use TransformBlock, but I am unable to get the
              > current "total" hash of the chunks processed so far.
              >
              > The only way I can think of doing it is to make a copy of the md5
              > object, which to my understanding is a pain in the butt in C#.
              >
              > Have any suggestions?

              I don't think that is easily possible.

              I think what I would do is have two MD5 hashers:
              one that I reset for each file and one for the total, and
              then call both of them with the data.

              I know that MD5(individual) and MD5(total) is not the
              same as MD5(accumulate(individual)) and MD5(total), but
              it may be OK.
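
The two-hasher suggestion above could look roughly like the sketch below. It is an illustration under stated assumptions, not code from the thread: the names PartAndTotalHasher and HashPartsAndTotal and the 64 KB buffer are hypothetical, and MD5.Create() is used in place of MD5CryptoServiceProvider. As noted above, it yields one hash per part plus one hash of everything, not the running hash after each part.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

public static class PartAndTotalHasher
{
    private static string ToHex(byte[] hash)
    {
        return BitConverter.ToString(hash).Replace("-", "");
    }

    // Hypothetical helper: returns the MD5 of each part in order, followed by
    // the MD5 of all parts concatenated as the last element of the list.
    public static List<string> HashPartsAndTotal(IEnumerable<Stream> parts)
    {
        List<string> hashes = new List<string>();
        byte[] buffer = new byte[64 * 1024];
        byte[] empty = new byte[0];

        using (MD5 perPart = MD5.Create())
        using (MD5 total = MD5.Create())
        {
            foreach (Stream part in parts)
            {
                int read;
                while ((read = part.Read(buffer, 0, buffer.Length)) > 0)
                {
                    // Feed the same chunk to both hashers.
                    perPart.TransformBlock(buffer, 0, read, null, 0);
                    total.TransformBlock(buffer, 0, read, null, 0);
                }

                // Finish this part only, then reset its hasher for the next part.
                perPart.TransformFinalBlock(empty, 0, 0);
                hashes.Add(ToHex(perPart.Hash));
                perPart.Initialize();
            }

            // The total hasher is finished once, after the last part.
            total.TransformFinalBlock(empty, 0, 0);
            hashes.Add(ToHex(total.Hash));
        }

        return hashes;
    }
}
```

Each chunk is read once and passed to both hashers, so the extra cost of the second hasher is CPU time only, not additional file reads.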

              Arne


              • John Smith

                #8
                Re: MD5CryptoServiceProvider Hashing a split file

                Thanks. I think it would have to be separate hashers like you said.
