File Hashing

  • Johnny Jörgensen

    File Hashing

    I'm wondering (and hoping that somebody will be able to answer this):

    If I calculate the hash value of files (either MD5 or SHA1), can I then be
    sure that:

    1) Two files with the same hash value are in fact identical?

    2) Two different files will NEVER have the same hash value?

    3) If two files have the same MD5 hash value, they will ALSO have the same
    SHA1 hash value (I should think that will always be the case)?

    TIA,
    Johnny J.


  • Barry Kelly

    #2
    Re: File Hashing

    Johnny Jörgensen wrote:
    > I'm wondering (and hoping that somebody will be able to answer this):
    >
    > If I calculate the hash value of files (either MD5 or SHA1), can I then be
    > sure that:
    >
    > 1) Two files with the same hash value are in fact identical?

    No, but fairly sure.

    > 2) Two different files will NEVER have the same hash value?

    No, but fairly sure.

    > 3) If two files have the same MD5 hash value, they will ALSO have the same
    > SHA1 hash value (I should think that will always be the case)?

    No, but fairly sure.

    -- Barry


    • Peter Duniho

      #3
      Re: File Hashing

      Please do not cross-post between language groups. It's one thing to
      "abuse" the C# newsgroup with non-language .NET questions (we all do it
      all the time :) ). But if your .NET question is even nominally on-topic
      in the C# newsgroup (by virtue of the language you're using), it's
      definitely off-topic in the VB.NET newsgroup, and vice versa.

      Follow-ups to m.p.d.l.csharp.

      As for the question, you would do well to search this newsgroup for
      keywords like "hash", "identical", "file", etc. You'd be amazed at what's
      already been said on the topic (especially on your first two questions).

      But, the short version is:

      On Wed, 28 May 2008 11:44:27 -0700, Johnny Jörgensen <jojo@altcom.se>
      wrote:
      > I'm wondering (and hoping that somebody will be able to answer this):
      >
      > If I calculate the hash value of files (either MD5 or SHA1), can I then
      > be sure that:
      >
      > 1) Two files with the same hash value are in fact identical?
      >
      > 2) Two different files will NEVER have the same hash value?

      Your first two questions are the same, and so the answer for both is the
      same: no, you cannot be sure of that.

      > 3) If two files have the same MD5 hash value, they will ALSO have the
      > same SHA1 hash value (I should think that will always be the case)?

      Granted, I'm not a crypto expert. However, I'd say the answer to this is
      also "no". If MD5 provided just as much differentiating power as SHA1,
      even though it's 128 bits while SHA1 is 160 bits, then why would anyone
      bother with SHA1? No, I think it's safe to say that there are at least
      some pairs of files for which the MD5 hash is identical, but the SHA1 hash
      is not.

      Of course, finding two different files that produce the exact same hash in
      either algorithm is either contrived or very difficult. But then, it's
      still a possibility (see the answer to questions #1 and #2). :)

      Pete


      • sylvain.rodrigue@gmail.com

        #4
        Re: File Hashing

        On May 28, 8:44 pm, "Johnny Jörgensen" <j...@altcom.se> wrote:
        > I'm wondering (and hoping that somebody will be able to answer this):
        >
        > If I calculate the hash value of files (either MD5 or SHA1), can I then be
        > sure that:
        >
        > 1) Two files with the same hash value are in fact identical?
        >
        > 2) Two different files will NEVER have the same hash value?
        >
        > 3) If two files have the same MD5 hash value, they will ALSO have the same
        > SHA1 hash value (I should think that will always be the case)?
        >
        > TIA,
        > Johnny J.

        Hello,

        All hashing functions have a finite set of return values (say: 2^128)
        but an infinite number of possible input values. This clearly implies
        that two input values CAN generate the same output value.

        But in practice, the probability of finding two input values that
        generate the same hash signature is pretty close to zero. I would
        say:

        1) Yes. It will be the same file (well, most of the time; read this:
        http://www.mathstat.dal.ca/~selinger/md5collision/)
        2) Yes.
        3) No. Using both an MD5 and a SHA-1 will in fact reduce the number of
        possible collisions.
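
As a rough illustration of using both hashes together (this is not code from the thread, and Python is used here only for brevity; .NET offers comparable classes in System.Security.Cryptography), the sketch below reads a file once and feeds each chunk to both MD5 and SHA-1. The function name `md5_and_sha1` and the chunk size are assumptions made for the example.

```python
import hashlib


def md5_and_sha1(path, chunk_size=1 << 20):
    """Return (md5_hex, sha1_hex) for the file at `path`, reading it only once.

    `md5_and_sha1` is a hypothetical helper name; `chunk_size` is an
    arbitrary 1 MiB read size chosen for the example.
    """
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```

Two files would then be treated as "probably identical" only when both elements of the returned pair match. For two different files to slip through by accident, both hashes would have to collide at the same time.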


        • Jon Skeet [C# MVP]

          #5
          Re: File Hashing

          Johnny Jörgensen <jojo@altcom.se> wrote:
          > I'm wondering (and hoping that somebody will be able to answer this):
          >
          > If I calculate the hash value of files (either MD5 or SHA1), can I then be
          > sure that:
          >
          > 1) Two files with the same hash value are in fact identical?

          No. Think how much data is contained in a hash. Suppose you have a 128
          bit hash. Now think about just files which are (say) 136 bits in
          length. How many possible files of that length are there? Now how many
          possible 128 bit hash values are there?

          A slightly different way of looking at this: suppose you see some
          people, and label each one with a different (capital) letter of the
          alphabet to tell them apart. When you've got more than 26 people,
          you're *bound* to have at least two people who have the same letter.

          > 2) Two different files will NEVER have the same hash value?

          That's the same question as question 1.

          > 3) If two files have the same MD5 hash value, they will ALSO have the same
          > SHA1 hash value (I should think that will always be the case)?

          No, not necessarily. It's incredibly likely - hashes are designed such
          that you'd be extremely unlucky to run into two files with the same
          hash but different content. It's possible though.

          --
          Jon Skeet - <skeet@pobox.com>
          Web site: http://www.pobox.com/~skeet
          Blog: http://www.msmvps.com/jon.skeet
          C# in Depth: http://csharpindepth.com
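
The counting argument above can be tried out on a deliberately tiny hash. The following is only a sketch under stated assumptions: `tiny_hash` is a made-up 16-bit toy (the top 16 bits of SHA-256), not MD5 or SHA-1, and the inputs are just the decimal strings "0", "1", "2", and so on. With only 2**16 possible outputs, 2**16 + 1 distinct inputs must contain a collision, exactly as with the 26 letters.

```python
import hashlib


def tiny_hash(data: bytes, bits: int = 16) -> int:
    """Toy hash: the top `bits` bits of the SHA-256 digest, as an integer."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def first_collision(bits: int = 16):
    """Hash distinct inputs until two of them share a toy hash value.

    By the pigeonhole principle this must succeed within 2**bits + 1 inputs.
    Returns (first_input, second_input, shared_hash).
    """
    seen = {}  # toy hash value -> first input that produced it
    for i in range(2 ** bits + 1):
        data = str(i).encode()
        h = tiny_hash(data, bits)
        if h in seen:
            return seen[h], data, h
        seen[h] = data
    raise AssertionError("unreachable: more inputs than possible hash values")


a, b, h = first_collision()
print(f"{a!r} and {b!r} are different inputs with the same 16-bit hash {h:#06x}")
```

In practice the loop stops long before it has tried all 2**16 + 1 inputs. A real 128-bit or 160-bit hash behaves the same way in principle; the space is simply far too large to search like this.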


          • KH

            #6
            RE: File Hashing

            You could use the file length as an additional piece of "metadata" -- if two
            files were to have the same hash but different byte lengths then they are not
            the same. That's probably going to solve most hash collisions. If you do
            find a case of two files having the same hash and length, then you need to do
            a byte-for-byte comparison to determine equality.

            HTH
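
The three steps suggested above (length, then hash, then byte-for-byte comparison) could be sketched roughly as follows. This is an illustrative Python version under assumptions, not code from the thread: the helper names `sha256_of` and `files_identical` are invented for the example, and SHA-256 is swapped in for MD5/SHA-1.

```python
import filecmp
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.digest()


def files_identical(path_a, path_b):
    """Decide whether two files have exactly the same content."""
    # Step 1: files of different length can never be identical.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    # Step 2: files with different hashes can never be identical.
    if sha256_of(path_a) != sha256_of(path_b):
        return False
    # Step 3: same length and same hash; only a full comparison is conclusive.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

For a one-off comparison of just two files, step 2 adds work rather than saving it, since step 3 reads both files anyway. The hash step pays off when the hashes are computed once and stored, for example when checking one new file against many known ones.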


            • Arne Vajhøj

              #7
              Re: File Hashing

              Johnny Jörgensen wrote:
              > If I calculate the hash value of files (either MD5 or SHA1), can I then be
              > sure that:
              >
              > 1) Two files with the same hash value are in fact identical?
              >
              > 2) Two different files will NEVER have the same hash value?

              Others have already answered that question.

              But there is an important point that should be
              emphasized:

              * if you want to protect against accidentally matching
              files, then you should not worry: with probabilities
              of 1/2^128 and 1/2^160, an accidental match is close
              to impossible, so both MD5 and SHA1 are fine

              * if you want to protect against maliciously matching
              files, then it is a completely different game - MD5 is
              completely broken and SHA1 is somewhat broken - neither
              is usable and you should go for SHA256 instead

              Arne



              • Cor Ligthert[MVP]

                #8
                Re: File Hashing

                Johnny,

                No, you can only be sure that there is a low chance that somebody could
                recreate your files by guessing what their content would be.

                Checking whether something is complete has, in my opinion, nothing to do
                with security encryption.

                Cor

                • Johnny Jörgensen

                  #9
                  Re: File Hashing

                  Good idea - Thanks

                  /Johnny


                  "KH" <KH@discussions.microsoft.com> wrote in message
                  news:1687E78E-FD6D-4B4C-AF3C-482ACAB77431@microsoft.com...
                  > You could use the file length as an additional piece of "metadata" -- if
                  > two files were to have the same hash but different byte lengths then they
                  > are not the same. That's probably going to solve most hash collisions. If
                  > you do find a case of two files having the same hash and length, then you
                  > need to do a byte-for-byte comparison to determine equality.
                  >
                  > HTH

                  • Johnny Jörgensen

                    #10
                    Re: File Hashing

                    Thanks

                    /Johnny J.


                    "Barry Kelly" <barry.j.kelly@gmail.com> wrote in message
                    news:i5br34l3gplbeir4ihvf96bpal86bth4o3@4ax.com...
                    > Johnny Jörgensen wrote:
                    >
                    >> I'm wondering (and hoping that somebody will be able to answer this):
                    >>
                    >> If I calculate the hash value of files (either MD5 or SHA1), can I then
                    >> be sure that:
                    >>
                    >> 1) Two files with the same hash value are in fact identical?
                    >
                    > No, but fairly sure.
                    >
                    >> 2) Two different files will NEVER have the same hash value?
                    >
                    > No, but fairly sure.
                    >
                    >> 3) If two files have the same MD5 hash value, they will ALSO have the
                    >> same SHA1 hash value (I should think that will always be the case)?
                    >
                    > No, but fairly sure.
                    >
                    > -- Barry
                    >
                    > --
                    > http://barrkel.blogspot.com/

                    • Johnny Jörgensen

                      #11
                      Re: File Hashing

                      Thanks

                      Johnny J.

                      <sylvain.rodrigue@gmail.com> wrote in message
                      news:2b15a111-e9cc-4e9f-afef-0cf45e474c74@d1g2000hsg.googlegroups.com...
                      > Hello,
                      >
                      > All hashing functions have a finite set of return values (say: 2^128)
                      > but an infinite number of possible input values. This clearly implies
                      > that two input values CAN generate the same output value.
                      >
                      > But in practice, the probability of finding two input values that
                      > generate the same hash signature is pretty close to zero. I would
                      > say:
                      >
                      > 1) Yes. It will be the same file (well, most of the time; read this:
                      > http://www.mathstat.dal.ca/~selinger/md5collision/)
                      > 2) Yes.
                      > 3) No. Using both an MD5 and a SHA-1 will in fact reduce the number of
                      > possible collisions.

                      • Johnny Jörgensen

                        #12
                        Re: File Hashing

                        Thanks

                        Johnny J.


                        "Jon Skeet [C# MVP]" <skeet@pobox.com> wrote in message
                        news:MPG.22a7c9e37c914ca3ceb@msnews.microsoft.com...
                        > Johnny Jörgensen <jojo@altcom.se> wrote:
                        >> 1) Two files with the same hash value are in fact identical?
                        >
                        > No. Think how much data is contained in a hash. Suppose you have a 128
                        > bit hash. Now think about just files which are (say) 136 bits in
                        > length. How many possible files of that length are there? Now how many
                        > possible 128 bit hash values are there?
                        >
                        > A slightly different way of looking at this: suppose you see some
                        > people, and label each one with a different (capital) letter of the
                        > alphabet to tell them apart. When you've got more than 26 people,
                        > you're *bound* to have at least two people who have the same letter.
                        >
                        >> 2) Two different files will NEVER have the same hash value?
                        >
                        > That's the same question as question 1.
                        >
                        >> 3) If two files have the same MD5 hash value, they will ALSO have the same
                        >> SHA1 hash value (I should think that will always be the case)?
                        >
                        > No, not necessarily. It's incredibly likely - hashes are designed such
                        > that you'd be extremely unlucky to run into two files with the same
                        > hash but different content. It's possible though.

                        • Johnny Jörgensen

                          #13
                          Re: File Hashing

                          Thanks

                          Johnny J.


                          "Arne Vajhøj" <arne@vajhoej.dk> wrote in message
                          news:483dd5fa$0$90267$14726298@news.sunsite.dk...
                          > Johnny Jörgensen wrote:
                          >> If I calculate the hash value of files (either MD5 or SHA1), can I then
                          >> be sure that:
                          >>
                          >> 1) Two files with the same hash value are in fact identical?
                          >>
                          >> 2) Two different files will NEVER have the same hash value?
                          >
                          > Others have already answered that question.
                          >
                          > But there is an important point that should be
                          > emphasized:
                          >
                          > * if you want to protect against accidentally matching
                          > files, then you should not worry: with probabilities
                          > of 1/2^128 and 1/2^160, an accidental match is close
                          > to impossible, so both MD5 and SHA1 are fine
                          >
                          > * if you want to protect against maliciously matching
                          > files, then it is a completely different game - MD5 is
                          > completely broken and SHA1 is somewhat broken - neither
                          > is usable and you should go for SHA256 instead
                          >
                          > Arne

                          • Johnny Jörgensen

                            #14
                            Re: File Hashing

                            That wasn't the intention behind my question either. Simply to determine if
                            two files are identical or not.

                            /Johnny J.


                            "Cor Ligthert[MVP]" <notmyfirstname@planet.nl> wrote in message
                            news:304C03FC-5865-440E-A131-817D8F3B7AB7@microsoft.com...
                            > Johnny,
                            >
                            > No, you can only be sure that there is a low chance that somebody could
                            > recreate your files by guessing what their content would be.
                            >
                            > Checking whether something is complete has, in my opinion, nothing to do
                            > with security encryption.
                            >
                            > Cor

                            • Barry Kelly

                              #15
                              Re: File Hashing

                              KH wrote:
                              > You could use the file length as an additional piece of "metadata" -- if two
                              > files were to have the same hash but different byte lengths then they are not
                              > the same. That's probably going to solve most hash collisions.

                              File length (specifically, bit count) is part of the MD5 and SHA1 hash
                              calculations. There is less information per bit of a separate length
                              indicator than you're getting out of the bits in the MD5 or SHA1 hashes.
                              If minimizing collisions is the priority, then using a better hash
                              function, like SHA-224, SHA-256, etc. will give better "bang for buck"
                              in terms of bit information. Given that you could probably expect file
                              length to require a 64-bit number, choosing SHA-224 over the 160-bit
                              SHA-1 plus a separate length field seems to be obvious.

                              Because of the birthday paradox, accidental collisions with hash
                              functions are more common than the astronomical numbers like 2**128 and
                              2**160 seem to suggest: there is a 50% chance of a collision once you
                              have hashed roughly 1.18 times the square root of the number of possible
                              hash values, assuming the hash values are distributed evenly.

                              That works out to a 50% chance of collision after around 2**64 (MD5) or
                              2**80 (SHA-1) hashed files.

                              2**64 and 2**80 are still large numbers, unlikely to be met in practice
                              where file comparison is the goal of hashing.
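
The birthday estimate can be worked through numerically. The sketch below is illustrative only and assumes evenly distributed hash values, using the standard approximation p ≈ 1 - exp(-n² / 2N) for n hashed items and N possible hash values; the function names are invented for the example. Under this approximation the multiplier for an exact 50% chance is sqrt(2 ln 2), about 1.18.

```python
import math


def collision_probability(n_items: float, hash_bits: int) -> float:
    """Approximate chance of at least one collision among n_items random hashes.

    Uses p ~= 1 - exp(-n^2 / (2N)) with N = 2**hash_bits.
    """
    space = 2.0 ** hash_bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-(n_items ** 2) / (2.0 * space))


def items_for_probability(p: float, hash_bits: int) -> float:
    """Approximate number of hashed items needed to reach collision chance p."""
    space = 2.0 ** hash_bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))


for name, bits in (("MD5", 128), ("SHA-1", 160), ("SHA-256", 256)):
    n_half = items_for_probability(0.5, bits)
    print(f"{name}: 50% collision chance after about 2**{math.log2(n_half):.1f} files")
    print(f"  chance among one billion files: {collision_probability(1e9, bits):.3e}")
```

The second printed line for each algorithm is the more practical figure: it estimates the chance of any accidental collision within a collection of a billion files.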

                              Of course, specially crafted collisions have been found for MD5, and
                              attacks are underway with 2**35 evaluations for SHA-1. But these won't
                              be of concern for file comparison.

                              -- Barry
