File Hashing

**Cor Ligthert [MVP]** · Jun 27 '08, 08:16 PM

Re: File Hashing

In my opinion, that would mean that you could reconstruct the file from the hash code.

Cor

"Johnny Jörgensen" <jojo@altcom.se> wrote in message
news:OWabSyWwIHA.4912@TK2MSFTNGP03.phx.gbl...

> That wasn't the intention behind my question either. I simply wanted to
> determine whether two files are identical.
>
> /Johnny J.
>
> "Cor Ligthert[MVP]" <notmyfirstname@planet.nl> wrote in message
> news:304C03FC-5865-440E-A131-817D8F3B7AB7@microsoft.com...
>> Johnny,
>>
>> No, you can only be sure that there is a low chance that somebody could
>> recreate your files by guessing their content.
>>
>> In my opinion, checking whether something is complete has nothing to do
>> with security encryption.
>>
>> Cor
>>
>> "Johnny Jörgensen" <jojo@altcom.se> wrote in message
>> news:%230qvXMPwIHA.2188@TK2MSFTNGP04.phx.gbl...
>>> I'm wondering (and hoping that somebody will be able to answer this):
>>>
>>> If I calculate the hash value of files (either MD5 or SHA1), can I then
>>> be sure that:
>>>
>>> 1) Two files with the same hash value are in fact identical?
>>>
>>> 2) Two different files will NEVER have the same hash value?
>>>
>>> 3) If two files have the same MD5 hash value, they will ALSO have the
>>> same SHA1 hash value (I should think that will always be the case)?
>>>
>>> TIA,
>>> Johnny J.
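Johnny's question starts from computing MD5 or SHA-1 digests of whole files. A minimal sketch follows, assuming Python's standard-library `hashlib`. The thread itself names no language or API, and the helper name `file_digests` and the 64 KiB chunk size are illustrative choices, not anything from the posts.

```python
import hashlib


def file_digests(path, chunk_size=64 * 1024):
    """Return (md5_hex, sha1_hex) for the file at `path`, reading it in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files never have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```

Calling `file_digests("some.file")` returns both hex digests. Equal digests for two files are the starting point of the questions above, not by themselves a proof that the files are identical.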

**Joergen Bech** · Jun 27 '08, 08:16 PM

Re: File Hashing

Well, if you could create the file from the hash code, then it really
wouldn't be hashing, would it?

Then it would be cryptography, which would serve no purpose for what the
OP is trying to achieve.

Regards,

Joergen Bech

On Thu, 29 May 2008 12:36:38 +0200, "Cor Ligthert [MVP]"
<notmyfirstname@planet.nl> wrote:

> In my opinion, that would mean that you could reconstruct the file from the hash code.
>
> Cor
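Joergen's point can be made concrete. A hash function maps input of any length to a fixed-size digest, so a 16-byte MD5 value cannot carry enough information to rebuild, say, a 1 MiB file. The short sketch below uses Python's `hashlib`. The helper `digest_sizes` is hypothetical and exists only for illustration.

```python
import hashlib


def digest_sizes(data):
    """Return the MD5 and SHA-1 digest lengths, in bytes, for `data`."""
    return len(hashlib.md5(data).digest()), len(hashlib.sha1(data).digest())


# Inputs ranging from 1 byte to 1 MiB.
samples = [b"x", b"x" * 1000, b"x" * (1024 * 1024)]

for data in samples:
    md5_len, sha1_len = digest_sizes(data)
    # The digest size never depends on the input size.
    print(f"{len(data):>8} bytes in -> MD5 {md5_len} bytes, SHA-1 {sha1_len} bytes")
```

A 1 MiB file holds 8,388,608 bits against MD5's 128. By simple counting, an astronomical number of distinct 1 MiB files must share each MD5 value on average. The hash can confirm a file you already have, but it cannot reproduce one you don't.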

**KH** · Jun 27 '08, 08:16 PM

Re: File Hashing

I guess I don't see your point. I don't know exactly what the OP is doing,
but I was simply suggesting a shortcut he could take to identify hash
collisions using a piece of information he may well have on hand, before
doing a possibly more expensive byte-for-byte comparison.
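KH's shortcut can be sketched in Python: compare sizes first, then digests, and compare bytes only as a last step. The function names here are hypothetical. `os.path.getsize`, `hashlib` and `filecmp.cmp` are standard library.

```python
import filecmp
import hashlib
import os


def sha1_of_file(path, chunk_size=64 * 1024):
    """SHA-1 hex digest of a file, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def files_identical(a, b):
    """Size check, then digest check, then byte-for-byte confirmation."""
    # Cheapest test first: files of different length cannot be identical.
    if os.path.getsize(a) != os.path.getsize(b):
        return False
    # Same length but different digests: the contents definitely differ.
    if sha1_of_file(a) != sha1_of_file(b):
        return False
    # Same length and same digest: confirm byte by byte.
    # shallow=False makes filecmp compare file contents rather than
    # trusting matching os.stat() signatures.
    return filecmp.cmp(a, b, shallow=False)
```

For a single pair of local files, hashing both still reads every byte of each, so after the size check a direct `filecmp.cmp(a, b, shallow=False)` is usually cheaper. The digest step pays off when digests are stored and reused, for example when checking many files against each other.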

"Barry Kelly" wrote:

> KH wrote:
>> You could use the file length as an additional piece of "metadata" -- if two
>> files were to have the same hash but different byte lengths then they are not
>> the same. That's probably going to solve most hash collisions.
>
> File length (specifically, bit count) is part of the MD5 and SHA1 hash
> calculations. There is less information per bit of a separate length
> indicator than you're getting out of the bits in the MD5 or SHA1 hashes.
> If minimizing collisions is the priority, then using a better hash
> function, like SHA-224, SHA-256, etc. will give better "bang for buck"
> in terms of bit information. Given that you could probably expect file
> length to require a 64-bit number, choosing SHA-224 (224 = 160 + 64 bits)
> over 160-bit SHA-1 seems obvious.
>
> Because of the birthday paradox, accidental collisions with hash
> functions are more common than the astronomical numbers like 2**128 and
> 2**160 seem to suggest: there is a 50% chance of a collision after roughly
> 1.18 times the square root of the number of possible hash values,
> assuming the hash values are distributed evenly.
>
> That works out to a 50% chance of collision after around 2**64 (MD5) or
> 2**80 (SHA-1) hashed files.
>
> 2**64 and 2**80 are still large numbers, unlikely to be met in practice
> where file comparison is the goal of hashing.
>
> Of course, specially crafted collisions have been found for MD5, and
> attacks are underway with 2**35 evaluations for SHA-1. But these won't
> be of concern for file comparison.
>
> -- Barry
>
> --
> Entropy Overload
> http://barrkel.blogspot.com/
>
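Barry's birthday-paradox figures can be checked with the usual approximation p ≈ 1 - exp(-n^2 / (2N)) for n hashes drawn uniformly from N = 2^bits values. Setting p = 0.5 gives n ≈ sqrt(2 ln 2 · N) ≈ 1.18·sqrt(N). The related figure sqrt(πN/2) ≈ 1.25·sqrt(N) is the expected number of hashes before the first collision. The Python sketch below assumes perfectly uniform hash outputs, and its function names are illustrative.

```python
import math


def collision_probability(n, bits):
    """Approximate chance that n uniformly random `bits`-bit hashes contain a collision."""
    N = 2.0 ** bits
    # p ~ 1 - exp(-n^2 / 2N); expm1 keeps precision when the result is tiny.
    return -math.expm1(-(n * n) / (2 * N))


def hashes_for_even_odds(bits):
    """Number of hashes at which the approximate collision chance reaches 50%."""
    return math.sqrt(2 * math.log(2) * 2.0 ** bits)


for name, bits in [("MD5", 128), ("SHA-1", 160), ("SHA-256", 256)]:
    n50 = hashes_for_even_odds(bits)
    p_billion = collision_probability(1e9, bits)
    print(f"{name:7} 50% at ~2**{math.log2(n50):.1f} hashes; "
          f"p(collision among 1e9 files) ~ {p_billion:.1e}")
```

Even a billion files hashed with MD5 give an accidental-collision probability on the order of 1e-21 under this model. That supports the point that accidental collisions are not a practical concern for file comparison, as opposed to deliberately crafted ones.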

**Alun Harford** · Jun 27 '08, 08:16 PM

Re: File Hashing

Johnny Jörgensen wrote:
> I'm wondering (and hoping that somebody will be able to answer this):
>
> If I calculate the hash value of files (either MD5 or SHA1), can I then be
> sure that:
>
> 1) Two files with the same hash value are in fact identical?

Yes (sort of). If you hash two non-identical files and the same hash is
produced, this is more likely to be due to memory corruption than a
break in either MD5 or SHA1.

> 2) Two different files will NEVER have the same hash value?

No (sort of), by the pigeonhole principle: there are far more possible
files than possible hash values, so some different files must share a hash.
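The pigeonhole argument becomes visible if the hash is shrunk. This Python sketch keeps only the first 2 bytes of SHA-1, a hypothetical 16-bit toy hash that is not meant for real use. Among any 65,537 distinct inputs, at least two must then share a value, and because of the birthday effect a collision typically turns up after only a few hundred inputs. Full SHA-1 obeys the same principle, just over 2^160 values.

```python
import hashlib
from itertools import count


def toy_hash(data, n_bytes=2):
    """First n_bytes of SHA-1: a deliberately tiny hash (16 bits by default)."""
    return hashlib.sha1(data).digest()[:n_bytes]


def find_collision(n_bytes=2):
    """Hash b"0", b"1", b"2", ... until two different inputs share a toy hash."""
    seen = {}  # toy hash -> first input that produced it
    for i in count():
        data = str(i).encode()
        h = toy_hash(data, n_bytes)
        if h in seen:
            return seen[h], data, i + 1  # the pair, plus how many inputs were hashed
        seen[h] = data


first, second, tried = find_collision()
print(f"{first!r} and {second!r} share toy hash {toy_hash(first).hex()} "
      f"after {tried} inputs")
```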

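> 3) If two files have the same MD5 hash value, they will ALSO have the same
> SHA1 hash value (I should think that will always be the case)?

Yes and no, as above. Yes in practice, since files with the same MD5 hash
are almost always identical and identical files share every hash. No in
principle, since two different files that collide under MD5 will almost
certainly have different SHA1 hashes.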
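Question 3 can be probed the same way. This sketch again uses hypothetical 16-bit toy truncations, here of MD5 and SHA-1 through `hashlib.new`. It finds two different inputs whose toy MD5 values match and then compares their toy SHA-1 values. Nothing about an MD5 match forces the second hash to agree, whereas identical inputs agree under every hash.

```python
import hashlib


def toy(algorithm, data, n_bytes=2):
    """First n_bytes of the named hashlib digest: a toy 16-bit hash by default."""
    return hashlib.new(algorithm, data).digest()[:n_bytes]


def toy_md5_collision(n_bytes=2):
    """Return two different inputs whose toy MD5 values are equal."""
    seen = {}
    i = 0
    while True:
        data = b"file-%d" % i
        h = toy("md5", data, n_bytes)
        if h in seen:
            return seen[h], data
        seen[h] = data
        i += 1


a, b = toy_md5_collision()
print("toy MD5 equal:  ", toy("md5", a) == toy("md5", b))    # True by construction
print("toy SHA-1 equal:", toy("sha1", a) == toy("sha1", b))  # not implied; usually False
print("same input, SHA-1 equal:", toy("sha1", a) == toy("sha1", a))  # always True
```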

Alun Harford
