Re: message digest of large files

From: James Whitwell (jams_at_momento.com.au)
Date: Thu, 18 Aug 2005 10:48:03 +1000

Kristian Gjøsteen wrote:
> James Whitwell <jams@momento.com.au> wrote:
>
>>We're trying to use message digests as a way to uniquely identify large
>>binary files (around 50-60MB). Is there a limit to the size of the file
>>that we feed through, say SHA1?
>
>
> Yes, there is a limit, but I believe it is 2^64 bits or something
> like that, so there is no need to worry. You should check the
> relevant standard (FIPS 180-1).
>

Thanks for everyone's replies. I'm reading the FIPS 180-1 standard now;
hopefully it'll sink in.
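
For anyone who finds this thread later, here's the sort of thing I have
in mind for feeding a large file through SHA-1 a piece at a time, so the
whole 50-60MB never has to sit in memory. It's only a rough sketch, and
I've used Python's hashlib purely to illustrate; the chunk size and the
file path are placeholders of mine:

import hashlib

def sha1_of_file(path, chunk_size=1024 * 1024):
    # Stream the file through SHA-1 in 1MB chunks; memory use stays
    # flat no matter how large the file is.
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()

# e.g. sha1_of_file("encrypted-document.pdf")   (hypothetical path)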

Is there a way to work out the probability of two different files
producing the same hash? My reasoning is that if I have lots of large
files (say 10000 files of 50MB each) and the hash is only 160 bits
long, surely I'll get a collision fairly quickly? That's why I thought
chopping my files into smaller chunks, generating a hash for each
chunk, and then concatenating the hashes into one long ID would help me
avoid collisions. The files are PDFs that have been encrypted with
Blowfish, so I'd assume their contents are pretty random.
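
To try to put a rough number on my own question: the usual birthday
approximation says that with n items and a b-bit hash, the collision
probability is about n*(n-1)/2^(b+1). A quick back-of-the-envelope
check (Python again, just as a calculator):

n = 10000            # number of files
b = 160              # SHA-1 output size in bits
p = n * (n - 1) / 2.0 ** (b + 1)
print(p)             # roughly 3.4e-41, i.e. effectively zero

If I've done that right, the chance of a collision among 10000 files
looks vanishingly small, but I'd appreciate someone checking the
reasoning.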

thanks,
;) james.