Re: message digest of large files
From: James Whitwell (jams_at_momento.com.au)
Date: Thu, 18 Aug 2005 10:48:03 +1000
Kristian Gjøsteen wrote:
> James Whitwell <firstname.lastname@example.org> wrote:
>>We're trying to use message digests as a way to uniquely identify large
>>binary files (around 50-60MB). Is there a limit to the size of the file
>>that we feed through, say SHA1?
> Yes, there is a limit, but I believe it is 2^64 bits or something
> like that, so there is no need to worry. You should check the
> relevant standard (FIPS 180-1).
Thanks for everyone's replies. I'm reading the FIPS 180-1 standard now;
hopefully it'll sink in.
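For what it's worth, there's no need to load a 50-60MB file into memory to hash it: the usual approach is to feed it through the digest incrementally. A minimal sketch in Python (the function name and chunk size are just illustrative), verified against the SHA-1 test vector for "abc" given in FIPS 180-1:

```python
import hashlib
import io

def sha1_of_stream(fileobj, chunk_size=64 * 1024):
    """Hash a file-like object incrementally, so arbitrarily large
    files never need to be held in memory all at once."""
    h = hashlib.sha1()
    # iter() with a sentinel keeps reading until read() returns b""
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()

# FIPS 180-1 test vector: SHA-1("abc")
digest = sha1_of_stream(io.BytesIO(b"abc"))
```

Feeding the data in chunks produces exactly the same digest as hashing it in one call, which is why the 2^64-bit message-length limit is the only practical constraint.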
Is there a way to determine the probability that two different files
produce the same hash? My reasoning is that if I have lots of large
files (say 10,000 files of 50MB each) and the hash is only 160 bits
long, surely I'll get a collision fairly quickly? That's why I thought
chopping each file into smaller chunks, generating a hash for each
chunk, and concatenating the hashes into one long ID would help me
avoid collisions. The files are PDFs that have been encrypted using
Blowfish, so I'd assume their contents are pretty random.
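The standard way to estimate this is the birthday approximation: among n uniformly random b-bit values, the chance of at least one collision is roughly 1 - exp(-n(n-1)/2^(b+1)). A quick sketch, assuming SHA-1 outputs behave like uniform random 160-bit values (the function name is just for illustration):

```python
import math

def birthday_collision_prob(n_items, bits):
    """Approximate probability of at least one collision among
    n_items uniformly random values of the given bit length,
    via p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -n_items * (n_items - 1) / 2 ** (bits + 1)
    return 1 - math.exp(exponent)

# 10,000 files hashed to 160 bits
p = birthday_collision_prob(10_000, 160)
```

For 10,000 files the exponent is about 10^8 / 2^161, so the probability is on the order of 10^-41; a 160-bit hash doesn't start to see collisions until the number of items approaches 2^80.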