Re: message digest of large files

From: James Whitwell (jams_at_momento.com.au)
Date: 08/18/05


Date: Thu, 18 Aug 2005 10:48:03 +1000

Kristian Gjøsteen wrote:
> James Whitwell <jams@momento.com.au> wrote:
>
>>We're trying to use message digests as a way to uniquely identify large
>>binary files (around 50-60MB). Is there a limit to the size of the file
>>that we feed through, say SHA1?
>
>
> Yes, there is a limit, but I believe it is 2^64 bits or something
> like that, so there is no need to worry. You should check the
> relevant standard (FIPS 180-1).
>

Thanks for everyone's replies. I'm reading the FIPS 180-1 standard now;
hopefully it'll sink in.
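
To put the length limit quoted above in context, here is a minimal sketch in Python of feeding a large file through SHA-1 in blocks, so the whole file never has to sit in memory. The function name `sha1_of_file`, the 1 MiB block size, and the constant name are illustrative assumptions, not anything from the thread; the "fewer than 2^64 bits" limit is the one FIPS 180-1 states for SHA-1.

```python
import hashlib

# FIPS 180-1 defines SHA-1 for messages shorter than 2**64 bits.
MAX_SHA1_MESSAGE_BITS = 2**64 - 1


def sha1_of_file(path, block_size=1 << 20):
    """Return the hex SHA-1 digest of the file at `path`.

    The file is read in blocks of `block_size` bytes (1 MiB here, an
    arbitrary choice) and each block is fed to the same hash object,
    which gives the same digest as hashing the file in one call.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


# A 60 MB file expressed in bits, for comparison with the limit.
SIXTY_MB_IN_BITS = 60 * 1024 * 1024 * 8
```

A 60 MB file is roughly 5 x 10^8 bits, many orders of magnitude below the limit, so file size is not a practical concern here.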

Is there a way of determining the chance that two different files end up
with the same hash? My reasoning is that, if I have
lots of large files (say 10000 files of 50MB each), and the hash is only
160 bits long, surely I'll get a collision fairly quickly? That's why I
thought chopping up my files into smaller chunks and generating a hash
for each chunk, then concatenating the hashes together to form a large
unique ID would help me avoid collisions. The files are PDFs that have
been encrypted using Blowfish, so I'd assume they're pretty random.
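
One rough way to put numbers on this question is the standard birthday-bound approximation, sketched below in Python. It assumes the hash behaves like a uniformly random function, so it only covers accidental collisions, not deliberately constructed ones. The function names are made up for illustration. Note that the size of each file does not appear in the estimate; only the number of files and the digest length matter.

```python
import math


def collision_probability(n_items, hash_bits):
    """Approximate chance that at least two of `n_items` distinct inputs
    share a digest of `hash_bits` bits (birthday approximation):

        p ~= 1 - exp(-n * (n - 1) / 2**(hash_bits + 1))
    """
    exponent = -n_items * (n_items - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(exponent)


def items_for_even_odds(hash_bits):
    """Approximate number of inputs needed before a collision becomes
    about 50% likely: sqrt(2 * ln 2) * 2**(hash_bits / 2)."""
    return math.sqrt(2 * math.log(2)) * 2.0 ** (hash_bits / 2)


p = collision_probability(10_000, 160)   # the 10000-file case from the post
n_half = items_for_even_odds(160)
print(f"p(collision) for 10000 files: {p:.3g}")
print(f"files needed for ~50% odds:   {n_half:.3g}")
```

Under these assumptions, 10000 files and a 160-bit digest give a collision probability on the order of 10^-41, and it would take roughly 10^24 files before a collision became even odds. By this estimate, a single 160-bit hash of the whole file is already enough to avoid accidental collisions, without concatenating per-chunk hashes.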

thanks,
;) james.


