Re: Question about hashing algorithms

bigzaphod_at_gmail.com
Date: 08/26/05


Date: 26 Aug 2005 14:21:12 -0700

I was just pondering a distributed file transfer system (like
bittorrent) which breaks down the file into many small chunks. Each
chunk is verified with a hash so that you can get the chunk from any
number of possible sources (and prove it is correct). However, the
smaller the chunk, the larger the share of the transfer that goes to
the hashes rather than the data. So my thinking was that if there were
a "smaller" hashing function which was safe with shorter data and
yielded a shorter fingerprint, the chunks could be smaller with less
loss due to the overhead of having to verify those chunks individually
with a hash.
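
To make the scheme concrete, here is a minimal Python sketch of
per-chunk hashing and verification using the standard hashlib module.
The names CHUNK_SIZE, split_chunks, chunk_digests and verify_chunk are
made up for this illustration; this is not how BitTorrent or any
particular client actually does it.

```python
import hashlib

# Hypothetical chunk size in bytes, chosen only for illustration.
CHUNK_SIZE = 1024


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Split data into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_digests(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Return one 20-byte SHA-1 digest per chunk, in chunk order."""
    return [hashlib.sha1(chunk).digest() for chunk in split_chunks(data, size)]


def verify_chunk(chunk: bytes, expected_digest: bytes) -> bool:
    """Check a chunk received from any source against its known digest."""
    return hashlib.sha1(chunk).digest() == expected_digest
```

The receiver only needs the list of digests from a trusted source; the
chunks themselves can then come from anywhere and be checked one at a
time with verify_chunk.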

For instance, if I had a 1K chunk and was using the 20 byte SHA-1 hash,
that hash is about 2% of the size of the data itself (20/1024). If you
were transferring a 1GB file in 1K chunks, that'd ultimately be 20MB of
hashing data overhead (if I'm doing the math correctly). If that could
be reduced using a smaller hash, then that'd be an advantage.
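
The arithmetic can be checked with a few lines of Python. This is only
a sketch: hash_overhead is a made-up helper, sizes are taken as binary
units (1K = 1024 bytes, 1GB = 2**30 bytes), and the 8-byte digest in
the usage note is a hypothetical shorter hash, not a specific
algorithm.

```python
def hash_overhead(file_size: int, chunk_size: int, digest_size: int):
    """Return (chunk count, total digest bytes, digest/chunk ratio)."""
    # Ceiling division so a short final chunk still gets its own digest.
    chunk_count = -(-file_size // chunk_size)
    total_digest_bytes = chunk_count * digest_size
    ratio = digest_size / chunk_size
    return chunk_count, total_digest_bytes, ratio


# 1GB file, 1K chunks, 20-byte SHA-1 digests.
count, total, ratio = hash_overhead(2**30, 1024, 20)
print(count, total / 2**20, round(ratio * 100, 2))
```

With these inputs there are 1048576 chunks and 20MB of digests, a
ratio of roughly 1.95%. Passing digest_size=8 instead gives 8MB for
the same file, which is the kind of saving a shorter hash would buy.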

The obvious answer is to make the data blocks bigger, but that assumes
using SHA-1 and the like. I'm just curious if there are any other
options, but sadly I'm not at all familiar with the math or logic of
hashing algorithms, so there might be some provable reason that it'd be
a bad idea to have a shorter one anyway. In any case, this is mostly
just a mental exercise at this point, but I'm still interested to know
if any such algorithms exist and what their properties are.

l8r
Sean
