Re: 'tree hash' function for large files?



bennett@xxxxxxxxxxxxx writes:
Is there a standard function that works for, say, downloading a large
file and verifying it while the download is in progress, so that if
the file is tampered with at any point, you'll detect that immediately
without having to download the rest of the file? The limitation of
the normal MD5 hash function is of course that if you only had a given
hash value, you'd have to download the whole file before you could
take the hash to see if it matched.

I can imagine how this could be implemented with a kind of "tree hash"
function -- break a file into N pieces of, say, 10K bytes each, and
compute the MD5 hash of each one. Then compute the MD5 hash of all of
those MD5 hashes together, and that final MD5 hash is what you give
people for verification.

Then when they download the file, the file is wrapped in a special
format such that the first thing downloaded is the table of MD5 hashes
of each of the pieces, and then from there, the user downloads the
pieces. The user verifies that the hash of the table of MD5s, equals
the hash they were given for verification. Then as each piece is
downloaded, they verify that the hash of that piece matches its value
in the table. (This also lets them download the pieces in any order.)

So, it's trivial to describe, but I'm wondering if there is already a
known algorithm that does this, that people have standardized on and
that has been implemented in different languages, etc.

The filesharing apps like bittorrent do something very similar.

Phil
--
"Home taping is killing big business profits. We left this side blank
so you can help." -- Dead Kennedys, written upon the B-side of tapes of
/In God We Trust, Inc./.
.