Re: what is probability to create two equal hashes for md5 algorithm
- From: Unruh <unruh-spam@xxxxxxxxxxxxxx>
- Date: 6 Dec 2006 18:28:15 GMT
"Sergei" <silentser@xxxxxxxxx> writes:
Joseph Ashwood wrote:
"Dmitry Chumack" <saint.d.a@xxxxxxxxx> wrote in message
news:1165339092.911172.123630@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Hi *
I have a question. There is few doesns of __terabytes__ of data. I need
to split this data to blocks of fixed size. Than I need to calculate a
sha1 hash for each block and store it. The question is which minimal
size of block I have to choose to minimize the probability of existence
of two equal hashes? What probability of such a coincidence would be if
I'll use an md5, crc32 or sha2 algorithms instead of sha1? Or if I'll
use a combination of this algorithms (e.g. sha1 and crc32 or sha2 and
md5 and crc32)?
Since you're talking about 2 generated values colliding, in a cryptographic
hash it takes on average 1/sqrt(2^output size), for MD5 you'll need about
2^64, and SHA about 2^80 hashes generated before you collide. Considering
the size of your data pile, you'll probably get collisions sooner because
there is almost certainly duplicate data.
Here's how I'd do it, and I'd include encryption as well:
Encrypt everything with AES in CTR mode, the odds of a collision in the data
are now very small, about 1/16777216 for a petabyte of data. Then I'd use
SHA512 with 512KB chunks and build up a Merkle tree, this will save you
enormous time later on in figuring out what was changed or damaged. Assuming
1PB this will give you 2*10^9 leaf blocks, then holding to the 512KB chunk,
you can store ~8000 of them per chunk with proper indexing (this will be shy
of the 8192 apparent maximum if you implement the indexing that is necessary
for security). This gives 268435 chunks, building a third level of 34
chunks, and a root/top/fourth level that is tiny. For a total of about
2*10^9 chunks, 2*10^9
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2^256 (~10^77) which is the limit for SHA512. Each of these chunks will fit
conveniently on a floppy disk, 1200 of them to a CD, and I don't even want
to do the math on the other options. On a system with modern cpu you should
be able to achieve this in about a month of compute time (assuming a quad
core, scale appropriately), Tom's benchmark website
http://libtomcrypt.com:80/ltc113.html has more information to help you run
the numbers. The overall odds of a collision in this are in the neighborhood
of 1/10^68, which is quite frankly not gonna happen.
It will be reasonably secure though, so don't lose the AES key. Without the
AES encryption the odds of collision rise sharply, but assuming your data
cannot be compressed beyond a reasonable amount still ignorable, and if they
do collide you don't have to ship one of the blocks because they are
identical.
Joe
How do you get such estimations for MD5 (2^60) and SHA-1 (2^80)? As I
understand, such numbers can be given for truly random hash functions
and MD5, SHA are just the algorithms that try to emulate such things.
Because as far as is known, both are uniformly distributed functions.
Sergei
.
- Follow-Ups:
- References:
- what is probability to create two equal hashes for md5 algorithm
- From: Dmitry Chumack
- Re: what is probability to create two equal hashes for md5 algorithm
- From: Joseph Ashwood
- Re: what is probability to create two equal hashes for md5 algorithm
- From: Sergei
- what is probability to create two equal hashes for md5 algorithm
- Prev by Date: Re: what is probability to create two equal hashes for md5 algorithm
- Next by Date: Group verifiable encryption
- Previous by thread: Re: what is probability to create two equal hashes for md5 algorithm
- Next by thread: Re: what is probability to create two equal hashes for md5 algorithm
- Index(es):
Relevant Pages
|
|