Re: Is MD5 outdated ?

From: Michael Amling (nospam_at_nospam.com)
Date: 09/27/03


Date: Sat, 27 Sep 2003 01:25:50 GMT

Gregory G Rose wrote:
> The formula for the *expected number* of
> collisions is simple: if N is the number of
> samples, and k is the length in bits of the hash,
> you expect to have C = N * (N-1) / 2^(k+1)
> collisions.
>
> Now, by asking for the 1% confidence interval,
> you're effectively asking for the N that
> corresponds to a C of 0.01. Simply shuffling the
> formula around gives a quadratic that you can
> solve in the traditional formula to get:
> N = 1/2 + sqrt(2^(k+3) * C)/2.

   Or, rather than using an expectation value and a confidence interval,
the OP could use the exact probability of zero collisions in n
evaluations of a perfectly random 128-bit hash, namely 1.0 -
n*(n-1)/(2**129), and use that as a lower bound and approximation to the
probability of zero collisions for MD5. That probability remains less
than 0.001 until you reach 824,963,474,247,118,971 trials.

--Mike Amling



Relevant Pages

  • Re: [RFC/PATCH] Documentation of kernel messages
    ... Maybe a stupid idea but why do we want to assign these numbers by hand? ... I can imagine it could introduce collisions when merging tons of patches ... But maybe also 4 bytes would be enough, since the hash only has to be ... For 1000 messages the probability is ...
    (Linux-Kernel)
  • Re: When will md5crk complete?
    ... and in that case birthday attack ... > His core message is correct however: you shouldn't be using MD5. ... Collisions DO exist for every hash algorithm... ...
    (sci.crypt)
  • Re: Hashing
    ... Computing the hash function, which is handled by the instructions: ... reserved word/identifier when searching through the reserved words ... collisions and four slots that have four collisions. ...
    (alt.lang.asm)
  • Re: Hashing
    ... A good hash ... > greater is it better performance due to less collisions". ... then the probability that you need a rehash on any scan is something ... > 'hash method' simply because they use hash codes, ...
    (alt.lang.asm)
  • Re: CHECKSUM() question
    ... Let's assume for a moment that you use CHECKSUM and that there are hash collisions. ... If you stage the answer in a temporary table, you can use that to join to your source table, then filter out the few collisions. ... Remember absolutely no hash that reduces the size of data for searching can be guaranteed to avoid a collision. ... However, from a private discussion, Steve Kass pointed out that HASHBYTES with MD5 for 300 million rows probably has a lower chance of collision than the the possibility that some bit will get randomly changed by some other influence. ...
    (microsoft.public.sqlserver.programming)

Quantcast