Re: "Collision for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD"

From: Rob Warnock (
Date: 08/30/04

Date: Mon, 30 Aug 2004 07:33:20 -0500

Peter Pearson <> wrote:
| Dr. Robert Meier wrote:
| > "Collision for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD"
| > by Xiaoyun Want, Dengguo Feng, Xuejia Lai, and Hongbo Yu 2004.08.17
| Indeed, at the annual IACR cryptography conference in Santa Barbara
| last week, this was the Year of Doom for cryptographic hash functions.
| Not only did this obscure Chinese team burst forth
| and deal the death blow to their handful of respected algorithms,
| but Antoine Joux produced the first collision ever for SHA-0.

For those interested in more details, in the past few days there
have been several *long* threads on "sci.crypt" about these events.
Look for these subject lines [with/without "Re:" in front]:

        Collisions for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD
        Any advance news from the crypto rump session?
        RIPEMD broken, *NOT* RIPEMD-128 and RIPEMD-160
        MD5 collisions in convenient form
        Outrageous claims on cash collision exploits???
        HMAC-MD5 not vulnerable?
        A bad day at the hash function factory
        Chinese researchers break MD4, MD5, HAVAL-128 and RIPMED
        Bet on SHA-1 resistance to collisions!
        Collision in SHA-0
        Collision in SHA-0 - concerns for SHA-1
        CRYPTO2004 Rump Session Presentations

These go into great detail on the SHA-0 and MD5 collisions
(especially the latter).

| All these breaks take the form of "I can find two strings whose
| hashes are the same." None of them takes the form of "I can find
| a string whose hash matches some predetermined value."

Correct. Difficulty in the former is called "collision resistance",
and that is what has been broken (somewhat, see below). Difficulty
in the latter is called "(1st) preimage resistance", though what
you probably meant to say was "I can find a *different* string whose
hash matches that of some predetermined string", difficulty with
which is called "2nd preimage resistance", and while that appears
somewhat weakened too [see the details on the collision break below],
I'm not sure it's "broken" yet.

The MD-5 collisions found (*multiple* pairs were published, by the way)
were all of an extremely limited form: Given a random 1024-bit block
(128 bytes), flip six bits in very certain specific positions. The
original and modified blocks now have a higher-than-expected probability
(something like 2**-48 instead of 2**-64) of having the same MD5 hash.

Note that while it would take a very long time to find a colliding pair
if you just picked a random starting block, ran MD5 on it, flipped the
magic six bits, ran MD5 on *that*, and compared the hashes (~8 yrs, for
a medium-speed single-CPU machine), there is apparently a trick for
looking inside the hash as it's working that lets you find the first
half of the "random" block in about an hour, and the second half in
just a few minutes more [per Matt Mahoney, in "sci.crypt"]. As Mahoney
says, since you may have to look at a *lot* of blocks to find one that

    "It would be hard to exploit this in a text document, but it
    would be easy to write two programs that do whatever you want
    and that have the same MD5 checksum."

The trick would be getting people to accept that the first one was
something they wanted to download & run, and only then substitute
the second one for it.

| Similarly, if you are checking the MD5 hash of downloaded code,
| and the expected hash value is published by somebody you trust,
| you are still safe.

Not necessarily. An attacker *might* be able to find a 128-byte
section of some existing program where flipping the magic six bits
*both* left the MD5 has unchanged *and* changed the behavior of
the program in a way that you wouldn't like. Granted, it's *very*
unlikely, but we now know it's not impossible. Fortunately, testing
whether any particular existing binary file is susceptible to that
attack is fairly simple & quick [just loop over the file, flipping
the magic six bits in successive 128-byte blocks, and seeing if the
MD5 hash changed or not -- if not, you have a potential problem].


Rob Warnock <>
627 26th Avenue <URL:>
San Mateo, CA 94403 (650)572-2607

Relevant Pages

  • Re: long index strings
    ... I'm sure breaking a long string into 20 byte segments would work, ... What I was hoping for was a way to compute a mathematical hash such ... as MD5 in Filemaker. ... What are the requirements for writing a plug-in of your own? ...
  • Re: The answers: Lost password + MD5 ?
    ... than the brute-force attack of 2**80 operations based on the hash length. ... This attack builds on previous attacks on SHA-0 and SHA-1, and is a major, ... We wondered if storing passwords hashed as MD5 was safe. ... > (That is called a collision, ...
  • Repairing damaged MD5 values
    ... The result is that whenever the MD5 hashed bytes contained a byte ... it was stored as one hex digit instead of two. ... So instead of uniform string lengths of 32, ... in a 26 digit hash than in a 32 digit. ...
  • Re: Properties of MD5
    ... > little difference result in a MD5 hash with a big difference and two ... > 128 bit hash string as result. ... Is the algorithm then still behaving ... collision so it is still secure for practical purposes unless a better ...
  • Re: Should be in crypto for criminals Re: just stupid?
    ... > in six different hashes can be done in an hour, ... > compute a single hash collision for MD5. ... > a single collision to a known hash value. ...