Re: Panama hash collision question
From: Dann Corbit (dcorbit_at_connx.com)
Date: Wed, 21 May 2003 11:00:50 -0700
"Roger Fleming" <email@example.com> wrote in message
> Dann Corbit wrote:
> > I understand that the Panama hash is not collision free:
> No hash is literally collision free. That is guaranteed by the
> Pigeonhole Principle: there are more possible inputs than outputs, so
> some outptuts must be repeated. When we talk about collision
> resistance in hashes, there are two things that are considered:
> a) for non-cryptographic hashes, we are concerned about the
> *probability* of a collision accidentally occuring for some
> distribution of inputs (usually either random binary data, or text).
> For good hashes, this probability should differ negligibly from what
> you would expect if all hash outputs were uniformly distributed random
> b) for cryptographic hashes, we are concerned with the *difficulty* of
> deliberately calculating a collision. For a good cryptographic hash,
> this difficulty should differ negligibly from brute force trial, i.e.
> has a work factor of the order of sqrt(2^hash_length) (the sqrt() is
> because of the Birthday Paradox).
> The paper you mention is about the second type of problem. The authors
> have found an attack which enables them to *calculate* collisions in
> Panama with about 2^82 steps, when it should take 2^128 steps. It
> should be emphasised that this says nothing at all about your problem,
> except that the chance of an accidental collision is unlikely to be
> greater than 2^-82 - since that would provide an easier attack! 2^-82
> is a microscopic probability; if you were to do a billion trials per
> second for a century, there is less than one chance in a million of a
> single success.
> > [An] Vincent Rijmen, Bart van Rompay, Bart Preneel, Joos Vandewalle,
> > "Producing Collisions for PANAMA,"
> > Presented at Fast Software Encryption 2001, Yokohama, Japan.
> > I don't have access to that article, and a web search turned up links
> > to references or to the Springer page for which I lack a subscription.
> > I need a hash for detecting changes in data. I don't need 32 bytes of
> > resolution, only 8 bytes (which I can create by XOR folding).
> > Can someone who knows about the Panama hash tell me if I can get 64 bits
> > resolution in the hash? I don't need cryptographic strength, but I do
> > a collision to be extremely unlikely.
> There are lots of small, fast functions for producing
> non-cryptographic hashes. Typically they are used for assigning
> collections of data to "buckets" to speed up sorting, list searching,
> comparison etc. As such, a low collision rate is the primary concern
> in designing them. Most standard implementations produce 32 bits of
> output but are easily extensible to longer outptus. Examples to get
> you started include Adler32, FNV hash, Elf hash, and Perl hash.
I have Alder32, FNV (and many others). I have the Crypto++ kit:
which is really an excellent piece of software, and I bolted a bunch of new
functions into it, including 64 bit CRC, Bob Jenkin's hash in 32 and 64
bits, FNV in 32 and 64 bits, UMAC, and Hash127. I feel that 32 bits is not
enough safety for my application. I would like 60 or more bits.
Here is what I am doing:
We synchronize database systems by forming a checksum (hash) for each record
in the table as we copy the database records from a source database to a
target database. At the same time, we save the value of a unique index.
Then, when it is time to synchronize again, we look up the hash value for
each unique index to see if it has changed. What we really need to be able
to detect is if (for each and every record/hash pair) any data in the record
has changed. At first blush, it may seem that 32 bits would be enough for
reasonable safety. But with millions of records processed on a daily basis
by thousands of clients, and any potential pair that is a "false match"
representing data that should be updated and is passed over, we would like
the probability to be vanishingly small that it will occur. Of course, full
synchronizations can be scheduled to wipe out any potential errors, but that
is a small comfort should an error actually happen.
Now, we could use a 512 bit hash or something, but then our incremental
synchronizations would lose the speed advantage for which they are designed.
> > My second choice would be MD5, but it
> > appears that MD5 may also suffer from collisions.
> Not so far I know. Den Boer, Bosselaers and Dobbertin have done some
> work on MD5 which comes very, very close to a method for calculating
> deliberate collisions - close enough that people are nervous about MD5
> and have stopped recommending it for new applications - but no-one has
> yet produced a complete attack. Even if they did, this is not related
> to your need for non-cryptographic collision resistance.
> > Are there versions of these algorithms that are more robust than others,
> > what are the collision rates?
> > Alternatively, is there a good test bench for detection of hashing
> > collisions?
I would like to thank all of the citizens of news:sci.crypt for the
excellent and helpful responses. This is one of the best groups on all of
-- C-FAQ: http://www.eskimo.com/~scs/C-faq/top.html "The C-FAQ Book" ISBN 0-201-84519-9 C.A.P. FAQ: ftp://cap.connx.com/pub/Chess%20Analysis%20Project%20FAQ.htm