Re: Panama hash collision question

From: Dann Corbit (dcorbit_at_connx.com)
Date: 05/21/03


Date: Wed, 21 May 2003 11:00:50 -0700


"Roger Fleming" <roger_for_nntp@hotmail.com> wrote in message
news:fdbae11.0305201726.49f57070@posting.google.com...
> Dann Corbit wrote:
> > I understand that the Panama hash is not collision free:
>
> No hash is literally collision free. That is guaranteed by the
> Pigeonhole Principle: there are more possible inputs than outputs, so
> some outptuts must be repeated. When we talk about collision
> resistance in hashes, there are two things that are considered:
> a) for non-cryptographic hashes, we are concerned about the
> *probability* of a collision accidentally occuring for some
> distribution of inputs (usually either random binary data, or text).
> For good hashes, this probability should differ negligibly from what
> you would expect if all hash outputs were uniformly distributed random
> numbers.
> b) for cryptographic hashes, we are concerned with the *difficulty* of
> deliberately calculating a collision. For a good cryptographic hash,
> this difficulty should differ negligibly from brute force trial, i.e.
> has a work factor of the order of sqrt(2^hash_length) (the sqrt() is
> because of the Birthday Paradox).
>
> The paper you mention is about the second type of problem. The authors
> have found an attack which enables them to *calculate* collisions in
> Panama with about 2^82 steps, when it should take 2^128 steps. It
> should be emphasised that this says nothing at all about your problem,
> except that the chance of an accidental collision is unlikely to be
> greater than 2^-82 - since that would provide an easier attack! 2^-82
> is a microscopic probability; if you were to do a billion trials per
> second for a century, there is less than one chance in a million of a
> single success.
>
> > [An] Vincent Rijmen, Bart van Rompay, Bart Preneel, Joos Vandewalle,
> > "Producing Collisions for PANAMA,"
> > Presented at Fast Software Encryption 2001, Yokohama, Japan.
> >
> > I don't have access to that article, and a web search turned up links
only
> > to references or to the Springer page for which I lack a subscription.
> >
> > I need a hash for detecting changes in data. I don't need 32 bytes of
> > resolution, only 8 bytes (which I can create by XOR folding).
> > Can someone who knows about the Panama hash tell me if I can get 64 bits
of
> > resolution in the hash? I don't need cryptographic strength, but I do
need
> > a collision to be extremely unlikely.
>
> There are lots of small, fast functions for producing
> non-cryptographic hashes. Typically they are used for assigning
> collections of data to "buckets" to speed up sorting, list searching,
> comparison etc. As such, a low collision rate is the primary concern
> in designing them. Most standard implementations produce 32 bits of
> output but are easily extensible to longer outptus. Examples to get
> you started include Adler32, FNV hash, Elf hash, and Perl hash.

I have Alder32, FNV (and many others). I have the Crypto++ kit:
http://www.eskimo.com/~weidai/cryptlib.html
which is really an excellent piece of software, and I bolted a bunch of new
functions into it, including 64 bit CRC, Bob Jenkin's hash in 32 and 64
bits, FNV in 32 and 64 bits, UMAC, and Hash127. I feel that 32 bits is not
enough safety for my application. I would like 60 or more bits.

I have narrowed my choices to two candidates:
UMAC {64 bit flavor}:
http://www.cs.ucdavis.edu/~rogaway/umac/
And D. J. Bernstein's Hash127:
http://cr.yp.to/hash127.html

Here is what I am doing:
We synchronize database systems by forming a checksum (hash) for each record
in the table as we copy the database records from a source database to a
target database. At the same time, we save the value of a unique index.
Then, when it is time to synchronize again, we look up the hash value for
each unique index to see if it has changed. What we really need to be able
to detect is if (for each and every record/hash pair) any data in the record
has changed. At first blush, it may seem that 32 bits would be enough for
reasonable safety. But with millions of records processed on a daily basis
by thousands of clients, and any potential pair that is a "false match"
representing data that should be updated and is passed over, we would like
the probability to be vanishingly small that it will occur. Of course, full
synchronizations can be scheduled to wipe out any potential errors, but that
is a small comfort should an error actually happen.

Now, we could use a 512 bit hash or something, but then our incremental
synchronizations would lose the speed advantage for which they are designed.

> > My second choice would be MD5, but it
> > appears that MD5 may also suffer from collisions.
>
> Not so far I know. Den Boer, Bosselaers and Dobbertin have done some
> work on MD5 which comes very, very close to a method for calculating
> deliberate collisions - close enough that people are nervous about MD5
> and have stopped recommending it for new applications - but no-one has
> yet produced a complete attack. Even if they did, this is not related
> to your need for non-cryptographic collision resistance.
>
> > Are there versions of these algorithms that are more robust than others,
and
> > what are the collision rates?
> >
> > Alternatively, is there a good test bench for detection of hashing
> > collisions?

I would like to thank all of the citizens of news:sci.crypt for the
excellent and helpful responses. This is one of the best groups on all of
USENET.

-- 
C-FAQ: http://www.eskimo.com/~scs/C-faq/top.html
 "The C-FAQ Book" ISBN 0-201-84519-9
C.A.P. FAQ: ftp://cap.connx.com/pub/Chess%20Analysis%20Project%20FAQ.htm


Relevant Pages

  • Re: Two-stage hashing (pre-hash big integer -> hash-array-index)
    ... > hash value instead of the key to generate the probe sequence. ... avoid all hashes with same home index following same collision chain, ... are the same will follow exactly the same collision chain. ... computes what I call the pre-hash, the large unsigned integer, from the ...
    (comp.programming)
  • Re: Panama hash collision question
    ... > We synchronize database systems by forming a checksum (hash) for each record ... transmit and compare the individual hashes of those 10 records. ...
    (sci.crypt)
  • Re: keys and counters
    ... how many times can the counter be incremented before there is a collision in the hash, that is what i am asking. ... A hash function operated in such a counter mode as you suggest does not have this property - if I can guess or discover the input to the first block then I will know all the other blocks. ... You might think that some attacks are unreasonable/infeasible but do you really know what is possible to the world's largest employer of mathematicians, who have had for many years the world's largest computer budget and unlimited access to 60 plus years of classified research or what is possible for any of the other multi-billion dollar "smaller" big brothers?. ...
    (sci.crypt)
  • Re: Using hash to see if objects attributes have changed
    ... Storing the entire object instead of the hash is not likely to be *that* ... If all you care about is a flag that says whether the state has changed ... stateNow = hashlib.sha1)) ... across such a collision, leading to a bug that might cause loss of data. ...
    (comp.lang.python)
  • Re: Determining the encryption used
    ... impression that if a password verification system is checking passwords ... against a hash table, all you needed was a collision (as this would hash ... They involve generating two seperate hashes which have a collision. ... The collision attacks found can break the security of cryptographic ...
    (Pen-Test)