Re: Expanding hash function bit width

From: Gregory G Rose (ggr_at_qualcomm.com)
Date: 06/24/04


Date: 24 Jun 2004 10:04:38 -0700

In article <heDCc.10174$3N6.6353@amsnews05.chello.com>,
Matthijs Hebly <heeb@iname.com> wrote:
>I have an idea to expand the bit width of any hash function H.
>What I want to know is whether my idea is:
>a) totally rediculous, because...
>b) not bad, but not usable, because...
>c) OK, OK... let's continue this...
>d) whatever.

I think it's B.

>For argument's sake let's say I'm using SHA-1, which has a 160-bit
>output. I have a message M consisting of N bytes from which I want to
>derive a 2*160=320 bit hash value, because for some reason I find
>160-bit too little (I'm considering message-BYTES rather than BITS here,
>because that happens to be what most computers nowadays calculate with.
>Any message ending with some bits that do not form a whole byte should
>be padded somehow, I guess with 10000....).
>
>Now let's say I would send every byte of my message M alternating to one
>of two hashes, H0 and H1, every even numbered byte to H0 and every odd
>byte to H1. After sending all bytes, I would have a resulting H0 and H1,
>both consisting of 160 bits.

I think that you want to send every byte of input
to both hash functions, and not just to alternate
them. To compensate for the equivalence of input
you could do either, or both, of:

a) start the hashes by hashing different input
blocks (eg. H0 = SHA("This is H0" | input))

b) changing the input bytes for one of them, for
example H1 = SHA(input ^ PAD) where PAD is as many
0xC5 bytes as you need. (0xC5 is one of the values
from HMAC, and it changes half the bits.)

Hmmm, on second thoughts, (b) is useless -- you
could arrange for the input to have the
relationship so that it still duplicates the
other input. So I take that back; (a) is
necessary and sufficient, (b) is unnecessary and
insufficient. At least I think so :-). But I
left in the discussion above to show how hard it
is to get this stuff right.

Now you can just use H0 and H1 directly, or if you
want, you could further hash them together.

The rationale for this is that it seems dangerous
to only give half the information to the partial
hashes. And it is. See below.
 
>Because H0 and H1 COULD be the same (especially if M would be something
>like AABBCCDD...), I negate (change 0-bits into 1's and vice-versa) the
>most significant half of H0 (the 80 most significant bits), as well as
>the least significant half of H1 (the 80 least significant bits).
>
>Now I concatenate:
>A) the negated most significant bits from H0 with the most significant
>bits from H1, and send this to a new hash function H'0;
>B) the least significant bits from H0 with the negated least significant
>bits from H1, and send this to a new hash function H'1;
>C) the outputs from H'0 and H'1, resulting in a 320-bit hash.

So, here's a sort-of attack.

Take a bunch of short, equal-length input
messages to SHA, and hash them all, looking for
certain coincidences. You want to find P1 and P2
such that the first 80 bits of SHA(P1) is the
same as the first 80 bits of SHA(P2). At the same
time you want to find messages Q1 and Q2 with the
same property except that we want the *last* 80
bits of their hashes to be equal. You should be
able to find a pair of each by hashing 2^40 short
texts.

Now form
M1 = interleave(P1, Q1)
M2 = interleave(P1, Q2)
M3 = interleave(P2, Q1)
M4 = interleave(P2, Q2)
where "interleave" is just the operation of taking
one byte from the first message, followed by one
byte from the second, etc. (In other words, the
opposite of your splitting algorithm. If you
change the way you split messages, just change the
way you interleave them to compensate.)

You now have 4 messages, all of which have the
property that that the first 160 bits of the
output hash are equal, and you've done it with
2^40 work. You shouldn't be able to do this. Note
that just further hashing the outputs from (C)
above together somehow doesn't really solve the
problem; it will still be the case that full
collisions can be produced more efficiently
(2^120) than should be possible for a 320-bit
output.

Greg.

-- 
Greg Rose
232B EC8F 44C6 C853 D68F  E107 E6BF CD2F 1081 A37C
Qualcomm Australia: http://www.qualcomm.com.au


Relevant Pages

  • Re: "index" efficiency... any help or ideas?
    ... no general purpose hashing method, no matter how good, is good enough ... to prevent the possibility that everything hash into the same place. ... > Downside of hashes; if the data is stored externally, ... > perfect hashing are techniques that require analysis of the data in advance, ...
    (alt.lang.asm)
  • Re: "index" efficiency... any help or ideas?
    ... > dealing with different searching algorithms. ... > cons of hashing vs. binary searching. ... That's the killer reason for using hashes in this application. ... Searching a hash table is ...
    (alt.lang.asm)
  • Re: Some comments on "super fast hash"
    ... SFH seems reasonably good and certainly is fast. ... > a hash, and SFH does not. ... The latest versions of each hash function which leverages this ... it must behave worse on other key sets. ...
    (comp.programming)
  • Re: basic info on hashes?
    ... The basic idea of hashing is that you compute a hash value ... >of finding a good hash function easier to solve. ... >longer being a simple array. ...
    (comp.lang.pascal.delphi.misc)
  • Re: Why unhashing is not possible?
    ... then it is not a hash. ... (which is where perfect hashing becomes useful) ... Types of hashes and their properties exist apart from why ... ice axes in ice, or digging axes when dealing with a forest fire); ...
    (comp.security.misc)