Re: Hashing of short fixed length messages

On Jun 14, 8:47 am, Francois Grieu <fgr...@xxxxxxxxx> wrote:
I think these can be proven rigorously under the random oracle model.

Your proof looks simple, but I need some time for it...

On Jun 14, 12:24 pm, Mok-Kong Shen <mok-kong.s...@xxxxxxxxxxx> wrote:
Maaartin wrote:
I wonder if there's a more efficient way to compute a hash when I
know, that all messages are of given length, say 16 bytes.[snip]

Certainly I don't understand your goal. Normally one hashes something
to something shorter. You have very short length to start with. To how
many bytes you want to hash or do you want a bijective mapping (in
which case any established block cipher would naturally provide such a
one but there might be other possibilities, I suppose)?

The input is a secret key used to encrypt some data. From the key I
need to obtain something usable as an id for the data. That's why I
only need the fixed sized input and why I'm only interested in first
preimage resistance.

On Jun 14, 1:47 pm, Tom St Denis <t...@xxxxxxx> wrote:
For whatever reason I assumed it was 16 bytes, even while I was
looking at the source working with 16 ints. What I said obviously
can't apply to 16 bytes inputs.

MD2 works on 16 byte blocks iirc.

This was not what confused me. I really don't know how did I get it
wrong, but it happens again and again.

MD5 has no expansion btw. It works on blocks of 64 bytes [16 32-bit
words] where it just permutes the selection of words for each round.
You actually have 55 bytes of useful payload before MD5 requires a 2nd

Agreed (1 byte 0x80 and 8 bytes length eats from the input block).

You can also optimize MD5 quite a bit if you have fixed size inputs
[specially so small]. 10 out of the 16 reads from the traditional W[]
array [array of words for the block] are zero, so you can remove them
(and their related ops in the code). You also know the final padding
block so you don't need to compute/store/etc it. You also don't need
to present a traditional hash interface (init/process/done) since the
message is fixed length, so there is no overhead in gathering bytes
and/or double buffering, etc.

I wouldn't be surprised if a dedicated 16-byte version of MD5 ran
considerably faster than your standard off the shelf MD5 code.

The nice thing is I can verify the result easily. The bad thing is
that it eliminates quite a few instructions but not much cycles on a
superscalar processor.

But back to the AES route ...

H(x) = k ^ aes(x, k)

Where 'x' is the key and k is some fixed value is actually what these
class of hashes are (sha1, md5, etc...). You can create a generic
hash by saying

S_{-1} = some_known_value
S_{i} = S_{i-1} xor AES(M_i, S_{i-1})

Where you MD pad M appropriately (e.g. 1 bit followed by enough bits
to get block + length encoded).

So if you are hashing fixed 16-byte strings you could just pick some
random 128-bit value call that S_{-1} and ignore padding [as Greg
pointed out] and just compute S_0. I don't know if it's secure to use
a fixed key (for a hash that is), I certainly wouldn't recommend it.

One caveat though, the AES key schedule is actually vulnerable to
differential attacks [related keys] which actually makes AES bad for
this style of hashing.

The input itself is a hash too, so I can ignore related key attack,

On Jun 14, 7:24 pm, Paul Rubin <no.em...@xxxxxxxxxxxxxx> wrote:
Can I ask what the application is, where speed is so important? If it's
for passwords or key diversification, slow speed may even be a virtue.

This operation will be very common in the application, but I'm not
sure, if speed is important here. I'll know, when I get it running.
Slow speed can't be good here, it's nothing like password
strengthening. Most probably, I'll use some stronger and slower hash,
but I'm curious what can be done here.

0. Compute md5(x). AFAIK, MD5 is still secure w.r.t. the preimage,
isn't it?

Probably ok in practice, but MD5 is deprecated and if it's for a
high-security application, I'd stick to something standard. Note MD5
does have a theoretical preimage attack (Sasaki and Aoki, Eurocrypt
2009) which is not practical, but attacks only get better as they say.

The paper has got stolen by spinger/acm.

1. Take an algorithm like MD5 and remove the padding step. This leads
to a speed-up factor of two, but I don't think it's secure.

You mean just use the compression function once. Given MD5's problems I
can't help wonder if the second compression isn't more useful than it
it would be if the compression function worked as intended.

The compression function gets used only once for input sizes up to 55
bytes, anyway. I confused the numbers.

2. Compute x ^ aes(x, x). I don't know if this is faster then
computing md5(x) and have no idea how secure it is.

If AES is an ideal cipher, this should be ok. Assuming only that AES is
a PRP, though I don't see a way to turn a break against it into a break
against AES.

3. Compute x ^ aes(k, x), where k is a fixed key. This is probably
faster then computing md5(x) and I have no idea how secure it is.

If k is secret, then fine, but then it's not a hash function, since you
have to keep the key secret. With reasonable constraints on the amount
of hashes available to the attacker, You don't even need the (x ^)
because of the PRP-PRF switching lemma. If k is public, you're in
situation 2 again.

I can't keep k secret, it's to be shared among a group of people.

On Jun 14, 7:40 pm, Tom St Denis <t...@xxxxxxx> wrote:
Found it:

Turns out #7 from Table 3 is

H_i = H_{i-1} xor M_i xor E_{M_i}(H_{i-1})

Which is close to the OPs model. Where H_{i-1} might as well be zero,
the hash H = x xor AES(x, x) should in theory be secure provided x is
<= 16 bytes in length.

Actually sorry #9 is closest (multitasking fail) where H_{i-1} can't
be zero (but can be anything else).


H = x xor AES(x xor 0xDEADBEEF, x);

Would be fine. Or for the more sensitive in the crowd

H = x xor AES(x xor 0xEA71EAF, x);

Should do :-)

I understand "dead beef", but what is 0xEA71EAF?

Unique mapping my original model onto schemas from the paper is not
possible, as I have no H[i-1].

Taking H[i-1]=0, my version 2 (x ^ aes(x, x)) is like schema 4, 9, 11,
29, 31, 37, or 39. All the schemata are susceptible to forward,
backward, permutation, or fixed point attack, but none of the attacks
makes sense when doing only a single step.

I don't think that setting H[i-1] to a non-zero constant makes any
difference, why should
H(x) = x ^ aes(x ^ 0xDEADBEEF, x);
be any more secure than
H(x) = x ^ aes(x, x);
? IMHO, what's important in the classificiation in the paper is how
the chaining value gets handelt; without the chaining value many
schemata became equivalent.

However, my version 3 (x ^ aes(k, x)) is like schema 3 (Matyas-Meyer-
Oseas) from the paper, right? According the the table it's secure and
it's faster then version 2.

According to what's written in paragraph "permutation attack" on page
3, x ^ encrypt(k, x) is a one-way function; no reason for this is