Revised ARC4 Hash Function Proposal

I must be a glutton for punishment. This is a heavily revised version
of my proposal for Hash function based on ARC4.

1 Introduction

There are already standard methods for converting block cyphers into
hash functions, Preneel (1993) gave twelve secure methods with
compression functions of the form H_i = E_A(B) xor C. How about using
a stream cypher instead of a block cypher? Only two of Preneel's
twelve compression functions are suitable for use with stream cyphers.
With a stream cypher E(B) = B xor KS, where KS is the keystream. This
means that E(B) xor C in Preneel's constructions becomes B xor KS xor
C. Hence B and C cannot be wholly or partly the same or they will
cancel out, making a weak compression function. The options for A, B
and C are m_i (message block i), H_(i-1) (previous hash block) and m_i
xor H_(i-1). With no cancelling the only allowed values for a stream
cypher are:

11 A = m_i xor H_(i-1), B = m_i, C = H_(i-1)
12 A = m_i xor H_(i-1), B = H_(i-1), C = m_i

which are 11 and 12 in Preneel's scheme. According to Schneier,
number 11 has a hash rate of 1 so I chose that one. This gives a
compression function of:

H_i = E_k(m_i) xor H_(i-1), where k = m_i xor H_(i-1)

The other input I have taken ideas from is the proposal for an RC4
based hash function from Chang, Gupta and Nandi
This was pointed out to me in the course of discussions about my
first effort at an ARC4 hash. The Chang, Gupta and Nandi (CGN)
proposal splits the design into three parts: a compression function
repeated for each block of input, a one way transformation (OWT) and
hash block generation (HBG). This current proposal retains their
three part structure with a modified compression function, using
Preneel 11 as above, and some changes to the one way function and hash
generation function.

By using the Preneel compression function the internal state of the
hash is increased by H_i, the output block from the compression
function. The modifications to the OWT and HBG are to incorporate
that extra state into the final output.

Padding is as proposed by CGN, based on one and zeros with an eight
byte bit length and including the extra initial byte to indicate the
hash length.

Block length for the compression function is 64 bytes, 512 bits. Hash
sizes are from 16 to 64 bytes (128 to 512 bits respectively).

The ARC4 key schedule was not designed as a hash function. My
previous idea used it unchanged, which was weak as a hash compression
function. CGN introduced a permutation, r(), of the message block to
reduce this weakness. However the particular r() they chose is the
identity transformation for every fourth 64 byte block, which has the
same weakness as the unchanged ARC4 key schedule I used in my previous
idea. I do not follow CGN's reasons for their particular choice of
r(), in particular why it is the identity transformation for bytes
0-64, 128 and 196 out of every 256. Their choice of S_IV seems
reasonable to me so I have used it.

2 Description

The message to be hashed is padded to the next 512 bit (64 byte)
boundary and divided into 64 byte blocks. Each block of the message
is passed through the Compression Function. The output from the
Compression Function is the ARC4 state array S, the ARC4 index j and
the intermediate hash block H_inter. These three feed into the One
Way Transformation which produces changed versions of S, j and H_final
as output. These changed S, j and H_final feed into the Hash Block
Generation function which produces the actual hash. S and j are
present in ARC4 and in the CGN proposal. H is additional and is a
result of using the Preneel compression function.

2.1 Inputs and Output

Inputs are the message to be hashed, m, and the length in bytes of the
required hash, L. m cannot exceed (2**64) - 1 bits and L must be in
the range [16 ... 64].

Output is a hash of the message, L bytes long.

2.2 Padding

Padding is identical to CGN padding. Prepend a byte, L, to the
message indicating the size of the output hash in bytes: this value
will be from 16 to 64 as the hash length varies from 128 to 512 bits
in 8 bit increments. Values of L outside this range must be rejected.
Then the message itself, then a single 1 bit, then as many zero bits
as needed then 64 bits (8 bytes) indicating the size of the unpadded
message in bits. This value has its most significant bit first and
its least significant bit last as the last bit of the whole padded
message. If the message is too long, more than (2**64) - 1 bits, then
it must be rejected.

Padded message = L || M || 1 || 000 ... 000 || S

L = hash size byte.
M = message
1 = 1 bit
0 = 0 bit
S = big endian bit size of unpadded message, 8 bytes

The number of zero bits is such that the size of the padded message is
the least possible whole number of 512 bit blocks.

The purpose of the prepended hash size byte is to prevent short hash
outputs being simple truncations of the longer hash outputs for the
same message.

2.3 Initialisation

Initialise the 256 byte ARC4 state array S to the S_IV value given in
CGN. The ARC4 indexes i and j are both initialised to zero. The
initial value of the 64 byte hash block, H_0, is 00, 01, ... 3E, 3F.

2.4 Compression Function

Each block of the message, m_i, is xor'ed with the previous hash
block, H_(i-1), and then mixed into the state array S. Then its
encryption is xor'ed with the previous hash block H_(i-1) to give the
new hash block H_i.

2.4.1 Mixing

Input: S, i, j, H_(i-1), m_i

key <- m_i xor H_(i-1)
i_old <- i
for t in 0 to 63 do
i <- (i + 1) mod 256
j <- (j + S[i] + key[r(i)]) mod 256
swap(S[i], S[j])
end for

Output: S, i, j, i_old

r(i) is a function which mixes up the 64 bytes in eack key block. I
do not like the CGN r() function for the reasons I gave above. I have
not been able to calculate something I know to be better so to keep it
simple I tentatively propose:

i in [ 0, 63] use rot13: i <- (i + 13) mod 64
i in [ 64, 127] use rot17: i <- (i + 17) mod 64
i in [128, 191] use rot19: i <- (i + 19) mod 64
i in [192, 255] use rot23: i <- (i + 23) mod 64

Because we are alternating 64 bytes of key with 64 bytes of
encryption, i_old is used to preserve the values of i consistently
from key block to key block and encryption block to encryption block.
Without this i would be 0-63 or 128-191 for mixing and 64-127 or
192-255 for encryption. I do not treat j similarly as by leaving it
free to run on between mixing and encryption phases it is more
difficult for an attacker to control its value through both phases.
This is in keeping with the design of ARC4, where i moves
systematically through the state array, while j jumps around in a
pseudo random fashion.

2.4.2 Encryption

Input: S, i_old, H_(i-1), m_i

for t in 0 to 63 do
i_old <- (i_old + 1) mod 256
j <- (j + S[i_old]) mod 256
swap(S[i_old], S[j])
cyphertext[t] <- m_i[t] xor S[(S[i_old] + S[j]) mod 256]
end for
H_i <- cyphertext xor H_(i-1)

Output S, j, H_i

2.5 One Way Transformation

This is similar to the OWT in CGN. I have included H_inter, the
output hash block from the previous round of the compression function
and H_final, neither of which are present in the CGN proposal.

Input: S, i, j, H_inter

S_pre <- S

C(S, i, j, H_inter, m_a) -> S, i, j, H_b
C(S, i, j, H_b, m_b) -> S, i, j, H_c
C(S, i, j, H_c, m_c) -> S, i, j, H_d
C(S, i, j, H_d, m_d) -> S, i, j, H_final

S_post <- S

for t in 0 to 255 do
S[t] <- S_pre[S_post[S_pre[t]]]
end for

Output: S, j, H_final

C() is the compression function described above.

m_a to m_d are four dummy message blocks. They can be implemented in
code or as four blocks of additional padding after the standard
padding described above.

m_a = 00, 01, ... 3E, 3F
m_b = 40, 41, ... 7E, 7F
m_c = 80, 81, ... BE, BF
m_d = C0, C1, ... FE, FF

These dummy blocks ensure that the last bytes of the padded message
are well mixed into the ARC4 state S. Without them the last byte of
the padded message would only be involved in two swaps in S.

CGN use an effective 512 extra message bytes here, but their method
only has one swap per message byte. The Preneel compression function
has two swaps per message byte, once as key and once as plaintext, so
the overall number of swaps in S is the same for both OWTs.

2.6 Hash Block Generation

This encrypts as many bytes as required of H_final using standard ARC4

Input: L, S, j, H_final

for i in 0 to (L - 1) do
j <- (j + S[i]) mod 256
swap(S[i], S[j])
Hash[i] <- H_final[i] xor S[(S[i] + S[j]) mod 256];
end for

Output: Hash

Here L is the number of bytes in the hash size required, which was
used as the first byte of padding at the start of the message.

By encrypting H_final both S and j affect the output hash value as
well as L and H_final itself. This increases the effective size of
the internal state of the hash function to more than just H_final

3.0 Testing

Diffusion: Comparing 512 bit hashes of two messages differing only in
a single bit, the average difference in the hashes was 255.84 bits.
This indicates good diffusion.

Diffusion of the last byte: Comparing 512 bit hashes of two messages
differing only in the last byte, the average difference in the hashes
was 256.68 bits. This indicates good sensitivity to the last byte of
the message.

4.0 Questions

I do not follow how CGN have constructed their r() function. My own
proposal, rot13 etc, is very tentative. I would like more input here
as this is an obvious point of possible weakness. I am not even
certain if r() is actually needed in the Preneel compression function.
I do not have the skills needed to determine if it can safely be
omitted, so for security I have left it in.

I get the feeling that this hash is going to be a bit slow as each
message byte is processed twice, one as part of the key and once as
part of the plaintext. At best it is going to run at half the speed
of ARC4 with a constant overhead for the OWT and HBG at the end. That
will have a bigger impact on short messages than on long messages. If
it were possible to omit the r() mixing function then that would help
the speed a bit.

5.0 Bloodsports

You can now all tear this latest idea to pieces. :)