Faster key search in SecurID

From: Scott Contini (contini_at_matmail.com)
Date: 09/07/03


Date: 6 Sep 2003 17:07:40 -0700

Here's a simple result on SecurID. I'm posting it here since a better
result is coming out soon. I'd be interested in feedback. Again,
the attack is very simple so don't have high expectations!

Scott

===============================================================================

            The Effect of a Single Vanishing Differential on ASHF
            Scott Contini
            Sept 3, 2003

In [1], Biryukov, Lano, and Preneel present the first known
real security design flaw in the securid device. The device
is susceptible to "vanishing differentials" (when two related
plaintexts hash to the same value) with relatively high
probability. In terms of the way the device actually works,
this almost always happens when a one-bit time difference gets
formatted into a two-bit plaintext difference. This note assumes
two-bit plaintext differentials, but the general concept of the
attack we describe will apply to larger bit differentials as well.
We also use the same acronym as [1], ASHF, to refer to the alleged
securid hash function.

The main question we are interested in is:

        Given that a two-bit vanishing differential occurred
        in the full ASHF with time format, how can we use it
        to speed up key search?

In [1], "partial key recovery" is talked about on the bottom
of page 11. They assume at least a few vanishing differentials.
They mention that the vanishing differentials can give some
information about the sum of key bits. They give no further details
about what is learned.

Here, we show that a single vanishing differential allows one
to almost always find the key in approximately 2^60 steps,
which is a factor of 2^4 less than expected. Further, these steps
are about 4 times faster than the full ASHF. Thus, the
net speedup in key search time is a factor of about 2^6.

1. ASHF Key'ed Permutation

We first give a more insightful description of how the ASHF
key'ed permutation really works. The original code, obtained
by I.C. Wiener [2] (apparently by reverse engineering the
ACE server code), is quite cryptic. Even after cleaning
it up, it is not obvious what the algorithm is really doing.

An equivalent but more insightful description of the ASHF is
given now. It first takes the key and separates it into
16 nibbles in the most natural way. We refer to these key
nibbles by k[0] .. k[15] . The data bits are indexed
between 0 and 63. These data bits will be taken 4 at a time,
copied to the permuted_data array from right to left (i.e.
higher indexes are filled in first), and then removed from the
original data array. Every time 4 bits are removed from the
original data array, the size shrinks by 4. Indexes within
that array are always modulo the number of bits remaining.

A pointer m is first initialized to the index k[0] . The
first 4 bits that are taken are those right before the index of m .
For example, if k[0] is 0xa, then bits 6, 7, 8, and 9 are taken.
If k[0] is 0x2, then bits 62, 63, 0, and 1 are taken. As
these bits are removed from the array, the index m is adjusted
accordingly so that it continues to point at the same bit it
pointed to before the 4 bits were removed.

The pointer m is then increased by a value of k[1] , and
the 4 bits prior to this are taken, as before. The process
is repeated until all bits have been taken.

Sample C code is given below.

void
securid_permute_data( OCTET *data, const OCTET key )
{
    unsigned char bit_data[64], permuted_bit[64];
    unsigned char hex_key[16];
    int i, j, k, m;
    int bits_remaining = 64;

    memset( bit_data, 0, sizeof (bit_data) );

    securid_expand_data_to_1_bit_per_byte( *data, bit_data );
    securid_expand_key_to_4_bit_per_byte( key, hex_key );

    m = 0;
    for (k = 0; k < 16; k++)
    {
        m += hex_key[k];
        m = m % bits_remaining;

        /* copy the 4 bits before index m into permuted_bit */
        for (i = 0; i < 4; i++)
        {
            /* j is index of the bit that is being copied */
            j = m - 4 + i;
            if (j < 0)
                /* wrap around to the end of the array */
                j += bits_remaining;

            /* fill in the right side of permuted_bit first */
            permuted_bit[bits_remaining - 4 + i] = bit_data[j];
        }

        /* take out the 4 bits just copied from the bit_data array */
        j = m - 4;
        if (j < 0)
            /* wrap around */
            j += bits_remaining;
        remove_4_bits( bit_data, j, bits_remaining );

        /* there are 4 less bits in the array */
        bits_remaining -= 4;
        /* decrease m by 4 since the previous 4 bits have been
chopped
           out */
        m -= 4;
        /* note that bits that are wrapped around do not effect the
index */
        if (m < 0)
            m = 0;
    }

    securid_reassemble_64_bit_from_64_byte( permuted_bit, data );
}

An alternative description which is not needed for our result but
is cleaner to code is given here. It is essentially the same idea
as above, except rather than removing bits from the array, instead
they are crossed off. This simplifies the task of keeping track
of m , and keeps all indexes modulo the constant 64.

void
securid_permute_data( OCTET *data, const OCTET key )
{
    unsigned char bit_data[64], permuted_bit[64];
    unsigned char hex_key[16];
    int i, j, k, m;

    memset( bit_data, 0, sizeof (bit_data) );

    securid_expand_data_to_1_bit_per_byte( *data, bit_data );
    securid_expand_key_to_4_bit_per_byte( key, hex_key );

    m = 0;
    for (k = 0; k < 16; k++)
    {
        /* skip past the next hex_key[k] values, ignoring crossed off
           entries */
        for (i=0; i < hex_key[k]; ++i) {
            m = (m + 1) & 0x3f;
            /* if the entry is crossed off, continue until we find one
               that is not */
            while (bit_data[m] == 255)
                m = (m + 1) & 0x3f;
        }

        /*
         * copy the 4 bits before index m into permuted_bit
         */
        j = (m-1) & 0x3f;
        for (i = 0; i < 4; i++)
        {
            /* ignore bits that are crossed off */
            while (bit_data[j] == 0xff)
                j = (j - 1) & 0x3f;

            /* fill in the right side of permuted_bit first */
            permuted_bit[63 - k*4 - i] = bit_data[j];

            /* cross off the bit we just took */
            bit_data[j] = 0xff;
        }

    }

    securid_reassemble_64_bit_from_64_byte( permuted_bit, data );
}

The ASHF code that Wiener gave yields the same results as
these alternatives, but does so in a very compact, clever way.
It requires careful study to understand why Wiener's code
really is a permutation.

2. Key Searching after a Single Two-Bit Vanishing Differential

Please refer to [1] for a description of ASHF and the concept
of vanishing differential. In this section, when we say "vanishing
differential", we assume a two-bit difference.

Assume a vanishing differential has occurred. What can we say
about it? First, almost always the internal collision will happen
within 32 subrounds of the first round. If the collision happens
after 32 subrounds, this means that each differential bit has
passed through either B[0] or B[4] without disappearing. It
will then be an additional large number of subrounds before the
differential gets to the other byte (i.e. to B[4] if it just
passed through B[0] and vice versa) and has another chance
to disappear. Heuristically, this is extremely unlikely to
happen. If the difference does not disappear within the first
round (= 64 subrounds), then it gets mixed into the key, which
makes it almost impossible to disappear. After 405 simulation
experiments of vanishing differentials, we found 11 where the
differential lasted more than 32 subrounds. So it appears that
less than 3% of the time the assumption is wrong which will
cause our algorithm to fail.

We will be doing a key search where we test a key to see if it
produces a vanishing differential on our plaintext pairs within
32 subrounds. If it does happen, this indicates a candidate key
which we test further to see if it is the right one. Otherwise,
we advance to the next candidate. Since we only have to try
32 subrounds, this is 8 times faster than the full ASHF. On the
other hand, we have to do it for 2 plaintexts, so each trial
is 4 times faster than the full ASHF.

Note that in 32 subrounds, only the first 32 key bits come into
play. We take advantage of this in our attack.

The attack works as follows. Guess the first 6 bytes of the key.
There are 2^48 possibilities. For each guess, we perform the
key'ed permutation only for those first 6 bytes. After this step,
we have 6 bytes in the permuted_data array and 2 bytes remaining.
Although there are 2^16 possible keys remaining, this is far more
than we need to try since many of the resulting permutations of
the remaining 2 bytes of data overlap. Thus, rather than guessing
the last 2 key bytes, we try all possible permutations that could
have come from some key. There are at most 16*12*8*4 ~=
2^12.58 permutations (see our description of the ASHF key'ed
permutation to understand this). If the collision does indeed
happen, all possibilities for the remaining 2 key bytes are
explored with the full ASHF to see if any is the right key. Since
collisions are extremely rare, the time for testing candidates is
negligible in the big picture. The algorithm as stated gives at
least a reduction factor of 2^3.41 work in terms of total number
of keys studied. Taking into consideration that each step is 4
times faster than ASHF, the speedup is at least 2^5.41 over
exhaustive key search.

In reality, we often can get away with less than the 2^12.58
permutation trials since many of them overlap. For instance,
if both difference bits have been placed in the 6 bytes
already determined by the guessed part of the key (which happens
heuristically with probability about 56%), and if the hamming
weight of the remaining 2 bytes is 0, then there is only 1
possibility for the last 2 bytes. According to an exhaustive
search, if both differential bits have already been placed,
then the average number of possibilities for the last two
bytes is about 2^11.3. If we assume 56% of the time we can
get away with this shortened search and the other 44% of the
time we do the full search of 2^12.58 work (which is not
optimal), then the expected work for the last 2 bytes is
2^12.0. Hence, the modified key search is 2^6.0 times faster
than full key search. The reader can independently verify
that the most number of trials for the last 2 bytes when
there is no difference is 4670, which comes from any bit
rotation of the word 0x09bd.

It is not necessary to precompute and store all unique
permutations for each 2 byte word. An algorithm exists to
traverse them without duplication. The algorithm works as
follows. First, all of the 16 possible choices for the
first 4 bits are listed along with their corresponding
shrunken 12 bit arrays. Each shrunken 12 bit array is
rotated until the least integer representation of that
array is obtained (where one is free to choose the endianness
they prefer). Any duplications among the 16 choices are
removed, where a duplication means that both the 4 bit
and the 12 bit arrays are the same. For each remaining
entry, the same process is repeated recursively on the 12
bit arrays (i.e. the list of all 12 possible choices for
the next 4 bits are obtained along with their shrunken
8 bit arrays, etc....). At each recursive step, the
duplication check must not only be done among the locally
computed lists, but also those lists obtained from other
recursive branches of the same depth. This can be done
in constant time using 2^18 storage bits.

Alternatively, one can use the precomputed memory lookup
table method but switch to guessing 6.5 bytes of the key
instead of 6. The search space becomes size 2^60.17, and
the speedup factor becomes 2^5.83.

3. Intelligent Key Searches

Another characteristic about vanishing differentials is that the
two differential bits usually enter B[0] or B[4] within a few
subrounds of each other. This is because the farther apart they
are from entering these bytes, the more difficult it becomes for
the differentials not to propagate, eliminating any reasonable
chance of them disappearing.

Below is a histogram from 405 trials, showing how many iterations
apart it took for both differences to enter B[0] or B[4] .
Thus, we see that 65.43% of the time the differences enter
B[0] or B[4] within 4 subrounds of each other and 74.07% of
the time within 5 subrounds.

iteration number of sum of sum of occurrences
difference occurrences occurrences divided by 405
----------- ----------- ----------- -----------
0 44 44 0.1086
1 36 80 0.1975
2 61 141 0.3481
3 44 185 0.4568
4 80 265 0.6543
5 35 300 0.7407
6 37 337 0.8321
7 19 356 0.8790
8 9 365 0.9012
9 4 369 0.9111
10 5 374 0.9235
11 3 377 0.9309
12 2 379 0.9358
17 1 380 0.9383
18 1 381 0.9407
19 5 386 0.9531
20 2 388 0.9580
21 4 392 0.9679
22 3 395 0.9753
23 5 400 0.9877
24 5 405 1.0000

One can increase the speed of key search by ordering the
keys according to the difference of the number of subrounds
before each differential enters B[0] or B[4] . For
example, if we initially search only among the keys that
put the differences in B[0] or B[4] within 4 subrounds
of each other, there is a 65.43% chance that we will discover
the key 3.27 times faster than doing the full exhaustive
search. If that search fails, one can try repeating the
search for keys that put the differences in B[0] or B[4]
between 5 and 7 iterations of each other, and so on.

It is assumed that a strategy like this can be combined, to
some extent, with the strategy in Section 2.

Bibliography.

[1] A. Biryukov, J. Lano, and B. Preneel, "Cryptanalysis of the Alleged
SecurID Hash Function," from http://eprint.iacr.org/2003/162/ .

[2] I.C. Wiener, "Sample SecurID Token Emulator with Token Secret
Import,"
from http://archives.neohapsis.com/archives/bugtraq/2000-12/0428.html .