Re: Random delay as a countermeasure to timing attacks

François Grieu wrote:
I'm not quite sure that I follow you. My problem is that our adversary
is not bound to repeating the same experiment n times; she can make n
random queries, or queries chosen according to the cipher, or
iteratively-chosen queries, and from this find the hypothesis on key
bits that best match her data. I fear she may have an efficient
heuristic for this.

If I go in the direction of your Bernoulli distribution idea, what I am
hoping for is an upper bound on the total "knowledge" that may leak
about an unknown function T(j) with result either 0 or 1, if T(j)+X is
observed n times (at 1 to n points j chosen by the adversary), and X is
our random distribution of extra noise.

Ahh. Now I think I understand you.

Here was my experimental setup: There is a secret value K in {0,1}. Let R
denote the set of real numbers. There is a noise distribution D on R.
For each of the n samples, the referee chooses X_i <- D independently at
random, and then reveals Y_i := K + X_i to the adversary. This models
known-plaintext cryptanalysis where the adversary cannot affect the
computation and where the same secret bit appears in every iteration.
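For concreteness, this setup is easy to simulate. Here is a sketch (Python; Gaussian noise is my assumed instance of D, and the parameters n, sigma, and the threshold-at-1/2 decoder are illustrative choices, not anything fixed by the discussion) that runs the referee's experiment and lets the adversary guess K by averaging the samples:

```python
import random
import statistics

def run_experiment(K, n, sigma, rng):
    """Referee: reveal Y_i = K + X_i with X_i ~ N(0, sigma^2), i.i.d."""
    return [K + rng.gauss(0.0, sigma) for _ in range(n)]

def adversary_guess(samples):
    """ML decoder for K in {0,1} under symmetric noise: threshold the sample mean at 1/2."""
    return 1 if statistics.fmean(samples) >= 0.5 else 0

rng = random.Random(42)
trials, n, sigma = 2000, 200, 3.0
correct = 0
for _ in range(trials):
    K = rng.randrange(2)
    if adversary_guess(run_experiment(K, n, sigma, rng)) == K:
        correct += 1
print(correct / trials)
```

Even with noise whose standard deviation is several times the signal (sigma = 3 against a 0-vs-1 gap), averaging n = 200 samples recovers K almost every time; that is just the 1/sqrt(n) averaging effect.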

Maybe this is closer to what you had in mind, instead: There is a secret
value K in {0,1}. There is a function f : {0,1} -> R^n. There is a noise
distribution D on R. For each of the n samples, the referee chooses X_i
<- D, and then reveals Y_i := f(K)_i + X_i to the adversary. We might
ask what is the worst case of the f that leaks the most information,
or in other words, we might allow the adversary to choose the function f.

We may need to restrict the magnitude of f, since obviously if each
f(K)_i is allowed to be arbitrarily large, then it becomes trivial to
recover the secret K. Therefore, it might make sense to impose the
restriction that 0 <= f(K)_i <= 1 for all i, or in other words, that the
l_infinity norm is bounded: ||f||_\infty <= 1. We could also consider
bounds on the l_2 norm, say, if that fits the application setting better.
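The generalized experiment, with the boundedness restriction enforced by the referee, can be sketched as follows (Python; the function names and the repetition-code helper are illustrative assumptions, not notation from the thread):

```python
import random

def referee(f, K, n, sigma, rng):
    """Reveal Y_i = f(K)_i + X_i, after checking the bound 0 <= f(K)_i <= 1."""
    codeword = f(K)
    assert len(codeword) == n and all(0.0 <= s <= 1.0 for s in codeword)
    return [s + rng.gauss(0.0, sigma) for s in codeword]

def repetition(n):
    """The special case f(x) = (x, x, .., x) discussed below."""
    return lambda K: [float(K)] * n

rng = random.Random(1)
ys = referee(repetition(8), 1, 8, 0.5, rng)
print(len(ys))
```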

Of course, my experimental setup is the special case where the function
f is given by f(x) = (x,x,..,x). This is a repetition code. I have
a suspicion that the ability to choose some other f does not help the
adversary, as long as f satisfies ||f||_\infty <= 1; i.e., in this case,
the repetition code is optimal for the adversary. I have not tried to
prove this, but it feels right.

Now let's move on to the generalization where our secret is more than
just one bit, say K is in {0,1}^m. We can introduce a function f :
{0,1}^m -> R^n, chosen by the adversary. Everything is as before.

In general, f might be any code. As a special case, there is the
repetition code given by f(K) = (K_1,K_2,..,K_m,K_1,..K_m,..) = (K,K,..).
However, now I think something interesting happens. When m>1, I think
it will happen that other codes may be significantly better than the
repetition code, i.e., they might be able to leak key bits more quickly.

I suspect that the information-theoretic concept of channel capacity
may be relevant here. Think of the randomized process S --> Y as
the channel, where S = (S_1,..,S_n), Y = (Y_1,..,Y_n), Y_i := S_i + X_i,
and each X_i is drawn independently at random from D. Of course, this
is a highly noisy channel. The capacity of such a channel is given by C(n) =
max I(S;Y), where the maximum is taken over all possible distributions on S.
(I can't remember exactly how this goes in the continuous case, but I think
there is a side requirement that Var[S] <= 1, or something like that.)

The channel capacity tells you how many bits of information about S
can be leaked by the channel. If I remember correctly, the channel
capacity grows linearly with n, i.e., C(n) = n * C(1).

Notice that the most general experiment listed above corresponds
to encoding the secret K into the codeword S = f(K) and then
sending S through the channel S --> Y. The function f corresponds to the
channel code used to transmit K. The rate of this code is m/n. Of course,
when the adversary chooses a function f, she should try to choose the best
code that minimizes the decoding error (the probability that the adversary
fails to correctly recover K).

I think that Shannon's theorem gives an asymptotically tight bound on
what rates are achievable. Here is an over-simplified statement of his
result, as I understand it. He focuses on codes whose decoding error
(the probability that the receiver fails to recover the original message
K correctly) -> 0 in the limit. Basically, if the rate R is too large,
namely R > C(1), then it is impossible to achieve a code whose decoding
error -> 0 in the limit. On the other hand, if R < C(1), then it is
asymptotically possible to build a code whose decoding error -> 0 in
the limit. To put it very roughly, the achievability threshold is
something like R = m/n <= C(1), or equivalently, m <= n * C(1) = C(n).
As long as you're below that threshold, codes exist that transport the
secret with asymptotically negligible decoding error.

Maybe this means that, if the attacker can choose the code f, the
number of samples an adversary needs is given by n ~ m / C(1), up to
some constant factors. I don't know. I guess one open question here
is that I don't know how the asymptotics affects things, given that
we are interested primarily in concrete security, not asymptotics;
and also, we're interested in all adversaries that gain even partial
information about K, not merely those whose decoding error probability
-> 0. Nonetheless, it does seem like it shouldn't be hard to compute
the capacity C(1) of the channel I described above, as a function of
the noise rate (i.e., as a function of the parameter \sigma, if D =
Gaussian(\mu, \sigma^2)), and maybe that will help to characterize
the leakage rate somehow.
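Under the (loose) assumption that the Gaussian-input capacity (1/2) log2(1 + 1/sigma^2) stands in for C(1) of our bounded-input channel, the rough estimate n ~ m / C(1) is a one-liner to compute (Python; the specific key size and sigma values are illustrative):

```python
import math

def capacity_bound(sigma):
    # Gaussian-input AWGN capacity with unit power, used here as a stand-in
    # for C(1) of the bounded-input channel (an assumption, not the exact value).
    return 0.5 * math.log2(1.0 + 1.0 / sigma**2)

def samples_estimate(m, sigma):
    """Rough estimate n ~ m / C(1) of the adversary's required sample count."""
    return math.ceil(m / capacity_bound(sigma))

for sigma in (1.0, 2.0, 4.0):
    print(sigma, samples_estimate(128, sigma))
```

So for, say, a 128-bit key, each doubling of the noise standard deviation buys roughly a 4x increase in the number of samples the adversary needs, which matches the sigma^2 scaling one would expect.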

I have a vague recollection that the repetition code is not optimal
in this more general setting, and that you can do significantly
better than the repetition code. This amounts to saying that if the
attacker can choose f (or if the attacker is lucky about what f is in
the application being attacked), then the attacker may be able to do
significantly better. I don't know if this is more intuitive, if we think
about the discrete case of a binary symmetric channel where each bit is
flipped with probability (1-\epsilon)/2. This channel has capacity ~=
\epsilon^2/(2 ln(2)). The repetition code amounts to sending
(K_1,..,K_m,K_1,..,K_m,..)
through the noisy channel, and I guess you need n = O(m/\epsilon^2) channel uses
to reliably send the m-bit value K using a repetition code. I seem
to recall that you can do better, e.g., by using a linear code f :
{0,1}^m -> {0,1}^n, but I can't seem to recall how or why at the moment.
I don't recall precisely why or in what way such a code is better; and
it is possible I am misremembering; but I do remember being surprised
to discover that the repetition code is not optimal, and then thinking
that it was in hindsight not so surprising. I confess that I have since
forgotten what that hindsight may have been.
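The BSC numbers above are at least easy to check numerically. This sketch (Python) computes the exact capacity 1 - H((1-eps)/2) and compares it to the approximation eps^2 / (2 ln 2):

```python
import math

def H(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def bsc_capacity(eps):
    """Capacity of a binary symmetric channel whose flip probability is (1 - eps)/2."""
    return 1.0 - H((1.0 - eps) / 2.0)

eps = 0.1
print(bsc_capacity(eps), eps**2 / (2.0 * math.log(2.0)))
```

For small eps the two agree closely, consistent with the ~eps^2/(2 ln 2) figure quoted above.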

I'm afraid that my knowledge of information theory is extremely spotty,
so please be warned that the above may have serious mistakes in it.
Pretty much everything that I have stated above, I learned from Poorvi
Vora; I hope that I have not mangled what she taught me too badly.
Nonetheless, it may be possible to build some framework along the lines
that you were hoping for, by building on some of these concepts. I hope
that this is useful or at least thought-provoking.
