Re: VENONA Query; Kullback-Leibler Divergence

From: David Wagner (
Date: 08/18/05

Date: Thu, 18 Aug 2005 17:27:18 +0000 (UTC)

Jim Foyle wrote:
>In NSA Monograph #4, on VENONA, Robert Benson says that in 1952 Condray
>proposed attacking messages with unknown message beginnings using the work
>of the Naval cryptanalyst and mathematician, Dr. Richard Leibler.

The KL divergence is a measure of how far apart two distributions are.

>Can anybody tell me if this Divergence test, or some suitable modification
>of it, could in fact be used to tell if two messages were likely
>super-encrypted with the same one-time pad?

If there is some application of the KL Divergence to this problem,
perhaps Condray defined two distributions, argued that if they are
"close" then the two messages used the same one-time pad, and then
used KL divergence to measure "closeness". The question would be how
those distributions are defined, as a function of the observed ciphertexts
and anything else that was known to the cryptanalysts. I'd have to know
more about the system to speculate on that topic. Is a copy of the
monograph available anywhere?

Perhaps one approach might be as follows. Let D denote the distribution
obtained by xor-ing two random messages (if super-encryption is used, I use
the term "message" to denote the input to the one-time pad, i.e., the
result of the pre-encipherment). Given two ciphertexts c,c', one might
test whether c xor c' looks like it could have come from distribution D.
Notice that if c and c' were enciphered using the same one-time pad, then
c xor c' should be distributed according to D; if they were enciphered with
different one-time pads, then c xor c' should be uniform.

One might apply the KL divergence as follows. Let F denote the empirical
distribution of c xor c'. For instance, if each code group of each
message is determined to be processed independently of all others (by the
pre-encipherment), then we might define F by
  F(g) = (# of times that code group g appears in c xor c') /
         (# of code groups in c xor c').
Let D_0 denote the a priori distribution on a code group of a message,
and let D denote the distribution obtained by drawing two independent
samples from D_0 and xor-ing them. Then one might use KL(D,F) as a
measure of similarity between D and F; the smaller that KL(D,F) is, the
more likely that c and c' used the same one-time pad. Or, one might
compare KL(D,F) to KL(Uniform,F), and presume that c,c' used the same
one-time pad if the former is smaller, otherwise presume that c,c' used
different pads.

These are mere speculations. I have no idea whether this kind of
approach would work, or would be optimal, let alone whether it is what
Condray had in mind.

There has been some recent discussion of KL divergence and other such
measures here on this group by Doug Gwyn.