# Re: Extracting random data from static, for /dev/random

bmearns <mearns.b@xxxxxxxxx> writes:

On Nov 11, 5:52=A0pm, Unruh <unruh-s...@xxxxxxxxxxxxxx> wrote:
bmearns <mearn...@xxxxxxxxx> writes:
On Nov 11, 1:50=3DA0pm, Unruh <unruh-s...@xxxxxxxxxxxxxx> wrote:
bmearns <mearn...@xxxxxxxxx> writes:
I have a radio plugged into my soundcard, tuned to static. This shoul=
d
be a good source of randomness, right? I know of course it's not all
random data, as confirmed with rngtest. I was looking at the lavarnd
project, they use an algorithm they're calling DigitalBlender (tm) to
"extract" the random data and throw away the rest.
Are there any tools that can do the same thing on an arbitrary input
stream that I could use with my audio static? I thought rngd would do
it, but that just tests the randomness and throws it away if it fails=
,
instead of extracting what ever is random.

One thing you can do is to take the input and hash it. Eg, take in 204=
8 b=3D
its and
has to 1024 bits (eg MD5) This means that if the input has at least 10=
24 =3D
bits of
randomness, the output of MD5 should also have roughly that much rando=
mne=3D
ss.
Interesting, I figured hashing would factor into this somehow, but I
didn't realize it had this sort of "entropy-extraction" property. I
guess it makes sense, in fact good compression should probably have
similar effects. Thanks!

No, compression will NOT. The correlations and biases which are there whi=
ch have
cryptographic impact simply will not be recognized by a compressor-- They=
in
general would produce more output than input. If a stream is compressible=
, then
that stream already has very low entropy.

Yes, it has low entropy and can therefore be compressed. That's my
whole problem, I'm trying to remove everything except the pure
information: isn't that the whole point of compression? I understand

Well, not really. All compressors have very simple algorithms to
recognize repetitions. Ie, most correlations are not recognized by the
compressor and are not removed. Remember that there has to be a table at
the beginning to allow uncompression. That table gets larger and larger
the more subtle the patterns it compresses. so you could get the
situation where a file of length 1K had a final length of 10 TB with
almost all being taken up by the compression table. In the limit as the
file gets infinitely large, compression techniques always win ( the
table size becomes negligible) but most compressors work on finite sized
files.

that current compression techniques may be insufficient for this, but
a theoretically optimal compression algorithm would work, wouldn't it?

See above.

Or am I still missing something? To be clear, I'm talking specifically
about compressing the static from my radio, not the output of /dev/
urandom.

You could certainly run it through a compressor, if that helped, first
and then use a "random" hash function to extract the remaining entropy.
Unfortunately you would have no real idea of how much hashing you should
do. for example the digits of pi almost certainly would not compress at
all, and yet, you know that that the entropy in the digits of pi is very
very small ( the minimal length program to produce the digits of pi is
very small). If your attacker knew that you used the digits of pi, you
would have to do a hash that had a huge compression factor to extract
the real entropy from that stream.

Now radio static is probably a lot better but you still need an
extimate of the entropy density in that stream. For example if at night
that frequency suddenly began picking up Radio Moscow, the entropy
content would plummet.

I've tried audio-entropyd (aed), but it keeps failing the randomness
tests as well, so isn't giving me any extra entropy.
Any advice would be great, my entropy pool is getting exhausted more
and more often these days.

Don't use /dev/random, use /dev/urandom, which does not exhaust. Yes, =
it =3D
uses a
PRNG but continuously seeded with whatever randomness it can find, giv=
ing=3D
you a
continuous stream of cryptographically good PRNG even when the physica=
l r=3D
andomness
runs out.

Thanks,
-Brian
Using urandom, really? I've seen a number of suggestions for that from
people who don't seem to actually know anything other than "urandom
doesn't run out". I have a real hard time believing that the output of
urandom is cryptographically secure: no amount of processing can
increase entropy, so even if it's seeded with random data, the output
won't contain anymore randomness than the input. That means if there's
only 10 bits (for example) in the entropy pool, and that's used to
seed the prng, the output of that prng won't have more than 10 bits of
entropy, right? So even if you generate 1024 bits of output from the
PRNG, it will only contain the same small amount of randomness that
was used as input. I have a little bit of background in information
theory, but am certainly no expert: can anyone either back me up or
explain to me why I'm wrong?

IF you know the input, then your comment is correct. So if someone knows =
all of
the input to the hash function, they can recreate the output. But urandom
constantly is seeding itself. It has thousands of bits of input. (What yo=
u did 10
days ago affects the output today) Now, the output does have more bits th=
an the
input, thus youcould do an exhaustive search of the input and you would h=
ave to go
through less than the total output stream length. But the input is so hug=
e that
that exhaustive search is shall we say infeasible. That is why it is
cryptographically strong. A lot of thought went into /dev/urandom to ensu=
re that
it cryptographically strong.
[clip]

Again, please correct me if I'm wrong, but exact re-creation of my
entropy pool is not the only way to attack a weak entropy source.
Obviously, if someone was able to recreate the exact output of my RNG
system, they could crush what ever kind of crypto I'm trying to do
with it, but exact recreation is the extreme case. The whole reason
good entropy is important is because non-randomness has a dirty little
habit of revealing itself at the most inopportune times, namely when
someone is trying to attack your crypto-system. I'm not arguing
specifically (for the moment) that /dev/urandom is insufficient, but
you seem to be dwelling on this recreation and exhaustive searching
angle, but I don't think that's the only concern (and, as you point
out, it's not a realistic concern). You said yourself that compression
is insufficient because it doesn't account for the "correlations and
biases...which have cryptographic impact", and it's exactly these
correlations and biases that I'm concerned about.

I don't doubt that a corps of extremely intelligent, knowledgeable,
and diligent people worked on making /dev/urandom strong. However,
straight from the manpage:

"When read, /dev/urandom device will return as many bytes as are
requested. As a result, if there is not sufficient entropy in the
entropy pool, the returned values are theoretically vulnerable to a
cryptographic attack on the algorithms used by the driver. Knowledge
of how to do this is not available in the current non-classified
literature, but it is theoretically possible that such an attack may
exist. If this is a concern in your application, use /dev/random

I supposed perhaps as I was taking you too literally. Maybe what
you're trying to tell me is that /dev/urandom should be strong enough
to withstand any reasonable attack that anyone other than classified
government agencies could launch. And I supposed that's a reasonable
answer. For curiosities sake, can you confirm (or else continue to
rebuke) my assertion that in the theoretical limit, /dev/urandom is
not as secure as /dev/random?

Thanks again, I appreciate your feedback immensely.

-Brian

.

• References:

## Relevant Pages

• Re: Extracting random data from static, for /dev/random
... it, but that just tests the randomness and throws it away if it fails, ... No, compression will NOT. ... it has low entropy and can therefore be compressed. ... they could crush what ever kind of crypto I'm trying to do ...
(comp.os.linux.security)
• Re: Paying for assistance
... forget about the input being 8 bit sequences. ... This is just uninteresting, and forget about the dictionary, this is only a technicality and we can assume without loss of generality that the input is sorted by probability. ... The expected lengths of files 1,2,3 are then given by multiplying the expected bit count by the entropy of the given file, i.e. ... them 'bait' for a compression algorithm. ...
(comp.compression)
• Re: sound compression / backward-adaptive linear prediction
... People on comp.compression were quite surprised that someone even cosiders KLT for lossless audio compression, so it is likely to be under-researched direction. ... I've just used ordinary least squares to estimate linear prediction coefficients and it works, sort of, but that is not what I need because coefficients tend to change. ... And if I just estimate entropy it depends on entropy estimator being used, ... RT> This might not include any overhead bits you would need if you ...
(comp.lang.lisp)
• Re: Pacbyte data compression patent
... violates the counting argument. ... that bit has a 1/4 bit entropy again would lead to a compression loop. ... ratio by selective velocity through randomization field, ...
(comp.compression)
• Re: Detecting/estimate whether data is encrypted
... your brainstorm sounds good at first glance... ... data with relatively high entropy might be the place to start. ... no idea if compression would look similar or not. ... mind which algorithm and/or key lengths are used, ...
(Security-Basics)