Re: Poker Tests in restricted envionments

From: Gregory G Rose (ggr_at_qualcomm.com)
Date: 08/24/05


Date: 24 Aug 2005 14:00:36 -0700


[I never saw the original article, so I'm
following up the wrong one.]

>David Eather <eather@tpg.com.au> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>
>> I've got a problem and I'll ask here first. I have a random noise
>> generator that collects bits of data from a transistor noise source,
>> processes it and then send it of to a PC serial port at 4800 or 9600.
>> I
>> have a maximum of 14 bytes of ram (probably only 9 usable) so I can't
>> perform a full poker test.
>
>> What I am thinking of doing is to count the number of 1 particular
>> byte in a
>> sample of 2**16 i.e count the number of FF bytes in what should be a
>> random
>> stream of bytes ranging 00 to FF. If the test fails at a reasonable
>> p, then
>> the byte value gets tested again. A successful test would result in
>> testing
>> the next byte in sequence - by not advancing after a failed test it
>> should
>> pick up any "stuck" (i.e. broken) conditions reasonably quickly
>
>> I'm also performing a bit frequency test and steps to also remove
>> auto
>> correlations.
>
>> I have 2 questions. What is the statistic to use to show if
>> particular
>> byte samples are over or under represented in a stream " -
>> probability is
>> not my forte.

I think what you want is the test that shows one
particular byte versus all the others. That's best
done with a binomial test. So, suppose your sample
is indeed 2**16 bytes; each particular byte should show
up 256/2^16 == 256 times. So, the binomial formula
for that is

        x = freq - exp_freq;
        z = (ABS(x) - 0.5) / sqrt(data->samples*P*(1-P));

Where P is the probability of a "normal"
occurrence, that is, 1/256 in this case, "freq"
is the observed frequency, exp_freq is the
expected frequency (256), data->samples is the
total number of samples (2^16). z, then is
approximately standard normal, and should be
compared with the usual standard normal Z
values.

*BUT*!!! Suppose you were looking for a 99%
confidence. You are going to run 256 different
tests, and you can pretty much expect 2.56 of them
to fail at the 99% level. You need to bump up the
confidence interval to, for example, 99.9% to get a
true 99% reading. (Well, that's my cheap and nasty
approximation anyway -- you could always do it
right, and find the right Z value for
100%-(1/256).)

You'll also presumably need to scale the formula
appropriately for your platform, but that's an
exercise.

(Code taken from our general stats program that I
can send you if you want, but haven't cleaned up
for general consumption yet.)

Greg.

-- 
Greg Rose
232B EC8F 44C6 C853 D68F  E107 E6BF CD2F 1081 A37C
Qualcomm Australia: http://www.qualcomm.com.au