Re: Interpreting the test result on a RNG



On Mon, 12 Feb 2007 13:21:25 +0100, Cristiano <cristiano.pi@xxxxxxxxxx> wrote:
On page 100 (page 109 of the pdf file) of this NIST paper
http://csrc.nist.gov/publications/nistpubs/800-22/sp-800-22-051501.pdf
there is an interpretation method of the result obtained from a test of a
random number generator.

The range of acceptable proportion of sequences that pass the test is
determined using the confidence interval calculated using a normal
distribution as an approximation to the binomial distribution.
I'd like to implement that method using the binomial distribution because
the number of the tested sequences ('m' in the paper) can be 50 or 100.

I calculated the following table with m=50 and a=0.05:

F Bin CUM(Bin)
0 0.076945 0.923055
1 0.202487 0.720568
2 0.261101 0.459467
3 0.219875 0.239592
4 0.135975 0.103617
5 0.065841 0.037776
6 0.025990 0.011786
7 0.008598 0.003188

F is the number of the sequences which failed the test (p-value < a)
Bin is Pp(n|N) in this link (the binomial distribution):
http://mathworld.wolfram.com/BinomialDistribution.html
CUM(bin) is the probability to see more than F failures.

My problem is: which column should I use? Or, in other words, the overall
test is one-tailed (column 'CUM(Bin)') or two-tailed (column 'Bin')? I think
the latter.

You could defend either decision, but pay attention to what you
mean by your choice:

If you choose a two-tailed test, you're announcing that
you're willing to tolerate a certain probability of mistakenly
rejecting a perfect random-number generator for having
either too many failures or for having too few failures.
This is, in general, a reasonable thing to say (very few
failures is, in fact, suspicious); but note that in the
table you give the smallest possible number of failures,
zero, has a 7.7% chance of occurring with a perfect random
source. So any two-tailed test you apply would have at least
a 7.7% chance of false rejection, which is probably more than
you want to tolerate.

In contrast, if you choose a single-tailed test, you're
saying that you'll only reject a candidate source for failing
too many tests -- and that you'll accept a candidate source
even if the number of failures is anomalously small. If
you're working from the table you gave above, this would
be the reasonable thing to do. However, even if you work
from the bottom row of the table (i.e., saying you'll reject
the candidate if there are more than 7 failures), you'll
have a .3% chance of rejecting a perfect random source.
That's far too high for many applications.

Note that you have chosen as your acceptance criterion the
number of occurrences of p-value < a, rather than some other
test of the hypothesis that the p-values are uniformly
distributed. This might be an impractical choice when the
number of p-values is relatively small: it seems to me that
SP-800-22 talks in terms of 1000 tests, rather than 50 or
100.

Probably the most useful next step is to decide what
false-rejection rate you can tolerate. That would help
to focus further discussion.

--
To email me, substitute nowhere->spamcop, invalid->net.
.