Re: Interpreting the test result on a RNG



On Mon, 12 Feb 2007 20:50:43 +0100, Cristiano <cristiano.pi@xxxxxxxxxx> wrote:

In the current implementation of my test, I already use the KS test (and its
Anderson-Darling variant) to check whether the p-values are uniformly
distributed, I also use the Pearson's chi-square test with sqrt(n) bins.
I usually test 50 sequences with 100 to 1000 different tests, like this:

Seq#1 Seq#2 Seq#3 ... Seq#50 KS
FFT test .9281 .1982 .0291 .1922 .1234
Serial test .1235 .6453 .3677 .7773 .4567
Permut. test ...
... ...
Autocorr. test .4654 .3737 .1927 ... .3937 .6789
Overall KS .9876

For each row, I calculate the overall p-value with the KS test, so I have 1
p-value for the FFT, 1 p-value for the Serial test and so forth.
When there are many tests, I get for some of them a very small KS p-value
and I don't know how many tests with 0.0001... or 0.00001... can be
tolerated.
To check whether the KS p-values are good (uniformly distributed) I do an
overall KS test over the KS p-values; if this overall p-values is good it
means that I can tolerate the small p-values.
Usually it works, but it's not rare to see a big failure even with a very
good generator. For that reason I tried the binomial distribution method.

Your test method looks good enough that I can't pretend to
have superior expertise, so take what I say with caution:

I expect that real RNGs will tend to fail one particular
test (or perhaps a couple of tests) and not to fail other
tests. I don't think it would be fruitful to try to detect
an RNG that does badly on the FFT Test on one sequence,
badly on the Serial Test on another sequence, and so forth.
So, the case that you most want to detect is the case in
which one single-test KS score is very small, or perhaps
a couple single-test KS scores are quite small. It seems
to me (but I confess to being beyond my ken here) that you
lose a lot of test "power" if a single-test KS score of
0.00001 just gets counted as "one value less than 0.05",
the same as a score of 0.049. That 0.00001 is trying to
tell you something much more important than 0.049.

The "Overall KS" score will detect a bad RNG that spreads
its failures in little bits over all the tests, but that's
not a realistic concern: I've never heard of an RNG that
failed in such a way. You could design one if you tried,
but . . . by accident? No.

So, I think you want a test that focuses on the smallest few
single-test KS values and asks, "Do these show a stronger
tendency toward extreme smallness than you'd expect from the
smallest few values from a uniform distribution?" My
knowledge of statistical tests doesn't suffice here, but
please report what you find.

--
To email me, substitute nowhere->spamcop, invalid->net.
.



Relevant Pages