Re: Interpreting the test result on a RNG



Peter Pearson wrote:
On Mon, 12 Feb 2007 20:50:43 +0100, Cristiano
<cristiano.pi@xxxxxxxxxx> wrote:

In the current implementation of my test, I already use the KS test
(and its Anderson-Darling variant) to check whether the p-values are
uniformly distributed, I also use the Pearson's chi-square test with
sqrt(n) bins.
I usually test 50 sequences with 100 to 1000 different tests, like
this:

Seq#1 Seq#2 Seq#3 ... Seq#50 KS
FFT test .9281 .1982 .0291 .1922 .1234
Serial test .1235 .6453 .3677 .7773 .4567
Permut. test ...
... ...
Autocorr. test .4654 .3737 .1927 ... .3937 .6789
Overall KS .9876

For each row, I calculate the overall p-value with the KS test, so I
have 1 p-value for the FFT, 1 p-value for the Serial test and so
forth.
When there are many tests, I get for some of them a very small KS
p-value and I don't know how many tests with 0.0001... or 0.00001...
can be tolerated.
To check whether the KS p-values are good (uniformly distributed) I
do an overall KS test over the KS p-values; if this overall p-values
is good it means that I can tolerate the small p-values.
Usually it works, but it's not rare to see a big failure even with a
very good generator. For that reason I tried the binomial
distribution method.

Your test method looks good enough that I can't pretend to
have superior expertise, so take what I say with caution:

I expect that real RNGs will tend to fail one particular
test (or perhaps a couple of tests) and not to fail other
tests.

Usually I see that, but I also see (P-)RNG that fails almost all the tests.

I don't think it would be fruitful to try to detect
an RNG that does badly on the FFT Test on one sequence,
badly on the Serial Test on another sequence, and so forth.

I totally agree.

So, the case that you most want to detect is the case in
which one single-test KS score is very small, or perhaps
a couple single-test KS scores are quite small.

Yes.

It seems
to me (but I confess to being beyond my ken here) that you
lose a lot of test "power" if a single-test KS score of
0.00001 just gets counted as "one value less than 0.05",
the same as a score of 0.049. That 0.00001 is trying to
tell you something much more important than 0.049.

Agreed.

The "Overall KS" score will detect a bad RNG that spreads
its failures in little bits over all the tests, but that's
not a realistic concern: I've never heard of an RNG that
failed in such a way. You could design one if you tried,
but . . . by accident? No.

So, I think you want a test that focuses on the smallest few
single-test KS values and asks, "Do these show a stronger
tendency toward extreme smallness than you'd expect from the
smallest few values from a uniform distribution?" My
knowledge of statistical tests doesn't suffice here, but
please report what you find.

The AD test is very sensitive on the tails, but a single 0.00001 in 99 good
p-values is not detected by the AD test (the KS test totally ignores that
bad p-value).

I think that the best thing to do is to look at the overall KS and overall
AD; KS detect bad p-values around 0.5, while AD detect bad p-values near 0
and 1 (if you take 100 perfect p-values and you replace the first 10
p-values with 0.00001, the AD test gives 0.000109, which should be a suspect
overall p-value).

I think it worth to look also at the smallest single-test AD and KS p-values
(this is easy because in my program I sort them), if I get 0.00... it should
be a good idea to test again the generator using the failed test.

What I said is what I'm currently doing; I just hoped to hear a better
testing strategy.

Following what you told me (0.00001 just gets counted as "one value less
than 0.05") I think to discard the binomial distribution test.

Cristiano


.



Relevant Pages

  • Re: Interpreting the test result on a RNG
    ... Anderson-Darling variant) to check whether the p-values are uniformly ... For that reason I tried the binomial distribution method. ... a couple single-test KS scores are quite small. ... smallest few values from a uniform distribution?" ...
    (sci.crypt)
  • Re: Bootstrapping Hypothesis Test
    ... I really am talking about bootstrapping. ... but not quite equal to the uniform distribution. ... Looking at their p-values also seems odd. ... > - What allows a Monte Carlo estimator of a statistic ...
    (sci.stat.math)
  • Re: random data
    ... Bryan Olson wrote: ... > The Kolmogorov-Smirnov test is often a good way to combine the ... all p-values from all kinds of tests in it for doing one ... the uniform distribution)? ...
    (sci.crypt)

Loading