Re: issues with statistical test suite from http://csrc.nist.gov/rng/

From: Mack (macckone_at_a_nospamjunk123_ol.com)
Date: 01/28/04


Date: Wed, 28 Jan 2004 11:11:33 GMT

On Mon, 26 Jan 2004 19:58:56 GMT, "Cristiano"
<cristiano.pi@NSquipo.it> wrote:

>Mack wrote:
>> On Sun, 25 Jan 2004 13:23:29 GMT, "Cristiano"
>> <cristiano.pi@NSquipo.it> wrote:
>>
>>> Mack wrote:
>>>> On Sat, 24 Jan 2004 16:28:27 GMT, "Cristiano"
>>>> <cristiano.pi@NSquipo.it> wrote:
>>>>
>>>>> Mack wrote:
>>>>>> On Thu, 22 Jan 2004 19:30:50 GMT, "Cristiano"
>>>>>> <cristiano.pi@NSquipo.it> wrote:
>>>
>>>> Skewness measures symmetry about a point. Kurtosis could be
>>>> used but the expected value would not be zero as for the normal
>>>> curve when applied to a uniform distribution, although this can be
>>>> easily calculated.
>>>
>>> Sure, it is 6/5 * (n^2+1) / (n^2-1) for a discrete uniform
>>> distribution. And when you got, for eample, 7/6 what do you say? It
>>> is good? It is bad? And how much?
>>>
>>> Here I have two doubts:
>>> 1) Surely we get some information from those parameters, but can the
>>> information gotten be used in testing a rng (in an efficient way)?
>>> 2) You say: "Skewness measures symmetry about a point". I don't know
>>> how you calculate "your" skewness. Do you calculate it using the
>>> absolute moments or the central moments in some "strange" way?
>>
>> skewness=1/n*(sum from 1 to n((Yi-expected mean)^3))/expected
>> deviation^3
>>
>> specifically:
>>
>> Skewness is a measure of symmetry, or more precisely, the lack of
>> symmetry. A distribution, or data set, is symmetric if it looks the
>> same to the left and right of the center point
>>
>> from: http://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm
>
>They wrote that Y is the mean, not the expected mean and that you should use
>the standard deviation, not the expected deviation.

In any statistical calculation you can test either against the sample
or against the expected. In the case of that specific program there
is no way to enter an expected outcome. They do not say sample mean
or sample standard deviation either.

>As I already said, you use a useless method in a bad way.

If you don't like my methodology don't use it.
But it measures exactly what I was trying to measure
in an effective way. I did not use this test to determine
that the sample was not uniform. I used this test to try
and determine how it was not uniform.

The same statistic type of statistic can be calculated
by using the mean deviation from the expected. However
it is more sensitive to the center and less sensitive to the
tails.

t=(mean-expected mean)/variance of the mean
v^2=expected standard deviation/N

in the 1e6x100 LZ case this is
t=(3.63-5.5)/2.8868*sqrt(1/100))=-.6478/.1=-6.48
The correct number of degrees of freedom is 100.

Again this uses a simplified version that is oriented toward the
expected variance of the mean instead of the estimated
variance of the mean. In this case I was simply to lazy to
calculate it but the result will be similar.

Again we can clearly see that the distribution is not
symetrical about the expected mean. The means are
not equal.

in the 1e6x1000 case this is
t=(5.423-5.5)/2.8737*sqrt(1/1000))=-.847
which is not significant but the prior case makes
us wonder if the measurement is correct.

I will leave the calculation of the FFT values to
someone who cares since we have already agreed that
the FFT does not perform as expected on values below
1e6.

>
>
>>>>>> Diehard doesn't give KS results except where
>>>>>> it is appropriate.
>>>>>
>>>>> So does NIST test. But exactly, what do you mean?
>>>>
>>>> The finalAnalysisReport returns KS test values
>>>> where these values are not appropriate.
>>>
>>> Who say that? Have you done a new discovery?
>>> If you calculate the KS test for *only* one sequence, then the KS is
>>> good enough.
>>
>> Why would you ever use a KS test on one p-value?
>
>Uh!? I know that my English is bad, but not so bad! Do you understand what I
>say?
>You said that the KS test used for the final report is not appropriate.
>I said that if you run once the FFT test, the related KS test is far good.
>

That would be running a KS test on one p-value. A KS test is not
appropriate on one p-value.

>One can use whatever he wants, but the KS test used for the
>finalAnalysisReport is appropriate.
>

quoting from post: ppCPb.126628$VW.5097709@news3.tin.it

I said:
>> I think you have already agreed that the KS test of the p-values
>> for these tests is not correct. Specifically it isn't the correct
>> test.

and you said:
>I totally agree with your last sentence: the KS test *must* not be used with
>LZ.
>But all the tests in the suite are good to check a prng (if they are
>properly used).

therefore the finalAnalysisReport returns KS test values
where these values are not appropriate.

now back to your last post:
>>> But if you calculate the KS of the KS's gotten from 100 or 1000
>>> sequences, then the overall p-value is useless because the 100 or
>>> 1000 p-values are too binned.
>>>
>>>
>>>>>> Unfortunately the output is pretty hard to read.
>>>>>> I usually open it with a text editor and search for results of
>>>>>> .000, .00, and .0.
>>>>>
>>>>> And when you find them what do you do?
>>>>
>>>> Repeat that specific test with more data to determine if it is
>>>> isolated or consistent.
>>>
>>> With more data? Each test needs a fixed number of 32-bit numbers
>>> (some test requires slight variations on the number of input
>>> numbers).
>>>
>>> Anyway, when a generator is definitely good or bad?
>>>
>>
>> A generator is bad when it fails a test that has a good mathematical
>> foundation in a spectacular manner
>
>Could you list some test that has a good mathematical foundation?

The Rank test, the Count the Ones test, Runs test, Craps test,
, Poker test, Distribution test, Overlapping Pairs/Triples/Quadruples
test (the sparse form uses empirical data, only the raw form has a
completely mathematical foundation). The birthday spacings test
claims to have good mathematical foundation but some parameters
of the test are empirical. I haven't read the detailed discussion of
this test so I can't confirm the foundation.

Correct me if I am wrong on any of these or missed any that
don't use empirically derived data to test against.

>
>
>> Or alternatively you can say that
>> a generator is bad when fails some test (mathematical or empirical)
>> that other "good" generators pass. Of course a single number that is
>> .0000 or .9999 is not an indication of failure it must do so
>> consistently, since a value like that happens randomly 1 time in
>> 10000.
>
>You still haven't said when a generator is good or bad.
>

That is a pretty good definition of bad. There isn't a really
satisfactory definition of good except that it isn't bad.
An ideal generator produces all sets of possible output
with equal probability and no correlation.

All PRNGs are of course correlated in some way. Given
the initial state of a PRNG you can predict the next value
with 100% probability.

>
>> A generator is good when it meets the criteria for which it will be
>> used.
>
>If the criteria are showed by a test, then this is an incredibily big error!
>What do you mean, exactly?
>

The most common criteria are lack of correlation and even
distribution. They are easy to test in theory but hard in practice.
All PRNGs are by definition correlated in some way (ie. they have
state).

Congruential generators are correlated in one way while
lagged fibonacci are correlated in different ways. Shift register
generators are also correlated in certain ways. All have provable
periods and even distribution.

Other generators such as AWC and MWC have radically different
types of correlation but are slightly biased. The bias is specific
to the generator but is 2 parts or more over the period. For an
extremely long period this is negligible.

KISS is an example of a combination generator that passes
almost any test you throw at it.

In the cryptographic arena an additional requirement is
the inability to determine the state from the output.
BBS and ARC4 are examples. BBS is based on the idea
that factoring is difficult. ARC4 uses shuffling and has been
shown to have some weaknesses in its output. I haven't kept
up with the status but it is still being used. BBS is probably
a lot closer to 'ideal' but is very slow.

>
>> A bad generator such as a simple congruential generator may
>> be a "good" generator if we are using it in a non-demanding
>> application.
>
>It seems to me that you don't have an objective method to say: "this is
>good" or "this is bad".
>I think it is because of the way you use skewness mixed with KS, chi-square
>and ayes.
>

I have several objective ways of saying "this is bad". For any
specific PRNG there is no single way of saying this is good.
Ideally we want something computationally infeasible to predict
but no one has ever proven that is possible. See the whole group
of P vs. NP threads for that argument.

>
>>>>>> I am also having to create my own test suite because nothing
>>>>>> else meets my current needs. sts seems like a good package but
>>>>>> it has its limitations.
>>>>>
>>>>> Yes, all the tests have limitations. I think if one uses a test in
>>>>> a proper
>>>>> way the test can be useful anyway. The "proper way" could be also
>>>>> to discard
>>>>> a test! I done that with some test in dh.
>>>>
>>>> I have never found it necessary to discard a DH test. They may not
>>>> detect a problem where there isn't one but they have never given
>>>> a strong result of a problem where one didn't exist.
>>>
>>> I don't know the status of the newer version of dh, but that test
>>> has had many problems (for example you could see my post on
>>> september 2003 about the bad distribution of the overlap sum test).
>>>
>>
>> The newer version is still being listed as 0.2 beta. However that
>> error has been corrected. The big three tests Gorilla, GCD, and
>> Birthday Spacings test all seem to be functioning adequately.
>
>Have you double checked that?

As well as I can. My "presumed good" generators pass the tests.

The GCD and gorilla tests rely on empirically
derived data. The birthday spacings has a more mathematical
basis. I haven't actually read the detailed discussion of the
test so I can't comment extensively on its development.

From: http://www.jstatsoft.org/v07/i03/tuftests.pdf

Note that we do not need the true distributions to develop a test
of randomness. All we need to do is compare the sample
distribution from a particular RNG with the standard provided by a
number of presumably good RNG's, 'presumably good' meaning that
they produce results so close to a single one that the single one may
used as a standard. ...

Since some of the best minds in mathematics can't come up with
a better definition of a good RNG then I certainly don't expect to do
so. But I don't necessarily trust the empirically derived data.

>
>Cristiano
>

Leslie 'Mack' McBride
remove text between _ marks to respond via e-mail



Relevant Pages

  • Re: Data Simulation from Correlation Matrix
    ... mathematics and algorithm needed to generate ... a simulated data matrix given a correlation ... matrix and a random number generator? ...
    (sci.stat.math)
  • Re: Data Simulation from Correlation Matrix
    ... mathematics and algorithm needed to generate ... a simulated data matrix given a correlation ... matrix and a random number generator? ...
    (sci.stat.math)
  • Data Simulation from Correlation Matrix
    ... mathematics and algorithm needed to generate ... a simulated data matrix given a correlation ... matrix and a random number generator? ...
    (sci.stat.math)
  • Re: issues with statistical test suite from http://csrc.nist.gov/rng/
    ... >> ses is the standard error of skewness. ... >distribution differs from the expected one. ... >Unfortunately I can't know if a generator under test is good or bad if it ... Kurtosis could be ...
    (sci.crypt)
  • Re: Is Itos Lemma correct?
    ... What algorithm did you use for your random number generator? ... Here's the original location and distribution of Mr. Steve Park: ... It's characterized as "a very accurate approximation of the normal idf". ... could lead to a loop in the RNG. ...
    (sci.math)