Re: diehard and ent results question

From: Bryan Olson (fakeaddress@nowhere.org)
Date: 03/01/03


From: Bryan Olson <fakeaddress@nowhere.org>
Date: Sat, 01 Mar 2003 06:46:53 GMT

Terry Ritter wrote:
> Bryan Olson <fakeaddress@nowhere.org> wrote in message
> news:<%wG7a.448$Z57.17290436@newssvr15.news.prodigy.com>...
>
>>Terry Ritter wrote:
>> > "Cristiano" wrote:
>>[...]
>> >>The 'one big trial
>> >>method' said by Bryan seems better.
>> >
>> > But the idea of "one big trial" is generally wrong
>> > in randomness testing:
>> >
>> > The results from statistical tests are literally
>> > *statistical*, not absolute. The p-value result is
>> > just one position on a distribution.
>>
>>That's what makes using the "one big trial" rather than some
>>"average" right, not wrong. The test would reject a good
>>generator in only a minuscule portion of cases, and it rejects
>>this generator. Sure, if we get an out-lying p-value, we want
>>to check if the result is reproducible, in order to lower the
>>probability of rejecting a good generator.
>
> First let's note the confusion in the above answer:
> If running "one big trial" was the solution, we would
> not need to deal with special cases. But even the
> answer admits that is wrong: False negatives *continue*
> to be an issue even with huge trials. And the answer
> offers to solve that issue with -- wait for it -- more
> trials, that is, beyond the "one big trial" proposed.
> The claim thus contradicts itself.

What a mess. The question was about applying a chi-square test
to a small sample. The solution of averaging together multiple
independent runs has two defects. First, it's wrong: a statistic
computed from averaged bin counts is not the chi-square statistic,
so the chi-square distribution does not describe it. Second, it
assumes we can do many independent runs, and if that's true we
didn't really have the small-sample problem in the first place.
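
To make the first defect concrete, here is a minimal sketch in Python. It
assumes one reading of the averaging method (average each bin count over k
runs, then compare against the single-run expected counts), and the counts
are made-up illustrative numbers, not data from this thread. Under that
reading the "averaged" statistic is exactly the pooled chi-square statistic
divided by k, so it cannot follow the same chi-square distribution.

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical bin counts from k = 3 runs of n = 40 samples into 4 bins.
runs = [
    [12, 8, 11, 9],
    [7, 13, 10, 10],
    [9, 9, 14, 8],
]
k = len(runs)
bins = len(runs[0])
n = sum(runs[0])

expected_single = [n / bins] * bins          # 10 per bin for one run
expected_pooled = [k * e for e in expected_single]

pooled = [sum(col) for col in zip(*runs)]    # treat all runs as one big trial
averaged = [c / k for c in pooled]           # assumed "average each bin" reading

pooled_stat = chi_square(pooled, expected_pooled)
averaged_stat = chi_square(averaged, expected_single)

print("pooled statistic:  ", pooled_stat)
print("averaged statistic:", averaged_stat)
print("ratio:             ", pooled_stat / averaged_stat)
```

The ratio comes out as k (up to rounding), whatever the counts are.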

> The above answer admits that false negatives may be a
> problem. But that fails to come to grips with the real
> problem, because false negatives are *not* the real
> problem in cryptography: A false negative may cause us
> to reject a good generator, so we may waste some design
> time, but in doing that we create no security issue.
>
> Our real problem lies in accepting false *positives*,
> since, if we do that and field a bad generator, we
> *cause* a security issue due to incompetent testing.

That's what makes the averaging suggestion so very bad. If we
tried to apply the chi-square distribution to averages of
counts, it could easily hide flaws.
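
A rough simulation of that hiding effect, as a sketch only. The biased
weights, run sizes, and seed are all assumptions chosen for illustration,
and the averaging is again read as "average each bin count, compare with
single-run expectations". The critical value 11.345 is the standard 1%
point for chi-square with 3 degrees of freedom.

```python
import random

def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

rng = random.Random(2003)                 # arbitrary seed, for repeatability
weights = [0.31, 0.23, 0.23, 0.23]        # hypothetical biased generator
bins = len(weights)
runs, per_run = 50, 200                   # assumed test layout

# Collect per-run bin counts from the biased generator.
counts_per_run = []
for _ in range(runs):
    draws = rng.choices(range(bins), weights=weights, k=per_run)
    counts_per_run.append([draws.count(b) for b in range(bins)])

pooled = [sum(col) for col in zip(*counts_per_run)]
averaged = [c / runs for c in pooled]

pooled_stat = chi_square(pooled, [runs * per_run / bins] * bins)
averaged_stat = chi_square(averaged, [per_run / bins] * bins)

CRITICAL_1PCT_DF3 = 11.345                # chi-square, 3 d.f., upper 1% point

print("pooled statistic:  ", pooled_stat, "reject:", pooled_stat > CRITICAL_1PCT_DF3)
print("averaged statistic:", averaged_stat, "reject:", averaged_stat > CRITICAL_1PCT_DF3)
```

With a bias this size the pooled statistic should land far above the
critical value, while the averaged one, being 50 times smaller, should pass
as if the generator were fine.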

No one is saying to limit testing to a chi-square test of a
single property. I'm saying the averaging solution is wrong.
Furthermore, if we have multiple runs we could average, then we
have enough samples to get a chi-square statistic for which the
chi-square distribution is quite precise.
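
The sample-size point can be sketched the same way. The check below uses
the common textbook rule of thumb that every expected bin count should be
at least 5 before trusting the chi-square approximation; that threshold,
the equiprobable bins, and the run sizes are assumptions for illustration.

```python
def expected_counts(samples, bins):
    """Expected count per bin for equiprobable bins (an assumption here)."""
    return [samples / bins] * bins

def large_enough(expected, minimum=5):
    """Rule-of-thumb check: every expected count at least `minimum`."""
    return all(e >= minimum for e in expected)

bins = 8
per_run = 16      # hypothetical small sample: 2 expected per bin
runs = 10         # hypothetical number of independent runs available

single = expected_counts(per_run, bins)
pooled = expected_counts(per_run * runs, bins)

single_ok = large_enough(single)
pooled_ok = large_enough(pooled)

print("one run:   expected per bin =", single[0], "ok:", single_ok)
print("all runs:  expected per bin =", pooled[0], "ok:", pooled_ok)
```

If ten such runs are available, pooling them into one trial already clears
the threshold, so there is nothing left for averaging to fix.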

[...]
> "One big trial" is not only wrong, it is incompetent.

See Knuth, Vol. 2 (Seminumerical Algorithms), or almost any
introductory statistics text, for the chi-square test. The Ritter
"average each bin count across multiple trials" method seems to
be garbage.

-- 
--Bryan

