Re: diehard and ent results quesion

From: Bryan Olson (nonsuch@nowhere.org)
Date: 03/04/03


From: "Bryan Olson" <nonsuch@nowhere.org>
Date: Tue, 04 Mar 2003 18:08:32 GMT

Terry Ritter wrote:
> Bryan Olson wrote:

>>The question was about applying a chi-square test
>>to a small sample.
>
> Anyone can look back through the thread and see that
> small samples were at most a side issue.

Yes, let's look back. Here's my original post on this issue::

<http://groups.google.com/groups?selm=_Mo7a.205%24i56.80%40newssvr16.news.prodigy.com>

Which was in response to:

<http://groups.google.com/groups?selm=5c88fd7a.0302261656.6f802959%40posting.google.com>

>>The solution of averaging together multiple
>>independent runs has two defects: first it's wrong -- that's not
>>the chi-square statistic.
>
> Sure it is.

See Knuth for the chi-square statistic. One cannot use averages
of counts in place of the actual counts, and still apply the
chi-square distribution.

One could go back and reverse the averaging, by multiplying by
the total number of samples to recover the total counts, then do
the proper chi-square computation with 'n' being the total
number of samples. That doesn't sound like what Ritter was
suggesting, but his suggestion stopped at computing the
averages. I asked Ritter what he's computing with these
averages, and what distribution it should follow given the
hypothesis under test:

<http://groups.google.com/groups?selm=%25wG7a.448%24Z57.17290436%40newssvr15.news.prodigy.com>

but those questions got snipped without answer:

<http://groups.google.com/groups?selm=5c88fd7a.0302281743.6650b8b3%40posting.google.com>

>>Second, it assumes we can do many
>>independent runs, and if that's true we didn't really have the
>>small-sample problem in the first place.
>
> There is no small-sample problem in testing random
> generators.

Well, it can come up, for example if one is testing someone
else's generator in an effort to attack it. In any case, the
issue to which I responded was the small-sample problem in
applying the chi-square test.

> For chi-square testing, we define how many data are
> needed to produce expectation values for the number
> of bins we want in a trial, and then we generate that
> amount of data.

Essentially correct. One the other hand, if given some limited
amount of ciphertext, we'd typically use all we have.

> Note that all of the data in one trial is essentially
> "averaged" anyway, in the sense that one single result
> reflects all the data values. Extreme values are to
> some extent hidden by the rest of the data. This is
> inherent in statistics and part of every chi-square
> computation.

Obviously that's informal, but even so I think it's a poor
understanding. I view the chi-square statistic as much more
like a variance than like an average.

>>No one is saying to limit testing to a chi-square test of a
>>single property.
>
> But what you *are* saying is chi-square testing should
> be "one big trial," with the operative phrase being
> "one ... trial." But no one trial can characterize a
> random generator, so your proposal is wrong. It is not
> part wrong, not a little wrong, just wrong. It does not
> test what needs to be tested.

An absurd misrepresentation of what I wrote. Ritter suggested
"one might average each bin count across multiple trials (on
different data!)". I suggested to use the samples from those
multiple trials directly in the chi-square test, rather than
using these averages. There was no suggestion from either of
us that the result would be enough to "characterize a random
generator".

>>I'm saying the averaging solution is wrong.
>
> But what solution was that?

The one Ritter proposed, as cited. See also:

<http://groups.google.com/groups?selm=5c88fd7a.0302280115.47e167e6%40posting.google.com>

> Averaging was not proposed
> as a solution to testing random generators. Instead,
> it was proposed as a way to improve accuracy in small
> trials, when desired. So even if that was wrong -- and
> it is not -- it would have no effect on testing random
> generators at all.

Look again. In the context it was clearly proposed to improve
the accuracy *of the chi-square distribution*:

<http://groups.google.com/groups?selm=5c88fd7a.0302261656.6f802959%40posting.google.com>

> Averaging is part of statistics and already part of
> every chi-square computed. But averaging is not how
> we evaluate random generators, nor how I have proposed
> to do it. That is just your usual smokescreen to try
> to hide the main point, in which you are obviously and
> abysmally wrong.

As shown, Ritter proposed taking the average of multiple trials
as a solution to the problem of inaccuracy of the chi-square
distribution for small samples. In a two-sentence response, I
pointed out that this is incorrect, and explained how to do a
correct chi-square test given the same data.

[...]
> Your alternative is "one big trial." But when *I*
> say that "one big trial" is wrong, I mean that it
> actually does not work. As a strategy for testing
> random generators, that solution fails; it is
> actually incompetent.

When Ritter says "one big trial" he's talking about something
totally different than what I was talking about. I described
how to do *one* chi-square test. Treat all the data used in
that one test as a single trial; the 'n' must be the total
number of samples.

> Once we have tens of samples in each of tens of bins,
> just adding more samples is not going to make the
> evaluation statistic usably more precise.

Right. The chi-square distribution is precise for sufficiently
large samples.

> The main advantage of ever larger samples is the
> reduction of quantization error -- which I described
> earlier -- as "off by half" becomes less than the
> sampling precision. I also pointed out that such
> error could be addressed in other ways.

No, the main advantage of using ever-larger samples is that test
becomes more sensitive to the type of non-random behavior it
tends to detect. If, for example, the actual category
probabilities differ from the hypothesis (but the events really
are independent), we expect a larger sample to give us a smaller
p-value.

[...]
>>See Knuth vol II, or most any introductory statistics text for
>>the chi-square test.
>
> On the contrary, I recommend Knuth II to you, and this
> time perhaps you should actually read it:
>
> Knuth tests [snip of irrelevent reference to Knuth].

There's *nothing* there that contradicts what I wrote. I simply
explained how to do one chi-square test correctly, with the same
data that Ritter used to average several smaller runs. I
proposed the one big test as a replacement for the bad averaging
method, not as the be-all-end-all of statistical testing.
Readers other than Ritter clearly followed.

> As expected, Knuth agrees with me.

No, see the formula for the chi-square statistic. There's no
pre-averaging.

--
--Bryan


Relevant Pages

  • Re: Chi-Square Question
    ... > I would be grateful of some advice please. ... and have been trying to use Chi-Square test in ... You can still compute Chi-square, though it will be hard to interpret. ... in such cells may dominate the overall ch-squared). ...
    (sci.stat.math)
  • Re: chi square for distribution test
    ... The first test was a 2x4 contingency table: chi-square test of ... the Chi-square tests the probability that I would get my actual ... distribution from randomness. ... we find it very likely to pull that distribution so we can conclude it ...
    (sci.stat.edu)
  • Re: When to use Fisher?
    ... Why not always use Fisher? ... based on an approximation of discretely distributed chi-square values ... to avoid the chi-square test when the numbers in the contingency table ...
    (sci.stat.math)
  • Re: Rand() with base
    ... use the Chi-Square test. ... Chi-Square of the Chi-Square results. ... results should also follow some kind of random pattern ... [More Snip] ...
    (comp.lang.c)