Re: FFT test with few kbits

From: Ernst Lippe (ernstl-at-planet-dot-nl_at_ignore.this)
Date: 02/26/04


Date: Thu, 26 Feb 2004 16:48:38 +0100

On Wed, 25 Feb 2004 16:18:07 +0000, Cristiano wrote:

> Ernst Lippe wrote:
>> On Tue, 24 Feb 2004 21:33:12 +0000, Cristiano wrote:
>>
>>> Many times (perhaps all the times) to see if a test is good, one
>>> should tests the test using a good generator (which is one with a
>>> good proof). Strictly speaking, we know that the components are not
>>> independent, but this kind of dependence is somehow reduced when N
>>> is "big".
>>
>> The main point of my post was that it seems very unlikely that the
>> "dependence" is significantly reduced when N gets bigger. Actually, I
>> expect that for any reasonable measure of dependence the size of N
>> does not have any significant effect on the amount of "dependence",
>> because in most cases two Fourier components are highly dependent for
>> all sizes of N.
>
> You said "two random variables are either independendent or they are not
> independent" (and I agree) so I think "highly dependent" should be replaced
> by "dependent".
But when you accept the standard definition, then your statement
that the "dependence is somehow reduced when N is big" is simply
false. For any given size of N all Fourier components are
dependent. So the dependence stays the same when N gets big.

> Anyway, I'm still convinced that the dependence is not a problem in my test,
> because you can calculate the expected mean and variance of the (dependent)
> components, so the test is perfectly valid even if the components are
> dependent.
> Also, I showed why they are normally distributed, so they have a
> non-independent normal distribution which can be checked with KS test.
But dependency is a problem for the KS test.

Take the following example: I generate a sample N random variables x_i
in the following way, first a draw a random number r from a normal
distribution with known mean and standard deviation and then for all i
x_i = r. In this example each x_i is obviously normally distributed,
but it is also obvious that they are not independent (because within a
sample they are all equal). Now, when you apply the same line of
reasoning that you have used for the Fourier transform, it should be
obvious that the x_i in a sample are also normally distributed. This
is obviously wrong, because the standard deviation within a sample is
always zero. So when you try to use the KS test on this sample with
any distribution that has a non-zero standard deviation, the KS test
will always reject it when N is sufficiently large.

So when the values in your sample are not independent you cannot
simply apply the KS test, even when you know the individual
distributions of the x_i in the sample. With your Fourier test the
situation is completely identical from a mathematical point of
view. The Fourier components are not independent so you cannot use the
KS test, unless you can give a mathematical proof that this particular
dependence structure does not have any influence on the KS test.

>>> I posted the proof for the mean, the proof for the variance and the
>>> proof for the normal distribution.
>>> Now there is just the academic disquisition about the real
>>> independece of the components.
>>> Well, looking at the results I get with several (p)rng and having
>>> the above proofs, I think my test has solid basis to be used with
>>> great confidence.
>>
>> I am afraid that I have to disagree. Suppose, that I take a PRNG and I
>> run your test say 10 times and I find that the p-value in all cases is
>> less than say 0.001. What does this mean?
>
> That the generator is bad with no doubt.

No, why? Your conclusion would be correct when the individual samples
are independent, when they are not independent, the actual
distribution of the KS values could be completely different.

>> Basically nothing, it is
>> very well possible that I get the same results in 50% of the cases
>> when I try the same experiment with a good source of true random
>> numbers.
>
> Uh? No, that is really impossible, or even better, it is extremely unlikely.
> It is very well possible that my test says "good" when the generator is bad,
> but if my test says "bad" the generator is really bad.
> How can you say what you said?
>
>
>> So the only way that you could assess the real significance
>> of your results is by running a large number of simulations with a
>> random source that is known to be good. When the value of N is large
>> these simulations are quite expensive, and you will have to rerun all
>> these simulations when you decide to use a different value of N.
>
> I don't see any good reason to do that.
> I have given the formulas for the mean and the variance, so you don't need
> any expensive simulation; you have the proof for the mean, the variance and
> the distribution: normalize the imaginary components and do the KS test.
> That's all.

No, that would be only correct if we could could assume that
the values were independent, since we know that they are
not, we have to run full simulations, unless you can find
an analytical solution to the combined distributions.

>>> The real strangeness still open (which need to be fixed) is the
>>> non-uniform distribution of the p-values gotten from the real
>>> components. I'm really not able to find an explanation for that.
>>
>> Do you mean the non-uniform distribution of the p-values of the KS
>> test?
>
> Yes, but _only_ for the _real_ components.
>
>> But that is obvious! The only real case where you can show the
>> p-values of the KS test will be uniformly distributed between 0 and 1
>> is when the real distribution is the same as the "theoretical"
>> distribution that you compare it to. In your case the "theoretical
>> distribution" is the independent normal distribution, where all
>> variables are independently distributed. So when your data do not
>> follow this distribution, there are no reasons to expect that the
>> p-values should be uniformly distributed. Now in this case we know
>> that the values are definitely not independently distributed so the
>> real distribution must be different from the "theoretical"
>> distribution. So there are no reasons to expect that the p-values are
>> uniformly distributed. When you find that the p-values are not
>> uniformly distributed the KS test is simply telling you that your
>> assumptions about the distribution are wrong. Since we already knew
>> that they were wrong, there is nothing surprising about this.
>
> Then how can you explain that the p-values of the KS test over the
> _imaginary_ components have a _perfect_ uniform distribution?
>
> I begin to think that the problem in the _real_ part is the dependence of
> the real components with the input sequence, but _not_ between the real
> components.
> With the _imaginary_ part this doesn't happen, so I get a perfect
> distribution even if the imaginary components are dependent each other.

But your whole test still lacks any sound theoretical foundation,
so why should we trust it?

Ernst Lippe



Relevant Pages

  • Re: Observer pattern limitations
    ... back to a consistent cut. ... Constraints are propagating ... the table dependent of the formula. ... distribution can be represented as a multiplicative of individual ...
    (comp.object)
  • Re: FFT test with few kbits
    ... independent" so I think "highly dependent" should be replaced ... non-independent normal distribution which can be checked with KS test. ... That the generator is bad with no doubt. ... So there are no reasons to expect that the p-values are ...
    (sci.crypt)
  • Re: [kde-linux] Kpackage Substitute
    ... That would be entirely dependent on the distribution you're using ... Debian, would call apt utilities ...
    (KDE)
  • Re: [kde-linux] Kpackage Substitute
    ... On Monday 01 October 2007 06:20:44 am David Baron wrote: ... That would be entirely dependent on the distribution you're using. ... Archives: http://lists.kde.org/. ...
    (KDE)
  • Re: Moving to C#
    ... But isn't NGening cpu and Net version dependent? ... lot of headaches for distribution? ...
    (borland.public.delphi.non-technical)