Re: can anyone help me with the calculation of statistical probability?

On Tue, 18 Mar 2008 09:53:08 -0700 (PDT), flame.dawn@xxxxxxxxx wrote:
> . . . Can anyone
> calculate what the random similarity would be, i.e., if we assume that
> there was no plagiarism and that index 1 (27740 terms) and index 2
> (3500 terms) were independently derived, what would be the probability
> that some of the terms would still be identical if the text to which
> the indexes refer is 80%-90% similar.

Interesting question. Someone here can probably help. But
some questions:

You say "the text" is "similar". Is there one text, or two
texts? If two, then what does "80% similar" mean?

Are you talking about indexing in the traditional, book-oriented
sense of somebody compiling an alphabetized list of pointers to
significant mentions in the text? If so, one would hope that
the two indices were *not* just independent random samples:
one would expect them to overlap a lot.
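
For what it's worth, the "independent random samples" baseline can at
least be computed. Below is a minimal sketch in Python. The assumptions
are mine, not the original poster's: both indexes draw their terms from
a common pool of N candidate terms (N is a hypothetical parameter,
nobody has said what it is), index 1 is a fixed set of n1 terms, and
index 2 is a uniformly random set of n2 terms. The number of shared
terms is then hypergeometric, with mean n1*n2/N. The 80%-90% text
similarity does not enter this model at all, and real indexers do not
pick terms uniformly at random, so treat it as a toy baseline rather
than an answer.

```python
from math import exp, lgamma

def log_comb(n, k):
    # log C(n, k) via lgamma, so huge binomials don't overflow a float
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def overlap_pmf(k, N, n1, n2):
    """P(exactly k shared terms), ASSUMING index 2's n2 terms are a uniform
    random subset of a hypothetical N-term pool and index 1 is a fixed
    n1-term subset of the same pool (hypergeometric model)."""
    if not (0 <= n1 <= N and 0 <= n2 <= N):
        raise ValueError("need 0 <= n1 <= N and 0 <= n2 <= N")
    lo = max(0, n1 + n2 - N)  # some overlap is forced when the pool is small
    hi = min(n1, n2)
    if k < lo or k > hi:
        return 0.0
    return exp(log_comb(n1, k) + log_comb(N - n1, n2 - k) - log_comb(N, n2))

def expected_overlap(N, n1, n2):
    # mean of the hypergeometric distribution
    return n1 * n2 / N

def prob_at_least(k0, N, n1, n2):
    # P(shared terms >= k0), summing the pmf upward from k0
    return sum(overlap_pmf(k, N, n1, n2)
               for k in range(max(k0, 0), min(n1, n2) + 1))

if __name__ == "__main__":
    N = 100_000            # hypothetical pool size, NOT given in the post
    n1, n2 = 27740, 3500   # index sizes from the post
    print("expected shared terms:", expected_overlap(N, n1, n2))
    print("P(at least one shared term):", prob_at_least(1, N, n1, n2))
```

With that made-up N of 100000 the mean overlap is about 971 terms, so
"some identical terms" is essentially certain under pure chance. The
more useful comparison would be the observed overlap against n1*n2/N
for a defensible N, which is exactly where the questions above matter.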

--
To email me, substitute nowhere->spamcop, invalid->net.