Re: can anyone help me with the calculation of statistical probability?
- From: Peter Pearson <ppearson@xxxxxxxxxxxxxxx>
- Date: Tue, 18 Mar 2008 18:52:34 -0000
On Tue, 18 Mar 2008 09:53:08 -0700 (PDT), flame.dawn@xxxxxxxxx wrote:
> . . . Can anyone
> calculate what the random similarity would be, i.e., if we assume that
> there was no plagiarism and that index 1 (27740 terms) and index 2
> (3500 terms) were independently derived, what would be the probability
> that some of the terms would still be identical if the text to which
> the indexes refer is 80%-90% similar.

Interesting question. Someone here can probably help. But
some questions:

You say "the text" is "similar". Is there one text, or two
texts? If two, then what does "80% similar" mean?

Are you talking about indexing in the traditional, book-oriented
sense of somebody compiling an alphabetized list of pointers to
significant mentions in the text? If so, one would hope that
the two indices were *not* just independent random samples:
one would expect them to overlap a lot.
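
For reference, the "independent random samples" baseline can be put
into numbers with a rough sketch like the one below. It assumes
(purely for illustration) that both indexes are drawn uniformly at
random, without replacement, from one shared pool of candidate terms.
The pool sizes are invented: nothing in the question says how many
candidate terms there are, and real indexers do not pick terms at
random. The function name expected_overlap is likewise hypothetical.
Only the two index sizes come from the question.

```python
from fractions import Fraction


def expected_overlap(pool_size, n1, n2):
    """Expected number of shared terms between two indexes of n1 and n2
    distinct terms, each drawn independently and uniformly at random
    from the same pool of pool_size candidate terms.

    This is the mean of a hypergeometric distribution: n1 * n2 / pool_size.
    """
    if pool_size <= 0 or max(n1, n2) > pool_size:
        raise ValueError("pool must be at least as large as each index")
    return Fraction(n1 * n2, pool_size)


# Index sizes taken from the original question.
INDEX_1_TERMS = 27740
INDEX_2_TERMS = 3500

# ASSUMPTION: hypothetical pool sizes, chosen only to show the trend.
for pool in (50_000, 100_000, 500_000):
    shared = expected_overlap(pool, INDEX_1_TERMS, INDEX_2_TERMS)
    print(f"pool={pool:>7}: expected shared terms ~ {float(shared):.1f}")
```

The expected overlap under this model scales as 1/pool_size, so the
answer depends almost entirely on a quantity the question does not
supply. If the observed overlap is far above any such baseline, that
shows only that the indexes are not independent random samples, which
is what one would expect of two honest indexes of similar text.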
--
To email me, substitute nowhere->spamcop, invalid->net.