Re: can anyone help me with the calculation of statistical probability?



flame.dawn@xxxxxxxxx writes:

Here is the question. This concerns a claim of plagiarism. There are
two indexes of a similar text numbering about 750,000 words. The first
index has 27,740 terms in it, while the second index has 3,500 terms
in it. The authors of the first index claim that the authors of the
second plagiarized their index, but it turns out the indexes are
mostly different, and only a few terms are similar. Can anyone
calculate what the random similarity would be, i.e., if we assume that
there was no plagiarism and that index 1 (27740 terms) and index 2
(3500 terms) were independently derived, what would be the probability
that some of the terms would still be identical if the text to which
the indexes refer is 80%-90% similar.

??? They are indexing the same text? Of course there are similarities. It
is like claiming that two photos of the White House are plagiarized because
both show a building with white columns.
The only way they could perhaps have substantiated the claim is by
including false terms in their index, e.g. terms which do not actually
appear in the text, or which are ascribed to the wrong pages, and then
finding those same errors in the second index.
No statistical test is going to determine anything, since the two are
correlated by being indices of the same text. I.e., what you are trying to
measure is completely irrelevant to the claim.
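
To put a rough number on why overlap alone proves nothing, here is a
minimal sketch of the naive "independent" model the question asks about.
It assumes both indexes were drawn uniformly at random from one shared
pool of candidate terms, so the expected overlap is the hypergeometric
mean a*b/N. The pool sizes are made up for illustration (the post gives
no such figure), and expected_overlap is a hypothetical helper name.
Real indexers do not pick terms at random, so the true chance overlap
would be higher still.

```python
from fractions import Fraction


def expected_overlap(pool_size, index_a, index_b):
    """Expected number of shared terms when two indexes of the given
    sizes are drawn uniformly and independently from one pool of
    candidate terms (the mean of a hypergeometric distribution)."""
    if not (0 <= index_a <= pool_size and 0 <= index_b <= pool_size):
        raise ValueError("index sizes must lie between 0 and pool_size")
    return Fraction(index_a * index_b, pool_size)


if __name__ == "__main__":
    # Index sizes come from the question; pool sizes are ASSUMED values.
    for pool in (30_000, 50_000, 100_000):
        shared = expected_overlap(pool, 27_740, 3_500)
        print(f"pool={pool}: about {float(shared):.0f} shared terms expected")
```

Even under this crude model, a large share of the 3,500 terms would
match by chance for any plausible pool size, which is why a count of
matching terms says nothing about copying.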

