Re: The Chinese MD5 attack

From: Unruh (unruh-spam_at_physics.ubc.ca)
Date: 08/19/05


Date: 18 Aug 2005 23:04:59 GMT


"Alan" <a__l__a__n@hotmail.com> writes:

>"Anonymous via Panta Rhei" wrote:
>> Nice words, 6 out of 7 were not in my system-supplied Webster's 2nd, or in
>> any of my other (limited) wordlists. Can you post a URL for the word list?

>The estimate of 20-30 bits of entropy for 20 characters of English text does
>not apply to a random list of words. Rather, it applies to actual English
>text, generally following conventional grammar rules.

THAT WAS WHAT IT WAS CALCULATED FOR: actual English text. The words in
question were generated according to the same trigram and quadrigram
probability rules as English, as represented by the analysed file.
What I published was -(Sum_i P_i log_2(P_i))/3 over all trigrams, i.e. the
trigram entropy divided by three to give bits per character. For the
dictionary it came to about 3.5 bits per character, while for Shackleton's
book it was 2.7.
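A minimal Python sketch of that estimate, assuming the text is lowercased, reduced to letters separated by single spaces, and counted over overlapping character trigrams. The post does not say how the text was preprocessed, so the function name `trigram_entropy_per_char` and the cleanup step are illustrative assumptions, not the author's code.

```python
import math
import re
from collections import Counter

def trigram_entropy_per_char(text: str) -> float:
    """Return -(sum_i P_i log2 P_i) / 3 over the overlapping character trigrams.

    Assumption (not from the post): lowercase, keep only letters, and
    collapse every run of other characters into a single space.
    """
    cleaned = re.sub(r"[^a-z]+", " ", text.lower()).strip()
    # Overlapping windows: "abcd" -> "abc", "bcd"
    counts = Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Shannon entropy of the trigram distribution, in bits
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / 3  # spread over the three characters of each trigram
```

The result depends on the preprocessing choices, so this sketch will not necessarily reproduce the 3.5 and 2.7 figures exactly.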

If I use Shackleton's book as the source, the words generated with the same
trigram distribution almost always came out as (short) real English words.
In the case of the dictionary, the words generally came out as longer words
that are not in the dictionary. I.e., the model admits many strings that are
not real words, so that 3.5 is probably an overestimate.
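The post does not show how the words were generated. One plausible reading is an order-2 character Markov chain: each next letter is drawn according to how often it follows the previous two letters in the source. The sketch below assumes that reading; `build_model`, `generate_word`, and the word-padding scheme are hypothetical, and the quadrigram statistics the post also mentions are left out.

```python
import random
from collections import Counter, defaultdict

def build_model(words):
    """Count which character follows each two-character context.

    Each word is padded as "  word " so that word starts and word ends
    are part of the trigram statistics (an assumption of this sketch).
    """
    model = defaultdict(Counter)
    for w in words:
        padded = "  " + w.lower() + " "
        for i in range(len(padded) - 2):
            model[padded[i:i + 2]][padded[i + 2]] += 1
    return model

def generate_word(model, rng, max_len=20):
    """Sample one word, choosing each character by its trigram frequency."""
    context, out = "  ", []
    while len(out) < max_len:
        followers = model.get(context)
        if not followers:
            break
        chars, weights = zip(*followers.items())
        nxt = rng.choices(chars, weights=weights, k=1)[0]
        if nxt == " ":  # end-of-word marker
            break
        out.append(nxt)
        context = context[1] + nxt  # slide the two-character window
    return "".join(out)

if __name__ == "__main__":
    rng = random.Random(42)
    sample = ["the", "ice", "ship", "sledge", "shelter"]  # toy word list
    model = build_model(sample)
    print([generate_word(model, rng) for _ in range(5)])
```

Trained on a word list, a chain like this readily produces strings that are not in the list, which is the effect described above for the dictionary.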
The Shackleton case, which seems to be what you are referring to, is
perhaps also slightly an overestimate, in that word order would restrict
the (very short) sentence as well, but not by much. Word order and grammar
are not that strong a constraint on the possible arrangement of words.

(A famous example: the sentence
Buffalo buffalo buffalo buffalo buffalo buffalo buffalo.
is a valid English sentence obeying the rules of grammar.)

>A diceware passphrase has approximately 13 bits of entropy per word... more
>precisely,

And no one can remember it.

> ln(7776) / ln(2) = 12.924812503605780907268694719739

>The average word length in the diceware list is about 4.2 characters. So we
>have roughly 3 bits of entropy per character. But of course the choice of
>each word is an independent event, unrelated to the words before or after
>(based on the roll of dice). See diceware.com for more details...

Again, word order is only a minor perturbation.
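The diceware arithmetic quoted above can be checked in a few lines of Python. This sketch assumes the standard 7776-word list (one word per five dice rolls) and the quoted average of 4.2 letters per word, with separating spaces not counted; `diceware_bits` is a hypothetical helper name.

```python
import math

# Diceware: five dice rolls select one word, so the list has 6**5 entries.
DICEWARE_LIST_SIZE = 6 ** 5  # 7776

def diceware_bits(num_words: int, list_size: int = DICEWARE_LIST_SIZE) -> float:
    """Entropy in bits of num_words independent, uniformly random picks."""
    return num_words * math.log2(list_size)

# Assumed average word length of 4.2 letters, as quoted in the thread
# (separating spaces not counted).
AVG_WORD_LEN = 4.2

bits_per_word = diceware_bits(1)               # about 12.925 bits
bits_per_char = bits_per_word / AVG_WORD_LEN   # about 3.08 bits

if __name__ == "__main__":
    for n in (4, 5, 6):
        print(f"{n} words: {diceware_bits(n):.1f} bits")
```

The per-character figure, about 3.08 bits, can be compared directly with the 2.7 and 3.5 bits-per-character trigram estimates earlier in the post.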
