Re: Estimating entropy of a stream

On Dec 20, 6:08 am, Joe Green <> wrote:
Data examination, spectrum analysis and intimate knowledge of your randomness source are what is needed. For example, if you calculate 5 bits of randomness per sample, arithmetic says to output 32 bits every 32/5 samples (rounded up, every 7 samples). Then double or triple that for safety. I would go with 32 bits every 16 samples.
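The arithmetic in the paragraph above can be sketched as a small C helper. The name samples_needed and its parameters are illustrative only; safety_factor stands for the "double or triple that" multiplier.

```c
#include <assert.h>

/* Hypothetical helper: raw samples to mix in per output word.
 * output_bits     - size of each output word (32 in the post)
 * bits_per_sample - estimated entropy per raw sample (5 in the post)
 * safety_factor   - the "double or triple that" multiplier          */
static unsigned samples_needed(unsigned output_bits, unsigned bits_per_sample,
                               unsigned safety_factor)
{
    assert(bits_per_sample > 0);
    /* Integer ceiling of output_bits / bits_per_sample: 32/5 -> 7. */
    unsigned minimum = (output_bits + bits_per_sample - 1) / bits_per_sample;
    return minimum * safety_factor;
}
```

With these numbers, samples_needed(32, 5, 2) gives 14 and samples_needed(32, 5, 3) gives 21, so the suggested 16 samples per 32-bit output sits between the doubled and tripled figures.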

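// Do it like this (pseudocode)
hash_work_area tbl;
for (;;) {
  for (j = 0; j < 16; j++) {
    augment tbl with randomness source;   // mix in one raw sample
    tbl = someHash(tbl);                  // stir the whole work area
  }
  output 32 bits from tbl;                // one output per 16 samples
}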

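A minimal runnable C sketch of that loop follows, under several assumptions not in the post: the work area is only 4 x 64 = 256 bits (the post's is over 1200), "augment" is a plain XOR of the sample into the first word, and someHash is replaced by chaining the splitmix64 finalizer across the words. That mixer is NOT cryptographic and is only a placeholder. demo_source is a hypothetical counter standing in for the hardware noise source and is not random at all.

```c
#include <assert.h>
#include <stdint.h>

#define POOL_WORDS 4          /* hypothetical work-area size: 4 x 64 = 256 bits */
#define SAMPLES_PER_OUTPUT 16 /* "32 bits every 16 samples" */

typedef struct { uint64_t w[POOL_WORDS]; } hash_work_area;

/* Placeholder mixer: the splitmix64 finalizer. NOT a cryptographic hash;
 * a real generator should use a vetted construction here. */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* "augment tbl with randomness source": fold one raw sample into word 0. */
static void augment(hash_work_area *tbl, uint32_t sample)
{
    tbl->w[0] ^= sample;
}

/* "tbl = someHash(tbl)": run mix64 over every word, feeding each result
 * into the next, so the new sample affects every word of the area. */
static void stir(hash_work_area *tbl)
{
    uint64_t carry = tbl->w[POOL_WORDS - 1];
    for (int i = 0; i < POOL_WORDS; i++) {
        carry = mix64(tbl->w[i] ^ carry);
        tbl->w[i] = carry;
    }
}

/* Demo-only stand-in for the noise source: a counter, not random. */
static unsigned demo_samples_read = 0;
static uint32_t demo_source(void)
{
    return (uint32_t)(++demo_samples_read);
}

/* One pass of the outer loop: mix in 16 samples, then emit 32 bits. */
static uint32_t next_output(hash_work_area *tbl, uint32_t (*read_sample)(void))
{
    assert(tbl && read_sample);
    for (int j = 0; j < SAMPLES_PER_OUTPUT; j++) {
        augment(tbl, read_sample());
        stir(tbl);
    }
    return (uint32_t)(tbl->w[0] >> 32);  /* high half of word 0 */
}
```

Passing the source as a function pointer keeps the mixing logic separate from the hardware driver, so the same loop can be exercised with a deterministic stub before real samples are wired in.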

After program start or power-up, be sure to discard the first outputs: at least enough of them to cover 3 times (hash work area size in bits) / (bits of randomness per sample) samples.
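One reading of that warm-up rule, sketched in C: keep discarding outputs until at least 3 * area_bits / bits_per_sample raw samples have been mixed in. The helper name and this interpretation are assumptions, not the poster's code.

```c
#include <assert.h>

/* Hypothetical helper: number of initial outputs to throw away, assuming
 * the rule means "discard until 3 * area_bits / bits_per_sample samples
 * have been mixed into the work area". */
static unsigned warmup_outputs(unsigned area_bits, unsigned bits_per_sample,
                               unsigned samples_per_output)
{
    assert(bits_per_sample > 0 && samples_per_output > 0);
    /* Samples needed, rounded up. */
    unsigned samples = (3 * area_bits + bits_per_sample - 1) / bits_per_sample;
    /* Outputs that cover those samples, rounded up. */
    return (samples + samples_per_output - 1) / samples_per_output;
}
```

For example, with a 1200-bit work area, 5 bits per sample and 16 samples per output, this gives 720 samples, i.e. the first 45 outputs are discarded.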

I will soon have a USB TRNG, costing under US$100, that produces more than 50K true random bytes per second. My processor is a 32-bit machine that runs the "augment tbl with randomness source" and "tbl = someHash(tbl)" steps for two samples in just over 3 microseconds. My hash work area is over 1200 bits, with excellent and efficient augmentation and mixing.

Or you could cheat and no one could tell.  (My USB TRNG does not cheat.)

I recently blogged about a USB Entropy Stick; its makers use Maurer's
Universal Statistical Test, which converges to the entropy
of a stream. Also see the other referenced paper, which cleans up some
of Maurer's estimates and constants. Post is here

rgs Luke