Re: An already compressed file encoding question
From: SuperFly (fake_at_email.com)
Date: 08/19/03
- Next message: Tom St Denis: "Re: Please test this encryption"
- Previous message: Douglas A. Gwyn: "Re: Please test this encryption"
- In reply to: Mark Wooding: "Re: An already compressed file encoding question"
- Next in thread: Mark Wooding: "Re: An already compressed file encoding question"
- Reply: Mark Wooding: "Re: An already compressed file encoding question"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Date: Tue, 19 Aug 2003 17:43:38 GMT
On 19 Aug 2003 13:25:43 GMT, mdw@nsict.org (Mark Wooding) wrote:
[snip]
>The expected number of unused values is more simply just 256/e; and the
>expected number of values seen once or more is 256(1 - 1/e).
>
>(Computations courtesy of the Emacs calculator.)
>
>> If we call the taken data block, the input group. Shouldn't it be
>> possible to exclude certain impossible combinations from the input
>> group and recode it as a smaller group, using less bits?
>
>No. The unoccupied values are unpredictable. As shown, (in this
>respect at least), your compressed data looks like a remarkably good fit
>for random strings. I shouldn't expect any compressability if I were you.
I know this is true.
But I can't find the flaw in this thought experiment:
1) Let's take a sample string with length L from a file with a near
perfect random distribution. (This could be the output of any good
arithmetic style coder.)
2) Let's assume we have infinite computing power.
Using the random distribution example from above and adding a safety
margin M we can build a list of strings with length L that fit this
pattern. This list should have less than 256^L (if we use byte values)
entry's which would be needed to encode the normal sample string with
length L. If the string is big enough we should be able to encode it
to less bits than the original.
In other words: based on the fact that we know that the data is almost
perfectly random we can exclude all possible combinations that use
less than 160-M and more than 160+M values for a 256 byte sample
string. And narrow it down even further if we use the information in
the different 'values used x times' groups.
Which would almost suggest to me, that there is still some air in the
file that can be squeezed out.
Which can't be true because this would mean that random data could be
compressed :)
- Next message: Tom St Denis: "Re: Please test this encryption"
- Previous message: Douglas A. Gwyn: "Re: Please test this encryption"
- In reply to: Mark Wooding: "Re: An already compressed file encoding question"
- Next in thread: Mark Wooding: "Re: An already compressed file encoding question"
- Reply: Mark Wooding: "Re: An already compressed file encoding question"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Relevant Pages
|