Re: An already compressed file encoding question

From: SuperFly (fake_at_email.com)
Date: 08/19/03


Date: Tue, 19 Aug 2003 17:43:38 GMT

On 19 Aug 2003 13:25:43 GMT, mdw@nsict.org (Mark Wooding) wrote:

[snip]

>The expected number of unused values is more simply just 256/e; and the
>expected number of values seen once or more is 256(1 - 1/e).
>
>(Computations courtesy of the Emacs calculator.)
>
>> If we call the taken data block, the input group. Shouldn't it be
>> possible to exclude certain impossible combinations from the input
>> group and recode it as a smaller group, using less bits?
>
>No. The unoccupied values are unpredictable. As shown, (in this
>respect at least), your compressed data looks like a remarkably good fit
>for random strings. I shouldn't expect any compressability if I were you.

I know this is true.

But I can't find the flaw in this thought experiment:

1) Let's take a sample string with length L from a file with a near
perfect random distribution. (This could be the output of any good
arithmetic style coder.)
2) Let's assume we have infinite computing power.

Using the random distribution example from above and adding a safety
margin M we can build a list of strings with length L that fit this
pattern. This list should have less than 256^L (if we use byte values)
entry's which would be needed to encode the normal sample string with
length L. If the string is big enough we should be able to encode it
to less bits than the original.

In other words: based on the fact that we know that the data is almost
perfectly random we can exclude all possible combinations that use
less than 160-M and more than 160+M values for a 256 byte sample
string. And narrow it down even further if we use the information in
the different 'values used x times' groups.

Which would almost suggest to me, that there is still some air in the
file that can be squeezed out.

Which can't be true because this would mean that random data could be
compressed :)



Relevant Pages

  • Re: Tying check boxes to data
    ... Specifically I want to exclude the column of data that the check box ... Start out by writing a function that builds the string ... When I check one of the check boxes I ... For the first one it will either blank out the entire graph or allow the ...
    (microsoft.public.access.modulesdaovba)
  • Re: A string & a list
    ... You should exclude those elements from the string: ... The array @- is an internal one, storing the offsets of the substrings found by the last regex pattern match. ... The push of the results of the recursive call is qualified by a call to grepto make sure the new value isn't already in the list. ...
    (perl.beginners)
  • Re: Wondering first caracter for each word in the field
    ... What if we need to exclude some word from showing thir first letter through ... From "Ministry of Foreign Affairs" to retreive "MFA" excluding word ... > Public Function Initialize(strIn As String) As String ...
    (microsoft.public.access.formscoding)
  • Re: Integers behaving like strings
    ... Date/Time data type in the table OR it's a text field that looks like a date? ... so I need to exclude records that might have been ... month field, [cbxMonth], is a combo box with column 1 being MonthID as ... an integer and column 2 being the month name as a string. ...
    (microsoft.public.access.queries)