Re: Barcode Email

From: Ari Silversteinn (abcarisilverstein_at_yahoo.comxyz)
Date: 08/02/05


Date: Tue, 2 Aug 2005 08:03:14 -0400

On Mon, 1 Aug 2005 04:33:49 +0000 (UTC), Walter Roberson wrote:

> In article <1kjmdjg5u2z75$.1ra9lbrozui35.dlg@40tude.net>,
> Ari Silversteinn <abcarisilverstein@yahoo.comxyz> wrote:
>:I don't "think" anything, I know and not everything that reduces file sizes
>:is compression, grow up, think outside the box.
>
> Any algorithm which:
>
> a) maps every unique input file to a unique output file; and
>
> b) maps at least one input file to an output file that is
> shorter than the input file
>
> *is* a compression algorithm. That's pretty much the
> definition of what a compression algorithm *is*.
>
> [I used to hang out in comp.compression a lot. And I studied
> Cleary, Bell, and Witten's "Text Compression Algorithms", so I'm
> not just talking out of my hat.]

We define compression differently. Compression is possible because most
real-world data are very statistically redundant. When represented in its
human-interpretable form (or in the case of text to be printed on a
computer screen, a simple machine-interpretable form such as ASCII), the
data are represented in a non-concise way. For example, the letter 'e' is
much more common in English text than the letter 'z', and the likelihood of
the letter 'q' being followed by the letter 'z' is rather remote. Analysis
of these statistical behaviors can allow the same information to be
represented much more concisely.

That is not what we do since we lose that analytical capability once we
library the data.
 
> Whether a particular compression algorithm is -useful- or
> not depends on the context in which it is used.
>
> For example, consider an algorithm that catelogued the
> 255 most common software license files (including GPL, the
> OpenBSD license, the Microsoft XP license, and so on), and
> then was:
> Compare the source file to each of the 255 license files
> in turn. If the file is an *exact* match, the output is
> a single octet which is the index of the license file.
> Otherwise, the output is the all-1's octet (0xff) followed
> by the input file.
>
> Such an algorithm could end up saving quite a bit of space on a
> filesystem which was a source code repository, whilst expanding
> any given input file by at most one byte. If, on the other hand,
> the the filesystme happened to contain a lot of files that happened
> to be exactly the size of the allocation cluster and contained
> no copies of any licenses, then the algorithm could end up
> doubling the filesystem space requirements. Thus the effectiveness
> of the algorithm depends on the context in which it is applied.

No problems here.
 
> A compression algorithm does NOT need to be anything elaborate
> such as an LZW or wavelet algorithm. It doesn't even need to
> be an "algorithm" in the traditional sense: all it has to be
> is some kind of lookup of a mathematical function.

Or a routine that does random generation of symbols.
 
> Note: functions are defined in mathematics by tuples, *not* necessarily
> by algebra: as long as the tuples can be exhaustively listed or can be
> described, you have a function, even if there is no formula. The pairs
> consisting of n and the n'th prime number form a mathematical function,
> even though there is no known algorithm to generate the n'th prime
> short of exhaustive search.

Yes.

> Similarily, a "compression algorithm" really only needs to be
> a "function" in the mathematical sense, that has the properties
> listed above: it is "one-to-one", and the representation of one
> of the second members of a tuple is smaller than the represention
> of the first member of the tuple. That's all that you have to
> satisify to have a compression function.
>
> Many functions turn out to be compression functions.
>
> The trick in compression functions is not in finding a function
> that will compress *something*, but rather is finding one
> that is useful for the pre-identified input domain (the source
> probabilities of which might be generally known ahead of time.)

Or you have wasted a lot of time, Walter, agreed.

-- 
Drop the alphabet for email


Relevant Pages

  • Re: Barcode Email
    ... > b) maps at least one input file to an output file that is ... We define compression differently. ... consider an algorithm that catelogued the ... > OpenBSD license, the Microsoft XP license, and so on), and ...
    (sci.crypt)
  • Re: Barcode Email
    ... :is compression, grow up, think outside the box. ... maps at least one input file to an output file that is ... *is* a compression algorithm. ... Compare the source file to each of the 255 license files ...
    (sci.crypt)
  • Re: Barcode Email
    ... :is compression, grow up, think outside the box. ... maps at least one input file to an output file that is ... *is* a compression algorithm. ... Compare the source file to each of the 255 license files ...
    (comp.security.misc)
  • Recompressing poorly compressed files
    ... compression algorithm was not very good, and the 2nd one is very ... just compressing with a good algorithm in the first place of course. ... algorithms for compression and decompression, or even if the're a bit ... reversible (e.g. 'decompressing' a gif into a bitmap means losing some ...
    (comp.compression)
  • Re: Computer being developed modeled after human brain
    ... innate skills. ... run-length-encoding compression module. ... If the "edge detection" algorithm is not useful for ... edge detection algorithm will be replaced. ...
    (comp.ai.philosophy)