Re: Barcode Email

From: Walter Roberson (roberson_at_ibd.nrc-cnrc.gc.ca)
Date: 08/01/05


Date: Mon, 1 Aug 2005 04:33:49 +0000 (UTC)

In article <1kjmdjg5u2z75$.1ra9lbrozui35.dlg@40tude.net>,
Ari Silversteinn <abcarisilverstein@yahoo.comxyz> wrote:
:I don't "think" anything, I know and not everything that reduces file sizes
:is compression, grow up, think outside the box.

Any algorithm which:

a) maps every unique input file to a unique output file; and

b) maps at least one input file to an output file that is
shorter than the input file

*is* a compression algorithm. That's pretty much the
definition of what a compression algorithm *is*.

[I used to hang out in comp.compression a lot. And I studied
Cleary, Bell, and Witten's "Text Compression Algorithms", so I'm
not just talking out of my hat.]

Whether a particular compression algorithm is -useful- or
not depends on the context in which it is used.

For example, consider an algorithm that catelogued the
255 most common software license files (including GPL, the
OpenBSD license, the Microsoft XP license, and so on), and
then was:
  Compare the source file to each of the 255 license files
  in turn. If the file is an *exact* match, the output is
  a single octet which is the index of the license file.
  Otherwise, the output is the all-1's octet (0xff) followed
  by the input file.

Such an algorithm could end up saving quite a bit of space on a
filesystem which was a source code repository, whilst expanding
any given input file by at most one byte. If, on the other hand,
the the filesystme happened to contain a lot of files that happened
to be exactly the size of the allocation cluster and contained
no copies of any licenses, then the algorithm could end up
doubling the filesystem space requirements. Thus the effectiveness
of the algorithm depends on the context in which it is applied.

A compression algorithm does NOT need to be anything elaborate
such as an LZW or wavelet algorithm. It doesn't even need to
be an "algorithm" in the traditional sense: all it has to be
is some kind of lookup of a mathematical function.

Note: functions are defined in mathematics by tuples, *not* necessarily
by algebra: as long as the tuples can be exhaustively listed or can be
described, you have a function, even if there is no formula. The pairs
consisting of n and the n'th prime number form a mathematical function,
even though there is no known algorithm to generate the n'th prime
short of exhaustive search.

Similarily, a "compression algorithm" really only needs to be
a "function" in the mathematical sense, that has the properties
listed above: it is "one-to-one", and the representation of one
of the second members of a tuple is smaller than the represention
of the first member of the tuple. That's all that you have to
satisify to have a compression function.

Many functions turn out to be compression functions.

The trick in compression functions is not in finding a function
that will compress *something*, but rather is finding one
that is useful for the pre-identified input domain (the source
probabilities of which might be generally known ahead of time.)

-- 
  "Never install telephone wiring during a lightning storm." -- Linksys


Relevant Pages

  • Re: Barcode Email
    ... :is compression, grow up, think outside the box. ... maps at least one input file to an output file that is ... *is* a compression algorithm. ... Compare the source file to each of the 255 license files ...
    (sci.crypt)
  • Recompressing poorly compressed files
    ... compression algorithm was not very good, and the 2nd one is very ... just compressing with a good algorithm in the first place of course. ... algorithms for compression and decompression, or even if the're a bit ... reversible (e.g. 'decompressing' a gif into a bitmap means losing some ...
    (comp.compression)
  • Re: Computer being developed modeled after human brain
    ... innate skills. ... run-length-encoding compression module. ... If the "edge detection" algorithm is not useful for ... edge detection algorithm will be replaced. ...
    (comp.ai.philosophy)
  • Re: Random Ideas
    ... 'Faqs' thread for compression queries. ... for any (lossless) data compression ... algorithm, there will be ... into other sequences of the same units. ...
    (comp.compression)
  • =?windows-1252?Q?Re=3A_Artificial_intelligence_and_Landauer=92s_princip?= =?windows-1252?Q?l
    ... Not so much a small amount of hardware rather a lot ... This is basically a data compression problem applied ... to a learning algorithm. ... understand how evolution evolved. ...
    (comp.ai.philosophy)

Quantcast