Re: Barcode Email
From: Walter Roberson (roberson_at_ibd.nrc-cnrc.gc.ca)
Date: 08/01/05
- Next message: Joe Fox: "Re: Barcode Email"
- Previous message: Ari Silversteinn: "Re: Barcode Email"
- In reply to: Ari Silversteinn: "Re: Barcode Email"
- Next in thread: Ari Silversteinn: "Re: Barcode Email"
- Reply: Ari Silversteinn: "Re: Barcode Email"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Date: Mon, 1 Aug 2005 04:33:49 +0000 (UTC)
In article <1kjmdjg5u2z75$.1ra9lbrozui35.dlg@40tude.net>,
Ari Silversteinn <abcarisilverstein@yahoo.comxyz> wrote:
:I don't "think" anything, I know and not everything that reduces file sizes
:is compression, grow up, think outside the box.
Any algorithm which:
a) maps every unique input file to a unique output file; and
b) maps at least one input file to an output file that is
shorter than the input file
*is* a compression algorithm. That's pretty much the
definition of what a compression algorithm *is*.
[I used to hang out in comp.compression a lot. And I studied
Cleary, Bell, and Witten's "Text Compression Algorithms", so I'm
not just talking out of my hat.]
Whether a particular compression algorithm is -useful- or
not depends on the context in which it is used.
For example, consider an algorithm that catelogued the
255 most common software license files (including GPL, the
OpenBSD license, the Microsoft XP license, and so on), and
then was:
Compare the source file to each of the 255 license files
in turn. If the file is an *exact* match, the output is
a single octet which is the index of the license file.
Otherwise, the output is the all-1's octet (0xff) followed
by the input file.
Such an algorithm could end up saving quite a bit of space on a
filesystem which was a source code repository, whilst expanding
any given input file by at most one byte. If, on the other hand,
the the filesystme happened to contain a lot of files that happened
to be exactly the size of the allocation cluster and contained
no copies of any licenses, then the algorithm could end up
doubling the filesystem space requirements. Thus the effectiveness
of the algorithm depends on the context in which it is applied.
A compression algorithm does NOT need to be anything elaborate
such as an LZW or wavelet algorithm. It doesn't even need to
be an "algorithm" in the traditional sense: all it has to be
is some kind of lookup of a mathematical function.
Note: functions are defined in mathematics by tuples, *not* necessarily
by algebra: as long as the tuples can be exhaustively listed or can be
described, you have a function, even if there is no formula. The pairs
consisting of n and the n'th prime number form a mathematical function,
even though there is no known algorithm to generate the n'th prime
short of exhaustive search.
Similarily, a "compression algorithm" really only needs to be
a "function" in the mathematical sense, that has the properties
listed above: it is "one-to-one", and the representation of one
of the second members of a tuple is smaller than the represention
of the first member of the tuple. That's all that you have to
satisify to have a compression function.
Many functions turn out to be compression functions.
The trick in compression functions is not in finding a function
that will compress *something*, but rather is finding one
that is useful for the pre-identified input domain (the source
probabilities of which might be generally known ahead of time.)
-- "Never install telephone wiring during a lightning storm." -- Linksys
- Next message: Joe Fox: "Re: Barcode Email"
- Previous message: Ari Silversteinn: "Re: Barcode Email"
- In reply to: Ari Silversteinn: "Re: Barcode Email"
- Next in thread: Ari Silversteinn: "Re: Barcode Email"
- Reply: Ari Silversteinn: "Re: Barcode Email"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Relevant Pages
|