cryptofreak.org cryptofreak home projects
contact about
minihuff
NewsNews
DownloadsDownloads
DocumentsDocuments
CVSCVS
TrackersTrackers
Contact: jnmiller


projects
Antera Antera
News Commentator
News Frotz
gkrellmGIMPS gkrellmGIMPS
Holy Librams! Holy Librams!
Linux Porting Linux Porting
minihuff minihuff
mod-chal mod-chal
Recipe Radar Recipe Radar
Riggy Riggy
Contact: webmaster


  README


minihuff README Document

(by Jay Miller [jnmiller, cryptofreak dot org])

http://www.cryptofreak.org/projects/minihuff/

$Id: README,v 1.1 2006/02/21 17:24:41 jnmiller Exp $


Contents
~~~~~~~
1. Introduction
2. Internals
3. Configuration
4. Entropy Data
5. Usage
6. Bugs and Troubleshooting



1. Introduction
~~~~~~~~~~~~~~~

minihuff is a data compression library with a simple new twist on
Huffman coding.  Instead of storing the frequency analysis for some set
of data with the data itself, minihuff enables the creation of a static
frequency table to be stored at both ends of a connection.  This allows
effective compression even for very small pieces of data that maintain
similar entropy characteristics.

For more information about Huffman coding, please see Introduction to
Algorithms, by Corman, Leiserson and Rivest (CLR), § 17.3.  If you're
looking for a web reference, you might start at Wiki's entry on the
subject (http://en.wikipedia.org/wiki/Huffman_coding).


2. Internals
~~~~~~~~~~~~

minihuff deviates from a standard Huffman coding in one major respect.
For those characters whose code would exceed their normal, fixed length,
minihuff reverts to a fixed length code with a special 1-bit prefix.
Technically, this means that the decoding tree is trimmed to a depth
equal to the size of the letter type, in bits.  This change is meant
to save space for large alphabets that might contain very long variable
length codes.

Otherwise, minihuff implements Huffman coding in a standard way.  It
uses a priority queue to build a decoding tree, then constructs an
encoding map from the tree.


3. Configuration
~~~~~~~~~~~~~~~~

Before using this library, you should check out config.h.  There are
currently three portability issues that may affect you:

 - Reentrancy: if your compiler requires special code to handle
   functions that are reentrant (recursive), look at how REENTRANT is
   used.
 - Floating point accuracy: if for some reason your compiler is less
   accurate than my brain, you might need to increase MH_FLOAT_ERROR.
   This value is used to assure that float comparison is portable.
 - Floating point type: if your compiler doesn't like floats, you can
   change the floating point type that minihuff uses.

Feel free to add a section for your own compiler and pass on to me why
the changes were needed, so that I can add them to the code base.

Once you are configured, you may wish to run the example program and
check its output for success.  Those of you on *nix can try this:

   # make test


4. Entropy Data
~~~~~~~~~~~~~~~

minihuff requires that entropy data about the data to be
(un)compressed be supplied to it.  To help generate this static data,
I've supplied a simple program called 'freqanal' that will read stdin
and spit out the frequencies of each appearing letter.  Try

   # ./freqanal -h

for help on its usage.  

Ultimately (regardless of whether you use freqanal), you should have
an array of pair_ts in your program that you can supply to mh_init().
See the example program for.. an example.


5. Usage
~~~~~~~~

See the include file minihuff.h for an API overview.  Also see the file
example.c for an example of how to use minihuff in your own code.


6. Bugs and Troubleshooting
~~~~~~~~~~~~~~~~~~~~~~~~~~~

I have tested this library in my own code for some time, so I hope it is
at least somewhat robust barring any distribution bugs.  It has been
used successfully between an 8051 (big endian) and a Pentium 4 (little
endian).  I hope I've also solved some floating point portability issues
for good.

Nonetheless, there naturally may be bugs remaining.  If you can find any
- or if it just doesn't work! - I would very much appreciate a note with
as much detail as possible on the problem.

If you are feeling ambitious, the file debug.c contains some printing
functions for the various data structures used in the minihuff code.
Feel free to make use of them.

Thank you, and good luck!