![]() |
|
|
![]() |
|
|
|
|
README minihuff README Document (by Jay Miller [jnmiller, cryptofreak dot org]) http://www.cryptofreak.org/projects/minihuff/ $Id: README,v 1.1 2006/02/21 17:24:41 jnmiller Exp $ Contents ~~~~~~~ 1. Introduction 2. Internals 3. Configuration 4. Entropy Data 5. Usage 6. Bugs and Troubleshooting 1. Introduction ~~~~~~~~~~~~~~~ minihuff is a data compression library with a simple new twist on Huffman coding. Instead of storing the frequency analysis for some set of data with the data itself, minihuff enables the creation of a static frequency table to be stored at both ends of a connection. This allows effective compression even for very small pieces of data that maintain similar entropy characteristics. For more information about Huffman coding, please see Introduction to Algorithms, by Corman, Leiserson and Rivest (CLR), § 17.3. If you're looking for a web reference, you might start at Wiki's entry on the subject (http://en.wikipedia.org/wiki/Huffman_coding). 2. Internals ~~~~~~~~~~~~ minihuff deviates from a standard Huffman coding in one major respect. For those characters whose code would exceed their normal, fixed length, minihuff reverts to a fixed length code with a special 1-bit prefix. Technically, this means that the decoding tree is trimmed to a depth equal to the size of the letter type, in bits. This change is meant to save space for large alphabets that might contain very long variable length codes. Otherwise, minihuff implements Huffman coding in a standard way. It uses a priority queue to build a decoding tree, then constructs an encoding map from the tree. 3. Configuration ~~~~~~~~~~~~~~~~ Before using this library, you should check out config.h. There are currently three portability issues that may affect you: - Reentrancy: if your compiler requires special code to handle functions that are reentrant (recursive), look at how REENTRANT is used. - Floating point accuracy: if for some reason your compiler is less accurate than my brain, you might need to increase MH_FLOAT_ERROR. This value is used to assure that float comparison is portable. - Floating point type: if your compiler doesn't like floats, you can change the floating point type that minihuff uses. Feel free to add a section for your own compiler and pass on to me why the changes were needed, so that I can add them to the code base. Once you are configured, you may wish to run the example program and check its output for success. Those of you on *nix can try this: # make test 4. Entropy Data ~~~~~~~~~~~~~~~ minihuff requires that entropy data about the data to be (un)compressed be supplied to it. To help generate this static data, I've supplied a simple program called 'freqanal' that will read stdin and spit out the frequencies of each appearing letter. Try # ./freqanal -h for help on its usage. Ultimately (regardless of whether you use freqanal), you should have an array of pair_ts in your program that you can supply to mh_init(). See the example program for.. an example. 5. Usage ~~~~~~~~ See the include file minihuff.h for an API overview. Also see the file example.c for an example of how to use minihuff in your own code. 6. Bugs and Troubleshooting ~~~~~~~~~~~~~~~~~~~~~~~~~~~ I have tested this library in my own code for some time, so I hope it is at least somewhat robust barring any distribution bugs. It has been used successfully between an 8051 (big endian) and a Pentium 4 (little endian). I hope I've also solved some floating point portability issues for good. Nonetheless, there naturally may be bugs remaining. If you can find any - or if it just doesn't work! - I would very much appreciate a note with as much detail as possible on the problem. If you are feeling ambitious, the file debug.c contains some printing functions for the various data structures used in the minihuff code. Feel free to make use of them. Thank you, and good luck! | |||||||||||||||||||||||||||||||||||||||