TERM PAPER CSE-408

TOPIC: HUFFMAN CODES

SUBMITTED TO: MR. VIJAY GARG

SUBMITTED BY: KARANBIR SINGH B.TECH CSE 10804631 RK1R08B39

ACKNOWLEDGEMENT

First and foremost I, KARANBIR SINGH, am very thankful to Lect. VIJAY GARG, who assigned me this term paper, "HUFFMAN CODES". I am heartily thankful to the college library for providing the books, and to my roommates and classmates for helping me in assembling the notes related to this topic. Last but not the least, I am very thankful to my parents, who gave me financial support to complete my term paper.

KARANBIR SINGH

Contents
1) Introduction
2) Types of Huffman coding
   a) N-ary Huffman coding
   b) Adaptive Huffman coding
   c) Huffman template algorithm
   d) Length-limited Huffman coding
   e) Huffman coding with unequal letter costs
   f) Hu-Tucker coding
   g) Canonical Huffman code
3) Properties
4) Advantages
5) Disadvantages
6) Applications
7) References

INTRODUCTION

Huffman coding is an entropy encoding algorithm used for lossless data compression. It was developed by David A. Huffman while he was a Ph.D. student at MIT, and published in a 1952 paper.

Huffman coding is based on the frequency of occurrence of a data item (a pixel, in images). The principle is to use a lower number of bits to encode the data that occurs more frequently. Codes are stored in a Code Book, which may be constructed for each image or for a set of images. In all cases the code book plus the encoded data must be transmitted to enable decoding.

Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code (sometimes called a "prefix-free code"): the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol. The code expresses the most common source symbols using shorter strings of bits than are used for less common source symbols. Huffman was able to design the most efficient compression method of this type: no other mapping of individual source symbols to unique strings of bits will produce a smaller average output size when the actual symbol frequencies agree with those used to create the code. A method was later found to design a Huffman code in linear time if the input probabilities (also known as weights) are sorted.

For a set of symbols with a uniform probability distribution and a number of members that is a power of two, Huffman coding is equivalent to simple binary block encoding, e.g. ASCII coding. Huffman coding is such a widespread method for creating prefix codes that the term "Huffman code" is widely used as a synonym for "prefix code", even when such a code is not produced by Huffman's algorithm.
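The construction described above can be illustrated with a short Python sketch. It is only one possible implementation, assuming the symbols are the characters of a string and the weights are their counts in that string; the names huffman_code and encode are chosen here for illustration and do not come from any library.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Return a dict mapping each symbol of `text` to its bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: partial_code}).
    # The tie_breaker keeps tuple comparison away from the dicts.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees into one.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(text, code):
    """Concatenate the codewords of the symbols in `text`."""
    return "".join(code[s] for s in text)
```

For example, the 11-character string "abracadabra" encodes to 23 bits with this sketch, against 88 bits at a fixed 8 bits per character. The exact codewords depend on how ties between equal weights are broken, but the total length does not.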

TYPES OF HUFFMAN CODING

N-ary Huffman coding

The n-ary Huffman algorithm uses the {0, 1, ..., n − 1} alphabet to encode the message and build an n-ary tree. This approach was considered by Huffman in his original paper. The same algorithm applies as for binary (n equals 2) codes, except that the n least probable symbols are taken together, instead of just the 2 least probable. Note that for n greater than 2, not all sets of source words can properly form an n-ary tree for Huffman coding. In this case, additional 0-probability place holders must be added. This is because the tree must form an n to 1 contractor; for binary coding, this is a 2 to 1 contractor, and any sized set can form such a contractor. If the number of source words is congruent to 1 modulo n − 1, then the set of source words will form a proper Huffman tree.

Adaptive Huffman coding

A variation called adaptive Huffman coding involves calculating the probabilities dynamically based on recent actual frequencies in the sequence of source symbols, and changing the coding tree structure to match the updated probability estimates.

Huffman template algorithm

Most often, the weights used in implementations of Huffman coding represent numeric probabilities, but Huffman's algorithm does not require this; it requires only that the weights form a totally ordered commutative monoid, meaning a way to order weights and to add them. The Huffman template algorithm enables one to use any kind of weights (costs, frequencies, pairs of weights, non-numerical weights) and one of many combining methods (not just addition).
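The place-holder rule for n-ary Huffman coding can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the weights are assumed to be non-negative numbers supplied in a dict, n is assumed to be at least 2, and only codeword lengths (not the codewords themselves) are computed.

```python
import heapq


def placeholders_needed(num_symbols, n):
    """Zero-weight place holders needed so the count is 1 modulo (n - 1)."""
    return (1 - num_symbols) % (n - 1)


def nary_huffman_lengths(weights, n):
    """Return {symbol: codeword length} for an n-ary Huffman code."""
    base = len(weights)
    pad = placeholders_needed(base, n)
    # Heap entries are (weight, tie_breaker, [symbols below this node]).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(weights.items())]
    heap += [(0, base + j, []) for j in range(pad)]  # place holders
    heapq.heapify(heap)
    depth = {s: 0 for s in weights}
    tie = len(heap)
    while len(heap) > 1:
        # Take the n least probable nodes together.
        group = [heapq.heappop(heap) for _ in range(n)]
        members = [s for _, _, syms in group for s in syms]
        for s in members:
            depth[s] += 1  # every symbol below the new node gains one digit
        heapq.heappush(heap, (sum(w for w, _, _ in group), tie, members))
        tie += 1
    return depth
```

With four source words and n = 3, one place holder is added (4 + 1 = 5, which is congruent to 1 modulo 2); with n = 2 no place holder is ever needed.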

Algorithms built on the Huffman template can also solve other minimization problems, including one first applied to circuit design.

Length-limited Huffman coding

Length-limited Huffman coding is a variant where the goal is still to achieve a minimum weighted path length, but there is an additional restriction that the length of each codeword must be less than a given constant. The package-merge algorithm solves this problem with a simple greedy approach very similar to that used by Huffman's algorithm. Its time complexity is O(nL), where n is the number of source symbols and L is the maximum length of a codeword. No algorithm is known to solve this problem in linear or linearithmic time, unlike the pre-sorted and unsorted conventional Huffman problems, respectively.

Huffman coding with unequal letter costs

In the standard Huffman coding problem, it is assumed that each symbol in the set that the code words are constructed from has an equal cost to transmit: a code word whose length is N digits will always have a cost of N, no matter how many of those digits are 0s, how many are 1s, etc. When working under this assumption, minimizing the total cost of the message and minimizing the total number of digits are the same thing.

Huffman coding with unequal letter costs is the generalization in which this assumption is no longer assumed true: the letters of the encoding alphabet may have non-uniform lengths, due to characteristics of the transmission medium. An example is the encoding alphabet of Morse code, where a 'dash' takes longer to send than a 'dot', and therefore the cost of a dash in transmission time is higher. The goal is still to minimize the weighted average codeword length, but it is no longer sufficient just to minimize the number of symbols used by the message. No algorithm is known to solve this in the same manner or with the same efficiency as conventional Huffman coding.

Optimal alphabetic binary trees (Hu-Tucker coding)

In the standard Huffman coding problem, it is assumed that any codeword can correspond to any input symbol. In the alphabetic version, the alphabetic order of inputs and outputs must be identical. Thus, for example, the inputs a, b, c could not be assigned the codewords 00, 1, 01, but instead should be assigned either 00, 01, 1 or 0, 10, 11. This is also known as the Hu-Tucker problem, after the authors of the paper presenting the first linearithmic solution to this optimal binary alphabetic problem, which has some similarities to the Huffman algorithm, but is not a variation of this algorithm. These optimal alphabetic binary trees are often used as binary search trees.

Canonical Huffman code

If the weights corresponding to the alphabetically ordered inputs are in numerical order, the Huffman code has the same lengths as the optimal alphabetic code, which can be found from calculating these lengths, rendering Hu-Tucker coding unnecessary. The code resulting from numerically (re-)ordered input is sometimes called the canonical Huffman code and is often the code used in practice, due to ease of encoding/decoding. The technique for finding this code is sometimes called Huffman-Shannon-Fano coding, since it is optimal like Huffman coding, but alphabetic in weight probability, like Shannon-Fano coding. A Huffman-Shannon-Fano code, having the same codeword lengths as the original Huffman solution, is also optimal.
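A common way of assigning canonical codewords once the codeword lengths are known can be sketched in Python as follows. The sketch assumes that the lengths come from a valid prefix code (for instance from Huffman's algorithm) and that ties between equal lengths are broken by symbol order; the function name canonical_codes is chosen here for illustration.

```python
def canonical_codes(lengths):
    """Assign canonical codewords from a {symbol: codeword length} dict.

    Symbols are processed by (length, symbol); each codeword is the
    previous one plus 1, shifted left whenever the length grows.
    """
    code = 0
    prev_len = 0
    result = {}
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len  # append zeros when the length increases
        result[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return result
```

For the lengths a: 1 and b, c, d, r: 3, this assigns a = 0, b = 100, c = 101, d = 110, r = 111. Because the codewords follow from the lengths alone, a decoder only needs to be sent the list of lengths rather than the whole tree.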

PROPERTIES

1. Unique Prefix Property: no code is a prefix of any other code (all symbols are at the leaf nodes). This is great for the decoder, because decoding is unambiguous.
2. If prior statistics are available and accurate, then Huffman coding is very good.
3. The frequencies used can be generic ones for the application domain that are based on average experience, or they can be the actual frequencies found in the text being compressed.
4. Huffman coding is optimal when the probability of each input symbol is a negative power of two.
5. The worst case for Huffman coding can happen when the probability of a symbol exceeds 2^-1 = 0.5, making the upper limit of inefficiency unbounded. These situations often respond well to a form of blocking called run-length encoding.

ADVANTAGES

• The algorithm is easy to implement.
• It produces a lossless compression of images.

DISADVANTAGES

• Efficiency depends on the accuracy of the statistical model used and on the type of image.

• The algorithm varies with different formats, but few get any better than 8:1 compression.
• Compression of image files that contain long runs of identical pixels by Huffman is not as efficient when compared to RLE.
• The Huffman encoding process is usually done in two passes. During the first pass, a statistical model is built, and then in the second pass the image data is encoded based on the generated model. From here we can see that Huffman encoding is a relatively slow process, as time is required to build the statistical model in order to achieve an efficient compression rate.
• Another disadvantage of Huffman is that all codes of the encoded data are of different sizes (not of fixed length). Therefore it is very difficult for the decoder to know that it has reached the last bit of a code, and the only way for it to know is by following the paths of the upside-down tree and coming to an end of it (one of the branches). Thus, if the encoded data is corrupted with additional bits added or bits missing, then whatever is decoded will be wrong values, and the final image displayed will be garbage.
• It is required to send the Huffman table at the beginning of the compressed file, otherwise the decompressor will not be able to decode it. This causes overhead.
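The decoding behaviour described in the list above, where the decoder follows the tree bit by bit until it reaches a leaf, can be sketched in Python as follows. This is a minimal sketch assuming that symbols are single characters and that the code table is a dict of bit strings; build_tree and decode are names chosen for illustration.

```python
def build_tree(code):
    """Turn {symbol: bit string} into nested dicts with symbols at the leaves."""
    root = {}
    for sym, bits in code.items():
        node = root
        for b in bits[:-1]:
            node = node.setdefault(b, {})
        node[bits[-1]] = sym  # the last bit leads to a leaf
    return root


def decode(bits, code):
    """Decode a bit string by walking the tree from the root for each symbol."""
    root = build_tree(code)
    node = root
    out = []
    for b in bits:
        node = node[b]  # follow one branch per input bit
        if not isinstance(node, dict):
            out.append(node)  # reached a leaf: emit the symbol
            node = root       # and start again from the root
    if node is not root:
        raise ValueError("bit stream ended in the middle of a codeword")
    return "".join(out)
```

With the table a = 0, b = 100, c = 101, d = 110, r = 111, removing even the first bit of the encoded form of "abracadabra" makes the decoder return a different string, which illustrates why missing or added bits lead to wrong output.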

APPLICATIONS

1. Arithmetic coding can be viewed as a generalization of Huffman coding; indeed, in practice arithmetic coding is often preceded by Huffman coding, as it is easier to find an arithmetic code for a binary input than for a nonbinary input.
2. Huffman coding is in wide use because of its simplicity, high speed and lack of encumbrance by patents.
3. Huffman coding today is often used as a "back-end" to some other compression method. DEFLATE (PKZIP's algorithm) and multimedia codecs such as JPEG and MP3 have a front-end model and quantization followed by Huffman coding.

REFERENCES

1. www.google.com/Huffman
2. http://en.wikipedia.org/huffman_codes
3. T.H. Cormen, C.E. Leiserson, R.L. Rivest and C. Stein, Introduction to Algorithms, PHI Pvt. Ltd., 2007
4. A.V. Aho, J.E. Hopcroft and J.D. Ullman, The Design and Analysis Of Computer Algorithms, Pearson Education Asia, 2007