Presentation on Huffman Coding as a Data Compression Technique

Submitted by :
Sumit Singh Bagga 3rd C.S.E. 211/07

Submitted to:
Dr. V.K.Pathak C.S.E. Deptt. HBTI-Kanpur

Elements of the Greedy Strategy
1) Determine the optimal substructure of the problem.
2) Develop a recursive solution.
3) Prove that at every stage of the recursion, one of the optimal choices is the greedy choice.
4) Show that all but one of the subproblems induced by making the greedy choice are empty.
5) Develop a recursive algorithm that implements the greedy strategy.

More Generally …
1) Cast the optimization problem as one in which we make a choice and are left with one subproblem to solve.
2) Prove that there is always an optimal solution to the original problem that makes the greedy choice, so that the greedy choice is always safe (Greedy-Choice Property).
3) Demonstrate that, having made the greedy choice, what remains is a subproblem with the property that if we combine an optimal solution to the subproblem with the greedy choice we have made, we arrive at an optimal solution to the original problem.

Greedy-Choice Property
• A globally optimal solution can be arrived at by making a locally optimal (greedy) choice.
• Make whatever choice seems best at the moment and then solve the subproblem arising after the choice is made.
• The choice made by a greedy algorithm may depend on the choices made so far, but it cannot depend on any future choices or on the solutions to subproblems.
• Greedy algorithms usually progress in a top-down fashion, making one greedy choice after another and iteratively reducing each given problem instance to a smaller one.
• Of course, we must prove that a greedy choice at each step yields a globally optimal solution.


Optimal Substructure
• A problem exhibits optimal substructure if an optimal solution to the problem contains within it optimal solutions to subproblems.
• We have the luxury of assuming that we arrived at a subproblem by having made the greedy choice in the original problem.

Huffman codes : Overview
• Huffman codes are a very effective technique for compressing data (savings of 20% to 90%).
• Data is considered to be a sequence of characters.
• We design a binary character code wherein each character is represented by a unique binary string.
• Huffman’s greedy algorithm uses a table of the frequencies of occurrence of each character to build up an optimal way of representing each character as a binary string.


Prefix Code …
• An optimal code is always represented by a full binary tree, in which every non-leaf node has two children.
• If C is the alphabet from which characters are drawn and all character frequencies are positive, then
  – the tree has exactly |C| leaves and |C| − 1 internal nodes;
  – the cost (number of bits) is $B(T) = \sum_{c \in C} f[c]\, d_T(c)$, where f[c] is the frequency of c and $d_T(c)$ is the depth of c in T (the length of its codeword).

[Figure: example code tree, labeled "Not optimal"]
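As a rough illustration of the cost formula, the sketch below evaluates B(T) for two codes over the same alphabet. The frequencies and depths are the ones used in the worked example later in this presentation; the character labels "a" to "f" and the function name `code_cost` are assumptions made only for this example.

```python
def code_cost(freq, depth):
    """Return B(T): the sum of f[c] * d_T(c) over all characters c."""
    return sum(freq[c] * depth[c] for c in freq)

# Hypothetical labels; frequencies and depths follow the worked example in the slides.
freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
variable_depth = {"a": 1, "b": 3, "c": 3, "d": 3, "e": 4, "f": 4}
fixed_depth = {c: 3 for c in freq}  # a fixed-length 3-bit code, for comparison

print(code_cost(freq, variable_depth))  # 224
print(code_cost(freq, fixed_depth))     # 300
```

Under these assumed numbers the variable-length code needs 224 bits per 100 characters against 300 for the fixed-length code.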


Constructing A Huffman Code
• C is a set of n characters. Each character c ∈ C is an object with a frequency, denoted by f[c].
• The algorithm builds the tree T in a bottom-up manner. It begins with |C| leaves and performs a sequence of |C| − 1 mergers.
• A min-priority queue Q, keyed on f, is used to identify the two least-frequent objects to merge together.
• The result of the merger of two objects is a new object whose frequency is the sum of the frequencies of the two objects that were merged.

Constructing A Huffman Code (Cont.)

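Total computation time = O(n lg n): the merging loop runs n − 1 times, and each min-priority-queue operation inside it (EXTRACT-MIN, INSERT) takes O(lg n) when Q is implemented as a binary min-heap.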
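The bottom-up construction described above can be sketched in Python using the standard `heapq` module as the min-priority queue. This is a minimal sketch, not the presentation's own pseudocode: the function name `huffman_codes`, the tuple-based tree representation, and the tie-breaking counter are all assumptions, and symbols are assumed to be non-tuple values such as strings.

```python
import heapq
from itertools import count

def huffman_codes(freq):
    """Build a Huffman code for a non-empty {symbol: frequency} map.

    Returns {symbol: bitstring}. Internal nodes are (left, right) tuples,
    so symbols themselves must not be tuples (an assumption of this sketch).
    """
    tie = count()  # unique tie-breaker so the heap never compares tree nodes
    heap = [(f, next(tie), sym) for sym, f in freq.items()]
    heapq.heapify(heap)                          # build the queue Q
    for _ in range(len(freq) - 1):               # |C| - 1 mergers
        fx, _tx, x = heapq.heappop(heap)         # least-frequent object, O(lg n)
        fy, _ty, y = heapq.heappop(heap)         # next least-frequent, O(lg n)
        heapq.heappush(heap, (fx + fy, next(tie), (x, y)))  # merged object, O(lg n)

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):              # internal node: recurse into children
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                    # leaf: record the codeword
            codes[node] = prefix or "0"          # single-symbol alphabet gets "0"
    walk(heap[0][2], "")
    return codes

freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}  # hypothetical labels
codes = huffman_codes(freq)
print(sum(freq[s] * len(codes[s]) for s in freq))  # 224
```

The exact bit patterns depend on how ties and left/right children are ordered, but the total cost does not.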



Correctness of Huffman’s Algorithm
• A greedy algorithm makes a sequence of choices, at each step taking the choice that seems best at the moment. For Huffman coding this always produces an optimal solution.
• Two ingredients are exhibited by most problems that lend themselves to a greedy strategy, and both must be proved to show the correctness of Huffman’s algorithm: the greedy-choice property and optimal substructure.

Lemma 1 : Greedy-Choice Property
• Let C be an alphabet in which each character c ∈ C has a frequency f[c]. Let x and y be two characters in C having the lowest frequencies. Then there exists an optimal prefix code for C in which the codewords for x and y have the same length and differ only in the last bit.

Since each swap does not increase the cost, the resulting tree T’’ is also an optimal tree.

Proof of Lemma 1
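• Let T be a tree representing an optimal prefix code, and let a and b be two sibling leaves of maximum depth in T. Without loss of generality, assume f[a] ≤ f[b] and f[x] ≤ f[y].
• Let T’ be the tree obtained from T by swapping a and x. The cost difference between T and T’ is

$$
\begin{aligned}
B(T) - B(T') &= \sum_{c \in C} f[c]\, d_T(c) - \sum_{c \in C} f[c]\, d_{T'}(c) \\
&= f[x]\, d_T(x) + f[a]\, d_T(a) - f[x]\, d_{T'}(x) - f[a]\, d_{T'}(a) \\
&= f[x]\, d_T(x) + f[a]\, d_T(a) - f[x]\, d_T(a) - f[a]\, d_T(x) \\
&= (f[a] - f[x])\,(d_T(a) - d_T(x)) \ge 0
\end{aligned}
$$

  Both factors are non-negative: f[x] ≤ f[a] because x has the lowest frequency, and $d_T(x) \le d_T(a)$ because a is a leaf of maximum depth.
• Similarly, swapping b and y in T’ gives a tree T’’ with B(T’’) ≤ B(T’) ≤ B(T). But T is optimal, so B(T) ≤ B(T’’), and hence B(T’’) = B(T). Therefore T’’ is an optimal tree in which x and y appear as sibling leaves of maximum depth.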

Merging as a Greedy-Choice? : Lemma 2
• Building up an optimal tree by mergers can, without loss of generality, begin with the greedy choice of merging together the two characters of lowest frequency.
• We can view the cost of a single merger as being the sum of the frequencies of the two items being merged.
• Of all possible mergers at each step, HUFFMAN chooses the one that incurs the least cost.
• Hence it is a greedy choice.
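The merger-cost view can be checked numerically. Each merger adds the two merged frequencies once, and a character at depth d takes part in d mergers, so the merger costs sum to the total cost B(T) of the finished tree. The sketch below, with the assumed function name `total_merge_cost` and the frequencies from this presentation's worked example, adds up the cost of each greedy merger.

```python
import heapq

def total_merge_cost(frequencies):
    """Sum the cost of every merger made by the greedy procedure."""
    heap = list(frequencies)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        # Greedy choice: merge the two least-frequent items.
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged                 # cost of this single merger
        heapq.heappush(heap, merged)    # the merged object rejoins the queue
    return total

print(total_merge_cost([45, 13, 12, 16, 9, 5]))  # 224
```

Here the individual merger costs are 14, 25, 30, 55 and 100, which sum to 224.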


Lemma 3 : Optimal Substructure Property

• Let C’ = (C − {x, y}) ∪ {z}, where
  – f[z] = f[x] + f[y]
• Let T’ be any tree representing an optimal prefix code for C’. Then the tree T, obtained from T’ by replacing the leaf node for z with an internal node having x and y as children, represents an optimal prefix code for C.
• Observation: B(T) = B(T’) + f[x] + f[y] ⇒ B(T’) = B(T) − f[x] − f[y]
  – For each c ∈ C − {x, y}: $d_T(c) = d_{T'}(c)$ ⇒ $f[c]\, d_T(c) = f[c]\, d_{T'}(c)$
  – $d_T(x) = d_T(y) = d_{T'}(z) + 1$
  – $f[x]\, d_T(x) + f[y]\, d_T(y) = (f[x] + f[y])(d_{T'}(z) + 1) = f[z]\, d_{T'}(z) + (f[x] + f[y])$


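B(T) = B(T’) + f[x] + f[y] ⇒ B(T’) = B(T) − f[x] − f[y]

Example, with the frequencies 5 and 9 merged into a single leaf of frequency 14:
B(T) = 45*1 + 12*3 + 13*3 + 5*4 + 9*4 + 16*3 = 224
B(T’) = 45*1 + 12*3 + 13*3 + (5+9)*3 + 16*3 = 210 = B(T) − 5 − 9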
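The relation between B(T) and B(T’) can be reproduced with a short sketch that performs the tree surgery on depth tables instead of on an explicit tree. The helper names `cost` and `expand_leaf` and the character labels are assumptions for illustration; the frequencies and depths are those of the example above.

```python
def cost(freq, depth):
    """B(T) = sum of f[c] * d_T(c) over all characters c."""
    return sum(freq[c] * depth[c] for c in freq)

def expand_leaf(freq, depth, z, x, fx, y, fy):
    """Turn T' into T: replace leaf z by an internal node with children x and y."""
    assert freq[z] == fx + fy, "f[z] must equal f[x] + f[y]"
    new_freq = {c: f for c, f in freq.items() if c != z}
    new_depth = {c: d for c, d in depth.items() if c != z}
    new_freq[x], new_freq[y] = fx, fy
    new_depth[x] = new_depth[y] = depth[z] + 1   # x and y sit one level below z
    return new_freq, new_depth

# T': alphabet C' with the merged character z (hypothetical labels).
freq_p = {"a": 45, "b": 13, "c": 12, "d": 16, "z": 14}
depth_p = {"a": 1, "b": 3, "c": 3, "d": 3, "z": 3}

freq_t, depth_t = expand_leaf(freq_p, depth_p, "z", "f", 5, "e", 9)
print(cost(freq_p, depth_p))  # 210
print(cost(freq_t, depth_t))  # 224
```

The difference between the two printed costs is 14 = 5 + 9, as the observation predicts.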


Proof of Lemma 3
Prove by contradiction.
• Suppose that T does not represent an optimal prefix code for C. Then there exists a tree T’’ such that B(T’’) < B(T).
• Without loss of generality, by Lemma 1, T’’ has x and y as siblings. Let T’’’ be the tree T’’ with the common parent of x and y replaced by a leaf z with frequency f[z] = f[x] + f[y]. Then
  B(T’’’) = B(T’’) − f[x] − f[y] < B(T) − f[x] − f[y] = B(T’)
• T’’’ is better than T’ ⇒ contradiction to the assumption that T’ represents an optimal prefix code for C’.


Now that both properties have been proved, we can infer that Huffman’s algorithm produces an optimal prefix code. Hence the correctness of Huffman’s algorithm as a greedy technique is verified.

