You are on page 1of 15

Greedy Algorithms

Dr. Tathagata Ray


Professor, BITS Pilani, Hyderabad Campus
BITS Pilani rayt@hyderabad.bits-pilani.ac.in
Hyderabad Campus
Huffman Codes

• Suppose there is a file with the following content.

• “there was a brown crow he was very intelligent”

• In this text we have

Char a b c e g h i l n o r s t v w y
Freq 3 1 1 6 1 2 2 2 3 2 4 2 3 1 4 1

• We have to encode this file such that each character is


represented by a unique binary string called codewords.

BITS Pilani, Hyderabad Campus


Codewords

• If we use fixed length code, we require 4 bits to


represent this 16 characters.

• Thus we will encode the text using 38x4 = 152 bits.

• Can we do better?

BITS Pilani, Hyderabad Campus


Variable-length code

• Variable-length code can do considerably better than a


fixed length code.

• Give frequent characters short codewords and infrequent


character long codewords.
• For example
Char a b c d e f Total
bits
Freq 45000 13000 12000 16000 9000 5000 100000
Fixed 000 001 010 011 100 101 300000
len.
Var. 0 101 100 111 1101 1100 224,000
code

BITS Pilani, Hyderabad Campus


Prefix codes

• We consider here only codes in which no codeword is


also a prefix of some other codeword.

• These are called prefix codes.

• Prefix codes are simple to decode.

• For example 001011101 parses uniquely to 0.0.101.1101


i.e. aabe.

BITS Pilani, Hyderabad Campus


Decoding process

• Decoding needs a convenient representation for the


prefix code so that we can easily pick off the initial
codeword.

• A binary tree whose leaves are the given characters


provide such a representation.

BITS Pilani, Hyderabad Campus


100
0 1
a:45 55
0 1
25 30
0 1 0 1
c:12 b:13 14 d:16
0 1
f:5 e:9

BITS Pilani, Hyderabad Campus


Optimal code

• An optimal code for a file is always represented by a full


binary tree, in which every non leaf node has two
children.

• If C is the alphabet from which the characters are drawn


and all character frequencies are positive then the tree
for an optimal prefix code has exactly |C| leaves and
exactly |C-1| internal nodes.

BITS Pilani, Hyderabad Campus


Cost of a tree

• Given a tree T corresponding to a prefix code, the


number of bits required to encode a file is given by

• .
Depth of c in T

Frequency of character c

BITS Pilani, Hyderabad Campus


Constructing a Huffman code

HUFFMAN(C)

1. For to -1
2. allocate a new node .
3. EXTRACT-MIN()
4. EXTRACT-MIN()

5. INSERT(, )
6. return EXTRACT-MIN() //return root of the tree

BITS Pilani, Hyderabad Campus


Example
f:5 e:9 c:12 b:13 d:16 a:45

c:12 b:13 14 d:16 a:45


0 1
f:5 e:9

14 d:16 25 a:45
0 1 0 1
f:5 e:9 c:12 b:13

BITS Pilani, Hyderabad Campus


Example

14 d:16 25 a:45
0 1 0 1
f:5 e:9 c:12 b:13

30 a:45
25
0 1
0 1
c:12 b:13 14 d:16
0 1
f:5 e:9

BITS Pilani, Hyderabad Campus


Example
30 a:45
25
0 1
0 1
c:12 b:13 14 d:16
0 1
f:5 e:9

a:45 55 1
0

25 30
0 1
0 1
c:12 b:13 14 d:16
0 1
f:5 e:9
BITS Pilani, Hyderabad Campus
Example
a:45 55 1
0

25 30
0 1
0 1
c:12 b:13 14 d:16
0 1
100 f:5 e:9
0 1

a:45 55 1
0

25 30
0 1
0 1
c:12 b:13 14 d:16
0 1
f:5 e:9 BITS Pilani, Hyderabad Campus
Time Complexity

• If we use heap data structure to implement the minimum


priority queue Q then constructing the queue will take
O(n) time.

• Each extract minimum operation takes O(lg n) time and


the For loop runs for n-1 time, hence the total time taken
is O(n lg n).

BITS Pilani, Hyderabad Campus

You might also like