
Lecture #08

Greedy Algorithms
Optimization Problems
• Some problems have many possible (feasible) solutions, each with a specific cost. We wish to find a best solution, one with the optimal cost.
• A maximization problem finds a solution with maximum cost
• A minimization problem finds a solution with minimum cost

• A set of choices must be made in order to arrive at an optimal (min/max) solution, subject to some constraints.

• Is “sorting a sequence of numbers” an optimization problem?
Optimization Problems
• Two common techniques:
• Greedy Algorithms (local)
• Make the greedy choice and THEN
• Solve the sub-problem arising after the choice is made
• The choice we make may depend on previous choices, but not on solutions to sub-problems
• Top-down solution; problems decrease in size
• Dynamic Programming (global)
• We make a choice at each step
• The choice depends on solutions to sub-problems
• Bottom-up solution; from smaller to larger sub-problems
Greedy Algorithm
• A greedy algorithm works in phases. At each phase, it:
• makes the best choice available right now, without regard for future consequences
• hopes to end up at a global optimum by choosing a local optimum at each step. For some problems, this works.

• Greedy algorithms sometimes work well for optimization problems

• Greedy algorithms tend to be easier to code

• Greedy algorithms are frequently used in everyday problem solving:
• Choosing a job
• Route finding
• Playing cards
• Investing in stocks
Greedy Algorithm
• How do we know whether a greedy algorithm will solve a particular optimization problem?
• Two key ingredients:
• Greedy choice property
• Optimal sub-structure

• If a problem has these properties, then we can develop a greedy algorithm for it.
Greedy Algorithm
• Greedy choice property: A globally optimal solution can be arrived at
by making a locally optimal (greedy) choice
• Make whatever choice seems best at the moment and then solve the
sub-problem arising after the choice is made
• The choice made by a greedy algorithm may depend on choices so far,
but it cannot depend on any future choices or on the solutions to sub-
problems

• Optimal sub-structure: A problem exhibits optimal substructure if an optimal solution to the problem contains within it optimal solutions to sub-problems
Huffman Codes

Huffman Codes
Computer Data Encoding:
How do we represent data in binary?

Historical Solution:
Fixed length codes.

Encode every symbol by a unique binary string of a fixed length.


Examples: ASCII (7 bit code), EBCDIC (8 bit code), etc.

American Standard Code for Information
Interchange (ASCII): Example
Encode: AABCAA

A A B C A A
1000001 1000001 1000010 1000011 1000001 1000001

Assuming an ℓ-bit fixed-length code and a file of n characters,
the total space usage is nℓ bits.
Variable Length Codes
Idea: In order to save space, use fewer bits for frequent characters and
more bits for rare characters.
Example: suppose we have 3 symbols: { A, B, C } and in a file we have
1,000,000 characters. Need 2 bits for a fixed length code and a total of
2,000,000 bits.
Suppose the frequency distribution of the characters is:

A: 999,000   B: 500   C: 500

Encode: A → 0, B → 10, C → 11

Note that the code of A is of length 1, and the codes for B and C are of length 2.

Total space usage in bits:
(999,000 x 1) + (500 x 2) + (500 x 2) = 1,001,000 bits, a savings of almost 50%
How do we Decode?
In the fixed length, we know where every character starts,
since they all have the same number of bits.
Example: A = 00
B = 01
C = 10

Encoded: 00 00 00 01 01 10 10 10 01 10 01 00 00 10 10
Decoded: A A A B B C C C B C B A A C C
How do we Decode?
In the variable length code, we use an idea called Prefix code, where
no code is a prefix of another.
Example: A = 0
B = 10
C = 11
None of the above codes is a prefix of another.

So, for the string A A A B B C C C B C B A A C C, the encoding is:

0 0 0 10 10 11 11 11 10 11 10 0 0 11 11

Decoding the string 0 0 0 10 10 11 11 11 10 11 10 0 0 11 11 gives back:

A A A B B C C C B C B A A C C
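As a sketch of how such a decoder can be implemented (the Python below is my own transcription, not part of the slides): accumulate bits until they match a codeword; the prefix property guarantees the first match is the intended symbol.

# Decode a bit string under the prefix code from this slide.
CODE = {"0": "A", "10": "B", "11": "C"}

def decode(bits):
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in CODE:               # prefix property: first match is the symbol
            out.append(CODE[buf])
            buf = ""
    return "".join(out)

print(decode("0001010111111101110001111"))   # -> AAABBCCCBCBAACC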
How to construct Huffman code?
• Construct a variable length code for a given file with the
following properties:
• Prefix code.
• Using shortest possible codes.
• Efficient.

• Idea: Consider a binary tree, with:
• 0 meaning a left turn
• 1 meaning a right turn

[Tree: leaf A hangs off the root's 0-edge; leaf B off the next 0-edge; C and D are the deepest leaves, on the final 0- and 1-edges.]

Consider the paths from the root to each of the leaves A, B, C, D:
A : 0
B : 10
C : 110
D : 111
Observe:
1. This is a prefix code, since each of the leaves has a path
ending in it, without continuation.
2. If the tree is full then we are not “wasting” bits.
3. If we make sure that the more frequent symbols are closer to
the root then they will have a smaller code.

Greedy Algorithm:
1. Consider all pairs <frequency, symbol>.
2. Choose the two lowest frequencies and make them brothers, with their new parent carrying the combined frequency.
3. Iterate.
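As a sketch, this pairing process can be implemented with a binary heap as the priority queue. The Python below is one plausible implementation, not the slides' own; ties between equal frequencies may be broken differently, so the exact bits can differ from the run on the next slides, but the code lengths and total file size come out the same.

import heapq

def huffman_codes(freq):
    # Heap entries are (frequency, tiebreak, tree); a tree is either a
    # symbol or a (left, right) pair of subtrees.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)          # the two lowest frequencies...
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (t1, t2)))  # ...become brothers
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")          # left turn = 0
            walk(tree[1], prefix + "1")          # right turn = 1
        else:
            codes[tree] = prefix or "0"          # single-symbol edge case
    walk(heap[0][2], "")
    return codes

print(huffman_codes({"A": 10, "B": 20, "C": 30, "D": 40, "E": 50, "F": 60}))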

Greedy Algorithm Example:

Alphabet: A, B, C, D, E, F

Frequency table:

A: 10   B: 20   C: 30   D: 40   E: 50   F: 60

Total file length: 210
Algorithm Run:

Start (sorted by frequency): A 10, B 20, C 30, D 40, E 50, F 60

Merge the two lowest, A(10) and B(20), into X(30):
X 30 [A 10, B 20], C 30, D 40, E 50, F 60

Merge X(30) and C(30) into Y(60):
Y 60 [X 30, C 30], D 40, E 50, F 60

Reordered by frequency: D 40, E 50, Y 60, F 60

Merge D(40) and E(50) into Z(90):
Z 90 [D 40, E 50], Y 60, F 60

Reordered by frequency: Y 60, F 60, Z 90

Merge Y(60) and F(60) into W(120):
W 120 [Y 60, F 60], Z 90

Reordered by frequency: Z 90, W 120

Merge Z(90) and W(120) into the root V(210).

Final tree (0 = left edge, 1 = right edge):
V 210 → Z 90 (0), W 120 (1)
Z 90 → D 40 (0), E 50 (1)
W 120 → Y 60 (0), F 60 (1)
Y 60 → X 30 (0), C 30 (1)
X 30 → A 10 (0), B 20 (1)
The Huffman encoding (read off the tree above):

A: 1000
B: 1001
C: 101
D: 00
E: 01
F: 11

File size: 10x4 + 20x4 + 30x3 + 40x2 + 50x2 + 60x2 =
40 + 80 + 90 + 80 + 100 + 120 = 510 bits
Note the savings:

The Huffman code requires 510 bits for the file.

A fixed-length code needs 3 bits for 6 characters, and the file has 210 characters: a total of 630 bits for the file.


Practical considerations
• It is not practical to create a Huffman encoding for a single
short string, such as ABRACADABRA
• To decode it, you would need the code table
• If you include the code table in the entire message, the
whole thing is bigger than just the ASCII message
• Huffman encoding is practical if:
• The encoded string is large relative to the code table, OR
• We agree on the code table beforehand
• For example, it’s easy to find a table of letter frequencies for English (or any other alphabet-based language)
Practice Example
Alphabet: A, B, C, D, E, F

Frequency table:

A: 45   B: 13   C: 12   D: 16   E: 9   F: 5

Total file length: 100

Find the fixed- and variable-length codes.


• Consider a file of 100,000 characters taken from {A, ..., F} with the frequencies shown in the table below.
• Using the fixed-length encoding requires 300,000 bits.
• Using the variable-length encoding requires only 224,000 bits.

[Huffman tree: root 100 → A:45 (0) and 55 (1); 55 → 25 (0) and 30 (1); 25 → C:12 (0) and B:13 (1); 30 → 14 (0) and D:16 (1); 14 → F:5 (0) and E:9 (1).]

Letter to be encoded    A    B    C    D    E     F
Frequency (thousands)   45   13   12   16   9     5
Fixed-length code       000  001  010  011  100   101
Variable-length code    0    101  100  111  1101  1100
Activity Selection Problem

Activity Selection Problem (ASP)
Input: A set A = {a1, a2, ..., an} of n activities, where activity ai is the half-open interval [si, fi), with si and fi its start and finish times.

Output: A set M ⊆ A such that each pair of activities in M is compatible, i.e., for all ai, aj ∈ M with i ≠ j, either si ≥ fj or sj ≥ fi, and |M| is the maximum possible, i.e., we pick as many activities as possible.

The problem involves scheduling several competing activities that require exclusive use of a common resource.

The objective in the activity-selection problem is to select a maximum-size subset of mutually compatible activities.
Activity Selection Problem (ASP)
Compatible activities: Activities ai and aj are compatible if the intervals [si, fi) and [sj, fj) do not overlap, i.e., si ≥ fj or sj ≥ fi.

i    1  2  3  4  5  6  7   8   9   10  11
si   1  3  0  5  3  5  6   8   8   2   12
fi   4  5  6  7  8  9  10  11  12  13  14

A set S of activities sorted in increasing order of finish time.

Some subsets of mutually compatible activities:
• {a3, a9, a11}, but it is not a largest subset;
• {a1, a4, a8, a11} is larger;
• {a2, a4, a9, a11} is another largest subset.
Application: Scheduling Problem
• A classroom can be used for one class at a time.
• There are n classes that want to use the classroom.
• Every class has a corresponding time interval Ij = [sj, fj) during which
the room would be needed for this class.
• Our goal is to choose a maximum number of classes that can be scheduled to use the classroom without two classes ever using the classroom at the same time.
• Assume that the classes are sorted according to increasing finish times; that is, f1 < f2 < … < fn.
• Using the brute-force method requires checking all possible combinations of activities to find the maximum subset.

[Diagram: eight class intervals (1-8) laid out on a timeline, several of them overlapping.]
A Recursive Greedy Algorithm
• The greedy algorithm first sorts the activities on their finish times and then picks them in order, such that each new activity picked is compatible with those picked previously. This algorithm solves the ASP in O(n lg n) time.

Recursive-Activity-Selector (s, f, i, j)
1 m ← i + 1
2 while m < j and sm < fi
3     do m ← m + 1
4 if m < j
5     then return {am} ∪ Recursive-Activity-Selector (s, f, m, j)
6     else return Ø
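A direct Python transcription of the pseudocode (a sketch, not the slides' code; it keeps the 1-based indexing by padding index 0, where a0 is the fictitious activity with f0 = 0 introduced on the next slide):

def recursive_activity_selector(s, f, i, j):
    m = i + 1
    while m < j and s[m] < f[i]:    # skip activities that start before a_i finishes
        m += 1
    if m < j:
        return [m] + recursive_activity_selector(s, f, m, j)
    return []

s = [None, 1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
f = [0,    4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print(recursive_activity_selector(s, f, 0, 12))   # -> [1, 4, 8, 11]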
Example: A Recursive Greedy Algorithm

i    0  1  2  3  4  5  6  7   8   9   10  11
si   -  1  3  0  5  3  5  6   8   8   2   12
fi   0  4  5  6  7  8  9  10  11  12  13  14

For the recursive greedy algorithm, the set S of activities is sorted in increasing order of finish time; a fictitious activity a0 with f0 = 0 is added so that the initial call can be Recursive-Activity-Selector(s, f, 0, n + 1).
Example: A Recursive Greedy Algorithm

[On the slides, each step below is accompanied by a timeline (0 to 14) showing the candidate activity am against the activities selected so far.]

Initial call: Recursive-Activity-Selector(s, f, 0, 12), with i = 0 and j = n + 1 = 12.

m ← i + 1 = 0 + 1 = 1
m < j (1 < 12), but s1 < f0 fails (1 ≥ 0), so the loop body is skipped
if m < j (1 < 12):
return {a1} ∪ Recursive-Activity-Selector (s, f, 1, 12)

i = 1 (selected so far: a1)
m ← i + 1 = 1 + 1 = 2
m < j (2 < 12) and s2 < f1 (3 < 4): m ← 3
m < j (3 < 12) and s3 < f1 (0 < 4): m ← 4
m < j (4 < 12), but s4 < f1 fails (5 ≥ 4), so the loop ends
if m < j (4 < 12):
return {a4} ∪ Recursive-Activity-Selector (s, f, 4, 12)

i = 4 (selected so far: a1, a4)
m ← i + 1 = 4 + 1 = 5
m < j (5 < 12) and s5 < f4 (3 < 7): m ← 6
m < j (6 < 12) and s6 < f4 (5 < 7): m ← 7
m < j (7 < 12) and s7 < f4 (6 < 7): m ← 8
m < j (8 < 12), but s8 < f4 fails (8 ≥ 7), so the loop ends
if m < j (8 < 12):
return {a8} ∪ Recursive-Activity-Selector (s, f, 8, 12)

i = 8 (selected so far: a1, a4, a8)
m ← i + 1 = 8 + 1 = 9
m < j (9 < 12) and s9 < f8 (8 < 11): m ← 10
m < j (10 < 12) and s10 < f8 (2 < 11): m ← 11
m < j (11 < 12), but s11 < f8 fails (12 ≥ 11), so the loop ends
if m < j (11 < 12):
return {a11} ∪ Recursive-Activity-Selector (s, f, 11, 12)

i = 11 (selected so far: a1, a4, a8, a11)
m ← i + 1 = 11 + 1 = 12
m < j fails (12 = 12), so return Ø

Final answer: {a1, a4, a8, a11}
An Iterative Greedy Algorithm
Iterative-Activity-Selector (s, f)
1 n ← length[s]
2 A ← {a1}
3 i ← 1
4 for m ← 2 to n
5     do if sm ≥ fi
6         then A ← A ∪ {am}
7              i ← m
8 return A
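The same data run through a Python transcription of the iterative procedure (again a sketch, with the same 1-based padding as the recursive version):

def iterative_activity_selector(s, f):
    n = len(s) - 1
    A = [1]                 # the first activity to finish is always selected
    i = 1                   # most recent activity added to A
    for m in range(2, n + 1):
        if s[m] >= f[i]:    # a_m starts after a_i finishes: compatible
            A.append(m)
            i = m
    return A

s = [None, 1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
f = [None, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print(iterative_activity_selector(s, f))   # -> [1, 4, 8, 11]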
Counting Money Problem

Counting Money Problem
Suppose you want to count out a certain amount of money, using the fewest possible bills and coins.

A greedy algorithm for this would be:
At each step, take the largest bill or coin that does not overshoot.

Example: To make $6.39, you can choose:
• a $5 bill
• a $1 bill, to make $6
• a 25¢ coin, to make $6.25
• a 10¢ coin, to make $6.35
• four 1¢ coins, to make $6.39

For US money, the greedy algorithm always gives the optimum solution.
Counting Money Problem
Greedy algorithm (C, N)
1. sort coins so that C1 ≥ C2 ≥ . . . ≥ Ck
2. S = Ø
3. Change = 0
4. i = 1                      \\ index of the coin being considered
5. while Change ≠ N do        \\ take the most valuable coins first
6.     if Change + Ci ≤ N then
7.         Change = Change + Ci
8.         S = S ∪ {Ci}
9.     else i = i + 1

(Here S is a multiset: the same coin value may be added several times.)
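A Python sketch of the same loop (my transcription; it assumes the target amount is exactly reachable with the given coins, which holds for the examples below):

def greedy_change(coins, n):
    coins = sorted(coins, reverse=True)    # C1 >= C2 >= ... >= Ck
    S, change, i = [], 0, 0
    while change != n:
        if change + coins[i] <= n:
            change += coins[i]             # take another copy of coin i
            S.append(coins[i])
        else:
            i += 1                         # coin i overshoots; try a smaller one
    return S

print(greedy_change([5000, 1000, 500, 100, 50, 20, 10], 13660))
# -> [5000, 5000, 1000, 1000, 1000, 500, 100, 50, 10]   (9 notes, as below)
print(greedy_change([10, 7, 1], 15))
# -> [10, 1, 1, 1, 1, 1]   (6 coins; the "kron" failure discussed below)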
Counting Money Problem
In Pakistan, our currency notes are:
C1 = 5000, C2 = 1000, C3 = 500, C4 = 100, C5 = 50, C6 = 20, C7 = 10

Applying the above greedy algorithm to N = 13,660, we get

S = {C1, C1, C2, C2, C2, C3, C4, C5, C7}

Does this algorithm always find an optimal solution?

For Pakistani currency the greedy algorithm always gives the optimum solution.
Counting Money Problem
A failure of the greedy algorithm:

In some (fictional) monetary system, the “kron” comes in 1 kron, 7 kron, and 10 kron coins.

Applying the greedy algorithm to count out 15 krons, we would get:
• a 10 kron piece
• five 1 kron pieces, for a total of 15 krons
• This requires six coins.

A better solution would be to use two 7 kron pieces and one 1 kron piece, requiring only three coins.

The greedy algorithm results in a solution, but not in an optimal solution.
Bin Packing Problem

Approximate Bin Packing Problem
We are given n items of sizes S1, S2, ..., Sn, where every size satisfies 0 < Si ≤ 1.

The problem is to pack these items into the fewest possible bins, given that each bin has unit capacity.

Suppose we have an item list with sizes 0.2, 0.5, 0.4, 0.7, 0.1, 0.3, 0.8. An optimal packing of these sizes is:

B1: {0.2, 0.8}   B2: {0.7, 0.3}   B3: {0.4, 0.5, 0.1}
Approximate Bin Packing Problem
Algorithm 1: Next Fit
When processing any item, we check to see whether it fits in the same bin as the last item; if it does, it is placed there; otherwise, a new bin is created.

void NextFit ( )
{
    read item1;
    while ( read item2 ) {
        if ( item2 can be packed in the same bin as item1 )
            place item2 in the bin;
        else
            create a new bin for item2;
        item1 = item2;
    } /* end-while */
}

Theorem: Let M be the optimal number of bins required to pack a list I of items. Then next fit never uses more than 2M bins. There exist sequences such that next fit uses 2M – 2 bins.
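A runnable Python sketch of next fit (my transcription of the pseudocode above; item sizes are scaled by 10 to integers so that the capacity checks stay exact, avoiding floating-point round-off):

def next_fit(items, cap=10):
    bins = []
    for item in items:
        if bins and sum(bins[-1]) + item <= cap:
            bins[-1].append(item)    # fits in the same bin as the last item
        else:
            bins.append([item])      # otherwise create a new bin
    return bins

print(next_fit([2, 5, 4, 7, 1, 3, 8]))
# -> [[2, 5], [4], [7, 1], [3], [8]]   (5 bins, matching the next slide)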
Approximate Bin Packing Problem

Next Fit for 0.2, 0.5, 0.4, 0.7, 0.1, 0.3, 0.8:

B1: {0.2, 0.5}   B2: {0.4}   B3: {0.7, 0.1}   B4: {0.3}   B5: {0.8}
Approximate Bin Packing Problem
Algorithm 2: First Fit
• We scan the bins in order and place the new item in the first bin that is large enough to hold it. A new bin is created only when the results of previous placements have left no other alternative.

void FirstFit ( )
{
    while ( read item ) {
        scan for the first bin that is large enough for item;
        if ( found )
            place item in that bin;
        else
            create a new bin for item;
    } /* end-while */
}

Theorem: Let M be the optimal number of bins required to pack a list I of items. Then first fit never uses more than 17M / 10 bins. There exist sequences such that first fit uses 17(M – 1) / 10 bins.
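First fit in the same style (again a sketch with sizes scaled to integers):

def first_fit(items, cap=10):
    bins = []
    for item in items:
        for b in bins:                 # scan the bins in order
            if sum(b) + item <= cap:   # first bin large enough for the item
                b.append(item)
                break
        else:
            bins.append([item])        # no existing bin fits: create a new one
    return bins

print(first_fit([2, 5, 4, 7, 1, 3, 8]))
# -> [[2, 5, 1], [4, 3], [7], [8]]   (4 bins, matching the next slide)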
Approximate Bin Packing Problem

First Fit for 0.2, 0.5, 0.4, 0.7, 0.1, 0.3, 0.8:

B1: {0.2, 0.5, 0.1}   B2: {0.4, 0.3}   B3: {0.7}   B4: {0.8}
Approximate Bin Packing Problem
Algorithm 3: Best Fit
• Instead of placing a new item in the first spot that is found, it is placed in the tightest spot among all bins.

Best Fit for 0.2, 0.5, 0.4, 0.7, 0.1, 0.3, 0.8:

B1: {0.2, 0.5, 0.1}   B2: {0.4}   B3: {0.7, 0.3}   B4: {0.8}

Best fit runs in T = O(N log N) time and, like first fit, never uses more than roughly 1.7M bins.
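Best fit in the same style (a sketch; "tightest spot" means the smallest remaining room that still holds the item):

def best_fit(items, cap=10):
    bins = []
    for item in items:
        candidates = [b for b in bins if sum(b) + item <= cap]
        if candidates:
            # The candidate bin with the least room left is the tightest spot.
            min(candidates, key=lambda b: cap - sum(b)).append(item)
        else:
            bins.append([item])
    return bins

print(best_fit([2, 5, 4, 7, 1, 3, 8]))
# -> [[2, 5, 1], [4], [7, 3], [8]]   (4 bins, as in the figure above)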


Approximate Bin Packing Problem
Offline Algorithms
• The major problem with all the on-line algorithms is that it is
hard to pack the large items, especially when they occur late
in the input.
• The natural way around this is to sort the items, placing the
largest items first.

• Then apply first fit or best fit, giving:
• Best Fit Decreasing
• First Fit Decreasing
Approximate Bin Packing Problem
Offline Algorithm
• Best Fit Decreasing for 0.8, 0.7, 0.5, 0.4, 0.3, 0.2, 0.1:

B1: {0.8, 0.2}   B2: {0.7, 0.3}   B3: {0.5, 0.4, 0.1}
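Given the on-line routines sketched earlier, the off-line variants are one sort away; a sketch reusing the first_fit and best_fit functions defined above:

def first_fit_decreasing(items):
    return first_fit(sorted(items, reverse=True))   # largest items first

def best_fit_decreasing(items):
    return best_fit(sorted(items, reverse=True))

print(best_fit_decreasing([2, 5, 4, 7, 1, 3, 8]))
# -> [[8, 2], [7, 3], [5, 4, 1]]   (3 bins: the optimal packing shown above)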