
Chapter 16

Greedy Algorithms
A greedy algorithm makes the choice that looks best at the
moment. This may not lead to an optimal solution.
An activity-selection problem
Given a set S = {a1, a2, ..., an} of activities that wish to use a
resource.
This resource can be used by only one activity at a time (no
overlapping).
Each activity ai has a start time si and a finish time fi.
Two activities ai and aj are compatible if the intervals [si, fi) and
[sj, fj) do not overlap.
Assume that the activities are ordered by increasing finish time:
f1 ≤ f2 ≤ ... ≤ fn.

Theorem 16.1: Consider any nonempty subproblem Sij, and let
am be the activity in Sij with the earliest finish time:
fm = min{fk : ak ∈ Sij}
Then:
1. Activity am is used in some maximum-size subset of
mutually compatible activities of Sij.
2. The subproblem Sim is empty, so choosing am leaves
the subproblem Smj as the only one that may be nonempty.

Greedy-Activity-Selection(s, f)
1  n ← length[s]
2  A ← {a1}
3  i ← 1
4  for m ← 2 to n
5      do if sm ≥ fi
6            then A ← A ∪ {am}
7                 i ← m
8  return A

Observe that it would take Θ(n lg n) time to sort the activities.
Greedy-Activity-Selection itself has time complexity Θ(n).
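The pseudocode above can be sketched in Python (an illustrative translation, not from the text); indices are 0-based here, and the activities are again assumed to be pre-sorted by finish time:

```python
def greedy_activity_selection(s, f):
    """s[m], f[m]: start and finish times, with f nondecreasing.

    Returns the indices of a maximum-size set of mutually
    compatible activities (0-indexed).
    """
    n = len(s)
    A = [0]                # always select the first activity
    i = 0                  # index of the most recently selected activity
    for m in range(1, n):
        if s[m] >= f[i]:   # a_m starts after a_i finishes: compatible
            A.append(m)
            i = m
    return A

# Example with 11 activities (already sorted by finish time)
s = [1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
f = [4, 5, 6, 7, 9, 9, 10, 11, 12, 14, 16]
print(greedy_activity_selection(s, f))  # [0, 3, 7, 10]
```

The single pass over the activities is the Θ(n) part; sorting beforehand would add the Θ(n lg n) term.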

Elements of the greedy strategy


A greedy algorithm makes locally optimal choices in order to
construct a globally optimal solution. Unlike dynamic programming,
it does not rely on first solving subproblems optimally.
Optimal substructure: an optimal solution contains within it
optimal solutions to subproblems.
A common development pattern: devise a recursive algorithm that
implements the greedy strategy, then convert the recursive
algorithm to an iterative one.

Greedy versus Dynamic Programming


Given n items, each with weight wi and value vi.
Maximize the value of the goods in a knapsack that can hold at
most W pounds.
0-1 knapsack: you must take or leave each entire item.
- Let L be the most valuable load of weight at most W.
- If item j is removed from L, the remaining load must be the most
valuable load of weight at most W − wj chosen from the n − 1
items excluding j.
Fractional knapsack problem: you may take fractions of items.
- Remove a weight w of one item j from the optimal load.
- The remaining load must be the optimal load of weight at most
W − w chosen from the other n − 1 items plus the wj − w leftover
pounds of item j.
The fractional knapsack problem can be solved with a greedy
algorithm; the 0-1 knapsack problem cannot (it calls for dynamic
programming).

Greedy algorithm for knapsack problems


Consider the fractional knapsack problem

1. Compute the value per pound for each item.
2. Sort the items by decreasing value per pound.
3. Take as much as possible of the item with the largest
value per pound.
4. Continue with the next most valuable item.
The time complexity is O(n lg n), dominated by the sorting step.
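The four steps above can be sketched as follows (an illustrative Python version, assuming weights and values are given as parallel lists):

```python
def fractional_knapsack(weights, values, W):
    # Steps 1-2: sort items by decreasing value per pound.
    items = sorted(zip(weights, values),
                   key=lambda wv: wv[1] / wv[0], reverse=True)
    total = 0.0
    for w, v in items:
        if W <= 0:
            break
        take = min(w, W)        # Step 3: as much as possible of this item
        total += v * take / w   # possibly a fraction of the item
        W -= take               # Step 4: continue with the next item
    return total

print(fractional_knapsack([10, 20, 30], [60, 100, 120], 50))  # 240.0
```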

Example: w1 = 10, w2 = 20, w3 = 30; v1 = 60, v2 = 100, v3 = 120.
Knapsack size W = 50.
What items are chosen under
- the greedy fractional knapsack?
- the greedy 0-1 knapsack?
- the optimal 0-1 knapsack?
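One way to check your answers is to compute all three quantities directly; the brute-force enumeration below is only feasible because there are just three items (illustrative code, not part of the text):

```python
from itertools import combinations

w, v, W = [10, 20, 30], [60, 100, 120], 50
order = sorted(range(len(w)), key=lambda i: v[i] / w[i], reverse=True)

# Greedy fractional: take items (possibly partially) by value per pound.
cap, frac = W, 0.0
for i in order:
    take = min(w[i], cap)
    frac += v[i] * take / w[i]
    cap -= take

# Greedy 0-1: take whole items by value per pound, skipping misfits.
cap, greedy01 = W, 0
for i in order:
    if w[i] <= cap:
        greedy01 += v[i]
        cap -= w[i]

# Optimal 0-1: brute force over all subsets of items.
opt01 = max(sum(v[i] for i in S)
            for r in range(len(w) + 1)
            for S in combinations(range(len(w)), r)
            if sum(w[i] for i in S) <= W)

print(frac, greedy01, opt01)  # 240.0 160 220
```

The gap between the greedy and optimal 0-1 values is exactly why 0-1 knapsack needs dynamic programming.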

Huffman Codes
Huffman coding is a scheme based on the frequency of occurrence
of each character; it is effective for data compression.
Let a data file consist of the characters a, b, c and d.
Represent each character using 2 bits (a : 00, b : 01, c : 10, d : 11).
Using this encoding, a data file with n of these characters is
always encoded using 2n bits.
Suppose we know more about the data stream, namely the
frequency of occurrence of each character.
Say the frequencies of a, b, c, d are 10, 2, 8, 1, respectively.
Change the encoding to a : 0, c : 10, b : 110, d : 111.
Encoding abacacd then requires 14 bits with the fixed-length code
but only 1+3+1+2+1+2+3 = 13 bits with the variable-length encoding.
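The two bit counts are easy to verify directly (a small illustrative check):

```python
# Fixed-length and variable-length (prefix) codes from the text.
fixed = {'a': '00', 'b': '01', 'c': '10', 'd': '11'}
prefix = {'a': '0', 'b': '110', 'c': '10', 'd': '111'}

msg = "abacacd"
print(sum(len(fixed[ch]) for ch in msg))   # 14
print(sum(len(prefix[ch]) for ch in msg))  # 13
```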

The variable-length encoding above is a prefix code: no codeword
is a prefix of another codeword.
A full binary tree can be used for the decoding process.
- In a full tree, every node is either a leaf or has two children.
- Do not confuse "full" with "complete": a heap was a
"nearly complete" binary tree.
Given an alphabet C, we need a binary tree with |C| leaves and
|C| − 1 internal nodes.
Let f(c) denote the frequency of character c in the file, and
let dT(c) denote the depth of c's leaf in the tree.
The number of bits required to encode the file is
B(T) = Σc∈C f(c) dT(c)
To implement the algorithm we use a min-priority queue Q (for
example a min-heap, which maintains the min-heap property).

HUFFMAN(C)
1  n ← |C|
2  Q ← C
3  for i ← 1 to n − 1
4      do allocate a new node z
5         left[z] ← x ← EXTRACT-MIN(Q)
6         right[z] ← y ← EXTRACT-MIN(Q)
7         f[z] ← f[x] + f[y]
8         INSERT(Q, z)
9  return EXTRACT-MIN(Q)

The time complexity of the algorithm is O(n lg n): the loop
iterates n − 1 times, and each heap operation takes O(lg n).
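A runnable sketch of HUFFMAN(C) using Python's heapq module as the min-priority queue Q (an illustrative translation; the node representation as tuples is my own choice, not from the text):

```python
import heapq
from itertools import count

def huffman(freqs):
    """freqs maps each character to its frequency.

    Queue entries are (frequency, tiebreaker, node); a node is a
    (char, left, right) tuple, with char=None for merged nodes.
    """
    tie = count()  # breaks frequency ties so tuples stay comparable
    Q = [(f, next(tie), (c, None, None)) for c, f in freqs.items()]
    heapq.heapify(Q)
    for _ in range(len(freqs) - 1):        # lines 3-8: n - 1 merges
        fx, _, x = heapq.heappop(Q)        # EXTRACT-MIN
        fy, _, y = heapq.heappop(Q)        # EXTRACT-MIN
        heapq.heappush(Q, (fx + fy, next(tie), (None, x, y)))
    return Q[0][2]                         # line 9: root of the code tree

def codes(node, word=""):
    """Read codewords off the tree: left edge = 0, right edge = 1."""
    c, left, right = node
    if c is not None:
        return {c: word or "0"}
    return {**codes(left, word + "0"), **codes(right, word + "1")}

freqs = {'a': 10, 'b': 2, 'c': 8, 'd': 1}
cw = codes(huffman(freqs))
print(sum(freqs[c] * len(cw[c]) for c in freqs))  # total bits B(T): 35
```

The codewords may differ from the slide's (ties and left/right choices can go either way), but the cost B(T) = 35 bits is the same.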

Lemma 16.2: Let C be an alphabet in which each character
c ∈ C has frequency f(c). Let x and y be two characters in C
having the lowest frequencies. Then there exists an optimal prefix
code for C in which the codewords for x and y have the same
length and differ only in the last bit.
Proof:
Let T be an arbitrary optimal prefix code tree. Let a and b be two
characters that are sibling leaves at maximum depth in T.
Assume that f(a) ≤ f(b) and f(x) ≤ f(y).
Since x and y have the lowest frequencies, f(x) ≤ f(a) and
f(y) ≤ f(b).
Swap a and x in T to form T'.
Swap b and y in T' to obtain T''.

The difference in cost between T and T' is
B(T) − B(T') = f(x)dT(x) + f(a)dT(a) − f(x)dT'(x) − f(a)dT'(a)
= f(x)dT(x) + f(a)dT(a) − f(x)dT(a) − f(a)dT(x)
= −f(x)(dT(a) − dT(x)) + f(a)(dT(a) − dT(x))
= (f(a) − f(x))(dT(a) − dT(x)) ≥ 0,
since f(x) ≤ f(a) (x has minimum frequency) and dT(x) ≤ dT(a)
(a has maximum depth in T).
Similarly B(T') − B(T'') ≥ 0.
Thus B(T'') ≤ B(T).
Since T is optimal, B(T'') = B(T).
Thus T'' is also optimal, and in it x and y appear as siblings of
maximum depth.
At each step of the Huffman algorithm we can therefore make the
greedy decision: merge the two elements with the lowest frequencies.

Lemma 16.3: Let T be an optimal prefix code binary tree over
the alphabet C, where f(c) is the frequency of each c ∈ C.
Consider any two characters x and y appearing as siblings in T
with parent z, and treat z as a hybrid character with frequency
f(z) = f(x) + f(y). Then the tree T' = T \ {x, y} represents an
optimal prefix code for the alphabet C' = (C \ {x, y}) ∪ {z}.
Proof:
For each c ∈ C \ {x, y}, dT(c) = dT'(c), so f(c)dT(c) = f(c)dT'(c).
Also dT(x) = dT(y) = dT'(z) + 1, so
f(x)dT(x) + f(y)dT(y) = (f(x) + f(y))(dT'(z) + 1)
or f(x)dT(x) + f(y)dT(y) = f(z)dT'(z) + f(x) + f(y).
Thus B(T) = B(T') + f(x) + f(y).
If T' represented a nonoptimal prefix code for C', there would
exist a tree T'' for C' such that B(T'') < B(T').

Since z is a character in C', it is a leaf in T''.
If we add x and y back as the children of z in T'', we obtain a
prefix code for C with cost B(T'') + f(x) + f(y) < B(T') + f(x) + f(y)
= B(T), contradicting the optimality of T.
Therefore T' must be optimal for the alphabet C'.
Theorem: HUFFMAN produces an optimal prefix code.
