
Algorithm Design and Complexity

Course 5
Overview
 Greedy Algorithms
 Activity Selection
 Huffman Trees
 Greedy vs Dynamic Programming
 Knapsack Problem
 Greedy
 DP
 Generic problem
Greedy Algorithms
 Efficient method to solve some optimization problems
 The solution to an optimization problem must be a global
optimum
 This is more difficult to verify than a local optimum
 Simplification: choose the solution that looks best at each step
 This is called a locally optimal solution

 Advantages:
 Simpler to build the solution
 Less time / Better complexity
 Disadvantage:
 The locally optimal solution does not always lead to the globally
optimal solution
 May not correctly solve the problem (but may provide good
approximations)
Greedy Algorithms (2)
 At each step, we choose the best solution according to
the local optimum (greedy) choice
 We abandon all the other possible solutions
 The solving paths that are not considered by the greedy choice
are discarded!

 We'll look at two problems that have a greedy solution
that leads to the global optimum as well
 Activity selection
 Huffman trees

 Greedy is an algorithm design technique (pattern)!


General Greedy Scheme
SolveGreedy(Local_choice, Problem)
    partial_sols = InitialSolution(Problem); // determine the starting point
    final_sols = ∅;
    WHILE (partial_sols ≠ ∅)
        FOREACH (s IN partial_sols)
            IF (s is a solution for Problem) {
                final_sols = final_sols U {s};
                partial_sols = partial_sols \ {s};
            } ELSE // can you optimize current solving path locally ?
                IF (CanOptimize(s, Local_choice, Problem)) // YES
                    partial_sols = (partial_sols \ {s}) U
                                   OptimizeLocally(s, Local_choice, Problem)
                ELSE partial_sols = partial_sols \ {s}; // NO
    RETURN final_sols;

 Most of the time we follow only a single solving path!


Activity Selection Problem
 Given a set of n activities that require exclusive use of a
common resource for a given period of time, determine
the largest subset of non-overlapping activities
 These activities are called mutually compatible
 There might be more than one optimal solution
 We want to identify one of these best solutions
 As with DP, the method is not suited to finding all the optimal solutions
 Notations:
 S = {a1, … , an} are the activities
 Each activity has a start time, si, and a finish time, fi
 Each activity requires the common resource for the interval [si, fi)
Activity Selection Problem (2)
 E.g.
 Activity = classes, Resource = classroom
 Activity = processes, Resource = CPU
 There exist some other activity selection problems that
are more difficult:
 Maximize the usage time of the resource
 Maximize income if each activity pays for the usage of the
resource
Example – from CLRS
 We can devise a greedy solution if we consider the
activities sorted by their finish times
i      1   2   3   4   5   6   7   8   9
s[i]   1   2   4   1   5   8   9  11  13
f[i]   3   5   7   8   9  10  11  14  16

 Solution: {a1, a3, a6, a8}
 Not unique: {a2, a5, a7, a9} has the same size
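Both answers can be checked mechanically. The sketch below assumes the half-open intervals [si, fi) used in this lecture; `is_compatible` and `ACTIVITIES` are hypothetical names introduced only for this illustration.

```python
# Activities from the example table, indexed 1..9 as (start, finish).
ACTIVITIES = {
    1: (1, 3), 2: (2, 5), 3: (4, 7), 4: (1, 8), 5: (5, 9),
    6: (8, 10), 7: (9, 11), 8: (11, 14), 9: (13, 16),
}


def is_compatible(indices, activities=ACTIVITIES):
    """Return True if no two chosen activities overlap (intervals are [s, f))."""
    chosen = sorted(activities[i] for i in indices)  # sort by start time
    # Once sorted by start time, checking neighbouring pairs is sufficient.
    return all(prev_f <= next_s
               for (_, prev_f), (next_s, _) in zip(chosen, chosen[1:]))


print(is_compatible([1, 3, 6, 8]))  # True
print(is_compatible([2, 5, 7, 9]))  # True
print(is_compatible([1, 2]))        # False: a2 starts before a1 finishes
```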
Define the Sub-Problems
 First, define the similar sub-problems
 Let's consider the subset of activities that:
 Start after ai finishes (start after fi)
 Finish before aj starts (finish before sj)
 They are compatible with all activities that:
 Finish before fi
 Start after sj

 Si,j = {all ak in S | fi <= sk < fk <= sj}

 We also add two invented activities:


 a0 = [-INF, 0)
 an+1 = [INF, INF + 1)
Define the Sub-Problems (2)
 S0,n+1 = S = the entire set of activities

 When the activities are sorted by their finish time:
 f0 <= f1 <= f2 <= … <= fn <= fn+1
 Si,j = ∅ if i >= j
 Proof: if ak is in Si,j, then fi <= sk < fk <= sj < fj
 => fi < fj, which is impossible when i >= j
 Therefore, the sub-problems are Si,j with 0 <= i < j <= n+1
Optimal Substructure
 Suppose an optimal solution to Si,j includes the activity ak
 Then, we need to solve two sub-problems:
 Si,k: all activities that start after ai and finish before ak
 Sk,j: all activities that start after ak and finish before aj
 Therefore, the solution to Si,j is made of:
 The solution to Si,k
 ak
 The solution to Sk,j
 Because ak is compatible with both Si,k and Sk,j
 |Solution to Si,j| = |Solution to Si,k| + 1 + |Solution to Sk,j|
Optimal Substructure (2)
 If an optimal solution to Si,j includes ak, then the sub-solutions
for Si,k and Sk,j must also be optimal
 Ai,j = optimal solution for Si,j
 Ai,j = Ai,k U {ak} U Ak,j
 This holds if Si,j is not empty and we know ak
 c[i, j] = |Ai,j| = maximum size of a subset of mutually
compatible activities in Si,j
 c[i, j] = 0 if i >= j
Recursive Formulation
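 As we do not know the value of k, we must try all the
possible choices in order to find it
 c[i, j] = max over i < k < j with ak in Si,j of (c[i, k] + c[k, j] + 1), if Si,j is not empty
 c[i, j] = 0, if Si,j is empty

 Now, we can solve this problem using DP
 O(n^2) sub-problems
 O(n) choices at each step
 O(n^3) complexity for the DP solution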
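A minimal sketch of this DP, assuming the activities are already sorted by finish time, all start times are non-negative, and an activity that finishes exactly when another starts counts as compatible (intervals are [si, fi)). It fills c[i][j] = max over k of c[i][k] + 1 + c[k][j], with the two sentinel activities a0 and an+1 added internally. The function name `dp_activity_selection` is an assumption made for this example.

```python
def dp_activity_selection(s, f):
    """Size of a maximum set of mutually compatible activities (O(n^3) DP).

    s, f: start and finish times of a_1..a_n, sorted by finish time.
    """
    n = len(s)
    inf = float("inf")
    # Sentinels: a_0 = [-inf, 0) and a_{n+1} = [inf, inf).
    s = [-inf] + list(s) + [inf]
    f = [0] + list(f) + [inf]

    # c[i][j] = best size for sub-problem S_ij; 0 whenever S_ij is empty.
    c = [[0] * (n + 2) for _ in range(n + 2)]
    for length in range(2, n + 2):            # length = j - i
        for i in range(0, n + 2 - length):
            j = i + length
            best = 0
            for k in range(i + 1, j):         # try every candidate a_k
                if f[i] <= s[k] and f[k] <= s[j]:   # a_k belongs to S_ij
                    best = max(best, c[i][k] + 1 + c[k][j])
            c[i][j] = best
    return c[0][n + 1]


print(dp_activity_selection([1, 2, 4, 1, 5, 8, 9, 11, 13],
                            [3, 5, 7, 8, 9, 10, 11, 14, 16]))  # 4
```

The three nested loops make the cubic cost visible: O(n^2) pairs (i, j), each trying O(n) values of k.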

 We can find a better one by using a greedy strategy!


Greedy Choice
 Theorem
 If Si,j is not empty and am is the activity with the earliest
finish time in Si,j
 Then am belongs to at least one maximum-size
subset of mutually compatible activities in Si,j
 Si,m = ∅, therefore only Sm,j needs to be solved
 Proof idea: in any optimal solution to Si,j, we can replace the
activity that finishes earliest in this solution (let's call it ak) with
am; the activities are still mutually compatible, as
am finishes no later than ak
Greedy Choice (2)
 The previous theorem offers the greedy choice
 The number of sub-problems considered in the optimal
solution at each step:
 DP: 2
 Greedy: 1
 The number of choices to be considered at each step:
 DP: j-i-1
 Greedy: 1

 As we have a single choice and a single sub-problem to
solve, we can solve the problem top-down
Greedy Solution
 In order to solve Si,j
 Just choose the activity with the earliest finish time in Si,j
 am
 Then, solve Sm,j

 In order to solve S = S0,n+1


 First choice am1 (is always a1 – why?) for S0,n+1
 Then need to solve Sm1,n+1
 Second choice am2 for Sm1,n+1
 Then need to solve Sm2,n+1
 …
Recursive Algorithm
 Because the greedy algorithm considers the activities sorted by their finish
time, we first need to sort them by finish time!
 O(n log n)

RecursiveActivitySelection(s, f, i, n)
    m = i + 1
    WHILE (m <= n AND s[m] < f[i])
        m++    // skip the activities that start before activity i finishes
    // am is now the first activity (earliest finish time) that starts after ai finishes
    IF (m <= n) THEN
        RETURN {am} U RecursiveActivitySelection(s, f, m, n)
    RETURN ∅

Initial call: RecursiveActivitySelection(s, f, 0, n), with f[0] = 0 for the invented activity a0

Complexity: Θ(n) – go through each activity once
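The recursive pseudocode translates almost line by line into Python. This is a sketch under the same assumptions as the slides: arrays are 1-indexed, sorted by finish time, and position 0 holds the invented activity a0 with f[0] = 0. The function name is a direct transliteration, not an established API.

```python
def recursive_activity_selection(s, f, i, n):
    """Return the indices of a maximum set of compatible activities in S_{i,n+1}."""
    m = i + 1
    # Skip every activity that starts before activity i finishes.
    while m <= n and s[m] < f[i]:
        m += 1
    if m <= n:
        # a_m has the earliest finish time among the remaining compatible ones.
        return [m] + recursive_activity_selection(s, f, m, n)
    return []


# Index 0 is the sentinel a_0; s[0] is never read.
s = [None, 1, 2, 4, 1, 5, 8, 9, 11, 13]
f = [0, 3, 5, 7, 8, 9, 10, 11, 14, 16]
print(recursive_activity_selection(s, f, 0, 9))  # [1, 3, 6, 8]
```

For large n, Python's recursion limit makes the iterative form preferable.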
Iterative Algorithm
 Can turn the recursive algorithm into an iterative one

IterativeActivitySelection(s, f, n)
    A = {a1}
    i = 1
    FOR (m = 2..n)
        IF (s[m] < f[i])
            CONTINUE
        ELSE
            A = A U {am}
            i = m
    RETURN A

Complexity: Θ(n) – go once through each activity
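A minimal Python sketch of the iterative version. Unlike the pseudocode, it takes a list of (start, finish) pairs and does the sorting itself, so the whole call costs O(n log n); the function name and input format are assumptions made for this example.

```python
def iterative_activity_selection(activities):
    """Greedy activity selection over a list of (start, finish) pairs."""
    selected = []
    last_finish = float("-inf")   # nothing selected yet
    # Preprocessing: consider the activities by increasing finish time.
    for start, finish in sorted(activities, key=lambda a: a[1]):
        if start >= last_finish:  # compatible with the last selected activity
            selected.append((start, finish))
            last_finish = finish
    return selected


starts = [1, 2, 4, 1, 5, 8, 9, 11, 13]
finishes = [3, 5, 7, 8, 9, 10, 11, 14, 16]
print(iterative_activity_selection(list(zip(starts, finishes))))
# [(1, 3), (4, 7), (8, 10), (11, 14)], i.e. {a1, a3, a6, a8}
```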


Huffman Trees
 Efficient method of compressing files
 Especially text files

 Builds a Huffman tree in a greedy fashion
 The tree is specific to the text/file being encoded
 It is used for compressing the file
 The compressed file and the Huffman tree are used to
recreate the original file

 Example text: “ana are mere”


Huffman Trees (2)
 K – set of keys that are encoded (the characters in the original
text file)
 In the original text, all the keys are represented on the same
number of bits
 Objective: we want to find an alternative representation for
each key such that:
 The keys that are most frequent are represented on a smaller
number of bits than the ones that are less frequent
 We are able to distinguish easily in this new representation what are
the keys that were in the original file

 Example: text files


 Original representation: char – 8 bits or ASCII – 7 bits
 New representation: 1 bit for the most frequent character in the
encoded text and so on…
Huffman Trees (3)
 Huffman encoding tree:
 An ordered binary tree
 Only the leaves contain the keys from the set K
 All internal nodes must have exactly 2 children
 The edges are coded:
 0 – left edge
 1 – right edge
 The code in the new representation for each key is the sequence of
edge codes on the path from the root to the leaf containing that key
 Start from the frequency of appearance of each key in the
original file: p(k) for each k in K
 Example: “ana are mere”
 p(a) = p(e) = 0.25; p(n) = p(m) = 0.083; p(r) = p( ) = 0.166
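The frequencies can be computed directly from the text. A small sketch using exact fractions, so that 0.083 and 0.166 appear as 1/12 and 1/6; `key_frequencies` is a hypothetical helper name.

```python
from collections import Counter
from fractions import Fraction


def key_frequencies(text):
    """Return p(k) for every key k that occurs in text."""
    counts = Counter(text)
    total = len(text)
    return {key: Fraction(count, total) for key, count in counts.items()}


p = key_frequencies("ana are mere")
print(p["a"], p["n"], p[" "])  # 1/4 1/12 1/6
```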
The Huffman Tree
 T – encoding tree for the set of keys K
 code_length(k) – the length of the code for key k in tree T
 level(k, T) – the level in tree T for the leaf corresponding to key k

 The cost of an encoding tree T for a set of keys K that have the
frequencies p:
$$\mathrm{Cost}(T) = \sum_{k \in K} \mathrm{code\_length}(k) \cdot p(k) = \sum_{k \in K} \mathrm{level}(k, T) \cdot p(k)$$

 Huffman Tree = An encoding tree of minimum cost for a set of keys
K with frequencies p
 The codes in this tree are called Huffman codes
 Optimization problem!
Building the Huffman Tree
 We can devise a greedy algorithm for building a Huffman tree
for any set of keys K
 Steps:
1. For each key k in K build a simple tree with a single node
that contains k and has the weight w = p(k). Let the forest of
trees be called Forest.
2. Choose any two trees from Forest that have the minimum
weights. Let them be t1 and t2.
3. Remove t1 and t2 from Forest and add a new tree:
a) That has a new root r that does not contain any key (as it is not a
leaf)
b) The two children of r are t1 and t2, respectively.
c) The weight of the new tree is w(r) = w(t1) + w(t2)
4. Repeat steps 2 and 3 until Forest contains a single tree
 => the Huffman tree
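The steps above can be sketched in Python with a binary heap as the forest, which makes "choose the two trees of minimum weight" an O(log n) operation. This is one possible implementation, not necessarily the one developed in the lecture: the tuple representation (a leaf is the key itself, an internal node is a pair `(left, right)`), the tie-breaking counter and the function names are all assumptions. Keys are assumed to be single characters and K is assumed non-empty.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_tree(weights):
    """weights: dict key -> weight. Returns a nested-tuple Huffman tree."""
    tiebreak = count()  # unique counter, so equal weights never compare trees
    # Step 1: one single-node tree per key.
    forest = [(w, next(tiebreak), key) for key, w in weights.items()]
    heapq.heapify(forest)
    # Steps 2-4: repeatedly merge the two trees of minimum weight.
    while len(forest) > 1:
        w1, _, t1 = heapq.heappop(forest)
        w2, _, t2 = heapq.heappop(forest)
        heapq.heappush(forest, (w1 + w2, next(tiebreak), (t1, t2)))
    return forest[0][2]


def huffman_codes(tree, prefix=""):
    """Read the codes off the tree: 0 for a left edge, 1 for a right edge."""
    if not isinstance(tree, tuple):          # leaf
        return {tree: prefix or "0"}         # lone key still needs one bit
    left, right = tree
    codes = huffman_codes(left, prefix + "0")
    codes.update(huffman_codes(right, prefix + "1"))
    return codes


weights = Counter("ana are mere")            # occurrence counts work as weights
codes = huffman_codes(build_huffman_tree(weights))
total_bits = sum(weights[k] * len(codes[k]) for k in weights)
print(total_bits)  # 30 bits for 12 characters
```

Ties between equal weights may be broken differently than on the whiteboard, so individual codes can differ, but the total cost is the same for every Huffman tree of the same input.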
Example
 Input: “ana are mere”

 p('a') = p('e') = 0.25; p('n') = p('m') = 0.083; p('r') = p(' ') = 0.166

 Initially (weights truncated to two decimals):

W(a) = 0.25   W(e) = 0.25   W(r) = 0.16   W( ) = 0.16   W(m) = 0.08   W(n) = 0.08
Example – Building the Huffman Tree
 Intermediate steps: on whiteboard
 Solution:
 Encoding: 'a' : 00 , 'e' : 11 , 'r' : 010 , ' ' : 011 , 'm' : 100 , 'n' : 101
 Cost of the tree: Cost(Tree) = 2 * 0.25 + 2 * 0.25 + 3 * 0.083
+ 3 * 0.083 + 3 * 0.166 + 3 * 0.166 ≈ 2.5 bits per key
 Huffman Tree (0 = left edge, 1 = right edge):

root
    0: W(a+r+ )=0.57
        0: W(a)
        1: W(r+ )=0.32
            0: W(r)
            1: W( )
    1: W(m+n+e)=0.41
        0: W(m+n)=0.16
            0: W(m)
            1: W(n)
        1: W(e)
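The cost formula can be evaluated exactly for this encoding. The sketch below uses the code table from the example and the exact frequencies 3/12, 2/12 and 1/12 instead of their rounded values; `tree_cost` and `is_prefix_free` are hypothetical helper names.

```python
from fractions import Fraction


def tree_cost(codes, p):
    """Cost(T) = sum over all keys k of code_length(k) * p(k)."""
    return sum(len(codes[k]) * p[k] for k in p)


def is_prefix_free(codes):
    """True if no code is a prefix of another (keys only at the leaves)."""
    words = sorted(codes.values())
    # In sorted order, a word that is a prefix of another one is also a
    # prefix of its immediate successor.
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))


codes = {"a": "00", "e": "11", "r": "010", " ": "011", "m": "100", "n": "101"}
p = {"a": Fraction(3, 12), "e": Fraction(3, 12), "r": Fraction(2, 12),
     " ": Fraction(2, 12), "m": Fraction(1, 12), "n": Fraction(1, 12)}

print(tree_cost(codes, p))    # 5/2, i.e. 2.5 bits per key on average
print(is_prefix_free(codes))  # True
```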
Algorithm for building the Huffman Tree
 On the whiteboard
 Straightforward from the pseudocode
Decoding the File
 Encoded text:
 0010100011000101000111001101011
 a n a „‟ a r e „‟ m e r e
 We also need the Huffman tree

 Starting from the first bit, we walk the tree from the root
to the first leaf we encounter
 When at a leaf, append the key corresponding to that leaf to
the decoded text
 Go to the root again and repeat until we reach the end of the
encoded text
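The decoding walk can be sketched as follows. The tree is the one from the example, written as nested pairs `(left, right)` with keys at the leaves; this representation and the names `decode`, `TREE` and `CODES` are assumptions made for the illustration. The bit string is produced by encoding the example text with the example's code table.

```python
def decode(bits, tree):
    """Walk the tree from the root for each bit; emit a key at every leaf."""
    decoded = []
    node = tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]   # 0 = left, 1 = right
        if not isinstance(node, tuple):             # reached a leaf
            decoded.append(node)
            node = tree                             # go back to the root
    return "".join(decoded)


# Left subtree holds a, r, ' '; right subtree holds m, n, e.
TREE = (("a", ("r", " ")), (("m", "n"), "e"))
CODES = {"a": "00", "e": "11", "r": "010", " ": "011", "m": "100", "n": "101"}

encoded = "".join(CODES[ch] for ch in "ana are mere")
print(len(encoded))            # 30
print(decode(encoded, TREE))   # ana are mere
```

No separator between codes is needed because no code is a prefix of another one.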
Greedy Algorithms – Conclusions
 Greedy algorithms that build the globally optimal solution can
be devised for some problems that have an optimal
substructure
 Steps for devising a greedy algorithm:
 Determine the optimal substructure
 Develop a recursive solution
 Prove that at any stage of recursion, one of the optimal choices
is the greedy choice. Therefore, it's always safe to make the
greedy choice
 Show that all but one of the sub-problems resulting from the
greedy choice are empty
 Develop a recursive greedy algorithm
 Convert it to an iterative algorithm
Greedy Algorithms – Conclusions (2)
 Properties for optimization problems that accept correct
greedy solutions:
 Optimal substructure
 Greedy choice property

 Preprocessing is essential for efficient greedy algorithms:
 E.g. sort the data before processing it with the greedy
algorithm
Greedy vs. DP
 Similarities:
 Optimization problems
 Optimal substructure (including division into sub-problems)
 Make a choice at each step
 Differences:
 Greedy: 1 choice, 1 sub-problem to be solved
 Greedy is top-down, DP is bottom-up
 Greedy has the greedy choice property
 Greedy does not use memoization as the other sub-problems
are not important (they are discarded if they are not used by
the greedy choice)
Knapsack Problem
 Given a set of n items:
 Values v[i]
 Weights w[i]
 Which items should be carried in order to
maximize the total value that can be carried in a knapsack
that holds a total weight of at most W?
 Optimization problem

 Similar to the change-making problem
 Given a set of denominations (coins and banknotes for a currency),
find the minimum number of coins and banknotes needed to
change a given amount of money
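Change-making is a convenient place to see a greedy choice fail. The sketch below compares a greedy strategy (always take the largest denomination that fits) with a DP over amounts. The denominations {1, 3, 4} are a made-up example, not a real currency, and both function names are assumptions.

```python
def greedy_change(amount, denominations):
    """Always take the largest denomination that still fits."""
    coins = []
    for d in sorted(denominations, reverse=True):
        while amount >= d:
            coins.append(d)
            amount -= d
    return coins if amount == 0 else None   # None: greedy got stuck


def dp_change(amount, denominations):
    """Minimum number of coins for amount, or None if it cannot be changed."""
    inf = float("inf")
    best = [0] + [inf] * amount             # best[a] = fewest coins for a
    for a in range(1, amount + 1):
        for d in denominations:
            if d <= a and best[a - d] + 1 < best[a]:
                best[a] = best[a - d] + 1
    return best[amount] if best[amount] != inf else None


print(len(greedy_change(6, [1, 3, 4])))  # 3 coins: 4 + 1 + 1
print(dp_change(6, [1, 3, 4]))           # 2 coins: 3 + 3
```

For denominations such as {1, 5, 10, 25} both functions agree on the amounts tried in the test below, which is why the greedy strategy feels natural for everyday currencies.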
Knapsack Problem (2)
 Can be solved efficiently if:
 We are allowed to carry fractions of the items
 Fractional knapsack problem
 Greedy solution: sort the items by the ratio v[i]/w[i] and
take them in decreasing order of this ratio until the knapsack
is full (the last item may be taken only partially)

 We are not allowed to carry fractions of the items
 Integer (0/1) knapsack problem
 But the weights (or the values) are relatively small integers
 DP solution: on whiteboard
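Both efficient cases can be sketched in a few lines. The greedy function follows the ratio rule stated above; the DP is one standard formulation over capacities and is not necessarily the version developed on the whiteboard. The function names and the sample data are illustrative assumptions, and all weights are assumed to be positive (integers for the DP).

```python
def fractional_knapsack(values, weights, capacity):
    """Greedy: take items by decreasing v/w ratio, splitting the last one."""
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    total = 0.0
    for i in order:
        if capacity <= 0:
            break
        take = min(weights[i], capacity)         # whole item or the remainder
        total += values[i] * take / weights[i]
        capacity -= take
    return total


def knapsack_01(values, weights, capacity):
    """DP over capacities: O(n * capacity) time, integer weights only."""
    best = [0] * (capacity + 1)                  # best[c] = max value within c
    for v, w in zip(values, weights):
        # Go downwards so that each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]


values, weights = [60, 100, 120], [10, 20, 30]
print(fractional_knapsack(values, weights, 50))  # 240.0
print(knapsack_01(values, weights, 50))          # 220
```

On this data the ratio-based greedy choice would pick the first item, yet the best 0/1 solution (items 2 and 3) leaves it out, which is why the integer version needs DP.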
Knapsack Problem (3)
 However, in the general case:
 Real values for weights
 Very high values for weights
 No polynomial-time algorithm is known; exact solutions rely on
exhaustive search, e.g. a backtracking approach
 The decision version of the problem is NP-complete
 NP-complete problems are the most difficult problems in NP
(at this moment, no polynomial-time algorithm is known for any of
them, and it's widely believed that none exists)
References
 CLRS – Chapter 16
 MIT OCW – Introduction to Algorithms – video lecture 16
 http://www.math.fau.edu/locke/Greedy.htm
