Dynamic Programming
Greedy Algorithms
1
1. Dynamic programming
2
Four steps of dynamic programming
3
Example 1: Matrix-chain multiplication
Example: A1: 10 × 100, A2: 100 × 5, A3: 5 × 50
((A1A2)A3) needs
10·100·5 + 10·5·50 = 5000 + 2500 = 7500 scalar multiplications.
(A1(A2A3)) needs
100·5·50 + 10·100·50 = 25000 + 50000 = 75000 scalar
multiplications.
5
Problem statement
6
The structure of an optimal parenthesization
7
Represent a recursive solution
m[i, j] = 0                                                      if i = j,
m[i, j] = min { m[i, k] + m[k+1, j] + p(i-1) p(k) p(j) : i ≤ k < j }   if i < j.   (5.2)
To help us keep track of how to construct an optimal
solution, let us define:
9
The important observation
10
Step 3: Computing the optimal costs
Instead of computing the solution to recurrence formula (5.2)
by a recursive algorithm, we perform the third step of the
dynamic programming paradigm and compute the optimal cost
by using a bottom-up approach.
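The bottom-up computation can be sketched in Python. The function name `matrix_chain_order` and the indexing details are ours; the dimension sequence 30, 35, 15, 5, 10, 20, 25 is the standard textbook instance that reproduces the cost table of Figure 5.1 (the slides do not list the dimensions explicitly):

```python
def matrix_chain_order(p):
    """Bottom-up DP for recurrence (5.2).

    p[0..n] holds the matrix dimensions: A_i is p[i-1] x p[i].
    Returns tables m (optimal costs) and s (optimal split points),
    indexed from 1 as in the slides.
    """
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):           # chain length
        for i in range(1, n - length + 2):
            j = i + length - 1
            m[i][j] = float('inf')
            for k in range(i, j):            # try every split point
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j] = q
                    s[i][j] = k
    return m, s

# The six-matrix instance behind Figure 5.1:
# A1..A6 with dimensions 30x35, 35x15, 15x5, 5x10, 10x20, 20x25
m, s = matrix_chain_order([30, 35, 15, 5, 10, 20, 25])
```

`m[1][6]` then holds the minimum total number of scalar multiplications, and `s` records where each optimal split occurs.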
12
An example of matrix-chain multiplication
Since we have defined m[i, j] only for i ≤ j, only the portion
of the table m on and above the main diagonal is used.
13
An example of matrix-chain multiplication (cont.)
Table m (Figure 5.1)
        i=1     i=2     i=3     i=4     i=5     i=6
j=6     15125   10500   5375    3500    5000    0
j=5     11875   7125    2500    1000    0
j=4     9375    4375    750     0
j=3     7875    2625    0
j=2     15750   0
j=1     0

Table s (Figure 5.1)
        i=1     i=2     i=3     i=4     i=5
j=6     3       3       3       5       5
j=5     3       3       3       4
j=4     3       3       3
j=3     1       2
j=2     1
14
An example of matrix-chain multiplication (cont.)
m[2, 5] = min { m[2, 2] + m[3, 5] + p1 p2 p5 = 0 + 2500 + 35·15·20 = 13000,
                m[2, 3] + m[4, 5] + p1 p3 p5 = 2625 + 1000 + 35·5·20 = 7125,
                m[2, 4] + m[5, 5] + p1 p4 p5 = 4375 + 0 + 35·10·20 = 11375 }
        = 7125, achieved at k = 3 for A2..5, so s[2, 5] = 3.
15
Step 4: Constructing an optimal solution
16
Compute the result
17
Elements of dynamic programming
18
Overlapping subproblems
19
Example 2: Longest common subsequence
20
Characterizing a longest common subsequence
21
A recursive solution to subproblems
23
procedure LCS-LENGTH(X, Y)
begin
  m := length[X]; n := length[Y];
  for i := 1 to m do c[i, 0] := 0;
  for j := 1 to n do c[0, j] := 0;
  for i := 1 to m do
    for j := 1 to n do
      if xi = yj then
        begin c[i, j] := c[i-1, j-1] + 1; b[i, j] := “↖” end
      else if c[i-1, j] >= c[i, j-1] then
        begin c[i, j] := c[i-1, j]; b[i, j] := “↑” end
      else
        begin c[i, j] := c[i, j-1]; b[i, j] := “←” end
end;
The following figure 5.2 shows the table c for the example.
24
Table c (Figure 5.2)
  yj:      B  D  C  A  B  A
xi      0  0  0  0  0  0  0
A       0  0  0  0  1  1  1
B       0  1  1  1  1  2  2
C       0  1  1  2  2  2  2
B       0  1  1  2  2  3  3
D       0  1  2  2  2  3  3
A       0  1  2  2  3  3  4
B       0  1  2  2  3  4  4
25
Constructing the longest common subsequence
The table b can be used to construct an LCS of
X = <x1, x2, …, xm> and Y = <y1, y2, …, yn>
The following recursive procedure prints out an LCS of X and Y. The
initial invocation is PRINT-LCS(b, X, m, n).
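PRINT-LCS itself is not reproduced on the slides. The following Python sketch pairs LCS-LENGTH with a reconstruction routine; the arrow glyphs ↖, ↑, ← are represented by the strings 'diag', 'up', 'left', and the function names are ours:

```python
def lcs_length(X, Y):
    """Tables c (lengths) and b (arrows) of procedure LCS-LENGTH."""
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[''] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = 'diag'          # the slide's up-left arrow
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = 'up'
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = 'left'
    return c, b

def print_lcs(b, X, i, j, out):
    """Recursive reconstruction; initial call print_lcs(b, X, m, n, [])."""
    if i == 0 or j == 0:
        return
    if b[i][j] == 'diag':
        print_lcs(b, X, i - 1, j - 1, out)
        out.append(X[i - 1])              # x_i is part of the LCS
    elif b[i][j] == 'up':
        print_lcs(b, X, i - 1, j, out)
    else:
        print_lcs(b, X, i, j - 1, out)

# The running example of Figure 5.2
c, b = lcs_length("ABCBDAB", "BDCABA")
out = []
print_lcs(b, "ABCBDAB", 7, 6, out)
```

With the tie-breaking above (↑ when the two candidates are equal), the reconstruction follows Figure 5.2 exactly.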
Example 3: The knapsack problem
size    3   4   7   8   9
value   4   5   10  11  13
name    A   B   C   D   E
M = 17
(The item sizes can be read off the cost table on the next slide.)
for i := 0 to M do cost[i] := 0;
for j := 1 to N do /* each item type */
begin
  for i := 1 to M do /* i means capacity */
    if i - size[j] >= 0 then
      if cost[i] < (cost[i - size[j]] + val[j]) then
      begin
        cost[i] := cost[i - size[j]] + val[j]; best[i] := j
      end;
end;
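The same loop structure can be sketched in Python (0-based item indices; the sizes 3, 4, 7, 8, 9 are read off the solution table that follows, since the slides list only the values):

```python
def knapsack(size, val, M):
    """Unbounded knapsack DP from the slide: each item type may be
    taken any number of times.  size/val are parallel lists; M is
    the capacity.  Returns cost (best value per capacity) and best
    (index of the last item type added, -1 if none)."""
    N = len(size)
    cost = [0] * (M + 1)
    best = [-1] * (M + 1)
    for j in range(N):                  # each item type
        for i in range(1, M + 1):       # each capacity i
            if i - size[j] >= 0 and cost[i] < cost[i - size[j]] + val[j]:
                cost[i] = cost[i - size[j]] + val[j]
                best[i] = j
    return cost, best

# Items A..E: sizes 3,4,7,8,9 and values 4,5,10,11,13; capacity 17
cost, best = knapsack([3, 4, 7, 8, 9], [4, 5, 10, 11, 13], 17)
```

For capacity 17 the best value is 24 (two items C plus one item A), matching the last row of the solution table.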
29
Knapsack problem solution
k        1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17
j=1
cost[k]  0  0  4  4  4  8  8  8  12 12 12 16 16 16 20 20 20
best[k]        A  A  A  A  A  A  A  A  A  A  A  A  A  A  A
j=2
cost[k]  0  0  4  5  5  8  9  10 12 13 14 16 17 18 20 21 22
best[k]        A  B  B  A  B  B  A  B  B  A  B  B  A  B  B
j=3
cost[k]  0  0  4  5  5  8  10 10 12 14 15 16 18 20 20 22 24
best[k]        A  B  B  A  C  B  A  C  C  A  C  C  A  C  C
j=4
cost[k]  0  0  4  5  5  8  10 11 12 14 15 16 18 20 21 22 24
best[k]        A  B  B  A  C  D  A  C  C  A  C  C  D  C  C
j=5
cost[k]  0  0  4  5  5  8  10 11 13 14 15 17 18 20 21 23 24
best[k]        A  B  B  A  C  D  E  C  C  E  C  C  D  E  C
31
Example 4: Warshall algorithm and Floyd algorithm
Transitive closure
33
An example of computing transitive closure
(Figure: a directed graph on vertices A..M.)

     A B C D E F G H I J K L M
A    1 1 0 0 0 1 1 0 0 0 0 0 0
B    0 1 0 0 0 0 0 0 0 0 0 0 0
C    1 0 1 0 0 0 0 0 0 0 0 0 0
D    0 0 0 1 0 1 0 0 0 0 0 0 0
E    0 0 0 1 1 0 0 0 0 0 0 0 0
F    0 0 0 0 1 1 0 0 0 0 0 0 0
G    0 0 1 0 1 0 1 0 0 1 0 0 0
H    0 0 0 0 0 0 1 1 1 0 0 0 0
I    0 0 0 0 0 0 0 1 1 0 0 0 0
J    0 0 0 0 0 0 0 0 0 1 1 1 1
K    0 0 0 0 0 0 0 0 0 0 1 0 0
L    0 0 0 0 0 0 1 0 0 0 0 1 1
M    0 0 0 0 0 0 0 0 0 0 0 1 1

Adjacency matrix of the initial stage of Warshall algorithm
(the entry L→G, garbled in the source, is restored for consistency
with the final matrix on the next slide).
34
     A B C D E F G H I J K L M
A    1 1 1 1 1 1 1 0 0 1 1 1 1
B    0 1 0 0 0 0 0 0 0 0 0 0 0
C    1 1 1 1 1 1 1 0 0 1 1 1 1
D    0 0 0 1 1 1 0 0 0 0 0 0 0
E    0 0 0 1 1 1 0 0 0 0 0 0 0
F    0 0 0 1 1 1 0 0 0 0 0 0 0
G    1 1 1 1 1 1 1 0 0 1 1 1 1
H    1 1 1 1 1 1 1 1 1 1 1 1 1
I    1 1 1 1 1 1 1 1 1 1 1 1 1
J    1 1 1 1 1 1 1 0 0 1 1 1 1
K    0 0 0 0 0 0 0 0 0 0 1 0 0
L    1 1 1 1 1 1 1 0 0 1 1 1 1
M    1 1 1 1 1 1 1 0 0 1 1 1 1

Adjacency matrix of the final stage of Warshall algorithm

Property 5.3.1 Warshall algorithm finds the transitive closure in O(V^3).
35
Explaining Warshall algorithm
Warshall algorithm repeats V iterations on the adjacency matrix
a, constructing a series of boolean matrices:
a(0), ..., a(y-1), a(y), ..., a(V)    (5.4)
The central point of the algorithm is that we can compute all
elements of each matrix a(y) from its immediate predecessor a(y-1)
in series (5.4).
After the y-th iteration, a[x, j] is equal to 1 if and only if there
exists a directed path of positive length from vertex x to
vertex j with each intermediate vertex, if any, numbered not
higher than y.
In the y-th iteration, we compute the elements of matrix a by
the following formula:
a(y)[x, j] = a(y-1)[x, j] or (a(y-1)[x, y] and a(y-1)[y, j])    (5.5)
The superscript y indicates the value of an element of matrix a
after the y-th iteration.
Warshall algorithm applies the dynamic programming paradigm: it
uses the recurrence formula (5.5), but it does not lead to a
recursive algorithm. Instead, it yields an iterative algorithm,
with a matrix storing the intermediate results.
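The iterative algorithm can be sketched in Python (0-based vertex indices; the in-place update over a single matrix is exactly the "matrix for storing intermediate results" described above):

```python
def warshall(a):
    """In-place Warshall transitive closure on a boolean adjacency
    matrix a (a[x][y] == 1 iff there is an edge x -> y), applying
    recurrence (5.5) for y = 0..V-1."""
    V = len(a)
    for y in range(V):
        for x in range(V):
            if a[x][y]:                  # x already reaches y ...
                for j in range(V):
                    if a[y][j]:          # ... and y reaches j
                        a[x][j] = 1
    return a

# The four-vertex example that follows: edges a->b, b->d, d->a, d->c
A = [[0, 1, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0],
     [1, 0, 1, 0]]
warshall(A)
```

After the run, rows a, b, and d are all 1s (every vertex on the cycle reaches everything), while row c stays all 0s, matching A(4) of the worked example.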
36
Example:
          a b c d
A(0) = a  0 1 0 0
       b  0 0 0 1
       c  0 0 0 0
       d  1 0 1 0

          a b c d
A(1) = a  0 1 0 0
       b  0 0 0 1
       c  0 0 0 0
       d  1 1 1 0

          a b c d
A(2) = a  0 1 0 1
       b  0 0 0 1
       c  0 0 0 0
       d  1 1 1 1

          a b c d
A(3) = a  0 1 0 1
       b  0 0 0 1
       c  0 0 0 0
       d  1 1 1 1

          a b c d
A(4) = a  1 1 1 1
       b  1 1 1 1
       c  0 0 0 0
       d  1 1 1 1

(Figure: the corresponding digraph with edges a→b, b→d, d→a, d→c.)
37
Floyd algorithm for the All-Pairs Shortest Paths
Problem
For weighted graphs (directed or not) one might want to
build a matrix allowing one to find the shortest path from x
to y for all pairs of vertices. This is the all-pairs shortest path
problem.
Figure 5.7 A weighted directed graph on vertices A..M
(the edge weights are given in the adjacency matrix below).
38
Floyd algorithm
for y : = 1 to V do
for x : = 1 to V do
if a [x, y] > 0 then
for j: = 1 to V do
if a [y, j] > 0 then
if (a[x, j] = 0) or (a[x, y] + a[y, j] < a [x, j])
then
a[x, j] = a[x, y] + a[y, j];
39
An example of Floyd algorithm (for Figure 5.7)
     A B C D E F G H I J K L M
A    0 1 0 0 0 2 4 0 0 0 0 0 0
B    0 0 0 0 0 0 0 0 0 0 0 0 0
C    1 0 0 0 0 0 0 0 0 0 0 0 0
D    0 0 0 0 0 1 0 0 0 0 0 0 0
E    0 0 0 2 0 0 0 0 0 0 0 0 0
F    0 0 0 0 2 0 0 0 0 0 0 0 0
G    0 0 1 0 1 0 0 0 0 1 0 0 0
H    0 0 0 0 0 0 3 0 1 0 0 0 0
I    0 0 0 0 0 0 0 1 0 0 0 0 0
J    0 0 0 0 0 0 0 0 0 0 1 3 2
K    0 0 0 0 0 0 0 0 0 0 0 0 0
L    0 0 0 0 0 0 5 0 0 0 0 0 1
M    0 0 0 0 0 0 0 0 0 0 0 1 0

Adjacency matrix in the initial stage of Floyd algorithm.
Note: all the elements on the diagonal are 0 (here 0 also means “no edge”).
40
     A  B  C  D  E  F  G  H  I  J  K  L  M
A    6  1  5  6  4  2  4  0  0  5  6  8  7
B    0  0  0  0  0  0  0  0  0  0  0  0  0
C    1  2  6  7  5  3  5  0  0  6  7  9  8
D    0  0  0  5  3  1  0  0  0  0  0  0  0
E    0  0  0  2  5  3  0  0  0  0  0  0  0
F    0  0  0  4  2  5  0  0  0  0  0  0  0
G    2  3  1  3  1  4  6  0  0  1  2  4  3
H    5  6  4  6  4  7  3  2  1  4  5  7  6
I    6  7  5  7  5  8  4  1  2  5  6  8  7
J    10 11 9  11 9  12 8  0  0  9  1  3  2
K    0  0  0  0  0  0  0  0  0  0  0  0  0
L    7  8  6  8  6  9  5  0  0  6  7  2  1
M    8  9  7  9  7  10 6  0  0  7  8  1  2

Adjacency matrix in the final stage of Floyd algorithm.

Property 5.3.2 Floyd algorithm solves the all-pairs shortest path
problem in O(V^3).
41
Explaining Floyd Algorithm
Floyd algorithm repeats V iterations on the adjacency matrix a,
constructing a series of matrices:
a(0), ..., a(y-1), a(y), ..., a(V)    (5.6)
The central point of the algorithm is that we can compute all
elements of each matrix a(y) from its immediate predecessor
a(y-1) in series (5.6).
After the y-th iteration, a[x, j] stores the length of the shortest
directed path from vertex x to vertex j with each intermediate
vertex, if any, numbered not higher than y.
In the y-th iteration, we compute the elements of matrix a by
the following formula:
a(y)[x, j] = min { a(y-1)[x, j], a(y-1)[x, y] + a(y-1)[y, j] }    (5.7)
42
The formula (5.7) is illustrated by a figure (not reproduced here):
the path from x to j through y, of length a(y-1)[x, y] + a(y-1)[y, j],
is compared with the direct path of length a(y-1)[x, j].
          a  b  c  d                  a b c d
R(2) = a  0  0  3  0         P(2) = a 0 0 0 0
       b  2  0  5  0                b 0 0 a 0
       c  9  7  12 1                c b 0 b 0
       d  6  0  9  0                d 0 0 a 0

          a  b  c  d                  a b c d
R(3) = a  12 10 3  4         P(3) = a c c 0 c
       b  2  12 5  6                b 0 c a c
       c  9  7  12 1                c b 0 b 0
       d  6  16 9  10               d 0 c a c

          a  b  c  d                  a b c d
R(4) = a  10 10 3  4         P(4) = a d c 0 c
       b  2  12 5  6                b 0 c a c
       c  7  7  10 1                c d 0 d 0
       d  6  16 9  10               d 0 c a c

(Figure: the corresponding weighted digraph on vertices a, b, c, d.)
44
Improving the Floyd algorithm
for i := 1 to V do
for j:= 1 to V do
P[i,j] := 0;
for i := 1 to V do
a[i,i]:= 0;
45
for y := 1 to V do
  for x := 1 to V do
    if a[x, y] > 0 then
      for j := 1 to V do
        if a[y, j] > 0 then
          if (a[x, j] = 0) or (a[x, y] + a[y, j] < a[x, j]) then
          begin
            a[x, j] := a[x, y] + a[y, j];
            P[x, j] := y;
          end
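The improved version, including path recovery from the matrix P, can be sketched in Python (0-based vertices, so "no intermediate vertex" is recorded as -1 instead of 0; the 4-vertex example graph and the `path` helper are ours):

```python
def floyd(a):
    """Floyd's algorithm as on the slide: a[x][j] > 0 is an edge
    weight, 0 means "no edge"; the diagonal is 0.  P[x][j] records
    one intermediate vertex on a shortest x -> j path, or -1 for a
    direct edge."""
    V = len(a)
    P = [[-1] * V for _ in range(V)]
    for y in range(V):
        for x in range(V):
            if a[x][y] > 0:
                for j in range(V):
                    if a[y][j] > 0:
                        if a[x][j] == 0 or a[x][y] + a[y][j] < a[x][j]:
                            a[x][j] = a[x][y] + a[y][j]
                            P[x][j] = y
    return a, P

def path(P, x, j):
    """Recover the intermediate vertices of a shortest x -> j path."""
    y = P[x][j]
    if y == -1:
        return []
    return path(P, x, y) + [y] + path(P, y, j)

# Small illustrative graph: 0->1 (1), 1->2 (2), 0->2 (5), 2->3 (1)
a = [[0, 1, 5, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
d, P = floyd(a)
```

Here the direct edge 0→2 of weight 5 is replaced by the path 0→1→2 of weight 3, and `path(P, 0, 3)` lists the intermediate vertices 1 and 2.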
46
2. Greedy algorithm
Algorithms for optimization problems typically go through a
sequence of steps, with a set of choices at each step. A greedy
algorithm always makes the choice that looks best at the
moment.
That is, it makes a locally optimal choice in the hope that this
choice will lead to a globally optimal solution.
Some examples of greedy algorithm:
- An activity-selection problem
- The fractional knapsack problem
- Huffman code problem
- Prim algorithm for minimum-spanning trees
47
Activity-Selection Problem
48
Greedy algorithm for activity-selection problem
49
Procedure Greedy-activity-selector
50
Figure 5.5 An example of activity-selection problem

i    si   fi
1    1    4
2    3    5
3    0    6
4    5    7
5    3    8
6    5    9
7    6    10
8    8    11
9    8    12
10   2    13
11   12   14
51
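The greedy activity selector can be sketched in Python (the function name and the 1-based placeholder at index 0 are our conventions; the activities are assumed already sorted by finish time, as they are in Figure 5.5):

```python
def greedy_activity_selector(s, f):
    """Greedy activity selection.  s[i], f[i] are start/finish times
    of activity i (1-based; index 0 unused), sorted by finish time.
    Returns the indices of a maximum-size set of compatible
    activities."""
    n = len(s) - 1
    A = [1]                    # always pick the first activity
    j = 1                      # index of the last activity selected
    for i in range(2, n + 1):
        if s[i] >= f[j]:       # activity i starts after j finishes
            A.append(i)
            j = i
    return A

# Data of Figure 5.5 (index 0 is a placeholder)
s = [0, 1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
f = [0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
selected = greedy_activity_selector(s, f)
```

On this data the algorithm selects activities 1, 4, 8, and 11.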
Elements of greedy algorithm
There are two ingredients that are exhibited by most problems
that lend themselves to a greedy strategy: (1) the greedy choice
property and (2) optimal substructure.
Greedy choice property
The choice made by a greedy algorithm may depend on choices so
far, but it cannot depend on any future choices or on the solutions
of the subproblems. Thus, unlike dynamic programming, a greedy
algorithm usually progresses in a top-down fashion, making one
greedy choice after another, iteratively reducing each given
problem instance to a smaller one.
Optimal Substructure
The optimal solution for the problem contains within it optimal
solutions to subproblems.
52
Greedy versus dynamic programming
Given an optimization problem, it is not easy to decide whether
dynamic programming or greedy algorithm should be used to solve
it. Let us investigate two variants of a classical optimization problem.
The 0-1 knapsack problem is posed as follows.
“A thief robbing a store finds it filled with N types of items of
varying size and value (the i-th item is worth vi dollars and
weighs wi pounds), but has only a small knapsack of capacity M to
use to carry the goods. The knapsack problem is to find the
combination of items which the thief should choose for his
knapsack in order to maximize the total value of all the items he
takes.”
This is called the 0-1 knapsack problem because each item must
either be taken or left behind; the thief cannot take a fractional
amount of an item or take an item more than once.
53
Fractional knapsack problem
In the fractional knapsack problem, the setup is the same, but the
thief can take fractions of items, rather than having to make a
binary (0-1) choice for each item.
Both knapsack problems exhibit the optimal substructure
property.
For the 0-1 problem, consider the most valuable load weighing
at most M pounds. If we remove item j from this load, the
remaining load must be the most valuable load weighing at most M
- wj that the thief can take from the n - 1 original items excluding j.
For the fractional problem, consider that if we remove a weight
wj - w of one item j from the optimal load, the remaining load must
be the most valuable load weighing at most M - (wj - w) that the
thief can take from the n - 1 original items plus the remaining
w pounds of item j.
54
Fractional knapsack problem (cont.)
We use a greedy algorithm for the fractional knapsack problem and
dynamic programming for the 0-1 knapsack problem.
To solve the fractional problem, we first compute the value
per pound (vi/wi ) for each item.
The thief begins by taking as much as possible of the item
with the greatest value per pound (vi/wi). If the supply of
that item is exhausted and he still can carry more, he takes
as much as possible of the item with the next greatest value
per pound, and so forth until he cannot carry any more.
55
Figure 5.6
56
procedure GREEDY_KNAPSACK(V, W, M, X, n);
/* V, W are the arrays containing the values and weights of n objects
ordered so that V[i]/W[i] >= V[i+1]/W[i+1]. M is the knapsack
capacity and X is the solution vector */
var rc: real; i: integer;
begin
  for i := 1 to n do X[i] := 0;
  rc := M; /* rc = remaining knapsack capacity */
  for i := 1 to n do
  begin
    if W[i] > rc then exit;
    X[i] := 1; rc := rc - W[i]
  end;
  if i <= n then X[i] := rc/W[i]
end
/* By sorting the items by value per pound, the greedy algorithm
runs in O(n log n). */
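The same strategy in Python. The item data here (values 60, 100, 120; weights 10, 20, 30; capacity 50) is a hypothetical instance of our own, since the data of Figure 5.6 is not reproduced in the text; the items are already sorted by value per pound:

```python
def greedy_knapsack(V, W, M):
    """Fractional knapsack.  V, W hold values and weights, already
    sorted so that V[i]/W[i] >= V[i+1]/W[i+1]; M is the capacity.
    Returns X, where X[i] is the fraction taken of item i."""
    n = len(V)
    X = [0.0] * n
    rc = M                         # remaining capacity
    for i in range(n):
        if W[i] > rc:
            X[i] = rc / W[i]       # take only a fraction of item i
            break
        X[i] = 1.0                 # take all of item i
        rc -= W[i]
    return X

# Hypothetical data (not from the slides)
values, weights = [60, 100, 120], [10, 20, 30]
X = greedy_knapsack(values, weights, 50)
total = sum(x * v for x, v in zip(X, values))
```

The thief takes all of the first two items and 2/3 of the third, for a total value of 240.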
57
Huffman codes
58
                          a    b    c    d    e    f
Frequency                 45   13   12   16   9    5
Fixed-length codeword     000  001  010  011  100  101
Variable-length codeword  0    101  100  111  1101 1100
59
Variable-length code
A variable-length code can do better than a fixed length code,
by giving frequent characters short code-words and
infrequent characters long code-words.
a = 0, b = 101, . . . f = 1100
This code requires:
(45·1 + 13·3 + 12·3 + 16·3 + 9·4 + 5·4)·1000 = 224,000 bits
to represent the file, a savings of approximately 25%.
60
Prefix-free code
We consider here only codes in which no codeword is also a
prefix of some other codeword. Such codes are called prefix-free
codes or, for short, prefix codes.
It is possible to show that the optimal data compression
achievable by a character code can always be achieved with a
prefix code.
Prefix codes are desirable because they simplify encoding and
decoding.
- Encoding is simple: we just concatenate the code-words
representing each character of the file.
- Decoding is simple with a prefix code. Since no codeword is a
prefix of any other, the code word that begins an encoded file is
unambiguous.
61
Prefix-free code and binary tree
An optimal code for a file is always represented by a full
binary tree in which every non-leaf node has two children.
We interpret the binary codeword for a character as the path
from the root to that character, where 0 means “go to the left
child” and 1 means “go to the right child”.
If C is the alphabet from which the characters are drawn,
then the tree for an optimal prefix code has exactly |C| leaves,
one for each letter of the alphabet, and exactly |C| - 1 internal
nodes.
62
(Two code trees for the example, with 0 = left child, 1 = right
child and each internal node labeled with the total frequency of
its leaves, 100 at the root:
(a) the tree of the fixed-length code, with all six leaves
a:45, b:13, c:12, d:16, e:9, f:5 at depth 3;
(b) the tree of the optimal prefix code, with a:45 at depth 1,
c:12, b:13, d:16 at depth 3, and f:5, e:9 at depth 4.)
63
Prefix-free code and binary tree (cont.)
Given a tree T corresponding to a prefix code, it is a simple
matter to compute the number of bits required to encode a
file.
For each character c in the alphabet C, let f(c) denote the
frequency of c in the file and dT(c) denote the length of the
codeword for character c. The number of bits required to
encode a file is
B(T) = Σ (c ∈ C) f(c) · dT(c)
which we define as the cost of the tree T.
64
Constructing a Huffman code
Huffman invented a greedy algorithm that constructs an
optimal prefix code called a Huffman code.
The algorithm builds the tree T corresponding to the optimal
code in a bottom-up manner. It begins with a set of |C| leaves
and performs a sequence of |C|-1 “merging” operations to
create the final tree.
A priority queue Q, keyed on f, is used to identify the two
least-frequent objects to merge together.
The result of the merger of the two objects is the new object
whose frequency is the sum of the frequencies of the two
objects that were merged.
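The construction can be sketched in Python with the standard-library `heapq` priority queue (the representation of a partial tree as a character-to-codeword map is our shortcut; tie-breaking may produce codewords that differ from the figure, but the total cost B(T) is always optimal):

```python
import heapq

def huffman_code(freq):
    """Build a Huffman code with a priority queue keyed on frequency.
    freq maps character -> frequency; returns character -> codeword."""
    # Each heap entry: (frequency, tie-breaker, {char: codeword-so-far})
    heap = [(f, i, {c: ''}) for i, (c, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two least-frequent trees
        f2, _, right = heapq.heappop(heap)
        merged = {c: '0' + w for c, w in left.items()}   # left = bit 0
        merged.update({c: '1' + w for c, w in right.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))   # |C|-1 merges
        count += 1
    return heap[0][2]

freq = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
code = huffman_code(freq)
cost = sum(freq[c] * len(code[c]) for c in freq)   # B(T)
```

On the running example the codeword lengths are 1 for a, 3 for b, c, d, and 4 for e, f, giving B(T) = 224 (i.e. 224,000 bits for the 100,000-character file).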
65
(a)  f:5   e:9   c:12   b:13   d:16   a:45
(b)  c:12  b:13  14     d:16   a:45
     (f:5 and e:9 have been merged into a node of frequency 14,
     with f as the 0-child and e as the 1-child)
67
Example 4: Graph coloring
Given an undirected graph, a coloring of that graph
is an assignment of a color to each vertex of the
graph so that no two vertices connected by an edge
have the same color. We wish to find a coloring with
the minimum number of colors.
This is an optimization problem.
One reasonable strategy for graph coloring is to use a
greedy algorithm.
The idea: Initially, we try to color as many vertices as
possible with the first color, then as many as
possible of the uncolored vertices with the second
color and so on.
Note: The greedy algorithm does not always yield the optimal
solution for this problem.
68
To color vertices with a new color, we perform the
following steps:
Select some uncolored vertex and color it with a new color.
Scan the list of uncolored vertices. For each uncolored
vertex, determine whether it has an edge to any vertex
already colored with the new color. If there is no such edge,
color the present vertex with the new color.
Example: In the figure (a small graph on vertices 1..5, not
reproduced here), we first color vertex 1 red and then color
vertices 3 and 4 with the same red color.
69
Procedure SAME_COLOR
Procedure SAME_COLOR determines a set of
vertices (called newclr), all of which can be colored
with a new color. This procedure is called repeatedly
until all vertices are colored.
procedure SAME_COLOR(G, newclr);
/* SAME_COLOR assigns to newclr a set
of vertices of G that may be given the
same color */
begin
newclr := ∅;
for each uncolored vertex v of G do
if v is not adjacent to any vertex in newclr
then
mark v colored and add v to newclr.
end;
70
procedure SAME_COLOR(G, newclr);
/* SAME_COLOR assigns to newclr a set of
vertices of G that may be given the same color;
a: adjacency matrix for graph G. The main procedure
G_COLORING that calls it is shown on the next slide. */
begin
newclr := ∅;
for each uncolored vertex v of G do
begin
found := false;
for each vertex w ∈ newclr do
if a[v,w] = 1 /*there is an edge between v and w in G */
then
found := true;
if not found then
mark v colored and add v to newclr
end
end;
71
procedure G_COLORING(G);
begin
for each vertex in G do mark it uncolored;
while there is any vertex marked uncolored do
begin
SAME_COLOR(G, newclr);
print newclr
end
end;
Degree of a vertex: the number of edges connected to this
vertex.
Theorem: If χ(G) is the minimum number of colors needed to color graph G
and Δ(G) is the largest vertex degree in that graph, then χ(G) ≤ Δ(G) + 1.
Complexity of greedy algorithm for graph coloring
Assume that the graph is represented by an adjacency matrix.
In procedure SAME_COLOR, each cell of the adjacency matrix may be
examined while we try to give the uncolored vertices a new color.
Complexity of procedure SAME_COLOR: O(n2) where n is the
number of vertices in G.
If m is the number of colors used to color the graph, then procedure
SAME_COLOR is called m times in total. Therefore, the complexity of
the whole algorithm is m*O(n2). Since m is often a small number,
we can say: The algorithm has a quadratic complexity.
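The two procedures can be sketched in Python (0-based vertices; the example graph, a 5-cycle, is ours and is not the one in the slides):

```python
def same_color(adj, colored):
    """One pass of SAME_COLOR: collect a set of not-yet-colored,
    mutually non-adjacent vertices that may share a new color."""
    newclr = []
    for v in range(len(adj)):
        if v not in colored and all(adj[v][w] == 0 for w in newclr):
            newclr.append(v)
    return newclr

def g_coloring(adj):
    """Repeatedly call same_color until every vertex is colored.
    Returns the color classes of a greedy (not necessarily
    minimum) coloring."""
    colored = set()
    classes = []
    while len(colored) < len(adj):
        newclr = same_color(adj, colored)
        colored.update(newclr)
        classes.append(newclr)
    return classes

# Illustrative 5-vertex cycle: edges 0-1, 1-2, 2-3, 3-4, 4-0
adj = [[0, 1, 0, 0, 1],
       [1, 0, 1, 0, 1],
       [0, 1, 0, 1, 0],
       [0, 0, 1, 0, 1],
       [1, 0, 0, 1, 0]]
classes = g_coloring(adj)
```

The greedy pass groups vertices 0 and 2 under the first color, 1 and 3 under the second, and 4 under a third; since this graph contains an odd cycle, three colors are indeed necessary here.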
72
Application: Exam timetabling
73
A Heuristic for Graph coloring
“The vertex with the largest degree should be considered for
coloring first.”
Degree of a vertex: the number of edges connected
to this vertex.
Reason: The vertices with more incident edges are harder to
color if we wait until all their adjacent vertices have been
colored.
Algorithm
1.Arrange the vertices by decreasing order of degrees.
2.Color a vertex with maximal degree with color 1.
74
References
75