Advanced Algorithms

Lecture Notes
Periklis A. Papakonstantinou
Fall 2011
Preface
This is a set of lecture notes for the 3rd-year course “Advanced Algorithms” which I gave at
Tsinghua University for Yao-class students in the Spring and Fall terms of 2011. This set of notes
is partially based on the notes scribed by students taking the course the first time it was given.
Many thanks to Xiahong Hao and Nathan Hobbs for their hard work in critically revising, editing,
and adding figures to last year’s material. I’d also like to give credit to the students who already
took the course and scribed notes. These lecture notes, together with the scribes’ names, can be
found on the Spring 2011 course website.
This version of the course has a few topics different from its predecessor. In particular, I re-
moved the material on algorithmic applications of Finite Metric Embeddings, SDP relaxations,
and the like, and I added elements of Fourier Analysis on the boolean cube with applications
to property testing. I also added a couple of lectures on algorithmic applications of Szemerédi’s
Regularity Lemma.
Apart from property testing and the regularity lemma, the remaining topics are: a brief revi-
sion of the greedy, dynamic programming, and network-flows paradigms; a rather in-depth intro-
duction to the geometry of polytopes, Linear Programming, applications of duality theory, and
algorithms for solving linear programs; standard topics on approximation algorithms; and an
introduction to Streaming Algorithms.
I would be most grateful if you brought any typos to my attention, or sent your recommenda-
tions, corrections, and suggestions to papakons@tsinghua.edu.cn.
Beijing, PERIKLIS A. PAPAKONSTANTINOU
Fall 2011
– This is a growing set of lecture notes. It will not be complete until mid January 2012 –
Contents
1 Big-O Notation 4
1.1 Asymptotic Upper Bounds: big-O notation . . . . . . . . . . . . . . . . . . . . . . 4
2 Interval Scheduling 6
2.1 Facts about Greedy Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Unweighted Interval Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Weighted Interval Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3 Sequence Alignment: can we do Dynamic Programming in small space? 12
3.1 Sequence Alignment Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.2 Dynamic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.3 Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4 Matchings and Flows 17
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.2 Maximum Flow Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.3 Max-Flow Min-Cut Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.4 Ford-Fulkerson Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.4.1 Edmonds-Karp Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.5 Bipartite Matching via Network Flows . . . . . . . . . . . . . . . . . . . . . . . . 24
4.6 Application of Maximum Matchings: Approximating Vertex Cover . . . . . . . . . 24
5 Optimization Problems, Online and Approximation Algorithms 26
5.1 Optimization Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5.2 Approximation Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
5.3 Makespan Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
5.4 Approximation Algorithm for Makespan Problem . . . . . . . . . . . . . . . . . . 28
6 Introduction to Convex Polytopes 31
6.1 Linear Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
6.2 Affine Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
6.3 Convex Polytope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
6.4 Polytope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
6.5 Linear Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
7 Forms of Linear Programming 36
7.1 Forms of Linear Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
7.2 Linear Programming Form Transformation . . . . . . . . . . . . . . . . . . . . . . 37
8 Linear Programming Duality 38
8.1 Primal and Dual Linear Program . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
8.1.1 Primal Linear Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
8.1.2 Dual Linear Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
8.2 Weak Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
8.3 Farkas’ Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
8.3.1 Projection Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
8.3.2 Farkas’ Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
8.3.3 Geometric Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
8.3.4 More on Farkas Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
8.4 Strong Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
8.5 Complementary Slackness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
9 Simplex Algorithm and Ellipsoid Algorithm 47
9.1 More on Basic Feasible Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . 47
9.1.1 Assumptions and conventions . . . . . . . . . . . . . . . . . . . . . . . . 47
9.1.2 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
9.1.3 Equivalence of the definitions . . . . . . . . . . . . . . . . . . . . . . . . 48
9.1.4 Existence of Basic Feasible Solution . . . . . . . . . . . . . . . . . . . . . 49
9.1.5 Existence of optimal Basic Feasible Solution . . . . . . . . . . . . . . . . 50
9.2 Simplex algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
9.2.1 The algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
9.2.2 Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
9.3 Ellipsoid algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
9.3.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
9.3.2 Mathematical background . . . . . . . . . . . . . . . . . . . . . . . . . . 52
9.3.3 LP, LI and LSI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
9.3.4 Ellipsoid algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
10 Max-Flow Min-Cut Through Linear Programming 55
10.1 Flow and Cut . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
10.1.1 Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
10.1.2 An alternative Definition of Flow . . . . . . . . . . . . . . . . . . . . . . 56
10.1.3 Cut . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
10.2 Max-flow Min-Cut Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
11 Rounding Technique for Approximation Algorithms (I) 60
11.1 General Method to Use Linear Programming . . . . . . . . . . . . . . . . . . . . 60
11.2 Set Cover Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
11.2.1 Problem Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
11.2.2 Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
11.2.3 Greedy Set Cover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
11.2.4 LP-rounding Set Cover . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
11.3 Vertex Cover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
12 Rounding Technique for Approximation Algorithms (II) 68
12.1 Randomized Algorithm for Integer Program of Set Cover . . . . . . . . . . . . . . 68
12.1.1 Algorithm Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
12.1.2 The correctness of the Algorithm . . . . . . . . . . . . . . . . . . . . . . 68
12.2 Method of Computation Expectations . . . . . . . . . . . . . . . . . . . . . . . . 70
12.2.1 MAX-K-SAT problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
12.2.2 Derandomized Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 71
12.2.3 The proof of correctness . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
13 Primal Dual Method 73
13.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
13.2 Set Cover Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Chapter 1
Big-O Notation
When talking about algorithms (or functions or programs), it would be nice to have a way of
quantifying or classifying how fast or slow an algorithm is, how much space it uses, and so on.
It is useful to present this quantification in relation to the size of the algorithm’s input, so that it
is independent of the peculiarities of the problem. In this chapter we introduce the big-O notation
which, among other things, can be used to quantify the asymptotic behavior of an algorithm as a
function of its input length n measured in bits.
1.1 Asymptotic Upper Bounds: big-O notation
Let T(n) be a function, say, the worst-case running time of a certain algorithm with respect to
its input length n. What we would like to point out here is that worst-case running time is a
well-defined function. Contrast this to the running time of an algorithm which is in general not
a function of the input length (why?). Given a function f(n), we say that T(n) ∈ O(f(n)) if,
for sufficiently large n, the function T(n) is bounded from above by a constant multiple of f(n).
Sometimes this is written as T(n) = O(f(n)).
To be precise, T(n) = O(f(n)) if there exist constants c > 0 and n_0 ≥ 0 such that for all
n ≥ n_0 we have T(n) ≤ cf(n). In this case, we say that T is asymptotically upper bounded
by f.
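As a quick sanity check of the definition, consider a hypothetical running time T(n) = 3n² + 10n (this concrete T is our own example, not from the notes). The constants c = 4 and n_0 = 10 witness T(n) = O(n²), since 3n² + 10n ≤ 4n² iff n ≥ 10:

```python
# Hypothetical illustration of the definition of T(n) = O(f(n)):
# we exhibit witnesses c = 4, n0 = 10 for T(n) = 3n^2 + 10n and f(n) = n^2.
def T(n):
    return 3 * n * n + 10 * n

def f(n):
    return n * n

c, n0 = 4, 10
# 3n^2 + 10n <= 4n^2  iff  10n <= n^2  iff  n >= 10, checked here numerically
# up to a finite bound (the definition requires it for ALL n >= n0).
for n in range(n0, 1000):
    assert T(n) <= c * f(n)
print("T(n) <= 4*f(n) for all checked n >= 10")
```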
For a particular algorithm, different inputs result in different running times. Notice that the
running time on a particular input is a constant, not a function. The running times of interest are
the best case, the worst case, and the average case, expressing the least, the greatest, and the
average running time over all inputs of each length n. For us the worst-case running time is often
of particular concern, since it is important to know how much time might be needed in the worst
case to guarantee that the algorithm will always finish on time. The same goes for the worst-case
space used by an algorithm. From this point on, unless stated otherwise, the term “running time”
refers to “worst-case running time”.
Example. Take Bubble Sort, shown in Algorithm 1, as an example. This is a correct algorithm
for the SORTING problem, which has the obvious definition. We see that the number of outer loops
and the number of inner loops together determine the running time of the algorithm. Notice that
we have:

the number of outer loops ≤ n
the number of inner loops ≤ n

Therefore, a rough estimate of the worst-case running time is O(n^2). This bound can be proved
to be asymptotically tight.
Algorithm 1: Bubble Sort
input : L: list of sortable items
output: L: list of sorted items in increasing order
repeat
    swapped = false;
    for i ← 1 to length(L) − 1 do
        if L[i − 1] > L[i] then
            swap(L[i − 1], L[i]);
            swapped = true;
        end
    end
until not swapped;
return L
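A direct Python transcription of Algorithm 1 (one possible rendering; the function name is ours):

```python
def bubble_sort(L):
    """Bubble Sort: repeatedly sweep the list, swapping adjacent
    out-of-order pairs, until a full sweep makes no swap."""
    L = list(L)  # work on a copy; the pseudocode sorts in place
    swapped = True
    while swapped:          # the repeat ... until not swapped loop
        swapped = False
        for i in range(1, len(L)):
            if L[i - 1] > L[i]:
                L[i - 1], L[i] = L[i], L[i - 1]
                swapped = True
    return L

bubble_sort([3, 1, 2])  # returns [1, 2, 3]
```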
An important skill required in the design of combinatorial algorithms is constructing examples;
in this case, infinite families of examples. The more interesting direction in bounding the running
time is what we did before, i.e. to upper bound it. We showed that T(n) = O(n^2). How do we
know we have not overestimated things? One way is to show that T(n) = Ω(n^2). To do that, we
construct an infinite family of examples parametrized by n on which the algorithm takes Ω(n^2)
steps. In this case, given n we can consider as input ⟨n, n − 1, . . . , 1⟩ and conclude that on this
input Algorithm 1 makes Ω(n^2) many steps.¹

¹ To be very precise we should have treated n as the input length in bits (and not as the number of integers). For
example, the above input for a given n has length ≈ n log n (each integer requires about log n many bits to be
represented). However, it is an easy exercise to do the (annoying) conversion between m integers represented in
binary that altogether have length n bits.
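To make the lower-bound family concrete, here is a sketch (our own instrumentation, not part of the notes) that counts the adjacent comparisons Bubble Sort performs on the input ⟨n, n − 1, . . . , 1⟩; on this family the count is at least n(n − 1)/2, i.e. Ω(n²):

```python
def bubble_sort_comparisons(L):
    """Run Bubble Sort and return the number of adjacent comparisons made."""
    L = list(L)
    comparisons = 0
    swapped = True
    while swapped:
        swapped = False
        for i in range(1, len(L)):
            comparisons += 1
            if L[i - 1] > L[i]:
                L[i - 1], L[i] = L[i], L[i - 1]
                swapped = True
    return comparisons

for n in (5, 10, 50):
    decreasing = list(range(n, 0, -1))   # the input <n, n-1, ..., 1>
    # On this input every pass scans the whole list and about n passes are
    # needed, so at least n(n-1)/2 comparisons happen.
    assert bubble_sort_comparisons(decreasing) >= n * (n - 1) // 2
```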
Chapter 2
Interval Scheduling
In this lecture, we begin by describing what a greedy algorithm is. UNWEIGHTED INTERVAL
SCHEDULING is one problem that can be solved using such an algorithm. We then present
WEIGHTED INTERVAL SCHEDULING, where greedy algorithms don’t seem to work, and we must
employ a new technique, dynamic programming, which in some sense generalizes the concept of a
greedy algorithm.
2.1 Facts about Greedy Algorithms
Most greedy algorithms make decisions by iterating the following 2 steps.
1. Order the input elements
2. Make an irrevocable decision based on a local optimization.
Here, irrevocable means that once a decision is made about the partially constructed output, it
can never be changed. The greedy approach relies on the hope that repeatedly making locally
optimal choices will lead to a global optimum.
Most greedy algorithms order the input elements just once, and then make decisions one by
one afterwards. Kruskal’s algorithm on minimum spanning trees is a well-known algorithm that
uses the greedy approach. More sophisticated algorithms will reorder the elements in every round.
For example, Dijkstra’s shortest path algorithm updates and reorders the vertices to choose the
one whose distance to the source vertex is minimized.
Greedy algorithms are often the first choice when one tries to solve a problem for two reasons:
1. The greedy concept is simple.
2. In many common cases, a greedy approach leads to an optimal answer.
We present a simple algorithm to demonstrate the greedy approach.
2.2 Unweighted Interval Scheduling
Suppose we have a set of jobs, each with a given starting and finishing time. However, we are
restricted to only one machine to process these jobs; that is, no two scheduled jobs can overlap
with each other. The task is to come up with an arrangement, or “schedule”, that finishes the
maximum number of jobs within a given time interval. The formal definition of the problem is
as follows:
Definition 2.2.1 (Intervals and Schedules). Let S be a finite set of intervals. An interval I is a
pair of two positive integers, the first smaller than the second; i.e. I = (a, b) ∈ (Z^+)^2, a < b. We say
that S′ ⊆ S is a feasible schedule if no two intervals I, I′ ∈ S′ overlap with each other. Let
I = (a_1, b_1), I′ = (a_2, b_2); overlap means a_2 < b_1 ≤ b_2 or a_1 < b_2 ≤ b_1.
Problem: UNWEIGHTED INTERVAL SCHEDULING
Input: a finite set of intervals S.
Output: a feasible schedule S′ ⊆ S such that |S′| is maximum (the maximum is taken over the
sizes of all feasible schedules of S).
Figure 2.1 gives an example instance of the Unweighted Interval Scheduling Problem. Here
interval (1, 7) and interval (6, 10) overlap with each other.
(Failed) Attempt 1: Greedy rule: order the intervals in ascending order of length, then choose
shorter intervals first. The intuition behind this rule is that shorter jobs may be less likely
to overlap with other jobs. So we order the intervals in ascending order of length, then we choose
jobs which do not overlap with the ones we have already chosen (always choosing the shortest job
which fulfils this requirement). However, this rule is not correct.
Figure 2.1: An example of the time line and intervals
Figure 2.2 gives a counterexample in which this greedy rule fails. Giving a counterexample is
a standard technique to show that an algorithm is not correct¹. In this example, the optimal choice
is job 1 and job 3, but by adhering to a greedy rule based on the length of a job we can only
choose job 2. Attempt 1 gives an answer that is only 1/2 of the optimal, so we know that this
algorithm is wrong. But there is still one thing we can ask about this greedy algorithm: how bad
can it be? Can it give a result much worse than 1/2? The answer is no. This greedy algorithm
always gives a result no worse than 1/2 of the optimal (see Exercise 1).

¹ The definition of a correct algorithm for a given problem is that for every input (i) the algorithm terminates and
(ii) if it terminates and the input is in the correct form then the output is in the correct form. Therefore, to prove that
an algorithm is not correct for a given problem it is sufficient to show that there exists an input where the output is
not correct (in this case, by the definition of the problem, a correct output is a schedule of optimal size).

Figure 2.2: Counterexample of Attempt 1
Attempt 2: Greedy rule: Sort by the finishing time in non-decreasing order, and choose jobs
which do not overlap with jobs already chosen. A formal algorithm is described below.
Algorithm 2: A greedy algorithm for interval scheduling
input : S : the set of all intervals
output: S′ ⊆ S s.t. S′ is a schedule of maximum size
Order S as I_1 < I_2 < ... < I_n in non-decreasing finishing time;
S′ ← ∅;
for i ← 1 to n do
    if I_i does not overlap an interval in S′ then
        S′ ← S′ ∪ {I_i};
    end
end
return S′;
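A Python rendering of Algorithm 2 (a sketch; following Definition 2.2.1, intervals that merely touch at an endpoint, such as (1, 3) and (3, 5), are treated as non-overlapping):

```python
def greedy_schedule(S):
    """Earliest-finishing-time greedy for UNWEIGHTED INTERVAL SCHEDULING."""
    chosen = []
    for a, b in sorted(S, key=lambda iv: iv[1]):  # non-decreasing finish time
        # Since candidates arrive in finish-time order, an interval is
        # compatible iff it starts no earlier than the last chosen finish.
        if not chosen or a >= chosen[-1][1]:
            chosen.append((a, b))
    return chosen

greedy_schedule([(1, 7), (6, 10), (8, 11)])  # returns [(1, 7), (8, 11)]
```

Comparing only against the last chosen interval suffices precisely because the intervals are scanned in non-decreasing finishing time.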
We would like to show that this algorithm is “correct”. But, we must first make sure that we
define what is meant by the “correctness” of an algorithm.
The correctness of an algorithm is defined as follows:
1. The algorithm always terminates.
2. If it can be proved that the precondition is true, then the postcondition must also be true. In
other words, every legal input will always generate the right output.
It is clear that Algorithm 2 will terminate after n iterations. For some algorithms, though,
termination can be a problem and might require more justification. We will now shift our
focus to the more difficult question of how to prove that Algorithm 2 always gives a maximum
schedule.
Proof. We begin by introducing some basic concepts used in the proof. Let S′_i be the content
of S′ at the end of the i-th iteration of the for loop, and let OPT_i be an optimal extension of
S′_i (i.e. S′_i ⊆ OPT_i). There is one important subtlety in the second part of this definition: we
have to show that such an optimal extension exists, otherwise the claim would be proved by the
definition itself.
Remark 2.2.2. You may wonder why we are concerned with a different OPT_i (one for each
iteration i). This is because after the base case is established, we obtain OPT_n by assuming the
statement has been proved for all indices between the base case and n − 1, inclusive. This is the
way induction works.
We will show by induction that

P(i) : ∀i, ∃OPT_i s.t. S′_i ⊆ OPT_i
For the base case, P(0), trivially S′_0 = ∅ and S′_0 ⊆ OPT_0. For the inductive step,
suppose we have P(k) and we want to prove P(k + 1). At step k + 1, we consider the interval
I_{k+1}:
• Case I: I_{k+1} ∉ S′_{k+1}. We can take OPT_{k+1} = OPT_k. It is obvious that
S′_{k+1} = S′_k ⊆ OPT_{k+1}.
• Case II: I_{k+1} ∈ S′_{k+1}.

– Subcase a: I_{k+1} does not overlap any interval of OPT_k.
This case cannot happen: OPT_k ∪ {I_{k+1}} would be a larger feasible schedule,
contradicting the optimality of OPT_k.

– Subcase b: I_{k+1} overlaps exactly one interval I′ from OPT_k.
Since I_{k+1} ∈ S′_{k+1}, I_{k+1} does not overlap any interval in S′_k, which implies
I′ ∈ OPT_k \ S′_k. So S′_k ⊆ OPT_k \ {I′}. Then we can construct
OPT_{k+1} = (OPT_k \ {I′}) ∪ {I_{k+1}}. Now S′_{k+1} = S′_k ∪ {I_{k+1}}, where
S′_k ⊆ OPT_k \ {I′}, so S′_{k+1} ⊆ OPT_{k+1}.
– Subcase c: I_{k+1} overlaps more than one interval from OPT_k.
We wish to show this case cannot happen. Since I_{k+1} ∈ S′_{k+1}, I_{k+1} does not overlap
any interval in S′_k, which implies I_{k+1} only overlaps intervals from OPT_k \ S′_k. By
way of contradiction, suppose I_{k+1} overlaps more than one interval I_a, I_b, . . . , I_t ∈
OPT_k \ S′_k, for some t, where I_a, I_b, . . . , I_t are sorted in non-decreasing order of finishing
time. Because I_a, I_b, . . . , I_t ∈ OPT_k, we have the property that earlier intervals always end
before later ones start, i.e. all intervals in OPT_k are sequential and non-overlapping.
Since more than one of them overlaps I_{k+1}, there must be at least one interval among
I_a, I_b, . . . , I_t ∈ OPT_k \ S′_k that ends before I_{k+1} finishes (cf. Figure 2.3).
But by the way the algorithm works, I_{k+1} should end before all of I_a, I_b, . . . , I_t end.
This is a contradiction.
By induction, for any k > 0, we can always construct OPT_{k+1} from OPT_k such that
S′_{k+1} ⊆ OPT_{k+1}. Thus

P(i) : ∀i, ∃OPT_i s.t. S′_i ⊆ OPT_i
Figure 2.3: An example of Case II(c): two intervals I_a and I_b from OPT_k overlap with I_{k+1}
2.3 Weighted Interval Scheduling
This interval scheduling problem is identical to the previous one, with the difference that
all intervals are assigned weights; i.e. a weighted interval I is identified with a pair I = (a, b)
together with an integer w(I), its weight.

Input: a set of weighted intervals S and a weight function w : S → N.
Output: a schedule S′ ⊆ S of maximum total weight.
Figure 2.4: An example of Weighted Interval Scheduling
Unlike the Unweighted Interval Scheduling problem, no natural greedy algorithm is known for
solving its weighted counterpart. It can be solved, however, via a different technique if we break
the problem into sub-problems: dynamic programming. Now instead of just greedily building
up one solution, we create a table, and in each round we compute an optimal solution to a
subproblem by using the values of the optimal solutions of the smaller subproblems we have
computed before.

Of course, not every problem can be solved via Dynamic Programming (DP). Roughly speak-
ing, DP implicitly explores the entire solution space by decomposing the problem into sub-problems,
and then using the solutions to those sub-problems to assist in solving larger and larger sub-
problems.
After ordering intervals in non-decreasing order by finishing time, an example sub-problem in
WEIGHTED INTERVAL SCHEDULING is to give the schedule with maximum total weight up to
a specified finishing time. For example, say that the input consists of the intervals I_1, I_2, I_3, I_4,
and that the finishing time of I_1 is the smallest, then comes I_2’s, then I_3’s, and then I_4’s. The
DP algorithm computes the optimal schedule if the input consists only of I_1, then it computes the
optimal schedule if the input consists of {I_1, I_2}, then it computes the optimal schedule if the input
consists of {I_1, I_2, I_3}, and finally it computes the optimal schedule if the input consists of the
entire list {I_1, I_2, I_3, I_4}. The reader should understand the following: (1) what would have hap-
pened if we had defined the subproblems without ordering the intervals in non-decreasing finishing-
time order, and (2) how we use the fact that in order to compute the optimal solution to input
{I_1, . . . , I_k} we should have recorded in our table all previous solutions to subproblems defined on
inputs {I_1, . . . , I_{k−1}}, {I_1, . . . , I_{k−2}}, {I_1, . . . , I_{k−3}}, . . . .
Algorithm 3: A dynamic programming algorithm for weighted interval scheduling
input : S : the set of all weighted intervals
output: the maximum total weight of a feasible schedule S′ ⊆ S (the schedule itself can be recovered from the table)
Order S as I_1 < I_2 < ... < I_n in non-decreasing finishing time;
S′[0] ← 0;
for i ← 1 to n do
    Let j be the largest index s.t. I_j does not overlap I_i (j = 0 if no such interval exists);
    S′[i] ← max{S′[i − 1], S′[j] + w(I_i)};
end
return S′[n];
We will use the example illustrated in Figure 2.4 as the input to Algorithm 3. At the end of
Algorithm 3’s execution, the S′ array should look like the following:

0 5 11 100 100 150

We remark that unlike greedy algorithms, in Dynamic Programming algorithms the correctness
is transparent from the description of the algorithm. Usually, the difficulty in a DP algorithm is
coming up with a recurrence relation which relates optimal solutions of smaller subproblems to
bigger ones (this recurrence relation is usually trivial to implement as an algorithm).
Exercise 1. Show why Attempt 1 given for the Unweighted Interval Scheduling problem cannot
be worse than half of the optimal.
Chapter 3
Sequence Alignment: can we do Dynamic
Programming in small space?
In this course problems come in two forms: search and optimization. For example, the SEARCH
SHORTEST PATH problem is, for a given input, to find an actual shortest path, whereas the opti-
mization version of the problem is to find the value (i.e. how long such a path is). In this chapter, we
define the SEQUENCE ALIGNMENT problem, and give a dynamic programming algorithm which
solves it. It is very interesting that there is a conceptual difference between the algorithms that solve
the SEARCH and the OPTIMIZATION versions of problems. In particular, although it is easy to
find a Dynamic Programming algorithm for the optimization problem which uses little
space, it is more involved to come up with a time-efficient algorithm that uses little space for the
search version of the problem. To that end, we will combine Dynamic Programming and Divide
and Conquer, and we will give a solution which uses little time and simultaneously little space.
3.1 Sequence Alignment Problem
Sequence alignment has a very common everyday application: spell checking. When we type “Algo-
rthsm” when we mean to type “Algorithm”, the computer has to find words similar to the one we
typed and give us suggestions. In this situation, computers have to find the “nearest” word to the
one we typed. This begs the question: how can we define the “nearest” word to a sequence? We
will give a way to score the difference (or similarity) between strings. Apart from spell checkers,
this finds application in algorithms on DNA sequences. In fact, such DNA sequences have huge
sizes, so an algorithm that works using space n and an algorithm that works using space n^2 make a
big difference in practice!
Usually, alignments add gaps or insert/delete characters in order to make two sequences the
same. Let δ be the gap cost and α_{a,b} be the cost of a mismatch between characters a and b. We
define the distance between a pair of sequences as the minimal alignment cost.
In the example above we have:

algor thsm
algorith m
In the above alignment, we add a total of two gaps to the sequences, so we can say that the
alignment cost in this case is 2δ. Notice that for any sequence pair, there is more than one way to
make them the same, and different alignments may lead to different costs.

For example, take the sequence pair “abbba” and “abaa”, two sequences over the alphabet Σ =
{a, b}. We could use an alignment of cost δ + α_{a,b}:

abbba
ab aa

Or we could use an alignment of cost 3δ:

abbba
ab aa
Notice that the relationship between δ and α may lead to different minimal alignments.
Now we give a formal description of the Sequence Alignment Problem:
• INPUT: Two sequences X, Y over alphabet Σ.
• OUTPUT: An alignment of minimum cost.
Definition 3.1.1 (Alignment). Let X = x_1 x_2 ... x_n and Y = y_1 y_2 ... y_m. An alignment between X
and Y is a set of pairs M ⊆ {1, 2, ..., n} × {1, 2, ..., m} in which each index appears in at most one
pair, and with no crossing pairs: if (i, j), (i′, j′) ∈ M and i < i′, then j < j′.
3.2 Dynamic Algorithm
Claim 3.2.1. Let X, Y be two sequences and M be an alignment. Then one of the following is
true:

1. (n, m) ∈ M;
2. the n-th position of X is unmatched;
3. the m-th position of Y is unmatched.

Proof. When we align sequences X and Y, one of three situations occurs for the last positions:
x_n and y_m are matched with each other (a match or a mismatch), x_n is left unmatched (as if
deleted from X), or y_m is left unmatched (as if a blank is inserted into X). Notice that inserting
a blank into the same position of both X and Y can never lead to a best alignment, so we omit
this case.
We denote by Opt(i, j) the value of the optimal solution to the subproblem of aligning the prefixes
X_i = x_1 x_2 ... x_i and Y_j = y_1 y_2 ... y_j. We denote by α_{x_i, y_j} the mismatch cost between x_i and y_j
and by δ the gap cost. From the claim above, we have:

Opt(i, j) = min { α_{x_i, y_j} + Opt(i − 1, j − 1),  δ + Opt(i − 1, j),  δ + Opt(i, j − 1) }
We can now present a formal dynamic programming algorithm for Sequence Alignment:

Algorithm 4: Sequence-Alignment-Algorithm
input : Two sequences X, Y over alphabet Σ
output: Minimum alignment cost
for i ← 0 to n do
    for j ← 0 to m do
        if i = 0 or j = 0 then
            Opt(i, j) = (i + j) δ;
        else
            Opt(i, j) = min{α_{x_i, y_j} + Opt(i − 1, j − 1), δ + Opt(i − 1, j), δ + Opt(i, j − 1)};
        end
    end
end
return Opt(n, m)
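A Python sketch of Algorithm 4. The cost parameters are assumptions for illustration: δ defaults to 1 and α to a 0/1 mismatch cost (so α_{a,a} = 0 for equal characters), which makes the value the classical edit distance:

```python
def alignment_cost(X, Y, delta=1, alpha=lambda a, b: 0 if a == b else 1):
    """Full O(nm)-space dynamic program for the minimum alignment cost."""
    n, m = len(X), len(Y)
    Opt = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 or j == 0:
                Opt[i][j] = (i + j) * delta          # all-gaps base case
            else:
                Opt[i][j] = min(
                    alpha(X[i - 1], Y[j - 1]) + Opt[i - 1][j - 1],  # (mis)match
                    delta + Opt[i - 1][j],                          # gap: x_i unmatched
                    delta + Opt[i][j - 1],                          # gap: y_j unmatched
                )
    return Opt[n][m]

# The "abbba" vs "abaa" example: one gap plus one mismatch, total cost 2.
alignment_cost("abbba", "abaa")  # returns 2
```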
It is clear that the running time and the space cost are both O(nm) (the n × m array Opt(i, j)
gives the space and the nested for loops give the running time). However, we can improve
Algorithm 4 by reducing the space required while at the same time not increasing the running time!
We can obtain an algorithm with space O(n + m) using another technique, Divide-and-Conquer.

Remark 3.2.2. In fact, we can compute the value of the optimal alignment easily in space only
O(m). When computing the values Opt(k + 1, j), we only need the values Opt(k, j) of the previous
row. Thus we can release the space for Opt(0, j), ..., Opt(k − 1, j) for all j = 0, ..., m. We will call
this the Space-efficient-Alignment-Algorithm.
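The remark can be sketched by keeping only two rows of the table (our own function name; the same hypothetical unit cost parameters as before):

```python
def alignment_cost_small_space(X, Y, delta=1,
                               alpha=lambda a, b: 0 if a == b else 1):
    """Same value as the full DP, in O(m) space: row i only needs row i-1."""
    m = len(Y)
    prev = [j * delta for j in range(m + 1)]       # row i = 0
    for i in range(1, len(X) + 1):
        cur = [i * delta] + [0] * m                # column j = 0 base case
        for j in range(1, m + 1):
            cur[j] = min(alpha(X[i - 1], Y[j - 1]) + prev[j - 1],
                         delta + prev[j],
                         delta + cur[j - 1])
        prev = cur                                 # release the old row
    return prev[m]

alignment_cost_small_space("abbba", "abaa")  # returns 2
```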
3.3 Reduction
We will introduce the Shortest Path on Grids with Diagonals problemand then reduce the Sequence
Alignment problem to it. An example of the Shortest Path on Grids with Diagonals problem is
given in Figure 3.1.
We are given two sequences X = x_1 x_2 ... x_n and Y = y_1 y_2 ... y_m. Let the i-th row of a grid (with
diagonals) represent x_i in X, and the j-th column of the grid represent y_j in Y. Let the weight of
every vertical/horizontal edge be δ and the weight of the diagonal at cell (i, j) be α_{x_i, y_j}. The
shortest path on this graph from u to v has the same value as the minimum alignment cost of the
two sequences.
Figure 3.1 gives an example of a grid with diagonals, G_{XY}. x_1 is matched to the first column,
x_2 to the second column. If we move horizontally from left to right on the first row starting
from u, this means x_1 is matched to a gap. Similarly, y_2 is matched to the bottom row and y_1
to the upper row; if we move vertically from bottom to top, this means y_1 is matched to a gap.
When we use diagonals, some letters of the two sequences are matched together; for example, if
we move along the diagonal closest to u, then x_1 is matched to y_1. No matter the route we take
(allowing only moves up, right, and along a north-east diagonal), we can always find a path from
u to v on G_{XY}. Moreover, such a path always corresponds to an alignment of X and Y, and vice
versa.
Figure 3.1: An example of a grid with diagonals, G_{XY}
Claim 3.3.1. The minimum value of the Shortest Path on Grids with Diagonals instance equals the
minimum alignment cost of the corresponding Sequence Alignment instance.

Proof. (sketch) For any two input sequences, there is a cost-preserving one-to-one mapping between
feasible solutions of the Sequence Alignment problem and feasible solutions of the Shortest Path
on Grids with Diagonals problem.
In the Sequence-Alignment algorithm, if we know the values of the last row, we can run the
algorithm in reverse. We call this the Reverse-Sequence-Alignment algorithm. Let
f[i][j] store the value of the shortest path from u to location (i, j), and g[i][j] store the value of
the Reverse-Sequence-Alignment algorithm at location (i, j). In other words, g[i][j] is the optimal
alignment cost of the suffixes x_i x_{i+1} ... x_n and y_j y_{j+1} ... y_m.

Claim 3.3.2. The shortest path in G_{XY} that passes through (i, j) has length f[i][j] + g[i][j].

Proof. Any path that passes through (i, j) can be divided into two shorter paths, one from u to
(i, j) and the other from (i, j) to v. The shortest path from u to (i, j) has value f[i][j]. Since the
weights of the edges in G_{XY} are fixed no matter which way they are traversed, the shortest path
from (i, j) to v is equal in value to the shortest path from v to (i, j), which is g[i][j]. So the minimum
cost from u to v through (i, j) is f[i][j] + g[i][j].
Claim 3.3.3. For a fixed number q, let k be the number that minimizes f[q][k] + g[q][k]. Then there
exists a shortest path from u to v passing through (q, k).

Proof. Suppose there does not exist a shortest path from u to v passing through (q, k). For a fixed
number q, every path from u to v must pass through some point (q, t), t ∈ {1, ..., m}. Thus, a
shortest path L must pass through some point (q, k′). By our assumption, L cannot pass
through (q, k), which in particular means that f[q][k′] + g[q][k′] < f[q][k] + g[q][k]. This
contradicts the minimality of k.
We give the final algorithm based on the Divide and Conquer technique below:
Algorithm 5: Divide and Conquer Alignment Algorithm(X, Y)
input : Two sequences X = x_1 x_2 ... x_n, Y = y_1 y_2 ... y_m
output: The nodes on the u−v shortest path on G_XY
if n ≤ 2 or m ≤ 2 then
    exhaustively compute the optimal alignment and quit;
end
Call Sequence-Alignment(X, Y[1, m/2])
Call Reverse-Sequence-Alignment(X, Y[m/2 + 1, m])
Find the q minimizing f[q][m/2] + g[q][m/2]
Add (q, m/2) to the output.
Call Divide-and-Conquer-Alignment(X[1, q], Y[1, m/2])
Call Divide-and-Conquer-Alignment(X[q, n], Y[m/2 + 1, m])
The space required by Algorithm 5 is O(n + m). This is because we divide G_XY along its center
column and compute the values f[i][m/2] and g[i][m/2] for each value of i. Since we apply the
recursive calls sequentially and reuse the working space from one call to the next, we end up with
space O(n + m).
We claim the running time of the algorithm is O(nm). Let size = nm. The running time satisfies
the recurrence T(size) = O(size) + T(size/2): the calls to Sequence-Alignment and Reverse-
Sequence-Alignment together need O(size) time, and the two recursive calls are on subproblems
whose sizes, q · m/2 and (n − q) · m/2, sum to size/2. Solving the recurrence, the running time
is O(nm).
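The forward pass (Sequence-Alignment) and the backward pass (Reverse-Sequence-Alignment) can each be computed in linear space, and the split point q recovered from their middle columns. Below is a sketch in Python; the unit gap and mismatch costs are an assumption of this sketch, not fixed by the notes:

```python
def last_row(A, B, gap=1, mismatch=1):
    """Alignment cost of A against every prefix of B, in O(len(B)) space.
    Entry j of the result is the optimal cost of aligning A with B[:j]."""
    prev = [j * gap for j in range(len(B) + 1)]
    for i in range(1, len(A) + 1):
        cur = [i * gap] + [0] * len(B)
        for j in range(1, len(B) + 1):
            sub = 0 if A[i - 1] == B[j - 1] else mismatch
            cur[j] = min(prev[j - 1] + sub,   # diagonal: match/mismatch
                         prev[j] + gap,       # A[i-1] matched to a gap
                         cur[j - 1] + gap)    # B[j-1] matched to a gap
        prev = cur
    return prev

def split_point(X, Y):
    """The q minimizing f[q][m/2] + g[q][m/2], via two linear-space passes."""
    n, m2 = len(X), len(Y) // 2
    f = last_row(Y[:m2], X)                    # f[q] = cost of X[:q] vs Y[:m/2]
    g = last_row(Y[m2:][::-1], X[::-1])[::-1]  # g[q] = cost of X[q:] vs Y[m/2:]
    return min(range(n + 1), key=lambda q: f[q] + g[q])
```

For example, `last_row("kitten", "sitting")[-1]` evaluates the full alignment cost (the classic edit distance, 3), and `split_point` returns the row q of a node (q, m/2) through which some optimal alignment path passes.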
Chapter 4
Matchings and Flows
In this chapter, we discuss matchings and flows from a combinatorial perspective. In a subsequent
lecture we will discuss the same issues using Linear Programming. We introduce the maximum
flow problem and algorithms for solving it, including the Ford-Fulkerson method and one
polynomial-time instantiation of it, the Edmonds-Karp algorithm. We prove the max-flow min-cut
theorem, and we show the polynomial time bound for Edmonds-Karp. Then, we apply
network flows to solve the maximum bipartite matching problem. Finally, we apply non-bipartite
matchings to give a 2-approximation algorithm for Vertex Cover.
4.1 Introduction
Matchings and flows are basic concepts in graph theory, and they have many applications in
practice. In a graph, a matching is a set of edges no two of which share a common end node. A
network flow in a directed weighted graph is a distribution of workloads respecting the capacity
of each edge.

In this lecture, we are going to talk about maximum bipartite matchings. Formally, we have the
following definitions:
Definition 4.1.1 (Bipartite Matching). For a bipartite graph G = (X, Y, E), we say that M ⊆ E is a
(bipartite) matching if every vertex u ∈ X ∪ Y appears at most once in M.
Now that we have the formal definition of a matching, we can introduce the main problem of this
and the following lecture: the Maximum Matching problem.
Definition 4.1.2 (Maximum Matching). Let G = (X, Y, E) be a bipartite graph and M ⊆ E a matching.
Then M is a maximum matching if |M′| ≤ |M| for every matching M′ in G.
Definition 4.1.3 (Maximal Matching). Let G = (X, Y, E) be a bipartite graph and M ⊆ E a matching.
Then M is a maximal matching if there does not exist a matching M′ ⊃ M.
Remark 4.1.4. Notice that we use the term “maximum” and not “maximal”. These two are differ-
ent. A “maximal” matching is a matching to which we cannot add any edge and still have a
matching, while a “maximum” matching is a matching of largest possible size. Every maximum
matching is maximal, but not conversely. We give an example in Figure 4.1.
Figure 4.1: This figure shows the difference between a maximum matching and a maximal matching.
Here we define the Maximum Bipartite Matching problem.
Definition 4.1.5. Maximum Bipartite Matching
• Input: A bipartite graph G = (X, Y, E).
• Output: A maximum matching M ⊆ E.
The maximum bipartite matching problem is fundamental in graph theory and theoretical com-
puter science. It can be solved in polynomial time, but notice that no dynamic programming
algorithm is known for it. To solve this problem, we introduce another problem, the Maximum
Flow problem. Later we will see how to solve maximum matching using maximum flow.
4.2 Maximum Flow Problem
What is a flow? The word “flow” brings to mind water flowing. Indeed, a network flow is like
pushing water from the source to the sink, subject to the capacity of each pipe (here, each edge
of the graph), and the maximum flow problem asks to push as much water as we can. Formally,
a network flow is defined as follows:
Definition 4.2.1 (Network Flow). Given (G, s, t, c), where G = (V, E) is a directed graph, s, t ∈ V,
and c is a function mapping each edge to a (capacity) value, a flow is a function f : E → R such
that:
1. 0 ≤ f(e) ≤ c(e), ∀e ∈ E
2. Σ_{e=(u,v)} f(e) = Σ_{e′=(v,w)} f(e′), ∀v ∈ V − {s, t}
We define the value of a flow f as v(f) = Σ_{e=(s,u)} f(e), that is, the sum of the flow on all
edges leading out of the source s. The maximum flow problem is defined as follows:
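The two conditions of Definition 4.2.1 and the value v(f) can be checked mechanically. A minimal sketch; representing a network as a dictionary mapping edges (u, v) to capacities is an assumption of this sketch, not something the notes prescribe:

```python
def is_flow(cap, f, s, t):
    """cap: dict (u, v) -> capacity c(e); f: dict (u, v) -> flow value."""
    # condition 1: 0 <= f(e) <= c(e) for every edge
    if any(not (0 <= f.get(e, 0) <= c) for e, c in cap.items()):
        return False
    # condition 2: conservation at every node other than s and t
    nodes = {x for e in cap for x in e}
    for v in nodes - {s, t}:
        into = sum(f.get((u, w), 0) for (u, w) in cap if w == v)
        out = sum(f.get((u, w), 0) for (u, w) in cap if u == v)
        if into != out:
            return False
    return True

def value(cap, f, s):
    """v(f): the total flow on edges leading out of the source s."""
    return sum(f.get((u, w), 0) for (u, w) in cap if u == s)
```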
Definition 4.2.2. Maximum Flow Problem
• Input: (G,s,t,c).
• Output: a flow in the graph with maximum value.

Figure 4.2: The transformation from a network to its residual network.
In order to solve the maximum flow problem, we introduce another concept from graph theory,
the residual graph:
Definition 4.2.3 (Residual Graph). For a network N = (G, s, t, c) and a flow f for N, the residual
network G_f is a graph where:
1. V_f = V
2. ∀e ∈ E, if c(e) > f(e), then e is in E_f and c_f(e) = c(e) − f(e). (Here e is called a forward
edge.)
3. ∀e = (u, v) ∈ E, if f(e) > 0, then e′ = (v, u) is in E_f and c_f(e′) = f(e). (Here e′ is called a
backward edge.)
Figure 4.2 gives an example of the transformation from a network to its residual network.
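Definition 4.2.3 translates directly into code. A sketch; the dictionary-of-edge-capacities representation is an assumption of this sketch:

```python
def residual(cap, f):
    """Residual capacities c_f per Definition 4.2.3.
    cap: dict (u, v) -> c(e); f: dict (u, v) -> flow on e."""
    cf = {}
    for (u, v), c in cap.items():
        x = f.get((u, v), 0)
        if c > x:                                  # forward edge
            cf[(u, v)] = cf.get((u, v), 0) + (c - x)
        if x > 0:                                  # backward edge
            cf[(v, u)] = cf.get((v, u), 0) + x
    return cf
```

For a saturated edge only the backward residual edge survives; for an unused edge only the forward one.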
4.3 Max-Flow Min-Cut Theorem
In this part we discuss the relation between cuts and flows. First we introduce the
definition of a cut.
Definition 4.3.1 (Cut). Let G = (V, E). If V = V_1 ∪ V_2 and V_1 ∩ V_2 = ∅, then we call (V_1, V_2) a
cut of the graph G.
In a graph G = (V, E), a cut (S, T) such that s ∈ S and t ∈ T is called an s-t cut.
Claim 4.3.2. Let f be any flow in G. For every s-t cut (S, T), v(f) = f^out(S) − f^in(S).

Proof. By definition, f^in(s) = 0, so we have v(f) = Σ_{e=(s,u)} f(e) = f^out(s) − f^in(s). Since
every node v in S other than s is internal, we have f^out(v) − f^in(v) = 0. Thus,

v(f) = f^out(s) − f^in(s) = Σ_{v∈S} (f^out(v) − f^in(v)) = f^out(S) − f^in(S).
Then we have the Max-Flow Min-Cut theorem:
Theorem 4.3.3 (Max-Flow Min-Cut). If f is an s−t flow such that there is no s−t path in the
residual graph G_f, then there is an s-t cut (A*, B*) in G for which v(f) = c(A*, B*). Conse-
quently, f has the maximum value of any flow in G, and (A*, B*) has the minimum capacity of any
s−t cut in G.
Figure 4.3: the construction of A* in a graph
Proof. Let A* denote the set of all nodes v in G for which there is an s−v path in G_f. Let B*
denote the set of all other nodes: B* = V − A*. First, we establish that (A*, B*) is indeed an s−t
cut. The source s belongs to A* since there is always a path from s to s. Moreover, t does not
belong to A* by the assumption that there is no s−t path in the residual graph, so t ∈ B*.

As shown in Figure 4.3, suppose that e = (u, v) is an edge in G for which u ∈ A* and v ∈ B*.
We claim that f(e) = c(e). If not, e would be a forward edge in the residual graph G_f, so we
would obtain an s−v path in G_f, contradicting our assumption that v ∈ B*. Now suppose that
e′ = (u′, v′) is an edge in G for which u′ ∈ B* and v′ ∈ A*. We claim that f(e′) = 0. If not, there
would be a backward edge (v′, u′) in the residual graph G_f, so we would obtain an s−u′ path in
G_f, contradicting our assumption that u′ ∈ B*. So we conclude:

v(f) = f^out(A*) − f^in(A*)
     = Σ_{e=(u,v), u∈A*, v∈B*} f(e) − Σ_{e=(u,v), u∈B*, v∈A*} f(e)
     = Σ_{e=(u,v), u∈A*, v∈B*} c(e) − 0
     = c(A*, B*)
4.4 Ford-Fulkerson Algorithm
From the Max-Flow Min-Cut theorem, we know that if the residual graph contains a path from s
to t, then we can increase the flow by the minimum residual capacity of the edges on this path, so
we cannot yet have a maximum flow. Otherwise, we can define a cut (S, T) whose capacity equals
the value of the flow f, such that every edge from S to T is saturated and every edge from T to S
is empty, which implies that f is a maximum flow and (S, T) is a minimum cut. Based on the
theorem, we obtain the Ford-Fulkerson algorithm: starting with the zero flow, repeatedly augment
the flow along any s−t path in the residual graph, until there is no such path.
Algorithm 6: Ford-Fulkerson Algorithm
input : A network (G, s, t, c)
output: a flow in the graph with maximum value
∀e ∈ E, f(e) ← 0
while ∃ an s−t path p in G_f do
    f′ ← augment(f, p)
    f ← f′
    update G_f to G_{f′}
end
return f
Note the procedure augment(f, p) in the algorithm. This updating procedure takes an arbitrary
s−t path p in the residual graph and pushes through it a flow equal to the minimum residual
capacity along the path.
Also there are several things we need to know about the Ford-Fulkerson Algorithm:
1. The Ford-Fulkerson algorithm does not always terminate when the capacities are real num-
bers. Consider the flow network shown in Figure 4.4, with source s, sink t, the capacities of
the edges e_1, e_2 and e_3 being 1, r = (√5 − 1)/2 and 1 respectively, and the capacity of
every other edge some integer M ≥ 2. Here the constant r was chosen so that r² = 1 − r.
We use augmenting paths according to the table, where p_1 = {s, v_4, v_3, v_2, v_1, t},
p_2 = {s, v_2, v_3, v_4, t} and p_3 = {s, v_1, v_2, v_3, t}. Note that after step 1 as well as
after step 5, the residual capacities of the edges e_1, e_2 and e_3 are of the form r^n, r^{n+1}
and 0, respectively, for some n ∈ N. This means that we can use the augmenting paths
p_1, p_2, p_1 and p_3 infinitely many times and the residual capacities of these edges will
always be of the same form. The total flow in the network after step 5 is 1 + 2(r^1 + r^2). If
we continue to use augmenting paths as above, the total flow converges to
1 + 2 Σ_{i=1}^∞ r^i = 3 + 2r, while the maximum flow is 2M + 1. In this case, the algorithm
never terminates and the flow does not even converge to the maximum flow.
2. If the capacity function c maps the edges to integers, then Ford-Fulkerson outputs an integer-valued flow.
This algorithm seems quite natural, but in fact the worst-case running time of the Ford-
Fulkerson algorithm is exponential. A famous example on which Ford-Fulkerson may run in
exponential time is shown in Figure 4.5.
In this example, if the algorithm chooses an s−t path which contains the edge u−v or v−u
(this is possible, recalling the existence of backward edges), then each time we can only push 1
unit of flow. Obviously the maximum flow of the network is 2^{n+1}, so we need 2^{n+1}
augmentations. Thus in this case the running time is exponential in the input length (which is
O(n), since the capacities are written in binary).
4.4.1 Edmonds-Karp Algorithm
Now we introduce a faster algorithm called the Edmonds-Karp algorithm, which was published
by Jack Edmonds and Richard Karp in 1972 and solves the maximum flow problem in O(m²n) running
Figure 4.4: an example on which the Ford-Fulkerson algorithm never terminates

Figure 4.5: An example on which Ford-Fulkerson may take exponential time
time. This algorithm differs from the Ford-Fulkerson algorithm in only one rule: Ford-Fulkerson
chooses the augmenting path arbitrarily, whereas Edmonds-Karp chooses an augmenting path of
shortest length, i.e. with the fewest edges.

At first this rule sounds surprising: what does the length of a path have to do with the flow that
we can push through it? A deeper analysis shows that they are indeed related.
Theorem 4.4.1. Edmonds-Karp terminates in time O(|E|² |V|).
Proof.
Lemma 4.4.2. If we run the Edmonds-Karp algorithm on the flow network (G = (V, E), s, t, c),
then for every v ∈ V − {s, t} the shortest-path distance δ_f(s, v) in the residual graph increases
monotonically with each flow augmentation.
Proof. Suppose, for contradiction, that for some vertex v ∈ V − {s, t} there is a flow augmentation
that causes the shortest-path distance from s to v to decrease. Let f be the flow just before the
first augmentation that decreases some shortest-path distance, and let f′ be the flow just
afterward. Let v be a vertex with minimum δ_{f′}(s, v) among the vertices whose distance was
decreased by the augmentation, so that δ_{f′}(s, v) < δ_f(s, v). Let p = s → ... → u → v be a
shortest path from s to v in G_{f′}, so that (u, v) ∈ E_{f′} and δ_{f′}(s, u) = δ_{f′}(s, v) − 1. By
the choice of v, the distance of u did not decrease, i.e. δ_{f′}(s, u) ≥ δ_f(s, u). We claim that
(u, v) ∉ E_f: otherwise we would have δ_f(s, v) ≤ δ_f(s, u) + 1 ≤ δ_{f′}(s, u) + 1 = δ_{f′}(s, v),
contradicting δ_{f′}(s, v) < δ_f(s, v). Since (u, v) ∉ E_f but (u, v) ∈ E_{f′}, the augmentation
must have increased the flow from v to u. The Edmonds-Karp algorithm always augments along
shortest paths, and therefore the shortest path from s to u in G_f has (v, u) as its last edge. Then

δ_f(s, v) = δ_f(s, u) − 1 ≤ δ_{f′}(s, u) − 1 = δ_{f′}(s, v) − 2,

which contradicts our assumption that δ_{f′}(s, v) < δ_f(s, v).
Theorem 4.4.3. If the Edmonds-Karp algorithm runs on a flow network G = (V, E) with source s
and sink t, then the total number of flow augmentations performed by the algorithm is O(V E).
Proof. We say that an edge (u, v) in a residual network G_f is critical on an augmenting path p if
the residual capacity of p is the residual capacity of (u, v), that is, if c_f(p) = c_f(u, v). After we
have augmented flow along an augmenting path, any critical edge on the path disappears from the
residual network. Moreover, at least one edge on any augmenting path must be critical. We will
show that each of the |E| edges can become critical at most |V|/2 − 1 times.

Let u and v be vertices in V that are connected by an edge in E. Since augmenting paths are
shortest paths, when (u, v) is critical for the first time, we have δ_f(s, v) = δ_f(s, u) + 1. Once
the flow is augmented, the edge (u, v) disappears from the residual network. It cannot reappear
later on another augmenting path until after the flow from u to v is decreased, which occurs only
if (v, u) appears on an augmenting path. If f′ is the flow in G when this event occurs, then we
have δ_{f′}(s, u) = δ_{f′}(s, v) + 1. Since δ_f(s, v) ≤ δ_{f′}(s, v) by Lemma 4.4.2, we get

δ_{f′}(s, u) = δ_{f′}(s, v) + 1 ≥ δ_f(s, v) + 1 = δ_f(s, u) + 2.

Consequently, from the time (u, v) becomes critical to the time when it next becomes critical, the
distance of u from the source increases by at least 2. The distance of u from the source is initially
at least 0. The intermediate vertices on a shortest path from s to u cannot contain s, u, or t.
Therefore, until u becomes unreachable from the source, if ever, its distance is at most |V| − 2.
Thus, (u, v) can become critical at most (|V| − 2)/2 = |V|/2 − 1 times. Since there are O(E) pairs
of vertices that can have an edge between them in a residual graph, the total number of critical
edges during the entire execution of the Edmonds-Karp algorithm is O(V E). Each augmenting
path has at least one critical edge, and hence the theorem follows.

Figure 4.6: constructing a new network in order to run Ford-Fulkerson on it
Since each iteration of Ford-Fulkerson can be implemented in O(E) time when the augment-
ing path is found by breadth-first search, the total running time of the Edmonds-Karp algorithm
is O(|E|² |V|).
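Putting the pieces together, Edmonds-Karp is Ford-Fulkerson with the augmenting path chosen by breadth-first search on the residual graph. A sketch in Python; the edge-dictionary representation is an assumption, and only the flow value (not the flow itself) is returned:

```python
from collections import deque

def edmonds_karp(cap, s, t):
    """Max-flow value on the network given by cap: dict (u, v) -> c(e)."""
    cf = dict(cap)                      # residual capacities
    adj = {}
    for (u, v) in cap:                  # residual edges may go both ways
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    flow = 0
    while True:
        parent = {s: None}              # BFS tree: fewest-edge augmenting path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and cf.get((u, v), 0) > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:             # no s-t path left: flow is maximum
            return flow
        path, v = [], t                 # recover the path and its bottleneck
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        b = min(cf[e] for e in path)
        for (u, v) in path:             # push b units, updating residuals
            cf[(u, v)] -= b
            cf[(v, u)] = cf.get((v, u), 0) + b
        flow += b
```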
4.5 Bipartite Matching via Network Flows
In this section we solve the maximum matching problem using the maximum flow problem. In
fact this is rather straightforward. Let G = (X, Y, E) be a bipartite graph. We add two nodes
s, t to the graph, connect s to all nodes in X, connect all nodes in Y to t, and give every edge
capacity 1, as in Figure 4.6. Running Ford-Fulkerson on the new network N, we get a maximum
flow f. Define a matching M = {e ∈ E | f(e) = 1}; then M is a maximum matching. The proof is
simple: we just need to exhibit a one-to-one, value-preserving correspondence between matchings
and integral flows. For a matching, pushing one unit of flow along s−x−y−t for each matching
edge (x, y) gives a flow of the same value; conversely, for an integral flow f, the set
M = {e ∈ E | f(e) = 1} is a matching of the same value. Thus the matching we find using
Ford-Fulkerson is a maximum matching.
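On this unit-capacity network, each Ford-Fulkerson iteration is just a search for an augmenting path, so the reduction can be sketched without building the flow network explicitly; the function names below are illustrative:

```python
def max_bipartite_matching(X, Y, edges):
    """Maximum matching in G = (X, Y, edges) via augmenting paths; this is
    Ford-Fulkerson specialized to the unit-capacity network s -> X -> Y -> t."""
    adj = {x: [y for (a, y) in edges if a == x] for x in X}
    match = {}                                  # y -> its matched x

    def augment(x, seen):
        """Try to find an augmenting path starting at the free vertex x."""
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                # y is free, or y's partner can be re-matched elsewhere
                if y not in match or augment(match[y], seen):
                    match[y] = x
                    return True
        return False

    return sum(augment(x, set()) for x in X)    # one augmentation per x
```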
4.6 Application of Maximum Matchings: Approximating Ver-
tex Cover
We introduce a new problem, Vertex Cover:
Definition 4.6.1. Vertex Cover
• Input: G = (V, E)
• Output: a cover of minimum size.
Here is the definition of a cover:

Definition 4.6.2 (Cover). Let G = (V, E). A set U ⊆ V is a cover if ∀e ∈ E, ∃u ∈ U which is an
end node of e.
Despite a long line of research, the approximability of the problem remains open. What is known:

1. Negative: VC cannot be approximated within a factor smaller than 1.36, unless P = NP.

2. Positive: there is a polynomial-time algorithm for VC with approximation ratio 2 − Ω(1/√(log n)).

3. Belief: VC cannot be approximated within 2 − ε, for every ε > 0. In fact, under the so-called
Unique Games Conjecture this is true.
Here is a simple algorithm that approximates VC with ratio 2:
1. compute the maximum matching M in the graph.
2. output the vertices of M.
Proof. Observe that |M| ≤ OPT, where M is the maximum matching: the edges of M share no
endpoints, so any cover must use a distinct vertex for each of them. Also notice that the vertices
of M cover G. Suppose some edge (u, v) has neither endpoint among the vertices of M; then
M ∪ {(u, v)} is a matching, contradicting the fact that M is a maximum matching. Finally, the
algorithm outputs 2|M| nodes, and 2|M| ≤ 2 OPT.
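The same argument works for any maximal matching, which is even simpler to compute than a maximum one: a single greedy pass suffices. A sketch (using a maximal rather than a maximum matching is a deviation from the algorithm above, but the proof applies verbatim):

```python
def vertex_cover_2approx(edges):
    """2-approximate vertex cover: endpoints of a greedily built maximal matching."""
    matched = set()
    for (u, v) in edges:
        if u not in matched and v not in matched:  # edge joins two free nodes
            matched.update((u, v))                 # add it to the matching
    return matched   # |cover| = 2|M| <= 2 * OPT
```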
Chapter 5
Optimization Problems, Online and
Approximation Algorithms
In this chapter, we begin by introducing the definition of an optimization problem followed by
the definition of an approximation algorithm. We then present the Makespan problem as a simple
example of when an approximation algorithm is useful. Because the Makespan problem is NP-hard,
there does not exist a polynomial algorithm to solve it unless P = NP. If we cannot
find an exact answer to an arbitrary instance of Makespan in polynomial time, then perhaps we
can give an approximate solution. To this end, we formulate an approximation algorithm for the
Makespan problem: Greedy-Makespan. We go on to improve this algorithm to one with a lower
approximation ratio (i.e. an algorithm that is guaranteed to give an even closer approximation to
an optimal answer).
5.1 Optimization Problems
Definition 5.1.1. An optimization problem asks to find an optimal (best) solution from a set of
feasible solutions.
Formally, an optimization problem Π consists of:
1. A set of inputs, or instances, J.
2. A set f(I) of feasible solutions, for each given instance I ∈ J.
3. An optimization objective function c : T → R, where T = ∪_{I∈J} f(I). That is, given an
instance I and a feasible solution y, the objective function assigns to y a real number, a
“measure” of y.

The goal of an optimization problem Π is to either minimize or maximize: the shortest or
longest path in a graph, for example. For maximization problems, the goal of problem Π is, for
every instance I, to find an output O ∈ f(I) such that c(O) = max_{O′ ∈ f(I)} c(O′).
For example, in the Unweighted Interval Scheduling Problem, we have 4 intervals, as shown
in the following Figure 5.1
Here the feasible solutions are {I_1, I_2}, {I_1}, {I_2}, {I_3}, {I_4}, and {I_1, I_2} is optimal, as it maxi-
mizes the number of intervals that can be scheduled.
Notice: In the following section, we will only discuss maximization problems for brevity;
minimization problems are in no way fundamentally different.
5.2 Approximation Algorithms
Sometimes we may not know the optimal solution for a problem, for example if there is no known
efficient algorithm to produce it. In these cases, it might be useful to find an answer that
is “close” to optimal. Approximation algorithms do exactly this, and we now present them
formally.
Definition 5.2.1 (Approximation Algorithm). Let Π be an optimization problem and A a poly-
nomial-time algorithm for Π. For every instance I, denote by OPT(I) the value of the optimal
solution, and by A(I) the value of the solution computed by A, for that instance. We say that A is
a c-approximation algorithm for Π if

∀I, OPT(I) ≤ c · A(I)
5.3 Makespan Problem
In this section, we will discuss the Makespan problem followed by an approximation algorithm
which solves it. Makespan is a basic problem in scheduling theory: how to schedule a set of jobs
on a set of machines.
Definition 5.3.1 (Makespan Problem). We are given n jobs with processing times t_1, t_2, ..., t_n.
The goal is to design an assignment of the jobs to m separate but identical machines such that the
last finishing time over the given jobs (also called the makespan) is minimized.
We first introduce some basic notation to analyze the Makespan problem. We are given m ∈ N
and t_1, t_2, ..., t_n as input. Here, m is the number of machines and t_1, t_2, ..., t_n are the
processing lengths of the jobs. Figure 5.3 gives an example input. We define A_i to be the set of
jobs assigned to machine i and T_i to be the total processing time of machine i: T_i = Σ_{t ∈ A_i} t.
The goal of the problem is to find a scheduling of the jobs that minimizes max_i T_i.
Suppose we are given as input m = 2 and ⟨t_1, t_2, t_3, t_4, t_5⟩ = ⟨1, 1, 1, 1, 4⟩.
Of these two possible solutions, the left one is better, in fact it is optimal.
Fact 5.3.2. MAKESPAN is NP-hard.
Thus, a polynomial-time algorithm solving arbitrary instances of Makespan does not exist
unless P = NP. However, we can find a polynomial-time approximation algorithm that gives a
good, albeit possibly suboptimal, solution.
5.4 Approximation Algorithm for Makespan Problem
A greedy approach is often one of the first ideas to try, as it is so intuitive. The basic idea is as
follows: we schedule the jobs from 1 to n, assigning the i-th job to the machine with the currently
smallest total processing time.
Algorithm 7: Greedy Makespan
input : m ∈ N: the number of machines, and t_1, t_2, ..., t_n: the processing lengths of the jobs
output: T: the makespan of the computed schedule
for i ← 1 to n do
    choose k with the smallest T[k] among T[1], ..., T[m]
    A[k] ← A[k] ∪ {i}
    T[k] ← T[k] + t[i]
end
T ← max_{k∈[m]} T[k]
return T
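With a min-heap over the machine loads, each greedy step takes O(log m) time. A sketch of Algorithm 7 in Python:

```python
import heapq

def greedy_makespan(m, times):
    """Assign each job, in input order, to the currently least-loaded machine;
    return the makespan max_k T[k]."""
    heap = [(0, k) for k in range(m)]    # (T[k], k) as a min-heap
    for t in times:
        load, k = heapq.heappop(heap)    # machine with the smallest T[k]
        heapq.heappush(heap, (load + t, k))
    return max(load for load, _ in heap)
```

On the instance m = 2, times = (1, 1, 1, 1, 4) from the text, `greedy_makespan` returns 6 while the optimum is 4.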
Theorem 5.4.1. Greedy Makespan is a 2-approximation algorithm for Makespan
Proof. Our proof has two parts. First we show that the approximation ratio of Greedy Makespan
is no better than 2: there is a family of instances on which its output approaches twice the
optimal. Then we show an upper bound of 2: on every instance the output is at most twice the
optimal. Together these show that Greedy Makespan is indeed a 2-approximation for Makespan,
and that the factor 2 cannot be improved in the analysis.

Here is the family of instances: consider m machines and n jobs, where n = m(m − 1) + 1.
Among the n jobs, m(m − 1) of them have processing length 1 and the last one has processing
length m. Greedy Makespan will schedule the jobs as shown in Figures 5.1 and 5.2. The optimal
solution is shown in Figure 5.3.
Figure 5.1: Before placing the last job.
Figure 5.2: After placing the last job
On this instance Greedy Makespan outputs 2m − 1 while the optimum is m, giving an approxi-
mation ratio of (2m − 1)/m = 2 − 1/m. Thus, by example, we have shown that the approximation
ratio of Greedy Makespan is no better than 2 − 1/m, which tends to 2 as m grows.
We now show that the output of the greedy algorithm is at most twice that of the optimal
solution, i.e. it has an upper bound of 2. First we state two easy facts:
1. (1/m) Σ_i t_i ≤ OPT(I), since some machine must receive at least the average load.
2. t_max ≤ OPT(I), where t_max is the longest job.
Let k be a machine achieving the makespan, and let T′_k = T_k − t_j, where t_j is the last job
assigned to machine k. When job j was assigned, machine k had the smallest load, so T′_k is at
most the average processing time. From fact (1), we have T′_k ≤ (1/m) Σ_i t_i ≤ OPT. From fact
(2), we have T′_k = T_k − t_j ≥ T_k − OPT. Combining these facts, we get T_k − OPT ≤ OPT, so
T_k ≤ 2 OPT.
Now we have a 2-approximation algorithm for the Makespan problem. But can we do better?
The answer is yes. Next we improve this bound with Improved Greedy Makespan.
The only difference between Improved Greedy Makespan and Greedy Makespan is that the
improved algorithm first sorts the t_i in non-increasing order before scheduling.
Theorem 5.4.2. Improved Greedy Makespan is a 3/2-approximation algorithm for Makespan.
Figure 5.3: Optimal solution
Proof. Since the processing times are sorted in non-increasing order at the beginning of the
algorithm, we have t_1 ≥ t_2 ≥ ... ≥ t_n.

In this proof we will make use of the following claim:
In this proof we will make use of the following claim:
Claim 5.4.3. When n ≤ m, Improved Greedy Makespan obtains an optimal solution. When
n > m, we have OPT ≥ 2 t_{m+1}.

Proof. When n ≤ m, every job is assigned to its own machine, so the output of the algorithm
equals the processing time of the largest job. This is obviously optimal.

When n > m, then by the pigeonhole principle, in the optimal schedule there must be two jobs
t_i, t_j with i, j ≤ m + 1 that are assigned to the same machine, and t_i, t_j ≥ t_{m+1}. We
conclude that OPT ≥ 2 t_{m+1}.
Define T′_k = T_k − t_j, with t_j the last job assigned to machine k. Then we have:
1. If t_j is such that j ≤ m, then there is only one job scheduled on machine k. This means that
T_k ≤ t_1 ≤ OPT.
2. If t_j is such that j > m, then machine k is the one with the smallest total length before the
j-th job is scheduled. This implies T′_k ≤ (1/m) Σ_{i<j} t_i ≤ (1/m) Σ_i t_i ≤ OPT.
Furthermore, by Claim 5.4.3 we have t_j ≤ t_{m+1} ≤ (1/2) OPT, so
T′_k = T_k − t_j ≥ T_k − (1/2) OPT. Thus, T_k − (1/2) OPT ≤ OPT, which
implies T_k ≤ (3/2) OPT.

So for every k ∈ [m], T_k ≤ (3/2) OPT. Therefore 3/2 is an upper bound on the approximation ratio
of Improved Greedy Makespan.
Although we have shown that Improved Greedy Makespan has an approximation ratio of 3/2,
this bound is not tight. That is, we can do better.

Remark 5.4.4. Improved Greedy Makespan is in fact a 4/3-approximation algorithm. (Proof omit-
ted.)
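Improved Greedy Makespan is just the greedy algorithm with a sort in front (this rule is classically known as LPT, Longest Processing Time first); a self-contained sketch:

```python
import heapq

def improved_greedy_makespan(m, times):
    """Improved Greedy (LPT): schedule jobs in non-increasing order of length,
    each on the currently least-loaded machine; return the makespan."""
    heap = [(0, k) for k in range(m)]          # (T[k], k) min-heap
    for t in sorted(times, reverse=True):      # the only change vs. plain greedy
        load, k = heapq.heappop(heap)
        heapq.heappush(heap, (load + t, k))
    return max(load for load, _ in heap)
```

On m = 2, times = (1, 1, 1, 1, 4), where the plain greedy outputs 6, the improved algorithm schedules the length-4 job first and achieves the optimum, 4.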
Chapter 6
Introduction to Convex Polytopes
In this lecture, we cover some basic material on the structure of polytopes and linear programming.
6.1 Linear Space
Definition 6.1.1 (Linear space). V is a linear space if V ⊆ R^d, d ∈ Z^+, and
V = span{u_1, u_2, ..., u_k} for some u_1, ..., u_k ∈ R^d. Equivalently,
V = {u | u = c_1 u_1 + ... + c_k u_k for some c_1, c_2, ..., c_k ∈ R}.
Definition 6.1.2 (Dimension). Let dim(V ) be the dimension of a linear space V . The dimension
of a linear space V is the cardinality of a basis of V . A basis is a set of linearly independent vectors
that, in a linear combination, can represent every vector in a given linear space.
Example 6.1.3 (Linear space). The line V ⊆ R² in Figure 6.1 is a linear space; note that it
contains the origin 0. It is obvious that dim(V) = 1.
Figure 6.1: An example of linear space
6.2 Affine Space
In this section, we first recall the notion of an affine map x → Ax + x_0, which represents an
affine change of coordinates if A is a nonsingular n × n matrix, x, x_0 ∈ R^n.
Then we will introduce an affine space. An affine space is a linear space that has “forgotten”
its origin. It can be seen as a shift of a linear space.
Definition 6.2.1 (Affine Space). Let V be a linear space. V′ is an affine space if V′ = V + β for
some fixed vector β ∈ R^d; that is, v′ ∈ V′ if and only if v′ = v + β for some v ∈ V.
Example 6.2.2. In Figure 6.2, the line V is a linear space while the line V′ is not, since it does
not contain the origin 0. V′ is an affine space with the property that V′ = V + β = {v + β | v ∈ V},
where V ⊆ R² is a linear space and β ∈ R² is a fixed vector. Note that dim(V′) = dim(V) = 1.
Figure 6.2: An example of affine space
Definition 6.2.3 (Affine Hull). The affine hull aff(S) of S is the set of all affine combinations of
elements of S, that is,

aff(S) = {u | u = Σ_{i=0}^k λ_i u_i, u_i ∈ S, λ_i ∈ R, Σ_{i=0}^k λ_i = 1, k = 0, 1, 2, ...}
Claim 6.2.4. V′ is an affine space if and only if there exists a finite set S = {u_0, u_1, ..., u_k}
such that V′ = aff(S).
Proof. Let V′ = V + β. We define V = span{u_1, u_2, ..., u_k}, we let B = {u_1, u_2, ..., u_k}, and
finally B′ = (B ∪ {0}) + β = {β, β + u_1, β + u_2, ..., β + u_k}.

• For every y ∈ aff(B′), we show that y ∈ V′:

y = λ_0 β + λ_1 (u_1 + β) + λ_2 (u_2 + β) + ... + λ_k (u_k + β)
  = (λ_0 + λ_1 + ... + λ_k) β + λ_1 u_1 + λ_2 u_2 + ... + λ_k u_k
  = β + λ_1 u_1 + λ_2 u_2 + ... + λ_k u_k,

using Σ_i λ_i = 1. Since λ_1 u_1 + λ_2 u_2 + ... + λ_k u_k ∈ V, we get y ∈ V′.

• For every y ∈ V′, we show that y ∈ aff(B′). Write y = β + λ_1 u_1 + λ_2 u_2 + ... + λ_k u_k. Then

y = β + (λ_1 + λ_2 + ... + λ_k) β − (λ_1 + λ_2 + ... + λ_k) β + λ_1 u_1 + λ_2 u_2 + ... + λ_k u_k
  = [1 − (λ_1 + λ_2 + ... + λ_k)] β + λ_1 (u_1 + β) + λ_2 (u_2 + β) + ... + λ_k (u_k + β)
  = λ_0 β + λ_1 (u_1 + β) + λ_2 (u_2 + β) + ... + λ_k (u_k + β),

where λ_0 = 1 − (λ_1 + ... + λ_k), so that the coefficients sum to 1.

So y ∈ aff(B′).
6.3 Convex Polytope
Definition 6.3.1 (Convex body). A set K ⊆ R^d is a convex body if for any two points x, y ∈ K,
K also contains the straight line segment [x, y] = {λx + (1 − λ)y | 0 ≤ λ ≤ 1}; that is,
[x, y] ⊆ K.

Example 6.3.2. The set in Figure 6.3(a) is a convex body since for any x, y ∈ K, [x, y] ⊆ K.
However, the set in Figure 6.3(b) is not a convex body, since there exist x, y ∈ K′ such that
[x, y] ⊄ K′.
Figure 6.3: (a) An example of convex body (b) An example of non-convex body
Definition 6.3.3 (Convex Hull). For any K ⊆ R^d, the “smallest” convex set containing K, called
the convex hull of K, can be constructed as the intersection of all convex sets that contain K:

conv(K) := ∩ {U | U ⊆ R^d, K ⊆ U, U is convex}.
Definition 6.3.4 (Convex Hull). An alternative definition of the convex hull: let
K = {u_1, u_2, ..., u_k}; then

conv̄(K) = {u | u = λ_1 u_1 + ... + λ_k u_k, Σ_{i=1}^k λ_i = 1, λ_i ≥ 0}.
Claim 6.3.5. conv(K) = conv̄(K).

Proof. The set conv̄(K) is convex and contains K; since conv(K) is the “smallest” convex set
containing K ⊆ R^d, conv(K) ⊆ conv̄(K). Next, we use induction on k to prove that
conv̄(K) ⊆ conv(K).

The base case is clear: when K = {u_1} ⊆ R^d, conv̄(K) = {u_1} ⊆ conv(K).

We now assume that the statement holds for k − 1 points, i.e. for K = {u_1, ..., u_{k−1}} ⊆ R^d
we have conv̄(K) ⊆ conv(K). For k points, fix an arbitrary u ∈ conv̄(K),
u = λ_1 u_1 + ... + λ_k u_k. If λ_k = 1 then u = u_k ∈ conv(K). Otherwise,

u = (1 − λ_k) [ (λ_1 / (1 − λ_k)) u_1 + ... + (λ_{k−1} / (1 − λ_k)) u_{k−1} ] + λ_k u_k,

where by the induction hypothesis
u′ = (λ_1 / (1 − λ_k)) u_1 + ... + (λ_{k−1} / (1 − λ_k)) u_{k−1} ∈ conv(K), since the coefficients
are nonnegative and sum to 1. Because u′, u_k ∈ conv(K) and conv(K) is convex,
u = (1 − λ_k) u′ + λ_k u_k ∈ [u′, u_k] ⊆ conv(K). Thus, by induction on k,
conv̄(K) ⊆ conv(K).

Hence, conv(K) = conv̄(K).
6.4 Polytope
In this section, we are going to introduce the concept of a V-Polytope and an H-Polytope.
Definition 6.4.1 (V-polytope). A V-polytope is the convex hull of a finite set of points.
Before we give a definition of an H-Polytope, we need to give the definition of a hyperplane
and an H-Polyhedron.
Definition 6.4.2 (Hyperplane). A hyperplane in R^d can be described as the set of points
x = (x_1, x_2, ..., x_d)^T which are solutions to α_1 x_1 + α_2 x_2 + ... + α_d x_d = β, for fixed
α_1, α_2, ..., α_d, β ∈ R.
Example 6.4.3. x_1 + x_2 = 1 is a hyperplane.
Figure 6.4: An example of hyperplane
Definition 6.4.4 (Half-space). A half-space in R^d is the set of points x = (x_1, x_2, ..., x_d)^T
which satisfy α_1 x_1 + α_2 x_2 + ... + α_d x_d ≤ β, for fixed α_1, α_2, ..., α_d, β ∈ R.
Definition 6.4.5 (H-polyhedron). An H-polyhedron is the intersection of a finite number of closed
half-spaces.
Definition 6.4.6. An H-polytope is a bounded H-polyhedron.
A simple question follows: are the two types of polytopes equivalent? The answer is yes;
however, the proof, via Fourier-Motzkin elimination, is somewhat involved.
Figure 6.5: (a) H-polyhedron and (b)H-polytope
6.5 Linear Programming
The goal of a Linear Program (LP) is to optimize a linear function over a polytope. For example, we may wish to minimize/maximize α_1 x_1 + α_2 x_2 + · · · + α_n x_n, where x_1, . . . , x_n are variables and α_1, . . . , α_n are constants, subject to a set of inequalities:

    α_11 x_1 + α_12 x_2 + · · · + α_1n x_n ≤ β_1
    α_21 x_1 + α_22 x_2 + · · · + α_2n x_n ≤ β_2
    ...
Such a program can be solved in polynomial time if x_1, . . . , x_n are real numbers, but not (as far as we know) when they are restricted to be integers. For a while it was believed that LP might be a problem lying strictly between P and NP.
However, Linear Programs can, in fact, be solved in time polynomial in their input size. The
first polynomial time algorithm for LPs was the Ellipsoid algorithm. Since then, there has been a
substantial amount of work in other polynomial time algorithms, e.g. Interior Point methods.
Example 6.5.1. Minimize f(x) = 7x_1 + x_2 + 5x_3, subject to

    x_1 − x_2 + 3x_3 ≥ 10
    5x_1 + 2x_2 − x_3 ≥ 6
    x_1, x_2, x_3 ≥ 0

Every x = (x_1, x_2, x_3)^T that satisfies the constraints is called a feasible solution.

Question: Can you "prove" that there exists a feasible x̄ with f(x̄) ≤ 30?

Answer: Yes. Here is such an x̄: x̄ = (2, 1, 3)^T, and f(x̄) = 30.

To disprove a proposition or prove a lower bound, we may need to come up with combinations of the inequalities, such as

    7x_1 + x_2 + 5x_3 ≥ (x_1 − x_2 + 3x_3) + (5x_1 + 2x_2 − x_3) ≥ 10 + 6 = 16,

where the first inequality holds because x_1, x_3 ≥ 0.
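Both certificates in the example above can be verified mechanically: a feasible point certifies that the optimum is at most 30, and the non-negative combination of the constraints certifies that it is at least 16. A quick sketch:

```python
# Verify the two certificates from Example 6.5.1: a feasible point bounds
# the optimum from above; a combination of constraints bounds it from below.
def f(x1, x2, x3):
    return 7*x1 + x2 + 5*x3

def feasible(x1, x2, x3):
    return (x1 - x2 + 3*x3 >= 10 and
            5*x1 + 2*x2 - x3 >= 6 and
            x1 >= 0 and x2 >= 0 and x3 >= 0)

xbar = (2, 1, 3)
# for any feasible x:
# f(x) >= (x1 - x2 + 3*x3) + (5*x1 + 2*x2 - x3) >= 10 + 6 = 16
```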
Chapter 7
Forms of Linear Programming
We describe the different forms of Linear Programs (LPs), including the standard and canonical forms, as well as how to transform between them.
7.1 Forms of Linear Programming
In the previous lecture, we introduced the notion of Linear Programming by giving a simple exam-
ple. Linear Programs are commonly written in two forms: standard form and canonical form.
Recall from the previous lecture that an LP is a set of linear inequalities together with an
objective function. Each linear inequality takes the form:
    a_1 x_1 + a_2 x_2 + · · · + a_n x_n {≤, =, ≥} b

There may be many such linear inequalities. The objective function takes the form:

    c_1 x_1 + c_2 x_2 + · · · + c_n x_n

The goal of Linear Programming is to maximize or minimize an objective function over the assignments that satisfy all inequalities. The general form of an LP is:

    maximize/minimize c^T x
    subject to equations
               inequalities
               non-negativity constraints
               unconstrained variables, e.g. x_i ≷ 0
There are a couple of different ways to write a Linear Program.
Definition 7.1.1 (Canonical Form). An LP is said to be in canonical form if it is written as

    minimize c^T x
    subject to Ax ≥ b
               x ≥ 0

Definition 7.1.2 (Standard Form). An LP is said to be in standard form if it is written as

    minimize c^T x
    subject to Ax = b
               x ≥ 0
7.2 Linear Programming Form Transformation
We present transformations that allow us to change an LP written in the general form into one
written in the standard form, and vice versa.
1. STEP 1: From general form into canonical form.

    maximize c^T x   ⇒   − minimize (−c^T x)
    a^T x = b        ⇒   a^T x ≤ b and a^T x ≥ b
    a^T x ≤ b        ⇒   −a^T x ≥ −b
    x_i ≷ 0          ⇒   replace x_i by x_i^+ − x_i^−, with x_i^+, x_i^− ≥ 0

2. STEP 2: From canonical form into standard form.

    a^T x ≥ b   ⇒   a^T x − y = b, y ≥ 0   (y is a surplus variable)
Example 7.2.1. Suppose we have an LP written in the general form, and we wish to transform it
into the standard form:
    maximize 2x_1 + 5x_2
    subject to x_1 + x_2 ≤ 3
               x_2 ≥ 0
               x_1 ≷ 0

The standard form of the LP is:

    minimize −(2x_1^+ − 2x_1^− + 5x_2)
    subject to x_1^+ − x_1^− + x_2 + y = 3
               x_1^+, x_1^−, x_2, y ≥ 0
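The transformation can be sanity-checked on this example: a feasible point of the general form maps to a feasible point of the standard form with the negated objective value. A sketch (the mapping function is ours, not part of the notes):

```python
# Map a point (x1, x2) of the general-form LP of Example 7.2.1 into its
# standard form: split x1 = x1p - x1m and add the slack variable y.
def to_standard(x1, x2):
    x1p, x1m = max(x1, 0), max(-x1, 0)
    y = 3 - (x1p - x1m + x2)          # slack for x1 + x2 <= 3
    return x1p, x1m, x2, y

def standard_feasible(x1p, x1m, x2, y):
    return (abs(x1p - x1m + x2 + y - 3) < 1e-9 and
            min(x1p, x1m, x2, y) >= 0)

def standard_objective(x1p, x1m, x2, y):
    return -(2*x1p - 2*x1m + 5*x2)

pt = to_standard(-1, 2)               # feasible: -1 + 2 <= 3 and x2 >= 0
```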
Chapter 8
Linear Programming Duality
In this lecture, we introduce the dual of a Linear Program (LP) and give a geometric proof of the
Linear Programming Duality Theorem from elementary principles. We also introduce the concept
of complementary slackness conditions, which we use to characterize optimal solutions to LPs and
their duals. This material finds many applications in subsequent lectures.
8.1 Primal and Dual Linear Program
8.1.1 Primal Linear Program
Consider an arbitrary Linear Program, which we will call primal in order to distinguish it from
the dual (introduced later on). In what follows, we assume that the primal Linear Program is a
maximization problem, where variables take non-negative values.
    maximize Σ_{j=1}^n c_j x_j
    subject to Σ_{j=1}^n a_ij x_j ≤ b_i    for i = 1, 2, . . . , m
               x_j ≥ 0                     for j = 1, 2, . . . , n

The above Linear Program can be rewritten as:

    maximize c^T x          (8.1)
    subject to Ax ≤ b       (8.2)
               x ≥ 0        (8.3)
An example:
    maximize 3x_1 + x_2 + 2x_3                 (8.4)
    subject to x_1 + x_2 + 3x_3 ≤ 30           (8.5)
               2x_1 + 2x_2 + 5x_3 ≤ 24         (8.6)
               4x_1 + x_2 + 2x_3 ≤ 36          (8.7)
               x_1, x_2, x_3 ≥ 0               (8.8)
8.1.2 Dual Linear Program
Suppose that one wants to know whether there is a flow in a given network of value ≥ α. If
such a flow exists, then we have a certificate of small size that proves that such a thing exists: the
certificate is a flow f such that α = v(f). Now, what if one wants to know whether or not every
flow has value < α? One certificate we can give to prove this is to list all flows (say, of integer
values). But this certificate is far too large. From the max-flow min-cut theorem we know that the
weight of every cut is an upper bound to the maximum flow. Observe that for this, we just list the
cut, which is a small certificate. Note that the “size” of the certificate is one thing. We care both
about the size and about the time required to verify using this certificate.
The above relation between max-flow and min-cut is called duality. The concept of duality
appears to be very useful in the design of efficient combinatorial algorithms. As we have men-
tioned before, Linear Programming can be seen as a kind of “unification theory” for algorithms.
Part of its classification as such a thing is because there is a systematic way of obtaining duality
characterizations.
Any primal Linear Program can be converted into a dual linear program:
    minimize Σ_{i=1}^m b_i y_i
    subject to Σ_{i=1}^m a_ij y_i ≥ c_j    for j = 1, 2, . . . , n
               y_i ≥ 0                     for i = 1, 2, . . . , m

The above Linear Program can be rewritten as:

    minimize b^T y           (8.9)
    subject to A^T y ≥ c     (8.10)
               y ≥ 0         (8.11)
The primal Linear Program example given in (8.4) - (8.8) can be converted to the following
dual form:
    minimize 30y_1 + 24y_2 + 36y_3
    subject to y_1 + 2y_2 + 4y_3 ≥ 3
               y_1 + 2y_2 + y_3 ≥ 1
               3y_1 + 5y_2 + 2y_3 ≥ 2
               y_1, y_2, y_3 ≥ 0
As we shall see, in cases when both the primal LP and the dual LP are feasible and bounded, the objective value of the dual gives a bound on the objective value of the primal. Furthermore, we shall see that the two optimal objective values are actually equal!
8.2 Weak Duality
Theorem 8.2.1 (Weak Duality). Let x = (x_1, x_2, . . . , x_n) be any feasible solution to the primal Linear Program and let y = (y_1, y_2, . . . , y_m) be any feasible solution to the dual Linear Program. Then c^T x ≤ b^T y, which is to say

    Σ_{j=1}^n c_j x_j ≤ Σ_{i=1}^m b_i y_i
Proof. We have

    Σ_{j=1}^n c_j x_j ≤ Σ_{j=1}^n (Σ_{i=1}^m a_ij y_i) x_j = Σ_{i=1}^m (Σ_{j=1}^n a_ij x_j) y_i ≤ Σ_{i=1}^m b_i y_i,

where the first inequality uses dual feasibility (together with x ≥ 0) and the last uses primal feasibility (together with y ≥ 0).
Corollary 8.2.2. Let x = (x_1, x_2, . . . , x_n) be a feasible solution to the primal Linear Program, and let y = (y_1, y_2, . . . , y_m) be a feasible solution to the dual Linear Program. If

    Σ_{j=1}^n c_j x_j = Σ_{i=1}^m b_i y_i

then x and y are optimal solutions to the primal and dual Linear Programs, respectively.
then x and y are optimal solutions to the primal and dual Linear Programs, respectively.
Proof. Let t = Σ_{j=1}^n c_j x_j = Σ_{i=1}^m b_i y_i. By the Weak Duality Theorem, the objective value of every feasible solution to the primal Linear Program is upper bounded by b^T y = t. Since x attains this bound, x is an optimal solution. The proof for y is similar.
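Weak duality is easy to observe numerically. Taking the example LP of Section 8.1, with c = (3, 1, 2) and the constraint data below, any primal-feasible x and dual-feasible y satisfy c^T x ≤ b^T y; the particular points chosen here are invented:

```python
# Weak duality on the example LP: c^T x <= b^T y for any primal-feasible x
# and dual-feasible y.
c = [3, 1, 2]
b = [30, 24, 36]
A = [[1, 1, 3],
     [2, 2, 5],
     [4, 1, 2]]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def primal_feasible(x):
    return all(dot(row, x) <= bi for row, bi in zip(A, b)) and min(x) >= 0

def dual_feasible(y):
    cols = list(zip(*A))              # columns of A = rows of A^T
    return all(dot(col, y) >= cj for col, cj in zip(cols, c)) and min(y) >= 0

x = [1, 2, 3]                         # an arbitrary primal-feasible point
y = [3, 0, 0]                         # an arbitrary dual-feasible point
```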
8.3 Farkas’ Lemma
Farkas' lemma is an essential theorem which tells us that we can have small certificates of infeasibility of a system of linear inequalities. Geometrically, this certificate is a hyperplane which separates the feasible region of a set of constraints from a point that lies outside this region. Although the geometric intuition is clear (and straightforward), it takes a little bit of work to make it precise. The key part of the proof of Farkas' lemma is the following geometric theorem.
8.3.1 Projection Theorem
Theorem 8.3.1 (Projection Theorem). Let K be a closed, convex and non-empty set in R^n, and let b ∈ R^n, b ∉ K. Define the projection p of b onto K to be the point x ∈ K such that |b − x| is minimized. Then for all z ∈ K, we have (b − p)^T (z − p) ≤ 0.
Proof. Let u = b − p and v = z − p. Suppose, for contradiction, that u · v > 0.

1) If |v| ≤ (u · v)/|v|, i.e. v² ≤ u · v, then

    (b − z)² = (u − v)² = u² − 2u · v + v² ≤ u² − u · v < u² = (b − p)²,

which means |b − z| < |b − p|, contradicting the minimality of |b − p|.

2) If |v| > (u · v)/|v|, then because K is convex, p ∈ K and z = p + v ∈ K, we know that

    z′ = p + ((u · v)/|v|²) v ∈ K,

since z′ lies on the segment [p, z] (here 0 < (u · v)/|v|² < 1). Then

    (b − z′)² = (u − ((u · v)/|v|²) v)² = u² − (u · v)²/|v|² < u² = (b − p)²,

which means |b − z′| < |b − p|, again contradicting the minimality of |b − p|.

Hence u · v ≤ 0 for all z ∈ K.
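The Projection Theorem can be spot-checked on a concrete convex set. For K = [0, 1]², projection is coordinate-wise clamping; the sketch below (an invented instance) verifies (b − p)^T (z − p) ≤ 0 on a grid of points z ∈ K:

```python
# Projection onto the box K = [0,1]^2 is coordinate-wise clamping; check
# (b - p) . (z - p) <= 0 for a grid of points z in K.
def project_box(b):
    return tuple(min(max(t, 0.0), 1.0) for t in b)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

b = (2.0, 0.5)                 # a point outside K
p = project_box(b)             # its projection onto K
grid = [(i / 4, j / 4) for i in range(5) for j in range(5)]   # points of K
checks = [dot((b[0] - p[0], b[1] - p[1]),
              (z[0] - p[0], z[1] - p[1])) for z in grid]
```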
8.3.2 Farkas’ Lemma
Lemma 8.3.2 (Farkas' Lemma). One and only one of the following two assertions holds:

1. A^T y = c, y ≥ 0 has a solution.

2. Ax ≤ 0 and c^T x > 0 has a solution.

Proof. First, we show the two assertions cannot hold at the same time. If both held, we would have

    c^T x = (A^T y)^T x = y^T Ax.

We know y ≥ 0 and Ax ≤ 0, so c^T x = y^T Ax ≤ 0. But this contradicts c^T x > 0.
Next, we show at least one of the two assertions holds. Specifically, we show that if Assertion 1 doesn't hold, then Assertion 2 must hold.

Assume A^T y = c, y ≥ 0 is not feasible. Let K = {A^T y : y ≥ 0}, which is convex (and can be shown to be closed). Then c ∉ K. Let p = A^T w (with w ≥ 0) be the projection of c onto K. From the Projection Theorem we know that

    (c − A^T w)^T (A^T y − A^T w) ≤ 0    for all y ≥ 0.    (8.12)

Define x = c − p = c − A^T w. Then (8.12) says

    x^T A^T (y − w) ≤ 0    for all y ≥ 0,

that is,

    (y − w)^T Ax ≤ 0    for all y ≥ 0.

Let e_i be the m-dimensional vector that has 1 in its i-th component and 0 everywhere else. Taking y = w + e_i gives

    (Ax)_i = e_i^T Ax ≤ 0    for i = 1, 2, . . . , m.    (8.13)

Thus each component of Ax is non-positive, which means Ax ≤ 0.

Now, c^T x = (p + x)^T x = p^T x + x^T x. Setting y = 0 in (y − w)^T Ax ≤ 0, we get −w^T Ax = −p^T x ≤ 0, so p^T x ≥ 0. Since c ≠ p, we have x ≠ 0, so x^T x > 0. Therefore c^T x > 0, and Assertion 2 holds.
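A tiny numeric illustration of the dichotomy, taking A to be the 2 × 2 identity so that the cone {A^T y : y ≥ 0} is the non-negative quadrant (the choice of vectors is invented):

```python
# Farkas dichotomy for A = I (2x2 identity): the cone {A^T y : y >= 0}
# is the non-negative quadrant.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Case 1: c = (1, 2) lies in the cone; y = c itself certifies A^T y = c.
c1 = (1.0, 2.0)
y = c1                      # A = I, so A^T y = y = c1 and y >= 0

# Case 2: c = (-1, 1) lies outside the cone; x = (-1, 0) certifies
# Assertion 2, since A x = x <= 0 and c^T x = 1 > 0.
c2 = (-1.0, 1.0)
x = (-1.0, 0.0)
```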
8.3.3 Geometric Interpretation
If c lies in the cone formed by the column vectors of A^T, then Assertion 1 is satisfied. Otherwise, we can find a hyperplane which contains the origin and separates c from the cone, and thus Assertion 2 is satisfied. These two cases are depicted below.

The a_i's correspond to the column vectors of A^T; b_1 and b_2 correspond to c in Assertion 1 and Assertion 2, respectively.
8.3.4 More on Farkas' Lemma

There are several variants of Farkas' Lemma in the literature, as well as interpretations that one may give. In our context Farkas' Lemma is being used to characterize linear systems that are infeasible. Among others, in some sense, this lemma provides a succinct way of witnessing infeasibility.

Now we give a variant of Farkas' Lemma:

Lemma 8.3.3. One and only one of the following two assertions holds:

1. Ax ≥ b has a solution.

2. A^T y = 0, b^T y > 0, y ≥ 0 has a solution.
Exercise: Prove Lemma 8.3.3.
8.4 Strong Duality
Roughly speaking, Farkas Lemma and the Linear Programming (LP) Duality theorem are the same
thing. The “hard” job has already been done when proving Farkas Lemma. Note that in Farkas we
have a system of linear inequalities whereas in LP Duality we have a system of linear inequalities
over which we wish to optimize a linear function. What remains to be done is to play around
with formulas such that we translate LP Duality to Farkas. To do that we will somehow “encode”
the function we optimize in the Linear Program inside the constraints of the system of linear
inequalities.
Theorem 8.4.1 (Strong Duality). Let x be a feasible solution to the primal Linear Program (8.1)-(8.3) and let y be a feasible solution to the dual Linear Program (8.9)-(8.11). Then x and y are both optimal ⇐⇒ c^T x = b^T y.
Proof. (⇐) See Corollary 8.2.2.

(⇒) Since we already have c^T x ≤ b^T y by the Weak Duality Theorem, we just need to prove c^T x ≥ b^T y. Let z* = c^T x and w* = b^T y.

Claim 8.4.2. There exists a solution of the dual of value at most z*, i.e.,

    ∃y : A^T y ≥ c, y ≥ 0, b^T y ≤ z*.

Proof. We wish to prove that there is a y ≥ 0 satisfying the system

    A^T y ≥ c   and   −b^T y ≥ −z*.
Assume the claim is wrong. Then the variant of Farkas' Lemma (Lemma 8.3.3, applied to the system A^T y ≥ c, −b^T y ≥ −z*, y ≥ 0) implies that

    Ax − λb ≤ 0
    c^T x − λz* > 0
    x ≥ 0, λ ≥ 0

has a solution (the rows encoding y ≥ 0 contribute a non-negative slack, which turns the equality A^T-condition of Lemma 8.3.3 into the inequality above). That is, there exist non-negative x, λ such that Ax ≤ λb and c^T x > λz*.

Case I: λ > 0. Then x/λ ≥ 0 satisfies A(x/λ) ≤ b and c^T (x/λ) > z*, i.e. x/λ is primal-feasible with objective value greater than z*. This contradicts the optimality of z* for the primal.

Case II: λ = 0. Then there exists non-negative x such that Ax ≤ 0 and c^T x > 0. Take any feasible solution x̄ of the primal LP. Then for every µ ≥ 0, x̄ + µx is feasible for the primal LP, since

1. x̄ + µx ≥ 0, because x̄ ≥ 0, x ≥ 0 and µ ≥ 0;

2. A(x̄ + µx) = Ax̄ + µAx ≤ b + µ · 0 = b.

But c^T (x̄ + µx) = c^T x̄ + µ c^T x → ∞ as µ → ∞. This contradicts the assumption that the primal has a finite optimum.

The claim gives a dual-feasible solution of value at most z*. Since y is optimal for the dual, which is a minimization problem, w* ≤ z*. We already have z* ≤ w* by the Weak Duality Theorem, so z* = w*.
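Strong duality can be seen concretely on the example LP of Section 8.1 (with c = (3, 1, 2)): one can check that x* = (8, 4, 0) is primal-feasible, y* = (0, 1/6, 2/3) is dual-feasible, and the two objective values coincide, so by Corollary 8.2.2 both are optimal. A sketch with exact arithmetic:

```python
from fractions import Fraction as F

# Equal primal and dual objective values certify optimality (Cor. 8.2.2).
c = [F(3), F(1), F(2)]
b = [F(30), F(24), F(36)]
A = [[F(1), F(1), F(3)],
     [F(2), F(2), F(5)],
     [F(4), F(1), F(2)]]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

xs = [F(8), F(4), F(0)]            # claimed primal optimum
ys = [F(0), F(1, 6), F(2, 3)]      # claimed dual optimum

primal_ok = all(dot(row, xs) <= bi for row, bi in zip(A, b)) and min(xs) >= 0
cols = list(zip(*A))               # columns of A = rows of A^T
dual_ok = all(dot(col, ys) >= cj for col, cj in zip(cols, c)) and min(ys) >= 0
```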
8.5 Complementary Slackness
Theorem 8.5.1 (Complementary Slackness). Let x be a feasible solution to the primal Linear Program given in (8.1)-(8.3), and let y be a feasible solution to the dual Linear Program given in (8.9)-(8.11). Then the following conditions are necessary and sufficient for x and y to be optimal:

    Σ_{i=1}^m a_ij y_i = c_j   or   x_j = 0,    for j = 1, 2, . . . , n    (8.14)

    Σ_{j=1}^n a_ij x_j = b_i   or   y_i = 0,    for i = 1, 2, . . . , m    (8.15)
Proof. (Sufficiency) Define S_x = {j : x_j > 0} and S_y = {i : y_i > 0}. Then we have

    c^T x = Σ_{j=1}^n c_j x_j = Σ_{j∈S_x} c_j x_j = Σ_{j∈S_x} (Σ_{i=1}^m a_ij y_i) x_j = Σ_{j∈S_x} (Σ_{i∈S_y} a_ij y_i) x_j

    b^T y = Σ_{i=1}^m b_i y_i = Σ_{i∈S_y} b_i y_i = Σ_{i∈S_y} (Σ_{j=1}^n a_ij x_j) y_i = Σ_{i∈S_y} (Σ_{j∈S_x} a_ij x_j) y_i

(the third equality in each line uses (8.14), respectively (8.15), and the fourth drops terms whose y_i, respectively x_j, is zero). The two right-hand sides are the same double sum, therefore c^T x = b^T y. By the Strong Duality Theorem, x and y must be optimal solutions.
(Necessity) Suppose x and y are optimal solutions. We have

    c^T x ≤ (A^T y)^T x = Σ_{j=1}^n (Σ_{i=1}^m a_ij y_i) x_j

    b^T y ≥ (Ax)^T y = Σ_{i=1}^m (Σ_{j=1}^n a_ij x_j) y_i

    c^T x = b^T y

where the first line uses A^T y ≥ c and x ≥ 0, the second uses Ax ≤ b and y ≥ 0, and the third is the Strong Duality Theorem. Since the two double sums above are equal, it follows that

    c^T x = b^T y = Σ_{j=1}^n (Σ_{i=1}^m a_ij y_i) x_j    (8.16)
Suppose there exists j′ such that Σ_{i=1}^m a_ij′ y_i > c_j′ and x_j′ > 0. Then

    c^T x = Σ_{j=1}^n c_j x_j = Σ_{j≠j′} c_j x_j + c_j′ x_j′
          < Σ_{j≠j′} (Σ_{i=1}^m a_ij y_i) x_j + (Σ_{i=1}^m a_ij′ y_i) x_j′
          = Σ_{j=1}^n (Σ_{i=1}^m a_ij y_i) x_j,

which contradicts (8.16). Therefore condition (8.14) must be satisfied. Condition (8.15) can be proved in a similar fashion.
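The conditions (8.14) and (8.15) can be checked mechanically. For the example LP of Section 8.1 we take the optimal pair x* = (8, 4, 0), y* = (0, 1/6, 2/3) as given and verify that every positive primal variable has a tight dual constraint, and every positive dual variable has a tight primal constraint. A sketch:

```python
from fractions import Fraction as F

# Check complementary slackness (8.14)-(8.15) for an optimal pair of the
# example LP: tight constraint or zero variable, in both directions.
c = [F(3), F(1), F(2)]
b = [F(30), F(24), F(36)]
A = [[F(1), F(1), F(3)],
     [F(2), F(2), F(5)],
     [F(4), F(1), F(2)]]
xs = [F(8), F(4), F(0)]
ys = [F(0), F(1, 6), F(2, 3)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

cols = list(zip(*A))
cond_814 = all(dot(col, ys) == cj or xj == 0
               for col, cj, xj in zip(cols, c, xs))
cond_815 = all(dot(row, xs) == bi or yi == 0
               for row, bi, yi in zip(A, b, ys))
```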
Chapter 9
Simplex Algorithm and Ellipsoid Algorithm
In this chapter, we will introduce two methods for solving Linear Programs: the simplex algorithm
and the ellipsoid algorithm. To better illustrate these two algorithms, some mathematical background is also introduced. The simplex algorithm doesn't run in polynomial time, but works well in practice. The ellipsoid algorithm runs in polynomial time, but usually performs poorly in practice.
9.1 More on Basic Feasible Solutions
9.1.1 Assumptions and conventions
Since any LP instance can be converted to the maximization standard form, we'll use the standard form of an LP to show some useful properties in the following lectures. We can write the maximization standard form of an LP compactly as

    maximize c^T x          (9.1)
    subject to Ax = b       (9.2)
               x ≥ 0        (9.3)

We only consider the case when the LP (9.1)-(9.3) is feasible and bounded. In such cases, the feasible region is a polytope.

Since we have seen how to convert a Linear Program from other forms to standard form in the previous lectures, we can simplify our notation by setting A to be an m × n matrix, c and x to be n-dimensional vectors, and b to be an m-dimensional vector in (9.1)-(9.3).

It's easy to see that m < n. WLOG we suppose rank(A) = m, so that all constraints are useful.
9.1.2 Definitions
Suppose the Linear Program (9.1) - (9.3) is feasible and bounded. Denote the feasible region by
S. It’s easy to see that S is non-empty and convex. Then we have the following definitions:
Definition 9.1.1 (Corner). A point x in S is said to be a corner if there is an n-dimensional vector w and a scalar t such that

1. w^T x = t,

2. w^T x′ > t for all x′ ∈ S − {x}.

Intuitively, this definition says that all points in S are on the same side of the hyperplane H defined by w and t, and H ∩ S = {x}. Also note that w^T y ≥ t for all y ∈ S.
Definition 9.1.2 (Extreme point). A point x in S is said to be an extreme point if there are no two distinct points u and v in S such that x = λu + (1 − λ)v for some 0 < λ < 1.

The above definition says there is no line segment in S with x in its interior.

Definition 9.1.3 (Basic Feasible Solution). Let B be any non-singular m × m sub-matrix made up of columns of A. If x ∈ S and all n − m components of x not associated with columns of B are equal to zero, we say x is a Basic Feasible Solution to Ax = b with respect to basis B.

Intuitively, this says that the columns of B span the whole m-dimensional vector space. Since b is also m-dimensional, b is a linear combination of the columns of B. Therefore, there is a solution x with all n − m components not associated with columns of B set to zero.
9.1.3 Equivalence of the definitions
Lemma 9.1.4. If x is a corner, then x is an extreme point.

Proof. Suppose x is a corner, but not an extreme point. Then there exists y ≠ 0 such that x + y ∈ S and x − y ∈ S. So w^T (x + y) ≥ t and w^T (x − y) ≥ t. Since w^T x = t, we have w^T y ≥ 0 and w^T y ≤ 0, which implies that w^T y = 0. So w^T (x + y) = w^T x = t. Since x + y ∈ S, x + y = x by the definition of corners. This means y = 0, which is a contradiction.
Lemma 9.1.5. If x is an extreme point, then the columns {a_j : x_j > 0} of A are linearly independent.

Proof. WLOG suppose the columns are a_1, a_2, . . . , a_k. If they are not independent, then there are scalars d_1, d_2, . . . , d_k, not all zero, such that

    d_1 a_1 + d_2 a_2 + · · · + d_k a_k = 0    (9.4)

WLOG (after scaling) we suppose |d_j| < |x_j| for 1 ≤ j ≤ k. Let d be the n-dimensional vector d = (d_1, . . . , d_k, 0, . . . , 0)^T. Then we know

    Ad = d_1 a_1 + d_2 a_2 + · · · + d_k a_k + 0 · a_{k+1} + · · · + 0 · a_n = 0    (9.5)

So A(x + d) = Ax = b. Since x + d = (x_1 + d_1, . . . , x_k + d_k, 0, . . . , 0) ≥ 0, we have x + d ∈ S. Similarly, we have x − d ∈ S. Since x = ½(x + d) + ½(x − d) and d ≠ 0, we see x is not an extreme point, which is a contradiction.
Lemma 9.1.6. If x ∈ S and the columns {a_j : x_j > 0} of A are linearly independent, then x is a Basic Feasible Solution.

Proof. WLOG suppose x = (x_1, . . . , x_k, 0, . . . , 0) where x_j > 0 for 1 ≤ j ≤ k, which is to say, the columns B = {a_1, . . . , a_k} are independent. Since rank(A) = m, we have k ≤ m.

1) If k = m, we are done. This is because B = [a_1 · · · a_m] is an m × m invertible sub-matrix of A.

2) If k < m, we can widen B by incorporating more columns of A while maintaining their independence, until B has m columns. The components of x not corresponding to the columns in B are still zero, so x is a Basic Feasible Solution.
Lemma 9.1.7. If x is a Basic Feasible Solution, then x is a corner.

Proof. WLOG suppose x = (x_1, . . . , x_m, 0, . . . , 0)^T is a Basic Feasible Solution, which is to say B = [a_1 · · · a_m] is an m × m non-singular sub-matrix of A. Set w = (0, . . . , 0, 1, . . . , 1)^T (with m 0's and n − m 1's) and t = 0. Then w^T x = 0. And w^T y ≥ 0 for any y ∈ S, since y ≥ 0.

If there exists z ∈ S such that w^T z = 0, then z = (z_1, . . . , z_m, 0, . . . , 0)^T and b = Az = B[z_1 · · · z_m]^T. Since b = Ax = B[x_1 · · · x_m]^T, we have [z_1 · · · z_m]^T = [x_1 · · · x_m]^T = B^{−1} b. So z = x.

Thus w^T x = t and w^T x′ > t for all x′ ∈ S − {x}, which proves x is a corner.
Now we have the following theorem stating the equivalence of the various definitions:
Theorem 9.1.8. Suppose the LP (9.1) - (9.3) is feasible with feasible region S. Then the following
statements are equivalent:
1. x is a corner;
2. x is an extreme point;
3. x is a Basic Feasible Solution.
Proof. From Lemma (9.1.4) - (9.1.7).
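The equivalence suggests an inefficient but finite way to list all corners of the feasible region: enumerate the m-column bases B of A, solve B x_B = b, and keep the non-negative solutions. A sketch on an invented 2 × 4 standard-form instance:

```python
from fractions import Fraction as F
from itertools import combinations

# Enumerate Basic Feasible Solutions of {x : Ax = b, x >= 0} by trying
# every m x m basis of columns of A (here m = 2, n = 4).
A = [[F(1), F(1), F(1), F(0)],
     [F(1), F(2), F(0), F(1)]]
b = [F(4), F(6)]

def solve2(B, rhs):
    # Cramer's rule for a 2x2 system; returns None if B is singular
    det = B[0][0]*B[1][1] - B[0][1]*B[1][0]
    if det == 0:
        return None
    x0 = (rhs[0]*B[1][1] - B[0][1]*rhs[1]) / det
    x1 = (B[0][0]*rhs[1] - rhs[0]*B[1][0]) / det
    return [x0, x1]

bfs = set()
for cols in combinations(range(4), 2):
    B = [[A[i][j] for j in cols] for i in range(2)]
    sol = solve2(B, b)
    if sol is not None and min(sol) >= 0:
        x = [F(0)] * 4
        x[cols[0]], x[cols[1]] = sol
        bfs.add(tuple(x))
```

In general there are at most C(n, m) bases, which is why the simplex algorithm of the next section walks between neighboring bases rather than enumerating them all.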
9.1.4 Existence of Basic Feasible Solution
So far we have seen equivalent definitions for Basic Feasible Solutions. But when does a BFS exist? The following theorem shows that a BFS exists whenever a feasible solution does.
Theorem 9.1.9. If the Linear Program (9.1) - (9.3) is feasible then there is a BFS.
Proof. Since the LP is feasible, the feasible region S ≠ ∅. Let x be a feasible solution with a minimal number of non-zero components. If x = 0, we are done, since 0 is then a BFS. Otherwise, set I = {i : x_i > 0}; clearly I ≠ ∅. Suppose x is not basic, which is to say, the columns {a_i : i ∈ I} are not linearly independent. Then there exists u ≠ 0 with u_i = 0 for i ∉ I satisfying Au = 0.

Consider x_ε = x + εu. Clearly Ax_ε = b for every ε. In addition, x_ε ≥ 0 for every sufficiently small |ε|, so x_ε is feasible. Increase or decrease ε until the first time one more component of x_ε hits 0. At that point, we have a feasible solution with fewer non-zero components, which is a contradiction.
9.1.5 Existence of optimal Basic Feasible Solution
Suppose a Linear Program is feasible and finite, i.e. has a finite optimal value. Then we can show
the LP has an optimal Basic Feasible Solution.
Theorem 9.1.10. If the Linear Program (9.1) - (9.3) is feasible and finite, there is an optimal
solution at an extreme point of the feasible region S.
Proof. Since there exists an optimal solution, there exists an optimal solution x with a minimal number of non-zero components. Suppose x is not an extreme point; then x = λu + (1 − λ)v for some u, v ∈ S, u ≠ v, and 0 < λ < 1. Since x is optimal, we must have c^T u ≤ c^T x and c^T v ≤ c^T x. Since c^T x = λ c^T u + (1 − λ) c^T v, this implies c^T u = c^T v = c^T x. Thus u and v are also optimal.

Consider x_ε = x + ε(u − v). The following statements are easy to check:

1. c^T x_ε = c^T x for all ε;

2. x_i = 0 ⇒ u_i = v_i = 0 ⇒ (x_ε)_i = 0 (here we use u_i, v_i ≥ 0 and 0 < λ < 1);

3. x_i > 0 ⇒ (x_ε)_i > 0 for sufficiently small |ε|.

Increase or decrease ε until the first time one more component of x_ε is zero. At that point, we have an optimal solution with fewer non-zero components, which is a contradiction.
Corollary 9.1.11. If the Linear Program (9.1) - (9.3) is feasible and finite, then there is an optimal
Basic Feasible Solution.
Proof. From Theorem (9.1.8) and Theorem (9.1.10).
9.2 Simplex algorithm
9.2.1 The algorithm
The simplex algorithm is based on the existence of an optimal Basic Feasible Solution (BFS). The algorithm first finds a BFS (or reports that the LP is infeasible); then, in each iteration, it moves from the current BFS to a neighboring one (which differs from it in one column), making sure the objective value doesn't decrease when all non-basic variables are set to zero. Each such move is called pivoting.
Consider the following example:
    z   = 27 + x_2/4 + x_3/2 − 3x_6/4
    x_1 = 9 − x_2/4 − x_3/2 − x_6/4
    x_4 = 21 − 3x_2/4 − 5x_3/2 + x_6/4
    x_5 = 6 − 3x_2/2 − 4x_3 + x_6/2

In this LP, {x_1, x_4, x_5} is the set of basic variables, giving a BFS. If we set the non-basic variables to zero, the objective value is 27. After a pivot with x_3 entering the basis and x_5 leaving it, the LP becomes:
    z   = 111/4 + x_2/16 − x_5/8 − 11x_6/16
    x_1 = 33/4 − x_2/16 + x_5/8 − 5x_6/16
    x_3 = 3/2 − 3x_2/8 − x_5/4 + x_6/8
    x_4 = 69/4 + 3x_2/16 + 5x_5/8 − x_6/16

In this LP, {x_1, x_3, x_4} is the set of basic variables. If we set the non-basic variables to zero, the objective value is 111/4 > 27. If we continue pivoting, either we reach the optimal BFS, or we conclude the LP is unbounded.

To prevent cycling, we can use Bland's rule, which says we should choose the variable with the smallest index for tie-breaking.
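The pivot between the two dictionaries above can be carried out mechanically with exact arithmetic. The sketch below represents each row as a constant plus coefficients on the non-basic variables, and pivots x_3 in and x_5 out:

```python
from fractions import Fraction as F

# One simplex pivot on the first dictionary above: x3 enters, x5 leaves.
# Each row maps a basic variable (or 'z') to (constant, {nonbasic: coeff}).
rows = {
    'z':  (F(27), {'x2': F(1, 4),  'x3': F(1, 2),  'x6': F(-3, 4)}),
    'x1': (F(9),  {'x2': F(-1, 4), 'x3': F(-1, 2), 'x6': F(-1, 4)}),
    'x4': (F(21), {'x2': F(-3, 4), 'x3': F(-5, 2), 'x6': F(1, 4)}),
    'x5': (F(6),  {'x2': F(-3, 2), 'x3': F(-4),    'x6': F(1, 2)}),
}

def pivot(rows, enter, leave):
    const, coeffs = rows[leave]
    a = coeffs[enter]
    # solve the leaving row for the entering variable:
    # enter = -const/a + (1/a)*leave - sum_{v != enter} (coeff_v/a) * v
    e_const = -const / a
    e_coeffs = {v: -cv / a for v, cv in coeffs.items() if v != enter}
    e_coeffs[leave] = 1 / a
    new_rows = {enter: (e_const, e_coeffs)}
    for var, (k, cs) in rows.items():
        if var == leave:
            continue
        ce = cs[enter]                      # coefficient of entering var
        nk = k + ce * e_const               # substitute it away
        ncs = {v: cs.get(v, F(0)) + ce * e_coeffs[v] for v in e_coeffs}
        new_rows[var] = (nk, ncs)
    return new_rows

new = pivot(rows, 'x3', 'x5')
```

The result matches the second dictionary: the objective constant becomes 111/4 and x_3 = 3/2 when the non-basic variables are zero.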
Interested readers should refer to Introduction to Algorithms, Chapter 29 for more details of
the simplex algorithm. In particular, this will illustrate:
1. How to choose the initial BFS.
2. How to choose the pivoting element.
3. When to assert the LP is infeasible or unbounded, and when an optimal solution can be found.
9.2.2 Efficiency
The simplex algorithm is not a polynomial-time algorithm: in 1972, Klee and Minty gave an example showing that its worst-case time complexity is exponential. But it is generally fast in real-world applications.
9.3 Ellipsoid algorithm
In this section, we only consider the case when a finite feasible region exists.
9.3.1 History
If we formulate an LP in its decision version, then it's obvious that LP ∈ NP, since an NTM can nondeterministically guess a solution. In addition, LP ∈ co-NP, since an NTM can guess a solution to the LP's dual. So LP ∈ NP ∩ co-NP. However, it was not until Khachiyan published his ellipsoid algorithm, a polynomial-time algorithm, that people knew LP ∈ P.
9.3.2 Mathematical background
Carath´ eodory’s theorem
Theorem 9.3.1 (Carathéodory's theorem). If a point x of R^d lies in the convex hull of a set P, there is a subset P′ of P consisting of d + 1 or fewer points which are affinely independent, such that x lies in the convex hull of P′. Equivalently, x lies in an r-simplex with vertices in P, where r ≤ d.
Proof. Let x be a point in the convex hull of P. Then x is a convex combination of a finite number of points in P:

    x = Σ_{j=1}^k λ_j x_j

where every x_j is in P, every λ_j is positive, and Σ_{j=1}^k λ_j = 1.

Suppose x_2 − x_1, . . . , x_k − x_1 are linearly dependent. Then there are scalars µ_2, . . . , µ_k, not all zero, such that

    Σ_{j=2}^k µ_j (x_j − x_1) = 0.

If µ_1 is defined as µ_1 = −Σ_{j=2}^k µ_j, then Σ_{j=1}^k µ_j x_j = 0 and Σ_{j=1}^k µ_j = 0, and not all of the µ_j's are equal to zero. Therefore, at least one µ_j > 0. Then

    x = Σ_{j=1}^k λ_j x_j − α Σ_{j=1}^k µ_j x_j = Σ_{j=1}^k (λ_j − α µ_j) x_j

for any real α. In particular, let α be defined as

    α = min_{1≤j≤k} { λ_j / µ_j : µ_j > 0 } = λ_i / µ_i.

Note that α > 0, and for every j between 1 and k, λ_j − α µ_j ≥ 0. In particular, λ_i − α µ_i = 0 by the definition of α. Therefore,

    x = Σ_{j=1}^k (λ_j − α µ_j) x_j

where every λ_j − α µ_j is non-negative, their sum is one (since Σ_{j=1}^k µ_j = 0), and furthermore λ_i − α µ_i = 0. In other words, x is represented as a convex combination of one less point of P.

Note that if k > d + 1, then x_2 − x_1, . . . , x_k − x_1 are always linearly dependent. Therefore, this process can be repeated until x is represented as a convex combination of r ≤ d + 1 points x_{i_1}, . . . , x_{i_r} in P. When this process terminates, x_{i_2} − x_{i_1}, . . . , x_{i_r} − x_{i_1} are linearly independent, so x_{i_1}, . . . , x_{i_r} are affinely independent.
Quadratic form
x^T A x is called a quadratic form, where x is an n-dimensional vector and A is a symmetric n × n matrix. If A is positive definite, then {x : x^T A x = 1} is an ellipsoid.
Volume of n-simplices
A set of n + 1 affinely independent points p_0, p_1, . . . , p_n in R^n makes up an n-simplex. The volume of this n-simplex is

    (1/n!) · |det M|,

where M is the (n + 1) × (n + 1) matrix

    M = [ 1    1   · · ·  1
          p_0  p_1 · · ·  p_n ],

i.e. the j-th column of M is the vector (1, p_{j−1}).
9.3.3 LP, LI and LSI

There are three kinds of problems that are related to each other.

Question (Linear Programming/LP): Given an integer m × n matrix A, an m-vector b and an n-vector c, either

(a) find a rational n-vector x such that x ≥ 0, Ax = b and c^T x is minimized subject to these conditions, or

(b) report that there is no n-vector x such that x ≥ 0 and Ax = b, or

(c) report that the set {c^T x : Ax = b, x ≥ 0} has no lower bound.

Question (Linear Inequalities/LI): Given an integer m × n matrix A and an m-vector b, is there an n-vector x such that Ax ≤ b?

Question (Linear Strict Inequalities/LSI): Given an integer m × n matrix A and an m-vector b, is there an n-vector x such that Ax < b?

Polynomial reductions

Theorem 9.3.2. There is a polynomial-time algorithm for LP if and only if there is a polynomial-time algorithm for LI.

Theorem 9.3.3. If there is a polynomial-time algorithm for LSI then there is a polynomial-time algorithm for LI.

Therefore, we just need to provide a polynomial-time algorithm for LSI to ensure there is a polynomial-time algorithm for LP, via the reductions LP ≤ LI ≤ LSI.
9.3.4 Ellipsoid algorithm
The main idea of the ellipsoid algorithm is to do a binary search. Initially the algorithm finds an ellipsoid large enough so that a part of the feasible region with lower-bounded volume v_0 is contained within it. In each iteration, the algorithm tests whether the center of the ellipsoid is a feasible solution. If so, it reports success. Otherwise, it finds a constraint that is violated, which defines a hyperplane. Since the center is on the wrong side of this hyperplane, at least half of the ellipsoid can be discarded. The algorithm proceeds by finding a smaller ellipsoid which contains the half of the original ellipsoid on the correct side. The ratio between the volume of the new ellipsoid and that of the old one is upper bounded by some r(n) < 1. Therefore, after (polynomially) many iterations, either a solution is found, or the ellipsoid becomes so small (its volume drops below v_0) that no feasible region can be contained in it. In the latter case, the feasible region can be declared empty.
The whole algorithm is given below:

Algorithm 8: Ellipsoid algorithm
input : an m × n system of linear strict inequalities Ax < b, of size L
output: an n-vector x such that Ax < b, if such a vector exists; 'no' otherwise

1. Initialize:
   Set j = 0, t_0 = 0, B_0 = n² 2^{2L} I.
   /* j counts the number of iterations so far. */
   /* The current ellipsoid is E_j = {x : (x − t_j)^T B_j^{−1} (x − t_j) ≤ 1}. */

2. Test:
   if t_j is a solution to Ax < b then
       return t_j
   end
   if j > K = 16n(n + 1)L then
       return 'no'
   end

3. Iterate:
   Choose any inequality in Ax < b that is violated by t_j; say a^T t_j ≥ β, where β is the corresponding component of b. Set

       t_{j+1} = t_j − (1/(n + 1)) · B_j a / sqrt(a^T B_j a)

       B_{j+1} = (n²/(n² − 1)) · ( B_j − (2/(n + 1)) · (B_j a)(B_j a)^T / (a^T B_j a) )

       j = j + 1

   Goto 2.
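The Step 3 update formulas can be exercised on a tiny two-dimensional instance. The sketch below searches the invented box 2 < x_1 < 3, 2 < x_2 < 3, starting from a ball of radius 10 and using a fixed iteration cap in place of the 2^{2L} and 16n(n+1)L bounds:

```python
import math

# A minimal 2-D run of the ellipsoid iteration (Step 3 update formulas).
def ellipsoid_lsi(A, b, radius=10.0, max_iter=100):
    n = 2                                       # hard-wired to 2 dimensions
    t = [0.0, 0.0]                              # center t_0 = 0
    B = [[radius**2, 0.0], [0.0, radius**2]]    # B_0 = radius^2 * I
    for _ in range(max_iter):
        viol = next((i for i, row in enumerate(A)
                     if row[0]*t[0] + row[1]*t[1] >= b[i]), None)
        if viol is None:
            return t                            # all strict inequalities hold
        a = A[viol]
        Ba = [B[0][0]*a[0] + B[0][1]*a[1],
              B[1][0]*a[0] + B[1][1]*a[1]]
        aBa = a[0]*Ba[0] + a[1]*Ba[1]
        s = math.sqrt(aBa)
        # center update: t <- t - 1/(n+1) * B a / sqrt(a^T B a)
        t = [t[k] - Ba[k] / ((n + 1) * s) for k in range(n)]
        # shape update: B <- n^2/(n^2-1) (B - 2/(n+1) (Ba)(Ba)^T / aBa)
        f = n*n / (n*n - 1.0)
        B = [[f * (B[i][k] - 2.0*Ba[i]*Ba[k] / ((n + 1) * aBa))
              for k in range(n)] for i in range(n)]
    return None

# the box 2 < x1 < 3, 2 < x2 < 3 written as A x < b
A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b = [3.0, -2.0, 3.0, -2.0]
x = ellipsoid_lsi(A, b)
```

Since every update keeps the feasible box inside the current ellipsoid while the volume shrinks geometrically, the center must land in the box after a bounded number of iterations.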
Theorem 9.3.4. The Ellipsoid algorithm runs in polynomial time.
Proof. The algorithm runs for at most 16n(n+1)L rounds, and each round takes polynomial time.
So the algorithm runs in polynomial time.
Chapter 10
Max-Flow Min-Cut Through Linear
Programming
In this chapter, we give a proof of the Max-Flow Min-Cut Theorem using Linear Programming.
10.1 Flow and Cut
First, we review the definition of Max-Flow and Min-Cut and introduce another definition of Max-
Flow.
10.1.1 Flow
Let N = (G, s, t, c) be a network (a directed graph) with s and t the source and the sink of N, respectively. The capacity of an edge is a mapping c : E → R^+, denoted by c_uv or c(u, v); this represents the maximum amount of "flow" that can pass through an edge.
Definition 10.1.1 (Flow). A flow is a mapping f : E → R^+, denoted by f_uv or f(u, v), subject to the following two constraints:

1. f_uv ≤ c_uv for each (u, v) ∈ E (capacity constraint);

2. Σ_{u:(u,v)∈E} f_uv = Σ_{u:(v,u)∈E} f_vu for each v ∈ V \ {s, t} (conservation constraint).

The value of a flow represents the amount of flow passing from the source to the sink and is defined by val(f) = Σ_{v∈V} f_sv, where s is the source of N. The maximum flow problem is to maximize val(f), i.e. to route as much flow as possible from s to t.

Definition 10.1.2 (Path). A path is a sequence of vertices such that there is a directed edge from each vertex to the next vertex in the sequence. An s−t path is a path that begins at s and ends at t.
10.1.2 An alternative Definition of Flow
We now give another definition of flow, and later prove that it is equivalent to Definition 10.1.1.

Definition 10.1.3 (Flow (alternate)). In N = (G, s, t, c), let P be the set of s−t paths in G. A flow is a mapping f′ : P → R^+, denoted by f′_p or f′(p), subject to the following constraint:

1. Σ_{p:(u,v)∈p} f′_p ≤ c_uv for each (u, v) ∈ E.

The value of the flow is defined by val(f′) = Σ_{p∈P} f′_p.
10.1.3 Cut
Definition 10.1.4 (Cut). An s−t cut C = (S, T) is a partition of V such that s ∈ S and t ∈ T.

Definition 10.1.5 (Cut Set). The cut-set of C is the set {(u, v) ∈ E : u ∈ S, v ∈ T}.

Note that if the edges in the cut-set of C are removed, then val(f) = 0.

Definition 10.1.6 (Capacity). The capacity of an s−t cut is defined by c(S, T) = Σ_{(u,v)∈E, u∈S, v∈T} c_uv.

The minimum cut problem is to minimize c(S, T), i.e. to determine S and T such that the capacity of the s−t cut is minimal.
10.2 Max-flow Min-Cut Theorem
In this section, we prove the Max-Flow Min-Cut Theorem by using Linear Programming.
Before we go into the proof, we need the following two lemmas.
Lemma 10.2.1. Given N = (G, s, t, c), for every flow val(f) > 0, there exist an s −t path p such
that ∀(u, v) ∈ p, f
uv
> 0
Proof. We prove the lemma by contradiction.
If such an s − t path does not exist, let A = ¦v ∈ V [∃ s-v path p, ∀(u, w) ∈ p, f
uw
> 0¦. Then
s ∈ A, t ,∈ A and ∀u ∈ A, v ,∈ A, f
uv
= 0.

v∈A
(

w: (v,w)∈E
f
vw

u: (u,v)∈E
f
uv
) =

w: (s,w)∈E
f
sw
= val(f)
> 0
But

v∈A
(

w: (v,w)∈E
f
vw

u: (u,v)∈E
f
uv
) =

v∈A,w∈A,(v,w)∈E
f
vw

u∈A,v∈A,(u,v)∈E
f
uv
= −

u∈A,v∈A,(u,v)∈E
f
uv
≤ 0
This is a contradiction.
56
Lemma 10.2.2 (Flow Decomposition). The two definitions of flow, 10.1.1 and 10.1.3, are
equivalent: in N = (G, s, t, c), for every α ∈ R⁺,

∃ f: E → R⁺ with val(f) = α  ⇔  ∃ f': P → R⁺ with val(f') = α.

Proof. (⇐) Let f_uv = Σ_{p: (u,v)∈p} f'_p.

• First we prove that f is a flow by examining the two constraints given in Definition 10.1.1.
For the capacity constraint, we know by Definition 10.1.3 of f' that f_uv ≤ c_uv for each
(u, v) ∈ E.
For the conservation constraint, note that by the definition of a path (10.1.2), a path that enters
an internal vertex v also leaves it: for every path p, each (u, v) ∈ p is followed by some
(v, w) ∈ p, and each (v, w) ∈ p is preceded by some (u, v) ∈ p. Thus, for every path p and
internal node v, |{(u, v) ∈ p | u ∈ V}| = |{(v, w) ∈ p | w ∈ V}|. So for every internal v ∈ V,

Σ_{p∈P} Σ_{u: (u,v)∈p} f'_p = Σ_{p∈P} Σ_{w: (v,w)∈p} f'_p
  ⟹  Σ_{u: (u,v)∈E} f_uv = Σ_{u: (v,u)∈E} f_vu.

• Second we prove that val(f) = val(f'):

val(f) = Σ_{v∈V} f_sv = Σ_{v∈V} Σ_{p: (s,v)∈p} f'_p = Σ_{p∈P} f'_p = val(f'),

since each s–t path contributes its value exactly once, via its unique edge leaving s.

(⇒) We prove this direction by induction on |E|.

• The base case |E| ≤ 2 is obvious.

• For the inductive step: if val(f) = 0, let f'_p = 0 for every p ∈ P.
If val(f) > 0, then by Lemma 10.2.1 there exists an s–t path p with positive flow on every edge.
Let α = min_{(u,v)∈p} f_uv; then α > 0. We construct a new flow f_1 by reducing f_uv by α for
every (u, v) ∈ p and then removing all edges (u, v) such that f_1(u, v) = 0. At least one edge
(a bottleneck edge of p) is removed; thus, by the induction hypothesis, there exists f'_1 with
val(f'_1) = val(f_1). We then obtain f' by adding the path p with value α to f'_1. Thus we find

val(f') = val(f'_1) + α = val(f_1) + α = val(f).
Theorem 10.2.3 (Max-Flow Min-Cut Theorem). Given N = (G, s, t, c),

max{val(f) : f a flow in N} = min{c(S, T) : (S, T) an s–t cut of N}.

Proof sketch:
1. Write the max-flow problem as an LP, using Definition 10.1.3.
2. Take the dual of this LP. Equality of the two optimal values is guaranteed by the strong
duality theorem.
3. Prove that the optimal value of the dual equals the value of the min-cut problem.
Proof. First, we present an LP for the max-flow problem along with its dual.

Max-flow LP:
maximize: val(f') = Σ_{p∈P} f'_p
subject to: Σ_{p: (u,v)∈p} f'_p ≤ c_uv for each (u, v) ∈ E
            f'_p ≥ 0 for each p ∈ P

Dual LP:
minimize: Σ_{(u,v)∈E} c_uv y_uv
subject to: Σ_{(u,v)∈p} y_uv ≥ 1 for each p ∈ P
            y_uv ≥ 0 for each (u, v) ∈ E
We prove that the optimal value of the dual LP equals the min-cut value of N via the following
two observations.

Observation 10.2.4. In N, for every cut C = (S, T) there is a feasible solution to the dual LP of
max-flow in N with cost equal to c(S, T).

Proof. For C = (S, T), set

y_uv = 1 if u ∈ S, v ∈ T, and y_uv = 0 otherwise.

Every s–t path crosses from S to T at least once, so every path constraint of the dual LP is
satisfied; hence this is a feasible solution, and its cost equals c(S, T).
Observation 10.2.5. In N = (G, s, t, c), given a feasible solution {y_uv}_{(u,v)∈E} to the dual LP,
there exists a cut C = (S, T) such that c(S, T) is less than or equal to the cost of the solution.

Proof. First, regard the y-values as edge lengths on G = (V, E): the length of edge (u, v) ∈ E is
dis(u, v) = y_uv. Let d(u, v) be the length of the shortest path from u to v according to dis.
The |P| path constraints of the dual LP can then be written as the single constraint

d(s, t) ≥ 1.

In words, every path from s to t has length at least 1.

Pick a uniformly random threshold θ ∈ [0, 1). For a given θ, we construct a cut as follows:

v ∈ S if d(s, v) ≤ θ, and v ∈ T if d(s, v) > θ.

Since d(s, s) = 0 ≤ θ and d(s, t) ≥ 1 > θ, this is a valid s–t cut. We define the following
indicator function to make things more clear:

In(u, v, S, T) = 1 if u ∈ S, v ∈ T, and 0 otherwise.

We prove that the expectation of c(S, T) is at most the cost of the dual solution:

E[c(S, T)] = Σ_{(u,v)∈E} E[c_uv In(u, v, S, T)]                      (10.1)
           = Σ_{(u,v)∈E} c_uv E[In(u, v, S, T)]                      (10.2)
           = Σ_{(u,v)∈E} c_uv Pr[u ∈ S, v ∈ T]                       (10.3)
           = Σ_{(u,v)∈E} c_uv Pr[d(s, u) ≤ θ ∧ d(s, v) > θ]          (10.4)
           ≤ Σ_{(u,v)∈E} c_uv (d(s, v) − d(s, u))                    (10.5)
           ≤ Σ_{(u,v)∈E} c_uv dis(u, v)                              (10.6)
           = Σ_{(u,v)∈E} c_uv y_uv.                                  (10.7)

Inequality (10.5) holds because the event in (10.4) requires θ to land in an interval of length at
most d(s, v) − d(s, u), and θ is uniform on [0, 1); inequality (10.6) is the triangle inequality
d(s, v) ≤ d(s, u) + dis(u, v). Because the expectation of c(S, T) is at most the cost of the
solution, there must exist a cut C = (S, T) whose capacity is less than or equal to the cost of
the solution.

By Observation 10.2.4, the minimum cost of a solution of the dual LP is less than or equal to the
min-cut value. By Observation 10.2.5, the min-cut value is less than or equal to the minimum cost
of a solution of the dual LP. Thus the optimal value of the dual is exactly equal to the min-cut
value, and the proof of the Max-Flow Min-Cut Theorem immediately follows by strong duality.
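The proof of Observation 10.2.5 can be made effective: the cut capacity, as a function of the
threshold θ, only changes at the values d(s, v), so instead of a random θ we can try every
candidate threshold and keep the best cut. A hedged Python sketch (the network and the feasible
dual solution y below are assumptions for illustration):

```python
# Sketch of Observation 10.2.5: turn a feasible dual solution y into an
# s-t cut of no larger cost. Instead of a random threshold we try every
# candidate threshold d(s, v); the averaging argument guarantees one works.
# The instance below is hypothetical, for illustration.

def shortest_dists(V, E, y, s):
    """d(s, v) under edge lengths dis(u, v) = y_uv (Bellman-Ford style)."""
    d = {v: float('inf') for v in V}
    d[s] = 0.0
    for _ in range(len(V) - 1):
        for (u, v) in E:
            d[v] = min(d[v], d[u] + y[(u, v)])
    return d

def dual_to_cut(V, E, c, y, s, t):
    d = shortest_dists(V, E, y, s)
    best = None
    for theta in sorted(set(d[v] for v in V if d[v] < 1)):
        S = {v for v in V if d[v] <= theta}
        cap = sum(c[(u, v)] for (u, v) in E if u in S and v not in S)
        if best is None or cap < best[1]:
            best = (S, cap)
    return best

V = {'s', 'a', 'b', 't'}
E = [('s', 'a'), ('s', 'b'), ('a', 't'), ('b', 't'), ('a', 'b')]
c = {('s', 'a'): 3, ('s', 'b'): 2, ('a', 't'): 2, ('b', 't'): 3, ('a', 'b'): 1}
# An assumed feasible dual solution: every s-t path has y-length >= 1.
y = {('s', 'a'): 0.0, ('s', 'b'): 0.5, ('a', 't'): 1.0,
     ('b', 't'): 0.5, ('a', 'b'): 0.5}

S, cap = dual_to_cut(V, E, c, y, 's', 't')
dual_cost = sum(c[e] * y[e] for e in E)
print(S, cap, dual_cost)  # cut capacity 5 <= dual cost 5.0
```

Here the best cut's capacity exactly matches the dual cost, which is what the chain
(10.1)–(10.7) predicts when every inequality is tight.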
Chapter 11
Rounding Technique for Approximation Algorithms (I)

In this chapter, we give some concrete examples of using Linear Programming to solve
optimization problems. First, we give the general method for solving problems using LPs. Then,
we introduce the Set Cover problem as an example of the general method. We give two
approximation algorithms for Set Cover, and apply LP-rounding to the Vertex Cover problem to
obtain a 2-approximation algorithm.

11.1 General Method to Use Linear Programming
Here, we show the general idea of solving problems using an LP.
To make things easier to understand, we will provide three examples of problems solved by
Linear Programming: Set Cover, Vertex Cover and MAX-K-SAT.
11.2 Set Cover Problem
11.2.1 Problem Description
We are given a set U of n elements e_1, e_2, …, e_n, and a collection S of k subsets
S_1, S_2, …, S_k of U. A set cover is a collection of subsets from S such that every element in U
belongs to at least one of the chosen subsets. In addition, we are given a cost function
c: S → Z⁺. The cost of a set cover is the sum of the costs of the subsets in the chosen
collection. Our objective is to find the collection of subsets that minimizes the cost.

Example 11.2.1. U = {1, 2, …, 6} and S = {S_1, S_2, S_3} where S_1 = {1, 2}, S_2 = {2, 3, 4}
and S_3 = {5, 6}. In addition, the costs of the three sets are 1, 2 and 3 respectively. The
example is illustrated below:
Figure 11.1: Set Cover Example
By definition, S'_1 = {S_2, S_3} is not a set cover of (U, S, c) because S'_1 does not cover
element 1. Conversely, S'_2 = {S_1, S_2, S_3} is the only set cover in this example, and
c(S'_2) = c(S_1) + c(S_2) + c(S_3) = 6.

A Set Cover instance (U, S, c) is specified by:

Input:
U = {e_1, e_2, …, e_n}
S = {S_1, S_2, …, S_k}, with S_i ⊆ U
c: S → Z⁺

Output:
min_{S'} c(S') = min_{S'} Σ_{S∈S'} c(S) subject to: ∀i, ∃j s.t. S_j ∈ S' and e_i ∈ S_j
11.2.2 Complexity

Theorem 11.2.2. The Set Cover problem is NP-hard.

Proof. Our proof is by reduction from Vertex Cover. Given an instance of Vertex Cover, a graph
G = (V, E) and an integer j, the Vertex Cover problem asks whether there exist j vertices that
cover all edges.
We construct a corresponding Set Cover instance (U, S, c) as follows. Let U = E; for each
vertex v ∈ V, we form a set S ∈ S that contains all edges in E incident to v. Further, for every
S ∈ S, let c(S) = 1. This construction can be done in time polynomial in the size of the Vertex
Cover instance.
Suppose that G has a vertex cover of size at most j; then we obtain a set cover of cost at most j
by choosing the subsets corresponding to the selected vertices. Conversely, given a set cover of
cost at most j, the vertices corresponding to the selected subsets form a vertex cover of size at
most j. In this way, Vertex Cover ≤_P Set Cover. It follows that the Set Cover problem is
NP-hard.
11.2.3 Greedy Set Cover

Algorithm Description

Algorithm 9: Greedy Set Cover Algorithm
Input: element set U = {e_1, e_2, …, e_n}, collection of subsets S = {S_1, S_2, …, S_k} and cost
function c: S → Z⁺
Output: a set cover C
C ← ∅;
while the sets in C do not cover U do
    calculate the cost effectiveness (defined below) α_1, α_2, …, α_l of the unpicked sets
    S_1, S_2, …, S_l respectively;
    pick a set S_j with minimum cost effectiveness α_j;
    C ← C ∪ {S_j};
output C;

The cost effectiveness α of a subset S above is defined by

α = c(S) / |S − C|.

Here, letting C_i be the collection of sets chosen up to and including iteration i, |S − C_i|
denotes the number of elements of S not covered by the union of the sets in C_i. Note that for
any particular subset S_j, its cost effectiveness α_j varies from one iteration to another.
Example 11.2.3. U = {1, 2, …, 6} and S = {S_1, S_2, …, S_10}. The elements and cost of each
set are given below:

S_1 = {1, 2}                c(S_1) = 3
S_2 = {3, 4}                c(S_2) = 3
S_3 = {5, 6}                c(S_3) = 3
S_4 = {1, 2, 3, 4, 5, 6}    c(S_4) = 14
S_5 = {1}                   c(S_5) = 1
S_6 = {2}                   c(S_6) = 1
S_7 = {3}                   c(S_7) = 1
S_8 = {4}                   c(S_8) = 1
S_9 = {5}                   c(S_9) = 1
S_10 = {6}                  c(S_10) = 1

The example is depicted in graphical form in Figure 11.2.
There are several feasible set covers in this example, e.g. {S_1, S_2, S_3} and
{S_2, S_5, S_6, S_9, S_10}. However, the set cover with the minimum cost is
{S_5, S_6, S_7, S_8, S_9, S_10}, with a corresponding cost of 6.
We give a simple example of how to compute cost effectiveness. Let C = {S_1, S_2, S_9}, so that
the elements {1, 2, 3, 4, 5} are covered. Then, for subset S_3 = {5, 6} we have

α = c(S_3) / |S_3 − C| = 3 / 1 = 3.

Figure 11.2: Greedy Set Cover Example
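Algorithm 9 is short enough to implement directly. A minimal Python sketch, run on the instance
of Example 11.2.3 (ties in cost effectiveness are broken by lowest index, a choice the notes
leave open):

```python
# Greedy Set Cover (Algorithm 9): repeatedly pick the set with the
# smallest cost effectiveness alpha = c(S) / |S - C|.

def greedy_set_cover(U, sets, cost):
    covered, picked, total = set(), [], 0
    while covered != U:
        # Cost effectiveness of each unpicked set covering something new;
        # min() breaks ties by the first (lowest) index.
        best = min(
            (i for i in range(len(sets))
             if i not in picked and sets[i] - covered),
            key=lambda i: cost[i] / len(sets[i] - covered),
        )
        picked.append(best)
        covered |= sets[best]
        total += cost[best]
    return picked, total

# Example 11.2.3: six elements, ten sets (0-based indices below).
U = set(range(1, 7))
sets = [{1, 2}, {3, 4}, {5, 6}, {1, 2, 3, 4, 5, 6},
        {1}, {2}, {3}, {4}, {5}, {6}]
cost = [3, 3, 3, 14, 1, 1, 1, 1, 1, 1]

picked, total = greedy_set_cover(U, sets, cost)
print(picked, total)  # [4, 5, 6, 7, 8, 9] 6
```

On this instance the six singletons each have cost effectiveness 1, so greedy picks exactly the
optimal cover of cost 6; the tightness instance later in this section shows this is not always so.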
Approximation Rate of the Greedy Set Cover
We sort the elements e_1, e_2, …, e_n by the iteration in which they were added to C. Let this
sorted list be e'_1, e'_2, …, e'_n.

Figure 11.3: sort elements by iteration time

Notice that the subsets S'_j above are different from the original S_j: S'_j contains only those
elements that S_j itself newly added to C.
For each e'_j that was covered when S'_j was added, we define

price(e'_j) = α_{S'_j},

namely the cost effectiveness of the set by which e'_j was covered.

Observation 11.2.4. A(I) = Σ_{k=1}^n price(e'_k)

We denote by A(I) the cost of the set cover produced by Greedy Set Cover on input I. Therefore,
we have

A(I) = c(C)
     = Σ_{S'_i∈C} c(S'_i)
     = Σ_{S'_i∈C} Σ_{e'∈S'_i} c(S'_i) / |S'_i|
     = Σ_{k=1}^n price(e'_k).

The third equality above distributes the cost of each subset over the elements that the subset
newly covers.

Lemma 11.2.5. For k = 1, 2, …, n, we have price(e'_k) ≤ OPT/(n − k + 1).

Proof. Consider the iteration in which e'_k is covered. At that point, the remaining elements
e'_k, e'_{k+1}, …, e'_n (at least n − k + 1 of them) can be covered at total cost at most OPT,
namely by the sets of the optimal solution. Therefore, among the sets of the optimal solution
that have not yet been picked there is one with cost effectiveness at most OPT/(n − k + 1).
Since our algorithm always picks a set of minimum cost effectiveness, we must have
price(e'_k) ≤ OPT/(n − k + 1).
Theorem 11.2.6. Greedy Set Cover is an H_n-approximation, where
H_n = 1 + 1/2 + … + 1/n ≤ ln(n) + 1.

Proof.

A(I) = Σ_{k=1}^n price(e'_k)
     ≤ (1/n + 1/(n − 1) + … + 1/2 + 1) OPT
     = H_n · OPT.

The first equality is due to Observation 11.2.4, and the inequality is derived from Lemma 11.2.5.
Thus, Greedy Set Cover is an H_n ≈ ln(n) approximation algorithm.
Tightness of Greedy Set Cover
U = {e_1, e_2, …, e_n} and S = {S_1, S_2, …, S_n}. The elements and cost of each set are given
below:

S_1 = {1}              c(S_1) = 1/n
S_2 = {2}              c(S_2) = 1/(n − 1)
S_3 = {3}              c(S_3) = 1/(n − 2)
⋮
S_{n−1} = {n − 1}      c(S_{n−1}) = 1/2
S_n = {1, 2, …, n}     c(S_n) = 1 + ε

Here ε is some negligible positive value. The optimal set cover is {S_n}, with cost 1 + ε.
Greedy, however, proceeds differently. In iteration i the uncovered elements are {i, …, n}; the
cost effectiveness of the singleton S_i is 1/(n − i + 1), while that of S_n is
(1 + ε)/(n − i + 1). The ε is used here to make sure the cost effectiveness of the singleton is
always slightly smaller than that of S_n. In this way, greedy picks S_1, S_2, …, S_{n−1} in
order, and finally S_n (the only set covering element n), for a total cost of

1/n + 1/(n − 1) + … + 1/2 + (1 + ε) = H_n + ε ≈ ln(n).

So for every n we have constructed an instance with approximation ratio approaching ln(n). In
conclusion, the ln(n) approximation is tight for Greedy Set Cover.
11.2.4 LP-rounding Set Cover

Integer Program
We express the Set Cover problem as an Integer Program of the following form:

Minimize Σ_{i=1}^k c(S_i) u_i
Subject to: Σ_{i: e∈S_i} u_i ≥ 1, ∀e ∈ U
            u_i ∈ {0, 1}, i = 1, 2, …, k

where u_i is the boolean variable that indicates whether or not set S_i has been chosen: S_i is
not chosen when u_i = 0, while S_i is chosen when u_i = 1.
Example 11.2.7. U = {1, 2, 3, 4} and the sets and cost of each are given below:

S_1 = {1, 2}       c(S_1) = 5
S_2 = {2, 3}       c(S_2) = 100
S_3 = {2, 3, 4}    c(S_3) = 1

The corresponding Integer Program is:

Minimize 5u_1 + 100u_2 + u_3
Subject to:
element 1: u_1 ≥ 1
element 2: u_1 + u_2 + u_3 ≥ 1
element 3: u_2 + u_3 ≥ 1
element 4: u_3 ≥ 1
u_i ∈ {0, 1}, i ∈ {1, 2, 3}
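The translation from a Set Cover instance to the covering constraints is mechanical: one
constraint per element, listing which variables u_i appear in it. A short sketch that reproduces
the constraints of Example 11.2.7:

```python
# Build the Set Cover covering constraints mechanically: for each
# element e, the indices i with e in S_i (so that sum of u_i >= 1).
# Reproduces the constraints of Example 11.2.7.

def covering_constraints(U, sets):
    """Map each element e to the list of set indices i with e in S_i."""
    return {e: [i for i, S in enumerate(sets, start=1) if e in S]
            for e in sorted(U)}

U = {1, 2, 3, 4}
sets = [{1, 2}, {2, 3}, {2, 3, 4}]

print(covering_constraints(U, sets))
# {1: [1], 2: [1, 2, 3], 3: [2, 3], 4: [3]}
```

Element 2 appears in all three sets, giving the constraint u_1 + u_2 + u_3 ≥ 1, exactly as
derived by hand above.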
Relaxing to a Linear Program
The Integer Program above properly represents the original Set Cover problem; however, solving
the Integer Program is still NP-hard. To get an approximation algorithm, we relax the integrality
constraints and obtain a Linear Program.

Integer Program:
Minimize Σ_{i=1}^k u_i c(S_i)
Subject to: Σ_{i: e∈S_i} u_i ≥ 1, ∀e ∈ U
            u_i ∈ {0, 1}, i = 1, 2, …, k

Linear Program:
Minimize Σ_{i=1}^k u_i c(S_i)
Subject to: Σ_{i: e∈S_i} u_i ≥ 1, ∀e ∈ U
            0 ≤ u_i ≤ 1, i = 1, 2, …, k

We relax only the integrality constraints on the u_i. In this way, we can solve the Linear
Program in polynomial time. However, the relaxation brings about a new problem: how can we
interpret the result vector u = (u_1, u_2, …, u_k) when some u_i's are fractional?
Algorithm 10: LP-rounding Set Cover Algorithm
Input: element set U = {e_1, e_2, …, e_n}, collection of subsets S = {S_1, S_2, …, S_k} and cost
function c: S → Z⁺
Output: a set cover
1. Write the original Set Cover problem as an Integer Program.
2. Relax the Integer Program into a Linear Program.
3. Solve the Linear Program (e.g. using the Ellipsoid Algorithm) and obtain
u = (u_1, u_2, …, u_k) ∈ R^k.
4. Let f be the maximum frequency, i.e. the largest number of distinct subsets in which a single
element appears.
5. Output the deterministic rounding u* = (u*_1, u*_2, …, u*_k) ∈ {0, 1}^k, where

u*_i = 1 if u_i ≥ 1/f, and u*_i = 0 otherwise.

LP-rounding Set Cover Algorithm
The algorithm first converts the original Set Cover problem into an Integer Program, then
relaxes the constraints of the IP to obtain a Linear Program. By solving this LP, we get a
fractional solution u ∈ R^k. Lastly, we round each coordinate of the result vector u using the
threshold 1/f defined above.
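The rounding step itself (steps 4–5) is the interesting part, so the sketch below takes the
fractional LP solution as given rather than invoking an LP solver. The instance and its
fractional optimum are assumptions: for the sets {1,2}, {2,3}, {1,3} with unit costs,
u = (1/2, 1/2, 1/2) is an optimal LP solution of cost 3/2.

```python
# Deterministic 1/f-threshold rounding (steps 4-5 of Algorithm 10).
# The fractional solution u below is assumed, not computed by an LP solver.

def max_frequency(U, sets):
    """f = the largest number of sets any single element appears in."""
    return max(sum(1 for S in sets if e in S) for e in U)

def round_lp(u, U, sets):
    f = max_frequency(U, sets)
    return [1 if ui >= 1.0 / f else 0 for ui in u]

U = {1, 2, 3}
sets = [{1, 2}, {2, 3}, {1, 3}]
cost = [1, 1, 1]
u = [0.5, 0.5, 0.5]          # assumed fractional LP optimum, cost 1.5

u_star = round_lp(u, U, sets)
lp_cost = sum(c * x for c, x in zip(cost, u))
rounded_cost = sum(c * x for c, x in zip(cost, u_star))
print(u_star, rounded_cost, lp_cost)  # [1, 1, 1] 3 1.5
```

Here f = 2 and the rounded cover costs 3 = f · 1.5, so the f-approximation bound of
Lemma 11.2.8 below is met with equality on this instance.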
Effectiveness of LP-rounding

Lemma 11.2.8. LP-rounding Set Cover is an f-approximation.

Proof. First, we prove that u* is a feasible solution. In order to show this, we need to examine
each covering constraint and show that it is satisfied. Let

u_1 + u_2 + … + u_l ≥ 1

be one of the constraints (after renumbering the variables in it). By the definition of f, we
have l ≤ f. Now, there must be some i with u_i ≥ 1/l ≥ 1/f. That u_i is rounded to 1, and
therefore the constraint is satisfied. Thus, the resulting vector u* is a feasible solution.
On the other hand, consider the rounding process from u_i to u*_i:

u*_i = 1 if u_i ≥ 1/f, and u*_i = 0 otherwise.
We can easily check that u*_i ≤ f·u_i in both cases. So

Σ_{i=1}^k c(S_i) u*_i ≤ Σ_{i=1}^k c(S_i) f u_i = f Σ_{i=1}^k c(S_i) u_i ≤ f · OPT.

The first inequality is due to u*_i ≤ f·u_i. The last inequality holds because relaxing the
constraints enlarges the feasible region, so the optimal solution of the Integer Program is
contained in the solution space of the Linear Program:

IP --relax--> LP,  with OPT_IP ≥ OPT_LP.

This shows that the LP-rounding of Set Cover produces a feasible solution for the problem, with
approximation ratio f.
11.3 Vertex Cover
As we proved in Section 11.2.2, any Vertex Cover instance can be converted into a Set Cover
instance.

Example 11.3.1. G = (V, E), where V = {u_1, u_2, u_3, u_4, u_5} and E = {e_1, e_2, …, e_8}.

Figure 11.4: Vertex Cover Example

We convert the above Vertex Cover instance into a Set Cover instance, where

U = {e_1, e_2, e_3, e_4, e_5, e_6, e_7, e_8}
S = {{e_1, e_2, e_3}, {e_1, e_4, e_5}, {e_3, e_4, e_6, e_7}, {e_2, e_6, e_8}, {e_5, e_7, e_8}}
c(S_i) = 1, ∀S_i ∈ S.

With this construction, we may use LP-rounding for Set Cover to solve the Vertex Cover problem.
The maximum frequency f corresponds to the number of vertices incident to a particular edge,
which is 2. Since f = 2, LP-rounding for Set Cover gives a 2-approximation algorithm for Vertex
Cover.
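The reduction of Section 11.2.2 used in this example is simple enough to write out. A sketch on
a hypothetical small graph (the vertex and edge names are made up for illustration):

```python
# The reduction behind Example 11.3.1: turn a Vertex Cover instance into
# a Set Cover instance (U = E, one unit-cost set of incident edges per
# vertex). The graph below is a hypothetical example.

def vertex_cover_to_set_cover(V, E):
    U = set(E)
    sets = {v: {e for e in E if v in e} for v in V}   # edges incident to v
    cost = {v: 1 for v in V}
    return U, sets, cost

V = ['u1', 'u2', 'u3', 'u4']
E = [('u1', 'u2'), ('u1', 'u3'), ('u2', 'u3'), ('u3', 'u4')]

U, sets, cost = vertex_cover_to_set_cover(V, E)
print(sets['u3'])  # all edges incident to u3
# Each edge lies in exactly 2 of the sets, so the maximum frequency f = 2.
```

Since every edge has exactly two endpoints, every element of U appears in exactly two sets,
which is why f = 2 regardless of the input graph.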
Chapter 12
Rounding Technique for Approximation Algorithms (II)

In this chapter, we give a randomized algorithm for Set Cover. The algorithm achieves an
approximation factor of O(log n) for Set Cover, and reaches this approximation ratio with high
probability. Then, we give a randomized algorithm for the MAX-K-SAT problem. Furthermore, we
give a deterministic algorithm which derandomizes the aforementioned MAX-K-SAT algorithm using
the method of conditional expectation.

12.1 Randomized Algorithm for the Integer Program of Set Cover

12.1.1 Algorithm Description
We know from the previous lecture how to express the Set Cover problem as a Linear Program. For
the randomized algorithm, we perform the same linear relaxation as in the last lecture:

Minimize: Σ_{i=1}^k u_i c(S_i)
Subject to: Σ_{i: e∈S_i} u_i ≥ 1, ∀e ∈ U
            u_i ∈ [0, 1], i = 1, 2, …, k

Here is the algorithm:
Algorithm 11: Randomized LP-rounding Set Cover Algorithm
Input: element set U = {e_1, e_2, …, e_n}, collection of subsets S = {S_1, S_2, …, S_k} and cost
function c: S → Z⁺
Output: a set cover G
1. Solve the Set Cover LP relaxation (e.g. using the Ellipsoid algorithm), and let the solution
be (x_1, x_2, …, x_k) ∈ R^k.
2. For each i, choose u'_i ∈_R [0, 1] uniformly at random.
If u'_i ≤ x_i, set u_i ← 1; otherwise u_i ← 0.
Then Pr(u_i = 1) = x_i and Pr(u_i = 0) = 1 − x_i.
3. Repeat the second step 2 ln n times. Output the collection G of all sets picked in any
repetition (if u_i = 1 in some repetition, then we put S_i into G).

12.1.2 Correctness of the Algorithm
First, we want to bound the expectation of C(G), where C(G) is the total cost of the random
collection of sets picked over all iterations. Assume G_0 is the random collection of sets
picked in one iteration. Then the expectation of C(G_0) is

E[C(G_0)] = Σ_{i=1}^k Pr(S_i is picked) C(S_i) = Σ_{i=1}^k x_i C(S_i) = OPT_fractional ≤ OPT.

So the expectation of C(G) is

E[C(G)] ≤ (2 ln n) E[C(G_0)] ≤ (2 ln n) OPT_fractional ≤ (2 ln n) OPT.

Recall Markov's inequality:

Pr(|X| ≥ a) ≤ E(|X|) / a.

Using Markov's inequality with a = 8 ln n · OPT, we have

Pr[C(G) ≥ 8 ln n · OPT] ≤ (2 ln n · OPT) / (8 ln n · OPT) = 1/4.

Now, we want to know the probability that every e_i has been covered. We know one element is
not covered during one iteration of step 2 of the algorithm with probability

Pr[e is not covered in one iteration] = Π_{i: e∈S_i} (1 − x_i)
                                      ≤ Π_{i: e∈S_i} e^{−x_i}
                                      = e^{−Σ_{i: e∈S_i} x_i}
                                      ≤ 1/e.

The last inequality is due to the LP covering constraint

Σ_{i: e∈S_i} x_i ≥ 1.

Then the probability that e is not covered in any of the 2 ln n repetitions is:

P ≤ (1/e)^{2 ln n} = (1/n)² = 1/n².
By the union bound,

Pr(A ∨ B) ≤ Pr(A) + Pr(B),

the probability of having an element which is not covered is

Pr[∃e not covered] = Pr[e_1 not covered ∨ e_2 not covered ∨ … ∨ e_n not covered]
                   ≤ (1/n²) · n = 1/n.

So

Pr[all elements are covered] ≥ 1 − 1/n.

Let c be the event that all elements are covered and c' be the event that C(G) < 8 ln n · OPT.
Then we have:

Pr[c ∧ c'] ≥ 1 − 1/n − 1/4 > 2/3 (for n large enough).

Hence, we have an algorithm that succeeds with probability higher than 1/2. By repeating the
algorithm a polynomial number of times and keeping a feasible outcome of smallest cost, we
succeed with high probability.
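Algorithm 11's rounding step is a one-liner per coordinate. The sketch below takes the
fractional solution x as an assumption (no LP solver involved) and uses a strict comparison
u' < x_i, so that a coordinate with x_i = 1 is always picked and one with x_i = 0 never is;
that degenerate case makes the behavior checkable even though the algorithm is randomized.

```python
# Randomized rounding for Set Cover (Algorithm 11). The fractional
# solution x is assumed, not computed.
import math
import random

def randomized_round(x, n, rng):
    """Repeat the rounding step ceil(2 ln n) times; union the picks."""
    picked = set()
    for _ in range(math.ceil(2 * math.log(n))):
        for i, xi in enumerate(x):
            if rng.random() < xi:   # Pr[pick S_i] = x_i in each repetition
                picked.add(i)
    return picked

rng = random.Random(0)

# Degenerate fractional solution: picks are forced, so the output is
# deterministic regardless of the random draws.
assert randomized_round([1.0, 0.0, 1.0], n=3, rng=rng) == {0, 2}

# A genuinely fractional solution for the sets {1,2}, {2,3}, {1,3}:
# each run yields a cover with probability at least 1 - 1/n.
sets = [{1, 2}, {2, 3}, {1, 3}]
picked = randomized_round([0.5, 0.5, 0.5], n=3, rng=rng)
covers = set().union(*(sets[i] for i in picked)) == {1, 2, 3}
print(picked, covers)
```

The analysis above says the second call fails to cover some element with probability at most
1/n; the code makes no attempt to retry, exactly like the algorithm as stated.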
12.2 Method of Conditional Expectations

12.2.1 The MAX-K-SAT Problem
Let us introduce the MAX-K-SAT problem.

Input:
1. A K-CNF formula c_1 ∧ c_2 ∧ … ∧ c_m
2. A profit function C: {c_1, c_2, …, c_m} → Z⁺

Output:
A truth assignment that maximizes the total profit of the satisfied clauses (for unit profits,
one that satisfies a maximum number of clauses).

We know that MAX-2-SAT is an NP-hard problem. But we have a derandomized algorithm that obtains
at least one half of the optimal profit.
12.2.2 Derandomized Algorithm
The derandomized algorithm can be described as follows.
In the randomized version, every variable is set to 1 with probability 1/2 and to 0 with
probability 1/2, independently. Using linearity of expectation, E(X + Y) = E(X) + E(Y), we can
calculate the expectation of the total profit. Moreover, once the values of some variables have
been fixed, we can still calculate the conditional expectation of the total profit over the
remaining random variables in polynomial time. By making use of these facts, we can design a
derandomized algorithm that always achieves at least one half of OPT.

Let W_{c_i} = C(c_i) if c_i is satisfied, and W_{c_i} = 0 otherwise; let W = Σ_{i=1}^m W_{c_i}.

In step i:
1. We calculate E[W | x_i = 1] and E[W | x_i = 0] in polynomial time (conditioning also on the
values fixed in previous steps).
2. We set x_i to the value with the bigger conditional expectation of W.
12.2.3 Proof of Correctness
Firstly, we have

E[W] = (1/2) E[W | x_i = 1] + (1/2) E[W | x_i = 0].

So the bigger of E[W | x_i = 1] and E[W | x_i = 0] must be at least E[W]. Since every step
preserves this invariant, at the end of the algorithm we obtain a profit of at least E[W].

Moreover, a clause with k literals is falsified by exactly one of the 2^k assignments to its
variables, so

Pr[c_i is satisfied] = 1 − 1/2^k ≥ 1/2.

So

E[W_{c_i}] ≥ (1/2) C(c_i),

and therefore

E[W] = E[W_{c_1}] + E[W_{c_2}] + … + E[W_{c_m}]
     ≥ (1/2)(C(c_1) + C(c_2) + … + C(c_m))
     ≥ (1/2) OPT.

So the profit (total profit of the satisfied clauses) we get at the end of the algorithm is
greater than or equal to (1/2) OPT.
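The whole derandomization fits in a few lines once we can evaluate E[W | partial assignment]: a
clause already satisfied contributes its full weight, a fully falsified one contributes 0, and a
clause with u still-free literals contributes weight · (1 − 2^{−u}). A Python sketch; the
signed-integer clause encoding (literal v means x_v, literal −v means ¬x_v) is an assumed,
DIMACS-style convention:

```python
# Method of conditional expectations for MAX-SAT (Section 12.2.2).
# Clauses are tuples of nonzero ints: v means x_v, -v means NOT x_v.

def cond_expectation(clauses, weights, assignment):
    """E[W] given the partial assignment {var: 0/1}; free vars are uniform."""
    total = 0.0
    for clause, w in zip(clauses, weights):
        free = 0
        satisfied = False
        for lit in clause:
            v = abs(lit)
            if v not in assignment:
                free += 1
            elif assignment[v] == (1 if lit > 0 else 0):
                satisfied = True
        if satisfied:
            total += w
        elif free > 0:
            # The clause fails only if every free literal is falsified.
            total += w * (1 - 0.5 ** free)
    return total

def derandomized_max_sat(clauses, weights):
    variables = sorted({abs(l) for c in clauses for l in c})
    assignment = {}
    for v in variables:
        e1 = cond_expectation(clauses, weights, {**assignment, v: 1})
        e0 = cond_expectation(clauses, weights, {**assignment, v: 0})
        assignment[v] = 1 if e1 >= e0 else 0   # keep the bigger expectation
    return assignment, cond_expectation(clauses, weights, assignment)

clauses = [(1, 2), (-1, 2), (1, -2), (-1, -2)]   # at most 3 are satisfiable
assignment, profit = derandomized_max_sat(clauses, [1, 1, 1, 1])
print(assignment, profit)  # {1: 1, 2: 1} 3.0
```

On this formula E[W] = 4 · (3/4) = 3, and the algorithm indeed ends with profit 3, matching the
invariant "final profit ≥ E[W]" from the proof above.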
Chapter 13
Primal-Dual Method

In this lecture, we introduce Primal-Dual approximation algorithms. We also construct an
approximation algorithm to solve SET COVER as an example.

13.1 Definition
From the previous lectures, we know that every Linear Program has a corresponding dual:

PRIMAL:
Minimize: Σ_{j=1}^n c_j x_j
Subject to: Σ_{j=1}^n a_ij x_j ≥ b_i, i = 1, 2, …, m
            x_j ≥ 0, j = 1, 2, …, n

DUAL:
Maximize: Σ_{i=1}^m b_i y_i
Subject to: Σ_{i=1}^m a_ij y_i ≤ c_j, j = 1, 2, …, n
            y_i ≥ 0, i = 1, 2, …, m

Also from previous lectures, we know that feasible solutions x and y are both optimal if and
only if:

• Primal: ∀j = 1, 2, …, n, x_j = 0 or Σ_{i=1}^m a_ij y_i = c_j
• Dual: ∀i = 1, 2, …, m, y_i = 0 or Σ_{j=1}^n a_ij x_j = b_i
We give a relaxation of the above conditions as follows:

• Primal complementary slackness conditions:
For α ≥ 1, ∀j = 1, 2, …, n, either x_j = 0 or c_j/α ≤ Σ_{i=1}^m a_ij y_i ≤ c_j.

• Dual complementary slackness conditions:
For β ≥ 1, ∀i = 1, 2, …, m, either y_i = 0 or b_i ≤ Σ_{j=1}^n a_ij x_j ≤ β b_i.

Theorem 13.1.1. If x, y are feasible solutions to PRIMAL and DUAL respectively, and they satisfy
the primal conditions and dual conditions, then Σ_{j=1}^n c_j x_j ≤ αβ OPT.
Proof.

Σ_{j=1}^n c_j x_j ≤ α Σ_{j=1}^n (Σ_{i=1}^m a_ij y_i) x_j
                  = α Σ_{i=1}^m y_i (Σ_{j=1}^n a_ij x_j)
                  ≤ αβ Σ_{i=1}^m b_i y_i
                  ≤ αβ OPT.

The first inequality uses the primal conditions, the second uses the dual conditions, and the
last one is weak duality.

With the above theorem, the basic idea of the primal-dual method is: set x = 0, y = 0 at the
beginning, then modify them until x becomes a feasible (integral) solution while the relaxed
slackness conditions are maintained.
13.2 Set Cover Problem
Since we have discussed the Set Cover problem a lot in the previous lectures, here we only give
a brief description:

Input: U = {e_1, e_2, …, e_n}, S = {S_1, S_2, …, S_k}, C: S → Z⁺.
Output: A cover S' ⊆ S of U with minimal total cost.

We presently give a PRIMAL for the Set Cover problem as well as its DUAL:

• PRIMAL form: Minimize Σ_{S∈S} c(S) x_S, subject to ∀e ∈ U, Σ_{S: e∈S} x_S ≥ 1, and x_S ≥ 0.
• DUAL form: Maximize Σ_{e∈U} y_e, subject to ∀S ∈ S, Σ_{e: e∈S} y_e ≤ c(S), and y_e ≥ 0.
We also have the following primal and dual conditions:

• The primal conditions (α = 1): x_S ≠ 0 ⇒ Σ_{e: e∈S} y_e = c(S).
• The dual conditions (β = f): y_e ≠ 0 ⇒ Σ_{S: e∈S} x_S ≤ f (here f is the maximal number of
sets in which an element appears).

Algorithm 12: Primal-Dual Algorithm for Set Cover
Let x ← 0, y ← 0.
We call a set S satisfying Σ_{e: e∈S} y_e = c(S) "tight".
repeat
    pick an uncovered e ∈ U;
    increase y_e until some set S becomes tight;
    pick all tight sets into the cover and update x (set x_S = 1);
    mark the elements of these sets as covered;
until all elements are covered;

Claim 13.2.1. The algorithm is an f-approximation for SET COVER.

Proof. At the end of the execution of the algorithm, every element e has been covered, so the
constraints of the PRIMAL form are satisfied, i.e. x is a feasible solution for the primal. In
addition, the algorithm never raises y_e beyond the point at which a dual constraint becomes
tight, so y is feasible for the dual.
The primal conditions are satisfied by the way the algorithm works: a set is picked only once it
is tight. For the dual conditions, an element e ∈ U is contained in at most f sets and
x_S ≤ 1 for all S ∈ S, so Σ_{S: e∈S} x_S ≤ f; hence the dual conditions are always satisfied.
By Theorem 13.1.1 with α = 1 and β = f, we therefore end up with an f-approximation algorithm.
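Algorithm 12 needs no LP solver at all, which is the practical appeal of the primal-dual method.
A Python sketch, run on the instance of Example 11.2.7 (sets {1,2}, {2,3}, {2,3,4} with costs
5, 100, 1); uncovered elements are processed in increasing order, a tie-breaking choice the
algorithm leaves open:

```python
# Primal-Dual Set Cover (Algorithm 12): raise y_e for an uncovered
# element until some set containing it becomes tight, then pick all
# tight sets. The instance is that of Example 11.2.7.

def primal_dual_set_cover(U, sets, cost):
    y = {e: 0 for e in U}
    picked, covered = [], set()
    while covered != U:
        e = min(U - covered)                    # any uncovered element
        containing = [i for i, S in enumerate(sets) if e in S]
        # slack of S = c(S) - sum of y over its elements
        slack = {i: cost[i] - sum(y[x] for x in sets[i]) for i in containing}
        y[e] += min(slack.values())             # raise y_e until a set is tight
        for i, S in enumerate(sets):
            if i not in picked and cost[i] - sum(y[x] for x in S) == 0:
                picked.append(i)                # pick every tight set
                covered |= S
    return picked, sum(cost[i] for i in picked), y

U = {1, 2, 3, 4}
sets = [{1, 2}, {2, 3}, {2, 3, 4}]
cost = [5, 100, 1]

picked, total, y = primal_dual_set_cover(U, sets, cost)
print(picked, total)  # [0, 2] 6
```

The run raises y_1 to 5 (making S_1 tight) and then y_3 to 1 (making S_3 tight), ending with
cover {S_1, S_3} of cost 6; the dual constraints Σ_{e∈S} y_e ≤ c(S) hold throughout, as the
proof of Claim 13.2.1 requires.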
Finally, we give an example which shows that the approximation ratio f is also a lower bound for
the algorithm.

Figure 13.1: An example of a case where the solution of the Primal-Dual algorithm is f times the
optimal

In the instance of Figure 13.1, every set has cost 1 except the optimal set, which has cost
1 + ε. OPT picks the single set x_{n+1} = 1, with total cost 1 + ε. The Primal-Dual algorithm
instead chooses the sets x_1 = x_2 = … = x_{n+1} = 1, with a total cost of n + 1 (up to ε). So
the ratio is (n + 1)/(1 + ε) → n + 1 = f. This example shows that f is a lower bound for the
approximation ratio of the Primal-Dual method.