
Lecture Notes

Periklis A. Papakonstantinou

Fall 2011

Preface

This is a set of lecture notes for the 3rd year course “Advanced Algorithms” which I gave in

Tsinghua University for Yao-class students in the Spring and Fall terms in 2011. This set of notes

is partially based on the notes scribed by students taking the course the ﬁrst time it was given.

Many thanks to Xiahong Hao and Nathan Hobbs for their hard work in critically revising, editing,

and adding ﬁgures to last year’s material. I’d also like to give credit to the students who already

took the course and scribed notes. These lecture notes, together with students’ names can be found

in the Spring 2011 course website.

This version of the course differs from its predecessor in a few topics. In particular, I removed the material on algorithmic applications of Finite Metric Embeddings and SDP relaxations and the like, and I added elements of Fourier Analysis on the boolean cube with applications to property testing. I also added a couple of lectures on algorithmic applications of Szemerédi's Regularity Lemma.

Apart from property testing and the regularity lemma, the remaining topics are: a brief revision of the greedy, dynamic programming and network-flows paradigms; a rather in-depth introduction to the geometry of polytopes, Linear Programming, applications of duality theory, and algorithms for solving linear programs; standard topics on approximation algorithms; and an introduction to Streaming Algorithms.

I would be most grateful if you bring to my attention any typos, or send me your recommendations, corrections, and suggestions to papakons@tsinghua.edu.cn.

Beijing, PERIKLIS A. PAPAKONSTANTINOU

Fall 2011

– This is a growing set of lecture notes. It will not be complete until mid January 2012 –


Contents

1 Big-O Notation 4

1.1 Asymptotic Upper Bounds: big-O notation . . . . . . . . . . . . . . . . . . . . . . 4

2 Interval Scheduling 6

2.1 Facts about Greedy Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.2 Unweighted Interval Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.3 Weighted Interval Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3 Sequence Alignment: can we do Dynamic Programming in small space? 12

3.1 Sequence Alignment Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

3.2 Dynamic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

3.3 Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

4 Matchings and Flows 17

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

4.2 Maximum Flow Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

4.3 Max-Flow Min-Cut Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

4.4 Ford-Fulkerson Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

4.4.1 Edmonds-Karp Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 21

4.5 Bipartite Matching via Network Flows . . . . . . . . . . . . . . . . . . . . . . . . 24

4.6 Application of Maximum Matchings: Approximating Vertex Cover . . . . . . . . . 24

5 Optimization Problems, Online and Approximation Algorithms 26

5.1 Optimization Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

5.2 Approximation Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

5.3 Makespan Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

5.4 Approximation Algorithm for Makespan Problem . . . . . . . . . . . . . . . . . . 28

6 Introduction to Convex Polytopes 31

6.1 Linear Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

6.2 Afﬁne Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

6.3 Convex Polytope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

6.4 Polytope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

6.5 Linear Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35


7 Forms of Linear Programming 36

7.1 Forms of Linear Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

7.2 Linear Programming Form Transformation . . . . . . . . . . . . . . . . . . . . . . 37

8 Linear Programming Duality 38

8.1 Primal and Dual Linear Program . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

8.1.1 Primal Linear Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

8.1.2 Dual Linear Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

8.2 Weak Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

8.3 Farkas’ Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

8.3.1 Projection Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

8.3.2 Farkas’ Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

8.3.3 Geometric Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

8.3.4 More on Farkas Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

8.4 Strong Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

8.5 Complementary Slackness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

9 Simplex Algorithm and Ellipsoid Algorithm 47

9.1 More on Basic Feasible Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . 47

9.1.1 Assumptions and conventions . . . . . . . . . . . . . . . . . . . . . . . . 47

9.1.2 Deﬁnitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

9.1.3 Equivalence of the deﬁnitions . . . . . . . . . . . . . . . . . . . . . . . . 48

9.1.4 Existence of Basic Feasible Solution . . . . . . . . . . . . . . . . . . . . . 49

9.1.5 Existence of optimal Basic Feasible Solution . . . . . . . . . . . . . . . . 50

9.2 Simplex algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

9.2.1 The algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

9.2.2 Efﬁciency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

9.3 Ellipsoid algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

9.3.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

9.3.2 Mathematical background . . . . . . . . . . . . . . . . . . . . . . . . . . 52

9.3.3 LP,LI and LSI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

9.3.4 Ellipsoid algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

10 Max-Flow Min-Cut Through Linear Programming 55

10.1 Flow and Cut . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

10.1.1 Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

10.1.2 An alternative Deﬁnition of Flow . . . . . . . . . . . . . . . . . . . . . . 56

10.1.3 Cut . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

10.2 Max-ﬂow Min-Cut Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

11 Rounding Technique for Approximation Algorithms(I) 60

11.1 General Method to Use Linear Programming . . . . . . . . . . . . . . . . . . . . 60

11.2 Set Cover Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

11.2.1 Problem Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

11.2.2 Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61


11.2.3 Greedy Set Cover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

11.2.4 LP-rounding Set Cover . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

11.3 Vertex Cover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

12 Rounding Technique for Approximation Algorithms(II) 68

12.1 Randomized Algorithm for Integer Program of Set Cover . . . . . . . . . . . . . . 68

12.1.1 Algorithm Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

12.1.2 The correctness of the Algorithm . . . . . . . . . . . . . . . . . . . . . . 68

12.2 Method of Computation Expectations . . . . . . . . . . . . . . . . . . . . . . . . 70

12.2.1 MAX-K-SAT problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

12.2.2 Derandomized Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 71

12.2.3 The proof of correctness . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

13 Primal Dual Method 73

13.1 Deﬁnition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

13.2 Set Cover Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74


Chapter 1

Big-O Notation

When talking about algorithms (or functions or programs), it would be nice to have a way of quantifying or classifying how fast or slow an algorithm is, how much space it uses, and so on. It is useful to present this quantification in relation to the size of the algorithm's input, so that it is independent of the peculiarities of the problem. In this chapter we introduce the big-O notation which, among other things, can be used to quantify the asymptotic behavior of an algorithm as a function of its input length n measured in bits.

1.1 Asymptotic Upper Bounds: big-O notation

Let T(n) be a function, say, the worst-case running time of a certain algorithm with respect to

its input length n. What we would like to point out here is that worst-case running time is a

well-deﬁned function. Contrast this to the running time of an algorithm which is in general not

a function of the input length (why?). Given a function f(n), we say that T(n) ∈ O(f(n)) if,

for sufﬁciently large n, the function T(n) is bounded from above by a constant multiple of f(n).

Sometimes this is written as T(n) = O(f(n)).

To be precise, T(n) = O(f(n)) if there exist constants $c > 0$ and $n_0 \ge 0$ such that for all $n \ge n_0$ we have $T(n) \le c\,f(n)$. In this case, we say that T is asymptotically upper bounded by f.
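The definition can be spot-checked numerically. The sketch below is only an illustration: the function `T` is a hypothetical running time, `is_witness` is a helper name chosen here, and a finite check up to `n_max` is of course not a proof that the bound holds for all n.

```python
def is_witness(T, f, c, n0, n_max=10_000):
    """Return True if T(n) <= c * f(n) for every n with n0 <= n <= n_max.

    This is a finite spot check of the big-O definition, not a proof.
    """
    return all(T(n) <= c * f(n) for n in range(n0, n_max + 1))


def T(n):
    # Hypothetical worst-case running time: 3n^2 + 5n + 7.
    return 3 * n * n + 5 * n + 7


def f(n):
    return n * n


# 3n^2 + 5n + 7 <= 4n^2 holds exactly when n^2 - 5n - 7 >= 0, i.e. from n = 7 on.
print(is_witness(T, f, c=4, n0=7))
print(is_witness(T, f, c=4, n0=6))
```

The pair (c, n_0) = (4, 7) passes the check, whereas starting at n_0 = 6 fails because T(6) = 145 > 144 = 4 f(6). Many other pairs work equally well; the definition only asks that one exists.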

For a particular algorithm, different inputs result in different running times. Notice that the running time on a particular input is a constant, not a function. The running times of interest are the best-case, the worst-case, and the average-case running time, which for each input length n express that the running time is at least, at most, and on average a certain value. For us the worst-case running time is often of particular concern, since it is important to know how much time might be needed in the worst case to guarantee that the algorithm always finishes on time. The same holds for the worst-case space used by an algorithm. From this point on, unless stated otherwise, the term “running time” refers to “worst-case running time”.

Example. Take Bubble Sort, shown in Algorithm 1, as an example. This algorithm is correct for the SORTING problem, which has the obvious definition. The number of iterations of the outer loop and the number of iterations of the inner loop together determine the running time of the algorithm. Notice that we have:

the number of outer-loop iterations ≤ n

the number of inner-loop iterations per outer-loop iteration ≤ n

Therefore, a rough estimate of the worst-case running time is $O(n^2)$. This bound can be proved to be asymptotically tight.

Algorithm 1: Bubble Sort

input : L: list of sortable items
output: L′: list of sorted items in increasing order

repeat
    swapped = false;
    for i ← 1 to length(L) − 1 do
        if L[i − 1] > L[i] then
            swap(L[i − 1], L[i]);
            swapped = true;
        end
    end
until not swapped;
return L
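A runnable sketch of Algorithm 1 follows. The comparison and swap counters are an addition made here for illustration (they are not part of the algorithm); they make it easy to observe how the work grows on a reversed list.

```python
def bubble_sort(items):
    """Sort a copy of `items`; return (sorted list, comparisons, swaps)."""
    a = list(items)
    comparisons = swaps = 0
    swapped = True
    while swapped:                      # "repeat ... until not swapped"
        swapped = False
        for i in range(1, len(a)):      # i = 1 .. length(L) - 1
            comparisons += 1
            if a[i - 1] > a[i]:
                a[i - 1], a[i] = a[i], a[i - 1]
                swaps += 1
                swapped = True
    return a, comparisons, swaps


for n in (4, 8, 16):
    _, comparisons, swaps = bubble_sort(range(n, 0, -1))
    print(n, comparisons, swaps)
```

On the reversed list of n items this implementation makes n passes of n − 1 comparisons each, and n(n − 1)/2 swaps, so both counts grow quadratically in n.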

An important skill required in the design of combinatorial algorithms is constructing examples, in this case infinite families of examples. The more interesting direction in bounding the running time is what we did before, i.e. to upper bound it. We showed that $T(n) = O(n^2)$. How do we know we have not overestimated things? One way is to show that $T(n) = \Omega(n^2)$. To do that, we construct an infinite family of examples parametrized by n on which the algorithm takes $\Omega(n^2)$ steps. In this case, given n we can consider the input $\langle n, n-1, \ldots, 1 \rangle$ and conclude that on this input Algorithm 1 makes $\Omega(n^2)$ steps.¹

¹ To be very precise, we should have treated n as the length of the input in bits (and not as the number of integers). For example, the above input for a given n has length ≈ n log n (each integer requires about log n bits to be represented). However, it is an easy exercise to do the (annoying) conversion between m integers represented in binary and their total length of n bits.


Chapter 2

Interval Scheduling

In this lecture, we begin by describing what a greedy algorithm is. UNWEIGHTED INTERVAL

SCHEDULING is one problem that can be solved using such an algorithm. We then present

WEIGHTED INTERVAL SCHEDULING, where greedy algorithms don’t seem to work, and we must

employ a new technique, dynamic programming, which in some sense generalizes the concept of a

greedy algorithm.

2.1 Facts about Greedy Algorithms

Most greedy algorithms make decisions by iterating the following 2 steps.

1. Order the input elements

2. Make an irrevocable decision based on a local optimization.

Here, irrevocable means that once a decision is made about the partially constructed output, it can never be changed. The greedy approach relies on the hope that repeatedly making locally optimal choices will lead to a global optimum.

Most greedy algorithms order the input elements just once, and then make decisions one by one afterwards. Kruskal's algorithm for minimum spanning trees is a well-known algorithm that uses the greedy approach. More sophisticated algorithms reorder the elements in every round. For example, Dijkstra's shortest path algorithm updates and reorders the vertices in every round in order to choose the one whose distance to the source vertex is minimum.

Greedy algorithms are often the ﬁrst choice when one tries to solve a problem for two reasons:

1. The greedy concept is simple.

2. In many common cases, a greedy approach leads to an optimal answer.

We present a simple algorithm to demonstrate the greedy approach.


2.2 Unweighted Interval Scheduling

Suppose we have a set of jobs, each with a given starting and ﬁnishing time. However, we are

restricted to only one machine to deal with these jobs, that is, no two jobs can overlap with each

other. The task is to come up with an arrangement or a “scheduling” that ﬁnishes the maximum

number of jobs within a given time interval. The formal deﬁnition of the problem can be described

as follows:

Definition 2.2.1 (Intervals and Schedules). Let S be a finite set of intervals. An interval I is a pair of two integers, the first smaller than the second; i.e. $I = (a, b) \in (\mathbb{Z}^+)^2$, $a < b$. We say that $S' \subseteq S$ is a feasible schedule if no two intervals $I, I' \in S'$ overlap with each other. Let $I = (a_1, b_1)$ and $I' = (a_2, b_2)$; overlap means $a_2 < b_1 \le b_2$ or $a_1 < b_2 \le b_1$.

Problem: UNWEIGHTED INTERVAL SCHEDULING

Input: a finite set of intervals S

Output: a feasible schedule $S' \subseteq S$ such that $|S'|$ is maximum (the maximum is taken over the sizes of all feasible schedules of S).

Figure 2.1 gives an example instance of the Unweighted Interval Scheduling Problem. Here

interval (1, 7) and interval (6, 10) overlap with each other.

(Failed) attempt 1: Greedy rule: Order the intervals in ascending order by length, then choose

the smaller intervals ﬁrst. The intuition behind this solution is that shorter jobs may be less likely

to overlap with other jobs. So we order the intervals in ascending order, then we choose jobs which

do not overlap with the ones we have already chosen (choosing the shortest job which fulﬁls this

requirement). However, this solution is not correct.

Figure 2.1: An example of the time line and intervals

Figure 2.2 gives a counterexample in which this greedy rule fails. Giving a counterexample is a standard technique to show that an algorithm is not correct.¹ In this example, the optimal choice is job 1 and job 3. But by adhering to a greedy rule that orders jobs by length, we can only choose job 2. Attempt 1 gives an answer that is only 1/2 of the optimal. We know that this algorithm is wrong. But there is still one thing we can ask about this greedy algorithm: how bad can it be? Can it give a result much worse than 1/2 of the optimal? The answer is no. This greedy algorithm always gives a result no worse than 1/2 of the optimal (see Exercise 1).

Figure 2.2: Counterexample of Attempt 1

¹ The definition of a correct algorithm for a given problem is that for every input (i) the algorithm terminates and (ii) if it terminates and the input is in the correct form then the output is in the correct form. Therefore, to prove that an algorithm is not correct for a given problem it is sufficient to show that there exists an input where the output is not correct (in this case, by definition of the problem, correct is a schedule of optimal size).
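The failure of the shortest-first rule can be reproduced in a few lines. The instance `jobs` below is a hypothetical one chosen here in the spirit of the counterexample (it is not read off Figure 2.2), and the function names are illustrative; `overlaps` follows the overlap condition of Definition 2.2.1.

```python
def overlaps(i, j):
    """Overlap test of Definition 2.2.1 for intervals i = (a1, b1), j = (a2, b2)."""
    (a1, b1), (a2, b2) = i, j
    return a2 < b1 <= b2 or a1 < b2 <= b1


def greedy_by_length(intervals):
    """Attempt 1: consider intervals from shortest to longest, keep the compatible ones."""
    chosen = []
    for interval in sorted(intervals, key=lambda iv: iv[1] - iv[0]):
        if not any(overlaps(interval, other) for other in chosen):
            chosen.append(interval)
    return chosen


# Hypothetical instance: the short middle job blocks both longer jobs.
jobs = [(1, 5), (4, 7), (6, 10)]
print(greedy_by_length(jobs))
```

Here the rule keeps only the middle job (4, 7), although (1, 5) and (6, 10) do not overlap and together form a feasible schedule of size 2.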

Attempt 2: Greedy rule: Sort by the ﬁnishing time in non-decreasing order, and choose jobs

which do not overlap with jobs already chosen. A formal algorithm is described below.

Algorithm 2: A greedy algorithm for interval scheduling

input : S: the set of all intervals
output: S′ ⊆ S s.t. S′ is a schedule of maximum size

Order S as I_1, I_2, ..., I_n in non-decreasing finishing time;
S′ ← ∅;
for i ← 1 to n do
    if I_i does not overlap any interval in S′ then
        S′ ← S′ ∪ {I_i};
    end
end
return S′;
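Algorithm 2 translates almost line by line into running code. The sketch below assumes intervals are given as (start, finish) pairs; the names `max_schedule` and `jobs` and the sample instance are choices made here for illustration.

```python
def overlaps(i, j):
    """Overlap test of Definition 2.2.1 for intervals i = (a1, b1), j = (a2, b2)."""
    (a1, b1), (a2, b2) = i, j
    return a2 < b1 <= b2 or a1 < b2 <= b1


def max_schedule(intervals):
    """Greedy rule of Attempt 2: scan by finishing time, keep every compatible interval."""
    schedule = []
    for interval in sorted(intervals, key=lambda iv: iv[1]):
        if not any(overlaps(interval, other) for other in schedule):
            schedule.append(interval)
    return schedule


# Hypothetical instance containing the overlapping pair (1, 7), (6, 10).
jobs = [(1, 7), (6, 10), (2, 4), (4, 6)]
print(max_schedule(jobs))
```

The `any(...)` test mirrors the pseudocode directly and costs O(n) per interval. Since the chosen intervals are scanned in order of finishing time, it would be enough to compare against the most recently chosen interval, which brings the loop down to O(n) after sorting.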

We would like to show that this algorithm is “correct”. But we must first define what is meant by the “correctness” of an algorithm.

An algorithm is correct if:

1. The algorithm always terminates.

2. Whenever the precondition holds, the postcondition also holds. In other words, every legal input produces the right output.

It is clear that Algorithm 2 terminates after n iterations. For some algorithms, though, termination can be a problem and might require more justification. We now shift our focus to the more difficult question of how to prove that Algorithm 2 always outputs a schedule of maximum size.

Proof. We begin by introducing some basic concepts used in the proof. Let $S_i$ be the content of $S'$ at the end of the i-th iteration of the for loop, and let $OPT_i$ be an optimal extension of $S_i$, i.e. an optimal schedule with $S_i \subseteq OPT_i$. There is one important point about the second part of this definition: we have to show that such an optimal extension exists, since otherwise the definition itself would assume what we want to prove.


Remark 2.2.2. You may wonder why we are concerned with a different $OPT_i$ (one for each iteration i). This is because after the base case is established, we find $OPT_n$ by assuming the claim has been proved for all values from the base case up to $OPT_{n-1}$, inclusive. This is the way induction works.

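We will show by induction that for every $i \in \{0, 1, \ldots, n\}$ the following statement holds:

$P(i)$: there exists an optimal schedule $OPT_i$ such that $S_i \subseteq OPT_i$.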

For the base case, P(0), we trivially have $S_0 = \emptyset$ and $S_0 \subseteq OPT_0$ for any optimal schedule $OPT_0$. For the inductive step, suppose we have P(k) and we want to prove P(k + 1). At time k + 1, we consider interval $I_{k+1}$:

• Case I: $I_{k+1} \notin S_{k+1}$. We can take $OPT_{k+1} = OPT_k$. It is obvious that $S_{k+1} = S_k \subseteq OPT_{k+1}$.

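• Case II: $I_{k+1} \in S_{k+1}$.

– Subcase a: $I_{k+1}$ does not overlap any interval of $OPT_k$. This case cannot happen: $OPT_k \cup \{I_{k+1}\}$ would then be a feasible schedule larger than $OPT_k$, which contradicts the definition of $OPT_k$.

– Subcase b: $I_{k+1}$ overlaps exactly one interval $I^*$ from $OPT_k$. Since $I_{k+1} \in S_{k+1}$, $I_{k+1}$ does not overlap any interval in $S_k$, which implies $I^* \in OPT_k \setminus S_k$. So $S_k \subseteq OPT_k \setminus \{I^*\}$. Then we can construct $OPT_{k+1} = (OPT_k \setminus \{I^*\}) \cup \{I_{k+1}\}$, which is feasible and has the same size as $OPT_k$, hence is optimal. We have $S_{k+1} = S_k \cup \{I_{k+1}\}$, where $S_k \subseteq OPT_k \setminus \{I^*\}$. So $S_{k+1} \subseteq OPT_{k+1}$.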

– Subcase c: $I_{k+1}$ overlaps more than one interval from $OPT_k$. We wish to show that this case cannot happen. Since $I_{k+1} \in S_{k+1}$, $I_{k+1}$ does not overlap any interval in $S_k$, which implies that $I_{k+1}$ only overlaps intervals from $OPT_k \setminus S_k$. By way of contradiction, suppose $I_{k+1}$ overlaps more than one interval $I_a, I_b, \ldots, I_t \in OPT_k \setminus S_k$, for some t. Here $I_a, I_b, \ldots, I_t$ are sorted in non-decreasing order by finishing time. Because $I_a, I_b, \ldots, I_t \in OPT_k$, we have the property that earlier intervals always end before later ones start, i.e. all intervals in $OPT_k$ are sequential and non-overlapping. Since more than one of these intervals overlaps $I_{k+1}$, there must be at least one interval among $I_a, I_b, \ldots, I_t \in OPT_k \setminus S_k$ that ends before $I_{k+1}$ finishes (cf. Figure 2.3). By the way the algorithm works, $I_{k+1}$ should end no later than each of $I_a, I_b, \ldots, I_t$ ends. So there is a contradiction.

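By induction, for any $k \ge 0$ we can always construct $OPT_{k+1}$ from $OPT_k$ such that $S_{k+1} \subseteq OPT_{k+1}$. Thus P(i) holds for every i, i.e.

$\forall i,\ \exists\, OPT_i$ s.t. $S_i \subseteq OPT_i$.

In particular $S_n \subseteq OPT_n$. Every interval of $OPT_n \setminus S_n$ was rejected by the algorithm because it overlaps some interval of $S_n \subseteq OPT_n$, which is impossible in a feasible schedule; hence $S_n = OPT_n$ and the output of the algorithm is optimal.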


Figure 2.3: An example of Case II(c): two intervals $I_a$ and $I_b$ from $OPT_k$ overlap with $I_{k+1}$

2.3 Weighted Interval Scheduling

This interval scheduling problem is identical to the previous problem, with the difference that all intervals are assigned weights; i.e. a weighted interval I is identified with a pair I = (a, b) and an integer w(I), which is its weight.

Input: a set of intervals S and a weight function $w : S \to \mathbb{N}$

Output: a feasible schedule $S' \subseteq S$ of maximum total weight.

Figure 2.4: An example of Weighted Interval Scheduling

Unlike the Unweighted Interval Scheduling problem, no natural greedy algorithm is known for

solving its weighted counterpart. It can be solved, however, via a different technique if we break

the problem into sub-problems: dynamic programming. Now instead of just greedily building

up one solution, we create a table and in each round we “greedily” compute an optimal solution

to a subproblem by using the values of the optimal solutions of smaller subproblems we have

constructed before.

Of course, not every problem can be solved via Dynamic Programming (DP). Roughly speaking, DP implicitly explores the entire solution space by decomposing the problem into sub-problems, and then using the solutions to those sub-problems to assist in solving larger and larger sub-problems.

After ordering the intervals in non-decreasing order of finishing time, an example sub-problem in WEIGHTED INTERVAL SCHEDULING is to give the schedule of maximum total weight up to a specified finishing time. For example, say that the input consists of the intervals $I_1, I_2, I_3, I_4$, and that the finishing time of $I_1$ is the smallest, then that of $I_2$, then $I_3$ and then $I_4$. The DP algorithm computes the optimal schedule if the input consists only of $I_1$, then it computes the optimal schedule if the input consists of $\{I_1, I_2\}$, then it computes the optimal schedule if the input consists of $\{I_1, I_2, I_3\}$, and finally it computes the optimal schedule if the input consists of the entire list $\{I_1, I_2, I_3, I_4\}$. The reader should understand the following: (1) what would have happened if we had defined the subproblems without ordering the intervals in non-decreasing finishing time order, and (2) how we use the fact that, in order to compute the optimal solution to the input $\{I_1, \ldots, I_k\}$, we have recorded in our table all previous solutions to the subproblems defined on the inputs $\{I_1, \ldots, I_{k-1}\}$, $\{I_1, \ldots, I_{k-2}\}$, $\{I_1, \ldots, I_{k-3}\}$, ....

Algorithm 3: A dynamic programming algorithm for weighted interval scheduling

input : S: the set of all weighted intervals
output: the array S′, where S′[i] is the maximum total weight of a feasible schedule of {I_1, ..., I_i}; S′[n] is the maximum total weight for S

Order S as I_1, I_2, ..., I_n in non-decreasing finishing time;
S′[0] ← 0;
for i ← 1 to n do
    Let j be the largest index j < i s.t. none of I_1, ..., I_j overlaps I_i (j = 0 if there is no such index);
    S′[i] ← max{S′[i − 1], S′[j] + w(I_i)};
end
return S′;

We will use the example illustrated in Figure 2.4 as the input to Algorithm 3. At the end of Algorithm 3's execution, the S′ array should look like the following:

0 5 11 100 100 150

We remark that, unlike greedy algorithms, in Dynamic Programming algorithms the correctness is transparent in the description of the algorithm. Usually, the difficulty in a DP algorithm is coming up with a recurrence relation which relates optimal solutions of smaller subproblems to those of bigger ones (this recurrence relation is usually trivial to implement as an algorithm).
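To illustrate how directly the recurrence of Algorithm 3 becomes code, here is a minimal sketch. It assumes jobs are given as (start, finish, weight) triples and that two intervals are compatible when one finishes no later than the other starts, as in Definition 2.2.1. The binary search via `bisect_right`, the traceback, and the sample instance are additions made here; they are not taken from the notes or from Figure 2.4.

```python
from bisect import bisect_right


def max_weight_schedule(jobs):
    """jobs: list of (start, finish, weight). Return (best total weight, chosen jobs)."""
    jobs = sorted(jobs, key=lambda job: job[1])     # non-decreasing finishing time
    finishes = [finish for _, finish, _ in jobs]
    n = len(jobs)
    best = [0] * (n + 1)   # best[i]: optimum over the first i jobs (the array S')
    prev = [0] * (n + 1)   # prev[i]: the index j used by the recurrence for job i
    for i in range(1, n + 1):
        start, _, weight = jobs[i - 1]
        # Number of earlier jobs whose finishing time is <= start of job i.
        prev[i] = bisect_right(finishes, start, 0, i - 1)
        best[i] = max(best[i - 1], best[prev[i]] + weight)

    # Walk the table backwards to recover one optimal schedule.
    chosen, i = [], n
    while i > 0:
        weight = jobs[i - 1][2]
        if best[i] == best[prev[i]] + weight:
            chosen.append(jobs[i - 1])
            i = prev[i]
        else:
            i -= 1
    return best[n], chosen[::-1]


# Hypothetical instance.
jobs = [(1, 4, 5), (3, 6, 6), (5, 8, 5), (7, 10, 1)]
print(max_weight_schedule(jobs))
```

On this instance the table `best` is 0, 5, 6, 10, 10, so the optimum weight 10 is reached by the first and third jobs. Sorting and the binary searches take O(n log n) time in total.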

Exercise 1. Show why Attempt 1 given for the Unweighted Interval Scheduling problem cannot

be worse than half of the optimal.


Chapter 3

Sequence Alignment: can we do Dynamic

Programming in small space?

In this course problems come in two forms: search and optimization. For example, the search version of the SHORTEST PATH problem is, for a given input, to find an actual shortest path, whereas the optimization version of the problem is to find the value (i.e. the length) of such a path. In this chapter, we define the SEQUENCE ALIGNMENT problem, and give a dynamic programming algorithm which solves it. It is very interesting that there is a conceptual difference between the algorithms that solve the SEARCH and the OPTIMIZATION versions of a problem. In particular, although it is easy to find an optimal Dynamic Programming algorithm for the optimization problem which uses little space, it is more involved to come up with a time-efficient algorithm that uses little space for the search version of the problem. To that end, we will combine Dynamic Programming and Divide and Conquer, and we will give a solution which uses little time and simultaneously little space.

3.1 Sequence Alignment Problem

Sequence alignment has a very common everyday application: spell checking. When we type “Algorthsm” but mean to type “Algorithm”, the computer has to find words similar to the one we typed and give us suggestions. In this situation, the computer has to find the “nearest” word to the one we typed. This raises the question: how can we define the “nearest” word for a sequence? We will give a way to score the difference (or similarity) between strings. Apart from spell checkers, this finds application in algorithms on DNA sequences. In fact, such DNA sequences have huge sizes, so the difference between an algorithm that works using space n and an algorithm that works using space $n^2$ is big in practice!

Usually, alignments add gaps or insert/delete characters in order to make two sequences the same. Let $\delta$ be the gap cost and $\alpha_{a,b}$ be the cost of a mismatch between a and b. We define the distance between a pair of sequences as the minimal alignment cost.

In the example above we have:

algor-thsm

algorith-m

In the above alignment, we add a total of two gaps (written “-”) to the sequences. We can say that the alignment cost in this case is $2\delta$. Notice that for any sequence pair, there is more than one way to make them the same. Different alignments may lead to different costs.

For example, consider the sequence pair “abbba” and “abaa”, two sequences over the alphabet $\Sigma = \{a, b\}$. We could choose an alignment of cost $\delta + \alpha_{a,b}$:

abbba

ab-aa

Or we could choose an alignment of cost $3\delta$:

abbba-

ab--aa

Notice that the relationship between $\delta$ and $\alpha$ determines which alignment is minimal.

Now we give a formal description of the Sequence Alignment Problem:

• INPUT: Two sequences X, Y over alphabet Σ.

• OUTPUT: An alignment of minimum cost.

Definition 3.1.1 (Alignment). Let $X = x_1 x_2 \ldots x_n$ and $Y = y_1 y_2 \ldots y_m$. An alignment between X and Y is a set $M \subseteq \{1, 2, \ldots, n\} \times \{1, 2, \ldots, m\}$ such that there are no two distinct pairs $(i, j), (i', j') \in M$ with $i \le i'$ and $j' \le j$.

3.2 Dynamic Algorithm

Claim 3.2.1. Let X, Y be two sequences and M be an alignment. Then at least one of the following is true:

1. $(n, m) \in M$.

2. the n-th position of X is unmatched.

3. the m-th position of Y is unmatched.

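Proof. In any alignment of X and Y, consider the last positions n and m. One of three situations occurs: $x_n$ and $y_m$ are matched to each other (possibly forming a mismatch), $x_n$ is matched to a gap (i.e. it is deleted from X), or $y_m$ is matched to a gap (i.e. it is inserted into X). Indeed, if both $x_n$ and $y_m$ were matched but not to each other, say $(n, j), (i, m) \in M$ with $j < m$ and $i < n$, then these two pairs would cross, which the definition of an alignment forbids. Notice that inserting a blank into the same position for both X and Y can never yield a best alignment, so we omit this case.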

We denote by Opt(i, j) the optimal solution to the subproblem of aligning the sequences $X_i = x_1 x_2 \ldots x_i$ and $Y_j = y_1 y_2 \ldots y_j$. We denote by $\alpha_{x_i, y_j}$ the mismatch cost between $x_i$ and $y_j$ and by $\delta$ the gap cost. From the claim above, we have:

$$\mathrm{Opt}(i, j) = \min \left\{ \begin{array}{l} \alpha_{x_i, y_j} + \mathrm{Opt}(i-1, j-1), \\ \delta + \mathrm{Opt}(i-1, j), \\ \delta + \mathrm{Opt}(i, j-1) \end{array} \right\}$$


We can now present a formal dynamic programming algorithm for Sequence Alignment:

Algorithm 4: Sequence-Alignment-Algorithm

input : Two sequences X, Y over alphabet Σ
output: Minimum alignment cost

for i ← 0 to n do
    for j ← 0 to m do
        if i = 0 or j = 0 then
            Opt(i, j) = (i + j) · δ;
        else
            Opt(i, j) = min{α_{x_i, y_j} + Opt(i − 1, j − 1), δ + Opt(i − 1, j), δ + Opt(i, j − 1)};
        end
    end
end
return Opt(n, m)
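Algorithm 4 can be written out directly. In the sketch below the gap cost and the mismatch cost are passed in as parameters; the names `alignment_cost` and `unit_mismatch` and the concrete cost values are assumptions made here for illustration.

```python
def alignment_cost(x, y, gap, mismatch):
    """Minimum alignment cost of strings x and y, using the full (n+1) x (m+1) table."""
    n, m = len(x), len(y)
    opt = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 or j == 0:
                opt[i][j] = (i + j) * gap          # align a prefix against nothing
            else:
                opt[i][j] = min(
                    mismatch(x[i - 1], y[j - 1]) + opt[i - 1][j - 1],
                    gap + opt[i - 1][j],
                    gap + opt[i][j - 1],
                )
    return opt[n][m]


def unit_mismatch(a, b):
    """Hypothetical mismatch cost: 0 for equal symbols, 1 otherwise."""
    return 0 if a == b else 1


print(alignment_cost("algorthsm", "algorithm", gap=1, mismatch=unit_mismatch))
print(alignment_cost("abbba", "abaa", gap=1, mismatch=unit_mismatch))
print(alignment_cost("abbba", "abaa", gap=1, mismatch=lambda a, b: 0 if a == b else 3))
```

With unit costs the spell-check pair costs 2 (two gaps). For “abbba” and “abaa” the optimum is one gap plus one mismatch when a mismatch costs 1, but switches to the three-gap alignment once a mismatch costs 3, which shows how the relationship between the two costs changes the minimal alignment.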

It is clear that the running time and the space cost are both O(nm) (the (n + 1) × (m + 1) array Opt(i, j) accounts for the space and the nested for loops for the running time). However, we can improve Algorithm 4 by reducing the space required without increasing the asymptotic running time! We can obtain an algorithm with space O(n + m) using another technique, Divide-and-Conquer.

Remark 3.2.2. In fact, we can easily compute the value of the optimal alignment in space only O(m). When computing the row Opt(k + 1, ·), we only need the values of the previous row Opt(k, ·) (and the entries of the current row computed so far). Thus we can release the space of Opt(0, j), ..., Opt(k − 1, j) for all j = 0, ..., m. We will call this the Space-efficient-Alignment-Algorithm.
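The remark can be sketched as a two-row variant of the table computation. The function name and the cost parameters are illustrative assumptions; note that this version returns only the optimal value, not the alignment itself.

```python
def alignment_cost_two_rows(x, y, gap, mismatch):
    """Minimum alignment cost of x and y, keeping only two rows of the Opt table."""
    m = len(y)
    prev = [j * gap for j in range(m + 1)]        # row Opt(0, .)
    for i in range(1, len(x) + 1):
        curr = [i * gap] + [0] * m                # Opt(i, 0) = i * gap
        for j in range(1, m + 1):
            curr[j] = min(
                mismatch(x[i - 1], y[j - 1]) + prev[j - 1],
                gap + prev[j],
                gap + curr[j - 1],
            )
        prev = curr                               # older rows are no longer needed
    return prev[m]


def unit_mismatch(a, b):
    """Hypothetical mismatch cost: 0 for equal symbols, 1 otherwise."""
    return 0 if a == b else 1


print(alignment_cost_two_rows("algorthsm", "algorithm", gap=1, mismatch=unit_mismatch))
```

The running time is still O(nm), but only O(m) table entries are alive at any moment. The price is that the path through the table is forgotten, which is exactly the difficulty the search version of the problem has to address.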

3.3 Reduction

We will introduce the Shortest Path on Grids with Diagonals problem and then reduce the Sequence Alignment problem to it. An example of the Shortest Path on Grids with Diagonals problem is given in Figure 3.1.

We are given two sequences $X = x_1 x_2 \ldots x_n$ and $Y = y_1 y_2 \ldots y_m$. Let the i-th row of a grid (with diagonals) represent $x_i$ in X, and the j-th column of the grid represent $y_j$ in Y. Let the weight of every vertical/horizontal edge be $\delta$ and the weight of a diagonal be $\alpha_{x_i, y_j}$. The length of a shortest path from u to v on this graph is the same as the minimum alignment cost of the two sequences.

Figure 3.1 gives an example of a grid with diagonals, $G_{XY}$. $x_1$ is matched to the first column, and $x_2$ is matched to the second column. If we move horizontally from left to right on the first row starting from u, then this means $x_1$ is matched to a gap. Similarly, $y_2$ is matched to the bottom row, and $y_1$ is matched to the upper row. If we move vertically from bottom to top, this means $y_1$ is matched to a gap. When we use diagonals, some letters of the two sequences are matched together; for example, if we move along the diagonal closest to u, then $x_1$ is matched to $y_1$. No matter which route we take (only allowing moves up, right and along the north-east diagonal), we can always find a path from u to v on $G_{XY}$. Moreover, such a path always corresponds to an alignment of X and Y, and vice versa.


Figure 3.1: An example of grids with diagonals, $G_{XY}$

Claim 3.3.1. The minimum value for the Shortest Path on Grids with Diagonals is the same as the

minimum value alignment cost for Sequence Alignment.

Proof. (sketch) For any two input sequences, the feasible solutions of the Sequence Alignment problem are in one-to-one correspondence with the feasible solutions (the $u$-$v$ paths) of the Shortest Path on Grids with Diagonals problem, and corresponding solutions have the same cost.

In the Sequence-Alignment algorithm, if we know the value of the last row, we can run the algorithm in reverse. We call this algorithm the Reverse-Sequence-Alignment algorithm. Let f[i][j] store the value of the shortest path from $u$ to location $(i, j)$, and let g[i][j] store the value of the Reverse-Sequence-Alignment algorithm at location $(i, j)$. In other words, g[i][j] is the cost of an optimal alignment of the subsequences $x_i x_{i+1} \ldots x_n$ and $y_j y_{j+1} \ldots y_m$.

Claim 3.3.2. The shortest path in $G_{XY}$ that passes through $(i, j)$ has length f[i][j] + g[i][j].

Proof. Any path that passes through $(i, j)$ can be divided into two shorter paths, one from $u$ to $(i, j)$ and the other from $(i, j)$ to $v$. The shortest path from $u$ to $(i, j)$ has length f[i][j]. Since the weights along the edges of $G_{XY}$ are fixed no matter which way they are traversed, the length of the shortest path from $(i, j)$ to $v$ is equal to the length of the shortest path from $v$ to $(i, j)$, which is g[i][j]. So the minimum cost from $u$ to $v$ through $(i, j)$ is f[i][j] + g[i][j].

Claim 3.3.3. For a ﬁxed number q, let k be the number that minimizes f[q][k] +g[q][k]. Then there

exists a shortest path from u to v passing through (q, k).

Proof. Suppose there does not exist a shortest path from $u$ to $v$ passing through $(q, k)$. For a fixed number $q$, all the paths from $u$ to $v$ must pass through some point $(q, t)$, $t \in \{1, \ldots, m\}$. Thus, a shortest path $L$ must pass through some point $(q, k')$. According to our assumption, $L$ cannot pass through $(q, k)$, which means in particular that f[q][k'] + g[q][k'] < f[q][k] + g[q][k]. This contradicts the choice of $k$ as a minimizer of f[q][k] + g[q][k].


We give the ﬁnal algorithm based on the Divide and Conquer technique below:

Algorithm 5: Divide and Conquer Alignment Algorithm(X, Y)

input : Two sequences $X = x_1 x_2 \ldots x_n$, $Y = y_1 y_2 \ldots y_m$

output: The nodes on the u-v shortest path on $G_{XY}$

if n ≤ 2 or m ≤ 2 then

    exhaustively compute the optimal and quit;

end

Call Sequence-Alignment algorithm(X, Y[1, m/2])

Call Reverse-Sequence-Alignment algorithm(X, Y[m/2 + 1, m])

Find q minimizing f[q][m/2] + g[q][m/2]

Add [q, m/2] to the output.

Call Divide-and-Conquer-Algorithm(X[1, q], Y[1, m/2])

Call Divide-and-Conquer-Algorithm(X[q, n], Y[m/2 + 1, m])

The space required in Algorithm 5 is $O(n + m)$. This is because we divide $G_{XY}$ along its center column and compute the values f[i][m/2] and g[i][m/2] for each value of i. Since we apply the recursive calls sequentially and reuse the working space from one call to the next, we end up with space $O(n + m)$.

We claim the running time of the algorithm is $O(nm)$. Let size = nm; we find that f(size) = O(size) + f(size/2). This is because the first calls to Sequence-Alignment and Reverse-Sequence-Alignment each need O(size) time, and the latter two recursive calls together need f(size/2) time. Solving the recurrence, we get a running time of $O(nm)$.
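The recurrence can be solved by a short induction. The following derivation is a sketch under two assumptions that are not spelled out in the notes: $c$ denotes the constant hidden in the $O(\text{size})$ term, and the row $q$ shared by the two recursive calls is ignored. Writing $T(n, m)$ for the running time on an $n \times m$ grid, the two recursive calls work on grids of sizes $q \cdot m/2$ and $(n - q) \cdot m/2$, so

$$T(n, m) \le c\,nm + T(q, m/2) + T(n - q, m/2).$$

Assuming inductively that $T(n', m') \le 2c\,n'm'$ for all smaller grids, we get

$$T(n, m) \le c\,nm + 2c \cdot q \cdot \frac{m}{2} + 2c \cdot (n - q) \cdot \frac{m}{2} = c\,nm + c\,nm = 2c\,nm = O(nm).$$

Equivalently, the work per level of the recursion halves each time: $c\,nm \left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots\right) \le 2c\,nm$.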


Chapter 4

Matchings and Flows

In this chapter, we discuss matchings and flows from a combinatorial perspective. In a subsequent lecture we will discuss the same issues using Linear Programming. We introduce the problem of maximum flow and algorithms for solving it, including the Ford-Fulkerson method and one polynomial time instantiation of it, the Edmonds-Karp algorithm. We prove the maximum flow - minimum cut theorem, and we show the polynomial time bound for Edmonds-Karp. Then, we apply network flows to solve the maximum bipartite matching problem. Finally, we apply non-bipartite matchings to give a 2-approximation algorithm for Vertex Cover.

4.1 Introduction

Matchings and flows are basic concepts in graph theory, and they have many applications in practice. In a graph, a matching is a set of edges no two of which share a common end node. A network flow in a directed weighted graph is like a distribution of workloads that respects the capacity of each edge.

In this lecture, we are going to talk about maximum bipartite matchings. Formally, we have the following definitions:

Definition 4.1.1 (Bipartite Matching). For a graph $G = (X, Y, E)$, we say that $M \subseteq E$ is a (bipartite) matching if every vertex $u \in (X \cup Y)$ appears at most once in $M$.

Now that we have the formal definition of a matching, we can introduce the main problem of this lecture and the next one: the Maximum Matching Problem.

Definition 4.1.2 (Maximum Matching). In a bipartite graph $G = (X, Y, E)$, let $M \subseteq E$ be a matching. Then $M$ is a maximum matching if for every matching $M'$ in $G$, $|M'| \le |M|$.

Definition 4.1.3 (Maximal Matching). In a bipartite graph $G = (X, Y, E)$, let $M \subseteq E$ be a matching. Then $M$ is a maximal matching if there does not exist a matching $M' \supset M$.

Remark 4.1.4. Notice that we use the term “maximum” and not “maximal”. These two are different. A “maximal” matching is a matching to which we cannot add any more edges and still have a matching, while a “maximum” matching is a maximal matching of the largest size. An example is given in Figure 4.1.
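The distinction can also be checked by brute force on a tiny graph. The sketch below uses a hypothetical three-edge path (not the graph of Figure 4.1) and illustrative helper names; the exhaustive search is only meant for very small inputs.

```python
from itertools import combinations


def is_matching(edges):
    """True if no vertex appears in more than one of the given edges."""
    seen = set()
    for u, v in edges:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


def is_maximal(matching, all_edges):
    """True if `matching` is a matching and no further edge can be added."""
    return is_matching(matching) and all(
        e in matching or not is_matching(list(matching) + [e])
        for e in all_edges
    )


def maximum_matching_size(all_edges):
    """Size of a maximum matching, found by exhaustive search."""
    for size in range(len(all_edges), -1, -1):
        if any(is_matching(sub) for sub in combinations(all_edges, size)):
            return size


# Path a - b - c - d, seen as a bipartite graph with X = {a, c}, Y = {b, d}.
E = [("a", "b"), ("c", "b"), ("c", "d")]
```

Here the single middle edge `[("c", "b")]` is a maximal matching of size 1, while the maximum matching `[("a", "b"), ("c", "d")]` has size 2.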


Figure 4.1: This ﬁgure shows the difference between a maximum matching and a maximal matching.

Here we deﬁne the Maximum Bipartite Matching problem.

Deﬁnition 4.1.5. Maximum Bipartite Matching

• Input: A bipartite graph G = (X, Y, E).

• Output: A maximum matching M ⊆ E.

The maximum bipartite matching problem is fundamental in graph theory and theoretical computer science. It can be solved in polynomial time. BUT notice that there exists NO dynamic programming algorithm that can solve this problem. In order to solve this problem, we introduce another problem called the Maximum Flow Problem. Later we'll see how we can solve maximum matching using maximum flow.

4.2 Maximum Flow Problem

What is a FLOW? At first impression, the word "flow" reminds us of water flow. Indeed, a network flow is just like pushing water from the source to the sink, subject to the capacity of each pipe (here, each edge in a graph). The maximum flow problem asks us to push as much water through as we can. Formally, the definition of a network flow is as follows:

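Definition 4.2.1 (Network Flow). Given $(G, s, t, c)$, where $G$ is a graph $(V, E)$, $s, t \in V$, and $c$ is a function which maps each edge to a value (its capacity), a flow $f : E \to \mathbb{R}$ is a function where:

1. $0 \le f(e) \le c(e)$, $\forall e \in E$

2. $\sum_{e = (u, v)} f(e) = \sum_{e' = (v, w)} f(e')$, $\forall v \in V \setminus \{s, t\}$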

We define the value of a flow $v(f)$ as $v(f) = \sum_{e = (s, u)} f(e)$, that is, the sum of the flow on all the edges leading out of the source $s$. The maximum flow problem is defined as follows:

Deﬁnition 4.2.2. Maximum Flow Problem

• Input: (G,s,t,c).

• Output: a ﬂow in the graph with maximum value.

[Figure 4.2 diagram: a network on the nodes s, u, t with edge labels 3/2 and 2/2, and its residual network with edge labels 1, 2 and 2.]

Figure 4.2: the transform from a network to its residual network.

In order to solve the maximum flow problem, we introduce another concept from graph theory, the residual graph:

Definition 4.2.3 (Residual Graph). Let $N = (G, s, t, c)$ be a network and let $f$ be a flow for $N$. Then the residual network $G_f = (V_f, E_f)$ is a graph where:

1. $V_f = V$

2. $\forall e \in E$, if $c(e) > f(e)$, then $e$ is in $E_f$ and $c_f(e) = c(e) - f(e)$. (Here $e$ is called a forward edge.)

3. $\forall e = (u, v) \in E$, if $f(e) > 0$, then $e' = (v, u)$ is in $E_f$ and $c_f(e') = f(e)$. (Here $e'$ is called a backward edge.)

Figure 4.2 shows an example of the transformation from a network to its residual network.
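The three rules of the residual graph definition translate almost literally into code. The sketch below is illustrative: the dictionary representation and the name `residual_graph` are assumptions, and the example numbers are one reading of the labels of Figure 4.2 (capacity 3 with flow 2 on the edge from s to u, capacity 2 with flow 2 on the edge from u to t).

```python
def residual_graph(capacity, flow):
    """Return the residual capacities c_f as a dict keyed by directed edge."""
    residual = {}
    for (u, v), c in capacity.items():
        f = flow.get((u, v), 0)
        if c > f:
            # Forward edge: leftover capacity c(e) - f(e).
            residual[(u, v)] = residual.get((u, v), 0) + (c - f)
        if f > 0:
            # Backward edge: the flow f(e) that could be cancelled.
            residual[(v, u)] = residual.get((v, u), 0) + f
    return residual


# Assumed reading of Figure 4.2: edge -> capacity, edge -> flow.
capacity = {("s", "u"): 3, ("u", "t"): 2}
flow = {("s", "u"): 2, ("u", "t"): 2}
```

For this input the residual graph has a forward edge (s, u) of capacity 1 and backward edges (u, s) and (t, u) of capacity 2 each; the saturated edge (u, t) has no forward copy.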

4.3 Max-Flow Min-Cut Theorem

In this part we talk about the relation between CUT and FLOW. First we introduce the definition of a CUT.

Definition 4.3.1 (CUT). Let $G = (V, E)$. If $V = V_1 \cup V_2$ and $V_1 \cap V_2 = \emptyset$, then we call $(V_1, V_2)$ a cut of the graph $G$.

In a graph $G = (V, E)$, every cut $(S, T)$ such that $s \in S$ and $t \in T$ is called an s-t cut.

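Claim 4.3.2. Let $f$ be any flow in $G$. For every s-t cut $(S, T)$, $v(f) = f^{out}(S) - f^{in}(S)$, where $f^{out}(S)$ and $f^{in}(S)$ denote the total flow on the edges leaving $S$ and entering $S$, respectively.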

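Proof. By definition, $f^{in}(s) = 0$, so we have $v(f) = \sum_{e = (s, u)} f(e) = f^{out}(s) - f^{in}(s)$. Since every node $v$ in $S$ other than $s$ is internal, we have $f^{out}(v) - f^{in}(v) = 0$. Thus, $v(f) = f^{out}(s) - f^{in}(s) = \sum_{v \in S} (f^{out}(v) - f^{in}(v)) = f^{out}(S) - f^{in}(S)$, where the last equality holds because the flow on an edge with both ends in $S$ is counted once with each sign and cancels.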

Then we have a Max-Flow Min-Cut theorem:

Theorem 4.3.3 (Max-Flow Min-Cut). If $f$ is an s-t flow such that there is no s-t path in the residual graph $G_f$, then there is an s-t cut $(A^*, B^*)$ in $G$ for which $v(f) = c(A^*, B^*)$. Consequently, $f$ has the maximum value of any flow in $G$, and $(A^*, B^*)$ has the minimum capacity of any s-t cut in $G$.


Figure 4.3: the construction of $A^*$ in a graph

Proof. Let $A^*$ denote the set of all nodes $v$ in $G$ for which there is an $s$-$v$ path in $G_f$. Let $B^*$ denote the set of all other nodes: $B^* = V - A^*$. First, we establish that $(A^*, B^*)$ is indeed an s-t cut. The source $s$ belongs to $A^*$ since there is always a path from $s$ to $s$. Moreover, $t$ doesn't belong to $A^*$ by the assumption that there is no s-t path in the residual graph, so $t \in B^*$.

As shown in Figure 4.3, suppose that $e = (u, v)$ is an edge in $G$ for which $u \in A^*$ and $v \in B^*$. We claim that $f(e) = c(e)$. If not, $e$ would be a forward edge in the residual graph $G_f$, so we would obtain an $s$-$v$ path in $G_f$, contradicting our assumption that $v \in B^*$. Now suppose that $e' = (u', v')$ is an edge in $G$ for which $u' \in B^*$ and $v' \in A^*$. We claim that $f(e') = 0$. If not, $e'$ would give rise to a backward edge $(v', u')$ in the residual graph $G_f$, so we would obtain an $s$-$u'$ path in $G_f$, contradicting our assumption that $u' \in B^*$. So we reach the conclusion:

$$\begin{aligned}
v(f) &= f^{out}(A^*) - f^{in}(A^*) \\
&= \sum_{e = (u, v),\, u \in A^*,\, v \in B^*} f(e) \; - \sum_{e = (u, v),\, u \in B^*,\, v \in A^*} f(e) \\
&= \sum_{e = (u, v),\, u \in A^*,\, v \in B^*} c(e) \; - \; 0 \\
&= c(A^*, B^*)
\end{aligned}$$

4.4 Ford-Fulkerson Algorithm

From the Max-flow Min-cut Theorem, we know that if the residual graph contains a path from s to t, then we can increase the flow by the minimum capacity of the edges on this path, so we must not have the maximum flow. Otherwise, we can define a cut (S, T) whose capacity is the same as the value of the flow f, such that every edge from S to T is saturated and every edge from T to S is empty, which implies that f is a maximum flow and (S, T) is a minimum cut. Based on the theorem, we obtain the Ford-Fulkerson Algorithm: starting with the zero flow, repeatedly augment the flow along any s-t path in the residual graph, until there is no such path.

Algorithm 6: Ford-Fulkerson Algorithm

input : A network (G, s, t, c)

output: a flow in the graph with maximum value

$\forall e$, $f(e) \leftarrow 0$

while $\exists$ an s-t path $p$ in $G_f$ do

    $f' \leftarrow$ augment$(f, p)$

    $f \leftarrow f'$

    $G_f \leftarrow G_{f'}$

end

return f

Notice the call augment(f, p) in the algorithm. This is the procedure that updates the flow: it takes an s-t path p in the residual graph (chosen arbitrarily) and pushes along it an amount of flow equal to the minimum residual capacity on this path.

Also there are several things we need to know about the Ford-Fulkerson Algorithm:

1. The Ford-Fulkerson Algorithm does not always terminate when the capacities are real numbers. Consider the flow network shown in Figure 4.4, with source $s$, sink $t$, capacities of edges $e_1$, $e_2$ and $e_3$ respectively $1$, $r = \frac{\sqrt{5} - 1}{2}$ and $1$, and the capacity of all other edges some integer $M \ge 2$. Here the constant $r$ was chosen so that $r^2 = 1 - r$. We use augmenting paths according to the table, where $p_1 = \{s, v_4, v_3, v_2, v_1, t\}$, $p_2 = \{s, v_2, v_3, v_4, t\}$, $p_3 = \{s, v_1, v_2, v_3, t\}$. Note that after step 1 as well as after step 5, the residual capacities of edges $e_1$, $e_2$ and $e_3$ are of the form $r^n$, $r^{n+1}$ and $0$, respectively, for some $n \in \mathbb{N}$. This means that we can use the augmenting paths $p_1$, $p_2$, $p_1$ and $p_3$ infinitely many times and the residual capacities of these edges will always be of the same form. The total flow in the network after step 5 is $1 + 2(r^1 + r^2)$. If we continue to use augmenting paths as above, the total flow converges to $1 + 2\sum_{i=1}^{\infty} r^i = 3 + 2r$, while the maximum flow is $2M + 1$. In this case, the algorithm never terminates and the flow doesn't even converge to the maximum flow.

2. If the function c maps every edge to an integer, then Ford-Fulkerson (FF) terminates and outputs an integer-valued flow.
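The limit stated in item 1 can be verified directly from the defining property $r^2 = 1 - r$. The geometric series gives

$$\sum_{i=1}^{\infty} r^i = \frac{r}{1 - r} = \frac{r}{r^2} = \frac{1}{r},$$

and since $r^2 + r = 1$ means $r(r + 1) = 1$, we have $\frac{1}{r} = 1 + r$. Therefore

$$1 + 2\sum_{i=1}^{\infty} r^i = 1 + 2(1 + r) = 3 + 2r \approx 4.236,$$

which is strictly below the maximum flow value $2M + 1 \ge 5$ for every integer $M \ge 2$.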

This algorithm seems quite natural, but in fact the worst-case running time of the Ford-Fulkerson Algorithm is exponential. A famous example on which the Ford-Fulkerson Algorithm runs in exponential time is given in Figure 4.5.

In this example, if the algorithm always chooses an s-t path that contains the edge $u \to v$ or $v \to u$ (this is possible; recall the existence of the backward edge), then each time we can only push 1 unit of flow. The maximum flow for the network is $2^{n+1}$, so we need $2^{n+1}$ augmentations. Thus in this case the running time is exponential in the input length (which is $n$).

4.4.1 Edmonds-Karp Algorithm

Now we introduce a faster algorithm called the Edmonds-Karp algorithm, which was published by Jack Edmonds and Richard Karp in 1972 and solves the maximum flow problem in $O(m^2 n)$ running time (with $m = |E|$ and $n = |V|$). This algorithm differs from the Ford-Fulkerson algorithm in only one rule: the Ford-Fulkerson algorithm chooses the augmenting path arbitrarily, whereas the Edmonds-Karp algorithm chooses an augmenting path of the shortest traversal length, i.e. with the fewest edges.

Figure 4.4: An example where the Ford-Fulkerson algorithm never terminates

[Figure 4.5 diagram: a network on the nodes s, u, v, t with edge capacities $2^n$ and 1.]

Figure 4.5: An example where Ford-Fulkerson terminates in exponential time

This rule may seem surprising: what does the traversal length have to do with the flow that we can push through the path? A deeper analysis shows that the two are indeed related.

Theorem 4.4.1. E-K terminates in $O(|E|^2 |V|)$ time.

Proof.

Lemma 4.4.2. If we use the Edmonds-Karp algorithm on the flow network $(G = (V, E), s, t, c)$, then for any $v \in V - \{s, t\}$ the shortest-path distance $\delta_f(s, v)$ in the residual graph $G_f$ increases monotonically (it never decreases) with each flow augmentation.

Proof. We will suppose that for some vertex $v \in V - \{s, t\}$ there is a flow augmentation that causes the shortest-path distance from $s$ to $v$ to decrease, and then we will derive a contradiction. Let $f$ be the flow just before the first augmentation that decreases some shortest-path distance, and let $f'$ be the flow just afterward. Let $v$ be the vertex with the minimum $\delta_{f'}(s, v)$ whose distance was decreased by the augmentation, so that $\delta_{f'}(s, v) < \delta_f(s, v)$. Let $p = s \to \ldots \to u \to v$ be a shortest path from $s$ to $v$ in $G_{f'}$, so that $(u, v) \in E_{f'}$ and $\delta_{f'}(s, u) = \delta_{f'}(s, v) - 1$. By the choice of $v$, the distance of $u$ did not decrease: $\delta_{f'}(s, u) \ge \delta_f(s, u)$.

Then $(u, v)$ does not belong to $E_f$: otherwise $\delta_f(s, v) \le \delta_f(s, u) + 1 \le \delta_{f'}(s, u) + 1 = \delta_{f'}(s, v)$, contradicting $\delta_{f'}(s, v) < \delta_f(s, v)$. As $(u, v) \in E_{f'}$ but $(u, v) \notin E_f$, the augmentation must have increased the flow from $v$ to $u$, and since augmenting paths are shortest paths, the shortest path from $s$ to $u$ in $G_f$ has $(v, u)$ as its last edge. Then

$$\delta_f(s, v) = \delta_f(s, u) - 1 \le \delta_{f'}(s, u) - 1 = \delta_{f'}(s, v) - 2.$$

This contradicts our assumption that $\delta_{f'}(s, v) < \delta_f(s, v)$.

Theorem 4.4.3. If the Edmonds-Karp algorithm runs on a ﬂow network G = (V, E) with source s

and sink t, then the total number of ﬂow augmentations performed by the algorithm is O(V E).

Proof. We say that an edge $(u, v)$ in a residual network $G_f$ is critical on an augmenting path $p$ if the residual capacity of $p$ is the residual capacity of $(u, v)$, that is, if $c_f(p) = c_f(u, v)$. After we have augmented flow along an augmenting path, any critical edge on the path disappears from the residual network. Moreover, at least one edge on any augmenting path must be critical. We will show that each of the $|E|$ edges can become critical at most $\frac{|V|}{2} - 1$ times.

Let $u$ and $v$ be vertices in $V$ that are connected by an edge in $E$. Since augmenting paths are shortest paths, when $(u, v)$ is critical for the first time, we have $\delta_f(s, v) = \delta_f(s, u) + 1$. Once the flow is augmented, the edge $(u, v)$ disappears from the residual network. It cannot reappear later on another augmenting path until after the flow from $u$ to $v$ is decreased, which occurs only if $(v, u)$ appears on an augmenting path. If $f'$ is the flow in $G$ when this event occurs, then we have $\delta_{f'}(s, u) = \delta_{f'}(s, v) + 1$. As

$$\delta_f(s, v) \le \delta_{f'}(s, v)$$

we have

$$\delta_{f'}(s, u) = \delta_{f'}(s, v) + 1 \ge \delta_f(s, v) + 1 = \delta_f(s, u) + 2.$$

Consequently, from the time $(u, v)$ becomes critical to the time when it next becomes critical, the distance of $u$ from the source increases by at least 2. The distance of $u$ from the source is initially at least 0. The intermediate vertices on a shortest path from $s$ to $u$ can't contain $s$, $u$, or $t$. Therefore, until $u$ becomes unreachable from the source, if ever, its distance is at most $|V| - 2$. Thus, $(u, v)$ can become critical at most $\frac{|V| - 2}{2} = \frac{|V|}{2} - 1$ times. Since there are $O(E)$ pairs of vertices that can have an edge between them in a residual graph, the total number of critical edges during the entire execution of the Edmonds-Karp algorithm is $O(VE)$. Each augmenting path has at least one critical edge, and hence the theorem follows.

Figure 4.6: constructing a new network in order to run Ford-Fulkerson on it

Since each iteration of Ford-Fulkerson can be implemented in $O(E)$ time when the augmenting path is found by breadth-first search, the total running time of the Edmonds-Karp algorithm is $O(|E|^2 |V|)$.
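To make the shortest-path rule concrete, here is a minimal Python sketch of the Edmonds-Karp variant of Ford-Fulkerson. It is an illustration under stated assumptions rather than a reference implementation: the network is given as a dictionary from directed edges to capacities, `s` and `t` are distinct, and the function name `edmonds_karp` is our own choice. It returns only the value of a maximum flow.

```python
from collections import deque


def edmonds_karp(capacity, s, t):
    """Maximum flow value; `capacity` maps directed edges (u, v) to capacities."""
    res = {}  # residual capacities, with a reverse entry for every edge
    adj = {}  # undirected adjacency, so backward edges can be traversed
    for (u, v), c in capacity.items():
        res[(u, v)] = res.get((u, v), 0) + c
        res.setdefault((v, u), 0)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    value = 0
    while True:
        # Breadth-first search finds an s-t path with the fewest edges.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return value  # no augmenting path is left

        # Recover the path and its bottleneck (the critical edge capacity).
        path = []
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(res[e] for e in path)

        for (u, v) in path:
            res[(u, v)] -= bottleneck  # use up forward residual capacity
            res[(v, u)] += bottleneck  # allow this flow to be undone later
        value += bottleneck
```

On a network shaped like the exponential example above, with capacities $2^{10}$ on the four outer edges and 1 on the edge between u and v, breadth-first search picks the two 2-edge paths and finishes after two augmentations instead of $2^{11}$.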

4.5 Bipartite Matching via Network Flows

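In this section we'll solve the maximum matching problem using the network flow problem. In fact this is rather straightforward. Let $G = (X, Y, E)$ be a bipartite graph. We add two nodes $s$, $t$ to the graph, connect $s$ to all nodes in $X$, and connect all nodes in $Y$ to $t$, as in Figure 4.6; every edge is directed from $s$ towards $t$ and is given capacity 1. Running Ford-Fulkerson on the new network $N$, we get an integer-valued maximum flow $f$. Define $M = \{e \in E \mid f(e) = 1\}$; then $M$ is a maximum matching. The proof is simple: we just need to build a one-to-one correspondence between matchings and integer-valued flows. And this is easy: given a matching, add the two nodes $s$ and $t$ as above and send one unit of flow along $s \to x \to y \to t$ for every matched edge $(x, y)$; we get a flow whose value is the size of the matching. Conversely, for an integer-valued flow $f$, define $M = \{e \in E \mid f(e) = 1\}$; the unit capacities on the edges leaving $s$ and entering $t$ guarantee that every vertex appears at most once in $M$, so we get a matching of size $v(f)$. Thus the matching we find using Ford-Fulkerson is a maximum matching.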
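On the unit-capacity network above, each Ford-Fulkerson augmentation is an alternating path that starts at an unmatched vertex of X. The following sketch implements that specialization directly, without building the network explicitly; the adjacency-dictionary input and the names `max_bipartite_matching` and `augment` are illustrative assumptions.

```python
def max_bipartite_matching(X, adj):
    """adj[x] lists the neighbours in Y of x; returns a dict y -> matched x."""
    match_of = {}  # current partner in X of each matched vertex of Y

    def augment(x, visited):
        # Search for an augmenting path starting at x (an s-t path in G_f).
        for y in adj.get(x, ()):
            if y in visited:
                continue
            visited.add(y)
            # Either y is free, or its partner can be re-matched elsewhere
            # (this re-matching corresponds to using a backward edge).
            if y not in match_of or augment(match_of[y], visited):
                match_of[y] = x
                return True
        return False

    for x in X:
        augment(x, set())
    return match_of
```

For example, with `adj = {1: ["a", "b"], 2: ["a"], 3: ["b", "c"]}` the second vertex forces the first one to switch from "a" to "b", and all three vertices end up matched.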

4.6 Application of Maximum Matchings: Approximating Vertex Cover

We introduce a new problem: vertex cover:

Deﬁnition 4.6.1. Vertex Cover

• Input: G = (V, E)

• Output: a cover of minimum size.


Here is the deﬁnition for COVER:

Definition 4.6.2. Cover: Let $G = (V, E)$. A set $U \subseteq V$ is a cover if $\forall e \in E$, $\exists u \in U$ which is an end node of $e$.

Despite a long line of research, the exact approximability of this problem remains open.

1. Negative: VC cannot be approximated within a factor smaller than 1.363, unless P = NP.

2. Positive: We have a polynomial time algorithm for VC with approximation ratio $2 - \Omega\left(\frac{1}{\sqrt{\log n}}\right)$.

3. Belief: VC cannot be approximated within $2 - \epsilon$, $\forall \epsilon > 0$. Notice: in fact, under something which is called the Unique Games Conjecture this is true.

Here is a simple algorithm that approximates VC with ratio 2:

1. compute the maximum matching M in the graph.

2. output the vertices of M.

Proof. Observe that $|M| \le OPT$, where $M$ is the max matching: the edges of $M$ share no end nodes, so any cover must contain a distinct vertex for each of them. Also notice that the vertices of $M$ cover $G$. Suppose that there is an edge $(u, v)$ which is not covered, i.e. neither $u$ nor $v$ is an end node of an edge in $M$. Then $M \cup \{(u, v)\}$ is a larger matching, which contradicts that $M$ is a max matching. Finally, the algorithm outputs $2|M|$ nodes, and $2|M| \le 2\,OPT$.
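The argument only uses that no edge can be added to M, so the sketch below swaps in a greedily built maximal matching rather than a maximum one; this substitution is ours, not the notes', and it keeps the same factor-2 guarantee by the same proof. The function name `vertex_cover_2approx` is illustrative.

```python
def vertex_cover_2approx(edges):
    """Return the end nodes of a greedily built maximal matching."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            # (u, v) shares no end node with the matching so far:
            # add it to the matching and take both of its end nodes.
            cover.update((u, v))
    return cover
```

On the path a - b - c - d this returns all four vertices while the optimum {b, c} has size 2, so the factor 2 can actually be attained.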


Chapter 5

Optimization Problems, Online and

Approximation Algorithms

In this chapter, we begin by introducing the deﬁnition of an optimization problem followed by

the deﬁnition of an approximation algorithm. We then present the Makespan problem as a simple

example of when an approximation algorithm is useful. Because the Makespan problem is NP-

hard, there does not exist a polynomial algorithm to solve it unless P = NP. If we can not

ﬁnd an exact answer to an arbitrary instance of Makespan in polynomial time, then perhaps we

can give an approximate solution. To this end, we formulate an approximation algorithm for the

Makespan problem: Greedy-Makespan. We go on to improve this algorithm to one with a lower

approximation ratio (i.e. an algorithm that is guaranteed to give an even closer approximation to

an optimal answer).

5.1 Optimization Problems

Deﬁnition 5.1.1. An optimization problem is ﬁnding an optimal (best) solution from a set of fea-

sible solutions.

Formally, an optimization problem Π consists of:

1. A set of inputs, or instances, $\mathcal{I}$.

2. A set $f(I)$ of feasible solutions, for each given instance $I \in \mathcal{I}$.

3. An optimization objective function $c : T \to \mathbb{R}$, where $T = \bigcup_{I \in \mathcal{I}} f(I)$. That is, given an instance $I$ and a feasible solution $y$, the objective function assigns to $y$ a “measure” that is a real number.

The goal of an optimization problem $\Pi$ is to find either a minimum or a maximum: the shortest or longest path on a graph, for example. For maximization problems, the goal of problem $\Pi$ is to find, for every instance $I$, an output $O \in f(I)$ such that $c(O) = \max_{O' \in f(I)} c(O')$.

For example, in the Unweighted Interval Scheduling Problem, we have 4 intervals, as shown

in the following Figure 5.1


Here the feasible solutions are $\{I_1, I_2\}$, $\{I_1\}$, $\{I_2\}$, $\{I_3\}$, $\{I_4\}$, and $\{I_1, I_2\}$ is optimal, as it maximizes the number of intervals that can be scheduled.

Notice: In the following section, we will only discuss maximization problems for brevity;

minimization problems are in no way fundamentally different.

5.2 Approximation Algorithms

Sometimes we cannot compute the optimal solution for a problem, for example because no efficient algorithm to produce it is known. In these cases, it might be useful to find an answer that is “close” to optimal. Approximation algorithms do exactly this, and we now define them formally.

Definition 5.2.1 (Approximation Algorithm). Let $\Pi$ be an optimization problem and $A$ be a polynomial time algorithm for $\Pi$. For all instances $I$, denote by $OPT(I)$ the value of the optimal solution, and by $A(I)$ the value of the solution computed by $A$ for that instance. We say that $A$ is a c-approximation algorithm for $\Pi$ if

$\forall I, \; OPT(I) \le c \cdot A(I)$

(for a minimization problem such as Makespan below, the condition reads $A(I) \le c \cdot OPT(I)$).

5.3 Makespan Problem

In this section, we will discuss the Makespan problem followed by an approximation algorithm

which solves it. Makespan is a basic problem in scheduling theory: how to schedule a set of jobs

on a set of machines.

Definition 5.3.1 (Makespan Problem). We are given $n$ jobs with different processing times, $t_1, t_2, \ldots, t_n$. The goal is to design an assignment of the jobs to $m$ separate but identical machines such that the last finishing time of the given jobs (also called the makespan) is minimized.

We first introduce some basic notation to analyze the Makespan Problem. $m \in \mathbb{N}$ and $t_1, t_2, \ldots, t_n$ are given as input. Here, $m$ is the number of machines and $t_1, t_2, \ldots, t_n$ are the processing lengths of the jobs. Figure 5.3 gives an example input. We define $A_i$ to be the set of jobs assigned to machine $i$ and $T_i$ to be the total processing time for machine $i$: $T_i = \sum_{t \in A_i} t$. The goal of the problem is to find a scheduling of the jobs that minimizes $\max_i T_i$.

Suppose we are given as input: $m = 2$, $\langle t_1, t_2, t_3, t_4, t_5 \rangle = \langle 1, 1, 1, 1, 4 \rangle$.

Of these two possible solutions, the left one is better, in fact it is optimal.

Fact 5.3.2. MAKESPAN is NP-hard.


Thus, a polynomial time algorithm to solve an arbitrary instance of Makespan does not exist

unless P = NP. However, we can ﬁnd an approximation algorithm that gives a good, albeit pos-

sibly suboptimal, solution that can run in polynomial time.

5.4 Approximation Algorithm for Makespan Problem

A greedy approach is often one of the first ideas to try, as it is so intuitive. The basic idea is as follows: we schedule the jobs from 1 to $n$, assigning the $i$-th job to the machine with the shortest total processing time so far.

Algorithm 7: Greedy Makespan

input : $m \in \mathbb{N}$: number of machines and $t_1, t_2, \ldots, t_n$: the processing lengths of jobs

output: T: the last finishing time for the scheduling

A[k] ← ∅ and T[k] ← 0 for k = 1, 2, ..., m

for i ← 1 to n do

    choose k such that T[k] is the smallest among T[1], T[2], ..., T[m]

    A[k] = A[k] ∪ {i}

    T[k] = T[k] + t[i]

end

T = max_{k ∈ [m]} {T[k]}

return T
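A runnable version of the greedy rule might look as follows. This is a sketch: it uses a binary heap to find the least loaded machine, which is an implementation choice of ours, and it assumes $m \ge 1$; the function name `greedy_makespan` is illustrative.

```python
import heapq


def greedy_makespan(m, times):
    """Assign jobs in the given order; return (makespan, assignment)."""
    loads = [(0, k) for k in range(m)]   # pairs (current load T[k], machine k)
    heapq.heapify(loads)
    assignment = [[] for _ in range(m)]  # A[k]: indices of the jobs on machine k
    for i, t in enumerate(times):
        load, k = heapq.heappop(loads)   # machine with the smallest load so far
        assignment[k].append(i)
        heapq.heappush(loads, (load + t, k))
    return max(load for load, _ in loads), assignment
```

On the input $m = 2$, $\langle 1, 1, 1, 1, 4 \rangle$ from Section 5.3 the greedy schedule has makespan 6, whereas placing the job of length 4 alone on one machine gives the optimal makespan 4.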

Theorem 5.4.1. Greedy Makespan is a 2-approximation algorithm for Makespan

Proof. Our proof first shows by example that the approximation ratio of Greedy Makespan can be arbitrarily close to 2 (there are inputs on which the solution is almost 2 times the optimal). Then we show that 2 is an upper bound on the ratio (the solution is never worse than 2 times the optimal). In this way we show that Greedy Makespan is indeed a 2-approximation for Makespan and that this analysis cannot be improved.

Here is an example showing that the approximation ratio of the Greedy Makespan Algorithm approaches 2: consider the case in which there are $m$ machines and $n$ jobs, where $n = m(m - 1) + 1$. Among the $n$ jobs, $m(m - 1)$ have processing length 1 and the last one has processing time $m$. Greedy-Makespan will schedule the jobs as shown in Figures 5.1 and 5.2. The optimal solution is shown in Figure 5.3.


Figure 5.1: Before placing the last job.

Figure 5.2: After placing the last job

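Greedy Makespan produces a schedule with makespan $(m - 1) + m = 2m - 1$, while the optimal makespan is $m$, so on this input the approximation ratio is $\frac{2m - 1}{m} = 2 - \frac{1}{m}$. Thus, by example, we have shown that the approximation ratio of Greedy Makespan is no better than $2 - \frac{1}{m}$, which tends to 2 as $m$ grows.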

We now show that the output of the greedy algorithm is at most twice that of the optimal solution, i.e. the ratio has an upper bound of 2. First we state two simple facts:

1. $\frac{1}{m} \sum_i t_i \le OPT(I)$.

2. $t_{max} \le OPT(I)$, where $t_{max}$ is the processing time of the longest job.

Let $T'_k = T_k - t_j$, where $t_j$ is the last job assigned to machine $k$. Since machine $k$ had the smallest load when job $j$ was assigned to it, $T'_k$ is at most the average processing time. From property (1), we have $T'_k \le \frac{1}{m} \sum_i t_i \le OPT$. From property (2) we have $T'_k = T_k - t_j \ge T_k - OPT$. Combining these facts, we get $T_k - OPT \le OPT$, so $T_k \le 2\,OPT$.

Now we have a 2-approximation algorithm for the Makespan Problem. But can we do better? The answer is yes. Next we will improve this bound with Improved Greedy Makespan.

The only difference between Improved Greedy Makespan and Greedy Makespan is that the improved algorithm first sorts the $t_i$ by non-increasing length before scheduling.
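The sorting step is a one-line change. The sketch below is self-contained and deliberately simple (a linear scan for the least loaded machine, assuming $m \ge 1$); the function name `improved_greedy_makespan` is illustrative.

```python
def improved_greedy_makespan(m, times):
    """Greedy Makespan applied to the jobs sorted by non-increasing length."""
    loads = [0] * m
    for t in sorted(times, reverse=True):  # longest jobs first
        k = loads.index(min(loads))        # machine with the smallest load
        loads[k] += t
    return max(loads)
```

On the input $m = 2$, $\langle 1, 1, 1, 1, 4 \rangle$ the long job is now placed first and the resulting makespan is 4, which is optimal for this instance.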

Theorem 5.4.2. Improved Greedy Makespan is a $\frac{3}{2}$-approximation algorithm for Makespan.

Figure 5.3: Optimal solution


Proof. Since the processing times are sorted in non-increasing order at the beginning of the algorithm, we have $t_1 \ge t_2 \ge \ldots \ge t_n$.

In this proof we will make use of the following claim:

Claim 5.4.3. When $n \le m$, Improved Greedy Makespan will obtain an optimal solution. When $n > m$, we have $OPT \ge 2 t_{m+1}$.

Proof. When $n \le m$, every job is assigned to its own machine, thus the output of the algorithm is equal to the finishing time of the largest job. This is obviously optimal.

When $n > m$, then by the pigeonhole principle there must be two jobs $t_i$, $t_j$ with $i, j \le m + 1$ that, in the optimal solution, are assigned to the same machine, and $t_i, t_j \ge t_{m+1}$. We conclude that $OPT \ge 2 t_{m+1}$.

When we define $T'_k = T_k - t_j$, with $t_j$ the last job assigned to machine $k$, we have:

1. If $t_j$ is such that $j \le m$, then there is only one job scheduled on machine $k$. This means that $T_k \le t_1 \le OPT$.

2. If $t_j$ is such that $j > m$, then machine $k$ must have been the one with the shortest total length before scheduling the $j$-th job. This implies $T'_k \le \frac{1}{m} \sum_{i=1}^{j-1} t_i \le \frac{1}{m} \sum_{i=1}^{n} t_i \le OPT$. Furthermore, since $t_j \le t_{m+1} \le \frac{1}{2} OPT$ by Claim 5.4.3, we have $T'_k = T_k - t_j \ge T_k - \frac{1}{2} OPT$. Thus $T_k - \frac{1}{2} OPT \le OPT$, which implies $T_k \le \frac{3}{2} OPT$.

So for every $k \in [m]$, $T_k \le \frac{3}{2} OPT$. Therefore $\frac{3}{2}$ is an upper bound on the approximation ratio of Improved Greedy Makespan.

Although we have shown that Improved Greedy Makespan has an approximation ratio of at most $\frac{3}{2}$, this bound is not tight; a more careful analysis gives a better guarantee.

Remark 5.4.4. Improved Greedy Makespan is in fact a $\frac{4}{3}$-approximation algorithm. (Proof omitted)


Chapter 6

Introduction to Convex Polytopes

In this lecture, we cover some basic material on the structure of polytopes and linear programming.

6.1 Linear Space

Definition 6.1.1 (Linear space). $V$ is a linear space if $V \subseteq \mathbb{R}^d$, $d \in \mathbb{Z}^+$, and $V = \mathrm{span}\{u_1, u_2, \ldots, u_k\}$ for some $u_i \in \mathbb{R}^d$. Equivalently, we can define a linear space as $V = \{c_1 u_1 + \cdots + c_k u_k \mid c_1, c_2, \ldots, c_k \in \mathbb{R}\}$.

Deﬁnition 6.1.2 (Dimension). Let dim(V ) be the dimension of a linear space V . The dimension

of a linear space V is the cardinality of a basis of V . A basis is a set of linearly independent vectors

that, in a linear combination, can represent every vector in a given linear space.

Example 6.1.3 (Linear space). The line $V \subseteq \mathbb{R}^2$ in Figure 6.1 is a linear space since it passes through the origin $0$. It is clear that $\dim(V) = 1$.

Figure 6.1: An example of linear space

6.2 Afﬁne Space

In this section, we introduce the notion of affine maps $x \mapsto Ax + x_0$, which represent an affine change of coordinates if $A$ is a nonsingular $n \times n$ matrix and $x, x_0 \in \mathbb{R}^n$.

Then we will introduce an afﬁne space. An afﬁne space is a linear space that has “forgotten”

its origin. It can be seen as a shift of a linear space.


Definition 6.2.1 (Affine Space). Let $V$ be a linear space. $V'$ is an affine space if $V' = V + \beta$, that is, a vector $v'$ belongs to $V'$ if and only if $v' = v + \beta$, where $v$ is a vector in the linear space $V$ and $\beta$ is a fixed vector for the affine space.

Example 6.2.2. In Figure 6.2, line $V$ is a linear space while line $V'$ is not, since it does not contain the origin vector $0$. $V'$ is an affine space with the property that $V' = V + \beta = \{v + \beta \mid v \in V\}$, where $V \subseteq \mathbb{R}^2$ is a linear space and $\beta \in \mathbb{R}^2$ is a fixed vector. Note that $\dim(V') = \dim(V) = 1$.

Figure 6.2: An example of afﬁne space

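Definition 6.2.3 (Affine Hull). The affine hull $\mathrm{aff}(S)$ of $S$ is the set of all affine combinations of elements of $S$, that is,

$$\mathrm{aff}(S) = \Big\{ u \;\Big|\; u = \sum_{i=0}^{k} \lambda_i u_i,\; u_i \in S,\; \lambda_i \in \mathbb{R},\; i = 0, \ldots, k;\; \sum_{i=0}^{k} \lambda_i = 1,\; k = 1, 2, \ldots \Big\}$$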

Claim 6.2.4. $V'$ is an affine space if and only if there exists a finite set $S = \{u_0, u_1, \ldots, u_k\}$ such that $V' = \mathrm{aff}(S)$.

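Proof. Let $V' = V + \beta$. We define $V = \mathrm{span}\{u_1, u_2, \ldots, u_k\}$, and we let $B = \{u_1, u_2, \ldots, u_k\}$, and finally $B' = (B \cup \{0\}) + \beta = \{\beta, \beta + u_1, \beta + u_2, \ldots, \beta + u_k\}$.

• For every $y \in \mathrm{aff}(B')$, we show that $y \in V'$:

$$\begin{aligned}
y &= \lambda_0 \beta + \lambda_1 (u_1 + \beta) + \lambda_2 (u_2 + \beta) + \cdots + \lambda_k (u_k + \beta) \\
&= (\lambda_0 + \lambda_1 + \cdots + \lambda_k) \beta + \lambda_1 u_1 + \lambda_2 u_2 + \cdots + \lambda_k u_k \\
&= \beta + \lambda_1 u_1 + \lambda_2 u_2 + \cdots + \lambda_k u_k
\end{aligned}$$

Since $\lambda_1 u_1 + \lambda_2 u_2 + \cdots + \lambda_k u_k \in V$, $y \in V'$.

• For every $y \in V'$, we show that $y \in \mathrm{aff}(B')$:

$$\begin{aligned}
y &= \beta + \lambda_1 u_1 + \lambda_2 u_2 + \cdots + \lambda_k u_k \\
&= \beta + (\lambda_1 + \cdots + \lambda_k) \beta - (\lambda_1 + \cdots + \lambda_k) \beta + \lambda_1 u_1 + \lambda_2 u_2 + \cdots + \lambda_k u_k \\
&= [1 - (\lambda_1 + \lambda_2 + \cdots + \lambda_k)] \beta + \lambda_1 (u_1 + \beta) + \lambda_2 (u_2 + \beta) + \cdots + \lambda_k (u_k + \beta) \\
&= \lambda_0 \beta + \lambda_1 (u_1 + \beta) + \lambda_2 (u_2 + \beta) + \cdots + \lambda_k (u_k + \beta)
\end{aligned}$$

where $\lambda_0 = 1 - (\lambda_1 + \cdots + \lambda_k)$, so that the coefficients sum to 1. So $y \in \mathrm{aff}(B')$.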

6.3 Convex Polytope

Definition 6.3.1 (Convex body). A convex body $K \subseteq \mathbb{R}^d$ can be defined as follows: for any two points $x, y \in K$, $K$ also contains the straight line segment $[x, y] = \{\lambda x + (1 - \lambda) y \mid 0 \le \lambda \le 1\}$, that is, $[x, y] \subseteq K$.

Example 6.3.2. The set $K$ in Figure 6.3(a) is a convex body since for any $x, y \in K$, $[x, y] \subseteq K$. However, the set $K'$ in Figure 6.3(b) is not a convex body since there exist $x, y \in K'$ such that $[x, y] \not\subseteq K'$.


Figure 6.3: (a) An example of convex body (b) An example of non-convex body

Definition 6.3.3 (Convex Hull). For any $K \subseteq \mathbb{R}^d$, the “smallest” convex set containing $K$, called the convex hull of $K$, can be constructed as the intersection of all convex sets that contain $K$:

$$\mathrm{conv}(K) := \bigcap \{ U \mid U \subseteq \mathbb{R}^d,\; K \subseteq U,\; U \text{ is convex} \}.$$

Definition 6.3.4 (Convex Hull). An alternative definition of the convex hull: let $K = \{u_1, u_2, \ldots, u_k\}$; then

$$\overline{\mathrm{conv}}(K) = \Big\{ u \;\Big|\; u = \lambda_1 u_1 + \cdots + \lambda_k u_k,\; \sum_{i=1}^{k} \lambda_i = 1,\; \lambda_i \ge 0 \Big\}$$

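Claim 6.3.5. $\mathrm{conv}(K) = \overline{\mathrm{conv}}(K)$.

Proof. The set $\overline{\mathrm{conv}}(K)$ is convex and contains $K$. Since $\mathrm{conv}(K)$ is the "smallest" convex set containing $K \subseteq \mathbb{R}^d$, we get $\mathrm{conv}(K) \subseteq \overline{\mathrm{conv}}(K)$.

Next, we use induction on $k$ to prove that $\overline{\mathrm{conv}}(K) \subseteq \mathrm{conv}(K)$.

The base case is clearly correct: when $K = \{u_1\} \subseteq \mathbb{R}^d$, $\overline{\mathrm{conv}}(K) \subseteq \mathrm{conv}(K)$.

We now assume that for $k - 1$ points, where $K = \{u_1, \dots, u_{k-1}\} \subseteq \mathbb{R}^d$, we have $\overline{\mathrm{conv}}(K) \subseteq \mathrm{conv}(K)$. For $k$ points, fix an arbitrary $u \in \overline{\mathrm{conv}}(K)$, say $u = \lambda_1 u_1 + \dots + \lambda_k u_k$. If $\lambda_k = 1$ then $u = u_k \in \mathrm{conv}(K)$. Otherwise,

$$u = (1 - \lambda_k)\left(\frac{\lambda_1}{1 - \lambda_k} u_1 + \dots + \frac{\lambda_{k-1}}{1 - \lambda_k} u_{k-1}\right) + \lambda_k u_k,$$

where by the induction hypothesis

$$u' = \frac{\lambda_1}{1 - \lambda_k} u_1 + \dots + \frac{\lambda_{k-1}}{1 - \lambda_k} u_{k-1} \in \mathrm{conv}(K),$$

since the coefficients are non-negative and their sum is 1. Because $u', u_k \in \mathrm{conv}(K)$ and $\mathrm{conv}(K)$ is convex, $u = (1 - \lambda_k) u' + \lambda_k u_k \in [u', u_k] \subseteq \mathrm{conv}(K)$. Thus, by induction on $k$, $\overline{\mathrm{conv}}(K) \subseteq \mathrm{conv}(K)$.

Hence, $\mathrm{conv}(K) = \overline{\mathrm{conv}}(K)$.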

6.4 Polytope

In this section, we are going to introduce the concept of a V-Polytope and an H-Polytope.

Definition 6.4.1 (V-polytope). A V-polytope is the convex hull of a finite set of points.

Before we give a deﬁnition of an H-Polytope, we need to give the deﬁnition of a hyperplane

and an H-Polyhedron.

Definition 6.4.2 (Hyperplane). A hyperplane in $\mathbb{R}^d$ can be described as the set of points $x = (x_1, x_2, \dots, x_d)^T$ which are solutions to $\alpha_1 x_1 + \alpha_2 x_2 + \dots + \alpha_d x_d = \beta$, for fixed $\alpha_1, \alpha_2, \dots, \alpha_d, \beta \in \mathbb{R}$.

Example 6.4.3. $x_1 + x_2 = 1$ is a hyperplane.

Figure 6.4: An example of hyperplane

Definition 6.4.4 (Half-space). A half-space in $\mathbb{R}^d$ is the set of points $x = (x_1, x_2, \dots, x_d)^T$ which satisfy $\alpha_1 x_1 + \alpha_2 x_2 + \dots + \alpha_d x_d \le \beta$, for fixed $\alpha_1, \alpha_2, \dots, \alpha_d, \beta \in \mathbb{R}$.

Deﬁnition 6.4.5 (H-polyhedron). An H-polyhedron is the intersection of a ﬁnite number of closed

half-spaces.

Deﬁnition 6.4.6. An H-polytope is a bounded H-polyhedron.

A simple question follows: are the two types of polytopes equivalent? The answer is yes.

However, the proof, Fourier-Motzkin elimination, is complicated.

Figure 6.5: (a) H-polyhedron and (b) H-polytope

6.5 Linear Programming

The goal of a Linear Program (LP) is to optimize a linear function over a polytope. An example is to minimize or maximize $\alpha_1 x_1 + \alpha_2 x_2 + \dots + \alpha_n x_n$, where $x_1, \dots, x_n$ are variables and $\alpha_1, \dots, \alpha_n$ are constants, subject to a set of inequalities:

$$\begin{aligned}
\alpha_{11} x_1 + \alpha_{12} x_2 + \dots + \alpha_{1n} x_n &\le \beta_1 \\
\alpha_{21} x_1 + \alpha_{22} x_2 + \dots + \alpha_{2n} x_n &\le \beta_2 \\
&\ \,\vdots
\end{aligned}$$

An LP can be solved in polynomial time if $x_1, \dots, x_n$ range over the real numbers; no polynomial-time algorithm is known when they are restricted to integers (that problem is NP-hard). Before a polynomial-time algorithm was found, it was suspected that LP might lie strictly between P and the NP-complete problems. However, Linear Programs can, in fact, be solved in time polynomial in their input size. The first polynomial time algorithm for LPs was the Ellipsoid algorithm. Since then, there has been a substantial amount of work on other polynomial time algorithms, e.g. Interior Point methods.

Example 6.5.1. Minimize $f(x) = 7x_1 + x_2 + 5x_3$, subject to

$$\begin{aligned}
x_1 - x_2 + 3x_3 &\ge 10 \\
5x_1 + 2x_2 - x_3 &\ge 6 \\
x_1, x_2, x_3 &\ge 0
\end{aligned}$$

Every $x = (x_1, x_2, x_3)^T$ that satisfies the constraints is called a feasible solution.

Question: Can you "prove" that there exists a feasible $x^*$ with $f(x^*) \le 30$?

Answer: Yes. Here is such an $x^*$: $x^* = (2, 1, 3)^T$, $f(x^*) = 30$.

To disprove such a proposition, that is, to prove a lower bound on $f$, we may need to come up with different combinations of the inequalities, such as

$$7x_1 + x_2 + 5x_3 \ge (x_1 - x_2 + 3x_3) + (5x_1 + 2x_2 - x_3) \ge 16,$$

where the first inequality holds because $x \ge 0$ and the right-hand side equals $6x_1 + x_2 + 2x_3$.
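The two certificates in this example can be checked mechanically. The following is only a sketch: it takes the first constraint to be $x_1 - x_2 + 3x_3 \ge 10$, as in the lower-bound derivation, and the helper names `dot` and `is_feasible` are made up for illustration.

```python
# Data of Example 6.5.1 (first constraint assumed to be x1 - x2 + 3*x3 >= 10).
c = [7, 1, 5]
A = [[1, -1, 3], [5, 2, -1]]
b = [10, 6]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def is_feasible(x):
    # x >= 0 and every constraint row satisfies row . x >= b_i
    return all(xi >= 0 for xi in x) and all(dot(row, x) >= bi for row, bi in zip(A, b))

# Upper-bound certificate: one feasible point and its objective value.
x_star = [2, 1, 3]
upper = dot(c, x_star)

# Lower-bound certificate: the sum of the two constraint rows is
# componentwise at most c, so on x >= 0 we get f(x) >= b_1 + b_2.
combo = [A[0][j] + A[1][j] for j in range(3)]
lower = sum(b) if all(combo[j] <= c[j] for j in range(3)) else None
```

Here `upper` witnesses that the optimum is at most 30 and `lower` that it is at least 16; the optimum lies somewhere between the two.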


Chapter 7

Forms of Linear Programming

We describe the different forms of Linear Programs (LPs), including the standard and canonical forms, as well as how to transform between them.

7.1 Forms of Linear Programming

In the previous lecture, we introduced the notion of Linear Programming by giving a simple exam-

ple. Linear Programs are commonly written in two forms: standard form and canonical form.

Recall from the previous lecture that an LP is a set of linear inequalities together with an

objective function. Each linear inequality takes the form:

$$a_1 x_1 + a_2 x_2 + \dots + a_n x_n \ \{\le, =, \ge\}\ b$$

There may be many such linear inequalities. The objective function takes the form:

$$c_1 x_1 + c_2 x_2 + \dots + c_n x_n$$

The goal of Linear Programming is to maximize or minimize an objective function over the assignments that satisfy all inequalities. The general form of an LP is:

$$\begin{aligned}
\text{maximize/minimize} \quad & c^T x \\
\text{subject to} \quad & \text{equations} \\
& \text{inequalities} \\
& \text{non-negativity constraints} \\
& \text{unconstrained variables, for example } x_i \gtrless 0
\end{aligned}$$

There are a couple different ways to write a Linear Program.

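Definition 7.1.1 (Canonical Form). An LP is said to be in canonical form if it is written as

$$\begin{aligned}
\text{minimize} \quad & c^T x \\
\text{subject to} \quad & Ax \ge b \\
& x \ge 0
\end{aligned}$$

Definition 7.1.2 (Standard Form). An LP is said to be in standard form if it is written as

$$\begin{aligned}
\text{minimize} \quad & c^T x \\
\text{subject to} \quad & Ax = b \\
& x \ge 0
\end{aligned}$$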

7.2 Linear Programming Form Transformation

We present transformations that allow us to change an LP written in the general form into one

written in the standard form, and vice versa.

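1. STEP1: From General form into Canonical form.

$$\begin{aligned}
\text{maximize } c^T x \;&\Rightarrow\; -\text{minimize } {-c^T x} \\
a^T x = b \;&\Rightarrow\; a^T x \le b \text{ and } a^T x \ge b \\
a^T x \le b \;&\Rightarrow\; -a^T x \ge -b \\
x_i \gtrless 0 \;&\Rightarrow\; x_i = x_i^+ - x_i^-,\quad x_i^+, x_i^- \ge 0
\end{aligned}$$

2. STEP2: From Canonical form into Standard form

$$a^T x \ge b \;\Rightarrow\; a^T x - y = b,\quad y \ge 0$$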

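Example 7.2.1. Suppose we have an LP written in the general form, and we wish to transform it into the standard form:

$$\begin{aligned}
\text{maximize} \quad & 2x_1 + 5x_2 \\
\text{subject to} \quad & x_1 + x_2 \le 3 \\
& x_2 \ge 0 \\
& x_1 \gtrless 0
\end{aligned}$$

The standard form of the LP is:

$$\begin{aligned}
\text{minimize} \quad & -(2x_1^+ - 2x_1^- + 5x_2) \\
\text{subject to} \quad & x_1^+ - x_1^- + x_2 + y = 3 \\
& x_1^+, x_1^-, x_2, y \ge 0
\end{aligned}$$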
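As a sanity check on Example 7.2.1, the sketch below maps one feasible point of the general-form LP to a point of the standard-form LP and compares the objective values. The function names are hypothetical, and splitting $x_1$ as $x_1^+ = \max(x_1, 0)$, $x_1^- = \max(-x_1, 0)$ is just one convenient choice among many.

```python
def original_objective(x1, x2):
    # Objective of the general-form LP (to be maximized).
    return 2 * x1 + 5 * x2

def to_standard(x1, x2):
    # Split the unconstrained x1 into two non-negative parts and
    # introduce the slack y for the constraint x1 + x2 <= 3.
    x1_plus, x1_minus = max(x1, 0), max(-x1, 0)
    y = 3 - (x1 + x2)
    return x1_plus, x1_minus, x2, y

def standard_objective(x1_plus, x1_minus, x2, y):
    # Objective of the standard-form LP (to be minimized).
    return -(2 * x1_plus - 2 * x1_minus + 5 * x2)

x1, x2 = -2, 4                      # feasible: x1 + x2 = 2 <= 3, x2 >= 0
point = to_standard(x1, x2)
```

The mapped point satisfies the equality constraint with all variables non-negative, and its standard-form objective is the negation of the original one, which is why a maximum of the original LP corresponds to a minimum of the transformed LP.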


Chapter 8

Linear Programming Duality

In this lecture, we introduce the dual of a Linear Program (LP) and give a geometric proof of the

Linear Programming Duality Theorem from elementary principles. We also introduce the concept

of complementary slackness conditions, which we use to characterize optimal solutions to LPs and

their duals. This material ﬁnds many applications in subsequent lectures.

8.1 Primal and Dual Linear Program

8.1.1 Primal Linear Program

Consider an arbitrary Linear Program, which we will call primal in order to distinguish it from

the dual (introduced later on). In what follows, we assume that the primal Linear Program is a

maximization problem, where variables take non-negative values.

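$$\begin{aligned}
\text{maximize} \quad & \sum_{j=1}^{n} c_j x_j \\
\text{subject to} \quad & \sum_{j=1}^{n} a_{ij} x_j \le b_i && \text{for } i = 1, 2, \dots, m \\
& x_j \ge 0 && \text{for } j = 1, 2, \dots, n
\end{aligned}$$

The above Linear Program can be rewritten as:

$$\begin{aligned}
\text{maximize} \quad & c^T x && (8.1) \\
\text{subject to} \quad & Ax \le b && (8.2) \\
& x \ge 0 && (8.3)
\end{aligned}$$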

An example:


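$$\begin{aligned}
\text{maximize} \quad & 3x_1 + x_2 + 2x_3 && (8.4) \\
\text{subject to} \quad & x_1 + x_2 + 3x_3 \le 30 && (8.5) \\
& 2x_1 + 2x_2 + 5x_3 \le 24 && (8.6) \\
& 4x_1 + x_2 + 2x_3 \le 36 && (8.7) \\
& x_1, x_2, x_3 \ge 0 && (8.8)
\end{aligned}$$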

8.1.2 Dual Linear Program

Suppose that one wants to know whether there is a ﬂow in a given network of value ≥ α. If

such a ﬂow exists, then we have a certiﬁcate of small size that proves that such a thing exists: the

certiﬁcate is a ﬂow f such that α = v(f). Now, what if one wants to know whether or not every

ﬂow has value < α? One certiﬁcate we can give to prove this is to list all ﬂows (say, of integer

values). But this certiﬁcate is far too large. From the max-ﬂow min-cut theorem we know that the

weight of every cut is an upper bound to the maximum ﬂow. Observe that for this, we just list the

cut, which is a small certiﬁcate. Note that the “size” of the certiﬁcate is one thing. We care both

about the size and about the time required to verify using this certiﬁcate.

The above relation between max-ﬂow and min-cut is called duality. The concept of duality

appears to be very useful in the design of efﬁcient combinatorial algorithms. As we have men-

tioned before, Linear Programming can be seen as a kind of “uniﬁcation theory” for algorithms.

Part of its classiﬁcation as such a thing is because there is a systematic way of obtaining duality

characterizations.

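Any primal Linear Program can be converted into a dual linear program:

$$\begin{aligned}
\text{minimize} \quad & \sum_{i=1}^{m} b_i y_i \\
\text{subject to} \quad & \sum_{i=1}^{m} a_{ij} y_i \ge c_j && \text{for } j = 1, 2, \dots, n \\
& y_i \ge 0 && \text{for } i = 1, 2, \dots, m
\end{aligned}$$

The above Linear Program can be rewritten as:

$$\begin{aligned}
\text{minimize} \quad & b^T y && (8.9) \\
\text{subject to} \quad & A^T y \ge c && (8.10) \\
& y \ge 0 && (8.11)
\end{aligned}$$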

The primal Linear Program example given in (8.4) - (8.8) can be converted to the following

dual form:


$$\begin{aligned}
\text{minimize} \quad & 30y_1 + 24y_2 + 36y_3 \\
\text{subject to} \quad & y_1 + 2y_2 + 4y_3 \ge 3 \\
& y_1 + 2y_2 + y_3 \ge 1 \\
& 3y_1 + 5y_2 + 2y_3 \ge 2 \\
& y_1, y_2, y_3 \ge 0
\end{aligned}$$
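The passage from a primal in the form (8.1)-(8.3) to its dual (8.9)-(8.11) is purely mechanical: transpose $A$ and swap the roles of $b$ and $c$. A minimal sketch follows; the function name `dual` is made up, and the objective vector is taken to be $c = (3, 1, 2)$, i.e. the right-hand sides of the dual constraints above.

```python
def dual(A, b, c):
    """Given max c^T x s.t. Ax <= b, x >= 0, return the data (A_T, rhs, obj)
    of the dual  min obj^T y  s.t.  A_T y >= rhs, y >= 0."""
    A_T = [list(col) for col in zip(*A)]   # transpose of A
    return A_T, list(c), list(b)

# Primal example (8.4)-(8.8), with c assumed to be (3, 1, 2).
A = [[1, 1, 3],
     [2, 2, 5],
     [4, 1, 2]]
b = [30, 24, 36]
c = [3, 1, 2]

A_T, rhs, obj = dual(A, b, c)
```

Each row of `A_T` gives the coefficients of one dual constraint, `rhs` its right-hand side, and `obj` the coefficients of the dual objective.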

As we shall see, in cases when both the primal LP and the dual LP are feasible and bounded,

the objective value of the dual form gives a bound on the objective value of the primal form.

Furthermore, we shall see the two optimal objective values are actually equal!

8.2 Weak Duality

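Theorem 8.2.1 (Weak Duality). Let $x = (x_1, x_2, \dots, x_n)$ be any feasible solution to the primal Linear Program and let $y = (y_1, y_2, \dots, y_m)$ be any feasible solution to the dual Linear Program. Then $c^T x \le b^T y$, which is to say

$$\sum_{j=1}^{n} c_j x_j \le \sum_{i=1}^{m} b_i y_i .$$

Proof. We have

$$\sum_{j=1}^{n} c_j x_j \le \sum_{j=1}^{n} \Big(\sum_{i=1}^{m} a_{ij} y_i\Big) x_j = \sum_{i=1}^{m} \Big(\sum_{j=1}^{n} a_{ij} x_j\Big) y_i \le \sum_{i=1}^{m} b_i y_i ,$$

where the first inequality uses the dual constraints together with $x \ge 0$, and the last one uses the primal constraints together with $y \ge 0$.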

Corollary 8.2.2. Let $x = (x_1, x_2, \dots, x_n)$ be a feasible solution to the primal Linear Program, and let $y = (y_1, y_2, \dots, y_m)$ be a feasible solution to the dual Linear Program. If

$$\sum_{j=1}^{n} c_j x_j = \sum_{i=1}^{m} b_i y_i$$

then $x$ and $y$ are optimal solutions to the primal and dual Linear Programs, respectively.

Proof. Let $t = \sum_{j=1}^{n} c_j x_j = \sum_{i=1}^{m} b_i y_i$. By the Weak Duality Theorem we know the objective value of the primal Linear Program is upper bounded by $t$. Since $x$ meets this upper bound $t$, $x$ is an optimal solution. The proof for $y$ is similar.
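Weak duality and the corollary can be observed numerically on the running example. The sketch below assumes the objective vector $c = (3, 1, 2)$ (the right-hand sides of the dual example); the points `x`, `y`, `x_opt`, `y_opt` are chosen by hand for illustration, and exact rational arithmetic is used so that equality of the two objective values can be tested.

```python
from fractions import Fraction as F

A = [[1, 1, 3], [2, 2, 5], [4, 1, 2]]
b = [30, 24, 36]
c = [3, 1, 2]          # assumed objective coefficients

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def primal_feasible(x):
    return all(xj >= 0 for xj in x) and all(dot(row, x) <= bi for row, bi in zip(A, b))

def dual_feasible(y):
    cols = list(zip(*A))   # columns of A are the rows of A^T
    return all(yi >= 0 for yi in y) and all(dot(col, y) >= cj for col, cj in zip(cols, c))

# A feasible pair that is far from optimal: a large gap between the two values.
x, y = [1, 1, 1], [1, 1, 1]

# A feasible pair with equal objective values: both are optimal by the corollary.
x_opt = [8, 4, 0]
y_opt = [F(0), F(1, 6), F(2, 3)]
```

For the first pair the primal value is 6 and the dual value is 90; for the second pair both values equal 28, which certifies optimality without solving either LP.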


8.3 Farkas’ Lemma

Farkas' lemma is an essential theorem which tells us that we can have small certificates of infeasibility

of a system of linear inequalities. Geometrically this certificate is a hyperplane which

separates the feasible region of a set of constraints from a point that lies outside this region. Al-

though the geometric intuition is clear (and straightforward), it takes a little bit of work to make it

precise. The key part of the proof of Farkas lemma is the following geometric theorem.

8.3.1 Projection Theorem

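Theorem 8.3.1 (Projection Theorem). Let $K$ be a closed, convex and non-empty set in $\mathbb{R}^n$, and let $b \in \mathbb{R}^n$, $b \notin K$. Define the projection $p$ of $b$ onto $K$ to be the point $x \in K$ such that $\|b - x\|$ is minimized. Then for all $z \in K$, we have $(b - p)^T (z - p) \le 0$.

Proof. Let $u = b - p$, $v = z - p$. Suppose $u \cdot v > 0$ (so in particular $v \ne 0$). Then:

1) If $\|v\| \le \frac{u \cdot v}{\|v\|}$, then

$$\|b - z\|^2 = \|u - v\|^2 = \|u\|^2 - 2 u \cdot v + \|v\|^2 \le \|u\|^2 - u \cdot v < \|u\|^2 = \|b - p\|^2,$$

which means $\|b - z\| < \|b - p\|$.

2) If $\|v\| > \frac{u \cdot v}{\|v\|}$, then because $K$ is convex, $p \in K$ and $z = p + v \in K$, we know $z' = p + \frac{u \cdot v}{\|v\|^2} v \in K$ (the coefficient of $v$ lies strictly between 0 and 1). Then

$$\|b - z'\|^2 = \Big\|u - \frac{u \cdot v}{\|v\|^2} v\Big\|^2 = \|u\|^2 - \frac{(u \cdot v)^2}{\|v\|^2} < \|u\|^2 = \|b - p\|^2,$$

which means $\|b - z'\| < \|b - p\|$.

In both cases we have found a point of $K$ that is closer to $b$ than $p$, contradicting the definition of $p$. Hence $u \cdot v \le 0$.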

8.3.2 Farkas’ Lemma

Lemma 8.3.2 (Farkas' Lemma). One and only one of the following two assertions holds:

1. $A^T y = c$, $y \ge 0$ has a solution.

2. $Ax \le 0$ and $c^T x > 0$ has a solution.

Proof. First, we show the two assertions cannot hold at the same time. If so, we have

$$c^T x = (A^T y)^T x = y^T A x .$$

We know $y \ge 0$ and $Ax \le 0$. So $c^T x = y^T A x \le 0$. But this contradicts $c^T x > 0$.

Next, we show at least one of the two assertions holds. Specifically, we show that if Assertion 1 doesn't hold, then Assertion 2 must hold.

Assume $A^T y = c$, $y \ge 0$ is not feasible. Let $K = \{A^T y : y \ge 0\}$ (which is obviously convex and non-empty; it is also closed, being a finitely generated cone). Then $c \notin K$. Let $p = A^T w$ (with $w \ge 0$) be the projection of $c$ onto $K$. Then from the Projection Theorem we know that

$$(c - A^T w)^T (A^T y - A^T w) \le 0 \quad \text{for all } y \ge 0 \qquad (8.12)$$

Define $x = c - p = c - A^T w$. Then

$$\begin{aligned}
x^T A^T (y - w) &\le 0 \quad \text{for all } y \ge 0 \\
(y - w)^T A x &\le 0 \quad \text{for all } y \ge 0
\end{aligned}$$

Let $e_i$ be the $m$-dimensional vector that has 1 in its $i$-th component and 0 everywhere else. Take $y = w + e_i$. Then

$$(Ax)_i = e_i^T A x \le 0 \quad \text{for } i = 1, 2, \dots, m \qquad (8.13)$$

Thus, each element of $Ax$ is non-positive, which means $Ax \le 0$.

Now, $c^T x = (p + x)^T x = p^T x + x^T x$. Setting $y = 0$ in (8.12), we have $-w^T A x = -p^T x \le 0$, so $p^T x \ge 0$. Since $c \ne p$, we have $x \ne 0$, so $x^T x > 0$. Therefore, $c^T x > 0$.
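The construction in the proof can be followed on a tiny instance. The sketch below makes a simplifying assumption that is not part of the notes: it takes $A = I$, so that the cone $K = \{A^T y : y \ge 0\}$ is the non-negative orthant and the projection onto $K$ is just componentwise clipping at zero. The names `project_onto_orthant` and `dot` are illustrative.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def project_onto_orthant(c):
    # Nearest point of {y : y >= 0} to c; valid as the projection onto K
    # only because A is the identity in this toy instance.
    return [max(ci, 0) for ci in c]

A = [[1, 0],
     [0, 1]]
c = [1, -1]            # Assertion 1 fails: A^T y = c would force y_2 = -1 < 0

p = project_onto_orthant(c)              # projection of c onto K
x = [ci - pi for ci, pi in zip(c, p)]    # x = c - p, as in the proof
Ax = [dot(row, x) for row in A]
```

The vector `x` produced this way satisfies Assertion 2: every entry of `Ax` is non-positive and `dot(c, x)` is strictly positive, so `x` is a certificate that Assertion 1 has no solution.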

8.3.3 Geometric Interpretation

If $c$ lies in the cone formed by the column vectors of $A^T$, then Assertion 1 is satisfied. Otherwise, we can find a hyperplane which contains the origin and separates $c$ from the cone, and thus Assertion 2 is satisfied. These two cases are depicted below.

The $a_i$'s correspond to the column vectors of $A^T$. $b_1$ and $b_2$ correspond to $c$ in Assertion 1 and Assertion 2, respectively.


8.3.4 More on Farkas Lemma

There are several variants of Farkas Lemma in the literature, as well as interpretations that one may

give. In our context Farkas Lemma is being used to characterize linear systems that are infeasible.

Among others, in some sense, this lemma provides a succinct way for witnessing infeasibility.

Now we give a variant of Farkas Lemma:

Lemma 8.3.3. One and only one of the following two assertions holds:

1. $Ax \ge b$ has a solution.

2. $A^T y = 0$, $b^T y > 0$, $y \ge 0$ has a solution.

Exercise: Prove Lemma 8.3.3.

8.4 Strong Duality

Roughly speaking, Farkas Lemma and the Linear Programming (LP) Duality theorem are the same

thing. The “hard” job has already been done when proving Farkas Lemma. Note that in Farkas we

have a system of linear inequalities whereas in LP Duality we have a system of linear inequalities

over which we wish to optimize a linear function. What remains to be done is to play around

with formulas such that we translate LP Duality to Farkas. To do that we will somehow “encode”

the function we optimize in the Linear Program inside the constraints of the system of linear

inequalities.

Theorem 8.4.1 (Strong Duality). Let $x$ be a feasible solution to the primal Linear Program (8.1)-(8.3) and let $y$ be a feasible solution to the dual Linear Program (8.9)-(8.11). Then $x$ and $y$ are optimal $\iff c^T x = b^T y$.

Proof. ($\Leftarrow$) See Corollary 8.2.2.

($\Rightarrow$) Suppose $x$ and $y$ are optimal. Since we already have $c^T x \le b^T y$ by the Weak Duality Theorem, we just need to prove $c^T x \ge b^T y$. Let $z^* = c^T x$, $w^* = b^T y$.

Claim 8.4.2. There exists a solution of the dual of value at most $z^*$, i.e.,

$$\exists y : A^T y \ge c,\quad y \ge 0,\quad b^T y \le z^* .$$

Proof. We wish to prove that there is a $y \ge 0$ satisfying

$$\begin{pmatrix} A^T \\ -b^T \end{pmatrix} y \ge \begin{pmatrix} c \\ -z^* \end{pmatrix} .$$

Assume the claim is wrong. Then the variant of Farkas' Lemma (applied with the constraints $y \ge 0$ included in the system) implies that

$$\begin{aligned}
\begin{pmatrix} A, & -b \end{pmatrix} \begin{pmatrix} x \\ \lambda \end{pmatrix} &\le 0 \\
\begin{pmatrix} c^T, & -z^* \end{pmatrix} \begin{pmatrix} x \\ \lambda \end{pmatrix} &> 0 \\
x &\ge 0 \\
\lambda &\ge 0
\end{aligned}$$

has a solution. That is, there exist nonnegative $x, \lambda$ such that

$$\begin{aligned}
Ax - \lambda b &\le 0 \\
c^T x - \lambda z^* &> 0
\end{aligned}$$

Case I: $\lambda > 0$. Then there exists a nonnegative $x$ such that $A(\frac{x}{\lambda}) \le b$ and $c^T (\frac{x}{\lambda}) > z^*$. This contradicts the maximality of $z^*$ for the primal.

Case II: $\lambda = 0$. Then there exists a nonnegative $x$ such that $Ax \le 0$, $c^T x > 0$. Take any feasible solution $x'$ for the primal LP. Then for every $\mu \ge 0$, $x' + \mu x$ is feasible for the primal LP, since

1. $x' + \mu x \ge 0$ because $x' \ge 0$, $x \ge 0$, $\mu \ge 0$.

2. $A(x' + \mu x) = Ax' + \mu Ax \le b + \mu \cdot 0 = b$.

But $c^T (x' + \mu x) = c^T x' + \mu c^T x \to \infty$ as $\mu \to \infty$. This contradicts the assumption that the primal has a finite optimum.

The above claim shows that $z^* \ge w^*$. And we already have $z^* \le w^*$ by the Weak Duality Theorem, so $z^* = w^*$.

8.5 Complementary Slackness

Theorem 8.5.1 (Complementary Slackness). Let $x$ be a feasible solution to the primal Linear Program given in (8.1)-(8.3), and let $y$ be a feasible solution to the dual Linear Program given in (8.9)-(8.11). Then the following conditions are necessary and sufficient for $x$ and $y$ to be optimal:

$$\sum_{i=1}^{m} a_{ij} y_i = c_j \ \text{ or } \ x_j = 0 \quad \text{for } j = 1, 2, \dots, n \qquad (8.14)$$

$$\sum_{j=1}^{n} a_{ij} x_j = b_i \ \text{ or } \ y_i = 0 \quad \text{for } i = 1, 2, \dots, m \qquad (8.15)$$

Proof. (Sufficient) Define $S_j = \{j : x_j > 0\}$ and $S_i = \{i : y_i > 0\}$. Then by (8.14) and (8.15) we have

$$c^T x = \sum_{j=1}^{n} c_j x_j = \sum_{j \in S_j} c_j x_j = \sum_{j \in S_j} \Big(\sum_{i=1}^{m} a_{ij} y_i\Big) x_j = \sum_{j \in S_j} \Big(\sum_{i \in S_i} a_{ij} y_i\Big) x_j$$

$$b^T y = \sum_{i=1}^{m} b_i y_i = \sum_{i \in S_i} b_i y_i = \sum_{i \in S_i} \Big(\sum_{j=1}^{n} a_{ij} x_j\Big) y_i = \sum_{i \in S_i} \Big(\sum_{j \in S_j} a_{ij} x_j\Big) y_i$$

Therefore $c^T x = b^T y$. By the Strong Duality Theorem, we know $x$ and $y$ must be optimal solutions.

(Necessary) Suppose $x$ and $y$ are optimal solutions. We have

$$c^T x \le (A^T y)^T x = \sum_{j=1}^{n} \Big(\sum_{i=1}^{m} a_{ij} y_i\Big) x_j$$

$$b^T y \ge (Ax)^T y = \sum_{i=1}^{m} \Big(\sum_{j=1}^{n} a_{ij} x_j\Big) y_i$$

$$c^T x = b^T y$$

and the two double sums are equal, so

$$c^T x = b^T y = \sum_{j=1}^{n} \Big(\sum_{i=1}^{m} a_{ij} y_i\Big) x_j \qquad (8.16)$$

Suppose there exists $j'$ such that $\sum_{i=1}^{m} a_{ij'} y_i > c_{j'}$ and $x_{j'} > 0$. Then

$$\begin{aligned}
c^T x = \sum_{j=1}^{n} c_j x_j &= \sum_{j \ne j'} c_j x_j + c_{j'} x_{j'} \\
&< \sum_{j \ne j'} \Big(\sum_{i=1}^{m} a_{ij} y_i\Big) x_j + \Big(\sum_{i=1}^{m} a_{ij'} y_i\Big) x_{j'} \\
&= \sum_{j=1}^{n} \Big(\sum_{i=1}^{m} a_{ij} y_i\Big) x_j
\end{aligned}$$

which contradicts (8.16). Therefore equation (8.14) must be satisfied. Equation (8.15) can be proved in a similar fashion.
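Conditions (8.14) and (8.15) are easy to test for a given pair of feasible solutions. The following sketch does so for the running example, again assuming the objective vector $c = (3, 1, 2)$; the function name `complementary_slack` and the test points are illustrative choices, not part of the notes.

```python
from fractions import Fraction as F

A = [[1, 1, 3], [2, 2, 5], [4, 1, 2]]
b = [30, 24, 36]
c = [3, 1, 2]          # assumed objective coefficients

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def complementary_slack(x, y):
    cols = list(zip(*A))
    # (8.14): for every j, the j-th dual constraint is tight or x_j = 0.
    cond_814 = all(xj == 0 or dot(col, y) == cj for xj, col, cj in zip(x, cols, c))
    # (8.15): for every i, the i-th primal constraint is tight or y_i = 0.
    cond_815 = all(yi == 0 or dot(row, x) == bi for yi, row, bi in zip(y, A, b))
    return cond_814 and cond_815

optimal_pair = ([8, 4, 0], [F(0), F(1, 6), F(2, 3)])
non_optimal_pair = ([1, 1, 1], [1, 1, 1])   # feasible, but not optimal
```

The first pair satisfies both conditions, while the second violates them, in agreement with the theorem: among feasible pairs, complementary slackness holds exactly for the optimal ones.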


Chapter 9

Simplex Algorithm and Ellipsoid Algorithm

In this chapter, we will introduce two methods for solving Linear Programs: the simplex algorithm

and the ellipsoid algorithm. To better illustrate these two algorithms, some mathematical back-

ground is also introduced. The simplex algorithm doesn’t run in polynomial time, but works well

in practice. The ellipsoid runs in polynomial time, but usually performs poorly in practice.

9.1 More on Basic Feasible Solutions

9.1.1 Assumptions and conventions

Since any LP instance can be converted to maximized standard form, we'll use the standard form of an LP to show some useful properties in the following lectures. We can write the maximized standard form of an LP compactly as

$$\begin{aligned}
\text{maximize} \quad & c^T x && (9.1) \\
\text{subject to} \quad & Ax = b && (9.2) \\
& x \ge 0 && (9.3)
\end{aligned}$$

We only consider the case when the LP (9.1) - (9.3) is feasible and bounded. In such cases, the feasible region is a polytope.

Since we have seen how to convert a Linear Program from normal forms to standard form in the previous lectures, we can simplify our notation by setting $A$ to be an $m \times n$ matrix, $c$ and $x$ as $n$-dimensional vectors, and $b$ as an $m$-dimensional vector in (9.1) - (9.3).

We may assume that $m < n$. WLOG we suppose $\mathrm{rank}(A) = m$ so that all constraints are useful.

9.1.2 Deﬁnitions

Suppose the Linear Program (9.1) - (9.3) is feasible and bounded. Denote the feasible region by

S. It’s easy to see that S is non-empty and convex. Then we have the following deﬁnitions:

Definition 9.1.1 (Corner). A point $x$ in $S$ is said to be a corner if there is an $n$-dimensional vector $w$ and a scalar $t$ such that

1. $w^T x = t$,

2. $w^T x' > t$ for all $x' \in S - \{x\}$.

Intuitively, this definition says that all points in $S$ are on the same side of the hyperplane $H$ defined by $w$ and $H \cap S = \{x\}$. Also note that $w^T y \ge t$ for all $y \in S$.

Deﬁnition 9.1.2 (Extreme point). A point x in S is said to be an extreme point if there are no two

distinct points u and v in S such that x = λu + (1 −λ)v for some 0 < λ < 1.

The above deﬁnition says there is no line segment in S with x in its interior.

Definition 9.1.3 (Basic Feasible Solution). Let $B$ be any non-singular $m \times m$ sub-matrix made up of columns of $A$. If $x \in S$ and all $n - m$ components of $x$ not associated with columns of $B$ are equal to zero, we say $x$ is a Basic Feasible Solution to $Ax = b$ with respect to basis $B$.

Intuitively, this says that the columns of $B$ span the whole $m$-dimensional vector space. Since $b$ is also $m$-dimensional, $b$ is a linear combination of the columns of $B$. Therefore, there is a solution $x$ with all $n - m$ components not associated with columns of $B$ set to zero.

9.1.3 Equivalence of the deﬁnitions

Lemma 9.1.4. If $x$ is a corner, then $x$ is an extreme point.

Proof. Suppose $x$ is a corner, but not an extreme point. Then there exists $y \ne 0$ such that $x + y \in S$ and $x - y \in S$. So $w^T (x + y) \ge t$ and $w^T (x - y) \ge t$. Since $w^T x = t$, we have $w^T y \ge 0$ and $w^T y \le 0$, which implies that $w^T y = 0$. So $w^T (x + y) = w^T x = t$. Since $x + y \in S$, $x + y = x$ by the definition of corners. This means $y = 0$, which is a contradiction.

Lemma 9.1.5. If $x$ is an extreme point, then the columns $\{a_j : x_j > 0\}$ of $A$ are linearly independent.

Proof. WLOG suppose the columns are $a_1, a_2, \dots, a_k$. If they are not independent, then there are scalars $d_1, d_2, \dots, d_k$, not all zero, such that

$$d_1 a_1 + d_2 a_2 + \dots + d_k a_k = 0 \qquad (9.4)$$

WLOG (by scaling) we suppose $|d_j| < |x_j|$ for $1 \le j \le k$. Let $d$ be an $n$-dimensional vector such that $d = (d_1, \dots, d_k, 0, \dots, 0)^T$. Then we know

$$Ad = d_1 a_1 + d_2 a_2 + \dots + d_k a_k + 0 \cdot a_{k+1} + \dots + 0 \cdot a_n = 0 \qquad (9.5)$$

So $A(x + d) = Ax = b$. Since $x + d = (x_1 + d_1, \dots, x_k + d_k, 0, \dots, 0) \ge 0$, we have $x + d \in S$. Similarly, we have $x - d \in S$. Since $x = \frac{1}{2}(x + d) + \frac{1}{2}(x - d)$ and $d \ne 0$, we see $x$ is not an extreme point, which is a contradiction.

Lemma 9.1.6. If $x \in S$ and the columns $\{a_j : x_j > 0\}$ of $A$ are linearly independent, then $x$ is a Basic Feasible Solution.

Proof. WLOG suppose $x = (x_1, \dots, x_k, 0, \dots, 0)$ where $x_j > 0$ for $1 \le j \le k$, which is to say, $B = \{a_1, \dots, a_k\}$ are independent. Since $\mathrm{rank}(A) = m$, we have $k \le m$.

1) If $k = m$, we are done. This is because $B = [a_1 \cdots a_k]$ is an $m \times m$ invertible sub-matrix of $A$.

2) If $k < m$, we can widen $B$ by incorporating more columns of $A$ while maintaining their independence. Finally, $B$ has $m$ columns. Now the components of $x$ not corresponding to the columns in $B$ are still zero, so $x$ is a Basic Feasible Solution.

Lemma 9.1.7. If $x$ is a Basic Feasible Solution, then $x$ is a corner.

Proof. WLOG suppose $x = (x_1, \dots, x_m, 0, \dots, 0)^T$ is a Basic Feasible Solution, which is to say $B = [a_1 \cdots a_m]$ is an $m \times m$ non-singular sub-matrix of $A$. Set $w = (0, \dots, 0, 1, \dots, 1)^T$ (with $m$ 0's and $n - m$ 1's) and $t = 0$. Then $w^T x = 0$. And $w^T y \ge 0$ for any $y \in S$ since $y \ge 0$.

If there exists $z \in S$ such that $w^T z = 0$, then $z = (z_1, \dots, z_m, 0, \dots, 0)^T$ and $b = Az = B[z_1 \cdots z_m]^T$. Since $b = Ax = B[x_1 \cdots x_m]^T$, we have $[z_1 \cdots z_m]^T = [x_1 \cdots x_m]^T = B^{-1} b$. So $z = x$.

Thus $w^T x = t$ and $w^T x' > t$ for all $x' \in S - \{x\}$, which proves $x$ is a corner.

Now we have the following theorem stating the equivalence of the various deﬁnitions:

Theorem 9.1.8. Suppose the LP (9.1) - (9.3) is feasible with feasible region S. Then the following

statements are equivalent:

1. x is a corner;

2. x is an extreme point;

3. x is a Basic Feasible Solution.

Proof. From Lemma (9.1.4) - (9.1.7).

9.1.4 Existence of Basic Feasible Solution

So far we have seen equivalent definitions for Basic Feasible Solutions. But when does a BFS exist? The following theorem shows that a BFS exists if a feasible solution does.

Theorem 9.1.9. If the Linear Program (9.1) - (9.3) is feasible then there is a BFS.

Proof. Since the LP is feasible, the feasible region $S \ne \emptyset$. Let $x$ be a feasible solution with a minimal number of non-zero components. If $x = 0$, we are done since $0$ is a BFS. Otherwise, set $I = \{i : x_i > 0\}$; then it's clear $I \ne \emptyset$. Suppose $x$ is not basic, which is to say, $\{a_i : i \in I\}$ are not linearly independent. Then there exists $u \ne 0$ with $u_i = 0$ for $i \notin I$ satisfying $Au = 0$.

Consider $x' = x + \epsilon u$. It's clear $Ax' = b$ for every $\epsilon$. In addition, $x' \ge 0$ for sufficiently small $|\epsilon|$. So $x'$ is feasible. Increase or decrease $\epsilon$ until the first time one more component of $x'$ hits 0. At that point, we have a feasible solution with fewer non-zero components, which is a contradiction.


9.1.5 Existence of optimal Basic Feasible Solution

Suppose a Linear Program is feasible and ﬁnite, i.e. has a ﬁnite optimal value. Then we can show

the LP has an optimal Basic Feasible Solution.

Theorem 9.1.10. If the Linear Program (9.1) - (9.3) is feasible and ﬁnite, there is an optimal

solution at an extreme point of the feasible region S.

Proof. Since there exists an optimal solution, there exists an optimal solution $x$ with a minimal number of non-zero components. Suppose $x$ is not an extreme point; then $x = \lambda u + (1 - \lambda) v$ for some distinct $u, v \in S$ and $0 < \lambda < 1$. Since $x$ is optimal, we must have $c^T u \le c^T x$ and $c^T v \le c^T x$. Since $c^T x = \lambda c^T u + (1 - \lambda) c^T v$, this implies $c^T u = c^T v = c^T x$. Thus $u$ and $v$ are also optimal.

Consider $x' = x + \epsilon (u - v)$. The following statements are easy to check:

1. $c^T x' = c^T x$ for all $\epsilon$;

2. $x_i = 0 \Rightarrow u_i = v_i = 0 \Rightarrow x'_i = 0$;

3. $x_i > 0 \Rightarrow x'_i > 0$ for sufficiently small $|\epsilon|$.

Increase or decrease $\epsilon$ until the first time one more component of $x'$ is zero. At that point, we have an optimal solution with fewer non-zero components, which is a contradiction.

Corollary 9.1.11. If the Linear Program (9.1) - (9.3) is feasible and ﬁnite, then there is an optimal

Basic Feasible Solution.

Proof. From Theorem (9.1.8) and Theorem (9.1.10).

9.2 Simplex algorithm

9.2.1 The algorithm

The simplex algorithm is based on the existence of an optimal Basic Feasible Solution(BFS). The

algorithm ﬁrst chooses a BFS (or reports the LP is infeasible), then in each iteration, moves from

the BFS to its neighbor (which differs from it in one column), and makes sure the objective value

doesn’t decrease when all non-basic variables are set to zero. This is called pivoting.

Consider the following example:

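$$\begin{aligned}
z &= 27 + \frac{x_2}{4} + \frac{x_3}{2} - \frac{3x_6}{4} \\
x_1 &= 9 - \frac{x_2}{4} - \frac{x_3}{2} - \frac{x_6}{4} \\
x_4 &= 21 - \frac{3x_2}{4} - \frac{5x_3}{2} + \frac{x_6}{4} \\
x_5 &= 6 - \frac{3x_2}{2} - 4x_3 + \frac{x_6}{2}
\end{aligned}$$

In this LP, $\{x_1, x_4, x_5\}$ are the basic variables of a BFS. If we set the non-basic variables to zero, the objective value is 27. After a pivot with $x_3$ joining the basis and $x_5$ leaving it, the LP becomes:

$$\begin{aligned}
z &= \frac{111}{4} + \frac{x_2}{16} - \frac{x_5}{8} - \frac{11x_6}{16} \\
x_1 &= \frac{33}{4} - \frac{x_2}{16} + \frac{x_5}{8} - \frac{5x_6}{16} \\
x_3 &= \frac{3}{2} - \frac{3x_2}{8} - \frac{x_5}{4} + \frac{x_6}{8} \\
x_4 &= \frac{69}{4} + \frac{3x_2}{16} + \frac{5x_5}{8} - \frac{x_6}{16}
\end{aligned}$$

In this LP, $\{x_1, x_3, x_4\}$ are the basic variables of a BFS. If we set the non-basic variables to zero, the objective value is $\frac{111}{4} > 27$. If we continue pivoting, either we reach the optimal BFS, or we conclude the LP is unbounded.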
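The pivot in this example can be reproduced with exact rational arithmetic. The sketch below stores each equation as a constant plus a map from non-basic variable index to coefficient; this representation and the function name `pivot` are choices made here for illustration, not the notation of the notes or of any particular textbook implementation.

```python
from fractions import Fraction as F

def pivot(rows, entering, leaving):
    """Make `entering` basic and `leaving` non-basic.

    `rows` maps each basic variable (and the key 'z' for the objective) to a
    pair (constant, {non-basic variable: coefficient}).
    """
    const, coeffs = rows[leaving]
    a = coeffs[entering]
    # Solve the leaving row for the entering variable.
    e_const = -const / a
    e_coeffs = {v: -cv / a for v, cv in coeffs.items() if v != entering}
    e_coeffs[leaving] = 1 / a
    new_rows = {entering: (e_const, e_coeffs)}
    # Substitute that expression into every other row.
    for var, (c0, cs) in rows.items():
        if var == leaving:
            continue
        k = cs.get(entering, F(0))
        new_rows[var] = (c0 + k * e_const,
                         {v: cs.get(v, F(0)) + k * e_coeffs[v] for v in e_coeffs})
    return new_rows

rows = {
    'z': (F(27), {2: F(1, 4), 3: F(1, 2), 6: F(-3, 4)}),
    1: (F(9), {2: F(-1, 4), 3: F(-1, 2), 6: F(-1, 4)}),
    4: (F(21), {2: F(-3, 4), 3: F(-5, 2), 6: F(1, 4)}),
    5: (F(6), {2: F(-3, 2), 3: F(-4), 6: F(1, 2)}),
}
after = pivot(rows, entering=3, leaving=5)
```

The constant of `after['z']` is the new objective value 111/4, and the remaining rows match the second system above. Choosing which variable enters and which leaves is a separate question that this sketch does not address.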

To prevent cycling, we can use Bland’s rule, which says we should choose the variable with

the smallest index for tie-breaking.

Interested readers should refer to Introduction to Algorithms, Chapter 29 for more details of

the simplex algorithm. In particular, this will illustrate:

1. How to choose the initial BFS.

2. How to choose the pivoting element.

3. When to assert the LP is infeasible, unbounded or whether an optimal solution can be found.

9.2.2 Efﬁciency

The simplex algorithm is not a polynomial time algorithm. In 1972, Klee and Minty gave an example showing that the worst-case time complexity of the simplex algorithm is exponential. But it is generally fast in real world applications.

9.3 Ellipsoid algorithm

In this section, we only consider when a ﬁnite feasible region exists.

9.3.1 History

If we formulate an LP in its decision version, then it's obvious LP ∈ NP since an NTM can nondeterministically guess the solution. In addition, LP ∈ co-NP since an NTM can guess the solution to the LP's dual. So LP ∈ NP ∩ co-NP. However, it was not until Khachiyan published his ellipsoid algorithm, a polynomial time algorithm, that people knew LP ∈ P.


9.3.2 Mathematical background

Carathéodory's theorem

Theorem 9.3.1 (Carathéodory's theorem). If a point $x$ of $\mathbb{R}^d$ lies in the convex hull of a set $P$, there is a subset $P'$ of $P$ consisting of $d + 1$ or fewer points which are affinely independent such that $x$ lies in the convex hull of $P'$. Equivalently, $x$ lies in an $r$-simplex with vertices in $P$, where $r \le d$.

Proof. Let $x$ be a point in the convex hull of $P$. Then, $x$ is a convex combination of a finite number of points in $P$:

$$x = \sum_{j=1}^{k} \lambda_j x_j$$

where every $x_j$ is in $P$, every $\lambda_j$ is positive, and $\sum_{j=1}^{k} \lambda_j = 1$.

If $x_2 - x_1, \dots, x_k - x_1$ are linearly dependent, there are scalars $\mu_2, \dots, \mu_k$, not all zero, such that

$$\sum_{j=2}^{k} \mu_j (x_j - x_1) = 0 .$$

If $\mu_1$ is defined as $\mu_1 = -\sum_{j=2}^{k} \mu_j$, then $\sum_{j=1}^{k} \mu_j x_j = 0$ and $\sum_{j=1}^{k} \mu_j = 0$, and not all of the $\mu_j$'s are equal to zero. Therefore, at least one $\mu_j > 0$. Then,

$$x = \sum_{j=1}^{k} \lambda_j x_j - \alpha \sum_{j=1}^{k} \mu_j x_j = \sum_{j=1}^{k} (\lambda_j - \alpha \mu_j) x_j$$

for any real $\alpha$. In particular, the equality will hold if $\alpha$ is defined as

$$\alpha = \min_{1 \le j \le k} \Big\{ \frac{\lambda_j}{\mu_j} : \mu_j > 0 \Big\} = \frac{\lambda_i}{\mu_i} .$$

Note that $\alpha > 0$, and for every $j$ between 1 and $k$, $\lambda_j - \alpha \mu_j \ge 0$. In particular, $\lambda_i - \alpha \mu_i = 0$ by definition of $\alpha$. Therefore,

$$x = \sum_{j=1}^{k} (\lambda_j - \alpha \mu_j) x_j$$

where every $\lambda_j - \alpha \mu_j$ is non-negative, their sum is one, and furthermore, $\lambda_i - \alpha \mu_i = 0$. In other words, $x$ is represented as a convex combination of one fewer point of $P$.

Note that if $k > d + 1$, $x_2 - x_1, \dots, x_k - x_1$ are always linearly dependent. Therefore, this process can be repeated until $x$ is represented as a convex combination of $r \le d + 1$ points $x_{i_1}, \dots, x_{i_r}$ in $P$. Since $x_{i_2} - x_{i_1}, \dots, x_{i_r} - x_{i_1}$ are linearly independent when this process terminates, $x_{i_1}, \dots, x_{i_r}$ are affinely independent.
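One reduction step of this proof can be written in a few lines. In the sketch below the dependency coefficients $\mu_j$ are supplied by hand (finding them in general requires solving a linear system, which is omitted); the function name `caratheodory_step` and the one-dimensional test data are illustrative assumptions.

```python
from fractions import Fraction as F

def caratheodory_step(lams, mus):
    """One reduction step. `lams` are positive convex weights; `mus` must
    satisfy sum(mus) == 0 and sum(mu_j * x_j) == 0 with some mu_j > 0."""
    alpha = min(l / m for l, m in zip(lams, mus) if m > 0)
    return [l - alpha * m for l, m in zip(lams, mus)]

# Three points on the real line (d = 1), so k = 3 > d + 1.
points = [F(0), F(1), F(2)]
lams = [F(1, 4), F(1, 2), F(1, 4)]      # x = 1 as a convex combination
mus = [F(-1), F(2), F(-1)]              # 2*(x_2 - x_1) - (x_3 - x_1) = 0

new_lams = caratheodory_step(lams, mus)
x_before = sum(l * p for l, p in zip(lams, points))
x_after = sum(l * p for l, p in zip(new_lams, points))
```

After the step the weight of the middle point is zero, the weights still sum to one, and the represented point is unchanged, so $x = 1$ is now a convex combination of only two points.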

Quadratic form

$x^T A x$ is called a quadratic form, where $x$ is an $n$-dimensional vector and $A$ is a symmetric $n \times n$ matrix. If $A$ is positive definite, then $x^T A x = 1$ is an ellipsoid.

Volume of n-simplices

A set of $n + 1$ affinely independent points $p_0, p_1, \dots, p_n \in \mathbb{R}^n$ makes up an $n$-simplex. The volume of this $n$-simplex is

$$\frac{1}{n!} \left| \det \begin{pmatrix} 1 & 1 & \cdots & 1 \\ p_0 & p_1 & \cdots & p_n \end{pmatrix} \right| .$$
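The determinant formula for the volume of an $n$-simplex can be tried on small cases. The sketch below uses a naive cofactor expansion, which is adequate only for tiny matrices, and assumes integer coordinates so that the result is an exact fraction; the names `det` and `simplex_volume` are made up for this illustration.

```python
from fractions import Fraction
from math import factorial

def det(M):
    # Cofactor expansion along the first row (fine for very small matrices).
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def simplex_volume(points):
    # points: n + 1 points in R^n with integer coordinates.
    n = len(points) - 1
    # First row of ones, then one row per coordinate; column k holds p_k.
    M = [[1] * (n + 1)] + [[p[i] for p in points] for i in range(n)]
    return Fraction(abs(det(M)), factorial(n))

triangle = [(0, 0), (1, 0), (0, 1)]
tetrahedron = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
```

For the right triangle this gives area 1/2 and for the unit tetrahedron volume 1/6, matching the elementary formulas.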

9.3.3 LP,LI and LSI

There are three kinds of problems that are related to each other.

Question: (Linear Programming/LP) Given an integer $m \times n$ matrix $A$, $m$-vector $b$ and $n$-vector $c$, either

(a) Find a rational $n$-vector $x$ such that $x \ge 0$, $Ax = b$ and $c^T x$ is minimized subject to these conditions, or

(b) Report that there is no $n$-vector $x$ such that $x \ge 0$ and $Ax = b$, or

(c) Report that the set $\{c^T x : Ax = b, x \ge 0\}$ has no lower bound.

Question: (Linear Inequalities/LI) Given an integer $m \times n$ matrix $A$ and $m$-vector $b$, is there an $n$-vector $x$ such that $Ax \le b$?

Question: (Linear Strict Inequalities/LSI) Given an integer $m \times n$ matrix $A$ and $m$-vector $b$, is there an $n$-vector $x$ such that $Ax < b$?

Polynomial reductions

Theorem 9.3.2. There is a polynomial-time algorithm for LP if and only if there is a polynomial-time algorithm for LI.

Theorem 9.3.3. If there is a polynomial-time algorithm for LSI then there is a polynomial-time algorithm for LI.

Therefore, we just need to provide a polynomial-time algorithm for LSI to ensure there is a polynomial-time algorithm for LP, via the reduction LP ≤ LI ≤ LSI.

9.3.4 Ellipsoid algorithm

The main idea of the ellipsoid algorithm is to do a binary search. Initially the algorithm finds an ellipsoid large enough so that a part of the feasible region with lower-bounded volume $v_0$ is contained within it. In each iteration, the algorithm tests whether the center of the ellipsoid is a feasible solution. If so, it reports success. Otherwise, it finds a constraint that is violated, which defines a hyperplane. Since the center is on the wrong side of this hyperplane, at least half of the ellipsoid can be discarded. The algorithm proceeds by finding a smaller ellipsoid which contains the half of the original ellipsoid which is on the correct side. The ratio between the volume of the new ellipsoid and that of the old one is upper bounded by $r(n) < 1$. Therefore, after (polynomially) many iterations, either a solution is found, or the ellipsoid becomes small enough (its volume is less than $v_0$) that no feasible region can be contained in it. In the latter case, the feasible region can be declared empty.


The whole algorithm is given below:

Algorithm 8: Ellipsoid algorithm

input : An $m \times n$ system of linear strict inequalities $Ax < \vec b$, of size $L$
output: an $n$-vector $x$ such that $Ax < \vec b$, if such a vector exists; 'no' otherwise

1. Initialize:
   Set $j = 0$, $t_0 = \vec 0$, $B_0 = n^2 2^{2L} I$.
   /* j counts the number of iterations so far. */
   /* The current ellipsoid is $E_j = \{x : (x - t_j)^T B_j^{-1} (x - t_j) \le 1\}$ */

2. Test:
   if $t_j$ is a solution to $Ax < \vec b$ then
      return $t_j$
   end
   if $j > K = 16n(n + 1)L$ then
      return 'no'
   end

3. Iterate:
   Choose any inequality in $Ax < \vec b$ that is violated by $t_j$; say $a^T t_j \ge b$.
   Set
   $$t_{j+1} = t_j - \frac{1}{n + 1} \cdot \frac{B_j a}{\sqrt{a^T B_j a}}$$
   $$B_{j+1} = \frac{n^2}{n^2 - 1} \left( B_j - \frac{2}{n + 1} \cdot \frac{(B_j a)(B_j a)^T}{a^T B_j a} \right)$$
   $j = j + 1$
   Goto 2.

Theorem 9.3.4. The ellipsoid algorithm runs in polynomial time.

Proof. The algorithm runs for at most $16n(n + 1)L$ rounds, and each round takes polynomial time. So the algorithm runs in polynomial time.
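The update rules of Algorithm 8 can be sketched in a few lines. This is only an illustration under several assumptions that are not part of the notes: it uses floating-point numbers instead of exact arithmetic, it replaces the initial matrix $n^2 2^{2L} I$ by `radius**2 * I` for a caller-supplied `radius`, it replaces the bound $K$ by a plain `max_iter`, and it requires $n \ge 2$. The function name `ellipsoid_lsi` is hypothetical.

```python
import math


def ellipsoid_lsi(A, b, radius=10.0, max_iter=1000):
    """Search for x with A x < b (strictly); return x as a list, or None."""
    n = len(A[0])  # assumed n >= 2, otherwise n*n - 1 below is zero
    t = [0.0] * n  # centre t_j of the current ellipsoid
    B = [[radius ** 2 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for _ in range(max_iter):
        # Test step: look for an inequality violated by the centre.
        violated = None
        for row, rhs in zip(A, b):
            if sum(a_k * t_k for a_k, t_k in zip(row, t)) >= rhs:
                violated = row
                break
        if violated is None:
            return t
        # Iterate step: shrink the ellipsoid around the feasible half.
        a = violated
        Ba = [sum(B[r][c] * a[c] for c in range(n)) for r in range(n)]
        aBa = sum(a[r] * Ba[r] for r in range(n))
        if aBa <= 0:
            return None  # numerically degenerate ellipsoid, give up
        root = math.sqrt(aBa)
        t = [t[r] - Ba[r] / ((n + 1) * root) for r in range(n)]
        scale = n * n / (n * n - 1.0)
        B = [[scale * (B[r][c] - 2.0 * Ba[r] * Ba[c] / ((n + 1) * aBa))
              for c in range(n)] for r in range(n)]
    return None
```

For instance, the system $x > 1$, $y > 1$, $x + y < 3$ can be passed as `A = [[-1, 0], [0, -1], [1, 1]]` and `b = [-1, -1, 3]`; the returned point, if any, satisfies all three strict inequalities because the function only returns a centre that passed the test step.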


Chapter 10

Max-Flow Min-Cut Through Linear Programming

In this chapter, we give a proof of the Max-Flow Min-Cut Theorem using Linear Programming.

10.1 Flow and Cut

First, we review the deﬁnition of Max-Flow and Min-Cut and introduce another deﬁnition of Max-

Flow.

10.1.1 Flow

Let $N = (G, s, t, c)$ be a network (a directed graph $G = (V, E)$) with $s$ and $t$ the source and the sink of $N$, respectively. The capacity of an edge is a mapping $c : E \to \mathbb{R}^+$, denoted by $c_{uv}$ or $c(u, v)$; this represents the maximum amount of "flow" that can pass through an edge.

Definition 10.1.1 (Flow). A flow is a mapping $f : E \to \mathbb{R}^+$, denoted by $f_{uv}$ or $f(u, v)$, subject to the following two constraints:

1. $f_{uv} \le c_{uv}$ for each $(u, v) \in E$ (capacity constraint)

2. $\sum_{u : (u,v) \in E} f_{uv} = \sum_{u : (v,u) \in E} f_{vu}$ for each $v \in V \setminus \{s, t\}$ (conservation constraint).

The value of a flow represents the amount of flow passing from the source to the sink and is defined by $\mathrm{val}(f) = \sum_{v \in V} f_{sv}$, where $s$ is the source of $N$. The maximum flow problem is to maximize $\mathrm{val}(f)$, i.e. to route as much flow as possible from $s$ to $t$.

Definition 10.1.2 (Path). A path is a sequence of vertices such that there is a directed edge from each vertex to the next vertex in the sequence. An $s$-$t$ path is a path with $s$ at the beginning of the sequence and $t$ at the end.


10.1.2 An alternative Deﬁnition of Flow

We now give another definition of flow, and later prove that it is equivalent to Definition 10.1.1.

Definition 10.1.3 (Flow (alternate)). In $N = (G, s, t, c)$, let $P$ be the set of all $s$-$t$ paths. A flow is a mapping $f' : P \to \mathbb{R}^+$, denoted by $f'_p$ or $f'(p)$, subject to the following constraint:

1. $\sum_{p : (u,v) \in p} f'_p \le c_{uv}$ for each $(u, v) \in E$

The value of the flow is defined by $\mathrm{val}(f') = \sum_{p \in P} f'_p$.

10.1.3 Cut

Definition 10.1.4 (Cut). An $s$-$t$ cut $C = (S, T)$ is a partition of $V$ such that $s \in S$ and $t \in T$.

Definition 10.1.5 (Cut Set). The cut-set of $C$ is the set $\{(u, v) \in E \mid u \in S,\ v \in T\}$.

Note that if the edges in the cut-set of $C$ are removed, then $\mathrm{val}(f) = 0$ for every flow $f$.

Definition 10.1.6 (Capacity). The capacity of an $s$-$t$ cut is defined by $c(S, T) = \sum_{(u,v) \in E,\ u \in S,\ v \in T} c_{uv}$.

The minimum cut problem is to minimize $c(S, T)$, i.e. to determine $S$ and $T$ such that the capacity of the $s$-$t$ cut $(S, T)$ is minimal.

10.2 Max-ﬂow Min-Cut Theorem

In this section, we prove the Max-Flow and Min-Cut Theorem by using Linear Programming.

Before we go into the proof, we need the following two lemmas.

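Lemma 10.2.1. Given $N = (G, s, t, c)$, for every flow $f$ with $\mathrm{val}(f) > 0$, there exists an $s$-$t$ path $p$ such that $f_{uv} > 0$ for all $(u, v) \in p$.

Proof. We prove the lemma by contradiction.

If such an $s$-$t$ path does not exist, let $A = \{v \in V \mid \exists\ s\text{-}v \text{ path } p,\ \forall (u, w) \in p,\ f_{uw} > 0\}$. Then $s \in A$, $t \notin A$, and $f_{uv} = 0$ for all $u \in A$, $v \notin A$. By the conservation constraint,

$$\sum_{v \in A} \Big( \sum_{w : (v,w) \in E} f_{vw} - \sum_{u : (u,v) \in E} f_{uv} \Big) = \sum_{w : (s,w) \in E} f_{sw} = \mathrm{val}(f) > 0$$

But

$$\sum_{v \in A} \Big( \sum_{w : (v,w) \in E} f_{vw} - \sum_{u : (u,v) \in E} f_{uv} \Big) = \sum_{v \in A,\ w \notin A,\ (v,w) \in E} f_{vw} - \sum_{u \notin A,\ v \in A,\ (u,v) \in E} f_{uv} = - \sum_{u \notin A,\ v \in A,\ (u,v) \in E} f_{uv} \le 0$$

This is a contradiction.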


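Lemma 10.2.2 (Flow Decomposition). The two definitions of flow 10.1.1 and 10.1.3 are equivalent. In $N = (G, s, t, c)$, for every $\alpha \in \mathbb{R}^+$,

$$\exists f : E \to \mathbb{R}^+,\ \mathrm{val}(f) = \alpha \iff \exists f' : P \to \mathbb{R}^+,\ \mathrm{val}(f') = \alpha$$

Proof. ($\Leftarrow$) Let $f_{uv} = \sum_{p : (u,v) \in p} f'_p$.

• First we prove that $f$ is a flow by examining the two constraints given in Definition 10.1.1.

For the capacity constraint, we know by Definition 10.1.3 of $f'$ that $f_{uv} \le c_{uv}$ for each $(u, v) \in E$.

For the conservation constraint, by the definition of path (10.1.2), for every $s$-$t$ path $p$ and every vertex $v \in V \setminus \{s, t\}$: $\forall (u, v) \in p,\ \exists (v, w) \in p$ and $\forall (v, w) \in p,\ \exists (u, v) \in p$.

Thus, for every path $p$ and node $v \in V \setminus \{s, t\}$, $|\{(u, v) \in p \mid u \in V\}| = |\{(v, w) \in p \mid w \in V\}|$. So for every $v \in V \setminus \{s, t\}$

$$\sum_{p \in P} \sum_{u : (u,v) \in p} f'_p = \sum_{p \in P} \sum_{w : (v,w) \in p} f'_p \implies \sum_{u : (u,v) \in E} f_{uv} = \sum_{u : (v,u) \in E} f_{vu}$$

• Second we prove that $\mathrm{val}(f) = \mathrm{val}(f')$:

$$\mathrm{val}(f) = \sum_{v \in V} f_{sv} = \sum_{v \in V} \sum_{p : (s,v) \in p} f'_p = \sum_{p \in P} f'_p = \mathrm{val}(f')$$

($\Rightarrow$) We prove this direction by induction on $|E|$.

• The base case is obvious when $|E| \le 2$.

• For the inductive step, if $\mathrm{val}(f) = 0$, let $f'_p = 0$ for every $p \in P$.

If $\mathrm{val}(f) > 0$, then by Lemma 10.2.1 there exists an $s$-$t$ path $p$ with positive flow on every edge. Let $\alpha = \min_{(u,v) \in p} f_{uv}$, so $\alpha > 0$. We construct a new flow $f_1$ by reducing $f_{uv}$ by $\alpha$ for every $(u, v) \in p$ and then removing all edges $(u, v)$ such that $f_1(u, v) = 0$. In the new flow, at least one edge is removed; thus, by induction, there exists $f'_1$ with $\mathrm{val}(f'_1) = \mathrm{val}(f_1)$. We can then get $f'$ by adding $p$, with $f'_p = \alpha$, to $f'_1$. Thus we find

$$\mathrm{val}(f') = \mathrm{val}(f'_1) + \alpha = \mathrm{val}(f_1) + \alpha = \mathrm{val}(f)$$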
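The inductive step of the proof is effectively an algorithm: repeatedly find an $s$-$t$ path carrying positive flow, peel off its bottleneck amount, and continue. The sketch below illustrates this under the assumption that the flow is given as a dictionary from edges `(u, v)` to numbers; the names `find_positive_path` and `decompose_flow` are illustrative only.

```python
def find_positive_path(flow, s, t):
    """Depth-first search for an s-t path using only edges with positive flow."""
    stack, parent = [s], {s: None}
    while stack:
        u = stack.pop()
        if u == t:
            path, v = [], t
            while parent[v] is not None:
                path.append((parent[v], v))
                v = parent[v]
            return path[::-1]
        for (a, b), value in flow.items():
            if a == u and value > 0 and b not in parent:
                parent[b] = u
                stack.append(b)
    return None


def decompose_flow(flow, s, t):
    """Return a list of (path, amount) pairs whose amounts sum to val(f)."""
    flow = dict(flow)  # work on a copy, leave the caller's flow untouched
    paths = []
    while True:
        path = find_positive_path(flow, s, t)
        if path is None:
            return paths
        alpha = min(flow[e] for e in path)  # bottleneck on the path
        for e in path:
            flow[e] -= alpha  # at least one edge drops to 0
        paths.append((path, alpha))
```

Each round zeroes at least one edge, so there are at most $|E|$ rounds. Flow that circulates on cycles and never reaches $t$ is simply left behind, since it does not contribute to the value.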

Theorem 10.2.3 (Max-flow Min-cut Theorem). Given $N = (G, s, t, c)$, $\max\{\mathrm{val}(f) : f \text{ is a flow in } N\} = \min\{c(S, T) : (S, T) \text{ is an } s\text{-}t \text{ cut of } N\}$.

Proof Sketch:

1. Write the max-flow problem as an LP according to Definition 10.1.3.

2. Find the dual of the LP. The equality of the two optimal values is guaranteed by the strong duality theorem.

3. Prove that the optimal value of the dual is equal to the answer of the min-cut problem.

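Proof. First, we present an LP of the max-flow problem along with its dual.

Max-flow:
maximize: $\mathrm{val}(f') = \sum_{p \in P} f'_p$
subject to: $\sum_{p : (u,v) \in p} f'_p \le c_{uv}$ for each $(u, v) \in E$
$f'_p \ge 0$ for each $p \in P$

Dual:
minimize: $\sum_{(u,v) \in E} c_{uv} y_{uv}$
subject to: $\sum_{(u,v) \in p} y_{uv} \ge 1$ for each $p \in P$
$y_{uv} \ge 0$ for each $(u, v) \in E$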

We prove that the optimal value of the dual LP equals the capacity of a min-cut of $N$ via the following two observations.

Observation 10.2.4. In $N$, for every cut $C = (S, T)$, there is a feasible solution to the dual LP of max-flow in $N$ with cost equal to $c(S, T)$.

Proof. For $C = (S, T)$, set

$$y_{uv} = \begin{cases} 1 & u \in S,\ v \in T \\ 0 & \text{otherwise} \end{cases}$$

It is easy to check that this is a feasible solution of the dual LP (every $s$-$t$ path crosses from $S$ to $T$ at least once) and that its cost is equal to $c(S, T)$.

Observation 10.2.5. In $N$ with $G = (V, E)$, given a feasible solution $\{y_{uv}\}_{(u,v) \in E}$ to the dual LP, there exists a cut $C = (S, T)$ whose capacity $c(S, T)$ is less than or equal to the cost of the solution.

Proof. First, consider the graph $G = (V, E)$, where the length of an edge $(u, v) \in E$ is given by $\mathrm{dis}(u, v) = y_{uv}$.

Let $d(u, v)$ be the length of a shortest path from $u$ to $v$ according to $\mathrm{dis}$.

Thus the first $|P|$ constraints of the dual LP can be written as

$$d(s, t) \ge 1$$

That is, every path from $s$ to $t$ has length at least 1.

Pick a uniform random number $T \in [0, 1)$.

For a given $T$, we construct a cut in the following way:

$$v \in \begin{cases} S & d(s, v) \le T \\ T & d(s, v) > T \end{cases}$$

We define the following function to make things clearer.

$$\mathrm{In}(u, v, S, T) = \begin{cases} 1 & u \in S,\ v \in T \\ 0 & \text{otherwise} \end{cases}$$


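We prove that the expectation of $c(S, T)$ is at most the cost of the solution of the dual LP.

$$E(c(S, T)) = \sum_{(u,v) \in E} E(c_{uv}\, \mathrm{In}(u, v, S, T)) \tag{10.1}$$

$$= \sum_{(u,v) \in E} c_{uv}\, E(\mathrm{In}(u, v, S, T)) \tag{10.2}$$

$$= \sum_{(u,v) \in E} c_{uv} \Pr(u \in S,\ v \in T) \tag{10.3}$$

$$= \sum_{(u,v) \in E} c_{uv} \Pr(d(s, u) \le T \land d(s, v) > T) \tag{10.4}$$

$$\le \sum_{(u,v) \in E} c_{uv} \max\{0,\ d(s, v) - d(s, u)\} \tag{10.5}$$

$$\le \sum_{(u,v) \in E} c_{uv}\, \mathrm{dis}(u, v) \tag{10.6}$$

$$= \sum_{(u,v) \in E} c_{uv}\, y_{uv} \tag{10.7}$$

Step (10.6) uses the triangle inequality $d(s, v) \le d(s, u) + \mathrm{dis}(u, v)$. Because the expectation of $c(S, T)$ is at most the cost of the solution, there must exist a cut $C = (S, T)$ whose capacity is less than or equal to the cost of the solution.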

According to Observation 10.2.4, the minimum cost of a solution of the dual LP is less than or equal to the capacity of a minimum cut.

According to Observation 10.2.5, the capacity of a minimum cut is less than or equal to the minimum cost of a solution of the dual LP.

Thus, we arrive at the conclusion that the optimal value of the dual LP is exactly equal to the capacity of a minimum cut.

The Max-flow Min-cut Theorem then follows immediately from strong duality and Lemma 10.2.2.
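The random-threshold argument of Observation 10.2.5 can be made deterministic: the cut only changes when the threshold passes one of the distances $d(s, v)$, so it suffices to try those values. The sketch below assumes the dual solution `y` and the capacities are dictionaries keyed by edges `(u, v)`; the names `shortest_distances` and `best_threshold_cut` are illustrative and not from the notes.

```python
import heapq


def shortest_distances(edges, length, s):
    """Dijkstra from s, where edge (u, v) has non-negative length length[(u, v)]."""
    dist = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for (a, b) in edges:
            if a == u and d + length[(a, b)] < dist.get(b, float("inf")):
                dist[b] = d + length[(a, b)]
                heapq.heappush(heap, (dist[b], b))
    return dist


def best_threshold_cut(vertices, capacity, y, s):
    """Try every threshold T = d(s, v) < 1 and return the cheapest cut (S, cost)."""
    dist = shortest_distances(capacity.keys(), y, s)
    d = {v: dist.get(v, float("inf")) for v in vertices}
    best = None
    for threshold in sorted({d[v] for v in vertices if d[v] < 1}):
        S = {v for v in vertices if d[v] <= threshold}
        cost = sum(c for (u, v), c in capacity.items() if u in S and v not in S)
        if best is None or cost < best[1]:
            best = (S, cost)
    return best
```

Since the expected capacity over a random threshold is at most the dual cost, the cheapest of these finitely many cuts is also at most the dual cost, provided `y` is dual feasible (so that $d(s, t) \ge 1$ and $t$ never lands in $S$).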


Chapter 11

Rounding Technique for Approximation Algorithms (I)

In this chapter, we give some concrete examples of using Linear Programming to solve optimization problems. First, we give the general method for solving problems using LPs. Then, we use the Set Cover problem as an example of the general method. We give two approximation algorithms for Set Cover and apply LP-rounding to the Vertex Cover problem to get a 2-approximation algorithm.

11.1 General Method to Use Linear Programming

Here, we show the general idea of solving problems using an LP.

To make things easier to understand, we will provide three examples of problems solved by Linear Programming: Set Cover, Vertex Cover and MAX-K SAT.

11.2 Set Cover Problem

11.2.1 Problem Description

We are given a set $U$ of $n$ elements $e_1, e_2, \dots, e_n$, and a set $\mathcal{S}$ of $k$ subsets $S_1, S_2, \dots, S_k$ of $U$. A set cover is a collection of subsets from $\mathcal{S}$ such that every element in $U$ belongs to at least one of the subsets. In addition, we are given a cost function $c : \mathcal{S} \to \mathbb{Z}^+$. The cost of a set cover is the sum of the costs of the subsets in the collection. Our objective is to find a set cover that minimizes the cost.

Example 11.2.1. $U = \{1, 2, \dots, 6\}$ and $\mathcal{S} = \{S_1, S_2, S_3\}$ where $S_1 = \{1, 2\}$, $S_2 = \{2, 3, 4\}$ and $S_3 = \{5, 6\}$. In addition, the costs of the three sets are 1, 2 and 3 respectively. The example is illustrated below:

Figure 11.1: Set Cover Example

By definition, $\mathcal{S}'_1 = \{S_2, S_3\}$ is not a set cover of $(U, \mathcal{S}, c)$ because $\mathcal{S}'_1$ does not cover 1. Conversely, $\mathcal{S}'_2 = \{S_1, S_2, S_3\}$ is the only set cover in this example and $c(\mathcal{S}'_2) = c(S_1) + c(S_2) + c(S_3) = 6$.

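A Set Cover problem $(U, \mathcal{S}, c)$ is specified by

Input:
$U = \{e_1, e_2, \dots, e_n\}$
$\mathcal{S} = \{S_1, S_2, \dots, S_k\}$, $S_i \subseteq U$
$c : \mathcal{S} \to \mathbb{Z}^+$

Output:
$\min_{\mathcal{S}' \subseteq \mathcal{S}} c(\mathcal{S}') = \min_{\mathcal{S}'} \sum_{S \in \mathcal{S}'} c(S)$ subject to $\forall i,\ \exists j$ s.t. $S_j \in \mathcal{S}'$ and $e_i \in S_j$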

11.2.2 Complexity

Theorem 11.2.2. The Set Cover Problem is NP-hard.

Proof. Our proof is by reduction from Vertex Cover. An instance of Vertex Cover consists of a graph $G = (V, E)$ and an integer $j$; the problem is to determine whether or not there exist $j$ vertices that cover all edges.

We construct a corresponding Set Cover instance $(U, \mathcal{S}, c)$ as follows. We let $U = E$; for each vertex $v \in V$, we form a set $S_v \in \mathcal{S}$ that contains all edges in $E$ incident to $v$. Further, for every $S \in \mathcal{S}$, let $c(S) = 1$. This construction can be done in time polynomial in the size of the Vertex Cover instance.

Suppose that $G$ has a vertex cover of size at most $j$; then we obtain a set cover of cost at most $j$ in the constructed instance by simply choosing the subsets corresponding to the selected vertices. Conversely, suppose that we have a set cover of cost at most $j$ in the constructed instance; then the vertices corresponding to the selected subsets form a vertex cover of size at most $j$. In this way, we have Vertex Cover $\le_P$ Set Cover. It follows that the Set Cover Problem is NP-hard.
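The construction in the proof is short enough to write out. The sketch below assumes a graph given as a list of vertices and a list of edges (pairs of endpoints); the function names are illustrative only.

```python
def vertex_cover_to_set_cover(vertices, edges):
    """Build the Set Cover instance (U, S, c) used in the reduction."""
    universe = set(edges)  # U = E
    # One subset per vertex: all edges incident to that vertex.
    subsets = {v: {e for e in edges if v in e} for v in vertices}
    cost = {v: 1 for v in vertices}  # unit costs
    return universe, subsets, cost


def is_set_cover(universe, subsets, chosen):
    """Check whether the chosen subset names cover the whole universe."""
    covered = set()
    for name in chosen:
        covered |= subsets[name]
    return covered == universe
```

On the path 1 - 2 - 3, the middle vertex alone is a vertex cover, and accordingly the single subset built for vertex 2 is a set cover of the constructed instance.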


11.2.3 Greedy Set Cover

Algorithm Description

Algorithm 9: Greedy Set Cover Algorithm

Input: element set $U = \{e_1, e_2, \dots, e_n\}$, subset set $\mathcal{S} = \{S_1, S_2, \dots, S_k\}$ and cost function $c : \mathcal{S} \to \mathbb{Z}^+$
Output: a set cover $C$ of low cost (not necessarily the minimum)

$C \leftarrow \emptyset$;
while $C$ does not cover $U$ do
    Calculate the cost effectiveness (defined below) $\alpha_1, \alpha_2, \dots, \alpha_l$ of the unpicked sets $S_1, S_2, \dots, S_l$ respectively;
    pick a set $S_j$ with minimum cost effectiveness $\alpha$;
    $C \leftarrow C \cup \{S_j\}$;
output $C$;

The cost effectiveness $\alpha$ of a subset $S$ above is defined by

$$\alpha = \frac{c(S)}{|S - C|}$$

Let $C_i$ be the collection of sets chosen up to and including iteration $i$. Then $|S - C_i|$ denotes the number of elements of $S$ that are not contained in the union of the subsets in $C_i$. Clearly, for any particular subset $S_j$, its cost effectiveness $\alpha_j$ varies from one iteration to another.

Example 11.2.3. $U = \{1, 2, \dots, 6\}$ and $\mathcal{S} = \{S_1, S_2, \dots, S_{10}\}$. The elements and cost of each set are given below:

$S_1 = \{1, 2\}$, $c(S_1) = 3$
$S_2 = \{3, 4\}$, $c(S_2) = 3$
$S_3 = \{5, 6\}$, $c(S_3) = 3$
$S_4 = \{1, 2, 3, 4, 5, 6\}$, $c(S_4) = 14$
$S_5 = \{1\}$, $c(S_5) = 1$
$S_6 = \{2\}$, $c(S_6) = 1$
$S_7 = \{3\}$, $c(S_7) = 1$
$S_8 = \{4\}$, $c(S_8) = 1$
$S_9 = \{5\}$, $c(S_9) = 1$
$S_{10} = \{6\}$, $c(S_{10}) = 1$

The example is depicted in graphical form in Figure 11.2.

There are several feasible set covers in this example, e.g. $\{S_1, S_2, S_3\}$ and $\{S_2, S_5, S_6, S_9, S_{10}\}$. However, the set cover with the minimum cost is $\{S_5, S_6, S_7, S_8, S_9, S_{10}\}$, with a corresponding cost of 6.

We give a simple example of how to compute cost effectiveness. Let $C = \{S_1, S_2, S_9\}$. Then, for subset $S_3$ we have

$$\alpha = \frac{c(S_3)}{|S_3 - C|} = \frac{3}{1} = 3$$
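The greedy rule can be sketched directly from the description above. The code assumes the instance is given as a dictionary from set names to Python sets plus a dictionary of costs; the function name `greedy_set_cover` and the tie-breaking rule (the first set in dictionary order wins) are choices made for this illustration.

```python
def greedy_set_cover(universe, subsets, cost):
    """Return the names of the picked sets, in the order greedy picks them."""
    covered, picked = set(), []
    while covered != universe:
        best_name, best_alpha = None, None
        for name, elements in subsets.items():
            new = elements - covered
            if not new:
                continue  # covers nothing new: cost effectiveness undefined
            alpha = cost[name] / len(new)
            if best_alpha is None or alpha < best_alpha:
                best_name, best_alpha = name, alpha
        if best_name is None:
            raise ValueError("the given subsets do not cover the universe")
        picked.append(best_name)
        covered |= subsets[best_name]
    return picked
```

On Example 11.2.3 the singletons have cost effectiveness 1 in every iteration, while the pairs start at 1.5 and the big set at 14/6, so greedy picks the six singletons for a total cost of 6.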


Figure 11.2: Greedy Set Cover Example

Approximation Rate of the Greedy Set Cover

We sort the elements $e_1, e_2, \dots, e_n$ by the iteration in which they were added to $C$, i.e. first covered. Let this sorted list be $e'_1, e'_2, \dots, e'_n$.

Figure 11.3: sort elements by iteration time

Notice that the subsets $S'_j$ in the figure are different from the original $S_j$: each $S'_j$ includes only those elements that $S_j$ itself newly added to $C$.

For each $e'_k$ which is added by $S'_j$, we define

$$\mathrm{price}(e'_k) = \alpha_{S_j}$$

namely the cost effectiveness of the set $S_j$ at the time $e'_k$ was covered.

Observation 11.2.4. $A(I) = \sum_{k=1}^{n} \mathrm{price}(e'_k)$

We denote the cost of the set cover found by Greedy Set Cover by $A(I)$, where $I$ is the input. Therefore, we have

$$A(I) = c(C) = \sum_{S_i \in C} c(S_i) = \sum_{S_i \in C} \sum_{e' \in S'_i} \frac{c(S_i)}{|S'_i|} = \sum_{k=1}^{n} \mathrm{price}(e'_k)$$

The third equality above breaks the cost of a subset into the costs of those elements newly covered by the subset.

Lemma 11.2.5. For $k = 1, 2, \dots, n$, we have $\mathrm{price}(e'_k) \le \mathrm{OPT}/(n - k + 1)$.


Proof. Consider the iteration in which $e'_k$ is covered. At the start of that iteration the uncovered elements include $e'_k, e'_{k+1}, \dots, e'_n$, i.e. at least $n - k + 1$ elements, and they can all be covered at cost at most OPT, namely by the sets of an optimal solution. Therefore, by averaging, there is a subset in the optimal solution which has not yet been selected and whose cost effectiveness is at most $\mathrm{OPT}/(n - k + 1)$. Since our algorithm picks a set with minimum cost effectiveness, we get $\mathrm{price}(e'_k) \le \mathrm{OPT}/(n - k + 1)$.

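Theorem 11.2.6. Greedy Set Cover is an $H_n$-approximation algorithm, where $H_n = 1 + \frac{1}{2} + \dots + \frac{1}{n} \le 1 + \ln n$; that is, its approximation ratio is roughly $\ln(n)$.

Proof.

$$A(I) = \sum_{k=1}^{n} \mathrm{price}(e'_k) \le \left( \frac{1}{n} + \frac{1}{n - 1} + \dots + \frac{1}{2} + 1 \right) \mathrm{OPT} = H_n \cdot \mathrm{OPT} \le (1 + \ln n)\, \mathrm{OPT}$$

The first equality is due to Observation 11.2.4, and the inequality that follows is derived from the above lemma. Thus, Greedy Set Cover is (up to the additive 1 in the factor) a $\ln(n)$-approximation algorithm.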

Tightness of Greedy Set Cover

$U = \{e_1, e_2, \dots, e_n\}$ and $\mathcal{S} = \{S_1, S_2, \dots, S_n\}$. The elements and cost of each set are given below:

$S_1 = \{1\}$, $c(S_1) = 1/n$
$S_2 = \{2\}$, $c(S_2) = 1/(n - 1)$
$S_3 = \{3\}$, $c(S_3) = 1/(n - 2)$
...
$S_{n-1} = \{n - 1\}$, $c(S_{n-1}) = 1/2$
$S_n = \{1, 2, \dots, n\}$, $c(S_n) = 1 + \varepsilon$

Here $\varepsilon > 0$ is some negligible value. Clearly the optimal set cover is $\{S_n\}$, with cost $1 + \varepsilon$. However, in iteration $i$ our algorithm picks the set that has the minimum cost effectiveness. The $\varepsilon$ makes sure that the cost effectiveness of $S_i$, namely $1/(n - i + 1)$, is smaller than that of $S_n$, namely $(1 + \varepsilon)/(n - i + 1)$. In this way, we will choose all $n$ subsets and get a solution with cost

$$\frac{1}{n} + \frac{1}{n - 1} + \dots + \frac{1}{2} + (1 + \varepsilon) = H_n + \varepsilon \ge \ln n$$

So for any $n$ we have constructed a worst case with approximation ratio about $\ln(n)$. In conclusion, the $\ln(n)$ approximation ratio is tight for Greedy Set Cover.


11.2.4 LP-rounding Set Cover

Integer Program

We express the Set Cover problem as an Integer Program in the following form:

Minimize $\sum_{i=1}^{k} c(S_i)\, u_i$
Subject to $\sum_{i : e \in S_i} u_i \ge 1$, $\forall e \in U$
$u_i \in \{0, 1\}$, $i = 1, 2, \dots, k$

where $u_i$ is the boolean variable that indicates whether or not set $S_i$ has been chosen: $S_i$ is not chosen when $u_i = 0$, and $S_i$ is chosen when $u_i = 1$.

Example 11.2.7. $U = \{1, 2, 3, 4\}$ and the sets and cost of each are given below:

$S_1 = \{1, 2\}$, $c(S_1) = 5$
$S_2 = \{2, 3\}$, $c(S_2) = 100$
$S_3 = \{2, 3, 4\}$, $c(S_3) = 1$

We can write the set cover problem above as the following integer program:

Minimize $5u_1 + 100u_2 + u_3$
Subject to
element 1: $u_1 \ge 1$
element 2: $u_1 + u_2 + u_3 \ge 1$
element 3: $u_2 + u_3 \ge 1$
element 4: $u_3 \ge 1$
$u_i \in \{0, 1\}$, $i \in \{1, 2, 3\}$

Relaxing to a Linear Program

The Integer Program constructed above properly represents the original Set Cover problem; however, solving an Integer Program is still NP-hard.

To get an approximation algorithm, we relax the integrality constraints and get a Linear Program.

Integer Program:
Minimize: $\sum_{i=1}^{k} u_i\, c(S_i)$
Subject to: $\sum_{i : e \in S_i} u_i \ge 1$, $\forall e \in U$
$u_i \in \{0, 1\}$, $i = 1, 2, \dots, k$

Linear Program:
Minimize: $\sum_{i=1}^{k} u_i\, c(S_i)$
Subject to: $\sum_{i : e \in S_i} u_i \ge 1$, $\forall e \in U$
$0 \le u_i \le 1$, $i = 1, 2, \dots, k$

We relax only the non-linear (integrality) constraints on $u_i$. In this way, we can solve the Linear Program in polynomial time. However, the relaxation of the constraints brings about a new problem: how can we interpret the result vector $u = (u_1, u_2, \dots, u_k)$ when some $u_i$'s are fractions?


Algorithm 10: LP-rounding Set Cover Algorithm

Input: element set $U = \{e_1, e_2, \dots, e_n\}$, subset set $\mathcal{S} = \{S_1, S_2, \dots, S_k\}$ and cost function $c : \mathcal{S} \to \mathbb{Z}^+$
Output: a set cover $C$ (of cost at most $f$ times the minimum)

1. Write the original Set Cover problem as an Integer Program.

2. Relax the Integer Program into a Linear Program.

3. Solve the Linear Program using the Ellipsoid Algorithm and obtain $u = (u_1, u_2, \dots, u_k) \in \mathbb{R}^k$.

4. Let $f$ be the maximum frequency (the maximum number of distinct subsets in which any single element appears).

5. Output the deterministic rounding $u' = (u'_1, u'_2, \dots, u'_k) \in \{0, 1\}^k$, where $u'_i$ satisfies

$$u'_i = \begin{cases} 1 & u_i \ge 1/f \\ 0 & \text{otherwise} \end{cases}$$

LP-rounding Set Cover Algorithm

The algorithm first writes the original Set Cover problem as an Integer Program, then relaxes the constraints of the IP to obtain a Linear Program. By solving this LP, we get a solution $u \in \mathbb{R}^k$. Lastly, we round each entry of the result vector $u$ using the threshold $1/f$, where $f$ is the maximum frequency from step 4.
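The rounding step itself needs no LP machinery once a fractional solution is available. The sketch below takes the fractional values as an input dictionary `u` (obtaining them from an LP solver is outside this sketch); the function names and the small tolerance `eps`, which guards against floating-point error in solver output, are assumptions of this illustration.

```python
def max_frequency(universe, subsets):
    """f = the largest number of subsets that any single element belongs to."""
    return max(sum(1 for s in subsets.values() if e in s) for e in universe)


def round_lp_solution(universe, subsets, u, eps=1e-9):
    """Pick every set whose fractional value u[name] is at least 1/f."""
    f = max_frequency(universe, subsets)
    return [name for name in subsets if u[name] >= 1.0 / f - eps]
```

For the instance with sets {1, 2}, {2, 3}, {1, 3} and unit costs, the point $u = (1/2, 1/2, 1/2)$ is feasible for the LP with cost 1.5; here $f = 2$, so all three sets are rounded up, giving cost 3, which is within the factor $f$ of the fractional cost.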

Effectiveness of LP-rounding

Lemma 11.2.8. LP-rounding Set Cover is an $f$-approximation.

Proof. First, we prove that $u'$ is a feasible solution. In order to show this, we need to examine each constraint and check that it is satisfied. Suppose

$$u_1 + u_2 + \dots + u_l \ge 1$$

is one of the constraints (after renumbering the variables). By definition of $f$, we have $l \le f$. Now, there must be some $i$ such that $u_i \ge 1/l \ge 1/f$. Then $u'_i$ is rounded to 1, and therefore the constraint is satisfied by $u'$. Thus, the resulting vector $u'$ is a feasible solution.

On the other hand, we look into the rounding process from $u_i$ to $u'_i$:

$$u'_i = \begin{cases} 1 & u_i \ge 1/f \\ 0 & \text{otherwise} \end{cases}$$

We can easily check that $u'_i \le f \cdot u_i$ in both cases. So

$$\sum_{i=1}^{k} c(S_i)\, u'_i \le \sum_{i=1}^{k} c(S_i)\, f\, u_i = f \sum_{i=1}^{k} c(S_i)\, u_i \le f \cdot \mathrm{OPT}$$

The first inequality is due to $u'_i \le f \cdot u_i$. The second inequality holds because relaxing the constraints enlarges the feasible region: the optimal solution of the Integer Program is also a feasible solution of the Linear Program, so the LP optimum is at most OPT.

IP --relax--> LP, hence $\mathrm{OPT}_{IP} \ge \mathrm{OPT}_{LP}$

This shows that LP-rounding for Set Cover produces a feasible solution for the problem, with approximation ratio $f$.

11.3 Vertex Cover

As we showed in Section 11.2.2, any Vertex Cover instance can be transformed into a Set Cover instance.

Example 11.3.1. $G = (V, E)$, where $V = \{u_1, u_2, u_3, u_4, u_5\}$ and $E = \{e_1, e_2, \dots, e_8\}$.

Figure 11.4: Vertex Cover Example

We can change the above Vertex Cover problem into a Set Cover problem, where

$U = \{e_1, e_2, e_3, e_4, e_5, e_6, e_7, e_8\}$
$\mathcal{S} = \{\{e_1, e_2, e_3\}, \{e_1, e_4, e_5\}, \{e_3, e_4, e_6, e_7\}, \{e_2, e_6, e_8\}, \{e_5, e_7, e_8\}\}$
$c(S_i) = 1$, $\forall S_i \in \mathcal{S}$

Using this construction, we can solve Vertex Cover with LP-rounding for Set Cover. Here the maximum frequency is the number of vertices incident to a particular edge, which is 2. We therefore have $f = 2$, and LP-rounding for Set Cover yields a 2-approximation algorithm for Vertex Cover.


Chapter 12

Rounding Technique for Approximation Algorithms (II)

In this chapter, we will give a randomized algorithm for Set Cover. The algorithm gives an approx-

imation factor of log n for Set Cover, and reaches this approximation ratio with high probability.

Then, we give a randomized algorithm for the MAX-K SAT problem. Furthermore, we will give a

deterministic algorithm which derandomizes the aforementioned MAX-K SAT algorithm using the

method of conditional expectation.

12.1 Randomized Algorithm for Integer Program of Set Cover

12.1.1 Algorithm Description

We know from the previous lecture how to express the Set Cover problem as an Integer Program. For the randomized algorithm, we do the same thing as in the last lecture and perform a linear relaxation:

Minimize: $\sum_{i=1}^{k} u_i\, c(S_i)$
Subject to: $\sum_{i : e \in S_i} u_i \ge 1$, $\forall e \in U$
$u_i \in [0, 1]$, $i = 1, 2, \dots, k$

Here is the algorithm:

Algorithm 11: LP-rounding Set Cover Algorithm

Input: element set $U = \{e_1, e_2, \dots, e_n\}$, subset set $\mathcal{S} = \{S_1, S_2, \dots, S_k\}$ and cost function $c : \mathcal{S} \to \mathbb{Z}^+$
Output: a collection $G$ of sets (with good probability a set cover of cost $O(\ln n) \cdot \mathrm{OPT}$)

1. Solve the LP relaxation of Set Cover, and let the solution be $(x_1, x_2, \dots, x_k) \in \mathbb{R}^k$ (e.g. using the Ellipsoid algorithm).

2. For each $i$, randomly choose $u_i \in_R [0, 1]$. If $u_i \le x_i$, set $u_i \leftarrow 1$, otherwise $u_i \leftarrow 0$. Then $\Pr(u_i = 1) = x_i$ and $\Pr(u_i = 0) = 1 - x_i$.

3. Repeat the second step $2 \ln n$ times. Output the collection $G$ of sets corresponding to the $u_i$ (if $u_i = 1$ in some repetition, then $S_i$ is picked into $G$).

12.1.2 The correctness of the Algorithm

First, we want to know the expectation of $C(G)$, the total cost of the random collection $G$ of sets picked over all iterations. Let $G'$ be the random collection of sets picked in one iteration. Then the expectation of $C(G')$ is

$$E[C(G')] = \sum_{i=1}^{k} \Pr(S_i \text{ is picked})\, C(S_i) = \sum_{i=1}^{k} x_i\, C(S_i) = \mathrm{OPT}_{fractional} \le \mathrm{OPT}$$

So the expectation of $C(G)$ is

$$E[C(G)] \le 2 \ln n \cdot E[C(G')] \le (2 \ln n)\, \mathrm{OPT}_{fractional} \le 2 \ln n \cdot \mathrm{OPT}$$
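Steps 2 and 3 of the algorithm can be sketched as follows, assuming the fractional LP solution is supplied as a dictionary `x` from set names to values in $[0, 1]$. The function name, the `rng` parameter and the rounding of $2 \ln n$ up to an integer number of repetitions are choices made for this illustration.

```python
import math
import random


def randomized_rounding(x, n, rng=random):
    """x maps each set name to its LP value x_i; n is the number of elements."""
    rounds = max(1, math.ceil(2 * math.log(n)))
    picked = set()
    for _ in range(rounds):
        for name, value in x.items():
            # rng.random() is uniform on [0, 1), so this branch is taken
            # with probability exactly `value`.
            if rng.random() < value:
                picked.add(name)
    return picked
```

A set with $x_i = 1$ is therefore always picked and a set with $x_i = 0$ never is; for fractional values the outcome is random, and the analysis that follows bounds both the expected cost and the probability that some element stays uncovered.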

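Markov's inequality: for $a > 0$,

$$\Pr(|X| \ge a) \le \frac{E(|X|)}{a}$$

Using Markov's inequality with $a = 8 \ln n \cdot \mathrm{OPT}$, we have

$$\Pr[C(G) \ge 8 \ln n \cdot \mathrm{OPT}] \le \frac{1}{4}$$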

Now, we want to know the probability that every $e_i$ has been covered. An element $e$ is not covered during one iteration of step 2 of the algorithm with probability

$$\Pr[e \text{ is not covered in one iteration}] = \prod_{i : e \in S_i} (1 - x_i) \le \prod_{i : e \in S_i} e^{-x_i} = e^{-\sum_{i : e \in S_i} x_i} \le \frac{1}{e}$$

The last inequality is due to $\sum_{i : e \in S_i} x_i \ge 1$.

Then the probability that $e$ is not covered in any of the $2 \ln n$ repetitions is:

$$P \le \left( \frac{1}{e} \right)^{2 \ln n} = \left( \frac{1}{n} \right)^2 = \frac{1}{n^2}$$

We know that:

$$\Pr(A \lor B) \le \Pr(A) + \Pr(B)$$

So the probability of having an element which is not covered is:

$$\Pr[\exists e \text{ not covered}] = \Pr[e_1 \text{ is not covered} \lor e_2 \text{ is not covered} \lor \dots \lor e_n \text{ is not covered}] \le n \cdot \frac{1}{n^2} = \frac{1}{n}$$

So

$$\Pr[\text{all elements are covered}] \ge 1 - \frac{1}{n}$$

Let $c$ be the event that all elements are covered and let $c'$ be the event that $C(G) \ge 8 \ln n \cdot \mathrm{OPT}$. Then we have:

$$\Pr[c \land \lnot c'] \ge 1 - \frac{1}{n} - \frac{1}{4}$$

which is greater than $\frac{1}{2}$ for $n > 4$.

Hence, we have an algorithm that succeeds with probability higher than $\frac{1}{2}$. Using Chernoff bounds we can repeat the algorithm a polynomial number of times so as to succeed with high probability.

12.2 Method of Conditional Expectations

12.2.1 MAX-K-SAT problem

Let us introduce the MAX-K-SAT problem.

Input:

1. A K-CNF formula $c_1 \land c_2 \land \dots \land c_m$

2. A profit function $C : \{c_1, c_2, \dots, c_m\} \to \mathbb{Z}^+$

Output:

A truth assignment that maximizes the profit, i.e. the total profit of the satisfied clauses (the number of satisfied clauses when all profits are 1).

We know that MAX-2-SAT is an NP-hard problem. But we have a derandomized algorithm that achieves at least one half of the optimum for the problem.

12.2.2 Derandomized Algorithm

The derandomized algorithm can be described as follows:

Consider first the randomized algorithm that sets every variable to 1 with probability $\frac{1}{2}$ and to 0 with probability $\frac{1}{2}$, independently. Using linearity of expectation, $E(X + Y) = E(X) + E(Y)$, we can calculate the expectation of the total profit. Moreover, if we have already fixed the values of some variables, we can still calculate in polynomial time the conditional expectation of the total profit over the remaining random variables. By making use of these facts, we can design a derandomized algorithm that always achieves a profit of at least one half of OPT.

Let $W_{c_i} = C(c_i)$ if $c_i$ is satisfied, and $W_{c_i} = 0$ otherwise. Let $W = \sum_{i=1}^{m} W_{c_i}$.

For step $i$:

1. We calculate $E[W \mid x_i = 1]$ and $E[W \mid x_i = 0]$ (conditioned also on the values already chosen for $x_1, \dots, x_{i-1}$) in polynomial time.

2. We set $x_i$ to the value with the bigger conditional expectation of $W$.

12.2.3 The proof of correctness

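First, we have

$$E[W] = \frac{1}{2} E[W \mid x_i = 1] + \frac{1}{2} E[W \mid x_i = 0]$$

where all expectations are conditioned on the values already fixed. So the bigger one of $E[W \mid x_i = 1]$ and $E[W \mid x_i = 0]$ must be at least $E[W]$. Hence, at the end of the algorithm, we must have a profit of at least the initial $E[W]$.

And we know

$$\Pr[c_i \text{ is satisfied}] = 1 - \frac{1}{2^k} \ge \frac{1}{2}$$

where $k \ge 1$ is the number of literals of $c_i$. So

$$E[W_{c_i}] \ge \frac{1}{2} C(c_i)$$

So

$$E[W] = E[W_{c_1}] + E[W_{c_2}] + \dots + E[W_{c_m}] \ge \frac{1}{2} \left( C(c_1) + C(c_2) + \dots + C(c_m) \right) \ge \frac{1}{2} \mathrm{OPT}$$

So the profit we get at the end of the algorithm is greater than or equal to $\frac{1}{2} \mathrm{OPT}$.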
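The method of conditional expectations described in this section can be sketched as below. The clause encoding is an assumption of this illustration: each clause is a list of non-zero integers, where `v` stands for the variable $x_v$ and `-v` for its negation. The function names are hypothetical.

```python
def expected_profit(clauses, weights, assignment):
    """E[W | assignment], with the remaining variables uniform and independent."""
    total = 0.0
    for clause, weight in zip(clauses, weights):
        unassigned = set()
        satisfied = False
        for lit in clause:
            var = abs(lit)
            if var in assignment:
                if assignment[var] == (lit > 0):
                    satisfied = True
            else:
                unassigned.add(lit)
        if satisfied:
            total += weight
        elif any(-lit in unassigned for lit in unassigned):
            total += weight  # contains x and not-x: always satisfied
        else:
            # All unassigned literals must be false: probability 2^-len.
            total += weight * (1.0 - 0.5 ** len(unassigned))
    return total


def derandomized_max_sat(num_vars, clauses, weights):
    """Fix x_1, ..., x_n one at a time, never lowering the conditional expectation."""
    assignment = {}
    for var in range(1, num_vars + 1):
        assignment[var] = True
        if_true = expected_profit(clauses, weights, assignment)
        assignment[var] = False
        if_false = expected_profit(clauses, weights, assignment)
        assignment[var] = if_true >= if_false
    return assignment
```

Once every variable is fixed, `expected_profit` returns the actual profit of the assignment, which by the argument above is at least half of the total weight and hence at least $\frac{1}{2}\mathrm{OPT}$.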


Chapter 13

Primal Dual Method

In this lecture, we introduce Primal-Dual approximation algorithms. We also construct an ap-

proximation algorithm to solve SET COVER as an example.

13.1 Deﬁnition

From the previous lectures, we know that every Linear Programming problem has a corresponding Dual problem:

PRIMAL:
Minimize: $\sum_{j=1}^{n} c_j x_j$
Subject to: $\sum_{j=1}^{n} a_{ij} x_j \ge b_i$, $i = 1, 2, \dots, m$
$x_j \ge 0$, $j = 1, 2, \dots, n$

DUAL:
Maximize: $\sum_{i=1}^{m} b_i y_i$
Subject to: $\sum_{i=1}^{m} a_{ij} y_i \le c_j$, $j = 1, 2, \dots, n$
$y_i \ge 0$, $i = 1, 2, \dots, m$

Also from previous lectures, we know that feasible solutions $x$ and $y$ of the Primal and Dual are both optimal if and only if the following conditions hold:

• Primal: $\forall j = 1, 2, \dots, n$, $x_j = 0$ or $\sum_{i=1}^{m} a_{ij} y_i = c_j$

• Dual: $\forall i = 1, 2, \dots, m$, $y_i = 0$ or $\sum_{j=1}^{n} a_{ij} x_j = b_i$

We give a relaxation of the above conditions as follows:

• Primal complementary slackness conditions:
For $\alpha \ge 1$, $\forall j = 1, 2, \dots, n$, either $x_j = 0$ or $\frac{c_j}{\alpha} \le \sum_{i=1}^{m} a_{ij} y_i \le \alpha c_j$

• Dual complementary slackness conditions:
For $\beta \ge 1$, $\forall i = 1, 2, \dots, m$, either $y_i = 0$ or $\frac{b_i}{\beta} \le \sum_{j=1}^{n} a_{ij} x_j \le \beta b_i$

Theorem 13.1.1. If $x$, $y$ are feasible solutions to PRIMAL and DUAL respectively, and they satisfy the primal conditions and the dual conditions, then $\sum_{j=1}^{n} c_j x_j \le \alpha \beta \cdot \mathrm{OPT}$.


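Proof.

$$\sum_{j=1}^{n} c_j x_j \le \alpha \sum_{j=1}^{n} \Big( \sum_{i=1}^{m} a_{ij} y_i \Big) x_j = \alpha \sum_{i=1}^{m} y_i \sum_{j=1}^{n} a_{ij} x_j \le \alpha \beta \sum_{i=1}^{m} b_i y_i \le \alpha \beta \cdot \mathrm{OPT}$$

The first inequality uses the primal conditions, the second uses the dual conditions, and the last follows from weak duality.

With the above theorem, the basic idea of the primal dual method is: set $x = \vec 0$, $y = \vec 0$ at the beginning, then modify them step by step until $x$ is feasible and the relaxed conditions hold, which yields an $\alpha \beta$-approximate solution.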

13.2 Set Cover Problem

Since we have discussed the Set Cover Problem at length in the previous lectures, here we only give a brief description:

Input:

$U = \{e_1, e_2, \dots, e_n\}$, $\mathcal{S} = \{S_1, S_2, \dots, S_k\}$, $C : \mathcal{S} \to \mathbb{Z}^+$.

Output:

A cover $\mathcal{S}' \subseteq \mathcal{S}$ of $U$ with minimal total cost.

We now give a PRIMAL for the Set Cover problem as well as its DUAL:

• PRIMAL form: Minimize $\sum_{S \in \mathcal{S}} c(S)\, x_S$, subject to $\forall e \in U$, $\sum_{S : e \in S} x_S \ge 1$.

• DUAL form: Maximize $\sum_{e \in U} y_e$, subject to $\forall S \in \mathcal{S}$, $\sum_{e : e \in S} y_e \le c(S)$, $y_e \ge 0$.

We also have the following Primal and Dual conditions:

• The Primal conditions: Let $\alpha = 1$; $x_S \ne 0 \Rightarrow \sum_{e : e \in S} y_e = c(S)$.

• The Dual conditions: Let $\beta = f$; $y_e \ne 0 \Rightarrow \sum_{S : e \in S} x_S \le f$. (Here $f$ is the maximal number of sets in which an element appears.)

Algorithm 12: Primal Dual Algorithm for Set Cover

Let $x \leftarrow \vec 0$, $y \leftarrow \vec 0$.
We call a set $S$ satisfying the primal condition $\sum_{e \in S} y_e = c(S)$ "tight".
repeat
    Pick an uncovered $e \in U$.
    Increase $y_e$ until some set $S$ becomes tight.
    Pick all tight sets into the cover and update $x$ (set $x_S = 1$).
    Mark the elements of these sets as covered.
until all elements are covered;

Claim 13.2.1. The algorithm is an $f$-approximation for SET-COVER.

Proof. At the end of its execution, the algorithm has covered every element $e$, so the constraints of the Primal form are satisfied, i.e. $x$ is a feasible solution for the primal.

In addition, the algorithm never violates the constraints of the Dual, since $y_e$ stops increasing as soon as a set becomes tight. Thus, $y$ is feasible for the dual.

The Primal conditions (with $\alpha = 1$) are satisfied by the way the algorithm works: only tight sets are picked.

Finally, by definition an element $e \in U$ is contained in at most $f$ sets and $x_S \le 1$ for all $S \in \mathcal{S}$, so the Dual conditions (with $\beta = f$) are always satisfied.

By Theorem 13.1.1, we therefore end up with an $f$-approximation algorithm.
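A compact sketch of Algorithm 12 follows. It assumes every element belongs to at least one set, processes the uncovered elements in sorted order (the notes leave the order open), and tracks for each set the remaining slack $c(S) - \sum_{e \in S} y_e$; the function name and the numerical tolerance are choices made for this illustration.

```python
def primal_dual_set_cover(universe, subsets, cost):
    """Return (picked set names, dual values y) following Algorithm 12."""
    y = {e: 0.0 for e in universe}
    # slack[S] = c(S) minus the sum of y_e over e in S; S is tight at slack 0.
    slack = {name: float(cost[name]) for name in subsets}
    picked, covered = [], set()
    for e in sorted(universe):
        if e in covered:
            continue
        containing = [name for name in subsets if e in subsets[name]]
        # Raise y_e until the first set containing e becomes tight.
        raise_by = min(slack[name] for name in containing)
        y[e] += raise_by
        for name in containing:
            slack[name] -= raise_by
            if slack[name] <= 1e-12 and name not in picked:
                picked.append(name)  # the set became tight: x_S = 1
                covered |= subsets[name]
    return picked, y
```

On the instance with sets A = {1, 2}, B = {2, 3}, C = {1, 3} and unit costs, raising $y_1$ to 1 makes A and C tight at the same time, so both are picked. The cover costs 2, while the dual value is 1, matching the bound $f \cdot \sum_e y_e$ with $f = 2$.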


Finally, we give an example which shows that the approximation ratio $f$ is also a lower bound for the algorithm.

Figure 13.1: An example of a case where the solution of the Primal Dual algorithm is f times the optimal

Let $c(S) = 1$ for all $S \in \mathcal{S}$. We can see that OPT is $x_{n+1} = 1$ with total cost value $= 1 + \varepsilon$. The Primal Dual algorithm will choose sets $x_1 = x_2 = \dots = x_{n+1} = 1$, with a total cost value $= n + 1$. So we have that $\frac{n + 1}{1 + \varepsilon} \to n + 1 = f$. This example shows that $f$ is a lower bound for the approximation ratio of the Primal Dual method.
