
**Design and Analysis of Computer Algorithms**

David M. Mount

Department of Computer Science

University of Maryland

Fall 2003

Copyright, David M. Mount, 2004, Dept. of Computer Science, University of Maryland, College Park, MD, 20742. These lecture notes were prepared by David Mount for the course CMSC 451, Design and Analysis of Computer Algorithms, at the University of Maryland. Permission to use, copy, modify, and distribute these notes for educational purposes and without fee is hereby granted, provided that this copyright notice appear in all copies.

Lecture Notes 1 CMSC 451

Lecture 1: Course Introduction

Read: (All readings are from Cormen, Leiserson, Rivest and Stein, Introduction to Algorithms, 2nd Edition). Review

Chapts. 1–5 in CLRS.

What is an algorithm? Our text deﬁnes an algorithm to be any well-deﬁned computational procedure that takes some

values as input and produces some values as output. Like a cooking recipe, an algorithm provides a step-by-step

method for solving a computational problem. Unlike programs, algorithms are not dependent on a particular

programming language, machine, system, or compiler. They are mathematical entities, which can be thought of

as running on some sort of idealized computer with an inﬁnite random access memory and an unlimited word

size. Algorithm design is all about the mathematical theory behind the design of good programs.

Why study algorithm design? Programming is a very complex task, and there are a number of aspects of programming that make it so complex. The first is that most programming projects are very large, requiring the coordinated efforts of many people. (This is the topic of a course like software engineering.) The next is that many

programming projects involve storing and accessing large quantities of data efﬁciently. (This is the topic of

courses on data structures and databases.) The last is that many programming projects involve solving complex

computational problems, for which simplistic or naive solutions may not be efﬁcient enough. The complex

problems may involve numerical data (the subject of courses on numerical analysis), but often they involve

discrete data. This is where the topic of algorithm design and analysis is important.

Although the algorithms discussed in this course will often represent only a tiny fraction of the code that is

generated in a large software system, this small fraction may be very important for the success of the overall

project. An unfortunately common approach to this problem is to ﬁrst design an inefﬁcient algorithm and

data structure to solve the problem, and then take this poor design and attempt to ﬁne-tune its performance. The

problem is that if the underlying design is bad, then often no amount of ﬁne-tuning is going to make a substantial

difference.

The focus of this course is on how to design good algorithms, and how to analyze their efﬁciency. This is among

the most basic aspects of good programming.

Course Overview: This course will consist of a number of major sections. The ﬁrst will be a short review of some

preliminary material, including asymptotics, summations, recurrences, and sorting. These have been covered

in earlier courses, and so we will breeze through them pretty quickly. We will then discuss approaches to

designing optimization algorithms, including dynamic programming and greedy algorithms. The next major

focus will be on graph algorithms. This will include a review of breadth-ﬁrst and depth-ﬁrst search and their

application in various problems related to connectivity in graphs. Next we will discuss minimum spanning trees,

shortest paths, and network ﬂows. We will brieﬂy discuss algorithmic problems arising from geometric settings,

that is, computational geometry.

Most of the emphasis of the first portion of the course will be on problems that can be solved efficiently; in the latter portion we will discuss intractability and NP-hard problems. These are problems for which no efficient

solution is known. Finally, we will discuss methods to approximate NP-hard problems, and how to prove how

close these approximations are to the optimal solutions.

Issues in Algorithm Design: Algorithms are mathematical objects (in contrast to the much more concrete notion of

a computer program implemented in some programming language and executing on some machine). As such,

we can reason about the properties of algorithms mathematically. When designing an algorithm there are two

fundamental issues to be considered: correctness and efﬁciency.

It is important to justify an algorithm’s correctness mathematically. For very complex algorithms, this typically

requires a careful mathematical proof, which may require the proof of many lemmas and properties of the

solution, upon which the algorithm relies. For simple algorithms (BubbleSort, for example) a short intuitive

explanation of the algorithm’s basic invariants is sufﬁcient. (For example, in BubbleSort, the principal invariant

is that on completion of the ith iteration, the last i elements are in their proper sorted positions.)
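The BubbleSort invariant just mentioned can be checked mechanically. The following is a minimal Python sketch under our own assumptions: the function name `bubble_sort` and the use of `assert` to test the invariant are illustrative choices, not part of the notes, and the assertion is there only to make the invariant visible (it slows the sort down considerably).

```python
def bubble_sort(a):
    """Sort the list a in place and check the principal invariant after each pass."""
    n = len(a)
    for i in range(1, n + 1):            # i-th pass, i = 1..n
        for j in range(0, n - i):        # bubble the largest remaining element rightward
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
        # Invariant: after the i-th pass, the last i elements
        # are in their proper sorted positions.
        assert a[n - i:] == sorted(a)[n - i:]
    return a
```

If the invariant ever failed, the `assert` would raise an error; on termination (i = n) the invariant says the whole list is sorted.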


Establishing efﬁciency is a much more complex endeavor. Intuitively, an algorithm’s efﬁciency is a function

of the amount of computational resources it requires, measured typically as execution time and the amount of

space, or memory, that the algorithm uses. The amount of computational resources can be a complex function of

the size and structure of the input set. In order to reduce matters to their simplest form, it is common to consider

efﬁciency as a function of input size. Among all inputs of the same size, we consider the maximum possible

running time. This is called worst-case analysis. It is also possible, and often more meaningful, to measure average-case running time. Average-case analyses tend to be more complex, and may require that some probability

distribution be deﬁned on the set of inputs. To keep matters simple, we will usually focus on worst-case analysis

in this course.

Throughout this course, when you are asked to present an algorithm, this means that you need to do three

things:

• Present a clear, simple and unambiguous description of the algorithm (in pseudo-code, for example). The key here is “keep it simple.” Uninteresting details should be kept to a minimum, so that the key computational issues stand out. (For example, it is not necessary to declare variables whose purpose is obvious, and it is often simpler and clearer to simply say, “Add X to the end of list L” than to present code to do this or use some arcane syntax, such as “L.insertAtEnd(X).”)

• Present a justification or proof of the algorithm’s correctness. Your justification should assume that the reader is someone of similar background as yourself, say another student in this class, and should be convincing enough to make a skeptic believe that your algorithm does indeed solve the problem correctly. Avoid rambling about obvious or trivial elements. A good proof provides an overview of what the algorithm does, and then focuses on any tricky elements that may not be obvious.

• Present a worst-case analysis of the algorithm’s efficiency, typically its running time (but also its space, if space is an issue). Sometimes this is straightforward, but if not, concentrate on the parts of the analysis that are not obvious.

Note that the presentation does not need to be in this order. Often it is good to begin with an explanation of

how you derived the algorithm, emphasizing particular elements of the design that establish its correctness and

efﬁciency. Then, once this groundwork has been laid down, present the algorithm itself. If this seems to be a bit

abstract now, don’t worry. We will see many examples of this process throughout the semester.

Lecture 2: Mathematical Background

Read: Review Chapters 1–5 in CLRS.

Algorithm Analysis: Today we will review some of the basic elements of algorithm analysis, which were covered in

previous courses. These include asymptotics, summations, and recurrences.

Asymptotics: Asymptotics involves O-notation (“big-Oh”) and its many relatives, Ω, Θ, o (“little-Oh”), ω. Asymp-

totic notation provides us with a way to simplify the functions that arise in analyzing algorithm running times

by ignoring constant factors and concentrating on the trends for large values of n. For example, it allows us to

reason that for three algorithms with the respective running times

$$n^3 \log n + 4n^2 + 52\,n \log n \in \Theta(n^3 \log n)$$

$$15n^2 + 7n \log^3 n \in \Theta(n^2)$$

$$3n + 4\log^5 n + 19n^2 \in \Theta(n^2).$$

Thus, the ﬁrst algorithm is signiﬁcantly slower for large n, while the other two are comparable, up to a constant

factor.

Since asymptotics were covered in earlier courses, I will assume that this is familiar to you. Nonetheless, here

are a few facts to remember about asymptotic notation:


Ignore constant factors: Multiplicative constant factors are ignored. For example, $347n$ is $\Theta(n)$. Constant factors appearing in exponents cannot be ignored. For example, $2^{3n}$ is not $O(2^n)$.

Focus on large n: Asymptotic analysis means that we consider trends for large values of $n$. Thus, the fastest growing function of $n$ is the only one that needs to be considered. For example, $3n^2 \log n + 25\,n \log n + (\log n)^7$ is $\Theta(n^2 \log n)$.

Polylog, polynomial, and exponential: These are the most common functions that arise in analyzing algorithms:

Polylogarithmic: Powers of $\log n$, such as $(\log n)^7$. We will usually write this as $\log^7 n$.

Polynomial: Powers of $n$, such as $n^4$ and $\sqrt{n} = n^{1/2}$.

Exponential: A constant (not 1) raised to the power $n$, such as $3^n$.

An important fact is that polylogarithmic functions are strictly asymptotically smaller than polynomial functions, which are strictly asymptotically smaller than exponential functions (assuming the base of the exponent is bigger than 1). For example, if we let $\prec$ mean “asymptotically smaller” then

$$\log^a n \prec n^b \prec c^n$$

for any $a$, $b$, and $c$, provided that $b > 0$ and $c > 1$.

Logarithm Simplification: It is a good idea to first simplify terms involving logarithms. For example, the following formulas are useful. Here $a$, $b$, $c$ are constants:

$$\log_b n = \frac{\log_a n}{\log_a b} = \Theta(\log_a n)$$

$$\log_a (n^c) = c \log_a n = \Theta(\log_a n)$$

$$b^{\log_a n} = n^{\log_a b}.$$

Avoid using $\log n$ in exponents. The last rule above can be used to achieve this. For example, rather than saying $3^{\log_2 n}$, express this as $n^{\log_2 3} \approx n^{1.585}$.

Following the conventional sloppiness, I will often say $O(n^2)$, when in fact the stronger statement $\Theta(n^2)$ holds. (This is just because it is easier to say “oh” than “theta”.)

Summations: Summations naturally arise in the analysis of iterative algorithms. Also, more complex forms of analysis, such as recurrences, are often solved by reducing them to summations. Solving a summation means reducing it to a closed-form formula, that is, one having no summations, recurrences, integrals, or other complex operators.

In algorithm design it is often not necessary to solve a summation exactly, since an asymptotic approximation or

close upper bound is usually good enough. Here are some common summations and some tips to use in solving

summations.

Constant Series: For integers $a$ and $b$,

$$\sum_{i=a}^{b} 1 = \max(b - a + 1,\, 0).$$

Notice that when $b = a - 1$, there are no terms in the summation (since the index is assumed to count upwards only), and the result is 0. Be careful to check that $b \ge a - 1$ before applying this formula blindly.

Arithmetic Series: For $n \ge 0$,

$$\sum_{i=0}^{n} i = 1 + 2 + \cdots + n = \frac{n(n+1)}{2}.$$

This is $\Theta(n^2)$. (The starting bound could have just as easily been set to 1 as 0.)


Geometric Series: Let $x \ne 1$ be any constant (independent of $n$), then for $n \ge 0$,

$$\sum_{i=0}^{n} x^i = 1 + x + x^2 + \cdots + x^n = \frac{x^{n+1} - 1}{x - 1}.$$

If $0 < x < 1$ then this is $\Theta(1)$. If $x > 1$, then this is $\Theta(x^n)$, that is, the entire sum is proportional to the last element of the series.

Quadratic Series: For $n \ge 0$,

$$\sum_{i=0}^{n} i^2 = 1^2 + 2^2 + \cdots + n^2 = \frac{2n^3 + 3n^2 + n}{6}.$$

Linear-geometric Series: This arises in some algorithms based on trees and recursion. Let $x \ne 1$ be any constant, then for $n \ge 0$,

$$\sum_{i=0}^{n-1} i x^i = x + 2x^2 + 3x^3 + \cdots + (n-1)x^{n-1} = \frac{(n-1)x^{n+1} - n x^n + x}{(x-1)^2}.$$

For $x > 1$, as $n$ becomes large, this is asymptotically dominated by the term $(n-1)x^{n+1}/(x-1)^2$. The multiplicative term $n - 1$ is very nearly equal to $n$ for large $n$, and, since $x$ is a constant, we may multiply this times the constant $(x-1)^2/x$ without changing the asymptotics. What remains is $\Theta(n x^n)$.

Harmonic Series: This arises often in probabilistic analyses of algorithms. It does not have an exact closed-form solution, but it can be closely approximated. For $n \ge 1$,

$$H_n = \sum_{i=1}^{n} \frac{1}{i} = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} = (\ln n) + O(1).$$
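A quick way to gain confidence in closed forms like these is to compare them against brute-force sums for small n. The sketch below does this in Python; the helper names `check_closed_forms` and `harmonic` are our own, and exact rational arithmetic (`fractions.Fraction`) is used so that the comparisons are not disturbed by rounding. The geometric and linear-geometric formulas require x to be different from 1.

```python
from fractions import Fraction
import math

def check_closed_forms(n, x):
    """Compare each closed form with a brute-force sum (exact arithmetic)."""
    x = Fraction(x)
    assert x != 1                      # the geometric formulas divide by (x - 1)
    # Arithmetic series: sum_{i=0}^{n} i
    assert sum(range(n + 1)) == n * (n + 1) // 2
    # Geometric series: sum_{i=0}^{n} x^i
    assert sum(x**i for i in range(n + 1)) == (x**(n + 1) - 1) / (x - 1)
    # Quadratic series: sum_{i=0}^{n} i^2
    assert sum(i * i for i in range(n + 1)) == (2 * n**3 + 3 * n**2 + n) // 6
    # Linear-geometric series: sum_{i=0}^{n-1} i x^i
    lhs = sum(i * x**i for i in range(n))
    rhs = ((n - 1) * x**(n + 1) - n * x**n + x) / (x - 1)**2
    assert lhs == rhs
    return True

def harmonic(n):
    """H_n computed directly, in floating point."""
    return sum(1.0 / i for i in range(1, n + 1))
```

For the harmonic series, `harmonic(n) - math.log(n)` stays bounded as n grows, which is what the O(1) term expresses.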

There are also a few tips to learn about solving summations.

Summations with general bounds: When a summation does not start at 1 or 0, as most of the above formulas assume, you can just split it up into the difference of two summations. For example, for $1 \le a \le b$,

$$\sum_{i=a}^{b} f(i) = \sum_{i=0}^{b} f(i) - \sum_{i=0}^{a-1} f(i).$$

Linearity of Summation: Constant factors and added terms can be split out to make summations simpler.

$$\sum \bigl(4 + 3i(i-2)\bigr) = \sum \bigl(4 + 3i^2 - 6i\bigr) = \sum 4 + 3\sum i^2 - 6\sum i.$$

Now the formulas can be applied to each summation individually.

Approximate using integrals: Integration and summation are closely related. (Integration is in some sense a continuous form of summation.) Here is a handy formula. Let $f(x)$ be any monotonically increasing function (the function increases as $x$ increases).

$$\int_0^n f(x)\,dx \;\le\; \sum_{i=1}^{n} f(i) \;\le\; \int_1^{n+1} f(x)\,dx.$$
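As a worked instance of the integral bound, take $f(x) = x^2$. This choice is ours, made only because the function is monotonically increasing for $x \ge 0$ and its sum has a known closed form to compare against:

```latex
\frac{n^3}{3} \;=\; \int_0^n x^2\,dx
\;\le\; \sum_{i=1}^{n} i^2
\;\le\; \int_1^{n+1} x^2\,dx \;=\; \frac{(n+1)^3 - 1}{3}.
```

Both bounds are $\Theta(n^3)$, which agrees with the exact value $(2n^3 + 3n^2 + n)/6$ of the quadratic series. For example, with $n = 3$ the chain reads $9 \le 14 \le 21$.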

Example: Right Dominant Elements As an example of the use of summations in algorithm analysis, consider the

following simple problem. We are given a list L of numeric values. We say that an element of L is right

dominant if it is strictly larger than all the elements that follow it in the list. Note that the last element of the list


is always right dominant, as is the last occurrence of the maximum element of the array. For example, consider

the following list.

$$L = \langle 10, 9, 5, 13, 2, 7, 1, 8, 4, 6, 3 \rangle$$

The sequence of right dominant elements is $\langle 13, 8, 6, 3 \rangle$.

In order to make this more concrete, we should think about how L is represented. It will make a difference

whether L is represented as an array (allowing for random access), a doubly linked list (allowing for sequential

access in both directions), or a singly linked list (allowing for sequential access in only one direction). Among

the three possible representations, the array representation seems to yield the simplest and clearest algorithm.

However, we will design the algorithm in such a way that it only performs sequential scans, so it could also

be implemented using a singly linked or doubly linked list. (This is common in algorithms. Choose your representation to make the algorithm as simple and clear as possible, but give thought to how it may actually be implemented. Remember that algorithms are read by humans, not compilers.) We will assume here that the

array L of size n is indexed from 1 to n.

Think for a moment how you would solve this problem. Can you see an $O(n)$ time algorithm? (If not, think a little harder.) To illustrate summations, we will first present a naive $O(n^2)$ time algorithm, which operates by simply checking for each element of the array whether all the subsequent elements are strictly smaller. (Although this example is pretty stupid, it will also serve to illustrate the sort of style that we will use in presenting algorithms.)

Right Dominant Elements (Naive Solution)

    // Input: List L of numbers given as an array L[1..n]
    // Returns: List D containing the right dominant elements of L
    RightDominant(L) {
        D = empty list
        for (i = 1 to n) {
            isDominant = true
            for (j = i+1 to n)
                if (L[i] <= L[j]) isDominant = false
            if (isDominant) append L[i] to D
        }
        return D
    }

If I were programming this, I would rewrite the inner (j) loop as a while loop, since we can terminate the loop as soon as we find that L[i] is not dominant. Again, this sort of optimization is good to keep in mind in programming, but will be omitted since it will not affect the worst-case running time.
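For readers who want to run the naive algorithm, here is one possible Python transcription. It is a sketch under our own assumptions: the name `right_dominant_naive` and the extra iteration counter are not part of the notes, and Python lists are indexed from 0 rather than 1. The counter records how many times the inner loop body executes, which for a list of length n is n(n-1)/2.

```python
def right_dominant_naive(values):
    """Return (right dominant elements, number of inner-loop iterations)."""
    n = len(values)
    dominant = []
    iterations = 0
    for i in range(n):                 # 0-based; the notes index from 1
        is_dominant = True
        for j in range(i + 1, n):      # no early exit, as in the pseudocode
            iterations += 1
            if values[i] <= values[j]:
                is_dominant = False
        if is_dominant:
            dominant.append(values[i])
    return dominant, iterations
```

On the example list of 11 elements this reports the dominant elements 13, 8, 6, 3 and 55 = 11 * 10 / 2 inner-loop iterations.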

The time spent in this algorithm is dominated (no pun intended) by the time spent in the inner (j) loop. On the

ith iteration of the outer loop, the inner loop is executed from i + 1 to n, for a total of n −(i + 1) + 1 = n −i

times. (Recall the rule for the constant series above.) Each iteration of the inner loop takes constant time. Thus,

up to a constant factor, the running time, as a function of n, is given by the following summation:

$$T(n) = \sum_{i=1}^{n} (n - i).$$

To solve this summation, let us expand it, and put it into a form such that the above formulas can be used.

$$\begin{aligned}
T(n) &= (n-1) + (n-2) + \cdots + 2 + 1 + 0 \\
     &= 0 + 1 + 2 + \cdots + (n-2) + (n-1) \\
     &= \sum_{i=0}^{n-1} i = \frac{(n-1)n}{2}.
\end{aligned}$$


The last step comes from applying the formula for the arithmetic series (using $n - 1$ in place of $n$ in the formula).

As mentioned above, there is a simple O(n) time algorithm for this problem. As an exercise, see if you can ﬁnd

it. As an additional challenge, see if you can design your algorithm so it only performs a single left-to-right scan

of the list L. (You are allowed to use up to O(n) working storage to do this.)

Recurrences: Another useful mathematical tool in algorithm analysis will be recurrences. They arise naturally in the

analysis of divide-and-conquer algorithms. Recall that these algorithms have the following general structure.

Divide: Divide the problem into two or more subproblems (ideally of roughly equal sizes),

Conquer: Solve each subproblem recursively, and

Combine: Combine the solutions to the subproblems into a single global solution.

How do we analyze recursive procedures like this one? If there is a simple pattern to the sizes of the recursive

calls, then the best way is usually by setting up a recurrence, that is, a function which is deﬁned recursively in

terms of itself. Here is a typical example. Suppose that we break the problem into two subproblems, each of size

roughly n/2. (We will assume exactly n/2 for simplicity.) The additional overhead of splitting and merging

the solutions is O(n). When the subproblems are reduced to size 1, we can solve them in O(1) time. We will

ignore constant factors, writing O(n) just as n, yielding the following recurrence:

$$T(n) = \begin{cases} 1 & \text{if } n = 1, \\ 2T(n/2) + n & \text{if } n > 1. \end{cases}$$

Note that, since we assume that $n$ is an integer, this recurrence is not well defined unless $n$ is a power of 2 (since otherwise $n/2$ will at some point be a fraction). To be formally correct, I should either write $\lfloor n/2 \rfloor$ or restrict the domain of $n$, but I will often be sloppy in this way.

There are a number of methods for solving the sort of recurrences that show up in divide-and-conquer algo-

rithms. The easiest method is to apply the Master Theorem, given in CLRS. Here is a slightly more restrictive

version, but adequate for a lot of instances. See CLRS for the more complete version of the Master Theorem

and its proof.

Theorem: (Simplified Master Theorem) Let $a \ge 1$, $b > 1$ be constants and let $T(n)$ be the recurrence

$$T(n) = aT(n/b) + cn^k,$$

defined for $n \ge 0$.

Case 1: If $a > b^k$ then $T(n)$ is $\Theta(n^{\log_b a})$.

Case 2: If $a = b^k$ then $T(n)$ is $\Theta(n^k \log n)$.

Case 3: If $a < b^k$ then $T(n)$ is $\Theta(n^k)$.

Using this version of the Master Theorem we can see that in our recurrence $a = 2$, $b = 2$, and $k = 1$, so $a = b^k$ and Case 2 applies. Thus $T(n)$ is $\Theta(n \log n)$.
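The case analysis of the simplified Master Theorem is mechanical enough to write down as code. The following Python sketch is illustrative only: the function names `simplified_master` and `T` are ours, the bound is returned as a plain string, and the comparison `a == b**k` is exact only for integer (or otherwise exactly representable) parameters. The second function evaluates the example recurrence directly for powers of 2, assuming T(1) = 1 as in the text.

```python
import math
from functools import lru_cache

def simplified_master(a, b, k):
    """Return the Theta-bound given by the simplified Master Theorem, as a string."""
    if a > b**k:                                   # Case 1
        return f"Theta(n^{math.log(a, b):.3f})"    # exponent is log_b a
    if a == b**k:                                  # Case 2
        return f"Theta(n^{k} log n)"
    return f"Theta(n^{k})"                         # Case 3

@lru_cache(maxsize=None)
def T(n):
    """T(n) = 2 T(n/2) + n with T(1) = 1, for n a power of 2."""
    return 1 if n == 1 else 2 * T(n // 2) + n
```

For powers of 2 the recurrence evaluates to exactly n lg n + n, which is consistent with the Θ(n log n) bound from Case 2.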

There are many recurrences that cannot be put into this form. For example, the following recurrence is quite common: $T(n) = 2T(n/2) + n \log n$. This solves to $T(n) = \Theta(n \log^2 n)$, but the Master Theorem (either this form or the one in CLRS) will not tell you this. For such recurrences, other methods are needed.
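One such method is to expand the recurrence level by level. The derivation below is a sketch under assumptions that are ours rather than the notes': $n = 2^m$ is a power of 2, logarithms are base 2, and $T(1) = 1$.

```latex
\begin{aligned}
T(n) &= \sum_{i=0}^{m-1} 2^i \cdot \frac{n}{2^i} \lg\frac{n}{2^i} \;+\; 2^m\, T(1) \\
     &= n \sum_{i=0}^{m-1} (m - i) \;+\; n \\
     &= n \cdot \frac{m(m+1)}{2} + n
      \;=\; \frac{n \lg n\,(\lg n + 1)}{2} + n
      \;=\; \Theta(n \log^2 n).
\end{aligned}
```

Level $i$ of the recursion has $2^i$ subproblems of size $n/2^i$, each contributing $(n/2^i)\lg(n/2^i)$, and the inner sum is the arithmetic series $1 + 2 + \cdots + m$.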

Lecture 3: Review of Sorting and Selection

Read: Review Chapts. 6–9 in CLRS.


Review of Sorting: Sorting is among the most basic problems in algorithm design. We are given a sequence of items,

each associated with a given key value. The problem is to permute the items so that they are in increasing (or

decreasing) order by key. Sorting is important because it is often the ﬁrst step in more complex algorithms.

Sorting algorithms are usually divided into two classes, internal sorting algorithms, which assume that data is stored in an array in main memory, and external sorting algorithms, which assume that data is stored on disk or some other device that is best accessed sequentially. We will only consider internal sorting.

You are probably familiar with one or more of the standard simple $\Theta(n^2)$ sorting algorithms, such as InsertionSort, SelectionSort and BubbleSort. (By the way, these algorithms are quite acceptable for small lists of, say, fewer than 20 elements.) BubbleSort is the easiest one to remember, but it is widely considered to be the worst of the three.

The three canonical efﬁcient comparison-based sorting algorithms are MergeSort, QuickSort, and HeapSort. All

run in Θ(nlog n) time. Sorting algorithms often have additional properties that are of interest, depending on the

application. Here are two important properties.

In-place: The algorithm uses no additional array storage, and hence (other than perhaps the system’s recursion

stack) it is possible to sort very large lists without the need to allocate additional working storage.

Stable: A sorting algorithm is stable if two elements that are equal remain in the same relative position after

sorting is completed. This is of interest, since in some sorting applications you sort ﬁrst on one key and

then on another. It is nice to know that two items that are equal on the second key, remain sorted on the

ﬁrst key.
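The two-key scenario described under stability can be tried directly. The Python snippet below is a small illustration, relying on the documented fact that Python's built-in `sorted` is stable; the records and their field layout are made up for the example.

```python
# Hypothetical records: (name, number).
records = [("smith", 3), ("jones", 1), ("adams", 3), ("brown", 1)]

# Sort first on one key (name), then on another (number).
by_name = sorted(records, key=lambda r: r[0])
by_number = sorted(by_name, key=lambda r: r[1])

# Because the second sort is stable, records with equal numbers
# are still in name order.
print(by_number)
```

With an unstable sort, the two records numbered 1 (and likewise the two numbered 3) could come out in either order after the second pass.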

Here is a quick summary of the fast sorting algorithms. If you are not familiar with any of these, check out the

descriptions in CLRS. They are shown schematically in Fig. 1.

QuickSort: It works recursively, by ﬁrst selecting a random “pivot value” from the array. Then it partitions the

array into elements that are less than and greater than the pivot. Then it recursively sorts each part.

QuickSort is widely regarded as the fastest of the fast sorting algorithms (on modern machines). One

explanation is that its inner loop compares elements against a single pivot value, which can be stored in

a register for fast access. The other algorithms compare two elements in the array. This is considered

an in-place sorting algorithm, since it uses no other array storage. (It does implicitly use the system’s

recursion stack, but this is usually not counted.) It is not stable. There is a stable version of QuickSort,

but it is not in-place. This algorithm is $\Theta(n \log n)$ in the expected case, and $\Theta(n^2)$ in the worst case. If

properly implemented, the probability that the algorithm takes asymptotically longer (assuming that the

pivot is chosen randomly) is extremely small for large n.

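[Figure: schematic of the three algorithms. QuickSort: partition around a pivot x into the elements < x and the elements > x, then sort each part. MergeSort: split, sort each half, then merge. HeapSort: buildHeap, then repeated extractMax.]

Fig. 1: Common O(n log n) comparison-based sorting algorithms.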


MergeSort: MergeSort also works recursively. It is a classical divide-and-conquer algorithm. The array is split

into two subarrays of roughly equal size. They are sorted recursively. Then the two sorted subarrays are

merged together in Θ(n) time.

MergeSort is the only stable sorting algorithm of these three. The downside is that MergeSort is the only

algorithm of the three that requires additional array storage (ignoring the recursion stack), and thus it is

not in-place. This is because the merging process merges the two arrays into a third array. Although it is possible to merge arrays in-place, the known methods that do so in $\Theta(n)$ time are considerably more complicated.

HeapSort: HeapSort is based on a nice data structure, called a heap, which is an efﬁcient implementation of a

priority queue data structure. A priority queue supports the operations of inserting a key, and deleting the

element with the smallest key value. A heap can be built for n keys in Θ(n) time, and the minimum key

can be extracted in Θ(log n) time. HeapSort is an in-place sorting algorithm, but it is not stable.

HeapSort works by building the heap (ordered in reverse order so that the maximum can be extracted

efﬁciently) and then repeatedly extracting the largest element. (Why it extracts the maximum rather than

the minimum is an implementation detail, but this is the key to making this work as an in-place sorting

algorithm.)

If you only want to extract the k smallest values, a heap can allow you to do this in Θ(n + k log n) time. A

heap has the additional advantage of being used in contexts where the priority of elements changes. Each

change of priority (key value) can be processed in Θ(log n) time.
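The Θ(n + k log n) bound for extracting the k smallest values can be seen in a few lines using Python's standard `heapq` module, which implements a binary min-heap on top of a list. This is a minimal sketch; the function name `k_smallest` is our own, and a min-heap is used here (rather than the max-heap HeapSort uses) because we are extracting smallest values, not sorting in place.

```python
import heapq

def k_smallest(values, k):
    """Return the k smallest values in increasing order."""
    heap = list(values)       # copy, so the caller's list is left untouched
    heapq.heapify(heap)       # build the heap: linear time
    # Each heappop removes the current minimum in logarithmic time.
    return [heapq.heappop(heap) for _ in range(min(k, len(heap)))]
```

Building the heap costs Θ(n) and each of the k extractions costs O(log n), which gives the stated total.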

Which sorting algorithm should you implement when implementing your programs? The correct answer is

probably “none of them”. Unless you know that your input has some special properties that suggest a much

faster alternative, it is best to rely on the library sorting procedure supplied on your system. Presumably, it

has been engineered to produce the best performance for your system, and saves you from debugging time.

Nonetheless, it is important to learn about sorting algorithms, since the fundamental concepts covered there

apply to much more complex algorithms.

Selection: A simpler, related problem to sorting is selection. The selection problem is, given an array A of n numbers

(not sorted), and an integer k, where 1 ≤ k ≤ n, return the kth smallest value of A. Although selection can be

solved in O(nlog n) time, by ﬁrst sorting A and then returning the kth element of the sorted list, it is possible

to select the kth smallest element in O(n) time. The algorithm is a variant of QuickSort.
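To show what "a variant of QuickSort" can look like, here is a sketch of randomized selection in Python. Some caveats: the name `quickselect` is ours; this version picks the pivot at random and so runs in O(n) expected time, whereas a worst-case O(n) bound requires a more careful pivot choice (see CLRS); and for clarity it builds new lists instead of partitioning in place.

```python
import random

def quickselect(values, k):
    """Return the k-th smallest element of values, for 1 <= k <= len(values)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must satisfy 1 <= k <= n")
    items = list(values)
    while True:
        pivot = random.choice(items)
        less = [v for v in items if v < pivot]
        equal = [v for v in items if v == pivot]
        greater = [v for v in items if v > pivot]
        if k <= len(less):
            items = less                       # answer lies among the smaller elements
        elif k <= len(less) + len(equal):
            return pivot                       # answer is the pivot itself
        else:
            k -= len(less) + len(equal)        # skip past the smaller and equal elements
            items = greater
```

Unlike QuickSort, only one side of the partition is pursued, which is why the expected work forms a geometric series summing to O(n) rather than O(n log n).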

Lower Bounds for Comparison-Based Sorting: The fact that $O(n \log n)$ sorting algorithms have been the fastest around for many years suggests that this may be the best that we can do. Can we sort faster? The claim is no, provided that the algorithm is comparison-based. A comparison-based sorting algorithm is one in which the algorithm permutes the elements based solely on the results of the comparisons that the algorithm makes between pairs of elements.

All of the algorithms we have discussed so far are comparison-based. We will see that exceptions exist in

special cases. This does not preclude the possibility of sorting algorithms whose actions are determined by

other operations, as we shall see below. The following theorem gives the lower bound on comparison-based

sorting.

Theorem: Any comparison-based sorting algorithm has worst-case running time Ω(nlog n).

We will not present a proof of this theorem, but the basic argument follows from a simple analysis of the number

of possibilities and the time it takes to distinguish among them. There are n! ways to permute a given set of

n numbers. Any sorting algorithm must be able to distinguish between each of these different possibilities,

since two different permutations need to be treated differently. Since each comparison leads to only two possible outcomes, the execution of the algorithm can be viewed as a binary tree. (This is a bit abstract, but given a sorting

algorithm it is not hard, but quite tedious, to trace its execution, and set up a new node each time a decision is

made.) This binary tree, called a decision tree, must have at least n! leaves, one for each of the possible input

permutations. Such a tree, even if perfectly balanced, must have height at least $\lg(n!)$. By Stirling’s approximation, $n!$ is, up to a factor of $\Theta(\sqrt{n})$, roughly $(n/e)^n$. Plugging this in and simplifying yields the $\Omega(n \log n)$ lower bound.

This can also be generalized to show that the average-case time to sort is also Ω(nlog n).
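The "plugging in and simplifying" step can be written out as follows. This is a sketch of the calculation rather than a full proof; the second line is an alternative elementary bound (our addition) that avoids Stirling's approximation altogether.

```latex
\begin{aligned}
\lg(n!) &= \lg\!\Bigl(\Theta(\sqrt{n})\,(n/e)^n\Bigr)
         = n \lg n - n \lg e + \tfrac{1}{2}\lg n + O(1)
         = \Omega(n \log n), \\[4pt]
\lg(n!) &= \sum_{i=1}^{n} \lg i
         \;\ge\; \sum_{i=\lceil n/2 \rceil}^{n} \lg i
         \;\ge\; \frac{n}{2} \lg \frac{n}{2}
         = \Omega(n \log n).
\end{aligned}
```

In the second line, the sum from $\lceil n/2 \rceil$ to $n$ has at least $n/2$ terms, each at least $\lg(n/2)$.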

Linear Time Sorting: The Ω(nlog n) lower bound implies that if we hope to sort numbers faster than in O(nlog n)

time, we cannot do it by making comparisons alone. In some special cases, it is possible to sort without the

use of comparisons. This leads to the possibility of sorting in linear (that is, O(n)) time. Here are three such

algorithms.

Counting Sort: Counting sort assumes that each input is an integer in the range from 1 to k. The algorithm

sorts in Θ(n + k) time. Thus, if k is O(n), this implies that the resulting sorting algorithm runs in Θ(n)

time. The algorithm requires an additional Θ(n + k) working storage but has the nice feature that it is

stable. The algorithm is remarkably simple, but deceptively clever. You are referred to CLRS for the

details.
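For readers without CLRS at hand, the following Python sketch shows one standard way to realize CountingSort within the stated Θ(n + k) time and space. The function name, the optional `key` parameter, and the 0-based output array are our own choices for illustration; the details in CLRS differ slightly in indexing.

```python
def counting_sort(values, k, key=lambda v: v):
    """Stable sort of items whose key(item) is an integer in the range 1..k."""
    count = [0] * (k + 1)
    for v in values:                      # count occurrences of each key
        count[key(v)] += 1
    for d in range(1, k + 1):             # prefix sums: count[d] = number of keys <= d
        count[d] += count[d - 1]
    out = [None] * len(values)
    for v in reversed(values):            # scanning right to left keeps equal keys in order
        count[key(v)] -= 1
        out[count[key(v)]] = v
    return out
```

The clever part is the prefix-sum array: after the second loop, `count[d]` tells us where the last item with key d belongs in the output, so each item can be dropped directly into its final position.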

Radix Sort: The main shortcoming of CountingSort is that (due to space requirements) it is only practical for very small ranges of integers. If the integers are in the range from, say, 1 to a million, we may not want

to allocate an array of a million elements. RadixSort provides a nice way around this by sorting numbers

one digit, or one byte, or generally, some groups of bits, at a time. As the number of bits in each group

increases, the algorithm is faster, but the space requirements go up.

The idea is very simple. Let’s think of our list as being composed of n integers, each having d decimal

digits (or digits in any base). To sort these integers we simply sort repeatedly, starting at the lowest order

digit, and finishing with the highest order digit. Provided that the sorting algorithm used for each digit is stable (CountingSort, for example), we know that if the

numbers are already sorted with respect to low order digits, and then later we sort with respect to high

order digits, numbers having the same high order digit will remain sorted with respect to their low order

digit. An example is shown in Figure 2.

    Input                                               Output
     576         49[4]        9[5]4        [1]76         176
     494         19[4]        5[7]6        [1]94         194
     194         95[4]        1[7]6        [2]78         278
     296   ⟹    57[6]   ⟹   2[7]8   ⟹   [2]96   ⟹    296
     278         29[6]        4[9]4        [4]94         494
     176         17[6]        1[9]4        [5]76         576
     954         27[8]        2[9]6        [9]54         954

Fig. 2: Example of RadixSort.

The running time is Θ(d(n +k)) where d is the number of digits in each value, n is the length of the list,

and k is the number of distinct values each digit may have. The space needed is Θ(n +k).

A common application of this algorithm is for sorting integers over some range that is larger than $n$, but still polynomial in $n$. For example, suppose that you wanted to sort a list of integers in the range from 1 to $n^2$. First, you could subtract 1 so that they are now in the range from 0 to $n^2 - 1$. Observe that any number in this range can be expressed as a 2-digit number, where each digit is over the range from 0 to $n - 1$. In particular, given any integer $L$ in this range, we can write $L = an + b$, where $a = \lfloor L/n \rfloor$ and $b = L \bmod n$. Now, we can think of $L$ as the 2-digit number $(a, b)$. So, we can radix sort these numbers in time $\Theta(2(n + n)) = \Theta(n)$. In general this works to sort any $n$ numbers over the range from 1 to $n^d$, in $\Theta(dn)$ time.
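The digit-by-digit process can be sketched in Python as follows. This is an illustration under our own assumptions: the name `radix_sort` and the `base` parameter are ours, the inputs are non-negative integers, and each pass distributes values into per-digit lists (a simple stable alternative to CountingSort) rather than using CountingSort itself.

```python
def radix_sort(values, base=10):
    """Least-significant-digit radix sort of non-negative integers."""
    if not values:
        return []
    result = list(values)
    largest = max(result)
    place = 1                                   # 1, base, base^2, ...
    while place <= largest:
        buckets = [[] for _ in range(base)]
        for v in result:                        # appending in order keeps each pass stable
            buckets[(v // place) % base].append(v)
        result = [v for bucket in buckets for v in bucket]
        place *= base
    return result
```

Choosing `base` equal to n gives the two-digit representation discussed above for values in the range 0 to n^2 - 1: two passes, each taking time proportional to n + base.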

BucketSort: CountingSort and RadixSort are only good for sorting small integers, or at least objects (like

characters) that can be encoded as small integers. What if you want to sort a set of ﬂoating-point numbers?

In the worst-case you are pretty much stuck with using one of the comparison-based sorting algorithms,

such as QuickSort, MergeSort, or HeapSort. However, in special cases where you have reason to believe

that your numbers are roughly uniformly distributed over some range, then it is possible to do better. (Note


that this is a strong assumption. This algorithm should not be applied unless you have good reason to

believe that this is the case.)

Suppose that the numbers to be sorted range over some interval, say [0, 1). (It is possible in O(n) time

to ﬁnd the maximum and minimum values, and scale the numbers to ﬁt into this range.) The idea is

to subdivide this interval into n subintervals. For example, if n = 100, the subintervals would be

[0, 0.01), [0.01, 0.02), [0.02, 0.03), and so on. We create n different buckets, one for each interval. Then

we make a pass through the list to be sorted, and using the floor function, we can map each value to its

bucket index. (In this case, the index of element x would be ⌊100x⌋.) We then sort each bucket in ascending

order. The number of points per bucket should be fairly small, so even a quadratic time sorting

algorithm (e.g. BubbleSort or InsertionSort) should work. Finally, all the sorted buckets are concatenated

together.

The analysis relies on the fact that, assuming that the numbers are uniformly distributed, the number of

elements lying within each bucket on average is a constant. Thus, the expected time needed to sort each

bucket is O(1). Since there are n buckets, the total expected sorting time is Θ(n). An example illustrating this idea

is given in Fig. 3.

[Figure: the input array A = ⟨.81, .17, .59, .38, .86, .14, .10, .71, .42, .56⟩ and the bucket array B[0..9]; each value x is placed in bucket ⌊10x⌋.]

Fig. 3: BucketSort.
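A minimal Python sketch of BucketSort as described above, assuming the inputs are floats in [0, 1) that are roughly uniformly distributed. The function name `bucket_sort` is hypothetical, and the clamp on the bucket index is an added safeguard against floating-point rounding, not part of the description in these notes.

```python
def bucket_sort(values):
    """Sort floats assumed to lie in [0, 1) and to be roughly uniform."""
    n = len(values)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]
    for x in values:
        # Bucket index is floor(n * x); the min() guards against n * x
        # rounding up to n for values extremely close to 1.
        buckets[min(n - 1, int(n * x))].append(x)
    result = []
    for bucket in buckets:
        # Insertion sort: quadratic, but each bucket is expected to be small.
        for i in range(1, len(bucket)):
            key = bucket[i]
            j = i - 1
            while j >= 0 and bucket[j] > key:
                bucket[j + 1] = bucket[j]
                j -= 1
            bucket[j + 1] = key
        result.extend(bucket)  # concatenate the sorted buckets in order
    return result
```

If the uniformity assumption fails (for example, if all values land in one bucket), this sketch degrades to the quadratic running time of the insertion sort.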

Lecture 4: Dynamic Programming: Longest Common Subsequence

Read: Introduction to Chapt 15, and Section 15.4 in CLRS.

Dynamic Programming: We begin discussion of an important algorithm design technique, called dynamic programming (or DP for short). The technique is among the most powerful for designing algorithms for optimization problems. This is true for two reasons. First, dynamic programming solutions are based on a few common elements. Second, dynamic programming problems are typically optimization problems (find the minimum or maximum cost solution, subject to various constraints). The technique is related to divide-and-conquer, in the sense that it breaks problems down into smaller problems that it solves recursively. However, because of the somewhat different nature of dynamic programming problems, standard divide-and-conquer solutions are not usually efficient. The basic elements that characterize a dynamic programming algorithm are:

Substructure: Decompose your problem into smaller (and hopefully simpler) subproblems. Express the solution of the original problem in terms of solutions for smaller problems.

Table-structure: Store the answers to the subproblems in a table. This is done because subproblem solutions

are reused many times.

Bottom-up computation: Combine solutions on smaller subproblems to solve larger subproblems. (Our text

also discusses a top-down alternative, called memoization.)

The most important question in designing a DP solution to a problem is how to set up the subproblem structure.

This is called the formulation of the problem. Dynamic programming is not applicable to all optimization

problems. There are two important elements that a problem must have in order for DP to be applicable.

Optimal substructure: (Sometimes called the principle of optimality.) It states that for the global problem to

be solved optimally, each subproblem should be solved optimally. (Not all optimization problems satisfy

this. Sometimes it is better to lose a little on one subproblem in order to make a big gain on another.)

Polynomially many subproblems: An important aspect to the efﬁciency of DP is that the total number of

subproblems to be solved should be at most a polynomial number.

Strings: One important area of algorithm design is the study of algorithms for character strings. There are a number

of important problems here. Among the most important has to do with efficiently searching for a substring

or generally a pattern in a large piece of text. (This is what text editors and programs like “grep” do when you

perform a search.) In many instances you do not want to ﬁnd a piece of text exactly, but rather something that is

similar. This arises for example in genetics research and in document retrieval on the web. One common method

of measuring the degree of similarity between two strings is to compute their longest common subsequence.

Longest Common Subsequence: Let us think of character strings as sequences of characters. Given two sequences $X = \langle x_1, x_2, \ldots, x_m \rangle$ and $Z = \langle z_1, z_2, \ldots, z_k \rangle$, we say that Z is a subsequence of X if there is a strictly increasing sequence of k indices $\langle i_1, i_2, \ldots, i_k \rangle$ ($1 \le i_1 < i_2 < \cdots < i_k \le m$) such that $Z = \langle x_{i_1}, x_{i_2}, \ldots, x_{i_k} \rangle$. For example, let X = ⟨ABRACADABRA⟩ and let Z = ⟨AADAA⟩, then Z is a subsequence of X.

Given two strings X and Y, the longest common subsequence of X and Y is a longest sequence Z that is a subsequence of both X and Y. For example, let X = ⟨ABRACADABRA⟩ and let Y = ⟨YABBADABBADOO⟩. Then the longest common subsequence is Z = ⟨ABADABA⟩. See Fig. 4.

[Figure: X = ABRACADABRA, Y = YABBADABBADOO, LCS = ABADABA.]

Fig. 4: An example of the LCS of two strings X and Y.

The Longest Common Subsequence Problem (LCS) is the following. Given two sequences $X = \langle x_1, \ldots, x_m \rangle$ and $Y = \langle y_1, \ldots, y_n \rangle$ determine a longest common subsequence. Note that it is not always unique. For example the LCS of ⟨ABC⟩ and ⟨BAC⟩ is either ⟨AC⟩ or ⟨BC⟩.

DP Formulation for LCS: The simple brute-force solution to the problem would be to try all possible subsequences

from one string, and search for matches in the other string, but this is hopelessly inefﬁcient, since there are an

exponential number of possible subsequences.

Instead, we will derive a dynamic programming solution. In typical DP fashion, we need to break the problem into smaller pieces. There are many ways to do this for strings, but it turns out for this problem that considering all pairs of prefixes will suffice for us. A prefix of a sequence is just an initial string of values, $X_i = \langle x_1, x_2, \ldots, x_i \rangle$. $X_0$ is the empty sequence.

The idea will be to compute the longest common subsequence for every possible pair of prefixes. Let c[i, j] denote the length of the longest common subsequence of $X_i$ and $Y_j$. For example, in the above case we have $X_5 = \langle ABRAC \rangle$ and $Y_6 = \langle YABBAD \rangle$. Their longest common subsequence is ⟨ABA⟩. Thus, c[5, 6] = 3.

Which of the c[i, j] values do we compute? Since we don't know which will lead to the final optimum, we compute all of them. Eventually we are interested in c[m, n] since this will be the length of the LCS of the two entire strings. The idea is to compute c[i, j] assuming that we already know the values of c[i′, j′], for i′ ≤ i and j′ ≤ j (but not both equal). Here are the possible cases.

Basis: c[i, 0] = c[0, j] = 0. If either sequence is empty, then the longest common subsequence is empty.

Last characters match: Suppose $x_i = y_j$. For example: Let $X_i = \langle ABCA \rangle$ and let $Y_j = \langle DACA \rangle$. Since both end in A, we claim that the LCS must also end in A. (We will leave the proof as an exercise.) Since the A is part of the LCS we may find the overall LCS by removing A from both sequences and taking the LCS of $X_{i-1} = \langle ABC \rangle$ and $Y_{j-1} = \langle DAC \rangle$, which is ⟨AC⟩, and then adding A to the end, giving ⟨ACA⟩ as the answer. (At first you might object: But how did you know that these two A's matched with each other? The answer is that we don't, but it will not make the LCS any smaller if we do.) This is illustrated at the top of Fig. 5.

if $x_i = y_j$ then c[i, j] = c[i − 1, j − 1] + 1

[Figure: two panels. Top, “Last chars match”: the common last character of X_i and Y_j is added to the LCS of X_{i−1} and Y_{j−1}. Bottom, “Last chars do not match”: take the max of the LCS obtained by skipping x_i and the LCS obtained by skipping y_j.]

Fig. 5: The possible cases in the DP formulation of LCS.

Last characters do not match: Suppose that $x_i \ne y_j$. In this case $x_i$ and $y_j$ cannot both be in the LCS (since they would both have to be the last character of the LCS). Thus either $x_i$ is not part of the LCS, or $y_j$ is not part of the LCS (and possibly both are not part of the LCS).

At this point it may be tempting to try to make a “smart” choice. By analyzing the last few characters of $X_i$ and $Y_j$, perhaps we can figure out which character is best to discard. However, this approach is doomed to failure (and you are strongly encouraged to think about this, since it is a common point of confusion). Instead, our approach is to take advantage of the fact that we have already precomputed smaller subproblems, and use these results to guide us.

In the first case ($x_i$ is not in the LCS) the LCS of $X_i$ and $Y_j$ is the LCS of $X_{i-1}$ and $Y_j$, whose length is c[i − 1, j]. In the second case ($y_j$ is not in the LCS) the LCS is the LCS of $X_i$ and $Y_{j-1}$, whose length is c[i, j − 1]. We do not know which is the case, so we try both and take the one that gives us the longer LCS. This is illustrated at the bottom half of Fig. 5.

if $x_i \ne y_j$ then c[i, j] = max(c[i − 1, j], c[i, j − 1])

Combining these observations we have the following formulation:

$$
c[i, j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0, \\
c[i-1, j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j, \\
\max(c[i, j-1],\; c[i-1, j]) & \text{if } i, j > 0 \text{ and } x_i \ne y_j.
\end{cases}
$$
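The three-case formulation can be transcribed almost literally as a memoized (top-down) recursion. The following is a sketch, not the bottom-up procedure developed in these notes: the name `lcs_length` is hypothetical, Python's 0-based indexing means that $x_i$ is written `x[i - 1]`, and the recursion depth grows with m + n, so it is only suitable for short strings.

```python
from functools import lru_cache


def lcs_length(x, y):
    """Length of a longest common subsequence of x and y (top-down DP)."""

    @lru_cache(maxsize=None)  # the cache plays the role of the table c[i, j]
    def c(i, j):
        if i == 0 or j == 0:          # basis: one prefix is empty
            return 0
        if x[i - 1] == y[j - 1]:      # last characters match
            return c(i - 1, j - 1) + 1
        # last characters differ: skip y_j or skip x_i, keep the better one
        return max(c(i, j - 1), c(i - 1, j))

    return c(len(x), len(y))
```

Because each pair (i, j) is evaluated at most once, the number of evaluations is bounded by (m + 1)(n + 1), in contrast to the exponential number of subsequences examined by brute force.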

Implementing the Formulation: The task now is to simply implement this formulation. We concentrate only on

computing the maximum length of the LCS. Later we will see how to extract the actual sequence. We will store

some helpful pointers in a parallel array, b[0..m, 0..n]. The code is shown below, and an example is illustrated

in Fig. 6.

[Figure: two panels for X = BACDB and Y = BDCB: the LCS length table c[i, j] (0 ≤ i ≤ m = 5, 0 ≤ j ≤ n = 4), and the same table with back pointers included, traced from the lower right entry (“start here”) to give LCS = BCB.]

Fig. 6: Longest common subsequence example for the sequences X = ⟨BACDB⟩ and Y = ⟨BDCB⟩. The numeric table entries are the values of c[i, j] and the arrow entries are used in the extraction of the sequence.

Build LCS Table

LCS(x[1..m], y[1..n]) {                       // compute LCS table
    int c[0..m, 0..n]                         // c[i,j] = LCS length of X_i and Y_j
    int b[0..m, 0..n]                         // back pointers: addXY, skipX, skipY
    for i = 0 to m                            // init column 0
        c[i,0] = 0; b[i,0] = skipX
    for j = 0 to n                            // init row 0
        c[0,j] = 0; b[0,j] = skipY
    for i = 1 to m                            // fill rest of table
        for j = 1 to n
            if (x[i] == y[j])                 // take X[i] (= Y[j]) for LCS
                c[i,j] = c[i-1,j-1] + 1; b[i,j] = addXY
            else if (c[i-1,j] >= c[i,j-1])    // X[i] not in LCS
                c[i,j] = c[i-1,j]; b[i,j] = skipX
            else                              // Y[j] not in LCS
                c[i,j] = c[i,j-1]; b[i,j] = skipY
    return c[m,n]                             // return length of LCS
}

Extracting the LCS

getLCS(x[1..m], y[1..n], b[0..m,0..n]) {
    LCSstring = empty string
    i = m; j = n                              // start at lower right
    while (i != 0 && j != 0)                  // go until upper left
        switch b[i,j]
            case addXY:                       // add X[i] (= Y[j])
                add x[i] (or equivalently y[j]) to front of LCSstring
                i--; j--; break
            case skipX: i--; break            // skip X[i]
            case skipY: j--; break            // skip Y[j]
    return LCSstring
}

The running time of the algorithm is clearly O(mn) since there are two nested loops with m and n iterations,

respectively. The algorithm also uses O(mn) space.

Extracting the Actual Sequence: Extracting the final LCS is done by using the back pointers stored in b[0..m, 0..n]. Intuitively b[i, j] = addXY means that X[i] and Y[j] together form the last character of the LCS. So we take this common character, and continue with entry b[i − 1, j − 1] to the northwest (↖). If b[i, j] = skipX, then we know that X[i] is not in the LCS, and so we skip it and go to b[i − 1, j] above us (↑). Similarly, if b[i, j] = skipY, then we know that Y[j] is not in the LCS, and so we skip it and go to b[i, j − 1] to the left (←). Following these back pointers, and outputting a character with each diagonal move gives the final subsequence.
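For readers who want to run the procedure, here is one possible Python rendering of the table construction and the back-pointer trace combined into a single function. It is a sketch under a few assumptions: the name `lcs` is hypothetical, the back pointers are stored as the strings "addXY", "skipX" and "skipY", and Python's 0-based strings mean X[i] is written `x[i - 1]`.

```python
def lcs(x, y):
    """Return (length, one longest common subsequence) of strings x and y."""
    m, n = len(x), len(y)
    # Row 0 and column 0 stay 0: an empty prefix has an empty LCS.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:            # take X[i] (= Y[j]) for the LCS
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "addXY"
            elif c[i - 1][j] >= c[i][j - 1]:    # X[i] not in LCS
                c[i][j] = c[i - 1][j]
                b[i][j] = "skipX"
            else:                               # Y[j] not in LCS
                c[i][j] = c[i][j - 1]
                b[i][j] = "skipY"
    # Follow the back pointers from the lower right toward the upper left.
    chars = []
    i, j = m, n
    while i != 0 and j != 0:
        if b[i][j] == "addXY":                  # diagonal move: output a character
            chars.append(x[i - 1])
            i -= 1
            j -= 1
        elif b[i][j] == "skipX":                # move up
            i -= 1
        else:                                   # move left
            j -= 1
    # Characters were collected last-to-first, so reverse them.
    return c[m][n], "".join(reversed(chars))
```

When several longest common subsequences exist, which one is returned depends on the tie-breaking rule (here, ties go to skipX, as in the pseudocode).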

Lecture 5: Dynamic Programming: Chain Matrix Multiplication

Read: Chapter 15 of CLRS, and Section 15.2 in particular.

Chain Matrix Multiplication: This problem involves the question of determining the optimal sequence for perform-

ing a series of operations. This general class of problem is important in compiler design for code optimization

and in databases for query optimization. We will study the problem in a very restricted instance, where the

dynamic programming issues are easiest to see.

Suppose that we wish to multiply a series of matrices

$$A_1 A_2 \cdots A_n$$

Matrix multiplication is an associative but not a commutative operation. This means that we are free to parenthesize the above multiplication however we like, but we are not free to rearrange the order of the matrices. Also recall that when two (nonsquare) matrices are being multiplied, there are restrictions on the dimensions. A p × q matrix has p rows and q columns. You can multiply a p × q matrix A times a q × r matrix B, and the result will be a p × r matrix C. (The number of columns of A must equal the number of rows of B.) In particular for 1 ≤ i ≤ p and 1 ≤ j ≤ r,

$$C[i, j] = \sum_{k=1}^{q} A[i, k]\, B[k, j].$$

This corresponds to the (hopefully familiar) rule that the [i, j] entry of C is the dot product of the ith (horizontal) row of A and the jth (vertical) column of B. Observe that there are pr total entries in C and each takes O(q) time to compute, thus the total time to multiply these two matrices is proportional to the product of the dimensions, pqr.

[Figure: a p × q matrix A times a q × r matrix B equals a p × r matrix C; multiplication time = pqr.]

Fig. 7: Matrix Multiplication.

Note that although any legal parenthesization will lead to a valid result, not all involve the same number of operations. Consider the case of 3 matrices: let $A_1$ be 5 × 4, $A_2$ be 4 × 6 and $A_3$ be 6 × 2.

$$
\begin{aligned}
\text{multCost}[((A_1 A_2) A_3)] &= (5 \cdot 4 \cdot 6) + (5 \cdot 6 \cdot 2) = 180, \\
\text{multCost}[(A_1 (A_2 A_3))] &= (4 \cdot 6 \cdot 2) + (5 \cdot 4 \cdot 2) = 88.
\end{aligned}
$$

Even for this small example, considerable savings can be achieved by reordering the evaluation sequence.
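The arithmetic above can be checked with a few lines of Python. This is only a sketch for the three-matrix case; the function names are hypothetical, and the dimension list p is assumed to follow the convention that $A_i$ is a p[i−1] × p[i] matrix.

```python
def cost_left_first(p):
    """Scalar multiplications for ((A1 A2) A3), where A_i is p[i-1] x p[i]."""
    # A1 A2 costs p0*p1*p2 and yields a p0 x p2 matrix, which then meets A3.
    return p[0] * p[1] * p[2] + p[0] * p[2] * p[3]


def cost_right_first(p):
    """Scalar multiplications for (A1 (A2 A3)), where A_i is p[i-1] x p[i]."""
    # A2 A3 costs p1*p2*p3 and yields a p1 x p3 matrix, which A1 then meets.
    return p[1] * p[2] * p[3] + p[0] * p[1] * p[3]


dims = [5, 4, 6, 2]  # A1 is 5 x 4, A2 is 4 x 6, A3 is 6 x 2
print(cost_left_first(dims), cost_right_first(dims))
```

For the dimensions used in the text, the two functions reproduce the two costs computed above.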

Chain Matrix Multiplication Problem: Given a sequence of matrices $A_1, A_2, \ldots, A_n$ and dimensions $p_0, p_1, \ldots, p_n$ where $A_i$ is of dimension $p_{i-1} \times p_i$, determine the order of multiplication (represented, say, as a binary tree) that minimizes the number of operations.

Important Note: This algorithm does not perform the multiplications, it just determines the best order in which

to perform the multiplications.

Naive Algorithm: We could write a procedure which tries all possible parenthesizations. Unfortunately, the number

of ways of parenthesizing an expression is very large. If you have just one or two matrices, then there is only

one way to parenthesize. If you have n items, then there are n − 1 places where you could break the list with

the outermost pair of parentheses, namely just after the 1st item, just after the 2nd item, etc., and just after the

(n − 1)st item. When we split just after the kth item, we create two sublists to be parenthesized, one with k

items, and the other with n −k items. Then we could consider all the ways of parenthesizing these. Since these

are independent choices, if there are L ways to parenthesize the left sublist and R ways to parenthesize the right

sublist, then the total is L · R. This suggests the following recurrence for P(n), the number of different ways of parenthesizing n items:

$$
P(n) =
\begin{cases}
1 & \text{if } n = 1, \\
\sum_{k=1}^{n-1} P(k)\, P(n-k) & \text{if } n \ge 2.
\end{cases}
$$

This is related to a famous function in combinatorics called the Catalan numbers (which in turn is related to the number of different binary trees on n nodes). In particular P(n) = C(n − 1), where C(n) is the nth Catalan number:

$$C(n) = \frac{1}{n+1} \binom{2n}{n}.$$

Applying Stirling's formula (which is given in our text), we find that $C(n) \in \Omega(4^n / n^{3/2})$. Since $4^n$ is exponential and $n^{3/2}$ is just polynomial, the exponential will dominate, implying that the function grows very fast. Thus, this will not be practical except for very small n. In summary, brute force is not an option.
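The recurrence and the closed form can be compared numerically. The sketch below assumes n ≥ 1 and uses hypothetical function names; `math.comb` requires Python 3.8 or later.

```python
from math import comb


def num_parenthesizations(n):
    """P(n): number of ways to fully parenthesize a chain of n >= 1 items."""
    P = [0] * (n + 1)
    P[1] = 1
    for size in range(2, n + 1):
        # Split just after the k-th item: k items on the left, size - k on the right.
        P[size] = sum(P[k] * P[size - k] for k in range(1, size))
    return P[n]


def catalan(n):
    """C(n) = (1 / (n + 1)) * binomial(2n, n), computed exactly with integers."""
    return comb(2 * n, n) // (n + 1)


for n in range(1, 11):
    print(n, num_parenthesizations(n), catalan(n - 1))
```

Printing the two columns side by side is a quick way to see both the identity P(n) = C(n − 1) and how quickly the count grows.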

Dynamic Programming Approach: This problem, like other dynamic programming problems involves determining

a structure (in this case, a parenthesization). We want to break the problem into subproblems, whose solutions

can be combined to solve the global problem. As is common to any DP solution, we need to ﬁnd some way to

break the problem into smaller subproblems, and we need to determine a recursive formulation, which represents

the optimum solution to each problem in terms of solutions to the subproblems. Let us think of how we can do

this.

Since matrices cannot be reordered, it makes sense to think about sequences of matrices. Let $A_{i..j}$ denote the result of multiplying matrices i through j. It is easy to see that $A_{i..j}$ is a $p_{i-1} \times p_j$ matrix. (Think about this for a second to be sure you see why.) Now, in order to determine how to perform this multiplication optimally, we need to make many decisions. What we want to do is to break the problem into problems of a similar structure. In parenthesizing the expression, we can consider the highest level of parenthesization. At this level we are simply multiplying two matrices together. That is, for any k, 1 ≤ k ≤ n − 1,

$$A_{1..n} = A_{1..k} \cdot A_{k+1..n}.$$

Thus the problem of determining the optimal sequence of multiplications is broken up into two questions: how do we decide where to split the chain (what is k?) and how do we parenthesize the subchains $A_{1..k}$ and $A_{k+1..n}$? The subchain problems can be solved recursively, by applying the same scheme.

So, let us think about the problem of determining the best value of k. At this point, you may be tempted to consider some clever ideas. For example, since we want matrices with small dimensions, pick the value of k that minimizes $p_k$. This is not a bad idea in principle. (After all, it might work. It just turns out that it doesn't in this case. This takes a bit of thinking, which you should try.) Instead, as is true in almost all dynamic programming solutions, we will do the dumbest thing of simply considering all possible choices of k, and taking the best of them. Usually trying all possible choices is bad, since it quickly leads to an exponential

number of total possibilities. What saves us here is that there are only $O(n^2)$ different sequences of matrices. (There are $\binom{n}{2} = n(n-1)/2$ ways of choosing i and j to form $A_{i..j}$, to be precise.) Thus, we do not encounter the exponential growth.

Notice that our chain matrix multiplication problem satisfies the principle of optimality, because once we decide to break the sequence into the product $A_{1..k} \cdot A_{k+1..n}$, we should compute each subsequence optimally. That is, for the global problem to be solved optimally, the subproblems must be solved optimally as well.

Dynamic Programming Formulation: We will store the solutions to the subproblems in a table, and build the table in a bottom-up manner. For 1 ≤ i ≤ j ≤ n, let m[i, j] denote the minimum number of multiplications needed to compute $A_{i..j}$. The optimum cost can be described by the following recursive formulation.

Basis: Observe that if i = j then the sequence contains only one matrix, and so the cost is 0. (There is nothing to multiply.) Thus, m[i, i] = 0.

Step: If i < j, then we are asking about the product $A_{i..j}$. This can be split by considering each k, i ≤ k < j, as $A_{i..k}$ times $A_{k+1..j}$.

The optimum times to compute $A_{i..k}$ and $A_{k+1..j}$ are, by definition, m[i, k] and m[k + 1, j], respectively. We may assume that these values have been computed previously and are already stored in our array. Since $A_{i..k}$ is a $p_{i-1} \times p_k$ matrix, and $A_{k+1..j}$ is a $p_k \times p_j$ matrix, the time to multiply them is $p_{i-1} p_k p_j$. This suggests the following recursive rule for computing m[i, j].

$$
\begin{aligned}
m[i, i] &= 0 \\
m[i, j] &= \min_{i \le k < j} \bigl( m[i, k] + m[k+1, j] + p_{i-1}\, p_k\, p_j \bigr) \quad \text{for } i < j.
\end{aligned}
$$

[Figure: the chain A_i, A_{i+1}, …, A_k, A_{k+1}, …, A_j forming A_{i..j}, split at an unknown position k into A_{i..k} and A_{k+1..j}.]

Fig. 8: Dynamic Programming Formulation.

It is not hard to convert this rule into a procedure, which is given below. The only tricky part is arranging the

order in which to compute the values. In the process of computing m[i, j] we need to access values m[i, k] and

m[k +1, j] for k lying between i and j. This suggests that we should organize our computation according to the

number of matrices in the subsequence. Let L = j−i+1 denote the length of the subchain being multiplied. The

subchains of length 1 (m[i, i]) are trivial to compute. Then we build up by computing the subchains of lengths

2, 3, . . . , n. The ﬁnal answer is m[1, n]. We need to be a little careful in setting up the loops. If a subchain of

length L starts at position i, then j = i + L − 1. Since we want j ≤ n, this means that i + L − 1 ≤ n, or in

other words, i ≤ n−L+1. So our loop for i runs from 1 to n−L+1 (in order to keep j in bounds). The code

is presented below.

The array s[i, j] will be explained later. It is used to extract the actual sequence. The running time of the procedure is $\Theta(n^3)$. We'll leave this as an exercise in solving sums, but the key is that there are three nested loops, and each can iterate at most n times.

Chain Matrix Multiplication

Matrix-Chain(array p[0..n]) {
    array m[1..n, 1..n]                       // m[i,j] = min cost of computing A[i..j]
    array s[1..n-1, 2..n]                     // s[i,j] = best split point for A[i..j]
    for i = 1 to n do m[i,i] = 0;             // initialize
    for L = 2 to n do {                       // L = length of subchain
        for i = 1 to n-L+1 do {
            j = i + L - 1;
            m[i,j] = INFINITY;
            for k = i to j-1 do {             // check all splits
                q = m[i,k] + m[k+1,j] + p[i-1]*p[k]*p[j];
                if (q < m[i,j]) {
                    m[i,j] = q;
                    s[i,j] = k;
                }
            }
        }
    }
    return m[1,n] (final cost) and s (splitting markers);
}

Extracting the final Sequence: Extracting the actual multiplication sequence is a fairly easy extension. The basic idea is to leave a split marker indicating what the best split is, that is, the value of k that leads to the minimum value of m[i, j]. We can maintain a parallel array s[i, j] in which we will store the value of k providing the

optimal split. For example, suppose that s[i, j] = k. This tells us that the best way to multiply the subchain $A_{i..j}$ is to first multiply the subchain $A_{i..k}$ and then multiply the subchain $A_{k+1..j}$, and finally multiply these together. Intuitively, s[i, j] tells us what multiplication to perform last. Note that we only need to store s[i, j] when we have at least two matrices, that is, if j > i.

The actual multiplication algorithm uses the s[i, j] value to determine how to split the current sequence. Assume that the matrices are stored in an array of matrices A[1..n], and that s[i, j] is global to this recursive procedure. The recursive procedure Mult, given below, does this computation and returns a matrix.

Extracting Optimum Sequence

Mult(i, j) {
    if (i == j)                               // basis case
        return A[i];
    else {
        k = s[i,j]
        X = Mult(i, k)                        // X = A[i]...A[k]
        Y = Mult(k+1, j)                      // Y = A[k+1]...A[j]
        return X*Y;                           // multiply matrices X and Y
    }
}

In Fig. 9 we show an example. This algorithm is tricky, so it would be a good idea to trace through this example (and the one given in the text). The initial set of dimensions is ⟨5, 4, 6, 2, 7⟩, meaning that we are multiplying $A_1$ (5 × 4) times $A_2$ (4 × 6) times $A_3$ (6 × 2) times $A_4$ (2 × 7). The optimal sequence is $((A_1 (A_2 A_3)) A_4)$.
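To make the example easier to trace, here is a Python sketch of the Matrix-Chain procedure together with a helper that turns the split markers into a parenthesized string instead of performing the multiplications. The names `matrix_chain` and `parenthesize` are hypothetical; the tables are padded with an unused row and column 0 so that the indices agree with the 1-based indices used in the text.

```python
import math


def matrix_chain(p):
    """p[0..n] are the dimensions; A_i is p[i-1] x p[i]. Returns tables (m, s)."""
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]   # m[i][i] = 0 already
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):              # length of subchain
        for i in range(1, n - length + 2):
            j = i + length - 1
            m[i][j] = math.inf
            for k in range(i, j):               # check all splits i <= k < j
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j] = q
                    s[i][j] = k
    return m, s


def parenthesize(s, i, j):
    """Render the optimal order for A_i..A_j using the split markers s."""
    if i == j:
        return f"A{i}"
    k = s[i][j]
    return f"({parenthesize(s, i, k)}{parenthesize(s, k + 1, j)})"


m, s = matrix_chain([5, 4, 6, 2, 7])
print(m[1][4], parenthesize(s, 1, 4))
```

The helper `parenthesize` mirrors the recursive structure of Mult: it recurses on the same two subchains but concatenates strings where Mult would multiply matrices.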

Lecture 6: Dynamic Programming: Minimum Weight Triangulation

Read: This is not covered in CLRS.

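[Figure: the tables m[i, j] and s[i, j] for the dimensions ⟨p_0, …, p_4⟩ = ⟨5, 4, 6, 2, 7⟩. Costs: m[i, i] = 0, m[1, 2] = 120, m[2, 3] = 48, m[3, 4] = 84, m[1, 3] = 88, m[2, 4] = 104, m[1, 4] = 158. Split markers: s[1, 2] = 1, s[2, 3] = 2, s[3, 4] = 3, s[1, 3] = 1, s[2, 4] = 3, s[1, 4] = 3. Final order: ((A_1 (A_2 A_3)) A_4).]

Fig. 9: Chain Matrix Multiplication Example.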

Polygons and Triangulations: Let’s consider a geometric problem that outwardly appears to be quite different from

chain-matrix multiplication, but actually has remarkable similarities. We begin with a number of deﬁnitions.

Deﬁne a polygon to be a piecewise linear closed curve in the plane. In other words, we form a cycle by joining

line segments end to end. The line segments are called the sides of the polygon and the endpoints are called the

vertices. A polygon is simple if it does not cross itself, that is, if the sides do not intersect one another except

for two consecutive sides sharing a common vertex. A simple polygon subdivides the plane into its interior, its

boundary and its exterior. A simple polygon is said to be convex if every interior angle is at most 180 degrees.

Vertices with interior angle equal to 180 degrees are normally allowed, but for this problem we will assume that

no such vertices exist.

[Figure: three panels: a polygon, a simple polygon, and a convex polygon.]

Fig. 10: Polygons.

Given a convex polygon, we assume that its vertices are labeled in counterclockwise order $P = \langle v_1, \ldots, v_n \rangle$. We will assume that indexing of vertices is done modulo n, so $v_0 = v_n$. This polygon has n sides, $v_{i-1} v_i$. Given two nonadjacent vertices $v_i$ and $v_j$, where i < j − 1, the line segment $v_i v_j$ is a chord. (If the polygon is simple but not convex, we include the additional requirement that the interior of the segment must lie entirely in the interior of P.) Any chord subdivides the polygon into two polygons: $\langle v_i, v_{i+1}, \ldots, v_j \rangle$, and $\langle v_j, v_{j+1}, \ldots, v_i \rangle$.

A triangulation of a convex polygon P is a subdivision of the interior of P into a collection of triangles with

disjoint interiors, whose vertices are drawn from the vertices of P. Equivalently, we can deﬁne a triangulation

as a maximal set T of nonintersecting chords. (In other words, every chord that is not in T intersects the interior

of some chord in T.) It is easy to see that such a set of chords subdivides the interior of the polygon into a

collection of triangles with pairwise disjoint interiors (and hence the name triangulation). It is not hard to prove

(by induction) that every triangulation of an n-sided polygon consists of n − 3 chords and n − 2 triangles.

Triangulations are of interest for a number of reasons. Many geometric algorithms operate by first decomposing

a complex polygonal shape into triangles.

In general, given a convex polygon, there are many possible triangulations. In fact, the number is exponential in

n, the number of sides. Which triangulation is the “best”? There are many criteria that are used depending on

the application. One criterion is to imagine that you must “pay” for the ink you use in drawing the triangulation,

and you want to minimize the amount of ink you use. (This may sound fanciful, but minimizing wire length is an

important condition in chip design. Further, this is one of many properties which we could choose to optimize.)

This suggests the following optimization problem:

Minimum-weight convex polygon triangulation: Given a convex polygon determine the triangulation that

minimizes the sum of the perimeters of its triangles. (See Fig. 11.)

[Figure: two panels: a triangulation, and a lower weight triangulation.]

Fig. 11: Triangulations of convex polygons, and the minimum weight triangulation.

Given three distinct vertices $v_i$, $v_j$, $v_k$, we define the weight of the associated triangle by the weight function

$$w(v_i, v_j, v_k) = |v_i v_j| + |v_j v_k| + |v_k v_i|,$$

where $|v_i v_j|$ denotes the length of the line segment $v_i v_j$.

Dynamic Programming Solution: Let us consider an (n + 1)-sided polygon $P = \langle v_0, v_1, \ldots, v_n \rangle$. Let us assume that these vertices have been numbered in counterclockwise order. To derive a DP formulation we need to define a set of subproblems from which we can derive the optimum solution. For 0 ≤ i < j ≤ n, define t[i, j] to be the weight of the minimum weight triangulation for the subpolygon that lies to the right of directed chord $v_i v_j$, that is, the polygon with the counterclockwise vertex sequence $\langle v_i, v_{i+1}, \ldots, v_j \rangle$. Observe that if we can compute this quantity for all such i and j, then the weight of the minimum weight triangulation of the entire polygon can be extracted as t[0, n]. (As usual, we only compute the minimum weight. But, it is easy to modify the procedure to extract the actual triangulation.)

As a basis case, we define the weight of the trivial “2-sided polygon” to be zero, implying that t[i, i + 1] = 0. In general, to compute t[i, j], consider the subpolygon $\langle v_i, v_{i+1}, \ldots, v_j \rangle$, where j > i + 1. One of the sides of this subpolygon is the chord $v_i v_j$. We may split this subpolygon by introducing a triangle whose base is this chord, and whose third vertex is any vertex $v_k$, where i < k < j. This subdivides the polygon into the subpolygons $\langle v_i, v_{i+1}, \ldots, v_k \rangle$ and $\langle v_k, v_{k+1}, \ldots, v_j \rangle$ whose minimum weights are already known to us as t[i, k] and t[k, j]. In addition we should consider the weight of the newly added triangle $\triangle v_i v_k v_j$. Thus, we have the following recursive rule:

$$
t[i, j] =
\begin{cases}
0 & \text{if } j = i + 1, \\
\min_{i < k < j} \bigl( t[i, k] + t[k, j] + w(v_i, v_k, v_j) \bigr) & \text{if } j > i + 1.
\end{cases}
$$

The final output is the overall minimum weight, which is t[0, n]. This is illustrated in Fig. 12.

Note that this has almost exactly the same structure as the recursive definition used in the chain matrix multiplication algorithm (except that some indices are different by 1). The same $\Theta(n^3)$ algorithm can be applied with only minor changes.
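As an illustration of those minor changes, the recursive rule for t[i, j] might be implemented along the following lines. This is a sketch under stated assumptions: the function name `min_weight_triangulation` is hypothetical, the vertices are given as (x, y) pairs of a convex polygon in counterclockwise order, and only the minimum weight (not the triangulation itself) is returned. `math.dist` requires Python 3.8 or later.

```python
import math


def min_weight_triangulation(points):
    """points = [v_0, ..., v_n], a convex polygon in counterclockwise order.

    Returns t[0][n], the minimum total perimeter over all triangulations.
    """
    if len(points) < 3:
        return 0.0                      # nothing to triangulate
    n = len(points) - 1

    def w(i, j, k):
        """Perimeter of the triangle v_i v_j v_k."""
        return (math.dist(points[i], points[j])
                + math.dist(points[j], points[k])
                + math.dist(points[k], points[i]))

    # t[i][i+1] = 0 is the basis case (the trivial "2-sided polygon").
    t = [[0.0] * (n + 1) for _ in range(n + 1)]
    for gap in range(2, n + 1):         # gap = j - i, smallest subpolygons first
        for i in range(0, n - gap + 1):
            j = i + gap
            t[i][j] = min(t[i][k] + t[k][j] + w(i, k, j)
                          for k in range(i + 1, j))
    return t[0][n]
```

As in the chain matrix multiplication procedure, the subproblems are filled in by increasing size, so t[i][k] and t[k][j] are always available when t[i][j] is computed.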

[Figure: the subpolygon ⟨v_i, …, v_j⟩ of ⟨v_0, …, v_n⟩ is split by the triangle v_i v_k v_j, of cost w(v_i, v_k, v_j); the two remaining pieces are triangulated at cost t[i, k] and at cost t[k, j].]

Fig. 12: Triangulations and tree structure.

Relationship to Binary Trees: One explanation behind the similarity of triangulations and the chain matrix multiplication algorithm is to observe that both are fundamentally related to binary trees. In the case of the chain matrix multiplication, the associated binary tree is the evaluation tree for the multiplication, where the leaves of the tree correspond to the matrices, and each node of the tree is associated with a product of a sequence of two or more matrices. To see that there is a similar correspondence here, consider an (n + 1)-sided convex polygon $P = \langle v_0, v_1, \ldots, v_n \rangle$, and fix one side of the polygon (say $v_0 v_n$). Now consider a rooted binary tree whose root node is the triangle containing side $v_0 v_n$, whose internal nodes are the nodes of the dual tree, and whose leaves correspond to the remaining sides of the polygon. Observe that partitioning the polygon into triangles is equivalent to a binary tree with n leaves, and vice versa. This is illustrated in Fig. 13. Note that every triangle is associated with an internal node of the tree and every edge of the original polygon, except for the distinguished starting side $v_0 v_n$, is associated with a leaf node of the tree.

[Figure: a triangulation of the convex polygon ⟨v_0, v_1, …, v_11⟩ and the corresponding binary tree, with the root marked and leaves labeled A_1, …, A_11.]

Fig. 13: Triangulations and tree structure.

Once you see this connection, the following two observations follow easily. Observe that the associated binary tree has n leaves, and hence (by standard results on binary trees) n − 1 internal nodes. Since each internal node other than the root has one edge entering it, there are n − 2 edges between the internal nodes. Each internal node corresponds to one triangle, and each edge between internal nodes corresponds to one chord of the triangulation.

Lecture 7: Greedy Algorithms: Activity Selection and Fractional Knapsack

Read: Sections 16.1 and 16.2 in CLRS.

Greedy Algorithms: In many optimization problems a series of selections needs to be made. In dynamic programming we saw one way to make these selections. Namely, the optimal solution is described in a recursive manner, and then is computed “bottom-up”. Dynamic programming is a powerful technique, but it often leads to algorithms with higher than desired running times. Today we will consider an alternative design technique, called greedy algorithms. This method typically leads to simpler and faster algorithms, but it is not as powerful or as widely applicable as dynamic programming. We will give some examples of problems that can be solved by greedy algorithms. (Later in the semester, we will see that this technique can be applied to a number of graph problems as well.) Even when greedy algorithms do not produce the optimal solution, they often provide fast heuristics (nonoptimal solution strategies), and are often used in finding good approximations.


Activity Scheduling: Activity scheduling is a very simple scheduling problem. We are given a set S = {1, 2, . . . , n} of n activities that are to be scheduled to use some resource, where each activity i must start at a given start time \(s_i\) and end at a given finish time \(f_i\). For example, these might be lectures that are to be given in a lecture hall, where the lecture times have been set up in advance, or requests for boats to use a repair facility while they are in port.

Because there is only one resource, and some start and finish times may overlap (and two lectures cannot be given in the same room at the same time), not all the requests can be honored. We say that two activities i and j are noninterfering if their start-finish intervals do not overlap, more formally, \([s_i, f_i) \cap [s_j, f_j) = \emptyset\). (Note that by making the intervals half open, two consecutive activities are not considered to interfere.) The activity scheduling problem is to select a maximum-size set of mutually noninterfering activities for use of the resource. (Notice that the goal here is the maximum number of activities, not maximum utilization. Of course different criteria could be considered, but the greedy approach may not be optimal in general.)

How do we schedule the largest number of activities on the resource? Intuitively, we do not like long activities, because they occupy the resource and keep us from honoring other requests. This suggests the following greedy strategy: repeatedly select the activity with the smallest duration \((f_i - s_i)\) and schedule it, provided that it does not interfere with any previously scheduled activities. Although this seems like a reasonable strategy, this turns out to be nonoptimal. (See Problem 17.1-4 in CLRS). Sometimes the design of a correct greedy algorithm requires trying a few different strategies, until hitting on one that works.
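To make the failure of the shortest-duration rule concrete, here is a minimal Python sketch. The three intervals and the function names (`compatible`, `shortest_first`, `brute_force_best`) are illustrative assumptions, not taken from the notes or from CLRS. A short activity that straddles two longer ones blocks both of them.

```python
from itertools import combinations

def compatible(a, b):
    """True if half-open intervals a = [s, f) and b = [s, f) do not overlap."""
    return a[1] <= b[0] or b[1] <= a[0]

def shortest_first(acts):
    """Greedy by smallest duration f - s (the nonoptimal strategy)."""
    chosen = []
    for a in sorted(acts, key=lambda a: a[1] - a[0]):
        if all(compatible(a, c) for c in chosen):
            chosen.append(a)
    return sorted(chosen)

def brute_force_best(acts):
    """Largest mutually noninterfering subset, found by exhaustive search."""
    for size in range(len(acts), 0, -1):
        for subset in combinations(acts, size):
            if all(compatible(a, b) for a, b in combinations(subset, 2)):
                return sorted(subset)
    return []

# Hypothetical instance: the middle activity is shortest but overlaps both others.
activities = [(0, 5), (4, 7), (6, 11)]
print(shortest_first(activities))     # [(4, 7)]
print(brute_force_best(activities))   # [(0, 5), (6, 11)]
```

The exhaustive search is exponential and is only meant as a reference answer on tiny inputs.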

Here is a greedy strategy that does work. The intuition is the same. Since we do not like activities that take a

long time, let us select the activity that ﬁnishes ﬁrst and schedule it. Then, we skip all activities that interfere

with this one, and schedule the next one that has the earliest ﬁnish time, and so on. To make the selection process

faster, we assume that the activities have been sorted by their ﬁnish times, that is,

\[ f_1 \le f_2 \le \cdots \le f_n. \]

Assuming this sorting, the pseudocode for the rest of the algorithm is presented below. The output is the list A

of scheduled activities. The variable prev holds the index of the most recently scheduled activity at any time, in

order to determine interferences.

Greedy Activity Scheduler
    schedule(s[1..n], f[1..n]) {            // given start and finish times
                                            // we assume f[1..n] already sorted
        List A = <1>                        // schedule activity 1 first
        prev = 1
        for i = 2 to n
            if (s[i] >= f[prev]) {          // no interference?
                append i to A; prev = i     // schedule i next
            }
        return A
    }

It is clear that the algorithm is quite simple and efficient. The most costly step is sorting the activities by finish time, so the total running time is Θ(n log n). Fig. 14 shows an example. Each activity is represented by its start-finish time interval. Observe that the intervals are sorted by finish time. Activity 1 is scheduled first. It interferes with activities 2 and 3. Then activity 4 is scheduled. It interferes with activities 5 and 6. Finally, activity 7 is scheduled, and it interferes with the remaining activity, 8. The final output is {1, 4, 7}. Note that this is not the only optimal schedule. {2, 4, 7} is also optimal.
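The earliest-finish strategy can be sketched in Python as follows. This is an illustrative version, not the notes' own code: it sorts by finish time itself rather than assuming sorted input, and the eight intervals are a made-up instance chosen so that the behavior matches the description of Fig. 14 (the figure's actual times are not given in the text).

```python
def schedule(start, finish):
    """Return 1-based indices of a maximum set of noninterfering activities.

    start[i], finish[i] describe the half-open interval of activity i + 1.
    """
    # Process activities in increasing order of finish time.
    order = sorted(range(len(start)), key=lambda i: finish[i])
    chosen = []
    prev_finish = float("-inf")   # finish time of the most recently scheduled activity
    for i in order:
        if start[i] >= prev_finish:   # no interference with the previous choice
            chosen.append(i + 1)
            prev_finish = finish[i]
    return chosen

# Hypothetical instance, already sorted by finish time.
start  = [1, 2, 3, 5, 6, 7, 8, 10]
finish = [4, 5, 6, 8, 9, 10, 11, 12]
print(schedule(start, finish))   # [1, 4, 7]
```

Tracking only the last finish time is enough because the activities are processed in finish-time order, so anything that clears the most recent choice also clears all earlier ones.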

Proof of Optimality: Our proof of optimality is based on showing that the ﬁrst choice made by the algorithm is the

best possible, and then using induction to show that the rest of the choices result in an optimal schedule. Proofs

of optimality for greedy algorithms follow a similar structure. Suppose that you have any nongreedy solution.

Fig. 14: An example of the greedy algorithm for activity scheduling. The final schedule is {1, 4, 7}. (Steps shown: schedule 1, skip 2 and 3; schedule 4, skip 5 and 6; schedule 7, skip 8.)

Show that its cost can be reduced by being “greedier” at some point in the solution. This proof is complicated a

bit by the fact that there may be multiple solutions. Our approach is to show that any schedule that is not greedy

can be made more greedy, without decreasing the number of activities.

Claim: The greedy algorithm gives an optimal solution to the activity scheduling problem.

Proof: Consider any optimal schedule A that is not the greedy schedule. We will construct a new optimal schedule \(A'\) that is in some sense “greedier” than A. Order the activities in increasing order of finish time. Let \(A = \langle x_1, x_2, \ldots, x_k \rangle\) be the activities of A. Since A is not the same as the greedy schedule, consider the first activity \(x_j\) where these two schedules differ. That is, the greedy schedule is of the form \(G = \langle x_1, x_2, \ldots, x_{j-1}, g_j, \ldots \rangle\) where \(g_j \ne x_j\). (Note that k ≥ j, since otherwise G would have more activities than the optimal schedule, which would be a contradiction.) The greedy algorithm selects the activity with the earliest finish time that does not conflict with any earlier activity. Thus, we know that \(g_j\) does not conflict with any earlier activity, and it finishes no later than \(x_j\).

Consider the modified “greedier” schedule \(A'\) that results by replacing \(x_j\) with \(g_j\) in the schedule A. (See Fig. 15.) That is,
\[ A' = \langle x_1, x_2, \ldots, x_{j-1}, g_j, x_{j+1}, \ldots, x_k \rangle. \]

Fig. 15: Proof of optimality for the greedy schedule (j = 3).

This is a feasible schedule. (Since \(g_j\) cannot conflict with the earlier activities, and it does not conflict with later activities, because it finishes no later than \(x_j\).) It has the same number of activities as A, and therefore \(A'\) is also optimal. By repeating this process, we will eventually convert A into G, without decreasing the number of activities. Therefore, G is also optimal.

Fractional Knapsack Problem: The classical (0-1) knapsack problem is a famous optimization problem. A thief is robbing a store, and finds n items which can be taken. The ith item is worth \(v_i\) dollars and weighs \(w_i\) pounds, where \(v_i\) and \(w_i\) are integers. He wants to take as valuable a load as possible, but has a knapsack that can only carry W total pounds. Which items should he take? (The reason that this is called 0-1 knapsack is that each item must be left (0) or taken entirely (1). It is not possible to take a fraction of an item or multiple copies of an item.) This optimization problem arises in industrial packing applications. For example, you may want to ship some subset of items on a truck of limited capacity.

In contrast, in the fractional knapsack problem the setup is exactly the same, but the thief is allowed to take any

fraction of an item for a fraction of the weight and a fraction of the value. So, you might think of each object as

being a sack of gold, which you can partially empty out before taking.

The 0-1 knapsack problem is hard to solve, and in fact it is an NP-complete problem (meaning that there

probably doesn’t exist an efﬁcient solution). However, there is a very simple and efﬁcient greedy algorithm for

the fractional knapsack problem.

As in the case of the other greedy algorithms we have seen, the idea is to find the right order in which to process items. Intuitively, it is good to have high value and bad to have high weight. This suggests that we first sort the items according to some function that increases with value and decreases with weight. There are a few choices that you might try here, but only one works. Let \(\rho_i = v_i / w_i\) denote the value-per-pound ratio. We sort the items in decreasing order of \(\rho_i\), and add them in this order. If the item fits, we take it all. At some point there is an item that does not fit in the remaining space. We take as much of this item as possible, thus filling the knapsack entirely. This is illustrated in Fig. 16.
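The procedure just described can be sketched in a few lines of Python. This is an illustrative sketch, not code from the notes: the function name and return format are assumptions, and the item list is a reading of the example in Fig. 16 (weights 5, 10, 20, 30, 40 and capacity 60), which should be checked against the original figure.

```python
def fractional_knapsack(items, capacity):
    """items: list of (value, weight) pairs with weight > 0.

    Returns (total value, list of (value, weight, fraction taken)).
    """
    total = 0.0
    taken = []
    remaining = capacity
    # Process items in decreasing order of value-per-pound ratio rho = v / w.
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if remaining <= 0:
            break
        fraction = min(1.0, remaining / weight)   # whole item if it fits, else a part
        total += fraction * value
        remaining -= fraction * weight
        taken.append((value, weight, fraction))
    return total, taken

# (value, weight) pairs as read from Fig. 16; treat as an assumption.
items = [(30, 5), (20, 10), (100, 20), (90, 30), (160, 40)]
total, taken = fractional_knapsack(items, 60)
print(total)   # 270.0
```

Here the items of weight 5 and 20 are taken whole, and 35 of the 40 pounds of the next item fill the remaining space.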

Fig. 16: Example for the fractional knapsack problem. (Input: items of weight 5, 10, 20, 30, 40 with values $30, $20, $100, $90, $160, so ρ = 6.0, 2.0, 5.0, 3.0, 4.0, and a knapsack of capacity 60. Greedy solution to the fractional problem: $30 + $100 + $140 = $270. Greedy solution to the 0-1 problem: $30 + $100 + $90 = $220. Optimal solution to the 0-1 problem: $100 + $160 = $260.)

Correctness: It is intuitively easy to see that the greedy algorithm is optimal for the fractional problem. Given a room with sacks of gold, silver, and bronze, you would obviously take as much gold as possible, then take as much silver as possible, and then as much bronze as possible. But it would never benefit you to take a little less gold so that you could replace it with an equal volume of bronze.

More formally, suppose to the contrary that the greedy algorithm is not optimal. This would mean that there is an alternate selection that is optimal. Sort the items of the alternate selection in decreasing order by ρ values. Consider the first item i on which the two selections differ. By definition, greedy takes a greater amount of item i than the alternate (because the greedy always takes as much as it can). Let us say that greedy takes x more units of item i than the alternate does. All the subsequent elements of the alternate selection have lesser value per unit of weight than item i, that is, their ρ values are less than \(\rho_i\). By replacing x units of any such items with x units of item i, we would increase the overall value of the alternate selection. However, this implies that the alternate selection is not optimal, a contradiction.

Nonoptimality for the 0-1 Knapsack: Next we show that the greedy algorithm is not generally optimal in the 0-1 knapsack problem. Consider the example shown in Fig. 16. If you were to sort the items by \(\rho_i\), then you would first take the items of weight 5, then 20, and then (since the item of weight 40 does not fit) you would settle for the item of weight 30, for a total value of $30 + $100 + $90 = $220. On the other hand, if you had been less greedy, and ignored the item of weight 5, then you could take the items of weights 20 and 40 for a total value of $100 + $160 = $260. This feature of “delaying gratification” in order to come up with a better overall solution is your indication that the greedy solution is not optimal.
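The gap between the greedy and optimal 0-1 solutions can be checked directly. The sketch below is illustrative only: the function names are assumptions, the item list is a reading of the Fig. 16 example, and the exhaustive search stands in for an exact method purely because the instance is tiny.

```python
from itertools import combinations

def greedy_01(items, capacity):
    """Take whole items in decreasing value-per-pound order, skipping any that do not fit."""
    total_value, remaining = 0, capacity
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if weight <= remaining:
            total_value += value
            remaining -= weight
    return total_value

def optimal_01(items, capacity):
    """Exhaustive search over all subsets (exponential time; tiny inputs only)."""
    best = 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            if sum(w for _, w in subset) <= capacity:
                best = max(best, sum(v for v, _ in subset))
    return best

# (value, weight) pairs as read from Fig. 16; treat as an assumption.
items = [(30, 5), (20, 10), (100, 20), (90, 30), (160, 40)]
print(greedy_01(items, 60))    # 220
print(optimal_01(items, 60))   # 260
```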

Lecture 8: Greedy Algorithms: Huffman Coding

Read: Section 16.3 in CLRS.

Huffman Codes: Huffman codes provide a method of encoding data efficiently. Normally when characters are coded using standard codes like ASCII, each character is represented by a fixed-length codeword of bits (e.g. 8 bits per character). Fixed-length codes are popular, because it is very easy to break a string up into its individual characters, and to access individual characters and substrings by direct indexing. However, fixed-length codes may not be the most efficient from the perspective of minimizing the total quantity of data.

Consider the following example. Suppose that we want to encode strings over the (rather limited) 4-character

alphabet C = {a, b, c, d}. We could use the following fixed-length code:

Character a b c d

Fixed-Length Codeword 00 01 10 11

A string such as “abacdaacac” would be encoded by replacing each of its characters by the corresponding binary

codeword.

a b a c d a a c a c

00 01 00 10 11 00 00 10 00 10

The ﬁnal 20-character binary string would be “00010010110000100010”.

Now, suppose that you knew the relative probabilities of characters in advance. (This might happen by analyzing

many strings over a long period of time. In applications like data compression, where you want to encode one

ﬁle, you can just scan the ﬁle and determine the exact frequencies of all the characters.) You can use this

knowledge to encode strings differently. Frequently occurring characters are encoded using fewer bits and less

frequent characters are encoded using more bits. For example, suppose that characters are expected to occur

with the following probabilities. We could design a variable-length code which would do a better job.

Character a b c d

Probability 0.60 0.05 0.30 0.05

Variable-Length Codeword 0 110 10 111

Notice that there is no requirement that the alphabetical order of the characters correspond to any sort of ordering applied to the codewords. Now, the same string would be encoded as follows.

a b a c d a a c a c

0 110 0 10 111 0 0 10 0 10

Thus, the resulting 17-character string would be “01100101110010010”, a savings of 3 characters from using this alternative code. More generally, what would be the expected savings for a string of length n? For the 2-bit fixed-length code, the length of the encoded string is just 2n bits. For the variable-length code, the expected length of a single encoded character is equal to the sum of code lengths times the respective probabilities of their occurrences. The expected encoded string length is just n times the expected encoded character length.
\[ n(0.60 \cdot 1 + 0.05 \cdot 3 + 0.30 \cdot 2 + 0.05 \cdot 3) = n(0.60 + 0.15 + 0.60 + 0.15) = 1.5n. \]

Thus, this would represent a 25% savings in expected encoding length. The question that we will consider today

is how to form the best code, assuming that the probabilities of character occurrences are known.
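The two encodings and the expected-length calculation above can be reproduced with a short Python sketch. The dictionaries mirror the code tables in the text; the function names `encode` and `expected_bits_per_char` are illustrative assumptions.

```python
# Code tables and probabilities taken from the example in the text.
fixed = {"a": "00", "b": "01", "c": "10", "d": "11"}
variable = {"a": "0", "b": "110", "c": "10", "d": "111"}
prob = {"a": 0.60, "b": 0.05, "c": 0.30, "d": 0.05}

def encode(text, code):
    """Replace each character by its codeword and concatenate."""
    return "".join(code[ch] for ch in text)

def expected_bits_per_char(code, prob):
    """Sum over characters of probability times codeword length."""
    return sum(prob[ch] * len(code[ch]) for ch in code)

msg = "abacdaacac"
print(encode(msg, fixed))      # 00010010110000100010
print(encode(msg, variable))   # 01100101110010010
print(expected_bits_per_char(fixed, prob), expected_bits_per_char(variable, prob))
```

The last line prints values close to 2 and 1.5 bits per character (up to floating-point rounding), matching the 25% savings computed in the text.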

Preﬁx Codes: One issue that we didn’t consider in the example above is whether we will be able to decode the string,

once encoded. In fact, this code was chosen quite carefully. Suppose that instead of coding the character ‘a’

as 0, we had encoded it as 1. Now, the encoded string “111” is ambiguous. It might be “d” and it might be

“aaa”. How can we avoid this sort of ambiguity? You might suggest that we add separation markers between

the encoded characters, but this will tend to lengthen the encoding, which is undesirable. Instead, we would like

the code to have the property that it can be uniquely decoded.

Note that in both of the codes given in the example above, no codeword is a prefix of another. This turns out to be the key property. Observe that if one codeword were a prefix of another, e.g. a → 001 and b → 00101, then when we see 00101 . . . how do we know whether the first character of the encoded message is a or b? Conversely, if no codeword is a prefix of any other, then as soon as we see a codeword appearing as a prefix in the encoded text, we know that we may decode it without fear of it matching some longer codeword. Thus we have the following definition.

Preﬁx Code: An assignment of codewords to characters so that no codeword is a preﬁx of any other.

Observe that any binary prefix coding can be described by a binary tree in which the codewords are the leaves of the tree, and where a left branch means “0” and a right branch means “1”. The length of a codeword is just its depth in the tree. The code given earlier is a prefix code, and its corresponding tree is shown in Fig. 17.

Fig. 17: Prefix codes.

Decoding a prefix code is simple. We just traverse the tree from root to leaf, letting each input bit tell us which branch to take. On reaching a leaf, we output the corresponding character, and return to the root to continue the process.
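The decoding walk can be sketched in Python as below. This is one possible representation, chosen for brevity and not taken from the notes: the tree is a nested dictionary keyed by the bits '0' and '1', with characters at the leaves, and the names `build_tree` and `decode` are hypothetical. It assumes the input is a valid prefix code with nonempty codewords.

```python
def build_tree(code):
    """Build a binary trie: internal nodes are dicts keyed by '0'/'1', leaves are characters."""
    root = {}
    for ch, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = ch
    return root

def decode(bits, root):
    """Walk from the root following each bit; emit a character at every leaf."""
    out = []
    node = root
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):   # reached a leaf
            out.append(node)
            node = root             # return to the root for the next codeword
    if node is not root:
        raise ValueError("input ends in the middle of a codeword")
    return "".join(out)

variable = {"a": "0", "b": "110", "c": "10", "d": "111"}
print(decode("01100101110010010", build_tree(variable)))   # abacdaacac
```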

Expected encoding length: Once we know the probabilities of the various characters, we can determine the total length of the encoded text. Let p(x) denote the probability of seeing character x, and let \(d_T(x)\) denote the length of the codeword (depth in the tree) relative to some prefix tree T. The expected number of bits needed to encode a text with n characters is given in the following formula:
\[ B(T) = n \sum_{x \in C} p(x)\, d_T(x). \]


This suggests the following problem:

Optimal Code Generation: Given an alphabet C and the probabilities p(x) of occurrence for each character

x ∈ C, compute a preﬁx code T that minimizes the expected length of the encoded bit-string, B(T).

Note that the optimal code is not unique. For example, we could have complemented all of the bits in our earlier code without altering the expected encoded string length. There is a very simple algorithm for finding such a code. It was invented in the early 1950’s by David Huffman, and the resulting code is called a Huffman code. By the way, this code is used by the Unix utility pack for file compression. (There are better compression methods however. For example, compress, gzip and many others are based on a more sophisticated method called Lempel-Ziv coding.)

Huffman’s Algorithm: Here is the intuition behind the algorithm. Recall that we are given the occurrence probabilities for the characters. We are going to build the tree up from the leaf level. We will take two characters x and y, and “merge” them into a single super-character called z, which then replaces x and y in the alphabet. The character z will have a probability equal to the sum of x and y’s probabilities. Then we continue recursively building the code on the new alphabet, which has one fewer character. When the process is completed, we know the code for z, say 010. Then, we append a 0 and 1 to this codeword, giving 0100 for x and 0101 for y.

Another way to think of this is that we merge x and y as the left and right children of a root node called z. Then the subtree for z replaces x and y in the list of characters. We repeat this process until only one super-character remains. The resulting tree is the final prefix tree. Since x and y will appear at the bottom of the tree, it seems most logical to select the two characters with the smallest probabilities to perform the operation on. The result is Huffman’s algorithm. It is illustrated in Fig. 18.

The pseudocode for Huffman’s algorithm is given below. Let C denote the set of characters. Each character

x ∈ C is associated with an occurrence probability x.prob. Initially, the characters are all stored in a priority

queue Q. Recall that this data structure can be built initially in O(n) time, and we can extract the element with

the smallest key in O(log n) time and insert a new element in O(log n) time. The objects in Q are sorted by

probability. Note that with each execution of the for-loop, the number of items in the queue decreases by one.

So, after n − 1 iterations, there is exactly one element left in the queue, and this is the root of the ﬁnal preﬁx

code tree.

Correctness: The big question that remains is why is this algorithm correct? Recall that the cost of any encoding tree T is \(B(T) = \sum_x p(x)\, d_T(x)\). Our approach will be to show that any tree that differs from the one constructed by Huffman’s algorithm can be converted into one that is equal to Huffman’s tree without increasing its cost. First, observe that the Huffman tree is a full binary tree, meaning that every internal node has exactly two children. It would never pay to have an internal node with only one child (since such a node could be deleted), so we may limit consideration to full binary trees.

Claim: Consider the two characters, x and y with the smallest probabilities. Then there is an optimal code tree

in which these two characters are siblings at the maximum depth in the tree.

Proof: Let T be any optimal prefix code tree, and let b and c be two siblings at the maximum depth of the tree. Assume without loss of generality that p(b) ≤ p(c) and p(x) ≤ p(y) (if this is not true, then rename these characters). Now, since x and y have the two smallest probabilities it follows that p(x) ≤ p(b) and p(y) ≤ p(c). (In both cases they may be equal.) Because b and c are at the deepest level of the tree we know that d(b) ≥ d(x) and d(c) ≥ d(y). (Again, they may be equal.) Thus, we have p(b) − p(x) ≥ 0 and d(b) − d(x) ≥ 0, and hence their product is nonnegative. Now switch the positions of x and b in the tree, resulting in a new tree \(T'\). This is illustrated in Fig. 19.

Next let us see how the cost changes as we go from T to \(T'\). Almost all the nodes contribute the same to the expected cost. The only exceptions are nodes x and b. By subtracting the old contributions of these

Fig. 18: Huffman’s Algorithm. (Example with character frequencies a: 05, b: 48, c: 07, d: 17, e: 10, f: 13. At each step the two smallest entries are merged: a and c into 12, then 12 and e into 22, then f and d into 30, then 22 and 30 into 52, and finally b and 52 into the root.)

Huffman’s Algorithm
    Huffman(int n, character C[1..n]) {
        Q = C;                                  // priority queue
        for i = 1 to n-1 {
            z = new internal tree node;
            z.left = x = Q.extractMin();        // extract smallest probabilities
            z.right = y = Q.extractMin();
            z.prob = x.prob + y.prob;           // z’s probability is their sum
            Q.insert(z);                        // insert z into queue
        }
        return the last element left in Q as the root;
    }
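For readers who want something executable, here is a minimal Python sketch of the same procedure. It is an illustration under stated assumptions, not the notes' implementation: it uses the standard-library `heapq` module as the priority queue, represents an internal node as a `(left, right)` tuple, assumes characters are strings, and adds a counter as a tie-breaker so that heap entries with equal probability never compare nodes directly. The frequencies are a reading of the example in Fig. 18.

```python
import heapq
from itertools import count

def huffman(prob):
    """prob: dict mapping character (a str) -> probability or frequency.

    Returns a dict mapping each character to its codeword string.
    """
    tiebreak = count()   # unique sequence numbers keep heap entries comparable
    heap = [(p, next(tiebreak), ch) for ch, p in prob.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        px, _, x = heapq.heappop(heap)   # two smallest probabilities
        py, _, y = heapq.heappop(heap)
        heapq.heappush(heap, (px + py, next(tiebreak), (x, y)))   # merged node z
    codes = {}

    def assign(node, prefix):
        if isinstance(node, tuple):      # internal node: (left, right)
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"  # lone character gets a 1-bit code

    if heap:
        assign(heap[0][2], "")
    return codes

# Frequencies as read from Fig. 18; treat as an assumption.
freq = {"a": 5, "b": 48, "c": 7, "d": 17, "e": 10, "f": 13}
codes = huffman(freq)
print({ch: len(w) for ch, w in sorted(codes.items())})
# {'a': 4, 'b': 1, 'c': 4, 'd': 3, 'e': 3, 'f': 3}
```

The exact bit patterns depend on which child is labeled 0 and on how ties are broken, so only the codeword lengths are compared with the figure.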

Fig. 19: Correctness of Huffman’s Algorithm. (Swapping x with b takes T to T′, with cost change −(p(b) − p(x))(d(b) − d(x)) ≤ 0; swapping y with c takes T′ to T″, with cost change −(p(c) − p(y))(d(c) − d(y)) ≤ 0.)

nodes and adding in the new contributions we have
\[
\begin{aligned}
B(T') &= B(T) - p(x)d(x) + p(x)d(b) - p(b)d(b) + p(b)d(x) \\
      &= B(T) + p(x)(d(b) - d(x)) - p(b)(d(b) - d(x)) \\
      &= B(T) - (p(b) - p(x))(d(b) - d(x)) \\
      &\le B(T) \quad \text{because } (p(b) - p(x))(d(b) - d(x)) \ge 0.
\end{aligned}
\]
Thus the cost does not increase, implying that \(T'\) is an optimal tree. By switching y with c we get a new tree \(T''\), which by a similar argument is also optimal. The final tree \(T''\) satisfies the statement of the claim.

The above claim asserts that the first step of Huffman’s algorithm is essentially the proper one to perform. The complete proof of correctness for Huffman’s algorithm follows by induction on n (since with each step, we eliminate exactly one character).

Claim: Huffman’s algorithm produces the optimal preﬁx code tree.

Proof: The proof is by induction on n, the number of characters. For the basis case, n = 1, the tree consists of a single leaf node, which is obviously optimal.

Assume inductively that when there are strictly fewer than n characters, Huffman’s algorithm is guaranteed to produce the optimal tree. We want to show it is true with exactly n characters. Suppose we have exactly n characters. The previous claim states that we may assume that in the optimal tree, the two characters of lowest probability x and y will be siblings at the lowest level of the tree. Remove x and y, replacing them with a new character z whose probability is p(z) = p(x) + p(y). Thus n − 1 characters remain.

Consider any prefix code tree T made with this new set of n − 1 characters. We can convert it into a prefix code tree \(T'\) for the original set of characters by undoing the previous operation and replacing z with x and y (adding a “0” bit for x and a “1” bit for y). The cost of the new tree is
\[
\begin{aligned}
B(T') &= B(T) - p(z)d(z) + p(x)(d(z) + 1) + p(y)(d(z) + 1) \\
      &= B(T) - (p(x) + p(y))d(z) + (p(x) + p(y))(d(z) + 1) \\
      &= B(T) + (p(x) + p(y))(d(z) + 1 - d(z)) \\
      &= B(T) + p(x) + p(y).
\end{aligned}
\]
Since the change in cost depends in no way on the structure of the tree T, to minimize the cost of the final tree \(T'\), we need to build the tree T on n − 1 characters optimally. By induction, this is exactly what Huffman’s algorithm does. Thus the final tree is optimal.

Lecture 9: Graphs: Background and Breadth First Search

Read: Review Sections 22.1 and 22.2 in CLRS.

Graph Algorithms: We are now beginning a major new section of the course. We will be discussing algorithms for both directed and undirected graphs. Intuitively, a graph is a collection of vertices or nodes, connected by a collection of edges. Graphs are extremely important because they are a very flexible mathematical model for many application problems. Basically, any time you have a set of objects, and there is some “connection” or “relationship” or “interaction” between pairs of objects, a graph is a good way to model this. Examples of graphs in applications include communication and transportation networks, VLSI and other sorts of logic circuits, surface meshes used for shape description in computer-aided design and geographic information systems, and precedence constraints in scheduling systems. The list of applications is almost too long to even consider enumerating it. Most of the problems in computational graph theory that we will consider arise because they are of importance to one or more of these application areas. Furthermore, many of these problems form the basic building blocks from which more complex algorithms are then built.

Graphs and Digraphs: Most of you have encountered the notions of directed and undirected graphs in other courses,

so we will give a quick overview here.

Deﬁnition: A directed graph (or digraph) G = (V, E) consists of a ﬁnite set V , called the vertices or nodes,

and E, a set of ordered pairs, called the edges of G. (Another way of saying this is that E is a binary

relation on V .)

Observe that self-loops are allowed by this deﬁnition. Some deﬁnitions of graphs disallow this. Multiple edges

are not permitted (although the edges (v, w) and (w, v) are distinct).

Fig. 20: Digraph and graph example.

Deﬁnition: An undirected graph (or graph) G = (V, E) consists of a ﬁnite set V of vertices, and a set E of

unordered pairs of distinct vertices, called the edges. (Note that self-loops are not allowed).


Note that directed graphs and undirected graphs are different (but similar) objects mathematically. Certain

notions (such as path) are deﬁned for both, but other notions (such as connectivity) may only be deﬁned for one,

or may be deﬁned differently.

We say that vertex v is adjacent to vertex u if there is an edge (u, v). In a directed graph, given the edge e = (u, v), we say that u is the origin of e and v is the destination of e. In undirected graphs u and v are the endpoints of the edge. The edge e is incident on (meaning that it touches) both u and v.

In a digraph, the number of edges coming out of a vertex is called the out-degree of that vertex, and the number

of edges coming in is called the in-degree. In an undirected graph we just talk about the degree of a vertex as

the number of incident edges. By the degree of a graph, we usually mean the maximum degree of its vertices.

When discussing the size of a graph, we typically consider both the number of vertices and the number of edges.

The number of vertices is typically written as n or V , and the number of edges is written as m or E or e. Here

are some basic combinatorial facts about graphs and digraphs. We will leave the proofs to you. Given a graph

with V vertices and E edges then:

In a graph:
    Number of edges: \(0 \le E \le \binom{n}{2} = n(n-1)/2 \in O(n^2)\).
    Sum of degrees: \(\sum_{v \in V} \deg(v) = 2E\).
In a digraph:
    Number of edges: \(0 \le E \le n^2\).
    Sum of degrees: \(\sum_{v \in V} \text{in-deg}(v) = \sum_{v \in V} \text{out-deg}(v) = E\).
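The degree-sum facts are easy to sanity-check on a small example. The Python sketch below is illustrative only; the edge list and function names are made up for this purpose. Each undirected edge contributes to the degree of both endpoints, while each directed edge contributes once to an out-degree and once to an in-degree.

```python
from collections import Counter

def undirected_degree_sum(edges):
    """Sum of deg(v) over all vertices, treating each pair as an undirected edge."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sum(deg.values())

def directed_degree_sums(edges):
    """Return (sum of in-degrees, sum of out-degrees), treating pairs as directed edges."""
    in_deg, out_deg = Counter(), Counter()
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    return sum(in_deg.values()), sum(out_deg.values())

# Hypothetical 4-vertex example with E = 4 edges.
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
print(undirected_degree_sum(edges))   # 8
print(directed_degree_sums(edges))    # (4, 4)
```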

Notice that generally the number of edges in a graph may be as large as quadratic in the number of vertices. However, the large graphs that arise in practice typically have far fewer edges. A graph is said to be sparse if E ∈ Θ(V), and dense otherwise. When giving the running times of algorithms, we will usually express them as a function of both V and E, so that the performance on sparse and dense graphs will be apparent.

Paths and Cycles: A path in a graph or digraph is a sequence of vertices \(\langle v_0, v_1, \ldots, v_k \rangle\) such that \((v_{i-1}, v_i)\) is an edge for i = 1, 2, . . . , k. The length of the path is the number of edges, k. A path is simple if all vertices and all the edges are distinct. A cycle is a path containing at least one edge and for which \(v_0 = v_k\). A cycle is simple if its vertices (except \(v_0\) and \(v_k\)) are distinct, and all its edges are distinct.

A graph or digraph is said to be acyclic if it contains no simple cycles. An acyclic connected graph is called a

free tree or simply tree for short. (The term “free” is intended to emphasize the fact that the tree has no root, in

contrast to a rooted tree, as is usually seen in data structures.) An acyclic undirected graph (which need not be

connected) is a collection of free trees, and is (naturally) called a forest. An acyclic digraph is called a directed

acyclic graph, or DAG for short.

Fig. 21: Illustration of some graph terms. (Panels: free tree, forest, DAG, simple cycle, nonsimple cycle.)

We say that w is reachable from u if there is a path from u to w. Note that every vertex is reachable from itself

by a trivial path that uses zero edges. An undirected graph is connected if every vertex can reach every other

vertex. (Connectivity is a bit messier for digraphs, and we will deﬁne it later.) The subsets of mutually reachable

vertices partition the vertices of the graph into disjoint subsets, called the connected components of the graph.


Representations of Graphs and Digraphs: There are two common ways of representing graphs and digraphs. First we show how to represent digraphs. Let G = (V, E) be a digraph with n = |V| and let e = |E|. We will assume that the vertices of G are indexed {1, 2, . . . , n}.

Adjacency Matrix: An n × n matrix defined for 1 ≤ v, w ≤ n.
\[ A[v, w] = \begin{cases} 1 & \text{if } (v, w) \in E \\ 0 & \text{otherwise.} \end{cases} \]
If the digraph has weights we can store the weights in the matrix. For example if (v, w) ∈ E then A[v, w] = W(v, w) (the weight on edge (v, w)). If (v, w) ∉ E then generally W(v, w) need not be defined, but often we set it to some “special” value, e.g. A[v, w] = −1, or ∞. (By ∞ we mean some number which is larger than any allowable weight. In practice, this might be some machine dependent constant like MAXINT.)

Adjacency List: An array Adj[1 . . . n] of pointers where for 1 ≤ v ≤ n, Adj[v] points to a linked list contain-

ing the vertices which are adjacent to v (i.e. the vertices that can be reached from v by a single edge). If

the edges have weights then these weights may also be stored in the linked list elements.
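Both representations can be sketched in Python as follows. This is an illustrative sketch: the function names and the example digraph are assumptions (the edges are not those of Fig. 22), Python lists stand in for the linked lists described above, and index 0 is left unused so that vertices keep the 1..n numbering of the text.

```python
def adjacency_matrix(n, edges):
    """(n+1) x (n+1) 0/1 matrix for a digraph on vertices 1..n; row and column 0 are unused."""
    A = [[0] * (n + 1) for _ in range(n + 1)]
    for v, w in edges:
        A[v][w] = 1
    return A

def adjacency_list(n, edges):
    """adj[v] lists the vertices reachable from v by a single edge; adj[0] is unused."""
    adj = [[] for _ in range(n + 1)]
    for v, w in edges:
        adj[v].append(w)
    return adj

# Hypothetical digraph on 3 vertices, including a self-loop at vertex 1.
edges = [(1, 1), (1, 2), (1, 3), (2, 3), (3, 2)]
A = adjacency_matrix(3, edges)
adj = adjacency_list(3, edges)
print(A[1][2], A[2][1])   # 1 0
print(adj[1])             # [1, 2, 3]
```

To store an undirected graph the same way, each edge {v, w} would be entered twice, once as (v, w) and once as (w, v).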

Fig. 22: Adjacency matrix and adjacency list for digraphs.

We can represent undirected graphs using exactly the same representation, but we will store each edge twice. In particular, we represent the undirected edge {v, w} by the two oppositely directed edges (v, w) and (w, v). Notice that even though we represent undirected graphs in the same way that we represent digraphs, it is important to remember that these two classes of objects are mathematically distinct from one another.

This can cause some complications. For example, suppose you write an algorithm that operates by marking

edges of a graph. You need to be careful when you mark edge (v, w) in the representation that you also mark

(w, v), since they are both the same edge in reality. When dealing with adjacency lists, it may not be convenient

to walk down the entire linked list, so it is common to include cross links between corresponding edges.

Fig. 23: Adjacency matrix and adjacency list for graphs. (Panels: adjacency list with crosslinks; adjacency matrix.)

An adjacency matrix requires Θ(V^2) storage and an adjacency list requires Θ(V + E) storage. The V arises

because there is one entry for each vertex in Adj . Since each list has out-deg(v) entries, when this is summed

over all vertices, the total number of adjacency list records is Θ(E). For sparse graphs the adjacency list

representation is more space efﬁcient.

Graph Traversals: There are a number of approaches used for solving problems on graphs. One of the most impor-

tant approaches is based on the notion of systematically visiting all the vertices and edges of a graph. The reason

for this is that these traversals impose a type of tree structure (or generally a forest) on the graph, and trees are

usually much easier to reason about than general graphs.

Breadth-first search: Given a graph G = (V, E), breadth-first search starts at some source vertex s and “discovers”

which vertices are reachable from s. Deﬁne the distance between a vertex v and s to be the minimum number

of edges on a path from s to v. Breadth-ﬁrst search discovers vertices in increasing order of distance, and hence

can be used as an algorithm for computing shortest paths. At any given time there is a “frontier” of vertices that

have been discovered, but not yet processed. Breadth-ﬁrst search is named because it visits vertices across the

entire “breadth” of this frontier.

Initially all vertices (except the source) are colored white, meaning that they are undiscovered. When a vertex

has ﬁrst been discovered, it is colored gray (and is part of the frontier). When a gray vertex is processed, then it

becomes black.

The search makes use of a queue, a ﬁrst-in ﬁrst-out list, where elements are removed in the same order they

are inserted. The ﬁrst item in the queue (the next to be removed) is called the head of the queue. We will also

maintain arrays color[u] which holds the color of vertex u (either white, gray or black), pred[u] which points to

the predecessor of u (i.e. the vertex that first discovered u), and d[u], the distance from s to u. Only the color

is really needed for the search (in fact it is only necessary to know whether a node is nonwhite). We include all

this information, because some applications of BFS use this additional information.

Breadth-First Search

BFS(G,s) {
    for each u in V {                 // initialization
        color[u] = white
        d[u] = infinity
        pred[u] = null
    }
    color[s] = gray                   // initialize source s
    d[s] = 0
    Q = {s}                           // put s in the queue
    while (Q is nonempty) {
        u = Q.Dequeue()               // u is the next to visit
        for each v in Adj[u] {
            if (color[v] == white) {  // if neighbor v undiscovered
                color[v] = gray       // ...mark it discovered
                d[v] = d[u]+1         // ...set its distance
                pred[v] = u           // ...and its predecessor
                Q.Enqueue(v)          // ...put it in the queue
            }
        }
        color[u] = black              // we are done with u
    }
}
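For readers who want to experiment, here is a minimal runnable sketch of the BFS procedure in Python. It assumes the graph is given as a dictionary Adj mapping each vertex to a list of its neighbors (an assumption of this sketch, not something the notes prescribe); the example graph is hypothetical.

```python
from collections import deque
import math


def bfs(Adj, s):
    """Breadth-first search from s; returns (d, pred) as dicts keyed by vertex."""
    color = {u: "white" for u in Adj}   # white = undiscovered
    d = {u: math.inf for u in Adj}      # distance from s
    pred = {u: None for u in Adj}       # predecessor in the BFS tree
    color[s] = "gray"
    d[s] = 0
    Q = deque([s])                      # FIFO queue holding the frontier
    while Q:
        u = Q.popleft()                 # u is the next to visit
        for v in Adj[u]:
            if color[v] == "white":     # neighbor v undiscovered
                color[v] = "gray"
                d[v] = d[u] + 1
                pred[v] = u
                Q.append(v)
        color[u] = "black"              # we are done with u
    return d, pred


# Hypothetical undirected graph (each edge stored in both directions).
Adj = {
    "s": ["a", "b"],
    "a": ["s", "c"],
    "b": ["s", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
    "x": [],  # not reachable from s, so its distance stays infinite
}
d, pred = bfs(Adj, "s")
```

Following pred pointers from any reachable vertex back to s traces a shortest path in reverse.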

Observe that the predecessor pointers of the BFS search deﬁne an inverted tree (an acyclic directed graph in

which the source is the root, and every other node has a unique path to the root). If we reverse these edges we

get a rooted unordered tree called a BFS tree for G. (Note that there are many potential BFS trees for a given

graph, depending on where the search starts, and in what order vertices are placed on the queue.) These edges

of G are called tree edges and the remaining edges of G are called cross edges.

Fig. 24: Breadth-first search: Example. (The panels show the queue contents as the search proceeds: Q: a, c, d; Q: c, d, e; Q: d, e, b; Q: e, b; Q: b, f, g; Q: (empty).)

It is not hard to prove that if G is an undirected graph, then cross edges always go between two nodes that are at

most one level apart in the BFS tree. (Can you see why this must be true?) Below is a sketch of a proof that on

termination, d[v] is equal to the distance from s to v. (See CLRS for a detailed proof.)

Theorem: Let δ(s, v) denote the length (number of edges) of the shortest path from s to v. Then, on termination

of the BFS procedure, d[v] = δ(s, v).

Proof: (Sketch) The proof is by induction on the length of the shortest path. Let u be the predecessor of v on

some shortest path from s to v, and among all such vertices the ﬁrst to be processed by the BFS. Thus,

δ(s, v) = δ(s, u) + 1. When u is processed, we have (by induction) d[u] = δ(s, u). Since v is a neighbor

of u, we set d[v] = d[u] + 1. Thus we have

d[v] = d[u] + 1 = δ(s, u) + 1 = δ(s, v),

as desired.

Analysis: The running time analysis of BFS is similar to the running time analysis of many graph traversal algorithms.

As done in CLR, we write V = |V| and E = |E|. Observe that the initialization portion requires Θ(V) time. The real

meat is in the traversal loop. Since we never visit a vertex twice, the number of times we go through the while

loop is at most V (exactly V assuming each vertex is reachable from the source). The number of iterations

through the inner for loop is proportional to deg(u) + 1. (The +1 is because even if deg(u) = 0, we need to

spend a constant amount of time to set up the loop.) Summing up over all vertices we have the running time

\[ T(V) = V + \sum_{u \in V} (\deg(u) + 1) = V + \sum_{u \in V} \deg(u) + V = 2V + 2E \in \Theta(V + E). \]

The analysis is essentially the same for directed graphs.

Lecture 10: Depth-First Search

Read: Sections 23.2 and 23.3 in CLR.

Depth-First Search: The next traversal algorithm that we will study is called depth-ﬁrst search, and it has the nice

property that nontree edges have a good deal of mathematical structure.

Consider the problem of searching a castle for treasure. To solve it you might use the following strategy. As

you enter a room of the castle, paint some grafﬁti on the wall to remind yourself that you were already there.

Successively travel from room to room as long as you come to a place you haven’t already been. When you

return to the same room, try a different door leaving the room (assuming it goes somewhere you haven’t already

been). When all doors have been tried in a given room, then backtrack.

Notice that this algorithm is described recursively. In particular, when you enter a new room, you are beginning

a new search. This is the general idea behind depth-ﬁrst search.

Depth-First Search Algorithm: We assume we are given a directed graph G = (V, E). The same algorithm works

for undirected graphs (but the resulting structure imposed on the graph is different).

We use four auxiliary arrays. As before we maintain a color for each vertex: white means undiscovered, gray

means discovered but not ﬁnished processing, and black means ﬁnished. As before we also store predecessor

pointers, pointing back to the vertex that discovered a given vertex. We will also associate two numbers with

each vertex. These are time stamps. When we first discover a vertex u we store a counter in d[u] and when we are

ﬁnished processing a vertex we store a counter in f[u]. The purpose of the time stamps will be explained later.

(Note: Do not confuse the discovery time d[v] with the distance d[v] from BFS.) The algorithm is shown in the code

block below, and illustrated in Fig. 25. As with BFS, DFS induces a tree structure. We will discuss this tree

structure further below.

Depth-First Search

DFS(G) {                              // main program
    for each u in V {                 // initialization
        color[u] = white;
        pred[u] = null;
    }
    time = 0;
    for each u in V
        if (color[u] == white)        // found an undiscovered vertex
            DFSVisit(u);              // start a new search here
}

DFSVisit(u) {                         // start a search at u
    color[u] = gray;                  // mark u visited
    d[u] = ++time;
    for each v in Adj(u) do
        if (color[v] == white) {      // if neighbor v undiscovered
            pred[v] = u;              // ...set predecessor pointer
            DFSVisit(v);              // ...visit v
        }
    color[u] = black;                 // we're done with u
    f[u] = ++time;
}
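The same procedure can be sketched in runnable Python. As an assumption of this sketch, the digraph is a dictionary Adj mapping each vertex to its list of out-neighbors, and the example digraph is hypothetical. Note that Python limits recursion depth, so for very large graphs an explicit stack would be needed instead of recursion.

```python
def dfs(Adj):
    """Run DFS over all vertices; return (d, f, pred) dicts of time stamps and predecessors."""
    color = {u: "white" for u in Adj}
    pred = {u: None for u in Adj}
    d, f = {}, {}          # discovery and finish time stamps
    time = 0

    def visit(u):
        nonlocal time
        color[u] = "gray"              # mark u visited
        time += 1
        d[u] = time
        for v in Adj[u]:
            if color[v] == "white":    # neighbor v undiscovered
                pred[v] = u
                visit(v)
        color[u] = "black"             # we are done with u
        time += 1
        f[u] = time

    for u in Adj:
        if color[u] == "white":        # start a new search here
            visit(u)
    return d, f, pred


# Hypothetical digraph: a -> b, a -> c, b -> c, c -> a, d -> c.
Adj = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
d, f, pred = dfs(Adj)
```

The resulting forest depends on the order in which vertices and neighbors are listed in Adj; a different order may give different time stamps.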

Fig. 25: Depth-First search tree.

Analysis: The running time of DFS is Θ(V + E). This is somewhat harder to see than the BFS analysis, because the

recursive nature of the algorithm obscures things. Normally, recurrences are good ways to analyze recursively

defined algorithms, but it is not true here, because there is no good notion of “size” that we can attach to each

recursive call.

First observe that if we ignore the time spent in the recursive calls, the main DFS procedure runs in O(V ) time.

Observe that each vertex is visited exactly once in the search, and hence the call DFSVisit() is made exactly

once for each vertex. We can just analyze each one individually and add up their running times. Ignoring the

time spent in the recursive calls, we can see that each vertex u can be processed in O(1+outdeg(u)) time. Thus

the total time used in the procedure is

\[ T(V) = V + \sum_{u \in V} (\mathrm{outdeg}(u) + 1) = V + \sum_{u \in V} \mathrm{outdeg}(u) + V = 2V + E \in \Theta(V + E). \]

A similar analysis holds if we consider DFS for undirected graphs.

Tree structure: DFS naturally imposes a tree structure (actually a collection of trees, or a forest) on the structure

of the graph. This is just the recursion tree, where the edge (u, v) arises when, while processing vertex u, we call

DFSVisit(v) for some neighbor v. For directed graphs the other edges of the graph can be classified as

follows:

Back edges: (u, v) where v is a (not necessarily proper) ancestor of u in the tree. (Thus, a self-loop is consid-

ered to be a back edge).

Forward edges: (u, v) where v is a proper descendent of u in the tree.

Cross edges: (u, v) where u and v are not ancestors or descendents of one another (in fact, the edge may go

between different trees of the forest).

It is not difﬁcult to classify the edges of a DFS tree by analyzing the values of colors of the vertices and/or

considering the time stamps. This is left as an exercise.
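One possible approach to this exercise, using only the time stamps and predecessor pointers, is sketched below in Python. It assumes a simple digraph (no parallel edges), so that pred[v] == u identifies tree edges; the function name and the hard-coded time stamps (taken from a hypothetical DFS run) are illustrative only.

```python
def classify_edge(u, v, d, f, pred):
    """Classify digraph edge (u, v) using DFS time stamps d, f and predecessors pred."""
    if pred[v] == u:
        return "tree"
    if d[v] <= d[u] and f[u] <= f[v]:
        return "back"       # v is a (not necessarily proper) ancestor of u
    if d[u] < d[v] and f[v] < f[u]:
        return "forward"    # v is a proper descendant of u, reached by a nontree edge
    return "cross"          # the two start-finish intervals are disjoint


# Time stamps from a hypothetical DFS of the digraph
#   a -> b, a -> c, b -> c, c -> a, d -> c   (searches started at a, then at d).
d = {"a": 1, "b": 2, "c": 3, "d": 7}
f = {"a": 6, "b": 5, "c": 4, "d": 8}
pred = {"a": None, "b": "a", "c": "b", "d": None}
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("d", "c")]
kinds = {e: classify_edge(*e, d, f, pred) for e in edges}
```

A self-loop (u, u) satisfies the second test and is reported as a back edge, matching the convention above.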

With undirected graphs, there are some important differences in the structure of the DFS tree. First, there is

really no distinction between forward and back edges. So, by convention, they are all called back edges.

Furthermore, it can be shown that there can be no cross edges. (Can you see why not?)

Time-stamp structure: There is also a nice structure to the time stamps. In CLR this is referred to as the parenthesis

structure. In particular, the following are easy to observe.

Lemma: (Parenthesis Lemma) Given a digraph G = (V, E), and any DFS tree for G and any two vertices

u, v ∈ V .

• u is a descendent of v if and only if [d[u], f[u]] ⊆ [d[v], f[v]].

• u is an ancestor of v if and only if [d[u], f[u]] ⊇ [d[v], f[v]].

• u is unrelated to v if and only if [d[u], f[u]] and [d[v], f[v]] are disjoint.

Fig. 26: Parenthesis Lemma.

Cycles: The time stamps given by DFS allow us to determine a number of things about a graph or digraph. For

example, suppose you are given a graph or digraph. You run DFS. You can determine whether the graph

contains any cycles very easily. We do this with the help of the following two lemmas.

Lemma: Given a digraph G = (V, E), consider any DFS forest of G, and consider any edge (u, v) ∈ E. If this

edge is a tree, forward, or cross edge, then f[u] > f[v]. If the edge is a back edge then f[u] ≤ f[v].

Proof: For tree, forward, and back edges, the proof follows directly from the parenthesis lemma. (E.g. for a

forward edge (u, v), v is a descendent of u, and so v’s start-ﬁnish interval is contained within u’s, implying

that v has an earlier ﬁnish time.) For a cross edge (u, v) we know that the two time intervals are disjoint.

When we were processing u, v was not white (otherwise (u, v) would be a tree edge), implying that v was

started before u. Because the intervals are disjoint, v must have also ﬁnished before u.

Lemma: Consider a digraph G = (V, E) and any DFS forest for G. G has a cycle if and only if the DFS forest

has a back edge.

Proof: (⇐) If there is a back edge (u, v), then v is an ancestor of u, and by following tree edges from v to u

we get a cycle.

(⇒) We show the contrapositive. Suppose there are no back edges. By the lemma above, each of the

remaining types of edges, tree, forward, and cross all have the property that they go from vertices with

higher ﬁnishing time to vertices with lower ﬁnishing time. Thus along any path, ﬁnish times decrease

monotonically, implying there can be no cycle.

Beware: No back edges means no cycles. But you should not infer that there is some simple relationship

between the number of back edges and the number of cycles. For example, a DFS tree may only have a single

back edge, and there may be anywhere from one up to an exponential number of simple cycles in the graph.

A similar theorem applies to undirected graphs, and is not hard to prove.
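For digraphs, the lemma suggests a simple cycle test: run DFS and report a cycle as soon as a back edge is seen. The Python sketch below relies on the standard fact (not stated explicitly in these notes) that an edge (u, v) is a back edge exactly when v is gray at the moment the edge is explored. The function name and the dictionary representation of Adj are assumptions of the sketch.

```python
def has_cycle(Adj):
    """Return True if the digraph Adj (dict: vertex -> list of out-neighbors) has a cycle."""
    color = {u: "white" for u in Adj}

    def visit(u):
        color[u] = "gray"
        for v in Adj[u]:
            if color[v] == "gray":                 # v is an ancestor of u: back edge
                return True
            if color[v] == "white" and visit(v):   # recurse; propagate a found cycle
                return True
        color[u] = "black"
        return False

    # Start a new search from every vertex that is still undiscovered.
    return any(color[u] == "white" and visit(u) for u in Adj)
```

For example, has_cycle({"a": ["b"], "b": ["c"], "c": ["a"]}) reports a cycle, while a DAG such as {"a": ["b", "c"], "b": ["c"], "c": []} does not.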

Lecture 11: Topological Sort and Strong Components

Read: Sects. 22.3–22.5 in CLRS.

Directed Acyclic Graph: A directed acyclic graph is often called a DAG for short. DAGs arise in many applications

where there are precedence or ordering constraints. For example, there may be a series of tasks to be performed,

and certain tasks must precede other tasks (e.g. in construction you have to build the ﬁrst ﬂoor before you build

the second ﬂoor, but you can do the electrical wiring while you install the windows). In general a precedence

constraint graph is a DAG in which vertices are tasks and the edge (u, v) means that task u must be completed

before task v begins.

A topological sort of a DAG is a linear ordering of the vertices of the DAG such that for each edge (u, v), u

appears before v in the ordering. Note that in general, there may be many legal topological orders for a given

DAG.

To compute a topological ordering is actually very easy, given DFS. By the previous lemma, for every edge

(u, v) in a DAG, the ﬁnish time of u is greater than the ﬁnish time of v. Thus, it sufﬁces to output the vertices

in reverse order of ﬁnishing time. To do this we run a (stripped down) DFS, and when each vertex is ﬁnished

we add it to the front of a linked list. The ﬁnal linked list order will be the ﬁnal topological order. This is given

below.

Topological Sort

TopSort(G) {
    for each (u in V) color[u] = white;     // initialize
    L = new linked_list;                    // L is an empty linked list
    for each (u in V)
        if (color[u] == white) TopVisit(u);
    return L;                               // L gives final order
}

TopVisit(u) {                               // start a search at u
    color[u] = gray;                        // mark u visited
    for each (v in Adj(u))
        if (color[v] == white) TopVisit(v);
    Insert u at the front of L;             // on finishing u add to list
}
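A runnable Python sketch of the same procedure follows. It assumes the DAG is a dictionary Adj mapping each vertex to its out-neighbors, uses a deque in place of the linked list, and the small dressing-style DAG is a made-up example (it is not the graph from CLRS).

```python
from collections import deque


def top_sort(Adj):
    """Return the vertices of the DAG Adj in a topological order."""
    color = {u: "white" for u in Adj}
    L = deque()                        # plays the role of the linked list

    def visit(u):
        color[u] = "gray"              # mark u visited
        for v in Adj[u]:
            if color[v] == "white":
                visit(v)
        L.appendleft(u)                # on finishing u, add it to the front

    for u in Adj:
        if color[u] == "white":
            visit(u)
    return list(L)


# Hypothetical precedence constraints: edge (u, v) means u must come before v.
Adj = {
    "shorts": ["pants"],
    "pants": ["belt", "shoes"],
    "socks": ["shoes"],
    "shirt": ["belt"],
    "belt": [],
    "shoes": [],
}
order = top_sort(Adj)
position = {u: i for i, u in enumerate(order)}
```

The sketch does not check that the input is acyclic; on a graph with a cycle it still returns an ordering, but one that cannot satisfy every edge.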

This is a typical example of how DFS is used in applications. Observe that the structure is essentially the same as the

basic DFS procedure, but we only include the elements of DFS that are needed for this application.

As an example we consider the DAG presented in CLRS for Professor Bumstead’s order of dressing. Bumstead

lists the precedences in the order in which he puts on his clothes in the morning. We do our depth-ﬁrst search in

a different order from the one given in CLRS, and so we get a different ﬁnal ordering. However both orderings

are legitimate, given the precedence constraints. As with depth-ﬁrst search, the running time of topological sort

is Θ(V +E).

Fig. 27: Topological sort. (Final order: socks, shirt, tie, shorts, pants, shoes, belt, jacket.)

Strong Components: Next we consider a very important connectivity problem with digraphs. When digraphs are

used in communication and transportation networks, people want to know that their networks are complete in

the sense that from any location it is possible to reach any other location in the digraph. A digraph is strongly

connected if for every pair of vertices, u, v ∈ V, u can reach v and vice versa.

We would like to write an algorithm that determines whether a digraph is strongly connected. In fact we will

solve a generalization of this problem, of computing the strongly connected components (or strong components

for short) of a digraph. In particular, we partition the vertices of the digraph into subsets such that the induced

subgraph of each subset is strongly connected. (These subsets should be as large as possible, and still have this

property.) More formally, we say that two vertices u and v are mutually reachable if u can reach v and vice

versa. It is easy to see that mutual reachability is an equivalence relation. This equivalence relation partitions

the vertices into equivalence classes of mutually reachable vertices, and these are the strong components.

Observe that if we merge the vertices in each strong component into a single super vertex, and join two

supervertices (A, B) if and only if there are vertices u ∈ A and v ∈ B such that (u, v) ∈ E, then the resulting

digraph, called the component digraph, is necessarily acyclic. (Can you see why?) Thus, we may accurately

refer to it as the component DAG.

Fig. 28: Strong Components. (Left: digraph and strong components {a, b, c}, {d, e}, {f, g, h, i}; right: component DAG.)

The algorithm that we will present is an algorithm designer’s “dream” (and an algorithm student’s nightmare).

It is amazingly simple and efﬁcient, but it is so clever that it is very difﬁcult to even see how it works. We will

give some of the intuition that leads to the algorithm, but will not prove the algorithm’s correctness formally.

See CLRS for a formal proof.

Strong Components and DFS: By way of motivation, consider the DFS of the digraph shown in the following ﬁgure

(left). By deﬁnition of DFS, when you enter a strong component, every vertex in the component is reachable,

so the DFS does not terminate until all the vertices in the component have been visited. Thus all the vertices

in a strong component must appear in the same tree of the DFS forest. Observe that in the ﬁgure each strong

component is just a subtree of the DFS forest. Is it always true for any DFS? Unfortunately the answer is

no. In general, many strong components may appear in the same DFS tree. (See the DFS on the right for a

counterexample.) Does there always exist a way to order the DFS such that it is true? Fortunately, the answer is

yes.

Suppose that you knew the component DAG in advance. (This is ridiculous, because you would need to know

the strong components, and that is the problem we are trying to solve. But humor me for a moment.) Further

suppose that you computed a reversed topological order on the component digraph. That is, if (u, v) is an edge in

the component digraph, then v comes before u in this reversed order (not after as it would in a normal topological

ordering). Now, run DFS, but every time you need a new vertex to start the search from, select the next available

vertex according to this reverse topological order of the component digraph.

Here is an informal justiﬁcation. Clearly once the DFS starts within a given strong component, it must visit

every vertex within the component (and possibly some others) before finishing. If we do not start in reverse

topological order, then the search may “leak out” into other strong components, and put them in the same DFS tree.

For example, in the figure below right, when the search is started at vertex a, not only does it visit its component

with b and c, but it also visits the other components as well. However, by visiting components in reverse

topological order of the component DAG, each search cannot “leak out” into other components, because other

components would already have been visited earlier in the search.

Fig. 29: Two depth-first searches.

This leaves us with the intuition that if we could somehow order the DFS, so that it hits the strong components

according to a reverse topological order, then we would have an easy algorithm for computing strong compo-

nents. However, we do not know what the component DAG looks like. (After all, we are trying to solve the

strong component problem in the ﬁrst place). The “trick” behind the strong component algorithm is that we

can ﬁnd an ordering of the vertices that has essentially the necessary property, without actually computing the

component DAG.

The Plumber’s Algorithm: I call this algorithm the plumber’s algorithm (because it avoids leaks). Unfortunately it

is quite difﬁcult to understand why this algorithm works. I will present the algorithm, and refer you to CLRS

for the complete proof. First recall that G^R (what CLRS calls G^T) is the digraph with the same vertex set as G

but in which all edges have been reversed in direction. Given an adjacency list for G, it is possible to compute

G^R in Θ(V + E) time. (I’ll leave this as an exercise.)
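One way to approach the exercise is sketched below in Python. It assumes the adjacency list is a dictionary Adj in which every vertex appears as a key; the function name reverse is chosen for the sketch. Each vertex and each edge is touched once, which gives the Θ(V + E) bound.

```python
def reverse(Adj):
    """Return the adjacency list of the reversed digraph: edge (u, v) becomes (v, u)."""
    R = {u: [] for u in Adj}       # same vertex set, no edges yet: Theta(V)
    for u in Adj:
        for v in Adj[u]:           # each edge is handled once: Theta(E) overall
            R[v].append(u)
    return R


# Hypothetical digraph for illustration.
Adj = {"a": ["b", "c"], "b": ["c"], "c": []}
R = reverse(Adj)
```

Reversing twice returns a digraph with the same edge set as the original, although the order of entries within each list may differ.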

Observe that the strongly connected components are not affected by reversing all the digraph’s edges. If u and v

are mutually reachable in G, then certainly this is still true in G^R. All that changes is that the component DAG

is completely reversed. The ordering trick is to order the vertices of G according to their finish times in a DFS.

Then visit the nodes of G^R in decreasing order of finish times. All the steps of the algorithm are quite easy to

implement, and all operate in Θ(V + E) time. Here is the algorithm.

Strong Components

StrongComp(G) {
    Run DFS(G), computing finish times f[u] for each vertex u;
    Compute R = Reverse(G), reversing all edges of G;
    Sort the vertices of R (by CountingSort) in decreasing order of f[u];
    Run DFS(R) using this order;
    Each DFS tree is a strong component;
}

Fig. 30: Strong Components Algorithm. (Panels: initial DFS; reversal with new vertex order; final DFS with components.)

Correctness: Why visit vertices in decreasing order of finish times? Why use the reversal digraph? It is difficult

to justify these elements formally. Here is some intuition, though. Recall that the main intent is to visit the

strong components in a reverse topological order. The question is how to order the vertices so that this is true.

Recall from the topological sorting algorithm, that in a DAG, ﬁnish times occur in reverse topological order

(i.e., the ﬁrst vertex in the topological order is the one with the highest ﬁnish time). So, if we wanted to visit

the components in reverse topological order, this suggests that we should visit the vertices in increasing order

of ﬁnish time, starting with the lowest ﬁnishing time. This is a good starting idea, but it turns out that it doesn’t

work. The reason is that there are many vertices in each strong component, and they all have different ﬁnish

times. For example, in the ﬁgure above observe that in the ﬁrst DFS (on the left) the lowest ﬁnish time (of 4) is

achieved by vertex c, and its strong component is ﬁrst, not last, in topological order.

It is tempting to give up in frustration at this point. But there is something to notice about the ﬁnish times. If

we consider the maximum ﬁnish time in each component, then these are related to the topological order of the

component DAG. In particular, given any strong component C, deﬁne f(C) to be the maximum ﬁnish time

among all vertices in this component.

\[ f(C) = \max_{u \in C} f[u]. \]

Lemma: Consider a digraph G = (V, E) and let C and C′ be two distinct strong components. If there is an edge

(u, v) of G such that u ∈ C and v ∈ C′, then f(C) > f(C′).

See the book for a complete proof. Here is a quick sketch. If the DFS visits C first, then the DFS will leak into

C′ (along edge (u, v) or some other edge), and then will visit everything in C′ before finally returning to C.

Thus, some vertex of C will finish later than every vertex of C′. On the other hand, suppose that C′ is visited

first. Because there is an edge from C to C′, we know from the definition of the component DAG that there

cannot be a path from C′ to C. So C′ will completely finish before we even start C. Thus all the finish times of

C will be larger than the finish times of C′.

For example, in the previous figure, the maximum finish times for each component are 18 (for {a, b, c}), 17 (for

{d, e}), and 12 (for {f, g, h, i}). The order ⟨18, 17, 12⟩ is a valid topological order for the component digraph.

This is a big help. It tells us that if we run DFS and compute ﬁnish times, and then run a new DFS in decreasing

order of ﬁnish times, we will visit the components in topological order. The problem is that this is not what

we wanted. We wanted a reverse topological order for the component DAG. So, the final trick is to reverse

the digraph, by forming G^R. This does not change the strong components, but it reverses the edges of the

component graph, and so reverses the topological order, which is exactly what we wanted. In conclusion we

have:

Theorem: Consider a digraph G on which DFS has been run. Sort the vertices by decreasing order of finish

time. Then a DFS of the reversed digraph G^R visits the strong components according to a reversed

topological order of the component DAG of G^R.
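The strong components algorithm described in this lecture can be sketched in Python as follows. This is an illustrative sketch, not a definitive implementation: it assumes the digraph is a dictionary Adj with every vertex as a key, and instead of sorting by finish time with CountingSort it simply records vertices in the order they finish and then scans that list backwards, which yields the same decreasing order of finish times.

```python
def strong_components(Adj):
    """Return the strong components of digraph Adj as a list of vertex lists."""
    # First DFS on G: record vertices in increasing order of finish time.
    visited = set()
    finish_order = []

    def visit(u):
        visited.add(u)
        for v in Adj[u]:
            if v not in visited:
                visit(v)
        finish_order.append(u)         # u finishes here

    for u in Adj:
        if u not in visited:
            visit(u)

    # Build the reversed digraph R.
    R = {u: [] for u in Adj}
    for u in Adj:
        for v in Adj[u]:
            R[v].append(u)

    # Second DFS on R, starting new searches in decreasing order of finish time.
    assigned = set()
    components = []

    def collect(u, comp):
        assigned.add(u)
        comp.append(u)
        for v in R[u]:
            if v not in assigned:
                collect(v, comp)

    for u in reversed(finish_order):
        if u not in assigned:
            comp = []
            collect(u, comp)           # each DFS tree of R is one strong component
            components.append(comp)
    return components


# Hypothetical digraph with components {a, b, c} and {d, e}.
Adj = {"a": ["b"], "b": ["c", "d"], "c": ["a"], "d": ["e"], "e": ["d"]}
comps = strong_components(Adj)
```

In this example the component containing a is produced before the one containing d, consistent with visiting components in topological order of the component DAG of G.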

Lecture 12: Minimum Spanning Trees and Kruskal’s Algorithm

Read: Chapt 23 in CLRS, up through 23.2.

Minimum Spanning Trees: A common problem in communications networks and circuit design is that of connect-

ing together a set of nodes (communication sites or circuit components) by a network of minimal total length

(where length is the sum of the lengths of connecting wires). We assume that the network is undirected. To

minimize the length of the connecting network, it never pays to have any cycles (since we could break any

cycle without destroying connectivity and decrease the total length). Since the resulting connection graph is

connected, undirected, and acyclic, it is a free tree.

The computational problem is called the minimum spanning tree problem (MST for short). More formally, given

a connected, undirected graph G = (V, E), a spanning tree is an acyclic subset of edges T ⊆ E that connects

all the vertices together. Assuming that each edge (u, v) of G has a numeric weight or cost, w(u, v), (which may be

zero or negative) we define the cost of a spanning tree T to be the sum of the weights of the edges in the spanning tree

\[ w(T) = \sum_{(u,v) \in T} w(u, v). \]

A minimum spanning tree (MST) is a spanning tree of minimum weight. Note that the minimum spanning tree

may not be unique, but it is true that if all the edge weights are distinct, then the MST will be unique (this is a

rather subtle fact, which we will not prove). Fig. 31 shows three spanning trees for the same graph, where the

shaded rectangles indicate the edges in the spanning tree. The one on the left is not a minimum spanning tree,

and the other two are. (An interesting observation is that not only do the edges sum to the same value, but in

fact the same set of edge weights appear in the two MST’s. Is this a coincidence? We’ll see later.)

Fig. 31: Spanning trees (the middle and right are minimum spanning trees). Costs: 33 (left), 22 (middle), 22 (right).

Steiner Minimum Trees: Minimum spanning trees are actually mentioned in the U.S. legal code. The reason is

that AT&T was a government supported monopoly at one time, and was responsible for handling all telephone

connections. If a company wanted to connect a collection of installations by a private internal phone system,

AT&T was required (by law) to connect them in the minimum cost manner, which is clearly a spanning tree

. . . or is it?

Some companies discovered that they could actually reduce their connection costs by opening a new bogus

installation. Such an installation served no purpose other than to act as an intermediate point for connections.

An example is shown in Fig. 32. On the left, consider four installations that lie at the corners of a 1 × 1 square.

Assume that all edge lengths are just Euclidean distances. It is easy to see that the cost of any MST for this

configuration is 3 (as shown on the left). However, suppose you introduce a new installation at the center, whose

distance to each of the other four points is 1/√2. It is now possible to connect these five points with a total cost

of 4/√2 = 2√2 ≈ 2.83. This is better than the MST.

Fig. 32: Steiner Minimum tree. (Left: MST, cost = 3; right: SMT with a Steiner point, cost = 2√2 ≈ 2.83.)

In general, the problem of determining the lowest cost interconnection tree between a given set of nodes, assum-

ing that you are allowed additional nodes (called Steiner points) is called the Steiner minimum tree (or SMT

for short). An interesting fact is that although there is a simple greedy algorithm for MST’s (as we will see

below), the SMT problem is much harder, and in fact is NP-hard. (Luckily for AT&T, the US Legal code is

rather ambiguous on the point as to whether the phone company was required to use MST’s or SMT’s in making

connections.)

Generic approach: We will present two greedy algorithms (Kruskal’s and Prim’s algorithms) for computing a min-

imum spanning tree. Recall that a greedy algorithm is one that builds a solution by repeatedly selecting the

cheapest (or generally locally optimal choice) among all options at each stage. An important characteristic of

greedy algorithms is that once they make a choice, they never “unmake” this choice. Before presenting these

algorithms, let us review some basic facts about free trees. They are all quite easy to prove.

Lemma:

• A free tree with n vertices has exactly n −1 edges.

• There exists a unique path between any two vertices of a free tree.

• Adding any edge to a free tree creates a unique cycle. Breaking any edge on this cycle restores a free

tree.
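These facts give an easy way to test whether a set of edges is a spanning tree and to compute its cost. The Python sketch below relies on the standard converse of the first fact (a connected graph with n vertices and exactly n − 1 edges is a free tree), which is not stated in the notes. The function names, the (u, v, weight) edge format, and the sample data are all assumptions of the sketch; the vertex set is assumed nonempty.

```python
def is_spanning_tree(vertices, T):
    """Check whether the edges T (triples (u, v, weight)) form a spanning tree on vertices."""
    if len(T) != len(vertices) - 1:        # a free tree on n vertices has n - 1 edges
        return False
    Adj = {u: [] for u in vertices}
    for u, v, _ in T:                      # store each undirected edge twice
        Adj[u].append(v)
        Adj[v].append(u)
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:                           # simple traversal to test connectivity
        u = stack.pop()
        for v in Adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(vertices)


def cost(T):
    """w(T): the sum of the weights of the edges in T."""
    return sum(w for _, _, w in T)


# Hypothetical data for illustration.
vertices = ["a", "b", "c", "d"]
T1 = [("a", "b", 1), ("b", "c", 2), ("c", "d", 4)]   # a path: a spanning tree
T2 = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3)]   # a cycle that misses d
```

Here T1 passes the test and has cost 7, while T2 has the right number of edges but fails because it does not connect d.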

Let G = (V, E) be an undirected, connected graph whose edges have numeric edge weights (which may be

positive, negative or zero). The intuition behind the greedy MST algorithms is simple: we maintain a subset of

edges A, which will initially be empty, and we will add edges one at a time, until A equals the MST. We say

that a subset A ⊆ E is viable if A is a subset of edges in some MST. (We cannot say “the” MST, since it is not

necessarily unique.) We say that an edge (u, v) ∈ E − A is safe if A ∪ {(u, v)} is viable. In other words, the

choice (u, v) is a safe choice to add so that A can still be extended to form an MST. Note that if A is viable it

cannot contain a cycle. A generic greedy algorithm operates by repeatedly adding any safe edge to the current

spanning tree. (Note that viability is a property of subsets of edges and safety is a property of a single edge.)

When is an edge safe? We consider the theoretical issues behind determining whether an edge is safe or not. Let S

be a subset of the vertices S ⊆ V . A cut (S, V − S) is just a partition of the vertices into two disjoint subsets.

An edge (u, v) crosses the cut if one endpoint is in S and the other is in V −S. Given a subset of edges A, we


say that a cut respects A if no edge in A crosses the cut. It is not hard to see why respecting cuts are important

to this problem. If we have computed a partial MST, and we wish to know which edges can be added that do

not induce a cycle in the current MST, any edge that crosses a respecting cut is a possible candidate.

An edge of E is a light edge crossing a cut, if among all edges crossing the cut, it has the minimum weight

(the light edge may not be unique if there are duplicate edge weights). Intuition says that since all the edges

that cross a respecting cut do not induce a cycle, then the lightest edge crossing a cut is a natural choice. The

main theorem which drives both algorithms is the following. It essentially says that we can always augment A

by adding the minimum weight edge that crosses a cut which respects A. (It is stated in complete generality, so

that it can be applied to both algorithms.)
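The definitions of a respecting cut and a light edge can be made concrete with a small sketch. The function name `light_edge`, the `(u, v, weight)` edge triples, and the example graph are assumptions made for illustration; they do not appear in the notes.

```python
def light_edge(edges, S, A=()):
    """Return a minimum-weight edge crossing the cut (S, V - S), or None.

    edges: iterable of (u, v, weight) triples of an undirected graph.
    S:     set of vertices on one side of the cut.
    A:     already chosen edges (u, v); the cut is required to respect A.
    """
    def crosses(u, v):
        # An edge crosses the cut if exactly one endpoint lies in S.
        return (u in S) != (v in S)

    if any(crosses(u, v) for u, v in A):
        raise ValueError("cut does not respect A")

    crossing = [(u, v, w) for u, v, w in edges if crosses(u, v)]
    if not crossing:
        return None
    # With duplicate weights the light edge is not unique; min() returns one of them.
    return min(crossing, key=lambda e: e[2])


# Hypothetical example graph.
edges = [("a", "b", 4), ("a", "c", 8), ("b", "c", 9), ("b", "d", 2), ("c", "d", 7)]
A = [("a", "b")]                        # partial solution
print(light_edge(edges, {"a", "b"}, A))  # the cut ({a, b}, {c, d}) respects A
```

Here three edges cross the cut, and the one of weight 2 is the light edge, so by the lemma stated next it would be safe to add to A.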

MST Lemma: Let G = (V, E) be a connected, undirected graph with real-valued weights on the edges. Let

A be a viable subset of E (i.e. a subset of some MST), let (S, V − S) be any cut that respects A, and let

(u, v) be a light edge crossing this cut. Then the edge (u, v) is safe for A.

Proof: It will simplify the proof to assume that all the edge weights are distinct. Let T be any MST for G that contains A (see Fig. 33). If T contains (u, v) then we are done. Suppose instead that no MST containing A contains (u, v). We will derive a contradiction.

Fig. 33: Proof of the MST Lemma. Edge (u, v) is the light edge crossing cut (S, V − S). (Panels: A with the cut; T + (u, v); T′ = T − (x, y) + (u, v).)

Add the edge (u, v) to T, thus creating a cycle. Since u and v are on opposite sides of the cut, and since

any cycle must cross the cut an even number of times, there must be at least one other edge (x, y) in T that

crosses the cut.

The edge (x, y) is not in A (because the cut respects A). By removing (x, y) we restore a spanning tree, call it T′. We have

$$w(T') = w(T) - w(x, y) + w(u, v).$$

Since (u, v) is the lightest edge crossing the cut, we have w(u, v) < w(x, y). Thus w(T′) < w(T). This contradicts the assumption that T was an MST.

Kruskal’s Algorithm: Kruskal’s algorithm works by attempting to add edges to A in increasing order of weight

(lightest edges ﬁrst). If the next edge does not induce a cycle among the current set of edges, then it is added to

A. If it does, then this edge is passed over, and we consider the next edge in order. Note that as this algorithm

runs, the edges of A will induce a forest on the vertices. As the algorithm continues, the trees of this forest are

merged together, until we have a single tree containing all the vertices.

Observe that this strategy leads to a correct algorithm. Why? Consider the edge (u, v) that Kruskal’s algorithm seeks to add next, and suppose that this edge does not induce a cycle in A. Let A′ denote the tree of the forest A that contains vertex u. Consider the cut (A′, V − A′). Every edge crossing the cut is not in A, and so this cut respects A, and (u, v) is the light edge across the cut (because any lighter edge would have been considered earlier by the algorithm). Thus, by the MST Lemma, (u, v) is safe.


The only tricky part of the algorithm is how to detect efﬁciently whether the addition of an edge will create a

cycle in A. We could perform a DFS on the subgraph induced by the edges of A, but this will take too much time.

We want a fast test that tells us whether u and v are in the same tree of A.

This can be done by a data structure (which we have not studied) called the disjoint set Union-Find data structure.

This data structure supports three operations:

Create-Set(u): Create a set containing a single item u.

Find-Set(u): Find the set that contains a given item u.

Union(u, v): Merge the set containing u and the set containing v into a common set.

You are not responsible for knowing how this data structure works (which is described in CLRS). You may

use it as a “black-box”. For our purposes it sufﬁces to know that each of these operations can be performed in

O(log n) time, on a set of size n. (The Union-Find data structure is quite interesting, because it can actually

perform a sequence of n operations much faster than O(n log n) time. However we will not go into this here.

O(log n) time is fast enough for its use in Kruskal’s algorithm.)
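For readers who want to look inside the black box, here is a minimal sketch of one common way to implement the three operations, using union by rank with path compression. The class name `DisjointSets` and the method names are assumptions chosen to mirror Create-Set, Find-Set and Union; the notes themselves do not prescribe an implementation.

```python
class DisjointSets:
    """Union-Find sketch: each set is a rooted tree, named by its root."""

    def __init__(self):
        self.parent = {}   # parent[u] == u exactly when u is a root
        self.rank = {}     # upper bound on the height of the tree rooted at u

    def create_set(self, u):
        self.parent[u] = u
        self.rank[u] = 0

    def find_set(self, u):
        # First pass: walk up to the root.
        root = u
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): point every visited node at the root.
        while self.parent[u] != root:
            self.parent[u], u = root, self.parent[u]
        return root

    def union(self, u, v):
        ru, rv = self.find_set(u), self.find_set(v)
        if ru == rv:
            return                      # already in the same set
        if self.rank[ru] < self.rank[rv]:
            ru, rv = rv, ru             # make ru the root of larger rank
        self.parent[rv] = ru            # hang the shallower tree below it
        if self.rank[ru] == self.rank[rv]:
            self.rank[ru] += 1


ds = DisjointSets()
for x in "abcd":
    ds.create_set(x)
ds.union("a", "b")
ds.union("c", "d")
print(ds.find_set("a") == ds.find_set("b"))  # True: a and b were merged
print(ds.find_set("a") == ds.find_set("c"))  # False: still separate sets
```

Two items are in the same set exactly when `find_set` returns the same representative for both, which is the test Kruskal’s algorithm needs.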

In Kruskal’s algorithm, the vertices of the graph will be the elements to be stored in the sets, and each set will hold the vertices of one tree of A. The set A can be stored as a simple list of edges. The algorithm is shown below, and

an example is shown in Fig. 34.

Kruskal’s Algorithm

Kruskal(G=(V,E),w) {

A = {} // initially A is empty

for each (u in V) Create_Set(u) // create set for each vertex

Sort E in increasing order by weight w

for each ((u,v) from the sorted list) {

if (Find_Set(u) != Find_Set(v)) { // u and v in different trees

Add (u,v) to A

Union(u, v)

}

}

return A

}
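The pseudocode above can be turned into a short runnable sketch. This is only one possible rendering: the function name `kruskal`, the `(u, v, weight)` edge triples, the inlined `find` helper (a simplified Union-Find using path halving, without ranks), and the example graph are all assumptions made for illustration.

```python
def kruskal(vertices, edges):
    """Return the list of MST edges of a connected undirected graph.

    vertices: iterable of hashable vertex names.
    edges:    list of (u, v, weight) triples.
    """
    parent = {u: u for u in vertices}        # Create_Set for each vertex

    def find(u):                             # Find_Set with path halving
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    A = []                                   # initially A is empty
    for u, v, w in sorted(edges, key=lambda e: e[2]):   # lightest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                         # u and v are in different trees
            A.append((u, v, w))
            parent[ru] = rv                  # Union
    return A


# Hypothetical example graph.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3), ("c", "d", 4), ("b", "d", 5)]
mst = kruskal(vertices, edges)
print(mst)                           # [('a', 'b', 1), ('b', 'c', 2), ('c', 'd', 4)]
print(sum(w for _, _, w in mst))     # 7
```

The edge (a, c) of weight 3 is passed over because a and c are already in the same tree when it is considered.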

Fig. 34: Kruskal’s Algorithm. Each vertex is labeled according to the set that contains it.

Analysis: How long does Kruskal’s algorithm take? As usual, let V be the number of vertices and E be the number of edges. Since the graph is connected, we may assume that E ≥ V − 1. Observe that it takes Θ(E log E) time to sort the edges. Since E ≤ V², we have log E ≤ 2 log V, so this is Θ(E log V). The for-loop is iterated E times, and each iteration involves a constant number of accesses to the Union-Find data structure on a collection of V items. Thus each access takes O(log V) time, for a total of O(E log V). Creating the V initial sets adds O(V log V). Thus the total running time is the sum of these, which is Θ((V + E) log V). Since V is asymptotically no larger than E, we could write this more simply as Θ(E log V).

Lecture 13: Prim’s and Baruvka’s Algorithms for MSTs

Read: Chapt 23 in CLRS. Baruvka’s algorithm is not described in CLRS.

Prim’s Algorithm: Prim’s algorithm is another greedy algorithm for minimum spanning trees. It differs from Kruskal’s

algorithm only in how it selects the next safe edge to add at each step. Its running time is essentially the same

as Kruskal’s algorithm, O((V +E) log V ). There are two reasons for studying Prim’s algorithm. The ﬁrst is to

show that there is more than one way to solve a problem (an important lesson to learn in algorithm design), and

the second is that Prim’s algorithm looks very much like another greedy algorithm, called Dijkstra’s algorithm,

that we will study for a completely different problem, shortest paths. Thus, not only is Prim’s a different way to

solve the same MST problem, it is also the same way to solve a different problem. (Whatever that means!)

Different ways to grow a tree: Kruskal’s algorithm worked by ordering the edges, and inserting them one by one

into the spanning tree, taking care never to introduce a cycle. Intuitively Kruskal’s works by merging or splicing

two trees together, until all the vertices are in the same tree.

In contrast, Prim’s algorithm builds the tree up by adding leaves one at a time to the current tree. We start with

a root vertex r (it can be any vertex). At any time, the subset of edges A forms a single tree (in Kruskal’s it

formed a forest). We look to add a single vertex as a leaf to the tree. The process is illustrated in the following

ﬁgure.

Fig. 35: Prim’s Algorithm.

Observe that if we consider the set of vertices S currently part of the tree, and its complement (V −S), we have

a cut of the graph and the current set of tree edges A respects this cut. Which edge should we add next? The

MST Lemma from the previous lecture tells us that it is safe to add the light edge. In the ﬁgure, this is the edge

of weight 4 going to vertex u. Then u is added to the vertices of S, and the cut changes. Note that some edges

that crossed the cut before are no longer crossing it, and others that were not crossing the cut are.

It is easy to see that the key questions in the efficient implementation of Prim’s algorithm are how to update the

cut efﬁciently, and how to determine the light edge quickly. To do this, we will make use of a priority queue

data structure. Recall that this is the data structure used in HeapSort. This is a data structure that stores a set of

items, where each item is associated with a key value. The priority queue supports three operations.

insert(u, key): Insert u with the key value key in Q.

extractMin(): Extract the item with the minimum key value in Q.


decreaseKey(u, new_key): Decrease the value of u’s key to new_key.

A priority queue can be implemented using the same heap data structure used in heapsort. All of the above

operations can be performed in O(log n) time, where n is the number of items in the heap.

What do we store in the priority queue? At ﬁrst you might think that we should store the edges that cross the

cut, since this is what we are removing with each step of the algorithm. The problem is that when a vertex is

moved from one side of the cut to the other, this results in a complicated sequence of updates.

There is a much more elegant solution, and this is what makes Prim’s algorithm so nice. For each vertex u ∈ V − S (not part of the current spanning tree) we associate u with a key value key[u], which is the weight of the lightest edge going from u to any vertex in S. We also store in pred[u] the end vertex of this edge in S. If there is no edge from u to a vertex in S, then we set its key value to +∞. We will also need to know

which vertices are in S and which are not. We do this by coloring the vertices in S black.

Here is Prim’s algorithm. The root vertex r can be any vertex in V .

Prim’s Algorithm

Prim(G,w,r) {

for each (u in V) { // initialization

key[u] = +infinity;

color[u] = white;

}

key[r] = 0; // start at root

pred[r] = nil;

Q = new PriQueue(V); // put vertices in Q

while (Q.nonEmpty()) { // until all vertices in MST

u = Q.extractMin(); // vertex with lightest edge

for each (v in Adj[u]) {

if ((color[v] == white) && (w(u,v) < key[v])) {

key[v] = w(u,v); // new lighter edge out of v

Q.decreaseKey(v, key[v]);

pred[v] = u;

}

}

color[u] = black;

}

[The pred pointers define the MST as an inverted tree rooted at r]

}
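A runnable sketch of the same idea follows. Note one deliberate substitution: Python’s standard `heapq` module has no decreaseKey operation, so instead of decreasing a key in place this version pushes a new (key, vertex) entry and skips stale entries when they are popped. The function name `prim`, the adjacency-list format, and the example graph are assumptions for illustration only.

```python
import heapq
import math


def prim(adj, r):
    """Prim's algorithm on a connected undirected graph.

    adj: dict mapping each vertex to a list of (neighbor, weight) pairs.
    r:   root vertex.
    Returns (pred, key): predecessor pointers and the final key values.
    """
    key = {u: math.inf for u in adj}
    pred = {u: None for u in adj}
    black = set()                      # vertices already in the tree (the set S)
    key[r] = 0                         # start at root
    heap = [(0, r)]
    while heap:
        k, u = heapq.heappop(heap)     # vertex with lightest edge
        if u in black or k > key[u]:
            continue                   # stale entry left behind by a later, lighter push
        black.add(u)
        for v, w in adj[u]:
            if v not in black and w < key[v]:
                key[v] = w             # new lighter edge out of v
                pred[v] = u
                heapq.heappush(heap, (w, v))   # stands in for decreaseKey
    return pred, key


# Hypothetical example graph.
adj = {
    "a": [("b", 4), ("c", 8)],
    "b": [("a", 4), ("c", 9), ("d", 2)],
    "c": [("a", 8), ("b", 9), ("d", 7)],
    "d": [("b", 2), ("c", 7)],
}
pred, key = prim(adj, "a")
print(pred)                 # {'a': None, 'b': 'a', 'c': 'd', 'd': 'b'}
print(sum(key.values()))    # 13, the total MST weight
```

Because stale entries stay in the heap, it can hold up to O(E) items rather than V, but since log E = O(log V) the O((V + E) log V) bound is unaffected.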

The following ﬁgure illustrates Prim’s algorithm. The arrows on edges indicate the predecessor pointers, and

the numeric label in each vertex is the key value.

To analyze Prim’s algorithm, we account for the time spent on each vertex as it is extracted from the priority

queue. It takes O(log V ) to extract this vertex from the queue. For each incident edge, we spend potentially

O(log V) time decreasing the key of the neighboring vertex. Thus the time spent on u is O(log V + deg(u) log V).

The other steps of the update are constant time. So the overall running time is

$$\begin{aligned}
T(V, E) &= \sum_{u \in V} (\log V + \deg(u) \log V) = \sum_{u \in V} (1 + \deg(u)) \log V \\
&= \log V \sum_{u \in V} (1 + \deg(u)) = (\log V)(V + 2E) = \Theta((V + E) \log V).
\end{aligned}$$

Since G is connected, V is asymptotically no greater than E, so this is Θ(E log V ). This is exactly the same as

Kruskal’s algorithm.

Fig. 36: Prim’s Algorithm. (The panels show successive priority queue contents: Q: 4,8,?,?,?,?; Q: 8,8,10,?,?; Q: 1,2,10,?; Q: 2,2,5; Q: 2,5; Q: <empty>.)

Baruvka’s Algorithm: We have seen two ways (Kruskal’s and Prim’s algorithms) for solving the MST problem. So,

it may seem like complete overkill to consider yet another algorithm. This one is called Baruvka’s algorithm.

It is actually the oldest of the three algorithms (invented in 1926, well before the ﬁrst computers). The reason

for studying this algorithm is that of the three algorithms, it is the easiest to implement on a parallel computer.

Unlike Kruskal’s and Prim’s algorithms, which add edges one at a time, Baruvka’s algorithm adds a whole set

of edges all at once to the MST.

Baruvka’s algorithm is similar to Kruskal’s algorithm, in the sense that it works by maintaining a collection

of disconnected trees. Let us call each subtree a component. Initially, each vertex is by itself in a one-vertex

component. Recall that with each stage of Kruskal’s algorithm, we add the lightest-weight edge that connects

two different components together. To prove Kruskal’s algorithm correct, we argued (from the MST Lemma)

that the lightest such edge will be safe to add to the MST.

In fact, a closer inspection of the proof reveals that the cheapest edge leaving any component is always safe.

This suggests a more parallel way to grow the MST. Each component determines the lightest edge that goes

from inside the component to outside the component (we don’t care where). We say that such an edge leaves the

component. Note that two components might select the same edge by this process. By the above observation,

all of these edges are safe, so we may add them all at once to the set A of edges in the MST. As a result, many

components will be merged together into a single component. We then apply DFS to the edges of A, to identify

the new components. This process is repeated until only one component remains. A fairly high-level description

of Baruvka’s algorithm is given below.

Baruvka’s Algorithm

Baruvka(G=(V,E), w) {

initialize each vertex to be its own component;

A = {}; // A holds edges of the MST

do {

for (each component C) {

find the lightest edge (u,v) with u in C and v not in C;

add {u,v} to A (unless it is already there);

}

apply DFS to graph H=(V,A), to compute the new components;

} while (there are 2 or more components);

return A; // return final MST edges

}


There are a number of unspeciﬁed details in Baruvka’s algorithm, which we will not spell out in detail, except to

note that they can be solved in Θ(V +E) time through DFS. First, we may apply DFS, but only traversing the

edges of A to compute the components. Each DFS tree will correspond to a separate component. We label each

vertex with its component number as part of this process. With these labels it is easy to determine which edges

go between components (since their endpoints have different labels). Then we can traverse each component

again to determine the lightest edge that leaves the component. (In fact, with a little more cleverness, we can do

all this without having to perform two separate DFS’s.) The algorithm is illustrated in the ﬁgure below.
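The following sketch fills in the unspecified details in one possible way. It follows the high-level description (label components by DFS over the edges of A, then pick the lightest edge leaving each component), but the function names, the `(u, v, weight)` edge triples, and the tie-breaking by edge index are assumptions. The tie-break matters: when weights are not distinct, two components could otherwise pick different equal-weight edges and create a cycle.

```python
def components(vertices, A):
    """Label each vertex with a component number by DFS on H = (V, A)."""
    adj = {u: [] for u in vertices}
    for u, v, _ in A:
        adj[u].append(v)
        adj[v].append(u)
    label, count = {}, 0
    for s in vertices:
        if s in label:
            continue
        label[s] = count
        stack = [s]                    # iterative DFS from s
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in label:
                    label[v] = count
                    stack.append(v)
        count += 1
    return label, count


def boruvka(vertices, edges):
    """Return the set of MST edges of a connected undirected graph."""
    A = set()                          # A holds edges of the MST
    label, count = components(vertices, A)
    while count > 1:
        best = {}                      # component -> ((weight, index), edge)
        for i, (u, v, w) in enumerate(edges):
            if label[u] == label[v]:
                continue               # edge stays inside one component
            for c in (label[u], label[v]):
                if c not in best or (w, i) < best[c][0]:
                    best[c] = ((w, i), (u, v, w))
        if not best:
            raise ValueError("graph is not connected")
        A.update(edge for _, edge in best.values())   # add all chosen edges at once
        label, count = components(vertices, A)
    return A


# Hypothetical example graph.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3), ("c", "d", 4), ("b", "d", 5)]
mst = boruvka(vertices, edges)
print(sorted(mst, key=lambda e: e[2]))   # [('a', 'b', 1), ('b', 'c', 2), ('c', 'd', 4)]
```

On this small graph a single round suffices: components a and b both select the edge of weight 1, c selects the edge of weight 2, and d selects the edge of weight 4.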

Fig. 37: Baruvka’s Algorithm.

Analysis: How long does Baruvka’s algorithm take? Observe that because each iteration involves doing a DFS, each

iteration (of the outer do-while loop) can be performed in Θ(V +E) time. The question is how many iterations

are required in general? We claim that there are never more than O(log V) iterations needed. To see why, let m denote the number of components at some stage. Each of the m components will merge with at least one other component. Afterwards the number of remaining components could be as low as 1 (if they all merge together), but never higher than m/2 (if they merge in pairs). Thus, the number of components decreases by at least half with each iteration. Since we start with V components, this can happen at most lg V times before only one component remains. Thus, the total running time is Θ((V + E) log V). Again, since G is connected, V is

asymptotically no larger than E, so we can write this more succinctly as Θ(E log V ). Thus all three algorithms

have the same asymptotic running time.

Lecture 14: Dijkstra’s Algorithm for Shortest Paths

Read: Chapt 24 in CLRS.

Shortest Paths: Consider the problem of computing shortest paths in a directed graph. We have already seen that

breadth-ﬁrst search is an O(V +E) algorithm for ﬁnding shortest paths from a single source vertex to all other

vertices, assuming that the graph has no edge weights. Suppose that the graph has edge weights, and we wish

to compute the shortest paths from a single source vertex to all other vertices in the graph.

By the way, there are other formulations of the shortest path problem. One may want just the shortest path

between a single pair of vertices. Most algorithms for this problem are variants of the single-source algorithm

that we will present. There is also a single sink problem, which can be solved in the transpose digraph (that is,

by reversing the edges). Computing all-pairs shortest paths can be solved by iterating a single-source algorithm

over all vertices, but there are other global methods that are faster.

Think of the vertices as cities, and the weights represent the cost of traveling from one city to another (nonexistent edges can be thought of as having infinite cost). When edge weights are present, we define the length of a


path to be the sum of edge weights along the path. Deﬁne the distance between two vertices, u and v, δ(u, v) to

be the length of the minimum length path from u to v. (δ(u, u) = 0 by considering the path of 0 edges from u to

itself.)

Single Source Shortest Paths: The single source shortest path problem is as follows. We are given a directed graph

with nonnegative edge weights G = (V, E) and a distinguished source vertex, s ∈ V . The problem is to

determine the distance from the source vertex to every vertex in the graph.

It is possible to have graphs with negative edges, but in order for the shortest path to be well deﬁned, we need to

add the requirement that there be no cycles whose total cost is negative (otherwise you could make the path infinitely

short by cycling forever through such a cycle). The text discusses the Bellman-Ford algorithm for ﬁnding

shortest paths assuming negative weight edges but no negative-weight cycles are present. We will discuss a

simple greedy algorithm, called Dijkstra’s algorithm, which assumes there are no negative edge weights.

We will stress the task of computing the minimum distance from the source to each vertex. Computing the

actual path will be a fairly simple extension. As in breadth-ﬁrst search, for each vertex we will have a pointer

pred[v] which points back toward the source. By following the predecessor pointers backwards from any vertex, we

will construct the reversal of the shortest path to v.

Shortest Paths and Relaxation: The basic structure of Dijkstra’s algorithm is to maintain an estimate of the shortest

path for each vertex, call this d[v]. (NOTE: Don’t confuse d[v] with the d[v] in the DFS algorithm. They are

completely different.) Intuitively d[v] will be the length of the shortest path that the algorithm knows of from

s to v. This value will always be greater than or equal to the true shortest path distance from s to v. Initially, we know of no paths, so we set d[s] = 0 and all the other d[v] values to ∞. As the algorithm

goes on, and sees more and more vertices, it attempts to update d[v] for each vertex in the graph, until all the

d[v] values converge to the true shortest distances.

The process by which an estimate is updated is called relaxation. Here is how relaxation works. Intuitively, if

you can see that your solution has not yet reached an optimum value, then push it a little closer to the optimum.

In particular, if you discover a path from s to v shorter than d[v], then you need to update d[v]. This notion is

common to many optimization algorithms.

Consider an edge from a vertex u to v whose weight is w(u, v). Suppose that we have already computed current

estimates on d[u] and d[v]. We know that there is a path from s to u of weight d[u]. By taking this path and

following it with the edge (u, v) we get a path to v of length d[u] +w(u, v). If this path is better than the existing

path of length d[v] to v, we should update d[v] to the value d[u] + w(u, v). This is illustrated in Fig. 38. We

should also remember that the shortest path to v passes through u, which we do by updating v’s predecessor

pointer.

Fig. 38: Relaxation. (With d[u] = 3 and w(u, v) = 5, relax(u, v) lowers d[v] from 11 to 8.)

Relaxing an edge

Relax(u,v) {

if (d[u] + w(u,v) < d[v]) { // is the path through u shorter?

d[v] = d[u] + w(u,v) // yes, then take it

pred[v] = u // record that we go through u

}

}


Observe that whenever we set d[v] to a ﬁnite value, there is always evidence of a path of that length. Therefore

d[v] ≥ δ(s, v). If d[v] = δ(s, v), then further relaxations cannot change its value.

It is not hard to see that if we perform Relax(u, v) repeatedly over all edges of the graph, the d[v] values will

eventually converge to the ﬁnal true distance value from s. The cleverness of any shortest path algorithm is

to perform the updates in a judicious manner, so the convergence is as fast as possible. In particular, the best

possible would be to order relaxation operations in such a way that each edge is relaxed exactly once. Dijkstra’s

algorithm does exactly this.
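To see the convergence claim in action, here is a small sketch that relaxes every edge over and over, in an arbitrary fixed order, until a full pass changes nothing. This is not Dijkstra’s algorithm; it is the naive scheme that Dijkstra’s algorithm improves on. The function names, the `(u, v, weight)` edge triples, and the example digraph are assumptions, and nonnegative weights are assumed so that the loop terminates.

```python
import math


def relax(u, v, w, d, pred):
    """Relax edge (u, v) of weight w; return True if d[v] improved."""
    if d[u] + w < d[v]:        # is the path through u shorter?
        d[v] = d[u] + w        # yes, then take it
        pred[v] = u            # record that we go through u
        return True
    return False


def relax_until_stable(vertices, edges, s):
    """Repeat passes over all edges until no estimate changes."""
    d = {v: math.inf for v in vertices}
    pred = {v: None for v in vertices}
    d[s] = 0
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for u, v, w in edges:
            if relax(u, v, w, d, pred):
                changed = True
    return d, pred, passes


# Hypothetical digraph, with edges listed in a deliberately unhelpful order.
vertices = ["s", "a", "b", "c"]
edges = [("a", "c", 3), ("b", "a", 1), ("s", "b", 2), ("s", "a", 5)]
d, pred, passes = relax_until_stable(vertices, edges, "s")
print(d)        # {'s': 0, 'a': 3, 'b': 2, 'c': 6}
print(passes)   # number of passes over the edge list that were needed
```

With this edge order several passes are needed, and some edges are relaxed more than once. Processing vertices in increasing order of distance from s is what lets each edge be relaxed exactly once.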

Dijkstra’s Algorithm: Dijkstra’s algorithm is based on the notion of performing repeated relaxations. Dijkstra’s

algorithm operates by maintaining a subset of vertices, S ⊆ V , for which we claim we “know” the true distance,

that is d[v] = δ(s, v). Initially S = ∅, the empty set, and we set d[s] = 0 and all others to +∞. One by one we

select vertices from V −S to add to S.

The set S can be implemented using an array of vertex colors. Initially all vertices are white, and we set

color[v] = black to indicate that v ∈ S.

How do we select which vertex among the vertices of V − S to add next to S? Here is where greedy selection

comes in. Dijkstra recognized that the best way in which to perform relaxations is by increasing order of distance

from the source. This way, whenever a relaxation is being performed, it is possible to infer that the result of the relaxation yields the final distance value. To implement this, for each vertex u ∈ V − S, we maintain a

distance estimate d[u]. The greedy thing to do is to take the vertex of V − S for which d[u] is minimum, that

is, take the unprocessed vertex that is closest (by our estimate) to s. Later we will justify why this is the proper

choice.

In order to perform this selection efﬁciently, we store the vertices of V − S in a priority queue (e.g. a heap),

where the key value of each vertex u is d[u]. Note the similarity with Prim’s algorithm, although a different

key value is used there. Also recall that if we implement the priority queue using a heap, we can perform the

operations insert(), extractMin(), and decreaseKey(), on a priority queue of size n each in O(log n) time.

Each vertex “knows” its location in the priority queue (e.g. has a cross reference link to the priority queue entry),

and each entry in the priority queue “knows” which vertex it represents. It is important when implementing the

priority queue that this cross reference information is updated.

Here is Dijkstra’s algorithm. (Note the remarkable similarity to Prim’s algorithm.) An example is presented in

Fig. 39.

Notice that the coloring is not really used by the algorithm, but it has been included to make the connection with

the correctness proof a little clearer. Because of the similarity between this and Prim’s algorithm, the running

time is the same, namely Θ(E log V ).

Correctness: Recall that d[v] is the distance value assigned to vertex v by Dijkstra’s algorithm, and let δ(s, v) denote

the length of the true shortest path from s to v. To see that Dijkstra’s algorithm correctly gives the ﬁnal true

distances, we need to show that d[v] = δ(s, v) when the algorithm terminates. This is a consequence of the

following lemma, which states that once a vertex u has been added to S (i.e. colored black), d[u] is the true

shortest distance from s to u. Since at the end of the algorithm, all vertices are in S, then all distance estimates

are correct.

Lemma: When a vertex u is added to S, d[u] = δ(s, u).

Proof: It will simplify the proof conceptually if we assume that all the edge weights are strictly positive (the

general case of nonnegative edges is presented in the text).

Suppose to the contrary that at some point Dijkstra’s algorithm ﬁrst attempts to add a vertex u to S for

which d[u] ≠ δ(s, u). By our observations about relaxation, d[u] is never less than δ(s, u), thus we have

d[u] > δ(s, u). Consider the situation just prior to the insertion of u. Consider the true shortest path from

s to u. Because s ∈ S and u ∈ V − S, at some point this path must ﬁrst jump out of S. Let (x, y) be the

edge taken by the path, where x ∈ S and y ∈ V −S. (Note that it may be that x = s and/or y = u).


Dijkstra’s Algorithm

Dijkstra(G,w,s) {

for each (u in V) { // initialization

d[u] = +infinity

color[u] = white

pred[u] = null

}

d[s] = 0 // dist to source is 0

Q = new PriQueue(V) // put all vertices in Q

while (Q.nonEmpty()) { // until all vertices processed

u = Q.extractMin() // select u closest to s

for each (v in Adj[u]) {

if (d[u] + w(u,v) < d[v]) { // Relax(u,v)

d[v] = d[u] + w(u,v)

Q.decreaseKey(v, d[v])

pred[v] = u

}

}

color[u] = black

}

[The pred pointers define an "inverted" shortest path tree]

}
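As with Prim’s algorithm, the pseudocode can be sketched in runnable form. Python’s `heapq` has no decreaseKey, so this version pushes a fresh (distance, vertex) entry on each successful relaxation and ignores entries for vertices that are already finished; that is a substitution, not the cross-referenced heap described above. The names `dijkstra` and `path_to`, the adjacency-list format, and the example digraph are assumptions, and all edge weights are assumed nonnegative.

```python
import heapq
import math


def dijkstra(adj, s):
    """Single-source shortest paths with nonnegative edge weights.

    adj: dict mapping each vertex to a list of (neighbor, weight) pairs.
    Returns (d, pred): distances from s and predecessor pointers.
    """
    d = {u: math.inf for u in adj}
    pred = {u: None for u in adj}
    d[s] = 0                               # dist to source is 0
    black = set()                          # the set S of finished vertices
    heap = [(0, s)]
    while heap:
        du, u = heapq.heappop(heap)        # select u closest to s
        if u in black:
            continue                       # stale entry; u was finished earlier
        black.add(u)
        for v, w in adj[u]:
            if du + w < d[v]:              # Relax(u, v)
                d[v] = du + w
                pred[v] = u
                heapq.heappush(heap, (d[v], v))   # stands in for decreaseKey
    return d, pred


def path_to(pred, v):
    """Follow predecessor pointers back from v, then reverse."""
    path = []
    while v is not None:
        path.append(v)
        v = pred[v]
    return path[::-1]


# Hypothetical example digraph.
adj = {
    "s": [("a", 5), ("b", 2)],
    "a": [("c", 3)],
    "b": [("a", 1)],
    "c": [],
}
d, pred = dijkstra(adj, "s")
print(d)                   # {'s': 0, 'a': 3, 'b': 2, 'c': 6}
print(path_to(pred, "c"))  # ['s', 'b', 'a', 'c']
```

Here the direct edge from s to a (cost 5) loses to the path through b (cost 2 + 1 = 3), which is discovered when b is extracted before a.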

Fig. 39: Dijkstra’s Algorithm example.

Fig. 40: Correctness of Dijkstra’s Algorithm. (The figure shows the set S containing s and x, the edge (x, y) leaving S, and the vertex u with pred[u], annotated "shorter path from s to u?" and "d[y] > d[u]".)


We argue that y ≠ u. Why? Since x ∈ S we have d[x] = δ(s, x). (Since u was the first vertex added to

S which violated this, all prior vertices satisfy this.) Since we applied relaxation to x when it was added,

we would have set d[y] = d[x] + w(x, y) = δ(s, y). Thus d[y] is correct, and by hypothesis, d[u] is not

correct, so they cannot be the same.

Now observe that since y appears somewhere along the shortest path from s to u (but not at u) and all

subsequent edges following y are of positive weight, we have δ(s, y) < δ(s, u), and thus

d[y] = δ(s, y) < δ(s, u) < d[u].

Thus y would have been added to S before u, in contradiction to our assumption that u is the next vertex

to be added to S.

Lecture 15: All-Pairs Shortest Paths

Read: Section 25.2 in CLRS.

All-Pairs Shortest Paths: We consider the generalization of the shortest path problem, to computing shortest paths

between all pairs of vertices. Let G = (V, E) be a directed graph with edge weights. If (u, v) ∈ E is an edge

of G, then the weight of this edge is denoted w(u, v). Recall that the cost of a path is the sum of edge weights

along the path. The distance between two vertices δ(u, v) is the cost of the minimum cost path between them.

We will allow G to have negative cost edges, but we will not allow G to have any negative cost cycles.

We consider the problem of determining the cost of the shortest path between all pairs of vertices in a weighted

directed graph. We will present a $\Theta(n^3)$ algorithm, where n is the number of vertices, called the Floyd-Warshall algorithm. This algorithm is

based on dynamic programming.

For this algorithm, we will assume that the digraph is represented as an adjacency matrix, rather than the more

common adjacency list. Although adjacency lists are generally more efﬁcient for sparse graphs, storing all the

inter-vertex distances will require $\Omega(n^2)$ storage, so the savings is not justified here. Because the algorithm is

matrix-based, we will employ common matrix notation, using i, j and k to denote vertices rather than u, v, and

w as we usually do.

Input Format: The input is an n × n matrix w of edge weights, which are based on the edge weights in the digraph. We let $w_{ij}$ denote the entry in row i and column j of w.

$$w_{ij} = \begin{cases}
0 & \text{if } i = j, \\
w(i, j) & \text{if } i \neq j \text{ and } (i, j) \in E, \\
+\infty & \text{if } i \neq j \text{ and } (i, j) \notin E.
\end{cases}$$

Setting $w_{ij} = \infty$ if there is no edge intuitively means that there is no direct link between these two nodes, and hence the direct cost is infinite. The reason for setting $w_{ii} = 0$ is that there is always a trivial path of length 0

(using no edges) from any vertex to itself. (Note that in digraphs it is possible to have self-loop edges, and so

w(i, i) may generally be nonzero. It cannot be negative, since we assume that there are no negative cost cycles,

and if it is positive, there is no point in using it as part of any shortest path.)
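As a small illustration of this input format, the sketch below builds the matrix w from an edge list. The function name `weight_matrix` and the `(i, j, weight)` triples are assumptions, and vertices are numbered 0 to n − 1 here rather than 1 to n as in the notes.

```python
import math


def weight_matrix(n, edges):
    """Build the n x n input matrix for a digraph on vertices 0..n-1.

    edges: list of (i, j, weight) triples, one per directed edge.
    """
    # 0 on the diagonal, +infinity wherever there is no edge.
    w = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i, j, weight in edges:
        if i != j:              # self-loops are ignored: w[i][i] stays 0
            w[i][j] = weight
    return w


# Hypothetical digraph with a negative edge and a self-loop.
w = weight_matrix(3, [(0, 1, 4), (1, 2, -2), (2, 2, 7)])
for row in w:
    print(row)
# [0, 4, inf]
# [inf, 0, -2]
# [inf, inf, 0]
```

The self-loop at vertex 2 is dropped, matching the observation that a positive self-loop is never useful on a shortest path.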

The output will be an n × n distance matrix $D = (d_{ij})$ where $d_{ij} = \delta(i, j)$, the shortest path cost from vertex i

to j. Recovering the shortest paths will also be an issue. To help us do this, we will also compute an auxiliary

matrix mid[i, j]. The value of mid[i, j] will be a vertex that is somewhere along the shortest path from i to j.

If the shortest path travels directly from i to j without passing through any other vertices, then mid[i, j] will be

set to null. These intermediate values behave somewhat like the predecessor pointers in Dijkstra’s algorithm: they allow us to reconstruct the final shortest path in Θ(n) time.
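One way the mid matrix could be used to recover a path is sketched below, under the assumption that mid[i][j] is either None (the path is the single edge from i to j) or a vertex k such that the path splits into a shortest path from i to k followed by one from k to j. The function names and the hand-filled example matrix are hypothetical, and the sketch assumes that a path from i to j actually exists.

```python
def collect(mid, i, j, out):
    """Append to out, in order, the vertices strictly between i and j."""
    k = mid[i][j]
    if k is not None:
        collect(mid, i, k, out)   # intermediate vertices on the way to k
        out.append(k)
        collect(mid, k, j, out)   # intermediate vertices after k


def full_path(mid, i, j):
    """Return the whole path from i to j as a list of vertices."""
    inner = []
    collect(mid, i, j, inner)
    return [i] + inner + [j]


# Hypothetical mid matrix on vertices 0..3, filled in by hand:
# the path from 0 to 3 passes through 2, and the path from 0 to 2 through 1.
n = 4
mid = [[None] * n for _ in range(n)]
mid[0][3] = 2
mid[0][2] = 1
print(full_path(mid, 0, 3))   # [0, 1, 2, 3]
print(full_path(mid, 1, 2))   # [1, 2], a direct edge
```

Each recursive call either stops immediately or emits one vertex of the path, so the work is proportional to the number of vertices on the path, which is at most n.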


Floyd-Warshall Algorithm: The Floyd-Warshall algorithm dates back to the early 60’s. Warshall was interested

in the weaker question of reachability: determine for each pair of vertices u and v, whether u can reach v.

Floyd realized that the same technique could be used to compute shortest paths with only minor variations. The

Floyd-Warshall algorithm runs in $\Theta(n^3)$ time.

As with any DP algorithm, the key is reducing a large problem to smaller problems. A natural way of doing this

is by limiting the number of edges of the path, but it turns out that this does not lead to the fastest algorithm (but is an approach worthy of consideration). The main feature of the Floyd-Warshall algorithm is in finding the best formulation for the shortest path subproblem. Rather than limiting the number of edges on the path, they instead limit the set of vertices through which the path is allowed to pass. In particular, for a path $p = \langle v_1, v_2, \ldots, v_\ell \rangle$ we say that the vertices $v_2, v_3, \ldots, v_{\ell-1}$ are the intermediate vertices of this path. Note that a path consisting of

a single edge has no intermediate vertices.

Formulation: Define $d^{(k)}_{ij}$ to be the cost of the shortest path from i to j such that any intermediate vertices on the path are chosen from the set $\{1, 2, \ldots, k\}$.

In other words, we consider a path from i to j which either consists of the single edge (i, j), or visits some intermediate vertices along the way, but these intermediate vertices can only be chosen from among $\{1, 2, \ldots, k\}$. The path is free to visit any subset of these vertices, and to do so in any order. For example, in the digraph shown in Fig. 41(a), notice how the value of $d^{(k)}_{5,6}$ changes as k varies.

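Fig. 41: Limiting intermediate vertices. (a) In the example digraph, $d^{(0)}_{5,6} = \infty$ (no path), $d^{(1)}_{5,6} = 13$ via ⟨5, 1, 6⟩, $d^{(2)}_{5,6} = 9$ via ⟨5, 2, 6⟩, $d^{(3)}_{5,6} = 8$ via ⟨5, 3, 2, 6⟩, and $d^{(4)}_{5,6} = 6$ via ⟨5, 4, 1, 6⟩. (b) A path from i to j either avoids k, with cost $d^{(k-1)}_{ij}$, or passes through k, with cost $d^{(k-1)}_{ik} + d^{(k-1)}_{kj}$; all other intermediate vertices come from 1, 2, ..., k−1. For example, $d^{(3)}_{5,6}$ can go through any combination of the intermediate vertices $\{1, 2, 3\}$, of which ⟨5, 3, 2, 6⟩ has the lowest cost of 8.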

Floyd-Warshall Update Rule: How do we compute $d^{(k)}_{ij}$ assuming that we have already computed the previous matrix $d^{(k-1)}$? There are two basic cases, depending on the ways that we might get from vertex i to vertex j, assuming that the intermediate vertices are chosen from $\{1, 2, \ldots, k\}$:

Don't go through k at all: Then the shortest path from i to j uses only intermediate vertices $\{1, \ldots, k-1\}$ and hence the length of the shortest path is $d^{(k-1)}_{ij}$.

Do go through k: First observe that a shortest path does not pass through the same vertex twice, so we can assume that we pass through k exactly once. (The assumption that there are no negative cost cycles is being used here.) That is, we go from i to k, and then from k to j. In order for the overall path to be as short as possible we should take the shortest path from i to k, and the shortest path from k to j. Since each of these paths uses intermediate vertices only in $\{1, 2, \ldots, k-1\}$, the length of the path is $d^{(k-1)}_{ik} + d^{(k-1)}_{kj}$.


This suggests the following recursive rule (the DP formulation) for computing $d^{(k)}$, which is illustrated in Fig. 41(b).

$$d^{(0)}_{ij} = w_{ij},$$
$$d^{(k)}_{ij} = \min\left(d^{(k-1)}_{ij},\; d^{(k-1)}_{ik} + d^{(k-1)}_{kj}\right) \quad \text{for } k \ge 1.$$

The final answer is $d^{(n)}_{ij}$ because this allows all possible vertices as intermediate vertices. We could write a recursive program to compute $d^{(k)}_{ij}$, but this will be prohibitively slow because the same value may be reevaluated many times. Instead, we compute it by storing the values in a table, and looking the values up as we need them. Here is the complete algorithm. We have also included mid-vertex pointers, mid[i, j], for extracting the final shortest paths. We will leave the extraction of the shortest path as an exercise.

Floyd-Warshall Algorithm

    Floyd_Warshall(int n, int w[1..n, 1..n]) {
        array d[1..n, 1..n]                         // shortest path costs
        array mid[1..n, 1..n]                       // intermediate vertex pointers
        for i = 1 to n do {                         // initialize
            for j = 1 to n do {
                d[i,j] = w[i,j]
                mid[i,j] = null
            }
        }
        for k = 1 to n do                           // use intermediates {1..k}
            for i = 1 to n do                       // ...from i
                for j = 1 to n do                   // ...to j
                    if (d[i,k] + d[k,j] < d[i,j]) {
                        d[i,j] = d[i,k] + d[k,j]    // new shorter path length
                        mid[i,j] = k                // new path is through k
                    }
        return d                                    // matrix of distances
    }

An example of the algorithm's execution is shown in Fig. 42.

Clearly the algorithm's running time is $\Theta(n^3)$. The space used by the algorithm is $\Theta(n^2)$. Observe that we deleted all references to the superscript (k) in the code. It is left as an exercise to show that this does not affect the correctness of the algorithm. (Hint: The danger is that values may be overwritten and then used later in the same phase. Consider which entries might be overwritten and then reused: they occur in row k and column k. It can be shown that the overwritten values are equal to their original values.)
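The pseudocode above, together with the path extraction that is left as an exercise, can be sketched in Python as follows. This is one possible rendering under stated assumptions, not the notes' own solution: vertices are 0-indexed, a missing edge is represented by `math.inf`, and the helper name `intermediates` is hypothetical. The edge weights are those of the $d^{(0)}$ matrix in Fig. 42.

```python
import math

INF = math.inf  # assumption: a missing edge is represented by infinity


def floyd_warshall(w):
    """Return (d, mid) for an n x n weight matrix w with w[i][i] == 0."""
    n = len(w)
    d = [row[:] for row in w]                 # d starts as a copy of w
    mid = [[None] * n for _ in range(n)]      # None plays the role of null
    for k in range(n):                        # allow k as an intermediate
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
                    mid[i][j] = k
    return d, mid


def intermediates(mid, i, j):
    """Vertices strictly between i and j on the recorded shortest path.

    Assumes a path from i to j exists (d[i][j] is finite).
    """
    k = mid[i][j]
    if k is None:                             # direct edge, nothing in between
        return []
    return intermediates(mid, i, k) + [k] + intermediates(mid, k, j)


# Weights from the d^(0) matrix of Fig. 42 (vertex v is index v - 1).
W = [
    [0,   8,   INF, 1],
    [INF, 0,   1,   INF],
    [4,   INF, 0,   INF],
    [INF, 2,   9,   0],
]
D, MID = floyd_warshall(W)
full_path = [0] + intermediates(MID, 0, 2) + [2]
```

Each recursive call in `intermediates` contributes one vertex of the path, so extraction takes time proportional to the path length, which is at most n.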

Lecture 16: NP-Completeness: Languages and NP

Read: Chapt 34 in CLRS, up through section 34.2.

Complexity Theory: At this point of the semester we have been building up your “bag of tricks” for solving algorithmic problems. Hopefully when presented with a problem you now have a little better idea of how to go about solving the problem: what sort of design paradigm should be used (divide-and-conquer, DFS, greedy, dynamic programming), what sort of data structures might be relevant (trees, heaps, graphs), what representations would be best (adjacency lists, adjacency matrices), and what the running time of your algorithm is.

Fig. 42: Floyd-Warshall Example. Newly updated entries are circled. (Rows and columns are indexed by vertices 1 through 4; ∞ marks pairs with no path found so far.)

     d^(0)          d^(1)          d^(2)          d^(3)          d^(4)
    0  8  ∞  1     0  8  ∞  1     0  8  9  1     0  8  9  1     0  3  4  1
    ∞  0  1  ∞     ∞  0  1  ∞     ∞  0  1  ∞     5  0  1  6     5  0  1  6
    4  ∞  0  ∞     4 12  0  5     4 12  0  5     4 12  0  5     4  7  0  5
    ∞  2  9  0     ∞  2  9  0     ∞  2  3  0     7  2  3  0     7  2  3  0

All of this is fine if it helps you discover an acceptably efficient algorithm to solve your problem. The question that often arises in practice is what to do when you have tried every trick in the book, and nothing seems to work. Although your algorithm can solve small problems reasonably efficiently (e.g. n ≤ 20), on the really large applications that you want to solve (e.g. n = 1,000 or n = 10,000) your algorithm never terminates. When you analyze its running time, you realize that it is running in exponential time, perhaps $n^{\sqrt{n}}$, or $2^n$, or $2^{(2^n)}$, or $n!$, or worse!

Near the end of the 60's there was great success in finding efficient solutions to many combinatorial problems, but there was also a growing list of problems for which there seemed to be no known efficient algorithmic solutions. People began to wonder whether there was some unknown paradigm that would lead to a solution to these problems, or perhaps some proof that these problems are inherently hard to solve and that no algorithmic solutions exist that run in less than exponential time.

Near the end of the 60’s a remarkable discovery was made. Many of these hard problems were interrelated

in the sense that if you could solve any one of them in polynomial time, then you could solve all of them in

polynomial time. This discovery gave rise to the notion of NP-completeness, and created possibly the biggest open problem in computer science: is P = NP? We will be studying this concept over the next few lectures.

This area is a radical departure from what we have been doing because the emphasis will change. The goal is

no longer to prove that a problem can be solved efﬁciently by presenting an algorithm for it. Instead we will be

trying to show that a problem cannot be solved efﬁciently. The question is how to do this?

Laying down the rules: We need some way to separate the class of efﬁciently solvable problems from inefﬁciently

solvable problems. We will do this by considering problems that can be solved in polynomial time.

When designing algorithms it has been possible for us to be rather informal with various concepts. We have

made use of the fact that an intelligent programmer could ﬁll in any missing details. However, the task of

proving that something cannot be done efficiently must be handled much more carefully, since we do not want to leave any “loopholes” that would allow someone to subvert the rules in an unreasonable way and claim to have an efficient solution when one does not really exist.

We have measured the running time of algorithms using worst-case complexity, as a function of n, the size of the input. We have defined input size variously for different problems, but the bottom line is the number of bits (or bytes) that it takes to represent the input using any reasonably efficient encoding. By a reasonably efficient encoding, we mean that there is not some significantly shorter way of providing the same information. For example, you could write numbers in unary notation ($11111111_1 = 1000_2 = 8$) rather than binary, but that would be unacceptably inefficient. You could describe graphs in some highly inefficient way, such as by listing all of their cycles, but this would also be unacceptable. We will assume that numbers are expressed in binary or some higher base and graphs are expressed using either adjacency matrices or adjacency lists.
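To see how much the choice of encoding matters, the following small sketch compares the length of a number written in unary with its length in binary. The function names are illustrative only and do not come from the text.

```python
def unary_length(x):
    """Number of symbols needed to write the positive integer x in unary."""
    return x


def binary_length(x):
    """Number of bits needed to write the positive integer x in binary."""
    return max(1, x.bit_length())


for x in (8, 1000, 10**6):
    # The unary length grows like x, the binary length like log2(x).
    print(x, unary_length(x), binary_length(x))
```

Since the binary length is only about $\log_2 x$, an algorithm whose running time is proportional to the numeric value x is exponential in the size of a binary-encoded input, even though it would look linear against a unary encoding.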

We will usually restrict numeric inputs to be integers (as opposed to calling them “reals”), so that it is clear that

arithmetic can be performed efﬁciently. We have also assumed that operations on numbers can be performed in

constant time. From now on, we should be more careful and assume that arithmetic operations require at least

as much time as there are bits of precision in the numbers being stored.

Up until now all the algorithms we have seen have had the property that their worst-case running times are bounded above by some polynomial in the input size, n. A polynomial time algorithm is any algorithm that runs in time $O(n^k)$ where k is some constant that is independent of n. A problem is said to be solvable in polynomial time if there is a polynomial time algorithm that solves it.

Some functions that do not “look” like polynomials (such as $O(n \log n)$) are bounded above by polynomials (such as $O(n^2)$). Some functions that do “look” like polynomials are not. For example, suppose you have an algorithm which inputs a graph of size n and an integer k and runs in $O(n^k)$ time. Is this a polynomial? No, because k is an input to the problem, so the user is allowed to choose k = n, implying that the running time would be $O(n^n)$, which is not a polynomial in n. The important thing is that the exponent must be a constant independent of n.

Of course, saying that all polynomial time algorithms are “efficient” is untrue. An algorithm whose running time is $O(n^{1000})$ is certainly pretty inefficient. Nonetheless, if an algorithm runs in worse than polynomial time (e.g. $2^n$), then it is certainly not efficient, except for very small values of n.


Decision Problems: Many of the problems that we have discussed involve optimization of one form or another: ﬁnd

the shortest path, ﬁnd the minimum cost spanning tree, ﬁnd the minimum weight triangulation. For rather tech-

nical reasons, most NP-complete problems that we will discuss will be phrased as decision problems. A problem

is called a decision problem if its output is a simple “yes” or “no” (or you may think of this as True/False, 0/1,

accept/reject).

We will phrase many optimization problems in terms of decision problems. For example, the minimum spanning tree decision problem might be: Given a weighted graph G and an integer k, does G have a spanning tree whose weight is at most k?

This may seem like a less interesting formulation of the problem. It does not ask for the weight of the minimum

spanning tree, and it does not even ask for the edges of the spanning tree that achieves this weight. However,

our job will be to show that certain problems cannot be solved efﬁciently. If we show that the simple decision

problem cannot be solved efﬁciently, then the more general optimization problem certainly cannot be solved

efﬁciently either.

Language Recognition Problems: Observe that a decision problem can also be thought of as a language recognition problem. We could define a language L

$$L = \{(G, k) \mid G \text{ has a MST of weight at most } k\}.$$

This set consists of pairs: the first element is a graph (e.g. the adjacency matrix encoded as a string) and the second is an integer k encoded as a binary number. At first it may seem strange expressing a graph as a string, but obviously anything that is represented in a computer is broken down somehow into a string of bits.

When presented with an input string (G, k), the algorithm would answer “yes” if (G, k) ∈ L implying that G

has a spanning tree of weight at most k, and “no” otherwise. In the ﬁrst case we say that the algorithm “accepts”

the input and otherwise it “rejects” the input.

Given any language, we can ask the question of how hard it is to determine whether a given string is in the

language. For example, in the case of the MST language L, we can determine membership easily in polynomial

time. We just store the graph internally, run Kruskal’s algorithm, and see whether the ﬁnal optimal weight is at

most k. If so we accept, and otherwise we reject.
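The membership test just described can be sketched as follows. This is a minimal illustration under assumptions of our own: the graph is given as a vertex count n and a list of (weight, u, v) triples with vertices numbered from 0, and a disconnected graph (which has no spanning tree) is rejected. The function names are hypothetical.

```python
def mst_weight(n, edges):
    """Weight of a minimum spanning tree via Kruskal's algorithm.

    edges is a list of (weight, u, v) triples on vertices 0..n-1.
    Returns None if the graph is not connected.
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, used = 0, 0
    for weight, u, v in sorted(edges):        # lightest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                          # edge joins two components
            parent[ru] = rv
            total += weight
            used += 1
    return total if used == n - 1 else None


def in_mst_language(n, edges, k):
    """Accept iff the graph has a spanning tree of weight at most k."""
    weight = mst_weight(n, edges)
    return weight is not None and weight <= k


EDGES = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]
```

Sorting dominates the running time, so the whole test is polynomial in the input size, which is what places this language in P.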

Definition: Define P to be the set of all languages for which membership can be tested in polynomial time. (Intuitively, this corresponds to the set of all decision problems that can be solved in polynomial time.)

Note that languages are sets of strings, and P is a set of languages. P is defined in terms of how hard it is computationally to recognize membership in the language. A set of languages that is defined in terms of how hard it is to determine membership is called a complexity class. Since we can compute minimum spanning trees in polynomial time, we have L ∈ P.

Here is a harder one, though.

$$M = \{(G, k) \mid G \text{ has a simple path of length at least } k\}.$$

Given a graph G and integer k how would you “recognize” whether it is in the language M? You might try searching the graph for simple paths, until finding one of length at least k. If you find one then you can accept and terminate. However, if not then you may spend a lot of time searching (especially if k is large, like n − 1, and no such path exists). So is M ∈ P? No one knows the answer. In fact, we will show that M is NP-complete.

In what follows, we will be introducing a number of classes. We will jump back and forth between the terms

“language” and “decision problems”, but for our purposes they mean the same things. Before giving all the

technical deﬁnitions, let us say a bit about what the general classes look like at an intuitive level.

P: This is the set of all decision problems that can be solved in polynomial time. We will generally refer to

these problems as being “easy” or “efﬁciently solvable”. (Although this may be an exaggeration in many

cases.)


NP: This is the set of all decision problems that can be verified in polynomial time. (We will give a definition of this below.) This class contains P as a subset. Thus, it contains a number of easy problems, but it also contains a number of problems that are believed to be very hard to solve. The term NP does not mean “not polynomial”. Originally the term meant “nondeterministic polynomial time”. But it is a bit more intuitive to explain the concept from the perspective of verification.

NP-hard: In spite of its name, to say that a problem is NP-hard does not mean that it is hard to solve. Rather it means that if we could solve this problem in polynomial time, then we could solve all NP problems in polynomial time. Note that for a problem to be NP-hard, it does not have to be in the class NP. Since it is widely believed that not all NP problems are solvable in polynomial time, it is widely believed that no NP-hard problem is solvable in polynomial time.

NP-complete: A problem is NP-complete if (1) it is in NP, and (2) it is NP-hard. That is, NPC = NP ∩ NP-hard.

The ﬁgure below illustrates one way that the sets P, NP, NP-hard, and NP-complete (NPC) might look. We

say might because we do not know whether all of these complexity classes are distinct or whether they are all

solvable in polynomial time. There are some problems in the ﬁgure that we will not discuss. One is Graph

Isomorphism, which asks whether two graphs are identical up to a renaming of their vertices. It is known that

this problem is in NP, but it is not known to be in P. The other is QBF, which stands for Quantiﬁed Boolean

Formulas. In this problem you are given a boolean formula with quantiﬁers (∃ and ∀) and you want to know

whether the formula is true or false. This problem is beyond the scope of this course, but may be discussed in

an advanced course on complexity theory.

Fig. 43: The (possible) structure of P, NP, and related complexity classes: one way that things “might” be. (The figure arranges the regions P, NP, NPC, and NP-hard from “easy” to “harder” and labels the example problems MST, Strong connectivity, Graph Isomorphism?, Hamiltonian Cycle, Satisfiability, Knapsack, No Ham. Cycle, and QBF.)

Polynomial Time Verification and Certificates: Before talking about the class of NP-complete problems, it is important to introduce the notion of a verification algorithm. Many language recognition problems may be very hard to solve, but they have the property that it is easy to verify whether a string is in the language.

Consider the following problem, called the Hamiltonian cycle problem. Given an undirected graph G, does G have a cycle that visits every vertex exactly once? (There is a similar problem on directed graphs, and there is also a version which asks whether there is a path that visits all vertices.) We can describe this problem as a language recognition problem, where the language is

$$HC = \{(G) \mid G \text{ has a Hamiltonian cycle}\},$$

where (G) denotes an encoding of a graph G as a string. The Hamiltonian cycle problem seems to be much harder, and there is no known polynomial time algorithm for this problem. For example, the figure below shows two graphs, one which is Hamiltonian and one which is not.

Fig. 44: Hamiltonian cycle. (Two example graphs, one nonhamiltonian and one Hamiltonian.)

However, suppose that a graph did have a Hamiltonian cycle. Then it would be a very easy matter for someone to convince us of this. They would simply say “the cycle is $\langle v_3, v_7, v_1, \ldots, v_{13} \rangle$”. We could then inspect the graph, and check that this is indeed a legal cycle and that it visits all the vertices of the graph exactly once. Thus, even though we know of no efficient way to solve the Hamiltonian cycle problem, there is a very efficient way to verify that a given graph is in HC. The given cycle is called a certificate. This is some piece of information which allows us to verify that a given string is in a language.

More formally, given a language L, and given x ∈ L, a veriﬁcation algorithm is an algorithm which given x

and a string y called the certiﬁcate, can verify that x is in the language L using this certiﬁcate as help. If x is

not in L then there is nothing to verify.
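For the Hamiltonian cycle language, such a verification algorithm can be sketched in a few lines. The representation is an assumption made for illustration: the graph is an n × n adjacency matrix with n ≥ 3, and the certificate is a list of vertex indices in the order the claimed cycle visits them. The function name is hypothetical.

```python
def verify_hamiltonian_cycle(adj, cycle):
    """Check that the certificate `cycle` is a Hamiltonian cycle of the graph.

    adj is an n x n adjacency matrix (truthy entry = edge), n >= 3.
    cycle lists the vertices in visiting order, without repeating the start.
    """
    n = len(adj)
    # Every vertex must appear exactly once.
    if len(cycle) != n or set(cycle) != set(range(n)):
        return False
    # Consecutive vertices, including last back to first, must be adjacent.
    return all(adj[cycle[i]][cycle[(i + 1) % n]] for i in range(n))


# A 4-cycle 0-1-2-3-0 with no diagonals.
SQUARE = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
```

The check looks at each vertex of the certificate once and performs n adjacency lookups, so it runs in polynomial time. Note that a rejected certificate says nothing about whether the graph is Hamiltonian; it only means this particular certificate failed.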

Note that not all languages have the property that they are easy to verify. For example, consider the following languages:

$$UHC = \{(G) \mid G \text{ has a unique Hamiltonian cycle}\}$$
$$\overline{HC} = \{(G) \mid G \text{ has no Hamiltonian cycle}\}.$$

Suppose that a graph G is in the language UHC. What information would someone give us that would allow us to verify that G is indeed in the language? They could give us an example of the unique Hamiltonian cycle, and we could verify that it is a Hamiltonian cycle, but what sort of certificate could they give us to convince us that this is the only one? They could give another cycle that is NOT Hamiltonian, but this does not mean that there is not another cycle somewhere that is Hamiltonian. They could try to list every other cycle of length n, but this would not be at all efficient, since there are n! possible cycles in general. Thus, it is hard to imagine that someone could give us some information that would allow us to efficiently convince ourselves that a given graph is in the language.

The class NP:

Deﬁnition: Deﬁne NP to be the set of all languages that can be veriﬁed by a polynomial time algorithm.

Why is the set called “NP” rather than “VP”? The original term NP stood for “nondeterministic polynomial

time”. This referred to a program running on a nondeterministic computer that can make guesses. Basically, such a computer could nondeterministically guess the value of the certificate, and then verify that the string is in the language in polynomial time. We have avoided introducing nondeterminism here. It would be covered in a

course on complexity theory or formal language theory.

Like P, NP is a set of languages based on some complexity measure (the complexity of veriﬁcation). Observe

that P ⊆ NP. In other words, if we can solve a problem in polynomial time, then we can certainly verify

membership in polynomial time. (More formally, we do not even need to see a certiﬁcate to solve the problem,

we can solve it in polynomial time anyway).

However it is not known whether P = NP. It seems unreasonable to think that this should be so. In other words, just being able to verify that you have a correct solution does not help you very much in finding the actual solution. Most experts believe that P ≠ NP, but no one has a proof of this. Next time we will define the notions of NP-hard and NP-complete.


Lecture 17: NP-Completeness: Reductions

Read: Chapt 34, through Section 34.4.

Summary: Last time we introduced a number of concepts, on the way to deﬁning NP-completeness. In particular,

the following concepts are important.

Decision Problems: are problems for which the answer is either yes or no. NP-complete problems are expressed as decision problems, and hence can be thought of as language recognition problems, assuming that the input has been encoded as a string. For example:

$$HC = \{G \mid G \text{ has a Hamiltonian cycle}\}$$
$$MST = \{(G, x) \mid G \text{ has a MST of cost at most } x\}.$$

P: is the class of all decision problems which can be solved in polynomial time, $O(n^k)$ for some constant k. For example MST ∈ P but HC is not known (and suspected not) to be in P.

Certiﬁcate: is a piece of evidence that allows us to verify in polynomial time that a string is in a given language.

For example, suppose that the language is the set of Hamiltonian graphs. To convince someone that a graph

is in this language, we could supply the certiﬁcate consisting of a sequence of vertices along the cycle. It is

easy to access the adjacency matrix to determine that this is a legitimate cycle in G. Therefore HC ∈ NP.

NP: is deﬁned to be the class of all languages that can be veriﬁed in polynomial time. Note that since all

languages in P can be solved in polynomial time, they can certainly be veriﬁed in polynomial time, so we

have P ⊆ NP. However, NP also seems to have some pretty hard problems to solve, such as HC.

Reductions: The class of NP-complete problems consists of a set of decision problems (languages) (a subset of the

class NP) that no one knows how to solve efﬁciently, but if there were a polynomial time solution for even a

single NP-complete problem, then every problem in NP would be solvable in polynomial time. To establish this,

we need to introduce the concept of a reduction.

Before discussing reductions, let us just consider the following question. Suppose that there are two problems, H and U. We know (or at least strongly believe) that H is hard, that is, it cannot be solved in polynomial time. On the other hand, the complexity of U is unknown, but we suspect that it too is hard. We want to prove that U cannot be solved in polynomial time. How would we do this? We want to show that

$$(H \notin P) \Rightarrow (U \notin P).$$

To do this, we could prove the contrapositive,

$$(U \in P) \Rightarrow (H \in P).$$

In other words, to show that U is not solvable in polynomial time, we will suppose that there is an algorithm that

solves U in polynomial time, and then derive a contradiction by showing that H can be solved in polynomial

time.

How do we do this? Suppose that we have a subroutine that can solve any instance of problem U in polynomial

time. Then all we need to do is to show that we can use this subroutine to solve problem H in polynomial time.

Thus we have “reduced” problem H to problem U. It is important to note here that this supposed subroutine

is really a fantasy. We know (or strongly believe) that H cannot be solved in polynomial time, thus we are

essentially proving that the subroutine cannot exist, implying that U cannot be solved in polynomial time. (Be sure that you understand this; it is the basis behind all reductions.)

Example: 3-Colorability and Clique Cover: Let us consider an example to make this clearer. The following prob-

lem is well-known to be NP-complete, and hence it is strongly believed that the problem cannot be solved in

polynomial time.


3-coloring (3Col): Given a graph G, can each of its vertices be labeled with one of 3 different “colors”, such that no two adjacent vertices have the same label?

Coloring arises in various partitioning problems, where there is a constraint that two objects cannot be assigned

to the same set of the partition. The term “coloring” comes from the original application which was in map

drawing. Two countries that share a common border should be colored with different colors. It is well known

that planar graphs can be colored with 4 colors, and there exists a polynomial time algorithm for this. But

determining whether 3 colors are possible (even for planar graphs) seems to be hard and there is no known

polynomial time algorithm. In the ﬁgure below we give two graphs, one is 3-colorable and one is not.

Fig. 45: 3-coloring and Clique Cover. (Panels: a 3-colorable graph, a graph that is not 3-colorable, and a clique cover of size 3.)

The 3Col problem will play the role of the hard problem H, which we strongly suspect to not be solvable in polynomial time. For our unknown problem U, consider the following problem. Given a graph G = (V, E), we say that a subset of vertices $V' \subseteq V$ forms a clique if for every pair of distinct vertices $u, v \in V'$, $(u, v) \in E$. That is, the subgraph induced by $V'$ is a complete graph.

Clique Cover (CCov): Given a graph G = (V, E) and an integer k, can we partition the vertex set into k subsets of vertices $V_1, V_2, \ldots, V_k$, such that $\bigcup_i V_i = V$, and that each $V_i$ is a clique of G?

The clique cover problem arises in applications of clustering. We put an edge between two nodes if they are

similar enough to be clustered in the same group. We want to know whether it is possible to cluster all the

vertices into k groups.

Suppose that you want to solve the CCov problem, but after a while of fruitless effort, you still cannot find a polynomial time algorithm for the CCov problem. How can you prove that CCov is likely to not have a polynomial time solution? You know that 3Col is NP-complete, and hence experts believe that 3Col ∉ P. You feel that there is some connection between the CCov problem and the 3Col problem. Thus, you want to show that

$$(3Col \notin P) \Rightarrow (CCov \notin P),$$

which you will show by proving the contrapositive

$$(CCov \in P) \Rightarrow (3Col \in P).$$

To do this, you assume that you have access to a subroutine CCov(G, k). Given a graph G and an integer k, this

subroutine returns true if G has a clique cover of size k and false otherwise, and furthermore, this subroutine

runs in polynomial time. How can we use this “alleged” subroutine to solve the well-known hard 3Col problem?

We want to write a polynomial time subroutine for 3Col, and this subroutine is allowed to call the subroutine

CCov(G, k) for any graph G and any integer k.

Both problems involve partitioning the vertices up into groups. The only difference here is that in one problem

the number of cliques is speciﬁed as part of the input and in the other the number of color classes is ﬁxed at 3.

In the clique cover problem, for two vertices to be in the same group they must be adjacent to each other. In the

3-coloring problem, for two vertices to be in the same color group, they must not be adjacent. In some sense,

the problems are almost the same, but the requirement adjacent/non-adjacent is exactly reversed.


We claim that we can reduce the 3-coloring problem to the clique cover problem as follows. Given a graph G for which we want to determine its 3-colorability, output the pair $(\overline{G}, 3)$ where $\overline{G}$ denotes the complement of G. (That is, $\overline{G}$ is a graph on the same vertices, but (u, v) is an edge of $\overline{G}$ if and only if it is not an edge of G.) We can then feed the pair $(\overline{G}, 3)$ into a subroutine for clique cover. This is illustrated in the figure below.

Fig. 46: Clique covers in the complement. (A 3-colorable graph G, whose complement $\overline{G}$ is coverable by 3 cliques, and a graph H that is not 3-colorable, whose complement $\overline{H}$ is not coverable.)

Claim: A graph G is 3-colorable if and only if its complement $\overline{G}$ has a clique cover of size 3. In other words, $G \in 3Col$ iff $(\overline{G}, 3) \in CCov$.

Proof: (⇒) If G is 3-colorable, then let $V_1, V_2, V_3$ be the three color classes. We claim that this is a clique cover of size 3 for $\overline{G}$, since if u and v are distinct vertices in $V_i$, then $\{u, v\} \notin E(G)$ (since adjacent vertices cannot have the same color), which implies that $\{u, v\} \in E(\overline{G})$. Thus every pair of distinct vertices in $V_i$ is adjacent in $\overline{G}$.

(⇐) Suppose $\overline{G}$ has a clique cover of size 3, denoted $V_1, V_2, V_3$. For $i \in \{1, 2, 3\}$ give the vertices of $V_i$ color i. We assert that this is a legal coloring for G, since if distinct vertices u and v are both in $V_i$, then $\{u, v\} \in E(\overline{G})$ (since they are in a common clique), implying that $\{u, v\} \notin E(G)$. Hence, two vertices with the same color are not adjacent.
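The reduction and the claim can be checked on small graphs with the sketch below. Two caveats apply. First, no polynomial time CCov subroutine is known, so `clique_cover` here is an exponential brute-force stand-in for the hypothetical subroutine, useful only for tiny inputs. Second, all function names and the adjacency-matrix representation are assumptions made for illustration, and groups are allowed to be empty (just as a coloring need not use all 3 colors).

```python
from itertools import product


def complement(adj):
    """Adjacency matrix of the complement graph (no self-loops)."""
    n = len(adj)
    return [[i != j and not adj[i][j] for j in range(n)] for i in range(n)]


def three_colorable(adj):
    """Brute force: try every assignment of 3 colors to the vertices."""
    n = len(adj)
    for colors in product(range(3), repeat=n):
        if all(not adj[u][v] or colors[u] != colors[v]
               for u in range(n) for v in range(u + 1, n)):
            return True
    return False


def clique_cover(adj, k):
    """Brute-force stand-in for the hypothetical CCov(G, k) subroutine."""
    n = len(adj)
    for groups in product(range(k), repeat=n):
        # Any two vertices placed in the same group must be adjacent.
        if all(adj[u][v] or groups[u] != groups[v]
               for u in range(n) for v in range(u + 1, n)):
            return True
    return False


def three_col_via_ccov(adj):
    """The reduction: G is 3-colorable iff (complement of G, 3) is in CCov."""
    return clique_cover(complement(adj), 3)


K4 = [[int(i != j) for j in range(4)] for i in range(4)]               # complete graph
C5 = [[int((i - j) % 5 in (1, 4)) for j in range(5)] for i in range(5)]  # 5-cycle
```

Only `complement` and the single call in `three_col_via_ccov` belong to the reduction itself; the two brute-force routines are there to confirm on examples that both sides of the claim agree.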

Polynomial-time reduction: We now take this intuition of reducing one problem to another through the use of a subroutine call, and place it on more formal footing. Notice that in the example above, we converted an instance of the 3-coloring problem (G) into an equivalent instance of the Clique Cover problem $(\overline{G}, 3)$.

Definition: We say that a language (i.e. decision problem) $L_1$ is polynomial-time reducible to language $L_2$ (written $L_1 \le_P L_2$) if there is a polynomial time computable function f, such that for all x, $x \in L_1$ if and only if $f(x) \in L_2$.

In the previous example we showed that

$$3Col \le_P CCov.$$

In particular we have $f(G) = (\overline{G}, 3)$. Note that it is easy to complement a graph in $O(n^2)$ (i.e. polynomial) time (e.g. flip 0's and 1's in the adjacency matrix). Thus f is computable in polynomial time.

Intuitively, saying that $L_1 \le_P L_2$ means that “if $L_2$ is solvable in polynomial time, then so is $L_1$.” This is because a polynomial time subroutine for $L_2$ could be applied to f(x) to determine whether $f(x) \in L_2$, or equivalently whether $x \in L_1$. Thus, in the sense of polynomial time computability, $L_1$ is “no harder” than $L_2$.

The way in which this is used in NP-completeness is exactly the contrapositive. We usually have strong evidence that $L_1$ is not solvable in polynomial time, and hence the reduction is effectively equivalent to saying “since $L_1$ is not likely to be solvable in polynomial time, $L_2$ is also not likely to be solvable in polynomial time.” Thus, this is how polynomial time reductions can be used to show that problems are as hard to solve as known difficult problems.

Lemma: If $L_1 \le_P L_2$ and $L_2 \in P$ then $L_1 \in P$.

Lemma: If $L_1 \le_P L_2$ and $L_1 \notin P$ then $L_2 \notin P$.

One important fact about reducibility is that it is transitive. In other words:

Lemma: If $L_1 \le_P L_2$ and $L_2 \le_P L_3$ then $L_1 \le_P L_3$.

The reason is that if two functions f(x) and g(x) are computable in polynomial time, then their composition

f(g(x)) is computable in polynomial time as well. It should be noted that our text uses the term “reduction”

where most other books use the term “transformation”. The distinction is subtle, but people taking other courses

in complexity theory should be aware of this.
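The polynomial bound on the composition can be made explicit. The following is a sketch under assumptions of our own (the names and exponents are illustrative and do not appear in the text): f reduces $L_1$ to $L_2$ in time $O(n^a)$ on inputs of length n, g reduces $L_2$ to $L_3$ in time $O(m^b)$ on inputs of length m, and a, b ≥ 1 are constants. The reduction from $L_1$ to $L_3$ is then the map $x \mapsto g(f(x))$.

```latex
\begin{align*}
|f(x)| &= O(n^a)
  && \text{(a program running for $O(n^a)$ steps writes at most $O(n^a)$ symbols)} \\
T_{g \circ f}(n) &= O(n^a) + O\big((n^a)^b\big) = O(n^{ab})
  && \text{(run $f$, then run $g$ on its output)} \\
x \in L_1 &\iff f(x) \in L_2 \iff g(f(x)) \in L_3
  && \text{(correctness of both reductions)}
\end{align*}
```

Since a and b are constants independent of n, so is the product ab, and the composed reduction runs in polynomial time.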

NP-completeness: The set of NP-complete problems are all problems in the complexity class NP, for which it is

known that if any one is solvable in polynomial time, then they all are, and conversely, if any one is not solvable

in polynomial time, then none are. This is made mathematically formal using the notion of polynomial time

reductions.

Definition: A language $L$ is NP-hard if:

$L' \le_P L$ for all $L' \in NP$.

(Note that L does not need to be in NP.)

Deﬁnition: A language L is NP-complete if:

(1) L ∈ NP and

(2) L is NP-hard.

An alternative (and usually easier) way to show that a problem is NP-complete is to use transitivity.

Lemma: $L$ is NP-complete if

(1) $L \in NP$ and

(2) $L' \le_P L$ for some known NP-complete language $L'$.

The reason is that all $L'' \in NP$ are reducible to $L'$ (since $L'$ is NP-complete and hence NP-hard) and hence by transitivity $L''$ is reducible to $L$, implying that $L$ is NP-hard.

This gives us a way to prove that problems are NP-complete, once we know that one problem is NP-complete.

Unfortunately, it appears to be almost impossible to prove that one problem is NP-complete, because the definition says that we have to be able to reduce every problem in NP to this problem. There are infinitely many such

problems, so how can we ever hope to do this? We will talk about this next time with Cook’s theorem. Cook

showed that there is one problem called SAT (short for boolean satisﬁability) that is NP-complete. To prove a

second problem is NP-complete, all we need to do is to show that our problem is in NP (and hence it is reducible

to SAT), and then to show that we can reduce SAT (or generally some known NPC problem) to our problem. It

follows that our problem is equivalent to SAT (with respect to solvability in polynomial time). This is illustrated

in the ﬁgure below.

Lecture 18: Cook’s Theorem, 3SAT, and Independent Set

Read: Chapter 34, through 34.5. The reduction given here is similar, but not the same as the reduction given in the

text.

Recap: So far we introduced the deﬁnitions of NP-completeness. Recall that we mentioned the following topics:

P: is the set of decision problems (or languages) that are solvable in polynomial time.

NP: is the set of decision problems (or languages) that can be verified in polynomial time.

Fig. 47: Structure of NPC and reductions. (Panels: proving a problem is in NP; proving a problem is NP-hard; resulting structure.)

Polynomial reduction: $L_1 \le_P L_2$ means that there is a polynomial time computable function $f$ such that $x \in L_1$ if and only if $f(x) \in L_2$. A more intuitive way to think about this is that if we had a subroutine to solve $L_2$ in polynomial time, then we could use it to solve $L_1$ in polynomial time.

Polynomial reductions are transitive, that is, if $L_1 \le_P L_2$ and $L_2 \le_P L_3$, then $L_1 \le_P L_3$.

NP-Hard: $L$ is NP-hard if for all $L' \in NP$, $L' \le_P L$. Thus, if we could solve $L$ in polynomial time, we could solve all NP problems in polynomial time.

NP-Complete: L is NP-complete if (1) L ∈ NP and (2) L is NP-hard.

The importance of NP-complete problems should now be clear. If any NP-complete problem (and generally

any NP-hard problem) is solvable in polynomial time, then every NP-complete problem (and in fact every

problem in NP) is also solvable in polynomial time. Conversely, if we can prove that any NP-complete problem

(and generally any problem in NP) cannot be solved in polynomial time, then every NP-complete problem (and

generally every NP-hard problem) cannot be solved in polynomial time. Thus all NP-complete problems are

equivalent to one another (in that they are either all solvable in polynomial time, or none are).

An alternative way to show that a problem is NP-complete is to use transitivity of $\le_P$.

Lemma: $L$ is NP-complete if

(1) $L \in NP$ and

(2) $L' \le_P L$ for some NP-complete language $L'$.

Note: The known NP-complete problem $L'$ is reduced to the candidate NP-complete problem $L$. Keep this order in mind.

Cook’s Theorem: Unfortunately, to use this lemma, we need to have at least one NP-complete problem to start the

ball rolling. Stephen Cook showed that such a problem existed. Cook’s theorem is quite complicated to prove,

but we’ll try to give a brief intuitive argument as to why such a problem might exist.

For a problem to be in NP, it must have an efﬁcient veriﬁcation procedure. Thus virtually all NP problems can

be stated in the form, “does there exist X such that P(X)”, where X is some structure (e.g. a set, a path, a

partition, an assignment, etc.) and P(X) is some property that X must satisfy (e.g. the set of objects must ﬁll

the knapsack, or the path must visit every vertex, or you may use at most k colors and no two adjacent vertices

can have the same color). In showing that such a problem is in NP, the certiﬁcate consists of giving X, and the

veriﬁcation involves testing that P(X) holds.

In general, any set X can be described by choosing a set of objects, which in turn can be described as choosing

the values of some boolean variables. Similarly, the property P(X) that you need to satisfy, can be described as

a boolean formula. Stephen Cook was looking for the most general possible property he could, since this should

represent the hardest problem in NP to solve. He reasoned that computers (which represent the most general


type of computational devices known) could be described entirely in terms of boolean circuits, and hence in

terms of boolean formulas. If any problem were hard to solve, it would be one in which X is an assignment of

boolean values (true/false, 0/1) and P(X) could be any boolean formula. This suggests the following problem,

called the boolean satisﬁability problem.

SAT: Given a boolean formula, is there some way to assign truth values (0/1, true/false) to the variables of the

formula, so that the formula evaluates to true?

A boolean formula is a logical formula which consists of variables $x_i$, and the logical operations $\overline{x}$ meaning the negation of $x$, boolean-or ($x \lor y$) and boolean-and ($x \land y$). Given a boolean formula, we say that it is satisfiable

if there is a way to assign truth values (0 or 1) to the variables such that the ﬁnal result is 1. (As opposed to the

case where no matter how you assign truth values the result is always 0.)

For example,

$(x_1 \land (x_2 \lor \overline{x_3})) \land ((\overline{x_2} \land \overline{x_3}) \lor \overline{x_1})$

is satisfiable, by the assignment $x_1 = 1$, $x_2 = 0$, $x_3 = 0$. On the other hand,

$(x_1 \lor (x_2 \land x_3)) \land (\overline{x_1} \lor (\overline{x_2} \land \overline{x_3})) \land (x_2 \lor x_3) \land (\overline{x_2} \lor \overline{x_3})$

is not satisfiable. (Observe that the last two clauses imply that one of $x_2$ and $x_3$ must be true and the other must be false. This implies that neither of the subclauses involving $x_2$ and $x_3$ in the first two clauses can be satisfied, but $x_1$ cannot be set to satisfy them either.)
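Small formulas like these can be checked by trying every assignment. The sketch below does that for two three-variable formulas; the placement of negations in `first` and `second` is an assumption chosen to agree with the assignments and the explanation in the text, and the helper name `satisfying_assignments` is made up for this illustration. The search takes $2^n$ steps for $n$ variables, so it is only a sanity check, not an efficient algorithm.

```python
from itertools import product

def satisfying_assignments(formula, num_vars):
    """Return every True/False assignment (as a tuple) that makes `formula` true."""
    return [bits for bits in product([False, True], repeat=num_vars)
            if formula(*bits)]

# Assumed reading of the first example: x1 and (x2 or not x3), and
# ((not x2 and not x3) or not x1).
def first(x1, x2, x3):
    return (x1 and (x2 or not x3)) and ((not x2 and not x3) or not x1)

# Assumed reading of the second example; the last two clauses force
# exactly one of x2, x3 to be true.
def second(x1, x2, x3):
    return ((x1 or (x2 and x3)) and (not x1 or (not x2 and not x3))
            and (x2 or x3) and (not x2 or not x3))

print(satisfying_assignments(first, 3))   # only x1=1, x2=0, x3=0
print(satisfying_assignments(second, 3))  # empty: unsatisfiable
```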

Cook’s Theorem: SAT is NP-complete.

We will not prove this theorem. The proof would take about a full lecture (not counting the week or so of background on Turing machines). In fact, it turns out that an even more restricted version of the satisfiability problem is NP-complete. A literal is a variable or its negation, $x$ or $\overline{x}$. A formula is in 3-conjunctive normal

form (3-CNF) if it is the boolean-and of clauses where each clause is the boolean-or of exactly 3 literals. For

example

$(x_1 \lor x_2 \lor x_3) \land (x_1 \lor x_3 \lor x_4) \land (x_2 \lor x_3 \lor x_4)$

is in 3-CNF form. 3SAT is the problem of determining whether a formula in 3-CNF is satisﬁable. It turns out that

it is possible to modify the proof of Cook’s theorem to show that the more restricted 3SAT is also NP-complete.

As an aside, note that if we replace the 3 in 3SAT with a 2, then everything changes. If a boolean formula is

given in 2-CNF, then it is possible to determine its satisfiability in polynomial time. Thus, even a seemingly small

change can be the difference between an efﬁcient algorithm and none.

NP-completeness proofs: Now that we know that 3SAT is NP-complete, we can use this fact to prove that other

problems are NP-complete. We will start with the independent set problem.

Independent Set (IS): Given an undirected graph $G = (V, E)$ and an integer $k$, does $G$ contain a subset $V'$ of $k$ vertices such that no two vertices in $V'$ are adjacent to one another?

For example, the graph shown in the ﬁgure below has an independent set (shown with shaded nodes) of size

4. The independent set problem arises when there is some sort of selection problem, but there are mutual restrictions: pairs that cannot both be selected. (For example, you want to invite as many of your friends to your party as possible, but many pairs do not get along, represented by edges between them, and you do not want to invite two enemies.)

Note that if a graph has an independent set of size k, then it has an independent set of all smaller sizes. So the

corresponding optimization problem would be to ﬁnd an independent set of the largest size in a graph. Often

the vertices have weights, so we might talk about the problem of computing the independent set with the largest

total weight. However, since we want to show that the problem is hard to solve, we will consider the simplest

version of the problem.


Fig. 48: Independent Set.

Claim: IS is NP-complete.

The proof involves two parts. First, we need to show that IS ∈ NP. The certificate consists of the $k$ vertices of $V'$. We simply verify that for each pair of vertices $u, v \in V'$, there is no edge between them. Clearly this can be done in polynomial time, by an inspection of the adjacency matrix.
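The verification step can be sketched in a few lines. This is a minimal illustration, assuming the graph is given as a 0/1 adjacency matrix and the certificate as a list of vertex indices; the function name and the example graph are hypothetical.

```python
from itertools import combinations

def verify_independent_set(adj, certificate, k):
    """Check a claimed independent set against an adjacency matrix.

    adj[u][v] is truthy when u and v are adjacent; certificate is a
    collection of vertex indices. Uses O(k^2) matrix lookups.
    """
    vertices = set(certificate)
    if len(vertices) != k:
        return False  # wrong size, or repeated vertices
    if any(v < 0 or v >= len(adj) for v in vertices):
        return False  # not a vertex of the graph
    return all(not adj[u][v] for u, v in combinations(vertices, 2))

# A 4-cycle 0-1-2-3-0 (hypothetical example graph).
cycle = [[0, 1, 0, 1],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [1, 0, 1, 0]]
print(verify_independent_set(cycle, [0, 2], 2))  # opposite corners
print(verify_independent_set(cycle, [0, 1], 2))  # adjacent corners
```

Note that the verifier only checks a given certificate; it makes no attempt to find one.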

Fig. 49: Reduction of 3-SAT to IS. (A polynomial time computable function f maps a boolean formula F in 3-CNF to a graph and integer (G, k); the yes/no answer of IS on (G, k) is the answer of 3SAT on F.)

Secondly, we need to establish that IS is NP-hard, which can be done by showing that some known NP-complete problem (3SAT) is polynomial-time reducible to IS, that is, 3SAT $\le_P$ IS. Let $F$ be a boolean formula in 3-CNF form (the boolean-and of clauses, each of which is the boolean-or of 3 literals). We wish to find a polynomial time computable function $f$ that maps $F$ into an input for the IS problem, a graph $G$ and integer $k$. That is, $f(F) = (G, k)$, such that $F$ is satisfiable if and only if $G$ has an independent set of size $k$. This will mean that if we can solve the independent set problem for $G$ and $k$ in polynomial time, then we would be able to solve 3SAT in polynomial time.

An important aspect to reductions is that we do not attempt to solve the satisﬁability problem. (Remember: It

is NP-complete, and there is not likely to be any polynomial time solution.) So the function f must operate

without knowledge of whether F is satisfiable. The idea is to translate the similar elements of the satisfiability problem to corresponding elements of the independent set problem.

What is to be selected?

3SAT: Which variables are assigned to be true. Equivalently, which literals are assigned true.

IS: Which vertices are to be placed in $V'$.

Requirements:

3SAT: Each clause must contain at least one literal whose value is true.

IS: $V'$ must contain at least $k$ vertices.

Restrictions:

3SAT: If $x_i$ is assigned true, then $\overline{x_i}$ must be false, and vice versa.

IS: If $u$ is selected to be in $V'$, and $v$ is a neighbor of $u$, then $v$ cannot be in $V'$.

We want a function f, which given any 3-CNF boolean formula F, converts it into a pair (G, k) such that the

above elements are translated properly. Our strategy will be to create one vertex for each literal that appears

within each clause. (Thus, if there are m clauses in F, there will be 3m vertices in G.) The vertices are grouped

into clause clusters, one for each clause. Selecting a true literal from some clause corresponds to selecting a vertex to add to $V'$. We set $k$ to the number of clauses. This forces the independent set to pick one vertex from each clause, thus, one literal from each clause is true. In order to keep the IS subroutine from selecting

two literals from some clause (and hence none from some other), we will connect all the vertices in each clause

cluster to each other. To keep the IS subroutine from selecting both a literal and its complement, we will put an

edge between each literal and its complement. This enforces the condition that if a literal is put in the IS (set to

true) then its complement literal cannot also be true. A formal description of the reduction is given below. The

input is a boolean formula F in 3-CNF, and the output is a graph G and integer k.

3SAT to IS Reduction

k ← number of clauses in F;
for each clause C in F
    create a clause cluster of 3 vertices from the literals of C;
for each clause cluster (x1, x2, x3)
    create an edge (xi, xj) between all pairs of vertices in the cluster;
for each vertex xi
    create edges between xi and all its complement vertices ¬xi;
return (G, k);

Given any reasonable encoding of F, it is an easy programming exercise to create G (say as an adjacency matrix)

in polynomial time. We claim that F is satisﬁable if and only if G has an independent set of size k.
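As a rough illustration of that programming exercise, here is one possible sketch of the reduction. The encoding is an assumption made for this example: a clause is a tuple of three nonzero integers, with `+i` standing for $x_i$ and `-i` for its negation, and the graph is returned as a vertex list and an edge set rather than an adjacency matrix. The brute-force `has_independent_set` is exponential and is included only to check the claim on tiny inputs; it is not part of the reduction.

```python
from itertools import combinations

def three_sat_to_is(clauses):
    """Map a 3-CNF formula to an Independent Set instance (vertices, edges, k).

    Vertex (c, p) stands for the literal at position p of clause c.
    Edges join vertices in the same clause cluster and vertices whose
    literals are complements of each other.
    """
    vertices = [(c, p) for c in range(len(clauses)) for p in range(3)]
    edges = set()
    for u, v in combinations(vertices, 2):
        same_cluster = u[0] == v[0]
        complementary = clauses[u[0]][u[1]] == -clauses[v[0]][v[1]]
        if same_cluster or complementary:
            edges.add((u, v))
    return vertices, edges, len(clauses)

def has_independent_set(vertices, edges, k):
    """Exponential-time brute force, used only to check tiny examples."""
    for subset in combinations(vertices, k):
        if not any((u, v) in edges for u, v in combinations(subset, 2)):
            return True
    return False

# A hypothetical satisfiable formula (x1 = x2 = 1, x3 = 0 works) ...
sat_formula = [(1, 2, -3), (-1, 2, 3), (-1, -2, -3), (1, -2, -3)]
# ... and a hypothetical unsatisfiable one: (x1 v x1 v x1) ^ (~x1 v ~x1 v ~x1).
unsat_formula = [(1, 1, 1), (-1, -1, -1)]

print(has_independent_set(*three_sat_to_is(sat_formula)))
print(has_independent_set(*three_sat_to_is(unsat_formula)))
```

The reduction itself looks at each pair of vertices once, so it runs in time quadratic in the number of clauses.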

Example: Suppose that we are given the 3-CNF formula:

(x

1

∨ x

2

∨ x

3

) ∧ (x

1

∨ x

2

∨ x

3

) ∧ (x

1

∨ x

2

∨ x

3

) ∧ (x

1

∨ x

2

∨ x

3

).

The reduction produces the graph shown in the following ﬁgure and sets k = 4.

Fig. 50: 3SAT to IS reduction for the example formula above. (Panels: the reduction; correctness for $x_1 = x_2 = 1$, $x_3 = 0$.)

In our example, the formula is satisfied by the assignment $x_1 = 1$, $x_2 = 1$, and $x_3 = 0$. Note that the literal $x_1$ satisfies the first and last clauses, $x_2$ satisfies the second, and $\overline{x_3}$ satisfies the third. Observe that by selecting the corresponding vertices from the clusters, we get an independent set of size $k = 4$.


Correctness Proof: We claim that $F$ is satisfiable if and only if $G$ has an independent set of size $k$. If $F$ is satisfiable, then each of the $k$ clauses of $F$ must have at least one true literal. Let $V'$ denote the corresponding vertices from each of the clause clusters (one from each cluster). Because we take only one vertex from each cluster, there are no intra-cluster edges between them, and because we cannot set a variable and its complement to both be true, there can be no edge of the form $(x_i, \overline{x_i})$ between the vertices of $V'$. Thus, $V'$ is an independent set of size $k$.

Conversely, suppose $G$ has an independent set $V'$ of size $k$. First observe that we must select a vertex from each clause cluster, because there are $k$ clusters, and we cannot take two vertices from the same cluster (because they are all interconnected). Consider the assignment in which we set all of these literals to 1. This assignment is logically consistent, because we cannot have two vertices labeled $x_i$ and $\overline{x_i}$ in the same independent set. (Any variable not fixed by this choice may be set arbitrarily.) Since every clause contains a selected literal, every clause is satisfied. Finally the transformation clearly runs in polynomial time. This completes the NP-completeness proof.

Observe that our reduction did not attempt to solve the IS problem nor to solve the 3SAT. Also observe that

the reduction had no knowledge of the solution to either problem. (We did not assume that the formula was

satisﬁable, nor did we assume we knew which variables to set to 1.) This is because computing these things

would require exponential time (by the best known algorithms). Instead the reduction simply translated the

input from one problem into an equivalent input to the other problem, while preserving the critical elements to

each problem.

Lecture 19: Clique, Vertex Cover, and Dominating Set

Read: Chapt 34 (up through 34.5). The dominating set proof is not given in our text.

Recap: Last time we gave a reduction from 3SAT (satisfiability of boolean formulas in 3-CNF form) to IS (independent set in graphs). Today we give a few more examples of reductions. Recall that to show that a problem is

NP-complete we need to show (1) that the problem is in NP (i.e. we can verify when an input is in the language),

and (2) that the problem is NP-hard, by showing that some known NP-complete problem can be reduced to this

problem (there is a polynomial time function that transforms an input for one problem into an equivalent input

for the other problem).

Some Easy Reductions: We consider some closely related NP-complete problems next.

Clique (CLIQUE): The clique problem is: given an undirected graph $G = (V, E)$ and an integer $k$, does $G$ have a subset $V'$ of $k$ vertices such that for each distinct $u, v \in V'$, $\{u, v\} \in E$? In other words, does $G$ have a $k$ vertex subset whose induced subgraph is complete?

Vertex Cover (VC): A vertex cover in an undirected graph $G = (V, E)$ is a subset of vertices $V' \subseteq V$ such that every edge in $G$ has at least one endpoint in $V'$. The vertex cover problem (VC) is: given an undirected graph $G$ and an integer $k$, does $G$ have a vertex cover of size $k$?

Dominating Set (DS): A dominating set in a graph $G = (V, E)$ is a subset of vertices $V'$ such that every vertex in the graph is either in $V'$ or is adjacent to some vertex in $V'$. The dominating set problem (DS) is: given a graph $G = (V, E)$ and an integer $k$, does $G$ have a dominating set of size $k$?

Don’t confuse the clique (CLIQUE) problem with the clique-cover (CC) problem that we discussed in an earlier

lecture. The clique problem seeks to ﬁnd a single clique of size k, and the clique-cover problem seeks to partition

the vertices into k groups, each of which is a clique.

We have discussed the fact that cliques are of interest in applications dealing with clustering. The vertex cover problem arises in various servicing applications. For example, you have a computer network and a program that checks the integrity of the communication links. To save the space of installing the program on every computer in the network, it suffices to install it on all the computers forming a vertex cover. From these nodes all the links can be tested. Dominating set is useful in facility location problems. For example, suppose we want to select where to place a set of fire stations such that every house in the city is within 2 minutes of the nearest fire station. We create a graph in which two locations are adjacent if they are within 2 minutes of each other. A minimum sized dominating set will be a minimum set of locations such that every other location is reachable within 2 minutes from one of these sites.

The CLIQUE problem is obviously closely related to the independent set problem (IS): Given a graph G does it

have a k vertex subset that is completely disconnected. It is not quite as clear that the vertex cover problem is

related. However, the following lemma makes this connection clear as well.

Fig. 51: Clique, Independent set, and Vertex Cover. ($V'$ is a clique of size $k$ in $\overline{G}$ iff $V'$ is an IS of size $k$ in $G$ iff $V - V'$ is a VC of size $n - k$ in $G$.)

Lemma: Given an undirected graph $G = (V, E)$ with $n$ vertices and a subset $V' \subseteq V$ of size $k$. The following are equivalent:

(i) $V'$ is a clique of size $k$ for the complement, $\overline{G}$.

(ii) $V'$ is an independent set of size $k$ for $G$.

(iii) $V - V'$ is a vertex cover of size $n - k$ for $G$.

Proof:

(i) ⇒ (ii): If $V'$ is a clique for $\overline{G}$, then for each distinct $u, v \in V'$, $\{u, v\}$ is an edge of $\overline{G}$ implying that $\{u, v\}$ is not an edge of $G$, implying that $V'$ is an independent set for $G$.

(ii) ⇒ (iii): If $V'$ is an independent set for $G$, then for each $u, v \in V'$, $\{u, v\}$ is not an edge of $G$, implying that every edge in $G$ is incident to a vertex in $V - V'$, implying that $V - V'$ is a VC for $G$.

(iii) ⇒ (i): If $V - V'$ is a vertex cover for $G$, then for any distinct $u, v \in V'$ there is no edge $\{u, v\}$ in $G$, implying that there is an edge $\{u, v\}$ in $\overline{G}$, implying that $V'$ is a clique in $\overline{G}$.

Thus, if we had an algorithm for solving any one of these problems, we could easily translate it into an algorithm

for the others. In particular, we have the following.

Theorem: CLIQUE is NP-complete.

CLIQUE ∈ NP: The certiﬁcate consists of the k vertices in the clique. Given such a certiﬁcate we can easily

verify in polynomial time that all pairs of vertices in the set are adjacent.

IS $\le_P$ CLIQUE: We want to show that given an instance of the IS problem $(G, k)$, we can produce an equivalent instance of the CLIQUE problem in polynomial time. The reduction function $f$ inputs $G$ and $k$, and outputs the pair $(\overline{G}, k)$. Clearly this can be done in polynomial time. By the above lemma, this instance is equivalent.

Theorem: VC is NP-complete.

VC ∈ NP: The certiﬁcate consists of the k vertices in the vertex cover. Given such a certiﬁcate we can easily

verify in polynomial time that every edge is incident to one of these vertices.


IS $\le_P$ VC: We want to show that given an instance of the IS problem $(G, k)$, we can produce an equivalent instance of the VC problem in polynomial time. The reduction function $f$ inputs $G$ and $k$, computes the number of vertices, $n$, and then outputs $(G, n - k)$. Clearly this can be done in polynomial time. By the lemma above, these instances are equivalent.
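Both reduction functions are short enough to write out. The following is a sketch under assumed conventions (vertices numbered `0..n-1`, edges as a set of two-element `frozenset`s, hypothetical function names). Neither function looks for an independent set; each only rewrites the instance.

```python
from itertools import combinations

def is_to_clique(n, edges, k):
    """IS instance (G, k) -> CLIQUE instance (complement of G, k)."""
    all_pairs = {frozenset(p) for p in combinations(range(n), 2)}
    return n, all_pairs - edges, k

def is_to_vc(n, edges, k):
    """IS instance (G, k) -> VC instance (G, n - k)."""
    return n, edges, n - k

# Hypothetical instance: the triangle 0-1-2 with a pendant vertex 3 on 2.
g = {frozenset(e) for e in [(0, 1), (1, 2), (0, 2), (2, 3)]}
print(is_to_clique(4, g, 2))  # complement has edges {0,3} and {1,3}
print(is_to_vc(4, g, 2))      # same graph, target size 4 - 2 = 2
```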

Note: In each of the above reductions, the reduction function did not know whether G has an independent set or not. It must run in polynomial time, and IS is an NP-complete problem. So it does not have time to determine whether G has an independent set or which vertices are in the set.

Dominating Set: As with vertex cover, dominating set is an example of a graph covering problem. Here the condition is a little different: each vertex must be in the dominating set or adjacent to some member of it, as opposed to each edge being incident to some member of the vertex cover. Obviously, if G is connected and has a vertex cover of size k, then it has a dominating set of size k (the same set of vertices), but the converse is not necessarily true. However the similarity suggests that if VC is NP-complete, then DS is likely to be NP-complete as well. The main result of this section is just this.

Theorem: DS is NP-complete.

As usual the proof has two parts. First we show that DS ∈ NP. The certificate just consists of the subset $V'$ in the dominating set. In polynomial time we can determine whether every vertex is in $V'$ or is adjacent to a vertex in $V'$.

Reducing Vertex Cover to Dominating Set: Next we show that an existing NP-complete problem is reducible to dominating set. We choose vertex cover and show that VC $\le_P$ DS. We want a polynomial time function, which given an instance of the vertex cover problem $(G, k)$, produces an instance $(G', k')$ of the dominating set problem, such that $G$ has a vertex cover of size $k$ if and only if $G'$ has a dominating set of size $k'$.

How do we translate between these problems? The key difference is the condition. In VC: “every edge is incident to a vertex in $V'$”. In DS: “every vertex is either in $V'$ or is adjacent to a vertex in $V'$”. Thus the translation must somehow map the notion of “incident” to “adjacent”. Because incidence is a property of edges, and adjacency is a property of vertices, this suggests that the reduction function maps edges of $G$ into vertices in $G'$, such that an incident edge in $G$ is mapped to an adjacent vertex in $G'$.

This suggests the following idea (which does not quite work). We will insert a vertex into the middle of each edge of the graph. In other words, for each edge $\{u, v\}$, we will create a new special vertex, called $w_{uv}$, and replace the edge $\{u, v\}$ with the two edges $\{u, w_{uv}\}$ and $\{v, w_{uv}\}$. The fact that $u$ was incident to edge $\{u, v\}$ has now been replaced with the fact that $u$ is adjacent to the corresponding vertex $w_{uv}$. We still need to dominate the neighbor $v$. To do this, we will leave the edge $\{u, v\}$ in the graph as well. Let $G'$ be the resulting graph.
**

This is still not quite correct though. Define an isolated vertex to be one that is incident to no edges. If $u$ is isolated it can only be dominated if it is included in the dominating set. Since it is not incident to any edges, it does not need to be in the vertex cover. Let $V_I$ denote the isolated vertices in $G$, and let $I$ denote the number of isolated vertices. The number of vertices to request for the dominating set will be $k' = k + I$.

Now we can give the complete reduction. Given the pair $(G, k)$ for the VC problem, we create a graph $G'$ as follows. Initially $G' = G$. For each edge $\{u, v\}$ in $G$ we create a new vertex $w_{uv}$ in $G'$ and add edges $\{u, w_{uv}\}$ and $\{v, w_{uv}\}$ in $G'$. Let $I$ denote the number of isolated vertices and set $k' = k + I$. Output $(G', k')$. This reduction is illustrated in the following figure. Note that every step can be performed in polynomial time.
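The construction can be sketched directly. The conventions here are assumptions for the example: $G$ has vertices `0..n-1`, `edges` is a list of distinct pairs with no self-loops, and the special vertex for the `j`-th edge is given the number `n + j`. The helper `is_dominating_set` is a hypothetical checker added only so the output can be inspected.

```python
def vc_to_ds(n, edges, k):
    """Map a VC instance (G, k) to a DS instance (G', k').

    Every original edge is kept, and each edge {u, v} gets a new special
    vertex joined to both u and v. k' = k + (number of isolated vertices).
    """
    new_edges = list(edges)
    touched = set()
    for offset, (u, v) in enumerate(edges):
        w_uv = n + offset
        new_edges += [(u, w_uv), (v, w_uv)]
        touched.update((u, v))
    isolated = n - len(touched)
    return n + len(edges), new_edges, k + isolated

def is_dominating_set(n, edges, chosen):
    """True when every vertex 0..n-1 is in `chosen` or adjacent to it."""
    dominated = set(chosen)
    for u, v in edges:
        if u in chosen:
            dominated.add(v)
        if v in chosen:
            dominated.add(u)
    return len(dominated) == n

# Hypothetical instance: path 0-1-2 plus isolated vertex 3; {1} covers G.
n2, e2, k2 = vc_to_ds(4, [(0, 1), (1, 2)], 1)
print(n2, k2)                               # 6 vertices, k' = 1 + 1 = 2
print(is_dominating_set(n2, e2, {1, 3}))    # cover plus the isolated vertex
```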

Correctness of the Reduction: To establish the correctness of the reduction, we need to show that $G$ has a vertex cover of size $k$ if and only if $G'$ has a dominating set of size $k'$. First we argue that if $V'$ is a vertex cover for $G$, then $V'' = V' \cup V_I$ is a dominating set for $G'$. Observe that

$|V''| = |V' \cup V_I| \le k + I = k'$.

Note that $|V' \cup V_I|$ might be of size less than $k + I$, if there are any isolated vertices in $V'$. If so, we can add any vertices we like to make the size equal to $k'$.

Fig. 52: Dominating set reduction. (Example: $f$ maps $G$ with $k = 3$ to $G'$ with $k' = 3 + 1 = 4$.)

To see that $V''$ is a dominating set, first observe that all the isolated vertices are in $V''$ and so they are dominated. Second, each of the special vertices $w_{uv}$ in $G'$ corresponds to an edge $\{u, v\}$ in $G$ implying that either $u$ or $v$ is in the vertex cover $V'$. Thus $w_{uv}$ is dominated by the same vertex in $V''$. Finally, each of the nonisolated original vertices $v$ is incident to at least one edge in $G$, and hence either it is in $V'$ or else all of its neighbors are in $V'$. In either case, $v$ is either in $V''$ or adjacent to a vertex in $V''$. This is shown in the top part of the following figure.

Fig. 53: Correctness of the VC to DS reduction (where $k = 3$ and $I = 1$). (Panels: vertex cover for $G$; dominating set for $G'$; dominating set for $G'$; vertex cover for $G$ using original vertices.)

Conversely, we claim that if $G'$ has a dominating set $V''$ of size $k' = k + I$ then $G$ has a vertex cover $V'$ of size $k$. Note that all $I$ isolated vertices of $G'$ must be in the dominating set. First, let $V''' = V'' - V_I$ be the remaining $k$ vertices. We might try to claim something like: $V'''$ is a vertex cover for $G$. But this will not necessarily work, because $V'''$ may have vertices that are not part of the original graph $G$.

However, we claim that we never need to use any of the newly created special vertices in $V'''$. In particular, if some vertex $w_{uv} \in V'''$, then modify $V'''$ by replacing $w_{uv}$ with $u$. (We could have just as easily replaced it with $v$.) Observe that the vertex $w_{uv}$ is adjacent to only $u$ and $v$, so it dominates itself and these other two vertices. By using $u$ instead, we still dominate $u$, $v$, and $w_{uv}$ (because $u$ has edges going to $v$ and $w_{uv}$). Thus by replacing $w_{uv}$ with $u$ we dominate the same vertices (and potentially more). Let $V'$ denote the resulting set after this modification. (This is shown in the lower middle part of the figure.)

We claim that $V'$ is a vertex cover for $G$. If, to the contrary there were an edge $\{u, v\}$ of $G$ that was not covered (neither $u$ nor $v$ was in $V'$) then the special vertex $w_{uv}$ would not be adjacent to any vertex of $V'$ in $G'$, and it is not in $V'$ itself. But the replacement step preserves domination, so this would contradict the hypothesis that $V''$ was a dominating set for $G'$.


Lecture 20: Subset Sum

Read: Section 34.5.5 in CLRS.

Subset Sum: The Subset Sum problem (SS) is the following. Given a finite set $S$ of positive integers $S = \{w_1, w_2, \ldots, w_n\}$ and a target value, $t$, we want to know whether there exists a subset $S' \subseteq S$ that sums exactly to $t$.

This problem is a simplified version of the 0-1 Knapsack problem, presented as a decision problem. Recall that in the 0-1 Knapsack problem, we are given a collection of objects, each with an associated weight $w_i$ and associated value $v_i$. We are given a knapsack of capacity $W$. The objective is to take as many objects as can fit in the knapsack’s capacity so as to maximize the value. (In the fractional knapsack we could take a portion of an object. In the 0-1 Knapsack we either take an object entirely or leave it.) In the simplest version, suppose that the value is the same as the weight, $v_i = w_i$. (This would occur for example if all the objects were made of the same material, say, gold.) Then, the best we could hope to achieve would be to fill the knapsack entirely. By setting $t = W$, we see that the subset sum problem is equivalent to this simplified version of the 0-1 Knapsack problem. It follows that if we can show that this simpler version is NP-complete, then certainly the more general 0-1 Knapsack problem (stated as a decision problem) is also NP-complete.

Consider the following example.

$S = \{3, 6, 9, 12, 15, 23, 32\}$ and $t = 33$.

The subset $S' = \{6, 12, 15\}$ sums to $t = 33$, so the answer in this case is yes. If $t = 34$ the answer would be no.
**

Dynamic Programming Solution: There is a dynamic programming algorithm which solves the Subset Sum problem in $O(nt)$ time (see footnote 2).

The quantity $nt$ is a polynomial function of $n$ and $t$. This would seem to imply that the Subset Sum problem is in P. But there is an important catch. Recall that in all NP-complete problems we assume (1) running time is measured as a function of input size (number of bits) and (2) inputs must be encoded in a reasonable succinct manner. Let us assume that the numbers $w_i$ and $t$ are all $b$-bit numbers represented in base 2, using the fewest number of bits possible. Then the input size is $O(nb)$. The value of $t$ may be as large as $2^b$. So the resulting algorithm has a running time of $O(n 2^b)$. This is polynomial in $n$, but exponential in $b$. Thus, this running time is not polynomial as a function of the input size.
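For concreteness, here is one way the $O(nt)$ table could be filled in. This is a sketch rather than the notes' own formulation: it keeps a single row of the table and updates it in place, and the function name is made up for the example.

```python
def subset_sum(weights, t):
    """Decide Subset Sum with an O(n t) table, one row at a time.

    reachable[s] is True when some subset of the weights seen so far
    sums to exactly s.
    """
    reachable = [True] + [False] * t  # the empty subset sums to 0
    for w in weights:
        # Scan downward so that each weight is used at most once.
        for s in range(t, w - 1, -1):
            if reachable[s - w]:
                reachable[s] = True
    return reachable[t]

print(subset_sum([3, 6, 9, 12, 15, 23, 32], 33))  # yes, e.g. 6 + 12 + 15
print(subset_sum([3, 6, 9, 12, 15, 23, 32], 34))  # no
```

The inner loop runs about $t$ times per weight, which is where the dependence on the value of $t$ (rather than on its number of bits) comes from.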

Note that an important consequence of this observation is that the SS problem is not hard when the numbers

involved are small. If the numbers involved are of a ﬁxed number of bits (a constant independent of n), then

the problem is solvable in polynomial time. However, we will show that in the general case, this problem is

NP-complete.

SS is NP-complete: The proof that Subset Sum (SS) is NP-complete involves the usual two elements.

(i) SS ∈ NP.

(ii) Some known NP-complete problem is reducible to SS. In particular, we will show that Vertex Cover (VC) is reducible to SS, that is, VC $\le_P$ SS.

To show that SS is in NP, we need to give a verification procedure. Given $S$ and $t$, the certificate is just the indices of the numbers that form the subset $S'$. We can add two $b$-bit numbers together in $O(b)$ time. So, in polynomial time we can compute the sum of elements in $S'$, and verify that this sum equals $t$.
**

For the remainder of the proof we show how to reduce vertex cover to subset sum. We want a polynomial time

computable function f that maps an instance of the vertex cover (a graph G and integer k) to an instance of the

subset sum problem (a set of integers S and target integer t) such that G has a vertex cover of size k if and only

if S has a subset summing to t. Thus, if subset sum were solvable in polynomial time, so would vertex cover.

Footnote 2: We will leave this as an exercise, but the formulation is, for $0 \le i \le n$ and $0 \le t' \le t$, $S[i, t'] = 1$ if there is a subset of $\{w_1, w_2, \ldots, w_i\}$ that sums to $t'$, and 0 otherwise. The $i$-th row of this table can be computed in $O(t)$ time, given the contents of the $(i - 1)$-st row.


How can we encode the notion of selecting a subset of vertices that cover all the edges to that of selecting a

subset of numbers that sums to t? In the vertex cover problem we are selecting vertices, and in the subset sum

problem we are selecting numbers, so it seems logical that the reduction should map vertices into numbers. The

constraint that these vertices should cover all the edges must be mapped to the constraint that the sum of the

numbers should equal the target value.

An Initial Approach: Here is an idea, which does not work, but gives a sense of how to proceed. Let E denote the number of edges in the graph. First number the edges of the graph from 1 through E. Then represent each vertex $v_i$ as an E-element bit vector, where the j-th bit from the left is set to 1 if and only if the edge $e_j$ is incident to vertex $v_i$. (Another way to think of this is that these bit vectors form the rows of an incidence matrix for the graph.) An example is shown below, in which k = 3.

Fig. 54: Encoding a graph as a collection of bit vectors.

Now, suppose we take any subset of vertices and form the logical-or of the corresponding bit vectors. If the

subset is a vertex cover, then every edge will be covered by at least one of these vertices, and so the logical-or

will be a bit vector of all 1’s, 1111 . . . 1. Conversely, if the logical-or is a bit vector of 1’s, then each edge has

been covered by some vertex, implying that the vertices form a vertex cover. (Later we will consider how to encode the fact that only k vertices are allowed in the cover.)
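The bit-vector encoding and the logical-or test can be tried out on a small example. The sketch below is illustrative only: the function names, the 0-based vertex numbering, and the 4-vertex graph are assumptions chosen for the example and are not the graph of the figure.

```python
def incidence_vectors(n, edges):
    """Return one integer per vertex; bit j from the left is 1 iff
    edge j is incident to that vertex (a row of the incidence matrix)."""
    m = len(edges)
    vecs = [0] * n
    for j, (u, v) in enumerate(edges):
        bit = 1 << (m - 1 - j)  # position of the j-th bit from the left
        vecs[u] |= bit
        vecs[v] |= bit
    return vecs


def is_vertex_cover(n, edges, subset):
    """A subset is a vertex cover iff the OR of its vectors is all 1's."""
    vecs = incidence_vectors(n, edges)
    acc = 0
    for v in subset:
        acc |= vecs[v]
    return acc == (1 << len(edges)) - 1


# Hypothetical example: a 4-cycle 0-1-2-3 with the chord (0, 2).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(is_vertex_cover(4, edges, {0, 2}))  # True
print(is_vertex_cover(4, edges, {1, 3}))  # False: edge (0, 2) is uncovered
```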

Fig. 55: The logical-or of a vertex cover equals 1111 . . . 1.

Since bit vectors can be thought of as just a way of representing numbers in binary, this is starting to feel more

like the subset sum problem. The target would be the number whose bit vector is all 1’s. There are a number of

problems, however. First, logical-or is not the same as addition. For example, if both of the endpoints of some

edge are in the vertex cover, then its value in the corresponding column would be 2, not 1. Second, we have

no way of controlling how many vertices go into the vertex cover. (We could just take the logical-or of all the vertices, and then the logical-or would certainly be a bit vector of 1's.)


There are two ways in which addition differs significantly from logical-or. The first is the issue of carries. For example, 1101 ∨ 0011 = 1111, but in binary 1101 + 0011 = 10000. To fix this, we recognize that we do not have to use a binary (base-2) representation. In fact, we can assume any base system we want. Observe that each column of the incidence matrix has at most two 1's, because each edge is incident to exactly two vertices. Thus, if we use any base that is at least as large as base 3, we will never generate a carry to the next position. In fact we will use base 4 (for reasons to be seen below). Note that the base of the number system is just for our own convenience of notation. Once the numbers have been formed, they will be converted into whatever form our machine assumes for its input representation, e.g. decimal or binary.
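As a quick check of the carry issue, the following sketch compares the logical-or, the base-2 sum, and the base-4 sum of the same two digit strings. The helper `to_base` is a hypothetical name introduced only for printing.

```python
def to_base(x, base, width):
    """Render x in the given base as a zero-padded string of `width` digits."""
    digits = []
    for _ in range(width):
        digits.append(str(x % base))
        x //= base
    return "".join(reversed(digits))


a, b = "1101", "0011"

logical_or = int(a, 2) | int(b, 2)  # bitwise OR of the two bit vectors
binary_sum = int(a, 2) + int(b, 2)  # base-2 addition: the carry ripples left
base4_sum = int(a, 4) + int(b, 4)   # same digit strings read in base 4

print(to_base(logical_or, 2, 5))  # 01111
print(to_base(binary_sum, 2, 5))  # 10000
print(to_base(base4_sum, 4, 5))   # 01112
```

In base 4 each column simply records how many of the selected numbers have a 1 in that position, which is the property the reduction relies on.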

The second difference between logical-or and addition is that an edge may generally be covered either once or

twice in the vertex cover. So, the ﬁnal sum of these numbers will be a number consisting of 1 and 2 digits, e.g.

1211 . . . 112. This does not provide us with a unique target value t. We know that no digit of our sum can be a

zero. To ﬁx this problem, we will create a set of E additional slack values. For 1 ≤ i ≤ E, the ith slack value

will consist of all 0’s, except for a single 1-digit in the ith position, e.g., 00000100000. Our target will be the

number 2222 . . . 222 (all 2’s). To see why this works, observe that from the numbers of our vertex cover, we

will get a sum consisting of 1’s and 2’s. For each position where there is a 1, we can supplement this value by

adding in the corresponding slack value. Thus we can boost any value consisting of 1’s and 2’s to all 2’s. On the

other hand, note that if there are any 0 values in the ﬁnal sum, we will not have enough slack values to convert

this into a 2.

There is one last issue. We are allowed to place only k vertices in the vertex cover. We will handle this by adding an additional column. For each number arising from a vertex, we will put a 1 in this additional column. For each slack value we will put a 0. In the target, we will require that this column sum to the value k, the size of the vertex cover. Thus, to form the desired sum, we must select exactly k of the vertex values. Note that since we only have a base-4 representation, there might be carries out of this additional column (if k ≥ 4). But since this is the leftmost (most significant) column, such carries will not affect any of the other aspects of the construction.

The Final Reduction: Here is the final reduction, given the graph G = (V, E) and integer k for the vertex cover problem.

(1) Create a set of n vertex values, $x_1, x_2, \ldots, x_n$ using base-4 notation. The value $x_i$ is equal to a 1 followed by a sequence of E base-4 digits. The j-th digit is a 1 if edge $e_j$ is incident to vertex $v_i$ and 0 otherwise.

(2) Create E slack values $y_1, y_2, \ldots, y_E$, where $y_i$ is a 0 followed by E base-4 digits. The i-th digit of $y_i$ is 1 and all others are 0.

(3) Let t be the base-4 number whose first digit is k (this may actually span multiple base-4 digits), and whose remaining E digits are all 2.

(4) Convert the $x_i$'s, the $y_j$'s, and t into whatever base notation is used for the subset sum problem (e.g. base 10). Output the set $S = \{x_1, \ldots, x_n, y_1, \ldots, y_E\}$ and t.

Observe that this can be done in polynomial time, in $O(E^2)$, in fact. The construction is illustrated in Fig. 56.
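The four steps of the reduction can be sketched directly in Python. This is a minimal illustration, not a definitive implementation: the function name `vc_to_subset_sum`, the 0-based vertex numbering, and the choice to return a list (rather than a set) are assumptions made for the example. Because Python integers have arbitrary precision, the base conversion of step (4) is implicit.

```python
def vc_to_subset_sum(n, edges, k):
    """Map a vertex cover instance (vertices 0..n-1, edge list, integer k)
    to a subset sum instance (S, t), following steps (1)-(4)."""
    m = len(edges)

    def digit(j):
        # Place value of the j-th edge digit, counted from the left.
        return 4 ** (m - 1 - j)

    # (1) Vertex values: a leading 1, then a 1 digit for each incident edge.
    xs = [4 ** m] * n
    for j, (u, v) in enumerate(edges):
        xs[u] += digit(j)
        xs[v] += digit(j)

    # (2) Slack values: a leading 0, then a single 1 digit.
    ys = [digit(j) for j in range(m)]

    # (3) Target: leading "digit" k, followed by m digits that are all 2.
    t = k * 4 ** m + sum(2 * digit(j) for j in range(m))

    # (4) The values are ordinary integers, so no conversion is needed.
    return xs + ys, t


# Hypothetical example: a 4-cycle 0-1-2-3 with the chord (0, 2), and k = 2.
S, t = vc_to_subset_sum(4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)], 2)
print(S, t)
```

On this example the vertex cover {0, 2} corresponds to choosing the vertex values for 0 and 2 together with the slack values of the four edges that are covered only once.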

Correctness: We claim that G has a vertex cover of size k if and only if S has a subset that sums to t. If G has a vertex cover $V'$ of size k, then we take the vertex values $x_i$ corresponding to the vertices of $V'$, and for each edge that is covered only once in $V'$, we take the corresponding slack variable. It follows from the comments made earlier that the lower-order E digits of the resulting sum will be of the form 222 . . . 2 and because there are k elements in $V'$, the leftmost digit of the sum will be k. Thus, the resulting subset sums to t.

Conversely, if S has a subset $S'$ that sums to t then we assert that it must select exactly k values from among the vertex values, since the first digit must sum to k. We claim that these vertices $V'$ form a vertex cover. In particular, no edge can be left uncovered by $V'$, since (because there are no carries) the corresponding column would be 0 in the sum of vertex values. Thus, no matter what slack values we add, the resulting digit position could not be equal to 2, and so this cannot be a solution to the subset sum problem.


Fig. 56: Vertex cover to subset sum reduction. (Rows: vertex values $x_i$, slack values $y_j$, and the target t = 3 2 2 2 2 2 2 2 2; the leading column holds the vertex cover size, k = 3.)

Fig. 57: Correctness of the reduction. (Vertex values: take those in the vertex cover. Slack values: take one for each edge that has only one endpoint in the cover. Target t = 3 2 2 2 2 2 2 2 2.)


It is worth noting again that in this reduction, we needed to have large numbers. For example, the target value t is at least as large as $4^E$, which is at least $4^n$ whenever $E \ge n$ (where n is the number of vertices in G). In our dynamic programming solution W = t, so the DP algorithm would run in $\Omega(n 4^n)$ time, which is not polynomial time.

Lecture 21: Approximation Algorithms: VC and TSP

Read: Chapt 35 (up through 35.2) in CLRS.

Coping with NP-completeness: With NP-completeness we have seen that there are many important optimization

problems that are likely to be quite hard to solve exactly. Since these are important problems, we cannot simply

give up at this point, since people do need solutions to these problems. How do we cope with NP-completeness:

Use brute-force search: Even on the fastest parallel computers this approach is viable only for the smallest

instances of these problems.

Heuristics: A heuristic is a strategy for producing a valid solution, but there are no guarantees how close it is

to optimal. This is worthwhile if all else fails, or if lack of optimality is not really an issue.

General Search Methods: There are a number of very powerful techniques for solving general combinatorial

optimization problems that have been developed in the areas of AI and operations research. These go under

names such as branch-and-bound, A*-search, simulated annealing, and genetic algorithms. The performance of these approaches varies considerably from problem to problem and from instance to instance. But in some cases they can perform quite well.

Approximation Algorithms: This is an algorithm that runs in polynomial time (ideally), and produces a solu-

tion that is within a guaranteed factor of the optimum solution.

Performance Bounds: Most NP-complete problems have been stated as decision problems for theoretical reasons. However underlying most of these problems is a natural optimization problem. For example, the TSP optimization problem is to find the simple cycle of minimum cost in a digraph that visits every vertex, the VC optimization problem is to find the vertex cover of minimum size, the clique optimization problem is to find the clique of maximum size. Note that sometimes we are minimizing and sometimes we are maximizing. An approximation algorithm is one that returns a legitimate answer, but not necessarily an optimal one.

How do we measure how good an approximation algorithm is? We define the ratio bound of an approximation algorithm as follows. Given an instance I of our problem, let C(I) be the cost of the solution produced by our approximation algorithm, and let $C^*(I)$ be the cost of the optimal solution. We will assume that costs are strictly positive values. For a minimization problem we want $C(I)/C^*(I)$ to be small, and for a maximization problem we want $C^*(I)/C(I)$ to be small. For any input size n, we say that the approximation algorithm achieves ratio bound $\rho(n)$, if for all I, $|I| = n$ we have

$$\max\left(\frac{C(I)}{C^*(I)}, \frac{C^*(I)}{C(I)}\right) \le \rho(n).$$

Observe that ρ(n) is always greater than or equal to 1, and it is equal to 1 if and only if the approximate solution

is the true optimum solution.

Some NP-complete problems can be approximated arbitrarily closely. Such an algorithm is given both the input, and a real value $\epsilon > 0$, and returns an answer whose ratio bound is at most $(1 + \epsilon)$. Such an algorithm is called a polynomial time approximation scheme (or PTAS for short). The running time is a function of both n and $\epsilon$. As $\epsilon$ approaches 0, the running time increases beyond polynomial time. For example, the running time might be $O(n^{1/\epsilon})$. If the running time depends only on a polynomial function of $1/\epsilon$ then it is called a fully polynomial-time approximation scheme. For example, a running time like $O((1/\epsilon)^2 n^3)$ would be such an example, whereas $O(n^{1/\epsilon})$ and $O(2^{(1/\epsilon)} n)$ are not.

Although NP-complete problems are equivalent with respect to whether they can be solved exactly in polynomial

time in the worst case, their approximability varies considerably.


• For some NP-complete problems, it is very unlikely that any approximation algorithm exists. For example,

if the graph TSP problem had an approximation algorithm with a ratio bound of any value less than ∞,

then P = NP.

• Many NP-complete problems can be approximated, but the ratio bound is a (slow growing) function of n. For

example, the set cover problem (a generalization of the vertex cover problem), can be approximated to

within a factor of ln n. We will not discuss this algorithm, but it is covered in CLRS.

• Some NP-complete problems can be approximated to within a ﬁxed constant factor. We will discuss two

examples below.

• Some NP-complete problems have PTAS's. Examples are the subset sum problem (whose approximation scheme we have not discussed, but which is described in CLRS) and the Euclidean TSP problem.

In fact, much like NP-complete problems, there are collections of problems which are “believed” to be hard to

approximate and are equivalent in the sense that if any one can be approximated in polynomial time then they

all can be. This class is called Max-SNP complete. We will not discuss this further. Sufﬁce it to say that the

topic of approximation algorithms would ﬁll another course.

Vertex Cover: We begin by showing that there is an approximation algorithm for vertex cover with a ratio bound of 2,

that is, this algorithm will be guaranteed to ﬁnd a vertex cover whose size is at most twice that of the optimum.

Recall that a vertex cover is a subset of vertices such that every edge in the graph is incident to at least one of

these vertices. The vertex cover optimization problem is to ﬁnd a vertex cover of minimum size.

How does one go about finding an approximation algorithm? The first approach is to try something that seems

like a “reasonably” good strategy, a heuristic. It turns out that many simple heuristics, when not optimal, can

often be proved to be close to optimal.

Here is a very simple algorithm that guarantees an approximation within a factor of 2 for the vertex cover

problem. It is based on the following observation. Consider an arbitrary edge (u, v) in the graph. One of its

two vertices must be in the cover, but we do not know which one. The idea of this heuristic is to simply put

both vertices into the vertex cover. (You cannot get much stupider than this!) Then we remove all edges that are

incident to u and v (since they are now all covered), and recurse on the remaining edges. For every one vertex

that must be in the cover, we put two into our cover, so it is easy to see that the cover we generate is at most

twice the size of the optimum cover. The approximation is given in the ﬁgure below. Here is a more formal

proof of its approximation bound.

Fig. 58: The 2-for-1 heuristic for vertex cover. (Panels: G and optimum VC; the 2-for-1 heuristic.)

Claim: ApproxVC yields a factor-2 approximation for Vertex Cover.

Proof: Consider the set C output by ApproxVC. Let $C^*$ be the optimum VC. Let A be the set of edges selected by the line marked with "(*)" in the algorithm. Observe that the size of C is exactly $2|A|$ because we add two vertices for each such edge. However note that in the optimum VC one of these two vertices must have been added to the VC. Since no two edges of A share an endpoint, these vertices are distinct, and thus the size of $C^*$ is at least $|A|$. Thus we have:

$$\frac{|C|}{2} = |A| \le |C^*| \quad\Rightarrow\quad \frac{|C|}{|C^*|} \le 2.$$


2-for-1 Approximation for VC

ApproxVC(G=(V,E)) {
    C = empty-set
    while (E is nonempty) do {
(*)     let (u,v) be any edge of E
        add both u and v to C
        remove from E all edges incident to either u or v
    }
    return C
}
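For readers who want to experiment, here is one way the 2-for-1 heuristic might look in Python. It is a sketch under assumptions: the function name `approx_vertex_cover` is hypothetical, the graph is given as a list of edges of a simple graph, and rather than physically deleting edges the loop skips any edge that already has an endpoint in the cover, which has the same effect.

```python
def approx_vertex_cover(edges):
    """2-for-1 heuristic: returns a vertex cover whose size is at most
    twice the optimum. `edges` is an iterable of (u, v) pairs."""
    cover = set()
    for u, v in edges:
        # An edge with neither endpoint in the cover is still "in E";
        # selecting it corresponds to the step marked (*).
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    return cover


# Path 0-1-2-3: the optimum cover {1, 2} has size 2.
path = [(0, 1), (1, 2), (2, 3)]
print(approx_vertex_cover(path))  # {0, 1, 2, 3}, twice the optimum
```

The path example shows that the factor of 2 can actually be attained.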

This proof illustrates one of the main features of the analysis of any approximation algorithm, namely, that we need some way of finding a bound on the optimal solution. (For minimization problems we want a lower bound, for maximization problems an upper bound.) The bound should be related to something that we can compute in polynomial time. In this case, the bound is related to the set of edges A, which form a maximal independent set of edges (that is, a maximal matching: no two edges of A share an endpoint).

The Greedy Heuristic: It seems that there is a very simple way to improve the 2-for-1 heuristic. This algorithm simply selects any edge, and adds both vertices to the cover. Why not concentrate instead on vertices of high degree, since a vertex of high degree covers the maximum number of edges? This is a greedy strategy. We saw in the minimum spanning tree and shortest path problems that greedy strategies were optimal.

Here is the greedy heuristic. Select the vertex with the maximum degree. Put this vertex in the cover. Then

delete all the edges that are incident to this vertex (since they have been covered). Repeat the algorithm on the

remaining graph, until no more edges remain. This algorithm is illustrated in the ﬁgure below.

Fig. 59: The greedy heuristic for vertex cover. (Panels: G and optimum VC; the greedy heuristic.)

Greedy Approximation for VC

GreedyVC(G=(V,E)) {
    C = empty-set;
    while (E is nonempty) do {
        let u be the vertex of maximum degree in G;
        add u to C;
        remove from E all edges incident to u;
    }
    return C;
}
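A direct, unoptimized rendering of the greedy heuristic in Python is sketched below. The function name `greedy_vertex_cover` and the edge-list input are assumptions for the example. Degrees are recomputed from scratch in every round, so this version takes O(VE) time; it is meant to mirror the pseudocode, not to be efficient.

```python
from collections import Counter


def greedy_vertex_cover(edges):
    """Repeatedly pick a vertex of maximum degree among the uncovered edges."""
    remaining = list(edges)
    cover = set()
    while remaining:
        # Degree of each vertex with respect to the edges not yet covered.
        degree = Counter()
        for u, v in remaining:
            degree[u] += 1
            degree[v] += 1
        u = max(degree, key=degree.get)  # ties are broken arbitrarily
        cover.add(u)
        # Remove every edge incident to u, since it is now covered.
        remaining = [e for e in remaining if u not in e]
    return cover


# A star with center 0: greedy finds the optimum cover {0}.
print(greedy_vertex_cover([(0, 1), (0, 2), (0, 3)]))  # {0}
```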

It is interesting to note that on the example shown in the ﬁgure, the greedy heuristic actually succeeds in ﬁnd-

ing the optimum vertex cover. Can we prove that the greedy heuristic always outperforms the stupid 2-for-1

heuristic? The surprising answer is an emphatic “no”. In fact, it can be shown that the greedy heuristic does

not even have a constant performance bound. That is, it can perform arbitrarily poorly compared to the optimal

algorithm. It can be shown that the ratio bound grows as Θ(log n), where n is the number of vertices. (We leave


this as a moderately difficult exercise.) However, it should also be pointed out that the vertex cover constructed by the greedy heuristic is (for typical graphs) smaller than the one computed by the 2-for-1 heuristic, so it would probably be wise to run both algorithms and take the better of the two.

Traveling Salesman Problem: In the Traveling Salesperson Problem (TSP) we are given a complete undirected

graph with nonnegative edge weights, and we want to ﬁnd a cycle that visits all vertices and is of minimum

cost. Let c(u, v) denote the weight on edge (u, v). Given a set of edges A forming a tour we deﬁne c(A) to be

the sum of edge weights in A. Last time we mentioned that TSP (posed as a decision problem) is NP-complete.

For many of the applications of TSP, the problem satisfies something called the triangle inequality. Intuitively, this says that the direct path from u to w is never longer than an indirect path. More formally, for all $u, v, w \in V$,

$$c(u, w) \le c(u, v) + c(v, w).$$

There are many examples of graphs that satisfy the triangle inequality. For example, given any weighted graph,

if we deﬁne c(u, v) to be the shortest path length between u and v (computed, say by the Floyd-Warshall

algorithm), then it will satisfy the triangle inequality. Another example is if we are given a set of points in

the plane, and deﬁne a complete graph on these points, where c(u, v) is deﬁned to be the Euclidean distance

between these points, then the triangle inequality is also satisﬁed.

When the underlying cost function satisﬁes the triangle inequality there is an approximation algorithm for TSP

with a ratio-bound of 2. (In fact, there is a slightly more complex version of this algorithm that has a ratio bound

of 1.5, but we will not discuss it.) Thus, although this algorithm does not produce an optimal tour, the tour that

it produces cannot be worse than twice the cost of the optimal tour.

The key insight is to observe that a TSP tour with one edge removed is a spanning tree. However it is not necessarily a minimum spanning tree. Therefore, since edge weights are nonnegative, the cost of the minimum TSP tour is at least as large as the cost of the MST. We can compute MST's efficiently, using, for example, either Kruskal's or Prim's algorithm. If we can

ﬁnd some way to convert the MST into a TSP tour while increasing its cost by at most a constant factor, then

we will have an approximation for TSP. We shall see that if the edge weights satisfy the triangle inequality, then

this is possible.

Here is how the algorithm works. Given any free tree there is a tour of the tree called a twice around tour that

traverses the edges of the tree twice, once in each direction. The ﬁgure below shows an example of this.

Fig. 60: TSP Approximation. (Panels: MST; twice-around tour; shortcut tour; optimum tour.)

This path is not simple because it revisits vertices, but we can make it simple by short-cutting, that is, we skip

over previously visited vertices. Notice that the ﬁnal order in which vertices are visited using the short-cuts is

exactly the same as a preorder traversal of the MST. (In fact, any subsequence of the twice-around tour which

visits each vertex exactly once will sufﬁce.) The triangle inequality assures us that the path length will not

increase when we take short-cuts.

TSP Approximation

ApproxTSP(G=(V,E)) {
    T = minimum spanning tree for G
    r = any vertex
    L = list of vertices visited by a preorder walk of T
        starting with r
    return L
}

Claim: Approx-TSP has a ratio bound of 2.

Proof: Let H denote the tour produced by this algorithm and let $H^*$ be the optimum tour. Let T be the minimum spanning tree. As we said before, since we can remove any edge of $H^*$ resulting in a spanning tree, and since T is the minimum cost spanning tree, we have

$$c(T) \le c(H^*).$$

Now observe that the twice around tour of T has cost 2c(T), since every edge in T is hit twice. By the triangle inequality, when we short-cut an edge of T to form H we do not increase the cost of the tour, and so we have

$$c(H) \le 2c(T).$$

Combining these we have

$$\frac{c(H)}{2} \le c(T) \le c(H^*) \quad\Rightarrow\quad \frac{c(H)}{c(H^*)} \le 2.$$
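The MST-plus-preorder idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the notes' own code: the input is assumed to be a symmetric n × n cost matrix `dist` satisfying the triangle inequality, vertex 0 is used as the root r, the MST is built with a simple heap-based version of Prim's algorithm, and the names `approx_tsp` and `tour_cost` are hypothetical.

```python
import heapq


def approx_tsp(dist):
    """Return a tour (list of vertices, starting at 0) whose cost is at
    most twice the optimum, assuming the triangle inequality holds."""
    n = len(dist)
    # Prim's algorithm from root 0, recording each vertex's MST children.
    in_tree = [False] * n
    children = [[] for _ in range(n)]
    heap = [(0, 0, -1)]  # (edge cost, vertex, parent in the tree)
    while heap:
        _, u, parent = heapq.heappop(heap)
        if in_tree[u]:
            continue
        in_tree[u] = True
        if parent >= 0:
            children[parent].append(u)
        for v in range(n):
            if not in_tree[v]:
                heapq.heappush(heap, (dist[u][v], v, u))
    # Preorder walk of the tree; this is the short-cut twice-around tour.
    order, stack = [], [0]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(reversed(children[u]))
    return order


def tour_cost(dist, order):
    """Cost of the cycle that visits `order` and returns to its start."""
    n = len(order)
    return sum(dist[order[i]][order[(i + 1) % n]] for i in range(n))


# Four corners of a unit square, with Euclidean distances.
r2 = 2 ** 0.5
square = [[0, 1, r2, 1], [1, 0, 1, r2], [r2, 1, 0, 1], [1, r2, 1, 0]]
print(approx_tsp(square))  # [0, 1, 2, 3]
```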

Lecture 22: The k-Center Approximation

Read: Today's material is not covered in CLRS.

Facility Location: Imagine that Blockbuster Video wants to open 50 stores in some city. The company asks you to

determine the best locations for these stores. The condition is that you are to minimize the maximum distance

that any resident of the city must drive in order to arrive at the nearest store.

If we model the road network of the city as an undirected graph whose edge weights are the distances between

intersections, then this is an instance of the k-center problem. In the k-center problem we are given an undirected

graph G = (V, E) with nonnegative edge weights, and we are given an integer k. The problem is to compute

a subset of k vertices C ⊆ V , called centers, such that the maximum distance between any vertex in V and its

nearest center in C is minimized. (The optimization problem seeks to minimize the maximum distance and the

decision problem just asks whether there exists a set of centers that are within a given distance.)

More formally, let G = (V, E) denote the graph, and let w(u, v) denote the weight of edge (u, v). (w(u, v) = w(v, u) because G is undirected.) We assume that all edge weights are nonnegative. For each pair of vertices, u, v ∈ V, let d(u, v) = d(v, u) denote the distance between u and v, that is, the length of the shortest path from u to v. (Note that the shortest path distance satisfies the triangle inequality. This will be used in our proof.)

Consider a subset C ⊆ V of vertices, the centers. For each vertex v ∈ V we can associate it with its nearest center in C. (This is the nearest Blockbuster store to your house.) For each center $c_i \in C$ we define its neighborhood to be the subset of vertices for which $c_i$ is the closest center. (These are the houses that are closest to this center. See Fig. 61.) More formally, define:

$$V(c_i) = \{v \in V \mid d(v, c_i) \le d(v, c_j), \text{ for } i \ne j\}.$$

Let us assume for simplicity that there are no ties for the distances to the closest center (or that any such ties have been broken arbitrarily). Then $V(c_1), V(c_2), \ldots, V(c_k)$ forms a partition of the vertex set of G. The bottleneck distance associated with each center is the distance to its farthest vertex in $V(c_i)$, that is,

$$D(c_i) = \max_{v \in V(c_i)} d(v, c_i).$$


Fig. 61: The k-center problem with optimum centers $c_i$ and neighborhood sets $V(c_i)$. (Panels: input graph (k = 3); optimum cost = 7.)

Finally, we define the overall bottleneck distance to be

$$D(C) = \max_{c_i \in C} D(c_i).$$

This is the maximum distance of any vertex from its nearest center. This distance is critical because it represents

the customer that must travel farthest to get to the nearest facility, the bottleneck vertex. Given this notation, we

can now formally deﬁne the problem.

k-center problem: Given a weighted undirected graph G = (V, E), and an integer $k \le |V|$, find a subset C ⊆ V of size k such that D(C) is minimized.

The decision-problem formulation of the k-center problem is NP-complete (reduction from dominating set). A brute force solution to this problem would involve enumerating all k-element subsets of V, and computing D(C) for each one. However, letting $n = |V|$, the number of possible subsets is $\binom{n}{k} = \Theta(n^k)$ (for fixed k). If k is a function of n (which is reasonable), then this is an exponential number of subsets. Given that the problem is NP-complete, it is highly unlikely that a significantly more efficient exact algorithm exists in the worst-case. We will show that there does exist an efficient approximation algorithm for the problem.

Greedy Approximation Algorithm: Our approximation algorithm is based on a simple greedy algorithm that produces a bottleneck distance D(C) that is not more than twice the optimum bottleneck distance. We begin by letting the first center $c_1$ be any vertex in the graph (the lower left vertex, say, in the figure below). Compute the distances between this vertex and all the other vertices in the graph (Fig. 62(b)). Consider the vertex that is farthest from this center (the upper right vertex at distance 23 in the figure). This is the bottleneck vertex for $\{c_1\}$. We would like to select the next center so as to reduce this distance. So let us just make it the next center, called $c_2$. Then again we compute the distances from each vertex in the graph to the closer of $c_1$ and $c_2$. (See Fig. 62(c) where dashed lines indicate which vertices are closer to which center.) Again we consider the bottleneck vertex for the current centers $\{c_1, c_2\}$. We place the next center at this vertex (see Fig. 62(d)). Again we compute the distances from each vertex to its nearest center. Repeat this until all k centers have been selected. In Fig. 62(d), the final three greedy centers are shaded, and the final bottleneck distance is 11.

Although the greedy approach has a certain intuitive appeal (because it attempts to ﬁnd the vertex that gives the

bottleneck distance, and then puts a center right on this vertex), it is not optimal. In the example shown in the

ﬁgure, the optimum solution (shown on the right) has a bottleneck cost of 9, which beats the 11 that the greedy

algorithm gave.

Here is a summary of the algorithm. For each vertex u, let d[u] denote the distance to the nearest center.

Greedy Approximation for k-center

KCenterApprox(G, k) {
    C = empty_set
    for each u in V do                  // initialize distances
        d[u] = INFINITY
    for i = 1 to k do {                 // main loop
        Find the vertex u such that d[u] is maximum
        Add u to C                      // u is the current bottleneck vertex
                                        // update distances
        Compute the distance from each vertex v to its closest
            vertex in C, denoted d[v]
    }
    return C                            // final centers
}

Fig. 62: Greedy approximation to k-center. (Panels (a) through (d); greedy cost = 11.)

We know from Dijkstra's algorithm how to compute the shortest path from a single source to all other vertices in the graph. One way to solve the distance computation step above would be to invoke Dijkstra's algorithm i times (once from each of the i centers chosen so far). But there is an easier way. We can modify Dijkstra's algorithm to operate as a multiple source algorithm. In particular, in the initialization of Dijkstra's single source algorithm, it sets d[s] = 0 and pred[s] = null. In the modified multiple source version, we do this for all the vertices of C. The final greedy algorithm involves running Dijkstra's algorithm k times (once for each time through the for-loop). Recall that the running time of Dijkstra's algorithm is O((V + E) log V). Under the reasonable assumption that E ≥ V, this is O(E log V). Thus, the overall running time is O(kE log V).
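The greedy algorithm with the multiple-source version of Dijkstra's algorithm might be sketched in Python as below. The names `multi_source_dijkstra` and `k_center_greedy`, and the adjacency-list representation (a dictionary mapping each vertex to a list of (neighbor, weight) pairs), are assumptions made for the example. The sketch also assumes that the graph is connected, that k ≤ |V|, and that vertex labels are mutually comparable (e.g. integers), since they are used to break ties in the heap.

```python
import heapq


def multi_source_dijkstra(adj, sources):
    """Distance from every vertex to its nearest vertex in `sources`."""
    d = {u: float("inf") for u in adj}
    heap = []
    for s in sources:
        d[s] = 0  # every center starts at distance 0
        heap.append((0, s))
    heapq.heapify(heap)
    while heap:
        du, u = heapq.heappop(heap)
        if du > d[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if du + w < d[v]:
                d[v] = du + w
                heapq.heappush(heap, (d[v], v))
    return d


def k_center_greedy(adj, k):
    """Return (centers, bottleneck distance) chosen by the greedy rule."""
    centers = []
    d = {u: float("inf") for u in adj}
    for _ in range(k):
        u = max(d, key=d.get)  # current bottleneck vertex
        centers.append(u)
        d = multi_source_dijkstra(adj, centers)
    return centers, max(d.values())


# A path 0-1-2-3-4 with unit edge weights.
adj = {i: [] for i in range(5)}
for i in range(4):
    adj[i].append((i + 1, 1))
    adj[i + 1].append((i, 1))
print(k_center_greedy(adj, 2))  # ([0, 4], 2)
```

On this path the optimum for k = 2 places centers at vertices 1 and 3 for a bottleneck distance of 1, so the greedy answer of 2 is within the factor-2 bound.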

Approximation Bound: How bad could greedy be? We will argue that it has a ratio bound of 2. To see that we can get a factor of 2, consider a set of n + 1 vertices arranged in a linear graph, in which all edges are of weight 1, and let k = 1. The greedy algorithm might pick any initial vertex that it likes. Suppose it picks the leftmost vertex. Then the maximum (bottleneck) distance is the distance to the rightmost vertex which is n. If we had instead chosen the vertex in the middle, then the maximum distance would only be n/2, which is better by a factor of 2.

Fig. 63: Worst-case for greedy. (Greedy: cost = n. Opt: cost = n/2.)

We want to show that this approximation algorithm always produces a ﬁnal distance D(C) that is within a factor

of 2 of the distance of the optimal solution.

Let $O = \{o_1, o_2, \ldots, o_k\}$ denote the centers of the optimal solution (shown as black dots in Fig. 64, and the lines show the partition into the neighborhoods for each of these points). Let $D^* = D(O)$ be the optimal bottleneck distance.

Let $G = \{g_1, g_2, \ldots, g_k\}$ be the centers found by the greedy approximation (shown as white dots in the figure below). Also, let $g_{k+1}$ denote the center that would have been added next, that is, the bottleneck vertex for G. Let D(G) denote the bottleneck distance for G. Notice that the distance from $g_{k+1}$ to its nearest center is equal to D(G). The proof involves a simple application of the pigeon-hole principle.

Fig. 64: Analysis of the greedy heuristic for k = 5. The greedy centers are given as white dots and the optimal centers as black dots. The regions represent the neighborhood sets $V(o_i)$ for the optimal centers.

Theorem: The greedy approximation has a ratio bound of 2, that is D(G)/D

∗

≤ 2.

Proof: Let $G' = \{g_1, g_2, \ldots, g_k, g_{k+1}\}$ be the $(k+1)$-element set consisting of the greedy centers together with the next greedy center $g_{k+1}$. First observe that for $i \ne j$, $d(g_i, g_j) \ge D(G)$. This follows from our greedy selection strategy. Each center, when it is selected, is the vertex at the maximum (bottleneck) distance from all the previously selected centers. As we add more centers, this bottleneck distance can only decrease. Since the final bottleneck distance is $D(G)$, all the centers are at least this far apart from one another.

Each $g_i \in G'$ is associated with its closest center in the optimal solution, that is, each belongs to $V(o_m)$ for some $m$. Because there are $k$ centers in $O$ and $k+1$ elements in $G'$, it follows from the pigeonhole principle that at least two centers of $G'$ are in the same set $V(o_m)$ for some $m$. (In the figure, the greedy centers $g_4$ and $g_5$ are both in $V(o_2)$.) Let these be denoted $g_i$ and $g_j$.

Since $D^*$ is the bottleneck distance for $O$, we know that the distance from $g_i$ to $o_m$ is at most $D^*$, and similarly the distance from $o_m$ to $g_j$ is at most $D^*$. By concatenating these two paths, it follows that there exists a path of length at most $2D^*$ from $g_i$ to $g_j$, and hence we have $d(g_i, g_j) \le 2D^*$. But from the comments above we have $d(g_i, g_j) \ge D(G)$. Therefore,

$$D(G) \le d(g_i, g_j) \le 2D^*,$$

from which the desired ratio follows.
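
The ratio bound can be spot-checked on a small instance. The following is a minimal Python sketch, assuming the greedy heuristic is the farthest-point rule described in the proof (each new center is the current bottleneck vertex) and that centers must be chosen among the given points. The function names and the path instance are illustrative assumptions, and the brute-force optimum is only feasible for tiny inputs.

```python
from itertools import combinations

def bottleneck(points, centers, dist):
    """Largest distance from any point to its nearest center."""
    return max(min(dist(p, c) for c in centers) for p in points)

def greedy_k_center(points, k, dist, first=0):
    """Start at points[first], then repeatedly add the current bottleneck vertex."""
    centers = [points[first]]
    while len(centers) < k:
        # the next center is the point farthest from its nearest current center
        far = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(far)
    return centers

def optimal_k_center(points, k, dist):
    """Brute force over all k-subsets of the points (exponential time)."""
    return min(combinations(points, k),
               key=lambda cs: bottleneck(points, cs, dist))

# Hypothetical instance: nine vertices on a path, distance = |a - b|.
pts = list(range(9))
d = lambda a, b: abs(a - b)

for k in range(1, 5):
    g = bottleneck(pts, greedy_k_center(pts, k, d), d)
    o = bottleneck(pts, optimal_k_center(pts, k, d), d)
    print(k, g, o)   # k, greedy bottleneck D(G), optimal bottleneck D*
```

With k = 1 the greedy start at the leftmost vertex reproduces the worst case of Fig. 63: the greedy bottleneck is the full length of the path, while the middle vertex halves it.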

Lecture 23: Approximations: Set Cover and Bin Packing

Read: Set cover is covered in Chapt 35.3. Bin packing is covered as an exercise in CLRS.

Set Cover: The set cover problem is a very important optimization problem. You are given a pair $(X, F)$ where $X = \{x_1, x_2, \ldots, x_m\}$ is a finite set (a domain of elements) and $F = \{S_1, S_2, \ldots, S_n\}$ is a family of subsets of $X$, such that every element of $X$ belongs to at least one set of $F$.

Consider a subset $C \subseteq F$. (This is a collection of sets over $X$.) We say that $C$ covers the domain if every element of $X$ is in some set of $C$, that is,

$$X = \bigcup_{S_i \in C} S_i.$$

The problem is to find the minimum-sized subset $C$ of $F$ that covers $X$. Consider the example shown below. The optimum set cover consists of the three sets $\{S_3, S_4, S_5\}$.

Fig. 65: Set cover (sets $S_1$ through $S_6$).

Set cover can be applied to a number of applications. For example, suppose you want to set up security cameras

to cover a large art gallery. From each possible camera position, you can see a certain subset of the paintings.

Each such subset of paintings is a set in your system. You want to put up the fewest cameras to see all the

paintings.

Complexity of Set Cover: We have seen special cases of the set cover problem that are NP-complete. For example, vertex cover is a type of set cover problem. The domain to be covered is the set of edges, and each vertex covers the subset of incident edges. Thus, the decision-problem formulation of set cover (“does there exist a set cover of size at most k?”) is NP-complete as well.

There is a factor-2 approximation for the vertex cover problem, but it cannot be applied to generate a factor-

2 approximation for set cover. In particular, the VC approximation relies on the fact that each element of the

domain (an edge) is in exactly 2 sets (one for each of its endpoints). Unfortunately, this is not true for the general

set cover problem. In fact, it is known that there is no constant factor approximation to the set cover problem,

unless P = NP. This is unfortunate, because set cover is one of the most powerful NP-complete problems.

Today we will show that there is a reasonable approximation algorithm, the greedy heuristic, which achieves an approximation bound of $\ln m$, where $m = |X|$ is the size of the underlying domain. (The book proves a somewhat stronger result, namely that the approximation factor is roughly $\ln m'$, where $m' \le m$ is the size of the largest set in $F$. However, their proof is more complicated.)

Greedy Set Cover: A simple greedy approach to set cover works by at each stage selecting the set that covers the

greatest number of “uncovered” elements.

Greedy Set Cover

Greedy-Set-Cover(X, F) {

U = X // U are the items to be covered

C = empty // C will be the sets in the cover

while (U is nonempty) { // there is someone left to cover

select S in F that covers the most elements of U

add S to C

U = U - S

}

return C

}
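
The pseudocode above translates almost line for line into Python. This is a minimal sketch, assuming that F is given as a list of Python sets and that ties are broken in favor of the earliest set in the list (the notes leave tie-breaking open). The instance at the bottom is hypothetical and is not the one drawn in Fig. 65.

```python
def greedy_set_cover(X, F):
    """Return the indices of the sets chosen by the greedy heuristic.

    X is the domain (any iterable); F is a list of sets whose union contains X.
    """
    uncovered = set(X)      # U: the items still to be covered
    cover = []              # C: indices of the sets in the cover
    while uncovered:
        # select the set that covers the most uncovered elements
        best = max(range(len(F)), key=lambda i: len(F[i] & uncovered))
        if not F[best] & uncovered:
            raise ValueError("F does not cover X")
        cover.append(best)
        uncovered -= F[best]
    return cover

# Hypothetical instance.
X = range(1, 8)
F = [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}, {5, 6, 7}]
chosen = greedy_set_cover(X, F)
print([sorted(F[i]) for i in chosen])
```

Recomputing the intersection sizes on every round keeps the sketch short; a tuned implementation would maintain the counts incrementally.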

For the example given earlier the greedy set cover algorithm would select $S_1$ (since it covers 6 out of 12 elements), then $S_6$ (since it covers 3 out of the remaining 6), then $S_2$ (since it covers 2 of the remaining 3) and finally $S_3$. Thus, it would return a set cover of size 4, whereas the optimal set cover has size 3.

What is the approximation factor? The problem with the greedy set cover algorithm is that it can be “fooled” into picking the wrong set, over and over again. Consider the following example. The optimal set cover consists of sets $S_5$ and $S_6$, each of size 16. Initially all three sets $S_1$, $S_5$, and $S_6$ have 16 elements. If ties are broken in the worst possible way, the greedy algorithm will first select set $S_1$. We remove all the covered elements. Now $S_2$, $S_5$ and $S_6$ all cover 8 of the remaining elements. Again, if we choose poorly, $S_2$ is chosen. The pattern repeats, choosing $S_3$ (size 4), $S_4$ (size 2) and finally $S_5$ and $S_6$ (each covering 1 remaining element).

Thus, the optimum cover consisted of two sets, but we picked roughly $\lg m$, where $m = |X|$, for a ratio bound of $(\lg m)/2$. (Recall that lg denotes logarithm base 2.) There were many cases where ties were broken badly here, but it is possible to redesign the example such that there are no ties and yet the algorithm has essentially the same ratio bound.

Fig. 66: An example in which the greedy set cover performs poorly. Optimum: $\{S_5, S_6\}$. Greedy: $\{S_1, S_2, S_3, S_4, S_5, S_6\}$.
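
The bad instance can be reproduced in a few lines of Python. This sketch assumes a concrete layout that matches the set sizes quoted in the text (a 2 × 16 grid of elements, with $S_5$ and $S_6$ as the two rows); the exact layout in the figure may differ. Ties are broken in favor of the set listed first, which here is the worst possible choice.

```python
def greedy_set_cover(X, F):
    """Greedy heuristic; F maps a name to a set, ties go to the earliest name."""
    uncovered, picked = set(X), []
    while uncovered:
        name = max(F, key=lambda s: len(F[s] & uncovered))
        picked.append(name)
        uncovered -= F[name]
    return picked

def cols(lo, hi):
    """Both rows of columns lo..hi-1 of the grid."""
    return {(r, c) for r in range(2) for c in range(lo, hi)}

X = cols(0, 16)                           # 32 elements in a 2 x 16 grid
F = {
    "S1": cols(0, 8),                     # 16 elements
    "S2": cols(8, 12),                    # 8 elements
    "S3": cols(12, 14),                   # 4 elements
    "S4": cols(14, 15),                   # 2 elements
    "S5": {(0, c) for c in range(16)},    # top row, 16 elements
    "S6": {(1, c) for c in range(16)},    # bottom row, 16 elements
}
print(greedy_set_cover(X, F))             # order in which greedy picks the sets
```

On this layout the greedy heuristic ends up using all six sets, while the two rows alone already cover X.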

However, we will show that the greedy set cover heuristic never performs worse than a factor of $\ln m$. (Note that this is the natural log, not base 2.)

Before giving the proof, we need one important mathematical inequality.

Lemma: For all $c \ge 1$,

$$\left(1 - \frac{1}{c}\right)^c \le \frac{1}{e},$$

where $e$ is the base of the natural logarithm.

Proof: We use the fact that for all $x$, $1 + x \le e^x$. (The two functions are equal when $x = 0$.) Now, if we substitute $-1/c$ for $x$ we have $(1 - 1/c) \le e^{-1/c}$. Since both sides are nonnegative for $c \ge 1$, raising both sides to the $c$th power gives the desired result.

The approximation bound for set cover proven here is a bit weaker than the one in CLRS, but I think it is easier to understand.

Theorem: Greedy set cover has a ratio bound of at most $\ln m$, where $m = |X|$.

Proof: Let c denote the size of the optimum set cover, and let g denote the size of the greedy set cover minus 1.

We will show that g/c ≤ ln m. (This is not quite what we wanted, but we are correct to within 1 set.)

Initially, there are $m_0 = m$ elements left to be covered. We know that there is a cover of size $c$ (the optimal cover) and therefore by the pigeonhole principle, there must be at least one set that covers at least $m_0/c$ elements. (Since otherwise, if every set covered less than $m_0/c$ elements, then no collection of $c$ sets could cover all $m_0$ elements.) Since the greedy algorithm selects the largest set, it will select a set that covers at least this many elements. The number of elements that remain to be covered is at most

$$m_1 = m_0 - m_0/c = m_0(1 - 1/c).$$

Applying the argument again, we know that we can cover these $m_1$ elements with a cover of size $c$ (the optimal cover), and hence there exists a subset that covers at least $m_1/c$ elements, leaving at most $m_2 = m_1(1 - 1/c) = m_0(1 - 1/c)^2$ elements remaining.

If we apply this argument $g$ times, each time we succeed in covering at least a fraction $1/c$ of the remaining elements, so at most a fraction $(1 - 1/c)$ of them is left. Then the number of elements that remain uncovered after $g$ sets have been chosen by the greedy algorithm is at most $m_g = m_0(1 - 1/c)^g$.

How long can this go on? Consider the largest value of $g$ such that after removing all but the last set of the greedy cover, we still have some element remaining to be covered. Thus, we are interested in the largest value of $g$ such that

$$1 \le m\left(1 - \frac{1}{c}\right)^g.$$

We can rewrite this as

$$1 \le m\left[\left(1 - \frac{1}{c}\right)^c\right]^{g/c}.$$

By the inequality above we have

$$1 \le m\left(\frac{1}{e}\right)^{g/c}.$$

Now, if we multiply by $e^{g/c}$ and take natural logs we get that $g$ satisfies:

$$e^{g/c} \le m \quad\Rightarrow\quad \frac{g}{c} \le \ln m.$$

This completes the proof.

Even though the greedy set cover has this relatively bad ratio bound, it seems to perform reasonably well in

practice. Thus, the example shown above in which the approximation bound is (lg m)/2 is not “typical” of set

cover instances.

Bin Packing: Bin packing is another well-known NP-complete problem, which is a variant of the knapsack problem. We are given a set of $n$ objects, where $s_i$ denotes the size of the $i$th object. It will simplify the presentation to assume that $0 < s_i < 1$. We want to put these objects into a set of bins. Each bin can hold a subset of objects whose total size is at most 1. The problem is to partition the objects among the bins so as to use the fewest possible bins. (Note that if your bin size is not 1, then you can reduce the problem into this form by simply dividing all sizes by the size of the bin.)

Bin packing arises in many applications. Many of these applications involve not only the sizes of the objects but

their geometric shapes as well. For example, these include packing boxes into a truck, or cutting the maximum

number of pieces of certain shapes out of a piece of sheet metal. However, even if we ignore the geometry,

and just consider the sizes of the objects, the decision problem is still NP-complete. (The reduction is from the

knapsack problem.)

Here is a simple heuristic algorithm for the bin packing problem, called the first-fit heuristic. We start with an unlimited number of empty bins. We take each object in turn, and find the first bin that has space to hold this object. We put this object in this bin. The algorithm is illustrated in the figure below. We claim that first-fit uses at most twice as many bins as the optimum, that is, if the optimal solution uses $b^*$ bins, and first-fit uses $b_{ff}$ bins, then

$$\frac{b_{ff}}{b^*} \le 2.$$

Fig. 67: First-fit heuristic (objects $s_1$ through $s_7$ placed into bins).
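
The first-fit rule can be sketched in Python as follows. To keep the arithmetic exact, this sketch assumes integer sizes measured in tenths of a bin (so the capacity is 10 rather than 1); as noted above, rescaling does not change the problem. The sizes are hypothetical and are not the ones drawn in Fig. 67.

```python
def first_fit(sizes, capacity):
    """Place each object into the first bin with room; return the bins as lists."""
    bins, loads = [], []
    for s in sizes:
        for i, load in enumerate(loads):
            if load + s <= capacity:     # first bin that has space for s
                bins[i].append(s)
                loads[i] += s
                break
        else:                            # no existing bin had room: open a new one
            bins.append([s])
            loads.append(s)
    return bins

sizes = [5, 7, 3, 2, 6, 4, 1]            # hypothetical sizes, in tenths of a bin
packing = first_fit(sizes, capacity=10)
print(len(packing), packing)
```

The inner scan makes this an O(n^2) sketch in the worst case, which is fine for illustrating the heuristic.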

Theorem: The ﬁrst-ﬁt heuristic achieves a ratio bound of 2.

Proof: Consider an instance $\{s_1, \ldots, s_n\}$ of the bin packing problem. Let $S = \sum_i s_i$ denote the sum of all the object sizes. Let $b^*$ denote the optimal number of bins, and $b_{ff}$ denote the number of bins used by first-fit.

First observe that $b^* \ge S$. This is true, since no bin can hold a total capacity of more than 1 unit, and even if we were to fill each bin exactly to its capacity, we would need at least $S$ bins. (In fact, since the number of bins is an integer, we would need at least $\lceil S \rceil$ bins.)

Next, we claim that $b_{ff} \le 2S$. To see this, let $t_i$ denote the total size of the objects that first-fit puts into bin $i$. Consider bins $i$ and $i+1$ filled by first-fit. Assume that indexing is cyclical, so if $i$ is the last index ($i = b_{ff}$) then $i + 1 = 1$. We claim that $t_i + t_{i+1} \ge 1$. If not, then the contents of bins $i$ and $i+1$ could both be put into the same bin, and hence first-fit would never have started to fill the second bin, preferring to keep everything in the first bin. Thus we have:

$$\sum_{i=1}^{b_{ff}} (t_i + t_{i+1}) \ge b_{ff}.$$

But this sum adds up all the elements twice, so it has a total value of $2S$. Thus we have $2S \ge b_{ff}$. Combining this with the fact that $b^* \ge S$ we have

$$b_{ff} \le 2S \le 2b^*,$$

implying that $b_{ff}/b^* \le 2$, as desired.

There are in fact a number of other heuristics for bin packing. Another example is best ﬁt, which attempts to

put the object into the bin in which it ﬁts most closely with the available space (assuming that there is sufﬁcient

available space). There is also a variant of ﬁrst-ﬁt, called ﬁrst ﬁt decreasing, in which the objects are ﬁrst sorted

in decreasing order of size. (This makes intuitive sense, because it is best to ﬁrst load the big items, and then try

to squeeze the smaller objects into the remaining space.)

A more careful proof establishes that first fit has an approximation ratio that is a bit smaller than 2, and in fact 17/10 is possible. Best fit has a very similar bound. First fit decreasing has a significantly better bound of $11/9 = 1.222\ldots$.
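
The three heuristics differ only in how a bin is chosen and in the order of the input, which the following Python sketch tries to make explicit. It assumes integer sizes in tenths of a bin (capacity 10) to avoid floating-point issues; the helper names `pack`, `first_fit_rule` and `best_fit_rule` and the input are illustrative assumptions.

```python
def pack(sizes, choose, capacity=10):
    """Generic online packing: choose(loads, s, capacity) returns a bin index or None."""
    loads = []
    for s in sizes:
        i = choose(loads, s, capacity)
        if i is None:
            loads.append(s)          # open a new bin
        else:
            loads[i] += s
    return loads

def first_fit_rule(loads, s, capacity):
    # first bin with enough room
    return next((i for i, t in enumerate(loads) if t + s <= capacity), None)

def best_fit_rule(loads, s, capacity):
    # among bins with room, take the fullest one (least space left over)
    fits = [i for i, t in enumerate(loads) if t + s <= capacity]
    return max(fits, key=lambda i: loads[i]) if fits else None

sizes = [2, 5, 4, 7, 1, 3, 8]                                # hypothetical input
ff = pack(sizes, first_fit_rule)                             # first fit
bf = pack(sizes, best_fit_rule)                              # best fit
ffd = pack(sorted(sizes, reverse=True), first_fit_rule)      # first fit decreasing
print(len(ff), len(bf), len(ffd))
```

On this particular input the sizes sum to 30, so at least three bins are needed; sorting first lets first fit reach that, while the unsorted runs need one bin more. A single example like this illustrates the intuition but says nothing about the worst-case ratios quoted above.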

Lecture 24: Final Review

Overview: This semester we have discussed general approaches to algorithm design. The intent has been to investigate basic algorithm design paradigms (dynamic programming, greedy algorithms, depth-first search, etc.) and to consider how these techniques can be applied to a number of well-defined problems. We have also discussed the class of NP-complete problems, which are believed to be very hard to solve, and finally some examples of approximation algorithms.

How to use this information: In some sense, the algorithms you have learned here are rarely immediately applicable

to your later work (unless you go on to be an algorithm designer) because real world problems are always

messier than these simple abstract problems. However, there are some important lessons to take out of this

class.

Develop a clean mathematical model: Most real-world problems are messy. An important ﬁrst step in solving

any problem is to produce a simple and clean mathematical formulation. For example, this might involve

describing the problem as an optimization problem on graphs, sets, or strings. If you cannot clearly

describe what your algorithm is supposed to do, it is very difﬁcult to know when you have succeeded.

Create good rough designs: Before jumping in and starting coding, it is important to begin with a good rough

design. If your rough design is based on a bad paradigm (e.g. exhaustive enumeration, when depth-ﬁrst

search could have been applied) then no amount of additional tuning and reﬁning will save this bad design.

Prove your algorithm correct: Many times you come up with an idea that seems promising, only to ﬁnd out

later (after a lot of coding and testing) that it does not work. Prove that your algorithm is correct before

coding. Writing proofs is not always easy, but it may save you a few weeks of wasted programming time.

If you cannot see why it is correct, chances are that it is not correct at all.

Can it be improved?: Once you have a solution, try to come up with a better one. Is there some reason why a

better algorithm does not exist? (That is, can you establish a lower bound?) If your solution is exponential

time, then maybe your problem is NP-hard.

Prototype to generate better designs: We have attempted to analyze algorithms from an asymptotic perspective, which hides many of the details of the running time (e.g. constant factors), but gives a general perspective for separating good designs from bad ones. After you have isolated the good designs, it is time to start prototyping and doing empirical tests to establish the real constant factors. A good profiling tool can tell you which subroutines are taking the most time, and those are the ones you should work on improving.

Still too slow?: If your problem has an unacceptably high execution time, you might consider an approximation

algorithm. The world is full of heuristics, both good and bad. You should develop a good heuristic, and if

possible, prove a ratio bound for your algorithm. If you cannot prove a ratio bound, run many experiments

to see how good the actual performance is.

There is still much more to be learned about algorithm design, but we have covered a great deal of the basic

material. One direction is to specialize in some particular area, e.g. string pattern matching, computational

geometry, parallel algorithms, randomized algorithms, or approximation algorithms. It would be easy to devote

an entire semester to any one of these topics.

Another direction is to gain a better understanding of average-case analysis, which we have largely ignored.

Still another direction might be to study numerical algorithms (as covered in a course on numerical analysis),

or to consider general search strategies such as simulated annealing. Finally, an emerging area is the study of

algorithm engineering, which considers how to design algorithms that are both efﬁcient in a practical sense, as

well as a theoretical sense.

Material for the ﬁnal exam:

Old Material: Know general results, but I will not ask too many detailed questions. Do not forget DFS and DP. You will likely see an algorithm design problem that will involve one of these two techniques.

All-Pairs Shortest paths: (Chapt 25.2.)

Floyd-Warshall Algorithm: All-pairs shortest paths, arbitrary edge weights (no negative cost cycles). Running time $O(V^3)$.

NP-completeness: (Chapt 34.)

Basic concepts: Decision problems, polynomial time, the class P, certiﬁcates and the class NP, polynomial

time reductions.

NP-completeness reductions: You are responsible for knowing the following reductions.

• 3-coloring to clique cover.

• 3SAT to Independent Set (IS).

• Independent Set to Vertex Cover and Clique.

• Vertex Cover to Dominating Set.

• Vertex Cover to Subset Sum.

It is also a good idea to understand all the reductions that were used in the homework solutions, since

modiﬁcations of these will likely appear on the ﬁnal.

NP-complete reductions can be challenging. If you cannot see how to solve the problem, here are some

suggestions for maximizing partial credit.

All NP-complete proofs have a very speciﬁc form. Explain that you know the template, and try to ﬁll in as

many aspects as possible. Suppose that you want to prove that some problem B is NP-complete.

• B ∈ NP. This is almost always easy, so don’t blow it. This basically involves specifying the certificate. The certificate is almost always the thing that the problem is asking you to find.

• For some known NP-complete problem A, $A \le_P B$. This means that you want to find a polynomial time function f that maps an instance of A to an instance of B. (Make sure to get the direction correct!)

• Show the correctness of your reduction, by showing that x ∈ A if and only if f(x) ∈ B. First suppose that you have a solution to x and show how to map this to a solution for f(x). Then suppose that you have a solution to f(x) and show how to map this to a solution for x.

If you cannot ﬁgure out what f is, at least tell me what you would like f to do. Explain which elements

of problem A will likely map to which elements of problem B. Remember that you are trying to translate

the elements of one problem into the common elements of the other problem.

I try to make at least one reduction on the exam similar to one that you have seen before, so make sure that you understand the ones that we have done either in class or on homework problems.
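
To make the template concrete, here is a small Python sketch for one of the listed reductions, Independent Set to Vertex Cover. It assumes the standard complement argument (a set S is independent exactly when its complement is a vertex cover), with f mapping (G, k) to (G, n − k). The certificate checkers run in polynomial time; the brute-force search is exponential and is included only to test the "if and only if" on a tiny hypothetical graph. All function names are illustrative.

```python
from itertools import combinations

def is_independent_set(edges, S):
    """Certificate check for IS: no edge has both endpoints in S."""
    return all(not (u in S and v in S) for u, v in edges)

def is_vertex_cover(edges, C):
    """Certificate check for VC: every edge has an endpoint in C."""
    return all(u in C or v in C for u, v in edges)

def reduce_is_to_vc(n, edges, k):
    """f maps the IS instance (G, k) to the VC instance (G, n - k)."""
    return n, edges, n - k

def brute_force(n, edges, size, check):
    """Exponential-time decision procedure, used only to test the reduction."""
    return any(check(edges, set(S)) for S in combinations(range(n), size))

# Hypothetical instance: a cycle on 5 vertices.
n, edges = 5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
for k in range(n + 1):
    n2, e2, k2 = reduce_is_to_vc(n, edges, k)
    # x is a yes-instance of IS if and only if f(x) is a yes-instance of VC
    assert brute_force(n, edges, k, is_independent_set) == \
           brute_force(n2, e2, k2, is_vertex_cover)
print("reduction agrees on all k")
```

Note the direction: the known hard problem (IS) is mapped to the problem being shown hard (VC), not the other way around.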

Approximation Algorithms: (Chapt. 35, up through 35.2.)

Vertex cover: Ratio bound of 2.

TSP with triangle inequality: Ratio bound of 2.

Set Cover: Ratio bound of $\ln m$, where $m = |X|$.

Bin packing: Ratio bound of 2.

k-center: Ratio bound of 2.

Many approximation algorithms are simple. (Most are based on simple greedy heuristics.) The key to proving many ratio bounds is first coming up with a lower bound on the optimal solution (e.g., $TSP_{opt} \ge MST$). Next, provide an upper bound on the cost of your heuristic relative to this same quantity (e.g., the shortcut twice-around tour for the MST is at most twice the MST cost).

Supplemental Lecture 1: Asymptotics

Read: Chapters 2–3 in CLRS.

Asymptotics: The formulas that are derived for the running times of programs may often be quite complex. When

designing algorithms, the main purpose of the analysis is to get a sense for the trend in the algorithm’s running

time. (An exact analysis is probably best done by implementing the algorithm and measuring CPU seconds.) We

would like a simple way of representing complex functions, which captures the essential growth rate properties.

This is the purpose of asymptotics.

Asymptotic analysis is based on two simplifying assumptions, which hold in most (but not all) cases. But it is

important to understand these assumptions and the limitations of asymptotic analysis.

Large input sizes: We are most interested in how the running time grows for large values of n.

Ignore constant factors: The actual running time of the program depends on various constant factors in the im-

plementation (coding tricks, optimizations in compilation, speed of the underlying hardware, etc). There-

fore, we will ignore constant factors.

The justification for considering large n is that if n is small, then almost any algorithm is fast enough. People are most concerned about running times for large inputs. For the most part, these assumptions are reasonable when making comparisons between functions that have significantly different behaviors. For example, suppose we have two programs, one whose running time is $T_1(n) = n^3$ and another whose running time is $T_2(n) = 100n$. (The latter algorithm may be faster because it uses a more sophisticated and complex algorithm, and the added sophistication results in a larger constant factor.) For small n (e.g., n ≤ 10) the first algorithm is at least as fast as the second. But as n becomes larger the relative differences in running time become much greater. The following table assumes one million operations per second.

n         T_1(n)       T_2(n)       T_1(n)/T_2(n)
10        0.001 sec    0.001 sec    1
100       1 sec        0.01 sec     100
1000      17 min       0.1 sec      10,000
10,000    11.6 days    1 sec        1,000,000

The clear lesson is that as input sizes grow, the performance of the asymptotically poorer algorithm degrades

much more rapidly.
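
The table entries follow directly from the two formulas. A short Python sketch, assuming as in the text a machine that performs one million operations per second, recomputes them (the helper name `seconds` is illustrative):

```python
def seconds(ops, ops_per_sec=1_000_000):
    """Running time in seconds for a given operation count."""
    return ops / ops_per_sec

for n in (10, 100, 1000, 10_000):
    t1, t2 = n ** 3, 100 * n      # operation counts for T1(n) = n^3 and T2(n) = 100n
    print(f"{n:>6}  {seconds(t1):>12.3f} s  {seconds(t2):>8.3f} s  ratio {t1 // t2:,}")
```

For the larger rows the first column comes out in seconds; dividing by 60 or by 86,400 gives the minutes and days shown in the table.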

These assumptions are not always reasonable. For example, in any particular application, n is a ﬁxed value. It

may be the case that one function is smaller than another asymptotically, but for your value of n, the asymptot-

ically larger value is ﬁne. Most of the algorithms that we will study this semester will have both low constants

and low asymptotic running times, so we will not need to worry about these issues.

Asymptotic Notation: To represent the running times of algorithms in a simpler form, we use asymptotic notation, which essentially represents a function by its fastest growing term and ignores constant factors. For example, suppose we have an algorithm whose (exact) worst-case running time is given by the following formula:

$$T(n) = 13n^3 + 5n^2 - 17n + 16.$$

As n becomes large, the $13n^3$ term dominates the others. By ignoring constant factors, we might say that the running time grows “on the order of” $n^3$, which we will express mathematically as $T(n) \in \Theta(n^3)$. This intuitive definition is fine for informal use. Let us consider how to make this idea mathematically formal.

Definition: Given any function g(n), we define Θ(g(n)) to be a set of functions:

$$\Theta(g(n)) = \{ f(n) \mid \text{there exist strictly positive constants } c_1, c_2, \text{ and } n_0 \text{ such that } 0 \le c_1 g(n) \le f(n) \le c_2 g(n) \text{ for all } n \ge n_0 \}.$$

Let’s dissect this definition. Intuitively, what we want to say with “$f(n) \in \Theta(g(n))$” is that f(n) and g(n) are asymptotically equivalent. This means that they have essentially the same growth rates for large n. For example, functions such as

$$4n^2, \quad (8n^2 + 2n - 3), \quad (n^2/5 + \sqrt{n} - 10\log n), \quad\text{and}\quad n(n - 3)$$

are all intuitively asymptotically equivalent, since as n becomes large, the dominant (fastest growing) term is some constant times $n^2$. In other words, they all grow quadratically in n. The portion of the definition that allows us to select $c_1$ and $c_2$ is essentially saying “the constants do not matter because you may pick $c_1$ and $c_2$ however you like to satisfy these conditions.” The portion of the definition that allows us to select $n_0$ is essentially saying “we are only interested in large n, since you only have to satisfy the condition for all n bigger than $n_0$, and you may make $n_0$ as big a constant as you like.”

An example: Consider the function $f(n) = 8n^2 + 2n - 3$. Our informal rule of keeping the largest term and throwing away the constants suggests that $f(n) \in \Theta(n^2)$ (since f grows quadratically). Let’s see why the formal definition bears out this informal observation.

We need to show two things: first, that f(n) grows asymptotically at least as fast as $n^2$, and second, that f(n) grows no faster asymptotically than $n^2$. We’ll do both very carefully.

Lower bound: f(n) grows asymptotically at least as fast as $n^2$: This is established by the portion of the definition that reads (paraphrasing): “there exist positive constants $c_1$ and $n_0$, such that $f(n) \ge c_1 n^2$ for all $n \ge n_0$.” Consider the following (almost correct) reasoning:

$$f(n) = 8n^2 + 2n - 3 \ge 8n^2 - 3 = 7n^2 + (n^2 - 3) \ge 7n^2.$$

Thus, if we set $c_1 = 7$, then we are done. But in the above reasoning we have implicitly made the assumptions that $2n \ge 0$ and $n^2 - 3 \ge 0$. These are not true for all n, but they are true for all sufficiently large n. In particular, if $n \ge \sqrt{3}$, then both are true. So let us select $n_0 = \sqrt{3}$, and now we have $f(n) \ge c_1 n^2$, for all $n \ge n_0$, which is what we need.

Upper bound: f(n) grows asymptotically no faster than $n^2$: This is established by the portion of the definition that reads “there exist positive constants $c_2$ and $n_0$, such that $f(n) \le c_2 n^2$ for all $n \ge n_0$.” Consider the following reasoning (which is almost correct):

$$f(n) = 8n^2 + 2n - 3 \le 8n^2 + 2n \le 8n^2 + 2n^2 = 10n^2.$$

This means that if we let $c_2 = 10$, then we are done. We have implicitly made the assumption that $2n \le 2n^2$. This is not true for all n, but it is true for all $n \ge 1$. So, let us select $n_0 = 1$, and now we have $f(n) \le c_2 n^2$ for all $n \ge n_0$, which is what we need.

From the lower bound, we have $n_0 \ge \sqrt{3}$ and from the upper bound we have $n_0 \ge 1$, and so combining these we let $n_0$ be the larger of the two: $n_0 = \sqrt{3}$. Thus, in conclusion, if we let $c_1 = 7$, $c_2 = 10$, and $n_0 = \sqrt{3}$, then we have

$$0 \le c_1 g(n) \le f(n) \le c_2 g(n) \quad\text{for all } n \ge n_0,$$

and this is exactly what the definition requires. Since we have shown (by construction) the existence of constants $c_1$, $c_2$, and $n_0$, we have established that $f(n) \in \Theta(n^2)$. (Whew! That was a lot more work than just the informal notion of throwing away constants and keeping the largest term, but it shows how this informal notion is implemented formally in the definition.)
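
The constants derived above are easy to spot-check numerically. The Python sketch below tests the sandwich inequality on a finite range of integers, which is evidence rather than a proof, and also shows a value below $n_0$ where the lower bound fails.

```python
import math

def f(n):
    return 8 * n * n + 2 * n - 3

c1, c2, n0 = 7, 10, math.sqrt(3)

# Check 0 <= c1*n^2 <= f(n) <= c2*n^2 for integers n >= n0 (finite spot check only).
for n in range(math.ceil(n0), 10_000):
    assert 0 <= c1 * n * n <= f(n) <= c2 * n * n

# Below n0 the lower bound need not hold, for example at n = 0.5.
print(f(0.5), c1 * 0.5 ** 2)   # f(0.5) versus c1 * n^2 at n = 0.5
```

Any larger choice of $n_0$, or a smaller $c_1$ and larger $c_2$, would pass the same check; the definition only asks that some valid constants exist.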

Now let’s show why f(n) is not in some other asymptotic class. First, let’s show that $f(n) \notin \Theta(n)$. If this were true, then we would have to satisfy both the upper and lower bounds. It turns out that the lower bound is satisfied (because f(n) grows at least as fast asymptotically as n). But the upper bound is false. In particular, the upper bound requires us to show “there exist positive constants $c_2$ and $n_0$, such that $f(n) \le c_2 n$ for all $n \ge n_0$.” Informally, we know that as n becomes large enough $f(n) = 8n^2 + 2n - 3$ will eventually exceed $c_2 n$ no matter how large we make $c_2$ (since f(n) is growing quadratically and $c_2 n$ is only growing linearly). To show this formally, suppose towards a contradiction that constants $c_2$ and $n_0$ did exist, such that $8n^2 + 2n - 3 \le c_2 n$ for all $n \ge n_0$. Since this is true for all sufficiently large n then it must be true in the limit as n tends to infinity. If we divide both sides by n we have:

$$\lim_{n \to \infty} \left(8n + 2 - \frac{3}{n}\right) \le c_2.$$

It is easy to see that in the limit the left side tends to ∞, and so no matter how large $c_2$ is, this statement is violated. This means that $f(n) \notin \Theta(n)$.

Let’s show that $f(n) \notin \Theta(n^3)$. Here the idea will be to violate the lower bound: “there exist positive constants $c_1$ and $n_0$, such that $f(n) \ge c_1 n^3$ for all $n \ge n_0$.” Informally, this fails because f(n) is growing quadratically, and eventually any cubic function will exceed it. To show this formally, suppose towards a contradiction that constants $c_1$ and $n_0$ did exist, such that $8n^2 + 2n - 3 \ge c_1 n^3$ for all $n \ge n_0$. Since this is true for all sufficiently large n then it must be true in the limit as n tends to infinity. If we divide both sides by $n^3$ we have:

$$\lim_{n \to \infty} \left(\frac{8}{n} + \frac{2}{n^2} - \frac{3}{n^3}\right) \ge c_1.$$

It is easy to see that in the limit the left side tends to 0, and so the only way to satisfy this requirement is to set $c_1 = 0$, but by hypothesis $c_1$ is positive. This means that $f(n) \notin \Theta(n^3)$.

O-notation and Ω-notation: We have seen that the deﬁnition of Θ-notation relies on proving both a lower and upper

asymptotic bound. Sometimes we are only interested in proving one bound or the other. The O-notation allows

us to state asymptotic upper bounds and the Ω-notation allows us to state asymptotic lower bounds.

Definition: Given any function g(n),

$$O(g(n)) = \{ f(n) \mid \text{there exist positive constants } c \text{ and } n_0 \text{ such that } 0 \le f(n) \le c\,g(n) \text{ for all } n \ge n_0 \}.$$

Definition: Given any function g(n),

$$\Omega(g(n)) = \{ f(n) \mid \text{there exist positive constants } c \text{ and } n_0 \text{ such that } 0 \le c\,g(n) \le f(n) \text{ for all } n \ge n_0 \}.$$

Compare this with the definition of Θ. You will see that O-notation only enforces the upper bound of the Θ definition, and Ω-notation only enforces the lower bound. Also observe that $f(n) \in \Theta(g(n))$ if and only if $f(n) \in O(g(n))$ and $f(n) \in \Omega(g(n))$. Intuitively, $f(n) \in O(g(n))$ means that f(n) grows asymptotically at the same rate or slower than g(n), whereas $f(n) \in \Omega(g(n))$ means that f(n) grows asymptotically at the same rate or faster than g(n).

For example, $f(n) = 3n^2 + 4n \in \Theta(n^2)$ but it is not in $\Theta(n)$ or $\Theta(n^3)$. But $f(n) \in O(n^2)$ and in $O(n^3)$ but not in $O(n)$. Finally, $f(n) \in \Omega(n^2)$ and in $\Omega(n)$ but not in $\Omega(n^3)$.

The Limit Rule for Θ: The previous examples which used limits suggest an alternative way of showing that $f(n) \in \Theta(g(n))$.

Limit Rule for Θ-notation: Given positive functions f(n) and g(n), if

$$\lim_{n \to \infty} \frac{f(n)}{g(n)} = c,$$

for some constant c > 0 (strictly positive but not infinity), then $f(n) \in \Theta(g(n))$.

Limit Rule for O-notation: Given positive functions f(n) and g(n), if

$$\lim_{n \to \infty} \frac{f(n)}{g(n)} = c,$$

for some constant c ≥ 0 (nonnegative but not infinite), then $f(n) \in O(g(n))$.

Limit Rule for Ω-notation: Given positive functions f(n) and g(n), if

$$\lim_{n \to \infty} \frac{f(n)}{g(n)} \ne 0$$

(either a strictly positive constant or infinity) then $f(n) \in \Omega(g(n))$.

This limit rule can be applied in almost every instance (that I know of) where the formal definition can be used, and it is almost always easier to apply than the formal definition. The only exceptions that I know of are strange instances where the limit does not exist (e.g. $f(n) = n^{(1 + \sin n)}$). But since most running times are fairly well-behaved functions this is rarely a problem.

For example, recall the function $f(n) = 8n^2 + 2n - 3$. To show that $f(n) \in \Theta(n^2)$ we let $g(n) = n^2$ and compute the limit. We have

$$\lim_{n \to \infty} \frac{8n^2 + 2n - 3}{n^2} = \lim_{n \to \infty} \left(8 + \frac{2}{n} - \frac{3}{n^2}\right) = 8,$$

(since the two fractional terms tend to 0 in the limit). Since 8 is a nonzero constant, it follows that $f(n) \in \Theta(g(n))$.
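
The behavior of the ratio f(n)/g(n) can also be observed numerically. The following Python sketch evaluates the ratio exactly at a few growing values of n for g(n) = n, $n^2$ and $n^3$. Such a table only suggests what the limit is; it does not replace the calculation above. The helper name `ratio` is illustrative.

```python
from fractions import Fraction

def ratio(f, g, n):
    """Exact value of f(n)/g(n) for an integer n."""
    return Fraction(f(n), g(n))

f = lambda n: 8 * n * n + 2 * n - 3

candidates = [("n", lambda n: n), ("n^2", lambda n: n * n), ("n^3", lambda n: n ** 3)]
for name, g in candidates:
    values = [float(ratio(f, g, 10 ** k)) for k in (1, 3, 6)]
    print(f"f(n)/{name}:", values)
```

The ratio against n keeps growing, the ratio against $n^2$ settles near 8, and the ratio against $n^3$ shrinks toward 0, matching the three cases of the limit rule.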

You may recall the important rules from calculus for evaluating limits. (If not, dredge out your calculus book to

remember.) Most of the rules are pretty self evident (e.g., the limit of a ﬁnite sum is the sum of the individual

limits). One important rule to remember is the following:

L’Hôpital’s rule: If f(n) and g(n) both approach 0 or both approach ∞ in the limit, then

$$\lim_{n \to \infty} \frac{f(n)}{g(n)} = \lim_{n \to \infty} \frac{f'(n)}{g'(n)},$$

where $f'(n)$ and $g'(n)$ denote the derivatives of f and g relative to n.

Exponentials and Logarithms: Exponentials and logarithms are very important in analyzing algorithms. The following are nice to keep in mind. The terminology $\lg^b n$ means $(\lg n)^b$.

Lemma: Given any positive constants a > 1, b, and c:

lim

n→∞

n

b

a

n

= 0 lim

n→∞

lg

b

n

n

c

= 0.

We won't prove these, but they can be shown by taking appropriate powers, and then applying L'Hôpital's rule. The important bottom line is that polynomials always grow more slowly than exponentials whose base is greater than 1. For example:
\[ n^{500} \in O(2^n). \]
For this reason, we will try to avoid exponential running times at all costs. Conversely, logarithmic powers (sometimes called polylogarithmic functions) grow more slowly than any polynomial. For example:
\[ \lg^{500} n \in O(n). \]


For this reason, we will usually be happy to allow any number of additional logarithmic factors, if it means

avoiding any additional powers of n.

At this point, it should be mentioned that these last observations are really asymptotic results. They are true in the limit for large n, but you should be careful just how high the crossover point is. For example, by my calculations, $\lg^{500} n \le n$ only for $n > 2^{6000}$ (which is much larger than any input size you'll ever see). Thus, you should take this with a grain of salt. But, for small powers of logarithms, this applies to all reasonably large input sizes. For example $\lg^2 n \le n$ for all n ≥ 16.
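The crossover claims above can be probed with exact integer arithmetic. The sketch below restricts attention to inputs of the form n = 2^m (an assumption made to keep everything in integers), so that lg^p n ≤ n becomes m^p ≤ 2^m. The function names are ours.

```python
import math

def crossover_exponent(power):
    """Smallest m >= 2 with m**power <= 2**m, i.e. (lg n)**power <= n at n = 2**m."""
    m = 2
    while m ** power > 2 ** m:
        m += 1
    return m

def small_power_holds(limit):
    """Check lg^2 n <= n for every integer n with 16 <= n < limit."""
    return all(math.log2(n) ** 2 <= n for n in range(16, limit))

m500 = crossover_exponent(500)
print("lg^500 n <= n first holds again at n = 2 **", m500)
print("lg^2 n <= n for 16 <= n < 100000:", small_power_holds(100000))
```

The exponent reported for the power 500 lies a little above 6000, in line with the estimate in the text.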

Asymptotic Intuition: To get an intuitive feeling for what common asymptotic running times map into in terms of practical usage, here is a little list.

• Θ(1): Constant time; you can’t beat it!

• Θ(log n): This is typically the speed that most efﬁcient data structures operate in for a single access. (E.g.,

inserting a key into a balanced binary tree.) Also it is the time to ﬁnd an object in a sorted list of length n

by binary search.

• Θ(n): This is about the fastest that an algorithm can run, given that you need Θ(n) time just to read in all

the data.

• Θ(n log n): This is the running time of the best sorting algorithms. Since many problems require sorting the inputs, this is still considered quite efficient.

• $\Theta(n^2)$, $\Theta(n^3)$, ...: Polynomial time. These running times are acceptable either when the exponent is small or when the data size is not too large (e.g. n ≤ 1,000).

• $\Theta(2^n)$, $\Theta(3^n)$: Exponential time. This is only acceptable when either (1) you know that your inputs will be of very small size (e.g. n ≤ 50), or (2) you know that this is a worst-case running time that will rarely occur in practical instances. In case (2), it would be a good idea to try to get a more accurate average-case analysis.

• $\Theta(n!)$, $\Theta(n^n)$: Acceptable only for really small inputs (e.g. n ≤ 20).

Are there even bigger functions? Definitely! For example, if you want to see a function that grows inconceivably fast, look up the definition of Ackermann's function in our text.

Max Dominance Revisited: Returning to our Max Dominance algorithms, recall that one had a running time of $T_1(n) = n^2$ and the other had a running time of $T_2(n) = n \log n + n(n-1)/2$. Expanding the latter function and grouping terms in order of their growth rate we have
\[ T_2(n) = \frac{n^2}{2} + n \log n - \frac{n}{2}. \]
We will leave it as an easy exercise to show that both $T_1(n)$ and $T_2(n)$ are $\Theta(n^2)$. Although the second algorithm is twice as fast for large n (because of the 1/2 factor multiplying the $n^2$ term), this does not represent a significant improvement.

Supplemental Lecture 2: Max Dominance

Read: Review Chapters 1–4 in CLRS.

Faster Algorithm for Max-Dominance: Recall the max-dominance problem from the last two lectures. So far we have introduced a simple brute-force algorithm that ran in $O(n^2)$ time, which operated by comparing all pairs of points. Last time we considered a slight improvement, which sorted the points by their x-coordinate, and then compared each point against the subsequent points in the sorted order. However, this only improved matters by a constant factor. The question we consider today is whether there is an approach that is significantly better.


A Major Improvement: The problem with the previous algorithm is that, even though we have cut the number of comparisons roughly in half, each point is still making lots of comparisons. Can we save time by making only one comparison for each point? The inner while loop is testing to see whether any point that follows P[i] in the sorted list has a larger y-coordinate. This suggests that if we knew which point among P[i + 1, ..., n] had the maximum y-coordinate, we could just test against that point.

How can we do this? Here is a simple observation. For any set of points, the point with the maximum y-coordinate is the maximal point with the smallest x-coordinate. This suggests that we can sweep the points backwards, from right to left. We keep track of the index j of the most recently seen maximal point. (Initially the rightmost point is maximal.) When we encounter the point P[i], it is maximal if and only if P[i].y ≥ P[j].y. This suggests the following algorithm.

Max Dominance: Sort and Reverse Scan

    MaxDom3(P, n) {
        Sort P in ascending order by x-coordinate;
        output P[n];                        // last point is always maximal
        j = n;
        for i = n-1 downto 1 {
            if (P[i].y >= P[j].y) {         // is P[i] maximal?
                output P[i];                // yes..output it
                j = i;                      // P[i] has the largest y so far
            }
        }
    }

The running time of the for-loop is obviously O(n), because there is just a single loop that is executed n − 1 times, and the code inside takes constant time. The total running time is dominated by the O(n log n) sorting time, for a total of O(n log n) time.
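For readers who want to run the sort-and-reverse-scan idea, here is a minimal Python sketch. It assumes points are (x, y) tuples with distinct x-coordinates and distinct y-coordinates, and it returns the maximal points as a list instead of outputting them one by one; the name `max_dom3` and the sample points are ours.

```python
def max_dom3(points):
    """Return the maximal points, assuming distinct x- and distinct y-coordinates."""
    if not points:
        return []
    pts = sorted(points)               # ascending by x-coordinate
    maxima = [pts[-1]]                 # the rightmost point is always maximal
    best_y = pts[-1][1]                # largest y seen so far (plays the role of P[j].y)
    for p in reversed(pts[:-1]):       # sweep from right to left
        if p[1] >= best_y:             # no point to the right is higher
            maxima.append(p)
            best_y = p[1]
    return maxima

pts = [(2, 14), (4, 11), (5, 1), (7, 13), (9, 9), (12, 12),
       (13, 3), (14, 10), (15, 7), (16, 4)]
print(max_dom3(pts))
# prints [(16, 4), (15, 7), (14, 10), (12, 12), (7, 13), (2, 14)]
```

The call to `sorted` accounts for the O(n log n) term; the loop that follows touches each point once.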

How much of an improvement is this? Probably the most accurate way to find out would be to code the two up, and compare their running times. But just to get a feeling, let's look at the ratio of the running times, ignoring constant factors:
\[ \frac{n^2}{n \lg n} = \frac{n}{\lg n}. \]
(I use the notation lg n to denote the logarithm base 2, ln n to denote the natural logarithm (base e) and log n when I do not care about the base. Note that a change in base only affects the value of a logarithm function by a constant factor, so inside of O-notation, we will usually just write log n.)

For relatively small values of n (e.g. less than 100), both algorithms are probably running fast enough that the difference will be practically negligible. (Rule 1 of algorithm optimization: Don't optimize code that is already fast enough.) On larger inputs, say, n = 1,000, the ratio of n to lg n is about 1000/10 = 100, so there is a 100-to-1 ratio in running times. Of course, we would need to factor in constant factors, but since we are not using any really complex data structures, it is hard to imagine that the constant factors will differ by more than, say, 10. For even larger inputs, say, n = 1,000,000, we are looking at a ratio of roughly 1,000,000/20 = 50,000. This is quite a significant difference, irrespective of the constant factors.

Divide and Conquer Approach: One problem with the previous algorithm is that it relies on sorting. This is nice

and clean (since it is usually easy to get good code for sorting without troubling yourself to write your own).

However, if you really wanted to squeeze the most efﬁciency out of your code, you might consider whether you

can solve this problem without invoking a sorting algorithm.

One of the basic maxims of algorithm design is to ﬁrst approach any problem using one of the standard algorithm

design paradigms, e.g. divide and conquer, dynamic programming, greedy algorithms, depth-ﬁrst search. We

will talk more about these methods as the semester continues. For this problem, divide-and-conquer is a natural

method to choose. What is this paradigm?


Divide: Divide the problem into two subproblems (ideally of approximately equal sizes),

Conquer: Solve each subproblem recursively, and

Combine: Combine the solutions to the two subproblems into a global solution.

How shall we divide the problem? I can think of a couple of ways. One is similar to how MergeSort operates.

Just take the array of points P[1..n], and split into two subarrays of equal size P[1..n/2] and P[n/2 + 1..n].

Because we do not sort the points, there is no particular relationship between the points in one side of the list

from the other.

Another approach, which is more reminiscent of QuickSort, is to select a random element from the list, called a pivot, x = P[r], where r is a random integer in the range from 1 to n, and then partition the list into two sublists, those elements whose x-coordinates are less than or equal to x and those that are greater than x. This will not be guaranteed to split the list into two equal parts, but on average it can be shown that it does a pretty good job.

Let's consider the first method. (The quicksort method will also work, but leads to a tougher analysis.) Here is a more concrete outline. We will describe the algorithm at a very high level. The input will be a point array, and a point array will be returned. The key ingredient is a function that takes the maxima of two sets, and merges them into an overall set of maxima.

Max Dominance: Divide-and-Conquer

    MaxDom4(P, n) {
        if (n == 1) return {P[1]};          // one point is trivially maximal
        m = n/2;                            // midpoint of list
        M1 = MaxDom4(P[1..m], m);           // solve for first half
        M2 = MaxDom4(P[m+1..n], n-m);       // solve for second half
        return MaxMerge(M1, M2);            // merge the results
    }

The general process is illustrated below.

The main question is how the procedure MaxMerge() is implemented, because it does all the work. Let us assume that it returns a list of points in sorted order according to x-coordinates of the maximal points. Observe that if a point is to be maximal overall, then it must be maximal in one of the two sublists. However, just because a point is maximal in some list does not imply that it is globally maximal. (Consider point (7, 10) in the example.) However, if it dominates all the points of the other sublist, then we can assert that it is maximal.

I will describe the procedure at a very high level. It operates by walking through each of the two sorted lists of

maximal points. It maintains two pointers, one pointing to the next unprocessed item in each list. Think of these

as ﬁngers. Take the ﬁnger pointing to the point with the smaller x-coordinate. If its y-coordinate is larger than

the y-coordinate of the point under the other ﬁnger, then this point is maximal, and is copied to the next position

of the result list. Otherwise it is not copied. In either case, we move to the next point in the same list, and repeat

the process. The result list is returned.

The details will be left as an exercise. Observe that because we spend a constant amount of time processing each

point (either copying it to the result list or skipping over it) the total execution time of this procedure is O(n).
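Since the details are left as an exercise, the following is only one possible sketch of the two-finger merge and the recursion around it, not the notes' own solution. It assumes points are (x, y) tuples with distinct x- and distinct y-coordinates, and that each input list holds the maxima of its own half sorted by increasing x (so y decreases along each list). When one list runs out, the sketch copies the remainder of the other list, since nothing left can dominate those points. The names `max_merge` and `max_dom4` are ours.

```python
def max_merge(m1, m2):
    """Merge two lists of maxima (each sorted by x) into the overall maxima."""
    i = j = 0
    result = []
    while i < len(m1) and j < len(m2):
        if m1[i][0] < m2[j][0]:            # finger i has the smaller x
            if m1[i][1] > m2[j][1]:        # higher than everything left in m2
                result.append(m1[i])
            i += 1
        else:                              # finger j has the smaller x
            if m2[j][1] > m1[i][1]:
                result.append(m2[j])
            j += 1
    result.extend(m1[i:])                  # leftovers have the largest x values,
    result.extend(m2[j:])                  # so nothing remaining dominates them
    return result

def max_dom4(points):
    """Divide-and-conquer maxima; returns the maxima sorted by x."""
    if len(points) <= 1:
        return list(points)
    mid = len(points) // 2
    return max_merge(max_dom4(points[:mid]), max_dom4(points[mid:]))

pts = [(13, 3), (2, 14), (16, 4), (9, 9), (4, 11), (14, 10),
       (5, 1), (12, 12), (15, 7), (7, 13)]
print(max_dom4(pts))
# prints [(2, 14), (7, 13), (12, 12), (14, 10), (15, 7), (16, 4)]
```

Each iteration of the loop advances one finger, so a merge of lists with n points in total takes O(n) time, matching the bound stated above.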

Recurrences: How do we analyze recursive procedures like this one? If there is a simple pattern to the sizes of

the recursive calls, then the best way is usually by setting up a recurrence, that is, a function which is deﬁned

recursively in terms of itself.

We break the problem into two subproblems of size roughly n/2 (we will say exactly n/2 for simplicity), and the additional overhead of merging the solutions is O(n). We will ignore constant factors, writing O(n) just as n, giving:
\[ T(n) = \begin{cases} 1 & \text{if } n = 1, \\ 2T(n/2) + n & \text{if } n > 1. \end{cases} \]

Fig. 68: Divide and conquer approach. (Three panels: input and initial partition; solutions to subproblems; merged solution.)

Solving Recurrences by The Master Theorem: There are a number of methods for solving the sort of recurrences

that show up in divide-and-conquer algorithms. The easiest method is to apply the Master Theorem that is given

in CLRS. Here is a slightly more restrictive version, but adequate for a lot of instances. See CLRS for the more

complete version of the Master Theorem and its proof.

Theorem: (Simplified Master Theorem) Let a ≥ 1, b > 1 be constants and let T(n) be the recurrence
\[ T(n) = aT(n/b) + cn^k, \]
defined for n ≥ 0.

Case (1): if $a > b^k$ then T(n) is $\Theta(n^{\log_b a})$.

Case (2): if $a = b^k$ then T(n) is $\Theta(n^k \log n)$.

Case (3): if $a < b^k$ then T(n) is $\Theta(n^k)$.

Using this version of the Master Theorem we can see that in our recurrence a = 2, b = 2, and k = 1, so $a = b^k$ and case (2) applies. Thus T(n) is Θ(n log n).
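The case analysis is mechanical enough to write down as a small helper. This is only an illustrative sketch of the simplified theorem as stated above; it assumes a, b and k are integers so that the comparison of a with b^k is exact, and the function name and output strings are ours.

```python
import math

def simplified_master(a, b, k):
    """Classify T(n) = a T(n/b) + c n^k by the simplified Master Theorem."""
    if a > b ** k:
        # Case 1: the exponent is log_b a.
        return "Theta(n^%.3f)" % math.log(a, b)
    if a == b ** k:
        # Case 2: an extra log factor appears.
        return "Theta(n^%g log n)" % k
    # Case 3: the overhead term dominates.
    return "Theta(n^%g)" % k

print(simplified_master(2, 2, 1))   # T(n) = 2T(n/2) + n, prints Theta(n^1 log n)
print(simplified_master(2, 3, 1))   # T(n) = 2T(n/3) + n, prints Theta(n^1)
print(simplified_master(4, 2, 1))   # T(n) = 4T(n/2) + n, prints Theta(n^2.000)
```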

There are many recurrences that cannot be put into this form. For example, the following recurrence is quite common: T(n) = 2T(n/2) + n log n. This solves to $T(n) = \Theta(n \log^2 n)$, but the Master Theorem (either this form or the one in CLRS) will not tell you this. For such recurrences, other methods are needed.

Expansion: A more basic method for solving recurrences is that of expansion (which CLRS calls iteration). This is

a rather painstaking process of repeatedly applying the deﬁnition of the recurrence until (hopefully) a simple

pattern emerges. This pattern usually results in a summation that is easy to solve. If you look at the proof in

CLRS for the Master Theorem, it is actually based on expansion.

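Let us consider applying this to the following recurrence. We assume that n is a power of 3.
\[ T(1) = 1, \qquad T(n) = 2T\left(\frac{n}{3}\right) + n \quad \text{if } n > 1. \]
First we expand the recurrence into a summation, until seeing the general pattern emerge.
\begin{align*}
T(n) &= 2T\left(\frac{n}{3}\right) + n \\
     &= 2\left(2T\left(\frac{n}{9}\right) + \frac{n}{3}\right) + n = 4T\left(\frac{n}{9}\right) + \left(n + \frac{2n}{3}\right) \\
     &= 4\left(2T\left(\frac{n}{27}\right) + \frac{n}{9}\right) + \left(n + \frac{2n}{3}\right) = 8T\left(\frac{n}{27}\right) + \left(n + \frac{2n}{3} + \frac{4n}{9}\right) \\
     &\;\;\vdots \\
     &= 2^k T\left(\frac{n}{3^k}\right) + \sum_{i=0}^{k-1} \frac{2^i n}{3^i} = 2^k T\left(\frac{n}{3^k}\right) + n \sum_{i=0}^{k-1} (2/3)^i.
\end{align*}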

The parameter k is the number of expansions (not to be confused with the value of k we introduced earlier on the overhead). We want to know how many expansions are needed to arrive at the basis case. To do this we set $n/3^k = 1$, meaning that $k = \log_3 n$. Substituting this in and using the identity $a^{\log b} = b^{\log a}$ we have:
\[ T(n) = 2^{\log_3 n}\, T(1) + n \sum_{i=0}^{\log_3 n - 1} (2/3)^i = n^{\log_3 2} + n \sum_{i=0}^{\log_3 n - 1} (2/3)^i. \]


Next, we can apply the formula for the geometric series and simplify to get:
\begin{align*}
T(n) &= n^{\log_3 2} + n\, \frac{1 - (2/3)^{\log_3 n}}{1 - (2/3)} \\
     &= n^{\log_3 2} + 3n\left(1 - (2/3)^{\log_3 n}\right) = n^{\log_3 2} + 3n\left(1 - n^{\log_3 (2/3)}\right) \\
     &= n^{\log_3 2} + 3n\left(1 - n^{(\log_3 2) - 1}\right) = n^{\log_3 2} + 3n - 3n^{\log_3 2} \\
     &= 3n - 2n^{\log_3 2}.
\end{align*}
Since $\log_3 2 \approx 0.631 < 1$, T(n) is dominated by the 3n term asymptotically, and so it is Θ(n).
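The closed form can be checked against the recurrence directly. In the sketch below (names ours), n is restricted to powers of 3, as in the derivation; at n = 3^j the term n^(log_3 2) is exactly 2^j, so everything stays in integers.

```python
def T(n):
    """Evaluate T(1) = 1, T(n) = 2 T(n/3) + n directly, for n a power of 3."""
    return 1 if n == 1 else 2 * T(n // 3) + n

def closed_form(j):
    """3n - 2 n^(log_3 2) evaluated at n = 3**j, where n^(log_3 2) = 2**j."""
    return 3 * 3 ** j - 2 * 2 ** j

for j in range(6):
    # The recurrence and the closed form agree, e.g. T(3) = 5 and T(9) = 19.
    print(3 ** j, T(3 ** j), closed_form(j))
```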

Induction and Constructive Induction: Another technique for solving recurrences (and this works for summations

as well) is to guess the solution, or the general form of the solution, and then attempt to verify its correctness

through induction. Sometimes there are parameters whose values you do not know. This is ﬁne. In the course

of the induction proof, you will usually find out what these values must be. We will consider a famous example, that of the Fibonacci numbers.
\[ F_0 = 0, \qquad F_1 = 1, \qquad F_n = F_{n-1} + F_{n-2} \quad \text{for } n \ge 2. \]

The Fibonacci numbers arise in data structure design. If you study AVL (height balanced) trees in data structures, you will learn that the minimum-sized AVL trees are produced by the recursive construction given below. Let L(i) denote the number of leaves in the minimum-sized AVL tree of height i. To construct a minimum-sized AVL tree of height i, you create a root node whose children are minimum-sized AVL trees of heights i − 1 and i − 2. Thus the number of leaves obeys L(0) = L(1) = 1, L(i) = L(i − 1) + L(i − 2). It is easy to see that $L(i) = F_{i+1}$.

Fig. 69: Minimum-sized AVL trees (with L(0) = 1, L(1) = 1, L(2) = 2, L(3) = 3, L(4) = 5).

If you expand the Fibonacci series for a number of terms, you will observe that $F_n$ appears to grow exponentially, but not as fast as $2^n$. It is tempting to conjecture that $F_n \le \phi^{n-1}$, for some real parameter φ, where 1 < φ < 2. We can use induction to prove this and derive a bound on φ.

Lemma: For all integers n ≥ 1, $F_n \le \phi^{n-1}$ for some constant φ, 1 < φ < 2.

Proof: We will try to derive the tightest bound we can on the value of φ.

Basis: For the basis case we consider n = 1. Observe that $F_1 = 1 \le \phi^0$, as desired.

Induction step: For the induction step, let us assume that $F_m \le \phi^{m-1}$ whenever 1 ≤ m < n. Using this induction hypothesis we will show that the lemma holds for n itself, whenever n ≥ 2.

Since n ≥ 2, we have $F_n = F_{n-1} + F_{n-2}$. Now, since n − 1 and n − 2 are both strictly less than n, we can apply the induction hypothesis, from which we have
\[ F_n \le \phi^{n-2} + \phi^{n-3} = \phi^{n-3}(1 + \phi). \]


We want to show that this is at most $\phi^{n-1}$ (for a suitable choice of φ). Clearly this will be true if and only if $(1 + \phi) \le \phi^2$. This is not true for all values of φ (for example, it is not true when φ = 1 but it is true when φ = 2).

At the critical value of φ this inequality will be an equality, implying that we want to find the roots of the equation
\[ \phi^2 - \phi - 1 = 0. \]
By the quadratic formula we have
\[ \phi = \frac{1 \pm \sqrt{1 + 4}}{2} = \frac{1 \pm \sqrt{5}}{2}. \]
Since $\sqrt{5} \approx 2.24$, observe that one of the roots is negative, and hence would not be a possible candidate for φ. The positive root is
\[ \phi = \frac{1 + \sqrt{5}}{2} \approx 1.618. \]

There is a very subtle bug in the preceding proof. Can you spot it? The error occurs in the case n = 2. Here we claim that $F_2 = F_1 + F_0$ and then we apply the induction hypothesis to both $F_1$ and $F_0$. But the induction hypothesis only applies for m ≥ 1, and hence cannot be applied to $F_0$! To fix it we could include $F_2$ as part of the basis case as well; this is easy, since $F_2 = 1 \le \phi^1$ for any φ > 1.

Notice not only did we prove the lemma by induction, but we actually determined the value of φ which makes

the lemma true. This is why this method is called constructive induction.
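A numerical experiment illustrates why the golden ratio is the critical value. The sketch below (function names ours) checks the bound F_n ≤ base^(n−1) for the golden ratio and for the slightly smaller base 1.6; it uses floating-point arithmetic, so it is an illustration for moderate n and not a proof.

```python
import math

PHI = (1 + math.sqrt(5)) / 2        # the positive root of phi^2 - phi - 1 = 0

def fib(n):
    """Return F_n, with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def bound_holds(base, limit):
    """Check F_n <= base**(n - 1) for every n with 1 <= n <= limit."""
    return all(fib(n) <= base ** (n - 1) for n in range(1, limit + 1))

print(bound_holds(PHI, 60))   # True: the lemma's bound holds on this range
print(bound_holds(1.6, 60))   # False: a base below the golden ratio eventually fails
```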

By the way, the value $\phi = \frac{1}{2}(1 + \sqrt{5})$ is a famous constant in mathematics, architecture and art. It is the golden ratio. Two numbers A and B satisfy the golden ratio if
\[ \frac{A}{B} = \frac{A + B}{A}. \]
It is easy to verify that A = φ and B = 1 satisfy this condition. This proportion occurs throughout the world of art and architecture.

Supplemental Lecture 3: Recurrences and Generating Functions

Read: This material is not covered in CLRS. There is a good description of generating functions in D. E. Knuth, The Art of Computer Programming, Vol 1.

Generating Functions: The method of constructive induction provided a way to get a bound on $F_n$, but we did not get an exact answer, and we had to generate a good guess before we were even able to start.

Let us consider an approach to determine an exact representation of $F_n$, which requires no guesswork. This method is based on a very elegant concept, called a generating function. Consider any infinite sequence:
\[ a_0, a_1, a_2, a_3, \ldots \]
If we would like to "encode" this sequence succinctly, we could define a power series in which these are the coefficients:
\[ G(z) = a_0 + a_1 z + a_2 z^2 + a_3 z^3 + \cdots \]
This is called the generating function of the sequence. What is z? It is just a symbolic variable. We will (almost) never assign it a specific value. Thus, every infinite sequence of numbers has a corresponding generating function, and vice versa. What is the advantage of this representation? It turns out that we can perform arithmetic transformations on these functions (e.g., adding them, multiplying them, differentiating them) and this has a corresponding effect on the underlying sequences. It turns out that some nicely-structured sequences (like the Fibonacci numbers, and many sequences arising from linear recurrences) have generating functions that are easy to write down and manipulate.

Let's consider the generating function for the Fibonacci numbers:
\begin{align*}
G(z) &= F_0 + F_1 z + F_2 z^2 + F_3 z^3 + \cdots \\
     &= z + z^2 + 2z^3 + 3z^4 + 5z^5 + \cdots
\end{align*}

The trick in dealing with generating functions is to figure out how various manipulations of the generating function produce algebraically equivalent forms. For example, notice that if we multiply the generating function by a factor of z, this has the effect of shifting the sequence to the right:
\begin{align*}
G(z)     &= F_0 + F_1 z + F_2 z^2 + F_3 z^3 + F_4 z^4 + \cdots \\
zG(z)    &= \phantom{F_0 + {}} F_0 z + F_1 z^2 + F_2 z^3 + F_3 z^4 + \cdots \\
z^2 G(z) &= \phantom{F_0 + F_0 z + {}} F_0 z^2 + F_1 z^3 + F_2 z^4 + \cdots
\end{align*}

Now, let's try the following manipulation. Compute $G(z) - zG(z) - z^2 G(z)$, and see what we get:
\begin{align*}
(1 - z - z^2)\,G(z) &= F_0 + (F_1 - F_0)z + (F_2 - F_1 - F_0)z^2 + (F_3 - F_2 - F_1)z^3 + \cdots + (F_i - F_{i-1} - F_{i-2})z^i + \cdots \\
                    &= z.
\end{align*}
Observe that every term except the second is equal to zero by the definition of $F_i$. (The particular manipulation we picked was chosen to cause this cancellation to occur.) From this we may conclude that
\[ G(z) = \frac{z}{1 - z - z^2}. \]
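One way to convince yourself that this rational function really encodes the Fibonacci numbers is to expand it as a power series mechanically, by long division. The sketch below is a generic helper of our own (not something from the notes); it assumes the denominator has constant term 1 and represents polynomials as coefficient lists, lowest degree first.

```python
def series_coefficients(num, den, count):
    """First `count` power-series coefficients of num(z)/den(z), assuming den[0] == 1."""
    coeffs = []
    for n in range(count):
        c = num[n] if n < len(num) else 0
        # Match the coefficient of z^n in den(z) * series(z) = num(z).
        for i in range(1, min(n, len(den) - 1) + 1):
            c -= den[i] * coeffs[n - i]
        coeffs.append(c)
    return coeffs

# z / (1 - z - z^2): numerator [0, 1], denominator [1, -1, -1].
print(series_coefficients([0, 1], [1, -1, -1], 10))
# prints [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

The inner loop is just the Fibonacci recurrence in disguise: each new coefficient is the sum of the previous two, because the denominator's coefficients of z and z^2 are both −1.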

So, now we have an alternative representation for the Fibonacci numbers, as the coefﬁcients of this function if

expanded as a power series. So what good is this? The main goal is to get at the coefﬁcients of its power series

expansion. There are certain common tricks that people use to manipulate generating functions.

The first is to observe that there are some functions for which it is very easy to get a power series expansion. For example, the following is a simple consequence of the formula for the geometric series. If 0 < c < 1 then
\[ \sum_{i=0}^{\infty} c^i = \frac{1}{1 - c}. \]
Setting z = c, we have
\[ \frac{1}{1 - z} = 1 + z + z^2 + z^3 + \cdots \]
(In other words, 1/(1 − z) is the generating function for the sequence (1, 1, 1, ...).) In general, given a constant a we have
\[ \frac{1}{1 - az} = 1 + az + a^2 z^2 + a^3 z^3 + \cdots, \]
which is the generating function for $(1, a, a^2, a^3, \ldots)$. It would be great if we could modify our generating function to be in the form of 1/(1 − az) for some constant a, since then we could extract the coefficients of the power series easily.

In order to do this, we would like to rewrite the generating function in the following form:
\[ G(z) = \frac{z}{1 - z - z^2} = \frac{A}{1 - az} + \frac{B}{1 - bz}, \]
for some A, B, a, b. We will skip the steps in doing this, but it is not hard to verify that the roots of (1 − az)(1 − bz) (which are 1/a and 1/b) must be equal to the roots of $1 - z - z^2$. We can then solve for a and b by taking the reciprocals of the roots of this quadratic. Then by some simple algebra we can plug these values in and solve for A and B yielding:
\[ G(z) = \frac{z}{1 - z - z^2} = \frac{1/\sqrt{5}}{1 - \phi z} + \frac{-1/\sqrt{5}}{1 - \hat{\phi} z} = \frac{1}{\sqrt{5}} \left( \frac{1}{1 - \phi z} - \frac{1}{1 - \hat{\phi} z} \right), \]
where $\phi = (1 + \sqrt{5})/2$ and $\hat{\phi} = (1 - \sqrt{5})/2$. (In particular, to determine A, multiply the equation by 1 − φz, and then consider what happens when z = 1/φ. A similar trick can be applied to get B. In general, this is called the method of partial fractions.)

Now we are in good shape, because we can extract the coefficients for these two fractions using the expansion of 1/(1 − az) given above. From this we have the following:
\[ G(z) = \frac{1}{\sqrt{5}} \Big( \big(1 + \phi z + \phi^2 z^2 + \cdots\big) - \big(1 + \hat{\phi} z + \hat{\phi}^2 z^2 + \cdots\big) \Big). \]
Combining terms we have
\[ G(z) = \frac{1}{\sqrt{5}} \sum_{i=0}^{\infty} \big(\phi^i - \hat{\phi}^i\big) z^i. \]
We can now read off the coefficients easily. In particular it follows that
\[ F_n = \frac{1}{\sqrt{5}} \big(\phi^n - \hat{\phi}^n\big). \]

This is an exact result, and no guesswork was needed. The only parts that involved some cleverness (beyond the invention of generating functions) were (1) coming up with the simple closed-form formula for G(z) by taking appropriate differences and applying the rule for the recurrence, and (2) applying the method of partial fractions to get the generating function into one for which we could easily read off the final coefficients.

This is rather remarkable, because it says that we can express the integer $F_n$ in terms of powers of two irrational numbers φ and $\hat{\phi}$. You might try this for a few specific values of n to see why this is true. By the way, when you observe that $|\hat{\phi}| < 1$, it is clear that the first term is the dominant one. Thus we have, for large enough n, $F_n = \phi^n/\sqrt{5}$, rounded to the nearest integer.
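Trying the formula for specific values of n is easy to automate. The sketch below (names ours) compares the exact Fibonacci numbers with the closed form and with the rounded dominant term. It relies on double-precision floating point, so it should only be trusted for moderate n; the range used here is an assumption chosen to stay well inside that limit.

```python
import math

SQRT5 = math.sqrt(5)
PHI = (1 + SQRT5) / 2
PHI_HAT = (1 - SQRT5) / 2

def fib_exact(n):
    """F_n computed from the recurrence, in exact integer arithmetic."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_closed(n):
    """(phi^n - phi_hat^n) / sqrt(5), rounded to absorb floating-point error."""
    return round((PHI ** n - PHI_HAT ** n) / SQRT5)

def fib_dominant(n):
    """phi^n / sqrt(5), rounded to the nearest integer."""
    return round(PHI ** n / SQRT5)

for n in range(0, 40, 5):
    print(n, fib_exact(n), fib_closed(n), fib_dominant(n))
```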

Supplemental Lecture 4: Medians and Selection

Read: Chapter 9 of CLRS.

Selection: We have discussed recurrences and the divide-and-conquer method of solving problems. Today we will

give a rather surprising (and very tricky) algorithm which shows the power of these techniques.

The problem that we will consider is very easy to state, but surprisingly difﬁcult to solve optimally. Suppose

that you are given a set of n numbers. Deﬁne the rank of an element to be one plus the number of elements

that are smaller than this element. Since duplicate elements make our life more complex (by creating multiple

elements of the same rank), we will make the simplifying assumption that all the elements are distinct for now.

It will be easy to get around this assumption later. Thus, the rank of an element is its ﬁnal position if the set is

sorted. The minimum is of rank 1 and the maximum is of rank n.

Of particular interest in statistics is the median. If n is odd then the median is deﬁned to be the element of rank

(n + 1)/2. When n is even there are two natural choices, namely the elements of ranks n/2 and (n/2) + 1. In


statistics it is common to return the average of these two elements. We will deﬁne the median to be either of

these elements.

Medians are useful as measures of the central tendency of a set, especially when the distribution of values is

highly skewed. For example, the median income in a community is likely to be a more meaningful measure of

the central tendency than the average is, since if Bill Gates lives in your community then his gigantic income

may signiﬁcantly bias the average, whereas it cannot have a signiﬁcant inﬂuence on the median. They are also

useful, since in divide-and-conquer applications, it is often desirable to partition a set about its median value,

into two sets of roughly equal size. Today we will focus on the following generalization, called the selection

problem.

Selection: Given a set A of n distinct numbers and an integer k, 1 ≤ k ≤ n, output the element of A of rank k.

The selection problem can easily be solved in Θ(n log n) time, simply by sorting the numbers of A, and then

returning A[k]. The question is whether it is possible to do better. In particular, is it possible to solve this

problem in Θ(n) time? We will see that the answer is yes, and the solution is far from obvious.

The Sieve Technique: The reason for introducing this algorithm is that it illustrates a very important special case of

divide-and-conquer, which I call the sieve technique. We think of divide-and-conquer as breaking the problem

into a small number of smaller subproblems, which are then solved recursively. The sieve technique is a special

case, where the number of subproblems is just 1.

The sieve technique works in phases as follows. It applies to problems where we are interested in ﬁnding a

single item from a larger set of n items. We do not know which item is of interest; however, after doing some amount of analysis of the data, taking say $\Theta(n^k)$ time for some constant k, we find that although we still do not know what the desired item is, we can identify a large enough number of elements that cannot be the desired value, and these can be eliminated from further consideration. In particular "large enough" means that the number of items is at least some fixed constant fraction of n (e.g. n/2, n/3, 0.0001n). Then we solve the problem recursively on whatever items remain. Each of the resulting recursive solutions then does the same thing, eliminating a constant fraction of the remaining set.

Applying the Sieve to Selection: To see more concretely how the sieve technique works, let us apply it to the selection problem. Recall that we are given an array A[1..n] and an integer k, and want to find the k-th smallest

element of A. Since the algorithm will be applied inductively, we will assume that we are given a subarray

A[p..r] as we did in MergeSort, and we want to ﬁnd the kth smallest item (where k ≤ r − p + 1). The initial

call will be to the entire array A[1..n].

There are two principal algorithms for solving the selection problem, but they differ only in one step, which

involves judiciously choosing an item from the array, called the pivot element, which we will denote by x. Later

we will see how to choose x, but for now just think of it as a random element of A. We then partition A into

three parts. A[q] contains the element x, subarray A[p..q − 1] will contain all the elements that are less than x,

and A[q + 1..r] will contain all the elements that are greater than x. (Recall that we assumed that all the elements

are distinct.) Within each subarray, the items may appear in any order. This is illustrated below.

It is easy to see that the rank of the pivot x is q − p + 1 in A[p..r]. Let xRank = q − p + 1. If k = xRank, then the pivot is the kth smallest, and we may just return it. If k < xRank, then we know that we need to recursively search in A[p..q − 1], and if k > xRank then we need to recursively search A[q + 1..r]. In this latter case we have eliminated the xRank smallest elements (the left subarray and the pivot), so we want to find the element of rank k − xRank. Here is the complete pseudocode.

Selection by the Sieve Technique

    Select(array A, int p, int r, int k) {      // return kth smallest of A[p..r]
        if (p == r) return A[p]                 // only 1 item left, return it
        else {
            x = ChoosePivot(A, p, r)            // choose the pivot element
            q = Partition(A, p, r, x)           // partition <A[p..q-1], x, A[q+1..r]>
            xRank = q - p + 1                   // rank of the pivot
            if (k == xRank) return x            // the pivot is the kth smallest
            else if (k < xRank)
                return Select(A, p, q-1, k)     // select from left subarray
            else
                return Select(A, q+1, r, k-xRank) // select from right subarray
        }
    }

Notice that this algorithm satisfies the basic form of a sieve algorithm. It analyzes the data (by choosing the pivot element and partitioning) and it eliminates some part of the data set, and recurses on the rest. When k = xRank then we get lucky and eliminate everything. Otherwise we either eliminate the pivot and the right subarray or the pivot and the left subarray.

Fig. 70: Selection Algorithm. (Top: the array A[p..r] before and after partitioning about the pivot x, with A[p..q−1] < x and A[q+1..r] > x. Bottom: an example run with k = 6: partition about pivot 4 gives x_rnk = 4, so recurse with k = 6 − 4 = 2; partition about pivot 7 gives x_rnk = 3, so recurse with k = 2; partition about pivot 6 gives x_rnk = 2, and we are done.)

We will discuss the details of choosing the pivot and partitioning later, but assume for now that they both take Θ(n) time. The question that remains is how many elements did we succeed in eliminating? If x is the largest or smallest element in the array, then we may only succeed in eliminating one element with each phase. In fact, if x is one of the smallest elements of A or one of the largest, then we get into trouble, because we may only eliminate it and the few smaller or larger elements of A. Ideally x should have a rank that is neither too large nor too small.
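To experiment with the sieve structure of selection, here is a small Python sketch. It is not the in-place algorithm of the pseudocode: it assumes distinct elements, picks the pivot uniformly at random as a stand-in for ChoosePivot, and partitions by building new lists instead of rearranging A[p..r]. The function name `select` is ours.

```python
import random

def select(items, k):
    """Return the element of rank k (1-based) among distinct items."""
    x = random.choice(items)                  # stand-in for ChoosePivot
    smaller = [a for a in items if a < x]     # plays the role of A[p..q-1]
    larger = [a for a in items if a > x]      # plays the role of A[q+1..r]
    x_rank = len(smaller) + 1                 # rank of the pivot
    if k == x_rank:
        return x                              # the pivot is the kth smallest
    if k < x_rank:
        return select(smaller, k)             # search the left part
    return select(larger, k - x_rank)         # x_rank elements were eliminated

data = [5, 9, 2, 6, 4, 1, 3, 7]
print(select(data, 6))   # prints 6, the element of rank 6
```

Whatever pivots the random choice happens to produce, the answer is the same; only the number of phases, and hence the running time, depends on how well the pivots split the data.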

Let us suppose for now (optimistically) that we are able to design the procedure ChoosePivot in such a way that it eliminates exactly half the array with each phase, meaning that we recurse on the remaining n/2 elements. This would lead to the following recurrence.

$$T(n) = \begin{cases} 1 & \text{if } n = 1, \\ T(n/2) + n & \text{otherwise.} \end{cases}$$

We can solve this either by expansion (iteration) or the Master Theorem. If we expand this recurrence level by

level we see that we get the summation

$$T(n) = n + \frac{n}{2} + \frac{n}{4} + \cdots \le \sum_{i=0}^{\infty} \frac{n}{2^i} = n \sum_{i=0}^{\infty} \frac{1}{2^i}.$$

Recall the formula for the infinite geometric series. For any c such that |c| < 1, $\sum_{i=0}^{\infty} c^i = 1/(1 - c)$. Using this we have

$$T(n) \le 2n \in O(n).$$

(This only proves the upper bound on the running time, but it is easy to see that it takes at least Ω(n) time, so

the total running time is Θ(n).)

This is a bit counterintuitive. Normally you would think that in order to design a Θ(n) time algorithm you could

only make a single, or perhaps a constant number of passes over the data set. In this algorithm we make many

passes (it could be as many as lg n). However, because we eliminate a constant fraction of elements with each

phase, we get this convergent geometric series in the analysis, which shows that the total running time is indeed

linear in n. This lesson is well worth remembering. It is often possible to achieve running times in ways that

you would not expect.

Note that the assumption of eliminating half was not critical. If we eliminated even one per cent, then the

recurrence would have been T(n) = T(99n/100) +n, and we would have gotten a geometric series involving

99/100, which is still less than 1, implying a convergent series. Eliminating any constant fraction would have

been good enough.
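A quick numerical check of this claim is sketched below. The function name `sieve_cost` and the rounding-down of the surviving fraction are assumptions made for illustration; the function just adds up the phase costs n + n' + n'' + ... until one element remains.

```python
def sieve_cost(n, keep_num=1, keep_den=2):
    """Sum the per-phase costs when each phase costs the current size
    and keeps keep_num/keep_den of the elements (rounded down)."""
    total = 0
    while n >= 1:
        total += n                    # Theta(n) work for this phase
        if n == 1:                    # base case T(1) = 1
            break
        n = n * keep_num // keep_den  # elements that survive the phase
    return total


# Keeping half: total work stays below 2n.
print(sieve_cost(1024))
# Keeping 99%: total work stays below 100n.
print(sieve_cost(10**6, 99, 100) <= 100 * 10**6)
```

The bound in each case is n/(1 − c), where c is the fraction kept, which is the geometric series formula quoted above.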

Choosing the Pivot: There are two issues that we have left unresolved. The ﬁrst is how to choose the pivot element,

and the second is how to partition the array. Both need to be solved in Θ(n) time. The second problem is a

rather easy programming exercise. Later, when we discuss QuickSort, we will discuss partitioning in detail.

For the rest of the lecture, let’s concentrate on how to choose the pivot. Recall that before we said that we might

think of the pivot as a random element of A. Actually this is not such a bad idea. Let’s see why.

The key is that we want the procedure to eliminate at least some constant fraction of the array after each partitioning step. Let’s consider the top of the recurrence, when we are given A[1..n]. Suppose that the pivot x turns

out to be of rank q in the array. The partitioning algorithm will split the array into A[1..q − 1] < x, A[q] = x

and A[q + 1..n] > x. If k = q, then we are done. Otherwise, we need to search one of the two subarrays. They

are of sizes q − 1 and n − q, respectively. The subarray that contains the kth smallest element will generally

depend on what k is, so in the worst case, k will be chosen so that we have to recurse on the larger of the two

subarrays. Thus if q > n/2, then we may have to recurse on the left subarray of size q −1, and if q < n/2, then

we may have to recurse on the right subarray of size n −q. In either case, we are in trouble if q is very small, or

if q is very large.

If we could select q so that it is roughly of middle rank, then we will be in good shape. For example, if

n/4 ≤ q ≤ 3n/4, then the larger subarray will never be larger than 3n/4. Earlier we said that we might think

of the pivot as a random element of the array A. Actually this works pretty well in practice. The reason is that


roughly half of the elements lie between ranks n/4 and 3n/4, so picking a random element as the pivot will

succeed about half the time to eliminate at least n/4. Of course, we might be continuously unlucky, but a careful

analysis will show that the expected running time is still Θ(n). We will return to this later.

Instead, we will describe a rather complicated method for computing a pivot element that achieves the desired

properties. Recall that we are given an array A[1..n], and we want to compute an element x whose rank is

(roughly) between n/4 and 3n/4. We will have to describe this algorithm at a very high level, since the details

are rather involved. Here is the description for ChoosePivot:

Groups of 5: Partition A into groups of 5 elements, e.g. A[1..5], A[6..10], A[11..15], etc. There will be exactly m = ⌈n/5⌉ such groups (the last one might have fewer than 5 elements). This can easily be done in Θ(n) time.

Group medians: Compute the median of each group of 5. There will be m group medians. We do not need an intelligent algorithm to do this, since each group has only a constant number of elements. For example, we could just BubbleSort each group and take the middle element. Each will take Θ(1) time, and repeating this ⌈n/5⌉ times will give a total running time of Θ(n). Copy the group medians to a new array B.

Median of medians: Compute the median of the group medians. For this, we will have to call the selection algorithm recursively on B, e.g. Select(B, 1, m, k), where m = ⌈n/5⌉, and k = ⌊(m + 1)/2⌋. Let x be this median of medians. Return x as the desired pivot.
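The three steps can be sketched in Python as follows. This is a hedged illustration rather than the notes' own implementation: the names `median_of_medians` and `mom_select` are hypothetical, the code copies lists instead of partitioning in place, and it uses Python's `sorted` on each group of at most 5 elements where the notes suggest BubbleSort.

```python
def median_of_medians(items):
    """Return a pivot: the median of the medians of groups of 5."""
    # Groups of 5 (the last group may have fewer than 5 elements).
    groups = [items[i:i + 5] for i in range(0, len(items), 5)]
    # Each group has constant size, so sorting it takes constant time.
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Recursive selection on the array B of group medians,
    # with k = floor((m + 1) / 2).
    return mom_select(medians, (len(medians) + 1) // 2)


def mom_select(items, k):
    """Return the k-th smallest element of items (k = 1 is the minimum)."""
    if not 1 <= k <= len(items):
        raise ValueError("k out of range")
    items = list(items)
    while len(items) > 1:
        x = median_of_medians(items)
        left = [v for v in items if v < x]
        right = [v for v in items if v > x]
        n_equal = len(items) - len(left) - len(right)
        if k <= len(left):
            items = left                 # recurse on the left part
        elif k <= len(left) + n_equal:
            return x                     # the pivot is the answer
        else:
            k -= len(left) + n_equal     # recurse on the right part
            items = right
    return items[0]
```

The two functions are mutually recursive; the recursion bottoms out because a list of at most 5 elements yields a single group median, and selecting from a one-element list returns immediately.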

The algorithm is illustrated in the ﬁgure below. To establish the correctness of this procedure, we need to argue

that x satisﬁes the desired rank properties.

[Figure: a sample array arranged in columns of 5 ("Group"), each column then ordered to expose its median ("Get group medians"), and the columns arranged by their medians ("Get median of medians"; sorting of group medians is not really performed).]

Fig. 71: Choosing the Pivot. 30 is the final pivot.

Lemma: The element x is of rank at least n/4 and at most 3n/4 in A.

Proof: We will show that x is of rank at least n/4. The other part of the proof is essentially symmetrical. To

do this, we need to show that there are at least n/4 elements that are less than or equal to x. This is a bit

complicated, due to the ﬂoor and ceiling arithmetic, so to simplify things we will assume that n is evenly

divisible by 5. Consider the groups shown in the tabular form above. Observe that at least half of the group

medians are less than or equal to x. (Because x is their median.) And for each group median, there are

three elements that are less than or equal to this median within its group (because it is the median of its


group). Therefore, there are at least 3((n/5)/2) = 3n/10 ≥ n/4 elements that are less than or equal to x in the entire array.

Analysis: The last order of business is to analyze the running time of the overall algorithm. We achieved the main goal, namely that of eliminating a constant fraction (at least 1/4) of the remaining list at each stage of the algorithm. The recursive call in Select() will be made to a list no larger than 3n/4. However, in order to achieve this, within ChoosePivot() we needed to make a recursive call to Select() on an array B consisting of ⌈n/5⌉ elements. Everything else took only Θ(n) time. As usual, we will ignore floors and ceilings, and write the Θ(n) as n for concreteness. The running time is

$$T(n) \le \begin{cases} 1 & \text{if } n = 1, \\ T(n/5) + T(3n/4) + n & \text{otherwise.} \end{cases}$$

This is a very strange recurrence because it involves a mixture of different fractions (n/5 and 3n/4). This

mixture will make it impossible to use the Master Theorem, and difﬁcult to apply iteration. However, this is a

good place to apply constructive induction. We know we want an algorithm that runs in Θ(n) time.

Theorem: There is a constant c, such that T(n) ≤ cn.

Proof: (by strong induction on n)

Basis: (n = 1) In this case we have T(n) = 1, and so T(n) ≤ cn as long as c ≥ 1.

Step: We assume that T(n′) ≤ cn′ for all n′ < n. We will then show that T(n) ≤ cn. By definition we have

$$T(n) = T(n/5) + T(3n/4) + n.$$

Since n/5 and 3n/4 are both less than n, we can apply the induction hypothesis, giving

$$T(n) \le c\,\frac{n}{5} + c\,\frac{3n}{4} + n = cn\left(\frac{1}{5} + \frac{3}{4}\right) + n = cn\,\frac{19}{20} + n = n\left(\frac{19c}{20} + 1\right).$$

This last expression will be ≤ cn, provided that we select c such that c ≥ (19c/20) + 1. Solving for

c we see that this is true provided that c ≥ 20.

Combining the constraints that c ≥ 1, and c ≥ 20, we see that by letting c = 20, we are done.
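The bound can also be checked numerically. The sketch below evaluates the recurrence directly; rounding n/5 and 3n/4 down and treating T(0) like T(1) are assumptions added here so that the recurrence is defined on all integers (the proof itself ignores floors and ceilings).

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def T(n):
    """T(n) = T(n/5) + T(3n/4) + n with T(1) = 1 (fractions rounded down)."""
    if n <= 1:
        return 1
    return T(n // 5) + T(3 * n // 4) + n


# Largest observed ratio T(n)/n over a range of n; it should not exceed 20.
print(max(T(n) / n for n in range(1, 20001)))
```

The observed ratio stays below the constant c = 20 from the proof, as the induction guarantees.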

A natural question is why did we pick groups of 5? If you look at the proof above, you will see that it works for

any value that is strictly greater than 4. (You might try it replacing the 5 with 3, 4, or 6 and see what happens.)

Supplemental Lecture 5: Analysis of BucketSort

Probabilistic Analysis of BucketSort: We begin with a quick-and-dirty analysis of bucketsort. Since there are n

buckets, and the items fall uniformly between them, we would expect a constant number of items per bucket.

Thus, the expected insertion time for each bucket is only a constant. Therefore the expected running time of

the algorithm is Θ(n). This quick-and-dirty analysis is probably good enough to convince yourself of this

algorithm’s basic efﬁciency. A careful analysis involves understanding a bit about probabilistic analyses of

algorithms. Since we haven’t done any probabilistic analyses yet, let’s try doing this one. (This one is rather

typical.)

The ﬁrst thing to do in a probabilistic analysis is to deﬁne a random variable that describes the essential quantity

that determines the execution time. A discrete random variable can be thought of as a variable that takes on some


set of discrete values with certain probabilities. More formally, it is a function that maps some discrete sample space (the set of possible outcomes) to the reals (the values that the variable takes on). For 0 ≤ i ≤ n − 1, let $X_i$ denote the random variable that indicates the number of elements assigned to the i-th bucket.

Since the distribution is uniform, all of the random variables $X_i$ have the same probability distribution, so we may as well talk about a single random variable X, which will work for any bucket. Since we are using a quadratic time algorithm to sort the elements of each bucket, we are interested in the expected sorting time, which is $\Theta(X^2)$. So this leads to the key question: what is the expected value of $X^2$, denoted $E[X^2]$?

Because the elements are assumed to be uniformly distributed, each element has an equal probability of going

into any bucket, or in particular, it has a probability of p = 1/n of going into the ith bucket. So how many items

do we expect will wind up in bucket i? We can analyze this by thinking of each element of A as being represented

by a coin ﬂip (with a biased coin, which has a different probability of heads and tails). With probability p = 1/n

the number goes into bucket i, which we will interpret as the coin coming up heads. With probability 1 − 1/n

the item goes into some other bucket, which we will interpret as the coin coming up tails. Since we assume

that the elements of A are independent of each other, X is just the total number of heads we see after making n

tosses with this (biased) coin.

The number of times that a heads event occurs, given n independent trials in which each trial has two possible

outcomes is a well-studied problem in probability theory. Such trials are called Bernoulli trials (named after the

Swiss mathematician James Bernoulli). If p is the probability of getting a head, then the probability of getting k

heads in n tosses is given by the following important formula

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad\text{where}\quad \binom{n}{k} = \frac{n!}{k!\,(n - k)!}.$$

Although this looks messy, it is not too hard to see where it comes from. Basically $p^k$ is the probability of tossing k heads, $(1 - p)^{n-k}$ is the probability of tossing n − k tails, and $\binom{n}{k}$ is the total number of different ways that the k heads could be distributed among the n tosses. This probability distribution (as a function of k, for a given n and p) is called the binomial distribution, and is denoted b(k; n, p).

If you consult a standard textbook on probability and statistics, then you will see the two important facts that we

need to know about the binomial distribution. Namely, that its mean value E[X] and its variance Var[X] are

$$E[X] = np \quad\text{and}\quad \mathrm{Var}[X] = E[X^2] - E^2[X] = np(1 - p).$$

We want to determine $E[X^2]$. By the above formulas and the fact that p = 1/n we can derive this as

$$E[X^2] = \mathrm{Var}[X] + E^2[X] = np(1 - p) + (np)^2 = \frac{n}{n}\left(1 - \frac{1}{n}\right) + \left(\frac{n}{n}\right)^2 = 2 - \frac{1}{n}.$$

Thus, for large n the expected time to insert the items into any one of the linked lists is just a shade less than 2. Summing

up over all n buckets, gives a total running time of Θ(2n) = Θ(n). This is exactly what our quick-and-dirty

analysis gave us, but now we know it is true with conﬁdence.
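The closed form for E[X²] can be cross-checked by summing k² against the binomial probabilities directly. The function name `expected_square` is an assumption for illustration; `math.comb` requires Python 3.8 or later.

```python
from math import comb


def expected_square(n):
    """E[X^2] for X ~ binomial(n, p = 1/n), summed from the definition."""
    p = 1.0 / n
    return sum(k * k * comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1))


for n in (10, 100):
    # The two printed values should agree up to floating-point error.
    print(n, expected_square(n), 2 - 1 / n)
```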

Supplemental Lecture 6: Long Integer Multiplication

Read: This material on integer multiplication is not covered in CLRS.

Long Integer Multiplication: The following little algorithm shows a bit more about the surprising applications of

divide-and-conquer. The problem that we want to consider is how to perform arithmetic on long integers, and

multiplication in particular. The reason for doing arithmetic on long numbers stems from cryptography. Most

techniques for encryption are based on number-theoretic techniques. For example, the character string to be

encrypted is converted into a sequence of numbers, and encryption keys are stored as long integers. Efﬁcient


encryption and decryption depends on being able to perform arithmetic on long numbers, typically containing

hundreds of digits.

Addition and subtraction on large numbers is relatively easy. If n is the number of digits, then these algorithms

run in Θ(n) time. (Go back and analyze your solution to the problem on Homework 1). But the standard algorithm for multiplication runs in $\Theta(n^2)$ time, which can be quite costly when lots of long multiplications are needed.

This raises the question of whether there is a more efﬁcient way to multiply two very large numbers. It would

seem surprising if there were, since for centuries people have used the same algorithm that we all learn in grade

school. In fact, we will see that it is possible.

Divide-and-Conquer Algorithm: We know the basic grade-school algorithm for multiplication. We normally think of this algorithm as applying on a digit-by-digit basis, but if we partition each n-digit number into two “super digits” of roughly n/2 digits each, the same multiplication rule still applies.

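[Figure: the n-digit numbers A and B are split into halves of n/2 digits each, A into w and x and B into y and z; the partial products xz, wz, xy, and wy are combined into the product as wy, wz + xy, and xz.]

Fig. 72: Long integer multiplication.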

To avoid complicating things with ﬂoors and ceilings, let’s just assume that the number of digits n is a power of

2. Let A and B be the two numbers to multiply. Let A[0] denote the least significant digit and let A[n−1] denote

the most signiﬁcant digit of A. Because of the way we write numbers, it is more natural to think of the elements

of A as being indexed in decreasing order from left to right as A[n −1..0] rather than the usual A[0..n −1].

Let m = n/2. Let

w = A[n −1..m] x = A[m−1..0] and

y = B[n −1..m] z = B[m−1..0].

If we think of w, x, y and z as n/2 digit numbers, we can express A and B as

$$A = w \cdot 10^m + x, \qquad B = y \cdot 10^m + z,$$

and their product is

$$\mathrm{mult}(A, B) = \mathrm{mult}(w, y)\,10^{2m} + (\mathrm{mult}(w, z) + \mathrm{mult}(x, y))\,10^m + \mathrm{mult}(x, z).$$

The operation of multiplying by $10^m$ should be thought of as simply shifting the number over by m positions to the left, and so is not really a multiplication. Observe that all the additions involve numbers with Θ(n) digits, and so they take Θ(n) time each. Thus, we can express the multiplication of two long integers as the result of four products on integers of roughly half the length of the original, and a constant number of additions and shifts, each taking Θ(n) time. This suggests that if we were to implement this algorithm, its running time would be given by the following recurrence

$$T(n) = \begin{cases} 1 & \text{if } n = 1, \\ 4T(n/2) + n & \text{otherwise.} \end{cases}$$


If we apply the Master Theorem, we see that a = 4, b = 2, k = 1, and $a > b^k$, implying that Case 1 holds and the running time is $\Theta(n^{\lg 4}) = \Theta(n^2)$. Unfortunately, this is no better than the standard algorithm.

Faster Divide-and-Conquer Algorithm: Even though the above exercise appears to have gotten us nowhere, it actually has given us an important insight. It shows that the critical element is the number of multiplications on

numbers of size n/2. The number of additions (as long as it is a constant) does not affect the running time. So,

if we could ﬁnd a way to arrive at the same result algebraically, but by trading off multiplications in favor of

additions, then we would have a more efﬁcient algorithm. (Of course, we cannot simulate multiplication through

repeated additions, since the number of additions must be a constant, independent of n.)

The key turns out to be an algebraic “trick”. The quantities that we need to compute are C = wy, D = xz,

and E = (wz + xy). Above, it took us four multiplications to compute these. However, observe that if instead

we compute the following quantities, we can get everything we want, using only three multiplications (but with

more additions and subtractions).

C = mult(w, y)

D = mult(x, z)

E = mult((w +x), (y +z)) −C −D = (wy +wz +xy +xz) −wy −xz = (wz +xy).

Finally we have

$$\mathrm{mult}(A, B) = C \cdot 10^{2m} + E \cdot 10^m + D.$$

Altogether we perform 3 multiplications, 4 additions, and 2 subtractions, all of numbers with n/2 digits. We

still need to shift the terms into their proper ﬁnal positions. The additions, subtractions, and shifts take Θ(n)

time in total. So the total running time is given by the recurrence:

$$T(n) = \begin{cases} 1 & \text{if } n = 1, \\ 3T(n/2) + n & \text{otherwise.} \end{cases}$$

Now when we apply the Master Theorem, we have a = 3, b = 2 and k = 1, yielding $T(n) \in \Theta(n^{\lg 3}) \approx \Theta(n^{1.585})$.
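The three-multiplication scheme can be sketched in Python as follows. Several simplifications here are assumptions rather than part of the notes: the numbers are ordinary Python integers instead of digit arrays, the split point m is taken from the decimal string length (so n need not be a power of 2), and `10 ** m` stands in for a shift by m digit positions. The function name `karatsuba` is also an assumption; the notes do not name the algorithm.

```python
def karatsuba(a, b):
    """Multiply non-negative integers using three recursive products."""
    if a < 10 or b < 10:                  # a single-digit operand: multiply directly
        return a * b
    m = max(len(str(a)), len(str(b))) // 2
    w, x = divmod(a, 10 ** m)             # a = w * 10^m + x
    y, z = divmod(b, 10 ** m)             # b = y * 10^m + z
    c = karatsuba(w, y)                   # C = wy
    d = karatsuba(x, z)                   # D = xz
    e = karatsuba(w + x, y + z) - c - d   # E = wz + xy
    return c * 10 ** (2 * m) + e * 10 ** m + d


print(karatsuba(1234, 5678) == 1234 * 5678)
```

Python's built-in multiplication is used only in the base case and for the powers of 10, so the recursion structure matches the recurrence above.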

Is this really an improvement? This algorithm carries a larger constant factor because of the overhead of recursion and the additional arithmetic operations. But asymptotics says that if n is large enough, then this algorithm will be superior. For example, if we assume that the clever algorithm has overheads that are 5 times greater than the simple algorithm (e.g. $5n^{1.585}$ versus $n^2$) then this algorithm beats the simple algorithm for n ≥ 50. If the overhead was 10 times larger, then the crossover would occur for n ≥ 260. Although this may seem like a very large number, recall that in cryptography applications, encryption keys of this length and longer are quite reasonable.
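The crossover points quoted above follow from solving $c \cdot n^{\lg 3} = n^2$ for n, which gives $n = c^{1/(2 - \lg 3)}$. A small sketch (the function name `crossover` is an assumption) shows the calculation:

```python
from math import log2


def crossover(overhead):
    """Value of n at which overhead * n^(lg 3) equals n^2."""
    return overhead ** (1.0 / (2.0 - log2(3)))


print(crossover(5))    # a little under 50
print(crossover(10))   # a little under 260
```

The rounded figures 50 and 260 in the text are consistent with these values.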

Supplemental Lecture 7: Dynamic Programming: 0–1 Knapsack Problem

Read: The introduction to Chapter 16 in CLR. The material on the Knapsack Problem is not presented in our text, but

is brieﬂy discussed in Section 17.2.

0-1 Knapsack Problem: Imagine that a burglar breaks into a museum and finds n items. Let $v_i$ denote the value of the i-th item, and let $w_i$ denote the weight of the i-th item. The burglar carries a knapsack capable of holding total weight W. The burglar wishes to carry away the most valuable subset of items subject to the weight constraint.

For example, a burglar would rather steal diamonds before gold because the value per pound is better. But he

would rather steal gold before lead for the same reason. We assume that the burglar cannot take a fraction of an

object, so he/she must make a decision to take the object entirely or leave it behind. (There is a version of the


problem where the burglar can take a fraction of an object for a fraction of the value and weight. This is much

easier to solve.)

More formally, given $\langle v_1, v_2, \ldots, v_n \rangle$ and $\langle w_1, w_2, \ldots, w_n \rangle$, and W > 0, we wish to determine the subset $T \subseteq \{1, 2, \ldots, n\}$ (of objects to “take”) that maximizes

$$\sum_{i \in T} v_i, \quad\text{subject to}\quad \sum_{i \in T} w_i \le W.$$

Let us assume that the $v_i$’s, $w_i$’s and W are all positive integers. It turns out that this problem is NP-complete, and so we cannot really hope to find an efficient solution. However if we make the same sort of assumption that we made in counting sort, we can come up with an efficient solution.

We assume that the $w_i$’s are small integers, and that W itself is a small integer. We show that this problem

can be solved in O(nW) time. (Note that this is not very good if W is a large integer. But if we truncate our

numbers to lower precision, this gives a reasonable approximation algorithm.)

Here is how we solve the problem. We construct an array V[0..n, 0..W]. For 1 ≤ i ≤ n, and 0 ≤ j ≤ W, the entry V[i, j] will store the maximum value of any subset of objects {1, 2, ..., i} that can fit into a knapsack of weight j. If we can compute all the entries of this array, then the array entry V[n, W] will contain the maximum value of all n objects that can fit into the entire knapsack of weight W.

To compute the entries of the array V we will employ an inductive approach. As a basis, observe that V[0, j] = 0 for 0 ≤ j ≤ W since if we have no items then we have no value. We consider two cases:

Leave object i: If we choose to not take object i, then the optimal value will come about by considering how to fill a knapsack of size j with the remaining objects {1, 2, ..., i − 1}. This is just V[i − 1, j].

Take object i: If we take object i, then we gain a value of $v_i$ but have used up $w_i$ of our capacity. With the remaining $j - w_i$ capacity in the knapsack, we can fill it in the best possible way with objects {1, 2, ..., i − 1}. This is $v_i + V[i-1, j-w_i]$. This is only possible if $w_i \le j$.

Since these are the only two possibilities, we can see that we have the following rule for constructing the array

V . The ranges on i and j are i ∈ [0..n] and j ∈ [0..W].

$$V[0, j] = 0$$

$$V[i, j] = \begin{cases} V[i-1, j] & \text{if } w_i > j, \\ \max(V[i-1, j],\; v_i + V[i-1, j-w_i]) & \text{if } w_i \le j. \end{cases}$$

The ﬁrst line states that if there are no objects, then there is no value, irrespective of j. The second line

implements the rule above.

It is very easy to take these rules and produce an algorithm that computes the maximum value for the knapsack

in time proportional to the size of the array, which is O((n + 1)(W + 1)) = O(nW). The algorithm is given

below.

An example is shown in the ﬁgure below. The ﬁnal output is V [n, W] = V [4, 10] = 90. This reﬂects the

selection of items 2 and 4, of values $40 and $50, respectively and weights 4 + 3 ≤ 10.

The only missing detail is what items should we select to achieve the maximum. We will leave this as an

exercise. The key is to record for each entry V [i, j] in the matrix whether we got this entry by taking the ith

item or leaving it. With this information, it is possible to reconstruct the optimum knapsack contents.


0-1 Knapsack Problem

    KnapSack(v[1..n], w[1..n], n, W) {
        allocate V[0..n][0..W];
        for j = 0 to W do V[0, j] = 0;                  // initialization
        for i = 1 to n do {
            for j = 0 to W do {
                leave_val = V[i-1, j];                  // total value if we leave i
                if (j >= w[i])                          // enough capacity to take i
                    take_val = v[i] + V[i-1, j - w[i]]; // total value if we take i
                else
                    take_val = -INFINITY;               // cannot take i
                V[i,j] = max(leave_val, take_val);      // final value is max
            }
        }
        return V[n, W];
    }

Values of the objects are ⟨10, 40, 30, 50⟩.

Weights of the objects are ⟨5, 4, 6, 3⟩.

                     Capacity → j =  0   1   2   3   4   5   6   7   8   9  10
    Item  Value  Weight              0   0   0   0   0   0   0   0   0   0   0
      1     10      5                0   0   0   0   0  10  10  10  10  10  10
      2     40      4                0   0   0   0  40  40  40  40  40  50  50
      3     30      6                0   0   0   0  40  40  40  40  40  50  70
      4     50      3                0   0   0  50  50  50  50  90  90  90  90

Final result is V[4, 10] = 90 (for taking items 2 and 4).

Fig. 73: 0–1 Knapsack Example.
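A Python sketch of the table computation, together with one possible way to recover the chosen items, is given below. The reconstruction step is an assumption added for illustration: instead of recording a take/leave flag per entry, it walks back from V[n][W] and infers that item i was taken whenever V[i][j] differs from V[i−1][j]. Items are numbered from 1 as in the notes, while the Python lists are 0-based.

```python
def knapsack(values, weights, capacity):
    """Return (best value, sorted list of 1-based item indices taken)."""
    n = len(values)
    V = [[0] * (capacity + 1) for _ in range(n + 1)]   # V[0][j] = 0
    for i in range(1, n + 1):
        v_i, w_i = values[i - 1], weights[i - 1]
        for j in range(capacity + 1):
            V[i][j] = V[i - 1][j]                      # leave item i
            if w_i <= j:                               # enough capacity to take i
                V[i][j] = max(V[i][j], v_i + V[i - 1][j - w_i])
    # Walk back through the table to recover one optimal subset.
    taken, j = [], capacity
    for i in range(n, 0, -1):
        if V[i][j] != V[i - 1][j]:                     # value changed: i was taken
            taken.append(i)
            j -= weights[i - 1]
    taken.reverse()
    return V[n][capacity], taken


print(knapsack([10, 40, 30, 50], [5, 4, 6, 3], 10))   # (90, [2, 4])
```

On the example of Fig. 73 this reproduces the value 90 and the selection of items 2 and 4.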


Supplemental Lecture 8: Dynamic Programming: Memoization

Read: Section 15.3 of CLRS.

Recursive Implementation: We have described dynamic programming as a method that involves the “bottom-up”

computation of a table. However, the recursive formulations that we have derived have been set up in a “top-

down” manner. Must the computation proceed bottom-up? Consider the following recursive implementation of

the chain-matrix multiplication algorithm. The call Rec-Matrix-Chain(p, i, j) computes and returns

the value of m[i, j]. The initial call is Rec-Matrix-Chain(p, 1, n). We only consider the cost here.

Recursive Chain Matrix Multiplication

    Rec-Matrix-Chain(array p, int i, int j) {
        if (i == j) m[i,j] = 0;                         // basis case
        else {
            m[i,j] = INFINITY;                          // initialize
            for k = i to j-1 do {                       // try all splits
                cost = Rec-Matrix-Chain(p, i, k) +
                       Rec-Matrix-Chain(p, k+1, j) + p[i-1]*p[k]*p[j];
                if (cost < m[i,j]) m[i,j] = cost;       // update if better
            }
        }
        return m[i,j];                                  // return final cost
    }

(Note that the table m[1..n, 1..n] is not really needed. We show it just to make the connection with the earlier

version clearer.) This version of the procedure certainly looks much simpler, and more closely resembles the

recursive formulation that we gave previously for this problem. So, what is wrong with this?

The answer is the running time is much higher than the $\Theta(n^3)$ algorithm that we gave before. In fact, we will see that its running time is exponential in n. This is unacceptably slow.

Let T(n) denote the running time of this algorithm on a sequence of matrices of length n. (That is, n = j−i+1.)

If i = j then we have a sequence of length 1, and the time is Θ(1). Otherwise, we do Θ(1) work and then

consider all possible ways of splitting the sequence of length n into two sequences, one of length k and the other

of length n −k, and invoke the procedure recursively on each one. So we get the following recurrence, deﬁned

for n ≥ 1. (We have replaced the Θ(1)’s with the constant 1.)

$$T(n) = \begin{cases} 1 & \text{if } n = 1, \\ 1 + \sum_{k=1}^{n-1} (T(k) + T(n - k)) & \text{if } n \ge 2. \end{cases}$$

Claim: $T(n) \ge 2^{n-1}$.

Proof: The proof is by induction on n. Clearly this is true for n = 1, since $T(1) = 1 = 2^0$. In general, for n ≥ 2, the induction hypothesis is that $T(m) \ge 2^{m-1}$ for all m < n. Using this we have

$$\begin{aligned}
T(n) &= 1 + \sum_{k=1}^{n-1} (T(k) + T(n - k)) \;\ge\; 1 + \sum_{k=1}^{n-1} T(k) \\
     &\ge 1 + \sum_{k=1}^{n-1} 2^{k-1} \;=\; 1 + \sum_{k=0}^{n-2} 2^k \\
     &= 1 + (2^{n-1} - 1) \;=\; 2^{n-1}.
\end{aligned}$$

In the first line we simply ignored the T(n − k) term, in the second line we applied the induction hypothesis, and in the last line we applied the formula for the geometric series.
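To see how loose the lower bound is, the recurrence can be tabulated directly. The sketch below (with the hypothetical name `recurrence_T`) fills in T(1), T(2), ... bottom-up. Since T(n) = 1 + 2(T(1) + ... + T(n − 1)), each value is three times the previous one, so the recurrence actually solves to $3^{n-1}$, comfortably above the $2^{n-1}$ of the claim.

```python
def recurrence_T(n):
    """Evaluate T(n) = 1 + sum_{k=1}^{n-1} (T(k) + T(n-k)), with T(1) = 1."""
    T = [0, 1]                      # T[0] is unused padding
    for size in range(2, n + 1):
        T.append(1 + sum(T[k] + T[size - k] for k in range(1, size)))
    return T[n]


for n in range(1, 8):
    print(n, recurrence_T(n), 2 ** (n - 1))
```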


Why is this so much worse than the dynamic programming version? If you “unravel” the recursive calls on a

reasonably long example, you will see that the procedure is called repeatedly with the same arguments. The

bottom-up version evaluates each entry exactly once.

Memoization: Is it possible to retain the nice top-down structure of the recursive solution, while keeping the same $O(n^3)$ efficiency of the bottom-up version? The answer is yes, through a technique called memoization. Here is the idea. Let’s reconsider the function Rec-Matrix-Chain() given above. Its job is to compute m[i, j], and return its value. As noted above, the main problem with the procedure is that it recomputes the same entries over and over. So, we will fix this by allowing the procedure to compute each entry exactly once. One way to do this is to initialize every entry to some special value (e.g. UNDEFINED). Once an entry’s value has been computed, it is never recomputed.

Memoized Chain Matrix Multiplication

    Mem-Matrix-Chain(array p, int i, int j) {
        if (m[i,j] != UNDEFINED) return m[i,j];         // already defined
        else if (i == j) m[i,j] = 0;                    // basis case
        else {
            m[i,j] = INFINITY;                          // initialize
            for k = i to j-1 do {                       // try all splits
                cost = Mem-Matrix-Chain(p, i, k) +
                       Mem-Matrix-Chain(p, k+1, j) + p[i-1]*p[k]*p[j];
                if (cost < m[i,j]) m[i,j] = cost;       // update if better
            }
        }
        return m[i,j];                                  // return final cost
    }

This version runs in $O(n^3)$ time. Intuitively, this is because each of the $O(n^2)$ table entries is only computed once, and the work needed to compute one table entry (most of it in the for-loop) is at most O(n).

Memoization is not usually used in practice, since it is generally slower than the bottom-up method. However,

in some DP problems, many of the table entries are simply not needed, and so bottom-up computation may

compute entries that are never needed. In these cases memoization may be a good idea. If you know that

most of the table will not be needed, here is a way to save space. Rather than storing the whole table explicitly

as an array, you can store the “deﬁned” entries of the table in a hash table, using the index pair (i, j) as the hash

key. (See Chapter 11 in CLRS for more information on hashing.)
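The hash-table variant might look as follows in Python, where a dictionary keyed by the pair (i, j) plays the role of the hash table and a missing key plays the role of UNDEFINED. The function name `matrix_chain_cost` and the nested helper `m` are assumptions for illustration; as in the notes, matrix i has dimensions p[i−1] × p[i] and only the cost is computed.

```python
def matrix_chain_cost(p):
    """Minimum number of scalar multiplications for the chain A1 ... An."""
    memo = {}                               # (i, j) -> m[i, j], filled on demand

    def m(i, j):
        if i == j:                          # basis case: a single matrix
            return 0
        if (i, j) not in memo:              # entry still undefined: compute it once
            memo[(i, j)] = min(
                m(i, k) + m(k + 1, j) + p[i - 1] * p[k] * p[j]
                for k in range(i, j)        # try all splits
            )
        return memo[(i, j)]

    n = len(p) - 1
    return m(1, n) if n >= 1 else 0


print(matrix_chain_cost([10, 100, 5, 50]))   # 7500
```

For this problem every entry m[i, j] with i < j ends up in the dictionary, so the space saving mentioned above only materializes in problems where the recursion touches a small part of the table.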

Supplemental Lecture 9: Articulation Points and Biconnectivity

Read: This material is not covered in CLR (except as Problem 23–2).

Articulation Points and Biconnected Graphs: Today we discuss another application of DFS, this time to a problem

on undirected graphs. Let G = (V, E) be a connected undirected graph. Consider the following deﬁnitions.

Articulation Point (or Cut Vertex): Is any vertex whose removal (together with the removal of any incident

edges) results in a disconnected graph.

Bridge: Is an edge whose removal results in a disconnected graph.

Biconnected: A graph is biconnected if it contains no articulation points. (In general a graph is k-connected, if

k vertices must be removed to disconnect the graph.)

Biconnected graphs and articulation points are of great interest in the design of network algorithms, because

these are the “critical” points, whose failure will result in the network becoming disconnected.


[Figure: an example undirected graph on vertices a through j with a bridge and an articulation point marked, shown next to its decomposition into biconnected components.]

Fig. 74: Articulation Points and Bridges

Last time we observed that the notion of mutual reachability partitioned the vertices of a digraph into equivalence classes. We would like to do the same thing here. We say that two edges $e_1$ and $e_2$ are cocyclic if either $e_1 = e_2$ or if there is a simple cycle that contains both edges. It is not too hard to verify that this defines an equivalence relation on the edges of a graph. Notice that if two edges are cocyclic, then there are essentially two different ways of getting from one edge to the other (by going around the cycle each way).

Biconnected components: The biconnected components of a graph are the equivalence classes of the cocyclicity relation.

Notice that unlike strongly connected components of a digraph (which form a partition of the vertex set) the

biconnected components of a graph form a partition of the edge set. You might think for a while why this is so.

We give an algorithm for computing articulation points. An algorithm for computing bridges is a simple modification to this procedure.

Articulation Points and DFS: In order to determine the articulation points of an undirected graph, we will call depth-

ﬁrst search, and use the tree structure provided by the search to aid us. In particular, let us ask ourselves if a

vertex u is an articulation point, how would we know it by its structure in the DFS tree?

We assume that G is connected (if not, we can apply this algorithm to each individual connected component).

So we assume there is only one tree in the DFS forest. Because G is undirected, the DFS tree has a simpler structure.

First off, we cannot distinguish between forward edges and back edges, and we just call them back edges. Also,

there are no cross edges. (You should take a moment to convince yourself why this is true.)

For now, let us consider the typical case of a vertex u, where u is not a leaf and u is not the root. Let $v_1, v_2, \ldots, v_k$ be the children of u. For each child there is a subtree of the DFS tree rooted at this child. If for some child, there is no back edge from its subtree going to a proper ancestor of u, then if we were to remove u, this subtree would become disconnected from the rest of the graph, and hence u is an articulation point. On the other hand, if every one of the subtrees rooted at the children of u has a back edge to a proper ancestor of u, then if u is removed, the graph remains connected (the back edges hold everything together). This leads to the following.

Observation 1: An internal vertex u of the DFS tree (other than the root) is an articulation point if and only if

there exists a subtree rooted at a child of u such that there is no back edge from any vertex in this subtree

to a proper ancestor of u.

Please check this condition carefully to see that you understand it. In particular, notice that the condition for

whether u is an articulation point depends on a test applied to its children. This is the most common source of

confusion for this algorithm.

What about the leaves? If u is a leaf, can it be an articulation point? Answer: No, because when you delete a

leaf from a tree, the rest of the tree remains connected, thus even ignoring the back edges, the graph is connected

after the deletion of a leaf from the DFS tree.

Fig. 75: Articulation Points and DFS. (Figure not reproduced; its surviving labels are the vertices u and v and the annotation Low[u] = d[v].)

Observation 2: A leaf of the DFS tree is never an articulation point. Note that this is completely consistent

with Observation 1, since a leaf will not have any subtrees in the DFS tree, so we can delete the word

“internal” from Observation 1.

What about the root? Since there are no cross edges between the subtrees of the root, if the root has two or more children then it is an articulation point (since its removal separates these subtrees from one another). On the other hand, if

the root has only a single child, then (as in the case of leaves) its removal does not disconnect the DFS tree, and

hence cannot disconnect the graph in general.

Observation 3: The root of the DFS is an articulation point if and only if it has two or more children.

Articulation Points by DFS: Observations 1, 2, and 3 provide us with a structural characterization of which vertices

in the DFS tree are articulation points. How can we design an algorithm which tests these conditions? Checking

that the root has multiple children is an easy exercise. Checking Observation 1 is the hardest, but we will exploit

the structure of the DFS tree to help us.

The basic thing we need to check for is whether there is a back edge from some subtree to an ancestor of a given

vertex. How can we do this? It would be too expensive to keep track of all the back edges from each subtree (because there may be Θ(e) back edges). A simpler scheme is to keep track of the back edge that goes highest in the tree (in the sense of going closest to the root). If any back edge goes to an ancestor of u, this one will.

How do we know how close a back edge goes to the root? As we travel from u towards the root, observe that

the discovery times of these ancestors of u get smaller and smaller (the root having the smallest discovery time

of 1). So we keep track of the back edge (v, w) that has the smallest value of d[w].

Low: Define Low[u] to be the minimum of d[u] and

$\{\, d[w] : (v, w) \text{ is a back edge and } v \text{ is a descendent of } u \,\}$.

The term “descendent” is used in the nonstrict sense, that is, v may be equal to u. Intuitively, Low[u] is the

highest (closest to the root) that you can get in the tree by taking any one backedge from either u or any

of its descendents. (Beware of this notation: “Low” means low discovery time, not low in the tree. In fact

Low[u] tends to be “high” in the tree, in the sense of being close to the root.)

To compute Low[u] we use the following simple rules: Suppose that we are performing DFS on the vertex u.

Initialization: Low[u] = d[u].

Back edge (u, v): Low[u] = min(Low[u], d[v]). Explanation: We have detected a new back edge coming out

of u. If this goes to a lower d value than the previous back edge then make this the new low.

Tree edge (u, v): Low[u] = min(Low[u], Low[v]). Explanation: Since v is in the subtree rooted at u any single

back edge leaving the tree rooted at v is a single back edge for the tree rooted at u.

Observe that once Low[u] is computed for all vertices u, we can test whether a given nonroot vertex u is an

articulation point by Observation 1 as follows: u is an articulation point if and only if it has a child v in the

DFS tree for which Low[v] ≥ d[u] (since if there were a back edge from either v or one of its descendents to a proper ancestor of u then we would have Low[v] < d[u]).

The Final Algorithm: There is one subtlety that we must watch for in designing the algorithm (in particular this is

true for any DFS on undirected graphs). When processing a vertex u, we need to know when a given edge (u, v)

is a back edge. How do we do this? An almost correct answer is to test whether v is colored gray (since all gray

vertices are ancestors of the current vertex). This is not quite correct because v may be the parent of u in the DFS

tree and we are just seeing the “other side” of the tree edge between v and u (recalling that in constructing the

adjacency list of an undirected graph we create two directed edges for each undirected edge). To test correctly

for a back edge we use the predecessor pointer to check that v is not the parent of u in the DFS tree.

The complete algorithm for computing articulation points is given below. The main procedure for DFS is the

same as before, except that it calls the following routine rather than DFSvisit().

Articulation Points

ArtPt(u) {
    color[u] = gray
    Low[u] = d[u] = ++time
    for each (v in Adj(u)) {
        if (color[v] == white) {              // (u,v) is a tree edge
            pred[v] = u
            ArtPt(v)
            Low[u] = min(Low[u], Low[v])      // update Low[u]
            if (pred[u] == NULL) {            // root: apply Observation 3
                if (this is u’s second child)
                    Add u to set of articulation points
            }
            else if (Low[v] >= d[u]) {        // internal node: apply Observation 1
                Add u to set of articulation points
            }
        }
        else if (v != pred[u]) {              // (u,v) is a back edge
            Low[u] = min(Low[u], d[v])        // update Low[u]
        }
    }
}

An example is shown in the following ﬁgure. As with all DFS-based algorithms, the running time is Θ(n +e).
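The ArtPt procedure can be tried out with a short Python sketch. This is only an illustration under some assumptions that are not in the notes: vertices are numbered 0 to n-1, the graph is simple (no parallel edges, which would defeat the parent test), a discovery time of 0 stands in for the color white, and the function name `articulation_points` is made up for this example. An outer loop restarts the search in each connected component, as suggested earlier.

```python
from collections import defaultdict


def articulation_points(n, edges):
    """Return the set of articulation points of a simple undirected graph.

    Vertices are 0..n-1 and edges is a list of (u, v) pairs.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    d = [0] * n            # discovery times; 0 plays the role of "white"
    low = [0] * n
    pred = [None] * n
    result = set()
    time = 0

    def visit(u):
        nonlocal time
        time += 1
        d[u] = low[u] = time
        children = 0
        for v in adj[u]:
            if d[v] == 0:                      # (u, v) is a tree edge
                pred[v] = u
                children += 1
                visit(v)
                low[u] = min(low[u], low[v])
                if pred[u] is None:            # root: Observation 3
                    if children >= 2:
                        result.add(u)
                elif low[v] >= d[u]:           # nonroot: Observation 1
                    result.add(u)
            elif v != pred[u]:                 # (u, v) is a back edge
                low[u] = min(low[u], d[v])

    for s in range(n):                         # one DFS tree per component
        if d[s] == 0:
            visit(s)
    return result


# A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
print(articulation_points(4, [(0, 1), (1, 2), (2, 0), (2, 3)]))   # {2}
```

The recursion mirrors the pseudocode directly, so very deep DFS trees may exceed Python's default recursion limit; an explicit stack would avoid that at the cost of clarity.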

There are some interesting problems that we still have not discussed. We did not discuss how to compute the

bridges of a graph. This can be done by a small modiﬁcation of the algorithm above. We’ll leave it as an

exercise. (Notice that if $\{u, v\}$ is a bridge then it does not follow that u and v are both articulation points.)

Another question is how to determine which edges are in the biconnected components. A hint here is to store

the edges in a stack as you go through the DFS search. When you come to an articulation point, you can show

that all the edges in the biconnected component will be consecutive in the stack.

Fig. 76: Articulation Points. (Figure not reproduced; it appears to show an example graph on vertices a through j together with its DFS tree, annotated with d and Low values and with the articulation points marked.)

Supplemental Lecture 10: Bellman-Ford Shortest Paths

Read: Section 24.1 in CLRS.

Bellman-Ford Algorithm: We saw that Dijkstra’s algorithm can solve the single-source shortest path problem, under the assumption that the edge weights are nonnegative. We also saw that shortest paths are undefined if you have cycles of total negative cost. What if you have negative edge weights, but no negative cost cycles? We shall present the Bellman-Ford algorithm, which solves this problem. This algorithm is slower than Dijkstra’s algorithm, running in Θ(V E) time. In our version we will assume that there are no negative cost cycles. The one presented in CLRS actually contains a bit of code that checks for this. (Check it out.)

Recall that we are given a graph G = (V, E) with numeric edge weights, w(u, v). Like Dijkstra’s algorithm, the

Bellman-Ford algorithm is based on performing repeated relaxations. (Recall that relaxation updates shortest

path information along a single edge. It was described in our discussion of Dijkstra’s algorithm.) Dijkstra’s

algorithm was based on the idea of organizing the relaxations in the best possible manner, namely in increasing

order of distance. Once relaxation is applied to an edge, it need never be relaxed again. This trick doesn’t seem

to work when dealing with graphs with negative edge weights. Instead, the Bellman-Ford algorithm simply

applies a relaxation to every edge in the graph, and repeats this V −1 times.

Bellman-Ford Algorithm

BellmanFord(G,w,s) {
    for each (u in V) {                 // standard initialization
        d[u] = +infinity
        pred[u] = null
    }
    d[s] = 0
    for i = 1 to V-1 {                  // repeat V-1 times
        for each (u,v) in E {           // relax along each edge
            Relax(u,v)
        }
    }
}

The Θ(V E) running time is pretty obvious, since there are two main nested loops, one iterated V −1 times and

the other iterated E times. The interesting question is how and why it works.
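For readers who want to experiment, here is a minimal Python sketch of the same procedure. The representation is an assumption made for the example: vertices are numbered 0 to n-1, edges is a list of (u, v, w) triples, and Relax is written inline. Like the version above, it assumes there are no negative cost cycles and does not check for them.

```python
import math


def bellman_ford(n, edges, s):
    """Single-source shortest paths with possibly negative edge weights.

    edges is a list of directed (u, v, w) triples on vertices 0..n-1.
    Assumes no negative cost cycle is reachable from s.
    """
    d = [math.inf] * n
    pred = [None] * n
    d[s] = 0
    for _ in range(n - 1):                 # repeat V-1 times
        for u, v, w in edges:              # relax along each edge
            if d[u] + w < d[v]:
                d[v] = d[u] + w
                pred[v] = u
    return d, pred


edges = [(0, 1, 4), (0, 2, 5), (2, 1, -3), (1, 3, 2)]
dist, pred = bellman_ford(4, edges, 0)
print(dist)   # [0, 2, 5, 4]
print(pred)   # [None, 2, 0, 1]
```

Here the negative edge (2, 1) makes the path 0, 2, 1 cheaper than the direct edge (0, 1), which is exactly the situation in which processing vertices in increasing order of distance can go wrong.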

Correctness of Bellman-Ford: I like to think of the Bellman-Ford as a sort of “BubbleSort analogue” for shortest

paths, in the sense that shortest path information is propagated sequentially along each shortest path in the graph.

Consider any shortest path from s to some other vertex u: $\langle v_0, v_1, \ldots, v_k \rangle$ where $v_0 = s$ and $v_k = u$. Since a shortest path will never visit the same vertex twice, we know that $k \le V - 1$, and hence the path consists of at most $V - 1$ edges. Since this is a shortest path, $\delta(s, v_i)$ (the true shortest path cost from s to $v_i$) satisfies

$$\delta(s, v_i) = \delta(s, v_{i-1}) + w(v_{i-1}, v_i).$$

Fig. 77: Bellman-Ford Algorithm. (Figure not reproduced; its panels show the initial configuration and the d values after the 1st, 2nd, and 3rd relaxation phases.)

We assert that after the ith pass of the “for-i” loop, $d[v_i] = \delta(s, v_i)$. The proof is by induction on i. Observe that after the initialization (pass 0) we have $d[v_0] = d[s] = 0$. In general, prior to the ith pass through the loop, the induction hypothesis tells us that $d[v_{i-1}] = \delta(s, v_{i-1})$. After the ith pass through the loop, we have done a relaxation on the edge $(v_{i-1}, v_i)$ (since we do relaxations along all the edges). Thus after the ith pass we have

$$d[v_i] \le d[v_{i-1}] + w(v_{i-1}, v_i) = \delta(s, v_{i-1}) + w(v_{i-1}, v_i) = \delta(s, v_i).$$

Recall from Dijkstra’s algorithm that $d[v_i]$ is never less than $\delta(s, v_i)$ (since each time we do a relaxation there exists a path that witnesses its value). Thus, $d[v_i]$ is in fact equal to $\delta(s, v_i)$, completing the induction proof.

In summary, after i passes through the for loop, all vertices that are i edges away (along the shortest path tree)

from the source have the correct distance values stored in d[u]. Thus, after the (V − 1)st iteration of the for

loop, all vertices u have the correct distance values stored in d[u].

Supplemental Lecture 11: Network Flows and Matching

Read: Chapt 27 in CLR.

Maximum Flow: The Max Flow problem is one of the basic problems of algorithm design. Intuitively we can think

of a flow network as a directed graph in which fluid is flowing along the edges of the graph. Each edge has a certain maximum capacity that it can carry. The idea is to find out how much flow we can push from one point

to another.

The max flow problem has applications in areas like transportation and routing in networks. It is the simplest

problem in a line of many important problems having to do with the movement of commodities through a

network. These are often studied in business schools, and operations research.

Flow Networks: A flow network G = (V, E) is a directed graph in which each edge (u, v) ∈ E has a nonnegative capacity c(u, v) ≥ 0. If (u, v) ∉ E we model this by setting c(u, v) = 0. There are two special vertices: a

source s, and a sink t. We assume that every vertex lies on some path from the source to the sink (for otherwise

the vertex is of no use to us). (This implies that the digraph is connected, and hence e ≥ n −1.)

A flow is a real valued function on pairs of vertices, $f : V \times V \to \mathbb{R}$, which satisfies the following three

properties:

Capacity Constraint: For all u, v ∈ V , f(u, v) ≤ c(u, v).

Skew Symmetry: For all u, v ∈ V , f(u, v) = −f(v, u). (In other words, we can think of backwards ﬂow as

negative ﬂow. This is primarily for making algebraic analysis easier.)

Flow conservation: For all $u \in V - \{s, t\}$, we have $\sum_{v \in V} f(u, v) = 0$.

(Given skew symmetry, this is equivalent to saying, ﬂow-in = ﬂow-out.) Note that ﬂow conservation

does NOT apply to the source and sink, since we think of ourselves as pumping ﬂow from s to t. Flow

conservation means that no ﬂow is lost anywhere else in the network, thus the ﬂow out of s will equal the

ﬂow into t.

The quantity f(u, v) is called the net ﬂow from u to v. The total value of the ﬂow f is deﬁned as

$$|f| = \sum_{v \in V} f(s, v),$$

i.e. the flow out of s. It turns out that this is also equal to $\sum_{v \in V} f(v, t)$, the flow into t. We will show this later.

The maximum-ﬂow problem is, given a ﬂow network, and source and sink vertices s and t, ﬁnd the ﬂow of

maximum value from s to t.
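To make the three flow properties concrete, the following Python sketch checks them for a candidate net flow and returns its value |f|. The representation is an assumption made for this example only: capacities and net flows are dictionaries keyed by ordered pairs, a missing pair counts as 0, and the name `flow_value` is hypothetical.

```python
def flow_value(vertices, cap, f, s, t):
    """Check the three flow properties and return |f|.

    cap and f are dicts keyed by ordered pairs (u, v); a missing pair means 0.
    Raises ValueError if f is not a flow.
    """
    def c(u, v):
        return cap.get((u, v), 0)

    def net(u, v):
        return f.get((u, v), 0)

    for u in vertices:
        for v in vertices:
            if net(u, v) > c(u, v):
                raise ValueError("capacity constraint violated", (u, v))
            if net(u, v) != -net(v, u):
                raise ValueError("skew symmetry violated", (u, v))
    for u in vertices:
        if u != s and u != t:
            if sum(net(u, v) for v in vertices) != 0:
                raise ValueError("flow conservation violated", u)
    return sum(net(s, v) for v in vertices)      # flow out of s


vertices = ["s", "a", "t"]
cap = {("s", "a"): 3, ("a", "t"): 2}
f = {("s", "a"): 2, ("a", "s"): -2, ("a", "t"): 2, ("t", "a"): -2}
print(flow_value(vertices, cap, f, "s", "t"))    # 2
```

Note that the backwards entries such as f(a, s) = -2 satisfy the capacity constraint automatically, since a negative net flow is always at most a capacity of 0.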

Example: Page 581 of CLR.

Multi-source, multi-sink ﬂow problems: It may seem overly restrictive to require that there is only a single source

and a single sink vertex. Many flow problems have situations in which there are many source vertices $s_1, s_2, \ldots, s_k$ and many sink vertices $t_1, t_2, \ldots, t_l$. This can easily be modelled by just adding a special supersource $s'$ and a supersink $t'$, and attaching $s'$ to all the $s_i$ and attaching all the $t_j$ to $t'$. We let these edges have infinite capacity. Now by pushing the maximum flow from $s'$ to $t'$ we are effectively producing the maximum flow from all the $s_i$’s to all the $t_j$’s.

Note that we don’t care which ﬂow from one source goes to another sink. If you require that the ﬂow from

source i goes ONLY to sink i, then you have a tougher problem called the multi-commodity ﬂow problem.
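The supersource and supersink construction amounts to a few lines of code. The sketch below assumes capacities are stored in a dictionary keyed by ordered pairs; the function name and the default vertex names "s'" and "t'" are placeholders chosen for this example.

```python
import math


def add_super_terminals(cap, sources, sinks, super_s="s'", super_t="t'"):
    """Return a new capacity map with a supersource and a supersink added.

    cap is a dict {(u, v): capacity}. The new edges get infinite capacity.
    The names super_s and super_t are assumed not to clash with real vertices.
    """
    new_cap = dict(cap)                       # leave the input untouched
    for si in sources:
        new_cap[(super_s, si)] = math.inf
    for tj in sinks:
        new_cap[(tj, super_t)] = math.inf
    return new_cap


cap = {("s1", "a"): 4, ("s2", "a"): 3, ("a", "t1"): 5, ("a", "t2"): 1}
big = add_super_terminals(cap, ["s1", "s2"], ["t1", "t2"])
print(len(big))   # 8
```

A single-source, single-sink max flow routine run on the result from s' to t' then answers the multi-source, multi-sink question.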

Set Notation: Sometimes rather than talking about the ﬂow from a vertex u to a vertex v, we want to talk about the

ﬂow from a SET of vertices X to another SET of vertices Y . To do this we extend the deﬁnition of f to sets by

deﬁning

$$f(X, Y) = \sum_{x \in X} \sum_{y \in Y} f(x, y).$$

Using this notation we can define flow balance for a vertex u more succinctly by just writing f(u, V ) = 0. One

important special case of this concept is when X and Y deﬁne a cut (i.e. a partition of the vertex set into two

disjoint subsets X ⊆ V and Y = V − X). In this case f(X, Y ) can be thought of as the net amount of ﬂow

crossing over the cut.

From simple manipulations of the deﬁnition of ﬂow we can prove the following facts.

Lemma:

(i) f(X, X) = 0.

(ii) f(X, Y ) = −f(Y, X).

(iii) If X ∩ Y = ∅ then f(X ∪ Y, Z) = f(X, Z) +f(Y, Z) and f(Z, X ∪ Y ) = f(Z, X) +f(Z, Y ).

Ford-Fulkerson Method: The most basic concept on which all network-flow algorithms work is the notion of augmenting flows. The idea is to start with a flow of size zero, and then incrementally make the flow larger and larger by finding a path along which we can push more flow. A path in the network from s to t along which more flow can be pushed is called an augmenting path. This idea is given by the simplest method for computing network flows, called the Ford-Fulkerson method.

Almost all network ﬂow algorithms are based on this simple idea. They only differ in how they decide which

path or paths along which to push ﬂow. We will prove that when it is impossible to “push” any more ﬂow

through the network, we have reached the maximum possible ﬂow (i.e. a locally maximum ﬂow is globally

maximum).

Ford-Fulkerson Network Flow

FordFulkerson(G, s, t) {
    initialize flow f to 0;
    while (there exists an augmenting path p) {
        augment the flow along p;
    }
    output the final flow f;
}

Residual Network: To define the notion of an augmenting path, we first define the notion of a residual network. Given a flow network G and a flow f, define the residual capacity of a pair u, v ∈ V to be $c_f(u, v) = c(u, v) - f(u, v)$. Because of the capacity constraint, $c_f(u, v) \ge 0$. Observe that if $c_f(u, v) > 0$ then it is possible to push more flow through the edge (u, v). Otherwise we say that the edge is saturated.

The residual network is the directed graph $G_f$ with the same vertex set as G but whose edges are the pairs (u, v) such that $c_f(u, v) > 0$. Each edge in the residual network is weighted with its residual capacity.

Example: Page 589 of CLR.
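As a small illustration (not the CLR example), the residual network can be computed directly from the definition. The sketch assumes capacities and net flows are dictionaries keyed by ordered pairs, with missing pairs treated as 0; the name `residual_network` is made up here.

```python
def residual_network(vertices, cap, f):
    """Return {(u, v): c_f(u, v)} for every ordered pair with c_f(u, v) > 0."""
    res = {}
    for u in vertices:
        for v in vertices:
            if u != v:
                c_f = cap.get((u, v), 0) - f.get((u, v), 0)
                if c_f > 0:
                    res[(u, v)] = c_f
    return res


vertices = ["s", "a", "t"]
cap = {("s", "a"): 3, ("a", "t"): 2}
f = {("s", "a"): 2, ("a", "s"): -2, ("a", "t"): 2, ("t", "a"): -2}
print(residual_network(vertices, cap, f))
# {('s', 'a'): 1, ('a', 's'): 2, ('t', 'a'): 2}
```

The edge (a, t) is saturated and so is absent, while the reverse pairs (a, s) and (t, a) appear with positive residual capacity because skew symmetry gives them negative net flow.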

Lemma: Let f be a flow in G and let $f'$ be a flow in $G_f$. Then $(f + f')$ (defined $(f + f')(u, v) = f(u, v) + f'(u, v)$) is a flow in G. The value of the flow is $|f| + |f'|$.

Proof: Basically the residual network tells us how much additional flow we can push through G. This implies that $f + f'$ never exceeds the overall edge capacities of G. The other rules for flows are easy to verify.

Augmenting Paths: An augmenting path is a simple path from s to t in $G_f$. The residual capacity of the path is the MINIMUM capacity of any edge on the path. It is denoted $c_f(p)$. Observe that by pushing $c_f(p)$ units of flow along each edge of the path, we get a flow in $G_f$, and hence we can use this to augment the flow in G. (Remember when defining this flow that whenever we push $c_f(p)$ units of flow along any edge (u, v) of p, we have to push $-c_f(p)$ units of flow along the reverse edge (v, u) to maintain skew-symmetry.) Since every edge of the residual network has a strictly positive weight, the resulting flow is strictly larger than the current flow for G.

Determining whether there exists an augmenting path from s to t is an easy problem. First we construct

the residual network, and then we run DFS or BFS on the residual network starting at s. If the search reaches

t then we know that a path exists (and can follow the predecessor pointers backwards to reconstruct it). Since

DFS and BFS take Θ(n +e) time, and it can be shown that the residual network has Θ(n +e) size, the running

time of Ford-Fulkerson is basically

Θ((n +e)(number of augmenting stages)).

Later we will analyze the latter quantity.

Correctness: To establish the correctness of the Ford-Fulkerson algorithm we need to delve more deeply into the

theory of ﬂows and cuts in networks. A cut, (S, T), in a ﬂow network is a partition of the vertex set into two

disjoint subsets S and T such that s ∈ S and t ∈ T. We deﬁne the ﬂow across the cut as f(S, T), and we deﬁne

the capacity of the cut as c(S, T). Note that in computing f(S, T) flows from T to S are counted negatively (by skew-symmetry), and in computing c(S, T) we ONLY count constraints on edges leading from S to T (ignoring those from T to S).

Lemma: The amount of flow across any cut in the network is equal to $|f|$.

Proof:

$$\begin{aligned} f(S, T) &= f(S, V) - f(S, S) \\ &= f(S, V) \\ &= f(s, V) + f(S - s, V) \\ &= f(s, V) \\ &= |f| \end{aligned}$$

(The fact that f(S − s, V ) = 0 comes from ﬂow conservation. f(u, V ) = 0 for all u other than s and t,

and since S −s is formed of such vertices the sum of their ﬂows will be zero also.)

Corollary: The value of any ﬂow is bounded from above by the capacity of any cut. (i.e. Maximum ﬂow ≤

Minimum cut).

Proof: You cannot push any more ﬂow through a cut than its capacity.

The correctness of the Ford-Fulkerson method is based on the following theorem, called the Max-Flow, Min-Cut

Theorem. It basically states that in any ﬂow network the minimum capacity cut acts like a bottleneck to limit

the maximum amount of ﬂow. Ford-Fulkerson algorithm terminates when it ﬁnds this bottleneck, and hence it

ﬁnds the minimum cut and maximum ﬂow.

Max-Flow Min-Cut Theorem: The following three conditions are equivalent.

(i) f is a maximum ﬂow in G,

(ii) The residual network $G_f$ contains no augmenting paths,

(iii) $|f| = c(S, T)$ for some cut (S, T) of G.

Proof: (i) ⇒ (ii): If f is a max flow and there were an augmenting path in $G_f$, then by pushing flow along this path we would have a larger flow, a contradiction.

(ii) ⇒ (iii): If there are no augmenting paths then s and t are not connected in the residual network. Let S be those vertices reachable from s in the residual network and let T be the rest. (S, T) forms a cut. Because each edge crossing the cut must be saturated with flow, it follows that the flow across the cut equals the capacity of the cut, thus $|f| = c(S, T)$.

(iii) ⇒ (i): Since the ﬂow is never bigger than the capacity of any cut, if the ﬂow equals the capacity of

some cut, then it must be maximum (and this cut must be minimum).
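The cut used in the step (ii) ⇒ (iii) is easy to compute once a flow with no augmenting path is in hand. The following Python sketch is one way to do it, assuming capacities and net flows are dictionaries keyed by ordered pairs (missing pairs count as 0); the function name `cut_from_flow` is hypothetical.

```python
from collections import deque


def cut_from_flow(vertices, cap, f, s):
    """Return (S, T, c(S, T)) where S is the set reachable from s in G_f."""
    def c_f(u, v):
        return cap.get((u, v), 0) - f.get((u, v), 0)

    S = {s}
    queue = deque([s])
    while queue:                               # BFS over residual edges
        u = queue.popleft()
        for v in vertices:
            if v not in S and c_f(u, v) > 0:
                S.add(v)
                queue.append(v)
    T = set(vertices) - S
    capacity = sum(cap.get((u, v), 0) for u in S for v in T)
    return S, T, capacity


vertices = ["s", "a", "t"]
cap = {("s", "a"): 3, ("a", "t"): 2}
f = {("s", "a"): 2, ("a", "s"): -2, ("a", "t"): 2, ("t", "a"): -2}
S, T, capacity = cut_from_flow(vertices, cap, f, "s")
print(sorted(S), sorted(T), capacity)          # ['a', 's'] ['t'] 2
```

In this tiny network the flow has value 2 and the cut found has capacity 2, so by the theorem both are optimal.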

Analysis of the Ford-Fulkerson method: The problem with the Ford-Fulkerson algorithm is that depending on how it picks augmenting paths, it may spend an inordinate amount of time arriving at the final maximum flow. Consider the following example (from page 596 in CLR). If the algorithm were smart enough to send flow along the edges of weight 1,000,000, the algorithm would terminate in two augmenting steps. However, if the algorithm were to try to augment using the middle edge, it would improve the flow by only a single unit at a time, and 2,000,000 augmentations would be needed before we get the final flow. In general, Ford-Fulkerson can take time $\Theta((n + e)|f^*|)$ where $f^*$ is the maximum flow.

An Improvement: We have shown that if the augmenting path was chosen in a bad way the algorithm could run for a

very long time before converging on the ﬁnal ﬂow. It seems (from the example we showed) that a more logical

way to push ﬂow is to select the augmenting path which holds the maximum amount of ﬂow. Computing this

path is equivalent to determining the path of maximum capacity from s to t in the residual network. (This is

exactly the same as the beer transport problem given on the last exam.) It is not known how fast this method

works in the worst case, but there is another simple strategy that is guaranteed to give good bounds (in terms of

n and e).

Edmonds-Karp Algorithm: The Edmonds-Karp algorithm is Ford-Fulkerson, with one little change. When ﬁnding

the augmenting path, we use Breadth-First search in the residual network, starting at the source s, and thus we

ﬁnd the shortest augmenting path (where the length of the path is the number of edges on the path). We claim

that this choice is particularly nice in that, if we do so, the number of flow augmentations needed will be at most O(en). Since each augmentation takes O(n + e) time to compute using BFS, the overall running time will be $O((n + e)en) = O(n^2 e + e^2 n) \in O(e^2 n)$ (under the reasonable assumption that e ≥ n). (The best known algorithm is essentially $O(en \log n)$.)
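A compact Python sketch of Edmonds-Karp follows. It is a simplification made for readability, not a tuned implementation: vertices are assumed to be numbered 0 to n-1, residual capacities are kept in an n by n matrix (so each BFS costs O(n^2) rather than O(n + e)), and the function returns the flow value together with the number of augmentations. It assumes s and t are distinct.

```python
from collections import deque


def edmonds_karp(n, cap, s, t):
    """Return (max flow value, number of augmentations).

    cap is a dict {(u, v): capacity} on vertices 0..n-1, with s != t.
    """
    res = [[0] * n for _ in range(n)]          # residual capacities c_f
    for (u, v), c in cap.items():
        res[u][v] += c
    value = 0
    augmentations = 0
    while True:
        pred = [None] * n
        pred[s] = s
        queue = deque([s])
        while queue and pred[t] is None:       # BFS in the residual network
            u = queue.popleft()
            for v in range(n):
                if pred[v] is None and res[u][v] > 0:
                    pred[v] = u
                    queue.append(v)
        if pred[t] is None:                    # no augmenting path remains
            return value, augmentations
        path = []
        v = t
        while v != s:                          # follow predecessors back to s
            path.append((pred[v], v))
            v = pred[v]
        bottleneck = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= bottleneck
            res[v][u] += bottleneck            # reverse edge gains capacity
        value += bottleneck
        augmentations += 1


# The bad example for Ford-Fulkerson: s=0, t=3, middle edge (1, 2) of capacity 1.
cap = {(0, 1): 1000000, (0, 2): 1000000, (1, 2): 1, (1, 3): 1000000, (2, 3): 1000000}
print(edmonds_karp(4, cap, 0, 3))              # (2000000, 2)
```

Because BFS always finds a shortest augmenting path, the two-edge paths are used first and the middle edge is never touched, so the example that needed 2,000,000 augmentations finishes in two.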

The fact that Edmonds-Karp uses O(en) augmentations is based on the following observations.

Observation: If the edge (u, v) is an edge on the minimum length augmenting path from s to t in $G_f$, then $\delta_f(s, v) = \delta_f(s, u) + 1$.

Proof: This is a simple property of shortest paths. Since there is an edge from u to v, $\delta_f(s, v) \le \delta_f(s, u) + 1$, and if $\delta_f(s, v) < \delta_f(s, u) + 1$ then u would not be on the shortest path from s to v, and hence (u, v) is not on any shortest path.

Lemma: For each vertex $u \in V - \{s, t\}$, let $\delta_f(s, u)$ be the distance function from s to u in the residual network $G_f$. Then as we perform augmentations by the Edmonds-Karp algorithm the value of $\delta_f(s, u)$ increases monotonically with each flow augmentation.

Proof: (Messy, but not too complicated. See the text.)

Theorem: The Edmonds-Karp algorithm makes at most O(n e) augmentations.

Proof: An edge in the augmenting path is critical if the residual capacity of the path equals the residual capacity

of this edge. In other words, after augmentation the critical edge becomes saturated, and disappears from

the residual graph.

How many times can an edge become critical before the algorithm terminates? Observe that when the edge (u, v) is critical it lies on the shortest augmenting path, implying that $\delta_f(s, v) = \delta_f(s, u) + 1$. After this it disappears from the residual graph. In order to reappear, it must be that we reduce flow on this edge, i.e. we push flow along the reverse edge (v, u). For this to be the case we have (at some later flow $f'$) $\delta_{f'}(s, u) = \delta_{f'}(s, v) + 1$. Thus we have:

$$\begin{aligned} \delta_{f'}(s, u) &= \delta_{f'}(s, v) + 1 \\ &\ge \delta_f(s, v) + 1 \qquad \text{since dists increase with time} \\ &= (\delta_f(s, u) + 1) + 1 \\ &= \delta_f(s, u) + 2. \end{aligned}$$

Thus, between two consecutive times that an edge (u, v) becomes critical, its tail vertex u increases in distance from the source by at least two.

by two. This can only happen n/2 times, since no vertex can be further than n from the source. Thus, each

edge can become critical at most O(n) times, there are O(e) edges, hence after O(ne) augmentations, the

algorithm must terminate.

In summary, the Edmonds-Karp algorithm makes at most O(ne) augmentations and runs in $O(ne^2)$ time.

Maximum Matching: One of the important elements of network ﬂow is that it is a very general algorithm which is

capable of solving many problems. (An example is problem 3 in the homework.) We will give another example

here.

Consider the following problem: you are running a dating service and there are a set of men L and a set of women R. Using a questionnaire you establish which men are compatible with which women. Your task is

to pair up as many compatible pairs of men and women as possible, subject to the constraint that each man is

paired with at most one woman, and vice versa. (It may be that some men are not paired with any woman.)

This problem is modelled by giving an undirected graph whose vertex set is V = L ∪ R and whose edge set

consists of pairs (u, v), u ∈ L, v ∈ R such that u and v are compatible. The problem is to ﬁnd a matching,

that is a subset of edges M such that for each v ∈ V , there is at most one edge of M incident to v. The desired

matching is the one that has the maximum number of edges, and is called a maximum matching.

Example: See page 601 in CLR.

The resulting undirected graph has the property that its vertex set can be divided into two groups such that all

its edges go from one group to the other (never within a group, unless the dating service is located on Dupont

Circle). This problem is called the maximum bipartite matching problem.

Reduction to Network Flow: We claim that if you have an algorithm for solving the network ﬂow problem, then you

can use this algorithm to solve the maximum bipartite matching problem. (Note that this idea does not work for

general undirected graphs.)

Construct a flow network $G' = (V', E')$ as follows. Let s and t be two new vertices and let $V' = V \cup \{s, t\}$.

$$E' = \{(s, u) \mid u \in L\} \cup \{(v, t) \mid v \in R\} \cup \{(u, v) \mid (u, v) \in E\}.$$

Set the capacity of all edges in this network to 1.

Example: See page 602 in CLR.

Now, compute the maximum flow in $G'$. Although in general it can be that flows are real numbers, observe that since all the capacities here are integers, the Ford-Fulkerson algorithm will only assign integer value flows to the edges (and this is true of all existing network flow algorithms).

Since each vertex in L has exactly 1 incoming edge, it can have ﬂow along at most 1 outgoing edge, and since

each vertex in R has exactly 1 outgoing edge, it can have ﬂow along at most 1 incoming edge. Thus letting f

denote the maximum ﬂow, we can deﬁne a matching

$$M = \{(u, v) \mid u \in L,\ v \in R,\ f(u, v) > 0\}.$$

We claim that this matching is maximum because for every matching there is a corresponding ﬂow of equal

value, and for every (integer) ﬂow there is a matching of equal value. Thus by maximizing one we maximize

the other.
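The reduction can be sketched in Python as follows. This is an illustrative version under stated assumptions: L and R are disjoint lists of vertex names, E lists the compatible pairs (u, v) with u in L and v in R, the source and sink are given made-up tuple names so they cannot clash with string vertex names, and the max flow is found by repeated BFS augmentation on the unit-capacity network.

```python
from collections import deque


def max_bipartite_matching(L, R, E):
    """Return a maximum matching as a set of (u, v) pairs taken from E."""
    s, t = ("source",), ("sink",)              # hypothetical fresh vertices
    res = {}                                   # residual capacities c_f(u, v)
    adj = {}

    def add_edge(u, v):                        # unit-capacity edge u -> v
        res[(u, v)] = 1
        res.setdefault((v, u), 0)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for u in L:
        add_edge(s, u)
    for v in R:
        add_edge(v, t)
    for u, v in E:
        add_edge(u, v)

    while True:
        pred = {s: None}
        queue = deque([s])
        while queue and t not in pred:         # BFS for an augmenting path
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in pred and res[(u, v)] > 0:
                    pred[v] = u
                    queue.append(v)
        if t not in pred:
            break
        v = t
        while pred[v] is not None:             # push one unit along the path
            u = pred[v]
            res[(u, v)] -= 1
            res[(v, u)] += 1
            v = u
    # An edge of E carries flow exactly when its residual capacity dropped to 0.
    return {(u, v) for u, v in E if res[(u, v)] == 0}


L = ["m1", "m2", "m3"]
R = ["w1", "w2", "w3"]
E = [("m1", "w1"), ("m1", "w2"), ("m2", "w1"), ("m3", "w1")]
print(len(max_bipartite_matching(L, R, E)))    # 2
```

Which particular pairs end up matched may vary from run to run, but the size of the matching does not.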

Supplemental Lecture 12: Hamiltonian Path

Read: The reduction we present for Hamiltonian Path is completely different from the one in Chapt 36.5.4 of CLR.

Hamiltonian Cycle: Today we consider a collection of problems related to ﬁnding paths in graphs and digraphs.

Recall that given a graph (or digraph) a Hamiltonian cycle is a simple cycle that visits every vertex in the graph

(exactly once). A Hamiltonian path is a simple path that visits every vertex in the graph (exactly once). The

Hamiltonian cycle (HC) and Hamiltonian path (HP) problems ask whether a given graph (or digraph) has such

a cycle or path, respectively. There are four variations of these problems depending on whether the graph is

directed or undirected, and depending on whether you want a path or a cycle, but all of these problems are

NP-complete.

An important related problem is the traveling salesman problem (TSP). Given a complete graph (or digraph)

with integer edge weights, determine the cycle of minimum weight that visits all the vertices. Since the graph

is complete, such a cycle will always exist. The decision problem formulation is, given a complete weighted

graph G, and integer X, does there exist a Hamiltonian cycle of total weight at most X? Today we will prove

that Hamiltonian Cycle is NP-complete. We will leave TSP as an easy exercise. (It is done in Section 36.5.5 in

CLR.)

Component Design: Up to now, most of the reductions that we have seen (for Clique, VC, and DS in particular) are

of a relatively simple variety. They are sometimes called local replacement reductions, because they operate by

making some local change throughout the graph.

We will present a much more complex style of reduction for the Hamiltonian path problem on directed graphs.

This type of reduction is called a component design reduction, because it involves designing special subgraphs,

sometimes called components or gadgets (also called widgets), whose job it is to enforce a particular constraint.

Very complex reductions may involve the creation of many gadgets. This one involves the construction of only

one. (See CLR’s presentation of HP for other examples of gadgets.)

The gadget that we will use in the directed Hamiltonian path reduction, called a DHP-gadget, is shown in the figure below. It consists of three incoming edges labeled $i_1, i_2, i_3$ and three outgoing edges, labeled $o_1, o_2, o_3$. It was designed so that it satisfies the following property, which you can verify. Intuitively it says that if you enter the

gadget on any subset of 1, 2 or 3 input edges, then there is a way to get through the gadget and hit every vertex

exactly once, and in doing so each path must end on the corresponding output edge.

Claim: Given the DHP-gadget:

• For any subset of input edges, there exists a set of paths which join each input edge $i_1$, $i_2$, or $i_3$ to its respective output edge $o_1$, $o_2$, or $o_3$ such that together these paths visit every vertex in the gadget exactly once.

• Any subset of paths that start on the input edges and end on the output edges, and visit all the vertices of the gadget exactly once, must join corresponding inputs to corresponding outputs. (In other words, a path that starts on input $i_1$ must exit on output $o_1$.)

The proof is not hard, but involves a careful inspection of the gadget. It is probably easiest to see this on your

own, by starting with one, two, or three input paths, and attempting to get through the gadget without skipping any vertex and without visiting any vertex twice. To see whether you really understand the gadget, answer the

question of why there are 6 groups of triples. Would some other number work?

DHP is NP-complete: This gadget is an essential part of our proof that the directed Hamiltonian path problem is

NP-complete.

Theorem: The directed Hamiltonian Path problem is NP-complete.

Proof: DHP ∈ NP: The certificate consists of the sequence of vertices (or edges) in the path. It is an easy matter to check in polynomial time that the path visits every vertex exactly once and that consecutive vertices are joined by edges of the digraph.

3SAT $\le_P$ DHP: This will be the subject of the rest of this section.
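The certificate check used for membership in NP can be sketched as follows. The function name and the representation of the digraph as a list of (u, v) pairs are assumptions made for this illustration; the notes only say that the check is easy.

```python
def is_hamiltonian_path(vertices, edges, path):
    """Check a DHP certificate.

    vertices: collection of distinct vertex names
    edges:    iterable of directed (u, v) pairs
    path:     proposed vertex sequence (the certificate)
    """
    edge_set = set(edges)
    # Every vertex must appear exactly once in the path.
    if len(path) != len(vertices) or set(path) != set(vertices):
        return False
    # Each consecutive pair must be a directed edge of the digraph.
    return all((u, v) in edge_set for u, v in zip(path, path[1:]))
```

With hashed edge lookups the check takes time roughly linear in the number of vertices plus edges, which is well within the polynomial bound that membership in NP requires.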

Let us consider the similar elements between the two problems. In 3SAT we are selecting a truth assignment

for the variables of the formula. In DHP, we are deciding which edges will be a part of the path. In 3SAT there

must be at least one true literal for each clause. In DHP, each vertex must be visited exactly once.

We are given a boolean formula $F$ in 3-CNF (three literals per clause). We will convert this formula into a digraph. Let $x_1, x_2, \ldots, x_m$ denote the variables appearing in $F$. We will construct one DHP-gadget for each clause in the formula. The inputs and outputs of each gadget correspond to the literals appearing in this clause. Thus, the clause $(x_2 \lor x_5 \lor x_8)$ would generate a clause gadget with inputs labeled $x_2$, $x_5$, and $x_8$, and the same outputs.

Fig. 78: DHP-Gadget and examples of path traversals. (Panels: the gadget and what it looks like inside; paths with 3 entries, 2 entries, and 1 entry.)

The general structure of the digraph will consist of a series of vertices, one for each variable. Each of these vertices will have two outgoing paths, one taken if $x_i$ is set to true and one if $x_i$ is set to false. Each of these paths will then pass through some number of DHP-gadgets. The true path for $x_i$ will pass through all the clause gadgets for clauses in which $x_i$ appears, and the false path will pass through all the gadgets for clauses in which $\overline{x}_i$ appears. (The order in which the path passes through the gadgets is unimportant.) When the paths for $x_i$ have passed through their last gadgets, they are joined to the next variable vertex, $x_{i+1}$. This is illustrated in the following figure. (The figure only shows a portion of the construction. There will be paths coming into these same gadgets from other variables as well.) We add one final vertex $x_e$, and the last variable's paths are connected to $x_e$. (If we wanted to reduce to Hamiltonian cycle, rather than Hamiltonian path, we could join $x_e$ back to $x_1$.)


Fig. 79: General structure of reduction from 3SAT to DHP.

Note that for each variable, the Hamiltonian path must either use the true path or the false path, but it cannot use both. If we choose the true path for $x_i$ to be in the Hamiltonian path, then we will have at least one path passing through each of the gadgets whose corresponding clause contains $x_i$, and if we choose the false path, then we will have at least one path passing through each gadget for $\overline{x}_i$.

For example, consider the following boolean formula in 3-CNF. The construction yields the digraph shown in the following figure.

$$(x_1 \lor x_2 \lor x_3) \land (x_1 \lor x_2 \lor x_3) \land (x_2 \lor x_1 \lor x_3) \land (x_1 \lor x_3 \lor x_2).$$


Fig. 80: Example of the 3SAT to DHP reduction.

The Reduction: Let us give a more formal description of the reduction. Recall that we are given a boolean formula $F$ in 3-CNF. We create a digraph $G$ as follows. For each variable $x_i$ appearing in $F$, we create a variable vertex, named $x_i$. We also create a vertex named $x_e$ (the ending vertex). For each clause $c$, we create a DHP-gadget whose inputs and outputs are labeled with the three literals of $c$. (The order is unimportant, as long as each input and its corresponding output are labeled the same.)

We join these vertices with the gadgets as follows. For each variable $x_i$, consider all the clauses $c_1, c_2, \ldots, c_k$ in which $x_i$ appears as a literal (uncomplemented). Join $x_i$ by an edge to the input labeled with $x_i$ in the gadget for $c_1$, and in general join the output of gadget $c_j$ labeled $x_i$ with the input of gadget $c_{j+1}$ with this same label. Finally, join the output of the last gadget $c_k$ to the next variable vertex $x_{i+1}$. (If this is the last variable, then join it to $x_e$ instead.) The resulting chain of edges is called the true path for variable $x_i$. Form a second chain in exactly the same way, but this time joining the gadgets for the clauses in which $\overline{x}_i$ appears. This is called the false path for $x_i$. The resulting digraph is the output of the reduction. Observe that the entire construction can be performed in polynomial time, by simply inspecting the formula, creating the appropriate vertices, and adding the appropriate edges to the digraph. The following lemma establishes the correctness of this reduction.
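The wiring of variable vertices to gadgets can be sketched in Python. This is only a skeleton under stated assumptions: literals are encoded as signed integers (+i for $x_i$, -i for its complement), each gadget is represented solely by hypothetical port names ("in", j, lit) and ("out", j, lit), and the internal vertices and edges of the DHP-gadget are not constructed here.

```python
def build_skeleton(num_vars, clauses):
    """Return the edges that chain variable vertices through gadget ports.

    clauses: list of tuples of signed ints (assumed encoding: +i is x_i,
             -i is its complement). Gadget j exposes only the hypothetical
             ports ("in", j, lit) and ("out", j, lit); its interior is omitted.
    """
    edges = []
    for i in range(1, num_vars + 1):
        # The successor is the next variable vertex, or the ending vertex x_e.
        nxt = ("x", i + 1) if i < num_vars else ("x", "e")
        for lit in (i, -i):                  # true path, then false path
            prev = ("x", i)
            for j, clause in enumerate(clauses):
                if lit in clause:
                    edges.append((prev, ("in", j, lit)))
                    prev = ("out", j, lit)
            edges.append((prev, nxt))        # leave the last gadget
    return edges
```

The loops inspect each clause once per literal, so the skeleton is produced in time polynomial in the size of the formula. In this sketch a literal that occurs in no clause simply yields a direct edge from the variable vertex to its successor.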

Lemma: The boolean formula $F$ is satisfiable if and only if the digraph $G$ produced by the above reduction has a Hamiltonian path.

Fig. 81: Correctness of the 3SAT to DHP reduction. The upper figure shows the Hamiltonian path resulting from the satisfying assignment $x_1 = 1$, $x_2 = 1$, $x_3 = 0$ (a satisfying assignment hits all gadgets), and the lower figure shows the non-Hamiltonian path resulting from the nonsatisfying assignment $x_1 = 0$, $x_2 = 1$, $x_3 = 0$ (a nonsatisfying assignment misses some gadgets).

Proof: We need to prove both the “only if” and the “if”.

⇒: Suppose that $F$ has a satisfying assignment. We claim that $G$ has a Hamiltonian path. This path will start at the variable vertex $x_1$, then will travel along either the true path or false path for $x_1$, depending on whether it is 1 or 0, respectively, in the assignment, and then it will continue with $x_2$, then $x_3$, and so on, until reaching $x_e$. Such a path will visit each variable vertex exactly once.

Because this is a satisfying assignment, we know that for each clause, either 1, 2, or 3 of its literals will be true. This means that for each clause, either 1, 2, or 3 paths will attempt to travel through the corresponding gadget. However, we have argued in the above claim that in this case it is possible to visit every vertex in the gadget exactly once. Thus every vertex in the graph is visited exactly once, implying that $G$ has a Hamiltonian path.

⇐: Suppose that $G$ has a Hamiltonian path. We assert that the form of the path must be essentially the same as the one described in the previous part of this proof. In particular, the path must visit the variable vertices in increasing order from $x_1$ until $x_e$, because of the way in which these vertices are joined together.

Also observe that for each variable vertex, the path will proceed along either the true path or the false path. If it proceeds along the true path, set the corresponding variable to 1 and otherwise set it to 0. We will show that the resulting assignment is a satisfying assignment for $F$.

Any Hamiltonian path must visit all the vertices in every gadget. By the above claim about DHP-gadgets, if a path visits all the vertices and enters along an input edge then it must exit along the corresponding output edge. Therefore, once the Hamiltonian path starts along the true or false path for some variable, it must remain on edges with the same label. That is, if the path starts along the true path for $x_i$, it must travel through all the gadgets with the label $x_i$ until arriving at the variable vertex for $x_{i+1}$. If it starts along the false path, then it must travel through all gadgets with the label $\overline{x}_i$.

Since all the gadgets are visited and the paths must remain true to their initial assignments, it follows that for each corresponding clause, at least one (and possibly 2 or 3) of the literals must be true. Therefore, this is a satisfying assignment.

Supplemental Lecture 13: Subset Sum Approximation

Read: Section 37.4 in CLR.

Polynomial Approximation Schemes: Last time we saw that for some NP-complete problems, it is possible to approximate the problem to within a fixed constant ratio bound. For example, the approximation algorithm produces an answer that is within a factor of 2 of the optimal solution. However, in practice, people would like to control the precision of the approximation. This is done by specifying a parameter $\epsilon > 0$ as part of the input to the approximation algorithm, and requiring that the algorithm produce an answer that is within a relative error of $\epsilon$ of the optimal solution. It is understood that as $\epsilon$ tends to 0, the running time of the algorithm will increase. Such an algorithm is called a polynomial approximation scheme.

For example, the running time of the algorithm might be $O(2^{1/\epsilon} n^2)$. It is easy to see that in such cases the user pays a big penalty in running time as a function of $\epsilon$. (For example, to produce a 1% error, the “constant” factor would be $2^{100}$, which would be around 4 trillion centuries on your 100 MHz Pentium, assuming one step per clock cycle.) A fully polynomial approximation scheme is one in which the running time is polynomial in both $n$ and $1/\epsilon$. For example, a running time of $O((n/\epsilon)^2)$ would satisfy this condition. In such cases, reasonably accurate approximations are computationally feasible.

Unfortunately, there are very few NP-complete problems with fully polynomial approximation schemes. In fact,

recently there has been strong evidence that many NP-complete problems do not have polynomial approximation

schemes (fully or otherwise). Today we will study one that does.

Subset Sum: Recall that in the subset sum problem we are given a set $S$ of positive integers $\{x_1, x_2, \ldots, x_n\}$ and a target value $t$, and we are asked whether there exists a subset $S' \subseteq S$ that sums exactly to $t$. The optimization problem is to determine the subset whose sum is as large as possible but not larger than $t$.

This problem is basic to many packing problems, and is indirectly related to processor scheduling problems that arise in operating systems as well. Suppose we are also given $0 < \epsilon < 1$. Let $z^* \le t$ denote the optimum sum. The approximation problem is to return a value $z \le t$ such that

$$z \ge z^*(1 - \epsilon).$$

If we think of this as a knapsack problem, we want our knapsack to be within a factor of $(1 - \epsilon)$ of being as full as possible. So, if $\epsilon = 0.1$, then the knapsack should be at least 90% as full as the best possible.

What do we mean by polynomial time here? Recall that the running time should be polynomial in the size of the input. Obviously $n$ is part of the input length. But $t$ and the numbers $x_i$ could also be huge binary numbers. Normally we just assume that a binary number can fit into a word of our computer, and do not count its length. In this case we will count it, to be on the safe side. Clearly $t$ requires $O(\log t)$ digits to be stored in the input. We will take the input size to be $n + \log t$.

Intuitively it is not hard to believe that it should be possible to determine whether we can fill the knapsack to within 90% of optimal. After all, we are used to solving similar sorts of packing problems all the time in real life. But the mental heuristics that we apply to these problems are not necessarily easy to convert into efficient algorithms. Our intuition tells us that we can afford to be a little “sloppy” in keeping track of exactly how full the knapsack is at any point. The value of $\epsilon$ tells us just how sloppy we can be. Our approximation will do something similar. First we consider an exponential time algorithm, and then convert it into an approximation algorithm.

Exponential Time Algorithm: This algorithm is a variation of the dynamic programming solution we gave for the knapsack problem. Recall that there we used a 2-dimensional array to keep track of whether we could fill a knapsack of a given capacity with the first $i$ objects. We will do something similar here. As before, we will concentrate on the question of which sums are possible, but determining the subsets that give these sums will not be hard.

Let $L_i$ denote a list of integers that contains the sums of all $2^i$ subsets of $\{x_1, x_2, \ldots, x_i\}$ (including the empty set whose sum is 0). For example, for the set $\{1, 4, 6\}$ the corresponding list of sums contains $\langle 0, 1, 4, 5 (= 1 + 4), 6, 7 (= 1 + 6), 10 (= 4 + 6), 11 (= 1 + 4 + 6) \rangle$. Note that $L_i$ can have as many as $2^i$ elements, but may have fewer, since some subsets may have the same sum.

There are two things we will want to do for efficiency: (1) remove any duplicates from $L_i$, and (2) only keep sums that are less than or equal to $t$. Let us suppose that we have a procedure MergeLists(L1, L2) which merges two sorted lists, and returns a sorted list with all duplicates removed. This is essentially the procedure used in MergeSort but with the added duplicate element test. As a bit of notation, let $L + x$ denote the list resulting from adding the number $x$ to every element of list $L$. Thus $\langle 1, 4, 6 \rangle + 3 = \langle 4, 7, 9 \rangle$. This gives the following procedure for the subset sum problem.

Exact Subset Sum

    Exact_SS(x[1..n], t) {
        L = <0>;                                // the empty subset has sum 0
        for i = 1 to n do {
            L = MergeLists(L, L+x[i]);          // add in next item
            remove from L all elements greater than t;
        }
        return largest element in L;
    }

For example, if $S = \{1, 4, 6\}$ and $t = 8$ then the successive lists would be

$$\begin{aligned}
L_0 &= \langle 0 \rangle \\
L_1 &= \langle 0 \rangle \cup \langle 0 + 1 \rangle = \langle 0, 1 \rangle \\
L_2 &= \langle 0, 1 \rangle \cup \langle 0 + 4, 1 + 4 \rangle = \langle 0, 1, 4, 5 \rangle \\
L_3 &= \langle 0, 1, 4, 5 \rangle \cup \langle 0 + 6, 1 + 6, 4 + 6, 5 + 6 \rangle = \langle 0, 1, 4, 5, 6, 7, 10, 11 \rangle.
\end{aligned}$$

The last list would have the elements 10 and 11 removed, and the final answer would be 7. The algorithm runs in $\Omega(2^n)$ time in the worst case, because this is the number of sums that are generated if there are no duplicates, and no items are removed.
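The exact procedure can be sketched in runnable Python. The names merge_lists and exact_subset_sum are chosen here for illustration; the merge step follows the description of MergeLists (a MergeSort-style merge with a duplicate test), but it is one possible rendering rather than the notes' own code.

```python
def merge_lists(a, b):
    """Merge two sorted lists into one sorted list without duplicates."""
    out, i, j = [], 0, 0
    while i < len(a) or j < len(b):
        # Take the smaller head element (or whatever remains).
        if j == len(b) or (i < len(a) and a[i] <= b[j]):
            v = a[i]
            i += 1
        else:
            v = b[j]
            j += 1
        if not out or out[-1] != v:      # duplicate element test
            out.append(v)
    return out


def exact_subset_sum(xs, t):
    """Largest subset sum of xs that does not exceed t (exponential time)."""
    sums = [0]                                           # empty subset
    for x in xs:
        sums = merge_lists(sums, [s + x for s in sums])  # add in next item
        sums = [s for s in sums if s <= t]               # drop sums above t
    return sums[-1]                                      # list is sorted
```

For the example above, exact_subset_sum([1, 4, 6], 8) returns 7. The list can still double in length on every iteration, so this remains an exponential-time method in the worst case.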

Approximation Algorithm: To convert this into an approximation algorithm, we will introduce a way to “trim” the lists to decrease their sizes. The idea is that if the list $L$ contains two numbers that are very close to one another, e.g. 91,048 and 91,050, then we should not need to keep both of these numbers in the list. One of them is good enough for future approximations. This will reduce the size of the lists that the algorithm needs to maintain. But how much trimming can we allow and still keep our approximation bound? Furthermore, will we be able to reduce the list sizes from exponential to polynomial?

Both goals can be achieved, provided we apply a proper way of trimming the lists. We will trim elements whose values are sufficiently close to each other. But we should define close in a manner that is relative to the sizes of the numbers involved. The trimming must also depend on $\epsilon$. We select $\delta = \epsilon/n$. (Why? We will see later that this is the value that makes everything work out in the end.) Note that $0 < \delta < 1$. Assume that the elements of $L$ are sorted. We walk through the list. Let $z$ denote the last untrimmed element in $L$, and let $y \ge z$ be the next element to be considered. If

$$\frac{y - z}{y} \le \delta$$

then we trim $y$ from the list. Equivalently, this means that the final trimmed list cannot contain two distinct values $y$ and $z$ such that

$$(1 - \delta) y \le z \le y.$$

We can think of $z$ as representing $y$ in the list.

For example, given $\delta = 0.1$ and given the list

$$L = \langle 10, 11, 12, 15, 20, 21, 22, 23, 24, 29 \rangle,$$

the trimmed list $L'$ will consist of

$$L' = \langle 10, 12, 15, 20, 23, 29 \rangle.$$

Another way to visualize trimming is to break the interval $[1, t]$ into a set of buckets of exponentially increasing size. Let $d = 1/(1 - \delta)$. Note that $d > 1$. Consider the intervals $[1, d], [d, d^2], [d^2, d^3], \ldots, [d^{k-1}, d^k]$ where $d^k \ge t$. If $z \le y$ are in the same interval $[d^{i-1}, d^i]$ then

$$\frac{y - z}{y} \le \frac{d^i - d^{i-1}}{d^i} = 1 - \frac{1}{d} = \delta.$$

Thus, we cannot have more than one item within each bucket. We can think of trimming as a way of enforcing the condition that items in our lists are not relatively too close to one another, by enforcing the condition that no bucket has more than one item.


Fig. 82: Trimming Lists for Approximate Subset Sum.

Claim: The number of distinct items in a trimmed list is $O((n \log t)/\epsilon)$, which is polynomial in input size and $1/\epsilon$.

Proof: We know that each pair of consecutive elements in a trimmed list differ by a ratio of at least $d = 1/(1 - \delta) > 1$. Let $k$ denote the number of elements in the trimmed list, ignoring the element of value 0. Thus, the smallest nonzero value and maximum value in the trimmed list differ by a ratio of at least $d^{k-1}$. Since the smallest (nonzero) element is at least as large as 1, and the largest is no larger than $t$, it follows that $d^{k-1} \le t/1 = t$. Taking the natural log of both sides we have $(k - 1) \ln d \le \ln t$. Using the fact that $\delta = \epsilon/n$ and the log identity $\ln(1 + x) \le x$ (applied with $x = -\delta$, so that $-\ln(1 - \delta) \ge \delta$), we have

$$k - 1 \le \frac{\ln t}{\ln d} = \frac{\ln t}{-\ln(1 - \delta)} \le \frac{\ln t}{\delta} = \frac{n \ln t}{\epsilon}, \qquad k = O\left(\frac{n \log t}{\epsilon}\right).$$

Observe that the input size is at least as large as $n$ (since there are $n$ numbers) and at least as large as $\log t$ (since it takes $\log t$ digits to write down $t$ on the input). Thus, this function is polynomial in the input size and $1/\epsilon$.
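As a rough sanity check of this bound, the numbers below plug in $n = 4$, $t = 308$, and $\epsilon = 0.2$. These values are chosen here purely for illustration, and the logarithms are rounded.

```latex
\begin{align*}
\delta &= \epsilon/n = 0.2/4 = 0.05, \qquad d = \frac{1}{1-\delta} \approx 1.0526,\\
k - 1 &\le \frac{\ln t}{\ln d} \approx \frac{5.730}{0.0513} \approx 111.7,\\
\frac{\ln t}{\delta} &= \frac{n \ln t}{\epsilon} \approx \frac{5.730}{0.05} = 114.6 .
\end{align*}
```

Under these assumptions a trimmed list holds at most about 112 nonzero values, and the simpler bound $n \ln t/\epsilon$ overestimates this only slightly.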

The approximation algorithm operates as before, but in addition we call the procedure Trim given below.

For example, consider the set $S = \{104, 102, 201, 101\}$ with $t = 308$ and $\epsilon = 0.20$. We have $\delta = \epsilon/4 = 0.05$. A summary of the algorithm's execution follows the code.

Approximate Subset Sum

    Trim(L, delta) {
        let the elements of L be denoted y[1..m];
        L' = <y[1]>;                            // start with first item
        last = y[1];                            // last item to be added
        for i = 2 to m do {
            if (last < (1-delta)*y[i]) {        // different enough?
                append y[i] to end of L';
                last = y[i];
            }
        }
        return L';                              // the trimmed list
    }

    Approx_SS(x[1..n], t, eps) {
        delta = eps/n;                          // approx factor
        L = <0>;                                // empty sum = 0
        for i = 1 to n do {
            L = MergeLists(L, L+x[i]);          // add in next item
            L = Trim(L, delta);                 // trim away "near" duplicates
            remove from L all elements greater than t;
        }
        return largest element in L;
    }

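$$\begin{array}{ll}
\text{init:} & L_0 = \langle 0 \rangle \\
\text{merge:} & L_1 = \langle 0, 104 \rangle \\
\text{trim:} & L_1 = \langle 0, 104 \rangle \\
\text{remove:} & L_1 = \langle 0, 104 \rangle \\
\text{merge:} & L_2 = \langle 0, 102, 104, 206 \rangle \\
\text{trim:} & L_2 = \langle 0, 102, 206 \rangle \\
\text{remove:} & L_2 = \langle 0, 102, 206 \rangle \\
\text{merge:} & L_3 = \langle 0, 102, 201, 206, 303, 407 \rangle \\
\text{trim:} & L_3 = \langle 0, 102, 201, 303, 407 \rangle \\
\text{remove:} & L_3 = \langle 0, 102, 201, 303 \rangle \\
\text{merge:} & L_4 = \langle 0, 101, 102, 201, 203, 302, 303, 404 \rangle \\
\text{trim:} & L_4 = \langle 0, 101, 201, 302, 404 \rangle \\
\text{remove:} & L_4 = \langle 0, 101, 201, 302 \rangle
\end{array}$$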

The ﬁnal output is 302. The optimum is 307 = 104 + 102 + 101. So our actual relative error in this case is

within 2%.
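The Trim and Approx_SS procedures can be sketched in Python as follows. The function names are chosen here for illustration, and Python's built-in set union followed by sorting stands in for MergeLists to keep the sketch short; that substitution costs an extra logarithmic factor and is not the merge the notes describe.

```python
def trim(values, delta):
    """Drop each element within relative error delta of the last one kept.

    values must be sorted in increasing order and nonempty.
    """
    kept = [values[0]]                       # start with first item
    for y in values[1:]:
        if kept[-1] < (1 - delta) * y:       # different enough?
            kept.append(y)
    return kept


def approx_subset_sum(xs, t, eps):
    """Return a subset sum z <= t, aiming for z >= (1 - eps) * optimum."""
    delta = eps / len(xs)                    # per-stage relative error
    sums = [0]                               # empty subset
    for x in xs:
        # Stand-in for MergeLists: sorted union without duplicates.
        sums = sorted(set(sums) | {s + x for s in sums})
        sums = trim(sums, delta)             # trim away near duplicates
        sums = [s for s in sums if s <= t]   # drop sums above t
    return sums[-1]
```

Called as approx_subset_sum([104, 102, 201, 101], 308, 0.20), the sketch passes through the same lists as the summary above and returns 302.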

The running time of the procedure is $O(n|L|)$, which is $O(n^2 \ln t/\epsilon)$ by the earlier claim.

Approximation Analysis: The final question is why the algorithm achieves a relative error of at most $\epsilon$ over the optimum solution. Let $Y^*$ denote the optimum (largest) subset sum and let $Y$ denote the value returned by the algorithm. We want to show that $Y$ is not too much smaller than $Y^*$, that is,

$$Y \ge Y^*(1 - \epsilon).$$

Our proof will make use of an important inequality from real analysis.

Lemma: For any integer $n \ge 1$ and any real number $a \ge -n$,

$$(1 + a) \le \left(1 + \frac{a}{n}\right)^n \le e^a.$$

Recall that our intuition was that we would allow a relative error of $\epsilon/n$ at each stage of the algorithm. Since the algorithm has $n$ stages, the total relative error should be (obviously?) $n(\epsilon/n) = \epsilon$. The catch is that these are relative, not absolute errors. These errors do not accumulate additively, but rather by multiplication. So we need to be more careful.

Let $L_i^*$ denote the $i$-th list in the exponential time (optimal) solution and let $L_i$ denote the $i$-th list in the approximate algorithm. We claim that for each $y \in L_i^*$ there exists a representative item $z \in L_i$ whose relative error from $y$ satisfies

$$(1 - \epsilon/n)^i y \le z \le y.$$

The proof of the claim is by induction on $i$. Initially $L_0 = L_0^* = \langle 0 \rangle$, and so there is no error. Suppose by induction that the above inequality holds for each item in $L_{i-1}^*$. Consider an element $y \in L_{i-1}^*$. We know that $y$ will generate two elements in $L_i^*$: $y$ and $y + x_i$. We want to argue that there will be a representative that is “close” to each of these items.

By our induction hypothesis, there is a representative element $z$ in $L_{i-1}$ such that

$$(1 - \epsilon/n)^{i-1} y \le z \le y.$$

When we apply our algorithm, we will form two new items to add (initially) to $L_i$: $z$ and $z + x_i$. Observe that by adding $x_i$ to the inequality above and a little simplification we get

$$(1 - \epsilon/n)^{i-1} (y + x_i) \le z + x_i \le y + x_i.$$


Fig. 83: Subset sum approximation analysis.

The items $z$ and $z + x_i$ might not appear in $L_i$ because they may be trimmed. Let $z'$ and $z''$ be their respective representatives. Thus, $z'$ and $z''$ are elements of $L_i$. We have

$$\begin{aligned}
(1 - \epsilon/n) z &\le z' \le z \\
(1 - \epsilon/n)(z + x_i) &\le z'' \le z + x_i.
\end{aligned}$$

Combining these with the inequalities above we have

$$\begin{aligned}
(1 - \epsilon/n)^{i-1} (1 - \epsilon/n) y = (1 - \epsilon/n)^i y &\le z' \le y \\
(1 - \epsilon/n)^{i-1} (1 - \epsilon/n)(y + x_i) = (1 - \epsilon/n)^i (y + x_i) &\le z'' \le y + x_i.
\end{aligned}$$

Since $z'$ and $z''$ are in $L_i$ this is the desired result. This ends the proof of the claim.

Using our claim, and the fact that $Y^*$ (the optimum answer) is the largest element of $L_n^*$ and $Y$ (the approximate answer) is the largest element of $L_n$, we have

$$(1 - \epsilon/n)^n Y^* \le Y \le Y^*.$$

This is not quite what we wanted. We wanted to show that $(1 - \epsilon) Y^* \le Y$. To complete the proof, we observe from the lemma above (setting $a = -\epsilon$) that

$$(1 - \epsilon) \le \left(1 - \frac{\epsilon}{n}\right)^n.$$

This completes the approximation analysis.
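To see the final inequality at work, the following check uses the values $n = 4$ and $\epsilon = 0.2$ from the worked example earlier in this lecture, where the optimum was 307. It is a numerical illustration only, not part of the proof.

```latex
\begin{align*}
\left(1 - \frac{\epsilon}{n}\right)^n &= (0.95)^4 = 0.81450625 \;\ge\; 0.8 = 1 - \epsilon,\\
Y &\ge (1 - \epsilon)\, Y^* = 0.8 \times 307 = 245.6 .
\end{align*}
```

The value 302 returned in that example comfortably exceeds this guaranteed minimum.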


**Lecture 2: Mathematical Background
**

Read: Review Chapters 1–5 in CLRS. Algorithm Analysis: Today we will review some of the basic elements of algorithm analysis, which were covered in previous courses. These include asymptotics, summations, and recurrences. Asymptotics: Asymptotics involves O-notation (“big-Oh”) and its many relatives, Ω, Θ, o (“little-Oh”), ω. Asymptotic notation provides us with a way to simplify the functions that arise in analyzing algorithm running times by ignoring constant factors and concentrating on the trends for large values of n. For example, it allows us to reason that for three algorithms with the respective running times n3 log n + 4n2 + 52n log n ∈ Θ(n3 log n) 15n2 + 7n log3 n ∈ Θ(n2 ) 3n + 4 log5 n + 19n2 ∈ Θ(n2 ). Thus, the ﬁrst algorithm is signiﬁcantly slower for large n, while the other two are comparable, up to a constant factor. Since asymptotics were covered in earlier courses, I will assume that this is familiar to you. Nonetheless, here are a few facts to remember about asymptotic notation: Lecture Notes 3 CMSC 451

Ignore constant factors: Multiplicative constant factors are ignored. For example, 347n is Θ(n). Constant factors appearing in exponents cannot be ignored. For example, $2^{3n}$ is not $O(2^n)$.

Focus on large n: Asymptotic analysis means that we consider trends for large values of n. Thus, the fastest growing function of n is the only one that needs to be considered. For example, $3n^2 \log n + 25n \log n + (\log n)^7$ is $\Theta(n^2 \log n)$.

Polylog, polynomial, and exponential: These are the most common functions that arise in analyzing algorithms:

  Polylogarithmic: Powers of log n, such as $(\log n)^7$. We will usually write this as $\log^7 n$.
  Polynomial: Powers of n, such as $n^4$ and $\sqrt{n} = n^{1/2}$.
  Exponential: A constant (not 1) raised to the power n, such as $3^n$.

An important fact is that polylogarithmic functions are strictly asymptotically smaller than polynomial functions, which are strictly asymptotically smaller than exponential functions (assuming the base of the exponent is bigger than 1). For example, if we let ≺ mean "asymptotically smaller" then

$$\log^a n \prec n^b \prec c^n$$

for any a, b, and c, provided that b > 0 and c > 1.

Logarithm Simplification: It is a good idea to first simplify terms involving logarithms. For example, the following formulas are useful. Here a, b, c are constants:

$$\log_b n = \frac{\log_a n}{\log_a b} = \Theta(\log_a n)$$
$$\log_a (n^c) = c \log_a n = \Theta(\log_a n)$$
$$b^{\log_a n} = n^{\log_a b}.$$

Avoid using log n in exponents. The last rule above can be used to achieve this. For example, rather than saying $3^{\log_2 n}$, express this as $n^{\log_2 3} \approx n^{1.585}$.

Following the conventional sloppiness, I will often say $O(n^2)$, when in fact the stronger statement $\Theta(n^2)$ holds. (This is just because it is easier to say "oh" than "theta".)

Summations: Summations naturally arise in the analysis of iterative algorithms. Also, more complex forms of analysis, such as recurrences, are often solved by reducing them to summations. Solving a summation means reducing it to a closed form formula, that is, one having no summations, recurrences, integrals, or other complex operators. In algorithm design it is often not necessary to solve a summation exactly, since an asymptotic approximation or close upper bound is usually good enough. Here are some common summations and some tips to use in solving summations.

Constant Series: For integers a and b,

$$\sum_{i=a}^{b} 1 = \max(b - a + 1, 0).$$

Notice that when b = a − 1, there are no terms in the summation (since the index is assumed to count upwards only), and the result is 0. Be careful to check that b ≥ a − 1 before applying this formula blindly.

Arithmetic Series: For n ≥ 0,

$$\sum_{i=0}^{n} i = 1 + 2 + \cdots + n = \frac{n(n+1)}{2}.$$

This is $\Theta(n^2)$. (The starting bound could have just as easily been set to 1 as 0.)
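The constant and arithmetic series formulas can be spot-checked numerically. The following is a minimal sketch in Python (the function names are chosen here for illustration and are not part of the notes); it compares each closed form against a brute-force sum, including the empty-range case b < a.

```python
def constant_series(a: int, b: int) -> int:
    """Closed form for the sum of 1 over i = a..b (index counts upwards only)."""
    return max(b - a + 1, 0)


def arithmetic_series(n: int) -> int:
    """Closed form for the sum of i over i = 0..n, valid for n >= 0."""
    return n * (n + 1) // 2


# Compare the closed forms against brute-force sums.
for a in range(-3, 4):
    for b in range(-5, 6):
        # range(a, b + 1) is empty when b < a, matching the max(..., 0) rule.
        assert constant_series(a, b) == sum(1 for _ in range(a, b + 1))

for n in range(50):
    assert arithmetic_series(n) == sum(range(n + 1))
```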

Geometric Series: Let x ≠ 1 be any constant (independent of n), then for n ≥ 0,

$$\sum_{i=0}^{n} x^i = 1 + x + x^2 + \cdots + x^n = \frac{x^{n+1} - 1}{x - 1}.$$

If 0 < x < 1 then this is Θ(1). If x > 1, then this is $\Theta(x^n)$, that is, the entire sum is proportional to the last element of the series.

Quadratic Series: For n ≥ 0,

$$\sum_{i=0}^{n} i^2 = 1^2 + 2^2 + \cdots + n^2 = \frac{2n^3 + 3n^2 + n}{6}.$$

Linear-geometric Series: This arises in some algorithms based on trees and recursion. Let x ≠ 1 be any constant, then for n ≥ 0,

$$\sum_{i=0}^{n-1} i x^i = x + 2x^2 + 3x^3 + \cdots + (n-1)x^{n-1} = \frac{(n-1)x^{n+1} - n x^n + x}{(x-1)^2}.$$

As n becomes large (and assuming x > 1), this is asymptotically dominated by the term $(n-1)x^{n+1}/(x-1)^2$. The multiplicative term n − 1 is very nearly equal to n for large n, and, since x is a constant, we may multiply this times the constant $(x-1)^2/x$ without changing the asymptotics. What remains is $\Theta(n x^n)$.

Harmonic Series: This arises often in probabilistic analyses of algorithms. It does not have an exact closed form solution, but it can be closely approximated. For n ≥ 1,

$$H_n = \sum_{i=1}^{n} \frac{1}{i} = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} = (\ln n) + O(1).$$

There are also a few tips to learn about solving summations.

Summations with general bounds: When a summation does not start at 1 or 0, as most of the above formulas assume, you can just split it up into the difference of two summations. For example, for 1 ≤ a ≤ b

$$\sum_{i=a}^{b} f(i) = \sum_{i=0}^{b} f(i) - \sum_{i=0}^{a-1} f(i).$$

Linearity of Summation: Constant factors and added terms can be split out to make summations simpler.

$$\sum (4 + 3i(i-2)) = \sum (4 + 3i^2 - 6i) = \sum 4 + 3 \sum i^2 - 6 \sum i.$$

Now the formulas can be applied to each summation individually.

Approximate using integrals: Integration and summation are closely related. (Integration is in some sense a continuous form of summation.) Here is a handy formula. Let f(x) be any monotonically increasing function (the function increases as x increases).

$$\int_{0}^{n} f(x)\,dx \le \sum_{i=1}^{n} f(i) \le \int_{1}^{n+1} f(x)\,dx.$$

Example: Right Dominant Elements

As an example of the use of summations in algorithm analysis, consider the following simple problem. We are given a list L of numeric values. We say that an element of L is right dominant if it is strictly larger than all the elements that follow it in the list. Note that the last element of the list

is always right dominant, as is the last occurrence of the maximum element of the array. For example, consider the following list.

    L = ⟨10, 9, 5, 13, 2, 7, 1, 8, 4, 6, 3⟩

The sequence of right dominant elements is ⟨13, 8, 6, 3⟩.

In order to make this more concrete, we should think about how L is represented. It will make a difference whether L is represented as an array (allowing for random access), a doubly linked list (allowing for sequential access in both directions), or a singly linked list (allowing for sequential access in only one direction). Among the three possible representations, the array representation seems to yield the simplest and clearest algorithm. However, we will design the algorithm in such a way that it only performs sequential scans, so it could also be implemented using a singly linked or doubly linked list. (This is common in algorithms. Choose your representation to make the algorithm as simple and clear as possible, but give thought to how it may actually be implemented. Remember that algorithms are read by humans, not compilers.) We will assume here that the array L of size n is indexed from 1 to n.

Think for a moment how you would solve this problem. Can you see an O(n) time algorithm? (If not, think a little harder.) To illustrate summations, we will first present a naive $O(n^2)$ time algorithm, which operates by simply checking for each element of the array whether all the subsequent elements are strictly smaller. (Although this example is pretty stupid, it will also serve to illustrate the sort of style that we will use in presenting algorithms.)

Right Dominant Elements (Naive Solution)

    // Input: List L of numbers given as an array L[1..n]
    // Returns: List D containing the right dominant elements of L
    RightDominant(L) {
        D = empty list
        for (i = 1 to n) {
            isDominant = true
            for (j = i+1 to n)
                if (L[i] <= L[j]) isDominant = false
            if (isDominant) append L[i] to D
        }
        return D
    }

If I were programming this, I would rewrite the inner (j) loop as a while loop, since we can terminate the loop as soon as we find that L[i] is not dominant. Again, this sort of optimization is good to keep in mind in programming, but will be omitted since it will not affect the worst-case running time.

The time spent in this algorithm is dominated (no pun intended) by the time spent in the inner (j) loop. On the ith iteration of the outer loop, the inner loop is executed from i + 1 to n, for a total of n − (i + 1) + 1 = n − i times. (Recall the rule for the constant series above.) Each iteration of the inner loop takes constant time. Thus, up to a constant factor, the running time, as a function of n, is given by the following summation:

$$T(n) = \sum_{i=1}^{n} (n - i).$$

To solve this summation, let us expand it, and put it into a form such that the above formulas can be used.

$$T(n) = (n-1) + (n-2) + \cdots + 2 + 1 + 0 = 0 + 1 + 2 + \cdots + (n-2) + (n-1) = \sum_{i=0}^{n-1} i = \frac{(n-1)n}{2}.$$
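As a rough check on the analysis of the naive algorithm, here is a minimal Python sketch of it (0-based indexing, unlike the 1-based pseudocode; the function name and the iteration counter are additions made here for illustration). It counts the inner-loop iterations so that the total can be compared with (n − 1)n/2 on a sample list.

```python
def right_dominant_naive(L):
    """Return (D, iterations): the right dominant elements of L, in order,
    and the number of inner-loop iterations performed."""
    n = len(L)
    D = []
    iterations = 0
    for i in range(n):
        is_dominant = True
        for j in range(i + 1, n):      # runs n - 1 - i times (0-based i)
            iterations += 1
            if L[i] <= L[j]:
                is_dominant = False    # no early exit, as in the pseudocode
        if is_dominant:
            D.append(L[i])
    return D, iterations


sample = [10, 9, 5, 13, 2, 7, 1, 8, 4, 6, 3]
D, iterations = right_dominant_naive(sample)
n = len(sample)
print(D)                               # [13, 8, 6, 3]
print(iterations, (n - 1) * n // 2)    # 55 55
```

The O(n) algorithm is deliberately not shown, since the notes leave it as an exercise.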

The last step in solving the summation comes from applying the formula for the arithmetic series (using n − 1 in place of n in the formula).

As mentioned above, there is a simple O(n) time algorithm for this problem. As an exercise, see if you can find it. As an additional challenge, see if you can design your algorithm so it only performs a single left-to-right scan of the list L. (You are allowed to use up to O(n) working storage to do this.)

Recurrences: Another useful mathematical tool in algorithm analysis will be recurrences. They arise naturally in the analysis of divide-and-conquer algorithms. Recall that these algorithms have the following general structure.

  Divide: Divide the problem into two or more subproblems (ideally of roughly equal sizes),
  Conquer: Solve each subproblem recursively, and
  Combine: Combine the solutions to the subproblems into a single global solution.

How do we analyze recursive procedures like this one? If there is a simple pattern to the sizes of the recursive calls, then the best way is usually by setting up a recurrence, that is, a function which is defined recursively in terms of itself. Here is a typical example. Suppose that we break the problem into two subproblems, each of size roughly n/2. (We will assume exactly n/2 for simplicity.) The additional overhead of splitting and merging the solutions is O(n). When the subproblems are reduced to size 1, we can solve them in O(1) time. We will ignore constant factors, writing O(n) just as n, yielding the following recurrence:

$$T(n) = \begin{cases} 1 & \text{if } n = 1, \\ 2T(n/2) + n & \text{if } n > 1. \end{cases}$$

Note that, since we assume that n is an integer, this recurrence is not well defined unless n is a power of 2 (since otherwise n/2 will at some point be a fraction). To be formally correct, I should either write $\lfloor n/2 \rfloor$ or restrict the domain of n, but I will often be sloppy in this way.

There are a number of methods for solving the sort of recurrences that show up in divide-and-conquer algorithms. The easiest method is to apply the Master Theorem, given in CLRS. Here is a slightly more restrictive version, but adequate for a lot of instances. See CLRS for the more complete version of the Master Theorem and its proof.

Theorem: (Simplified Master Theorem) Let a ≥ 1, b > 1 be constants and let T(n) be the recurrence

$$T(n) = aT(n/b) + cn^k,$$

defined for n ≥ 0.

  Case 1: If $a > b^k$ then T(n) is $\Theta(n^{\log_b a})$.
  Case 2: If $a = b^k$ then T(n) is $\Theta(n^k \log n)$.
  Case 3: If $a < b^k$ then T(n) is $\Theta(n^k)$.

Using this version of the Master Theorem we can see that in our recurrence a = 2, b = 2, and k = 1, so $a = b^k$ and Case 2 applies. Thus T(n) is Θ(n log n).

There are many recurrences that cannot be put into this form. For example, the following recurrence is quite common: T(n) = 2T(n/2) + n log n. This solves to $T(n) = \Theta(n \log^2 n)$, but the Master Theorem (either this form or the one in CLRS) will not tell you this. For such recurrences, other methods are needed.

Lecture 3: Review of Sorting and Selection

Read: Review Chapts. 6–9 in CLRS.

Review of Sorting: Sorting is among the most basic problems in algorithm design. We are given a sequence of items, each associated with a given key value. The problem is to permute the items so that they are in increasing (or decreasing) order by key. Sorting is important because it is often the first step in more complex algorithms. Sorting algorithms are usually divided into two classes, internal sorting algorithms, which assume that data is stored in an array in main memory, and external sorting algorithms, which assume that data is stored on disk or some other device that is best accessed sequentially. We will only consider internal sorting.

You are probably familiar with one or more of the standard simple $\Theta(n^2)$ sorting algorithms, such as InsertionSort, SelectionSort and BubbleSort. (By the way, these algorithms are quite acceptable for small lists of, say, fewer than 20 elements.) BubbleSort is the easiest one to remember, but it is widely considered to be the worst of the three.

The three canonical efficient comparison-based sorting algorithms are MergeSort, QuickSort, and HeapSort. All run in Θ(n log n) time (for QuickSort, in the expected case). Sorting algorithms often have additional properties that are of interest, depending on the application. Here are two important properties.

In-place: The algorithm uses no additional array storage, and hence (other than perhaps the system's recursion stack) it is possible to sort very large lists without the need to allocate additional working storage.

Stable: A sorting algorithm is stable if two elements that are equal remain in the same relative position after sorting is completed. This is of interest, since in some sorting applications you sort first on one key and then on another. It is nice to know that two items that are equal on the second key remain sorted on the first key.

Here is a quick summary of the fast sorting algorithms. If you are not familiar with any of these, check out the descriptions in CLRS. They are shown schematically in Fig. 1.

QuickSort: It works recursively, by first selecting a random "pivot value" from the array. Then it partitions the array into elements that are less than and greater than the pivot. Then it recursively sorts each part.

QuickSort is widely regarded as the fastest of the fast sorting algorithms (on modern machines). One explanation is that its inner loop compares elements against a single pivot value, which can be stored in a register for fast access. The other algorithms compare two elements in the array. This is considered an in-place sorting algorithm, since it uses no other array storage. (It does implicitly use the system's recursion stack, but this is usually not counted.) It is not stable. There is a stable version of QuickSort, but it is not in-place. This algorithm is Θ(n log n) in the expected case, and $\Theta(n^2)$ in the worst case. If properly implemented, the probability that the algorithm takes asymptotically longer (assuming that the pivot is chosen randomly) is extremely small for large n.

[Fig. 1: Common O(n log n) comparison-based sorting algorithms. QuickSort: partition around pivot x into < x and > x, then sort each part. MergeSort: split, sort each half, merge. HeapSort: buildHeap, then repeated extractMax.]
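The QuickSort description above can be sketched in a few lines of Python. This is only an illustrative version under stated assumptions: it uses a random pivot and the Lomuto partitioning scheme (one of several common partitioning schemes; the notes do not specify one), and the function name and parameters are chosen here. It sorts in place, using no array storage beyond the recursion stack.

```python
import random


def quicksort(a, lo=0, hi=None):
    """Sort the list a in place between indices lo and hi (inclusive)."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return                          # zero or one element: already sorted
    # Select a random pivot and move it to the end of the range.
    p = random.randint(lo, hi)
    a[p], a[hi] = a[hi], a[p]
    pivot = a[hi]
    # Partition: elements smaller than the pivot are moved to the front.
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]           # place the pivot in its final position
    # Recursively sort each part.
    quicksort(a, lo, i - 1)
    quicksort(a, i + 1, hi)


data = [576, 494, 194, 296, 278, 176, 954]
quicksort(data)
print(data)                             # [176, 194, 278, 296, 494, 576, 954]
```

The swaps during partitioning can reorder equal keys, which is why this version is not stable.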

MergeSort: MergeSort also works recursively. It is a classical divide-and-conquer algorithm. The array is split into two subarrays of roughly equal size. They are sorted recursively. Then the two sorted subarrays are merged together in Θ(n) time.

MergeSort is the only stable sorting algorithm of these three. The downside is that MergeSort is the only algorithm of the three that requires additional array storage (ignoring the recursion stack), and thus it is not in-place. This is because the merging process merges the two arrays into a third array. Although it is possible to merge arrays in-place, no simple method does so in Θ(n) time.

HeapSort: HeapSort is based on a nice data structure, called a heap, which is an efficient implementation of a priority queue data structure. A priority queue supports the operations of inserting a key, and deleting the element with the smallest key value. A heap can be built for n keys in Θ(n) time, and the minimum key can be extracted in Θ(log n) time. HeapSort is an in-place sorting algorithm, but it is not stable.

HeapSort works by building the heap (ordered in reverse order so that the maximum can be extracted efficiently) and then repeatedly extracting the largest element. (Why it extracts the maximum rather than the minimum is an implementation detail, but this is the key to making this work as an in-place sorting algorithm.)

If you only want to extract the k smallest values, a heap can allow you to do this in Θ(n + k log n) time. A heap has the additional advantage of being usable in contexts where the priority of elements changes. Each change of priority (key value) can be processed in Θ(log n) time.

Which sorting algorithm should you implement when implementing your programs? The correct answer is probably "none of them". Unless you know that your input has some special properties that suggest a much faster alternative, it is best to rely on the library sorting procedure supplied on your system. Presumably, it has been engineered to produce the best performance for your system, and saves you from debugging time. Nonetheless, it is important to learn about sorting algorithms, since the fundamental concepts covered there apply to much more complex algorithms.

Selection: A simpler, related problem to sorting is selection. The selection problem is, given an array A of n numbers (not sorted), and an integer k, where 1 ≤ k ≤ n, return the kth smallest value of A. Although selection can be solved in O(n log n) time, by first sorting A and then returning the kth element of the sorted list, it is possible to select the kth smallest element in O(n) time. The algorithm is a variant of QuickSort.

Lower Bounds for Comparison-Based Sorting: The fact that O(n log n) sorting algorithms have been the fastest around for many years suggests that this may be the best that we can do. Can we sort faster? The claim is no, provided that the algorithm is comparison-based. A comparison-based sorting algorithm is one in which the algorithm permutes the elements based solely on the results of the comparisons that the algorithm makes between pairs of elements.

All of the algorithms we have discussed so far are comparison-based. We will see that exceptions exist in special cases. This does not preclude the possibility of sorting algorithms whose actions are determined by other operations, as we shall see below. The following theorem gives the lower bound on comparison-based sorting.

Theorem: Any comparison-based sorting algorithm has worst-case running time Ω(n log n).

We will not present a proof of this theorem, but the basic argument follows from a simple analysis of the number of possibilities and the time it takes to distinguish among them. There are n! ways to permute a given set of n numbers. Any sorting algorithm must be able to distinguish between each of these different possibilities, since two different permutations need to be treated differently. Since each comparison leads to only two possible outcomes, the execution of the algorithm can be viewed as a binary tree. (This is a bit abstract, but given a sorting algorithm it is not hard, but quite tedious, to trace its execution, and set up a new node each time a decision is made.) This binary tree, called a decision tree, must have at least n! leaves, one for each of the possible input permutations. Such a tree, even if perfectly balanced, must have height at least lg(n!). By Stirling's approximation, n!

is, up to lower-order factors, roughly $(n/e)^n$. Plugging this in and simplifying yields the Ω(n log n) lower bound. This can also be generalized to show that the average-case time to sort is also Ω(n log n).

Linear Time Sorting: The Ω(n log n) lower bound implies that if we hope to sort numbers faster than in O(n log n) time, we cannot do it by making comparisons alone. In some special cases, it is possible to sort without the use of comparisons. This leads to the possibility of sorting in linear (that is, O(n)) time. Here are three such algorithms.

Counting Sort: Counting sort assumes that each input is an integer in the range from 1 to k. The algorithm sorts in Θ(n + k) time. Thus, if k is O(n), this implies that the resulting sorting algorithm runs in Θ(n) time. The algorithm requires an additional Θ(n + k) working storage but has the nice feature that it is stable. The algorithm is remarkably simple, but deceptively clever. You are referred to CLRS for the details.

Radix Sort: The main shortcoming of CountingSort is that (due to space requirements) it is only practical for a very small range of integers. If the integers are in the range from, say, 1 to a million, we may not want to allocate an array of a million elements. RadixSort provides a nice way around this by sorting numbers one digit, or one byte, or generally, some group of bits, at a time. As the number of bits in each group increases, the algorithm is faster, but the space requirements go up.

The idea is very simple. Let's think of our list as being composed of n integers, each having d decimal digits (or digits in any base). To sort these integers we simply sort repeatedly, starting at the lowest order digit, and finishing with the highest order digit. Since the sorting algorithm is stable, we know that if the numbers are already sorted with respect to low order digits, and then later we sort with respect to high order digits, numbers having the same high order digit will remain sorted with respect to their low order digit. An example is shown in Figure 2.

    Input      Pass 1     Pass 2     Pass 3     Output
    576        49[4]      9[5]4      [1]76      176
    494        19[4]      5[7]6      [1]94      194
    194        95[4]      1[7]6      [2]78      278
    296   =⇒   57[6]  =⇒  2[7]8  =⇒  [2]96  =⇒  296
    278        29[6]      4[9]4      [4]94      494
    176        17[6]      1[9]4      [5]76      576
    954        27[8]      2[9]6      [9]54      954

    Fig. 2: Example of RadixSort.

The running time is Θ(d(n + k)) where d is the number of digits in each value, n is the length of the list, and k is the number of distinct values each digit may have. The space needed is Θ(n + k).

A common application of this algorithm is for sorting integers over some range that is larger than n, but still polynomial in n. For example, suppose that you wanted to sort a list of integers in the range from 1 to $n^2$. First, you could subtract 1 so that they are now in the range from 0 to $n^2 - 1$. Observe that any number in this range can be expressed as a 2-digit number, where each digit is over the range from 0 to n − 1. In particular, given any integer L in this range, we can write L = an + b, where $a = \lfloor L/n \rfloor$ and b = L mod n. Now, we can think of L as the 2-digit number (a, b). So, we can radix sort these numbers in time Θ(2(n + n)) = Θ(n). In general this works to sort any n numbers over the range from 1 to $n^d$, in Θ(dn) time.

BucketSort: CountingSort and RadixSort are only good for sorting small integers, or at least objects (like characters) that can be encoded as small integers. What if you want to sort a set of floating-point numbers? In the worst-case you are pretty much stuck with using one of the comparison-based sorting algorithms, such as QuickSort, MergeSort, or HeapSort. However, in special cases where you have reason to believe that your numbers are roughly uniformly distributed over some range, then it is possible to do better. (Note

that this is a strong assumption. This algorithm should not be applied unless you have good reason to believe that this is the case.)

Suppose that the numbers to be sorted range over some interval, say [0, 1). (It is possible in O(n) time to find the maximum and minimum values, and scale the numbers to fit into this range.) The idea is to subdivide this interval into n subintervals. For example, if n = 100, the subintervals would be [0, 0.01), [0.01, 0.02), [0.02, 0.03), and so on. We create n different buckets, one for each interval. Then we make a pass through the list to be sorted, and using the floor function, we can map each value to its bucket index. (In this case, the index of element x would be $\lfloor 100x \rfloor$.) We then sort each bucket in ascending order. The number of points per bucket should be fairly small, so even a quadratic time sorting algorithm (e.g. BubbleSort or InsertionSort) should work. Finally, all the sorted buckets are concatenated together.

The analysis relies on the fact that, assuming that the numbers are uniformly distributed, the number of elements lying within each bucket on average is a constant. Thus, the expected time needed to sort each bucket is O(1). Since there are n buckets, the total sorting time is Θ(n). An example illustrating this idea is given in Fig. 3.

[Fig. 3: BucketSort. An input array A of values in [0, 1) distributed into buckets B[0..9].]

Lecture 4: Dynamic Programming: Longest Common Subsequence

Read: Introduction to Chapt 15, and Section 15.4 in CLRS.

Dynamic Programming: We begin discussion of an important algorithm design technique, called dynamic programming (or DP for short). The technique is among the most powerful for designing algorithms for optimization problems. This is true for two reasons. Dynamic programming solutions are based on a few common elements. Dynamic programming problems are typically optimization problems (find the minimum or maximum cost solution, subject to various constraints). The technique is related to divide-and-conquer, in the sense that it breaks problems down into smaller problems that it solves recursively. However, because of the somewhat different nature of dynamic programming problems, standard divide-and-conquer solutions are not usually efficient.

The basic elements that characterize a dynamic programming algorithm are:

Substructure: Decompose your problem into smaller (and hopefully simpler) subproblems. Express the solution of the original problem in terms of solutions for smaller problems.

Table-structure: Store the answers to the subproblems in a table. This is done because subproblem solutions are reused many times.

Bottom-up computation: Combine solutions on smaller subproblems to solve larger subproblems. (Our text also discusses a top-down alternative, called memoization.)

The most important question in designing a DP solution to a problem is how to set up the subproblem structure. This is called the formulation of the problem. Dynamic programming is not applicable to all optimization problems. There are two important elements that a problem must have in order for DP to be applicable.

Optimal substructure: (Sometimes called the principle of optimality.) It states that for the global problem to be solved optimally, each subproblem should be solved optimally. (Not all optimization problems satisfy this. Sometimes it is better to lose a little on one subproblem in order to make a big gain on another.)

Polynomially many subproblems: An important aspect to the efficiency of DP is that the total number of subproblems to be solved should be at most a polynomial number.

Strings: One important area of algorithm design is the study of algorithms for character strings. There are a number of important problems here. Among the most important has to do with efficiently searching for a substring or generally a pattern in a large piece of text. (This is what text editors and programs like "grep" do when you perform a search.) In many instances you do not want to find a piece of text exactly, but rather something that is similar. This arises for example in genetics research and in document retrieval on the web. One common method of measuring the degree of similarity between two strings is to compute their longest common subsequence.

Longest Common Subsequence: Let us think of character strings as sequences of characters. Given two sequences $X = \langle x_1, x_2, \ldots, x_m \rangle$ and $Z = \langle z_1, z_2, \ldots, z_k \rangle$, we say that Z is a subsequence of X if there is a strictly increasing sequence of k indices $\langle i_1, i_2, \ldots, i_k \rangle$ ($1 \le i_1 < i_2 < \ldots < i_k \le m$) such that $Z = \langle x_{i_1}, x_{i_2}, \ldots, x_{i_k} \rangle$. For example, let X = ⟨ABRACADABRA⟩ and let Z = ⟨AADAA⟩, then Z is a subsequence of X.

Given two strings X and Y, the longest common subsequence of X and Y is a longest sequence Z that is a subsequence of both X and Y. For example, let X = ⟨ABRACADABRA⟩ and let Y = ⟨YABBADABBADOO⟩. Then the longest common subsequence is Z = ⟨ABADABA⟩. See Fig. 4.

    X   = A B R A C A D A B R A
    LCS = A B A D A B A
    Y   = Y A B B A D A B B A D O O

    Fig. 4: An example of the LCS of two strings X and Y.

The Longest Common Subsequence Problem (LCS) is the following. Given two sequences $X = \langle x_1, \ldots, x_m \rangle$ and $Y = \langle y_1, \ldots, y_n \rangle$ determine a longest common subsequence. Note that it is not always unique. For example the LCS of ⟨ABC⟩ and ⟨BAC⟩ is either ⟨AC⟩ or ⟨BC⟩.

DP Formulation for LCS: The simple brute-force solution to the problem would be to try all possible subsequences from one string, and search for matches in the other string, but this is hopelessly inefficient, since there are an exponential number of possible subsequences.

Instead, we will derive a dynamic programming solution. In typical DP fashion, we need to break the problem into smaller pieces. There are many ways to do this for strings, but it turns out for this problem that considering all pairs of prefixes will suffice for us. A prefix of a sequence is just an initial string of values, $X_i = \langle x_1, x_2, \ldots, x_i \rangle$. $X_0$ is the empty sequence.

The idea will be to compute the longest common subsequence for every possible pair of prefixes. Let c[i, j] denote the length of the longest common subsequence of $X_i$ and $Y_j$. For example, in the above case we have $X_5$ = ⟨ABRAC⟩ and $Y_6$ = ⟨YABBAD⟩. Their longest common subsequence is ⟨ABA⟩. Thus, c[5, 6] = 3.

Which of the c[i, j] values do we compute? Since we don't know which will lead to the final optimum, we compute all of them. Eventually we are interested in c[m, n] since this will be the length of the LCS of the two entire strings. The idea is to compute c[i, j] assuming that we already know the values of c[i′, j′], for i′ ≤ i and j′ ≤ j (but not both equal). Here are the possible cases.

Basis: c[i, 0] = c[0, j] = 0. If either sequence is empty, then the longest common subsequence is empty.

Last characters match: Suppose $x_i = y_j$. For example: Let $X_i$ = ⟨ABCA⟩ and let $Y_j$ = ⟨DACA⟩. Since both end in A, we claim that the LCS must also end in A. (We will leave the proof as an exercise.) Since the A is part of the LCS we may find the overall LCS by removing A from both sequences and taking the LCS of $X_{i-1}$ = ⟨ABC⟩ and $Y_{j-1}$ = ⟨DAC⟩ which is ⟨AC⟩ and then adding A to the end, giving ⟨ACA⟩ as the answer. (At first you might object: But how did you know that these two A's matched with each other? The answer is that we don't, but it will not make the LCS any smaller if we do.) This is illustrated at the top of Fig. 5.

    if $x_i = y_j$ then c[i, j] = c[i − 1, j − 1] + 1

[Fig. 5: The possible cases in the DP formulation of LCS. Top: last characters match, so the common character is added to the LCS of $X_{i-1}$ and $Y_{j-1}$. Bottom: last characters do not match, so take the maximum of skipping $x_i$ and skipping $y_j$.]

Last characters do not match: Suppose that $x_i \ne y_j$. In this case $x_i$ and $y_j$ cannot both be in the LCS (since they would have to be the last character of the LCS). Thus either $x_i$ is not part of the LCS, or $y_j$ is not part of the LCS (and possibly both are not part of the LCS).

At this point it may be tempting to try to make a "smart" choice. By analyzing the last few characters of $X_i$ and $Y_j$, perhaps we can figure out which character is best to discard. However, this approach is doomed to failure (and you are strongly encouraged to think about this, since it is a common point of confusion). Instead, our approach is to take advantage of the fact that we have already precomputed smaller subproblems, and use these results to guide us.

In the first case ($x_i$ is not in the LCS) the LCS of $X_i$ and $Y_j$ is the LCS of $X_{i-1}$ and $Y_j$, whose length is c[i − 1, j]. In the second case ($y_j$ is not in the LCS) the LCS is the LCS of $X_i$ and $Y_{j-1}$, whose length is c[i, j − 1]. We do not know which is the case, so we try both and take the one that gives us the longer LCS. This is illustrated at the bottom half of Fig. 5.

    if $x_i \ne y_j$ then c[i, j] = max(c[i − 1, j], c[i, j − 1])

Combining these observations we have the following formulation:

$$c[i,j] = \begin{cases} 0 & \text{if } i = 0 \text{ or } j = 0, \\ c[i-1,j-1] + 1 & \text{if } i,j > 0 \text{ and } x_i = y_j, \\ \max(c[i,j-1],\, c[i-1,j]) & \text{if } i,j > 0 \text{ and } x_i \ne y_j. \end{cases}$$

Implementing the Formulation: The task now is to simply implement this formulation. We concentrate only on computing the maximum length of the LCS. Later we will see how to extract the actual sequence. We will store some helpful pointers in a parallel array, b[0..m, 0..n]. The code is shown below, and an example is illustrated in Fig. 6.

                 Y:    B   D   C   B
            j =  0     1   2   3   4 (= n)
    i = 0        0     0   0   0   0
        1   B    0     1   1   1   1
    X:  2   A    0     1   1   1   1
        3   C    0     1   1   2   2
        4   D    0     1   2   2   2
    m = 5   B    0     1   2   2   3

    X = BACDB, Y = BDCB, LCS = BCB

    Fig. 6: Longest common subsequence example for the sequences X = ⟨BACDB⟩ and Y = ⟨BDCB⟩. The numeric table entries are the values of c[i, j]. In the original figure a second copy of the table includes the back pointers (arrows) used in the extraction of the sequence, which starts at the lower right entry.

Build LCS Table

    LCS(x[1..m], y[1..n]) {                           // compute LCS table
        int c[0..m, 0..n]
        for i = 0 to m                                // init column 0
            c[i,0] = 0; b[i,0] = skipX
        for j = 0 to n                                // init row 0
            c[0,j] = 0; b[0,j] = skipY
        for i = 1 to m                                // fill rest of table
            for j = 1 to n
                if (x[i] == y[j])                     // take X[i] (Y[j]) for LCS
                    c[i,j] = c[i-1,j-1]+1; b[i,j] = addXY
                else if (c[i-1,j] >= c[i,j-1])        // X[i] not in LCS
                    c[i,j] = c[i-1,j]; b[i,j] = skipX
                else                                  // Y[j] not in LCS
                    c[i,j] = c[i,j-1]; b[i,j] = skipY
        return c[m,n]                                 // return length of LCS
    }

Extracting the LCS

    getLCS(x[1..m], y[1..n], b[0..m,0..n]) {
        LCSstring = empty string
        i = m; j = n                                  // start at lower right
        while (i != 0 && j != 0)                      // go until upper left
            switch b[i,j]
                case addXY:                           // add X[i] (=Y[j])
                    add x[i] (or equivalently y[j]) to front of LCSstring
                    i--; j--; break
                case skipX: i--; break                // skip X[i]
                case skipY: j--; break                // skip Y[j]
        return LCSstring
    }
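For readers who want to run the two procedures, here is a minimal Python sketch of the same table construction and traceback. It is an adaptation, not the notes' own code: strings are 0-indexed in Python, so x[i − 1] plays the role of X[i], the back pointers are stored as the strings "addXY", "skipX" and "skipY", and the function names are chosen here for illustration.

```python
def lcs_table(x, y):
    """Build the length table c and back-pointer table b for strings x and y."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]      # row 0 and column 0 stay 0
    b = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:               # last characters match
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "addXY"
            elif c[i - 1][j] >= c[i][j - 1]:       # x[i-1] not in LCS
                c[i][j] = c[i - 1][j]
                b[i][j] = "skipX"
            else:                                  # y[j-1] not in LCS
                c[i][j] = c[i][j - 1]
                b[i][j] = "skipY"
    return c, b


def get_lcs(x, y, b):
    """Follow the back pointers from the lower right corner to recover an LCS."""
    i, j = len(x), len(y)
    chars = []
    while i != 0 and j != 0:
        if b[i][j] == "addXY":
            chars.append(x[i - 1])                 # collected back to front
            i -= 1
            j -= 1
        elif b[i][j] == "skipX":
            i -= 1
        else:
            j -= 1
    return "".join(reversed(chars))


c, b = lcs_table("BACDB", "BDCB")
print(c[5][4], get_lcs("BACDB", "BDCB", b))        # 3 BCB
```

On the example of Fig. 6 this reproduces the table entry c[5, 4] = 3 and the subsequence BCB. With a different tie-breaking rule in the non-matching case, the traceback could return another LCS of the same length.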

The running time of the algorithm is clearly O(mn) since there are two nested loops with m and n iterations, respectively. The algorithm also uses O(mn) space.

Extracting the Actual Sequence: Extracting the final LCS is done by using the back pointers stored in b[0..m, 0..n]. Intuitively b[i, j] = addXY means that X[i] and Y[j] together form the last character of the LCS. So we take this common character, and continue with entry b[i − 1, j − 1] to the northwest (↖). If b[i, j] = skipX, then we know that X[i] is not in the LCS, and so we skip it and go to b[i − 1, j] above us (↑). Similarly, if b[i, j] = skipY, then we know that Y[j] is not in the LCS, and so we skip it and go to b[i, j − 1] to the left (←). Following these back pointers, and outputting a character with each diagonal move gives the final subsequence.

Lecture 5: Dynamic Programming: Chain Matrix Multiplication

Read: Chapter 15 of CLRS, and Section 15.2 in particular.

Chain Matrix Multiplication: This problem involves the question of determining the optimal sequence for performing a series of operations. This general class of problem is important in compiler design for code optimization and in databases for query optimization. We will study the problem in a very restricted instance, where the dynamic programming issues are easiest to see.

Suppose that we wish to multiply a series of matrices $A_1 A_2 \cdots A_n$. Matrix multiplication is an associative but not a commutative operation. This means that we are free to parenthesize the above multiplication however we like, but we are not free to rearrange the order of the matrices. Also recall that when two (nonsquare) matrices are being multiplied, there are restrictions on the dimensions. A p × q matrix has p rows and q columns. You can multiply a p × q matrix A times a q × r matrix B, and the result will be a p × r matrix C. (The number of columns of A must equal the number of rows of B.) In particular for 1 ≤ i ≤ p and 1 ≤ j ≤ r,

$$C[i, j] = \sum_{k=1}^{q} A[i, k]\, B[k, j].$$

This corresponds to the (hopefully familiar) rule that the [i, j] entry of C is the dot product of the ith (horizontal) row of A and the jth (vertical) column of B. Observe that there are pr total entries in C and each takes O(q) time to compute, thus the total time to multiply these two matrices is proportional to the product of the dimensions, pqr.

Fig. 7: Matrix Multiplication. A (p × q) times B (q × r) equals C (p × r); multiplication time = pqr.

Note that although any legal parenthesization will lead to a valid result, not all involve the same number of operations. Consider the case of 3 matrices: $A_1$ be 5 × 4, $A_2$ be 4 × 6 and $A_3$ be 6 × 2.

$$\mathrm{multCost}[((A_1 A_2) A_3)] = (5 \cdot 4 \cdot 6) + (5 \cdot 6 \cdot 2) = 180,$$
$$\mathrm{multCost}[(A_1 (A_2 A_3))] = (4 \cdot 6 \cdot 2) + (5 \cdot 4 \cdot 2) = 88.$$

Even for this small example, considerable savings can be achieved by reordering the evaluation sequence.
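The two costs in this small example can be checked mechanically. The sketch below is an illustration only: it assumes a parenthesization is written as nested pairs of 1-based matrix indices, and the helper name `mult_cost` and the list `dims` (so that matrix i has shape dims[i-1] × dims[i]) are hypothetical names introduced here.

```python
def mult_cost(expr, dims):
    """Return (rows, cols, cost) of evaluating a fully parenthesized product.

    expr is an int i (the matrix A_i) or a pair (left, right) of sub-expressions.
    """
    if isinstance(expr, int):
        return dims[expr - 1], dims[expr], 0      # a single matrix costs nothing
    left, right = expr
    p, q, cost_left = mult_cost(left, dims)
    q2, r, cost_right = mult_cost(right, dims)
    assert q == q2, "inner dimensions must agree"
    # Multiplying a p x q matrix by a q x r matrix takes p*q*r scalar multiplications.
    return p, r, cost_left + cost_right + p * q * r

dims = [5, 4, 6, 2]                               # A1: 5x4, A2: 4x6, A3: 6x2
print(mult_cost(((1, 2), 3), dims)[2])            # 180
print(mult_cost((1, (2, 3)), dims)[2])            # 88
```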

Chain Matrix Multiplication Problem: Given a sequence of matrices $A_1, A_2, \ldots, A_n$ and dimensions $p_0, p_1, \ldots, p_n$ where $A_i$ is of dimension $p_{i-1} \times p_i$, determine the order of multiplication (represented, say, as a binary tree) that minimizes the number of operations.

Important Note: This algorithm does not perform the multiplications, it just determines the best order in which to perform the multiplications.

Naive Algorithm: We could write a procedure which tries all possible parenthesizations. Unfortunately, the number of ways of parenthesizing an expression is very large. If you have just one or two matrices, then there is only one way to parenthesize. If you have n items, then there are n − 1 places where you could break the list with the outermost pair of parentheses, namely just after the 1st item, just after the 2nd item, etc., and just after the (n − 1)st item. When we split just after the kth item, we create two sublists to be parenthesized, one with k items, and the other with n − k items. Then we could consider all the ways of parenthesizing these. Since these are independent choices, if there are L ways to parenthesize the left sublist and R ways to parenthesize the right sublist, then the total is L · R. This suggests the following recurrence for P(n), the number of different ways of parenthesizing n items:

$$P(n) = \begin{cases} 1 & \text{if } n = 1, \\ \sum_{k=1}^{n-1} P(k)\, P(n-k) & \text{if } n \ge 2. \end{cases}$$

This is related to a famous function in combinatorics called the Catalan numbers (which in turn is related to the number of different binary trees on n nodes). In particular P(n) = C(n − 1), where C(n) is the nth Catalan number:

$$C(n) = \frac{1}{n+1} \binom{2n}{n}.$$

Applying Stirling's formula (which is given in our text), we find that $C(n) \in \Omega(4^n / n^{3/2})$. Since $4^n$ is exponential and $n^{3/2}$ is just polynomial, the exponential will dominate, implying that the function grows very fast. Thus, this will not be practical except for very small n. In summary, brute force is not an option.

Dynamic Programming Approach: This problem, like other dynamic programming problems, involves determining a structure (in this case, a parenthesization). We want to break the problem into subproblems, whose solutions can be combined to solve the global problem. As is common to any DP solution, we need to find some way to break the problem into smaller subproblems, and we need to determine a recursive formulation, which represents the optimum solution to each problem in terms of solutions to the subproblems. Let us think of how we can do this.

Since matrices cannot be reordered, it makes sense to think about sequences of matrices. Let $A_{i..j}$ denote the result of multiplying matrices i through j. It is easy to see that $A_{i..j}$ is a $p_{i-1} \times p_j$ matrix. (Think about this for a second to be sure you see why.) Now, in order to determine how to perform this multiplication optimally, we need to make many decisions. What we want to do is to break the problem into problems of a similar structure. In parenthesizing the expression, we can consider the highest level of parenthesization. At this level we are simply multiplying two matrices together. That is, for any k, 1 ≤ k ≤ n − 1,

$$A_{1..n} = A_{1..k} \cdot A_{k+1..n}.$$

Thus the problem of determining the optimal sequence of multiplications is broken up into two questions: how do we decide where to split the chain (what is k?) and how do we parenthesize the subchains $A_{1..k}$ and $A_{k+1..n}$? The subchain problems can be solved recursively, by applying the same scheme.

So, let us think about the problem of determining the best value of k. At this point, you may be tempted to consider some clever ideas. For example, since we want matrices with small dimensions, pick the value of k that minimizes $p_k$. This is not a bad idea, in principle. (After all it might work. It just turns out that it doesn't in this case. This takes a bit of thinking, which you should try.) Instead, as is true in almost all dynamic programming solutions, we will do the dumbest thing of simply considering all possible choices of k, and taking the best of them. Usually trying all possible choices is bad, since it quickly leads to an exponential number of total possibilities.
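The relation P(n) = C(n − 1) from the Naive Algorithm discussion can be checked numerically for small n. This is a minimal sketch; the function names `num_parenthesizations` and `catalan` are hypothetical, and memoization via `functools.lru_cache` is used only to keep the recursion fast.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_parenthesizations(n: int) -> int:
    """P(n): the number of ways to fully parenthesize a chain of n items."""
    if n == 1:
        return 1
    # Split after the k-th item; the two sides are parenthesized independently.
    return sum(num_parenthesizations(k) * num_parenthesizations(n - k)
               for k in range(1, n))

def catalan(n: int) -> int:
    """C(n) = binom(2n, n) / (n + 1), computed exactly in integers."""
    return comb(2 * n, n) // (n + 1)

print([num_parenthesizations(n) for n in range(1, 8)])  # [1, 1, 2, 5, 14, 42, 132]
```

Even at n = 20 the count is already 1,767,263,190, which is why enumerating parenthesizations is impractical.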

What saves us here is that there are only $O(n^2)$ different sequences of matrices. (There are $\binom{n}{2} = n(n-1)/2$ ways of choosing i and j to form $A_{i..j}$ to be precise.) Thus, we do not encounter the exponential growth.

Notice that our chain matrix multiplication problem satisfies the principle of optimality, because once we decide to break the sequence into the product $A_{1..k} \cdot A_{k+1..n}$, we should compute each subsequence optimally. That is, for the global problem to be solved optimally, the subproblems must be solved optimally as well.

Dynamic Programming Formulation: We will store the solutions to the subproblems in a table, and build the table in a bottom-up manner. For 1 ≤ i ≤ j ≤ n, let m[i, j] denote the minimum number of multiplications needed to compute $A_{i..j}$. The optimum cost can be described by the following recursive formulation.

Basis: Observe that if i = j then the sequence contains only one matrix, and so the cost is 0. (There is nothing to multiply.) Thus, m[i, i] = 0.

Step: If i < j, then we are asking about the product $A_{i..j}$. This can be split by considering each k, i ≤ k < j, as $A_{i..k}$ times $A_{k+1..j}$.

The optimum times to compute $A_{i..k}$ and $A_{k+1..j}$ are, by definition, m[i, k] and m[k + 1, j], respectively. We may assume that these values have been computed previously and are already stored in our array. Since $A_{i..k}$ is a $p_{i-1} \times p_k$ matrix, and $A_{k+1..j}$ is a $p_k \times p_j$ matrix, the time to multiply them is $p_{i-1} p_k p_j$. This suggests the following recursive rule for computing m[i, j].

$$m[i, i] = 0,$$
$$m[i, j] = \min_{i \le k < j} \bigl( m[i, k] + m[k+1, j] + p_{i-1}\, p_k\, p_j \bigr) \quad \text{for } i < j.$$

Fig. 8: Dynamic Programming Formulation. The chain $A_i A_{i+1} \cdots A_k A_{k+1} \cdots A_j$ is split into $A_{i..k}$ and $A_{k+1..j}$, whose product is $A_{i..j}$.

It is not hard to convert this rule into a procedure, which is given below. The only tricky part is arranging the order in which to compute the values. In the process of computing m[i, j] we need to access values m[i, k] and m[k + 1, j] for k lying between i and j. This suggests that we should organize our computation according to the number of matrices in the subsequence. Let L = j − i + 1 denote the length of the subchain being multiplied. The subchains of length 1 (m[i, i]) are trivial to compute. Then we build up by computing the subchains of lengths 2, 3, ..., n. The final answer is m[1, n]. We need to be a little careful in setting up the loops. If a subchain of length L starts at position i, then j = i + L − 1. Since we want j ≤ n, this means that i + L − 1 ≤ n, or in other words, i ≤ n − L + 1. So our loop for i runs from 1 to n − L + 1 (in order to keep j in bounds). The code is presented below.

The array s[i, j] will be explained later. It is used to extract the actual sequence. The running time of the procedure is $\Theta(n^3)$. We'll leave this as an exercise in solving sums, but the key is that there are three nested loops, and each can iterate at most n times.

Extracting the final Sequence: Extracting the actual multiplication sequence is a fairly easy extension. The basic idea is to leave a split marker indicating what the best split is, that is, the value of k that leads to the minimum value of m[i, j].
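The recursive rule, the loop ordering by subchain length, and the split markers just described can be sketched in Python as follows. This is an illustrative sketch rather than the notes' own procedure: the function names `matrix_chain` and `parenthesize` are hypothetical, tables are allocated with an unused row and column 0 so that indices match the 1-based notation, and the dimension list `p` is assumed to hold $p_0, \ldots, p_n$.

```python
def matrix_chain(p):
    """Return (m, s): minimum costs and split markers for dimensions p[0..n]."""
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]     # m[i][i] = 0 is the basis case
    s = [[0] * (n + 1) for _ in range(n + 1)]     # s[i][j] is meaningful only for j > i
    for length in range(2, n + 1):                # length of the subchain
        for i in range(1, n - length + 2):        # keeps j = i + length - 1 <= n
            j = i + length - 1
            m[i][j] = float("inf")
            for k in range(i, j):                 # try every split i <= k < j
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j] = q
                    s[i][j] = k                   # remember the best split
    return m, s

def parenthesize(s, i, j):
    """Rebuild the optimal parenthesization of A_i .. A_j from the split markers."""
    if i == j:
        return f"A{i}"
    k = s[i][j]
    return f"({parenthesize(s, i, k)}{parenthesize(s, k + 1, j)})"

m, s = matrix_chain([5, 4, 6, 2, 7])
print(m[1][4], parenthesize(s, 1, 4))             # 158 ((A1(A2A3))A4)
```

The three nested loops make the $\Theta(n^3)$ running time visible directly, and `parenthesize` only reads the split markers, so it never touches the cost table.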

We can maintain a parallel array s[i, j] in which we will store the value of k providing the optimal split. For example, suppose that s[i, j] = k. This tells us that the best way to multiply the subchain $A_{i..j}$ is to first multiply the subchain $A_{i..k}$ and then multiply the subchain $A_{k+1..j}$, and finally multiply these together. Intuitively, s[i, j] tells us what multiplication to perform last. Note that we only need to store s[i, j] when we have at least two matrices, that is, if j > i.

Chain Matrix Multiplication
    Matrix-Chain(array p[0..n]) {
        array s[1..n-1,2..n]
        for i = 1 to n do m[i,i] = 0;              // initialize
        for L = 2 to n do {                        // L = length of subchain
            for i = 1 to n-L+1 do {
                j = i + L - 1;
                m[i,j] = INFINITY;
                for k = i to j-1 do {              // check all splits
                    q = m[i, k] + m[k+1, j] + p[i-1]*p[k]*p[j]
                    if (q < m[i, j]) { m[i,j] = q; s[i,j] = k; }
                }
            }
        }
        return m[1,n] (final cost) and s (splitting markers);
    }

The actual multiplication algorithm uses the s[i, j] value to determine how to split the current sequence. Assume that the matrices are stored in an array of matrices A[1..n], and that s[i, j] is global to this recursive procedure. The recursive procedure Mult below does this computation and returns a matrix.

Extracting Optimum Sequence
    Mult(i, j) {
        if (i == j)                                // basis case
            return A[i];
        else {
            k = s[i,j]
            X = Mult(i, k)                         // X = A[i]...A[k]
            Y = Mult(k+1, j)                       // Y = A[k+1]...A[j]
            return X*Y;                            // multiply matrices X and Y
        }
    }

In the figure below we show an example. This algorithm is tricky, so it would be a good idea to trace through this example (and the one given in the text). The initial set of dimensions are ⟨5, 4, 6, 2, 7⟩ meaning that we are multiplying $A_1$ (5 × 4) times $A_2$ (4 × 6) times $A_3$ (6 × 2) times $A_4$ (2 × 7). The optimal sequence is $((A_1 (A_2 A_3)) A_4)$.

Fig. 9: Chain Matrix Multiplication Example. Dimensions $p_0, \ldots, p_4$ = 5, 4, 6, 2, 7. Table m[i, j]: m[1,2] = 120, m[2,3] = 48, m[3,4] = 84, m[1,3] = 88, m[2,4] = 104, m[1,4] = 158. Table s[i, j]: s[1,2] = 1, s[2,3] = 2, s[3,4] = 3, s[1,3] = 1, s[2,4] = 3, s[1,4] = 3. The final order is shown as a binary tree over $A_1, A_2, A_3, A_4$.

Lecture 6: Dynamic Programming: Minimum Weight Triangulation

Read: This is not covered in CLRS.

Polygons and Triangulations: Let's consider a geometric problem that outwardly appears to be quite different from chain-matrix multiplication, but actually has remarkable similarities. We begin with a number of definitions. Define a polygon to be a piecewise linear closed curve in the plane. In other words, we form a cycle by joining line segments end to end. The line segments are called the sides of the polygon and the endpoints are called the vertices. A polygon is simple if it does not cross itself, that is, if the sides do not intersect one another except for two consecutive sides sharing a common vertex. A simple polygon subdivides the plane into its interior, its boundary and its exterior. A simple polygon is said to be convex if every interior angle is at most 180 degrees. Vertices with interior angle equal to 180 degrees are normally allowed, but for this problem we will assume that no such vertices exist.

Fig. 10: Polygons. Panels: Polygon, Simple polygon, Convex polygon.

Given a convex polygon, we assume that its vertices are labeled in counterclockwise order $P = \langle v_1, \ldots, v_n \rangle$. We will assume that indexing of vertices is done modulo n, so $v_0 = v_n$. This polygon has n sides, $v_{i-1} v_i$. Given two nonadjacent vertices $v_i$ and $v_j$, where i < j − 1, the line segment $v_i v_j$ is a chord. (If the polygon is simple but not convex, we include the additional requirement that the interior of the segment must lie entirely in the interior of P.) Any chord subdivides the polygon into two polygons: $\langle v_i, v_{i+1}, \ldots, v_j \rangle$, and $\langle v_j, v_{j+1}, \ldots, v_i \rangle$. A triangulation of a convex polygon P is a subdivision of the interior of P into a collection of triangles with disjoint interiors, whose vertices are drawn from the vertices of P. Equivalently, we can define a triangulation as a maximal set T of nonintersecting chords. (In other words, every chord that is not in T intersects the interior of some chord in T.) It is easy to see that such a set of chords subdivides the interior of the polygon into a collection of triangles with pairwise disjoint interiors (and hence the name triangulation). It is not hard to prove (by induction) that every triangulation of an n-sided polygon consists of n − 3 chords and n − 2 triangles. Triangulations are of interest for a number of reasons. Many geometric algorithms operate by first decomposing a complex polygonal shape into triangles.

In general, given a convex polygon, there are many possible triangulations. In fact, the number is exponential in n, the number of sides. Which triangulation is the "best"? There are many criteria that are used depending on the application. One criterion is to imagine that you must "pay" for the ink you use in drawing the triangulation, and you want to minimize the amount of ink you use. (This may sound fanciful, but minimizing wire length is an important condition in chip design. Further, this is one of many properties which we could choose to optimize.)

Fig. 11: Triangulations of convex polygons, and the minimum weight triangulation. Panels: A triangulation; Lower weight triangulation.

This suggests the following optimization problem:

Minimum-weight convex polygon triangulation: Given a convex polygon determine the triangulation that minimizes the sum of the perimeters of its triangles. (See Fig. 11.)

Given three distinct vertices $v_i$, $v_j$, $v_k$, we define the weight of the associated triangle by the weight function

$$w(v_i, v_j, v_k) = |v_i v_j| + |v_j v_k| + |v_k v_i|,$$

where $|v_i v_j|$ denotes the length of the line segment $v_i v_j$.

Dynamic Programming Solution: Let us consider an (n + 1)-sided polygon $P = \langle v_0, v_1, \ldots, v_n \rangle$. Let us assume that these vertices have been numbered in counterclockwise order. To derive a DP formulation we need to define a set of subproblems from which we can derive the optimum solution. For 0 ≤ i < j ≤ n, define t[i, j] to be the weight of the minimum weight triangulation for the subpolygon that lies to the right of directed chord $v_i v_j$, that is, the polygon with the counterclockwise vertex sequence $\langle v_i, v_{i+1}, \ldots, v_j \rangle$. Observe that if we can compute this quantity for all such i and j, then the weight of the minimum weight triangulation of the entire polygon can be extracted as t[0, n]. (As usual, we only compute the minimum weight. But, it is easy to modify the procedure to extract the actual triangulation.)

As a basis case, we define the weight of the trivial "2-sided polygon" to be zero, implying that t[i, i + 1] = 0. In general, to compute t[i, j], consider the subpolygon $\langle v_i, v_{i+1}, \ldots, v_j \rangle$, where j > i + 1. One of the chords of this polygon is the side $v_i v_j$. We may split this subpolygon by introducing a triangle whose base is this chord, and whose third vertex is any vertex $v_k$, where i < k < j. This subdivides the polygon into the subpolygons $\langle v_i, v_{i+1}, \ldots, v_k \rangle$ and $\langle v_k, v_{k+1}, \ldots, v_j \rangle$ whose minimum weights are already known to us as t[i, k] and t[k, j]. In addition we should consider the weight of the newly added triangle $v_i v_k v_j$. Thus, we have the following recursive rule:

$$t[i, j] = \begin{cases} 0 & \text{if } j = i + 1, \\ \min_{i < k < j} \bigl( t[i, k] + t[k, j] + w(v_i, v_k, v_j) \bigr) & \text{if } j > i + 1. \end{cases}$$

The final output is the overall minimum weight, which is t[0, n]. This is illustrated in Fig. 12.

Note that this has almost exactly the same structure as the recursive definition used in the chain matrix multiplication algorithm (except that some indices are different by 1). The same $\Theta(n^3)$ algorithm can be applied with only minor changes.

Relationship to Binary Trees: One explanation behind the similarity of triangulations and the chain matrix multiplication algorithm is to observe that both are fundamentally related to binary trees. In the case of the chain matrix multiplication, the associated binary tree is the evaluation tree for the multiplication, where the leaves of the tree correspond to the matrices, and each node of the tree is associated with a product of a sequence of two or more matrices. To see that there is a similar correspondence here, consider an (n + 1)-sided convex polygon $P = \langle v_0, v_1, \ldots, v_n \rangle$, and fix one side of the polygon (say $v_0 v_n$). Now consider a rooted binary tree whose root node is the triangle containing side $v_0 v_n$, whose internal nodes are the nodes of the dual tree, and whose leaves correspond to the remaining sides of the polygon.
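Returning briefly to the recursive rule for t[i, j] given earlier in this lecture, it can be sketched in Python as shown here. This is an illustration under stated assumptions: the function name `min_weight_triangulation` is hypothetical, the input is assumed to be a list of at least three (x, y) vertex coordinates of a convex polygon in counterclockwise order, and `math.dist` (Python 3.8 or later) supplies the Euclidean lengths.

```python
import math

def min_weight_triangulation(pts):
    """Return t[0][n], the minimum total triangle perimeter over all triangulations.

    pts holds the vertices v_0 .. v_n of a convex polygon in counterclockwise order.
    """
    n = len(pts) - 1

    def w(i, j, k):
        # Weight of triangle v_i v_j v_k: its perimeter.
        return (math.dist(pts[i], pts[j]) + math.dist(pts[j], pts[k])
                + math.dist(pts[k], pts[i]))

    # t[i][i + 1] = 0 is the basis case (the trivial "2-sided polygon").
    t = [[0.0] * (n + 1) for _ in range(n + 1)]
    for gap in range(2, n + 1):                   # gap = j - i
        for i in range(0, n - gap + 1):
            j = i + gap
            t[i][j] = min(t[i][k] + t[k][j] + w(i, k, j)
                          for k in range(i + 1, j))
    return t[0][n]

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(min_weight_triangulation(square))           # about 6.828, i.e. 4 + 2*sqrt(2)
```

As in the chain matrix procedure, the table is filled in order of increasing subproblem size, so every entry on the right-hand side is available when it is needed.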

Fig. 12: Triangulations and tree structure. The subpolygon $\langle v_i, \ldots, v_j \rangle$ is split by the triangle $v_i v_k v_j$ at cost $w(v_i, v_k, v_j)$; the two remaining pieces are triangulated at cost t[i, k] and t[k, j].

Fig. 13: Triangulations and tree structure. A triangulated polygon with vertices $v_0, \ldots, v_{11}$ and the corresponding binary tree with leaves $A_1, \ldots, A_{11}$.

Observe that partitioning the polygon into triangles is equivalent to a binary tree with n leaves, and vice versa. This is illustrated in Fig. 13. Note that every triangle is associated with an internal node of the tree and every edge of the original polygon, except for the distinguished starting side $v_0 v_n$, is associated with a leaf node of the tree.

Once you see this connection, the following two observations follow easily. Observe that the associated binary tree has n leaves, and hence (by standard results on binary trees) n − 1 internal nodes. Since each internal node other than the root has one edge entering it, there are n − 2 edges between the internal nodes. Each internal node corresponds to one triangle, and each edge between internal nodes corresponds to one chord of the triangulation.

Lecture 7: Greedy Algorithms: Activity Selection and Fractional Knapsack

Read: Sections 16.1 and 16.2 in CLRS.

Greedy Algorithms: In many optimization algorithms a series of selections need to be made. In dynamic programming we saw one way to make these selections. Namely, the optimal solution is described in a recursive manner, and then is computed "bottom-up". Dynamic programming is a powerful technique, but it often leads to algorithms with higher than desired running times. Today we will consider an alternative design technique, called greedy algorithms. This method typically leads to simpler and faster algorithms, but it is not as powerful or as widely applicable as dynamic programming. We will give some examples of problems that can be solved by greedy algorithms. (Later in the semester, we will see that this technique can be applied to a number of graph problems as well.) Even when greedy algorithms do not produce the optimal solution, they often provide fast heuristics (nonoptimal solution strategies), and are often used in finding good approximations.

Activity Scheduling: Activity scheduling is a very simple scheduling problem. We are given a set S = {1, 2, ..., n} of n activities that are to be scheduled to use some resource, where each activity must be started at a given start time $s_i$ and ends at a given finish time $f_i$. For example, these might be lectures that are to be given in a lecture hall, where the lecture times have been set up in advance, or requests for boats to use a repair facility while they are in port.

Because there is only one resource, and some start and finish times may overlap (and two lectures cannot be given in the same room at the same time), not all the requests can be honored. We say that two activities i and j are noninterfering if their start-finish intervals do not overlap, more formally, $[s_i, f_i) \cap [s_j, f_j) = \emptyset$. (Note that by making the intervals half open, two consecutive activities are not considered to interfere.) The activity scheduling problem is to select a maximum-size set of mutually noninterfering activities for use of the resource. (Notice that the goal here is maximum number of activities, not maximum utilization. Of course different criteria could be considered, but the greedy approach may not be optimal in general.)

How do we schedule the largest number of activities on the resource? Intuitively, we do not like long activities, because they occupy the resource and keep us from honoring other requests. This suggests the following greedy strategy: repeatedly select the activity with the smallest duration ($f_i - s_i$) and schedule it, provided that it does not interfere with any previously scheduled activities. Although this seems like a reasonable strategy, this turns out to be nonoptimal. (See Problem 17.1-4 in CLRS.) Sometimes the design of a correct greedy algorithm requires trying a few different strategies, until hitting on one that works.

Here is a greedy strategy that does work. The intuition is the same. Since we do not like activities that take a long time, let us select the activity that finishes first and schedule it. Then, we skip all activities that interfere with this one, and schedule the next one that has the earliest finish time, and so on. To make the selection process faster, we assume that the activities have been sorted by their finish times, that is, $f_1 \le f_2 \le \ldots \le f_n$. Assuming this sorting, the pseudocode for the rest of the algorithm is presented below. The output is the list A of scheduled activities. The variable prev holds the index of the most recently scheduled activity at any time, in order to determine interferences.

Greedy Activity Scheduler
    schedule(s[1..n], f[1..n]) {                   // given start and finish times
                                                   // we assume f[1..n] already sorted
        List A = <1>                               // schedule activity 1 first
        prev = 1
        for i = 2 to n
            if (s[i] >= f[prev]) {                 // no interference?
                append i to A; prev = i            // schedule i next
            }
        return A
    }

It is clear that the algorithm is quite simple and efficient. The most costly activity is that of sorting the activities by finish time, so the total running time is $\Theta(n \log n)$. Fig. 14 shows an example. Each activity is represented by its start-finish time interval. Observe that the intervals are sorted by finish time. Event 1 is scheduled first. It interferes with activity 2 and 3. Then Event 4 is scheduled. It interferes with activity 5 and 6. Finally, activity 7 is scheduled, and it interferes with the remaining activity. The final output is {1, 4, 7}. Note that this is not the only optimal schedule. {2, 4, 7} is also optimal.

Proof of Optimality: Our proof of optimality is based on showing that the first choice made by the algorithm is the best possible, and then using induction to show that the rest of the choices result in an optimal schedule. Proofs of optimality for greedy algorithms follow a similar structure. Suppose that you have any nongreedy solution.

Show that its cost can be reduced by being "greedier" at some point in the solution. This proof is complicated a bit by the fact that there may be multiple solutions. Our approach is to show that any schedule that is not greedy can be made more greedy, without decreasing the number of activities.

Fig. 14: An example of the greedy algorithm for activity scheduling. Input: activities 1 through 8. Add 1: schedule 1, skip 2 and 3. Add 4: schedule 4, skip 5 and 6. Add 7: schedule 7, skip 8. The final schedule is {1, 4, 7}.

Claim: The greedy algorithm gives an optimal solution to the activity scheduling problem.

Proof: Consider any optimal schedule A that is not the greedy schedule. We will construct a new optimal schedule A′ that is in some sense "greedier" than A. Order the activities in increasing order of finish time. Let $A = \langle x_1, x_2, \ldots, x_k \rangle$ be the activities of A. Since A is not the same as the greedy schedule, consider the first activity $x_j$ where these two schedules differ. That is, the greedy schedule is of the form $G = \langle x_1, x_2, \ldots, x_{j-1}, g_j, \ldots \rangle$ where $g_j \ne x_j$. (Note that k ≥ j, since otherwise G would have more activities than the optimal schedule, which would be a contradiction.) The greedy algorithm selects the activity with the earliest finish time that does not conflict with any earlier activity. Thus, we know that $g_j$ does not conflict with any earlier activity, and it finishes no later than $x_j$.

Consider the modified "greedier" schedule A′ that results by replacing $x_j$ with $g_j$ in the schedule A. (See Fig. 15.) That is,

$$A' = \langle x_1, x_2, \ldots, x_{j-1}, g_j, x_{j+1}, \ldots, x_k \rangle.$$

This is a feasible schedule. (Since $g_j$ cannot conflict with the earlier activities, and it does not conflict with later activities, because it finishes no later than $x_j$.) It has the same number of activities as A, and therefore A′ is also optimal. By repeating this process, we will eventually convert A into G, without decreasing the number of activities. Therefore, G is also optimal.

Fig. 15: Proof of optimality for the greedy schedule (j = 3). A: $x_1\ x_2\ x_3\ x_4\ x_5$. G: $x_1\ x_2\ g_3$. A′: $x_1\ x_2\ g_3\ x_4\ x_5$.
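For reference, the earliest-finish-time strategy for activity scheduling can be sketched in Python. This is a minimal illustration, not the notes' own code: the function name `schedule` takes a list of (start, finish) pairs, performs the sort by finish time itself, and returns 0-based indices into the input list. The intervals in the usage example are illustrative data and are not the ones drawn in Fig. 14.

```python
def schedule(activities):
    """Greedy activity selection; returns indices of the chosen activities.

    activities is a list of (start, finish) pairs, treated as half-open
    intervals [start, finish), so back-to-back activities do not interfere.
    """
    # Process activities in increasing order of finish time.
    order = sorted(range(len(activities)), key=lambda i: activities[i][1])
    chosen = []
    prev_finish = None                    # finish time of the last scheduled activity
    for i in order:
        start, finish = activities[i]
        if prev_finish is None or start >= prev_finish:   # no interference?
            chosen.append(i)
            prev_finish = finish
    return chosen

acts = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9),
        (6, 10), (8, 11), (8, 12), (2, 14), (12, 16)]
print(schedule(acts))                     # [0, 3, 7, 10]
```

The sort dominates the running time, so the sketch runs in O(n log n) time, matching the analysis of the pseudocode.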

Fractional Knapsack Problem: The classical (0-1) knapsack problem is a famous optimization problem. A thief is robbing a store, and finds n items which can be taken. The ith item is worth $v_i$ dollars and weighs $w_i$ pounds, where $v_i$ and $w_i$ are integers. He wants to take as valuable a load as possible, but has a knapsack that can only carry W total pounds. Which items should he take? (The reason that this is called 0-1 knapsack is that each item must be left (0) or taken entirely (1). It is not possible to take a fraction of an item or multiple copies of an item.) This optimization problem arises in industrial packing applications. For example, you may want to ship some subset of items on a truck of limited capacity.

In contrast, in the fractional knapsack problem the setup is exactly the same, but the thief is allowed to take any fraction of an item for a fraction of the weight and a fraction of the value. So, you might think of each object as being a sack of gold, which you can partially empty out before taking.

The 0-1 knapsack problem is hard to solve, and in fact it is an NP-complete problem (meaning that there probably doesn't exist an efficient solution). However, there is a very simple and efficient greedy algorithm for the fractional knapsack problem.

As in the case of the other greedy algorithms we have seen, the idea is to find the right order in which to process items. Intuitively, it is good to have high value and bad to have high weight. This suggests that we first sort the items according to some function that increases with value and decreases with weight. There are a few choices that you might try here, but only one works. Let $\rho_i = v_i / w_i$ denote the value-per-pound ratio. We sort the items in decreasing order of $\rho_i$, and add them in this order. If the item fits, we take it all. At some point there is an item that does not fit in the remaining space. We take as much of this item as possible, thus filling the knapsack entirely. This is illustrated in Fig. 16.

Fig. 16: Example for the fractional knapsack problem. Input: a knapsack of capacity 60 and items of weight 5, 10, 20, 30, 40 with values $30, $20, $100, $90, $160 (ρ = 6.0, 2.0, 5.0, 3.0, 4.0). Greedy solution to fractional problem: weights 5 + 20 + 35 (of the 40) for $30 + $100 + $140 = $270. Greedy solution to 0-1 problem: weights 5 + 20 + 30 for $30 + $100 + $90 = $220. Optimal solution to 0-1 problem: weights 20 + 40 for $100 + $160 = $260.

Correctness: It is intuitively easy to see that the greedy algorithm is optimal for the fractional problem. Given a room with sacks of gold, silver, and bronze, you would obviously take as much gold as possible, then take as much silver as possible, and then as much bronze as possible. But it would never benefit you to take a little less gold so that you could replace it with an equal weight of bronze.

More formally, suppose to the contrary that the greedy algorithm is not optimal. This would mean that there is an alternate selection that is optimal. Sort the items of the alternate selection in decreasing order by ρ values. Consider the first item i on which the two selections differ. By definition, greedy takes a greater amount of item i than the alternate (because the greedy always takes as much as it can). Let us say that greedy takes x more units of object i than the alternate does. All the subsequent elements of the alternate selection are of lesser value per pound than item i. By replacing x units of any such items with x units of item i, we would increase the overall value of the alternate selection. However, this implies that the alternate selection is not optimal, a contradiction.

Nonoptimality for the 0-1 Knapsack: Next we show that the greedy algorithm is not generally optimal in the 0-1 knapsack problem. Consider the example shown in Fig. 16. If you were to sort the items by $\rho_i$, then you would first take the items of weight 5, then 20, and then (since the item of weight 40 does not fit) you would settle for the item of weight 30, for a total value of $30 + $100 + $90 = $220. On the other hand, if you had been less greedy, and ignored the item of weight 5, then you could take the items of weights 20 and 40 for a total value of $100 + $160 = $260. This feature of "delaying gratification" in order to come up with a better overall solution is your indication that the greedy solution is not optimal.

Lecture 8: Greedy Algorithms: Huffman Coding

Read: Section 16.3 in CLRS.

Huffman Codes: Huffman codes provide a method of encoding data efficiently. Normally when characters are coded using standard codes like ASCII, each character is represented by a fixed-length codeword of bits (e.g. 8 bits per character). Fixed-length codes are popular, because it is very easy to break a string up into its individual characters, and to access individual characters and substrings by direct indexing. However, fixed-length codes may not be the most efficient from the perspective of minimizing the total quantity of data.

Consider the following example. Suppose that we want to encode strings over the (rather limited) 4-character alphabet C = {a, b, c, d}. We could use the following fixed-length code:

    Character               a    b    c    d
    Fixed-Length Codeword   00   01   10   11

A string such as "abacdaacac" would be encoded by replacing each of its characters by the corresponding binary codeword.

    a   b   a   c   d   a   a   c   a   c
    00  01  00  10  11  00  00  10  00  10

The final 20-character binary string would be "00010010110000100010".

Now, suppose that you knew the relative probabilities of characters in advance. (This might happen by analyzing many strings over a long period of time. In applications like data compression, where you want to encode one file, you can just scan the file and determine the exact frequencies of all the characters.) You can use this knowledge to encode strings differently. Frequently occurring characters are encoded using fewer bits and less frequent characters are encoded using more bits. For example, suppose that characters are expected to occur with the following probabilities. We could design a variable-length code which would do a better job.

    Character                  a     b     c     d
    Probability                0.60  0.05  0.30  0.05
    Variable-Length Codeword   0     110   10    111

Notice that there is no requirement that the alphabetical order of characters correspond to any sort of ordering applied to the codewords. Now, the same string would be encoded as follows.

    a   b    a   c   d    a   a   c   a   c
    0   110  0   10  111  0   0   10  0   10

Thus, the resulting 17-character string would be "01100101110010010". Thus, we have achieved a savings of 3 characters, by using this alternative code. More generally, what would be the expected savings for a string of length n? For the 2-bit fixed-length code, the length of the encoded string is just 2n bits. For the variable-length code, the expected length of a single encoded character is equal to the sum of code lengths times the respective probabilities of their occurrences. The expected encoded string length is just n times the expected encoded character length.

$$n(0.60 \cdot 1 + 0.05 \cdot 3 + 0.30 \cdot 2 + 0.05 \cdot 3) = n(0.60 + 0.15 + 0.60 + 0.15) = 1.5n.$$

Thus, this would represent a 25% savings in expected encoding length. The question that we will consider today is how to form the best code, assuming that the probabilities of character occurrences are known.

Prefix Codes: One issue that we didn't consider in the example above is whether we will be able to decode the string, once encoded. In fact, this code was chosen quite carefully. Suppose that instead of coding the character 'a' as 0, we had encoded it as 1. Now, the encoded string "111" is ambiguous. It might be "d" and it might be "aaa". How can we avoid this sort of ambiguity? You might suggest that we add separation markers between the encoded characters, but this will tend to lengthen the encoding, which is undesirable. Instead, we would like the code to have the property that it can be uniquely decoded.

Note that in both of the codes given in the example above (the fixed-length code and the variable-length code) no codeword is a prefix of another. This turns out to be the key property. Observe that if one codeword were a prefix of another, e.g. a → 001 and b → 00101, then when we see 00101..., how do we know whether the first character of the encoded message is a or b?
Conversely, if no codeword is a prefix of any other, then as soon as we see a codeword appearing as a prefix in the encoded text, we know that we may decode this without fear of it matching some longer codeword. Thus we have the following definition.

Prefix Code: An assignment of codewords to characters so that no codeword is a prefix of any other.

Observe that any binary prefix coding can be described by a binary tree in which the codewords are the leaves of the tree, and where a left branch means “0” and a right branch means “1”. The length of a codeword is just its depth in the tree. The code given earlier is a prefix code, and its corresponding tree is shown in the following figure.

[Figure: binary code tree for the example code, with leaves a = 0, c = 10, b = 110, d = 111.]

Fig. 17: Prefix codes.

Decoding a prefix code is simple. We just traverse the tree from root to leaf, letting each input bit tell us which branch to take. On reaching a leaf, we output the corresponding character, and return to the root to continue the process.

Expected encoding length: Once we know the probabilities of the various characters, we can determine the total length of the encoded text. Let p(x) denote the probability of seeing character x, and let d_T(x) denote the length of the codeword (depth in the tree) relative to some prefix tree T. The expected number of bits needed to encode a text with n characters is given in the following formula:

$B(T) = n \sum_{x \in C} p(x)\, d_T(x).$
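The decoding walk and the expected-length formula can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the codewords and probabilities are those of the running four-character example, and the names `CODE`, `PROB`, `build_tree`, `decode`, `encode` and `expected_bits` are made up for this sketch rather than taken from the notes.

```python
# Codewords and probabilities of the running example (assumed from the text:
# a = 0, c = 10, b = 110, d = 111).
CODE = {"a": "0", "b": "110", "c": "10", "d": "111"}
PROB = {"a": 0.60, "b": 0.05, "c": 0.30, "d": 0.05}


def build_tree(code):
    """Build the code tree as nested dicts keyed by '0'/'1'; leaves are characters."""
    root = {}
    for ch, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = ch
    return root


def decode(bits, code):
    """Walk from the root, one branch per bit; output a character at each leaf."""
    root = build_tree(code)
    out, node = [], root
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):  # reached a leaf
            out.append(node)
            node = root            # return to the root
    return "".join(out)


def encode(text, code):
    """Concatenate the codewords of the characters of text."""
    return "".join(code[ch] for ch in text)


def expected_bits(n, prob, code):
    """B(T) = n * (sum over x of p(x) * d_T(x)), with d_T(x) = codeword length."""
    return n * sum(prob[x] * len(code[x]) for x in code)
```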


This suggests the following problem:

Optimal Code Generation: Given an alphabet C and the probabilities p(x) of occurrence for each character x ∈ C, compute a prefix code T that minimizes the expected length of the encoded bit-string, B(T).

Note that the optimal code is not unique. For example, we could have complemented all of the bits in our earlier code without altering the expected encoded string length. There is a very simple algorithm for finding such a code. It was invented in the early 1950’s by David Huffman, and is called a Huffman code. By the way, this code is used by the Unix utility pack for file compression. (There are better compression methods however. For example, compress, gzip and many others are based on a more sophisticated method called the Lempel-Ziv coding.)

Huffman’s Algorithm: Here is the intuition behind the algorithm. Recall that we are given the occurrence probabilities for the characters. We are going to build the tree up from the leaf level. We will take two characters x and y, and “merge” them into a single super-character called z, which then replaces x and y in the alphabet. The character z will have a probability equal to the sum of x and y’s probabilities. Then we continue recursively building the code on the new alphabet, which has one fewer character. When the process is completed, we know the code for z, say 010. Then, we append a 0 and 1 to this codeword, giving 0100 for x and 0101 for y.

Another way to think of this is that we merge x and y as the left and right children of a root node called z. Then the subtree for z replaces x and y in the list of characters. We repeat this process until only one super-character remains. The resulting tree is the final prefix tree. Since x and y will appear at the bottom of the tree, it seems most logical to select the two characters with the smallest probabilities to perform the operation on. The result is Huffman’s algorithm. It is illustrated in the following figure.
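The merging process described above can be sketched in Python using the standard `heapq` module as the priority queue. This is a minimal sketch, not the notes' own code: the function name `huffman`, the tuple representation of trees, and the tie-breaking counter are assumptions made for the illustration, and characters are assumed to be strings.

```python
import heapq
from itertools import count


def huffman(prob):
    """Return a dict mapping each character to a codeword.

    prob maps characters (strings) to probabilities or frequencies.
    """
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    # Heap entries are (probability, tiebreak, tree); a tree is either a
    # character (leaf) or a pair (left subtree, right subtree).
    heap = [(p, next(tiebreak), ch) for ch, p in prob.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        px, _, x = heapq.heappop(heap)  # the two smallest probabilities
        py, _, y = heapq.heappop(heap)
        heapq.heappush(heap, (px + py, next(tiebreak), (x, y)))  # merged node z
    _, _, root = heap[0]

    code = {}

    def assign(tree, word):
        if isinstance(tree, tuple):      # internal node: 0 to the left, 1 to the right
            assign(tree[0], word + "0")
            assign(tree[1], word + "1")
        else:                            # leaf
            code[tree] = word or "0"     # a one-character alphabet still needs a bit

    assign(root, "")
    return code
```

The exact codewords depend on how ties and left/right choices are resolved, so only the codeword lengths (and hence the cost) are determined in general.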
The pseudocode for Huffman’s algorithm is given below. Let C denote the set of characters. Each character x ∈ C is associated with an occurrence probability x.prob. Initially, the characters are all stored in a priority queue Q. Recall that this data structure can be built initially in O(n) time, and we can extract the element with the smallest key in O(log n) time and insert a new element in O(log n) time. The objects in Q are sorted by probability. Note that with each execution of the for-loop, the number of items in the queue decreases by one. So, after n − 1 iterations, there is exactly one element left in the queue, and this is the root of the final prefix code tree.

Correctness: The big question that remains is why is this algorithm correct? Recall that the cost of any encoding tree T is $B(T) = \sum_{x} p(x)\, d_T(x)$ (here we drop the constant factor n, which does not affect which tree is best). Our approach will be to show that any tree that differs from the one constructed by Huffman’s algorithm can be converted into one that is equal to Huffman’s tree without increasing its cost. First, observe that the Huffman tree is a full binary tree, meaning that every internal node has exactly two children. It would never pay to have an internal node with only one child (since such a node could be deleted), so we may limit consideration to full binary trees.

Claim: Consider the two characters, x and y with the smallest probabilities. Then there is an optimal code tree in which these two characters are siblings at the maximum depth in the tree.

Proof: Let T be any optimal prefix code tree, and let b and c be two siblings at the maximum depth of the tree. Assume without loss of generality that p(b) ≤ p(c) and p(x) ≤ p(y) (if this is not true, then rename these characters). Now, since x and y have the two smallest probabilities it follows that p(x) ≤ p(b) and p(y) ≤ p(c). (In both cases they may be equal.) Because b and c are at the deepest level of the tree we know that d(b) ≥ d(x) and d(c) ≥ d(y). (Again, they may be equal.)
Thus, we have p(b) − p(x) ≥ 0 and d(b) − d(x) ≥ 0, and hence their product is nonnegative. Now switch the positions of x and b in the tree, resulting in a new tree T'. This is illustrated in the following figure. Next let us see how the cost changes as we go from T to T'. Almost all the nodes contribute the same to the expected cost. The only exceptions are nodes x and b. By subtracting the old contributions of these


[Fig. 18: Huffman’s Algorithm. Example with frequencies a: 05, b: 48, c: 07, d: 17, e: 10, f: 13. At each step the two smallest items are merged, creating nodes of weight 12, 22, 30 and 52 in turn. Final tree: a = 0000, c = 0001, e = 001, d = 010, f = 011, b = 1.]


nodes and adding in the new contributions we have

B(T') = B(T) − p(x)d(x) + p(x)d(b) − p(b)d(b) + p(b)d(x)
      = B(T) + p(x)(d(b) − d(x)) − p(b)(d(b) − d(x))
      = B(T) − (p(b) − p(x))(d(b) − d(x))
      ≤ B(T)

because (p(b) − p(x))(d(b) − d(x)) ≥ 0. Thus the cost does not increase, implying that T' is an optimal tree. By switching y with c we get a new tree T'', which by a similar argument is also optimal. The final tree T'' satisfies the statement of the claim.

[Fig. 19: Correctness of Huffman’s Algorithm. Swapping x with b turns T into T', with cost change −(p(b) − p(x))(d(b) − d(x)); swapping y with c turns T' into T'', with cost change −(p(c) − p(y))(d(c) − d(y)).]

Huffman’s Algorithm
    Huffman(int n, character C[1..n]) {
        Q = C;                              // priority queue
        for i = 1 to n-1 {
            z = new internal tree node;
            z.left = x = Q.extractMin();    // extract smallest probabilities
            z.right = y = Q.extractMin();
            z.prob = x.prob + y.prob;       // z’s probability is their sum
            Q.insert(z);                    // insert z into queue
        }
        return the last element left in Q as the root;
    }

The above claim asserts that the first step of Huffman’s algorithm is essentially the proper one to perform. The complete proof of correctness for Huffman’s algorithm follows by induction on n (since with each step, we eliminate exactly one character).

Claim: Huffman’s algorithm produces the optimal prefix code tree.

Proof: The proof is by induction on n, the number of characters. For the basis case, n = 1, the tree consists of a single leaf node, which is obviously optimal. Assume inductively that when there are strictly fewer than n characters, Huffman’s algorithm is guaranteed to produce the optimal tree. We want to show it is true with exactly n characters.

Suppose we have exactly n characters. The previous claim states that we may assume that in the optimal tree, the two characters of lowest probability x and y will be siblings at the lowest level of the tree. Remove x and y, replacing them with a new character z whose probability is p(z) = p(x) + p(y). Thus n − 1 characters remain.

Consider any prefix code tree T made with this new set of n − 1 characters. We can convert it into a prefix code tree T' for the original set of characters by undoing the previous operation and replacing z with x and y (adding a “0” bit for x and a “1” bit for y). The cost of the new tree is

B(T') = B(T) − p(z)d(z) + p(x)(d(z) + 1) + p(y)(d(z) + 1)
      = B(T) − (p(x) + p(y))d(z) + (p(x) + p(y))(d(z) + 1)
      = B(T) + (p(x) + p(y))(d(z) + 1 − d(z))
      = B(T) + p(x) + p(y).

Since the change in cost depends in no way on the structure of the tree T, to minimize the cost of the final tree T', we need to build the tree T on n − 1 characters optimally. By induction, this is exactly what Huffman’s algorithm does. Thus the final tree is optimal.

Lecture 9: Graphs: Background and Breadth First Search

Read: Review Sections 22.1 and 22.2 in CLR.

Graph Algorithms: We are now beginning a major new section of the course. We will be discussing algorithms for both directed and undirected graphs. Intuitively, a graph is a collection of vertices or nodes, connected by a collection of edges. Graphs are extremely important because they are a very flexible mathematical model for many application problems. Basically, any time you have a set of objects, and there is some “connection” or “relationship” or “interaction” between pairs of objects, a graph is a good way to model this. Examples of graphs in application include communication and transportation networks, VLSI and other sorts of logic circuits, surface meshes used for shape description in computer-aided design and geographic information systems, and precedence constraints in scheduling systems. The list of applications is almost too long to even consider enumerating it.

Most of the problems in computational graph theory that we will consider arise because they are of importance to one or more of these application areas. Furthermore, many of these problems form the basic building blocks from which more complex algorithms are then built.

Graphs and Digraphs: Most of you have encountered the notions of directed and undirected graphs in other courses, so we will give a quick overview here.

Definition: A directed graph (or digraph) G = (V, E) consists of a finite set V, called the vertices or nodes, and E, a set of ordered pairs, called the edges of G. (Another way of saying this is that E is a binary relation on V.)

Observe that self-loops are allowed by this definition. Some definitions of graphs disallow this. Multiple edges are not permitted (although the edges (v, w) and (w, v) are distinct).

[Fig. 20: Digraph and graph example. A digraph and an undirected graph, each on vertices 1, 2, 3, 4.]

Definition: An undirected graph (or graph) G = (V, E) consists of a finite set V of vertices, and a set E of unordered pairs of distinct vertices, called the edges. (Note that self-loops are not allowed.)

Note that directed graphs and undirected graphs are different (but similar) objects mathematically. Certain notions (such as path) are defined for both, but other notions (such as connectivity) may only be defined for one, or may be defined differently.

We say that vertex v is adjacent to vertex u if there is an edge (u, v). In a directed graph, given the edge e = (u, v), we say that u is the origin of e and v is the destination of e. In undirected graphs u and v are the endpoints of the edge. The edge e is incident (meaning that it touches) both u and v.

In a digraph, the number of edges coming out of a vertex is called the out-degree of that vertex, and the number of edges coming in is called the in-degree. In an undirected graph we just talk about the degree of a vertex as the number of incident edges. By the degree of a graph, we usually mean the maximum degree of its vertices.

When discussing the size of a graph, we typically consider both the number of vertices and the number of edges. The number of vertices is typically written as n or V, and the number of edges is written as m or E or e. Here are some basic combinatorial facts about graphs and digraphs. We will leave the proofs to you. Given a graph with V vertices and E edges then:

In a graph:
    Number of edges: $0 \le E \le \binom{n}{2} = n(n-1)/2 \in O(n^2)$.
    Sum of degrees: $\sum_{v \in V} \deg(v) = 2E$.

In a digraph:
    Number of edges: $0 \le E \le n^2$.
    Sum of degrees: $\sum_{v \in V} \text{in-deg}(v) = \sum_{v \in V} \text{out-deg}(v) = E$.

Notice that generally the number of edges in a graph may be as large as quadratic in the number of vertices. However, the large graphs that arise in practice typically have far fewer edges. A graph is said to be sparse if E ∈ Θ(V), and dense, otherwise. When giving the running times of algorithms, we will usually express it as a function of both V and E, so that the performance on sparse and dense graphs will be apparent.

Paths and Cycles: A path in a graph or digraph is a sequence of vertices v0, v1, . . . , vk such that (vi−1, vi) is an edge for i = 1, 2, . . . , k. The length of the path is the number of edges, k. A path is simple if all vertices and all the edges are distinct. A cycle is a path containing at least one edge and for which v0 = vk. A cycle is simple if its vertices (except v0 and vk) are distinct, and all its edges are distinct.

A graph or digraph is said to be acyclic if it contains no simple cycles. An acyclic connected graph is called a free tree or simply tree for short. (The term “free” is intended to emphasize the fact that the tree has no root, in contrast to a rooted tree, as is usually seen in data structures.) An acyclic undirected graph (which need not be connected) is a collection of free trees, and is (naturally) called a forest. An acyclic digraph is called a directed acyclic graph, or DAG for short.

[Fig. 21: Illustration of some graph terms. Panels: simple cycle, nonsimple cycle, free tree, forest, DAG.]

We say that w is reachable from u if there is a path from u to w. Note that every vertex is reachable from itself by a trivial path that uses zero edges. An undirected graph is connected if every vertex can reach every other vertex. (Connectivity is a bit messier for digraphs, and we will define it later.) The subsets of mutually reachable vertices partition the vertices of the graph into disjoint subsets, called the connected components of the graph.
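As a quick sanity check of the degree-sum facts stated above, the following Python sketch counts degrees directly from an edge list. The function names and the small example graphs in the usage note are hypothetical, chosen only for illustration.

```python
from collections import Counter


def undirected_degrees(edges):
    """Degree of each vertex, given undirected edges as pairs (u, v)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1   # each edge is incident to both of its endpoints
        deg[v] += 1
    return deg


def directed_degrees(edges):
    """In-degree and out-degree of each vertex, given directed edges (u, v)."""
    indeg, outdeg = Counter(), Counter()
    for u, v in edges:
        outdeg[u] += 1   # the edge leaves its origin u
        indeg[v] += 1    # and enters its destination v
    return indeg, outdeg
```

For an undirected edge list such as `[(1, 2), (1, 3), (2, 3), (3, 4)]`, the values of `undirected_degrees` sum to twice the number of edges; for a directed edge list, the in-degrees and the out-degrees each sum to the number of edges.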

Representations of Graphs and Digraphs: There are two common ways of representing graphs and digraphs. First we show how to represent digraphs. Let G = (V, E) be a digraph with n = |V| and let e = |E|. We will assume that the vertices of G are indexed {1, 2, . . . , n}.

Adjacency Matrix: An n × n matrix defined for 1 ≤ v, w ≤ n, where A[v, w] = 1 if (v, w) ∈ E, and A[v, w] = 0 otherwise.

If the digraph has weights we can store the weights in the matrix. For example if (v, w) ∈ E then A[v, w] = W(v, w) (the weight on edge (v, w)). If (v, w) ∉ E then generally W(v, w) need not be defined, but often we set it to some “special” value, e.g. A(v, w) = −1, or ∞. (By ∞ we mean (in practice) some number which is larger than any allowable weight. In practice, this might be some machine dependent constant like MAXINT.)

Adjacency List: An array Adj[1 . . . n] of pointers where for 1 ≤ v ≤ n, Adj[v] points to a linked list containing the vertices which are adjacent to v (i.e. the vertices that can be reached from v by a single edge). If the edges have weights then these weights may also be stored in the linked list elements.

[Fig. 22: Adjacency matrix and adjacency list for digraphs.]

We can represent undirected graphs using exactly the same representation, but we will store each edge twice. In particular, we represent the undirected edge {v, w} by the two oppositely directed edges (v, w) and (w, v). Notice that even though we represent undirected graphs in the same way that we represent digraphs, it is important to remember that these two classes of objects are mathematically distinct from one another.

This can cause some complications. For example, suppose you write an algorithm that operates by marking edges of a graph. You need to be careful when you mark edge (v, w) in the representation that you also mark (w, v), since they are both the same edge in reality. When dealing with adjacency lists, it may not be convenient to walk down the entire linked list, so it is common to include cross links between corresponding edges.

[Fig. 23: Adjacency matrix and adjacency list (with crosslinks) for graphs.]

An adjacency matrix requires Θ(V^2) storage and an adjacency list requires Θ(V + E) storage. The V arises because there is one entry for each vertex in Adj. Since each list has out-deg(v) entries, when this is summed over all vertices, the total number of adjacency list records is Θ(E). For sparse graphs the adjacency list representation is more space efficient.
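The two representations can be sketched in Python as follows. This is an illustrative sketch rather than the notes' own code: the function names are hypothetical, Python lists stand in for the linked lists, and vertices are assumed to be numbered 1 to n as in the text.

```python
def adjacency_matrix(n, edges, directed=True):
    """n x n matrix of 0/1 entries; A[v-1][w-1] corresponds to A[v, w]."""
    A = [[0] * n for _ in range(n)]
    for v, w in edges:
        A[v - 1][w - 1] = 1
        if not directed:
            A[w - 1][v - 1] = 1   # an undirected edge is stored twice
    return A


def adjacency_list(n, edges, directed=True):
    """Map each vertex v in 1..n to the list of vertices adjacent to v."""
    adj = {v: [] for v in range(1, n + 1)}
    for v, w in edges:
        adj[v].append(w)
        if not directed:
            adj[w].append(v)      # store the oppositely directed copy as well
    return adj
```

For an undirected graph the matrix comes out symmetric, and the adjacency lists hold two records per edge, which matches the Θ(V + E) storage bound.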

Graph Traversals: There are a number of approaches used for solving problems on graphs. One of the most important approaches is based on the notion of systematically visiting all the vertices and edges of a graph. The reason for this is that these traversals impose a type of tree structure (or generally a forest) on the graph, and trees are usually much easier to reason about than general graphs.

Breadth-first search: Given a graph G = (V, E), breadth-first search starts at some source vertex s and “discovers” which vertices are reachable from s. Define the distance between a vertex v and s to be the minimum number of edges on a path from s to v. Breadth-first search discovers vertices in increasing order of distance, and hence can be used as an algorithm for computing shortest paths. At any given time there is a “frontier” of vertices that have been discovered, but not yet processed. Breadth-first search is named because it visits vertices across the entire “breadth” of this frontier.

Initially all vertices (except the source) are colored white, meaning that they are undiscovered. When a vertex has first been discovered, it is colored gray (and is part of the frontier). When a gray vertex is processed, then it becomes black.

The search makes use of a queue, a first-in first-out list, where elements are removed in the same order they are inserted. The first item in the queue (the next to be removed) is called the head of the queue. We will also maintain arrays color[u] which holds the color of vertex u (either white, gray or black), pred[u] which points to the predecessor of u (i.e. the vertex that first discovered u), and d[u], the distance from s to u. Only the color is really needed for the search (in fact it is only necessary to know whether a node is nonwhite). We include all this information, because some applications of BFS use this additional information.

Breadth-First Search
    BFS(G,s) {
        for each u in V {                   // initialization
            color[u] = white
            d[u] = infinity
            pred[u] = null
        }
        color[s] = gray                     // initialize source s
        d[s] = 0
        Q = {s}                             // put s in the queue
        while (Q is nonempty) {
            u = Q.Dequeue()                 // u is the next to visit
            for each v in Adj[u] {
                if (color[v] == white) {    // if neighbor v undiscovered
                    color[v] = gray         // ...mark it discovered
                    d[v] = d[u]+1           // ...set its distance
                    pred[v] = u             // ...and its predecessor
                    Q.Enqueue(v)            // ...put it in the queue
                }
            }
            color[u] = black                // we are done with u
        }
    }

Observe that the predecessor pointers of the BFS search define an inverted tree (an acyclic directed graph in which the source is the root, and every other node has a unique path to the root). If we reverse these edges we get a rooted unordered tree called a BFS tree for G. (Note that there are many potential BFS trees for a given graph, depending on where the search starts, and in what order vertices are placed on the queue.) These edges of G are called tree edges and the remaining edges of G are called cross edges.

It is not hard to prove that if G is an undirected graph, then cross edges always go between two nodes that are at most one level apart in the BFS tree. (Can you see why this must be true?) Below is a sketch of a proof that on termination, d[v] is equal to the distance from s to v. (See CLRS for a detailed proof.)

Theorem: Let δ(s, v) denote the length (number of edges) of the shortest path from s to v. Then, on termination of the BFS procedure, d[v] = δ(s, v).

Proof: (Sketch) The proof is by induction on the length of the shortest path. Let u be the predecessor of v on some shortest path from s to v, and among all such vertices the first to be processed by the BFS. Thus, δ(s, v) = δ(s, u) + 1. When u is processed, we have (by induction) d[u] = δ(s, u). Since v is a neighbor of u, we set d[v] = d[u] + 1. Thus we have

d[v] = d[u] + 1 = δ(s, u) + 1 = δ(s, v),

as desired.

[Fig. 24: Breadth-first search: Example. The panels show the contents of the queue Q and the distance labels at each step of a search from source s.]

Analysis: The running time analysis of BFS is similar to the running time analysis of many graph traversal algorithms. As is done in CLR, we write V = |V| and E = |E|. Observe that the initialization portion requires Θ(V) time. The real meat is in the traversal loop. Since we never visit a vertex twice, the number of times we go through the while loop is at most V (exactly V assuming each vertex is reachable from the source). The number of iterations through the inner for loop is proportional to deg(u) + 1. (The +1 is because even if deg(u) = 0, we need to spend a constant amount of time to set up the loop.) Summing up over all vertices we have the running time

$T(V) = V + \sum_{u \in V} (\deg(u) + 1) = V + \sum_{u \in V} \deg(u) + V = 2V + 2E \in \Theta(V + E).$

The analysis is essentially the same for directed graphs.
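The BFS procedure can be written as a short runnable Python sketch. The following is an illustration under stated assumptions, not the notes' own code: the graph is assumed to be a dict mapping each vertex to a list of its neighbors, and since only "white or not" matters, an infinite distance doubles as the white color.

```python
import math
from collections import deque


def bfs(adj, s):
    """Return (d, pred): BFS distances from s and predecessor pointers.

    adj maps every vertex to a list of its neighbors. Vertices that are not
    reachable from s keep distance math.inf and predecessor None.
    """
    d = {u: math.inf for u in adj}     # infinite distance means undiscovered
    pred = {u: None for u in adj}
    d[s] = 0
    queue = deque([s])                 # first-in first-out queue
    while queue:
        u = queue.popleft()            # u is the next to visit
        for v in adj[u]:
            if d[v] == math.inf:       # neighbor v is undiscovered
                d[v] = d[u] + 1        # set its distance
                pred[v] = u            # and its predecessor
                queue.append(v)        # put it in the queue
    return d, pred
```

Each vertex is enqueued at most once and each adjacency list is scanned once, in line with the Θ(V + E) bound derived above.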

Lecture 10: Depth-First Search

Read: Sections 23.2 and 23.3 in CLR.

Depth-First Search: The next traversal algorithm that we will study is called depth-first search, and it has the nice property that nontree edges have a good deal of mathematical structure.

Consider the problem of searching a castle for treasure. To solve it you might use the following strategy. As you enter a room of the castle, paint some graffiti on the wall to remind yourself that you were already there. Successively travel from room to room as long as you come to a place you haven’t already been. When you return to the same room, try a different door leaving the room (assuming it goes somewhere you haven’t already been). When all doors have been tried in a given room, then backtrack.

Notice that this algorithm is described recursively. In particular, when you enter a new room, you are beginning a new search. This is the general idea behind depth-first search.

Depth-First Search Algorithm: We assume we are given a directed graph G = (V, E). The same algorithm works for undirected graphs (but the resulting structure imposed on the graph is different).

We use four auxiliary arrays. As before we maintain a color for each vertex: white means undiscovered, gray means discovered but not finished processing, and black means finished. As before we also store predecessor pointers, pointing back to the vertex that discovered a given vertex. We will also associate two numbers with each vertex. These are time stamps. When we first discover a vertex u we store a counter in d[u] and when we are finished processing a vertex we store a counter in f[u]. The purpose of the time stamps will be explained later. (Note: Do not confuse the discovery time d[v] with the distance d[v] from BFS.) The algorithm is shown in the code block below, and illustrated in Fig. 25. As with BFS, DFS induces a tree structure. We will discuss this tree structure further below.

Depth-First Search
    DFS(G) {                                // main program
        for each u in V {                   // initialization
            color[u] = white;
            pred[u] = null;
        }
        time = 0;
        for each u in V
            if (color[u] == white)          // found an undiscovered vertex
                DFSVisit(u);                // start a new search here
    }

    DFSVisit(u) {                           // start a search at u
        color[u] = gray;                    // mark u visited
        d[u] = ++time;
        for each v in Adj(u) do
            if (color[v] == white) {        // if neighbor v undiscovered
                pred[v] = u;                // ...set predecessor pointer
                DFSVisit(v);                // ...visit v
            }
        color[u] = black;                   // we’re done with u
        f[u] = ++time;
    }

Analysis: The running time of DFS is Θ(V + E). This is somewhat harder to see than the BFS analysis, because the recursive nature of the algorithm obscures things. Normally, recurrences are good ways to analyze recursively defined algorithms, but it is not true here, because there is no good notion of “size” that we can attach to each recursive call.

First observe that if we ignore the time spent in the recursive calls, the main DFS procedure runs in O(V) time. Observe that each vertex is visited exactly once in the search, and hence the call DFSVisit() is made exactly once for each vertex. We can just analyze each one individually and add up their running times. Ignoring the time spent in the recursive calls, we can see that each vertex u can be processed in O(1 + outdeg(u)) time. Thus the total time used in the procedure is

$T(V) = V + \sum_{u \in V} (\text{outdeg}(u) + 1) = V + \sum_{u \in V} \text{outdeg}(u) + V = 2V + E \in \Theta(V + E).$

A similar analysis holds if we consider DFS for undirected graphs.

[Fig. 25: Depth-First search tree. The panels trace the calls DFS(a) through DFS(g) and their returns; the final discovery/finish times are a 1/10, b 2/5, c 3/4, f 6/9, g 7/8, d 11/14, e 12/13.]

Tree structure: DFS naturally imposes a tree structure (actually a collection of trees, or a forest) on the structure of the graph. This is just the recursion tree, where the edge (u, v) arises when, in processing vertex u, we call DFSVisit(v) for some neighbor v. For directed graphs the other edges of the graph can be classified as follows:

Back edges: (u, v) where v is a (not necessarily proper) ancestor of u in the tree. (Thus, a self-loop is considered to be a back edge.)

Forward edges: (u, v) where v is a proper descendent of u in the tree.

Cross edges: (u, v) where u and v are not ancestors or descendents of one another (in fact, the edge may go between different trees of the forest).

It is not difficult to classify the edges of a DFS tree by analyzing the values of colors of the vertices and/or considering the time stamps. This is left as an exercise.

With undirected graphs, there are some important differences in the structure of the DFS tree. First, there is really no distinction between forward and back edges. So, by convention, they are all called back edges. Furthermore, it can be shown that there can be no cross edges. (Can you see why not?)
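For digraphs, the DFS procedure with time stamps, together with one possible way of classifying edges from colors and discovery times, can be sketched in Python as below. The classification rule used here (white neighbor: tree edge; gray neighbor: back edge; black neighbor discovered later than u: forward edge; otherwise: cross edge) is the standard one but is an addition to the notes, which leave it as an exercise. The dict-of-lists graph format and the function name are likewise assumptions. Because the sketch is recursive, very deep graphs may exceed Python's default recursion limit.

```python
def dfs(adj):
    """DFS on a digraph given as {vertex: [out-neighbors]}.

    Returns (d, f, pred, kind): discovery times, finish times, predecessor
    pointers, and a dict classifying every edge (u, v).
    """
    color = {u: "white" for u in adj}
    pred = {u: None for u in adj}
    d, f, kind = {}, {}, {}
    time = 0

    def visit(u):
        nonlocal time
        color[u] = "gray"                 # u is discovered but not finished
        time += 1
        d[u] = time
        for v in adj[u]:
            if color[v] == "white":       # undiscovered neighbor
                kind[(u, v)] = "tree"
                pred[v] = u
                visit(v)
            elif color[v] == "gray":      # v is an ancestor still in progress
                kind[(u, v)] = "back"
            elif d[u] < d[v]:             # finished vertex discovered after u
                kind[(u, v)] = "forward"
            else:                         # finished vertex discovered before u
                kind[(u, v)] = "cross"
        color[u] = "black"                # we are done with u
        time += 1
        f[u] = time

    for u in adj:
        if color[u] == "white":           # start a new search here
            visit(u)
    return d, f, pred, kind
```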

Time-stamp structure: There is also a nice structure to the time stamps. In CLR this is referred to as the parenthesis structure. In particular, the following are easy to observe.

Lemma: (Parenthesis Lemma) Given a digraph G = (V, E), any DFS tree for G, and any two vertices u, v ∈ V:

• u is a descendent of v if and only if [d[u], f[u]] ⊆ [d[v], f[v]].
• u is an ancestor of v if and only if [d[u], f[u]] ⊇ [d[v], f[v]].
• u is unrelated to v if and only if [d[u], f[u]] and [d[v], f[v]] are disjoint.

[Fig. 26: Parenthesis Lemma. The DFS forest of Fig. 25 with time stamps a 1/10, b 2/5, c 3/4, f 6/9, g 7/8, d 11/14, e 12/13, nontree edges labeled B (back), F (forward) and C (cross), and the start-finish intervals drawn on a time line from 1 to 14.]

Cycles: The time stamps given by DFS allow us to determine a number of things about a graph or digraph. For example, suppose you are given a graph or digraph. You run DFS. You can determine whether the graph contains any cycles very easily. We do this with the help of the following two lemmas.

Lemma: Given a digraph G = (V, E), consider any DFS forest of G, and consider any edge (u, v) ∈ E. If this edge is a tree, forward, or cross edge, then f[u] > f[v]. If the edge is a back edge then f[u] ≤ f[v].

Proof: For tree, forward, and back edges, the proof follows directly from the parenthesis lemma. (E.g. for a forward edge (u, v), v is a descendent of u, and so v’s start-finish interval is contained within u’s, implying that v has an earlier finish time.) For a cross edge (u, v) we know that the two time intervals are disjoint. When we were processing u, v was not white (otherwise (u, v) would be a tree edge), implying that v was started before u. Because the intervals are disjoint, v must have also finished before u.

Lemma: Consider a digraph G = (V, E) and any DFS forest for G. G has a cycle if and only if the DFS forest has a back edge.

Proof: (⇐) If there is a back edge (u, v), then v is an ancestor of u, and by following tree edges from v to u we get a cycle.

(⇒) We show the contrapositive. Suppose there are no back edges. By the lemma above, each of the remaining types of edges, tree, forward, and cross, has the property that it goes from a vertex with higher finishing time to a vertex with lower finishing time. Thus along any path, finish times decrease monotonically, implying there can be no cycle.

Beware: No back edges means no cycles. But you should not infer that there is some simple relationship between the number of back edges and the number of cycles. For example, a DFS tree may only have a single back edge, and there may be anywhere from one up to an exponential number of simple cycles in the graph.

A similar theorem applies to undirected graphs, and is not hard to prove.
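The back-edge criterion gives a simple cycle test for digraphs. The Python sketch below is an illustration under assumptions (a dict-of-lists digraph and a hypothetical function name `has_cycle`): it reports a cycle as soon as the search meets a gray vertex, that is, as soon as it finds a back edge.

```python
def has_cycle(adj):
    """Return True if the digraph {vertex: [out-neighbors]} contains a cycle."""
    color = {u: "white" for u in adj}

    def visit(u):
        color[u] = "gray"                  # u is on the current search path
        for v in adj[u]:
            if color[v] == "gray":         # (u, v) is a back edge
                return True
            if color[v] == "white" and visit(v):
                return True
        color[u] = "black"                 # u is finished
        return False

    # Start a new search from every vertex that is still undiscovered.
    return any(color[u] == "white" and visit(u) for u in adj)
```

Since a self-loop counts as a back edge, a graph such as `{1: [1]}` is reported as cyclic. The sketch is meant for digraphs; on an undirected graph stored with each edge twice, the edge back to the parent would have to be skipped.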

if there are a series of tasks to be performed. and when each vertex is ﬁnished we add it to the front of a linked list. Topological Sort TopSort(G) { for each (u in V) color[u] = white. u can reach v and vice versa. Observe that the structure is essentially the same as the basic DFS procedure. but you can do the electrical wiring while you install the windows).Lecture 11: Topological Sort and Strong Components Read: Sects. and certain tasks must precede other tasks (e. given DFS. This is given below. We do our depth-ﬁrst search in a different order from the one given in CLRS.g. // L is an empty linked list for each (u in V) if (color[u] == white) TopVisit(u). and still have this Lecture Notes 38 CMSC 451 . // on finishing u add to list } This is typical example of DFS is used in applications. return L. Directed Acyclic Graph: A directed acyclic graph is often called a DAG for short DAG’s arise in many applications where there are precedence or ordering constraints. and so we get a different ﬁnal ordering. For example. of computing the strongly connected components (or strong components for short) of a digraph. We would like to write an algorithm that determines whether a digraph is strongly connected. but we only include the elements of DFS that are needed for this application. A topological sort of a DAG is a linear ordering of the vertices of the DAG such that for each edge (u. v). To do this we run a (stripped down) DFS. people want to know that there networks are complete in the sense that from any location it is possible to reach any other location in the digraph. Strong Components: Next we consider a very important connectivity problem with digraphs. The ﬁnal linked list order will be the ﬁnal topological order. A digraph is strongly connected if for every pair of vertices. Note that in general. In fact we will solve a generalization of this problem. u. v) means that task u must be completed before task v begins. 
in construction you have to build the ﬁrst ﬂoor before you build the second ﬂoor. // initialize L = new linked_list. u appears before v in the ordering. v ∈ V . In general a precedence constraint graph is a DAG in which vertices are tasks and the edge (u. given the precedence constraints. v) in a DAG. However both orderings are legitimate. the running time of topological sort is Θ(V + E). When digraphs are used in communication and transportation networks. // mark u visited for each (v in Adj(u)) if (color[v] == white) TopVisit(v). 22. Append u to the front of L.3–22. Thus. // L gives final order } TopVisit(u) { // start a search at u color[u] = gray. we partition the vertices of the digraph into subsets such that the induced subgraph of each subset is strongly connected. In particular. for every edge (u.5 in CLRS. (These subsets should be as large as possible. By the previous lemma. To compute a topological ordering is actually very easy. As an example we consider the DAG presented in CLRS for Professor Bumstead’s order of dressing. it sufﬁces to output the vertices in reverse order of ﬁnishing time. there may be many legal topological orders for a given DAG. the ﬁnish time of u is greater than the ﬁnish time of v. Bumstead lists the precedences in the order in which he puts on his clothes in the morning. As with depth-ﬁrst search.

Fig. 27: Topological sort. Discovery/finish times of the DFS: shorts 1/10, pants 2/9, belt 3/6, jacket 4/5, shoes 7/8, shirt 11/14, tie 12/13, socks 15/16. Final order: socks, shirt, tie, shorts, pants, shoes, belt, jacket.

More formally, we say that two vertices u and v are mutually reachable if u can reach v and vice versa. It is easy to see that mutual reachability is an equivalence relation. This equivalence relation partitions the vertices into equivalence classes of mutually reachable vertices, and these are the strong components.

Observe that if we merge the vertices in each strong component into a single super vertex, and join two supervertices (A, B) if and only if there are vertices u ∈ A and v ∈ B such that (u, v) ∈ E, then the resulting digraph, called the component digraph, is necessarily acyclic. (Can you see why?) Thus, we may accurately refer to it as the component DAG.

Fig. 28: Strong Components (left: digraph and strong components; right: component DAG).

The algorithm that we will present is an algorithm designer's "dream" (and an algorithm student's nightmare). It is amazingly simple and efficient, but it is so clever that it is very difficult to even see how it works. We will give some of the intuition that leads to the algorithm, but will not prove the algorithm's correctness formally. See CLRS for a formal proof.

Strong Components and DFS: By way of motivation, consider the DFS of the digraph shown in Fig. 29 (left). By definition of DFS, when you enter a strong component, every vertex in the component is reachable, so the DFS does not terminate until all the vertices in the component have been visited. Thus all the vertices in a strong component must appear in the same tree of the DFS forest. Observe that in the figure each strong component is just a subtree of the DFS forest. Is it always true for any DFS? Unfortunately the answer is no. In general, many strong components may appear in the same DFS tree. (See the DFS on the right for a counterexample.) Does there always exist a way to order the DFS such that it is true? Fortunately, the answer is yes.

Suppose that you knew the component DAG in advance. (This is ridiculous, because you would need to know the strong components, and that is the problem we are trying to solve. But humor me for a moment.)

Fig. 29: Two depth-first searches.

Further, suppose that you computed a reversed topological order on the component digraph. That is, if (u, v) is an edge in the component digraph, then v comes before u in this reversed order (not after as it would in a normal topological ordering). Now, run DFS, but every time you need a new vertex to start the search from, select the next available vertex according to this reverse topological order of the component digraph.

Here is an informal justification. Clearly once the DFS starts within a given strong component, it must visit every vertex within the component (and possibly some others) before finishing. If we do not start in reverse topological order, then the search may "leak out" into other strong components, and put them in the same DFS tree. For example, in Fig. 29 (right), when the search is started at vertex a, not only does it visit its component with b and c, but it also visits the other components as well. However, by visiting components in reverse topological order of the component DAG, each search cannot "leak out" into other components, because the other components would already have been visited earlier in the search.

This leaves us with the intuition that if we could somehow order the DFS, so that it hits the strong components according to a reverse topological order, then we would have an easy algorithm for computing strong components. However, we do not know what the component DAG looks like. (After all, we are trying to solve the strong component problem in the first place.) The "trick" behind the strong component algorithm is that we can find an ordering of the vertices that has essentially the necessary property, without actually computing the component DAG.

The Plumber's Algorithm: I call this algorithm the plumber's algorithm (because it avoids leaks). Unfortunately it is quite difficult to understand why this algorithm works. I will present the algorithm, and refer you to CLRS for the complete proof. First recall that GR (what CLRS calls GT) is the digraph with the same vertex set as G but in which all edges have been reversed in direction. Given an adjacency list for G, it is possible to compute GR in Θ(V + E) time. (I'll leave this as an exercise.)

Observe that the strongly connected components are not affected by reversing all the digraph's edges. If u and v are mutually reachable in G, then certainly this is still true in GR. All that changes is that the component DAG is completely reversed. The ordering trick is to order the vertices of G according to their finish times in a DFS. Then visit the nodes of GR in decreasing order of finish times. All the steps of the algorithm are quite easy to implement, and all operate in Θ(V + E) time. Here is the algorithm.

Strong Components
    StrongComp(G) {
        Run DFS(G), computing finish times f[u] for each vertex u;
        Compute R = Reverse(G), reversing all edges of G;
        Sort the vertices of R (by CountingSort) in decreasing order of f[u];
        Run DFS(R) using this order;
        Each DFS tree is a strong component;
    }

Fig. 30: Strong Components Algorithm (initial DFS; reversal with new vertex order; final DFS with components). Finish times in the initial DFS: a 1/18, b 2/13, c 3/4, d 14/17, e 15/16, f 5/12, g 6/11, i 7/10, h 8/9.

Correctness: Why visit vertices in decreasing order of finish times? Why use the reversal digraph? It is difficult to justify these elements formally. Here is some intuition, though. Recall that the main intent is to visit the strong components in a reverse topological order. The question is how to order the vertices so that this is true. Recall from the topological sorting algorithm, that in a DAG, finish times occur in reverse topological order (i.e., the first vertex in the topological order is the one with the highest finish time). So, if we wanted to visit the components in reverse topological order, this suggests that we should visit the vertices in increasing order of finish time, starting with the lowest finishing time. This is a good starting idea, but it turns out that it doesn't work. The reason is that there are many vertices in each strong component, and they all have different finish times. For example, in Fig. 30 observe that in the first DFS (on the left) the lowest finish time (of 4) is achieved by vertex c, and its strong component is first, not last, in topological order.

It is tempting to give up in frustration at this point. But there is something to notice about the finish times. If we consider the maximum finish time in each component, then these are related to the topological order of the component DAG. In particular, given any strong component C, define f(C) to be the maximum finish time among all vertices in this component.

    f(C) = max_{u ∈ C} f[u].

Lemma: Consider a digraph G = (V, E) and let C and C' be two distinct strong components. If there is an edge (u, v) of G such that u ∈ C and v ∈ C', then f(C) > f(C').

See the book for a complete proof. Here is a quick sketch. If the DFS visits C first, then the DFS will leak into C' (along edge (u, v) or some other edge), and then will visit everything in C' before finally returning to C. Thus, some vertex of C will finish later than every vertex of C'. On the other hand, suppose that C' is visited first. Because there is an edge from C to C', we know from the definition of the component DAG that there cannot be a path from C' to C. So C' will completely finish before we even start C. Thus all the finish times of C will be larger than the finish times of C'.

For example, in the previous figure, the maximum finish times for each component are 18 (for {a, b, c}), 17 (for {d, e}), and 12 (for {f, g, h, i}). The order 18, 17, 12 is a valid topological order for the component digraph.
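The StrongComp procedure above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name strong_components and the small example digraph are hypothetical (the example is not the digraph of the figures), the digraph is given as a dictionary of adjacency lists, and instead of sorting by finish time the first DFS simply records vertices as they finish and the second pass walks that list backwards.

```python
def strong_components(adj):
    """Return the strong components of a digraph as a list of vertex sets.

    adj maps every vertex to the list of its out-neighbors.
    """
    visited = set()

    def dfs(u, graph, out):
        visited.add(u)
        for v in graph[u]:
            if v not in visited:
                dfs(v, graph, out)
        out.append(u)                  # u finishes after all its descendants

    # First pass: DFS on G, recording vertices in increasing finish time.
    finish_order = []
    for u in adj:
        if u not in visited:
            dfs(u, adj, finish_order)

    # Build the reversal digraph in time proportional to V + E.
    rev = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            rev[v].append(u)

    # Second pass: DFS on the reversal, in decreasing order of finish time.
    visited.clear()
    components = []
    for u in reversed(finish_order):
        if u not in visited:
            tree = []
            dfs(u, rev, tree)
            components.append(set(tree))   # each DFS tree is a strong component
    return components

# Hypothetical example: three strong components {a,b,c} -> {d,e} -> {f,g}.
example = {
    "a": ["b"], "b": ["c", "d"], "c": ["a"],
    "d": ["e"], "e": ["d", "f"],
    "f": ["g"], "g": ["f"],
}
print(strong_components(example))
```

On this example the components come out in the order {a, b, c}, {d, e}, {f, g}, which is a topological order of the component DAG of G and hence a reverse topological order of the component DAG of the reversal. The recursive DFS is fine for small inputs; a large digraph would need an explicit stack.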

The lemma is a big help. It tells us that if we run DFS and compute finish times, and then run a new DFS in decreasing order of finish times, we will visit the components in topological order. The problem is that this is not what we wanted. We wanted a reverse topological order for the component DAG. So, the final trick is to reverse the digraph, by forming GR. This does not change the strong components, but it reverses the edges of the component graph, and so reverses the topological order, which is exactly what we wanted. In conclusion we have:

Theorem: Consider a digraph G on which DFS has been run. Sort the vertices by decreasing order of finish time. Then a DFS of the reversed digraph GR visits the strong components according to a reversed topological order of the component DAG of GR.

Lecture 12: Minimum Spanning Trees and Kruskal's Algorithm

Read: Chapt 23 in CLRS, up through 23.2.

Minimum Spanning Trees: A common problem in communications networks and circuit design is that of connecting together a set of nodes (communication sites or circuit components) by a network of minimal total length (where length is the sum of the lengths of connecting wires). We assume that the network is undirected. To minimize the length of the connecting network, it never pays to have any cycles (since we could break any cycle without destroying connectivity and decrease the total length). Since the resulting connection graph is connected, undirected, and acyclic, it is a free tree.

The computational problem is called the minimum spanning tree problem (MST for short). More formally, given a connected, undirected graph G = (V, E), a spanning tree is an acyclic subset of edges T ⊆ E that connects all the vertices together. Assuming that each edge (u, v) of G has a numeric weight or cost, w(u, v), (which may be zero or negative) we define the cost of a spanning tree T to be the sum of the weights of the edges in the spanning tree

    w(T) = Σ_{(u,v) ∈ T} w(u, v).

A minimum spanning tree (MST) is a spanning tree of minimum weight. Note that the minimum spanning tree may not be unique, but it is true that if all the edge weights are distinct, then the MST will be unique (this is a rather subtle fact, which we will not prove). Fig. 31 shows three spanning trees for the same graph, where the shaded rectangles indicate the edges in the spanning tree. The one on the left is not a minimum spanning tree, and the other two are. (An interesting observation is that not only do the edges sum to the same value, but in fact the same set of edge weights appear in the two MST's. Is this a coincidence? We'll see later.)

Fig. 31: Spanning trees (the middle and right are minimum spanning trees). Costs, left to right: 33, 22, 22.

Steiner Minimum Trees: Minimum spanning trees are actually mentioned in the U.S. legal code. The reason is that AT&T was a government supported monopoly at one time, and was responsible for handling all telephone connections. If a company wanted to connect a collection of installations by a private internal phone system, AT&T was required (by law) to connect them in the minimum cost manner, which is clearly a spanning tree . . . or is it?

Some companies discovered that they could actually reduce their connection costs by opening a new bogus installation. Such an installation served no purpose other than to act as an intermediate point for connections. An example is shown in Fig. 32. On the left, consider four installations that lie at the corners of a 1 × 1 square. Assume that all edge lengths are just Euclidean distances. It is easy to see that the cost of any MST for this configuration is 3 (as shown on the left). However, suppose you introduce a new installation at the center, whose distance to each of the other four points is 1/√2. It is now possible to connect these five points with a total cost of 4/√2 = 2√2 ≈ 2.83. This is better than the MST.

Fig. 32: Steiner Minimum tree (left: MST, cost = 3; right: SMT with a Steiner point, cost = 2 sqrt(2) ≈ 2.83).

In general, the problem of determining the lowest cost interconnection tree between a given set of nodes, assuming that you are allowed additional nodes (called Steiner points), is called the Steiner minimum tree (or SMT for short). An interesting fact is that although there is a simple greedy algorithm for MST's (as we will see below), the SMT problem is much harder, and in fact is NP-hard. (Luckily for AT&T, the US Legal code is rather ambiguous on the point as to whether the phone company was required to use MST's or SMT's in making connections.)

Generic approach: We will present two greedy algorithms (Kruskal's and Prim's algorithms) for computing a minimum spanning tree. Recall that a greedy algorithm is one that builds a solution by repeatedly selecting the cheapest (or generally locally optimal choice) among all options at each stage. An important characteristic of greedy algorithms is that once they make a choice, they never "unmake" this choice. Before presenting these algorithms, let us review some basic facts about free trees. They are all quite easy to prove.

Lemma:
• A free tree with n vertices has exactly n − 1 edges.
• There exists a unique path between any two vertices of a free tree.
• Adding any edge to a free tree creates a unique cycle. Breaking any edge on this cycle restores a free tree.

Let G = (V, E) be an undirected, connected graph whose edges have numeric edge weights (which may be positive, negative or zero). The intuition behind the greedy MST algorithms is simple: we maintain a subset of edges A, which will initially be empty, and we will add edges one at a time, until A equals the MST. We say that a subset A ⊆ E is viable if A is a subset of edges in some MST. (We cannot say "the" MST, since it is not necessarily unique.) We say that an edge (u, v) ∈ E − A is safe if A ∪ {(u, v)} is viable. In other words, the choice (u, v) is a safe choice to add so that A can still be extended to form an MST. Note that if A is viable it cannot contain a cycle. A generic greedy algorithm operates by repeatedly adding any safe edge to the current spanning tree. (Note that viability is a property of subsets of edges and safety is a property of a single edge.)

When is an edge safe? We consider the theoretical issues behind determining whether an edge is safe or not. Let S be a subset of the vertices S ⊆ V. A cut (S, V − S) is just a partition of the vertices into two disjoint subsets. An edge (u, v) crosses the cut if one endpoint is in S and the other is in V − S. Given a subset of edges A, we say that a cut respects A if no edge in A crosses the cut. It is not hard to see why respecting cuts are important to this problem. If we have computed a partial MST, and we wish to know which edges can be added that do not induce a cycle in the current MST, any edge that crosses a respecting cut is a possible candidate.

An edge of E is a light edge crossing a cut, if among all edges crossing the cut, it has the minimum weight (the light edge may not be unique if there are duplicate edge weights). Intuition says that since all the edges that cross a respecting cut do not induce a cycle, then the lightest edge crossing a cut is a natural choice. The main theorem which drives both algorithms is the following. It essentially says that we can always augment A by adding the minimum weight edge that crosses a cut which respects A. (It is stated in complete generality, so that it can be applied to both algorithms.)

MST Lemma: Let G = (V, E) be a connected, undirected graph with real-valued weights on the edges. Let A be a viable subset of E (i.e. a subset of some MST), let (S, V − S) be any cut that respects A, and let (u, v) be a light edge crossing this cut. Then the edge (u, v) is safe for A.

Proof: It will simplify the proof to assume that all the edge weights are distinct. Let T be any MST for G that contains A (one exists because A is viable; see Fig. 33). If T contains (u, v) then we are done. Suppose instead that no MST contains (u, v). We will derive a contradiction.

Add the edge (u, v) to T, thus creating a cycle. Since u and v are on opposite sides of the cut, and since any cycle must cross the cut an even number of times, there must be at least one other edge (x, y) in T that crosses the cut.

The edge (x, y) is not in A (because the cut respects A). By removing (x, y) we restore a spanning tree, call it T'. We have

    w(T') = w(T) − w(x, y) + w(u, v).

Since (u, v) is the lightest edge crossing the cut, we have w(u, v) < w(x, y). Thus w(T') < w(T). This contradicts the assumption that T was an MST.

Fig. 33: Proof of the MST Lemma (panels: T + (u, v); T' = T − (x, y) + (u, v)). Edge (u, v) is the light edge crossing cut (S, V − S).

Kruskal's Algorithm: Kruskal's algorithm works by attempting to add edges to A in increasing order of weight (lightest edges first). If the next edge does not induce a cycle among the current set of edges, then it is added to A. If it does, then this edge is passed over, and we consider the next edge in order. Note that as this algorithm runs, the edges of A will induce a forest on the vertices. As the algorithm continues, the trees of this forest are merged together, until we have a single tree containing all the vertices.

Observe that this strategy leads to a correct algorithm. Why? Consider the edge (u, v) that Kruskal's algorithm seeks to add next, and suppose that this edge does not induce a cycle in A. Let A' denote the tree of the forest A that contains vertex u. Consider the cut (A', V − A'). Every edge crossing the cut is not in A, and so this cut respects A, and (u, v) is the light edge across the cut (because any lighter edge would have been considered earlier by the algorithm). Thus, by the MST Lemma, (u, v) is safe.
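The definitions used in the MST Lemma (a cut that respects A, and a light edge crossing it) can be made concrete with a few lines of Python. This is only a sketch: the helper names respects and light_edge and the small example graph are hypothetical, and edges are assumed to be (u, v, weight) triples.

```python
def respects(s, a_edges):
    """Return True if no edge (u, v) of A crosses the cut (S, V - S)."""
    return all((u in s) == (v in s) for u, v in a_edges)

def light_edge(s, weighted_edges):
    """Return a minimum-weight edge (u, v, w) crossing the cut, or None."""
    crossing = [e for e in weighted_edges if (e[0] in s) != (e[1] in s)]
    return min(crossing, key=lambda e: e[2]) if crossing else None

# Hypothetical example graph on vertices a, b, c, d.
edges = [("a", "b", 1), ("b", "c", 3), ("a", "c", 4), ("c", "d", 2), ("b", "d", 5)]
a = [("a", "b")]          # a viable set: (a, b) is the lightest edge overall
s = {"a", "b"}            # the cut ({a, b}, {c, d})

print(respects(s, a))         # True
print(light_edge(s, edges))   # ('b', 'c', 3)
```

Here the cut respects A, three edges cross it, and the lightest of them is (b, c), so by the lemma (b, c) is safe to add to A.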

The only tricky part of the algorithm is how to detect efficiently whether the addition of an edge will create a cycle in A. We could perform a DFS on the subgraph induced by the edges of A, but this will take too much time. We want a fast test that tells us whether u and v are in the same tree of A.

This can be done by a data structure (which we have not studied) called the disjoint set Union-Find data structure. This data structure supports three operations:

Create-Set(u): Create a set containing a single item u.

Find-Set(u): Find the set that contains a given item u.

Union(u, v): Merge the set containing u and the set containing v into a common set.

You are not responsible for knowing how this data structure works (which is described in CLRS). You may use it as a "black-box". For our purposes it suffices to know that each of these operations can be performed in O(log n) time, on a set of size n. (The Union-Find data structure is quite interesting, because it can actually perform a sequence of n operations much faster than O(n log n) time. However we will not go into this here. O(log n) time is fast enough for its use in Kruskal's algorithm.)

In Kruskal's algorithm, the vertices of the graph will be the elements to be stored in the sets, and the sets will be the vertices in each tree of A. The set A can be stored as a simple list of edges. The algorithm is shown below, and an example is shown in Fig. 34.

Kruskal's Algorithm
    Kruskal(G=(V,E), w) {
        A = {}                                  // initially A is empty
        for each (u in V) Create_Set(u)         // create set for each vertex
        Sort E in increasing order by weight w
        for each ((u,v) from the sorted list) {
            if (Find_Set(u) != Find_Set(v)) {   // u and v in different trees
                Add (u,v) to A
                Union(u, v)
            }
        }
        return A
    }

Fig. 34: Kruskal's Algorithm. Each vertex is labeled according to the set that contains it.

Analysis: How long does Kruskal's algorithm take? As usual, let V be the number of vertices and E be the number of edges. Since the graph is connected, we may assume that E ≥ V − 1. Observe that it takes Θ(E log E) time to sort the edges.
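Kruskal's algorithm as given in the pseudocode above can be sketched in Python as follows. The notes treat Union-Find as a black box, so the tiny union-by-size structure below is an assumption of this sketch (it gives O(log n) finds, matching the bound quoted above), not the structure described in CLRS. The names kruskal, find_set and union and the example graph are hypothetical, and edges are assumed to be (u, v, weight) triples.

```python
def kruskal(vertices, edges):
    """Return the list of MST edges of a connected undirected graph."""
    parent = {u: u for u in vertices}    # Create_Set(u) for every vertex
    size = {u: 1 for u in vertices}

    def find_set(u):
        while parent[u] != u:            # walk up to the root of u's tree
            u = parent[u]
        return u

    def union(u, v):
        ru, rv = find_set(u), find_set(v)
        if ru == rv:
            return
        if size[ru] < size[rv]:          # hang the smaller tree under the larger
            ru, rv = rv, ru
        parent[rv] = ru
        size[ru] += size[rv]

    a = []                               # initially A is empty
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if find_set(u) != find_set(v):   # u and v are in different trees
            a.append((u, v, w))
            union(u, v)
    return a

# Hypothetical example graph on vertices a, b, c, d.
edges = [("a", "b", 1), ("b", "c", 3), ("a", "c", 4), ("c", "d", 2), ("b", "d", 5)]
mst = kruskal(["a", "b", "c", "d"], edges)
print(mst)                       # [('a', 'b', 1), ('c', 'd', 2), ('b', 'c', 3)]
print(sum(w for _, _, w in mst)) # 6
```

The edges (a, c) and (b, d) are passed over because their endpoints are already in the same tree when they are considered.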

In Kruskal's algorithm, the for-loop is iterated E times, and each iteration involves a constant number of accesses to the Union-Find data structure on a collection of V items. Thus each access takes O(log V) time, for a total of O(E log V). Adding the time to sort the edges (note that log E = O(log V), since E ≤ V^2), the total running time is Θ((V + E) log V). Since V is asymptotically no larger than E, we could write this more simply as Θ(E log V).

Lecture 13: Prim's and Baruvka's Algorithms for MSTs

Read: Chapt 23 in CLRS. Baruvka's algorithm is not described in CLRS.

Prim's Algorithm: Prim's algorithm is another greedy algorithm for minimum spanning trees. It differs from Kruskal's algorithm only in how it selects the next safe edge to add at each step. Its running time is essentially the same as Kruskal's algorithm, O((V + E) log V). There are two reasons for studying Prim's algorithm. The first is to show that there is more than one way to solve a problem (an important lesson to learn in algorithm design), and the second is that Prim's algorithm looks very much like another greedy algorithm, called Dijkstra's algorithm, that we will study for a completely different problem, shortest paths. Thus, not only is Prim's a different way to solve the same MST problem, it is also the same way to solve a different problem. (Whatever that means!)

Different ways to grow a tree: Kruskal's algorithm worked by ordering the edges, and inserting them one by one into the spanning tree, taking care never to introduce a cycle. Intuitively Kruskal's works by merging or splicing two trees together, until all the vertices are in the same tree.

In contrast, Prim's algorithm builds the tree up by adding leaves one at a time to the current tree. We start with a root vertex r (it can be any vertex). At any time, the subset of edges A forms a single tree (in Kruskal's it formed a forest). We look to add a single vertex as a leaf to the tree. The process is illustrated in the following figure.

Fig. 35: Prim's Algorithm.

Observe that if we consider the set of vertices S currently part of the tree, and its complement (V − S), we have a cut of the graph and the current set of tree edges A respects this cut. Which edge should we add next? The MST Lemma from the previous lecture tells us that it is safe to add the light edge. In the figure, this is the edge of weight 4 going to vertex u. Then u is added to the vertices of S, and the cut changes. Note that some edges that crossed the cut before are no longer crossing it, and others that were not crossing the cut are.

It is easy to see that the key questions in the efficient implementation of Prim's algorithm are how to update the cut efficiently, and how to determine the light edge quickly. To do this, we will make use of a priority queue data structure. Recall that this is the data structure used in HeapSort. This is a data structure that stores a set of items, where each item is associated with a key value. The priority queue supports three operations.

insert(u, key): Insert u with the key value key in Q.

extractMin(): Extract the item with the minimum key value in Q.

decreaseKey(u, new_key): Decrease the value of u's key value to new_key.

A priority queue can be implemented using the same heap data structure used in heapsort. All of the above operations can be performed in O(log n) time, where n is the number of items in the heap.

What do we store in the priority queue? At first you might think that we should store the edges that cross the cut, since this is what we are removing with each step of the algorithm. The problem is that when a vertex is moved from one side of the cut to the other, this results in a complicated sequence of updates.

There is a much more elegant solution, and this is what makes Prim's algorithm so nice. For each vertex u ∈ V − S (not part of the current spanning tree) we associate u with a key value key[u], which is the weight of the lightest edge going from u to any vertex in S. We also store in pred[u] the end vertex of this edge in S. If there is no edge from u to a vertex in S, then we set its key value to +∞. We will also need to know which vertices are in S and which are not. We do this by coloring the vertices in S black.

Here is Prim's algorithm. The root vertex r can be any vertex in V.

Prim's Algorithm
    Prim(G,w,r) {
        for each (u in V) {                     // initialization
            key[u] = +infinity;
            color[u] = white;
        }
        key[r] = 0;                             // start at root
        pred[r] = nil;
        Q = new PriQueue(V);                    // put vertices in Q
        while (Q.nonEmpty()) {                  // until all vertices in MST
            u = Q.extractMin();                 // vertex with lightest edge
            for each (v in Adj[u]) {
                if ((color[v] == white) && (w(u,v) < key[v])) {
                    key[v] = w(u,v);            // new lighter edge out of v
                    Q.decreaseKey(v, key[v]);
                    pred[v] = u;
                }
            }
            color[u] = black;
        }
        [The pred pointers define the MST as an inverted tree rooted at r]
    }

The following figure illustrates Prim's algorithm. The arrows on edges indicate the predecessor pointers, and the numeric label in each vertex is the key value.

To analyze Prim's algorithm, we account for the time spent on each vertex as it is extracted from the priority queue. It takes O(log V) to extract this vertex from the queue. For each incident edge, we spend potentially O(log V) time decreasing the key of the neighboring vertex. Thus the time is O(log V + deg(u) log V). The other steps of the update are constant time. So the overall running time is

    T(V, E) = Σ_{u ∈ V} (log V + deg(u) log V)
            = Σ_{u ∈ V} (1 + deg(u)) log V
            = (log V) Σ_{u ∈ V} (1 + deg(u))
            = (log V)(V + 2E)
            = Θ((V + E) log V).

Since G is connected, V is asymptotically no greater than E, so this is Θ(E log V). This is exactly the same as Kruskal's algorithm.
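Prim's algorithm can be sketched in Python as follows. One caveat is that the standard heapq module has no decreaseKey operation, so this sketch swaps in a common workaround: it pushes a fresh (key, vertex) entry whenever a key drops and skips stale entries when they are popped. That changes the heap size to O(E) but not the O(E log V) bound. The name prim, the adjacency-list format (lists of (neighbor, weight) pairs) and the example graph are assumptions of the sketch.

```python
import heapq

def prim(adj, r):
    """Return the pred pointers of an MST of a connected undirected graph.

    adj[u] is a list of (v, weight) pairs; r is the root vertex.
    """
    key = {u: float("inf") for u in adj}   # lightest known edge into the tree
    pred = {u: None for u in adj}
    in_tree = set()                        # the "black" vertices, i.e. the set S
    key[r] = 0
    heap = [(0, r)]
    while heap:
        _, u = heapq.heappop(heap)         # extractMin
        if u in in_tree:
            continue                       # stale entry left by an earlier push
        in_tree.add(u)
        for v, w in adj[u]:
            if v not in in_tree and w < key[v]:
                key[v] = w                 # new lighter edge out of v
                pred[v] = u
                heapq.heappush(heap, (w, v))   # stands in for decreaseKey
    return pred

# Hypothetical example graph on vertices a, b, c, d.
adj = {
    "a": [("b", 1), ("c", 4)],
    "b": [("a", 1), ("c", 3), ("d", 5)],
    "c": [("a", 4), ("b", 3), ("d", 2)],
    "d": [("c", 2), ("b", 5)],
}
print(prim(adj, "a"))   # {'a': None, 'b': 'a', 'c': 'b', 'd': 'c'}
```

The returned pred pointers describe the tree with edges (a, b), (b, c), (c, d), of total weight 6, as an inverted tree rooted at a.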

Fig. 36: Prim's Algorithm.

Baruvka's Algorithm: We have seen two ways (Kruskal's and Prim's algorithms) for solving the MST problem. So, it may seem like complete overkill to consider yet another algorithm. This one is called Baruvka's algorithm. It is actually the oldest of the three algorithms (invented in 1926, well before the first computers). The reason for studying this algorithm is that of the three algorithms, it is the easiest to implement on a parallel computer. Unlike Kruskal's and Prim's algorithms, which add edges one at a time, Baruvka's algorithm adds a whole set of edges all at once to the MST.

Baruvka's algorithm is similar to Kruskal's algorithm, in the sense that it works by maintaining a collection of disconnected trees. Let us call each subtree a component. Initially, each vertex is by itself in a one-vertex component. Recall that with each stage of Kruskal's algorithm, we add the lightest-weight edge that connects two different components together. To prove Kruskal's algorithm correct, we argued (from the MST Lemma) that the lightest such edge will be safe to add to the MST. In fact, a closer inspection of the proof reveals that the cheapest edge leaving any component is always safe. This suggests a more parallel way to grow the MST. Each component determines the lightest edge that goes from inside the component to outside the component (we don't care where). We say that such an edge leaves the component. Note that two components might select the same edge by this process. By the above observation, all of these edges are safe, so we may add them all at once to the set A of edges in the MST. As a result, many components will be merged together into a single component. We then apply DFS to the edges of A, to identify the new components. This process is repeated until only one component remains. A fairly high-level description of Baruvka's algorithm is given below.

Baruvka's Algorithm
    Baruvka(G=(V,E), w) {
        initialize each vertex to be its own component;
        A = {};                                 // A holds edges of the MST
        do {
            for (each component C) {
                find the lightest edge (u,v) with u in C and v not in C;
                add {u,v} to A (unless it is already there);
            }
            apply DFS to graph H=(V,A), to compute the new components;
        } while (there are 2 or more components);
        return A;                               // return final MST edges
    }
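The high-level description above can be sketched in Python as follows. This is a sketch under stated assumptions: edge weights are taken to be distinct (with ties, two components could pick different edges between each other and create a cycle), edges are (u, v, weight) triples, and the names baruvka and label_components and the example graph are hypothetical. Components are relabeled after each round by a stack-based depth-first traversal of H = (V, A).

```python
def label_components(vertices, a):
    """Label each vertex with a representative of its component in H = (V, A)."""
    adj = {u: [] for u in vertices}
    for u, v, _ in a:
        adj[u].append(v)
        adj[v].append(u)
    comp = {}
    for s in vertices:
        if s in comp:
            continue
        comp[s] = s
        stack = [s]                        # depth-first traversal from s
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in comp:
                    comp[v] = s
                    stack.append(v)
    return comp

def baruvka(vertices, edges):
    """Return the set of MST edges; weights are assumed to be distinct."""
    a = set()                              # A holds the edges of the MST
    comp = {u: u for u in vertices}        # each vertex is its own component
    while len(set(comp.values())) > 1:
        lightest = {}                      # component label -> lightest leaving edge
        for e in edges:
            u, v, w = e
            if comp[u] != comp[v]:
                for c in (comp[u], comp[v]):
                    if c not in lightest or w < lightest[c][2]:
                        lightest[c] = e
        if not lightest:
            break                          # no leaving edges: graph is disconnected
        a.update(lightest.values())        # a set, so duplicates are ignored
        comp = label_components(vertices, a)
    return a

# Hypothetical example graph on vertices a, b, c, d.
edges = [("a", "b", 1), ("b", "c", 3), ("a", "c", 4), ("c", "d", 2), ("b", "d", 5)]
print(sorted(baruvka(["a", "b", "c", "d"], edges), key=lambda e: e[2]))
# [('a', 'b', 1), ('c', 'd', 2), ('b', 'c', 3)]
```

In the first round the components {a} and {b} both select (a, b), and {c} and {d} both select (c, d). In the second round the two remaining components both select (b, c), leaving a single component.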

There are a number of unspecified details in Baruvka's algorithm, which we will not spell out in detail, except to note that they can be solved in Θ(V + E) time through DFS. First, we may apply DFS, but only traversing the edges of A to compute the components. Each DFS tree will correspond to a separate component. We label each vertex with its component number as part of this process. With these labels it is easy to determine which edges go between components (since their endpoints have different labels). Then we can traverse each component again to determine the lightest edge that leaves the component. (In fact, with a little more cleverness, we can do all this without having to perform two separate DFS's.) The algorithm is illustrated in the figure below.

Fig. 37: Baruvka's Algorithm.

Analysis: How long does Baruvka's algorithm take? Observe that because each iteration involves doing a DFS, each iteration (of the outer do-while loop) can be performed in Θ(V + E) time. The question is how many iterations are required in general? We claim that there are never more than O(log V) iterations needed. To see why, let m denote the number of components at some stage. Each of the m components will merge with at least one other component. Afterwards the number of remaining components could be as low as 1 (if they all merge together), but never higher than m/2 (if they merge in pairs). Thus, the number of components decreases by at least half with each iteration. Since we start with V components, this can happen at most lg V times, until only one component remains. Thus, the total running time is O((V + E) log V). Again, since G is connected, V is asymptotically no larger than E, so we can write this more succinctly as O(E log V). Thus all three algorithms have the same asymptotic running time.

Lecture 14: Dijkstra's Algorithm for Shortest Paths

Read: Chapt 24 in CLRS.

Shortest Paths: Consider the problem of computing shortest paths in a directed graph. We have already seen that breadth-first search is an O(V + E) algorithm for finding shortest paths from a single source vertex to all other vertices, assuming that the graph has no edge weights. Suppose that the graph has edge weights, and we wish to compute the shortest paths from a single source vertex to all other vertices in the graph.

By the way, there are other formulations of the shortest path problem. One may want just the shortest path between a single pair of vertices. Most algorithms for this problem are variants of the single-source algorithm that we will present. There is also a single sink problem, which can be solved in the transpose digraph (that is, by reversing the edges). Computing all-pairs shortest paths can be solved by iterating a single-source algorithm over all vertices, but there are other global methods that are faster.

Think of the vertices as cities, and the weights represent the cost of traveling from one city to another (nonexistent edges can be thought of as having infinite cost). When edge weights are present, we define the length of a

v) s 0 Fig. We will discuss a simple greedy algorithm. As in breadth-ﬁrst search. until all the d[v] values converge to the true shortest distances. we will construct the reversal of the shortest path to v. Shortest Paths and Relaxation: The basic structure of Dijkstra’s algorithm is to maintain an estimate of the shortest path for each vertex. By taking this path and following it with the edge (u.) Single Source Shortest Paths: The single source shortest path problem is as follows. Deﬁne the distance between two vertices. which assumes there are no negative edge weights. We know that there is a path from s to u of weight d[u]. The text discusses the Bellman-Ford algorithm for ﬁnding shortest paths assuming negative weight edges but no negative-weight cycles are present. we should update d[v] to the value d[u] + w(u. u s 0 3 5 11 u v relax(u. (NOTE: Don’t confuse d[v] with the d[v] in the DFS algorithm. we need to add the requirement that there be no cycles whose total cost is negative (otherwise you make the path inﬁnitely short by cycling forever through such a cycle). which we do by updating v’s predecessor pointer. called Dijkstra’s algorithm. Suppose that we have already computed current estimates on d[u] and d[v]. We are given a directed graph with nonnegative edge weights G = (V.v) { if (d[u] + w(u. δ(u. (δ(u. Computing the actual path will be a fairly simple extension. then push it a little closer to the optimum. This. v) we get a path to v of length d[u]+ w(u.v) < d[v]) { d[v] = d[u] + w(u. it attempts to update d[v] for each vertex in the graph. call this d[v]. 38. As the algorithm goes on. v). for each vertex we will have a pointer pred[v] which points back to the source. 3 5 8 v Relaxing an edge Relax(u. Intuitively. 38: Relaxation.) Intuitively d[v] will be the length of the shortest path that the algorithm knows of from s to v.v) pred[v] = u } } // is the path through u shorter? // yes. The process by which an estimate is updated is called relaxation. 
Consider an edge from a vertex u to v whose weight is w(u. This is illustrated in Fig. value will always greater than or equal to the true shortest path distance from s to v.path to be the sum of edge weights along the path. Initially. we know of no paths. If this path is better than the existing path of length d[v] to v. Initially d[s] = 0 and all the other d[v] values are set to ∞. It is possible to have graphs with negative edges. but in order for the shortest path to be well deﬁned. E) and a distinguished source vertex. v). so d[v] = ∞. Here is how relaxation works. s ∈ V . u) = 0 by considering path of 0 edges from u to itself. We should also remember that the shortest path to v passes through u. We will stress the task of computing the minimum distance from the source to each vertex. if you discover a path from s to v shorter than d[v]. In particular. and sees more and more vertices. u and v. v) to be the length of the minimum length path from u to v. then you need to update d[v]. then take it // record that we go through u Lecture Notes 50 CMSC 451 . The problem is to determine the distance from the source vertex to every vertex in the graph. This notion is common to many optimization algorithms. v). They are completely different. if you can see that your solution is not yet reached an optimum value. By following the predecessor pointers backwards from any vertex.
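As a minimal sketch, the relaxation step above can be written in Python. The dictionary-based representation of d, pred and w is an assumption made for illustration, as is the boolean return value; the numbers are the ones shown in Fig. 38.

```python
def relax(u, v, w, d, pred):
    """Relax edge (u, v). Returns True if d[v] was improved (an added convenience)."""
    if d[u] + w[(u, v)] < d[v]:      # is the path through u shorter?
        d[v] = d[u] + w[(u, v)]      # yes, then take it
        pred[v] = u                  # record that we go through u
        return True
    return False

# Values from Fig. 38: d[u] = 3, w(u, v) = 5, d[v] = 11.
d = {"s": 0, "u": 3, "v": 11}
pred = {"s": None, "u": "s", "v": None}
w = {("u", "v"): 5}
relax("u", "v", w, d, pred)
print(d["v"], pred["v"])             # prints: 8 u
```

Relaxing the same edge a second time changes nothing, since d[v] already equals d[u] + w(u, v).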

Observe that whenever we set d[v] to a finite value, there is always evidence of a path of that length. Therefore d[v] ≥ δ(s, v). If d[v] = δ(s, v), then further relaxations cannot change its value.

It is not hard to see that if we perform Relax(u, v) repeatedly over all edges of the graph, the d[v] values will eventually converge to the final true distance value from s. The cleverness of any shortest path algorithm is to perform the updates in a judicious manner, so the convergence is as fast as possible. In particular, the best possible would be to order relaxation operations in such a way that each edge is relaxed exactly once. Dijkstra's algorithm does exactly this.

Dijkstra's Algorithm: Dijkstra's algorithm is based on the notion of performing repeated relaxations. Dijkstra's algorithm operates by maintaining a subset of vertices, S ⊆ V, for which we claim we "know" the true distance, that is d[v] = δ(s, v). Initially S = ∅, the empty set, and we set d[s] = 0 and all others to +∞. One by one we select vertices from V − S to add to S.

The set S can be implemented using an array of vertex colors. Initially all vertices are white, and we set color[v] = black to indicate that v ∈ S.

How do we select which vertex among the vertices of V − S to add next to S? Here is where greedy selection comes in. Dijkstra recognized that the best way in which to perform relaxations is by increasing order of distance from the source. This way, whenever a relaxation is being performed, it is possible to infer that the result of the relaxation yields the final distance value. To implement this, for each vertex u ∈ V − S, we maintain a distance estimate d[u]. The greedy thing to do is to take the vertex of V − S for which d[u] is minimum, that is, take the unprocessed vertex that is closest (by our estimate) to s. Later we will justify why this is the proper choice.

In order to perform this selection efficiently, we store the vertices of V − S in a priority queue (e.g. a heap), where the key value of each vertex u is d[u]. Note the similarity with Prim's algorithm, although a different key value is used there. Also recall that if we implement the priority queue using a heap, we can perform the operations Insert(), Extract_Min(), and Decrease_Key() on a priority queue of size n each in O(log n) time. Each vertex "knows" its location in the priority queue (e.g. has a cross reference link to the priority queue entry), and each entry in the priority queue "knows" which vertex it represents. It is important when implementing the priority queue that this cross reference information is updated.

Here is Dijkstra's algorithm. (Note the remarkable similarity to Prim's algorithm.) An example is presented in Fig. 39.

Notice that the coloring is not really used by the algorithm, but it has been included to make the connection with the correctness proof a little clearer. Because of the similarity between this and Prim's algorithm, the running time is the same, namely Θ(E log V).

Correctness: Recall that d[v] is the distance value assigned to vertex v by Dijkstra's algorithm, and let δ(s, v) denote the length of the true shortest path from s to v. To see that Dijkstra's algorithm correctly gives the final true distances, we need to show that d[v] = δ(s, v) when the algorithm terminates. This is a consequence of the following lemma, which states that once a vertex u has been added to S (i.e. colored black), d[u] is the true shortest distance from s to u. Since at the end of the algorithm all vertices are in S, all distance estimates are then correct.

Lemma: When a vertex u is added to S, d[u] = δ(s, u).

Proof: It will simplify the proof conceptually if we assume that all the edge weights are strictly positive (the general case of nonnegative edges is presented in the text).

Suppose to the contrary that at some point Dijkstra's algorithm first attempts to add a vertex u to S for which d[u] ≠ δ(s, u). By our observations about relaxation, d[u] is never less than δ(s, u), thus we have d[u] > δ(s, u). Consider the situation just prior to the insertion of u. Consider the true shortest path from s to u. Because s ∈ S and u ∈ V − S, at some point this path must first jump out of S. Let (x, y) be the edge taken by the path, where x ∈ S and y ∈ V − S. (Note that it may be that x = s and/or y = u.)

Dijkstra's Algorithm
    Dijkstra(G,w,s) {
        for each (u in V) {                     // initialization
            d[u] = +infinity
            color[u] = white
            pred[u] = null
        }
        d[s] = 0                                // dist to source is 0
        Q = new PriQueue(V)                     // put all vertices in Q
        while (Q.nonEmpty()) {                  // until all vertices processed
            u = Q.extractMin()                  // select u closest to s
            for each (v in Adj[u]) {
                if (d[u] + w(u,v) < d[v]) {     // Relax(u,v)
                    d[v] = d[u] + w(u,v)
                    Q.decreaseKey(v, d[v])
                    pred[v] = u
                }
            }
            color[u] = black
        }
        [The pred pointers define an "inverted" shortest path tree]
    }

Fig. 39: Dijkstra's Algorithm example.

[Figure: the set S containing s, x and pred[u]; the edge (x, y) leaving S; the question "shorter path from s to u?"; and the label d[y] > d[u].]
Fig. 40: Correctness of Dijkstra's Algorithm.
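The pseudocode above can be sketched in Python as follows. This is not a line-by-line transcription: Python's heapq module has no decreaseKey operation, so this sketch swaps in the common "lazy deletion" variant, which pushes a fresh entry whenever an estimate improves and skips stale entries when they are extracted. The adjacency-list format and the small example graph are assumptions made for illustration.

```python
import heapq
import math

def dijkstra(adj, s):
    """adj maps every vertex to a list of (neighbor, weight) pairs, weights >= 0."""
    d = {u: math.inf for u in adj}
    pred = {u: None for u in adj}
    d[s] = 0
    done = set()                      # the set S (the "black" vertices)
    heap = [(0, s)]
    while heap:
        du, u = heapq.heappop(heap)   # unprocessed vertex with smallest estimate
        if u in done:
            continue                  # stale entry left by an earlier improvement
        done.add(u)
        for v, wt in adj[u]:
            if du + wt < d[v]:        # Relax(u, v)
                d[v] = du + wt
                pred[v] = u
                heapq.heappush(heap, (d[v], v))   # stands in for decreaseKey
    return d, pred

# Hypothetical example graph (not the one in Fig. 39).
adj = {"s": [("a", 7), ("b", 2)], "a": [("c", 1)], "b": [("a", 3), ("c", 8)], "c": []}
dist, pred = dijkstra(adj, "s")
print(dist)   # prints: {'s': 0, 'a': 5, 'b': 2, 'c': 6}
```

With lazy deletion the heap may hold up to one entry per edge, so each heap operation costs O(log E) = O(log V), and the overall bound stays within a constant factor of the Θ(E log V) stated above for graphs in which V is no larger than E.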

We argue that y ≠ u. Why? Since x ∈ S we have d[x] = δ(s, x). (Since u was the first vertex added to S which violated this, all prior vertices satisfy this.) Since we applied relaxation to x when it was added, we would have set d[y] = d[x] + w(x, y) = δ(s, y). Thus d[y] is correct, and by hypothesis, d[u] is not correct, so they cannot be the same.

Now observe that since y appears somewhere along the shortest path from s to u (but not at u) and all subsequent edges following y are of positive weight, we have δ(s, y) < δ(s, u), and thus

    d[y] = δ(s, y) < δ(s, u) < d[u].

Thus y would have been added to S before u, in contradiction to our assumption that u is the next vertex to be added to S.

Lecture 15: All-Pairs Shortest Paths

Read: Section 25.2 in CLRS.

All-Pairs Shortest Paths: We consider the generalization of the shortest path problem, to computing shortest paths between all pairs of vertices. Let G = (V, E) be a directed graph with edge weights. If (u, v) ∈ E is an edge of G, then the weight of this edge is denoted w(u, v). Recall that the cost of a path is the sum of edge weights along the path. The distance between two vertices δ(u, v) is the cost of the minimum cost path between them. We will allow G to have negative cost edges, but we will not allow G to have any negative cost cycles.

We consider the problem of determining the cost of the shortest path between all pairs of vertices in a weighted directed graph. We will present a Θ(n^3) algorithm, called the Floyd-Warshall algorithm. This algorithm is based on dynamic programming.

For this algorithm, we will assume that the digraph is represented as an adjacency matrix, rather than the more common adjacency list. Although adjacency lists are generally more efficient for sparse graphs, storing all the inter-vertex distances will require Ω(n^2) storage, so the savings is not justified here. Because the algorithm is matrix-based, we will employ common matrix notation, using i, j and k to denote vertices rather than u, v, and w as we usually do.

Input Format: The input is an n × n matrix w of edge weights, which are based on the edge weights in the digraph. We let $w_{ij}$ denote the entry in row i and column j of w.

$$w_{ij} = \begin{cases} 0 & \text{if } i = j, \\ w(i,j) & \text{if } i \ne j \text{ and } (i,j) \in E, \\ +\infty & \text{if } i \ne j \text{ and } (i,j) \notin E. \end{cases}$$

Setting $w_{ij} = \infty$ if there is no edge intuitively means that there is no direct link between these two nodes, and hence the direct cost is infinite. The reason for setting $w_{ii} = 0$ is that there is always a trivial path of length 0 (using no edges) from any vertex to itself. (Note that in digraphs it is possible to have self-loop edges, and so w(i, i) may generally be nonzero. It cannot be negative, since we assume that there are no negative cost cycles, and if it is positive, there is no point in using it as part of any shortest path.)

The output will be an n × n distance matrix $D = d_{ij}$ where $d_{ij} = \delta(i, j)$, the shortest path cost from vertex i to j. Recovering the shortest paths will also be an issue. To help us do this, we will also compute an auxiliary matrix mid[i, j]. The value of mid[i, j] will be a vertex that is somewhere along the shortest path from i to j. If the shortest path travels directly from i to j without passing through any other vertices, then mid[i, j] will be set to null. These intermediate values behave somewhat like the predecessor pointers in Dijkstra's algorithm, in order to reconstruct the final shortest path in Θ(n) time.
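The input format above can be sketched as a small Python helper that builds w from a list of edges. The function name, the 0-based vertex numbering, and the choice to keep the lightest of any parallel edges are assumptions made for illustration; they are not specified in the notes.

```python
import math

def weight_matrix(n, edges):
    """Build the n x n matrix w from (i, j, weight) triples; vertices are 0..n-1."""
    # 0 on the diagonal, +infinity wherever there is no edge.
    w = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i, j, wt in edges:
        if i != j:                        # self-loops are ignored: w[i][i] stays 0
            w[i][j] = min(w[i][j], wt)    # assumption: keep the lightest parallel edge
    return w

w = weight_matrix(3, [(0, 1, 4), (1, 2, -2), (2, 2, 5)])
for row in w:
    print(row)
```

Ignoring the self-loop (2, 2, 5) follows the remark above: a positive self-loop is never useful on a shortest path, and a negative one is ruled out by the no-negative-cycle assumption.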

Floyd-Warshall Algorithm: The Floyd-Warshall algorithm dates back to the early 60's. Warshall was interested in the weaker question of reachability: determine for each pair of vertices u and v, whether u can reach v. Floyd realized that the same technique could be used to compute shortest paths with only minor variations. The Floyd-Warshall algorithm runs in Θ(n^3) time.

As with any DP algorithm, the key is reducing a large problem to smaller problems. A natural way of doing this is by limiting the number of edges of the path, but it turns out that this does not lead to the fastest algorithm (but is an approach worthy of consideration). The main feature of the Floyd-Warshall algorithm is in finding the best formulation for the shortest path subproblem. Rather than limiting the number of edges on the path, it instead limits the set of vertices through which the path is allowed to pass. In particular, for a path $p = \langle v_1, v_2, \ldots, v_\ell \rangle$ we say that the vertices $v_2, v_3, \ldots, v_{\ell-1}$ are the intermediate vertices of this path. Note that a path consisting of a single edge has no intermediate vertices.

Formulation: Define $d_{ij}^{(k)}$ to be the length of the shortest path from i to j such that any intermediate vertices on the path are chosen from the set {1, 2, ..., k}.

In other words, we consider a path from i to j which either consists of the single edge (i, j), or it visits some intermediate vertices along the way, but these intermediates can only be chosen from among {1, 2, ..., k}. The path is free to visit any subset of these vertices, and to do so in any order. For example, in the digraph shown in Fig. 41(a), notice how the value of $d_{5,6}^{(k)}$ changes as k varies.

[Figure: (a) an example digraph in which $d_{5,6}^{(k)}$ takes the values INF (no path), 13, 9, 8 and 6 as k goes from 0 to 4; (b) a path from i to j through k, with legs $d_{ik}^{(k-1)}$ and $d_{kj}^{(k-1)}$ compared against $d_{ij}^{(k-1)}$.]
Fig. 41: Limiting intermediate vertices. For example $d_{5,6}^{(3)}$ can go through any combination of the intermediate vertices {1, 2, 3}, of which ⟨5, ..., 6⟩ has the lowest cost of 8.

Floyd-Warshall Update Rule: How do we compute $d_{ij}^{(k)}$ assuming that we have already computed the previous matrix $d^{(k-1)}$? There are two basic cases, depending on the ways that we might get from vertex i to vertex j, assuming that the intermediate vertices are chosen from {1, 2, ..., k}:

Don't go through k at all: Then the shortest path from i to j uses only intermediate vertices {1, ..., k − 1}, and hence the length of the shortest path is $d_{ij}^{(k-1)}$.

Do go through k: First observe that a shortest path does not pass through the same vertex twice, so we can assume that we pass through k exactly once. (The assumption that there are no negative cost cycles is being used here.) That is, we go from i to k, and then from k to j. In order for the overall path to be as short as possible we should take the shortest path from i to k, and the shortest path from k to j. Since each of these paths uses intermediate vertices only in {1, 2, ..., k − 1}, the length of the path is $d_{ik}^{(k-1)} + d_{kj}^{(k-1)}$.
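Taking the better of the two cases above for every pair (i, j), with k increasing, can be sketched in Python as follows. This is an illustrative sketch rather than the notes' own code: vertices are numbered from 0, the single matrix d is updated in place, and the helper `intermediates` is one possible way (an assumption) to recover a path from the mid values. The example weights are those read from the initial matrix of the Floyd-Warshall example in Fig. 42, renumbered from 0.

```python
import math

def floyd_warshall(w):
    """w is an n x n weight matrix (0 on the diagonal, math.inf for missing edges)."""
    n = len(w)
    d = [row[:] for row in w]                 # start from the direct edge weights
    mid = [[None] * n for _ in range(n)]
    for k in range(n):                        # allow k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:   # going through k is shorter
                    d[i][j] = d[i][k] + d[k][j]
                    mid[i][j] = k
    return d, mid

def intermediates(mid, i, j):
    """Vertices strictly between i and j on the shortest path that was found."""
    k = mid[i][j]
    if k is None:
        return []                             # the path is the single edge (i, j)
    return intermediates(mid, i, k) + [k] + intermediates(mid, k, j)

INF = math.inf
w = [[0, 8, INF, 1],
     [INF, 0, 1, INF],
     [4, INF, 0, INF],
     [INF, 2, 9, 0]]
d, mid = floyd_warshall(w)
print(d[0])                                   # prints: [0, 3, 4, 1]
print([0] + intermediates(mid, 0, 2) + [2])   # prints: [0, 3, 1, 2]
```

The triple loop makes the Θ(n^3) running time visible directly, and the two matrices account for the Θ(n^2) space.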

This suggests the following recursive rule (the DP formulation) for computing $d^{(k)}$, which is illustrated in Fig. 41(b).

$$d_{ij}^{(0)} = w_{ij}, \qquad d_{ij}^{(k)} = \min\left(d_{ij}^{(k-1)},\; d_{ik}^{(k-1)} + d_{kj}^{(k-1)}\right) \quad \text{for } k \ge 1.$$

The final answer is $d_{ij}^{(n)}$ because this allows all possible vertices as intermediate vertices. We could write a recursive program to compute $d_{ij}^{(k)}$, but this will be prohibitively slow because the same value may be reevaluated many times. Instead, we compute it by storing the values in a table, and looking the values up as we need them. Here is the complete algorithm. We have also included mid-vertex pointers, mid[i, j], for extracting the final shortest paths. We will leave the extraction of the shortest path as an exercise.

Floyd-Warshall Algorithm
    Floyd_Warshall(int n, int w[1..n, 1..n]) {
        array d[1..n, 1..n]
        for i = 1 to n do {                         // initialize
            for j = 1 to n do {
                d[i,j] = w[i,j]
                mid[i,j] = null
            }
        }
        for k = 1 to n do                           // use intermediates {1..k}
            for i = 1 to n do                       // ...from i
                for j = 1 to n do                   // ...to j
                    if (d[i,k] + d[k,j] < d[i,j]) {
                        d[i,j] = d[i,k] + d[k,j]    // new shorter path length
                        mid[i,j] = k                // new path is through k
                    }
        return d                                    // matrix of distances
    }

An example of the algorithm's execution is shown in Fig. 42.

Clearly the algorithm's running time is Θ(n^3). The space used by the algorithm is Θ(n^2). Observe that we deleted all references to the superscript (k) in the code. It is left as an exercise that this does not affect the correctness of the algorithm. (Hint: The danger is that values may be overwritten and then used later in the same phase. Consider which entries might be overwritten and then reused: they occur in row k and column k. It can be shown that the overwritten values are equal to their original values.)

[Figure: the example digraph after each phase, with the matrices below (∞ = infinity).]

    d(0) =   0   8   ∞   1
             ∞   0   1   ∞
             4   ∞   0   ∞
             ∞   2   9   0

    d(1) =   0   8   ∞   1
             ∞   0   1   ∞
             4  12   0   5
             ∞   2   9   0

    d(2) =   0   8   9   1
             ∞   0   1   ∞
             4  12   0   5
             ∞   2   3   0

    d(3) =   0   8   9   1
             5   0   1   6
             4  12   0   5
             7   2   3   0

    d(4) =   0   3   4   1
             5   0   1   6
             4   7   0   5
             7   2   3   0

Fig. 42: Floyd-Warshall Example. Newly updated entries are circled.

Lecture 16: NP-Completeness: Languages and NP

Read: Chapt 34 in CLRS, up through section 34.2.

Complexity Theory: At this point of the semester we have been building up your "bag of tricks" for solving algorithmic problems. Hopefully when presented with a problem you now have a little better idea of how to go about solving the problem: what sort of design paradigm should be used (divide-and-conquer, greedy, dynamic programming), what sort of data structures might be relevant (trees, heaps, graphs), what representations would be best (adjacency list, adjacency matrices), and what the running time of your algorithm is.

All of this is fine if it helps you discover an acceptably efficient algorithm to solve your problem. The question that often arises in practice is that you have tried every trick in the book, and nothing seems to work. Although your algorithm can solve small problems reasonably efficiently (e.g. n ≤ 20), on the really large applications that you want to solve (e.g. n = 1,000 or n = 10,000) your algorithm never terminates. When you analyze its running time, you realize that it is running in exponential time, perhaps $n^{\sqrt{n}}$, or $2^n$, or $2^{(2^n)}$, or n!, or worse!

Near the end of the 60's there was great success in finding efficient solutions to many combinatorial problems, but there was also a growing list of problems for which there seemed to be no known efficient algorithmic solutions. People began to wonder whether there was some unknown paradigm that would lead to a solution to these problems, or perhaps some proof that these problems are inherently hard to solve and no algorithmic solutions exist that run under exponential time.

Near the end of the 60's a remarkable discovery was made. Many of these hard problems were interrelated in the sense that if you could solve any one of them in polynomial time, then you could solve all of them in polynomial time. This discovery gave rise to the notion of NP-completeness, and created possibly the biggest open problem in computer science: is P = NP? We will be studying this concept over the next few lectures.

This area is a radical departure from what we have been doing because the emphasis will change. The goal is no longer to prove that a problem can be solved efficiently by presenting an algorithm for it. Instead we will be trying to show that a problem cannot be solved efficiently. The question is how to do this?

Laying down the rules: We need some way to separate the class of efficiently solvable problems from inefficiently solvable problems. We will do this by considering problems that can be solved in polynomial time.

When designing algorithms it has been possible for us to be rather informal with various concepts. We have made use of the fact that an intelligent programmer could fill in any missing details. However, the task of proving that something cannot be done efficiently must be handled much more carefully, since we do not want to leave any "loopholes" that would allow someone to subvert the rules in an unreasonable way and claim to have an efficient solution when one does not really exist.

We have measured the running time of algorithms using worst-case complexity, as a function of n, the size of the input. We have defined input size variously for different problems, but the bottom line is the number of bits (or bytes) that it takes to represent the input using any reasonably efficient encoding. By a reasonably efficient encoding, we assume that there is not some significantly shorter way of providing the same information. For example, you could write numbers in unary notation (writing 8 as 11111111 rather than as 1000 in binary), but that would be unacceptably inefficient. You could describe graphs in some highly inefficient way, such as by listing all of its cycles, but this would also be unacceptable. We will assume that numbers are expressed in binary or some higher base and graphs are expressed using either adjacency matrices or adjacency lists.

We will usually restrict numeric inputs to be integers (as opposed to calling them "reals"), so that it is clear that arithmetic can be performed efficiently. We have also assumed that operations on numbers can be performed in constant time. From now on, we should be more careful and assume that arithmetic operations require at least as much time as there are bits of precision in the numbers being stored.

Up until now all the algorithms we have seen have had the property that their worst-case running times are bounded above by some polynomial in the input size, n. A polynomial time algorithm is any algorithm that runs in time $O(n^k)$ where k is some constant that is independent of n. A problem is said to be solvable in polynomial time if there is a polynomial time algorithm that solves it.

Some functions that do not "look" like polynomials (such as O(n log n)) are bounded above by polynomials (such as $O(n^2)$). Some functions that do "look" like polynomials are not. For example, suppose you have an algorithm which inputs a graph of size n and an integer k and runs in $O(n^k)$ time. Is this a polynomial? No, because k is an input to the problem, so the user is allowed to choose k = n, implying that the running time would be $O(n^n)$, which is not a polynomial in n. The important thing is that the exponent must be a constant independent of n.

Of course, saying that all polynomial time algorithms are "efficient" is untrue. An algorithm whose running time is $O(n^{1000})$ is certainly pretty inefficient. Nonetheless, if an algorithm runs in worse than polynomial time (e.g. $2^n$), then it is certainly not efficient, except for very small values of n.
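To make the earlier remark about encodings concrete, here is a small Python sketch comparing the length of a number written in binary with its length in unary. The function names are hypothetical, chosen only for this illustration.

```python
def binary_size(x):
    """Number of bits needed to write the positive integer x in binary."""
    return max(1, x.bit_length())

def unary_size(x):
    """Number of symbols needed to write x in unary (x copies of '1')."""
    return x

for x in (8, 1000, 10**6):
    print(x, binary_size(x), unary_size(x))
# prints:
# 8 4 8
# 1000 10 1000
# 1000000 20 1000000
```

The unary length grows exponentially in the binary length, which is why an algorithm whose running time is polynomial in the value of a numeric input need not be polynomial in the input size as defined above.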

Decision Problems: Many of the problems that we have discussed involve optimization of one form or another: find the shortest path, find the minimum cost spanning tree, find the minimum weight triangulation. For rather technical reasons, most NP-complete problems that we will discuss will be phrased as decision problems. A problem is called a decision problem if its output is a simple "yes" or "no" (or you may think of this as True/False, 0/1, accept/reject).

We will phrase many optimization problems in terms of decision problems. For example, the minimum spanning tree decision problem might be: Given a weighted graph G and an integer k, does G have a spanning tree whose weight is at most k?

This may seem like a less interesting formulation of the problem. It does not ask for the weight of the minimum spanning tree, and it does not even ask for the edges of the spanning tree that achieves this weight. However, our job will be to show that certain problems cannot be solved efficiently. If we show that the simple decision problem cannot be solved efficiently, then the more general optimization problem certainly cannot be solved efficiently either.

Language Recognition Problems: Observe that a decision problem can also be thought of as a language recognition problem. We could define a language L

    L = {(G, k) | G has a MST of weight at most k}.

This set consists of pairs: the first element is a graph (e.g. the adjacency matrix encoded as a string) followed by an integer k encoded as a binary number. At first it may seem strange expressing a graph as a string, but obviously anything that is represented in a computer is broken down somehow into a string of bits.

When presented with an input string (G, k), the algorithm would answer "yes" if (G, k) ∈ L, implying that G has a spanning tree of weight at most k, and "no" otherwise. In the first case we say that the algorithm "accepts" the input and otherwise it "rejects" the input.

Given any language, we can ask the question of how hard it is to determine whether a given string is in the language. For example, in the case of the MST language L, we can determine membership easily in polynomial time. We just store the graph internally, run Kruskal's algorithm, and see whether the final optimal weight is at most k. If so we accept, and otherwise we reject.

Definition: Define P to be the set of all languages for which membership can be tested in polynomial time. (Intuitively, this corresponds to the set of all decision problems that can be solved in polynomial time.)

Note that languages are sets of strings, and P is a set of languages. P is defined in terms of how hard it is computationally to recognize membership in the language. A set of languages that is defined in terms of how hard it is to determine membership is called a complexity class. Since we can compute minimum spanning trees in polynomial time, we have L ∈ P.

Here is a harder one, though.

    M = {(G, k) | G has a simple path of length at least k}.

Given a graph G and integer k how would you "recognize" whether it is in the language M? You might try searching the graph for simple paths, until finding one of length at least k. If you find one then you can accept and terminate. However, if not then you may spend a lot of time searching (especially if k is large, like n − 1, and no such path exists). So is M ∈ P? No one knows the answer. In fact, we will show that M is NP-complete.

In what follows, we will be introducing a number of classes. We will jump back and forth between the terms "language" and "decision problems", but for our purposes they mean the same things. Before giving all the technical definitions, let us say a bit about what the general classes look like at an intuitive level.

P: This is the set of all decision problems that can be solved in polynomial time. We will generally refer to these problems as being "easy" or "efficiently solvable". (Although this may be an exaggeration in many cases.)

one which is Hamiltonian and one which is not. but may be discussed in an advanced course on complexity theory. NP-complete: A problem is NP-complete if (1) it is in NP. Many language recognition problems that may be very hard to solve. Given an undirected graph G. then we could solve all NP problems in polynomial time. This problem is beyond the scope of this course. which stands for Quantiﬁed Boolean Formulas. does G have a cycle that visits every vertex exactly once. v7 . The Hamiltonian cycle problem seems to be much harder. but it is not known to be in P. the ﬁgure below shows two graphs. . It is known that this problem is in NP. v13 ”. The other is QBF. it does not have to be in the class NP. Thus. But it is bit more intuitive to explain the concept from the perspective of veriﬁcation. NP-hard. We could then inspect the Lecture Notes 59 CMSC 451 . but they have the property that it is easy to verify whether a string is in the language. v1 . Cycle Knapsack Hamiltonian Cycle Satisfiability Graph Isomorphism? MST Strong connectivity Harder NPC NP P Easy One way that things ‘might’ be. NP-hard: In spite of its name. There are some problems in the ﬁgure that we will not discuss. That is. it contains a number of easy problems. Since it is widely believed that all NP problems are not solvable in polynomial time. and NP-complete (NPC) might look. (There is a similar problem on directed graphs. Consider the following problem. it is important to introduce the notion of a veriﬁcation algorithm. Then it would be a very easy matter for someone to convince us of this. but it also contains a number of problems that are believed to be very hard to solve. However. to say that problem is NP-hard does not mean that it is hard to solve. Fig. NP. and there is also a version which asks whether there is a path that visits all vertices. . (We will give a deﬁnition of this below. QBF NP−Hard No Ham. For example. NP. and there is no known polynomial time algorithm for this problem. . 
They would simply say “the cycle is v3 .) This class contains P as a subset. and related complexity classes. NPC = NP∩NP-hard. suppose that a graph did have a Hamiltonian cycle. which asks whether two graphs are identical up to a renaming of their vertices. The ﬁgure below illustrates one way that the sets P.) We can describe this problem as a language recognition problem. Polynomial Time Veriﬁcation and Certiﬁcates: Before talking about the class of NP-complete problems. 43: The (possible) structure of P. . called the Hamiltonian cycle problem. We say might because we do not know whether all of these complexity classes are distinct or whether they are all solvable in polynomial time. In this problem you are given a boolean formula with quantiﬁers (∃ and ∀) and you want to know whether the formula is true or false. One is Graph Isomorphism.NP: This is the set of all decision problems that can be veriﬁed in polynomial time. where (G) denotes an encoding of a graph G as a string. it is widely believed that no NP-hard problem is solvable in polynomial time. Rather it means that if we could solve this problem in polynomial time. Originally the term meant “nondeterministic polynomial time”. Note that for a problem to be NP hard. where the language is HC = {(G) | G has a Hamiltonian cycle}. The term NP does not mean “not polynomial”. and (2) it is NP-hard.

Fig. 44: Hamiltonian cycle (one nonhamiltonian graph and one Hamiltonian graph).

We could then inspect the graph, and check that this is indeed a legal cycle and that it visits all the vertices of the graph exactly once. Thus, even though we know of no efficient way to solve the Hamiltonian cycle problem, there is a very efficient way to verify that a given graph is in HC. The given cycle is called a certificate. This is some piece of information which allows us to verify that a given string is in a language.

More formally, given a language L, and given x ∈ L, a verification algorithm is an algorithm which, given x and a string y called the certificate, can verify that x is in the language L using this certificate as help. If x is not in L then there is nothing to verify.

Note that not all languages have the property that they are easy to verify. For example, consider the following languages:

UHC = {(G) | G has a unique Hamiltonian cycle}
H̄C̄ = {(G) | G has no Hamiltonian cycle}.

Suppose that a graph G is in the language UHC. What information would someone give us that would allow us to verify that G is indeed in the language? They could give us an example of the unique Hamiltonian cycle, and we could verify that it is a Hamiltonian cycle, but what sort of certificate could they give us to convince us that this is the only one? They could give another cycle that is NOT Hamiltonian, but this does not mean that there is not another cycle somewhere that is Hamiltonian. They could try to list every other cycle of length n, but this would not be at all efficient, since there are n! possible cycles in general. Thus, it is hard to imagine that someone could give us some information that would allow us to efficiently convince ourselves that a given graph is in the language.

The class NP:

Definition: Define NP to be the set of all languages that can be verified by a polynomial time algorithm.

Why is the set called "NP" rather than "VP"? The original term NP stood for "nondeterministic polynomial time". This referred to a program running on a nondeterministic computer that can make guesses. Basically, such a computer could nondeterministically guess the value of the certificate, and then verify that the string is in the language in polynomial time. We have avoided introducing nondeterminism here. It would be covered in a course on complexity theory or formal language theory.

Like P, NP is a set of languages based on some complexity measure (the complexity of verification). Observe that P ⊆ NP. In other words, if we can solve a problem in polynomial time, then we can certainly verify membership in polynomial time. (More formally, we do not even need to see a certificate to solve the problem, we can solve it in polynomial time anyway).

However it is not known whether P = NP. It seems unreasonable to think that this should be so. In other words, just being able to verify that you have a correct solution does not help you in finding the actual solution very much. Most experts believe that P ≠ NP, but no one has a proof of this. Next time we will define the notions of NP-hard and NP-complete.
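The certificate check for HC described above can be sketched in a few lines. This is only an illustration under stated assumptions: the graph is given as an n × n 0/1 adjacency matrix over vertices 0..n−1 with n ≥ 3, and the function name `is_hamiltonian_cycle` is hypothetical, not something from the text.

```python
def is_hamiltonian_cycle(adj, cycle):
    """Verify a claimed Hamiltonian cycle (the certificate) in O(n log n + n) time.

    adj   -- n x n 0/1 adjacency matrix (assumed symmetric, n >= 3)
    cycle -- claimed ordering of the vertices along the cycle
    """
    n = len(adj)
    # The certificate must list every vertex exactly once.
    if sorted(cycle) != list(range(n)):
        return False
    # Consecutive vertices, wrapping around at the end, must be adjacent.
    return all(adj[cycle[i]][cycle[(i + 1) % n]] for i in range(n))


# A 4-cycle 0-1-2-3-0, used only as a small example.
square = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
```

Note that the verifier never searches for a cycle; it only checks the one it is handed, which is why it runs in polynomial time even though no polynomial time algorithm for finding such a cycle is known.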

Lecture 17: NP-Completeness: Reductions

Read: Chapt 34, through Section 34.4.

Summary: Last time we introduced a number of concepts, on the way to defining NP-completeness. In particular, the following concepts are important.

Decision Problems: are problems for which the answer is either yes or no. NP-complete problems are expressed as decision problems, and hence can be thought of as language recognition problems, assuming that the input has been encoded as a string. We encode inputs as strings. For example:

HC = {G | G has a Hamiltonian cycle}
MST = {(G, x) | G has a MST of cost at most x}.

P: is the class of all decision problems which can be solved in polynomial time, O(n^k) for some constant k. For example MST ∈ P but HC is not known (and suspected not) to be in P.

Certificate: is a piece of evidence that allows us to verify in polynomial time that a string is in a given language. For example, suppose that the language is the set of Hamiltonian graphs. To convince someone that a graph is in this language, we could supply the certificate consisting of a sequence of vertices along the cycle. It is easy to access the adjacency matrix to determine that this is a legitimate cycle in G. Therefore HC ∈ NP.

NP: is defined to be the class of all languages that can be verified in polynomial time. Note that since all languages in P can be solved in polynomial time, they can certainly be verified in polynomial time, so we have P ⊆ NP. However, NP also seems to have some pretty hard problems to solve, such as HC.

Reductions: The class of NP-complete problems consists of a set of decision problems (languages) (a subset of the class NP) that no one knows how to solve efficiently, but if there were a polynomial time solution for even a single NP-complete problem, then every problem in NP would be solvable in polynomial time. To establish this, we need to introduce the concept of a reduction.

Before discussing reductions, let us just consider the following question. Suppose that there are two problems, H and U. We know (or you strongly believe at least) that H is hard, that is, it cannot be solved in polynomial time. On the other hand, the complexity of U is unknown, but we suspect that it too is hard. We want to prove that U cannot be solved in polynomial time. How would we do this? We want to show that

(H ∉ P) ⇒ (U ∉ P).

To do this, we could prove the contrapositive,

(U ∈ P) ⇒ (H ∈ P).

In other words, to show that U is not solvable in polynomial time, we will suppose that there is an algorithm that solves U in polynomial time, and then derive a contradiction by showing that H can be solved in polynomial time.

How do we do this? Suppose that we have a subroutine that can solve any instance of problem U in polynomial time. Then all we need to do is to show that we can use this subroutine to solve problem H in polynomial time. Thus we have "reduced" problem H to problem U. It is important to note here that this supposed subroutine is really a fantasy. We know (or strongly believe) that H cannot be solved in polynomial time, thus we are essentially proving that the subroutine cannot exist, implying that U cannot be solved in polynomial time. (Be sure that you understand this; it is the basis behind all reductions.)

Example: 3-Colorability and Clique Cover: Let us consider an example to make this clearer. The following problem is well-known to be NP-complete, and hence it is strongly believed that the problem cannot be solved in polynomial time.

3-coloring (3Col): Given a graph G, can each of its vertices be labeled with one of 3 different "colors", such that no two adjacent vertices have the same label?

Coloring arises in various partitioning problems, where there is a constraint that two objects cannot be assigned to the same set of the partition. The term "coloring" comes from the original application which was in map drawing. Two countries that share a common border should be colored with different colors. It is well known that planar graphs can be colored with 4 colors, and there exists a polynomial time algorithm for this. But determining whether 3 colors are possible (even for planar graphs) seems to be hard and there is no known polynomial time algorithm. In the figure below we give two graphs, one is 3-colorable and one is not.

Fig. 45: 3-coloring and Clique Cover (panels: 3-colorable; not 3-colorable; clique cover of size 3).

The 3Col problem will play the role of the hard problem H, which we strongly suspect to not be solvable in polynomial time. For our unknown problem U, consider the following problem. Given a graph G = (V, E), we say that a subset of vertices V′ ⊆ V forms a clique if for every pair of vertices u, v ∈ V′, (u, v) ∈ E. That is, the subgraph induced by V′ is a complete graph.

Clique Cover (CCov): Given a graph G = (V, E) and an integer k, can we partition the vertex set into k subsets of vertices V1, V2, ..., Vk, such that ∪i Vi = V, and that each Vi is a clique of G?

The clique cover problem arises in applications of clustering. We put an edge between two nodes if they are similar enough to be clustered in the same group. We want to know whether it is possible to cluster all the vertices into k groups.

Suppose that you want to solve the CCov problem, but after a while of fruitless effort, you still cannot find a polynomial time algorithm for the CCov problem. How can you prove that CCov is likely to not have a polynomial time solution? You know that 3Col is NP-complete, and hence experts believe that 3Col ∉ P. You feel that there is some connection between the CCov problem and the 3Col problem. Thus, you want to show that

(3Col ∉ P) ⇒ (CCov ∉ P),

which you will show by proving the contrapositive

(CCov ∈ P) ⇒ (3Col ∈ P).

To do this, you assume that you have access to a subroutine CCov(G, k). Given a graph G and an integer k, this subroutine returns true if G has a clique cover of size k and false otherwise, and furthermore, this subroutine runs in polynomial time. How can we use this "alleged" subroutine to solve the well-known hard 3Col problem? We want to write a polynomial time subroutine for 3Col, and this subroutine is allowed to call the subroutine CCov(G, k) for any graph G and any integer k.

Both problems involve partitioning the vertices up into groups. The only difference here is that in one problem the number of cliques is specified as part of the input and in the other the number of color classes is fixed at 3. In the clique cover problem, for two vertices to be in the same group they must be adjacent to each other. In the 3-coloring problem, for two vertices to be in the same color group, they must not be adjacent. In some sense, the problems are almost the same, but the requirement adjacent/non-adjacent is exactly reversed.

We claim that we can reduce the 3-coloring problem to the clique cover problem as follows. Given a graph G for which we want to determine its 3-colorability, output the pair (Ḡ, 3) where Ḡ denotes the complement of G. (That is, Ḡ is a graph on the same vertices, but (u, v) is an edge of Ḡ if and only if it is not an edge of G.) We can then feed the pair (Ḡ, 3) into a subroutine for clique cover. This is illustrated in the figure below.

Fig. 46: Clique covers in the complement (G is 3-colorable and Ḡ is coverable by 3 cliques; H is not 3-colorable and H̄ is not coverable).

Claim: A graph G is 3-colorable if and only if its complement Ḡ has a clique-cover of size 3. In other words, G ∈ 3Col iff (Ḡ, 3) ∈ CCov.

Proof: (⇒) If G is 3-colorable, then let V1, V2, V3 be the three color classes. We claim that this is a clique cover of size 3 for Ḡ, since if u and v are distinct vertices in Vi, then {u, v} ∉ E(G) (since adjacent vertices cannot have the same color) which implies that {u, v} ∈ E(Ḡ). Thus every pair of distinct vertices in Vi are adjacent in Ḡ.

(⇐) Suppose Ḡ has a clique cover of size 3, denoted V1, V2, V3. For i ∈ {1, 2, 3} give the vertices of Vi color i. We assert that this is a legal coloring for G, since if distinct vertices u and v are both in Vi, then {u, v} ∈ E(Ḡ) (since they are in a common clique), implying that {u, v} ∉ E(G). Hence, two vertices with the same color are not adjacent.

Polynomial-time reduction: We now take this intuition of reducing one problem to another through the use of a subroutine call, and place it on more formal footing. Notice that in the example above, we converted an instance of the 3-coloring problem (G) into an equivalent instance of the Clique Cover problem (Ḡ, 3).

Definition: We say that a language (i.e. decision problem) L1 is polynomial-time reducible to language L2 (written L1 ≤P L2) if there is a polynomial time computable function f, such that for all x, x ∈ L1 if and only if f(x) ∈ L2.

In the previous example we showed that 3Col ≤P CCov. In particular we have f(G) = (Ḡ, 3). Note that it is easy to complement a graph in O(n²) (i.e. polynomial) time (e.g. flip the off-diagonal 0's and 1's in the adjacency matrix). Thus f is computable in polynomial time.

Intuitively, saying that L1 ≤P L2 means that "if L2 is solvable in polynomial time, then so is L1." This is because a polynomial time subroutine for L2 could be applied to f(x) to determine whether f(x) ∈ L2, or equivalently whether x ∈ L1. Thus, in the sense of polynomial time computability, L1 is "no harder" than L2.

The way in which this is used in NP-completeness is exactly the contrapositive. We usually have strong evidence that L1 is not solvable in polynomial time, and hence the reduction is effectively equivalent to saying "since L1 is not likely to be solvable in polynomial time, then L2 is also not likely to be solvable in polynomial time." Thus, this is how polynomial time reductions can be used to show that problems are as hard to solve as known difficult problems.

Lemma: If L1 ≤P L2 and L2 ∈ P then L1 ∈ P.
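The reduction f(G) = (Ḡ, 3) can be sketched as follows. This is a minimal illustration, not a polynomial time algorithm for either problem: `has_clique_cover` is a hypothetical brute-force stand-in for the alleged CCov(G, k) subroutine (it takes exponential time and allows some of the k groups to be empty), and graphs are assumed to be given as 0/1 adjacency matrices.

```python
from itertools import product


def complement(adj):
    """Complement graph: flip every off-diagonal entry of the adjacency matrix."""
    n = len(adj)
    return [[1 if i != j and not adj[i][j] else 0 for j in range(n)]
            for i in range(n)]


def has_clique_cover(adj, k):
    """Brute-force stand-in for CCov(G, k); tries every labeling of vertices by 0..k-1."""
    n = len(adj)
    for labels in product(range(k), repeat=n):
        # Every two vertices in the same group must be adjacent.
        if all(adj[u][v]
               for u in range(n) for v in range(u + 1, n)
               if labels[u] == labels[v]):
            return True
    return False


def is_3_colorable(adj):
    """3Col solved through one call to the clique cover subroutine on the complement."""
    return has_clique_cover(complement(adj), 3)


def complete_graph(n):
    return [[1 if i != j else 0 for j in range(n)] for i in range(n)]


def cycle_graph(n):
    return [[1 if (i - j) % n in (1, n - 1) else 0 for j in range(n)]
            for i in range(n)]
```

The only polynomial time work done by the reduction itself is `complement`; all of the hard work is delegated to the subroutine, which is exactly the point of the argument above.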

Lemma: If L1 ≤P L2 and L1 ∉ P then L2 ∉ P.

One important fact about reducibility is that it is transitive. In other words:

Lemma: If L1 ≤P L2 and L2 ≤P L3 then L1 ≤P L3.

The reason is that if two functions f(x) and g(x) are computable in polynomial time, then their composition f(g(x)) is computable in polynomial time as well. It should be noted that our text uses the term "reduction" where most other books use the term "transformation". The distinction is subtle, but people taking other courses in complexity theory should be aware of this.

NP-completeness: The set of NP-complete problems are all problems in the complexity class NP, for which it is known that if any one is solvable in polynomial time, then they all are, and conversely, if any one is not solvable in polynomial time, then none are. This is made mathematically formal using the notion of polynomial time reductions.

Definition: A language L is NP-hard if: L′ ≤P L for all L′ ∈ NP. (Note that L does not need to be in NP.)

Definition: A language L is NP-complete if: (1) L ∈ NP and (2) L is NP-hard.

An alternative (and usually easier) way to show that a problem is NP-complete is to use transitivity.

Lemma: L is NP-complete if (1) L ∈ NP and (2) L′ ≤P L for some known NP-complete language L′.

The reason is that all L′′ ∈ NP are reducible to L′ (since L′ is NP-complete and hence NP-hard) and hence by transitivity L′′ is reducible to L, implying that L is NP-hard.

This gives us a way to prove that problems are NP-complete, once we know that one problem is NP-complete. Unfortunately, it appears to be almost impossible to prove that one problem is NP-complete, because the definition says that we have to be able to reduce every problem in NP to this problem. There are infinitely many such problems, so how can we ever hope to do this? We will talk about this next time with Cook's theorem. Cook showed that there is one problem called SAT (short for boolean satisfiability) that is NP-complete. To prove a second problem is NP-complete, all we need to do is to show that our problem is in NP (and hence it is reducible to SAT), and then to show that we can reduce SAT (or generally some known NPC problem) to our problem. It follows that our problem is equivalent to SAT (with respect to solvability in polynomial time). This is illustrated in the figure below.

Lecture 18: Cook's Theorem, 3SAT, and Independent Set

Read: Chapter 34, through 34.5. The reduction given here is similar, but not the same as the reduction given in the text.

Recap: So far we introduced the definitions of NP-completeness. Recall that we mentioned the following topics:

P: is the set of decision problems (or languages) that are solvable in polynomial time.

NP: is the set of decision problems (or languages) that can be verified in polynomial time.

Fig. 47: Structure of NPC and reductions (panels: proving a problem is in NP; proving a problem is NP-hard, by reducing a known NPC problem such as SAT to your problem; resulting structure of P, NP and NPC).

Polynomial reduction: L1 ≤P L2 means that there is a polynomial time computable function f such that x ∈ L1 if and only if f(x) ∈ L2. A more intuitive way to think about this is that if we had a subroutine to solve L2 in polynomial time, then we could use it to solve L1 in polynomial time. Polynomial reductions are transitive, that is, if L1 ≤P L2 and L2 ≤P L3, then L1 ≤P L3.

NP-Hard: L is NP-hard if for all L′ ∈ NP, L′ ≤P L. Thus, if we could solve L in polynomial time, we could solve all NP problems in polynomial time.

NP-Complete: L is NP-complete if (1) L ∈ NP and (2) L is NP-hard.

The importance of NP-complete problems should now be clear. If any NP-complete problem (and generally any NP-hard problem) is solvable in polynomial time, then every NP-complete problem (and in fact every problem in NP) is also solvable in polynomial time. Conversely, if we can prove that any NP-complete problem (and generally any problem in NP) cannot be solved in polynomial time, then every NP-complete problem (and generally every NP-hard problem) cannot be solved in polynomial time. Thus all NP-complete problems are equivalent to one another (in that they are either all solvable in polynomial time, or none are).

An alternative way to show that a problem is NP-complete is to use transitivity of ≤P.

Lemma: L is NP-complete if (1) L ∈ NP and (2) L′ ≤P L for some NP-complete language L′.

Note: The known NP-complete problem L′ is reduced to the candidate NP-complete problem L. Keep this order in mind.

Cook's Theorem: Unfortunately, to use this lemma, we need to have at least one NP-complete problem to start the ball rolling. Stephen Cook showed that such a problem existed. Cook's theorem is quite complicated to prove, but we'll try to give a brief intuitive argument as to why such a problem might exist.

For a problem to be in NP, it must have an efficient verification procedure. Thus virtually all NP problems can be stated in the form, "does there exist X such that P(X)", where X is some structure (e.g. a set, a path, a partition, an assignment, etc.) and P(X) is some property that X must satisfy (e.g. the set of objects must fill the knapsack, or the path must visit every vertex, or you may use at most k colors and no two adjacent vertices can have the same color). In showing that such a problem is in NP, the certificate consists of giving X, and the verification involves testing that P(X) holds.

In general, any set X can be described by choosing a set of objects, which in turn can be described as choosing the values of some boolean variables. Similarly, the property P(X) that you need to satisfy can be described as a boolean formula. Stephen Cook was looking for the most general possible property he could, since this should represent the hardest problem in NP to solve. He reasoned that computers (which represent the most general

type of computational devices known) could be described entirely in terms of boolean circuits, and hence in terms of boolean formulas. If any problem were hard to solve, it would be one in which X is an assignment of boolean values (true/false, 0/1) and P(X) could be any boolean formula. This suggests the following problem, called the boolean satisfiability problem.

SAT: Given a boolean formula, is there some way to assign truth values (0/1, true/false) to the variables of the formula, so that the formula evaluates to true?

A boolean formula is a logical formula which consists of variables xi, and the logical operations x̄ meaning the negation of x, boolean-or (x ∨ y) and boolean-and (x ∧ y). Given a boolean formula, we say that it is satisfiable if there is a way to assign truth values (0 or 1) to the variables such that the final result is 1. (As opposed to the case where no matter how you assign truth values the result is always 0.) For example,

(x1 ∧ (x2 ∨ x̄3)) ∧ ((x̄2 ∧ x̄3) ∨ x̄1)

is satisfiable, by the assignment x1 = 1, x2 = 0, x3 = 0. On the other hand,

(x1 ∨ (x2 ∧ x3)) ∧ (x̄1 ∨ (x̄2 ∧ x̄3)) ∧ (x2 ∨ x3) ∧ (x̄2 ∨ x̄3)

is not satisfiable. (Observe that the last two clauses imply that one of x2 and x3 must be true and the other must be false. This implies that neither of the subclauses involving x2 and x3 in the first two clauses can be satisfied, but x1 cannot be set to satisfy them either.)

Cook's Theorem: SAT is NP-complete.

We will not prove this theorem. The proof would take about a full lecture (not counting the week or so of background on Turing machines). In fact, it turns out that an even more restricted version of the satisfiability problem is NP-complete. A literal is a variable or its negation, x or x̄. A formula is in 3-conjunctive normal form (3-CNF) if it is the boolean-and of clauses where each clause is the boolean-or of exactly 3 literals. For example (x1 ∨ x2 ∨ x3 ) ∧ (x1 ∨ x3 ∨ x4 ) ∧ (x2 ∨ x3 ∨ x4 ) is in 3-CNF.
3SAT is the problem of determining whether a formula in 3-CNF is satisfiable. It turns out that it is possible to modify the proof of Cook's theorem to show that the more restricted 3SAT is also NP-complete. As an aside, note that if we replace the 3 in 3SAT with a 2, then everything changes. If a boolean formula is given in 2-CNF, then it is possible to determine its satisfiability in polynomial time. Thus, even a seemingly small change can be the difference between an efficient algorithm and none.

NP-completeness proofs: Now that we know that 3SAT is NP-complete, we can use this fact to prove that other problems are NP-complete. We will start with the independent set problem.

Independent Set (IS): Given an undirected graph G = (V, E) and an integer k, does G contain a subset V′ of k vertices such that no two vertices in V′ are adjacent to one another?

For example, the graph shown in the figure below has an independent set (shown with shaded nodes) of size 4. The independent set problem arises when there is some sort of selection problem, but there are mutual restrictions on pairs that cannot both be selected. (For example, you want to invite as many of your friends as possible to your party, but many pairs do not get along, represented by edges between them, and you do not want to invite two enemies.)

Note that if a graph has an independent set of size k, then it has an independent set of all smaller sizes. So the corresponding optimization problem would be to find an independent set of the largest size in a graph. Often the vertices have weights, so we might talk about the problem of computing the independent set with the largest total weight. However, since we want to show that the problem is hard to solve, we will consider the simplest version of the problem.

Fig. 48: Independent Set.

Claim: IS is NP-complete.

The proof involves two parts. First, we need to show that IS ∈ NP. The certificate consists of the k vertices of V′. We simply verify that for each pair of vertices u, v ∈ V′, there is no edge between them. Clearly this can be done in polynomial time, by an inspection of the adjacency matrix.
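The membership check for IS ∈ NP can be sketched directly from this description. The function name `verify_independent_set` and the adjacency-matrix input are assumptions made for the illustration.

```python
from itertools import combinations


def verify_independent_set(adj, k, certificate):
    """Check that `certificate` lists k distinct, pairwise non-adjacent vertices.

    adj is an n x n 0/1 adjacency matrix; the check inspects O(k^2) entries.
    """
    chosen = set(certificate)
    # The certificate must name exactly k distinct vertices of the graph.
    if len(chosen) != k or len(certificate) != k:
        return False
    if any(v < 0 or v >= len(adj) for v in chosen):
        return False
    # No two chosen vertices may share an edge.
    return all(not adj[u][v] for u, v in combinations(chosen, 2))


# The path 0 - 1 - 2, used only as a small example.
path3 = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
```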

[Figure: a polynomial time computable function f maps a boolean formula F (in 3-CNF) to a graph and integer (G, k); the yes/no answer of IS on (G, k) is the answer of 3SAT on F.]

Fig. 49: Reduction of 3-SAT to IS.

Secondly, we need to establish that IS is NP-hard, which can be done by showing that some known NP-complete problem (3SAT) is polynomial-time reducible to IS, that is, 3SAT ≤P IS. Let F be a boolean formula in 3-CNF (the boolean-and of clauses, each of which is the boolean-or of 3 literals). We wish to find a polynomial time computable function f that maps F into an input for the IS problem, a graph G and integer k. That is, f(F) = (G, k), such that F is satisfiable if and only if G has an independent set of size k. This will mean that if we can solve the independent set problem for G and k in polynomial time, then we would be able to solve 3SAT in polynomial time.

An important aspect of reductions is that we do not attempt to solve the satisfiability problem. (Remember: It is NP-complete, and there is not likely to be any polynomial time solution.) So the function f must operate without knowledge of whether F is satisfiable. The idea is to translate the similar elements of the satisfiability problem to corresponding elements of the independent set problem.

What is to be selected?
3SAT: Which variables are assigned to be true. Equivalently, which literals are assigned true.
IS: Which vertices are to be placed in V′.

Requirements:
3SAT: Each clause must contain at least one literal whose value is true.
IS: V′ must contain at least k vertices.

Restrictions:
3SAT: If xi is assigned true, then x̄i must be false, and vice versa.

IS: If u is selected to be in V′, and v is a neighbor of u, then v cannot be in V′.

We want a function f, which given any 3-CNF boolean formula F, converts it into a pair (G, k) such that the above elements are translated properly. Our strategy will be to create one vertex for each literal that appears within each clause. (Thus, if there are m clauses in F, there will be 3m vertices in G.) The vertices are grouped into clause clusters, one for each clause. Selecting a true literal from some clause corresponds to selecting a vertex to add to V′. We set k to the number of clauses. This forces the independent set to pick one vertex from each clause, and thus one literal from each clause is true. In order to keep the IS subroutine from selecting two literals from some clause (and hence none from some other), we will connect all the vertices in each clause cluster to each other. To keep the IS subroutine from selecting both a literal and its complement, we will put an edge between each literal and its complement. This enforces the condition that if a literal is put in the IS (set to true) then its complement literal cannot also be true.

A formal description of the reduction is given below. The input is a boolean formula F in 3-CNF, and the output is a graph G and integer k.

3SAT to IS Reduction
    k ← number of clauses in F;
    for each clause C in F
        create a clause cluster of 3 vertices from the literals of C;
    for each clause cluster (x1, x2, x3)
        create an edge (xi, xj) between all pairs of vertices in the cluster;
    for each vertex xi
        create edges between xi and all its complement vertices x̄i;
    return (G, k);
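The reduction above can be sketched in runnable form. Everything about the encoding here is an assumption made for the illustration: a literal is a nonzero integer (+i for xi, −i for its negation), a formula is a list of 3-tuples of literals, and a vertex is a (clause index, position) pair. The two brute-force helpers take exponential time and exist only to check the reduction on tiny inputs; the two example formulas are hypothetical and are not the ones from the text.

```python
from itertools import combinations, product


def reduce_3sat_to_is(clauses):
    """Map a 3-CNF formula to (vertices, edges, k) as in the pseudocode above."""
    vertices = [(c, p) for c, clause in enumerate(clauses)
                for p in range(len(clause))]
    literal = {(c, p): clauses[c][p] for (c, p) in vertices}
    edges = set()
    for u, v in combinations(vertices, 2):
        same_cluster = u[0] == v[0]                 # intra-cluster edge
        complementary = literal[u] == -literal[v]   # xi versus its negation
        if same_cluster or complementary:
            edges.add((u, v))
    return vertices, edges, len(clauses)


def has_independent_set(vertices, edges, k):
    """Brute force: try every k-subset of vertices (exponential time)."""
    return any(all(pair not in edges for pair in combinations(subset, 2))
               for subset in combinations(vertices, k))


def satisfiable(clauses):
    """Brute force: try every truth assignment (exponential time)."""
    n = max(abs(lit) for clause in clauses for lit in clause)
    return any(all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
                   for clause in clauses)
               for bits in product([False, True], repeat=n))


# Hypothetical examples: the first is satisfiable, the second is not
# (it pads the four 2-literal clauses over x1, x2 with a repeated literal).
F_sat = [(1, 2, -3), (-1, 2, 3), (-1, -2, 3)]
F_unsat = [(1, 2, 2), (1, -2, -2), (-1, 2, 2), (-1, -2, -2)]
```

Building the graph takes O(m²) time for m clauses, since there are 3m vertices and each pair is examined once. The reduction itself never calls `satisfiable`; that helper is only used to confirm that both sides agree.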

Given any reasonable encoding of F , it is an easy programming exercise to create G (say as an adjacency matrix) in polynomial time. We claim that F is satisﬁable if and only if G has an independent set of size k. Example: Suppose that we are given the 3-CNF formula: (x1 ∨ x2 ∨ x3 ) ∧ (x1 ∨ x2 ∨ x3 ) ∧ (x1 ∨ x2 ∨ x3 ) ∧ (x1 ∨ x2 ∨ x3 ). The reduction produces the graph shown in the following ﬁgure and sets k = 4.

[Figure panels: "The reduction" and "Correctness (x1 = x2 = 1, x3 = 0)".]

Fig. 50: 3SAT to IS Reduction for (x1 ∨ x2 ∨ x3 ) ∧ (x1 ∨ x2 ∨ x3 ) ∧ (x1 ∨ x2 ∨ x3 ) ∧ (x1 ∨ x2 ∨ x3 ).

In our example, the formula is satisfied by the assignment x1 = 1, x2 = 1, and x3 = 0. Note that the literal x1 satisfies the first and last clauses, x2 satisfies the second, and x̄3 satisfies the third. Observe that by selecting the corresponding vertices from the clusters, we get an independent set of size k = 4.

Correctness Proof: We claim that F is satisfiable if and only if G has an independent set of size k. If F is satisfiable, then each of the k clauses of F must have at least one true literal. Let V′ denote the corresponding vertices from each of the clause clusters (one from each cluster). Because we take vertices from each cluster, there are no intra-cluster edges between them, and because we cannot set a variable and its complement to both be true, there can be no edge of the form (xi, x̄i) between the vertices of V′. Thus, V′ is an independent set of size k.

Conversely, suppose G has an independent set V′ of size k. First observe that we must select a vertex from each clause cluster, because there are k clusters, and we cannot take two vertices from the same cluster (because they are all interconnected). Consider the assignment in which we set all of these literals to 1. This assignment is logically consistent, because V′ cannot contain two vertices labeled xi and x̄i (they are joined by an edge). Since every clause contains one of these literals, the assignment satisfies F.

Finally the transformation clearly runs in polynomial time. This completes the NP-completeness proof.

Observe that our reduction did not attempt to solve the IS problem nor to solve the 3SAT problem. Also observe that the reduction had no knowledge of the solution to either problem. (We did not assume that the formula was satisfiable, nor did we assume we knew which variables to set to 1.) This is because computing these things would require exponential time (by the best known algorithms). Instead the reduction simply translated the input from one problem into an equivalent input to the other problem, while preserving the critical elements of each problem.

Lecture 19: Clique, Vertex Cover, and Dominating Set

Read: Chapt 34 (up through 34.5). The dominating set proof is not given in our text.

Recap: Last time we gave a reduction from 3SAT (satisfiability of boolean formulas in 3-CNF) to IS (independent set in graphs). Today we give a few more examples of reductions. Recall that to show that a problem is NP-complete we need to show (1) that the problem is in NP (i.e. we can verify when an input is in the language), and (2) that the problem is NP-hard, by showing that some known NP-complete problem can be reduced to this problem (there is a polynomial time function that transforms an input for one problem into an equivalent input for the other problem).

Some Easy Reductions: We consider some closely related NP-complete problems next.

Clique (CLIQUE): The clique problem is: given an undirected graph G = (V, E) and an integer k, does G have a subset V′ of k vertices such that for each distinct u, v ∈ V′, {u, v} ∈ E? In other words, does G have a k vertex subset whose induced subgraph is complete?

Vertex Cover (VC): A vertex cover in an undirected graph G = (V, E) is a subset of vertices V′ ⊆ V such that every edge in G has at least one endpoint in V′. The vertex cover problem (VC) is: given an undirected graph G and an integer k, does G have a vertex cover of size k?

Dominating Set (DS): A dominating set in a graph G = (V, E) is a subset of vertices V′ such that every vertex in the graph is either in V′ or is adjacent to some vertex in V′. The dominating set problem (DS) is: given a graph G = (V, E) and an integer k, does G have a dominating set of size k?

Don't confuse the clique (CLIQUE) problem with the clique-cover (CCov) problem that we discussed in an earlier lecture. The clique problem seeks to find a single clique of size k, and the clique-cover problem seeks to partition the vertices into k groups, each of which is a clique.

We have discussed the fact that cliques are of interest in applications dealing with clustering. The vertex cover problem arises in various servicing applications. For example, you have a computer network and a program that checks the integrity of the communication links. To save the space of installing the program on every computer in the network, it suffices to install it on all the computers forming a vertex cover. From these nodes all the links can be tested. Dominating set is useful in facility location problems. For example, suppose we want to select where to place a set of fire stations such that every house in the city is within 2 minutes of the nearest

fire station. We create a graph in which two locations are adjacent if they are within 2 minutes of each other. A minimum sized dominating set will be a minimum set of locations such that every other location is reachable within 2 minutes from one of these sites.

The CLIQUE problem is obviously closely related to the independent set problem (IS): Given a graph G does it have a k vertex subset that is completely disconnected. It is not quite as clear that the vertex cover problem is related. However, the following lemma makes this connection clear as well.

Fig. 51: Clique, Independent set, and Vertex Cover (V′ is a CLIQUE of size k in Ḡ iff V′ is an IS of size k in G iff V − V′ is a VC of size n − k in G).

Lemma: Given an undirected graph G = (V, E) with n vertices and a subset V′ ⊆ V of size k. The following are equivalent:
(i) V′ is a clique of size k for the complement, Ḡ.
(ii) V′ is an independent set of size k for G.
(iii) V − V′ is a vertex cover of size n − k for G.

Proof:
(i) ⇒ (ii): If V′ is a clique for Ḡ, then for each u, v ∈ V′, {u, v} is an edge of Ḡ implying that {u, v} is not an edge of G, implying that V′ is an independent set for G.
(ii) ⇒ (iii): If V′ is an independent set for G, then for each u, v ∈ V′, {u, v} is not an edge of G, implying that every edge in G is incident to a vertex in V − V′, implying that V − V′ is a VC for G.
(iii) ⇒ (i): If V − V′ is a vertex cover for G, then for any u, v ∈ V′ there is no edge {u, v} in G, implying that there is an edge {u, v} in Ḡ, implying that V′ is a clique in Ḡ.

Thus, if we had an algorithm for solving any one of these problems, we could easily translate it into an algorithm for the others. In particular, we have the following.

Theorem: CLIQUE is NP-complete.

CLIQUE ∈ NP: The certificate consists of the k vertices in the clique. Given such a certificate we can easily verify in polynomial time that all pairs of vertices in the set are adjacent.

IS ≤P CLIQUE: We want to show that given an instance of the IS problem (G, k), we can produce an equivalent instance of the CLIQUE problem in polynomial time. The reduction function f inputs G and k, and outputs the pair (Ḡ, k). Clearly this can be done in polynomial time. By the above lemma, this instance is equivalent.

Theorem: VC is NP-complete.

VC ∈ NP: The certificate consists of the k vertices in the vertex cover. Given such a certificate we can easily verify in polynomial time that every edge is incident to one of these vertices.
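The certificate checks for CLIQUE and VC, together with the three-way equivalence of the lemma, can be sketched as below. The representation (vertices 0..n−1, edges as a set of two-element frozensets) and all function names are assumptions made for this illustration; `lemma_holds` checks the lemma exhaustively on one small graph, which is a sanity check and not a proof.

```python
from itertools import combinations


def as_edge_set(pairs):
    """Normalize a list of (u, v) pairs into a set of frozensets."""
    return {frozenset(p) for p in pairs}


def complement(n, edges):
    """Edge set of the complement graph on vertices 0..n-1."""
    return {frozenset(p) for p in combinations(range(n), 2)} - edges


def is_clique(edges, subset):
    return all(frozenset(p) in edges for p in combinations(sorted(subset), 2))


def is_independent_set(edges, subset):
    return all(frozenset(p) not in edges for p in combinations(sorted(subset), 2))


def is_vertex_cover(edges, subset):
    chosen = set(subset)
    # Every edge must have at least one endpoint among the chosen vertices.
    return all(e & chosen for e in edges)


def lemma_holds(n, edges):
    """Check (i) <=> (ii) <=> (iii) for every subset V' of a small graph."""
    co = complement(n, edges)
    everything = set(range(n))
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            a = is_clique(co, subset)
            b = is_independent_set(edges, subset)
            c = is_vertex_cover(edges, everything - set(subset))
            if not (a == b == c):
                return False
    return True


# A 4-cycle with one chord plus an isolated vertex 4, used only as an example.
E = as_edge_set([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])
```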

IS ≤P VC: We want to show that given an instance of the IS problem (G, k), we can produce an equivalent instance of the VC problem in polynomial time. The reduction function f inputs G and k, computes the number of vertices, n, and then outputs (G, n − k). Clearly this can be done in polynomial time. By the lemma above, these instances are equivalent.

Note: In each of the above reductions, the reduction function did not know whether G has an independent set or not. It must run in polynomial time. So it does not have time to determine whether G has an independent set or which vertices are in the set.

Dominating Set: As with vertex cover, dominating set is an example of a graph covering problem. Here the condition is a little different: each vertex must be in the dominating set or adjacent to one of its members, as opposed to each edge being incident to a member of the vertex cover. Obviously, if G is connected and has a vertex cover of size k, then it has a dominating set of size k (the same set of vertices), but the converse is not necessarily true. However, the similarity suggests that if VC is NP-complete, then DS is likely to be NP-complete as well. The main result of this section is just this.

Theorem: DS is NP-complete.

As usual the proof has two parts. First we show that DS ∈ NP. The certificate just consists of the subset V' in the dominating set. In polynomial time we can determine whether every vertex is in V' or is adjacent to a vertex in V'.

Reducing Vertex Cover to Dominating Set: Next we show that an existing NP-complete problem is reducible to dominating set. We choose vertex cover and show that VC ≤P DS. We want a polynomial time function, which given an instance of the vertex cover problem (G, k), produces an instance (G', k') of the dominating set problem, such that G has a vertex cover of size k if and only if G' has a dominating set of size k'.

How do we translate between these problems? The key difference is the condition. In VC: "every edge is incident to a vertex in V'". In DS: "every vertex is either in V' or is adjacent to a vertex in V'". Thus the translation must somehow map the notion of "incident" to "adjacent". Because incidence is a property of edges, and adjacency is a property of vertices, this suggests that the reduction function maps edges of G into vertices in G', such that an incident edge in G is mapped to an adjacent vertex in G'.

This suggests the following idea (which does not quite work). We will insert a vertex into the middle of each edge of the graph. In other words, for each edge {u, v}, we will create a new special vertex, called w_uv, and replace the edge {u, v} with the two edges {u, w_uv} and {v, w_uv}. The fact that u was incident to edge {u, v} has now been replaced with the fact that u is adjacent to the corresponding vertex w_uv. We still need to dominate the neighbor v. To do this, we will leave the edge {u, v} in the graph as well. Let G' be the resulting graph.

This is still not quite correct though. Define an isolated vertex to be one that is incident to no edges. If u is isolated it can only be dominated if it is included in the dominating set. Since it is not incident to any edges, it does not need to be in the vertex cover. Let V_I denote the isolated vertices in G, and let I denote the number of isolated vertices. The number of vertices to request for the dominating set will be k' = k + I.

Now we can give the complete reduction. Given the pair (G, k) for the VC problem, we create a graph G' as follows. Initially G' = G. For each edge {u, v} in G we create a new vertex w_uv in G' and add edges {u, w_uv} and {v, w_uv} in G'. Let I denote the number of isolated vertices and set k' = k + I. Output (G', k'). This reduction is illustrated in the following figure. Note that every step can be performed in polynomial time.

Correctness of the Reduction: To establish the correctness of the reduction, we need to show that G has a vertex cover of size k if and only if G' has a dominating set of size k'. First we argue that if V' is a vertex cover for G, then V'' = V' ∪ V_I is a dominating set for G'. Observe that |V''| = |V' ∪ V_I| ≤ k + I = k'. Note that |V' ∪ V_I| might be of size less than k + I, if there are any isolated vertices in V'. If so, we can add any vertices we like to make the size equal to k'.

Fig. 52: Dominating set reduction. (The function f maps G with k = 3 to G' with k' = 3 + 1 = 4.)

To see that V'' is a dominating set, first observe that all the isolated vertices are in V'' and so they are dominated. Second, each of the special vertices w_uv in G' corresponds to an edge {u, v} in G, implying that either u or v is in the vertex cover V'. Thus w_uv is dominated by the same vertex in V''. Finally, each of the nonisolated original vertices v is incident to at least one edge in G, and hence either it is in V' or else all of its neighbors are in V'. In either case, v is either in V'' or adjacent to a vertex in V''. This is shown in the top part of the following figure.

Fig. 53: Correctness of the VC to DS reduction (where k = 3 and I = 1). (Panels: vertex cover for G; dominating set for G'; dominating set for G' using original vertices; vertex cover for G.)

Conversely, we claim that if G' has a dominating set V'' of size k' = k + I then G has a vertex cover V' of size k. Note that all I isolated vertices of G' must be in the dominating set. First, let V''' = V'' − V_I be the remaining k vertices. We might try to claim something like: V''' is a vertex cover for G. But this will not necessarily work, because V''' may have vertices that are not part of the original graph G.

However, we claim that we never need to use any of the newly created special vertices in V'''. In particular, if some vertex w_uv ∈ V''', then modify V''' by replacing w_uv with u. (We could have just as easily replaced it with v.) Observe that the vertex w_uv is adjacent to only u and v, so it dominates itself and these other two vertices. By using u instead, we still dominate u, v, and w_uv (because u has edges going to v and w_uv). Thus by replacing w_uv with u we dominate the same vertices (and potentially more). Let V' denote the resulting set after this modification. (This is shown in the lower middle part of the figure.)

We claim that V' is a vertex cover for G. If, to the contrary, there were an edge {u, v} of G that was not covered (neither u nor v was in V') then the special vertex w_uv would not be adjacent to any vertex of V' in G', contradicting the hypothesis that V'' was a dominating set for G'.
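The reduction from vertex cover to dominating set can be made concrete with a short sketch. This is only an illustration under stated assumptions: the function names (vc_to_ds, has_vertex_cover, has_dominating_set) and the graph representation (a vertex list plus a list of 2-tuples) are ours, not part of the notes, and the two brute-force checkers take exponential time. They are there only to confirm on tiny graphs that the instances (G, k) and (G', k') agree.

```python
from itertools import combinations

def vc_to_ds(vertices, edges, k):
    """Map a vertex cover instance (G, k) to a dominating set instance (G', k')."""
    new_vertices = list(vertices)
    new_edges = [tuple(e) for e in edges]      # every original edge {u, v} stays in G'
    for (u, v) in edges:
        w = ("w", u, v)                        # special vertex w_uv for edge {u, v}
        new_vertices.append(w)
        new_edges += [(u, w), (v, w)]
    touched = {x for e in edges for x in e}
    isolated = [v for v in vertices if v not in touched]
    return new_vertices, new_edges, k + len(isolated)   # k' = k + I

def has_vertex_cover(vertices, edges, k):
    """Brute force: is there a set of exactly k vertices touching every edge?"""
    return any(all(u in c or v in c for (u, v) in edges)
               for c in map(set, combinations(vertices, k)))

def has_dominating_set(vertices, edges, k):
    """Brute force: is there a set of exactly k vertices dominating all vertices?"""
    closed_nbrs = {v: {v} for v in vertices}   # each vertex dominates itself
    for (u, v) in edges:
        closed_nbrs[u].add(v)
        closed_nbrs[v].add(u)
    return any(set().union(*(closed_nbrs[x] for x in d)) == set(vertices)
               for d in combinations(vertices, k))
```

For the path a, b, c together with an isolated vertex d, vc_to_ds(["a", "b", "c", "d"], [("a", "b"), ("b", "c")], 1) produces a graph with six vertices and six edges and asks for k' = 2. The vertex cover {b} corresponds to the dominating set {b, d}.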

Lecture 20: Subset Sum

Read: Section 34.5.5 in CLRS.

Subset Sum: The Subset Sum problem (SS) is the following. Given a finite set S of positive integers S = {w_1, w_2, ..., w_n} and a target value, t, we want to know whether there exists a subset S' ⊆ S that sums exactly to t.

This problem is a simplified version of the 0-1 Knapsack problem, presented as a decision problem. Recall that in the 0-1 Knapsack problem, we are given a collection of objects, each with an associated weight w_i and associated value v_i. We are given a knapsack of capacity W. The objective is to take as many objects as can fit in the knapsack's capacity so as to maximize the value. (In the fractional knapsack we could take a portion of an object. In the 0-1 Knapsack we either take an object entirely or leave it.) In the simplest version, suppose that the value is the same as the weight, v_i = w_i. (This would occur for example if all the objects were made of the same material, say, gold.) Then, the best we could hope to achieve would be to fill the knapsack entirely. By setting t = W, we see that the subset sum problem is equivalent to this simplified version of the 0-1 Knapsack problem. It follows that if we can show that this simpler version is NP-complete, then certainly the more general 0-1 Knapsack problem (stated as a decision problem) is also NP-complete.

Consider the following example.

    S = {3, 6, 9, 12, 15, 23, 32} and t = 33.

The subset S' = {6, 12, 15} sums to t = 33, so the answer in this case is yes. If t = 34 the answer would be no.

Dynamic Programming Solution: There is a dynamic programming algorithm which solves the Subset Sum problem in O(n · t) time (see footnote 2).

The quantity n · t is a polynomial function of n. This would seem to imply that the Subset Sum problem is in P. But there is an important catch. Recall that in all NP-complete problems we assume (1) running time is measured as a function of input size (number of bits) and (2) inputs must be encoded in a reasonable succinct manner. Let us assume that the numbers w_i and t are all b-bit numbers represented in base 2, using the fewest number of bits possible. Then the input size is O(nb). The value of t may be as large as 2^b. So the resulting algorithm has a running time of O(n 2^b). This is polynomial in n, but exponential in b. Thus, this running time is not polynomial as a function of the input size.

Note that an important consequence of this observation is that the SS problem is not hard when the numbers involved are small. If the numbers involved are of a fixed number of bits (a constant independent of n), then the problem is solvable in polynomial time. However, we will show that in the general case, this problem is NP-complete.

SS is NP-complete: The proof that Subset Sum (SS) is NP-complete involves the usual two elements.
(i) SS ∈ NP.
(ii) Some known NP-complete problem is reducible to SS. In particular, we will show that Vertex Cover (VC) is reducible to SS, that is, VC ≤P SS.

To show that SS is in NP, we need to give a verification procedure. Given S and t, the certificate is just the indices of the numbers that form the subset S'. We can add two b-bit numbers together in O(b) time. So, in polynomial time we can compute the sum of elements in S', and verify that this sum equals t.

For the remainder of the proof we show how to reduce vertex cover to subset sum. We want a polynomial time computable function f that maps an instance of the vertex cover (a graph G and integer k) to an instance of the subset sum problem (a set of integers S and target integer t) such that G has a vertex cover of size k if and only if S has a subset summing to t. Thus, if subset sum were solvable in polynomial time, so would vertex cover.

Footnote 2: We will leave this as an exercise, but the formulation is, for 0 ≤ i ≤ n and 0 ≤ t' ≤ t, S[i, t'] = 1 if there is a subset of {w_1, w_2, ..., w_i} that sums to t', and 0 otherwise. The ith row of this table can be computed in O(t) time, given the contents of the (i − 1)-st row.
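The table S[i, t'] from the dynamic programming formulation can be filled in row by row. The following is a minimal sketch of one way to do that, not a solution endorsed by the notes (which leave the details as an exercise). The function name subset_sum and the choice to return one witness subset, found by walking back through the table, are our own additions.

```python
def subset_sum(weights, t):
    """Return a sub-list of `weights` (positive integers) summing to exactly t,
    or None if no such subset exists. Runs in O(n * t) time and space."""
    n = len(weights)
    # S[i][s] is True iff some subset of the first i weights sums to s.
    S = [[False] * (t + 1) for _ in range(n + 1)]
    S[0][0] = True                       # the empty subset sums to 0
    for i, w in enumerate(weights, start=1):
        for s in range(t + 1):
            # Either skip weight i, or use it (if it fits) on top of s - w.
            S[i][s] = S[i - 1][s] or (s >= w and S[i - 1][s - w])
    if not S[n][t]:
        return None
    chosen, s = [], t
    for i in range(n, 0, -1):            # walk back through the table
        if not S[i - 1][s]:              # s is unreachable without weight i
            chosen.append(weights[i - 1])
            s -= weights[i - 1]
    return chosen[::-1]
```

On the example above, subset_sum([3, 6, 9, 12, 15, 23, 32], 33) returns some subset summing to 33 (not necessarily {6, 12, 15}, since several subsets work), and a target of 34 yields None. Note that the table has t + 1 columns, which is exactly why the running time is exponential in the number of bits b of t.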

How can we encode the notion of selecting a subset of vertices that cover all the edges to that of selecting a subset of numbers that sums to t? In the vertex cover problem we are selecting vertices, and in the subset sum problem we are selecting numbers, so it seems logical that the reduction should map vertices into numbers. The constraint that these vertices should cover all the edges must be mapped to the constraint that the sum of the numbers should equal the target value.

An Initial Approach: Here is an idea, which does not work, but gives a sense of how to proceed. Let E denote the number of edges in the graph. First number the edges of the graph from 1 through E. Then represent each vertex v_i as an E-element bit vector, where the j-th bit from the left is set to 1 if and only if the edge e_j is incident to vertex v_i. (Another way to think of this is that these bit vectors form the rows of an incidence matrix for the graph.) An example is shown below, in which k = 3.

          e1 e2 e3 e4 e5 e6 e7 e8
    v1     1  0  0  0  0  0  0  0
    v2     1  1  1  0  0  0  0  0
    v3     0  0  1  1  0  0  0  1
    v4     0  0  0  0  1  1  1  1
    v5     0  0  0  0  0  0  1  0
    v6     0  1  0  0  0  1  0  0
    v7     0  0  0  1  1  0  0  0

Fig. 54: Encoding a graph as a collection of bit vectors.

Now, suppose we take any subset of vertices and form the logical-or of the corresponding bit vectors. If the subset is a vertex cover, then every edge will be covered by at least one of these vertices, and so the logical-or will be a bit vector of all 1's, 1111...1. Conversely, if the logical-or is a bit vector of 1's, then each edge has been covered by some vertex, implying that the vertices form a vertex cover. (Later we will consider how to encode the fact that there are only k vertices allowed in the cover.)

          e1 e2 e3 e4 e5 e6 e7 e8
    v2     1  1  1  0  0  0  0  0
    v3     0  0  1  1  0  0  0  1
    v4     0  0  0  0  1  1  1  1
    t =    1  1  1  1  1  1  1  1    (v2 ∨ v3 ∨ v4)

Fig. 55: The logical-or of a vertex cover equals 1111...1.

Since bit vectors can be thought of as just a way of representing numbers in binary, this is starting to feel more like the subset sum problem. The target would be the number whose bit vector is all 1's. There are a number of problems, however. First, logical-or is not the same as addition. For example, if both of the endpoints of some edge are in the vertex cover, then its value in the corresponding column would be 2, not 1. Second, we have no way of controlling how many vertices go into the vertex cover. (We could just take the logical-or of all the vertices, and then the logical-or would certainly be a bit vector of 1's.)
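Both problems with the initial approach can be seen on the example graph. The sketch below assumes the edge list read off the incidence matrix of Fig. 54 (e1 = {v1, v2}, e2 = {v2, v6}, and so on). The helper names bit_vector, logical_or and column_sum are ours.

```python
# Edges e1..e8 of the example graph, as read off the incidence matrix of Fig. 54.
EDGES = [(1, 2), (2, 6), (2, 3), (3, 7), (4, 7), (4, 6), (4, 5), (3, 4)]

def bit_vector(v):
    """Row of the incidence matrix for vertex v: bit j is 1 iff e_j touches v."""
    return [1 if v in e else 0 for e in EDGES]

def logical_or(vertex_subset):
    """Column-wise logical-or of the selected rows."""
    return [max(col) for col in zip(*map(bit_vector, vertex_subset))]

def column_sum(vertex_subset):
    """Column-wise sum of the selected rows (what addition without carries sees)."""
    return [sum(col) for col in zip(*map(bit_vector, vertex_subset))]

cover = [2, 3, 4]
print(logical_or(cover))            # all 1's: {v2, v3, v4} is a vertex cover
print(column_sum(cover))            # a mix of 1's and 2's: e3 and e8 are covered twice
print(logical_or(range(1, 8)))      # all 1's again, even though we used all 7 vertices
```

The second line shows why addition does not produce a single predictable target, and the third shows that the logical-or alone says nothing about the size of the cover.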

There are two ways in which addition differs significantly from logical-or. The first is the issue of carries. For example, 1101 ∨ 0011 = 1111, but in binary 1101 + 0011 = 10000. To fix this, we recognize that we do not have to use a binary (base-2) representation. In fact, we can assume any base system we want. Observe that each column of the incidence matrix has at most two 1's, because each edge is incident to at most two vertices. Thus, if we use any base that is at least as large as base 3, we will never generate a carry to the next position. In fact we will use base 4 (for reasons to be seen below). Note that the base of the number system is just for our own convenience of notation. Once the numbers have been formed, they will be converted into whatever form our machine assumes for its input representation, e.g. decimal or binary.

The second difference between logical-or and addition is that an edge may generally be covered either once or twice in the vertex cover. So, the final sum of these numbers will be a number consisting of 1 and 2 digits, e.g. 1211...112. This does not provide us with a unique target value t. We know that no digit of our sum can be a zero. To fix this problem, we will create a set of E additional slack values. For 1 ≤ i ≤ E, the ith slack value will consist of all 0's, except for a single 1-digit in the ith position, e.g., 00000100000. Our target will be the number 2222...222 (all 2's). To see why this works, observe that from the numbers of our vertex cover, we will get a sum consisting of 1's and 2's. For each position where there is a 1, we can supplement this value by adding in the corresponding slack value. Thus we can boost any value consisting of 1's and 2's to all 2's. On the other hand, note that if there are any 0 values in the final sum, we will not have enough slack values to convert this into a 2.

There is one last issue. We are only allowed to place k vertices in the vertex cover. We will handle this by adding an additional column. For each number arising from a vertex, we will put a 1 in this additional column. For each slack variable we will put a 0. In the target, we will require that this column sum to the value k, the size of the vertex cover. Thus, to form the desired sum, we must select exactly k of the vertex values. Note that since we only have a base-4 representation, there might be carries out of this last column (if k ≥ 4). But since this is the last (leftmost) column, it will not affect any of the other aspects of the construction.

The Final Reduction: Here is the final reduction, given the graph G = (V, E) and integer k for the vertex cover problem.

(1) Create a set of n vertex values, x_1, x_2, ..., x_n, using base-4 notation. The value x_i is equal to a 1 followed by a sequence of E base-4 digits. The j-th digit is a 1 if edge e_j is incident to vertex v_i and 0 otherwise.
(2) Create E slack values y_1, y_2, ..., y_E, where y_i is a 0 followed by E base-4 digits. The i-th digit of y_i is 1 and all others are 0.
(3) Let t be the base-4 number whose first digit is k (this may actually span multiple base-4 digits), and whose remaining E digits are all 2.
(4) Convert the x_i's, the y_j's, and t into whatever base notation is used for the subset sum problem (e.g. base 10). Output the set S = {x_1, ..., x_n, y_1, ..., y_E} and t.

Observe that this can be done in polynomial time, in O(E^2), in fact. The construction is illustrated in Fig. 56.

Correctness: We claim that G has a vertex cover of size k if and only if S has a subset that sums to t. If G has a vertex cover V' of size k, then we take the vertex values x_i corresponding to the vertices of V', and for each edge that is covered only once in V', we take the corresponding slack variable. It follows from the comments made earlier that the lower-order E digits of the resulting sum will be of the form 222...2, and because there are k elements in V', the leftmost digit of the sum will be k. Thus, the resulting subset sums to t.

Conversely, if S has a subset S' that sums to t then we assert that it must select exactly k values from among the vertex values, since the first digit must sum to k. We claim that these vertices V' form a vertex cover. In particular, no edge can be left uncovered by V', since (because there are no carries) the corresponding column would be 0 in the sum of vertex values. Thus, no matter what slack values we add, the resulting digit position could not be equal to 2, and so this cannot be a solution to the subset sum problem.
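Steps (1) through (4) of the final reduction can be sketched directly. This is an illustrative sketch, not the notes' own code. Vertices are assumed to be numbered 0 through n − 1, the values are returned as a Python list rather than a set (two vertices with identical incidence rows would otherwise collapse into one value), and the brute-force checkers has_subset_sum and has_vertex_cover are hypothetical helpers added only to confirm the "if and only if" on a small graph.

```python
from itertools import combinations

def vc_to_ss(n, edges, k):
    """Map a vertex cover instance (graph on vertices 0..n-1, k) to (S, t)."""
    E = len(edges)

    def from_digits(lead, digits):
        # Interpret `lead` followed by E base-4 digits as an integer.
        value = lead
        for d in digits:
            value = 4 * value + d
        return value

    # (1) vertex values: a 1, then digit j is 1 iff edge e_j touches vertex v
    xs = [from_digits(1, [1 if v in e else 0 for e in edges]) for v in range(n)]
    # (2) slack values: a 0, then a single 1 in position i
    ys = [from_digits(0, [1 if i == j else 0 for j in range(E)]) for i in range(E)]
    # (3) target: k, then E digits that are all 2
    t = from_digits(k, [2] * E)
    return xs + ys, t

def has_subset_sum(values, t):
    """Brute force over reachable sums (exponential in general)."""
    sums = {0}
    for x in values:
        sums |= {s + x for s in sums if s + x <= t}
    return t in sums

def has_vertex_cover(n, edges, k):
    """Brute force: is there a set of exactly k vertices touching every edge?"""
    return any(all(u in c or v in c for (u, v) in edges)
               for c in map(set, combinations(range(n), k)))
```

For the example graph with k = 3, the target t is the base-4 number 322222222, and has_subset_sum agrees with has_vertex_cover for every k from 0 to 7.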

           size  e1 e2 e3 e4 e5 e6 e7 e8
    x1       1    1  0  0  0  0  0  0  0
    x2       1    1  1  1  0  0  0  0  0
    x3       1    0  0  1  1  0  0  0  1
    x4       1    0  0  0  0  1  1  1  1     Vertex values
    x5       1    0  0  0  0  0  0  1  0
    x6       1    0  1  0  0  0  1  0  0
    x7       1    0  0  0  1  1  0  0  0
    y1       0    1  0  0  0  0  0  0  0
    y2       0    0  1  0  0  0  0  0  0
    y3       0    0  0  1  0  0  0  0  0
    y4       0    0  0  0  1  0  0  0  0     Slack values
    y5       0    0  0  0  0  1  0  0  0
    y6       0    0  0  0  0  0  1  0  0
    y7       0    0  0  0  0  0  0  1  0
    y8       0    0  0  0  0  0  0  0  1
    t        3    2  2  2  2  2  2  2  2

Fig. 56: Vertex cover to subset sum reduction. (The leftmost column encodes the vertex cover size, k = 3.)

Fig. 57: Correctness of the reduction. (Same table as Fig. 56. Among the vertex values, take those in the vertex cover; among the slack values, take one for each edge that has only one endpoint in the cover.)

It is worth noting again that in this reduction, we needed to have large numbers. For example, the target value t is at least as large as 4^E, which is at least 4^n whenever G has at least as many edges as vertices (where n is the number of vertices in G). In our dynamic programming solution W = t, so the DP algorithm would run in Ω(n 4^n) time, which is not polynomial time.

Lecture 21: Approximation Algorithms: VC and TSP

Read: Chapt 35 (up through 35.2) in CLRS.

Coping with NP-completeness: With NP-completeness we have seen that there are many important optimization problems that are likely to be quite hard to solve exactly. Since these are important problems, we cannot simply give up at this point, since people do need solutions to these problems. How do we cope with NP-completeness?

Use brute-force search: Even on the fastest parallel computers this approach is viable only for the smallest instances of these problems.

Heuristics: A heuristic is a strategy for producing a valid solution, but there are no guarantees how close it is to optimal. This is worthwhile if all else fails, or if lack of optimality is not really an issue.

General Search Methods: There are a number of very powerful techniques for solving general combinatorial optimization problems that have been developed in the areas of AI and operations research. These go under names such as branch-and-bound, A*-search, simulated annealing, and genetic algorithms. The performance of these approaches varies considerably from one problem to another and from instance to instance. But in some cases they can perform quite well.

Approximation Algorithms: This is an algorithm that runs in polynomial time (ideally), and produces a solution that is within a guaranteed factor of the optimum solution.

Performance Bounds: Most NP-complete problems have been stated as decision problems for theoretical reasons. However underlying most of these problems is a natural optimization problem. For example, the TSP optimization problem is to find the simple cycle of minimum cost in a digraph, the VC optimization problem is to find the vertex cover of minimum size, the clique optimization problem is to find the clique of maximum size. Note that sometimes we are minimizing and sometimes we are maximizing. An approximation algorithm is one that returns a legitimate answer, but not necessarily an optimal one.

How do we measure how good an approximation algorithm is? We define the ratio bound of an approximation algorithm as follows. Given an instance I of our problem, let C(I) be the cost of the solution produced by our approximation algorithm, and let C*(I) be the cost of the optimal solution. We will assume that costs are strictly positive values. For a minimization problem we want C(I)/C*(I) to be small, and for a maximization problem we want C*(I)/C(I) to be small. For any input size n, we say that the approximation algorithm achieves ratio bound ρ(n), if for all I, |I| = n we have

\[ \max\left( \frac{C(I)}{C^*(I)}, \frac{C^*(I)}{C(I)} \right) \le \rho(n). \]

Observe that ρ(n) is always greater than or equal to 1, and it is equal to 1 if and only if the approximate solution is the true optimum solution.

Some NP-complete problems can be approximated arbitrarily closely. Such an algorithm is given both the input, and a real value ε > 0, and returns an answer whose ratio bound is at most (1 + ε). Such an algorithm is called a polynomial time approximation scheme (or PTAS for short). The running time is a function of both n and ε. As ε approaches 0, the running time increases beyond polynomial time. For example, the running time might be O(n^(1/ε)). If the running time depends only on a polynomial function of 1/ε then it is called a fully polynomial-time approximation scheme. For example, a running time like O((1/ε)^2 n^3) would be such an example, whereas O(n^(1/ε)) and O(2^(1/ε) n) are not.

Although NP-complete problems are equivalent with respect to whether they can be solved exactly in polynomial time in the worst case, their approximability varies considerably.

• For some NP-complete problems, it is very unlikely that any approximation algorithm exists. For example, if the graph TSP problem had an approximation algorithm with a ratio bound of any value less than ∞, then P = NP.
• Many NP-complete problems can be approximated, but the ratio bound is a (slow growing) function of n. For example, the set cover problem (a generalization of the vertex cover problem) can be approximated to within a factor of ln n. We will not discuss this algorithm, but it is covered in CLRS.
• Some NP-complete problems can be approximated to within a fixed constant factor. We will discuss two examples below.
• Some NP-complete problems have PTAS's. Examples are the subset-sum problem (whose approximation we haven't discussed, but which is described in CLRS) and the Euclidean TSP problem.

In fact, much like NP-complete problems, there are collections of problems which are "believed" to be hard to approximate and are equivalent in the sense that if any one can be approximated in polynomial time then they all can be. This class is called Max-SNP complete. We will not discuss this further. Suffice it to say that the topic of approximation algorithms would fill another course.

Vertex Cover: We begin by showing that there is an approximation algorithm for vertex cover with a ratio bound of 2, that is, this algorithm will be guaranteed to find a vertex cover whose size is at most twice that of the optimum. Recall that a vertex cover is a subset of vertices such that every edge in the graph is incident to at least one of these vertices. The vertex cover optimization problem is to find a vertex cover of minimum size.

How does one go about finding an approximation algorithm? The first approach is to try something that seems like a "reasonably" good strategy, a heuristic. It turns out that many simple heuristics, when not optimal, can often be proved to be close to optimal.

Here is a very simple algorithm that guarantees an approximation within a factor of 2 for the vertex cover problem. It is based on the following observation. Consider an arbitrary edge (u, v) in the graph. One of its two vertices must be in the cover, but we do not know which one. The idea of this heuristic is to simply put both vertices into the vertex cover. (You cannot get much stupider than this!) Then we remove all edges that are incident to u and v (since they are now all covered), and recurse on the remaining edges. For every one vertex that must be in the cover, we put two into our cover, so it is easy to see that the cover we generate is at most twice the size of the optimum cover. The approximation is given in the figure below. Here is a more formal proof of its approximation bound.

Fig. 58: The 2-for-1 heuristic for vertex cover. (Panels: G and opt VC; The 2-for-1 Heuristic.)

Claim: ApproxVC yields a factor-2 approximation for Vertex Cover.

Proof: Consider the set C output by ApproxVC. Let C* be the optimum VC. Let A be the set of edges selected by the line marked with "(*)" in the figure. Observe that the size of C is exactly 2|A| because we add two vertices for each such edge. However note that in the optimum VC one of these two vertices must have been added to the VC, and since no two edges of A share an endpoint, the size of C* is at least |A|. Thus we have:

\[ \frac{|C|}{2} = |A| \le |C^*| \quad\Rightarrow\quad \frac{|C|}{|C^*|} \le 2. \]
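The 2-for-1 heuristic and the quantities used in the proof can be checked on a small graph. This is a sketch under our own assumptions: edges are 2-tuples, "any edge" is taken to be the first remaining edge, and optimum_vc_size is a brute-force helper (exponential time) that is not part of the notes and is included only to compare |A|, |C*| and |C|.

```python
from itertools import combinations

def approx_vc(edges):
    """2-for-1 heuristic. Returns (cover C, list A of the edges that were picked)."""
    remaining = list(edges)
    cover, picked = set(), []
    while remaining:
        u, v = remaining[0]              # (*) any edge will do; we take the first
        picked.append((u, v))
        cover |= {u, v}                  # add both endpoints to C
        # drop every edge incident to u or v: those are now covered
        remaining = [e for e in remaining if u not in e and v not in e]
    return cover, picked

def optimum_vc_size(vertices, edges):
    """Brute force over subsets in order of size; only for tiny graphs."""
    for k in range(len(vertices) + 1):
        for c in combinations(vertices, k):
            if all(u in c or v in c for (u, v) in edges):
                return k

edges = [(1, 2), (2, 6), (2, 3), (3, 7), (4, 7), (4, 6), (4, 5), (3, 4)]
C, A = approx_vc(edges)
opt = optimum_vc_size(range(1, 8), edges)
print(len(A), opt, len(C))               # |A| <= |C*| <= |C| = 2|A|
```

On this eight-edge graph (the example used in the Subset Sum lecture) the sketch picks three edges and returns six vertices, while the optimum cover has three, so the factor of 2 is attained exactly.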

2-for-1 Approximation for VC

    ApproxVC {
        C = empty-set
        while (E is nonempty) do {
    (*)     let (u,v) be any edge of E
            add both u and v to C
            remove from E all edges incident to either u or v
        }
        return C
    }

This proof illustrates one of the main features of the analysis of any approximation algorithm. Namely, that we need some way of finding a bound on the optimal solution. (For minimization problems we want a lower bound, for maximization problems an upper bound.) The bound should be related to something that we can compute in polynomial time. In this case, the bound is related to the set of edges A, which form a maximal independent set of edges (that is, a maximal matching).

The Greedy Heuristic: It seems that there is a very simple way to improve the 2-for-1 heuristic. This algorithm simply selects any edge, and adds both vertices to the cover. Why not concentrate instead on vertices of high degree, since a vertex of high degree covers the maximum number of edges? This is a greedy strategy. We saw in the minimum spanning tree and shortest path problems that greedy strategies were optimal.

Here is the greedy heuristic. Select the vertex with the maximum degree. Put this vertex in the cover. Then delete all the edges that are incident to this vertex (since they have been covered). Repeat the algorithm on the remaining graph, until no more edges remain. This algorithm is illustrated in the figure below.

Fig. 59: The greedy heuristic for vertex cover. (Panels: G and opt VC; The Greedy Heuristic.)

Greedy Approximation for VC

    GreedyVC(G=(V,E)) {
        C = empty-set
        while (E is nonempty) do {
            let u be the vertex of maximum degree in G
            add u to C
            remove from E all edges incident to u
        }
        return C
    }

It is interesting to note that on the example shown in the figure, the greedy heuristic actually succeeds in finding the optimum vertex cover. Can we prove that the greedy heuristic always outperforms the stupid 2-for-1 heuristic? The surprising answer is an emphatic "no". In fact, it can be shown that the greedy heuristic does not even have a constant performance bound. That is, it can perform arbitrarily poorly compared to the optimal algorithm. It can be shown that the ratio bound grows as Θ(log n), where n is the number of vertices. (We leave

this as a moderately difficult exercise.) However, it should also be pointed out that the vertex cover constructed by the greedy heuristic is (for typical graphs) smaller than the one computed by the 2-for-1 heuristic, so it would probably be wise to run both algorithms and take the better of the two.

Traveling Salesman Problem: In the Traveling Salesperson Problem (TSP) we are given a complete undirected graph with nonnegative edge weights, and we want to find a cycle that visits all vertices and is of minimum cost. Let c(u, v) denote the weight on edge (u, v). Given a set of edges A forming a tour we define c(A) to be the sum of edge weights in A. Last time we mentioned that TSP (posed as a decision problem) is NP-complete. For many of the applications of TSP, the problem satisfies something called the triangle inequality. Intuitively, this says that the direct path from u to w is never longer than an indirect path. More formally, for all u, v, w ∈ V

\[ c(u, w) \le c(u, v) + c(v, w). \]

There are many examples of graphs that satisfy the triangle inequality. For example, given any weighted graph, if we define c(u, v) to be the shortest path length between u and v (computed, say, by the Floyd-Warshall algorithm), then it will satisfy the triangle inequality. Another example is if we are given a set of points in the plane, and define a complete graph on these points, where c(u, v) is defined to be the Euclidean distance between these points, then the triangle inequality is also satisfied.

When the underlying cost function satisfies the triangle inequality there is an approximation algorithm for TSP with a ratio bound of 2. (In fact, there is a slightly more complex version of this algorithm that has a ratio bound of 1.5, but we will not discuss it.) Thus, although this algorithm does not produce an optimal tour, the tour that it produces cannot be worse than twice the cost of the optimal tour.

The key insight is to observe that a TSP tour with one edge removed is a spanning tree. However it is not necessarily a minimum spanning tree. Therefore, the cost of the minimum TSP tour is at least as large as the cost of the MST. We can compute MST's efficiently, using, for example, either Kruskal's or Prim's algorithm. If we can find some way to convert the MST into a TSP tour while increasing its cost by at most a constant factor, then we will have an approximation for TSP. We shall see that if the edge weights satisfy the triangle inequality, then this is possible.

Here is how the algorithm works. Given any free tree there is a tour of the tree called a twice around tour that traverses the edges of the tree twice, once in each direction. The figure below shows an example of this.

Fig. 60: TSP Approximation. (Panels: MST; Twice-around tour; Shortcut tour; Optimum tour.)

This path is not simple because it revisits vertices, but we can make it simple by short-cutting, that is, we skip over previously visited vertices. Notice that the final order in which vertices are visited using the short-cuts is exactly the same as a preorder traversal of the MST. (In fact, any subsequence of the twice-around tour which visits each vertex exactly once will suffice.) The triangle inequality assures us that the path length will not increase when we take short-cuts.

Claim: Approx-TSP has a ratio bound of 2.

Proof: Let H denote the tour produced by this algorithm and let H* be the optimum tour. Let T be the minimum spanning tree. As we said before, since we can remove any edge of H* resulting in a spanning tree, and

TSP Approximation

    ApproxTSP(G=(V,E)) {
        T = minimum spanning tree for G
        r = any vertex
        L = list of vertices visited by a preorder walk of T starting with r
        return L
    }

since T is the minimum cost spanning tree we have c(T) ≤ c(H*). Now observe that the twice around tour of T has cost 2c(T), since every edge in T is hit twice. By the triangle inequality, when we short-cut an edge of T to form H we do not increase the cost of the tour, and so we have c(H) ≤ 2c(T). Combining these we have

\[ \frac{c(H)}{2} \le c(T) \le c(H^*) \quad\Rightarrow\quad \frac{c(H)}{c(H^*)} \le 2. \]

Lecture 22: The k-Center Approximation

Read: Today's material is not covered in CLRS.

Facility Location: Imagine that Blockbuster Video wants to open 50 stores in some city. The company asks you to determine the best locations for these stores. The condition is that you are to minimize the maximum distance that any resident of the city must drive in order to arrive at the nearest store.

If we model the road network of the city as an undirected graph whose edge weights are the distances between intersections, then this is an instance of the k-center problem. In the k-center problem we are given an undirected graph G = (V, E) with nonnegative edge weights, and we are given an integer k. The problem is to compute a subset of k vertices C ⊆ V, called centers, such that the maximum distance between any vertex in V and its nearest center in C is minimized. (The optimization problem seeks to minimize the maximum distance and the decision problem just asks whether there exists a set of centers that are within a given distance.)

More formally, let G = (V, E) denote the graph, and let w(u, v) denote the weight of edge (u, v). (w(u, v) = w(v, u) because G is undirected.) We assume that all edge weights are nonnegative. For each pair of vertices, u, v ∈ V, let d(u, v) = d(v, u) denote the distance between u and v, that is, the length of the shortest path from u to v. (Note that the shortest path distance satisfies the triangle inequality. This will be used in our proof.)

Consider a subset C ⊆ V of vertices, the centers. For each vertex v ∈ V we can associate it with its nearest center in C. (This is the nearest Blockbuster store to your house.) For each center c_i ∈ C we define its neighborhood to be the subset of vertices for which c_i is the closest center. (These are the houses that are closest to this center. See Fig. 61.) More formally, define:

\[ V(c_i) = \{ v \in V \mid d(v, c_i) \le d(v, c_j), \text{ for all } j \ne i \}. \]

Let us assume for simplicity that there are no ties for the distances to the closest center (or that any such ties have been broken arbitrarily). Then V(c_1), V(c_2), ..., V(c_k) forms a partition of the vertex set of G. The bottleneck distance associated with each center is the distance to its farthest vertex in V(c_i), that is,

\[ D(c_i) = \max_{v \in V(c_i)} d(v, c_i). \]

This distance is critical because it represents the customer that must travel farthest to get to the nearest facility, the bottleneck vertex. Finally, we define the overall bottleneck distance to be

$$D(C) = \max_{c_i \in C} D(c_i).$$

This is the maximum distance of any vertex from its nearest center. Given this notation, we can now formally define the problem.

k-center problem: Given a weighted undirected graph G = (V, E), and an integer k ≤ |V|, find a subset C ⊆ V of size k such that D(C) is minimized.

[Fig. 61: The k-center problem with optimum centers ci and neighborhood sets V(ci). Panels: input graph (k = 3); optimum, cost = 7.]

The decision-problem formulation of the k-center problem is NP-complete (reduction from dominating set). A brute force solution to this problem would involve enumerating all k-element subsets of V, and computing D(C) for each one. However, letting n = |V|, the number of possible subsets is $\binom{n}{k} = \Theta(n^k)$ for constant k. If k is a function of n (which is reasonable), then this is an exponential number of subsets. Given that the problem is NP-complete, it is highly unlikely that a significantly more efficient exact algorithm exists in the worst-case. We will show that there does exist an efficient approximation algorithm for the problem.

Greedy Approximation Algorithm: Our approximation algorithm is based on a simple greedy algorithm that produces a bottleneck distance D(C) that is not more than twice the optimum bottleneck distance.

We begin by letting the first center c1 be any vertex in the graph (the lower left vertex, say, in the figure below). Compute the distances between this vertex and all the other vertices in the graph (Fig. 62(b)). Consider the vertex that is farthest from this center (the upper right vertex at distance 23 in the figure). This is the bottleneck vertex for {c1}. We would like to select the next center so as to reduce this distance. So let us just make it the next center, called c2. Then again we compute the distances from each vertex in the graph to the closer of c1 and c2. (See Fig. 62(c) where dashed lines indicate which vertices are closer to which center.) Again we consider the bottleneck vertex for the current centers {c1, c2}. We place the next center at this vertex (see Fig. 62(d)). Again we compute the distances from each vertex to its nearest center. Repeat this until all k centers have been selected. In Fig. 62(d), the final three greedy centers are shaded, and the final bottleneck distance is 11.

Although the greedy approach has a certain intuitive appeal (because it attempts to find the vertex that gives the bottleneck distance, and then puts a center right on this vertex), it is not optimal. In the example shown in the figure, the optimum solution (shown on the right) has a bottleneck cost of 9, which beats the 11 that the greedy algorithm gave.

Here is a summary of the algorithm. For each vertex u, let d[u] denote the distance to the nearest center.

We know from Dijkstra's algorithm how to compute the shortest path from a single source to all other vertices in the graph. One way to solve the distance computation step above would be to invoke Dijkstra's algorithm i times. But there is an easier way. We can modify Dijkstra's algorithm to operate as a multiple source algorithm. In particular, in the initialization of Dijkstra's single source algorithm, it sets d[s] = 0 and pred[s] = null. In

[Fig. 62: Greedy approximation to k-center. Panels (a) through (d); greedy cost = 11.]

Greedy Approximation for k-center

    KCenterApprox(G, k) {
        C = empty_set
        for each u in V do                  // initialize distances
            d[u] = INFINITY
        for i = 1 to k do {                 // main loop
            Find the vertex u such that d[u] is maximum
            Add u to C                      // u is the current bottleneck vertex
            // update distances
            Compute the distance from each vertex v to its closest vertex in C, denoted d[v]
        }
        return C                            // final centers
    }
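The KCenterApprox listing can be sketched in Python as follows. This is only an illustration under stated assumptions, not the notes' own implementation: the graph is assumed to be connected and given as an adjacency dictionary `adj` mapping each vertex to a list of `(neighbor, weight)` pairs, the first center is passed in explicitly as `first` rather than chosen arbitrarily, and the helper names `multi_source_dijkstra` and `k_center_greedy` are hypothetical.

```python
import heapq


def multi_source_dijkstra(adj, sources):
    """Distance from every vertex to its nearest source (assumes nonnegative weights)."""
    dist = {v: float("inf") for v in adj}
    heap = []
    for s in sources:              # every center starts at distance 0
        dist[s] = 0
        heapq.heappush(heap, (0, s))
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:            # stale heap entry, skip it
            continue
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


def k_center_greedy(adj, k, first):
    """Greedy k-center: repeatedly promote the current bottleneck vertex to a center."""
    centers = [first]
    dist = multi_source_dijkstra(adj, centers)
    while len(centers) < k:
        bottleneck = max(dist, key=dist.get)   # farthest vertex from its nearest center
        centers.append(bottleneck)
        dist = multi_source_dijkstra(adj, centers)
    return centers, max(dist.values())         # centers and bottleneck distance D(C)
```

On a path of five vertices with unit edge weights, starting at an endpoint with k = 1 gives bottleneck distance 4, while starting at the middle vertex gives 2, so the result of the heuristic depends on the choice of the first center.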

the modified multiple source version, we do this for all the vertices of C. The final greedy algorithm involves running Dijkstra's algorithm k times (once for each time through the for-loop). Recall that the running time of Dijkstra's algorithm is O((V + E) log V). Under the reasonable assumption that E ≥ V, this is O(E log V). Thus, the overall running time is O(kE log V).

Approximation Bound: How bad could greedy be? We will argue that it has a ratio bound of 2. To see that we can get a factor of 2, consider a set of n + 1 vertices arranged in a linear graph, in which all edges are of weight 1. The greedy algorithm might pick any initial vertex that it likes. Suppose it picks the leftmost vertex. Then the maximum (bottleneck) distance is the distance to the rightmost vertex which is n. If we had instead chosen the vertex in the middle, then the maximum distance would only be n/2, which is better by a factor of 2.

[Fig. 63: Worst-case for greedy. Greedy cost = n; optimum cost = n/2.]

We want to show that this approximation algorithm always produces a final distance D(C) that is within a factor of 2 of the distance of the optimal solution. Let O = {o1, o2, . . . , ok} denote the centers of the optimal solution (shown as black dots in Fig. 64, and the lines show the partition into the neighborhoods for each of these points). Let D∗ = D(O) be the optimal bottleneck distance. Let G = {g1, g2, . . . , gk} be the centers found by the greedy approximation (shown as white dots in the figure below). (Here G denotes the set of greedy centers, not the input graph.) Also, let gk+1 denote the center that would have been added next, that is, the bottleneck vertex for G. Let D(G) denote the bottleneck distance for G. Notice that the distance from gk+1 to its nearest center is equal to D(G). The proof involves a simple application of the pigeon-hole principle.

Theorem: The greedy approximation has a ratio bound of 2, that is D(G)/D∗ ≤ 2.

Proof: Let G′ = {g1, g2, . . . , gk, gk+1} be the (k + 1)-element set consisting of the greedy centers together with the next greedy center gk+1. First observe that for i ≠ j, d(gi, gj) ≥ D(G). This follows as a result of our greedy selection strategy. As each center is selected, it is selected to be at the maximum (bottleneck) distance from all the previous centers. As we add more centers, the maximum distance between any pair of centers decreases. Since the final bottleneck distance is D(G), all the centers are at least this far apart from one another.

[Fig. 64: Analysis of the greedy heuristic for k = 5. The greedy centers are given as white dots and the optimal centers as black dots. The regions represent the neighborhood sets V(oi) for the optimal centers.]

Each gi ∈ G′ is associated with its closest center in the optimal solution, that is, each belongs to V(om) for some m. Because there are k centers in O, and k + 1 elements in G′, it follows from the pigeon-hole principle that at least two centers of G′ are in the same set V(om) for some m. (In the figure, the greedy centers g4 and g5 are both in V(o2).) Let these be denoted gi and gj.

Since D∗ is the bottleneck distance for O, we know that the distance from gi to om is of length at most D∗ and similarly the distance from om to gj is at most D∗. By concatenating these two paths, it follows that there exists a path of length at most 2D∗ from gi to gj, and hence we have d(gi, gj) ≤ 2D∗. But from the comments above we have d(gi, gj) ≥ D(G). Therefore,

D(G) ≤ d(gi, gj) ≤ 2D∗,

from which the desired ratio follows.

Lecture 23: Approximations: Set Cover and Bin Packing

Read: Set cover is covered in Chapt 35.3. Bin packing is covered as an exercise in CLRS.

Set Cover: The set cover problem is a very important optimization problem. You are given a pair (X, F) where X = {x1, x2, . . . , xm} is a finite set (a domain of elements) and F = {S1, S2, . . . , Sn} is a family of subsets of X, such that every element of X belongs to at least one set of F.

Consider a subset C ⊆ F. (This is a collection of sets over X.) We say that C covers the domain if every element of X is in some set of C, that is

$$X = \bigcup_{S_i \in C} S_i.$$

The problem is to find the minimum-sized subset C of F that covers X. Consider the example shown below. The optimum set cover consists of the three sets {S3, S4, S5}.

[Fig. 65: Set cover.]

Set cover can be applied to a number of applications. For example, suppose you want to set up security cameras to cover a large art gallery. From each possible camera position, you can see a certain subset of the paintings. Each such subset of paintings is a set in your system. You want to put up the fewest cameras to see all the paintings.

Complexity of Set Cover: We have seen special cases of the set cover problem that are NP-complete. For example, vertex cover is a type of set cover problem. The domain to be covered is the set of edges, and each vertex covers the subset of incident edges. Thus, the decision-problem formulation of set cover ("does there exist a set cover of size at most k?") is NP-complete as well.

There is a factor-2 approximation for the vertex cover problem, but it cannot be applied to generate a factor-2 approximation for set cover. In particular, the VC approximation relies on the fact that each element of the domain (an edge) is in exactly 2 sets (one for each of its endpoints). Unfortunately, this is not true for the general

set cover problem. In fact, it is known that there is no constant factor approximation to the set cover problem, unless P = NP. This is unfortunate, because set cover is one of the most powerful NP-complete problems. Today we will show that there is a reasonable approximation algorithm, the greedy heuristic, which achieves an approximation bound of ln m, where m = |X|, the size of the underlying domain. (The book proves a somewhat stronger result, that the approximation factor is ln m′ where m′ ≤ m is the size of the largest set in F. However, their proof is more complicated.)

Greedy Set Cover: A simple greedy approach to set cover works by at each stage selecting the set that covers the greatest number of "uncovered" elements.

Greedy Set Cover

    Greedy-Set-Cover(X, F) {
        U = X                       // U are the items to be covered
        C = empty                   // C will be the sets in the cover
        while (U is nonempty) {     // there is someone left to cover
            select S in F that covers the most elements of U
            add S to C
            U = U - S
        }
        return C
    }

For the example given earlier the greedy set cover algorithm would select S1 (since it covers 6 out of 12 elements), then S6 (since it covers 3 out of the remaining 6), then S2 (since it covers 2 of the remaining 3) and finally S3. Thus, it would return a set cover of size 4, whereas the optimal set cover has size 3.

What is the approximation factor? The problem with the greedy set cover algorithm is that it can be "fooled" into picking the wrong set, over and over again. Consider the following example. The optimal set cover consists of sets S5 and S6, each of size 16. Initially all three sets S1, S5, and S6 have 16 elements. If ties are broken in the worst possible way, the greedy algorithm will first select set S1. We remove all the covered elements. Now S2, S5 and S6 all cover 8 of the remaining elements. Again, if we choose poorly, S2 is chosen. The pattern repeats, choosing S3 (size 4), S4 (size 2) and finally S5 and S6 (each of size 1).

[Fig. 66: An example in which the greedy set cover performs poorly. Optimum: {S5, S6}. Greedy: {S1, S2, S3, S4, S5, S6}.]

Thus, the optimum cover consisted of two sets, but we picked roughly lg m, where m = |X|, for a ratio bound of (lg m)/2. (Recall that lg denotes logarithm base 2.) There were many cases where ties were broken badly here, but it is possible to redesign the example such that there are no ties and yet the algorithm has essentially the same ratio bound.

However we will show that the greedy set cover heuristic never performs worse than a factor of ln m. (Note that this is natural log, not base 2.) Before giving the proof, we need one important mathematical inequality.

Lemma: For all c > 0,

$$\left(1 - \frac{1}{c}\right)^{c} \le \frac{1}{e},$$

where e is the base of the natural logarithm.
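Stepping back to the Greedy-Set-Cover listing for a moment, it can be sketched in Python as follows. This is a minimal illustration, not the notes' own code: the family F is assumed to be a dictionary mapping a set name to a Python `set`, and the function name `greedy_set_cover` and the small instance in the usage note are hypothetical.

```python
def greedy_set_cover(universe, family):
    """Return the names of the sets chosen by the greedy heuristic, in order."""
    uncovered = set(universe)          # U: the items still to be covered
    cover = []                         # C: the sets chosen so far
    while uncovered:
        # Select the set that covers the most still-uncovered elements.
        name = max(family, key=lambda s: len(family[s] & uncovered))
        if not family[name] & uncovered:
            raise ValueError("the family does not cover the universe")
        cover.append(name)
        uncovered -= family[name]      # U = U - S
    return cover
```

As a usage example, take a 2-by-7 grid of elements (r, c) with the two rows as sets `top` and `bottom` (7 elements each), plus column blocks `G1` (columns 0 to 3, 8 elements), `G2` (columns 4 and 5) and `G3` (column 6). The two rows form an optimal cover of size 2, but greedy is lured by the larger block at every step and returns the three column blocks, with no ties involved.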

Proof: We use the fact that for all x, 1 + x ≤ e^x. (The two functions are equal when x = 0.) Now, if we substitute −1/c for x we have (1 − 1/c) ≤ e^(−1/c), and if we raise both sides to the cth power, we have the desired result.

Theorem: Greedy set cover has the ratio bound of at most ln m where m = |X|.

Proof: Let c denote the size of the optimum set cover, and let g denote the size of the greedy set cover minus 1. We will show that g/c ≤ ln m. (This is not quite what we wanted, but we are correct to within 1 set.)

Initially, there are m0 = m elements left to be covered. We know that there is a cover of size c (the optimal cover) and therefore by the pigeonhole principle, there must be at least one set that covers at least m0/c elements. (Since otherwise, if every set covered less than m0/c elements, then no collection of c sets could cover all m0 elements.) Since the greedy algorithm selects the largest set, it will select a set that covers at least this many elements. The number of elements that remain to be covered is at most m1 = m0 − m0/c = m0(1 − 1/c).

Applying the argument again, we know that we can cover these m1 elements with a cover of size c (the optimal cover), and hence there exists a subset that covers at least m1/c elements, leaving at most m2 = m1(1 − 1/c) = m0(1 − 1/c)^2 elements remaining.

If we apply this argument g times, each time we succeed in covering at least a fraction 1/c of the remaining elements, leaving at most a fraction (1 − 1/c) of them. Then the number of elements that remain uncovered after g sets have been chosen by the greedy algorithm is at most mg = m0(1 − 1/c)^g.

How long can this go on? Consider the largest value of g such that after removing all but the last set of the greedy cover, we still have some element remaining to be covered. Thus, we are interested in the largest value of g such that

$$1 \le m\left(1 - \frac{1}{c}\right)^{g}.$$

We can rewrite this as

$$1 \le m\left[\left(1 - \frac{1}{c}\right)^{c}\right]^{g/c}.$$

By the inequality above we have

$$1 \le m\left(\frac{1}{e}\right)^{g/c}.$$

Now, if we multiply by e^(g/c) and take natural logs we get that g satisfies:

$$e^{g/c} \le m \quad\Rightarrow\quad \frac{g}{c} \le \ln m.$$

This completes the proof.

Even though the greedy set cover has this relatively bad ratio bound, it seems to perform reasonably well in practice. Thus, the example shown above in which the approximation bound is (lg m)/2 is not "typical" of set cover instances.

Bin Packing: Bin packing is another well-known NP-complete problem, which is a variant of the knapsack problem. The theorem on the approximation bound for bin packing proven here is a bit weaker than the one in CLRS, but I think it is easier to understand. We are given a set of n objects, where si denotes the size of the ith object. It will simplify the presentation to assume that 0 < si < 1. We want to put these objects into a set of bins. Each bin can hold a subset of objects whose total size is at most 1. The problem is to partition the objects among the bins so as to use the fewest possible bins. (Note that if your bin size is not 1, then you can reduce the problem into this form by simply dividing all sizes by the size of the bin.)

Bin packing arises in many applications. Many of these applications involve not only the size of the object but their geometric shape as well. For example, these include packing boxes into a truck, or cutting the maximum number of pieces of certain shapes out of a piece of sheet metal. However, even if we ignore the geometry, and just consider the sizes of the objects, the decision problem is still NP-complete. (The reduction is from the knapsack problem.)

Here is a simple heuristic algorithm for the bin packing problem, called the first-fit heuristic. We start with an unlimited number of empty bins. We take each object in turn, and find the first bin that has space to hold this object. We put this object in this bin. The algorithm is illustrated in the figure below. We claim that first-fit uses at most twice as many bins as the optimum, that is, if the optimal solution uses b∗ bins, and first-fit uses bff bins, then

$$\frac{b_{ff}}{b^*} \le 2.$$

[Fig. 67: First-fit heuristic.]

Theorem: The first-fit heuristic achieves a ratio bound of 2.

Proof: Consider an instance {s1, . . . , sn} of the bin packing problem. Let S = Σi si denote the sum of all the object sizes. Let b∗ denote the optimal number of bins, and bff denote the number of bins used by first-fit.

First observe that b∗ ≥ S. This is true, since no bin can hold a total capacity of more than 1 unit, and even if we were to fill each bin exactly to its capacity, we would need at least S bins. (In fact, since the number of bins is an integer, we would need at least ⌈S⌉ bins.)

Next, we claim that bff ≤ 2S. To see this, let ti denote the total size of the objects that first-fit puts into bin i. Consider bins i and i + 1 filled by first-fit. Assume that indexing is cyclical, so if i is the last index (i = bff) then i + 1 = 1. We claim that ti + ti+1 ≥ 1. If not, then the contents of bins i and i + 1 could both be put into the same bin, and hence first-fit would never have started to fill the second bin, preferring to keep everything in the first bin. Thus we have:

$$\sum_{i=1}^{b_{ff}} (t_i + t_{i+1}) \ge b_{ff}.$$

But this sum adds up all the elements twice, so it has a total value of 2S. Thus we have 2S ≥ bff. Combining this with the fact that b∗ ≥ S we have

bff ≤ 2S ≤ 2b∗,

implying that bff/b∗ ≤ 2, as desired.

There are in fact a number of other heuristics for bin packing. Another example is best fit, which attempts to put the object into the bin in which it fits most closely with the available space (assuming that there is sufficient available space). There is also a variant of first-fit, called first fit decreasing, in which the objects are first sorted in decreasing order of size. (This makes intuitive sense, because it is best to first load the big items, and then try to squeeze the smaller objects into the remaining space.)
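The first-fit heuristic, and the first fit decreasing variant just mentioned, can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `first_fit` and `first_fit_decreasing` are hypothetical, the bin capacity is assumed to be 1, and sizes are compared with exact arithmetic, so the example uses sizes that are exactly representable in binary floating point.

```python
def first_fit(sizes, capacity=1.0):
    """Place each object into the first bin with enough room; open a new bin if none fits."""
    bins = []     # bins[i] is the list of object sizes placed in bin i
    loads = []    # loads[i] is the total size t_i currently in bin i
    for s in sizes:
        for i, load in enumerate(loads):
            if load + s <= capacity:       # first bin that can hold the object
                bins[i].append(s)
                loads[i] += s
                break
        else:                              # no existing bin had room
            bins.append([s])
            loads.append(s)
    return bins


def first_fit_decreasing(sizes, capacity=1.0):
    """First-fit applied after sorting the objects in decreasing order of size."""
    return first_fit(sorted(sizes, reverse=True), capacity)
```

For the sizes 0.5, 0.75, 0.5, 0.25, 0.25 the total size is S = 2.25, so at least 3 bins are needed, and first-fit uses exactly 3, which is comfortably within the bound bff ≤ 2S = 4.5 from the proof.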

A more careful proof establishes that first fit has an approximation ratio that is a bit smaller than 2, and in fact 17/10 is possible. Best fit has a very similar bound. First fit decreasing has a significantly better bound of 11/9 = 1.222 . . . .

Lecture 24: Final Review

Overview: This semester we have discussed general approaches to algorithm design. The intent has been to investigate basic algorithm design paradigms: dynamic programming, greedy algorithms, depth-first search, etc., and to consider how these techniques can be applied on a number of well-defined problems. We have also discussed the class of NP-complete problems, which are believed to be very hard to solve, and finally some examples of approximation algorithms.

How to use this information: In some sense, the algorithms you have learned here are rarely immediately applicable to your later work (unless you go on to be an algorithm designer) because real world problems are always messier than these simple abstract problems. However, there are some important lessons to take out of this class.

Develop a clean mathematical model: Most real-world problems are messy. An important first step in solving any problem is to produce a simple and clean mathematical formulation. For example, this might involve describing the problem as an optimization problem on graphs, sets, or strings. If you cannot clearly describe what your algorithm is supposed to do, it is very difficult to know when you have succeeded.

Create good rough designs: Before jumping in and starting coding, it is important to begin with a good rough design. If your rough design is based on a bad paradigm (e.g. exhaustive enumeration, when depth-first search could have been applied) then no amount of additional tuning and refining will save this bad design.

Prove your algorithm correct: Many times you come up with an idea that seems promising, only to find out later (after a lot of coding and testing) that it does not work. Prove that your algorithm is correct before coding. Writing proofs is not always easy, but it may save you a few weeks of wasted programming time. If you cannot see why it is correct, chances are that it is not correct at all.

Can it be improved?: Once you have a solution, try to come up with a better one. Is there some reason why a better algorithm does not exist? (That is, can you establish a lower bound?) If your solution is exponential time, then maybe your problem is NP-hard.

Prototype to generate better designs: We have attempted to analyze algorithms from an asymptotic perspective, which hides many of the details of the running time (e.g. constant factors), but gives a general perspective for separating good designs from bad ones. After you have isolated the good designs, then it is time to start prototyping and doing empirical tests to establish the real constant factors. A good profiling tool can tell you which subroutines are taking the most time, and those are the ones you should work on improving.

Still too slow?: If your problem has an unacceptably high execution time, you might consider an approximation algorithm. The world is full of heuristics, both good and bad. You should develop a good heuristic, and if possible, prove a ratio bound for your algorithm. If you cannot prove a ratio bound, run many experiments to see how good the actual performance is.

There is still much more to be learned about algorithm design, but we have covered a great deal of the basic material. One direction is to specialize in some particular area, e.g. string pattern matching, computational geometry, parallel algorithms, randomized algorithms, or approximation algorithms. It would be easy to devote an entire semester to any one of these topics.

Another direction is to gain a better understanding of average-case analysis, which we have largely ignored. Still another direction might be to study numerical algorithms (as covered in a course on numerical analysis), or to consider general search strategies such as simulated annealing. Finally, an emerging area is the study of algorithm engineering, which considers how to design algorithms that are both efficient in a practical sense, as well as a theoretical sense.

Material for the final exam:

Old Material: Know general results, but I will not ask too many detailed questions. Do not forget DFS and DP. You will likely see an algorithm design problem that will involve one of these two techniques.

All-Pairs Shortest paths: (Chapt 25.2.)
Floyd-Warshall Algorithm: All-pairs shortest paths, arbitrary edge weights (no negative cost cycles). Running time O(V^3).

NP-completeness: (Chapt 34.)
Basic concepts: Decision problems, polynomial time, the class P, certificates and the class NP, polynomial time reductions.
NP-completeness reductions: You are responsible for knowing the following reductions.
• 3-coloring to clique cover.
• 3SAT to Independent Set (IS).
• Independent Set to Vertex Cover and Clique.
• Vertex Cover to Dominating Set.
• Vertex Cover to Subset Sum.
It is also a good idea to understand all the reductions that were used in the homework solutions, since modifications of these will likely appear on the final.

NP-complete reductions can be challenging. If you cannot see how to solve the problem, here are some suggestions for maximizing partial credit.

All NP-complete proofs have a very specific form. Explain that you know the template, and try to fill in as many aspects as possible. Suppose that you want to prove that some problem B is NP-complete.
• B ∈ NP. This is almost always easy, so don't blow it. This basically involves specifying the certificate. The certificate is almost always the thing that the problem is asking you to find.
• For some known NP-complete problem A, A ≤P B. This means that you want to find a polynomial time function f that maps an instance of A to an instance of B. (Make sure to get the direction correct!)
• Show the correctness of your reduction, by showing that x ∈ A if and only if f(x) ∈ B. First suppose that you have a solution to x and show how to map this to a solution for f(x). Then suppose that you have a solution to f(x) and show how to map this to a solution for x.

If you cannot figure out what f is, at least tell me what you would like f to do. Explain which elements of problem A will likely map to which elements of problem B. Remember that you are trying to translate the elements of one problem into the common elements of the other problem.

I try to make at least one reduction on the exam similar to one that you have seen before, so make sure that you understand the ones that we have done either in class or on homework problems.

Approximation Algorithms: (Chapt. 35, up through 35.2.)
Vertex cover: Ratio bound of 2.
TSP with triangle inequality: Ratio bound of 2.
Set Cover: Ratio bound of ln m, where m = |X|.
Bin packing: Ratio bound of 2.
k-center: Ratio bound of 2.
Many approximation algorithms are simple. (Most are based on simple greedy heuristics.) The key to proving many ratio bounds is first coming up with a lower bound on the optimal solution (e.g., TSPopt ≥ MST). Next, provide an upper bound on the cost of your heuristic relative to this same quantity (e.g., the shortcut twice-around tour for the MST is at most twice the MST cost).
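As a reminder of how the TSP heuristic behind that last example works, here is a minimal Python sketch of the MST-plus-preorder procedure from Lecture 21. It is an illustration under stated assumptions rather than the notes' own code: the input is assumed to be a list of points in the plane with Euclidean distances (so the triangle inequality holds), the MST is built with a simple O(n^2) version of Prim's algorithm, and the names `approx_tsp` and `tour_cost` are hypothetical.

```python
import math


def approx_tsp(points, root=0):
    """Return a tour (list of point indices) from a preorder walk of an MST."""
    n = len(points)

    def c(u, v):                         # Euclidean edge cost
        return math.dist(points[u], points[v])

    in_tree = [False] * n
    best = [math.inf] * n                # cheapest known edge into the tree
    parent = [None] * n
    children = [[] for _ in range(n)]
    best[root] = 0.0
    for _ in range(n):                   # Prim's algorithm, O(n^2)
        u = min((v for v in range(n) if not in_tree[v]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and c(u, v) < best[v]:
                best[v] = c(u, v)
                parent[v] = u

    tour, stack = [], [root]             # iterative preorder walk = shortcut tour
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour


def tour_cost(points, tour):
    """Cost of the closed tour, including the edge back to the start."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))
```

On the four corners of the unit square the MST is a path of three unit edges, and the preorder walk visits the corners in cyclic order, giving a tour of cost 4, which happens to be optimal here. In general the analysis only promises a cost of at most twice the optimum.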

Supplemental Lecture 1: Asymptotics

Read: Chapters 2–3 in CLRS.

Asymptotics: The formulas that are derived for the running times of programs may often be quite complex. When designing algorithms, the main purpose of the analysis is to get a sense for the trend in the algorithm's running time. (An exact analysis is probably best done by implementing the algorithm and measuring CPU seconds.) We would like a simple way of representing complex functions, which captures the essential growth rate properties. This is the purpose of asymptotics.

Asymptotic analysis is based on two simplifying assumptions, which hold in most (but not all) cases. But it is important to understand these assumptions and the limitations of asymptotic analysis.

Large input sizes: We are most interested in how the running time grows for large values of n.

Ignore constant factors: The actual running time of the program depends on various constant factors in the implementation (coding tricks, optimizations in compilation, speed of the underlying hardware, etc). Therefore, we will ignore constant factors.

The justification for considering large n is that if n is small, then almost any algorithm is fast enough. People are most concerned about running times for large inputs. For the most part, these assumptions are reasonable when making comparisons between functions that have significantly different behaviors. For example, suppose we have two programs, one whose running time is T1(n) = n^3 and another whose running time is T2(n) = 100n. (The latter algorithm may be faster because it uses a more sophisticated and complex algorithm, and the added sophistication results in a larger constant factor.) For small n (e.g., n ≤ 10) the first algorithm is at least as fast as the second. But as n becomes larger the relative differences in running time become much greater. Assuming one million operations per second:

    n         T1(n)        T2(n)        T1(n)/T2(n)
    10        0.001 sec    0.001 sec    1
    100       1 sec        0.01 sec     100
    1000      17 min       0.1 sec      10,000
    10,000    11.6 days    1 sec        1,000,000

The clear lesson is that as input sizes grow, the performance of the asymptotically poorer algorithm degrades much more rapidly.

These assumptions are not always reasonable. For example, in any particular application, n is a fixed value. It may be the case that one function is smaller than another asymptotically, but for your value of n, the asymptotically larger value is fine. Most of the algorithms that we will study this semester will have both low constants and low asymptotic running times, so we will not need to worry about these issues.

Asymptotic Notation: To represent the running times of algorithms in a simpler form, we use asymptotic notation, which essentially represents a function by its fastest growing term and ignores constant factors. For example, suppose we have an algorithm whose (exact) worst-case running time is given by the following formula:

T(n) = 13n^3 + 5n^2 − 17n + 16.

As n becomes large, the 13n^3 term dominates the others. By ignoring constant factors, we might say that the running time grows "on the order of" n^3, which we will express mathematically as T(n) ∈ Θ(n^3). This intuitive definition is fine for informal use. Let us consider how to make this idea mathematically formal.

Definition: Given any function g(n), we define Θ(g(n)) to be a set of functions:

Θ(g(n)) = {f(n) | there exist strictly positive constants c1, c2, and n0 such that 0 ≤ c1 g(n) ≤ f(n) ≤ c2 g(n) for all n ≥ n0}.
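To get a feel for this definition, the following Python sketch checks the two inequalities numerically for the running time T(n) = 13n^3 + 5n^2 − 17n + 16 given above. The constants c1 = 13, c2 = 34 and n0 = 1 are my own choice (they follow from 5n^2 − 17n + 16 > 0 for all n, and from 5n^2 ≤ 5n^3 and 16 ≤ 16n^3 for n ≥ 1); the helper name `within_theta_bounds` is hypothetical. A finite check like this is only a sanity test, not a proof, since the definition quantifies over all n ≥ n0.

```python
def T(n):
    """Exact worst-case running time used as the example in the text."""
    return 13 * n**3 + 5 * n**2 - 17 * n + 16


def within_theta_bounds(f, g, c1, c2, n0, n_max):
    """Check 0 <= c1*g(n) <= f(n) <= c2*g(n) for every integer n in [n0, n_max]."""
    return all(0 <= c1 * g(n) <= f(n) <= c2 * g(n) for n in range(n0, n_max + 1))


def cube(n):
    return n**3


def square(n):
    return n**2


# The cubic bounds hold on the whole tested range.
cubic_ok = within_theta_bounds(T, cube, 13, 34, 1, 10_000)

# A quadratic upper bound fails once n is large, even with a generous constant.
quadratic_ok = within_theta_bounds(T, square, 1, 1000, 1, 10_000)
```

Here `cubic_ok` comes out true and `quadratic_ok` comes out false, which matches the informal claim that T(n) grows on the order of n^3 and not n^2.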

Let's dissect this definition. Intuitively, what we want to say with "f(n) ∈ Θ(g(n))" is that f(n) and g(n) are asymptotically equivalent. This means that they have essentially the same growth rates for large n. For example, functions such as

4n^2, (8n^2 + 2n − 3), (n^2/5 + √n − 10 log n), and n(n − 3)

are all intuitively asymptotically equivalent, since as n becomes large, the dominant (fastest growing) term is some constant times n^2. In other words, they all grow quadratically in n. The portion of the definition that allows us to select c1 and c2 is essentially saying "the constants do not matter because you may pick c1 and c2 however you like to satisfy these conditions." The portion of the definition that allows us to select n0 is essentially saying "we are only interested in large n, since you only have to satisfy the condition for all n bigger than n0, and you may make n0 as big a constant as you like."

An example: Consider the function f(n) = 8n^2 + 2n − 3. Our informal rule of keeping the largest term and throwing away the constants suggests that f(n) ∈ Θ(n^2) (since f grows quadratically). Let's see why the formal definition bears out this informal observation.

We need to show two things: first, that f(n) does grow asymptotically at least as fast as n^2, and second, that f(n) grows no faster asymptotically than n^2. We'll do both very carefully.

Lower bound: f(n) grows asymptotically at least as fast as n^2: This is established by the portion of the definition that reads (paraphrasing): "there exist positive constants c1 and n0, such that f(n) ≥ c1 n^2 for all n ≥ n0." Consider the following (almost correct) reasoning:

f(n) = 8n^2 + 2n − 3 ≥ 8n^2 − 3 = 7n^2 + (n^2 − 3) ≥ 7n^2.

Thus, if we set c1 = 7, then we are done. But in the above reasoning we have implicitly made the assumptions that 2n ≥ 0 and n^2 − 3 ≥ 0. These are not true for all n, but they are true for all sufficiently large n. In particular, if n ≥ √3, then both are true. So let us select n0 ≥ √3, and now we have f(n) ≥ c1 n^2, for all n ≥ n0, which is what we need.

Upper bound: f(n) grows asymptotically no faster than n^2: This is established by the portion of the definition that reads "there exist positive constants c2 and n0, such that f(n) ≤ c2 n^2 for all n ≥ n0." Consider the following reasoning (which is almost correct):

f(n) = 8n^2 + 2n − 3 ≤ 8n^2 + 2n ≤ 8n^2 + 2n^2 = 10n^2.

This means that if we let c2 = 10, then we are done. We have implicitly made the assumption that 2n ≤ 2n^2. This is not true for all n, but it is true for all n ≥ 1. So, let us select n0 ≥ 1, and now we have f(n) ≤ c2 n^2 for all n ≥ n0, which is what we need.

From the lower bound, we have n0 ≥ √3 and from the upper bound we have n0 ≥ 1, and so combining these we let n0 be the larger of the two: n0 = √3. Thus, in conclusion, if we let c1 = 7, c2 = 10, and n0 = √3, then we have

0 ≤ c1 g(n) ≤ f(n) ≤ c2 g(n) for all n ≥ n0,

and this is exactly what the definition requires. Since we have shown (by construction) the existence of constants c1, c2, and n0, we have established that f(n) ∈ Θ(n^2). (Whew! That was a lot more work than just the informal notion of throwing away constants and keeping the largest term, but it shows how this informal notion is implemented formally in the definition.)

Now let's show why f(n) is not in some other asymptotic class. First, let's show that f(n) ∉ Θ(n). If this were true, then we would have to satisfy both the upper and lower bounds. It turns out that the lower bound is satisfied (because f(n) grows at least as fast asymptotically as n). But the upper bound is false. In particular, the upper bound requires us to show "there exist positive constants c2 and n0, such that f(n) ≤ c2 n for all n ≥ n0." Informally, we know that as n becomes large enough f(n) = 8n^2 + 2n − 3 will eventually exceed c2 n no matter

You will see that O-notation only enforces the upper bound of the Θ deﬁnition. The O-notation allows us to state asymptotic upper bounds and the Ω-notation allows us to state asymptotic lower bounds. such that 8n2 + 2n − 3 ≥ c1 n3 for all n ≥ n0 . such that 8n2 + 2n − 3 ≤ c2 n for all n ≥ n0 . Limit Rule for Θ-notation: Given positive functions f (n) and g(n). For example f (n) = 3n2 + 4n ∈ Θ(n2 ) but it is not in Θ(n) or Θ(n3 ). such that f (n) ≥ c1 n3 for all n ≥ n0 . Also observe that f (n) ∈ Θ(g(n)) if and only if f (n) ∈ O(g(n)) and f (n) ∈ Ω(g(n)). Lecture Notes 93 CMSC 451 . g(n) n→∞ for some constant c > 0 (strictly positive but not inﬁnity).how large we make c2 (since f (n) is growing quadratically and c2 n is only growing linearly). and so the only way to satisfy this requirement is to set / c1 = 0. The Limit Rule for Θ: The previous examples which used limits suggest alternative way of showing that f (n) ∈ Θ(g(n)). suppose towards a contradiction that constants c2 and n0 did exist. suppose towards a contradiction that constants c1 and n0 did exist. To show this formally. Compare this with the deﬁnition of Θ. If we divide both side by n3 we have: lim 2 3 8 + − 3 n n2 n ≥ c1 . To show this formally. This means that f (n) ∈ Θ(n3 ). if lim f (n) = c. This means that f (n) ∈ Θ(n). Ω(g(n)) = {f (n) | there exist positive constants c and n0 such that 0 ≤ cg(n) ≤ f (n) for all n ≥ n0 }. / Let’s show that f (n) ∈ Θ(n3 ). Since this is true for all sufﬁciently large n then it must be true in the limit as n tends to inﬁnity. and so no matter how large c2 is. Since this is true for all sufﬁciently large n then it must be true in the limit as n tends to inﬁnity. If we divide both side by n we have: 3 ≤ c2 . f (n) ∈ O(g(n)) means that f (n) grows asymptotically at the same rate or faster than g(n). Deﬁnition: Given any function g(n). this statement is violated. Here the idea will be to violate the lower bound: “there exist positive constants / c1 and n0 . 
But f (n) ∈ O(n2 ) and in O(n3 ) but not in O(n).” Informally this is true because f (n) is growing quadratically. Whereas. Intuitively. and eventually any cubic function will exceed it. Deﬁnition: Given any function g(n). Sometimes we are only interested in proving one bound or the other. and Ω-notation only enforces the lower bound. but by hypothesis c1 is positive. lim 8n + 2 − n→∞ n It is easy to see that in the limit the left side tends to ∞. Finally. f (n) ∈ Ω(n2 ) and in Ω(n) but not in Ω(n3 ). O-notation and Ω-notation: We have seen that the deﬁnition of Θ-notation relies on proving both a lower and upper asymptotic bound. f (n) ∈ O(g(n)) means that f (n) grows asymptotically at the same rate or slower than g(n). n→∞ It is easy to see that in the limit the left side tends to 0. O(g(n)) = {f (n) | there exist positive constants c and n0 such that 0 ≤ f (n) ≤ cg(n) for all n ≥ n0 }. then f (n) ∈ Θ(g(n)).
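As a sanity check on the example $f(n) = 8n^2 + 2n - 3$, the following short Python sketch tests the sandwich $7n^2 \le f(n) \le 10n^2$ numerically and shows how a linear bound $c_2 n$ is eventually overtaken. The function names are illustrative only (they do not come from the notes), and a numerical check over a finite range is of course not a proof.

```python
def f(n):
    """The running example f(n) = 8n^2 + 2n - 3."""
    return 8 * n * n + 2 * n - 3

C1, C2 = 7, 10  # the constants c1 and c2 chosen in the text

def sandwich_holds(n):
    """Check 0 <= c1*n^2 <= f(n) <= c2*n^2 for one value of n."""
    return 0 <= C1 * n * n <= f(n) <= C2 * n * n

def first_n_exceeding_linear(c2):
    """Smallest n >= 1 with f(n) > c2*n.

    Such an n exists for every c2, which is why f(n) is not in O(n).
    """
    n = 1
    while f(n) <= c2 * n:
        n += 1
    return n

if __name__ == "__main__":
    # Every integer n >= 2 satisfies n >= sqrt(3), so the sandwich should hold.
    print(all(sandwich_holds(n) for n in range(2, 10_000)))
    # Even a generous linear bound such as 1000*n is eventually exceeded.
    print(first_n_exceeding_linear(1000))
    # The ratio f(n)/n^2 approaches the leading coefficient 8.
    print(f(10**6) / (10**6) ** 2)
```

The last printed ratio illustrates the limit rule: a limit that is a strictly positive constant corresponds to membership in $\Theta(n^2)$.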

Limit Rule for O-notation: Given positive functions $f(n)$ and $g(n)$, if

$$\lim_{n \to \infty} \frac{f(n)}{g(n)} = c,$$

for some constant $c \ge 0$ (nonnegative but not infinite), then $f(n) \in O(g(n))$.

Limit Rule for Ω-notation: Given positive functions $f(n)$ and $g(n)$, if

$$\lim_{n \to \infty} \frac{f(n)}{g(n)} \ne 0$$

(either a strictly positive constant or infinity) then $f(n) \in \Omega(g(n))$.

This limit rule can be applied in almost every instance (that I know of) where the formal definition can be used, and it is almost always easier to apply than the formal definition. The only exceptions that I know of are strange instances where the limit does not exist (e.g. $f(n) = n^{(1 + \sin n)}$). But since most running times are fairly well-behaved functions this is rarely a problem.

For example, recall the function $f(n) = 8n^2 + 2n - 3$. To show that $f(n) \in \Theta(n^2)$ we let $g(n) = n^2$ and compute the limit. We have

$$\lim_{n \to \infty} \frac{8n^2 + 2n - 3}{n^2} = \lim_{n \to \infty} \left( 8 + \frac{2}{n} - \frac{3}{n^2} \right) = 8$$

(since the two fractional terms tend to 0 in the limit). Since 8 is a nonzero constant, it follows that $f(n) \in \Theta(g(n))$.

You may recall the important rules from calculus for evaluating limits. (If not, dredge out your calculus book to remember.) Most of the rules are pretty self evident (e.g., the limit of a finite sum is the sum of the individual limits). One important rule to remember is the following:

L'Hôpital's rule: If $f(n)$ and $g(n)$ both approach 0 or both approach $\infty$ in the limit, then

$$\lim_{n \to \infty} \frac{f(n)}{g(n)} = \lim_{n \to \infty} \frac{f'(n)}{g'(n)},$$

where $f'(n)$ and $g'(n)$ denote the derivatives of $f$ and $g$ relative to $n$.

Exponentials and Logarithms: Exponentials and logarithms are very important in analyzing algorithms. The following are nice to keep in mind. The terminology $\lg^b n$ means $(\lg n)^b$.

Lemma: Given any positive constants $a > 1$, $b$, and $c$:

$$\lim_{n \to \infty} \frac{n^b}{a^n} = 0 \qquad\qquad \lim_{n \to \infty} \frac{\lg^b n}{n^c} = 0.$$

We won't prove these, but they can be shown by taking appropriate powers, and then applying L'Hôpital's rule. The important bottom line is that polynomials always grow more slowly than exponentials whose base is greater than 1. For example: $n^{500} \in O(2^n)$. For this reason, we will try to avoid exponential running times at all costs. Conversely, logarithmic powers (sometimes called polylogarithmic functions) grow more slowly than any polynomial. For example: $\lg^{500} n \in O(n)$.
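The lemma is a statement about limits, so it says nothing about how large $n$ must be before $\lg^b n \le n$ actually holds. The sketch below estimates that threshold for inputs of the form $n = 2^m$, where the condition becomes $m^b \le 2^m$ and can be tested with exact integer arithmetic. The function name is hypothetical, and restricting to powers of two is a simplifying assumption made here, so the result is the smallest integer exponent rather than the exact real-valued crossover.

```python
import math

def polylog_crossover_exponent(b):
    """Smallest integer m such that (lg n)^b <= n holds for n = 2^m and beyond.

    For n = 2^m the condition reads m^b <= 2^m. The gap m - b*lg(m) is
    increasing once m >= b / ln 2, so the search starts there and the first
    success is the point after which the inequality keeps holding.
    """
    m = max(2, math.ceil(b / math.log(2)))
    while m ** b > 2 ** m:   # exact big-integer comparison
        m += 1
    return m

if __name__ == "__main__":
    for b in (2, 3, 10, 500):
        m = polylog_crossover_exponent(b)
        print(f"lg^{b} n <= n for all n = 2^m with m >= {m}")
```

For small powers the threshold is tiny, while for $b = 500$ the exponent runs into the thousands, far beyond any realistic input size.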

Because polylogarithmic functions grow so slowly, we will usually be happy to allow any number of additional logarithmic factors, if it means avoiding any additional powers of $n$.

At this point, it should be mentioned that these last observations are really asymptotic results. They are true in the limit for large $n$, but you should be careful just how high the crossover point is. For example, by my calculations, $\lg^{500} n \le n$ only for $n$ larger than roughly $2^{6000}$ (which is much larger than any input size you'll ever see). Thus, you should take this with a grain of salt. But, for small powers of logarithms, this applies to all reasonably large input sizes. For example $\lg^2 n \le n$ for all $n \ge 16$.

Asymptotic Intuition: To get an intuitive feeling for what common asymptotic running times map into in terms of practical usage, here is a little list.

• $\Theta(1)$: Constant time; you can't beat it!

• $\Theta(\log n)$: This is typically the speed that most efficient data structures operate in for a single access. (E.g., inserting a key into a balanced binary tree.) Also it is the time to find an object in a sorted list of length $n$ by binary search.

• $\Theta(n)$: This is about the fastest that an algorithm can run, given that you need $\Theta(n)$ time just to read in all the data.

• $\Theta(n \log n)$: This is the running time of the best sorting algorithms. Since many problems require sorting the inputs, this is still considered quite efficient.

• $\Theta(n^2), \Theta(n^3), \ldots$: Polynomial time. These running times are acceptable either when the exponent is small or when the data size is not too large (e.g. $n \le 1{,}000$).

• $\Theta(2^n), \Theta(3^n)$: Exponential time. This is only acceptable when either (1) you know that your inputs will be of very small size (e.g. $n \le 50$), or (2) you know that this is a worst-case running time that will rarely occur in practical instances. In case (2), it would be a good idea to try to get a more accurate average case analysis.

• $\Theta(n!), \Theta(n^n)$: Acceptable only for really small inputs (e.g. $n \le 20$).

Are there even bigger functions? Definitely! For example, if you want to see a function that grows inconceivably fast, look up the definition of Ackermann's function in our text.

Max Dominance Revisited: Returning to our Max Dominance algorithms, recall that one had a running time of $T_1(n) = n^2$ and the other had a running time of $T_2(n) = n \log n + n(n - 1)/2$. Expanding the latter function and grouping terms in order of their growth rate we have

$$T_2(n) = \frac{n^2}{2} + n \log n - \frac{n}{2}.$$

We will leave it as an easy exercise to show that both $T_1(n)$ and $T_2(n)$ are $\Theta(n^2)$. Although the second algorithm is twice as fast for large $n$ (because of the 1/2 factor multiplying the $n^2$ term), this does not represent a significant improvement.

Supplemental Lecture 2: Max Dominance

Read: Review Chapters 1–4 in CLRS.

Faster Algorithm for Max-Dominance: Recall the max-dominance problem from the last two lectures. So far we have introduced a simple brute-force algorithm that ran in $O(n^2)$ time, which operated by comparing all pairs of points. Last time we considered a slight improvement, which sorted the points by their x-coordinate, and then compared each point against the subsequent points in the sorted order. However, this only improved matters by a constant factor. The question we consider today is whether there is an approach that is significantly better.

A Major Improvement: The problem with the previous algorithm is that, even though we have cut the number of comparisons roughly in half, each point is still making lots of comparisons. Can we save time by making only one comparison for each point? The inner while loop is testing to see whether any point that follows $P[i]$ in the sorted list has a larger y-coordinate. This suggests that if we knew which point among $P[i + 1, \ldots, n]$ had the maximum y-coordinate, we could just test against that point. How can we do this? Here is a simple observation. For any set of points, the point with the maximum y-coordinate is the maximal point with the smallest x-coordinate. This suggests that we can sweep the points backwards, from right to left. We keep track of the index $j$ of the most recently seen maximal point. (Initially the rightmost point is maximal.) When we encounter the point $P[i]$, it is maximal if and only if $P[i].y \ge P[j].y$. This suggests the following algorithm.

Max Dominance: Sort and Reverse Scan

    MaxDom3(P, n) {
        Sort P in ascending order by x-coordinate;
        output P[n];                      // last point is always maximal
        j = n;
        for i = n-1 downto 1 {
            if (P[i].y >= P[j].y) {       // is P[i] maximal?
                output P[i];              // yes...output it
                j = i;                    // P[i] has the largest y so far
            }
        }
    }

The running time of the for-loop is obviously $O(n)$, because there is just a single loop that is executed $n - 1$ times, and the code inside takes constant time. The total running time is dominated by the $O(n \log n)$ sorting time, for a total of $O(n \log n)$ time.

How much of an improvement is this? Probably the most accurate way to find out would be to code the two up, and compare their running times. But just to get a feeling, let's look at the ratio of the running times, ignoring constant factors:

$$\frac{n^2}{n \lg n} = \frac{n}{\lg n}.$$

(I use the notation $\lg n$ to denote the logarithm base 2, $\ln n$ to denote the natural logarithm (base $e$) and $\log n$ when I do not care about the base. Note that a change in base only affects the value of a logarithm function by a constant factor, so inside of O-notation, we will usually just write $\log n$.)

For relatively small values of $n$ (e.g. less than 100), both algorithms are probably running fast enough that the difference will be practically negligible. (Rule 1 of algorithm optimization: Don't optimize code that is already fast enough.) On larger inputs, say, $n = 1{,}000$, the ratio of $n$ to $\log n$ is about $1000/10 = 100$, so there is a 100-to-1 ratio in running times. Of course, we would need to factor in constant factors, but since we are not using any really complex data structures, it is hard to imagine that the constant factors will differ by more than, say, 10. For even larger inputs, say, $n = 1{,}000{,}000$, we are looking at a ratio of roughly $1{,}000{,}000/20 = 50{,}000$. This is quite a significant difference, irrespective of the constant factors.

Divide and Conquer Approach: One problem with the previous algorithm is that it relies on sorting. This is nice and clean (since it is usually easy to get good code for sorting without troubling yourself to write your own). However, if you really wanted to squeeze the most efficiency out of your code, you might consider whether you can solve this problem without invoking a sorting algorithm.

One of the basic maxims of algorithm design is to first approach any problem using one of the standard algorithm design paradigms, e.g. divide and conquer, dynamic programming, greedy algorithms, depth-first search. We will talk more about these methods as the semester continues. For this problem, divide-and-conquer is a natural method to choose. What is this paradigm?

Divide: Divide the problem into two subproblems (ideally of approximately equal sizes),

Conquer: Solve each subproblem recursively, and

Combine: Combine the solutions to the two subproblems into a global solution.

How shall we divide the problem? I can think of a couple of ways. One is similar to how MergeSort operates. Just take the array of points $P[1..n]$, and split into two subarrays of equal size $P[1..n/2]$ and $P[n/2 + 1..n]$. Because we do not sort the points, there is no particular relationship between the points in one side of the list and the other.

Another approach, which is more reminiscent of QuickSort, is to select a random element from the list, called a pivot, $x = P[r]$, where $r$ is a random integer in the range from 1 to $n$, and then partition the list into two sublists, those elements whose x-coordinates are less than or equal to $x$ and those that are greater than $x$. This will not be guaranteed to split the list into two equal parts, but on average it can be shown that it does a pretty good job.

Let's consider the first method. (The quicksort method will also work, but leads to a tougher analysis.) Here is a more concrete outline. We will describe the algorithm at a very high level. The input will be a point array, and a point array will be returned. The key ingredient is a function that takes the maxima of two sets, and merges them into an overall set of maxima.

Max Dominance: Divide-and-Conquer

    MaxDom4(P, n) {
        if (n == 1) return {P[1]};         // one point is trivially maximal
        m = n/2;                           // midpoint of list
        M1 = MaxDom4(P[1..m], m);          // solve for first half
        M2 = MaxDom4(P[m+1..n], n-m);      // solve for second half
        return MaxMerge(M1, M2);           // merge the results
    }

The general process is illustrated below.

The main question is how the procedure MaxMerge() is implemented, because it does all the work. Let us assume that it returns a list of points in sorted order according to x-coordinates of the maximal points. Observe that if a point is to be maximal overall, then it must be maximal in one of the two sublists. However, just because a point is maximal in some list does not imply that it is globally maximal. (Consider point (7, 10) in the example.) However, if it dominates all the points of the other sublist, then we can assert that it is maximal.

I will describe the procedure at a very high level. It operates by walking through each of the two sorted lists of maximal points. It maintains two pointers, one pointing to the next unprocessed item in each list. Think of these as fingers. Take the finger pointing to the point with the smaller x-coordinate. If its y-coordinate is larger than the y-coordinate of the point under the other finger, then this point is maximal, and is copied to the next position of the result list. Otherwise it is not copied. In either case, we move to the next point in the same list, and repeat the process. The result list is returned.

The details will be left as an exercise. Observe that because we spend a constant amount of time processing each point (either copying it to the result list or skipping over it) the total execution time of this procedure is $O(n)$.

Recurrences: How do we analyze recursive procedures like this one? If there is a simple pattern to the sizes of the recursive calls, then the best way is usually by setting up a recurrence, that is, a function which is defined recursively in terms of itself.

We break the problem into two subproblems of size roughly $n/2$ (we will say exactly $n/2$ for simplicity), and the additional overhead of merging the solutions is $O(n)$. We will ignore constant factors, writing $O(n)$ just as $n$, giving:

$$T(n) = \begin{cases} 1 & \text{if } n = 1, \\ 2T(n/2) + n & \text{if } n > 1. \end{cases}$$
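The two $O(n \log n)$ approaches described above (sort and reverse scan, and divide-and-conquer with a merge of maxima) can be sketched in Python as follows. This is one possible reading of the high-level description, not the notes' own code: points are assumed to be `(x, y)` tuples with all x-coordinates distinct and all y-coordinates distinct, the names `max_dom3`, `max_dom4`, `max_merge`, and `maxima_brute` are chosen here for illustration, and the list slicing in `max_dom4` copies data for simplicity.

```python
def dominates(p, q):
    """True if p dominates q: p is at least as large in both coordinates."""
    return p != q and p[0] >= q[0] and p[1] >= q[1]

def maxima_brute(points):
    """O(n^2) reference: keep the points that no other point dominates."""
    return sorted(p for p in points if not any(dominates(q, p) for q in points))

def max_dom3(points):
    """Sort by x, then sweep right to left tracking the largest y seen so far."""
    pts = sorted(points)
    result = [pts[-1]]            # the rightmost point is always maximal
    best_y = pts[-1][1]
    for p in reversed(pts[:-1]):
        if p[1] >= best_y:        # nothing to the right is higher
            result.append(p)
            best_y = p[1]
    return sorted(result)

def max_merge(m1, m2):
    """Merge two maxima lists, each sorted by x (so y is decreasing in each)."""
    i = j = 0
    out = []
    while i < len(m1) and j < len(m2):
        if m1[i][0] < m2[j][0]:   # finger on the point with smaller x
            if m1[i][1] > m2[j][1]:
                out.append(m1[i])
            i += 1
        else:
            if m2[j][1] > m1[i][1]:
                out.append(m2[j])
            j += 1
    # Once one list is exhausted, the rest of the other list is maximal.
    out.extend(m1[i:])
    out.extend(m2[j:])
    return out

def max_dom4(points):
    """Divide and conquer on an unsorted list; returns maxima sorted by x."""
    if len(points) == 1:
        return list(points)
    m = len(points) // 2
    return max_merge(max_dom4(points[:m]), max_dom4(points[m:]))

if __name__ == "__main__":
    pts = [(4, 5), (1, 4), (6, 3), (3, 1), (2, 6), (5, 2)]
    print(max_dom3(pts))
    print(max_dom4(pts))
```

Comparing both functions against `maxima_brute` on small inputs is a cheap way to catch mistakes in the merge step, which is where the subtlety lies.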

14) (7.4) 4 6 8 10 12 14 16 Input and initial partition.14) (7.10) (15.5) (5.10) (14.5) (4.14 12 10 8 6 4 2 2 (2.12) (14. Lecture Notes 98 CMSC 451 .12) (4.3) 4 6 8 10 12 14 16 Merged solution.10) (14.13) 2 4 6 8 10 12 14 16 Solutions to subproblems.3) (16.10) (15.10) (2. Fig. (12.7) (7.7) (7.1) (13.10) (15.14) (7.1) (13.13) (12. 68: Divide and conquer approach.4) 14 12 10 8 6 4 2 (2. 14 12 10 8 6 4 2 2 (5.11) (9.13) (12.7) (11.11) (9.11) (9.7) (11.7) (11.1) (7.7) (16.5) (5.12) (4.4) (13.3) (16.

Solving Recurrences by The Master Theorem: There are a number of methods for solving the sort of recurrences that show up in divide-and-conquer algorithms. The easiest method is to apply the Master Theorem that is given in CLRS. Here is a slightly more restrictive version, but adequate for a lot of instances. See CLRS for the more complete version of the Master Theorem and its proof.

Theorem: (Simplified Master Theorem) Let $a \ge 1$, $b > 1$ be constants and let $T(n)$ be the recurrence

$$T(n) = aT(n/b) + cn^k,$$

defined for $n \ge 0$.

Case (1): $a > b^k$ then $T(n)$ is $\Theta(n^{\log_b a})$.

Case (2): $a = b^k$ then $T(n)$ is $\Theta(n^k \log n)$.

Case (3): $a < b^k$ then $T(n)$ is $\Theta(n^k)$.

Using this version of the Master Theorem we can see that in our recurrence $a = 2$, $b = 2$, and $k = 1$, so $a = b^k$ and case (2) applies. Thus $T(n)$ is $\Theta(n \log n)$.

There are many recurrences that cannot be put into this form. For example, the following recurrence is quite common: $T(n) = 2T(n/2) + n \log n$. This solves to $T(n) = \Theta(n \log^2 n)$, but the Master Theorem (either this form or the one in CLRS) will not tell you this. For such recurrences, other methods are needed.

Expansion: A more basic method for solving recurrences is that of expansion (which CLRS calls iteration). This is a rather painstaking process of repeatedly applying the definition of the recurrence until (hopefully) a simple pattern emerges. This pattern usually results in a summation that is easy to solve. If you look at the proof in CLRS for the Master Theorem, it is actually based on expansion.

Let us consider applying this to the following recurrence. We assume that $n$ is a power of 3.

$$T(1) = 1, \qquad T(n) = 2T\left(\frac{n}{3}\right) + n \quad \text{if } n > 1.$$

First we expand the recurrence into a summation, until seeing the general pattern emerge.

$$\begin{aligned}
T(n) &= 2T\left(\frac{n}{3}\right) + n \\
&= 2\left(2T\left(\frac{n}{9}\right) + \frac{n}{3}\right) + n = 4T\left(\frac{n}{9}\right) + \left(n + \frac{2n}{3}\right) \\
&= 4\left(2T\left(\frac{n}{27}\right) + \frac{n}{9}\right) + \left(n + \frac{2n}{3}\right) = 8T\left(\frac{n}{27}\right) + \left(n + \frac{2n}{3} + \frac{4n}{9}\right) \\
&\;\;\vdots \\
&= 2^k T\left(\frac{n}{3^k}\right) + \sum_{i=0}^{k-1} \frac{2^i n}{3^i} = 2^k T\left(\frac{n}{3^k}\right) + n \sum_{i=0}^{k-1} (2/3)^i.
\end{aligned}$$

The parameter $k$ is the number of expansions (not to be confused with the value of $k$ we introduced earlier on the overhead). We want to know how many expansions are needed to arrive at the basis case. To do this we set $n/(3^k) = 1$, meaning that $k = \log_3 n$. Substituting this in and using the identity $a^{\log b} = b^{\log a}$ we have:

$$T(n) = 2^{\log_3 n}\, T(1) + n \sum_{i=0}^{\log_3 n - 1} (2/3)^i = n^{\log_3 2} + n \sum_{i=0}^{\log_3 n - 1} (2/3)^i.$$
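Expansions like this are easy to get slightly wrong, so it can help to compare the expanded sum against the recurrence directly. The Python sketch below evaluates $T(n) = 2T(n/3) + n$ for powers of 3 and checks it against $2^k + n \sum_{i=0}^{k-1} (2/3)^i$ using exact rational arithmetic; it also encodes the case analysis of the simplified Master Theorem. The function names are illustrative and not part of the notes.

```python
from fractions import Fraction

def T(n):
    """T(1) = 1 and T(n) = 2*T(n/3) + n, for n a power of 3."""
    return 1 if n == 1 else 2 * T(n // 3) + n

def T_expanded(k):
    """Expanded form for n = 3^k: 2^k * T(1) + n * sum_{i=0}^{k-1} (2/3)^i."""
    n = 3 ** k
    return 2 ** k + n * sum(Fraction(2, 3) ** i for i in range(k))

def master_case(a, b, k):
    """Case of the simplified Master Theorem for T(n) = a*T(n/b) + c*n^k."""
    if a > b ** k:
        return 1   # Theta(n^(log_b a))
    if a == b ** k:
        return 2   # Theta(n^k log n)
    return 3       # Theta(n^k)

if __name__ == "__main__":
    for k in range(6):
        print(3 ** k, T(3 ** k), T_expanded(k))
    print(master_case(2, 2, 1))   # the max-dominance recurrence
    print(master_case(2, 3, 1))   # the recurrence expanded above
```

The recurrence and the expanded form should agree for every power of 3, and the Master Theorem case for $a = 2$, $b = 3$, $k = 1$ gives an independent check on the growth rate that the expansion leads to.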

Next, we can apply the formula for the geometric series to this sum and simplify to get:

$$\begin{aligned}
T(n) &= n^{\log_3 2} + n\, \frac{1 - (2/3)^{\log_3 n}}{1 - (2/3)} \\
&= n^{\log_3 2} + 3n\left(1 - (2/3)^{\log_3 n}\right) = n^{\log_3 2} + 3n\left(1 - n^{\log_3 (2/3)}\right) \\
&= n^{\log_3 2} + 3n\left(1 - n^{(\log_3 2) - 1}\right) = n^{\log_3 2} + 3n - 3n^{\log_3 2} \\
&= 3n - 2n^{\log_3 2}.
\end{aligned}$$

Since $\log_3 2 \approx 0.631 < 1$, $T(n)$ is dominated by the $3n$ term asymptotically, and so it is $\Theta(n)$.

Induction and Constructive Induction: Another technique for solving recurrences (and this works for summations as well) is to guess the solution, or the general form of the solution, and then attempt to verify its correctness through induction. Sometimes there are parameters whose values you do not know. This is fine. In the course of the induction proof, you will usually find out what these values must be. We will consider a famous example, that of the Fibonacci numbers.

$$F_0 = 0, \qquad F_1 = 1, \qquad F_n = F_{n-1} + F_{n-2} \quad \text{for } n \ge 2.$$

The Fibonacci numbers arise in data structure design. If you study AVL (height balanced) trees in data structures, you will learn that the minimum-sized AVL trees are produced by the recursive construction given below. Let $L(i)$ denote the number of leaves in the minimum-sized AVL tree of height $i$. To construct a minimum-sized AVL tree of height $i$, you create a root node whose children consist of minimum-sized AVL trees of heights $i - 1$ and $i - 2$. Thus the number of leaves obeys $L(0) = L(1) = 1$, $L(i) = L(i - 1) + L(i - 2)$. It is easy to see that $L(i) = F_{i+1}$.

[Fig. 69: Minimum-sized AVL trees, with L(0) = 1, L(1) = 1, L(2) = 2, L(3) = 3, L(4) = 5.]

If you expand the Fibonacci series for a number of terms, you will observe that $F_n$ appears to grow exponentially, but not as fast as $2^n$. It is tempting to conjecture that $F_n \le \phi^{n-1}$, for some real parameter $\phi$, where $1 < \phi < 2$. We can use induction to prove this and derive a bound on $\phi$.

Lemma: For all integers $n \ge 1$, $F_n \le \phi^{n-1}$ for some constant $\phi$, $1 < \phi < 2$.

Proof: We will try to derive the tightest bound we can on the value of $\phi$.

Basis: For the basis cases we consider $n = 1$. Observe that $F_1 = 1 \le \phi^0$, as desired.

Induction step: For the induction step, let us assume that $F_m \le \phi^{m-1}$ whenever $1 \le m < n$. Using this induction hypothesis we will show that the lemma holds for $n$ itself, whenever $n \ge 2$.

Since $n \ge 2$, we have $F_n = F_{n-1} + F_{n-2}$. Now, since $n - 1$ and $n - 2$ are both strictly less than $n$, we can apply the induction hypothesis, from which we have

$$F_n \le \phi^{n-2} + \phi^{n-3} = \phi^{n-3}(1 + \phi).$$

We want to show that this is at most $\phi^{n-1}$ (for a suitable choice of $\phi$). Clearly this will be true if and only if $(1 + \phi) \le \phi^2$. This is not true for all values of $\phi$ (for example it is not true when $\phi = 1$ but it is true when $\phi = 2$). At the critical value of $\phi$ this inequality will be an equality, implying that we want to find the roots of the equation

$$\phi^2 - \phi - 1 = 0.$$

By the quadratic formula we have

$$\phi = \frac{1 \pm \sqrt{1 + 4}}{2} = \frac{1 \pm \sqrt{5}}{2}.$$

Since $\sqrt{5} \approx 2.24$, observe that one of the roots is negative, and hence would not be a possible candidate for $\phi$. The positive root is

$$\phi = \frac{1 + \sqrt{5}}{2} \approx 1.618.$$

There is a very subtle bug in the preceding proof. Can you spot it? The error occurs in the case $n = 2$. Here we claim that $F_2 = F_1 + F_0$ and then we apply the induction hypothesis to both $F_1$ and $F_0$. But the induction hypothesis only applies for $m \ge 1$, and hence cannot be applied to $F_0$! To fix it we could include $F_2$ as part of the basis case as well.

Notice not only did we prove the lemma by induction, but we actually determined the value of $\phi$ which makes the lemma true. This is why this method is called constructive induction.

By the way, the value $\phi = \frac{1}{2}(1 + \sqrt{5})$ is a famous constant in mathematics, architecture and art. It is the golden ratio. Two numbers $A$ and $B$ satisfy the golden ratio if

$$\frac{A}{B} = \frac{A + B}{A}.$$

It is easy to verify that $A = \phi$ and $B = 1$ satisfies this condition. This proportion occurs throughout the world of art and architecture.

Supplemental Lecture 3: Recurrences and Generating Functions

Read: This material is not covered in CLRS. There is a good description of generating functions in D. E. Knuth, The Art of Computer Programming, Vol 1.

Generating Functions: The method of constructive induction provided a way to get a bound on $F_n$, but we did not get an exact answer, and we had to generate a good guess before we were even able to start. Let us consider an approach to determine an exact representation of $F_n$, which requires no guesswork. This method is based on a very elegant concept, called a generating function. Consider any infinite sequence:

$$a_0, a_1, a_2, a_3, \ldots$$

If we would like to "encode" this sequence succinctly, we could define a polynomial function such that these are the coefficients of the function:

$$G(z) = a_0 + a_1 z + a_2 z^2 + a_3 z^3 + \cdots$$

This is called the generating function of the sequence. What is $z$? It is just a symbolic variable. We will (almost) never assign it a specific value. Thus, every infinite sequence of numbers has a corresponding generating function, and vice versa. What is the advantage of this representation?

It turns out that we can perform arithmetic transformations on these functions (e.g., adding them, multiplying them, differentiating them) and this has a corresponding effect on the underlying sequences. It turns out that some nicely-structured sequences (like the Fibonacci numbers, and many sequences arising from linear recurrences) have generating functions that are easy to write down and manipulate.

Let's consider the generating function for the Fibonacci numbers:

$$G(z) = F_0 + F_1 z + F_2 z^2 + F_3 z^3 + \cdots = z + z^2 + 2z^3 + 3z^4 + 5z^5 + \cdots$$

The trick in dealing with generating functions is to figure out how various manipulations of the generating function produce algebraically equivalent forms. For example, notice that if we multiply the generating function by a factor of $z$, this has the effect of shifting the sequence to the right:

$$\begin{array}{rcl}
G(z) &=& F_0 + F_1 z + F_2 z^2 + F_3 z^3 + F_4 z^4 + \cdots \\
zG(z) &=& \phantom{F_0 + {}} F_0 z + F_1 z^2 + F_2 z^3 + F_3 z^4 + \cdots \\
z^2 G(z) &=& \phantom{F_0 + F_0 z + {}} F_0 z^2 + F_1 z^3 + F_2 z^4 + \cdots
\end{array}$$

Now, let's try the following manipulation. Compute $G(z) - zG(z) - z^2 G(z)$, and see what we get:

$$\begin{aligned}
(1 - z - z^2)\,G(z) &= F_0 + (F_1 - F_0)z + (F_2 - F_1 - F_0)z^2 + (F_3 - F_2 - F_1)z^3 + \cdots + (F_i - F_{i-1} - F_{i-2})z^i + \cdots \\
&= z.
\end{aligned}$$

Observe that every term except the second is equal to zero by the definition of $F_i$. (The particular manipulation we picked was chosen to cause this cancellation to occur.) From this we may conclude that

$$G(z) = \frac{z}{1 - z - z^2}.$$

So, now we have an alternative representation for the Fibonacci numbers, as the coefficients of this function if expanded as a power series. So what good is this? The main goal is to get at the coefficients of its power series expansion. There are certain common tricks that people use to manipulate generating functions.

The first is to observe that there are some functions for which it is very easy to get a power series expansion. For example, the following is a simple consequence of the formula for the geometric series. If $0 < c < 1$ then

$$\sum_{i=0}^{\infty} c^i = \frac{1}{1 - c}.$$

Setting $z = c$, we have

$$\frac{1}{1 - z} = 1 + z + z^2 + z^3 + \cdots$$

(In other words, $1/(1 - z)$ is the generating function for the sequence $(1, 1, 1, \ldots)$.) In general, given a constant $a$ we have

$$\frac{1}{1 - az} = 1 + az + a^2 z^2 + a^3 z^3 + \cdots$$

which is the generating function for $(1, a, a^2, a^3, \ldots)$. It would be great if we could modify our generating function to be in the form of $1/(1 - az)$ for some constant $a$, since then we could extract the coefficients of the power series easily. In order to do this, we would like to rewrite the generating function in the following form:

$$G(z) = \frac{z}{1 - z - z^2} = \frac{A}{1 - az} + \frac{B}{1 - bz},$$

for some $A$, $B$, $a$, $b$.
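Before going further with the algebra, it may help to confirm numerically that the power series of $z/(1 - z - z^2)$ really has the Fibonacci numbers as its coefficients. The sketch below expands a ratio of two polynomials by the standard long-division recurrence; the function names and the list-based polynomial representation are assumptions made for this illustration.

```python
def series_coefficients(numer, denom, terms):
    """First `terms` power-series coefficients of numer(z) / denom(z).

    Polynomials are lists of coefficients, lowest degree first, and
    denom[0] must be 1. From denom(z) * G(z) = numer(z), the coefficient
    of z^i gives g_i = numer_i - sum_{j >= 1} denom_j * g_{i-j}.
    """
    coeffs = []
    for i in range(terms):
        c = numer[i] if i < len(numer) else 0
        for j in range(1, min(i, len(denom) - 1) + 1):
            c -= denom[j] * coeffs[i - j]
        coeffs.append(c)
    return coeffs

def fibonacci(terms):
    """F_0, F_1, ..., F_{terms-1} computed straight from the recurrence."""
    fibs = [0, 1]
    while len(fibs) < terms:
        fibs.append(fibs[-1] + fibs[-2])
    return fibs[:terms]

if __name__ == "__main__":
    # G(z) = z / (1 - z - z^2)
    print(series_coefficients([0, 1], [1, -1, -1], 12))
    print(fibonacci(12))
    # 1 / (1 - a z) with a = 3 should give the powers of 3.
    print(series_coefficients([1], [1, -3], 6))
```

The same routine reproduces the geometric-series expansions quoted above, which is the form the partial-fraction step is trying to reach.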

We will skip the steps of this partial-fraction decomposition, but it is not hard to verify that the roots of $(1 - az)(1 - bz)$ (which are $1/a$ and $1/b$) must be equal to the roots of $1 - z - z^2$. We can then solve for $a$ and $b$ by taking the reciprocals of the roots of this quadratic. Then by some simple algebra we can plug these values in and solve for $A$ and $B$ yielding:

$$G(z) = \frac{z}{1 - z - z^2} = \frac{1/\sqrt{5}}{1 - \phi z} + \frac{-1/\sqrt{5}}{1 - \hat{\phi} z} = \frac{1}{\sqrt{5}} \left( \frac{1}{1 - \phi z} - \frac{1}{1 - \hat{\phi} z} \right),$$

where $\phi = (1 + \sqrt{5})/2$ and $\hat{\phi} = (1 - \sqrt{5})/2$. (In particular, to determine $A$, multiply the equation by $1 - \phi z$, and then consider what happens when $z = 1/\phi$. A similar trick can be applied to get $B$. In general, this is called the method of partial fractions.)

Now we are in good shape, because we can extract the coefficients for these two fractions from the above function. From this we have the following:

$$G(z) = \frac{1}{\sqrt{5}} \left( \left(1 + \phi z + \phi^2 z^2 + \cdots\right) - \left(1 + \hat{\phi} z + \hat{\phi}^2 z^2 + \cdots\right) \right).$$

Combining terms we have

$$G(z) = \frac{1}{\sqrt{5}} \sum_{i=0}^{\infty} (\phi^i - \hat{\phi}^i) z^i.$$

We can now read off the coefficients easily. In particular it follows that

$$F_n = \frac{1}{\sqrt{5}} (\phi^n - \hat{\phi}^n).$$

This is an exact result, and no guesswork was needed. The only parts that involved some cleverness (beyond the invention of generating functions) were (1) coming up with the simple closed form formula for $G(z)$ by taking appropriate differences and applying the rule for the recurrence, and (2) applying the method of partial fractions to get the generating function into one for which we could easily read off the final coefficients.

This is rather remarkable, because it says that we can express the integer $F_n$ in terms of powers of the two irrational numbers $\phi$ and $\hat{\phi}$. You might try this for a few specific values of $n$ to see why this is true. By the way, when you observe that $|\hat{\phi}| < 1$, it is clear that the first term is the dominant one. Thus we have, for large enough $n$, $F_n = \phi^n/\sqrt{5}$, rounded to the nearest integer.

Supplemental Lecture 4: Medians and Selection

Read: Chapter 9 of CLRS.

Selection: We have discussed recurrences and the divide-and-conquer method of solving problems. Today we will give a rather surprising (and very tricky) algorithm which shows the power of these techniques.

The problem that we will consider is very easy to state, but surprisingly difficult to solve optimally. Suppose that you are given a set of $n$ numbers. Define the rank of an element to be one plus the number of elements that are smaller than this element. Since duplicate elements make our life more complex (by creating multiple elements of the same rank), we will make the simplifying assumption that all the elements are distinct for now. It will be easy to get around this assumption later. Thus, the rank of an element is its final position if the set is sorted. The minimum is of rank 1 and the maximum is of rank $n$.

Of particular interest in statistics is the median. If $n$ is odd then the median is defined to be the element of rank $(n + 1)/2$. When $n$ is even there are two natural choices, namely the elements of ranks $n/2$ and $(n/2) + 1$. In statistics it is common to return the average of these two elements. We will define the median to be either of these elements.

Medians are useful as measures of the central tendency of a set, especially when the distribution of values is highly skewed. For example, the median income in a community is likely to be a more meaningful measure of the central tendency than the average is, since if Bill Gates lives in your community then his gigantic income may significantly bias the average, whereas it cannot have a significant influence on the median. They are also useful, since in divide-and-conquer applications, it is often desirable to partition a set about its median value, into two sets of roughly equal size. Today we will focus on the following generalization, called the selection problem.

Selection: Given a set $A$ of $n$ distinct numbers and an integer $k$, $1 \le k \le n$, output the element of $A$ of rank $k$.

The selection problem can easily be solved in $\Theta(n \log n)$ time, simply by sorting the numbers of $A$, and then returning $A[k]$. The question is whether it is possible to do better. In particular, is it possible to solve this problem in $\Theta(n)$ time? We will see that the answer is yes, and the solution is far from obvious.

The Sieve Technique: The reason for introducing this algorithm is that it illustrates a very important special case of divide-and-conquer, which I call the sieve technique. We think of divide-and-conquer as breaking the problem into a small number of smaller subproblems, which are then solved recursively. The sieve technique is a special case, where the number of subproblems is just 1.

The sieve technique works in phases as follows. It applies to problems where we are interested in finding a single item from a larger set of $n$ items. We do not know which item is of interest, however after doing some amount of analysis of the data, taking say $\Theta(n^k)$ time, for some constant $k$, we find that we do not know what the desired item is, but we can identify a large enough number of elements that cannot be the desired value, and can be eliminated from further consideration. In particular "large enough" means that the number of items is at least some fixed constant fraction of $n$ (e.g. $n/2$, $n/3$, $0.0001n$). Then we solve the problem recursively on whatever items remain. Each of the resulting recursive solutions then does the same thing, eliminating a constant fraction of the remaining set.

Applying the Sieve to Selection: To see more concretely how the sieve technique works, let us apply it to the selection problem. Recall that we are given an array $A[1..n]$ and an integer $k$, and want to find the $k$-th smallest element of $A$. Since the algorithm will be applied inductively, we will assume that we are given a subarray $A[p..r]$ as we did in MergeSort, and we want to find the $k$th smallest item (where $k \le r - p + 1$). The initial call will be to the entire array $A[1..n]$.

There are two principal algorithms for solving the selection problem, but they differ only in one step, which involves judiciously choosing an item from the array, called the pivot element, which we will denote by $x$. Later we will see how to choose $x$, but for now just think of it as a random element of $A$. We then partition $A$ into three parts. $A[q]$ contains the element $x$, subarray $A[p..q - 1]$ will contain all the elements that are less than $x$, and $A[q + 1..r]$ will contain all the elements that are greater than $x$. (Recall that we assumed that all the elements are distinct.) Within each subarray, the items may appear in any order. This is illustrated below.

It is easy to see that the rank of the pivot $x$ is $q - p + 1$ in $A[p..r]$. Let $xRank = q - p + 1$. If $k = xRank$, then the pivot is the $k$th smallest, and we may just return it. If $k < xRank$, then we know that we need to recursively search in $A[p..q - 1]$ and if $k > xRank$ then we need to recursively search $A[q + 1..r]$. In this latter case we have eliminated $xRank$ smaller elements, so we want to find the element of rank $k - xRank$. Here is the complete pseudocode.

Notice that this algorithm satisfies the basic form of a sieve algorithm. It analyzes the data (by choosing the pivot element and partitioning) and it eliminates some part of the data set, and recurses on the rest. When $k = xRank$ then we get lucky and eliminate everything. Otherwise we either eliminate the pivot and the right subarray or the pivot and the left subarray.

We will discuss the details of choosing the pivot and partitioning later, but assume for now that they both take $\Theta(n)$ time. The question that remains is how many elements did we succeed in eliminating? If $x$ is the largest

. int k) { // return kth smallest of A[p.r] if (p == r) return A[p] // only 1 item left. A[q+1.. p.r] > x 5 9 2 6 4 1 3 7 Initial (k=6) 3 1 2 4 x_rnk=4 6 9 5 7 Partition (pivot = 4) 6 9 5 7 6 5 7 x_rnk=3 9 Partition (pivot = 7) 6 5 5 6 x_rnk=2 (DONE!) Recurse (k=6−4=2) Recurse (k=2) Partition (pivot = 6) Fig.r]> xRank = q .. r. 70: Selection Algorithm.p + 1 // rank of the pivot if (k == xRank) return x // the pivot is the kth smallest else if (k < xRank) return Select(A. x. p.. q-1.pivot p r 5 9 2 6 4 1 3 7 Before partitioing p q r 3 1 2 4 6 9 5 7 After partitioing x A[p..q−1] < x A[q+1. r. k-xRank)// select from right subarray } } Lecture Notes 105 CMSC 451 . q+1. return it else { x = ChoosePivot(A.q-1]. int r. p. x) // partition <A[p. r) // choose the pivot element q = Partition(A. int p. Selection by the Sieve Technique Select(array A. k) // select from left subarray else return Select(A.
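The Select pseudocode above can be sketched in Python as follows. This is only an illustration under stated assumptions: the pivot is chosen uniformly at random (the notes suggest thinking of it that way for now), the partition step is done with list comprehensions rather than an in-place Partition, indices are 0-based, and the name `select` is ours.

```python
import random


def select(a, k):
    """Return the element of rank k (1-based) among the distinct values in a."""
    a = list(a)                  # work on a copy; the caller's list is untouched
    p, r = 0, len(a) - 1         # current subarray a[p..r], 0-based and inclusive
    while p < r:
        x = a[random.randint(p, r)]          # assumption: a uniformly random pivot
        # Three-way split of a[p..r]; distinct elements mean x occurs exactly once.
        smaller = [v for v in a[p:r + 1] if v < x]
        larger = [v for v in a[p:r + 1] if v > x]
        a[p:r + 1] = smaller + [x] + larger
        q = p + len(smaller)                 # final position of the pivot
        x_rank = q - p + 1                   # rank of the pivot within a[p..r]
        if k == x_rank:
            return x                         # the pivot is the kth smallest
        if k < x_rank:
            r = q - 1                        # keep only the left part
        else:
            p, k = q + 1, k - x_rank         # keep the right part, adjust the rank
    return a[p]                              # only one item left
```

With the array from Fig. 70, `select([5, 9, 2, 6, 4, 1, 3, 7], 6)` returns 6, whichever pivots happen to be drawn. The recursion of the pseudocode is written as a loop here because only one subproblem survives each phase.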

If x is the largest or smallest element in the array, then we may only succeed in eliminating one element with each phase. In fact, if x is one of the smallest elements of A or one of the largest, then we get into trouble, because we may only eliminate it and the few smaller or larger elements of A. Ideally x should have a rank that is neither too large nor too small.

Let us suppose for now (optimistically) that we are able to design the procedure ChoosePivot in such a way that it eliminates exactly half the array with each phase, meaning that we recurse on the remaining n/2 elements. This would lead to the following recurrence.

    T(n) = 1              if n = 1,
    T(n) = T(n/2) + n     otherwise.

We can solve this either by expansion (iteration) or the Master Theorem. If we expand this recurrence level by level we see that we get the summation

    T(n) = n + n/2 + n/4 + ... ≤ Σ_{i=0}^{∞} n/2^i = n Σ_{i=0}^{∞} 1/2^i.

Recall the formula for the infinite geometric series. For any c such that |c| < 1,

    Σ_{i=0}^{∞} c^i = 1/(1 − c).

Using this with c = 1/2 we have

    T(n) ≤ 2n ∈ O(n).

(This only proves the upper bound on the running time, but it is easy to see that it takes at least Ω(n) time, so the total running time is Θ(n).)

This is a bit counterintuitive. Normally you would think that in order to design a Θ(n) time algorithm you could only make a single pass, or perhaps a constant number of passes, over the data set. In this algorithm we make many passes (it could be as many as lg n). However, because we eliminate a constant fraction of elements with each phase, we get this convergent geometric series in the analysis, which shows that the total running time is indeed linear in n. This lesson is well worth remembering. It is often possible to achieve running times in ways that you would not expect.

Note that the assumption of eliminating half was not critical. If we eliminated even one per cent, then the recurrence would have been T(n) = T(99n/100) + n, and we would have gotten a geometric series involving 99/100, which is still less than 1, implying a convergent series. Eliminating any constant fraction would have been good enough.

Choosing the Pivot: There are two issues that we have left unresolved. The first is how to choose the pivot element, and the second is how to partition the array. Both need to be solved in Θ(n) time. The second problem is a rather easy programming exercise. Later, when we discuss QuickSort, we will discuss partitioning in detail.

For the rest of the lecture, let's concentrate on how to choose the pivot. Recall that before we said that we might think of the pivot as a random element of A. Actually this is not such a bad idea. Let's see why.

The key is that we want the procedure to eliminate at least some constant fraction of the array after each partitioning step. Let's consider the top of the recurrence, when we are given A[1..n]. Suppose that the pivot x turns out to be of rank q in the array. The partitioning algorithm will split the array into A[1..q − 1] < x, A[q] = x and A[q + 1..n] > x. If k = q, then we are done. Otherwise, we need to search one of the two subarrays. They are of sizes q − 1 and n − q, respectively. The subarray that contains the kth smallest element will generally depend on what k is, so in the worst case, k will be chosen so that we have to recurse on the larger of the two subarrays. Thus if q > n/2, then we may have to recurse on the left subarray of size q − 1, and if q < n/2, then we may have to recurse on the right subarray of size n − q. In either case, we are in trouble if q is very small, or if q is very large.

If we could select q so that it is roughly of middle rank, then we will be in good shape. For example, if n/4 ≤ q ≤ 3n/4, then the larger subarray will never be larger than 3n/4. Earlier we said that we might think of the pivot as a random element of the array A. Actually this works pretty well in practice.

The reason is that roughly half of the elements lie between ranks n/4 and 3n/4, so picking a random element as the pivot will succeed about half the time in eliminating at least n/4 elements. Of course, we might be continuously unlucky, but a careful analysis will show that the expected running time is still Θ(n). We will return to this later.

Instead, we will describe a rather complicated method for computing a pivot element that achieves the desired properties. Recall that we are given an array A[1..n], and we want to compute an element x whose rank is (roughly) between n/4 and 3n/4. We will have to describe this algorithm at a very high level, since the details are rather involved. Here is the description for Select Pivot:

Groups of 5: Partition A into groups of 5 elements, e.g. A[1..5], A[6..10], A[11..15], etc. There will be exactly m = ⌈n/5⌉ such groups (the last one might have fewer than 5 elements). This can easily be done in Θ(n) time.

Group medians: Compute the median of each group of 5. There will be m group medians. We do not need an intelligent algorithm to do this, since each group has only a constant number of elements. For example, we could just BubbleSort each group and take the middle element. Each will take Θ(1) time, and repeating this ⌈n/5⌉ times will give a total running time of Θ(n). Copy the group medians to a new array B.

Median of medians: Compute the median of the group medians. For this, we will have to call the selection algorithm recursively on B, e.g. Select(B, 1, m, k), where m = ⌈n/5⌉ and k = ⌊(m + 1)/2⌋. Let x be this median of medians. Return x as the desired pivot.

The algorithm is illustrated in the figure below.

    Input, split into groups of 5 (the last group has only 4 elements):
      14 57 24  6 37 | 32  2 43 30 25 | 23 52 12 63  3 |  5 44 17 34 64 | 10 27 48  8 19 | 60 21  1 55 41 | 29 11 58 39

    Get group medians (each group sorted; the medians are 24, 30, 23, 34, 19, 41, 39):
       6 14 24 37 57 |  2 25 30 32 43 |  3 12 23 52 63 |  5 17 34 44 64 |  8 10 19 27 48 |  1 21 41 55 60 | 11 29 39 58

    Get median of medians (groups ordered by their medians; this sorting of group medians is not really performed):
       8 10 19 27 48 |  3 12 23 52 63 |  6 14 24 37 57 |  2 25 30 32 43 |  5 17 34 44 64 | 11 29 39 58 |  1 21 41 55 60

    Fig. 71: Choosing the Pivot. 30 is the final pivot.

To establish the correctness of this procedure, we need to argue that x satisfies the desired rank properties.

Lemma: The element x is of rank at least n/4 and at most 3n/4 in A.

Proof: We will show that x is of rank at least n/4. The other part of the proof is essentially symmetrical. To do this, we need to show that there are at least n/4 elements that are less than or equal to x. This is a bit complicated, due to the floor and ceiling arithmetic, so to simplify things we will assume that n is evenly divisible by 5. Consider the groups shown in the tabular form above. Observe that at least half of the group medians are less than or equal to x. (Because x is their median.) And for each group median, there are three elements that are less than or equal to this median within its group (because it is the median of its group). Therefore, there are at least 3((n/5)/2) = 3n/10 ≥ n/4 elements that are less than or equal to x in the entire array.
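The groups-of-5 pivot rule described above can be sketched in Python. This is a minimal illustration, not the notes' own code: the names `median_of_medians` and `select` are ours, the partition is done with list comprehensions instead of in place, and for an even-sized group we assume the upper of the two middle elements is taken (the notes allow either; this choice reproduces Fig. 71).

```python
def _middle(group):
    """Median of a small group; for an even size, the upper middle (our assumption)."""
    ordered = sorted(group)
    return ordered[len(ordered) // 2]


def median_of_medians(a):
    """Pick a pivot for the distinct values in a by the groups-of-5 rule."""
    if len(a) <= 5:
        return _middle(a)
    # Group medians: one median per group of (at most) 5 consecutive elements.
    medians = [_middle(a[i:i + 5]) for i in range(0, len(a), 5)]
    # Median of medians: the element of rank floor((m + 1) / 2) among them.
    return select(medians, (len(medians) + 1) // 2)


def select(a, k):
    """Return the element of rank k (1-based), using the pivot rule above."""
    if len(a) == 1:
        return a[0]
    x = median_of_medians(a)
    smaller = [v for v in a if v < x]
    larger = [v for v in a if v > x]
    x_rank = len(smaller) + 1
    if k == x_rank:
        return x
    if k < x_rank:
        return select(smaller, k)
    return select(larger, k - x_rank)
```

On the 34 numbers of Fig. 71 this sketch returns 30 as the pivot, which has rank 19, comfortably between n/4 and 3n/4.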

Analysis: The last order of business is to analyze the running time of the overall algorithm. We achieved the main goal, namely that of eliminating a constant fraction (at least 1/4) of the remaining list at each stage of the algorithm. The recursive call in Select() will be made to a list no larger than 3n/4. However, in order to achieve this, within Select Pivot() we needed to make a recursive call to Select() on an array B consisting of ⌈n/5⌉ elements. Everything else took only Θ(n) time. As usual, we will ignore floors and ceilings, and write the Θ(n) as n for concreteness. The running time is

    T(n) ≤ 1                         if n = 1,
    T(n) ≤ T(n/5) + T(3n/4) + n      otherwise.

This is a very strange recurrence because it involves a mixture of different fractions (n/5 and 3n/4). This mixture will make it impossible to use the Master Theorem, and difficult to apply iteration. However, this is a good place to apply constructive induction. We know we want an algorithm that runs in Θ(n) time.

Theorem: There is a constant c, such that T(n) ≤ cn.

Proof: (by strong induction on n)

Basis: (n = 1) In this case we have T(n) = 1, and so T(n) ≤ cn as long as c ≥ 1.

Step: We assume that T(n') ≤ cn' for all n' < n. We will then show that T(n) ≤ cn. By definition we have

    T(n) = T(n/5) + T(3n/4) + n.

Since n/5 and 3n/4 are both less than n, we can apply the induction hypothesis, giving

    T(n) ≤ c(n/5) + c(3n/4) + n = cn(1/5 + 3/4) + n = (19/20)cn + n = n(19c/20 + 1).

This last expression will be ≤ cn, provided that we select c such that c ≥ (19c/20) + 1. Solving for c we see that this is true provided that c ≥ 20.

Combining the constraints that c ≥ 1 and c ≥ 20, we see that by letting c = 20, we are done.

A natural question is why did we pick groups of 5? If you look at the proof above, you will see that it works for any value that is strictly greater than 4. (You might try it replacing the 5 with 3, 4, or 6 and see what happens.)

Supplemental Lecture 5: Analysis of BucketSort

Probabilistic Analysis of BucketSort: We begin with a quick-and-dirty analysis of bucketsort. Since there are n buckets, and the items fall uniformly between them, we would expect a constant number of items per bucket. Thus, the expected insertion time for each bucket is only a constant. Therefore the expected running time of the algorithm is Θ(n). This quick-and-dirty analysis is probably good enough to convince yourself of this algorithm's basic efficiency. A careful analysis involves understanding a bit about probabilistic analyses of algorithms. Since we haven't done any probabilistic analyses yet, let's try doing this one. (This one is rather typical.)

The first thing to do in a probabilistic analysis is to define a random variable that describes the essential quantity that determines the execution time.

A discrete random variable can be thought of as a variable that takes on some set of discrete values with certain probabilities. More formally, it is a function that maps a discrete sample space (the set of possible outcomes) to the reals. For 0 ≤ i ≤ n − 1, let X_i denote the random variable that indicates the number of elements assigned to the i-th bucket.

Since the distribution is uniform, all of the random variables X_i have the same probability distribution, so we may as well talk about a single random variable X, which will work for any bucket. Since we are using a quadratic time algorithm to sort the elements of each bucket, we are interested in the expected sorting time, which is Θ(X^2). So this leads to the key question: what is the expected value of X^2, denoted E[X^2]?

Because the elements are assumed to be uniformly distributed, each element has an equal probability of going into any bucket, or in particular, it has a probability of p = 1/n of going into the ith bucket. So how many items do we expect will wind up in bucket i? We can analyze this by thinking of each element of A as being represented by a coin flip (with a biased coin, which has a different probability of heads and tails). With probability p = 1/n the number goes into bucket i, which we will interpret as the coin coming up heads. With probability 1 − 1/n the item goes into some other bucket, which we will interpret as the coin coming up tails. Since we assume that the elements of A are independent of each other, X is just the total number of heads we see after making n tosses with this (biased) coin.

The number of times that a heads event occurs, given n independent trials in which each trial has two possible outcomes, is a well-studied problem in probability theory. Such trials are called Bernoulli trials (named after the Swiss mathematician James Bernoulli). If p is the probability of getting a head, then the probability of getting k heads in n tosses is given by the following important formula

    P(X = k) = C(n, k) p^k (1 − p)^(n−k),    where C(n, k) = n! / (k! (n − k)!).

Although this looks messy, it is not too hard to see where it comes from. Basically p^k is the probability of tossing k heads, (1 − p)^(n−k) is the probability of tossing n − k tails, and C(n, k) is the total number of different ways that the k heads could be distributed among the n tosses. This probability distribution (as a function of k, for a given n and p) is called the binomial distribution, and is denoted b(k; n, p).

If you consult a standard textbook on probability and statistics, then you will see the two important facts that we need to know about the binomial distribution. Namely, that its mean value E[X] and its variance Var[X] are

    E[X] = np    and    Var[X] = E[X^2] − E^2[X] = np(1 − p).

We want to determine E[X^2]. By the above formulas and the fact that p = 1/n we can derive this as

    E[X^2] = Var[X] + E^2[X] = np(1 − p) + (np)^2 = (n/n)(1 − 1/n) + (n/n)^2 = 2 − 1/n.

Thus, for large n the expected time to insert the items into any one of the linked lists is just a shade less than 2. Summing up over all n buckets gives a total running time of Θ(2n) = Θ(n). This is exactly what our quick-and-dirty analysis gave us, but now we know it is true with confidence.

Supplemental Lecture 6: Long Integer Multiplication

Read: This material on integer multiplication is not covered in CLRS.

Long Integer Multiplication: The following little algorithm shows a bit more about the surprising applications of divide-and-conquer. The problem that we want to consider is how to perform arithmetic on long integers, and multiplication in particular. The reason for doing arithmetic on long numbers stems from cryptography. Most techniques for encryption are based on number-theoretic techniques. For example, the character string to be encrypted is converted into a sequence of numbers, and encryption keys are stored as long integers.

Efficient encryption and decryption depend on being able to perform arithmetic on long numbers, typically containing hundreds of digits.

Addition and subtraction on large numbers are relatively easy. If n is the number of digits, then these algorithms run in Θ(n) time. (Go back and analyze your solution to the problem on Homework 1.) But the standard algorithm for multiplication runs in Θ(n^2) time, which can be quite costly when lots of long multiplications are needed.

This raises the question of whether there is a more efficient way to multiply two very large numbers. It would seem surprising if there were, since for centuries people have used the same algorithm that we all learn in grade school. In fact, we will see that it is possible.

Divide-and-Conquer Algorithm: We know the basic grade-school algorithm for multiplication. We normally think of this algorithm as applying on a digit-by-digit basis, but if we partition an n digit number into two "super digits" of roughly n/2 digits each, the same multiplication rule still applies.

                 |<------- n ------->|
                 |   n/2   |   n/2   |
        A:       |    w    |    x    |
        B:       |    y    |    z    |
                           |   wz    |   xz    |
                 |   wy    |   xy    |
        Product: |   wy    | wz + xy |   xz    |

    Fig. 72: Long integer multiplication.

To avoid complicating things with floors and ceilings, let's just assume that the number of digits n is a power of 2. Let A and B be the two numbers to multiply. Let A[0] denote the least significant digit and let A[n − 1] denote the most significant digit of A. Because of the way we write numbers, it is more natural to think of the elements of A as being indexed in decreasing order from left to right as A[n − 1..0] rather than the usual A[0..n − 1]. Let m = n/2. Let

    w = A[n − 1..m]    x = A[m − 1..0]    and
    y = B[n − 1..m]    z = B[m − 1..0].

If we think of w, x, y and z as n/2 digit numbers, we can express A and B as

    A = w · 10^m + x
    B = y · 10^m + z,

and their product is

    mult(A, B) = mult(w, y) 10^(2m) + (mult(w, z) + mult(x, y)) 10^m + mult(x, z).

The operation of multiplying by 10^m should be thought of as simply shifting the number over by m positions to the left, and so is not really a multiplication. Observe that all the additions involve numbers of roughly n/2 digits, and so they take Θ(n) time each. Thus, we can express the multiplication of two long integers as the result of four products on integers of roughly half the length of the original, and a constant number of additions and shifts, each taking Θ(n) time. This suggests that if we were to implement this algorithm, its running time would be given by the following recurrence

    T(n) = 1               if n = 1,
    T(n) = 4T(n/2) + n     otherwise.
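To make the four-product scheme concrete, here is a small Python sketch of it. It is an illustration only: the name `mult4` is ours, the numbers are ordinary Python integers rather than digit arrays, and the "shifts" are written as multiplications by powers of 10 using Python's built-in arithmetic. The second return value counts the single-digit products at the leaves of the recursion.

```python
def mult4(a, b, n):
    """Multiply non-negative integers a, b of at most n decimal digits (n a power of 2).

    Returns (product, number of single-digit multiplications performed).
    """
    if n == 1:
        return a * b, 1              # one digit-by-digit product
    m = n // 2
    w, x = divmod(a, 10 ** m)        # a = w * 10^m + x
    y, z = divmod(b, 10 ** m)        # b = y * 10^m + z
    wy, c1 = mult4(w, y, m)
    wz, c2 = mult4(w, z, m)
    xy, c3 = mult4(x, y, m)
    xz, c4 = mult4(x, z, m)
    # Shifts by 2m and m positions (not counted as multiplications).
    product = wy * 10 ** (2 * m) + (wz + xy) * 10 ** m + xz
    return product, c1 + c2 + c3 + c4
```

For example, `mult4(1234, 5678, 4)` returns the product 7006652 together with a count of 16 = 4^2 single-digit products, which is the same number the grade-school method performs on two 4-digit numbers.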

If we apply the Master Theorem, we see that a = 4, b = 2, k = 1, and a > b^k, implying that Case 1 holds and the running time is Θ(n^(lg 4)) = Θ(n^2). Unfortunately, this is no better than the standard algorithm.

Faster Divide-and-Conquer Algorithm: Even though the above exercise appears to have gotten us nowhere, it actually has given us an important insight. It shows that the critical element is the number of multiplications on numbers of size n/2. The number of additions (as long as it is a constant) does not affect the running time. So, if we could find a way to arrive at the same result algebraically, but by trading off multiplications in favor of additions, then we would have a more efficient algorithm. (Of course, we cannot simulate multiplication through repeated additions, since the number of additions must be a constant, independent of n.)

The key turns out to be an algebraic "trick". The quantities that we need to compute are C = wy, D = xz, and E = (wz + xy). Above, it took us four multiplications to compute these. However, observe that if instead we compute the following quantities, we can get everything we want, using only three multiplications (but with more additions and subtractions).

    C = mult(w, y)
    D = mult(x, z)
    E = mult((w + x), (y + z)) − C − D = (wy + wz + xy + xz) − wy − xz = (wz + xy).

Finally we have

    mult(A, B) = C · 10^(2m) + E · 10^m + D.

Altogether we perform 3 multiplications, 4 additions, and 2 subtractions, all of numbers with n/2 digits. We still need to shift the terms into their proper final positions. The additions, subtractions, and shifts take Θ(n) time in total. So the total running time is given by the recurrence:

    T(n) = 1               if n = 1,
    T(n) = 3T(n/2) + n     otherwise.

Now when we apply the Master Theorem, we have a = 3, b = 2 and k = 1, yielding T(n) ∈ Θ(n^(lg 3)) ≈ Θ(n^1.585).

Is this really an improvement? This algorithm carries a larger constant factor because of the overhead of recursion and the additional arithmetic operations. But asymptotics says that if n is large enough, then this algorithm will be superior. For example, if we assume that the clever algorithm has overheads that are 5 times greater than the simple algorithm (e.g. 5n^1.585 versus n^2) then this algorithm beats the simple algorithm for n ≥ 50. If the overhead was 10 times larger, then the crossover would occur for n ≥ 260. Although this may seem like a very large number, recall that in cryptography applications, encryption keys of this length and longer are quite reasonable.

Supplemental Lecture 7: Dynamic Programming: 0–1 Knapsack Problem

Read: The introduction to Chapter 16 in CLR. The material on the Knapsack Problem is not presented in our text, but is briefly discussed in Section 17.2.

0-1 Knapsack Problem: Imagine that a burglar breaks into a museum and finds n items. Let v_i denote the value of the i-th item, and let w_i denote the weight of the i-th item. The burglar carries a knapsack capable of holding total weight W. The burglar wishes to carry away the most valuable subset of items subject to the weight constraint.

For example, a burglar would rather steal diamonds before gold because the value per pound is better. But he would rather steal gold before lead for the same reason. We assume that the burglar cannot take a fraction of an object, so he/she must make a decision to take the object entirely or leave it behind.

(There is a version of the problem where the burglar can take a fraction of an object for a fraction of the value and weight. This is much easier to solve.)

More formally, given ⟨v_1, v_2, ..., v_n⟩ and ⟨w_1, w_2, ..., w_n⟩, and W > 0, we wish to determine the subset T ⊆ {1, 2, ..., n} (of objects to "take") that maximizes

    Σ_{i∈T} v_i    subject to    Σ_{i∈T} w_i ≤ W.

Let us assume that the v_i's, w_i's and W are all positive integers. It turns out that this problem is NP-complete, and so we cannot really hope to find an efficient solution. However if we make the same sort of assumption that we made in counting sort, we can come up with an efficient solution.

We assume that the w_i's are small integers, and that W itself is a small integer. We show that this problem can be solved in O(nW) time. (Note that this is not very good if W is a large integer. But if we truncate our numbers to lower precision, this gives a reasonable approximation algorithm.)

Here is how we solve the problem. We construct an array V[0..n, 0..W]. For 1 ≤ i ≤ n, and 0 ≤ j ≤ W, in the entry V[i, j] we will store the maximum value of any subset of objects {1, 2, ..., i} that can fit into a knapsack of weight j. If we can compute all the entries of this array, then the array entry V[n, W] will contain the maximum value of all n objects that can fit into the entire knapsack of weight W.

To compute the entries of the array V we will employ an inductive approach. As a basis, observe that V[0, j] = 0 for 0 ≤ j ≤ W since if we have no items then we have no value. We consider two cases:

Leave object i: If we choose to not take object i, then the optimal value will come about by considering how to fill a knapsack of size j with the remaining objects {1, 2, ..., i − 1}. This is just V[i − 1, j].

Take object i: If we take object i, then we gain a value of v_i but have used up w_i of our capacity. With the remaining j − w_i capacity in the knapsack, we can fill it in the best possible way with objects {1, 2, ..., i − 1}. This is v_i + V[i − 1, j − w_i]. This is only possible if w_i ≤ j.

Since these are the only two possibilities, we can see that we have the following rule for constructing the array V. The ranges on i and j are i ∈ [0..n] and j ∈ [0..W].

    V[0, j] = 0
    V[i, j] = V[i − 1, j]                                    if w_i > j
    V[i, j] = max(V[i − 1, j], v_i + V[i − 1, j − w_i])      if w_i ≤ j

The first line states that if there are no objects, then there is no value, irrespective of j. The remaining lines implement the rule above.

It is very easy to take these rules and produce an algorithm that computes the maximum value for the knapsack in time proportional to the size of the array, which is O((n + 1)(W + 1)) = O(nW). The algorithm is given below.

An example is shown in the figure below. The final output is V[n, W] = V[4, 10] = 90. This reflects the selection of items 2 and 4, of values $40 and $50, respectively, and weights 4 + 3 ≤ 10.

The only missing detail is what items should we select to achieve the maximum. We will leave this as an exercise. The key is to record for each entry V[i, j] in the matrix whether we got this entry by taking the ith item or leaving it. With this information, it is possible to reconstruct the optimum knapsack contents.

0-1 Knapsack Problem
    KnapSack(v[1..n], w[1..n], n, W) {
        allocate V[0..n][0..W];
        for j = 0 to W do V[0, j] = 0;                  // initialization
        for i = 1 to n do {
            for j = 0 to W do {
                leave_val = V[i-1, j];                  // total value if we leave i
                if (j >= w[i])                          // enough capacity to take i
                    take_val = v[i] + V[i-1, j - w[i]]; // total value if we take i
                else
                    take_val = -INFINITY;               // cannot take i
                V[i,j] = max(leave_val, take_val);      // final value is max
            }
        }
        return V[n, W];
    }

Values of the objects are 10, 40, 30, 50. Weights of the objects are 5, 4, 6, 3.

                              Capacity →
    Item  Value  Weight   j=0   1   2   3   4   5   6   7   8   9  10
     -      -      -        0   0   0   0   0   0   0   0   0   0   0
     1     10      5        0   0   0   0   0  10  10  10  10  10  10
     2     40      4        0   0   0   0  40  40  40  40  40  50  50
     3     30      6        0   0   0   0  40  40  40  40  40  50  70
     4     50      3        0   0   0  50  50  50  50  90  90  90  90

Final result is V[4, 10] = 90 (for taking items 2 and 4).

Fig. 73: 0–1 Knapsack Example.
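The table-filling procedure above, together with the reconstruction step that the notes leave as an exercise, might look as follows in Python. This is one possible sketch, not the notes' solution: the function name `knapsack` is ours, and instead of recording a take/leave flag per entry it recovers the choices afterwards by comparing each entry with the one directly above it.

```python
def knapsack(values, weights, capacity):
    """Return (best value, sorted 1-based indices of the items taken)."""
    n = len(values)
    # V[i][j] = best value using items 1..i with capacity j; row 0 is all zeros.
    V = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for j in range(capacity + 1):
            V[i][j] = V[i - 1][j]                        # value if we leave item i
            if w <= j and v + V[i - 1][j - w] > V[i][j]:
                V[i][j] = v + V[i - 1][j - w]            # better to take item i
    # Walk back from V[n][capacity]: if the entry differs from the one above,
    # item i must have been taken, so give back its weight and continue.
    taken, j = [], capacity
    for i in range(n, 0, -1):
        if V[i][j] != V[i - 1][j]:
            taken.append(i)
            j -= weights[i - 1]
    return V[n][capacity], sorted(taken)
```

On the example of Fig. 73, `knapsack([10, 40, 30, 50], [5, 4, 6, 3], 10)` returns `(90, [2, 4])`.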

Supplemental Lecture 8: Dynamic Programming: Memoization

Read: Section 15.3 of CLRS.

Recursive Implementation: We have described dynamic programming as a method that involves the "bottom-up" computation of a table. However, the recursive formulations that we have derived have been set up in a "top-down" manner. Must the computation proceed bottom-up? Consider the following recursive implementation of the chain-matrix multiplication algorithm. The call Rec-Matrix-Chain(p, i, j) computes and returns the value of m[i, j]. The initial call is Rec-Matrix-Chain(p, 1, n). We only consider the cost here.

Recursive Chain Matrix Multiplication
    Rec-Matrix-Chain(array p, int i, int j) {
        if (i == j) m[i,j] = 0;                          // basis case
        else {
            m[i,j] = INFINITY;                           // initialize
            for k = i to j-1 do {                        // try all splits
                cost = Rec-Matrix-Chain(p, i, k) +
                       Rec-Matrix-Chain(p, k+1, j) + p[i-1]*p[k]*p[j];
                if (cost < m[i,j]) m[i,j] = cost;        // update if better
            }
        }
        return m[i,j];                                   // return final cost
    }

(Note that the table m[1..n, 1..n] is not really needed. We show it just to make the connection with the earlier version clearer.) This version of the procedure certainly looks much simpler, and more closely resembles the recursive formulation that we gave previously for this problem. So, what is wrong with this?

The answer is that the running time is much higher than the Θ(n^3) algorithm that we gave before. In fact, we will see that its running time is exponential in n. This is unacceptably slow.

Let T(n) denote the running time of this algorithm on a sequence of matrices of length n. (That is, n = j − i + 1.) If i = j then we have a sequence of length 1, and the time is Θ(1). Otherwise, we do Θ(1) work and then consider all possible ways of splitting the sequence of length n into two sequences, one of length k and the other of length n − k, and invoke the procedure recursively on each one. So we get the following recurrence, defined for n ≥ 1. (We have replaced the Θ(1)'s with the constant 1.)

    T(n) = 1                                        if n = 1,
    T(n) = 1 + Σ_{k=1}^{n−1} (T(k) + T(n − k))      if n ≥ 2.

Claim: T(n) ≥ 2^(n−1).

Proof: The proof is by induction on n. Clearly this is true for n = 1, since T(1) = 1 = 2^0. In general, for n ≥ 2, the induction hypothesis is that T(m) ≥ 2^(m−1) for all m < n. Using this we have

    T(n) = 1 + Σ_{k=1}^{n−1} (T(k) + T(n − k))  ≥  1 + Σ_{k=1}^{n−1} T(k)
         ≥ 1 + Σ_{k=1}^{n−1} 2^(k−1)  =  1 + Σ_{k=0}^{n−2} 2^k
         = 1 + (2^(n−1) − 1)  =  2^(n−1).

In the first line we simply ignored the T(n − k) term, in the second line we applied the induction hypothesis, and in the last line we applied the formula for the geometric series.
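The exponential growth can be observed directly by counting calls. The Python sketch below mirrors Rec-Matrix-Chain without the table m; the one-element list `counter` used for counting and the dimension sequence in the usage note are illustrative choices of ours, not part of the notes.

```python
def rec_matrix_chain(p, i, j, counter):
    """Cost of the best parenthesization of A_i..A_j; counter[0] counts calls."""
    counter[0] += 1
    if i == j:
        return 0                                  # basis case: a single matrix
    best = float("inf")
    for k in range(i, j):                         # try all splits
        cost = (rec_matrix_chain(p, i, k, counter)
                + rec_matrix_chain(p, k + 1, j, counter)
                + p[i - 1] * p[k] * p[j])
        best = min(best, cost)                    # update if better
    return best
```

For a chain of n = 4 matrices with dimensions p = [5, 4, 6, 2, 7], the call `rec_matrix_chain(p, 1, 4, counter)` returns 158 and makes 27 calls in total, already more than the lower bound 2^(n−1) = 8 from the claim. Solving the recurrence with equality gives 3^(n−1) calls, so the count triples with each additional matrix.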

Why is this so much worse than the dynamic programming version? If you "unravel" the recursive calls on a reasonably long example, you will see that the procedure is called repeatedly with the same arguments. The bottom-up version evaluates each entry exactly once.

Memoization: Is it possible to retain the nice top-down structure of the recursive solution, while keeping the same O(n^3) efficiency of the bottom-up version? The answer is yes, through a technique called memoization. Here is the idea. Let's reconsider the function Rec-Matrix-Chain() given above. Its job is to compute m[i, j], and return its value. As noted above, the main problem with the procedure is that it recomputes the same entries over and over. So, we will fix this by allowing the procedure to compute each entry exactly once. One way to do this is to initialize every entry to some special value (e.g. UNDEFINED). Once an entry's value has been computed, it is never recomputed.

Memoized Chain Matrix Multiplication
    Mem-Matrix-Chain(array p, int i, int j) {
        if (m[i,j] != UNDEFINED) return m[i,j];          // already defined
        else if (i == j) m[i,j] = 0;                     // basis case
        else {
            m[i,j] = INFINITY;                           // initialize
            for k = i to j-1 do {                        // try all splits
                cost = Mem-Matrix-Chain(p, i, k) +
                       Mem-Matrix-Chain(p, k+1, j) + p[i-1]*p[k]*p[j];
                if (cost < m[i,j]) m[i,j] = cost;        // update if better
            }
        }
        return m[i,j];                                   // return final cost
    }

This version runs in O(n^3) time. Intuitively, this is because each of the O(n^2) table entries is only computed once, and the work needed to compute one table entry (most of it in the for-loop) is at most O(n).

Memoization is not usually used in practice, since it is generally slower than the bottom-up method. However, in some DP problems, many of the table entries are simply not needed, and so bottom-up computation may compute entries that are never needed. In these cases memoization may be a good idea. If you know that most of the table will not be needed, here is a way to save space. Rather than storing the whole table explicitly as an array, you can store the "defined" entries of the table in a hash table, using the index pair (i, j) as the hash key. (See Chapter 11 in CLRS for more information on hashing.)

Supplemental Lecture 9: Articulation Points and Biconnectivity

Read: This material is not covered in CLR (except as Problem 23–2).

Articulation Points and Biconnected Graphs: Today we discuss another application of DFS, this time to a problem on undirected graphs. Let G = (V, E) be a connected undirected graph. Consider the following definitions.

Articulation Point (or Cut Vertex): Is any vertex whose removal (together with the removal of any incident edges) results in a disconnected graph.

Bridge: Is an edge whose removal results in a disconnected graph.

Biconnected: A graph is biconnected if it contains no articulation points. (In general a graph is k-connected, if k vertices must be removed to disconnect the graph.)

Biconnected graphs and articulation points are of great interest in the design of network algorithms, because these are the "critical" points, whose failure will result in the network becoming disconnected.
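The definitions can be checked directly by brute force: remove each vertex (or edge) in turn and test whether the rest of the graph is still connected. The Python sketch below does exactly that. It is meant only to pin down the definitions, it takes O(n(n + e)) time for the vertex test, and it is not the DFS-based method this lecture develops. The adjacency-list format, the function names, and the small example graph in the usage note are all assumptions of ours.

```python
def is_connected(adj, removed=None):
    """True if the graph on all vertices except `removed` is connected."""
    vertices = [v for v in adj if v != removed]
    if not vertices:
        return True
    seen, stack = {vertices[0]}, [vertices[0]]
    while stack:                                  # plain iterative graph search
        u = stack.pop()
        for v in adj[u]:
            if v != removed and v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(vertices)


def articulation_points(adj):
    """Vertices whose removal disconnects the (connected) graph."""
    return sorted(u for u in adj if not is_connected(adj, removed=u))


def bridges(adj):
    """Edges (u, v) with u < v whose removal disconnects the graph."""
    found = []
    for u in adj:
        for v in adj[u]:
            if u < v:
                trimmed = {x: [y for y in adj[x] if {x, y} != {u, v}] for x in adj}
                if not is_connected(trimmed):
                    found.append((u, v))
    return sorted(found)
```

As a usage example, take two triangles a-b-c and d-e-f joined by the single edge (c, d). Then `articulation_points` reports c and d, `bridges` reports the edge (c, d), and the graph is therefore not biconnected.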

[Figure: a graph on the vertices a through j with its articulation points and a bridge marked, and the same graph split into its biconnected components.]

Fig. 74: Articulation Points and Bridges

Last time we observed that the notion of mutual reachability partitioned the vertices of a digraph into equivalence classes. We would like to do the same thing here. We say that two edges e1 and e2 are cocyclic if either e1 = e2 or if there is a simple cycle that contains both edges. It is not too hard to verify that this defines an equivalence relation on the edges of a graph. Notice that if two edges are cocyclic, then there are essentially two different ways of getting from one edge to the other (by going around the cycle each way).

Biconnected components: The biconnected components of a graph are the equivalence classes of the cocyclicity relation.

Notice that unlike strongly connected components of a digraph (which form a partition of the vertex set) the biconnected components of a graph form a partition of the edge set. You might think for a while why this is so.

We give an algorithm for computing articulation points. An algorithm for computing bridges is a simple modification to this procedure.

Articulation Points and DFS: In order to determine the articulation points of an undirected graph, we will call depth-first search, and use the tree structure provided by the search to aid us. In particular, let us ask ourselves if a vertex u is an articulation point, how would we know it by its structure in the DFS tree?

We assume that G is connected (if not, we can apply this algorithm to each individual connected component). So we assume there is only one tree in the DFS forest. Because G is undirected, the DFS tree has a simpler structure. First off, we cannot distinguish between forward edges and back edges, and we just call them back edges. Also, there are no cross edges. (You should take a moment to convince yourself why this is true.)

For now, let us consider the typical case of a vertex u, where u is not a leaf and u is not the root. Let v1, v2, ..., vk be the children of u. For each child there is a subtree of the DFS tree rooted at this child. If for some child, there is no back edge going to a proper ancestor of u, then if we were to remove u, this subtree would become disconnected from the rest of the graph, and hence u is an articulation point. On the other hand, if every one of the subtrees rooted at the children of u has back edges to proper ancestors of u, then if u is removed, the graph remains connected (the back edges hold everything together). This leads to the following.

Observation 1: An internal vertex u of the DFS tree (other than the root) is an articulation point if and only if there exists a subtree rooted at a child of u such that there is no back edge from any vertex in this subtree to a proper ancestor of u.

Please check this condition carefully to see that you understand it. In particular, notice that the condition for whether u is an articulation point depends on a test applied to its children. This is the most common source of confusion for this algorithm.

What about the leaves? If u is a leaf, can it be an articulation point? Answer: No, because when you delete a leaf from a tree, the rest of the tree remains connected. Thus, even ignoring the back edges, the graph is connected after the deletion of a leaf from the DFS tree.

Fig. 75: Articulation Points and DFS. (The figure shows a vertex u whose highest-reaching back edge goes to an ancestor v, so that Low[u] = d[v].)

Observation 2: A leaf of the DFS tree is never an articulation point. Note that this is completely consistent with Observation 1, since a leaf will not have any subtrees in the DFS tree, so we can delete the word "internal" from Observation 1.

What about the root? Since there are no cross edges between the subtrees of the root, if the root has two or more children then it is an articulation point (since its removal separates these subtrees). On the other hand, if the root has only a single child, then (as in the case of leaves) its removal does not disconnect the DFS tree, and hence cannot disconnect the graph in general.

Observation 3: The root of the DFS is an articulation point if and only if it has two or more children.

Articulation Points by DFS: Observations 1, 2, and 3 provide us with a structural characterization of which vertices in the DFS tree are articulation points. How can we design an algorithm which tests these conditions? Checking that the root has multiple children is an easy exercise. Checking Observation 1 is the hardest, but we will exploit the structure of the DFS tree to help us.

The basic thing we need to check for is whether there is a back edge from some subtree to an ancestor of a given vertex. How can we do this? It would be too expensive to keep track of all the back edges from each subtree (because there may be Θ(e) back edges). A simpler scheme is to keep track of the back edge that goes highest in the tree (in the sense of going closest to the root). If any back edge goes to an ancestor of u, this one will.

How do we know how close a back edge goes to the root? As we travel from u towards the root, observe that the discovery times of these ancestors of u get smaller and smaller (the root having the smallest discovery time of 1). So we keep track of the back edge (v, w) that has the smallest value of d[w].

Low: Define Low[u] to be the minimum of d[u] and {d[w] | (v, w) is a back edge and v is a descendant of u}.

The term "descendant" is used in the nonstrict sense, that is, v may be equal to u. Intuitively, Low[u] is the highest (closest to the root) that you can get in the tree by taking any one back edge from either u or any of its descendants. (Beware of this notation: "Low" means low discovery time, not low in the tree. In fact Low[u] tends to be "high" in the tree, in the sense of being close to the root.)

To compute Low[u] we use the following simple rules: Suppose that we are performing DFS on the vertex u.

Initialization: Low[u] = d[u].

Back edge (u, v): Low[u] = min(Low[u], d[v]). Explanation: We have detected a new back edge coming out of u. If this goes to a lower d value than the previous back edge then make this the new low.

Tree edge (u, v): Low[u] = min(Low[u], Low[v]). Explanation: Since v is in the subtree rooted at u, any single back edge leaving the tree rooted at v is a single back edge for the tree rooted at u.
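The three rules for Low can be tried out on a small graph. The following is a minimal Python sketch, not the notes' own code: the function name compute_low and the dictionary-based adjacency list are assumptions made for illustration, the graph is assumed to be simple and connected, and the tree edge back to the parent is skipped explicitly because each undirected edge appears in both adjacency lists.

```python
def compute_low(adj, root):
    """Return (d, low): discovery times and Low values for the DFS from root.

    adj maps each vertex to a list of neighbors (undirected, simple graph).
    """
    d, low = {}, {}
    time = 0

    def visit(u, parent):
        nonlocal time
        time += 1
        d[u] = low[u] = time                  # Initialization: Low[u] = d[u]
        for v in adj[u]:
            if v not in d:                    # tree edge (u, v)
                visit(v, u)
                low[u] = min(low[u], low[v])
            elif v != parent:                 # back edge (u, v), not the tree edge to the parent
                low[u] = min(low[u], d[v])

    visit(root, None)
    return d, low


# Triangle a-b-c with a pendant vertex d attached to c.
adj = {'a': ['b', 'c'], 'b': ['a', 'c'], 'c': ['a', 'b', 'd'], 'd': ['c']}
d, low = compute_low(adj, 'a')
print(d)    # discovery times
print(low)  # Low values
```

In this example the subtree rooted at d has no back edge to a proper ancestor of c, which shows up as low['d'] being no smaller than d['c']. The recursion mirrors the DFS in the notes; for very deep graphs Python's recursion limit would need attention.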

Observe that once Low[u] is computed for all vertices u, we can test whether a given nonroot vertex u is an articulation point by Observation 1 as follows: u is an articulation point if and only if it has a child v in the DFS tree for which Low[v] ≥ d[u] (since if there were a back edge from either v or one of its descendants to a proper ancestor of u then we would have Low[v] < d[u]).

The Final Algorithm: There is one subtlety that we must watch for in designing the algorithm (in particular this is true for any DFS on undirected graphs). When processing a vertex u, we need to know when a given edge (u, v) is a back edge. How do we do this? An almost correct answer is to test whether v is colored gray (since all gray vertices are ancestors of the current vertex). This is not quite correct because v may be the parent of u in the DFS tree and we are just seeing the "other side" of the tree edge between v and u (recalling that in constructing the adjacency list of an undirected graph we create two directed edges for each undirected edge). To test correctly for a back edge we use the predecessor pointer to check that v is not the parent of u in the DFS tree.

The complete algorithm for computing articulation points is given below. The main procedure for DFS is the same as before, except that it calls the following routine rather than DFSvisit().

Articulation Points
    ArtPt(u) {
        color[u] = gray
        Low[u] = d[u] = ++time
        for each (v in Adj(u)) {
            if (color[v] == white) {                // (u,v) is a tree edge
                pred[v] = u
                ArtPt(v)
                Low[u] = min(Low[u], Low[v])        // update Low[u]
                if (pred[u] == NULL) {              // root: apply Observation 3
                    if (this is u's second child)
                        Add u to set of articulation points
                }
                else if (Low[v] >= d[u]) {          // internal node: apply Observation 1
                    Add u to set of articulation points
                }
            }
            else if (v != pred[u]) {                // (u,v) is a back edge
                Low[u] = min(Low[u], d[v])          // update Low[u]
            }
        }
    }

An example is shown in the following figure. As with all DFS-based algorithms, the running time is Θ(n + e).

Fig. 76: Articulation Points. (The figure shows the example graph on vertices a through j and its DFS tree labeled with d and Low values, with the articulation points marked.)

There are some interesting problems that we still have not discussed. We did not discuss how to compute the bridges of a graph. This can be done by a small modification of the algorithm above. We'll leave it as an exercise. (Notice that if {u, v} is a bridge then it does not follow that u and v are both articulation points.) Another question is how to determine which edges are in the biconnected components. A hint here is to store the edges in a stack as you go through the DFS search. When you come to an articulation point, you can show that all the edges in the biconnected component will be consecutive in the stack.

Supplemental Lecture 10: Bellman-Ford Shortest Paths

Read: Section 24.1 in CLRS.

Bellman-Ford Algorithm: We saw that Dijkstra's algorithm can solve the single-source shortest path problem, under the assumption that the edge weights are nonnegative. We also saw that shortest paths are undefined if you have cycles of total negative cost. What if you have negative edge weights, but no negative cost cycles? We shall present the Bellman-Ford algorithm, which solves this problem. This algorithm is slower than Dijkstra's algorithm, running in Θ(VE) time. In our version we will assume that there are no negative cost cycles. The one presented in CLRS actually contains a bit of code that checks for this. (Check it out.)

Recall that we are given a graph G = (V, E) with numeric edge weights, w(u, v). Like Dijkstra's algorithm, the Bellman-Ford algorithm is based on performing repeated relaxations. (Recall that relaxation updates shortest path information along a single edge. It was described in our discussion of Dijkstra's algorithm.) Dijkstra's algorithm was based on the idea of organizing the relaxations in the best possible manner, namely in increasing order of distance. Once relaxation is applied to an edge, it need never be relaxed again. This trick doesn't seem to work when dealing with graphs with negative edge weights. Instead, the Bellman-Ford algorithm simply applies a relaxation to every edge in the graph, and repeats this V − 1 times.

Bellman-Ford Algorithm
    BellmanFord(G,w,s) {
        for each (u in V) {                 // standard initialization
            d[u] = +infinity
            pred[u] = null
        }
        d[s] = 0
        for i = 1 to V-1 {                  // repeat V-1 times
            for each (u,v) in E {           // relax along each edge
                Relax(u,v)
            }
        }
    }

The Θ(VE) running time is pretty obvious, since there are two main nested loops, one iterated V − 1 times and the other iterated E times. The interesting question is how and why it works.

Correctness of Bellman-Ford: I like to think of the Bellman-Ford as a sort of "BubbleSort analogue" for shortest paths, in the sense that shortest path information is propagated sequentially along each shortest path in the graph. Consider any shortest path from s to some other vertex u: v_0, v_1, ..., v_k where v_0 = s and v_k = u. Since a shortest path will never visit the same vertex twice, we know that k ≤ V − 1, and hence the path consists of at most V − 1 edges. Since this is a shortest path, δ(s, v_i) (the true shortest path cost from s to v_i) satisfies

    δ(s, v_i) = δ(s, v_{i−1}) + w(v_{i−1}, v_i).
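Before continuing with the proof, the BellmanFord pseudocode above can be written out as a short runnable sketch. This is an illustration under stated assumptions rather than the notes' own implementation: the edge-list representation (triples u, v, w) and the function name bellman_ford are choices made here, Relax is expanded inline, and, as in the version presented in these notes, no negative cost cycle is assumed to be reachable from s.

```python
import math

def bellman_ford(vertices, edges, s):
    """Single-source shortest paths; edges is a list of (u, v, w) triples.

    Assumes no negative cost cycle is reachable from s.
    """
    d = {u: math.inf for u in vertices}      # standard initialization
    pred = {u: None for u in vertices}
    d[s] = 0
    for _ in range(len(vertices) - 1):       # repeat V-1 times
        for u, v, w in edges:                # relax along each edge
            if d[u] + w < d[v]:
                d[v] = d[u] + w
                pred[v] = u
    return d, pred


# Edges listed in a deliberately unhelpful order, with one negative weight.
vertices = ['s', 'a', 'b', 'c']
edges = [('a', 'c', 5), ('b', 'a', -6), ('s', 'b', 4), ('s', 'a', 8)]
d, pred = bellman_ford(vertices, edges, 's')
print(d)
print(pred)
```

With this edge order the correct distance to c only settles on the third pass, which is the "BubbleSort" behavior described above: each pass pushes correct values one edge further along the shortest path s, b, a, c.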

Fig. 77: Bellman-Ford Algorithm. (Panels: initial configuration; after 1st relaxation phase; after 2nd relaxation phase; after 3rd relaxation phase.)

We assert that after the ith pass of the "for-i" loop, d[v_i] = δ(s, v_i). The proof is by induction on i. Observe that after the initialization (pass 0) we have d[v_0] = d[s] = 0. In general, prior to the ith pass through the loop, the induction hypothesis tells us that d[v_{i−1}] = δ(s, v_{i−1}). After the ith pass through the loop, we have done a relaxation on the edge (v_{i−1}, v_i) (since we do relaxations along all the edges). Thus after the ith pass we have

    d[v_i] ≤ d[v_{i−1}] + w(v_{i−1}, v_i) = δ(s, v_{i−1}) + w(v_{i−1}, v_i) = δ(s, v_i).

Recall from Dijkstra's algorithm that d[v_i] is never less than δ(s, v_i) (since each time we do a relaxation there exists a path that witnesses its value). Thus, d[v_i] is in fact equal to δ(s, v_i), completing the induction proof.

In summary, after i passes through the for loop, all vertices that are i edges away (along the shortest path tree) from the source have the correct distance values stored in d[u]. Thus, after the (V − 1)st iteration of the for loop, all vertices u have the correct distance values stored in d[u].

Supplemental Lecture 11: Network Flows and Matching

Read: Chapt 27 in CLR.

Maximum Flow: The Max Flow problem is one of the basic problems of algorithm design. Intuitively we can think of a flow network as a directed graph in which fluid is flowing along the edges of the graph. Each edge has a certain maximum capacity that it can carry. The idea is to find out how much flow we can push from one point to another.

The max flow problem has applications in areas like transportation and routing in networks. It is the simplest problem in a line of many important problems having to do with the movement of commodities through a network. These are often studied in business schools and in operations research.

Flow Networks: A flow network G = (V, E) is a directed graph in which each edge (u, v) ∈ E has a nonnegative capacity c(u, v) ≥ 0. If (u, v) ∉ E we model this by setting c(u, v) = 0. There are two special vertices: a source s, and a sink t. We assume that every vertex lies on some path from the source to the sink (for otherwise the vertex is of no use to us). (This implies that the digraph is connected, and hence e ≥ n − 1.)

A flow is a real valued function on pairs of vertices, f : V × V → R, which satisfies the following three properties:

Capacity Constraint: For all u, v ∈ V, f(u, v) ≤ c(u, v).

Skew Symmetry: For all u, v ∈ V, f(u, v) = −f(v, u). (In other words, we can think of backwards flow as negative flow. This is primarily for making algebraic analysis easier.)

Flow conservation: For all u ∈ V − {s, t}, we have

    Σ_{v∈V} f(u, v) = 0.
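The three properties just listed can be checked mechanically for a candidate function f. The sketch below is only an illustration: the name is_flow and the dictionary representation (ordered pairs mapped to numbers, with missing pairs treated as 0) are assumptions made here, and integer values are used so that the equality tests are exact.

```python
def is_flow(vertices, c, f, s, t):
    """Check the capacity, skew symmetry, and conservation properties.

    c and f map ordered pairs (u, v) to integers; missing pairs count as 0.
    """
    def cap(u, v):
        return c.get((u, v), 0)

    def flow(u, v):
        return f.get((u, v), 0)

    for u in vertices:
        for v in vertices:
            if flow(u, v) > cap(u, v):        # capacity constraint violated
                return False
            if flow(u, v) != -flow(v, u):     # skew symmetry violated
                return False
    for u in vertices:
        if u not in (s, t) and sum(flow(u, v) for v in vertices) != 0:
            return False                      # flow conservation violated
    return True


vertices = ['s', 'a', 't']
c = {('s', 'a'): 3, ('a', 't'): 2}
f = {('s', 'a'): 2, ('a', 's'): -2, ('a', 't'): 2, ('t', 'a'): -2}
print(is_flow(vertices, c, f, 's', 't'))
```

Note that the backwards entries such as f(a, s) = −2 are stored explicitly; leaving them out makes the function fail the skew symmetry test even though the forward values look like a sensible flow.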

(Given skew symmetry, flow conservation is equivalent to saying that flow-in = flow-out.) Note that flow conservation does NOT apply to the source and sink, since we think of ourselves as pumping flow from s to t. Flow conservation means that no flow is lost anywhere else in the network, thus the flow out of s will equal the flow into t.

The quantity f(u, v) is called the net flow from u to v. The total value of the flow f is defined as

    |f| = Σ_{v∈V} f(s, v),

i.e. the flow out of s. It turns out that this is also equal to Σ_{v∈V} f(v, t), the flow into t. We will show this later.

The maximum-flow problem is, given a flow network, and source and sink vertices s and t, find the flow of maximum value from s to t.

Example: Page 581 of CLR.

Multi-source, multi-sink flow problems: It may seem overly restrictive to require that there is only a single source and a single sink vertex. Many flow problems involve many source vertices s_1, s_2, ..., s_k and many sink vertices t_1, t_2, ..., t_l. This can easily be modelled by just adding a special supersource s′ and a supersink t′, and attaching s′ to all the s_i and attaching all the t_j to t′. We let these edges have infinite capacity. Now by pushing the maximum flow from s′ to t′ we are effectively producing the maximum flow from all the s_i to all the t_j.

Note that we don't care which flow from one source goes to another sink. If you require that the flow from source i goes ONLY to sink i, then you have a tougher problem called the multi-commodity flow problem.

Set Notation: Sometimes rather than talking about the flow from a vertex u to a vertex v, we want to talk about the flow from a SET of vertices X to another SET of vertices Y. To do this we extend the definition of f to sets by defining

    f(X, Y) = Σ_{x∈X} Σ_{y∈Y} f(x, y).

Using this notation we can define flow balance for a vertex u more succinctly by just writing f(u, V) = 0. One important special case of this concept is when X and Y define a cut (i.e. a partition of the vertex set into two disjoint subsets X ⊆ V and Y = V − X). In this case f(X, Y) can be thought of as the net amount of flow crossing over the cut.

From simple manipulations of the definition of flow we can prove the following facts.

Lemma:
(i) f(X, X) = 0.
(ii) f(X, Y) = −f(Y, X).
(iii) If X ∩ Y = ∅ then f(X ∪ Y, Z) = f(X, Z) + f(Y, Z) and f(Z, X ∪ Y) = f(Z, X) + f(Z, Y).

Ford-Fulkerson Method: The most basic concept on which all network-flow algorithms work is the notion of augmenting flows. The idea is to start with a flow of size zero, and then incrementally make the flow larger and larger by finding a path along which we can push more flow. A path in the network from s to t along which more flow can be pushed is called an augmenting path. This idea is given by the simplest method for computing network flows, called the Ford-Fulkerson method.

Almost all network flow algorithms are based on this simple idea. They only differ in how they decide which path or paths along which to push flow. We will prove that when it is impossible to "push" any more flow through the network, we have reached the maximum possible flow (i.e. a locally maximum flow is globally maximum).

cf (u. and it can be shown that the residual network has Θ(n + e) size. v) = c(u. (Remember that when deﬁning this ﬂow that whenever we push cf (p) units of ﬂow along any edge (u. We deﬁne the ﬂow across the cut as f (S. Each edge in the residual network is weighted with its residual capacity. Otherwise we say that the edge is saturated. In order to determine whether there exists an augmenting path from s to t is an easy problem. T ) we ONLY count constraints on edges leading from S to T ignoring those from T to S). The value of the ﬂow is |f | + |f |. Observe that by pushing cf (p) units of ﬂow along each edge of the path. s. the running time of Ford-Fulkerson is basically Θ((n + e)(number of augmenting stages)). Since every edge of the residual network has a strictly positive weight. deﬁne the residual capacity of a pair u. T ). t) { initialize flow f to 0. Example: Page 589 of CLR. Given a ﬂow network G and a ﬂow f . Augmenting Paths: An augmenting path is a simple path from s to t in Gf . we have to push −cf (p) units of ﬂow along the reverse edge (v. Lemma: Let f be a ﬂow in G and let f be a ﬂow in Gf . v) ≥ 0. and in computing c(S. Then (f + f ) (deﬁned (f + f )(u. and we deﬁne the capcity of the cut as c(S. The other rules for ﬂows are easy to verify. Lecture Notes 122 CMSC 451 . } output the final flow f. Correctness: To establish the correctness of the Ford-Fulkerson algorithm we need to delve more deeply into the theory of ﬂows and cuts in networks. (S. T ). A cut. The residual capacity of the path is the MINIMUM capacity of any edge on the path. T ) ﬂows from T to S are counted negatively (by skew-symmetry). v)) is a ﬂow in G. the resulting ﬂow is strictly larger than the current ﬂow for G. v) + f (u. v) = f (u. v)−f (u. Note that in computing f (S. v) such that cf (u. It is denoted cf (p). while (there exists an augmenting path p) { augment the flow along p. v). Since DFS and BFS take Θ(n + e) time. v ∈ V to be cf (u. u) to maintain skew-symmetry. 
Proof: Basically the residual network tells us how much additional ﬂow we can push through G. Lemma: The amount of ﬂow across any cut in the network is equal to |f |. in a ﬂow network is a partition of the vertex set into two disjoint subsets S and T such that s ∈ S and t ∈ T . First we construct the residual network. v) of p. This implies that f + f never exceeds the overall edge capacities of G. Later we will analyze the latter quantity. and hence we can use this to augment the ﬂow in G. and then we run DFS or BFS on the residual network starting at s. v) > 0 then it is possible to push more ﬂow through the edge (u. we ﬁrst deﬁne the notion of a residual network. T ).Ford-Fulkerson Network Flow FordFulkerson(G. v). } Residual Network: To deﬁne the notion of an augmenting path. Observe that if cf (u. The residual network is the directed graph Gf with the same vertex set as G but whose edges are the pairs (u. we get a ﬂow in Gf . v) > 0. Because of the capacity constraint. If the search reaches t then we know that a path exists (and can follow the predecessor pointers backwards to reconstruct it).

Proof:

    f(S, T) = f(S, V) − f(S, S) = f(S, V) = f(s, V) + f(S − s, V) = f(s, V) = |f|

(The fact that f(S − s, V) = 0 comes from flow conservation. f(u, V) = 0 for all u other than s and t, and since S − s is formed of such vertices the sum of their flows will be zero also.)

Corollary: The value of any flow is bounded from above by the capacity of any cut. (i.e. Maximum flow ≤ Minimum cut).

Proof: You cannot push any more flow through a cut than its capacity.

The correctness of the Ford-Fulkerson method is based on the following theorem, called the Max-Flow, Min-Cut Theorem. It basically states that in any flow network the minimum capacity cut acts like a bottleneck to limit the maximum amount of flow. The Ford-Fulkerson algorithm terminates when it finds this bottleneck, and hence it finds the minimum cut and maximum flow.

Max-Flow Min-Cut Theorem: The following three conditions are equivalent.
(i) f is a maximum flow in G,
(ii) The residual network G_f contains no augmenting paths,
(iii) |f| = c(S, T) for some cut (S, T) of G.

Proof: (i) ⇒ (ii): If f is a max flow and there were an augmenting path in G_f, then by pushing flow along this path we would have a larger flow, a contradiction.

(ii) ⇒ (iii): If there are no augmenting paths then s and t are not connected in the residual network. Let S be those vertices reachable from s in the residual network and let T be the rest. (S, T) forms a cut. Because each edge crossing the cut must be saturated with flow, it follows that the flow across the cut equals the capacity of the cut, thus |f| = c(S, T).

(iii) ⇒ (i): Since the flow is never bigger than the capacity of any cut, if the flow equals the capacity of some cut, then it must be maximum (and this cut must be minimum).

Analysis of the Ford-Fulkerson method: The problem with the Ford-Fulkerson algorithm is that depending on how it picks augmenting paths, it may spend an inordinate amount of time arriving at the final maximum flow. Consider the following example (from page 596 in CLR). If the algorithm were smart enough to send flow along the edges of weight 1,000,000, the algorithm would terminate in two augmenting steps. However, if the algorithm were to try to augment using the middle edge, it will continuously improve the flow by only a single unit. 2,000,000 augmentations will be needed before we get the final flow. In general, Ford-Fulkerson can take time Θ((n + e)|f*|) where f* is the maximum flow.

An Improvement: We have shown that if the augmenting path was chosen in a bad way the algorithm could run for a very long time before converging on the final flow. It seems (from the example we showed) that a more logical way to push flow is to select the augmenting path which holds the maximum amount of flow. Computing this path is equivalent to determining the path of maximum capacity from s to t in the residual network. (This is exactly the same as the beer transport problem given on the last exam.) It is not known how fast this method works in the worst case, but there is another simple strategy that is guaranteed to give good bounds (in terms of n and e).
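To make the Ford-Fulkerson method and the Max-Flow Min-Cut Theorem concrete, here is a minimal Python sketch. It is one possible rendering, not the notes' own code: the function name ford_fulkerson, the dictionary of integer capacities, and the use of an iterative DFS to find augmenting paths are all assumptions made for illustration. It keeps only the residual capacities c_f(u, v), updating the reverse pair on each augmentation, and it returns the set S of vertices reachable from s in the final residual network, which is the cut used in the proof of (ii) ⇒ (iii).

```python
from collections import defaultdict

def ford_fulkerson(capacity, s, t):
    """capacity maps directed pairs (u, v) to nonnegative integer capacities.

    Returns (value of a maximum flow, set S of vertices reachable from s
    in the final residual network).
    """
    # residual[u][v] holds the residual capacity c_f(u, v).
    residual = defaultdict(lambda: defaultdict(int))
    for (u, v), cap in capacity.items():
        residual[u][v] += cap
        residual[v][u] += 0          # make sure the reverse pair exists

    def search():
        # Iterative DFS from s over pairs with positive residual capacity.
        pred = {s: None}
        stack = [s]
        while stack:
            u = stack.pop()
            for v, cf in residual[u].items():
                if cf > 0 and v not in pred:
                    pred[v] = u
                    stack.append(v)
        return pred

    value = 0
    while True:
        pred = search()
        if t not in pred:            # no augmenting path remains
            return value, set(pred)
        # Follow predecessor pointers back from t to recover the path.
        path = []
        v = t
        while pred[v] is not None:
            path.append((pred[v], v))
            v = pred[v]
        cf_p = min(residual[u][v] for u, v in path)   # residual capacity of the path
        for u, v in path:
            residual[u][v] -= cf_p   # push cf_p units along (u, v)
            residual[v][u] += cf_p   # and -cf_p along the reverse pair
        value += cf_p


capacity = {('s', 'a'): 3, ('s', 'b'): 2, ('a', 'b'): 1,
            ('a', 't'): 2, ('b', 't'): 3}
value, S = ford_fulkerson(capacity, 's', 't')
print(value, S)
```

In this small network the two edges leaving s have total capacity 5 and both end up saturated, so the returned cut is S = {s} and its capacity equals the flow value, as the theorem predicts. Because the capacities are integers, each augmentation increases the value by at least 1 and the loop terminates.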

Edmonds-Karp Algorithm: The Edmonds-Karp algorithm is Ford-Fulkerson, with one little change. When finding the augmenting path, we use Breadth-First search in the residual network, starting at the source s, and thus we find the shortest augmenting path (where the length of the path is the number of edges on the path). We claim that this choice is particularly nice in that, if we do so, the number of flow augmentations needed will be at most O(e · n). Since each augmentation takes O(n + e) time to compute using BFS, the overall running time will be O((n + e)e · n) = O(n^2 e + e^2 n) ∈ O(e^2 n) (under the reasonable assumption that e ≥ n). (The best known algorithm is essentially O(e · n log n).)

The fact that Edmonds-Karp uses O(en) augmentations is based on the following observations.

Observation: If the edge (u, v) is an edge on the minimum length augmenting path from s to t in G_f, then δ_f(s, v) = δ_f(s, u) + 1.

Proof: This is a simple property of shortest paths. Since there is an edge from u to v, δ_f(s, v) ≤ δ_f(s, u) + 1, and if δ_f(s, v) < δ_f(s, u) + 1 then u would not be on the shortest path from s to v, and hence (u, v) is not on any shortest path.

Lemma: For each vertex u ∈ V − {s, t}, let δ_f(s, u) be the distance function from s to u in the residual network G_f. Then as we perform augmentations by the Edmonds-Karp algorithm the value of δ_f(s, u) increases monotonically (that is, it never decreases) with each flow augmentation.

Proof: (Messy, but not too complicated. See the text.)

Theorem: The Edmonds-Karp algorithm makes at most O(n · e) augmentations.

Proof: An edge in the augmenting path is critical if the residual capacity of the path equals the residual capacity of this edge. In other words, after augmentation the critical edge becomes saturated, and disappears from the residual graph.

How many times can an edge become critical before the algorithm terminates? Observe that when the edge (u, v) is critical it lies on the shortest augmenting path, implying that δ_f(s, v) = δ_f(s, u) + 1. After this it disappears from the residual graph. In order to reappear, it must be that we reduce flow on this edge, i.e. we push flow along the reverse edge (v, u). For this to be the case we have (at some later flow f′) δ_f′(s, u) = δ_f′(s, v) + 1. Thus we have:

    δ_f′(s, u) = δ_f′(s, v) + 1
               ≥ δ_f(s, v) + 1          since dists increase with time
               = (δ_f(s, u) + 1) + 1
               = δ_f(s, u) + 2.

Thus, between two successive times that an edge becomes critical, its tail vertex increases in distance from the source by two. This can only happen n/2 times, since no vertex can be further than n from the source. Thus, each edge can become critical at most O(n) times, there are O(e) edges, hence after O(ne) augmentations, the algorithm must terminate.

In summary, the Edmonds-Karp algorithm makes at most O(ne) augmentations and runs in O(ne^2) time.

Maximum Matching: One of the important elements of network flow is that it is a very general algorithm which is capable of solving many problems. (An example is problem 3 in the homework.) We will give another example here.

Consider the following problem: you are running a dating service and there are a set of men L and a set of women R. Using a questionnaire you establish which men are compatible with which women. Your task is to pair up as many compatible pairs of men and women as possible, subject to the constraint that each man is paired with at most one woman, and vice versa. (It may be that some men are not paired with any woman.)

This problem is modelled by giving an undirected graph whose vertex set is V = L ∪ R and whose edge set consists of pairs (u, v), u ∈ L, v ∈ R such that u and v are compatible. The problem is to find a matching, that is a subset of edges M such that for each v ∈ V, there is at most one edge of M incident to v. The desired matching is the one that has the maximum number of edges, and is called a maximum matching.

Example: See page 601 in CLR.

The resulting undirected graph has the property that its vertex set can be divided into two groups such that all its edges go from one group to the other (never within a group, unless the dating service is located on Dupont Circle). This problem is called the maximum bipartite matching problem.

Reduction to Network Flow: We claim that if you have an algorithm for solving the network flow problem, then you can use this algorithm to solve the maximum bipartite matching problem. (Note that this idea does not work for general undirected graphs.)

Construct a flow network G′ = (V′, E′) as follows. Let s and t be two new vertices and let V′ = V ∪ {s, t}.

    E′ = {(s, u) | u ∈ L} ∪ {(v, t) | v ∈ R} ∪ {(u, v) | (u, v) ∈ E}.

Set the capacity of all edges in this network to 1.

Example: See page 602 in CLR.

Now, compute the maximum flow in G′. Although in general it can be that flows are real numbers, observe that the Ford-Fulkerson algorithm will only assign integer value flows to the edges (and this is true of all existing network flow algorithms).

Since each vertex in L has exactly 1 incoming edge, it can have flow along at most 1 outgoing edge, and since each vertex in R has exactly 1 outgoing edge, it can have flow along at most 1 incoming edge. Thus letting f denote the maximum flow, we can define a matching

    M = {(u, v) | u ∈ L, v ∈ R, f(u, v) > 0}.

We claim that this matching is maximum because for every matching there is a corresponding flow of equal value, and for every (integer) flow there is a matching of equal value. Thus by maximizing one we maximize the other.

Supplemental Lecture 12: Hamiltonian Path

Read: The reduction we present for Hamiltonian Path is completely different from the one in Chapt 36.5.4 of CLR.

Hamiltonian Cycle: Today we consider a collection of problems related to finding paths in graphs and digraphs. Recall that given a graph (or digraph) a Hamiltonian cycle is a simple cycle that visits every vertex in the graph (exactly once). A Hamiltonian path is a simple path that visits every vertex in the graph (exactly once). The Hamiltonian cycle (HC) and Hamiltonian path (HP) problems ask whether a given graph (or digraph) has such a cycle or path. There are four variations of these problems depending on whether the graph is directed or undirected, and depending on whether you want a path or a cycle, but all of these problems are NP-complete.

An important related problem is the traveling salesman problem (TSP). Given a complete graph (or digraph) with integer edge weights, determine the cycle of minimum weight that visits all the vertices. Since the graph is complete, such a cycle will always exist. The decision problem formulation is, given a complete weighted graph G, and integer X, does there exist a Hamiltonian cycle of total weight at most X? Today we will prove that Hamiltonian Cycle is NP-complete. We will leave TSP as an easy exercise. (It is done in Section 36.5.5 in CLR.)
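The decision formulation of TSP stated above can be made concrete with an exhaustive search. The sketch below is purely illustrative: the name tsp_decision and the weight-matrix input are assumptions made here, at least three vertices are assumed, and the running time grows factorially with the number of vertices, which is consistent with the problem being NP-complete rather than a practical method.

```python
from itertools import permutations

def tsp_decision(w, X):
    """w is a square matrix (list of lists) of integer weights of a complete graph.

    Return True iff some Hamiltonian cycle has total weight at most X.
    """
    n = len(w)
    # Fix vertex 0 as the starting point and try every order of the rest.
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        total = sum(w[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if total <= X:
            return True
    return False


w = [[0, 1, 4, 3],
     [1, 0, 2, 5],
     [4, 2, 0, 1],
     [3, 5, 1, 0]]
print(tsp_decision(w, 7))
print(tsp_decision(w, 6))
```

For this symmetric 4-vertex instance the cheapest cycle is 0, 1, 2, 3 and back to 0, with weight 1 + 2 + 1 + 3 = 7, so the answer flips between X = 6 and X = 7.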

Component Design: Up to now, most of the reductions that we have seen (for Clique, VC, and DS in particular) are of a relatively simple variety. They are sometimes called local replacement reductions, because they operate by making some local change throughout the graph. We will present a much more complex style of reduction for the Hamiltonian path problem on directed graphs. This type of reduction is called a component design reduction, because it involves designing special subgraphs, sometimes called components or gadgets (also called widgets), whose job it is to enforce a particular constraint. Very complex reductions may involve the creation of many gadgets. This one involves the construction of only one. (See CLR's presentation of HP for other examples of gadgets.)

The gadget that we will use in the directed Hamiltonian path reduction, called a DHP-gadget, is shown in the figure below. It consists of three incoming edges labeled i1, i2, i3 and three outgoing edges, labeled o1, o2, o3. It was designed so that it satisfies the following property, which you can verify. Intuitively it says that if you enter the gadget on any subset of 1, 2 or 3 input edges, then there is a way to get through the gadget and hit every vertex exactly once, and in doing so each path must end on the corresponding output edge.

Claim: Given the DHP-gadget:
• For any nonempty subset of input edges, there exists a set of paths which join each input edge i1, i2, or i3 to its respective output edge o1, o2, or o3 such that together these paths visit every vertex in the gadget exactly once.
• Any subset of paths that start on the input edges and end on the output edges, and visit all the vertices of the gadget exactly once, must join corresponding inputs to corresponding outputs. (In other words, a path that starts on input i1 must exit on output o1.)

The proof is not hard, but involves a careful inspection of the gadget. It is probably easiest to see this on your own, by starting with one, two, or three input paths, and attempting to get through the gadget without skipping any vertex and without visiting any vertex twice. To see whether you really understand the gadget, answer the question of why there are 6 groups of triples. Would some other number work?

Fig. 78: DHP-Gadget and examples of path traversals. (Panels: the gadget with inputs i1, i2, i3 and outputs o1, o2, o3; what it looks like inside; paths with 1, 2, and 3 entries.)

DHP is NP-complete: This gadget is an essential part of our proof that the directed Hamiltonian path problem is NP-complete.

Theorem: The directed Hamiltonian Path problem is NP-complete.

Proof: DHP ∈ NP: The certificate consists of the sequence of vertices (or edges) in the path. It is an easy matter to check that the path visits every vertex exactly once.

3SAT ≤P DHP: This will be the subject of the rest of this section. Let us consider the similar elements between the two problems. In 3SAT we are selecting a truth assignment for the variables of the formula. In DHP, we are deciding which edges will be a part of the path. In 3SAT there must be at least one true literal for each clause. In DHP, each vertex must be visited exactly once.

We are given a boolean formula F in 3-CNF form (three literals per clause). We will convert this formula into a digraph. Let x1, x2, . . . , xm denote the variables appearing in F. We will construct one DHP-gadget for each clause in the formula. The inputs and outputs of each gadget correspond to the literals appearing in this clause. Thus, the clause (x2 ∨ x5 ∨ x8) would generate a clause gadget with inputs labeled x2, x5, and x8, and the same outputs.

The general structure of the digraph will consist of a series of vertices, one for each variable. Each of these vertices will have two outgoing paths, one taken if xi is set to true and one if xi is set to false. Each of these paths will then pass through some number of DHP-gadgets. The true path for xi will pass through all the clause gadgets for clauses in which xi appears, and the false path will pass through all the gadgets for clauses in which the complement x̄i appears. (The order in which the path passes through the gadgets is unimportant.) When the paths for xi have passed through their last gadgets, then they are joined to the next variable vertex, xi+1. This is illustrated in the following figure. (The figure only shows a portion of the construction. There will be paths coming into these same gadgets from other variables as well.) We add one final vertex xe, and the last variable's paths are connected to xe. (If we wanted to reduce to Hamiltonian cycle, rather than Hamiltonian path, we could join xe back to x1.)
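The DHP ∈ NP part of the proof says that checking a certificate is easy. As a minimal sketch of that check, the Python function below takes a digraph and a proposed vertex sequence and tests that every vertex is visited exactly once along directed edges. The function name, the edge-list representation, and the small example graph are assumptions made for illustration; they do not come from the notes.

```python
def is_hamiltonian_path(vertices, edges, path):
    """Check a DHP certificate.

    vertices: iterable of distinct vertex names (assumed representation)
    edges:    iterable of directed edges (u, v)
    path:     proposed sequence of vertices
    """
    vertex_set = set(vertices)
    edge_set = set(edges)
    # Same length and same set of vertices means each vertex appears exactly once.
    if len(path) != len(vertex_set) or set(path) != vertex_set:
        return False
    # Every consecutive pair must be joined by a directed edge.
    return all((u, v) in edge_set for u, v in zip(path, path[1:]))


# Hypothetical four-vertex digraph used only to exercise the check.
example_vertices = ["a", "b", "c", "d"]
example_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]
print(is_hamiltonian_path(example_vertices, example_edges, ["a", "b", "c", "d"]))
```

The check runs in time linear in the size of the graph plus the length of the path, which is what membership in NP requires.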

Fig. 79: General structure of reduction from 3SAT to DHP.

Note that for each variable, the Hamiltonian path must either use the true path or the false path, but it cannot use both. If we choose the true path for xi to be in the Hamiltonian path, then we will have at least one path passing through each of the gadgets whose corresponding clause contains xi, and if we choose the false path, then we will have at least one path passing through each gadget whose clause contains x̄i.

For example, consider the following boolean formula in 3-CNF. The construction yields the digraph shown in the following figure.

(x̄1 ∨ x2 ∨ x3) ∧ (x1 ∨ x̄2 ∨ x̄3) ∧ (x2 ∨ x̄1 ∨ x̄3) ∧ (x1 ∨ x3 ∨ x̄2).


Fig. 80: Example of the 3SAT to DHP reduction.

The Reduction: Let us give a more formal description of the reduction. Recall that we are given a boolean formula F in 3-CNF. We create a digraph G as follows. For each variable xi appearing in F, we create a variable vertex, named xi. We also create a vertex named xe (the ending vertex). For each clause c, we create a DHP-gadget whose inputs and outputs are labeled with the three literals of c. (The order is unimportant, as long as each input and its corresponding output are labeled the same.)

We join these vertices with the gadgets as follows. For each variable xi, consider all the clauses c1, c2, . . . , ck in which xi appears as a literal (uncomplemented). Join xi by an edge to the input labeled with xi in the gadget for c1, and in general join the output of gadget cj labeled xi with the input of gadget cj+1 with this same label. Finally, join the output of the last gadget ck to the next variable vertex xi+1. (If this is the last variable, then join it to xe instead.) The resulting chain of edges is called the true path for variable xi. Form a second chain in exactly the same way, but this time joining the gadgets for the clauses in which the complemented literal x̄i appears. This is called the false path for xi. The resulting digraph is the output of the reduction. Observe that the entire construction can be performed in polynomial time, by simply inspecting the formula, creating the appropriate vertices, and adding the appropriate edges to the digraph. The following lemma establishes the correctness of this reduction.

Lemma: The boolean formula F is satisfiable if and only if the digraph G produced by the above reduction has a Hamiltonian path.
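Before turning to the proof, the construction just described can be sketched in a few lines of Python. This is only a skeleton under stated assumptions: each DHP-gadget is treated as an opaque box with labeled input and output ports (its internal vertices appear only in Fig. 78), clauses are encoded as lists of nonzero integers with +i standing for xi and -i for its complement, and each clause is assumed to contain three distinct literals. The function names and tuple labels are hypothetical.

```python
def literal_chain(lit, clauses):
    """Indices of the clauses that contain the literal `lit`, in formula order."""
    return [j for j, clause in enumerate(clauses) if lit in clause]


def build_skeleton(num_vars, clauses):
    """Return the edges joining variable vertices to gadget ports.

    Vertices are labeled ("var", i), "xe", ("in", j, lit) and ("out", j, lit),
    where j is a clause index. Gadget internals are not modeled here.
    """
    edges = []
    for i in range(1, num_vars + 1):
        source = ("var", i)
        target = ("var", i + 1) if i < num_vars else "xe"
        for lit in (i, -i):  # true path first, then false path
            previous = source
            for j in literal_chain(lit, clauses):
                edges.append((previous, ("in", j, lit)))
                previous = ("out", j, lit)
            # Leave the last gadget (or the variable vertex itself, if the
            # literal occurs nowhere) and join the next variable vertex.
            edges.append((previous, target))
    return edges


# Hypothetical encoding of a four-clause formula over x1, x2, x3.
example_clauses = [[-1, 2, 3], [1, -2, -3], [2, -1, -3], [1, 3, -2]]
skeleton = build_skeleton(3, example_clauses)
print(len(skeleton))
```

Each chain through k gadgets contributes k + 1 edges, so the skeleton has one edge per literal occurrence plus one per chain, which makes the polynomial size of the construction easy to see.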

(Figure panels. Upper: "A satisfying assignment hits all gadgets". Lower: "A nonsatisfying assignment misses some gadgets".)

Fig. 81: Correctness of the 3SAT to DHP reduction. The upper figure shows the Hamiltonian path resulting from the satisfying assignment, x1 = 1, x2 = 1, x3 = 0, and the lower figure shows the non-Hamiltonian path resulting from the nonsatisfying assignment x1 = 0, x2 = 1, x3 = 0.

Proof: We need to prove both the "only if" and the "if".

⇒: Suppose that F has a satisfying assignment. We claim that G has a Hamiltonian path. This path will start at the variable vertex x1, then will travel along either the true path or false path for x1, depending on whether it is 1 or 0, respectively, in the assignment, and then it will continue with x2, then x3, and so on, until reaching xe. Such a path will visit each variable vertex exactly once. Because this is a satisfying assignment, we know that for each clause, either 1, 2, or 3 of its literals will be true. This means that for each clause, either 1, 2, or 3 paths will attempt to travel through the corresponding gadget. However, we have argued in the above claim that in this case it is possible to visit every vertex in the gadget exactly once. Thus every vertex in the graph is visited exactly once, implying that G has a Hamiltonian path.

⇐: Suppose that G has a Hamiltonian path. We assert that the form of the path must be essentially the same as the one described in the previous part of this proof. In particular, the path must visit the variable vertices in increasing order from x1 until xe, because of the way in which these vertices are joined together. Also observe that for each variable vertex, the path will proceed along either the true path or the false path. If it proceeds along the true path, set the corresponding variable to 1 and otherwise set it to 0. We will show that the resulting assignment is a satisfying assignment for F.

Any Hamiltonian path must visit all the vertices in every gadget. By the above claim about DHP-gadgets, if a path visits all the vertices and enters along an input edge then it must exit along the corresponding output edge. Therefore, once the Hamiltonian path starts along the true or false path for some variable, it must remain on edges with the same label. That is, if the path starts along the true path for xi, it must travel through all the gadgets with the label xi until arriving at the variable vertex for xi+1. If it starts along the false path, then it must travel through all gadgets with the label x̄i. Since all the gadgets are visited and the paths must remain true to their initial assignments, it follows that for each corresponding clause, at least one (and possibly 2 or 3) of the literals must be true. Therefore, this is a satisfying assignment.
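The heart of both directions of the proof is that an assignment selects one path per variable, and a gadget is entered exactly when its clause contains a selected literal. The Python sketch below illustrates that correspondence at the level of whole gadgets. The clause list is our reading of the four-clause example, with complemented literals written as negative integers; that encoding, and the function names, are assumptions for illustration rather than part of the notes.

```python
def chosen_literals(assignment):
    """Literals whose paths are taken: +i if xi is true, -i if xi is false."""
    return {var if value else -var for var, value in assignment.items()}


def gadgets_hit(clauses, assignment):
    """Indices of clause gadgets entered by at least one chosen path."""
    chosen = chosen_literals(assignment)
    return {j for j, clause in enumerate(clauses) if chosen & set(clause)}


# Assumed encoding of the example formula (negative = complemented).
example_formula = [[-1, 2, 3], [1, -2, -3], [2, -1, -3], [1, 3, -2]]

satisfying = {1: True, 2: True, 3: False}      # x1 = 1, x2 = 1, x3 = 0
nonsatisfying = {1: False, 2: True, 3: False}  # x1 = 0, x2 = 1, x3 = 0

print(gadgets_hit(example_formula, satisfying))
print(gadgets_hit(example_formula, nonsatisfying))
```

Under this encoding the first assignment enters all four gadgets, while the second never enters the gadget for the last clause, so no Hamiltonian path can follow it.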


**Supplemental Lecture 13: Subset Sum Approximation**

Read: Section 37.4 in CLR.

Polynomial Approximation Schemes: Last time we saw that for some NP-complete problems, it is possible to approximate the problem to within a fixed constant ratio bound. For example, the approximation algorithm produces an answer that is within a factor of 2 of the optimal solution. However, in practice, people would like to control the precision of the approximation. This is done by specifying a parameter ε > 0 as part of the input to the approximation algorithm, and requiring that the algorithm produce an answer that is within a relative error of ε of the optimal solution. It is understood that as ε tends to 0, the running time of the algorithm will increase. Such an algorithm is called a polynomial approximation scheme.

For example, the running time of the algorithm might be O(2^(1/ε) n^2). It is easy to see that in such cases the user pays a big penalty in running time as a function of ε. (For example, to produce a 1% error, the "constant" factor would be 2^100, which at one step per clock cycle would take around 4 trillion centuries on your 100 MHz Pentium.) A fully polynomial approximation scheme is one in which the running time is polynomial in both n and 1/ε. For example, a running time of O((n/ε)^2) would satisfy this condition. In such cases, reasonably accurate approximations are computationally feasible.

Unfortunately, there are very few NP-complete problems with fully polynomial approximation schemes. In fact, recently there has been strong evidence that many NP-complete problems do not have polynomial approximation schemes (fully or otherwise). Today we will study one that does.

Subset Sum: Recall that in the subset sum problem we are given a set S of positive integers {x1, x2, . . . , xn} and a target value t, and we are asked whether there exists a subset S' ⊆ S that sums exactly to t. The optimization problem is to determine the subset whose sum is as large as possible but not larger than t. This problem is basic to many packing problems, and is indirectly related to processor scheduling problems that arise in operating systems as well.

Suppose we are also given 0 < ε < 1. Let z* ≤ t denote the optimum sum. The approximation problem is to return a value z ≤ t such that z ≥ z*(1 − ε). If we think of this as a knapsack problem, we want our knapsack to be within a factor of (1 − ε) of being as full as possible. So, if ε = 0.1, then the knapsack should be at least 90% as full as the best possible.

What do we mean by polynomial time here? Recall that the running time should be polynomial in the input length. Obviously n is part of the input length. But t and the numbers xi could also be huge binary numbers. Normally we just assume that a binary number can fit into a word of our computer, and do not count their length. In this case we will, to be on the safe side. Clearly t requires O(log t) digits to be stored in the input. We will take the input size to be n + log t.

Intuitively it is not hard to believe that it should be possible to determine whether we can fill the knapsack to within 90% of optimal. After all, we are used to solving similar sorts of packing problems all the time in real life. But the mental heuristics that we apply to these problems are not necessarily easy to convert into efficient algorithms. Our intuition tells us that we can afford to be a little "sloppy" in keeping track of exactly how full the knapsack is at any point. The value of ε tells us just how sloppy we can be. Our approximation will do something similar. First we consider an exponential time algorithm, and then convert it into an approximation algorithm.

Exponential Time Algorithm: This algorithm is a variation of the dynamic programming solution we gave for the knapsack problem. Recall that there we used a 2-dimensional array to keep track of whether we could fill a knapsack of a given capacity with the first i objects. We will do something similar here.
As before, we will concentrate on the question of which sums are possible, but determining the subsets that give these sums will not be hard. Let Li denote a list of integers that contains the sums of all 2^i subsets of {x1, x2, . . . , xi} (including the empty set whose sum is 0). For example, for the set {1, 4, 6} the corresponding list of sums contains ⟨0, 1, 4, 5 (= 1 + 4), 6, 7 (= 1 + 6), 10 (= 4 + 6), 11 (= 1 + 4 + 6)⟩. Note that Li can have as many as 2^i elements, but may have fewer, since some subsets may have the same sum.

There are two things we will want to do for efficiency: (1) remove any duplicates from Li, and (2) only keep sums that are less than or equal to t. Let us suppose that we have a procedure MergeLists(L1, L2) which merges two sorted lists, and returns a sorted list with all duplicates removed. This is essentially the procedure used in MergeSort but with the added duplicate element test. As a bit of notation, let L + x denote the list resulting by adding the number x to every element of list L. Thus ⟨1, 4, 6⟩ + 3 = ⟨4, 7, 9⟩. This gives the following procedure for the subset sum problem.

Exact Subset Sum

    Exact_SS(x[1..n], t) {
        L = <0>;
        for i = 1 to n do {
            L = MergeLists(L, L+x[i]);
            remove from L all elements greater than t;
        }
        return largest element in L;
    }

For example, if S = {1, 4, 6} and t = 8 then the successive lists would be

    L0 = ⟨0⟩
    L1 = ⟨0⟩ ∪ ⟨0 + 1⟩ = ⟨0, 1⟩
    L2 = ⟨0, 1⟩ ∪ ⟨0 + 4, 1 + 4⟩ = ⟨0, 1, 4, 5⟩
    L3 = ⟨0, 1, 4, 5⟩ ∪ ⟨0 + 6, 1 + 6, 4 + 6, 5 + 6⟩ = ⟨0, 1, 4, 5, 6, 7, 10, 11⟩.

The last list would have the elements 10 and 11 removed, and the final answer would be 7. The algorithm runs in Ω(2^n) time in the worst case, because this is the number of sums that are generated if there are no duplicates, and no items are removed.

Approximation Algorithm: To convert this into an approximation algorithm, we will introduce a way to "trim" the lists to decrease their sizes. The idea is that if the list L contains two numbers that are very close to one another, e.g. 91,048 and 91,050, then we should not need to keep both of these numbers in the list. One of them is good enough for future approximations. This will reduce the size of the lists that the algorithm needs to maintain. But, how much trimming can we allow and still keep our approximation bound? Furthermore, will we be able to reduce the list sizes from exponential to polynomial?

The answer to both these questions is yes, provided you apply a proper way of trimming the lists. We will trim elements whose values are sufficiently close to each other. But we should define close in a manner that is relative to the sizes of the numbers involved. The trimming must also depend on ε. We select δ = ε/n. (Why? We will see later that this is the value that makes everything work out in the end.) Note that 0 < δ < 1. Assume that the elements of L are sorted. We walk through the list. Let z denote the last untrimmed element in L, and let y ≥ z be the next element to be considered. If

\[ \frac{y - z}{y} \le \delta \]

then we trim y from the list. Equivalently, this means that the final trimmed list cannot contain two values y and z such that

\[ (1 - \delta)\, y \le z \le y. \]

We can think of z as representing y in the list. For example, given δ = 0.1 and given the list L = ⟨10, 11, 12, 15, 20, 21, 22, 23, 24, 29⟩, the trimmed list L' will consist of L' = ⟨10, 12, 15, 20, 23, 29⟩.
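The trimming rule is short enough to try directly. The following Python sketch keeps an element y only when the last kept element z satisfies z < (1 − δ)y, which is the complement of the trimming condition (y − z)/y ≤ δ. The function name and the use of a plain Python list are our own choices for illustration.

```python
def trim(sorted_values, delta):
    """Trim a sorted list: drop y whenever the last kept z has (y - z)/y <= delta."""
    if not sorted_values:
        return []
    kept = [sorted_values[0]]          # the first element is always kept
    for y in sorted_values[1:]:
        z = kept[-1]                   # last untrimmed element, represents what follows
        if z < (1 - delta) * y:        # y is "different enough" from z
            kept.append(y)
    return kept


print(trim([10, 11, 12, 15, 20, 21, 22, 23, 24, 29], 0.1))
# prints [10, 12, 15, 20, 23, 29]
```

Note that each element is compared only against the most recently kept element, so one pass over the sorted list suffices.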

Another way to visualize trimming is to break the interval from [1, t] into a set of buckets of exponentially increasing size. Let d = 1/(1 − δ). Note that d > 1. Consider the intervals [1, d], [d, d^2], [d^2, d^3], . . . , [d^(k−1), d^k] where d^k ≥ t. If z ≤ y are in the same interval [d^(i−1), d^i] then

\[ \frac{y - z}{y} \le \frac{d^{i} - d^{i-1}}{d^{i}} = 1 - \frac{1}{d} = \delta. \]

Thus, we cannot have more than one item within each bucket. We can think of trimming as a way of enforcing the condition that items in our lists are not relatively too close to one another, by enforcing the condition that no bucket has more than one item.

Fig. 82: Trimming Lists for Approximate Subset Sum. (Bucket boundaries at 1, 2, 4, 8, 16; a list L and its trimmed version L'.)

Claim: The number of distinct items in a trimmed list is O((n log t)/ε), which is polynomial in input size and 1/ε.

Proof: We know that each pair of consecutive elements in a trimmed list differ by a ratio of at least d = 1/(1 − δ) > 1. Let k denote the number of elements in the trimmed list, ignoring the element of value 0. Thus, the smallest nonzero value and maximum value in the trimmed list differ by a ratio of at least d^(k−1). Since the smallest (nonzero) element is at least as large as 1, and the largest is no larger than t, it follows that d^(k−1) ≤ t/1 = t. Taking the natural log of both sides we have (k − 1) ln d ≤ ln t. Using the facts that δ = ε/n and the log identity that ln(1 + x) ≤ x (applied with x = −δ, which gives −ln(1 − δ) ≥ δ), we have

\[ k - 1 \;\le\; \frac{\ln t}{\ln d} \;=\; \frac{\ln t}{-\ln(1 - \delta)} \;\le\; \frac{\ln t}{\delta} \;=\; \frac{n \ln t}{\varepsilon}, \qquad\text{and hence}\qquad k = O\!\left(\frac{n \log t}{\varepsilon}\right). \]

Observe that the input size is at least as large as n (since there are n numbers) and at least as large as log t (since it takes log t digits to write down t on the input). Thus, this function is polynomial in the input size and 1/ε.

The approximation algorithm operates as before, but in addition we call the procedure Trim given below.

For example, consider the set S = {104, 102, 201, 101} and t = 308 and ε = 0.20. We have δ = ε/4 = 0.05. Here is a summary of the algorithm's execution.

Approximate Subset Sum

    Trim(L, delta) {
        let the elements of L be denoted y[1..m];
        L' = <y[1]>;                        // start with first item
        last = y[1];                        // last item to be added
        for i = 2 to m do {
            if (last < (1-delta) y[i]) {    // different enough?
                append y[i] to end of L';
                last = y[i];
            }
        }
        return L';
    }

    Approx_SS(x[1..n], t, eps) {
        delta = eps/n;                      // approx factor
        L = <0>;                            // empty sum = 0
        for i = 1 to n do {
            L = MergeLists(L, L+x[i]);      // add in next item
            L = Trim(L, delta);             // trim away "near" duplicates
            remove from L all elements greater than t;
        }
        return largest element in L;
    }

    init:    L0 = ⟨0⟩
    merge:   L1 = ⟨0, 104⟩
    trim:    L1 = ⟨0, 104⟩
    remove:  L1 = ⟨0, 104⟩
    merge:   L2 = ⟨0, 102, 104, 206⟩
    trim:    L2 = ⟨0, 102, 206⟩
    remove:  L2 = ⟨0, 102, 206⟩
    merge:   L3 = ⟨0, 102, 201, 206, 303, 407⟩
    trim:    L3 = ⟨0, 102, 201, 303, 407⟩
    remove:  L3 = ⟨0, 102, 201, 303⟩
    merge:   L4 = ⟨0, 101, 102, 201, 203, 302, 303, 404⟩
    trim:    L4 = ⟨0, 101, 201, 302, 404⟩
    remove:  L4 = ⟨0, 101, 201, 302⟩

The final output is 302. The optimum is 307 = 104 + 102 + 101. So our actual relative error in this case is within 2%.

The running time of the procedure is O(n|L|), which is O(n^2 ln t/ε) by the earlier claim.
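To make the comparison between the exact and approximate procedures concrete, here is a hedged Python rendering of both. It is a sketch, not the notes' own code: the names are ours, and `merge_lists` uses `sorted(set(...))` in place of the linear-time MergeSort-style merge assumed in the text, which changes the constant factors but not the lists produced.

```python
def merge_lists(a, b):
    """Sorted union of two lists with duplicates removed (stand-in for MergeLists)."""
    return sorted(set(a) | set(b))


def trim(sorted_values, delta):
    """Keep y only if the last kept element z satisfies z < (1 - delta) * y."""
    kept = [sorted_values[0]]
    for y in sorted_values[1:]:
        if kept[-1] < (1 - delta) * y:
            kept.append(y)
    return kept


def exact_subset_sum(xs, t):
    """Largest subset sum not exceeding t (exponential time in the worst case)."""
    sums = [0]
    for x in xs:
        sums = merge_lists(sums, [s + x for s in sums])
        sums = [s for s in sums if s <= t]
    return max(sums)


def approx_subset_sum(xs, t, eps):
    """Subset sum within a factor (1 - eps) of the optimum, using trimmed lists."""
    delta = eps / len(xs)
    sums = [0]
    for x in xs:
        sums = merge_lists(sums, [s + x for s in sums])  # add in next item
        sums = trim(sums, delta)                         # drop near duplicates
        sums = [s for s in sums if s <= t]               # discard sums over t
    return max(sums)


items, target, epsilon = [104, 102, 201, 101], 308, 0.20
print(exact_subset_sum(items, target))            # prints 307
print(approx_subset_sum(items, target, epsilon))  # prints 302
```

On this input the approximate answer 302 is well within the promised bound (1 − 0.2) · 307 = 245.6, which matches the observation that the actual error is often much smaller than ε.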

Approximation Analysis: The final question is why the algorithm achieves a relative error of at most ε over the optimum solution. Let Y* denote the optimum (largest) subset sum and let Y denote the value returned by the algorithm. We want to show that Y is not too much smaller than Y*, that is,

\[ Y \ge Y^{*}(1 - \varepsilon). \]

Our proof will make use of an important inequality from real analysis.

Lemma: For n ≥ 1 and any real number a ≥ −n,

\[ (1 + a) \;\le\; \left(1 + \frac{a}{n}\right)^{n} \;\le\; e^{a}. \]

Recall that our intuition was that we would allow a relative error of ε/n at each stage of the algorithm. Since the algorithm has n stages, then the total relative error should be (obviously?) n(ε/n) = ε. The catch is that these are relative, not absolute errors. These errors do not accumulate additively, but rather by multiplication. So we need to be more careful.

Let L*_i denote the i-th list in the exponential time (optimal) solution and let L_i denote the i-th list in the approximate algorithm. We claim that for each y ∈ L*_i there exists a representative item z ∈ L_i whose relative error from y satisfies

\[ (1 - \varepsilon/n)^{i}\, y \le z \le y. \]

The proof of the claim is by induction on i. Initially L_0 = L*_0 = ⟨0⟩, and so there is no error. Suppose by induction that the above equation holds for each item in L*_(i−1). Consider an element y ∈ L*_(i−1). We know that y will generate two elements in L*_i: y and y + x_i. We want to argue that there will be a representative that is "close" to each of these items.

By our induction hypothesis, there is a representative element z in L_(i−1) such that

\[ (1 - \varepsilon/n)^{i-1}\, y \le z \le y. \]

When we apply our algorithm, we will form two new items to add (initially) to L_i: z and z + x_i. Observe that by adding x_i to the inequality above and a little simplification we get

\[ (1 - \varepsilon/n)^{i-1}\, (y + x_i) \le z + x_i \le y + x_i. \]

Fig. 83: Subset sum approximation analysis.

The items z and z + x_i might not appear in L_i because they may be trimmed. Let z' and z'' be their respective representatives. Thus, z' and z'' are elements of L_i. We have

\[ (1 - \varepsilon/n)\, z \le z' \le z, \]
\[ (1 - \varepsilon/n)\, (z + x_i) \le z'' \le z + x_i. \]

Combining these with the inequalities above we have

\[ (1 - \varepsilon/n)^{i-1} (1 - \varepsilon/n)\, y = (1 - \varepsilon/n)^{i}\, y \le z' \le y, \]
\[ (1 - \varepsilon/n)^{i-1} (1 - \varepsilon/n)\, (y + x_i) = (1 - \varepsilon/n)^{i}\, (y + x_i) \le z'' \le y + x_i. \]

Since z' and z'' are in L_i this is the desired result. This ends the proof of the claim.

Using our claim, and the fact that Y* (the optimum answer) is the largest element of L*_n and Y (the approximate answer) is the largest element of L_n, we have

\[ (1 - \varepsilon/n)^{n}\, Y^{*} \le Y \le Y^{*}. \]

This is not quite what we wanted. We wanted to show that (1 − ε)Y* ≤ Y. To complete the proof, we observe from the lemma above (setting a = −ε) that

\[ (1 - \varepsilon) \le \left(1 - \frac{\varepsilon}{n}\right)^{n}. \]

This completes the approximate analysis.
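The last step rests on the inequality (1 − ε) ≤ (1 − ε/n)^n ≤ e^(−ε), that is, n multiplicative errors of ε/n compound to at most ε. A quick numerical sanity check in Python, offered as an illustration rather than a proof, is shown below; the function name, the sampled values of n and ε, and the small floating-point tolerance are arbitrary choices of ours.

```python
import math


def compounded_factor(eps, n):
    """Worst-case factor retained after n stages, each losing a relative eps/n."""
    return (1 - eps / n) ** n


TOLERANCE = 1e-12  # allowance for floating-point rounding

for n in (1, 2, 4, 10, 100, 1000):
    for eps in (0.01, 0.2, 0.5, 0.99):
        factor = compounded_factor(eps, n)
        # Lower bound: the total loss is at most eps.
        assert 1 - eps <= factor + TOLERANCE
        # Upper bound from the lemma with a = -eps.
        assert factor <= math.exp(-eps) + TOLERANCE

print(compounded_factor(0.2, 4))
```

For the running example (ε = 0.2, n = 4) the retained factor is 0.95^4, a little above 0.81, so the guarantee actually delivered is slightly better than the requested 0.8.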
