# Lecture Notes CMSC 251

CMSC 251: Algorithms
1
Spring 1998
Dave Mount
Lecture 1: Course Introduction
(Tuesday, Jan 27, 1998)
Read: Course syllabus and Chapter 1 in CLR (Cormen, Leiserson, and Rivest).
What is algorithm design? Our text deﬁnes an algorithm to be any well-deﬁned computational procedure
that takes some values as input and produces some values as output. Like a cooking recipe, an algorithm
provides a step-by-step method for solving a computational problem.
A good understanding of algorithms is essential for a good understanding of the most basic element of
computer science: programming. Unlike a program, an algorithm is a mathematical entity, which is
independent of a speciﬁc programming language, machine, or compiler. Thus, in some sense, algorithm
design is all about the mathematical theory behind the design of good programs.
Why study algorithm design? There are many facets to good program design. Good algorithm design is
one of them (and an important one). To be really complete algorithm designer, it is important to be
aware of programming and machine issues as well. In any important programming project there are
two major types of issues, macro issues and micro issues.
Macro issues involve elements such as how does one coordinate the efforts of many programmers
working on a single piece of software, and how does one establish that a complex programming system
satisﬁes its various requirements. These macro issues are the primary subject of courses on software
engineering.
A great deal of the programming effort on most complex software systems consists of elements whose
programming is fairly mundane (input and output, data conversion, error checking, report generation).
However, there is often a small critical portion of the software, which may involve only tens to hundreds
of lines of code, but where the great majority of computational time is spent. (Or as the old adage goes:
80% of the execution time takes place in 20% of the code.) The micro issues in programming involve
how best to deal with these small critical sections.
It may be very important for the success of the overall project that these sections of code be written
in the most efﬁcient manner possible. An unfortunately common approach to this problem is to ﬁrst
design an inefﬁcient algorithm and data structure to solve the problem, and then take this poor design
and attempt to ﬁne-tune its performance by applying clever coding tricks or by implementing it on the
most expensive and fastest machines around to boost performance as much as possible. The problem is
that if the underlying design is bad, then often no amount of ﬁne-tuning is going to make a substantial
difference.
As an example, I know of a programmer who was working at Boeing on their virtual reality system for
the 777 project. The system was running unacceptably slowly in spite of the efforts of a large team of
programmers and the biggest supercomputer available. A new programmer was hired to the team, and
his ﬁrst question was on the basic algorithms and data structures used by the system. It turns out that
the system was based on rendering hundreds of millions of polygonal elements, most of which were
1
Copyright, David M. Mount, 1998, Dept. of Computer Science, University of Maryland, College Park, MD, 20742. These lecture
notes were prepared by David Mount for the course CMSC 251, Algorithms, at the University of Maryland, College Park. Permission
to use, copy, modify, and distribute these notes for educational purposes and without fee is hereby granted, provided that this copyright
notice appear in all copies.
1
Lecture Notes CMSC 251
invisible at any point in time. Recognizing this source of inefﬁciency, he redesigned the algorithms and
data structures, recoded the inner loops, so that the algorithm concentrated its efforts on eliminating
many invisible elements, and just drawing the few thousand visible elements. In a matter of two weeks
he had a system that ran faster on his ofﬁce workstation, than the one running on the supercomputer.
This may seem like a simple insight, but it is remarkable how many times the clever efforts of a single
clear-sighted person can eclipse the efforts of larger groups, who are not paying attention to the basic
principle that we will stress in this course:
Before you implement, ﬁrst be sure you have a good design.
This course is all about how to design good algorithms. Because the lesson cannot be taught in just
one course, there are a number of companion courses that are important as well. CMSC 420 deals with
how to design good data structures. This is not really an independent issue, because most of the fastest
algorithms are fast because they use fast data structures, and vice versa. CMSC 451 is the advanced
version of this course, which teaches other advanced elements of algorithm design. In fact, many of the
courses in the computer science department deal with efﬁcient algorithms and data structures, but just
as they apply to various applications: compilers, operating systems, databases, artiﬁcial intelligence,
computer graphics and vision, etc. Thus, a good understanding of algorithm design is a central element
to a good understanding of computer science and good programming.
Implementation Issues: One of the elements that we will focus on in this course is to try to study algorithms
as pure mathematical objects, and so ignore issues such as programming language, machine, and op-
erating system. This has the advantage of clearing away the messy details that affect implementation.
But these details may be very important.
For example, an important fact of current processor technology is that of locality of reference. Fre-
quently accessed data can be stored in registers or cache memory. Our mathematical analyses will
usually ignore these issues. But a good algorithm designer can work within the realm of mathemat-
ics, but still keep an open eye to implementation issues down the line that will be important for ﬁnal
implementation. For example, we will study three fast sorting algorithms this semester, heap-sort,
merge-sort, and quick-sort. From our mathematical analysis, all have equal running times. However,
among the three (barring any extra considerations) quicksort is the fastest on virtually all modern ma-
chines. Why? It is the best from the perspective of locality of reference. However, the difference is
typically small (perhaps 10–20% difference in running time).
Thus this course is not the last word in good program design, and in fact it is perhaps more accu-
rately just the ﬁrst word in good program design. The overall strategy that I would suggest to any
programming would be to ﬁrst come up with a few good designs from a mathematical and algorithmic
perspective. Next prune this selection by consideration of practical matters (like locality of reference).
Finally prototype (that is, do test implementations) a few of the best designs and run them on data
sets that will arise in your application for the ﬁnal ﬁne-tuning. Also, be sure to use whatever develop-
ment tools that you have, such as proﬁlers (programs which pin-point the sections of the code that are
responsible for most of the running time).
Course in Review: This course will consist of three major sections. The ﬁrst is on the mathematical tools
necessary for the analysis of algorithms. This will focus on asymptotics, summations, recurrences.
The second element will deal with one particularly important algorithmic problem: sorting a list of
numbers. We will show a number of different strategies for sorting, and use this problem as a case-
study in different techniques for designing and analyzing algorithms. The ﬁnal third of the course will
deal with a collection of various algorithmic problems and solution techniques. Finally we will close
this last third with a very brief introduction to the theory of NP-completeness. NP-complete problem
are those for which no efﬁcient algorithms are known, but no one knows for sure whether efﬁcient
solutions might exist.
2
Lecture Notes CMSC 251
Lecture 2: Analyzing Algorithms: The 2-d Maxima Problem
(Thursday, Jan 29, 1998)
Analyzing Algorithms: In order to design good algorithms, we must ﬁrst agree the criteria for measuring
algorithms. The emphasis in this course will be on the design of efﬁcient algorithm, and hence we
will measure algorithms in terms of the amount of computational resources that the algorithm requires.
These resources include mostly running time and memory. Depending on the application, there may
be other elements that are taken into account, such as the number disk accesses in a database program
or the communication bandwidth in a networking application.
In practice there are many issues that need to be considered in the design algorithms. These include
issues such as the ease of debugging and maintaining the ﬁnal software through its life-cycle. Also,
one of the luxuries we will have in this course is to be able to assume that we are given a clean, fully-
speciﬁed mathematical description of the computational problem. In practice, this is often not the case,
and the algorithm must be designed subject to only partial knowledge of the ﬁnal speciﬁcations. Thus,
in practice it is often necessary to design algorithms that are simple, and easily modiﬁed if problem
parameters and speciﬁcations are slightly modiﬁed. Fortunately, most of the algorithms that we will
discuss in this class are quite simple, and are easy to modify subject to small problem variations.
Model of Computation: Another goal that we will have in this course is that our analyses be as independent
as possible of the variations in machine, operating system, compiler, or programming language. Unlike
programs, algorithms to be understood primarily by people (i.e. programmers) and not machines. Thus
gives us quite a bit of ﬂexibility in how we present our algorithms, and many low-level details may be
omitted (since it will be the job of the programmer who implements the algorithm to ﬁll them in).
But, in order to say anything meaningful about our algorithms, it will be important for us to settle
on a mathematical model of computation. Ideally this model should be a reasonable abstraction of a
standard generic single-processor machine. We call this model a random access machine or RAM.
A RAM is an idealized machine with an inﬁnitely large random-access memory. Instructions are exe-
cuted one-by-one (there is no parallelism). Each instruction involves performing some basic operation
on two values in the machines memory (which might be characters or integers; let’s avoid ﬂoating
point for now). Basic operations include things like assigning a value to a variable, computing any
basic arithmetic operation (+, −, ∗, integer division) on integer values of any size, performing any
comparison (e.g. x ≤ 5) or boolean operations, accessing an element of an array (e.g. A[10]). We
assume that each basic operation takes the same constant time to execute.
This model seems to go a good job of describing the computational power of most modern (nonparallel)
machines. It does not model some elements, such as efﬁciency due to locality of reference, as described
in the previous lecture. There are some “loop-holes” (or hidden ways of subverting the rules) to beware
of. For example, the model would allow you to add two numbers that contain a billion digits in constant
time. Thus, it is theoretically possible to derive nonsensical results in the form of efﬁcient RAM
programs that cannot be implemented efﬁciently on any machine. Nonetheless, the RAM model seems
to be fairly sound, and has done a good job of modeling typical machine technology since the early
60’s.
Example: 2-dimension Maxima: Rather than jumping in with all the deﬁnitions, let us begin the discussion
of how to analyze algorithms with a simple problem, called 2-dimension maxima. To motivate the
problem, suppose that you want to buy a car. Since you’re a real swinger you want the fastest car
around, so among all cars you pick the fastest. But cars are expensive, and since you’re a swinger on
a budget, you want the cheapest. You cannot decide which is more important, speed or price, but you
know that you deﬁnitely do NOT want to consider a car if there is another car that is both faster and
3
Lecture Notes CMSC 251
cheaper. We say that the fast, cheap car dominates the slow, expensive car relative to your selection
criteria. So, given a collection of cars, we want to list those that are not dominated by any other.
Here is how we might model this as a formal problem. Let a point p in 2-dimensional space be given
by its integer coordinates, p = (p.x, p.y). A point p is said to dominated by point q if p.x ≤ q.x and
p.y ≤ q.y. Given a set of n points, P = ¦p
1
, p
2
, . . . , p
n
¦ in 2-space a point is said to be maximal if it
is not dominated by any other point in P.
The car selection problem can be modeled in this way. If for each car we associated (x, y) values where
x is the speed of the car, and y is the negation of the price (thus high y values mean cheap cars), then
the maximal points correspond to the fastest and cheapest cars.
2-dimensional Maxima: Given a set of points P = ¦p
1
, p
2
, . . . , p
n
¦ in 2-space, each represented by
its x and y integer coordinates, output the set of the maximal points of P, that is, those points p
i
,
such that p
i
is not dominated by any other point of P. (See the ﬁgure below.)
(2,5)
2 4 6 8 10
2
4
6
8
10
12 14
(9,10)
(13,3)
(15,7)
(14,10)
(12,12)
(7,13)
(11,5)
(4,11)
(7,7)
(5,1)
(4,4)
14
12
16
Figure 1: Maximal Points.
Observe that our description of the problem so far has been at a fairly mathematical level. For example,
we have intentionally not discussed issues as to how points are represented (e.g., using a structure with
records for the x and y coordinates, or a 2-dimensional array) nor have we discussed input and output
formats. These would normally be important in a software speciﬁcation. However, we would like to
keep as many of the messy issues out since they would just clutter up the algorithm.
Brute Force Algorithm: To get the ball rolling, let’s just consider a simple brute-force algorithm, with no
thought to efﬁciency. Here is the simplest one that I can imagine. Let P = ¦p
1
, p
2
, . . . , p
n
¦ be the
initial set of points. For each point p
i
, test it against all other points p
j
. If p
i
is not dominated by any
other point, then output it.
This English description is clear enough that any (competent) programmer should be able to implement
it. However, if you want to be a bit more formal, it could be written in pseudocode as follows:
Brute Force Maxima
Maxima(int n, Point P[1..n]) { // output maxima of P[0..n-1]
for i = 1 to n {
maximal = true; // P[i] is maximal by default
for j = 1 to n {
if (i != j) and (P[i].x <= P[j].x) and (P[i].y <= P[j].y) {
4
Lecture Notes CMSC 251
maximal = false; // P[i] is dominated by P[j]
break;
}
}
if (maximal) output P[i]; // no one dominated...output
}
}
There are no formal rules to the syntax of this pseudocode. In particular, do not assume that more
detail is better. For example, I omitted type speciﬁcations for the procedure Maxima and the variable
maximal, and I never deﬁned what a Point data type is, since I felt that these are pretty clear
from context or just unimportant details. Of course, the appropriate level of detail is a judgement call.
Remember, algorithms are to be read by people, and so the level of detail depends on your intended
audience. When writing pseudocode, you should omit details that detract from the main ideas of the
algorithm, and just go with the essentials.
You might also notice that I did not insert any checking for consistency. For example, I assumed that
the points in P are all distinct. If there is a duplicate point then the algorithm may fail to output even
a single point. (Can you see why?) Again, these are important considerations for implementation, but
we will often omit error checking because we want to see the algorithm in its simplest form.
Correctness: Whenever you present an algorithm, you should also present a short argument for its correct-
ness. If the algorithm is tricky, then this proof should contain the explanations of why the tricks works.
In a simple case like the one above, there almost nothing that needs to be said. We simply implemented
the deﬁnition: a point is maximal if no other point dominates it.
Running Time Analysis: The main purpose of our mathematical analyses will be be measure the execution
time (and sometimes the space) of an algorithm. Obviously the running time of an implementation of
the algorithm would depend on the speed of the machine, optimizations of the compiler, etc. Since we
want to avoid these technology issues and treat algorithms as mathematical objects, we will only focus
on the pseudocode itself. This implies that we cannot really make distinctions between algorithms
whose running times differ by a small constant factor, since these algorithms may be faster or slower
depending on how well they exploit the particular machine and compiler. How small is small? To
make matters mathematically clean, let us just ignore all constant factors in analyzing running times.
We’ll see later why even with this big assumption, we can still make meaningful comparisons between
algorithms.
In this case we might measure running time by counting the number of steps of pseudocode that are
executed, or the number of times that an element of P is accessed, or the number of comparisons that
are performed.
Running time depends on input size. So we will deﬁne running time in terms of a function of input
size. Formally, the input size is deﬁned to be the number of characters in the input ﬁle, assuming some
reasonable encoding of inputs (e.g. numbers are represented in base 10 and separated by a space).
However, we will usually make the simplifying assumption that each number is of some constant
maximum length (after all, it must ﬁt into one computer word), and so the input size can be estimated
up to constant factor by the parameter n, that is, the length of the array P.
Also, different inputs of the same size may generally result in different execution times. (For example,
in this problem, the number of times we execute the inner loop before breaking out depends not only on
the size of the input, but the structure of the input.) There are two common criteria used in measuring
running times:
Worst-case time: is the maximum running time over all (legal) inputs of size n? Let I denote a legal
input instance, and let [I[ denote its length, and let T(I) denote the running time of the algorithm
5
Lecture Notes CMSC 251
on input I.
T
worst
(n) = max
|I|=n
T(I).
Average-case time: is the average running time over all inputs of size n? More generally, for each
input I, let p(I) denote the probability of seeing this input. The average-case running time is the
weight sum of running times, with the probability being the weight.
T
avg
(n) =

|I|=n
p(I)T(I).
We will almost always work with worst-case running time. This is because for many of the problems
we will work with, average-case running time is just too difﬁcult to compute, and it is difﬁcult to specify
a natural probability distribution on inputs that are really meaningful for all applications. It turns out
that for most of the algorithms we will consider, there will be only a constant factor difference between
worst-case and average-case times.
Running Time of the Brute Force Algorithm: Let us agree that the input size is n, and for the running
time we will count the number of time that any element of P is accessed. Clearly we go through the
outer loop n times, and for each time through this loop, we go through the inner loop n times as well.
The condition in the if-statement makes four accesses to P. (Under C semantics, not all four need be
evaluated, but let’s ignore this since it will just complicate matters). The output statement makes two
accesses (to P[i].x and P[i].y) for each point that is output. In the worst case every point is maximal
(can you see how to generate such an example?) so these two access are made for each time through
the outer loop.
Thus we might express the worst-case running time as a pair of nested summations, one for the i-loop
and the other for the j-loop:
T(n) =
n

i=1
_
_
2 +
n

j=1
4
_
_
.
These are not very hard summations to solve.

n
j=1
4 is just 4n, and so
T(n) =
n

i=1
(4n + 2) = (4n + 2)n = 4n
2
+ 2n.
As mentioned before we will not care about the small constant factors. Also, we are most interested in
what happens as n gets large. Why? Because when n is small, almost any algorithm is fast enough.
It is only for large values of n that running time becomes an important issue. When n is large, the n
2
term will be much larger than the n term, and so it will dominate the running time. We will sum this
analysis up by simply saying that the worst-case running time of the brute force algorithm is Θ(n
2
).
This is called the asymptotic growth rate of the function. Later we will discuss more formally what
this notation means.
Summations: (This is covered in Chapter 3 of CLR.) We saw that this analysis involved computing a sum-
mation. Summations should be familiar from CMSC 150, but let’s review a bit here. Given a ﬁnite
sequence of values a
1
, a
2
, . . . , a
n
, their suma
1
+a
2
+ +a
n
can be expressed in summation notation
as
n

i=1
a
i
.
If n = 0, then the value of the sum is the additive identity, 0. There are a number of simple algebraic
facts about sums. These are easy to verify by simply writing out the summation and applying simple
6
Lecture Notes CMSC 251
high school algebra. If c is a constant (does not depend on the summation index i) then
n

i=1
ca
i
= c
n

i=1
a
i
and
n

i=1
(a
i
+b
i
) =
n

i=1
a
i
+
n

i=1
b
i
.
There are some particularly important summations, which you should probably commit to memory (or
at least remember their asymptotic growth rates). If you want some practice with induction, the ﬁrst
two are easy to prove by induction.
Arithmetic Series: For n ≥ 0,
n

i=1
i = 1 + 2 + +n =
n(n + 1)
2
= Θ(n
2
).
Geometric Series: Let x ,= 1 be any constant (independent of i), then for n ≥ 0,
n

i=0
x
i
= 1 +x +x
2
+ +x
n
=
x
n+1
−1
x −1
.
If 0 < x < 1 then this is Θ(1), and if x > 1, then this is Θ(x
n
).
Harmonic Series: This arises often in probabilistic analyses of algorithms. For n ≥ 0,
H
n
=
n

i=1
1
i
= 1 +
1
2
+
1
3
+ +
1
n
≈ ln n = Θ(lnn).
Lecture 3: Summations and Analyzing Programs with Loops
(Tuesday, Feb 3, 1998)
Recap: Last time we presented an algorithm for the 2-dimensional maxima problem. Recall that the algo-
rithm consisted of two nested loops. It looked something like this:
Brute Force Maxima
Maxima(int n, Point P[1..n]) {
for i = 1 to n {
...
for j = 1 to n {
...
...
}
}
We were interested in measuring the worst-case running time of this algorithm as a function of the
input size, n. The stuff in the “. . . ” has been omitted because it is unimportant for the analysis.
Last time we counted the number of times that the algorithm accessed a coordinate of any point. (This
was only one of many things that we could have chosen to count.) We showed that as a function of n
in the worst case this quantity was
T(n) = 4n
2
+ 2n.
7
Lecture Notes CMSC 251
We were most interested in the growth rate for large values of n (since almost all algorithms run fast
for small values of n), so we were most interested in the 4n
2
term, which determines how the function
grows asymptotically for large n. Also, we do not care about constant factors (because we wanted
simplicity and machine independence, and ﬁgured that the constant factors were better measured by
implementing the algorithm). So we can ignored the factor 4 and simply say that the algorithm’s
worst-case running time grows asymptotically as n
2
, which we wrote as Θ(n
2
).
In this and the next lecture we will consider the questions of (1) how is it that one goes about analyzing
the running time of an algorithm as function such as T(n) above, and (2) how does one arrive at a
simple asymptotic expression for that running time.
A Harder Example: Let’s consider another example. Again, we will ignore stuff that takes constant time
(expressed as “. . . ” in the code below).
A Not-So-Simple Example:
for i = 1 to n { // assume that n is input size
...
for j = 1 to 2*i {
...
k = j;
while (k >= 0) {
...
k = k - 1;
}
}
}
How do we analyze the running time of an algorithm that has many complex nested loops? The
answer is that we write out the loops as summations, and then try to solve the summations. Let I(),
M(), T() be the running times for (one full execution of) the inner loop, middle loop, and the entire
program. To convert the loops into summations, we work from the inside-out. Let’s consider one pass
through the innermost loop. The number of passes through the loop depends on j. It is executed for
k = j, j −1, j −2, . . . , 0, and the time spent inside the loop is a constant, so the total time is just j +1.
We could attempt to arrive at this more formally by expressing this as a summation:
I(j) =
j

k=0
1 = j + 1
Why the “1”? Because the stuff inside this loop takes constant time to execute. Why did we count
up from 0 to j (and not down as the loop does?) The reason is that the mathematical notation for
summations always goes from low index to high, and since addition is commutative it does not matter
in which order we do the addition.
Now let us consider one pass through the middle loop. It’s running time is determined by i. Using
the summation we derived above for the innermost loop, and the fact that this loop is executed for j
running from 1 to 2i, it follows that the execution time is
M(i) =
2i

j=1
I(j) =
2i

j=1
(j + 1).
Last time we gave the formula for the arithmetic series:
n

i=1
i =
n(n + 1)
2
.
8
Lecture Notes CMSC 251
Our sum is not quite of the right form, but we can split it into two sums:
M(i) =
2i

j=1
j +
2i

j=1
1.
The latter sum is clearly just 2i. The former is an arithmetic series, and so we ﬁnd can plug in 2i for n,
and j for i in the formula above to yield the value:
M(i) =
2i(2i + 1)
2
+ 2i =
4i
2
+ 2i + 4i
2
= 2i
2
+ 3i.
Now, for the outermost sum and the running time of the entire algorithm we have
T(n) =
n

i=1
(2i
2
+ 3i).
Splitting this up (by the linearity of addition) we have
T(n) = 2
n

i=1
i
2
+ 3
n

i=1
i.
The latter sum is another arithmetic series, which we can solve by the formula above as n(n + 1)/2.
The former summation

n
i=1
i
2
is not one that we have seen before. Later, we’ll show the following.
Quadratic Series: For n ≥ 0.
n

i=1
i
2
= 1 + 4 + 9 + +n
2
=
2n
3
+ 3n
2
+n
6
.
Assuming this fact for now, we conclude that the total running time is:
T(n) = 2
2n
3
+ 3n
2
+n
6
+ 3
n(n + 1)
2
,
which after some algebraic manipulations gives
T(n) =
4n
3
+ 15n
2
+ 11n
6
.
As before, we ignore all but the fastest growing term 4n
3
/6, and ignore constant factors, so the total
running time is Θ(n
3
).
Solving Summations: In the example above, we saw an unfamiliar summation,

n
i=1
i
2
, which we claimed
could be solved in closed form as:
n

i=1
i
2
=
2n
3
+ 3n
2
+n
6
.
Solving a summation in closed-form means that you can write an exact formula for the summation
without any embedded summations or asymptotic terms. In general, when you are presented with an
unfamiliar summation, how do you approach solving it, or if not solving it in closed form, at least
getting an asymptotic approximation. Here are a few ideas.
9
Lecture Notes CMSC 251
Use crude bounds: One of the simples approaches, that usually works for arriving at asymptotic
bounds is to replace every term in the summation with a simple upper bound. For example,
in

n
i=1
i
2
we could replace every term of the summation by the largest term. This would give
n

i=1
i
2

n

i=1
n
2
= n
3
.
Notice that this is asymptotically equal to the formula, since both are Θ(n
3
).
This technique works pretty well with relatively slow growing functions (e.g., anything growing
more slowly than than a polynomial, that is, i
c
for some constant c). It does not give good bounds
with faster growing functions, such as an exponential function like 2
i
.
Approximate using integrals: Integration and summation are closely related. (Integration is in some
sense a continuous form of summation.) Here is a handy formula. Let f(x) be any monotonically
increasing function (the function increases as x increases).
_
n
0
f(x)dx ≤
n

i=1
f(i) ≤
_
n+1
1
f(x)dx.
2 1 0 3 ... n
f(x)
f(2)
x
2 1 0 3 ... n n+1
f(2)
x
f(x)
Figure 2: Approximating sums by integrals.
Most running times are increasing functions of input size, so this formula is useful in analyzing
algorithm running times.
Using this formula, we can approximate the above quadratic sum. In this case, f(x) = x
2
.
n

i=1
i
2

_
n+1
1
x
2
dx =
x
3
3
¸
¸
¸
¸
n+1
x=1
=
(n + 1)
3
3

1
3
=
n
3
+ 3n
2
+ 3n
3
.
Note that the constant factor on the leading term of n
3
/3 is equal to the exact formula.
You might say, why is it easier to work with integrals than summations? The main reason is
that most people have more experience in calculus than in discrete math, and there are many
mathematics handbooks with lots of solved integrals.
Use constructive induction: This is a fairly good method to apply whenever you can guess the general
form of the summation, but perhaps you are not sure of the various constant factors. In this case,
the integration formula suggests a solution of the form:
n

i=1
i
2
= an
3
+bn
2
+cn +d,
but we do not know what a, b, c, and d are. However, we believe that they are constants (i.e., they
are independent of n).
10
Lecture Notes CMSC 251
Let’s try to prove this formula by induction on n, and as the proof proceeds, we should gather
information about what the values of a, b, c, and d are.
Since this is the ﬁrst induction proof we have done, let us recall how induction works. Basically
induction proofs are just the mathematical equivalents of loops in programming. Let n be the
integer variable on which we are performing the induction. The theorem or formula to be proved,
called the induction hypothesis is a function of n, denote IH(n). There is some smallest value
n
0
for which IH(n
0
) is suppose to hold. We prove IH(n
0
), and then we work up to successively
larger value of n, each time we may make use of the induction hypothesis, as long as we apply it
to strictly smaller values of n.
Prove IH(n0);
for n = n0+1 to infinity do
Prove IH(n), assuming that IH(n’) holds for all n’ < n;
This is sometimes called strong induction, because we assume that the hypothesis holds for all
n

< n. Usually we only need to assume the induction hypothesis for the next smaller value of n,
namely n −1.
Basis Case: (n = 0) Recall that an empty summation is equal to the additive identity, 0. In this
case we want to prove that 0 = a 0
3
+ b 0
2
+ c 0 + d. For this to be true, we must have
d = 0.
Induction Step: Let us assume that n > 0, and that the formula holds for all values n

< n, and
from this we will show that the formula holds for the value n itself.
The structure of proving summations by induction is almost always the same. First, write the
summation for i running up to n, then strip off the last term, apply the induction hypothesis
on the summation running up to n −1, and then combine everything algebraically. Here we
go.
n

i=1
i
2
=
_
n−1

i=1
i
2
_
+n
2
= a(n −1)
3
+b(n −1)
2
+c(n −1) +d +n
2
= (an
3
−3an
2
+ 3an −a) + (bn
2
−2bn +b) + (cn −c) +d +n
2
= an
3
+ (−3a +b + 1)n
2
+ (3a −2b +c)n + (−a +b −c +d).
To complete the proof, we want this is equal to an
3
+bn
2
+cn+d. Since this should be true
for all n, this means that each power of n must match identically. This gives us the following
constraints
a = a, b = −3a +b + 1, c = 3a −2b +c, d = −a +b −c +d.
We already know that d = 0 from the basis case. From the second constraint above we can
cancel b from both sides, implying that a = 1/3. Combining this with the third constraint
we have b = 1/2. Finally from the last constraint we have c = −a +b = 1/6.
This gives the ﬁnal formula
n

i=1
i
2
=
n
3
3
+
n
2
2
+
n
6
=
2n
3
+ 3n
2
+n
6
.
As desired, all of the values a through d are constants, independent of n. If we had chosen
the wrong general form, then either we would ﬁnd that some of these “constants” depended
on n, or we might get a set of constraints that could not be satisﬁed.
Notice that constructive induction gave us the exact formula for the summation. The only
tricky part is that we had to “guess” the general structure of the solution.
11
Lecture Notes CMSC 251
In summary, there is no one way to solve a summation. However, there are many tricks that can be
applied to either ﬁnd asymptotic approximations or to get the exact solution. The ultimate goal is to
come up with a close-form solution. This is not always easy or even possible, but for our purposes
asymptotic bounds will usually be good enough.
Lecture 4: 2-d Maxima Revisited and Asymptotics
(Thursday, Feb 5, 1998)
Read: Chapts. 2 and 3 in CLR.
2-dimensional Maxima Revisited: Recall the max-dominance problem from the previous lectures. A point
p is said to dominated by point q if p.x ≤ q.x and p.y ≤ q.y. Given a set of n points, P =
¦p
1
, p
2
, . . . , p
n
¦ in 2-space a point is said to be maximal if it is not dominated by any other point
in P. The problem is to output all the maximal points of P.
So far we have introduced a simple brute-force algorithm that ran in Θ(n
2
) time, which operated by
comparing all pairs of points. The question we consider today is whether there is an approach that is
signiﬁcantly better?
The problem with the brute-force algorithm is that uses no intelligence in pruning out decisions. For
example, once we know that a point p
i
is dominated by another point p
j
, then we we do not need to use
p
i
for eliminating other points. Any point that p
i
dominates will also be dominated by p
j
. (This follows
from the fact that the domination relation is transitive, which can easily be veriﬁed.) This observation
by itself, does not lead to a signiﬁcantly faster algorithm though. For example, if all the points are
maximal, which can certainly happen, then this optimization saves us nothing.
Plane-sweep Algorithm: The question is whether we can make an signiﬁcant improvement in the running
time? Here is an idea for how we might do it. We will sweep a vertical line across the plane from left
to right. As we sweep this line, we will build a structure holding the maximal points lying to the left of
the sweep line. When the sweep line reaches the rightmost point of P, then we will have constructed
the complete set of maxima. This approach of solving geometric problems by sweeping a line across
the plane is called plane sweep.
Although we would like to think of this as a continuous process, we need some way to perform the
plane sweep in discrete steps. To do this, we will begin by sorting the points in increasing order of
their x-coordinates. For simplicity, let us assume that no two points have the same y-coordinate. (This
limiting assumption is actually easy to overcome, but it is good to work with the simpler version, and
save the messy details for the actual implementation.) Then we will advance the sweep-line from point
to point in n discrete steps. As we encounter each new point, we will update the current list of maximal
points.
First off, how do we sort the points? We will leave this problem for later in the semester. But the
bottom line is that there exist any number of good sorting algorithms whose running time to sort n
values is Θ(nlog n). We will just assume that they exist for now.
So the only remaining problem is, how do we store the existing maximal points, and how do we update
them when a new point is processed? We claim that as each new point is added, it must be maximal for
the current set. (Why? Beacuse its x-coordinate is larger than all the x-coordinates of all the existing
points, and so it cannot be dominated by any of the existing points.) However, this new point may
dominate some of the existing maximal points, and so we may need to delete them from the list of
maxima. (Notice that once a point is deleted as being nonmaximal, it will never need to be added back
again.) Consider the ﬁgure below.
Let p
i
denote the current point being considered. Notice that since the p
i
has greater x-coordinate
than all the existing points, it dominates an existing point if and only if its y-coordinate is also larger
12
Lecture Notes CMSC 251
(b)
sweep line sweep line
(a)
4 2
(2,5)
(13,3)
(9,10)
(4,11)
(3,13)
(10,5)
(7,7)
(15,7)
(14,10)
(12,12)
(5,1)
(4,4)
14
12
16 14 12
10
8
6
4
2
10 8 6 6
(2,5)
(13,3)
(9,10)
(4,11)
(3,13)
(10,5)
(7,7)
(15,7)
(14,10)
(12,12)
8 10
2
4
6
8
10
12 14 16
12
14
(4,4)
(5,1)
4 2
Figure 3: Plane sweep algorithm for 2-d maxima.
(or equal). Thus, among the existing maximal points, we want to ﬁnd those having smaller (or equal)
y-coordinate, and eliminate them.
At this point, we need to make an important observation about how maximal points are ordered with
respect to the x- and y-coordinates. As we read maximal points from left to right (in order of increasing
x-coordinates) the y-coordinates appear in decreasing order. Why is this so? Suppose to the contrary,
that we had two maximal points p and q, with p.x ≥ q.x but p.y ≥ q.y. Then it would follow that q is
dominated by p, and hence it is not maximal, a contradiction.
This is nice, because it implies that if we store the existing maximal points in a list, the points that
p
i
dominates (if any) will all appear at the end of this list. So we have to scan this list to ﬁnd the
breakpoint between the maximal and dominated points. The question is how do we do this?
I claim that we can simply scan the list linearly. But we must do the scan in the proper direction for
the algorithm to be efﬁcient. Which direction should we scan the list of current maxima? From left
to right, until ﬁnding the ﬁrst point that is not dominated, or from right to left, until ﬁnding the ﬁrst
point that is dominated? Stop here and think about it for a moment. If you can answer this question
correctly, then it says something about your intuition for designing efﬁcient algorithms. Let us assume
that we are trying to optimize worst-case performance.
The correct answer is to scan the list from left to right. Here is why. If you only encounter one point
in the scan, then the scan will always be very efﬁcient. The danger is that you may scan many points
before ﬁnding the proper breakpoint. If we scan the list from left to right, then every point that we
encounter whose y-coordinate is less than p
i
’s will be dominated, and hence it will be eliminated from
the computation forever. We will never have to scan this point again. On the other hand, if we scan
from left to right, then in the worst case (consider when all the points are maximal) we may rescan the
same points over and over again. This will lead to an Θ(n
2
) algorithm
Now we can give the pseudocode for the ﬁnal plane sweep algorithm. Since we add maximal points
onto the end of the list, and delete them from the end of the list, we can use a stack to store the maximal
points, where the top of the stack contains the point with the highest x-coordinate. Let S denote this
stack. The top element of the stack is denoted S.top. Popping the stack means removing the top
element.
Plane Sweep Maxima
Maxima2(int n, Point P[1..n]) {
13
Lecture Notes CMSC 251
Sort P in ascending order by x-coordinate;
S = empty; // initialize stack of maxima
for i = 1 to n do { // add points in order of x-coordinate
while (S is not empty and S.top.y <= P[i].y)
Pop(S); // remove points that P[i] dominates
Push(S, P[i]); // add P[i] to stack of maxima
}
output the contents of S;
}
Why is this algorithm correct? The correctness follows from the discussion up to now. The most
important element was that since the current maxima appear on the stack in decreasing order of x-
coordinates (as we look down fromthe top of the stack), they occur in increasing order of y-coordinates.
Thus, as soon as we ﬁnd the last undominated element in the stack, it follows that everyone else on the
stack is undominated.
Analysis: This is an interesting program to analyze, primarily because the techniques that we discussed in
the last lecture do not apply readily here. I claim that after the sorting (which we mentioned takes
Θ(nlog n) time), the rest of the algorithm only takes Θ(n) time. In particular, we have two nested
loops. The outer loop is clearly executed n times. The inner while-loop could be iterated up to n − 1
times in the worst case (in particular, when the last point added dominates all the others). So, it seems
that though we have n(n −1) for a total of Θ(n
2
).
However, this is a good example of how not to be fooled by analyses that are too simple minded.
Although it is true that the inner while-loop could be executed as many as n − 1 times any one time
through the outer loop, over the entire course of the algorithm we claim that it cannot be executed
more than n times. Why is this? First observe that the total number of elements that have ever been
pushed onto the stack is at most n, since we execute exactly one Push for each time through the outer
for-loop. Also observe that every time we go through the inner while-loop, we must pop an element off
the stack. It is impossible to pop more elements off the stack than are ever pushed on. Therefore, the
inner while-loop cannot be executed more than n times over the entire course of the algorithm. (Make
sure that you believe the argument before going on.)
Therefore, since the total number of iterations of the inner while-loop is n, and since the total number
of iterations in the outer for-loop is n, the total running time of the algorithm is Θ(n).
Is this really better? How much of an improvement is this plane-sweep algorithm over the brute-force al-
gorithm? Probably the most accurate way to ﬁnd out would be to code the two up, and compare their
running times. But just to get a feeling, let’s look at the ratio of the running times. (We have ignored
constant factors, but we’ll see that they cannot play a very big role.)
We have argued that the brute-force algorithm runs in Θ(n
2
) time, and the improved plane-sweep
algorithm runs in Θ(nlog n) time. What is the base of the logarithm? It turns out that it will not matter
for the asymptotics (we’ll show this later), so for concreteness, let’s assume logarithm base 2, which
we’ll denote as lg n. The ratio of the running times is:
n
2
nlg n
=
n
lg n
.
For relatively small values of n (e.g. less than 100), both algorithms are probably running fast enough
that the difference will be practically negligible. On larger inputs, say, n = 1, 000, the ratio of n to
lg n is about 1000/10 = 100, so there is a 100-to-1 ratio in running times. Of course, we have not
considered the constant factors. But since neither algorithm makes use of very complex constructs, it
is hard to imagine that the constant factors will differ by more than, say, a factor of 10. For even larger
14
Lecture Notes CMSC 251
inputs, say, n = 1, 000, 000, we are looking at a ratio of roughly 1, 000, 000/20 = 50, 000. This is
quite a signiﬁcant difference, irrespective of the constant factors.
For example, suppose that there was a constant factor difference of 10 to 1, in favor of the brute-force
algorithm. The plane-sweep algorithm would still be 5,000 times faster. If the plane-sweep algorithm
took, say 10 seconds to execute, then the brute-force algorithm would take 14 hours.
From this we get an idea about the importance of asymptotic analysis. It tells us which algorithm is
better for large values of n. As we mentioned before, if n is not very large, then almost any algorithm
will be fast. But efﬁcient algorithm design is most important for large inputs, and the general rule of
computing is that input sizes continue to grow until people can no longer tolerate the running times.
Thus, by designing algorithms efﬁciently, you make it possible for the user to run large inputs in a
reasonable amount of time.
Asymptotic Notation: We continue to use the notation Θ() but have never deﬁned it. Let’s remedy this now.
Deﬁnition: Given any function g(n), we deﬁne Θ(g(n)) to be a set of functions that are asymptotically
equivalent to g(n), or put formally:
Θ(g(n)) = ¦f(n) [ there exist positive constants c
1
, c
2
, and n
0
such that
0 ≤ c
1
g(n) ≤ f(n) ≤ c
2
g(n) for all n ≥ n
0
¦.
Your response at this point might be, “I’m sorry that I asked”. It seems that the reasonably simple
concept of “throw away all but the fastest growing term, and ignore constant factors” should have a
simpler and more intuitive deﬁnition than this. Unfortunately, it does not (although later we will see
that there is somewhat easier, and nearly equivalent deﬁnition).
First off, we can see that we have been misusing the notation. We have been saying things like T(n) =
Θ(n
2
). This cannot be true. The left side is a function, and right side is a set of functions. This should
properly be written as T(n) ∈ Θ(n
2
). However, this abuse of notation is so common in the ﬁeld of
algorithm design, that no one notices it.
Going back to an earlier lecture, recall that we argued that the brute-force algorithm for 2-d maxima
had a running time of T(n) = 4n
2
+ 2n, which we claimed was Θ(n
2
). Let’s verify that this is so. In
this case g(n) = n
2
. We want to show that f(n) = 4n
2
+2n is a member of this set, which means that
we must argue that there exist constants c
1
, c
2
, and n
0
such that
0 ≤ c
1
n
2
≤ (4n
2
+ 2n) ≤ c
2
n
2
for all n ≥ n
0
.
There are really three inequalities here. The constraint 0 ≤ c
1
n
2
is no problem, since we will always
be dealing with positive n and positive constants. The next is:
c
1
n
2
≤ 4n
2
+ 2n.
If we set c
1
= 4, then we have 0 ≤ 4n
2
≤ 4n
2
+ 2n, which is clearly true as long as n ≥ 0. The other
inequality is
4n
2
+ 2n ≤ c
2
n
2
.
If we select c
2
= 6, and assume that n ≥ 1, then we have n
2
≥ n, implying that
4n
2
+ 2n ≤ 4n
2
+ 2n
2
= 6n
2
= c
2
n
2
.
We have two constraints on n, n ≥ 0 and n ≥ 1. So let us make n
0
= 1, which will imply that we as
long as n ≥ n
0
, we will satisfy both of these constraints.
Thus, we have given a formal proof that 4n
2
+ 2n ∈ Θ(n
2
), as desired. Next time we’ll try to give
some of the intuition behind this deﬁnition.
15
Lecture Notes CMSC 251
Lecture 5: Asymptotics
(Tuesday, Feb 10, 1998)
Read: Chapt. 3 in CLR. The Limit Rule is not really covered in the text. Read Chapt. 4 for next time.
Asymptotics: We have introduced the notion of Θ() notation, and last time we gave a formal deﬁnition.
Today, we will explore this and other asymptotic notations in greater depth, and hopefully give a better
understanding of what they mean.
Θ-Notation: Recall the following deﬁnition from last time.
Deﬁnition: Given any function g(n), we deﬁne Θ(g(n)) to be a set of functions:
Θ(g(n)) = ¦f(n) [ there exist strictly positive constants c
1
, c
2
, and n
0
such that
0 ≤ c
1
g(n) ≤ f(n) ≤ c
2
g(n) for all n ≥ n
0
¦.
Let’s dissect this deﬁnition. Intuitively, what we want to say with “f(n) ∈ Θ(g(n))” is that f(n) and
g(n) are asymptotically equivalent. This means that they have essentially the same growth rates for
large n. For example, functions like 4n
2
, (8n
2
+2n−3), (n
2
/5+

n−10 log n), and n(n−3) are all
intuitively asymptotically equivalent, since as n becomes large, the dominant (fastest growing) term is
some constant times n
2
. In other words, they all grow quadratically in n. The portion of the deﬁnition
that allows us to select c
1
and c
2
is essentially saying “the constants do not matter because you may
pick c
1
and c
2
however you like to satisfy these conditions.” The portion of the deﬁnition that allows
us to select n
0
is essentially saying “we are only interested in large n, since you only have to satisfy
the condition for all n bigger than n
0
, and you may make n
0
as big a constant as you like.”
An example: Consider the function f(n) = 8n
2
+ 2n − 3. Our informal rule of keeping the largest term
and throwing away the constants suggests that f(n) ∈ Θ(n
2
) (since f grows quadratically). Let’s see
why the formal deﬁnition bears out this informal observation.
We need to show two things: ﬁrst, that f(n) does grows asymptotically at least as fast as n
2
, and
second, that f(n) grows no faster asymptotically than n
2
. We’ll do both very carefully.
Lower bound: f(n) grows asymptotically at least as fast as n
2
: This is established by the portion
of the deﬁnition that reads: (paraphrasing): “there exist positive constants c
1
and n
0
, such that
f(n) ≥ c
1
n
2
for all n ≥ n
0
.” Consider the following (almost correct) reasoning:
f(n) = 8n
2
+ 2n −3 ≥ 8n
2
−3 = 7n
2
+ (n
2
−3) ≥ 7n
2
= 7n
2
.
Thus, if we set c
1
= 7, then we are done. But in the above reasoning we have implicitly made
the assumptions that 2n ≥ 0 and n
2
−3 ≥ 0. These are not true for all n, but they are true for all
sufﬁciently large n. In particular, if n ≥

3, then both are true. So let us select n
0

3, and
now we have f(n) ≥ c
1
n
2
, for all n ≥ n
0
, which is what we need.
Upper bound: f(n) grows asymptotically no faster than n
2
: This is established by the portion of
the deﬁnition that reads “there exist positive constants c
2
and n
0
, such that f(n) ≤ c
2
n
2
for all
n ≥ n
0
.” Consider the following reasoning (which is almost correct):
f(n) = 8n
2
+ 2n −3 ≤ 8n
2
+ 2n ≤ 8n
2
+ 2n
2
= 10n
2
.
This means that if we let c
2
= 10, then we are done. We have implicitly made the assumption
that 2n ≤ 2n
2
. This is not true for all n, but it is true for all n ≥ 1. So, let us select n
0
≥ 1, and
now we have f(n) ≤ c
2
n
2
for all n ≥ n
0
, which is what we need.
16
Lecture Notes CMSC 251
From the lower bound, we have n
0

3 and from the upper bound we have n
0
≥ 1, and so combining
these we let n
0
be the larger of the two: n
0
=

3. Thus, in conclusion, if we let c
1
= 7, c
2
= 10, and
n
0
=

3, then we have
0 ≤ c
1
g(n) ≤ f(n) ≤ c
2
g(n) for all n ≥ n
0
,
and this is exactly what the deﬁnition requires. Since we have shown (by construction) the existence of
constants c
1
, c
2
, and n
0
, we have established that f(n) ∈ n
2
. (Whew! That was a lot more work than
just the informal notion of throwing away constants and keeping the largest term, but it shows how this
informal notion is implemented formally in the deﬁnition.)
Now let’s show why f(n) is not in some other asymptotic class. First, let’s show that f(n) / ∈ Θ(n).
If this were true, then we would have to satisfy both the upper and lower bounds. It turns out that
the lower bound is satisﬁed (because f(n) grows at least as fast asymptotically as n). But the upper
bound is false. In particular, the upper bound requires us to show “there exist positive constants c
2
and n
0
, such that f(n) ≤ c
2
n for all n ≥ n
0
.” Informally, we know that as n becomes large enough
f(n) = 8n
2
+ 2n − 3 will eventually exceed c
2
n no matter how large we make c
2
(since f(n) is
2
n is only growing linearly). To show this formally, suppose towards a
2
and n
0
did exist, such that 8n
2
+ 2n − 3 ≤ c
2
n for all n ≥ n
0
. Since
this is true for all sufﬁciently large n then it must be true in the limit as n tends to inﬁnity. If we divide
both side by n we have:
lim
n→∞
_
8n + 2 −
3
n
_
≤ c
2
.
It is easy to see that in the limit the left side tends to ∞, and so no matter how large c
2
is, this statement
is violated. This means that f(n) / ∈ Θ(n).
Let’s show that f(n) / ∈ Θ(n
3
). Here the idea will be to violate the lower bound: “there exist positive
constants c
1
and n
0
, such that f(n) ≥ c
1
n
3
for all n ≥ n
0
.” Informally this is true because f(n) is
growing quadratically, and eventually any cubic function will exceed it. To show this formally, suppose
towards a contradiction that constants c
1
and n
0
did exist, such that 8n
2
+2n−3 ≥ c
1
n
3
for all n ≥ n
0
.
Since this is true for all sufﬁciently large n then it must be true in the limit as n tends to inﬁnity. If we
divide both side by n
3
we have:
lim
n→∞
_
8
n
+
2
n
2

3
n
3
_
≥ c
1
.
It is easy to see that in the limit the left side tends to 0, and so the only way to satisfy this requirement
is to set c
1
= 0, but by hypothesis c
1
is positive. This means that f(n) / ∈ Θ(n
3
).
O-notation and Ω-notation: We have seen that the deﬁnition of Θ-notation relies on proving both a lower
and upper asymptotic bound. Sometimes we are only interested in proving one bound or the other. The
O-notation allows us to state asymptotic upper bounds and the Ω-notation allows us to state asymptotic
lower bounds.
Deﬁnition: Given any function g(n),
O(g(n)) = ¦f(n) [ there exist positive constants c and n
0
such that
0 ≤ f(n) ≤ cg(n) for all n ≥ n
0
¦.
Deﬁnition: Given any function g(n),
Ω(g(n)) = ¦f(n) [ there exist positive constants c and n
0
such that
0 ≤ cg(n) ≤ f(n) for all n ≥ n
0
¦.
17
Lecture Notes CMSC 251
Compare this with the deﬁnition of Θ. You will see that O-notation only enforces the upper bound of
the Θ deﬁnition, and Ω-notation only enforces the lower bound. Also observe that f(n) ∈ Θ(g(n)) if
and only if f(n) ∈ O(g(n)) and f(n) ∈ Ω(g(n)). Intuitively, f(n) ∈ O(g(n)) means that f(n) grows
asymptotically at the same rate or slower than g(n). Whereas, f(n) ∈ O(g(n)) means that f(n) grows
asymptotically at the same rate or faster than g(n).
For example f(n) = 3n
2
+ 4n ∈ Θ(n
2
) but it is not in Θ(n) or Θ(n
3
). But f(n) ∈ O(n
2
) and in
O(n
3
) but not in O(n). Finally, f(n) ∈ Ω(n
2
) and in Ω(n) but not in Ω(n
3
).
The Limit Rule for Θ: The previous examples which used limits suggest alternative way of showing that
f(n) ∈ Θ(g(n)).
Limit Rule for Θ-notation: Given positive functions f(n) and g(n), if
lim
n→∞
f(n)
g(n)
= c,
for some constant c > 0 (strictly positive but not inﬁnity), then f(n) ∈ Θ(g(n)).
Limit Rule for O-notation: Given positive functions f(n) and g(n), if
lim
n→∞
f(n)
g(n)
= c,
for some constant c ≥ 0 (nonnegative but not inﬁnite), then f(n) ∈ O(g(n)).
Limit Rule for Ω-notation: Given positive functions f(n) and g(n), if
lim
n→∞
f(n)
g(n)
,= 0
(either a strictly positive constant or inﬁnity) then f(n) ∈ Ω(g(n)).
This limit rule can be applied in almost every instance (that I know of) where the formal deﬁnition can
be used, and it is almost always easier to apply than the formal deﬁnition. The only exceptions that I
know of are strange instances where the limit does not exist (e.g. f(n) = n
(1+sin n)
). But since most
running times are fairly well-behaved functions this is rarely a problem.
You may recall the important rules from calculus for evaluating limits. (If not, dredge out your old
calculus book to remember.) Most of the rules are pretty self evident (e.g., the limit of a ﬁnite sum is
the sum of the individual limits). One important rule to remember is the following:
L’Hˆ opital’s rule: If f(n) and g(n) both approach 0 or both approach ∞in the limit, then
lim
n→∞
f(n)
g(n)
= lim
n→∞
f

(n)
g

(n)
,
where f

(n) and g

(n) denote the derivatives of f and g relative to n.
Polynomial Functions: Using the Limit Rule it is quite easy to analyze polynomial functions.
Lemma: Let f(n) = 2n
4
−5n
3
−2n
2
+ 4n −7. Then f(n) ∈ Θ(n
4
).
Proof: This would be quite tedious to do by the formal deﬁnition. Using the limit rule we have:
lim
n→∞
f(n)
n
4
= lim
n→∞
_
2 −
5
n

2
n
2
+
4
n
3

7
n
4
_
= 2 −0 −0 + 0 −0 = 2.
Since 2 is a strictly positive constant it follows from the limit rule that f(n) ∈ Θ(n
2
).
18
Lecture Notes CMSC 251
In fact, it is easy to generalize this to arbitrary polynomials.
Theorem: Consider any asymptotically positive polynomial of degree p(n) =

d
i=0
a
i
n
i
, where
a
d
> 0. Then p(n) ∈ Θ(n
d
).
From this, the informal rule of “keep the largest term and throw away the constant factors” is now much
more evident.
Exponentials and Logarithms: Exponentials and logarithms are very important in analyzing algorithms.
The following are nice to keep in mind. The terminology lg
b
n means (lg n)
b
.
Lemma: Given any positive constants a > 1, b, and c:
lim
n→∞
n
b
a
n
= 0 lim
n→∞
lg
b
n
n
c
= 0.
We won’t prove these, but they can be shown by taking appropriate powers, and then applying L’Hˆ opital’s
rule. The important bottom line is that polynomials always grow more slowly than exponentials whose
base is greater than 1. For example:
n
500
∈ O(2
n
).
For this reason, we will try to avoid exponential running times at all costs. Conversely, logarithmic
powers (sometimes called polylogarithmic functions) grow more slowly than any polynomial. For
example:
lg
500
n ∈ O(n).
For this reason, we will usually be happy to allow any number of additional logarithmic factors, if it
means avoiding any additional powers of n.
At this point, it should be mentioned that these last observations are really asymptotic results. They
are true in the limit for large n, but you should be careful just how high the crossover point is. For
example, by my calculations, lg
500
n ≤ n only for n > 2
6000
(which is much larger than input size
you’ll ever see). Thus, you should take this with a grain of salt. But, for small powers of logarithms,
this applies to all reasonably large input sizes. For example lg
2
n ≤ n for all n ≥ 16.
Asymptotic Intuition: To get a intuitive feeling for what common asymptotic running times map into in
terms of practical usage, here is a little list.
• Θ(1): Constant time; you can’t beat it!
• Θ(log n): This is typically the speed that most efﬁcient data structures operate in for a single
access. (E.g., inserting a key into a balanced binary tree.) Also it is the time to ﬁnd an object in a
sorted list of length n by binary search.
• Θ(n): This is about the fastest that an algorithm can run, given that you need Θ(n) time just to
• Θ(nlog n): This is the running time of the best sorting algorithms. Since many problems require
sorting the inputs, this is still considered quite efﬁcient.
• Θ(n
2
), Θ(n
3
), . . ..: Polynomial time. These running times are acceptable either when the expo-
nent is small or when the data size is not too large (e.g. n ≤ 1, 000).
• Θ(2
n
), Θ(3
n
): Exponential time. This is only acceptable when either (1) your know that you
inputs will be of very small size (e.g. n ≤ 50), or (2) you know that this is a worst-case running
time that will rarely occur in practical instances. In case (2), it would be a good idea to try to get
a more accurate average case analysis.
• Θ(n!), Θ(n
n
): Acceptable only for really small inputs (e.g. n ≤ 20).
Are their even bigger functions. You betcha! For example, if you want to see a function that grows
inconceivably fast, look up the deﬁnition of Ackerman’s function in our book.
19
Lecture Notes CMSC 251
Lecture 6: Divide and Conquer and MergeSort
(Thursday, Feb 12, 1998)
Read: Chapt. 1 (on MergeSort) and Chapt. 4 (on recurrences).
Divide and Conquer: The ancient Roman politicians understood an important principle of good algorithm
design (although they were probably not thinking about algorithms at the time). You divide your
enemies (by getting them to distrust each other) and then conquer them piece by piece. This is called
divide-and-conquer. In algorithm design, the idea is to take a problem on a large input, break the input
into smaller pieces, solve the problem on each of the small pieces, and then combine the piecewise
solutions into a global solution. But once you have broken the problem into pieces, how do you solve
these pieces? The answer is to apply divide-and-conquer to them, thus further breaking them down.
The process ends when you are left with such tiny pieces remaining (e.g. one or two items) that it is
trivial to solve them.
Summarizing, the main elements to a divide-and-conquer solution are
• Divide (the problem into a small number of pieces),
• Conquer (solve each piece, by applying divide-and-conquer recursively to it), and
• Combine (the pieces together into a global solution).
There are a huge number computational problems that can be solved efﬁciently using divide-and-
conquer. In fact the technique is so powerful, that when someone ﬁrst suggests a problem to me,
the ﬁrst question I usually ask (after what is the brute-force solution) is “does there exist a divide-and-
conquer solution for this problem?”
Divide-and-conquer algorithms are typically recursive, since the conquer part involves invoking the
same technique on a smaller subproblem. Analyzing the running times of recursive programs is rather
tricky, but we will show that there is an elegant mathematical concept, called a recurrence, which is
useful for analyzing the sort of recursive programs that naturally arise in divide-and-conquer solutions.
For the next couple of lectures we will discuss some examples of divide-and-conquer algorithms, and
how to analyze them using recurrences.
MergeSort: The ﬁrst example of a divide-and-conquer algorithm which we will consider is perhaps the best
known. This is a simple and very efﬁcient algorithm for sorting a list of numbers, called MergeSort.
We are given an sequence of n numbers A, which we will assume is stored in an array A[1 . . . n]. The
objective is to output a permutation of this sequence, sorted in increasing order. This is normally done
by permuting the elements within the array A.
How can we apply divide-and-conquer to sorting? Here are the major elements of the MergeSort
algorithm.
Divide: Split A down the middle into two subsequences, each of size roughly n/2.
Conquer: Sort each subsequence (by calling MergeSort recursively on each).
Combine: Merge the two sorted subsequences into a single sorted list.
The dividing process ends when we have split the subsequences down to a single item. An sequence
of length one is trivially sorted. The key operation where all the work is done is in the combine stage,
which merges together two sorted lists into a single sorted list. It turns out that the merging process is
quite easy to implement.
The following ﬁgure gives a high-level view of the algorithm. The “divide” phase is shown on the left.
It works top-down splitting up the list into smaller sublists. The “conquer and combine” phases are
shown on the right. They work bottom-up, merging sorted lists together into larger sorted lists.
20
Lecture Notes CMSC 251
7 5 2 4 1 6 3 0 output: input:
split
merge
2 4 5 7
5 7
0 1 3 6
0 3
0 1 2 3 4 5 6 7
7
1 6 2 4
0 3 6 1 4 2 5 7 5
7 5 2 4 1 6 3 0
3 0 1 6 2 4 7 5
0 3 6 1 4 2
Figure 4: MergeSort.
MergeSort: Let’s design the algorithm top-down. We’ll assume that the procedure that merges two sorted
list is available to us. We’ll implement it later. Because the algorithm is called recursively on sublists,
in addition to passing in the array itself, we will pass in two indices, which indicate the ﬁrst and last
indices of the subarray that we are to sort. The call MergeSort(A, p, r) will sort the subarray
A[p..r] and return the sorted result in the same subarray.
Here is the overview. If r = p, then this means that there is only one element to sort, and we may return
immediately. Otherwise (if p < r) there are at least two elements, and we will invoke the divide-and-
conquer. We ﬁnd the index q, midway between p and r, namely q = (p + r)/2 (rounded down to the
nearest integer). Then we split the array into subarrays A[p..q] and A[q +1..r]. (We need to be careful
here. Why would it be wrong to do A[p..q − 1] and A[q..r]? Suppose r = p + 1.) Call MergeSort
recursively to sort each subarray. Finally, we invoke a procedure (which we have yet to write) which
merges these two subarrays into a single sorted array.
MergeSort
MergeSort(array A, int p, int r) {
if (p < r) { // we have at least 2 items
q = (p + r)/2
MergeSort(A, p, q) // sort A[p..q]
MergeSort(A, q+1, r) // sort A[q+1..r]
Merge(A, p, q, r) // merge everything together
}
}
Merging: All that is left is to describe the procedure that merges two sorted lists. Merge(A, p, q, r)
assumes that the left subarray, A[p..q], and the right subarray, A[q + 1..r], have already been sorted.
We merge these two subarrays by copying the elements to a temporary working array called B. For
convenience, we will assume that the array B has the same index range A, that is, B[p..r]. (One nice
thing about pseudocode, is that we can make these assumptions, and leave them up to the programmer
to ﬁgure out how to implement it.) We have to indices i and j, that point to the current elements of
each subarray. We move the smaller element into the next position of B (indicated by index k) and
then increment the corresponding index (either i or j). When we run out of elements in one array, then
we just copy the rest of the other array into B. Finally, we copy the entire contents of B back into A.
(The use of the temporary array is a bit unpleasant, but this is impossible to overcome entirely. It is one
of the shortcomings of MergeSort, compared to some of the other efﬁcient sorting algorithms.)
In case you are not aware of C notation, the operator i++ returns the current value of i, and then
increments this variable by one.
Merge
Merge(array A, int p, int q, int r) { // merges A[p..q] with A[q+1..r]
21
Lecture Notes CMSC 251
array B[p..r]
i = k = p // initialize pointers
j = q+1
while (i <= q and j <= r) { // while both subarrays are nonempty
if (A[i] <= A[j]) B[k++] = A[i++] // copy from left subarray
else B[k++] = A[j++] // copy from right subarray
}
while (i <= q) B[k++] = A[i++] // copy any leftover to B
while (j <= r) B[k++] = A[j++]
for i = p to r do A[i] = B[i] // copy B back to A
}
This completes the description of the algorithm. Observe that of the last two while-loops in the Merge
procedure, only one will be executed. (Do you see why?)
If you ﬁnd the recursion to be a bit confusing. Go back and look at the earlier ﬁgure. Convince yourself
that as you unravel the recursion you are essentially walking through the tree (the recursion tree) shown
in the ﬁgure. As calls are made you walk down towards the leaves, and as you return you are walking
up towards the root. (We have drawn two trees in the ﬁgure, but this is just to make the distinction
between the inputs and outputs clearer.)
Discussion: One of the little tricks in improving the running time of this algorithm is to avoid the constant
copying from A to B and back to A. This is often handled in the implementation by using two arrays,
both of equal size. At odd levels of the recursion we merge from subarrays of A to a subarray of B. At
even levels we merge from from B to A. If the recursion has an odd number of levels, we may have to
do one ﬁnal copy from B back to A, but this is faster than having to do it at every level. Of course, this
only improves the constant factors; it does not change the asymptotic running time.
Another implementation trick to speed things by a constant factor is that rather than driving the divide-
and-conquer all the way down to subsequences of size 1, instead stop the dividing process when the
sequence sizes fall below constant, e.g. 20. Then invoke a simple Θ(n
2
) algorithm, like insertion sort
on these small lists. Often brute force algorithms run faster on small subsequences, because they do
not have the added overhead of recursion. Note that since they are running on subsequences of size at
most 20, the running times is Θ(20
2
) = Θ(1). Thus, this will not affect the overall asymptotic running
time.
It might seem at ﬁrst glance that it should be possible to merge the lists “in-place”, without the need
for additional temporary storage. The answer is that it is, but it no one knows how to do it without
destroying the algorithm’s efﬁciency. It turns out that there are faster ways to sort numbers in-place,
e.g. using either HeapSort or QuickSort.
Here is a subtle but interesting point to make regarding this sorting algorithm. Suppose that in the if-
statement above, we have A[i] = A[j]. Observe that in this case we copy from the left sublist. Would
it have mattered if instead we had copied from the right sublist? The simple answer is no—since the
elements are equal, they can appear in either order in the ﬁnal sublist. However there is a subtler reason
to prefer this particular choice. Many times we are sorting data that does not have a single attribute,
but has many attributes (name, SSN, grade, etc.) Often the list may already have been sorted on one
attribute (say, name). If we sort on a second attribute (say, grade), then it would be nice if people with
same grade are still sorted by name. A sorting algorithm that has the property that equal items will
appear in the ﬁnal sorted list in the same relative order that they appeared in the initial input is called a
stable sorting algorithm. This is a nice property for a sorting algorithm to have. By favoring elements
from the left sublist over the right, we will be preserving the relative order of elements. It can be shown
that as a result, MergeSort is a stable sorting algorithm. (This is not immediate, but it can be proved by
induction.)
22
Lecture Notes CMSC 251
Analysis: What remains is to analyze the running time of MergeSort. First let us consider the running time
of the procedure Merge(A, p, q, r). Let n = r − p + 1 denote the total length of both the left
and right subarrays. What is the running time of Merge as a function of n? The algorithm contains four
loops (none nested in the other). It is easy to see that each loop can be executed at most n times. (If
you are a bit more careful you can actually see that all the while-loops together can only be executed n
times in total, because each execution copies one new element to the array B, and B only has space for
n elements.) Thus the running time to Merge n items is Θ(n). Let us write this without the asymptotic
notation, simply as n. (We’ll see later why we do this.)
Now, how do we describe the running time of the entire MergeSort algorithm? We will do this through
the use of a recurrence, that is, a function that is deﬁned recursively in terms of itself. To avoid
circularity, the recurrence for a given value of n is deﬁned in terms of values that are strictly smaller
than n. Finally, a recurrence has some basis values (e.g. for n = 1), which are deﬁned explicitly.
Let’s see how to apply this to MergeSort. Let T(n) denote the worst case running time of MergeSort on
an array of length n. For concreteness we could count whatever we like: number of lines of pseudocode,
number of comparisons, number of array accesses, since these will only differ by a constant factor.
Since all of the real work is done in the Merge procedure, we will count the total time spent in the
Merge procedure.
First observe that if we call MergeSort with a list containing a single element, then the running time is a
constant. Since we are ignoring constant factors, we can just write T(n) = 1. When we call MergeSort
with a list of length n > 1, e.g. Merge(A, p, r), where r−p+1 = n, the algorithm ﬁrst computes
q = ¸(p +r)/2|. The subarray A[p..q], which contains q − p + 1 elements. You can verify (by some
tedious ﬂoor-ceiling arithmetic, or simpler by just trying an odd example and an even example) that is
of size ¸n/2|. Thus the remaining subarray A[q +1..r] has ¸n/2| elements in it. How long does it take
to sort the left subarray? We do not know this, but because ¸n/2| < n for n > 1, we can express this
as T(¸n/2|). Similarly, we can express the time that it takes to sort the right subarray as T(¸n/2|).
Finally, to merge both sorted lists takes n time, by the comments made above. In conclusion we have
T(n) =
_
1 if n = 1,
T(¸n/2|) +T(¸n/2|) +n otherwise.
Lecture 7: Recurrences
(Tuesday, Feb 17, 1998)
Read: Chapt. 4 on recurrences. Skip Section 4.4.
Divide and Conquer and Recurrences: Last time we introduced divide-and-conquer as a basic technique
for designing efﬁcient algorithms. Recall that the basic steps in divide-and-conquer solution are (1)
divide the problem into a small number of subproblems, (2) solve each subproblem recursively, and (3)
combine the solutions to the subproblems to a global solution. We also described MergeSort, a sorting
algorithm based on divide-and-conquer.
Because divide-and-conquer is an important design technique, and because it naturally gives rise to
recursive algorithms, it is important to develop mathematical techniques for solving recurrences, either
exactly or asymptotically. To do this, we introduced the notion of a recurrence, that is, a recursively
deﬁned function. Today we discuss a number of techniques for solving recurrences.
MergeSort Recurrence: Here is the recurrence we derived last time for MergeSort. Recall that T(n) is the
time to run MergeSort on a list of size n. We argued that if the list is of length 1, then the total sorting
time is a constant Θ(1). If n > 1, then we must recursively sort two sublists, one of size ¸n/2| and
the other of size ¸n/2|, and the nonrecursive part took Θ(n) time for splitting the list (constant time)
23
Lecture Notes CMSC 251
and merging the lists (Θ(n) time). Thus, the total running time for MergeSort could be described by
the following recurrence:
T(n) =
_
1 if n = 1,
T(¸n/2|) +T(¸n/2|) +n otherwise.
Notice that we have dropped the Θ()’s, replacing Θ(1) and Θ(n) by just 1 and n, respectively. This
is done to make the recurrence more concrete. If we had wanted to be more precise, we could have
replaced these with more exact functions, e.g., c
1
and c
2
n for some constants c
1
and c
2
. The analysis
would have been a bit more complex, but we would arrive at the same asymptotic formula.
Getting a feel: We could try to get a feeling for what this means by plugging in some values and expanding
the deﬁnition.
T(1) = 1 (by the basis.)
T(2) = T(1) +T(1) + 2 = 1 + 1 + 2 = 4
T(3) = T(2) +T(1) + 3 = 4 + 1 + 3 = 8
T(4) = T(2) +T(2) + 4 = 4 + 4 + 4 = 12
T(5) = T(3) +T(2) + 5 = 8 + 4 + 5 = 17
. . .
T(8) = T(4) +T(4) + 8 = 12 + 12 + 8 = 32
. . .
T(16) = T(8) +T(8) + 16 = 32 + 32 + 16 = 80
. . .
T(32) = T(16) +T(16) + 32 = 80 + 80 + 32 = 192.
It’s hard to see much of a pattern here, but here is a trick. Since the recurrence divides by 2 each time,
let’s consider powers of 2, since the function will behave most regularly for these values. If we consider
the ratios T(n)/n for powers of 2 and interesting pattern emerges:
T(1)/1 = 1 T(8)/8 = 4
T(2)/2 = 2 T(16)/16 = 5
T(4)/4 = 3 T(32)/32 = 6.
This suggests that for powers of 2, T(n)/n = (lg n) + 1, or equivalently, T(n) = (nlg n) + n which
is Θ(nlog n). This is not a proof, but at least it provides us with a starting point.
Logarithms in Θ-notation: Notice that I have broken away from my usual convention of say lg n and just
said log n inside the Θ(). The reason is that the base really does not matter when it is inside the Θ.
Recall the change of base formula:
log
b
n =
log
a
n
log
a
b
.
If a and b are constants the log
a
b is a constant. Consequently log
b
n and log
a
n differ only by a constant
factor. Thus, inside the Θ() we do not need to differentiate between them. Henceforth, I will not be
fussy about the bases of logarithms if asymptotic results are sufﬁcient.
Eliminating Floors and Ceilings: One of the nasty things about recurrences is that ﬂoors and ceilings are
a pain to deal with. So whenever it is reasonable to do so, we will just forget about them, and make
whatever simplifying assumptions we like about n to make things work out. For this case, we will
make the simplifying assumption that n is a power of 2. Notice that this means that our analysis will
24
Lecture Notes CMSC 251
only be correct for a very limited (but inﬁnitely large) set of values of n, but it turns out that as long as
the algorithm doesn’t act signiﬁcantly different for powers of 2 versus other numbers, the asymptotic
analysis will hold for all n as well. So let us restate our recurrence under this assumption:
T(n) =
_
1 if n = 1,
2T(n/2) +n otherwise.
Veriﬁcation through Induction: We have just generated a guess for the solution to our recurrence. Let’s
see if we can verify its correctness formally. The proof will be by strong induction on n. Because n is
limited to powers of 2, we cannot do the usual n to n + 1 proof (because if n is a power of 2, n + 1
will generally not be a power of 2). Instead we use strong induction.
Claim: For all n ≥ 1, n a power of 2, T(n) = (nlg n) +n.
Proof: (By strong induction on n.)
Basis case: (n = 1) In this case T(1) = 1 by deﬁnition and the formula gives 1 lg 1 + 1 = 1,
which matches.
Induction step: Let n > 1, and assume that the formula T(n

) = (n

lg n

)+n

, holds whenever
n

< n. We want to prove the formula holds for n itself. To do this, we need to express T(n)
in terms of smaller values. To do this, we apply the deﬁnition:
T(n) = 2T(n/2) +n.
Now, n/2 < n, so we can apply the induction hypothesis here, yielding T(n/2) = (n/2) lg(n/2)+
(n/2). Plugging this in gives
T(n) = 2((n/2) lg(n/2) + (n/2)) +n
= (nlg(n/2) +n) +n
= n(lg n −lg 2) + 2n
= (nlg n −n) + 2n
= nlg n +n,
which is exactly what we want to prove.
The Iteration Method: The above method of “guessing” a solution and verifying through induction works
ﬁne as long as your recurrence is simple enough that you can come up with a good guess. But if the
recurrence is at all messy, there may not be a simple formula. The following method is quite powerful.
When it works, it allows you to convert a recurrence into a summation. By in large, summations are
easier to solve than recurrences (and if nothing else, you can usually approximate them by integrals).
The method is called iteration. Let’s start expanding out the deﬁnition until we see a pattern developing.
We ﬁrst write out the deﬁnition T(n) = 2T(n/2) + n. This has a recursive formula inside T(n/2)
which we can expand, by ﬁlling in the deﬁnition but this time with the argument n/2 rather than n.
Plugging in we get T(n) = 2(2T(n/4) +n/2) +n. We then simplify and repeat. Here is what we get
when we repeat this.
T(n) = 2T(n/2) +n
= 2(2T(n/4) +n/2) +n = 4T(n/4) +n +n
= 4(2T(n/8) +n/4) +n +n = 8T(n/8) +n +n +n
= 8(2T(n/16) +n/8) +n +n +n = 16T(n/16) +n +n +n +n
= . . .
25
Lecture Notes CMSC 251
At this point we can see a general pattern emerging.
T(n) = 2
k
T(n/(2
k
)) + (n +n + +n) (k times)
= 2
k
T(n/(2
k
)) +kn.
Now, we have generated alot of equations, but we still haven’t gotten anywhere, because we need to
get rid of the T() from the right-hand side. Here’s how we can do that. We know that T(1) = 1. Thus,
let us select k to be a value which forces the term n/(2
k
) = 1. This means that n = 2
k
, implying that
k = lg n. If we substitute this value of k into the equation we get
T(n) = 2
(lg n)
T(n/(2
(lg n)
)) + (lg n)n
= 2
(lg n)
T(1) +nlg n = 2
(lg n)
+nlg n = n +nlg n.
In simplifying this, we have made use of the formula from the ﬁrst homework, a
log
b
n
= n
log
b
a
, where
a = b = 2. Thus we have arrived at the same conclusion, but this time no guesswork was involved.
The Iteration Method (a Messier Example): That one may have been a bit too easy to see the general form.
Let’s try a messier recurrence:
T(n) =
_
1 if n = 1,
3T(n/4) +n otherwise.
To avoid problems with ﬂoors and ceilings, we’ll make the simplifying assumption here that n is a
power of 4. As before, the idea is to repeatedly apply the deﬁnition, until a pattern emerges.
T(n) = 3T(n/4) +n
= 3(3T(n/16) +n/4) +n = 9T(n/16) + 3(n/4) +n
= 9(3T(n/64) +n/16) + 3(n/4) +n = 27T(n/64) + 9(n/16) + 3(n/4) +n
= . . .
= 3
k
T
_
n
4
k
_
+ 3
k−1
(n/4
k−1
) + + 9(n/16) + 3(n/4) +n
= 3
k
T
_
n
4
k
_
+
k−1

i=0
3
i
4
i
n.
As before, we have the recursive term T(n/4
k
) still ﬂoating around. To get rid of it we recall that
we know the value of T(1), and so we set n/4
k
= 1 implying that 4
k
= n, that is, k = log
4
n. So,
plugging this value in for k we get:
T(n) = 3
log
4
n
T(1) +
(log
4
n)−1

i=0
3
i
4
i
n
= n
log
4
3
+
(log
4
n)−1

i=0
3
i
4
i
n.
Again, in the last step we used the formula a
log
b
n
= n
log
b
a
where a = 3 and b = 4, and the fact that
T(1) = 1. (Why did we write it this way? This emphasizes that the function is of the formn
c
for some
constant c.) By the way, log
4
3 = 0.7925 . . . ≈ 0.79, so n
log
4
3
≈ n
0.79
.
26
Lecture Notes CMSC 251
We have this messy summation to solve though. First observe that the value n remains constant
throughout the sum, and so we can pull it out front. Also note that we can write 3
i
/4
i
and (3/4)
i
.
T(n) = n
log
4
3
+n
(log
4
n)−1

i=0
_
3
4
_
i
.
Note that this is a geometric series. We may apply the formula for the geometric series, which gave in
an earlier lecture. For x ,= 1:
m

i=0
x
i
=
x
m+1
−1
x −1
.
In this case x = 3/4 and m = log
4
n −1. We get
T(n) = n
log
4
3
+n
(3/4)
log
4
n
−1
(3/4) −1
.
Applying our favorite log identity once more to the expression in the numerator (with a = 3/4 and
b = 4) we get
(3/4)
log
4
n
= n
log
4
(3/4)
= n
(log
4
3−log
4
4)
= n
(log
4
3−1)
=
n
log
4
3
n
.
If we plug this back in, we have
T(n) = n
log
4
3
+n
n
log
4
3
n
−1
(3/4) −1
= n
log
4
3
+
n
log
4
3
−n
−1/4
= n
log
4
3
−4(n
log
4
3
−n)
= n
log
4
3
+ 4(n −n
log
4
3
)
= 4n −3n
log
4
3
.
So the ﬁnal result (at last!) is
T(n) = 4n −3n
log
4
3
≈ 4n −3n
0.79
∈ Θ(n).
It is interesting to note the unusual exponent of log
4
3 ≈ 0.79. We have seen that two nested loops typi-
2
) time, and three nested loops typically leads to Θ(n
3
) time, so it seems remarkable
that we could generate a strange exponent like 0.79 as part of a running time. However, as we shall
see, this is often the case in divide-and-conquer recurrences.
Lecture 8: More on Recurrences
(Thursday, Feb 19, 1998)
Read: Chapt. 4 on recurrences, skip Section 4.4.
Recap: Last time we discussed recurrences, that is, functions that are deﬁned recursively. We discussed
their importance in analyzing divide-and-conquer algorithms. We also discussed two methods for solv-
ing recurrences, namely guess-and-verify (by induction), and iteration. These are both very powerful
methods, but they are quite “mechanical”, and it is difﬁcult to get a quick and intuitive sense of what
is going on in the recurrence. Today we will discuss two more techniques for solving recurrences. The
ﬁrst provides a way of visualizing recurrences and the second, called the Master Theorem, is a method
of solving many recurrences that arise in divide-and-conquer applications.
27
Lecture Notes CMSC 251
Visualizing Recurrences Using the Recursion Tree: Iteration is a very powerful technique for solving re-
currences. But, it is easy to get lost in all the symbolic manipulations and lose sight of what is going
on. Here is a nice way to visualize what is going on in iteration. We can describe any recurrence in
terms of a tree, where each expansion of the recurrence takes us one level deeper in the tree.
Recall that the recurrence for MergeSort (which we simpliﬁed by assuming that n is a power of 2, and
hence could drop the ﬂoors and ceilings)
T(n) =
_
1 if n = 1,
2T(n/2) +n otherwise.
Suppose that we draw the recursion tree for MergeSort, but each time we merge two lists, we label that
node of the tree with the time it takes to perform the associated (nonrecursive) merge. Recall that to
merge two lists of size m/2 to a list of size m takes Θ(m) time, which we will just write as m. Below
is an illustration of the resulting recursion tree.
= n
n
n
n 2( /2) =
n 4( /4) =
n/4 n/4 n/4 n/4
n/2 n/2
n
+
T(n/4)
T(n/2)
T(n)
T(n/2)
...
n (lg +1) n
n(n/n) = n
... 1 1
+ 1 levels
+
+
1 1
n lg
Figure 5: Using the recursion tree to visualize a recurrence.
Observe that the total work at the topmost level of the recursion is Θ(n) (or just n for short). At the
second level we have two merges, each taking n/2 time, for a total of 2(n/2) = n. At the third level we
have 4 merges, each taking n/4 time, for a total of 4(n/4) = n. This continues until the bottommost
level of the tree. Since the tree exactly lg n + 1 levels (0, 1, 2, . . . , lg n), and each level contributes a
total of n time, the total running time is n(lg n + 1) = nlg n + n. This is exactly what we got by the
iteration method.
This can be used for a number of simple recurrences. For example, let’s try it on the following recur-
rence. The tree is illustrated below.
T(n) =
_
1 if n = 1,
3T(n/2) +n
2
otherwise.
Again, we label each node with the amount of work at that level. In this case the work for T(m) is m
2
.
For the top level (or 0th level) the work is n
2
. At level 1 we have three nodes whose work is (n/2)
2
each, for a total of 3(n/2)
2
. This can be written as n
2
(3/4). At the level 2 the work is 9(n/4)
2
, which
can be written as n
2
(9/16). In general it is easy to extrapolate to see that at the level i, we have 3
i
nodes, each involving (n/2
i
)
2
work, for a total of 3
i
(n/2
i
)
2
= n
2
(3/4)
i
.
This leads to the following summation. Note that we have not determined where the tree bottoms out,
so we have left off the upper bound on the sum.
T(n) = n
2
?

i=0
_
3
4
_
i
.
28
Lecture Notes CMSC 251
T(n/2) T(n/2)
T(n/4)
T(n/2)
T(n)
(n/4)
2
...
2 2
n
+
+
+
= n
...
9(n/4) = n (9/16)
2 2
3(n/2) = n (3/4)
(n/4)
2
(n/4)
2
(n/4)
2
(n/4)
2
(n/4)
2
2
(n/2)
2
(n/2)
2
(n/2)
2
i
n (3/4)
2
2
Figure 6: Another recursion tree example.
If all we wanted was an asymptotic expression, then are essentially done at this point. Why? The
summation is a geometric series, and the base (3/4) is less than 1. This means that this series converges
to some nonzero constant (even if we ran the sum out to ∞). Thus the running time is Θ(n
2
).
To get a more exact result, observe that the recursion bottoms out when we get down to single items,
and since the sizes of the inputs are cut by half at each level, it is not hard to see that the ﬁnal level is
level lg n. (It is easy to be off by ±1 here, but this sort of small error will not affect the asymptotic
result. In this case we happen to be right.) So, we can plug in lg n for the “?” in the above summation.
T(n) = n
2
lg n

i=0
_
3
4
_
i
.
If we wanted to get a more exact answer, we could plug the summation into the formula for the geo-
metric series and simplify. This would lead to an expression like
T(n) = n
2
(3/4)
lg n+1
−1
(3/4) −1
.
This will take some work to simplify, but at this point it is all just tedious algebra to get the formula
into simple form. (This sort of algebraic is typical of algorithm analysis, so be sure that you follow
each step.)
T(n) = n
2
(3/4)
lg n+1
−1
(3/4) −1
= −4n
2
((3/4)
lg n+1
−1)
= 4n
2
(1 −(3/4)
lg n+1
) = 4n
2
(1 −(3/4)(3/4)
lg n
)
= 4n
2
(1 −(3/4)n
lg(3/4)
) = 4n
2
(1 −(3/4)n
lg 3−lg 4
)
= 4n
2
(1 −(3/4)n
lg 3−2
) = 4n
2
(1 −(3/4)(n
lg 3
/n
2
))
= 4n
2
−3n
lg 3
.
Note that lg 3 ≈ 1.58, so the whole expression is Θ(n
2
).
In conclusion, the technique of drawing the recursion tree is a somewhat more visual way of analyzing
summations, but it is really equivalent to the method of iteration.
(Simpliﬁed) Master Theorem: If you analyze many divide-and-conquer algorithms, you will see that the
same general type of recurrence keeps popping up. In general you are breaking a problem into a
subproblems, where each subproblem is roughly a factor of 1/b of the original problem size, and
29
Lecture Notes CMSC 251
the time it takes to do the splitting and combining on an input of size n is Θ(n
k
). For example, in
MergeSort, a = 2, b = 2, and k = 1.
Rather than doing every such recurrence from scratch, can we just come up with a general solution?
The answer is that you can if all you need is an asymptotic expression. This result is called the Master
Theorem, because it can be used to “master” so many different types of recurrence. Our text gives a
fairly complicated version of this theorem. We will give a simpler version, which is general enough for
most typical situations. In cases where this doesn’t apply, try the one from the book. If the one from
the book doesn’t apply, then you will probably need iteration, or some other technique.
Theorem: (Simpliﬁed Master Theorem) Let a ≥ 1, b > 1 be constants and let T(n) be the recurrence
T(n) = aT(n/b) +n
k
,
deﬁned for n ≥ 0. (As usual let us assume that n is a power of b. The basis case, T(1) can be any
constant value.) Then
Case 1: if a > b
k
then T(n) ∈ Θ(n
log
b
a
).
Case 2: if a = b
k
then T(n) ∈ Θ(n
k
log n).
Case 3: if a < b
k
then T(n) ∈ Θ(n
k
).
Using this version of the Master Theorem we can see that in the MergeSort recurrence a = 2, b = 2,
and k = 1. Thus, a = b
k
(2 = 2
1
) and so Case 2 applies. From this we have T(n) ∈ Θ(nlog n).
In the recurrence above, T(n) = 3T(n/2) + n
2
, we have a = 3, b = 2 and k = 2. We have a < b
k
(3 < 2
2
) in this case, and so Case 3 applies. From this we have T(n) ∈ Θ(n
2
).
Finally, consider the recurrence T(n) = 4T(n/3) + n, in which we have a = 4, b = 3 and k = 1. In
this case we have a > b
k
(4 > 3
1
), and so Case 1 applies. From this we have T(n) ∈ Θ(n
log
3
4
) ≈
Θ(n
1.26
). This may seem to be a rather strange running time (a non-integer exponent), but this not
uncommon for many divide-and-conquer solutions.
There are many recurrences that cannot be put into this form. For example, if the splitting and combin-
ing steps involve sorting, we might have seen a recurrence of the form
T(n) =
_
1 if n = 1,
2T(n/2) +nlog n otherwise.
This solves to T(n) = Θ(nlog
2
n), but the Master Theorem (neither this form nor the one in CLR)
will tell you this. However, iteration works just ﬁne here.
Recursion Trees Revisited: The recursion trees offer some intuition about why it is that there are three cases
in the Master Theorem. Generally speaking the question is where is most of the work done: at the top
of the tree (the root level), at the bottom of the tree (the leaf level), or spread equally throughout the
entire tree.
For example, in the MergeSort recurrence (which corresponds to Case 2 in the Master Theorem) every
level of the recursion tree provides the same total work, namely n. For this reason the total work is
equal to this value times the height of the tree, namely Θ(log n), for a total of Θ(nlog n).
Next consider the earlier recurrence T(n) = 3T(n/2)+n
2
(which corresponds to Case 3 in the Master
Theorem). In this instance most of the work was concentrated at the root of the tree. Each level of the
tree provided a smaller fraction of work. By the nature of the geometric series, it did not matter how
many levels the tree had at all. Even with an inﬁnite number of levels, the geometric series that result
will converge to a constant value. This is an important observation to make. A common way to design
the most efﬁcient divide-and-conquer algorithms is to try to arrange the recursion so that most of the
work is done at the root, and at each successive level of the tree the work at this level reduces (by some
constant factor). As long as this is the case, Case 3 will apply.
30
Lecture Notes CMSC 251
Finally, in the recurrence T(n) = 4T(n/3) + n (which corresponds to Case 1), most of the work is
done at the leaf level of the recursion tree. This can be seen if you perform iteration on this recurrence,
the resulting summation is
n
log
3
n

i=0
_
4
3
_
i
.
(You might try this to see if you get the same result.) Since 4/3 > 1, as we go deeper into the levels
of the tree, that is deeper into the summation, the terms are growing successively larger. The largest
contribution will be from the leaf level.
Lecture 9: Medians and Selection
(Tuesday, Feb 24, 1998)
Read: Todays material is covered in Sections 10.2 and 10.3. You are not responsible for the randomized
analysis of Section 10.2. Our presentation of the partitioning algorithm and analysis are somewhat different
from the ones in the book.
Selection: In the last couple of lectures we have discussed recurrences and the divide-and-conquer method
of solving problems. Today we will give a rather surprising (and very tricky) algorithm which shows
the power of these techniques.
The problem that we will consider is very easy to state, but surprisingly difﬁcult to solve optimally.
Suppose that you are given a set of n numbers. Deﬁne the rank of an element to be one plus the
number of elements that are smaller than this element. Since duplicate elements make our life more
complex (by creating multiple elements of the same rank), we will make the simplifying assumption
that all the elements are distinct for now. It will be easy to get around this assumption later. Thus, the
rank of an element is its ﬁnal position if the set is sorted. The minimum is of rank 1 and the maximum
is of rank n.
Of particular interest in statistics is the median. If n is odd then the median is deﬁned to be the element
of rank (n + 1)/2. When n is even there are two natural choices, namely the elements of ranks n/2
and (n/2) + 1. In statistics it is common to return the average of these two elements. We will deﬁne
the median to be either of these elements.
Medians are useful as measures of the central tendency of a set, especially when the distribution of val-
ues is highly skewed. For example, the median income in a community is likely to be more meaningful
measure of the central tendency than the average is, since if Bill Gates lives in your community then
his gigantic income may signiﬁcantly bias the average, whereas it cannot have a signiﬁcant inﬂuence
on the median. They are also useful, since in divide-and-conquer applications, it is often desirable to
partition a set about its median value, into two sets of roughly equal size. Today we will focus on the
following generalization, called the selection problem.
Selection: Given a set A of n distinct numbers and an integer k, 1 ≤ k ≤ n, output the element of A
of rank k.
The selection problem can easily be solved in Θ(nlog n) time, simply by sorting the numbers of A,
and then returning A[k]. The question is whether it is possible to do better. In particular, is it possible
to solve this problem in Θ(n) time? We will see that the answer is yes, and the solution is far from
obvious.
The Sieve Technique: The reason for introducing this algorithm is that it illustrates a very important special
case of divide-and-conquer, which I call the sieve technique. We think of divide-and-conquer as break-
ing the problem into a small number of smaller subproblems, which are then solved recursively. The
sieve technique is a special case, where the number of subproblems is just 1.
31
Lecture Notes CMSC 251
The sieve technique works in phases as follows. It applies to problems where we are interested in
ﬁnding a single item from a larger set of n items. We do not know which item is of interest, however
after doing some amount of analysis of the data, taking say Θ(n
k
) time, for some constant k, we ﬁnd
that we do not know what the desired item is, but we can identify a large enough number of elements
that cannot be the desired value, and can be eliminated from further consideration. In particular “large
enough” means that the number of items is at least some ﬁxed constant fraction of n (e.g. n/2, n/3,
0.0001n). Then we solve the problem recursively on whatever items remain. Each of the resulting
recursive solutions then do the same thing, eliminating a constant fraction of the remaining set.
Applying the Sieve to Selection: To see more concretely how the sieve technique works, let us apply it to
the selection problem. Recall that we are given an array A[1..n] and an integer k, and want to ﬁnd the
k-th smallest element of A. Since the algorithm will be applied inductively, we will assume that we
are given a subarray A[p..r] as we did in MergeSort, and we want to ﬁnd the kth smallest item (where
k ≤ r −p + 1). The initial call will be to the entire array A[1..n].
There are two principal algorithms for solving the selection problem, but they differ only in one step,
which involves judiciously choosing an item from the array, called the pivot element, which we will
denote by x. Later we will see how to choose x, but for now just think of it as a random element of A.
We then partition A into three parts. A[q] contains the element x, subarray A[p..q −1] will contain all
the elements that are less than x, and A[q + 1..r], will contain all the element that are greater than x.
(Recall that we assumed that all the elements are distinct.) Within each subarray, the items may appear
in any order. This is illustrated below.
Before partitioing
After partitioing
2 6 4 1 3 7 9
pivot
3 5 1 9 4 6
x
p r
q p r
A[q+1..r] > x
A[p..q−1] < x
5
2 7
Partition
(pivot = 4)
9
7
5
6
(k=6−4=2)
Recurse
x_rnk=2 (DONE!)
6
5
5
6
(pivot = 6)
Partition
(k=2)
Recurse
x_rnk=3
(pivot = 7)
Partition
(k=6)
Initial
x_rnk=4
6
7
3
1
4
6
2
9
5
4
1
9
5
3
7
2
6
9
5
7
Figure 7: Selection Algorithm.
It is easy to see that the rank of the pivot x is q−p+1 in A[p..r]. Let x rnk = q −p +1. If k = x rnk,
then the pivot is the kth smallest, and we may just return it. If k < x rnk, then we know that we need
to recursively search in A[p..q − 1] and if k > x rnk then we need to recursively search A[q + 1..r].
In this latter case we have eliminated q smaller elements, so we want to ﬁnd the element of rank k −q.
Here is the complete pseudocode.
Selection
Select(array A, int p, int r, int k) { // return kth smallest of A[p..r]
if (p == r) return A[p] // only 1 item left, return it
32
Lecture Notes CMSC 251
else {
x = Choose_Pivot(A, p, r) // choose the pivot element
q = Partition(A, p, r, x) // partition <A[p..q-1], x, A[q+1..r]>
x_rnk = q - p + 1 // rank of the pivot
if (k == x_rnk) return x // the pivot is the kth smallest
else if (k < x_rnk)
return Select(A, p, q-1, k) // select from left subarray
else
return Select(A, q+1, r, k-x_rnk)// select from right subarray
}
}
Notice that this algorithm satisﬁes the basic form of a sieve algorithm. It analyzes the data (by choosing
the pivot element and partitioning) and it eliminates some part of the data set, and recurses on the rest.
When k = x rnk then we get lucky and eliminate everything. Otherwise we either eliminate the pivot
and the right subarray or the pivot and the left subarray.
We will discuss the details of choosing the pivot and partitioning later, but assume for now that they
both take Θ(n) time. The question that remains is how many elements did we succeed in eliminating?
If x is the largest or smallest element in the array, then we may only succeed in eliminating one element
with each phase. In fact, if x is one of the smallest elements of A or one of the largest, then we get
into trouble, because we may only eliminate it and the few smaller or larger elements of A. Ideally x
should have a rank that is neither too large nor too small.
Let us suppose for now (optimistically) that we are able to design the procedure Choose Pivot in
such a way that is eliminates exactly half the array with each phase, meaning that we recurse on the
remaining n/2 elements. This would lead to the following recurrence.
T(n) =
_
1 if n = 1,
T(n/2) +n otherwise.
We can solve this either by expansion (iteration) or the Master Theorem. If we expand this recurrence
level by level we see that we get the summation
T(n) = n +
n
2
+
n
4
+ ≤

i=0
n
2
i
= n

i=0
1
2
i
.
Recall the formula for the inﬁnite geometric series. For any c such that [c[ < 1,

i=0
c
i
= 1/(1 −c).
Using this we have
T(n) ≤ 2n ∈ O(n).
(This only proves the upper bound on the running time, but it is easy to see that it takes at least Ω(n)
time, so the total running time is Θ(n).)
This is a bit counterintuitive. Normally you would think that in order to design a Θ(n) time algorithm
you could only make a single, or perhaps a constant number of passes over the data set. In this algorithm
we make many passes (it could be as many as lg n). However, because we eliminate a constant fraction
of elements with each phase, we get this convergent geometric series in the analysis, which shows that
the total running time is indeed linear in n. This lesson is well worth remembering. It is often possible
to achieve running times in ways that you would not expect.
Note that the assumption of eliminating half was not critical. If we eliminated even one per cent, then
the recurrence would have been T(n) = T(99n/100)+n, and we would have gotten a geometric series
involving 99/100, which is still less than 1, implying a convergent series. Eliminating any constant
fraction would have been good enough.
33
Lecture Notes CMSC 251
Choosing the Pivot: There are two issues that we have left unresolved. The ﬁrst is how to choose the pivot
element, and the second is how to partition the array. Both need to be solved in Θ(n) time. The second
problem is a rather easy programming exercise. Later, when we discuss QuickSort, we will discuss
partitioning in detail.
For the rest of the lecture, let’s concentrate on how to choose the pivot. Recall that before we said that
we might think of the pivot as a random element of A. Actually this is not such a bad idea. Let’s see
why.
The key is that we want the procedure to eliminate at least some constant fraction of the array after
each partitioning step. Let’s consider the top of the recurrence, when we are given A[1..n]. Suppose
that the pivot x turns out to be of rank q in the array. The partitioning algorithm will split the array into
A[1..q − 1] < x, A[q] = x and A[q + 1..n] > x. If k = q, then we are done. Otherwise, we need
to search one of the two subarrays. They are of sizes q − 1 and n − q, respectively. The subarray that
contains the kth smallest element will generally depend on what k is, so in the worst case, k will be
chosen so that we have to recurse on the larger of the two subarrays. Thus if q > n/2, then we may
have to recurse on the left subarray of size q − 1, and if q < n/2, then we may have to recurse on the
right subarray of size n −q. In either case, we are in trouble if q is very small, or if q is very large.
If we could select q so that it is roughly of middle rank, then we will be in good shape. For example,
if n/4 ≤ q ≤ 3n/4, then the larger subarray will never be larger than 3n/4. Earlier we said that we
might think of the pivot as a random element of the array A. Actually this works pretty well in practice.
The reason is that roughly half of the elements lie between ranks n/4 and 3n/4, so picking a random
element as the pivot will succeed about half the time to eliminate at least n/4. Of course, we might be
continuously unlucky, but a careful analysis will show that the expected running time is still Θ(n). We
Instead, we will describe a rather complicated method for computing a pivot element that achieves the
desired properties. Recall that we are given an array A[1..n], and we want to compute an element x
whose rank is (roughly) between n/4 and 3n/4. We will have to describe this algorithm at a very high
level, since the details are rather involved. Here is the description for Select Pivot:
Groups of 5: Partition A into groups of 5 elements, e.g. A[1..5], A[6..10], A[11..15], etc. There will
be exactly m = ¸n/5| such groups (the last one might have fewer than 5 elements). This can
easily be done in Θ(n) time.
Group medians: Compute the median of each group of 5. There will be mgroup medians. We do not
need an intelligent algorithm to do this, since each group has only a constant number of elements.
For example, we could just BubbleSort each group and take the middle element. Each will take
Θ(1) time, and repeating this ¸n/5| times will give a total running time of Θ(n). Copy the group
medians to a new array B.
Median of medians: Compute the median of the group medians. For this, we will have to call the
selection algorithm recursively on B, e.g. Select(B, 1, m, k), where m = ¸n/5|, and
k = ¸(m+ 1)/2|. Let x be this median of medians. Return x as the desired pivot.
The algorithm is illustrated in the ﬁgure below. To establish the correctness of this procedure, we need
to argue that x satisﬁes the desired rank properties.
Lemma: The element x is of rank at least n/4 and at most 3n/4 in A.
Proof: We will show that x is of rank at least n/4. The other part of the proof is essentially sym-
metrical. To do this, we need to show that there are at least n/4 elements that are less than or
equal to x. This is a bit complicated, due to the ﬂoor and ceiling arithmetic, so to simplify things
we will assume that n is evenly divisible by 5. Consider the groups shown in the tabular form
above. Observe that at least half of the group medians are less than or equal to x. (Because x is
34
Lecture Notes CMSC 251
8
10
27
Group
29
11
58
39
60
55
1
21
52
19
48 63
12
23
3
24
37
57
14
6
48 24
57
14
25
30
43
2
32
3
63
12
52
23
64
34
17
44
5
19
8
27
10
41
25
25
43
30
32
2
63
52
12
23
3
34
44
17
27
10
8
19
48
41
60
1
29
11
39
58
Get median of medians
(Sorting of group medians is not really performed)
6
43
30
32
2
64
5
34
44
17 29
11
39
58 55
21
41
60
1
24
64
5
55
21
Get group medians
37
57
14
37
6
Figure 8: Choosing the Pivot. 30 is the ﬁnal pivot.
their median.) And for each group median, there are three elements that are less than or equal to
this median within its group (because it is the median of its group). Therefore, there are at least
3((n/5)/2 = 3n/10 ≥ n/4 elements that are less than or equal to x in the entire array.
Analysis: The last order of business is to analyze the running time of the overall algorithm. We achieved
the main goal, namely that of eliminating a constant fraction (at least 1/4) of the remaining list at each
stage of the algorithm. The recursive call in Select() will be made to list no larger than 3n/4.
However, in order to achieve this, within Select Pivot() we needed to make a recursive call to
Select() on an array B consisting of ¸n/5| elements. Everything else took only Θ(n) time. As
usual, we will ignore ﬂoors and ceilings, and write the Θ(n) as n for concreteness. The running time
is
T(n) ≤
_
1 if n = 1,
T(n/5) +T(3n/4) +n otherwise.
This is a very strange recurrence because it involves a mixture of different fractions (n/5 and 3n/4).
This mixture will make it impossible to use the Master Theorem, and difﬁcult to apply iteration. How-
ever, this is a good place to apply constructive induction. We know we want an algorithm that runs in
Θ(n) time.
Theorem: There is a constant c, such that T(n) ≤ cn.
Proof: (by strong induction on n)
Basis: (n = 1) In this case we have T(n) = 1, and so T(n) ≤ cn as long as c ≥ 1.
Step: We assume that T(n

) ≤ cn

for all n

< n. We will then show that T(n) ≤ cn. By
deﬁnition we have
T(n) = T(n/5) +T(3n/4) +n.
35
Lecture Notes CMSC 251
Since n/5 and 3n/4 are both less than n, we can apply the induction hypothesis, giving
T(n) ≤ c
n
5
+c
3n
4
+n = cn
_
1
5
+
3
4
_
+n
= cn
19
20
+n = n
_
19c
20
+ 1
_
.
This last expression will be ≤ cn, provided that we select c such that c ≥ (19c/20) + 1.
Solving for c we see that this is true provided that c ≥ 20.
Combining the constraints that c ≥ 1, and c ≥ 20, we see that by letting c = 20, we are done.
A natural question is why did we pick groups of 5? If you look at the proof above, you will see that it
works for any value that is strictly greater than 4. (You might try it replacing the 5 with 3, 4, or 6 and
see what happens.)
Lecture 10: Long Integer Multiplication
(Thursday, Feb 26, 1998)
Read: Todays material on integer multiplication is not covered in CLR.
Ofﬁce hours: The TA, Kyongil, will have extra ofﬁce hours on Monday before the midterm, from 1:00-2:00.
I’ll have ofﬁce hours from 2:00-4:00 on Monday.
Long Integer Multiplication: The following little algorithm shows a bit more about the surprising applica-
tions of divide-and-conquer. The problem that we want to consider is how to perform arithmetic on
long integers, and multiplication in particular. The reason for doing arithmetic on long numbers stems
from cryptography. Most techniques for encryption are based on number-theoretic techniques. For
example, the character string to be encrypted is converted into a sequence of numbers, and encryption
keys are stored as long integers. Efﬁcient encryption and decryption depends on being able to perform
arithmetic on long numbers, typically containing hundreds of digits.
Addition and subtraction on large numbers is relatively easy. If n is the number of digits, then these
algorithms run in Θ(n) time. (Go back and analyze your solution to the problem on Homework 1). But
the standard algorithm for multiplication runs in Θ(n
2
) time, which can be quite costly when lots of
long multiplications are needed.
This raises the question of whether there is a more efﬁcient way to multiply two very large numbers. It
would seem surprising if there were, since for centuries people have used the same algorithm that we
all learn in grade school. In fact, we will see that it is possible.
Divide-and-Conquer Algorithm: We know the basic grade-school algorithm for multiplication. We nor-
mally think of this algorithm as applying on a digit-by-digit basis, but if we partition an n digit number
into two “super digits” with roughly n/2 each into longer sequences, the same multiplication rule still
applies.
To avoid complicating things with ﬂoors and ceilings, let’s just assume that the number of digits n is
a power of 2. Let A and B be the two numbers to multiply. Let A[0] denote the least signiﬁcant digit
and let A[n − 1] denote the most signiﬁcant digit of A. Because of the way we write numbers, it is
more natural to think of the elements of A as being indexed in decreasing order from left to right as
A[n −1..0] rather than the usual A[0..n −1].
Let m = n/2. Let
w = A[n −1..m] x = A[m−1..0] and
y = B[n −1..m] z = B[m−1..0].
36
Lecture Notes CMSC 251
w
y
x
z
xz wz
xy wy
wy wz + xy xz
n
n/2 n/2
A
B
Product
Figure 9: Long integer multiplication.
If we think of w, x, y and z as n/2 digit numbers, we can express A and B as
A = w 10
m
+x
B = y 10
m
+z,
and their product is
mult(A, B) = mult(w, y)10
2m
+ (mult(w, z) + mult(x, y))10
m
+ mult(x, z).
The operation of multiplying by 10
m
should be thought of as simply shifting the number over by
m positions to the right, and so is not really a multiplication. Observe that all the additions involve
numbers involving roughly n/2 digits, and so they take Θ(n) time each. Thus, we can express the
multiplication of two long integers as the result of 4 products on integers of roughly half the length of
the original, and a constant number of additions and shifts, each taking Θ(n) time. This suggests that
if we were to implement this algorithm, its running time would be given by the following recurrence
T(n) =
_
1 if n = 1,
4T(n/2) +n otherwise.
If we apply the Master Theorem, we see that a = 4, b = 2, k = 1, and a > b
k
, implying that Case
1 holds and the running time is Θ(n
lg 4
) = Θ(n
2
). Unfortunately, this is no better than the standard
algorithm.
Faster Divide-and-Conquer Algorithm: Even though the above exercise appears to have gotten us nowhere,
it actually has given us an important insight. It shows that the critical element is the number of multi-
plications on numbers of size n/2. The number of additions (as long as it is a constant) does not affect
the running time. So, if we could ﬁnd a way to arrive at the same result algebraically, but by trading
off multiplications in favor of additions, then we would have a more efﬁcient algorithm. (Of course,
we cannot simulate multiplication through repeated additions, since the number of additions must be a
constant, independent of n.)
The key turns out to be a algebraic “trick”. The quantities that we need to compute are C = wy,
D = xz, and E = (wz + xy). Above, it took us four multiplications to compute these. However,
observe that if instead we compute the following quantities, we can get everything we want, using only
three multiplications (but with more additions and subtractions).
C = mult(w, y)
D = mult(x, z)
E = mult((w +x), (y +z)) −C −D = (wy +wz +xy +xz) −wy −xz = (wz +xy).
37
Lecture Notes CMSC 251
Finally we have
mult(A, B) = C 10
2m
+E 10
m
+D.
Altogether we perform 3 multiplications, 4 additions, and 2 subtractions all of numbers with n/2
digitis. We still need to shift the terms into their proper ﬁnal positions. The additions, subtractions, and
shifts take Θ(n) time in total. So the total running time is given by the recurrence:
T(n) =
_
1 if n = 1,
3T(n/2) +n otherwise.
Now when we apply the Master Theorem, we have a = 3, b = 2 and k = 1, yielding T(n) ∈
Θ(n
lg 3
) ≈ Θ(n
1.585
).
Is this really an improvement? This algorithm carries a larger constant factor because of the overhead
of recursion and the additional arithmetic operations. But asymptotics says that if n is large enough,
then this algorithm will be superior. For example, if we assume that the clever algorithm has overheads
that are 5 times greater than the simple algorithm (e.g. 5n
1.585
versus n
2
) then this algorithm beats the
simple algorithm for n ≥ 50. If the overhead was 10 times larger, then the crossover would occur for
n ≥ 260.
Review for the Midterm: Here is a list topics and readings for the ﬁrst midterm exam. Generally you are
responsible for anything discussed in class, and anything appearing on homeworks. It is a good idea to
check out related chapters in the book, because this is where I often look for ideas on problems.
Worst-case, Average-case: Recall that a worst-case means that we consider the highest running time
over all inputs of size n, average case means that we average running times over all inputs of size
n (and generally weighting each input by its probability of occuring). (Chapt 1 of CLR.)
General analysis methods: Be sure you understand the induction proofs given in class and on the
homeworks. Also be sure you understand how the constructive induction proofs worked.
Summations: Write down (and practice recognizing) the basic formulas for summations. These in-
clude the arithmetic series

i

i
i
2
, the geometric series

i
x
i
, and the
harmonic series

i
1/i. Practice with simplifying summations. For example, be sure that you
can take something like

i
3
i
_
n
2
i
_
2
and simplify it to a geometric series
n
2

i
(3/4)
i
.
Also be sure you can apply the integration rule to summations. (Chapt. 3 of CLR.)
Asymptotics: Know the formal deﬁnitions for Θ, O, and Ω, as well as how to use the limit-rule.
Know the what the other forms, o and ω, mean informally. There are a number of good sample
problems in the book. I’ll be happy to check any of your answers. Also be able to rank functions
in asymptotic order. For example which is larger lg

n or

lg n? (It is the former, can you see
why?) Remember the following rule and know how to use it.
lim
n→∞
n
b
a
n
= 0 lim
n→∞
lg
b
n
n
c
= 0.
(Chapt. 2 of CLR.)
Recurrences: Know how to analyze the running time of a recursive program by expressing it as a
recurrence. Review the basic techniques for solving recurrences: guess and verify by induction
(I’ll provide any guesses that you need on the exam), constructive induction, iteration, and the
(simpliﬁed) Master Theorem. (You are NOT responsible for the more complex version given in
the text.) (Chapt 4, Skip 4.4.)
38
Lecture Notes CMSC 251
Divide-and-conquer: Understand how to design algorithms by divide-and-conquer. Understand the
divide-and-conquer algorithm for MergeSort, and be able to work an example by hand. Also
understand how the sieve technique works, and how it was used in the selection problem. (Chapt
10 on Medians; skip the randomized analysis. The material on the 2-d maxima and long integer
multiplication is not discussed in CLR.)
Lecture 11: First Midterm Exam
(Tuesday, March 3, 1998)
First midterm exam today. No lecture.
Lecture 12: Heaps and HeapSort
(Thursday, Mar 5, 1998)
Sorting: For the next series of lectures we will focus on sorting algorithms. The reasons for studying sorting
algorithms in details are twofold. First, sorting is a very important algorithmic problem. Procedures
for sorting are parts of many large software systems, either explicitly or implicitly. Thus the design of
efﬁcient sorting algorithms is important for the overall efﬁciency of these systems. The other reason is
more pedagogical. There are many sorting algorithms, some slow and some fast. Some possess certain
desirable properties, and others do not. Finally sorting is one of the few problems where there provable
lower bounds on how fast you can sort. Thus, sorting forms an interesting case study in algorithm
theory.
In the sorting problem we are given an array A[1..n] of n numbers, and are asked to reorder these
elements into increasing order. More generally, A is of an array of records, and we choose one of these
records as the key value on which the elements will be sorted. The key value need not be a number. It
can be any object from a totally ordered domain. Totally ordered means that for any two elements of
the domain, x, and y, either x < y, x =, or x > y.
There are some domains that can be partially ordered, but not totally ordered. For example, sets can
be partially ordered under the subset relation, ⊂, but this is not a total order, it is not true that for any
two sets either x ⊂ y, x = y or x ⊃ y. There is an algorithm called topological sorting which can be
applied to “sort” partially ordered sets. We may discuss this later.
Slow Sorting Algorithms: There are a number of well-known slow sorting algorithms. These include the
following:
Bubblesort: Scan the array. Whenever two consecutive items are found that are out of order, swap
them. Repeat until all consecutive items are in order.
Insertion sort: Assume that A[1..i − 1] have already been sorted. Insert A[i] into its proper position
in this subarray, by shifting all larger elements to the right by one to make space for the new item.
Selection sort: Assume that A[1..i − 1] contain the i − 1 smallest elements in sorted order. Find the
smallest element in A[i..n], and then swap it with A[i].
These algorithms are all easy to implement, but they run in Θ(n
2
) time in the worst case. We have
already seen that MergeSort sorts an array of numbers in Θ(nlog n) time. We will study two others,
HeapSort and QuickSort.
39
Lecture Notes CMSC 251
Priority Queues: The heapsort algorithm is based on a very nice data structure, called a heap. A heap is
a concrete implementation of an abstract data type called a priority queue. A priority queue stores
elements, each of which is associated with a numeric key value, called its priority. A simple priority
queue supports three basic operations:
Create: Create an empty queue.
Insert: Insert an element into a queue.
ExtractMax: Return the element with maximum key value from the queue. (Actually it is more
common to extract the minimum. It is easy to modify the implementation (by reversing < and >
to do this.)
Empty: Test whether the queue is empty.
Adjust Priority: Change the priority of an item in the queue.
It is common to support a number of additional operations as well, such as building a priority queue
from an initial set of elements, returning the largest element without deleting it, and changing the
priority of an element that is already in the queueu (either decreasing or increasing it).
Heaps: A heap is a data structure that supports the main priority queue operations (insert and delete max)
in Θ(log n) time. For now we will describe the heap in terms of a binary tree implementation, but we
will see later that heaps can be stored in arrays.
By a binary tree we mean a data structure which is either empty or else it consists of three things: a
root node, a left subtree and a right subtree. The left subtree and right subtrees are each binary trees.
They are called the left child and right child of the root node. If both the left and right children of a
node are empty, then this node is called a leaf node. A nonleaf node is called an internal node.
Complete Binary Tree Binary Tree
Root
Depth:
2
3
4
5
1
0
internal
node
leaf
Figure 10: Binary trees.
The depth of a node in a binary tree is its distance from the root. The root is at depth 0, its children
at depth 1, its grandchildren at depth 2, and so on. The height of a binary tree is its maximum depth.
Binary tree is said to be complete if all internal nodes have two (nonempty) children, and all leaves
have the same depth. An important fact about a complete binary trees is that a complete binary tree of
height h has
n = 1 + 2 +. . . + 2
h
=
h

i=0
2
i
= 2
h+1
−1
nodes altogether. If we solve for h in terms of n, we see that the the height of a complete binary tree
with n nodes is h = (lg(n + 1)) −1 ≈ lg n ∈ Θ(log n).
40
Lecture Notes CMSC 251
A heap is represented as an left-complete binary tree. This means that all the levels of the tree are full
except the bottommost level, which is ﬁlled from left to right. An example is shown below. The keys of
a heap are stored in something called heap order. This means that for each node u, other than the root,
key(Parent(u)) ≥ key(u). This implies that as you follow any path from a leaf to the root the keys
appear in (nonstrict) increasing order. Notice that this implies that the root is necessarily the largest
element.
4
Heap ordering Left−complete Binary Tree
14
3
16
9
10
8 7
1 2
Figure 11: Heap.
Next time we will show how the priority queue operations are implemented for a heap.
Lecture 13: HeapSort
(Tuesday, Mar 10, 1998)
Heaps: Recall that a heap is a data structure that supports the main priority queue operations (insert and
extract max) in Θ(log n) time each. It consists of a left-complete binary tree (meaning that all levels of
the tree except possibly the bottommost) are full, and the bottommost level is ﬁlled from left to right.
As a consequence, it follows that the depth of the tree is Θ(log n) where n is the number of elements
stored in the tree. The keys of the heap are stored in the tree in what is called heap order. This means
that for each (nonroot) node its parent’s key is at least as large as its key. From this it follows that the
largest key in the heap appears at the root.
Array Storage: Last time we mentioned that one of the clever aspects of heaps is that they can be stored in
arrays, without the need for using pointers (as would normally be needed for storing binary trees). The
reason for this is the left-complete nature of the tree.
This is done by storing the heap in an array A[1..n]. Generally we will not be using all of the array,
since only a portion of the keys may be part of the current heap. For this reason, we maintain a variable
m ≤ n which keeps track of the current number of elements that are actually stored actively in the
heap. Thus the heap will consist of the elements stored in elements A[1..m].
We store the heap in the array by simply unraveling it level by level. Because the binary tree is left-
complete, we know exactly how many elements each level will supply. The root level supplies 1 node,
the next level 2, then 4, then 8, and so on. Only the bottommost level may supply fewer than the
appropriate power of 2, but then we can use the value of mto determine where the last element is. This
is illustrated below.
We should emphasize that this only works because the tree is left-complete. This cannot be used for
general trees.
41
Lecture Notes CMSC 251
3 2 1 5 6 7 8 9 4 13 12 11 10
6 7
10
16 14 10 8 7 9 4 2 1 3
n=13 m=10
Heap as an array
9 8
9
5 4
3 2
1
16
Heap as a binary tree
14
3
10
8 7
1 4 2
Figure 12: Storing a heap in an array.
We claim that to access elements of the heap involves simple arithmetic operations on the array indices.
In particular it is easy to see the following.
Left(i) : return 2i.
Right(i) : return 2i + 1.
Parent(i) : return ¸i/2|.
IsLeaf (i) : return Left(i) > m. (That is, if i’s left child is not in the tree.)
IsRoot(i) : return i == 1.
For example, the heap ordering property can be stated as “for all i, 1 ≤ i ≤ n, if (not IsRoot(i)) then
A[Parent(i)] ≥ A[i]”.
So is a heap a binary tree or an array? The answer is that from a conceptual standpoint, it is a binary
tree. However, it is implemented (typically) as an array for space efﬁciency.
Maintaining the Heap Property: There is one principal operation for maintaining the heap property. It is
called Heapify. (In other books it is sometimes called sifting down.) The idea is that we are given an
element of the heap which we suspect may not be in valid heap order, but we assume that all of other
the elements in the subtree rooted at this element are in heap order. In particular this root element may
be too small. To ﬁx this we “sift” it down the tree by swapping it with one of its children. Which child?
We should take the larger of the two children to satisfy the heap ordering property. This continues
recursively until the element is either larger than both its children or until its falls all the way to the
leaf level. Here is the pseudocode. It is given the heap in the array A, and the index i of the suspected
element, and m the current active size of the heap. The element A[max] is set to the maximum of A[i]
and it two children. If max ,= i then we swap A[i] and A[max] and then recurse on A[max].
Heapify
Heapify(array A, int i, int m) { // sift down A[i] in A[1..m]
l = Left(i) // left child
r = Right(i) // right child
max = i
if (l <= m and A[l] > A[max]) max = l // left child exists and larger
if (r <= m and A[r] > A[max]) max = r // right child exists and larger
if (max != i) { // if either child larger
swap A[i] with A[max] // swap with larger child
Heapify(A, max, m) // and recurse
}
}
42
Lecture Notes CMSC 251
See Figure 7.2 on page 143 of CLR for an example of how Heapify works (in the case where m = 10).
We show the execution on a tree, rather than on the array representation, since this is the most natural
way to conceptualize the heap. You might try simulating this same algorithm on the array, to see how
it works at a ﬁner details.
Note that the recursive implementation of Heapify is not the most efﬁcient. We have done so because
many algorithms on trees are most naturally implemented using recursion, so it is nice to practice this
here. It is possible to write the procedure iteratively. This is left as an exercise.
The HeapSort algorithm will consist of two major parts. First building a heap, and then extracting the
maximum elements from the heap, one by one. We will see how to use Heapify to help us do both of
these.
How long does Hepify take to run? Observe that we perform a constant amount of work at each level
of the tree until we make a call to Heapify at the next lower level of the tree. Thus we do O(1) work
for each level of the tree which we visit. Since there are Θ(log n) levels altogether in the tree, the total
time for Heapify is O(log n). (It is not Θ(log n) since, for example, if we call Heapify on a leaf, then
it will terminate in Θ(1) time.)
Building a Heap: We can use Heapify to build a heap as follows. First we start with a heap in which the
elements are not in heap order. They are just in the same order that they were given to us in the array
A. We build the heap by starting at the leaf level and then invoke Heapify on each node. (Note: We
cannot start at the top of the tree. Why not? Because the precondition which Heapify assumes is that
the entire tree rooted at node i is already in heap order, except for i.) Actually, we can be a bit more
efﬁcient. Since we know that each leaf is already in heap order, we may as well skip the leaves and
start with the ﬁrst nonleaf node. This will be in position ¸n/2|. (Can you see why?)
Here is the code. Since we will work with the entire array, the parameter mfor Heapify, which indicates
the current heap size will be equal to n, the size of array A, in all the calls.
BuildHeap
BuildHeap(int n, array A[1..n]) { // build heap from A[1..n]
for i = n/2 downto 1 {
Heapify(A, i, n)
}
}
An example of BuildHeap is shown in Figure 7.3 on page 146 of CLR. Since each call to Heapify
takes O(log n) time, and we make roughly n/2 calls to it, the total running time is O((n/2) log n) =
O(nlog n). Next time we will show that this actually runs faster, and in fact it runs in Θ(n) time.
HeapSort: We can now give the HeapSort algorithm. The idea is that we need to repeatedly extract the
maximum item from the heap. As we mentioned earlier, this element is at the root of the heap. But
once we remove it we are left with a hole in the tree. To ﬁx this we will replace it with the last leaf in
the tree (the one at position A[m]). But now the heap order will very likely be destroyed. So we will
just apply Heapify to the root to ﬁx everything back up.
HeapSort
HeapSort(int n, array A[1..n]) { // sort A[1..n]
BuildHeap(n, A) // build the heap
m = n // initially heap contains all
while (m >= 2) {
swap A[1] with A[m] // extract the m-th largest
m = m-1 // unlink A[m] from heap
43
Lecture Notes CMSC 251
Heapify(A, 1, m) // fix things up
}
}
An example of HeapSort is shown in Figure 7.4 on page 148 of CLR. We make n −1 calls to Heapify,
each of which takes O(log n) time. So the total running time is O((n −1) log n) = O(nlog n).
Lecture 14: HeapSort Analysis and Partitioning
(Thursday, Mar 12, 1998)
Read: Chapt 7 and 8 in CLR. The algorithm we present for partitioning is different from the texts.
HeapSort Analysis: Last time we presented HeapSort. Recall that the algorithm operated by ﬁrst building a
heap in a bottom-up manner, and then repeatedly extracting the maximum element from the heap and
moving it to the end of the array. One clever aspect of the data structure is that it resides inside the
array to be sorted.
We argued that the basic heap operation of Heapify runs in O(log n) time, because the heap has
O(log n) levels, and the element being sifted moves down one level of the tree after a constant amount
of work.
Based on this we can see that (1) that it takes O(nlog n) time to build a heap, because we need
to apply Heapify roughly n/2 times (to each of the internal nodes), and (2) that it takes O(nlog n)
time to extract each of the maximum elements, since we need to extract roughly n elements and each
extraction involves a constant amount of work and one Heapify. Therefore the total running time of
HeapSort is O(nlog n).
Is this tight? That is, is the running time Θ(nlog n)? The answer is yes. In fact, later we will see that it
is not possible to sort faster than Ω(nlog n) time, assuming that you use comparisons, which HeapSort
does. However, it turns out that the ﬁrst part of the analysis is not tight. In particular, the BuildHeap
procedure that we presented actually runs in Θ(n) time. Although in the wider context of the HeapSort
algorithm this is not signiﬁcant (because the running time is dominated by the Θ(nlog n) extraction
phase).
Nonetheless there are situations where you might not need to sort all of the elements. For example, it
is common to extract some unknown number of the smallest elements until some criterion (depending
on the particular application) is met. For this reason it is nice to be able to build the heap quickly since
you may not need to extract all the elements.
BuildHeap Analysis: Let us consider the running time of BuildHeap more carefully. As usual, it will make
our lives simple by making some assumptions about n. In this case the most convenient assumption is
that n is of the form n = 2
h+1
−1, where h is the height of the tree. The reason is that a left-complete
tree with this number of nodes is a complete tree, that is, its bottommost level is full. This assumption
will save us from worrying about ﬂoors and ceilings.
With this assumption, level 0 of the tree has 1 node, level 1 has 2 nodes, and up to level h, which has
2
h
nodes. All the leaves reside on level h.
Recall that when Heapify is called, the running time depends on how far an element might sift down
before the process terminates. In the worst case the element might sift down all the way to the leaf
level. Let us count the work done level by level.
At the bottommost level there are 2
h
nodes, but we do not call Heapify on any of these so the work is
0. At the next to bottommost level there are 2
h−1
nodes, and each might sift down 1 level. At the 3rd
level from the bottom there are 2
h−2
nodes, and each might sift down 2 levels. In general, at level j
44
Lecture Notes CMSC 251
0 0 0 0 0 0
2*2
0 0*8
1*4
3*1
Total work for BuildHeap
2
0
1 1 1 1
2
3
Figure 13: Analysis of BuildHeap.
from the bottom there are 2
h−j
nodes, and each might sift down j levels. So, if we count from bottom
to top, level-by-level, we see that the total time is proportional to
T(n) =
h

j=0
j2
h−j
=
h

j=0
j
2
h
2
j
.
If we factor out the 2
h
term, we have
T(n) = 2
h
h

j=0
j
2
j
.
This is a sum that we have never seen before. We could try to approximate it by an integral, which
would involve integration by parts, but it turns out that there is a very cute solution to this particular
sum. We’ll digress for a moment to work it out. First, write down the inﬁnite general geometric series,
for any constant x < 1.

j=0
x
j
=
1
1 −x
.
Then take the derivative of both sides with respect to x, and multiply by x giving:

j=0
jx
j−1
=
1
(1 −x)
2

j=0
jx
j
=
x
(1 −x)
2
,
and if we plug x = 1/2, then voila! we have the desired formula:

j=0
j
2
j
=
1/2
(1 −(1/2))
2
=
1/2
1/4
= 2.
In our case we have a bounded sum, but since the inﬁnite series is bounded, we can use it instead as an
easy approximation.
Using this we have
T(n) = 2
h
h

j=0
j
2
j
≤ 2
h

j=0
j
2
j
≤ 2
h
2 = 2
h+1
.
Now recall that n = 2
h+1
−1, so we have T(n) ≤ n + 1 ∈ O(n). Clearly the algorithm takes at least
Ω(n) time (since it must access every element of the array at least once) so the total running time for
BuildHeap is Θ(n).
45
Lecture Notes CMSC 251
It is worthwhile pausing here a moment. This is the second time we have seen a relatively complex
structured algorithm, with doubly nested loops, come out with a running time of Θ(n). (The other
example was the median algorithm, based on the sieve technique. Actually if you think deeply about
this, there is a sense in which a parallel version of BuildHeap can be viewed as operating like a sieve,
but maybe this is getting too philosophical.) Perhaps a more intuitive way to describe what is happening
here is to observe an important fact about binary trees. This is that the vast majority of nodes are at the
lowest level of the tree. For example, in a complete binary tree of height h there is a total of n ≈ 2
h+1
nodes in total, and the number of nodes in the bottom 3 levels alone is
2
h
+ 2
h−1
+ 2
h−2
=
n
2
+
n
4
+
n
8
=
7n
8
= 0.875n.
That is, almost 90% of the nodes of a complete binary tree reside in the 3 lowest levels. Thus the lesson
to be learned is that when designing algorithms that operate on trees, it is important to be most efﬁcient
on the bottommost levels of the tree (as BuildHeap is) since that is where most of the weight of the tree
resides.
Partitioning: Our next sorting algorithm is QuickSort. QuickSort is interesting in a number of respects.
First off, (as we will present it) it is a randomized algorithm, which means that it makes use of a ran-
dom number generator. We will show that in the worst case its running time is O(n
2
), its expected
case running time is O(nlog n). Moreover, this expected case running time occurs with high proba-
bility, in that the probability that the algorithm takes signiﬁcantly more than O(nlog n) time is rapidly
decreasing function of n. In addition, QuickSort has a better locality-of-reference behavior than either
MergeSort or HeapSort, and thus it tends to run fastest of all three algorithms. This is how it got its
name. QuickSort (and its variants) are considered the methods of choice for most standard library
sorting algorithms.
Next time we will discuss QuickSort. Today we will discuss one aspect of QuickSort, namely the
partitioning algorithm. This is the same partitioning algorithm which we discussed when we talked
about the selection (median) problem. We are given an array A[p..r], and a pivot element x chosen
from the array. Recall that the partitioning algorithm is suppose to partition A into three subarrays:
A[p..q − 1] whose elements are all less than or equal to x, A[q] = x, and A[q + 1..r] whose elements
are greater than or equal to x. We will assume that x is the ﬁrst element of the subarray, that is,
x = A[p]. If a different rule is used for selecting x, this is easily handled by swapping this element
with A[p] before calling this procedure.
We will present a different algorithm from the one given in the text (in Section 8.1). This algorithm is
a little easier to verify the correctness, and a little easier to analyze. (But I suspect that the one in the
text is probably a bit for efﬁcient for actual implementation.)
This algorithm works by maintaining the following invariant condition. The subarray is broken into
four segments. The boundaries between these items are indicated by the indices p, q, s, and r.
(1) A[p] = x is the pivot value,
(2) A[p + 1..q] contains items that are less than x,
(3) A[q + 1..s −1] contains items that are greater than or equal to x, and
(4) A[s..r] contains items whose values are currently unknown.
This is illustrated below.
The algorithm begins by setting q = p and s = p +1. With each step through the algorithm we test the
value of A[s] against x. If A[s] ≥ x, then we can simply increment s. Otherwise we increment q, swap
A[s] with A[q], and then increment s. Notice that in either case, the invariant is still maintained. In the
ﬁrst case this is obvious. In the second case, A[q] now holds a value that is less than x, and A[s − 1]
now holds a value that is greater than or equal to x. The algorithm ends when s = r, meaning that
46
Lecture Notes CMSC 251
configuration
p
r
q s
x ?
p r
q s
>= x < x
q
swap
x
Initial configuration
Final configuration
Intermediate
x < x >= x ?
Figure 14: Partitioning intermediate structure.
all of the elements have been processed. To ﬁnish things off we swap A[p] (the pivot) with A[q], and
return the value of q. Here is the complete code:
Partition
Partition(int p, int r, array A) { // 3-way partition of A[p..r]
x = A[p] // pivot item in A[p]
q = p
for s = p+1 to r do {
if (A[s] < x) {
q = q+1
swap A[q] with A[s]
}
}
swap A[p] with A[q] // put the pivot into final position
return q // return location of pivot
}
An example is shown below.
Lecture 15: QuickSort
(Tuesday, Mar 17, 1998)
Revised: March 18. Fixed a bug in the analysis.
Read: Chapt 8 in CLR. My presentation and analysis are somewhat different than the text’s.
QuickSort and Randomized Algorithms: Early in the semester we discussed the fact that we usually study
the worst-case running times of algorithms, but sometimes average-case is a more meaningful measure.
Today we will study QuickSort. It is a worst-case Θ(n
2
) algorithm, whose expected-case running time
is Θ(nlog n).
We will present QuickSort as a randomized algorithm, that is, an algorithm which makes random
choices. There are two common types of randomized algorithms:
Monte Carlo algorithms: These algorithms may produce the wrong result, but the probability of this
occurring can be made arbitrarily small by the user. Usually the lower you make this probability,
the longer the algorithm takes to run.
47
Lecture Notes CMSC 251
Final swap
3 7 1
q
8 4
s
5 3 7
3
6
s
4 8
q
1 7 3 6
5 3
3
1
5 3 7 1 8 4
s q
6 3
5
4 6
q
8 1
s
3 7 4 6 3
q
8
p r
5 3 8 6 4 3
q s
1 7
5 3 8 6 4 3
q
7 1
s
5 3 8 6 4 3
q
7 1
s
5
5 3 8 6 4 3
s q
7 1
Figure 15: Partitioning example.
Las Vegas algorithms: These algorithms always produce the correct result, but the running time is a
random variable. In these cases the expected running time, averaged over all possible random
choices is the measure of the algorithm’s running time.
The most well known Monte Carlo algorithm is one for determining whether a number is prime. This
is an important problem in cryptography. The QuickSort algorithm that we will discuss today is an ex-
ample of a Las Vegas algorithm. Note that QuickSort does not need to be implemented as a randomized
algorithm, but as we shall see, this is generally considered the safest implementation.
QuickSort Overview: QuickSort is also based on the divide-and-conquer design paradigm. Unlike Merge-
Sort where most of the work is done after the recursive call returns, in QuickSort the work is done
before the recursive call is made. Here is an overview of QuickSort. Note the similarity with the selec-
tion algorithm, which we discussed earlier. Let A[p..r] be the (sub)array to be sorted. The initial call
is to A[1..n].
Basis: If the list contains 0 or 1 elements, then return.
Select pivot: Select a random element x from the array, called the pivot.
Partition: Partition the array in three subarrays, those elements A[1..q − 1] ≤ x, A[q] = x, and
A[q + 1..n] ≥ x.
Recurse: Recursively sort A[1..q −1] and A[q + 1..n].
The pseudocode for QuickSort is given below. The initial call is QuickSort(1, n, A). The Partition
routine was discussed last time. Recall that Partition assumes that the pivot is stored in the ﬁrst element
of A. Since we want a random pivot, we pick a random index i from p to r, and then swap A[i] with
A[p].
QuickSort
QuickSort(int p, int r, array A) { // Sort A[p..r]
if (r <= p) return // 0 or 1 items, return
i = a random index from [p..r] // pick a random element
48
Lecture Notes CMSC 251
swap A[i] with A[p] // swap pivot into A[p]
q = Partition(p, r, A) // partition A about pivot
QuickSort(p, q-1, A) // sort A[p..q-1]
QuickSort(q+1, r, A) // sort A[q+1..r]
}
QuickSort Analysis: The correctness of QuickSort should be pretty obvious. However its analysis is not
so obvious. It turns out that the running time of QuickSort depends heavily on how good a job we
do in selecting the pivot. In particular, if the rank of the pivot (recall that this means its position in
the ﬁnal sorted list) is very large or very small, then the partition will be unbalanced. We will see that
unbalanced partitions (like unbalanced binary trees) are bad, and result is poor running times. However,
if the rank of the pivot is anywhere near the middle portion of the array, then the split will be reasonably
well balanced, and the overall running time will be good. Since the pivot is chosen at random by our
algorithm, we may do well most of the time and poorly occasionally. We will see that the expected
running time is O(nlog n).
Worst-case Analysis: Let’s begin by considering the worst-case performance, because it is easier than the
average case. Since this is a recursive program, it is natural to use a recurrence to describe its running
time. But unlike MergeSort, where we had control over the sizes of the recursive calls, here we do
not. It depends on how the pivot is chosen. Suppose that we are sorting an array of size n, A[1..n],
and further suppose that the pivot that we select is of rank q, for some q in the range 1 to n. It takes
Θ(n) time to do the partitioning and other overhead, and we make two recursive calls. The ﬁrst is to
the subarray A[1..q −1] which has q −1 elements, and the other is to the subarray A[q + 1..n] which
has r −(q + 1) + 1 = r −q elements. So if we ignore the Θ (as usual) we get the recurrence:
T(n) = T(q −1) +T(n −q) +n.
This depends on the value of q. To get the worst case, we maximize over all possible values of q. As a
basis we have that T(0) = T(1) = Θ(1). Putting this together we have
T(n) =
_
1 if n ≤ 1
max
1≤q≤n
(T(q −1) +T(n −q) +n) otherwise.
Recurrences that have max’s and min’s embedded in them are very messy to solve. The key is deter-
mining which value of q gives the maximum. (A rule of thumb of algorithm analysis is that the worst
cases tends to happen either at the extremes or in the middle. So I would plug in the value q = 1, q = n,
and q = n/2 and work each out.) In this case, the worst case happens at either of the extremes (but see
the book for a more careful analysis based on an analysis of the second derivative). If we expand the
recurrence in the case q = 1 we get:
T(n) ≤ T(0) +T(n −1) +n = 1 +T(n −1) +n
= T(n −1) + (n + 1)
= T(n −2) +n + (n + 1)
= T(n −3) + (n −1) +n + (n + 1)
= T(n −4) + (n −2) + (n −1) +n + (n + 1)
= . . .
= T(n −k) +
k−2

i=−1
(n −i).
49
Lecture Notes CMSC 251
For the basis, T(1) = 1 we set k = n −1 and get
T(n) ≤ T(1) +
n−3

i=−1
(n −i)
= 1 + (3 + 4 + 5 +. . . + (n −1) +n + (n + 1))

n+1

i=1
i =
(n + 1)(n + 2)
2
∈ O(n
2
).
In fact, a more careful analysis reveals that it is Θ(n
2
) in this case.
Average-case Analysis: Next we show that in the average case QuickSort runs in Θ(nlog n) time. When
we talked about average-case analysis at the beginning of the semester, we said that it depends on some
assumption about the distribution of inputs. However, in this case, the analysis does not depend on
the input distribution at all—it only depends on the random choices that the algorithm makes. This is
good, because it means that the analysis of the algorithm’s performance is the same for all inputs. In
this case the average is computed over all possible random choices that the algorithm might make for
the choice of the pivot index in the second step of the QuickSort procedure above.
To analyze the average running time, we let T(n) denote the average running time of QuickSort on a list
of size n. It will simplify the analysis to assume that all of the elements are distinct. The algorithm has
n random choices for the pivot element, and each choice has an equal probability of 1/n of occuring.
So we can modify the above recurrence to compute an average rather than a max, giving:
T(n) =
_
1 if n ≤ 1
1
n

n
q=1
(T(q −1) +T(n −q) +n) otherwise.
This is not a standard recurrence, so we cannot apply the Master Theorem. Expansion is possible, but
rather tricky. Instead, we will attempt a constructive induction to solve it. We know that we want a
Θ(nlog n) running time, so let’s try T(n) ≤ anlg n + b. (Properly we should write ¸lg n| because
unlike MergeSort, we cannot assume that the recursive calls will be made on array sizes that are powers
of 2, but we’ll be sloppy because things will be messy enough anyway.)
Theorem: There exist a constant c such that T(n) ≤ cnln n, for all n ≥ 2. (Notice that we have
replaced lg n with lnn. This has been done to make the proof easier, as we shall see.)
Proof: The proof is by constructive induction on n. For the basis case n = 2 we have
T(2) =
1
2
2

q=1
(T(q −1) +T(2 −q) + 2)
=
1
2
((T(0) +T(1) + 2) + (T(1) +T(0) + 2) =
8
2
= 4.
We want this to be at most c2 ln 2 implying that c ≥ 4/(2 ln 2) ≈ 2.885.
For the induction step, we assume that n ≥ 3, and the induction hypothesis is that for any n

< n,
we have T(n

) ≤ cn

ln n

. We want to prove it is true for T(n). By expanding the deﬁnition of
T(n), and moving the factor of n outside the sum we have:
T(n) =
1
n
n

q=1
(T(q −1) +T(n −q) +n)
=
1
n
n

q=1
(T(q −1) +T(n −q)) +n.
50
Lecture Notes CMSC 251
Observe that if we split the sum into two sums, they both add the same values T(0) + T(1) +
. . . +T(n −1), just that one counts up and the other counts down. Thus we can replace this with
2

n−1
q=0
T(q). Because they don’t follow the formula, we’ll extract T(0) and T(1) and treat them
specially. If we make this substitution and apply the induction hypothesis to the remaining sum
we have (which we can because q < n) we have
T(n) =
2
n
_
n−1

q=0
T(q)
_
+n =
2
n
_
T(0) +T(1) +
n−1

q=2
T(q)
_
+n

2
n
_
1 + 1 +
n−1

q=2
(cq lg q)
_
+n
=
2c
n
_
n−1

q=2
(cq ln q)
_
+n +
4
n
.
We have never seen this sum before. Later we will show that
S(n) =
n−1

q=2
q ln q ≤
n
2
2
ln n −
n
2
4
.
Assuming this for now, we have
T(n) =
2c
n
_
n
2
2
ln n −
n
2
4
_
+n +
4
n
= cnln n −
cn
2
+n +
4
n
= cnln n +n
_
1 −
c
2
_
+
4
n
.
To ﬁnish the proof, we want all of this to be at most cnln n. If we cancel the common cnln n we
see that this will be true if we select c such that
n
_
1 −
c
2
_
+
4
n
≤ 0.
After some simple manipulations we see that this is equivalent to:
0 ≥ n −
cn
2
+
4
n
cn
2
≥ n +
4
n
c ≥ 2 +
8
n
2
.
Since n ≥ 3, we only need to select c so that c ≥ 2 +
8
9
, and so selecting c = 3 will work. From
the basis case we have c ≥ 2.885, so we may choose c = 3 to satisfy both the constraints.
The Leftover Sum: The only missing element to the proof is dealing with the sum
S(n) =
n−1

q=2
q ln q.
51
Lecture Notes CMSC 251
To bound this, recall the integration formula for bounding summations (which we paraphrase
here). For any monotonically increasing function f(x)
b−1

i=a
f(i) ≤
_
b
a
f(x)dx.
The function f(x) = xln x is monotonically increasing, and so we have
S(n) ≤
_
n
2
xln xdx.
If you are a calculus macho man, then you can integrate this by parts, and if you are a calculus
wimp (like me) then you can look it up in a book of integrals
_
n
2
xln xdx =
x
2
2
ln x −
x
2
4
¸
¸
¸
¸
n
x=2
=
_
n
2
2
ln n −
n
2
4
_
−(2 ln 2 −1) ≤
n
2
2
ln n −
n
2
4
.
This completes the summation bound, and hence the entire proof.
Summary: So even though the worst-case running time of QuickSort is Θ(n
2
), the average-case running
time is Θ(nlog n). Although we did not show it, it turns out that this doesn’t just happen much of
the time. For large values of n, the running time is Θ(nlog n) with high probability. In order to get
Θ(n
2
) time the algorithm must make poor choices for the pivot at virtually every step. Poor choices are
rare, and so continuously making poor choices are very rare. You might ask, could we make QuickSort
deterministic Θ(nlog n) by calling the selection algorithm to use the median as the pivot. The answer
is that this would work, but the resulting algorithm would be so slow practically that no one would ever
use it.
QuickSort (like MergeSort) is not formally an in-place sorting algorithm, because it does make use of a
recursion stack. In MergeSort and in the expected case for QuickSort, the size of the stack is O(log n),
so this is not really a problem.
QuickSort is the most popular algorithm for implementation because its actual performance (on typical
modern architectures) is so good. The reason for this stems from the fact that (unlike Heapsort) which
can make large jumps around in the array, the main work in QuickSort (in partitioning) spends most of
its time accessing elements that are close to one another. The reason it tends to outperform MergeSort
(which also has good locality of reference) is that most comparisons are made against the pivot element,
which can be stored in a register. In MergeSort we are always comparing two array elements against
each other. The most efﬁcient versions of QuickSort uses the recursion for large subarrays, but once
the sizes of the subarray falls below some minimum size (e.g. 20) it switches to a simple iterative
algorithm, such as selection sort.
Lecture 16: Lower Bounds for Sorting
(Thursday, Mar 19, 1998)
Review of Sorting: So far we have seen a number of algorithms for sorting a list of numbers in ascending
order. Recall that an in-place sorting algorithm is one that uses no additional array storage (however,
we allow QuickSort to be called in-place even though they need a stack of size O(log n) for keeping
track of the recursion). A sorting algorithm is stable if duplicate elements remain in the same relative
position after sorting.
52
Lecture Notes CMSC 251
Slow Algorithms: Include BubbleSort, InsertionSort, and SelectionSort. These are all simple Θ(n
2
)
in-place sorting algorithms. BubbleSort and InsertionSort can be implemented as stable algo-
rithms, but SelectionSort cannot (without signiﬁcant modiﬁcations).
Mergesort: Mergesort is a stable Θ(nlog n) sorting algorithm. The downside is that MergeSort is
the only algorithm of the three that requires additional array storage, implying that it is not an
in-place algorithm.
Quicksort: Widely regarded as the fastest of the fast algorithms. This algorithm is O(nlog n) in the
expected case, and O(n
2
) in the worst case. The probability that the algorithm takes asymptoti-
cally longer (assuming that the pivot is chosen randomly) is extremely small for large n. It is an
(almost) in-place sorting algorithm but is not stable.
Heapsort: Heapsort is based on a nice data structure, called a heap, which is a fast priority queue.
Elements can be inserted into a heap in O(log n) time, and the largest item can be extracted in
O(log n) time. (It is also easy to set up a heap for extracting the smallest item.) If you only want
to extract the k largest values, a heap can allow you to do this is O(n + k log n) time. It is an
in-place algorithm, but it is not stable.
Lower Bounds for Comparison-Based Sorting: Can we sort faster than O(nlog n) time? We will give an
argument that if the sorting algorithm is based solely on making comparisons between the keys in the
array, then it is impossible to sort more efﬁciently than Ω(nlog n) time. Such an algorithm is called a
comparison-based sorting algorithm, and includes all of the algorithms given above.
Virtually all known general purpose sorting algorithms are based on making comparisons, so this is
not a very restrictive assumption. This does not preclude the possibility of a sorting algorithm whose
actions are determined by other types of operations, for example, consulting the individual bits of
numbers, performing arithmetic operations, indexing into an array based on arithmetic operations on
keys.
We will show that any comparison-based sorting algorithm for a input sequence ¸a
1
, a
2
, . . . , a
n
) must
make at least Ω(nlog n) comparisons in the worst-case. This is still a difﬁcult task if you think about it.
It is easy to show that a problem can be solved fast (just give an algorithm). But to show that a problem
cannot be solved fast you need to reason in some way about all the possible algorithms that might ever
be written. In fact, it seems surprising that you could even hope to prove such a thing. The catch here
is that we are limited to using comparison-based algorithms, and there is a clean mathematical way of
characterizing all such algorithms.
Decision Tree Argument: In order to prove lower bounds, we need an abstract way of modeling “any pos-
sible” comparison-based sorting algorithm, we model such algorithms in terms of an abstract model
called a decision tree.
In a comparison-based sorting algorithm only comparisons between the keys are used to determine
the action of the algorithm. Let ¸a
1
, a
2
, . . . , a
n
) be the input sequence. Given two elements, a
i
and
a
j
, their relative order can only be determined by the results of comparisons like a
i
< a
j
, a
i
≤ a
j
,
a
i
= a
j
, a
i
≥ a
j
, and a
i
> a
j
.
A decision tree is a mathematical representation of a sorting algorithm (for a ﬁxed value of n). Each
node of the decision tree represents a comparison made in the algorithm (e.g., a
4
: a
7
), and the two
branches represent the possible results, for example, the left subtree consists of the remaining compar-
isons made under the assumption that a
4
≤ a
7
and the right subtree for a
4
> a
7
. (Alternatively, one
might be labeled with a
4
< a
7
and the other with a
4
≥ a
7
.)
Observe that once we know the value of n, then the “action” of the sorting algorithm is completely
determined by the results of its comparisons. This action may involve moving elements around in the
array, copying them to other locations in memory, performing various arithmetic operations on non-key
data. But the bottom-line is that at the end of the algorithm, the keys will be permuted in the ﬁnal array
53
Lecture Notes CMSC 251
in some way. Each leaf in the decision tree is labeled with the ﬁnal permutation that the algorithm
generates after making all of its comparisons.
To make this more concrete, let us assume that n = 3, and let’s build a decision tree for SelectionSort.
Recall that the algorithm consists of two phases. The ﬁrst ﬁnds the smallest element of the entire list,
and swaps it with the ﬁrst element. The second ﬁnds the smaller of the remaining two items, and swaps
it with the second element. Here is the decision tree (in outline form). The ﬁrst comparison is between
a
1
and a
2
. The possible results are:
a
1
≤ a
2
: Then a
1
is the current minimum. Next we compare a
1
with a
3
whose results might be either:
a
1
≤ a
3
: Then we know that a
1
is the minimum overall, and the elements remain in their original
positions. Then we pass to phase 2 and compare a
2
with a
3
. The possible results are:
a
2
≤ a
3
: Final output is ¸a
1
, a
2
, a
3
).
a
2
> a
3
: These two are swapped and the ﬁnal output is ¸a
1
, a
3
, a
2
).
a
1
> a
3
: Then we know that a
3
is the minimum is the overall minimum, and it is swapped with
a
1
. The we pass to phase 2 and compare a
2
with a
1
(which is now in the third position of
the array) yielding either:
a
2
≤ a
1
: Final output is ¸a
3
, a
2
, a
1
).
a
2
> a
1
: These two are swapped and the ﬁnal output is ¸a
3
, a
1
, a
2
).
a
1
> a
2
: Then a
2
is the current minimum. Next we compare a
2
with a
3
whose results might be either:
a
2
≤ a
3
: Then we know that a
2
is the minimum overall. We swap a
2
with a
1
, and then pass to
phase 2, and compare the remaining items a
1
and a
3
. The possible results are:
a
1
≤ a
3
: Final output is ¸a
2
, a
1
, a
3
).
a
1
> a
3
: These two are swapped and the ﬁnal output is ¸a
2
, a
3
, a
1
).
a
2
> a
3
: Then we know that a
3
is the minimum is the overall minimum, and it is swapped with
a
1
. We pass to phase 2 and compare a
2
with a
1
(which is now in the third position of the
array) yielding either:
a
2
≤ a
1
: Final output is ¸a
3
, a
2
, a
1
).
a
2
> a
1
: These two are swapped and the ﬁnal output is ¸a
3
, a
1
, a
2
).
The ﬁnal decision tree is shown below. Note that there are some nodes that are unreachable. For exam-
ple, in order to reach the fourth leaf from the left it must be that a
1
≤ a
2
and a
1
> a
2
, which cannot
both be true. Can you explain this? (The answer is that virtually all sorting algorithms, especially
inefﬁcient ones like selection sort, may make comparisons that are redundant, in the sense that their
outcome has already been determined by earlier comparisons.) As you can see, converting a complex
sorting algorithm like HeapSort into a decision tree for a large value of n will be very tedious and
complex, but I hope you are convinced by this exercise that it can be done in a simple mechanical way.
3,1,2 3,2,1 3,1,2 3,2,1
a2:a1
>
a2:a1
>
>
> >
> >
a2:a3
a1:a3
2,3,1 2,1,3 1,3,2 1,2,3
a2:a3
a1:a2
<
< < < <
<
<
a1:a3
Figure 16: Decision Tree for SelectionSort on 3 keys.
54
Lecture Notes CMSC 251
Using Decision Trees for Analyzing Sorting: Consider any sorting algorithm. Let T(n) be the maximum
number of comparisons that this algorithm makes on any input of size n. Notice that the running time
fo the algorithm must be at least as large as T(n), since we are not counting data movement or other
computations at all. The algorithm deﬁnes a decision tree. Observe that the height of the decision
tree is exactly equal to T(n), because any path from the root to a leaf corresponds to a sequence of
As we have seen earlier, any binary tree of height T(n) has at most 2
T(n)
leaves. This means that this
sorting algorithm can distinguish between at most 2
T(n)
different ﬁnal actions. Let’s call this quantity
A(n), for the number of different ﬁnal actions the algorithm can take. Each action can be thought of as
a speciﬁc way of permuting the oringinal input to get the sorted output.
How many possible actions must any sorting algorithm distinguish between? If the input consists of n
distinct numbers, then those numbers could be presented in any of n! different permutations. For each
different permutation, the algorithm must “unscramble” the numbers in an essentially different way,
that is it must take a different action, implying that A(n) ≥ n!. (Again, A(n) is usually not exactly
equal to n! because most algorithms contain some redundant unreachable leaves.)
Since A(n) ≤ 2
T(n)
we have 2
T(n)
≥ n!, implying that
T(n) ≥ lg(n!).
We can use Stirling’s approximation for n! (see page 35 in CLR) yielding:
n! ≥

2πn
_
n
e
_
n
T(n) ≥ lg
_

2πn
_
n
e
_
n
_
= lg

2πn +nlg n −nlg e ∈ Ω(nlog n).
Thus we have, the following theorem.
Theorem: Any comparison-based sorting algorithm has worst-case running time Ω(nlog n).
This can be generalized to show that the average-case time to sort is also Ω(nlog n) (by arguing about
the average height of a leaf in a tree with at least n! leaves). The lower bound on sorting can be
generalized to provide lower bounds to a number of other problems as well.
Lecture 17: Linear Time Sorting
(Tuesday, Mar 31, 1998)
Linear Time Sorting: Last time we presented a proof that it is not possible to sort faster than Ω(nlog n)
time assuming that the algorithm is based on making 2-way comparisons. Recall that the argument
was based on showing that any comparison-based sorting could be represented as a decision tree, the
decision tree must have at least n! leaves, to distinguish between the n! different permutations in which
the keys could be input, and hence its height must be at least lg(n!) ∈ Ω(nlg n).
This lower bound implies that if we hope to sort numbers faster than in O(nlog n) time, we cannot
do it by making comparisons alone. Today we consider the question of whether it is possible to sort
without the use of comparisons. They answer is yes, but only under very restrictive circumstances.
55
Lecture Notes CMSC 251
Many applications involve sorting small integers (e.g. sorting characters, exam scores, last four digits
of a social security number, etc.). We present three algorithms based on the theme of speeding up
sorting in special cases, by not making comparisons.
Counting Sort: Counting sort assumes that each input is an integer in the range from 1 to k. The algorithm
sorts in Θ(n +k) time. If k is known to be Θ(n), then this implies that the resulting sorting algorithm
is Θ(n) time.
The basic idea is to determine, for each element in the input array, its rank in the ﬁnal sorted array.
Recall that the rank of a item is the number of elements in the array that are less than or equal to it.
Notice that once you know the rank of every element, you sort by simply copying each element to the
appropriate location of the ﬁnal sorted output array. The question is how to ﬁnd the rank of an element
without comparing it to the other elements of the array? Counting sort uses the following three arrays.
As usual A[1..n] is the input array. Recall that although we usually think of A as just being a list of
numbers, it is actually a list of records, and the numeric value is the key on which the list is being
sorted. In this algorithm we will be a little more careful to distinguish the entire record A[j] from the
key A[j].key.
We use three arrays:
A[1..n] : Holds the initial input. A[j] is a record. A[j].key is the integer key value on which to sort.
B[1..n] : Array of records which holds the sorted output.
R[1..k] : An array of integers. R[x] is the rank of x in A, where x ∈ [1..k].
The algorithm is remarkably simple, but deceptively clever. The algorithm operates by ﬁrst construct-
ing R. We do this in two steps. First we set R[x] to be the number of elements of A[j] whose key
is equal to x. We can do this initializing R to zero, and then for each j, from 1 to n, we increment
R[A[j].key] by 1. Thus, if A[j].key = 5, then the 5th element of R is incremented, indicating that we
have seen one more 5. To determine the number of elements that are less than or equal to x, we replace
R[x] with the sum of elements in the subarray R[1..x]. This is done by just keeping a running total of
the elements of R.
Now R[x] now contains the rank of x. This means that if x = A[j].key then the ﬁnal position of A[j]
should be at position R[x] in the ﬁnal sorted array. Thus, we set B[R[x]] = A[j]. Notice that this
copies the entire record, not just the key value. There is a subtlety here however. We need to be careful
if there are duplicates, since we do not want them to overwrite the same location of B. To do this, we
decrement R[i] after copying.
Counting Sort
CountingSort(int n, int k, array A, array B) { // sort A[1..n] to B[1..n]
for x = 1 to k do R[x] = 0 // initialize R
for j = 1 to n do R[A[j].key]++ // R[x] = #(A[j] == x)
for x = 2 to k do R[x] += R[x-1] // R[x] = rank of x
for j = n downto 1 do { // move each element of A to B
x = A[j].key // x = key value
B[R[x]] = A[j] // R[x] is where to put it
R[x]-- // leave space for duplicates
}
}
There are four (unnested) loops, executed k times, n times, k − 1 times, and n times, respectively,
so the total running time is Θ(n + k) time. If k = O(n), then the total running time is Θ(n). The
ﬁgure below shows an example of the algorithm. You should trace through a few examples, to convince
yourself how it works.
56
Lecture Notes CMSC 251
s
s
r
r
r
e
e a
4 3 2 1 5
4 3 2 1
5 4 2 2
s
3
1 3
3 3 1
4 3 3 1
4 3 3 1 1
v
v
v
v
v
s
5
1 4 3 2 1 5
A R R
R
R
R
R
R
B
B
B
B
B
Key
Other data
2
3 2 2
5 3 2 1
5 2 2 1
4 2 2 1
4 2 2 0
4 3
e
1 2 0 2 3 1 3 4 1
a s v r
Figure 17: Counting Sort.
Obviously this not an in-place sorting algorithm (we need two additional arrays). However it is a stable
sorting algorithm. I’ll leave it as an exercise to prove this. (As a hint, notice that the last loop runs
down from n to 1. It would not be stable if the loop were running the other way.)
Radix Sort: The main shortcoming of counting sort is that it is only really (due to space requirements) for
small integers. If the integers are in the range from 1 to 1 million, we may not want to allocate an
array of a million elements. Radix sort provides a nice way around this by sorting numbers one digit
at a time. Actually, what constitutes a “digit” is up to the implementor. For example, it is typically
more convenient to sort by bytes rather than digits (especially for sorting character strings). There is a
tradeoff between the space and time.
The idea is very simple. Let’s think of our list as being composed of n numbers, each having d decimal
digits (or digits in any base). Let’s suppose that we have access to a stable sorting algorithm, like
Counting Sort. To sort these numbers we can simply sort repeatedly, starting at the lowest order digit,
and ﬁnishing with the highest order digit. Since the sorting algorithm is stable, we know that if the
numbers are already sorted with respect to low order digits, and then later we sort with respect to high
order digits, numbers having the same high order digit will remain sorted with respect to their low
order digit. As usual, let A[1..n] be the array to sort, and let d denote the number of digits in A. We
will not discuss how it is that A is broken into digits, but this might be done through bit manipulations
(shifting and masking off bits) or by accessing elements byte-by-byte, etc.
RadixSort(int n, int d, array A) { // sort A[1..n] with d digits
for i = 1 to d do {
Sort A (stably) with respect to i-th lowest order digit;
}
}
57
Lecture Notes CMSC 251
Here is an example.
576 49[4] 9[5]4 [1]76 176
494 19[4] 5[7]6 [1]94 194
194 95[4] 1[7]6 [2]78 278
296 =⇒ 57[6] =⇒ 2[7]8 =⇒ [2]96 =⇒ 296
278 29[6] 4[9]4 [4]94 494
176 17[6] 1[9]4 [5]76 576
954 27[8] 2[9]6 [9]54 954
The running time is clearly Θ(d(n+k)) where d is the number of digits, n is the length of the list, and
k is the number of values a digit can have. This is usually a constant, but the algorithm’s running time
will be Θ(dn) as long as k ∈ O(n).
Notice that we can be quite ﬂexible in the deﬁnition of what a “digit” is. It can be any number in the
range from 1 to cn for some constant c, and we will still have an Θ(n) time algorithm. For example,
if we have d = 2 and set k = n, then we can sort numbers in the range n ∗ n = n
2
in Θ(n) time. In
general, this can be used to sort numbers in the range from 1 to n
d
in Θ(dn) time.
At this point you might ask, since a computer integer word typically consists of 32 bits (4 bytes), then
doesn’t this imply that we can sort any array of integers in O(n) time (by applying radix sort on each
of the d = 4 bytes)? The answer is yes, subject to this word-length restriction. But you should be
careful about attempting to make generalizations when the sizes of the numbers are not bounded. For
example, suppose you have n keys and there are no duplicate values. Then it follows that you need
at least B = ¸lg n| bits to store these values. The number of bytes is d = ¸B/8|. Thus, if you were
to apply radix sort in this situation, the running time would be Θ(dn) = Θ(nlog n). So there is no
real asymptotic savings here. Furthermore, the locality of reference behavior of Counting Sort (and
hence of RadixSort) is not as good as QuickSort. Thus, it is not clear whether it is really faster to use
RadixSort over QuickSort. This is at a level of similarity, where it would probably be best to implement
both algorithms on your particular machine to determine which is really faster.
Lecture 18: Review for Second Midterm
(Thursday, Apr 2, 1998)
General Remarks: Up to now we have covered the basic techniques for analyzing algorithms (asymptotics,
summations, recurrences, induction), have discussed some algorithm design techniques (divide-and-
conquer in particular), and have discussed sorting algorithm and related topics. Recall that our goal is
to provide you with the necessary tools for designing and analyzing efﬁcient algorithms.
Material from Text: You are only responsible for material that has been covered in class or on class assign-
ments. However it is always a good idea to see the text to get a better insight into some of the topics
we have covered. The relevant sections of the text are the following.
• Review Chapts 1: InsertionSort and MergeSort.
• Chapt 7: Heaps, HeapSort. Look at Section 7.5 on priority queues, even though we didn’t cover
it in class.
• Chapt 8: QuickSort. You are responsible for the partitioning algorithm which we gave in class,
not the one in the text. Section 8.2 gives some good intuition on the analysis of QuickSort.
• Chapt 9 (skip 9.4): Lower bounds on sorting, CountingSort, RadixSort.
58
Lecture Notes CMSC 251
• Chapt. 10 (skip 10.1): Selection. Read the analysis of the average case of selection. It is similar
to the QuickSort analysis.
You are also responsible for anything covered in class.
Cheat Sheets: The exam is closed-book, closed-notes, but you are allowed two sheets of notes (front and
back). You should already have the cheat sheet from the ﬁrst exam with basic deﬁnitions of asymp-
totics, important summations, Master theorem. Also add Stirling’s approximation (page 35), and the
integration formula for summations (page 50). You should be familiar enough with each algorithm
presented in class that you could work out an example by hand, without refering back to your cheat
sheet. But it is a good idea to write down a brief description of each algorithm. For example, you might
be asked to show the result of BuildHeap on an array, or show how to apply the Partition algorithm
used in QuickSort.
Keep track of algorithm running times and their limitations. For example, if you need an efﬁcient
stable sorting algorithm, MergeSort is ﬁne, but both HeapSort and QuickSort are not stable. You
can sort short integers in Θ(n) time through CountingSort, but you cannot use this algorithm to sort
arbitrary numbers, such as reals.
Sorting issues: We discussed the following issues related to sorting.
Slow Algorithms: BubbleSort, InsertionSort, SelectionSort are all simple Θ(n
2
) algorithm. They are
ﬁne for small inputs. They are all in-place sorting algorithms (they use no additional array stor-
age), and BubbleSort and InsertionSort are stable sorting algorithms (if implemented carefully).
MergeSort: A divide-and-conquer Θ(nlog n) algorithm. It is stable, but requires additional array
storage for merging, and so it is not an in-place algorithm.
HeapSort: A Θ(nlog n) algorithm which uses a clever data structure, called a heap. Heaps are a
nice way of implementing a priority queue data structure, allowing insertions, and extracting the
maximum in Θ(log n) time, where n is the number of active elements in the heap. Remember that
a heap can be built in Θ(n) time. HeapSort is not stable, but it is an in-place sorting algorithm.
QuickSort: The algorithm is based on selecting a pivot value. If chosen randomly, then the expected
time is Θ(nlog n), but the worst-case is Θ(n
2
). However the worst-case occurs so rarely that
people usually do not worry about it. This algorithm is not stable, but it is considered an in-place
sorting algorithm even though it does require some additional array storage. It implicitly requires
storage for the recursion stack, but the expected depth of the recursion is O(log n), so this is not
Lower bounds: Assuming comparisons are used, you cannot sort faster than Ω(nlog n) time. This is
because any comparison-based algorithm can be written as a decision tree, and because there are
n! possible outcomes to sorting, even a perfectly balanced tree would require height of at least
O(log n!) = O(nlog n).
Counting sort: If you are sorting n small integers (in the range of 1 to k) then this algorithm will sort
them in Θ(n +k) time. Recall that the algorithm is based on using the elements as indices to an
array. In this way it circumvents the lower bound argument.
Radix sort: If you are sorting n integers that have been broken into d digits (each of constant size),
you can sort them in O(dn) time.
What sort of questions might there be? Some will ask you to about the properties of these sorting
algorithms, or asking which algorithm would be most appropriate to use in a certain circumstance.
Others will ask you to either reason about the internal operations of the algorithms, or ask you to extend
these algorithms for other purposes. Finally, there may be problems asking you to devise algorithms to
solve some sort of novel sorting problem.
59
Lecture Notes CMSC 251
Lecture 19: Second Midterm Exam
(Tuesday, April 7, 1998)
Second midterm exam today. No lecture.
Lecture 20: Introduction to Graphs
(Thursday, April 9, 1998)
Graph Algorithms: For the next few weeks, we will be discussing a number of various topics. One involves
algorithms on graphs. Intuitively, a graph is a collection of vertices or nodes, connected by a collection
of edges. Graphs are very important discrete structures because they are a very ﬂexible mathematical
model for many application problems. Basically, any time you have a set of objects, and there is some
“connection” or “relationship” or “interaction” between pairs of objects, a graph is a good way to model
this. Examples of graphs in application include communication and transportation networks, VLSI and
other sorts of logic circuits, surface meshes used for shape description in computer-aided design and
geographic information systems, precedence constraints in scheduling systems. The list of application
is almost too long to even consider enumerating it.
Most of the problems in computational graph theory that we will consider arise because they are of
importance to one or more of these application areas. Furthermore, many of these problems form the
basic building blocks from which more complex algorithms are then built.
Graphs and Digraphs: A directed graph (or digraph) G = (V, E) consists of a ﬁnite set of vertices V (also
called nodes) and E is a binary relation on V (i.e. a set of ordered pairs from V ) called the edges.
For example, the ﬁgure below (left) shows a directed graph. Observe that self-loops are allowed by
this deﬁnition. Some deﬁnitions of graphs disallow this. Multiple edges are not permitted (although
the edges (v, w) and (w, v) are distinct). This shows the graph G = (V, E) where V = ¦1, 2, 3¦ and
E = ¦(1, 1), (1, 2), (2, 3), (3, 2), (1, 3)¦.
Graph Digraph
2
4
3
2
1
3
1
Figure 18: Digraph and graph example.
In an undirected graph (or just graph) G = (V, E) the edge set consists of unordered pairs of distinct
vertices (thus self-loops are not allowed). The ﬁgure above (right) shows the graph G = (V, E), where
V = ¦1, 2, 3, 4¦ and the edge set is E = ¦¦1, 2¦, ¦1, 3¦, ¦1, 4¦, ¦2, 4¦, ¦3, 4¦¦.
We say that vertex w is adjacent to vertex v if there is an edge from v to w. In an undirected graph, we
say that an edge is incident on a vertex if the vertex is an endpoint of the edge. In a directed graph we
will often say that an edge either leaves or enters a vertex.
A digraph or undirected graph is said to be weighted if its edges are labelled with numeric weights. The
meaning of the weight is dependent on the application, e.g. distance between vertices or ﬂow capacity
through the edge.
60
Lecture Notes CMSC 251
Observe that directed graphs and undirected graphs are different (but similar) objects mathematically.
Certain notions (such as path) are deﬁned for both, but other notions (such as connectivity) are only
deﬁned for one.
In a digraph, the number of edges coming out of a vertex is called the out-degree of that vertex, and
the number of edges coming in is called the in-degree. In an undirected graph we just talk about the
degree of a vertex, as the number of edges which are incident on this vertex. By the degree of a graph,
we usually mean the maximum degree of its vertices.
In a directed graph, each edge contributes 1 to the in-degree of a vertex and contributes one to the
out-degree of each vertex, and thus we have
Observation: For a digraph G = (V, E),

v∈V
in-deg(v) =

v∈V
out-deg(v) = [E[.
([E[ means the cardinality of the set E, i.e. the number of edges).
In an undirected graph each edge contributes once to the outdegree of two different edges and thus we
have
Observation: For an undirected graph G = (V, E),

v∈V
deg(v) = 2[E[.
Lecture 21: More on Graphs
(Tuesday, April 14, 1998)
Graphs: Last time we introduced the notion of a graph (undirected) and a digraph (directed). We deﬁned
vertices, edges, and the notion of degrees of vertices. Today we continue this discussion. Recall that
graphs and digraphs both consist of two objects, a set of vertices and a set of edges. For graphs edges
are undirected and for graphs they are directed.
Paths and Cycles: Let’s concentrate on directed graphs for the moment. A path in a directed graph is a
sequence of vertices ¸v
0
, v
1
, . . . , v
k
) such that (v
i−1
, v
i
) is an edge for i = 1, 2, . . . , k. The length of
the path is the number of edges, k. We say that w is reachable from u if there is a path from u to w.
Note that every vertex is reachable from itself by a path that uses zero edges. A path is simple if all
vertices (except possibly the ﬁrst and last) are distinct.
A cycle in a digraph is a path containing at least one edge and for which v
0
= v
k
. A cycle is simple if,
1
, . . . , v
k
are distinct. (Note: A self-loop counts as a simple cycle of length 1).
In undirected graphs we deﬁne path and cycle exactly the same, but for a simple cycle we add the
requirement that the cycle visit at least three distinct vertices. This is to rule out the degenerate cycle
¸u, w, u), which simply jumps back and forth along a single edge.
There are two interesting classes cycles. A Hamiltonian cycle is a cycle that visits every vertex in a
graph exactly once. A Eulerian cycle is a cycle (not necessarily simple) that visits every edge of a
graph exactly once. (By the way, this is pronounced “Oiler-ian” and not “Yooler-ian”.) There are also
“path” versions in which you need not return to the starting vertex.
61
Lecture Notes CMSC 251
One of the early problems which motivated interest in graph theory was the K¨ onigsberg Bridge Prob-
lem. This city sits on the Pregel River as is joined by 7 bridges. The question is whether it is possible
to cross all 7 bridges without visiting any bridge twice. Leonard Euler showed that it is not possible, by
showing that this question could be posed as a problem of whether the multi-graph shown below has
an Eulerian path, and then proving necessary and sufﬁcient conditions for a graph to have such a path.
4
1
3
2
2
4
3
1
Figure 19: Bridge’s at K¨ onigsberg Problem.
Euler proved that for a graph to have an Eulerian path, all but at most two of the vertices must have
even degree. In this graph, all 4 of the vertices have odd degree.
Connectivity and acyclic graphs: A graph is said to be acyclic if it contains no simple cycles. A graph is
connected if every vertex can reach every other vertex. An acyclic connected graph is called a free tree
or simply tree for short. The term “free” is intended to emphasize the fact that the tree has no root, in
contrast to a rooted tree, as is usually seen in data structures.
Observe that a free tree is a minimally connected graph in the sense that the removal of any causes the
resulting graph to be disconnected. Furthermore, there is a unique path between any two vertices in a
free tree. The addition of any edge to a free tree results in the creation of a single cycle.
The “reachability” relation is an equivalence relation on vertices, that is, it is reﬂexive (a vertex is
reachable from itself), symmetric (if u is reachable from v, then v is reachable from u), and transitive
(if u is reachable from v and v is reachable from w, then u is reachable from w). This implies that
the reachability relation partitions the vertices of the graph in equivalence classes. These are called
connected components.
A connected graph has a single connected component. An acyclic graph (which is not necessarily
connected) consists of many free trees, and is called (what else?) a forest.
A digraph is strongly connected if for any two vertices u and v, u can reach v and v can reach u. (There
is another type of connectivity in digraphs called weak connectivity, but we will not consider it.) As
with connected components in graphs, strongly connectivity deﬁnes an equivalence partition on the
vertices. These are called the strongly connected components of the digraph.
A directed graph that is acyclic is called a DAG, for directed acyclic graph. Note that it is different
from a directed tree.
Isomorphism: Two graphs G = (V, E) and G

= (V

, E

) are said to be isomorphic if there is a bijection
(that is, a 1–1 and onto) function f : V → V

, such that (u, v) ∈ E if and only if (f(u), f(v)) ∈ E

.
Isomorphic graphs are essentially “equal” except that their vertices have been given different names.
Determining whether graphs are isomorphic is not as easy as it might seem at ﬁrst. For example,
consider the graphs in the ﬁgure. Clearly (a) and (b) seem to appear more similar to each other than to
(c), but in fact looks are deceiving. Observe that in all three cases all the vertices have degree 3, so that
is not much of a help. Observe there are simple cycles of length 4 in (a), but the smallest simple cycles
in (b) and (c) are of length 5. This implies that (a) cannot be isomorphic to either (b) or (c). It turns
62
Lecture Notes CMSC 251
5
4
3
2
8
1
10
9
(a)
6 7
8
9
10
(b) (c)
2
6
1
3 4
5
7
8 9
10
1
2
3 4
5
6
7
Figure 20: Graph isomorphism.
out that (b) and (c) are isomorphic. One possible isomorphism mapping is given below. The notation
(u → v) means that vertex u from graph (b) is mapped to vertex v in graph (c). Check that each edge
from (b) is mapped to an edge of (c).
¦(1 → 1), (2 → 2), (3 → 3), (4 → 7), (5 → 8), (6 → 5), (7 → 10), (8 → 4), (9 → 6), (10 → 9)¦.
Subgraphs and special graphs: A graph G

= (V

, E

) is a subgraph of G = (V, E) if V

⊆ V and
E

⊆ E. Given a subset V

⊆ V , the subgraph induced by V

is the graph G

= (V

, E

) where
E

= ¦(u, v) ∈ E [ u, v ∈ V

¦.
In other words, take all the edges of G that join pairs of vertices in V

.
An undirected graph that has the maximum possible number of edges is called a complete graph.
Complete graphs are often denoted with the letter K. For example, K
5
is the complete graph on 5
vertices. Given a graph G, a subset of vertices V

⊆ V is said to form a clique if the subgraph induced
by V

is complete. In other words, all the vertices of V

are adjacent to one another. A subset of
vertices V

forms an independent set if the subgraph induced by V

has no edges. For example, in the
ﬁgure below (a), the subset ¦1, 2, 4, 6¦ is a clique, and ¦3, 4, 7, 8¦ is an independent set.
(a) (b)
9
8
7
3
6
5
4
2
1
Figure 21: Cliques, Independent set, Bipartite graphs.
A bipartite graph is an undirected graph in which the vertices can be partitioned into two sets V
1
and V
2
such that all the edges go between a vertex in V
1
and V
2
(never within the same group). For example,
the graph shown in the ﬁgure (b), is bipartite.
63
Lecture Notes CMSC 251
The complement of a graph G = (V, E), often denoted
¯
G is a graph on the same vertex set, but in
which the edge set has been complemented. The reversal of a directed graph, often denoted G
R
is a
graph on the same vertex set in which all the edge directions have been reversed. This may also be
called the transpose and denoted G
T
.
A graph is planar if it can be drawn on the plane such that no two edges cross over one another. Planar
graphs are important special cases of graphs, since they arise in applications of geographic information
systems (as subdivisions of region into smaller subregions), circuits (where wires cannot cross), solid
modeling (for modeling complex surfaces as collections of small triangles). In general there may
be many different ways to draw a planar graph in the plane. For example, the ﬁgure below shows
two essentially different drawings of the same graph. Such a drawing is called a planar embedding.
The neighbors of a vertex are the vertices that it is adjacent to. An embedding is determined by the
counterclockwise cyclic ordering of the neighbors about all the vertices. For example, in the embedding
on the left, the neighbors of vertex 1 in counterclockwise order are ¸2, 3, 4, 5), but on the right the order
is ¸2, 5, 4, 3). Thus the two embeddings are different.
5
4 3
2
1 1
5
4 3
2
Figure 22: Planar Embeddings.
An important fact about planar embeddings of graphs is that they naturally subdivide the plane into
regions, called faces. For example, in the ﬁgure on the left, the triangular region bounded by vertices
¸1, 2, 5) is a face. There is always one face, called the unbounded face that surrounds the whole graph.
This embedding has 6 faces (including the unbounded face). Notice that the other embedding also has
6 faces. Is it possible that two different embeddings have different numbers of faces? The answer is
no. The reason stems from an important observation called Euler’s formula, which relates the numbers
of vertices, edges, and faces in a planar graph.
Euler’s Formula: A connected planar embedding of a graph with V vertices, E edges, and F faces,
satisﬁes:
V −E +F = 2.
In the examples above, both graphs have 5 vertices, and 9 edges, and so by Euler’s formula they have
F = 2 −V +E = 2 −5 + 9 = 6 faces.
Size Issues: When referring to graphs and digraphs we will always let n = [V [ and e = [E[. (Our textbook
usually uses the abuse of notation V = [V [ and E = [E[. Beware, the sometimes V is a set, and
sometimes it is a number. Some authors use m = [E[.)
Because the running time of an algorithm will depend on the size of the graph, it is important to know
how n and e relate to one another.
64
Lecture Notes CMSC 251
Observation: For a digraph e ≤ n
2
= O(n
2
). For an undirected graph e ≤
_
n
2
_
= n(n − 1)/2 =
O(n
2
).
A graph or digraph is allowed to have no edges at all. One interesting question is what the minimum
number of edges that a connected graph must have.
We say that a graph is sparse if e is much less than n
2
.
For example, the important class of planar graphs (graphs which can be drawn on the plane so that no
two edges cross over one another) e = O(n). In most application areas, very large graphs tend to be
sparse. This is important to keep in mind when designing graph algorithms, because when n is really
large and O(n
2
) running time is often unacceptably large for real-time response.
Lecture 22: Graphs Representations and BFS
(Thursday, April 16, 1998)
Read: Sections 23.1 through 23.3 in CLR.
Representations of Graphs and Digraphs: We will describe two ways of representing graphs and digraphs.
First we show how to represent digraphs. Let G = (V, E) be a digraph with n = [V [ and let e = [E[.
We will assume that the vertices of G are indexed ¦1, 2, . . . , n¦.
Adjacency Matrix: An n n matrix deﬁned for 1 ≤ v, w ≤ n.
A[v, w] =
_
1 if (v, w) ∈ E
0 otherwise.
If the digraph has weights we can store the weights in the matrix. For example if (v, w) ∈ E then
A[v, w] = W(v, w) (the weight on edge (v, w)). If (v, w) / ∈ E then generally W(v, w) need
not be deﬁned, but often we set it to some “special” value, e.g. A(v, w) = −1, or ∞. (By ∞
we mean (in practice) some number which is larger than any allowable weight. In practice, this
might be some machine dependent constant like MAXINT.)
Adjacency List: An array Adj[1 . . . n] of pointers where for 1 ≤ v ≤ n, Adj[v] points to a linked list
containing the vertices which are adjacent to v (i.e. the vertices that can be reached from v by a
single edge). If the edges have weights then these weights may also be stored in the linked list
elements.
3
1
1
0 1
0
1 1
0
0
2 3
1
2
3
2
1
3 2
2
3
1
3
2
1
1
We can represent undirected graphs using exactly the same representation, but we will store each edge
twice. In particular, we representing the undirected edge ¦v, w¦ by the two oppositely directed edges
(v, w) and (w, v). Notice that even though we represent undirected graphs in the same way that we
65
Lecture Notes CMSC 251
represent digraphs, it is important to remember that these two classes of objects are mathematically
distinct from one another.
This can cause some complications. For example, suppose you write an algorithm that operates by
marking edges of a graph. You need to be careful when you mark edge (v, w) in the representation
that you also mark (w, v), since they are both the same edge in reality. When dealing with adjacency
lists, it may not be convenient to walk down the entire linked list, so it is common to include cross links
between corresponding edges.
2
1
1 2 3
1
1
0 1
0
1
3
4
1
1 2 4
4 2
3 1
3
4
1
1
1
0
0
0
1
1
1 0 4
3
2
1
3
1
3 2
4
2
) storage and an adjacency list requires Θ(n+e) storage (one entry
for each vertex in Adj and each list has outdeg(v) entries, which when summed is Θ(e). For sparse
graphs the adjacency list representation is more cost effective.
Shortest Paths: To motivate our ﬁrst algorithm on graphs, consider the following problem. You are given
an undirected graph G = (V, E) (by the way, everything we will be saying can be extended to directed
graphs, with only a few small changes) and a source vertex s ∈ V . The length of a path in a graph
(without edge weights) is the number of edges on the path. We would like to ﬁnd the shortest path from
s to each other vertex in G. If there are ties (two shortest paths of the same length) then either path
may be chosen arbitrarily.
The ﬁnal result will be represented in the following way. For each vertex v ∈ V , we will store d[v]
which is the distance (length of the shortest path) from s to v. Note that d[s] = 0. We will also store a
predecessor (or parent) pointer π[v], which indicates the ﬁrst vertex along the shortest path if we walk
from v backwards to s. We will let π[s] = NIL.
It may not be obvious at ﬁrst, but these single predecessor pointers are sufﬁcient to reconstruct the
shortest path to any vertex. Why? We make use of a simple fact which is an example of a more general
principal of many optimization problems, called the principal of optimality. For a path to be a shortest
path, every subpath of the path must be a shortest path. (If not, then the subpath could be replaced with
a shorter subpath, implying that the original path was not shortest after all.)
Using this observation, we know that the last edge on the shortest path from s to v is the edge (u, v),
then the ﬁrst part of the path must consist of a shortest path from s to u. Thus by following the
predecessor pointers we will construct the reverse of the shortest path from s to v.
Obviously, there is simple brute-force strategy for computing shortest paths. We could simply start
enumerating all simple paths starting at s, and keep track of the shortest path arriving at each vertex.
However, since there can be as many as n! simple paths in a graph (consider a complete graph), then
this strategy is clearly impractical.
Here is a simple strategy that is more efﬁcient. Start with the source vertex s. Clearly, the distance to
each of s’s neighbors is exactly 1. Label all of them with this distance. Now consider the unvisited
66
Lecture Notes CMSC 251
2
2
2
2
2
2
3
3
3
3
3
s
1
1
1
s s
: Finished : Discovered : Undiscovered
Figure 25: Breadth-ﬁrst search for shortest paths.
neighbors of these neighbors. They will be at distance 2 from s. Next consider the unvisited neighbors
of the neighbors of the neighbors, and so on. Repeat this until there are no more unvisited neighbors left
to visit. This algorithm can be visualized as simulating a wave propagating outwards from s, visiting
the vertices in bands at ever increasing distances from s.
Breadth-ﬁrst search: Given an graph G = (V, E), breadth-ﬁrst search starts at some source vertex s and
“discovers” which vertices are reachable froms. Deﬁne the distance between a vertex v and s to be the
minimum number of edges on a path from s to v. Breadth-ﬁrst search discovers vertices in increasing
order of distance, and hence can be used as an algorithm for computing shortest paths. At any given
time there is a “frontier” of vertices that have been discovered, but not yet processed. Breadth-ﬁrst
search is named because it visits vertices across the entire “breadth” of this frontier.
Initially all vertices (except the source) are colored white, meaning that they have not been seen. When
a vertex has ﬁrst been discovered, it is colored gray (and is part of the frontier). When a gray vertex is
processed, then it becomes black.
BFS(graph G=(V,E), vertex s) {
int d[1..size(V)] // vertex distances
int color[1..size(V)] // vertex colors
vertex pred[1..size(V)] // predecessor pointers
queue Q = empty // FIFO queue
for each u in V { // initialization
color[u] = white
d[u] = INFINITY
pred[u] = NULL
}
color[s] = gray // initialize source s
d[s] = 0
enqueue(Q, s) // put source in the queue
while (Q is nonempty) {
u = dequeue(Q) // u is the next vertex to visit
for each v in Adj[u] {
if (color[v] == white) { // if neighbor v undiscovered
color[v] = gray // ...mark it discovered
d[v] = d[u]+1 // ...set its distance
pred[v] = u // ...and its predecessor
67
Lecture Notes CMSC 251
enqueue(Q, v) // ...put it in the queue
}
}
color[u] = black // we are done with u
}
}
The search makes use of a queue, a ﬁrst-in ﬁrst-out list, where elements are removed in the same order
they are inserted. The ﬁrst item in the queue (the next to be removed) is called the head of the queue.
We will also maintain arrays color[u] which holds the color of vertex u (either white, gray or black),
pred[u] which points to the predecessor of u (i.e. the vertex who ﬁrst discovered u, and d[u], the
distance from s to u. Only the color is really needed for the search, but the others are useful depending
on the application.
s
t
u v
Q: x, u, w
s
t
u v w
x
Q: w, t
s
t
u v w
x
Q: u, w
Q: v, x
w
x s
t
u v w
x s
t
u v w
x
Q: s
t
u v w
x s
Q: t
? ?
?
1
0 1
?
2 2
1
1 0
2 2 1
0
1
1 3
2
?
? ? ?
? 0
0
1
1 3
2 2
2
? 0 1
Observe that the predecessor pointers of the BFS search deﬁne an inverted tree. If we reverse these
edges we get a rooted unordered tree called a BFS tree for G. (Note that there are many potential BFS
trees for a given graph, depending on where the search starts, and in what order vertices are placed on
the queue.) These edges of Gare called tree edges and the remaining edges of Gare called cross edges.
It is not hard to prove that if G is an undirected graph, then cross edges always go between two nodes
that are at most one level apart in the BFS tree. The reason is that if any cross edge spanned two or
more levels, then when the vertex at the higher level (closer to the root) was being processed, it would
have discovered the other vertex, implying that the other vertex would appear on the very next level of
the tree, a contradiction. (In a directed graph cross edges will generally go down at most 1 level, but
they may come up an arbitrary number of levels.)
Analysis: The running time analysis of BFS is similar to the running time analysis of many graph traversal
algorithms. Let n = [V [ and e = [E[. Observe that the initialization portion requires Θ(n) time.
The real meat is in the traversal loop. Since we never visit a vertex twice, the number of times we go
through the while loop is at most n (exactly n assuming each vertex is reachable from the source). The
number of iterations through the inner for loop is proportional to deg(u) + 1. (The +1 is because even
if deg(u) = 0, we need to spend a constant amount of time to set up the loop.) Summing up over all
vertices we have the running time
T(n) = n +

u∈V
(deg(u) + 1) = n +

u∈V
deg(u) +n = 2n + 2e ∈ Θ(n +e).
68
Lecture Notes CMSC 251
0
1
2 2
3
1
t
u w
x v
s
Figure 27: BFS tree.
For an directed graph the analysis is essentially the same.
Lecture 23: All-Pairs Shortest Paths
(Tuesday, April 21, 1998)
Read: Chapt 26 (up to Section 26.2) in CLR.
All-Pairs Shortest Paths: Last time we showed how to compute shortest paths starting at a designated
source vertex, and assuming that there are no weights on the edges. Today we talk about a consid-
erable generalization of this problem. First, we compute shortest paths not from a single vertex, but
from every vertex in the graph. Second, we allow edges in the graph to have numeric weights.
Let G = (V, E) be a directed graph with edge weights. If (u, v) E, is an edge of G, then the weight of
this edge is denoted W(u, v). Intuitively, this weight denotes the distance of the road from u to v, or
more generally the cost of traveling from u to v. For now, let us think of the weights as being positive
values, but we will see that the algorithm that we are about to present can handle negative weights as
well, in special cases. Intuitively a negative weight means that you get paid for traveling from u to v.
Given a path π = ¸u
0
, u
1
, . . . , u
k
), the cost of this path is the sum of the edge weights:
cost(π) = W(u
0
, u
1
) +W(u
1
, u
2
) + W(u
k−1
, u
k
) =
k

i=1
W(u
i−1
, u
i
).
(We will avoid using the term length, since it can be confused with the number of edges on the path.)
The distance between two vertices is the cost of the minimum cost path between them.
We consider the problem of determining the cost of the shortest path between all pairs of vertices
in a weighted directed graph. We will present two algorithms for this problem. The ﬁrst is a rather
naive Θ(n
4
) algorithm, and the second is a Θ(n
3
) algorithm. The latter is called the Floyd-Warshall
algorithm. Both algorithms is based on a completely different algorithm design technique, called
dynamic programming.
For these algorithms, we will assume that the digraph is represented as an adjacency matrix, rather than
the more common adjacency list. Recall that adjacency lists are generally more efﬁcient for sparse
graphs (and large graphs tend to be sparse). However, storing all the distance information between
each pair of vertices, will quickly yield a dense digraph (since typically almost every vertex can reach
almost every other vertex). Therefore, since the output will be dense, there is no real harm in using the
Because both algorithms are matrix-based, we will employ common matrix notation, using i, j and k
to denote vertices rather than u, v, and w as we usually do. Let G = (V, E, w) denote the input digraph
and its edge weight function. The edge weights may be positive, zero, or negative, but we assume that
69
Lecture Notes CMSC 251
there are no cycles whose total weight is negative. It is easy to see why this causes problems. If the
shortest path ever enters such a cycle, it would never exit. Why? Because by going round the cycle
over and over, the cost will become smaller and smaller. Thus, the shortest path would have a weight
of −∞, and would consist of an inﬁnite number of edges. Disallowing negative weight cycles will rule
out the possibility this absurd situation.
Input Format: The input is an nn matrix W of edge weights, which are based on the edge weights in the
digraph. We let w
ij
denote the entry in row i and column j of W.
w
ij
=
_
_
_
0 if i = j,
W(i, j) if i ,= j and (i, j) ∈ E,
+∞ if i ,= j and (i, j) / ∈ E.
Setting w
ij
= ∞ if there is no edge, intuitively means that there is no direct link between these two
nodes, and hence the direct cost is inﬁnite. The reason for setting w
ii
= 0 is that intuitively the cost
of getting from any vertex to should be 0, since we have no distance to travel. Note that in digraphs
it is possible to have self-loop edges, and so W(i, j) may generally be nonzero. Notice that it cannot
be negative (otherwise we would have a negative cost cycle consisting of this single edge). If it is
positive, then it never does us any good to follow this edge (since it increases our cost and doesn’t take
us anywhere new).
The output will be an n n distance matrix D = d
ij
where d
ij
= δ(i, j), the shortest path cost from
vertex i to j. Recovering the shortest paths will also be an issue. To help us do this, we will also
compute an auxiliary matrix pred[i, j]. The value of pred[i, j] will be a vertex that is somewhere along
the shortest path from i to j. If the shortest path travels directly from i to j without passing through
any other vertices, then pred[i, j] will be set to null. We will see later than using these values it will be
possible to reconstruct any shortest path in Θ(n) time.
Dynamic Programming for Shortest Paths: The algorithm is based on a technique called dynamic pro-
gramming. Dynamic programming problems are typically optimization problems (ﬁnd the smallest or
largest solution, subject to various constraints). The technique is related to divide-and-conquer, in the
sense that it breaks problems down into smaller problems that it solves recursively. However, because
of the somewhat different nature of dynamic programming problems, standard divide-and-conquer
solutions are not usually efﬁcient. The basic elements that characterize a dynamic programming algo-
rithm are:
Substructure: Decompose the problem into smaller subproblems.
Optimal substructure: Each of the subproblems should be solved optimally.
Bottom-up computation: Combine solutions on small subproblems to solve larger subproblems, and
eventually to arrive at a solution to the complete problem.
The question is how to decompose the shortest path problem into subproblems in a meaningful way.
There is one very natural way to do this. What is remarkable, is that this does not lead to the best solu-
tion. First we will introduce the natural decomposition, and later present the Floyd-Warshall algorithm
makes use of a different, but more efﬁcient dynamic programming formulation.
Path Length Formulation: We will concentrate just on computing the cost of the shortest path, not the path
itself. Let us ﬁrst sketch the natural way to break the problem into subproblems. We want to ﬁnd some
parameter, which constrains the estimates to the shortest path costs. At ﬁrst the estimates will be crude.
As this parameter grows, the shortest paths cost estimates should converge to their correct values. A
natural way to do this is to restrict the number of edges that are allowed to be in the shortest path.
For 0 ≤ m ≤ n − 1, deﬁne d
(m)
ij
to be the cost of the shortest path from vertex i to vertex j that
contains at most m edges. Let D
(m)
denote the matrix whose entries are these values. The idea is to
70
Lecture Notes CMSC 251
compute D
(0)
then D
(1)
, and so on, up to D
(n−1)
. Since we know that no shortest path can use more
than n − 1 edges (for otherwise it would have to repeat a vertex), we know that D
(n−1)
is the ﬁnal
distance matrix. This is illustrated in the ﬁgure (a) below.
d = 9
(2)
1,3
d = 4
(3)
4
9
2
1
1
4
d = INF
1,3
(1)
1,3
(no path)
(using: 1,2,3)
(unsing: 1,4,2,3)
(a) (b)
2
8
i
j
k
(m−1)
(m−1)
kj
w
d
d
ik
ij
1
3
Figure 28: Dynamic Programming Formulation.
The question is, how do we compute these distance matrices? As a basis, we could start with paths of
containing 0 edges, D
(0)
(as our text does). However, observe that D
(1)
= W, since the edges of the
digraph are just paths of length 1. It is just as easy to start with D
(1)
, since we are given W as input.
So as our basis case we have
d
(1)
ij
= w
ij
.
Now, to make the induction go, we claim that it is possible to compute D
(m)
fromD
(m−1)
, for m ≥ 2.
Consider how to compute the quantity d
(m)
ij
. This is the length of the shortest path from i to j using at
most m edges. There are two cases:
Case 1: If the shortest path uses strictly fewer than m edges, then its cost is just d
(m−1)
ij
.
Case 2: If the shortest path uses exactly m edges, then the path uses m − 1 edges to go from i to
some vertex k, and then follows a single edge (k, j) of weight w
kj
to get to j. The path from
i to k should be shortest (by the principle of optimality) so the length of the resulting path is
d
(m−1)
ik
+w
ij
. But we do not know what k is. So we minimize over all possible choices.
This is illustrated in the ﬁgure (b) above.
This suggests the following rule:
d
(m)
ij
= min
_
d
(m−1)
ij
min
1≤k≤n
_
d
(m−1)
ik
+w
kj
_
_
.
Notice that the two terms of the main min correspond to the two cases. In the second case, we consider
all vertices k, and consider the length of the shortest path from i to k, using m−1 edges, and then the
single edge length cost from k to j.
We can simplify this formula a bit by observing that since w
jj
= 0, we have d
(m−1)
ij
= d
(m−1)
ij
+w
jj
.
This term occurs in the second case (when k = j). Thus, the ﬁrst term is redundant. This gives
d
(m)
ij
= min
1≤k≤n
_
d
(m−1)
ik
+w
kj
_
,
The next question is howshall we implement this rule. One way would be to write a recursive procedure
to do it. Here is a possible implementation. To compute the shortest path from i to j, the initial call
would be Dist(n −1, i, j). The array of weights
71
Lecture Notes CMSC 251
Recursive Shortest Paths
Dist(int m, int i, int j) {
if (m == 1) return W[i,j] // single edge case
best = INF
for k = 1 to n do
best = min(best, Dist(m-1, i, k) + w[k,j]) // apply the update rule
return best
}
Unfortunately this will be very slow. Let T(m, n) be the running time of this algorithm on a graph with
n vertices, where the ﬁrst argument is m. The algorithm makes n calls to itself with the ﬁrst argument
of m− 1. When m = 1, the recursion bottoms out, and we have T(1, n) = 1. Otherwise, we make n
recursive calls to T(m−1, n). This gives the recurrence:
T(m, n) =
_
1 if m = 1,
nT(m−1, n) + 1 otherwise.
The total running time is T(n−1, n). It is a straightforward to solve this by expansion. The result will
be O(n
n
), a huge value. It is not hard to see why. If you unravel the recursion, you will see that this
algorithm is just blindly trying all possible paths from i to j. There are exponentially many such paths.
So how do we make this faster? The answer is to use table-lookup. This is the key to dynamic
programming. Observe that there are only O(n
3
) different possible numbers d
(m)
ij
that we have to
compute. Once we compute one of these values, we will store it in a table. Then if we want this value
again, rather than recompute it, we will simply look its value up in the table.
The ﬁgure below gives an implementation of this idea. The main procedure ShortestPath(n, w) is
given the number of vertices n and the matrix of edge weights W. The matrix D
(m)
is stored as D[m],
for 1 ≤ m ≤ n −1. For each m, D[m] is a 2-dimensional matrix, implying that D is a 3-dimensional
matrix. We initialize D
(1)
by copying W. Then each call to ExtendPaths() computes D
(m)
from
D
(m−1)
, from the above formula.
Dynamic Program Shortest Paths
ShortestPath(int n, int W[1..n, 1..n]) {
array D[1..n-1][1..n, 1..n]
copy W to D[1] // initialize D[1]
for m = 2 to n-1 do
D[m] = ExtendPaths(n, D[m-1], W) // comput D[m] from D[m-1]
return D[n-1]
}
ExtendShortestPath(int n, int d[1..n, 1..n], int W[1..n, 1..n]) {
matrix dd[1..n, 1..n] = d[1..n, 1..n] // copy d to temp matrix
for i = 1 to n do // start from i
for j = 1 to n do // ...to j
for k = 1 to n do // ...passing through k
dd[i,j] = min(dd[i,j], d[i,k] + W[k,j])
return dd // return matrix of distances
}
The procedure ExtendShortestPath() consists of 3 nested loops, and so its running time is Θ(n
3
). It
is called n −2 times by the main procedure, and so the total running time is Θ(n
4
). Next time we will
see that we can improve on this. This is illustrated in the ﬁgure below.
72
Lecture Notes CMSC 251
1
5
4
1
4
3
2
1
4 12 0 5
5 0 1 ?
0 3 9 1
(2)
D =
3
4
7
5 7
2
3
1
5
4
1
4
3
3
(1)
13
7 2 3 0
4 7 0 5
6
? 2 9 0
4 ? 0 ?
? 0 1 ?
0 8 ? 1
? = infinity
W =
9
1
2
4
8 1
4
3
2
1
13 2 3 0
3
9
5 12
2
2
1
5 0 1 6
0 3 4 1
(3)
D =
= D
Figure 29: Shortest Path Example.
Lecture 24: Floyd-Warshall Algorithm
(Thursday, April 23, 1998)
Read: Chapt 26 (up to Section 26.2) in CLR.
Floyd-Warshall Algorithm: We continue discussion of computing shortest paths between all pairs of ver-
tices in a directed graph. The Floyd-Warshall algorithm dates back to the early 60’s. Warshall was
interested in the weaker question of reachability: determine for each pair of vertices u and v, whether
u can reach v. Floyd realized that the same technique could be used to compute shortest paths with
only minor variations.
The Floyd-Warshall algorithm improves upon this algorithm, running in Θ(n
3
) time. The genius of the
Floyd-Warshall algorithm is in ﬁnding a different formulation for the shortest path subproblem than
the path length formulation introduced earlier. At ﬁrst the formulation may seem most unnatural, but
it leads to a faster algorithm. As before, we will compute a set of matrices whose entries are d
(k)
ij
. We
will change the meaning of each of these entries.
For a path p = ¸v
1
, v
2
, . . . , v

) we say that the vertices v
2
, v
3
, . . . , v
−1
are the intermediate vertices
of this path. Note that a path consisting of a single edge has no intermediate vertices. We deﬁne d
(k)
ij
to be the shortest path from i to j such that any intermediate vertices on the path are chosen from the
set ¦1, 2, . . . , k¦. In other words, we consider a path from i to j which either consists of the single
edge (i, j), or it visits some intermediate vertices along the way, but these intermediate can only be
chosen from ¦1, 2, . . . , k¦. The path is free to visit any subset of these vertices, and to do so in any
order. Thus, the difference between Floyd’s formulation and the previous formulation is that here
the superscript (k) restricts the set of vertices that the path is allowed to pass through, and there the
superscript (m) restricts the number of edges the path is allowed to use. For example, in the digraph
shown in the following ﬁgure, notice how the value of d
(k)
32
changes as k varies.
Floyd-Warshall Update Rule: How do we compute d
(k)
ij
assuming that we have already computed the pre-
vious matrix d
(k−1)
? As before, there are two basic cases, depending on the ways that we might get
from vertex i to vertex j, assuming that the intermediate vertices are chosen from ¦1, 2, . . . , k¦:
73
Lecture Notes CMSC 251
3,2
d = INF
(0)
3,2
d = 12
(1)
3,2
d = 12
(2)
3,2
d = 12
(3)
3,2
(4)
8
d = 7
Vertices 1 through k−1
kj
d
(k−1)
ik
d
(k−1)
k
(a)
(3,1,4,2)
(3,1,2)
(3,1,2)
(3,1,2)
(no path)
9
ij
d
(k−1)
j i
2
2
1
4
1
4
3
1
Figure 30: Floyd-Warshall Formulation.
We don’t go through k at all: Then the shortest path from i to j uses only intermediate vertices
¦1, . . . , k −1¦ and hence the length of the shortest path is d
(k−1)
ij
.
We do go through k: First observe that a shortest path does not pass through the same vertex twice, so
we can assume that we pass through k exactly once. (The assumption that there are no negative
cost cycles is being used here.) That is, we go from i to k, and then from k to j. In order
for the overall path to be as short as possible we should take the shortest path from i to k, and
the shortest path from k to j. (This is the principle of optimality.) Each of these paths uses
intermediate vertices only in ¦1, 2, . . . , k −1¦. The length of the path is d
(k−1)
ik
+d
(k−1)
kj
.
This suggests the following recursive rule for computing d
(k)
:
d
(0)
ij
= w
ij
,
d
(k)
ij
= min
_
d
(k−1)
ij
, d
(k−1)
ik
+d
(k−1)
kj
_
for k ≥ 1.
(n)
ij
because this allows all possible vertices as intermediate vertices. Again, we
could write a recursive program to compute d
(k)
ij
, but this will be prohibitively slow. Instead, we
compute it by storing the values in a table, and looking the values up as we need them. Here is the
complete algorithm. We have also included predecessor pointers, pred[i, j] for extracting the ﬁnal
shortest paths. We will discuss them later.
Floyd-Warshall Algorithm
Floyd_Warshall(int n, int W[1..n, 1..n]) {
array d[1..n, 1..n]
for i = 1 to n do { // initialize
for j = 1 to n do {
d[i,j] = W[i,j]
pred[i,j] = null
}
}
for k = 1 to n do // use intermediates {1..k}
for i = 1 to n do // ...from i
for j = 1 to n do // ...to j
if (d[i,k] + d[k,j]) < d[i,j]) {
d[i,j] = d[i,k] + d[k,j] // new shorter path length
pred[i,j] = k // new path is through k
74
Lecture Notes CMSC 251
}
return d // matrix of final distances
}
Clearly the algorithm’s running time is Θ(n
3
). The space used by the algorithm is Θ(n
2
). Observe
that we deleted all references to the superscript (k) in the code. It is left as an exercise that this does
not affect the correctness of the algorithm. An example is shown in the following ﬁgure.
2
1
3
4
4
3
D =
D =
(3)
(1)
1
2
3
4
4
5
D =
(4)
1 8
1
2
3
4
1
2
3
4
D =
(2)
(0)
1
2
1
5
3
7
3
2
7
3
6
5
8
2
12
9
5
6
7
12
9
5
1
D =
9
4 4
2
8
1
2
4
8
1
4
1
1
3
4
5
1
2
1
9
12
1
5 0 1 6
4 7 0 5
7 2 3 0
0 8 9 1
? 0 1 ?
4 12 0 5
? 2 3 0
0 3 4 1
? = infinity
0 8 ? 1
? 0 1 ?
4 ? 0 ?
? 2 9 0
0 8 ? 1
? 0 1 ?
4 12 0 5
? 2 9 0
0 8 9 1
5 0 1 6
4 12 0 5
7 2 3 0
Figure 31: Floyd-Warshall Example.
Extracting Shortest Paths: The predecessor pointers pred[i, j] can be used to extract the ﬁnal path. Here
is the idea, whenever we discover that the shortest path from i to j passes through an intermediate
vertex k, we set pred[i, j] = k. If the shortest path does not pass through any intermediate vertex,
then pred[i, j] = null . To ﬁnd the shortest path from i to j, we consult pred[i, j]. If it is null, then
the shortest path is just the edge (i, j). Otherwise, we recursively compute the shortest path from i to
pred[i, j] and the shortest path from pred[i, j] to j.
Printing the Shortest Path
Path(i,j) {
if pred[i,j] = null // path is a single edge
output(i,j)
else { // path goes through pred
Path(i, pred[i,j]); // print path from i to pred
Path(pred[i,j], j); // print path from pred to j
}
}
75
Lecture Notes CMSC 251
Lecture 25: Longest Common Subsequence
(April 28, 1998)
Strings: One important area of algorithm design is the study of algorithms for character strings. There are
a number of important problems here. Among the most important has to do with efﬁciently searching
for a substring or generally a pattern in large piece of text. (This is what text editors and functions
like ”grep” do when you perform a search.) In many instances you do not want to ﬁnd a piece of text
exactly, but rather something that is ”similar”. This arises for example in genetics research. Genetic
codes are stored as long DNA molecules. The DNA strands can be broken down into a long sequences
each of which is one of four basic types: C, G, T, A.
But exact matches rarely occur in biology because of small changes in DNA replication. Exact sub-
string search will only ﬁnd exact matches. For this reason, it is of interest to compute similarities
between strings that do not match exactly. The method of string similarities should be insensitive to
random insertions and deletions of characters from some originating string. There are a number of
measures of similarity in strings. The ﬁrst is the edit distance, that is, the minimum number of single
character insertions, deletions, or transpositions necessary to convert one string into another. The other,
which we will study today, is that of determining the length of the longest common subsequence.
Longest Common Subsequence: Let us think of character strings as sequences of characters. Given two
sequences X = ¸x
1
, x
2
, . . . , x
m
) and Z = ¸z
1
, z
2
, . . . , z
k
), we say that Z is a subsequence of X if
there is a strictly increasing sequence of k indices ¸i
1
, i
2
, . . . , i
k
) (1 ≤ i
1
< i
2
< . . . < i
k
≤ n) such
that Z = ¸X
i1
, X
i2
, . . . , X
i
k
). For example, let X = ¸ABRACADABRA) and let Z = ¸AADAA),
then Z is a subsequence of X.
Given two strings X and Y , the longest common subsequence of X and Y is a longest sequence Z
which is both a subsequence of X and Y .
For example, let X be as before and let Y = ¸YABBADABBADOO). Then the longest common
The Longest Common Subsequence Problem (LCS) is the following. Given two sequences X =
¸x
1
, . . . , x
m
) and Y = ¸y
1
, . . . , y
n
) determine a longest common subsequence. Note that it is not
always unique. For example the LCS of ¸ABC) and ¸BAC) is either ¸AC) or ¸BC).
Dynamic Programming Solution: The simple brute-force solution to the problem would be to try all pos-
sible subsequences from one string, and search for matches in the other string, but this is hopelessly
inefﬁcient, since there are an exponential number of possible subsequences.
Instead, we will derive a dynamic programming solution. In typical DP fashion, we need to break the
problem into smaller pieces. There are many ways to do this for strings, but it turns out for this problem
that considering all pairs of preﬁxes will sufﬁce for us. A preﬁx of a sequence is just an initial string of
values, X
i
= ¸x
1
, x
2
, . . . , x
i
). X
0
is the empty sequence.
The idea will be to compute the longest common subsequence for every possible pair of preﬁxes. Let
c[i, j] denote the length of the longest common subsequence of X
i
and Y
j
. Eventually we are interested
in c[m, n] since this will be the LCS of the two entire strings. The idea is to compute c[i, j] assuming
that we already know the values of c[i

, j

] for i

≤ i and j

≤ j (but not both equal). We begin with
some observations.
Basis: c[i, 0] = c[j, 0] = 0. If either sequence is empty, then the longest common subsequence is
empty.
76
Lecture Notes CMSC 251
Last characters match: Suppose x
i
= y
j
. Example: Let X
i
= ¸ABCA) and let Y
j
= ¸DACA).
Since both end in A, we claim that the LCS must also end in A. (We will explain why later.)
Since the A is part of the LCS we may ﬁnd the overall LCS by removing A from both sequences
and taking the LCS of X
i−1
= ¸ABC) and Y
j−1
= ¸DAC) which is ¸AC) and then adding A
to the end, giving ¸ACA) as the answer. (At ﬁrst you might object: But how did you know that
these two A’s matched with each other. The answer is that we don’t, but it will not make the LCS
any smaller if we do.)
Thus, if x
i
= y
j
then c[i, j] = c[i −1, j −1] + 1.
Last characters do not match: Suppose that x
i
,= y
j
. In this case x
i
and y
j
cannot both be in the
LCS (since they would have to be the last character of the LCS). Thus either x
i
is not part of the
LCS, or y
j
is not part of the LCS (and possibly both are not part of the LCS).
In the ﬁrst case the LCS of X
i
and Y
j
is the LCS of X
i−1
and Y
j
, which is c[i − 1, j]. In the
second case the LCS is the LCS of X
i
and Y
j−1
which is c[i, j − 1]. We do not know which is
the case, so we try both and take the one that gives us the longer LCS.
Thus, if x
i
,= y
j
then c[i, j] = max(c[i −1, j], c[i, j −1]).
We left undone the business of showing that if both strings end in the same character, then the LCS
must also end in this same character. To see this, suppose by contradiction that both characters end in
A, and further suppose that the LCS ended in a different character B. Because A is the last character
of both strings, it follows that this particular instance of the character A cannot be used anywhere else
in the LCS. Thus, we can add it to the end of the LCS, creating a longer common subsequence. But
this would contradict the deﬁnition of the LCS as being longest.
Combining these observations we have the following rule:
c[i, j] =
_
_
_
0 if i = 0 or j = 0,
c[i −1, j −1] + 1 if i, j > 0 and x
i
= y
j
,
max(c[i, j −1], c[i −1, j]) if i, j > 0 and x
i
,= y
j
.
Implementing the Rule: The task now is to simply implement this rule. As with other DP solutions, we
concentrate on computing the maximum length. We will store some helpful pointers in a parallel array,
b[0..m, 0..n].
Longest Common Subsequence
LCS(char x[1..m], char y[1..n]) {
int c[0..m, 0..n]
for i = 0 to m do {
c[i,0] = 0 b[i,0] = SKIPX // initialize column 0
}
for j = 0 to n do {
c[0,j] = 0 b[0,j] = SKIPY // initialize row 0
}
for i = 1 to m do {
for j = 1 to n do {
if (x[i] == y[j]) {
c[i,j] = c[i-1,j-1]+1 // take X[i] and Y[j] for LCS
}
else if (c[i-1,j] >= c[i,j-1]) { // X[i] not in LCS
c[i,j] = c[i-1,j]
b[i,j] = SKIPX
}
else { // Y[j] not in LCS
77
Lecture Notes CMSC 251
c[i,j] = c[i,j-1]
b[i,j] = SKIPY
}
}
}
return c[m,n];
}
LCS Length Table with back pointers included
m=
=n
3 2 2 1
2 2 1 2
2
1 1
1 2 1
0
0 0 0
B
D
C
B
A
B C D B
4
3
2
1
0
4 3 2 1
5 start here
A
B C D B
4
3
2
1
0
4 3 2 1 0
5 m=
=n
B
1 1
1 1 1 1
0
0
0
0
0
0 0 0 0 0
B
D
C
0
1
2 2 1 1
1 1 1 1
1 1 1 1
0
0
0
0
0 0
2
X = BACDB
Y = BDCB
LCS = BCB
3 2 2 1
2 2
Figure 32: Longest common subsequence example.
The running time of the algorithm is clearly O(mn) since there are two nested loops with m and n
iterations, respectively. The algorithm also uses O(mn) space.
Extracting the Actual Sequence: Extracting the ﬁnal LCS is done by using the back pointers stored in
b[0..m, 0..n]. Intuitively b[i, j] = ADDXY means that X[i] and Y [j] together form the last character
of the LCS. So we take this common character, and continue with entry b[i −1, j −1] to the northwest
(¸). If b[i, j] = SKIPX, then we know that X[i] is not in the LCS, and so we skip it and go to
b[i − 1, j] above us (↑). Similarly, if b[i, j] = SKIPY, then we know that Y [j] is not in the LCS,
and so we skip it and go to b[i, j − 1] to the left (←). Following these back pointers, and outputting a
character with each diagonal move gives the ﬁnal subsequence.
Print Subsequence
getLCS(char x[1..m], char y[1..n], int b[0..m,0..n]) {
LCS = empty string
i = m
j = n
while(i != 0 && j != 0) {
switch b[i,j] {
add x[i] (or equivalently y[j]) to front of LCS
i--; j--; break
case SKIPX:
i--; break
case SKIPY:
j--; break
}
}
return LCS
}
78
Lecture Notes CMSC 251
Lecture 26: Chain Matrix Multiplication
(Thursday, April 30, 1998)
Chain Matrix Multiplication: This problem involves the question of determining the optimal sequence for
performing a series of operations. This general class of problem is important in compiler design for
code optimization and in databases for query optimization. We will study the problem in a very re-
stricted instance, where the dynamic programming issues are easiest to see.
Suppose that we wish to multiply a series of matrices
A
1
A
2
. . . A
n
Matrix multiplication is an associative but not a commutative operation. This means that we are free to
parenthesize the above multiplication however we like, but we are not free to rearrange the order of the
matrices. Also recall that when two (nonsquare) matrices are being multiplied, there are restrictions on
the dimensions. A p q matrix has p rows and q columns. You can multiply a p q matrix A times a
q r matrix B, and the result will be a p r matrix C. (The number of columns of A must equal the
number of rows of B.) In particular for 1 ≤ i ≤ p and 1 ≤ j ≤ r,
C[i, j] =
q

k=1
A[i, k]B[k, j].
Observe that there are pr total entries in C and each takes O(q) time to compute, thus the total time
(e.g. number of multiplications) to multiply these two matrices is p q r.
B C
=
A
p
q
q
r
p
r
Multiplication
pqr
time =
=
*
Figure 33: Matrix Multiplication.
Note that although any legal parenthesization will lead to a valid result, not all involve the same number
of operations. Consider the case of 3 matrices: A
1
be 5 4, A
2
be 4 6 and A
3
be 6 2.
mult[((A
1
A
2
)A
3
)] = (5 4 6) + (5 6 2) = 180,
mult[(A
1
(A
2
A
3
))] = (4 6 2) + (5 4 2) = 88.
Even for this small example, considerable savings can be achieved by reordering the evaluation se-
quence. The Chain Matrix Multiplication problem is: Given a sequence of matrices A
1
, A
2
, . . . , A
n
and dimensions p
0
, p
1
, . . . , p
n
where A
i
is of dimension p
i−1
p
i
, determine the multiplication se-
quence that minimizes the number of operations.
Important Note: This algorithm does not perform the multiplications, it just ﬁgures out the best order
in which to perform the multiplications.
79
Lecture Notes CMSC 251
Naive Algorithm: We could write a procedure which tries all possible parenthesizations. Unfortunately, the
number of ways of parenthesizing an expression is very large. If you have just one item, then there is
only one way to parenthesize. If you have n items, then there are n −1 places where you could break
the list with the outermost pair of parentheses, namely just after the 1st item, just after the 2nd item,
etc., and just after the (n − 1)st item. When we split just after the kth item, we create two sublists to
be parenthesized, one with k items, and the other with n − k items. Then we could consider all the
ways of parenthesizing these. Since these are independent choices, if there are L ways to parenthesize
the left sublist and R ways to parenthesize the right sublist, then the total is L R. This suggests the
following recurrence for P(n), the number of different ways of parenthesizing n items:
P(n) =
_
1 if n = 1,

n−1
k=1
P(k)P(n −k) if n ≥ 2.
This is related to a famous function in combinatorics called the Catalan numbers (which in turn is
related to the number of different binary trees on n nodes). In particular P(n) = C(n −1) and
C(n) =
1
n + 1
_
2n
n
_
.
Applying Stirling’s formula, we ﬁnd that C(n) ∈ Ω(4
n
/n
3/2
). Since 4
n
is exponential and n
3/2
is just
polynomial, the exponential will dominate, and this grows very fast. Thus, this will not be practical
except for very small n.
Dynamic Programming Solution: This problem, like other dynamic programming problems involves de-
termining a structure (in this case, a parenthesization). We want to break the problem into subproblems,
whose solutions can be combined to solve the global problem.
For convenience we can write A
i..j
to be the product of matrices i through j. It is easy to see that
A
i..j
is a p
i−1
p
j
matrix. In parenthesizing the expression, we can consider the highest level of
parenthesization. At this level we are simply multiplying two matrices together. That is, for any k,
1 ≤ k ≤ n −1,
A
1..n
= A
1..k
A
k+1..n
.
Thus the problem of determining the optimal sequence of multiplications is broken up into 2 questions:
how do we decide where to split the chain (what is k?) and how do we parenthesize the subchains
A
1..k
and A
k+1..n
? The subchain problems can be solved by recursively applying the same scheme.
The former problem can be solved by just considering all possible values of k. Notice that this problem
satisﬁes the principle of optimality, because if we want to ﬁnd the optimal sequence for multiplying
A
1..n
we must use the optimal sequences for A
1..k
and A
k+1..n
. In other words, the subproblems must
be solved optimally for the global problem to be solved optimally.
We will store the solutions to the subproblems in a table, and build the table in a bottom-up manner.
For 1 ≤ i ≤ j ≤ n, let m[i, j] denote the minimum number of multiplications needed to compute
A
i..j
. The optimum cost can be described by the following recursive deﬁnition. As a basis observe that
if i = j then the sequence contains only one matrix, and so the cost is 0. (There is nothing to multiply.)
Thus, m[i, i] = 0. If i < j, then we are asking about the product A
i..j
. This can be split by considering
each k, i ≤ k < j, as A
i..k
times A
k+1..j
.
The optimum time to compute A
i..k
is m[i, k], and the optimum time to compute A
k+1..j
is m[k+1, j].
We may assume that these values have been computed previously and stored in our array. Since A
i..k
is a p
i−1
p
k
matrix, and A
k+1..j
is a p
k
p
j
matrix, the time to multiply them is p
i−1
p
k
p
j
. This
suggests the following recursive rule for computing m[i, j].
m[i, i] = 0
m[i, j] = min
i≤k<j
(m[i, k] +m[k + 1, j] +p
i−1
p
k
p
j
) for i < j.
80
Lecture Notes CMSC 251
It is not hard to convert this rule into a procedure, which is given below. The only tricky part is arranging
the order in which to compute the values. In the process of computing m[i, j] we will need to access
values m[i, k] and m[k+1, j] for k lying between i and j. This suggests that we should organize things
our computation according to the number of matrices in the subchain. Let L = j − i + 1 denote the
length of the subchain being multiplied. The subchains of length 1 (m[i, i]) are trivial. Then we build
up by computing the subchains of lengths 2, 3, . . . , n. The ﬁnal answer is m[1, n]. We need to be a
little careful in setting up the loops. If a subchain of length L starts at position i, then j = i + L − 1.
Since we want j ≤ n, this means that i + L − 1 ≤ n, or in other words, i ≤ n − L + 1. So our loop
for i runs from 1 to n −L + 1 (to keep j in bounds).
Chain Matrix Multiplication
Matrix-Chain(array p[1..n], int n) {
array s[1..n-1,2..n]
for i = 1 to n do m[i,i] = 0 // initialize
for L = 2 to n do { // L = length of subchain
for i = 1 to n-L+1 do {
j = i + L - 1
m[i,j] = INFINITY
for k = i to j-1 do {
q = m[i, k] + m[k+1, j] + p[i-1]*p[k]*p[j]
if (q < m[i, j]) { m[i,j] = q; s[i,j] = k }
}
}
}
return m[1,n] and s
}
The array s[i, j] will be explained later. It is used to extract the actual sequence. The running time of
the procedure is Θ(n
3
). We’ll leave this as an exercise in solving sums, but the key is that there are
three nested loops, and each can iterate at most n times.
Extracting the ﬁnal Sequence: To extract the actual sequence is a fairly easy extension. The basic idea is
to leave a split marker indicating what the best split is, that is, what value of k lead to the minimum
value of m[i, j]. We can maintain a parallel array s[i, j] in which we will store the value of k providing
the optimal split. For example, suppose that s[i, j] = k. This tells us that the best way to multiply
the subchain A
i..j
is to ﬁrst multiply the subchain A
i..k
and then multiply the subchain A
k+1..j
, and
ﬁnally multiply these together. Intuitively, s[i, j] tells us what multiplication to perform last. Note that
we only need to store s[i, j] when we have at least two matrices, that is, if j > i.
The actual multiplication algorithm uses the s[i, j] value to determine how to split the current sequence.
Assume that the matrices are stored in an array of matrices A[1..n], and that s[i, j] is global to this
recursive procedure. The procedure returns a matrix.
Extracting Optimum Sequence
Mult(i, j) {
if (i > j) {
k = s[i,j]
X = Mult(i, k) // X = A[i]...A[k]
Y = Mult(k+1, j) // Y = A[k+1]...A[j]
return X*Y; // multiply matrices X and Y
}
else
return A[i];
}
81
Lecture Notes CMSC 251
i
0
j
7 2 6 4
1
2
3
4
1
2
3
4
0
p
4
p
3
p
2
p
1
p
m[i,j]
5
158
88
120
48
104
84
0 0 0
Final order
3
4
A
3
A
2
A
1
A
4
A
3
A
2
A
1
A
2
3
2
1
1
3
3 1
3 2 1
s[i,j]
i j
2
3
4
Figure 34: Chain Matrix Multiplication.
In the ﬁgure below we show an example. This algorithm is tricky, so it would be a good idea to trace
through this example (and the one given in the text). The initial set of dimensions are ¸5, 4, 6, 2, 7)
meaning that we are multiplying A
1
(5 4) times A
2
(4 6) times A
3
(6 2) times A
4
(2 7). The
optimal sequence is ((A
1
(A
2
A
3
))A
4
).
Lecture 27: NP-Completeness: General Introduction
(Tuesday, May 5, 1998)
Read: Chapt 36, up through section 36.4.
Easy and Hard Problems: At this point of the semester hopefully you have learned a few things of what
it means for an algorithm to be efﬁcient, and how to design algorithms and determine their efﬁciency
asymptotically. All of this is ﬁne if it helps you discover an acceptably efﬁcient algorithm to solve
your problem. The question that often arises in practice is that you have tried every trick in the book,
and still your best algorithm is not fast enough. Although your algorithm can solve small problems
reasonably efﬁciently (e.g. n ≤ 20) the really large applications that you want to solve (e.g. n ≥ 100)
your algorithm does not terminate quickly enough. When you analyze its running time, you realize that
it is running in exponential time, perhaps n

n
, or 2
n
, or 2
(2
n
)
, or n!, or worse.
Towards the end of the 60’s and in the eary 70’s there were great strides made in ﬁnding efﬁcient
solutions to many combinatorial problems. But at the same time there was also a growing list of
problems for which there seemed to be no known efﬁcient algorithmic solutions. The best solutions
known for these problems required exponential time. People began to wonder whether there was some
unknown paradigm that would lead to a solution to these problems, or perhaps some proof that these
problems are inherently hard to solve and no algorithmic solutions exist that run under exponential
time.
Around this time a remarkable discovery was made. It turns out that many of these “hard” problems
were interrelated in the sense that if you could solve any one of them in polynomial time, then you
could solve all of them in polynomial time. The next couple of lectures we will discuss some of these
problems and introduce the notion of P, NP, and NP-completeness.
Polynomial Time: We need some way to separate the class of efﬁciently solvable problems from inefﬁ-
ciently solvable problems. We will do this by considering problems that can be solved in polynomial
time.
82
Lecture Notes CMSC 251
We have measured the running time of algorithms using worst-case complexity, as a function of n, the
size of the input. We have deﬁned input size variously for different problems, but the bottom line is the
number of bits (or bytes) that it takes to represent the input using any reasonably efﬁcient encoding. (By
a reasonably efﬁcient encoding, we assume that there is not some signiﬁcantly shorter way of providing
the same information. For example, you could write numbers in unary notation 11111111
1
= 100
2
= 8
rather than binary, but that would be unacceptably inefﬁcient.)
We have also assumed that operations on numbers can be performed in constant time. From now on,
we should be more careful and assume that arithmetic operations require at least as much time as there
are bits of precision in the numbers being stored.
Up until now all the algorithms we have seen have had the property that their worst-case running times
are bounded above by some polynomial in the input size, n. A polynomial time algorithm is any
algorithm that runs in time O(n
k
) where k is some constant that is independent of n. A problem is said
to be solvable in polynomial time if there is a polynomial time algorithm that solves it.
Some functions that do not “look” like polynomials but are. For example, a running time of O(nlog n)
does not look like a polynomial, but it is bounded above by a the polynomial O(n
2
), so it is considered
to be in polynomial time.
On the other hand, some functions that do “look” like polynomials are not. For example, a running time
of O(n
k
) is not considered in polynomial time if k is an input parameter that could vary as a function
of n. The important constraint is that the exponent in a polynomial function must be a constant that is
independent of n.
Decision Problems: Many of the problems that we have discussed involve optimization of one form or
another: ﬁnd the shortest path, ﬁnd the minimum cost spanning tree, ﬁnd the knapsack packing of
greatest value. For rather technical reasons, most NP-complete problems that we will discuss will be
phrased as decision problems. A problem is called a decision problem if its output is a simple “yes” or
“no” (or you may think of this as True/False, 0/1, accept/reject).
We will phrase many optimization problems in terms of decision problems. For example, rather than
asking, what is the minimum number of colors needed to color a graph, instead we would phrase this
as a decision problem: Given a graph G and an integer k, is it possible to color G with k colors. Of
course, if you could answer this decision problem, then you could determine the minimum number of
colors by trying all possible values of k (or if you were more clever, you would do a binary search on
k).
One historical artifact of NP-completeness is that problems are stated in terms of language-recognition
problems. This is because the theory of NP-completeness grew out of automata and formal language
theory. We will not be taking this approach, but you should be aware that if you look in the book, it
will often describe NP-complete problems as languages.
Deﬁnition: Deﬁne P to be the set of all decision problems that can be solved in polynomial time.
NP and Polynomial Time Veriﬁcation: Before talking about the class of NP-complete problems, it is im-
portant to introduce the notion of a veriﬁcation algorithm. Many language recognition problems that
may be very hard to solve, but they have the property that it is easy to verify whether its answer is
correct.
Consider the following problem, called the undirected Hamiltonian cycle problem (UHC). Given an
undirected graph G, does G have a cycle that visits every vertex exactly once.
An interesting aspect of this problems is that if the graph did contain a Hamiltonian cycle, then
it would be easy for someone to convince you that it did. They would simply say “the cycle is
¸v
3
, v
7
, v
1
, . . . , v
13
)”. We could then inspect the graph, and check that this is indeed a legal cycle
and that it visits all the vertices of the graph exactly once. Thus, even though we know of no efﬁcient
83
Lecture Notes CMSC 251
Nonhamiltonian Hamiltonian
Figure 35: Undirected Hamiltonian cycle.
way to solve the Hamiltonian cycle problem, there is a very efﬁcient way to verify that a given graph is
Hamiltonian. The given cycle is called a certiﬁcate. This is some piece of information which allows us
to verify that a given string is in a language. If it is possible to verify the accuracy of a certiﬁcate for a
problem in polynomial time , we say that the problem is polynomial time veriﬁable.
Note that not all languages have the property that they are easy to verify. For example, consider the
problem of determining whether a graph has exactly one Hamiltonian cycle. It would be easy for
someone to convince that it has at least one, but it is not clear what someone (no matter how smart)
would say to you to convince you that there is not another one.
Deﬁnition: Deﬁne NP to be the set of all decision problems that can be veriﬁed by a polynomial time
algorithm.
Beware that polynomial time veriﬁcation and polynomial time solvable are two very different concepts.
The Hamiltonian cycle problem is NP-complete, and so it is widely believed that there is no polynomial
time solution to the problem.
Why is the set called “NP” rather than “VP”? The original term NP stood for “nondeterministic polyno-
mial time”. This referred to a program running on a nondeterministic computer that can make guesses.
Basically, such a computer could nondeterministically guess the value of certiﬁcate, and then verify
that the string is in the language in polynomial time. We have avoided introducing nondeterminism
here. It would be covered in a course on complexity theory or formal language theory.
Like P, NP is a set of languages based on some complexity measure (the complexity of veriﬁcation).
Observe that P ⊆ NP. In other words, if we can solve a problem in polynomial time, then we can
certainly verify that an answer is correct in polynomial time. (More formally, we do not even need to
see a certiﬁcate to solve the problem, we can solve it in polynomial time anyway).
However it is not known whether P = NP. It seems unreasonable to think that this should be so. In
other words, just being able to verify that you have a correct solution does not help you in ﬁnding the
actual solution very much. Most experts believe that P ,= NP, but no one has a proof of this.
NP-Completeness: We will not give a formal deﬁnition of NP-completeness. (This is covered in the text,
and higher level courses such as 451). For now, think of the set of NP-complete problems as the
“hardest” problems to solve in the entire class NP. There may be even harder problems to solve that are
not in the class NP. These are called NP-hard problems.
One question is how can we the notion of “hardness” mathematically formal. This is where the concept
of a reduction comes in. We will describe this next.
Reductions: Before discussing reductions, let us ﬁrst consider the following example. Suppose that there
are two problems, A and B. You know (or you strongly believe at least) that it is impossible to solve
84
Lecture Notes CMSC 251
Harder
NP−Hard
NP
NP−Complete
P
Easy
Figure 36: Relationship between P, NP, and NP-complete.
problem A in polynomial time. You want to prove that B cannot be solved in polynomial time. How
would you do this?
We want to show that
(A / ∈ P) ⇒ (B / ∈ P).
To do this, we could prove the contrapositive,
(B ∈ P) ⇒ (A ∈ P).
In other words, to show that B is not solvable in polynomial time, we will suppose that there is an
algorithm that solves B in polynomial time, and then derive a contradiction by showing that A can be
solved in polynomial time.
How do we do this? Suppose that we have a subroutine that can solve any instance of problem B in
polynomial time. Then all we need to do is to show that we can use this subroutine to solve problem A
in polynomial time. Thus we have “reduced” problem A to problem B.
It is important to note here that this supposed subroutine is really a fantasy. We know (or strongly
believe) that Acannot be solved in polynomial time, thus we are essentially proving that the subroutine
cannot exist, implying that B cannot be solved in polynomial time.
Let us consider an example to make this clearer. It is a fact that the problem of determining whether an
undirected graph has a Hamiltonian cycle (UHC) is an NP-complete problem. Thus, there is no known
polynomial time algorithm, and in fact experts widely believe that no such polynomial time algorithm
exists.
Suppose your boss of yours tells you that he wants you to ﬁnd a polynomial solution to a different
problem, namely the problem of ﬁnding a Hamiltonian cycle in a directed graph (DHC). You think
about this for a few minutes, and you convince yourself that this is not a reasonable request. After all,
would allowing directions on the edges make this problem any easier? Suppose you and your boss both
agree that the UHC problem (for undirected graphs) is NP-complete, and so it would be unreasonable
for him to expect you to solve this problem. But he tells you that the directed version is easier. After
all, by adding directions to the edges you eliminate the ambiguity of which direction to travel along
each edge. Shouldn’t that make the problem easier? The problem is, how do you convince your boss
that he is making an unreasonable request (assuming your boss is willing to listen to logic).
You explain to your boss: “Suppose I could ﬁnd an efﬁcient (i.e., polynomial time) solution to the
DHC problem, then I’ll show you that it would then be possible to solve UHC in polynomial time.” In
particular, you will use the efﬁcient algorithm for DHC (which you still haven’t written) as a subroutine
to solve UHC. Since you both agree that UHC is not efﬁciently solvable, this means that this efﬁcient
subroutine for DHC must not exist. Therefore your boss agrees that he has given you an unreasonable
85
Lecture Notes CMSC 251
Here is how you might do this. Given an undirected graph G, create a directed graph G

by just
replacing each undirected edge ¦u, v¦ with two directed edges, (u, v) and (v, u). Now, every simple
path in the G is a simple path in G

, and vice versa. Therefore, G has a Hamiltonian cycle if and only
if G

does. Now, if you could develop an efﬁcient solution to the DHC problem, you could use this
algorithm and this little transformation solve the UHC problem. Here is your algorithm for solving the
undirected Hamiltonian cycle problem. You take the undirected graph G, convert it to an equivalent
directed graph G

(by edge-doubling), and then call your (supposed) algorithmfor directed Hamiltonian
cycles. Whatever answer this algorithm gives, you return as the answer for the Hamiltonian cycle.
UHC to DHC Reduction
bool Undir_Ham_Cycle(graph G) {
create digraph G’ with the same number of vertices as G
for each edge {u,v} in G {
add edges (u,v) and (v,u) to G’
}
return Dir_Ham_Cycle(G’)
}
You would now have a polynomial time algorithm for UHC. Since you and your boss both agree that
this is not going to happen soon, he agrees to let you off.
Nonhamiltonian Hamiltonian
Figure 37: Directed Hamiltonian cycle reduction.
Notice that neither problem UHC or DHC has been solved. You have just shown how to convert a
solution to DHC into a solution for UHC. This is called a reduction and is central to NP-completeness.
Lecture 28: NP-Completeness and Reductions
(Thursday, May 7, 1998)
Read: Chapt 36, through Section 36.4.
Summary: Last time we introduced a number of concepts, on the way to deﬁning NP-completeness. In
particular, the following concepts are important.
Decision Problems: are problems for which the answer is either “yes” or “no.” The classes P and NP
problems are deﬁned as classes of decision problems.
P: is the class of all decisions problems that can be solved in polynomial time (that is, O(n
k
) for some
constant k).
86
Lecture Notes CMSC 251
NP: is deﬁned to be the class of all decision problems that can be veriﬁed in polynomial time. This
means that if the answer to the problem is “yes” then it is possible give some piece of information
that would allow someone to verify that this is the correct answer in polynomial time. (If the
answer is “no” then no such evidence need be given.)
Reductions: Last time we introduced the notion of a reduction. Given two problems A and B, we say that
A is polynomially reducible to B, if, given a polynomial time subroutine for B, we can use it to solve
A in polynomial time. (Note: This deﬁnition differs somewhat from the deﬁnition in the text, but it is
good enough for our purposes.) When this is so we will express this as
A ≤
P
B.
The operative word in the deﬁnition is “if”. We will usually apply the concept of reductions to problems
for which we strongly believe that there is no polynomial time solution.
Some important facts about reductions are:
Lemma: If A ≤
P
B and B ∈ P then A ∈ P.
Lemma: If A ≤
P
B and A / ∈ P then B / ∈ P.
Lemma: (Transitivity) If A ≤
P
B and B ≤
P
C then A ≤
P
C.
The ﬁrst lemma is obvious from the deﬁnition. To see the second lemma, observe that B cannot be in
P, since otherwise A would be in P by the ﬁrst lemma, giving a contradiction. The third lemma takes a
bit of thought. It says that if you can use a subroutine for B to solve A in polynomial time, and you can
use a subroutine for C to solve B in polynomial time, then you can use the subroutine for C to solve
A in polynomial time. (This is done by replacing each call to B with its appropriate subroutine calls to
C).
NP-completeness: Last time we gave the informal deﬁnition that the NP-complete problems are the “hard-
est” problems in NP. Here is a more formal deﬁnition in terms of reducibility.
Deﬁnition: A decision problem B ∈ NP is NP-complete if
A ≤
P
B for all A ∈ NP.
In other words, if you could solve B in polynomial time, then every other problemAin NP would
be solvable in polynomial time.
We can use transitivity to simplify this.
Lemma: B is NP-complete if
(1) B ∈ NP and
(2) A ≤
P
B for some NP-complete problem A.
Thus, if you can solve B in polynomial time, then you could solve A in polynomial time. Since A is
NP-complete, you could solve every problem in NP in polynomial time.
Example: 3-Coloring and Clique Cover: Let us consider an example to make this clearer. Consider the
following two graph problems.
3-coloring (3COL): Given a graph G, can each of its vertices be labeled with one of 3 different “col-
ors”, such that no two adjacent vertices have the same label.
Clique Cover (CC): Given a graph G and an integer k, can the vertices of G be partitioned into k
subsets, V
1
, V
2
, . . . , V
k
, such that

i
V
i
= V , and that each V
i
is a clique of G.
87
Lecture Notes CMSC 251
k=3
Clique cover 3−colorable Not 3−colorable
Figure 38: 3-coloring and Clique Cover.
Recall that the clique is a subset of vertices, such that every pair of vertices in the subset are adjacent
to each other.
3COL is a known NP-complete problem. Your boss tells you that he wants you to solve the CC
problem. You suspect that the CC problem is also NP-complete. How do you prove this to your boss?
There are two things to be shown, ﬁrst that the CC problem is in NP. We won’t worry about this, but it
is pretty easy. (In particular, to convince someone that a graph has a clique cover of size k, just specify
what the k subsets are. Then it is an easy matter to verify that they form a clique cover.)
The second item is to show that a known NP-complete problem (we will choose 3COL) is polynomially
reducible to CC. To do this, you assume that you have access to a subroutine CliqueCover(G,
k). Given a graph G and an integer k, this subroutine returns true if G has a clique cover of size k
and false otherwise, and furthermore, this subroutine runs in polynomial time. How can we use this
“alleged” subroutine to solve the well-known hard 3COL problem? We want to write a polynomial time
subroutine for 3COL, and this subroutine is allowed to call the subroutine CliqueCover(G,k) for
any graph G and any integer k.
Let’s see in what respect the two problems are similar. Both problems are dividing the vertices up into
groups. In the clique cover problem, for two vertices to be in the same group they must be adjacent to
each other. In the 3-coloring problem, for two vertices to be in the same color group, they must not be
is exactly reversed.
Recall that if Gis a graph, then Gis the complement graph, that is, a graph with the same vertex set, but
in which edges and nonedge have been swapped. The main observation is that a graph Gis 3-colorable,
if and only if its complement G, has a clique-cover of size k = 3. We’ll leave the proof as an exercise.
Using this fact, we can reduce the the 3-coloring problem to the clique cover problem as follows.
Remember that this means that, if we had a polynomial time procedure for the clique cover problem
then we could use it as a subroutine to solve the 3-coloring problem Given the graph we want to
compute the 3-coloring for, we take its complement and then invoke the clique cover, setting k = 3.
3COL to CC Reduction
bool 3Colorable(graph G) {
let G’ = complement(G)
return CliqueCover(G’,3)
}
There are a few important things to observe here. First, we never needed to implement the Clique-
Cover() procedure. Remember, these are all “what if” games. If we could solve CC in polynomial
time, then we could solve 3COL. But since we know that 3COL is hard to solve, this means that CC is
also hard to solve.
88
Lecture Notes CMSC 251
G G
k=3 k=3 G
_
G
_
3−colorable
Not 3−colorable
Clique cover No clique cover
Figure 39: Clique covers in the complement.
A second thing to observe that is the direction of the reduction. In normal reductions, you reduce the
problem that you do not know how to solve to one that you do know how to solve. But in NP-complete,
we do not know how to solve either problem (and indeed we are trying to show that an efﬁcient solution
does not exist). Thus the direction of the reduction is naturally backwards. You reduce the known
problem to the problem you want to show is NP-complete. Remember this! It is quite counterintuitive.
Remember: Always reduce the known NP-complete problem to the problem you want to prove
is NP-complete.
The ﬁnal thing to observe is that the reduction really didn’t attempt to solve the problem at all. It just
tried to make one problem look more like the other problem. A reductionist might go so far as to say
that there really is only one NP-complete problem. It has just been dressed up to look differently.
Example: Hamiltonian Path and Hamiltonian Cycle: Let’s consider another example. We have seen the
Hamiltonian Cycle (HC) problem (Given a graph, does it have a cycle that visits every vertex exactly
once?). Another variant is the Hamiltonian Path (HP) problem (Given a graph, does it have a simple
path that visits every vertex exactly once?)
Suppose that we know that the HC problem is NP-complete, and we want to show that the HP problem
is NP-complete. How would we do this. First, remember what we have to show, that a known NP-
complete problem is reducible to our problem. That is, HC ≤
P
HP. In other words, suppose that
we had a subroutine that could solve HP in polynomial time. How could we use it to solve HC in
polynomial time?
Here is a ﬁrst attempt (that doesn’t work). First, if a graph has’t a Hamiltonian cycle, then it certainly
must have a Hamiltonian path (by simply deleting any edge on the cycle). So if we just invoke the
HamPath subroutine on the graph and it returns “no” then we can safely answer “no” for HamCycle.
However, if it answers “yes” then what can we say? Notice, that there are graphs that have Hamiltonian
path but no Hamiltonian cycle (as shown in the ﬁgure below). Thus this will not do the job.
Here is our second attempt (but this will also have a bug). The problem is that cycles and paths are
different things. We can convert a cycle to a path by deleting any edge on the cycle. Suppose that the
89
Lecture Notes CMSC 251
graph G has a Hamiltonian cycle. Then this cycle starts at some ﬁrst vertex u then visits all the other
vertices until coming to some ﬁnal vertex v, and then comes back to u. There must be an edge ¦u, v¦
in the graph. Let’s delete this edge so that the Hamiltonian cycle is now a Hamiltonian path, and then
invoke the HP subroutine on the resulting graph. How do we know which edge to delete? We don’t so
we could try them all. Then if the HP algorithm says “yes” for any deleted edge we would say “yes”
as well.
However, there is a problem here as well. It was our intention that the Hamiltonian path start at u
and end at v. But when we call the HP subroutine, we have no way to enforce this condition. If HP
says “yes”, we do not know that the HP started with u and ended with v. We cannot look inside the
subroutine or modify the subroutine. (Remember, it doesn’t really exist.) We can only call it and check
Correct Reduction Second Attempt
Both HC and HP exist There is both HC and HP
First Attempt
v
u
No HC and no HP No HC, but after deleting No HC but
there is HP
u
v
y
x
There is HC, and after
deleting {u,v} there is HP
{u,v} there is HP
x
v
u
u
v
y
Figure 40: Hamiltonian cycle to Hamiltonian path attempts.
So is there a way to force the HP subroutine to start the path at u and end it at v? The answer is yes,
but we will need to modify the graph to make this happen. In addition to deleting the edge from u to
v, we will add an extra vertex x attached only to u and an extra vertex y attached only to v. Because
these vertices have degree one, if a Hamiltonian path exists, it must start at x and end at y.
This last reduction is the one that works. Here is how it works. Given a graph G for which we want to
determine whether it has a Hamiltonian cycle, we go through all the edges one by one. For each edge
¦u, v¦ (hoping that it will be the last edge on a Hamiltonian cycle) we create a new graph by deleting
this edge and adding vertex x onto u and vertex y onto v. Let the resulting graph be called G

. Then
we invoke our Hamiltonian Path subroutine to see whether G

has a Hamiltonian path. If it does, then
it must start at x to u, and end with v to y (or vice versa). Then we know that the original graph had
a Hamiltonian cycle (starting at u and ending at y). If this fails for all edges, then we report that the
original graph has no Hamiltonian cycle.
90
Lecture Notes CMSC 251
HC to HP Reduction
bool HamCycle(graph G) {
for each edge {u,v} in G {
copy G to a new graph G’
delete edge {u,v} from G’
add new vertices x and y to G’
add new edges {x,u} and {y,v} to G’
if (HamPath(G’)) return true
}
return false // failed for every edge
}
This is a rather inefﬁcient reduction, but it does work. In particular it makes O(e) calls to the Ham-
Path() procedure. Can you see how to do it with fewer calls? (Hint: Consider applying this to the
edges coming out of just one vertex.) Can you see how to do it with only one call? (Hint: This is
trickier.)
As before, notice that we didn’t really attempt to solve either problem. We just tried to ﬁgure out how
to make a procedure for one problem (Hamiltonian path) work to solve another problem (Hamiltonian
cycle). Since HC is NP-complete, this means that there is not likely to be an efﬁcient solution to HP
either.
Lecture 29: Final Review
(Tuesday, May 12, 1998)
Final exam: As mentioned before, the exam will be comprehensive, but it will stress material since the
second midterm exam. I would estimate that about 50–70% of the exam will cover material since the
last midterm, and the remainder will be comprehensive. The exam will be closed book/closed notes
with three sheets of notes (front and back).
Overview: This semester we have discussed general approaches to algorithm design. The goal of this course
is to improve your skills in designing good programs, especially on complex problems, where it is not
obvious how to design a good solution. Finding good computational solutions to problems involves
many skills. Here we have focused on the higher level aspects of the problem: what approaches to use
in designing good algorithms, how generate a rough sketch the efﬁciency of your algorithm (through
asymptotic analysis), how to focus on the essential mathematical aspects of the problem, and strip away
the complicating elements (such as data representations, I/O, etc.)
Of course, to be a complete programmer, you need to be able to orchestrate all of these elements. The
main thrust of this course has only been on the initial stages of this design process. However, these are
important stages, because a poor initial design is much harder to ﬁx later. Still, don’t stop with your
ﬁrst solution to any problem. As we saw with sorting, there may be many ways of solving a problem.
Even algorithms that are asymptotically equivalent (such as MergeSort, HeapSort, and QuickSort) have
The intent of the course has been to investigate basic techniques for algorithm analysis, various algo-
rithm design paradigms: divide-and-conquer graph traversals, dynamic programming, etc. Finally we
have discussed a class of very hard problems to solve, called NP-complete problems, and how to show
that problems are in this class. Here is an overview of the topics that we covered this semester.
Tools of Algorithm Analysis:
91
Lecture Notes CMSC 251
Asymptotics: O, Ω, Θ. General facts about growth rates of functions.
Summations: Analysis of looping programs, common summations, solving complex summa-
tions, integral approximation, constructive induction.
Recurrences: Analysis of recursive programs, strong induction, expansion, Master Theorem.
Sorting:
Mergesort: Stable, Θ(nlog n) sorting algorithm.
Heapsort: Nonstable, Θ(nlog n), in-place algorithm. A heap is an important data structure for
implementation of priority queues (a queue in which the highest priority item is dequeued
ﬁrst).
Quicksort: Nonstable, Θ(nlog n) expected case, (almost) in-place sorting algorithm. This is
regarded as the fastest of these sorting algorithms, primarily because of its pattern of locality
of reference.
Sorting lower bounds: Any sorting algorithm that is based on comparisons requires Ω(nlog n)
steps in the worst-case. The argument is based on a decision tree. Considering the number
of possible outcomes, and observe that they form the leaves of the decision tree. The height
of the decision tree is Ω(lg N), where N is the number of leaves. In this case, N = n!, the
number of different permutations of n keys.
Linear time sorting: If you are sorting small integers in the range from 1 to k, then you can
applying counting sort in Θ(n + k) time. If k is too large, then you can try breaking the
sort to each digit individually. If there are d digits, then its running time is Θ(d(n + k)),
where k is the number of different values in each digit.
Graphs: We presented basic deﬁnitions of graphs and digraphs. A graph (digraph) consists of a set
of vertices and a set of undirected (directed) edges. Recall that the number of edges in a graph
can generally be as large as O(n
2
), but is often smaller (closer to O(n)). A graph is sparse if the
number of edges is o(n
2
), and dense otherwise.
We discussed two representations:
Adjacency matrix: A[u, v] = 1 if (u, v) ∈ E. These are simple, but require Θ(n
2
) storage.
Good for dense graphs.
Adjacency list: Adj [u] is a pointer to a linked list containing the neighbors of u. These are better
for sparse graphs, since they only require Θ(n +e) storage.
Breadth-ﬁrst search: We discussed one graph algorithm: breadth ﬁrst search. This is a way of
traversing the vertices of a graph in increasing order of distance from a source vertex. Recall
that it colors vertices (white, gray, black) to indicate their status in the search, and it also uses a
FIFO queue to determine which order it will visit the vertices. When we process the next vertex,
we simply visit (that is, enqueue) all of its unvisited neighbors. This runs in Θ(n + e) time. (If
the queue is replaced by a stack, then we get a different type of search algorithm, called depth-
ﬁrst search.) We showed that breadth-ﬁrst search could be used to compute shortest paths from a
single source vertex in an (unweighted) graph or digraph.
Dynamic Programming: Dynamic programming is an important design technique used in many op-
timization problems. Its basic elements are those of subdividing large complex problems into
smaller subproblems, solving subproblems in a bottom-up manner (going from smaller to larger).
An important idea in dynamic programming is that of the principal of optimality: For the global
problem to be solved optimally, the subproblems should be solved optimally. This is not always
the case (e.g., when there is dependence between the subproblems, it might be better to do worse
and one to get a big savings on the other).
Floyd-Warshall Algorithm: (Section 26.2) Shortest paths in a weighted digraph between all
pairs of vertices. This algorithm allows negative cost edges, provided that there are no neg-
ative cost cycles. We gave two algorithms. The ﬁrst was based on a DP formulation of
92
Lecture Notes CMSC 251
building up paths based on the number of edges allowed (taking Θ(n
4
) time). The second
(the Floyd-Warshall algorithm) uses a DP formulation based on considering which vertices
you are allowed to pass through. It takes O(n
3
) time.
Longest Common Subsequence: (Section 16.3) Find the longest subsequence of common char-
acters between two character strings. We showed that the LCS of two sequences of lengths
n and m could be computed in Θ(nm).
Chain-Matrix Multiplication: (Section 16.1) Given a chain of matrices, determine the opti-
mum order in which to multiply them. This is an important problem, because many DP
formulations are based on deriving an optimum binary tree given a set of leaves.
NP-completeness: (Chapt 36.)
Basic concepts: Decision problems, polynomial time, the class P, certiﬁcates and the class NP,
polynomial time reductions, NP-completeness.
NP-completeness reductions: We showed that to prove that a problem is NP-complete you need
to show (1) that it is in NP (by showing that it is possible to verify correctness if the answer is
“yes”) and (2) show that some known NP-complete problem can be reduced to your problem.
Thus, if there was a polynomial time algorithm for your problem, then you could use it to
solve a known NP-complete problem in polynomial time.
We showed how to reduce 3-coloring to clique cover, and how to reduce Hamiltonian cycle
to Hamiltonian path.
93

Lecture Notes

CMSC 251

invisible at any point in time. Recognizing this source of inefﬁciency, he redesigned the algorithms and data structures, recoded the inner loops, so that the algorithm concentrated its efforts on eliminating many invisible elements, and just drawing the few thousand visible elements. In a matter of two weeks he had a system that ran faster on his ofﬁce workstation, than the one running on the supercomputer. This may seem like a simple insight, but it is remarkable how many times the clever efforts of a single clear-sighted person can eclipse the efforts of larger groups, who are not paying attention to the basic principle that we will stress in this course: Before you implement, ﬁrst be sure you have a good design. This course is all about how to design good algorithms. Because the lesson cannot be taught in just one course, there are a number of companion courses that are important as well. CMSC 420 deals with how to design good data structures. This is not really an independent issue, because most of the fastest algorithms are fast because they use fast data structures, and vice versa. CMSC 451 is the advanced version of this course, which teaches other advanced elements of algorithm design. In fact, many of the courses in the computer science department deal with efﬁcient algorithms and data structures, but just as they apply to various applications: compilers, operating systems, databases, artiﬁcial intelligence, computer graphics and vision, etc. Thus, a good understanding of algorithm design is a central element to a good understanding of computer science and good programming. Implementation Issues: One of the elements that we will focus on in this course is to try to study algorithms as pure mathematical objects, and so ignore issues such as programming language, machine, and operating system. This has the advantage of clearing away the messy details that affect implementation. But these details may be very important. For example, an important fact of current processor technology is that of locality of reference. Frequently accessed data can be stored in registers or cache memory. Our mathematical analyses will usually ignore these issues. But a good algorithm designer can work within the realm of mathematics, but still keep an open eye to implementation issues down the line that will be important for ﬁnal implementation. For example, we will study three fast sorting algorithms this semester, heap-sort, merge-sort, and quick-sort. From our mathematical analysis, all have equal running times. However, among the three (barring any extra considerations) quicksort is the fastest on virtually all modern machines. Why? It is the best from the perspective of locality of reference. However, the difference is typically small (perhaps 10–20% difference in running time). Thus this course is not the last word in good program design, and in fact it is perhaps more accurately just the ﬁrst word in good program design. The overall strategy that I would suggest to any programming would be to ﬁrst come up with a few good designs from a mathematical and algorithmic perspective. Next prune this selection by consideration of practical matters (like locality of reference). Finally prototype (that is, do test implementations) a few of the best designs and run them on data sets that will arise in your application for the ﬁnal ﬁne-tuning. Also, be sure to use whatever development tools that you have, such as proﬁlers (programs which pin-point the sections of the code that are responsible for most of the running time). Course in Review: This course will consist of three major sections. The ﬁrst is on the mathematical tools necessary for the analysis of algorithms. This will focus on asymptotics, summations, recurrences. The second element will deal with one particularly important algorithmic problem: sorting a list of numbers. We will show a number of different strategies for sorting, and use this problem as a casestudy in different techniques for designing and analyzing algorithms. The ﬁnal third of the course will deal with a collection of various algorithmic problems and solution techniques. Finally we will close this last third with a very brief introduction to the theory of NP-completeness. NP-complete problem are those for which no efﬁcient algorithms are known, but no one knows for sure whether efﬁcient solutions might exist. 2

Lecture Notes

CMSC 251

Lecture 2: Analyzing Algorithms: The 2-d Maxima Problem
(Thursday, Jan 29, 1998)
Read: Chapter 1 in CLR. Analyzing Algorithms: In order to design good algorithms, we must ﬁrst agree the criteria for measuring algorithms. The emphasis in this course will be on the design of efﬁcient algorithm, and hence we will measure algorithms in terms of the amount of computational resources that the algorithm requires. These resources include mostly running time and memory. Depending on the application, there may be other elements that are taken into account, such as the number disk accesses in a database program or the communication bandwidth in a networking application. In practice there are many issues that need to be considered in the design algorithms. These include issues such as the ease of debugging and maintaining the ﬁnal software through its life-cycle. Also, one of the luxuries we will have in this course is to be able to assume that we are given a clean, fullyspeciﬁed mathematical description of the computational problem. In practice, this is often not the case, and the algorithm must be designed subject to only partial knowledge of the ﬁnal speciﬁcations. Thus, in practice it is often necessary to design algorithms that are simple, and easily modiﬁed if problem parameters and speciﬁcations are slightly modiﬁed. Fortunately, most of the algorithms that we will discuss in this class are quite simple, and are easy to modify subject to small problem variations. Model of Computation: Another goal that we will have in this course is that our analyses be as independent as possible of the variations in machine, operating system, compiler, or programming language. Unlike programs, algorithms to be understood primarily by people (i.e. programmers) and not machines. Thus gives us quite a bit of ﬂexibility in how we present our algorithms, and many low-level details may be omitted (since it will be the job of the programmer who implements the algorithm to ﬁll them in). But, in order to say anything meaningful about our algorithms, it will be important for us to settle on a mathematical model of computation. Ideally this model should be a reasonable abstraction of a standard generic single-processor machine. We call this model a random access machine or RAM. A RAM is an idealized machine with an inﬁnitely large random-access memory. Instructions are executed one-by-one (there is no parallelism). Each instruction involves performing some basic operation on two values in the machines memory (which might be characters or integers; let’s avoid ﬂoating point for now). Basic operations include things like assigning a value to a variable, computing any basic arithmetic operation (+, −, ∗, integer division) on integer values of any size, performing any comparison (e.g. x ≤ 5) or boolean operations, accessing an element of an array (e.g. A[10]). We assume that each basic operation takes the same constant time to execute. This model seems to go a good job of describing the computational power of most modern (nonparallel) machines. It does not model some elements, such as efﬁciency due to locality of reference, as described in the previous lecture. There are some “loop-holes” (or hidden ways of subverting the rules) to beware of. For example, the model would allow you to add two numbers that contain a billion digits in constant time. Thus, it is theoretically possible to derive nonsensical results in the form of efﬁcient RAM programs that cannot be implemented efﬁciently on any machine. Nonetheless, the RAM model seems to be fairly sound, and has done a good job of modeling typical machine technology since the early 60’s. Example: 2-dimension Maxima: Rather than jumping in with all the deﬁnitions, let us begin the discussion of how to analyze algorithms with a simple problem, called 2-dimension maxima. To motivate the problem, suppose that you want to buy a car. Since you’re a real swinger you want the fastest car around, so among all cars you pick the fastest. But cars are expensive, and since you’re a swinger on a budget, you want the cheapest. You cannot decide which is more important, speed or price, but you know that you deﬁnitely do NOT want to consider a car if there is another car that is both faster and 3

that is. However.12) (9. if you want to be a bit more formal. Let a point p in 2-dimensional space be given by its integer coordinates.x. . If for each car we associated (x. using a structure with records for the x and y coordinates. . Point P[1.n-1] for i = 1 to n { maximal = true. p2 . y) values where x is the speed of the car. . // P[i] is maximal by default for j = 1 to n { if (i != j) and (P[i]. let’s just consider a simple brute-force algorithm.x ≤ q. it could be written in pseudocode as follows: Brute Force Maxima Maxima(int n. . Here is how we might model this as a formal problem.10) (7. .Lecture Notes CMSC 251 cheaper.y ≤ q. then output it. each represented by its x and y integer coordinates.y. However. Here is the simplest one that I can imagine.7) Figure 1: Maximal Points.5) (13. .) 14 12 10 8 6 4 2 2 4 (2. Brute Force Algorithm: To get the ball rolling.13) (12. The car selection problem can be modeled in this way. We say that the fast.x and p.3) 8 10 12 14 16 (14.. those points pi .4) (5. A point p is said to dominated by point q if p. .x <= P[j]. . cheap car dominates the slow. (See the ﬁgure below.n]) { // output maxima of P[0. pn } be the initial set of points. p2 . output the set of the maximal points of P . .7) (11. Given a set of n points. For each point pi . such that pi is not dominated by any other point of P . expensive car relative to your selection criteria. pn } in 2-space. and y is the negation of the price (thus high y values mean cheap cars). we want to list those that are not dominated by any other. or a 2-dimensional array) nor have we discussed input and output formats. p = (p. test it against all other points pj . p2 .1) 6 (4.y).10) (15. . given a collection of cars. . we would like to keep as many of the messy issues out since they would just clutter up the algorithm. p.g. This English description is clear enough that any (competent) programmer should be able to implement it. So. pn } in 2-space a point is said to be maximal if it is not dominated by any other point in P . These would normally be important in a software speciﬁcation. we have intentionally not discussed issues as to how points are represented (e. with no thought to efﬁciency.y <= P[j]. If pi is not dominated by any other point.y) { 4 .5) (4.11) (7. 2-dimensional Maxima: Given a set of points P = {p1 . then the maximal points correspond to the fastest and cheapest cars. Observe that our description of the problem so far has been at a fairly mathematical level.x) and (P[i]. ... Let P = {p1 . P = {p1 . For example.

you should omit details that detract from the main ideas of the algorithm. } } if (maximal) output P[i]. since these algorithms may be faster or slower depending on how well they exploit the particular machine and compiler. or the number of times that an element of P is accessed. Correctness: Whenever you present an algorithm. and so the input size can be estimated up to constant factor by the parameter n. However. This implies that we cannot really make distinctions between algorithms whose running times differ by a small constant factor. but we will often omit error checking because we want to see the algorithm in its simplest form. etc.. the appropriate level of detail is a judgement call. the number of times we execute the inner loop before breaking out depends not only on the size of the input. In a simple case like the one above. In particular. you should also present a short argument for its correctness. do not assume that more detail is better. and let T (I) denote the running time of the algorithm 5 . then this proof should contain the explanations of why the tricks works. When writing pseudocode. different inputs of the same size may generally result in different execution times.Lecture Notes CMSC 251 maximal = false. For example. Since we want to avoid these technology issues and treat algorithms as mathematical objects. in this problem. and I never deﬁned what a Point data type is. Running Time Analysis: The main purpose of our mathematical analyses will be be measure the execution time (and sometimes the space) of an algorithm. } } // P[i] is dominated by P[j] // no one dominated. Obviously the running time of an implementation of the algorithm would depend on the speed of the machine. For example. optimizations of the compiler.. You might also notice that I did not insert any checking for consistency. (Can you see why?) Again. and just go with the essentials. We’ll see later why even with this big assumption. these are important considerations for implementation. Remember. break. that is. If the algorithm is tricky. Running time depends on input size. I assumed that the points in P are all distinct.g. (For example. since I felt that these are pretty clear from context or just unimportant details. Also. Formally.) There are two common criteria used in measuring running times: Worst-case time: is the maximum running time over all (legal) inputs of size n? Let I denote a legal input instance. but the structure of the input. we will usually make the simplifying assumption that each number is of some constant maximum length (after all. let us just ignore all constant factors in analyzing running times. the length of the array P . and let |I| denote its length. numbers are represented in base 10 and separated by a space). assuming some reasonable encoding of inputs (e. we will only focus on the pseudocode itself. Of course. or the number of comparisons that are performed. We simply implemented the deﬁnition: a point is maximal if no other point dominates it. and so the level of detail depends on your intended audience. it must ﬁt into one computer word). If there is a duplicate point then the algorithm may fail to output even a single point. I omitted type speciﬁcations for the procedure Maxima and the variable maximal. How small is small? To make matters mathematically clean. So we will deﬁne running time in terms of a function of input size. there almost nothing that needs to be said.output There are no formal rules to the syntax of this pseudocode. the input size is deﬁned to be the number of characters in the input ﬁle. we can still make meaningful comparisons between algorithms. In this case we might measure running time by counting the number of steps of pseudocode that are executed. algorithms are to be read by people.

the n2 term will be much larger than the n term. almost any algorithm is fast enough. The output statement makes two accesses (to P [i]. we are most interested in what happens as n gets large. not all four need be evaluated. There are a number of simple algebraic facts about sums. 0. These are easy to verify by simply writing out the summation and applying simple 6 . but let’s review a bit here. and for each time through this loop. their sum a1 +a2 +· · ·+an can be expressed in summation notation as n ai . In the worst case every point is maximal (can you see how to generate such an example?) so these two access are made for each time through the outer loop. (Under C semantics. n 4 is just 4n. The condition in the if-statement makes four accesses to P . and for the running time we will count the number of time that any element of P is accessed.Lecture Notes CMSC 251 on input I. |I|=n Average-case time: is the average running time over all inputs of size n? More generally.y) for each point that is output. It turns out that for most of the algorithms we will consider. i=1 If n = 0. but let’s ignore this since it will just complicate matters). This is called the asymptotic growth rate of the function. average-case running time is just too difﬁcult to compute. Tavg (n) = |I|=n p(I)T (I). then the value of the sum is the additive identity. j=1 These are not very hard summations to solve. It is only for large values of n that running time becomes an important issue. with the probability being the weight. The average-case running time is the weight sum of running times. and so it will dominate the running time. and it is difﬁcult to specify a natural probability distribution on inputs that are really meaningful for all applications. We will sum this analysis up by simply saying that the worst-case running time of the brute force algorithm is Θ(n2 ). Also. one for the i-loop and the other for the j-loop:   n T (n) = i=1 2 + n j=1 n 4 .) We saw that this analysis involved computing a summation. Given a ﬁnite sequence of values a1 . This is because for many of the problems we will work with. . and so T (n) = i=1 (4n + 2) = (4n + 2)n = 4n2 + 2n. As mentioned before we will not care about the small constant factors. we go through the inner loop n times as well. . Summations should be familiar from CMSC 150. Tworst (n) = max T (I).x and P [i]. a2 . . Running Time of the Brute Force Algorithm: Let us agree that the input size is n. there will be only a constant factor difference between worst-case and average-case times. let p(I) denote the probability of seeing this input. Later we will discuss more formally what this notation means. an . We will almost always work with worst-case running time. . Summations: (This is covered in Chapter 3 of CLR. Clearly we go through the outer loop n times. Why? Because when n is small. for each input I. When n is large. Thus we might express the worst-case running time as a pair of nested summations.

.) We showed that as a function of n in the worst case this quantity was T (n) = 4n2 + 2n. Feb 3.Lecture Notes CMSC 251 high school algebra. . Point P[1. n Hn = i=1 1 1 1 1 = 1 + + + · · · + ≈ ln n = Θ(ln n). Last time we counted the number of times that the algorithm accessed a coordinate of any point. then this is Θ(xn ). 2 Geometric Series: Let x = 1 be any constant (independent of i). There are some particularly important summations.. for j = 1 to n { .. 7 . i 2 3 n Lecture 3: Summations and Analyzing Programs with Loops (Tuesday. Recall that the algorithm consisted of two nested loops. If you want some practice with induction. n i = 1 + 2 + ··· + n = i=1 n(n + 1) = Θ(n2 ). the ﬁrst two are easy to prove by induction. then for n ≥ 0. which you should probably commit to memory (or at least remember their asymptotic growth rates). For n ≥ 0.. (This was only one of many things that we could have chosen to count. If c is a constant (does not depend on the summation index i) then n n n n n cai = c i=1 i=1 ai and i=1 (ai + bi ) = i=1 ai + i=1 bi . Harmonic Series: This arises often in probabilistic analyses of algorithms. . n xi = 1 + x + x2 + · · · + xn = i=0 xn+1 − 1 . n. Arithmetic Series: For n ≥ 0.n]) { for i = 1 to n { . It looked something like this: Brute Force Maxima Maxima(int n. .. 3 in CLR. The stuff in the “. } } We were interested in measuring the worst-case running time of this algorithm as a function of the input size. x−1 If 0 < x < 1 then this is Θ(1).. 1998) Read: Chapt. Recap: Last time we presented an algorithm for the 2-dimensional maxima problem. and if x > 1. ” has been omitted because it is unimportant for the analysis..

Lecture Notes

CMSC 251

We were most interested in the growth rate for large values of n (since almost all algorithms run fast for small values of n), so we were most interested in the 4n2 term, which determines how the function grows asymptotically for large n. Also, we do not care about constant factors (because we wanted simplicity and machine independence, and ﬁgured that the constant factors were better measured by implementing the algorithm). So we can ignored the factor 4 and simply say that the algorithm’s worst-case running time grows asymptotically as n2 , which we wrote as Θ(n2 ). In this and the next lecture we will consider the questions of (1) how is it that one goes about analyzing the running time of an algorithm as function such as T (n) above, and (2) how does one arrive at a simple asymptotic expression for that running time. A Harder Example: Let’s consider another example. Again, we will ignore stuff that takes constant time (expressed as “. . . ” in the code below).
A Not-So-Simple Example: for i = 1 to n { ... for j = 1 to 2*i { ... k = j; while (k >= 0) { ... k = k - 1; } } } // assume that n is input size

How do we analyze the running time of an algorithm that has many complex nested loops? The answer is that we write out the loops as summations, and then try to solve the summations. Let I(), M (), T () be the running times for (one full execution of) the inner loop, middle loop, and the entire program. To convert the loops into summations, we work from the inside-out. Let’s consider one pass through the innermost loop. The number of passes through the loop depends on j. It is executed for k = j, j − 1, j − 2, . . . , 0, and the time spent inside the loop is a constant, so the total time is just j + 1. We could attempt to arrive at this more formally by expressing this as a summation:
j

I(j) =
k=0

1=j+1

Why the “1”? Because the stuff inside this loop takes constant time to execute. Why did we count up from 0 to j (and not down as the loop does?) The reason is that the mathematical notation for summations always goes from low index to high, and since addition is commutative it does not matter in which order we do the addition. Now let us consider one pass through the middle loop. It’s running time is determined by i. Using the summation we derived above for the innermost loop, and the fact that this loop is executed for j running from 1 to 2i, it follows that the execution time is
2i 2i

M (i) =
j=1

I(j) =
j=1

(j + 1).

Last time we gave the formula for the arithmetic series:
n

i=
i=1

n(n + 1) . 2

8

Lecture Notes

CMSC 251

Our sum is not quite of the right form, but we can split it into two sums:
2i 2i

M (i) =
j=1

j+
j=1

1.

The latter sum is clearly just 2i. The former is an arithmetic series, and so we ﬁnd can plug in 2i for n, and j for i in the formula above to yield the value: M (i) = 4i2 + 2i + 4i 2i(2i + 1) + 2i = = 2i2 + 3i. 2 2

Now, for the outermost sum and the running time of the entire algorithm we have
n

T (n) =
i=1

(2i2 + 3i).

Splitting this up (by the linearity of addition) we have
n n

T (n) = 2
i=1

i2 + 3
i=1

i.

The latter sum is another arithmetic series, which we can solve by the formula above as n(n + 1)/2. n The former summation i=1 i2 is not one that we have seen before. Later, we’ll show the following. Quadratic Series: For n ≥ 0.
n

i2 = 1 + 4 + 9 + · · · + n2 =
i=1

2n3 + 3n2 + n . 6

Assuming this fact for now, we conclude that the total running time is: T (n) = 2 n(n + 1) 2n3 + 3n2 + n +3 , 6 2

which after some algebraic manipulations gives T (n) = 4n3 + 15n2 + 11n . 6

As before, we ignore all but the fastest growing term 4n3 /6, and ignore constant factors, so the total running time is Θ(n3 ). Solving Summations: In the example above, we saw an unfamiliar summation, could be solved in closed form as:
n n 2 i=1 i ,

which we claimed

i2 =
i=1

2n3 + 3n2 + n . 6

Solving a summation in closed-form means that you can write an exact formula for the summation without any embedded summations or asymptotic terms. In general, when you are presented with an unfamiliar summation, how do you approach solving it, or if not solving it in closed form, at least getting an asymptotic approximation. Here are a few ideas.

9

Lecture Notes

CMSC 251

Use crude bounds: One of the simples approaches, that usually works for arriving at asymptotic bounds is to replace every term in the summation with a simple upper bound. For example, n in i=1 i2 we could replace every term of the summation by the largest term. This would give
n n

i2 ≤
i=1 i=1

n2 = n3 .

Notice that this is asymptotically equal to the formula, since both are Θ(n3 ). This technique works pretty well with relatively slow growing functions (e.g., anything growing more slowly than than a polynomial, that is, ic for some constant c). It does not give good bounds with faster growing functions, such as an exponential function like 2i . Approximate using integrals: Integration and summation are closely related. (Integration is in some sense a continuous form of summation.) Here is a handy formula. Let f (x) be any monotonically increasing function (the function increases as x increases).
n 0 n n+1

f (x)dx ≤
i=1

f (i) ≤

f (x)dx.
1

f(2)

f(x)

f(2)

f(x)

0 1 2 3 ...

n

x

0 1 2 3 ...

n n+1

x

Figure 2: Approximating sums by integrals. Most running times are increasing functions of input size, so this formula is useful in analyzing algorithm running times. Using this formula, we can approximate the above quadratic sum. In this case, f (x) = x2 .
n

i2 ≤
i=1

n+1 1

x2 dx =

x3 3

n+1

=
x=1

1 n3 + 3n2 + 3n (n + 1)3 − = . 3 3 3

Note that the constant factor on the leading term of n3 /3 is equal to the exact formula. You might say, why is it easier to work with integrals than summations? The main reason is that most people have more experience in calculus than in discrete math, and there are many mathematics handbooks with lots of solved integrals. Use constructive induction: This is a fairly good method to apply whenever you can guess the general form of the summation, but perhaps you are not sure of the various constant factors. In this case, the integration formula suggests a solution of the form:
n

i2 = an3 + bn2 + cn + d,
i=1

but we do not know what a, b, c, and d are. However, we believe that they are constants (i.e., they are independent of n). 10

We prove IH(n0 ). for n = n0+1 to infinity do Prove IH(n). and then we work up to successively larger value of n. then strip off the last term. we should gather information about what the values of a. Notice that constructive induction gave us the exact formula for the summation. For this to be true. This is sometimes called strong induction. this means that each power of n must match identically. write the summation for i running up to n. called the induction hypothesis is a function of n. all of the values a through d are constants. If we had chosen the wrong general form. Since this is the ﬁrst induction proof we have done. assuming that IH(n’) holds for all n’ < n. Since this should be true for all n. c = 3a − 2b + c. denote IH(n). Basically induction proofs are just the mathematical equivalents of loops in programming. let us recall how induction works. each time we may make use of the induction hypothesis. implying that a = 1/3. or we might get a set of constraints that could not be satisﬁed. independent of n. b. From the second constraint above we can cancel b from both sides. This gives us the following constraints a = a. d = −a + b − c + d. Usually we only need to assume the induction hypothesis for the next smaller value of n. Combining this with the third constraint we have b = 1/2. we want this is equal to an3 + bn2 + cn + d. Here we go. Finally from the last constraint we have c = −a + b = 1/6. and from this we will show that the formula holds for the value n itself. and as the proof proceeds. Basis Case: (n = 0) Recall that an empty summation is equal to the additive identity. This gives the ﬁnal formula n i2 = i=1 n2 n 2n3 + 3n2 + n n3 + + = . Induction Step: Let us assume that n > 0. First. In this case we want to prove that 0 = a · 03 + b · 02 + c · 0 + d. and d are. The only tricky part is that we had to “guess” the general structure of the solution. We already know that d = 0 from the basis case. namely n − 1.Lecture Notes CMSC 251 Let’s try to prove this formula by induction on n. and then combine everything algebraically. 11 . 0. n n−1 i2 i=1 = i=1 i2 + n2 = = = a(n − 1)3 + b(n − 1)2 + c(n − 1) + d + n2 (an3 − 3an2 + 3an − a) + (bn2 − 2bn + b) + (cn − c) + d + n2 an3 + (−3a + b + 1)n2 + (3a − 2b + c)n + (−a + b − c + d). because we assume that the hypothesis holds for all n < n. as long as we apply it to strictly smaller values of n. Let n be the integer variable on which we are performing the induction. The structure of proving summations by induction is almost always the same. c. apply the induction hypothesis on the summation running up to n − 1. To complete the proof. b = −3a + b + 1. 3 2 6 6 As desired. There is some smallest value n0 for which IH(n0 ) is suppose to hold. we must have d = 0. Prove IH(n0). The theorem or formula to be proved. then either we would ﬁnd that some of these “constants” depended on n. and that the formula holds for all values n < n.

which can easily be veriﬁed. let us assume that no two points have the same y-coordinate.) This observation by itself. So the only remaining problem is. this new point may dominate some of the existing maximal points. .Lecture Notes CMSC 251 In summary. But the bottom line is that there exist any number of good sorting algorithms whose running time to sort n values is Θ(n log n). However. and so it cannot be dominated by any of the existing points. Although we would like to think of this as a continuous process. 1998) Read: Chapts. it will never need to be added back again. and save the messy details for the actual implementation. 2-dimensional Maxima Revisited: Recall the max-dominance problem from the previous lectures. there is no one way to solve a summation. it must be maximal for the current set.) Consider the ﬁgure below.y ≤ q. Given a set of n points. First off. if all the points are maximal. For simplicity. we will update the current list of maximal points. Any point that pi dominates will also be dominated by pj . (This follows from the fact that the domination relation is transitive. (Why? Beacuse its x-coordinate is larger than all the x-coordinates of all the existing points. but it is good to work with the simpler version. it dominates an existing point if and only if its y-coordinate is also larger 12 . we will begin by sorting the points in increasing order of their x-coordinates. there are many tricks that can be applied to either ﬁnd asymptotic approximations or to get the exact solution. We will sweep a vertical line across the plane from left to right.x ≤ q. When the sweep line reaches the rightmost point of P . (This limiting assumption is actually easy to overcome. So far we have introduced a simple brute-force algorithm that ran in Θ(n2 ) time. The question we consider today is whether there is an approach that is signiﬁcantly better? The problem with the brute-force algorithm is that uses no intelligence in pruning out decisions. To do this. P = {p1 . Plane-sweep Algorithm: The question is whether we can make an signiﬁcant improvement in the running time? Here is an idea for how we might do it. how do we sort the points? We will leave this problem for later in the semester. Lecture 4: 2-d Maxima Revisited and Asymptotics (Thursday. Notice that since the pi has greater x-coordinate than all the existing points. The problem is to output all the maximal points of P . For example. then we we do not need to use pi for eliminating other points. . The ultimate goal is to come up with a close-form solution. As we encounter each new point.x and p. p2 . A point p is said to dominated by point q if p. . Feb 5. which can certainly happen. As we sweep this line. We will just assume that they exist for now.) However. then we will have constructed the complete set of maxima. which operated by comparing all pairs of points. . Let pi denote the current point being considered. we need some way to perform the plane sweep in discrete steps. and how do we update them when a new point is processed? We claim that as each new point is added.y. once we know that a point pi is dominated by another point pj . we will build a structure holding the maximal points lying to the left of the sweep line. does not lead to a signiﬁcantly faster algorithm though. This is not always easy or even possible. but for our purposes asymptotic bounds will usually be good enough. and so we may need to delete them from the list of maxima. how do we store the existing maximal points. For example. then this optimization saves us nothing. (Notice that once a point is deleted as being nonmaximal.) Then we will advance the sweep-line from point to point in n discrete steps. pn } in 2-space a point is said to be maximal if it is not dominated by any other point in P . 2 and 3 in CLR. This approach of solving geometric problems by sweeping a line across the plane is called plane sweep.

Thus. they occur in increasing order of y-coordinates. we have two nested loops. S = empty. P[i]).) We have argued that the brute-force algorithm runs in Θ(n2 ) time. However. we must pop an element off the stack. and the improved plane-sweep algorithm runs in Θ(n log n) time. primarily because the techniques that we discussed in the last lecture do not apply readily here.y <= P[i]. // add P[i] to stack of maxima } output the contents of S.top. a factor of 10. say.g. Is this really better? How much of an improvement is this plane-sweep algorithm over the brute-force algorithm? Probably the most accurate way to ﬁnd out would be to code the two up. both algorithms are probably running fast enough that the difference will be practically negligible. // initialize stack of maxima for i = 1 to n do { // add points in order of x-coordinate while (S is not empty and S. For even larger 14 . let’s look at the ratio of the running times. (Make sure that you believe the argument before going on. In particular. it follows that everyone else on the stack is undominated. The outer loop is clearly executed n times.y) Pop(S). this is a good example of how not to be fooled by analyses that are too simple minded. Although it is true that the inner while-loop could be executed as many as n − 1 times any one time through the outer loop. So. What is the base of the logarithm? It turns out that it will not matter for the asymptotics (we’ll show this later). But just to get a feeling. Analysis: This is an interesting program to analyze. n lg n lg n For relatively small values of n (e. It is impossible to pop more elements off the stack than are ever pushed on. I claim that after the sorting (which we mentioned takes Θ(n log n) time). The inner while-loop could be iterated up to n − 1 times in the worst case (in particular. so there is a 100-to-1 ratio in running times. it is hard to imagine that the constant factors will differ by more than. less than 100). say. 000. since we execute exactly one Push for each time through the outer for-loop. the total running time of the algorithm is Θ(n). and since the total number of iterations in the outer for-loop is n. Also observe that every time we go through the inner while-loop. it seems that though we have n(n − 1) for a total of Θ(n2 ). Therefore. But since neither algorithm makes use of very complex constructs. the rest of the algorithm only takes Θ(n) time.) Therefore. we have not considered the constant factors. The most important element was that since the current maxima appear on the stack in decreasing order of xcoordinates (as we look down from the top of the stack). over the entire course of the algorithm we claim that it cannot be executed more than n times. as soon as we ﬁnd the last undominated element in the stack. The ratio of the running times is: n n2 = . Why is this? First observe that the total number of elements that have ever been pushed onto the stack is at most n. since the total number of iterations of the inner while-loop is n. (We have ignored constant factors. so for concreteness. when the last point added dominates all the others). // remove points that P[i] dominates Push(S. Of course. the inner while-loop cannot be executed more than n times over the entire course of the algorithm. the ratio of n to lg n is about 1000/10 = 100. but we’ll see that they cannot play a very big role. n = 1. } Why is this algorithm correct? The correctness follows from the discussion up to now. which we’ll denote as lg n. and compare their running times. On larger inputs.Lecture Notes CMSC 251 Sort P in ascending order by x-coordinate. let’s assume logarithm base 2.

Your response at this point might be. recall that we argued that the brute-force algorithm for 2-d maxima had a running time of T (n) = 4n2 + 2n. n = 1. We have been saying things like T (n) = Θ(n2 ).000 times faster. Unfortunately. and nearly equivalent deﬁnition). “I’m sorry that I asked”. In this case g(n) = n2 . that no one notices it. we have given a formal proof that 4n2 + 2n ∈ Θ(n2 ). Asymptotic Notation: We continue to use the notation Θ() but have never deﬁned it. suppose that there was a constant factor difference of 10 to 1. irrespective of the constant factors. For example. in favor of the brute-force algorithm. implying that 4n2 + 2n ≤ 4n2 + 2n2 = 6n2 = c2 n2 . There are really three inequalities here. Thus. So let us make n0 = 1. then we have 0 ≤ 4n2 ≤ 4n2 + 2n. and assume that n ≥ 1. c2 . 000. Thus. The plane-sweep algorithm would still be 5. which is clearly true as long as n ≥ 0. it does not (although later we will see that there is somewhat easier. which we claimed was Θ(n2 ). First off. 000. The other inequality is 4n2 + 2n ≤ c2 n2 . and right side is a set of functions. 000/20 = 50. as desired. If we set c1 = 4. 15 . you make it possible for the user to run large inputs in a reasonable amount of time. we deﬁne Θ(g(n)) to be a set of functions that are asymptotically equivalent to g(n). But efﬁcient algorithm design is most important for large inputs. n ≥ 0 and n ≥ 1. The constraint 0 ≤ c1 n2 is no problem. Going back to an earlier lecture. which means that we must argue that there exist constants c1 . then the brute-force algorithm would take 14 hours. we will satisfy both of these constraints. It seems that the reasonably simple concept of “throw away all but the fastest growing term. then we have n2 ≥ n. The next is: c1 n2 ≤ 4n2 + 2n. since we will always be dealing with positive n and positive constants. say. It tells us which algorithm is better for large values of n. then almost any algorithm will be fast. and the general rule of computing is that input sizes continue to grow until people can no longer tolerate the running times. This should properly be written as T (n) ∈ Θ(n2 ). We have two constraints on n. and ignore constant factors” should have a simpler and more intuitive deﬁnition than this. Next time we’ll try to give some of the intuition behind this deﬁnition. c2 . or put formally: Θ(g(n)) = {f (n) | there exist positive constants c1 . say 10 seconds to execute. Let’s remedy this now. Deﬁnition: Given any function g(n). As we mentioned before. and n0 such that 0 ≤ c1 n2 ≤ (4n2 + 2n) ≤ c2 n2 for all n ≥ n0 . which will imply that we as long as n ≥ n0 . We want to show that f (n) = 4n2 + 2n is a member of this set. However. this abuse of notation is so common in the ﬁeld of algorithm design. If we select c2 = 6. 000.Lecture Notes CMSC 251 inputs. 000. if n is not very large. This is quite a signiﬁcant difference. and n0 such that 0 ≤ c1 g(n) ≤ f (n) ≤ c2 g(n) for all n ≥ n0 }. If the plane-sweep algorithm took. we are looking at a ratio of roughly 1. Let’s verify that this is so. by designing algorithms efﬁciently. we can see that we have been misusing the notation. This cannot be true. The left side is a function. From this we get an idea about the importance of asymptotic analysis.

Intuitively. that f (n) does grows asymptotically at least as fast as n2 . But in the above reasoning we have implicitly made the assumptions that 2n ≥ 0 and n2 − 3 ≥ 0. and n(n − 3) are all intuitively asymptotically equivalent. (n2 /5 + n − 10 log n). we deﬁne Θ(g(n)) to be a set of functions: Θ(g(n)) = {f (n) | there exist strictly positive constants c1 . and last time we gave a formal deﬁnition. and you may make n0 as big a constant as you like. let us select n0 ≥ 1. For example. what we want to say with “f (n) ∈ Θ(g(n))” is that f (n) and g(n) are asymptotically equivalent. Deﬁnition: Given any function g(n). Thus. The portion of the deﬁnition that allows us to select c1 and c2 is essentially saying “the constants do not matter because you may pick c1 and c2 however you like to satisfy these conditions. that f (n) grows no faster asymptotically than n2 . such that f (n) ≤ c2 n2 for all n ≥ n0 . 4 for next time. 16 . This means that they have essentially the same growth rates for √ large n. and n0 such that 0 ≤ c1 g(n) ≤ f (n) ≤ c2 g(n) for all n ≥ n0 }. but it is true for all n ≥ 1. Θ-Notation: Recall the following deﬁnition from last time. In particular. Let’s dissect this deﬁnition. We need to show two things: ﬁrst.” The portion of the deﬁnition that allows us to select n0 is essentially saying “we are only interested in large n. We have implicitly made the assumption that 2n ≤ 2n2 . we will explore this and other asymptotic notations in greater depth. Our informal rule of keeping the largest term and throwing away the constants suggests that f (n) ∈ Θ(n2 ) (since f grows quadratically). Feb 10.” Consider the following (almost correct) reasoning: f (n) = 8n2 + 2n − 3 ≥ 8n2 − 3 = 7n2 + (n2 − 3) ≥ 7n2 = 7n2 . for all n ≥ n0 . since you only have to satisfy the condition for all n bigger than n0 . c2 . Let’s see why the formal deﬁnition bears out this informal observation. So let us select n0 ≥ 3. The Limit Rule is not really covered in the text. and now we have f (n) ≥ c1 n2 . (8n2 + 2n − 3). Upper bound: f (n) grows asymptotically no faster than n2 : This is established by the portion of the deﬁnition that reads “there exist positive constants c2 and n0 . Read Chapt. We’ll do both very carefully. and second. since as n becomes large.” An example: Consider the function f (n) = 8n2 + 2n − 3. they all grow quadratically in n. which is what we need. such that f (n) ≥ c1 n2 for all n ≥ n0 . which is what we need.” Consider the following reasoning (which is almost correct): f (n) = 8n2 + 2n − 3 ≤ 8n2 + 2n ≤ 8n2 + 2n2 = 10n2 . then both are true. This means that if we let c2 = 10. So. 3 in CLR. Today. if we set c1 = 7. These are not true for all n. 1998) Read: Chapt. the dominant (fastest growing) term is some constant times n2 . This is not true for all n. Asymptotics: We have introduced the notion of Θ() notation. functions like 4n2 .Lecture Notes CMSC 251 Lecture 5: Asymptotics (Tuesday. Lower bound: f (n) grows asymptotically at least as fast as n2 : This is established by the portion of the deﬁnition that reads: (paraphrasing): “there exist positive constants c1 and n0 . if n ≥ 3. and now we have f (n) ≤ c2 n2 for all n ≥ n0 . In other words. then we are done. and hopefully give a better understanding of what they mean. but they are true for all √ √ sufﬁciently large n. then we are done.

but by hypothesis c1 is positive. Deﬁnition: Given any function g(n). such that f (n) ≤ c2 n for all n ≥ n0 . we have n0 ≥ 3 and from the upper bound we have n0 ≥ 1. Ω(g(n)) = {f (n) | there exist positive constants c and n0 such that 0 ≤ cg(n) ≤ f (n) for all n ≥ n0 }. we know that as n becomes large enough f (n) = 8n2 + 2n − 3 will eventually exceed c2 n no matter how large we make c2 (since f (n) is growing quadratically and c2 n is only growing linearly). Sometimes we are only interested in proving one bound or the other. suppose towards a contradiction that constants c2 and n0 did exist. in conclusion. n→∞ It is easy to see that in the limit the left side tends to 0. Since this is true for all sufﬁciently large n then it must be true in the limit as n tends to inﬁnity. The O-notation allows us to state asymptotic upper bounds and the Ω-notation allows us to state asymptotic lower bounds. and eventually any cubic function will exceed it. suppose towards a contradiction that constants c1 and n0 did exist. To show this formally. such that 8n2 +2n−3 ≥ c1 n3 for all n ≥ n0 . such that 8n2 + 2n − 3 ≤ c2 n for all n ≥ n0 . This means that f (n) ∈ Θ(n3 ). / Let’s show that f (n) ∈ Θ(n3 ). the upper bound requires us to show “there exist positive constants c2 and n0 . To show this formally. Thus. O-notation and Ω-notation: We have seen that the deﬁnition of Θ-notation relies on proving both a lower and upper asymptotic bound. If we divide both side by n we have: 3 ≤ c2 . we have established that f (n) ∈ n2 . c2 . this statement is violated. In particular. but it shows how this informal notion is implemented formally in the deﬁnition. / If this were true. Here the idea will be to violate the lower bound: “there exist positive / constants c1 and n0 . c2 = 10. let’s show that f (n) ∈ Θ(n). and √ n0 = 3. then we would have to satisfy both the upper and lower bounds. It turns out that the lower bound is satisﬁed (because f (n) grows at least as fast asymptotically as n). Since this is true for all sufﬁciently large n then it must be true in the limit as n tends to inﬁnity. First. and so no matter how large c2 is. Deﬁnition: Given any function g(n). if we let c1 = 7. and n0 .Lecture Notes CMSC 251 √ From the lower bound. and so combining √ these we let n0 be the larger of the two: n0 = 3. But the upper bound is false. Since we have shown (by construction) the existence of constants c1 . If we divide both side by n3 we have: lim 2 3 8 + − 3 n n2 n ≥ c1 . such that f (n) ≥ c1 n3 for all n ≥ n0 .” Informally this is true because f (n) is growing quadratically. O(g(n)) = {f (n) | there exist positive constants c and n0 such that 0 ≤ f (n) ≤ cg(n) for all n ≥ n0 }. 17 . (Whew! That was a lot more work than just the informal notion of throwing away constants and keeping the largest term. and this is exactly what the deﬁnition requires. This means that f (n) ∈ Θ(n). and so the only way to satisfy this requirement / is to set c1 = 0. then we have 0 ≤ c1 g(n) ≤ f (n) ≤ c2 g(n) for all n ≥ n0 .) Now let’s show why f (n) is not in some other asymptotic class. lim 8n + 2 − n→∞ n It is easy to see that in the limit the left side tends to ∞.” Informally.

Using the limit rule we have: 2 5 4 7 f (n) = lim 2 − − 2 + 3 − 4 4 n→∞ n n→∞ n n n n lim = 2 − 0 − 0 + 0 − 0 = 2. dredge out your old calculus book to remember. and it is almost always easier to apply than the formal deﬁnition. Proof: This would be quite tedious to do by the formal deﬁnition.Lecture Notes CMSC 251 Compare this with the deﬁnition of Θ. One important rule to remember is the following: L’Hˆ pital’s rule: If f (n) and g(n) both approach 0 or both approach ∞ in the limit. This limit rule can be applied in almost every instance (that I know of) where the formal deﬁnition can be used. Limit Rule for O-notation: Given positive functions f (n) and g(n). (If not. The Limit Rule for Θ: The previous examples which used limits suggest alternative way of showing that f (n) ∈ Θ(g(n)).. But since most running times are fairly well-behaved functions this is rarely a problem. 18 . You may recall the important rules from calculus for evaluating limits. Limit Rule for Θ-notation: Given positive functions f (n) and g(n). Then f (n) ∈ Θ(n4 ).g. g(n) n→∞ for some constant c ≥ 0 (nonnegative but not inﬁnite). if lim f (n) = c. Polynomial Functions: Using the Limit Rule it is quite easy to analyze polynomial functions. the limit of a ﬁnite sum is the sum of the individual limits). then f (n) ∈ O(g(n)). and Ω-notation only enforces the lower bound.g. Since 2 is a strictly positive constant it follows from the limit rule that f (n) ∈ Θ(n2 ). f (n) ∈ Ω(n2 ) and in Ω(n) but not in Ω(n3 ). For example f (n) = 3n2 + 4n ∈ Θ(n2 ) but it is not in Θ(n) or Θ(n3 ). But f (n) ∈ O(n2 ) and in O(n3 ) but not in O(n). Limit Rule for Ω-notation: Given positive functions f (n) and g(n). You will see that O-notation only enforces the upper bound of the Θ deﬁnition. f (n) ∈ O(g(n)) means that f (n) grows asymptotically at the same rate or faster than g(n).) Most of the rules are pretty self evident (e. then f (n) ∈ Θ(g(n)). g(n) n→∞ g (n) where f (n) and g (n) denote the derivatives of f and g relative to n. if f (n) = c. n→∞ g(n) lim for some constant c > 0 (strictly positive but not inﬁnity). then o n→∞ lim f (n) f (n) = lim . The only exceptions that I know of are strange instances where the limit does not exist (e. f (n) ∈ O(g(n)) means that f (n) grows asymptotically at the same rate or slower than g(n). Intuitively. Whereas. if lim f (n) =0 g(n) n→∞ (either a strictly positive constant or inﬁnity) then f (n) ∈ Ω(g(n)). Also observe that f (n) ∈ Θ(g(n)) if and only if f (n) ∈ O(g(n)) and f (n) ∈ Ω(g(n)). Finally. f (n) = n(1+sin n) ). Lemma: Let f (n) = 2n4 − 5n3 − 2n2 + 4n − 7.

this is still considered quite efﬁcient. d i=0 ai ni . we will usually be happy to allow any number of additional logarithmic factors. and then applying L’Hˆ pital’s o rule. Since many problems require sorting the inputs. n ≤ 20). the informal rule of “keep the largest term and throw away the constant factors” is now much more evident. Conversely. 000). 19 . this applies to all reasonably large input sizes. where From this. we will try to avoid exponential running times at all costs.Lecture Notes CMSC 251 In fact. it is easy to generalize this to arbitrary polynomials. The terminology lgb n means (lg n)b .g. for small powers of logarithms. but you should be careful just how high the crossover point is. n ≤ 1. Exponentials and Logarithms: Exponentials and logarithms are very important in analyzing algorithms. Thus. n→∞ an n→∞ nc We won’t prove these. (E. For example: lg500 n ∈ O(n). by my calculations. The important bottom line is that polynomials always grow more slowly than exponentials whose base is greater than 1.) Also it is the time to ﬁnd an object in a sorted list of length n by binary search.g. b. Asymptotic Intuition: To get a intuitive feeling for what common asymptotic running times map into in terms of practical usage. Are their even bigger functions. lg500 n ≤ n only for n > 26000 (which is much larger than input size you’ll ever see). The following are nice to keep in mind. Theorem: Consider any asymptotically positive polynomial of degree p(n) = ad > 0. • Θ(1): Constant time. it should be mentioned that these last observations are really asymptotic results. • Θ(n!). lim For this reason. and c: nb lgb n =0 lim = 0. Lemma: Given any positive constants a > 1. it would be a good idea to try to get a more accurate average case analysis. For example lg2 n ≤ n for all n ≥ 16. if it means avoiding any additional powers of n. Then p(n) ∈ Θ(nd ). At this point. • Θ(n log n): This is the running time of the best sorting algorithms. you can’t beat it! • Θ(log n): This is typically the speed that most efﬁcient data structures operate in for a single access. But. Θ(nn ): Acceptable only for really small inputs (e. • Θ(2n ). They are true in the limit for large n. you should take this with a grain of salt. Θ(3n ): Exponential time. For example: n500 ∈ O(2n ). if you want to see a function that grows inconceivably fast.g. In case (2). This is only acceptable when either (1) your know that you inputs will be of very small size (e. here is a little list. or (2) you know that this is a worst-case running time that will rarely occur in practical instances. . but they can be shown by taking appropriate powers. These running times are acceptable either when the exponent is small or when the data size is not too large (e. Θ(n3 ).: Polynomial time. For this reason. . inserting a key into a balanced binary tree.. given that you need Θ(n) time just to read in all the data. • Θ(n2 ).. look up the deﬁnition of Ackerman’s function in our book. n ≤ 50). • Θ(n): This is about the fastest that an algorithm can run. For example.g. logarithmic powers (sometimes called polylogarithmic functions) grow more slowly than any polynomial. . You betcha! For example.

and • Combine (the pieces together into a global solution). It works top-down splitting up the list into smaller sublists. but we will show that there is an elegant mathematical concept. called a recurrence. that when someone ﬁrst suggests a problem to me. each of size roughly n/2. They work bottom-up. Divide: Split A down the middle into two subsequences. since the conquer part involves invoking the same technique on a smaller subproblem.g. In algorithm design. The “divide” phase is shown on the left. called MergeSort. This is called divide-and-conquer. which we will assume is stored in an array A[1 . the main elements to a divide-and-conquer solution are • Divide (the problem into a small number of pieces). In fact the technique is so powerful. solve the problem on each of the small pieces. 1998) Read: Chapt. merging sorted lists together into larger sorted lists.Lecture Notes CMSC 251 Lecture 6: Divide and Conquer and MergeSort (Thursday. the ﬁrst question I usually ask (after what is the brute-force solution) is “does there exist a divide-andconquer solution for this problem?” Divide-and-conquer algorithms are typically recursive. the idea is to take a problem on a large input. 1 (on MergeSort) and Chapt. Divide and Conquer: The ancient Roman politicians understood an important principle of good algorithm design (although they were probably not thinking about algorithms at the time). The “conquer and combine” phases are shown on the right. How can we apply divide-and-conquer to sorting? Here are the major elements of the MergeSort algorithm. • Conquer (solve each piece. But once you have broken the problem into pieces. MergeSort: The ﬁrst example of a divide-and-conquer algorithm which we will consider is perhaps the best known. This is normally done by permuting the elements within the array A. and then combine the piecewise solutions into a global solution. sorted in increasing order. You divide your enemies (by getting them to distrust each other) and then conquer them piece by piece. . which is useful for analyzing the sort of recursive programs that naturally arise in divide-and-conquer solutions. how do you solve these pieces? The answer is to apply divide-and-conquer to them. 4 (on recurrences). There are a huge number computational problems that can be solved efﬁciently using divide-andconquer. by applying divide-and-conquer recursively to it). The dividing process ends when we have split the subsequences down to a single item. It turns out that the merging process is quite easy to implement. The process ends when you are left with such tiny pieces remaining (e. This is a simple and very efﬁcient algorithm for sorting a list of numbers. break the input into smaller pieces. which merges together two sorted lists into a single sorted list. . Combine: Merge the two sorted subsequences into a single sorted list. Feb 12. Summarizing. and how to analyze them using recurrences. Analyzing the running times of recursive programs is rather tricky. The objective is to output a permutation of this sequence. Conquer: Sort each subsequence (by calling MergeSort recursively on each). one or two items) that it is trivial to solve them. We are given an sequence of n numbers A. 20 . thus further breaking them down. The following ﬁgure gives a high-level view of the algorithm. For the next couple of lectures we will discuss some examples of divide-and-conquer algorithms. n]. An sequence of length one is trivially sorted. The key operation where all the work is done is in the combine stage.

. we invoke a procedure (which we have yet to write) which merges these two subarrays into a single sorted array. q.. and leave them up to the programmer to ﬁgure out how to implement it. q) MergeSort(A. which indicate the ﬁrst and last indices of the subarray that we are to sort. then we just copy the rest of the other array into B.Lecture Notes CMSC 251 input: 7 5 2 4 1 6 3 0 1 6 3 0 1 6 1 6 3 0 3 0 output: 0 1 2 3 4 5 6 7 0 1 3 6 0 3 3 0 7 5 2 4 7 5 7 5 2 2 4 split 2 4 5 7 5 7 7 5 2 merge 2 4 4 1 1 6 6 4 Figure 4: MergeSort. compared to some of the other efﬁcient sorting algorithms. A[p. int p. Merge(A. but this is impossible to overcome entirely.q] // sort A[q+1.q]. A[q + 1. then this means that there is only one element to sort. Merge Merge(array A. int r) { // merges A[p. (One nice thing about pseudocode.r]. Otherwise (if p < r) there are at least two elements. If r = p. that point to the current elements of each subarray.. B[p. r) } } // we have at least 2 items // sort A[p.r] and return the sorted result in the same subarray. and then increments this variable by one. Because the algorithm is called recursively on sublists.q − 1] and A[q.q] and A[q + 1. It is one of the shortcomings of MergeSort. p.... Then we split the array into subarrays A[p.. MergeSort: Let’s design the algorithm top-down. Finally. MergeSort MergeSort(array A.) We have to indices i and j. in addition to passing in the array itself. is that we can make these assumptions. For convenience. q.r]. The call MergeSort(A. p. When we run out of elements in one array. We’ll assume that the procedure that merges two sorted list is available to us. the operator i++ returns the current value of i. midway between p and r. We merge these two subarrays by copying the elements to a temporary working array called B.) In case you are not aware of C notation.r]? Suppose r = p + 1. q+1. and we may return immediately. namely q = (p + r)/2 (rounded down to the nearest integer).. that is. Why would it be wrong to do A[p. We move the smaller element into the next position of B (indicated by index k) and then increment the corresponding index (either i or j).r] // merge everything together Merging: All that is left is to describe the procedure that merges two sorted lists.. r) will sort the subarray A[p. int r) { if (p < r) { q = (p + r)/2 MergeSort(A.. Finally. (The use of the temporary array is a bit unpleasant. r) Merge(A.r] 21 . r) assumes that the left subarray. we will pass in two indices. and the right subarray. (We need to be careful here. We ﬁnd the index q.. int p. We’ll implement it later. have already been sorted. and we will invoke the divide-andconquer. p.. we copy the entire contents of B back into A.) Call MergeSort recursively to sort each subarray. p. we will assume that the array B has the same index range A.q] with A[q+1. Here is the overview.r]. int q.

and (3) combine the solutions to the subproblems to a global solution. one of size n/2 and the other of size n/2 .Lecture Notes CMSC 251 Analysis: What remains is to analyze the running time of MergeSort. Recall that T (n) is the time to run MergeSort on a list of size n. 4 on recurrences. Let T (n) denote the worst case running time of MergeSort on an array of length n. Skip Section 4. How long does it take to sort the left subarray? We do not know this.) Now. which are deﬁned explicitly. then the total sorting time is a constant Θ(1). where r −p+1 = n.r] has n/2 elements in it. Today we discuss a number of techniques for solving recurrences. either exactly or asymptotically. Merge(A. Let’s see how to apply this to MergeSort. which contains q − p + 1 elements. the algorithm ﬁrst computes q = (p + r)/2 . Finally. because each execution copies one new element to the array B. Lecture 7: Recurrences (Tuesday. then we must recursively sort two sublists. q. we can just write T (n) = 1. then the running time is a constant. Similarly. we introduced the notion of a recurrence. Thus the remaining subarray A[q + 1. MergeSort Recurrence: Here is the recurrence we derived last time for MergeSort. it is important to develop mathematical techniques for solving recurrences.) Thus the running time to Merge n items is Θ(n).q]. Because divide-and-conquer is an important design technique. r).. that is. e. otherwise. since these will only differ by a constant factor. Since all of the real work is done in the Merge procedure. and because it naturally gives rise to recursive algorithms. a recurrence has some basis values (e. (We’ll see later why we do this. to merge both sorted lists takes n time. We also described MergeSort. we can express the time that it takes to sort the right subarray as T ( n/2 ). and the nonrecursive part took Θ(n) time for splitting the list (constant time) 23 . how do we describe the running time of the entire MergeSort algorithm? We will do this through the use of a recurrence. To avoid circularity. and B only has space for n elements. number of array accesses. (2) solve each subproblem recursively. In conclusion we have T (n) = 1 T ( n/2 ) + T ( n/2 ) + n if n = 1.4. the recurrence for a given value of n is deﬁned in terms of values that are strictly smaller than n. (If you are a bit more careful you can actually see that all the while-loops together can only be executed n times in total. Feb 17. for n = 1). that is. You can verify (by some tedious ﬂoor-ceiling arithmetic. First let us consider the running time of the procedure Merge(A.g. Finally. p. To do this. or simpler by just trying an odd example and an even example) that is of size n/2 . number of comparisons. If n > 1. When we call MergeSort with a list of length n > 1. First observe that if we call MergeSort with a list containing a single element. a sorting algorithm based on divide-and-conquer. p. r). What is the running time of Merge as a function of n? The algorithm contains four loops (none nested in the other). Since we are ignoring constant factors. The subarray A[p.g. a function that is deﬁned recursively in terms of itself. 1998) Read: Chapt.. by the comments made above. Let us write this without the asymptotic notation. Recall that the basic steps in divide-and-conquer solution are (1) divide the problem into a small number of subproblems. It is easy to see that each loop can be executed at most n times. we will count the total time spent in the Merge procedure. but because n/2 < n for n > 1. We argued that if the list is of length 1. we can express this as T ( n/2 ). simply as n. Let n = r − p + 1 denote the total length of both the left and right subarrays. a recursively deﬁned function. Divide and Conquer and Recurrences: Last time we introduced divide-and-conquer as a basic technique for designing efﬁcient algorithms. For concreteness we could count whatever we like: number of lines of pseudocode.

So whenever it is reasonable to do so. since the function will behave most regularly for these values. otherwise. Since the recurrence divides by 2 each time. Notice that we have dropped the Θ()’s. This is done to make the recurrence more concrete. replacing Θ(1) and Θ(n) by just 1 and n. If we consider the ratios T (n)/n for powers of 2 and interesting pattern emerges: T (1)/1 = 1 T (2)/2 = 2 T (4)/4 = 3 T (8)/8 = 4 T (16)/16 = 5 T (32)/32 = 6...g. T (1) = 1 (by the basis. This suggests that for powers of 2. respectively. Consequently logb n and loga n differ only by a constant factor. T (32) = T (2) + T (2) + 4 = 4 + 4 + 4 = 12 = T (3) + T (2) + 5 = 8 + 4 + 5 = 17 = T (4) + T (4) + 8 = 12 + 12 + 8 = 32 = T (8) + T (8) + 16 = 32 + 32 + 16 = 80 = T (16) + T (16) + 32 = 80 + 80 + 32 = 192. inside the Θ() we do not need to differentiate between them. T (8) .) T (2) = T (1) + T (1) + 2 = 1 + 1 + 2 = 4 T (3) = T (2) + T (1) + 3 = 4 + 1 + 3 = 8 T (4) T (5) . Thus. Recall the change of base formula: loga n . but here is a trick. Eliminating Floors and Ceilings: One of the nasty things about recurrences is that ﬂoors and ceilings are a pain to deal with. or equivalently. Getting a feel: We could try to get a feeling for what this means by plugging in some values and expanding the deﬁnition. and make whatever simplifying assumptions we like about n to make things work out..Lecture Notes CMSC 251 and merging the lists (Θ(n) time). but at least it provides us with a starting point. c1 and c2 n for some constants c1 and c2 . This is not a proof. Logarithms in Θ-notation: Notice that I have broken away from my usual convention of say lg n and just said log n inside the Θ(). T (n) = (n lg n) + n which is Θ(n log n).. Henceforth. e. I will not be fussy about the bases of logarithms if asymptotic results are sufﬁcient. let’s consider powers of 2. we will make the simplifying assumption that n is a power of 2. we will just forget about them. Notice that this means that our analysis will 24 . The reason is that the base really does not matter when it is inside the Θ. Thus.. T (n)/n = (lg n) + 1. the total running time for MergeSort could be described by the following recurrence: T (n) = 1 T ( n/2 ) + T ( n/2 ) + n if n = 1. logb n = loga b If a and b are constants the loga b is a constant. but we would arrive at the same asymptotic formula. T (16) .. The analysis would have been a bit more complex. If we had wanted to be more precise. we could have replaced these with more exact functions. It’s hard to see much of a pattern here.. For this case.

we cannot do the usual n to n + 1 proof (because if n is a power of 2. This has a recursive formula inside T (n/2) which we can expand. So let us restate our recurrence under this assumption: T (n) = 1 2T (n/2) + n if n = 1. When it works. T (n) = 2T (n/2) + n = 2(2T (n/4) + n/2) + n = 4T (n/4) + n + n = 4(2T (n/8) + n/4) + n + n = 8T (n/8) + n + n + n = 8(2T (n/16) + n/8) + n + n + n = 16T (n/16) + n + n + n + n = . otherwise. Here is what we get when we repeat this. Plugging this in gives T (n) = 2((n/2) lg(n/2) + (n/2)) + n = (n lg(n/2) + n) + n = n(lg n − lg 2) + 2n = (n lg n − n) + 2n = n lg n + n. n/2 < n.. By in large. But if the recurrence is at all messy. so we can apply the induction hypothesis here. Instead we use strong induction. you can usually approximate them by integrals). The method is called iteration. To do this. we need to express T (n) in terms of smaller values. n a power of 2.Lecture Notes CMSC 251 only be correct for a very limited (but inﬁnitely large) set of values of n. The Iteration Method: The above method of “guessing” a solution and verifying through induction works ﬁne as long as your recurrence is simple enough that you can come up with a good guess. yielding T (n/2) = (n/2) lg(n/2)+ (n/2). To do this. 25 . We want to prove the formula holds for n itself. we apply the deﬁnition: T (n) = 2T (n/2) + n. Let’s see if we can verify its correctness formally. Veriﬁcation through Induction: We have just generated a guess for the solution to our recurrence. but it turns out that as long as the algorithm doesn’t act signiﬁcantly different for powers of 2 versus other numbers. Proof: (By strong induction on n. summations are easier to solve than recurrences (and if nothing else. which matches. Now. Because n is limited to powers of 2. holds whenever n < n. n + 1 will generally not be a power of 2). We ﬁrst write out the deﬁnition T (n) = 2T (n/2) + n. by ﬁlling in the deﬁnition but this time with the argument n/2 rather than n. Let’s start expanding out the deﬁnition until we see a pattern developing. and assume that the formula T (n ) = (n lg n )+n . Induction step: Let n > 1. it allows you to convert a recurrence into a summation.. there may not be a simple formula. The proof will be by strong induction on n. Claim: For all n ≥ 1.) Basis case: (n = 1) In this case T (1) = 1 by deﬁnition and the formula gives 1 lg 1 + 1 = 1. T (n) = (n lg n) + n. which is exactly what we want to prove. the asymptotic analysis will hold for all n as well. The following method is quite powerful. Plugging in we get T (n) = 2(2T (n/4) + n/2) + n. We then simplify and repeat.

n + 3k−1 (n/4k−1 ) + · · · + 9(n/16) + 3(n/4) + n = 3k T 4k = 3 T k n 4k k−1 + i=0 3i n.7925 . The Iteration Method (a Messier Example): That one may have been a bit too easy to see the general form. Thus we have arrived at the same conclusion. This means that n = 2k . so nlog4 3 ≈ n0. 26 . (k times) Now.79. otherwise. If we substitute this value of k into the equation we get T (n) = = 2(lg n) T (n/(2(lg n) )) + (lg n)n 2(lg n) T (1) + n lg n = 2(lg n) + n lg n = n + n lg n. we have generated alot of equations. ≈ 0. We know that T (1) = 1. because we need to get rid of the T () from the right-hand side. So. but this time no guesswork was involved. 4i As before.79 . until a pattern emerges.. and the fact that T (1) = 1. T (n) = = 2k T (n/(2k )) + (n + n + · · · + n) 2k T (n/(2k )) + kn. In simplifying this. we have made use of the formula from the ﬁrst homework. Thus. Let’s try a messier recurrence: T (n) = 1 3T (n/4) + n if n = 1. log4 3 = 0. the idea is to repeatedly apply the deﬁnition. To get rid of it we recall that we know the value of T (1).Lecture Notes CMSC 251 At this point we can see a general pattern emerging. but we still haven’t gotten anywhere. we have the recursive term T (n/4k ) still ﬂoating around. and so we set n/4k = 1 implying that 4k = n. in the last step we used the formula alogb n = nlogb a where a = 3 and b = 4. T (n) = 3T (n/4) + n = 3(3T (n/16) + n/4) + n = 9T (n/16) + 3(n/4) + n = 9(3T (n/64) + n/16) + 3(n/4) + n = 27T (n/64) + 9(n/16) + 3(n/4) + n = . . let us select k to be a value which forces the term n/(2k ) = 1. To avoid problems with ﬂoors and ceilings. k = log4 n. 4i Again. where a = b = 2. As before. (Why did we write it this way? This emphasizes that the function is of the form nc for some constant c. that is. plugging this value in for k we get: T (n) = 3log4 n T (1) + (log4 n)−1 i=0 3i n 4i = nlog4 3 + (log4 n)−1 i=0 3i n.. implying that k = lg n. . Here’s how we can do that. we’ll make the simplifying assumption here that n is a power of 4.) By the way. alogb n = nlogb a .

We also discussed two methods for solving recurrences. The ﬁrst provides a way of visualizing recurrences and the second. So the ﬁnal result (at last!) is T (n) = 4n − 3nlog4 3 ≈ 4n − 3n0. and it is difﬁcult to get a quick and intuitive sense of what is going on in the recurrence. Note that this is a geometric series. skip Section 4.Lecture Notes CMSC 251 We have this messy summation to solve though. It is interesting to note the unusual exponent of log4 3 ≈ 0. Recap: Last time we discussed recurrences.79 ∈ Θ(n). called the Master Theorem.4. T (n) = nlog4 3 + n (log4 n)−1 i=0 3 4 i . is a method of solving many recurrences that arise in divide-and-conquer applications. We discussed their importance in analyzing divide-and-conquer algorithms. as we shall see. Feb 19. (3/4) − 1 Applying our favorite log identity once more to the expression in the numerator (with a = 3/4 and b = 4) we get (3/4)log4 n = nlog4 (3/4) = n(log4 3−log4 4) = n(log4 3−1) = If we plug this back in. First observe that the value n remains constant throughout the sum. namely guess-and-verify (by induction). However. that is.79. We get T (n) = nlog4 3 + n (3/4)log4 n − 1 . n +n nlog4 3 n = nlog4 3 − 4(nlog4 3 − n) = nlog4 3 + 4(n − nlog4 3 ) = 4n − 3nlog4 3 . and iteration. These are both very powerful methods. but they are quite “mechanical”. We may apply the formula for the geometric series. 27 . and so we can pull it out front. so it seems remarkable that we could generate a strange exponent like 0. Today we will discuss two more techniques for solving recurrences. xi = x−1 i=0 In this case x = 3/4 and m = log4 n − 1. this is often the case in divide-and-conquer recurrences.79 as part of a running time. and three nested loops typically leads to Θ(n3 ) time. which gave in an earlier lecture. 1998) Read: Chapt. For x = 1: m xm+1 − 1 . Also note that we can write 3i /4i and (3/4)i . functions that are deﬁned recursively. Lecture 8: More on Recurrences (Thursday. we have T (n) = n −1 (3/4) − 1 nlog4 3 − n = nlog4 3 + −1/4 log4 3 nlog4 3 . 4 on recurrences. We have seen that two nested loops typically leads to Θ(n2 ) time.

1 n (lg n +1) Figure 5: Using the recursion tree to visualize a recurrence. T (n) = 1 3T (n/2) + n2 if n = 1. .Lecture Notes CMSC 251 Visualizing Recurrences Using the Recursion Tree: Iteration is a very powerful technique for solving recurrences. for a total of 3(n/2)2 . 28 . This leads to the following summation. Suppose that we draw the recursion tree for MergeSort. For the top level (or 0th level) the work is n2 . Below is an illustration of the resulting recursion tree. For example. T(n) n T(n/2) n/2 T(n/4) n/4 n/4 n/4 T(n/2) n/2 n/4 =n + 2( n /2) = n + 4( n /4) = n + . Observe that the total work at the topmost level of the recursion is Θ(n) (or just n for short). we have 3i nodes. we label each node with the amount of work at that level. 1. This continues until the bottommost level of the tree. let’s try it on the following recurrence. This can be written as n2 (3/4). each involving (n/2i )2 work... Since the tree exactly lg n + 1 levels (0. ? T (n) = n2 i=0 3 4 i . Again. . At the second level we have two merges. it is easy to get lost in all the symbolic manipulations and lose sight of what is going on. In general it is easy to extrapolate to see that at the level i. At the level 2 the work is 9(n/4)2 . We can describe any recurrence in terms of a tree. . But. n(n/n) = n lg n + 1 levels 1 1 1 . In this case the work for T (m) is m2 . . Recall that the recurrence for MergeSort (which we simpliﬁed by assuming that n is a power of 2. so we have left off the upper bound on the sum. otherwise. At level 1 we have three nodes whose work is (n/2)2 each. we label that node of the tree with the time it takes to perform the associated (nonrecursive) merge.. for a total of 3i (n/2i )2 = n2 (3/4)i . otherwise. which we will just write as m. Here is a nice way to visualize what is going on in iteration. At the third level we have 4 merges. This can be used for a number of simple recurrences. and each level contributes a total of n time. but each time we merge two lists. for a total of 2(n/2) = n.. which can be written as n2 (9/16). each taking n/4 time. where each expansion of the recurrence takes us one level deeper in the tree. Note that we have not determined where the tree bottoms out. and hence could drop the ﬂoors and ceilings) T (n) = 1 2T (n/2) + n if n = 1. The tree is illustrated below. lg n). This is exactly what we got by the iteration method. 2. the total running time is n(lg n + 1) = n lg n + n. for a total of 4(n/4) = n. each taking n/2 time. Recall that to merge two lists of size m/2 to a list of size m takes Θ(m) time.

.) T (n) = n2 (3/4)lg n+1 − 1 = − 4n2 ((3/4)lg n+1 − 1) (3/4) − 1 = 4n2 (1 − (3/4)lg n+1 ) = 4n2 (1 − (3/4)(3/4)lg n ) = 4n2 (1 − (3/4)nlg(3/4) ) = 4n2 (1 − (3/4)nlg 3−lg 4 ) = 4n2 (1 − (3/4)nlg 3−2 ) = 4n2 (1 − (3/4)(nlg 3 /n2 )) = 4n2 − 3nlg 3 .) So. we can plug in lg n for the “?” in the above summation. and the base (3/4) is less than 1. but this sort of small error will not affect the asymptotic result. This means that this series converges to some nonzero constant (even if we ran the sum out to ∞). and 29 . we could plug the summation into the formula for the geometric series and simplify. it is not hard to see that the ﬁnal level is level lg n. observe that the recursion bottoms out when we get down to single items. but at this point it is all just tedious algebra to get the formula into simple form. lg n T (n) = n2 i=0 3 4 i . n 2 (3/4)i T(n/4) (n/4)2 (n/4)2 (n/4)2 (n/4)2 (n/4)2 (n/4)2 Figure 6: Another recursion tree example. then are essentially done at this point. (This sort of algebraic is typical of algorithm analysis. Why? The summation is a geometric series. the technique of drawing the recursion tree is a somewhat more visual way of analyzing summations... In conclusion. so the whole expression is Θ(n2 ). where each subproblem is roughly a factor of 1/b of the original problem size. Thus the running time is Θ(n2 ). so be sure that you follow each step. (It is easy to be off by ±1 here. but it is really equivalent to the method of iteration. and since the sizes of the inputs are cut by half at each level. Note that lg 3 ≈ 1. If we wanted to get a more exact answer. T(n/2) (n/2) 2 = n2 + 3(n/2) 2 = n2 (3/4) + 2 = n2 (9/16) 9(n/4) + . This would lead to an expression like T (n) = n2 (3/4)lg n+1 − 1 . To get a more exact result. In this case we happen to be right. (3/4) − 1 This will take some work to simplify.58. In general you are breaking a problem into a subproblems..Lecture Notes CMSC 251 T(n) n2 T(n/2) (n/2) 2 T(n/2) (n/2) 2 . you will see that the same general type of recurrence keeps popping up. (Simpliﬁed) Master Theorem: If you analyze many divide-and-conquer algorithms. If all we wanted was an asymptotic expression.

but this not uncommon for many divide-and-conquer solutions. and at each successive level of the tree the work at this level reduces (by some constant factor). can we just come up with a general solution? The answer is that you can if all you need is an asymptotic expression. a = bk (2 = 21 ) and so Case 2 applies. The basis case. Case 3 will apply. This result is called the Master Theorem. iteration works just ﬁne here. Thus. T (1) can be any constant value. T (n) = 3T (n/2) + n2 .26 ). or some other technique. In this instance most of the work was concentrated at the root of the tree. However. at the bottom of the tree (the leaf level). There are many recurrences that cannot be put into this form. namely Θ(log n). and so Case 1 applies. If the one from the book doesn’t apply. For example. for a total of Θ(n log n). b = 3 and k = 1. For this reason the total work is equal to this value times the height of the tree. Each level of the tree provided a smaller fraction of work. Our text gives a fairly complicated version of this theorem. otherwise. b = 2 and k = 2. From this we have T (n) ∈ Θ(n log n). This may seem to be a rather strange running time (a non-integer exponent). Case 3: if a < bk then T (n) ∈ Θ(nk ). 30 . b = 2. because it can be used to “master” so many different types of recurrence. in which we have a = 4. By the nature of the geometric series. b = 2. and so Case 3 applies. Recursion Trees Revisited: The recursion trees offer some intuition about why it is that there are three cases in the Master Theorem. the geometric series that result will converge to a constant value. In this case we have a > bk (4 > 31 ). This is an important observation to make. and k = 1. it did not matter how many levels the tree had at all. which is general enough for most typical situations. deﬁned for n ≥ 0. Theorem: (Simpliﬁed Master Theorem) Let a ≥ 1. In the recurrence above. Finally. try the one from the book. b > 1 be constants and let T (n) be the recurrence T (n) = aT (n/b) + nk . we might have seen a recurrence of the form T (n) = 1 2T (n/2) + n log n if n = 1. For example. A common way to design the most efﬁcient divide-and-conquer algorithms is to try to arrange the recursion so that most of the work is done at the root. Generally speaking the question is where is most of the work done: at the top of the tree (the root level). but the Master Theorem (neither this form nor the one in CLR) will tell you this. in the MergeSort recurrence (which corresponds to Case 2 in the Master Theorem) every level of the recursion tree provides the same total work. or spread equally throughout the entire tree. a = 2.Lecture Notes CMSC 251 the time it takes to do the splitting and combining on an input of size n is Θ(nk ). namely n. We will give a simpler version. From this we have T (n) ∈ Θ(n2 ). in MergeSort. and k = 1. Next consider the earlier recurrence T (n) = 3T (n/2) + n2 (which corresponds to Case 3 in the Master Theorem). Using this version of the Master Theorem we can see that in the MergeSort recurrence a = 2. As long as this is the case. consider the recurrence T (n) = 4T (n/3) + n. if the splitting and combining steps involve sorting. For example. Case 2: if a = bk then T (n) ∈ Θ(nk log n). Even with an inﬁnite number of levels. From this we have T (n) ∈ Θ(nlog3 4 ) ≈ Θ(n1. We have a < bk (3 < 22 ) in this case.) Then Case 1: if a > bk then T (n) ∈ Θ(nlogb a ). we have a = 3. Rather than doing every such recurrence from scratch. then you will probably need iteration. (As usual let us assume that n is a power of b. In cases where this doesn’t apply. This solves to T (n) = Θ(n log2 n).

31 . Today we will focus on the following generalization. In statistics it is common to return the average of these two elements. Thus. For example. Of particular interest in statistics is the median. Our presentation of the partitioning algorithm and analysis are somewhat different from the ones in the book. The Sieve Technique: The reason for introducing this algorithm is that it illustrates a very important special case of divide-and-conquer. the terms are growing successively larger.2. In particular. The question is whether it is possible to do better. it is often desirable to partition a set about its median value.2 and 10. Today we will give a rather surprising (and very tricky) algorithm which shows the power of these techniques. the median income in a community is likely to be more meaningful measure of the central tendency than the average is. Lecture 9: Medians and Selection (Tuesday. You are not responsible for the randomized analysis of Section 10. since if Bill Gates lives in your community then his gigantic income may signiﬁcantly bias the average. 1 ≤ k ≤ n. since in divide-and-conquer applications. which I call the sieve technique. namely the elements of ranks n/2 and (n/2) + 1. The minimum is of rank 1 and the maximum is of rank n. Feb 24. This can be seen if you perform iteration on this recurrence. (You might try this to see if you get the same result. We think of divide-and-conquer as breaking the problem into a small number of smaller subproblems. the rank of an element is its ﬁnal position if the set is sorted.3. Suppose that you are given a set of n numbers. output the element of A of rank k. Medians are useful as measures of the central tendency of a set. The problem that we will consider is very easy to state. The sieve technique is a special case. The selection problem can easily be solved in Θ(n log n) time. Selection: Given a set A of n distinct numbers and an integer k. we will make the simplifying assumption that all the elements are distinct for now. which are then solved recursively. especially when the distribution of values is highly skewed. as we go deeper into the levels of the tree. called the selection problem. They are also useful. The largest contribution will be from the leaf level. simply by sorting the numbers of A. Selection: In the last couple of lectures we have discussed recurrences and the divide-and-conquer method of solving problems. that is deeper into the summation. It will be easy to get around this assumption later. the resulting summation is log3 n n i=0 4 3 i . 1998) Read: Todays material is covered in Sections 10. in the recurrence T (n) = 4T (n/3) + n (which corresponds to Case 1). Since duplicate elements make our life more complex (by creating multiple elements of the same rank). but surprisingly difﬁcult to solve optimally.) Since 4/3 > 1. most of the work is done at the leaf level of the recursion tree. Deﬁne the rank of an element to be one plus the number of elements that are smaller than this element. When n is even there are two natural choices.Lecture Notes CMSC 251 Finally. whereas it cannot have a signiﬁcant inﬂuence on the median. If n is odd then the median is deﬁned to be the element of rank (n + 1)/2. and then returning A[k]. where the number of subproblems is just 1. is it possible to solve this problem in Θ(n) time? We will see that the answer is yes. We will deﬁne the median to be either of these elements. into two sets of roughly equal size. and the solution is far from obvious.

and we want to ﬁnd the kth smallest item (where k ≤ r − p + 1).. and we may just return it. but they differ only in one step. Later we will see how to choose x..n] and an integer k.r] > x 5 9 2 6 4 1 3 7 Initial (k=6) 3 1 2 4 x_rnk=4 6 9 5 7 Partition (pivot = 4) 6 9 5 7 6 5 7 x_rnk=3 9 Partition (pivot = 7) 6 5 5 6 x_rnk=2 (DONE!) Recurse (k=6−4=2) Recurse (k=2) Partition (pivot = 6) Figure 7: Selection Algorithm. taking say Θ(nk ) time. Applying the Sieve to Selection: To see more concretely how the sieve technique works.. 0. let us apply it to the selection problem. return it 32 . Here is the complete pseudocode. and A[q + 1. n/3. Each of the resulting recursive solutions then do the same thing.r] as we did in MergeSort. which involves judiciously choosing an item from the array. (Recall that we assumed that all the elements are distinct.. so we want to ﬁnd the element of rank k − q. but for now just think of it as a random element of A. subarray A[p. int k) { if (p == r) return A[p] // return kth smallest of A[p. Selection Select(array A. This is illustrated below. In particular “large enough” means that the number of items is at least some ﬁxed constant fraction of n (e. eliminating a constant fraction of the remaining set.q − 1] and if k > x rnk then we need to recursively search A[q + 1. which we will denote by x.r].q−1] < x A[q+1. and want to ﬁnd the k-th smallest element of A. we will assume that we are given a subarray A[p. then the pivot is the kth smallest. There are two principal algorithms for solving the selection problem. for some constant k.. We then partition A into three parts.. n/2.r]. and can be eliminated from further consideration. pivot p r 5 9 2 6 4 1 3 7 Before partitioing p q r 3 1 2 4 6 9 5 7 After partitioing x A[p.. we ﬁnd that we do not know what the desired item is. It is easy to see that the rank of the pivot x is q −p+1 in A[p. then we know that we need to recursively search in A[p. int r. If k < x rnk .. Recall that we are given an array A[1. called the pivot element. Since the algorithm will be applied inductively. Let x rnk = q −p +1 .r] // only 1 item left.) Within each subarray. Then we solve the problem recursively on whatever items remain. will contain all the element that are greater than x. A[q] contains the element x.Lecture Notes CMSC 251 The sieve technique works in phases as follows.n]..q − 1] will contain all the elements that are less than x. If k = x rnk .0001n).r].g. We do not know which item is of interest.. but we can identify a large enough number of elements that cannot be the desired value. the items may appear in any order.. The initial call will be to the entire array A[1. however after doing some amount of analysis of the data. It applies to problems where we are interested in ﬁnding a single item from a larger set of n items. In this latter case we have eliminated q smaller elements. int p.

but assume for now that they both take Θ(n) time. r) // choose the pivot element q = Partition(A. This would lead to the following recurrence. so the total running time is Θ(n). We can solve this either by expansion (iteration) or the Master Theorem. then the recurrence would have been T (n) = T (99n/100)+n. because we may only eliminate it and the few smaller or larger elements of A. The question that remains is how many elements did we succeed in eliminating? If x is the largest or smallest element in the array.Lecture Notes CMSC 251 else { x = Choose_Pivot(A. q+1. if x is one of the smallest elements of A or one of the largest. Let us suppose for now (optimistically) that we are able to design the procedure Choose Pivot in such a way that is eliminates exactly half the array with each phase. and recurses on the rest. which shows that the total running time is indeed linear in n. p. x) // partition <A[p. we get this convergent geometric series in the analysis. Using this we have T (n) ≤ 2n ∈ O(n). In fact. (This only proves the upper bound on the running time. If we eliminated even one per cent. i 2 2i i=0 ∞ i i=0 c ∞ Recall the formula for the inﬁnite geometric series.) This is a bit counterintuitive. In this algorithm we make many passes (it could be as many as lg n). then we may only succeed in eliminating one element with each phase. For any c such that |c| < 1. Note that the assumption of eliminating half was not critical.. q-1. p. It is often possible to achieve running times in ways that you would not expect. Normally you would think that in order to design a Θ(n) time algorithm you could only make a single. k-x_rnk)// select from right subarray } } Notice that this algorithm satisﬁes the basic form of a sieve algorithm. or perhaps a constant number of passes over the data set.r]> x_rnk = q . r. Otherwise we either eliminate the pivot and the right subarray or the pivot and the left subarray. = 1/(1 − c). A[q+1. k) // select from left subarray else return Select(A. then we get into trouble. but it is easy to see that it takes at least Ω(n) time. However. otherwise. T (n) = 1 T (n/2) + n if n = 1. r.. This lesson is well worth remembering. When k = x rnk then we get lucky and eliminate everything. Eliminating any constant fraction would have been good enough. We will discuss the details of choosing the pivot and partitioning later.p + 1 // rank of the pivot if (k == x_rnk) return x // the pivot is the kth smallest else if (k < x_rnk) return Select(A. p. x. because we eliminate a constant fraction of elements with each phase. which is still less than 1. It analyzes the data (by choosing the pivot element and partitioning) and it eliminates some part of the data set. If we expand this recurrence level by level we see that we get the summation T (n) = n + n n + + ··· ≤ 2 4 ∞ i=0 n 1 = n . and we would have gotten a geometric series involving 99/100. meaning that we recurse on the remaining n/2 elements. 33 . implying a convergent series. Ideally x should have a rank that is neither too large nor too small.q-1].

n]. Return x as the desired pivot.n].. then the larger subarray will never be larger than 3n/4. The second problem is a rather easy programming exercise. For example.Lecture Notes CMSC 251 Choosing the Pivot: There are two issues that we have left unresolved. we could just BubbleSort each group and take the middle element. and k = (m + 1)/2 . we will discuss partitioning in detail. respectively. This is a bit complicated. then we may have to recurse on the left subarray of size q − 1. For this. if n/4 ≤ q ≤ 3n/4.n] > x. Actually this works pretty well in practice. For the rest of the lecture. Otherwise. so in the worst case. where m = n/5 . and if q < n/2. since each group has only a constant number of elements. e. Instead. Later. k will be chosen so that we have to recurse on the larger of the two subarrays..10].g. e. Observe that at least half of the group medians are less than or equal to x. Let’s consider the top of the recurrence. since the details are rather involved.5]. 1. we need to argue that x satisﬁes the desired rank properties. we need to show that there are at least n/4 elements that are less than or equal to x. we will describe a rather complicated method for computing a pivot element that achieves the desired properties. Copy the group medians to a new array B.15]. Let x be this median of medians. Actually this is not such a bad idea. Of course. Recall that we are given an array A[1. Lemma: The element x is of rank at least n/4 and at most 3n/4 in A. The partitioning algorithm will split the array into A[1. There will be m group medians. Thus if q > n/2.g.. Consider the groups shown in the tabular form above. so to simplify things we will assume that n is evenly divisible by 5. To establish the correctness of this procedure.. We will return to this later. but a careful analysis will show that the expected running time is still Θ(n). Earlier we said that we might think of the pivot as a random element of the array A. A[11. let’s concentrate on how to choose the pivot. The reason is that roughly half of the elements lie between ranks n/4 and 3n/4. when we discuss QuickSort. k). (Because x is 34 . and we want to compute an element x whose rank is (roughly) between n/4 and 3n/4. Median of medians: Compute the median of the group medians. Proof: We will show that x is of rank at least n/4. Recall that before we said that we might think of the pivot as a random element of A. If we could select q so that it is roughly of middle rank.q − 1] < x. we need to search one of the two subarrays. They are of sizes q − 1 and n − q. To do this... we are in trouble if q is very small. The algorithm is illustrated in the ﬁgure below. If k = q. then we will be in good shape. and the second is how to partition the array. The subarray that contains the kth smallest element will generally depend on what k is. Suppose that the pivot x turns out to be of rank q in the array. We will have to describe this algorithm at a very high level. then we may have to recurse on the right subarray of size n − q. due to the ﬂoor and ceiling arithmetic. when we are given A[1. Each will take Θ(1) time. or if q is very large. A[6. This can easily be done in Θ(n) time. Here is the description for Select Pivot: Groups of 5: Partition A into groups of 5 elements. we might be continuously unlucky. We do not need an intelligent algorithm to do this.. Let’s see why. and repeating this n/5 times will give a total running time of Θ(n). Both need to be solved in Θ(n) time. The other part of the proof is essentially symmetrical. etc. we will have to call the selection algorithm recursively on B. There will be exactly m = n/5 such groups (the last one might have fewer than 5 elements). The ﬁrst is how to choose the pivot element. Select(B. A[q] = x and A[q + 1. In either case. The key is that we want the procedure to eliminate at least some constant fraction of the array after each partitioning step. A[1. then we are done. For example. so picking a random element as the pivot will succeed about half the time to eliminate at least n/4. Group medians: Compute the median of each group of 5. m.

Analysis: The last order of business is to analyze the running time of the overall algorithm. such that T (n) ≤ cn. Theorem: There is a constant c. there are three elements that are less than or equal to this median within its group (because it is the median of its group). T (n) ≤ T (n/5) + T (3n/4) + n otherwise. By deﬁnition we have T (n) = T (n/5) + T (3n/4) + n. in order to achieve this. This mixture will make it impossible to use the Master Theorem. and write the Θ(n) as n for concreteness. We achieved the main goal. We know we want an algorithm that runs in Θ(n) time. this is a good place to apply constructive induction. However. The recursive call in Select() will be made to list no larger than 3n/4. We will then show that T (n) ≤ cn. As usual. we will ignore ﬂoors and ceilings. However. Therefore. The running time is 1 if n = 1. and difﬁcult to apply iteration. Step: We assume that T (n ) ≤ cn for all n < n. their median. This is a very strange recurrence because it involves a mixture of different fractions (n/5 and 3n/4). Everything else took only Θ(n) time. 30 is the ﬁnal pivot. 35 . within Select Pivot() we needed to make a recursive call to Select() on an array B consisting of n/5 elements.) And for each group median. Proof: (by strong induction on n) Basis: (n = 1) In this case we have T (n) = 1. namely that of eliminating a constant fraction (at least 1/4) of the remaining list at each stage of the algorithm. there are at least 3((n/5)/2 = 3n/10 ≥ n/4 elements that are less than or equal to x in the entire array.Lecture Notes CMSC 251 14 57 24 6 37 32 2 43 30 25 23 52 12 63 3 5 44 17 34 64 10 27 48 8 19 60 21 1 55 41 29 11 58 39 6 14 24 37 57 2 25 30 32 43 3 12 23 52 63 5 17 34 44 64 8 10 19 27 48 1 21 41 55 60 11 29 39 58 Group Get group medians 8 10 19 27 48 3 12 23 52 63 6 14 24 37 57 2 25 30 32 43 5 17 34 44 64 11 29 39 58 1 21 41 55 60 Get median of medians (Sorting of group medians is not really performed) Figure 8: Choosing the Pivot. and so T (n) ≤ cn as long as c ≥ 1.

which can be quite costly when lots of long multiplications are needed.Lecture Notes CMSC 251 Since n/5 and 3n/4 are both less than n. giving T (n) ≤ c = 3n 1 3 n +c + n = cn + 5 4 5 4 19 19c cn + n = n +1 . and multiplication in particular. since for centuries people have used the same algorithm that we all learn in grade school. let’s just assume that the number of digits n is a power of 2. and encryption keys are stored as long integers. from 1:00-2:00.. I’ll have ofﬁce hours from 2:00-4:00 on Monday. Addition and subtraction on large numbers is relatively easy. This raises the question of whether there is a more efﬁcient way to multiply two very large numbers. Let m = n/2. Let A and B be the two numbers to multiply. Because of the way we write numbers. But the standard algorithm for multiplication runs in Θ(n2 ) time.. Combining the constraints that c ≥ 1. Feb 26. but if we partition an n digit number into two “super digits” with roughly n/2 each into longer sequences. the character string to be encrypted is converted into a sequence of numbers. In fact. It would seem surprising if there were.0] rather than the usual A[0.) Lecture 10: Long Integer Multiplication (Thursday. we will see that it is possible. Kyongil.0].m] 36 x = A[m − 1. then these algorithms run in Θ(n) time. or 6 and see what happens. Solving for c we see that this is true provided that c ≥ 20. Efﬁcient encryption and decryption depends on being able to perform arithmetic on long numbers. the same multiplication rule still applies. and c ≥ 20. we are done.. 1998) Read: Todays material on integer multiplication is not covered in CLR. A natural question is why did we pick groups of 5? If you look at the proof above. you will see that it works for any value that is strictly greater than 4. Let A[0] denote the least signiﬁcant digit and let A[n − 1] denote the most signiﬁcant digit of A. 20 20 +n This last expression will be ≤ cn. . Divide-and-Conquer Algorithm: We know the basic grade-school algorithm for multiplication..m] = B[n − 1. (Go back and analyze your solution to the problem on Homework 1).. Most techniques for encryption are based on number-theoretic techniques. will have extra ofﬁce hours on Monday before the midterm. it is more natural to think of the elements of A as being indexed in decreasing order from left to right as A[n − 1. we see that by letting c = 20. we can apply the induction hypothesis. To avoid complicating things with ﬂoors and ceilings. If n is the number of digits. typically containing hundreds of digits. Long Integer Multiplication: The following little algorithm shows a bit more about the surprising applications of divide-and-conquer. 4. provided that we select c such that c ≥ (19c/20) + 1. Let w y = A[n − 1.. (You might try it replacing the 5 with 3. The problem that we want to consider is how to perform arithmetic on long integers. For example.0] and z = B[m − 1.n − 1]. Ofﬁce hours: The TA. We normally think of this algorithm as applying on a digit-by-digit basis. The reason for doing arithmetic on long numbers stems from cryptography.

we see that a = 4. It shows that the critical element is the number of multiplications on numbers of size n/2.) The key turns out to be a algebraic “trick”. y and z as n/2 digit numbers. if we could ﬁnd a way to arrive at the same result algebraically. D = xz. 37 . otherwise. B) = mult(w. since the number of additions must be a constant. z) mult((w + x). we cannot simulate multiplication through repeated additions. z). = = w · 10m + x y · 10m + z. The number of additions (as long as it is a constant) does not affect the running time. x. If we apply the Master Theorem. and so they take Θ(n) time each. b = 2. y)102m + (mult(w. y) mult(x. we can get everything we want. we can express A and B as A B and their product is mult(A. it took us four multiplications to compute these. we can express the multiplication of two long integers as the result of 4 products on integers of roughly half the length of the original. using only three multiplications (but with more additions and subtractions). and a > bk . (Of course. z) + mult(x. but by trading off multiplications in favor of additions.Lecture Notes CMSC 251 n n/2 w y wz wy wy xy wz + xy xz Product n/2 x z xz A B Figure 9: Long integer multiplication. observe that if instead we compute the following quantities. each taking Θ(n) time. its running time would be given by the following recurrence T (n) = 1 4T (n/2) + n if n = 1. Thus. and a constant number of additions and shifts. However. and E = (wz + xy). If we think of w. it actually has given us an important insight. Observe that all the additions involve numbers involving roughly n/2 digits. independent of n. then we would have a more efﬁcient algorithm. implying that Case 1 holds and the running time is Θ(nlg 4 ) = Θ(n2 ). Unfortunately. this is no better than the standard algorithm. y))10m + mult(x. The operation of multiplying by 10m should be thought of as simply shifting the number over by m positions to the right. (y + z)) − C − D = (wy + wz + xy + xz) − wy − xz = (wz + xy). Above. The quantities that we need to compute are C = wy. k = 1. and so is not really a multiplication. Faster Divide-and-Conquer Algorithm: Even though the above exercise appears to have gotten us nowhere. C D E = = = mult(w. This suggests that if we were to implement this algorithm. So.

yielding T (n) ∈ Θ(nlg 3 ) ≈ Θ(n1. It is a good idea to check out related chapters in the book.585 versus n2 ) then this algorithm beats the simple algorithm for n ≥ 50. Average-case: Recall that a worst-case means that we consider the highest running time over all inputs of size n.585 ). B) = C · 102m + E · 10m + D.g.) Recurrences: Know how to analyze the running time of a recursive program by expressing it as a recurrence. But asymptotics says that if n is large enough. then this algorithm will be superior. Worst-case. iteration. Review the basic techniques for solving recurrences: guess and verify by induction (I’ll provide any guesses that you need on the exam). mean informally. These include the arithmetic series i i. 3 of CLR. then the crossover would occur for n ≥ 260. and 2 subtractions all of numbers with n/2 digitis. Also be sure you can apply the integration rule to summations. If the overhead was 10 times larger. the geometric series i xi . (Chapt. i i2 . Also be sure you understand how the constructive induction proofs worked. Review for the Midterm: Here is a list topics and readings for the ﬁrst midterm exam. and Ω. Know the what the other forms. The additions. 2 of CLR.Lecture Notes CMSC 251 Finally we have mult(A. if we assume that the clever algorithm has overheads that are 5 times greater than the simple algorithm (e. otherwise.) (Chapt 4. we have a = 3. the quadratic series. Practice with simplifying summations.) Asymptotics: Know the formal deﬁnitions for Θ. average case means that we average running times over all inputs of size n (and generally weighting each input by its probability of occuring). (You are NOT responsible for the more complex version given in the text. and shifts take Θ(n) time in total. and the harmonic series i 1/i. n→∞ nc lim (Chapt. can you see why?) Remember the following rule and know how to use it. Skip 4. Altogether we perform 3 multiplications. Generally you are responsible for anything discussed in class. subtractions. nb =0 n→∞ an lim lgb n = 0.4. and anything appearing on homeworks. b = 2 and k = 1. Is this really an improvement? This algorithm carries a larger constant factor because of the overhead of recursion and the additional arithmetic operations. (Chapt 1 of CLR. 4 additions. Also be able to rank functions √ √ in asymptotic order. There are a number of good sample problems in the book.) 38 . For example. I’ll be happy to check any of your answers. So the total running time is given by the recurrence: T (n) = 1 3T (n/2) + n if n = 1. O. 5n1. For example which is larger lg n or lg n? (It is the former. as well as how to use the limit-rule. For example. Now when we apply the Master Theorem.) General analysis methods: Be sure you understand the induction proofs given in class and on the homeworks. be sure that you can take something like n 2 3i i 2 i and simplify it to a geometric series n2 i (3/4)i . Summations: Write down (and practice recognizing) the basic formulas for summations. because this is where I often look for ideas on problems. o and ω. We still need to shift the terms into their proper ﬁnal positions. constructive induction. and the (simpliﬁed) Master Theorem.

There are some domains that can be partially ordered. x = y or x ⊃ y. No lecture. Totally ordered means that for any two elements of the domain. March 3. There is an algorithm called topological sorting which can be applied to “sort” partially ordered sets. x =. x. In the sorting problem we are given an array A[1.) Lecture 11: First Midterm Exam (Tuesday. but not totally ordered. and how it was used in the selection problem. it is not true that for any two sets either x ⊂ y. Some possess certain desirable properties. Lecture 12: Heaps and HeapSort (Thursday. Understand the divide-and-conquer algorithm for MergeSort. or x > y. It can be any object from a totally ordered domain. First. 39 . Find the smallest element in A[i. and others do not. and then swap it with A[i]. Finally sorting is one of the few problems where there provable lower bounds on how fast you can sort. ⊂. These algorithms are all easy to implement. skip the randomized analysis. Slow Sorting Algorithms: There are a number of well-known slow sorting algorithms. 1998) Read: Chapt 7 in CLR. The key value need not be a number. either x < y. A is of an array of records. We have already seen that MergeSort sorts an array of numbers in Θ(n log n) time. 1998) First midterm exam today. by shifting all larger elements to the right by one to make space for the new item. sets can be partially ordered under the subset relation.n]. Repeat until all consecutive items are in order.i − 1] contain the i − 1 smallest elements in sorted order. Insert A[i] into its proper position in this subarray. (Chapt 10 on Medians. sorting is a very important algorithmic problem. and y. and we choose one of these records as the key value on which the elements will be sorted.. The reasons for studying sorting algorithms in details are twofold. HeapSort and QuickSort. Sorting: For the next series of lectures we will focus on sorting algorithms. some slow and some fast.Lecture Notes CMSC 251 Divide-and-conquer: Understand how to design algorithms by divide-and-conquer. We may discuss this later. Mar 5. The material on the 2-d maxima and long integer multiplication is not discussed in CLR.. but they run in Θ(n2 ) time in the worst case. but this is not a total order.. Selection sort: Assume that A[1.n] of n numbers. sorting forms an interesting case study in algorithm theory.i − 1] have already been sorted. There are many sorting algorithms. Insertion sort: Assume that A[1.. swap them. and be able to work an example by hand. The other reason is more pedagogical. More generally. Thus the design of efﬁcient sorting algorithms is important for the overall efﬁciency of these systems. For example. either explicitly or implicitly. These include the following: Bubblesort: Scan the array. We will study two others. Whenever two consecutive items are found that are out of order. Thus. and are asked to reorder these elements into increasing order. Procedures for sorting are parts of many large software systems. Also understand how the sieve technique works.

returning the largest element without deleting it. Binary tree is said to be complete if all internal nodes have two (nonempty) children. and changing the priority of an element that is already in the queueu (either decreasing or increasing it). (Actually it is more common to extract the minimum. . A simple priority queue supports three basic operations: Create: Create an empty queue.Lecture Notes CMSC 251 Priority Queues: The heapsort algorithm is based on a very nice data structure. It is easy to modify the implementation (by reversing < and > to do this. It is common to support a number of additional operations as well. called its priority. A nonleaf node is called an internal node. For now we will describe the heap in terms of a binary tree implementation. a left subtree and a right subtree. An important fact about a complete binary trees is that a complete binary tree of height h has h n = 1 + 2 + . its grandchildren at depth 2. + 2h = i=0 2i = 2h+1 − 1 nodes altogether. 40 . Depth: 0 1 2 3 4 5 Root leaf internal node Binary Tree Figure 10: Binary trees. its children at depth 1. ExtractMax: Return the element with maximum key value from the queue. If we solve for h in terms of n. A priority queue stores elements. but we will see later that heaps can be stored in arrays. and all leaves have the same depth. Insert: Insert an element into a queue. The height of a binary tree is its maximum depth. The root is at depth 0. and so on. Heaps: A heap is a data structure that supports the main priority queue operations (insert and delete max) in Θ(log n) time. If both the left and right children of a node are empty.) Empty: Test whether the queue is empty. A heap is a concrete implementation of an abstract data type called a priority queue. The left subtree and right subtrees are each binary trees. . then this node is called a leaf node. called a heap. such as building a priority queue from an initial set of elements. Adjust Priority: Change the priority of an item in the queue. each of which is associated with a numeric key value. we see that the the height of a complete binary tree with n nodes is h = (lg(n + 1)) − 1 ≈ lg n ∈ Θ(log n). By a binary tree we mean a data structure which is either empty or else it consists of three things: a root node. Complete Binary Tree The depth of a node in a binary tree is its distance from the root. They are called the left child and right child of the root node.

Lecture Notes

CMSC 251

A heap is represented as an left-complete binary tree. This means that all the levels of the tree are full except the bottommost level, which is ﬁlled from left to right. An example is shown below. The keys of a heap are stored in something called heap order. This means that for each node u, other than the root, key(Parent(u)) ≥ key(u). This implies that as you follow any path from a leaf to the root the keys appear in (nonstrict) increasing order. Notice that this implies that the root is necessarily the largest element.
16 14 8 2 4 1 7 9 10 3

Left−complete Binary Tree
Figure 11: Heap.

Heap ordering

Next time we will show how the priority queue operations are implemented for a heap.

Lecture 13: HeapSort
(Tuesday, Mar 10, 1998)
Read: Chapt 7 in CLR. Heaps: Recall that a heap is a data structure that supports the main priority queue operations (insert and extract max) in Θ(log n) time each. It consists of a left-complete binary tree (meaning that all levels of the tree except possibly the bottommost) are full, and the bottommost level is ﬁlled from left to right. As a consequence, it follows that the depth of the tree is Θ(log n) where n is the number of elements stored in the tree. The keys of the heap are stored in the tree in what is called heap order. This means that for each (nonroot) node its parent’s key is at least as large as its key. From this it follows that the largest key in the heap appears at the root. Array Storage: Last time we mentioned that one of the clever aspects of heaps is that they can be stored in arrays, without the need for using pointers (as would normally be needed for storing binary trees). The reason for this is the left-complete nature of the tree. This is done by storing the heap in an array A[1..n]. Generally we will not be using all of the array, since only a portion of the keys may be part of the current heap. For this reason, we maintain a variable m ≤ n which keeps track of the current number of elements that are actually stored actively in the heap. Thus the heap will consist of the elements stored in elements A[1..m]. We store the heap in the array by simply unraveling it level by level. Because the binary tree is leftcomplete, we know exactly how many elements each level will supply. The root level supplies 1 node, the next level 2, then 4, then 8, and so on. Only the bottommost level may supply fewer than the appropriate power of 2, but then we can use the value of m to determine where the last element is. This is illustrated below. We should emphasize that this only works because the tree is left-complete. This cannot be used for general trees. 41

Lecture Notes

CMSC 251

16 1 1 2 14 4 8 8 4 9 1 7 10 5 6 9 10 3 3 7 m=10 n=13 2 3 4 5 7 6 9 7 3 8 2 9 10 11 12 13 4 1 16 14 10 8

2

Heap as a binary tree

Heap as an array

Figure 12: Storing a heap in an array. We claim that to access elements of the heap involves simple arithmetic operations on the array indices. In particular it is easy to see the following. Left(i) : return 2i. Right(i) : return 2i + 1. Parent(i) : return i/2 . IsLeaf (i) : return Left(i) > m. (That is, if i’s left child is not in the tree.) IsRoot(i) : return i == 1. For example, the heap ordering property can be stated as “for all i, 1 ≤ i ≤ n, if (not IsRoot(i)) then A[Parent(i)] ≥ A[i]”. So is a heap a binary tree or an array? The answer is that from a conceptual standpoint, it is a binary tree. However, it is implemented (typically) as an array for space efﬁciency. Maintaining the Heap Property: There is one principal operation for maintaining the heap property. It is called Heapify. (In other books it is sometimes called sifting down.) The idea is that we are given an element of the heap which we suspect may not be in valid heap order, but we assume that all of other the elements in the subtree rooted at this element are in heap order. In particular this root element may be too small. To ﬁx this we “sift” it down the tree by swapping it with one of its children. Which child? We should take the larger of the two children to satisfy the heap ordering property. This continues recursively until the element is either larger than both its children or until its falls all the way to the leaf level. Here is the pseudocode. It is given the heap in the array A, and the index i of the suspected element, and m the current active size of the heap. The element A[max ] is set to the maximum of A[i] and it two children. If max = i then we swap A[i] and A[max ] and then recurse on A[max ].
Heapify Heapify(array A, int i, int m) { l = Left(i) r = Right(i) max = i if (l <= m and A[l] > A[max]) max = l if (r <= m and A[r] > A[max]) max = r if (max != i) { swap A[i] with A[max] Heapify(A, max, m) } } // sift down A[i] in A[1..m] // left child // right child // // // // // left child exists and larger right child exists and larger if either child larger swap with larger child and recurse

42

Lecture Notes

CMSC 251

See Figure 7.2 on page 143 of CLR for an example of how Heapify works (in the case where m = 10). We show the execution on a tree, rather than on the array representation, since this is the most natural way to conceptualize the heap. You might try simulating this same algorithm on the array, to see how it works at a ﬁner details. Note that the recursive implementation of Heapify is not the most efﬁcient. We have done so because many algorithms on trees are most naturally implemented using recursion, so it is nice to practice this here. It is possible to write the procedure iteratively. This is left as an exercise. The HeapSort algorithm will consist of two major parts. First building a heap, and then extracting the maximum elements from the heap, one by one. We will see how to use Heapify to help us do both of these. How long does Hepify take to run? Observe that we perform a constant amount of work at each level of the tree until we make a call to Heapify at the next lower level of the tree. Thus we do O(1) work for each level of the tree which we visit. Since there are Θ(log n) levels altogether in the tree, the total time for Heapify is O(log n). (It is not Θ(log n) since, for example, if we call Heapify on a leaf, then it will terminate in Θ(1) time.) Building a Heap: We can use Heapify to build a heap as follows. First we start with a heap in which the elements are not in heap order. They are just in the same order that they were given to us in the array A. We build the heap by starting at the leaf level and then invoke Heapify on each node. (Note: We cannot start at the top of the tree. Why not? Because the precondition which Heapify assumes is that the entire tree rooted at node i is already in heap order, except for i.) Actually, we can be a bit more efﬁcient. Since we know that each leaf is already in heap order, we may as well skip the leaves and start with the ﬁrst nonleaf node. This will be in position n/2 . (Can you see why?) Here is the code. Since we will work with the entire array, the parameter m for Heapify, which indicates the current heap size will be equal to n, the size of array A, in all the calls.
BuildHeap BuildHeap(int n, array A[1..n]) { for i = n/2 downto 1 { Heapify(A, i, n) } } // build heap from A[1..n]

An example of BuildHeap is shown in Figure 7.3 on page 146 of CLR. Since each call to Heapify takes O(log n) time, and we make roughly n/2 calls to it, the total running time is O((n/2) log n) = O(n log n). Next time we will show that this actually runs faster, and in fact it runs in Θ(n) time. HeapSort: We can now give the HeapSort algorithm. The idea is that we need to repeatedly extract the maximum item from the heap. As we mentioned earlier, this element is at the root of the heap. But once we remove it we are left with a hole in the tree. To ﬁx this we will replace it with the last leaf in the tree (the one at position A[m]). But now the heap order will very likely be destroyed. So we will just apply Heapify to the root to ﬁx everything back up.
HeapSort HeapSort(int n, array A[1..n]) { BuildHeap(n, A) m = n while (m >= 2) { swap A[1] with A[m] m = m-1 // sort A[1..n] // build the heap // initially heap contains all // extract the m-th largest // unlink A[m] from heap

43

Therefore the total running time of HeapSort is O(n log n). level 0 of the tree has 1 node. since we need to extract roughly n elements and each extraction involves a constant amount of work and one Heapify. because the heap has O(log n) levels. that is. HeapSort Analysis: Last time we presented HeapSort. All the leaves reside on level h. each of which takes O(log n) time. which has 2h nodes. With this assumption. As usual. This assumption will save us from worrying about ﬂoors and ceilings.Lecture Notes CMSC 251 Heapify(A. However. at level j 44 . later we will see that it is not possible to sort faster than Ω(n log n) time. In general. Nonetheless there are situations where you might not need to sort all of the elements. In this case the most convenient assumption is that n is of the form n = 2h+1 − 1. which HeapSort does.4 on page 148 of CLR. In particular. and up to level h. For this reason it is nice to be able to build the heap quickly since you may not need to extract all the elements. where h is the height of the tree. One clever aspect of the data structure is that it resides inside the array to be sorted. In the worst case the element might sift down all the way to the leaf level. At the bottommost level there are 2h nodes. and the element being sifted moves down one level of the tree after a constant amount of work. and then repeatedly extracting the maximum element from the heap and moving it to the end of the array. Is this tight? That is. the running time depends on how far an element might sift down before the process terminates. In fact. We argued that the basic heap operation of Heapify runs in O(log n) time. it is common to extract some unknown number of the smallest elements until some criterion (depending on the particular application) is met. At the next to bottommost level there are 2h−1 nodes. because we need to apply Heapify roughly n/2 times (to each of the internal nodes). it turns out that the ﬁrst part of the analysis is not tight. m) } } // fix things up An example of HeapSort is shown in Figure 7. Let us count the work done level by level. 1. it will make our lives simple by making some assumptions about n. Lecture 14: HeapSort Analysis and Partitioning (Thursday. At the 3rd level from the bottom there are 2h−2 nodes. The algorithm we present for partitioning is different from the texts. The reason is that a left-complete tree with this number of nodes is a complete tree. Recall that when Heapify is called. Mar 12. the BuildHeap procedure that we presented actually runs in Θ(n) time. but we do not call Heapify on any of these so the work is 0. For example. is the running time Θ(n log n)? The answer is yes. level 1 has 2 nodes. and each might sift down 2 levels. BuildHeap Analysis: Let us consider the running time of BuildHeap more carefully. So the total running time is O((n − 1) log n) = O(n log n). assuming that you use comparisons. 1998) Read: Chapt 7 and 8 in CLR. Recall that the algorithm operated by ﬁrst building a heap in a bottom-up manner. its bottommost level is full. and each might sift down 1 level. Although in the wider context of the HeapSort algorithm this is not signiﬁcant (because the running time is dominated by the Θ(n log n) extraction phase). and (2) that it takes O(n log n) time to extract each of the maximum elements. Based on this we can see that (1) that it takes O(n log n) time to build a heap. We make n − 1 calls to Heapify.

we can use it instead as an easy approximation. j 2 2j j=0 Now recall that n = 2h+1 − 1. Using this we have h ∞ T (n) = 2h j=0 j j ≤ 2h ≤ 2h · 2 = 2h+1 . First. Clearly the algorithm takes at least Ω(n) time (since it must access every element of the array at least once) so the total running time for BuildHeap is Θ(n). ∞ 1 . and multiply by x giving: ∞ ∞ jxj−1 = j=0 1 (1 − x)2 jxj = j=0 x . xj = 1−x j=0 Then take the derivative of both sides with respect to x. and each might sift down j levels. from the bottom there are 2h−j nodes. we see that the total time is proportional to h h T (n) = j=0 j2h−j = j=0 j 2h . 2j This is a sum that we have never seen before. = = j 2 2 (1 − (1/2)) 1/4 In our case we have a bounded sum. if we count from bottom to top. (1 − x)2 and if we plug x = 1/2. So. We could try to approximate it by an integral. We’ll digress for a moment to work it out. 45 . 2j If we factor out the 2h term. which would involve integration by parts. level-by-level.Lecture Notes CMSC 251 Total work for BuildHeap 3 2 1 0 0 0 1 0 0 1 0 0 2 1 0 3*1 2*2 1*4 0*8 Figure 13: Analysis of BuildHeap. so we have T (n) ≤ n + 1 ∈ O(n). we have h T (n) = 2h j=0 j . write down the inﬁnite general geometric series. then voila! we have the desired formula: ∞ j=0 1/2 1/2 j = 2. for any constant x < 1. but since the inﬁnite series is bounded. but it turns out that there is a very cute solution to this particular sum.

) This algorithm works by maintaining the following invariant condition. We will present a different algorithm from the one given in the text (in Section 8. (1) A[p] = x is the pivot value. QuickSort (and its variants) are considered the methods of choice for most standard library sorting algorithms. This is that the vast majority of nodes are at the lowest level of the tree. QuickSort is interesting in a number of respects. there is a sense in which a parallel version of BuildHeap can be viewed as operating like a sieve. but maybe this is getting too philosophical. which means that it makes use of a random number generator.r] whose elements are greater than or equal to x. and A[q + 1. and the number of nodes in the bottom 3 levels alone is 2h + 2h−1 + 2h−2 = 7n n n n + + = = 0. A[q] now holds a value that is less than x. and A[s − 1] now holds a value that is greater than or equal to x. (The other example was the median algorithm. Today we will discuss one aspect of QuickSort. QuickSort has a better locality-of-reference behavior than either MergeSort or HeapSort. based on the sieve technique. The algorithm ends when s = r. Next time we will discuss QuickSort.s − 1] contains items that are greater than or equal to x. For example.q − 1] whose elements are all less than or equal to x. In addition. The algorithm begins by setting q = p and s = p + 1. With each step through the algorithm we test the value of A[s] against x. and then increment s.r]. namely the partitioning algorithm.875n. swap A[s] with A[q]. This is how it got its name. its expected case running time is O(n log n). (2) A[p + 1.) Perhaps a more intuitive way to describe what is happening here is to observe an important fact about binary trees. it is important to be most efﬁcient on the bottommost levels of the tree (as BuildHeap is) since that is where most of the weight of the tree resides. (3) A[q + 1. In the ﬁrst case this is obvious.. x = A[p]. If a different rule is used for selecting x. almost 90% of the nodes of a complete binary tree reside in the 3 lowest levels.. (as we will present it) it is a randomized algorithm.Lecture Notes CMSC 251 It is worthwhile pausing here a moment. s.q] contains items that are less than x. This is the same partitioning algorithm which we discussed when we talked about the selection (median) problem. Moreover. Partitioning: Our next sorting algorithm is QuickSort. in a complete binary tree of height h there is a total of n ≈ 2h+1 nodes in total. This is the second time we have seen a relatively complex structured algorithm. In the second case. this expected case running time occurs with high probability. and thus it tends to run fastest of all three algorithms. (But I suspect that the one in the text is probably a bit for efﬁcient for actual implementation. then we can simply increment s. and r. and a little easier to analyze. We will show that in the worst case its running time is O(n2 ). The boundaries between these items are indicated by the indices p. meaning that 46 . We are given an array A[p. and a pivot element x chosen from the array. First off. the invariant is still maintained. that is. 2 4 8 8 That is. We will assume that x is the ﬁrst element of the subarray.. This is illustrated below. this is easily handled by swapping this element with A[p] before calling this procedure.. Otherwise we increment q. If A[s] ≥ x.r] contains items whose values are currently unknown. Notice that in either case. q.. A[q] = x. This algorithm is a little easier to verify the correctness.. come out with a running time of Θ(n). in that the probability that the algorithm takes signiﬁcantly more than O(n log n) time is rapidly decreasing function of n. Recall that the partitioning algorithm is suppose to partition A into three subarrays: A[p. and (4) A[s. Thus the lesson to be learned is that when designing algorithms that operate on trees. The subarray is broken into four segments.1). with doubly nested loops. Actually if you think deeply about this.

Mar 17.. Usually the lower you make this probability. 1998) Revised: March 18. Fixed a bug in the analysis. 47 .Lecture Notes CMSC 251 p x <x q p x q s swap <x x q >= x ? >= x s ? r Intermediate configuration r Initial configuration Final configuration Figure 14: Partitioning intermediate structure. To ﬁnish things off we swap A[p] (the pivot) with A[q].r] // pivot item in A[p] // put the pivot into final position // return location of pivot An example is shown below. We will present QuickSort as a randomized algorithm. Read: Chapt 8 in CLR. There are two common types of randomized algorithms: Monte Carlo algorithms: These algorithms may produce the wrong result. the longer the algorithm takes to run. that is. but the probability of this occurring can be made arbitrarily small by the user. whose expected-case running time is Θ(n log n). QuickSort and Randomized Algorithms: Early in the semester we discussed the fact that we usually study the worst-case running times of algorithms. Here is the complete code: Partition Partition(int p. My presentation and analysis are somewhat different than the text’s. It is a worst-case Θ(n2 ) algorithm. Today we will study QuickSort. all of the elements have been processed. an algorithm which makes random choices. array A) { x = A[p] q = p for s = p+1 to r do { if (A[s] < x) { q = q+1 swap A[q] with A[s] } } swap A[p] with A[q] return q } // 3-way partition of A[p. but sometimes average-case is a more meaningful measure. int r. Lecture 15: QuickSort (Tuesday. and return the value of q.

we pick a random index i from p to r.. QuickSort Overview: QuickSort is also based on the divide-and-conquer design paradigm.n]. this is generally considered the safest implementation. Partition: Partition the array in three subarrays. n.. return // pick a random element 48 . The Partition routine was discussed last time.r] be the (sub)array to be sorted. Since we want a random pivot.Lecture Notes CMSC 251 p 5 q 5 3 s 3 q 5 3 q 5 3 q 8 8 s 8 6 s 6 4 s 1 3 7 3 4 7 3 6 4 7 3 8 6 4 7 3 r 1 5 3 4 q 1 5 3 4 q 1 5 3 4 3 q 1 5 3 4 3 1 q 4 3 5 q 7 6 8 Final swap 7 8 7 6 8 6 8 7 s 7 3 s 6 s 6 8 s 1 1 3 1 Figure 15: Partitioning example.q − 1] and A[q + 1. in QuickSort the work is done before the recursive call is made. which we discussed earlier. and A[q + 1.r] // 0 or 1 items. Let A[p. and then swap A[i] with A[p]. Basis: If the list contains 0 or 1 elements. The initial call is QuickSort(1. The pseudocode for QuickSort is given below. int r. This is an important problem in cryptography. A)..r] // Sort A[p.. array A) { if (r <= p) return i = a random index from [p.n]. A[q] = x.q − 1] ≤ x. The QuickSort algorithm that we will discuss today is an example of a Las Vegas algorithm.. those elements A[1. The initial call is to A[1.n] ≥ x. but as we shall see. Recall that Partition assumes that the pivot is stored in the ﬁrst element of A. The most well known Monte Carlo algorithm is one for determining whether a number is prime. averaged over all possible random choices is the measure of the algorithm’s running time.. Las Vegas algorithms: These algorithms always produce the correct result.. Note the similarity with the selection algorithm. In these cases the expected running time. QuickSort QuickSort(int p. Here is an overview of QuickSort. Note that QuickSort does not need to be implemented as a randomized algorithm. Select pivot: Select a random element x from the array. Recurse: Recursively sort A[1. then return. called the pivot. Unlike MergeSort where most of the work is done after the recursive call returns.. but the running time is a random variable.

. r. then the split will be reasonably well balanced.. We will see that unbalanced partitions (like unbalanced binary trees) are bad. Putting this together we have T (n) = 1 max1≤q≤n (T (q − 1) + T (n − q) + n) if n ≤ 1 otherwise. and we make two recursive calls. i=−1 49 .n] which has r − (q + 1) + 1 = r − q elements. then the partition will be unbalanced. However its analysis is not so obvious. But unlike MergeSort.n]. A[1. where we had control over the sizes of the recursive calls. Since the pivot is chosen at random by our algorithm. To get the worst case. The ﬁrst is to the subarray A[1. Worst-case Analysis: Let’s begin by considering the worst-case performance. k−2 = T (n − k) + (n − i).. if the rank of the pivot is anywhere near the middle portion of the array. and result is poor running times. In particular. we maximize over all possible values of q. if the rank of the pivot (recall that this means its position in the ﬁnal sorted list) is very large or very small.r] QuickSort Analysis: The correctness of QuickSort should be pretty obvious. It takes Θ(n) time to do the partitioning and other overhead.. As a basis we have that T (0) = T (1) = Θ(1). Since this is a recursive program. We will see that the expected running time is O(n log n).) In this case. A) QuickSort(p. q = n. q-1.q − 1] which has q − 1 elements. we may do well most of the time and poorly occasionally... for some q in the range 1 to n. and the overall running time will be good. the worst case happens at either of the extremes (but see the book for a more careful analysis based on an analysis of the second derivative). because it is easier than the average case. here we do not. and the other is to the subarray A[q + 1. So I would plug in the value q = 1. and q = n/2 and work each out. This depends on the value of q. So if we ignore the Θ (as usual) we get the recurrence: T (n) = T (q − 1) + T (n − q) + n. r. The key is determining which value of q gives the maximum. Recurrences that have max’s and min’s embedded in them are very messy to solve. If we expand the recurrence in the case q = 1 we get: T (n) ≤ T (0) + T (n − 1) + n = 1 + T (n − 1) + n = T (n − 1) + (n + 1) = = T (n − 2) + n + (n + 1) T (n − 3) + (n − 1) + n + (n + 1) = T (n − 4) + (n − 2) + (n − 1) + n + (n + 1) = .Lecture Notes CMSC 251 swap A[i] with A[p] q = Partition(p. It depends on how the pivot is chosen. A) QuickSort(q+1. It turns out that the running time of QuickSort depends heavily on how good a job we do in selecting the pivot. A) } // // // // swap pivot into A[p] partition A about pivot sort A[p.q-1] sort A[q+1.. (A rule of thumb of algorithm analysis is that the worst cases tends to happen either at the extremes or in the middle. Suppose that we are sorting an array of size n. However. and further suppose that the pivot that we select is of rank q. it is natural to use a recurrence to describe its running time.

) Theorem: There exist a constant c such that T (n) ≤ cn ln n. . For the basis case n = 2 we have T (2) = = 1 2 2 (T (q − 1) + T (2 − q) + 2) q=1 8 1 ((T (0) + T (1) + 2) + (T (1) + T (0) + 2) = = 4. and the induction hypothesis is that for any n < n. Expansion is possible. It will simplify the analysis to assume that all of the elements are distinct. We want to prove it is true for T (n). This has been done to make the proof easier. 2 In fact. but we’ll be sloppy because things will be messy enough anyway. for all n ≥ 2. as we shall see. For the induction step. + (n − 1) + n + (n + 1)) n+1 i = i=1 (n + 1)(n + 2) ∈ O(n2 ). we cannot assume that the recursive calls will be made on array sizes that are powers of 2.) Proof: The proof is by constructive induction on n. so we cannot apply the Master Theorem. When we talked about average-case analysis at the beginning of the semester. So we can modify the above recurrence to compute an average rather than a max. (Notice that we have replaced lg n with ln n. Instead. Average-case Analysis: Next we show that in the average case QuickSort runs in Θ(n log n) time. we have T (n ) ≤ cn ln n . T (1) = 1 we set k = n − 1 and get n−3 T (n) ≤ = ≤ T (1) + (n − i) i=−1 1 + (3 + 4 + 5 + . (Properly we should write lg n because unlike MergeSort. In this case the average is computed over all possible random choices that the algorithm might make for the choice of the pivot index in the second step of the QuickSort procedure above.885. we said that it depends on some assumption about the distribution of inputs. because it means that the analysis of the algorithm’s performance is the same for all inputs. . q=1 50 .Lecture Notes CMSC 251 For the basis. the analysis does not depend on the input distribution at all—it only depends on the random choices that the algorithm makes. giving: T (n) = 1 1 n n q=1 (T (q − 1) + T (n − q) + n) if n ≤ 1 otherwise. so let’s try T (n) ≤ an lg n + b. We know that we want a Θ(n log n) running time. 2 2 We want this to be at most c2 ln 2 implying that c ≥ 4/(2 ln 2) ≈ 2. in this case. However. a more careful analysis reveals that it is Θ(n2 ) in this case. To analyze the average running time. we let T (n) denote the average running time of QuickSort on a list of size n. we assume that n ≥ 3. and moving the factor of n outside the sum we have: T (n) = = 1 n 1 n n (T (q − 1) + T (n − q) + n) q=1 n (T (q − 1) + T (n − q)) + n. By expanding the deﬁnition of T (n). and each choice has an equal probability of 1/n of occuring. we will attempt a constructive induction to solve it. The algorithm has n random choices for the pivot element. This is good. This is not a standard recurrence. but rather tricky.

.Lecture Notes CMSC 251 Observe that if we split the sum into two sums. n To ﬁnish the proof. we’ll extract T (0) and T (1) and treat them specially. If we cancel the common cn ln n we see that this will be true if we select c such that n 1− 4 c + ≤ 0. so we may choose c = 3 to satisfy both the constraints. we only need to select c so that c ≥ 2 + 8 . 2 n After some simple manipulations we see that this is equivalent to: 4 cn + 2 n 4 cn ≥ n+ 2 n 8 c ≥ 2+ 2. just that one counts up and the other counts down.885. + T (n − 1). we have T (n) n2 n2 ln n − 2 4 cn +n+ = cn ln n − 2 c = cn ln n + n 1 − 2 = 2c n +n+ 4 n + 4 n 4 . n 0 ≥ n− Since n ≥ 3. 51 . they both add the same values T (0) + T (1) + . n (cq ln q) q=2 We have never seen this sum before. Because they don’t follow the formula. If we make this substitution and apply the induction hypothesis to the remaining sum we have (which we can because q < n) we have T (n) = ≤ = 2 n 2 n 2c n n−1 T (q) q=0 +n = 2 n n−1 T (0) + T (1) + q=2 T (q) +n n−1 1+1+ q=2 n−1 (cq lg q) +n+ +n 4 . The Leftover Sum: The only missing element to the proof is dealing with the sum n−1 S(n) = q=2 q ln q. Thus we can replace this with n−1 2 q=0 T (q). Later we will show that n−1 S(n) = q=2 q ln q ≤ n2 n2 ln n − . From 9 the basis case we have c ≥ 2. and so selecting c = 3 will work. we want all of this to be at most cn ln n. 2 4 Assuming this for now. .

we allow QuickSort to be called in-place even though they need a stack of size O(log n) for keeping track of the recursion). In order to get Θ(n2 ) time the algorithm must make poor choices for the pivot at virtually every step. QuickSort (like MergeSort) is not formally an in-place sorting algorithm. the main work in QuickSort (in partitioning) spends most of its time accessing elements that are close to one another. because it does make use of a recursion stack. recall the integration formula for bounding summations (which we paraphrase here). Mar 19. 2 4 This completes the summation bound. and so continuously making poor choices are very rare. QuickSort is the most popular algorithm for implementation because its actual performance (on typical modern architectures) is so good. Recall that an in-place sorting algorithm is one that uses no additional array storage (however.g. 1998) Read: Chapt. so this is not really a problem. and if you are a calculus wimp (like me) then you can look it up in a book of integrals n x ln xdx = 2 x2 x2 ln x − 2 4 n = x=2 n2 n2 ln n − 2 4 − (2 ln 2 − 1) ≤ n2 n2 ln n − . 52 . 20) it switches to a simple iterative algorithm. Although we did not show it. In MergeSort we are always comparing two array elements against each other. The reason it tends to outperform MergeSort (which also has good locality of reference) is that most comparisons are made against the pivot element. For large values of n. could we make QuickSort deterministic Θ(n log n) by calling the selection algorithm to use the median as the pivot. Lecture 16: Lower Bounds for Sorting (Thursday. but the resulting algorithm would be so slow practically that no one would ever use it. which can be stored in a register. it turns out that this doesn’t just happen much of the time. the size of the stack is O(log n). the average-case running time is Θ(n log n). Summary: So even though the worst-case running time of QuickSort is Θ(n2 ). In MergeSort and in the expected case for QuickSort. A sorting algorithm is stable if duplicate elements remain in the same relative position after sorting. 9 of CLR. Review of Sorting: So far we have seen a number of algorithms for sorting a list of numbers in ascending order. but once the sizes of the subarray falls below some minimum size (e. For any monotonically increasing function f (x) b−1 b f (i) ≤ i=a a f (x)dx. then you can integrate this by parts. The function f (x) = x ln x is monotonically increasing. the running time is Θ(n log n) with high probability.Lecture Notes CMSC 251 To bound this. You might ask. The most efﬁcient versions of QuickSort uses the recursion for large subarrays. and so we have n S(n) ≤ x ln xdx. The answer is that this would work. and hence the entire proof. 2 If you are a calculus macho man. such as selection sort. Poor choices are rare. The reason for this stems from the fact that (unlike Heapsort) which can make large jumps around in the array.

This is still a difﬁcult task if you think about it. Heapsort: Heapsort is based on a nice data structure. the keys will be permuted in the ﬁnal array 53 . and SelectionSort. implying that it is not an in-place algorithm. and there is a clean mathematical way of characterizing all such algorithms. It is an (almost) in-place sorting algorithm but is not stable. Each node of the decision tree represents a comparison made in the algorithm (e. Let a1 . . ai ≥ aj . . This action may involve moving elements around in the array. one might be labeled with a4 < a7 and the other with a4 ≥ a7 . Such an algorithm is called a comparison-based sorting algorithm. indexing into an array based on arithmetic operations on keys.. which is a fast priority queue. In a comparison-based sorting algorithm only comparisons between the keys are used to determine the action of the algorithm. so this is not a very restrictive assumption. This does not preclude the possibility of a sorting algorithm whose actions are determined by other types of operations. but it is not stable. and ai > aj . Decision Tree Argument: In order to prove lower bounds. an must make at least Ω(n log n) comparisons in the worst-case. But to show that a problem cannot be solved fast you need to reason in some way about all the possible algorithms that might ever be written. . Given two elements. . Elements can be inserted into a heap in O(log n) time. . a2 . then the “action” of the sorting algorithm is completely determined by the results of its comparisons. (Alternatively. (It is also easy to set up a heap for extracting the smallest item. This algorithm is O(n log n) in the expected case. It is an in-place algorithm. but SelectionSort cannot (without signiﬁcant modiﬁcations). Mergesort: Mergesort is a stable Θ(n log n) sorting algorithm. it seems surprising that you could even hope to prove such a thing. their relative order can only be determined by the results of comparisons like ai < aj . It is easy to show that a problem can be solved fast (just give an algorithm). we model such algorithms in terms of an abstract model called a decision tree. copying them to other locations in memory. BubbleSort and InsertionSort can be implemented as stable algorithms. In fact. . and O(n2 ) in the worst case. for example. the left subtree consists of the remaining comparisons made under the assumption that a4 ≤ a7 and the right subtree for a4 > a7 . The probability that the algorithm takes asymptotically longer (assuming that the pivot is chosen randomly) is extremely small for large n. We will show that any comparison-based sorting algorithm for a input sequence a1 . called a heap. The catch here is that we are limited to using comparison-based algorithms. ai = aj . performing various arithmetic operations on non-key data. and the two branches represent the possible results. we need an abstract way of modeling “any possible” comparison-based sorting algorithm. Quicksort: Widely regarded as the fastest of the fast algorithms. These are all simple Θ(n2 ) in-place sorting algorithms. Lower Bounds for Comparison-Based Sorting: Can we sort faster than O(n log n) time? We will give an argument that if the sorting algorithm is based solely on making comparisons between the keys in the array.Lecture Notes CMSC 251 Slow Algorithms: Include BubbleSort. A decision tree is a mathematical representation of a sorting algorithm (for a ﬁxed value of n). . then it is impossible to sort more efﬁciently than Ω(n log n) time. performing arithmetic operations. an be the input sequence.) If you only want to extract the k largest values. Virtually all known general purpose sorting algorithms are based on making comparisons. a heap can allow you to do this is O(n + k log n) time. a4 : a7 ). But the bottom-line is that at the end of the algorithm. ai ≤ aj .) Observe that once we know the value of n. and includes all of the algorithms given above.g. a2 . The downside is that MergeSort is the only algorithm of the three that requires additional array storage. consulting the individual bits of numbers. for example. InsertionSort. and the largest item can be extracted in O(log n) time. . ai and aj .

54 . a2 > a1 : These two are swapped and the ﬁnal output is a3 . The we pass to phase 2 and compare a2 with a1 (which is now in the third position of the array) yielding either: a2 ≤ a1 : Final output is a3 . The ﬁnal decision tree is shown below.1. and let’s build a decision tree for SelectionSort. a1 > a3 : These two are swapped and the ﬁnal output is a2 .3. We pass to phase 2 and compare a2 with a1 (which is now in the third position of the array) yielding either: a2 ≤ a1 : Final output is a3 .1. < a1:a3 < < a2:a3 > 1. let us assume that n = 3.) As you can see. a1 . Can you explain this? (The answer is that virtually all sorting algorithms. a3 . and swaps it with the ﬁrst element.2 < > a2:a1 > 3. in the sense that their outcome has already been determined by earlier comparisons. Recall that the algorithm consists of two phases.1 Figure 16: Decision Tree for SelectionSort on 3 keys. and then pass to phase 2. The possible results are: a1 ≤ a2 : Then a1 is the current minimum.1 2. To make this more concrete.2. We swap a2 with a1 . a1 . The ﬁrst ﬁnds the smallest element of the entire list. Next we compare a1 with a3 whose results might be either: a1 ≤ a3 : Then we know that a1 is the minimum overall. a2 > a3 : These two are swapped and the ﬁnal output is a1 .1 < > a2:a1 > 3. The possible results are: a2 ≤ a3 : Final output is a1 .2. a3 . The ﬁrst comparison is between a1 and a2 . a2 .2.1. a3 . Then we pass to phase 2 and compare a2 with a3 . a1 .2 1. a2 . a2 .3 3. converting a complex sorting algorithm like HeapSort into a decision tree for a large value of n will be very tedious and complex. and compare the remaining items a1 and a3 . a3 . a2 > a1 : These two are swapped and the ﬁnal output is a3 . For example. and it is swapped with a1 . and it is swapped with a1 .Lecture Notes CMSC 251 in some way. may make comparisons that are redundant. especially inefﬁcient ones like selection sort. a1 . Each leaf in the decision tree is labeled with the ﬁnal permutation that the algorithm generates after making all of its comparisons. The possible results are: a1 ≤ a3 : Final output is a2 . The second ﬁnds the smaller of the remaining two items. Next we compare a2 with a3 whose results might be either: a2 ≤ a3 : Then we know that a2 is the minimum overall. Note that there are some nodes that are unreachable. a1 . in order to reach the fourth leaf from the left it must be that a1 ≤ a2 and a1 > a2 .3 3. a1 . which cannot both be true. but I hope you are convinced by this exercise that it can be done in a simple mechanical way. a1 > a2 : Then a2 is the current minimum. a2 . and the elements remain in their original positions. and swaps it with the second element. a2 > a3 : Then we know that a3 is the minimum is the overall minimum.2 a1:a2 > a2:a3 < < a1:a3 > 2. Here is the decision tree (in outline form). a2 . a2 .3. a1 > a3 : Then we know that a3 is the minimum is the overall minimum.

55 . then those numbers could be presented in any of n! different permutations. but only under very restrictive circumstances. This means that this sorting algorithm can distinguish between at most 2T (n) different ﬁnal actions. Mar 31. This can be generalized to show that the average-case time to sort is also Ω(n log n) (by arguing about the average height of a leaf in a tree with at least n! leaves). implying that A(n) ≥ n!. the following theorem. 9 of CLR. Notice that the running time fo the algorithm must be at least as large as T (n). We can use Stirling’s approximation for n! (see page 35 in CLR) yielding: n! T (n) √ n n 2πn e √ n n ≥ lg 2πn e √ = lg 2πn + n lg n − n lg e ∈ Ω(n log n). Observe that the height of the decision tree is exactly equal to T (n). How many possible actions must any sorting algorithm distinguish between? If the input consists of n distinct numbers. we cannot do it by making comparisons alone. Recall that the argument was based on showing that any comparison-based sorting could be represented as a decision tree. This lower bound implies that if we hope to sort numbers faster than in O(n log n) time. because any path from the root to a leaf corresponds to a sequence of comparisons made by the algorithm. A(n) is usually not exactly equal to n! because most algorithms contain some redundant unreachable leaves. the algorithm must “unscramble” the numbers in an essentially different way. ≥ Thus we have.Lecture Notes CMSC 251 Using Decision Trees for Analyzing Sorting: Consider any sorting algorithm. As we have seen earlier. Let’s call this quantity A(n).) Since A(n) ≤ 2T (n) we have 2T (n) ≥ n!. Today we consider the question of whether it is possible to sort without the use of comparisons. that is it must take a different action. and hence its height must be at least lg(n!) ∈ Ω(n lg n). implying that T (n) ≥ lg(n!). The algorithm deﬁnes a decision tree. They answer is yes. Linear Time Sorting: Last time we presented a proof that it is not possible to sort faster than Ω(n log n) time assuming that the algorithm is based on making 2-way comparisons. (Again. the decision tree must have at least n! leaves. Lecture 17: Linear Time Sorting (Tuesday. any binary tree of height T (n) has at most 2T (n) leaves. Theorem: Any comparison-based sorting algorithm has worst-case running time Ω(n log n). to distinguish between the n! different permutations in which the keys could be input. for the number of different ﬁnal actions the algorithm can take. For each different permutation. Let T (n) be the maximum number of comparisons that this algorithm makes on any input of size n. The lower bound on sorting can be generalized to provide lower bounds to a number of other problems as well. since we are not counting data movement or other computations at all. Each action can be thought of as a speciﬁc way of permuting the oringinal input to get the sorted output. 1998) Read: Chapt.

We use three arrays: A[1.. we replace R[x] with the sum of elements in the subarray R[1. Recall that the rank of a item is the number of elements in the array that are less than or equal to it.key]++ for x = 2 to k do R[x] += R[x-1] for j = n downto 1 do { x = A[j].n] is the input array. B[1. If k = O(n). The ﬁgure below shows an example of the algorithm.). This means that if x = A[j]. respectively. The question is how to ﬁnd the rank of an element without comparing it to the other elements of the array? Counting sort uses the following three arrays. R[1. A[j] is a record. we set B[R[x]] = A[j]. n times.n] initialize R R[x] = #(A[j] == x) R[x] = rank of x move each element of A to B x = key value R[x] is where to put it leave space for duplicates There are four (unnested) loops. and then for each j. First we set R[x] to be the number of elements of A[j] whose key is equal to x. sorting characters. it is actually a list of records. and n times. you sort by simply copying each element to the appropriate location of the ﬁnal sorted output array. not just the key value. since we do not want them to overwrite the same location of B.. Thus. Now R[x] now contains the rank of x. To determine the number of elements that are less than or equal to x. from 1 to n. we decrement R[i] after copying. for each element in the input array. to convince yourself how it works. We do this in two steps. Notice that this copies the entire record. The basic idea is to determine. indicating that we have seen one more 5. As usual A[1..key B[R[x]] = A[j] R[x]-} } // // // // // // // // sort A[1.g. The algorithm sorts in Θ(n + k) time. The algorithm operates by ﬁrst constructing R. then this implies that the resulting sorting algorithm is Θ(n) time. 56 . We present three algorithms based on the theme of speeding up sorting in special cases.. There is a subtlety here however.Lecture Notes CMSC 251 Many applications involve sorting small integers (e. In this algorithm we will be a little more careful to distinguish the entire record A[j] from the key A[j]. array B) { for x = 1 to k do R[x] = 0 for j = 1 to n do R[A[j].k] : An array of integers. Counting Sort CountingSort(int n. This is done by just keeping a running total of the elements of R. and the numeric value is the key on which the list is being sorted. then the total running time is Θ(n).key.. A[j]. The algorithm is remarkably simple. executed k times.. exam scores. You should trace through a few examples. k − 1 times. but deceptively clever.n] to B[1.key is the integer key value on which to sort. Counting Sort: Counting sort assumes that each input is an integer in the range from 1 to k.n] : Array of records which holds the sorted output. if A[j].. If k is known to be Θ(n). array A. so the total running time is Θ(n + k) time.key then the ﬁnal position of A[j] should be at position R[x] in the ﬁnal sorted array. We can do this initializing R to zero. R[x] is the rank of x in A. Notice that once you know the rank of every element. then the 5th element of R is incremented.k].n] : Holds the initial input.x]. We need to be careful if there are duplicates. where x ∈ [1.. Recall that although we usually think of A as just being a list of numbers. we increment R[A[j]. last four digits of a social security number.key] by 1. its rank in the ﬁnal sorted array.key = 5. by not making comparisons. int k. Thus. etc. To do this.

n] with d digits for i = 1 to d do { Sort A (stably) with respect to i-th lowest order digit. There is a tradeoff between the space and time. If the integers are in the range from 1 to 1 million.) Radix Sort: The main shortcoming of counting sort is that it is only really (due to space requirements) for small integers. Actually. each having d decimal digits (or digits in any base).. To sort these numbers we can simply sort repeatedly. Let’s suppose that we have access to a stable sorting algorithm. It would not be stable if the loop were running the other way. we may not want to allocate an array of a million elements. Radix sort provides a nice way around this by sorting numbers one digit at a time. it is typically more convenient to sort by bytes rather than digits (especially for sorting character strings). array A) { // sort A[1.. We will not discuss how it is that A is broken into digits.Lecture Notes CMSC 251 1 A 1 a 2 4 e 3 3 r 4 1 s 5 3 v R 1 2 2 0 3 2 4 1 R R 1 2 2 1 1 1 0 2 2 2 2 2 2 2 3 4 3 3 2 2 2 4 5 5 5 5 4 4 B B B B B 1 2 3 4 3 v 5 Key Other data R R R R 1 s 1 s 1 s 1 a 1 s 3 r 3 r 3 r 3 v 3 v 3 v 3 v 4 e 4 e Figure 17: Counting Sort. Obviously this not an in-place sorting algorithm (we need two additional arrays). what constitutes a “digit” is up to the implementor. However it is a stable sorting algorithm. starting at the lowest order digit. Let’s think of our list as being composed of n numbers.n] be the array to sort. let A[1. } } 57 . numbers having the same high order digit will remain sorted with respect to their low order digit. etc. (As a hint. Since the sorting algorithm is stable. As usual. The idea is very simple. and then later we sort with respect to high order digits. Radix Sort RadixSort(int n. int d. like Counting Sort. we know that if the numbers are already sorted with respect to low order digits. For example. I’ll leave it as an exercise to prove this. notice that the last loop runs down from n to 1. but this might be done through bit manipulations (shifting and masking off bits) or by accessing elements byte-by-byte. and let d denote the number of digits in A. and ﬁnishing with the highest order digit.

For example. it is not clear whether it is really faster to use RadixSort over QuickSort. Thus. This is at a level of similarity. 58 . CountingSort. Lecture 18: Review for Second Midterm (Thursday. But you should be careful about attempting to make generalizations when the sizes of the numbers are not bounded. where it would probably be best to implement both algorithms on your particular machine to determine which is really faster. Material from Text: You are only responsible for material that has been covered in class or on class assignments. the locality of reference behavior of Counting Sort (and hence of RadixSort) is not as good as QuickSort. if you were to apply radix sort in this situation. For example. if we have d = 2 and set k = n. recurrences. have discussed some algorithm design techniques (divide-andconquer in particular). • Chapt 9 (skip 9. the running time would be Θ(dn) = Θ(n log n). Furthermore. Thus. Recall that our goal is to provide you with the necessary tools for designing and analyzing efﬁcient algorithms. • Chapt 7: Heaps. then we can sort numbers in the range n ∗ n = n2 in Θ(n) time. 1998) General Remarks: Up to now we have covered the basic techniques for analyzing algorithms (asymptotics. Notice that we can be quite ﬂexible in the deﬁnition of what a “digit” is. Then it follows that you need at least B = lg n bits to store these values. summations. • Chapt 8: QuickSort. then doesn’t this imply that we can sort any array of integers in O(n) time (by applying radix sort on each of the d = 4 bytes)? The answer is yes. Section 8. At this point you might ask. Look at Section 7. The relevant sections of the text are the following. and k is the number of values a digit can have. suppose you have n keys and there are no duplicate values. since a computer integer word typically consists of 32 bits (4 bytes).Lecture Notes CMSC 251 Here is an example. not the one in the text. and we will still have an Θ(n) time algorithm. but the algorithm’s running time will be Θ(dn) as long as k ∈ O(n). induction). • Review Chapts 1: InsertionSort and MergeSort. HeapSort. You are responsible for the partitioning algorithm which we gave in class. This is usually a constant.5 on priority queues. Apr 2. So there is no real asymptotic savings here. 576 494 194 296 278 176 954 49[4] 19[4] 95[4] =⇒ 57[6] 29[6] 17[6] 27[8] 9[5]4 [1]76 176 5[7]6 [1]94 194 1[7]6 [2]78 278 =⇒ 2[7]8 =⇒ [2]96 =⇒ 296 4[9]4 [4]94 494 1[9]4 [5]76 576 2[9]6 [9]54 954 The running time is clearly Θ(d(n + k)) where d is the number of digits. It can be any number in the range from 1 to cn for some constant c. this can be used to sort numbers in the range from 1 to nd in Θ(dn) time.2 gives some good intuition on the analysis of QuickSort. subject to this word-length restriction. In general.4): Lower bounds on sorting. However it is always a good idea to see the text to get a better insight into some of the topics we have covered. The number of bytes is d = B/8 . and have discussed sorting algorithm and related topics. RadixSort. n is the length of the list. even though we didn’t cover it in class.

the ﬁgure below (left) shows a directed graph. surface meshes used for shape description in computer-aided design and geographic information systems. e. Some deﬁnitions of graphs disallow this. E) consists of a ﬁnite set of vertices V (also called nodes) and E is a binary relation on V (i. 1998) Read: Sections 5. Furthermore.e. Most of the problems in computational graph theory that we will consider arise because they are of importance to one or more of these application areas. {1. 2. 4} and the edge set is E = {{1. The ﬁgure above (right) shows the graph G = (V. We say that vertex w is adjacent to vertex v if there is an edge from v to w. Observe that self-loops are allowed by this deﬁnition. 2). The meaning of the weight is dependent on the application. Multiple edges are not permitted (although the edges (v. {2. (1. Lecture 20: Introduction to Graphs (Thursday. distance between vertices or ﬂow capacity through the edge. In an undirected graph (or just graph) G = (V. 60 . In a directed graph we will often say that an edge either leaves or enters a vertex. 3} and E = {(1. {3. 1). we say that an edge is incident on a vertex if the vertex is an endpoint of the edge.4. 4}. One involves algorithms on graphs. For example. (3. E) the edge set consists of unordered pairs of distinct vertices (thus self-loops are not allowed). 3)}. 2. 3. 5. No lecture. In an undirected graph. Intuitively. April 9. E). 1998) Second midterm exam today. we will be discussing a number of various topics. v) are distinct). A digraph or undirected graph is said to be weighted if its edges are labelled with numeric weights. Graphs and Digraphs: A directed graph (or digraph) G = (V. where V = {1. Graphs are very important discrete structures because they are a very ﬂexible mathematical model for many application problems. (2. a set of ordered pairs from V ) called the edges. April 7. 2}.Lecture Notes CMSC 251 Lecture 19: Second Midterm Exam (Tuesday. 3}. 4}}. a graph is a collection of vertices or nodes. 1 2 3 1 2 3 4 Digraph Graph Figure 18: Digraph and graph example. The list of application is almost too long to even consider enumerating it. Basically. precedence constraints in scheduling systems. Examples of graphs in application include communication and transportation networks. any time you have a set of objects. (1. connected by a collection of edges. This shows the graph G = (V. many of these problems form the basic building blocks from which more complex algorithms are then built. {1. Graph Algorithms: For the next few weeks. w) and (w.5. 4}. 2). and there is some “connection” or “relationship” or “interaction” between pairs of objects. VLSI and other sorts of logic circuits. E) where V = {1. a graph is a good way to model this.g. 3).

. edges. . A path in a directed graph is a sequence of vertices v0 . A path is simple if all vertices (except possibly the ﬁrst and last) are distinct. but other notions (such as connectivity) are only deﬁned for one. A Hamiltonian cycle is a cycle that visits every vertex in a graph exactly once. and thus we have Observation: For a digraph G = (V. which simply jumps back and forth along a single edge. In an undirected graph we just talk about the degree of a vertex. In a digraph. vi ) is an edge for i = 1. .e. Graphs: Last time we introduced the notion of a graph (undirected) and a digraph (directed). By the degree of a graph. the number of edges). 2. and the number of edges coming in is called the in-degree. we usually mean the maximum degree of its vertices. v∈V Lecture 21: More on Graphs (Tuesday. Note that every vertex is reachable from itself by a path that uses zero edges. k. i. Recall that graphs and digraphs both consist of two objects. v1 . This is to rule out the degenerate cycle u. . in-deg(v) = v∈V v∈V out-deg(v) = |E|. deg(v) = 2|E|. . We say that w is reachable from u if there is a path from u to w. . w. (By the way. but for a simple cycle we add the requirement that the cycle visit at least three distinct vertices. vk are distinct. There are two interesting classes cycles. . 61 . . this is pronounced “Oiler-ian” and not “Yooler-ian”. (|E| means the cardinality of the set E. as the number of edges which are incident on this vertex. . each edge contributes 1 to the in-degree of a vertex and contributes one to the out-degree of each vertex. E). April 14.Lecture Notes CMSC 251 Observe that directed graphs and undirected graphs are different (but similar) objects mathematically. k. (Note: A self-loop counts as a simple cycle of length 1). . We deﬁned vertices. Certain notions (such as path) are deﬁned for both. the number of edges coming out of a vertex is called the out-degree of that vertex.) There are also “path” versions in which you need not return to the starting vertex. . in addition. . and the notion of degrees of vertices. v1 . a set of vertices and a set of edges. Paths and Cycles: Let’s concentrate on directed graphs for the moment. The length of the path is the number of edges. 5. In an undirected graph each edge contributes once to the outdegree of two different edges and thus we have Observation: For an undirected graph G = (V. For graphs edges are undirected and for graphs they are directed.4. In undirected graphs we deﬁne path and cycle exactly the same. 1998) Read: Sections 5. vk such that (vi−1 . u .5. Today we continue this discussion. A cycle in a digraph is a path containing at least one edge and for which v0 = vk . A cycle is simple if. In a directed graph. A Eulerian cycle is a cycle (not necessarily simple) that visits every edge of a graph exactly once. E).

Observe that in all three cases all the vertices have degree 3. In this graph. and then proving necessary and sufﬁcient conditions for a graph to have such a path. for directed acyclic graph. as is usually seen in data structures. a 1–1 and onto) function f : V → V . by showing that this question could be posed as a problem of whether the multi-graph shown below has an Eulerian path. A connected graph has a single connected component. Clearly (a) and (b) seem to appear more similar to each other than to (c). An acyclic graph (which is not necessarily connected) consists of many free trees. and is called (what else?) a forest. A graph is connected if every vertex can reach every other vertex. v) ∈ E if and only if (f (u). f (v)) ∈ E .Lecture Notes CMSC 251 One of the early problems which motivated interest in graph theory was the K¨ nigsberg Bridge Probo lem. Connectivity and acyclic graphs: A graph is said to be acyclic if it contains no simple cycles. Determining whether graphs are isomorphic is not as easy as it might seem at ﬁrst. This implies that the reachability relation partitions the vertices of the graph in equivalence classes. Isomorphism: Two graphs G = (V. These are called connected components. it is reﬂexive (a vertex is reachable from itself). all but at most two of the vertices must have even degree. Leonard Euler showed that it is not possible. then v is reachable from u). An acyclic connected graph is called a free tree or simply tree for short. there is a unique path between any two vertices in a free tree. u can reach v and v can reach u. The addition of any edge to a free tree results in the creation of a single cycle. o Euler proved that for a graph to have an Eulerian path. but we will not consider it. The question is whether it is possible to cross all 7 bridges without visiting any bridge twice. all 4 of the vertices have odd degree. such that (u. E) and G = (V . Furthermore. consider the graphs in the ﬁgure. strongly connectivity deﬁnes an equivalence partition on the vertices. A directed graph that is acyclic is called a DAG. It turns 62 . This city sits on the Pregel River as is joined by 7 bridges. but in fact looks are deceiving. in contrast to a rooted tree. then u is reachable from w).) As with connected components in graphs. Observe that a free tree is a minimally connected graph in the sense that the removal of any causes the resulting graph to be disconnected. 1 1 3 4 4 3 2 2 Figure 19: Bridge’s at K¨ nigsberg Problem. but the smallest simple cycles in (b) and (c) are of length 5. Observe there are simple cycles of length 4 in (a). so that is not much of a help. These are called the strongly connected components of the digraph. The term “free” is intended to emphasize the fact that the tree has no root. and transitive (if u is reachable from v and v is reachable from w. For example. A digraph is strongly connected if for any two vertices u and v. Note that it is different from a directed tree. that is. Isomorphic graphs are essentially “equal” except that their vertices have been given different names. symmetric (if u is reachable from v. (There is another type of connectivity in digraphs called weak connectivity. E ) are said to be isomorphic if there is a bijection (that is. The “reachability” relation is an equivalence relation on vertices. This implies that (a) cannot be isomorphic to either (b) or (c).

The notation (u → v) means that vertex u from graph (b) is mapped to vertex v in graph (c). E ) where E = {(u. the graph shown in the ﬁgure (b). v) ∈ E | u. For example. K5 is the complete graph on 5 vertices. An undirected graph that has the maximum possible number of edges is called a complete graph. v ∈ V }. Given a subset V ⊆ V . is bipartite. For example. 8} is an independent set. all the vertices of V are adjacent to one another. (7 → 10). (c) out that (b) and (c) are isomorphic. (3 → 3). One possible isomorphism mapping is given below. the subgraph induced by V is the graph G = (V . a subset of vertices V ⊆ V is said to form a clique if the subgraph induced by V is complete. (9 → 6). (4 → 7). 2. 4. Given a graph G. 7. Complete graphs are often denoted with the letter K. 6} is a clique. 4.Lecture Notes CMSC 251 1 6 5 10 9 4 8 3 4 7 2 5 10 9 1 6 2 7 8 3 9 10 2 1 8 5 3 4 7 6 (a) (b) Figure 20: Graph isomorphism. in the ﬁgure below (a). take all the edges of G that join pairs of vertices in V . For example. Bipartite graphs. (2 → 2). (6 → 5). Check that each edge from (b) is mapped to an edge of (c). E ) is a subgraph of G = (V. (8 → 4). the subset {1. Independent set. In other words. A subset of vertices V forms an independent set if the subgraph induced by V has no edges. {(1 → 1). 1 7 2 6 4 3 5 8 9 (a) (b) Figure 21: Cliques. E) if V ⊆ V and E ⊆ E. and {3. Subgraphs and special graphs: A graph G = (V . In other words. A bipartite graph is an undirected graph in which the vertices can be partitioned into two sets V1 and V2 such that all the edges go between a vertex in V1 and V2 (never within the same group). (10 → 9)}. (5 → 8). 63 .

There is always one face. The reversal of a directed graph. 4. 5 . The neighbors of a vertex are the vertices that it is adjacent to. Notice that the other embedding also has 6 faces. and so by Euler’s formula they have F = 2 − V + E = 2 − 5 + 9 = 6 faces. 3. satisﬁes: V − E + F = 2. and faces in a planar graph. 2. For example. In general there may be many different ways to draw a planar graph in the plane. the neighbors of vertex 1 in counterclockwise order are 2. called the unbounded face that surrounds the whole graph. the ﬁgure below shows two essentially different drawings of the same graph. circuits (where wires cannot cross). E). the triangular region bounded by vertices 1. This may also be called the transpose and denoted GT . Is it possible that two different embeddings have different numbers of faces? The answer is no. 64 . This embedding has 6 faces (including the unbounded face). (Our textbook usually uses the abuse of notation V = |V | and E = |E|. Such a drawing is called a planar embedding. edges. Thus the two embeddings are different. which relates the numbers of vertices. An embedding is determined by the counterclockwise cyclic ordering of the neighbors about all the vertices. Beware. 3 . both graphs have 5 vertices. 5 is a face. in the ﬁgure on the left.) Because the running time of an algorithm will depend on the size of the graph. For example. An important fact about planar embeddings of graphs is that they naturally subdivide the plane into regions. Size Issues: When referring to graphs and digraphs we will always let n = |V | and e = |E|. the sometimes V is a set.Lecture Notes CMSC 251 ¯ The complement of a graph G = (V. but on the right the order is 2. often denoted G is a graph on the same vertex set. often denoted GR is a graph on the same vertex set in which all the edge directions have been reversed. In the examples above. 4. Euler’s Formula: A connected planar embedding of a graph with V vertices. and sometimes it is a number. 1 1 5 2 5 2 4 3 4 3 Figure 22: Planar Embeddings. it is important to know how n and e relate to one another. Planar graphs are important special cases of graphs. since they arise in applications of geographic information systems (as subdivisions of region into smaller subregions). and 9 edges. 5. For example. but in which the edge set has been complemented. Some authors use m = |E|. The reason stems from an important observation called Euler’s formula. solid modeling (for modeling complex surfaces as collections of small triangles). in the embedding on the left. and F faces. E edges. called faces. A graph is planar if it can be drawn on the plane such that no two edges cross over one another.

When a gray vertex is processed. it is colored gray (and is part of the frontier). neighbors of these neighbors. Next consider the unvisited neighbors of the neighbors of the neighbors. Repeat this until there are no more unvisited neighbors left to visit. They will be at distance 2 from s. When a vertex has ﬁrst been discovered. This algorithm can be visualized as simulating a wave propagating outwards from s.and its predecessor 67 .. s) while (Q is nonempty) { u = dequeue(Q) for each v in Adj[u] { if (color[v] == white) { color[v] = gray d[v] = d[u]+1 pred[v] = u // // // // vertex distances vertex colors predecessor pointers FIFO queue // initialization // initialize source s // put source in the queue // u is the next vertex to visit // // // // if neighbor v undiscovered . Breadth-ﬁrst search: Given an graph G = (V.. breadth-ﬁrst search starts at some source vertex s and “discovers” which vertices are reachable from s. Breadth-ﬁrst search discovers vertices in increasing order of distance..size(V)] vertex pred[1.. meaning that they have not been seen.Lecture Notes CMSC 251 2 1 s 1 2 2 1 2 s 2 2 3 s 3 3 3 : Finished 3 : Undiscovered : Discovered Figure 25: Breadth-ﬁrst search for shortest paths.E). At any given time there is a “frontier” of vertices that have been discovered.. Breadth-First Search BFS(graph G=(V. and hence can be used as an algorithm for computing shortest paths. E).set its distance . and so on. visiting the vertices in bands at ever increasing distances from s.size(V)] queue Q = empty for each u in V { color[u] = white d[u] = INFINITY pred[u] = NULL } color[s] = gray d[s] = 0 enqueue(Q. Breadth-ﬁrst search is named because it visits vertices across the entire “breadth” of this frontier. then it becomes black. but not yet processed. Initially all vertices (except the source) are colored white.size(V)] int color[1.... Deﬁne the distance between a vertex v and s to be the minimum number of edges on a path from s to v. vertex s) { int d[1.mark it discovered ..

The number of iterations through the inner for loop is proportional to deg(u) + 1.) Summing up over all vertices we have the running time T (n) = n + u∈V (deg(u) + 1) = n + u∈V deg(u) + n = 2n + 2e ∈ Θ(n + e). implying that the other vertex would appear on the very next level of the tree. The reason is that if any cross edge spanned two or more levels.Lecture Notes CMSC 251 enqueue(Q. depending on where the search starts. (Note that there are many potential BFS trees for a given graph. gray or black). x u 2 x s t Q: x. the vertex who ﬁrst discovered u.e. The real meat is in the traversal loop. but they may come up an arbitrary number of levels. Observe that the predecessor pointers of the BFS search deﬁne an inverted tree. and d[u].) These edges of G are called tree edges and the remaining edges of G are called cross edges. If we reverse these edges we get a rooted unordered tree called a BFS tree for G. w u 2 x v 1 w 2 v 1 w 2 v 1 w 2 3 0 1 3 0 1 ? 0 1 t Q: t s x t Q: w. the number of times we go through the while loop is at most n (exactly n assuming each vertex is reachable from the source). but the others are useful depending on the application. a ﬁrst-in ﬁrst-out list. it would have discovered the other vertex. where elements are removed in the same order they are inserted. The ﬁrst item in the queue (the next to be removed) is called the head of the queue.. t s x t Q: u. Let n = |V | and e = |E|. a contradiction. Only the color is really needed for the search. we need to spend a constant amount of time to set up the loop.. (In a directed graph cross edges will generally go down at most 1 level. We will also maintain arrays color [u] which holds the color of vertex u (either white. then when the vertex at the higher level (closer to the root) was being processed. the distance from s to u. Since we never visit a vertex twice. (The +1 is because even if deg(u) = 0. Observe that the initialization portion requires Θ(n) time. and in what order vertices are placed on the queue. It is not hard to prove that if G is an undirected graph. w s x Figure 26: Breadth-ﬁrst search: Example. pred[u] which points to the predecessor of u (i. u. v) } } color[u] = black } } // . u ? v ? w ? u ? v 1 w ? u 2 v 1 w 2 ? 0 ? ? 0 1 ? 0 1 t Q: s u 2 s x s t Q: v.put it in the queue // we are done with u The search makes use of a queue.) Analysis: The running time analysis of BFS is similar to the running time analysis of many graph traversal algorithms. then cross edges always go between two nodes that are at most one level apart in the BFS tree. 68 .

v) E. we will employ common matrix notation. For an directed graph the analysis is essentially the same. . uk ) = i=1 W (ui−1 .Lecture Notes CMSC 251 0 1 2 3 s 1 v 2 x u t w Figure 27: BFS tree. We will present two algorithms for this problem. Recall that adjacency lists are generally more efﬁcient for sparse graphs (and large graphs tend to be sparse). will quickly yield a dense digraph (since typically almost every vertex can reach almost every other vertex). . E. Lecture 23: All-Pairs Shortest Paths (Tuesday. since it can be confused with the number of edges on the path. or negative. . v. and w as we usually do. and the second is a Θ(n3 ) algorithm. ui ). the cost of this path is the sum of the edge weights: k cost(π) = W (u0 . j and k to denote vertices rather than u. Let G = (V. Let G = (V. since the output will be dense. First. Intuitively a negative weight means that you get paid for traveling from u to v. in special cases. All-Pairs Shortest Paths: Last time we showed how to compute shortest paths starting at a designated source vertex. Second. then the weight of this edge is denoted W (u. we compute shortest paths not from a single vertex.2) in CLR. The edge weights may be positive. Because both algorithms are matrix-based. there is no real harm in using the adjacency matrix. storing all the distance information between each pair of vertices. we allow edges in the graph to have numeric weights. April 21. but we will see that the algorithm that we are about to present can handle negative weights as well. Today we talk about a considerable generalization of this problem. For these algorithms. (We will avoid using the term length. called dynamic programming. The latter is called the Floyd-Warshall algorithm. 1998) Read: Chapt 26 (up to Section 26. but we assume that 69 . Intuitively. We consider the problem of determining the cost of the shortest path between all pairs of vertices in a weighted directed graph. The ﬁrst is a rather naive Θ(n4 ) algorithm. For now. is an edge of G. uk .) The distance between two vertices is the cost of the minimum cost path between them. using i. or more generally the cost of traveling from u to v. this weight denotes the distance of the road from u to v. we will assume that the digraph is represented as an adjacency matrix. let us think of the weights as being positive values. zero. However. E) be a directed graph with edge weights. w) denote the input digraph and its edge weight function. . but from every vertex in the graph. Therefore. and assuming that there are no weights on the edges. u1 . u2 ) + · · · W (uk−1 . u1 ) + W (u1 . Given a path π = u0 . v). Both algorithms is based on a completely different algorithm design technique. If (u. rather than the more common adjacency list.

Why? Because by going round the cycle over and over. deﬁne dij to be the cost of the shortest path from vertex i to vertex j that contains at most m edges. j]. the shortest paths cost estimates should converge to their correct values. We will see later than using these values it will be possible to reconstruct any shortest path in Θ(n) time. We want to ﬁnd some parameter. wij =  +∞ if i = j and (i. and would consist of an inﬁnite number of edges. There is one very natural way to do this. Optimal substructure: Each of the subproblems should be solved optimally. Dynamic Programming for Shortest Paths: The algorithm is based on a technique called dynamic programming. which constrains the estimates to the shortest path costs. The technique is related to divide-and-conquer. However. The idea is to 70 (m) . is that this does not lead to the best solution. in the sense that it breaks problems down into smaller problems that it solves recursively. the shortest path would have a weight of −∞. Bottom-up computation: Combine solutions on small subproblems to solve larger subproblems. j) may generally be nonzero.Lecture Notes CMSC 251 there are no cycles whose total weight is negative. because of the somewhat different nature of dynamic programming problems. then pred [i. and later present the Floyd-Warshall algorithm makes use of a different. and hence the direct cost is inﬁnite. The question is how to decompose the shortest path problem into subproblems in a meaningful way. Let us ﬁrst sketch the natural way to break the problem into subproblems. Thus. What is remarkable.  if i = j. we will also compute an auxiliary matrix pred [i. and so W (i. intuitively means that there is no direct link between these two nodes. For 0 ≤ m ≤ n − 1. Let D(m) denote the matrix whose entries are these values. To help us do this. the cost will become smaller and smaller. j) if i = j and (i. then it never does us any good to follow this edge (since it increases our cost and doesn’t take us anywhere new). Input Format: The input is an n × n matrix W of edge weights. The basic elements that characterize a dynamic programming algorithm are: Substructure: Decompose the problem into smaller subproblems. the shortest path cost from vertex i to j. which are based on the edge weights in the digraph. j) ∈ E. not the path itself. since we have no distance to travel. First we will introduce the natural decomposition. / Setting wij = ∞ if there is no edge. The value of pred [i. and eventually to arrive at a solution to the complete problem. If the shortest path ever enters such a cycle. j] will be a vertex that is somewhere along the shortest path from i to j. j) ∈ E. j). If it is positive. Path Length Formulation: We will concentrate just on computing the cost of the shortest path. Note that in digraphs it is possible to have self-loop edges. If the shortest path travels directly from i to j without passing through any other vertices. The output will be an n × n distance matrix D = dij where dij = δ(i. Notice that it cannot be negative (otherwise we would have a negative cost cycle consisting of this single edge). A natural way to do this is to restrict the number of edges that are allowed to be in the shortest path. standard divide-and-conquer solutions are not usually efﬁcient. As this parameter grows. At ﬁrst the estimates will be crude. Dynamic programming problems are typically optimization problems (ﬁnd the smallest or largest solution. j] will be set to null. We let wij denote the entry in row i and column j of W . The reason for setting wii = 0 is that intuitively the cost of getting from any vertex to should be 0. it would never exit. Recovering the shortest paths will also be an issue.  0 W (i. subject to various constraints). It is easy to see why this causes problems. Disallowing negative weight cycles will rule out the possibility this absurd situation. but more efﬁcient dynamic programming formulation.

Since we know that no shortest path can use more than n − 1 edges (for otherwise it would have to repeat a vertex).3 = 4 (unsing: 1. we consider all vertices k. j) of weight wkj to get to j.Lecture Notes CMSC 251 compute D(0) then D(1) .4.3 = 9 (using: 1. since the edges of the digraph are just paths of length 1. the ﬁrst term is redundant. There are two cases: Case 1: If the shortest path uses strictly fewer than m edges. So as our basis case we have (1) dij = wij . how do we compute these distance matrices? As a basis. Case 2: If the shortest path uses exactly m edges. This is illustrated in the ﬁgure (b) above. So we minimize over all possible choices. for m ≥ 2. This gives dij (m) (m−1) (m−1) = min 1≤k≤n dik (m−1) + wkj . observe that D(1) = W . To compute the shortest path from i to j. and then the single edge length cost from k to j. up to D(n−1) . we have dij This term occurs in the second case (when k = j). In the second case. Now. since we are given W as input. This is illustrated in the ﬁgure (a) below. We can simplify this formula a bit by observing that since wjj = 0. and then follows a single edge (k. we claim that it is possible to compute D(m) from D(m−1) . 1 (1) 1 4 4 2 d1. (m) Consider how to compute the quantity dij . then its cost is just dij (m−1) . It is just as easy to start with D(1) . This is the length of the shortest path from i to j using at most m edges. we could start with paths of containing 0 edges. One way would be to write a recursive procedure to do it. This suggests the following rule: dij (m) (m−1) (m−1) = min dij min1≤k≤n dik + wkj .3 = INF (no path) 8 2 dij i dik (m−1) j wkj d1. then the path uses m − 1 edges to go from i to some vertex k. Here is a possible implementation.3) (3) (2) (m−1) 9 3 1 k (a) Figure 28: Dynamic Programming Formulation. to make the induction go. = dij + wjj . The array of weights 71 . D(0) (as our text does). i.2. using m − 1 edges. (b) The question is. j).2.3) d1. we know that D(n−1) is the ﬁnal distance matrix. and consider the length of the shortest path from i to k. Thus. Notice that the two terms of the main min correspond to the two cases. The path from i to k should be shortest (by the principle of optimality) so the length of the resulting path is (m−1) dik + wij . However. But we do not know what k is. The next question is how shall we implement this rule. and so on. the initial call would be Dist(n − 1.

1.. 72 .Lecture Notes CMSC 251 Recursive Shortest Paths Dist(int m.n]) { matrix dd[1. The total running time is T (n − 1. Once we compute one of these values. For each m... Otherwise.n... Observe that there are only O(n3 ) different possible numbers dij that we have to compute.n.passing through k dd[i. for 1 ≤ m ≤ n − 1... we will store it in a table.n.. When m = 1. and so its running time is Θ(n3 ). 1. d[i.j]) return dd // return matrix of distances } The procedure ExtendShortestPath() consists of 3 nested loops. we make n recursive calls to T (m − 1. The ﬁgure below gives an implementation of this idea.j]. int j) { if (m == 1) return W[i.k] + W[k. n). n) = 1 nT (m − 1. int W[1. and so the total running time is Θ(n4 ).n.j] = min(dd[i. 1.n]) { array D[1. The matrix D(m) is stored as D[m]. If you unravel the recursion.n].. W) return D[n-1] } // initialize D[1] // comput D[m] from D[m-1] ExtendShortestPath(int n. So how do we make this faster? The answer is to use table-lookup. The result will be O(nn ). int d[1.to j for k = 1 to n do // . i. Then if we want this value again. implying that D is a 3-dimensional matrix. D[m] is a 2-dimensional matrix.. from the above formula. the recursion bottoms out. we will simply look its value up in the table.j]) return best } // single edge case // apply the update rule Unfortunately this will be very slow.n] // copy d to temp matrix for i = 1 to n do // start from i for j = 1 to n do // . 1.. w) is given the number of vertices n and the matrix of edge weights W . This is the key to dynamic (m) programming. D[m-1].n. It is a straightforward to solve this by expansion. otherwise.. This gives the recurrence: T (m. k) + w[k. n) = 1. 1. It is called n − 2 times by the main procedure. Dynamic Program Shortest Paths ShortestPath(int n. This is illustrated in the ﬁgure below.n-1][1. n) + 1 if m = 1. Next time we will see that we can improve on this. We initialize D(1) by copying W .j] best = INF for k = 1 to n do best = min(best. and we have T (1... There are exponentially many such paths. Then each call to ExtendPaths() computes D(m) from D(m−1) . int i. you will see that this algorithm is just blindly trying all possible paths from i to j. Let T (m. It is not hard to see why. n) be the running time of this algorithm on a graph with n vertices. The algorithm makes n calls to itself with the ﬁrst argument of m − 1. rather than recompute it.. where the ﬁrst argument is m.n] = d[1.n] copy W to D[1] for m = 2 to n-1 do D[m] = ExtendPaths(n. 1. Dist(m-1.. a huge value.. The main procedure ShortestPath(n.n. n). int W[1.

As before. . . . April 23. and there the superscript (m) restricts the number of edges the path is allowed to use. In other words. We deﬁne dij to be the shortest path from i to j such that any intermediate vertices on the path are chosen from the set {1. 2. Thus. . Floyd-Warshall Update Rule: How do we compute dij assuming that we have already computed the previous matrix d(k−1) ? As before. whether u can reach v. 2. assuming that the intermediate vertices are chosen from {1. . The Floyd-Warshall algorithm improves upon this algorithm. . or it visits some intermediate vertices along the way. . notice how the value of d32 changes as k varies. j). but (k) it leads to a faster algorithm. The path is free to visit any subset of these vertices. . in the digraph (k) shown in the following ﬁgure. depending on the ways that we might get from vertex i to vertex j. For example. v −1 are the intermediate vertices (k) of this path. v2 . there are two basic cases. 2. . 1998) Read: Chapt 26 (up to Section 26. Floyd-Warshall Algorithm: We continue discussion of computing shortest paths between all pairs of vertices in a directed graph. . . For a path p = v1 . . and to do so in any order. k}: (k) 73 . k}. v3 . . The Floyd-Warshall algorithm dates back to the early 60’s. running in Θ(n3 ) time. . . . we consider a path from i to j which either consists of the single edge (i. Lecture 24: Floyd-Warshall Algorithm (Thursday. k}. . but these intermediate can only be chosen from {1. . v we say that the vertices v2 .2) in CLR. At ﬁrst the formulation may seem most unnatural. we will compute a set of matrices whose entries are dij .Lecture Notes CMSC 251 1 1 4 4 2 8 2 0 ? W = 4 ? 8 0 ? 2 ? 1 0 9 1 ? ? 0 = D (1) 9 3 1 ? = infinity 1 5 4 2 3 2 1 1 13 4 5 9 3 3 1 12 0 3 (2) 5 0 D = 4 12 13 2 9 1 0 3 1 ? 5 0 5 4 2 3 2 1 4 7 6 4 3 3 5 1 7 0 5 D = 4 7 (3) 3 0 7 2 4 1 0 3 1 6 5 0 Figure 29: Shortest Path Example. Warshall was interested in the weaker question of reachability: determine for each pair of vertices u and v. Note that a path consisting of a single edge has no intermediate vertices. Floyd realized that the same technique could be used to compute shortest paths with only minor variations. . The genius of the Floyd-Warshall algorithm is in ﬁnding a different formulation for the shortest path subproblem than the path length formulation introduced earlier. . the difference between Floyd’s formulation and the previous formulation is that here the superscript (k) restricts the set of vertices that the path is allowed to pass through. We will change the meaning of each of these entries.

.2 = 12 (3. k − 1} and hence the length of the shortest path is dij . In order for the overall path to be as short as possible we should take the shortest path from i to k. The length of the path is dik This suggests the following recursive rule for computing d(k) : dij dij (n) (0) = = wij . 1.2) (a) Figure 30: Floyd-Warshall Formulation. .) That is. (The assumption that there are no negative cost cycles is being used here. intermediate vertices only in {1. .j] = d[i. but this will be prohibitively slow. Again. Here is the complete algorithm.n] for i = 1 to n do { for j = 1 to n do { d[i. so we can assume that we pass through k exactly once.1. we go from i to k.1.n.2 = 12 (3. . We have also included predecessor pointers.2) 8 2 d3. pred [i.n. dik (k−1) + dkj (k−1) for k ≥ 1. The ﬁnal answer is dij because this allows all possible vertices as intermediate vertices. min dij (k−1) (k) .. j] for extracting the ﬁnal shortest paths.1.2 = 7 (3.Lecture Notes CMSC 251 d3.) Each of these paths uses (k−1) (k−1) + dkj .n]) { array d[1..to j // new shorter path length // new path is through k 74 .j] = null } } for k = 1 to n do for i = 1 to n do for j = 1 to n do if (d[i. (This is the principle of optimality. we compute it by storing the values in a table. k − 1}. Instead. and the shortest path from k to j. .2 = 12 (3. .k] + d[k. and then from k to j. int W[1.2) d3.j]) < d[i.2) dik (k−1) Vertices 1 through k−1 (k−1) 9 3 1 dkj k d3.from i // . and looking the values up as we need them. We will discuss them later. . we could write a recursive program to compute dij . 2..j] pred[i.j]) { d[i.j] pred[i.j] = k (k) // initialize // use intermediates {1. We do go through k: First observe that a shortest path does not pass through the same vertex twice.. 1.. . We don’t go through k at all: Then the shortest path from i to j uses only intermediate vertices (k−1) {1...j] = W[i.2 = INF (no path) 1 (0) (1) (2) (3) (4) i dij (k−1) j 1 4 4 2 d3.1.4..k] + d[k.. Floyd-Warshall Algorithm Floyd_Warshall(int n.k} // .

To ﬁnd the shortest path from i to j. j] = k. j] can be used to extract the ﬁnal path. we set pred [i.j] = null output(i. An example is shown in the following ﬁgure. If it is null.j) { if pred[i. pred[i.j]. The space used by the algorithm is Θ(n2 ). j). Extracting Shortest Paths: The predecessor pointers pred [i. If the shortest path does not pass through any intermediate vertex.j) else { Path(i. we recursively compute the shortest path from i to pred [i. } } // path is a single edge // path goes through pred // print path from i to pred // print path from pred to j 75 . Observe that we deleted all references to the superscript (k) in the code. Path(pred[i. j] and the shortest path from pred [i. j] = null . Otherwise. Printing the Shortest Path Path(i. then the shortest path is just the edge (i. j). Here is the idea. we consult pred [i. whenever we discover that the shortest path from i to j passes through an intermediate vertex k. then pred [i. 1 1 1 4 4 2 8 2 0 (0) ? D = 4 ? 8 0 ? 2 ? 1 0 9 1 ? ? 0 1 4 4 2 8 0 ? D = 4 2 ? (1) 8 0 12 2 ? 1 0 9 1 ? 5 0 5 9 3 1 ? = infinity 9 3 1 12 1 1 1 4 4 2 8 2 5 3 9 3 1 12 0 (2) ? D = 4 ? 8 0 12 2 9 1 0 3 1 ? 5 0 5 2 8 1 4 7 4 6 5 9 3 3 1 12 0 5 D = 4 2 7 (3) 8 0 12 2 9 1 0 3 1 6 5 0 1 5 1 4 7 4 2 6 3 2 5 3 4 3 1 7 0 5 D = 4 7 (4) 3 0 7 2 4 1 0 3 1 6 5 0 Figure 31: Floyd-Warshall Example.Lecture Notes CMSC 251 } return d } // matrix of final distances Clearly the algorithm’s running time is Θ(n3 ).j]). j] to j. j]. It is left as an exercise that this does not affect the correctness of the algorithm.

and search for matches in the other string. let X = ABRACADABRA and let Z = AADAA . 1998) Read: Section 16. But exact matches rarely occur in biology because of small changes in DNA replication. Xik . There are a number of measures of similarity in strings. . Xi = x1 . . . . The method of string similarities should be insensitive to random insertions and deletions of characters from some originating string. . < ik ≤ n) such that Z = Xi1 . . j ] for i ≤ i and j ≤ j (but not both equal). There are a number of important problems here. Strings: One important area of algorithm design is the study of algorithms for character strings. . . 0] = 0. we will derive a dynamic programming solution. . Then the longest common subsequence is Z = ABADABA . G. Let c[i. since there are an exponential number of possible subsequences. the minimum number of single character insertions.) In many instances you do not want to ﬁnd a piece of text exactly. Basis: c[i. In typical DP fashion. For example. . Eventually we are interested in c[m. Xi2 . Dynamic Programming Solution: The simple brute-force solution to the problem would be to try all possible subsequences from one string. 76 . . j] assuming that we already know the values of c[i . . . Given two sequences X = x1 . let X be as before and let Y = YABBADABBADOO . Note that it is not always unique. deletions. then the longest common subsequence is empty. we need to break the problem into smaller pieces. Exact substring search will only ﬁnd exact matches. it is of interest to compute similarities between strings that do not match exactly. X0 is the empty sequence. x2 . (This is what text editors and functions like ”grep” do when you perform a search. xi . or transpositions necessary to convert one string into another. For this reason. The idea will be to compute the longest common subsequence for every possible pair of preﬁxes. This arises for example in genetics research. z2 . yn determine a longest common subsequence. Given two strings X and Y . Genetic codes are stored as long DNA molecules. but this is hopelessly inefﬁcient. is that of determining the length of the longest common subsequence. Instead. . the longest common subsequence of X and Y is a longest sequence Z which is both a subsequence of X and Y . The idea is to compute c[i. Among the most important has to do with efﬁciently searching for a substring or generally a pattern in large piece of text. we say that Z is a subsequence of X if there is a strictly increasing sequence of k indices i1 . . . . . T. j] denote the length of the longest common subsequence of Xi and Yj . We begin with some observations. There are many ways to do this for strings. n] since this will be the LCS of the two entire strings. i2 .3 in CLR. xm and Z = z1 . . . . . which we will study today. then Z is a subsequence of X. x2 . The ﬁrst is the edit distance. . . . A. Longest Common Subsequence: Let us think of character strings as sequences of characters. For example. The other. that is. ik (1 ≤ i1 < i2 < . The Longest Common Subsequence Problem (LCS) is the following. zk . If either sequence is empty. but it turns out for this problem that considering all pairs of preﬁxes will sufﬁce for us. .Lecture Notes CMSC 251 Lecture 25: Longest Common Subsequence (April 28. but rather something that is ”similar”. The DNA strands can be broken down into a long sequences each of which is one of four basic types: C. . Given two sequences X = x1 . . . xm and Y = y1 . A preﬁx of a sequence is just an initial string of values. For example the LCS of ABC and BAC is either AC or BC . 0] = c[j. .

Last characters do not match: Suppose that xi = yj . creating a longer common subsequence.. Implementing the Rule: The task now is to simply implement this rule. if i.m. we can add it to the end of the LCS. j] =  max(c[i. Since both end in A.. b[0. but it will not make the LCS any smaller if we do. char y[1. But this would contradict the deﬁnition of the LCS as being longest. 0. if i. and further suppose that the LCS ended in a different character B. j − 1] + 1 c[i. c[i − 1.j-1]) { c[i. (At ﬁrst you might object: But how did you know that these two A’s matched with each other.) Since the A is part of the LCS we may ﬁnd the overall LCS by removing A from both sequences and taking the LCS of Xi−1 = ABC and Yj−1 = DAC which is AC and then adding A to the end. Because A is the last character of both strings. 0. j − 1] + 1. To see this. so we try both and take the one that gives us the longer LCS. The answer is that we don’t. giving ACA as the answer. Thus either xi is not part of the LCS.n].j-1]+1 b[i. j].j] = 0 b[0.j] = c[i-1. In this case xi and yj cannot both be in the LCS (since they would have to be the last character of the LCS). or yj is not part of the LCS (and possibly both are not part of the LCS). if xi = yj then c[i. j].j] = SKIPX } else { // initialize column 0 // initialize row 0 // take X[i] and Y[j] for LCS // X[i] not in LCS // Y[j] not in LCS 77 . which is c[i − 1. (We will explain why later.0] = 0 b[i. j − 1].. We do not know which is the case. We will store some helpful pointers in a parallel array. j]) if i = 0 or j = 0. Longest Common Subsequence LCS(char x[1. j − 1].0] = SKIPX } for j = 0 to n do { c[0. As with other DP solutions.Lecture Notes CMSC 251 Last characters match: Suppose xi = yj . Combining these observations we have the following rule:   0 c[i − 1.n]) { int c[0.m]. Thus.j] b[i. j] = max(c[i − 1. j − 1]). Example: Let Xi = ABCA and let Yj = DACA . j] = c[i − 1. In the second case the LCS is the LCS of Xi and Yj−1 which is c[i.j] = ADDXY } else if (c[i-1. c[i. j > 0 and xi = yj .) Thus.j] >= c[i.j] = c[i-1.. it follows that this particular instance of the character A cannot be used anywhere else in the LCS. In the ﬁrst case the LCS of Xi and Yj is the LCS of Xi−1 and Yj .m.. we claim that the LCS must also end in A. j > 0 and xi = yj . if xi = yj then c[i.n] for i = 0 to m do { c[i. Thus. suppose by contradiction that both characters end in A. then the LCS must also end in this same character.j] = SKIPY } for i = 1 to m do { for j = 1 to n do { if (x[i] == y[j]) { c[i. we concentrate on computing the maximum length. We left undone the business of showing that if both strings end in the same character..

So we take this common character. j − 1] to the northwest ( ). j] = SKIPY . The algorithm also uses O(mn) space.n].m]. break case SKIPX: i--. j--. and so we skip it and go to b[i − 1. break } } return LCS } 78 . j] above us (↑). and continue with entry b[i − 1. 0.j-1] b[i. Print Subsequence getLCS(char x[1. break case SKIPY: j--.j] = SKIPY } } } return c[m..j] = c[i.. The running time of the algorithm is clearly O(mn) since there are two nested loops with m and n iterations.m.. j] = ADDXY means that X[i] and Y [j] together form the last character of the LCS. int b[0.n]. then we know that X[i] is not in the LCS. Following these back pointers.n].Lecture Notes CMSC 251 c[i. and outputting a character with each diagonal move gives the ﬁnal subsequence. respectively. If b[i. char y[1. then we know that Y [j] is not in the LCS... j − 1] to the left (←). and so we skip it and go to b[i. Intuitively b[i. Similarly.n]) { LCS = empty string i = m j = n while(i != 0 && j != 0) { switch b[i. } 0 0 1 2 3 4 m= 5 0 B 0 A 0 C 0 D 0 B 0 1 0 1 1 1 1 1 2 0 1 1 1 2 2 3 0 1 1 2 2 2 4 =n 0 1 1 2 2 3 LCS = BCB 0 1 2 3 4 m= 5 0 0 B 0 A 0 C 0 D 0 B 0 1 0 1 1 1 1 1 2 0 1 1 1 2 2 3 0 1 1 2 2 2 4 =n 0 1 1 2 2 3 start here B D C B X = BACDB Y = BDCB B D C B LCS Length Table with back pointers included Figure 32: Longest common subsequence example. Extracting the Actual Sequence: Extracting the ﬁnal LCS is done by using the back pointers stored in b[0.j] { case ADDXY: add x[i] (or equivalently y[j]) to front of LCS i--.. j] = SKIPX .0. if b[i.m.

mult[((A1 A2 )A3 )] mult[(A1 (A2 A3 ))] = (5 · 4 · 6) + (5 · 6 · 2) = 180. there are restrictions on the dimensions. q C[i. 1998) Read: Section 16. and the result will be a p × r matrix C. April 30. Consider the case of 3 matrices: A1 be 5 × 4. not all involve the same number of operations. A2 be 4 × 6 and A3 be 6 × 2. where the dynamic programming issues are easiest to see. (The number of columns of A must equal the number of rows of B. k]B[k. 79 . You can multiply a p × q matrix A times a q × r matrix B. Chain Matrix Multiplication: This problem involves the question of determining the optimal sequence for performing a series of operations. . pn where Ai is of dimension pi−1 × pi . A2 . but we are not free to rearrange the order of the matrices. thus the total time (e. An Matrix multiplication is an associative but not a commutative operation. Also recall that when two (nonsquare) matrices are being multiplied. = (4 · 6 · 2) + (5 · 4 · 2) = 88. . An and dimensions p0 . it just ﬁgures out the best order in which to perform the multiplications. number of multiplications) to multiply these two matrices is p · q · r. . Observe that there are pr total entries in C and each takes O(q) time to compute. considerable savings can be achieved by reordering the evaluation sequence. p1 . . . .1 of CLR. Note that although any legal parenthesization will lead to a valid result.) In particular for 1 ≤ i ≤ p and 1 ≤ j ≤ r. We will study the problem in a very restricted instance. A * q B = = p C Multiplication time = pqr r p r q Figure 33: Matrix Multiplication. The Chain Matrix Multiplication problem is: Given a sequence of matrices A1 . . Important Note: This algorithm does not perform the multiplications. . j] = k=1 A[i. This general class of problem is important in compiler design for code optimization and in databases for query optimization.g. Suppose that we wish to multiply a series of matrices A1 A2 . .Lecture Notes CMSC 251 Lecture 26: Chain Matrix Multiplication (Thursday. . Even for this small example. j]. A p × q matrix has p rows and q columns. This means that we are free to parenthesize the above multiplication however we like. determine the multiplication sequence that minimizes the number of operations.

. because if we want to ﬁnd the optimal sequence for multiplying A1. If i < j. i] m[i. Thus. j] = = 0 i≤k<j min (m[i. That is.. and so the cost is 0. For convenience we can write Ai. m[i.k times Ak+1. This is related to a famous function in combinatorics called the Catalan numbers (which in turn is related to the number of different binary trees on n nodes).j . In other words. i] = 0. for any k.. we ﬁnd that C(n) ∈ Ω(4n /n3/2 ).j is a pi−1 × pj matrix. The former problem can be solved by just considering all possible values of k. whose solutions can be combined to solve the global problem.k Ak+1. m[i. and Ak+1. If you have n items. as Ai. j].k is a pi−1 × pk matrix.. This suggests the following recursive rule for computing m[i..n . and build the table in a bottom-up manner. the number of ways of parenthesizing an expression is very large. A1. Thus the problem of determining the optimal sequence of multiplications is broken up into 2 questions: how do we decide where to split the chain (what is k?) and how do we parenthesize the subchains A1. In parenthesizing the expression. we create two sublists to be parenthesized..n ? The subchain problems can be solved by recursively applying the same scheme.. namely just after the 1st item. Since 4n is exponential and n3/2 is just polynomial.n = A1.k is m[i. Dynamic Programming Solution: This problem. Since Ai.j to be the product of matrices i through j. (There is nothing to multiply. For 1 ≤ i ≤ j ≤ n. and this grows very fast. j] + pi−1 pk pj ) 80 for i < j. When we split just after the kth item.. This suggests the following recurrence for P (n).n we must use the optimal sequences for A1. n+1 n Applying Stirling’s formula. If you have just one item. i ≤ k < j. The optimum time to compute Ai.j is a pk × pj matrix. then we are asking about the product Ai.Lecture Notes CMSC 251 Naive Algorithm: We could write a procedure which tries all possible parenthesizations.k and Ak+1. one with k items. We will store the solutions to the subproblems in a table. then the total is L · R.. In particular P (n) = C(n − 1) and C(n) = 1 2n . k] + m[k + 1. We may assume that these values have been computed previously and stored in our array. the time to multiply them is pi−1 · pk · pj . etc.. Since these are independent choices. the exponential will dominate. let m[i. As a basis observe that if i = j then the sequence contains only one matrix... Unfortunately.j .k and Ak+1. This can be split by considering each k. just after the 2nd item. j] denote the minimum number of multiplications needed to compute Ai. the subproblems must be solved optimally for the global problem to be solved optimally.. the number of different ways of parenthesizing n items: P (n) = 1 n−1 k=1 P (k)P (n − k) if n = 1.... and the other with n − k items. We want to break the problem into subproblems. then there is only one way to parenthesize. It is easy to see that Ai.) Thus. At this level we are simply multiplying two matrices together.j . if there are L ways to parenthesize the left sublist and R ways to parenthesize the right sublist.n . Then we could consider all the ways of parenthesizing these. .. then there are n − 1 places where you could break the list with the outermost pair of parentheses. and the optimum time to compute Ak+1. a parenthesization). this will not be practical except for very small n.. and just after the (n − 1)st item. Notice that this problem satisﬁes the principle of optimality.j is m[k +1. 1 ≤ k ≤ n − 1. we can consider the highest level of parenthesization.. j]. The optimum cost can be described by the following recursive deﬁnition. like other dynamic programming problems involves determining a structure (in this case. if n ≥ 2. k].

n-1. j] we will need to access values m[i. The only tricky part is arranging the order in which to compute the values. but the key is that there are three nested loops.1 m[i..n] for i = 1 to n do m[i. k] + m[k+1. j]. Intuitively. s[i. and that s[i. which is given below. n]. The basic idea is to leave a split marker indicating what the best split is. .. j] for k lying between i and j. j) return X*Y.. j) { if (i > j) { k = s[i. .j . This tells us that the best way to multiply the subchain Ai. j] + p[i-1]*p[k]*p[j] if (q < m[i. j]) { m[i. and ﬁnally multiply these together. We can maintain a parallel array s[i.j] X = Mult(i. } // X = A[i]. We need to be a little careful in setting up the loops.n] and s } The array s[i. int n) { array s[1.A[j] // multiply matrices X and Y 81 . The procedure returns a matrix. Extracting the ﬁnal Sequence: To extract the actual sequence is a fairly easy extension. what value of k lead to the minimum value of m[i.n]. j] tells us what multiplication to perform last. In the process of computing m[i. this means that i + L − 1 ≤ n. Extracting Optimum Sequence Mult(i. j] = k. i ≤ n − L + 1. The actual multiplication algorithm uses the s[i. or in other words.. The running time of the procedure is Θ(n3 ). Let L = j − i + 1 denote the length of the subchain being multiplied. So our loop for i runs from 1 to n − L + 1 (to keep j in bounds).. then j = i + L − 1. 3. if j > i. j] is global to this recursive procedure. The subchains of length 1 (m[i.Lecture Notes CMSC 251 It is not hard to convert this rule into a procedure. Note that we only need to store s[i..j] = q.k and then multiply the subchain Ak+1.A[k] // Y = A[k+1]. and each can iterate at most n times. s[i. For example. Assume that the matrices are stored in an array of matrices A[1.j is to ﬁrst multiply the subchain Ai. It is used to extract the actual sequence... j] in which we will store the value of k providing the optimal split.j] = INFINITY for k = i to j-1 do { q = m[i. suppose that s[i. k) Y = Mult(k+1. . Chain Matrix Multiplication Matrix-Chain(array p[1.j] = k } } } } return m[1. .. Then we build up by computing the subchains of lengths 2. Since we want j ≤ n. If a subchain of length L starts at position i. This suggests that we should organize things our computation according to the number of matrices in the subchain.i] = 0 // initialize for L = 2 to n do { // L = length of subchain for i = 1 to n-L+1 do { j = i + L . i]) are trivial. n. The ﬁnal answer is m[1. j] will be explained later. k] and m[k +1. } else return A[i]. that is. j] value to determine how to split the current sequence.n]. We’ll leave this as an exercise in solving sums. that is..2. j] when we have at least two matrices..

May 5. Although your algorithm can solve small problems reasonably efﬁciently (e. 1998) Read: Chapt 36. Lecture 27: NP-Completeness: General Introduction (Tuesday. The question that often arises in practice is that you have tried every trick in the book. This algorithm is tricky. Easy and Hard Problems: At this point of the semester hopefully you have learned a few things of what it means for an algorithm to be efﬁcient. up through section 36. 2. 4. All of this is ﬁne if it helps you discover an acceptably efﬁcient algorithm to solve your problem. and how to design algorithms and determine their efﬁciency asymptotically. or n!. The optimal sequence is ((A1 (A2 A3 ))A4 ). 7 meaning that we are multiplying A1 (5 × 4) times A2 (4 × 6) times A3 (6 × 2) times A4 (2 × 7). and still your best algorithm is not fast enough. or worse. n ≤ 20) the really large applications that you want to solve (e.j] 3 i 0 A3 Final order 1 2 A2 A3 A4 Figure 34: Chain Matrix Multiplication. It turns out that many of these “hard” problems were interrelated in the sense that if you could solve any one of them in polynomial time.g. Around this time a remarkable discovery was made. We will do this by considering problems that can be solved in polynomial time. In the ﬁgure below we show an example.Lecture Notes CMSC 251 4 j 2 1 0 5 A1 p0 p1 4 A2 p2 120 0 6 3 88 48 158 1 2 104 m[i. Towards the end of the 60’s and in the eary 70’s there were great strides made in ﬁnding efﬁcient solutions to many combinatorial problems.4. The next couple of lectures we will discuss some of these problems and introduce the notion of P. or 2(2 ) . then you could solve all of them in polynomial time. NP. or perhaps some proof that these problems are inherently hard to solve and no algorithmic solutions exist that run under exponential time. or 2n . People began to wonder whether there was some unknown paradigm that would lead to a solution to these problems. you realize that √ n it is running in exponential time.g. Polynomial Time: We need some way to separate the class of efﬁciently solvable problems from inefﬁciently solvable problems. 82 . so it would be a good idea to trace through this example (and the one given in the text). perhaps n n .j] 3 84 0 2 7 A4 p3 p4 A1 i 4 j 2 3 1 4 1 1 3 3 2 3 3 2 s[i. The initial set of dimensions are 5. The best solutions known for these problems required exponential time. and NP-completeness. 6. n ≥ 100) your algorithm does not terminate quickly enough. But at the same time there was also a growing list of problems for which there seemed to be no known efﬁcient algorithmic solutions. When you analyze its running time.

so it is considered to be in polynomial time. but you should be aware that if you look in the book. For example. what is the minimum number of colors needed to color a graph. but that would be unacceptably inefﬁcient. Some functions that do not “look” like polynomials but are. Deﬁnition: Deﬁne P to be the set of all decision problems that can be solved in polynomial time. For rather technical reasons.) We have also assumed that operations on numbers can be performed in constant time. you would do a binary search on k). Decision Problems: Many of the problems that we have discussed involve optimization of one form or another: ﬁnd the shortest path. We will phrase many optimization problems in terms of decision problems. A problem is called a decision problem if its output is a simple “yes” or “no” (or you may think of this as True/False. ﬁnd the minimum cost spanning tree. For example. . called the undirected Hamiltonian cycle problem (UHC). most NP-complete problems that we will discuss will be phrased as decision problems. Thus. One historical artifact of NP-completeness is that problems are stated in terms of language-recognition problems. A polynomial time algorithm is any algorithm that runs in time O(nk ) where k is some constant that is independent of n. Of course.Lecture Notes CMSC 251 We have measured the running time of algorithms using worst-case complexity. . even though we know of no efﬁcient 83 . An interesting aspect of this problems is that if the graph did contain a Hamiltonian cycle. rather than asking. Up until now all the algorithms we have seen have had the property that their worst-case running times are bounded above by some polynomial in the input size. it is important to introduce the notion of a veriﬁcation algorithm. Consider the following problem. NP and Polynomial Time Veriﬁcation: Before talking about the class of NP-complete problems. (By a reasonably efﬁcient encoding. but they have the property that it is easy to verify whether its answer is correct. v13 ”. we should be more careful and assume that arithmetic operations require at least as much time as there are bits of precision in the numbers being stored. Given an undirected graph G. A problem is said to be solvable in polynomial time if there is a polynomial time algorithm that solves it. as a function of n. We will not be taking this approach. the size of the input. Many language recognition problems that may be very hard to solve. we assume that there is not some signiﬁcantly shorter way of providing the same information. if you could answer this decision problem. We have deﬁned input size variously for different problems. From now on. it will often describe NP-complete problems as languages. then it would be easy for someone to convince you that it did. v1 . The important constraint is that the exponent in a polynomial function must be a constant that is independent of n. For example. does G have a cycle that visits every vertex exactly once. . We could then inspect the graph. On the other hand. 0/1. a running time of O(n log n) does not look like a polynomial. n. some functions that do “look” like polynomials are not. For example. instead we would phrase this as a decision problem: Given a graph G and an integer k. They would simply say “the cycle is v3 . . but the bottom line is the number of bits (or bytes) that it takes to represent the input using any reasonably efﬁcient encoding. a running time of O(nk ) is not considered in polynomial time if k is an input parameter that could vary as a function of n. but it is bounded above by a the polynomial O(n2 ). is it possible to color G with k colors. v7 . ﬁnd the knapsack packing of greatest value. you could write numbers in unary notation 111111111 = 1002 = 8 rather than binary. This is because the theory of NP-completeness grew out of automata and formal language theory. accept/reject). and check that this is indeed a legal cycle and that it visits all the vertices of the graph exactly once. then you could determine the minimum number of colors by trying all possible values of k (or if you were more clever.

For example. (This is covered in the text. The given cycle is called a certiﬁcate. let us ﬁrst consider the following example. However it is not known whether P = NP. Most experts believe that P = NP. It seems unreasonable to think that this should be so. Observe that P ⊆ NP. but no one has a proof of this. In other words. We will describe this next. Suppose that there are two problems. A and B. It would be easy for someone to convince that it has at least one.Lecture Notes CMSC 251 Nonhamiltonian Hamiltonian Figure 35: Undirected Hamiltonian cycle. This referred to a program running on a nondeterministic computer that can make guesses. You know (or you strongly believe at least) that it is impossible to solve 84 . There may be even harder problems to solve that are not in the class NP. NP is a set of languages based on some complexity measure (the complexity of veriﬁcation). such a computer could nondeterministically guess the value of certiﬁcate. then we can certainly verify that an answer is correct in polynomial time. Reductions: Before discussing reductions. way to solve the Hamiltonian cycle problem. Note that not all languages have the property that they are easy to verify. In other words. and so it is widely believed that there is no polynomial time solution to the problem. Beware that polynomial time veriﬁcation and polynomial time solvable are two very different concepts. we say that the problem is polynomial time veriﬁable. if we can solve a problem in polynomial time. and then verify that the string is in the language in polynomial time. there is a very efﬁcient way to verify that a given graph is Hamiltonian. we can solve it in polynomial time anyway). It would be covered in a course on complexity theory or formal language theory. just being able to verify that you have a correct solution does not help you in ﬁnding the actual solution very much. think of the set of NP-complete problems as the “hardest” problems to solve in the entire class NP. consider the problem of determining whether a graph has exactly one Hamiltonian cycle. One question is how can we the notion of “hardness” mathematically formal. If it is possible to verify the accuracy of a certiﬁcate for a problem in polynomial time . NP-Completeness: We will not give a formal deﬁnition of NP-completeness. The Hamiltonian cycle problem is NP-complete. Like P. This is where the concept of a reduction comes in. Basically. These are called NP-hard problems. but it is not clear what someone (no matter how smart) would say to you to convince you that there is not another one. We have avoided introducing nondeterminism here. Why is the set called “NP” rather than “VP”? The original term NP stood for “nondeterministic polynomial time”. and higher level courses such as 451). (More formally. This is some piece of information which allows us to verify that a given string is in a language. Deﬁnition: Deﬁne NP to be the set of all decision problems that can be veriﬁed by a polynomial time algorithm. we do not even need to see a certiﬁcate to solve the problem. For now.

86 . you could use this algorithm and this little transformation solve the UHC problem. Lecture 28: NP-Completeness and Reductions (Thursday. Given an undirected graph G. 1998) Read: Chapt 36. May 7.u) to G’ } return Dir_Ham_Cycle(G’) } You would now have a polynomial time algorithm for UHC. on the way to deﬁning NP-completeness. UHC to DHC Reduction bool Undir_Ham_Cycle(graph G) { create digraph G’ with the same number of vertices as G for each edge {u. Summary: Last time we introduced a number of concepts. Notice that neither problem UHC or DHC has been solved. Now. P: is the class of all decisions problems that can be solved in polynomial time (that is. you return as the answer for the Hamiltonian cycle. You take the undirected graph G.4. through Section 36.Lecture Notes CMSC 251 Here is how you might do this. and then call your (supposed) algorithm for directed Hamiltonian cycles. You have just shown how to convert a solution to DHC into a solution for UHC.v} in G { add edges (u. if you could develop an efﬁcient solution to the DHC problem. O(nk ) for some constant k). the following concepts are important. Here is your algorithm for solving the undirected Hamiltonian cycle problem. convert it to an equivalent directed graph G (by edge-doubling).v) and (v. G has a Hamiltonian cycle if and only if G does. v} with two directed edges.” The classes P and NP problems are deﬁned as classes of decision problems. he agrees to let you off. Now. (u. Whatever answer this algorithm gives. Decision Problems: are problems for which the answer is either “yes” or “no. u). v) and (v. This is called a reduction and is central to NP-completeness. In particular. Therefore. create a directed graph G by just replacing each undirected edge {u. and vice versa. Nonhamiltonian Hamiltonian Figure 37: Directed Hamiltonian cycle reduction. Since you and your boss both agree that this is not going to happen soon. every simple path in the G is a simple path in G .

then you can use the subroutine for C to solve A in polynomial time. NP-completeness: Last time we gave the informal deﬁnition that the NP-complete problems are the “hardest” problems in NP. It says that if you can use a subroutine for B to solve A in polynomial time. Vk . Lemma: B is NP-complete if (1) B ∈ N P and (2) A ≤P B for some NP-complete problem A. if you could solve B in polynomial time. / / Lemma: If A ≤P B and A ∈ P then B ∈ P. but it is good enough for our purposes.) When this is so we will express this as A ≤P B.Lecture Notes CMSC 251 NP: is deﬁned to be the class of all decision problems that can be veriﬁed in polynomial time. and that each Vi is a clique of G. can the vertices of G be partitioned into k subsets. Deﬁnition: A decision problem B ∈ NP is NP-complete if A ≤P B for all A ∈ NP. (Note: This deﬁnition differs somewhat from the deﬁnition in the text. Since A is NP-complete. We can use transitivity to simplify this. giving a contradiction. . Lemma: (Transitivity) If A ≤P B and B ≤P C then A ≤P C. The third lemma takes a bit of thought. if. given a polynomial time subroutine for B. and you can use a subroutine for C to solve B in polynomial time. In other words. such that no two adjacent vertices have the same label. we say that A is polynomially reducible to B. Example: 3-Coloring and Clique Cover: Let us consider an example to make this clearer. 3-coloring (3COL): Given a graph G. observe that B cannot be in P.) Reductions: Last time we introduced the notion of a reduction. . Consider the following two graph problems. can each of its vertices be labeled with one of 3 different “colors”. V2 . To see the second lemma. if you can solve B in polynomial time. Thus. The ﬁrst lemma is obvious from the deﬁnition. such that i Vi = V . (If the answer is “no” then no such evidence need be given. V1 . This means that if the answer to the problem is “yes” then it is possible give some piece of information that would allow someone to verify that this is the correct answer in polynomial time. Here is a more formal deﬁnition in terms of reducibility. you could solve every problem in NP in polynomial time. We will usually apply the concept of reductions to problems for which we strongly believe that there is no polynomial time solution. since otherwise A would be in P by the ﬁrst lemma. . (This is done by replacing each call to B with its appropriate subroutine calls to C). then you could solve A in polynomial time. Given two problems A and B. The operative word in the deﬁnition is “if”. 87 . Clique Cover (CC): Given a graph G and an integer k. . Some important facts about reductions are: Lemma: If A ≤P B and B ∈ P then A ∈ P. we can use it to solve A in polynomial time. then every other problem A in NP would be solvable in polynomial time.