
Study guide on correctness proofs of greedy algorithms, Fall '08, McConnell
Updated on 10/14/08

In Chapter 4, we will skip Sections 4.3 and 4.9.

Optimization Problems
An optimization problem has the following ingredients:

• A set of constraints that a solution must satisfy. A solution is feasible if it satisfies the constraints.

• An objective function that tells, for every feasible solution, how good the solution is.

• An optimum solution to an instance of the problem is a feasible solution that scores at least as high as every other feasible solution to the instance.

There may be many optimum solutions. The goal of an algorithm is to produce any optimum solution. The algorithm is correct if, for every instance of the problem, it either declares that it has no feasible solution, or else outputs some optimum solution.
Example: Given the positions of a set S of gas stations along a route and a car
with a gas tank that has a range of d miles, find a minimum-sized subset X of S that
you can stop at without running out of gas. Assume you start out with a full gas
tank.

• The constraint is that no two consecutive elements of X can be separated by more than d miles.

• A feasible solution is a set A of gas stations that meets the constraint, that is, a set of stops that won’t leave you stranded by the roadside.

• A feasible solution exists if no two elements of S are separated by more than d miles.

• The objective function is the number of elements in a feasible solution A; the smaller its value, the better A is.

• An optimum solution is a feasible set X of gas stations that’s as small as every other feasible set A of gas stations. (It minimizes the number of stops you have to make.)

Proving the Correctness of a Greedy Algorithm for an Optimization Problem
Here is an algorithm for finding an optimum solution X for the gas-station problem.
As a base case, if the destination is within d miles, just drive to it, and
return the empty set of gas stations as your optimum solution. Otherwise,
select y to be the farthest gas station that lies within d miles of the start.
Recurse on the remainder of the trip to get an optimum solution Z for
the rest of the trip and return X = {y} ∪ Z as an optimum solution for the
whole trip.
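Here is a minimal Python sketch of this greedy rule, written iteratively rather than recursively; the argument names and the assumption that you start at mile 0 with station positions given as mile markers are my own choices.

```python
def gas_stops(stations, d, destination):
    """Greedy gas-station selection: from the current position, always drive to
    the farthest station you can reach.  A sketch: assumes you start at mile 0
    with a full tank and that station positions are mile markers along the route."""
    stops = []
    position = 0
    while destination - position > d:          # can't reach the end yet
        reachable = [s for s in stations if position < s <= position + d]
        if not reachable:
            return None                        # no feasible solution
        farthest = max(reachable)              # the greedy choice y
        stops.append(farthest)
        position = farthest
    return stops

# Example: range 100 miles, destination at mile 400
print(gas_stops([80, 160, 250, 330], d=100, destination=400))
# -> [80, 160, 250, 330]
```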
Here’s an illustration of the template for proving such an algorithm correct. For
simplicity, we will always assume that a feasible solution exists, since modifying an
algorithm to handle the case when it doesn’t is always trivial, and it’s an annoying
nuisance.
The template is a Frankenstein-like piece of surgery on an arbitrary optimum solution to turn it into the solution we returned. The operation shows that our solution is feasible, just as good, and therefore also an optimum solution.
Let U be an arbitrary optimum solution. Imagine it lying on an operating table.

1. First, we perform a head transplant. Let v be the first stop in U , the “head,”
and let W = U −{v} be the rest, the “body.” Replace v with a new head, y, the
first stop our algorithm chose, and we still have a feasible solution, A = {y} ∪ W .
(Why is A feasible?) The head transplant didn’t increase the size of the solution,
and the patient is starting to look a little more like our solution.

2. Next, we perform a body transplant. Replace W with Z in A, to obtain X = (A − W ) ∪ Z, where Z is what our algorithm returned in the recursive call. That is, we replaced everything but the head. By induction on the number of gas stations, the new body, Z, is an optimum solution to the recursive call. The old body, W , is a feasible solution to the recursive call. (Why?) Therefore |Z| ≤ |W |, and the body transplant didn’t increase the size of the solution.

The new solution, X, is exactly what our algorithm produced, it is feasible, and
no transplant made the solution any worse, so it’s just as good as U . Since U was an
optimum solution, so is X.
Every greedy proof that we will study follows this format. The only things that differ from one proof to the next are the answers to the kinds of questions I put in parentheses, namely, why the solution remains feasible after each transplant, and why no transplant made the solution any worse.

Interval Scheduling (Section 4.1)
We’re given a set S of intervals, and we have to find a maximum subset of intervals
that don’t intersect. Here a feasible solution is any set of pairwise disjoint intervals,
and the objective function is the number of intervals in the set.
Here’s a greedy algorithm:
As a base case, if there are no intervals, return the empty set; this
is the best we can do. Otherwise, select the interval y with the earliest
ending time, toss out all intervals that intersect y, recurse on the remaining
intervals to get an optimum solution Z to this recursive call, and then
return X = {y} ∪ Z as our solution.
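Here's a short Python sketch of the same greedy rule, written as a single pass over the intervals sorted by ending time rather than as an explicit recursion; intervals are assumed to be (start, end) pairs.

```python
def interval_schedule(intervals):
    """Greedy interval scheduling: repeatedly take the interval with the earliest
    ending time and discard everything that intersects it.  A sketch; intervals
    are (start, end) pairs, and touching endpoints are treated as non-intersecting."""
    chosen, last_end = [], float('-inf')
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:          # doesn't intersect anything already chosen
            chosen.append((start, end))
            last_end = end
    return chosen

print(interval_schedule([(1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (8, 9)]))
# -> [(1, 4), (5, 7), (8, 9)]
```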
For the proof, let U be an optimum solution. Let v be the earliest-ending interval
in U . The head is v and the body is W = U − {v}.
Perform the head transplant: swap v with y to obtain A = {y} ∪ W . This is still
feasible. (Fill in the reasons here.)
Next, perform the body transplant: swap Z for W in A to obtain the solution X
returned by our algorithm. By induction, Z was an optimum solution to the recursive
problem, W was a feasible solution to the recursive problem, so |Z| ≥ |W |. The new
solution X is feasible (fill in reasons here), at least as large as U , and therefore also
an optimum solution.
For each greedy proof you read in the book, try to see if you can
translate the authors’ argument into a variant of this story, with only a
few details changed to fit the problem at hand.

Minimum Spanning Tree


An undirected graph satisfies one of the following if and only if it satisfies all of them:

1. It is connected and has one fewer edge than it has vertices.

2. It is connected and has no cycles.

3. It has no cycles and has one fewer edge than it has vertices.

4. Between every pair of vertices, there is a unique path.

5. It has no cycles and adding a new edge between any two vertices creates a cycle.

Convince yourself that a graph that satisfies one of these conditions satisfies all of
them. To do this, try to disprove the claim with counterexamples, and explain what
sabotages your efforts.
A graph that satisfies one (hence all) of these conditions is called an (unrooted)
tree.
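If you want to experiment, here is a small hypothetical Python check of condition 1 (connected, with one fewer edge than vertices); by the equivalence above it accepts exactly the trees.

```python
def is_tree(n, edges):
    """Test condition 1: the undirected graph on vertices 0..n-1 with the given
    edge list is connected and has n-1 edges.  (A sketch; assumes a simple graph.)"""
    if len(edges) != n - 1:
        return False
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {0}, [0]             # depth-first search from vertex 0
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n              # connected exactly when everything was reached

print(is_tree(4, [(0, 1), (1, 2), (1, 3)]))   # True
print(is_tree(4, [(0, 1), (1, 2), (0, 2)]))   # False: a cycle, and vertex 3 is isolated
```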

For the minimum spanning tree problem, an instance is an undirected graph where
the edges have weights, the output is a subset of the edges, the constraint is that the
subset of edges connect the vertices, and the objective is to minimize the total weight
of the selected edges. There is a feasible solution if and only if the graph is connected.
An optimum solution connects the vertices and has no cycle, since, if it has a cycle,
any edge of the cycle can be removed without disconnecting the vertices, yielding a
better solution. The edges of an optimum solution must therefore form a tree. That’s
why an optimum solution for this problem is nicknamed a minimum spanning tree.
Our algorithm will contract an edge e, merging its endpoints into a single node. The edges incident to this new node are all edges except e that were previously incident to the endpoints of e.
[Figure: a multigraph on vertices u, v, w, x, y, z with edges e1–e7, shown before and after contracting e1 and then e2; some edges become parallel edges or loops.]
The figure shows the result of a contraction on e1 followed by a contraction on e2 .


One problem is that this can result in a multigraph, where a pair of vertices can have
multiple edges between them, and an edge can be a loop, where the two endpoints
are the same vertex.
To make the induction go through, since the recursive call may operate on a multigraph, we have to make the algorithm work on multigraphs in general. As is often the case with induction, it is easier to solve a more general problem in order to get a more general induction hypothesis. In this case, we find a minimum spanning tree of a multigraph. This solves our original problem on graphs, since a graph is just a special case of a multigraph.
The goal of our algorithm is to return the identifiers (names) of the edges of the
MST of the original graph. When we contract an edge, the endpoints of some other
edges can change, but we can still keep track of what edges of the original graph they
correspond to since they retain their original identifiers.
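Here is one hypothetical way to carry this out in Python: represent each edge as a record (identifier, u, v, weight) and relabel endpoints when an edge is contracted, so identifiers survive every contraction.

```python
def contract(edges, e_id):
    """Contract the edge with identifier e_id in a multigraph.  Each edge is a
    tuple (identifier, u, v, weight).  The two endpoints of the contracted edge
    are merged into one new vertex; every other edge keeps its original
    identifier, and parallel edges or loops may appear, as in the figure.
    (A sketch; naming the merged vertex by the pair is an arbitrary choice.)"""
    _, a, b, _ = next(edge for edge in edges if edge[0] == e_id)
    merged = (a, b)                                    # the new, merged vertex
    relabel = lambda x: merged if x in (a, b) else x
    return [(ident, relabel(u), relabel(v), w)
            for ident, u, v, w in edges if ident != e_id]

# Contract the edge with identifier 0 in a small multigraph
edges = [(0, 'u', 'v', 1), (1, 'u', 'w', 2), (2, 'v', 'w', 3)]
print(contract(edges, 0))
# -> [(1, ('u', 'v'), 'w', 2), (2, ('u', 'v'), 'w', 3)]  (now parallel edges)
```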
Our algorithm works by induction on the number of vertices.
As a base case, if there is only one vertex in our multigraph, there are zero edges in the spanning tree, so return the empty set. Suppose our multigraph G has n > 1 vertices. Partition the vertices into two nonempty sets S and T of your choosing. Such a partition is called a cut. Find a minimum-weight edge e that crosses the cut. Choose e to be in your minimum spanning tree. To obtain the rest of the tree, contract e to obtain a multigraph G′. Since this has one fewer vertex, we can make a recursive call on G′. Adding the identifier of e to the list of edges this call returns, and returning the result, gives a minimum spanning tree of G.
For the proof, let U be any optimum solution (set of edges of an MST), and let X
be the set of edges our algorithm picked. As before, we will show how to perform two
transplants on U to turn it into X without making the solution any worse (heavier).
This will show that X is also an MST.
Let e be the edge our algorithm chose; it is a minimum-weight edge across the cut {S, T }. Adding e to U creates a cycle C,
and since the endpoints of e are on opposite sides of the cut, C must re-cross the cut
on some other edge f . The old head will be f , the new head will be e, the old body
will be W = U − {f }, and the new body will be the result Z = X − {e} returned by
our recursive call. This summarizes our surgical tasks.
For the head transplant, removing f from U ∪ {e} doesn’t disconnect it, because the endpoints of f are still connected by the rest of C. The new set is connected and has n − 1 edges on n vertices. It is once again a spanning tree. Moreover, since e and f both cross {S, T } and e is a minimum-weight edge that crosses {S, T }, the weight of f is at least the weight of e, so (U ∪ {e}) − {f } = W ∪ {e} is also a minimum spanning tree.
For the body transplant, contracting e makes W connected, since W ∪ {e} is connected in the original multigraph. It has n − 2 edges, and since it spans G′, which has n − 1 vertices, it is a tree. By our induction hypothesis that the algorithm is correct for smaller multigraphs, it returned a minimum spanning tree Z of G′. Since W is a spanning tree of G′ and Z is a minimum spanning tree of G′, the weight of Z is at most the weight of W . Uncontracting e and adding it to Z doesn’t disconnect anything: X = {e} ∪ Z has n − 1 edges and connects G, which has n vertices. It is therefore a spanning tree of G. Since the weight of Z is at most the weight of W , this tree, X = Z ∪ {e}, weighs at most what the tree W ∪ {e} did. X is what the algorithm returned for the original multigraph G, and it is a minimum spanning tree.

Efficient implementation

Let’s call the above algorithm the generic algorithm. The generic algorithm gives you
complete freedom in picking the cut {S, T }. To get a good time bound, we will exercise
this freedom to get special cases of the generic algorithm that can be implemented
efficiently.
Kruskal’s algorithm picks the minimum weight edge uv that isn’t a loop. There
is always a cut {S, T } that this crosses. For instance, S = {u} and T = V − {u} is
such a cut. This edge is fair game for contraction under the generic algorithm. It
uses the union-find structure to figure out efficiently whether an edge is a loop.
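The sketch below is the usual union-find formulation of Kruskal's algorithm; the vertex numbering and edge format are my own assumptions, but the edges it keeps are the ones the contraction-based description would keep (up to ties).

```python
def kruskal(n, edges):
    """Kruskal's algorithm: scan edges by increasing weight and skip any edge
    whose endpoints are already connected, i.e., any edge that would be a loop
    after the contractions done so far.  Edges are (weight, u, v) with vertices
    0..n-1.  A sketch; assumes the graph is connected."""
    parent = list(range(n))

    def find(x):                       # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                   # not a loop, so take it and "contract"
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

print(kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]))
# -> [(1, 0, 1), (2, 1, 2), (4, 2, 3)]
```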
In Prim’s algorithm, a privileged vertex s is passed into the call. It picks a
minimum non-loop edge sx incident to s to contract. The cut {{s}, V − {s}} is an
example of one that sx crosses, so it’s fair game under the generic algorithm. After
contracting sx to obtain a new vertex s′, it passes s′ as the privileged vertex to the recursive call.
To find a minimum edge incident to s efficiently, it uses a priority queue on V −{s},
where the heap key of each vertex y is the weight of the minimum edge between y
and s. When we contract sx to obtain s′, we have to remove x from the priority queue and possibly decrease the heap keys of neighbors of x so that they reflect the least-weight edge to s′. We can charge these heap operations at O(log n) per edge
incident to the vertex x that gets merged into s. This happens only once per edge,
so the total time is O(m log n).
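Here is a sketch of Prim's algorithm in that style, using Python's heapq. Instead of a true decrease-key operation it pushes a fresh entry and skips stale ones when they surface, which stays within the O(m log n) bound.

```python
import heapq

def prim_weight(adj, s=0):
    """Prim's algorithm.  adj[v] is a list of (weight, neighbor) pairs of an
    undirected, connected graph on vertices 0..n-1.  Returns the total weight
    of a minimum spanning tree.  (A sketch; stale heap entries are skipped
    rather than removed or decreased in place.)"""
    in_tree = [False] * len(adj)
    heap = [(0, s)]                   # (key, vertex); s is the privileged vertex
    total = 0
    while heap:
        key, y = heapq.heappop(heap)
        if in_tree[y]:
            continue                  # stale entry left over from an earlier push
        in_tree[y] = True
        total += key
        for w, z in adj[y]:
            if not in_tree[z]:
                heapq.heappush(heap, (w, z))
    return total

# A triangle with weights 1, 2, 3 plus a pendant edge of weight 4
adj = [[(1, 1), (3, 2)], [(1, 0), (2, 2)], [(3, 0), (2, 1), (4, 3)], [(4, 2)]]
print(prim_weight(adj))  # -> 7
```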

Single-source shortest paths


The strategy of Dijkstra’s algorithm for finding the distances from a vertex s to all
others in an undirected graph is almost the same as Prim’s. All edge weights must
be non-negative, however.
Only the strategy for updating the edge weights for the recursive call, on G′, is different. As before, we pick a minimum-weight edge sx and contract it. Edge sx is obviously a least-weight (shortest) path from s to x.
This creates a new vertex s′. Some of the edges out of s′ were previously out of s, and some were out of x. For the recursive call on G′, we must add the weight of sx to the ones that were out of x, because to use them in G, we must first go through sx. Other than this detail, when the graph is undirected, the algorithm is the same as Prim’s, and so is the proof.
However, it generalizes to directed graphs. Work out the details on your own,
using these remarks and the book for help. A possible exam question is to ask you to
write out a proof of correctness of Dijkstra’s algorithm, even though I won’t go over
it in class.
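For reference, here is a sketch of the standard priority-queue formulation of Dijkstra's algorithm, which is what the contraction description amounts to, written so that it works on directed graphs as well; the adjacency-list format is my own assumption.

```python
import heapq

def dijkstra(adj, s):
    """Shortest-path distances from s.  adj[u] is a list of (weight, v) pairs
    with non-negative weights; directed edges are fine.  (A sketch; stale heap
    entries are skipped instead of using decrease-key.)"""
    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                                # stale entry
        for w, v in adj[u]:
            if d + w < dist.get(v, float('inf')):   # found a shorter route to v
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

adj = {0: [(7, 1), (2, 2)], 1: [(1, 3)], 2: [(3, 1)], 3: []}
print(dijkstra(adj, 0))  # -> {0: 0, 1: 5, 2: 2, 3: 6}
```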
Incidentally, when Dijkstra first published this algorithm, he pointed out that in
the special case where all the edges have weight 1, you can replace the priority queue
with just a simple queue, giving an O(n + m) algorithm. Thus was born BFS.

Scheduling for minimum lateness.


A bunch of customers’ jobs are waiting for you to do them. Each job requires a
certain number of time units to complete, and has a deadline for it to be done by.
Once you’ve started a job, you have to finish it. If you finish a customer’s job more
than k time units past the deadline, he will complain and you will be fired. To avoid
this, if possible, you come up with a strategy that minimizes the maximum tardiness
of any job – you don’t care how many customers go away grumpy, as long as you
don’t get fired. (I’m convinced that the construction firm that’s building our new
building uses this algorithm.)
If this algorithm produces a schedule where no job is more than k time units late, great; otherwise, you should visit the unemployment office during your lunch hour.
All that’s required is to know an optimum order in which to schedule the jobs.
You start out at time T0 .
As a base case, if there are no jobs, return the empty list. Otherwise, schedule the job with the earliest deadline first, then recurse on the remaining jobs with a new T0 equal to the old one plus the time required by the first job, to get the rest of a tardiness-minimizing schedule.
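A sketch in Python of this schedule and its maximum lateness; representing each job as a (duration, deadline) pair is my own convention.

```python
def earliest_deadline_first(jobs, t0=0):
    """Schedule jobs (duration, deadline) in order of earliest deadline, starting
    at time t0.  Returns the order and the maximum lateness of any job under it.
    (A sketch; lateness is counted as 0 for jobs that finish on time.)"""
    order = sorted(jobs, key=lambda job: job[1])    # earliest deadline first
    t, worst = t0, 0
    for duration, deadline in order:
        t += duration                               # this job finishes at time t
        worst = max(worst, t - deadline)
    return order, worst

order, worst = earliest_deadline_first([(3, 6), (2, 8), (1, 9), (4, 9)])
print(order, worst)  # -> [(3, 6), (2, 8), (1, 9), (4, 9)] 1
```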
For the proof, let (j1 , j2 , . . . , jn ) be an optimum solution. This time, instead of
cutting and adding things, the strategy of our proof will be to shuffle the order around
in ways that don’t hurt the maximum tardiness, yielding the order returned by the
algorithm.
Let jk be the job with the earliest deadline. Let’s show that we can move jk into the first position without hurting the maximum lateness. Suppose you are the job jk
with the earliest deadline, and I’m the one just ahead of you in the queue. If we swap
positions, then I will finish in the new schedule at the time when you finished in the
old schedule. Since my deadline is at least as late as yours, I’m no more tardy in the
new schedule than you were in the old. You aren’t any tardier, since you are now
getting done earlier. You should cut ahead of me in line. Iterating this argument, if
you cut all the way to the front of the line, we get a schedule where you’re first, and
this new schedule is no worse than (j1 , j2 , . . . , jn ).
Similarly, the rest of this new schedule is a solution to the recursive problem.
Replacing it with the order of the remaining elements produced by the recursive call,
which is an optimum solution to the recursive problem, doesn’t cause any harm. The
result is the solution to the original problem returned by our algorithm, and it’s no
worse than the optimum solution (j1 , j2 , . . . , jn ) we started out with.

Huffman codes
The following gives a Huffman tree for a message of length 100, where a occurs 45 times, b occurs 13 times, c occurs 12 times, d occurs 16 times, e occurs 9 times, and f occurs 5 times:

[Figure: the Huffman tree. The root (100) has the leaf a (45) as its 0-child and an internal node (55) as its 1-child; node 55 has the internal node 25 as its 0-child and the internal node 30 as its 1-child; node 25 has leaves c (12, 0-side) and b (13, 1-side); node 30 has the internal node 14 (0-side) and the leaf d (16, 1-side); node 14 has leaves f (5, 0-side) and e (9, 1-side).]

By induction from the leaves to the root, the leaves have been labeled with the
frequency of their letter. Each internal tree node has been labeled with the sum of
its two children. Convince yourself that this compels each internal node to be labeled
with the sum of occurrences of its leaf descendants.
The bit representation of each letter is the sequence of 0/1 edge labels on the path from the root to the letter. For instance, the bit representation of b is 101 and the bit representation of e is 1101.
Suppose the message begins with 101011010110011111011001010 . . .. Start at the root and follow the path given by the initial bits of the string, until you reach a leaf. In this case, 101 takes you to b, so the first letter of the message is b. Recurse on the rest of the bit string to get the rest of the message. Make sure you can decode the above bit string as baeafdecba . . ..
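Decoding is just a walk down the tree that restarts at the root whenever a leaf is reached. The sketch below represents internal nodes as (0-child, 1-child) pairs and leaves as letters; the nested tuple is my own encoding of the tree in the figure.

```python
# The tree from the figure: a=0, c=100, b=101, f=1100, e=1101, d=111.
tree = ('a', (('c', 'b'), (('f', 'e'), 'd')))

def decode(bits, tree):
    """Follow each bit from the root (0 = left child, 1 = right child) and emit
    a letter whenever a leaf is reached, then restart at the root.  (A sketch.)"""
    out, node = [], tree
    for bit in bits:
        node = node[int(bit)]
        if isinstance(node, str):      # reached a leaf
            out.append(node)
            node = tree
    return ''.join(out)

print(decode('101011010110011111011001010', tree))  # -> 'baeafdecba'
```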
The reason it’s possible to decode the message is that no letter’s node is an ancestor of another’s. You can always tell when you’ve finished a letter. Another way to say this is that no letter’s encoding is a prefix of another’s.
The length of the message is obtained by summing over each character the number
of occurrences times its depth in the tree. Convince yourself that this is compelled to
be equivalent to just adding up the labels at all the tree nodes, excluding the root.
The length of the encoding of this message using the tree in the figure is 224
bits, which compares favorably with the 300 bits it would take if each character
required three bits, which is what it would need if we encoded all characters with
equal-length bits. The saving comes from making more frequent characters have
shorter representations, at the expense of less frequent characters, which have longer
representations.
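As a quick check of the 224-bit figure, multiply each letter's frequency by its depth in the tree (equivalently, the length of its code) and add:

```python
# (frequency, depth in the tree) for each letter in the figure
letters = {'a': (45, 1), 'b': (13, 3), 'c': (12, 3),
           'd': (16, 3), 'e': (9, 4), 'f': (5, 4)}
print(sum(freq * depth for freq, depth in letters.values()))  # -> 224
```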
We want to build a tree that minimizes the length of the encoding. Here is
Huffman’s algorithm.
We work by induction on the size of the alphabet. In the base case, where there are only two letters, create a tree with one root and two leaves. Otherwise, find the two least-frequent characters, call them e and f , and replace all occurrences of e and f in the message with a new character, x. This reduces the size of the alphabet by one. Recursively build an optimum tree for this message. Then make e and f the children of the node corresponding to x to get an optimum tree for the original message.
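Here is a compact bottom-up version of this recursion in Python, using a heap to find the two least-frequent characters at each step; it produces a nested-tuple tree like the one in the decoding sketch above. The counter is only a tie-breaker so that the heap never has to compare tree nodes.

```python
import heapq
from itertools import count

def huffman(freqs):
    """Build a Huffman tree from a {letter: frequency} dict by repeatedly merging
    the two least-frequent symbols into a new internal node, exactly what the
    recursion above does, written bottom-up.  (A sketch; ties are broken
    arbitrarily, so the tree's shape may differ from the figure, but the total
    encoded length is still optimal.)"""
    tiebreak = count()
    heap = [(f, next(tiebreak), letter) for letter, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # the two least-frequent "characters"
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    return heap[0][2]

tree = huffman({'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5})
print(tree)
```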
For the proof, note that the algorithm makes e and f siblings. For the head
operation, we show that we can take an arbitrary optimum tree T and shuffle it so
that e and f are siblings. Let nodes v1 and v2 be two deepest siblings in T . Swap
e with the letter in v1 and swap f with the letter in v2 . Convince yourself that the
resulting tree, T ′, can be no worse than T .
For the body operation, trim off e and f from T ′, naming their parent, which is now a leaf, with the new letter x. This gives an encoding tree T1 for the reduced message. Let k be the total number of occurrences of e and f . The length of the reduced message encoded by T1 is shorter by k bits than the length of the original message encoded by T ′. By induction, a recursive call on the reduced message produces a tree T2 that is at least as good as T1 , since they are both encoding trees for the reduced message and T2 is an optimum one. Adding back e and f as children of x increases the length of the encoded message by k again, giving the tree T ′′ returned by our algorithm. T ′′ is at least as good as T ′, which is at least as good as T , so T ′′ is an optimum tree.
