’08, McConnell
Updated on 10/14/08
Optimization Problems
An optimization problem has the following ingredients:
• A feasible solution, which is any candidate that meets the constraint. Here a feasible solution is a set A of gas stations, that is, a set of stops that won’t leave you stranded by the roadside.
• An objective function that tells, for every feasible solution, how good the solution is.
Proving the Correctness of a Greedy Algorithm for an Optimization Problem
Here is an algorithm for finding an optimum solution X for the gas-station problem.
As a base case, if the destination is within d miles, just drive to it, and
return the empty set of gas stations as your optimum solution. Otherwise,
select y to be the farthest gas station that lies within d miles of the start.
Recurse on the remainder of the trip to get an optimum solution Z for
the rest of the trip and return X = {y} ∪ Z as an optimum solution for the
whole trip.
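The algorithm above can be sketched in Python. The representation is my assumption, not the notes’: `stations` is a sorted list of mile markers, `dest` is the destination’s mile marker, and `d` is how far you can drive on a full tank.

```python
def gas_stops(stations, dest, d, start=0):
    """Greedy gas-station selection: if the destination is in range,
    stop; otherwise stop at the farthest reachable station and recurse
    on the rest of the trip."""
    if dest - start <= d:                  # base case: just drive to it
        return []
    # farthest gas station within d miles of the current position
    reachable = [s for s in stations if start < s <= start + d]
    if not reachable:
        raise ValueError("no feasible solution: stranded by the roadside")
    y = max(reachable)
    # recurse on the remainder of the trip
    return [y] + gas_stops(stations, dest, d, start=y)
```

For example, with stations at miles 100, 200, 250, and 400, a destination at mile 450, and a 200-mile range, the greedy choice is to stop at miles 200 and 400.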
Here is the template for proving such an algorithm correct. For simplicity, we will always assume that a feasible solution exists; modifying an algorithm to handle the case when one doesn’t is trivial but an annoying nuisance.
The template is a Frankenstein-like piece of surgery on an arbitrary optimum solution to turn it into the solution we returned. The operation shows that our solution is feasible and just as good, and therefore also an optimum solution.
Let U be an arbitrary optimum solution. Imagine it lying on an operating table.
1. First, we perform a head transplant. Let v be the first stop in U , the “head,” and let W = U − {v} be the rest, the “body.” Replace v with a new head, y, the first stop our algorithm chose, and we still have a feasible solution, A = {y} ∪ W . (Why is A feasible?) The head transplant didn’t increase the size of the solution, and the patient is starting to look a little more like our solution.
2. Next, we perform a body transplant. Replace the body W with Z, the optimum solution our recursive call returned for the rest of the trip, to obtain X = {y} ∪ Z. (Why is X feasible? Why is Z no larger than W ?)
The new solution, X, is exactly what our algorithm produced, it is feasible, and no transplant made the solution any worse, so it’s just as good as U . Since U was an optimum solution, so is X.
Every greedy proof that we will study follows this format. The only things that differ from one proof to the next are the answers to the kinds of questions I put in parentheses, namely, why the solution remains feasible after each transplant, and why no transplant made the solution any worse.
Interval Scheduling (Section 4.1)
You’re given a set S of intervals, and you have to find a maximum subset of intervals that don’t intersect. Here a feasible solution is any set of pairwise disjoint intervals, and the objective function is the number of intervals in the set.
Here’s a greedy algorithm:
As a base case, if there are no intervals, return the empty set; this
is the best we can do. Otherwise, select the interval y with the earliest
ending time, toss out all intervals that intersect y, recurse on the remaining
intervals to get an optimum solution Z to this recursive call, and then
return X = {y} ∪ Z as our solution.
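This greedy rule can be sketched in Python; representing intervals as half-open (start, end) pairs is my convention, not the notes’.

```python
def max_disjoint(intervals):
    """Greedy interval scheduling: take the earliest-ending interval,
    toss out everything that intersects it, and recurse.
    Intervals are half-open (start, end) pairs."""
    if not intervals:                          # base case: nothing to schedule
        return []
    y = min(intervals, key=lambda iv: iv[1])   # earliest ending time
    # keep only the intervals disjoint from y
    rest = [iv for iv in intervals if iv[0] >= y[1] or iv[1] <= y[0]]
    return [y] + max_disjoint(rest)
```

On the classic activity-selection instance below, the algorithm picks four pairwise disjoint intervals.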
For the proof, let U be an optimum solution. Let v be the earliest-ending interval
in U . The head is v and the body is W = U − {v}.
Perform the head transplant: swap v with y to obtain A = {y} ∪ W . This is still
feasible. (Fill in the reasons here.)
Next, perform the body transplant: swap Z for W in A to obtain the solution X returned by our algorithm. By induction, Z is an optimum solution to the recursive problem, and W is a feasible solution to the same problem, so |Z| ≥ |W |. The new solution X is feasible (fill in reasons here), at least as large as U , and therefore also an optimum solution.
For each greedy proof you read in the book, try to see if you can
translate the authors’ argument into a variant of this story, with only a
few details changed to fit the problem at hand.
5. It has no cycles and adding a new edge between any two vertices creates a cycle.
Convince yourself that a graph that satisfies one of these conditions satisfies all of
them. To do this, try to disprove the claim with counterexamples, and explain what
sabotages your efforts.
A graph that satisfies one (hence all) of these conditions is called an (unrooted)
tree.
For the minimum spanning tree problem, an instance is an undirected graph where the edges have weights, the output is a subset of the edges, the constraint is that the subset of edges connects all the vertices, and the objective is to minimize the total weight of the selected edges. There is a feasible solution if and only if the graph is connected.
An optimum solution connects the vertices and has no cycle, since, if it has a cycle,
any edge of the cycle can be removed without disconnecting the vertices, yielding a
better solution. The edges of an optimum solution must therefore form a tree. That’s
why an optimum solution for this problem is nicknamed a minimum spanning tree.
Our algorithm will contract an edge e, merging its endpoints into a single node. The edges incident to this new node are all edges, except e, that were previously incident to the endpoints of e. Note that contraction can create loops and parallel edges, so the result is in general a multigraph.
[Figure: contracting edge e1, and then e2, in a small example multigraph; each contraction merges the endpoints into a single vertex, and the remaining edges e3 through e7 become incident to the merged vertices, possibly as loops or parallel edges.]
As a base case, if the multigraph has a single vertex, return the empty set of edges. Otherwise, pick a cut {S, T }, select a minimum-weight edge e that crosses it, contract e to obtain a smaller multigraph G′, and make a recursive call on G′. Adding the identifier of e to the list of edges this call returns, and returning the result, gives a minimum spanning tree of G.
For the proof, let U be any optimum solution (set of edges of an MST), and let X
be the set of edges our algorithm picked. As before, we will show how to perform two
transplants on U to turn it into X without making the solution any worse (heavier).
This will show that X is also an MST.
The edge e our algorithm contracted is a minimum-weight edge across a cut {S, T }. Adding e to U creates a cycle C, and since the endpoints of e are on opposite sides of the cut, C must re-cross the cut on some other edge f . The old head will be f , the new head will be e, the old body will be W = U − {f }, and the new body will be the result Z = X − {e} returned by our recursive call. This summarizes our surgical tasks.
For the head transplant, removing f from U ∪ {e} doesn’t disconnect it, because the endpoints of f are still connected by the rest of C. The new set is connected and has n − 1 edges on n vertices, so it is once again a spanning tree. Moreover, since e and f both cross {S, T } and e is a minimum-weight edge that crosses {S, T }, the weight of f is at least the weight of e, so (U ∪ {e}) − {f } = W ∪ {e} is also a minimum spanning tree.
For the body transplant, contracting e makes W connected, since W ∪ {e} is connected in the original multigraph. It has n − 2 edges, and since it spans G′, which has n − 1 vertices, it is a spanning tree of G′. By our induction hypothesis that the algorithm is correct on smaller multigraphs, the recursive call returned a minimum spanning tree Z of G′. Since W is a spanning tree of G′ and Z is a minimum spanning tree of G′, the weight of Z is at most the weight of W . Uncontracting e and adding it to Z doesn’t disconnect it, so X = {e} ∪ Z has n − 1 edges and connects G, which has n vertices. It is therefore a spanning tree of G. Since the weight of Z is at most the weight of W , this tree X = Z ∪ {e} weighs at most what the tree W ∪ {e} did. X is what the algorithm returned for the original multigraph G, and it is a minimum spanning tree.
Efficient implementation
Let’s call the above algorithm the generic algorithm. It gives you complete freedom in picking the cut {S, T }. To get a good time bound, we will exercise this freedom to get special cases of the generic algorithm that can be implemented efficiently.
Kruskal’s algorithm picks the minimum-weight edge uv that isn’t a loop. There is always a cut {S, T } that this edge crosses; for instance, S = {u} and T = V − {u} is such a cut, so the edge is fair game for contraction under the generic algorithm. Kruskal’s algorithm uses the union-find structure to figure out efficiently whether an edge has become a loop.
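Kruskal’s algorithm can be sketched as follows. This is a minimal version of the union-find structure (path compression only, no union by rank), not code from the notes; vertices are numbered 0 to n − 1.

```python
def kruskal(n, edges):
    """Kruskal's algorithm: scan edges by increasing weight, and use
    union-find to skip edges whose endpoints are already merged,
    i.e., edges that have become loops.  `edges` is a list of
    (weight, u, v) triples."""
    parent = list(range(n))

    def find(a):                      # find the representative, compressing paths
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    for w, u, v in sorted(edges):     # minimum-weight edge first
        ru, rv = find(u), find(v)
        if ru != rv:                  # not a loop: safe to contract
            parent[ru] = rv
            tree.append((w, u, v))
    return tree
```

On a four-vertex example the returned tree has n − 1 = 3 edges, as a spanning tree must.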
In Prim’s algorithm, a privileged vertex s is passed into the call. It picks a minimum-weight non-loop edge sx incident to s to contract. The cut {{s}, V − {s}} is an example of one that sx crosses, so it’s fair game under the generic algorithm. After contracting sx to obtain a new vertex s′, it passes s′ as the privileged vertex to the
recursive call.
To find a minimum edge incident to s efficiently, it uses a priority queue on V − {s}, where the heap key of each vertex y is the weight of the minimum edge between y and s. When we contract sx to obtain s′, we have to remove x from the priority queue and possibly decrease the heap keys of neighbors of x so that they reflect the least-weight edge to s′. We can charge these heap operations at O(log n) per edge incident to the vertex x that gets merged into s. This happens only once per edge, so the total time is O(m log n).
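Here is a sketch of Prim’s algorithm in Python. Since the standard `heapq` module has no decrease-key operation, this version pushes duplicate heap entries and skips stale ones (“lazy deletion”), which gives O(m log m) rather than the O(m log n) bound above; the adjacency-list representation is my assumption.

```python
import heapq

def prim(n, adj, s=0):
    """Prim's algorithm with a heap.  `adj[u]` is a list of (weight, v)
    pairs; vertices are 0..n-1.  Returns the total weight of an MST of
    the connected graph."""
    in_tree = [False] * n
    in_tree[s] = True
    heap = list(adj[s])               # edges leaving the merged vertex s
    heapq.heapify(heap)
    total = 0
    while heap:
        w, x = heapq.heappop(heap)    # minimum-weight edge out of the tree
        if in_tree[x]:
            continue                  # edge became a loop after contraction: skip
        in_tree[x] = True             # contract sx: merge x into s
        total += w
        for edge in adj[x]:           # x's edges now leave the merged vertex
            if not in_tree[edge[1]]:
                heapq.heappush(heap, edge)
    return total
```

Running it on the same four-vertex example used for Kruskal gives the same MST weight, as it must.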
Minimizing Maximum Tardiness
. . . you should visit the unemployment office during your lunch hour.
All that’s required is to know an optimum order in which to schedule the jobs. You start out at time T0 .
As a base case, if there are no jobs, return the empty list. Otherwise, schedule the job with the earliest deadline first, then recurse on the remaining jobs, with a new T0 equal to the old one plus the time required by the first job, to get the rest of a tardiness-minimizing schedule.
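The earliest-deadline-first rule can be sketched as follows; representing each job as a (duration, deadline) pair is my choice, not the notes’. Since the recursion just peels off the earliest deadline each time, a sort followed by one pass computes the same schedule.

```python
def min_max_tardiness(jobs, t0=0):
    """Schedule (duration, deadline) jobs earliest-deadline-first,
    starting at time t0.  Returns (order, max_tardiness), where the
    tardiness of a job is max(0, finish_time - deadline)."""
    order = sorted(jobs, key=lambda job: job[1])   # earliest deadline first
    t, worst = t0, 0
    for duration, deadline in order:
        t += duration                              # this job finishes at time t
        worst = max(worst, t - deadline)
    return order, max(worst, 0)
```

For instance, three jobs with durations 1, 2, 3 and deadlines 2, 2, 4 finish at times 1, 3, and 6, so the maximum tardiness is 6 − 4 = 2.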
For the proof, let (j1 , j2 , . . . , jn ) be an optimum solution. This time, instead of
cutting and adding things, the strategy of our proof will be to shuffle the order around
in ways that don’t hurt the maximum tardiness, yielding the order returned by the
algorithm.
Let jk be the job with the earliest deadline. Let’s show that we can move jk into the first position without hurting the maximum tardiness. Suppose you are jk , and I’m the job just ahead of you in the queue. If we swap
positions, then I will finish in the new schedule at the time when you finished in the
old schedule. Since my deadline is at least as late as yours, I’m no more tardy in the
new schedule than you were in the old. You aren’t any tardier, since you are now
getting done earlier. You should cut ahead of me in line. Iterating this argument, if
you cut all the way to the front of the line, we get a schedule where you’re first, and
this new schedule is no worse than (j1 , j2 , . . . , jn ).
Similarly, the rest of this new schedule is a solution to the recursive problem.
Replacing it with the order of the remaining elements produced by the recursive call,
which is an optimum solution to the recursive problem, doesn’t cause any harm. The
result is the solution to the original problem returned by our algorithm, and it’s no
worse than the optimum solution (j1 , j2 , . . . , jn ) we started out with.
Huffman codes
The following gives a Huffman tree for a message of length 100, where a occurs 45 times, b occurs 13 times, c occurs 12 times, d occurs 16 times, e occurs 9 times, and f occurs 5 times:
[Figure: Huffman tree. The root, labeled 100, has children a (45, edge label 0) and an internal node labeled 55 (edge label 1). Node 55 has children 25 (edge 0) and 30 (edge 1). Node 25 has children c (12, edge 0) and b (13, edge 1). Node 30 has children an internal node 14 (edge 0) and d (16, edge 1). Node 14 has children f (5, edge 0) and e (9, edge 1).]
Each leaf has been labeled with the frequency of its letter, and each internal node has been labeled with the sum of its two children’s labels. Convince yourself, by induction from the leaves to the root, that this compels each internal node to be labeled with the total number of occurrences of its leaf descendants.
The bit representation of each letter is the sequence of edge labels on the path from the root to the letter. For instance, the bit representation of b is 101 and the bit representation of e is 1101.
Suppose the message begins with 101011010110011111011001010 . . .. Start at the root and follow the path given by the initial bits of the string, until you reach a leaf. In this case, 101 takes you to b, so the first letter of the message is b. Recurse on the rest of the bit string to get the rest of the message. Make sure you can decode the above bit string as baeafdecba . . ..
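The decoding walk can be sketched in Python. For brevity this uses the code table read off the tree rather than an explicit tree walk; it is a minimal sketch, not the authors’ code.

```python
# Code table read off the Huffman tree in the figure
CODES = {"a": "0", "b": "101", "c": "100",
         "d": "111", "e": "1101", "f": "1100"}

def decode(bits, codes):
    """Repeatedly peel off the unique codeword that prefixes the
    remaining bits.  This works because the code is prefix-free:
    at most one codeword can match at each position."""
    message = []
    while bits:
        for letter, code in codes.items():
            if bits.startswith(code):
                message.append(letter)
                bits = bits[len(code):]
                break
        else:
            raise ValueError("bits do not start with any codeword")
    return "".join(message)
```

Decoding the bit string from the text reproduces baeafdecba.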
The reason it’s possible to decode the message is that every letter sits at a leaf, so no letter is an ancestor of another; you can always tell when you’ve finished a letter. Another way to say this is that no letter’s encoding is a prefix of another’s.
The length of the encoded message is obtained by summing, over each character, its number of occurrences times its depth in the tree. Convince yourself that this is compelled to be equivalent to just adding up the labels at all the tree nodes, excluding the root.
The length of the encoding of this message using the tree in the figure is 224 bits, which compares favorably with the 300 bits it would take if we encoded all six characters with equal-length three-bit codes. The saving comes from giving more frequent characters shorter representations, at the expense of less frequent characters, which get longer representations.
We want to build a tree that minimizes the length of the encoding. Here is
Huffman’s algorithm.
We work by induction on the size of the alphabet. In the base case, there are two letters; create a tree with one root and two leaves. Otherwise, find the two least-frequent characters, call them e and f , and replace all occurrences of e and f in the message with a new character, x. This reduces the size of the alphabet by one. Recursively build an optimum tree for this message. Then make e and f the children of the node corresponding to x to get an optimum tree for the original message.
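In practice Huffman’s algorithm is implemented with a priority queue of frequencies rather than by literally rewriting the message. Here is such a sketch; the tie-breaking counter is an implementation detail of mine so that the heap never has to compare two dictionaries.

```python
import heapq

def huffman_code_lengths(freqs):
    """Huffman's algorithm via a heap: repeatedly merge the two
    least-frequent symbols into a new combined symbol.  Returns a
    map {letter: codeword length}, i.e., each leaf's final depth."""
    heap = [(f, i, {letter: 0})
            for i, (letter, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)                       # tie-breaker for equal frequencies
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)       # two least-frequent symbols
        f2, _, d2 = heapq.heappop(heap)
        # merging hangs both subtrees one level deeper
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

freqs = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
lengths = huffman_code_lengths(freqs)
# encoded length = sum over letters of (occurrences x depth)
total = sum(freqs[s] * lengths[s] for s in freqs)
```

On the frequencies from the figure this reproduces the 224-bit encoding length computed earlier.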
For the proof, note that the algorithm makes e and f siblings. For the head operation, we show that we can take an arbitrary optimum tree T and shuffle it so that e and f are siblings. Let nodes v1 and v2 be two deepest siblings in T . Swap e with the letter in v1 and swap f with the letter in v2 . Convince yourself that the resulting tree, T ′, can be no worse than T .
For the body operation, trim off e and f from T ′, naming their parent, which is now a leaf, with the new letter x. This gives an encoding tree T1 for the reduced message. Let k be the total number of occurrences of e and f . The message encoded by T1 is shorter by k bits than the one encoded by T ′. By induction, a recursive call on the reduced message produces a tree T2 that is at least as good as T1 , since they are both encoding trees for the reduced message and T2 is an optimum one. Adding back e and f increases the length of T2 ’s message by k again. The final tree T ′′ is at least as good as T ′ is.