Three minimum spanning tree algorithms

Jinna Lei
Submitted for Math 196, Senior Honors Thesis
University of California, Berkeley
May 2010
Contents
1 Introduction 6
1.1 History and Content . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2 The problem, formally . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.1 Some definitions . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.2 Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.3 Simplifying assumptions . . . . . . . . . . . . . . . . . . . . . 7
1.2.4 Limited computation model . . . . . . . . . . . . . . . . . . . 8
1.3 Important properties . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.1 Cuts and cycles . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.2 About trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.3 Existence and Uniqueness . . . . . . . . . . . . . . . . . . . . 10
1.3.4 The cut and cycle properties . . . . . . . . . . . . . . . . . . . 11
1.4 Graph representation . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2 Classic algorithms 12
2.1 The union-find data structure . . . . . . . . . . . . . . . . . . . . . . 12
2.2 Kruskal’s . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3 Dijkstra-Jarník-Prim . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3 Iterative algorithms 14
3.1 Contractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2 Borůvka’s algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2.1 One iteration of Borůvka’s algorithm . . . . . . . . . . . . . . 16
3.2.2 The algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.3 Fredman-Tarjan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.3.1 One iteration of Fredman-Tarjan . . . . . . . . . . . . . . . . 17
3.3.2 The complete algorithm . . . . . . . . . . . . . . . . . . . . . 19
4 An algorithm for verification 19
4.1 Verification: problem definition and reduction . . . . . . . . . . . . . 19
4.1.1 Narrowing down the search space . . . . . . . . . . . . . . . . 19
4.1.2 Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.2 We can verify with a linear number of comparisons . . . . . . . . . . 21
4.2.1 Proof of complexity for a full branching tree . . . . . . . . . . 22
4.2.2 Turning every tree into a full branching tree . . . . . . . . . . 22
4.2.3 We can use B instead of T . . . . . . . . . . . . . . . . . . . . 23
5 A randomized algorithm 25
5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
5.1.1 The subgraph passed to the second recursion is sparse . . . . . 26
5.2 A tree formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
5.2.1 Some facts about vertices and the recursion tree . . . . . . . . 28
5.2.2 Some facts about edges and the recursion tree . . . . . . . . . 29
5.3 Runtime analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
5.3.1 The expected running time . . . . . . . . . . . . . . . . . . . . 29
5.3.2 A guaranteed running time . . . . . . . . . . . . . . . . . . . . 30
5.3.3 High-probability proof . . . . . . . . . . . . . . . . . . . . . . 30
6 A deterministic, non-greedy algorithm 31
6.1 The Ackermann function and its inverse . . . . . . . . . . . . . . . . 31
6.2 The Soft Heap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
6.2.1 Bad and corrupted edges . . . . . . . . . . . . . . . . . . . . . 33
6.2.2 Consequences for the MST algorithm . . . . . . . . . . . . . . 33
6.3 Strong contractibility and weak contractibility . . . . . . . . . . . . . 33
6.3.1 Strong contractibility . . . . . . . . . . . . . . . . . . . . . . . 33
6.3.2 Weak contractibility . . . . . . . . . . . . . . . . . . . . . . . 34
6.3.3 Strong contractibility on minors . . . . . . . . . . . . . . . . . 35
6.4 Overview revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
6.5 Motivation for Build-T . . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.5.1 What we already know . . . . . . . . . . . . . . . . . . . . . 36
6.5.2 The recursion formula . . . . . . . . . . . . . . . . . . . . . . 37
6.6 Build-T . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
6.6.1 A hierarchy of minors . . . . . . . . . . . . . . . . . . . . . . . 38
6.6.2 Building the tree . . . . . . . . . . . . . . . . . . . . . . . . . 38
6.6.3 Determining which sibling to visit next . . . . . . . . . . . . . 39
6.6.4 When a node runs out of children . . . . . . . . . . . . . . . . 40
6.6.5 Data structures and corruption . . . . . . . . . . . . . . . . . 41
6.6.6 Error rate and running time . . . . . . . . . . . . . . . . . . . 41
6.7 Correctness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
6.8 Runtime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
6.8.1 Density games . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
6.9 And, Finally . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
7 The Optimal One 42
7.1 Decision trees and optimality . . . . . . . . . . . . . . . . . . . . . . 43
7.1.1 Breaking up the decision tree . . . . . . . . . . . . . . . . . . 44
7.2 DenseCase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
7.3 Building and storing decision trees . . . . . . . . . . . . . . . . . . . 45
7.3.1 Keep the parameters small . . . . . . . . . . . . . . . . . . . . 45
7.3.2 Emulating table lookups . . . . . . . . . . . . . . . . . . . . . 45
7.4 Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
7.4.1 Relevance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
7.4.2 Finding partitions . . . . . . . . . . . . . . . . . . . . . . . . . 47
7.5 Putting things together . . . . . . . . . . . . . . . . . . . . . . . . . . 48
7.6 Time complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Since this marks the denouement of my college career, I suppose acknowl-
edgements are in order.
I dedicate this to my family who encouraged and supported me through-
out college, and gave me the inspiration to keep going.
Also a very big thanks to Professor Karp, Luqman Hodgkinson, Yehonatan
Sella, Ian Henderson, and everybody else who listened to me blathering
on about spanning trees and soft heaps. And my dad, who not only
listened but read part of it!
1 Introduction
1.1 History and Content
In 1926 Otokar Borůvka attacked the problem of finding the most efficient electricity
network for the now-nonexistent nation of Moravia. [1] Distilled into a mathematical
form, this is the problem of finding the subgraph of least cost that is still connected.
Since then, the task of finding a minimum spanning tree has become a staple of
the algorithms repertoire. A few others proposed better solutions, the methods
of Kruskal and Dijkstra-Jarník-Prim (more commonly known as Prim’s algorithm)
being the most intuitive and popular.
The classical greedy algorithms – Kruskal’s, Dijkstra-Jarnik-Prim, and Borůvka’s
– build the tree incrementally, at all times maintaining a correct partial result –
correct in the sense that every edge in the intermediate result ends up in the final tree. On
the other hand, the fastest algorithms so far all maintain intermediate results that
are supersets of the correct answer rather than subsets. We shall investigate
why this approach is powerful by examining four algorithms: one that checks whether
a given spanning tree actually is minimal, and three for actually constructing the
MST.
The verification algorithm comes first; it is the cumulative result of quite a few papers,
probably the earliest of which came from János Komlós in 1984. Next we will examine a
randomized algorithm by David Karger, Philip Klein, and Robert Tarjan (1995) that
runs in linear time with high probability. The third result is a slightly superlinear
algorithm by Bernard Chazelle (2000) that uses the wonderful Soft Heap. The last
algorithm we will look at was put together by Seth Pettie and Vijaya Ramachan-
dran (2002), and although they show that its asymptotic running time cannot be
beaten (at least for a comparison-based algorithm), no one really knows what that
asymptotic running time actually is, except that it is at least linear in the input size.
This is a review of the papers I found interesting. I try to frame things in new
and interesting ways, and introduce some coherence between them. I hope it can be
of help to anyone surveying the minimum spanning tree literature.
1.2 The problem, formally
1.2.1 Some definitions
Definition. A graph G consists of a vertex set V and an edge set E. Every element
of E is an unordered pair of vertices. We will write undirected edges as {u, v}.
Definition. A subgraph H of G has an edge set E′ ⊆ E, and a vertex set induced by E′.
Definition. A path in G is a sequence of vertices v_0, v_1, v_2, ..., v_k such that there is
an edge between any two adjacent vertices v_i, v_{i+1} in the sequence. We will sometimes
refer to the edges in the path, although it is formally defined as a sequence of vertices.
Definition. A graph is connected if there is a path between any two vertices in the
graph.
Definition. A tree is a graph that is minimally connected – that is, any tree T is a
connected graph, but removing any edge will disconnect it.
Definition. A spanning tree of G is a subgraph of G that is a tree and that covers
every vertex in G.
Definition. If G is a graph whose edges have weights w(e), the cost of a subgraph
H is Σ_{e∈H} w(e), or more informally, the sum of the weights of all its edges.
Definition. A forest is a set of trees.
Definition. An edge is incident to a connected component, a vertex, or another
edge if exactly one of its endpoints lies in the connected component, equals the vertex,
or is also an endpoint of the other edge.
1.2.2 Statement
Given a graph G = (V, E) and a weight function w over the edges, find a spanning
tree T of G such that its cost is minimal. That is, for every spanning tree U of G,

    Σ_{e∈T} w(e) ≤ Σ_{e′∈U} w(e′).
We will denote the true minimum spanning tree of G as MST(G).
1.2.3 Simplifying assumptions
Let it be known that m generally denotes the number of edges in the input graph
and n the number of vertices. If there is a possibility of ambiguity, we will strive to
clarify whether m and n refer to the parameters of the original input, or those of a
recursive call.
Unless otherwise stated, we will assume that all edge weights are distinct. Usually
it is a simple matter to generalize to non-distinct weights. In addition, we will often
assume in correctness proofs that all edge weights are integers in [1,m]. This will not
change the minimum spanning tree. In fact, we only need an ordering on the edge
weights to find the MST, as the Cut and Cycle properties below show.
We will also assume that the original input graph G is connected. If an input
graph is not connected, we can find the connected components in linear
(O(m + n)) time and feed the components separately to our algorithms. It is not
hard to find an algorithm that finds connected components – for instance, depth-first
search will suffice. Since all the algorithms we deal with are linear or superlinear, this
stipulation causes no loss of generality and doesn’t affect our running time analyses.
It also implies that m ≥ n −1, or n = O(m) and log n = O(log m).
The original input graph G will always be simple – that is, if an edge connects
two vertices, it is the only edge between those two vertices, and there are no
self-loops. We lose no generality here because we can clean up a non-simple graph,
keeping only the lightest of any set of redundant edges, in O(m) time. We will give the
algorithm later. This assumption gives m ≤ n(n − 1)/2, which implies m = O(n²) and
log m = O(log n). In some recursive calls the simple-graph requirement is dropped
to make analysis simpler, and it will be clearly stated whenever this occurs.
In addition, we talk about graphs on labeled vertices. For a graph on n vertices,
each vertex is labeled with a number (or any arbitrary symbol, as long as it is unique)
from 1 to n. Two graphs G and G′ are taken to be equal if {i, j} is an edge in G if
and only if {i, j} is also an edge in G′. There are 2^(n choose 2) unique unweighted graphs on
n vertices.
1.2.4 Limited computation model
The literature makes a distinction between comparison-based algorithms and algo-
rithms which are allowed full access to bit representations of data. There is a specific
model of computation that is favored, the pointer machine. The main limitation of
a pointer machine is that it does not allow arithmetic on pointers. That means no
constant-time table lookups, since calculating a hash function requires the ability to
manipulate pointers. A pointer is allowed to be dereferenced and checked against an-
other pointer for equality, nothing more [2]. The full range of arithmetic operations
is allowed on any other data type, at unit cost.
We need to acknowledge the elephant in the room: given models of computation
that do allow bit arithmetic, the MST problem already has a linear time solution!
For example, Fredman and Willard give another algorithm and data structure that
finds MSTs in linear time on a unit-cost RAM with bit arithmetic [8]. Pettie’s algorithm
from Section 7 also runs in linear time if pre-computed MST solutions (instead of
decision trees) are allowed to be cached and retrieved in constant time.
We will focus on pointer machine algorithms in this review. They tend to reveal
more about the nature of the MST problem, and the resulting insights are frequently
applicable to matroid optimization in general. In addition, the search for a linear-
time comparison-based minimum spanning tree algorithm has motivated ideas and
data structures that are useful in general to computer science, some of which we will
describe.
1.3 Important properties
1.3.1 Cuts and cycles
Definition. A cut of a graph G is a subset S of vertices and its complement S̄ such
that neither S nor S̄ is empty.
Although we will formally define a cut as a set of vertices and its complement,
keep in mind that a cut is really just a way of dividing the graph. In many ways the
edges that cross a cut are more important than the vertices themselves.
Definition. An edge e crosses a cut S, S̄ if one of its endpoints is in S and the other
is in S̄.
Often we will talk of a specific cut, one that results from removing an edge from
a spanning tree.
Definition. Let T be a spanning tree of G and e an edge in T. Removing e from T
divides T into two connected components, and every vertex in G is in one of them.
We say S, S̄ is the cut defined by e if S is the set of vertices in one of the components,
and denote it cut(T, e).
Definition. A cycle in G is a path whose endpoint is the same as its start point.
Since a spanning tree T of G is connected, there is a path involving only edges
in T between any two vertices in G, and since it is a tree, this path is unique.
Definition. Let T(u, v) denote the unique path between u and v in T.
As with cuts, a spanning tree and an edge can define a specific cycle.
Definition. If T is a spanning tree of G and e = {u, v} is an edge not in T, then
T(u, v) and {u, v} form a cycle. We call this the cycle e makes with T.
1.3.2 About trees
We will reiterate without proof some basic facts about trees and go on to some MST
properties.
Fact. A tree T with n vertices has n-1 edges and no cycles.
In the other direction,
Fact. Any two of the following properties suffice to prove that T is a tree: connectedness, acyclicity, and having n − 1 edges.
1.3.3 Existence and Uniqueness
Now we have the tools to prove that the minimum spanning tree does indeed exist
for all graphs, and is furthermore unique if our edge weights are distinct.
Theorem 1. For any connected graph G, the minimum spanning tree of G exists
and is unique.
Proof. Existence: A spanning tree of G exists, since we can keep removing edges
until G no longer has any cycles. Removing any edge that is part of a cycle does not
disconnect G, since any path that went through the edge {u, v} which we removed
may also go through the remaining part of the cycle. There also exists a spanning
tree with minimal weight: since our graphs are finite, we can enumerate all spanning
trees and their costs. The set of spanning tree costs is also finite, so there must be
a minimum.
Uniqueness: Suppose we have two distinct spanning trees, T_1 and T_2, with the same
weight. Let e be the heaviest edge in T_1 ∪ T_2 − T_1 ∩ T_2. Suppose without loss of
generality that e ∈ T_1. Across cut(T_1, e), there is no other edge of T_1, otherwise there
would be a cycle. However, since a spanning tree must be connected, there is at least
one edge in T_2 that crosses this cut. Let f be such an edge of T_2. Since f is not in T_1,
it must be in T_1 ∪ T_2 − T_1 ∩ T_2, and since e was the heaviest in this set, w(f) < w(e).
If we replace e with f in T_1, the resulting graph is connected, since the graphs on
either side of the cut were connected. It is also acyclic, since after removing e from
T_1, there was no path between vertices on opposite sides of the cut, so adding in
f created no cycles. Thus replacing e by f results in a spanning tree with total cost
less than cost(T_1) = cost(T_2), so neither of these is minimal. Therefore a spanning
tree with minimal cost must be the only spanning tree with that cost.
1.3.4 The cut and cycle properties
The properties of being the lightest edge across some cut and the heaviest edge on
some cycle have a curious dual relationship:
Lemma 2. There exists a cut across which e is lightest ⟺ there is no cycle on which
e is heaviest.
Proof. (⇒) Let e be an edge that is the lightest across a cut S, S̄, and suppose that
C is a cycle containing e. Removing e from the cycle leaves a path between the two
endpoints of e, call them u and v. Since one (say u) is in S and the other is in S̄,
the remainder of C must cross from S to S̄. Let f be an edge in C − e that
crosses the cut. Since e is the lightest across the cut, e must be lighter than f, and
so e cannot be the heaviest in the cycle.
(⇐) We argue the contrapositive. Suppose e is heaviest on a cycle C. Let S, S̄
be a cut that e crosses. Since C is a cycle, it must cross the cut at least twice. Let
f be another edge of C that crosses the cut. We know e is heavier than f, so e cannot be
the lightest across this cut.
The cut property and the cycle property are ways of characterizing all edges in
the MST, and indeed either one actually defines the edges of the MST.
Theorem 3 (Cycle Property). An edge e is not in the minimum spanning tree if
and only if it is the heaviest edge on some cycle.
Proof. (⇒) Let T∗ be the minimum spanning tree for G, and let e = {u, v} be an edge not in
T∗. Since T∗ connects all the vertices of G, there is a path in T∗ that connects u and
v. Adding e to this path creates a cycle C. If there exists an edge e′ on C that is heavier
than e, then we can replace e′ with e to get a lighter tree T∗∗, which is impossible
by our choice of T∗.
(⇐) Suppose e is heaviest on the cycle C. Then there is no cut across which it
is the lightest. Let T be a spanning tree of G that includes e. Removing e splits T
into two connected components, defining a cut of G. There is a lighter edge f across
this cut, and replacing e by f yields a tree T′ that is lighter than T. So no spanning
tree containing e is minimal.
By the cut-cycle duality, it is easy to see that this implies the cut property:
Theorem 4 (Cut Property). An edge e is in the minimum spanning tree if and only
if it is the lightest across some cut.
Proof. This follows directly from Lemma 2 and Theorem 3.
1.4 Graph representation
A graph, being mathematically defined as a set of vertices V and a subset E of
V × V, still needs to have some kind of concrete representation on a computer. We
can realize this with an adjacency list: each vertex v maintains a list of pointers
to edge objects for which v is an endpoint. Both edges and vertices we will define
to be data structures, with a vertex storing at minimum its unique identifier. An
edge stores a pointer to the endpoint with the lesser identifier, and a pointer to the
endpoint with the greater identifier, as well as its weight. Vertices and edges are also
capable of storing an additional constant amount of data, which we will describe as
needed.
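To make the representation concrete, here is a minimal Python sketch of the adjacency list just described. The class and field names (Vertex, Edge, ident, extra) are my own, chosen only for illustration; the thesis itself does not prescribe them.

class Vertex:
    def __init__(self, ident):
        self.ident = ident        # the unique identifier, e.g. 1..n
        self.edges = []           # pointers to incident Edge objects
        self.extra = None         # the constant amount of additional data

class Edge:
    def __init__(self, u, v, weight):
        # store the endpoint with the lesser identifier first, as described
        self.u, self.v = (u, v) if u.ident < v.ident else (v, u)
        self.weight = weight
        self.u.edges.append(self)
        self.v.edges.append(self)

    def other(self, w):
        # return the endpoint that is not w
        return self.v if w is self.u else self.u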
2 Classic algorithms
2.1 The union-find data structure
The classic greedy algorithms heavily use set operations, in particular asking whether
two objects are in the same set as well as taking the union of two sets. The union-
find data structure supports the operations makeset(u), find(u), and union(u, v):
makeset(u) creates a new set containing only the element u, find(u) returns the unique
representative of the set to which u belongs, and union(u, v) combines the sets containing
u and v into one.
The implementation of the union-find structure is outside the scope of this review.
However, the running times per operation are important enough to emphasize here:
makeset and union both run in O(1) time, and for any sequence of union and
find operations that includes k finds, the finds take at most O(kα(k)) time in total,
averaging O(α(k)) or better per find, where α() refers to one form of the inverse of
the Ackermann function. The Ackermann function grows extremely quickly, and its
inverse grows extremely slowly – α(number of atoms in the observable universe) =
4. The Ackermann function and a different form of its inverse (one that takes two
arguments) will reappear later, when we discuss Chazelle’s algorithm.
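For reference, here is a standard union-find sketch in Python, with union by rank and path compression; this is the flavor of implementation whose inverse-Ackermann amortized bound is quoted above. Note that this union calls find for convenience, whereas the O(1) bound stated above applies when union is handed set representatives directly.

parent = {}
rank = {}

def makeset(u):
    parent[u] = u
    rank[u] = 0

def find(u):
    # follow parent pointers to the representative, compressing the path
    root = u
    while parent[root] != root:
        root = parent[root]
    while parent[u] != root:
        parent[u], u = root, parent[u]
    return root

def union(u, v):
    ru, rv = find(u), find(v)
    if ru == rv:
        return
    if rank[ru] < rank[rv]:
        ru, rv = rv, ru          # attach the shallower tree under the deeper one
    parent[rv] = ru
    if rank[ru] == rank[rv]:
        rank[ru] += 1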
2.2 Kruskal’s
Kruskal’s algorithm follows a simple intuition: in order to minimize the final cost,
include the lightest possible edges. In particular, start by grabbing the lightest legal
edge available, and repeat until you have a spanning tree.
Algorithm 1 Kruskal
Require: Input G = (V, E)
Ensure: Output T ⊆ E, the minimum spanning tree
1: sort E
2: T ← ∅
3: for all v ∈ V do
4: makeset(v)
5: end for
6: for all edges {u, v} ∈ E do
7: if find(u) != find(v) then
8: add {u,v} to T
9: union(u,v)
10: end if
11: end for
12: return T
Let’s take a high-level look at Kruskal’s. When edge e is processed, if e is not in
the MST, Kruskal’s ignores it, and if e is in the MST, Kruskal’s puts it in T. This
is easily proved using the cut and cycle properties. Basically, if the endpoints of e
are in the same connected component, then e is heaviest on the cycle it creates with
the existing edges in T, because we are processing the edges in sorted order. On the
other hand, if the two endpoints are in different components C_1 and C_2, and if S_1 is
the vertex set of C_1, then e is lightest across the cut S_1, S̄_1.
At the time of processing an edge, Kruskal’s algorithm does exactly the right
thing with it – if e belongs in MST(G), then Kruskal’s includes it. If not, Kruskal’s
throws it out.
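Algorithm 1 translates almost line for line into Python. The sketch below folds a tiny union-find directly into the function and represents the graph as (weight, u, v) triples; these representation choices are mine, made only to keep the example self-contained.

def kruskal(vertices, edges):
    # edges: iterable of (weight, u, v); returns the MST as a list of triples
    parent = {v: v for v in vertices}        # a small union-find, as in Section 2.1

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):            # process edges in increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                         # endpoints in different components?
            tree.append((w, u, v))           # then {u, v} is an MST edge
            parent[ru] = rv                  # union
    return tree

# example: a square with one diagonal
print(kruskal([1, 2, 3, 4],
              [(1, 1, 2), (2, 2, 3), (3, 3, 4), (4, 4, 1), (5, 1, 3)]))
# -> [(1, 1, 2), (2, 2, 3), (3, 3, 4)]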
2.3 Dijkstra-Jarník-Prim
This is more commonly known as Prim’s algorithm, but I will follow Pettie’s example
in calling this the Dijkstra-Jarník-Prim algorithm, or DJP for short. It was first
developed in 1930 by Jarník, and independently discovered by Prim and Dijkstra in
the late 1950’s.
From a distance DJP seems very similar to Kruskal’s: it also grabs the lightest
edge possible at every step. Instead of iterating through the edges in sorted order,
it uses a heap to keep track of which vertex would be cheapest to add to a growing
tree. It only keeps track of one edge per candidate vertex, calling on the heap
decreasekey operation if necessary. The running time is therefore heavily dependent
on the heap used. With a standard binary heap that performs insertions, deletions, and
decreasekey operations in O(log N) time, where N is the number of elements in the
heap, the running time is O(m log n).
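A minimal Python sketch of DJP follows. Python's heapq module has no decreasekey, so instead of updating the key of a candidate vertex this version pushes duplicate entries and skips vertices that are already in the tree ("lazy deletion"); that is a simplification of the one-edge-per-candidate scheme described above, not the version analyzed in the literature.

import heapq

def djp(adj, root):
    # adj: dict mapping each vertex to a list of (weight, neighbour) pairs
    in_tree = {root}
    heap = [(w, root, u) for w, u in adj[root]]
    heapq.heapify(heap)
    tree = []
    while heap:
        w, u, v = heapq.heappop(heap)        # cheapest edge leaving the tree so far
        if v in in_tree:
            continue                         # stale entry; skip it
        in_tree.add(v)
        tree.append((w, u, v))
        for w2, x in adj[v]:
            if x not in in_tree:
                heapq.heappush(heap, (w2, v, x))
    return tree

# example: the same square-with-diagonal graph
adj = {1: [(1, 2), (4, 4), (5, 3)], 2: [(1, 1), (2, 3)],
       3: [(2, 2), (3, 4), (5, 1)], 4: [(3, 3), (4, 1)]}
print(djp(adj, 1))   # -> [(1, 1, 2), (2, 2, 3), (3, 3, 4)]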
3 Iterative algorithms
The two iterative algorithms we shall describe are Borůvka’s and the Fredman-Tarjan
algorithm. Both define iterative steps that are important in the later, more sophis-
ticated algorithms.
3.1 Contractions
The purpose of contraction is mostly to present things cleanly. Instead of speaking
of a collection of intermediate subgraphs, it allows us to speak of the vertices of
a contracted graph.
Contraction is exactly what it sounds like: we merge two or more vertices into
one supervertex, whose incident edges are the union of all the edges incident to the
original vertices that make it up. More formally, given a (usually disconnected) subgraph
H, contracting the graph across H means making every connected component
of H into a supervertex.
The implementation given in Algorithm 2 requires each vertex to store an integer
in the field “component.”
Let m_H, n_H be the numbers of edges and vertices in H, and m_G, n_G be the numbers
of edges and vertices in G. Since we can find connected components in O(m_H + n_H) time,
the entire subroutine takes O(m_G + n_G) time, since we iterate through the vertices
once and through the edges once, and m_H ≤ m_G and n_H ≤ n_G.
Contractions have the messy side effect of potentially returning a non-simple
graph. There is a simple clean-up routine, using a lexicographic sort, to ensure that
the contracted graph is simple, keeping the lightest edge when there are redundant
edges. After lexicographically sorting the edges by component identifiers, redundant
edges show up next to each other and we only need to scan the sorted list of edges to
extract the edge of lowest cost among duplicated edges. This can be done in O(m_G) time
on a pointer machine [7].
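A sketch of that clean-up in Python: canonicalize each edge's endpoint pair, sort, and keep the first (lightest) copy of each pair. Python's comparison sort stands in for the radix-style lexicographic sort that makes the routine linear on a pointer machine; the function name and edge format are mine.

def simplify(edges):
    # edges: list of (u, v, w) with u, v component identifiers
    # drop self-loops and keep the lightest edge for each endpoint pair
    canon = sorted((min(u, v), max(u, v), w) for u, v, w in edges if u != v)
    out = []
    for u, v, w in canon:
        if out and out[-1][0] == u and out[-1][1] == v:
            continue              # heavier duplicate of the previous edge
        out.append((u, v, w))
    return out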
Definition. If G′ was obtained by contracting edges of G, then G′ is called a minor
of G.
Remark. A minor of a minor is also a minor. That is, if G″ is a minor of G′ and G′ is
a minor of G, then G″ is a minor of G.
Algorithm 2 Contract
Require: Input: G = (V, E), a subgraph H of edges to contract
Ensure: Output: G′, the contracted graph
If not every vertex in V is represented in H, put the missing vertices in H as isolated vertices
V′ ← ∅, E′ ← ∅
G′ = (V′, E′)
connectedComponents ← find-connected-components(H)
i ← 0
for all C ∈ connectedComponents do
put i in V′
for all v ∈ C do
v.component ← i
end for
i ← i + 1
end for
for all {u, v} ∈ E do
put {u.component, v.component} in E′
end for
return G′ = (V′, E′)
In terms of MST algorithms, certain subgraphs are safer to contract than others.
Definition (Contractible). A subgraph C of G is contractible if MST(G) = MST(C) ∪
MST(G\C), where G\C denotes G with C contracted.
That is, treating the entire collection of vertices in C as one does not affect the
correctness of an MST algorithm. All partial MST results are contractible – we could
stop Kruskal’s or DJP at any time, for instance, contract G across the intermediate
result, and carry on.
Remark. C is contractible and connected ⇐⇒ C∩MST(G) is connected.
Definition. If G′ is a minor of G, and v′ is a vertex in G′, then the supervertex v′
contains one or more vertices of G. Let the expansion of v′ be the subgraph of G
with vertex set {v ∈ G : v maps to v′} and edge set {{u, v} : u, v both map to v′}.
We write the expansion of v′ as C_{v′}.
3.2 Borůvka’s algorithm
Like Kruskal’s algorithm, Borůvka’s partitions the vertices into partial trees and
merges them incrementally. However, unlike Kruskal’s, which merges two compo-
nents in a step, in a single step of Borůvka’s algorithm every component is involved
in a merger. Like DJP, Borůvka’s grows the intermediate result by taking the lightest
edge coming out of a component, but unlike DJP, which only tracks one component,
Borůvka’s does so for many.
Of course the cost of taking multiple edges in a step is that the steps are longer
and more complex.
3.2.1 One iteration of Borůvka’s algorithm
At the start of iteration i, we have a graph G_i, with G_0 = G. During one iteration, each
vertex selects the lightest edge incident to it and contracts that edge. At the end
of the iteration, we have some contraction G_{i+1} of G_i, and the set F of contracted
edges.
In the implementation given in Algorithm 3, we need to store fields “minEdge”
and “minEdgeWeight” for each vertex.
Algorithm 3 Borůvka-step
Require: Input: G_i = (V_i, E_i)
Ensure: Output: a forest F of MST edges and a contracted graph G_{i+1}
F ← ∅
for all {u, v} ∈ E_i do
if w({u, v}) < u.minEdgeWeight then
u.minEdgeWeight ← w({u, v})
u.minEdge ← {u, v}
end if
if w({u, v}) < v.minEdgeWeight then
v.minEdgeWeight ← w({u, v})
v.minEdge ← {u, v}
end if
end for
for all v ∈ V_i do
put v.minEdge in F
end for
G_{i+1} ← contract(G_i, F)
return G_{i+1}, F
Iterating through the edge set and vertex set takes O(m+n) time, and contract
is O(m), so Borůvka-step takes O(m) time in all.
3.2.2 The algorithm
Borůvka’s algorithm simply performs Borůvka phases until the entire graph is con-
tracted into one vertex. It stores a running result T, and appends the contracted
edges F to T after every iteration. It is easy to see correctness by noting that taking
the lightest edge out of a vertex v′ in G′ is equivalent to taking the lightest edge
out of the cut S_v, S̄_v, where S_v is the vertex set of C_v. And we iterate until G is
contracted to a single vertex, so the final T is connected.
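A compact Python sketch of the whole algorithm. Rather than physically contracting the graph each round as in the text, it keeps a union-find structure over the original vertices and lets each current component select its lightest incident edge; this is an equivalent piece of bookkeeping, not the contraction-based presentation above. Distinct edge weights are assumed.

def boruvka(n, edges):
    # n vertices labelled 0..n-1; edges: list of (u, v, w) with distinct weights
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree, components = [], n
    while components > 1:
        # one Borůvka step: every component selects its lightest incident edge
        cheapest = {}
        for u, v, w in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for r in (ru, rv):
                if r not in cheapest or w < cheapest[r][2]:
                    cheapest[r] = (u, v, w)
        if not cheapest:              # disconnected input; stop with a forest
            break
        for u, v, w in cheapest.values():
            ru, rv = find(u), find(v)
            if ru != rv:              # the same edge may be selected by both sides
                parent[ru] = rv
                tree.append((u, v, w))
                components -= 1
    return tree

# example: the square with a diagonal, zero-indexed
print(boruvka(4, [(0, 1, 1), (1, 2, 2), (2, 3, 3), (3, 0, 4), (0, 2, 5)]))
# -> the three lightest edges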
3.3 Fredman-Tarjan
Over time, various modifications to Kruskal’s, DJP, and Borůvka’s algorithms have
been proposed, lowering the running time by various degrees. The use of a Fibonacci
heap, and a slight but important modification to DJP, lowers the running time from
O(m log n) to O(mβ(m, n)), where β is a form of the iterated logarithm: the
least number of times the logarithm function must be applied to n before it drops
below m/n. Formally, β(m, n) = min{i : log^(i) n ≤ m/n}. The approach is more
thoroughly described in [7].
The Fibonacci heap performs the insert, decreasekey, and meld operations in constant
amortized time, and both deletemin and delete in O(log N) amortized time, where N is
the number of items in the heap.
3.3.1 One iteration of Fredman-Tarjan
One iteration of the Fredman-Tarjan algorithm results in a contracted graph, where
the number of contracted vertices is at most 2m/k, with k being an input parameter.
In addition, it generates a set 𝒞 of partial MSTs such that
1. 𝒞 covers every vertex.
2. The members of 𝒞 are edge-disjoint.
3. The number of connected components in ∪_{C∈𝒞} C is at most 2m/k.
The basic flow is pretty simple: start with all vertices unmarked. Picking an arbitrary
vertex, expand outward, DJP-style, until the heap of candidate vertices reaches size k.
Mark all the vertices in the current component, and start afresh with an unmarked
vertex. For components other than the first, expansion can stop before the heap
grows large enough, if the current component collides with an old one – that is,
the last vertex added to the component was already part of another component.
Pseudocode is given in Algorithm 4.
Algorithm 4 Fredman-Tarjan-iteration
Require: G = (V, E)
Ensure: F = a subset of MST edges; G′ = G\F
while there are still unmarked vertices do
Initialize a new heap
Pick an arbitrary unmarked vertex v_0
Put all adjacent vertices u in the heap with key w({u, v_0})
while the heap has fewer than k elements do
v ← heap.deletemin()
if v is already in the currently growing component then
Continue without doing anything
end if
Add {v, x} to F, where w({v, x}) was the last key of v in the heap
for all u adjacent to v do
If u is not in the heap, insert u with key w({u, v}). If u has a greater key
in the heap than w({u, v}), then decrease the key to w({u, v})
end for
end while
end while
Contract G across the edges of F (without clean-up)
This ensures 1) that every time we retrieve the lightest edge from the heap, it
takes time in O(log k), and 2) whenever we stop growing a component, either it has
k or more other vertices adjacent to it, or it shares a vertex with another component.
However, the first component in any set of components linked by common vertices
must have stopped growth when the heap reached critical size. Therefore every
connected component of F has at least k edges coming out of it. Contracting across F
gives us at least k edges coming out of each vertex. Since the total number of edges coming
out of all vertices is 2m, this gives at most 2m/k vertices in the contracted graph.
Again, the running time is highly dependent on the particular heap implementa-
tion. With a Fibonacci heap, one iteration runs in O(m + n log k).
Another consequence of this is that we can raise the density, m/n, of a graph to an
arbitrary value D in O(m + n log D) time. This comes from the fact that the new
density m/n′ is at least k/2, so by setting k = 2D and running a Fredman-Tarjan iteration we
have the desired result.
3.3.2 The complete algorithm
Again, we perform iterations, contracting the connected components after each step, until
the graph becomes trivial. Setting k = 2^{2m/n} for each iteration will give us the
promised time bound. We refer the reader to [7] for details.
4 An algorithm for verification
I’ll begin with a verification algorithm because it nicely illustrates both the usage of
the cycle property and a couple of other tricks.
A bit of history and acknowledgements: Janos Komlós first observed that ver-
ification can be done in a linear number of comparisons, although a linear-time
implementation proved more elusive. Valerie King distilled Komlós’s result into
a simpler form in addition to managing to implement Komlós’s algorithm in lin-
ear time and space. Slightly before King’s result, Dixon, Rauch, and Tarjan gave
a completely different algorithm based on massaging the input so that a previous
method of Tarjan’s runs in linear time. Adam Buchsbaum produced the first purely
comparison-based verifier by replacing the RAM-dependent portion of Dixon et al.
by a pointer method.
Here I will talk about Komlós’s information-theoretic result and King’s refine-
ment. It is important to know that Buchsbaum’s algorithm exists, for later algo-
rithms, but I will not go into detail about it.
4.1 Verification: problem definition and reduction
The inputs are a graph G and a spanning tree T of G. A correct verifier accepts if
T is the minimum spanning tree of G and rejects if T is not.
4.1.1 Narrowing down the search space
How do we know if T is the MST? The cut and cycle properties tell us exactly which
edges are in the MST. We present the holistic cycle and cut properties. They are
holistic in the sense that they apply to an entire spanning tree.
Theorem 5 (Holistic cut property). If T is a spanning tree of G, then removing
any edge splits T into two connected components, which between them cover all the
vertices of G. This defines a cut of G. The holistic cut property states that T is the
minimum spanning tree if and only if every edge in T is the lightest across the cut
defined by removing it from T.
Remark. For the cut defined by removing e ∈ T, e is the only edge of T across
that cut.
Likewise the cycle property can be used to evaluate an entire spanning tree.
Definition. Let G be a graph and T a spanning tree of G. Given any two vertices
u and v in G, there is a unique path between them that only uses the edges in T.
This is a consequence of the definition of a spanning tree. Define T(u, v) to be this
unique path.
Theorem 6 (Holistic cycle property). If T is a spanning tree for G, then for every
edge {u, v} that is not in T, putting {u, v} together with T(u, v) creates a cycle. T is
the MST if and only if every edge {u, v} not in T is heaviest in the cycle it creates
with T(u, v). To simplify notation at times, we will speak of the cycle e creates with
T.
Proof. The cut-cycle duality entails that the forward direction of the holistic cut
property is equivalent to the holistic cycle property, and the same for the backward
direction.
For the forward direction, if T is the MST, and f is an edge not in T, then f
must be the heaviest in the cycle it creates with T since otherwise we replace the
heaviest edge in the cycle with f, obtaining a spanning tree lighter than T. For the
backward direction, if every edge in T is lightest across the cut it defines, then the
ordinary cut property guarantees that every edge in T is in the MST.
The holistic cut and cycle properties seem nearly like tautologies, given the or-
dinary cut and cycle properties. Their significance comes from the fact that they
specify the exact cut or cycle that we should look at. The ordinary cut and cycle
properties only said, “if there exists a cut,” or “if there exists a cycle.” The holistic
properties make it so we don’t have to look at all cuts or all cycles, just one.
4.1.2 Reduction
Applying the holistic cut and cycle properties yields the following equivalent formu-
lations of the MST verification problem:
1. Given a graph G and a spanning tree T: for every e ∈ T, is e the lightest
across the cut defined by removing e from T?
2. Given a graph G and a spanning tree T: for every e ∉ T, is e the heaviest
on the cycle it makes with T?
Komlós chooses to attack the second question, breaking it up into two parts. The
first task is to find the maximum weight on T(u, v) for all vertex pairs u, v. The
second is to test w({u, v}) against this maximum weight for all edges {u, v} not in
T.
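As a baseline, formulation 2 can be checked directly, if slowly: root T anywhere, and for each non-tree edge {u, v} walk the tree path T(u, v) and compare weights. The Python sketch below is this O(mn) brute-force check, useful for testing, and emphatically not Komlós's linear-comparison scheme; vertex labels 0..n−1 and the (u, v, w) edge format are assumptions of the example.

from collections import deque

def verify_mst(n, tree_edges, all_edges):
    # returns True iff every non-tree edge is heaviest on the cycle it makes with T
    adj = {v: [] for v in range(n)}
    for u, v, w in tree_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    parent, pweight, depth = {0: None}, {0: 0}, {0: 0}
    queue = deque([0])                       # root T at vertex 0
    while queue:
        x = queue.popleft()
        for y, w in adj[x]:
            if y not in parent:
                parent[y], pweight[y], depth[y] = x, w, depth[x] + 1
                queue.append(y)

    def path_max(u, v):                      # heaviest weight on T(u, v)
        best = 0                             # weights are assumed positive
        while u != v:
            if depth[u] < depth[v]:
                u, v = v, u
            best = max(best, pweight[u])     # lift the deeper endpoint
            u = parent[u]
        return best

    tree_set = {(min(u, v), max(u, v)) for u, v, _ in tree_edges}
    return all(w >= path_max(u, v)
               for u, v, w in all_edges
               if (min(u, v), max(u, v)) not in tree_set)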
4.2 We can verify with a linear number of comparisons
Komlós notes that one can turn any spanning tree into a rooted tree by distinguishing
an arbitrary leaf node. Given this natural order on the vertices and edges of the input
tree, then, we can break any query path into two half-paths:
Definition. If one end of a path is an ancestor of the other end, then this path is a
half-path.
Komlós inductively finds the maximum weight on every possible half-path, and
stores the result in a lookup table.
For every node v on level d of the tree, we will construct an ordered list M(v) =
[m_0(v), m_1(v), ..., m_{d−1}(v)], where m_i(v) is the maximum weight on the directed path
starting at level i and going down to v. For example, if p(v) is the parent of v, then m_{d−1}(v)
equals the weight of the only edge between p(v) and v.
Lemma 7. For every v at level d, we can find M(v) using at most log d comparisons.

Proof. If d = 1, i.e. v is a child of the root r, we define M(v) = [m_0(v)] = [w({v, r})].
This takes zero comparisons, and 0 ≤ log 1 = 0.
Let u be a node and let a_i(u) denote its ancestor at level i, constraining i to be less
than depth(u). For any level i and any node u, m_i(u) ≤ m_{i−1}(u), because the directed
path from a_i(u) to u is a subset of the directed path from a_{i−1}(u) to u. Thus
[m_0(v), m_1(v), ..., m_{d−1}(v)] is an ordered list. When constructing M(v), since for every
path from an ancestor a_i(v) to v we already know the maximum over all edges except
{p(v), v}, we only need to compare w({p(v), v}) with m_i(p(v)). However, since
M(p(v)) is an ordered list, we only need to find the point at which w({p(v), v})
becomes greater than m_i(p(v)). This is a binary search, which takes at most log(d − 1)
comparisons.
To actually construct M(v), however, is a little more expensive. We take the
index i* returned by the binary search, and set m_i(v) = m_i(p(v)) for all i < i*, and
m_i(v) = w({p(v), v}) for i ≥ i*.
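The construction is easy to mirror in Python. The sketch below computes every list M(v) top-down from parent pointers; it copies whole lists for clarity, so it illustrates the structure of the lists and the binary search, not the comparison- and time-efficient bookkeeping discussed in the text. The function name and input format are mine.

def build_M(parent, pweight, depth):
    # parent[v] is p(v) (None for the root); pweight[v] = w({p(v), v});
    # M[v][i] = max weight on the half-path from v's level-i ancestor down to v
    order = sorted(parent, key=lambda v: depth[v])      # parents before children
    M = {}
    for v in order:
        if parent[v] is None:
            M[v] = []                                   # the root has no half-paths
            continue
        prefix, w = M[parent[v]], pweight[v]
        lo, hi = 0, len(prefix)
        while lo < hi:                                  # binary search: first index
            mid = (lo + hi) // 2                        # where prefix[mid] <= w
            if prefix[mid] <= w:
                hi = mid
            else:
                lo = mid + 1
        M[v] = prefix[:lo] + [w] * (len(prefix) - lo + 1)
    return M

# toy path 0-1-2 with edge weights 5 and 3, rooted at 0
print(build_M({0: None, 1: 0, 2: 1}, {0: 0, 1: 5, 2: 3}, {0: 0, 1: 1, 2: 2}))
# -> {0: [], 1: [5], 2: [5, 3]}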
Definition (Full branching tree). A rooted tree in which all leaves are at the same
level, and every internal (non-leaf) node has at least two children.
4.2.1 Proof of complexity for a full branching tree
The total number of comparisons needed, then, is Σ_i L_i, where L_i, the total
number of comparisons needed for all the vertices at level i, is given by

    Σ_v log(|M(v)| + 1)    (1)

for all v on level i.
Following Eisner’s lead [6], we rewrite this as an average of logs and use Jensen’s
inequality. Letting n_i denote the number of nodes on level i, Equation (1) becomes

    n_i · (1/n_i) Σ_v log(|M(v)| + 1)
        ≤ n_i log( (Σ_v |M(v)|) / n_i + 1 )
        ≤ n_i log( (n + Σ_v |M(v)|) / n_i )
        ≤ n_i log( (n + 2m) / n_i )
        = n_i [ log((n + 2m)/n) + log(n/n_i) ]

The sum over all levels is then O(n log((m + n)/n)),
using the fact that, since this is a full branching tree, the number of nodes at depth
i is at most n/2^i.
4.2.2 Turning every tree into a full branching tree
By building a tree that documents a run of Borůvka’s algorithm, King gives us a way
to turn every spanning tree into a full branching tree with at most 2n vertices. Each
level of the full branching tree represents the state of the graph before one iteration
of Borůvka’s, with nodes at level i corresponding to the contracted vertices in G_i.
There is an edge between a node at level i −1 and a node at level i if the i −1-node
becomes part of the i-node during that iteration. More formally, if T is a spanning
tree over n vertices, we will build a full branching tree B.
1. Start B as the empty graph.
2. Put all the vertices in T as the leaves of B. That’s the end of the 0th Borůvka
iteration.
3. Repeat until G_i is contracted to a single vertex: if, at the beginning of the ith
iteration, we have vertices v_1, ..., v_k, and at the end, we have vertices u_1, ..., u_l
in the contracted graph, then put all the u_j in B as nodes. Draw a directed edge
from v_i to u_j if v_i was contracted into u_j, and the weight of that edge is the
weight of the edge selected by v_i during that Borůvka iteration.
Note that since T was a tree to begin with, Boruvka’s algorithm trivially returns T.
An immediate consequence of this is that every edge in T was selected at some point.
In addition, even if we assume weights in T are unique, weights in B may not be.
There is a natural surjective map from edges in B to edges in T, and a one-to-many
mapping from edges in T to edges in B, namely the map that associates every
edge in T with all the edges of the same weight in B.
Claim. B is a full branching tree.
Proof. B is clearly a rooted tree, with the node in B corresponding to the entire T as the root.
Since after an iteration, every connected component is the result of joining at least
two other connected components, condition 2 for a full branching tree is satisfied.
And we can prove by induction on the height of B that all leaves are at the same level
(that level being the number of iterations needed for running Boruvka’s on T).
4.2.3 We can use B instead of T
Recall that for a spanning tree T, T(x, y) denotes the unique path in T between x
and y. In the same way, let B(x, y) denote the unique path in B between leaves x
and y.
Lemma 8. u is on B(x, y) if and only if u is the lowest common ancestor of both x
and y in B, or u is an ancestor of x but not of y, or vice versa.
Proof. Suppose v is the lowest common ancestor of x and y. We can see that by
joining the path from v to x and from v to y and ignoring orientation, we get an
undirected path from x to y. Since B is a tree, this is the only path. Any node
on the path from v to x is an ancestor of x but not y (otherwise we contradict the
lowest-ness of v), and vice versa for any node on the path from v to y. On the other
hand, if u is a common ancestor of x and y but not the lowest, then u is not included
on the path defined earlier in the paragraph, which is unique.
Lemma 9. If e′ is an edge of B(x, y), then there is an edge e in T(x, y) with w(e) =
w(e′). As a matter of fact, e is the same edge from which e′ derived its weight.
Proof. Let e be the T-edge whose selection gave rise to e′ in B. If we show that e is
on T(x, y), then we’re done.
Suppose e′ = (u, v), so e is incident to v. Since v is on B(x, y) and it is not
the highest node (u is higher), by the previous lemma, the subgraph expansion of
v contains exactly one of x and y. Disconnecting e would partition T into C_v and
T \ C_v, one of which contains x and the other of which contains y. Since x and y
would no longer be connected, e must be on T(x, y).
Lemma 10. If e is heaviest on T(x, y), there must be an edge of the same weight in
B(x, y).
Proof. We will show that the expansion of any contracted vertex that selects e contains
x or y but not both. First, let C_v be the expansion of a vertex in one of the
G_i. Let x = u_0, u_1, ..., u_k = y be T(x, y). If e is incident to C_v, then T(x, y) ∩ C_v is
nonempty, since one endpoint of e is in C_v and is a vertex of T(x, y). Also, T(x, y) ∩ C_v
is clearly connected, because otherwise there would be a cycle, and T is a tree. So
T(x, y) ∩ C_v = u_i, ..., u_j. Suppose neither x nor y is in C_v; then u_i ≠ x and u_j ≠ y. Thus
{u_{i−1}, u_i} and {u_j, u_{j+1}} are both incident to C_v, and one of these is e. However,
since there are two edges of T(x, y) incident to C_v, there is an edge lighter than e
incident to C_v, so C_v does not select e.
To see that any C_v containing both x and y cannot select e, note that disconnecting
any edge incident to C_v leaves C_v intact. If C_v contains both x and y,
disconnecting an incident edge leaves T(x, y) connected, so no incident edge of C_v
is part of T(x, y).
Therefore, let v be any vertex that selects e over the course of Borůvka’s algorithm.
We noted above that every T-edge is selected by at least one component. C_v
contains exactly one of x or y, so by Lemma 8, v is part of B(x, y). Since C_v does
not contain both x and y, the parent of v is also a node in B(x, y), so the edge from
v to its parent, which has weight w(e), is part of B(x, y).
This brings us to our final result:
Theorem 11. If e is the heaviest edge on T(x, y) and f′ is the heaviest edge on
B(x, y), then w(e) = w(f′).
Proof. By Lemma 9, there is an edge f on T(x, y) with w(f) = w(f′), so w(f′) =
w(f) ≤ w(e). By Lemma 10, there is an edge e′ on B(x, y) with w(e′) = w(e), so
w(e) = w(e′) ≤ w(f′). Therefore w(e) = w(f′).
Therefore we can use Komlós’s algorithm for full branching trees instead of general
trees; as we have just shown, it uses a linear number of comparisons.
A note The ideas in this section, I thought, nicely illustrated a use of the cycle
property for determining MSTs. In addition, the idea of constructing a tree of con-
tracted components, where each vertex is the child of the vertex it contracted into,
turns up again in Chazelle’s algorithm.
5 A randomized algorithm
Karger, Klein, and Tarjan introduce a randomized algorithm that always returns the
same answer for any input but whose running time varies. MSF-random, as we will call
it, is expected to run in O(m + n) time for any given graph with m edges and n vertices,
although it could conceivably get very unlucky and take up to O(m log n + n²) time.
Since I found that applying big-Oh notation to a randomized running time
is a little bewildering, let’s restate that: there is a magic number c such that when
MSF-random is run on any graph G a large number of times, the average running time
is less than or equal to c · (m + n) units of time. However, there is another magic
number d such that MSF-random always finishes in under d · (m log n + n²) units of
time.
5.1 Overview
The “F” in the name “MSF-random” comes from the fact that it works for graphs that
are not connected, hence it returns a minimum spanning forest instead of a minimum
spanning tree.
The algorithm is roughly sketched below for a graph G having n vertices and m
edges.
MSF-random:
1. Reduce the number of vertices by a factor of 4 via two Borůvka phases. Call
the contracted graph G_0 and the set of contracted edges F_0.
2. Toss a fair coin for each edge in G_0, putting it in the subgraph H_a if the coin
comes up heads. Call MSF-random on H_a to obtain its minimum spanning forest
F_a.
3. Eliminate all the edges of G_0 that become the heaviest edge in a cycle when
added to F_a. Let H_b be the remaining graph.
4. Call MSF-random on H_b to get its minimum spanning forest F_b.
5. Return F_b ∪ F_0.
Definition. An edge e is F-heavy if adding it to F creates a cycle and e is the
heaviest edge on that cycle. An edge in G that is not F-heavy is F-light.
The F_a-light edges are exactly the edges that make up H_b. Via the cycle property
we can see that none of the edges we threw out in step 3 are in the MST of G, so
the MST must be a subset of the edges we haven’t thrown out.
Claim. If H is a subgraph of G that covers all vertices and contains the MST, then
MST(H) = MST(G).
Proof. Let e be an edge in MST(G), and let S, S̄ be a cut in G across which e is lightest.
The same cut in H has fewer edges across it, but this does not affect the minimality
of e. If e is an edge in MST(H), and U, Ū is a cut for which e is lightest in H,
consider the same cut in G. Any edge in G that we haven’t included in H cannot be
the lightest across any cut, so e’s position as lightest is safe.
If MSF-random indeed returns the minimum spanning forest of H_b correctly, then
F_b = MST(G_0) = MST(G\F_0). Since all edges returned by Borůvka’s are contractible,
the return value of MSF-random, F_b ∪ F_0, is the MST of G.
Lastly, a double inductive argument on m and n, with a base case of an isolated
vertex, suffices to prove that MSF-random is correct.
5.1.1 The subgraph passed to the second recursion is sparse
The following statement is taken directly from Karger, Klein, and Tarjan in [9]
Theorem 12. Let G be a graph with n vertices, and let H be a subgraph obtained
by including each edge independently with probability p, and let F be the minimum
spanning forest of H. The expected number of F-light edges in G is at most n/p
where n is the number of vertices of G and p is the sampling probability.
An auxiliary procedure Consider the modification to Kruskal’s algorithm, given
in Algorithm 5.
Algorithm 5 Count-F-light
Require: G = (V, E); p
Ensure: H is a subsampled graph of G; F = MSF(H)
1: Sort E
2: numFLight ← 0
3: numF ← 0
4: H ← (V_H, E_H) ← (∅, ∅)
5: F ← (V_F, E_F) ← (∅, ∅)
6: for all e ∈ E do
7: X ← coinFlip(p)
8: if X is heads then
9: Put e in H
10: end if
11: if e is F-light then
12: numFLight ← numFLight + 1
13: if X is heads then
14: Put e in F
15: numF ← numF + 1
16: end if
17: end if
18: end for
Claim. After running Algorithm 5, F = MSF(H) and H is a sampled graph with
each edge being included independently with probability p.
Proof. The second part of the claim follows directly from lines 7 to 10. The first part
comes from the fact that we process the edges in increasing order of weight. If e is
included in F, then it does not create a cycle with lighter edges (by the definition of
F-light), and thus it is safe to include it in F, since any edge added to H afterward
must necessarily be heavier.
Fact. Suppose we have a coin that comes up heads with probability p. Let Z be a
random variable representing the number of times we must flip the coin to achieve n
heads. Then E[Z] = n/p.
More formally, Z has the negative binomial distribution parameterized by n and
p. The expectation of such a distribution is well-known to be n/p.
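A quick sanity check of this fact by simulation (no part of the original argument): flipping a p-biased coin until n heads appear should take about n/p flips on average.

import random

def flips_until_heads(n, p):
    flips = heads = 0
    while heads < n:              # flip until the n-th head appears
        flips += 1
        if random.random() < p:
            heads += 1
    return flips

random.seed(1)
n, p, trials = 50, 0.5, 10000
avg = sum(flips_until_heads(n, p) for _ in range(trials)) / trials
print(avg, n / p)                 # the average should be close to n/p = 100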
Claim. The variable numFLight is bounded above by a random variable Z having
the negative binomial distribution with parameters n and p.
Proof. Suppose we flip a coin every time we increment numFLight – but we already
do! We flip a coin and store it in the variable X, and numF counts the number of
times we get heads and increment numFLight. So numF is the number of heads
we have gotten from our numFLight coin flips. However, numF must be less than
n, since the maximum number of edges in the forest is n − 1. Suppose we keep on
flipping after count-F-light finishes, until numF plus the number of heads we got
afterwards is n. Let Z be the total number of flips we had to make. That is, Z is
numFLight plus the number of extra flips we had to make. By construction Z has
a negative binomial distribution, and Z is necessarily greater than numFLight. So
n/p = E[Z] > E[numFLight].
The proof of Theorem 12 follows directly from the previous two claims.
This allows us to expect that the number of edges passed into the second recursive
call is proportional to the number of vertices, not the original number of edges. Since
a Borůvka iteration halves the number of vertices, and we perform two of them at
the start of each call, this is very good news for the running time.
5.2 A tree formulation
MSF-random is a divide-and-conquer algorithm in the sense that it calls itself multiple
times, with the input to each subcall being smaller than the input of the parent call.
The divide is not quite clean, though, and only the output of the last call is used in the
final result, making the subcalls more like a sequence of refinements. Nevertheless,
like all divide-and-conquer algorithms, it can be represented by a recursion tree with
the original problem at the root. Each node has two children, one for each recursive
subproblem. The first (randomly sampled) we’ll say is the left child, and the second
the right.
5.2.1 Some facts about vertices and the recursion tree
The Borůvka iterations reduce the number of vertices by a factor of 4, so each subproblem
has at most 1/4 the number of vertices of its parent. Therefore, a subproblem
at depth d has at most n/4^d vertices. Each subproblem has at most two children;
therefore the number of subproblems at depth d is at most 2^d. Using these facts,
we see that the total number of vertices in all the subproblems at depth d is at most n/2^d.
Summing over all levels, we obtain an upper bound of 2n vertices in all subproblems
combined.
5.2.2 Some facts about edges and the recursion tree
Definition. A left-path is a path in the recursion tree consisting only of left edges. A
complete left-path is a left-path headed by either the root or a right child.
Note that left-paths correspond to a recursion chain of only the first recursive call
– that is, finding a minimum spanning forest of a randomly sampled subgraph. Also
note that different complete left-paths are disjoint, and that every node of the recursion tree
is a member of a complete left-path. In other words, the complete left-paths form a
partition of the tree. Also, every right child heads a complete left-path.
It’s pretty trivial to prove that if X_0 is the number of edges at the head of the
left-path, and X_i the number of edges at the ith node of the path, then E[X_i] ≤ E[X_0]/2^i,
since each edge has a 1/2 chance of being sampled, and we remove many edges at
the Borůvka stage.
We sum over all subproblems on the complete left-path and see that the expected
total number of edges on the path is

    Σ_{i≥0} E[X_0]/2^i = 2E[X_0].
Theorem 13. The expected number of edges in all the combined subproblems is 2m + n.
Proof. Suppose I have a subproblem with n′ vertices and m′ edges, and let H_L and
H_R denote my left and right subproblems. By the fact above, E[m_R] ≤ 2n′. Note that
at any depth d, there are at most 2^d total subproblems and 2^{d−1} right subproblems.
Recall that each subproblem has at most n/4^d vertices. Summing over all depths, we
see that all the right subproblems combined have at most n/2 vertices. Therefore,
by Theorem 12, the total expected number of edges in all the right subproblems is
2(n/2) = n. The expected number of edges in the complete left-path headed by the
root is 2m, so the expected number of edges in the entire recursion tree is 2m + n.
5.3 Runtime analysis
5.3.1 The expected running time
For a problem with n vertices and m edges, the running time T(m, n) breaks down
into
1. Two iterations of Borůvka’s: O(m).
(a) Recursive call + finding F-heavy edges: T(m_L, n_L).
(b) Finding F-heavy edges + recursive call: O(m) + T(m_R, n_R).
2. Concatenate the edges found in previous steps: O(1).
So T(m, n) = T(m_L, n_L) + T(m_R, n_R) + O(m).
The running time depends solely on the number of edges processed in each subproblem:

    T(m) = T(m_L) + T(m_R) + O(m)    (2)

By the above, the expected total number of edges is 2m + n, which is O(m).
5.3.2 A guaranteed running time
In the worst case, the sampling does nothing, and all the work is done by the Borůvka
iterations. This gives us a bound of O(m log n) from a maximum recursion depth of
log n, and m edges in all the subproblems at one level. Furthermore, a subproblem
at depth d contains fewer than (1/2)(n/4^d)² = (1/2)·n²/2^{4d} edges. This gives us at most
(1/2)·2^d·n²/2^{4d} < (1/2)·n²/2^d total edges in a level, and at most n² edges in all subproblems at all
levels. This gives us a guarantee that even in the event that MSF-random makes
very, very unlucky choices, it is no worse (asymptotically) than a classical algorithm
like DJP or Borůvka’s.
5.3.3 High-probability proof
The algorithm finishes in O(m) time with probability 1 −exp(−Ω(m)).
First, we deal with the right subproblems. We’re going to prove that the number
of edges in all the right subproblems is ≤ 3m with high probability. We toss a
nickel for every edge that could be F-light; if it comes up heads, the edge goes into the
spanning forest F of its subproblem. Since the number of edges in a spanning forest is
less than the number of vertices, and the number of vertices in all right subproblems
is less than n/2, the number of heads among the F-light edges in all right subproblems
is at most n/2. Then, the probability that there are more than 3m F-light edges is less
than the probability that fewer than n/2 heads appear in 3m coin tosses. The authors
apply a Chernoff bound and the inequality m ≥ n/2 to get a probability of exp(−Ω(m)) [9].
Now for the left subproblems. Define m′ to be the total number of edges in the root problem and in all right subproblems, and m″ to be the total number of edges in all left subproblems. Every left-subproblem edge is a sampled copy of an edge one level up, so m″ can be thought of as the number of heads in a sequence of coin tosses that contains at most m′ tails. Therefore P(m″ > 3m′) is at most the probability that a sequence of more than 3m′ coin tosses contains only m′ tails, and a Chernoff bound gives that P(m″ > O(m)) is exp(−Ω(m)).
6 A deterministic, non-greedy algorithm
This algorithm, published by Chazelle in 2000 [4], held the record for the fastest asymptotic runtime for two years, until Pettie and Ramachandran came up with an algorithm whose running time is, by construction, bounded above by that of any comparison-based algorithm, including this one. However, since no one has been able to prove a smaller explicit bound for the latter, Chazelle’s analysis still yields the lowest asymptotic runtime for the MST problem of which we are aware. In this review I will follow a technical report by Pettie [10] that simplifies Chazelle’s analysis. I will not include most of the details Chazelle describes in [4], and will focus instead on the intuition driving the algorithm.
This algorithm, at its heart, consists of three parts:
1. Identifying subproblems
2. Recursing on subproblems
3. Refining the result from Number 2.
Step 3 is needed because we will use a data structure, the soft heap, that renders the results of the subproblems inexact: while choosing the subproblems, the soft heap picks good but not perfect ones.
6.1 The Ackermann function and its inverse
The main thing to know about the Ackermann function is that it grows extremely
quickly. Therefore, its inverse grows extremely slowly.
The Ackermann function is defined on a 2D table, as follows:
A(1, j) = 2^j    (j ≥ 1)
A(i, 1) = A(i − 1, 2)    (i > 1)
A(i, j) = A(i − 1, A(i, j − 1))    (i, j > 1)
The base cases are sometimes given differently; I have followed [10]. To give an idea of how fast the Ackermann function grows, the first few values are given in Table 1. It is estimated that the observable universe contains fewer than 2^515 atoms, which is in turn less than A(2, 4).
There are two flavors of inverse. The first takes only one argument:
α(k) = min{ i | A(i, i) > k }.
Table 1: Values of the Ackermann function
 i\j |  1 |        2 |     3 |       4 |         5 |           6
  1  |  2 |        4 |     8 |      16 |        32 |          64
  2  |  4 |       16 | 65536 | 2^65536 | 2^2^65536 | 2^2^2^65536
  3  | 16 | A(2, 16) |   ... |     ... |       ... |         ...
The second takes two:
α(m, n) = min{ i | A(i, ⌈m/n⌉) > log n }.
Note that α(·, ·) is decreasing in its first argument: as m/n gets larger, i need not be as large for A(i, ⌈m/n⌉) to top log n. We will mostly use this second form in our analyses.
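Since α(m, n) will recur throughout the analysis, here is a small Python sketch of both inverses (my own illustration; the helper A_capped and the function names are not from the thesis). Computing A(i, j) literally overflows memory almost immediately, so the values are clamped at the threshold being compared against, which is safe because A is monotone in both arguments and always at least 2^j.

from math import ceil

def A_capped(i, j, cap):
    """A(i, j) with the base cases of section 6.1, clamped at cap + 1 so that the
    astronomically large intermediate values are never actually built."""
    if i == 1:
        # A(1, j) = 2^j; any j >= cap.bit_length() already pushes 2^j past cap.
        return cap + 1 if j >= cap.bit_length() else 2 ** j
    if j == 1:
        return A_capped(i - 1, 2, cap)
    inner = A_capped(i, j - 1, cap)
    if inner > cap:
        return cap + 1          # A(i-1, x) >= x, so the true value exceeds cap too
    return A_capped(i - 1, inner, cap)

def alpha1(k):
    """alpha(k) = min{ i : A(i, i) > k }."""
    i = 1
    while A_capped(i, i, k) <= k:
        i += 1
    return i

def alpha2(m, n):
    """alpha(m, n) = min{ i : A(i, ceil(m/n)) > log2(n) }."""
    cap = n.bit_length() - 1                 # floor(log2 n), exact for integer n
    j = max(1, ceil(m / n))
    i = 1
    while A_capped(i, j, cap) <= cap:
        i += 1
    return i

For instance alpha1(2**100) evaluates to 3, and alpha2 stays below 5 for any graph that could fit in a computer's memory, which is the sense in which the inverse "grows extremely slowly."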
6.2 The Soft Heap
Recall that heaps support the following operations:
• insert(item, k) puts item in the heap with key k.
• delete(item) takes the item away from the heap.
• deletemin() returns the item with minimum key and removes it from the heap.
• meld(otherHeap) combines two heaps into one.
The soft heap, an earlier invention of Chazelle’s [5], plays a central part in lowering the running time bound. We have seen, as in Kruskal’s and Prim’s, that insisting on correctness at every step leads to unnecessary overhead: in Kruskal’s, sorting the edges was extra work, and in Prim’s we incurred overhead from maintaining a sorted heap. The soft heap sacrifices correctness in exchange for speed. At any time it may contain corrupted elements, elements whose keys have been raised above their original values. The soft heap is controlled by a user-defined error parameter ε, and guarantees
1. deletemin, delete, and meld take constant amortized time;
2. insert takes O(log(1/ε)) amortized time;
3. the number of corrupted elements in the heap at any time is at most εN, where N is the number of insertions so far;
4. An additional operation, dismantle, takes O(N) time. This is explained in the
next section.
6.2.1 Bad and corrupted edges
Every item in a soft heap has two keys, original and current. The soft heap uses the
current key to “bubble up” elements, and the return value of the deletemin operation
is based on the current key. However, given any heap element, we can find out if it
is corrupt or not by comparing the current and original keys.
When we dismantle a soft heap, we will often want to find out which items in it
are corrupt. This is why the dismantle operation takes O(N) time – we need to look
at all the elements currently in the heap and decide if they are corrupt or not.
Note that corruption may only raise the weights of edges, not change them ar-
bitrarily. Although it is possible to tell exactly how much each edge weight was
corrupted, we will not need this information for the MST algorithm, only the fact
that the soft heap thought the weight was higher than it should be.
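To make the interface concrete, here is a toy stand-in in Python (my own; it is emphatically not Chazelle's data structure and has none of its amortized guarantees — it never actually corrupts anything — but it shows the operations, the two keys per item, and how dismantle separates corrupt from clean items by comparing those keys).

class SoftHeapStandIn:
    """Interface stand-in for the soft heap: every item carries an original and a
    current key. A real soft heap may raise up to eps*N current keys to gain speed;
    this toy version keeps them equal and is just a sorted list."""

    def __init__(self, eps):
        self.eps = eps          # error parameter: corruption budget is eps * (inserts so far)
        self.items = []         # entries are [current_key, original_key, item]
        self.inserts = 0

    def insert(self, item, key):
        self.inserts += 1
        self.items.append([key, key, item])   # current key starts equal to the original

    def deletemin(self):
        self.items.sort(key=lambda t: t[0])   # ordered by the *current* (possibly raised) key
        return self.items.pop(0)[2]

    def delete(self, item):
        self.items = [t for t in self.items if t[2] != item]

    def meld(self, other):
        self.items += other.items
        self.inserts += other.inserts

    def dismantle(self):
        """Return (corrupt_items, clean_items) in O(N) by inspecting every element."""
        corrupt = [t[2] for t in self.items if t[0] > t[1]]
        clean = [t[2] for t in self.items if t[0] <= t[1]]
        self.items = []
        return corrupt, clean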
6.2.2 Consequences for the MST algorithm
When we pick subproblems, we will use a soft heap to define subsets of the graph on
which to recurse. Ideally, we would like to pick perfectly contractible components.
However, since the soft heap corrupts edges as it goes, we have to settle for a different
sort of contractibility on a corrupt graph.
6.3 Strong contractibility and weak contractibility
Recall that for a contractible subgraph C, MST(G) = MST(C) ∪ MST(G \ C).
Let C be a subgraph of a weighted graph G.
6.3.1 Strong contractibility
Definition. C is strongly contractible with respect to a weighted graph G if there exists a vertex v_0 in C such that if the DJP algorithm starts at v_0, it will have constructed exactly the MST of C after some number of iterations.
Definition. The maximum weight of a path is the maximum weight of all the edges
in a path.
Claim. Let C be strongly contractible for a corruption G′ of G, and let M_C be those edges which are both corrupt in G′ and incident to C. Then if {u, v}, {x, y} are such that u, x ∈ C and v, y ∉ C and neither edge is in M_C, all edges on the path between u and x in MST(C) have weight less than max(w({u, v}), w({x, y})).
Proof. If T_C is the minimum spanning tree of C, let e be the heaviest edge on the path T_C(u, x). When we run the DJP algorithm from the vertex v_0, we end up with a minimum spanning tree of C. Consider a step of the DJP algorithm while it is constructing the MST of C, and let p = z_0, z_1, . . . , z_k be the part of T_C(u, x) already selected by the algorithm so far. If neither endpoint of p is equal to u or x, then two edges of T_C(u, x) cross the cut defined by the current tree, and since e is heavier than every other edge on T_C(u, x), it is impossible for the algorithm to select e at this step. Therefore, at the step when e is finally selected, one of u and x – say u – is already in the tree, so the uncorrupted edge {u, v} also crosses the cut at that step. Since DJP chose e over {u, v}, we get w(e) ≤ w({u, v}) ≤ max(w({u, v}), w({x, y})), and every other edge on T_C(u, x) is lighter still.
Remark. The converse is false. In particular, consider the graph with vertices a, b, c, d, e, f and edges {a, b}, {b, c}, {c, e}, {b, d}, {d, f} with weights 1, 2, 5, 3, 4 respectively. The subgraph C made of vertices b, c, d and edges {b, c}, {b, d} is its own MST, yet it cannot be constructed by starting DJP at b, c, or d, since any such run reaches b before finishing C and then selects {a, b}, which has weight 1. However, the conclusion of the claim cannot be violated: that would require two edges incident to C both no heavier than some edge of C, and the three incident edges have weights 1, 4, 5 while the edges in C have weights 2 and 3.
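This remark can be checked mechanically; the short script below (mine, not part of the thesis) runs DJP/Prim from each of b, c, d and shows that the weight-1 edge {a, b} is selected before MST(C) is complete.

import heapq

# The counterexample graph: distinct weights, C = {b, c, d} with edges {b,c}, {b,d}.
edges = {("a", "b"): 1, ("b", "c"): 2, ("c", "e"): 5, ("b", "d"): 3, ("d", "f"): 4}
adj = {}
for (u, v), w in edges.items():
    adj.setdefault(u, []).append((w, v))
    adj.setdefault(v, []).append((w, u))
C = {"b", "c", "d"}

for start in sorted(C):
    tree, picked, escaped = {start}, [], False
    heap = list(adj[start])
    heapq.heapify(heap)
    while heap and not escaped and tree != C:
        w, v = heapq.heappop(heap)
        if v in tree:
            continue
        picked.append((w, v))
        tree.add(v)
        if v not in C:
            escaped = True                    # DJP left C before finishing MST(C)
        else:
            for e in adj[v]:
                heapq.heappush(heap, e)
    print(f"start {start}: picked {picked}, escaped C: {escaped}")

Every run escapes C by picking the weight-1 edge {a, b} first, confirming that C is weakly but not strongly contractible.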
6.3.2 Weak contractibility
Weak contractibility guarantees us a composition formula similar to what we get with ordinary contractibility. Suppose G is our original graph, and G′ is the identical graph except that some edge weights have been raised. Recall that G′ \ C is the graph resulting from contracting G′ across C, so G′ \ C − M_C is the contracted graph less the edges in M_C.
Theorem. If C is strongly contractible with respect to G′, then MST(G) ⊆ MST(C) ∪ MST(G′ \ C − M_C) ∪ M_C.
Note that MST(C) and MST(G′ \ C − M_C) refer to the MST with the edge weights of G, not G′. The only time G′ is relevant is when we specify that C is strongly contractible with respect to G′.
The following proof is due to Pettie [11].
Proof. We want to show that any edge not in MST(C) ∪ MST(G′ \ C − M_C) ∪ M_C also must not be in MST(G). The only way an edge fails to be in MST(C) ∪ MST(G′ \ C − M_C) ∪ M_C is if it is in C but not in MST(C), or if it is in (G′ \ C − M_C) but not in MST(G′ \ C − M_C).
Case 1: If e ∈ C and e ∉ MST(C), then there exists some cycle in C for which e is heaviest. This cycle also exists in G, so e ∉ MST(G).
Case 2a: If e ∈ (G′ \ C − M_C) and e ∉ MST(G′ \ C − M_C), and there exists some cycle in (G′ \ C − M_C) that does not involve C (we are loosely using C to mean the contracted vertex) for which e is heaviest, then that cycle also exists in G.
Case 2b: Now suppose the only cycles in (G′ \ C − M_C) for which e is heaviest include C. Let P be such a cycle. Then in the noncontracted graph G, there is a cycle P′ consisting of P and edges in MST(C). Let {u, v} and {x, y} be the two edges in P that have exactly one vertex in C. Applying Claim 6.3.1 and the fact that e has maximum weight on P, w(e) ≥ max(w({u, v}), w({x, y})) ≥ w(f) for all f ∈ C ∩ P′. Thus e is heavier than any other edge in P′, which is a cycle in the original graph. So e ∉ MST(G).
If the conclusion of Claim 6.3.1 holds for a subgraph, we will say that subgraph is weakly contractible.
We will engage in a slight abuse of notation: for a collection 𝒞 of subgraphs, we let MST(𝒞) denote ∪_{C∈𝒞} MST(C).
6.3.3 Strong contractibility on minors
If G_0 is a graph, and 𝒞_0 is a set of weakly contractible components of G_0, then let G_1 = G_0 \ 𝒞_0 − M_{𝒞_0}. If 𝒞_1 is a set of weakly contractible components of G_1, then MST(G_0) ⊆ MST(𝒞_0) ∪ MST(𝒞_1) ∪ M_{𝒞_0} ∪ M_{𝒞_1}. This follows by induction. Thus the set 𝒞 = 𝒞_0 ∪ 𝒞_1 is weakly contractible.
6.4 Overview revisited
Now that we have the vocabulary and machinery, we can give a more detailed
overview.
1. If the input graph is small enough to run DJP under a fixed time, then run
DJP.
2. Find a set 𝒞 of subgraphs that is weakly contractible, and let M_𝒞 be the corresponding set of bad edges.
3. For each subgraph x ∈ 𝒞, preprocess x to increase its density to m/n. As explained in section 3.3, this can be done in time O(m + n log D), where D is the desired density.
4. For each subgraph x ∈ 𝒞, recurse on x.
5. Preprocess MST(𝒞) ∪ M_𝒞 to increase its density to m/n.
6. Recurse on MST(𝒞) ∪ M_𝒞.
Although the subroutine call to raise the densities may seem worrisome, it will turn out not to affect the O(mα(m, n)) running time.
6.5 Motivation for Build-T
Build-T, the subroutine that will give us our set 𝒞, is the key to this algorithm. First we need to establish what we want out of it:
1. Acceptable subproblems. That is, MST(G) ⊆ MST(𝒞) ∪ M_𝒞, 𝒞 is edge-disjoint, and 𝒞 covers all vertices of G.
2. It runs in O(m) time.
3. Small enough subproblems, so that recursing on them does not overwhelm the running time.
4. Not too many bad edges. This is so the final recursion does not overwhelm the
running time.
The rest of this section is dedicated to elaborating on the last two points. In what follows, suppose 𝒞 is a set of subgraphs such that
MST(G) ⊆ MST(𝒞) ∪ M_𝒞.
Let m_L be the total number of edges passed to all the recursive calls except the last, and let m_R, n_R be the number of edges and vertices passed to the final recursion.
6.5.1 What we already know
The number of vertices in MST(𝒞) ∪ M_𝒞 is exactly n, and the number of edges is exactly m_B + n − 1, where m_B is the number of edges in M_𝒞. This follows from the fact that 𝒞 covers all vertices and is edge-disjoint, so MST(𝒞) is a spanning tree of G. After raising the density of MST(𝒞) ∪ M_𝒞 via a Fredman-Tarjan iteration, the number n_R of vertices is at most (m_B/m)·n.
If we do not clean up after the Fredman-Tarjan iteration, then m_R = m_B + n − 1 ≥ m_B + (m_B/m)·n = m_B(1 + n/m) for a big enough graph. Here we are making the (possibly big) assumption that we have managed to corrupt only a fraction of the edges in the graph, so m_B/m < 1. So m_R ≥ (1 + 1/D)·m_B.
6.5.2 The recursion formula
Let T(m, n) be the maximum running time for any graph with m edges and n vertices, and let t(m, n) = T(m, n)/cm, where c is a constant to be fixed later. Let the total overhead, including the time it takes to find the subproblems, be S(m, n), and let s(m, n) = S(m, n)/b for another constant b.
The recursive formula for the running time of MST-hierarchical can then be written as
T(m, n) ≤ ∑_{x∈𝒞} T(m_x, n_x) + T(m_R, n_R) + b·s(m, n)    (3)
        = ∑_{x∈𝒞} c·m_x·t(m_x, n_x) + c·m_R·t(m_R, n_R) + b·s(m, n)
        ≤ ∑_{x∈𝒞} c·m_x·t_1 + c·m_R·t_2 + b·s(m, n)    [see below]
        = c·m_L·t_1 + c·m_R·t_2 + b·s(m, n)
        = (c·m_L·t_2 − c·m_L·(t_2 − t_1)) + c·m_R·t_2 + b·s(m, n)
        = (c·m·t_2 + c·(m_L + m_R − m)·t_2) − c·m_L·(t_2 − t_1) + b·s(m, n)
        = c·m·t_2 + c·((m_R + m_L − m)·t_2 − m_L·(t_2 − t_1) + (b/c)·s(m, n))    (4)
In the above, t_1 = max_x t(m_x, n_x) and t_2 = t(m_R, n_R).
To have the entire thing run in O(m·f(m, n)), it suffices to have the following restrictions on t(·, ·) and s(·, ·):
t(m_R, n_R) ≤ f(m, n)    (5)
∀x, t(m_x, n_x) ≤ f(m, n) − 1    (6)
s(m, n) = O(m),    (7)
and the following restriction on m_L and m_R:
(m_R + m_L − m)·a − m_L + (b/c)·m = m_R·a + m_L·(a − 1) + ((b/c) − a)·m ≤ 0    (8)
with a being f(m, n).
We therefore look for a procedure that will guarantee the last requirement and that runs in O(m), since the first two requirements follow by induction. To see this, note that if all four of the above hold, substituting t_2 with a and t_1 with a − 1 in (3) yields an expression greater than or equal to (3). Propagating this replacement down to (4),
c·m·a + c·((m_R + m_L − m)·a − m_L + (b/c)·m)    [from (5), (6), (7)]
≤ c·m·a    [from (8)]
We could have replaced α(·, ·) with any function f(m, n); as long as we can find subproblems that allow (5) through (8) to be fulfilled, we will have an algorithm that runs in O(m·f(m, n)). What is special about α(m, n) is that, as we will show, if f(m, n) = α(m, n) + 2, then we can fulfil all these requirements.
6.6 Build-T
6.6.1 A hierarchy of minors
It has already been noted that if G_0, G_1, . . . , G_N is a sequence of contractions, with G_0 = G and G_{i+1} defined recursively as G_i \ 𝒞_i − M_{𝒞_i}, where 𝒞_i is a set of weakly contractible subgraphs of G_i, then
MST(G) ⊆ MST(𝒞_0 ∪ . . . ∪ 𝒞_N) ∪ M_{𝒞_0} ∪ . . . ∪ M_{𝒞_N}.
It is our job to find the 𝒞_i so that the conditions described in the previous section hold.
This formulation leads to a hierarchical representation of the subgraphs in the 𝒞_i’s. Each vertex v in G_i really represents a subgraph of G_{i−1}, which contains multiple vertices of G_{i−1}. Therefore, we make v the parent of all the vertices in G_{i−1} that it contains, which likewise are subgraphs that themselves contain vertices of G_{i−2}, and so on. Thus we obtain a hierarchy of subgraphs, with a node at height i in the hierarchy representing both a vertex of G_i and a subgraph of G_{i−1}, whose children are its component vertices, and whose parent is the subgraph of G_i of which this node is a part. It is clear that this hierarchy is a tree, since every node has one parent, and there are no links other than parent-child ones. Call this hierarchy 𝒯.
6.6.2 Building the tree
It turns out that building 𝒯 layer by layer, minor by minor, will not be as efficient as building it in postorder [10]. Recall that in a postorder traversal, all children have lower traversal numbers than their parents, and left children have lower traversal numbers than right children.
Therefore the first subgraph we want to define is the one at the bottom of the leftmost path of 𝒯, which is a vertex of G_0 (all leaves of 𝒯 are vertices of G_0) – call it v_0. This is rather trivial, so we “visit” its siblings (by defining them to be subgraphs in 𝒞_{−1}), also vertices of G_0, until we run out of siblings and “visit” the parent. Having come at last to the parent, we know exactly which vertices are in it, and so we are able to gather them up and put the entire component in 𝒞_0.
Now the parent C has siblings too, so we start again, visiting vertices of G_0 until we have visited enough and are able to define another component C′. When we have determined that we have defined enough subgraphs of G_0 to make a subgraph of G_1, we stop and throw all the previously-defined Cs, which are subgraphs of G_0 but, more importantly, vertices of G_1, into a component and put it into 𝒞_1. Then we start again at the bottom, with a vertex of G_0.
We are not building 𝒯 so much as discovering it, and recording the nodes we
discover. There are still two unknowns, however. One, how do we know that a node
has “run out of children,” and may be visited? Two, how do we know which sibling
to visit next?
6.6.3 Determining which sibling to visit next
While building a component of G_0, this is an easy question: the answer is to take the lightest edge coming out of the component. With subgraphs of later minors, the same is still true. After finishing a vertex of G_i (a subgraph of G_{i−1}), we need to find an appropriate vertex of G_0 with which to start again. We will do this with soft heaps. Each node of 𝒯 on the active path – the path from the root G_N to the node just visited – maintains a heap (actually several heaps, as we will see later) that stores the vertices of G_0 to which its known descendant leaves are adjacent.
If v is the vertex of G_i which we have just visited, and C_v is its expansion in G_0, let u be the parent of v in 𝒯, i.e. u is the subgraph of G_i of which v will become a part. Then the heap associated with u now tracks every vertex in C_v, keyed by the weight of the lightest edge that leads out of C_v. In addition, all ancestors of u also keep similar heaps. Note that a heap for a node in 𝒯 only exists if we have visited one of its children. The next vertex of G_0 that we visit will be the minimum element over all these heaps. Let this vertex from G_0 be v_0.
However, we don’t always just start a new bottom-level component with v_0 and propagate the built components back to v. We also keep track of the min-link between every pair of components on the active path. The min-link between a node v and its ancestor w is the lightest edge between v and any visited relative for which w is the lowest common ancestor. The min-links keep track of internal edge costs, while the heaps keep track of external costs. At all times we want to maintain the following invariant:
Invariant 1. The next edge taken is lighter than all the min-links on the active path.
At the time v_0 is selected, the edge leading to it may be heavier than an existing min-link. To preserve the invariant, we contract subgraphs until the edge is indeed lighter than any min-link. That is, if w is the highest ancestor such that min-link(w, z) is heavier than the edge we selected, then for every partially completed subgraph z between u and w, except the direct child of w, call z finished and put it in the appropriate 𝒞_i. All of these z, then, will have no other children. We have to do something special with the direct child w′ of w, since we don’t want to trigger the signal that causes the algorithm to think w is finished before we get a chance to add v_0 (this will be explained below). The min-link between w and w′ leads out of w′ and into another child w″ of w. Take these two children of w and create a new child node fuse(w′, w″) whose children are w′ and w″.
At this point we are ready to add v_0 to the newest bottom-level component.
Note that, because of the invariant, the min-links coming out of a higher node in 𝒯 are always heavier than the min-links coming out of lower nodes.
6.6.4 When a node runs out of children
One reason to decide that a node has no more children, and thus should be visited, was described in the last section: a component’s growth is cut short to preserve the invariant. The only other time we stop and decide to finish visiting a node is when the subgraph gets big enough. Specifically, a node that is a subgraph of G_i has no more than A(t, i + 1) children. The parameter t is defined to be
t = min{ t′ : A(t′, ⌈(m/n)^{1/4}⌉) > n }.
Remark. The leftmost child of any node terminated its growth due to the size constraint, because while the child’s subtree was being traversed, the parent had no other descendants that were not in the child. This means that any non-terminal node has at least one child of size A(t, i + 1), where G_i is the minor to which the node belongs.
Remark (2). The previous remark implies that the total number of vertices in G_i that did not end up in one-element subgraphs (resulting from premature termination during expansion) is no more than 2n/A(t, i). Pettie proves this in [10].
6.6.5 Data structures and corruption
Each component on the active path maintains a list of soft heaps. It should be clear that there is only one component per minor under construction at a time. Let X_i denote the active component for G_i. Note: Chazelle and Pettie number the X_i going in the opposite direction, with the component of G_0 being X_k and the sole node at the root of 𝒯 being X_0.
Recall that the number of corrupt items in a soft heap is at most εN, where N is the total number of inserts. As Pettie and Chazelle both point out, once we delete K items from the heap, the heap is free to corrupt another K elements without violating its εN corruption constraint. To alleviate the amount of corruption that would be caused by continually deleting and re-inserting elements into several heaps, we instead maintain many different heaps.
X_i maintains a heap H(i) and additional heaps H_j(i) for all j > i, as well as a special heap H*(i). An edge is put into H_j(i) if the endpoint not in X_i is also incident to X_j via another edge, and not incident to any X_l for i < l < j. If an edge is in H*(i) then it is not incident to any ancestor X_j. An edge is put into H(i) if its other endpoint is already accounted for in one of the H_j(i)s.
After we grab a new vertex v from G_0, we insert all its incident border edges into the appropriate heap. In addition, adding v to the current component changes some edges from external to internal; we delete those from their respective heaps.
When we finish visiting a node and its descendants, we put all edges in the heaps maintained by X_i into the appropriate X_j heap and discard corrupt edges in X_j. If an edge is eligible for H(i + 1) then it is inserted there; otherwise redundant edges are threshed out and H_j(i) is melded with H_j(i + 1). There are further details involving finding the minimum edge among redundant edges; the reader is invited to look at [4] and [10].
In addition, X_i maintains a list of min-links for all j < i. Chazelle notes that, every time we finish visiting a node or grab a new one, the min-links can be updated in time quadratic in the length of the active path.
6.6.6 Error rate and running time
Setting ε = 1/8 gives us a total of at most m/2 + d³n corrupt border edges, since on average each edge is inserted and reinserted into a heap at most four times. The total cost of the heap operations – the inserts, deletes, melds, and comparing min-links – is O(m log(1/ε) + d²n), and the min-links contribute O(m) time [10].
6.7 Correctness
As noted previously, it suffices to prove that every component of every minor is weakly contractible. We can use an inductive argument due to Pettie [10]. We induct on the number of vertices in X_i: suppose we know X_i is weakly contractible before we find a new child v. Invariant 1 then ensures that for any two edges coming out of X_i, the path in X_i between their endpoints consists of edges lighter than at least one of the two incident edges. This is enough to ensure weak contractibility, since it fulfils the hypotheses of Claim 6.3.1.
6.8 Runtime
6.8.1 Density games
Recall that t = min{ t′ : A(t′, ⌈(m/n)^{1/4}⌉) > n }. We pick D_0 = 2t and do an initial pass to raise the density to D_0. As noted above, this takes O(m + n log t) ≤ O(mt) time. Furthermore, every time we preprocess to raise the density, we pick D = m′/n′, where m′, n′ are the parameters of the graph passed in to the recursive call. This ensures that, for every recursive call, D ≥ 2t.
6.9 And, Finally
Going back to 6.5.1, we see that
m_R + m_L − m = m_B(1 + 1/D) + m_L − m = m_B/D ≤ m/(2D).
If we set f(m, n) to be t as calculated in this section, then (8) becomes
(m_R + m_L − m)·t − m_L + (b/c)·m ≤ (m/(2D))·t − m_L + (b/c)·m ≤ m/4 + (b/c)·m − m/2,
from D ≥ 2t, m_L + m_B = m, and m_B ≤ m/2. Choosing b ≤ c/4 completes the requirements.
7 The Optimal One
The two main appeals of this algorithm are that
1. It’s simple(r than the last one).
2. It’s theoretically interesting because it shows that a minimum spanning tree
can be found in time proportional to the least number of comparisons needed,
on a pointer machine.
We will refer to this algorithm as MST-decision-tree.
7.1 Decision trees and optimality
We’ve all worked with decision trees at some level. A decision tree can chart the
course of an algorithm, with each internal node representing a possible branching
point, and each leaf containing a possible output. In the case of sorting a list, for
example, each comparison of two list elements is a node with two children, one for the ≤ result and one for the > result, and at each leaf is an ordering of the input which is sorted if the decision tree is correct.
In general, pointer-machine MST algorithms have binary comparison as their
basic action. Any instance of a deterministic algorithm can be distilled into a decision
tree with the internal nodes representing edge weight comparisons, and two children
per node.
Let’s take a look at Kruskal’s algorithm. Every time two weights are compared
during the initial sort there is a node and two possible child paths. If we know
which edges are present (and therefore know beforehand which edges will create
cycles if they are added), then the sort order fully determines the MST. Then, we
can say that Kruskal’s algorithm really represents a class of decision trees, or a way
to generate a decision tree for an input graph topology.
The height of a decision tree is the maximum length of a path in the tree. That is, on a decision tree for a particular unweighted graph, the height is the number of comparisons we make for the worst-case permutation of edge weights. We shall say a decision tree is optimal if it is correct and there is no correct decision tree of lesser height. Let T*(U) denote the optimal decision tree height for an unweighted graph U. Kruskal’s does not always generate an optimal decision tree. For example, given a connected graph of n vertices and n − 1 edges, Kruskal’s makes at least (n − 1) log(n − 1) comparisons during the initial sort, when it could have just returned the set of all edges without doing any work!
Call the class of a graph G the set of all graphs with the same number of edges and vertices, denoted by 𝒢_{m,n}. For reference, there are (n(n−1)/2 choose m) such graphs in 𝒢_{m,n}. We are interested in all the decision trees generated by MST-decision-tree for any particular class. Define T*(m, n) to be max{ T*(U) : U ∈ 𝒢_{m,n} }. That is, if some hypothetical MST algorithm makes the optimal number of comparisons for each graph, T*(m, n) is the worst-case number of comparisons possible for a graph with m edges and n vertices. The big result of [11] and this section is that there is an algorithm, MST-decision-tree, that runs in O(T*(m, n)) time for any graph in 𝒢_{m,n}.
The hypothetical algorithm above makes the optimal number of comparisons for any graph G. However, this should not be taken to mean that Pettie and Ramachandran’s algorithm does the same. The promise is that the number of comparisons and the amount of time taken for a graph with m edges and n vertices is under T*(m, n), which depends only on the class, not the individual graph. For example, one graph in 𝒢_{100,100} is the cycle on 100 vertices, in which case 99 comparisons are needed to find the edge to exclude. On the other hand, if G is a path on 100 vertices plus an extra edge between the last vertex on the path and the third-to-last, then the only cycle is three edges long and only two comparisons are needed. MST-decision-tree guarantees the same time bound for both.
7.1.1 Breaking up the decision tree
Lemma 14. Suppose G is a graph with m edges and n vertices, and let 𝒞 be an edge-disjoint collection of subgraphs of G. Then
∑_{C∈𝒞} T*(C) ≤ T*(m, n).
Sketch of proof. The main idea of this proof is that by taking the union of all the Cs we create a graph H that 1) has at most m edges and n vertices, and 2) has an MST equal to the union of all the MST(C)s. T*(H) is then clearly at most T*(m, n). The second main idea is that, by stacking the optimal decision trees for the Cs, we can create an optimal decision tree for H. For details, see [11].
This will allow us to recurse on strongly contractible subgraphs without messing up the time bound.
7.2 DenseCase
Pettie and Ramachandran note that several previous superlinear algorithms are guaranteed to run in linear time if the graphs are kept sufficiently dense.
DJP with a Fibonacci heap, for example, runs in O(m + n log n) time, so by ensuring that k(m/n) ≥ log n for some fixed k, we also make sure that n log n = O(m), so the entire thing runs in O(m). For the MST-hierarchical procedure described in the previous section, log n < A(k, ⌈m/n⌉) ⟹ α(m, n) < k, bringing the O(mα(m, n)) bound down to O(km) = O(m). For this algorithm, Pettie and Ramachandran single out a relatively simple algorithm with an easy density requirement by Fredman and Tarjan [7], introduced in the same paper that debuted the Fibonacci heap (and also described in 3.3). It runs in time O(mβ(m, n)), where β(m, n) = min{ i : log^{(i)} n ≤ m/n }, so we only need m/n ≥ log^{(k)} n for β(m, n) ≤ k. We will call the Fredman-Tarjan algorithm DenseCase. DenseCase may also operate on graphs that have self-loops and multiple edges without affecting the running time analysis.
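For concreteness, β is simple to compute; here is a small sketch (the function name is mine, not from [7] or [11]):

from math import log2

def beta(m, n):
    """beta(m, n) = min{ i : log^(i) n <= m/n }, the iterated-log quantity in the
    Fredman-Tarjan bound O(m * beta(m, n))."""
    x, i = float(n), 0
    while x > m / n:
        x = log2(x)
        i += 1
    return i

# beta(10**6, 10**5) == 2: one log takes 10**5 to about 16.6, a second takes it below m/n = 10.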
However, when speaking of the asymptotic runtime, we must account for graphs
that do not meet these requirements. Therefore we’ll only run DenseCase after
enough processing to guarantee the density requirement.
7.3 Building and storing decision trees
7.3.1 Keep the parameters small
Pettie and Ramachandran calculate a set of optimal decision trees for all graphs on r vertices. For any number r, the time needed to build all the decision trees on r vertices is a little horrendous. Pettie and Ramachandran go over this calculation, but in short it is upper-bounded by 2^{2^{4r²}}. This number was obtained by hypothesizing a brute-force calculation – building all possible decision trees for all possible graphs on r vertices, testing them for correctness by trying out every permutation of edge weights, and eventually taking the shortest correct tree. However, they also note that if r < log^{(3)} n, then the entire calculation runs in O(n)!
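To see what such a brute force computes, here is a tiny Python sketch (my own) that finds T*(U) for very small graphs. Instead of literally enumerating every decision tree and testing it, it runs the equivalent minimax search over which comparison to make next; it already blows up well before r = 5.

from itertools import permutations, combinations

def mst_indices(n, edge_list, order):
    """Kruskal by the given weight order; returns the set of edge indices in the MST."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    chosen = set()
    for idx in order:
        u, v = edge_list[idx]
        if find(u) != find(v):
            parent[find(u)] = find(v)
            chosen.add(idx)
    return frozenset(chosen)

def optimal_height(n, edge_list):
    """T*(U): the least worst-case number of edge-weight comparisons that always
    determines the MST of the unweighted graph U, found by brute-force minimax."""
    m = len(edge_list)
    all_orders = [tuple(p) for p in permutations(range(m))]   # edge indices by increasing weight

    def solve(orders):
        answers = {mst_indices(n, edge_list, o) for o in orders}
        if len(answers) == 1:
            return 0                                   # MST already determined: a leaf
        best = float("inf")
        for a, b in combinations(range(m), 2):         # try every comparison "is w(a) < w(b)?"
            yes = [o for o in orders if o.index(a) < o.index(b)]
            no = [o for o in orders if o.index(a) > o.index(b)]
            if not yes or not no:
                continue                               # outcome already forced: no information
            best = min(best, 1 + max(solve(yes), solve(no)))
        return best

    return solve(all_orders)

# The triangle needs 2 comparisons (find its heaviest edge); a tree needs none.
print(optimal_height(3, [(0, 1), (1, 2), (0, 2)]))   # 2
print(optimal_height(3, [(0, 1), (1, 2)]))           # 0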
7.3.2 Emulating table lookups
This subsection explains how, given k subgraphs, each of which has r or fewer vertices, we can find the corresponding decision trees in O(kr² + n) time, for r < log^{(3)} n. These methods are unique to the pointer machine model, and are only relevant because we cannot do table lookups – the important things to take away are the final running time of this retrieval process and the fact that we are able to achieve it on a pointer machine.
We’ve built 2^{r(r−1)/2} decision trees, and now we need to be able to retrieve them when they are needed. The obvious strategy is to store them in a table and simply access the table entry when we need to. However, the pointer machine model disallows such a method, or any method requiring the computation of a machine address. Theorists running pointer machines have found a way to emulate table lookups by sorting. Sorting takes longer than a table lookup, but under certain circumstances it can run fast enough.
The intuition is this: if I have N things that are orderable, and I have another
thing that is identical to exactly one of my N things, then if I sort these N + 1 things, my extra thing will show up next to the original thing it matches. Thus the
time it takes to find the original thing is the time it takes to sort the collection of
N + 1 things. Clearly if I only had one thing to look up, I wouldn’t bother with the
sort; I would just scan the original list of N things. But if I have k extra things, I
can play the same trick, and scan through the sorted string once to find the matches
to all my query objects. So the time it takes to find matches for k extra things is
the time to sort a total of N +k things.
Buchsbaum et al. [3] encode a graph on r vertices as a string of r² symbols, basically by listing the edges present and padding short strings with nulls.
Then we throw our 2^{r(r−1)/2} original graphs and k query graphs together and perform a bucket sort, returning our items in lexicographic order (there is a natural ordering on the vertex identifiers, and the encodings are basically strings of vertex identifiers). How long does this take? The bucket sort performs r² passes, one for each symbol in an encoding. In one pass we need to put each of our elements into a bucket, and we have 2^{r(r−1)/2} + k items. The total time taken is O(r²·2^{r²} + r²k) = O(n + r²k).
As a final note, we can implement bucket sort with linked lists instead of arrays,
ensuring that we do not violate the rules of the pointer machine. See [3] for details.
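As an illustration of the trick (my own sketch, not the exact encoding of [3]): graphs are flattened into equal-length strings, catalogue and queries are sorted together, and a single scan pairs each query with its catalogue entry. Python's built-in sort stands in for the r²-pass bucket sort over linked lists.

def encode(r, edges):
    """Flatten a graph on vertices 0..r-1 into a string of r*r symbols: one symbol per
    ordered pair, '1' if the edge is present and '0' (the null padding) otherwise."""
    present = {(u, v) for u, v in edges} | {(v, u) for u, v in edges}
    return "".join("1" if (i, j) in present else "0" for i in range(r) for j in range(r))

def lookup_by_sorting(catalogue, queries):
    """catalogue: encoding -> payload (e.g. a precomputed decision tree).
    Sort catalogue keys and query encodings together; equal encodings become adjacent,
    so one linear scan answers every query."""
    tagged = [(enc, "cat", payload) for enc, payload in catalogue.items()]
    tagged += [(enc, "query", idx) for idx, enc in enumerate(queries)]
    tagged.sort(key=lambda t: (t[0], t[1]))          # 'cat' sorts before 'query'
    answers, current = [None] * len(queries), None
    for enc, kind, data in tagged:
        if kind == "cat":
            current = (enc, data)
        elif current is not None and current[0] == enc:
            answers[data] = current[1]
    return answers

# Tiny usage: a catalogue of the graphs on r = 2 vertices, queried with one of them.
r = 2
catalogue = {encode(r, []): "tree-for-empty-graph", encode(r, [(0, 1)]): "tree-for-one-edge"}
print(lookup_by_sorting(catalogue, [encode(r, [(0, 1)])]))   # ['tree-for-one-edge']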
7.4 Partitioning
We have three components that seem relevant: DenseCase, the optimal decision
trees, and the strong contraction rule.
As the last step we will show how to find an edge-disjoint collection of subgraphs 𝒟 such that
1. For some corruption G′ of G, all D ∈ 𝒟 are strongly contractible with respect to G′.
2. Every vertex in G falls in at least one D ∈ 𝒟.
3. Let 𝒟̃ be the collection of subgraphs obtained by merging any two subgraphs of 𝒟 that share a vertex; that is, 𝒟̃ is the set of connected components of ∪_{D∈𝒟} D. Then every element of 𝒟̃ has at least log^{(3)} n vertices.
4. The process of finding 𝒟 takes O(m) time.
7.4.1 Relevance
We can use DenseCase. Suppose 𝒞 is a collection of subgraphs that partitions the vertices of G and in which every C ∈ 𝒞 has at least log^{(3)} n vertices. Contracting G across 𝒞 without removing redundant edges yields a graph with m edges and n′ < n/log^{(3)} n vertices. Then
n′ < n/log^{(3)} n  ⟹  m/n′ > (m/n)·log^{(3)} n  ⟹  m/n′ > log^{(3)} n′.
Thus we can run DenseCase on the contracted graph, and it will finish in O(m) time. (Even if we do remove duplicate edges, running DenseCase on the cleaned-up graph will be strictly faster than running it on the more complex graph. The stipulation that we do not clean up the graph prior to passing it to DenseCase is purely to make the analysis simpler.)
We can use decision trees. Now if 𝒟 is an edge-disjoint collection of subgraphs such that every D ∈ 𝒟 has at most log^{(3)} n vertices, then we have a precomputed optimal decision tree for every subgraph in 𝒟. So we can find ∪_{D∈𝒟} MST(D) in ∑_{D∈𝒟} T*(D) ≤ T*(m, n) time, by Lemma 14.
Furthermore, if {D_1, D_2, . . . , D_j} is any subset of 𝒟, then the MST of their union is MST(D_1) ∪ MST(D_2) ∪ . . . ∪ MST(D_j), since they are edge-disjoint. So when we create 𝒟̃ by merging subgraphs that share vertices, we don’t need to do any extra work to find the MSTs of the subgraphs in 𝒟̃.
Now if we add the requirement that 𝒟 covers every vertex, then by combining any components of 𝒟 that share vertices, we obtain a set of subgraphs that partitions the vertices, as required above.
7.4.2 Finding partitions
We perform one Fredman-Tarjan iteration, described in 3.3, with two modifications: we use a soft heap instead of a Fibonacci heap, and instead of stopping the growth of a component when the heap gets too large, we stop growth when the component size reaches r = ⌈log^{(3)} n⌉. In addition, after we finish growing a component, we store the set of vertices in that component and put that set into 𝒟. The use of a soft heap entails corruption; after finishing a component, we discard all border edges that have been corrupted. Another consequence of using a soft heap is that there is no decreasekey operation. Instead, we just insert all the edges we find and trust the heap to return the “minimum”-weight edge.
Since a Fredman-Tarjan iteration doesn’t stop until all vertices are marked, i.e. put into a component, 𝒟 covers all vertices. Components stop growing when or before they reach r vertices, so every member of 𝒟 has r or fewer vertices, making it eligible to have its optimal decision tree applied. Finally, every component in 𝒟 stopped growing either when it reached r vertices or when it collided with another component. As in the Fredman-Tarjan iteration, the first component of a set of components linked by shared vertices must have reached its mature size, so that entire set of components collectively has r or more vertices.
The use of a soft heap ensures that the procedure runs in O(m) time, and also that it generates a corruption G′ of G. The corrupted graph G′ has at most 2εm corrupt edges, since every edge is inserted at most twice.
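A rough sketch of this Partition step in Python (my own; an ordinary binary heap stands in for the soft heap, so nothing is ever corrupted and no border edges need to be discarded):

import heapq

def partition(n, edges, r):
    """Grow components of at most r vertices until every vertex of 0..n-1 is in some
    component; growth also stops on colliding with an earlier component. Edges are
    (u, v, w) triples; an ordinary heap replaces the soft heap of the real procedure."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((w, v))
        adj[v].append((w, u))
    in_some = [False] * n
    comps = []
    for s in range(n):
        if in_some[s]:
            continue
        comp, heap = {s}, list(adj[s])
        heapq.heapify(heap)
        in_some[s] = True
        while heap and len(comp) < r:
            w, v = heapq.heappop(heap)          # no decreasekey: duplicates just sit in the heap
            if v in comp:
                continue
            comp.add(v)
            collided = in_some[v]               # reached a vertex of an earlier component
            in_some[v] = True
            if collided:
                break
            for e in adj[v]:
                heapq.heappush(heap, e)
        comps.append(comp)
    return comps

With r = ⌈log^{(3)} n⌉ this produces a collection like the 𝒟 described above; the real procedure additionally records the corrupted graph G′ and discards the corrupt border edges.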
7.5 Putting things together
We are now ready to put everything together. The algorithm is as follows:
1. Precompute the decision trees for all graphs with fewer than log^{(3)} n vertices. Store the result in the variable dectree.
2. Run Partition. Store the corrupted graph in G′ and the collection of subgraphs in 𝒟.
3. Use the sorting trick to retrieve the decision trees of the graphs in 𝒟.
4. Apply the decision trees to get the MST of each subgraph. The result is ∪_{D∈𝒟} MST(D).
5. Combine subgraphs that share vertices to get a new collection of subgraphs 𝒟̃.
6. Contract G across 𝒟̃ to get G \ 𝒟̃. Remove the bad edges M_𝒟 to get G \ 𝒟̃ − M_𝒟.
7. Run DenseCase(G \ 𝒟̃ − M_𝒟) to get MST(G \ 𝒟̃ − M_𝒟).
8. Two Borůvka iterations.
9. Recurse: MST-decision-tree(∪_{D∈𝒟} MST(D) ∪ MST(G \ 𝒟̃ − M_𝒟) ∪ M_𝒟).
7.6 Time complexity
The time taken for each step is as follows:
1. Precomputing decision trees – O(n).
2. Partitioning the graph – O(m + n); contracting the graph – O(m + n).
3. Sorting – O(m + n).
4. Applying decision trees – O(T*(m, n)).
5. Finding connected components of ∪𝒟 – O(m + n).
6. Contracting across 𝒟̃ – O(m + n).
7. DenseCase – O(m).
8. Borůvka iterations – O(m + n).
9. Recursion – O(T*(m/2, n/4)).
References
[1] Otokar Borůvka. Wikipedia.
[2] A.M. Ben-Amram. What is a pointer machine? ACM SIGACT News, 26(2):88–
95, 1995.
[3] Adam L. Buchsbaum, Haim Kaplan, Anne Rogers, and Jeffery R. Westbrook.
Linear-time pointer-machine algorithms for least common ancestors, mst verifi-
cation, and dominators. In STOC ’98: Proceedings of the thirtieth annual ACM
symposium on Theory of computing, pages 279–288, New York, NY, USA, 1998.
ACM.
[4] Bernard Chazelle. A minimum spanning tree algorithm with inverse-ackermann
type complexity. J. ACM, 47(6):1028–1047, 2000.
[5] Bernard Chazelle. The soft heap: an approximate priority queue with optimal
error rate. J. ACM, 47(6):1012–1027, 2000.
[6] Jason Eisner. State-of-the-art algorithms for minimum spanning trees - a tutorial
discussion. Master’s thesis, University of Pennsylvania, 1997.
[7] Michael L. Fredman and Robert Endre Tarjan. Fibonacci heaps and their uses
in improved network optimization algorithms. J. ACM, 34(3):596–615, 1987.
[8] Michael L. Fredman and Dan E. Willard. Trans-dichotomous algorithms for
minimum spanning trees and shortest paths. Journal of Computer and System
Sciences, 48(3):533 – 551, 1994.
[9] David R. Karger, Philip N. Klein, and Robert E. Tarjan. A randomized linear-
time algorithm to find minimum spanning trees. J. ACM, 42(2):321–328, 1995.
[10] Seth Pettie. Finding minimum spanning trees in O(mα(m, n)) time. Technical report, The University of Texas at Austin, 1999.
[11] Seth Pettie and Vijaya Ramachandran. An optimal minimum spanning tree algorithm. J. ACM, 49(1):16–34, 2002.