
Jinna Lei

Submitted for Math 196, Senior Honors Thesis

University of California, Berkeley

May 2010


Contents

1 Introduction
  1.1 History and Content
  1.2 The problem, formally
    1.2.1 Some definitions
    1.2.2 Statement
    1.2.3 Simplifying assumptions
    1.2.4 Limited computation model
  1.3 Important properties
    1.3.1 Cuts and cycles
    1.3.2 About trees
    1.3.3 Existence and Uniqueness
    1.3.4 The cut and cycle properties
  1.4 Graph representation

2 Classic algorithms
  2.1 The union-find data structure
  2.2 Kruskal's
  2.3 Dijkstra-Jarník-Prim

3 Iterative algorithms
  3.1 Contractions
  3.2 Borůvka's algorithm
    3.2.1 One iteration of Borůvka's algorithm
    3.2.2 The algorithm
  3.3 Fredman-Tarjan
    3.3.1 One iteration of Fredman-Tarjan
    3.3.2 The complete algorithm

4 An algorithm for verification
  4.1 Verification: problem definition and reduction
    4.1.1 Narrowing down the search space
    4.1.2 Reduction
  4.2 We can verify with a linear number of comparisons
    4.2.1 Proof of complexity for a full branching tree
    4.2.2 Turning every tree into a full branching tree
    4.2.3 We can use B instead of T

5 A randomized algorithm
  5.1 Overview
    5.1.1 The subgraph passed to the second recursion is sparse
  5.2 A tree formulation
    5.2.1 Some facts about vertices and the recursion tree
    5.2.2 Some facts about edges and the recursion tree
  5.3 Runtime analysis
    5.3.1 The expected running time
    5.3.2 A guaranteed running time
    5.3.3 High-probability proof

6 A deterministic, non-greedy algorithm
  6.1 The Ackermann function and its inverse
  6.2 The Soft Heap
    6.2.1 Bad and corrupted edges
    6.2.2 Consequences for the MST algorithm
  6.3 Strong contractibility and weak contractibility
    6.3.1 Strong contractibility
    6.3.2 Weak contractibility
    6.3.3 Strong contractibility on minors
  6.4 Overview revisited
  6.5 Motivation for Build-T
    6.5.1 What we already know
    6.5.2 The recursion formula
  6.6 Build-T
    6.6.1 A hierarchy of minors
    6.6.2 Building the tree
    6.6.3 Determining which sibling to visit next
    6.6.4 When a node runs out of children
    6.6.5 Data structures and corruption
    6.6.6 Error rate and running time
  6.7 Correctness
  6.8 Runtime
    6.8.1 Density games
  6.9 And, Finally

7 The Optimal One
  7.1 Decision trees and optimality
    7.1.1 Breaking up the decision tree
  7.2 DenseCase
  7.3 Building and storing decision trees
    7.3.1 Keep the parameters small
    7.3.2 Emulating table lookups
  7.4 Partitioning
    7.4.1 Relevance
    7.4.2 Finding partitions
  7.5 Putting things together
  7.6 Time complexity

Since this marks the denouement of my college career, I suppose acknowledgements are in order.

I dedicate this to my family, who encouraged and supported me throughout college, and gave me the inspiration to keep going.

Also a very big thanks to Professor Karp, Luqman Hodgkinson, Yehonatan Sella, Ian Henderson, and everybody else who listened to me blathering on about spanning trees and soft heaps. And my dad, who not only listened but read part of it!


1 Introduction

1.1 History and Content

In 1926 Otokar Borůvka attacked the problem of finding the most efficient electricity network for the now-nonexistent nation of Moravia [1]. Distilled into a mathematical form, this is the problem of finding the subgraph of least cost that is still connected. Since then, the task of finding a minimum spanning tree has become a staple of the algorithms repertoire. A few others proposed better solutions, the methods of Kruskal and Dijkstra-Jarník-Prim (more commonly known as Prim's algorithm) being the most intuitive and popular.

The classical greedy algorithms – Kruskal's, Dijkstra-Jarník-Prim, and Borůvka's – build the spanning tree incrementally, at all times maintaining a correct partial result – correct in the sense that every edge in the intermediate result ends up in the final tree. The fastest algorithms so far, on the other hand, all maintain intermediate results that are supersets of the correct answer rather than subsets. We shall investigate why this approach is powerful by examining four algorithms: one that checks whether a given spanning tree actually is minimal, and three for actually constructing the MST.

The verification algorithm comes first; it is the cumulative result of quite a few papers, probably the earliest of which came from János Komlós in 1984. Next we will examine a randomized algorithm by David Karger, Philip Klein, and Robert Tarjan (1995) that runs in linear time with high probability. The third result is a slightly superlinear algorithm by Bernard Chazelle (2000) that uses the wonderful Soft Heap. The last algorithm we will look at was put together by Seth Pettie and Vijaya Ramachandran (2002), and although they show that its asymptotic running time cannot be beaten (at least by a comparison-based algorithm), no one really knows what that asymptotic running time actually is, except that it is at least linear in the input size.

This is a review of the papers I found interesting. I try to frame things in new

and interesting ways, and introduce some coherence between them. I hope it can be

of help to anyone surveying the minimum spanning tree literature.

1.2 The problem, formally

1.2.1 Some deﬁnitions

Definition. A graph G consists of a vertex set V and an edge set E. Every element of E is an unordered pair of vertices. We will write undirected edges as {u, v}.


Definition. A subgraph H of G has an edge set E′ ⊆ E, and a vertex set induced by E′.

Definition. A path in G is a sequence of vertices v_0, v_1, v_2, . . . , v_k such that there is an edge between any two adjacent vertices v_i, v_{i+1} in the sequence. We will sometimes refer to the edges in the path, although it is formally defined as a sequence of vertices.

Deﬁnition. A graph is connected if there is a path between any two vertices in the

graph.

Deﬁnition. A tree is a graph that is minimally connected – that is, any tree T is a

connected graph, but removing any edge will disconnect it.

Deﬁnition. A spanning tree of G is a subgraph of G that is a tree and that covers

every vertex in G.

Definition. If G is a graph whose edges have weights w(e), the cost of a subgraph H is Σ_{e∈H} w(e), or more informally, the sum of the weights of all its edges.

Deﬁnition. A forest is a set of trees.

Definition. An edge is incident to a connected component, a vertex, or another edge if exactly one of its endpoints lies in the connected component, equals the vertex, or is also an endpoint of the other edge.

1.2.2 Statement

Given a graph G = (V, E) and a weight function w over the edges, find a spanning tree T of G such that its cost is minimal. That is, for every spanning tree U of G,

Σ_{e∈T} w(e) ≤ Σ_{e′∈U} w(e′).

We will denote the true minimum spanning tree of G as MST(G).

1.2.3 Simplifying assumptions

Let it be known that m generally denotes the number of edges in the input graph

and n the number of vertices. If there is a possibility of ambiguity, we will strive to

clarify whether m and n refer to the parameters of the original input, or those of a

recursive call.

Unless otherwise stated, we will assume that all edge weights are distinct. Usually it is a simple matter to generalize to non-distinct weights. In addition, we will often assume in correctness proofs that all edge weights are integers in [1, m]. This will not change the minimum spanning tree. In fact, we only need an ordering on the edge weights to find the MST, as the Cut and Cycle properties below show.

We will also assume that the original input graph G is connected. If an input graph is not connected, we can find the connected components in linear (O(m + n)) time and feed the components separately to our algorithms. It is not hard to find an algorithm that finds connected components – for instance, depth-first search will suffice. Since all the algorithms we deal with are linear or superlinear, this stipulation causes no loss of generality and doesn't affect our running time analyses. It also implies that m ≥ n − 1, so n = O(m) and log n = O(log m).

The original input graph G will always be simple – that is, if an edge connects two vertices, it is the only edge between those two vertices, and there are no self-loops. We lose no generality here because we can clean up a non-simple graph, keeping only the lightest among any redundant edges, in O(m) time; we will give the algorithm later. This assumption gives m ≤ n(n − 1)/2, which implies m = O(n^2) and log m = O(log n). In some recursive calls the simple-graph requirement is dropped to make analysis simpler, and it will be clearly stated whenever this occurs.

In addition, we talk about graphs on labeled vertices. For a graph on n vertices, each vertex is labeled with a number (or any arbitrary symbol, as long as it is unique) from 1 to n. Two graphs G and G′ are taken to be equal if {i, j} is an edge in G if and only if {i, j} is also an edge in G′. There are 2^{n(n−1)/2} unique unweighted graphs on n vertices.

1.2.4 Limited computation model

The literature makes a distinction between comparison-based algorithms and algorithms which are allowed full access to bit representations of data. There is a specific model of computation that is favored, the pointer machine. The main limitation of a pointer machine is that it does not allow arithmetic on pointers. That means no constant-time table lookups, since calculating a hash function requires the ability to manipulate pointers. A pointer may be dereferenced and checked against another pointer for equality, nothing more [2]. The full range of arithmetic operations is allowed on any other data type, at unit cost.

We need to acknowledge the elephant in the room: given models of computation that do allow bit arithmetic, the MST problem already has a linear-time solution! For example, Fredman and Willard give an algorithm and data structure that finds MSTs in linear time on a unit-cost RAM with bit arithmetic [8]. Pettie's algorithm from Section 7 also runs in linear time if pre-computed MST solutions (instead of decision trees) are allowed to be cached and retrieved in constant time.


We will focus on pointer machine algorithms in this review. They tend to reveal

more about the nature of the MST problem, and the resulting insights are frequently

applicable to matroid optimization in general. In addition, the search for a linear-

time comparison-based minimum spanning tree algorithm has motivated ideas and

data structures that are useful in general to computer science, some of which we will

describe.

1.3 Important properties

1.3.1 Cuts and cycles

Definition. A cut of a graph G is a subset S of vertices and its complement S̄, such that neither S nor S̄ is empty.

Although we will formally define a cut as a set of vertices and its complement, keep in mind that a cut is really just a way of dividing the graph. In many ways the edges that cross a cut are more important than the vertices themselves.

Definition. An edge e crosses a cut S, S̄ if one of its endpoints is in S and the other is in S̄.

Often we will talk of a speciﬁc cut, one that results from removing an edge from

a spanning tree.

Definition. Let T be a spanning tree of G and e an edge in T. Removing e from T divides T into two connected components, and every vertex in G is in one of them. We say S, S̄ is the cut defined by e if S is the set of vertices in one of the components, and denote it cut(T, e).

Deﬁnition. A cycle in G is a path whose endpoint is the same as its start point.

Since a spanning tree T of G is connected, there is a path involving only edges

in T between any two vertices in G, and since it is a tree, this path is unique.

Deﬁnition. Let T(u, v) denote the unique path between u and v in T.

As with cuts, a spanning tree and an edge can deﬁne a speciﬁc cycle.

Definition. If T is a spanning tree of G and e = {u, v} is an edge not in T, then T(u, v) and {u, v} form a cycle. We call this the cycle e makes with T.


1.3.2 About trees

We will reiterate without proof some basic facts about trees and go on to some MST

properties.

Fact. A tree T with n vertices has n − 1 edges and no cycles.

In the other direction,

Fact. Any two of the following properties suffice to prove that T is a tree: connectedness, acyclicity, and having n − 1 edges.

1.3.3 Existence and Uniqueness

Now we have the tools to prove that the minimum spanning tree does indeed exist

for all graphs, and is furthermore unique if our edge weights are distinct.

Theorem 1. For any connected graph G, the minimum spanning tree of G exists

and is unique.

Proof. Existence: A spanning tree of G exists, since we can keep removing edges until G no longer has any cycles. Removing an edge that is part of a cycle does not disconnect G, since any path that went through the removed edge {u, v} may instead go through the remaining part of the cycle. There also exists a spanning tree with minimal weight: since our graphs are finite, we can enumerate all spanning trees and their costs. The set of spanning tree costs is also finite, so there must be a minimum.

Uniqueness: Suppose we have two distinct spanning trees, T_1 and T_2, with the same weight. Let e be the heaviest edge in T_1 ∪ T_2 − T_1 ∩ T_2, and suppose without loss of generality that e ∈ T_1. Across cut(T_1, e), there is no other edge of T_1, otherwise there would be a cycle. However, since a spanning tree must be connected, there is at least one edge in T_2 that crosses this cut. Let f be such an edge of T_2. Since f is not in T_1 (e is the only edge of T_1 across this cut), it must be in T_1 ∪ T_2 − T_1 ∩ T_2, and since e was the heaviest in this set, w(f) < w(e). If we replace e with f in T_1, the resulting graph is connected, since the graphs on either side of the cut were connected. It is also acyclic: after removing e from T_1, there was no path between vertices on opposite sides of the cut, so adding in f created no cycles. Thus replacing e by f results in a spanning tree with total cost less than cost(T_1) = cost(T_2), so neither of these is minimal. Therefore a spanning tree with minimal cost must be the only spanning tree with that cost.


1.3.4 The cut and cycle properties

The properties of being the lightest edge across some cut and the heaviest edge on

some cycle have a curious dual relationship:

Lemma 2. There exists a cut across which e is the lightest edge ⇐⇒ there is no cycle on which e is the heaviest edge.

Proof. (⇒) Let e be an edge that is the lightest across a cut S, S̄, and suppose that C is a cycle containing e. Removing e from the cycle leaves a path between the two endpoints of e, call them u and v. Since one (say u) is in S and the other is in S̄, the remainder of C must cross from S to S̄. Let f be an edge in C − e that crosses the cut. Since e is the lightest across the cut, e must be lighter than f, and so e cannot be the heaviest in the cycle.

(⇐) We argue the contrapositive. Suppose e is heaviest on a cycle C, and let S, S̄ be any cut that e crosses. Since C is a cycle, it must cross the cut at least twice. Let f be another edge of C that crosses the cut. We know e is heavier than f, so e cannot be the lightest across this cut.

The cut property and the cycle property are ways of characterizing all edges in

the MST, and indeed either one actually deﬁnes the edges of the MST.

Theorem 3 (Cycle Property). An edge e is not in the minimum spanning tree if and only if it is the heaviest edge on some cycle.

Proof. (⇒) Let T* be the minimum spanning tree for G, and let e = {u, v} not be in T*. Since T* connects all the vertices of G, there is a path in T* that connects u and v. Adding e to this path creates a cycle C. If there were an edge e′ on C heavier than e, then we could replace e′ with e to get a lighter spanning tree, which is impossible by our choice of T*. So e is the heaviest edge on C.

(⇐) Suppose e is heaviest on the cycle C. Then, by Lemma 2, there is no cut across which it is the lightest. Let T be a spanning tree of G that includes e. Removing e splits T into two connected components, defining a cut of G. There is a lighter edge f across this cut, and replacing e by f yields a tree T′ that is lighter than T. So no spanning tree containing e is minimal.

By the cut-cycle duality, it is easy to see that this implies the cut property:

Theorem 4 (Cut Property). An edge e is in the minimum spanning tree if and only if it is the lightest edge across some cut.

Proof. This follows directly from Lemma 2 and Theorem 3.


1.4 Graph representation

A graph, being mathematically defined as a set of vertices V and a subset E of V × V, still needs to have some kind of concrete representation on a computer. We can realize this with an adjacency list: each vertex v maintains a list of pointers to the edge objects for which v is an endpoint. We will define both edges and vertices to be data structures, with a vertex storing at minimum its unique identifier. An edge stores a pointer to the endpoint with the lesser identifier and a pointer to the endpoint with the greater identifier, as well as its weight. Vertices and edges are also capable of storing an additional constant amount of data, which we will describe as needed.
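To make this concrete, here is a minimal sketch of the representation in Python; the field names (ident, edges, weight) are our own choices, not fixed by the text.

```python
class Vertex:
    def __init__(self, ident):
        self.ident = ident   # unique identifier, e.g. an integer in 1..n
        self.edges = []      # adjacency list: pointers to incident Edge objects

class Edge:
    def __init__(self, u, v, weight):
        # store endpoints so that self.u has the lesser identifier
        self.u, self.v = (u, v) if u.ident < v.ident else (v, u)
        self.weight = weight
        u.edges.append(self)
        v.edges.append(self)
```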

2 Classic algorithms

2.1 The union-ﬁnd data structure

The classic greedy algorithms heavily use set operations, in particular asking whether two objects are in the same set, as well as taking the union of two sets. The union-find data structure supports the operations makeset(u), find(u), and union(u, v): makeset(u) creates a new set with the single element u, find(u) returns the unique representative of the set to which u belongs, and union(u, v) combines the sets containing u and v into one.

The implementation of the union-find structure is outside the scope of this review. However, the running times per operation are important enough to emphasize here: makeset and union both run in O(1) time, and any sequence of operations that includes k finds spends at most O(k α(k)) time in find, averaging O(α(k)) or better per find, where α() refers to one form of the inverse of the Ackermann function. The Ackermann function grows extremely quickly, and its inverse grows extremely slowly – α(number of atoms in the observable universe) = 4. The Ackermann function and a different form of its inverse (one that takes two arguments) will reappear later, when we discuss Chazelle's algorithm.
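For reference, one standard implementation that achieves bounds of this flavor couples union by rank with path compression; the sketch below is one common version, not necessarily the exact variant the cited analyses assume. Note that union here locates the two representatives itself; the O(1) bound quoted above applies when union is handed the representatives directly.

```python
class UnionFind:
    def __init__(self):
        self.parent = {}  # each element's parent; roots point to themselves
        self.rank = {}

    def makeset(self, u):
        self.parent[u] = u
        self.rank[u] = 0

    def find(self, u):
        # walk to the root, then compress the path behind us
        root = u
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[u] != root:
            self.parent[u], u = root, self.parent[u]
        return root

    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return
        if self.rank[ru] < self.rank[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru  # attach the shallower tree under the deeper one
        if self.rank[ru] == self.rank[rv]:
            self.rank[ru] += 1
```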

2.2 Kruskal’s

Kruskal’s algorithm follows a simple intuition: in order to minimize the ﬁnal cost,

include the lightest possible edges. In particular, start by grabbing the lightest legal

edge available, and repeat until you have a spanning tree.


Algorithm 1 Kruskal
Require: Input G = (V, E)
Ensure: Output T ⊆ E, the minimum spanning tree
1: sort E
2: T ← ∅
3: for all v ∈ V do
4:   makeset(v)
5: end for
6: for all edges {u, v} ∈ E do
7:   if find(u) ≠ find(v) then
8:     add {u, v} to T
9:     union(u, v)
10:   end if
11: end for
12: return T

Let's take a high-level look at Kruskal's. When edge e is processed, if e is not in the MST, Kruskal's ignores it, and if e is in the MST, Kruskal's puts it in T. This is easily proved using the cut and cycle properties. Basically, if the endpoints of e are in the same connected component, then e is heaviest on the cycle it creates with the existing edges in T, because we are processing the edges in sorted order. On the other hand, if the two endpoints are in different components C_1 and C_2, and if S_1 is the vertex set of C_1, then e is lightest across the cut S_1, S̄_1.

At the time of processing an edge, Kruskal's algorithm does exactly the right thing with it – if e belongs in MST(G), then Kruskal's includes it. If not, Kruskal's throws it out.
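Putting the pieces together, here is a sketch of Kruskal's algorithm built on the UnionFind class above; representing edges as (weight, u, v) tuples is our convention, not the text's.

```python
def kruskal(vertices, edges):
    # edges: iterable of (weight, u, v) tuples; returns the MST edge list
    uf = UnionFind()
    for v in vertices:
        uf.makeset(v)
    tree = []
    for weight, u, v in sorted(edges):
        if uf.find(u) != uf.find(v):  # endpoints in different components
            tree.append((weight, u, v))
            uf.union(u, v)
    return tree
```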

2.3 Dijkstra-Jarník-Prim

This is more commonly known as Prim's algorithm, but I will follow Pettie's example in calling this the Dijkstra-Jarník-Prim algorithm, or DJP for short. It was first developed in 1930 by Jarník, and independently discovered by Prim and Dijkstra in the late 1950s.

From a distance DJP seems very similar to Kruskal's: it also grabs the lightest edge possible at every step. Instead of iterating through the edges in sorted order, it uses a heap to keep track of which vertex would be cheapest to add to a growing tree. It only keeps track of one edge per candidate vertex, calling on the heap's decreasekey operation if necessary. The running time is therefore heavily dependent on the heap used. With a standard binary heap, which performs insertions, deletions, and decreasekey operations in O(log N) time, where N is the number of elements in the heap, the running time is O(m log n).
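As an illustration, here is a sketch in Python. The standard-library heap (heapq) has no decreasekey, so this version uses the common lazy-insertion substitute, which preserves the O(m log n) binary-heap bound; the adjacency format is our own choice.

```python
import heapq

def djp(adj, root):
    # adj: dict mapping each vertex to a list of (weight, neighbor) pairs.
    # Grows a tree from root; returns MST edges as (weight, u, v) tuples.
    in_tree = {root}
    tree = []
    heap = [(w, root, v) for w, v in adj[root]]
    heapq.heapify(heap)
    while heap:
        w, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # stale entry: v was already reached by a lighter edge
        in_tree.add(v)
        tree.append((w, u, v))
        for w2, x in adj[v]:
            if x not in in_tree:
                heapq.heappush(heap, (w2, v, x))
    return tree
```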

3 Iterative algorithms

The two iterative algorithms we shall describe are Borůvka's and the Fredman-Tarjan algorithm. Both define iterative steps that are important in the later, more sophisticated algorithms.

3.1 Contractions

The purpose of contraction is mostly to present things cleanly. Instead of speaking of a collection of intermediate subgraphs, it allows us to speak of the vertices of a contracted graph.

Contraction is exactly what it sounds like: we merge two or more vertices into one supervertex, whose incident edges are the union of all the edges incident to the original vertices that make it up. More formally, given a (usually disconnected) subgraph H, contracting the graph across H means making every connected component of H into a supervertex.

The implementation given in Algorithm 2 requires each vertex to store an integer

in the ﬁeld “component.”

Let m_H, n_H be the numbers of edges and vertices in H, and m_G, n_G be the numbers of edges and vertices in G. Since we can find connected components in O(m_H + n_H) time, the entire subroutine takes O(m_G + n_G) time: we iterate through the vertices once and through the edges once, and m_H ≤ m_G and n_H ≤ n_G.

Contractions have the messy side effect of potentially returning a non-simple graph. There is a simple clean-up routine, using a lexicographic sort, to ensure that the contracted graph is simple, keeping the lightest edge when there are redundant edges. After lexicographically sorting the edges by component identifiers, redundant edges show up next to each other, and we only need to scan the sorted list of edges to extract the edge of lowest cost among duplicated edges. This can be done in O(m_G) time on a pointer machine [7].
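A sketch of that clean-up, assuming edges are given as (weight, cu, cv) tuples of component identifiers; a pointer-machine implementation would use bucket-based lexicographic sorting rather than Python's built-in comparison sort.

```python
def simplify(edges):
    # Returns a simple edge list: self-loops dropped, and only the
    # lightest edge kept for each pair of endpoints.
    norm = [(min(cu, cv), max(cu, cv), w) for w, cu, cv in edges]
    norm.sort()  # lexicographic: endpoints first, then weight
    kept, prev = [], None
    for cu, cv, w in norm:
        if cu == cv:
            continue            # self-loop created by the contraction
        if (cu, cv) != prev:    # first copy of this pair is the lightest
            kept.append((w, cu, cv))
            prev = (cu, cv)
    return kept
```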

Definition. If G′ was obtained by contracting edges of G, then G′ is called a minor of G.

Remark. A minor of a minor is also a minor. That is, if G″ is a minor of G′ and G′ is a minor of G, then G″ is a minor of G.


Algorithm 2 Contract
Require: Input: G = (V, E), a subgraph H of edges to contract
Ensure: Output: G′ = (V′, E′), the contracted graph
  If not every vertex in V is represented in H, put the missing vertices in H
  V′ ← ∅, E′ ← ∅
  connectedComponents ← find-connected-components(H)
  i ← 0
  for all C ∈ connectedComponents do
    put i in V′
    for all v ∈ C do
      v.component ← i
    end for
    i ← i + 1
  end for
  for all {u, v} ∈ E do
    put {u.component, v.component} in E′
  end for
  return G′ = (V′, E′)

In terms of MST algorithms, certain subgraphs are safer to contract than others.

Definition (Contractible). A subgraph C of G is contractible if MST(G) = MST(C) ∪ MST(G \ C), where G \ C denotes G with C contracted.

That is, treating the entire collection of vertices in C as one does not aﬀect the

correctness of an MST algorithm. All partial MST results are contractible – we could

stop Kruskal’s or DJP at any time, for instance, contract G across the intermediate

result, and carry on.

Remark. C is contractible and connected ⇐⇒ C∩MST(G) is connected.

Definition. If G′ is a minor of G, and v′ is a vertex in G′, then the supervertex v′ contains one or more vertices of G. Let the expansion of v′ be the subgraph of G with vertex set {v ∈ G : v maps to v′} and edge set {{u, v} : u, v both map to v′}. We write the expansion of v′ as C_{v′}.

3.2 Borůvka’s algorithm

Like Kruskal's algorithm, Borůvka's partitions the vertices into partial trees and merges them incrementally. However, unlike Kruskal's, which merges two components per step, in a single step of Borůvka's algorithm every component is involved in a merger. Like DJP, Borůvka's grows the intermediate result by taking the lightest edge coming out of a component, but unlike DJP, which only tracks one component, Borůvka's does so for many.

Of course the cost of taking multiple edges in a step is that the steps are longer and more complex.

3.2.1 One iteration of Borůvka’s algorithm

At the start of iteration i, we have a graph G_i, with G_0 = G. During one iteration, each vertex selects the lightest edge incident to it and contracts that edge. At the end of the iteration, we have some contraction G_{i+1} of G_i, and the set F of contracted edges.

In the implementation given in Algorithm 3, we need to store fields "minEdge" and "minEdgeWeight" for each vertex.

Algorithm 3 Borůvka-step
Require: Input: G_i = (V_i, E_i)
Ensure: Output: a forest F of MST edges and a contracted graph G_{i+1}
  F ← ∅
  for all {u, v} ∈ E_i do
    if w({u, v}) < u.minEdgeWeight then
      u.minEdgeWeight ← w({u, v})
      u.minEdge ← {u, v}
    end if
    if w({u, v}) < v.minEdgeWeight then
      v.minEdgeWeight ← w({u, v})
      v.minEdge ← {u, v}
    end if
  end for
  for all v ∈ V_i do
    put v.minEdge in F
  end for
  G_{i+1} ← contract(G_i, F)
  return G_{i+1}, F

Iterating through the edge set and vertex set takes O(m+n) time, and contract

is O(m), so Borůvka-step takes O(m) time in all.


3.2.2 The algorithm

Borůvka's algorithm simply performs Borůvka phases until the entire graph is contracted into one vertex. It stores a running result T, and appends the contracted edges F to T after every iteration. It is easy to see correctness by noting that taking the lightest edge out of a vertex v′ in G_i is equivalent to taking the lightest edge out of the cut S_{v′}, S̄_{v′}, where S_{v′} is the vertex set of C_{v′}. And we iterate until G is contracted to a single vertex, so the final T is connected.
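As a concrete illustration, here is a compact sketch of the whole algorithm in Python. It replaces the pointer-based contract routine with a simple component-relabeling array, and represents edges as (weight, u, v) tuples; both choices are ours, not the text's.

```python
def boruvka(n, edges):
    # n vertices labeled 0..n-1; edges: list of (weight, u, v) tuples with
    # distinct weights. comp[v] is the supervertex currently containing v.
    comp = list(range(n))
    tree, num_comps = [], n
    while num_comps > 1:
        best = {}  # lightest edge leaving each component
        for e in edges:
            w, u, v = e
            cu, cv = comp[u], comp[v]
            if cu == cv:
                continue  # internal edge: would be a self-loop
            if cu not in best or w < best[cu][0]:
                best[cu] = e
            if cv not in best or w < best[cv][0]:
                best[cv] = e
        if not best:
            break  # graph is disconnected; we have a spanning forest
        for w, u, v in set(best.values()):
            if comp[u] != comp[v]:  # may already have merged this phase
                tree.append((w, u, v))
                old, new = comp[u], comp[v]
                comp = [new if c == old else c for c in comp]
                num_comps -= 1
    return tree
```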

3.3 Fredman-Tarjan

Over time, various modifications to Kruskal's, DJP, and Borůvka's algorithms have been proposed, lowering the running time by various degrees. The use of a Fibonacci heap, and a slight but important modification to DJP, lowers the running time from O(m log n) to O(m β(m, n)), where β is a form of the iterated logarithm: the least number of times the logarithm function must be applied to n before the result drops below m/n. Formally, β(m, n) = min{i : log^(i) n ≤ m/n}. The approach is more thoroughly described in [7].

The Fibonacci heap performs the insert, decreasekey, and meld operations in constant amortized time, and both deletemin and delete in O(log N) amortized time, where N is the number of items in the heap.

3.3.1 One iteration of Fredman-Tarjan

One iteration of the Fredman-Tarjan algorithm results in a contracted graph, where the number of contracted vertices is at most 2m/k, with k being an input parameter. In addition, it generates a set 𝒞 of partial MSTs such that

1. 𝒞 covers every vertex.

2. The members of 𝒞 are edge-disjoint.

3. The number of connected components in ∪_{C∈𝒞} C is at most 2m/k.

The basic flow is pretty simple: start with all vertices unmarked. Picking an arbitrary vertex, expand outward, DJP-style, until the heap of candidate vertices reaches size k. Mark all the vertices in the current component, and start afresh with an unmarked vertex. For components other than the first, expansion can also stop before the heap grows large enough, if the current component collides with an old one – that is, if the last vertex added to the component was already part of another component. Pseudocode is given in Algorithm 4.


Algorithm 4 Fredman-Tarjan-iteration
Require: G = (V, E)
Ensure: F = a subset of MST edges; G′ = G \ F
  while there are still unmarked vertices do
    Initialize a new heap
    Pick an arbitrary unmarked vertex v_0
    Put all adjacent vertices u in the heap with key w({u, v_0})
    while the heap has fewer than k elements do
      v ← heap.deletemin()
      if v is already in the currently growing component then
        Continue without doing anything
      end if
      Add {v, x} to F, where w({v, x}) was the last key of v in the heap
      for all u adjacent to v do
        If u is not in the heap, insert u with key w({u, v}). If u has a greater key in the heap than w({u, v}), then decrease the key to w({u, v})
      end for
    end while
  end while
  Contract G across the edges of F (without clean-up)

This ensures (1) that every time we retrieve the lightest edge from the heap, it takes time in O(log k), and (2) that whenever we stop growing a component, either it has k or more other vertices adjacent to it, or it shares a vertex with another component. However, the first component in any set of components linked by common vertices must have stopped growing when the heap reached critical size. Therefore every connected component of F has at least k edges coming out of it, and contracting across F gives us at least k edges coming out of each supervertex. Since the total number of edges coming out of all vertices is 2m, this gives at most 2m/k vertices in the contracted graph.

Again, the running time is highly dependent on the particular heap implementation. With a Fibonacci heap, one iteration runs in O(m + n log k) time.

Another consequence of this is that we can raise the density m/n of a graph to an arbitrary value D in O(m + n log D) time. This comes from the fact that the new density m′/n′ is at least k/2, so by setting k = 2D and running a Fredman-Tarjan iteration we have the desired result.
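For concreteness (our own worked numbers): to raise the density to D = log n, set k = 2 log n. One iteration then takes O(m + n log log n) time and leaves at most 2m/k = m/log n supervertices, so the new density is at least m / (m/log n) = log n.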


3.3.2 The complete algorithm

Again, we perform iterations, contracting the connected components after each step, until the graph becomes trivial. Setting k = 2^{2m/n} for each iteration, with m and n the parameters of the current contracted graph, will give us the promised time bound. We refer the reader to [7] for details.

4 An algorithm for veriﬁcation

I'll begin with the verification algorithm because it nicely illustrates both the usage of the cycle property and a couple of other tricks.

A bit of history and acknowledgements: János Komlós first observed that verification can be done in a linear number of comparisons, although a linear-time implementation proved more elusive. Valerie King distilled Komlós's result into a simpler form, in addition to managing to implement Komlós's algorithm in linear time and space. Slightly before King's result, Dixon, Rauch, and Tarjan gave a completely different algorithm based on massaging the input so that a previous method of Tarjan's runs in linear time. Adam Buchsbaum produced the first purely comparison-based verifier by replacing the RAM-dependent portion of Dixon et al. with a pointer method.

Here I will talk about Komlós's information-theoretic result and King's refinement. It is important to know that Buchsbaum's algorithm exists, for later algorithms, but I will not go into detail about it.

4.1 Veriﬁcation: problem deﬁnition and reduction

The inputs are a graph G and a spanning tree T of G. A correct veriﬁer accepts if

T is the minimum spanning tree of G and rejects if T is not.

4.1.1 Narrowing down the search space

How do we know if T is the MST? The cut and cycle properties tell us exactly which edges are in the MST. We present the holistic cut and cycle properties; they are holistic in the sense that they apply to an entire spanning tree.

Theorem 5 (Holistic cut property). If T is a spanning tree of G, then removing any edge splits T into two connected components, which between them cover all the vertices of G. This defines a cut of G. The holistic cut property states that T is the minimum spanning tree if and only if every edge in T is the lightest across the cut defined by removing it from T.


Remark. For the cut defined by removing e ∈ T, e is the only edge of T across that cut.

Likewise the cycle property can be used to evaluate an entire spanning tree.

Deﬁnition. Let G be a graph and T a spanning tree of G. Given any two vertices

u and v in G, there is a unique path between them that only uses the edges in T.

This is a consequence of the deﬁnition of a spanning tree. Deﬁne T(u, v) to be this

unique path.

Theorem 6 (Holistic cycle property). If T is a spanning tree for G, then for every

edge ¦u, v¦ that is not in T, putting ¦u, v¦ together with T(u, v) creates a cycle. T is

the MST if and only if every edge ¦u, v¦ not in T is heaviest in the cycle it creates

with T(u, v). To simplify notation at times, we will speak of the cycle e creates with

T.

Proof. The cut-cycle duality entails that the forward direction of the holistic cut

property is equivalent to the holistic cycle property, and the same for the backward

direction.

For the forward direction, if T is the MST, and f is an edge not in T, then f

must be the heaviest in the cycle it creates with T since otherwise we replace the

heaviest edge in the cycle with f, obtaining a spanning tree lighter than T. For the

backward direction, if every edge in T is lightest across the cut it deﬁnes, then the

ordinary cut property guarantees that every edge in T is in the MST.

The holistic cut and cycle properties seem nearly like tautologies, given the or-

dinary cut and cycle properties. Their signiﬁcance comes from the fact that they

specify the exact cut or cycle that we should look at. The ordinary cut and cycle

properties only said, “if there exists a cut,” or “if there exists a cycle.” The holistic

properties make it so we don’t have to look at all cuts or all cycles, just one.

4.1.2 Reduction

Applying the holistic cut and cycle properties yields the following equivalent formu-

lations of the MST veriﬁcation problem:

1. Given a graph G and a spanning tree T: for every e ∈ T, is e the lightest edge across the cut defined by removing e from T?

2. Given a graph G and a spanning tree T: for every e ∉ T, is e the heaviest edge on the cycle it makes with T?


Komlós chooses to attack the second question, breaking it up into two parts. The first task is to find the maximum weight on T(u, v) for all vertex pairs u, v. The second is to test w({u, v}) against this maximum weight for all edges {u, v} not in T.

4.2 We can verify with a linear number of comparisons

Komlós notes that one can turn any spanning tree into a rooted tree by distinguishing an arbitrary leaf node as the root. Given this natural order on the vertices and edges of the input tree, we can break any query path into two half-paths:

Definition. If one end of a path is an ancestor of the other end, then this path is a half-path.

Komlós inductively finds the maximum weight on every possible half-path, and stores the results in a lookup table.

For every node v on level d of the tree, we will construct an ordered list M(v) = [m_0(v), m_1(v), . . . , m_{d−1}(v)], where m_i(v) is the maximum weight on the directed path starting at v's ancestor at level i and going down to v. For example, if p(v) is the parent of v, m_{d−1}(v) equals the weight of the only edge between p(v) and v.

Lemma 7. For every v at level d, we can find M(v) in at most log d comparisons.

Proof. If d = 1, i.e. v is a child of the root r, we define M(v) = [m_0(v)] = [w({v, r})]. This takes zero comparisons, and 0 ≤ log 1 = 0.

Let u be a node and let a_i(u) denote its ancestor at level i, constraining i to be less than depth(u). For any level i and any node u, m_i(u) ≤ m_{i−1}(u), because the directed path from a_i(u) to u is a subset of the directed path from a_{i−1}(u) to u. Thus [m_0(v), m_1(v), . . . , m_{d−1}(v)] is an ordered (non-increasing) list. When constructing M(v): since for every path from an ancestor a_i(v) down to v we already know the maximum over all edges except {p(v), v}, we only need to compare w({p(v), v}) with m_i(p(v)). However, since M(p(v)) is an ordered list, we only need to find the point at which w({p(v), v}) becomes greater than m_i(p(v)). This is binary search, which takes log(d − 1) comparisons.

To actually construct M(v), however, is a little more expensive. We take the index i* returned by the binary search, and set m_i(v) = m_i(p(v)) for all i < i*, and m_i(v) = w({p(v), v}) for i ≥ i*.
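A sketch of this construction for a single node. Here M_parent plays the role of M(p(v)); the binary search accounts for the comparisons, while the list copying is the extra work that King's linear-time refinement avoids.

```python
def extend_M(M_parent, w_edge):
    # M_parent: non-increasing list [m_0(p(v)), ..., m_{d-2}(p(v))]
    # w_edge:   weight of the edge {p(v), v}
    # Returns M(v) of length d: parent maxima while they dominate
    # w_edge, then w_edge for the remaining (shorter) half-paths.
    lo, hi = 0, len(M_parent)
    while lo < hi:  # binary search: first index i with m_i(p(v)) < w_edge
        mid = (lo + hi) // 2
        if M_parent[mid] < w_edge:
            hi = mid
        else:
            lo = mid + 1
    return M_parent[:lo] + [w_edge] * (len(M_parent) - lo + 1)
```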

Deﬁnition (Full branching tree). A rooted tree in which all leaves are at the same

level, and every internal (non-leaf) node has at least two children.


4.2.1 Proof of complexity for a full branching tree

The total number of comparisons needed, then, is Σ_i L_i, where L_i is the total number of comparisons needed for all the vertices at level i:

L_i = Σ_v log(|M(v)| + 1),    (1)

summing over all v on level i.

Following Eisner's lead [6], we rewrite this as an average of logs and use Jensen's inequality. Letting n_i denote the number of nodes on level i, Equation (1) becomes

L_i = n_i · (Σ_v log(|M(v)| + 1)) / n_i
    ≤ n_i log((Σ_v |M(v)|) / n_i + 1)
    ≤ n_i log((n + Σ_v |M(v)|) / n_i)
    ≤ n_i log((n + 2m) / n_i)
    = n_i (log((n + 2m) / n) + log(n / n_i)).

The sum over all levels is then O(n log((m + n)/n)), using the fact that, since this is a full branching tree, the number of nodes at depth i is at most n/2^i.

4.2.2 Turning every tree into a full branching tree

By building a tree that documents a run of Borůvka's algorithm, King gives us a way to turn every spanning tree into a full branching tree with at most 2n vertices. Each level of the full branching tree represents the state of the graph before one iteration of Borůvka's, with nodes at level i corresponding to the contracted vertices in G_i. There is an edge between a node at level i − 1 and a node at level i if the (i − 1)-node becomes part of the i-node during that iteration. More formally, if T is a spanning tree over n vertices, we will build a full branching tree B.

1. Start B as the empty graph.

2. Put all the vertices of T into B as its leaves. That's the end of the 0th Borůvka iteration.

3. Repeat until G_i is contracted to a single vertex: If, at the beginning of the ith iteration, we have vertices v_1, . . . , v_k, and at the end we have vertices u_1, . . . , u_l in the contracted graph, then put all the u_j into B as nodes. Draw a directed edge from v_i to u_j if v_i was contracted into u_j, and let the weight of that edge be the weight of the edge selected by v_i during that Borůvka iteration.

Note that since T was a tree to begin with, Borůvka's algorithm trivially returns T. An immediate consequence of this is that every edge in T was selected at some point. In addition, even if we assume the weights in T are unique, weights in B may not be. There is a natural surjective map from edges in B to edges in T, and a one-to-many mapping from edges in T to edges in B, namely the map that associates every edge in T with all the edges of the same weight in B.

Claim. B is a full branching tree.

Proof. B is clearly a rooted tree, with the node corresponding to the entire contracted T as the root. Since after an iteration every connected component is the result of joining at least two other connected components, condition 2 for a full branching tree is satisfied. And we can prove by induction on the height of B that all leaves are at the same level (that level being the number of iterations needed to run Borůvka's on T).

4.2.3 We can use B instead of T

Recall that for a spanning tree T, T(x, y) denotes the unique path in T between x

and y. In the same way, let B(x, y) denote the unique path in B between leaves x

and y.

Lemma 8. u is on B(x, y) if and only if u is the lowest common ancestor of both x

and y in B, or u is an ancestor of x but not of y, or vice versa.


Proof. Suppose v is the lowest common ancestor of x and y. By joining the path from v to x and the path from v to y, and ignoring orientation, we get an undirected path from x to y. Since B is a tree, this is the only path. Any node on the path from v to x is an ancestor of x but not of y (otherwise we contradict the lowest-ness of v), and vice versa for any node on the path from v to y. On the other hand, if u is a common ancestor of x and y but not the lowest, then u is not included on the path defined earlier in the paragraph, which is unique.

Lemma 9. If e′ is an edge of B(x, y), then there is an edge e in T(x, y) with w(e) = w(e′). As a matter of fact, e is the same edge from which e′ derived its weight.

Proof. Let e be the T-edge whose selection gave rise to e′ in B. If we show that e is on T(x, y), then we're done.

Suppose e′ = (v, u), directed from child v to parent u, so e is incident to the expansion of v. Since v is on B(x, y) and it is not the highest node on that path (u is higher), by the previous lemma the expansion C_v of v contains exactly one of x and y. Disconnecting e would partition T into C_v's side and the rest, one of which contains x and the other of which contains y. Since x and y would no longer be connected, e must be on T(x, y).

Lemma 10. If e is heaviest on T(x, y), there must be an edge of the same weight in B(x, y).

Proof. We will show that the expansion of any contracted vertex that selects e contains x or y, but not both. First, let C_v be the expansion of a vertex in one of the G_i, and let x = u_0, u_1, . . . , u_k = y be T(x, y). If e is incident to C_v, then T(x, y) ∩ C_v is nonempty, since one endpoint of e is in C_v and is a vertex of T(x, y). Also, T(x, y) ∩ C_v is clearly connected, because otherwise there would be a cycle, and T is a tree. So T(x, y) ∩ C_v = u_i, . . . , u_j. If neither x nor y is in C_v, then u_i ≠ x and u_j ≠ y. Thus {u_{i−1}, u_i} and {u_j, u_{j+1}} are both incident to C_v, and one of these is e. However, since there are two edges of T(x, y) incident to C_v, there is an edge lighter than e incident to C_v, so C_v does not select e.

To see that any C_v containing both x and y cannot select e, note that disconnecting any edge incident to C_v leaves C_v intact. If C_v contains both x and y, disconnecting an incident edge leaves T(x, y) connected, so no incident edge of C_v is part of T(x, y) – in particular, not e.

Therefore, let v be any vertex that selects e over the course of Borůvka's algorithm; we noted above that every T-edge is selected at some point. C_v contains exactly one of x and y, so by Lemma 8 v is part of B(x, y). Since C_v does not contain both x and y, the parent of v is also a node on B(x, y), so the edge from v to its parent, which has weight w(e), is part of B(x, y).


This brings us to our final result:

Theorem 11. If e is the heaviest edge on T(x, y) and f′ is the heaviest edge on B(x, y), then w(e) = w(f′).

Proof. By Lemma 9, there is an edge f on T(x, y) with w(f) = w(f′), so w(f′) = w(f) ≤ w(e). By Lemma 10, there is an edge e′ on B(x, y) with w(e′) = w(e), so w(e) = w(e′) ≤ w(f′). Therefore w(e) = w(f′).

Therefore we can use Komlós's algorithm for full branching trees instead of general trees, which we have just proved to take a linear number of comparisons.

A note. The ideas in this section, I thought, nicely illustrated a use of the cycle property for determining MSTs. In addition, the idea of constructing a tree of contracted components, where each vertex is the child of the vertex it was contracted into, turns up again in Chazelle's algorithm.

5 A randomized algorithm

Karger, Klein, and Tarjan introduce a randomized algorithm that always returns the same answer for any input but whose running time varies. MSF-random, as we will call it, is expected to run in O(m + n) time for any given graph with m edges and n vertices, although it could conceivably get very unlucky and take up to O(m log n + n^2) time.

Since I found that applying big-Oh notation to a randomized running time is a little bewildering, let's restate that: there is a magic number c such that when MSF-random is run on any graph G a large number of times, the average running time is at most c(m + n) units of time. However, there is another magic number d such that MSF-random always finishes in under d(m log n + n^2) units of time.

5.1 Overview

The “F” in the name “MSF-random” comes from the fact that it works for graphs that

are not connected, hence it returns a minimum spanning forest instead of a minimum

spanning tree.

The algorithm is roughly sketched below for a graph G having n vertices and m

edges.

MSF-random:

1. Reduce the number of vertices by a factor of 4 via two Borůvka phases. Call the contracted graph G_0 and the set of contracted edges F_0.

2. Toss a fair coin for each edge in G_0, putting it in the subgraph H_a if the coin comes up heads. Call MSF-random on H_a to obtain its minimum spanning forest F_a.

3. Eliminate all the edges of G_0 that become the heaviest edge in a cycle when added to F_a. Let H_b be the remaining graph.

4. Call MSF-random on H_b to get its minimum spanning forest F_b.

5. Return F_b ∪ F_0.

Deﬁnition. An edge e is F-heavy if adding it to F creates a cycle and e is the

heaviest edge on that cycle. An edge in G that is not F-heavy is F-light.

The F_a-light edges are exactly the edges that make up H_b. Via the cycle property we can see that none of the edges we threw out in step 3 are in the MST of G, so the MST must be a subset of the edges we haven't thrown out.

Claim. If H is a subgraph of G that covers all vertices and contains the MST, then MST(H) = MST(G).

Proof. Let e be an edge in MST(G), and let S, S̄ be a cut in G across which e is lightest. The same cut in H has fewer edges across it, but this does not affect the minimality of e. Conversely, if e is an edge in MST(H), and U, Ū is a cut for which e is lightest in H, consider the same cut in G. Any edge in G that we haven't included in H cannot be the lightest across any cut, so e's position as lightest is safe.

If MSF-random indeed returns the minimum spanning forest of H_b correctly, then F_b = MST(G_0) = MST(G \ F_0). Since all edges returned by Borůvka's are contractible, the return value of MSF-random, F_b ∪ F_0, is the MST of G.

Lastly, a double inductive argument on m and n, with a base case of an isolated vertex, suffices to prove that MSF-random is correct.
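A runnable sketch of the recursion under our own conventions (vertices 0..n−1, edges as (weight, u, v) tuples with distinct weights). The quadratic f_light test below stands in for the linear-time F-heavy filtering the real algorithm needs; everything here is for illustration only.

```python
import random

def components(n, forest):
    # map each vertex 0..n-1 to a component id, given forest edges
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for _, u, v in forest:
        parent[find(u)] = find(v)
    return [find(v) for v in range(n)]

def f_light(n, forest, edge):
    # F-light iff edge joins two components of the strictly-lighter forest
    # edges, i.e. it is not the heaviest on the cycle it makes with forest
    w, u, v = edge
    comp = components(n, [f for f in forest if f[0] < w])
    return comp[u] != comp[v]

def msf_random(n, edges):
    if not edges:
        return []
    # Step 1: two Boruvka phases, recorded as contracted edges f0
    f0 = []
    for _ in range(2):
        comp = components(n, f0)
        best = {}  # lightest edge leaving each current component
        for e in edges:
            w, u, v = e
            cu, cv = comp[u], comp[v]
            if cu != cv:
                if cu not in best or w < best[cu][0]:
                    best[cu] = e
                if cv not in best or w < best[cv][0]:
                    best[cv] = e
        f0 += list(set(best.values()) - set(f0))
    comp = components(n, f0)
    e0 = [e for e in edges if comp[e[1]] != comp[e[2]]]
    if not e0:
        return f0
    # Step 2: recurse on a fair-coin sample Ha
    fa = msf_random(n, [e for e in e0 if random.random() < 0.5])
    # Step 3: keep only the Fa-light edges, giving Hb
    hb = [e for e in e0 if f_light(n, fa, e)]
    # Steps 4-5: recurse on Hb and return with the contracted edges
    return f0 + msf_random(n, hb)
```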

5.1.1 The subgraph passed to the second recursion is sparse

The following statement is taken directly from Karger, Klein, and Tarjan in [9].

Theorem 12. Let G be a graph with n vertices, and let H be a subgraph obtained by including each edge independently with probability p, and let F be the minimum spanning forest of H. The expected number of F-light edges in G is at most n/p, where n is the number of vertices of G and p is the sampling probability.


An auxiliary procedure Consider the modiﬁcation to Kruskal’s algorithm, given

in Algorithm 5.

Algorithm 5 Count-F-light

Require: G = (V, E); p

Ensure: H is a subsampled graph of G; F = MSF(H)

1: Sort E

2: numFLight ← 0

3: numF ← 0

4: H ← (V_H, E_H) ← (∅, ∅)
5: F ← (V_F, E_F) ← (∅, ∅)

6: for all e ∈ E do

7: X ← coinFlip(p)

8: if X is heads then

9: Put e in H

10: end if

11: if e is F-light then

12: numFLight ← numFLight + 1

13: if X is heads then

14: Put e in F

15: numF ← numF + 1

16: end if

17: end if

18: end for

Claim. After running Algorithm 5, F = MSF(H) and H is a sampled graph with

each edge being included independently with probability p.

Proof. The second part of the claim follows directly from lines 7 to 10. The ﬁrst part

comes from the fact that we process the edges in increasing order of weight. If e is

included in F, then it does not create a cycle with lighter edges (by the deﬁnition of

F-light), and thus it is safe to include it in F, since any edge added to H afterward

must necessarily be heavier.

Fact. Suppose we have a coin that comes up heads with probability p. Let Z be a random variable representing the number of times we must flip the coin to achieve n heads. Then E[Z] = n/p.

More formally, Z has the negative binomial distribution parameterized by n and

p. The expectation of such a distribution is well-known to be n/p.
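As a quick illustration of this fact, here is a simulation (purely a sanity check, not part of the algorithm); the parameters n = 100 and p = 0.5 are arbitrary choices of ours.

```python
import random

def flips_until_heads(n, p):
    # count coin flips until we have seen n heads
    flips = heads = 0
    while heads < n:
        flips += 1
        if random.random() < p:
            heads += 1
    return flips

# the average over many trials approaches n/p (= 200 for n=100, p=0.5)
trials = [flips_until_heads(100, 0.5) for _ in range(10000)]
print(sum(trials) / len(trials))
```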


Claim. The variable numFLight is bounded above by a random variable Z having

the negative binomial distribution with parameters n and p.

Proof. Suppose we ﬂip a coin every time we increment numFLight – but we already

do! We ﬂip a coin and store it in the variable X, and numF counts the number of

times we get heads and increment numFLight. So numF is the number of heads

we have gotten from our numFLight coin ﬂips. However, numF must be less than

n, since the maximum number of edges in the forest is n − 1. Suppose we keep on

ﬂipping after count-F-light ﬁnishes, until numF plus the number of heads we got

afterwards is n. Let Z be the total number of ﬂips we had to make. That is, Z is

numFLight plus the number of extra ﬂips we had to make. By construction Z has

a negative binomial distribution, and Z is necessarily greater than numFLight. So

n/p = E[Z] > E[numFLight].

The proof of Theorem 12 follows directly from the previous two claims.

This allows us to expect that the number of edges passed into the second recursive

call is proportional to the number of vertices, not the original number of edges. Since

a Borůvka iteration halves the number of vertices, and we perform two of them at

the start of each call, this is very good news for the running time.

5.2 A tree formulation

MSF-random is a divide-and-conquer algorithm in the sense that it calls itself multiple

times, with the input to each subcall being smaller than the input of the parent call.

The divide is not quite clean, though, and only the output of the last call is used in the

ﬁnal recursion, making the subcalls more like a sequence of reﬁnements. Nevertheless,

like all divide-and-conquer algorithms, it can be represented by a recursion tree with

the original problem at the root. Each node has two children, one for each recursive

subproblem. The ﬁrst (randomly sampled) we’ll say is the left child, and the second

the right.

5.2.1 Some facts about vertices and the recursion tree

The Borůvka iterations reduce the number of vertices by a factor of 4, so each subproblem has at most 1/4 the number of vertices of its parent. Therefore, a subproblem at depth d has at most n/4^d vertices. Each subproblem has at most two children; therefore the number of subproblems at depth d is at most 2^d. Using these facts, we see that the total number of vertices in all the subproblems at depth d is at most n/2^d. Summing over all levels, we obtain an upper bound of 2n vertices in all subproblems combined.


5.2.2 Some facts about edges and the recursion tree

Deﬁnition. A left-path is a path on the recursion tree consisting of all left edges. A

complete left-path is a left-path headed by either the root or a right child.

Note that left-paths correspond to a recursion chain of only the ﬁrst recursive call

– that is, ﬁnding a minimum spanning forest of a randomly sampled subgraph. Also

note that diﬀerent complete left-paths are disjoint, and that every vertex on a tree

is a member of a complete left-path. In other words, the complete left-paths form a

partition of the tree. Also, every right child heads a complete left-path.

It's pretty trivial to prove that if X_0 is the number of edges at the head of a left-path, and X_i the number of edges at the ith node of the path, then E[X_i] ≤ E[X_0]/2^i, since each edge has a 1/2 chance of being sampled, and the Borůvka stage only removes more edges.

We sum over all subproblems on the complete left-path and see that the expected number of edges on the whole path is at most Σ_{i=0}^{∞} E[X_0]/2^i = 2E[X_0].

Theorem 13. The expected number of edges in all the combined subproblems is 2m + n.

Proof. Suppose I have a subproblem with n′ vertices and m′ edges, and let H_L and H_R denote my left and right subproblems. By the fact above, E[m_R] ≤ 2n′. Note that at any depth d, there are at most 2^d total subproblems and 2^{d−1} right subproblems. Recall that each subproblem has at most n/4^d vertices. Summing over all depths, we see that all the right subproblems combined have at most n/2 vertices. Therefore, by Theorem 12, the total expected number of edges in all the right subproblems is 2(n/2) = n. The expected number of edges in the complete left-path headed by the root is 2m, so the expected number of edges in the entire recursion tree is 2m + n.

5.3 Runtime analysis

5.3.1 The expected running time

For a problem of size n vertices and m edges, the running time T(m, n) breaks down into

1. Two iterations of Borůvka's: O(m).

   (a) Recursive call + finding F-heavy edges: T(m_L, n_L).

   (b) Finding F-heavy edges + recursive call: O(m) + T(m_R, n_R).

2. Concatenate the edges found in previous steps: O(1).

T(m, n) = T(m_L, n_L) + T(m_R, n_R) + O(m).

The running time depends solely on the number of edges processed in each subproblem:

T(m) = T(m_L) + T(m_R) + O(m)    (2)

By the above, the expected total number of edges is 2m + n, which is O(m).

5.3.2 A guaranteed running time

In the worst case, the sampling does nothing, and all the work is done by the Borůvka iterations. This gives us a bound of O(m log n) from a maximum recursion depth of log n, and m edges in all the subproblems at one level. Furthermore, a subproblem at depth d contains fewer than

(1/2)(n/4^d)^2 = (1/2) n^2/2^{4d}

edges. This gives us at most

(1/2) 2^d n^2/2^{4d} < (1/2) n^2/2^d

total edges in a level, and at most n^2 edges in all subproblems at all levels. This gives us a guarantee that even in the event that MSF-random makes very, very unlucky choices, it is no worse (asymptotically) than a classical algorithm like DJP or Borůvka's.

5.3.3 High-probability proof

The algorithm finishes in O(m) time with probability 1 − exp(−Ω(m)).

First, we deal with the right subproblems. We're going to prove that the number of edges in all the right subproblems is ≤ 3m with high probability. We toss a fair coin for every edge that could be F-light. If it's heads, the edge goes into the right subproblem. Since the number of edges in a spanning forest is less than the number of vertices, and the number of vertices in all right subproblems is less than n/2, the total number of F-light edges in all right subproblems is less than n/2. Then, the probability that there are more than 3m F-light edges is less than the probability that fewer than n/2 heads appear in 3m coin tosses. The authors apply a Chernoff bound and the inequality m ≥ n/2 to get a probability of exp(−Ω(m)) [9].

Now for the left subproblems. If we define m′ to be the total number of edges in all right subproblems, and m* to be the total number of edges in left subproblems, this can be thought of as m′ heads appearing in m* coin tosses. Therefore P(m* > 3m′) is the probability of getting only m′ heads in more than 3m′ coin tosses. And a Chernoff bound gives you that P(m* > O(m)) grows as exp(−Ω(m)).


6 A deterministic, non-greedy algorithm

This algorithm, published by Chazelle in 2000 [4], held the record for the fastest asymptotic runtime for two years, until Pettie and Ramachandran came up with an algorithm whose runtime is by nature upper-bounded by that of any comparison-based algorithm, including this one. However, since no one has been able to prove a tighter bound for the latter, Chazelle's analysis still yields the lowest asymptotic runtime for the MST problem of which we are aware. In this review I will follow a technical report by

Pettie [10] that simpliﬁes Chazelle’s analysis. In addition, I will not include most

of the details Chazelle describes in [4], and focus more on the intuition driving the

algorithm.

This algorithm, at its heart, consists of three parts:

1. Identifying subproblems

2. Recursing on subproblems

3. Reﬁning the result from Number 2.

The reason Number 3 is needed is that we will use a data structure, the soft heap,

that renders the results of the subproblems inexact. While choosing the subproblems,

we use a soft heap, which picks good but not perfect subproblems.

6.1 The Ackermann function and its inverse

The main thing to know about the Ackermann function is that it grows extremely

quickly. Therefore, its inverse grows extremely slowly.

The Ackermann function is defined on a 2D table, as follows:

A(1, j) = 2^j                      (j ≥ 1)
A(i, 1) = A(i − 1, 2)              (i > 1)
A(i, j) = A(i − 1, A(i, j − 1))    (i, j > 1)

The base cases are sometimes given differently; I have followed [10]. To give an idea of how fast the Ackermann function grows, the first few values are given in Table 1. It is estimated that the observable universe contains fewer than 2^515 atoms, which is in turn less than A(2, 4).

There are two flavors of inverse. The first takes only one argument:

α(k) = min{ i : A(i, i) > k }.


Table 1: Values of the Ackermann function

i\j   1    2         3        4          5              6
1     2    4         8        16         32             64
2     4    16        65536    2^65536    2^(2^65536)    2^(2^(2^65536))
3     16   A(2, 16)  ...      ...        ...            ...

The second takes two:

α(m, n) = min{ i : A(i, ⌈m/n⌉) > log n }.

Note that α(·, ·) is decreasing in its first argument: the greater m/n is, the smaller i needs to be for A(i, ⌈m/n⌉) to top log n. We will mostly use this second form in our analyses.
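As a quick illustration of the definitions, here is a small Python sketch that follows the recurrences above verbatim (base-2 logarithm assumed). It is only feasible when the answer is tiny; the intermediate values explode almost immediately, which is rather the point.

    import math
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def A(i, j):
        # the Ackermann function, exactly as defined above
        if i == 1:
            return 2 ** j
        if j == 1:
            return A(i - 1, 2)
        return A(i - 1, A(i, j - 1))

    def alpha(m, n):
        # the two-argument inverse: min{ i : A(i, ceil(m/n)) > log n }
        i = 1
        while A(i, max(1, math.ceil(m / n))) <= math.log2(n):
            i += 1
        return i

For example, A(2, 3) == 65536, and alpha(10**6, 10**3) == 1, since A(1, 1000) = 2^1000 already tops log 1000.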

6.2 The Soft Heap

Recall that heaps support the following operations:

• insert(item, k) puts item in the heap with key k.

• delete(item) takes the item away from the heap.

• deletemin() returns the item with minimum key and removes it from the heap.

• meld(otherHeap) combines two heaps.

The soft heap, an earlier invention of Chazelle's [5], plays a central part in lowering the running time bound. We have seen, as in Kruskal's and Prim's, that insisting on correctness at every step leads to unnecessary overhead. In Kruskal's, sorting the edges was extra work, and in Prim's we incurred overhead from maintaining a sorted heap. The soft heap sacrifices correctness in exchange for speed. At any time it may contain corrupted elements, elements whose keys have been raised from their original values. The soft heap is controlled by a user-defined parameter ε, the error parameter, and guarantees:

1. deletemin, delete, and meld take constant amortized time.

2. insert takes O(log(1/ε)) amortized time.

3. The number of corrupted elements in the heap at any time is at most εN, where N is the number of insertions so far.


4. An additional operation, dismantle, takes O(N) time. This is explained in the

next section.

6.2.1 Bad and corrupted edges

Every item in a soft heap has two keys, original and current. The soft heap uses the

current key to “bubble up” elements, and the return value of the deletemin operation

is based on the current key. However, given any heap element, we can ﬁnd out if it

is corrupt or not by comparing the current and original keys.

When we dismantle a soft heap, we will often want to ﬁnd out which items in it

are corrupt. This is why the dismantle operation takes O(N) time – we need to look

at all the elements currently in the heap and decide if they are corrupt or not.

Note that corruption may only raise the weights of edges, not change them ar-

bitrarily. Although it is possible to tell exactly how much each edge weight was

corrupted, we will not need this information for the MST algorithm, only the fact

that the soft heap thought the weight was higher than it should be.
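A minimal interface sketch tying this section together – the class and member names below are illustrative, not Chazelle's; only the signatures, the amortized costs, and the two-key corruption test reflect the text:

    class SoftHeapItem:
        def __init__(self, value, key):
            self.value = value
            self.original_key = key   # the key at insertion time
            self.current_key = key    # the soft heap may raise this later

        def is_corrupt(self):
            # corrupt iff the key has been raised above its original value
            return self.current_key > self.original_key

    class SoftHeap:
        def __init__(self, epsilon):
            # at most epsilon * N items are corrupt at any time,
            # where N is the number of insertions so far
            self.epsilon = epsilon

        def insert(self, item, key): ...   # O(log(1/epsilon)) amortized
        def delete(self, item): ...        # O(1) amortized
        def deletemin(self): ...           # O(1) amortized; by current_key
        def meld(self, other): ...         # O(1) amortized
        def dismantle(self): ...           # O(N): enumerate items, flag corrupt ones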

6.2.2 Consequences for the MST algorithm

When we pick subproblems, we will use a soft heap to deﬁne subsets of the graph on

which to recurse. Ideally, we would like to pick perfectly contractible components.

However, since the soft heap corrupts edges as it goes, we have to settle for a diﬀerent

sort of contractibility on a corrupt graph.

6.3 Strong contractibility and weak contractibility

Recall that for a contractible subgraph C, MST(G) = MST(C) ∪ MST(G\C). Let C be a subgraph of a weighted graph G.

6.3.1 Strong contractibility

Definition. C is strongly contractible with respect to a weighted graph G if there exists a vertex v_0 in C such that if the DJP algorithm starts at v_0, it will construct the MST of C after some number of iterations.

Deﬁnition. The maximum weight of a path is the maximum weight of all the edges

in a path.

Claim. Let C be strongly contractible for a corruption G′ of G, and let M_C be those edges which are both corrupt in G′ and incident to C. Then if {u, v}, {x, y} are such that u, x ∈ C and v, y ∉ C and neither is in M_C, all edges on the path between u and x in MST(C) have weight less than max(w({u, v}), w({x, y})).

Proof. If T_C is the minimum spanning tree of C, let e be the edge with heaviest weight on the path T_C(u, x). When we run the DJP algorithm from a particular vertex v_0, we end up with a minimum spanning tree of C. Let's look at a step in the DJP algorithm while it is constructing the MST of C. Let p = z_0, z_1, . . . , z_k be the part of T_C(u, x) already selected by the algorithm so far. If neither endpoint of p is equal to u or x, then there are two edges in T_C(u, x) that have not been selected yet. Since e is heavier than any other edge on T_C(u, x), it is impossible for the algorithm to select e at this step.

Remark. The converse is false. In particular, consider the graph with vertices a, b, c, d, e, f and edges {a, b}, {b, c}, {c, e}, {b, d}, {d, f} with weights 1, 2, 5, 3, 4 respectively. Then the subgraph C made of vertices b, c, d and edges {b, c}, {b, d} is its own MST and cannot be constructed by starting DJP on b, c, or d, since any attempt will run into b first and select {a, b}, which has weight 1. However, it is impossible to find two edges incident to C that are lighter, since the three incident edges have weights 1, 4, 5 and the edges in C have weights 2 and 3.

6.3.2 Weak contractibility

Weak contractibility guarantees us a composition formula similar to what we get with ordinary contractibility. Suppose G is our original graph, and G′ is the identical graph, except some edge weights have been raised. Recall that G\C is the graph resulting from contracting G across C, so G\C − M_C is the contracted graph less the edges in M_C.

Theorem. If C is strongly contractible with respect to G′, then MST(G) ⊆ MST(C) ∪ MST(G\C − M_C) ∪ M_C.

Note that MST(C) and MST(G\C − M_C) refer to the MST with the edge weights of G, not G′. The only time G′ is relevant is when we specify that C is strongly contractible with respect to G′.

The following proof is due to Pettie [11].

Proof. We want to show that any edge not in MST(C) ∪ MST(G\C − M_C) ∪ M_C also must not be in MST(G). The only way an edge fails to be in MST(C) ∪ MST(G\C − M_C) ∪ M_C is if it is in C but not MST(C), or if it is in (G\C − M_C) but not in MST(G\C − M_C).


Case 1: If e ∈ C and e ∉ MST(C), then there exists some cycle in C for which e is heaviest. This cycle also exists in G, so e ∉ MST(G).

Case 2a: If e ∈ (G\C − M_C) and e ∉ MST(G\C − M_C), and there exists some cycle in (G\C − M_C) that doesn't involve C (we are loosely using C to mean the contracted vertex) for which e is heaviest, then that cycle also exists in G.

Case 2b: Now suppose the only cycles in (G\C − M_C) for which e is heaviest include C. Let P be such a cycle. Then in the noncontracted graph G, there is a cycle P′ consisting of P and edges in MST(C). Let {u, v} and {x, y} be the two edges in P that have exactly one vertex in C. Applying Claim 6.3.1 and the fact that e has maximum weight on P, w(e) ≥ max(w({u, v}), w({x, y})) ≥ w(f) for all f ∈ C ∩ P′. Thus e is heavier than any edge in P′, which is a cycle in the original graph. So e ∉ MST(G).

If the conclusion of Claim 6.3.1 holds, we'll say the subgraph is weakly contractible.

We will engage in slight abuse of notation and let MST(𝒞) be ∪_{C∈𝒞} MST(C).

6.3.3 Strong contractibility on minors

If G_0 is a graph, and 𝒞_0 is a set of weakly contractible components of G_0, then let G_1 = G_0\𝒞_0 − M_{𝒞_0}. If 𝒞_1 is a set of weakly contractible components of G_1, then MST(G_0) ⊆ MST(𝒞_0) ∪ MST(𝒞_1) ∪ M_{𝒞_0} ∪ M_{𝒞_1}. This follows from induction. Thus the set 𝒞 = 𝒞_0 ∪ 𝒞_1 is weakly contractible.

6.4 Overview revisited

Now that we have the vocabulary and machinery, we can give a more detailed

overview.

1. If the input graph is small enough to run DJP under a ﬁxed time, then run

DJP.

2. Find a set ( of subgraphs that is weakly contractible, and let M

C

be the cor-

responding set of bad edges.

3. For subgraph x ∈ (, preprocess x to increase density to

m

n

. As explained in

section 3.3, this can be done in time O(m + nlog D), where D is the desired

density.

4. For subgraph x ∈ (, recurse on x.

5. Preprocess MST(() ∪ M

C

to increase density to

m

n

.

35

6. Recurse on MST(() ∪ M

C

.

Although the subroutine call to raise the densities may seem worrisome, it will turn

out to not aﬀect the O(mα(m, n)) running time.

6.5 Motivation for Build-T

Build-T, the subroutine that will give us our set 𝒞, is the key to this algorithm. First we establish what we want out of it:

1. Acceptable subproblems. That is, MST(G) ⊆ MST(𝒞) ∪ M_𝒞, the subgraphs in 𝒞 are edge-disjoint, and 𝒞 covers all vertices of G.

2. It runs in O(m) time.

3. Subproblems small enough that recursing on them does not overwhelm the running time.

4. Not too many bad edges, so that the final recursion does not overwhelm the running time.

The rest of this section is dedicated to elaborating on the last two points. In the remainder of this section, suppose 𝒞 is a set of subgraphs such that

MST(G) ⊆ MST(𝒞) ∪ M_𝒞.

Let m_L be the total number of edges passed to all the recursive calls except the last, and m_R, n_R be the number of edges and vertices passed to the final recursion.

6.5.1 What we already know

The number of vertices in MST(𝒞) ∪ M_𝒞 is exactly n, and the number of edges is exactly m_B + n − 1, where m_B is the number of edges in M_𝒞. This follows from the fact that 𝒞 covers all vertices and is edge-disjoint, so MST(𝒞) is a spanning tree of G. After raising the density of MST(𝒞) ∪ M_𝒞 via a Fredman-Tarjan iteration, the number n_R of vertices is at most (m_B/m)·n.

If we do not clean up after the Fredman-Tarjan iteration, then m_R = m_B + n − 1 ≥ m_B + (m_B/m)·n = m_B·(1 + n/m) for a big enough graph. Here we are making the (possibly big) assumption that we have corrupted only a fraction of the edges in the graph, so m_B/m < 1. So m_R ≥ (1 + 1/D)·m_B.


6.5.2 The recursion formula

Let T(m, n) be the maximum running time for any graph with m edges and n vertices, and let t(m, n) = T(m, n)/cm, for a constant c. Below, let the total overhead, including the time it takes to find subproblems, take O(S(m, n)) time, and let s(m, n) = S(m, n)/b, for a constant b.

Then the recursive formula for the running time of MST-hierarchical can be written as

T(m, n) ≤ Σ_{x∈𝒞} T(m_x, n_x) + T(m_R, n_R) + b·s(m, n)    (3)

        = Σ_{x∈𝒞} c·m_x·t(m_x, n_x) + c·m_R·t(m_R, n_R) + b·s(m, n)

        ≤ Σ_{x∈𝒞} c·m_x·t_1 + c·m_R·t_2 + b·s(m, n)    [see below]

        = c·m_L·t_1 + c·m_R·t_2 + b·s(m, n)

        = (c·m_L·t_2 − c·m_L·(t_2 − t_1)) + c·m_R·t_2 + b·s(m, n)

        = (c·m·t_2 + c·(m_L + m_R − m)·t_2) − c·m_L·(t_2 − t_1) + b·s(m, n)

        = c·m·t_2 + c·((m_R + m_L − m)·t_2 − m_L·(t_2 − t_1) + (b/c)·s(m, n))    (4)

In the above, t_1 = max_x { t(m_x, n_x) } and t_2 = t(m_R, n_R).

To have the entire thing run in O(m·f(m, n)), it suffices to have the following restrictions on t(·, ·) and s(·, ·):

t(m_R, n_R) ≤ f(m, n)    (5)

∀x, t(m_x, n_x) ≤ f(m, n) − 1    (6)

s(m, n) = O(m),    (7)

and the following restriction on m_L and m_R:

(m_R + m_L − m)·a − m_L + (b/c)·m = m_R·a + m_L·(a − 1) + ((b/c) − a)·m ≤ 0    (8)

with a being f(m, n).

We therefore look for a procedure that will guarantee the last requirement and runs in O(m), since the first two requirements follow by induction. To see this, note that if all four of the above hold, substituting t_2 with a and t_1 with a − 1 in (3) yields


an expression equal to or greater than (3). Propagating this replacement down to (4),

c·m·a + c·((m_R + m_L − m)·a − m_L + (b/c)·m)    [from (5), (6), (7)]
≤ c·m·a    [from (8)]

We could have replaced α(·, ·) with any function f(m, n); as long as we can find

subproblems that will allow (5) through (8) to be fulﬁlled, we will have an algorithm

that runs in O(mf(m, n)). What’s special about α(m, n) is that, as we will show, if

f(m, n) = α(m, n) + 2, then we can fulﬁl all these requirements.

6.6 Build-T

6.6.1 A hierarchy of minors

It has already been noted that if G_0, G_1, . . . , G_N is a sequence of contractions, with G_0 = G and G_{i+1} defined recursively as G_i\𝒞_i − M_{𝒞_i}, where 𝒞_i is a set of weakly contractible subgraphs of G_i, then

MST(G) ⊆ MST(𝒞_0 ∪ . . . ∪ 𝒞_N) ∪ M_{𝒞_0} ∪ . . . ∪ M_{𝒞_N}.

It is our job to find the 𝒞_i so that the conditions described in the previous section hold.

This formulation leads to a hierarchical representation of the subgraphs in the 𝒞_i's. Each vertex v in G_i really represents a subgraph of G_{i−1}, which contains multiple vertices of G_{i−1}. Therefore, we make v the parent of all the vertices in G_{i−1} that it contains, which likewise are subgraphs that themselves contain vertices of G_{i−2}, and so on. Thus we obtain a hierarchy of subgraphs, with a node at height i in the hierarchy representing both a vertex of G_i and a subgraph of G_{i−1}, whose children are its component vertices, and whose parent is the subgraph of G_i of which this node is a part. It is clear that this hierarchy is a tree, since every node has one parent, and there are no links other than parent-child ones. Call this hierarchy T.

6.6.2 Building the tree

It turns out that building T layer by layer, minor by minor, will not be as efficient as building it in postorder [10]. Recall that in a postorder traversal, all children have lower traversal numbers than their parents, and left children have lower traversal numbers than right children.


Therefore the first subgraph we want to define is the one at the bottom of the leftmost path of T, which is a vertex of G_0 (all leaves of T are vertices of G_0) – call it v_0. This is rather trivial, so we "visit" its siblings (by defining them to be subgraphs in 𝒞_{−1}), also vertices of G_0, until we run out of siblings and "visit" the parent. Having come at last to the parent, we know exactly which vertices are in it, and so we are able to gather them up and put the entire component in 𝒞_0.

Now the parent C has siblings too, so we start again, by visiting vertices of G_0 until we have visited enough and are able to define another component C′. When we have determined that we have defined enough subgraphs of G_0 to make a subgraph of G_1, we stop and throw all the previously-defined Cs, which are subgraphs of G_0 but, more importantly, vertices of G_1, into a component and put it into 𝒞_1. Then we start again at the bottom, with a vertex of G_0.

We are not building T so much as discovering it, and recording the nodes we

discover. There are still two unknowns, however. One, how do we know that a node

has “run out of children,” and may be visited? Two, how do we know which sibling

to visit next?

6.6.3 Determining which sibling to visit next

While building a component of G_0, this is an easy question, and the answer is to take the lightest edge coming out of the component. With subgraphs of later minors, the same is still true. After finishing a vertex of G_i (a subgraph of G_{i−1}), we need to find an appropriate vertex of G_0 with which to start again. We will do this with soft heaps. Each node of T on the active path – the path from the root G_N to the node just visited – maintains a heap (actually several heaps, as we will see later) that stores the vertices of G_0 to which its known descendant leaves are adjacent. If v is the vertex of G_i which we have just visited, and C_v is its expansion in G_0, let u be the parent of v in T, i.e. u is the subgraph of G_i of which v will become a part. Then the heap associated with u now tracks every vertex in C_v, keyed by the weight of the lightest edge that leads out of C_v. In addition, all ancestors of u also keep similar heaps. Note that a heap for a node in T only exists if we have visited one of its children. The next vertex of G_0 that we visit will be the min element from all these heaps. Let this vertex from G_0 be v_0.

However, we don’t always just start a new bottom-level component with v

0

and

propagate the built components back to v. We also keep track of the min-link

between every pair of components on the active path. The min-link between a node

v and its ancestor w is the lightest edge between v and any visited relative for which

w is the lowest common ancestor. The min-links keep track of internal edge costs,

39

while the heaps keep track of external costs. At all times we want to maintain the

following invariant:

Invariant1 The next edge taken is lighter than all the min-links in the active path.

At the time v_0 is selected, the edge leading to it may be heavier than an existing min-link. To preserve the invariant, we contract subgraphs until the edge is indeed lighter than any min-link. That is, if w is the highest ancestor such that min-link(w, z) is heavier than the edge we selected, then for every partially completed subgraph z between u and w, call z finished and put it in the appropriate 𝒞_i, except the direct child of w. All of these z, then, will have no other children. We have to do something special with the direct child w′ of w, since we don't want to trigger the signal that causes the algorithm to think w is finished before we get a chance to add v_0 (this will be explained below). The min-link between w and w′ leads out of w′ and into another child w″ of w. Take these two children of w, and create a new child node fuse(w′, w″) whose children are w′ and w″.

At this point we are ready to add v_0 to the newest bottom-level component.

Note that, because of the invariant, the min-links coming out of a higher node in T are always heavier than the min-links coming out of lower nodes.

6.6.4 When a node runs out of children

One reason we decide that a node has no more children and thus should be visited was described in the last section, when a component's growth is cut short to preserve the invariant. The only other time we stop and decide to finish visiting a node is when the subgraph gets big enough. Specifically, a node that is a subgraph of G_i has no more than A(t, i + 1) children. The parameter t is defined to be

t = min{ t′ : A(t′, ⌈(m/n)^{1/4}⌉) ≥ n }.

Remark. The leftmost child of any node terminated its growth due to the size constraint, because while that child's subtree was being traversed, the parent had no other descendants that were not in the child. This means that any non-terminal node has at least one child of size A(t, i + 1), where G_i is the minor to which the node belongs.

Remark (2). The previous remark implies that the total number of vertices in G_i that did not end up in one-element subgraphs (resulting from premature termination during expansion) is no more than 2n/A(t, i). Pettie proves this in [10].


6.6.5 Data structures and corruption

Each component on the active path maintains a list of soft heaps. It should be clear that there is only one component per minor under construction at a time. Let X_i denote the active component for G_i. Note: Chazelle and Pettie number the X_i going in the opposite direction, with the component of G_0 being X_k, and the sole node at the root of T being X_0.

Recall that the number of corrupt items in a soft heap is at most εN, where N is the total number of inserts. As Pettie and Chazelle both point out, once we delete K items from the heap, the heap is free to corrupt another K elements without violating its εN corruption constraint. To alleviate the amount of corruption that would be caused by continuously deleting and re-inserting elements into several heaps, we instead maintain many different heaps.

X_i maintains a heap H(i) and additional heaps H_j(i) for all j > i, as well as a special heap H_∞(i). An edge is put into H_j(i) if the endpoint not in X_i is also incident to X_j via another edge, and not incident to any X_l for i < l < j. If an edge is in H_∞(i) then it is not incident to any ancestor X_j. An edge is put into H(i) if its other endpoint is already accounted for in one of the H_j(i)s.

After we grab a new vertex v from G_0, we insert all its incident border edges into the appropriate heap. In addition, adding v to the current component changes some edges from external to internal. We delete those from their respective heaps.

When we finish visiting a node and its descendants, we put all edges in the heaps maintained by X_i into the appropriate X_j heap and discard corrupt edges in X_j. If an edge is eligible for H(i + 1) then it is inserted there; otherwise redundant edges are threshed out and H_j(i) is melded with H_j(i + 1). There are further details involving finding the minimum edge among redundant edges; the reader is invited to look at [4] and [10].

In addition, X_i maintains a list of min-links for all j < i. Chazelle notes that, every time we finish visiting a node, or grab a new one, the min-links can be updated in time quadratic in the length of the active path.

6.6.6 Error rate and running time

Setting ε = 1/8 gives us a total of at most m/2 + d³·n corrupt border edges, since on average each edge is inserted and re-inserted into a heap at most four times. The total cost of the heap operations – the inserts, deletes, melds, and comparing min-links – is O(m·log(1/ε) + d²·n), and the min-links contribute O(m) time [10].


6.7 Correctness

As noted previously, it suffices to prove that every component of every minor is weakly contractible. We can use an inductive argument due to Pettie [10]. We induct on the number of vertices in X_i: suppose we know X_i is weakly contractible before we find a new child v. Invariant 1 ensures that for any two edges coming out of X_i, the lightest path in X_i is composed of edges lighter than at least one of the two incident edges. This is enough to ensure weak contractibility, since it fulfils the hypotheses of Claim 6.3.1.

6.8 Runtime

6.8.1 Density games

Recall that t is min{ t′ : A(t′, ⌈(m/n)^{1/4}⌉) ≥ n }. We pick D_0 = 2t and do an initial pass to raise the density to D_0. As noted above, this takes O(m + n·log t) ≤ O(mt) time.

Furthermore, every time we preprocess to raise the density, we pick D = m′/n′, where m′, n′ are the parameters of the graph passed in to the recursive call. This ensures that, for every recursive call, D ≥ 2t.

6.9 And, Finally

Going back to 6.5.1, we see that

m_R + m_L − m = m_B·(1 + 1/D) + m_L − m = m_B/D ≤ m/(2D).

If we set f(m, n) to be t as calculated in this section, then (8) becomes

(m_R + m_L − m)·t − m_L + (b/c)·m ≤ (m/(2D))·t − m_L + (b/c)·m ≤ m/4 + (b/c)·m − m/2,

from D ≥ 2t and m_L + m_B = m. Choosing b ≤ c/4 completes the requirements.

7 The Optimal One

The two main appeals of this algorithm are that


1. It’s simple(r than the last one).

2. It’s theoretically interesting because it shows that a minimum spanning tree

can be found in time proportional to the least number of comparisons needed,

on a pointer machine.

We will refer to this algorithm as MST-decision-tree.

7.1 Decision trees and optimality

We’ve all worked with decision trees at some level. A decision tree can chart the

course of an algorithm, with each internal node representing a possible branching

point, and each leaf containing a possible output. In the case of sorting a list, for example, each comparison of two elements is a node with two children, one for the ≤ result and one for the > result, and each leaf holds an ordering of the list, which is in sorted order if the decision tree is correct.

In general, pointer-machine MST algorithms have binary comparison as their

basic action. Any instance of a deterministic algorithm can be distilled into a decision

tree with the internal nodes representing edge weight comparisons, and two children

per node.

Let’s take a look at Kruskal’s algorithm. Every time two weights are compared

during the initial sort there is a node and a two possible child paths. If we know

which edges are present (and therefore know beforehand which edges will create

cycles if they are added), then the sort order fully determines the MST. Then, we

can say that Kruskal’s algorithm really represents a class of decision trees, or a way

to generate a decision tree for an input graph topology.

The height of a decision tree is the maximum length of a path in the tree. That is, on a decision tree for a particular unweighted graph, the height is the number of comparisons we make in the worst-case permutation of edge weights. We shall say a decision tree is optimal if it is correct and there is no correct decision tree of lesser height. Let T*(U) denote the optimal decision tree height for an unweighted graph U. Kruskal's does not always generate an optimal decision tree. For example, given a connected graph of n vertices and n − 1 edges, Kruskal's makes at least (n − 1) log(n − 1) comparisons during the initial sort, when it could have just returned the set of all edges without doing any work!

Call the class of a graph G the set of all graphs with the same number of edges and vertices, denoted by 𝒢_{m,n}. For reference, there are ((n choose 2) choose m) such graphs in 𝒢_{m,n}.

We are interested in all the decision trees generated by MST-decision-tree for any particular class. Define T*(m, n) to be max{ T*(U) : U ∈ 𝒢_{m,n} }. That is, if some


hypothetical MST algorithm makes the optimal number of comparisons for each graph, T*(m, n) is the worst-case number of comparisons possible for a graph with m edges and n vertices. The big result of [11] and this section is that there is an algorithm, MST-decision-tree, that runs in O(T*(m, n)) time for any graph in 𝒢_{m,n}.

The hypothetical algorithm above makes the optimal number of comparisons for any graph G. However, this should not be taken to mean that Pettie and Ramachandran's algorithm does the same. The promise is that the number of comparisons and the amount of time taken for a graph with m edges and n vertices is under T*(m, n), which depends only on the class, not the individual graph. For example, one graph in 𝒢_{100,100} is the cycle on 100 vertices, in which case 100 comparisons are needed to find the edge to exclude. On the other hand, if G is a path on 100 vertices and an extra edge between the last vertex on the path and the third-to-last, then the longest cycle is three edges long and only three comparisons are needed. MST-decision-tree guarantees the same time bound for both.

7.1.1 Breaking up the decision tree

Lemma 14. Suppose G is a graph with m edges and n vertices, and let 𝒞 be an edge-disjoint collection of subgraphs. Then

Σ_{C∈𝒞} T*(C) ≤ T*(m, n).

Sketch of proof. The main idea of this proof is that by taking the union of all the Cs we create a graph H that 1) has at most m edges and n vertices, and 2) has MST equal to the union of all the MST(C)s. T*(H) is then clearly below T*(m, n). The second main idea is that, by stacking the optimal decision trees for the Cs, we can create an optimal decision tree for H. For details, see [11].

This will allow us to recurse on strongly contractible subgraphs without messing up the time bound.

7.2 DenseCase

Pettie and Ramachandran note that several previous superlinear algorithms can be guaranteed to run in linear time if the graphs are kept sufficiently dense.

DJP with a Fibonacci heap, for example, runs in O(m + n log n) time, so by ensuring that k(m/n) ≥ log n for some fixed k, we also make sure that n log n = O(m), so the entire thing runs in O(m). For the MST-hierarchical procedure described in the previous section, log n < A(k, ⌈m/n⌉) =⇒ α(m, n) < k, bringing the O(mα(m, n)) bound down to O(km) = O(m). For this algorithm, Pettie and Ramachandran single out a relatively simple algorithm with an easy density requirement by Fredman and Tarjan [7], introduced in the same paper that debuted the Fibonacci heap (also described in 3.3). It runs in time O(mβ(m, n)), where β(m, n) = min{ i : log^{(i)} n ≤ m/n }, so we only need m/n ≥ log^{(k)} n for β(m, n) ≤ k. We'll call the Fredman-Tarjan algorithm DenseCase. DenseCase may also operate on graphs that have self-loops and multiple edges without affecting the running time analysis.
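For concreteness, here is a tiny Python sketch of β, under the assumption of base-2 iterated logarithms (with log^{(0)} n = n):

    import math

    def beta(m, n):
        # beta(m, n) = min{ i : log^(i) n <= m/n }
        i, x = 0, float(n)
        while x > m / n:
            x = math.log2(x)
            i += 1
        return i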

However, when speaking of the asymptotic runtime, we must account for graphs

that do not meet these requirements. Therefore we’ll only run DenseCase after

enough processing to guarantee the density requirement.

7.3 Building and storing decision trees

7.3.1 Keep the parameters small

Pettie and Ramachandran calculate a set of optimal decision trees for all graphs on r vertices. For any number r, the time needed to build all the decision trees on r vertices is a little horrendous. Pettie and Ramachandran go over this calculation, but in short it is upper-bounded by 2^{2^{4r²}}. This number was obtained by hypothesizing a brute-force calculation – building all possible decision trees for all possible graphs on r vertices, testing them for correctness by trying out every permutation of edge weights, and eventually taking the shortest correct tree. However, they also note that if r < log^{(3)} n, then the entire calculation runs in O(n)!

7.3.2 Emulating table lookups

This subsection explains how, given k subgraphs, each of which has r or fewer vertices, we can find the corresponding decision trees in O(kr² + n) time, for r < log^{(3)} n. These methods are unique to the pointer machine model, and are only relevant when we can't do table lookups – the important thing to take away is the final running time of this retrieval process and the fact that we are able to achieve this time on a pointer machine.

We’ve built 2

(

r

2

)

decision trees, and now we need to be able to use retrieve them

when they are needed. The obvious strategy is to store them in a table, and simply

access the table entry when we need to. However, the pointer machine model disal-

lows such a method, or any method requiring the computation of a machine address.

Theorists running pointer machines have found a way to emulate table lookups by

sorting. Sorting takes longer than a table lookup, but under certain circumstances

it can run fast enough.


The intuition is this: if I have N things that are orderable, and I have another

thing that is identical to exactly one of my N things, then if I sort these N+1 things,

then my extra thing should show up next to the original thing it matches. Thus the

time it takes to ﬁnd the original thing is the time it takes to sort the collection of

N + 1 things. Clearly if I only had one thing to look up, I wouldn’t bother with the

sort; I would just scan the original list of N things. But if I have k extra things, I

can play the same trick, and scan through the sorted string once to ﬁnd the matches

to all my query objects. So the time it takes to ﬁnd matches for k extra things is

the time to sort a total of N +k things.

Buchsbaum et al. [3] encode a graph on r vertices as a string of r² symbols, basically by listing the edges present and padding short strings with nulls.

Then we throw our 2^{(r choose 2)} original graphs and k query graphs together and perform a bucket sort, returning our items in lexicographic order (there is a natural ordering on the vertex identifiers, and the encodings are basically strings of vertex identifiers).

How long does this take? The bucket sort performs r² passes, one for each symbol in an encoding. In one pass we need to put each of our elements to be sorted into a bucket, and we have 2^{(r choose 2)} + k items. The total time taken is O(r²·2^{r²} + r²·k) = O(n + r²·k).

As a ﬁnal note, we can implement bucket sort with linked lists instead of arrays,

ensuring that we do not violate the rules of the pointer machine. See [3] for details.
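The following Python fragment is a toy rendering of the trick just described. The encodings stand in for the r²-symbol strings of [3], and Python's built-in sort stands in for the r²-pass bucket sort; everything else about the scan matches the intuition above.

    def lookup_by_sorting(precomputed, queries):
        # precomputed: list of (encoding, decision_tree) pairs, one per graph;
        # queries: list of encodings, each matching some precomputed encoding
        tagged = [(enc, 0, tree) for enc, tree in precomputed] + \
                 [(enc, 1, i) for i, enc in enumerate(queries)]
        tagged.sort(key=lambda item: (item[0], item[1]))
        answers = [None] * len(queries)
        last_tree = None
        for enc, kind, payload in tagged:
            if kind == 0:
                last_tree = payload            # remember this graph's decision tree
            else:
                answers[payload] = last_tree   # a query sorts right after its match
        return answers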

7.4 Partitioning

We have three components that seem relevant: DenseCase, the optimal decision trees, and the strong contraction rule. As the last step we will show how to find an edge-disjoint collection of subgraphs 𝒟 such that

1. For some corruption G′ of G, all D ∈ 𝒟 are strongly contractible with respect to G′.

2. Every vertex in G falls in at least one D ∈ 𝒟.

3. Let ˜𝒟 be the collection of subgraphs obtained by merging any two subgraphs of 𝒟 that share a vertex; ˜𝒟 is the set of connected components of ∪_{D∈𝒟} D. Then every element of ˜𝒟 has at least log^{(3)} n vertices.

4. The process of finding 𝒟 takes O(m) time.


7.4.1 Relevance

We can use DenseCase. Suppose 𝒞 is a collection of subgraphs that partitions the vertices of G, such that every C ∈ 𝒞 has at least log^{(3)} n vertices. First, contracting G across 𝒞 without removing redundant edges yields a graph with m edges and n′ < n/log^{(3)} n vertices:

n′ < n/log^{(3)} n  =⇒  m/n′ > (m/n)·log^{(3)} n  =⇒  m/n′ > log^{(3)} n.

Thus we can run DenseCase on the contracted graph, and it will ﬁnish in O(m).

(Even if we do remove duplicate edges, running DenseCase on the cleaned-up graph

will be strictly faster than running it on the more complex graph. The stipulation

that we do not clean up the graph prior to passing it in to DenseCase is purely to

make the analysis simpler.)

We can use decision trees. Now if 𝒟 is an edge-disjoint collection of subgraphs such that every D ∈ 𝒟 has at most log^{(3)} n vertices, then we have a precomputed optimal decision tree for every subgraph in 𝒟. So we can find ∪_{D∈𝒟} MST(D) in

Σ_{D∈𝒟} T*(D) < T*(m, n)

time, by Lemma 14.

Furthermore, if {D_1, D_2, . . . , D_j} is any subset of 𝒟, then the MST of their union is MST(D_1) ∪ MST(D_2) ∪ . . . ∪ MST(D_j), since they are edge-disjoint. So when we create ˜𝒟 by merging subgraphs that share vertices, we don't need to do any extra work to find the MSTs of the subgraphs in ˜𝒟.

Now if we add the requirement that 𝒟 covers every vertex, then by combining any components of 𝒟 that share vertices, we can obtain a set of subgraphs that partition the vertices, as required above.

7.4.2 Finding partitions

We perform one Fredman-Tarjan iteration, described in 3.3, with two modifications: we use a soft heap instead of a Fibonacci heap, and instead of stopping the growth of a component when the heap gets too large, we stop growth when the component size reaches r = ⌈log^{(3)} n⌉. In addition, after we finish growing a component, we store the set of vertices in that component and put that set in 𝒟. The use of a soft heap entails corruption; after finishing a component, we discard all border edges that have been corrupted. Another consequence of using a soft heap is that there is no decreasekey operation. Instead, we just insert all the edges we find and trust that the heap will return the "minimum"-weight edge.

Since a Fredman-Tarjan iteration doesn't stop until all vertices are marked, i.e. put into a component, 𝒟 covers all vertices. Components stop growing when or before they reach r vertices, so every member of 𝒟 has r or fewer vertices, making it eligible to have its optimal decision tree applied. Finally, every component in 𝒟 stopped growing either when it reached r vertices, or when it collided with another component. As in the Fredman-Tarjan iteration, the first component of a set of components linked by shared vertices must have reached its mature size, so that entire set of components must collectively have r or more vertices.

The use of a soft heap ensures that the procedure runs in O(m) time, and also that it generates a corruption G′ of G. The corrupted graph G′ has at most 2εm corrupt edges, since every edge is inserted at most twice.
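A sketch of this modified iteration in Python, with the standard-library heapq standing in for the soft heap (so the corruption behavior is not modeled, only the control flow of growing and stopping components):

    import heapq

    def partition(adj, n, r):
        # adj: adjacency lists {v: [(weight, u), ...]} on vertices 0..n-1;
        # r: the size at which a component stops growing
        D, marked = [], set()
        for start in range(n):
            if start in marked:
                continue
            comp = {start}
            marked.add(start)
            heap = list(adj[start])
            heapq.heapify(heap)
            while len(comp) < r and heap:
                w, u = heapq.heappop(heap)   # "minimum"-weight border edge
                if u in comp:
                    continue                 # edge became internal; discard it
                comp.add(u)
                if u in marked:
                    break                    # collided with an earlier component
                marked.add(u)
                for e in adj[u]:
                    heapq.heappush(heap, e)
            D.append(comp)
        return D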

7.5 Putting things together

We are now ready to put everything together. The algorithm is as follows (a code sketch follows the list):

1. Precompute the decision trees for all graphs with fewer than log^{(3)} n vertices. Store the result in the variable dectree.

2. Run Partition. Store the corrupted graph in G′ and the collection of subgraphs in 𝒟.

3. Use the sorting trick to retrieve the decision trees of the graphs in 𝒟.

4. Apply the decision trees to get the MST of each subgraph. The result is ∪_{D∈𝒟} MST(D).

5. Combine subgraphs that share vertices to get a new collection of subgraphs ˜𝒟.

6. Contract G across ˜𝒟 to get G\˜𝒟. Remove the bad edges M_˜𝒟 to get G\˜𝒟 − M_˜𝒟.

7. Run DenseCase(G\˜𝒟 − M_˜𝒟) to get MST(G\˜𝒟 − M_˜𝒟).

8. Two Borůvka iterations.

9. Recurse: MST-decision-tree(∪_{D∈𝒟} MST(D) ∪ MST(G\˜𝒟 − M_˜𝒟) ∪ M_˜𝒟).
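Read as pseudocode, the steps line up as in the sketch below. Every subroutine named here is assumed from the preceding sections and passed in via `sub` rather than implemented, so the function only documents the control flow:

    def mst_decision_tree(G, sub):
        # sub bundles the assumed subroutines from sections 7.2-7.4
        if sub.is_trivial(G):                       # termination guard (assumed)
            return sub.solve_directly(G)
        dectree = sub.precompute_decision_trees(G)           # step 1
        G_corrupt, D = sub.partition(G)                      # step 2
        trees = sub.lookup_by_sorting(dectree, D)            # step 3
        msts = [sub.apply_tree(t, d) for t, d in zip(trees, D)]      # step 4
        D_tilde = sub.merge_shared_vertices(D)               # step 5
        H = sub.contract(G, D_tilde, drop=sub.bad_edges(G_corrupt))  # step 6
        mst_H = sub.dense_case(H)                            # step 7
        E = sub.union_edges(msts, mst_H, sub.bad_edges(G_corrupt))
        forced, E = sub.boruvka(E, iterations=2)             # step 8
        return forced + mst_decision_tree(E, sub)            # step 9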


7.6 Time complexity

The time taken for each step is as follows:

1. Precomputing decision trees – O(n).

2. Partitioning graph – O(m + n). Contracting graph – O(m + n).

3. Sorting – O(m + n).

4. Applying decision trees – O(T*(m, n)).

5. Finding connected components of ∪𝒟 – O(m + n).

6. Contracting across ˜𝒟 – O(m + n).

7. DenseCase – O(m).

8. Borůvka iterations – O(m + n).

9. Recursion – O(T*(m/2, n/4)).

References

[1] Otokar Borůvka. Wikipedia.

[2] A.M. Ben-Amram. What is a pointer machine? ACM SIGACT News, 26(2):88–

95, 1995.

[3] Adam L. Buchsbaum, Haim Kaplan, Anne Rogers, and Jeﬀery R. Westbrook.

Linear-time pointer-machine algorithms for least common ancestors, MST verification, and dominators. In STOC '98: Proceedings of the thirtieth annual ACM

symposium on Theory of computing, pages 279–288, New York, NY, USA, 1998.

ACM.

[4] Bernard Chazelle. A minimum spanning tree algorithm with inverse-ackermann

type complexity. J. ACM, 47(6):1028–1047, 2000.

[5] Bernard Chazelle. The soft heap: an approximate priority queue with optimal

error rate. J. ACM, 47(6):1012–1027, 2000.

[6] Jason Eisner. State-of-the-art algorithms for minimum spanning trees - a tutorial

discussion. Master’s thesis, University of Pennsylvania, 1997.


[7] Michael L. Fredman and Robert Endre Tarjan. Fibonacci heaps and their uses

in improved network optimization algorithms. J. ACM, 34(3):596–615, 1987.

[8] Michael L. Fredman and Dan E. Willard. Trans-dichotomous algorithms for

minimum spanning trees and shortest paths. Journal of Computer and System

Sciences, 48(3):533 – 551, 1994.

[9] David R. Karger, Philip N. Klein, and Robert E. Tarjan. A randomized linear-

time algorithm to ﬁnd minimum spanning trees. J. ACM, 42(2):321–328, 1995.

[10] Seth Pettie. Finding minimum spanning trees in O(mα(m, n)) time. Technical report, The University of Texas at Austin, 1999.

[11] Seth Pettie and Vijaya Ramachandran. An optimal minimum spanning tree algorithm. J. ACM, 49(1):16–34, 2002.
