– Aggregate method – Make sure you spend less in any k days than you earn
in those k days. Basically, keep an eye on your average costs and keep
things in balance.
– Accounting method – Plan ahead by storing credit, and spend as and when
necessary. (Covered in recitation next week.)
– Charging method – You run things on credit. Basically you can retroac-
tively charge past transactions. (We won’t cover this.)
– Potential method – Look at the future to see how much your investments
will cost (how much your potential cost increases each time you do some-
thing), and plan your spending accordingly. A large potential means your
data structure can handle a large cost later. So when you perform a
simple operation of cost c, you also check your data structure to
see whether the potential for an expensive operation has increased.
– Table doubling
– Union-Find
Table doubling
(Recall from 6.006) We want to store n elements in a table of size m = Θ(n). One
idea is to double m whenever n becomes larger than m (due to insertions). The cost
to double a table of size m is clearly Θ(m) = Θ(n), which is also the worst-case cost
of an insertion.
But what is the total cost of n insertions? It is at most
2^0 + 2^1 + 2^2 + · · · + 2^⌈lg n⌉ = Θ(n).
In this case, we say each insertion has Θ(n)/n = Θ(1) amortized cost.
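As a sanity check on this aggregate bound, here is a small Python sketch (mine, not from the notes) that counts how many element copies n insertions into a doubling table trigger:

```python
def insert_all(n):
    """Insert n elements into a table that doubles when full;
    return the total number of element copies performed."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:     # table is full: double it
            copies += size       # copying every element costs Theta(size)
            capacity *= 2
        size += 1                # the insertion itself is O(1)
    return copies

# The doublings cost 1 + 2 + 4 + ... < 2n in total, so each of the
# n insertions has O(1) amortized cost.
for n in (10, 1000, 10**5):
    assert insert_all(n) < 2 * n
```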
Lecture 4 Union-Find and Amortization 6.046 Spring 2018
1 Union-Find
Motivating Problem
We want to maintain a dynamic collection of disjoint sets S = {S1, S2, . . . , Sn}, each
with one (arbitrary) representative element Rep[Si] per set. Why would we want to
do this? Because it lets us find connected components in graphs quickly and easily.
Here are some applications:
• Unification (look it up!)
• Networked computers
• Image processing
Simple implementation
Just have a bunch of linked lists; one for each set. The head of the list is the
representative.
Operations
Make-Set(x): Add a set {x} containing a single element, x, with x being the rep-
resentative of that set. Runtime O(1).
Find-Set(x): Returns representative of the set S(x) containing the element x. Run-
time O(n); we are given a pointer to x, and we must walk from x back to the head of the list.
Union(x, y): Replaces sets S(x) and S(y) containing elements x and y, respectively,
with a new set containing the union of the two sets, which has a single representative.
(Assuming S(x) and S(y) are distinct.) Runtime O(n) – we have to travel to the
tail of one list and to the head of the other, and join the two lists (a head and a tail
pointer need to change); then we need to set a new representative.
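The list operations above might be sketched as follows (a sketch under my own naming, not code from the notes); each element keeps a `prev` pointer toward the head, and Find-Set and Union both pay O(n) to walk the list, matching the runtimes above:

```python
class Node:
    """One element of a set, stored in a doubly linked list whose
    head is the set's representative."""
    def __init__(self, value):
        self.value = value
        self.prev = None   # toward the head; None => this node is the head
        self.next = None   # toward the tail

def make_set(x):           # O(1)
    return Node(x)

def find_set(node):        # O(n): walk back to the head
    while node.prev is not None:
        node = node.prev
    return node            # the head is the representative

def union(x, y):           # O(n): walk to the tail of S(x), head of S(y)
    tail = x
    while tail.next is not None:
        tail = tail.next
    head_y = find_set(y)
    if head_y is find_set(x):
        return head_y                  # already the same set
    tail.next = head_y                 # join the two lists
    head_y.prev = tail                 # S(x)'s head now represents everyone
    return find_set(x)
```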
Improved implementation: augment each element with a pointer directly to the head
of its list (its representative), and store a tail pointer and the length at the head.
Make-Set(x): Runtime O(1).
Find-Set(x): Runtime O(1); just follow the element's head pointer.
Union(x, y): First, we find the tail of S(x) and the head of S(y), and concatenate the two lists
– all of these are O(1). Now we need to update all the pointers in S(y) to point to
the new head; this is O(n). The adversary could ask for m(0), m(1), . . . , m(n − 1),
then u(1, 0), u(2, 0), . . . , u(n − 1, 0), where m denotes Make-Set and u denotes
Union; each u(i, 0) would take O(i).
This would mean that the total time could be O(n + ∑_{i=1}^{n−1} i) = O(n^2). How could
we improve this?
We could always concatenate smaller lists into larger ones (we’d need to maintain
the length at the head). This would make each union O(1) in the example above, but
a canny adversary could still construct balanced sets and force a single union to cost
Θ(n).
However, how common is it that a sequence of operations on n elements involves
taking Union of sets of size Θ(n)?
Let n = total # of elements (Make-Set operations), and let m = total # of
operations (clearly, m ≥ n). We claim that the cost of all unions is O(n log n), and
the total cost of all operations is O(m + n log n).
Proof. Consider a single element u, and how many times it might need to update its
head pointer. When created, S(u).length = 1. Whenever S(u) merges with another
set S(v), if S(u).length > S(v).length, we change S(v)’s pointers, in which case,
there is no cost to u. Otherwise, we update the head pointers in S(u), which means
that the length of this new set is at least double that of the old S(u).
So, each time we pay any cost to update u’s pointer, the length of the set to which
u belongs at least doubles. This length cannot exceed n and never decreases, so the
head pointer of u is updated ≤ log n times throughout all the unions! Thus, total
cost for all n elements is O(n log n). Now, there could be a huge number of other
operations, so we add m for those cases, giving us O(m + n log n), or O(m log n); so,
dividing by m, in some “average” sense, the cost of each operation is O(log n). We
call this the amortized cost.
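The weighted-union variant just analyzed might look like this in Python (a sketch; the names and representation are mine): each element stores its head directly, Find-Set is O(1), and Union relabels the smaller set, so each element is relabeled at most log n times:

```python
head = {}      # element -> head (representative) of its set
members = {}   # head -> list of all elements in its set

def make_set(x):                 # O(1)
    head[x] = x
    members[x] = [x]

def find_set(x):                 # O(1): direct head pointer
    return head[x]

def union(x, y):                 # O(min(|S(x)|, |S(y)|))
    hx, hy = find_set(x), find_set(y)
    if hx == hy:
        return
    if len(members[hx]) < len(members[hy]):
        hx, hy = hy, hx          # always relabel the smaller set
    for z in members[hy]:
        head[z] = hx             # z pays only when its set at least doubles
    members[hx] += members[hy]
    del members[hy]

# Example: the adversarial sequence m(0), ..., m(n-1), u(1,0), ..., u(n-1,0).
n = 8
for i in range(n):
    make_set(i)
for i in range(1, n):
    union(i, 0)
```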
Unfortunately, the worst-case behavior is still O(n) per operation. We can do better
by representing each set as a tree, with the representative at the root and each
element storing a parent pointer.
Idea 1: As before, merge shorter trees into taller ones, maintaining height of tree at
root. Worst-case cost of each operation is O(log n); we can prove this using induction
on the height of the tree.
Idea 2: Path compression. Whenever we touch a node, say, on a Find-Set, as we
walk up the tree to the root, we also flatten the tree by redirecting the parent pointer
of each node we visit to the root.
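Combining the two ideas gives the standard forest implementation (union by rank with path compression); the notes include no code here, so this sketch fills in the usual textbook details:

```python
parent = {}
rank = {}    # rank: an upper bound on the height of the tree at a root

def make_set(x):
    parent[x] = x
    rank[x] = 0

def find_set(x):
    if parent[x] != x:
        parent[x] = find_set(parent[x])   # path compression: hang x off the root
    return parent[x]

def union(x, y):
    rx, ry = find_set(x), find_set(y)
    if rx == ry:
        return
    if rank[rx] < rank[ry]:
        rx, ry = ry, rx                   # Idea 1: merge shorter into taller
    parent[ry] = rx
    if rank[rx] == rank[ry]:
        rank[rx] += 1                     # heights were equal, so it grew by one
```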
We claim that the amortized cost of each of the m operations is O(log n). (See later
for a more detailed treatment of this method.) We define a potential function φ
mapping the data structure to a non-negative number, and set
magicCost = realCost + ∆φ. Summing over all operations,

∑ magicCost = (∑ realCost) + φ(final) − φ(initial).

Here we select φ(DS) = ∑_u log(u.size), where the sum is over the representatives u.
Make-Set is still O(1), since there is no change in potential (we start at 0, and
log 1 = 0). Union is basically two Find-Sets plus the cost of linking two trees:

magicCost_linking ≤ 1 + log(Rep[S(u)].size + Rep[S(v)].size) − log(Rep[S(u)].size) ≤ O(log n).
2 Amortization Methods
We will touch upon these briefly during class as we discuss variants of Union-Find.
The text below goes into a little bit more detail with a couple of examples which will
not be covered in class.
Aggregate Method
The method we used in the above analysis is the aggregate method: just add up the
cost of all the operations and then divide by the number of operations. This is the
simplest method, and it may not be able to analyze some complicated algorithms.
In general, any assignment of amortized costs is valid as long as

∑ amortized cost ≥ ∑ actual cost,

where the sum is taken over all operations. This is because we mainly care about
using the amortized cost as an upper bound on the actual cost.
Accounting Method
This method allows an operation to store credit into a bank for future use, if its
assigned amortized cost > its actual cost; it also allows an operation to pay for its
extra actual cost using existing credit, if its assigned amortized cost < its actual cost.
Table doubling
For example, in table doubling:
– if an insertion does not trigger table doubling, store a coin representing c = O(1)
work for future use.
– if an insertion does trigger table doubling, there must be n/2 elements that are
inserted after the previous table doubling, whose coins have not been consumed.
Use up these n/2 coins to pay for the O(n) table doubling. See figure below.
– amortized cost of a table-doubling insertion: O(n) − c · n/2 ≤ 0 for large enough c.
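To see that the coins really cover every doubling, we can simulate the bank. This sketch (my own, not from the notes) uses a mild simplification in which every insertion, including the one that triggers a doubling, deposits c coins, with the hypothetical choice c = 2; the assertion checks the bank never goes negative:

```python
def simulate(n, c=2):
    """Insert n elements, depositing c coins per insertion and paying
    each doubling of a size-m table with m banked coins."""
    capacity, size, bank = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            bank -= size      # pay the Theta(size) copying cost from the bank
            assert bank >= 0, "bank went negative: c is too small"
            capacity *= 2
        bank += c             # every insertion deposits c coins
        size += 1
    return bank

simulate(10**4)               # runs without tripping the assertion
```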
Charging Method
The charging method allows operations to charge cost retroactively to past operations.
(You don’t need to know this method for this course, but it is included here for
completeness.)
Potential Method
This method defines a potential function Φ that maps a data structure (DS) configuration
to a value. This function Φ is equivalent to the total unused credit stored up in the
accounting method,
and

∑ amortized cost = ∑ actual cost + Φ(final DS) − Φ(initial DS).
In order for the amortized bound to hold, Φ should never go below Φ(initial DS)
at any point. If Φ(initial DS) = 0, which is usually the case, then Φ should never go
negative (intuitively, we cannot "owe the bank").
The biggest challenge in the potential method is finding a potential function Φ
for which Φ(D_n) − Φ(D_0) is small (where D_0 and D_n are the initial and final
configurations), so that it gives a tight bound on the total actual cost. While this is
generally a nontrivial problem, the following intuition is helpful.
Intuition: Choose your potential Φ so that Φ decreases by a large amount
when an expensive operation is performed on your data structure.
Binary counter
A toy example of the potential method is incrementing a binary counter. E.g.,
0011010111
increment ↓
0011011000
Cost of an increment is Θ(1 + t), where t is the number of trailing 1 bits (each of
them flips to 0). So the intuition is that 1 bits are bad.
Define Φ = c · (total number of 1 bits in the counter). An increment destroys t ones
and creates one, so ∆Φ = c(1 − t); then for large enough c, the amortized cost
Θ(1 + t) + c(1 − t) of each increment is O(1).
Φ(initial DS) = 0 if the counter starts at 000 · · · 0. This is necessary for the above
amortized analysis. Otherwise, Φ may become smaller than Φ(initial DS).
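The counter analysis can be checked numerically. In this sketch (representation assumed: a Python list of bits, least significant first), the actual cost is the number of bits flipped and the potential counts 1 bits with c = 1, which already makes each increment's amortized cost exactly 2:

```python
def increment(bits):
    """Increment the counter; return the actual cost (bits flipped)."""
    cost, i = 0, 0
    while i < len(bits) and bits[i] == 1:
        bits[i] = 0                 # flip the trailing 1s to 0
        cost += 1
        i += 1
    if i == len(bits):
        bits.append(1)              # counter grows by one bit
    else:
        bits[i] = 1
    return cost + 1                 # ... plus the single 0 -> 1 flip

def phi(bits):
    return sum(bits)                # potential: number of 1 bits (c = 1)

bits = [0]
total_actual = total_amortized = 0
for _ in range(1000):
    before = phi(bits)
    cost = increment(bits)
    total_actual += cost
    total_amortized += cost + phi(bits) - before   # actual + delta-phi

assert total_amortized == 2 * 1000  # each increment: (1 + t) + (1 - t) = 2
assert total_actual <= total_amortized
```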