
# ALGORITHMS DESIGN TECHNIQUES AND ANALYSIS


05/05/2013


We have shown before that the worst case running time required to process
an interspersed sequence $\sigma$ of $m$ union and find operations using union by
rank is $O(m \log n)$. Now we show that if path compression is also employed,
then using amortized time analysis (see Sec. 1.13), it is possible to prove
that the bound is almost $O(m)$.
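The two heuristics named above can be sketched as follows. This is a minimal illustration rather than the book's own pseudocode: the class name `DisjointSets`, the array-based representation over elements `0..n-1`, and the choice to let `union` call `find` on its arguments are all assumptions made for the example.

```python
class DisjointSets:
    """Disjoint sets over 0..n-1 with union by rank and path compression."""

    def __init__(self, n):
        self.parent = list(range(n))  # every element starts as its own root
        self.rank = [0] * n           # rank is an upper bound on node height

    def find(self, u):
        # First pass: follow parent pointers up to the root x.
        x = u
        while self.parent[x] != x:
            x = self.parent[x]
        # Second pass (path compression): make x the parent of every
        # node on the path that was just traversed.
        while self.parent[u] != x:
            next_node = self.parent[u]
            self.parent[u] = x
            u = next_node
        return x

    def union(self, u, v):
        x, y = self.find(u), self.find(v)
        if x == y:
            return x
        # Union by rank: the root of smaller rank becomes the child.
        if self.rank[x] < self.rank[y]:
            x, y = y, x
        self.parent[y] = x
        if self.rank[x] == self.rank[y]:
            self.rank[x] += 1
        return x


ds = DisjointSets(8)
ds.union(0, 1)
ds.union(2, 3)
ds.union(0, 2)
root = ds.find(3)  # compresses the path 3 -> 2 -> 0
```

Note that path compression changes parent pointers only; ranks are never decreased, which is why the rank bounds proved for union by rank remain usable in the analysis that follows.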

Lemma 4.2 For any integer $r \ge 0$, the number of nodes of rank $r$ is at
most $n/2^r$.

Proof. Fix a particular value of $r$. When a node $x$ is assigned a rank of
$r$, label by $x$ all the nodes contained in the tree rooted at $x$. By Lemma 4.1,
the number of labeled nodes is at least $2^r$. If the root of that tree changes,
then the rank of the root of the new tree is at least $r+1$. This means that
those nodes labeled with $x$ will never be labeled again. Since the maximum
number of nodes labeled is $n$, and since each root of rank $r$ has at least $2^r$
nodes, it follows that there are at most $n/2^r$ nodes with rank $r$.

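Corollary 4.1 The rank of any node is at most $\lfloor \log n \rfloor$.

Proof. If for some node $x$, $\mathrm{rank}(x) = r \ge \lfloor \log n \rfloor + 1$, then by Lemma 4.2,
there are at most $n/2^{\lfloor \log n \rfloor + 1} < 1$ nodes of rank $r$, so no such node exists.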
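Lemma 4.2 and Corollary 4.1 can be checked empirically on a small instance. The sketch below is only an illustration under stated assumptions: the helper `build_ranks`, the instance size `n = 1024`, and the random merge order are all chosen for the example, and a passing check on one instance is of course not a proof.

```python
import random
from collections import Counter


def build_ranks(n, seed=0):
    """Merge n singletons by random unions by rank until one tree is left.

    Returns the final rank of every node (hypothetical helper).
    """
    rng = random.Random(seed)
    parent = list(range(n))
    rank = [0] * n

    def find(u):
        while parent[u] != u:
            u = parent[u]
        return u

    trees = n
    while trees > 1:
        x, y = find(rng.randrange(n)), find(rng.randrange(n))
        if x == y:
            continue
        if rank[x] < rank[y]:
            x, y = y, x
        parent[y] = x          # root of smaller rank becomes the child
        if rank[x] == rank[y]:
            rank[x] += 1
        trees -= 1
    return rank


n = 1024
ranks = build_ranks(n)
nodes_per_rank = Counter(ranks)

# Lemma 4.2: at most n / 2^r nodes have rank r.
lemma_holds = all(count * 2**r <= n for r, count in nodes_per_rank.items())
# Corollary 4.1: no rank exceeds floor(log2 n), computed exactly on integers.
corollary_holds = max(ranks) <= n.bit_length() - 1
```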

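Definition 4.2 For any nonnegative integer $n$, $\log^* n$ is defined as

$$\log^* n = \begin{cases} 0 & \text{if } n = 0 \text{ or } 1, \\ \min\{\, i \ge 0 \mid \underbrace{\log\log\cdots\log}_{i \text{ times}}\, n \le 1 \,\} & \text{if } n \ge 2. \end{cases}$$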
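Definition 4.2 translates almost directly into a loop that applies the base-2 logarithm until the value drops to 1 or below. The function name `log_star` and the use of floating-point `math.log2` are assumptions of this sketch; inputs that are exact towers of twos are evaluated exactly by `math.log2` in practice, but other inputs are subject to the usual rounding.

```python
import math


def log_star(n):
    """Number of times log2 must be applied to n before the result is <= 1."""
    count = 0
    value = n
    while value > 1:
        value = math.log2(value)  # also accepts very large Python integers
        count += 1
    return count


small_values = [log_star(n) for n in (0, 1, 2, 4, 16, 65536)]
huge_value = log_star(2 ** 65536)
```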

For example, $\log^* 2 = 1$, $\log^* 4 = 2$, $\log^* 16 = 3$, $\log^* 65536 = 4$ and
$\log^* 2^{65536} = 5$. For the amortized time complexity analysis, we will
introduce the following function.

$$F(j) = \begin{cases} 1 & \text{if } j = 0, \\ 2^{F(j-1)} & \text{if } j \ge 1. \end{cases}$$

The most important property of $F(j)$ is its explosive growth. For example,
$F(1) = 2$, $F(2) = 4$, $F(3) = 16$, $F(4) = 65536$ and $F(5) = 2^{65536}$.
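The recurrence for $F$ can be evaluated exactly with Python's arbitrary-precision integers. The sketch below is illustrative only; the iterative formulation is an implementation choice, and the size of $F(5)$ is reported through its bit length rather than by printing the number.

```python
def F(j):
    """F(0) = 1 and F(j) = 2**F(j-1) for j >= 1, computed iteratively."""
    value = 1
    for _ in range(j):
        value = 2 ** value
    return value


first_values = [F(j) for j in range(5)]
bits_in_F5 = F(5).bit_length()  # F(5) = 2**65536 needs 65537 bits
```

Already $F(6) = 2^{F(5)}$ is far too large to store, which is the sense in which the growth is explosive.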
Let $\sigma$ be a sequence of $m$ union and find instructions. We partition the
ranks into groups. We put rank $r$ in group $\log^* r$. For example, ranks 0
and 1 are in group 0, rank 2 is in group 1, ranks 3 and 4 are in group 2,
ranks 5 through 16 are in group 3 and ranks 17 through 65536 are in group
4. Since the largest possible rank is $\lfloor \log n \rfloor$, the largest group number is at
most $\log^* \log n = \log^* n - 1$.
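The rank groups listed above can be reproduced with integer arithmetic only, using the fact that rank $r$ falls in group $g$ exactly when $F(g-1) < r \le F(g)$. The helper names `rank_group` and `group_ranges` are hypothetical, and this is a sketch for small ranks rather than part of the data structure itself.

```python
def rank_group(r):
    """Smallest g with F(g) >= r, where F(0) = 1 and F(g) = 2**F(g-1)."""
    g, tower = 0, 1
    while tower < r:
        tower = 2 ** tower
        g += 1
    return g


def group_ranges(max_rank):
    """Map each group number to the (lowest, highest) rank in 0..max_rank."""
    ranges = {}
    for r in range(max_rank + 1):
        g = rank_group(r)
        lowest, _ = ranges.get(g, (r, r))
        ranges[g] = (lowest, r)
    return ranges


groups_up_to_20 = group_ranges(20)
```

For ranks 0 through 20 this yields the ranges (0, 1), (2, 2), (3, 4), (5, 16) and (17, 20) for groups 0 through 4, matching the grouping described in the text.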
We assess the charges of a find instruction find(u) as follows. Let $v$ be
a node on the path from node $u$ to the root of the tree containing $u$, and
let $x$ be that root. If $v$ is the root, a child of the root, or if the parent of
$v$ is in a different rank group from $v$, then charge one time unit to the find
instruction itself. Otherwise, that is, if $v$ is neither $x$ nor a child of $x$, and
both $v$ and its parent are in the same rank group, then charge one time unit
to node $v$. Note that the nodes on the
path from $u$ to $x$ are monotonically increasing in rank, and since there are
at most $\log^* n$ different rank groups, no find instruction is charged more
than $O(\log^* n)$ time units. It follows that the total number of time units
charged to all the find instructions in the sequence $\sigma$ is $O(m \log^* n)$.
After $x$ is found to be the root of the tree containing $u$, path compression
makes $x$ the parent of both $u$ and $v$. If later on $x$ becomes
a child of another node, and $v$ and $x$ are in different groups, no more node
costs will be charged to $v$ in subsequent find instructions. An important
observation is that if node $v$ is in rank group $g > 0$, then $v$ can be moved
and charged at most $F(g) - F(g-1)$ times before it acquires a parent in
a higher group. If node $v$ is in rank group 0, it will be moved at most once
before having a parent in a higher group.
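The charging scheme can be made concrete by instrumenting a find operation so that it records where each time unit goes. The sketch below is an assumption-laden illustration: the function `simulate`, the random operation mix, and the decision to count the finds performed inside each union are choices made for the example, not taken from the text.

```python
import random


def log_star(n):
    """Smallest g with F(g) >= n, where F(0) = 1 and F(g) = 2**F(g-1)."""
    g, tower = 0, 1
    while tower < n:
        tower = 2 ** tower
        g += 1
    return g


def simulate(n, m, seed=0):
    """Run m random operations; return (max charge to one find, node charges)."""
    rng = random.Random(seed)
    parent = list(range(n))
    rank = [0] * n
    node_charge = [0] * n
    max_find_charge = 0

    def find(u):
        nonlocal max_find_charge
        path = []                      # nodes strictly below the root
        while parent[u] != u:
            path.append(u)
            u = parent[u]
        x = u
        find_charge = 1                # the root itself is charged to the find
        for v in path:
            p = parent[v]
            if p == x or log_star(rank[p]) != log_star(rank[v]):
                find_charge += 1       # child of root, or parent in another group
            else:
                node_charge[v] += 1    # same group: charge the node
        for v in path:
            parent[v] = x              # path compression
        max_find_charge = max(max_find_charge, find_charge)
        return x

    def union(u, v):
        x, y = find(u), find(v)
        if x == y:
            return
        if rank[x] < rank[y]:
            x, y = y, x
        parent[y] = x
        if rank[x] == rank[y]:
            rank[x] += 1

    for _ in range(m):
        a, b = rng.randrange(n), rng.randrange(n)
        if rng.random() < 0.5:
            union(a, b)
        else:
            find(a)
    return max_find_charge, sum(node_charge)


n, m = 4096, 20000
max_find_charge, total_node_charge = simulate(n, m)
```

On any run, a single find is charged at most $\log^* n + 1$ units (the root, a child of the root, and at most $\log^* n - 1$ group changes), and the node charges sum to at most $n \log^* n$, in line with the argument above.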
Now we derive an upper bound on the total charges made to the nodes.
By Lemma 4.2, the number of nodes of rank $r$ is at most $n/2^r$. If we define
$F(-1) = 0$, then the number of nodes in group $g$ is at most

$$\sum_{r=F(g-1)+1}^{F(g)} \frac{n}{2^r} \le \frac{n}{2^{F(g-1)+1}} \sum_{r=0}^{\infty} \frac{1}{2^r} = \frac{n}{2^{F(g-1)}} = \frac{n}{F(g)}.$$

Since the maximum number of node charges assigned to a node in group
$g$ is equal to $F(g) - F(g-1)$, the number of node charges assigned to all
nodes in group $g$ is at most

$$\frac{n}{F(g)}\,\bigl(F(g) - F(g-1)\bigr) \le n.$$
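Both bounds for a group $g$, the $n/F(g)$ bound on the number of nodes and the bound of $n$ on its node charges, can be verified with exact rational arithmetic for small $g$. The sketch below assumes the convention $F(-1) = 0$ from the text; the helper names and the sample values of $n$ are chosen purely for illustration, and $g$ is kept at most 3 so that the sums stay short.

```python
from fractions import Fraction


def F(j):
    """F(-1) = 0 by convention, F(0) = 1, F(j) = 2**F(j-1) for j >= 1."""
    if j < 0:
        return 0
    value = 1
    for _ in range(j):
        value = 2 ** value
    return value


def group_size_sum(n, g):
    """Exact value of the sum of n / 2**r for r = F(g-1)+1, ..., F(g)."""
    return sum(Fraction(n, 2 ** r) for r in range(F(g - 1) + 1, F(g) + 1))


def group_charge_bound(n, g):
    """The per-group charge bound (n / F(g)) * (F(g) - F(g-1))."""
    return Fraction(n, F(g)) * (F(g) - F(g - 1))


n = 65536
size_bounds_hold = all(group_size_sum(n, g) <= Fraction(n, F(g)) for g in range(4))
charge_bounds_hold = all(group_charge_bound(n, g) <= n for g in range(4))
```

For instance, with $n = 65536$ and $g = 3$ the sum over ranks 5 through 16 equals 4095, just below the bound $n/F(3) = 4096$.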

Since there are at most $\log^* n$ groups ($0, 1, \ldots, \log^* n - 1$), it follows that the
number of node charges assigned to all nodes is $O(n \log^* n)$. Combining this
with the $O(m \log^* n)$ charges to the find instructions yields the following
theorem:

Theorem 4.3 Let $T(m)$ denote the running time required to process an
interspersed sequence $\sigma$ of $m$ union and find operations using union by rank
and path compression. Then $T(m) = O(m \log^* n)$ in the worst case.

Note that for almost all practical purposes, $\log^* n \le 5$; indeed this holds
for every $n \le 2^{65536}$. This means that
the running time is $O(m)$ for virtually all practical applications.
