You are on page 1of 64

Algorithms

R OBERT S EDGEWICK | K EVIN W AYNE

1.5 U NION -F IND
‣ dynamic connectivity ‣ quick find

Algorithms
F O U R T H E D I T I O N

‣ quick union ‣ improvements ‣ applications

R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu

Subtext of today’s lecture (and this course)
Steps to developing a usable algorithm.

・Model the problem. ・Find an algorithm to solve it. ・Fast enough? Fits in memory? ・If not, figure out why. ・Find a way to address the problem. ・Iterate until satisfied.
The scientific method. Mathematical analysis.

2

1.5 U NION -F IND
‣ dynamic connectivity ‣ quick find

Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu

‣ quick union ‣ improvements ‣ applications

4) union(2.Dynamic connectivity Given a set of N objects. 0) union(7. 9) union(5. 7) connected(8. 5) union(9. ・Union command: connect two objects. 8) union(6. 3) union(3. 1) connected(0. 0) connected(0. 1) union(1. 2) union(6. ・Find/connected query: is there a path connecting the two objects? union(4. 7) ✔ 4 0 1 2 3 4 𐄂 ✔ 5 6 7 8 9 .

Connectivity example Q. 5 . Is there a path connecting p and q ? p q A. Yes.

・Friends in a social network. convenient to name objects 0 to N –1. can use symbol table to translate from site names to integers: stay tuned (Chapter 3) 6 . ・Transistors in a computer chip.Modeling the objects Applications involve manipulating objects of all types. ・Computers in a network. ・Suppress details not relevant to union-find. When programming. ・Metallic sites in a composite system. ・Use integers as array index. ・Elements in a mathematical set. ・Pixels in a digital photo. ・Variable names in Fortran program.

Modeling the connections We assume "is connected to" is an equivalence relation: ・Reflexive: p is connected to p. ・Transitive: if p is connected to q and q is connected to r. Connected components. Maximal set of objects that are mutually connected. then q is connected to p. ・Symmetric: if p is connected to q. then p is connected to r. 0 1 2 3 4 5 6 7 { 0 } { 1 4 5 } { 2 3 6 7 } 3 connected components 7 .

Implementing the operations
Find query. Check if two objects are in the same component. Union command. Replace components containing two objects with their union.

0

1

2

3

0

1

2

3

union(2, 5)
4 5 6

7

4

5

6

7

{ 0 } { 1 4 5 } { 2 3 6 7 }

{ 0 } { 1 2 3 4 5 6 7 }

3 connected components

2 connected components
8

Union-find data type (API)
Goal. Design efficient data structure for union-find.

・Number of objects N can be huge. ・Number of operations M can be huge. ・Find queries and union commands may be intermixed.
public class UF UF(int N) void union(int p, int q) boolean connected(int p, int q) int find(int p) int count() initialize union-find data structure with N objects (0 to N – 1) add connection between p and q are p and q in the same component? component identifier for p (0 to N – 1) number of components

9

Dynamic-connectivity client

・Read in number of objects N from standard input. ・Repeat:
– read in pair of integers from standard input – if they are not yet connected, connect them and print out pair

public static void main(String[] args) { int N = StdIn.readInt(); UF uf = new UF(N); while (!StdIn.isEmpty()) { int p = StdIn.readInt(); int q = StdIn.readInt(); if (!uf.connected(p, q)) { uf.union(p, q); StdOut.println(p + " " + q); } } }

% more tinyUF.txt 10 4 3 3 8 6 5 9 4 2 1 8 9 5 0 7 2 6 1 1 0 6 7

10

cs.princeton.1.5 U NION -F IND ‣ dynamic connectivity ‣ quick find Algorithms R OBERT S EDGEWICK | K EVIN W AYNE http://algs4.edu ‣ quick union ‣ improvements ‣ applications .

1.edu ‣ quick union ‣ improvements ‣ applications .cs.5 U NION -F IND ‣ dynamic connectivity ‣ quick find Algorithms R OBERT S EDGEWICK | K EVIN W AYNE http://algs4.princeton.

and 7 are connected 3. 4. 2. and 9 are connected 0 1 2 3 4 5 6 7 8 9 13 . 8. ・Interpretation: p and q are connected iff they have the same id.Quick-find [eager approach] Data structure. if and only if ・Integer array id[] of size N. 5 and 6 are connected 1. 0 1 2 3 4 5 6 7 8 9 id[] 0 1 1 8 8 0 0 1 8 8 0.

・Integer array id[] of size N. id[6] = 0. id[1] = 1 6 and 1 are not connected Union. 0 1 2 3 4 5 6 7 8 9 after union of 6 and 1 id[] 1 1 1 8 8 1 1 1 8 8 problem: many values can change 14 . 0 1 2 3 4 5 6 7 8 9 id[] 0 1 1 8 8 0 0 1 8 8 Find.Quick-find [eager approach] Data structure. change all entries whose id equals id[p] to id[q]. ・Interpretation: p and q are connected iff they have the same id. To merge components containing p and q. Check if p and q have the same id.

Quick-find demo 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 id[] 0 1 2 3 4 5 6 7 8 9 15 .

Quick-find demo 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 id[] 1 1 1 8 8 1 1 1 8 8 .

i < N. for (int i = 0. int q) pid = id[p]. } public boolean connected(int p. } public { int int for } } 17 set id of each object to itself (N array accesses) check whether p and q are in the same component (2 array accesses) void union(int p. change all entries with id[p] to id[q] (at most 2N + 2 array accesses) . int q) { return id[p] == id[q]. i++) if (id[i] == pid) id[i] = qid. i++) id[i] = i.Quick-find: Java implementation public class QuickFindUF { private int[] id. qid = id[q]. public QuickFindUF(int N) { id = new int[N]. (int i = 0.length. i < id.

quadratic Ex. algorithm quick-find initialize N union N find 1 order of growth of number of array accesses Quick-find defect. Union too expensive. Takes N 2 array accesses to process sequence of N union commands on N objects.Quick-find is too slow Cost model. Number of array accesses (for read or write). 18 .

・Touch all words in approximately 1 second. ・But. ・ ・10 words of main memory. 109 operations per second. Huge problem for quick-find. ・30+ years of computer time! 9 9 18 time 64T quadratic Quadratic algorithms don't scale with technology. 9 a truism (roughly) since 1950! Ex. ・10 union commands on 10 objects.Quadratic algorithms do not scale Rough standard (for now). ・New computer may be 10x as fast. 32T ・ 16T With quadratic algorithm. ・Quick-find takes more than 10 operations. takes 10x as long! linearithmic linear 1K 2K 4K 8K 19 8T size . has 10x as much memory ⇒ want to solve a problem that is 10x as big.

edu ‣ quick union ‣ improvements ‣ applications .1.cs.5 U NION -F IND ‣ dynamic connectivity ‣ quick find Algorithms R OBERT S EDGEWICK | K EVIN W AYNE http://algs4.princeton.

princeton.1.edu ‣ quick union ‣ improvements ‣ applications .5 U NION -F IND ‣ dynamic connectivity ‣ quick find Algorithms R OBERT S EDGEWICK | K EVIN W AYNE http://algs4.cs.

. 0 1 2 3 4 5 6 7 8 9 keep going until it doesn’t change (algorithm ensures no cycles) 0 1 9 6 7 8 id[] 0 1 9 4 9 6 6 7 8 9 2 4 5 3 root of 3 is 9 22 .Quick-union [lazy approach] Data structure.]]].. ・Integer array id[] of size N.. ・Root of i is id[id[id[.id[i]. ・Interpretation: id[i] is parent of i..

id[i]. Union... To merge components containing p and q. ・Interpretation: id[i] is parent of i.]]]. 0 1 2 3 4 5 6 7 8 9 0 1 9 6 7 8 id[] 0 1 9 4 9 6 6 7 8 9 2 4 5 q p 3 Find. Check if p and q have the same root. ・Integer array id[] of size N. set the id of p's root to the id of q's root..Quick-union [lazy approach] Data structure. ・Root of i is id[id[id[.. 0 1 2 3 4 5 6 7 8 9 root of 3 is 9 root of 5 is 6 3 and 5 are not connected 0 1 9 6 7 8 id[] 0 1 9 4 9 6 6 7 8 6 2 4 5 q only one value changes p 3 23 .

Quick-union demo 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 id[] 0 1 2 3 4 5 6 7 8 9 24 .

Quick-union demo 8 1 3 9 0 2 7 4 5 0 1 2 3 4 5 6 7 8 9 6 id[] 1 8 1 8 3 0 5 1 8 8 .

} } 26 set id of each object to itself (N array accesses) chase parent pointers until reach root (depth of i array accesses) check if p and q have same root (depth of p and q array accesses) change root of p to point to root of q (depth of p and q array accesses) . } public boolean connected(int p. i < N. public QuickUnionUF(int N) { id = new int[N]. int q) { int i = root(p) int j = root(q). i++) id[i] = i. for (int i = 0. return i. } public void union(int p. int q) { return root(p) == root(q). id[i] = j.Quick-union: Java implementation public class QuickUnionUF { private int[] id. } private int root(int i) { while (i != id[i]) i = id[i].

Number of array accesses (for read or write). Quick-union defect. ・Trees can get tall.Quick-union is also too slow Cost model. ・Trees are flat. ・Find too expensive (could be N array accesses). algorithm quick-find quick-union initialize N N union N N † find 1 N worst case † includes cost of finding roots Quick-find defect. but too expensive to keep them flat. ・Union too expensive (N array accesses). 27 .

princeton.edu ‣ quick union ‣ improvements ‣ applications .cs.5 U NION -F IND ‣ dynamic connectivity ‣ quick find Algorithms R OBERT S EDGEWICK | K EVIN W AYNE http://algs4.1.

5 U NION -F IND ‣ dynamic connectivity ‣ quick find Algorithms R OBERT S EDGEWICK | K EVIN W AYNE http://algs4.princeton.cs.edu ‣ quick union ‣ improvements ‣ applications .1.

・Modify quick-union to avoid tall trees. quick-union q p smaller tree q smaller tree p larger tree weighted might put the larger tree lower larger tree p always chooses the better alternative q q smaller tree smaller tree p larger tree larger tree Weighted quick-union 30 . ・Balance by linking root of smaller tree to root of larger tree.Improvement 1: weighting Weighted quick-union. ・Keep track of size of each tree (number of objects).

Weighted quick-union demo 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 id[] 0 1 2 3 4 5 6 7 8 9 31 .

Weighted quick-union demo 6 4 0 2 5 3 8 9 1 7 0 1 2 3 4 5 6 7 8 9 id[] 6 2 6 4 6 6 6 2 4 4 .

88 union() operations) 33 .Quick-union and weighted quick-union example quick-union average distance to root: 5.52 Quick-union and weighted quick-union (100 sites.11 weighted average distance to root: 1.

Identical to quick-union. sz[i] += sz[j]. ・Update the sz[] array.Weighted quick-union: Java implementation Data structure. } else { id[j] = i. sz[j] += sz[i]. Modify quick-union to: ・Link root of smaller tree to root of larger tree. return root(p) == root(q). Same as quick-union. int j = root(q). Union. Find. int i = root(p). if (sz[i] < sz[j]) { id[i] = j. } 34 . but maintain extra array sz[i] to count number of objects in the tree rooted at i.

given roots. ・Union: takes constant time.Weighted quick-union analysis Running time. ・Find: takes time proportional to depth of p and q. Proposition. lg = base-2 logarithm x N = 10 depth(x) = 3 ≤ lg N 35 . Depth of any node x is at most lg N.

given roots. ・The size of the tree containing x at least doubles since | T | ≥ ・Size of tree containing x can double at most lg N times. Proposition. When does depth of x increase? Increases by 1 when tree T1 containing x is merged into another tree T2. Depth of any node x is at most lg N.Weighted quick-union analysis Running time. ・Find: takes time proportional to depth of p and q. Why? 2 | T 1 |. Pf. ・Union: takes constant time. T2 T1 x 36 .

Stop at guaranteed acceptable performance? A. ・Union: takes constant time. 37 . algorithm quick-find quick-union weighted QU initialize N N N union N N lg N † † connected 1 N lg N † includes cost of finding roots Q. given roots. Depth of any node x is at most lg N. No. ・Find: takes time proportional to depth of p and q. easy to improve further. Proposition.Weighted quick-union analysis Running time.

set the id of each examined node to point to that root.Improvement 2: path compression Quick union with path compression. Just after computing the root of p. 0 root 1 2 3 4 5 6 7 p 8 x 9 10 11 12 38 .

set the id of each examined node to point to that root. Just after computing the root of p. 0 root p 9 1 2 11 12 3 4 5 x 6 7 8 10 39 .Improvement 2: path compression Quick union with path compression.

Improvement 2: path compression Quick union with path compression. 0 root p 9 6 1 2 11 12 8 x 3 4 5 10 7 40 . set the id of each examined node to point to that root. Just after computing the root of p.

set the id of each examined node to point to that root. 0 root p 9 6 3 x 1 2 11 12 8 7 4 5 10 41 . Just after computing the root of p.Improvement 2: path compression Quick union with path compression.

set the id of each examined node to point to that root.Improvement 2: path compression Quick union with path compression. x 0 root p 9 6 3 1 2 11 12 8 7 4 5 10 42 . Just after computing the root of p.

Simpler one-pass variant: Make every other node in path point to its grandparent (thereby halving path length). 43 .Path compression: Java implementation Two-pass implementation: add second loop to root() to set the id[] of each examined node to the root. No reason not to! Keeps tree almost completely flat. } return i. } only one extra line of code ! In practice. private int root(int i) { while (i != id[i]) { id[i] = id[id[i]]. i = id[i].

[Hopcroft-Ulman.Weighted quick-union with path compression: amortized analysis Proposition. Tarjan] Starting from an empty data structure. in "cell-probe" model of computation 44 . N 1 2 4 16 65536 265536 lg* N 0 1 2 3 4 5 ・Analysis can be improved to N + M α(M. WQUPC is not quite linear. Amazing fact. WQUPC is linear. N). [Fredman-Saks] No linear-time algorithm exists. any sequence of M union-find ops on N objects makes ≤ c ( N + M lg* N ) array accesses. ・Simple algorithm with fascinating mathematics. Linear-time algorithm for M union-find ops on N objects? iterate log function ・Cost within constant factor of reading in the data. ・In practice. ・In theory.

Weighted quick union (with path compression) makes it possible to solve problems that could not otherwise be addressed.Summary Bottom line. good algorithm enables solution. [109 unions and finds with 109 objects] ・WQUPC reduces time from 30 years to 6 seconds. ・Supercomputer won't help much. 45 . algorithm quick-find quick-union weighted QU QU + path compression weighted QU + path compression worst-case time MN MN N + M log N N + M log N N + M lg* N M union-find operations on a set of N objects Ex.

edu ‣ quick union ‣ improvements ‣ applications .5 U NION -F IND ‣ dynamic connectivity ‣ quick find Algorithms R OBERT S EDGEWICK | K EVIN W AYNE http://algs4.princeton.cs.1.

1.princeton.5 U NION -F IND ‣ dynamic connectivity ‣ quick find Algorithms R OBERT S EDGEWICK | K EVIN W AYNE http://algs4.cs.edu ‣ quick union ‣ improvements ‣ applications .

・Kruskal's minimum spanning tree algorithm. ・Morphological attribute openings and closings. ・Games (Go. 48 . ・Least common ancestor. ・Hoshen-Kopelman algorithm in physics. ・Compiling equivalence statements in Fortran. ・Matlab's bwlabel() function in image processing. Hex). ・Equivalence of finite state automata. ✓ Dynamic connectivity.Union-find applications ・Percolation. ・Hinley-Milner polymorphic type inference.

・System percolates iff top and bottom are connected by open sites. percolates does not percolate blocked site open site N=8 open site connected to top no open site connected to top 49 . ・Each site is open with probability p (or blocked with probability 1 – p).Percolation A model for many physical systems: ・N-by-N grid of sites.

Percolation A model for many physical systems: ・N-by-N grid of sites. model electricity fluid flow social interaction system material material population vacant site conductor empty person occupied site insulated blocked empty percolates conducts porous communicates 50 . ・System percolates iff top and bottom are connected by open sites. ・Each site is open with probability p (or blocked with probability 1 – p).

6) percolates? p high (0. p low (0.4) does not percolate p medium (0.8) percolates 51 .Likelihood of percolation Depends on site vacancy probability p.

Q. theory guarantees a sharp threshold p*.593 1 site vacancy probability p 52 . ・p < p*: almost certainly does not percolate.Percolation phase transition When N is large. ・p > p*: almost certainly percolates. What is the value of p* ? 1 percolation probability p* 0 0 N = 100 0.

Monte Carlo simulation ・Initialize N-by-N whole grid to be blocked. full open site (connected to top) empty open site (not connected to top) blocked site N = 20 53 . ・Declare random sites open until top connected to bottom. ・Vacancy percentage estimates p*.

How to check whether an N-by-N system percolates? N=5 open site blocked site 54 .Dynamic connectivity solution to estimate percolation threshold Q.

Dynamic connectivity solution to estimate percolation threshold Q. N=5 0 5 10 15 20 1 6 11 16 21 2 7 12 17 22 3 8 13 18 23 4 9 14 19 24 open site blocked site 55 . How to check whether an N-by-N system percolates? ・Create an object for each site and name them 0 to N 2 – 1.

2 N=5 open site blocked site 56 . How to check whether an N-by-N system percolates? ・Create an object for each site and name them 0 to N – 1. ・Sites are in same component if connected by open sites.Dynamic connectivity solution to estimate percolation threshold Q.

Dynamic connectivity solution to estimate percolation threshold Q. How to check whether an N-by-N system percolates? ・Create an object for each site and name them 0 to N – 1. 2 brute-force algorithm: N 2 calls to connected() top row N=5 bottom row open site blocked site 57 . ・Sites are in same component if connected by open sites. ・Percolates iff any site on bottom row is connected to site on top row.

Introduce 2 virtual sites (and connections to top and bottom).Dynamic connectivity solution to estimate percolation threshold Clever trick. ・Percolates iff virtual top site is connected to virtual bottom site. efficient algorithm: only 1 call to connected() virtual top site N=5 top row bottom row open site blocked site virtual bottom site 58 .

Dynamic connectivity solution to estimate percolation threshold Q. How to model opening a new site? open this site N=5 open site blocked site 59 .

How to model opening a new site? A. up to 4 calls to union() open this site N=5 open site blocked site 60 . Connect newly opened site to all of its adjacent open sites.Dynamic connectivity solution to estimate percolation threshold Q.

Percolation threshold Q.592746 for large square lattices. constant known only via simulation 1 percolation probability p* 0 0 N = 100 0. What is percolation threshold p* ? A. About 0.593 1 site vacancy probability p Fast algorithm enables accurate answer to scientific question. 61 .

5 U NION -F IND ‣ dynamic connectivity ‣ quick find Algorithms R OBERT S EDGEWICK | K EVIN W AYNE http://algs4.edu ‣ quick union ‣ improvements ‣ applications .1.princeton.cs.

・Fast enough? Fits in memory? ・If not. ・Model the problem. The scientific method. ・Find an algorithm to solve it.Subtext of today’s lecture (and this course) Steps to developing a usable algorithm. 63 . Mathematical analysis. ・Find a way to address the problem. figure out why. ・Iterate until satisfied.

Algorithms R OBERT S EDGEWICK | K EVIN W AYNE 1.cs.edu .5 U NION -F IND ‣ dynamic connectivity ‣ quick find Algorithms F O U R T H E D I T I O N ‣ quick union ‣ improvements ‣ applications R OBERT S EDGEWICK | K EVIN W AYNE http://algs4.princeton.