You are on page 1of 17

Algorithms R OBERT S EDGEWICK | K EVIN W AYNE

4.2 D IRECTED G RAPHS 4.2 D IRECTED G RAPHS


‣ introduction ‣ introduction
‣ digraph API ‣ digraph API
‣ digraph search ‣ digraph search
Algorithms Algorithms
‣ topological sort ‣ topological sort
F O U R T H E D I T I O N

R OBERT S EDGEWICK | K EVIN W AYNE


‣ strong components R OBERT S EDGEWICK | K EVIN W AYNE
‣ strong components
http://algs4.cs.princeton.edu http://algs4.cs.princeton.edu

Directed graphs Road network


To see all the details that are visible on the screen,use the
Address Holland Tunnel "Print" link next to the map.
New York, NY 10013
Digraph. Set of vertices connected pairwise by directed edges. Vertex = intersection; edge = one-way street.

outdegree = 4
indegree = 2
0

6 8 7

directed path
1 2
from 0 to 2

3 9 10

4 directed cycle

5
11 12

©2008 Google - Map data ©2008 Sanborn, NAVTEQ™ - Terms of Use

3 4
Political blogosphere graph Overnight interbank loan graph

Vertex = political blog; edge = link. Vertex = bank; edge = overnight loan.

GSCC

GWCC

GIN
GOUT

DC

Tendril

!"#$%& '( !&)&%*+ ,$-). -&/01%2 ,1% 3&4/&56&% 7'8 799:; <=>> ? #"*-/ 0&*2+@ A1--&A/&) A1541-&-/8
The Political Blogosphere and the 2004 U.S. Election: Divided They Blog, Adamic and Glance, 2005 The Topology
B> ? )".A1--&A/&) A1541-&-/8 <3>> of?
the Federal
#"*-/ Funds
./%1-#+@ Market, A1541-&-/8
A1--&A/&) Bech and Atalay,
<CD ? 2008
#"*-/ "-EA1541-&-/8
Figure 1: Community structure of political blogs (expanded set), shown using utilizing a GEM
layout [11] in the GUESS[3] visualization and analysis tool. The colors reflect political orientation, <FGH ? #"*-/ 1$/E A1541-&-/; F- /I". )*@ /I&%& 0&%& JK -1)&. "- /I& <3>>8 L9L -1)&. "- /I& <CD8 :K
5 6
red for conservative, and blue for liberal. Orange links go from liberal to conservative, and purple -1)&. "- <FGH8 J9 -1)&. "- /I& /&-)%"+. *-) 7 -1)&. "- * )".A1--&A/&) A1541-&-/;
ones from conservative to liberal. The size of each blog reflects the number of other blogs that link
to it.
!"#$%&%'$( HI& -1)&. 1, * -&/01%2 A*- 6& 4*%/"/"1-&) "-/1 * A1++&A/"1- 1, )".M1"-/ .&/. A*++&) )".A1--&A/&)
Uber taxi graph Implication
A1541-&-/.8 !!! " #graph
"! !! !! "; HI& -1)&. 0"/I"- &*AI )".A1--&A/&) A1541-&-/ )1 -1/ I*N& +"-2. /1 1% ,%15
longer existed, or had moved to a different location. When looking at the front page of a blog we did
-1)&. "- *-@ 1/I&% A1541-&-/8 ";&;8 #!"# $"# !$# "" $ " $ !! !! " % $ $ !!! !! "& # ' ", % (# %! ; HI& A1541-&-/
not make a distinction between blog references made in blogrolls (blogroll links) from those made
0"/I /I& +*%#&./ -$56&% 1, -1)&. ". %&,&%%&) /1 *. /I& )%*$& +"*,-. /'$$"/&"0 /'12'$"$& O<=>>P; C- 1/I&%
in posts (post citations). This had the disadvantage of not differentiating between blogs that were
Vertex = taxi pickup; edge = taxi ride. Vertex
01%).8 /I& = variable;
<=>> edgeA1541-&-/
". /I& +*%#&./ = logical implication.
1, /I& -&/01%2 "- 0I"AI *++ -1)&. A1--&A/ /1 &*AI 1/I&% N"*
actively mentioned in a post on that day, from blogroll links that remain static over many weeks [10]. $-)"%&A/&) 4*/I.; HI& %&5*"-"-# )".A1--&A/&) A1541-&-/. OB>.P *%& .5*++&% A1541-&-/. ,1% 0I"AI /I&
Since posts usually contain sparse references to other blogs, and blogrolls usually contain dozens of .*5& ". /%$&; C- &54"%"A*+ ./$)"&. /I& <=>> ". 1,/&- ,1$-) /1 6& .&N&%*+ 1%)&%. 1, 5*#-"/$)& +*%#&% /I*-
blogs, we assumed that the network obtained by crawling the front page of each blog would strongly *-@ 1, /I& B>. O.&& Q%1)&% "& *-3 O7999PP;
reflect blogroll links. 479 blogs had blogrolls through blogrolling.com, while many others simply HI& <=>> A1-."./. 1, * )%*$& 4&5'$)-. /'$$"/&"0 /'12'$"$& O<3>>P8 * )%*$& '6&7/'12'$"$& O<FGHP8
maintained a list of links to their favorite blogs. We did not include blogrolls placed on a secondary if x5 is true,
* )%*$& %$7/'12'$"$& O<CDP *-) &"$05%-4 O.&& !"#$%& 'P; HI& <3>> A154%".&. *++ -1)&. /I*/ A*- %&*AI &N&%@
page. then x0 is true
1/I&% -1)& "- /I& <3>> /I%1$#I * )"%&A/&) 4*/I; R -1)& ". "- /I& <FGH ", "/ I*. * 4*/I ,%15 /I& <3>>
We constructed a citation network by identifying whether a URL present on the page of one blog 6$/ -1/ /1 /I& <3>>; C- A1-/%*./8 * -1)& ". "- ~x /I&2 <CD ", "/xI*.
0 * 4*/I /1 /I& <3>> 6$/ -1/ ,%15 "/; R
references another political blog. We called a link found anywhere on a blog’s page, a “page link” to -1)& ". "- * /&-)%"+ ", "/ )1&. -1/ %&.")& 1- * )"%&A/&) 4*/I /1 1% ,%15 /I& <3>>;S9
distinguish it from a “post citation”, a link to another blog that occurs strictly within a post. Figure 1 !%4/644%'$( C- /I& -&/01%2 1, 4*@5&-/. .&-/ 1N&% !&)0"%& *-*+@T&) 6@ 31%*5U2" "& *-3 O799:P8 /I& <3>>
shows the unmistakable division between the liberal and conservative political (blogo)spheres. In ". /I& +*%#&./ A1541-&-/; F- *N&%*#&8 *+51./ %&' 1, /I& -1)&. "- /I*/ -&/01%2 6&+1-# /1 /I& <3>>; C-
fact, 91% of the links originating within either the conservative or liberal communities stay within A1-/%*./8 /I& <3>> ". 5$AI .5*++&%x,1% /I& ,&)&%*+ ~x4 ,$-). -&/01%2;
x3 C- 799:8 1-+@ (&' ) (' 1, /I& -1)&.
6
that community. An effect that may not be as apparent from the visualization is that even though 6&+1-# /1 /I". A1541-&-/; Q@ ,*% /I& +*%#&./ A1541-&-/ ". /I& <CD; C- 799:8 )%' ) )' 1, /I& -1)&. 0&%&
we started with a balanced set of blogs, conservative blogs show a greater tendency to link. 84% "- /I". A1541-&-/; HI& <FGH A1-/*"-&) (*' ) +' 1, *++ -1)&. 4&% )*@8 0I"+& /I&%& 0&%& (+' ) ,' 1,
of conservative blogs link to at least one other blog, and 82% receive a link. In contrast, 74% of /I& -1)&. +1A*/&) "- /I& /&-)%"+.;SS V&.. /I*- -' ) (' 1, /I& -1)&. 0&%& "- /I& %&5*"-"-# )".A1--&A/&)
liberal blogs link to another blog, while only 67% are linked to by another blog. So overall, we see a A1541-&-/. O.&& H*6+& JP;
slightly higher tendency for conservative blogs to link. Liberal blogs linked to 13.6 blogs on average, ~x5 ~x1 x1 x5
S 9 HI& /&-)%"+. 5*@ *+.1 6& )"W&%&-/"*/&) "-/1 /I%&& .$6A1541-&-/.( * .&/ 1, -1)&. /I*/ *%& 1- * 4*/I &5*-*/"-# ,%15 <CD8 *
while conservative blogs linked to an average of 15.1, and this difference is almost entirely due to .&/ 1, -1)&. /I*/ *%& 1- * 4*/I +&*)"-# /1 <FGH8 *-) * .&/ 1, -1)&. /I*/ *%& 1- * 4*/I /I*/ 6&#"-. "- <CD *-) &-). "- <FGH;
S S !!"# 1, -1)&. 0&%& "- X,%15E<CDY /&-)%"+.8 $!%# 1, -1)&. 0&%& "- /I& X/1E<FGHY /&-)%"+. *-) "!&# 1, -1)&. 0&%& "-
the higher proportion of liberal blogs with no links at all.
X/$6&.Y ,%15 <CD /1 <FGH;
Although liberal blogs may not link as generously on average, the most popular liberal blogs,
Daily Kos and Eschaton (atrios.blogspot.com), had 338 and 264 links from our single-day snapshot ~x3 x4 ~x6

S7
4
~x0 x2

http://blog.uber.com/2012/01/09/uberdata-san-franciscomics/
7 8
Combinational circuit WordNet graph

Vertex = logical gate; edge = wire. Vertex = synset; edge = hypernym relationship.

event

happening occurrence occurrent natural_event


miracle
act human_ac on human_ac vity
change altera on modi ca on miracle

group_ac on
damage harm ..
impairment transi on increase forfeit forfeiture ac on

resistance opposi on transgression


leap jump salta on jump leap
change

demo on varia on
mo on movement move

locomo on travel descent

run running jump parachu ng

http://wordnet.princeton.edu dash sprint

9 10

Digraph applications Some digraph problems

digraph vertex directed edge

transportation street intersection one-way street problem description

web web page hyperlink


s→t path Is there a path from s to t ?
food web species predator-prey relationship
shortest s→t path What is the shortest path from s to t ?
WordNet synset hypernym

scheduling task precedence constraint directed cycle Is there a directed cycle in the graph ?

financial bank transaction


topological sort Can the digraph be drawn so that all edges point upwards?
cell phone person placed call
strong connectivity Is there a directed path between all pairs of vertices ?
infectious disease person infection
transitive closure For which vertices v and w is there a directed path from v to w ?
game board position legal move

citation journal article citation PageRank What is the importance of a web page ?

object graph object pointer

inheritance hierarchy class inherits from

control flow code block jump


11 12
Digraph API

Almost identical to Graph API.

public class Digraph

4.2 D IRECTED G RAPHS Digraph(int V) create an empty digraph with V vertices

Digraph(In in) create a digraph from input stream


‣ introduction void addEdge(int v, int w) add a directed edge v→w
‣ digraph API
Iterable<Integer> adj(int v) vertices pointing from v
‣ digraph search
Algorithms int V() number of vertices
‣ topological sort
int E() number of edges
R OBERT S EDGEWICK | K EVIN W AYNE
‣ strong components
Digraph reverse() reverse of this digraph
http://algs4.cs.princeton.edu
String toString() string representation

14

Digraph API Digraph representation: adjacency lists

tinyDG.txt
% java Digraph tinyDG.txt Maintain vertex-indexed array of lists.
V 0->5 5 1
13
E 0->1 tinyDG.txt
22
V 5 1
4 2 2->0 13
2 3 adj[] E
2->3 0 3 tinyDG.txt 22
3 2
6 0
0
3->5 V 4 2
adj[]
5 1
0 1 1 5 2 13 2 3 0 3
3->2 E 3 2 0
2 0 2
4->3 22 6 0
11 12 3 2 1 5 2
12 9
3
4->2 4 2 0 1
4 2 0 2
adj[]
9 10 5->4 4 2 3 11 12 3 02 3
9 11 5 3
7 9
⋮ 3 2 12 9 0
6 9 4 8 0 4
10 12 11->4 6 0 9 10 4
11 4 7 11->12 0 1
9 11 1 5 5 2
4 3 6 9 7 9
8 12->9 6 9 4 8 0
3 5
9
2 0 10 12 2
6 8
8 6
6
11 12 11 4 7 3 2
10 4 3 3 8
6 9
⋮5 4 11 11 10 12 9 3 5
0 5
6 4 12 9 10 6 8 4 9
6 4
12 8 6 10
6 9=
In in
7 6
new In(args[0]); read digraph from 9 11 5 4 5
Digraph G = new Digraph(in); input stream 7 9 0 5 11 11 10
4 12
10 12 6 4 6 12 9 4 8 0
6 9 12
for (int v = 0; v < G.V(); v++) 9 11 4 7
7 6
print out each
for (int w : G.adj(v)) 4 3 4 612 9
edge (once) 8
StdOut.println(v + "->" + w); 3 5
9 9
6
15
8 6 16
8 6 10
5 4
Digraph representations Adjacency-lists graph representation (review): Java implementation

In practice. Use adjacency-lists representation. public class Graph


・Algorithms based on iterating over vertices pointing from v. {
private final int V;
・Real-world digraphs tend to be sparse. private final Bag<Integer>[] adj; adjacency lists

huge number of vertices,


small average vertex degree public Graph(int V)
{ create empty graph
with V vertices
this.V = V;
adj = (Bag<Integer>[]) new Bag[V];
for (int v = 0; v < V; v++)
insert edge edge from iterate over vertices
representation space adj[v] = new Bag<Integer>();
from v to w v to w? pointing from v?
}
list of edges E 1 E E add edge v–w
public void addEdge(int v, int w)
adjacency matrix {
V2 1† 1 V
adj[v].add(w);
adjacency lists E+V 1 outdegree(v) outdegree(v) adj[w].add(v);
}
† disallows parallel edges iterator for vertices
public Iterable<Integer> adj(int v) adjacent to v
{ return adj[v]; }
}
17 18

Adjacency-lists digraph representation: Java implementation

public class Digraph


{
private final int V;
private final Bag<Integer>[] adj; adjacency lists

public Digraph(int V)
{ create empty digraph
4.2 D IRECTED G RAPHS
with V vertices
this.V = V;
adj = (Bag<Integer>[]) new Bag[V]; ‣ introduction
for (int v = 0; v < V; v++)
adj[v] = new Bag<Integer>(); ‣ digraph API
}
‣ digraph search
Algorithms
‣ topological sort
add edge v→w
public void addEdge(int v, int w)
{
adj[v].add(w); R OBERT S EDGEWICK | K EVIN W AYNE
‣ strong components
http://algs4.cs.princeton.edu
}
iterator for vertices
public Iterable<Integer> adj(int v) pointing from v
{ return adj[v]; }
}
19
Reachability Depth-first search in digraphs

Problem. Find all vertices reachable from s along a directed path. Same method as for undirected graphs.

s
・Every undirected graph is a digraph (with edges in both directions).
・DFS is a digraph algorithm.

DFS (to visit a vertex v)

Mark v as visited.
Recursively visit all unmarked
vertices w pointing from v.

21 22

Depth-first search demo Depth-first search demo

To visit a vertex v : 4→2 To visit a vertex v :


2→3
・Mark vertex v as visited. 3→2 ・Mark vertex v as visited.
・Recursively visit all unmarked vertices pointing from v. 6→0 ・Recursively visit all unmarked vertices pointing from v.
0→1
2→0
0 0
11→12 v marked[] edgeTo[]
12→9
6 8 7 6 8 7 0 T –
9→10
1 T 0
1 2 9→11 1 2 reachable 2 T 3
8→9 from vertex 0 3 T 4
10→12 4 T 5
11→4 5 T 0
3 9 10 4→3 3 9 10 6 F –
3→5 7 F –
4 6→8 4 8 F –
8→6 9 F –
5 5→4 5 10 F –
11 12 11 12
0→5 11 F –
6→4 12 F –

a directed graph 6→9 reachable from 0


7→6
23 24
Depth-first search (in undirected graphs) Depth-first search (in directed graphs)

Recall code for undirected graphs. Code for directed graphs identical to undirected one.
[substitute Digraph for Graph]

public class DepthFirstSearch public class DirectedDFS


{ {
private boolean[] marked; true if connected to s private boolean[] marked; true if path from s

public DepthFirstSearch(Graph G, int s) public DirectedDFS(Digraph G, int s)


{ {
constructor marks constructor marks
marked = new boolean[G.V()]; vertices connected to s
marked = new boolean[G.V()]; vertices reachable from s
dfs(G, s); dfs(G, s);
} }

private void dfs(Graph G, int v) recursive DFS does the work private void dfs(Digraph G, int v) recursive DFS does the work
{ {
marked[v] = true; marked[v] = true;
for (int w : G.adj(v)) for (int w : G.adj(v))
if (!marked[w]) dfs(G, w); if (!marked[w]) dfs(G, w);
} }

client can ask whether any client can ask whether any
public boolean visited(int v) public boolean visited(int v)
vertex is connected to s vertex is reachable from s
{ return marked[v]; } { return marked[v]; }
} }

25 26

Reachability application: program control-flow analysis Reachability application: mark-sweep garbage collector

Every program is a digraph. Every data structure is a digraph.


・Vertex = basic block of instructions (straight-line program). ・Vertex = object.
・Edge = jump. ・Edge = reference.
Dead-code elimination. Roots. Objects known to be directly accessible by program (e.g., stack).
Find (and remove) unreachable code.
Reachable objects. Objects indirectly accessible by program
Infinite-loop detection. (starting at a root and following a chain of pointers).
Determine whether exit is unreachable.

roots
27 28
Reachability application: mark-sweep garbage collector Depth-first search in digraphs summary

Mark-sweep algorithm. [McCarthy, 1960] DFS enables direct solution of simple digraph problems.
・Mark: mark all reachable objects. ✓ ・Reachability.
・Sweep: if object is unmarked, it is garbage (so add to free list). ・Path finding.
・Topological sort.
Memory cost. Uses 1 extra mark bit per object (plus DFS stack). ・Directed cycle detection.
Basis for solving difficult digraph problems.
・2-satisfiability.
・Directed Euler path.
・Strongly-connected components.

roots
SIAM J. COMPUT.
Vol. 1, No. 2, June 1972

DEPTH-FIRST SEARCH AND LINEAR GRAPH ALGORITHMS*


ROBERT TARJAN"
Abstract. The value of depth-first search or "bacltracking" as a technique for solving problems is
illustrated by two examples. An improved version of an algorithm for finding the strongly connected
components of a directed graph and ar algorithm for finding the biconnected components of an un-
direct graph are presented. The space and time requirements of both algorithms are bounded by
k 1V + k2E d- k for some constants kl, k2, and k a, where Vis the number of vertices and E is the number
of edges of the graph being examined.
Key words. Algorithm, backtracking, biconnectivity, connectivity, depth-first, graph, search,
spanning tree, strong-connectivity.
29 30
1. Introduction. Consider a graph G, consisting of a set of vertices U and a
set of edges g. The graph may either be directed (the edges are ordered pairs (v, w)
of vertices; v is the tail and w is the head of the edge) or undirected (the edges are
unordered pairs of vertices, also represented as (v, w)). Graphs form a suitable
abstraction for problems in many areas; chemistry, electrical engineering, and
Breadth-first search in digraphs Directed breadth-first search demo
sociology, for example. Thus it is important to have the most economical algo-
rithms for answering graph-theoretical questions.
In studying graph algorithms we cannot avoid at least a few definitions.
These definitions are more-or-less standard in the literature. (See Harary [3],
for instance.) If G (, g) is a graph, a path p’v w in G is a sequence of vertices
and edges leading from v to w. A path is simple if all its vertices are distinct. A path
Same method as for undirected graphs. Repeat until queue is empty:
p’v v is called a closed path. A closed path p’v v is a cycle if all its edges are
distinct and the only vertex to occur twice in p is v, which occurs exactly twice.

・Every undirected graph is a digraph (with edges in both directions). ・Remove vertex v from queue.
Two cycles which are cyclic permutations of each other are considered to be the
same cycle. The undirected version of a directed graph is the graph formed by

・BFS is a digraph algorithm. ・Add to queue all unmarked vertices pointing from v and mark them.
converting each edge of the directed graph into an undirected edge and removing
duplicate edges. An undirected graph is connected if there is a path between every
pair of vertices.
A (directed rooted) tree T is a directed graph whose undirected version is
connected, having one vertex which is the head of no edges (called the root),
and such that all vertices except the root are the head of exactly one edge. The

BFS (from source vertex s) 0


-
relation "(v, w) is an edge of T" is denoted by v- w. The relation "There tinyDG2.txt
path from v to w in T" is denoted by v w. If v w, v is the father ofVw and w is a
is a

son of v. If v w, v is an ancestor of w and w is a descendant of v. Every vertex is an 6


2 in a tree T, T is the subtree of T
ancestor and a descendant of itself. If v is a vertex
having as vertices all the descendants of v in T. If G is a directed graph, a tree T 8
E

is a spanning tree of G if T is a subgraph of G and T contains all the vertices of G.


Put s onto a FIFO queue, and mark s as visited. If R and S are binary relations, R* is the transitive closure of R, R-1 is the 5 0
inverse of R, and
Repeat until the queue is empty: RS {(u, w)lZlv((u, v) R & (v, w) e S)}. 2 4
1 3 2
- remove the least recently added vertex v * Received by the editors August 30, 1971, and in revised form March 9, 1972.

- for each unmarked vertex pointing from v:


"
Department of Computer Science, Cornell University, Ithaca, New York 14850. This research
was supported by the Hertz Foundation and the National Science Foundation under Grant GJ-992. 1 2
146 0 1
add to queue and mark as visited. 3 4 3
5 4 3 5
0 2

Proposition. BFS computes shortest paths (fewest number of edges)


from s to all other vertices in a digraph in time proportional to E + V. graph G

31 32
Directed breadth-first search demo Multiple-source shortest paths
tinyDG.txt

Repeat until queue is empty:


V
Multiple-source shortest paths.13
Given a digraph and a set of source
E
22any vertex in the set to each other vertex.
・Remove vertex v from queue. vertices, find shortest path from
4 2
・Add to queue all unmarked vertices pointing from v and mark them. 2 3 adj[]
Ex. S = { 1, 7, 10 }. 3 2 0
6 0
・ Shortest path to 4 is 7→6→4.
0 1 1
0 2
v edgeTo[] distTo[] ・ 2 0
Shortest path to 5 is 7→6→0→5. 2
11 12
0 – 0 ・ Shortest path to 12 is 10→12.
12 9
3

1
1
2
0
0
1
1
・ … 9 10 4
9 11 5
3 4 3 7 9
4 2 2 10 12 6
3 5 3 4 11 4 7
5 4 4 3
8
3 5
6 8 9
8 6 10
5 4
Q. How to implement multi-source shortest paths algorithm? 11
0 5
done A. Use BFS, but initialize by enqueuing
6 4 all source vertices. 12
33 6 9 34
7 6

Breadth-first search in digraphs application: web crawler Bare-bones web crawler: Java implementation

Goal. Crawl web, starting from some root web page, say www.princeton.edu. Queue<String> queue = new Queue<String>(); queue of websites to crawl
SET<String> marked = new SET<String>(); set of marked websites

String root = "http://www.princeton.edu";


25 queue.enqueue(root); start crawling from root website
0 34
Solution. [BFS with implicit digraph] marked.add(root);

・Choose root web page as source s.


7

10
2 while (!queue.isEmpty())
40
{
・Maintain a Queue of websites to explore. 41
29
49 15
19 33
String v = queue.dequeue();
read in raw html from next
・Maintain a SET of discovered websites.
8
44 StdOut.println(v);
45
website in queue
In in = new In(v);
・Dequeue the next website and enqueue 28
1 14
String input = in.readAll();

websites to which it links 39


22 48
String regexp = "http://(\\w+\\.)+(\\w+)";
18 6 21

(provided you haven't done so before). Pattern pattern = Pattern.compile(regexp); use regular expression to find all URLs
42 13 Matcher matcher = pattern.matcher(input); in website of form http://xxx.yyy.zzz
23
31 47 while (matcher.find()) [crude pattern misses relative URLs]
11 12
32 {
30
26 String w = matcher.group();
5 37
27
9
if (!marked.contains(w))
16
43 {
24 marked.add(w);
4 if unmarked, mark it and put
38 queue.enqueue(w);
on the queue
3 17
35
}
36
46 }
Q. Why not use DFS? 20
}
How many strong components are there in this digraph? 35 36
Web crawler output

BFS crawl DFS crawl

http://www.princeton.edu http://www.princeton.edu
http://www.w3.org http://deimos.apple.com
http://ogp.me http://www.youtube.com
http://giving.princeton.edu http://www.google.com
http://www.princetonartmuseum.org http://news.google.com
http://www.goprincetontigers.com http://csi.gstatic.com 4.2 D IRECTED G RAPHS
http://library.princeton.edu http://googlenewsblog.blogspot.com
http://helpdesk.princeton.edu http://labs.google.com
http://tigernet.princeton.edu http://groups.google.com ‣ introduction
http://alumni.princeton.edu http://img1.blogblog.com
http://gradschool.princeton.edu http://feeds.feedburner.com ‣ digraph API
http://vimeo.com http:/buttons.googlesyndication.com
‣ digraph search
http://princetonusg.com
http://artmuseum.princeton.edu
http://fusion.google.com
http://insidesearch.blogspot.com Algorithms
http://jobs.princeton.edu http://agoogleaday.com ‣ topological sort
http://odoc.princeton.edu http://static.googleusercontent.com
http://blogs.princeton.edu http://searchresearch1.blogspot.com
R OBERT S EDGEWICK | K EVIN W AYNE
‣ strong components
http://www.facebook.com http://feedburner.google.com
http://twitter.com http://www.dot.ca.gov http://algs4.cs.princeton.edu
http://www.youtube.com http://www.TahoeRoads.com
http://deimos.apple.com http://www.LakeTahoeTransit.com
http://qeprize.org http://www.laketahoe.com
http://en.wikipedia.org http://ethel.tahoeguide.com
... ...

37

Precedence scheduling Topological sort

Goal. Given a set of tasks to be completed with precedence constraints, DAG. Directed acyclic graph.
in which order should we schedule the tasks?
Topological sort. Redraw DAG so all edges point upwards.
Digraph model. vertex = task; edge = precedence constraint.

0. Algorithms 0 0→5 0→2 0


1. Complexity Theory 0→1 3→6
2. Artificial Intelligence 2 5 1 3→5 3→4 2 5 1
3. Intro to CS
5→2 6→4
4. Cryptography 3 4 3 4
6→0 3→2
5. Scientific Computing
6
1→4 6
6. Advanced Programming

tasks precedence constraint graph directed edges DAG

Solution. DFS. What else?


feasible schedule 39
topological order 40
Topological sort demo Topological sort demo

・Run depth-first search. ・Run depth-first search.


・Return vertices in reverse postorder. ・Return vertices in reverse postorder.
tinyDAG7.txt
0 0
7
postorder
11
0 5 4 1 2 5 0 6 3
0 2
2 5 1 0 1 2 5 1 topological order
3 6
3 6 0 5 2 1 4
3 5
3 4
5 2
3 4 3 4
6 4
6 0
3 2
6 6

a directed acyclic graph done

41 42

Depth-first search order Topological sort in a DAG: intuition

Why does topological sort algorithm work?


public class DepthFirstOrder
{ ・First vertex in postorder has outdegree 0.
・Second-to-last vertex in postorder can only point to last vertex.
private boolean[] marked;
private Stack<Integer> reversePostorder;

public DepthFirstOrder(Digraph G)
・...
{
reversePostorder = new Stack<Integer>();
0
marked = new boolean[G.V()];
for (int v = 0; v < G.V(); v++) postorder
if (!marked[v]) dfs(G, v);
4 1 2 5 0 6 3
}

private void dfs(Digraph G, int v) 2 5 1 topological order


{
marked[v] = true; 3 6 0 5 2 1 4
for (int w : G.adj(v))
if (!marked[w]) dfs(G, w);
reversePostorder.push(v); 3 4
}

returns all vertices in


public Iterable<Integer> reversePostorder()
“reverse DFS postorder”
{ return reversePostorder; } 6
}
43 44
Topological sort in a DAG: correctness proof Directed cycle detection

Proposition. Reverse DFS postorder of a DAG is a topological order. Proposition. A digraph has a topological order iff no directed cycle.
Pf. Consider any edge v→w. When dfs(v) is called: Pf.
dfs(0)
dfs(1)
dfs(4)
・If directed cycle, topological order impossible.
・Case 1: dfs(w) has already been called and returned. 4 done
1 done
・If no directed cycle, DFS-based algorithm finds a topological order.
Thus, w was done before v. dfs(2)
2 done
dfs(5) marked[]
・ Case 2: dfs(w) has not yet been called. check 2
5 done
0 1 2 3 4 5 ..
dfs(w) will get called directly or indirectly 0 done dfs(0)
check 1 dfs(5) 1 0 0 0 0 0
by dfs(v) and will finish before dfs(v). check 2 dfs(4) 1 0 0 0 0 1
Thus, w will be done before v.
v=3 dfs(3) dfs(3) 1 0 0 0 1 1
check 2
case 1 check 4
check 5 1 0 0 1 1 1
check 5
・Case 3: dfs(w) has already been called, case 2 dfs(6)
check 0 a digraph with a directed cycle Finding a directed cycle in a d
but has not yet returned. check 4
6 done
Can’t happen in a DAG: function call stack contains 3 done
check 4
path from w to v, so v→w would complete a cycle. check 5
Goal. Given a digraph, find a directed cycle.
check 6 Solution. DFS. What else? See textbook.
all vertices pointing from 3 are done before 3 is done, done
so they appear after 3 in topological order 45 46

Directed cycle detection application: precedence scheduling Directed cycle detection application: cyclic inheritance

Scheduling. Given a set of tasks to be completed with precedence The Java compiler does cycle detection.
constraints, in what order should we schedule the tasks?

public class A extends B % javac A.java


{ A.java:1: cyclic inheritance
... involving A
} public class A extends B { }
^
1 error
public class B extends C
{
...
}
http://xkcd.com/754

public class C extends A


{
...
}

Remark. A directed cycle implies scheduling problem is infeasible.


47 48
Directed cycle detection application: spreadsheet recalculation Depth-first search orders

Microsoft Excel does cycle detection (and has a circular reference toolbar!) Observation. DFS visits each vertex exactly once. The order in which it
does so can be important.

Orderings.
・Preorder: order in which dfs() is called.
・Postorder: order in which dfs() returns.
・Reverse postorder: reverse order in which dfs() returns.
private void dfs(Graph G, int v)
{
marked[v] = true;
preorder.enqueue(v);
for (int w : G.adj(v))
if (!marked[w]) dfs(G, w);
postorder.enqueue(v);
reversePostorder.push(v);
}

49 50

Strongly-connected components

Def. Vertices v and w are strongly connected if there is both a directed path
from v to w and a directed path from w to v.

Key property. Strong connectivity is an equivalence relation:

4.2 D IRECTED G RAPHS ・v is strongly connected to v.


・If v is strongly connected to w, then w is strongly connected to v.
‣ introduction ・If v is strongly connected to w and w to x, then v is strongly connected to x.
‣ digraph API
Def. A strong component is a maximal subset of strongly-connected vertices.
‣ digraph search
Algorithms
‣ topological sort

R OBERT S EDGEWICK | K EVIN W AYNE


‣ strong components
http://algs4.cs.princeton.edu

A digraph and its strong components


5 strongly-connected components 52
Connected components vs. strongly-connected components Strong component application: ecological food webs

v and w are connected if there is v and w are strongly connected if there is both a directed Food web graph. Vertex = species; edge = from producer to consumer.
a path between v and w path from v to w and a directed path from w to v

A graph and its connected components A digraph and


A its strongand
digraph components
its strong components
3 connected components 5 strongly-connected components

connected component id (easy to compute with DFS) strongly-connected component id (how to compute?)

0 1 2 3 4 5 6 7 8 9 10 11 12 0 1 2 3 4 5 6 7 8 9 10 11 12
id[] 0 0 0 0 0 0 1 1 1 2 2 2 2 id[] 1 0 1 1 1 1 3 4 3 2 2 2 2

public boolean connected(int v, int w) public boolean stronglyConnected(int v, int w) http://www.twingroves.district96.k12.il.us/Wetlands/Salamander/SalGraphics/salfoodweb.gif


{ return id[v] == id[w]; } { return id[v] == id[w]; }

Strong component. Subset of species with common energy flow.


constant-time client connectivity query constant-time client strong-connectivity query
53 54

Strong component application: software modules Strong components algorithms: brief history

Software module dependency graph. 1960s: Core OR problem.


・Vertex = software module. ・Widely studied; some practical algorithms.
・Edge: from module to dependency. ・Complexity not understood.
1972: linear-time DFS algorithm (Tarjan).
・Classic algorithm.
・Level of difficulty: Algs4++.
・Demonstrated broad applicability and importance of DFS.
1980s: easy two-pass linear-time algorithm (Kosaraju-Sharir).
・Forgot notes for lecture; developed algorithm in order to teach it!
Firefox Internet Explorer
・Later found in Russian scientific literature (1972).
Strong component. Subset of mutually interacting modules. 1990s: more easy linear-time algorithms.
Approach 1. Package strong components together. ・Gabow: fixed old OR algorithm.
Approach 2. Use to improve design! ・Cheriyan-Mehlhorn: needed one-pass algorithm for LEDA.
55 56
Kosaraju-Sharir algorithm: intuition Kosaraju-Sharir algorithm demo

Reverse graph. Strong components in G are same as in GR. Phase 1. Compute reverse postorder in GR.
Phase 2. Run DFS in G, visiting unmarked vertices in reverse postorder of GR.
Kernel DAG. Contract each strong component into a single vertex.

how to compute?
Idea. 0

・Compute topological order (reverse postorder) in kernel DAG. 6 8 7


・Run DFS, considering vertices in reverse topological order. 2
1

first vertex is a sink


(has no edges pointing from it) 3 9 10

E A 5
B
D 11 12
Kernel DAG in reverseCtopological order
A digraph and its strong components
digraph G
digraph G and its strong components kernel DAG of G (topological order: A B C D E)
57 58

Kosaraju-Sharir algorithm demo Kosaraju-Sharir algorithm demo

Phase 1. Compute reverse postorder in GR. Phase 2. Run DFS in G, visiting unmarked vertices in reverse postorder of GR.
1 0 2 4 5 3 11 9 12 10 6 7 8 1 0 2 4 5 3 11 9 12 10 6 7 8

0 0
v id[]

6 8 7 6 8 7 0 1
1 0
1 2 1 2 2 1
3 1
4 1
5 1
3 9 10 3 9 10 6 3
7 4
4 4 8 3
9 2
5 5 10 2
11 12 11 12
11 2
12 2

reverse digraph GR done

59 60
Kosaraju-Sharir algorithm Kosaraju-Sharir algorithm

Simple (but mysterious) algorithm for computing strong components. Simple (but mysterious) algorithm for computing strong components.
・Phase 1: run DFS on GR to compute reverse postorder. ・Phase 1: run DFS on GR to compute reverse postorder.
・Phase 2: run DFS on G, considering vertices in order given by first DFS. ・Phase 2: run DFS on G, considering vertices in order given by first DFS.

DFS in reverse digraph GR DFS in original digraph G

check unmarked vertices in the order reverse postorder for use in second dfs() check unmarked vertices in the order
0 1 2 3 4 5 6 7 8 9 10 11 12 1 0 2 4 5 3 11 9 12 10 6 7 8 1 0 2 4 5 3 11 9 12 10 6 7 8
dfs(1) dfs(0) dfs(11) dfs(6) dfs(7)
1 done dfs(5) check 4 check 9 check 6
dfs(0) dfs(4) dfs(12) check 4 check 9
dfs(6) dfs(3) dfs(9) dfs(8) 7 done
dfs(8) check 5 check 11 check 6 check 8
check 6 dfs(2) dfs(10) 8 done
8 done check 0 check 12 check 0
dfs(7) check 3 10 done 6 done
7 done 2 done 9 done
6 done 3 done 12 done
dfs(2) check 2 11 done
dfs(4) 4 done check 9
dfs(11) 5 done check 12
dfs(9) check 1 check 10
dfs(12) 0 done
check 11 check 2
dfs(10) check 4
check 9 check 5
10 done check 3
12 done
check 7
check 6
... 9 done 61 62
11 done
check 6
dfs(5)
dfs(3)
check 4
check 2
Kosaraju-Sharir algorithm 3 done
check 0 Connected components in an undirected graph (with DFS)
5 done
4 done
check 3
2 done
Proposition. Kosaraju-Sharir algorithm computes the strong components of
0 done
dfs(1)
public class CC
{
check 0
a digraph in time proportional to E + V. 1 done private boolean marked[];
check 2 private int[] id;
check 3
check 4
private int count;
check 5
check 6 public CC(Graph G)
Pf. check 7
{
check 8

・Running time: bottleneck is running DFS twice (and computing G ).


check 9
check 10
check 11
R marked = new boolean[G.V()];
id = new int[G.V()];

・Correctness: tricky, see textbook (2 printing).


check 12 nd
for (int v = 0; v < G.V(); v++)

・Implementation: easy! {
if (!marked[v])
{
dfs(G, v);
count++;
}
}
}

private void dfs(Graph G, int v)


{
marked[v] = true;
id[v] = count;
for (int w : G.adj(v))
if (!marked[w])
dfs(G, w);
}

public boolean connected(int v, int w)


{ return id[v] == id[w]; }
}
63 64
Strong components in a digraph (with two DFSs) Digraph-processing summary: algorithms of the day

public class KosarajuSharirSCC


{
private boolean marked[];
private int[] id; single-source
private int count;
reachability DFS
public KosarajuSharirSCC(Digraph G)
{ in a digraph
marked = new boolean[G.V()];
id = new int[G.V()];
DepthFirstOrder dfs = new DepthFirstOrder(G.reverse());
for (int v : dfs.reversePostorder())
{
if (!marked[v])
{
dfs(G, v);
topological sort
DFS
count++; in a DAG
}
}
}

private void dfs(Digraph G, int v)


{ 0
marked[v] = true; 6

id[v] = count; strong 7 8

for (int w : G.adj(v)) 1 2 Kosaraju-Sharir


if (!marked[w]) components 9 10

dfs(G, w); DFS (twice)


in a digraph
3
4
} 11 12
5
public boolean stronglyConnected(int v, int w)
{ return id[v] == id[w]; }
}
65 66