[Figure: digraph anatomy. Labels: outdegree = 4; indegree = 2; directed path from 0 to 2; directed cycle.]
Political blogosphere graph
Vertex = political blog; edge = link.
[Figure: community structure of political blogs. Source: The Political Blogosphere and the 2004 U.S. Election: Divided They Blog, Adamic and Glance, 2005.]

Figure 1: Community structure of political blogs (expanded set), shown using a GEM layout [11] in the GUESS [3] visualization and analysis tool. The colors reflect political orientation, red for conservative, and blue for liberal. Orange links go from liberal to conservative, and purple ones from conservative to liberal. The size of each blog reflects the number of other blogs that link to it.

[Excerpt from Adamic and Glance, 2005, as shown on the slide:]
... longer existed, or had moved to a different location. When looking at the front page of a blog we did not make a distinction between blog references made in blogrolls (blogroll links) from those made in posts (post citations). This had the disadvantage of not differentiating between blogs that were actively mentioned in a post on that day, from blogroll links that remain static over many weeks [10]. Since posts usually contain sparse references to other blogs, and blogrolls usually contain dozens of blogs, we assumed that the network obtained by crawling the front page of each blog would strongly reflect blogroll links. 479 blogs had blogrolls through blogrolling.com, while many others simply maintained a list of links to their favorite blogs. We did not include blogrolls placed on a secondary page.

We constructed a citation network by identifying whether a URL present on the page of one blog references another political blog. We called a link found anywhere on a blog’s page, a “page link” to distinguish it from a “post citation”, a link to another blog that occurs strictly within a post. Figure 1 shows the unmistakable division between the liberal and conservative political (blogo)spheres. In fact, 91% of the links originating within either the conservative or liberal communities stay within that community. An effect that may not be as apparent from the visualization is that even though we started with a balanced set of blogs, conservative blogs show a greater tendency to link. 84% of conservative blogs link to at least one other blog, and 82% receive a link. In contrast, 74% of liberal blogs link to another blog, while only 67% are linked to by another blog. So overall, we see a slightly higher tendency for conservative blogs to link. Liberal blogs linked to 13.6 blogs on average, while conservative blogs linked to an average of 15.1, and this difference is almost entirely due to the higher proportion of liberal blogs with no links at all.

Although liberal blogs may not link as generously on average, the most popular liberal blogs, Daily Kos and Eschaton (atrios.blogspot.com), had 338 and 264 links from our single-day snapshot ...

Overnight interbank loan graph
Vertex = bank; edge = overnight loan.
[Figure: federal funds network with regions labeled GSCC, GWCC, GIN, GOUT, DC, and Tendril. Source: The Topology of the Federal Funds Market, Bech and Atalay, 2008.]

Uber taxi graph
Vertex = taxi pickup; edge = taxi ride.
http://blog.uber.com/2012/01/09/uberdata-san-franciscomics/

Implication graph
Vertex = variable; edge = logical implication.
If x5 is true, then x0 is true.
[Figure: implication graph on the variables x0 through x6 and their negations (~x0 through ~x6).]
Combinational circuit
Vertex = logical gate; edge = wire.

WordNet graph
Vertex = synset; edge = hypernym relationship.
[Figure: WordNet fragment with synsets including event; group_action; damage, harm, impairment; transition; increase; forfeit, forfeiture; action; demotion; variation; motion, movement, move.]
application   vertex            edge
scheduling    task              precedence constraint
citation      journal article   citation

problem          description
directed cycle   Is there a directed cycle in the graph?
PageRank         What is the importance of a web page?
tinyDG.txt (V = 13, E = 22)
13
22
 4  2
 2  3
 3  2
 6  0
 0  1
 2  0
11 12
12  9
 9 10
 9 11
 7  9
10 12
11  4
 4  3
 3  5
 6  8
 8  6
 5  4
 0  5
 6  4
 6  9
 7  6

In in = new In(args[0]);              // read digraph from input stream
Digraph G = new Digraph(in);

for (int v = 0; v < G.V(); v++)       // print out each edge (once)
   for (int w : G.adj(v))
      StdOut.println(v + "->" + w);

% java Digraph tinyDG.txt
0->5
0->1
2->0
2->3
3->5
3->2
4->3
4->2
5->4
⋮
11->4
11->12
12->9

Maintain vertex-indexed array of lists.
[Figure: adj[] adjacency lists for tinyDG.txt.]
Digraph representations

4.2 DIRECTED GRAPHS
‣ introduction
‣ digraph API
‣ digraph search
‣ topological sort
‣ strong components

Algorithms
ROBERT SEDGEWICK | KEVIN WAYNE
http://algs4.cs.princeton.edu

Adjacency-lists graph representation (review): Java implementation

public class Digraph
{
   private final int V;
   private final Bag<Integer>[] adj;        // adjacency lists

   public Digraph(int V)                    // create empty digraph with V vertices
   {
      this.V = V;
      adj = (Bag<Integer>[]) new Bag[V];
      for (int v = 0; v < V; v++)
         adj[v] = new Bag<Integer>();
   }

   public void addEdge(int v, int w)        // add edge v→w
   {
      adj[v].add(w);
   }

   public Iterable<Integer> adj(int v)      // iterator for vertices pointing from v
   {  return adj[v];  }
}
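A minimal, self-contained sketch of the adjacency-lists representation, assuming java.util lists in place of the Bag type used above. The class name SimpleDigraph and the outdegree() and indegree() helpers are illustrative assumptions, not part of the code shown on the slide.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for Digraph that uses java.util lists instead of Bag.
final class SimpleDigraph {
    private final int V;                      // number of vertices
    private final List<List<Integer>> adj;    // adj.get(v) = vertices pointing from v
    private final int[] indegree;             // indegree[w] = number of edges into w

    SimpleDigraph(int V) {
        this.V = V;
        this.adj = new ArrayList<>();
        this.indegree = new int[V];
        for (int v = 0; v < V; v++) adj.add(new ArrayList<>());
    }

    int V() { return V; }

    // Add the directed edge v->w (only v's list changes, unlike an undirected graph).
    void addEdge(int v, int w) {
        adj.get(v).add(w);
        indegree[w]++;
    }

    Iterable<Integer> adj(int v) { return adj.get(v); }

    int outdegree(int v) { return adj.get(v).size(); }

    int indegree(int v) { return indegree[v]; }

    public static void main(String[] args) {
        SimpleDigraph G = new SimpleDigraph(4);
        G.addEdge(0, 1);
        G.addEdge(0, 2);
        G.addEdge(2, 1);
        G.addEdge(3, 0);
        // Print each edge once, as in the client code for Digraph.
        for (int v = 0; v < G.V(); v++)
            for (int w : G.adj(v))
                System.out.println(v + "->" + w);
    }
}
```

Because each edge is stored only in the list of its tail vertex, iterating over all lists visits every edge exactly once.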
Reachability
Problem. Find all vertices reachable from s along a directed path.

Depth-first search in digraphs
Same method as for undirected graphs.
・Every undirected graph is a digraph (with edges in both directions).
・DFS is a digraph algorithm.

DFS (to visit a vertex v):
Mark v as visited.
Recursively visit all unmarked vertices w pointing from v.
Recall code for undirected graphs.

private void dfs(Graph G, int v)          // recursive DFS does the work
{
   marked[v] = true;
   for (int w : G.adj(v))
      if (!marked[w]) dfs(G, w);
}

public boolean visited(int v)             // client can ask whether any vertex is connected to s
{  return marked[v];  }

Code for directed graphs identical to undirected one. [substitute Digraph for Graph]

private void dfs(Digraph G, int v)        // recursive DFS does the work
{
   marked[v] = true;
   for (int w : G.adj(v))
      if (!marked[w]) dfs(G, w);
}

public boolean visited(int v)             // client can ask whether any vertex is reachable from s
{  return marked[v];  }
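The directed DFS above can be tried on a small example. This is a sketch under assumptions: it uses plain java.util adjacency lists rather than the Digraph class, and the names ReachabilitySketch, reachableFrom, and build are hypothetical helpers introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ReachabilitySketch {
    // Returns marked[], where marked[v] is true if v is reachable from s
    // along a directed path.
    static boolean[] reachableFrom(List<List<Integer>> adj, int s) {
        boolean[] marked = new boolean[adj.size()];
        dfs(adj, s, marked);
        return marked;
    }

    // Mark v, then recursively visit all unmarked vertices pointing from v.
    private static void dfs(List<List<Integer>> adj, int v, boolean[] marked) {
        marked[v] = true;
        for (int w : adj.get(v))
            if (!marked[w]) dfs(adj, w, marked);
    }

    // Hypothetical helper: builds adjacency lists from {v, w} pairs (edge v->w).
    static List<List<Integer>> build(int V, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < V; v++) adj.add(new ArrayList<>());
        for (int[] e : edges) adj.get(e[0]).add(e[1]);
        return adj;
    }

    public static void main(String[] args) {
        // Edges 0->1, 1->2, 3->0: vertex 3 points to 0 but is not reachable from 0.
        List<List<Integer>> adj = build(4, new int[][] {{0, 1}, {1, 2}, {3, 0}});
        System.out.println(Arrays.toString(reachableFrom(adj, 0)));
        // prints [true, true, true, false]
    }
}
```

The example shows how direction matters: in the undirected version of this graph every vertex would be marked.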
Reachability application: program control-flow analysis

Reachability application: mark-sweep garbage collector
[Figure: objects reachable from the roots.]

Mark-sweep algorithm. [McCarthy, 1960]
・Mark: mark all reachable objects.
・Sweep: if object is unmarked, it is garbage (so add to free list).

Memory cost. Uses 1 extra mark bit per object (plus DFS stack).

Depth-first search in digraphs summary
DFS enables direct solution of simple digraph problems.
✓ ・Reachability.
・Path finding.
・Topological sort.
・Directed cycle detection.

Basis for solving difficult digraph problems.
・2-satisfiability.
・Directed Euler path.
・Strongly-connected components.
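The mark-sweep algorithm described above can be sketched on a toy heap. Everything here is an assumption made for illustration: objects are numbered 0 to n-1, refs.get(i) lists the objects that object i points to, and the names MarkSweepSketch and collect are hypothetical. A real collector works on raw memory, not on lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MarkSweepSketch {
    // Returns the free list: every object not reachable from any root.
    static List<Integer> collect(List<List<Integer>> refs, int[] roots) {
        boolean[] marked = new boolean[refs.size()];    // 1 extra mark bit per object

        // Mark: mark all objects reachable from the roots (directed DFS).
        for (int r : roots)
            if (!marked[r]) mark(refs, r, marked);

        // Sweep: an unmarked object is garbage, so add it to the free list.
        List<Integer> freeList = new ArrayList<>();
        for (int obj = 0; obj < refs.size(); obj++)
            if (!marked[obj]) freeList.add(obj);
        return freeList;
    }

    private static void mark(List<List<Integer>> refs, int v, boolean[] marked) {
        marked[v] = true;
        for (int w : refs.get(v))
            if (!marked[w]) mark(refs, w, marked);
    }

    public static void main(String[] args) {
        // Toy heap: 0 -> 1 -> 2 is reachable from root 0;
        // 3 and 4 reference each other but nothing reachable points to them.
        List<List<Integer>> refs = new ArrayList<>();
        refs.add(Arrays.asList(1));   // object 0
        refs.add(Arrays.asList(2));   // object 1
        refs.add(new ArrayList<>());  // object 2
        refs.add(Arrays.asList(4));   // object 3
        refs.add(Arrays.asList(3));   // object 4
        System.out.println(collect(refs, new int[] {0}));   // prints [3, 4]
    }
}
```

Objects 3 and 4 form a directed cycle, yet both are collected: reachability from the roots, not the presence of incoming references, decides what is garbage.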
[Excerpt shown on slide, SIAM J. COMPUT., Vol. 1, No. 2, June 1972:]
Two cycles which are cyclic permutations of each other are considered to be the same cycle. The undirected version of a directed graph is the graph formed by converting each edge of the directed graph into an undirected edge and removing duplicate edges. An undirected graph is connected if there is a path between every pair of vertices.
A (directed rooted) tree T is a directed graph whose undirected version is connected, having one vertex which is the head of no edges (called the root), and such that all vertices except the root are the head of exactly one edge. The ...

・Every undirected graph is a digraph (with edges in both directions).
・BFS is a digraph algorithm.

・Remove vertex v from queue.
・Add to queue all unmarked vertices pointing from v and mark them.
Directed breadth-first search demo
[Figure: BFS demo on tinyDG.txt.]

Multiple-source shortest paths
Q. How to implement multi-source shortest paths algorithm?
A. Use BFS, but initialize by enqueuing all source vertices.
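The answer above (run BFS, but enqueue all source vertices first) can be sketched as follows. This is an illustrative sketch, assuming java.util adjacency lists instead of the Digraph class; the names MultiSourceBfsSketch, distTo, and build are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

final class MultiSourceBfsSketch {
    // dist[v] = number of edges on a shortest directed path from any source
    // to v, or -1 if v is not reachable from any source.
    static int[] distTo(List<List<Integer>> adj, int[] sources) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, -1);                       // -1 doubles as "unmarked"
        Queue<Integer> queue = new ArrayDeque<>();
        for (int s : sources) {                      // the only change from single-source BFS
            if (dist[s] == -1) {
                dist[s] = 0;
                queue.add(s);
            }
        }
        while (!queue.isEmpty()) {
            int v = queue.remove();                  // remove vertex v from queue
            for (int w : adj.get(v)) {
                if (dist[w] == -1) {                 // unmarked vertex pointing from v
                    dist[w] = dist[v] + 1;
                    queue.add(w);
                }
            }
        }
        return dist;
    }

    // Hypothetical helper: builds adjacency lists from {v, w} pairs (edge v->w).
    static List<List<Integer>> build(int V, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < V; v++) adj.add(new ArrayList<>());
        for (int[] e : edges) adj.get(e[0]).add(e[1]);
        return adj;
    }

    public static void main(String[] args) {
        // Edges 0->1, 1->2, 3->2, 2->4 with sources {0, 3}.
        List<List<Integer>> adj = build(5, new int[][] {{0, 1}, {1, 2}, {3, 2}, {2, 4}});
        System.out.println(Arrays.toString(distTo(adj, new int[] {0, 3})));
        // prints [0, 1, 1, 0, 2]
    }
}
```

Vertex 2 gets distance 1 (via source 3) rather than 2 (via source 0), because BFS explores outward from all sources in lockstep.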
Breadth-first search in digraphs application: web crawler
Goal. Crawl web, starting from some root web page, say www.princeton.edu.
・Maintain a Queue of websites to explore.
・Maintain a SET of discovered websites.
・Dequeue the next website and enqueue websites to which it links (provided you haven't done so before).

Q. Why not use DFS?

[Figure: web digraph with numbered vertices.]
How many strong components are there in this digraph?

Bare-bones web crawler: Java implementation

Queue<String> queue = new Queue<String>();      // queue of websites to crawl
SET<String> marked = new SET<String>();         // set of marked websites

String root = "http://www.princeton.edu";       // start crawling from the root website
queue.enqueue(root);
marked.add(root);

while (!queue.isEmpty())
{
   String v = queue.dequeue();                  // read in raw html from next website in queue
   StdOut.println(v);
   In in = new In(v);
   String input = in.readAll();

   // use regular expression to find all URLs in website of form http://xxx.yyy.zzz
   // [crude pattern misses relative URLs]
   String regexp = "http://(\\w+\\.)*(\\w+)";
   Pattern pattern = Pattern.compile(regexp);
   Matcher matcher = pattern.matcher(input);
   while (matcher.find())
   {
      String w = matcher.group();
      if (!marked.contains(w))
      {
         marked.add(w);                         // if unmarked, mark it and put on the queue
         queue.enqueue(w);
      }
   }
}
Web crawler output

http://www.princeton.edu             http://www.princeton.edu
http://www.w3.org                    http://deimos.apple.com
http://ogp.me                        http://www.youtube.com
http://giving.princeton.edu          http://www.google.com
http://www.princetonartmuseum.org    http://news.google.com
http://www.goprincetontigers.com     http://csi.gstatic.com
http://library.princeton.edu         http://googlenewsblog.blogspot.com
http://helpdesk.princeton.edu        http://labs.google.com
http://tigernet.princeton.edu        http://groups.google.com
http://alumni.princeton.edu          http://img1.blogblog.com
http://gradschool.princeton.edu      http://feeds.feedburner.com
http://vimeo.com                     http:/buttons.googlesyndication.com
http://princetonusg.com              http://fusion.google.com
http://artmuseum.princeton.edu       http://insidesearch.blogspot.com
http://jobs.princeton.edu            http://agoogleaday.com
http://odoc.princeton.edu            http://static.googleusercontent.com
http://blogs.princeton.edu           http://searchresearch1.blogspot.com
http://www.facebook.com              http://feedburner.google.com
http://twitter.com                   http://www.dot.ca.gov
http://www.youtube.com               http://www.TahoeRoads.com
http://deimos.apple.com              http://www.LakeTahoeTransit.com
http://qeprize.org                   http://www.laketahoe.com
http://en.wikipedia.org              http://ethel.tahoeguide.com
...                                  ...
Goal. Given a set of tasks to be completed with precedence constraints, in which order should we schedule the tasks?

Digraph model. vertex = task; edge = precedence constraint.

DAG. Directed acyclic graph.

Topological sort. Redraw DAG so all edges point upwards.
public DepthFirstOrder(Digraph G)
{
   reversePostorder = new Stack<Integer>();
   marked = new boolean[G.V()];
   for (int v = 0; v < G.V(); v++)
      if (!marked[v]) dfs(G, v);
}

Proposition. Reverse DFS postorder of a DAG is a topological order.
Pf. Consider any edge v→w. When dfs(v) is called:
・Case 1: dfs(w) has already been called and returned.
  Thus, w was done before v.
・Case 2: dfs(w) has not yet been called.
  dfs(w) will get called directly or indirectly by dfs(v) and will finish before dfs(v).
  Thus, w will be done before v.
・Case 3: dfs(w) has already been called, but has not yet returned.
  Can’t happen in a DAG: function call stack contains path from w to v, so v→w would complete a cycle.

Example trace (v = 3):

dfs(0)
  dfs(1)
    dfs(4)
    4 done
  1 done
  dfs(2)
  2 done
  dfs(5)
    check 2
  5 done
0 done
check 1
check 2
dfs(3)
  check 2        (case 1)
  check 4        (case 1)
  check 5        (case 1)
  dfs(6)         (case 2)
    check 0
    check 4
  6 done
3 done
check 4
check 5
check 6
done

postorder: 4 1 2 5 0 6 3

All vertices pointing from 3 are done before 3 is done, so they appear after 3 in topological order.

Proposition. A digraph has a topological order iff no directed cycle.
Pf.
・If directed cycle, topological order impossible.
・If no directed cycle, DFS-based algorithm finds a topological order.

Finding a directed cycle in a digraph
[Figure: a digraph with a directed cycle; DFS trace with the marked[] array.]
Goal. Given a digraph, find a directed cycle.
Solution. DFS. What else? See textbook.
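Directed cycle detection with DFS can be sketched as below. This is one common approach and not necessarily the textbook's exact code: it tracks which vertices are on the current recursion stack, and an edge to such a vertex (Case 3 in the proof above) closes a cycle. The names DirectedCycleSketch, hasCycle, cycle, and build are assumptions for illustration, as is the use of java.util lists instead of Digraph.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class DirectedCycleSketch {
    private final boolean[] marked;    // marked[v] = has dfs(v) been called?
    private final boolean[] onStack;   // onStack[v] = is dfs(v) still running?
    private final int[] edgeTo;        // edgeTo[w] = previous vertex on DFS path to w
    private Deque<Integer> cycle;      // vertices on a directed cycle, or null if none found

    DirectedCycleSketch(List<List<Integer>> adj) {
        int V = adj.size();
        marked = new boolean[V];
        onStack = new boolean[V];
        edgeTo = new int[V];
        for (int v = 0; v < V; v++)
            if (!marked[v] && cycle == null) dfs(adj, v);
    }

    private void dfs(List<List<Integer>> adj, int v) {
        marked[v] = true;
        onStack[v] = true;
        for (int w : adj.get(v)) {
            if (cycle != null) return;             // a cycle was already found: stop
            if (!marked[w]) {
                edgeTo[w] = v;
                dfs(adj, w);
            } else if (onStack[w]) {               // dfs(w) called but not yet returned
                cycle = new ArrayDeque<>();
                for (int x = v; x != w; x = edgeTo[x]) cycle.push(x);
                cycle.push(w);
                cycle.push(v);                     // cycle reads v -> w -> ... -> v
            }
        }
        onStack[v] = false;
    }

    boolean hasCycle() { return cycle != null; }

    Iterable<Integer> cycle() { return cycle; }

    // Hypothetical helper: builds adjacency lists from {v, w} pairs (edge v->w).
    static List<List<Integer>> build(int V, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < V; v++) adj.add(new ArrayList<>());
        for (int[] e : edges) adj.get(e[0]).add(e[1]);
        return adj;
    }

    public static void main(String[] args) {
        DirectedCycleSketch cyclic =
            new DirectedCycleSketch(build(3, new int[][] {{0, 1}, {1, 2}, {2, 0}}));
        System.out.println(cyclic.hasCycle() + " " + cyclic.cycle());
        DirectedCycleSketch dag =
            new DirectedCycleSketch(build(3, new int[][] {{0, 1}, {0, 2}, {1, 2}}));
        System.out.println(dag.hasCycle());
    }
}
```

Note that the second example has two paths from 0 to 2 but no directed cycle; a plain marked[] check would not distinguish these cases, which is why the onStack[] array is needed.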
Directed cycle detection application: precedence scheduling
Scheduling. Given a set of tasks to be completed with precedence constraints, in what order should we schedule the tasks?

Directed cycle detection application: cyclic inheritance
The Java compiler does cycle detection.

Microsoft Excel does cycle detection (and has a circular reference toolbar!)

Observation. DFS visits each vertex exactly once. The order in which it does so can be important.

Orderings.
・Preorder: order in which dfs() is called.
・Postorder: order in which dfs() returns.
・Reverse postorder: reverse order in which dfs() returns.
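private void dfs(Digraph G, int v)
{
   marked[v] = true;
   preorder.enqueue(v);                   // preorder: v recorded when dfs(v) is called
   for (int w : G.adj(v))
      if (!marked[w]) dfs(G, w);
   postorder.enqueue(v);                  // postorder: v recorded when dfs(v) returns
   reversePostorder.push(v);              // stack reverses the postorder
}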
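The reverse postorder above gives a topological order when the digraph is a DAG. The sketch below assumes java.util collections instead of the course's Stack and Digraph; the names TopologicalSketch and reversePostorder are hypothetical, and the 7-vertex DAG in main is an assumption chosen so that the DFS postorder is 4 1 2 5 0 6 3.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class TopologicalSketch {
    // Returns the vertices in reverse DFS postorder
    // (a topological order if the digraph is a DAG).
    static List<Integer> reversePostorder(List<List<Integer>> adj) {
        boolean[] marked = new boolean[adj.size()];
        Deque<Integer> reversePost = new ArrayDeque<>();
        for (int v = 0; v < adj.size(); v++)
            if (!marked[v]) dfs(adj, v, marked, reversePost);
        return new ArrayList<>(reversePost);
    }

    private static void dfs(List<List<Integer>> adj, int v,
                            boolean[] marked, Deque<Integer> reversePost) {
        marked[v] = true;
        for (int w : adj.get(v))
            if (!marked[w]) dfs(adj, w, marked, reversePost);
        reversePost.push(v);   // v is done: push so that later-finished vertices come first
    }

    // Hypothetical helper: builds adjacency lists from {v, w} pairs (edge v->w).
    static List<List<Integer>> build(int V, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < V; v++) adj.add(new ArrayList<>());
        for (int[] e : edges) adj.get(e[0]).add(e[1]);
        return adj;
    }

    public static void main(String[] args) {
        int[][] edges = {
            {0, 1}, {0, 2}, {0, 5}, {1, 4}, {5, 2},
            {3, 2}, {3, 4}, {3, 5}, {3, 6}, {6, 0}, {6, 4}
        };
        System.out.println(reversePostorder(build(7, edges)));
        // prints [3, 6, 0, 5, 2, 1, 4]
    }
}
```

In the printed order every edge points from an earlier vertex to a later one; for example 3 precedes 2, 4, 5, and 6.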
Strongly-connected components
Def. Vertices v and w are strongly connected if there is both a directed path from v to w and a directed path from w to v.

Food web graph. Vertex = species; edge = from producer to consumer.

v and w are connected if there is a path between v and w.
connected component id (easy to compute with DFS)

  v     0  1  2  3  4  5  6  7  8  9 10 11 12
  id[]  0  0  0  0  0  0  1  1  1  2  2  2  2

v and w are strongly connected if there is both a directed path from v to w and a directed path from w to v.
strongly-connected component id (how to compute?)

  v     0  1  2  3  4  5  6  7  8  9 10 11 12
  id[]  1  0  1  1  1  1  3  4  3  2  2  2  2
Strong component application: software modules

Strong components algorithms: brief history

Reverse graph. Strong components in G are same as in G^R.
Kernel DAG. Contract each strong component into a single vertex.
Idea. Kernel DAG in reverse topological order (how to compute?).
[Figure: digraph G and its strong components; kernel DAG of G (topological order: A B C D E).]

Phase 1. Compute reverse postorder in G^R.
  1 0 2 4 5 3 11 9 12 10 6 7 8

Phase 2. Run DFS in G, visiting unmarked vertices in reverse postorder of G^R.
  1 0 2 4 5 3 11 9 12 10 6 7 8

   v   id[]
   0   1
   1   0
   2   1
   3   1
   4   1
   5   1
   6   3
   7   4
   8   3
   9   2
  10   2
  11   2
  12   2
Kosaraju-Sharir algorithm
Simple (but mysterious) algorithm for computing strong components.
・Phase 1: run DFS on G^R to compute reverse postorder.
・Phase 2: run DFS on G, considering vertices in order given by first DFS.

[Figure: Phase 1 trace. DFS in G^R, checking unmarked vertices in the order 0 1 2 3 4 5 6 7 8 9 10 11 12; resulting reverse postorder for use in second dfs(): 1 0 2 4 5 3 11 9 12 10 6 7 8.]
[Figure: Phase 2 trace. DFS in G, checking unmarked vertices in the order 1 0 2 4 5 3 11 9 12 10 6 7 8; the top-level calls are dfs(1), dfs(0), dfs(11), dfs(6), dfs(7), one per strong component.]
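The two phases of the Kosaraju-Sharir algorithm can be sketched as follows. This is an illustrative sketch under assumptions: java.util adjacency lists replace the Digraph class, the names KosarajuSharirSketch, strongComponentIds, and build are hypothetical, and because the lists keep insertion order (unlike Bag) the component numbers may differ from the id[] values on the slide even though the grouping is the same.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class KosarajuSharirSketch {
    // Returns id[], where id[v] == id[w] exactly when v and w are strongly connected.
    static int[] strongComponentIds(List<List<Integer>> adj) {
        int V = adj.size();

        // Build the reverse digraph G^R: edge v->w in G becomes w->v.
        List<List<Integer>> rev = new ArrayList<>();
        for (int v = 0; v < V; v++) rev.add(new ArrayList<>());
        for (int v = 0; v < V; v++)
            for (int w : adj.get(v)) rev.get(w).add(v);

        // Phase 1: run DFS on G^R to compute reverse postorder.
        boolean[] marked = new boolean[V];
        Deque<Integer> order = new ArrayDeque<>();
        for (int v = 0; v < V; v++)
            if (!marked[v]) postorderDfs(rev, v, marked, order);

        // Phase 2: run DFS on G, considering vertices in the order from phase 1.
        // Each top-level call labels exactly one strong component.
        int[] id = new int[V];
        Arrays.fill(marked, false);
        int count = 0;
        for (int s : order) {
            if (!marked[s]) {
                labelDfs(adj, s, marked, id, count);
                count++;
            }
        }
        return id;
    }

    private static void postorderDfs(List<List<Integer>> g, int v,
                                     boolean[] marked, Deque<Integer> order) {
        marked[v] = true;
        for (int w : g.get(v))
            if (!marked[w]) postorderDfs(g, w, marked, order);
        order.push(v);   // pushing on completion yields reverse postorder
    }

    private static void labelDfs(List<List<Integer>> g, int v,
                                 boolean[] marked, int[] id, int count) {
        marked[v] = true;
        id[v] = count;
        for (int w : g.get(v))
            if (!marked[w]) labelDfs(g, w, marked, id, count);
    }

    // Hypothetical helper: builds adjacency lists from {v, w} pairs (edge v->w).
    static List<List<Integer>> build(int V, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < V; v++) adj.add(new ArrayList<>());
        for (int[] e : edges) adj.get(e[0]).add(e[1]);
        return adj;
    }

    public static void main(String[] args) {
        // Edges of tinyDG.txt (13 vertices, 22 edges).
        int[][] tinyDG = {
            {4, 2}, {2, 3}, {3, 2}, {6, 0}, {0, 1}, {2, 0}, {11, 12}, {12, 9},
            {9, 10}, {9, 11}, {7, 9}, {10, 12}, {11, 4}, {4, 3}, {3, 5}, {6, 8},
            {8, 6}, {5, 4}, {0, 5}, {6, 4}, {6, 9}, {7, 6}
        };
        System.out.println(Arrays.toString(strongComponentIds(build(13, tinyDG))));
    }
}
```

Each phase is a single DFS plus linear-time bookkeeping, which matches the E + V running time claimed for the algorithm.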
Kosaraju-Sharir algorithm
Proposition. Kosaraju-Sharir algorithm computes the strong components of a digraph in time proportional to E + V.
Pf.
・Implementation: easy!

Connected components in an undirected graph (with DFS)

public class CC
{
   private boolean[] marked;
   private int[] id;
   private int count;

   public CC(Graph G)
   {
      marked = new boolean[G.V()];
      id = new int[G.V()];
      for (int v = 0; v < G.V(); v++)
      {
         if (!marked[v])
         {
            dfs(G, v);
            count++;
         }
      }
   }