Professional Documents
Culture Documents
material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify
them to fit your own needs. If you make use of a significant portion of these slides in your own
lecture, please include this message, or a link to our web site: http://www.mmds.org
advertiser
Method 1: Girvan-Newman
Divisive hierarchical clustering based on the
notion of edge betweenness:
Number of shortest paths passing through the edge
Girvan-Newman Algorithm:
Undirected unweighted networks
Repeat until no edges are left:
Calculate betweenness of edges
Remove edges with highest betweenness
Connected components are communities
Gives a hierarchical decomposition of the network
12
1
33
49
Need to re-compute
betweenness at
every step
Note:
∑ 𝑘 𝑢 =2 𝑚
𝑢 ∈𝑁
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 21
Modularity
Modularity of partitioning S of graph G:
Q ∑s S [ (# edges within group s) –
(expected # edges within group s) ]
Aij = 1 if ij,
Normalizing cost.: -1<Q<1
0 else
A 1 5 B
2 6
3 4
Questions:
How can we define a “good” partition of ?
How can we efficiently identify such a partition?
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 25
Graph Partitioning
What makes a good partition?
Maximize the number of within-group
connections
Minimize the number of between-group
connections
5
1
2 6
4
3
A B
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 26
Graph Cuts
Express partitioning objectives as a function
of the “edge cut” of the partition
Cut: Set of edges with only one vertex in a
group: cut ( A, B) w
i A, jB
ij
A 5 B
1
cut(A,B) = 2
2 6
4
3
Problem:
Only considers external cluster connections
Does not consider internal cluster connectivity
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 28
[Shi-Malik]
a11 a1n x1 y1
n
yi Aij xj x j
an1 ann xn yn j 1 ( i , j )E
1 2 ... n
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 31
Example: d-regular graph
Suppose all nodes in have degree
and is connected
What are some eigenvalues/vectors of ?
What is ? What x?
Let’s try:
Then: . So:
We found eigenpair of : ,
1 2 3 4 5 6
5 1 0 1 1 0 1 0
1
2 1 0 1 0 0 0
2 6 3 1 1 0 1 0 0
4
3 4 0 0 1 0 1 1
5 1 0 0 1 0 1
6 0 0 0 1 1 0
Important properties:
Symmetric matrix
Eigenvectors are real and orthogonal
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 36
Matrix Representations
Degree matrix (D):
n n diagonal matrix
D=[dii], dii = degree of node i
1 2 3 4 5 6
1 3 0 0 0 0 0
5
1 2 0 2 0 0 0 0
2 3 0 0 3 0 0 0
6
4 4 0 0 0 3 0 0
3
5 0 0 0 0 3 0
6 0 0 0 0 0 2
n n symmetric matrix 2 -1 2 -1 0 0 0
3 -1 -1 3 -1 0 0
5
1 4 0 0 -1 3 -1 -1
2 6 5 -1 0 0 -1 3 -1
4
3 6 0 0 0 -1 -1 2
Remember:
2 min
( i , j )E ( xi x j ) 2
All
labelings
of nodes so
that
i x 2
i i j
x
We want to assign values to nodes i such 𝑥𝑖
that few edges cross 0. 0 𝑥 𝑗
(we want xi and xj to subtract each other) Balance to minimize
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 42
Find Optimal Cut [Fiedler’73]
Back to finding the optimal cut
Express partition (A,B) as a vector
arg min f ( y) ( yi y j ) 2
minn f ( y )
y
( i , j )E
( yi y j ) y Ly 2 T
ii j
𝑥𝑖 0 𝑥 𝑗 x
is only smaller
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 45
Details!
Approx. Guarantee of Spectral
Proof (continued):
1) Let’s set:
Let’s quickly verify that
2) Then:
Spectral Clustering
1 3 -1 -1 0 -1 0
Build Laplacian 2 -1 2 -1 0 0 0
3 -1 -1 3 -1 0 0
matrix L of the 4 0 0 -1 3 -1 -1
5 -1 0 0 -1 3 -1
graph 6 0 0 0 -1 -1 2
3.0
=
0.4 0.3 0.1 0.6 -0.4 0.5
X=
Find eigenvalues 3.0 0.4 -0.3 0.1 0.6 0.4 -0.5
and eigenvectors x
4.0 0.4 -0.3 -0.5 -0.2 0.4 0.5
of the matrix L
1 0.3
2 0.6
Value of x2
Rank in x2
Value of x2
Rank in x2
Components of x3
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 54
k-Way Spectral Clustering
How do we partition a graph into k clusters?
Trawling
Searching for small communities in
the Web graph
What is the signature of a community /
discussion in a Web graph?
…
the left talk about on the right
Remember HITS!
|X| = s = 3
|Y| = t = 4
X Y
K3,4
Fully connected
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 59
[Agrawal-Srikant ‘99]
c
{b,d}: support 3
d {e,f}: support 2
And we just found 2 bipartite
e
f subgraphs:
Itemsets:
a = {b,c,d} a b c
b = {d} d
c
c = {b,d,e,f}
d e
d = {e,f} f
e = {b,d} e
f = {} J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 63
Example (2)
Example of a community from a web graph
[Kumar, Raghavan, Rajagopalan, Tomkins: Trawling the Web for emerging cyber-communities 1999]
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 64