material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify
them to fit your own needs. If you make use of a significant portion of these slides in your own
lecture, please include this message, or a link to our web site: http://www.mmds.org
Need to re-compute betweenness at every step
Questions:
How can we define a “good” partition of 𝑮?
How can we efficiently identify such a partition?
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 25
What makes a good partition?
Maximize the number of within-group connections
Minimize the number of between-group connections
[Figure: example graph on nodes 1-6 partitioned into groups A and B, with cut(A,B) = 2]
Problem:
Only considers external cluster connections
Does not consider internal cluster connectivity
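To make the cut criterion and its weakness concrete, here is a minimal sketch that counts the edges crossing a partition. The edge list is an assumption read off the example graph's Laplacian matrix in these slides, the grouping A = {1, 2, 3}, B = {4, 5, 6} is likewise assumed, and `cut_size` is a hypothetical helper name.

```python
# Edge list of the 6-node example graph (assumed; read off the Laplacian in these slides).
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]

def cut_size(edges, group_a, group_b):
    """Count edges with one endpoint in group_a and the other in group_b."""
    a, b = set(group_a), set(group_b)
    return sum(1 for u, v in edges if (u in a and v in b) or (u in b and v in a))

# Assumed balanced partition, for illustration only.
A = {1, 2, 3}
B = {4, 5, 6}
balanced_cut = cut_size(EDGES, A, B)

# Splitting off a single low-degree node gives a cut that is just as small.
lopsided_cut = cut_size(EDGES, {2}, {1, 3, 4, 5, 6})

print(balanced_cut, lopsided_cut)
```

Both partitions have the same cut value here, which is one way to see that the cut alone says nothing about how well connected each group is internally.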
[Shi-Malik]
Degree matrix D (n × n diagonal matrix, entry (i,i) is the degree of node i):

       1   2   3   4   5   6
  1    3   0   0   0   0   0
  2    0   2   0   0   0   0
  3    0   0   3   0   0   0
  4    0   0   0   3   0   0
  5    0   0   0   0   3   0
  6    0   0   0   0   0   2
Laplacian matrix L (n × n symmetric matrix):

       1   2   3   4   5   6
  1    3  -1  -1   0  -1   0
  2   -1   2  -1   0   0   0
  3   -1  -1   3  -1   0   0
  4    0   0  -1   3  -1  -1
  5   -1   0   0  -1   3  -1
  6    0   0   0  -1  -1   2
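The matrix above can be rebuilt from an edge list as L = D - A (degrees on the diagonal, -1 for each edge). The sketch below assumes the edge list implied by the off-diagonal entries; `laplacian` is a hypothetical helper, not something the slides define.

```python
# Assumed edge list for the 6-node example (read off the Laplacian in these slides).
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
N = 6

def laplacian(n, edges):
    """Return the n x n Laplacian L = D - A as a list of rows (nodes are 1-indexed)."""
    lap = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = u - 1, v - 1
        lap[i][j] -= 1   # off-diagonal entries: minus the adjacency matrix
        lap[j][i] -= 1
        lap[i][i] += 1   # diagonal entries: node degrees
        lap[j][j] += 1
    return lap

L = laplacian(N, EDGES)
for row in L:
    print(row)
```

Every row sums to zero, which is why the all-ones vector is an eigenvector of L with eigenvalue 0.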
$x^T L x = \sum_{(i,j) \in E} (x_i^2 + x_j^2 - 2 x_i x_j) = \sum_{(i,j) \in E} (x_i - x_j)^2$
Node $i$ has degree $d_i$. So, value $x_i^2$ needs to be summed up $d_i$ times.
But each edge $(i,j)$ has two endpoints, so we need $x_i^2 + x_j^2$.
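As a quick numerical check of this identity, the sketch below compares the quadratic form computed from the Laplacian entries with the sum of squared differences over edges. The edge list is assumed from the example graph, the labeling `x` is arbitrary, and the helper names are hypothetical.

```python
# Assumed edge list for the 6-node example (read off the Laplacian in these slides).
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]

def laplacian(n, edges):
    """Build L = D - A as a list of rows; nodes are 1-indexed in `edges`."""
    lap = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = u - 1, v - 1
        lap[i][j] -= 1
        lap[j][i] -= 1
        lap[i][i] += 1
        lap[j][j] += 1
    return lap

def quadratic_form(lap, x):
    """Return x^T L x computed directly from the matrix entries."""
    n = len(x)
    return sum(x[i] * lap[i][j] * x[j] for i in range(n) for j in range(n))

def edge_sum(edges, x):
    """Return the sum over edges (i, j) of (x_i - x_j)^2."""
    return sum((x[u - 1] - x[v - 1]) ** 2 for u, v in edges)

x = [2, -1, 0, 3, 1, -4]   # arbitrary integer labeling, chosen only for illustration
lhs = quadratic_form(laplacian(6, EDGES), x)
rhs = edge_sum(EDGES, x)
print(lhs, rhs)
```

The two numbers agree for any labeling, since each edge contributes its two squared endpoint values and one cross term.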
Details!
$\lambda_2 = \min_{x} \frac{x^T M x}{x^T x}$
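Write $x$ in axes of eigenvectors $w_1, w_2, \dots, w_n$ of $M$. So, $x = \sum_{i=1}^{n} \alpha_i w_i$
Then we get: $M x = \sum_i \alpha_i M w_i = \sum_i \alpha_i \lambda_i w_i$
So, what is $x^T M x$? (Recall: $w_i^T w_j = 0$ if $i \neq j$, $1$ otherwise)
$x^T M x = \left(\sum_i \alpha_i w_i\right)^T \left(\sum_j \alpha_j \lambda_j w_j\right) = \sum_{i,j} \alpha_i \lambda_j \alpha_j w_i^T w_j$
$= \sum_i \alpha_i^2 \lambda_i w_i^T w_i = \sum_i \lambda_i \alpha_i^2$
To minimize this over all unit vectors $x$ orthogonal to $w_1$, minimize over choices of $(\alpha_1, \dots, \alpha_n)$ so that:
$\sum_i \alpha_i^2 = 1$ (unit length), $\alpha_1 = x^T w_1 = 0$ (orthogonal to $w_1$)
To minimize this, set $\alpha_2 = 1$ and so $\sum_i \lambda_i \alpha_i^2 = \lambda_2$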
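The fact above can be probed numerically. The sketch below is a swapped-in technique for illustration, not the slides' method: it runs power iteration on shift·I - L while repeatedly projecting out the all-ones direction, then reports the Rayleigh quotient of the result. It assumes M is the Laplacian of the 6-node example graph (so $w_1$ is the all-ones vector), and all function names are hypothetical.

```python
import math
import random

# Assumed edge list for the 6-node example graph (read off its Laplacian in these slides).
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]

def laplacian(n, edges):
    """Build L = D - A as a list of rows; nodes are 1-indexed in `edges`."""
    lap = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        i, j = u - 1, v - 1
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
        lap[i][i] += 1.0
        lap[j][j] += 1.0
    return lap

def mat_vec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def rayleigh_quotient(m, x):
    """Return x^T M x / x^T x."""
    mx = mat_vec(m, x)
    return sum(a * b for a, b in zip(x, mx)) / sum(a * a for a in x)

def second_eigenpair(lap, iters=500, seed=0):
    """Approximate (lambda_2, x_2) for the Laplacian of a connected graph."""
    n = len(lap)
    # Every Laplacian eigenvalue is at most 2 * max degree, so shift*I - L has
    # non-negative eigenvalues and its largest ones correspond to L's smallest.
    shift = 2.0 * max(lap[i][i] for i in range(n))
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        mean = sum(x) / n
        x = [v - mean for v in x]                      # stay orthogonal to (1, ..., 1)
        lx = mat_vec(lap, x)
        x = [shift * v - lv for v, lv in zip(x, lx)]   # one power-iteration step
        norm = math.sqrt(sum(v * v for v in x))
        x = [v / norm for v in x]                      # keep unit length
    return rayleigh_quotient(lap, x), x

lam2, x2 = second_eigenpair(laplacian(6, EDGES))
print(round(lam2, 6))
print([round(v, 1) for v in x2])
```

On this graph the reported quotient should land close to the second-smallest eigenvalue of the Laplacian; the overall sign of the vector is arbitrary.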
What else do we know about x?
$x$ is unit vector: $\sum_i x_i^2 = 1$
$x$ is orthogonal to 1st eigenvector $(1, \dots, 1)$ thus:
$\sum_i x_i \cdot 1 = \sum_i x_i = 0$
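Remember:
$\lambda_2 = \min_{x:\ \sum_i x_i = 0} \frac{\sum_{(i,j) \in E} (x_i - x_j)^2}{\sum_i x_i^2}$ (minimum over all labelings of nodes $i$ so that $\sum_i x_i = 0$)
We want to assign values $x_i$ to nodes $i$ such that few edges cross 0.
(we want $x_i$ and $x_j$ to subtract each other)
[Figure: number line with $x_i$ and $x_j$ on opposite sides of 0; annotation: balance to minimize]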
Back to finding the optimal cut
Express partition (A,B) as a vector
$y_i = \begin{cases} +1 & \text{if } i \in A \\ -1 & \text{if } i \in B \end{cases}$
We can minimize the cut of the partition by
finding a non-trivial vector x that minimizes:
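Proof (continued):
1) Let's set: $x_i = \begin{cases} -\frac{1}{a} & \text{if } i \in A \\ +\frac{1}{b} & \text{if } i \in B \end{cases}$
Let's quickly verify that $\sum_i x_i = 0$: $a \left(-\frac{1}{a}\right) + b \left(\frac{1}{b}\right) = 0$
2) Then: $\frac{\sum_{(i,j) \in E} (x_i - x_j)^2}{\sum_i x_i^2} = \frac{\sum_{i \in A, j \in B} \left(\frac{1}{b} + \frac{1}{a}\right)^2}{a \left(-\frac{1}{a}\right)^2 + b \left(\frac{1}{b}\right)^2} = \frac{e \cdot \left(\frac{1}{a} + \frac{1}{b}\right)^2}{\frac{1}{a} + \frac{1}{b}} = e \left(\frac{1}{a} + \frac{1}{b}\right) \le e \left(\frac{1}{a} + \frac{1}{a}\right) = e \cdot \frac{2}{a} = 2\alpha$
Here $a = |A|$ and $b = |B|$; the inequality step uses $a \le b$.
Which proves that the cost achieved by spectral is at most twice the OPT cost
e … number of edges between A and B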
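The chain of equalities in step 2 can be checked with exact arithmetic on a small case. This sketch assumes the 6-node example graph, the partition A = {1, 2, 3}, B = {4, 5, 6}, and that $\alpha$ means the number of cut edges divided by $|A|$ (as the final equality $e \cdot 2/a = 2\alpha$ suggests).

```python
from fractions import Fraction

# Assumed edge list for the 6-node example graph (read off its Laplacian in these slides).
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
A = [1, 2, 3]   # assumed partition, with |A| <= |B|
B = [4, 5, 6]

a, b = len(A), len(B)
e = sum(1 for u, v in EDGES if (u in A) != (v in A))   # edges between A and B

# x_i = -1/a on A and +1/b on B, kept as exact fractions.
x = {i: Fraction(-1, a) for i in A}
x.update({i: Fraction(1, b) for i in B})

# Ratio of the edge sum of squared differences to the squared norm of x.
quotient = sum((x[u] - x[v]) ** 2 for u, v in EDGES) / sum(val * val for val in x.values())
alpha = Fraction(e, a)   # assumed meaning of alpha: cut edges per node of A

print(sum(x.values()), quotient, 2 * alpha)
```

For this balanced partition the quotient equals $e(1/a + 1/b)$ and meets the $2\alpha$ bound with equality, because $a = b$.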
Details!
Spectral Clustering
1) Build Laplacian matrix L of the graph:

       1   2   3   4   5   6
  1    3  -1  -1   0  -1   0
  2   -1   2  -1   0   0   0
  3   -1  -1   3  -1   0   0
  4    0   0  -1   3  -1  -1
  5   -1   0   0  -1   3  -1
  6    0   0   0  -1  -1   2

2) Decomposition: find eigenvalues λ and eigenvectors x of the matrix L:

  λ = (0.0, 1.0, 3.0, 3.0, 4.0, 5.0)

  X =
       0.4   0.3  -0.5  -0.2  -0.4  -0.5
       0.4   0.6   0.4  -0.4   0.4   0.0
       0.4   0.3   0.1   0.6  -0.4   0.5
       0.4  -0.3   0.1   0.6   0.4  -0.5
       0.4  -0.3  -0.5  -0.2   0.4   0.5
       0.4  -0.6   0.4  -0.4  -0.4   0.0

Map vertices to corresponding components of x2, the eigenvector of λ2:

  1    0.3
  2    0.6
  3    0.3
  4   -0.3
  5   -0.3
  6   -0.6

How do we now find the clusters?
[Figures: value of x2 vs. rank in x2; components of x3 vs. rank in x2]
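One naive way to answer the question, assumed here purely for illustration, is to split the nodes at 0 on their x2 component. The values below are the rounded components listed in the slide; `split_by_sign` is a hypothetical helper.

```python
# Components of x2 for nodes 1..6, as listed in the slide (rounded to one decimal).
X2 = {1: 0.3, 2: 0.6, 3: 0.3, 4: -0.3, 5: -0.3, 6: -0.6}

def split_by_sign(values, threshold=0.0):
    """Return (nodes above threshold, nodes at or below threshold), each sorted."""
    above = sorted(node for node, v in values.items() if v > threshold)
    below = sorted(node for node, v in values.items() if v <= threshold)
    return above, below

cluster_a, cluster_b = split_by_sign(X2)
print(cluster_a, cluster_b)
```

Other split points (for example the median component) are possible; the threshold parameter is there so the same helper can be reused with a different choice.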
How do we partition a graph into k clusters?
…
the left talk about on the right
Remember HITS!
$K_{3,4}$: fully connected bipartite graph between X and Y, with |X| = s = 3 and |Y| = t = 4
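For concreteness, a fully connected bipartite graph $K_{s,t}$ has every node of X linked to every node of Y, so it has s·t edges. The sketch below generates one; the node labels and the function name are hypothetical.

```python
from itertools import product

def complete_bipartite(s, t):
    """Return (X, Y, edges) for K_{s,t}: every node in X is linked to every node in Y."""
    left = [f"x{i}" for i in range(1, s + 1)]
    right = [f"y{j}" for j in range(1, t + 1)]
    edges = list(product(left, right))
    return left, right, edges

X, Y, edges = complete_bipartite(3, 4)
print(len(X), len(Y), len(edges))
```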
[Agrawal-Srikant ‘99]
[Kumar, Raghavan, Rajagopalan, Tomkins: Trawling the Web for emerging cyber-communities 1999]