[Figure: example item sets grouped into clusters, e.g., Professional vs. Culture documents, and POP / Rock / EDM songs]
Goal:
1. Recover the underlying clusters
2. Rank the items in each cluster
Given only pairwise preferences!
Intuition: Utilize the heterogeneity
[Figure: comparisons within a cluster (e.g., POP vs. POP) reflect the items' score gaps, while comparisons across clusters (e.g., POP vs. Rock) are uninformative]
Main Contribution
Proposed Model: Preference Score
Items: two clusters of sizes $n_1 = \beta n$ and $n_2 = n - \beta n$, where $\beta \in (0, 0.5]$
Preference score: each item $i$ carries a latent score $w_i \in [w_{\min}, w_{\max}]$
Proposed Model: Number of Comparisons
Each pair of items $(i, j)$ (with scores $w_i, w_j$) is compared $\mathrm{Bin}(L, p)$ times.
N. B. Shah and M. J. Wainwright, "Simple, robust and optimal ranking from pairwise comparisons," The Journal of Machine Learning Research, vol. 18, no. 1, pp. 7246–7283, 2017.
Proposed Model: Heterogeneity
Outcome of comparisons ($r_{ij}$ = probability that item $i$ beats item $j$):
• Intra-cluster pair $(i, j)$ (Bradley–Terry–Luce model): $r_{ij} = \dfrac{w_i}{w_i + w_j}$, $\quad r_{ji} = \dfrac{w_j}{w_i + w_j}$
• Inter-cluster pair $(i, j)$: $r_{ij} = r_{ji} = 0.5$
This contrast is the heterogeneity!
R. A. Bradley and M. E. Terry, "Rank analysis of incomplete block designs: I. The method of paired comparisons," Biometrika, vol. 39, no. 3/4, pp. 324–345, 1952.
R. D. Luce, Individual Choice Behavior: A Theoretical Analysis. Courier Corporation, 2012.
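The comparison model above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper; `btl_prob` and `simulate_comparison` are hypothetical helper names:

```python
import random

def btl_prob(w_i, w_j):
    """Bradley-Terry-Luce probability that item i beats item j."""
    return w_i / (w_i + w_j)

def simulate_comparison(w_i, w_j, same_cluster, rng=random):
    """One comparison outcome: BTL within a cluster, a fair coin flip
    across clusters. Returns True if item i wins."""
    r_ij = btl_prob(w_i, w_j) if same_cluster else 0.5
    return rng.random() < r_ij

# Example: scores 3 and 1 in the same cluster -> i wins with probability 0.75.
print(btl_prob(3.0, 1.0))  # 0.75
```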
Proposed Model: Quantifying the Heterogeneity
[Figure: scores $w_1, w_2, \ldots, w_{n_1}$ within a cluster, with the gaps between consecutive scores highlighted]
$$\Delta := \min_{\sigma(i)=\sigma(j)} \left| r_{ij} - 0.5 \right| = \min_{\sigma(i)=\sigma(j)} \frac{|w_i - w_j|}{2(w_i + w_j)}$$
(minimum over intra-cluster pairs $(i, j)$: the gap of the preference score)
Δ: heterogeneity of comparisons
Δ ↑ ⇒ clustering becomes easier
• Output: $\hat{\sigma}\colon [n] \to \{1, 2\}$ and $\hat{\boldsymbol{w}} \in \mathbb{R}^n$
• Ranking error: $\mathcal{E}_r(\boldsymbol{w}, \hat{\boldsymbol{w}}) := \{\exists (i, j) \text{ with } \sigma(i) = \sigma(j) \text{ s.t. } (w_i - w_j)(\hat{w}_i - \hat{w}_j) < 0\}$ (some intra-cluster pair is ordered inconsistently)
• Clustering error: $\mathcal{E}_c(\sigma, \hat{\sigma}) := \{\exists i \text{ s.t. } \sigma(i) \neq \hat{\sigma}(i)\}$ (some item receives the wrong cluster label)
• Goal: characterize $\inf_{\hat{\sigma}, \hat{\boldsymbol{w}}} \sup \mathbb{P}\{\mathcal{E}_c(\sigma, \hat{\sigma}) \cup \mathcal{E}_r(\boldsymbol{w}, \hat{\boldsymbol{w}})\}$ in terms of $p, L, \Delta$
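The gap Δ can be computed directly from its definition. A minimal sketch (the name `heterogeneity_gap` is illustrative), assuming the scores `w` and cluster labels `sigma` are given as Python lists:

```python
from itertools import combinations

def heterogeneity_gap(w, sigma):
    """Delta = min over intra-cluster pairs (i, j) of
    |w_i - w_j| / (2 (w_i + w_j)), i.e., the minimum preference-score
    gap |r_ij - 0.5| among pairs in the same cluster."""
    return min(
        abs(w[i] - w[j]) / (2 * (w[i] + w[j]))
        for i, j in combinations(range(len(w)), 2)
        if sigma[i] == sigma[j]
    )

# Two clusters {0, 1} and {2, 3}; the binding pair is (2, 3): 1/18.
w = [1.0, 2.0, 4.0, 5.0]
sigma = [1, 1, 2, 2]
print(heterogeneity_gap(w, sigma))
```

Note that the cross-cluster pairs (e.g., items 1 and 2, whose scores are closest) are excluded: only intra-cluster gaps enter Δ.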
The Fundamental Limit is Determined by…
• The parameters of the Binomial: $p, L$
• $p, L \uparrow$ ⇒ more comparisons!
Main Result
$$pL \times \Delta^2 = \Theta\!\left(\frac{\log n}{n}\right)$$
(expected number of comparisons × level of heterogeneity)
• Impossible (by Fano's inequality) when $pL\Delta^2 \leq C_1 \frac{\log n}{n}$
• Achievable (by the proposed algorithm) when $pL\Delta^2 \geq C_2 \frac{\log n}{n}$
$C_1, C_2$: constants that depend only on $w_{\min}$, $w_{\max}$, and $\beta$
Proposed Three-step Algorithm
1. Borda counting — ensures the correct ordering in each cluster with high probability
2. Elimination — transforms the graph into a block model
3. SDP — solves the clustering task
Borda Counting:
Ensure the Correct Intra-Cluster Ordering
[Figure: Borda counting illustrated on the comparison graph — each item accumulates a score from its comparison outcomes, and sorting the scores orders the items]
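As a rough illustration of the Borda-counting idea (a generic sketch, not the paper's exact procedure; `wins`, `borda_scores`, and `borda_ranking` are assumed names):

```python
def borda_scores(wins, n):
    """Borda counting: score each item by its total number of wins over the
    observed pairwise comparisons. 'wins' maps (i, j) -> number of times
    item i beat item j."""
    score = [0] * n
    for (i, j), c in wins.items():
        score[i] += c
    return score

def borda_ranking(wins, n):
    """Items sorted from most to fewest wins."""
    score = borda_scores(wins, n)
    return sorted(range(n), key=lambda i: -score[i])

# Toy example: item 2 beats everyone, item 0 loses to everyone.
wins = {(2, 0): 5, (2, 1): 4, (1, 0): 3, (0, 1): 1, (1, 2): 1, (0, 2): 0}
print(borda_ranking(wins, 3))  # [2, 1, 0]
```

With enough comparisons, the win totals of intra-cluster items concentrate around their expectations, which is why the ordering within each cluster is correct with high probability.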
Brief Introduction to SBM
Expected value of the adjacency matrix of the observation graph:
$$\mathbb{E}[A] = \begin{pmatrix} \boldsymbol{\alpha} & \boldsymbol{\beta} \\ \boldsymbol{\beta} & \boldsymbol{\alpha} \end{pmatrix}$$
($\alpha$ within each cluster, $\beta$ across clusters)
E. Abbe, "Community detection and stochastic block models: Recent developments," The Journal of Machine Learning Research, vol. 18, no. 1, pp. 6446–6531, 2017.
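The two-block structure of $\mathbb{E}[A]$ can be written out explicitly. A minimal sketch with a hypothetical `expected_adjacency` helper (the diagonal is included for simplicity):

```python
def expected_adjacency(n1, n2, alpha, beta):
    """Expected adjacency matrix of SBM_2: alpha within each cluster,
    beta across clusters (cluster 1 = first n1 nodes)."""
    n = n1 + n2
    same = lambda i, j: (i < n1) == (j < n1)
    return [[alpha if same(i, j) else beta for j in range(n)] for i in range(n)]

EA = expected_adjacency(2, 2, 0.8, 0.1)
print(EA[0][1], EA[0][2])  # 0.8 0.1
```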
Before Elimination: Directed Weighted Graph
• Intra-cluster pair $(i, j)$: the weight of edge $i \to j$ is distributed as $\mathrm{Bin}(L, p\,r_{ij})$, and that of $j \to i$ as $\mathrm{Bin}(L, p\,r_{ji})$
• Inter-cluster pair $(i, j)$: both directed weights are distributed as $\mathrm{Bin}(L, p/2)$
Expected weights: $pL\,r_{ij}$ and $pL\,r_{ji}$ within a cluster vs. $\frac{pL}{2}$ across clusters
After Elimination: Similar to SBM
• Intra-cluster pair $(i, j)$: undirected weight $\sim \mathrm{Bin}(L, p\max\{r_{ij}, r_{ji}\})$
• Inter-cluster pair $(i, j)$: undirected weight $\sim \mathrm{Bin}(L, p/2)$
Since $p\max\{r_{ij}, r_{ji}\} > \frac{p}{2}\left(1 + \frac{\Delta}{2}\right)$, the expected intra-cluster weight exceeds the inter-cluster expectation $\frac{pL}{2}$, i.e., the graph behaves like a stochastic block model.
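One way to realize the elimination step, assuming it keeps the heavier of the two directed win counts for each pair (a sketch with assumed names, not the paper's exact implementation):

```python
def eliminate(counts, n):
    """Elimination step: collapse the directed comparison graph into an
    undirected one by keeping, for each pair, the larger of the two
    directed win counts. counts[i][j] = number of times i beat j."""
    und = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = max(counts[i][j], counts[j][i])
            und[i][j] = und[j][i] = w
    return und

counts = [[0, 7, 2], [3, 0, 5], [4, 5, 0]]
print(eliminate(counts, 3))  # [[0, 7, 4], [7, 0, 5], [4, 5, 0]]
```

Taking the max amplifies intra-cluster edges (whose win counts are lopsided) relative to inter-cluster edges (whose counts are near fifty-fifty), which produces the block-model contrast above.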
Existing algorithms in SBM literature
• Spectral methods: [Rohe-Chatterjee-Yu '11] [Vu '14] [Yun-Proutiere '14] [Lei-Rinaldo '15] [Joseph-Yu '16] [Gao-Ma-Zhang-Zhou '17] [Abbe-Fang-Wang-Zhong '20]
• SDP: [Amini-Levina '14] [Abbe-Bandeira-Hall '16] [Montanari-Sen '16] [Guédon-Vershynin '16] [Hajek-Wu-Xu '16] [Moitra-Perry-Wein '16]
SDP is a Convex Relaxation of MLE in SBM
For $\mathrm{SBM}_2(n, \alpha, \beta)$:
• MLE: find $\sigma$ that maximizes $\langle A - \lambda J, \sigma\sigma^T \rangle$ subject to $\sigma_i \in \{\pm 1\}$, $\forall i \in [n]$
• SDP: find $X$ that maximizes $\langle A - \lambda J, X \rangle$ subject to $X$ is PSD and $X_{ii} = 1$, $\forall i \in [n]$
where
$$\lambda = \frac{\log\frac{1-\beta}{1-\alpha}}{\log\frac{\alpha(1-\beta)}{\beta(1-\alpha)}}$$
B. Hajek, Y. Wu, and J. Xu, "Achieving exact cluster recovery threshold via semidefinite programming: Extensions," IEEE Transactions on Information Theory, vol. 62, no. 10, pp. 5918–5937, 2016.
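For small $n$ the MLE can be solved by exhaustive search, which makes the objective $\langle A - \lambda J, \sigma\sigma^T \rangle$ concrete. A toy sketch (`sbm_lambda` and `mle_bruteforce` are illustrative names; the exponential search is only for intuition, not a practical method):

```python
from itertools import product
from math import log

def sbm_lambda(alpha, beta):
    """Penalty parameter lambda for SBM_2(n, alpha, beta)."""
    return log((1 - beta) / (1 - alpha)) / log(alpha * (1 - beta) / (beta * (1 - alpha)))

def mle_bruteforce(A, lam):
    """Exhaustive MLE: maximize <A - lam*J, sigma sigma^T> over sigma in {+-1}^n."""
    n = len(A)
    def objective(sigma):
        return sum((A[i][j] - lam) * sigma[i] * sigma[j]
                   for i in range(n) for j in range(n))
    return max(product((1, -1), repeat=n), key=objective)

# Two clean clusters {0, 1} and {2, 3}: edges only within clusters.
A = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
sigma_hat = mle_bruteforce(A, 0.3)
print(sigma_hat)  # (1, 1, -1, -1), up to a global sign flip
```

The SDP replaces the rank-one matrix $\sigma\sigma^T$ by a PSD matrix $X$ with unit diagonal, which turns this combinatorial search into a convex program.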
SDP in our problem
maximize $\langle A - \lambda(J - I), X \rangle$ subject to $X$ is PSD and $X_{ii} = 1$, $\forall i \in [n]$
where $A$ is the matrix after the elimination step and
$$\lambda = \frac{pL\Delta}{4\log\left(1 + \frac{\Delta}{2}\right)}$$
Ground truth: $X^* = \begin{pmatrix} \mathbf{1}\mathbf{1}^T & -\mathbf{1}\mathbf{1}^T \\ -\mathbf{1}\mathbf{1}^T & \mathbf{1}\mathbf{1}^T \end{pmatrix}$ (entries $\pm 1$ according to cluster membership)
Theorem: If $pL\Delta^2 \geq C' \frac{\log n}{n}$, then the output of the above program satisfies $X_{SDP} = X^*$ with high probability.
⇒ Solves clustering (w.h.p.)!
Key Difference Between Our SDP and The Others
Classical SBM:
• maximize $\langle A - \lambda J, X \rangle$ subject to $X$ is PSD and $X_{ii} = 1$, $\forall i \in [n]$
• $\alpha = \frac{a \log n}{n}$, $\beta = \frac{b \log n}{n}$, with $\sqrt{a} - \sqrt{b} > \sqrt{2}$
• $\alpha - \beta = \Theta\left(\frac{\log n}{n}\right)$ → same order as $\alpha, \beta$
Proposed SDP in our setting:
• maximize $\langle A - \lambda(J - I), X \rangle$ subject to $X$ is PSD and $X_{ii} = 1$, $\forall i \in [n]$
• $\alpha = \frac{p}{2}\left(1 + \frac{\Delta}{2}\right)$, $\beta = \frac{p}{2}$
• $\Delta = \min_{\sigma(i)=\sigma(j)} \frac{|w_i - w_j|}{2(w_i + w_j)} \Rightarrow \Delta = O\left(\frac{1}{n}\right)$
• $\alpha - \beta = O(p\Delta) = O\left(\frac{p}{n}\right)$ → smaller than $\alpha, \beta$
Conclusion
• Characterized the fundamental limit for exact joint clustering and ranking: impossible (by Fano's inequality) when $pL\Delta^2 \leq C_1 \frac{\log n}{n}$; achievable (with a polynomial-time algorithm) when $pL\Delta^2 \geq C_2 \frac{\log n}{n}$
• Proposed algorithm: Borda counting (ranking) → elimination (transformation) → SDP (clustering)
Final Remarks
• The information limit can be extended to multiple clusters (in Appendix C).
• The joint clustering and top-K identification problem can be considered:
This work (exact joint clustering and ranking): $\Delta = O\left(\frac{1}{n}\right)$, so $pL\Delta^2 \geq C\frac{\log n}{n}$ ⇒ $pL \geq Cn\log n$
Joint clustering and top-K identification: $\Delta = O\left(\frac{1}{K}\right)$
• The definition of Δ is slightly different from the definition in the paper.
Thank you for listening!