You are on page 1of 7

Spectral clustering

In multivariate statistics, spectral clustering techniques make use of the


spectrum (eigenvalues) of the similarity matrix of the data to perform
dimensionality reduction before clustering in fewer dimensions. The similarity
matrix is provided as an input and consists of a quantitative assessment of the
relative similarity of each pair of points in the dataset.

In application to image segmentation, spectral clustering is known as An example connected


segmentation-based object categorization. graph, with 6 vertices.

Definitions
Given an enumerated set of data points, the similarity matrix may be defined as
a symmetric matrix , where represents a measure of the similarity
between data points with indices and . The general approach to spectral
clustering is to use a standard clustering method (there are many such methods, Partitioning into two
k-means is discussed below) on relevant eigenvectors of a Laplacian matrix of connected graphs
. There are many different ways to define a Laplacian which have different
mathematical interpretations, and so the clustering will also have different
interpretations. The eigenvectors that are relevant are the ones that correspond to smallest several
eigenvalues of the Laplacian except for the smallest eigenvalue which will have a value of 0. For
computational efficiency, these eigenvectors are often computed as the eigenvectors corresponding to the
largest several eigenvalues of a function of the Laplacian.

Laplacian matrix

Spectral clustering is well known to relate to partitioning of a mass-


spring system, where each mass is associated with a data point and
each spring stiffness corresponds to a weight of an edge describing
a similarity of the two related data points, as in the spring system.
Specifically, the classical reference [1] explains that the eigenvalue
problem describing transversal vibration modes of a mass-spring
system is exactly the same as the eigenvalue problem for the graph
Laplacian matrix defined as

, A 2-dimensional spring system.

where is the diagonal matrix

and A is the adjacency matrix.


The masses that are tightly connected by the springs in the mass-spring system evidently move together
from the equilibrium position in low-frequency vibration modes, so that the components of the eigenvectors
corresponding to the smallest eigenvalues of the graph Laplacian can be used for meaningful clustering of
the masses. For example, assuming that all the springs and the masses are identical in the 2-dimensional
spring system pictured, one would intuitively expect that the loosest connected masses on the right-hand
side of the system would move with the largest amplitude and in the opposite direction to the rest of the
masses when the system is shaken — and this expectation will be confirmed by analyzing components of
the eigenvectors of the graph Laplacian corresponding to the smallest eigenvalues, i.e., the smallest
vibration frequencies.

Laplacian matrix normalization

The goal of normalization is making the diagonal entries of the Laplacian matrix to be all unit, also scaling
off-diagonal entries correspondingly. In a weighted graph, a vertex may have a large degree because of a
small number of connected edges but with large weights just as well as due to a large number of connected
edges with unit weights.

A popular normalized spectral clustering technique is the normalized cuts algorithm or Shi–Malik
algorithm introduced by Jianbo Shi and Jitendra Malik,[2] commonly used for image segmentation. It
partitions points into two sets based on the eigenvector corresponding to the second-smallest
eigenvalue of the symmetric normalized Laplacian defined as

The vector is also the eigenvector corresponding to the second-largest eigenvalue of the symmetrically
normalized adjacency matrix

The random walk (or left) normalized Laplacian is defined as

and can also be used for spectral clustering. A mathematically equivalent algorithm [3] takes the eigenvector
corresponding to the largest eigenvalue of the random walk normalized adjacency matrix .

The eigenvector of the symmetrically normalized Laplacian and the eigenvector of the left normalized
Laplacian are related by the identity

Cluster analysis via Spectral Embedding

Knowing the -by- matrix of selected eigenvectors, mapping — called spectral embedding — of the
original data points is performed to a -dimensional vector space using the rows of . Now the analysis
is reduced to clustering vectors with components, which may be done in various ways.

In the simplest case , the selected single eigenvector , called the Fiedler vector, corresponds to the
second smallest eigenvalue. Using the components of one can place all points whose component in is
positive in the set and the rest in , thus bi-partitioning the graph and labeling the data points with
two labels. This sign-based approach follows the intuitive explanation of spectral clustering via the mass-
spring model — in the low frequency vibration mode that the Fiedler vector represents, one cluster data
points identified with mutually strongly connected masses would move together in one direction, while in
the complement cluster data points identified with remaining masses would move together in the opposite
direction. The algorithm can be used for hierarchical clustering by repeatedly partitioning the subsets in the
same fashion.

In the general case , any vector clustering technique can be used, e.g., DBSCAN.

Algorithms
Basic Algorithm

1. Calculate the Laplacian (or the normalized Laplacian)


2. Calculate the first eigenvectors (the eigenvectors corresponding to the smallest
eigenvalues of )
3. Consider the matrix formed by the first eigenvectors; the -th row defines the features of
graph node
4. Cluster the graph nodes based on these features (e.g., using k-means clustering)

If the similarity matrix has not already been explicitly constructed, the efficiency of spectral clustering
may be improved if the solution to the corresponding eigenvalue problem is performed in a matrix-free
fashion (without explicitly manipulating or even computing the similarity matrix), as in the Lanczos
algorithm.

For large-sized graphs, the second eigenvalue of the (normalized) graph Laplacian matrix is often ill-
conditioned, leading to slow convergence of iterative eigenvalue solvers. Preconditioning is a key
technology accelerating the convergence, e.g., in the matrix-free LOBPCG method. Spectral clustering has
been successfully applied on large graphs by first identifying their community structure, and then clustering
communities.[4]

Spectral clustering is closely related to nonlinear dimensionality reduction, and dimension reduction
techniques such as locally-linear embedding can be used to reduce errors from noise or outliers.[5]

Costs
Denoting the number of the data points ny , it is important to estimate the memory footprint and compute
time, or number of arithmetic operations (AO) performed, as a function of . No matter the algorithm of the
spectral clustering, the two main costly items are the construction of the graph Laplacian and determining
its eigenvectors for the spectral embedding. The last step — determining the labels from the -by-
matrix of eigenvectors — is typically the least expensive requiring only AO and creating just a -by-
vector of the labels in memory.

The need to construct the graph Laplacian is common for all distance- or correlation-based clustering
methods. Computing the eigenvectors is specific to spectral clustering only.

Constructing graph Laplacian

The graph Laplacian can be and commonly is constructed from the adjacency matrix. The construction can
be performed matrix-free, i.e., without explicitly forming the matrix of the graph Laplacian and no AO. It
can also be performed in-place of the adjacency matrix without increasing the memory footprint. Either
way, the costs of constructing the graph Laplacian is essentially determined by the costs of constructing the
-by- graph adjacency matrix.
Moreover, a normalized Laplacian has exactly the same eigenvectors as the normalized adjacency matrix,
but with the order of the eigenvalues reversed. Thus, instead of computing the eigenvectors corresponding
to the smallest eigenvalues of the normalized Laplacian, one can equivalently compute the eigenvectors
corresponding to the largest eigenvalues of the normalized adjacency matrix, without even talking about the
Laplacian matrix.

Naive constructions of the graph adjacency matrix, e.g., using the RBF kernel, make it dense, thus
requiring memory and AO to determine each of the entries of the matrix. Nystrom method[6] can
be used to approximate the similarity matrix, but the approximate matrix is not elementwise positive,[7] i.e.
cannot be interpreted as a distance-based similarity.

Algorithms to construct the graph adjacency matrix as a sparse matrix are typically based on a nearest
neighbor search, which estimate or sample a neighborhood of a given data point for nearest neighbors, and
compute non-zero entries of the adjacency matrix by comparing only pairs of the neighbors. The number of
the selected nearest neighbors thus determines the number of non-zero entries, and is often fixed so that the
memory footprint of the -by- graph adjacency matrix is only , only sequential arithmetic
operations are needed to compute the non-zero entries, and the calculations can be trivially run in
parallel.

Computing eigenvectors

The cost of computing the -by- (with ) matrix of selected eigenvectors of the graph Laplacian is
normally proportional to the cost of multiplication of the -by- graph Laplacian matrix by a vector, which
varies greatly whether the graph Laplacian matrix is dense or sparse. For the dense case the cost thus is
. The very commonly cited in the literature cost comes from choosing and is clearly
misleading, since, e.g., in a hierarchical spectral clustering as determined by the Fiedler vector.

In the sparse case of the -by- graph Laplacian matrix with non-zero entries, the cost of the matrix-
vector product and thus of computing the -by- with matrix of selected eigenvectors is ,
with the memory footprint also only — both are the optimal low bounds of complexity of clustering
data points. Moreover, matrix-free eigenvalue solvers such as LOBPCG can efficiently run in parallel,
e.g., on multiple GPUs with distributed memory, resulting not only in high quality clusters, which spectral
clustering is famous for, but also top performance. [8]

Software
Free software implementing spectral clustering is available in large open source projects like scikit-learn[9]
using LOBPCG[10] with multigrid preconditioning[11] [12] or ARPACK, MLlib for pseudo-eigenvector
clustering using the power iteration method,[13] and R.[14]

Relationship with other clustering methods


The ideas behind spectral clustering may not be immediately obvious. It may be useful to highlight
relationships with other methods. In particular, it can be described in the context of kernel clustering
methods, which reveals several similarities with other approaches.[15]

Relationship with k-means


The weighted kernel k-means problem[16] shares the objective function with the spectral clustering
problem, which can be optimized directly by multi-level methods.[17]

Relationship to DBSCAN

In the trivial case of determining connected graph components — the optimal clusters with no edges cut —
spectral clustering is also related to a spectral version of DBSCAN clustering that finds density-connected
components.[18]

Measures to compare clusterings


Ravi Kannan, Santosh Vempala and Adrian Vetta[19] proposed a bicriteria measure to define the quality of
a given clustering. They said that a clustering was an (α, ε)-clustering if the conductance of each cluster (in
the clustering) was at least α and the weight of the inter-cluster edges was at most ε fraction of the total
weight of all the edges in the graph. They also look at two approximation algorithms in the same paper.

History and related literatures


Spectral clustering has a long history.[20][21][22][23][24][2][25] Spectral clustering as a machine learning
method was popularized by Shi & Malik[2] and Ng, Jordan, & Weiss.[25]

Ideas and network measures related to spectral clustering also play an important role in a number of
applications apparently different from clustering problems. For instance, networks with stronger spectral
partitions take longer to converge in opinion-updating models used in sociology and economics.[26][27]

See also
Affinity propagation
Kernel principal component analysis
Cluster analysis
Spectral graph theory

References
1. Demmel, J. "CS267: Notes for Lecture 23, April 9, 1999, Graph Partitioning, Part 2" (https://p
eople.eecs.berkeley.edu/~demmel/cs267/lecture20/lecture20.html).
2. Jianbo Shi and Jitendra Malik, "Normalized Cuts and Image Segmentation" (http://www.cs.b
erkeley.edu/~malik/papers/SM-ncut.pdf), IEEE Transactions on PAMI, Vol. 22, No. 8, Aug
2000.
3. Marina Meilă & Jianbo Shi, "Learning Segmentation by Random Walks (http://www.citeulike.
org/user/mpotamias/article/498897)", Neural Information Processing Systems 13 (NIPS
2000), 2001, pp. 873–879.
4. Zare, Habil; Shooshtari, P.; Gupta, A.; Brinkman, R. (2010). "Data reduction for spectral
clustering to analyze high throughput flow cytometry data" (https://www.ncbi.nlm.nih.gov/pm
c/articles/PMC2923634). BMC Bioinformatics. 11: 403. doi:10.1186/1471-2105-11-403 (http
s://doi.org/10.1186%2F1471-2105-11-403). PMC 2923634 (https://www.ncbi.nlm.nih.gov/pm
c/articles/PMC2923634). PMID 20667133 (https://pubmed.ncbi.nlm.nih.gov/20667133).
5. Arias-Castro, E.; Chen, G.; Lerman, G. (2011), "Spectral clustering based on local linear
approximations.", Electronic Journal of Statistics, 5: 1537–87, arXiv:1001.1323 (https://arxiv.
org/abs/1001.1323), doi:10.1214/11-ejs651 (https://doi.org/10.1214%2F11-ejs651),
S2CID 88518155 (https://api.semanticscholar.org/CorpusID:88518155)
6. Fowlkes, C (2004). "Spectral grouping using the Nystrom method" (https://escholarship.org/u
c/item/29z29233). IEEE Transactions on Pattern Analysis and Machine Intelligence. 26 (2):
214–25. doi:10.1109/TPAMI.2004.1262185 (https://doi.org/10.1109%2FTPAMI.2004.126218
5). PMID 15376896 (https://pubmed.ncbi.nlm.nih.gov/15376896). S2CID 2384316 (https://ap
i.semanticscholar.org/CorpusID:2384316).
7. Wang, S.; Gittens, A.; Mahoney, M.W. (2019). "Scalable Kernel K-Means Clustering with
Nystrom Approximation: Relative-Error Bounds". Journal of Machine Learning Research. 20:
1–49. arXiv:1706.02803 (https://arxiv.org/abs/1706.02803).
8. Acer, Seher; Boman, Erik G.; Glusa, Christian A.; Rajamanickam, Sivasankaran (2021).
"Sphynx: A parallel multi-GPU graph partitioner for distributed-memory systems". Parallel
Computing. 106: 102769. doi:10.1016/j.parco.2021.102769 (https://doi.org/10.1016%2Fj.par
co.2021.102769). S2CID 233481603 (https://api.semanticscholar.org/CorpusID:233481603).
9. "2.3. Clustering" (http://scikit-learn.org/stable/modules/clustering.html#spectral-clustering).
10. Knyazev, Andrew V. (2003). Boley; Dhillon; Ghosh; Kogan (eds.). Modern preconditioned
eigensolvers for spectral image segmentation and graph bisection (https://www.researchgat
e.net/publication/343531874). Clustering Large Data Sets; Third IEEE International
Conference on Data Mining (ICDM 2003) Melbourne, Florida: IEEE Computer Society.
pp. 59–62.
11. Knyazev, Andrew V. (2006). Multiscale Spectral Image Segmentation Multiscale
preconditioning for computing eigenvalues of graph Laplacians in image segmentation (http
s://www.researchgate.net/publication/354448354). Fast Manifold Learning Workshop, WM
Williamburg, VA. doi:10.13140/RG.2.2.35280.02565 (https://doi.org/10.13140%2FRG.2.2.35
280.02565).
12. Knyazev, Andrew V. (2006). Multiscale Spectral Graph Partitioning and Image Segmentation
(https://www.researchgate.net/publication/343531874). Workshop on Algorithms for Modern
Massive Datasets Stanford University and Yahoo! Research.
13. "Clustering - RDD-based API - Spark 3.2.0 Documentation" (http://spark.apache.org/docs/lat
est/mllib-clustering.html#power-iteration-clustering-pic).
14. "Kernlab: Kernel-Based Machine Learning Lab" (https://cran.r-project.org/web/packages/ker
nlab). 12 November 2019.
15. Filippone, M.; Camastra, F.; Masulli, F.; Rovetta, S. (January 2008). "A survey of kernel and
spectral methods for clustering" (http://eprints.whiterose.ac.uk/8536/2/Filippone_spectral_m
ethodspr08.pdf) (PDF). Pattern Recognition. 41 (1): 176–190. Bibcode:2008PatRe..41..176F
(https://ui.adsabs.harvard.edu/abs/2008PatRe..41..176F). doi:10.1016/j.patcog.2007.05.018
(https://doi.org/10.1016%2Fj.patcog.2007.05.018).
16. Dhillon, I.S.; Guan, Y.; Kulis, B. (2004). "Kernel k-means: spectral clustering and normalized
cuts" (https://www.cs.utexas.edu/users/inderjit/public_papers/kdd_spectral_kernelkmeans.p
df) (PDF). Proceedings of the tenth ACM SIGKDD international conference on Knowledge
discovery and data mining. pp. 551–6.
17. Dhillon, Inderjit; Guan, Yuqiang; Kulis, Brian (November 2007). "Weighted Graph Cuts
without Eigenvectors: A Multilevel Approach". IEEE Transactions on Pattern Analysis and
Machine Intelligence. 29 (11): 1944–1957. CiteSeerX 10.1.1.131.2635 (https://citeseerx.ist.p
su.edu/viewdoc/summary?doi=10.1.1.131.2635). doi:10.1109/tpami.2007.1115 (https://doi.or
g/10.1109%2Ftpami.2007.1115). PMID 17848776 (https://pubmed.ncbi.nlm.nih.gov/178487
76). S2CID 9402790 (https://api.semanticscholar.org/CorpusID:9402790).
18. Schubert, Erich; Hess, Sibylle; Morik, Katharina (2018). The Relationship of DBSCAN to
Matrix Factorization and Spectral Clustering (http://ceur-ws.org/Vol-2191/paper38.pdf)
(PDF). LWDA. pp. 330–334.
19. Kannan, Ravi; Vempala, Santosh; Vetta, Adrian (2004). "On Clusterings : Good, Bad and
Spectral". Journal of the ACM. 51 (3): 497–515. doi:10.1145/990308.990313 (https://doi.org/
10.1145%2F990308.990313). S2CID 207558562 (https://api.semanticscholar.org/CorpusID:
207558562).
20. Cheeger, Jeff (1969). "A lower bound for the smallest eigenvalue of the Laplacian".
Proceedings of the Princeton Conference in Honor of Professor S. Bochner.
21. Donath, William; Hoffman, Alan (1972). "Algorithms for partitioning of graphs and computer
logic based on eigenvectors of connections matrices". IBM Technical Disclosure Bulletin.
22. Fiedler, Miroslav (1973). "Algebraic connectivity of graphs" (https://doi.org/10.21136%2FCM
J.1973.101168). Czechoslovak Mathematical Journal. 23 (2): 298–305.
doi:10.21136/CMJ.1973.101168 (https://doi.org/10.21136%2FCMJ.1973.101168).
23. Guattery, Stephen; Miller, Gary L. (1995). "On the performance of spectral graph partitioning
methods". Annual ACM-SIAM Symposium on Discrete Algorithms.
24. Daniel A. Spielman and Shang-Hua Teng (1996). "Spectral Partitioning Works: Planar
graphs and finite element meshes". Annual IEEE Symposium on Foundations of Computer
Science.
25. Ng, Andrew Y.; Jordan, Michael I.; Weiss, Yair (2002). "On spectral clustering: analysis and
an algorithm" (https://ai.stanford.edu/~ang/papers/nips01-spectral.pdf) (PDF). Advances in
Neural Information Processing Systems.
26. DeMarzo, P. M.; Vayanos, D.; Zwiebel, J. (2003-08-01). "Persuasion Bias, Social Influence,
and Unidimensional Opinions" (http://eprints.lse.ac.uk/454/). The Quarterly Journal of
Economics. Oxford University Press. 118 (3): 909–968. doi:10.1162/00335530360698469 (h
ttps://doi.org/10.1162%2F00335530360698469). ISSN 0033-5533 (https://www.worldcat.org/
issn/0033-5533).
27. Golub, Benjamin; Jackson, Matthew O. (2012-07-26). "How Homophily Affects the Speed of
Learning and Best-Response Dynamics". The Quarterly Journal of Economics. Oxford
University Press (OUP). 127 (3): 1287–1338. doi:10.1093/qje/qjs021 (https://doi.org/10.109
3%2Fqje%2Fqjs021). ISSN 0033-5533 (https://www.worldcat.org/issn/0033-5533).

Retrieved from "https://en.wikipedia.org/w/index.php?title=Spectral_clustering&oldid=1159215715"

You might also like