
A parameter-free similarity graph for spectral clustering


Tülin İnkaya
Uludağ University, Industrial Engineering Department, Görükle, 16059 Bursa, Turkey
Corresponding author. Tel.: +902242942605; fax: +902242941903. E-mail addresses: tinkaya@uludag.edu.tr, tinkaya@gmail.com

Keywords: Spectral clustering; Similarity graph; k-nearest neighbor; ε-neighborhood; Fully connected graph

Abstract

Spectral clustering is a popular clustering method due to its simplicity and superior performance in data sets with non-convex clusters. The method is based on the spectral analysis of a similarity graph. Previous studies show that clustering results are sensitive to the selection of the similarity graph and its parameter(s). In particular, when there are data sets with arbitrary shaped clusters and varying density, it is difficult to determine the proper similarity graph and its parameters without a priori information. To address this issue, we propose a parameter-free similarity graph, namely the Density Adaptive Neighborhood (DAN). DAN combines distance, density and connectivity information, and it reflects the local characteristics. We test the performance of DAN with a comprehensive experimental study. We compare the k-nearest neighbor (KNN), mutual KNN, ε-neighborhood, fully connected graph, minimum spanning tree, Gabriel graph, and DAN in terms of clustering accuracy. We also examine the robustness of DAN to the number of attributes and to transformations such as decimation and distortion. Our experimental study with various artificial and real data sets shows that DAN improves the spectral clustering results, and it is superior to the competing approaches. Moreover, it facilitates the application of spectral clustering to various domains without a priori information.

© 2015 Published by Elsevier Ltd.

1. Introduction

Spectral clustering determines the clusters based on the spectral analysis of a similarity graph. The approach is easy to implement, and it outperforms traditional clustering methods such as the k-means algorithm. For this reason, it is one of the widely used clustering algorithms in bioinformatics (Higham, Kalna, & Kibble, 2007), pattern recognition (Vázquez-Martín & Bandera, 2013; Wang, 2008), image segmentation (Zeng, Huang, Kang, & Sang, 2014), and text mining (Dhillon, 2001; He, Qin, & Liu, 2012).

Basically, a spectral clustering algorithm consists of three steps: pre-processing, decomposition, and grouping. In the pre-processing step, a similarity graph and its adjacency matrix are constructed for the data set. In the decomposition step, the representation of the data set is changed using the eigenvectors of the matrix. In the grouping step, clusters are extracted from the new representation. In this study, we focus on the pre-processing step. Our aim is to represent the local characteristics of the data set using a similarity graph. In spectral clustering, we consider three important properties of a similarity graph (Von Luxburg, 2007): (1) The similarity graph should be symmetric and non-negative. (2) The similarity graph should be connected unless the connected components (subclusters) form the target clusters. (3) The similarity graph should be robust.

The most commonly used similarity graphs in the literature are the k-nearest neighbor (KNN), mutual KNN, ε-neighborhood, and fully connected graphs (Von Luxburg, 2007). The main idea in these approaches is to represent the local characteristics of the data set using a parameter such as k, ε, or σ. A recent study by Maier, von Luxburg, and Hein (2013) shows that the clustering results depend on the choice of the similarity graph and its parameters. However, proper parameter setting becomes a challenging task for data sets with arbitrary shaped clusters, varying density, and imbalanced clusters. For instance, KNN may connect points in different density regions. A similar problem is observed in the ε-neighborhood and fully connected graphs due to their spherical-shaped neighborhoods.

To overcome these limitations, a stream of research addresses the parameter selection problem for the similarity graph (Nadler & Galun, 2006; Ng, Jordan, & Weiss, 2002; Zelnik-Manor & Perona, 2004; Zhang, Li, & Yu, 2011). Another research stream incorporates proximity relations into the similarity graph using the minimum spanning tree and the β-skeleton (Carreira-Perpinan & Zemel, 2005; Correa & Lindstorm, 2012). There are also studies that use k-means, genetic algorithms, and random forests to obtain robust similarity matrices (Beauchemin, 2015; Chrysouli & Tefas, 2015; Zhu, Loy, & Gong, 2014). These approaches provide some improvement; however, they still include parameters to be set properly. Moreover, some of them do not handle data sets with varying density.


In this study, we propose a parameter-free similarity graph to address the limitations of the aforementioned approaches. We adopt the neighborhood construction (NC) method proposed by İnkaya, Kayalıgil, and Özdemirel (2015) to reflect the local characteristics of the data set. NC yields a unique neighborhood for each point, and the similarity graph generated using NC neighborhoods may be asymmetric. Also, it may include isolated vertices and subgraphs. However, spectral clustering algorithms require symmetric and connected similarity graphs. In order to satisfy these properties, we perform additional steps. First, we construct an undirected graph using the NC neighborhoods. We call this graph the Density Adaptive Neighborhood (DAN). Then, we insert edges into DAN if it includes more connected components than the target number of clusters. Finally, we form the weighted adjacency matrix of DAN using the Gaussian kernel function. In order to find the clusters, the decomposition and grouping steps of any spectral clustering algorithm are applied to the proposed similarity graph. Our comprehensive experimental study with various artificial and real data sets shows the superiority of DAN over competing approaches.

To sum up, our contribution is the development of a pre-processing step for spectral clustering with no a priori information on the data set. The proposed approach includes the construction of a parameter-free similarity graph and its weighted adjacency matrix. It is flexible in the sense that it can be applied with any spectral clustering algorithm. It works on data sets with arbitrary shaped clusters and varying density. Moreover, it is robust to the number of attributes and to transformations.

The rest of the paper is organized as follows. The related literature is reviewed in Section 2. We introduce the background information about spectral clustering and similarity graphs in Section 3. The proposed approach is explained in Section 4. The performance of the proposed approach is examined in Section 5. The discussion of the experiments is given in Section 6. Finally, we conclude in Section 7.

2. Literature review

Spectral clustering has its roots in the graph partitioning problem. Nascimento and Carvalho (2011), Von Luxburg (2007), and Jia, Ding, Xu, and Nie (2014) provide comprehensive reviews of spectral clustering algorithms.

The literature about spectral clustering can be classified into two categories (Zhu et al., 2014): (1) studies that focus on data grouping when a similarity graph is given, and (2) studies that focus on similarity graph construction when a particular spectral clustering algorithm is used. In the first category, there are several studies that improve the clustering performance. For instance, Liu, Poon, Liu, and Zhang (2014) use latent tree models to find the number of leading eigenvectors and partition the data points. Lu, Fu, and Shu (2014) combine spectral clustering with non-negative matrix factorization, and propose a non-negative and sparse spectral clustering algorithm. Xiang and Gong (2008) introduce a novel informative/relevant eigenvector selection algorithm, which determines the number of clusters.

In this study, we address the similarity graph construction problem, so our work is related to the second category. A group of studies in the second category aims to determine the local characteristics of the data set using proper parameter selection. Ng et al. (2002) suggest executing the spectral clustering algorithm for different values of the neighborhood width σ. Then, they pick the one having the least squared intra-cluster distance to the centroid. This method extracts the local characteristics better. However, additional parameters are required, and the computational complexity is high. Zelnik-Manor and Perona (2004) propose the calculation of a local scaling parameter σi for each data point instead of a global parameter σ. However, this approach has limitations for data sets with density variations. Zhang et al. (2011) introduce a local density adaptive similarity measure, namely Common-Near-Neighbor (CNN). CNN uses the local density between two points, and reflects the connectivity by a set of successive points in a dense region. This approach helps scale parameter σ in the Gaussian similarity function. In an alternative scheme, Nadler and Galun (2006) introduce a coherence measure for a set of points in the same cluster. The proposed measure is compared with some threshold values to accept or reject a partition. Although this approach finds the clusters correctly, it is not capable of finding clusters with density variations.

Carreira-Perpinan and Zemel (2005), and Correa and Lindstorm (2012) use proximity graphs to incorporate connectivity information into the similarity graph. Carreira-Perpinan and Zemel (2005) propose two similarity graphs based on the minimum spanning tree (MST). Both graphs are constructed using an ensemble of trees. In the first graph, each point is perturbed using a noise model, and a given number of MSTs are constructed using the perturbed versions of the data set. Then, these MSTs are combined to obtain the similarity graph. In the second one, a given number of MSTs are constructed such that the edges in the MSTs are disjoint. Then, the combination of these disjoint MSTs forms the similarity graph. Correa and Lindstorm (2012) introduce an approach that combines the β-skeleton (empty region) graph with a local scaling algorithm. The local scaling algorithm uses a diffusion-based mechanism. It starts from an estimate of the local scale, and the local scale is refined for some iterations. Two parameters are used to control the diffusion speed. Although these approaches find arbitrary shaped clusters, density relations among the data points are not reflected in the similarity graphs. Moreover, their performances are sensitive to proper parameter selection.

A group of studies combine various methods to improve similarity matrix construction. For example, a recent study by Beauchemin (2015) proposes a density-based similarity matrix construction method based on k-means with subbagging. The subbagging procedure increases the density estimation accuracy. However, the proposed approach requires six hyperparameters. Moreover, it has shortcomings when there is manifold proximity in the data set. Zhu et al. (2014) use clustering random forests to obtain a robust similarity matrix. A binary split function is optimized for learning a clustering forest. This approach also includes two parameters. Chrysouli and Tefas (2015) combine spectral clustering and genetic algorithms (GA). Using GA, they evolve a number of similarity graphs according to the clustering result.

There are also other variants of spectral clustering algorithms. For example, approximate spectral clustering (ASC) is developed for large data sets. ASC works with representatives of the data samples (points), namely prototypes. Hence, the desired similarity matrix should reflect the relations between the data samples and the prototypes. Taşdemir (2012) adopts the connectivity graph proposed by Taşdemir and Merényi (2009), and introduces a similarity measure for the vector quantization prototypes, namely CONN. CONN calculates the similarity considering the distribution of the data samples in the Voronoi polygons with respect to the prototypes. Taşdemir, Yalçin, and Yildirim (2015) extend this idea and incorporate topology, distance and density information using geodesic-based similarity criteria. Different from these studies, we aim to define the relations among all points in the data set.

In this study, we propose a pre-processing step for spectral clustering, with no a priori information. The proposed approach yields a similarity graph and its weighted adjacency matrix, which can be used with any spectral clustering algorithm. Our work differs from the previous studies in the following sense: (1) It is a parameter-free approach. (2) It reflects the connectivity, density and distance relations among all data points. (3) It works on data sets not only with convex clusters, but also with clusters having arbitrary shapes and varying density. (4) It is robust to transformations in the data set.

3. Spectral clustering

In this section, we explain the most commonly used similarity graphs and spectral clustering algorithms in the literature.


3.1. Similarity graphs

Let X = {x1, …, xn} be the set of data points. We represent X in the form of a similarity graph G = (V, E), where each data point is represented by a vertex. G is an undirected graph with vertex set V and edge set E. The weighted adjacency matrix of the graph is W = (wij), i, j = 1, …, n. If wij = 0, then the vertices vi and vj are not connected.

k-nearest neighbor graph: The main idea is that vertex vi is connected with vertex vj if vj belongs to the k nearest neighbors of vi, or vi belongs to the k nearest neighbors of vj. The resulting graph is called the k-nearest neighbor graph (KNN). After edge insertion, each edge is weighted by the similarity of its end points. The KNN graph should be connected, or it should include a few connected components (Von Luxburg, 2007). For this purpose, the asymptotic connectivity result for random graphs (Brito, Chavez, Quiroz, & Yukich, 1997) can be used in a finite sample, i.e. k can be chosen in the order of log(n).

Mutual k-nearest neighbor graph: The goal is to connect vertices vi and vj if both vi belongs to the k nearest neighbors of vj, and vj belongs to the k nearest neighbors of vi. The resulting graph is called the mutual k-nearest neighbor graph (MKNN). Similar to KNN, each edge is weighted by the similarity of its end points. In general, MKNN has fewer edges compared to KNN. For this reason, selecting a larger k compared to the one in KNN is reasonable.

ε-neighborhood graph: In this graph, vertices vi and vj are connected if dij is smaller than ε, where dij denotes the distance between vertices vi and vj. In general, edge weighting is not applied, as the distances between the connected points are on a similar scale. There are two alternative ways to determine the value of ε: (i) setting ε as the longest edge in the minimum spanning tree (MST) of the data points, (ii) setting ε as the mean distance of a point to its kth closest neighbor. The former ensures connectivity in the graph, whereas the latter can extract the local characteristics inherent in the data set.

Fully connected graph: In this graph, all vertices are connected. For this reason, the selection of the similarity function is important, as the adjacency matrix should represent the local characteristics of the neighborhood. A typical example of such a similarity function is the Gaussian kernel s(xi, xj) = exp(−dij² / (2σ²)), where parameter σ controls the neighborhood width. Parameter σ has a similar role as k and ε. For this reason, σ can be chosen as the longest edge in the MST or the mean distance of a point to its kth closest neighbor with k = log(n). Alternative ways of choosing σ are proposed by Ng et al. (2002), and Zelnik-Manor and Perona (2004) (see Section 2).

Proximity graphs: The most well-known proximity graphs are the MST, the relative neighborhood graph (RNG), and the Gabriel graph (GG) (Gabriel & Sokal, 1969; Jaromczyk & Toussaint, 1992). The MST is a tree having the minimum total edge weight. In RNG, vertex vi is connected with vertex vj if dij ≤ max{dip, djp} for all vp ∈ V. In GG, vi is connected with vertex vj if dij ≤ √(dip² + dpj²) for all vp ∈ V. MST, RNG and GG do not have any parameters.

3.2. Spectral clustering algorithms

Let S = (sij), i, j = 1, …, n, be a similarity matrix and W = (wij), i, j = 1, …, n, be its weighted adjacency matrix. The degree of vertex vi is di = Σ_{j=1}^{n} wij, and the degree matrix D is the diagonal matrix with the degrees d1, …, dn on the diagonal.

Spectral clustering is based on the graph Laplacian, which is a matrix representation of the graph (Chung, 1997, chap. 1). The unnormalized graph Laplacian, L, is calculated as L = D − W. There are also normalized versions of the graph Laplacian, i.e. Lsym = D^{−1/2} L D^{−1/2} and Lrw = D^{−1} L. The former is a symmetric matrix, whereas the latter is based on the random walk perspective. These graph Laplacians help extract the properties of a data set.

The unnormalized spectral clustering algorithm (Von Luxburg, 2007) and two normalized spectral clustering algorithms (Ng et al., 2002; Shi & Malik, 2000) are presented in Figs. 1–3, respectively. The unnormalized spectral clustering algorithm is based on the unnormalized graph Laplacian, whereas the normalized spectral clustering algorithms use one of the normalized graph Laplacians.
Input: Data set X and target number of clusters k
1. Construct a similarity graph and compute its weighted adjacency matrix W.
2. Compute the unnormalized Laplacian L = D − W.
3. Compute the first k eigenvectors u1, …, uk of L.
4. Form the matrix U ∈ R^{n×k} which contains the vectors u1, …, uk as columns.
5. For i = 1, …, n, let yi ∈ R^k be the vector corresponding to the ith row of U.
6. Cluster the points (yi), i = 1, …, n, in R^k with the k-means algorithm into clusters C1, …, Ck.
Output: Clusters C1, …, Ck

Fig. 1. Unnormalized spectral clustering algorithm by Von Luxburg (2007).

Input: Data set X and target number of clusters k
1. Construct a similarity graph and compute its weighted adjacency matrix W.
2. Compute the normalized Laplacian Lrw = D^{−1} L, where L = D − W.
3. Compute the first k eigenvectors u1, …, uk of Lrw.
4. Form the matrix U ∈ R^{n×k} which contains the vectors u1, …, uk as columns.
5. For i = 1, …, n, let yi ∈ R^k be the vector corresponding to the ith row of U.
6. Cluster the points (yi), i = 1, …, n, in R^k with the k-means algorithm into clusters C1, …, Ck.
Output: Clusters C1, …, Ck

Fig. 2. Normalized spectral clustering algorithm by Shi and Malik (2000).


Input: Data set X and target number of clusters k
1. Construct a similarity graph and compute its weighted adjacency matrix W.
2. Compute the normalized Laplacian Lsym = D^{−1/2} L D^{−1/2}, where L = D − W.
3. Compute the first k eigenvectors u1, …, uk of Lsym.
4. Form the matrix U ∈ R^{n×k} which contains the vectors u1, …, uk as columns.
5. Normalize the rows of U to norm 1, and form the matrix T ∈ R^{n×k} such that tij = uij / (Σk uik²)^{1/2}.
6. For i = 1, …, n, let yi ∈ R^k be the vector corresponding to the ith row of T.
7. Cluster the points (yi), i = 1, …, n, in R^k with the k-means algorithm into clusters C1, …, Ck.
Output: Clusters C1, …, Ck

Fig. 3. Normalized spectral clustering algorithm by Ng et al. (2002).
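To make Sections 3.1 and 3.2 concrete, the sketch below builds a symmetric KNN similarity graph with Gaussian edge weights and clusters the spectral embedding with k-means, following the outline of Figs. 1 and 3. It is an illustrative sketch rather than the code used in the experiments (which were run in Matlab); the function names and the use of scikit-learn's KMeans are our own choices.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans


def knn_similarity_graph(X, k, sigma):
    """Symmetric KNN graph (Section 3.1) with Gaussian weights exp(-d_ij^2 / (2 sigma^2))."""
    d = cdist(X, X)                              # pairwise Euclidean distances
    W = np.zeros_like(d)
    for i in range(X.shape[0]):
        nn = np.argsort(d[i])[1:k + 1]           # k nearest neighbors, excluding the point itself
        W[i, nn] = np.exp(-d[i, nn] ** 2 / (2.0 * sigma ** 2))
    # Edge if i is a neighbor of j or vice versa; use np.minimum for the mutual KNN graph.
    return np.maximum(W, W.T)


def spectral_clustering(W, k, normalized=True):
    """Cluster the rows of the first k Laplacian eigenvectors (Figs. 1 and 3)."""
    deg = W.sum(axis=1)
    L = np.diag(deg) - W                         # unnormalized Laplacian L = D - W
    if normalized:                               # Lsym = D^(-1/2) L D^(-1/2), as in Ng et al. (2002)
        d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
        L = L * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)               # eigenvalues are returned in ascending order
    U = eigvecs[:, :k]                           # first k eigenvectors as columns
    if normalized:                               # row-normalize U to unit norm (matrix T in Fig. 3)
        U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)
```

For example, labels = spectral_clustering(knn_similarity_graph(X, k=int(np.log(len(X))), sigma=1.0), k=2) follows the k = log(n) heuristic mentioned in Section 3.1.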

4. The proposed approach

The proposed approach corresponds to the pre-processing step of a spectral clustering algorithm, and it includes the construction of the Density Adaptive Neighborhood (DAN) and its adjacency matrix. The steps of the proposed approach are given in Fig. 4.

Step 1. Construct the neighborhood of each vertex using the NC algorithm (İnkaya et al., 2015).
  1.1. Find the nearest direct neighbors.
  1.2. Find the indirect neighbors till the first density decrease.
  1.3. Extend the indirect neighbors in Step 1.2 using the indirect connectivity.
  1.4. Determine the final neighbors by mutual connectivity tests.
Step 2. Construct an undirected graph, namely DAN.
Step 3. Determine the connected components of DAN.
Step 4. If the number of connected components is greater than k, insert an edge between the nearest connected components. Update the number of connected components and the DAN graph accordingly, and return to Step 4. Otherwise, go to Step 5.
Step 5. Form the weighted adjacency matrix of DAN using Eq. (1).

Fig. 4. The proposed approach.
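The listing below sketches Steps 2–5 of Fig. 4 in Python, assuming the NC neighborhoods NC_i of Step 1 (İnkaya et al., 2015) are already available as input; the hypersphere density test used inside NC is shown as a separate helper. The weight assignment corresponds to Eq. (1) introduced later in this section. All names are illustrative, the pairwise component-merging loop is a straightforward (not optimized) reading of Step 4, and the final symmetrization of W is one possible choice rather than part of the original description.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import cdist


def density_ij(X, i, j, d):
    """Number of points inside the hypersphere with diameter d_ij passing through x_i and x_j."""
    center = (X[i] + X[j]) / 2.0
    radius = d[i, j] / 2.0
    inside = np.linalg.norm(X - center, axis=1) < radius
    inside[[i, j]] = False                      # the end points themselves are not counted
    return int(inside.sum())


def build_dan(X, nc_neighbors, k):
    """Steps 2-5 of Fig. 4, given the NC neighborhood NC_i of each vertex (Step 1)."""
    n = X.shape[0]
    d = cdist(X, X)
    # Step 2: undirected DAN graph -- edge (i, j) if j is in NC_i or i is in NC_j.
    A = np.zeros((n, n), dtype=bool)
    for i, nbrs in enumerate(nc_neighbors):
        for j in nbrs:
            A[i, j] = A[j, i] = True
    # Steps 3-4: merge connected components until at most k remain, by inserting
    # the shortest edge between two different components.
    n_comp, labels = connected_components(csr_matrix(A), directed=False)
    while n_comp > k:
        best, best_edge = np.inf, None
        for i in range(n):
            for j in range(i + 1, n):
                if labels[i] != labels[j] and d[i, j] < best:
                    best, best_edge = d[i, j], (i, j)
        A[best_edge[0], best_edge[1]] = A[best_edge[1], best_edge[0]] = True
        n_comp, labels = connected_components(csr_matrix(A), directed=False)
    # Step 5: Gaussian weights of Eq. (1); the width of vertex i is its longest incident edge.
    W = np.zeros((n, n))
    for i in range(n):
        longest = d[i, A[i]].max() if A[i].any() else 1.0
        W[i, A[i]] = np.exp(-d[i, A[i]] ** 2 / longest ** 2)
    return np.maximum(W, W.T)                   # keep the adjacency matrix symmetric
```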

In the first step, the local characteristics of the data set are extracted. The Neighborhood Construction (NC) algorithm (İnkaya et al., 2015) is adopted for this purpose. Let X = {x1, …, xn} be the set of data points, and each data point in X is represented by a vertex in the set V = {v1, …, vn}. In NC, the hypersphere passing through vertices vi and vj with diameter dij is used for density calculation, where dij is the Euclidean distance between vertices vi and vj. The number of vertices lying in this hypersphere shows the density between vertices vi and vj, densityij. If densityij = 0, then vertices vi and vj are directly connected. If densityij > 0, then vertices vi and vj are indirectly connected. Using these density and connectivity definitions, Steps 1.1–1.4 in Fig. 4 are executed, and the NC neighborhood of each vertex, NCi, is determined uniquely. NC neighborhoods reflect the density and connectivity relations in the data set.

In the second step, we construct an undirected similarity graph G = (V, E) with vertex set V and edge set E. We insert an edge (vi, vj) if vi ∈ NCj or vj ∈ NCi. This graph is called the Density Adaptive Neighborhood (DAN). This step yields a symmetric similarity graph.

In the third step, we determine the connected components (subclusters) of DAN. A connected component of an undirected graph is a maximal subgraph in which all vertices are connected. These connected components are the potential clusters in the data set.

In the fourth step, the number of connected components is compared with the number of target clusters (k). This check shows whether DAN satisfies the connectivity property. When the number of connected components is more than k, this implies that there are too many isolated subclusters or vertices. For this reason, we insert an edge (vi, vj) such that (vi, vj) = arg min {dij | vi ∈ CCr, vj ∈ CCq, r ≠ q}, where CCr and CCq denote the connected components r and q, respectively. We repeat this step until the number of connected components is less than or equal to k.

In the final step, the weighted adjacency matrix of DAN is formed using the Gaussian kernel. The weight between vertices vi and vj, wij, is calculated as follows:

wij = exp(−dij² / (max{dik : (i, k) ∈ E})²)  if (i, j) ∈ E,  and  wij = 0  otherwise,    (1)

where E is the edge set of DAN. In the Gaussian kernel, the neighborhood width is equal to the longest edge in the neighborhood of the corresponding point. Hence, it is uniquely calculated for each point.

5. Experimental study

In this section, we performed a comparative study of DAN and other similarity graphs for spectral clustering.

5.1. Data sets and comparison

We conducted experiments with spatial data sets (Buerk, 2015; Chang & Yeung, 2008; İyigün, 2008; Sourina, 2013), and with the I-Λ (Fukunaga, 1990) and Ness (Van Ness, 1980) data sets. There are 20 spatial data sets including clusters with various shapes and density differences. We provide some example data sets in Fig. 5(a)–(f).

I-Λ and Ness are Gaussian data sets with two clusters. The data generation models are explained briefly. Let μi denote the mean vector for cluster i, and Σi denote the covariance matrix for cluster i. We define Ip as the p×p identity matrix, and diag[·] as the diagonal matrix. We set n1 = 100 and n2 = 100, where n1 and n2 denote the number of points in clusters 1 and 2, respectively.

I-Λ data set: 8-dimensional Gaussian data set with μ1 = 0, μ2 = [3.86 3.10 0.84 0.84 1.64 1.08 0.26 0.01]^T, Σ1 = I8 and Σ2 = diag[8.41 12.06 0.12 0.22 1.49 1.77 0.35 2.73].



Fig. 5. Example data sets: (a) spiral, (b) data-uc-cc-nu-n_v2, (c) data-c-cc-nu-n_v2, (d) I-Λ, (e) two_moons, and (f) chainlink.

Ness data sets: p-dimensional Gaussian data sets with μ1 = 0, μ2 = [Δ/2 0 … 0 Δ/2]^T, Σ1 = Ip, and Σ2 = diag[I_{p−1}, 2], where p = 2, …, 8 and Δ = 6 and 8. Δ is the Mahalanobis distance between the two clusters.

We also used data sets from the UCI Machine Learning Repository (Bache & Lichman, 2013) to generalize our results. The characteristics of the eight UCI data sets are shown in Table 1. All the data sets have numerical attributes. We eliminated the missing values in each data set. In addition, we normalized each data set.

Table 1
The properties of the UCI data sets.

Data set          CN    PN    DN
Banknotes          2    200    6
Breast cancer      2    449    9
Hepta              7    212    3
Iris               3    147    4
Liver              2    341    6
Seeds              3    210    7
User knowledge     4    258    5
Vertebral          2    310    6

CN: number of clusters; PN: number of points in the data set; DN: number of attributes.

5.2. Comparison and performance criteria

We used KNN, MKNN, ε-neighborhood, fully connected graph, MST, and GG for comparison. We set the parameters considering the recommendations in the literature. In KNN, we examined three settings where k is log(n), 5% and 10% of the number of points in the data set. We labeled these settings as KNN1, KNN2 and KNN3, respectively. Similar to KNN, we chose three settings for parameter k in MKNN, with labels MKNN1, MKNN2 and MKNN3. In the ε-neighborhood graph, we used two settings where ε is equal to: (i) the longest edge in the MST of the data points, (ii) the mean distance of a point to its kth closest neighbor where k = log(n). We labeled these settings as eps-1 and eps-2, respectively. In the fully connected graph (FCG), we used the Gaussian kernel. The neighborhood width σ in the Gaussian kernel has a similar role as parameter ε, so we used the same parameter settings for σ, with labels FCG1 and FCG2. There is no parameter for MST and GG.

After constructing the similarity graph, we applied the unnormalized spectral clustering algorithm and the two normalized spectral clustering algorithms in Section 3.2. We labeled these algorithms as USC, NSC1 and NSC2. All the algorithms were coded in Matlab 8.1. We ran the algorithms on a PC with an Intel Core i5 3.00 GHz processor and 4 GB RAM.

We evaluated the clustering quality of the different similarity graphs in terms of Normalized Mutual Information (NMI) (Fred & Jain, 2003) and Rand Index (RI) (Rand, 1971). NMI is an information theoretical measure, and it is based on entropy. RI penalizes both divided clusters and mixed clusters.
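For reference, the following sketch generates one instance of the I-Λ data set with the parameters stated in Section 5.1 and scores a candidate partition with RI and NMI. It assumes scikit-learn is available (rand_score requires version 0.24 or later); the original experiments were coded in Matlab 8.1, so this is only an illustration of the protocol.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score, rand_score

rng = np.random.default_rng(0)

# I-Lambda data set (Fukunaga, 1990): two 8-dimensional Gaussian clusters, 100 points each.
mu1 = np.zeros(8)
mu2 = np.array([3.86, 3.10, 0.84, 0.84, 1.64, 1.08, 0.26, 0.01])
cov1 = np.eye(8)
cov2 = np.diag([8.41, 12.06, 0.12, 0.22, 1.49, 1.77, 0.35, 2.73])

X = np.vstack([rng.multivariate_normal(mu1, cov1, size=100),
               rng.multivariate_normal(mu2, cov2, size=100)])
y_true = np.repeat([0, 1], 100)

# Placeholder partition; in the experiments the labels come from the spectral
# clustering algorithms of Section 3.2 applied to the competing similarity graphs.
y_pred = np.zeros(200, dtype=int)
print("RI  =", rand_score(y_true, y_pred))
print("NMI =", normalized_mutual_info_score(y_true, y_pred))
```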


Table 2
Comparison results for the spatial data sets (The numbers in parentheses show the ranks within each spectral clustering algorithm such that 1 is the best and 13 is the worst).

USC NSC1 NSC2

TCN RI NMI TCN RI NMI TCN RI NMI

Mean Min Mean Min Mean Min Mean Min Mean Min Mean Min

KNN1 4 0.821 (4) 0.618 0.747 (3) 0.339 7 0.846 (3) 0.618 0.777 (2) 0.339 9 0.873 (2) 0.639 0.822 (2) 0.524
KNN2 6 0.836 (3) 0.499 0.744 (4) 0.004 5 0.818 (5) 0.517 0.717 (5) 0.034 5 0.814 (3) 0.505 0.717 (5) 0.012
KNN3 5 0.811 (6) 0.497 0.708 (5) 0 5 0.801 (9) 0.498 0.680 (10) 0.001 6 0.814 (4) 0.498 0.712 (7) 0.001
MKNN1 9 0.877 (2) 0.5 0.794 (2) 0.024 8 0.848 (2) 0.5 0.751 (3) 0.008 3 0.809 (8) 0.509 0.809 (3) 0.509
MKNN2 5 0.817 (5) 0.496 0.708 (6) 0 5 0.825 (4) 0.517 0.723 (4) 0.034 6 0.813 (5) 0.505 0.711 (8) 0.012
MKNN3 4 0.795 (7) 0.497 0.685 (7) 0 6 0.803 (8) 0.498 0.686 (8) 0.001 5 0.787 (10) 0.498 0.669 (10) 0.001
eps-1 0 0.705 (12) 0.435 0.501 (12) 0.002 0 0.681 (13) 0.479 0.491 (13) 0.027 1 0.745 (12) 0.5 0.555 (12) 0.05
eps-2 7 0.787 (9) 0.498 0.682 (9) 0.024 5 0.817 (6) 0.498 0.717 (6) 0.024 5 0.812 (6) 0.498 0.715 (6) 0.024
full-1 4 0.794 (8) 0.498 0.683 (8) 0.006 4 0.786 (10) 0.526 0.682 (9) 0.001 5 0.801 (9) 0.554 0.695 (9) 0.001
full-2 0 0.660 (13) 0.498 0.47 (13) 0.004 1 0.714 (12) 0.512 0.541 (12) 0.003 1 0.726 (13) 0.534 0.553 (13) 0.001
MST 2 0.776 (10) 0.589 0.666 (10) 0.315 3 0.805 (7) 0.585 0.711 (7) 0.359 3 0.812 (7) 0.638 0.723 (4) 0.459
GG 1 0.714 (11) 0.499 0.539 (11) 0.005 1 0.721 (11) 0.517 0.542 (11) 0.005 2 0.746 (11) 0.505 0.585 (11) 0.005
DAN 10 0.915 (1) 0.746 0.887 (1) 0.684 11 0.901 (1) 0.589 0.868 (1) 0.535 13 0.925 (1) 0.725 0.901 (1) 0.598

TCN: number of data sets in which the target clusters are found.

5.3. Experiments on artificial data sets

The comparison results for the spatial data sets and I-Λ are provided in Table 2. We rank each performance criterion for ease of comparison. Among the 21 data sets, USC with DAN finds the target clusters in 10 data sets. NSC1 and NSC2 with DAN extract the target clusters in 11 and 13 data sets, respectively. MKNN1 and KNN1 follow DAN. MKNN1 extracts the target clusters in 9, 8 and 3 data sets with USC, NSC1 and NSC2, respectively. The corresponding figures are 4, 7 and 9 for KNN1.

Among the 13 competing similarity graphs, DAN gives the best RI and NMI values for all spectral clustering algorithms, followed by KNN1 and MKNN1. The performances of the ε-neighborhood graph, FCG, MST and GG are inferior to those of DAN, KNN and MKNN. Hence, DAN outperforms the other similarity graphs in finding target clusters with various shapes and density differences. However, it has limitations in data sets that include noise and mixed clusters. Still, the worst-case performance of DAN (minimum of RI and NMI) is superior to that of the other graphs.

Table 2 indicates that there is a relation between the similarity graph and the spectral clustering algorithm used in terms of clustering performance. For instance, NSC2 together with DAN is the best performer for all spatial data sets. NSC2 is also preferable for most of the similarity graphs, including the ε-neighborhood graph, FCG, MST and GG. However, for KNN and MKNN, the best performing spectral clustering algorithms are USC and NSC1, depending on the parameter setting.

We also analyzed the impact of the parameter settings on the KNN, MKNN, ε-neighborhood and fully connected graphs. In Table 2, we observe that the performances of these similarity graphs are sensitive to the parameter settings. KNN and MKNN are more successful in cluster extraction for k = log(n). For larger values of k, the RI and NMI values worsen. FCG shows significantly better performance when the neighborhood width (σ) is chosen as the longest edge in the MST. In the ε-neighborhood graph, setting ε to the mean distance of a point to its kth closest neighbor with k = log(n) provides a significant improvement in the clustering performance. This is also consistent with the superior performance of KNN and MKNN for k = log(n).

Finally, we examined the performance of DAN while varying the number of attributes. We generated 2- to 8-dimensional Ness data sets with Δ = 6, 8. The experimental results are shown in Fig. 6, where we present the average RI values over the different parameter settings of KNN, MKNN, ε-neighborhood and FCG. When the clusters are well separated (Δ = 8), DAN finds the target clusters in most of the data sets in Fig. 6(a)–(c). Even when the number of attributes increases, the performance of DAN stays the same. When the clusters are closer (Δ = 6), the performance of DAN shows a slight decrease in Fig. 6(d)–(f). Still, the number of attributes does not affect the performance of DAN significantly. In both settings of Δ, the performances of KNN and GG are close to the performance of DAN. Similar to DAN, the number of attributes does not have an impact on the performance of KNN. MST is sensitive to the number of attributes, in particular when the clusters are closer. The ε-neighborhood graph and MKNN are inferior to the other similarity graphs, and their performances depend on the number of attributes. To sum up, DAN is successful in finding target clusters for data sets with a large number of attributes.

5.4. Experiments on UCI data sets

In Table 3, we present the RI value for each UCI data set when USC is applied. DAN outperforms the other similarity graphs in all data sets except "seeds" and "user knowledge". For "seeds", KNN1 is the best performer, whereas MKNN2 gives the highest RI for "user knowledge". For the "banknotes" and "vertebral" data sets, other similarity graphs such as KNN, MKNN, FCG and the ε-neighborhood graph also find the target clusters. However, the performance of these similarity graphs is sensitive to the parameter settings. The DAN graph is able to represent the local characteristics of the data set without any parameter.

We also provide the RI values for NSC1 and NSC2 in the Appendix. Even for different spectral clustering algorithms, DAN is the best performer among the 13 similarity graphs.

5.5. Experiments on robustness

We analyzed the robustness of the similarity graphs using two types of transformations: geometric distortion and decimation. We consider the spatial data sets in Section 5.1 for this purpose. Let xi = (xi1, xi2) and xi′ = (xi1′, xi2′) be the original and distorted points, respectively. In the geometric distortion, we displace each point horizontally such that xi1′ = xi1 + λxi2 and xi2′ = xi2, where λ is the distortion factor. In decimation, we remove 0.05Dn points randomly, where D is the decimation factor.

The impact of distortion is illustrated on two example data sets in Fig. 7, which shows the resulting clusters for the best three performing similarity graphs, i.e. DAN, KNN1 and MKNN1. DAN finds the target clusters in both data sets in Fig. 7(c) and (f), whereas KNN1 and MKNN1 mix the target clusters, as shown in Fig. 7(a), (b), (d), and (e).
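The two transformations used in this robustness analysis can be written compactly; the sketch below (illustrative names, NumPy assumed) applies the horizontal distortion xi1′ = xi1 + λxi2 and removes 0.05Dn randomly chosen points for decimation.

```python
import numpy as np


def distort(X, lam):
    """Geometric distortion: shift each point horizontally by lambda times its second coordinate."""
    Xd = X.copy()
    Xd[:, 0] = X[:, 0] + lam * X[:, 1]   # x_i1' = x_i1 + lambda * x_i2, x_i2' = x_i2
    return Xd


def decimate(X, y, D, rng):
    """Decimation: randomly remove 0.05*D*n points, where D is the decimation factor."""
    n = X.shape[0]
    keep = rng.permutation(n)[int(round(0.05 * D * n)):]
    return X[keep], y[keep]
```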



Fig. 6. Experimental results for the Ness data sets with a varying number of attributes, when (a) USC is applied for Δ = 8, (b) NSC1 is applied for Δ = 8, (c) NSC2 is applied for Δ = 8, (d) USC is applied for Δ = 6, (e) NSC1 is applied for Δ = 6, and (f) NSC2 is applied for Δ = 6.

Table 3
Comparison results for the UCI data sets in terms of RI when USC is applied (The best performer for each data set is bolded).

Data set KNN1 KNN2 KNN3 MKNN1 MKNN2 MKNN3 eps-1 eps-2 FCG1 FCG2 MST GG DAN

Banknotes 1.000 1.000 1.000 0.498 1.000 1.000 0.861 0.498 0.498 0.498 0.887 0.827 1.000
Breast cancer 0.887 0.891 0.887 0.501 0.891 0.887 0.684 0.500 0.500 0.500 0.841 0.899 0.907
Hepta 0.737 0.947 0.862 0.755 0.861 0.722 0.583 0.876 0.951 0.738 0.937 0.953 0.956
Iris 0.816 0.864 0.837 0.432 0.864 0.837 0.729 0.767 0.767 0.768 0.702 0.716 0.876
Liver 0.506 0.504 0.505 0.511 0.504 0.505 0.507 0.512 0.510 0.512 0.504 0.510 0.512
Seeds 0.922 0.911 0.894 0.377 0.910 0.894 0.359 0.687 0.731 0.860 0.904 0.873 0.891
User knowledge 0.705 0.675 0.673 0.317 0.718 0.674 0.319 0.317 0.300 0.300 0.699 0.650 0.701
Vertebral 0.516 0.534 0.545 0.557 0.534 0.545 0.541 0.559 0.559 0.559 0.559 0.503 0.559

For the distortion factor λ ∈ {0, 0.1, …, 0.5}, we present the comparison results for all spatial data sets in Fig. 8. DAN provides the highest RI among all competing similarity graphs, followed by KNN and MKNN. The ε-neighborhood graph, FCG, MST, and GG fall behind DAN, KNN and MKNN. As the distortion factor increases, the performance of DAN decreases. KNN and MKNN are also sensitive to the distortion factor. Although GG has the worst performance among all similarity graphs, its sensitivity to the distortion factor is lower.

We also show the impact of decimation on two example data sets in Fig. 9, which gives the results for the best three performing similarity graphs, i.e. DAN, KNN1 and MKNN1. In Fig. 9(c) and (f), DAN outperforms KNN1 and MKNN1 in terms of RI.

In Fig. 10, the comparison results for the 21 spatial data sets are given for decimation factors D ∈ {0, 1, …, 5}. Although the average RI values of DAN decrease for larger decimation factors, DAN is still superior to the competing similarity graphs. KNN and MKNN follow DAN, and GG has the minimum RI values. To sum up, DAN is the best performer under the distortion and decimation transformations.

6. Discussion

Our experimental study shows that DAN is an effective pre-processing step for spectral clustering.



Fig. 7. Example data sets with distortion factor λ = 0.3, when NSC2 is applied to (a) spiral with KNN1 (RI = 0.95), (b) spiral with MKNN1 (RI = 0.72), (c) spiral with DAN (RI = 1), (d) data-uc-cc-nu-n_v2 with KNN1 (RI = 0.92), (e) data-uc-cc-nu-n_v2 with MKNN1 (RI = 0.83), and (f) data-uc-cc-nu-n_v2 with DAN (RI = 1).


Fig. 8. The impact of distortion for (a) USC, (b) NSC1, and (c) NSC2.


Fig. 9. Example data sets with decimation factor D = 3, when USC is applied to (a) spiral with KNN1 (RI = 0.72), (b) spiral with MKNN1 (RI = 0.72), (c) spiral with DAN (RI = 1); and when NSC2 is applied to (d) data-uc-cc-nu-n_v2 with KNN1 (RI = 0.95), (e) data-uc-cc-nu-n_v2 with MKNN1 (RI = 0.85), and (f) data-uc-cc-nu-n_v2 with DAN (RI = 1).

It is successful in finding clusters with arbitrary shapes and varying density. Also, it is superior to the other approaches in terms of RI, NMI and the number of data sets in which the target clusters are found. Particularly, in the spatial data sets, DAN together with the normalized spectral clustering algorithm by Ng et al. (2002) is able to find the target clusters correctly. Moreover, DAN is robust to the number of attributes and to transformations such as distortion and decimation.

Although KNN and MKNN follow DAN, their performances are sensitive to the parameter setting. Even the properties of the data set affect the parameter setting.



Fig. 10. The impact of decimation for (a) USC, (b) NSC1, and (c) NSC2.

For instance, clustering accuracy improves by choosing k = log(n) in the spatial data sets, whereas larger values of k perform better in the UCI data sets, which have larger numbers of attributes and data points compared to the spatial data sets. When the data set has varying density, MKNN yields slightly better results than KNN. However, MKNN is more sensitive to decimation and distortion compared to KNN.

The performances of the ε-neighborhood graph and FCG are inferior to those of DAN, KNN and MKNN in most of the data sets. As the ε-neighborhood graph and FCG define the similarity graph based on spherical-shaped neighborhoods, these two similarity graphs are not capable of finding clusters with arbitrary shapes and varying density. MST and GG are parameter-free approaches; however, they perform poorly in spectral clustering. Hence, defining the proximity relations without density information is not sufficient for spectral clustering. The ε-neighborhood graph, FCG, MST and GG show superior performance with the normalized spectral clustering algorithm by Ng et al. (2002), whereas the choice of the spectral clustering algorithm for KNN and MKNN depends on the parameter setting.

7. Conclusion

Determining the local characteristics of a data set is a building block for spectral clustering. However, the existing approaches such as KNN, MKNN, ε-neighborhood and FCG are sensitive to the parameter selection, and there is no systematic way of finding the proper setting. This study aims to fill this gap in the spectral clustering literature by providing a pre-processing step, which includes the construction of a parameter-free similarity graph and its adjacency matrix.

The proposed similarity graph, namely DAN, facilitates the use of spectral clustering algorithms in various domains without a priori information on the data set. Compared to the existing approaches, the main advantages of DAN are as follows: (i) it is parameter-free, (ii) it can work together with the well-known spectral clustering algorithms, (iii) it is successful in finding the local characteristics of data sets with arbitrary shaped clusters and varying density, and (iv) its performance is robust to the number of attributes and to transformations such as distortion and decimation.

Possible future research directions are as follows.

• DAN has limitations in handling data sets with noise. Hence, a future research direction is the development of ways to handle noise.
• When there exist mixed clusters, the neighborhood relations among the data points may not be precise. Instead of hard (crisp) neighborhood relations, an interesting research direction can be the use of fuzzy neighborhood relations in spectral clustering.
• In practice, side information might be available to guide the clustering result to the desired partition. This can be in the form of pairwise constraints or partial labeling. Learning a similarity graph based on side information can be investigated further.
• Hybridization of spectral clustering with metaheuristics is a promising tool to improve the clustering accuracy.

Appendix

Tables A.1 and A.2.

Table A.1
Comparison results for the UCI data sets in terms of RI when NSC1 is applied (The best performer for each data set is bolded).

Data set KNN1 KNN2 KNN3 MKNN1 MKNN2 MKNN3 eps-1 eps-2 FCG1 FCG2 MST GG DAN

Banknotes 1.000 1.000 1.000 0.498 1.000 1.000 0.887 0.498 0.961 0.923 0.887 0.835 1.000
Breast cancer 0.891 0.895 0.895 0.501 0.895 0.895 0.500 0.500 0.791 0.826 0.841 0.899 0.907
Hepta 0.897 0.854 0.948 0.715 0.857 0.854 0.867 0.864 0.738 0.952 0.947 0.949 0.820
Iris 0.572 0.864 0.842 0.429 0.864 0.842 0.703 0.772 0.772 0.818 0.828 0.721 0.883
Liver 0.499 0.500 0.499 0.506 0.500 0.499 0.512 0.512 0.510 0.509 0.504 0.502 0.503
Seeds 0.916 0.905 0.900 0.380 0.905 0.900 0.373 0.732 0.881 0.885 0.894 0.879 0.900
User knowledge 0.669 0.672 0.669 0.456 0.672 0.675 0.337 0.666 0.648 0.678 0.699 0.654 0.701
Vertebral 0.551 0.559 0.562 0.557 0.559 0.562 0.553 0.559 0.559 0.545 0.559 0.574 0.564

Table A.2
Comparison results for the UCI data sets in terms of RI when NSC2 is applied (The best performer for each data set is bolded).

Data set KNN1 KNN2 KNN3 MKNN-1 MKNN2 MKNN3 eps-1 eps-2 FCG1 FCG2 MST GG DAN

Banknotes 1.000 1.000 1.000 0.852 1.000 1.000 0.498 0.498 0.961 0.932 0.923 0.628 1.000
Breast cancer 0.911 0.915 0.915 0.758 0.915 0.915 0.845 0.500 0.895 0.867 0.899 0.499 0.903
Hepta 0.950 1.000 0.950 0.892 0.904 0.898 0.906 0.959 1.000 1.000 0.948 1.000 0.953
Iris 0.729 0.832 0.837 0.566 0.832 0.837 0.713 0.842 0.773 0.833 0.823 0.823 0.917
Liver 0.499 0.500 0.500 0.509 0.501 0.500 0.501 0.512 0.510 0.499 0.507 0.499 0.502
Seeds 0.911 0.895 0.911 0.390 0.895 0.911 0.400 0.864 0.889 0.882 0.888 0.654 0.900
User knowledge 0.672 0.701 0.669 0.620 0.673 0.669 0.547 0.673 0.717 0.665 0.690 0.653 0.721
Vertebral 0.564 0.564 0.566 0.539 0.564 0.566 0.504 0.559 0.559 0.559 0.555 0.569 0.579


References

Bache, K., & Lichman, M. (2013). UCI machine learning repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
Beauchemin, M. (2015). A density-based similarity matrix construction for spectral clustering. Neurocomputing, 151, 835–844.
Brito, M., Chavez, E., Quiroz, A., & Yukich, J. (1997). Connectivity of the mutual k-nearest-neighbor graph in clustering and outlier detection. Statistics & Probability Letters, 35, 33–42.
Buerk, I. (2015). Fast and efficient spectral clustering. Retrieved from http://www.mathworks.com/matlabcentral/fileexchange/34412. Last accessed: July 10, 2015.
Carreira-Perpinan, M. A., & Zemel, R. S. (2005). Proximity graphs for clustering and manifold learning. In L. K. Saul, Y. Weiss, & L. Bottou (Eds.), Advances in neural information processing systems (pp. 225–232). Cambridge: MIT Press.
Chang, H., & Yeung, D. Y. (2008). Robust path-based spectral clustering. Pattern Recognition, 41(1), 191–203.
Chung, F. R. K. (1997). Spectral graph theory. Providence: American Mathematical Society.
Chrysouli, C., & Tefas, A. (2015). Spectral clustering and semi-supervised learning using evolving similarity graphs. Applied Soft Computing. doi:10.1016/j.asoc.2015.05.026.
Correa, C. D., & Lindstorm, P. (2012). Locally-scaled spectral clustering using empty region graphs. In Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1330–1338).
Dhillon, I. S. (2001). Co-clustering documents and words using bipartite spectral graph partitioning. In Proceedings of the 7th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 269–274).
Fred, A. L. N., & Jain, A. K. (2003). Robust data clustering. In Proceedings of the IEEE computer society conference on computer vision and pattern recognition (pp. 128–136).
Fukunaga, K. (1990). Introduction to statistical pattern recognition (2nd ed.). Academic Press.
Gabriel, K. R., & Sokal, R. R. (1969). New statistical approach to geographic variation analysis. Systematic Zoology, 18(3), 259–278.
He, R., Qin, B., & Liu, T. (2012). A novel approach to update summarization using evolutionary manifold-ranking and spectral clustering. Expert Systems with Applications, 39(3), 2375–2384.
Higham, D. J., Kalna, G., & Kibble, M. (2007). Spectral clustering and its use in bioinformatics. Journal of Computational and Applied Mathematics, 204(1), 25–37.
İnkaya, T., Kayalıgil, S., & Özdemirel, N. E. (2015). An adaptive neighbourhood construction algorithm based on density and connectivity. Pattern Recognition Letters, 52, 17–24.
İyigün, C. (2008). Probabilistic distance clustering. Ph.D. Dissertation, Rutgers University, New Brunswick, New Jersey.
Jaromczyk, J. W., & Toussaint, G. T. (1992). Relative neighborhood graphs and their relatives. Proceedings of the IEEE, 80(9), 1502–1517.
Jia, H., Ding, S., Xu, X., & Nie, R. (2014). The latest research progress on spectral clustering. Neural Computing and Applications, 24, 1477–1486.
Liu, A. H., Poon, L. K. M., Liu, T.-F., & Zhang, N. L. (2014). Latent tree models for rounding in spectral clustering. Neurocomputing, 144, 448–462.
Lu, H., Fu, Z., & Shu, X. (2014). Non-negative and sparse spectral clustering. Pattern Recognition, 47, 418–426.
Maier, M., von Luxburg, U., & Hein, M. (2013). How the result of graph clustering methods depends on the construction of the graph. ESAIM: Probability and Statistics, 17, 370–418.
Nadler, B., & Galun, M. (2006). Fundamental limitations of spectral clustering. Advances in Neural Information Processing Systems, 19, 1017–1024.
Nascimento, M. C. V., & Carvalho, A. C. P. L. F. (2011). Spectral methods for graph clustering – A survey. European Journal of Operational Research, 211, 221–231.
Ng, A., Jordan, M., & Weiss, Y. (2002). On spectral clustering: Analysis and an algorithm. In T. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in neural information processing systems (pp. 849–856). Cambridge: MIT Press.
Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336), 846–850.
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.
Sourina, O. Current projects in the homepage of Olga Sourina. http://www.ntu.edu.sg/home/eosourina/projects.html. Last accessed on March 2, 2013.
Taşdemir, K. (2012). Vector quantization based approximate spectral clustering of large datasets. Pattern Recognition, 45, 3034–3044.
Taşdemir, K., & Merényi, E. (2009). Exploiting data topology in visualization and clustering of self-organizing maps. IEEE Transactions on Neural Networks, 20(4), 549–562.
Taşdemir, K., Yalçin, B., & Yildirim, I. (2015). Approximate spectral clustering with utilized similarity information using geodesic based hybrid distance measures. Pattern Recognition, 48, 1465–1477.
Van Ness, J. (1980). On the dominance of nonparametric Bayes rule discriminant algorithms in high dimensions. Pattern Recognition, 12(6), 355–368.
Vázquez-Martín, R., & Bandera, A. (2013). Spatio-temporal feature-based keyframe detection from video shots using spectral clustering. Pattern Recognition Letters, 34(7), 770–779.
Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17, 395–416.
Wang, C.-H. (2008). Recognition of semiconductor defect patterns using spatial filtering and spectral clustering. Expert Systems with Applications, 34(3), 1914–1923.
Xiang, T., & Gong, S. (2008). Spectral clustering with eigenvector selection. Pattern Recognition, 41(3), 1012–1029.
Zelnik-Manor, L., & Perona, P. (2004). Self-tuning spectral clustering. Advances in Neural Information Processing Systems, 17, 1601–1608.
Zeng, S., Huang, R., Kang, Z., & Sang, N. (2014). Image segmentation using spectral clustering of Gaussian mixture models. Neurocomputing, 144, 346–356.
Zhang, X., Li, J., & Yu, H. (2011). Local density adaptive similarity measurement for spectral clustering. Pattern Recognition Letters, 32, 352–358.
Zhu, X., Loy, C. C., & Gong, S. (2014). Constructing robust affinity graphs for spectral clustering. In Proceedings of the 2014 IEEE conference on computer vision and pattern recognition (pp. 1450–1457).
