Abstract— The twin support vector machine (TWSVM) is one of the powerful classification methods. In this brief, a TWSVM-type clustering method, called twin support vector clustering (TWSVC), is proposed. Our TWSVC includes both linear and nonlinear versions. It determines k cluster center planes by solving a series of quadratic programming problems. To make TWSVC more efficient and stable, an initialization algorithm based on the nearest neighbor graph is also suggested. The experimental results on several benchmark data sets have shown a comparable performance of our TWSVC.

The experimental results on several benchmark data sets have shown that our TWSVC performs better than the relevant plane-based clustering methods.

The rest of this brief is organized as follows. We give a quick review of k-means, kPC, and TWSVM in Section II. Our TWSVC with the corresponding initializations is described in Section III. The experiments and the conclusion are arranged in Sections IV and V, respectively.
TABLE I
ACCURACIES OF THE LINEAR CLUSTERING METHODS ON THE BENCHMARK DATA SETS
In short, for i = 1, 2, . . . , k, (10) can be solved by the following procedure.
1) Select the initial [w_i^0; b_i^0].
2) For j = 0, 1, . . ., find [w_i^{j+1}; b_i^{j+1}] by (15).
3) Stop if ||[w_i^{j+1}; b_i^{j+1}] − [w_i^j; b_i^j]|| is small enough, and then set w_i = w_i^{j+1}, b_i = b_i^{j+1}.

It has been proved that the CCCP is able to find a local solution to (10) [18]. Once the solution [w_i; b_i] with i = 1, . . . , k is obtained, the cluster labels of the data samples can be updated by (6).
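For illustration, the following is a minimal sketch of this outer loop in Python/NumPy (an assumption here; the brief's own implementation is in MATLAB). The callable solve_qp_15 is hypothetical and stands in for the convex subproblem (15), which is not reproduced in this excerpt.

```python
import numpy as np

def cccp_solve_10(w0, b0, solve_qp_15, tol=1e-4, max_iter=100):
    """Outer CCCP loop for one cluster plane, following steps 1)-3) above.

    solve_qp_15 is a hypothetical callable: given the current iterate
    (w_j, b_j) it returns the next iterate (w_{j+1}, b_{j+1}) by solving (15).
    """
    w, b = np.asarray(w0, dtype=float), float(b0)   # step 1): initial [w_i^0; b_i^0]
    for _ in range(max_iter):
        w_next, b_next = solve_qp_15(w, b)          # step 2): solve (15) at the current iterate
        step = np.linalg.norm(np.append(w_next - w, b_next - b))
        w, b = np.asarray(w_next, dtype=float), float(b_next)
        if step < tol:                              # step 3): stop when [w; b] barely changes
            break
    return w, b
```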
B. Nonlinear TWSVC

Now, let us turn to extend the above linear TWSVC to manifold clustering by the kernel trick [23]. Similar to [10], [11], and [24], our nonlinear TWSVC seeks k cluster center manifolds in an appropriate kernel-generated space as follows:

Center-manifold_i := K(x, X)u_i + γ_i = 0,   i = 1, 2, . . . , k    (16)

where K(·, ·) is an appropriate kernel function [10], [11], u_i ∈ R^m, and γ_i ∈ R.

The counterpart of (10) is

min_{u_i, γ_i, η_i}  (1/2)||K(X_i, X)u_i + γ_i e||^2 + c e^T η_i
s.t.  |K(X̂_i, X)u_i + γ_i e| ≥ e − η_i,  η_i ≥ 0    (17)

where η_i is a slack vector, i = 1, 2, . . . , k. The above problem can also be solved by the CCCP similar to the linear case. The details are omitted.
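As a small illustration of how the center manifolds (16) could be used, the sketch below labels samples by the nearest manifold, i.e., by the smallest |K(x, X)u_i + γ_i|. This assignment rule is an assumption here, since the label-update rule (6) is not reproduced in this excerpt.

```python
import numpy as np

def assign_labels_by_manifold(K_xX, U, gamma):
    """Label samples by their nearest center manifold (16).

    K_xX  : (n, m) kernel matrix K(x, X) for the n samples to be labeled
    U     : (m, k) matrix whose i-th column is u_i
    gamma : (k,) vector of offsets gamma_i
    """
    # |K(x, X) u_i + gamma_i| for every sample and every cluster i
    scores = np.abs(K_xX @ U + gamma)
    return np.argmin(scores, axis=1)
```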
C. Initializations

First, we consider the initialization of the labels of TWSVC. Traditionally, the initial labels in clustering are randomly generated. However, the experiments on k-means [7], kPC [8], and PPC [9] have indicated that the results are unstable and strongly depend on the initial labels. Therefore, we present an initialization algorithm based on the NNG [25], which has been frequently used in manifold-based learning. The main process is as follows.
1) For the given data set, select a positive integer parameter p and then construct the p nearest neighbor undirected graph, i.e., for i = 1, . . . , m, find x_i's p nearest neighbors and connect x_i with its neighbors.
2) Create the clusters by associating the connected samples, resulting in t clusters. If the current number of clusters t is equal to k, stop.
3) If t < k, disconnect the two connected samples with the largest distance and go to step 2).
4) If t > k, compute the Hausdorff distance [26] between every two clusters among the t clusters and sort all pairs in ascending order. Merge the nearest pair of clusters into one, until k clusters are formed, where the Hausdorff distance between two sets S_1 and S_2 of samples is defined as

h(S_1, S_2) = max{ max_{i∈S_1} min_{j∈S_2} ||x_i − x_j||,  max_{i∈S_2} min_{j∈S_1} ||x_i − x_j|| }.    (18)
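A minimal sketch of this initialization, assuming NumPy/SciPy; "the two connected samples with the largest distance" in step 3) is interpreted here as the longest remaining edge of the graph, and k ≤ m is assumed.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def hausdorff(S1, S2):
    """Hausdorff distance (18) between two sample sets (rows are samples)."""
    D = cdist(S1, S2)
    return max(D.min(axis=1).max(), D.min(axis=0).max())

def nng_initial_labels(X, k, p):
    """NNG-based initial cluster labels, following steps 1)-4) above (a sketch)."""
    m = X.shape[0]
    D = cdist(X, X)
    np.fill_diagonal(D, np.inf)                 # a sample is not its own neighbor
    # step 1): p nearest neighbor undirected graph
    nbrs = np.argsort(D, axis=1)[:, :p]
    A = np.zeros((m, m), dtype=bool)
    A[np.repeat(np.arange(m), p), nbrs.ravel()] = True
    A = A | A.T                                 # make the graph undirected
    # steps 2)-3): split by removing the longest edge until at least k components
    while True:
        t, labels = connected_components(csr_matrix(A), directed=False)
        if t >= k:
            break
        i, j = np.unravel_index(np.argmax(np.where(A, D, -np.inf)), D.shape)
        A[i, j] = A[j, i] = False               # disconnect the farthest connected pair
    # step 4): merge the nearest pair of clusters (Hausdorff distance) until k remain
    while t > k:
        ids = np.unique(labels)
        best = None
        for a in range(len(ids)):
            for b in range(a + 1, len(ids)):
                h = hausdorff(X[labels == ids[a]], X[labels == ids[b]])
                if best is None or h < best[0]:
                    best = (h, ids[a], ids[b])
        labels[labels == best[2]] = best[1]
        t -= 1
    _, labels = np.unique(labels, return_inverse=True)   # relabel to 0..k-1
    return labels
```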
Second, we consider the initialization of the CCCP in our TWSVC, where one needs to select an initial point [w_i^0; b_i^0]. Noting the relationship between our TWSVC and kPC, the solution to (5) in kPC is taken. Problem (5) can be converted to an eigenvalue problem by the Karush–Kuhn–Tucker conditions [27] as

X_i^T (e e^T/(e^T e) − I) X_i w_i^0 = λ w_i^0    (19)

where I is an identity matrix, so that w_i^0 should be the eigenvector corresponding to the smallest eigenvalue λ, and b_i^0 = −e^T X_i w_i^0/(e^T e), i = 1, . . . , k.
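A direct transcription of this initialization, following (19) as written above (a NumPy sketch):

```python
import numpy as np

def kpc_initial_plane(Xi):
    """Initial [w_i^0; b_i^0] from the eigenvalue problem (19) for one cluster.

    Xi : (n_i, d) matrix whose rows are the samples currently assigned to cluster i.
    """
    n_i = Xi.shape[0]
    e = np.ones((n_i, 1))
    M = Xi.T @ (e @ e.T / n_i - np.eye(n_i)) @ Xi      # matrix in (19); e^T e = n_i
    eigvals, eigvecs = np.linalg.eigh((M + M.T) / 2)   # symmetrize for numerical safety
    w0 = eigvecs[:, np.argmin(eigvals)]                # eigenvector of the smallest eigenvalue
    b0 = -float(e.T @ Xi @ w0) / n_i                   # b_i^0 = -e^T X_i w_i^0 / (e^T e)
    return w0, b0
```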
IV. EXPERIMENTAL RESULTS

In this section, we analyze the performance of our TWSVC compared with k-means (linear and nonlinear formulations [7], [28]), kPC (linear and nonlinear formulations [8]), PPC (linear and nonlinear formulations [9], where the nonlinear formulation can be obtained easily by the kernel trick, as in TWSVC), fuzzy c-means (FCM, linear formulation [29]), and the Camastra method (nonlinear formulation [30]) on several benchmark data sets [31], [32]. All the methods are implemented in MATLAB [33] on a PC with an Intel Core Duo processor (dual core, 3.4 GHz) and 4-GB RAM.

In the experiments, we used the metric accuracy to measure the performance of these methods [34]. Given the cluster labels y_i ∈ N, i = 1, . . . , m, it is easy to compute the corresponding similarity matrix M ∈ R^{m×m}, where

M(i, j) = 1 if y_i = y_j, and M(i, j) = 0 otherwise.    (20)

Suppose M_t is the similarity matrix computed from the true cluster labels of the data set, and M_p is the one computed from the prediction of a clustering method. Then, the metric accuracy of the clustering method is defined as the Rand statistic [34]

Accuracy = (n_00 + n_11 − m)/(m^2 − m) × 100%    (21)

where n_00 is the number of entries that are zero in both M_p and M_t, and n_11 is the number of entries that are one in both M_p and M_t.
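The metric (20)–(21) can be computed directly from the two label vectors; a minimal sketch:

```python
import numpy as np

def rand_accuracy(y_true, y_pred):
    """Clustering accuracy (21): the Rand statistic between two labelings.

    Builds the similarity matrices (20) for the true and predicted labels and
    counts the entries on which they agree, excluding the m diagonal ones.
    """
    y_true = np.asarray(y_true).reshape(-1, 1)
    y_pred = np.asarray(y_pred).reshape(-1, 1)
    m = y_true.shape[0]
    Mt = (y_true == y_true.T)      # M_t(i, j) = 1 iff samples i, j share a true label
    Mp = (y_pred == y_pred.T)      # M_p(i, j) = 1 iff they share a predicted label
    n11 = np.sum(Mt & Mp)          # ones in both (includes the m diagonal entries)
    n00 = np.sum(~Mt & ~Mp)        # zeros in both
    return (n00 + n11 - m) / (m ** 2 - m) * 100.0
```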
To test the proposed initialization strategy, all initial cluster labels are selected by both random initialization and NNG-based initialization. The parameters c and μ in kPC, PPC, or TWSVC are selected from {2^i | i = −8, −7, . . . , 7}, and p in the NNG-based initialization is selected from {1, 2, 3, 4, 5}. For random initialization, all the methods are run 10 times, and the average accuracy, the standard deviation, and the one-run CPU time are recorded in Tables I and II for linear and nonlinear clustering, respectively.
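For concreteness, the search grids described above could be written as follows (a trivial sketch; the variable names are illustrative only):

```python
# Hypothetical grid definitions mirroring the protocol above.
c_grid  = [2.0 ** i for i in range(-8, 8)]   # c  in {2^i | i = -8, ..., 7}
mu_grid = [2.0 ** i for i in range(-8, 8)]   # mu in {2^i | i = -8, ..., 7}
p_grid  = [1, 2, 3, 4, 5]                    # p for the NNG-based initialization
```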
TABLE II
ACCURACIES OF THE MANIFOLD CLUSTERING METHODS ON THE BENCHMARK DATA SETS

Tables I and II show the following.
1) TWSVC gets higher accuracy than the other plane-based clustering methods with either random initialization or NNG-based initialization on most data sets.
2) TWSVC achieves the highest average accuracy among these clustering methods.
3) The NNG-based initialization is superior to random initialization on most data sets, especially for the plane-based methods. However, the training time of TWSVC is longer than the others because it needs to solve a series of QPPs.

Fig. 2 shows the relations between the parameters and the accuracy (vertical axis) of our linear TWSVC on the above data sets. It can be found from Fig. 2 that the accuracy of our TWSVC is affected by both p and c, and higher accuracy is reached by smaller p for most data sets.
Fig. 3 shows the relations between the parameters and the accuracy (vertical axis) of our nonlinear TWSVC on only three data sets. More results on the other data sets can be found at http://www.optimal-group.org/Resource/TWSVC.html.

Fig. 3. Illustration of the effectiveness of nonlinear TWSVC with different parameters on (i)–(v) Dermatology, (vi)–(x) Ecoli, and (xi)–(xv) Haberman.

In Fig. 3, the rows correspond to the data sets and the columns correspond to the parameters. From Fig. 3, it can be seen that:
1) the parameter p ≤ 3 often makes nonlinear TWSVC perform well, which is similar to linear TWSVC;
2) the parameter c ≥ 1 is always a good option for most data sets;
3) the performance of the nonlinear TWSVC is affected significantly by the parameter μ;
4) different data sets correspond to different optimal μ, which may be affected by the data structure.

V. CONCLUSION

A TWSVM-type plane-based clustering method (TWSVC) has been proposed. It contains both linear and nonlinear formulations. The cluster center planes in TWSVC are obtained by solving a series of QPPs instead of the eigenvalue problems in both kPC and PPC. In addition, an efficient and stable NNG-based initialization is also presented. The experimental results on several publicly available data sets have indicated that our TWSVC has higher accuracy compared with current plane-based clustering methods. For practical convenience, the corresponding TWSVC MATLAB code can be downloaded from http://www.optimal-group.org/Resource/TWSVC.html.

It should be pointed out that, in our TWSVC, there are several parameters that need to be selected and a series of QPPs that needs to be solved. Consequently, designing more efficient solvers and model selection methods is an interesting direction for future work.

ACKNOWLEDGMENT

The authors would like to thank the editor and the anonymous reviewers for their valuable comments and suggestions.

REFERENCES

[1] M. Aldenderfer and R. Blashfield, Cluster Analysis. Los Angeles, CA, USA: Sage Publications, 1985.
[2] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: A review," ACM Comput. Surv., vol. 31, no. 3, pp. 264–323, 1999.
[3] P. Padungweang, C. Lursinsap, and K. Sunat, "A discrimination analysis for unsupervised feature selection via optic diffraction principle," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 10, pp. 1587–1600, Oct. 2012.
[4] I. Cattinelli, G. Valentini, E. Paulesu, and N. A. Borghese, "A novel approach to the problem of non-uniqueness of the solution in hierarchical clustering," IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 7, pp. 1166–1173, Jul. 2013.
[5] M. W. Berry, Survey of Text Mining: Clustering, Classification, and Retrieval, vol. 1. New York, NY, USA: Springer-Verlag, 2004.
[6] R. Ilin, "Unsupervised learning of categorical data with competing models," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 11, pp. 1726–1737, Nov. 2012.
[7] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data. Upper Saddle River, NJ, USA: Prentice-Hall, 1988.
[8] P. S. Bradley and O. L. Mangasarian, "k-plane clustering," J. Global Optim., vol. 16, no. 1, pp. 23–32, 2000.
[9] Y.-H. Shao, L. Bai, Z. Wang, X.-Y. Hua, and N.-Y. Deng, "Proximal plane clustering via eigenvalues," Proc. Comput. Sci., vol. 17, pp. 41–47, May 2013.
[10] Jayadeva, R. Khemchandani, and S. Chandra, "Twin support vector machines for pattern classification," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 5, pp. 905–910, May 2007.
[11] Y.-H. Shao, C.-H. Zhang, X.-B. Wang, and N.-Y. Deng, "Improvements on twin support vector machines," IEEE Trans. Neural Netw., vol. 22, no. 6, pp. 962–968, Jun. 2011.
[12] L. Bai, Z. Wang, Y.-H. Shao, and N.-Y. Deng, "A novel feature selection method for twin support vector machine," Knowl.-Based Syst., vol. 59, pp. 1–8, Mar. 2014.
[13] R. Souvenir and R. Pless, "Manifold clustering," in Proc. 10th IEEE Int. Conf. Comput. Vis. (ICCV), vol. 1, Oct. 2005, pp. 648–653.
[14] W. Cao and R. Haralick, "Nonlinear manifold clustering by dimensionality," in Proc. 18th Int. Conf. Pattern Recognit. (ICPR), vol. 1, 2006, pp. 920–924.
[15] S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
[16] X. Huang, Y. Ye, and H. Zhang, "Extensions of k-means-type algorithms: A new clustering framework by integrating intracluster compactness and intercluster separation," IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 8, pp. 1433–1446, Aug. 2014.
[17] W. Zhen, C. Jin, and Q. Ming, "Non-parallel planes support vector machine for multi-class classification," in Proc. Int. Conf. Logistics Syst. Intell. Manage., vol. 1, Jan. 2010, pp. 581–585.
[18] A. L. Yuille and A. Rangarajan, "The concave-convex procedure (CCCP)," in Advances in Neural Information Processing Systems, vol. 2. Cambridge, MA, USA: MIT Press, 2002, pp. 1033–1040.
[19] P.-M. Cheung and J. T. Kwok, "A regularization framework for multiple-instance learning," in Proc. 23rd Int. Conf. Mach. Learn., 2006, pp. 193–200.
[20] C. Cortes and V. Vapnik, "Support-vector networks," Mach. Learn., vol. 20, no. 3, pp. 273–297, 1995.
[21] N. Deng, Y. Tian, and C. Zhang, Support Vector Machines: Optimization Based Theory, Algorithms, and Extensions. Philadelphia, PA, USA: CRC Press, 2012.
[22] O. L. Mangasarian and D. R. Musicant, "Successive overrelaxation for support vector machines," IEEE Trans. Neural Netw., vol. 10, no. 5, pp. 1032–1037, Sep. 1999.
[23] B. Schölkopf and A. J. Smola, Learning With Kernels. Cambridge, MA, USA: MIT Press, 2002.
[24] O. L. Mangasarian and E. W. Wild, "Multisurface proximal support vector machine classification via generalized eigenvalues," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 1, pp. 69–74, Jan. 2006.
[25] D. T. Larose, "k-nearest neighbor algorithm," in Discovering Knowledge in Data: An Introduction to Data Mining. Warwick, U.K.: Wiley, 2005, pp. 90–106.
[26] F. Hausdorff, Mengenlehre. Berlin, Germany: Walter de Gruyter, 1927.
[27] R. Fletcher, Practical Methods of Optimization. New York, NY, USA: Wiley, 1987.
[28] I. S. Dhillon, Y. Guan, and B. Kulis, "Kernel k-means: Spectral clustering and normalized cuts," in Proc. 10th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2004, pp. 551–556.
[29] D. Dembélé and P. Kastner, "Fuzzy C-means method for clustering microarray data," Bioinformatics, vol. 19, no. 8, pp. 973–980, 2003.
[30] F. Camastra and A. Verri, "A novel kernel method for clustering," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 5, pp. 801–805, May 2005.
[31] C. T. Zahn, "Graph-theoretical methods for detecting and describing gestalt clusters," IEEE Trans. Comput., vol. C-20, no. 1, pp. 68–86, Jan. 1971.
[32] C. Blake and C. Merz. (1998). UCI Repository of Machine Learning Databases. [Online]. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html
[33] The MathWorks, Inc. (1994–2010). MATLAB User's Guide. [Online]. Available: http://www.mathworks.com
[34] P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, 1st ed. Boston, MA, USA: Addison-Wesley, 2005.