
Proceedings of the 6th World Congress on Intelligent Control and Automation, June 21 - 23, 2006, Dalian, China

The Global Fuzzy C-Means Clustering Algorithm


Weina Wang, Yunjie Zhang, Yi Li and Xiaona Zhang
Department of Applied Mathematics, Dalian Maritime University, Dalian, Liaoning Province, China
wwn@newmail.dlmu.edu.cn
Abstract - The Fuzzy C-Means (FCM) algorithm is a clustering algorithm based on the optimization of an objective function. Because it is sensitive to initial conditions, the algorithm usually converges to a local minimum. To address this problem, we present the global Fuzzy C-Means clustering algorithm (GFCM), an incremental approach to clustering. It does not depend on any initial conditions, and better clustering results are obtained through a deterministic global search procedure. We also propose the fast global Fuzzy C-Means clustering algorithm (FGFCM) to improve the convergence speed of the global Fuzzy C-Means clustering algorithm. Experiments show that the global Fuzzy C-Means clustering algorithm gives more satisfactory results by removing the sensitivity to initial values and improving the accuracy of clustering, and that the fast global Fuzzy C-Means clustering algorithm improves the convergence speed of the global Fuzzy C-Means clustering algorithm without significantly affecting solution quality.

Index Terms - Clustering, FCM, Global optimization.

I. INTRODUCTION

Clustering is the process of grouping a data set so that the similarity between data within a cluster is maximized while the similarity between data of different clusters is minimized [1]. There are two main approaches to clustering: crisp clustering (or hard clustering) and fuzzy clustering. A characteristic of the crisp clustering method is that the boundary between clusters is fully defined. However, in many real-life cases, the boundaries between natural classes may overlap, so certain input patterns do not completely belong to a single class but partially belong to other classes as well. In such cases, fuzzy clustering provides a better and more useful way to classify these patterns. Many fuzzy clustering methods have been introduced [2]. The Fuzzy C-Means clustering algorithm is one of the most important and popular fuzzy clustering algorithms. At present, the FCM algorithm is used extensively in feature analysis, pattern recognition, image processing, classifier design, etc. (see [3][4]). However, the FCM clustering algorithm is sensitive to its initialization and easily falls into a local minimum or a saddle point when iterating. To solve this problem, several other techniques based on global optimization methods (e.g. genetic algorithms, simulated annealing) have been developed [5-7]. However, in many practical applications the clustering method used is FCM with multiple restarts, in order to escape the sensitivity to initial values [8].

In Ref. [9], building on the k-means algorithm, the authors proposed the global k-means clustering algorithm (GKM). Based on the assumption in Ref. [9], we propose the global Fuzzy C-Means clustering algorithm (GFCM), a global clustering algorithm for the minimization of the clustering error. The algorithm is an incremental approach to clustering: we obtain an optimal solution for the fuzzy C-partition through a series of local searches (runs of FCM). At each local search we let the optimal cluster centers for the fuzzy (C-1)-partition problem be the first (C-1) initial positions and an appropriate position within the data space be the remaining Cth initial position, then perform the FCM algorithm to obtain the optimal clustering centers for the fuzzy C-partition. Since for C = 1 the optimal solution is known, we can apply the above procedure iteratively to find optimal solutions for all fuzzy c-partition problems, c = 1, ..., C. The global FCM clustering algorithm does not depend on any initial conditions; it effectively removes the sensitivity to initial values and improves the accuracy of clustering. In the following section we describe the proposed global Fuzzy C-Means algorithm. To address the slow convergence of the global algorithm, we propose the fast global Fuzzy C-Means clustering algorithm (FGFCM) in Section III. Section IV provides experimental results and comparisons with FCM and the global k-means clustering algorithm on two synthetic data sets and three real survey data sets. Finally, we draw conclusions in Section V.

II. THE GLOBAL FUZZY C-MEANS CLUSTERING ALGORITHM

Consider a set of unlabeled data X = {x_1, ..., x_N}, where N is the number of data points. Its constrained fuzzy C-partition can be briefly described as follows. Let the membership of the ith vector (i = 1, ..., N) to the jth cluster (j = 1, 2, ..., C) be denoted u_ij. The membership values are constrained as

$$\forall i,\ \sum_{j=1}^{C} u_{ij} = 1; \qquad \forall i, j,\ u_{ij} \in [0, 1]; \qquad \forall j,\ \sum_{i=1}^{N} u_{ij} > 0.$$

The most widely used clustering criterion is the weighted within-group sum of squared errors as follows:
$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \| x_i - v_j \|^2 . \qquad (1)$$

where V = {v_1, v_2, ..., v_C} is the vector of cluster centers and m > 1 is the weighting exponent. FCM is a local search procedure with respect to this clustering criterion. Its performance depends heavily on the initial starting conditions, and it always converges to a local minimum.

1-4244-0332-4/06/$20.00 © 2006 IEEE

Authorized licensed use limited to: PUC MG. Downloaded on May 13, 2009 at 20:46 from IEEE Xplore. Restrictions apply.

In order to solve this problem we employ the FCM algorithm as a local search, scheduling FCM runs that differ only in the initial positions of the cluster centers. Based on the k-means algorithm, Ref. [9] proposed that if the k-1 centers are placed at the optimal positions for the (k-1)-clustering problem and the remaining kth center is placed at an appropriate position to be discovered, an optimal clustering solution with k clusters can be obtained through local search. Based on this assumption we propose the global Fuzzy C-Means clustering algorithm. Instead of randomly selecting initial values for all cluster centers, as is the case with most clustering algorithms, the proposed technique proceeds in an incremental way, attempting to optimally add one new cluster center at each stage. More specifically, we start with the fuzzy 1-partition and find its optimal position, which corresponds to the centroid of the data set X. For the fuzzy 2-partition problem, the first initial cluster center is placed at the optimal position for the fuzzy 1-partition, while the second initial center at execution n is placed at the position of the data point x_n (n = 1, ..., N). We then perform the FCM algorithm from each of these initial positions to obtain the best solution for the fuzzy 2-partition. In general, let (V_1(C), ..., V_C(C)) denote the final solution for the fuzzy C-partition. If we have found the solution of the fuzzy (C-1)-partition problem, we perform the FCM algorithm with C clusters from each of the initial states (V_1(C-1), ..., V_{C-1}(C-1), x_n) (n = 1, 2, ..., N), respectively. The best solution is obtained through this procedure. The main advantage of the algorithm is that it does not depend on any initial conditions and improves the accuracy of clustering.
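The incremental procedure just described can be sketched in code. The following minimal NumPy implementation is illustrative only (the function names, the tolerance, and the iteration cap are our own assumptions, not part of the paper): `fcm` performs the FCM local search for objective (1), and `global_fcm` performs the incremental GFCM search that tries every data point as the initial position of the newly added center.

```python
import numpy as np

def fcm(X, init_centers, m=2.0, tol=1e-6, max_iter=300):
    """FCM local search for objective (1) from the given initial centers.

    Alternates the optimal-membership update and the weighted-centroid
    update; returns the final centers and objective value.
    """
    V = np.asarray(init_centers, dtype=float).copy()
    prev_obj = np.inf
    for _ in range(max_iter):
        # Squared distances ||x_i - v_j||^2, shape (N, C).
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against a center sitting on a point
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
        U = 1.0 / ((d2[:, :, None] / d2[:, None, :]) ** (1.0 / (m - 1.0))).sum(axis=2)
        obj = ((U ** m) * d2).sum()  # objective (1) for the current (U, V)
        if prev_obj - obj < tol:
            break
        prev_obj = obj
        # Center update: v_j = sum_i u_ij^m x_i / sum_i u_ij^m.
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]
    return V, obj

def global_fcm(X, C, m=2.0):
    """Incremental GFCM sketch: grow from 1 to C clusters, running FCM
    once per data point as candidate position for the new center."""
    X = np.asarray(X, dtype=float)
    V = X.mean(axis=0, keepdims=True)  # fuzzy 1-partition optimum: the centroid
    for _ in range(2, C + 1):
        best_V, best_obj = None, np.inf
        for x_n in X:  # N candidate runs for the newly added center
            cand_V, cand_obj = fcm(X, np.vstack([V, x_n]), m=m)
            if cand_obj < best_obj:
                best_V, best_obj = cand_V, cand_obj
        V = best_V
    return V
```

On well-separated data this recovers the cluster centroids; the inner loop over all N candidates is what makes the search deterministic, at the cost of N FCM runs per added center.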
The algorithm is briefly summarized as follows:
Step 1: Set c = 1. Perform the FCM algorithm to find the optimal clustering center V(1) of the fuzzy 1-partition problem, and let obj_1 be its corresponding value of the objective function (1).
Step 2: Perform N runs of the FCM algorithm with c+1 clusters, where run n starts from the initial state (V_1(c), ..., V_c(c), x_n), and record the corresponding values of the objective function and the clustering centers.
Step 3: Find the minimal value of the objective function, obj_(c+1), and its corresponding clustering centers V(c+1) among the runs of Step 2. Let V(c+1) be the final clustering centers for the fuzzy (c+1)-partition.
Step 4: If c+1 = C, stop; otherwise set c = c+1 and go to Step 2.

III. THE FAST GLOBAL FUZZY C-MEANS CLUSTERING ALGORITHM

The global Fuzzy C-Means algorithm requires N executions of the FCM algorithm for each value of c (c = 1, 2, ..., C). To improve its convergence speed we propose the fast global FCM clustering algorithm. For each of the N initial states (V_1(c), ..., V_c(c), x_n) we do not execute FCM to obtain the final clustering error J_m. Instead, we directly compute the value of the objective function for every initial state, take the centers corresponding to the minimum objective value as the initial centers, and then execute the FCM algorithm once to obtain the solution with c+1 clusters. The steps of the fast global Fuzzy C-Means clustering algorithm are as follows:
Step 1: Set c = 1. Perform FCM to find the optimal clustering center V(1) of the fuzzy 1-partition and let obj_1 be the corresponding value of the objective function (1).
Step 2: Compute the value of the objective function for every initial state (V_1(c), ..., V_c(c), x_n) by using (2) [10].
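Step 2 relies on evaluating the clustering objective directly for fixed centers, using the closed-form expression (2) from Ref. [10] rather than running FCM. A sketch of that evaluation and of the candidate selection follows (NumPy; the function names are illustrative assumptions):

```python
import numpy as np

def reduced_objective(X, V, m=2.0):
    """Objective (2): the value of J_m when the memberships take their
    optimal values for the fixed centers V (closed form from Ref. [10])."""
    X = np.asarray(X, dtype=float)
    V = np.asarray(V, dtype=float)
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-12)  # (2) assumes ||x_i - v_c|| != 0
    # ||x_i - v_c||^(2/(1-m)) = (d^2)^(1/(1-m)); sum over c, raise to (1-m), sum over i.
    return ((d2 ** (1.0 / (1.0 - m))).sum(axis=1) ** (1.0 - m)).sum()

def best_initial_state(X, V_prev, m=2.0):
    """FGFCM Steps 2-3: score every candidate state (V_prev, x_n) with (2)
    and return the best one; no FCM run is needed for the scoring."""
    X = np.asarray(X, dtype=float)
    best_init, best_obj = None, np.inf
    for x_n in X:
        cand = np.vstack([V_prev, x_n])
        obj = reduced_objective(X, cand, m=m)
        if obj < best_obj:
            best_init, best_obj = cand, obj
    return best_init, best_obj
```

With a single center, (2) reduces to the plain sum of squared distances, which gives a quick sanity check on the implementation.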
$$J_m = \sum_{i=1}^{N} \left( \sum_{c=1}^{C} \| x_i - v_c \|^{\frac{2}{1-m}} \right)^{1-m}, \qquad \| x_i - v_c \| \neq 0 \ \ (i = 1, ..., N;\ c = 1, 2, ..., C). \qquad (2)$$
Step 3: Find the minimal value of the objective function, obj_(c+1), and the corresponding initial state V_0(c+1) from Step 2. Let V_0(c+1) be the initial clustering centers for the fuzzy (c+1)-partition.
Step 4: Perform the FCM algorithm with (c+1) clusters from the initial state V_0(c+1) and obtain the final clustering centers V(c+1) for the fuzzy (c+1)-partition.
Step 5: If c+1 = C, stop; otherwise set c = c+1 and go to Step 2.
Obviously, while the global Fuzzy C-Means clustering algorithm requires CN executions of the FCM algorithm, the fast global FCM clustering algorithm requires only C executions of the FCM algorithm; it therefore improves the convergence speed of the former. Experimental results (see the next section) suggest that the fast global FCM clustering algorithm leads to results almost as good as those provided by the global Fuzzy C-Means clustering algorithm.

IV. EXPERIMENTAL RESULTS

A. Comparison of the global Fuzzy C-Means algorithm with the FCM and the global k-means algorithm

To evaluate the sensitivity to initial values and the accuracy of the proposed algorithm, we conducted several experiments (two artificial data sets and three real survey data sets). The three algorithms (FCM, GKM and GFCM) are compared on five data sets: (i) a two-dimensional synthetic data set consisting of ten clusters of 30 points each, depicted in Fig. 1(a); (ii) a two-dimensional synthetic data set consisting of fifteen clusters of 20 points each, depicted in Fig. 1(b); (iii) 63 data points selected from the vowel data set, which form nine clusters and are 10-dimensional; (iv) 600 data points selected from the satimage data set, which form six clusters and are 36-dimensional; (v) 70 data points selected from the function-finding data set, which form fourteen clusters and are two-dimensional. For each of the above data sets we executed each of the three algorithms 10 times and summarize the results in Table I. The experimental results suggest that the global Fuzzy C-Means algorithm and the global k-means algorithm are not sensitive to initial values, that their clustering errors and clustering accuracies are stable, and that the experimental results of the global Fuzzy C-Means algorithm are better than those of the global k-means algorithm and FCM.
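The paper reports an "accuracy of clustering" without stating its exact definition; one common convention, which we assume here purely for illustration, hardens the memberships by argmax and then scores the best one-to-one matching of clusters to true classes:

```python
import numpy as np
from itertools import permutations

def clustering_accuracy(true_labels, memberships):
    """Percent of points correctly assigned under the best one-to-one
    mapping of clusters to classes (brute force; fine for small C).

    `memberships` is the (N, C) fuzzy membership matrix; each point is
    hardened to the cluster of maximal membership first.
    """
    true_labels = np.asarray(true_labels)
    hard = np.argmax(np.asarray(memberships, dtype=float), axis=1)
    classes = np.unique(true_labels)
    clusters = np.unique(hard)
    best = 0
    # Try every assignment of distinct class labels to the clusters.
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        mapped = np.array([mapping[c] for c in hard])
        best = max(best, int((mapped == true_labels).sum()))
    return 100.0 * best / len(true_labels)
```

The brute-force matching is exponential in the number of clusters, so for the 14- and 15-cluster data sets above an assignment solver (e.g. the Hungarian algorithm) would be used in practice.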

Fig. 1 Artificial data sets

TABLE I
COMPARISON OF MULTISTART FCM, GKM, GFCM WITH RANDOM INITIAL SOLUTIONS
(For every data set, GKM and GFCM returned the same result on all 10 runs; FCM varies from run to run, so its 10 per-run values are listed.)

Clustering error
DataSet 1 - FCM: 32.106181, 29.503628, 32.106193, 29.503633, 29.503633, 29.503637, 29.503636, 32.148509, 29.503631, 29.503635; GKM: 47.659184; GFCM: 29.503621
DataSet 2 - FCM: 227.194147, 45.384140, 123.529549, 45.384140, 220.335306, 119.692422, 120.113197, 229.870109, 123.190366, 138.630208; GKM: 47.453532; GFCM: 45.384140
DataSet 3 - FCM: 14.631940, 14.855134, 14.631941, 14.369131, 14.840373, 14.369130, 14.542548, 14.369128, 16.373470, 14.840379; GKM: 36.448849; GFCM: 14.369126
DataSet 4 - FCM: 740245.533837, 740245.533837, 740245.533837, 740245.533837, 702252.724092, 702252.724092, 702252.724092, 740245.533837, 702252.724092, 740245.533837; GKM: 1344465.825141; GFCM: 702252.724088
DataSet 5 - FCM: 840.288461, 957.564797, 767.348288, 778.993361, 1050.141482, 710.379174, 986.023279, 981.365731, 845.515913, 1058.682813; GKM: 776.223363; GFCM: 565.690058

Accuracy of clustering (%)
DataSet 1 - FCM: 89.33, 100, 90.00, 100, 100, 100, 100, 84.67, 100, 100; GKM: 99.67; GFCM: 100
DataSet 2 - FCM: 97.00, 100, 98.00, 100, 98.00, 97.67, 98.00, 97.67, 97.00, 96.67; GKM: 100; GFCM: 100
DataSet 3 - FCM: 80.95, 79.37, 80.95, 90.48, 79.37, 90.48, 80.95, 90.48, 77.78, 79.37; GKM: 90.48; GFCM: 90.48
DataSet 4 - FCM: 70.50, 70.50, 70.50, 70.50, 77.17, 77.17, 77.17, 70.50, 77.17, 70.50; GKM: 66.17; GFCM: 77.17
DataSet 5 - FCM: 80.00, 77.14, 80.00, 78.57, 70.00, 78.57, 71.43, 72.86, 78.57, 70.00; GKM: 88.57; GFCM: 98.57

B. Comparison of the global Fuzzy C-Means algorithm with the fast global Fuzzy C-Means algorithm

For the fast global Fuzzy C-Means algorithm it is very encouraging that, although it executes significantly faster, it provides solutions of excellent quality, comparable to those obtained by the original method. It therefore constitutes a very efficient algorithm in terms of both solution quality and convergence speed. Below we provide experimental results comparing the fast global Fuzzy C-Means algorithm to the global Fuzzy C-Means algorithm with multiple restarts, summarized in Table II.

TABLE II
COMPARISON OF MULTISTART GFCM AND FGFCM WITH RANDOM INITIAL SOLUTIONS

Data | Clustering error (GFCM) | Clustering error (FGFCM) | Accuracy % (GFCM) | Accuracy % (FGFCM)
DataSet 1 | 29.503621 | 29.503632 | 100 | 100
DataSet 2 | 45.384140 | 45.384140 | 100 | 100
DataSet 3 | 14.369126 | 14.369127 | 90.48 | 90.48
DataSet 4 | 702252.724088 | 702252.724091 | 77.17 | 77.17
DataSet 5 | 565.690058 | 593.113528 | 100 | 88.57

V. CONCLUSIONS

Fuzzy C-Means is a clustering algorithm based on the optimization of an objective function; because it is sensitive to initial conditions, it usually converges to a local minimum. To address this problem, we presented a global optimization algorithm that does not depend on any initial conditions and uses only FCM as a local search. Instead of randomly selecting initial values, the proposed technique proceeds in an incremental way, attempting to optimally add one new cluster center at each stage. It therefore effectively removes the sensitivity to initial values and improves the accuracy of clustering, comparing favorably to the FCM algorithm with multiple random restarts. To address the slow convergence of the global algorithm, we proposed the fast global Fuzzy C-Means clustering algorithm, which significantly improves the convergence speed of the global Fuzzy C-Means clustering algorithm while at the same time providing solutions of almost the same quality. In addition, by proceeding incrementally, the global Fuzzy C-Means algorithm solves not only the fuzzy C-partition problem but also all intermediate fuzzy c-partition problems for c = 1, ..., C. This may prove useful in applications where appropriate criteria are employed for selecting the most suitable value of c [11].

ACKNOWLEDGMENT

The authors would like to thank the anonymous referees whose comments and suggestions have helped greatly in improving this paper. This work is partially supported by the National Natural Science Fund of China under Grant 60573072 and the Science Research Program of the Education Department of Liaoning Province of China (Program No. 2005037).

REFERENCES
[1] T. Kwok, R. Smith, S. Lozano, and D. Taniar, "Parallel fuzzy c-means clustering for large data sets," in B. Monien and R. Feldmann, editors, Euro-Par 2002, volume 2400 of LNCS, pp. 365-374, 2002.
[2] F. Hoppner, F. Klawonn, R. Kruse, and T. Runkler, Fuzzy Cluster Analysis, Wiley, New York, 1999.
[3] M. C. Clark, L. O. Hall, "MRI segmentation using fuzzy clustering techniques: integrating knowledge," http://www.csee.usf.edu/, 1995.
[4] Y. W. Lim, S. U. Lee, "On the color image segmentation algorithm based on the thresholding and the fuzzy c-means techniques," Pattern Recognition, 23(9): pp. 935-951, 1990.
[5] Yunying Dong, Yunjie Zhang, Chunling Chang, "Multistage random sampling genetic-algorithm-based fuzzy c-means clustering algorithm," Proceedings of the Third International Conference on Machine Learning and Cybernetics, Shanghai, 26-29 August 2004, pp. 2069-2073.
[6] Yunying Dong, Yunjie Zhang, Chunling Chang, "An improved hybrid cluster algorithm," Fuzzy Systems and Mathematics, Vol. 19, No. 2, pp. 128-133.
[7] J. Richardt, F. Karl, C. Muller, "Connections between fuzzy theory, simulated annealing, and convex duality," Fuzzy Sets and Systems, 1998(96): 307-334.
[8] Chen Jinshan, Wei Gang, "A hybrid clustering algorithm incorporating Fuzzy C-Means into a canonical genetic algorithm," Journal of Electronics and Information Technology, Vol. 24, No. 2, pp. 210-215.
[9] A. Likas, N. Vlassis, J. J. Verbeek, "The global k-means clustering algorithm," Pattern Recognition, 2003(36): 451-461.
[10] N. Belacel, P. Hansen, N. Mladenovic, "Fuzzy J-Means: a new heuristic for fuzzy clustering," Pattern Recognition, 35(2002): 2193-2200.
[11] Dae-Won Kim, Kwang H. Lee, Doheon Lee, "On cluster validity index for estimation of the optimal number of fuzzy clusters," Pattern Recognition, 2004(37): 2009-2024.

