
European J. Industrial Engineering, Vol. 6, No. 1, 2012
Copyright 2012 Inderscience Enterprises Ltd.











A hybrid algorithm for fuzzy clustering
Z.H. Che
Department of Industrial Engineering and Management,
National Taipei University of Technology,
1, Sec. 3, Chung-Hsiao E. Rd., Taipei 10608, Taiwan
E-mail: zhche@ntut.edu.tw
Abstract: The fuzzy C-means (FCM) algorithm is a commonly used fuzzy
clustering method which conducts data clustering from randomly selected initial
centroids. With larger data sizes or attribute dimensions, clustering results may
be affected and more repetitive computations are required. To compensate for the
effect of random initial centroids on results, this study proposes a hybrid
algorithm, the immune genetic annealing fuzzy C-means algorithm (IGAFA).
This algorithm obtains proper initial cluster centroids to improve clustering
efficiency. It is tested on three data sets (Haberman's survival, iris, and
liver disorders), and its results are compared with those of the
genetic fuzzy C-means algorithm (GFA), the immune fuzzy C-means algorithm
(IFA), and the annealing fuzzy C-means algorithm (AFA). The results suggest that
IGAFA could achieve better clustering results.
[Received: November 18, 2009; Accepted: July 19, 2010]
Keywords: fuzzy clustering; fuzzy C-means; FCM; clustering efficiency.
Reference to this paper should be made as follows: Che, Z.H. (2012) 'A hybrid
algorithm for fuzzy clustering', European J. Industrial Engineering, Vol. 6,
No. 1, pp.50–67.
Biographical notes: Z.H. Che is an Associate Professor at the Department of
Industrial Engineering and Management, National Taipei University of
Technology, ROC. He received his PhD in Industrial Engineering and
Management at National Chiao Tung University, ROC. His current
research interests include production/operations management, supply chain
management, information management, and applications of meta-algorithms.

1 Introduction
Fuzzy C-means (FCM) is a fuzzy clustering method based on objective function
optimisation, which realises fuzzy clustering of data sets through repetitive computation
in unsupervised classification. Mishra et al. (2001) suggested that more data require more
computation time, and that sampling is often used to reduce the database and improve
clustering efficiency. Lin et al. (2006) and Liu et al. (2007) pointed out that
FCM is a local search technique that searches by hill-climbing; the randomly
selected initial cluster centroids may affect the clustering result, and the search is likely
to be trapped in a local optimum during the clustering optimisation process. To resolve
this problem, different algorithms are used to first search for the cluster centroids closest
to the global optimum and then conduct the FCM steps, so as to avoid being trapped in a
local optimum.









Zhao et al. (1996), Sha and Che (2006), and Damodaran et al. (2009) indicated that the
genetic algorithm (GA), which is based on inheritance and a natural selection mechanism, is
widely applied for various optimisation purposes, because the data space is searched from
multiple points and the mutation rule is random, which reduces the probability of
being trapped at a false peak. Peng et al. (2006) pointed out that traditional FCM often
converges to a local optimum rather than a global optimum; when GA is used to improve
colour pattern recognition, the local search capability can be enhanced, despite the
slower execution speed of GA. Nie et al. (2007) argued that many studies considered only
the speed of FCM and neglected its accuracy; thus, they identified better cluster
centroids using the global search capability of GA, then converged to the final
clustering results using the local search capability of FCM and applied them to
medical image segmentation.
Bandyopadhyay (2005) suggested applying a simulated annealing algorithm (SA) with
reversible jump Markov chain Monte Carlo to fuzzy clustering. To search the space faster
and provide a better convergence value for FCM, new solutions are produced
probabilistically during the SA process, so that artificial and real-life data sets can be
classified efficiently. Wu et al. (2007) applied SA to the clustering
of incomplete data, and the results showed reduced clustering errors. De Castro and
Von Zuben (2002) suggested that the clonal selection principle is an immune
optimisation method which identifies and eliminates identical solutions in order to retain
the better ones.
Following immune algorithm (IA) principles, Liu et al. (2004) developed new
algorithms and applied them to fuzzy clustering, retaining local optima as antibodies and
then conducting a global search. Their comparison with K-means, K-medoid, and FCM
indicated that the new algorithms give better results. Zhang et al. (2007) pointed out that,
when IA is applied to clustering, initial centroids are searched and converge quickly to a
global optimum, thus reducing the effect of initial centroids on the results. Liu et al. (2004)
indicated that FCM is easily trapped in a local solution and that the clustering result is very
sensitive to the initial centroids; thus, IA could be used to determine initial centroids
before fuzzy clustering, thereby reducing the number of iterations FCM needs to converge
to the optimum.
This study has two major purposes:
1 combining GA, IA, and SA with the steps of FCM to develop an immune
genetic annealing fuzzy C-means algorithm (IGAFA), and applying IGAFA to
fuzzy clustering analysis
2 comparing the solving performance of IGAFA, the genetic fuzzy C-means algorithm
(GFA), the immune fuzzy C-means algorithm (IFA), and the annealing fuzzy C-means
algorithm (AFA) to verify that IGAFA is capable of solving the fuzzy
clustering problems defined in this study.
Three known data sets, Haberman's survival, iris, and liver disorders, are used for
validation together with the Xie-Beni cluster validation index (XB).
The remainder of this paper is organised as follows: Section 2 reviews the literature
on research into and applications of fuzzy cluster analysis and algorithms; Section 3
introduces the methodology, research framework, and processes of the hybrid algorithm;
Section 4 presents the experimental results and compares XB indices, convergence times, and
computational times to analyse the advantages and disadvantages of the algorithms; the
conclusions and suggestions are given in Section 5.
2 Literature review
2.1 Fuzzy clustering analysis
The partitional clustering algorithm is often used in cluster research because its
computational complexity is smaller than that of the hierarchical clustering algorithm.
Among partitional clustering algorithms, K-means (Lloyd, 1957), FCM (Bezdek, 1981), and
possibilistic C-means (Krishnapuram and Keller, 1996) are the most frequently used. This
type of method first obtains i random cluster centroids, assigns each pattern to a
cluster according to the distance between the pattern and the cluster centroids, then
obtains new cluster centroids and modifies the clustering results using the new
centroids. These steps are repeated until the set stop conditions
are met, and the final clustering result is then obtained.
Ruspini (1970) combined Zadeh's fuzzy sets theory with cluster analysis to cluster
uncertain data using grades of membership, an approach named fuzzy clustering. When
the data carry clear information, traditional cluster analysis can obtain better clustering
results. However, for special data, such as continuous count data or
data carrying uncertain information, fuzzy clustering may lead to better results. Bezdek and
Dunn (1975) first applied fuzzy theory to cluster analysis and developed the FCM
clustering algorithm. To resolve the fuzzy C-segmentation problem, Bezdek (1981)
derived the minimisation algorithm from the distance between sample points and cluster
centroids.
Kaymak and Setnes (2002) pointed out that the XB index proposed by Xie and Beni
(1991) is a commonly used index, and XB is used to calculate the separation and
compactness among clusters and validate the clustering result of FCM. The lower
compactness and higher separation indicate satisfactory clustering results, thus, the
minimised XB index represents better clustering. The XB index is calculated as follows:
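$$XB = \frac{\sum_{j=1}^{J} \sum_{i=1}^{I} u_{ij}^{m} \lVert x_j - v_i \rVert^{2}}{K \min_{i \neq k} \lVert v_i - v_k \rVert^{2}} \tag{1}$$
where
$$u_{ij} = \left[ \sum_{k=1}^{K} \left( \frac{\lVert x_j - v_i \rVert}{\lVert x_j - v_k \rVert} \right)^{\frac{2}{m-1}} \right]^{-1} \tag{2}$$
$$v_i = \frac{\sum_{j=1}^{J} u_{ij}^{m} x_j}{\sum_{j=1}^{J} u_{ij}^{m}} \tag{3}$$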









j: index of data point, j = 1, 2, 3, ..., J
$v_i$ and $v_k$: cluster centroids, i = 1, 2, 3, ..., I; k = 1, 2, 3, ..., K
$x_j$: jth pattern
$u_{ij}^{m}$: degree of membership of the jth pattern versus the ith cluster centroid with fuzziness
degree m.
As described above, clustering methods can be applied to various fields, depending on
the clustering objectives. In this paper, fuzzy clustering membership and algorithms
with search capability for optimisation problems are employed to obtain the initial
cluster centroids, thereby providing better fuzzy clustering efficiency.
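As an illustration of equations (1) to (3), the following minimal Python sketch computes the memberships, the updated centroids, and the XB index for a small data set. The function names (`memberships`, `update_centroids`, `xb_index`), the default fuzziness degree m = 2, and the handling of a pattern that coincides with a centroid are assumptions made for this example and are not taken from the original study. The normalising factor follows K (the number of centroids) as written in equation (1); some statements of the Xie-Beni index divide by the number of patterns J instead.

```python
import math


def sq_dist(x, v):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def memberships(data, centroids, m=2.0):
    """Equation (2): u[i][j] is the membership of pattern j in cluster i."""
    u = [[0.0] * len(data) for _ in centroids]
    for j, x in enumerate(data):
        d = [math.sqrt(sq_dist(x, v)) for v in centroids]
        hits = [i for i, dk in enumerate(d) if dk == 0.0]
        if hits:
            # Assumption: a pattern lying exactly on one or more centroids
            # shares its full membership equally among them.
            for i in hits:
                u[i][j] = 1.0 / len(hits)
            continue
        for i, di in enumerate(d):
            u[i][j] = 1.0 / sum((di / dk) ** (2.0 / (m - 1.0)) for dk in d)
    return u


def update_centroids(data, u, m=2.0):
    """Equation (3): each centroid is the u^m-weighted mean of the patterns."""
    dim = len(data[0])
    new_centroids = []
    for ui in u:
        w = [uij ** m for uij in ui]
        total = sum(w)
        new_centroids.append(
            [sum(wj * x[k] for wj, x in zip(w, data)) / total for k in range(dim)]
        )
    return new_centroids


def xb_index(data, centroids, u, m=2.0):
    """Equation (1): compactness divided by K times the minimum separation."""
    compactness = sum(
        u[i][j] ** m * sq_dist(x, v)
        for i, v in enumerate(centroids)
        for j, x in enumerate(data)
    )
    k_clusters = len(centroids)
    separation = min(
        sq_dist(centroids[i], centroids[k])
        for i in range(k_clusters)
        for k in range(k_clusters)
        if i != k
    )
    return compactness / (k_clusters * separation)
```

Alternating `memberships` and `update_centroids` until the index changes by less than a tolerance gives the basic FCM loop; a lower returned value indicates more compact and better separated clusters.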
2.2 The concept of GA and its application for FCM
The GA was developed by Holland (1975) and is based on the concept of survival of the
fittest in natural selection. GA uses a multi-point search rather than
the traditional single-point search, since a single-point search is easily trapped in a local
optimum, especially for multimodal functions. For the basic procedure of GA and its
applications, see Hou et al. (1994), Kumar and Shanker (2000), Sun et al.
(2002), Baker and Ayechew (2003), Che and Wang (2008), Wang and Che (2008), Chan
et al. (2009), and Che (2010a, 2010b).
Wei and Fahn (1996) argued that, based on the objective function in FCM, the
intended objective function could be established as a combinatorial
optimisation problem. Many iterative algorithms have been proposed, among which
GA has more opportunities to obtain better solutions for such combinatorial optimisation
problems. Given their random and comprehensive characteristics, the search methods of GA
can efficiently solve real-life data clustering problems of a certain size. Zhao et al.
(1996) suggested that fuzzy clustering can give accurate results for the imprecise
structures of real-life data, and that GA is widely applied to various optimisation
problems through its multi-point search and natural selection mechanism. They applied GA
and FCM to manufacturing unit planning with clustering technology, where the combination
had more opportunities to reach the optimum clustering, showing that GA is an efficient
method for real-life data clustering. Lin et al. (2006) proposed a GA-based FCM that
overcomes the main shortcoming of FCM (being easily trapped in a local optimum) by using
GA's random global search. They applied it to failure diagnosis of a satellite ADS and
obtained a better clustering result than FCM.
To sum up, GA, with its global optimum search capability, is widely applied to various
issues. Wei and Fahn (1996) suggested that GA has more opportunities to reach
optimum solutions, although the solving time could be prolonged.
2.3 The concept of IA and its application for FCM
Dasgupta and Okine (1997) suggested that the immune system is a complex
self-adjusting system which can efficiently resist the intrusion of external bacteria or
viruses. Its main function is to distinguish self from non-self units and to activate the
corresponding defence mechanism against external attack. Liu et al. (2004) indicated that
an artificial IA can identify unique antigens through antibodies in an immune reaction,
treating its objectives as antigens and its solutions as antibodies; antibodies
must be computed for unknown antigens in order to determine the optimum ones.
De Castro and Von Zuben (2002) and Liu et al. (2007) discussed the concept and
applications of the immune system.
De Castro and Von Zuben (2001) indicated that the immune system could maintain
local optima while searching for a global optimum, and that it showed better performance
when applied to hierarchical clustering and data analysis. Xu et al. (2006) proposed a
supervised clustering algorithm based on the clonal selection principle to identify better
clusters. Meng et al. (2007) proposed a fuzzy IA to conduct efficient data clustering and
improve the stability of a network. Liu et al. (2007) used an IA to search for initial
solutions and then obtain better clustering results in the fuzzy clustering problem; the
quicker convergence speed of the new method was validated experimentally.
2.4 The concept of SA and its application for FCM
SA was first developed by Metropolis et al. (1953). Turgut et al. (2003) indicated
that, after Kirkpatrick et al. (1983) applied it to combinatorial optimisation, it
became widely applied to optimisation problems. Its principle is that, when a solid is
heated to a certain temperature, its molecular structure breaks down and turns into a
liquid structure; the cooling-down process is then controlled in
such a way that, while the liquid changes back to a solid structure, the molecules are
rearranged into the expected stable states. Seckiner and Kurt (2007) suggested that SA
could randomly disturb existing solutions to generate new ones within the solving space
and, owing to the temperature-dependent probability model within the algorithm,
temporarily accept an inferior solution that could later be replaced by any improved one.
Thus, it has been successfully applied to many related management cases. Abramson (1991)
constructed school timetables with sequential and parallel SA algorithms. Brusco and Jacobs
(1993) applied SA to flexible labour scheduling problems. Loukil et al. (2007),
Branke et al. (2008), and Yang et al. (2005) applied SA to solve dynamic facility layout
problems. Che and Wang (2010) proposed a hybrid approach based on K-means, SA,
convergence factor particle swarm optimisation, and the Taguchi method to solve
supplier cluster analysis problems.
SA steps are used in FCM to search for optimum clusters and optimise parameters,
and this has been shown to improve clustering accuracy and speed. Bandyopadhyay (2005)
applied SA with reversible jump Markov chain Monte Carlo to fuzzy clustering and
demonstrated, through experiments with artificial and real-life data sets, that the
clustering results are better than those of the basic FCM algorithm.
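The acceptance behaviour of SA can be sketched in a few lines of Python. This is a generic SA loop for minimising a one-dimensional cost function, not the procedure of any of the cited studies; the function name, the parameter defaults, the uniform disturbance, and the geometric cooling schedule are all assumptions chosen for illustration.

```python
import math
import random


def simulated_annealing(cost, x0, t0=10.0, t_final=1e-3, cooling=0.9,
                        chain_length=20, step=1.0, seed=0):
    """Minimise cost(x) over real x, starting from x0 (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    current, current_cost = x0, cost(x0)
    best, best_cost = current, current_cost
    temperature = t0
    while temperature > t_final:
        for _ in range(chain_length):
            # Randomly disturb the current solution to obtain a neighbour.
            candidate = current + rng.uniform(-step, step)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Improvements are always accepted; an inferior neighbour is
            # accepted with probability exp(-delta / temperature).
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        temperature *= cooling  # geometric cooling
    return best, best_cost
```

At high temperatures inferior neighbours are accepted often, which lets the search leave a local optimum; as the temperature falls the loop behaves increasingly like plain hill-climbing.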
2.5 Summary
The randomly selected initial values of FCM may affect clustering results, and FCM has
poor global search capability and slow search speed. GA uses a multi-point search
mechanism to avoid local optima, but it re-finds the same solutions during the
search, which consumes a great deal of time in fuzzy clustering. IA
can remove solutions with greater similarities to maintain diversity, but it is easily
trapped in a local optimum by its single-point search mechanism in the fuzzy clustering
process. SA can randomly disturb existing solutions to generate new ones within the
neighbourhood of the solving space, but it does not easily find better solutions that are
distant from the existing ones. Hence, this study combined the global search
capability of GA, the diversity mechanism of IA, and the local search capability
of SA with the clustering steps of FCM to develop a hybrid algorithm
(IGAFA).
3 Methodology
3.1 Immune genetic annealing fuzzy C-means algorithm
The IGAFA proposed in this paper combines the global search capability of GA, the
neighbourhood search capability of SA, the better-solution retention characteristics of
IA, and FCM; the procedure is shown in Figure 1.
Figure 1 The procedure of IGAFA (see online version for colours)
[Flowchart of Steps 1 to 20: initial population; fitness calculation; affinity calculation and selection to the memory section; reproduction, crossover, and mutation; chromosome perturbation, acceptance, and replacement with temperature decrease; update of the memory section; initial centroids; calculation of matrix U, new centroids (V), and objective function (J) until the variation of J < 0.0001; optimal clusters.]










Step 1 Set the parameters: population (P), generation (G), mutation rate (MR),
crossover rate (CR), temperature (T), Markov chain length (M), cooling rate (α), and
final temperature ($T_f$). Produce the initial population according to the cluster number c
and data dimension d; the chromosome of length L = c × d represents the
coordinates of the cluster centroids, as shown in Figure 2.
Step 2 Chromosomes produce cluster centroids from the data sets in the database;
calculate the fitness functional values f(n) through the Euclidean distance
equation:
$$d(x_j, v_i) = \sqrt{(x_{j1} - v_{i1})^2 + (x_{j2} - v_{i2})^2 + \cdots + (x_{jN} - v_{iN})^2}.$$
Step 3 Compare the chromosomes by their fitness functional values and remove those
with greater similarities to maintain diversity.
Figure 2 Gene chromosome encoding (see online version for colours)
[Figure: a chromosome of length L = c × d listing the X, Y, Z coordinates of the centroids of clusters 1, 2, ..., c.]
Figure 3 Calculation of fitness functional values with gene sequencing (see online version for
colours)
[Figure: gene sequences 1, 2, ..., N, each encoding the centroids of clusters 1 to c with length L = c × d, together with their fitness functional values f(n = 1), f(n = 2), ..., f(n = N).]










Step 4 Conduct clonal selection, namely, compare the chromosome fitness. If the
chromosome fitness is improved, update the memory antibody sets; otherwise
conduct Step 5.
Step 5 Calculate the selection probability through the roulette method (Goldberg, 1989),
in which a higher fitness means a greater probability of reproduction. As the target is
minimisation, take the reciprocal 1/f(n) of the fitness
functional value of each gene sequence and calculate its share of the
total,
$$p_n = \frac{1/f(n)}{\sum_{n=1}^{N} 1/f(n)},$$
such that smaller fitness functional
values have a greater probability of being selected.
Step 6 For the chromosomes not selected, calculate the crossover number
CN = CR × P from the set crossover rate CR and population P, and randomly
select CN chromosomes from the offspring chromosomes to conduct a
single-point crossover (Figure 4).
Figure 4 Single-point crossover procedures (see online version for colours)
[Figure: two parent chromosomes C11 ... C1L and C21 ... C2L exchange the gene segments on one side of a cut point.]
Figure 5 Mutation methods (see online version for colours)
[Figure: at the mutation point, gene C19 of the chromosome C11 C12 ... C19 ... C1L is replaced by a stochastic value, giving C19'.]











Step 7 Calculate the mutation number MN = MR × L × P according to the set MR,
chromosome length (L), and P; randomly select MN genes, together with the
chromosomes containing them, for single-point mutation, and replace each selected
gene with a stochastic value, as shown in Figure 5.
Step 8 After gene mutation, calculate the fitness functional values $f_m(n)$ of the
chromosomes by the Euclidean distance equation, conduct a disturbance to
generate a neighbourhood under the current temperature (T), and then calculate the
fitness functional values of the neighbourhood, again using the Euclidean distance
equation.
Step 9 Calculate the probability function
$$P(n) = \begin{cases} 1, & \text{if } \Delta f(n) \ge 0 \\ \exp\left(\dfrac{\Delta f(n)}{T}\right), & \text{if } \Delta f(n) < 0 \end{cases}$$
according to the fitness function difference $\Delta f(n) = f_s(n) - f_m(n)$, and then
randomly produce a number r between 0 and 1 to compare with P(n); if r ≤ P(n), conduct
Step 10; if r > P(n), return to Step 8 to regenerate the neighbourhood until the
set number of searches is reached.
Step 10 The neighbourhood generated by the disturbance replaces the mutated
chromosome and its fitness functional values.
Step 11 With the set cooling rate α, cool the temperature through the cooling mechanism
T = α × T.
Step 12 Judge whether the loop is finished using the set final temperature $T_f$. If $T \le T_f$,
conduct Step 13; if $T > T_f$, repeat Steps 8–11 until the condition $T \le T_f$ is met.
Step 13 Calculate the fitness functional values f(n) of the new chromosomes generated by
simulated annealing, using the Euclidean distance equation.
Step 14 If f(n) > $f_m(n)$, the newly generated solution replaces the population;
otherwise, maintain the original population as the next-generation population.
Repeat Steps 2–13 until the set number of generations is reached.
Step 15 According to the computational results, take the optimum coordinates as the
optimum initial centroids of FCM.
Step 16 Calculate the membership of the various points with respect to the cluster
centroids using the membership equation.
Step 17 Calculate XB, where $XB_h$ represents the XB index of each clustering result.
Step 18 Calculate $\Delta XB(n) = XB_h(n) - XB_{h-1}(n)$; if $\Delta XB(n) \ge 0.0001$, proceed with
Step 19; otherwise conduct Step 20.
Step 19 Obtain new centroids $v_i$, and return to Step 16.
Step 20 Complete the computational procedures to obtain the optimum clustering result.
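The chromosome encoding of Step 1 and the genetic operators of Steps 5 to 7 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation used in the experiments: chromosomes are plain lists, all function names are hypothetical, the cut point and mutation position are passed in explicitly, and the stochastic replacement value is drawn uniformly from a range supplied by the caller.

```python
import random


def decode(chromosome, c, d):
    """Step 1: split a chromosome of length L = c * d into c centroids."""
    return [chromosome[i * d:(i + 1) * d] for i in range(c)]


def roulette_probabilities(fitness):
    """Step 5: selection probability proportional to the reciprocal 1 / f(n)."""
    inverse = [1.0 / f for f in fitness]
    total = sum(inverse)
    return [value / total for value in inverse]


def roulette_select(population, fitness, rng):
    """Step 5: draw one chromosome; smaller f(n) is more likely to be drawn."""
    r = rng.random()
    cumulative = 0.0
    for chromosome, p in zip(population, roulette_probabilities(fitness)):
        cumulative += p
        if r <= cumulative:
            return chromosome
    return population[-1]  # guard against floating-point rounding


def single_point_crossover(parent_a, parent_b, cut):
    """Step 6: exchange the gene segments after the cut point."""
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


def mutate(chromosome, position, low, high, rng):
    """Step 7: replace one gene with a stochastic value in [low, high]."""
    child = list(chromosome)
    child[position] = rng.uniform(low, high)
    return child
```

For instance, with c = 2 clusters in d = 3 dimensions, `decode([1, 2, 3, 4, 5, 6], 2, 3)` yields the two centroids `[1, 2, 3]` and `[4, 5, 6]`. Passing a seeded `random.Random` instance as `rng` keeps repeated runs comparable.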
3.2 Procedure for evaluation of four methods
The procedure for evaluating the four methods is shown in Figure 6.









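Figure 6 The process for evaluating four methods (see online version for colours)
[Figure: the data sets (Haberman's survival, iris, liver disorders) are clustered by GFA (GA + FCM), IFA (IA + FCM), AFA (SA + FCM), and IGAFA (IGSA + FCM); the results are compared through the validity index XB and ANOVA, leading to the conclusions.]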

4 Experimental results
4.1 Data
This study conducted experiments on three data sets supplied by the UCI machine
learning repository website (Asuncion and Newman, 2007), and analysed the clustering
results of the various algorithms on these data sets:
1 Haberman's survival data: the data set contains 306 patient records, including
the age of the patient at the time of operation, the year of operation, and the number of
positive axillary nodes detected; the cases are categorised into two types depending on
survival status: survival over five years, and death within five years. The data
distribution is shown in Figure 7.
2 Iris data: the data set contains 150 records with four attributes of the iris
plant: sepal length, sepal width, petal length, and petal width (measured in cm);
the cases are categorised into three types: setosa, versicolor, and
virginica, each with 50 records. The distribution map is shown in
Figure 8.
3 Liver disorder data: the data set contains 345 liver patient records, including five blood
test results (mcv: mean corpuscular volume; alkphos: alkaline phosphatase; sgpt: alanine
aminotransferase; sgot: aspartate aminotransferase; gammagt: gamma-glutamyl
transpeptidase) and the daily alcohol intake (drinks: number of half-pint
equivalents of alcoholic beverages drunk per day); the cases are categorised into two
types. The distribution map is shown in Figure 9.
Figure 7 Three-dimensional diagram of Haberman's survival

Figure 8 Scatter diagram of iris










Figure 9 Scatter diagram of liver disorder

4.2 Comparison of methods
The real optimum cluster numbers of the data sets are listed in Table 1. With these
cluster numbers, each data set was clustered separately by
GFA, IFA, AFA, and IGAFA. Each algorithm was run 30 times, and the
solving performance was compared with respect to the XB index, convergence
times, and computational times. This study tested the significance of the differences
among algorithms using ANOVA, and judged the relationships among algorithms using
Scheffé's multiple comparison test (Peumans et al., 2007).
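For readers who wish to reproduce this kind of comparison, the sketch below computes a one-way ANOVA F-value and one common form of Scheffé's pairwise statistic in plain Python. The function names and the toy samples are illustrative assumptions; they are not the experimental data, and the exact variant of Scheffé's test applied in this study is not specified here.

```python
def one_way_anova(groups):
    """Return (F-value, within-group mean square, group means)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mu - grand_mean) ** 2 for g, mu in zip(groups, means))
    ss_within = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, ms_within, means


def scheffe_f(mean_a, mean_b, n_a, n_b, ms_within, k):
    """Scheffe statistic for the pairwise contrast between two group means."""
    return (mean_a - mean_b) ** 2 / ((k - 1) * ms_within * (1.0 / n_a + 1.0 / n_b))


# Toy samples standing in for repeated runs of three methods (illustrative only).
runs = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_value, ms_within, means = one_way_anova(runs)
pairwise = scheffe_f(means[0], means[2], len(runs[0]), len(runs[2]), ms_within, len(runs))
```

Both statistics are compared with the critical value of the F distribution with k − 1 and N − k degrees of freedom, where k is the number of methods and N the total number of runs.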
Table 2 shows the results of ANOVA. The XB indices for Haberman's survival and liver
disorders show no significant differences after clustering (P-value > 0.05), whereas those
for the iris data set show significant differences (P-value < 0.05). The algorithm
relationships are determined using Scheffé's multiple comparison, and the results are
listed in Table 3. In the iris data set, IGAFA is better than GFA, AFA, and IFA.
Table 1 Real optimal clusters (c*) in data sets
| Data set | Haberman's survival | Iris | Liver disorders |
|---|---|---|---|
| c* | 2 | 3 | 2 |
Table 2 Results of ANOVA of XB index values of four methods
The method columns give average XB index values. $H_0$: $\mu_{GFA} = \mu_{IFA} = \mu_{AFA} = \mu_{IGAFA}$; $H_1$: otherwise.
| Data set | GFA | IFA | AFA | IGAFA | F-value | P-value | Result |
|---|---|---|---|---|---|---|---|
| Haberman's survival | 0.223036 | 0.222972 | 0.222962 | 0.222955 | 1.725 | 0.16576 | Accept $H_0$ |
| Iris | 0.161738 | 0.161737 | 0.161733 | 0.161688 | 12.798 | 2.78E-7 | Reject $H_0$ |
| Liver disorders | 0.125810 | 0.125639 | 0.125771 | 0.125584 | 1.482 | 1.48174 | Accept $H_0$ |










Table 3 Results of multiple comparison of XB index values of four methods
| Data set | Method A | Method B | Mean difference | F-value | P-value | Result |
|---|---|---|---|---|---|---|
| Haberman's survival | IFA | GFA | 6.42E-5 | 0.8555 | 0.466 | IFA = GFA |
| | AFA | GFA | 7.40E-5 | 1.1388 | 0.336 | AFA = GFA |
| | AFA | IFA | 9.87E-6 | 0.0202 | 0.996 | AFA = IFA |
| | IGAFA | GFA | 8.11E-5 | 1.3654 | 0.257 | IGAFA = GFA |
| | IGAFA | IFA | 1.69E-5 | 0.0593 | 0.981 | IGAFA = IFA |
| | IGAFA | AFA | 7.03E-6 | 0.0103 | 0.999 | IGAFA = AFA |
| | Rank | | | | | IGAFA = AFA = IFA = GFA |
| Iris | IFA | GFA | 7.00E-7 | 0.0019 | 0.999 | IFA = GFA |
| | AFA | GFA | 5.07E-6 | 0.0968 | 0.962 | AFA = GFA |
| | AFA | IFA | 4.37E-6 | 0.0719 | 0.975 | AFA = IFA |
| | IGAFA | GFA | 4.93E-5 | 9.1564 | 0.000 | IGAFA < GFA |
| | IGAFA | IFA | 4.86E-5 | 8.8980 | 0.000 | IGAFA < IFA |
| | IGAFA | AFA | 4.42E-5 | 7.3699 | 0.000 | IGAFA < AFA |
| | Rank | | | | | IGAFA < AFA = IFA = GFA |
| Liver disorders | IFA | GFA | 1.72E-4 | 0.6324 | 0.596 | IFA = GFA |
| | AFA | GFA | 3.88E-5 | 0.0323 | 0.992 | AFA = GFA |
| | AFA | IFA | 1.33E-4 | 0.3788 | 0.768 | AFA = IFA |
| | IGAFA | GFA | 2.26E-4 | 1.1002 | 0.352 | IGAFA = GFA |
| | IGAFA | IFA | 5.48E-5 | 0.0644 | 0.979 | IGAFA = IFA |
| | IGAFA | AFA | 1.88E-4 | 0.7555 | 0.521 | IGAFA = AFA |
| | Rank | | | | | IGAFA = AFA = IFA = GFA |
Similarly, the convergence times of the four algorithms are compared. As shown in
Table 4, the convergence times of the algorithms differ significantly in all three data
sets (P-value < 0.05), and the algorithm relationships are determined
using Scheffé's multiple comparison. The results are listed in Table 5, where the
relative advantages of the algorithms are not fully consistent across data sets.
In the Haberman's survival data set, IGAFA is better than GFA, IFA,
and AFA; in the iris data set, IGAFA is better than GFA, AFA, and IFA; in the liver
disorders data set, IGAFA differs little from GFA or IFA, but its convergence times are
better than those of AFA.
Table 4 Results of ANOVA of convergence iterations of four methods
The method columns give average convergence iterations. $H_0$: $\mu_{GFA} = \mu_{IFA} = \mu_{AFA} = \mu_{IGAFA}$; $H_1$: otherwise.
| Data set | GFA | IFA | AFA | IGAFA | F-value | P-value | Result |
|---|---|---|---|---|---|---|---|
| Haberman's survival | 7.4 | 13.6 | 14.5 | 6.4 | 410.278 | 0.000 | Reject $H_0$ |
| Iris | 17.9 | 22.3 | 19.7 | 12.5 | 129.836 | 0.000 | Reject $H_0$ |
| Liver disorders | 12.9 | 14.2 | 24.4 | 13.3 | 532.569 | 0.000 | Reject $H_0$ |











Table 5 Results of multiple comparison of convergence iterations of four methods
| Data set | Method A | Method B | Mean difference | F-value | P-value | Result |
|---|---|---|---|---|---|---|
| Haberman's survival | IFA | GFA | 6.2 | 151.5649 | 0.000 | IFA > GFA |
| | AFA | GFA | 7.1 | 198.7614 | 0.000 | AFA > GFA |
| | AFA | IFA | 0.9 | 3.1938 | 0.002 | AFA > IFA |
| | IGAFA | GFA | 1.0 | 3.9429 | 0.010 | IGAFA < GFA |
| | IGAFA | IFA | 7.2 | 204.3997 | 0.000 | IGAFA < IFA |
| | IGAFA | AFA | 8.1 | 258.6934 | 0.000 | IGAFA < AFA |
| | Rank | | | | | IGAFA < GFA < IFA < AFA |
| Iris | IFA | GFA | 4.3 | 23.2293 | 0.000 | IFA > GFA |
| | AFA | GFA | 1.7 | 3.7745 | 0.012 | AFA > GFA |
| | AFA | IFA | 2.6 | 8.2763 | 0.000 | AFA < IFA |
| | IGAFA | GFA | 5.5 | 38.0036 | 0.000 | IGAFA < GFA |
| | IGAFA | IFA | 9.8 | 120.6568 | 0.000 | IGAFA < IFA |
| | IGAFA | AFA | 7.2 | 65.7319 | 0.000 | IGAFA < AFA |
| | Rank | | | | | IGAFA < GFA < AFA < IFA |
| Liver disorders | IFA | GFA | 1.0 | 2.9412 | 0.036 | IFA > GFA |
| | AFA | GFA | 11.4 | 382.2353 | 0.000 | AFA > GFA |
| | AFA | IFA | 10.4 | 318.1177 | 0.000 | AFA > IFA |
| | IGAFA | GFA | 0.3 | 0.3268 | 0.805 | IGAFA = GFA |
| | IGAFA | IFA | 0.7 | 1.3072 | 0.275 | IGAFA = IFA |
| | IGAFA | AFA | 11.1 | 360.2092 | 0.000 | IGAFA < AFA |
| | Rank | | | | | IGAFA = GFA = IFA < AFA |
The computational times of the four algorithms are compared in Table 6. The
execution times of the algorithms differ significantly in all three data sets
(P-value < 0.05). Scheffé's multiple comparison results are listed in Table 7, where a
shorter execution time represents higher efficiency. In the Haberman's survival and iris
data sets, the execution times of IGAFA and IFA do not differ significantly, but both
are better than those of AFA and GFA; in liver disorders, the execution time of IGAFA is
better than those of IFA, AFA, and GFA.
To sum up, IGAFA needs fewer convergence iterations and shorter computational times;
therefore, it has excellent computational efficiency in data clustering.
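Table 6 Results of ANOVA of CPU time (sec) of four methods
The method columns give average CPU time (sec). $H_0$: $\mu_{GFA} = \mu_{IFA} = \mu_{AFA} = \mu_{IGAFA}$; $H_1$: otherwise.
| Data set | GFA | IFA | AFA | IGAFA | F-value | P-value | Result |
|---|---|---|---|---|---|---|---|
| Haberman's survival | 126 | 46 | 61 | 41 | 727.31338 | 0.000 | Reject $H_0$ |
| Iris | 113 | 18 | 44 | 34 | 709.34078 | 0.000 | Reject $H_0$ |
| Liver disorders | 214 | 86 | 162 | 75 | 1084.8132 | 0.000 | Reject $H_0$ |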











Table 7 Results of Scheffé's multiple comparison of CPU time (sec) of four methods
| Data set | Method A | Method B | Mean difference | F-value | P-value | Result |
|---|---|---|---|---|---|---|
| Haberman's survival | IFA | GFA | 79.5 | 502.9594 | 0.000 | IFA < GFA |
| | AFA | GFA | 65.0 | 336.1252 | 0.000 | AFA < GFA |
| | AFA | IFA | 14.5 | 16.7530 | 0.000 | AFA > IFA |
| | IGAFA | GFA | 84.4 | 566.8928 | 0.000 | IGAFA < GFA |
| | IGAFA | IFA | 4.9 | 1.9120 | 0.131 | IGAFA = IFA |
| | IGAFA | AFA | 19.4 | 29.9843 | 0.000 | IGAFA < AFA |
| | Rank | | | | | IGAFA = IFA < AFA < GFA |
| Iris | IFA | GFA | 95.2 | 513.0825 | 0.000 | IFA < GFA |
| | AFA | GFA | 69.4 | 272.8903 | 0.000 | AFA < GFA |
| | AFA | IFA | 25.8 | 37.6006 | 0.000 | AFA > IFA |
| | IGAFA | GFA | 98.3 | 547.2922 | 0.000 | IGAFA < GFA |
| | IGAFA | IFA | 3.1 | 0.5520 | 0.642 | IGAFA = IFA |
| | IGAFA | AFA | 28.9 | 47.2641 | 0.000 | IGAFA < AFA |
| | Rank | | | | | IGAFA = IFA < AFA < GFA |
| Liver disorders | IFA | GFA | 128.0 | 688.2438 | 0.000 | IFA < GFA |
| | AFA | GFA | 51.6 | 111.8463 | 0.000 | AFA < GFA |
| | AFA | IFA | 76.4 | 245.1936 | 0.000 | AFA > IFA |
| | IGAFA | GFA | 138.3 | 803.8519 | 0.000 | IGAFA < GFA |
| | IGAFA | IFA | 10.3 | 4.4854 | 0.005 | IGAFA < IFA |
| | IGAFA | AFA | 86.7 | 316.0054 | 0.000 | IGAFA < AFA |
| | Rank | | | | | IGAFA < IFA < AFA < GFA |
5 Conclusions and suggestions
Since FCM with random initial centroids is time-consuming in clustering computation, this
study developed a hybrid algorithm, IGAFA, which improves clustering efficiency by
combining the global search capability of GA, the better-solution retention
characteristics of IA, and SA's ability to escape random local solutions. IGAFA could
properly identify the initial cluster centroids of data sets, and the execution efficiency
of the algorithms was measured, in order to prevent long solving times due to larger data
sizes.
This study validated the clustering efficiency on three data sets: Haberman's
survival, iris, and liver disorders, taking XB indices, execution times, and convergence
times as comparison targets. The results indicated that IGAFA has
outstanding solving quality and can thus be used efficiently in various fields, such as
image recognition, part type selection, and customer clustering. Future research will
focus on multi-objective clustering algorithms.









Acknowledgements
This research was partially financially supported by the National Science Council under
project No. NSC 98-2410-H-027-002-MY2. The authors would like to thank the editors
and the three anonymous referees for their valuable comments.
References
Abramson, D. (1991) 'Constructing school timetables using simulated annealing: sequential and
parallel algorithms', Management Science, Vol. 37, No. 1, pp.98–113.
Asuncion, A. and Newman, D.J. (2007) 'UCI machine learning repository', University of
California, School of Information and Computer Science, Irvine, CA, available at
http://www.ics.uci.edu/~mlearn/MLRepository.html.
Baker, B.M. and Ayechew, M.A. (2003) 'A genetic algorithm for the vehicle routing problem',
Computers & Operations Research, Vol. 30, No. 5, pp.787–800.
Bandyopadhyay, S. (2005) 'Simulated annealing using a reversible jump Markov chain Monte
Carlo algorithm for fuzzy clustering', IEEE Transactions on Knowledge and Data
Engineering, Vol. 17, No. 4, pp.479–490.
Bezdek, J.C. and Dunn, J.C. (1975) 'Optimal fuzzy partitions: a heuristic for estimating the
parameters in a mixture of normal distributions', IEEE Transactions on Computers, Vol. 24,
No. 8, pp.835–840.
Bezdek, J.C. (1981) Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press,
New York.
Branke, J., Meisel, S. and Schmidt, C. (2008) 'Simulated annealing in the presence of noise',
Journal of Heuristics, Vol. 14, No. 6, pp.627–654.
Brusco, M. and Jacobs, L. (1993) 'A simulated annealing approach to the solution of flexible labor
scheduling problems', Journal of the Operational Research Society, Vol. 44, No. 12,
pp.1191–1200.
Chan, K.Y., Chan, K.W., Pong, G.T.Y., Aydin, M.E., Fogarty, T.C. and Ling, S.H. (2009) 'A
statistics-based genetic algorithm for quality improvements of power supplies', European
Journal of Industrial Engineering, Vol. 3, No. 4, pp.468–492.
Che, Z.H. (2010a) 'A genetic algorithm-based model for solving multi-period supplier selection
problem with assembly sequence', International Journal of Production Research, Vol. 48,
No. 15, pp.4355–4377.
Che, Z.H. (2010b) 'Using hybrid genetic algorithms for multi-period product configuration change
planning', International Journal of Innovative Computing, Information and Control, Vol. 6,
No. 6, pp.2761–2785.
Che, Z.H. and Wang, H.S. (2008) 'Supplier selection and supply quantity allocation of common
and non-common parts with multiple criteria under multiple products', Computers &
Industrial Engineering, Vol. 55, No. 1, pp.110–133.
Che, Z.H. and Wang, H.S. (2010) 'A hybrid approach for supplier cluster analysis', Computers &
Mathematics with Applications, Vol. 59, No. 2, pp.745–763.
Damodaran, P., Hirani, N.S. and Velez-Gallego, M.C. (2009) Scheduling identical parallel batch
processing machines to minimise makespan using genetic algorithms, European Journal of
Industrial Engineering, Vol. 3, No. 2, pp.187206.
Dasgupta, D. and Okine, N.A. (1997) Immunity-based systems: a survey, Proceedings of the
IEEE International Conference on Systems, Man and Cybernetics, 1215 October, Orlando,
Florida, USA.









66 Z.H. Che












De Castro, L.N. and Von Zuben, F.J. (2001) aiNet: an artificial immune network for data analysis,
in Abbass, H.A., Sarker, R.A. and Newton, C.S. (Eds.): Data Mining: A Heuristic Approach,
pp.231259, Idea Group Publishing.
De Castro, L.N. and Von Zuben, F.J. (2002) Learning and optimization using the clonal selection
principle, IEEE Transactions on Evolutionary Computation, Vol. 3, No. 3, pp.239251.
Goldberg, D.E. (1989) Genetic Algorithms in Search, Optimization, and Machine Learning,
Addison-Wesley.
Holland, J.H. (1975) Adaptation in Natural and Artificial System, The University of Michigan
Press.
Hou, E.S.H., Ansari, N. and Hong, R. (1994) A genetic algorithm for multiprocessor scheduling,
IEEE Transactions Parallel and Distributed Systems, Vol. 5, No. 2, pp.113120.
Kaymak, U. and Setnes, M. (2002) Fuzzy clustering with volume prototypes and adaptive cluster
merging, IEEE Transactions on Fuzzy Systems, Vol. 10, No. 6, pp.705712.
Kirkpatrick, S., Gelatt, C. and Vecchi, M. (1983) Optimization by simulated annealing, Science,
Vol. 220, No. 4598, pp.671680.
Krishnapuram, R. and Keller, J.M. (1996) The possibilistic C-means algorithm: insights and
recommendations, IEEE Transactions on Fuzzy Systems, Vol. 4, No. 3, pp.385393.
Kumar, N.S.H. and Shanker, K. (2000) A genetic algorithm for FMS part type selection and
machine loading, International Journal of Production Research, Vol. 38, No. 16,
pp.38613887.
Lin, C., Huang, Y. and Chen, J. (2006) A genetic-based fuzzy clustering algorithm for fault
diagnosis in satellite attitude determination system, Proceedings of the Sixth International
Conference on Intelligent Systems Design and Applications, 1618 October, Jinan, China.
Liu, F., Wang, C., Gao, X.Z. and Wang, O. (2007) A fuzzy clustering algorithm based on artificial
immune principles, Proceedings of International Conference on Computational Intelligence
and Security, 1519 December, Harbin, Heilongjiang, China.
Liu, F., Wang, Q. and Gao, X. (2004) Survey of artificial immune system, IEEE International
Conference on System, Man and Cybernetics, Vol. 4, pp.34153420.
Lloyd, S.P. (1957) Least-square Quantization in PCM, Bell Laboratories Internal Technical Report.
Loukil, T., Teghem, J. and Fortemps, P. (2007) A multi-objective production scheduling case
study solved by simulated annealing, European Journal of Operational Research, Vol. 179,
No. 3, pp.709722.
Meng, K., Xia, R., Ji, T. and Qian, F. (2007) Electricity reference price forecasting with fuzzy
C-means and immune algorithm, 2007 IEEE Congress on Evolutionary Computation,
pp.23372343.
Metropolis, N., Rosenbluth, A.W. and Teller, A.H. (1953) Equation of state calculations by fast
computing machines, Journal of Chemical Physics, Vol. 21, No. 6, pp.10871092.
Mishra, N., Oblinger, D. and Pitt, L. (2001) Sublinera time approximate clustering, Proceedinf of
Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms, 79 January, Washington,
DC.
Nie, S., Zhang, Y., Li, W. and Chen, Z. (2007) A fast and automatic segmentation method of MR
brain images based on genetic fuzzy clustering algorithm, 29th Annual International
Conference of the IEEE Engineering in Medicine and Biology Society, 2326 August, Lyon,
France.
Peng, H., Xu, L. and Jiang, Y. (2006) Improved genetic FCM algorithm for color image
segmentation, 8th International Conference on Signal Processing, 1620 November, Beijing,
China.
Peumans, M., Hikita, K., Munck, J.D., Landuyt, K.V., Poitevin, A., Lambrechts, P. and
Meerbeek, B.V. (2007) Effects of ceramic surface treatments on the bond strength of an
adhesive luting agent to CAD-CAM ceramic, Journal of Dentistry, Vol. 35, No. 4,
pp.282288.









A hybrid algorithm for fuzzy clustering 67













Ruspini, H. (1970) Numerical methods for fuzzy clustering, Information Sciences, Vol. 2, No. 3,
pp.319350.
Seckiner, S.U. and Kurt, M. (2007) A simulated annealing approach to the solution of job rotation
scheduling problems, Applied Mathematics and Computation, Vol. 188, No. 1, pp.3145.
Sha, D.Y. and Che, Z.H. (2006) Supply chain network design: partner selection and
production/distribution planning using a systematic model, Journal of Operational Research
Society, Vol. 57, No. 1, pp.5262.
Sun, L.H., Wang, L.W. and Wang, K.L. (2002) Study on the incentive mechanism of supplier
selection and management, Computer Integrated Manufacturing Systems, Vol. 8, No. 2,
pp.9599.
Turgut, D., Turgut, B., Elmasri, R. and Le, V. (2003) Optimizing clustering algorithm in mobile ad
hoc networks using simulated annealing, 2003 IEEE Wireless Communications and
Networking Conference, 1620 March, Louisiana, USA.
Wang, H.S. and Che, Z.H. (2008) A multi-phase model for product part change problems,
International Journal of Production Research, Vol. 46, No. 10, pp.27972825.
Wei, C.H. and Fahn, C.S. (1996) A distributed approach to fuzzy clustering by genetic algorithm,
Proceedings of the 1996 Asian Fuzzy Systems Symposium, 1114 December, Kenting, Taiwan.
Wu, J., Song, C.H., Kong, J.M. and Lee, W.D. (2007) Extended mean field annealing for
clustering incomplete data, International Symposium on Information Technology
Convergence, 2324 November, Jeonju, Korea.
Xie, X.L. and Beni, G. (1991) A validity measure for fuzzy clustering, IEEE Transactions on
Pattern Analysis and Machine Intelligence, Vol. 13, No. 8, pp.841847.
Xu, L., Mo, H. and Wang, K. (2006) Immune algorithm for supervised clustering, Proceedings of
the 2006 5th IEEE International Conference on Cognitive Informatics, 1719 July, Beijing,
China.
Yang, W., Rueda, L. and Ngom, A. (2005) A simulated annealing approach to find the optimal
parameters for fuzzy clustering microary data, 25th International Conferwnce of the Chilean
Computer Science Society, 711 November, Valdivia, Chile.
Zhang, X., Qian, X., Jiao, L. and Wang, G. (2007) An immune spectral clustering algorithm, 2007
International Symposium on Intelligent Signal Processing and Communication Systems,
28 November1 December, Xiamen, China.
Zhao, L., Tsujimura, Y. and Gen, M. (1996) Genetic algorithm for fuzzy clustering, Proceedings
of IEEE International Conference on Evolutionary Computation, 2022 May, Nagoya, Japan.