1, 2012

Copyright 2012 Inderscience Enterprises Ltd.

A hybrid algorithm for fuzzy clustering

Z.H. Che

Department of Industrial Engineering and Management,

National Taipei University of Technology,

1, Sec. 3, Chung-Hsiao E. Rd., Taipei 10608, Taiwan

E-mail: zhche@ntut.edu.tw

Abstract: The fuzzy C-means (FCM) algorithm is a commonly used fuzzy
clustering method that starts from randomly selected initial centroids. With
larger data sizes or more attribute dimensions, the clustering results may be
affected and more repetitive computations are required. To compensate for the
effect of random initial centroids on the results, this study proposes a hybrid
algorithm, the immune genetic annealing fuzzy C-means algorithm (IGAFA).
The algorithm first obtains proper initial cluster centroids to improve clustering
efficiency. It is tested on three data sets (Haberman's survival, iris, and liver
disorders), and its results are compared with those of the genetic fuzzy C-means
algorithm (GFA), the immune fuzzy C-means algorithm (IFA), and the annealing
fuzzy C-means algorithm (AFA). The results suggest that IGAFA could achieve
better clustering results.

[Received: November 18, 2009; Accepted: July 19, 2010]

Keywords: fuzzy clustering; fuzzy C-means; FCM; clustering efficiency.

Reference to this paper should be made as follows: Che, Z.H. (2012) 'A hybrid
algorithm for fuzzy clustering', European J. Industrial Engineering, Vol. 6,
No. 1, pp.50–67.

Biographical notes: Z.H. Che is an Associate Professor at the Department of

Industrial Engineering and Management, National Taipei University of

Technology, ROC. He received his PhD in Industrial Engineering and

Management at National Chiao Tung University, ROC. His current

research interests include production/operations management, supply chain

management, information management, and applications of meta-algorithms.

1 Introduction

Fuzzy C-means (FCM) is a fuzzy clustering method based on objective function
optimisation, which realises fuzzy clustering of data sets through repetitive
computation in unsupervised classification. Mishra et al. (2001) suggested that
more data require more computation time, and that sampling is often used to
improve clustering efficiency and reduce the size of the database. Lin et al.
(2006) and Liu et al. (2007) pointed out that FCM is a local search technique
based on hill-climbing: the randomly selected initial cluster centroids may affect
the clustering result, and the search is likely to be trapped in a local optimum
during the clustering process. To resolve this problem, other algorithms are first
used to search for cluster centroids close to the global optimum, and the FCM
steps are then conducted from those centroids to avoid being trapped in a local
optimum.

Zhao et al. (1996), Sha and Che (2006), and Damodaran et al. (2009) indicated
that the genetic algorithm (GA), which is based on inheritance and a natural
selection mechanism, is widely applied for various optimisation purposes: the
data space is searched from multiple points and the mutation rule is random,
which reduces the probability of being trapped at a false peak. Peng et al. (2006)
pointed out that traditional FCM often converges to a local optimum rather than
a global optimum; when GA is used to improve colour pattern recognition, the
local search capability could be enhanced, despite the slower execution speed of
GA. Nie et al. (2007) argued that many studies considered only the speed of FCM
and neglected its accuracy; they therefore identified better cluster centroids using
the global search capability of GA, then used the local search capability of FCM
to converge to the final clustering results, which were applied to medical image
segmentation.

Bandyopadhyay (2005) suggested applying a simulated annealing algorithm (SA) of

reversible jump Markov chain Monte Carlo to fuzzy clustering. In order to achieve faster

search in the space and provide better convergence value for FCM, new solutions can be

produced through probability during the SA process, so that efficient classification of

artificial and real-life data sets can be achieved. Wu et al. (2007) applied SA to clustering

of incomplete data and the results showed reduced clustering errors. De Castro and

Von Zuben (2002) suggested that the clonal selection principle is an immune

optimisation method which identifies and eliminates the same solutions to retain the

better ones.

According to immune algorithm (IA) principles, Liu et al. (2004) developed new
algorithms and applied them to fuzzy clustering, retaining local optima as
antibodies and then conducting a further global search. Comparisons with
K-means, K-medoid, and FCM indicated that the new algorithms give better
results. Zhang et al. (2007) pointed out that, when IA is applied to clustering, the
search over initial centroids converges quickly to a global optimum, thus reducing
the effect of the initial centroids on the results. Liu et al. (2004) indicated that
FCM is easily trapped in a local solution and that the clustering result is very
sensitive to the initial centroids; thus, IA could be used to determine the initial
centroids before fuzzy clustering, thereby reducing the number of iterations FCM
needs to converge to the optimum.

There are two major study purposes:

1 combining the GA, IA, SA along with the steps of FCM to develop an immune

genetic annealing fuzzy C-means algorithm (IGAFA) and applying the IGAFA for

performing the fuzzy clustering analysis

2 comparing the solving performance of IGAFA, genetic fuzzy C-means algorithm

(GFA), immune fuzzy C-means algorithm (IFA), and annealing fuzzy C-means

algorithm (AFA) to verify IGAFA has excellent capabilities to solve the fuzzy

clustering problems defined in this study.

Three known data sets, Haberman's survival, iris, and liver disorders, are used for

validation together with the Xie-Beni cluster validation index (XB).

The remainder of this paper is organised as follows: Section 2 reviews the
literature on research into and applications of fuzzy cluster analysis and the
related algorithms; Section 3 introduces the methodology, research framework,
and processes of the hybrid algorithm; Section 4 presents the experimental results
and compares XB indices, convergence iterations, and computational times to
analyse the advantages and disadvantages of the algorithms; the conclusions and
suggestions are given in Section 5.

2 Literature review

2.1 Fuzzy clustering analysis

The partitional clustering algorithm is often used in cluster research because its
computational complexity is smaller than that of the hierarchical clustering
algorithm. Among partitional clustering algorithms, K-means (Lloyd, 1957),
FCM (Bezdek, 1981), and possibilistic C-means (Krishnapuram and Keller, 1996)
are the most frequently used. Such a method first obtains a set of random cluster
centroids, assigns each pattern to a cluster according to the distance between the
pattern and the cluster centroids, then computes new cluster centroids and revises
the clustering results using the new centroids. These steps are repeated until the
set stop conditions are met, at which point the clustering result is obtained.
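The partitional loop described above can be sketched in a few lines of Python. This is a minimal illustration of hard K-means only, not code from the paper; the function name `kmeans`, its parameters, and the toy data in the note below are all assumptions made for the example.

```python
def kmeans(points, centroids, max_iter=100):
    """Hard K-means from given initial centroids (illustrative sketch)."""
    centroids = [list(c) for c in centroids]
    assignment = None
    for _ in range(max_iter):
        # Assign each pattern to its nearest centroid (squared Euclidean distance).
        new_assignment = [
            min(range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            for p in points
        ]
        if new_assignment == assignment:  # stop condition: assignments unchanged
            break
        assignment = new_assignment
        # Recompute each centroid as the mean of the patterns assigned to it.
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment
```

On the four toy points `[[0, 0], [0, 1], [10, 0], [10, 1]]`, starting from centroids `[[1, 0], [9, 1]]` separates the left and right pairs, whereas starting from `[[0, 0], [0, 1]]` stops at centroids `[5, 0]` and `[5, 1]`. The second run illustrates the sensitivity to initial centroids that motivates this study.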

Ruspini (1970) combined Zadeh's fuzzy set theory with cluster analysis to cluster
uncertain data using grades of membership, an approach named fuzzy clustering.
When the data carry clear information, traditional cluster analysis can give better
clustering results. However, for special data, such as continuously counted data
or data with uncertain information, fuzzy clustering may lead to better results.
Bezdek and Dunn (1975) first applied fuzzy theory to cluster analysis and
developed an FCM clustering algorithm. To resolve the fuzzy C-segmentation
problem, Bezdek (1981) derived the minimisation algorithm from the distance
between sample points and cluster centroids.

Kaymak and Setnes (2002) pointed out that the XB index proposed by Xie and Beni

(1991) is a commonly used index, and XB is used to calculate the separation and

compactness among clusters and validate the clustering result of FCM. A lower

compactness value and a higher separation value indicate satisfactory clustering results; thus, a

smaller XB index represents better clustering. The XB index is calculated as follows:

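$$XB = \frac{\sum_{j=1}^{J} \sum_{i=1}^{I} u_{ij}^{m} \lVert x_j - v_i \rVert^{2}}{K \min_{i \neq k} \lVert v_i - v_k \rVert^{2}} \tag{1}$$

where

$$u_{ij} = \frac{1}{\sum_{k=1}^{K} \left( \dfrac{\lVert x_j - v_i \rVert}{\lVert x_j - v_k \rVert} \right)^{\frac{2}{m-1}}} \tag{2}$$

$$v_i = \frac{\sum_{j=1}^{J} u_{ij}^{m} x_j}{\sum_{j=1}^{J} u_{ij}^{m}} \tag{3}$$

j: index of data point, j = 1, 2, 3, …, J

$v_i$ and $v_k$: cluster centroids, with indices i = 1, 2, 3, …, I and k = 1, 2, 3, …, K

$x_j$: jth pattern

$u_{ij}^{m}$: degree of membership of the jth pattern versus the ith cluster centroid, with fuzziness degree m.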
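As a rough illustration of how Equations (1) to (3) fit together, the sketch below computes memberships, updates centroids, and evaluates XB in plain Python. It is not the author's implementation. The function names, the default m = 2, the stopping tolerance, and the handling of a point that coincides with a centroid are assumptions. The sketch also normalises XB by the number of data points, as in Xie and Beni's original index; treat that factor as an assumption when comparing with Equation (1).

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two coordinate lists."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def memberships(points, centroids, m=2.0):
    """Membership u[i][j] of pattern j in cluster i, following Equation (2)."""
    u = [[0.0] * len(points) for _ in centroids]
    for j, x in enumerate(points):
        d = [math.sqrt(sq_dist(x, v)) for v in centroids]
        zero = [i for i, di in enumerate(d) if di == 0.0]
        if zero:  # assumed convention: share full membership among coinciding centroids
            for i in zero:
                u[i][j] = 1.0 / len(zero)
            continue
        for i, di in enumerate(d):
            u[i][j] = 1.0 / sum((di / dk) ** (2.0 / (m - 1.0)) for dk in d)
    return u

def update_centroids(points, u, m=2.0):
    """New centroids as membership-weighted means, following Equation (3)."""
    dim = len(points[0])
    centroids = []
    for row in u:
        w = [uij ** m for uij in row]
        total = sum(w)
        centroids.append(
            [sum(wj * x[k] for wj, x in zip(w, points)) / total for k in range(dim)]
        )
    return centroids

def xie_beni(points, centroids, u, m=2.0):
    """Compactness divided by (number of points x minimum centroid separation)."""
    compact = sum(
        u[i][j] ** m * sq_dist(x, v)
        for i, v in enumerate(centroids)
        for j, x in enumerate(points)
    )
    c = len(centroids)
    separation = min(
        sq_dist(centroids[i], centroids[k])
        for i in range(c) for k in range(c) if i != k
    )
    return compact / (len(points) * separation)

def fcm(points, centroids, m=2.0, tol=1e-4, max_iter=100):
    """Alternate Equations (2) and (3) until XB changes by less than tol."""
    previous = None
    for _ in range(max_iter):
        u = memberships(points, centroids, m)
        xb = xie_beni(points, centroids, u, m)
        if previous is not None and abs(previous - xb) < tol:
            break
        previous = xb
        centroids = update_centroids(points, u, m)
    return centroids, u, xb
```

A call such as `fcm(points, initial_centroids)` returns the final centroids, the membership matrix, and the XB value; a smaller XB value indicates more compact and better separated clusters.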

According to above, the clustering method can be applied to various fields, depending on

the clustering objectives. In this paper, the fuzzy clustering membership and algorithm

with search capability for optimisation problems are employed to obtain the initial

clustering centroids, thereby providing better fuzzy clustering efficiency.

2.2 The concept of GA and its application for FCM

The GA was developed by Holland (1975) and is based on the concept of survival
of the fittest in natural selection. GA uses a multi-point search rather than the
traditional single-point search, since a single-point search is easily trapped in a
local optimum, especially for multimodal functions. For the basic procedure of
GA and its applications, see Hou et al. (1994), Kumar and Shanker (2000), Sun
et al. (2002), Baker and Ayechew (2003), Che and Wang (2008), Wang and Che
(2008), Chan et al. (2009), and Che (2010a, 2010b).

Wei and Fahn (1996) argued that, based on the objective function in FCM, the

intended objective function could be established according to the combinatorial

optimisation problem. Many repetitive algorithms have been proposed, among which,

GA has more opportunities to obtain better solutions for such combinatorial optimisation

problems. Given random and comprehensive characteristics, the search methods of GA

can efficiently solve real-life data clustering problems of a certain quantity. Zhao et al.

(1996) suggested that fuzzy clustering can represent accurate results for inaccurate

clustering of real-life data structures, and GA is widely applied to various optimisation

problems by a multi-point conceptual search and a natural selection mechanism. Thus,

there are more opportunities to realise optimum clustering by applying GA and FCM to

manufacturing unit planning with clustering technology, proving that GA is an efficient

method for real-life data clustering. Lin et al. (2006) proposed a GA-based FCM to

overcome the shortcoming of FCM (being easily trapped in a local optimum) by using GA's

random global search. They applied it to failure diagnosis of a satellite ADS and

obtained a better clustering result than FCM.

To sum up, GA of global optimum search capability is widely applied to various

issues. Wei and Fahn (1996) suggested that GA has more opportunities to realise

optimum solutions, although the solving time could be prolonged.

2.3 The concept of IA and its application for FCM

Dasgupta and Okine (1997) suggested that the immune system is a complex

self-adjustment system, which can efficiently resist the intrusion of external bacteria or

viruses. Its main function is to distinguish its own units from foreign ones and activate the

corresponding defence mechanism to resist external attack. Liu et al. (2004) indicated that

an artificial IA can identify unique antigens through antibodies in an immune reaction


and consider its objectives as antigens and its solutions as antibodies, of which antibodies

must be computed for unknown antigens in order to determine the optimum ones.

De Castro and Von Zuben (2002) and Liu et al. (2007) discussed the concept and

applications based on the immune system.

De Castro and Von Zuben (2001) indicated that the immune system could maintain a

local optimum and search a global optimum, and show better performance when applied

to hierarchical clustering and data analysis. Xu et al. (2006) proposed a supervised

clustering algorithm based on the clonal selection principle to identify better clusters.

Meng et al. (2007) proposed a fuzzy IA to conduct efficient data clustering and improve

the stability of a network. Liu et al. (2007) used an IA to search for initial solutions and

then obtain better clustering results in the fuzzy clustering problem; the quicker convergence

of the new method was validated experimentally.

2.4 The concept of SA and its application for FCM

SA was first developed by Metropolis et al. (1953). Turgut et al. (2003) indicated

that after the application of Kirkpatrick et al. (1983) to combinatorial optimisation, it

became widely applied to optimisation problems. Its principle is that, when a solid is

heated to a certain temperature the molecular structures among the solids will be broken

down and turned into a liquid structure, then the cooling-down process is controlled in

such a way that, while changing liquid to a solid structure the molecules can be

rearranged into expected stable states. Regarding the related concepts of SA and its applications,

Seckiner and Kurt (2007) suggested that SA could randomly disturb

existing solutions to generate new ones within the solving space and, owing to the

probabilistic temperature state model within the algorithm, temporarily accept an inferior

solution that could later be replaced by any improved one. Thus, it has

been successfully applied to many related management cases. Abramson (1991)

constructed school timetabling by SA sequence and parallel algorithm. Brusco and Jacobs

(1993) applied SA to resolve flexible labour scheduling issues. Loukil et al. (2007),

Branke et al. (2008), and Yang et al. (2005) applied SA to solve dynamic facility layout

problems. Che and Wang (2010) proposed a hybrid approach based on K-means, SA,

convergence factor particle swarm optimisation, and the Taguchi method to solve

supplier cluster analysis problems.

SA steps are used in FCM to search optimum clusters and optimise parameters,

proving that it can improve clustering accuracy and speed. Bandyopadhyay (2005)

applied SA and reversible jump Markov chain Monte Carlo to fuzzy clustering and

demonstrated that the clustering results are better than the basic FCM algorithm, as

determined through experiments with artificial and real-life data sets.

2.5 Summary

The randomly selected initial values of FCM may affect the clustering results, and
FCM has poor global search capability and slow search speed. GA uses a
multi-point search mechanism to avoid local optima, but it revisits the same
solutions during the search, which consumes a great deal of time in fuzzy
clustering. IA can remove solutions with greater similarities to maintain diversity,
but its single-point search mechanism is easily trapped in a local optimum during
the fuzzy clustering process. SA can randomly disturb existing solutions to
generate new ones within the neighbourhood of the solving space, but it does not
easily find better solutions that are distant from the existing ones. Hence, this
study combined the global search capability of GA, the diversity mechanism of
IA, and the local search capability of SA with the clustering steps of FCM to
develop a hybrid algorithm (IGAFA).

3 Methodology

3.1 Immune genetic annealing fuzzy C-means algorithm

IGAFA proposed in this paper combines the global search capability of GA, the

neighbourhood search capability of SA, and the better solution retention characteristics of

IA, and FCM, with the procedures shown in Figure 1.

Figure 1 The procedure of IGAFA (see online version for colours)

[Flowchart, Steps 1 to 20: initial population; calculate the fitness of the population; affinity calculation and selection to the memory section; reproduction, crossover, and mutation; perturb chromosome, accept and replace the new chromosome, temperature decrease; update the memory section and form the new parent population; initial centroids; calculate matrix U, the objective function (J), and new centroids (V) until the variation of J < 0.0001; optimal clusters.]


Step 1 Set the parameters: population (P), generation (G), mutation rate (MR),
crossover rate (CR), temperature (T), Markov chain length (M), cooling rate ($\alpha$),
and final temperature ($T_f$). Produce the initial population according to the cluster
number c and data dimension d; the chromosome of length L = c × d represents the
coordinates of the cluster centroids, as shown in Figure 2.

Step 2 Chromosomes produce cluster centroids through the data sets in the database;
calculate the fitness functional values f(n) through the Euclidean distance
equation:

$$d(x_j, v_i) = \sqrt{(x_{j1} - v_{i1})^{2} + (x_{j2} - v_{i2})^{2} + \cdots + (x_{jN} - v_{iN})^{2}}.$$

Step 3 Compare the chromosomes by the fitness functional values and remove those

with greater similarities to maintain diversity.

Figure 2 Gene chromosome encoding (see online version for colours)

[Diagram: a chromosome of length L = c × d that concatenates the coordinates (X, Y, Z, …) of the centroids of clusters 1, 2, …, c.]

Figure 3 Calculation of fitness functional values with gene sequencing (see online version for
colours)

[Diagram: gene sequences 1, 2, …, N, each of length L = c × d and encoding the centroids of clusters 1 to c, are evaluated to give f(n = 1), f(n = 2), …, f(n = N).]

Step 4 Conduct clonal selection, namely, compare the chromosome fitness. If the

chromosome fitness is improved, update the memory antibody sets; otherwise

conduct Step 5.

Step 5 Calculate the selection probability through the roulette method (Goldberg,
1989), in which a higher fitness means a greater probability of reproduction. As the
target is minimisation, take the reciprocal 1/f(n) of the fitness functional value of
each gene sequence and calculate its share of the whole probability,

$$p_n = \frac{1/f(n)}{\sum_{n=1}^{N} 1/f(n)},$$

such that smaller fitness functional values have a greater probability of being selected.

Step 6 For the chromosomes not selected, calculate the crossover number

CN = CR × P from the set crossover rate CR and population P, and randomly

select CN chromosomes from the offspring chromosomes to conduct a

single-point crossover (Figure 4).

Figure 4 Single-point crossover procedures (see online version for colours)

[Diagram: parents (C11, C12, …, C1L) and (C21, C22, …, C2L) exchange the genes after the cut point, so that each offspring combines the head of one parent with the tail of the other.]

Figure 5 Mutation methods (see online version for colours)

[Diagram: before the mutation the chromosome is (C11, C12, …, C19, …, C1L); after the mutation the gene at the mutation point, C19, is replaced by a stochastic value C19'.]

Step 7 Calculate the mutation number MN = MR × L × P according to the set MR,
chromosome length (L), and P; randomly select MN genes for single-point
mutation, and replace each selected gene with a stochastic value, as shown in
Figure 5.

Step 8 After gene mutation, calculate the fitness functional values $f_m(n)$ of the
chromosomes by the Euclidean distance equation, conduct a disturbance to
generate a neighbourhood under the current temperature (T), and then calculate the
fitness functional values $f_s(n)$ of the neighbourhood, using the Euclidean distance
equation again.

Step 9 Calculate the probability function

$$P(n) = \begin{cases} 1, & \text{if } \Delta f(n) < 0 \\ \exp\left(-\dfrac{\Delta f(n)}{T}\right), & \text{if } \Delta f(n) \geq 0 \end{cases}$$

according to the fitness function difference $\Delta f(n) = f_s(n) - f_m(n)$, and then
randomly produce a number r between 0 and 1 to compare with P(n); if r ≤ P(n),
conduct Step 10; if r > P(n), return to Step 8 to regenerate the neighbourhood until
the set number of searches is reached.

Step 10 The neighbourhood generated by the disturbance replaces the mutated
chromosome and its fitness functional values.

Step 11 With the set cooling rate $\alpha$, cool the temperature through the cooling
mechanism T = $\alpha$ × T.

Step 12 Judge whether the loop is finished using the set final temperature $T_f$. If
T ≤ $T_f$, conduct Step 13; if T > $T_f$, repeat Steps 8–11 until the condition
T ≤ $T_f$ is met.

Step 13 Calculate the fitness functional values f(n) of the new chromosomes generated
by simulated annealing, using the Euclidean distance equation.

Step 14 If f(n) > $f_m(n)$, the newly generated solution will replace the population;
otherwise, maintain the original population as the next-generation population.
Repeat Steps 2–13 until reaching the set number of generations.

Step 15 According to the computational results, generate the optimum coordinates as
the optimum initial centroids of FCM.

Step 16 Calculate the membership of the various points and cluster centroids using the
membership equation.

Step 17 Calculate XB, of which $XB_h$ represents the XB index of every clustering result.

Step 18 Calculate $\Delta XB(n) = XB_h(n) - XB_{h-1}(n)$; if $\Delta XB(n) \geq 0.0001$,
proceed with Step 19; otherwise conduct Step 20.

Step 19 Obtain new centroids $v_i$, and return to Step 16.

Step 20 Complete the computational procedures to obtain the optimum clustering result.
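The operators used in Steps 1 to 12 can be sketched as follows. This is a hedged illustration under stated assumptions, not the author's code. All function names, the uniform perturbation of width `step`, and the use of a seeded `random.Random` are choices made for the example. The acceptance rule is the usual Metropolis criterion for a minimisation target, which is one reading of Step 9.

```python
import math
import random

def decode(chromosome, c, d):
    """Split a chromosome of length L = c * d into c centroids of dimension d."""
    return [chromosome[i * d:(i + 1) * d] for i in range(c)]

def roulette_probabilities(fitness_values):
    """Selection share p_n = (1/f(n)) / sum(1/f(n)); smaller f(n) is favoured."""
    inverse = [1.0 / f for f in fitness_values]
    total = sum(inverse)
    return [w / total for w in inverse]

def single_point_crossover(a, b, rng):
    """Swap the tails of two parents after a random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(chromosome, low, high, rng):
    """Replace one randomly chosen gene with a stochastic value in [low, high]."""
    child = list(chromosome)
    child[rng.randrange(len(child))] = rng.uniform(low, high)
    return child

def acceptance_probability(delta, temperature):
    """Metropolis rule: always accept an improvement, else exp(-delta / T)."""
    return 1.0 if delta < 0 else math.exp(-delta / temperature)

def anneal(chromosome, cost, t0, t_final, alpha, chain_length, step, rng):
    """Perturb, accept or reject, and cool with T = alpha * T until T <= t_final."""
    current, current_cost = list(chromosome), cost(chromosome)
    t = t0
    while t > t_final:
        for _ in range(chain_length):  # Markov chain at the current temperature
            neighbour = [g + rng.uniform(-step, step) for g in current]
            neighbour_cost = cost(neighbour)
            delta = neighbour_cost - current_cost
            if rng.random() <= acceptance_probability(delta, t):
                current, current_cost = neighbour, neighbour_cost
        t *= alpha  # cooling mechanism
    return current, current_cost
```

In a full hybrid, `cost` would be the fitness function f(n) evaluated on the decoded centroids, and the chromosome returned by `anneal` would be compared with the mutated chromosome before entering the next generation.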

3.2 Procedure for evaluation of four methods

Procedure of evaluating the four methods is shown in Figure 6.

Figure 6 The process for evaluating four methods (see online version for colours)

[Flowchart: the three data sets (Haberman's survival, iris, and liver disorders) are clustered by GFA (GA + FCM), IFA (IA + FCM), AFA (SA + FCM), and IGAFA (the hybrid + FCM); the results are evaluated with the validity index XB and ANOVA, leading to the conclusions.]

4 Experimental results

4.1 Data

This study conducted experiments on three data sets, as supplied by the UCI machine

learning repository website, and analysed the clustering results of various algorithms

against the data sets (Asuncion and Newman, 2007):

1 Haberman's survival data: the data set contains records of 306 patients, with three
attributes: the age of the patient at the time of operation, the year of operation, and
the number of positive axillary nodes detected. The cases are categorised into two
types depending on survival status: survival over five years, and death within five
years. The data distribution is shown in Figure 7.

2 Iris data: the data set contains 150 records with four features of the iris plant:
sepal length, sepal width, petal length, and petal width (measured in cm). The cases
are categorised into three types, setosa, versicolor, and virginica, each with 50
records. The distribution map is shown in Figure 8.

3 Liver disorder data: the data set contains 345 records, with five blood test results
(mcv: mean corpuscular volume; alkphos: alkaline phosphatase; sgpt: alanine
aminotransferase; sgot: aspartate aminotransferase; gammagt: gamma-glutamyl
transpeptidase) and the daily quantity of alcohol consumed (drinks: number of
half-pint equivalents of alcoholic beverages drunk per day). The cases can be
categorised into two types. The distribution map is shown in Figure 9.

Figure 7 Three-dimensional diagram of Haberman's survival

Figure 8 Scatter diagram of iris

Figure 9 Scatter diagram of liver disorder

4.2 Comparison of methods

The real optimal numbers of clusters (c*) of the data sets are listed in Table 1. With
these cluster numbers, each data set was clustered by GFA, IFA, AFA, and IGAFA.
Each algorithm was run 30 times, and the solving performance was compared with
respect to the XB index, convergence iterations, and computational time. This study
tested the significance of the differences among the algorithms using ANOVA, and
judged the relationships among the algorithms by Scheffé's multiple comparison
test (Peumans et al., 2007).

Table 2 shows the results of ANOVA. The XB indices for Haberman's survival and
liver disorders show no significant differences among the algorithms
(P-value > 0.05); however, those for the iris data set differ significantly
(P-value < 0.05). The algorithm relationships are determined using Scheffé's
multiple comparison, and the results are listed in Table 3. In the iris data set,
IGAFA is better than GFA, AFA, and IFA.
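For readers who want to reproduce this kind of comparison, the sketch below computes a one-way ANOVA F statistic and a Scheffé-style pairwise F value in plain Python. It uses common textbook forms; whether the paper's software used exactly these forms is an assumption, and the function names are hypothetical. P-values are not computed here because they require the F distribution.

```python
def one_way_anova(groups):
    """Return (F statistic, within-group mean square) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mu - grand_mean) ** 2 for g, mu in zip(groups, means))
    ss_within = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, ms_within

def scheffe_f(group_a, group_b, ms_within, k):
    """Scheffe pairwise F: squared mean difference scaled by (k - 1) * MSW * (1/na + 1/nb)."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    scale = (k - 1) * ms_within * (1.0 / len(group_a) + 1.0 / len(group_b))
    return (mean_a - mean_b) ** 2 / scale
```

In the setting of this study, each group would hold the 30 XB values (or iteration counts, or CPU times) obtained by one algorithm on one data set, with k = 4 algorithms. With equal group sizes, the pairwise F value is proportional to the squared mean difference.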

Table 1 Real optimal clusters (c*) in data sets

Data set   Haberman's survival   Iris   Liver disorders

c*         2                     3      2

Table 2 Results of ANOVA of XB index values of four methods
(H0: μ_GFA = μ_IFA = μ_AFA = μ_IGAFA; H1: otherwise)

                      Average XB index values
Data set              GFA       IFA       AFA       IGAFA     F-value  P-value  Result
Haberman's survival   0.223036  0.222972  0.222962  0.222955  1.725    0.16576  Accept H0
Iris                  0.161738  0.161737  0.161733  0.161688  12.798   2.78E-7  Reject H0
Liver disorders       0.125810  0.125639  0.125771  0.125584  1.482    1.48174  Accept H0


Table 3 Results of multiple comparison of XB index values of four methods

Data set              Method A  Method B  Mean difference  F-value  P-value  Result
Haberman's survival   IFA       GFA       6.42E-5          0.8555   0.466    IFA = GFA
                      AFA       GFA       7.40E-5          1.1388   0.336    AFA = GFA
                      AFA       IFA       9.87E-6          0.0202   0.996    AFA = IFA
                      IGAFA     GFA       8.11E-5          1.3654   0.257    IGAFA = GFA
                      IGAFA     IFA       1.69E-5          0.0593   0.981    IGAFA = IFA
                      IGAFA     AFA       7.03E-6          0.0103   0.999    IGAFA = AFA
                      Rank: IGAFA = AFA = IFA = GFA
Iris                  IFA       GFA       7.00E-7          0.0019   0.999    IFA = GFA
                      AFA       GFA       5.07E-6          0.0968   0.962    AFA = GFA
                      AFA       IFA       4.37E-6          0.0719   0.975    AFA = IFA
                      IGAFA     GFA       4.93E-5          9.1564   0.000    IGAFA < GFA
                      IGAFA     IFA       4.86E-5          8.8980   0.000    IGAFA < IFA
                      IGAFA     AFA       4.42E-5          7.3699   0.000    IGAFA < AFA
                      Rank: IGAFA < AFA = IFA = GFA
Liver disorders       IFA       GFA       1.72E-4          0.6324   0.596    IFA = GFA
                      AFA       GFA       3.88E-5          0.0323   0.992    AFA = GFA
                      AFA       IFA       1.33E-4          0.3788   0.768    AFA = IFA
                      IGAFA     GFA       2.26E-4          1.1002   0.352    IGAFA = GFA
                      IGAFA     IFA       5.48E-5          0.0644   0.979    IGAFA = IFA
                      IGAFA     AFA       1.88E-4          0.7555   0.521    IGAFA = AFA
                      Rank: IGAFA = AFA = IFA = GFA

Similarly, the convergence iterations of the four algorithms are compared. As shown
in Table 4, the convergence iterations of the algorithms differ significantly in all
three data sets (P-value < 0.05), and the algorithm relationships are determined
using Scheffé's multiple comparison. The results are listed in Table 5, wherein the
relative advantages of the algorithms are not fully consistent across data sets. In the
Haberman's survival data set, IGAFA is better than GFA, IFA, and AFA; in the iris
data set, IGAFA is better than GFA, AFA, and IFA; in the liver disorders data set,
IGAFA differs little from GFA or IFA, but converges in fewer iterations than AFA.

Table 4 Results of ANOVA of convergence iterations of four methods
(H0: μ_GFA = μ_IFA = μ_AFA = μ_IGAFA; H1: otherwise)

                      Average convergence iterations
Data set              GFA    IFA    AFA    IGAFA  F-value  P-value  Result
Haberman's survival   7.4    13.6   14.5   6.4    410.278  0.000    Reject H0
Iris                  17.9   22.3   19.7   12.5   129.836  0.000    Reject H0
Liver disorders       12.9   14.2   24.4   13.3   532.569  0.000    Reject H0


Table 5 Results of multiple comparison of convergence iterations of four methods

Data set              Method A  Method B  Mean difference  F-value   P-value  Result
Haberman's survival   IFA       GFA       6.2              151.5649  0.000    IFA > GFA
                      AFA       GFA       7.1              198.7614  0.000    AFA > GFA
                      AFA       IFA       0.9              3.1938    0.002    AFA > IFA
                      IGAFA     GFA       1.0              3.9429    0.010    IGAFA < GFA
                      IGAFA     IFA       7.2              204.3997  0.000    IGAFA < IFA
                      IGAFA     AFA       8.1              258.6934  0.000    IGAFA < AFA
                      Rank: IGAFA < GFA < IFA < AFA
Iris                  IFA       GFA       4.3              23.2293   0.000    IFA > GFA
                      AFA       GFA       1.7              3.7745    0.012    AFA > GFA
                      AFA       IFA       2.6              8.2763    0.000    AFA < IFA
                      IGAFA     GFA       5.5              38.0036   0.000    IGAFA < GFA
                      IGAFA     IFA       9.8              120.6568  0.000    IGAFA < IFA
                      IGAFA     AFA       7.2              65.7319   0.000    IGAFA < AFA
                      Rank: IGAFA < GFA < AFA < IFA
Liver disorders       IFA       GFA       1.0              2.9412    0.036    IFA > GFA
                      AFA       GFA       11.4             382.2353  0.000    AFA > GFA
                      AFA       IFA       10.4             318.1177  0.000    AFA > IFA
                      IGAFA     GFA       0.3              0.3268    0.805    IGAFA = GFA
                      IGAFA     IFA       0.7              1.3072    0.275    IGAFA = IFA
                      IGAFA     AFA       11.1             360.2092  0.000    IGAFA < AFA
                      Rank: IGAFA = GFA = IFA < AFA

The computational times of the four algorithms are compared in Table 6. The
execution times of the algorithms differ significantly in all three data sets
(P-value < 0.05). Scheffé's multiple comparison results are listed in Table 7, wherein
a shorter execution time represents higher efficiency. In the Haberman's survival and
iris data sets, the execution times of IGAFA and IFA do not differ significantly, but
both are better than those of AFA and GFA; in liver disorders, the execution time of
IGAFA is better than those of IFA, AFA, and GFA.

To sum up, IGAFA needs fewer convergence iterations and shorter computational
times; therefore, it has excellent computational efficiency in data clustering.

Table 6 Results of ANOVA of CPU time (sec) of four methods
(H0: μ_GFA = μ_IFA = μ_AFA = μ_IGAFA; H1: otherwise)

                      Average CPU time (sec)
Data set              GFA   IFA   AFA   IGAFA  F-value    P-value  Result
Haberman's survival   126   46    61    41     727.31338  0.000    Reject H0
Iris                  113   18    44    34     709.34078  0.000    Reject H0
Liver disorders       214   86    162   75     1084.8132  0.000    Reject H0


Table 7 Results of Scheffé's multiple comparison of CPU time (sec) of four methods

Data set              Method A  Method B  Mean difference  F-value   P-value  Result
Haberman's survival   IFA       GFA       79.5             502.9594  0.000    IFA < GFA
                      AFA       GFA       65.0             336.1252  0.000    AFA < GFA
                      AFA       IFA       14.5             16.7530   0.000    AFA > IFA
                      IGAFA     GFA       84.4             566.8928  0.000    IGAFA < GFA
                      IGAFA     IFA       4.9              1.9120    0.131    IGAFA = IFA
                      IGAFA     AFA       19.4             29.9843   0.000    IGAFA < AFA
                      Rank: IGAFA = IFA < AFA < GFA
Iris                  IFA       GFA       95.2             513.0825  0.000    IFA < GFA
                      AFA       GFA       69.4             272.8903  0.000    AFA < GFA
                      AFA       IFA       25.8             37.6006   0.000    AFA > IFA
                      IGAFA     GFA       98.3             547.2922  0.000    IGAFA < GFA
                      IGAFA     IFA       3.1              0.5520    0.642    IGAFA = IFA
                      IGAFA     AFA       28.9             47.2641   0.000    IGAFA < AFA
                      Rank: IGAFA = IFA < AFA < GFA
Liver disorders       IFA       GFA       128.0            688.2438  0.000    IFA < GFA
                      AFA       GFA       51.6             111.8463  0.000    AFA < GFA
                      AFA       IFA       76.4             245.1936  0.000    AFA > IFA
                      IGAFA     GFA       138.3            803.8519  0.000    IGAFA < GFA
                      IGAFA     IFA       10.3             4.4854    0.005    IGAFA < IFA
                      IGAFA     AFA       86.7             316.0054  0.000    IGAFA < AFA
                      Rank: IGAFA < IFA < AFA < GFA

5 Conclusions and suggestions

Since FCM is time-consuming when clustering from random initial centroids, this
study developed a hybrid algorithm, IGAFA, to improve clustering efficiency by
combining the global search capability of GA, the better solution retention
characteristics of IA, and the ability of SA to escape local solutions through random
moves. IGAFA could properly identify the initial cluster centroids of data sets, and
the execution efficiency of the algorithms was measured, in order to prevent long
solving times due to larger data sizes.

This study validated the clustering efficiency on three data sets: Haberman's
survival, iris, and liver disorders, with XB indices, execution times, and convergence
iterations as the comparison targets. The results indicated that IGAFA has
outstanding solving quality and thus can be used efficiently in various fields, such as
image recognition, part type selection, and customer clustering. Future research will
focus on multi-objective clustering algorithms.


Acknowledgements

This research was partially financially supported by the National Science Council under

project No. NSC 98-2410-H-027-002-MY2. The authors would like to thank the editors

and the three anonymous referees for their valuable comments.

