J Intell Inf Syst (2012) 38:321–341. DOI 10.1007/s10844-011-0158-3
Data clustering using bacterial foraging optimization
Miao Wan · Lixiang Li · Jinghua Xiao · Cong Wang · Yixian Yang
Received: 10 May 2010 / Revised: 16 March 2011 / Accepted: 17 March 2011 / Published online: 9 April 2011 © Springer Science+Business Media, LLC 2011
Abstract Clustering divides data into meaningful or useful groups (clusters) without any prior knowledge. It is a key technique in data mining and has become an important issue in many fields. This article presents a new clustering algorithm based on a mechanism analysis of Bacterial Foraging (BF). It is an optimization methodology for the clustering problem in which a group of bacteria forage and converge to certain positions, taken as the final cluster centers, by minimizing a fitness function. The quality of this approach is evaluated on several well-known benchmark data sets. Compared with the popular k-means algorithm, an ACO-based algorithm and a PSO-based clustering technique, experimental results show that the proposed algorithm is an effective clustering technique and can handle data sets with various cluster sizes, densities and multiple dimensions.
Keywords Data mining · Data clustering · Bacterial foraging optimization · Optimization based clustering
1 Introduction
Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters) (Jain et al. 1999). In the past fifty years, many
M. Wan (B) · L. Li · C. Wang · Y. Yang
Information Security Center, State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, P.O. Box 145, Beijing 100876, China
e-mail: wanmiao120@163.com
M. Wan · L. Li · C. Wang · Y. Yang
Key Laboratory of Network and Information Attack & Defence Technology of MOE,
Beijing University of Posts and Telecommunications, Beijing 100876, China
J. Xiao School of Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
research efforts have focused on the problem of clustering from both the theoretical and the practical points of view. The problem has been addressed in diverse areas such as pattern recognition, data analysis, image processing, economic science (especially market research) and biology, so the study of new clustering algorithms is an important issue in research fields including data mining, machine learning, statistics, and biology. In recent years, different clustering algorithms have been proposed, such as
partitioning (MacQueen 1967; Ng and Han 1994), hierarchical (Guha et al. 1998), density-based (Hinneburg and Keim 1998), grid-based (Sheikholeslami et al. 1998) and model-based (Dempster et al. 1977) methods. The partitioning approach constructs different partitions based on some criterion. In hard partitional clustering, each pattern belongs to one and only one cluster; fuzzy clustering (Bezdek 1981; Zhang and Leung 2004) extends this notion so that each pattern may belong to all clusters with a degree of membership. Apart from the above techniques, kernel k-means and spectral clustering have both been used to identify clusters that are nonlinearly separable in input space (Dhillon et al. 2005, 2007; Filippone et al. 2008).

The k-means algorithm (MacQueen 1967) is the most popular approach because of its simplicity, efficiency and low computational cost. However, since criterion functions for clustering are usually non-convex and nonlinear, traditional approaches, especially the standard k-means algorithm, are sensitive to initialization and easily trapped in locally optimal solutions. As the number and dimensionality of data sets increase, finding solutions to the criterion functions becomes an NP-hard problem. Some variants of the standard k-means method provide a fast, local search strategy to address this (Arthur and Vassilvitskii 2007; Kanungo et al. 2004). Given the importance of clustering strategies in many fields, global optimization methods such as genetic algorithms (GA), ant colony optimization (ACO) and particle swarm optimization (PSO) have been applied to solve clustering problems (Hruschka et al. 2006; Handl et al. 2006; Shelokar et al. 2004; van der Merwe and Engelbrecht 2003; Li et al. 2006; Wan et al. 2010). When solving clustering problems, these algorithms start from an initial population or position and explore the solution space through a number of iterations to reach a near-optimal solution.

Social insect behaviors such as finding the best food source, building an optimal nest structure, brooding, protecting the larvae and guarding show intelligent behavior at the swarm level (Engelbrecht 2002). Foraging is one such behavior and can be modelled as an optimization process in which an animal seeks to maximize energy intake per unit time spent foraging. This view led Passino to develop a new optimization algorithm inspired by the social foraging behavior of Escherichia coli (E. coli) bacteria, named Bacterial Foraging (BF) (Passino 2002). This recent optimization algorithm is gaining importance in optimization problems and has been successfully applied to several engineering problems, such as optimal controller design (Passino 2002; Kim et al. 2007), antenna array systems (Guney and Basbug 2008), active power filter synthesis (Mishra and Bhende 2007), and the learning of artificial neural networks (Kim and Cho 2005). Mathematical modelling, modification, and adaptation of the algorithm are likely to be a major part of future research on BF. As data clustering can be seen as a process of function optimization, BF may be applied to solve clustering problems with its global search capability.
In this paper we propose a new clustering algorithm (called BFC) for grouping data using the optimization property of bacterial foraging behavior. Instead of a high-speed local search, BFC is a global optimization-based algorithm, which provides a new point of view on the NP-hard clustering problem; it is also a brand-new application of Bacterial Foraging. In our algorithm, no centroid or center needs to be selected in the initial step. Moreover, in order to overcome the drawbacks of traditional algorithms, the proposed algorithm pursues a tripartite objective: (a) find a high-quality approximation to the optimal clustering solution; (b) perform well on high-dimensional data; (c) be insensitive to clusters of different size and density.

The rest of this paper is organized as follows. Section 2 gives a background on optimization-based clustering and the BF algorithm. Section 3 describes the whole process of the proposed BFC algorithm in detail. In Section 4 we give a brief introduction to three other clustering algorithms used for comparison and present four measures for performance evaluation. Section 5 presents experimental comparisons and discusses the results. Finally, conclusions and future work are given in Section 6.
2 Background
2.1 Optimization based clustering
Clustering is a data mining technique which classifies objects into groups (clusters) without any prior knowledge. The common clustering problem can be formally stated as follows. Given a sample data set X = {x_1, x_2, ..., x_n}, determine a partition of the objects into K clusters C_1, C_2, ..., C_K which satisfies:

\[
\begin{cases}
\bigcup_{i=1}^{K} C_i = X; \\
C_i \cap C_j = \emptyset, & i, j = 1, 2, \dots, K, \; i \neq j; \\
C_i \neq \emptyset, & i = 1, 2, \dots, K.
\end{cases}
\tag{1}
\]
From a mathematical viewpoint, cluster C_i can be determined by:

\[
\begin{cases}
C_i = \{x_j \mid \|x_j - z_i\| \le \|x_j - z_p\|, \; p \neq i, \; p = 1, 2, \dots, K, \; x_j \in X\}, \\
z_i = \dfrac{1}{|C_i|} \displaystyle\sum_{x_j \in C_i} x_j, \quad i = 1, 2, \dots, K,
\end{cases}
\tag{2}
\]
where \|\cdot\| denotes the distance between any two data points in the sample set, and z_i is the center of cluster C_i, represented by the average (mean) of all the points in the cluster.
A clustering criterion must be adopted. The most commonly used criterion in clustering tasks is the Sum of Squared Error (SSE) (Tan et al. 2006):
\[
\mathrm{SSE} = \sum_{i=1}^{K} \sum_{x_j \in C_i} \|x_j - z_i\|^2.
\tag{3}
\]
For each data point in the given set, the error is the distance to the nearest cluster center. The general objective of clustering is to obtain the partition which, for a fixed number of clusters, minimizes the squared error. Thus, the clustering problem is converted into a search for K centers z_1, z_2, ..., z_K which minimize the sum of distances between each sample datum x_i and its closest center. This can be considered a function optimization problem with SSE as the objective function.
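As an illustration of (3) viewed as an objective over candidate centers, the following is a minimal Python sketch (our code, not the paper's): a plain nearest-center SSE under Euclidean distance.

```python
import math

def sse(data, centers):
    """Sum of squared errors (3): each point is charged to its closest center."""
    total = 0.0
    for x in data:
        # squared Euclidean distance from x to every candidate center
        d2 = [sum((xi - zi) ** 2 for xi, zi in zip(x, z)) for z in centers]
        total += min(d2)
    return total

# Minimizing sse over the K center positions is exactly the clustering problem.
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
centers = [(0.0, 0.5), (10.0, 10.0)]
print(sse(data, centers))  # 0.5: two points contribute 0.25 each, one lies on a center
```

Any global optimizer that can minimize `sse` over the center coordinates, BF included, thereby solves the clustering task.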
2.2 The bacterial foraging (BF) algorithm
The BF algorithm (Passino 2002) is a recent stochastic global search technique based on the foraging behavior of the E. coli bacteria present in the human intestine. The ideas from bacterial foraging can be utilized to solve non-gradient optimization problems through three processes, namely chemotaxis, reproduction, and elimination–dispersal. Generally, as a group, the E. coli bacteria try to find food and avoid harmful phenomena during foraging, and after a certain time period they recover and return to some standard behavior in a homogeneous medium. An E. coli bacterium can move in two different ways, tumbling and swimming, and it alternates between these two modes of operation for its entire lifetime. This alternation between the two modes, called chemotactic steps, moves the bacterium in random directions and enables it to "search" for nutrients. After the bacterium has collected a given amount of nutrients, it can self-reproduce and divide into two. The bacteria population can also be changed (e.g., killed or dispersed) by the local environment.
A BF optimization algorithm can be explained as follows:
Given a D-dimensional search space ℝ^D, try to find the minimum of an objective function J(θ), θ ∈ ℝ^D, where we have neither measurements nor an analytical description of the gradient ∇J(θ). Here, ideas from bacterial foraging are used to solve this non-gradient optimization problem. Let θ^i(j), i = 1, 2, ..., S, represent the position of each member of the population of S bacteria at the jth chemotactic step. Choose C(i) > 0 (i = 1, 2, ..., S) to denote a basic chemotactic step size taken in the random direction specified by the tumble. To represent a tumble, a unit-length random direction φ(j) is generated; it is also used in the swim phase that follows the tumble. The position of bacterium i in one step is therefore updated as:

\[
\theta^i(j+1) = \theta^i(j) + C(i)\,\phi(j).
\tag{4}
\]

If J(θ^i(j+1)) < J(θ^i(j)), another step in the same direction is taken. This swimming iteration continues as long as it keeps reducing the objective function, but only up to a maximum number of steps, N_s. After N_c chemotactic steps, a reproduction step is taken: the S_r (half of the population) healthiest bacteria each split into two bacteria, which are placed at the same location. Finally, each bacterium in the population is subjected to an elimination–dispersal process with probability p_ed.
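The chemotaxis mechanism above can be sketched in a few lines of Python. This is a simplified illustration of ours, not Passino's implementation: here a tumble move is kept only when it improves J, and all function and parameter names are our own.

```python
import math
import random

def tumble(dim):
    """Unit-length random direction phi(j): a random vector normalized to length 1."""
    delta = [random.uniform(-1.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(d * d for d in delta)) or 1.0
    return [d / norm for d in delta]

def chemotactic_step(theta, J, C, Ns):
    """One tumble followed by up to Ns swim steps while J keeps decreasing, as in (4)."""
    phi = tumble(len(theta))
    for _ in range(Ns):
        candidate = [t + C * p for t, p in zip(theta, phi)]
        if J(candidate) < J(theta):
            theta = candidate      # keep swimming in the same direction
        else:
            break                  # stop: the objective no longer improves
    return theta

# Toy objective with its minimum at the origin.
random.seed(0)
theta = [5.0, -3.0]
for _ in range(200):               # Nc chemotactic steps for a single bacterium
    theta = chemotactic_step(theta, lambda t: sum(x * x for x in t), C=0.1, Ns=4)
print(theta)
```

In the full algorithm an S-member population runs this loop, wrapped in the reproduction and elimination–dispersal phases described above.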
3 Proposed methodology: the BFC algorithm
In this section we describe in detail how bacterial foraging optimization solves the general clustering problem.
3.1 The BF based clustering (BFC) algorithm
As we have just mentioned in Section 2.1, clustering tasks can be considered as optimization problems. Firstly, the fitness function should be specified. Here we choose SSE in (3) to be the required function J in BFC:
\[
J(w, z) = \sum_{c=1}^{K} \sum_{t=1}^{n} \sum_{d=1}^{D} w_{tc}\, \|x_{td} - z_{cd}\|^2,
\tag{5}
\]

where D is the dimension of the search space, w is a weight matrix of size n × K, and w_{tc} is the associated weight of data point x_t with cluster c, assigned as

\[
w_{tc} =
\begin{cases}
1 & \text{if } x_t \text{ is labelled to cluster } c, \\
0 & \text{otherwise},
\end{cases}
\quad t = 1, \dots, n, \quad c = 1, \dots, K.
\]
Algorithm 1 introduces the proposed BFC algorithm. In BFC, an S-size population of bacteria is generated for each center, so S × K bacteria change positions in search of the minimum cost through foraging behaviors. A virtual bacterium is actually one trial solution (it may be called a search agent) that moves on the functional surface to locate the global optimum. Initially, S data points are randomly drawn from X as bacteria for each center z_c (line 1 in Algorithm 1). Then for every bacterium i the chemotaxis process starts (lines 4–20 in Algorithm 1), and all the bacteria update their positions for N_c iterations. The agents first perform a tumble in a unit-length random direction Δ(i)/√(Δ^T(i)Δ(i)), where Δ(i) ∈ ℝ^D is a random vector with each element a random number on [−1, 1], scaled by a basic chemotaxis step size C(i) (line 6 in Algorithm 1), and then swim to minimize the objective function J up to a maximum number of steps N_s (lines 9–18 in Algorithm 1). The chemotaxis process sits inside an N_re-step reproduction loop (lines 3, 21–23 in Algorithm 1), which is in turn encapsulated in an N_ed-length elimination–dispersal phase during which a percentage p_ed of the bacteria are dispersed at random (lines 2, 24–25 in Algorithm 1). All the bacteria converge to certain places in the search space after the iteration process, and their final positions are taken as the required centers. All the data are then allocated according to (2) into the different clusters represented by these final centers, and every data object is assigned a corresponding cluster label (lines 26–33 in Algorithm 1).
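The final allocation step (lines 26–33 of Algorithm 1) is a plain nearest-center rule. The following is a hypothetical Python sketch of it (our names, squared Euclidean distance, not the paper's code):

```python
def assign_clusters(data, centers):
    """Allocate each point to the cluster of its nearest final center."""
    clusters = [[] for _ in centers]
    for x in data:
        # squared distance (order-preserving) from x to every center
        d = [sum((xi - zi) ** 2 for xi, zi in zip(x, z)) for z in centers]
        p = d.index(min(d))        # the position p of min(d)
        clusters[p].append(x)
    return clusters

data = [(0.0, 0.0), (0.1, 0.2), (9.0, 9.0)]
centers = [(0.0, 0.0), (10.0, 10.0)]
clusters = assign_clusters(data, centers)
print([len(c) for c in clusters])  # [2, 1]
```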
3.2 Guidelines for algorithm parameter setting
The bacterial foraging optimization algorithm requires the initialization of a variety of parameters, and the author of Passino (2002) provided a set of guidelines for parameter choices in BF. Since our methodology builds on the basic idea of BF, these guidelines work in BFC as well. In BFC, the size of the bacteria population S should be picked first: enlarging S apparently increases the computing time but makes it easier to find the optimum. Next, there is a three-layer optimization loop in BFC of size N_iter = N_c × N_re × N_ed. The larger N_iter is, the better the optimization progress, but also the more
Algorithm 1 The BFC Algorithm
Require:
  Data set, X = {x_1, x_2, ..., x_n};
  Cluster number, K.
Ensure:
  Clusters: {C_1, C_2, ..., C_K}.
1: Initialize K centers for C_1, C_2, ..., C_K: generate S data points {b_c^1, b_c^2, ..., b_c^S} from X randomly as the positions of the bacteria for each cluster center z_c (c = 1, 2, ..., K).
2: for l = 1 : N_ed do
3:   for k = 1 : N_re do
4:     for j = 1 : N_c do
5:       for i = 1 : S do
6:         b_c^i(j+1, k, l) = b_c^i(j, k, l) + C(i) Δ(i)/√(Δ^T(i)Δ(i))
7:         Calculate J(i, j, k, l) with the current b_c^i(j, k, l)
8:         J_last = J(i, j, k, l)
9:         while m < N_s do
10:          m = m + 1
11:          if J(i, j+1, k, l) < J_last then
12:            b_c^i(j+1, k, l) = b_c^i(j+1, k, l) + C(i) Δ(i)/√(Δ^T(i)Δ(i))
13:            J_last = J(i, j+1, k, l)
14:            z_c(j, k, l) = b_c^i(j+1, k, l)
15:          else
16:            m = N_s
17:          end if
18:        end while
19:      end for
20:    end for
21:    J_health^i = Σ_{j=1}^{N_c+1} J(i, j, k, l)
22:    Reproduce(X, J_health)
23:  end for
24:  Elimination–dispersal(X, p_ed)
25: end for
26: for t = 1 : n do
27:   for c = 1 : K do
28:     Calculate distance d_c = ||x_t − z_c||
29:   end for
30:   d = {d_1, d_2, ..., d_K}
31:   Find the position p of min(d)
32:   C_p.add(x_t)
33: end for
computational complexity there is. If N_iter is too small, the algorithm can more easily get trapped in a local minimum. The bacteria then swim in random directions for up to N_s steps; large values of N_s make the bacteria explore more directions and tend to give better results, but again at greater computational cost. In addition, if p_ed is large, the algorithm can degrade to a random exhaustive search. If, however, it
is chosen appropriately, it can help the algorithm jump out of local optima and toward the global optimum. Finally, C(i) is the only parameter that appears in the iteration function (4) and can be seen as a kind of "step size" for the BF optimization algorithm. One can choose a biologically motivated value; however, such values may not be the best for an engineering application (Passino 2002). If the C(i) values are too large and the optimum lies in a valley with steep edges, the search will tend to jump out of the valley, or it may simply miss local minima by swimming through them without stopping. On the other hand, if the C(i) values are too small, convergence can be slow, but if the search finds a local minimum it will typically not deviate far from it. In Section 5 we set up experiments to investigate the parameters of BFC.
4 Cluster validity and compared methods
One of the most important issues in cluster analysis is the evaluation of clustering results to find the partitioning that best fits the underlying data. The procedure of evaluating the results of a clustering algorithm is known as cluster validity. Furthermore, in order to show the merits of the proposed algorithm, some existing methods are selected for comparison during cluster validation.
4.1 Cluster validity
Two kinds of cluster validity approaches are adopted in this article. The first is based on external criteria, which evaluate the results of the proposed BFC algorithm by comparison with the pre-specified class label information of the data set. The second is based on internal criteria, with which we evaluate the clustering performance of the BFC algorithm without any prior knowledge of the data sets. Two external validity measures, Rand and Jaccard (Theodoridis and Koutroumbas 2006), as well as two internal validity measures, Beta (Pal et al. 2000) and the Distance index, are utilized for the performance evaluation of the BFC algorithm and the methods it is compared with.
• Rand coefficient (R): It determines the degree of similarity between the known correct cluster structure and the results obtained by a clustering algorithm (Theodoridis and Koutroumbas 2006). It is defined as
\[
R = \frac{SS + DD}{SS + SD + DS + DD},
\tag{6}
\]

where SS, SD, DS, DD represent the numbers of possible pairs of data points for which:

SS: both data points belong to the same cluster and the same group.
SD: both data points belong to the same cluster but different groups.
DS: both data points belong to different clusters but the same group.
DD: both data points belong to different clusters and different groups.
Note that if there are N data points in a data set, M = SS + SD + DS + DD, where M is the total number of possible data pairs and its value equals N(N − 1)/2.
The value of R lies in the range [0, 1], and the higher the value of R, the better the clustering.
• Jaccard coefficient (J): It is the same as the Rand coefficient except that it excludes DD, and is defined as

\[
J = \frac{SS}{SS + SD + DS}.
\tag{7}
\]
The value of J lies in the interval [0, 1]. The higher the value of J, the better the clustering performance.
• Beta index (β): It computes the ratio of total variation to within-class variation (Pal et al. 2000), and is defined as

\[
\beta = \frac{\sum_{i=1}^{C} \sum_{j=1}^{n_i} (X_{ij} - \bar{X})^2}{\sum_{i=1}^{C} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2},
\tag{8}
\]

where \bar{X} is the mean of all the data points and \bar{X}_i is the mean of the data points belonging to cluster C_i; X_{ij} is the jth data point of the ith cluster and n_i is the number of data points in cluster C_i. Since the numerator of β is constant for a given data set, the value of β depends only on the denominator, which decreases with the homogeneity of the formed clusters. Therefore, for a given data set, the higher the value of β, the better the clustering (Pal et al. 2000). Note that (X_{ij} − \bar{X}) can be calculated as the Euclidean distance between the two vectors X_{ij} and \bar{X}.
• Distance index (Dis = Intra/Inter): It computes the ratio of the average intra-cluster distance to the average inter-cluster distance. The intra-cluster distance is the distance between a point and its cluster center; we take the average of all of these distances, called Intra, defined as

\[
\mathrm{Intra} = \frac{1}{n} \sum_{i=1}^{K} \sum_{x_j \in C_i} \|x_j - z_i\|^2,
\tag{9}
\]
where n is the total number of objects in the data set. The inter-cluster distance between two clusters is defined as the distance between their centers. We calculate the average of all of these distances as follows:

\[
\mathrm{Inter} = \frac{1}{K} \sum_{i=1}^{K-1} \sum_{j=i+1}^{K} \|z_i - z_j\|^2.
\tag{10}
\]

A good clustering method should produce clusters with high intra-class similarity and low inter-class similarity, so clustering results can be measured by combining the average intra-cluster distance (Intra) and the average inter-cluster distance (Inter) as a ratio:

\[
\mathrm{Dis} = \frac{\mathrm{Intra}}{\mathrm{Inter}}.
\tag{11}
\]
Therefore, we want to minimize the value of measure Dis.
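For concreteness, the two external indices (6) and (7) can be computed by straightforward pair counting over all M = N(N − 1)/2 pairs. The following Python sketch is our illustration (not code from the paper); `labels_true` holds the known groups and `labels_pred` the clustering output.

```python
from itertools import combinations

def pair_counts(labels_true, labels_pred):
    """Count SS, SD, DS, DD over all N(N-1)/2 point pairs."""
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_group = labels_true[i] == labels_true[j]      # known structure
        same_cluster = labels_pred[i] == labels_pred[j]    # algorithm output
        if same_cluster and same_group:
            ss += 1
        elif same_cluster:
            sd += 1
        elif same_group:
            ds += 1
        else:
            dd += 1
    return ss, sd, ds, dd

def rand_index(lt, lp):
    ss, sd, ds, dd = pair_counts(lt, lp)
    return (ss + dd) / (ss + sd + ds + dd)          # equation (6)

def jaccard_index(lt, lp):
    ss, sd, ds, dd = pair_counts(lt, lp)
    return ss / (ss + sd + ds)                      # equation (7)

truth = [0, 0, 1, 1]
pred = [0, 0, 1, 1]
print(rand_index(truth, pred), jaccard_index(truth, pred))  # 1.0 1.0 for a perfect match
```

The internal measures β and Dis follow the same pattern, but need the data coordinates and centers rather than only the label vectors.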
4.2 Methods for comparison
To demonstrate the merits of the proposed BFC algorithm, we select some existing clustering techniques for comparison. First we choose the k-means algorithm (MacQueen 1967), because it is the most famous conventional clustering technique. The k-means algorithm is a partition-based clustering approach (see Algorithm 2) and has been widely applied for decades.
Algorithm 2 The k-means Clustering Algorithm
Require:
  Data set, X = {x_1, x_2, ..., x_n};
  Cluster number, K.
Ensure:
  Clusters: {C_1, C_2, ..., C_K}.
1: Initialize K centers for C_1, C_2, ..., C_K: randomly select K data points from X as the initial centroid vectors.
2: repeat
3:   Assign each data point to its closest centroid and form K clusters by (2).
4:   Recompute the centroid of each cluster.
5: until the centroid vectors do not change.
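Algorithm 2 can be sketched as follows. This is our minimal Python illustration, assuming Euclidean distance and random initial centroids; it is not the authors' implementation.

```python
import random

def kmeans(data, k, max_iter=100, seed=0):
    """Plain k-means: assign to the nearest centroid, recompute means, stop at a fixed point."""
    rng = random.Random(seed)
    centers = rng.sample(data, k)                    # step 1: K random data points
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # step 3: form K clusters by the nearest-centroid rule (2)
        clusters = [[] for _ in range(k)]
        for x in data:
            d = [sum((a - b) ** 2 for a, b in zip(x, z)) for z in centers]
            clusters[d.index(min(d))].append(x)
        # step 4: recompute each centroid as the mean of its cluster
        new_centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:                   # step 5: centroids unchanged
            break
        centers = new_centers
    return centers, clusters

data = [(0.0, 0.0), (0.2, 0.1), (8.0, 8.0), (8.1, 7.9)]
centers, clusters = kmeans(data, 2)
print(sorted(len(c) for c in clusters))  # [2, 2] on this well-separated toy set
```

The sensitivity to initialization discussed in Section 1 is visible here: `centers` depends on the random `seed`, and on harder data different seeds can yield different local optima of SSE.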
Moreover, as a global optimization-based methodology, the BFC algorithm is compared with ant-based clustering (Handl et al. 2006) and a PSO-based clustering technique (van der Merwe and Engelbrecht 2003). Ant colony optimization (ACO) (Dorigo and Maniezzo 1996) was designed to emulate the behavior of ants laying pheromone on the ground while moving, in order to solve optimization problems. Handl et al. (2006) presented an instance of ACO for clustering which returns an explicit partitioning of the data by an automatic process. The ACO algorithm imitates these mechanisms by choosing solutions based on pheromones and updating pheromones based on solution quality (shown in Algorithm 3). Particle swarm optimization (PSO) (Kennedy and Eberhart 1995) is a population-based algorithm. It is a global optimization method that simulates bird flocking or fish schooling behavior to achieve a self-evolving system. The clustering approach using PSO can automatically search for the data centers of a K-group data set by optimizing the objective function (see Algorithm 4). In Section 5, we set up a series of experiments to compare the BFC, k-means, ACO-based and PSO-based clustering algorithms.
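The PSO velocity and position updates used in the PSO-based method (lines 6–7 of Algorithm 4) take the standard form. Below is a minimal single-particle Python sketch of ours, using the w, c1, c2 values adopted later in Section 5.2.5; it is an illustration, not the compared implementation.

```python
import random

def pso_step(M, v, P_i, P_g, w=0.72, c1=1.49, c2=1.49, rng=random):
    """One PSO update: new velocity = inertia + cognitive + social terms, then move."""
    new_v = [
        w * vd + c1 * rng.random() * (pd - md) + c2 * rng.random() * (gd - md)
        for vd, md, pd, gd in zip(v, M, P_i, P_g)
    ]
    new_M = [md + vd for md, vd in zip(M, new_v)]
    return new_M, new_v

# Toy run: a single particle pulled toward a fixed best position at the origin.
random.seed(1)
M, v = [5.0, 5.0], [0.0, 0.0]
best = [0.0, 0.0]
for _ in range(100):
    M, v = pso_step(M, v, best, best)
print(M)  # drifts toward the best position
```

In the clustering variant each particle's position M encodes all K centroid vectors at once, and the fitness is the clustering objective J.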
5 Experiments
In this section we present several simulation experiments, run on the Matlab platform, to give a detailed illustration of the effectiveness and feasibility of the proposed approach.
Algorithm 3 The ACO-based Clustering Algorithm
Require:
  Data set, X = {x_1, x_2, ..., x_n};
  Cluster number, K.
Ensure:
  Clusters: {C_1, C_2, ..., C_K}.
1: Initialize pheromones. Randomly scatter data items on the toroidal grid, and generate positions of R ants randomly from the data space for each center.
2: for j = 1 : iter_max do
3:   for i = 1 : R do
4:     Let each data item belong to one cluster with the probability threshold q
5:     Calculate the objective function J(i, j) with the current centers
6:     J_last = J(i, j)
7:     Construct solution S_i using the pheromone trail
8:     Calculate the new cluster centers; calculate J(i, j+1) with the current centers
9:     if J(i, j+1) < J_last then
10:      S_i(j+1) = S_i(j) // Save the best solution among the R solutions found.
11:    end if
12:  end for
13:  Update the pheromone level on all data according to the best solution.
14:  {z_1, z_2, ..., z_K} = S_b // Update the cluster centers by the center values of the best solution.
15: end for
16: for t = 1 : n do
17:   for c = 1 : K do
18:     Calculate distance d_c = ||x_t − z_c||
19:   end for
20:   d = {d_1, d_2, ..., d_K}
21:   Find the position p of min(d)
22:   C_p.add(x_t)
23: end for
5.1 Data source
Two different types of benchmark data sets are used: two synthetic data sets (Handl and Knowles 2008) that permit the modulation of specific data properties, and five real data sets provided by the UCI Machine Learning Repository (UCI Machine Learning Repository 2007). Both synthetic data sets in our work follow x-dimensional normal distributions N(μ, σ) from which the data items are drawn into y different clusters. The sample size s of each cluster, the mean vector μ and the vector of standard deviations σ are themselves randomly determined using uniform distributions over fixed ranges (with s ∈ [50, 450], μ_i ∈ [−10, 10] and σ_i ∈ [0, 5]). Consequently, the clusters in each data set differ in both size and density. The first set, which we call 2D4C, is a 2-dimensional data set arranged in ([−20, 20], [−12, 8]) and contains 4 clusters with 528, 348, 272 and 424 instances each
Algorithm 4 The PSO-based Clustering Algorithm
Require:
  Data set, X = {x_1, x_2, ..., x_n};
  Cluster number, K.
Ensure:
  Clusters: {C_1, C_2, ..., C_K}.
1: Initialize the position M and velocity v of S particles randomly, in which each single particle M_i (i = 1, 2, ..., S) contains K randomly generated centroid vectors: M_i = {m_i1, m_i2, ..., m_iK}.
2: for j = 1 : iter_max do
3:   for i = 1 : S do
4:     Calculate the objective function J(i, j) with the current M_i(j)
5:     J_last = J(i, j)
6:     v_i(j+1) = w ∗ v_i(j) + c_1 ∗ rand() ∗ (P_i(j) − M_i(j)) + c_2 ∗ rand() ∗ (P_g − M_i(j))
7:     M_i(j+1) = M_i(j) + v_i(j+1)
8:     Calculate J(i, j+1) with the current M_i(j+1)
9:     if J(i, j+1) < J_last then
10:      P_i(j+1) = M_i(j+1) // P_i represents the local best position, the best position found so far by particle i.
11:    else
12:      P_i(j+1) = P_i(j)
13:    end if
14:  end for
15:  Update the global best position P_g: select the best P_i from {P_1, P_2, ..., P_S} as P_g. // P_g represents the global best position in the neighborhood of each particle.
16:  {z_1, z_2, ..., z_K} = P_g
17: end for
18: for t = 1 : n do
19:   for c = 1 : K do
20:     Calculate distance d_c = ||x_t − z_c||
21:   end for
22:   d = {d_1, d_2, ..., d_K}
23:   Find the position p of min(d)
24:   C_p.add(x_t)
25: end for
(see Fig. 1). The second data set, named 10D4C, contains a total of 1,289 items spread over 4 clusters based on 10 different features. The 5 data sets from UCI that we employ in our experiments are well-known databases that can easily be found in the data mining and pattern recognition literature. The Iris data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant and can be treated as a cluster in the experiments. Each instance has 4 features representing sepal length, sepal width, petal length and petal width, respectively. The Wine data are the results of a chemical analysis of wines grown in the same region of Italy but derived from three different cultivars. This set contains 3 clusters and has 59,
Fig. 1 The original 2-dimensional data distribution in space
71, 48 instances for each cluster. The Glass data set has 214 instances describing 6 classes of glass based on 9 features. The Zoo data set is a simple database containing 101 animal instances with 16 Boolean-valued attributes, classified into 7 categories. Ionosphere contains 351 radar data points with 34 continuous features and was collected by a system in Goose Bay, Labrador. This system consists of a phased array of 16 high-frequency antennas with a total transmitted power on the order of 6.4 kilowatts. The targets were free electrons in the ionosphere. "Good" radar returns are those showing evidence of some type of structure in the ionosphere; "bad" returns are those that do not, and their signals pass through the ionosphere. The data points in all 5 of these data sets are scattered in high-dimensional spaces. The data sets used in our study are summarized in Table 1.
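The synthetic-data construction described above (cluster size, mean and standard deviation drawn uniformly from fixed ranges, points drawn from per-dimension normal distributions) can be sketched as follows. This is a hypothetical generator of ours, not the generator of Handl and Knowles (2008).

```python
import random

def make_synthetic(dim, k, rng=random):
    """Generate one data set in the style of 2D4C/10D4C: for each of the k clusters,
    draw size s in [50, 450], mean mu_i in [-10, 10], std sigma_i in [0, 5]."""
    data, labels = [], []
    for c in range(k):
        s = rng.randint(50, 450)                          # cluster size
        mu = [rng.uniform(-10, 10) for _ in range(dim)]   # mean vector
        sigma = [rng.uniform(0, 5) for _ in range(dim)]   # per-dimension std
        for _ in range(s):
            data.append(tuple(rng.gauss(m, sd) for m, sd in zip(mu, sigma)))
            labels.append(c)
    return data, labels

random.seed(42)
data, labels = make_synthetic(dim=2, k=4)
print(len(data), len(set(labels)))  # between 200 and 1800 points in 4 labelled clusters
```

Because both the sizes and the spreads are random, the resulting clusters naturally differ in size and density, which is exactly the property objective (c) in Section 1 targets.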
Table 1 Summarization of data sets

Data sets    Instances   Features/dimensions   Clusters
2D4C         1,572       2                     4
10D4C        1,289       10                    4
Iris         150         4                     3
Wine         178         13                    3
Glass        214         9                     6
Zoo          101         16                    7
Ionosphere   351         34                    2
5.2 Parameter investigation
Parameter selection is an important part of optimizationbased approaches. In this subsection we present results from our investigations on the impacts of some key parameters based on the guidelines in Section 3.2, and assign initial values for them.
5.2.1 Chemotaxis step size C(i)
In BF, C(i) is the size of the chemotaxis step and can be initialized with biologically motivated values. However, since a biologically motivated value may not be the best for an engineering application (Passino 2002), it should be chosen according to our data clustering tasks. Figure 2 illustrates the relationship between the objective function and the number of chemotactic steps N_c for different C(i). From Fig. 2 we can see that the smaller the chemotaxis step size C(i), the faster the objective function converges. Since SSE reaches its smallest value at C(i) = 0.1, we select 0.1 as the value of C(i) for the proposed BFC algorithm in the subsequent tasks.
5.2.2 Chemotactic step N _{c} and swim step N _{s}
Next, large values of N_c result in many chemotactic steps and, hopefully, more optimization progress, but of course more computational complexity. Figure 3 presents the relationship between the objective function and the number of chemotactic steps
Fig. 2 Performance of BFC for Iris data with different C(i)
Fig. 3 Performance values for the five different swim step sizes for N_c from 1 to 100
N_c for different lifetimes N_s of the bacteria. Evidently, the smaller the swim step N_s, the faster the objective function converges. From Fig. 3 we can also see that BFC converges to the smallest SSE at N_s = 4 and N_s = 6; however, the objective function converges faster at N_s = 4. We thus choose N_s = 4 and N_c = 100 in our data clustering tasks.
5.2.3 Reproduction step N _{r}_{e} and elimination–dispersal step N _{e}_{d}
If N_c is large enough, the value of N_re affects how well the algorithm ignores bad regions and focuses on good ones. If N_re is too small, the algorithm may converge prematurely; however, larger values of N_re clearly increase computational complexity. A low value of N_ed dictates that the algorithm will not rely on random elimination–dispersal events to find favorable regions, while a high value increases computational complexity but allows the bacteria to look in more regions for good nutrient concentrations. Figures 4 and 5 depict the values of the objective function (SSE) and the corresponding elapsed times for experiments with N_re from 2 to 6 and N_ed from 1 to 5. It is easy to see in Figs. 4 and 5 that the larger N_re or N_ed is, the more slowly BFC converges. Moreover, SSE changes only slightly beyond N_re = 4 and N_ed = 2, while the elapsed times increase significantly. Based on these results, we choose N_re = 4 and N_ed = 2 in our applications.
Fig. 4 SSE and the computing time of BFC for Iris data with different N_re
5.2.4 Elimination–dispersal probability p_ed
In BF, if p_ed is large, the algorithm can degrade to a random exhaustive search. However, an appropriate choice of p_ed can help the algorithm jump out of local optima toward the global optimum. Figure 6 shows the relationship between objective
Fig. 5 SSE and the computing time of BFC for Iris data with different N_ed
Fig. 6 Performance values for the eight different elimination–dispersal probabilities p_ed
function values and different p_ed. Apparently, BFC obtains the smallest SSE value at p_ed = 0.25.
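The elimination–dispersal event governed by p_ed can be sketched as below; this is an illustrative Python fragment (names and bounds are ours), not the paper's Matlab code:

```python
import numpy as np

def eliminate_disperse(positions, p_ed, bounds, rng):
    """Elimination–dispersal: with probability p_ed, each bacterium is
    killed and re-dispersed to a random point in the search space,
    which helps the swarm escape local optima."""
    low, high = bounds
    out = positions.copy()
    for i in range(len(out)):
        if rng.random() < p_ed:
            out[i] = rng.uniform(low, high, size=out.shape[1])
    return out

rng = np.random.default_rng(2)
pop = np.zeros((6, 2))                                     # 6 bacteria in 2-D
new_pop = eliminate_disperse(pop, 0.25, (0.0, 10.0), rng)  # p_ed = 0.25
```

With p_ed near 1 every bacterium is re-seeded each event, which is exactly the random-search degeneration warned about above; p_ed = 0.25 re-disperses only about a quarter of the swarm per event.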
5.2.5 Other parameters
For PSO, we use 50 particles and set w = 0.72 and c_1 = c_2 = 1.49; these values were chosen to ensure good convergence (van den Bergh 2002). For ACO, the authors designed techniques to set the parameters for optimal performance (Handl et al. 2006); in our implementation we follow the same settings and choose 10 ants and 1,000 iteration steps. For BFC, based on the investigations in the previous subsections, we choose S = 50, N_c = 100, N_s = 4, N_re = 4, N_ed = 2 and p_ed = 0.25.
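For reference, the BFC settings selected in this section can be collected in one place; the dictionary layout is ours, but the values are those reported above:

```python
# Parameter settings selected in Section 5.2 for the BFC experiments.
BFC_PARAMS = {
    "S": 50,        # population size (number of bacteria)
    "N_c": 100,     # chemotactic steps
    "N_s": 4,       # maximum swim length
    "N_re": 4,      # reproduction steps
    "N_ed": 2,      # elimination-dispersal steps
    "p_ed": 0.25,   # elimination-dispersal probability
    "C": 0.1,       # chemotaxis step size C(i)
}
```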
5.3 Results and analysis
For all the results reported, average values of the different performance indices over 30 simulations, with their corresponding standard deviations (shown in brackets), are given for each data set. Euclidean distance is used to measure the distance between data points in our work. Each performance measure is followed by the algorithm's corresponding rank (from 1 to 3). Table 2 summarizes the clustering results obtained by the k-means, ACO, PSO and proposed BFC algorithms on the different data sets. From the clustering results of the real data sets shown in Table 2, and according to the properties of the data sets described in Table 1, the following conclusions can be drawn:
(1) In terms of the external validity measures (Rand and Jaccard indices), the proposed BFC algorithm performs better for most of the
Table 2  Values of performance measures by the k-means, ACO, PSO and the proposed BFC algorithms (standard deviations in parentheses)

Data set    Method    Rand          Jaccard        β             Dis = Intra/Inter  Time (s)
2D4C        k-means   0.8636        0.8021         11.1319^c     0.01079            0.23058^a
                      (0.010365)    (0.017803)     (0.684979)    (0.289137)         (0.011049)
            PSO       0.9941^a      0.9778^a       12.565^b      0.00998^a          10.87969
                      (0.0061857)   (0.0094362)    (0.669946)    (0.598628)         (0.419201)
            ACO       0.9916^c      0.9558^c       1.3874        0.01012^c          10.19844^c
                      (0.031725)    (0.0107)       (0.051462)    (0.004398)         (1.74367)
            BFC       0.9920^b      0.9702^b       13.249^a      0.01006^b          6.7922^b
                      (0.0027578)   (0.0101823)    (0.329451)    (0.002997)         (0.27472)
10D4C       k-means   0.8946^c      0.7203^c       2.264^b       0.0973             0.069018^a
                      (0.03401184)  (0.0138685)    (0.052886)    (0.080985)         (0.023524)
            PSO       0.8763        0.6924         2.1989^c      0.09693^c          27.71563
                      (0.0393151)   (0.0893076)    (0.06987)     (0.434701)         (0.919202)
            ACO       0.9239^b      0.76142^b      1.1102        0.096711^b         19.2719^c
                      (0.050278)    (0.035419)     (0.034022)    (0.062165)         (5.57154)
            BFC       0.9319^a      0.8187^a       2.2968^a      0.09202^a          18.28282^b
                      (0.011764)    (0.00219203)   (0.040214)    (0.044709)         (0.902037)
Iris        k-means   0.8737^c      0.6823^c       7.8405^c      0.02422^c          0.00625^a
                      (0.135340)    (0.096661)     (0.602076)    (0.02267)          (0.004941)
            PSO       0.9195^b      0.7828^b       8.3579^b      0.02243^b          6.753125
                      (0.0427095)   (0.0717713)    (0.374798)    (0.005974)         (0.07683)
            ACO       0.8254        0.6547         1.6159        0.03104            5.5256^c
                      (0.008045)    (0.046406)     (0.53276)     (0.02067)          (0.48719)
            BFC       0.9341^a      0.8180^a       9.1295^a      0.02111^a          2.9344^b
                      (0.0103238)   (0.0248901)    (0.369183)    (0.004837)         (0.058962)
Wine        k-means   0.7170^c      0.4127         7.3745^c      0.02652^b          0.008235^a
                      (0.00675452)  (0.00306349)   (0.398942)    (0.002768)         (0.077548)
            PSO       0.7307^b      0.4312^b       7.6108^b      0.02727^c          19.92031
                      (0.0118794)   (0.01004092)   (0.358704)    (0.003944)         (0.180339)
            ACO       0.683959      0.424734^c     1.012184      0.030469           11.0644^c
                      (0.0107)      (0.007082)     (0.0061833)   (0.008435)         (0.3288)
            BFC       0.7516^a      0.4494^a       7.9366^a      0.0259^a           7.86721^b
                      (0.00289913)  (0.00282842)   (0.270582)    (0.001703)         (0.140304)
Glass       k-means   0.7047^b      0.2676^b       3.1188^b      0.03647^b          0.034375^a
                      (0.0122868)   (0.029821)     (0.211435)    (0.018903)         (0.07548)
            PSO       0.5409        0.1902         2.4245^c      0.04097            15.29531
                      (0.0636369)   (0.0543058)    (0.127317)    (0.03715)          (0.699914)
            ACO       0.6353^c      0.2699^c       1.009839      0.037285^c         13.9375^c
                      (0.0395196)   (0.0358161)    (0.004898)    (0.026172)         (0.40873)
            BFC       0.7376^a      0.2765^a       3.5644^a      0.03171^a          11.62502^b
                      (0.0127279)   (0.0208597)    (0.072933)    (0.007819)         (0.203714)
Zoo         k-means   0.7998        0.3758         4.1048^c      0.01821            0.01875^a
                      (0.0484368)   (0.116199)     (0.151947)    (0.003507)         (0.0010546)
            PSO       0.8525^c      0.4768^c       4.6966^b      0.01757^c          44.23438
                      (0.372645)    (0.0714178)    (0.26326)     (0.009615)         (3.770243)
            ACO       0.8829^b      0.6867^b       1.02699       0.01691^b          17.6563^c
                      (0.0231966)   (0.0541532)    (0.0168503)   (0.0034002)        (0.75973)
            BFC       0.9210^a      0.6977^a       5.9665^a      0.01538^a          11.38752^b
                      (0.0311569)   (0.0448654)    (0.093465)    (0.003154)         (0.552599)
Ionosphere  k-means   0.5877^c      0.4323^c       1.3405^c      0.75862^c          0.046875^a
                      (0.0012882)   (0.0043657)    (0.07535433)  (0.0738499)        (0.02578)
            PSO       0.5921^b      0.4261         1.3516^b      0.75771^b          45.1^c
                      (0.0013718)   (0.0067175)    (0.04071)     (0.081232)         (1.094375)
            ACO       0.5398        0.5384^a       1.3114        0.76174            53.6571
                      (0.000967)    (0.0015278)    (0.0034079)   (0.036454)         (4.6563)
            BFC       0.5989^a      0.44390^b      1.3528^a      0.75413^a          36.39376^b
                      (0.00120208)  (0.00314662)   (0.042282)    (0.023309)         (0.476011)
^a Rank 1; ^b Rank 2; ^c Rank 3
data sets (namely 10D4C, Iris, Wine, Glass and Zoo, and the Rand index for Ionosphere), whereas PSO gives better results for the 2D4C data set and ACO obtains the best Jaccard value for Ionosphere. These results show that, with the help of global and chaotic search, the proposed BFC methodology can reach globally optimal solutions, overcoming a shortcoming of the k-means algorithm. Meanwhile, as an optimization-based clustering algorithm, BFC approaches the optimal points more closely and exhibits better convergence than the ACO and PSO techniques.
(2) Furthermore, for the Glass and Zoo data, the proposed BFC approach achieves an evident improvement in Rand and Jaccard values over the other three algorithms. Combined with the descriptions of the Glass and Zoo data sets, we can conclude that the BFC algorithm is much more effective for multi-cluster data sets and is superior for clusters of different scales.
(3) The proposed BFC algorithm gives the best performance on the β index, while the ACO-based clustering algorithm gets the smallest (worst) β values for all data sets. This indicates that the BFC algorithm is well suited to data sets without any prior information, which is helpful in real-life clustering applications.
(4) Regarding intra-cluster and inter-cluster distances, the former ensures compact clusters with little deviation from the cluster centers, while the latter ensures larger separation between different clusters. The Dis index is the ratio of the intra-cluster to the inter-cluster distance and should be minimized. By this criterion, the BFC algorithm most often finds clusters with smaller Dis values than the k-means, ant-based and PSO-based approaches, although the PSO algorithm performs best on the 2D4C data set.
(5) The standard deviations of the different measures obtained by each method are shown in brackets. The stability of BFC across data sets can be seen from its smaller standard deviations of the Rand, Jaccard and Dis indices. ACO has smaller standard deviations of β, but only because of its poor β performance, and the k-means algorithm gives a smaller standard deviation of computing time because of its low computational complexity. These comparisons show that the results of the BFC algorithm vary less across experiments, making BFC a more stable clustering technique than the k-means, PSO-based and ACO-based clustering algorithms.
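As an illustration of the external indices used in (1)–(5), both can be computed by counting point pairs; this pair-counting sketch is ours, not the paper's code:

```python
from itertools import combinations

def rand_jaccard(labels_true, labels_pred):
    """External validity indices over all point pairs:
    Rand = (a + d) / (a + b + c + d), Jaccard = a / (a + b + c), where
    a: pairs grouped together in both partitions,
    b, c: pairs grouped together in only one partition,
    d: pairs separated in both."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_t = labels_true[i] == labels_true[j]
        same_p = labels_pred[i] == labels_pred[j]
        if same_t and same_p:
            a += 1
        elif same_t:
            b += 1
        elif same_p:
            c += 1
        else:
            d += 1
    return (a + d) / (a + b + c + d), a / (a + b + c)

r, j = rand_jaccard([0, 0, 1, 1], [0, 0, 1, 2])  # r = 5/6, j = 0.5
```

Both indices reach 1 only when the two partitions agree on every pair, which is why higher values in Table 2 indicate better agreement with the true class labels.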
(6) The CPU (execution) times, in seconds, needed by the algorithms are also given in the table for comparison. All experiments are performed on a Dell terminal with an Intel Core(TM)2 Duo CPU (2.53 GHz clock speed, 2 GB memory) in a Windows XP environment. The algorithms are implemented in Matlab.
(7) It is apparent that the