www.elsevier.com/locate/patcog
Received 10 December 2002; received in revised form 20 June 2003; accepted 20 June 2003
Abstract
The self-organizing map (SOM) has been widely used in many industrial applications. Classical clustering methods based on the SOM often fail to deliver satisfactory results, especially when clusters have arbitrary shapes. In this paper, through some preprocessing techniques for filtering out noises and outliers, we propose a new two-level SOM-based clustering algorithm using a clustering validity index based on inter-cluster and intra-cluster density. Experimental results on synthetic and real data sets demonstrate that the proposed clustering algorithm is able to cluster data better than the classical clustering algorithms based on the SOM, and find an optimal number of clusters.
© 2003 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
Keywords: Partitioning clustering; Hierarchical clustering; Clustering validity index; Self-organizing map; Multi-representation
0031-3203/03/$30.00 © 2003 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
doi:10.1016/S0031-3203(03)00237-1
Ref. [8] assumes each cluster can be either hyper-spherical or hyper-ellipsoidal. The number of output neurons is equal to the desired number of clusters. When the number of clusters is a prime number, the SOM cannot be realized in two dimensions. Furthermore, the SOM is conceptually different from clustering [4]. The SOM tries to extract and visually display the topological structure of high-dimensional input data, while clustering is to partition the input data into groups. The algorithms in Refs. [7,8] seem to mix two objectives, feature mapping and clustering, and the overall methodology is difficult to interpret in either case [4].

It is then reasonable that a two-level approach is able to cluster data based on the SOM. The idea is that the first level is to train data by the SOM and the second level is to cluster data based on the SOM. The required number of output neurons at the first level is more than the desired number of clusters. Clustering is carried out by clustering of output neurons after completion of training performed by the SOM. This two-level approach has been addressed in Refs. [9–12]. They are actually multirepresentation-based clustering because each cluster can be represented by multiple output neurons. In Ref. [9] there are two SOM layers for clustering. The second SOM layer takes the outputs of the first SOM layer as the inputs of the second SOM layer. The number of the neurons on the second map is equal to the desired number of clusters. The task of the second SOM layer is analogous to clustering of the SOM by the k-means algorithm. In Ref. [10] an agglomerative contiguity-constrained clustering method on the SOM was proposed. The merging process of neighboring neurons was based on a minimal distance criterion. The algorithm in Ref. [11] extended the idea in Ref. [10] by a minimal variance criterion and achieved better clustering results. In Ref. [12] both the classical hierarchical and partitioning clustering algorithms are applied in clustering of the SOM. The proposed algorithms [12] were aimed at reducing computational complexity compared with the classical clustering methods. The algorithms presented in Refs. [10,11] need to recalculate the center after two clusters are merged. They are only feasible for clusters with hyper-spherical or hyper-ellipsoidal shapes. The second SOM layer [9] and the batch k-means algorithm in clustering of the SOM [12] require the desired number of clusters to be known and are only feasible for hyper-spherical-shaped clusters. Hierarchical clustering algorithms on the SOM in Ref. [12] use only an inter-cluster distance criterion to cluster the output neurons. In order to deal with arbitrary cluster shapes, high-order neurons are introduced in Ref. [13]. The inverse covariance matrix in the clustering metric can be considered as second-order statistics to capture hyper-ellipsoidal properties of clusters. But the algorithm in Ref. [13] is computation-consuming such that it is not suitable for real applications. Some additional information other than distances will be helpful in the clustering process. In Ref. [14] a clustering validity assessment was proposed in which not only inter-cluster distances, but also inter-cluster density and intra-cluster density are considered in the clustering assessments. The assessment is able to find an optimal partition of the input data [14].

In this paper, a new two-level algorithm for clustering of the SOM is proposed. The clustering at the second level is agglomerative hierarchical clustering. The merging criterion is motivated by a clustering validity index based on the inter-cluster and intra-cluster density, and inter-cluster distances [14]. The original index is used for the whole input data and therefore is a global index. The optimal number of clusters can be found by the clustering validity index. In this paper, the clustering validity index is slightly modified and used locally to determine which neighboring pair of clusters is to be merged into one cluster in agglomerative hierarchical clustering. Since more information is added into the merging criterion in addition to inter-cluster distances, the proposed algorithm clusters data better than other clustering algorithms based on the SOM. Through certain preprocessing techniques for filtering, the proposed clustering algorithm is able to handle input data with noises and outliers.

This paper is organized into five sections. In Section 2, the SOM and clustering algorithms are briefly reviewed. Algorithms of two-level clustering of the SOM are discussed. In Section 3, a new algorithm of multirepresentation clustering of the SOM is proposed. In Section 4, experimental results on synthetic and real data sets demonstrate that the proposed algorithm is able to cluster the input data and find the optimal number of clusters. The clustering effect of the proposed algorithm is better than that of other clustering algorithms on the SOM. Finally, the conclusions of this work are presented in Section 5.

2. Self-organizing map and clustering

2.1. Self-organizing map and visualization

Competitive learning is an adaptive process in which the neurons in a neural network gradually become sensitive to different input categories, sets of samples in a specific domain of the input space. A division of neural nodes emerges in the network to represent different patterns of the inputs after training.

The division is enforced by competition among the neurons: when an input x arrives, the neuron that is best able to represent it wins the competition and is allowed to learn it even better, as will be described later. If there exists an ordering between the neurons, i.e., the neurons are located on a discrete lattice, the competitive learning algorithm can be generalized: if not only the winning neuron but also its neighboring neurons on the lattice are allowed to learn, the whole effect is that the final map becomes an ordered map in the input space. This is the essence of the SOM algorithm.

The SOM consists of M neurons located on a regular low-dimensional grid, usually one or two dimensional.
Higher-dimensional grids are possible, but they are not generally used since their visualization is problematic. The lattice of the grid is either hexagonal or rectangular.

The basic SOM algorithm is iterative. Each neuron i has a d-dimensional feature vector w_i = [w_{i1}, \ldots, w_{id}]. At each training step t, a sample data vector x(t) is randomly chosen from the training set. Distances between x(t) and all the feature vectors are computed. The winning neuron, denoted by c, is the neuron with the feature vector closest to x(t):

    c = \arg\min_{i} \|x(t) - w_i\|, \quad i \in \{1, \ldots, M\}.    (1)

A set of neighboring nodes of the winning node is denoted as N_c. We define h_{ic}(t) as the neighborhood kernel function around the winning neuron c at time t. The neighborhood kernel function is a nonincreasing function of time and of the distance of neuron i from the winning neuron c. The kernel can be taken as a Gaussian function:

    h_{ic}(t) = \exp\left( - \frac{\|\mathrm{Pos}_i - \mathrm{Pos}_c\|^2}{2\sigma(t)^2} \right), \quad i \in N_c,    (2)

where Pos_i is the coordinates of neuron i on the output grid and σ(t) is the kernel width.

The weight update rule in the sequential SOM algorithm can be written as

    w_i(t+1) = \begin{cases} w_i(t) + \alpha(t)\, h_{ic}(t)\, (x(t) - w_i(t)), & \forall i \in N_c, \\ w_i(t), & \text{otherwise}. \end{cases}    (3)

Both the learning rate α(t) and the neighborhood width σ(t) decrease monotonically with time. During training, the SOM behaves like a flexible net that folds onto a "cloud" formed by the training data. Because of the neighborhood relations, neighboring neurons are pulled in the same direction, and thus feature vectors of neighboring neurons resemble each other. There are many variants of the SOM. However, these variants are not considered in this paper because the proposed algorithm is based on the SOM, but not a new variant of the SOM.

The 2D map can be easily visualized and thus gives people useful information about the input data. The usual way to display the cluster structure of the data is to use a distance matrix, such as the U-matrix [15]. The U-matrix method displays the SOM grid according to the distance of neighboring neurons. The visual display of the U-matrix can be three-dimensional using ridges and valleys, or two-dimensional using gray level. Clusters can be identified by low inter-neuron distances and borders are identified by high inter-neuron distances. Another method of visualizing cluster structure is to assign the input data to their nearest neurons. Some neurons then have no input data assigned to them. These neurons can be used as the borders of clusters [16]. These methods are cluster visualization tools and inherently are not clustering methods.

2.2. Clustering algorithms

There are a multitude of clustering methods in the literature, which can be broadly classified into the following categories [6]: hierarchical clustering, partitioning clustering, density-based clustering, grid-based clustering, and model-based clustering. In this paper, only the first two categories are considered.

In partitioning clustering, given a database of n objects, a partitioning clustering algorithm constructs k partitions of the data, where each partition represents a cluster and k ≤ n. The most used partitioning clustering algorithm is the k-means algorithm, where each cluster is represented by the mean value of the objects in the cluster. One advantage of partitioning clustering is that the clustering is dynamic, i.e., data points can move from one cluster to another. The other advantage is that some a priori knowledge, such as cluster shapes, can be incorporated in the clustering. The drawbacks of partitioning clustering are the following: (1) it encounters difficulty at discovering clusters of arbitrary shapes; (2) the number of clusters is pre-fixed and the optimal number of clusters is hard to determine.

A hierarchical clustering algorithm creates a hierarchical decomposition of the given set of data objects. It can be classified as either agglomerative or divisive. The advantage of hierarchical clustering is that it is not affected by initialization and local minima. The shortcomings of hierarchical clustering are the following: (1) it is impractical for large data sets due to the high computational complexity; (2) it does not incorporate any a priori knowledge such as cluster shapes; (3) the clustering is static, i.e., data points in a cluster at an early stage cannot move to another cluster at a later stage. In this paper, divisive hierarchical clustering is not considered because the top–down direction of divisive hierarchical clustering is not suitable for two-level clustering of the SOM.

In the classical agglomerative hierarchical clustering, the pair of clusters to be merged has the minimum inter-cluster distance. The widely used measures of inter-cluster distance are listed in Table 1 (m_i is the mean for cluster C_i and n_i is the number of points in C_i). All of these distance measures yield the same clustering results if the clusters are compact and well separated. But in some cases [17], using d_max, d_ave, and d_mean as the distance measures results in wrong clusters that are similar to those determined by partitioning clustering. The single-linkage clustering with distance measure d_min may have "chaining effects": a few points located so as to form a bridge between two clusters cause points across the clusters to be grouped into a single cluster.

The extended SOM (minimum distance) [10] utilizes the single-linkage clustering method on the SOM. The clustering in the extended SOM (minimum variance) [11] is different from the minimum-distance-based agglomerative hierarchical clustering. But it is also an agglomerative hierarchical clustering on the SOM based on minimum variance.
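To make the sequential SOM training of Section 2.1 concrete, the following minimal Python sketch implements Eqs. (1)–(3) on a rectangular grid. The exponential decay schedules for the learning rate α(t) and the kernel width σ(t), and the default grid size, are illustrative assumptions rather than the exact settings used in the paper.

```python
import numpy as np

def train_som(data, rows=4, cols=4, epochs=100, alpha0=1.0, sigma0=2.0, seed=0):
    """Sequential SOM training following Eqs. (1)-(3).

    Returns the trained feature vectors w (rows*cols, d) and the grid
    coordinates Pos (rows*cols, 2) of the neurons.
    """
    rng = np.random.default_rng(seed)
    n, d = data.shape
    M = rows * cols
    # Grid coordinates Pos_i of the M neurons.
    pos = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize feature vectors randomly inside the data range.
    w = rng.uniform(data.min(axis=0), data.max(axis=0), size=(M, d))

    t_max = epochs * n
    for t in range(t_max):
        x = data[rng.integers(n)]                      # random sample x(t)
        # Eq. (1): the winning neuron c has the closest feature vector.
        c = np.argmin(np.linalg.norm(x - w, axis=1))
        # Decaying learning rate and kernel width (assumed schedules).
        alpha = alpha0 * (0.0001 / alpha0) ** (t / t_max)
        sigma = sigma0 * (0.5 / sigma0) ** (t / t_max)
        # Eq. (2): Gaussian neighborhood kernel h_ic(t) on the grid.
        grid_dist2 = np.sum((pos - pos[c]) ** 2, axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        # Eq. (3): move neurons in the neighborhood of c towards x(t).
        w += alpha * h[:, None] * (x - w)
    return w, pos
```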
Table 1
Four types of de4nitions of inter-cluster distance
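The four inter-cluster distance measures d_min, d_max, d_mean and d_ave referred to in Table 1 and in the text are assumed here to have their standard single-linkage, complete-linkage, centroid-distance and average-linkage definitions; a minimal sketch:

```python
import numpy as np

def inter_cluster_distances(Ci, Cj):
    """Standard definitions assumed for the four measures between clusters
    Ci and Cj, given as arrays of shape (n_i, d) and (n_j, d)."""
    # All pairwise point-to-point distances between the two clusters.
    pair = np.linalg.norm(Ci[:, None, :] - Cj[None, :, :], axis=2)
    mi, mj = Ci.mean(axis=0), Cj.mean(axis=0)      # cluster means m_i, m_j
    return {
        "d_min": pair.min(),                       # single linkage
        "d_max": pair.max(),                       # complete linkage
        "d_mean": np.linalg.norm(mi - mj),         # distance between means
        "d_ave": pair.mean(),                      # average linkage
    }
```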
Fig. 1. The two-level approach of clustering of the SOM (input data, SOM training, then clustering of the map). Different symbols on the map represent different clusters.
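As a rough illustration of the two-level approach in Fig. 1, the sketch below trains the SOM (level one), clusters the neuron feature vectors (level two), and then lets every data point inherit the cluster label of its best-matching neuron. `train_som` refers to the sketch given after Section 2.2, and `cluster_neurons` is a hypothetical placeholder for whichever second-level clustering algorithm is used.

```python
import numpy as np

def two_level_clustering(data, cluster_neurons, rows=4, cols=4):
    """First level: train the SOM; second level: cluster its neurons.

    `cluster_neurons(w, pos)` is any function returning one cluster label
    per neuron (e.g. agglomerative merging of neighboring neurons).
    """
    w, pos = train_som(data, rows=rows, cols=cols)          # level 1
    neuron_labels = np.asarray(cluster_neurons(w, pos))     # level 2
    # Each data point inherits the label of its best-matching neuron.
    bmu = np.argmin(np.linalg.norm(data[:, None, :] - w[None, :, :], axis=2), axis=1)
    return neuron_labels[bmu], w, pos
```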
2.3. Clustering of the SOM

In the first level, we use the SOM to form a 2D feature map. The number of output neurons is significantly more than the desired number of clusters. This requires more neurons to represent a cluster, rather than a single neuron to represent a cluster. Then in the second level the output neurons are clustered such that the neurons on the map are divided into as many different regions as the desired number of clusters. Each input data point can be assigned to a cluster according to its nearest output neuron. The whole process is illustrated in Fig. 1. This two-level approach for clustering of the SOM has been addressed in Refs. [9–12]. The recently developed hierarchical algorithms CURE [17] and Chameleon [18] also utilize a two-level clustering technique.

The classical clustering algorithms can be used in clustering the output neurons of the SOM. However, due to the disadvantages of different types of clustering algorithms, we must choose an appropriate one to cluster the SOM. Partitioning clustering of the SOM can cause incorrect clusters. If the clusters in the input data have nonspherical shapes, clustering of the SOM is deteriorated. The number of clusters in partitioning clustering, which is usually unknown beforehand, must be predefined. Therefore, a partitioning algorithm is not adopted in this study. This will be illustrated in Section 4. Agglomerative hierarchical clustering of the SOM is the algorithm we adopted in this paper. But here more information about clusters is added into the clustering algorithm. In the classical agglomerative hierarchical clustering of the SOM, i.e., the extended SOM (minimum distance) [10], only inter-cluster distance information is used to merge the nearest neighboring clusters. In the extended SOM (minimum variance) [11], intra-cluster variances were considered and better clustering results were obtained. But there is no explicit inter-cluster distance information in the extended SOM (minimum variance). In some cases, a clustering algorithm using more information about the pair of clusters in addition to inter-cluster distances, such as special characteristics about individual clusters and between clusters, will have better clustering effects than the classical hierarchical clustering algorithms. This additional information has been considered in the Chameleon algorithm [18], but the algorithm has to construct a k-nearest-neighbor graph, which is computationally complex when the data set is very large.

In this paper, inter-cluster and intra-cluster density are added in the merging criterion and some useful steps for filtering noises and outliers are proposed before clustering of the SOM. This will be discussed in the next section.

3. Clustering of the SOM using local clustering validity index and preprocessing of the SOM for filtering

3.1. Global clustering validity index for different clustering algorithms

Since there are many clustering algorithms in the literature, evaluation criteria are needed to justify the correctness of a partition. Furthermore, the evaluation algorithms need to address the number of clusters that appear in a data set. A lot of effort has been made on clustering in the area of pattern recognition [19]. In general, there are three types of methods used to investigate cluster validity: (1) external criteria; (2) internal criteria; (3) relative criteria [19]. The idea
S. Wu, T.W.S. Chow / Pattern Recognition 37 (2004) 175 – 188 179
Fig. 3(a). Two clusters can be merged if the two clusters are direct neighbors. When the pair of clusters are not direct neighbors, they cannot be merged. This is shown in Fig. 3(b) and (c). The CDbw is used locally for the input data belonging to the directly neighboring pair of clusters. If the CDbw for a pair of directly neighboring clusters is the lowest among all the available directly neighboring pairs of clusters, the pair of clusters is merged into one cluster. If the number of neurons is very large, the interpolating neurons form the inter-cluster borders on the map. This may

3.5. The algorithm of clustering of the SOM

The overall proposed algorithm is summarized as follows:
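In outline, the procedure described in Sections 2.3 and 3 trains the SOM, filters out interpolating and outlier neurons, starts from one cluster per remaining neuron, and repeatedly merges the directly neighboring pair of clusters with the lowest locally computed CDbw. The Python sketch below follows that outline only; `local_cdbw`, the filtering rule and the stopping rule are hedged placeholders for the paper's exact definitions, not the authors' own step list.

```python
import numpy as np

def cluster_som(data, w, pos, local_cdbw, n_clusters):
    """Hedged sketch of the second-level clustering of SOM neurons.

    `local_cdbw(points_a, points_b)` stands in for the locally applied
    validity index of Ref. [14]; its exact definition is given in the paper.
    """
    # Assign every data point to its best-matching neuron.
    bmu = np.argmin(np.linalg.norm(data[:, None, :] - w[None, :, :], axis=2), axis=1)
    # Filtering step (assumption): drop interpolating neurons, i.e. neurons
    # with no data assigned, so that they act as cluster borders.
    keep = [i for i in range(len(w)) if np.any(bmu == i)]
    clusters = {i: [i] for i in keep}               # one cluster per kept neuron

    def points(c):
        return data[np.isin(bmu, clusters[c])]

    def neighbors(a, b):
        # Directly neighboring clusters contain adjacent grid positions
        # (8-neighborhood adjacency assumed here).
        return any(np.abs(pos[i] - pos[j]).max() <= 1
                   for i in clusters[a] for j in clusters[b])

    while len(clusters) > n_clusters:
        pairs = [(a, b) for a in clusters for b in clusters
                 if a < b and neighbors(a, b)]
        if not pairs:
            break
        # Merge the directly neighboring pair with the lowest local CDbw.
        a, b = min(pairs, key=lambda p: local_cdbw(points(p[0]), points(p[1])))
        clusters[a] += clusters.pop(b)
    return clusters, bmu
```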
4. Experimental results

To demonstrate the effectiveness of the proposed clustering algorithm, four data sets were used in our experiments. The input data are normalized such that the value of each datum in each dimension lies in [0,1]. For training the SOM we used 100 training epochs on the input data and the learning rate decreases from 1 to 0.0001.

Fig. 4. The synthetic data set in the 2D plane.
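A small sketch of this experimental setup, assuming per-dimension min–max scaling and a geometric decay of the learning rate from 1 to 0.0001 over the 100 epochs (the exact decay law is an assumption):

```python
import numpy as np

def normalize01(data):
    """Scale each dimension of the input data to [0, 1]."""
    lo, hi = data.min(axis=0), data.max(axis=0)
    return (data - lo) / np.where(hi > lo, hi - lo, 1.0)

def learning_rate(epoch, n_epochs=100, start=1.0, end=0.0001):
    """Learning rate decaying from `start` to `end` over the training epochs."""
    return start * (end / start) ** (epoch / max(n_epochs - 1, 1))
```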
Fig. 5. Three partitions of the synthetic data set for clustering of the SOM by (a) the proposed algorithm; (b) k-means clustering algorithm; (c) single-linkage hierarchical clustering algorithm; (d) complete-linkage hierarchical clustering algorithm; (e) centroid-linkage hierarchical clustering algorithm; (f) average-linkage hierarchical clustering algorithm. ".", "*" and "+" indicate three different clusters, respectively.
Fig. 6. The CDbw as a function of the number of clusters for the synthetic data set by the proposed algorithm (solid line), and the single-linkage clustering algorithm (dashed line) on the SOM.
Fig. 7. Three clusters for the synthetic data set are displayed on the map by the proposed algorithm or the single-linkage clustering algorithm on the SOM (SOM map size of 4×4). The same symbol on the map represents the same cluster.

Fig. 8. Three clusters for the synthetic data set are displayed on the map by the proposed algorithm or the single-linkage clustering algorithm on the SOM (SOM map size of 6×6). The same symbol on the map represents the same cluster.
4.1. 2D synthetic data set
Fig. 9. The CDbw as a function of the number of clusters for the iris data by the proposed algorithm (solid line), and the single-linkage clustering algorithm (dashed line) on the SOM.
Fig. 10. Two clusters for the iris data set are identified by the proposed algorithm or single-linkage clustering algorithm on the SOM (SOM map size of 4 × 4). The same symbol on the map represents the same cluster.

Fig. 11. For the known three classes, three clusters are formed (SOM map size of 4 × 4) for the iris data set by (a) the proposed algorithm; (b) single-linkage clustering algorithm on the SOM. The same symbol on the map represents the same cluster.
a trivial thing for 2D data. We also tested the above algorithms on map sizes of 3 × 3 and 5 × 5 and obtained similar results. For a smaller map size such as 2 × 3, the clustering by all clustering algorithms on the SOM gave wrong clusters. This is because each cluster has fewer representations so that the elongated clusters cannot be adequately described by the representation neurons. But for a larger map size such as 6 × 6, all these algorithms achieved the same correct results because the interpolating neurons explicitly form the borders of the three natural clusters and the minimal number of final agglomerated clusters is 3. The three clusters on the SOM with map size of 6 × 6 are shown in Fig. 8, where the interpolating neurons clearly separate the map into three regions representing three clusters. For an even larger map size, the minimal number of final agglomerated clusters can be larger than three and thus cannot express the true information about the input data.

Table 3
Performance comparison of different clustering algorithms for the iris data set

Algorithm                                  Clustering accuracy (%)
Proposed clustering of the SOM             96.0
Single-linkage clustering of the SOM       74.7
Extended SOM (minimum variance)            90.3
Extended SOM (minimum distance)            89.2
Direct k-means                             85.3
Direct single-linkage clustering           68.0
Direct complete-linkage clustering         84.0
Direct centroid-linkage clustering         68.0
Direct average-linkage clustering          69.3

We do not use the k-means, the complete-linkage, the centroid-linkage and the average-linkage clustering algorithms on the SOM in the next three data sets because they
Fig. 12. The statistical information of 20 clusters for the 15D synthetic data set. The horizontal axis in each subfigure represents the dimension and the vertical axis in each subfigure represents the value in each dimension. The mean value of each dimension in each cluster is represented by a black dot in each subfigure. The standard deviation of each dimension in each cluster is represented by two curves enclosing the black dots in each subfigure. The number of data points in each cluster is bracketed in each subfigure.
4.2. Iris data set

The Iris data set [24] has been widely used in pattern classification. It has 150 data points of four dimensions. The data are divided into three classes with 50 points each. The first class of Iris plant is linearly separable from the other two. The other two classes are overlapped to some extent and are not linearly separable. We clustered the data by the proposed clustering algorithm and the single-linkage clustering algorithm on the SOM. We used an appropriate map size of 4 × 4. The two algorithms achieved the same optimal clustering results. The CDbw as a function of the number of clusters, plotted in Fig. 9, indicates that the data are optimally divided into two clusters. The iris data of the first class form a cluster, and the remaining two classes form the other cluster. The two clusters can be displayed on the map shown in Fig. 10, where the "*" symbol represents the first class and "+" represents the second and third class. This is inconsistent with the inherent three classes in the data. The two clusters are also achieved in Ref. [25]. The two clusters are formed without a priori information about the classes of the data. Therefore, the two clusters found in the data are reasonable.

Fig. 13. Twenty clusters for the 15D synthetic data set are displayed on the map by the proposed algorithm on the SOM (SOM map size of 8 × 8). The same number on the map represents the same cluster.

If three clusters are forced to be formed, the proposed algorithm is better than the single-linkage clustering algorithm. The partition of the map performed by the two algorithms is shown in Fig. 11. One cluster representing the first class ("*" symbol) is clearly separated from the other two clusters by the interpolating neurons representing borders. The clustering accuracies by the proposed algorithm, the single-linkage clustering of the SOM, the extended SOM
Fig. 14. The CDbw as a function of the number of clusters for the 15D synthetic data set by the proposed algorithm on the SOM.
(minimum distance) [10], the extended SOM (minimum variance) [11], the direct k-means, and the four direct agglomerative hierarchical clustering algorithms are listed in Table 3. The single-linkage clustering of the SOM has the disadvantage of a chain-link tendency so that it joins most of the points in the second and third classes, which resulted in a low clustering accuracy of 74.7%. Therefore in the next two experiments, we do not use the single-linkage clustering of the SOM. On the other hand, the proposed algorithm is able to form the correct three clusters with the highest accuracy of 96.0%, while the other agglomerative hierarchical clustering algorithms (direct or not) and the direct k-means algorithm have lower clustering accuracy to some extent in distinguishing the second and third class.

So in real data sets, if the number of classes is known and is used for the number of clusters, and some overlapping exists in some pair of clusters, the proposed algorithm is a better choice. If we do not use the information about the number of classes, the pair of overlapping classes may merge into a single cluster and then the number of clusters may be less than the number of classes.

4.3. 15D synthetic data set

In this example, we used a data set of 1780 15D data points. The data were created by first generating 20 uniformly distributed random 15D points with each dimension lying in [0, 1], and then adding some Gaussian noise to each point with standard deviation 0.12 in each dimension. The number of data points in each cluster varies from 30 to 165. The statistical information for each cluster can be seen in Fig. 12. We used the proposed clustering algorithm on the SOM with map size of 8 × 8. The 20 clusters displayed on the map are shown in Fig. 13. The CDbw as a function of the number of clusters, plotted in Fig. 14, strongly indicates 20 clusters exist in the data. The partition of the data is consistent with the cluster structure of the data with 100% accuracy, and thus the statistical information of the clusters generated by the proposed clustering algorithm is the same as that of the known clusters shown in Fig. 12.
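The 15D synthetic data can be regenerated approximately from the recipe above; the random cluster sizes drawn in the sketch are an assumption standing in for the exact sizes listed in Fig. 12.

```python
import numpy as np

def make_15d_synthetic(n_clusters=20, dim=15, noise_std=0.12, seed=0):
    """Generate data following the recipe in Section 4.3: uniform random
    cluster centres in [0,1]^dim plus per-dimension Gaussian noise.
    Cluster sizes are drawn between 30 and 165 (assumed; the paper gives
    the exact sizes, totalling 1780 points)."""
    rng = np.random.default_rng(seed)
    centers = rng.uniform(0.0, 1.0, size=(n_clusters, dim))
    sizes = rng.integers(30, 166, size=n_clusters)
    X = np.vstack([c + rng.normal(0.0, noise_std, size=(s, dim))
                   for c, s in zip(centers, sizes)])
    y = np.repeat(np.arange(n_clusters), sizes)      # true cluster labels
    return X, y
```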
4.4. Wine data set

The wine data set [26] has 178 13D data points with three known classes. The numbers of data samples in the three classes are 59, 71 and 48, respectively. We use the proposed algorithm with map size of 4 × 4 to cluster the data. The CDbw as a function of the number of clusters, plotted in Fig. 15, indicates that the number of clusters is three, which is exactly equal to the number of the classes. This is because the three classes are well separated from each other. The three clusters on the map are shown in Fig. 16. The clustering accuracies by the proposed algorithm, the extended SOM (minimum variance), the direct k-means, and the four direct agglomerative hierarchical clustering algorithms are listed in Table 4. For this data set, the proposed algorithm achieved the best clustering result with the highest clustering accuracy of 98.3%. The extended SOM (minimum variance) and the direct k-means algorithm also have good clustering results with accuracy larger than 90.0%, while the four direct agglomerative hierarchical clustering algorithms have a worse effect in distinguishing the second and third class with lower
Fig. 15. The CDbw as a function of the number of clusters for the wine data set by the proposed algorithm on the SOM.
5. Conclusions
References

[6] J. Han, M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, 2000.
[7] T. Huntsberger, P. Ajjimarangsee, Parallel self-organizing feature maps for unsupervised pattern recognition, Int. J. Gen. Systems 16 (1989) 357–372.
[8] J. Mao, A.K. Jain, A self-organizing network for hyperellipsoidal clustering (HEC), IEEE Trans. Neural Networks 7 (1) (1996) 16–29.
[9] J. Lampinen, E. Oja, Clustering properties of hierarchical self-organizing maps, J. Math. Imag. Vis. 2 (2–3) (1992) 261–272.
[10] F. Murtagh, Interpreting the Kohonen self-organizing feature map using contiguity-constrained clustering, Pattern Recognition Lett. 16 (1995) 399–408.
[11] M.Y. Kiang, Extending the Kohonen self-organizing map networks for clustering analysis, Comput. Stat. Data Anal. 38 (2001) 161–180.
[12] J. Vesanto, E. Alhoniemi, Clustering of the self-organizing map, IEEE Trans. Neural Networks 11 (3) (2000) 586–600.
[13] H. Lipson, H.T. Siegelmann, Clustering irregular shapes using high-order neurons, Neural Computation 12 (10) (2000) 2331–2353.
[14] M. Halkidi, M. Vazirgiannis, Clustering validity assessment using multi representatives, Proceedings of the SETN Conference, Thessaloniki, Greece, April 2002.
[15] A. Ultsch, H.P. Siemon, Kohonen's self organizing feature maps for exploratory data analysis, Proceedings of the International Neural Network Conference, Dordrecht, Netherlands, 1990, pp. 305–308.
[16] X. Zhang, Y. Li, Self-organizing map as a new method for clustering and data analysis, Proceedings of the International Joint Conference on Neural Networks, Nagoya, Japan, 1993, pp. 2448–2451.
[17] S. Guha, R. Rastogi, K. Shim, CURE: an efficient clustering algorithm for large databases, Proceedings of the ACM SIGMOD International Conference on Management of Data, New York, 1998, pp. 73–84.
[18] G. Karypis, E.-H. Han, V. Kumar, Chameleon: hierarchical clustering using dynamic modeling, IEEE Comput. 32 (8) (1999) 68–74.
[19] S. Theodoridis, K. Koutroumbas, Pattern Recognition, Academic Press, New York, 1999.
[20] J.C. Dunn, Well separated clusters and optimal fuzzy partitions, J. Cybern. 4 (1974) 95–104.
[21] X.L. Xie, G. Beni, A validity measure for fuzzy clustering, IEEE Trans. Pattern Anal. Mach. Intell. 13 (8) (1991) 841–847.
[22] G.W. Milligan, S.C. Soon, L.M. Sokol, The effect of cluster size, dimensionality and number of clusters on recovery of true cluster structure, IEEE Trans. Pattern Anal. Mach. Intell. 5 (1983) 40–47.
[23] R.N. Dave, Validating fuzzy partitions obtained through c-shell clustering, Pattern Recognition Lett. 17 (1996) 613–623.
[24] R.A. Fisher, The use of multiple measurements in taxonomic problems, Ann. Eugenics 7 (Part II) (1936) 179–188.
[25] J.C. Bezdek, N.R. Pal, Some new indexes of cluster validity, IEEE Trans. Systems, Man, and Cybern. 28 (3) (1998) 301–315.
[26] C.L. Blake, C.J. Merz, UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/∼mlearn/MLRepository.html, Department of Information and Computer Science, University of California at Irvine, CA, 1998.
About the Author—SITAO WU is now pursuing Ph.D. degree in the Department of Electronic Engineering of City University of Hong
Kong, Hong Kong, China. He obtained B.E. and M.E. degrees in the Department of Electrical Engineering of Southwest Jiaotong University,
China in 1996 and 1999, respectively. His research interest areas are neural networks, pattern recognition, and their applications.
About the Author—TOMMY W.S. CHOW (M’93) received the B.Sc (First Hons.) and Ph.D. degrees from the University of Sunderland,
Sunderland, UK. He joined the City University of Hong Kong, Hong Kong, as a Lecturer in 1988. He is currently a Professor in the
Electronic Engineering Department. His research interests include machine fault diagnosis, HOS analysis, system identification, and neural
networks learning algorithms and applications.