
Chapter 5

Clustering

5.1 Introduction
Chapters 3 and 4 describe how samples may be classified if a training set is available to
use in the design of a classifier. However, there are many situations where the classes
themselves are initially undefined. Given a set of feature vectors sampled from some
population, we would like to know if the data set consists of a number of relatively
distinct subsets. If it does and we can determine these subsets, we can define them to
be classes. This is sometimes called class discovery. The techniques from Chapters
3 and 4 can then be used to further analyze or model the data or to classify new data
if desired. Clustering refers to the process of grouping samples so that the samples
are similar within each group. The groups are called clusters.
In some applications, the main goal may be to discover the subgroups rather than
to model them statistically. For example, the marketing director of a firm that supplies
business services may want to know if the businesses in a particular community fall
into any natural groupings of similar companies so that specific service packages and
marketing plans can be designed for each of these subgroups. Reading the public
data on these companies might give an idea of what some of these subgroups could
be, but the process would be difficult and unreliable, particularly if the number of
features or companies is large. Fortunately, clustering techniques allow the division
into subgroups to be done automatically, without any preconceptions about what kinds
of groupings should be found in the community being analyzed. Cluster analysis has
been applied in many fields. For example, in 1971, Paykel used cluster analysis to group
165 depressed patients into four clusters, which were then called "anxious," "hostile,"
"retarded psychotic," and "young depressive." In image analysis, clustering can be
used to find groups of pixels with similar gray levels, colors, or local textures, in order
to discover the various regions in the image.


Figure 5.1: A hierarchical clustering. (A dendrogram whose root, Animals, splits into Dogs and Cats, with further subgroups such as Large/Small and Long Hair/Short Hair, e.g., St. Bernard and Labrador; the individual animals 1 through 5 appear at the leaves.)

In cases where there are only two features, clusters can be found through visual
inspection by looking for dense regions in a scatterplot of the data if the subgroups or
classes are well separated in the feature space. If, for example, there are two bivariate
normally distributed classes and their means are separated by more than two standard
deviations, two distinct peaks form if there is enough data. In Figure 4.20 at least one
of the three classes forms a distinct cluster, which could be found even if the classes
were unknown. However, distinct clusters may exist in a high-dimensional feature space
and still not be apparent in any of the projections of the data onto a plane defined
by a pair of the feature axes. One general way to find candidates for the centers of
clusters is to form an n-dimensional histogram of the data and find the peaks in the
histogram. However, if the number of features is large, the histogram may have to
be very coarse to have a significant number of samples in any cell, and the locations
of the boundaries between these cells are specified arbitrarily in advance, rather than
depending on the data.
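
As a rough sketch of this histogram idea (assuming NumPy; the bin count, the density threshold, and the function name are illustrative choices, and the sketch flags dense cells rather than true local maxima), candidate cluster centers might be located as follows:

```python
import numpy as np

# Flag the centers of densely populated histogram cells as candidate
# cluster centers. X is an (n_samples, n_features) array.
def histogram_peak_candidates(X, bins=5, min_count=2):
    counts, edges = np.histogramdd(X, bins=bins)
    centers = [0.5 * (e[:-1] + e[1:]) for e in edges]   # midpoints of the bins on each axis
    candidates = []
    for idx in zip(*np.nonzero(counts >= min_count)):   # indices of sufficiently dense cells
        candidates.append([centers[d][i] for d, i in enumerate(idx)])
    return candidates
```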

5.2 Hierarchical Clustering


A hierarchy can be represented by a tree structure such as the simple one shown
in Figure 5.1. The patients in an animal hospital are composed of two main groups,
dogs and cats, each of which is composed of subgroups. Each subgroup is, in turn,
composed of subgroups, and so on. Each of the individual animals, 1 through 5,
is represented at the lowest level of the tree. Hierarchical clustering refers to a
clustering process that organizes the data into large groups, which contain smaller
groups, and so on. A hierarchical clustering may be drawn as a tree or dendrogram.
The finest grouping is at the bottom of the dendrogram; each sample by itself forms

a cluster. The coarsest grouping is at the top of the dendrogram, where all samples
are grouped into one cluster. In between, there are various numbers of clusters. For
example, in the hierarchical clustering of Figure 5.1, at level 0 the clusters are

{1}, {2}, {3}, {4}, {5},

each consisting of an individual sample. At level 1, the clusters are

{1,2}, {3}, {4}, {5}.

At level 2, the clusters are

{1,2}, {3}, {4,5}.

At level 3, the clusters are

{1,2,3}, {4,5}.

At level 4, the single cluster

{1,2,3,4,5}

consists of all the samples.
In a hierarchical clustering, if at some level two samples belong to a cluster, they
belong to the same cluster at all higher levels. For example, in Figure 5.1, at level 2
samples 4 and 5 belong to the same cluster; samples 4 and 5 also belong to the same
cluster at levels 3 and 4.
Hierarchical clustering algorithms are called agglomerative if they build the dendrogram
from the bottom up and divisive if they build the dendrogram from the top down.
The general agglomerative clustering algorithm is straightforward to describe. The
total number of samples will be denoted by n.

Agglomerative Clustering Algorithm

1. Begin with n clusters, each consisting of one sample.

2. Repeat step 3 a total of n - 1 times.

3. Find the most similar clusters C_i and C_j and merge C_i and C_j into one cluster.
If there is a tie, merge the first pair found.
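
A minimal Python sketch of this loop is given below; the names agglomerative and cluster_dist are illustrative, and the cluster-distance function is left as a parameter so that the single-, complete-, and average-linkage variants described next can all reuse the same loop.

```python
# Illustrative sketch of the agglomerative algorithm above.
# samples is a list of feature vectors; cluster_dist(Ci, Cj, samples) returns
# the distance between two clusters, each given as a list of sample indices.
def agglomerative(samples, cluster_dist):
    clusters = [[i] for i in range(len(samples))]    # step 1: n singleton clusters
    merges = []
    while len(clusters) > 1:                         # step 2: repeat n - 1 times
        best = None
        for i in range(len(clusters)):               # step 3: find the most similar pair
            for j in range(i + 1, len(clusters)):
                d = cluster_dist(clusters[i], clusters[j], samples)
                if best is None or d < best[0]:      # ties keep the first pair found
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [clusters[i] + clusters[j]]
    return merges    # the merge history defines the dendrogram
```
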
Different hierarchical clustering algorithms are obtained by using different methods
to determine the similarity of clusters. One way to measure the similarity between
clusters is to define a function that measures the distance between clusters. This
function typically is induced by an underlying function that measures the distance
between pairs of samples. In cluster analysis, as in nearest neighbor techniques (Section
4.4), the most popular distance measures are Euclidean distance and city block
distance.
Figure 5.2: Samples for clustering. (A scatterplot of the five samples, with Feature 1 on the horizontal axis and Feature 2 on the vertical axis.)


The Single-Linkage Algorithm

The single-linkage algorithm is also known as the minimum method and the
nearest neighbor method. The latter title underscores its close relation to the
nearest neighbor classification method. The single-linkage algorithm is obtained by
defining the distance between two clusters to be the smallest distance between two
points such that one point is in each cluster. Formally, if C_i and C_j are clusters, the
distance between them is defined as

$$D_{SL}(C_i, C_j) = \min_{a \in C_i,\, b \in C_j} d(a, b),$$

where d(a, b) denotes the distance between the samples a and b.
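
As a rough illustration, D_SL can be computed as follows, where Ci and Cj are lists of sample indices and samples holds the feature vectors; this sketch can be passed as cluster_dist to the agglomerative loop given earlier.

```python
from math import dist  # Euclidean distance (Python 3.8+)

# D_SL: the smallest distance between a point in Ci and a point in Cj.
def single_linkage_dist(Ci, Cj, samples):
    return min(dist(samples[a], samples[b]) for a in Ci for b in Cj)
```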

Example 5.1 Hierarchical clustering using the single-linkage algorithm.


Perform a hierarchical clustering of five samples using the single-linkage algorithm and
two features, x and y. A scatterplot of the data is shown in Figure 5.2. Use Euclidean
distance for the distance between samples. The following tables give the feature values
for each sample and the distance d between each pair of samples:
Sample    x    y
   1       4    4
   2       8    4
   3      15    8
   4      24    4
   5      24   12

         1      2      3      4      5
1             4.0   11.7   20.0   21.5
2      4.0           8.1   16.0   17.9
3     11.7    8.1           9.8    9.8          (5.1)
4     20.0   16.0    9.8           8.0
5     21.5   17.9    9.8    8.0
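
The distance matrix (5.1) can be checked with a few lines of Python; the feature vectors below are taken from the table above, and the variable names are illustrative.

```python
from math import dist  # Euclidean distance (Python 3.8+)

# Feature vectors for samples 1-5, as listed in the table above.
samples = [(4, 4), (8, 4), (15, 8), (24, 4), (24, 12)]

# Pairwise Euclidean distances, rounded to one decimal place as in (5.1).
D = [[round(dist(a, b), 1) for b in samples] for a in samples]
for row in D:
    print(row)
```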


For the single-sample clusters {a} and {b}, D_SL({a}, {b}) = d(a, b).
The algorithm begins with five clusters, each consisting of one sample. The two
nearest clusters are then merged. The smallest number in (5.1) is 4.0, which is the
distance between samples 1 and 2, so the clusters {1} and {2} are merged. At this
point there are four clusters

{1,2}, {3}, {4}, {5}.
Next obtain the matrix that gives the distances between these clusters:
        {1,2}     3      4      5
{1,2}            8.1   16.0   17.9
3        8.1            9.8    9.8
4       16.0     9.8           8.0
5       17.9     9.8    8.0

The value 8.1 in row {1,2} and column 3 gives the distance between the clusters {1,2}
and {3} and is computed in the following way. Matrix (5.1) shows that d(1,3) = 11.7
and d(2,3) = 8.1. In the single-linkage algorithm, the distance between clusters is the
minimum of these values, 8.1. The other values in the first row are computed in a
similar way. The values in other than the first row or first column are simply copied
from the previous table (5.1). Since the minimum value in this matrix is 8.0, the clusters
{4} and {5} are merged. At this point there are three clusters:

{1,2}, {3}, {4,5}.
Next obtain the matrix that gives the distance between these clusters:

        {1,2}     3    {4,5}
{1,2}            8.1   16.0
3        8.1            9.8
{4,5}   16.0     9.8
Since the minimum value in this matrix is 8.1, the clusters {1,2} and {3} are merged.
At this point there are two clusters:

{1,2,3}, {4,5}.
The next step will merge the two remaining clusters at a distance of 9.8. The
hierarchical clustering is complete. The dendrogram is shown in Figure 5.3.
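
In practice, a library routine is usually used rather than coding the merges by hand. The sketch below, which assumes SciPy and Matplotlib are installed, should reproduce the single-linkage dendrogram of Figure 5.3 for the five samples; changing method to 'complete' or 'average' gives the algorithms described next.

```python
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

samples = [(4, 4), (8, 4), (15, 8), (24, 4), (24, 12)]

# 'single' corresponds to the single-linkage (nearest neighbor) algorithm.
Z = linkage(samples, method='single')
dendrogram(Z, labels=[1, 2, 3, 4, 5])
plt.ylabel('Nearest Neighbor Distance')
plt.show()
```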

Figure 5.3: Hierarchical clustering using the single-linkage algorithm. The distance D_SL (nearest neighbor distance) between clusters that merge is shown on the vertical axis.

The Complete-Linkage Algorithm


The complete-linkage algorithm is also called the maximum method or the farthest
neighbor method. It is obtained by defining the distance between two clusters
to be the largest distance between a sample in one cluster and a sample in the other
cluster. If C_i and C_j are clusters, we define

$$D_{CL}(C_i, C_j) = \max_{a \in C_i,\, b \in C_j} d(a, b).$$
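
In code, only the min of the single-linkage sketch changes to a max; the function below uses the same illustrative conventions (clusters given as lists of sample indices).

```python
from math import dist  # Euclidean distance (Python 3.8+)

# D_CL: the largest distance between a point in Ci and a point in Cj.
def complete_linkage_dist(Ci, Cj, samples):
    return max(dist(samples[a], samples[b]) for a in Ci for b in Cj)
```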

Example 5.2 Hierarchical clustering using the complete-linkage algorithm.


Perform a hierarchical clustering using the complete-linkage algorithm on the data
shown in Figure 5.2. Use Euclidean distance (4.1) for the distance between samples.
As before, the algorithm begins with five clusters, each consisting of one sample.
The nearest clusters {1} and {2} are then merged to produce the clusters

{1,2}, {3}, {4}, {5}.
Next obtain the matrix that gives the distances between these clusters:
        {1,2}      3      4      5
{1,2}            11.7   20.0   21.5
3       11.7             9.8    9.8
4       20.0      9.8           8.0
5       21.5      9.8    8.0

The value 11.7 in row {1,2} and column 3 gives the distance between the clusters {1,2}
and {3} and is computed in the following way. Matrix (5.1) shows that d(1,3) = 11.7
and d(2,3) = 8.1. In the complete-linkage algorithm, the distance between clusters is
the maximum of these values, 11.7. The other values in the first row are computed in
a similar way. The values in other than the first row or first column are simply copied
from (5.1). Since the minimum value in this matrix is 8.0, the clusters {4} and {5} are
merged. At this point the clusters are

{1,2}, {3}, {4,5}.
Next obtain the matrix that gives the distance between these clusters:

        {1,2}      3    {4,5}
{1,2}            11.7   21.5
3       11.7             9.8
{4,5}   21.5      9.8

Since the minimum value in this matrix is 9.8, the clusters {3} and {4, 5} are merged.
At this point the clusters are
{1,2}, {3,4, 5}.
Notice that these clusters are different from those obtained at the corresponding point
of the single-linkage algorithm.
At the next step, the two remaining clusters will be merged. The hierarchical
clustering is complete. The dendrogram is shown in Figure 5.4.

A cluster, by definition, contains similar samples. The single-linkage algorithm
and the complete-linkage algorithm differ in how they determine when samples in two
clusters are similar so that the clusters can be merged. The single-linkage algorithm says that two
clusters C_i and C_j are similar if there are any elements a in C_i and b in C_j that are
similar, in the sense that the distance between a and b is small. In other words, in
the single-linkage algorithm, it takes a single similar pair a, b with a in C_i and b in C_j
in order to merge C_i and C_j. (Readers familiar with graph theory will recognize this
procedure as that used by Kruskal's algorithm to find a minimum spanning tree.) On
the other hand, the complete-linkage algorithm says that two clusters C_i and C_j are
similar if the maximum of d(a, b) over all a in C_i and b in C_j is small. In other
words, in the complete-linkage algorithm all pairs in C_i and C_j must be similar in
order to merge C_i and C_j.

Figure 5.4: Hierarchical clustering using the complete-linkage algorithm. (The farthest neighbor distance D_CL between clusters that merge is shown on the vertical axis.)

The Average-Linkage Algorithm

The single-linkage algorithm allows clusters to grow long and thin, whereas the complete-linkage
algorithm produces more compact clusters. Both clusterings are susceptible to
distortion by outliers or deviant observations. The average-linkage algorithm is
an attempt to compromise between the extremes of the single- and complete-linkage
algorithms.
The average-linkage clustering algorithm, also known as the unweighted pair
group method using arithmetic averages (UPGMA), is one of the most widely
used hierarchical clustering algorithms. The average-linkage algorithm is obtained by
defining the distance between two clusters to be the average distance between a point
in one cluster and a point in the other cluster. Formally, if C_i is a cluster with n_i
members and C_j is a cluster with n_j members, the distance between the clusters is

$$D_{AL}(C_i, C_j) = \frac{1}{n_i n_j} \sum_{a \in C_i,\, b \in C_j} d(a, b).$$
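
Under the same illustrative conventions as the earlier sketches, D_AL averages the n_i * n_j pairwise distances:

```python
from math import dist  # Euclidean distance (Python 3.8+)

# D_AL: the average of the n_i * n_j pairwise distances between Ci and Cj (UPGMA).
def average_linkage_dist(Ci, Cj, samples):
    total = sum(dist(samples[a], samples[b]) for a in Ci for b in Cj)
    return total / (len(Ci) * len(Cj))
```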

Example 5.3 Hierarchical clustering using the average-linkage algorithm.

Perform a hierarchical clustering using the average-linkage algorithm on the data shown
in Figure 5.2. Use Euclidean distance (4.1) for the distance between samples.
The algorithm begins with five clusters, each consisting of one sample. The nearest
clusters {1} and {2} are then merged to form the clusters

{1,2}, {3}, {4}, {5}.

Next obtain the matrix that gives the distances between these clusters:

        {1,2}      3      4      5
{1,2}             9.9   18.0   19.7
3        9.9             9.8    9.8
4       18.0      9.8           8.0
5       19.7      9.8    8.0

The value 9.9 in row {1,2} and column 3 gives the distance between the clusters {1,2}
and {3} and is computed in the following way. Matrix (5.1) shows that d(1,3) = 11.7
and d(2,3) = 8.1. In the average-linkage algorithm, the distance between clusters is
the average of these values, 9.9. The other values in the first row are computed in a
similar way. The values in other than the first row or first column are simply copied
from (5.1). Since the minimum value in this matrix is 8.0, the clusters {4} and {5} are
merged. At this point the clusters are

{1,2}, {3}, {4,5}.
Next obtain the matrix that gives the distance between these clusters:

        {1,2}      3    {4,5}
{1,2}             9.9   18.9
3        9.9             9.8
{4,5}   18.9      9.8

Since the minimum value in this matrix is 9.8, the clusters {3} and {4,5} are merged.
At this point the clusters are
{1,2}, {3,4, 5}.
At the next step, the two remaining clusters are merged and the hierarchical clustering
is complete.

An example of the application of the average-linkage algorithm to a larger data set
using the SAS statistical analysis software package is presented in Appendix B.4.

Ward's Method
Ward's method is also called the minimum-variance method. Like the other
algorithms, Ward's method begins with one cluster for each individual sample. At
each iteration, among all pairs of clusters, it merges the pair that produces the smallest
squared error for the resulting set of clusters. The squared error for each cluster is
defined as follows. If a cluster contains m samples x_1, ..., x_m, where x_i is the feature
vector (x_{i1}, ..., x_{id}), the squared error for sample x_i, which is the squared Euclidean
distance from the mean, is

$$\sum_{j=1}^{d} (x_{ij} - \mu_j)^2,$$

where μ_j is the mean value of feature j for the samples in the cluster:

$$\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_{ij}.$$

The squared error E for the entire cluster is the sum of the squared errors of the
samples,

$$E = \sum_{i=1}^{m} \sum_{j=1}^{d} (x_{ij} - \mu_j)^2 = m\sigma^2.$$

The vector composed of the means of each feature, (μ_1, ..., μ_d) = μ, is called the
mean vector or centroid of the cluster. The squared error for a cluster is the sum
of the squared distances in each feature from the cluster members to their mean. The
squared error is thus equal to the total variance of the cluster σ² times the number
of samples in the cluster, m, where the total variance is defined to be σ² = σ_1² + ... + σ_d²,
the sum of the variances for each feature. The squared error for a set of clusters is
defined to be the sum of the squared errors for the individual clusters.
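
This squared-error computation can be expressed directly in Python; the helper names below (squared_error, total_squared_error) are illustrative, not part of the original description.

```python
# Squared error E of one cluster: the sum, over its samples, of the squared
# Euclidean distance from each sample to the cluster centroid (mean vector).
def squared_error(cluster):
    m = len(cluster)
    d = len(cluster[0])
    centroid = [sum(x[j] for x in cluster) / m for j in range(d)]
    return sum((x[j] - centroid[j]) ** 2 for x in cluster for j in range(d))

# Squared error of a set of clusters: the sum of the individual cluster errors.
# Ward's method merges the pair of clusters that minimizes this quantity.
def total_squared_error(clusters):
    return sum(squared_error(c) for c in clusters)
```

For example, total_squared_error([[(4,4), (8,4)], [(15,8)], [(24,4)], [(24,12)]]) returns 8.0, matching the first merge in Example 5.4 below.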

Example 5.4 Hierarchical clustering using Ward's method.


Perform a hierarchical clustering using Ward's method on the data shown in Figure
5.2. The algorithm begins with five clusters, each consisting of one sample. At this
point, the squared error is zero. There are 10 possible ways to merge a pair of clusters:
merge {1} and {2}, merge {1} and {3}, and so on. Figure 5.5 shows the squared error
for each possibility. For example, consider merging {1} and {2}. Since sample 1 has
the feature vector (4,4) and sample 2 has the feature vector (8,4), the feature means
are 6 and 4. The squared error for cluster {1,2} is

(4 - 6)² + (8 - 6)² + (4 - 4)² + (4 - 4)² = 8.

The squared error for each of the other clusters {3}, {4}, and {5} is 0. Thus the total
squared error for the clusters {1,2}, {3}, {4}, {5} is

8 + 0 + 0 + 0 = 8.

Since the smallest squared error in Figure 5.5 is 8, the clusters {1} and {2} are merged
to give the clusters

{1,2}, {3}, {4}, {5}.

                              Squared
Clusters                      Error, E
{1,2}, {3}, {4}, {5}            8.0
{1,3}, {2}, {4}, {5}           68.5
{1,4}, {2}, {3}, {5}          200.0
{1,5}, {2}, {3}, {4}          232.0
{2,3}, {1}, {4}, {5}           32.5
{2,4}, {1}, {3}, {5}          128.0
{2,5}, {1}, {3}, {4}          160.0
{3,4}, {1}, {2}, {5}           48.5
{3,5}, {1}, {2}, {4}           48.5
{4,5}, {1}, {2}, {3}           32.0

Figure 5.5: Squared errors for each way of creating four clusters.

                         Squared
Clusters                 Error, E
{1,2,3}, {4}, {5}          72.7
{1,2,4}, {3}, {5}         224.0
{1,2,5}, {3}, {4}         266.7
{1,2}, {3,4}, {5}          56.5
{1,2}, {3,5}, {4}          56.5
{1,2}, {4,5}, {3}          40.0

Figure 5.6: Squared errors for three clusters.


2

2
2
Figure 5.6 shows the squared error for all possible sets of clusters that result from
merging two of {1,2}, {3}, {4}, {5}. Since the smallest squared error in Figure 5.6 is
40, the clusters {4} and {5} are merged to form the clusters

{1,2}, {3}, {4,5}.

Figure 5.7 shows the squared error for all possible sets of clusters that result from
merging two of {1,2}, {3}, {4,5}. Since the smallest squared error in Figure 5.7 is 94,
the clusters {3} and {4,5} are merged to give the clusters

{1,2}, {3,4,5}.

                    Squared
Clusters            Error, E
{1,2,3}, {4,5}       104.7
{1,2,4,5}, {3}       380.0
{1,2}, {3,4,5}        94.0

Figure 5.7: Squared errors for two clusters.


Figure 5.8: Dendrogram for Ward's method. (The sum of squared errors at each merge is shown on the vertical axis.)

At the next step, the two remaining clusters are merged and the hierarchical clustering
is complete. The resulting dendrogram is shown in Figure 5.8.

5.3 Partitional Clustering


Agglomerative clustering (Section 5.2) creates a series of nested clusters. This contrasts
with partitional clustering, in which the goal is usually to create one set of clusters
that partitions the data into similar groups. Samples close to one another are assumed
to be similar, and the goal of the partitional clustering algorithms is to group data that
are close together. In many of the partitional algorithms, the number of clusters to be
constructed is specified in advance.

If a partitional algorithm is used to divide the data set into two groups, and then
each of these groups is divided into two parts, and so on, a hierarchical dendrogram
could be produced from the top down. The hierarchy produced by this divisive

technique is more general than the bottom-up hierarchies produced by agglomerative
techniques because the groups can be divided into more than two subgroups in one
step. (The only way this could happen for an agglomerative technique would be for two
distances to tie, which would be extremely unlikely even if allowed by the algorithm.)
Another advantage of partitional techniques is that only the top part of the tree, which
shows the main groups and possibly their subgroups, may be required, and there may
be no need to complete the dendrogram. All of the examples in this section assume
that Euclidean distances are used, but the techniques could use any distance measure.

Forgy's Algorithm
One of the simplest partitional clustering algorithms is Forgy's algorithm [Forgy].
Besides the data, input to the algorithm consists of k, the number of clusters to be
constructed, and k samples called seed points. The seed points could be chosen
randomly, or some knowledge of the desired cluster structure could be used to guide
their selection.

Forgy's Algorithm
1. Initialize the cluster centroids to the seed points.
2. For each sample, find the cluster centroid nearest it. Put the sample in the
cluster identified with this nearest cluster centroid.

3. If no samples changed clusters in step 2, stop.

4. Compute the centroids of the resulting clusters and go to step 2.
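
A minimal Python sketch of these four steps may help make the procedure concrete; the function name forgy and its interface are illustrative, not part of the original description.

```python
from math import dist  # Euclidean distance (Python 3.8+)

# Illustrative sketch of Forgy's algorithm as listed above.
# samples and seed_points are sequences of feature tuples.
def forgy(samples, seed_points):
    centroids = [list(p) for p in seed_points]                 # step 1
    assignment = [None] * len(samples)
    while True:
        changed = False
        for i, x in enumerate(samples):                        # step 2
            nearest = min(range(len(centroids)),
                          key=lambda c: dist(x, centroids[c]))
            if nearest != assignment[i]:
                assignment[i] = nearest
                changed = True
        if not changed:                                        # step 3
            return centroids, assignment
        for c in range(len(centroids)):                        # step 4
            members = [samples[i] for i, a in enumerate(assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
```

Running this sketch on the five samples of Figure 5.2 with seed points (4,4) and (8,4) should reproduce the iterations of the example that follows, ending with centroids (6,4) and (21,8).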

Example 5.5 Partitional clustering using Forgy's algorithm.


Perform a partitional clustering using Forgy's algorithm on the data shown in Figure
5.2. Set k = 2, which will produce two clusters, and use the first two samples (4,4)
and (8,4) in the list as seed points. In this algorithm, the samples will be denoted by
their feature vectors rather than their sample numbers to aid in the computation.
For step 2, find the nearest cluster centroid for each sample. Figure 5.9 shows the
results. The clusters {(4,4)} and {(8,4), (15,8), (24,4), (24,12)} are produced.
For step 4, compute the centroids of the clusters. The centroid of the first cluster
is (4,4). The centroid of the second cluster is (17.75, 7) since

(8 + 15 + 24 + 24)/4 = 17.75   and   (4 + 8 + 4 + 12)/4 = 7.

            Nearest
Sample      Cluster Centroid
(4, 4)        (4, 4)
(8, 4)        (8, 4)
(15, 8)       (8, 4)
(24, 4)       (8, 4)
(24, 12)      (8, 4)

Figure 5.9: First iteration of Forgy's algorithm.


            Nearest
Sample      Cluster Centroid
(4, 4)        (4, 4)
(8, 4)        (4, 4)
(15, 8)       (17.75, 7)
(24, 4)       (17.75, 7)
(24, 12)      (17.75, 7)

Figure 5.10: Second iteration of Forgy's algorithm.

Since some samples changed clusters (there were initially no clusters), return to step 2.
Find the cluster centroid nearest each sample. Figure 5.10 shows the results. The
clusters {(4,4), (8,4)} and {(15,8), (24,4), (24,12)} are produced.
For step 4, compute the centroids (6,4) and (21,8) of the clusters. Since the sample
(8,4) changed clusters, return to step 2.
Find the cluster centroid nearest each sample. Figure 5.11 shows the results. The
clusters {(4,4), (8,4)} and {(15,8), (24,4), (24,12)} are obtained.
For step 4, compute the centroids (6,4) and (21,8) of the clusters. Since no sample
will change clusters, the algorithm terminates.

In this version of Forgy's algorithm, the seed points are chosen arbitrarily as the
first two samples; however, other possibilities have been suggested. One alternative is
to begin with k clusters generated by one of the hierarchical clustering algorithms and
use their centroids as initial seed points.

            Nearest
Sample      Cluster Centroid
(4, 4)        (6, 4)
(8, 4)        (6, 4)
(15, 8)       (21, 8)
(24, 4)       (21, 8)
(24, 12)      (21, 8)

Figure 5.11: Third iteration of Forgy's algorithm.

It has been proved [Selim] that Forgy's algorithm terminates; that is,
eventually no
samples change clusters. However, if the number of samples is large, it may take the
algorithm considerable time to produce stable clusters. For this reason, some versions
of Forgy's algorithm allow the user to restrict the number of iterations. Other versions
of Forgy's algorithm [Dubes] permit the user to supply parameters that allow new
clusters to be created and to establish a minimum cluster size.

The k-means Algorithm


An algorithm similar to Forgy's algorithm is known as the k-means algorithm. Besides
the data, input to the algorithm consists of k, the number of clusters to be
constructed. The k-means algorithm differs from Forgy's algorithm in that the centroids
of the clusters are recomputed as soon as a sample joins a cluster. Also, unlike
Forgy's algorithm, which is iterative, the k-means algorithm makes only two passes
through the data set.

k-means Algorithm
1. Begin with k clusters, each consisting of one of the first k samples. For each
of the remaining n - k samples, find the centroid nearest it. Put the sample in
the cluster identified with this nearest centroid. After each sample is assigned,
recompute the centroid of the altered cluster.

2. Go through the data a second time. For each sample, find the centroid nearest
it. Put the sample in the cluster identified with this nearest centroid. (During
this step, do not recompute any centroid.)
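
The two passes can be sketched in a few lines of Python; as with the earlier sketches, the function name k_means is illustrative, and the first k samples serve as the initial clusters, as in step 1.

```python
from math import dist  # Euclidean distance (Python 3.8+)

# Illustrative sketch of the two-pass k-means algorithm listed above.
def k_means(samples, k):
    clusters = [[samples[i]] for i in range(k)]            # step 1: first k samples
    centroids = [list(samples[i]) for i in range(k)]
    for x in samples[k:]:
        c = min(range(k), key=lambda j: dist(x, centroids[j]))
        clusters[c].append(x)
        # recompute the centroid of the altered cluster immediately
        centroids[c] = [sum(col) / len(clusters[c]) for col in zip(*clusters[c])]
    # step 2: a second pass assigning each sample to the nearest centroid,
    # without recomputing any centroid
    final_clusters = [[] for _ in range(k)]
    for x in samples:
        c = min(range(k), key=lambda j: dist(x, centroids[j]))
        final_clusters[c].append(x)
    return final_clusters, centroids
```

On the sample ordering used in Example 5.6 below, (8,4), (24,4), (15,8), (4,4), (24,12), this sketch should end step 1 with centroids (9, 5.3) and (24, 8) and produce the same final clusters.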

Example 5.6 Partitional clustering using the k-means algorithm.



            Distance to         Distance to
Sample      Centroid (9, 5.3)   Centroid (24, 8)
(8, 4)           1.6                16.5
(24, 4)         15.1                 4.0
(15, 8)          6.6                 9.0
(4, 4)           5.2                20.4
(24, 12)        16.4                 4.0

Figure 5.12: Distances for use by step 2 of the k-means algorithm.
Perform a partitional clustering using the k-means algorithm on the data in Figure
5.2. Set k = 2 and assume that the data are ordered so that the first two samples are
(8,4) and (24,4).
For step 1, begin with two clusters {(8,4)} and {(24,4)}, which have centroids at
(8,4) and (24,4). For each of the remaining three samples, find the centroid nearest
it, put the sample in this cluster, and recompute the centroid of this cluster.
The next sample (15,8) is nearest the centroid (8,4) so it joins cluster {(8,4)}.
At this point, the clusters are {(8,4), (15,8)} and {(24,4)}. The centroid of the first
cluster is updated to (11.5, 6) since

(8 + 15)/2 = 11.5   and   (4 + 8)/2 = 6.

The next sample (4,4) is nearest the centroid (11.5, 6) so it joins cluster {(8,4),
(15,8)}. At this point, the clusters are {(8,4), (15,8), (4,4)} and {(24,4)}. The centroid
of the first cluster is updated to (9, 5.3).
The next sample (24,12) is nearest the centroid (24,4) so it joins cluster {(24,4)}.
At this point, the clusters are {(8,4), (15,8), (4,4)} and {(24,12), (24,4)}. The centroid
of the second cluster is updated to (24, 8). At this point, step 1 of the algorithm is
complete.
For step 2, examine the samples one by one and put each one in the cluster identified
with the nearest centroid. As Figure 5.12 shows, in this case no sample changes clusters.
The resulting clusters are

{(8,4), (15,8), (4,4)} and {(24,12), (24,4)}.

An alternative version of the k-means algorithm iterates step 2. Specifically, step
2 is replaced by the following steps 2 through 4:

2. For each sample, find the centroid nearest it. Put the sample in the cluster
identified with this nearest centroid.
