BITS Pilani, Hyderabad Campus
Dr. Aruna Malapati
Asst Professor, Department of CSIS

K-Means Clustering
Today’s Learning Objectives

• List the clustering algorithms

• Define K-Means clustering algorithm

• List and resolve issues with K-Means clustering

BITS Pilani, Hyderabad Campus


Clustering Algorithms

• K-means and its variants

• Hierarchical clustering

• Density-based clustering

K-means Clustering
• Partitional clustering approach
• Each cluster is associated with a centroid (center point)
• Each point is assigned to the cluster with the closest centroid
• Number of clusters, K, must be specified
• The basic algorithm is very simple
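The basic loop can be sketched in a few lines; the following is an illustrative NumPy version (the function name, iteration cap, and seed are choices made here, not prescribed by the slides):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking K distinct data points at random.
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points;
        # an empty cluster keeps its old centroid.
        new_centroids = np.array([
            X[labels == k].mean(axis=0) if (labels == k).any() else centroids[k]
            for k in range(K)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: the assignments can no longer change
        centroids = new_centroids
    return labels, centroids
```

Stopping when the centroids stop moving is equivalent to the cluster assignments no longer changing.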

Importance of Choosing
Initial Centroids

[Figure: six scatter plots (Iterations 1–6) in the x–y plane showing how the centroids and cluster assignments evolve across K-means iterations.]

K-Means Clustering
(Section 9.1, Bishop, p. 454)

• Given a data set {x1, . . . , xN}, where each xn is a D-dimensional Euclidean variable.
• Our goal is to partition the data set into some number K of clusters.
• Let μk, where k = 1, . . . , K, be a prototype associated with the kth cluster (representing the centre of that cluster).
• Our goal is then to find an assignment of data points to clusters, as well as a set of vectors {μk}, such that the sum of the squared distances from each data point to its closest vector μk is a minimum.

K Means clustering

• For each data point xn, we introduce a corresponding set of binary indicator variables rnk ∈ {0, 1}, where k = 1, . . . , K, describing which of the K clusters the data point xn is assigned to: if xn is assigned to cluster k, then rnk = 1 and rnj = 0 for j ≠ k (a 1-of-K coding).

K-means Clustering

• We can then define an objective function, which represents the sum of the squares of the distances of each data point to its assigned vector μk:

J = Σn=1..N Σk=1..K rnk ‖xn − μk‖²

• Our goal is to find values for the {rnk} and the {μk} so as to minimize J.
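As a concrete reading of J, the sketch below (the function name is mine, not from the slides) evaluates the objective when the one-of-K indicators rnk are encoded as a single integer label per point:

```python
import numpy as np

def kmeans_objective(X, labels, mu):
    # rnk is 1 only for the assigned cluster, so J reduces to the squared
    # distance from each point xn to its own prototype mu[labels[n]].
    diffs = X - mu[labels]
    return float(np.sum(diffs ** 2))
```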

Importance of Choosing
Initial Centroids

[Figure: K-means Iterations 1–5 in the x–y plane for a different random initialization.]

Solution to Random
Initialization

• Perform multiple runs from different initial centroids and select the set of clusters with the minimum SSE.
• The success of this approach depends on the data set and the number of clusters chosen.
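A minimal sketch of this multiple-run strategy, assuming a small helper K-means pass (`_one_run`, written here purely for illustration):

```python
import numpy as np

def _one_run(X, K, rng, n_iter=50):
    # One K-means pass from a random initialization.
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        centroids = np.array([X[labels == k].mean(axis=0) if (labels == k).any()
                              else centroids[k] for k in range(K)])
    sse = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, sse

def kmeans_restarts(X, K, n_runs=10, seed=0):
    # Run K-means several times and keep the clustering with the lowest SSE.
    rng = np.random.default_rng(seed)
    return min((_one_run(X, K, rng) for _ in range(n_runs)),
               key=lambda run: run[2])
```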

Handling Empty Clusters

• The basic K-means algorithm can yield empty clusters

• Several strategies for choosing a replacement centroid:

• Choose the point that contributes most to the SSE (the point farthest from any current centroid)

• Choose a point from the cluster with the highest SSE

– If there are several empty clusters, the above can be repeated several times.

Updating Centers
Incrementally
• In the basic K-means algorithm, centroids are updated after
all points are assigned to a centroid

• An alternative is to update the centroids after each assignment (incremental approach)
– Each assignment updates zero or two centroids
– More expensive
– Never get an empty cluster
– Can use “weights” to change the impact
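The zero-or-two-centroid bookkeeping can be sketched with running counts; `move_point` below is a hypothetical helper, not from the slides, and it mutates `centroids` and `counts` in place:

```python
import numpy as np

def move_point(x, src, dst, centroids, counts):
    # Remove x's contribution from the source centroid (running-mean update)...
    counts[src] -= 1
    if counts[src] > 0:
        centroids[src] = (centroids[src] * (counts[src] + 1) - x) / counts[src]
    # ...and add it to the destination centroid.
    counts[dst] += 1
    centroids[dst] = (centroids[dst] * (counts[dst] - 1) + x) / counts[dst]
```

Each removal and addition is constant-time in the number of points, so a reassignment touches only the two affected centroids.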

Pre-processing and Post-
processing
• Pre-processing
– Normalize the data
– Eliminate outliers

• Post-processing
– Eliminate small clusters that may represent outliers
– Split ‘loose’ clusters, i.e., clusters with relatively high SSE
– Merge clusters that are ‘close’ and that have relatively low
SSE
– Can use these steps during the clustering process
• ISODATA

Bisecting K-means

• Bisecting K-means algorithm


– Variant of K-means that can produce a partitional or a hierarchical clustering
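One common formulation, following the usual "split the cluster with the highest SSE" rule (the helper names here are illustrative, not from the slides):

```python
import numpy as np

def _two_means(X, rng, n_iter=50):
    # Minimal 2-means used as the bisection step.
    c = X[rng.choice(len(X), size=2, replace=False)]
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - c[None], axis=2).argmin(axis=1)
        c = np.array([X[labels == k].mean(axis=0) if (labels == k).any() else c[k]
                      for k in range(2)])
    return labels

def bisecting_kmeans(X, K, seed=0):
    rng = np.random.default_rng(seed)
    clusters = [np.arange(len(X))]  # start with one cluster holding all points
    while len(clusters) < K:
        # Pick the cluster with the highest SSE and bisect it.
        sses = [float(((X[idx] - X[idx].mean(axis=0)) ** 2).sum())
                for idx in clusters]
        idx = clusters.pop(int(np.argmax(sses)))
        split = _two_means(X[idx], rng)
        clusters += [idx[split == 0], idx[split == 1]]
    labels = np.empty(len(X), dtype=int)
    for k, idx in enumerate(clusters):
        labels[idx] = k
    return labels
```

Recording the order of the splits yields a hierarchical clustering; keeping only the final K clusters yields a partitional one.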

Bisecting K-means Example

Limitations of K-means

• K-means has problems when clusters are of differing

– Sizes

– Densities

– Non-globular shapes

• K-means has problems when the data contains outliers.

Limitations of K-means:
Differing Sizes

Original Points K-means (3 Clusters)

Limitations of K-means:
Differing Density

Original Points K-means (3 Clusters)

Limitations of K-means:
Non-globular Shapes

Original Points K-means (2 Clusters)

Problems with K-Means
Clustering
• K-Means clustering works only for clusters that are roughly Gaussian; hence, we cannot use it to find complex or non-convex clusters.

• The K-Means algorithm is very sensitive to initialization, so one must be careful while initializing the cluster means.

• The algorithm can get stuck in a local optimum, finding clusters different from those originally wanted. This, too, is affected by the initialization of the cluster means.

K-medoids Clustering
Algorithm

PAM (Partitioning Around
Medoids) (1987)
• PAM (Kaufman and Rousseeuw, 1987) uses real objects (medoids) to represent the clusters:
1. Select k representative objects arbitrarily.
2. For each pair of a non-selected object h and a selected object i, calculate the total swapping cost TCih.
3. For each pair of i and h, if TCih < 0, replace i with h.
4. Assign each non-selected object to the most similar representative object.
5. Repeat steps 2–4 until there is no change.
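The swap test can be sketched directly from a precomputed pairwise distance matrix; this is a naive scan over all (i, h) swaps per pass, with function names of my choosing:

```python
import numpy as np

def total_cost(D, medoids):
    # Cost of a medoid set: sum of each point's distance to its nearest medoid.
    return float(D[:, medoids].min(axis=1).sum())

def pam(D, k, seed=0):
    # D is a symmetric pairwise distance matrix over the n objects.
    rng = np.random.default_rng(seed)
    medoids = list(rng.choice(len(D), size=k, replace=False))
    improved = True
    while improved:
        improved = False
        for mi, i in enumerate(list(medoids)):
            for h in range(len(D)):
                if h in medoids:
                    continue
                trial = medoids.copy()
                trial[mi] = h
                # Accept the swap i -> h when it lowers the total cost (TCih < 0).
                if total_cost(D, trial) < total_cost(D, medoids):
                    medoids = trial
                    improved = True
    labels = D[:, medoids].argmin(axis=1)
    return medoids, labels
```

Each cost evaluation is O(nk) here, which is why PAM scales poorly to large data sets (motivating CLARA).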

A Typical K-Medoids
Algorithm (PAM)

Computation Complexity
for K-Means

• In each iteration:
• It costs O(Kn) to compute the distances between each of the n examples and the K cluster means
• It costs O(n) to update the cluster means by assigning each example to one cluster
• Assuming t iterations are performed before the algorithm terminates, the overall computational complexity is O(tKn)

K-Means/Median/Mode/Medoid
Clustering complexity

Take home message

• The K-means algorithm is a simple yet popular method for clustering analysis.
• Its performance is determined by the initialization and the choice of an appropriate distance measure.
• There are several variants of K-means that overcome its weaknesses:
• K-Medoids: resistance to noise and/or outliers
• K-Modes: extension to categorical data clustering
• CLARA: extension to deal with large data sets
• Mixture models (EM algorithm): handling uncertainty of cluster assignments