
Clustering

What is Cluster Analysis?

Cluster Analysis (or Clustering or Data Segmentation) is


• Grouping similar objects (mostly consumers) into Clusters (or Segments)
based on a large number of characteristics found in the data
• It is an ‘unsupervised learning’ technique
A Cluster…

Is a collection of objects
• Similar to one another within the same group
• Dissimilar to objects in other groups
Let us look at the questionnaire that you filled in…
There were 14 attitude-, lifestyle- and activity-related statements, such as
• I generally plan my expenses and never spend more than my budget
• I prefer reading or listening to music rather than exercising or playing a
sport
• I make it a point to do some physical exercises (like swimming, walking,
yoga) almost every day
• I prefer spending my weekends at home with family rather than partying
out with friends

WE WOULD WANT TO CLUB SIMILAR PEOPLE (…AND CREATE CLUSTERS)


BASED ON THEIR RESPONSES TO THESE STATEMENTS
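One way to make "similar people" concrete is to treat each respondent's answers as a vector and measure the distance between vectors. The sketch below is illustrative only: the 1-5 agreement scale, the respondent labels and the response values are all assumptions, and only four statements are used to keep it short.

```python
import math

# Hypothetical responses: one list per respondent, one entry per statement.
# The 1-5 agreement scale and all values are assumptions for illustration.
responses = {
    "A": [5, 4, 1, 5],
    "B": [4, 5, 2, 4],
    "C": [1, 2, 5, 1],
}

def distance(u, v):
    """Euclidean distance between two response vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

d_ab = distance(responses["A"], responses["B"])
d_ac = distance(responses["A"], responses["C"])

# A smaller distance means more similar responses.
print("A-B:", d_ab, "A-C:", round(d_ac, 2))
```

Under these made-up numbers, A and B answer alike and would be clubbed together before either is clubbed with C.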
Two commonly used methods of Clustering…

• Hierarchical method, and


• K-Means method
Hierarchical Clustering…

• It starts with the assumption that each individual element is a Cluster


• Compares all pairs and clubs the nearest pair to form a cluster
• Then, repeatedly goes on combining two nearest clusters
Example: Hierarchical clustering

[Figure: scatter plot of six data points and the centroids of the clusters formed from them]

Data:
o … data point: (0,0), (1,2), (2,1), (4,1), (5,0), (5,3)
x … centroid: (1.5,1.5), (4.7,1.3), (1,1), (4.5,0.5)
Dendrogram
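The merge order that a dendrogram records can be sketched in a few lines. The six data points are those shown in the example figure. The slides do not say how "nearest" is measured between clusters, so this sketch assumes Euclidean distance between cluster centroids. The function names are hypothetical, and ties are broken by whichever pair is found first.

```python
import math

# Data points read from the example figure.
points = [(0, 0), (1, 2), (2, 1), (4, 1), (5, 0), (5, 3)]

def centroid(members):
    """Mean position of a list of 2-D points."""
    n = len(members)
    return (sum(p[0] for p in members) / n, sum(p[1] for p in members) / n)

def agglomerate(points):
    """Start with one cluster per point, then repeatedly merge the two
    clusters whose centroids are nearest (an assumed linkage rule).
    Returns (members, centroid) for each merge, in merge order."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        # Compare all pairs of clusters and remember the nearest pair.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        merges.append((sorted(merged), centroid(merged)))
    return merges

merges = agglomerate(points)
for members, c in merges:
    print(members, "-> centroid", (round(c[0], 1), round(c[1], 1)))
```

Under this assumption, the first four merges produce the centroids marked in the figure: (1.5,1.5), (4.5,0.5), (1,1) and (4.7,1.3). The fifth merge joins everything into a single cluster, which is the top of the dendrogram.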
K-Means Clustering…

• Mostly done when the number of Clusters (segments) is broadly known


• So, it starts by picking K, the number of clusters
Example: Assigning Clusters

[Figure: data points (x) and cluster centroids. Clusters after round 1]
Example: Assigning Clusters

[Figure: data points (x) and cluster centroids. Clusters after round 2]
Example: Assigning Clusters

[Figure: data points (x) and cluster centroids. Clusters at the end]
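The rounds of assigning clusters can be sketched as below. This is a minimal sketch of the standard K-Means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points), not necessarily the exact procedure behind the figures. The coordinates in the figures are not recoverable, so the data points and starting centroids here are made up, and the function name `kmeans` is hypothetical.

```python
import math

def kmeans(points, initial_centroids, max_rounds=100):
    """Alternate assignment and centroid update until no point changes
    cluster. K is the number of initial centroids supplied."""
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_rounds):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # clusters are stable, so stop
        assignment = new_assignment
        # Update step: move each centroid to the mean of its points.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids

# Made-up data: two visually separate groups, K = 2.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
# Deliberately poor start: both centroids sit in the same group.
assignment, centroids = kmeans(points, [(1, 1), (1, 2)])
print(assignment)
print(centroids)
```

With this starting choice, the first round puts (1,2) in the wrong group, the second round corrects it, and the third round changes nothing, so the loop stops.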
