
Cluster Analysis

Where is Cluster Analysis Used?
• Understanding Buyer Behaviour:
– Identify homogeneous groups of buyers
• Identifying new product opportunities:
– Determine competitive sets within a market
– Compare current offerings with those of competitors
• Selecting test markets:
– Grouping cities into homogeneous markets
• Reducing data:
– Create sub-groups of data
Cluster Analysis
• Unsupervised learning
• Does not predict anything in particular
• Not a classification technique: we do not know the classes!!
• Can be used for segmentation
• Do sub-populations exist?
– How many?
– What are their sizes?
– Do they share any common properties?
– Can they be split further?
• Any outliers?


• Types of Clustering Algorithms
– K-Means Clustering
– Hierarchical Clustering
K-Means Clustering
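• Partitions the data into K clusters
• K needs to be determined in advance
• Uses the concept of a 'centroid'
• Ex: types of documents (News/Scientific/Legal)
• Methodology
– Establish K initial centroids
– Connect each pair of centroids and bisect the connecting line with a perpendicular line; each data point lies on the side of its nearest centroid
– Assign each data point to its nearest centroid
– Calculate the squared distance between each point and its centroid; summed over a cluster, this is the total squared intra-cluster distance (T)
– Relocate each centroid to the mean of its assigned points, which minimizes T for that cluster
– Repeat the above steps until the centroids no longer relocate
– Because the result depends on the initial centroids, repeat the whole procedure several times and pick the clustering that yields the lowest aggregate distance, i.e., the sum of T over all clusters (lowest variance)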
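The methodology above can be sketched in Python as Lloyd's algorithm with random restarts. This is a minimal illustration, not the slides' own code: the function names, the sample points, the `n_init=10` restarts and the fixed `seed` are assumptions made for the example. Assigning each point to its nearest centroid gives the same result as the perpendicular-bisector construction, so the sketch simply compares distances.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]


def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm; returns (centroids, labels, T summed over clusters)."""
    centroids = rng.sample(points, k)  # establish K initial centroids
    for _ in range(max_iter):
        labels = assign(points, centroids)
        # Relocate each centroid to the mean of the points assigned to it
        # (an empty cluster keeps its old centroid).
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centroids.append(mean(members) if members else centroids[c])
        if new_centroids == centroids:  # centroids no longer relocate
            break
        centroids = new_centroids
    labels = assign(points, centroids)
    total = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, total


def kmeans(points, k, n_init=10, seed=0):
    """Run several random starts; keep the clustering with the lowest aggregate distance."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[2])


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels, total = kmeans(points, k=2)
print(labels, round(total, 4))
```

Keeping the best of several runs is what the last methodology step asks for: a single run can settle in a poor local minimum, and comparing the summed T across runs is a cheap guard against that.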
K-Means Clustering
K-Means Animation: http://stanford.edu/class/ee103/visualizations/kmeans/kmeans.html
K-Means Clustering
• Number of clusters
– Within-cluster variance is zero when the number of clusters equals the number of data points!!
– Variance decreases as the number of clusters increases
– However, the decrease in variance gets smaller with each K added
• Use the elbow of a scree plot (variance against K) to determine K
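A scree plot needs the within-cluster sum of squares W(K) for each K. As an assumption that keeps the illustration small and exact, the sketch below works on 1-D data only: there the optimal clusters are contiguous runs of the sorted values, so dynamic programming finds the true minimum W(K) rather than a k-means local optimum. The helper `elbow` is hypothetical; it picks the K whose W(K) lies farthest below the straight line from W(1) to W(n), which is one simple way to automate reading the elbow by eye.

```python
def wcss_curve(xs):
    """Exact minimal within-cluster sum of squares W(K), K = 1..n, for 1-D data."""
    xs = sorted(xs)
    n = len(xs)
    # Prefix sums give the sum of squares of any contiguous run xs[i:j] in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s[i + 1] = s[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(i, j):
        """Sum of squared deviations of xs[i:j] from their mean."""
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: lowest W when the first j sorted values form k clusters.
    best = [[inf] * (n + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for k in range(1, n + 1):
        for j in range(k, n + 1):
            best[k][j] = min(best[k - 1][i] + cost(i, j) for i in range(k - 1, j))
    # Clamp tiny negative values caused by floating-point round-off.
    return [max(0.0, best[k][n]) for k in range(1, n + 1)]


def elbow(w):
    """Hypothetical helper: the K whose W(K) lies farthest below the chord from W(1) to W(n)."""
    kmax = len(w)
    if kmax < 3:
        return 1
    gaps = [w[0] + (w[-1] - w[0]) * (k - 1) / (kmax - 1) - w[k - 1]
            for k in range(1, kmax + 1)]
    return 1 + max(range(kmax), key=lambda i: gaps[i])


xs = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.0, 9.2, 8.8]
w = wcss_curve(xs)
for k, wk in enumerate(w, start=1):
    print(k, round(wk, 3))
print("elbow at K =", elbow(w))
```

The printed curve shows both points from the slide: W(K) never increases as K grows and reaches zero at K = n, while the drop from one K to the next shrinks sharply once K matches the three visible groups.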
Hierarchical Clustering
• Builds clusters that have a predetermined ordering from top to bottom, e.g., files and folders on your hard disk
• The resulting tree is called a dendrogram
• Types of hierarchical clustering
– Divisive method:
• Assign all observations to a single cluster
• Partition the cluster into the 2 least similar clusters
• Proceed recursively until there is one cluster for each observation
– Agglomerative method:
• Assign each observation to its own cluster
• Compute the distance (similarity) between each pair of clusters
• Join the 2 most similar clusters
• Proceed recursively until there is only a single cluster
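The agglomerative steps can be sketched as follows. The slides do not fix how the distance between two clusters is computed; this sketch assumes single linkage (the closest pair of points, `linkage=min`), and passing `linkage=max` would give complete linkage instead. The function name and the sample points are illustrative only. Read from first to last, the list of merges is the information a dendrogram draws.

```python
import math


def agglomerative(points, linkage=min):
    """Bottom-up clustering; returns merges as (cluster_a, cluster_b, distance)."""
    # Start with one cluster per observation.
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the two most similar (closest) clusters under the chosen linkage.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(math.dist(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        # Replace the two clusters by their union.
        merged = clusters[a] | clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return merges


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
merges = agglomerative(points)
for left, right, d in merges:
    print(left, "+", right, "at distance", round(d, 2))
```

This brute-force search recomputes every pairwise cluster distance at each step, which is fine for a handful of points but far too slow for real data sets.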
Hierarchical Clustering Approaches

Agglomerative
• Works “bottom up”
• The two most similar clusters are combined into a node
• Iterated until the root cluster is reached
Divisive
• Works “top down”
• A cluster is split into the two least similar clusters
• Iterated until the leaf clusters are reached
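The slides do not say how a divisive method finds the “two least similar” clusters. The sketch below assumes a splinter-group heuristic in the style of DIANA. It starts a splinter group with the point that is most dissimilar on average, then keeps moving over points that are closer on average to the splinter group than to the remainder. Choosing the cluster with the largest diameter to split next is also an assumption, and `split`, `diameter`, `divisive` and the data are hypothetical names for illustration.

```python
import math


def split(cluster, points):
    """Split one cluster into two dissimilar groups with a splinter-group heuristic."""
    def d(i, j):
        return math.dist(points[i], points[j])

    def avg(i, group):
        others = [j for j in group if j != i]
        return sum(d(i, j) for j in others) / len(others)

    # Seed the splinter group with the point least similar, on average, to the rest.
    start = max(cluster, key=lambda i: avg(i, cluster))
    splinter = [start]
    rest = [i for i in cluster if i != start]
    while len(rest) > 1:
        # Gain > 0: the point is closer on average to the splinter group than to the rest.
        gains = {i: avg(i, rest) - sum(d(i, j) for j in splinter) / len(splinter)
                 for i in rest}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        splinter.append(best)
        rest.remove(best)
    return sorted(splinter), sorted(rest)


def diameter(cluster, points):
    """Largest pairwise distance inside a cluster."""
    return max(math.dist(points[i], points[j]) for i in cluster for j in cluster)


def divisive(points):
    """Top-down clustering: split until every cluster holds a single observation."""
    clusters = [list(range(len(points)))]  # all observations in one cluster
    splits = []
    while any(len(c) > 1 for c in clusters):
        target = max((c for c in clusters if len(c) > 1),
                     key=lambda c: diameter(c, points))
        left, right = split(target, points)
        splits.append((target, left, right))
        clusters.remove(target)
        clusters += [left, right]
    return splits


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
splits = divisive(points)
for target, left, right in splits:
    print(target, "->", left, "|", right)
```

On this data the first split isolates the outlying point (10, 0), which illustrates how the top-down direction surfaces outliers early, while the agglomerative direction would merge that point last.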
Agglomerative Hierarchical Clustering

Source: http://infolab.stanford.edu/~ullman/mmds/ch7.pdf
Cluster Analysis - Steps
• Formulate the problem
• Select a distance measure
• Select a clustering procedure
• Decide on the number of clusters
• Interpret and profile the clusters
• Assess the validity of clustering
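The last steps need a distance measure and a way to check whether the clusters are any good. The slides do not name a validity measure, so the sketch below substitutes the silhouette coefficient, one common choice. Each point scores (b - a) / max(a, b), where a is its mean distance to its own cluster and b is its mean distance to the nearest other cluster. The distance measure is a parameter: Euclidean (`math.dist`) is the default and Manhattan is an alternative. The function names and sample labels are assumptions made for illustration.

```python
import math


def manhattan(a, b):
    """City-block distance, an alternative to Euclidean distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def silhouette(points, labels, dist=math.dist):
    """Mean silhouette coefficient: close to 1 means compact, well-separated clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = [q for j, (q, lab) in enumerate(zip(points, labels))
               if lab == labels[i] and j != i]
        if not own:  # common convention: a point alone in its cluster scores 0
            scores.append(0.0)
            continue
        a = sum(dist(p, q) for q in own) / len(own)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
good = [0, 0, 0, 1, 1, 1]   # split along the two visible groups
mixed = [0, 1, 0, 1, 0, 1]  # groups mixed together
print(silhouette(points, good), silhouette(points, mixed))
print(silhouette(points, good, dist=manhattan))
```

A score well above zero supports the chosen clustering, while a score near or below zero suggests revisiting the distance measure, the procedure or the number of clusters.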
