Week-9 Part-2

Clustering Analysis Methods


In data mining and statistics, hierarchical clustering
is a method of cluster analysis that builds a
hierarchy of clusters, i.e. a tree-like structure of
nested groupings.

In machine learning, clustering is an unsupervised
learning technique that groups data based on the
similarity between data points. There are several
different types of clustering algorithms.
Types of Hierarchical Clustering
Basically, there are two types of hierarchical
clustering:

➢Agglomerative clustering
➢Divisive clustering (rarely used)
Agglomerative Clustering
Agglomerative clustering is a hierarchical clustering algorithm that starts with each data
point as a single cluster and iteratively merges the closest pairs of clusters until only one
cluster remains. Let's walk through a simple numerical example to illustrate the
agglomerative clustering process.
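As a rough sketch of this merge loop (a minimal pure-Python, single-linkage version with made-up 1-D points, not the example from the slides):

```python
# Minimal agglomerative clustering sketch (single linkage).
# Each point starts as its own cluster; at every step the two
# closest clusters are merged until one cluster remains.

def single_linkage_distance(c1, c2):
    """Minimum pairwise distance between two clusters of 1-D points."""
    return min(abs(a - b) for a in c1 for b in c2)

def agglomerative(points):
    clusters = [[p] for p in points]   # every point is a singleton cluster
    merges = []                        # record of merge steps
    while len(clusters) > 1:
        # find the closest pair of clusters
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: single_linkage_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]   # merge cluster j into i
        del clusters[j]
    return merges

merges = agglomerative([1, 2, 9, 10, 25])
print(merges[0])   # → ([1], [2]) — the first merge pairs the closest points
```

Each iteration scans all cluster pairs, so this naive version costs O(n³) overall; library implementations use priority queues to do better.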
Divisive Hierarchical Clustering:
• Iteratively splits the most "heterogeneous" cluster until
each cluster contains only a single data point.
• Divisive clustering is less commonly used than
agglomerative clustering, as it can be computationally
expensive.
Choosing the Right Method:
• The best type of hierarchical clustering algorithm for a
given task depends on the specific data and the
desired outcome.
• Overall, hierarchical clustering is a powerful and
versatile technique for unsupervised learning. By
choosing the right method and considering the
limitations, you can use it to gain valuable insights from
your data.
Computing Distance Matrix
• While merging two clusters, we check the distance between
every pair of clusters and merge the pair with the least
distance/most similarity. But how is that distance
determined? There are different ways of defining inter-
cluster distance/similarity. Some of them are:
• Min Distance: the minimum distance between any two
points, one from each cluster.
• Max Distance: the maximum distance between any two
points, one from each cluster.
• Group Average: the average distance over all pairs of
points, one from each cluster.
• Ward’s Method: the similarity of two clusters is based on
the increase in squared error when the two clusters are merged.
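The first three definitions can be written down directly (a small illustration on two made-up 1-D clusters):

```python
# Three ways of defining the distance between two clusters,
# shown on small made-up 1-D clusters.

c1 = [1.0, 2.0]
c2 = [5.0, 9.0]

# all pairwise distances, one point from each cluster: 4, 8, 3, 7
dists = [abs(a - b) for a in c1 for b in c2]

min_dist = min(dists)               # Min Distance (single linkage)
max_dist = max(dists)               # Max Distance (complete linkage)
avg_dist = sum(dists) / len(dists)  # Group Average

print(min_dist, max_dist, avg_dist)   # → 3.0 8.0 5.5
```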
Types of Agglomerative Hierarchical
Clustering
Single Linkage: The distance between two clusters is
the minimum distance between any two points in the
different clusters. This method tends to create
long, stringy clusters.
Types of Agglomerative Hierarchical
Clustering contd..
Complete Linkage: The distance between two
clusters is the maximum distance between any two
points in the different clusters. This method tends to
create compact, spherical clusters.
Types of Agglomerative Hierarchical
Clustering contd..
Average Linkage: The distance between two clusters
is the average distance between all pairs of points in
the different clusters. This method is a compromise
between single and complete linkage.
Types of Agglomerative Hierarchical
Clustering contd..
Ward's Method: Merges clusters that minimize the
increase in total within-cluster variance. This method
is often used when the clusters are expected to be
spherical.
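Ward's criterion can be illustrated numerically (a toy 1-D example with made-up values): a candidate merge is scored by how much the total within-cluster squared error grows.

```python
# Ward's method: score a merge by the increase in total
# within-cluster sum of squared errors (SSE).

def sse(cluster):
    """Sum of squared deviations of a cluster around its mean."""
    m = sum(cluster) / len(cluster)
    return sum((x - m) ** 2 for x in cluster)

c1 = [1.0, 2.0]
c2 = [5.0, 9.0]

before = sse(c1) + sse(c2)   # SSE with the clusters kept separate
after = sse(c1 + c2)         # SSE after merging them
increase = after - before    # Ward's merge cost

print(before, after, increase)   # → 8.5 38.75 30.25
```

At each step, Ward's method merges the pair of clusters with the smallest such increase.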
Types of Clustering
❖ Centroid-based clustering: This type of clustering algorithm
forms clusters around the centroids of the data points. Example: K-
Means clustering, K-Mode clustering.
❖ Distribution-based clustering: This type of clustering
algorithm is modeled using statistical distributions. It
assumes that the data points in a cluster are generated from a
particular probability distribution, and the algorithm aims to
estimate the parameters of that distribution to group similar
data points into clusters. Example: Gaussian Mixture Models
(GMM)
❖ Density-based clustering: This type of clustering algorithm
groups together data points that lie in high-density regions
and separates points in low-density regions. The basic idea is
to identify regions of the data space that have a high density
of data points and group those points together into clusters.
Example: DBSCAN (Density-Based Spatial Clustering of
Applications with Noise)
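The centroid-based family above can be sketched with one assign/update loop of K-Means (toy 1-D data, values made up):

```python
# K-Means, the classic centroid-based algorithm: assign each point
# to its nearest centroid, then move each centroid to the mean of
# its assigned points. Repeated until the centroids settle.

points = [1.0, 2.0, 9.0, 10.0]
centroids = [0.0, 8.0]      # arbitrary initial centroids

for _ in range(10):         # a few assign/update rounds
    # assignment step: index of the nearest centroid for each point
    assign = [min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
              for p in points]
    # update step: each centroid becomes the mean of its points
    centroids = [
        sum(p for p, a in zip(points, assign) if a == k) /
        max(1, sum(1 for a in assign if a == k))
        for k in range(len(centroids))
    ]

print(centroids)   # → [1.5, 9.5]
```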
Hierarchical Agglomerative Clustering
It is also known as the bottom-up approach or hierarchical agglomerative
clustering (HAC). It produces a structure that is more informative than the
unstructured set of clusters returned by flat clustering, and it does not require
us to prespecify the number of clusters. Bottom-up algorithms treat each data point
as a singleton cluster at the outset and then successively merge pairs of clusters
until all points have been merged into a single cluster that contains all the data.
Hierarchical Divisive clustering
It is also known as the top-down approach. This algorithm also does not require us to prespecify the
number of clusters. Top-down clustering starts with a single cluster containing the whole data set
and proceeds by splitting clusters recursively until each data point is in its own singleton
cluster.
Solved Example for HAC
(The worked example — problem definition, the Euclidean distance formula, the pairwise
distance calculations, the initial distance matrix, and the successive distance-matrix
updates — was presented as figures on the following slides.)
HAC-Solved Usecase-2
• The minimum distance in the matrix is 1, between 42 and 43, so merge them; remove the
corresponding row and column from the matrix.
• The next minimum distance is 2, between 27 and 25; merge 27 into row 25 and remove the
corresponding row and column.
• Next, (25, 27) is merged with 22.
• Then 18 is merged with ((25, 27), 22).
(The intermediate distance matrices were shown as figures.)
Usecase-3
Consider the following set of points in a two-dimensional space:
A(2,3),B(5,4),C(9,6),D(4,7),E(8,1),F(7,2)
We will use Euclidean distance as the measure of dissimilarity between points. The algorithm
proceeds as follows:
Step 1: Calculate the pairwise Euclidean distances between all points:
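Step 1 can be computed directly for the six points given above:

```python
# Pairwise Euclidean distances for the six points of Usecase-3.
from math import dist  # Euclidean distance (Python 3.8+)

points = {"A": (2, 3), "B": (5, 4), "C": (9, 6),
          "D": (4, 7), "E": (8, 1), "F": (7, 2)}

names = sorted(points)
d = {(p, q): dist(points[p], points[q])
     for i, p in enumerate(names) for q in names[i + 1:]}

# the closest pair becomes the first merge of the algorithm
closest = min(d, key=d.get)
print(closest, round(d[closest], 2))   # → ('E', 'F') 1.41
```

So the first merge joins E and F, whose distance is √2 ≈ 1.41.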
(The full distance table and the intermediate merge steps — Steps 2 and 3 — were shown as
figures.)
• Step 4: Repeat steps 2 and 3 until only one
cluster remains.
• This process continues until all points are part
of a single cluster. The output is a dendrogram
that represents the hierarchy of cluster
mergers.
Dendrogram
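The linkage matrix behind the dendrogram for the six Usecase-3 points can be produced with SciPy, assuming it is available (the method name "single" selects single linkage):

```python
# Linkage matrix and dendrogram for the six Usecase-3 points
# using SciPy's hierarchical clustering (single linkage).
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[2, 3], [5, 4], [9, 6], [4, 7], [8, 1], [7, 2]])  # A..F

Z = linkage(X, method="single")   # merge history: one row per merge
print(Z[0])   # first merge: E (index 4) and F (index 5), distance √2

# dendrogram(Z)   # with matplotlib installed, draws the merge tree
```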
Week-9 Part-2 End

End of Week-9
