
1.

Today we'll be discussing Hierarchical Clustering, one of the most popular
clustering techniques in Machine Learning.

Hierarchical Clustering is a method of grouping similar data points together based
on their pairwise distances. The result of hierarchical clustering can be represented
as a dendrogram, a tree-like diagram that illustrates the hierarchy of clusters.

There are two types of Hierarchical Clustering: the Agglomerative Approach and
the Divisive Approach. Let's take a closer look at each of them.

First, we have the Agglomerative Approach. This is a bottom-up approach that
starts with each data point as its own cluster and merges the closest pairs of
clusters until all data points are in the same cluster. This method is also known as
the "merging" method.

On the other hand, we have the Divisive Approach, which is a top-down approach
that starts with all data points in a single cluster and recursively splits them into
smaller clusters until each data point is in its own cluster. This method is also
known as the "splitting" method.

Both of these approaches have their own advantages and disadvantages, and the
choice of which one to use depends on the dataset and the problem at hand.

In summary, Hierarchical Clustering is a powerful technique for grouping similar
data points together. The resulting dendrogram can provide insights into the
underlying structure of the data.
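The agglomerative (bottom-up) procedure described above can be sketched in a few lines of pure Python. This is a minimal illustration, not a production implementation (in practice one would use a library such as SciPy or scikit-learn); the function names, the Euclidean distance, and the sample points are all illustrative assumptions:

```python
import math

def single_link(a, b):
    # Single linkage: minimum distance over all cross-cluster pairs.
    return min(math.dist(p, q) for p in a for q in b)

def agglomerate(points, k):
    # Bottom-up: start with each point as its own cluster, then
    # repeatedly merge the closest pair until only k clusters remain.
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_link(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

# Two visually separate groups of 2-D points.
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
       (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
print(agglomerate(pts, 2))  # two clusters of three nearby points each
```

Recording the sequence of merges (rather than stopping at k clusters) is exactly what produces the dendrogram.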

2. Now let's dive deeper into the concept of Linkage. During both types of
Hierarchical Clustering, the distance between two sub-clusters needs to be
computed. The different types of linkages describe the different approaches to
measuring the distance between two sub-clusters of data points.

There are different types of linkages used in Hierarchical Clustering, and in this
section, we'll be discussing three of them: Single Linkage, Complete Linkage, and
Average Linkage.

- Single Linkage returns the minimum distance between two points from different
clusters, i.e., the distance between the closest two points across the two clusters.
If we denote the distance between two points x and y as d(x,y), then the single
linkage distance between two clusters A and B is calculated as follows:

d(A,B) = min(d(x,y)) for x in A, y in B

Here, d(x,y) is the distance between two data points x and y, and min(d(x,y)) is the
minimum distance over all pairs of data points from the two clusters A and B.

This formula gives us the distance between two clusters A and B, and it is used to
select the pair of clusters with the smallest single linkage distance to merge them
into a new cluster in the single linkage clustering algorithm.
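The single linkage formula translates directly into code. A minimal sketch, assuming Euclidean distance and clusters given as lists of 2-D points (the function name and sample clusters are illustrative):

```python
import math

def single_linkage(A, B):
    # d(A,B) = min(d(x,y)) for x in A, y in B
    return min(math.dist(x, y) for x in A for y in B)

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(3.0, 0.0), (4.0, 0.0)]
print(single_linkage(A, B))  # 2.0 -- closest pair is (1,0) and (3,0)
```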

- Complete Linkage returns the maximum distance between two points from
different clusters, i.e., the distance between the furthest two points across the two
clusters. If we denote the distance between two points x and y as d(x,y), then the
complete linkage distance between two clusters A and B is calculated as follows:

d(A,B) = max(d(x,y)) for x in A, y in B

Here, d(x,y) is the distance between two data points x and y, and max(d(x,y)) is the
maximum distance over all pairs of data points from the two clusters A and B.

This formula gives us the distance between two clusters A and B. Note that the
complete linkage clustering algorithm still merges the pair of clusters with the
smallest complete linkage distance; the linkage changes how inter-cluster distance
is measured, not the merge criterion.
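Complete linkage is the same sketch with max in place of min (Euclidean distance and sample clusters are, again, illustrative assumptions):

```python
import math

def complete_linkage(A, B):
    # d(A,B) = max(d(x,y)) for x in A, y in B
    return max(math.dist(x, y) for x in A for y in B)

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(3.0, 0.0), (4.0, 0.0)]
print(complete_linkage(A, B))  # 4.0 -- furthest pair is (0,0) and (4,0)
```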
- Average Linkage takes the arithmetic mean of the distances between all pairs of
points from the two clusters.

Considering two clusters A and B, the distance between every data point i in A and
every data point j in B is determined first, and Average Linkage returns the
arithmetic mean of these distances:

L(A,B) = (1 / (nA * nB)) * sum(D(i,j)) for i in A, j in B

Here, nA and nB are the number of data points in clusters A and B respectively,
D(i,j) is the distance between two data points i and j, and the sum is taken over all
pairs of data points where i is in cluster A and j is in cluster B.

In other words, the average linkage method calculates the distance between two
clusters A and B by averaging the distances between all pairs of data points from
the two clusters. This method tends to create compact and balanced clusters, and is
less sensitive to outliers than other linkage methods.
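Average linkage in code is the sum of all nA * nB cross-cluster distances divided by nA * nB. A minimal sketch under the same illustrative assumptions as above (Euclidean distance, clusters as lists of 2-D points):

```python
import math

def average_linkage(A, B):
    # L(A,B) = (1 / (nA * nB)) * sum(D(i,j)) for i in A, j in B
    total = sum(math.dist(x, y) for x in A for y in B)
    return total / (len(A) * len(B))

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(3.0, 0.0), (4.0, 0.0)]
print(average_linkage(A, B))  # (3 + 4 + 2 + 3) / 4 = 3.0
```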

Once the distances between all pairs of clusters are calculated, the algorithm
selects the pair of clusters with the smallest average linkage distance and merges
them into a new cluster. The process is repeated until all data points are assigned to
a single cluster.

In summary, Hierarchical Clustering is a powerful technique for grouping similar
data points together, and the resulting dendrogram can provide insights into the
underlying structure of the data. The choice of linkage method is important, and
Agglomerative Clustering is a useful method for creating a hierarchical structure
of clusters.
