
Distance Measures and Linkage Methods in Hierarchical Clustering

Distance or proximity measures are used to determine the similarity or “closeness” between objects in the dataset.

The goal of proximity measures is to find similar objects and to group them in the same cluster.
Some common distance measures that can be used to compute the proximity matrix in hierarchical clustering include the following:

• Euclidean Distance
• Mahalanobis Distance
• Minkowski Distance

I will avoid the mathematical details involved in deriving these distances and instead summarise their uses and behaviour. A simple Google search can provide plenty of examples and research papers if you are interested in the underlying principles that support these calculations.

Euclidean Distance
The Euclidean distance is a non-negative measure of the straight-line distance between two points. It is computed from their coordinates using the Pythagorean theorem.
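As a quick illustration, here is a minimal sketch in base R; the two coordinates are assumed purely for the example:

a <- c(1, 3)                  # an assumed example point
b <- c(2, 5)                  # another assumed example point
sqrt(sum((a - b)^2))          # Pythagoras by hand: 2.236068
dist(rbind(a, b))             # the same result from base R's dist()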

Mahalanobis Distance
The Mahalanobis distance is used to find the distance between two points as a form of standardised score: it takes the normalisation and dispersion of the data into account.

The following formula defines it, where S is the covariance matrix of the data:

d(A, B) = Square-root((A − B)ᵀ S⁻¹ (A − B))

It is useful for non-spherically shaped distributions: even if points A and B have the same Euclidean distance from point X, the data may be dispersed more strongly in one direction than another, so their Mahalanobis distances from X can differ.
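In R, the base stats function mahalanobis() returns the squared distance, so take the square root to get the distance itself; the data and query point below are assumed for illustration:

set.seed(42)
X  <- matrix(rnorm(200), ncol = 2)          # 100 illustrative 2-D points
S  <- cov(X)                                # sample covariance matrix
mu <- colMeans(X)                           # sample centre
p  <- c(1, 2)                               # an assumed query point
sqrt(mahalanobis(p, center = mu, cov = S))  # Mahalanobis distance from the centre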

Minkowski Distance

The Minkowski distance is defined by the following formula:

dM(xi, xk) = (Σj |xij − xkj|^M)^(1/M)

Here M is a positive integer, and its value changes the weight given to larger and smaller differences: M = 1 gives the Manhattan distance and M = 2 the Euclidean distance. For example, suppose M = 10, xi = (1, 3) and xk = (2, 3); then d10 = (|1 − 2|^10 + |3 − 3|^10)^(1/10) = 1.
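The worked example can be checked with base R's dist(), which supports the Minkowski metric directly:

xi <- c(1, 3)
xk <- c(2, 3)
dist(rbind(xi, xk), method = "minkowski", p = 10)   # returns 1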

As clustering is exploratory in nature, it is possible to apply more than one type of clustering algorithm to the same dataset. We would hope to arrive at broadly similar groupings, but the results can differ because each algorithm uses different computations to reach its outcome.

An Illustration of Agglomerative Clustering Methods

As you may remember, in hierarchical clustering all objects start as singletons, that is, individual clusters. They are then merged using one of the following linkage methods:

• Single Linkage
• Complete Linkage
• Centroid Linkage
• Ward’s Linkage
• Average Linkage

The linkage methods work by calculating the distances or similarities between all objects. The closest pair of clusters is then combined into a single cluster, reducing the number of clusters remaining.

The process is then repeated until only a single cluster is left.
For the linkage examples, I will be using a simple scatterplot to show the relationship between the points defined in the following table.

The output of the scatterplot using RStudio


If you would like to recreate the scatterplot, you can do so with code along the following lines.
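This is a minimal sketch in base R; since the point table is not reproduced above, the nine coordinates below are assumed stand-ins arranged in three loose groups:

x <- c(1, 2, 1.5, 6, 7, 6.5, 3, 4, 3.5)   # assumed example coordinates
y <- c(1, 1.5, 2, 6, 6.5, 7, 8, 8.5, 9)
plot(x, y, pch = 19, xlab = "X", ylab = "Y",
     main = "Scatterplot of the example points")
text(x, y, labels = seq_along(x), pos = 3)  # label each point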

From the scatterplot, we can see three apparent clusters.

Hierarchical Clustering using Single Linkage

For single linkage, the two clusters with the smallest minimum pairwise distance between their members are merged. This process repeats until only a single cluster is left.
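As a minimal sketch in R, base R's hclust() performs agglomerative clustering; the points are the assumed coordinates from the scatterplot sketch above:

pts <- data.frame(x = c(1, 2, 1.5, 6, 7, 6.5, 3, 4, 3.5),
                  y = c(1, 1.5, 2, 6, 6.5, 7, 8, 8.5, 9))
d  <- dist(pts)                      # Euclidean proximity matrix
hc <- hclust(d, method = "single")   # merge by minimum pairwise distance
plot(hc, main = "Single linkage dendrogram")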
Hierarchical Clustering using Complete Linkage

For complete linkage, the two clusters with the smallest maximum pairwise distance between their members are merged. This process repeats until only a single cluster is left.
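Reusing the distance matrix d from the single-linkage sketch, only the method argument changes:

hc <- hclust(d, method = "complete")   # merge by maximum pairwise distance
plot(hc, main = "Complete linkage dendrogram")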
Hierarchical Clustering using Centroid Linkage

For centroid linkage, the two clusters whose centroids are closest together are merged. This process repeats until only a single cluster is left.

Let point X denote the centroid of each cluster.
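In hclust(), the centroid method expects squared Euclidean distances, so square the distance matrix from the earlier sketch:

hc <- hclust(d^2, method = "centroid")   # squared Euclidean distances required
plot(hc, main = "Centroid linkage dendrogram")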


Hierarchical Clustering using Ward’s Linkage

For Ward’s linkage, clusters are merged based on the error sum of squares (ESS): at each step, the pair of clusters whose merge produces the smallest increase in the total ESS is combined. This process repeats until only a single cluster is left.
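In hclust(), Ward's criterion is available as "ward.D2", which works with ordinary Euclidean distances (the older "ward.D" expects squared distances):

hc <- hclust(d, method = "ward.D2")   # minimise the increase in ESS at each merge
plot(hc, main = "Ward's linkage dendrogram")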

Hierarchical Clustering using Average Linkage

Also known as group-average hierarchical clustering, the average linkage method uses the average pairwise proximity among all pairs of objects in different clusters. The two clusters with the lowest average distance are merged.
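Again with the same distance matrix d from the single-linkage sketch:

hc <- hclust(d, method = "average")   # merge by mean pairwise distance (UPGMA)
plot(hc, main = "Average linkage dendrogram")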

That sums up the common distance measures and linkage methods in hierarchical clustering.
