
Understanding Agglomerative Clustering

[Figure: the dataset after applying Agglomerative Clustering]

What is Agglomerative Clustering?

Agglomerative Clustering is a type of hierarchical clustering algorithm. Instead of assuming a predetermined number of clusters (as in K-means), it starts by treating each data point as its own cluster and then repeats the following steps:

1. Identify the two clusters that are closest to each other under the chosen distance metric and linkage criterion.
2. Merge those two clusters into one.
3. Repeat until only a single cluster remains.
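
The three steps above can be sketched in plain Python. This is a minimal, deliberately naive illustration rather than a production implementation: the function names `single_linkage` and `agglomerate`, the choice of single linkage with Euclidean distance, and the toy points are all assumptions made for the example. The optional `n_clusters` argument simply stops the loop early.

```python
from itertools import combinations
from math import dist


def single_linkage(a, b):
    """Distance between clusters a and b: that of their closest pair of points."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate(points, n_clusters=1):
    """Naive agglomerative clustering; returns (clusters, merge_history)."""
    # Start with every point in its own cluster.
    clusters = [[p] for p in points]
    history = []
    while len(clusters) > n_clusters:
        # Step 1: find the pair of clusters that are closest to each other.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        history.append((clusters[i], clusters[j]))
        # Step 2: merge that pair into one cluster.
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        # Step 3: the loop repeats until n_clusters clusters remain.
    return clusters, history


# Toy data, made up for illustration: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
clusters, history = agglomerate(points, n_clusters=2)
print(clusters)
```

Calling `agglomerate(points)` with the default `n_clusters=1` runs all the way to a single cluster, and `history` then holds the full sequence of merges that a dendrogram would display.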
Why Use Agglomerative Clustering?

Hierarchical Representation: Produces a dendrogram (tree structure) that gives insight into the hierarchical structure of the data.
No Need to Specify Number of Clusters: Unlike K-means, you don't have to fix the number of clusters in advance; you can choose it afterwards by cutting the dendrogram at the desired level.
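
As an illustration of both points, the sketch below builds the merge tree once and then cuts it at several levels. It assumes NumPy and SciPy are available and uses `scipy.cluster.hierarchy.linkage` and `fcluster`; the toy data and the choice of average linkage are assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy 2D data, made up for illustration: two tight groups and one distant point.
X = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0],
    [20.0, 20.0],
])

# Build the full merge tree once; each row of Z records one merge.
Z = linkage(X, method="average")

# Cut the same tree at different levels to get different numbers of clusters.
labels_k2 = fcluster(Z, t=2, criterion="maxclust")
labels_k3 = fcluster(Z, t=3, criterion="maxclust")

# Alternatively, cut by merge distance instead of by cluster count.
labels_d3 = fcluster(Z, t=3.0, criterion="distance")

print(labels_k2)
print(labels_k3)
print(labels_d3)
```

If matplotlib is installed, `scipy.cluster.hierarchy.dendrogram(Z)` draws the tree itself, which is usually the easiest way to pick a sensible cut level.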

Real-life Applications

Taxonomy Formation: Building taxonomies for biological organisms based on genetic data.
Social Network Analysis: Determining groups in social networks.
Market Research: Grouping customers with similar purchase behaviors.
Advantages

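Flexibility: Works with many distance metrics and linkage criteria (single, complete, average, Ward), although Ward linkage assumes Euclidean distance.
Provides Hierarchy: Helpful for understanding the data at multiple levels of abstraction.
Deterministic: For a given dataset, metric, and linkage, it always produces the same clustering (unlike K-means, whose result depends on its random initialization and may converge to different local optima).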
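
To make the flexibility point concrete, the sketch below shows how a linkage criterion and a distance metric combine into a single cluster-to-cluster distance. The helper names `manhattan`, `LINKAGES`, and `cluster_distance` are hypothetical and only illustrate the idea; Ward linkage is left out because it is defined differently (through the increase in within-cluster variance).

```python
from math import dist
from statistics import mean


def manhattan(p, q):
    """City-block distance, as an alternative to Euclidean math.dist."""
    return sum(abs(x - y) for x, y in zip(p, q))


# Each linkage criterion reduces all point-to-point distances to one number.
LINKAGES = {
    "single": min,     # distance between the closest pair of points
    "complete": max,   # distance between the farthest pair of points
    "average": mean,   # mean distance over all pairs
}


def cluster_distance(a, b, linkage="single", metric=dist):
    """Distance between clusters a and b under the given linkage and metric."""
    pair_distances = [metric(p, q) for p in a for q in b]
    return LINKAGES[linkage](pair_distances)


# Two small clusters, made up for illustration.
a = [(0, 0), (0, 2)]
b = [(3, 0), (3, 4)]

for name in LINKAGES:
    print(name, cluster_distance(a, b, name), cluster_distance(a, b, name, manhattan))
```

Because nothing here involves random numbers, repeated calls with the same inputs return the same values, which is where the determinism of the overall procedure comes from.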

Disadvantages

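Scalability: Not suitable for large datasets because of its computational complexity; standard implementations need memory that grows quadratically with the number of points and between quadratic and cubic time.
No Refinement Phase: Once two clusters have been merged, the merge cannot be undone, so a poor early merge carries through to every later level.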
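
A rough back-of-the-envelope calculation shows why scalability becomes a problem. The sketch assumes an implementation that stores every pairwise distance once as a 64-bit float; the helper name `n_pairs` is made up for the example, and real libraries may use more or less memory than this.

```python
BYTES_PER_FLOAT64 = 8


def n_pairs(n):
    """Number of distinct point pairs among n points."""
    return n * (n - 1) // 2


for n in (1_000, 10_000, 100_000):
    gigabytes = n_pairs(n) * BYTES_PER_FLOAT64 / 1e9
    print(f"{n:>7,} points -> {n_pairs(n):>13,} pairs, about {gigabytes:.3f} GB")
```

Multiplying the number of points by ten multiplies the number of pairs by roughly a hundred, which is the quadratic growth mentioned above.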
Python Implementation of Agglomerative Clustering
This code demonstrates how to apply
Agglomerative Clustering on a simple 2D
dataset and visualize the results.
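
The original listing is not reproduced in this text, so the following is a minimal sketch of what such code might look like, assuming scikit-learn, NumPy, and matplotlib are installed. The synthetic dataset from `make_blobs`, the cluster centers, the choice of three clusters with Ward linkage, and the output file name are all assumptions made for the example.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Synthetic 2D dataset (an assumption; the original data is not shown).
X, _ = make_blobs(
    n_samples=150,
    centers=[(0, 0), (5, 5), (0, 8)],
    cluster_std=0.6,
    random_state=42,
)

# Ward linkage uses Euclidean distance, so no metric argument is passed.
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = model.fit_predict(X)

# Number of points assigned to each cluster.
print(np.bincount(labels))

# Colour each point by its cluster label and save the plot to a file.
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", s=30)
plt.title("After applying Agglomerative Clustering")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.savefig("agglomerative_clusters.png")
```

Replacing `plt.savefig(...)` with `plt.show()` opens the plot in a window instead of writing a file.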

Adjusting parameters like n_clusters, metric (named affinity in older scikit-learn releases), and linkage can fine-tune the clustering to suit different needs.
