MACHINE LEARNING

Everything on Hierarchical Clustering

An unsupervised clustering algorithm to hierarchically cluster data sharing common characteristics into distinct groups

Renu Khandelwal · Jun 26 · 6 min read

In this article, you will learn:

- What is Hierarchical clustering, and where is it used?
- The two types of Hierarchical clustering: Agglomerative and Divisive clustering
- How the Hierarchical clustering algorithm works, with an understanding of the different linkages and metrics
- What is a dendrogram?
- Finding the optimal number of clusters from a dendrogram
- Implementing Hierarchical clustering in Python

Clustering is the most common form of unsupervised learning on unlabeled data: it groups objects with common characteristics into discrete clusters based on a distance measure.

Common clustering algorithms are:

- Centroid-based clustering, like KMeans, which is efficient but sensitive to initial conditions and outliers
- Density-based clustering, like DBSCAN, which clusters data into high-density areas separated by low-density areas
- Distribution-based clustering, like the Gaussian Mixture Model using Expectation-Maximization (EM), a generative probabilistic model that attempts to find the Gaussian probability distributions that best model the dataset
- Hierarchical clustering

Hierarchical clustering builds a hierarchy of clusters based on a similarity score, without prespecifying the number of clusters.

Hierarchical clustering is useful for:

- Customer segmentation, where customers can be segmented by demographics, income, or purchasing patterns
- Social network analysis, to understand the dynamics of individuals and groups based on their interests and access to information
- Arranging genomic data into meaningful biological structures based on common characteristics
- City planning, where clustering ensures that a commercial zone or a residential area is not placed within an industrial zone

Hierarchical clustering is either bottom-up, referred to as Agglomerative clustering, or Divisive, which uses a top-down approach.

Agglomerative clustering

A bottom-up approach where each data point starts as a singleton cluster; clusters are then iteratively merged based on similarity until all data points belong to one cluster. At every iteration, Agglomerative clustering merges the pair of clusters with the maximum similarity, calculated using a distance metric, into a new cluster, thus reducing the number of clusters. It considers only local patterns and does not account for the global distribution of the data.

Divisive clustering

A top-down approach, the opposite of Agglomerative clustering's bottom-up approach. Divisive clustering begins with one cluster encompassing all the data points in the dataset. It then iteratively splits clusters using a flat clustering algorithm like KMeans until each data point belongs to a singleton cluster. Divisive clustering produces more accurate hierarchies than bottom-up Agglomerative clustering and accounts for the global pattern by looking at the complete information present in the dataset.

Figure: Agglomerative clustering (merging) vs. Divisive clustering (splitting).

Deep Dive into the Working of Agglomerative Clustering

Step 1: Every data point is assigned to a single-point cluster. If there are m observations in the dataset, each point gets its own cluster, and we start with m clusters, as the sketch below illustrates.
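To make Step 1 concrete, here is a minimal sketch on a toy one-dimensional dataset (the same five values reused in the dendrogram example later in the article). The variable names are illustrative only; the pairwise distance matrix it prints is what the merge decisions in the next steps are based on.

import numpy as np
from scipy.spatial.distance import pdist, squareform

# Toy dataset: five observations with a single feature each
points = np.array([[9], [3], [6], [4], [11]])

# Step 1: every observation starts as its own singleton cluster
clusters = [[i] for i in range(len(points))]
print(clusters)  # [[0], [1], [2], [3], [4]]

# Pairwise Euclidean distances between the singleton clusters;
# the closest pair (the points with values 3 and 4) is the first candidate to merge
dist_matrix = squareform(pdist(points, metric='euclidean'))
print(dist_matrix)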
Step 2: Find the closest, or most similar, pair of clusters and merge them into one cluster. Hierarchical clustering uses a similarity measure to combine the most similar cluster pair. The similarity between clusters is measured using Euclidean distance, Manhattan distance (City Block distance), Minkowski distance, or cosine similarity.

Euclidean distance: d(x, y) = sqrt( Σ (x_i − y_i)² )
Manhattan distance: d(x, y) = Σ |x_i − y_i|
Minkowski distance: d(x, y) = ( Σ |x_i − y_i|^p )^(1/p)

If p = 1, the Minkowski distance reduces to the Manhattan distance; if p = 2, it reduces to the Euclidean distance.

Step 3: After identifying the two closest clusters, use a linkage method to determine how to merge them. Common linkage methods are single linkage, complete linkage, average linkage, the centroid method, and Ward's method.

- Single linkage: D(C1, C2) is the minimum distance between data points in the two clusters
- Complete linkage: D(C1, C2) is the maximum distance between data points in the two clusters
- Average linkage: D(C1, C2) is the average of the distances of all pairs of points across the two clusters
- Centroid method: D(C1, C2) is the distance between the centroids of the two clusters
- Ward's method: merges the pair of clusters that minimizes the increase in total within-cluster variance

Repeat Step 2 and Step 3, calculating the similarity measure and applying the linkage method to merge the closest pair of clusters, until all observations are clustered into one single cluster.

Dendrogram

Hierarchical clustering is typically visualized using a dendrogram. Dendrograms are a tree-like representation of the data points based on a similarity or dissimilarity metric. A dendrogram has the data items along one axis and the distances along the other axis, and the distance at which two points or clusters are merged increases monotonically. The distance score is calculated between every pair of points.

Dendrograms based on different linkage methods:

from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt
%matplotlib inline

plt.figure(figsize=(15, 5))
data = [[i] for i in [9, 3, 6, 4, 11]]

linked_s = linkage(data, 'single', metric='euclidean')
plt.subplot(2, 5, 1)
dendrogram(linked_s, labels=data)
plt.ylabel('Distance')
plt.title("single linkage")

linked_s = linkage(data, 'complete', metric='euclidean')
plt.subplot(2, 5, 2)
dendrogram(linked_s, labels=data)
plt.title("complete linkage")

linked_s = linkage(data, 'average', metric='euclidean')
plt.subplot(2, 5, 3)
dendrogram(linked_s, labels=data)
plt.title("average linkage")

linked_s = linkage(data, 'ward', metric='euclidean')
plt.subplot(2, 5, 4)
dendrogram(linked_s, labels=data)
plt.title("ward method")

linked_s = linkage(data, 'centroid', metric='euclidean')
plt.subplot(2, 5, 5)
dendrogram(linked_s, labels=data)
plt.title("centroid method")

plt.show()

Agglomerative Clustering Implementation

Dataset: Mall Customer dataset

import pandas as pd

# Read the dataset into a dataframe
dataset = pd.read_csv('Mall_Customers.csv', index_col='CustomerID')

# Drop duplicates
dataset.drop_duplicates(inplace=True)

plt.figure(figsize=(10, 5))

# Create the feature matrix
X = dataset.iloc[:, [1, 2, 3]].values

linked_s = linkage(X, 'complete', metric='cityblock')
dendrogram(linked_s, labels=X)
plt.ylabel('Distance')
plt.axhline(y=1.5, color='orange')
plt.title("Mall Customer HCA")
plt.tight_layout(pad=3.0)
plt.show()
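Before moving on, it can help to look at what linkage actually returns. This is a small sketch, not part of the original walkthrough: it reuses the linked_s matrix computed above, prints the merge history the dendrogram is drawn from, and shows scipy's fcluster as one way to turn a chosen cut height into flat cluster labels. The 1.5 threshold simply mirrors the horizontal line drawn above; how to pick a sensible threshold is covered in the next section.

from scipy.cluster.hierarchy import fcluster

# Each row of the linkage matrix records one merge:
# [index of cluster 1, index of cluster 2, merge distance, points in the new cluster]
print(linked_s[:5])

# Cut the hierarchy at a distance threshold to obtain flat cluster labels;
# the threshold value here is only the one drawn with axhline above
flat_labels = fcluster(linked_s, t=1.5, criterion='distance')
print(flat_labels[:10])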
Figure: dendrogram of the Mall Customer data with the threshold line.

Finding the optimal number of clusters from a Dendrogram

Find the largest difference in heights among the vertical lines of the dendrogram above. The threshold is selected so that it cuts this tallest vertical line. The number of clusters is the number of vertical lines intersected by the horizontal line drawn at that threshold. In the above case, the optimal number of clusters is 4.

Running Agglomerative Clustering

from sklearn.cluster import AgglomerativeClustering
import seaborn as sns

# n_clusters chosen from the dendrogram; scikit-learn's linkage defaults to 'ward'
agg_cluster = AgglomerativeClustering(n_clusters=4).fit_predict(X)

# Visualising the clusters (Annual Income vs. Spending Score)
plt.figure(figsize=(15, 7))
sns.scatterplot(x=X[agg_cluster == 0, 1], y=X[agg_cluster == 0, 2],
                color='yellow', label='Cluster 1')
sns.scatterplot(x=X[agg_cluster == 1, 1], y=X[agg_cluster == 1, 2],
                color='blue', label='Cluster 2')
sns.scatterplot(x=X[agg_cluster == 2, 1], y=X[agg_cluster == 2, 2],
                color='green', label='Cluster 3')
sns.scatterplot(x=X[agg_cluster == 3, 1], y=X[agg_cluster == 3, 2],
                color='grey', label='Cluster 4')
plt.grid(False)
plt.title('Clusters of customers')
plt.xlabel('Annual Income')
plt.ylabel('Spending Score')
plt.legend()
plt.show()

Additional things to know about Hierarchical Clustering

Hierarchical clustering is sensitive to outliers and does not work with missing data. It works especially well with smaller datasets and becomes more computationally expensive as more data points are considered.

Conclusion:

Hierarchical clustering is an unsupervised clustering methodology that groups objects with common characteristics into discrete clusters based on a distance measure. The hierarchical algorithm builds clusters by merging or splitting them successively, without prespecifying the number of clusters. The similarity score is calculated using measures such as Euclidean distance or city block distance. When we successively merge clusters, it is Agglomerative clustering; when we successively split them, it is Divisive clustering. The resulting hierarchy is visualized using dendrograms.
