Clustering - Jun 2022
About Clustering
• Clustering is a class of techniques used to classify cases into
groups that are
• relatively homogeneous within themselves and
• heterogeneous between each other
• Homogeneity (similarity) and heterogeneity (dissimilarity)
are measured on the basis of a defined set of variables
• These groups are called clusters
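As a rough illustration of the homogeneity and heterogeneity idea above, the sketch below compares the average distance between cases inside a group with the average distance between cases in different groups. The toy data, the group names and the choice of Euclidean distance are assumptions made for this example; the slides themselves work in MS Excel and R.

```python
from itertools import combinations
from math import dist  # Euclidean distance (Python 3.8+)

# Hypothetical toy data: each case is described by two variables.
groups = {
    "A": [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)],
    "B": [(8.0, 8.0), (8.0, 9.0), (9.0, 8.0)],
}

def mean_within_distance(points):
    """Average pairwise distance between cases in the same group."""
    pairs = list(combinations(points, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

def mean_between_distance(points_a, points_b):
    """Average distance between cases drawn from two different groups."""
    total = sum(dist(p, q) for p in points_a for q in points_b)
    return total / (len(points_a) * len(points_b))

within_a = mean_within_distance(groups["A"])
within_b = mean_within_distance(groups["B"])
between = mean_between_distance(groups["A"], groups["B"])

# Good clusters: small within-group distances, large between-group distance.
print(f"within A: {within_a:.2f}, within B: {within_b:.2f}, between: {between:.2f}")
```

A grouping is a reasonable set of clusters when the within-group averages are clearly smaller than the between-group average, as they are for this toy data.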
02-09-2023
• Let's see a Hierarchical Clustering example in MS Excel and R
Excel file - Hierarchical and k-means Clustering - PracticeV1.xlsx
R file - Hierarchical and k-means Clustering - PracticeV1.R
K-means clustering
• The goal of this algorithm is to find groups in the data, with the
number of groups represented by the variable k.
• The algorithm works iteratively to assign each data point to one of k
groups based on the features that are provided.
• Data points are clustered based on feature similarity.
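To make "assign each data point to one of k groups based on the features" concrete, here is a minimal sketch of a single assignment step. It assumes that similarity is measured by Euclidean distance and that k = 2 centroids are already known; the centroids, the points and the helper name nearest_centroid are hypothetical and chosen only for illustration.

```python
from math import dist  # Euclidean distance (Python 3.8+)

# Hypothetical centroids for k = 2 groups, and data points with two features each.
centroids = [(1.0, 1.0), (8.0, 8.0)]
points = [(1.5, 0.5), (7.0, 9.0), (2.0, 2.0), (9.0, 7.5)]

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: dist(point, centroids[j]))

# Each point gets the label of its most similar (closest) centroid.
labels = [nearest_centroid(p, centroids) for p in points]
print(labels)
```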
• K-means requires the number of clusters to be specified in advance, say S = 3
(S plays the same role as k above).
• The method aims to group the observations based on their similarity using an
optimization procedure.
• The aim is to minimize the within-cluster variation, which is defined as the sum of
squared Euclidean distances between each data point and the centroid of its
cluster. More precisely, the algorithm works as follows:
1. Start by randomly assigning each subject to a cluster, s = 1, …, S
2. Compute the centroid of each cluster and the distance of each subject to each
of the cluster centroids
3. Reassign each subject to the cluster with the closest centroid
4. Repeat steps 2 and 3 until no further reassignment is possible (i.e., when the
within-cluster variation can no longer be reduced by reassignment, which is a
local rather than a guaranteed global minimum)
• Let's see a k-means Clustering example in MS Excel and R
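The four steps above can be sketched in plain Python as follows. This is an illustrative sketch, not the code from the accompanying Excel or R files: the function names, the seed and max_iter parameters, the toy data and the handling of empty clusters are all assumptions made for this example. Clusters are numbered from 0 rather than from 1.

```python
import random
from math import dist  # Euclidean distance (Python 3.8+)

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def within_cluster_variation(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its centroid."""
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

def kmeans(points, S, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: randomly assign each subject to a cluster s = 0, ..., S-1.
    labels = [rng.randrange(S) for _ in points]
    centroids = [None] * S
    for _ in range(max_iter):  # max_iter is only a safeguard (assumption)
        # Step 2: compute the centroid of each cluster.
        for s in range(S):
            members = [p for p, lab in zip(points, labels) if lab == s]
            # Assumption: an empty cluster is re-seeded with a random data point.
            centroids[s] = centroid(members) if members else rng.choice(points)
        # Step 3: reassign each subject to the cluster with the closest centroid.
        new_labels = [
            min(range(S), key=lambda s: dist(p, centroids[s])) for p in points
        ]
        # Step 4: stop when no subject changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

# Hypothetical toy data: two well-separated groups, two features per subject.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
labels, centroids = kmeans(points, S=2)
wcss = within_cluster_variation(points, labels, centroids)
print(labels, round(wcss, 3))
```

Which cluster receives label 0 or 1 depends on the random start, so only the grouping matters, not the label values. On less clearly separated data, different seeds can end in different local minima, which is why the algorithm is usually run from several random starts.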
Reading Reference
• https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-clustering/
• https://uc-r.github.io/kmeans_clustering
• https://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68
• https://data-flair.training/blogs/clustering-in-data-mining/