
02-09-2023

Clustering
Business and Managerial Application

About Clustering
• Clustering is “the process of organizing objects into groups whose members are similar in some way”.
• A cluster is therefore a collection of objects which are “similar” to one another and “dissimilar” to the objects belonging to other clusters.

Clustering (continued)
• It is a class of techniques used to classify cases into groups that are
  • relatively homogeneous within themselves and
  • heterogeneous with respect to each other
• Homogeneity (similarity) and heterogeneity (dissimilarity)
are measured on the basis of a defined set of variables
• These groups are called clusters

Objective of clustering
• Intra-cluster distance is the sum of distances between objects in the same cluster. This distance should be minimized.
• Inter-cluster distance is the distance between objects in different clusters. This distance should be maximized.

An Ideal Clustering Situation
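To make the two distances concrete, the sketch below computes them for two small, well-separated groups of points. The points are hypothetical, and taking the inter-cluster distance as the distance between cluster centroids is only one common convention, assumed here for illustration.

```python
import math
from itertools import combinations

def intra_cluster_distance(cluster):
    """Sum of pairwise Euclidean distances between objects in one cluster."""
    return sum(math.dist(a, b) for a, b in combinations(cluster, 2))

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return [sum(coord) / len(cluster) for coord in zip(*cluster)]

def inter_cluster_distance(cluster_a, cluster_b):
    """Assumption: distance between two clusters = distance between their centroids."""
    return math.dist(centroid(cluster_a), centroid(cluster_b))

# Hypothetical 2-D observations forming two tight, well-separated groups.
cluster_a = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
cluster_b = [(8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]

intra_a = intra_cluster_distance(cluster_a)
inter_ab = inter_cluster_distance(cluster_a, cluster_b)
print(f"intra-cluster distance of A: {intra_a:.3f}")
print(f"inter-cluster distance A-B:  {inter_ab:.3f}")
```

A good clustering keeps the first number small relative to the second.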


A Practical Clustering Situation

Hierarchical clustering
• Hierarchical clustering is an algorithm that builds a hierarchy of clusters.
1. Agglomerative clustering: it starts with each observation as its own cluster and, at each step, merges the most similar clusters until one large cluster remains.
2. Divisive clustering: it starts with one large cluster and repeatedly splits it into smaller clusters, separating the items that are most dissimilar.
• The result of hierarchical clustering can be shown using a dendrogram.
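The agglomerative variant can be sketched in a few lines of plain Python. This is a minimal illustration, not the course's Excel or R workflow: the points are hypothetical, and single linkage (the distance between two clusters is the smallest distance between any two of their members) is an assumed choice among several possible linkage rules.

```python
import math

def single_linkage(c1, c2, points):
    """Smallest Euclidean distance between any member of c1 and any member of c2."""
    return min(math.dist(points[i], points[j]) for i in c1 for j in c2)

def agglomerative(points):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [[i] for i in range(len(points))]  # each observation starts alone
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        pairs = [(x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))]
        a, b = min(pairs, key=lambda p: single_linkage(clusters[p[0]], clusters[p[1]], points))
        dist = single_linkage(clusters[a], clusters[b], points)
        merges.append((sorted(clusters[a]), sorted(clusters[b]), dist))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

# Hypothetical 2-D observations, referred to by index 0..4.
points = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.0, 5.6), (9.0, 1.0)]
merges = agglomerative(points)
for left, right, dist in merges:
    print(f"merge {left} + {right} at distance {dist:.2f}")
```

The recorded merge distances are the heights at which branches would join in a dendrogram.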

• Let's see a hierarchical clustering example in MS Excel and R.
Excel file - Hierarchical and k-means Clustering - PracticeV1.xlsx
R file - Hierarchical and k-means Clustering - PracticeV1.R

K-means clustering
• The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable k.
• The algorithm works iteratively to assign each data point to one of k groups based on the features that are provided.
• Data points are clustered based on feature similarity.


K-means clustering
• K-means requires the specification of the number of clusters in advance, say S = 3.
• The method aims to group the observations based on their similarity using an optimization procedure.
• The aim is to minimize the within-cluster variation, which is defined as the sum of squared Euclidean distances between each data point and the centroid of its cluster. More precisely, the algorithm works as follows:
1. Start by randomly assigning each subject to a cluster, s = 1, …, S
2. Compute the centroid of each cluster and the distance of each subject to each of the cluster centroids
3. Reassign each subject to the cluster with the closest centroid
4. Repeat steps 2 and 3 until no further reassignment is possible (i.e., when the within-cluster variation has converged to a local minimum)

• Let's see a k-means clustering example in MS Excel and R.
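The four steps above can also be sketched in plain Python. This is an illustrative sketch rather than the course's Excel or R example: the data points, the `seed` and `max_iter` parameters, and the rule for re-seeding an empty cluster are all assumptions added here.

```python
import math
import random

def mean_point(points):
    """Centroid of a non-empty list of points (coordinate-wise mean)."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def within_cluster_ss(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    return sum(math.dist(p, centroids[s]) ** 2 for p, s in zip(points, labels))

def kmeans(points, S, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: randomly assign each subject to one of the S clusters.
    labels = [rng.randrange(S) for _ in points]
    centroids = []
    for _ in range(max_iter):
        # Step 2: compute the centroid of each cluster.
        centroids = []
        for s in range(S):
            members = [p for p, lab in zip(points, labels) if lab == s]
            # Assumption: an empty cluster is re-seeded with a random data point.
            centroids.append(mean_point(members) if members else rng.choice(points))
        # Step 3: reassign each subject to the cluster with the closest centroid.
        new_labels = [
            min(range(S), key=lambda s: math.dist(p, centroids[s])) for p in points
        ]
        # Step 4: stop when no subject changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

# Hypothetical 2-D observations in three loose groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (2, 9), (1, 9)]
labels, centroids = kmeans(points, S=3)
print("cluster labels:", labels)
print("within-cluster sum of squares:", round(within_cluster_ss(points, labels, centroids), 3))
```

Because the starting assignment is random, different seeds can end in different local minima, so in practice the procedure is often run several times and the solution with the smallest within-cluster sum of squares is kept.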

Conducting Cluster Analysis: Formulate the Problem
• Perhaps the most important part of formulating the clustering problem is selecting the variables on which the clustering is based.
• Inclusion of even one or two irrelevant variables may distort an otherwise useful clustering solution.
• Basically, the set of variables selected should describe the similarity between objects in terms that are relevant to the marketing research problem.
• The variables should be selected based on past research, theory, or a consideration of the hypotheses being tested. In exploratory research, the researcher should exercise judgment and intuition.

The “Business Decision”
The management team of a large shopping mall would like to understand the types of people who are, or could be, visiting their mall. They have good reasons to believe that there are a few different market segments, and they are considering designing and positioning the shopping mall services better in order to attract mainly a few profitable market segments, or to differentiate their services (e.g., invitations to events, discounts, etc.) across market segments.

Dataset: Consumers’ Attitude Towards Shopping
• Based on past research, six attitudinal variables were identified. 160 consumers were asked to express their degree of agreement with the following statements on a 7-point scale (1 = disagree, 7 = agree):
• V1: Shopping is fun
• V2: Shopping is bad for your budget
• V3: I combine shopping with eating out
• V4: I try to get the best buys when shopping
• V5: I don’t care about shopping
• V6: You can save a lot of money by comparing prices
• Age, Gender, Frequency of mall visits per month

• Practice Questions from US City dataset and shopping attitude dataset.
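As a small illustration of how similarity is measured on a defined set of variables, the sketch below treats each consumer as a vector of scores on V1 to V6 and compares consumers by Euclidean distance. The three respondents and their scores are made up for this example and are not taken from the 160-consumer dataset.

```python
import math

VARIABLES = ["V1", "V2", "V3", "V4", "V5", "V6"]

# Hypothetical 7-point agreement scores (1 = disagree, 7 = agree), in V1..V6 order.
respondents = {
    "A": [6, 4, 7, 3, 2, 3],
    "B": [7, 3, 6, 4, 1, 4],
    "C": [2, 5, 1, 6, 6, 7],
}

def attitude_distance(x, y):
    """Euclidean distance between two respondents over the six attitude scores."""
    assert len(x) == len(y) == len(VARIABLES)
    return math.dist(x, y)

d_ab = attitude_distance(respondents["A"], respondents["B"])
d_ac = attitude_distance(respondents["A"], respondents["C"])
print(f"distance A-B: {d_ab:.3f}")
print(f"distance A-C: {d_ac:.3f}")
```

With these scores, A and B answer alike and would tend to fall in the same cluster, while C would not.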


Reading Reference
• https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-clustering/
• https://uc-r.github.io/kmeans_clustering
• https://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68
• https://data-flair.training/blogs/clustering-in-data-mining/
