
VISVESVARAYA TECHNOLOGICAL UNIVERSITY

BELAGAVI

NET’S

NAVODAYA INSTITUTE OF TECHNOLOGY, RAICHUR

INTERNSHIP REPORT ON
“CUSTOMER SEGMENTATION”

Submitted By
B SREEJA
(3NA17CS002)

Under The Guidance Of:


Prof. VIJAY KUMAR YADAV
NET’S

NAVODAYA INSTITUTE OF TECHNOLOGY


DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
CERTIFICATE
This is to certify that the internship project work entitled “Customer Segmentation” is a
work carried out by B Sreeja (3NA17CS002) in partial fulfilment for the award of the Bachelor
of Engineering degree in Computer Science & Engineering as prescribed by Visvesvaraya
Technological University, Belagavi, during the year 2020-2021.

Prof. Vijay Kumar Yadav          Dr. M. N. Faruk          Dr. M. V. Mallikarjuna
Project Guide                    HOD, Dept. of CSE        Principal

External Examiner

1.

2.
DECLARATION
I, B. Sreeja, student of the final semester of B.E. in Computer Science, Navodaya Institute of
Technology, Raichur, hereby declare that the internship training report on “Customer
Segmentation”, submitted to Dlithe Consultancy Services, Bangalore, in partial fulfilment of
the degree of Bachelor of Engineering, is original work done by me.

The information and data given in the report are authentic to the best of my knowledge.

B. SREEJA
ACKNOWLEDGEMENT
We take this opportunity to thank all those who acted as guiding pillars and lit our way
throughout this project, leading to the successful and satisfactory completion of this study.

Our sincere thanks to our beloved principal, Dr. M. V. Mallikarjuna, Navodaya Institute of
Technology, Raichur, for his kind co-operation during our entire course.

We are really grateful to our HOD, Dr. M. N. Faruk, for providing us with the opportunity to
undertake this project in this university and for providing us with all the facilities.

We are highly thankful to Prof. Vijay Kumar Yadav for his active support, valuable time and
advice, wholehearted guidance, sincere co-operation and painstaking involvement during the
study, and for the completion of the project within the stipulated time.

Lastly, we are thankful to Mr. Anubhav from Dlithe Consultancy Services, who was
instrumental in creating a proper, healthy and conducive environment and in bringing fresh,
innovative ideas to us during the project. Without his help, it would have been extremely
difficult to prepare the project within a time-bound framework.
INDEX

CHAPTERS  CONTENTS

1  Introduction
    Introduction to the project

2  Literature Review

3  Requirements
    Hardware and software details

4  Methodology
    Customer Segmentation Strategy
    Key benefits
    Integrated approach

5  Implementation
    Importing libraries and the dataset
    Visualizing distribution of data
    Heatmap, Pairplot
    Clustering Algorithms
    Birch, GMMs, Affinity propagation, Mean shift clustering
    DBSCAN, OPTICS
    Evaluating clusters

6  Results and Discussion
    Final result

Conclusion
CHAPTER-1

INTRODUCTION

Imagine treating the owner of the grocery shop where you shop every day the way you treat your significant
other. That might be fun at the beginning, but it could also lead to disastrous situations. Likewise, it can be
unfavourable for a company to manage its relationships with every customer in the same way.

Customer segmentation enables a company to customize its relationships with its customers, just as we do in
our daily lives.

When you perform customer segmentation, you find similar characteristics in each customer’s behaviour and
needs. Then, these are generalized into groups to satisfy demands with various strategies. Moreover, those
strategies can serve as input for:

 Targeted marketing activities to specific groups

 Launch of features aligning with the customer demand

 Development of the product roadmap

There are different products/solutions available in the market, from packaged software to CRM products.
Here, we apply an unsupervised machine learning algorithm with Python.

Project idea: Customer segmentation is a technique in which we divide customers based on their purchase
history, gender, age, interests, etc. This information is useful because it helps the store personalize its
marketing and provide customers with relevant deals. With the help of this project, companies can run
user-specific campaigns and provide user-specific offers rather than broadcasting the same offer to all
users.

Customer segmentation is the practice of dividing a customer base into groups of individuals that are
similar in specific ways relevant to marketing, such as age, gender, interests and spending habits. Companies
employing customer segmentation operate under the premise that every customer is different and that their
marketing efforts would be better served by targeting specific, smaller groups with messages that those
consumers would find relevant and that would lead them to buy something. Companies also hope to gain a deeper
understanding of their customers’ preferences and needs, with the idea of discovering what each segment finds
most valuable in order to tailor marketing materials toward that segment more accurately.

Customer segmentation relies on identifying key differentiators that divide customers into groups that can be
targeted. Information such as a customer’s demographics (age, race, religion, gender, family size, ethnicity,
income, education level), geography (where they live and work), psychographics (social class, lifestyle and
personality characteristics) and behavioural tendencies (spending, consumption, usage and desired benefits)
is taken into account when determining customer segmentation practices.

Benefits of customer segmentation include:

1. Personalization

 Personalization ensures that you provide an exceptional customer experience.

2. Customer Retention

 It is 16 times as costly to build a long-term business relationship with a new customer as it is
simply to cultivate the loyalty of an existing customer.

3. Better ROI for marketing

 Assurance that the right marketing messages are sent to the right people based on their life-cycle
stage.

4. Reveal new opportunities

 Customer segmentation may reveal new trends about products and may even give a first-mover
advantage in a product segment.

Approach — Machine Learning

Unsupervised learning is a class of machine learning techniques for finding patterns in data. The data given
to an unsupervised algorithm are not labelled: only the input variables (X) are given, with no corresponding
output variables. In unsupervised learning, the algorithms are left to themselves to discover interesting
structures in the data.

CHAPTER-2

LITERATURE REVIEW
There are some analytics techniques that can help you with segmenting your customers. These are useful
especially when you have a large number of customers and it’s hard to discover patterns in your customer data
just by looking at transactions. The two most common ones are:

1. Clustering

 Clustering is an exploration technique for datasets where relationships between different
observations may be too hard to spot with the eye.

2. Principal Component Analysis (PCA)

 PCA is a statistical procedure that uses an orthogonal transformation to convert a set of
observations of possibly correlated variables (entities each of which takes on various numerical
values) into a set of values of linearly uncorrelated variables called principal components; a brief
sketch follows below.

The following code takes advantage of the Mall Customer Segmentation Data to demonstrate the ability of the
K-Means clustering algorithm to identify customer segments.

Exploring methods for cluster analysis, visualizing clusters through dimensionality reduction and
interpreting clusters through exploring impactful features.

Although we have seen a large influx of supervised machine learning techniques being used in organizations,
these methods typically suffer from one large issue: the need for labeled data. Fortunately, many unsupervised
methods exist for clustering data into previously unseen groups, thereby extracting new insights from your
clientele.

This report will guide you through the ins and outs of clustering customers. Note that I will not only show
you which sklearn classes you can use but, more importantly, how they can be used and what to look out for.

As always, the data is relatively straightforward and you can follow along with the notebook. It contains
customer information from a Telecom company and is typically used to predict churn:

Fig.: Customer information.

CHAPTER-3

REQUIREMENTS

Hardware and software details

Software used to run the project was Python (Google Colaboratory).

Colaboratory, or “Colab” for short, is a product from Google Research. Colab allows anybody to write and
execute arbitrary Python code through the browser, and is especially well suited to machine learning, data
analysis and education. More technically, Colab is a hosted Jupyter notebook service that requires no setup
to use, while providing free access to computing resources including GPUs.

Colaboratory allows you to write and execute Python in your browser, with:

 Zero configuration required


 Free access to GPUs
 Easy sharing

Colab notebooks allow you to combine executable code and rich text in a single document, along
with images, HTML, LaTeX and more. When you create your own Colab notebooks, they are stored in
your Google Drive account. You can easily share your Colab notebooks with co-workers or friends, allowing
them to comment on your notebooks or even edit them. To create a new Colab notebook you can use the File
menu above.

Colab notebooks are Jupyter notebooks that are hosted by Colab.

Colab is used extensively in the machine learning community with applications including:

 Getting started with TensorFlow


 Developing and training neural networks
 Experimenting with TPUs
 Disseminating AI research
 Creating tutorials

CHAPTER-4

METHODOLOGY

The segmentation process is one of the first steps in approaching a market and a good
segmentation scheme must be such that:
(a) The needs within one segment are similar.

(b) The needs from segment to segment are significantly different.

(c) The segments are measurable (definable), accessible and actionable.

(d) The segments should make sense for the local market.

CUSTOMER SEGMENTATION STRATEGY

The strategy grid will help you in making strategic decisions for each of your customer segments:

 Go: When your strength in a customer segment is strong and the segment is attractive.

 Keep: When your strength is strong but the attractiveness of the segment is low.

 Investigate: If the segment is attractive but your strength is low.

 Drop: If both segment attractiveness and your strength are low.

However, organizations must figure out the most relevant characteristic for their growth. Different customer
segments respond to different value propositions and require different strategic approaches. When properly
used, segmentation helps you allocate resources throughout all levels of your organization to create a value
proposition that uniquely serves your target customer groups.

KEY BENEFITS

 Targeting: Identifying those customers most likely to purchase and become your most profitable accounts.

 Messaging: Creating unique messaging, marketing/sales channels and contact cadence for each unique
customer segment.

 Loyalty & Retention: Focusing on your most loyal customers and the drivers of customer retention and
renewal by customer segment.

 Next Logical Purchase: Predicting both what and when customers will purchase next, and establishing timed
campaigns to intercept that purchase.

 Upsell/Cross-Sell: Understanding which offerings are most likely to drive follow on purchase and expand
share of wallet.

Effective segmentation drives revenue growth through increased ability to meet customers' demands. Its
greatest impact is on the top line, growing the number of customers, the amount of sales per customer and
lifetime value of the customer.

Integrated Approach

Fig: Integrated Approach of finding customer segments along with their hidden buying patterns

CHAPTER-5

IMPLEMENTATION

Import libraries

# Import the libraries


import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from sklearn.cluster import KMeans
import scipy.cluster.hierarchy as sch
from sklearn.cluster import AgglomerativeClustering
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import Birch
from sklearn.cluster import MeanShift
from sklearn.mixture import GaussianMixture
from sklearn.cluster import AffinityPropagation
from sklearn.cluster import OPTICS

Import the dataset

data = pd.read_csv('/kaggle/input/mall-customers/Mall_Customers.csv')

# Select columns 3 and 4: Annual Income (k$) and Spending Score (1-100)
X = data.iloc[:, [3, 4]].values

Visualizing distribution of data

Data visualization is the graphic representation of data. It involves producing images that communicate
relationships among the represented data to viewers. This communication is achieved through a systematic
mapping between graphic marks and data values in the creation of the visualization. The mapping establishes
how data values will be represented visually, determining how and to what extent a property of a graphic
mark, such as size or color, will change to reflect changes in the value of a datum.

plt.figure(figsize=(15,10))
sns.set(style='whitegrid')
plt.title('Distribution', fontsize=15)
# Flatten both selected features into one array for a single distribution plot
sns.distplot(X.ravel(), color='orange')
plt.xlabel('Distribution')
plt.ylabel('#Customers')
plt.show()

Distribution of annual income


plt.figure(figsize=(22,10))
sns.countplot(data['Annual Income (k$)'], palette = 'rainbow')
plt.title('Distribution of Annual Income (k$)', fontsize = 20)
plt.show()


plt.figure(figsize=(15,10))
sns.set(style = 'whitegrid')
sns.distplot(data['Annual Income (k$)'],color='g')
plt.title('Distribution of Annual Income(k$)', fontsize = 15)
plt.xlabel('Range of Annual Income(k$)')
plt.ylabel('#Customers')


Distribution of age
From a business perspective, most of the companies that appear to have a successful story are extremely
focused on a particular target group so as to provide the best experience for them. Hence, businesses are
primarily focused on such relevant activities. In addition, occasionally, a business may select more than one
segment as the focus of its activities, in which case it would normally identify a primary target and a
secondary target.
Primary target markets are those market segments to which marketing efforts are primarily directed and where
more of the business’s resources are allocated, while secondary markets are often smaller segments or less
vital to a product’s success.

The Age variable would be a good indicator of the targeted Age groups.

plt.figure(figsize=(15,10))
sns.countplot(data['Age'], palette = 'cool')
plt.title('Distribution of Age', fontsize = 20)
plt.show()

plt.figure(figsize=(15,10))
sns.set(style = 'whitegrid')
sns.distplot(data['Age'],color='m')
plt.title('Distribution of Age', fontsize = 15)
plt.xlabel('Range of Age')
plt.ylabel('#Customers')


Distribution of spending score


plt.figure(figsize=(25,10))
sns.countplot(data['Spending Score (1-100)'], palette = 'copper')
plt.title('Distribution of Spending Score (1-100)', fontsize = 20)
plt.show()


plt.figure(figsize=(15,10))
sns.set(style='whitegrid')
sns.distplot(data['Spending Score (1-100)'],color='r')
plt.title('Distribution of Spending Score (1-100)', fontsize = 15)
plt.xlabel('Range of Spending Score (1-100)')
plt.ylabel('#Customers')

Gender

Among individual customers, the majority are women.


plt.figure(figsize=(10,10))
size = data['Gender'].value_counts()
colors = ['pink', 'yellow']
plt.pie(size, colors = colors, explode = [0, 0.15], labels = ['Female', 'Male'], shadow = True, autopct = '%.2f%%')
plt.title('Gender', fontsize = 15)
plt.axis('off')
plt.legend()
plt.show()

sns.catplot(x="Gender", kind="count", palette="cool", data=data)

Heatmap

A heatmap is another way to visualize hierarchical clustering. It’s also called a false colored image, where
data values are transformed to a color scale.

Heat maps allow us to simultaneously visualize clusters of samples and features. First, hierarchical
clustering is performed on both the rows and the columns of the data matrix. The columns/rows of the data
matrix are then re-ordered according to the hierarchical clustering result, putting similar observations
close to each other, so that blocks of ‘high’ and ‘low’ values become adjacent in the data matrix. Finally, a
color scheme is applied and the data matrix is displayed. Visualizing the data matrix in this way can help to
find the variables that appear to be characteristic for each sample cluster.
plt.figure(figsize=(15,10))
plt.title('Heatmap',fontsize=15)
sns.heatmap(X,cmap='BuPu')

plt.figure(figsize=(15,10))
sns.heatmap(data.corr(), cmap = 'Greens', annot = True)
plt.title('Heatmap for the Data', fontsize = 15)
plt.show()

Pairplot
To plot multiple pairwise bivariate distributions in a dataset, you can use the pairplot() function. It shows
the relationships for all pairwise combinations of variables in a DataFrame as a matrix of plots, with the
univariate distributions on the diagonal.
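
A plot like the one in the figure below can be produced with a call along these lines (a sketch; the column names are those of the mall dataset loaded in this chapter):

sns.pairplot(data, vars=['Age', 'Annual Income (k$)', 'Spending Score (1-100)'],
             hue='Gender', palette='cool')  # diagonal shows univariate distributions
plt.show()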


Fig: Seaborn pairplot

Clustering Algorithms
There are many unsupervised clustering algorithms out there, and although each of them has significant
strengths in certain situations, I will begin with two that are commonly used.

k-Means Clustering
K-Means clustering is an unsupervised machine learning algorithm for grouping ‘n’ observations into ‘k’
clusters, where k is a predefined or user-defined constant. The main idea is to define k centroids, one for
each cluster.

The K-Means algorithm involves:

1. Choose the number of clusters “k”.

2. Randomly assign each point to a cluster.

3. Until the clusters stop changing, repeat the following:

 A. For each cluster, compute the cluster centroid by taking the mean vector of the points in the cluster.
 B. Assign each data point to the cluster whose centroid is closest.

Two things are very important in K-Means: the first is to scale the variables before clustering the data (see
the note below), and the second is to look at a scatter plot or a data table to estimate the number of
cluster centers to set for the k parameter in the model.

Note: Scaling is necessary when the distance between attributes is not sensible (e.g. a distance between Age
and Height; different metrics matter too!). On the other hand, if you have attributes with a well-defined
common meaning (e.g. latitude and longitude), then you should not scale your data, because this would cause
distortion.
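
A minimal sketch of the scaling step, using the StandardScaler imported earlier (illustrative only; the runs below cluster the raw two features):

# Standardize each feature to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)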

Our hypothesis, and the answer we are trying to give using k-means, is the intuition that customers can be
grouped (clustered) according to their spending score given their income. My null hypothesis (which I am
trying to disprove) is that there are no groups (clusters) of customers based on these features.

In my experience, this is by far the most frequently used algorithm for clustering data. k-Means starts by
choosing k random centers, which you can set yourself. Then, all data points are assigned to the closest
center based on their Euclidean distance. Next, new centers are calculated and the data point assignments are
updated. This process continues until the clusters do not change between iterations.

We can use k-means++ to improve the initialization of the centers. It starts with an initial center and makes
sure that all subsequent centers are sufficiently far away. This optimizes the selection and creation of centers.

You can then determine the optimal k clusters by using something called the elbow method. You want to find
the point of diminishing returns when selecting a range of clusters. You can do this by plotting the number of
clusters on the X-axis and the inertia (within-cluster sum-of-squares criterion) on the Y-axis. You then
select k for which you find a bend:

# Use the elbow method to find the optimal number of clusters

wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 0)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)

plt.figure(figsize=(15,15))
plt.scatter(range(1, 11), wcss, c='b', s=100)
plt.plot(range(1, 11), wcss, c='r', linewidth=4)
plt.title('The Elbow Method', fontsize=20)
plt.xlabel('Number of clusters', fontsize=20)
plt.ylabel('Within-cluster-sum-of-squares', fontsize=20)
plt.show()

You can see the bend at k = 5. Thus, we selected k=5 clusters to be generated using k-Means.

One thing to note: since k-Means typically uses Euclidean distance to calculate distances, it does not work
well with high-dimensional datasets, due to the curse of dimensionality. This curse, in part, states that
Euclidean distances in high dimensions carry very little meaning, since they are often very close together.

The Telecom dataset mentioned in the literature review is somewhat high dimensional, with 27 features; the
mall dataset clustered here uses only two.

A solution would be to use the cosine distance, which works better in high-dimensional space. Since cosine
distance and Euclidean distance are connected linearly for normalized vectors, we can simply normalize our
data.
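
A hedged sketch of that normalization (not applied in the runs below, where only two comparably scaled features are used):

from sklearn.preprocessing import normalize

# Scale each row to unit L2 norm so that Euclidean k-Means on X_norm
# behaves like clustering X by cosine distance
X_norm = normalize(X)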

As you can see the optimal number of clusters is five! Here comes the training part.

# Train the K-Means model on the dataset

kmeans = KMeans(n_clusters = 5, init = 'k-means++', random_state = 0)
y_kmeans = kmeans.fit_predict(X)

# Visualization of the clusters of customers
plt.figure(figsize=(15,15))
plt.scatter(X[y_kmeans == 0, 0], X[y_kmeans == 0, 1], s = 60, c = 'c', label = '#1')
plt.scatter(X[y_kmeans == 1, 0], X[y_kmeans == 1, 1], s = 60, c = 'g', label = '#2')
plt.scatter(X[y_kmeans == 2, 0], X[y_kmeans == 2, 1], s = 60, c = 'm', label = '#3')
plt.scatter(X[y_kmeans == 3, 0], X[y_kmeans == 3, 1], s = 60, c = 'b', label = '#4')
plt.scatter(X[y_kmeans == 4, 0], X[y_kmeans == 4, 1], s = 60, c = 'r', label = '#5')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s = 100, c = 'yellow', label = 'Centroids')
plt.title('Clusters',fontsize=20)
plt.xlabel('Annual Income (k$)',fontsize=20)
plt.ylabel('Spending Score (1-100)',fontsize=20)
plt.legend()
plt.show()

The customers in the first cluster (cyan) have an average annual income as well as an average spending score.
The second cluster (green) holds customers with lower annual income but a higher spending score. People with
both higher annual income and higher spending score belong to the third cluster (magenta). The customers in
the fourth cluster (blue) have lower annual income and lower spending score. People in the fifth cluster
(red) have a high annual income but a lower spending score!

Hierarchical Clustering

Let us visualize the dendrogram:

# Use the dendrogram to find the optimal number of clusters


plt.figure(figsize=(25,15))
dendrogram = sch.dendrogram(sch.linkage(X, method = 'ward'))
plt.title('Dendrogram',fontsize=20)
plt.xlabel('Customers',fontsize=20)
plt.ylabel('Euclidean distances',fontsize=20)
plt.show()

Birch

k-Means can be computationally quite expensive. Faster alternatives to this method are MiniBatchKMeans and
BIRCH. Both methods are quicker to generate clusters, but the quality of those clusters is typically lower
than that of the clusters generated by k-Means.

Let us see how BIRCH works in the following code.


brc = Birch(n_clusters=5)
brc.fit(X)
brc_y_pred = brc.predict(X)
plt.figure(figsize=(10,10))
plt.scatter(X[:,0], X[:,1], c=brc_y_pred, cmap='coolwarm')
plt.title('BIRCH', fontsize=20)
plt.xlabel('Annual Income (k$)', fontsize=20)
plt.ylabel('Spending Score (1-100)', fontsize=20)
plt.show()

Gaussian mixture models (GMMs)
Gaussian mixture models (GMMs) are often used for data clustering. You can use GMMs to perform either
hard clustering or soft clustering on query data. To perform hard clustering, the GMM assigns query data
points to the multivariate normal components that maximize the component posterior probability, given the
data.

gmm = GaussianMixture(n_components=5)
gmm.fit(X)
gmm_y_pred = gmm.predict(X)
plt.figure(figsize=(10,10))
plt.scatter(X[:,0], X[:,1], c=gmm_y_pred, cmap='coolwarm')
plt.title('Gaussian Mixture Model', fontsize=20)
plt.xlabel('Annual Income (k$)', fontsize=20)
plt.ylabel('Spending Score (1-100)', fontsize=20)
plt.show()

Affinity propagation
Affinity propagation (AP) is a clustering algorithm based on the concept of “message passing” between data
points. Unlike clustering algorithms such as k-means or k-medoids, affinity propagation does not require the
number of clusters to be determined or estimated before running the algorithm. Similar to k-medoids, affinity
propagation finds “exemplars”, members of the input set that are representative of clusters.

ap = AffinityPropagation(random_state=0)
ap.fit(X)
ap_y_pred = ap.predict(X)
plt.figure(figsize=(10,10))
plt.scatter(X[:,0], X[:,1], c=ap_y_pred, cmap='spring')
plt.title('Affinity Propagation Model', fontsize=20)
plt.xlabel('Annual Income (k$)', fontsize=20)
plt.ylabel('Spending Score (1-100)', fontsize=20)
plt.show()

Mean shift clustering
Mean shift clustering aims to discover “blobs” in a smooth density of samples. It is a centroid-based algorithm,
which works by updating candidates for centroids to be the mean of the points within a given region.

# bandwidth controls the size of the region; sklearn.cluster's estimate_bandwidth(X)
# can suggest a data-driven value instead of the fixed 2 used here
ms = MeanShift(bandwidth=2)
ms.fit(X)
ms_y_pred = ms.predict(X)
plt.figure(figsize=(10,10))
plt.scatter(X[:,0], X[:,1], c=ms_y_pred, cmap='seismic')
plt.title('MeanShift Model', fontsize=20)
plt.xlabel('Annual Income (k$)', fontsize=20)
plt.ylabel('Spending Score (1-100)', fontsize=20)
plt.show()

DBSCAN
Clustering can also be done based on the density of data points. One example is Density-Based Spatial
Clustering of Applications with Noise (DBSCAN), which clusters data points that are sufficiently dense.
DBSCAN identifies clusters and expands them by scanning neighborhoods. If it cannot find any points to add,
it simply moves on to a new point, hoping to find a new cluster. Any points that lack enough neighbors to be
clustered are classified as noise.

The difference from k-means is that DBSCAN does not require you to specify the number of clusters. The two
main parameters for DBSCAN are the minimum number of points that constitute a cluster and the size of the
neighborhood.
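
No DBSCAN run appears above, so here is a minimal sketch using the import from the beginning of this chapter (eps and min_samples are illustrative values, not tuned for this dataset):

# eps: neighborhood radius; min_samples: points needed to form a dense region
db = DBSCAN(eps=5, min_samples=5)
db_y_pred = db.fit_predict(X)  # a label of -1 marks noise points
plt.figure(figsize=(10,10))
plt.scatter(X[:,0], X[:,1], c=db_y_pred, cmap='viridis')
plt.title('DBSCAN', fontsize=20)
plt.xlabel('Annual Income (k$)', fontsize=20)
plt.ylabel('Spending Score (1-100)', fontsize=20)
plt.show()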

There is a relative of DBSCAN, called OPTICS (Ordering Points to Identify Cluster Structure), that invokes
a different process.

OPTICS
Ordering points to identify the clustering structure (OPTICS) is an algorithm for finding density-based
clusters in spatial data. It was presented by Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel and
Jörg Sander.

Its basic idea is similar to DBSCAN, but it addresses one of DBSCAN’s major weaknesses: the problem of
detecting meaningful clusters in data of varying density. To do so, the points of the database are (linearly)
ordered such that spatially closest points become neighbors in the ordering. Additionally, a special distance
is stored for each point, representing the density that must be accepted for a cluster so that two points
belong to the same cluster. This is represented as a dendrogram.

opt = OPTICS(min_samples=5)
opt_y_pred = opt.fit_predict(X)
plt.figure(figsize=(10,10))
plt.scatter(X[:,0], X[:,1], c=opt_y_pred, cmap='inferno')
plt.title('OPTICS Model', fontsize=20)
plt.xlabel('Annual Income (k$)', fontsize=20)
plt.ylabel('Spending Score (1-100)', fontsize=20)
plt.show()

Training time!
# Train the Hierarchical Clustering model on the dataset
hc = AgglomerativeClustering(n_clusters = 5, affinity = 'euclidean', linkage = 'ward')
y_hc = hc.fit_predict(X)

Evaluating Clusters
The next step is to perform the actual clustering and to interpret both the quality of the clusters and their
content.

However, although objective measures are preferred, I believe that when it comes to unsupervised clustering,
visually examining the clusters is one of the best ways to evaluate them. Never blindly follow objective
measures; make sure that you always inspect what exactly is happening!
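
As a hedged example of such an objective measure (not computed in the original run), the silhouette score of the k-Means labels obtained earlier could be checked like this:

from sklearn.metrics import silhouette_score

# Silhouette ranges from -1 to 1; higher values indicate better-separated clusters
score = silhouette_score(X, y_kmeans)  # y_kmeans from the k-Means run above
print('Silhouette score for k=5 K-Means: %.3f' % score)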

Thus, next up are methods for visualizing clusters in 2d and 3d.

Visualizing Clusters
To visualize the clusters, we can plot the two features directly; with more features, a popular
dimensionality-reduction method (such as the PCA sketched in Chapter 2) could be applied first.

# Visualization of the clusters of customers


plt.figure(figsize=(15,15))
plt.scatter(X[y_hc == 0, 0], X[y_hc == 0, 1], s = 60, c = 'm', label = '#1')
plt.scatter(X[y_hc == 1, 0], X[y_hc == 1, 1], s = 60, c = 'r', label = '#2')
plt.scatter(X[y_hc == 2, 0], X[y_hc == 2, 1], s = 60, c = 'b', label = '#3')
plt.scatter(X[y_hc == 3, 0], X[y_hc == 3, 1], s = 60, c = 'g', label = '#4')
plt.scatter(X[y_hc == 4, 0], X[y_hc == 4, 1], s = 60, c = 'c', label = '#5')
plt.title('Clusters',fontsize=20)
plt.xlabel('Annual Income (k$)',fontsize=20)
plt.ylabel('Spending Score (1-100)',fontsize=20)
plt.legend()
plt.show()

We can then visualize our data in 3d:

import plotly as py
import plotly.offline  # makes py.offline available
import plotly.graph_objs as go

# labels3 is not computed earlier in this report; as an assumption, derive it
# from a 3-feature K-Means fit on Age, Annual Income and Spending Score
km3 = KMeans(n_clusters=5, init='k-means++', random_state=0)
labels3 = km3.fit_predict(data[['Age', 'Annual Income (k$)', 'Spending Score (1-100)']])

data['label3'] = labels3
trace1 = go.Scatter3d(
    x=data['Age'],
    y=data['Spending Score (1-100)'],
    z=data['Annual Income (k$)'],
    mode='markers',
    marker=dict(
        color=data['label3'],
        size=20,
        line=dict(
            color=data['label3'],
            width=12
        ),
        opacity=0.8
    )
)
plot_data = [trace1]  # use a new name so the DataFrame `data` is not overwritten
layout = go.Layout(
    title='Clusters',
    scene=dict(
        xaxis=dict(title='Age'),
        yaxis=dict(title='Spending Score'),
        zaxis=dict(title='Annual Income')
    )
)
fig = go.Figure(data=plot_data, layout=layout)
py.offline.iplot(fig)

CHAPTER-6

FINAL RESULT

Visualization and Interpretation of the Results

Let’s plug k=4 into K-Means and visualize how the customer groups are created:
# Create the clustering model with the optimal k=4. Note: `customers` here is the
# behavioural dataset described below (products ordered, average return rate,
# total spending), not the mall dataset used in Chapter 5.
updated_kmeans_model = KMeans(n_clusters=4,
                              init='k-means++',
                              max_iter=500,
                              random_state=42)
updated_kmeans_model.fit_predict(customers.iloc[:, 3:])

Data points are shown as spheres and the centroids of each group as cubes. The 4 customer groups are as
follows:

Blue: Customers who ordered at least one product, with a maximum total spending of 100 and the highest
average return rate. They might be newcomers to the e-commerce website.

Red: Customers who ordered 1 to 4 products, with an average total spending of 150 and a maximum return rate
of 0.5.

Purple: Customers who ordered 1 to 4 products, with an average total spending of 300 and a maximum return
rate of 0.5.

Green: Customers who ordered 1 to 13 products, with an average total spending of 600 and an average return
rate of 0. This is the most favourable customer group for the company.

Let’s look at how many customers there are in each group, known as the cluster magnitudes:
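
A small sketch of how the magnitudes could be computed, assuming the fitted updated_kmeans_model from above:

# Count how many customers fall into each of the 4 clusters
magnitudes = pd.Series(updated_kmeans_model.labels_).value_counts().sort_index()
print(magnitudes)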

The overall strategy would be to preserve the most favourable customer group, the green one, while moving the
blue customer group toward the red and purple areas.

The blue group makes up 42% of all customers, so any improvement achieved in this group will dramatically
increase revenue. Eliminating high return rates and offering gift cards can move this customer group to the
low-average-return-rate and high-total-spending area. If we assume that they are newcomers, gift cards can
expedite their return.

The red and purple groups together comprise 50% of all customers. They show the same characteristics in terms
of average return rate and products ordered but differ in total spending. These groups can be defined as
customers who already know the brand and order multiple products. Such customers can be kept up to date with
the brand through specialized communications and discounts.

The green customer group comprises 8% of all customers, forming the most favourable customer group for the
brand. They order multiple products and are highly likely to keep them. To maintain and possibly expand this
group, special deals and pre-product launches might help. Moreover, these customers can act as magnets for
new customers, helping expand the customer base.

CONCLUSION

We approached the customer segmentation problem from a behavioural aspect, using the number of
products ordered, the average return rate and the total spending for each customer. Using three
features helped us understand and visualize the model.

All in all, the dataset was apt for an unsupervised machine learning problem. At first, we only
had customers' data with order information and did not know whether they belonged to any group.
With K-means clustering, patterns in the data were found and extended further into groups. We
carved out strategies for the formed groups, making meaning out of a dataset that initially
looked like a dust cloud.
