
Cluster analysis

WHAT IS SEGMENTATION?

• Segmenting, at its most basic, is the separation of a group of customers with different needs into subgroups of customers with similar needs and preferences.

• By doing this, a company can better tailor and target its products and services to meet
each segment’s needs.

WHY DO WE NEED SEGMENTATION?

• Segmentation is a critical enabler to achieve business objectives and realize benefits

• Segmentation is critical to identify white spaces for new products/offerings

• Segmentation helps organizations to optimize their retention and acquisition strategy

• Segmentation is often used to optimize pricing across different products

• Segmentation enables organizations to become more customer-centric

• Market dynamics make segmentation critical to business success

WHAT ARE THE DIFFERENT WAYS OF SEGMENTATION?

The approaches range from tactical (purchase behaviour) to strategic (needs):

• Purchase behaviour segmentation – What are they doing?
– Product usage and loyalty
– Brand awareness
– Price paid, share of wallet (SoW), frequency

• Channel segmentation – How are they doing it?
– Purchase and shopping behaviours
– Key influencers

• Demographics segmentation – Who are they?
– Lifestyle and life stage
– Geography
– Industry type (B2B)

• Occasion segmentation – When and where are they doing it?
– Purchase and usage occasions

• Needs segmentation – Why are they doing it?
– Category needs, desires and beliefs

Picture courtesy Prof. Theodoros Evgeniou


WHAT ARE THE DIFFERENT KINDS OF DATA USED FOR SEGMENTATION?

• Primary data (qualitative and quantitative)
– Behavioral
– Satisfaction
– Preferred communication channels
– Preferred engagement level
– Attitudes
– Some demographics

• Customer data
– Product/service usage
– Subscription
– Features usage
– Acquisition channel

• Third-party data
– Credit score
– Demographics
– Behavioral
– Social network integration

WHAT MAKES FOR A GOOD SEGMENTATION?

A segmentation exercise is considered successful if the segments formed are:

• Identifiable
• Substantial
• Accessible
• Stable
• Differentiable
• Actionable

Cluster analysis – the basic concept
CLUSTER ANALYSIS – THE BASIC CONCEPT

• Cluster analysis is an interdependence technique used to classify objects into relatively homogeneous groups called clusters

• Cluster analysis falls under the umbrella of unsupervised learning methods, which sets it apart from classification methods such as logistic regression, discriminant analysis, CART and CHAID, which are supervised learning methods

• The key difference from the methods above is that the clusters are discovered from the data and are not known a priori: supervised methods learn from objects whose group membership is already known

Conducting Cluster analysis –
the steps involved

STEP-1: FORMULATE THE PROBLEM

• Formulate the problem (select the variables that form the basis of clustering)

– Select variables that describe the similarity between objects in terms relevant to the marketing problem
– Select variables based on past research, theory or the hypotheses to be tested
– Consult experts in the category
– Distinguish clustering variables (used to form the clusters) from profiling variables (used afterwards to describe them)

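STEP-2: SELECT A DISTANCE OR SIMILARITY MEASURE

• Measure similarity in terms of the distance between objects: the smaller the distance, the more similar the objects

Similarity ∝ 1/distance

• Common distance measures between objects a and b, where $V_{ai}$ is the value of variable i for object a:
– Euclidean distance: $d(a,b) = \sqrt{\sum_i (V_{ai} - V_{bi})^2}$
– City block or Manhattan distance: $d(a,b) = \sum_i |V_{ai} - V_{bi}|$
– Chebychev distance: $d(a,b) = \max_i |V_{ai} - V_{bi}|$

• Euclidean distance is the most popular distance metric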
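The three distance measures can be written directly from their formulas. This is a minimal sketch assuming each object is a tuple of numeric variable values; the function names are illustrative, and `similarity` uses 1/(1 + d), an assumed variant of "similarity ∝ 1/distance" that avoids dividing by zero for identical objects.

```python
def euclidean(a, b):
    # Square root of the sum of squared differences across variables
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    # "City block": sum of absolute differences across variables
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    # Largest absolute difference on any single variable
    return max(abs(x - y) for x, y in zip(a, b))

def similarity(a, b, distance=euclidean):
    # Assumed conversion: similarity falls as distance grows; +1 avoids 1/0
    return 1.0 / (1.0 + distance(a, b))
```

For two objects at (0, 0) and (3, 4), the Euclidean distance is 5, the Manhattan distance is 7 and the Chebychev distance is 4, so the choice of measure changes how far apart objects appear.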

STEP-3: SELECT A CLUSTERING PROCEDURE

• Hierarchical (a procedure characterized by a tree-like structure)

• Non-hierarchical (e.g. k-means clustering, a procedure that starts from a set of cluster centers and assigns each object to its nearest center)

HIERARCHICAL CLUSTERING PROCEDURES
[Figure: hierarchical clustering illustration showing Cluster 1 and Cluster 2]
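An agglomerative (bottom-up) hierarchical procedure starts with every object as its own cluster and repeatedly merges the closest pair. The slides do not say which linkage rule is used, so this sketch assumes single linkage (the distance between two clusters is the distance between their closest members) with Euclidean distance. It records each merge as (stage, cluster 1, cluster 2, distance), mirroring the Stage, Cluster1, Cluster2 and Coefficients columns of an agglomeration schedule, and keeps the smaller cluster number after a merge. The function name `agglomerate` is hypothetical.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(points):
    """Single-linkage agglomerative clustering (assumed linkage rule).

    points: {case_number: tuple_of_values}
    Returns a schedule of (stage, cluster1, cluster2, distance) rows.
    """
    clusters = {case: [case] for case in points}
    schedule = []
    stage = 0
    while len(clusters) > 1:
        stage += 1
        best = None
        # Pairs come out with c1 < c2 because the labels are sorted
        for c1, c2 in combinations(sorted(clusters), 2):
            # Single linkage: closest pair of members across the two clusters
            d = min(euclidean(points[i], points[j])
                    for i in clusters[c1] for j in clusters[c2])
            if best is None or d < best[2]:
                best = (c1, c2, d)
        c1, c2, d = best
        # The merged cluster keeps the smaller of the two cluster numbers
        clusters[c1].extend(clusters.pop(c2))
        schedule.append((stage, c1, c2, d))
    return schedule
```

With four hypothetical cases {1: (0, 0), 2: (0, 1), 3: (5, 5), 4: (5, 7)}, cases 1 and 2 merge at stage 1 (distance 1.0), cases 3 and 4 at stage 2 (distance 2.0), and the two resulting clusters at stage 3. The step from 2.0 to about 6.4 is the kind of jump used to choose the number of clusters. The nested loops make this slow for large data sets, which is one reason non-hierarchical procedures are preferred when the number of objects is large.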

NON-HIERARCHICAL CLUSTERING PROCEDURES

• Choose the number of clusters, k

• Generate k random points as the initial cluster centroids

• Assign each object to the nearest centroid

• Recompute each cluster centroid as the mean of the objects assigned to it

• Repeat until the convergence criterion is met (the assignment of objects to clusters no longer changes between iterations)
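The k-means loop above can be sketched in plain Python. This is a minimal illustration, not a production implementation: it assumes objects are equal-length numeric tuples, interprets "generate k random points" as drawing k of the objects at random (a common choice), uses Euclidean distance, and stops when assignments no longer change. The names `kmeans`, `sse` and the `init` parameter are hypothetical.

```python
import math
import random

def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Basic k-means. Returns (labels, centroids).

    init: optional list of k starting centroids; if omitted, k objects
    are drawn at random (assumed initialisation rule).
    """
    if init is None:
        init = random.Random(seed).sample(points, k)
    centroids = [list(c) for c in init]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object joins its nearest centroid
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Convergence: stop once no object changes cluster
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it was
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids

def sse(points, labels, centroids):
    """Within-cluster sum of squared distances to the assigned centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2
               for p, lab in zip(points, labels))
```

Passing `init` explicitly shows why the choice of starting centers matters. For the hypothetical points [(0, 0), (0, 1), (4, 0), (4, 1)] with k = 2, starting centers (0, 0) and (4, 0) give a left/right split with a within-cluster sum of squares of 1.0, while starting centers (2, 0) and (2, 1) converge to a top/bottom split with 16.0. Both are stable solutions of the same algorithm.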

STEP-3: SELECT A CLUSTERING PROCEDURE

• Question: What are the advantages and disadvantages of non-hierarchical clustering?

• Answer:
Advantages:
Faster, which makes it attractive when the number of objects is large

Disadvantages:
The number of clusters must be specified in advance
The selection of the initial cluster centers is arbitrary
The clustering solution can depend on the order of the objects in the data

STEP-4: DECIDE ON THE NUMBER OF CLUSTERS

• Theoretical, conceptual or practical considerations might suggest a number

• In hierarchical clustering, the distances at which clusters are combined can be used as a criterion; this information comes from the agglomeration schedule or the dendrogram

• In non-hierarchical clustering, the ratio of within-group variance to between-group variance can be plotted against the number of clusters; the point at which an elbow occurs indicates the number of clusters

• The relative sizes of the clusters should be meaningful
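The elbow criterion needs one number per candidate solution. The slide does not specify which variance estimate to use, so this sketch assumes raw sums of squares: the within-cluster sum of squares divided by the between-cluster sum of squares for a given set of cluster labels. The function name `within_between_ratio` is hypothetical.

```python
def within_between_ratio(points, labels):
    """Within-cluster SS divided by between-cluster SS for one solution.

    points: list of equal-length numeric tuples; labels: cluster of each point.
    Requires at least two clusters, otherwise the between-cluster SS is 0.
    """
    n, dims = len(points), len(points[0])
    grand = [sum(p[d] for p in points) / n for d in range(dims)]
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    within = between = 0.0
    for members in groups.values():
        centre = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        # Spread of members around their own cluster centre
        within += sum((p[d] - centre[d]) ** 2 for p in members for d in range(dims))
        # Spread of the cluster centre around the overall mean, weighted by size
        between += len(members) * sum((centre[d] - grand[d]) ** 2 for d in range(dims))
    return within / between
```

In practice you would compute this ratio for the solution obtained at each k (for example, from separate k-means runs with k = 2, 3, 4, ...) and plot it against k. The ratio keeps falling as k grows, so the goal is the point where further clusters stop buying much improvement, not the minimum.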

EXHIBIT 1: AGGLOMERATION SCHEDULE

              Cluster combined                        Stage cluster first appears
Stage    Cluster 1    Cluster 2    Coefficients    Cluster 1    Cluster 2    Next stage
1        2            7            1.922           0            0            2
2        2            3            6.452           1            0            10
3        4            11           10.580          0            0            5
4        1            12           13.700          0            0            6
5        4            9            62.775          3            0            7
6        1            10           101.530         4            0            9
7        4            8            316.408         5            0            11
8        5            6            489.957         0            0            9
9        1            5            1530.504        6            8            10
10       1            2            2271.371        9            2            11
11       1            4            11671.97        10           7            0

The Coefficients column indicates the distance between the two clusters (or cases) joined at each stage.

For a good cluster solution, you will see a sudden jump in the distance coefficient (or a sudden drop in the similarity coefficient) as you read down the table. Here the coefficient jumps from 489.957 at stage 8 to 1530.504 at stage 9, so merging stops after stage 8, when 4 clusters remain (12 cases minus 8 merges).
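The schedule can be replayed to recover which cases fall into each cluster at any stopping point. This sketch hard-codes the Exhibit 1 rows, assumes the 12 cases are numbered 1 to 12, and applies the labelling rule from Exhibit 2 (a merged cluster keeps the smaller number). The names `SCHEDULE`, `clusters_after` and `coefficient_jumps` are hypothetical.

```python
# Exhibit 1 rows: (stage, cluster 1, cluster 2, coefficient)
SCHEDULE = [
    (1, 2, 7, 1.922), (2, 2, 3, 6.452), (3, 4, 11, 10.580),
    (4, 1, 12, 13.700), (5, 4, 9, 62.775), (6, 1, 10, 101.530),
    (7, 4, 8, 316.408), (8, 5, 6, 489.957), (9, 1, 5, 1530.504),
    (10, 1, 2, 2271.371), (11, 1, 4, 11671.97),
]

def clusters_after(stage_limit, n_cases=12):
    """Apply merges up to and including stage_limit; return {label: members}."""
    clusters = {case: [case] for case in range(1, n_cases + 1)}
    for stage, c1, c2, _coef in SCHEDULE:
        if stage > stage_limit:
            break
        keep, drop = min(c1, c2), max(c1, c2)  # smaller number survives
        clusters[keep] = sorted(clusters[keep] + clusters.pop(drop))
    return clusters

def coefficient_jumps():
    """(stage, increase in coefficient versus the previous stage)."""
    return [(SCHEDULE[i + 1][0], round(SCHEDULE[i + 1][3] - SCHEDULE[i][3], 3))
            for i in range(len(SCHEDULE) - 1)]
```

Stopping after stage 8, `clusters_after(8)` returns four clusters: {1, 10, 12}, {2, 3, 7}, {4, 8, 9, 11} and {5, 6}. `coefficient_jumps()` lists the stage-to-stage increases for inspection. Note that the absolute increase keeps growing toward the final merges, so the stopping point is still a judgement about where the jump is "sudden", not something a single maximum settles.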
EXHIBIT 2 - DENDROGRAM

[Figure: dendrogram of the 12 cases, built from the merge stages in Exhibit 1]

• At each stage, one case or cluster is joined with another case or cluster.

• When clusters or cases are joined, they are subsequently labeled with the smaller of the two cluster numbers.
STEP-5: INTERPRET AND PROFILE THE CLUSTERS

• Examine the cluster centroids

• Profile the clusters using variables that were not used for clustering

• Identify the variables that differ significantly between clusters, using discriminant analysis or ANOVA

• Significant differences not only on the clustering variables but also on the descriptive variables indicate the presence of natural clusters
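Two of these checks are easy to compute by hand: the centroid (mean profile) of each cluster, and a one-way ANOVA F statistic testing whether a variable's mean differs across clusters. This is a minimal sketch in plain Python; the names `cluster_centroids` and `anova_f` are hypothetical, and it returns only the F statistic, not a p-value.

```python
def cluster_centroids(rows, labels):
    """Mean of each variable within each cluster: {cluster: [means]}."""
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(lab, []).append(row)
    return {lab: [sum(col) / len(members) for col in zip(*members)]
            for lab, members in groups.items()}

def anova_f(values, labels):
    """One-way ANOVA F statistic for one variable across clusters.

    F = (between-group SS / (k - 1)) / (within-group SS / (n - k));
    undefined if every cluster has zero internal spread.
    """
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = {lab: sum(g) / len(g) for lab, g in groups.items()}
    ss_between = sum(len(g) * (means[lab] - grand) ** 2 for lab, g in groups.items())
    ss_within = sum((v - means[lab]) ** 2 for lab, g in groups.items() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

Running `anova_f` on a profiling variable (one not used to form the clusters) is a way to check for natural clusters as described above. If SciPy is available, `scipy.stats.f_oneway` takes one sample per cluster and returns both the F statistic and its p-value.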

CLUSTER ANALYSIS – STEPS INVOLVED

• Formulate the problem

• Select a distance or similarity measure

• Select a clustering procedure

• Decide on the number of clusters

• Interpret and profile the clusters

• Assess reliability and validity

Thank you
