This document has been prepared as a guide to creating a customer segmentation of overall shopping habits in terms of the amount spent and the frequency of shopping. A Shopping Habits Segmentation identifies consistent patterns of purchasing behaviour by classifying shoppers according to how often they shop and how much they spend at a retailer. This shows where opportunities exist to increase both the frequency and the value of purchases for specific shoppers, and therefore which shoppers have the most potential for maximum return on investment.
The Shopping Habits Segmentation can also be used as a metric against which to measure the success of any marketing activity. Because consistent shopper behaviour has already been reviewed over time, the regular ups and downs of a normal shopper’s purchasing cycle have already been accounted for. We can therefore check whether this pattern alters significantly during a promotional period, and thereby calculate the true impact on value.
2. Overview of the analysis process
This section explains the overall process and steps that need to be taken in order to create a Shopping Habits Segmentation.
Key aspects of the segmentation process are:
1. Data Preparation
2. Stable Periods
3. Spend Bands
4. Frequency Bands
5. Value Segments
The data preparation consists of choosing the relevant items and the time period available, then summarising sales to a weekly level per shopper. This weekly sales data is used, with statistical techniques, to identify the most stable period of time over which to evaluate shopper behaviour. Each shopper’s number of transactions and average transaction value are calculated for the two most recent stable periods. This forms the basis of the Spend Bands and Frequency Bands, which are consistent and regularly occurring levels of frequency and spend across time periods. The Shopping Habits Segments are based on the combination of these two sets of bands.
3. Detailed Analysis Process
1. Preparing the data
Customer level transaction data is summarised to a weekly basis for the time period available. For a retailer this will be transactions across ALL the items in the store. If creating a Value Segmentation for a supplier, spend across the supplier’s category is all that is needed.
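The weekly summarisation described above can be sketched as follows. This is a minimal illustration, not the original implementation: it assumes transactions arrive as (customer id, date, amount) tuples and that weeks run Monday to Sunday; the function names `week_start` and `weekly_summary` are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Return the Monday of the week containing d (assumed week definition)."""
    return d - timedelta(days=d.weekday())


def weekly_summary(transactions):
    """Aggregate (customer_id, date, amount) rows to weekly spend and visit counts."""
    summary = defaultdict(lambda: {"spend": 0.0, "transactions": 0})
    for customer_id, tx_date, amount in transactions:
        key = (customer_id, week_start(tx_date))
        summary[key]["spend"] += amount
        summary[key]["transactions"] += 1
    return dict(summary)


# Illustrative data only.
transactions = [
    ("C1", date(2006, 3, 20), 25.0),
    ("C1", date(2006, 3, 24), 15.0),
    ("C1", date(2006, 3, 28), 40.0),
    ("C2", date(2006, 3, 21), 60.0),
]
weekly = weekly_summary(transactions)
```

For a supplier rather than a retailer, the same sketch would apply after filtering the transactions to the supplier’s category.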
2. Determining the Stable Period
The bands and segments of the Shopping Habits Segmentation are based on customer behaviour over a period of time. This period must be carefully determined if the results of the segmentation are to be meaningful and actionable. If the period is too short, it will not accurately reflect each customer’s shopping habits. For example, a period of 1 week would not be useful for identifying how frequently customers shop in a supermarket. The solution would not be stable, and a high percentage of customers would change segment each time it was updated. An unstable solution would also make it difficult to trace the same customers over time, as customers can only be placed in a segment if they have shopped during the stable period. On the other hand, if the period is too long, it will not be very responsive to real changes in customer behaviour. For example, a period of 1 year would not be useful for identifying changes in customers’ shopping habits that occurred in the last 2 months.
Before starting to create the stable periods, the data needs to be cleaned of weeks with extremely high or low sales e.g. Christmas, Mother’s Day. This is done by plotting the weekly sales and identifying outliers by eye.
In order to achieve the balance of stability and responsiveness, we begin with an unstable short period of 1 week, from the mid-point of the data, and calculate measures of stability. We then incrementally increase this period week by week and monitor the changes in the measures of stability. The optimal period is known as the stable period and is taken at the point where increasing the length of the period results in relatively little improvement in the stability.
Figure 2.1 – Increasing the pre and post periods in order to find the stable period
A useful way of measuring the stability is to use a linear regression model in order to see how accurately a customer’s spend in the pre-period can be used to predict a customer’s spend in the post period. Key measures to be monitored in terms of improvement of stability would be the R-Square and the Beta of the model.
Below is an example from the grocery segmentation. It can be seen that there is very little improvement in stability after 6 weeks.
[Chart: Change in Stability Measure (Beta Change and R Square Change) plotted against Period (weeks), from 1 to 10 weeks]
Figure 2.2 – Statistical measure to identify the optimum stable period
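The stability check described above, regressing post-period spend on pre-period spend for progressively longer periods, could be sketched like this. It is an illustration under stated assumptions: each customer’s weekly spend is held as a list, `mid` is the index of the mid-week, and the helper names `ols_beta_r2` and `stability_by_period` are hypothetical. It uses ordinary least squares with a single predictor, for which the R-Square equals the squared correlation.

```python
def ols_beta_r2(x, y):
    """Slope (Beta) and R-Square of a simple linear regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


def stability_by_period(weekly_spend, mid, max_weeks):
    """For each period length k, regress post-period spend on pre-period spend."""
    results = []
    for k in range(1, max_weeks + 1):
        pre = [sum(weeks[mid - k:mid]) for weeks in weekly_spend.values()]
        post = [sum(weeks[mid:mid + k]) for weeks in weekly_spend.values()]
        results.append((k, *ols_beta_r2(pre, post)))
    return results


# Illustrative data only: four weeks of spend per customer, mid-week at index 2.
weekly_spend = {
    "C1": [10.0, 20.0, 10.0, 20.0],
    "C2": [30.0, 30.0, 30.0, 30.0],
    "C3": [50.0, 40.0, 50.0, 40.0],
}
results = stability_by_period(weekly_spend, mid=2, max_weeks=2)
```

The stable period would then be read off as the length beyond which the week-on-week change in Beta and R-Square becomes small; that judgement is left to the analyst here.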
Recent transactions covering a period of 6 months for a grocer, and 12 months for a high street retailer, should be sufficient to determine the stable period. The 3 month point within the data can then be taken as the mid-week, as shown in Figure 2.1. The starting point for the stability analysis is a summary of each customer’s spend on a weekly basis. The weekly spends can then be aggregated step by step as illustrated.
Figure 2.3 – Creating stable period data
For example, if we had determined a stability period of 6 weeks and had data running up to 30 April 2006, then we would need to extract 12 weeks of data. Transactions in the 6 weeks from 20 March 2006 to 30 April 2006 form the post period. For each customer, we need to calculate the total spend and the number of times they went shopping in this post period. From this we can also calculate each customer’s average transaction value in the post period, which is the post-spend divided by the number of post-transactions.
We also need to look at the preceding 6 weeks i.e. in the pre period. In this example, these are the transactions in the period 6 February 2006 to 19 March 2006. For each customer, we need to calculate the total spend and the number of times they went shopping in this pre period. From this we can also calculate each customer’s Average Transaction Value in the pre period.
The summary will therefore contain the following variables: customer identification, pre number of transactions, post number of transactions, pre average transaction value and post average transaction value.
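We recommend removing customers who did not spend in both periods, i.e. keeping only customers with at least one transaction in the pre period and at least one in the post period.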
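A minimal sketch of building this pre/post summary is shown below. It assumes transactions are (customer id, date, amount) tuples and uses the example dates from the text; the function name `period_summary` and the dictionary keys are hypothetical.

```python
from datetime import date


def period_summary(transactions, pre_start, post_start, post_end):
    """Summarise each customer's transaction count and ATV in the pre and post periods."""
    totals = {}
    for customer_id, tx_date, amount in transactions:
        if pre_start <= tx_date < post_start:
            period = "pre"
        elif post_start <= tx_date <= post_end:
            period = "post"
        else:
            continue  # outside the 12 weeks being analysed
        record = totals.setdefault(customer_id, {"pre": [0.0, 0], "post": [0.0, 0]})
        record[period][0] += amount
        record[period][1] += 1

    summary = {}
    for customer_id, record in totals.items():
        (pre_spend, pre_n), (post_spend, post_n) = record["pre"], record["post"]
        if pre_n == 0 or post_n == 0:
            continue  # keep only customers who spent in both periods
        summary[customer_id] = {
            "pre_transactions": pre_n,
            "post_transactions": post_n,
            "pre_atv": pre_spend / pre_n,
            "post_atv": post_spend / post_n,
        }
    return summary


# Illustrative data only; the period dates follow the example in the text.
transactions = [
    ("C1", date(2006, 2, 10), 20.0),
    ("C1", date(2006, 3, 1), 40.0),
    ("C1", date(2006, 3, 25), 30.0),
    ("C2", date(2006, 4, 1), 50.0),  # post period only, so C2 is dropped
]
summary = period_summary(
    transactions,
    pre_start=date(2006, 2, 6),
    post_start=date(2006, 3, 20),
    post_end=date(2006, 4, 30),
)
```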
3. Creation of stable spend (average transaction value) bands
The summary created in the previous step is used first to calculate the stable spend bands. The average transaction value will vary significantly across the different customers in the data. For example, it may be that the average transaction values in the pre and post period range from £10.00 to £800.00. The average transaction value must first be grouped into 100 groups, each with approximately the same number of customers i.e. percentiles. The spend ranges that define each group are determined by the distribution of customers across the average transaction values. The table below demonstrates how this method of grouping works.
Group | ATV Range | Number of Customers
1 | £10 - £15 | 2,000
2 | £15 - £25 | 2,000
3 | £25 - £33 | 2,000
4 | £33 - £40 | 2,000
.. | .. | ..
99 | £650 - £700 | 2,000
100 | £700 - £800 | 2,000
This can be achieved in many statistical software packages by mathematically ranking the average transaction value. This means it is possible, using the data summary from Step 2 to place each customer in a group in the pre-period and a group in the post-period based on their Average Transaction Value. Some customers will remain in the same group in both periods, whilst others will change group.
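One way to sketch this ranking step without a statistical package is shown below. It is an assumption-laden illustration: the cut points are taken from whatever list of average transaction values is supplied (for example the pre and post values pooled, which is an assumption rather than something the text specifies), and the names `quantile_cut_points` and `assign_group` are hypothetical.

```python
from bisect import bisect_right


def quantile_cut_points(values, n_groups):
    """Return n_groups - 1 cut points splitting the values into roughly equal-sized groups."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_groups] for k in range(1, n_groups)]


def assign_group(value, cut_points):
    """Return the 1-based group for a value; a value equal to a cut point goes to the higher group."""
    return bisect_right(cut_points, value) + 1


# Illustrative data only: 8 values split into 4 groups instead of 100.
atv_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
cut_points = quantile_cut_points(atv_values, n_groups=4)
groups = [assign_group(v, cut_points) for v in atv_values]
```

Applying the same cut points to both the pre-period and post-period values keeps the group definitions identical across the two periods, so that a change of group reflects a change in behaviour.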
Based on the example above, a customer with an Average Transaction Value of £27 in the pre-period and £28 in the post-period will be in group 3 in both periods. However, a customer with an Average Transaction Value of £14 in the pre-period and £24 in the post-period will have moved from group 1 to group 2. In summary, some customers will remain in the same group, others will move up groups and others will move down groups. As is explained in more detail in the appendix, these customer movements or migrations between groups are used to measure statistically the relationship between the groups. These relationships are measured in terms of a statistical distance. The groups are then combined into larger groups by clustering based on this notional distance. These larger groups form the stable spend bands.
[Diagram: Average Transaction Value Groups combined into Stable Average Transaction Value Bands]
For example, in a grocery value segmentation this stage resulted in 5 stable average transaction value bands.
Bands in order of increasing Average Transaction Value:
Band | Range
Small Baskets | £0 - £110 over 6 weeks
Regular Baskets | £110 - £175 over 6 weeks
Full Baskets | £175 - £280 over 6 weeks
Small Trolleys | £280 - £440 over 6 weeks
Full Trolleys | >£440 over 6 weeks
For example, in grocery, the spend groups £10-£15, £15-£25, £25-£33 and so on up to £100-£110 clustered together to form Small Baskets. The spend groups above this but below £175 clustered together to form Regular Baskets, and so on. These clusters define the bands.
4. Creation of stable frequency bands
The summary created in Step 2 is now used to calculate the stable frequency bands. The number of transactions will also vary across the different customers in the data. For example, the number of transactions in the pre and post periods may range from 1 to 100. There will be less variation than in the case of average transaction value, so the number of transactions can be split into fewer groups, say 25 to 50. The number of groups will depend on the actual distribution of values within the data. Again, these groups can be determined by ranking the frequency of transactions. As in the case of spend, some customers will remain in the same group, others will move up groups and others will move down groups. These customer movements or migrations between groups are used to measure statistically the relationship between the groups, in terms of a statistical distance. The groups are then combined into larger groups by clustering based on this notional distance. These larger groups form the stable frequency bands.
[Diagram: Transaction Groups combined into Stable Transaction Bands]
For example, on a grocery value segmentation this stage resulted in 4 stable frequency bands.
Band | Frequency
Occasional Shoppers | 1-3 times in 6 weeks
Regular Shoppers | 4-5 times in 6 weeks
Frequent Shoppers | 6-11 times in 6 weeks
Very Frequent Shoppers | 12 or more times in 6 weeks
5. Creation of the Shopping Habits Segments
Creation of the m spend bands and n frequency bands means that each customer can be placed in a cell of an m x n grid in the pre-period and a cell of the m x n grid in the post-period, based on the customer summary created in Step 2. For the grocery example, this is a 5 x 4 grid of 20 cells.
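Each cell is labelled with the transaction band followed by the spend band.
Stable Spend Band | Stable Transaction Band 1 | Stable Transaction Band 2 | Stable Transaction Band 3 | Stable Transaction Band 4
1 | 1_1 | 2_1 | 3_1 | 4_1
2 | 1_2 | 2_2 | 3_2 | 4_2
3 | 1_3 | 2_3 | 3_3 | 4_3
4 | 1_4 | 2_4 | 3_4 | 4_4
5 | 1_5 | 2_5 | 3_5 | 4_5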
Some customers will be in the same cell in both the pre and post periods, whilst others will move to other cells. For example, a customer that had shopped 2 times with an average transaction value of £120 in the pre-period would be placed in cell 1_2. If the same customer then shopped 5 times with an average transaction value of £90 in the post-period, they would be placed in cell 2_1.
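The cell lookup in this example can be sketched as below, using the grocery band boundaries quoted earlier. Two points are assumptions rather than statements from the text: a value exactly on a boundary is placed in the higher band, and exactly 12 visits counts as Very Frequent. The function name `cell` and the constant names are hypothetical.

```python
from bisect import bisect_right

# Lower bounds of spend bands 2 to 5 (GBP) and frequency bands 2 to 4,
# taken from the grocery example.
SPEND_BAND_LOWER_BOUNDS = [110, 175, 280, 440]
FREQUENCY_BAND_LOWER_BOUNDS = [4, 6, 12]


def cell(n_transactions, atv):
    """Return the grid cell label as '<frequency band>_<spend band>'."""
    frequency_band = bisect_right(FREQUENCY_BAND_LOWER_BOUNDS, n_transactions) + 1
    spend_band = bisect_right(SPEND_BAND_LOWER_BOUNDS, atv) + 1
    return f"{frequency_band}_{spend_band}"


pre_cell = cell(2, 120.0)   # the customer's pre-period behaviour from the example
post_cell = cell(5, 90.0)   # the same customer's post-period behaviour
```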
To summarise, some customers will remain in the same cell, others will move to adjacent cells, and others will move to more distant cells. These customer movements or migrations between cells are used to measure statistically the relationship between the cells, which are then combined into larger groupings by clustering based on this notional distance. These larger groups form the segments. This is explained in more detail in the appendix.
[Figure: the resulting segments. Labels surviving from the figure: "Very Low Spend Infrequent Intermittent Occasional Regular Frequent Very Frequent"; "Now And Then Occasional Shoppers Low Engagement Shoppers"; "Dry & Sensitive Premium Loyals Loyals"]
Appendix: Migration Clustering Methodology
This section provides an overview of migration clustering. Migration clustering is the technique behind much of the segmentation and is used to create the stable spend bands and stable frequency bands. It is then used to determine the spend-frequency segments. Migration clustering uses the customers’ spending patterns to define the optimal bands and segments. The advantage of this method is that the bands and segments are determined by actual customer behaviour. These relationships are measured in terms of a statistical distance. This is illustrated below with a pictorial representation of individual shoppers’ numbers of transactions and average transaction values for a stable period. Clustering looks at the distance between shoppers and groups customers based on these distances, i.e. shoppers with similar frequency and ATV will fall into the same band. For example, the distance between shopper 1 and shopper 2 is not deemed statistically significant, whereas the distance between shopper 1 and shopper 3 is sufficient for them to be classified as separate shopping behaviours.
[Scatter plot: Number of transactions against Average transaction value]
Figure A.1 demonstrates groupings of customers based on frequency of transaction and ATV
This is a distance profile of each shopper to all other shoppers that we can then cluster on, in a statistical package. Statistical outputs from the clustering, as well as some human judgement, should be used in deciding upon the final bands and segments. We use hierarchical clustering with a complete linkage method. In complete linkage, the distance between two groups is the maximum distance between a shopper in one group and a shopper in the other group.
The changes in shopping behaviour are measured by comparing the customers’ behaviour in the pre-period to their behaviour in the post-period using the groups. Tracing the movements of customers between the groups from the pre-period to the post-period allows us to define the statistical distance. Groups between which there are many customer movements are closer together in statistical space, such as groups 1 (£10-£15) and 2 (£15-£25) in Figure A.1, i.e. these groups represent similar spending behaviour. Groups between which there are relatively few movements are further apart, such as groups 1 and 3 in Figure A.1. These groups represent distinct types of shopping behaviour. The movements between groups can be quantified by counting the number of customers that move from each pre-period group to each post-period group.
An example follows for calculating the statistical distances between all groups in a 3 group situation. For a Shopping Habits Segmentation the number of groups will always be larger than this, but the example illustrates the necessary calculations, which can easily be applied to the real segmentation.
Figure A.2 demonstrates customer movements from 3 pre-period groups to 3 post-period groups.
Each customer is placed in a group based on spending behaviour in the pre period. The same customers are then placed in a group based on spending behaviour in the post period. Some customers will remain in the same group. Others will change groups based on a change in spending behaviour between the two periods. For example, some customers who are in group 1 in the pre-period, will remain in group 1 in the post-period, some customers will move to group 2 and some will move to group 3.
The distances between all the bands can be summarised as a table. If there were only 3 bands, this would look like the table shown below.
Band | Band 1 | Band 2 | Band 3
Band 1 | d11 | d12 | d13
Band 2 | d12 | d22 | d23
Band 3 | d13 | d23 | d33
We need the number of customers in each group in the pre-period and the number of customers in each group in the post-period. We also need to measure the movements. The movements shown in Figure A.3 are simple 1-way movements. In order to get a better measure of the statistical ‘closeness’ of two groups, it is important to consider the reverse movements. For example, in order to look at the relationship between group 2 and group 3, we need to consider not only customers moving from group 2 to group 3 but also those moving from group 3 to group 2. These pairs of migrations can be represented mathematically as a cross tabulation. The pair of frequencies associated with the group 2 – group 3 migration is f23 and f32.
Pre Group | Post Group 1 | Post Group 2 | Post Group 3
1 | f11 | f12 | f13
2 | f21 | f22 | f23
3 | f31 | f32 | f33
Figure A.4: The migrations of a 3 group example
Each pair of groups therefore has two frequencies associated with it. For customers that do not change groups, there is only one frequency i.e. the diagonals. In order to calculate a total measure of the movements between two groups, we add the 2 frequencies together, so that the appropriate measure of customer migrations between two groups i and j is:
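Customer migration measure between group i and group j
= (migrations from group i to group j) + (migrations from group j to group i)
i.e. $F_{ij} = f_{ij} + f_{ji}$ for $i \neq j$, and $F_{ii} = f_{ii}$ on the diagonal. (Equation 1)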
The customer migration measures calculated for the example are shown in the table below.
Pre period group i | Post period group j | Number of customers in pre period | Number of customers in post period | Customer migration measure
1 | 1 | 5,000 | 4,000 | 2,000
1 | 2 | 5,000 | 7,000 | 3,000
1 | 3 | 5,000 | 7,000 | 2,000
2 | 2 | 6,000 | 7,000 | 3,000
2 | 3 | 6,000 | 7,000 | 6,000
3 | 3 | 7,000 | 7,000 | 4,000
Figure A.5: The movement levels
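The cross tabulation and the customer migration measure can be sketched as follows. This is an illustration only: it assumes each customer’s pre-period and post-period group is held in two parallel lists, and the function names `migration_counts` and `migration_measure` are hypothetical.

```python
from collections import Counter


def migration_counts(pre_groups, post_groups):
    """Cross-tabulate customers by (pre-period group, post-period group)."""
    return Counter(zip(pre_groups, post_groups))


def migration_measure(counts, i, j):
    """Two-way migration between groups i and j; the diagonal uses the single count."""
    if i == j:
        return counts[(i, i)]
    return counts[(i, j)] + counts[(j, i)]


# Illustrative data only: six customers, one entry per customer in each list.
pre_groups = [1, 1, 1, 2, 2, 3]
post_groups = [1, 2, 2, 1, 3, 2]
counts = migration_counts(pre_groups, post_groups)
measure_1_2 = migration_measure(counts, 1, 2)  # 2 moved 1 -> 2, 1 moved 2 -> 1
```

Because a `Counter` returns zero for pairs that never occur, groups with no movement between them simply get a measure of zero.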
As well as the customer migration measure, the numbers of customers in each group in the pre and post periods are used within the statistical distance calculations. This is because groups with large numbers of customers are likely to have higher flows of customer migrations, and we need to take this into account.
We define the statistical distance between any pair of groups i and j as:
$$d_{ij} = \frac{T_i \, T_j}{F_{ij}}$$
Where: $T_i$ is the number of customers in group i in the pre period plus the number of customers in group i in the post period (and likewise $T_j$ for group j). $F_{ij}$ is the customer migration measure between group i and group j, as calculated in Equation 1.
For example, in the group 2 - group 3 migration, shown in Figure A.5, the customer migration measure is 6,000. In the pre-period, group 2 contains 6,000 customers and in the post-period it contains 7,000 customers. Group 3 contains 7,000 customers in both the pre and post periods. The statistical distance is calculated as:
$$d_{23} = \frac{(6{,}000 + 7{,}000) \times (7{,}000 + 7{,}000)}{6{,}000}$$
These statistical distances should be calculated for each pair of groups. The groups can then be clustered based on these distances as explained at the start of the section.
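The distance calculation and the complete linkage clustering can be sketched together using the figures from Figure A.5. This is a hedged illustration, not the original implementation: it reads the distance as the product of the two group totals divided by the migration measure, it does not handle pairs with zero migrations (which would need a rule of their own), and the names `statistical_distance` and `complete_linkage` are hypothetical. In practice a statistical package would normally be used for the clustering.

```python
def statistical_distance(t_i, t_j, f_ij):
    """Distance between two groups: product of group totals over their migration measure."""
    return t_i * t_j / f_ij  # assumes f_ij > 0


def complete_linkage(distances, groups):
    """Agglomerative clustering; returns the merges as (cluster_a, cluster_b, distance)."""
    clusters = [frozenset([g]) for g in groups]
    merges = []

    def linkage(a, b):
        # Complete linkage: the largest distance between any member of a and any member of b.
        return max(distances[frozenset((x, y))] for x in a for y in b)

    while len(clusters) > 1:
        a, b = min(
            ((a, b) for idx, a in enumerate(clusters) for b in clusters[idx + 1:]),
            key=lambda pair: linkage(*pair),
        )
        merges.append((sorted(a), sorted(b), linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Pre-period plus post-period customers per group, from Figure A.5.
totals = {1: 5000 + 4000, 2: 6000 + 7000, 3: 7000 + 7000}
# Customer migration measures for each pair of distinct groups, from Figure A.5.
migration = {frozenset((1, 2)): 3000, frozenset((1, 3)): 2000, frozenset((2, 3)): 6000}

distances = {}
for pair, f_ij in migration.items():
    i, j = sorted(pair)
    distances[pair] = statistical_distance(totals[i], totals[j], f_ij)

merges = complete_linkage(distances, [1, 2, 3])
```

Under this reading, groups 2 and 3 have the smallest distance and are merged first; group 1 then joins at the larger of its distances to groups 2 and 3. Where to stop merging is the point at which statistical output and human judgement are combined to choose the final bands.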
The above migration clustering process is repeated for:
1. Defining the spend bands
2. Defining the frequency bands
3. Defining the final frequency-spend bands
N.B. Although two consecutive stable periods (pre and post) are used to define the stable bands and segments by looking at movements, the actual Average Transaction Values and numbers of transactions in each band correspond to a single stable period.