
Predicting Customer Lifetime Value for Azure

Chao Zhong (Speaker)


Feng Zhu
Shijing Fang
Val Fontama
Session Goals
▪ Business objectives:
- How to predict customer value and identify high-value customers early on, based on a few key features (Recency, Frequency, Monetary) from customer history
- Appreciate the value and flexibility of Customer Lifetime Value analysis in real-life applications

▪ Technical objectives:
- How to choose the right model based on the business context and the data
- How to make Customer Lifetime Value predictions using the Fader model
- How to apply the Bayesian approach to real-life problems
Sample Azure Customers

Customer  cohort     tenure  recency  frequency  past_value  future_value  time_stamp  proj_window
1         8/30/2014  9       7        2          168.39      2,707.16      11/1/2014   1,000
2         8/30/2014  9       0        9          750.78      29,873.87     11/1/2014   1,000
3         9/27/2014  5       0        5          412.86      33,617.62     11/1/2014   1,000
4         5/24/2014  23      0        23         1,892.52    28,821.90     11/1/2014   1,000

Terminology
• Recency = time since last purchase
• Frequency = number of purchases in the past
• Monetary = mean amount per purchase in the past
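A minimal sketch of how these three quantities could be derived from one customer's weekly usage history. It assumes that a "purchase" is any week with positive usage and that time is counted in weeks since the first paid week; the function name and the input layout are hypothetical, and the usage amounts are made up for illustration.

```python
def rfm_from_weekly_usage(usage):
    """Summarize one customer's history.

    usage: list of weekly usage amounts, starting at the first paid week
           and ending at the observation date (assumed layout).
    """
    tenure = len(usage)
    # 1-based indices of weeks with positive usage (assumed to be "purchases").
    active_weeks = [week for week, amount in enumerate(usage, start=1) if amount > 0]
    frequency = len(active_weeks)
    # Recency = weeks elapsed since the last purchase.
    recency = tenure - active_weeks[-1] if active_weeks else tenure
    past_value = sum(usage)
    # Monetary = mean amount per purchase.
    monetary = past_value / frequency if frequency else 0.0
    return {
        "tenure": tenure,
        "recency": recency,
        "frequency": frequency,
        "past_value": past_value,
        "monetary": monetary,
    }


# Illustrative history: two active weeks followed by seven idle weeks.
example = rfm_from_weekly_usage([50.0, 30.0, 0, 0, 0, 0, 0, 0, 0])
print(example)
```

Under these assumptions, a customer who is active in every week gets recency = 0 and frequency = tenure, which is the pattern of customers 2 to 4 in the sample table.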
Design Requirements

▪ The Azure Customer Lifetime Value model must:


- Make long-term predictions based on limited data
- Achieve high accuracy at both the individual and the group level
- Scale well to different projection windows
- Include both contracted and non-contracted customers
- Handle multiple Azure products with very different consumption patterns
▪ Our main contribution to the CLV literature:
- Extend the applicability of the Fader BG/BB model to the cloud computing setting and, more generally, to an ensemble of heterogeneous products, for customers with or without contracts
Model Assumptions

▪ We make the following assumptions


- Products and customers are stable over time, but heterogeneity across customers is allowed
- Customers behave independently
- There are no contracts
- Bias correction is necessary for contracted customers
- The consumption process is discrete or can be discretized
- Death (churn) is permanent
Fader BG/BB Model Workflow
Fader BG/BB Model Description 1
▪ At each purchase opportunity:
- Customer churn is a Bernoulli event with probability θ
- The customer's (unobserved) lifetime therefore follows a geometric distribution
- Heterogeneity in θ across customers follows a beta distribution
- Combining the above gives the beta-geometric (BG) distribution
▪ At each purchase opportunity, given the customer is alive:
- Customer purchase is a Bernoulli event with probability p
- The number of purchases over consecutive purchase opportunities follows a binomial distribution
- Heterogeneity in p across customers follows a beta distribution
- Combining the above gives the beta-binomial (BB) distribution
▪ The goal is to infer the expected number of purchases E(X)
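The model described above can be sketched in a few lines of Python using the standard BG/BB closed forms. Everything named here is an assumption rather than the deck's own code: p ~ Beta(alpha, beta) and θ ~ Beta(gamma, delta) are the usual hyperparameter names, x is the number of purchases in n opportunities, t_x is the opportunity at which the last purchase occurred (tenure minus recency under this deck's definition of recency), and n_star is the number of future opportunities. The parameter values in the usage lines are illustrative, not fitted.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bgbb_likelihood(x, t_x, n, alpha, beta, gamma, delta):
    """Marginal likelihood of a history (x, t_x, n) under BG/BB."""
    lb_p = log_beta(alpha, beta)
    lb_theta = log_beta(gamma, delta)
    # Case 1: the customer is still alive after all n opportunities.
    total = exp(log_beta(alpha + x, beta + n - x) - lb_p
                + log_beta(gamma, delta + n) - lb_theta)
    # Case 2: alive through opportunity t_x + i, churned before the next one.
    for i in range(n - t_x):
        total += exp(log_beta(alpha + x, beta + t_x - x + i) - lb_p
                     + log_beta(gamma + 1, delta + t_x + i) - lb_theta)
    return total


def bgbb_expected_purchases(x, t_x, n, n_star, alpha, beta, gamma, delta):
    """Conditional E(X) over the next n_star opportunities (requires gamma != 1)."""
    lik = bgbb_likelihood(x, t_x, n, alpha, beta, gamma, delta)
    purchase_term = exp(log_beta(alpha + x + 1, beta + n - x) - log_beta(alpha, beta))
    scale = delta / (gamma - 1.0) * exp(lgamma(gamma + delta) - lgamma(1.0 + delta))
    survival_term = (exp(lgamma(1.0 + delta + n) - lgamma(gamma + delta + n))
                     - exp(lgamma(1.0 + delta + n + n_star)
                           - lgamma(gamma + delta + n + n_star)))
    return purchase_term * scale * survival_term / lik


# Illustrative hyperparameters (hypothetical, not estimated from Azure data).
params = dict(alpha=1.0, beta=1.0, gamma=2.0, delta=3.0)
# Same frequency, different recency: purchase in the last week vs. 7 weeks ago.
recent = bgbb_expected_purchases(x=2, t_x=9, n=9, n_star=52, **params)
lapsed = bgbb_expected_purchases(x=2, t_x=2, n=9, n_star=52, **params)
print(recent, lapsed)
```

Comparing the two calls shows how recency changes the prediction for customers with the same frequency: the longer the silence since the last purchase, the more weight the model puts on the customer having already churned.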
Fader BG/BB Model Description 2
▪ At each purchase opportunity, given the customer is alive and makes a purchase:
- Customer (mean) purchase amount m follows a gamma distribution
- The gamma distribution has two parameters, shape ρ and scale v
- Heterogeneity in v across customers follows another gamma distribution
- ρ is assumed to be constant across customers
- Combining the above gives the gamma-gamma distribution
▪ The goal is to infer the expected mean purchase amount E(M)
▪ The three latent variables (θ, p, m) are assumed to be independent a priori
- The joint posterior distribution shows a negative correlation between θ and p
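As a sketch of the gamma-gamma step, the conditional expected mean purchase amount has a simple closed form in the standard formulation. The names below are assumptions: `shape` is the common gamma shape (ρ on this slide, written p in Appendix 2), `q` and `gam` are the shape and scale of the gamma distribution over v (the q and γ of Appendix 2), v is treated as a rate-type parameter so that the individual mean is shape / v, and m_x is the customer's observed mean amount over x purchases. The numeric values are illustrative only.

```python
def gamma_gamma_expected_mean(m_x, x, shape, q, gam):
    """Conditional E(M) given x purchases with observed mean amount m_x.

    Requires shape * x + q > 1. With x = 0 this reduces to the population
    mean shape * gam / (q - 1).
    """
    return shape * (gam + x * m_x) / (shape * x + q - 1.0)


# Illustrative hyperparameters (hypothetical, not fitted to any data).
shape, q, gam = 6.0, 4.0, 15.0
population_mean = gamma_gamma_expected_mean(0.0, 0, shape, q, gam)
few_purchases = gamma_gamma_expected_mean(50.0, 2, shape, q, gam)
many_purchases = gamma_gamma_expected_mean(50.0, 200, shape, q, gam)
print(population_mean, few_purchases, many_purchases)
```

The estimate is a weighted average of the population mean and the customer's own observed mean: with few purchases it stays close to the population mean, and with many purchases it approaches m_x.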
Data Description
▪ Contracted and non-contracted customers
- 106 weeks of weekly usage data
- Subscription based, consumption only, all services combined
- All customers started their first paid use within the first 21 weeks
- 25 weeks of training data, followed by 81 weeks of testing data

▪ Sampling bias
- Training and testing data have different distributions
Fader BG/BB model performance 1
▪ Non-contracted customers, aggregated CLV
- Symmetric absolute percentage error (SAPE) = 2.4% over 18 months of out-of-time testing
- Used 5 weeks for bias correction (simple residual analysis)
▪ Non-contracted customers, individual CLV
- Mean absolute error (MAE) = 1600 units (~20/week), median AE = 430 units (~5/week) at end of testing
Fader BG/BB model performance 2
▪ Contracted customers, aggregated CLV
- Symmetric absolute percentage error (SAPE) = 38% over 18 months of out-of-time testing
- Violation of a key assumption: the model assumes non-contracted customers
▪ Contracted customers, individual CLV
- MAE = 22380 units (~250/week), median AE = 5880 units (~70/week) at end of testing
Fader BG/BB model performance 3
▪ Contracted customers, aggregated CLV
- SAPE = 1.3% over 13 months of out-of-time testing
- Used 25 weeks for bias correction (simple residual analysis)
▪ Contracted customers, individual CLV
- MAE = 22755 units (~400/week), median AE = 6678 units (~120/week) at end of testing
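The error metrics quoted on the three performance slides could be computed along these lines. The deck does not define SAPE, so the formula below is an assumption: the absolute error of the aggregate divided by the average of the predicted and actual totals. The function names and the example numbers are hypothetical.

```python
from statistics import mean, median


def sape(predicted_total, actual_total):
    """Symmetric absolute percentage error of an aggregate, in percent.

    Assumed definition: |pred - actual| / ((|pred| + |actual|) / 2).
    """
    denominator = (abs(predicted_total) + abs(actual_total)) / 2.0
    return 100.0 * abs(predicted_total - actual_total) / denominator


def individual_errors(predicted, actual):
    """Return (MAE, median absolute error) over per-customer predictions."""
    abs_errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return mean(abs_errors), median(abs_errors)


# Made-up per-customer cumulative usage at the end of the test window.
predicted = [120.0, 900.0, 40.0, 3100.0]
actual = [100.0, 1000.0, 0.0, 3000.0]
print(sape(sum(predicted), sum(actual)))
print(individual_errors(predicted, actual))
```

Aggregate errors can be small even when individual errors are large, because over- and under-predictions cancel in the total; this is why the slides report both levels.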
Fader BG/BB Model Iso-surfaces
▪ Key observations:
- The higher the surface, the lower the odds of reaching it
- As frequency increases, the odds initially decrease, then remain constant
- Especially in the high-recency cases
- As recency increases, the odds initially decrease, then form a U-shape
- Majority of customers have recency
- Most high-value customers have recency = 0 and/or low frequency
Post model processing
▪ Bias correction (included)
- Residual analysis with a few features
▪ Ensemble learners (included)
▪ Convert usage to revenue
▪ Apply discounting (included)
▪ Apply gross margin
▪ Aggregate to higher levels
- Offer types, cohorts, acquisition channels, industries, regions, etc.
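A minimal sketch of three of the steps above (convert usage to revenue, apply discounting, apply gross margin) for one customer's projected weekly usage. All parameters are hypothetical placeholders: the deck gives no unit price, discount rate or margin, and the choice of weekly compounding derived from an annual rate is an assumption.

```python
def discounted_margin_value(weekly_usage, price_per_unit, annual_discount_rate, gross_margin):
    """Present value of projected usage after pricing, discounting and margin."""
    # Convert the annual rate to an equivalent weekly rate (52 weeks per year assumed).
    weekly_rate = (1.0 + annual_discount_rate) ** (1.0 / 52.0) - 1.0
    value = 0.0
    for week, units in enumerate(weekly_usage, start=1):
        revenue = units * price_per_unit                   # usage -> revenue
        discounted = revenue / (1.0 + weekly_rate) ** week  # discounting
        value += gross_margin * discounted                  # gross margin
    return value


# Hypothetical projection: 100 units per week for one year.
projection = [100.0] * 52
print(discounted_margin_value(projection, price_per_unit=1.0,
                              annual_discount_rate=0.10, gross_margin=0.6))
```

Summing the resulting per-customer values by offer type, cohort or region gives the higher-level aggregates listed in the last bullet.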
Future work
▪ The work is being patented
▪ More model validation and experimentation
- Consolidate with other Customer Lifetime Value predictions in Microsoft
▪ Dynamic/interactive Customer Lifetime Value models
- Markov process, multi-world testing, reinforcement learning, etc.
▪ Product level Customer Lifetime Value models
Appendix 1. More Model Performance
Appendix 2. Fader Model Derivation 1
▪ Given the observations, what is the likelihood function of p and θ?
▪ And the likelihood function of the beta parameters?
▪ What is the conditional E(X), given the history and inferred parameters?
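For reference, the standard BG/BB answers to these three questions can be written as follows. The notation is an assumption, not taken from the slide: x purchases in n opportunities, last purchase at opportunity t_x, p ~ Beta(α, β), θ ~ Beta(γ, δ), B(·,·) the beta function, and n* future purchase opportunities. The γ here belongs to the beta prior on θ and is unrelated to the γ of the gamma-gamma model in the next slide.

```latex
% Individual-level likelihood of p and theta
L(p, \theta \mid x, t_x, n)
  = p^{x}(1-p)^{n-x}(1-\theta)^{n}
  + \sum_{i=0}^{n-t_x-1} p^{x}(1-p)^{t_x-x+i}\,\theta\,(1-\theta)^{t_x+i}

% Likelihood of the beta parameters (p and theta integrated out)
L(\alpha, \beta, \gamma, \delta \mid x, t_x, n)
  = \frac{B(\alpha+x,\ \beta+n-x)}{B(\alpha,\beta)}\,
    \frac{B(\gamma,\ \delta+n)}{B(\gamma,\delta)}
  + \sum_{i=0}^{n-t_x-1}
    \frac{B(\alpha+x,\ \beta+t_x-x+i)}{B(\alpha,\beta)}\,
    \frac{B(\gamma+1,\ \delta+t_x+i)}{B(\gamma,\delta)}

% Conditional expected number of purchases in the next n^* opportunities
E\bigl[X(n, n+n^{*}) \mid x, t_x, n\bigr]
  = \frac{1}{L(\alpha, \beta, \gamma, \delta \mid x, t_x, n)}\,
    \frac{B(\alpha+x+1,\ \beta+n-x)}{B(\alpha,\beta)}\,
    \frac{\delta}{\gamma-1}\,
    \frac{\Gamma(\gamma+\delta)}{\Gamma(1+\delta)}
    \left[
      \frac{\Gamma(1+\delta+n)}{\Gamma(\gamma+\delta+n)}
      - \frac{\Gamma(1+\delta+n+n^{*})}{\Gamma(\gamma+\delta+n+n^{*})}
    \right]
```

The first term of each likelihood covers the case where the customer is still alive after n opportunities; the sum covers churn at each possible point after the last observed purchase.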
Appendix 2. Fader Model Derivation 2
▪ Given the observations, what is the likelihood function of p and v (individual gamma)?
▪ And the likelihood function of p, q, γ (gamma-gamma)?
▪ Given the gamma-gamma parameters and the observations, what is the conditional E(M)?
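For reference, the standard gamma-gamma results for these three questions are sketched below. The notation is partly assumed: p is the common gamma shape (written ρ in Model Description 2, not the purchase probability), v is the individual-level parameter treated here as a rate so that the individual mean is p / v, v ~ Gamma(q, γ), and m_x is the customer's observed mean amount over x purchases.

```latex
% Individual-level likelihood of p and v, given the observed mean m_x of x purchases
f(m_x \mid p, v, x)
  = \frac{(v x)^{p x}\, m_x^{\,p x - 1}\, e^{-v x m_x}}{\Gamma(p x)}

% Likelihood of p, q, gamma (v integrated out)
f(m_x \mid p, q, \gamma, x)
  = \frac{\Gamma(p x + q)}{\Gamma(p x)\,\Gamma(q)}\,
    \frac{\gamma^{q}\, m_x^{\,p x - 1}\, x^{p x}}{(\gamma + m_x x)^{p x + q}}

% Conditional expected mean purchase amount
E(M \mid p, q, \gamma, m_x, x)
  = \frac{p\,(\gamma + m_x x)}{p x + q - 1}
  = \frac{q-1}{p x + q - 1}\cdot\frac{p \gamma}{q-1}
  + \frac{p x}{p x + q - 1}\cdot m_x
```

The last line shows E(M) as a weighted average of the population mean pγ/(q−1) and the customer's own observed mean m_x, with the weight on m_x growing with the number of purchases x.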
Thanks! Questions?
