Predicting Customer Lifetime Value for Azure using the Fader Model

Chao Zhong (Speaker)

Feng Zhu
Shijing Fang
Val Fontama
Session Goals
▪ Business objectives:
- How to predict customer value and identify high-value customers early on, based on a few key features (Recency, Frequency, Monetary) from customer history
- Appreciate the value and flexibility of Customer Lifetime Value analysis in real-life applications
▪ Technical objectives:
- How to choose the right model based on the business and the data
- How to make Customer Lifetime Value predictions using the Fader model
- How to apply the Bayesian approach to real-life problems
Sample Azure Customers
Customer  cohort     tenure  recency  frequency  past_value  future_value  time_stamp  proj_window
1         8/30/2014  9       7        2          168.39      2,707.16      11/1/2014   1,000
2         8/30/2014  9       0        9          750.78      29,873.87     11/1/2014   1,000
3         9/27/2014  5       0        5          412.86      33,617.62     11/1/2014   1,000
4         5/24/2014  23      0        23         1,892.52    28,821.90     11/1/2014   1,000
Terminology
• Recency = time since last purchase
• Frequency = number of purchases in the past
• Monetary = mean amount of purchases in the past
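
As a minimal sketch of how these three features could be derived from a discretized usage history: the function name `rfm_features`, the input layout (one usage amount per week since first paid use) and the weekly amounts in the usage line are all hypothetical. The amounts are chosen only so that the result reproduces the tenure, recency, frequency and past_value of sample customer 1.

```python
from statistics import mean

def rfm_features(weekly_usage):
    """Derive tenure, recency, frequency, monetary and past value.

    weekly_usage: list of usage amounts, one per week since first paid use
    (hypothetical layout; a week with usage > 0 counts as one purchase).
    """
    purchases = [(week, amount) for week, amount in enumerate(weekly_usage) if amount > 0]
    tenure = len(weekly_usage)
    frequency = len(purchases)
    # Recency as defined above: weeks elapsed since the last purchase.
    last_week = purchases[-1][0] if purchases else -1
    recency = tenure - 1 - last_week
    amounts = [amount for _, amount in purchases]
    return {
        "tenure": tenure,
        "recency": recency,
        "frequency": frequency,
        "monetary": mean(amounts) if amounts else 0.0,
        "past_value": sum(amounts),
    }

# Hypothetical history: two purchases followed by seven idle weeks.
features = rfm_features([100.0, 68.39, 0, 0, 0, 0, 0, 0, 0])
print(features)
```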
Design Requirements
▪ The Azure Customer Lifetime Value model must:

- Make long-term predictions based on limited data
- Obtain high accuracy at both the individual and group levels
- Scale well to different projection windows
- Include both contracted and non-contracted customers
- Handle multiple Azure products with very different consumption patterns
▪ Our main contribution to the CLV literature:
- Extend the applicability of the Fader BG/BB model to the cloud computing setting and, more generally, to an ensemble of heterogeneous products, for customers with or without contracts
Model Assumptions
▪ We make the following assumptions:

- Stability in products and customers, but allow heterogeneity across customers
- Customers behave independently
- There are no contracts
- Bias correction is necessary for contracted customers
- Consumption process is discrete or can be discretized
- Death (churn) is permanent
Fader BG/BB Model Workflow
Fader BG/BB Model Description 1
▪ At each purchase opportunity:
- Customer churn is a Bernoulli event with probability θ
- Customer (unobserved) lifetime therefore follows a geometric distribution
- Heterogeneity in θ across customers follows a beta distribution
- Combining the above gives the beta-geometric distribution
▪ At each purchase opportunity, given the customer is alive:
- Customer purchase is a Bernoulli event with probability p
- The number of purchases over consecutive opportunities follows a binomial distribution
- Heterogeneity in p across customers follows a beta distribution
- Combining the above gives the beta-binomial distribution
▪ The goal is to infer the expected number of purchases E(X)
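
A minimal sketch of how E(X) could be computed once the model is fitted, using the closed-form BG/BB results published by Fader, Hardie and Shang. The parameter names are assumptions taken from that literature rather than from the slides: `alpha`, `beta` describe the beta distribution of p, and `gamma`, `delta` describe the beta distribution of θ. The inputs are x purchases in n opportunities, with the last purchase at opportunity t_x. Mapping the sample table onto these inputs as n = tenure, x = frequency and t_x = tenure - recency is also an assumption.

```python
from math import exp, lgamma

def betaln(a, b):
    """Log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def bgbb_likelihood(x, t_x, n, alpha, beta, gamma, delta):
    """Marginal likelihood of x purchases in n opportunities, last one at t_x."""
    norm = betaln(alpha, beta) + betaln(gamma, delta)
    # Case 1: the customer is still alive after all n opportunities.
    total = exp(betaln(alpha + x, beta + n - x) + betaln(gamma, delta + n) - norm)
    # Case 2: the customer churned right after opportunity t_x + i.
    for i in range(n - t_x):
        total += exp(
            betaln(alpha + x, beta + t_x - x + i)
            + betaln(gamma + 1, delta + t_x + i)
            - norm
        )
    return total

def bgbb_expected_purchases(x, t_x, n, n_star, alpha, beta, gamma, delta):
    """Expected purchases in the next n_star opportunities, given the history.

    This form of the expression requires gamma != 1.
    """
    log_p_term = betaln(alpha + x + 1, beta + n - x) - betaln(alpha, beta)
    log_scale = lgamma(gamma + delta) - lgamma(1 + delta)
    term_now = exp(lgamma(1 + delta + n) - lgamma(gamma + delta + n))
    term_future = exp(
        lgamma(1 + delta + n + n_star) - lgamma(gamma + delta + n + n_star)
    )
    numerator = (
        exp(log_p_term + log_scale) * delta / (gamma - 1) * (term_now - term_future)
    )
    return numerator / bgbb_likelihood(x, t_x, n, alpha, beta, gamma, delta)

# Hypothetical parameters; history shaped like sample customer 1
# (x = 2 purchases, last at t_x = 2, n = 9 opportunities, 52 weeks ahead).
print(bgbb_expected_purchases(2, 2, 9, 52, alpha=1.0, beta=1.0, gamma=2.0, delta=1.0))
```

The likelihood sums over the unobserved churn time, which is why a long idle spell after the last purchase (high recency) pulls the expected future purchases down.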
Fader BG/BB Model Description 2
▪ At each purchase opportunity, given the customer is alive and makes a purchase:
- Customer (mean) purchase amount m follows a gamma distribution
- The gamma distribution has two parameters, shape ρ and scale v
- Heterogeneity in v across customers follows another gamma distribution
- ρ is assumed to be constant across customers
- Combining the above gives the gamma-gamma distribution
▪ The goal is to infer the expected mean purchase amount E(M)
▪ The three latent variables (θ, p, m) are assumed to be independent a priori
- The joint posterior distribution shows a negative correlation between θ and p
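
A minimal sketch of the conditional E(M) under the gamma-gamma model, following the closed form in Fader and Hardie's published note on the model. The argument names are assumptions: `p` is the shape that is constant across customers (presumably the ρ above), and `q`, `gamma` are the shape and scale-type parameters of the gamma distribution that captures heterogeneity across customers. The numeric values in the usage line are hypothetical.

```python
def gamma_gamma_expected_mean(mean_amount, frequency, p, q, gamma):
    """Conditional expected mean purchase amount for one customer.

    mean_amount: observed mean amount over `frequency` purchases.
    Requires p * frequency + q > 1 for the expectation to exist.
    """
    return (gamma + mean_amount * frequency) * p / (p * frequency + q - 1)

def population_mean(p, q, gamma):
    """Expected mean amount for a customer with no purchase history (needs q > 1)."""
    return gamma * p / (q - 1)

# Hypothetical parameters: the estimate sits between the population mean
# and the customer's own observed mean, moving toward the latter as
# the number of observed purchases grows.
print(gamma_gamma_expected_mean(50.0, 4, p=2.0, q=3.0, gamma=10.0))
```

With no purchases observed (frequency = 0) the first function reduces to the population mean, so no special case is needed.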
Data Description
▪ Contracted and non-contracted customers
- 106 weeks of weekly usage data
- Subscription-based, consumption only, all services combined
- All customers started first paid use in the first 21 weeks
- 25 weeks of training data, followed by 81 weeks of testing data
▪ Sampling bias
- Training and testing data have different distributions
Fader BG/BB model performance 1
▪ Non-contracted customers aggregated CLV
- Symmetric absolute percentage error (SAPE) = 2.4% over 18 months of out-of-time testing
- Used 5 weeks for bias correction (simple residual analysis)
▪ Non-contracted customers individual CLV
- Mean absolute error (MAE) = 1600 units (~20/week), median AE = 430 units (~5/week) at end of testing
▪ Contracted customers aggregated CLV
- Symmetric absolute percentage error (SAPE) = 38% over 18 months of out-of-time testing
- Violates a key model assumption: customers are non-contracted
▪ Contracted customers individual CLV
- MAE = 22380 units (~250/week), median AE = 5880 units (~70/week) at end of testing
▪ Contracted customers aggregated CLV
- SAPE = 1.3% over 13 months of out-of-time testing
- Used 25 weeks for bias correction (simple residual analysis)
▪ Contracted customers individual CLV
- MAE = 22755 units (~400/week), median AE = 6678 units (~120/week) at end of testing
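
A minimal sketch of how the error metrics quoted above could be computed. The SAPE definition used here (absolute difference divided by the mean of the absolute actual and predicted totals) is an assumption, since the slides do not spell it out. The function names and the numbers in the usage lines are hypothetical.

```python
from statistics import mean, median

def sape(actual_total, predicted_total):
    """Symmetric absolute percentage error between two aggregate totals.

    Assumed definition: |pred - actual| / ((|actual| + |pred|) / 2).
    """
    denominator = (abs(actual_total) + abs(predicted_total)) / 2
    return abs(predicted_total - actual_total) / denominator

def individual_errors(actual, predicted):
    """Return (mean absolute error, median absolute error) across customers."""
    abs_errors = [abs(p - a) for a, p in zip(actual, predicted)]
    return mean(abs_errors), median(abs_errors)

# Hypothetical per-customer actual and predicted values.
actual = [100.0, 250.0, 40.0]
predicted = [120.0, 240.0, 70.0]
print(sape(sum(actual), sum(predicted)))
print(individual_errors(actual, predicted))
```

The aggregate metric lets individual over- and under-predictions cancel, which is one reason a low SAPE can coexist with a large individual MAE.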
Fader BG/BB Model Iso-surfaces
▪ Key observations:
- The higher the surface, the lower the odds of reaching it
- As frequency increases, the odds initially decrease, then remain constant
- Especially in the high-recency cases
- As recency increases, the odds initially decrease, then form a U-shape
- Majority of customers have recency
- Most high-value customers have recency = 0 and/or low frequency
Post model processing
▪ Bias correction (included)
- Residual analysis with few features
▪ Ensemble learners (included)
▪ Convert usage to revenue
▪ Apply discounting (included)
▪ Apply gross margin
▪ Aggregate to higher levels
- Offer types, cohorts, acquisition channels, industries, regions, etc.
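
A minimal sketch of three of the steps above (convert usage to revenue, apply discounting, apply gross margin) for a single customer. Everything here is hypothetical: the function name, the flat price per usage unit, the constant gross margin and the annual discount rate converted to a weekly rate are illustrative assumptions, not the choices made for Azure.

```python
def discounted_margin_value(weekly_usage_forecast, price_per_unit,
                            gross_margin, annual_discount_rate):
    """Turn a weekly usage forecast into a discounted gross-margin value."""
    # Convert the annual discount rate to an equivalent weekly rate.
    weekly_rate = (1 + annual_discount_rate) ** (1 / 52) - 1
    total = 0.0
    for week, usage in enumerate(weekly_usage_forecast, start=1):
        revenue = usage * price_per_unit        # convert usage to revenue
        margin = revenue * gross_margin         # apply gross margin
        total += margin / (1 + weekly_rate) ** week  # apply discounting
    return total

# Hypothetical two-week forecast.
print(discounted_margin_value([10.0, 20.0], price_per_unit=2.0,
                              gross_margin=0.5, annual_discount_rate=0.1))
```

Customer-level values computed this way can then be summed to any of the higher levels listed above.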
Future work
▪ The work is being patented
▪ More model validation and experimentation
- Consolidate with other Customer Lifetime Value predictions in Microsoft
▪ Dynamic/interactive Customer Lifetime Value models
- Markov process, multi-world testing, reinforcement learning, etc.
▪ Product level Customer Lifetime Value models
Appendix 1. More Model Performance
Appendix 2. Fader Model Derivation 1
▪ Given the observations, what is the likelihood function of p and θ?
▪ And the likelihood function of the beta parameters?
▪ What is the conditional E(X), given the history and inferred parameters?
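
As a reference point for the three questions above, the published BG/BB results (Fader, Hardie and Shang) take the following form. The notation is an assumption taken from that literature rather than from the slide: x is the number of purchases in n opportunities, t_x is the opportunity of the last purchase, n* is the projection window, (α, β) are the beta parameters for p, and (γ, δ) are the beta parameters for θ.

```latex
% Individual-level likelihood
L(p, \theta \mid x, t_x, n) = p^{x}(1-p)^{n-x}(1-\theta)^{n}
  + \sum_{i=0}^{n-t_x-1} p^{x}(1-p)^{t_x-x+i}\,\theta\,(1-\theta)^{t_x+i}

% Likelihood of the beta parameters (p and \theta integrated out)
L(\alpha,\beta,\gamma,\delta \mid x, t_x, n) =
  \frac{B(\alpha+x,\ \beta+n-x)}{B(\alpha,\beta)}\,
  \frac{B(\gamma,\ \delta+n)}{B(\gamma,\delta)}
  + \sum_{i=0}^{n-t_x-1}
  \frac{B(\alpha+x,\ \beta+t_x-x+i)}{B(\alpha,\beta)}\,
  \frac{B(\gamma+1,\ \delta+t_x+i)}{B(\gamma,\delta)}

% Conditional expected purchases over the next n^* opportunities
E\bigl(X(n, n+n^*) \mid \alpha,\beta,\gamma,\delta, x, t_x, n\bigr) =
  \frac{1}{L(\alpha,\beta,\gamma,\delta \mid x, t_x, n)}\,
  \frac{B(\alpha+x+1,\ \beta+n-x)}{B(\alpha,\beta)}\,
  \frac{\delta}{\gamma-1}\,
  \frac{\Gamma(\gamma+\delta)}{\Gamma(1+\delta)}
  \left[
    \frac{\Gamma(1+\delta+n)}{\Gamma(\gamma+\delta+n)}
    - \frac{\Gamma(1+\delta+n+n^*)}{\Gamma(\gamma+\delta+n+n^*)}
  \right]
```

The first term of each likelihood covers a customer who is still alive after n opportunities; the sum covers churn at each possible point after the last purchase.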
Appendix 2. Fader Model Derivation 2
▪ Given the observations, what is the likelihood function of p and v (individual gamma)?
▪ And the likelihood function of p, q, γ (gamma-gamma)?
▪ Given the gamma-gamma parameters and the observations, what is the conditional E(M)?
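
As a reference point for the three questions above, the published gamma-gamma results (Fader and Hardie) take the following form. The notation is an assumption taken from that literature: m_x is the observed mean amount over x purchases, p is the gamma shape shared by all customers (not the purchase probability of Derivation 1), and ν is the customer-specific parameter written as v above, which enters as a rate in this parameterization and is itself gamma distributed with parameters q and γ.

```latex
% Individual-level density of the observed mean amount
f(m_x \mid p, \nu, x) =
  \frac{(\nu x)^{p x}\, m_x^{\,p x - 1}\, e^{-\nu x m_x}}{\Gamma(p x)}

% Marginal density after integrating out \nu
f(m_x \mid p, q, \gamma, x) =
  \frac{\Gamma(p x + q)}{\Gamma(p x)\,\Gamma(q)}\,
  \frac{\gamma^{q}\, m_x^{\,p x - 1}\, x^{p x}}{(\gamma + m_x x)^{p x + q}}

% Conditional expected mean purchase amount
E(M \mid p, q, \gamma, m_x, x) =
  \frac{(\gamma + m_x x)\, p}{p x + q - 1}
  = \frac{q-1}{p x + q - 1}\cdot\frac{\gamma p}{q-1}
  + \frac{p x}{p x + q - 1}\cdot m_x
```

The last line shows E(M) as a weighted average of the population mean γp/(q-1) and the customer's own observed mean m_x.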
Thanks! Questions?
