Customer Lifetime Value for Azure
▪ Technical objectives:
- How to choose the right model based on business and data
- How to make Customer Lifetime Value predictions using the Fader model
- How to apply the Bayesian approach to real life problems
Sample Azure Customers
Terminology
• Recency = time since last purchase
• Frequency = number of purchases in the past
• Monetary = mean amount of purchases in the past
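The three terms above can be computed from a raw purchase log. The following is a minimal sketch, assuming each purchase is a (date, amount) pair and that recency is measured in days up to a chosen reference date; the function name `rfm` and its arguments are hypothetical and not taken from the original analysis.

```python
from datetime import date


def rfm(purchases, as_of):
    """Return (recency_days, frequency, monetary) for one customer.

    purchases: list of (purchase_date, amount) tuples (hypothetical layout).
    as_of:     reference date from which recency is measured.
    """
    if not purchases:
        raise ValueError("customer has no purchases")
    last_purchase = max(d for d, _ in purchases)
    recency_days = (as_of - last_purchase).days      # time since last purchase
    frequency = len(purchases)                       # number of past purchases
    monetary = sum(a for _, a in purchases) / frequency  # mean purchase amount
    return recency_days, frequency, monetary


if __name__ == "__main__":
    history = [(date(2024, 1, 1), 10.0), (date(2024, 1, 15), 30.0)]
    print(rfm(history, as_of=date(2024, 1, 31)))
```

Note that the published BG/BB notation counts recency as the period of the last purchase rather than the time elapsed since it, so a conversion may be needed before fitting.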
Design Requirements
▪ Sampling bias
- Training and testing data have different distributions
Fader BG/BB model performance 1
▪ Non-contracted customers, aggregated CLV
- Symmetric absolute percentage error (SAPE) = 2.4% over 18 months of out-of-time testing
- Used 5 weeks for bias correction (simple residual analysis)
▪ Non-contracted customers, individual CLV
- Mean absolute error (MAE) = 1600 units (~20/week), median AE = 430 units (~5/week) at end of testing
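The error measures quoted on these performance slides can be sketched as follows. The slides do not give the exact SAPE formula, so the version here (absolute difference divided by the mean of the absolute actual and predicted totals) is an assumption; the function names are hypothetical.

```python
from statistics import mean, median


def sape(actual_total, predicted_total):
    """Symmetric absolute percentage error on aggregated totals.

    Assumed definition: |pred - actual| / ((|actual| + |pred|) / 2).
    """
    denom = (abs(actual_total) + abs(predicted_total)) / 2
    return abs(predicted_total - actual_total) / denom if denom else 0.0


def individual_errors(actual, predicted):
    """Return (MAE, median absolute error) over per-customer values."""
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    return mean(abs_err), median(abs_err)


if __name__ == "__main__":
    print(sape(100.0, 102.0))
    print(individual_errors([10, 20, 30], [12, 20, 60]))
```

A large gap between MAE and median AE, as reported on the slides, indicates that a small number of customers with very large errors dominate the mean.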
Fader BG/BB model performance 2
▪ Contracted customers, aggregated CLV
- Symmetric absolute percentage error (SAPE) = 38% over 18 months of out-of-time testing
- Violation of a key assumption: the model assumes non-contracted customers
▪ Contracted customers, individual CLV
- MAE = 22380 units (~250/week), median AE = 5880 units (~70/week) at end of testing
Fader BG/BB model performance 3
▪ Contracted customers, aggregated CLV
- SAPE = 1.3% over 13 months of out-of-time testing
- Used 25 weeks for bias correction (simple residual analysis)
▪ Contracted customers, individual CLV
- MAE = 22755 units (~400/week), median AE = 6678 units (~120/week) at end of testing
Fader BG/BB Model Iso-surfaces
▪ Key observations:
- The higher the surface, the lower the odds of reaching it
- As frequency increases, the odds initially decrease, then remain constant
- Especially in the high-recency cases
- As recency increases, the odds initially decrease, then form a U-shape
- Majority of customers have recency
- Most high-value customers have recency = 0 and/or low frequency
Post model processing
▪ Bias correction (included)
- Residual analysis with few features
▪ Ensemble learners (included)
▪ Convert usage to revenue
▪ Apply discounting (included)
▪ Apply gross margin
▪ Aggregate to higher levels
- Offer types, cohorts, acquisition channels, industries, regions, etc.
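The conversion, discounting, margin and aggregation steps above can be sketched as one small pipeline. This is an illustration only: the weekly granularity, the flat unit price, the single gross-margin factor and all function and parameter names are assumptions, not details from the original work.

```python
from collections import defaultdict


def discounted_margin(weekly_usage, unit_price, gross_margin, annual_discount_rate):
    """Convert predicted weekly usage into discounted gross margin.

    weekly_usage: predicted usage units for weeks 1, 2, ... (hypothetical input).
    """
    # Convert the annual rate to an equivalent weekly rate.
    weekly_rate = (1.0 + annual_discount_rate) ** (1.0 / 52.0) - 1.0
    return sum(
        usage * unit_price * gross_margin / (1.0 + weekly_rate) ** (week + 1)
        for week, usage in enumerate(weekly_usage)
    )


def aggregate_by(customers, key):
    """Sum per-customer CLV up to a higher level such as region or cohort."""
    totals = defaultdict(float)
    for customer in customers:
        totals[customer[key]] += customer["clv"]
    return dict(totals)


if __name__ == "__main__":
    clv = discounted_margin([10, 10], unit_price=2.0, gross_margin=0.5,
                            annual_discount_rate=0.10)
    print(clv)
    print(aggregate_by([{"region": "EU", "clv": 5.0},
                        {"region": "EU", "clv": 7.0},
                        {"region": "US", "clv": 3.0}], key="region"))
```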
Future work
▪ The work is being patented
▪ More model validation and experimentation
- Consolidate with other Customer Lifetime Value predictions in Microsoft
▪ Dynamic/interactive Customer Lifetime Value models
- Markov process, multi-world testing, reinforcement learning, etc.
▪ Product level Customer Lifetime Value models
Appendix 1. More Model Performance
Appendix 2. Fader Model Derivation 1
▪ Given the observations, what is the likelihood function of p and θ?
▪ And the likelihood function of the beta parameters?
▪ What is the conditional E(X), given the history and inferred parameters?
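The equations for this slide did not survive extraction. As a hedged sketch, the code below follows the BG/BB expressions as published by Fader, Hardie and Shang (2010), which may differ in notation from the slide: a customer with purchase probability p and dropout probability θ, beta priors with parameters (α, β) and (γ, δ), x purchases in n opportunities, and the last purchase at opportunity t_x. Here t_x is the period of the last purchase, so it equals n minus the recency defined in the Terminology slide. The function names are hypothetical.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bgbb_likelihood(alpha, beta, gamma, delta, x, t_x, n):
    """Likelihood of the beta parameters, with p and theta integrated out."""
    denom = log_beta(alpha, beta) + log_beta(gamma, delta)
    # Term 1: the customer stayed alive through all n opportunities.
    total = exp(log_beta(alpha + x, beta + n - x)
                + log_beta(gamma, delta + n) - denom)
    # Term 2: the customer dropped out right after opportunity t_x + i.
    for i in range(n - t_x):
        total += exp(log_beta(alpha + x, beta + t_x - x + i)
                     + log_beta(gamma + 1, delta + t_x + i) - denom)
    return total


def bgbb_expected_purchases(alpha, beta, gamma, delta, x, t_x, n, n_star):
    """Conditional E(X) over the next n_star opportunities (gamma != 1)."""
    like = bgbb_likelihood(alpha, beta, gamma, delta, x, t_x, n)
    lead = exp(log_beta(alpha + x + 1, beta + n - x) - log_beta(alpha, beta)
               + lgamma(gamma + delta) - lgamma(1 + delta))
    bracket = (exp(lgamma(1 + delta + n) - lgamma(gamma + delta + n))
               - exp(lgamma(1 + delta + n + n_star)
                     - lgamma(gamma + delta + n + n_star)))
    return lead * delta / (gamma - 1) * bracket / like


if __name__ == "__main__":
    params = dict(alpha=1.2, beta=0.75, gamma=0.66, delta=2.78)  # illustrative values
    print(bgbb_likelihood(x=2, t_x=4, n=6, **params))
    print(bgbb_expected_purchases(x=2, t_x=4, n=6, n_star=5, **params))
```

In practice the parameters would be estimated by maximising the sum of log-likelihoods over all customers; the values in the usage example are illustrative only.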
Appendix 2. Fader Model Derivation 2
▪ Given the observations, what is the likelihood function of p and ν (individual gamma)?
▪ And the likelihood function of p, q, γ (gamma-gamma)?
▪ Given the gamma-gamma parameters and the observations, what is the conditional E(M)?
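The slide's equations are missing from the extracted text. As a sketch, the conditional expectation below follows the gamma-gamma spend model as published by Fader and Hardie, which may differ in notation from the slide: individual purchase amounts are gamma(p, ν) and ν is gamma(q, γ) across customers, so the estimate for a customer with x purchases averaging m_x is a weighted blend of the population mean and the customer's own mean. The function name is hypothetical, and the argument `scale` stands for the slide's γ.

```python
def gamma_gamma_expected_value(p, q, scale, x, m_x):
    """Conditional E(M) given x purchases with observed mean amount m_x.

    Equivalent to p * (scale + m_x * x) / (p * x + q - 1); requires q > 1.
    """
    if q <= 1:
        raise ValueError("the population mean is undefined for q <= 1")
    population_mean = scale * p / (q - 1)
    # The weight on the population mean shrinks as more purchases are observed.
    weight = (q - 1) / (p * x + q - 1)
    return weight * population_mean + (1 - weight) * m_x


if __name__ == "__main__":
    # Illustrative parameter values, not fitted to any real data.
    print(gamma_gamma_expected_value(p=6.0, q=4.0, scale=15.0, x=10, m_x=50.0))
```

With no purchases observed the estimate falls back to the population mean, and with many purchases it approaches the customer's own average.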
Thanks! Questions?