
BT4211

Data-Driven Marketing
Customer: Purchase Choice, Quantity, Duration

March 7, 2018

Purchase Decisions & Models

▪ Purchase choice
– Will the customer buy or churn?
– Which brand, product, or service will the customer buy?
• Binary logit (logistic regression) model, multinomial logit model
▪ Purchase quantity
– How much or how many units will the customer buy?
• Count data model (Poisson, negative binomial)
▪ Duration: inter-purchase, customer lifetime
– How soon will the customer make another purchase?
– How long will the customer stay on with the firm?
• Linear regression model, hazard model

Binary Response Models

▪ Linear probability model

– Link function: identity, $p_i = P(y_i = 1 \mid x_i) = x_i'\beta$

– Problems for binary responses
• Error term violates the homoscedasticity assumption of the classical linear regression model, since its variance $p_i(1 - p_i)$ depends on $x_i$
– Heteroscedasticity, if not corrected for, can increase prediction error
• Predicted probability is not bounded between 0 and 1
– Predictions outside [0, 1] cannot be interpreted as probabilities

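The boundedness problem can be illustrated with a minimal sketch. The spend and purchase data are hypothetical and the helper `ols_fit` is an illustrative name, not part of the course material; the point is only that a least-squares line fitted to a 0/1 outcome produces "probabilities" below 0 and above 1 once x moves far enough.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: spend (in $100) and whether the customer bought (1) or not (0).
spend = [1, 2, 3, 4, 5, 6, 7, 8]
bought = [0, 0, 0, 1, 0, 1, 1, 1]

intercept, slope = ols_fit(spend, bought)

# Linear probability model predictions at low, typical, and high spend levels.
lpm_predictions = {s: intercept + slope * s for s in (0, 4, 12)}
for s, p in lpm_predictions.items():
    print(f"spend={s:>2}  predicted P(buy)={p:.3f}")
```

The prediction at the typical spend level is a valid probability, while the two extreme ones are not, which is the motivation for the logit and probit links.
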
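Binary Response Models

▪ Binary logit (logistic regression) model

– Link function: logit, $\ln\frac{p_i}{1 - p_i} = x_i'\beta$, so $p_i = P(y_i = 1 \mid x_i) = \frac{e^{x_i'\beta}}{1 + e^{x_i'\beta}}$

– Estimation method
• Maximum likelihood estimation

– Interpretation
• Odds ratio: $e^{\beta_k}$
• Odds ratio per standard deviation change in x: $e^{\beta_k s_k}$, where $s_k$ is the standard deviation of $x_k$
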
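A short sketch of the odds-ratio interpretation under the logit link. The coefficient and standard deviation are made-up values for illustration; the check at the end confirms that a one-unit increase in x multiplies the odds p/(1-p) by exp(beta).

```python
import math


def logit_probability(xb):
    """P(y = 1 | x) under the logit link, given the linear index x'beta."""
    return 1.0 / (1.0 + math.exp(-xb))


# Hypothetical values, for illustration only.
beta_price = -0.8   # assumed logit coefficient on price
sd_price = 0.5      # assumed sample standard deviation of price

odds_ratio = math.exp(beta_price)                    # per one-unit increase in price
odds_ratio_per_sd = math.exp(beta_price * sd_price)  # per one-SD increase in price

# Verify against the probabilities: move the index from 1.0 to 1.0 + beta_price,
# i.e. raise price by one unit, and compare the odds before and after.
p0 = logit_probability(1.0)
p1 = logit_probability(1.0 + beta_price)
odds_change = (p1 / (1 - p1)) / (p0 / (1 - p0))

print(f"odds ratio: {odds_ratio:.3f}, per SD: {odds_ratio_per_sd:.3f}")
```
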
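Binary Response Models

▪ Binary probit model

– Link function: probit, $\Phi^{-1}(p_i) = x_i'\beta$, so $p_i = P(y_i = 1 \mid x_i) = \Phi(x_i'\beta)$, where $\Phi$ is the standard normal CDF

– Estimation method
• Maximum likelihood estimation
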
Binary Response Models

▪ Logistic regression with rare events data

– Problems
• Response rates below 1% are not unusual
• Binary logit and probit models can underestimate the response probability in such cases, so predicted response probabilities fall below the actual likelihood of response
– Solutions
• Choice-based sampling, with estimates adjusted for the sampling scheme

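The slides do not say which adjustment is meant. One common choice, assumed here, is the prior correction of King and Zeng: estimate the logit on a sample that over-represents responders, then shift the intercept using the known population response rate. The function name and numbers below are illustrative.

```python
import math


def prior_corrected_intercept(intercept, sample_rate, population_rate):
    """Adjust a logit intercept estimated on a choice-based sample.

    sample_rate: share of responders in the estimation sample.
    population_rate: share of responders in the population (assumed known).
    """
    offset = math.log(((1 - population_rate) / population_rate)
                      * (sample_rate / (1 - sample_rate)))
    return intercept - offset


def logit_probability(xb):
    return 1.0 / (1.0 + math.exp(-xb))


# Hypothetical case: responders are 50% of the sample but 1% of the population.
raw_intercept = 0.0  # intercept-only logit on the balanced sample predicts 0.5
corrected = prior_corrected_intercept(raw_intercept,
                                      sample_rate=0.5,
                                      population_rate=0.01)
baseline_probability = logit_probability(corrected)
print(f"corrected baseline probability: {baseline_probability:.4f}")
```

Only the intercept changes under this correction; the slope coefficients from the choice-based sample are kept as estimated.
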
Multinomial Response Models

▪ Multinomial response model

– Specification
• Number of choice (or response) alternatives is J
• Probability of consumer i choosing alternative j: $p_{ij} = P(y_i = j) = F_j(x_i, \beta)$, $j = 1, \dots, J$

– Applications
• Brand or product choices
• Customer segment predictions

Multinomial Response Models

▪ Choice of function $F_j$ (the mapping from regressors to choice probabilities) results in different multinomial model types
• Examples: logit, probit, nested logit, ordered logit, etc.
▪ Alternative-varying regressors
• Regressors $x_{ij}$ take different values for different alternatives
• Examples: costs of transport modes, prices of brands
▪ Alternative-invariant regressors
• Regressors $x_i$ take the same values across alternatives
• Examples: socioeconomic characteristics such as income, gender

Multinomial Response Models

▪ Model evaluation and selection methods

– Range of in-sample fitted probabilities for each alternative
• The wider the range, the more discriminating the model
– Akaike and/or Bayesian Information Criterion

– Pseudo $R^2$

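A minimal sketch of the fit statistics, using the standard definitions AIC = -2 ln L + 2k and BIC = -2 ln L + k ln N. The slides do not say which pseudo R-squared is intended; McFadden's version, 1 - ln L / ln L0, is assumed here. The log-likelihood values are hypothetical.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion; smaller is better."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; penalizes parameters more as N grows."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def mcfadden_pseudo_r2(log_likelihood, null_log_likelihood):
    """1 - lnL(fitted) / lnL(intercept-only)."""
    return 1.0 - log_likelihood / null_log_likelihood


# Hypothetical fit statistics for a brand-choice model.
fitted_ll, null_ll, k, n = -420.0, -525.0, 6, 1000

print(f"AIC = {aic(fitted_ll, k):.1f}")
print(f"BIC = {bic(fitted_ll, k, n):.1f}")
print(f"McFadden pseudo R2 = {mcfadden_pseudo_r2(fitted_ll, null_ll):.3f}")
```
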
Multinomial Response Models

▪ Conditional logit model (CL)
• For alternative-varying regressors

▪ Multinomial logit model (MNL)
• For alternative-invariant regressors

▪ Mixed logit model (ML)
• For both alternative-varying and -invariant regressors

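A sketch of conditional logit choice probabilities, assuming the usual form in which each alternative's index is x_ij'beta with a common beta and the probability is exp(index) divided by the sum over alternatives. The brands, attributes, and coefficients are hypothetical.

```python
import math


def conditional_logit_probabilities(attributes, beta):
    """Choice probabilities when regressors vary by alternative and beta is common."""
    utilities = [sum(b * a for b, a in zip(beta, alt)) for alt in attributes]
    max_u = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(u - max_u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical choice set: three brands described by (price, on_promotion).
brands = [(2.0, 0), (2.5, 1), (3.0, 0)]
beta = [-1.2, 0.9]  # assumed coefficients on price and promotion

probs = conditional_logit_probabilities(brands, beta)
for brand, p in zip(("A", "B", "C"), probs):
    print(f"brand {brand}: {p:.3f}")
```

A multinomial logit would instead hold the regressors fixed across alternatives (income, gender) and let the coefficient vector differ by alternative.
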
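Multinomial Response Models

▪ Example: marginal effects

– Conditional logit model (CL): $\partial p_{ij} / \partial x_{ik} = p_{ij}(\delta_{ijk} - p_{ik})\beta$, where $\delta_{ijk} = 1$ if $j = k$ and 0 otherwise
– Multinomial logit model (MNL): $\partial p_{ij} / \partial x_i = p_{ij}(\beta_j - \bar{\beta}_i)$, where $\bar{\beta}_i = \sum_l p_{il}\beta_l$
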
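The multinomial logit marginal effect can be checked numerically. This sketch assumes the standard result that the effect of x on p_j is p_j times (beta_j minus the probability-weighted average of all beta_l), and compares it with a finite-difference derivative. A single scalar regressor and the coefficient values are assumptions made for brevity.

```python
import math


def mnl_probabilities(x, betas):
    """Multinomial logit probabilities for a scalar regressor x."""
    utilities = [b * x for b in betas]
    max_u = max(utilities)
    weights = [math.exp(u - max_u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def mnl_marginal_effects(x, betas):
    """Analytic dp_j/dx = p_j * (beta_j - sum_l p_l * beta_l)."""
    probs = mnl_probabilities(x, betas)
    beta_bar = sum(p * b for p, b in zip(probs, betas))
    return [p * (b - beta_bar) for p, b in zip(probs, betas)]


betas = [0.0, 0.5, -0.3]  # hypothetical; base alternative normalized to 0
x = 1.2

analytic = mnl_marginal_effects(x, betas)

# Central finite difference as an independent check.
h = 1e-6
upper = mnl_probabilities(x + h, betas)
lower = mnl_probabilities(x - h, betas)
numeric = [(u - l) / (2 * h) for u, l in zip(upper, lower)]
```

Because the probabilities sum to one, the marginal effects across alternatives sum to zero, and the sign of an effect need not match the sign of its own coefficient.
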
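Marginal Effects of Regressors

▪ Marginal effects of regressors:

– Change in conditional mean of y when regressors x change by one unit
– Linear regression: $E[y \mid x] = x'\beta$, so $\partial E[y \mid x] / \partial x = \beta$

– Non-linear regression: the effect varies with x; for example, $E[y \mid x] = \exp(x'\beta)$ gives $\partial E[y \mid x] / \partial x = \exp(x'\beta)\beta$

– General regression function: $E[y \mid x] = m(x)$, with marginal effect $\partial m(x) / \partial x$
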
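Marginal Effects of Regressors

▪ Marginal effects of regressors

– Calculus method: evaluate $\partial E[y \mid x] / \partial x$, summarized by one of 3 measures
• Average marginal effect over the sample
• Marginal effect at the sample mean of x
• Marginal effect at a representative value of x

– All 3 measures are the same for linear models

– All 3 measures differ for non-linear models
• Care must be taken in interpreting estimated coefficients
• R, Stata commands: margins, after model estimation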
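
A sketch of why the summary measures differ in a non-linear model, taking the three measures to be the average marginal effect, the effect at the sample mean, and the effect at a representative value (a common textbook reading, assumed here). The logit coefficients and sample values are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit_marginal_effect(x, intercept, slope):
    """dP(y=1|x)/dx for a one-regressor logit: p * (1 - p) * slope."""
    p = logistic(intercept + slope * x)
    return p * (1.0 - p) * slope


intercept, slope = -1.0, 0.8      # hypothetical fitted coefficients
xs = [0.0, 0.5, 1.0, 2.0, 4.0]    # hypothetical sample of the regressor

# 1) Average of the individual marginal effects.
average_marginal_effect = sum(
    logit_marginal_effect(x, intercept, slope) for x in xs) / len(xs)

# 2) Marginal effect evaluated at the sample mean of x.
effect_at_mean = logit_marginal_effect(sum(xs) / len(xs), intercept, slope)

# 3) Marginal effect at a chosen representative value (here x = 3.0).
effect_at_representative = logit_marginal_effect(3.0, intercept, slope)

print(average_marginal_effect, effect_at_mean, effect_at_representative)
```

In a linear model all three would equal the slope coefficient, which is why the distinction only matters for non-linear models.
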
Count Data Models

▪ Overview of count data

– Discrete data with ordered metric (0, 1, 2, 3, …)
• Examples
– Number of beers a consumer drinks in a week
– Number of mail orders a customer makes in a year
– Number of complaints a customer makes in a month
– Alternative modeling methods
• Multinomial logit model
– Inappropriate since it ignores the ordering of the dependent variable
• Linear regression model
– Inappropriate since it assumes normally distributed error terms and a continuous dependent variable

Count Data Models

▪ Poisson regression model

– Specification: $P(Y_i = y_i \mid x_i) = \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}$, with $\mu_i = \exp(x_i'\beta)$

– Estimation method
• Maximum likelihood estimation

– Limitations
• Distribution is parameterized in terms of a single scalar parameter
• Excess zeros problem
– More zeros in data than Poisson model predicts
• Over-dispersion in data
– Variance exceeds mean, but Poisson model implies equality of variance and mean
– Poisson MLE is still consistent if the conditional mean is correctly specified
– Leads to deflated standard errors, inflated t statistics
– Over-dispersion and under-dispersion test statistics

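A sketch of the Poisson limitations on a toy sample. It first confirms numerically that the Poisson mean and variance coincide, then compares a hypothetical set of order counts against that restriction. The variance-to-mean ratio is used only as a rough diagnostic; it is not one of the formal dispersion tests.

```python
import math


def poisson_pmf(y, mu):
    """P(Y = y) for a Poisson variable with mean mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)


def poisson_mean(x, beta):
    """Conditional mean exp(x'beta) of the Poisson regression model."""
    return math.exp(sum(xk * bk for xk, bk in zip(x, beta)))


# Model side: mean and variance implied by the Poisson distribution.
mu = poisson_mean([1.0, 2.0], [0.1, 0.3])  # hypothetical x and beta
support = range(60)                        # tail beyond 60 is negligible here
model_mean = sum(y * poisson_pmf(y, mu) for y in support)
model_variance = sum((y - model_mean) ** 2 * poisson_pmf(y, mu) for y in support)

# Data side: hypothetical yearly mail-order counts for 12 customers.
orders = [0, 0, 0, 0, 1, 0, 2, 0, 7, 0, 0, 5]
sample_mean = sum(orders) / len(orders)
sample_variance = sum((y - sample_mean) ** 2 for y in orders) / (len(orders) - 1)

dispersion_ratio = sample_variance / sample_mean  # about 1 under Poisson
share_zeros = orders.count(0) / len(orders)
predicted_zeros = poisson_pmf(0, sample_mean)     # zeros a Poisson fit would expect

print(f"variance/mean = {dispersion_ratio:.2f}")
print(f"zeros observed = {share_zeros:.2f}, Poisson predicts = {predicted_zeros:.2f}")
```
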
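Count Data Models

▪ Negative binomial regression model

– Specification
• Conditional distribution of $Y_i$ given $u_i$: Poisson with mean $\mu_i u_i$, where $\mu_i = \exp(x_i'\beta)$ and $u_i$ captures unobserved heterogeneity
• Unconditional distribution of $Y_i$: obtained by integrating $u_i$ out of the conditional distribution
• With $u_i$ assumed to be from a Gamma distribution (mean 1, variance $\alpha$), the unconditional distribution of $Y_i$ is negative binomial
• Mean: $E[Y_i \mid x_i] = \mu_i$
• Variance: $Var[Y_i \mid x_i] = \mu_i + \alpha\mu_i^2$
– Estimation method
• Maximum likelihood estimation
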
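A sketch of the negative binomial distribution in the NB2 parameterization, assumed here: Gamma heterogeneity with mean 1 and variance alpha, giving mean mu and variance mu + alpha * mu^2. The slides may use a different parameterization. The code checks the two moments numerically for hypothetical values of mu and alpha.

```python
import math


def negbin_pmf(y, mu, alpha):
    """NB2 probability of count y with mean mu and dispersion alpha > 0."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu))
             + y * math.log(mu / (r + mu)))
    return math.exp(log_p)


mu, alpha = 2.0, 0.5   # hypothetical mean and dispersion
support = range(400)   # tail beyond 400 is negligible for these values

nb_total = sum(negbin_pmf(y, mu, alpha) for y in support)
nb_mean = sum(y * negbin_pmf(y, mu, alpha) for y in support)
nb_variance = sum((y - nb_mean) ** 2 * negbin_pmf(y, mu, alpha) for y in support)

print(f"mean = {nb_mean:.4f}, variance = {nb_variance:.4f}")
print(f"mu + alpha * mu^2 = {mu + alpha * mu ** 2:.4f}")
```

As alpha approaches 0 the variance approaches the mean, so the Poisson model is the limiting case.
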
Duration Models

▪ Overview of duration data

– Continuous or discrete time duration variable
• Example questions addressed by duration models
– What is the probability that a customer of a telecommunications company will remain a customer after a year?
– What is the attrition probability of each customer in a month?
– Do attrition probabilities differ with the customer’s demographic characteristics?
– What is the expected duration of a customer’s relationship with the firm?

Duration Models

▪ Overview of duration data
– Censoring
• Buyer 1: complete information
• Buyer 2: left-censored
• Buyer 3: right-censored
• Buyer 4: left-and-right censored
• Buyer 5: interval-censored

[Figure: timelines of Buyers 1 to 5 relative to the observation window from t0 to tN]

Duration Models

▪ Linear regression model

– Method
• Simplest model to explain the relation between customer duration and other explanatory variables
• Focuses only on the sample of prior customers (with full lifespan observed) and omits right-censored observations, i.e., current customers
– Limitations
• Potential censoring bias, since the data sample does not include all customers, only prior ones with full lifespan observed
– Especially problematic when the number of complete observations is small relative to the number of incomplete observations (i.e., current customers)
• Limited in helping to manage customer relationships
– Does not address the probability of attrition during specified time periods

Duration Models

▪ Hazard model
– Objective
• Models length of time spent in a given state before transition to another state
– Duration from being an active customer to a churned one
– Duration between two consecutive purchases
– Basic concepts
• Cumulative distribution function: $F(t) = P(T \le t)$
• Survivor function: $S(t) = P(T > t) = 1 - F(t)$
– Probability that the length of duration is at least t
• Hazard rate function: $\lambda(t) = f(t) / S(t)$, where $f(t) = dF(t)/dt$
– Instantaneous probability of leaving a state conditional on survival to time t
• Cumulative hazard rate function: $\Lambda(t) = \int_0^t \lambda(s)\,ds = -\ln S(t)$

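A numerical sketch of how the basic concepts fit together, using the standard identities hazard = density / survivor and cumulative hazard = -ln(survivor). The survivor function below is a hypothetical choice made only for illustration; the density is obtained by finite differences and the cumulative hazard by the trapezoidal rule, so nothing in the check relies on a closed form.

```python
import math


def survivor(t):
    """Hypothetical survivor function S(t) = exp(-0.1 * t**2)."""
    return math.exp(-0.1 * t ** 2)


def density(t, h=1e-5):
    """f(t) = dF/dt = -dS/dt, by central finite difference."""
    return -(survivor(t + h) - survivor(t - h)) / (2 * h)


def hazard(t):
    """Hazard rate: density divided by survivor function."""
    return density(t) / survivor(t)


def cumulative_hazard(t, steps=10_000):
    """Integral of the hazard from 0 to t by the trapezoidal rule."""
    width = t / steps
    inner = sum(hazard(k * width) for k in range(1, steps))
    return width * (0.5 * hazard(0.0) + inner + 0.5 * hazard(t))


t = 3.0
print(f"hazard at t=3: {hazard(t):.4f}")
print(f"cumulative hazard: {cumulative_hazard(t):.4f}")
print(f"-ln S(t):          {-math.log(survivor(t)):.4f}")
```
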
[Figure: hazard rate function examples and plots]

Duration Models

▪ Hazard model
– Exponential distribution
• Constant hazard rate that does not vary with time
• Memoryless property
– Weibull distribution, with hazard $\lambda(t) = \gamma\alpha t^{\alpha - 1}$
• Hazard is monotonically increasing if $\alpha > 1$
• Hazard is monotonically decreasing if $\alpha < 1$
• $\gamma$ can be a function of covariates X
– Generalized Weibull distribution
• An additional shape parameter gives more flexibility
• Hazard can be monotonically decreasing, unimodal, or U-shaped, depending on the parameter values
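
A sketch of the Weibull hazard shapes, assuming the parameterization hazard = gamma * alpha * t^(alpha - 1) with survivor function exp(-gamma * t^alpha); other texts scale the parameters differently. Setting alpha = 1 recovers the exponential case, where the memoryless property can be checked directly. All parameter values are hypothetical.

```python
import math


def weibull_hazard(t, gamma, alpha):
    """Weibull hazard gamma * alpha * t**(alpha - 1)."""
    return gamma * alpha * t ** (alpha - 1)


def weibull_survivor(t, gamma, alpha):
    """Weibull survivor function exp(-gamma * t**alpha)."""
    return math.exp(-gamma * t ** alpha)


times = [0.5, 1.0, 2.0, 4.0]
increasing = [weibull_hazard(t, 0.2, 1.5) for t in times]  # alpha > 1
decreasing = [weibull_hazard(t, 0.2, 0.5) for t in times]  # alpha < 1
constant = [weibull_hazard(t, 0.2, 1.0) for t in times]    # alpha = 1, exponential

# Memoryless property of the exponential case:
# P(T > s + t | T > s) equals P(T > t).
s, t = 3.0, 2.0
conditional_survival = (weibull_survivor(s + t, 0.2, 1.0)
                        / weibull_survivor(s, 0.2, 1.0))
unconditional_survival = weibull_survivor(t, 0.2, 1.0)
```
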
Duration Models

▪ Hazard model
– Gompertz distribution, with hazard $\lambda(t) = \gamma e^{\alpha t}$
• Hazard is monotonically increasing if $\alpha > 0$
• Hazard is monotonically decreasing if $\alpha < 0$
– Log-normal distribution
• Hazard is inverted U-shaped
– Log-logistic distribution, with hazard $\lambda(t) = \frac{\alpha\gamma^{\alpha}t^{\alpha - 1}}{1 + (\gamma t)^{\alpha}}$
• Hazard is inverted U-shaped if $\alpha > 1$
– Main issues in modeling
• Dependence on correct model specification
• Proportional Hazard (PH) model
• Accelerated Failure Time (AFT) model

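Duration Models

▪ Maximum likelihood estimation
• Uncensored observations: contribute the density $f(t_i \mid x_i, \theta)$
• Censored observations: right-censored spells contribute the survivor function $S(t_i \mid x_i, \theta)$

▪ Likelihood function for ith observation: $L_i = f(t_i \mid x_i, \theta)^{\delta_i}\, S(t_i \mid x_i, \theta)^{1 - \delta_i}$, where $\delta_i = 1$ if the spell is uncensored and 0 otherwise

▪ Log-likelihood function for entire sample: $\ln L(\theta) = \sum_{i=1}^{N} \left[\delta_i \ln f(t_i \mid x_i, \theta) + (1 - \delta_i) \ln S(t_i \mid x_i, \theta)\right]$
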
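A sketch of maximum likelihood with right-censoring, assuming the usual likelihood in which uncensored spells contribute the density and censored spells the survivor function. An exponential duration model without covariates is used because its MLE has a closed form (number of observed exits divided by total time at risk). The tenure data are hypothetical.

```python
import math


def exponential_log_likelihood(rate, durations, observed):
    """Sum of delta * ln f(t) + (1 - delta) * ln S(t) for an exponential model."""
    total = 0.0
    for t, delta in zip(durations, observed):
        log_density = math.log(rate) - rate * t  # ln f(t) = ln(rate) - rate * t
        log_survivor = -rate * t                 # ln S(t) = -rate * t
        total += delta * log_density + (1 - delta) * log_survivor
    return total


# Hypothetical customer tenures in months.
# observed = 1 if churn was seen, 0 if the customer is still active (right-censored).
durations = [2.0, 5.0, 7.0, 12.0, 12.0, 12.0]
observed = [1, 1, 1, 0, 0, 0]

# Closed-form MLE that uses every customer's time at risk.
rate_hat = sum(observed) / sum(durations)

# Estimate that drops censored customers, as a regression on past customers would.
complete = [t for t, d in zip(durations, observed) if d == 1]
naive_rate = len(complete) / sum(complete)

print(f"churn rate with censoring handled: {rate_hat:.3f} per month")
print(f"churn rate ignoring current customers: {naive_rate:.3f} per month")
```

Dropping the censored customers overstates the churn rate in this toy sample, which is the censoring bias attributed to the linear regression approach.
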
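Proportional Hazard Model

▪ Conditional hazard rate: $\lambda(t \mid x) = \lambda_0(t, \alpha)\,\phi(x, \beta)$

▪ Baseline hazard: $\lambda_0(t, \alpha)$, a function of t alone
• If $\lambda_0(t, \alpha)$ is assumed to be non/semi-parametric => Cox Proportional Hazard model
▪ Scale factor: $\phi(x, \beta)$, commonly $\exp(x'\beta)$

▪ Distributional examples
• Exponential, Weibull, Gompertz distributions
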
Proportional Hazard Model

▪ Interpretation
– Hazard ratio: $e^{\beta_k}$, for scale factor $\exp(x'\beta)$
– Relative hazard rate: $100\,(e^{\beta_k} - 1)$
• Percentage change in the hazard rate for a one-unit change in the independent variable

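A sketch of the hazard-ratio interpretation, assuming the exponential scale factor exp(x'beta). The baseline hazard and the coefficient on a loyalty-program dummy are hypothetical; the check shows that the ratio of hazards between members and non-members is the same at every duration, which is what "proportional" means.

```python
import math


def baseline_hazard(t):
    """Hypothetical baseline hazard; any positive function of t would do."""
    return 0.1 * 1.5 * t ** 0.5


def proportional_hazard(t, x, beta):
    """Conditional hazard: baseline hazard times exp(x * beta)."""
    return baseline_hazard(t) * math.exp(x * beta)


beta_loyalty = -0.25  # assumed coefficient on a loyalty-program dummy

hazard_ratio = math.exp(beta_loyalty)
percent_change = 100.0 * (hazard_ratio - 1.0)

# Ratio of member to non-member churn hazard at an early and a late duration.
ratio_early = (proportional_hazard(1.0, 1, beta_loyalty)
               / proportional_hazard(1.0, 0, beta_loyalty))
ratio_late = (proportional_hazard(5.0, 1, beta_loyalty)
              / proportional_hazard(5.0, 0, beta_loyalty))

print(f"hazard ratio = {hazard_ratio:.3f}, change in hazard = {percent_change:.1f}%")
```
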
Accelerated Failure Time Model

▪ Models ln(t) rather than t: $\ln t = x'\beta + u$

▪ Conditional hazard rate: $\lambda(t \mid x) = \lambda_0(t e^{-x'\beta})\, e^{-x'\beta}$

• Acceleration of baseline hazard if $e^{-x'\beta} > 1$
• Deceleration of baseline hazard if $e^{-x'\beta} < 1$
▪ Distributional examples
• Exponential, Weibull, Log-normal, Log-logistic distributions

Stata Commands for Models

▪ Linear regression model
• reg; xtreg
▪ Binary logit, probit model
• logit; probit; xtlogit; xtprobit
▪ Conditional logit model
• clogit
▪ Multinomial logit model
• mlogit
▪ Poisson regression model
• poisson; xtpoisson
▪ Negative binomial regression model
• nbreg; xtnbreg
▪ Tobit (Type I) model
• tobit; xttobit
▪ Tobit (Type II) model
• heckman
▪ Proportional hazard model
• stcox x1 x2 …; streg x1 x2 …; xtstreg x1 x2 …
▪ Accelerated failure time model
• streg x1 x2 …, time; xtstreg x1 x2 …, time
