
Received: 26 September 2015 Revised: 9 June 2017 Accepted: 31 July 2017

DOI: 10.1002/asmb.2276

RESEARCH ARTICLE

A MCMC approach for modeling customer lifetime behavior using the COM‐Poisson distribution

Mohamed Ben Mzoughia1 | Sharad Borle2 | Mohamed Limam3,4

1 LARODEC Laboratory, ISG Tunis, University of Tunis, Tunis, Tunisia
2 Jesse H. Jones Graduate School of Business, Rice University, Houston, Texas 77005, USA
3 ISG Tunis, University of Tunis, Tunis, Tunisia
4 Dhofar University, Salalah, Oman

Correspondence
Mohamed Ben Mzoughia, LARODEC Laboratory, ISG Tunis, University of Tunis, Tunis, Tunisia.
Email: mohamed.mzoughia@gmail.com

One of the major challenges associated with the measurement of customer lifetime value is selecting an appropriate
model for predicting customer future transactions. Among such models, the Pareto/negative binomial distribution
(Pareto/NBD) is the most prevalent in noncontractual relationships characterized by latent customer defections; ie,
defections are not observed by the firm when they happen. However, this model and its applications have some
shortcomings. Firstly, a methodological shortcoming is that the Pareto/NBD, like all lifetime transaction models based
on statistical distributions, assumes that the number of transactions by a customer follows a Poisson distribution.
However, many applications have an empirical distribution that does not fit a Poisson model. Secondly, a computational
concern is that the implementation of the Pareto/NBD model presents some estimation challenges specifically related to
the numerous evaluations of the Gaussian hypergeometric function. Finally, the model provides 4 parameters as output,
which is insufficient to link the individual purchasing behavior to socio‐demographic information and to predict the
behavior of new customers. In this paper, we model a customer's lifetime transactions using the
Conway‐Maxwell‐Poisson distribution, which is a generalization of the Poisson distribution, offering more flexibility
and a better fit to real‐world discrete data. To estimate parameters, we propose a Markov chain Monte Carlo algorithm,
which is easy to implement. Use of this Bayesian paradigm provides individual customer estimates, which help link
purchase behavior to socio‐demographic characteristics and an opportunity to target individual customers.

KEYWORDS
Conway‐Maxwell‐Poisson, customer lifetime transactions, customer lifetime value, MCMC,
prediction, segmentation

1 | INTRODUCTION

Customer lifetime value (CLV) is a customer‐level metric used to target profitable customers and to optimize marketing
resources.1 The concept of CLV attempts to account for the anticipated future profitability of a customer during each
time period of his/her lifetime with the firm. A major component in assessing CLV is to provide satisfactory predictions
of a customer's lifetime and the number of transactions during that lifetime. This is challenging in a “noncontractual”
setting where a firm does not observe customer defections when they happen. Examples of such settings include retailing,
catalog purchasing, and a variety of online purchase environments.

Appl Stochastic Models Bus Ind. 2018;34:113–127. wileyonlinelibrary.com/journal/asmb Copyright © 2017 John Wiley & Sons, Ltd. 113
114 MZOUGHIA ET AL.

Despite the ability of existing methods to provide a prediction of customer profitability, many suffer from limitations.
One of the most common limitations is the assumption that the number of transactions made by each customer during
his/her lifetime follows a Poisson distribution. This assumption presents a methodological shortcoming because the
Poisson distribution assumes “equi‐dispersion” (variance equal to the mean), which is often not satisfied. Indeed, many
real data sets violate this assumption, and overdispersion or underdispersion is a recurrent problem in many applications.2-5
To deal with this limitation, we propose in this paper to use the Conway‐Maxwell‐Poisson (COM‐Poisson)
distribution for the number of transactions for a customer as an alternative to the extant Poisson distribution commonly
used in CLV models. It is important to note that there have been many extensions in models developed to estimate a
customer's lifetime and the number of transactions during that lifetime. We discuss these extensions in Section 2 of
the paper. However, none of these extensions address the Poisson assumption on the number of purchases by a
customer. This is one contribution of this paper. A recent work by Borle et al6 introduced a Bayesian data augmentation
estimation scheme for lifetimes and lifetime transactions in a noncontractual setting. Their work focused on providing a
flexible estimation scheme to estimate various lifetime transaction models and their extensions. However, given the data
structure in our paper, this estimation scheme requires a Poisson assumption on the number of transactions by the
customer. The estimation scheme fails to incorporate an assumption other than the Poisson distribution.
On the other hand, we assume that the customer's unobserved “lifetime” is exponentially distributed. Most extensions
of these models, such as the BG/NBD7 and the gamma‐Gompertz/NBD,8 have changed the latent lifetime assumption
while keeping the Poisson assumption unchanged. We instead focus on relaxing the Poisson assumption, and it is with
this goal that we keep the exponential distribution for the latent lifetime.
The remainder of this paper is organized as follows: Section 2 presents the popular lifetime transaction models.
Beginning with the Pareto/negative binomial distribution (Pareto/NBD) model for lifetime purchases, the section
discusses its limitations and various extensions. Section 3 presents our proposed modeling and implementation
approach, which motivates this research. Section 4 deals with details of the proposed model and related expressions to
estimate model parameters and to predict future number of transactions. Section 5 applies the model to 2 different data
sets to evaluate model performance and to show its ability to use customer characteristics to prospect for new customers.
Finally, we conclude in Section 6.

2 | LIFETIME TRANSACTIONS MODELING

The probability models proposed to estimate lifetime transactions in noncontractual settings specify a latent lifetime
distribution for customers and a distribution for the observed number of transactions. The model likelihood function
is then derived by integrating out the unobserved (latent) lifetimes across customers (Borle et al6). Among the various
probability models proposed to estimate lifetime transactions in such noncontractual settings, the Pareto/NBD proposed
by Schmittlein et al9 and the beta‐geometric/NBD (BG/NBD) of Fader et al7 are the most widely used. A more recent
model introduced in the same tradition is the GGompertz/NBD model (Bemmaor and Glady8). The latent lifetime
distribution in these models is the Pareto, the beta‐geometric, and the gamma‐Gompertz, respectively. However, all these models assume
that the number of transactions follows a Poisson distribution, with the Poisson parameter being heterogeneously
distributed across customers per a gamma distribution, thus leading to an NBD (negative binomial) for the number of
transactions across customers. In the following, we present the Pareto/NBD model. We discuss its extensions, namely,
the BG/NBD and GGompertz‐NBD models. We also discuss the limitations of these models.

2.1 | The Pareto/NBD model


The Pareto/NBD is the most widely used model for lifetime transactions in a noncontractual setting, a setting
characterized by unobserved (by the firm) customer defections. This model provides a prediction of customer's lifetime
and the number of transactions during that lifetime and is based on the customer's historical purchase behavior. Three
past measures are required for every customer: the cohort T, which is the time from the beginning of a customer's
relationship with the firm until the current time; the frequency x, which is the number of transactions that the customer
has made until the observation time; and the recency tx, which is the time from the start of the customer relationship
until the last observed purchase time.

This model assumes that the number of transactions made by each customer follows a Poisson process with a
heterogeneous transaction rate across customers following a gamma distribution. These assumptions give rise to an
NBD (negative binomial) model for the number of transactions while the customer is “alive.” The time to “death,” or
unobserved lifetime, is assumed to follow an exponential distribution for individual customers, with the exponential
parameter distributed heterogeneously across customers per a gamma distribution, thus giving rise to a Pareto
distribution for lifetimes across customers. Thus, we get the Pareto/NBD model for number of lifetime transactions by
the customer. The model is estimated by specifying and maximizing a likelihood function, which integrates out the
unobserved lifetimes.
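
As an illustration of the first of these assumptions, the following minimal sketch checks numerically that a Poisson count whose rate is gamma distributed has an NBD marginal. The function names, the shape/rate parameterization of the gamma distribution, and the values r = 2 and α = 1.5 are assumptions made for this example only; they are not taken from the paper.

```python
import math

def nbd_pmf(x, r, alpha, t=1.0):
    """Closed-form NBD probability of x transactions in a period of length t."""
    log_p = (math.lgamma(r + x) - math.lgamma(r) - math.lgamma(x + 1)
             + r * math.log(alpha / (alpha + t))
             + x * math.log(t / (alpha + t)))
    return math.exp(log_p)

def mixed_poisson_pmf(x, r, alpha, t=1.0, steps=50_000, upper=60.0):
    """Midpoint-rule integral of Poisson(x | lam * t) against a
    Gamma(shape=r, rate=alpha) density for lam (assumed parameterization)."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        lam = (k + 0.5) * h
        log_poisson = x * math.log(lam * t) - lam * t - math.lgamma(x + 1)
        log_gamma = (r * math.log(alpha) - math.lgamma(r)
                     + (r - 1) * math.log(lam) - alpha * lam)
        total += math.exp(log_poisson + log_gamma) * h
    return total

# Illustrative values only (not estimates from the paper).
for x in range(4):
    print(x, nbd_pmf(x, r=2.0, alpha=1.5), mixed_poisson_pmf(x, r=2.0, alpha=1.5))
```

The two columns printed for each x should agree closely, which is the sense in which the gamma mixture of Poisson counts "gives rise to" the NBD.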

2.2 | Pareto/NBD extensions and limits


Although the Pareto/NBD model has been popular and in some sense a “gold standard” for models on lifetime
transactions when lifetimes are latent (unobserved), implementation of the model presents some challenges related to
the numerous evaluations of the Gaussian hypergeometric function in the model likelihood function. One notable
extension of the model aimed at simplifying estimation and implementation is the BG/NBD model proposed by Fader
et al.7 This model, while keeping the assumption on transaction rates to be still Poisson, assumes the latent lifetime
for the customer to be a geometric process with the geometric parameter being heterogeneously distributed per a beta
distribution across customers, thus leading to a beta‐geometric distribution for latent lifetimes across customers. The
discrete nature of latent lifetime distribution implies that customers are allowed defection only immediately after
purchasing (transacting) unlike the Pareto/NBD model where customers can defect at any time (Fader et al7).
A more recent extension, the GGompertz‐NBD proposed by Bemmaor and Glady,8 considers that the latent lifetime
across customers follows a gamma‐Gompertz process. There have been other extensions such as the periodic death
opportunity model of Jerath et al,10 which decouples the discrete dropout opportunities from the purchase process.
However, one notable feature of all these extensions is that they have focused on relaxing the assumption on the manner
in which customers drop out of their relationship with the company; that is, they change the latent lifetime process
without questioning the Poisson assumption used to model the number of transactions. The latter assumption
can be a limitation, as many studies have shown its constraining nature.4
Moreover, apart from model extensions, researchers have proposed various methods of estimating such models. For
example, Abe11 and Ma and Liu12 used hierarchical Bayes estimation for the Pareto/NBD model, and Singh et al13 and Borle
et al6 proposed a Bayesian data augmentation scheme to estimate a variety of lifetime transaction models.
In this paper, we propose to use the COM‐Poisson distribution in the lifetime transactions model as an alternative to
Poisson in modeling the number of transactions by an individual customer.

3 | PROPOSED MODELING AND IMPLEMENTATION APPROACH

3.1 | The modeling approach


The Poisson distribution is one of the most widely used discrete distributions. Nonetheless, it presents a significant
limitation resulting from the use of a single parameter, reducing its flexibility in several cases. Shmueli et al4 presented
2 real‐world data sets, which do not fit a Poisson distribution model, considering them as “just two examples in an ocean
of non‐Poisson data,” and demonstrating the real need for a more flexible alternative to the Poisson distribution.
The Conway‐Maxwell‐Poisson distribution, introduced by Conway and Maxwell14 and formalized as a statistical
distribution by Shmueli et al,4 is a generalized form of the Poisson distribution. It has 2 parameters and has the flexibility
of modeling a wide range of overdispersed and underdispersed data.4 The distribution nests 3 well‐known discrete
distributions, namely, the Bernoulli, the Poisson, and the geometric. Its flexibility and special properties have led to many
applications in various fields (Boatwright et al15; Sellers et al16; Kadane et al23).
The COM‐Poisson is a 2‐parameter (λ, υ) distribution. Compared with the Poisson distribution, it uses an additional
parameter (υ), offering more flexibility and a better fit to discrete data. When υ = 1, the COM‐Poisson distribution becomes
the standard Poisson distribution. For υ < 1 and υ > 1, the COM‐Poisson distribution describes overdispersion
and underdispersion, respectively, as compared with the Poisson.
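
The role of υ can be illustrated numerically. The sketch below evaluates the COM‐Poisson probabilities for a fixed λ and three values of υ and reports the variance‐to‐mean ratio. The function names, the truncation of the infinite normalizing sum at `max_count`, and the value λ = 2 are assumptions of this example.

```python
import math

def com_poisson_pmf(lam, nu, max_count=200):
    """Return [P(X=0), ..., P(X=max_count)] for a COM-Poisson(lam, nu).

    The infinite normalizing sum Z(lam, nu) is truncated at max_count,
    which is an approximation chosen for this sketch.
    """
    # log of lam^j / (j!)^nu, computed in log space for numerical stability
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1)
                 for j in range(max_count + 1)]
    shift = max(log_terms)
    weights = [math.exp(t - shift) for t in log_terms]
    total = sum(weights)
    return [w / total for w in weights]

def mean_and_variance(pmf):
    """Mean and variance of a distribution given as a list of probabilities."""
    mean = sum(j * p for j, p in enumerate(pmf))
    var = sum((j - mean) ** 2 * p for j, p in enumerate(pmf))
    return mean, var

# Illustrative values only: nu < 1, nu = 1 (Poisson), and nu > 1.
for nu in (0.5, 1.0, 2.0):
    m, v = mean_and_variance(com_poisson_pmf(lam=2.0, nu=nu))
    print(f"nu={nu}: mean={m:.3f}, variance={v:.3f}, variance/mean={v / m:.3f}")
```

A ratio above 1 indicates overdispersion and a ratio below 1 indicates underdispersion; at υ = 1 the ratio equals 1, as for the Poisson.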
Zhu et al17 explore the COM‐Poisson distribution as a base for developing a COM‐Poisson process, which represents
a flexible stochastic process generalizing the Poisson process allowing for overdispersion and underdispersion in
counting events. They present a homogeneous COM‐Poisson process, in which the number of events occurring during a
unit interval of time follows the COM‐Poisson distribution with parameters λ and υ and the process has
independent increments. The homogeneous COM‐Poisson process has a rate parameter λ and dispersion parameter υ,
and the number of events occurring in the time interval (t, t + τ] follows a sum of COM‐Poisson distributions
(sCOM‐Poisson [λ, υ, τ] distribution) independent of t.17 In the light of the work by Zhu et al,17 our application can be
considered a homogeneous COM‐Poisson process modeling the number of purchases that occur in each equal period of time.

3.2 | Proposed implementation approach


After introducing the model in Section 4, we use Markov chain Monte Carlo (MCMC) techniques, namely, the
Metropolis‐Hastings sampling algorithm, to estimate the model parameters.18 Using this algorithm, we set up an
MCMC chain that draws samples from the joint posterior function. The algorithm (using a proposal distribution) proposes
a possible new state in the Markov chain based on the previous state. The posterior function is evaluated at the new state
to accept or reject the proposed state. The merits of this method are its ease of estimation and the additional
individual‐specific information it provides from the data to support marketing decisions.

4 | MODEL DEVELOPMENT

4.1 | Model assumptions


The proposed model is based on the following assumptions:

1. The number of transactions made by a customer during each unit time period, say $(t_{m-1}, t_m]$, follows a COM‐Poisson
distribution with parameters υ and λ. Therefore, the probability of observing $x_m$ transactions in the unit time period
$(t_{m-1}, t_m]$ is given by

\[
P(X = x_m \mid \lambda, \upsilon) = \frac{\lambda^{x_m}}{(x_m!)^{\upsilon}\, Z(\lambda, \upsilon)},
\quad \text{where } Z(\lambda, \upsilon) = \sum_{j=0}^{\infty} \frac{\lambda^{j}}{(j!)^{\upsilon}}. \tag{1}
\]

Further, the number of transactions $x_m$ in the unit time period $(t_{m-1}, t_m]$ is independent of the number of
transactions $x_{m-1}$ in the unit time period $(t_{m-2}, t_{m-1}]$.

2. Parameter υ is fixed across customers.


3. A customer's unobserved lifetime of length τ is exponentially distributed with dropout rate μ:

\[
f(\tau \mid \mu) = \mu e^{-\mu \tau}. \tag{2}
\]

4. Parameters λ and μ vary independently across customers.

The COM‐Poisson parameter υ can be thought of as a dispersion parameter (given λ) and offers flexibility by
modeling a wide range of overdispersed and underdispersed data. This parameter in our implementation is assumed to be
common across customers. However, the parameters λ and μ are allowed to vary independently across customers; we
therefore subscript them with a customer‐specific index i, writing λi and μi, respectively. We estimate them individually
so that we can link individual customer behavior to customer characteristics.
The main difference between the proposed model and the Pareto/NBD Model, given interval‐censored data as
presented by Fader et al,7 is that the proposed model uses a COM‐Poisson distribution, whereas the Pareto/NBD model
uses the Poisson distribution to model transaction counts. The COM‐Poisson distribution allows a greater extent of
flexibility in accounting for overdispersion/underdispersion in the transaction counts. However, a potential drawback
of using the COM‐Poisson as contrasted to using the Poisson is that a Poisson assumption with transaction rate λ is
equivalent to assuming that the time between transactions follows the exponential distribution with transaction rate
λ. The exponential is a well‐understood and explored distribution and has been widely used by practitioners as well as
academics. Such an equivalence for the COM‐Poisson distribution has only recently started being explored (Zhu et al17).

FIGURE 1 Visual depiction of a customer's typical data string

4.2 | Likelihood specification


The Pareto/NBD model uses the total number of transactions x made during the period of observation (x being the
sufficient statistic under the Poisson assumption). Here $x = \sum_{i=1}^{n} x_i$, where i indexes the time periods and n
is the total number of time periods for a customer. For the COM‐Poisson distribution, Kadane et al24 derived the
sufficient statistics to be x and $\prod_{i=1}^{n} (x_i!)$.
For every customer, we do not know exactly when each transaction occurred. However, what we do know for every
customer is the number of transactions $x_i$ that occurred during the time intervals $(t_{i-1}, t_i]$, i = 1, …, n. These time
intervals together constitute the total time interval for a customer from his/her start of transactions until the observation time.
Figure 1 is a depiction of a customer's typical data string.
The time intervals $(t_{i-1}, t_i]$ are assumed to be of equal length (eg, 1 wk, 2 wk, or 1 mo). We can then consider $t_1$ to be 1
unit of time, $t_2$ to be 2 units of time, and so on. The number of transactions in the first time interval $(0, t_1]$ is given by $x_1$, in
the second time interval $(t_1, t_2]$ by $x_2$, in the interval $(t_{m-2}, t_{m-1}]$ by $x_{m-1}$, and so on.
The length of these time intervals directly affects the accuracy of the model, which increases as the time
interval is shortened. The choice of these subintervals can be based either on knowledge of firm activity or on mathematical
techniques such as Dirichlet tessellation.19
For a particular customer, let $y_j = \sum_{i=1}^{j} x_i$; then the total number of transactions for this customer is
$y_n = \sum_{i=1}^{n} x_i$, because we have a total of n time periods in the data for every customer. The last observed
transaction of a particular customer occurs, say, in period m (m ≤ n), ie, in the interval $(t_{m-1}, t_m]$. This then also
implies that $x_{m+1}, \ldots, x_{n-1}, x_n$ are all equal to 0, so that $y_n = y_m$. In our case, the transaction history is
presented through the number of transactions for each of a series of equal discrete time intervals. With reference to the
note by Fader et al,22 which derives the Pareto/NBD likelihood function for the case of interval‐censored data, our data
structure, reported in terms of the transaction counts, is considered as interval censored.
Because the last observed transaction for this customer occurs in period m (ie, the time period $(t_{m-1}, t_m]$), the fact
that $x_m > 0$ ($x_m$ being the number of transactions in period m) means that the customer must have been alive in the first
m − 1 periods. The individual‐level likelihood of the corresponding $y_{m-1}$ transactions (where
$y_{m-1} = \sum_{i=1}^{m-1} x_i$) in the interval $(0, t_{m-1}]$ is

\[
\frac{\lambda^{y_{m-1}}\, e^{-\mu t_{m-1}}}{\left(\prod_{j=1}^{m-1} x_j!\right)^{\upsilon} Z(\lambda, \upsilon)^{t_{m-1}}}.^{*}
\]

The individual‐level likelihood function of the corresponding $y_m$ transactions in the interval $(0, t_n]$ can then be
derived to be as follows (proof detailed in the Appendix):

\[
L = \frac{\lambda^{x}}{(\pi_x)^{\upsilon}\,\bigl(\mu + \ln Z(\lambda, \upsilon)\bigr)}
\times \left[ \frac{\mu\, e^{-\mu t_{m-1}}}{Z(\lambda, \upsilon)^{t_{m-1}}}
+ \frac{\ln\bigl(Z(\lambda, \upsilon)\bigr)\, e^{-\mu t_n}}{Z(\lambda, \upsilon)^{t_n}} \right], \tag{3}
\]

where πx = ∏ (xi!).
The proposed model estimates 2N + 1 parameters, where N is the number of customers: the common COM‐Poisson
parameter υ and the 2 parameters λi and μi for each customer i, i = 1, …, N.
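
For readers who wish to experiment with Equation 3, the following minimal sketch evaluates its logarithm for one customer from a list of per‐period counts. The function names, the truncation of Z(λ, υ) at `max_count`, and the convention that a customer with no purchases has m − 1 = 0 are assumptions of this example; unit‐length periods are used, as in the footnote to the likelihood.

```python
import math

def log_z(lam, nu, max_count=500):
    """log Z(lam, nu), truncating the infinite sum at max_count (approximation)."""
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1)
                 for j in range(max_count + 1)]
    shift = max(log_terms)
    return shift + math.log(sum(math.exp(t - shift) for t in log_terms))

def log_likelihood(counts, lam, mu, nu):
    """Log of Equation 3 for one customer.

    counts[i] is the number of transactions in unit period i + 1, so that
    t_n = len(counts) and t_{m-1} = m - 1, where m is the last period with
    a purchase (m - 1 is set to 0 if there is no purchase: an assumption).
    """
    t_n = len(counts)
    x = sum(counts)
    log_pi_x = sum(math.lgamma(c + 1) for c in counts)  # log of prod(x_i!)
    last = max((i + 1 for i, c in enumerate(counts) if c > 0), default=0)
    t_m1 = max(last - 1, 0)
    lz = log_z(lam, nu)          # ln Z(lam, nu), positive because Z > 1
    rate = mu + lz               # mu + ln Z
    # Log of the bracket in Equation 3, combined with a log-sum-exp step.
    a = math.log(mu) - rate * t_m1
    b = math.log(lz) - rate * t_n
    top = max(a, b)
    log_bracket = top + math.log(math.exp(a - top) + math.exp(b - top))
    return x * math.log(lam) - nu * log_pi_x - math.log(rate) + log_bracket

# Illustrative data: purchases in periods 1 and 3 of 5 observed periods.
print(log_likelihood([2, 0, 1, 0, 0], lam=0.8, mu=0.1, nu=1.2))
```

When υ = 1, Z(λ, υ) = e^λ and ln Z = λ, so the expression reduces to a Poisson‐based likelihood, which offers a simple check of an implementation.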

*It should be noted that because we consider t1 to be 1 unit of time, t2 as 2 units of time, and so on; hence, tm‐1 in the expression is considered as (m‐1)
units of time and as such tm‐1 can be replaced by (m‐1) in the expression.

4.3 | Parameter estimation


We estimate model parameters by setting up an MCMC chain using a Metropolis‐Hastings algorithm. This allows us to
estimate individual‐level parameters for the model. To do so, it is necessary to design a transition operator for the
Markov chain, which makes the chain's stationary distribution match the posterior function, denoted P, which is equal
to the likelihood function defined in Equation 3 multiplied by a parameter prior distribution. In our paper, we use
independent Gamma(1,1) distributions as priors for the υ, λi, and μi parameters. Kadane et al24 provide the form of the
conjugate prior density for the COM‐Poisson as

\[
h(\lambda, \upsilon) = \lambda^{a-1} e^{-\upsilon b} Z^{-c}(\lambda, \upsilon)\, \kappa(a, b, c),
\]

where $Z(\lambda, \upsilon) = \sum_{j=0}^{\infty} \lambda^{j} / (j!)^{\upsilon}$ and
$\kappa^{-1}(a, b, c) = \int_0^{\infty} \int_0^{\infty} \lambda^{a-1} e^{-\upsilon b} Z^{-c}(\lambda, \upsilon)\, d\lambda\, d\upsilon$.
However, in our application, we use independent gamma priors primarily for 2 reasons: firstly, we estimate a single υ
parameter across customers while we estimate N λi parameters; secondly, the use of independent gamma priors is
computationally parsimonious as compared with the use of the COM‐Poisson conjugate density.
A normal distribution is used as a proposal in the Metropolis‐Hastings algorithm with a mean equal to the last
accepted value, and the variance of the normal is used as a tuning parameter to tune the acceptance ratio of draws.†
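
The mechanics of such a sampler can be sketched for a single positive parameter. The example below is a generic random‐walk Metropolis‐Hastings step with a normal proposal centered at the last accepted value and a Gamma(1,1) prior. The toy Poisson likelihood, the data, the function names, and the tuning values are assumptions of this illustration and stand in for the model likelihood of Equation 3; the full scheme would apply the same kind of update to υ and to each λi and μi.

```python
import math
import random

def metropolis_hastings(log_post, init, step_sd, n_iter, seed=0):
    """Random-walk Metropolis-Hastings for one scalar parameter.

    The proposal is normal with mean equal to the last accepted value;
    step_sd is the tuning parameter that controls the acceptance ratio.
    """
    rng = random.Random(seed)
    current, current_lp = init, log_post(init)
    draws, accepted = [], 0
    for _ in range(n_iter):
        proposal = rng.gauss(current, step_sd)
        proposal_lp = log_post(proposal)
        # Symmetric proposal: accept with probability min(1, ratio of posteriors).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        draws.append(current)
    return draws, accepted / n_iter

# Toy target (assumption): Poisson counts with a Gamma(1,1) prior on the rate.
toy_counts = [3, 1, 2, 4, 2]

def log_posterior(lam):
    if lam <= 0:
        return -math.inf           # proposals outside the support are rejected
    log_prior = -lam               # Gamma(1,1) density, up to a constant
    log_lik = sum(c * math.log(lam) - lam for c in toy_counts)
    return log_prior + log_lik

draws, acceptance_rate = metropolis_hastings(log_posterior, init=1.0,
                                             step_sd=0.8, n_iter=20_000)
kept = draws[5_000:]               # discard the first draws as burn-in
posterior_mean = sum(kept) / len(kept)
print(f"acceptance rate={acceptance_rate:.2f}, posterior mean={posterior_mean:.3f}")
```

For this toy target the exact posterior is Gamma(13, 6), with mean 13/6, so the sampled mean can be compared with a known value; no such closed form is available for the proposed model, which is why the sampler is needed there.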

4.4 | Prediction of number of transactions


Given that we do not know if a customer is alive or not at time $t_n$, the expected number of purchases in the period
$(t_n, t_n + t]$ for a customer with purchase history x, $\pi_x$, and $t_n$ is calculated as

\[
E(X(t) \mid \lambda, \mu, \upsilon, x, \pi_x, t_n)
= E(X(t) \mid \lambda, \upsilon, \mu, \text{alive at } t_n)\,
P(\tau > t_n \mid \lambda, \mu, \upsilon, x, \pi_x, t_m, t_n). \tag{4}
\]

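The expected number of transactions while the customer is alive at time $t_n$ is calculated using the COM‐Poisson's mean
function $\hat{x} = \sum_{j=0}^{\infty} \dfrac{j \lambda^{j}}{(j!)^{\upsilon} Z(\lambda, \upsilon)}$ as

\[
E(X(t) \mid \lambda, \upsilon, \mu, \text{alive at } t_n)
= \hat{x}\, t\, P(\tau > t_n + t \mid \mu, \tau > t_n)
+ \int_{t_n}^{t_n + t} \hat{x}\, \tau\, f(\tau \mid \mu, \tau > t_n)\, d\tau
= \frac{\hat{x}}{\mu} - \frac{\hat{x}}{\mu} e^{-\mu t}. \tag{5}
\]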

Equation 5 assumes that the customer is alive at $t_n$. As we do not know whether a customer is alive at $t_n$, the
generalized form of the expected number of transactions is

\[
E(X(t) \mid \upsilon, \lambda_i, \mu_i, t_n)
= \left( \frac{\hat{x}}{\mu} - \frac{\hat{x}}{\mu} e^{-\mu t} \right) e^{-\mu t_n}. \tag{6}
\]

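The probability that a customer with purchase history x, $\pi_x$, and $t_n$ is alive at time $t_n$ is calculated as

\[
P(\tau > t_n \mid \lambda, \mu, \upsilon, x, \pi_x, t_n)
= \frac{\lambda^{x}\, e^{-\mu t_n}}{\pi_x^{\upsilon}\, Z(\lambda, \upsilon)^{t_n}\, L(\lambda, \upsilon, \mu \mid x, \pi_x, t_n)}. \tag{7}
\]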

Using Equations 5 and 7, we can predict the number of transactions for any customer and for any future time period.
This can be done by calculating the entire posterior distribution of the predicted number of transactions using the entire
set of posterior draws of λ, μ, and υ. For the sake of computational parsimony, we could also evaluate the prediction of
number of transactions using the posterior means of λ, μ, and υ.
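
A minimal sketch of this prediction step, evaluated at point values of λ, μ, and υ, is given below. Substituting Equation 3 into Equation 7 cancels the factor λ^x/π_x^υ, so the probability of being alive depends on the history only through t_{m−1} and t_n; the code uses this simplified form. The function names, the truncation of the COM‐Poisson sums, the overflow guard, and the numerical values are assumptions of this example.

```python
import math

def log_z_and_mean(lam, nu, max_count=500):
    """Return (ln Z(lam, nu), COM-Poisson mean), truncating sums at max_count."""
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1)
                 for j in range(max_count + 1)]
    shift = max(log_terms)
    weights = [math.exp(t - shift) for t in log_terms]
    total = sum(weights)
    mean = sum(j * w for j, w in enumerate(weights)) / total
    return shift + math.log(total), mean

def p_alive(lam, mu, nu, t_m1, t_n):
    """Equation 7 after substituting Equation 3 (the purchase counts cancel)."""
    log_z, _ = log_z_and_mean(lam, nu)
    rate = mu + log_z
    exponent = rate * (t_n - t_m1)
    if exponent > 700.0:           # guard against floating-point overflow
        return 0.0
    return 1.0 / (1.0 + (mu / rate) * math.expm1(exponent))

def expected_future_transactions(lam, mu, nu, t_m1, t_n, horizon):
    """Equation 4: Equation 5 (alive at t_n) times Equation 7 (P(alive))."""
    _, x_hat = log_z_and_mean(lam, nu)
    alive_part = (x_hat / mu) * (1.0 - math.exp(-mu * horizon))
    return alive_part * p_alive(lam, mu, nu, t_m1, t_n)

# Illustrative values only: last purchase in period 3 of 5, 4-period horizon.
print(p_alive(lam=0.8, mu=0.1, nu=1.2, t_m1=2, t_n=5))
print(expected_future_transactions(lam=0.8, mu=0.1, nu=1.2, t_m1=2, t_n=5, horizon=4))
```

Applying these functions to every posterior draw of (λ, μ, υ), rather than to the posterior means, would yield the full posterior distribution of the prediction described above.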

5 | MODEL PERFORMANCE

The proposed model is applied on multiple datasets. Firstly, the well‐known CDNOW data set7 is used to evaluate model
performance. Secondly, 3 simulated data sets are used to cover cases of overdispersion, underdispersion, and the Poisson
dispersion. Finally, a North African retail bank data set is used to evaluate model performance and to analyze the link
between purchasing behavior and customer characteristics.

†The normal distribution is also used as a proposal for the other parameters drawn in our MCMC scheme.

5.1 | The CDNOW data set


CDNOW is a well‐known dataset commonly used as a benchmark in lifetime transaction models (Fader et al,7 Abe,11
and Bemmaor and Glady8). This dataset tracks 23 570 customers of CDNOW (an online retailer) who made their first
purchase in the first 12 weeks of 1997 and were observed from January 1997 through June 1998 (78 wk). We used the 2357
customer cohort, which is a 1/10th systematic sample of the 23 570 customers.
The first 39 weeks of the data are used to estimate model parameters. The next 39 weeks are used to validate our
model and for a comparative prediction analysis with the Pareto/NBD model. For the Pareto/NBD model, the
parameters are obtained using maximum likelihood estimation and are reported in Table 1.
For the proposed model, we used the Metropolis‐Hastings sampling algorithm as described in Section 4.3. The
sampler was run for 50 000 iterations, and the last 10 000 iterations were used for analysis on the 2 × N + 1 estimated
parameters. Figure 2 shows the convergence of the common COM‐Poisson parameter υ over the last 10 000 iterations:
The trace plot is a plot of the iteration number against the value of the draw of the parameter at each iteration. The
density is the histogram of the values in the trace plot, ie, the distribution of the values of the parameter in the chain.
According to Geweke20 and Heidelberger and Welch21 diagnostics, the convergence test was successful. The
convergence diagnostics were run in the R statistical environment, using the “CODA” package.
We summarize the estimated parameters in Table 2. The COM‐Poisson parameter υ is greater than 1, implying that
the number of transactions made by these 2357 customers presents underdispersion vis‐à‐vis a Poisson distribution. The
fact that the parameter is significantly different from 1.0 is some indication that the choice of Poisson distribution in
modeling the number of transactions may not be appropriate.
Figure 3 displays a histogram plot of the λi and μi parameters. Overlaid on the histogram are best fitting gamma
distributions, Gamma(1.33, 0.62) for λi and Gamma(0.94, 0.54) for μi. In the absence of incorporating observed
heterogeneity (in terms of socio‐demographic characteristics), the choice of adequate heterogeneity distribution for
the parameters λi and μi is important, especially if we want to predict the behavior of a sample of customers who are
not included in the estimation sample. For purposes of exposition, Figure 4 displays box plots of the parameters λi and
μi of the first 10 customers. Graphically, there is considerable heterogeneity seen in the λi parameter.

TABLE 1 Estimation of the Pareto/NBD parameters using the CDNOW data set

Parameter        r         α         s         β
Value            0.553     10.578    0.606     11.664
Standard error   (0.283)   (2.082)   (1.903)   (5.055)

FIGURE 2 Convergence of the parameter υ over iterations



TABLE 2 Estimation of the proposed model parameters using the CDNOW data set

Parameter Mean Min Max

υ 2.661
λi 0.830 0.023 32.553
μi 0.508 0.001 4.393

FIGURE 3 Histogram plots of λi and μi parameters [Colour figure can be viewed at wileyonlinelibrary.com]

FIGURE 4 Box plots of the first ten λi's and μi's [Colour figure can be viewed at wileyonlinelibrary.com]

Figure 5 provides a histogram plot of the posterior probabilities of customers being alive. These have been calculated
using the posterior means of parameters in Equation 7. Notice that a majority of customers get categorized as “dead” by
the end of the observation period in these data.
We also conduct a predictive analysis using the estimated coefficients. Figure 6 displays a visualization of the
conditional expectations. The cohort is grouped by their number of transactions during the calibration period. Then, the
average predicted number of transactions during the validation period is compared with the actual average number in each
group.
According to the plots, the proposed model offers slightly better aggregated forecasting ability than Pareto/NBD,
especially for the group of customers doing 7, 8, 9, or more transactions during the calibration period.
To evaluate the individual forecasting performance of the proposed model, we show in Table 3 the mean absolute
difference (MAD), a measure of forecast error. The MAD is the average absolute difference between
the predicted and actual values. Based on this statistic, our proposed model outperforms the Pareto/NBD model.
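
The MAD statistic as defined above can be computed as in the following short sketch. The function name and the example numbers are hypothetical and are not taken from the data sets used in the paper.

```python
def mean_absolute_difference(predicted, actual):
    """Average absolute difference between predicted and actual values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

# Hypothetical predictions and holdout counts for 3 customers.
print(mean_absolute_difference([1.0, 2.5, 0.0], [0, 2, 3]))
```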

FIGURE 5 Histogram plot of p(“alive”) [Colour figure can be viewed at wileyonlinelibrary.com]

FIGURE 6 Conditional expectations using CDNOW data set

TABLE 3 Comparison statistics

Model Pareto/NBD Proposed Model

MAD 0.754 0.639

The individual purchase rate λi and dropout rate μi estimated using the MCMC algorithm are used to segment
customers into 4 clusters using k‐means algorithm, which is a partition‐based method frequently used in data mining.
The number of clusters is obtained by optimizing the Bayesian information criterion. These classes are shown in
Figure 7 and can provide managers with the ability to propose marketing solutions for each group of customers
separately. For example, a manager can use a mailing action to increase loyalty of class 2 customers or a promotion
campaign to enhance the purchase frequency of class 3 customers.
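
The segmentation step can be sketched with a plain implementation of the k‐means (Lloyd) iterations applied to pairs (λi, μi). The function name, the random initialization, and the six example points are assumptions of this illustration; the selection of the number of clusters by the Bayesian information criterion is not reproduced here.

```python
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)      # initial centers: k distinct data points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k),
                key=lambda c: sum((pd - cd) ** 2 for pd, cd in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:         # assignments stable: converged
            break
        labels = new_labels
    return labels, centers

# Hypothetical (purchase rate, dropout rate) pairs forming 2 obvious groups.
example = [(0.1, 0.1), (0.2, 0.1), (0.1, 0.2), (5.0, 5.0), (5.1, 5.0), (5.0, 5.2)]
labels, centers = kmeans(example, k=2)
print(labels, centers)
```

In practice the two rates may be on different scales, so standardizing them before clustering is a common design choice; whether this was done in the paper is not stated.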
As the CDNOW database does not give us enough data to relate the defined classes to customer characteristics, we
use further data sets in our analysis. These are presented in the next 2 sections. Section 5.2 presents an analysis of
simulated data sets, and Section 5.3 presents an analysis of customer credit card transaction data provided by a major North
African retail bank.

5.2 | Simulated data


FIGURE 7 CDNOW customers segmentation [Colour figure can be viewed at wileyonlinelibrary.com]

We use the proposed model to predict individual customer transactions in 3 sets of simulated data. These simulated data
consist of 2000 customers (i = 1, …, 2000) making transactions during a time period of 104 weeks. The first 52 weeks
are used to estimate parameters, and the next 52 weeks are used to evaluate forecasting performance of the proposed
model. For a comparative analysis, we also use the Pareto/NBD model to make similar predictions on these simulated
data.
The simulations across the 104 weeks are done as follows:

• Customer lifetime τi ~ Pareto(s, β)


• Weekly number of transactions xi ~ COM‐Poisson(υ, λi)
• COM‐Poisson parameter λi ~ Gamma(r, α)
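
The simulation scheme above can be sketched for a single customer as follows. Several choices here are assumptions of this example rather than details given in the paper: the gamma distributions use a shape/rate parameterization, the Pareto lifetime is drawn as an exponential lifetime whose rate is gamma distributed, a customer transacts only in weeks that end before the lifetime expires, and COM‐Poisson counts are drawn by inverse‐CDF sampling on a truncated support. All function names are hypothetical.

```python
import bisect
import math
import random

def com_poisson_cdf(lam, nu, max_count=200):
    """Cumulative probabilities of a COM-Poisson(lam, nu) truncated at max_count."""
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1)
                 for j in range(max_count + 1)]
    shift = max(log_terms)
    weights = [math.exp(t - shift) for t in log_terms]
    total = sum(weights)
    cdf, running = [], 0.0
    for w in weights:
        running += w / total
        cdf.append(running)
    return cdf

def sample_count(cdf, rng):
    """Inverse-CDF draw: smallest j whose cumulative probability exceeds u."""
    return min(bisect.bisect_right(cdf, rng.random()), len(cdf) - 1)

def simulate_customer(n_weeks, nu, r, alpha, s, beta, rng):
    """Return (lam, lifetime, weekly counts) for one simulated customer."""
    lam = rng.gammavariate(r, 1.0 / alpha)   # Gamma(shape r, rate alpha)
    mu = rng.gammavariate(s, 1.0 / beta)     # dropout rate, Gamma(shape s, rate beta)
    lifetime = rng.expovariate(mu)           # gamma-mixed exponential: Pareto lifetime
    cdf = com_poisson_cdf(lam, nu)
    counts = [sample_count(cdf, rng) if week <= lifetime else 0
              for week in range(1, n_weeks + 1)]
    return lam, lifetime, counts

# Scenario 1 values from Table 4 (underdispersed case).
rng = random.Random(1)
lam, lifetime, counts = simulate_customer(104, nu=1.4, r=7.0, alpha=2.0,
                                          s=3.0, beta=100.0, rng=rng)
print(f"lam={lam:.3f}, lifetime={lifetime:.1f} weeks, total={sum(counts)}")
```

Repeating the call for 2000 customers and splitting each count vector into the first and last 52 weeks would reproduce the calibration and validation layout described above.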

The first simulated data set (scenario 1) uses a value of υ > 1 corresponding to an underdispersion of number of
transactions as compared with the Poisson distribution. The second simulated data set (scenario 2) considers the case
where υ = 1 and the number of transactions follows a Poisson distribution with an equi‐dispersion property. Finally,
the last data set (scenario 3) presents an overdispersed scenario with υ < 1. Table 4 shows the selected values of each
parameter used to generate the 3 simulated datasets.
To evaluate the proposed model performance, we predict future number of transactions and compare them with
those given by the Pareto/NBD model. The MAD statistic is used to evaluate the various predictions. Table 5 shows
the MAD statistic for each model with the 3 scenarios. Note that when the dispersion in data is as in the Poisson
distribution, then both models (Pareto/NBD and our proposed model) perform similarly based on the MAD statistic.
However, for the overdispersed and underdispersed situations, Pareto/NBD model does not perform as well as the
proposed model.

TABLE 4 Selected parameters used to draw simulations

Parameter   Scenario 1   Scenario 2   Scenario 3
υ           1.4          1.0          0.8
r           7.0          7.0          7.0
α           2.0          2.0          2.0
s           3.0          3.0          3.0
β           100.0        100.0        100.0

TABLE 5 Mean absolute difference predictions from the various models

Model        Scenario 1   Scenario 2   Scenario 3
PGCP         35.12        24.13        15.92
Pareto/NBD   117.32       28.51        31.58

5.3 | The retail bank data set


This is the last dataset used to demonstrate the proposed model. It is a set of customer credit card transactions provided
by a major North African retail bank. These data consist of the weekly number of transactions of a single cohort of 5000
customers who made their first transaction in the first week of 2011. We have their weekly number of transactions from
January 2011 through December 2012 (transactions for 104 wk across these 5000 customers). We also have customers'
demographic information such as age, gender, and monthly income. Further, the data have information on
“multichannel shopping,” which is the way by which customers interact with the bank. Customers at this bank can
transact with the bank through various channels such as the physical branch, ATMs, Internet, or mobile banking.
Because all customers in our data use branches as well as ATMs, we used a dummy variable (“multichannel variable”),
which takes a value 1 if the customer uses Internet or mobile channel and 0 otherwise.
The first 52‐week data (year 2011) are used to estimate model parameters. The next 52 weeks (year 2012) are used to
validate our model and for a comparative predictive analysis with the Pareto/NBD.
The estimated parameters of the Pareto/NBD model are reported in Table 6, and a summary of the estimated
parameters of the proposed model is given in Table 7. In this case, the COM‐Poisson parameter υ is smaller than 1,
implying that the number of transactions made by these 5000 customers is overdispersed vis‐à‐vis a Poisson distribution.
This again shows that the COM‐Poisson distribution is more appropriate than the Poisson for modeling the number of
transactions. It also illustrates the flexibility of the COM‐Poisson in modeling underdispersed as well as overdispersed
data (note that the value of υ for the CDNOW data was greater than 1, indicating underdispersion vis‐à‐vis a Poisson).
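The link between υ and dispersion can be checked numerically. The sketch below computes the mean and variance of a COM‐Poisson distribution from its probability mass function, with the normalizing constant truncated at `max_count` terms (an assumption of this sketch), and compares the variance‐to‐mean ratio for several values of υ. The value λ = 0.806 is the mean of the λi reported in Table 7 and is used only for illustration; the function name is hypothetical.

```python
import numpy as np

def com_poisson_moments(lam, nu, max_count=500):
    """Mean and variance of a COM-Poisson(lam, nu) from a truncated pmf."""
    k = np.arange(max_count + 1)
    log_fact = np.concatenate(([0.0], np.cumsum(np.log(k[1:]))))  # log(k!)
    log_w = k * np.log(lam) - nu * log_fact
    pmf = np.exp(log_w - log_w.max())
    pmf /= pmf.sum()
    mean = float(np.sum(k * pmf))
    var = float(np.sum((k - mean) ** 2 * pmf))
    return mean, var

# A ratio above 1 indicates overdispersion, below 1 underdispersion
for nu in (1.4, 1.0, 0.456):
    mean, var = com_poisson_moments(lam=0.806, nu=nu)
    print(f"nu={nu}: mean={mean:.3f}, variance={var:.3f}, ratio={var / mean:.3f}")
```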

TABLE 6 Estimation of the Pareto/NBD parameters using the retail bank data set

Parameter        r           α           s           β
Value            1.398       1.042       0.121       59.977
Standard error   (1.84E‐4)   (1.68E‐5)   (1.94E‐4)   (0.29)

TABLE 7 Estimation of the proposed model parameters using the retail bank data set

Parameter   Mean    Min      Max
υ           0.456
λi          0.806   0.033    2.857
μi          0.036   0.3E‐3   1.753

FIGURE 8 Conditional expectations using retail bank data set



TABLE 8 Socio‐demographic characteristics of customers' classes

Characteristic            Total   C1      C2      C3      C4      C5
Number of customers       5000    295     778     1244    1421    1262
E(weekly #transactions)   0.837   2.875   1.635   0.904   0.503   0.178
E(λ)                      0.806   1.825   1.340   0.961   0.626   0.287
E(μ)                      0.036   0.026   0.026   0.036   0.036   0.044
E(incomes)                796     964     905     832     774     679
% males                   69%     76%     70%     70%     67%     69%
% multichannel            17%     28%     24%     20%     15%     11%
E(age)                    45.5    46.4    46.0    45.3    45.6    45.1

To assess the aggregate forecasting performance of the proposed model, we plot in Figure 8 the aggregate number of
transactions expected under the proposed model and under the Pareto/NBD model, together with the actual data. Figure 8
shows that the proposed model forecasts customer behavior at the aggregate level better than the Pareto/NBD.
Using the individual purchase rates λi and dropout rates μi estimated by the MCMC algorithm, we segment
customers into 5 classes using the k‐means algorithm. The characteristics of each class are presented in Table 8.
Broadly, these characteristics reveal 2 families of customers: customers with a high transaction rate λ
(classes C1‐C2‐C3) and customers with a low transaction rate λ (classes C4‐C5). Across these classes, we notice that
purchasing behavior (number of transactions) increases with customer income and multichannel usage of the bank
services and, less markedly, with customer age. Customers in classes C4‐C5 tend to have a low level of
transactions and the shortest lifetimes.

FIGURE 9 A decision tree analyzing customers' segmentation
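The segmentation step described above can be illustrated with a plain NumPy implementation of k‐means (Lloyd's algorithm) applied to per‐customer (λi, μi) pairs. This is a sketch only: the source does not state which k‐means implementation or preprocessing was used, so the standardization of the two features, the random initialization, and the synthetic rates below are all assumptions, and the function name is hypothetical.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Initialize the centers with k distinct data points chosen at random
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance of every point to every center, shape (n_points, k)
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment so that the labels match the returned centers
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centers

# Hypothetical individual-level rates standing in for the MCMC estimates
rng = np.random.default_rng(1)
lam = rng.gamma(shape=2.0, scale=0.4, size=500)   # purchase rates
mu = rng.gamma(shape=1.5, scale=0.02, size=500)   # dropout rates
features = np.column_stack([lam, mu])
# Assumption: standardize so that neither rate dominates the distance
z = (features - features.mean(axis=0)) / features.std(axis=0)
labels, centers = kmeans(z, k=5)
```

Class profiles such as those in Table 8 would then be obtained by averaging the customer characteristics within each label.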
This kind of analysis allows managers to characterize existing customers and to predict the behavior of new ones. In
this context, we propose a decision tree to help managers target customers. This decision tree, produced using
the SPSS software, is given in Figure 9. According to the decision tree, the most interesting new customers are

1. those earning more than $726
2. or those earning between $466 and $726 and using multichannel.

In both cases, the probability that a customer belongs to classes 1, 2, or 3 is higher than 50%. These classes (ie, 1, 2,
and 3) have the highest expected number of transactions (Table 8). On the other hand, customers who earn less than
$726 and do not use multichannel services have a high probability of being in class 5, a class characterized by a lower
number of transactions.
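The two targeting rules read off the decision tree can be written as a small predicate. This is a hedged illustration only: the function name is hypothetical, the treatment of the boundary values $466 and $726 as inclusive is an assumption, and the rule returns only whether a new customer matches one of the two stated profiles, not a class probability.

```python
def is_promising_customer(income, uses_multichannel):
    """True if a new customer matches either rule stated in the text."""
    # Rule 1: income above $726
    if income > 726:
        return True
    # Rule 2: income between $466 and $726 (bounds assumed inclusive) and multichannel use
    return bool(466 <= income <= 726 and uses_multichannel)

# Hypothetical new customers: (monthly income, uses Internet or mobile channel)
new_customers = [(800, False), (600, True), (600, False), (400, True)]
flags = [is_promising_customer(income, multi) for income, multi in new_customers]
```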
This kind of analysis can be a support for further marketing decisions, allowing managers to target existing or new
customers and to optimize marketing resources.

6 | CONCLUDING COMMENTS

Customer lifetime value (CLV) is a key metric for any business activity. An important step in calculating CLV is
predicting the number of lifetime transactions for a customer. This computation becomes particularly challenging in
“noncontractual settings,” where the firm does not observe customer defections when they happen. Only a limited
number of models are available in the literature for such settings. A difficulty that companies encounter when
predicting customers' future transactions is choosing an appropriate model that gives satisfactory predictions for each
customer. Further, the selected model should support strategic decisions to increase the profitability of existing
customers and to target promising new ones. The Pareto/NBD model is the first and most widely used model for predicting
the future number of transactions of a customer in such noncontractual relationships. This model has some
shortcomings. It assumes that the number of transactions by a customer follows a Poisson distribution, whereas
many real data sets exhibit overdispersion or underdispersion. Also, its likelihood function is difficult to compute
because of the numerous evaluations of the Gaussian hypergeometric function. Finally, the Pareto/NBD provides limited
opportunities to incorporate socio‐demographic information or to target specific customers.
We propose a lifetime transaction model based on the COM‐Poisson distribution, which offers greater flexibility for
predicting future customer transactions over time. Our proposed model incorporates socio‐demographic information on
customers and fits the data better than the Pareto/NBD. It also outperforms the Pareto/NBD model in terms of prediction
accuracy. The model parameters are estimated using a Metropolis‐Hastings algorithm, which, besides being easy to
use, provides individual‐level estimates of each customer's purchase and dropout rates. These estimates can then be
used to segment and target customers. We demonstrate the flexibility of the model using 2 industry data sets and a
simulated set of data. Although in this work we do not allow the COM‐Poisson parameter υ to vary across customers,
future work could explore models in which υ is also heterogeneous across customers.

How to cite this article: Mzoughia MB, Borle S, Limam M. A MCMC approach for modeling customer lifetime
behavior using the COM‐Poisson distribution. Appl Stochastic Models Bus Ind. 2018;34:113–127. https://doi.org/
10.1002/asmb.2276

APPENDIX: PROOF OF EQUATION 3.

The individual‐level likelihood function of the corresponding $y_m$ transactions in the interval $(0, t_n]$ is

$$
L = \frac{\lambda^{y_{m-1}} e^{-\mu t_{m-1}}}{\left(\prod_{j=1}^{m-1} x_j!\right)^{\upsilon} Z(\lambda,\upsilon)^{t_{m-1}}} \times (A_1 + A_2 + A_3) \tag{8}
$$

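where $A_1$ is the individual‐level likelihood element in the case where the customer becomes inactive sometime in the
$m$‐th period, given by

$$
\begin{aligned}
A_1 &= \int_{t_{m-1}}^{t_m} \frac{\lambda^{x_m}}{(x_m!)^{\upsilon} Z(\lambda,\upsilon)^{\tau - t_{m-1}}}\, \mu e^{-\mu(\tau - t_{m-1})}\, d\tau \\
&= \frac{\lambda^{x_m} \mu e^{\mu t_{m-1}}}{(x_m!)^{\upsilon} Z(\lambda,\upsilon)^{-t_{m-1}} \left(\mu + \ln Z(\lambda,\upsilon)\right)} \left[ e^{-t_{m-1}(\mu + \ln Z(\lambda,\upsilon))} - e^{-t_m(\mu + \ln Z(\lambda,\upsilon))} \right];
\end{aligned} \tag{9}
$$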

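$A_2$ is the individual‐level likelihood element where the customer is alive all through the $m$‐th period but becomes
inactive sometime in the interval $(t_m, t_n]$, given by

$$
\begin{aligned}
A_2 &= \frac{\lambda^{x_m}}{(x_m!)^{\upsilon} Z(\lambda,\upsilon)^{t_m - t_{m-1}}}\, e^{-\mu(t_m - t_{m-1})} \int_{t_m}^{t_n} \frac{1}{Z(\lambda,\upsilon)^{\tau - t_m}}\, \mu e^{-\mu(\tau - t_m)}\, d\tau \\
&= \frac{\lambda^{x_m} \mu e^{\mu t_{m-1}}}{(x_m!)^{\upsilon} Z(\lambda,\upsilon)^{-t_{m-1}} \left(\mu + \ln Z(\lambda,\upsilon)\right)} \left[ e^{-t_m(\mu + \ln Z(\lambda,\upsilon))} - e^{-t_n(\mu + \ln Z(\lambda,\upsilon))} \right];
\end{aligned} \tag{10}
$$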

and $A_3$ is the individual‐level likelihood element in the case where the customer is alive all through the $m$‐th
period and remains alive all through the interval $(t_m, t_n]$, making no additional purchases. This is given by

$$
A_3 = \frac{\lambda^{x_m}}{(x_m!)^{\upsilon} Z(\lambda,\upsilon)^{t_n - t_{m-1}}}\, e^{-\mu(t_n - t_{m-1})}. \tag{11}
$$

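The final individual‐level likelihood function, obtained by substituting Equations 9, 10, and 11 into Equation 8, is
as follows:

$$
L = \frac{\lambda^{x}}{(\pi_x)^{\upsilon} \left(\mu + \ln Z(\lambda,\upsilon)\right)} \times \left[ \frac{\mu e^{-\mu t_{m-1}}}{Z(\lambda,\upsilon)^{t_{m-1}}} + \frac{\ln(Z(\lambda,\upsilon))\, e^{-\mu t_n}}{Z(\lambda,\upsilon)^{t_n}} \right],
$$

where $\pi_x = \prod_i (x_i!)$.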
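As a numerical illustration of the final expression, the sketch below evaluates its logarithm for one customer. It is not the authors' implementation: the function and argument names are hypothetical, `t_last` and `t_end` stand for $t_{m-1}$ and $t_n$, $x$ is taken to be the sum of the observed counts $x_i$, and $Z(\lambda,\upsilon)$ is approximated by truncating its series at `max_count` terms. Working on the log scale avoids underflow in the powers of $Z$.

```python
import math
import numpy as np

def log_z(lam, nu, max_count=500):
    """log of the truncated COM-Poisson normalizing constant Z(lam, nu)."""
    k = np.arange(max_count + 1)
    log_fact = np.concatenate(([0.0], np.cumsum(np.log(k[1:]))))  # log(k!)
    log_w = k * np.log(lam) - nu * log_fact
    m = log_w.max()
    return float(m + np.log(np.exp(log_w - m).sum()))

def log_likelihood(counts, t_last, t_end, lam, mu, nu):
    """log L for one customer, following the final expression of the appendix."""
    counts = np.asarray(counts)
    x = counts.sum()                                           # total transactions
    log_pi_x = sum(math.lgamma(float(c) + 1.0) for c in counts)  # log prod(x_i!)
    lz = log_z(lam, nu)                                        # ln Z > 0 because Z > 1
    rate = mu + lz
    # log[mu * e^{-mu t_last} / Z^{t_last} + ln(Z) * e^{-mu t_end} / Z^{t_end}]
    log_bracket = np.logaddexp(np.log(mu) - rate * t_last,
                               np.log(lz) - rate * t_end)
    return float(x * np.log(lam) - nu * log_pi_x - np.log(rate) + log_bracket)

# Hypothetical customer: counts in the periods up to t_last = 3, observed until t_end = 5
value = log_likelihood([1, 0, 2], t_last=3, t_end=5, lam=0.8, mu=0.05, nu=1.0)
```

With υ = 1 the normalizing constant reduces to $e^{\lambda}$, which gives a simple check of the truncated series.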