Professional Documents
Culture Documents
www.elsevier.com/locate/eswa
Abstract
With the advent of one-to-one marketing media, e.g. targeted direct mail or internet marketing, the opportunities to develop targeted
marketing (customer relationship management) campaigns are enhanced in such a way that it is now both organizationally and economically
feasible to profitably support a substantially larger number of marketing segments. However, the problem of what segments to distinguish,
and what actions to take towards the different segments increases substantially in such an environment. A systematic analytic procedure
optimizing both steps would be very welcome.
In this study, we present a joint optimization approach addressing two issues: (1) the segmentation of customers into homogeneous groups
of customers, (2) determining the optimal policy (i.e. what action to take from a set of available actions) towards each segment. We
implement this joint optimization framework in a direct-mail setting for a charitable organization. Many previous studies in this area
highlighted the importance of the following variables: R(ecency), F(requency), and M(onetary value). We use these variables to segment
customers. In a second step, we determine which marketing policy is optimal using markov decision processes, following similar previous
applications. The attractiveness of this stochastic dynamic programming procedure is based on the long-run maximization of expected
average profit. Our contribution lies in the combination of both steps into one optimization framework to obtain an optimal allocation of
marketing expenditures. Moreover, we control segment stability and policy performance by a bootstrap procedure. Our framework is
illustrated by a real-life application. The results show that the proposed model outperforms a CHAID segmentation.
q 2004 Elsevier Ltd. All rights reserved.
Keywords: Direct marketing; Econometric models; Sample selection; Target selection; Endogeneity
a segment, rather than real one-to-one marketing. With the the integration of the two-step process of segmentation and
advent of one-to-one marketing media, e.g. digital printing targeting in one framework to obtain an optimal allocation
for targeted direct mail or e-mail marketing, the opportu- of marketing expenditures.
nities to develop targeted marketing campaigns are This paper is structured as follows. In Section 2, we
enhanced in such a way that it is now both organizationally discuss our methodology in detail. In Section 3 we present
and economically feasible to profitably support a substan- an operationalization of the proposed methodology in a
tially larger number of marketing segments. However, the direct mailing context. The results of an empirical
complexity and the dimension of the segmentation and of application of our model are presented in Section 4. We
the identification of the appropriate marketing activity finish with a detailed discussion in Section 5.
increases substantially in such an environment. A systema-
tic analytic procedure optimizing both steps would be very
welcome. 2. Methodology
The quality of a specific segmentation is generally
measured by within-segment homogeneity and between In marketing, segmentation is usually not a goal in itself,
segment heterogeneity (Wedel & Kamakura, 2000). The but rather a means to an end. The actual goal is to determine
marketing activity is optimized with the assumption that the the most appropriate ( ¼ most profitable) marketing mix for
segmentation is optimally determined and can remain fixed. each segment. Usually, these segments are determined first
However, the objective for the segmentation procedure may and then, on the basis of these segments, the marketing mix
not be optimal for the overall performance of the direct is determined (Baesens et al., 2004; Bitran & Mondschein,
marketing effort. It can be more desirable to determine a 1996; Buckinx & Van den Poel, 2004; Van den Poel &
segmentation that performs well on other criteria than with Larivière, 2004), see Fig. 1.
respect to within-segment homogeneity. For instance, if we There is, however, no guarantee that the segmentation
look at a direct marketing context, the (direct) marketer is determined in the first step will lead to maximized profits. We
interested in maximizing direct-mail returns. In this case, propose a joint optimization routine that determines the most
the most preferred segmentation is the one that maximizes profitable customer segmentation and marketing strategy in
profits and not within-segment homogeneity. terms of the best overall long-term performance. The
In this study, we present a joint optimization approach optimization process considers a series of segmentations of
addressing two issues: (1) the segmentation of customers the customers. For each segmentation, the best marketing
into homogeneous groups of customers, (2) determining the policy in terms of long-term impact ( ¼ revenues) is
optimal policy for marketing activities towards each determined using stochastic dynamic programming. Other
segment. segmentations are considered in a local search environment
The first step determines the segmentation of the of the current segmentation until a (local) optimum is found
customers. The proposed segmentation can be classified as or a stopping criterion is triggered. The stochastic dynamic
an a-priori descriptive method (Wedel & Kamakura, 2000): programming procedure needs parameter estimates,
we determine segments before-hand, using only variables especially transition probabilities between the segments
that describe personal characteristics of the customers. over time and the expected response per customer segment.
Hence, we do not look at the relationship between a The marketing strategy optimization is bootstrapped to
dependent variable and some explanatory variables. account for estimation errors. A genetic algorithm is applied
In a second step, we determine the optimal marketing to the segment optimization in combination with the
policy using Markov decision processes, following similar optimization procedure for the marketing strategy. In
applications such as Bitran and Mondschein (1996) and conclusion one can describe the proposed optimization
Gönül and Shi (1998). The attractiveness of this stochastic process as shown in Fig. 2.
dynamic programming procedure is based on the long-run
maximization of expected average profit. Moreover, we 2.1. Step 1—segmentation
control segment stability and policy performance by a
bootstrap procedure. The segmentation we propose does not consider a
The performance ( ¼ profitability) of the (optimal) relationship between certain (explanatory) variables and a
marketing policy provides an indication of the quality of dependent variable (such as response). We propose to use an
the segmentation determined in the first step. By using a a-priori descriptive method, where the segments are
local search method, multiple segmentations are evaluated, determined before-hand. The quality of the resulting
resulting in an optimal segmentation of the customers
(optimal with respect to generated profits). We implement
this joint optimization framework in a direct-mail setting for
a charitable organization.
To summarize, we contribute to the existing literature by Fig. 1. The general procedure to go from a segmentation to an optimal
the specification of an optimization framework offering marketing strategy.
J.-J. Jonker et al. / Expert Systems with Applications 27 (2004) 159–168 161
(target selection). A review of this literature can be found in mailer. Therefore, we propose to use the following RFM
Baesens, Viaene, Van den Poel, Vanthienen and Dedene variables:
(2002). Apart from this stream, we find several approaches
to optimizing direct marketing messages that are somewhat 1. Recency ðRÞ; the number of mailings without a response
related to our proposed framework. since the last response of the customer.
The approach of DeSarbo and Ramaswamy (1994) seems 2. Frequency ðF1 Þ; the response percentage over the last 2
comparable to the one we propose. They consider years, defined by the total number of responses over the
combining the segmentation and targeting issue in one past 2 years divided by the number of mailings in the past
framework (their model is in essence a latent class 2 years.
formulation of a probit model). They use the model to 3. Frequency ðF2 Þ; the response percentage over the whole
determine which customers should be included on the period of registration.
mailing list for the next period. However, their model does 4. Monetary ðM1 Þ; the average size of the responses over the
not determine an optimal (in terms of profits) mailing past 2 years.
strategy, as they only model the binary (yes or no) response. 5. Monetary ðM2 Þ; the average size of the responses over
Second, the model has a single-period horizon and does not the whole period of registration.
take future mailings into account.
Two models that do have a multi-period horizon are Note that F1 and M1 explicitly measure (relative) short
Bitran and Mondschein (1996) and Gönül and Shi (1998). term response behavior, whereas F2 and M2 measure long-
Both models determine the optimal (long-term) mailing term response behavior. We have deliberately chosen to
strategy using a Markov decision model. However, the include variables which measure short-term as well as long-
segmentation and the mailing policy decision step are term response behavior in order to rule out the effects of the
considered separately. The state space (or segmentation) is mailing strategy as much as possible.
determined before-hand, and given this segmentation, the For example, suppose individual i always responds to a
optimal mailing policy is determined. Our approach mailing that (s)he receives and (s)he then spends 10 guilders.
considers several segmentations and determines the one The R is always equal to zero, F1 and F2 are equal to one and
that is optimal. M1 and M2 are equal to 10. If the decision rule of the direct
This section describes how our model is operationalized mailer indicates not to send a mailing to i (for instance
in a direct mailing context. First, we look at the because of a budgetary constraint) in year t and also in year
segmentation, and motivate which variables are used for t þ 1; the score on F1 and M1 will have become zero. But R
segmentation. Next, we explain the method that is used to will also be zero, because i has responded to the last mailing
obtain the optimal mailing policy, the Markov decision (s)he received. Now, if the direct mailer does not incorporate
process. Lastly, we describe the search algorithm that long-term measures of response behavior (such as F2 and M2 )
is used to look for different segmentations (Genetic when determining the selection, i probably will not be
Algorithm). selected again, whereas (s)he is a profitable customer (if we
assume the cost of sending a mailing is equal to 1). Hence,
3.1. Step 1—segmentation valuable response behavior might be lost if long-term
response measures are not incorporated.
3.1.1. Bases for segmentation
In direct marketing, the RFM variables are the most 3.1.2. Defining a segmentation
likely candidates as bases for segmentation (Bult and As described in Section 2, a segmentation is defined by
Wittink, 1996; Van den Poel, 2003; Viaene et al., 2001). the breakpoints for the variables. The number of segments
The RFM variables measure consumer response behavior in increases rapidly as the number of breakpoints increases.
three dimensions. The first dimension is Recency, which For instance, if each of the five RFM variables from the
indicates how long it has been since the customer has last previous Section has one breakpoint, there will be (25 ¼ )
responded. The second is Frequency, which provides a 32 segments, as each variable is divided in two parts when
measure of how often the customer has responded to one has one breakpoint. Similarly, if there are two
received mailings. And finally, Monetary Value measures breakpoints for each variable, there will be (35 ¼ ) 243
the amount of money or the number of products that the segments. Hence, the size of the search space remains
customer has spent in response to the mailings. limited, as a reasonable sized state space is already obtained
Various operationalizations of the RFM variables have with a small number of breakpoints.3
been used in the literature. The specific definition usually There is another reason to limit the number of
depends on a number of case specific elements (for instance breakpoints. If there are too many segments, the
catalogs vs. mailings). But, as indicated in Donkers, Jonker, transition probabilities will become difficult to estimate
Paap, and Franses (2001), it is important to formulate the
variables such that they measure response behavior of 3
The segmentation described here can also be seen as a state space
individuals and not (past) mailing behavior of the direct formulation of the Markov model, which will be discussed in Section 2.2.
J.-J. Jonker et al. / Expert Systems with Applications 27 (2004) 159–168 163
as there can be very few observations per transition. This (mean –std.dev.) of the new solution is higher than the
makes the transition probabilities less reliable. Hence, a (mean –std.dev.) of the old solution. In general, boot-
trade-off has to be made between the number of strapping takes a random sample with replacement of
breakpoints ( ¼ segments) and the reliability of the the same size as the original database. Bootstrapping
estimates of the parameters. In order to prevent the enables estimating the distribution of any statistic (Efron
presence of segments which only have a few members, and Tibshirani, 1993, p. 55– 56, 162 – 177). For each
our program combines segments with too few members bootstrap sample the model considered is estimated
with neighboring segments. separately. In our application, we use the procedure to
obtain a more reliable estimate of the average
3.2. Step 2—optimization performance of the selected segmentation scheme
(Hruschka, 2002). This is accomplished by drawing
Some recent models describe this dynamic process by bootstrap samples from the dataset, followed by assign-
means of a Markov decision process (Bitran & ing the customers to the segmentation scheme under
Mondschein, 1996; Gönül & Shi, 1998). These models investigation and optimizing the mailing policy.
describe the migration of an individual customer in terms
of transitions between states. A state records the segment
that the customer belongs to at a decision moment. 3.3. Step 3—local search methods
Migration between the states will occur as a result of the
mailing policy and the response of the customer in a
time period between two decision moments. For each Since its introduction by Holland (1975) genetic
customer the states are recorded at a number of decision algorithms (GA) have shown good results in numerous
moments. Let a be the mailing decision at a decision applications. There is a vast literature on GA (Davis,
moment for all customers who are observed in state s 1991), including studies on its theoretical and practical
ða ¼ 1; …; A; s ¼ 1; …; SÞ: The transition probability to performance and many extensions of the basic algorithm.
observe a migration from state s to state t after mailing For our application we consider the Simple Genetic
decision a is denoted by ps;t ðaÞ ðs; t ¼ 1; …; SÞ: At each Algorithm (SGA) as defined by Goldberg (1989).
decision moment, action a triggers an immediate reward Although we realize that other, possibly more sophisti-
rs ðaÞ for all customers observed in state s at the decision cated GA formulations exist, we feel that SGA is best
moment. This reward is given by the response of the suited for our application because its simplicity will lead
customer till the next decision moment. The direct to a feasible running time of the algorithm. The key
marketing firm however may be more interested in long- feature of GA has been formulated by Balakrishnan and
term profits. We will consider the long-term discounted Jacob (1996, p. 1108) as: “A key feature of GA is that
reward as the objective for mailing optimization. Bitran the search is conducted from a population rather than a
and Mondschein (1996) show that this objective is single point thus increasing the exploratory power of GA.
closely related to lifetime value of the customers. Other In contrast with other gradient search techniques which
long or short term objectives can be used as desired. The need additional knowledge such as the differentiability of
Markov decision model will determine the optimal the function, GA use the objective function or payoff
mailing policy by the well-known value iteration or directly, in that the candidate [solutions] are evaluated
policy iteration algorithm or by linear programming based on the specific objective.”
(Ross, 1983, p. 40). The parameters of the model that The genetic algorithm starts with a population of
have to be estimated before the optimization routine can randomly selected solutions for the segmentation of the
start are the transition probabilities for each action a and customers based on the RFM variables. A solution can
the immediate reward in each state when action a is
be denoted by the specification of the breakpoints for
chosen.
each of these variables. All solutions have two break-
The estimation of the parameters of the Markov
points for every variable, but a breakpoint can have
decision model introduces an estimation error in the
values outside of the range of the variable indicating that
expected profit for each mailing policy. Most critical is
the breakpoint is inactive. We also include the CHAID
correct estimation of the transition probabilities pst ðaÞ:
solution in the starting population.
Sparse data sets are known to obstruct reliable
estimation for the transition probabilities. To account One then evaluates the goodness of each solution with
for these problems in the parameter estimation we use a respect to the current population. This evaluation consists
bootstrap technique (Efron, 2000). More specifically, we of finding the optimal mailing policy for this segmenta-
want to assess the stability of a solution by calculating tion and determining the expected profit of this policy.
the bootstrap mean and standard deviation. In our Solutions that result into a higher profit are given more
setting, we only accept a solution as ‘better’ if it chance to ‘reproduce’ than others. In particular, we draw
outperforms a given solution (means are higher), and if with replacement 2k solutions from the population P
164 J.-J. Jonker et al. / Expert Systems with Applications 27 (2004) 159–168
Parameter Value
Table 4
Best 20 solutions from generation 50
Table 7 5. Discussion
Breakpoints for M1 and M2 for generation 1 and generation 50
identical. This indicates that our method does not converge Bult, J. R., & Wittink, D. R. (1996). Estimating and validating asymmetric
to various different local optima. heterogeneous loss functions applied to health care fund raising.
International Journal of Research in Marketing, 13, 215–226.
However, this study also has some limitations and there Buckinx, W., & Van den Poel, D. (2004). Customer base analysis: partial
remain a number of issues for future research. A limitation defection of behaviorally-loyal clients in a non-contractual FMCG
is that the proposed method requires a couple of years of retail setting. European Journal of Operational Research, in press.
information, otherwise the results will become less reliable. Davis, L. (1991). Handbook of genetic algorithms. New York: Van
Nostrand Reinhold.
There are also a number of elements which could be further
DeSarbo, W. S., & Ramaswamy, V. (1994). Crisp: customer response based
examined. First, one could consider alternative definitions iterative segmentation procedures for response modeling in direct
of a segmentation. In this study, a segmentation was defined marketing. Journal of Direct Marketing, 8(3), 720.
by the ‘breakpoints’. An alternative could be to combine the Donkers, B., Jonker, J. J., Paap, R., & Franses, P. H. B. F (2001). Deriving
scores on the three RFM dimensions into one variable, and target selection rules from endogenously selected samples. ERIM
Report ERS200168MKT, Erasmus University Rotterdam.
then use this new variable to segment the customers. A
Efron, B. (2000). The bootstrap and modern statistics. Journal of the
problem there would be to determine the weights for each American Statistical Association, 95(452), 1293–1296.
dimension. A second issue for future research would be to Efron, B., & Tibshirani, R. J. (1993). Introduction to the bootstrap. New
look at global search methods to determine the optimal York: Chapman and Hall.
segmentation. Frank, R. E., Massy, W. F., & Wind, Y. (1972). Market segmentation.
Englewood Cliffs, NJ: Prentice Hall.
Friedman, J. H., & Fisher, N. I. (1999). Bump hunting in high-dimensional
data. Statistics and Computing, 9(2), 123–143.
Glover, F., & Laguna, M. (1997). Tabu search. Boston, MA: Kluwer
Acknowledgements
Academic.
Goldberg, D. E. (1989). Genetic algorithms in search, Optimization and
The authors gratefully acknowledge a Dutch charitable machine learning. Reading, MA: Addison-Wesley.
organization for providing the data used in this study. Gönül, F., & Shi, M. Z. (1998). Optimal mailing of catalogs: a new
methodology using estimable structural dynamic programming models.
Management Science, 44(9), 1249–1262.
Holland, J. H. (1975). Adaption in natural and artificial systems. Ann Arbor,
References MI: University of Michigan Press.
Hruschka, H. (2002). Market share analysis using semiparametric
attraction models. European Journal of Operational Research,
Baesens, B., Verstraeten, G., Van den Poel, D., Egmont-Petersen, M., Van 138(2), 212 –225.
Kenhove, P., & Vanthienen, J. (2004). Bayesian network classifiers for Laarhoven, P. J. M., & Aarts, E. H. L. (1987). Simulated annealing: Theory
identifying the slope of the customer-lifecycle of long-life customers. and applications. Dordrecht: Reidel.
European Journal of Operational Research, 156(2), 508–523. Ross, S. M. (1983). Introduction to stochastic dynamic programming,
Baesens, B., Viaene, S., Van den Poel, D., Vanthienen, J., & Dedene, G. probability and mathematical statistics, a series of monographs and
(2002). Using bayesian neural networks for repeat purchase modelling textbooks. San Diego, CA: Academic Press.
in direct marketin. European Journal of Operational Research, 138(1), Sasieni, M. W. (1989). Optimal advertising strategies. Marketing Science,
191– 211. 8(4), 358 –370.
Balakrishnan, P. V., & Jacob, V. S. (1996). Genetic algorithms for product Van den Poel, D. (2003). Predicting Mail-Order Repeat Buying: Which
design. Management Science, 42(8), 1105– 1117. Variables Matter?. Tijdschrift voor Economie en Management, 48(3),
Bass, F. M., & Wind, J. (1995). Introduction to the special issue: empirical 371 –403.
generalizations marketing. Marketing Science, 14(3), GI– G6. Van den Poel, D., & Larivière, B. (2004). Customer attrition analysis for
Berger, P. D., & Nasr, N. I. (1998). Customer lifetime value: marketing financial services using proportional hazard models. European Journal
models and applications. Journal of Interactive Marketing, 12(1), of Operational Research, in press.
17–30. Viaene, S., Baesens, B., Van Gestel, T., Suykens, J. A. K., Van den Poel, D.,
Bitran, G. R., & Mondschein, S. V. (1996). Mailing decisions in the catalog Vanthienen, J., De Moor, B., & Dedene, G. (2001). Knowledge
sales industry. Management Science, 42(9), 1364–1381. discovery in a direct marketing case using least squares support vector
Blattberg, R. C., & Deighton, J. (1996). Manage marketing by the customer machines. International Journal of Intelligent Systems, 16(9),
equity test. Harvard Business Review, 136–144. 1023–1036.
Breiman, L., Friedman, J. H., 01shen, R. A., & Stone, C. J. (1984). Wedel, M., & Kamakura, W. A. (2000). Market segmentation: Conceptual
Classification and regression trees. New York: Chapman and Hall. and methodological foundations. Dordrecht: Kluwer Academic.