

Customer Lifetime Value Measurement using Machine Learning

Tarun Rathi
Mathematics and Computing
Department of Mathematics
Indian Institute of Technology (IIT), Kharagpur -721302

Project guide: Dr. V Ravi
Associate Professor, IDRBT
Institute for Development and Research in Banking Technology (IDRBT)
Road No. 1, Castle Hills, Masab Tank, Hyderabad – 500 057

July 8, 2011


Date: July 8, 2011

This is to certify that the project report entitled “Customer Lifetime Value
Measurement using Machine Learning Techniques” submitted by Mr. TARUN
RATHI, a 3rd year student in the Department of Mathematics, enrolled in its 5-year
integrated MSc course in Mathematics and Computing, Indian Institute of
Technology, Kharagpur, is a record of bona fide work carried out by him under
my guidance during the period May 6, 2011 to July 8, 2011 at the Institute for
Development and Research in Banking Technology (IDRBT), Hyderabad.
The project work is a research study which has been successfully completed as
per the set objectives. I observed Mr. TARUN RATHI to be sincere and hardworking,
and to have the capability and aptitude for independent research work.

I wish him every success in his life.

Dr. V Ravi
Associate Professor, IDRBT


Declaration by the candidate
I declare that the summer internship project report entitled, “Customer
Lifetime Value Measurement using Machine Learning Techniques” is my own
work conducted under the supervision of Dr. V Ravi at the Institute for
Development and Research in Banking Technology (IDRBT), Hyderabad. I have put
in 64 days of attendance with my supervisor at IDRBT during the course of the project.
I further declare that, to the best of my knowledge, the report does not contain
any part of any work which has been submitted for the award of any degree
either in this institute or in any other university without proper citation.

Tarun Rathi
III yr. Undergraduate Student
Department of Mathematics
IIT Kharagpur
July 8, 2011

Acknowledgement

I would like to thank Mr. B. Sambamurthy, Director of IDRBT, for giving me this opportunity. I gratefully acknowledge the guidance from Dr. V. Ravi, who helped me sort out all the problems in concept clarifications, and without whose support the project would not have reached its present state. I would also like to thank Mr. Naveen Nekuri for his guidance and sincere help in understanding important concepts and also in the development of the WNN software.

Tarun Rathi
III yr. Undergraduate Student
Department of Mathematics
IIT Kharagpur
July 8, 2011

Abstract:

Customer Lifetime Value (CLV) is an important metric in relationship marketing approaches. There have always been traditional techniques like Recency, Frequency and Monetary Value (RFM), Past Customer Value (PCV) and Share-of-Wallet (SOW) for the segregation of customers into good or bad, but these are not adequate, as they only segment customers based on their past contribution. CLV, on the other hand, calculates the future value of a customer over his or her entire lifetime, which means it takes into account the prospect of a bad customer being good in future and hence profitable for a company or organisation. In this paper, we review the various models and different techniques used in the measurement of CLV. Towards the end, we make a comparison of various machine learning techniques like Classification and Regression Trees (CART), Support Vector Machines (SVM), SVM using Sequential Minimal Optimization (SMO), Additive Regression, the K-Star method, Multilayer Perceptron (MLP) and Wavelet Neural Network (WNN) for the calculation of CLV.

Keywords: Customer lifetime value (CLV), RFM, Past Customer Value (PCV), Share-of-Wallet (SOW), Data mining, machine learning techniques, Artificial Neural Networks (ANN), Support Vector Machines (SVM), Sequential Minimal Optimization (SMO), Additive Regression, K-Star method, Multilayer Perceptron (MLP), Wavelet Neural Network (WNN)

Contents

Certificate
Declaration by the candidate
Acknowledgement
Abstract
1. Introduction
2. Literature Review
   2.1 Aggregate Approach
   2.2 Individual Approach
   2.3 Models and Techniques to calculate CLV
      2.3.1 RFM Models
      2.3.2 Computer Science and Stochastic Models
      2.3.3 Growth/Diffusion Models
      2.3.4 Econometric Models
      2.3.5 Some other Modelling Approaches
3. Estimating Future Customer Value using Machine Learning Techniques
   3.1 Data Description
   3.2 Models and Software Used
      3.2.1 SVM
      3.2.2 Additive Regression and K-Star
      3.2.3 MLP
      3.2.4 WNN
      3.2.5 CART
4. Results and Comparison of Models
5. Conclusion and Directions of future research
References

1. Introduction:

Customer Lifetime Value has become a very important metric in Customer Relationship Management. Various firms are increasingly relying on CLV to manage and measure their business. CLV is a disaggregate metric that can be used to find customers who can be profitable in future and hence can be used to allocate resources accordingly (Kumar and Reinartz, 2006). Besides, the CLV of current and future customers is also a good measure of the overall value of a firm (Gupta, Lehmann and Stuart, 2004).

There have been other measures as well which are fairly good indicators of customer loyalty, like Recency, Frequency and Monetary Value (RFM), Past Customer Value (PCV) and Share-of-Wallet (SOW). In the RFM approach, the customers who are more recent and have a high frequency and total monetary contribution are said to be the best customers. Past Customer Value (PCV) calculates the total previous contribution of a customer, adjusted for the time value of money. Share-of-Wallet (SOW) is another metric to calculate customer loyalty which takes into account the brand preference of a customer: it measures the amount that a customer will spend on a particular brand against other brands. However, it is not always possible to get the details of a customer's spending on other brands, which makes the calculation of SOW a difficult task. A common disadvantage which these models share is the inability to look forward: they do not consider the prospect of a customer being active in future. PCV likewise does not take into account the possibility of a customer being active in future (V. Kumar, 2007). It is very important for a firm to know whether a customer will continue his relationship with it in the future or not. Again, it is possible that a star customer of today may not be the same tomorrow; Malthouse and Blattberg (2005) have given examples of customers who can be good at a certain point and may not be good later, and of a bad customer turning good by a change of job. The calculation of the probability of a customer being active in future is therefore a very important part of CLV calculation, and it is what differentiates CLV from these traditional metrics of customer loyalty. CLV helps firms to understand the behaviour of a customer in future and thus enables them to allocate their resources accordingly.

Customer Lifetime Value is defined as the present value of all future profits obtained from a customer over his or her entire lifetime of relationship with the firm (Berger and Nasr, 1998). A very basic model to calculate the CLV of a customer is (V. Kumar, 2007):

CLV_i = Σ_{t=1}^{T} CM_{i,t} / (1 + d)^t

where i is the customer index, t is the time index, T is the number of time periods considered for estimating CLV, d is the discount rate, and CM_{i,t} is the contribution margin of customer i in period t.
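This basic discounted sum can be sketched in a few lines of Python. The function name, the margin values and the discount rate below are our own illustrative choices, not taken from the report's dataset:

```python
def clv(margins, d):
    """Present value of per-period contribution margins CM_t at discount rate d.

    CLV = sum over t of CM_t / (1 + d)^t, with t starting at 1.
    """
    return sum(cm / (1.0 + d) ** t for t, cm in enumerate(margins, start=1))

# A customer expected to contribute 100 per period for 3 periods, 10% discount rate:
value = clv([100.0, 100.0, 100.0], 0.10)
print(round(value, 2))  # 248.69
```

Note how discounting shrinks later contributions: the third 100 is worth only about 75 today.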



2. Literature Review:

Before going into the details of the various models of CLV, let us first have a look at the various approaches designed for calculating CLV. V. Kumar (2007) has described an individual-level approach and an aggregate-level approach to calculate CLV. He has linked CLV to Customer Equity (CE), which is nothing but the average CLV of a cohort of customers. CLV can thus broadly be approached in 2 ways: a) the Aggregate Approach and b) the Individual Approach. We will present some of the most used models to calculate CLV in the later part of the paper; later on we make conclusions and discuss the areas of future research.

There are various models to calculate the CLV of a customer or a cohort of customers, and various techniques to calculate CLV or the parameters needed to calculate it. Kumar and Janakiraman (2010) have presented various parameters that may be useful in the calculation of CLV, which include the acquisition rate, retention rate, purchase amount, purchase probability, add-on-selling rate, discount rate, referral rate and cost factor. However, depending on the amount of data available and the type of company, all of these parameters may not be required in a single model. Various researchers have used different techniques to calculate these parameters. An overview of the various data mining techniques used to calculate the parameters of CLV has been compiled by Aeron, Kumar and Janakiraman (2010). Hansotia and Wang (1997) used logistic regression. Dwyer (1997) used a customer migration model to take into account the repeat purchase behaviour of customers. Malthouse and Blattberg (2005) used linear regression for predicting future cash flows. Dries and Van den Poel (2009) used quantile regression. Haenlein et al. (2007) used CART and a Markov chain model to calculate CLV. Various behaviour-based models like logit models and multivariate probit models have also been used (Donkers, Verhoef and Jong, 2007), and models which take into account the relationship between various components of CLV, like customer acquisition and retention, are also used (Thomas, 2001). Besides this, many researchers also use models like Pareto/NBD, BG/NBD, CBG-NBD, MBG-NBD, ARIMA, Tobit, Probit, Support Vector Machines, Kohonen Networks etc. to calculate CLV. Malthouse (2009) presents a list of the methods used by academicians and researchers who participated in the Lifetime Value and Customer Equity Modelling Competition. Most of the above-mentioned models are used either to calculate the variables used to predict CLV or to find a relationship between them. In our research, we have used several non-linear techniques, namely Classification and Regression Trees (CART), Support Vector Machines (SVM), SVM using SMO, Additive Regression, the K-Star method, Multilayer Perceptron (MLP) and Wavelet Neural Network (WNN), to calculate CLV; these take care of the relationships between the variables which act as input variables in the prediction of CLV. Further, we also make a comparison of these techniques to find the best-fitted model for the dataset we used.

2.1 Aggregate Approach:

This approach revolves around calculating the Customer Equity (CE) of a firm. Customer Equity is nothing but the average CLV of a cohort of customers, and various researchers have devised different ways to calculate the CE of a firm. Gupta, Lehmann and Stuart (2004) have calculated CE by summing up the CLV of all the customers and taking its average. Berger and Nasr (1998) calculated CLV from the lifetime value of a customer segment, taking into account the rate of retention and the average acquisition cost per customer:

CLV = GC × Σ_{t=0}^{T} r^t / (1 + d)^t − A

where GC is the average gross contribution per customer per period, r is the rate of retention, d is the discount rate, and A is the average acquisition cost per customer.

Kumar and Reinartz (2006) gave a formula for calculating the retention rate for a customer segment as follows:

Retention rate (%) = (No. of customers in the cohort buying in period t / No. of customers in the cohort buying in period t−1) × 100
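A minimal sketch of these two segment-level quantities, the Berger and Nasr retention-based CLV and the Kumar and Reinartz retention rate; all numbers below are invented for illustration:

```python
def retention_rate(buyers_t, buyers_t_minus_1):
    """Percentage of the segment's period t-1 buyers who bought again in period t."""
    return 100.0 * buyers_t / buyers_t_minus_1

def segment_clv(gc, r, d, periods, acq_cost):
    """CLV = GC * sum_{t=0}^{T} r^t / (1+d)^t - A.

    Each period's margin is weighted by the probability r^t that the
    customer is still retained, then discounted; acquisition cost is subtracted.
    """
    return gc * sum(r ** t / (1.0 + d) ** t for t in range(periods + 1)) - acq_cost

print(retention_rate(800, 1000))                          # 80.0
print(round(segment_clv(100.0, 0.8, 0.1, 2, 30.0), 2))    # 195.62
```

The retention term r^t makes the long tail of the sum decay quickly when retention is low, which is why small retention improvements move CLV strongly.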



Projecting the retention rate: the Berger and Nasr model is good enough for calculating the CLV of a segment of customers over a small period of time; however, the fluctuation of the retention rate and the gross contribution margin needs to be taken care of when projecting CLV over longer periods. Kumar and Reinartz (2006) therefore project the retention rate as

r_t = r_max (1 − e^{−rt})

where r_t is the predicted retention rate for a given period of time in future, r_max is the maximum attainable retention rate, given by the firm, and r is the coefficient of retention, calculated as r = (1/t) (ln(r_max) − ln(r_max − r_t)). Taking this into account, they proposed another model which calculates CLV through a profit function over time:

CLV = Σ_t π(t) × [r_t / (1 + d)]^t

where π(t) is the profit function over time, which can be calculated separately.

Blattberg, Getz and Thomas (2001) calculated average CLV, or CE, as the sum of the return on acquisition, return on retention and return on add-on selling across the entire customer base. They summarized the formula as:

CE(t) = Σ_{i=1}^{I} N_{i,t} α_{i,t} (S_{i,t} − c_{i,t}) − Σ_{i=1}^{I} N_{i,t} B_{i,a,t}
        + Σ_{i=1}^{I} N_{i,t} α_{i,t} Σ_{k=1}^{∞} ( Π_{j=1}^{k} ρ_{i,t+j} ) (S_{i,t+k} − c_{i,t+k} − B_{i,r,t+k} − B_{i,AO,t+k}) (1/(1+d))^k

where
CE(t) is the customer equity value for customers acquired at time t,
N_{i,t} is the number of potential customers at time t for segment i,
α_{i,t} is the acquisition probability at time t for segment i,
ρ_{i,t} is the retention probability at time t for a customer in segment i,
B_{i,a,t} is the marketing cost per prospect (N) for acquiring customers for segment i,
B_{i,r,t} is the marketing cost in time period t for retained customers in segment i,
B_{i,AO,t} is the marketing cost in time period t for add-on selling for segment i,
S_{i,t} is the sales of the products/services offered by the firm at time t for segment i,
c_{i,t} is the cost of goods at time t for segment i,
d is the discount rate, I is the number of segments, i is the segment designation and t_0 is the initial time period.

Rust, Lemon and Zeithaml (2004) used a CLV model in which they considered the case where a customer switches between different brands. Here the CLV of customer i to brand j is given as:

CLV_ij = Σ_{t=1}^{T_i} (1 / (1 + d_j))^{t / f_i} × v_{ijt} × π_{ijt} × B_{ijt}

where T_i is the number of purchases customer i makes during the specified time period, d_j is firm j's discount rate, f_i is the average number of purchases customer i makes in a unit time (e.g. per year), v_{ijt} is customer i's expected purchase volume of brand j in purchase t, π_{ijt} is the expected contribution margin per unit of brand j from customer i in purchase t, and B_{ijt} is the probability that customer i buys brand j in purchase t. However, to use this model one needs a customer base which provides information about previous brands purchased, the probability of purchasing different brands, etc.
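The Rust, Lemon and Zeithaml style sum over future purchase occasions can be sketched as follows; the occasion tuples (volume, unit margin, brand-choice probability) and the rates are invented for illustration:

```python
def brand_clv(occasions, d, f):
    """Expected value of a customer to one brand across future purchase occasions.

    For each occasion t: expected volume v * unit margin pi * probability b
    of choosing the brand, discounted by (1 + d)^(t / f) because the customer
    makes f purchases per unit time (so occasion t happens at time t / f).
    """
    return sum(v * pi * b / (1.0 + d) ** (t / f)
               for t, (v, pi, b) in enumerate(occasions, start=1))

# Two hypothetical occasions for a customer who buys 4 times a year:
print(round(brand_clv([(2, 10.0, 0.6), (2, 10.0, 0.5)], 0.10, 4.0), 2))  # 21.25
```

The brand-choice probability b is what distinguishes this model from the single-firm formulas above: value leaks away to competing brands occasion by occasion.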

The Customer Equity (CE) of firm j is then calculated as the mean CLV of all customers across all firms, multiplied by the total number of customers in the market across all brands. This model, however, is quite trivial.

2.2 Individual Approach:

In this approach, CLV is calculated for an individual customer as the sum of cumulated cash flows of a customer over his or her entire lifetime, discounted using the weighted average cost of capital (WACC) (Kumar and George, 2007). The CLV in this case depends on the activity of the customer, that is, his expected number of purchases during the prediction time period, and also on his expected contribution margin. The basic formula for CLV in this approach is:

CLV_i = Σ_t P(Active)_{i,t} × CM_{i,t}

where CLV_i is the lifetime value of customer i, P(Active)_{i,t} is the probability that customer i is active in period t, and CM_{i,t} is the gross contribution margin for customer i in period t. This approach brings to light the need for calculating the probability of a customer being active, P(Active). There are various ways to calculate P(Active). V. Kumar (2007) calculated it as:

P(Active) = (T / N)^n

where n is the number of purchases in the observation period, T is the time elapsed between acquisition and the most recent purchase, and N is the time elapsed between acquisition and the period for which P(Active) needs to be calculated. Besides this, there are various other models and techniques which calculate P(Active) or the expected frequency of purchase, which include Pareto/NBD, BG/NBD, CBG-NBD, MBG-NBD, Tobit, Probit, the generalized gamma distribution, the log-normal distribution etc. Several researchers have used statistically advanced methods to calculate P(Active) or the expected frequency of purchase, and most of them have also taken into account other factors, like channel communication, switching costs, customer characteristics, first contribution margin etc., to make the predictions more accurate.

Venkatesan and Kumar (2004), in their approach to calculate CLV, predicted the customer's purchase frequency based on past purchases. The CLV function in this case is represented as:

CLV_i = Σ_{y=1}^{n̂_i} CM_{i,y} / (1 + r)^{y / frequency_i} − Σ_{l=1}^{n} Σ_m (c_{i,m,l} × x_{i,m,l}) / (1 + r)^l

where CLV_i is the lifetime value of customer i, CM_{i,y} is the contribution margin from customer i in purchase occasion y, r is the discount rate, c_{i,m,l} is the unit marketing cost for customer i in channel m in year l, x_{i,m,l} is the number of contacts to customer i in channel m in year l, frequency_i is the predicted purchase frequency for customer i, n is the number of years to forecast, and n̂_i is the predicted number of purchases made by customer i until the end of the planning period.

Kumar and George (2007) have also proposed an integrated or hybrid approach to calculate CLV. In this approach, an appropriate method is adopted depending on the details available about a customer. If the firm's transaction data and firm-customer interaction data are available, then the individual approach of Venkatesan and Kumar (2004) is adopted. If this data is not available, but segment-level data is available, then the Blattberg, Getz and Thomas (2001) approach is adopted. If size-of-wallet information of customers is not available, but survey data is available, then the Rust, Lemon and Zeithaml (2004) approach is adopted. Kumar and George (2007) have given a detailed discussion of the comparison of these models. They observed that an aggregate approach performs poorly in terms of time to implement and expected benefits, while a disaggregate approach has higher data requirements and more metrics to track. They also concluded that model selection should depend on the requirements of the firm and on which criteria it gives more importance to: for example, one firm may consider the cost involved as an important factor, while another may consider expected profits as the major factor. As we have seen, there are various aggregate and disaggregate approaches to calculate CLV, and the obvious question which one comes across is which model to use. We will come to know more about these in the next part of the paper, where we study the various models and techniques used by researchers to calculate the parameters of CLV or CLV itself.
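The simple P(Active) = (T/N)^n heuristic, and its use as a weight on expected margins, can be sketched as follows; the observation window and margins are invented for illustration:

```python
def p_active(n, T, N):
    """P(Active) = (T/N)^n.

    n purchases observed so far, last purchase at time T after acquisition,
    evaluated at time N after acquisition (T and N in the same time units).
    """
    return (T / N) ** n

def expected_clv(p, margins, d):
    """Weight future discounted margins by the probability the customer is active."""
    return p * sum(cm / (1.0 + d) ** t for t, cm in enumerate(margins, start=1))

# 4 purchases, the last in month 20 of a 24-month observation window:
p = p_active(4, 20, 24)
print(round(p, 3))  # 0.482
```

Note the intuition: the longer the gap since the last purchase (T far below N), or the more frequent the customer used to be (large n), the faster the estimated probability of still being active collapses.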

2.3 Models and Techniques to calculate CLV:

There are various models to calculate CLV. Gupta et al. (2006) have given a good review on modelling CLV. Most of the models calculate the parameters needed to measure CLV using different sub-models and then combine them as a new method to calculate CLV. For example, Fader, Hardie and Lee (2005) captured recency and frequency in one model to calculate the expected number of purchases, and built another model to calculate the monetary value. Reinartz, Thomas and Kumar (2005) captured customer acquisition and retention simultaneously. We will try to use some of these modelling methods in this paper with more examples and understanding.

2.3.1 RFM Models:

RFM models have been in use in direct marketing for more than 30 years. These models are based on three levels of information from customers, namely their Recency, Frequency and Monetary contribution, and they are the most common in industry because of their ease of use. An RFM model divides customers into various segments and then calculates a score for each segment. The key limitation of this approach is that it is a scoring model rather than a CLV model: it does not actually provide a dollar value for each customer. However, Fader, Hardie and Lee (2005) have shown that RFM variables can be used to build a CLV model and that RFM are sufficient statistics for their CLV model. We now present in brief two RFM-based models used to determine CLV.

Weighted RFM Model: Mahboubeh Khajvand and Mohammad Jafar Tarokh (2010) have presented a model for estimating customer future value based on data given by an Iranian bank. They took the raw data from the bank and calculated the recency, frequency and monetary value of each customer. Using clustering techniques like K-means clustering, they segment the data into various groups and calculate the CLV for each cluster using the following formula:

CLV_c = R_c^N × w_R + F_c^N × w_F + M_c^N × w_M

where R_c^N, F_c^N and M_c^N are the normalized recency, frequency and monetary values of cluster c, and w_R, w_F and w_M are the weights of recency, frequency and monetary value, obtained by the AHP method based on expert judgement. Following this, a study of the CLVs of each customer segment was made in order to carry out marketing strategies for each segment.

Mahboubeh Khajvand and Mohammad Jafar Tarokh (2010) also proposed a multiplicative seasonal ARIMA (Auto Regressive Integrated Moving Average) method to calculate CLV, which is a time-series prediction method. The multiplicative seasonal ARIMA(p,d,q)x(P,D,Q)s model, where

p = order of the autoregressive process
d = order of the differencing operator
q = order of the moving average process
P = order of the seasonal autoregressive process
D = order of the seasonal differencing operator
Q = order of the seasonal moving average process

can be represented by:

φ_p(B) Φ_P(B^s) ∇^d ∇_s^D y_t = θ_q(B) Θ_Q(B^s) ε_t

where φ_p(B) is the autoregressive operator, θ_q(B) is the moving average operator, Φ_P(B^s) and Θ_Q(B^s) are their seasonal counterparts, ∇^d is the d-fold differencing operator, which is used to change a non-stationary time series into a stationary one, and ∇_s^D is the D-fold seasonal differencing operator. The main limitation of this model was that it predicted the future value of customers in the next interval only, due to lack of data.

RFM and CLV using iso-value curves: Fader, Hardie and Lee (2005) proposed this model to calculate CLV. They showed that no information other than the RFM characteristics is required to formulate this model. Further, they used the "lost for good" approach, which means that customers who leave the relationship with the firm never come back. It is also assumed that M is independent of R and F, which suggests that the value per transaction can be factored out and we can forecast the flow of future transactions. This model is formulated as:

CLV = margin × revenue/transaction × DET

The calculation of DET (discounted expected transactions) is the most important part of this model: we can rescale this number of discounted expected transactions by a monetary value (a multiplier) to yield a dollar number for each customer. Fader, Hardie and Lee (2005) first of all calculated DET for a customer with observed behaviour (X = x, t_x, T).
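Returning to the weighted RFM model described above, a minimal sketch of the cluster score follows. The min-max normalization and the example weights stand in for the AHP-derived weights of the original model; the cluster values are invented:

```python
def weighted_rfm(recency_days, frequency, monetary, w_r, w_f, w_m):
    """Cluster score R^N * w_R + F^N * w_F + M^N * w_M with min-max normalization."""
    def minmax(vals):
        lo, hi = min(vals), max(vals)
        return [(v - lo) / (hi - lo) for v in vals]
    # Invert recency so that more recent customers (fewer days) score higher.
    r_n = minmax([-r for r in recency_days])
    f_n = minmax(frequency)
    m_n = minmax(monetary)
    return [r * w_r + f * w_f + m * w_m for r, f, m in zip(r_n, f_n, m_n)]

# Three clusters; the weights (e.g. from AHP expert judgement) sum to 1:
scores = weighted_rfm([30, 10, 5], [2, 8, 20], [100, 400, 900], 0.2, 0.3, 0.5)
print([round(s, 3) for s in scores])  # best cluster scores 1.0, worst 0.0
```

Because scores are relative within the customer base, they rank clusters but, as noted above, do not by themselves assign a dollar value to anyone.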

However, according to Blattberg, Getz and Thomas (2001), this calculation of CLV has the following problems: a) we do not know the time horizon for projecting the sales, b) it is not clear which time periods to measure, and c) the expression ignores the specific timing of transactions. It is also not immediately clear which distribution should be used to model the transaction incidence and the transaction size. Hence Fader, Hardie and Lee (2005) used the Pareto/NBD model, with a continuous-time formulation instead of a discrete-time formulation, to compute DET (and thus CLV) over an infinite time horizon. The DET is calculated as:

DET(d | r, α, s, β, X = x, t_x, T) = ( α^r β^s d^{s−1} Γ(r + x + 1) Ψ(s, s; d(β + T)) ) / ( Γ(r) (α + T)^{r+x+1} L(r, α, s, β | X = x, t_x, T) )

where r, α, s, β are the Pareto/NBD parameters, d is the discount rate, Ψ(.) is the confluent hypergeometric function of the second kind, and L(.) is the Pareto/NBD likelihood function. They then added a general model of monetary value to obtain a dollar value of CLV, assuming that a customer's transaction value varies around his or her average transaction value. After checking various distributions, they found that the gamma distribution best fitted their data, and hence calculated the expected average transaction value for a customer with an average spend of m̄_x across x transactions. This monetary value, multiplied with DET, gave the CLV of the customer. Following this, various graphs, also called iso-value curves (CLV-frequency, CLV-recency, CLV-frequency-recency etc.), were drawn to identify customers with different purchase histories but similar CLVs. The key limitation of this model is that it is based on a "non-contractual purchase" setting.

2.3.2 Computer Science and Stochastic Models:

These types of models are primarily based on data mining, machine learning, non-parametric statistics and other approaches that emphasize predictive ability. They include neural network models, decision tree models, spline-based models (Generalized Additive Models (GAM), MARS, projection-pursuit models), Support Vector Machines (SVM) etc. Various researchers have used these techniques to calculate CLV.

Haenlein et al. (2007) have used a model based on CART and first-order Markov chains to calculate CLV. They had data from a retail bank. First of all, they determined the various profitability drivers as predictor variables, together with target variables, in a CART analysis to build a regression tree. This tree helped them to cluster the customer base into a set of homogeneous subgroups. They used these subgroups as discrete states and estimated a transition matrix describing the movements between these states.
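The CART-plus-Markov idea of Haenlein et al. (2007) can be sketched with a small transition matrix; the two states, margins and rates below are invented for illustration:

```python
def markov_clv(p0, P, margins, d, horizon):
    """Discounted expected margin of a customer whose state evolves by the
    first-order transition matrix P (rows sum to 1); p0 is the initial
    distribution over states, margins[j] the per-period margin of state j."""
    state, value = list(p0), 0.0
    for t in range(1, horizon + 1):
        # One Markov step: state' = state @ P
        state = [sum(state[i] * P[i][j] for i in range(len(P)))
                 for j in range(len(P))]
        value += sum(s * m for s, m in zip(state, margins)) / (1.0 + d) ** t
    return value

# Two states: "active" (margin 100) and "churned" (margin 0, absorbing):
P = [[0.8, 0.2],
     [0.0, 1.0]]
print(round(markov_clv([1.0, 0.0], P, [100.0, 0.0], 0.1, 3), 2))  # 164.09
```

The absorbing "churned" state encodes the "lost for good" assumption; adding a nonzero return probability to that row would give the "always a share" variant discussed later.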

It was assumed that client behaviour follows a first-order Markov process and that the transition matrix is stable and constant over time. To estimate the corresponding transition probabilities, they determined the state each customer belonged to at the beginning and end of a predefined time interval T, using decision rules resulting from the CART analysis. In the final step, the CLV of each customer group was determined as the discounted sum of the state-dependent contribution margins, weighted with their corresponding transition probabilities:

CLV_i = Σ_t p_t × CM_i / (1 + d)^t

where p_t is the probability of transition from one state to another, CM_i is the contribution margin for customer i, and d is the discount rate. Finally, a study of the CLVs of each customer segment was made in order to carry out marketing strategies for each segment. This model, however, has some limitations too. The assumption that the transition matrix is stable and constant over time seems inappropriate for long-term forecasts; the possibility of brand switching in customer behaviour is not taken into account; and the behaviour of early periods is not considered, rendering it insignificant.

Malthouse and Blattberg (2005) have used linear regression to calculate CLV. The CLV in this case is related to the predictor variables x_i through some regression function f as

g(CLV_i) = f(x_i) + ε_i

where the ε_i are independent random variables with mean 0 and error variance V(ε_i) = σ², and the invertible function g is a variance-stabilizing transformation. We can consider various regression models for this function: a) linear regression with variance-stabilizing transformations estimated with ordinary least squares; b) linear regression estimated with iteratively re-weighted least squares (IRLS); c) a feedforward neural network estimated using S-Plus version 6.0. Methods like k-fold cross-validation are used to check the extent of correctness of the analysis.

Dries and Van den Poel (2009) have used quantile regression instead of linear regression to calculate CLV. It extends the mean regression model to conditional quantiles of the response variable, like the median, and it provides insights into the effects of the covariates on the conditional CLV distribution that may be missed by the least squares method. In the prediction of the top x-percent of customers, quantile regression is a better method than linear regression: the smaller the top segment of interest, the better the estimate of predictive performance we get.

Genetic Algorithms (GA) are well suited to optimization problems, as they achieve a global optimum with quick convergence, especially for high-dimensional problems. GA have seen varied applications among CLV parameters, like multi-objective optimization (using the Genetic-Pareto Algorithm), customer targeting, cross-selling and feature selection; GA is either used to predict these parameters or to optimize the parameter selection of other techniques like ANN. Aeron and Kumar (2010) have mentioned different approaches for using ANN. The first is the generalized stacking approach used by Hu and Tsoukalas (2003), where an ensemble method is used; there, the data is first divided into three groups: the first group has all situational variables, the second has all demographic variables, and the third has both situational and demographic variables. The other is the hybrid GA/ANN approach of Kim and Street (2004) for customer targeting, where GA searches the exponential space of features and passes one subset of features to the ANN. The ANN extracts predictive information from each subset and learns the patterns; once it finds the data patterns, it is evaluated on a data set and returns metrics to the GA. ANN have also been used to catch non-linear patterns in data, but ANN too is not without limitations. It cannot handle too many variables, so PCA (Principal Component Analysis) and logistic regression are used for selecting the variables to input into the ANN. Besides this, the initial weights in an ANN are decided randomly, which can take a longer time to reach the desired solution, and there is no set rule to find ANN parameters; selection of these parameters is a research area in itself.

Besides GA, other data mining techniques like Decision Trees (DT), Fuzzy Logic and Support Vector Machines (SVM) are also in use, but mostly to calculate CLV metrics like customer churn, acquisition rate, customer targeting and loyalty index. Among DT the most common are C4.5, CHAID, CART and SLIQ. Churn rate in itself is a very vast area of CRM, which can be used as a parameter in the prediction of CLV and many other related models. There are many other techniques and models, like GAM (Generalized Additive Models), MARS (Multivariate Adaptive Regression Splines), hazard functions and logit models, which are used to predict or optimize the various parameters of CLV. These approaches remain little known in the marketing literature and have a lot of scope for further research. There have also been many worldwide competitions and tournaments in which various academics and practitioners combine different models to get the best possible results. The 2008 DMEF CLV Competition was one such competition, in which various researchers and academicians came together to compete on its three tasks; Malthouse (2009) has made a compilation of the various models which were presented in that competition.
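The quantile regression mentioned above minimizes the "check" (pinball) loss rather than squared error, which is what lets it target, say, the top decile of CLV; a minimal sketch with invented data:

```python
def pinball_loss(y_true, y_pred, tau):
    """Quantile-regression check loss at quantile tau.

    Under-predictions (y > q) cost tau * (y - q); over-predictions cost
    (1 - tau) * (q - y). tau = 0.5 recovers (half) the absolute error.
    """
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        total += tau * diff if diff >= 0 else (tau - 1) * diff
    return total / len(y_true)

# At tau = 0.9, under-predicting a high-CLV customer is 9x costlier than over-predicting:
print(round(pinball_loss([100.0, 200.0], [150.0, 150.0], 0.9), 2))  # 25.0
```

Minimizing this loss over a linear model yields the conditional tau-quantile of CLV, exactly the asymmetry that favours quantile regression when only the top x-percent of customers matters.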

Customer Acquisition and Customer Retention are the key inputs for such a type of model.3 Growth/Diffusion Models : These types of models focus on calculating the CLV of current and future customers.3. of customers a firm is likely to acquire in the future (Gupta. and c is acquisition cost per customer. {V is the the no. al (2006) have given a good review on this type of models. 2004). ‰. Forecasting acquisition of future customers can be done in 2 ways : The first approach uses disaggregate customer data and builds models that predict the probability of acquiring a particular customer (Thomas. The expression for forecasting the number of new customers at time t is : . then value of average lost customer is : 2. For eg.3. Lehman and Stuart. i is the discount rate. where L. Blattberg and Fox. m is the margin. Various models relate customer acquisition and retention and come up with new models to calculate CLV. . If the relative proportions of lost customers are L. For example the right censored Tobit Model for CLV (Hansotia and Wang. Econometric models study customer acquisition. r is retention rate.4 Econometric Models : Gupta et. In a banking Industry which has recently acquired a new technology will have some customers who will be reluctant to that change and will be lost. retention and expansion (cross selling or margin) and combine them to calculate CLV. 2. 2004). Diffusion models can also be used to assess the value of a lost customer. The other approach is to use aggregate data and use diffusion or growth to predict the no. they estimated the CE of a firm as : where. of newly acquired customers for a segment k. Ž are parameters of the customer growth curve Using this. We will present the same in brief in this paper with an example of a right censored tobit model by Hansotia and Wang (1997).15 (2009) have made a compilation of the various models which were presented in that competition.

Hansotia and Wang (1997) used a right-censored Tobit model to calculate the lifetime value of customers (or LTV, as it was then called). It is a regression model with right-censored observations and can be estimated by the method of maximum likelihood. The likelihood function, i.e. the probability of observing the sample values, multiplies the normal density of each uncensored observation by the survival probability of each censored one, where S_i = 1 if observation i is uncensored and 0 otherwise, and x_q is the (K+1)-dimensional column vector of profile variables for the qth customer. The equation may also be estimated using the LIFEREG procedure in SAS.

Retention models are broadly classified into two main categories: a) the first considers the "lost for good" approach and uses hazard models to predict the probability of customer defection; b) the second considers the "always a share" approach and typically uses Markov models. Hazard models are again of two types: a) Accelerated Failure Time (AFT) models (Kalbfleisch and Prentice, 1980), and b) Proportional Hazard (PH) models (Levinthal and Fichman, 1988). AFT models take the form ln(t_j) = beta'X_j + sigma*mu_j, where t_j is the purchase duration for customer j and X_j are covariates; different specifications of sigma and mu lead to different models, such as the Weibull or the generalized gamma. PH models instead specify the hazard rate as lambda(t; X) = lambda_0(t) exp(beta'X), and different specifications of the baseline hazard lambda_0(t) give the exponential, Weibull, Gompertz and other models. Gupta et al. (2006) also mention probability models, taken up below, and note that CLV has likewise been treated in the computer science and stochastic-modelling literatures.
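The right-censored likelihood can be made concrete with a small numeric sketch. This is our own illustration (not Hansotia and Wang's code): uncensored observations contribute the normal density of their residual, censored ones the upper-tail probability; the synthetic data and parameter names are assumptions.

```python
import math, random

def tobit_loglik(beta0, beta1, sigma, xs, ys, censored):
    """Right-censored Tobit log-likelihood for y = b0 + b1*x + e,
    e ~ N(0, sigma^2)."""
    ll = 0.0
    for x, y, cens in zip(xs, ys, censored):
        z = (y - beta0 - beta1 * x) / sigma
        if cens:
            # P(latent value > censoring point) = 1 - Phi(z)
            ll += math.log(max(1e-300, 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2)))))
        else:
            # log of the normal density of the residual
            ll += -math.log(sigma) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z
    return ll

random.seed(0)
xs = [random.uniform(0, 10) for _ in range(200)]
latent = [2.0 + 1.5 * x + random.gauss(0, 1) for x in xs]
ys = [min(v, 12.0) for v in latent]            # right-censor at 12
censored = [v > 12.0 for v in latent]
```

Maximizing this function over (b0, b1, sigma) with any numerical optimizer yields the maximum-likelihood estimates; SAS's LIFEREG automates this kind of fit.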
The present value of a customer's revenue (PVR) for the qth customer receiving package j was calculated as the discounted sum of the revenues expected from that customer.

Gupta et al. (2006) made a few assumptions in their review of probability models, for example that the probability of a customer being "alive" can be characterized by various probability distributions, and they take into account the heterogeneity in dropout rates across customers. Various combinations of these assumptions result in models like Pareto/NBD, beta-binomial/beta-geometric (BG/BB), Markov models, etc. Gupta et al. (2006) also mention persistence models, which have been used in some CLV contexts to study the impact of advertising, discounting and product quality on customer equity (Yoo and Hanssens, 2005) and to examine differences in CLV resulting from different customer acquisition methods (Villanueva, Yoo and Hanssens, 2006).

2.3.5 Some Other Modelling Approaches: Donkers et al. (2007) review various CLV modelling approaches for the insurance industry. They group them into two types. The first are relationship-level models, which focus on relationship length and total profit and build directly on the definition of CLV given by Berger and Nasr (1998): CLV_i is the sum over future periods t of Profit_it / (1 + d)^t, where d is a predefined discount rate and Profit_it, for a multiservice industry, is defined as the sum over the J services of Serv_ijt * Usage_ijt * Margin_jt. Here Serv_ijt is a dummy indicating whether customer i purchases service j at time t, Usage_ijt is the amount of the service purchased, Margin_jt is the average profit margin for service j, and J is the number of different services sold. These models include a status quo model, a profit regression model, a Tobit-II model, retention models, univariate and multivariate models, and duration models. The second type are service-level models, which disaggregate a customer's profit into the contribution per service; CLV predictions are then obtained by predicting purchase behaviour at the service level and combining the results of the service-level models to calculate CLV. An overview of the models presented by Donkers et al. (2007), with their mathematical formulations, is given below.
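The Berger and Nasr (1998) definition and the multiservice profit disaggregation translate directly into code. A minimal sketch of our own (function and argument names are assumptions):

```python
def clv(margins, retention, discount):
    """CLV: expected margin m_t in each future period t, weighted by
    the survival probability retention**t, discounted at `discount`."""
    return sum(m * retention ** t / (1 + discount) ** t
               for t, m in enumerate(margins, start=1))

def multiservice_profit(serv, usage, margin):
    """Period profit of one customer over J services:
    sum_j Serv_j * Usage_j * Margin_j (Serv_j is a 0/1 purchase dummy)."""
    return sum(s * u * m for s, u, m in zip(serv, usage, margin))
```

A customer's per-period multiservice profit would feed into clv() as one element of `margins`.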

An overview of relationship-level models: The status quo model assumes that profit simply remains constant over time. The profit regression model aims to predict a customer's annual profit contribution. The Tobit-II model separates the effect of customer defection from the effect on profitability. The probit model is based on customer-specific retention probabilities, as is the bagging model. Retention models are based on segmenting customers over RFM variables. The duration model focuses on the duration of the customer relationship; it only models the ending of a relationship, not the starting of a new one.

An overview of service-level models: These are explained as a choice-model approach and a duration-model approach. The choice-model approach takes as its dependent variable the decision to purchase a service or not, while the duration-model approach focuses on the duration of an existing relationship.

The next part of the paper presents the machine learning approach. A dataset obtained from Microsoft Access 2000, the Northwind Traders database, is adopted to demonstrate it. We have used Classification and Regression Trees (CART), Support Vector Machines (SVM), SVM using SMO, the K-Star method, Additive Regression, Multilayer Perceptron (MLP) and Wavelet Neural Network (WNN) to calculate the future value of customers.

3. Estimating Future Customer Value using Machine Learning Techniques: There are various data mining techniques used in the fields of classification and regression, and the choice of technique depends on the type of data available. In the past, several researchers have used such techniques to determine the metrics of CLV, depending on the model and approach adopted: Hansotia and Wang (1997) used CART and CHAID for customer acquisition, Au et al. (2003) used Genetic Algorithms (GA) for predicting customer churn, and Kim and Street (2004) used ANNs for customer targeting. However, using these techniques to directly predict a customer's future value, and hence CLV, has not been done so far. Most previous approaches to measuring CLV have used two or more models to calculate CLV or to determine the relationship between the various parameters used to determine it. The approach we adopt eliminates this process and allows the software to learn the relationship between the input variables and their weights in calculating CLV; in our case, we use regression to determine the future value of customers in the next prediction period. In the later part of the paper, we compare these models and suggest the best model for calculating CLV, and we end with results and a discussion of future developments in the area of CLV measurement.

3.1 Data Description: A sample Microsoft Access 2000 database, the Northwind Traders database, is adopted to calculate the CLV of customers. The database contains 89 customers with a purchase period of two years, from 1st July 1994 till 30th June 1996. We divided this time frame into four equal half-years and calculated the frequency of purchase and the total monetary contribution in July-December 1994, January-June 1995, July-December 1995 and January-June 1996. We kept the observation period from July 1994 till December 1995 and predicted the expected contribution in the next period, January-June 1996. Seven variables are used in total: six are input (predictor) variables and the seventh, the contribution margin in January-June 1996, is the target variable. The entire dataset is then divided into two parts, a) training and b) testing: we used 65 samples for training and the remaining 24 for testing.
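The half-yearly feature construction described above can be sketched as follows. The transaction tuples and field names are hypothetical (not the actual Northwind schema); the recency scoring of July 1994 as 1 up to December 1995 as 18 follows Table 1.

```python
from datetime import date

# (customer_id, purchase_date, amount) -- hypothetical transactions
transactions = [
    ("ALFKI", date(1994, 8, 3), 440.0),
    ("ALFKI", date(1995, 2, 11), 320.0),
    ("ALFKI", date(1995, 10, 29), 510.0),
    ("BONAP", date(1995, 11, 5), 150.0),
]

HALF_YEARS = [
    ("CM_july-dec94", date(1994, 7, 1), date(1994, 12, 31)),
    ("CM_jan-june95", date(1995, 1, 1), date(1995, 6, 30)),
    ("CM_july-dec95", date(1995, 7, 1), date(1995, 12, 31)),
]

def build_features(transactions):
    """Per customer: contribution margin per observation half-year,
    total purchase frequency, and recency scored by month
    (July 1994 -> 1, December 1995 -> 18)."""
    feats = {}
    for cust, d, amt in transactions:
        f = feats.setdefault(cust, {k: 0.0 for k, _, _ in HALF_YEARS})
        f.setdefault("total frequency", 0)
        for key, start, end in HALF_YEARS:
            if start <= d <= end:
                f[key] += amt
                f["total frequency"] += 1
        score = (d.year - 1994) * 12 + d.month - 6  # july 94 -> 1
        f["Recency-dec95"] = max(f.get("Recency-dec95", 0), score)
    return feats
```

The target column (contribution margin in January-June 1996) would be aggregated the same way from the held-out final half-year.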

Table 1: Description of variables

Type of variable | Variable Name | Description
Input Variable | Recency-dec95 | Recency as a score, with July 1994 scored as 1 and December 1995 as 18
Input Variable | total frequency | The total number of purchases between July 1994 and December 1995
Input Variable | Total duration | The total duration of observation, i.e. from July 1994 till December 1995
Input Variable | CM_july-dec94 | The contribution margin in the period July-December 1994
Input Variable | CM_jan-june95 | The contribution margin in the period January-June 1995
Input Variable | CM_july-dec95 | The contribution margin in the period July-December 1995
Target Variable | output | The contribution margin in the period January-June 1996

3.2 Models and Software used: Knime 2.0, Salford Predictive Miner (SPM), NeuroShell 2 (Release 4.0) and a software by Chauhan et al. (2009), developed at IDRBT, Hyderabad for classification problems with DEWNN, were used for the analysis. In Knime, we used Support Vector Machines (SVM), SVM using SMO, the K-Star method and Additive Regression to learn from the training dataset, with the Weka predictor used for prediction on the testing dataset. In SPM, we used CART to train on the dataset and applied the rules obtained from the training dataset to the testing dataset for prediction. The software developed at IDRBT was used to train the data with a Wavelet Neural Network (WNN) and apply the learnt parameters to the test data, and NeuroShell was used for MLP. A brief description of each technique used for prediction of the target variable follows.

3.2.1 SVM: The SVM is a powerful learning algorithm based on recent advances in statistical learning theory (Vapnik, 1998). SVMs have recently become one of the popular tools for machine learning and data mining, and can perform both classification and regression. They are learning systems that use a hypothesis space of linear functions in a high-dimensional feature space, trained with a learning algorithm from optimization theory that implements a learning bias derived from statistical learning theory (Cristianini & Shawe-Taylor, 2000). SVM uses a linear model to implement non-linear class boundaries by mapping input vectors non-linearly into a high-dimensional feature space using kernels.

The training examples that are closest to the maximum-margin hyperplane are called support vectors; all other training examples are irrelevant for defining the binary class boundaries. The support vectors are conventionally determined by solving a quadratic programming (QP) problem, and are then used to construct an optimal linear separating hyperplane (in pattern recognition) or a linear regression function (in regression) in the feature space. SVMs have the following advantages: (i) they are able to generalize well even if trained with a small number of examples, and (ii) they do not assume prior knowledge of the probability distribution of the underlying dataset.

In our research, we used two SVM learner models in Knime for predictive purposes. First we used the SVM Regression model as the learner function, with the Weka predictor to get the results; this implementation globally replaces all missing values, transforms nominal attributes into binary ones, and normalizes all attributes by default. Here we found the correlation coefficient to be 0.8950, the root mean squared error 3203.57 and the root relative squared error 46.03%. We then replaced the learner function by SMOreg, which uses the sequential minimal optimization (SMO) algorithm for training a support vector regression model; here we found the correlation coefficient to be 0.8889 and the root relative squared error 48.36%.
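To make the kernel idea concrete, here is a small stand-in of our own: kernel ridge regression with an RBF kernel. It shares the implicit high-dimensional feature-space mechanism, though the Weka learners above are trained by QP/SMO rather than by the linear solve used here.

```python
import math

def rbf(x, z, gamma=1.0):
    """RBF kernel: inner product in an implicit high-dimensional space."""
    return math.exp(-gamma * (x - z) ** 2)

def fit_kernel_ridge(xs, ys, lam=1e-3, gamma=1.0):
    """Solve (K + lam*I) alpha = y by Gaussian elimination with
    partial pivoting; alpha plays a role analogous to the dual
    (support-vector) weights."""
    n = len(xs)
    A = [[rbf(xs[i], xs[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = list(ys)
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[p] = A[p], A[col]
        b[col], b[p] = b[p], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    alpha = [0.0] * n
    for r in range(n - 1, -1, -1):
        alpha[r] = (b[r] - sum(A[r][c] * alpha[c] for c in range(r + 1, n))) / A[r][r]
    return alpha

def krr_predict(xs_train, alpha, x, gamma=1.0):
    return sum(a * rbf(xt, x, gamma) for a, xt in zip(alpha, xs_train))

xs = [i / 10.0 for i in range(-20, 21)]       # toy 1-D inputs
ys = [x * x for x in xs]                      # non-linear target
alpha = fit_kernel_ridge(xs, ys)
```

The linear function in kernel space fits the non-linear target x^2 closely, which is the point of the kernel mapping.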
SVM may serve as a sound alternative that combines the advantages of conventional statistical methods, which are more theory-driven and easy to analyze, with those of machine learning methods, which are more data-driven, distribution-free and robust; in fact, SVM is simple enough to be analyzed mathematically. Recently, SVMs have been used in financial applications such as credit rating, time series prediction and insurance claim fraud detection (Vinaykumar et al., 2008).

3.2.2 Additive Regression and K-star: Additive Regression is another classifier used in Weka that enhances the performance of a regression base classifier. Each iteration fits a model to the residuals left by the classifier on the previous iteration, and prediction is accomplished by adding the predictions of all the classifiers. Reducing the shrinkage (learning-rate) parameter helps prevent overfitting and has a smoothing effect, but increases the learning time. K-star, on the other hand, is an instance-based classifier: the class of a test instance is based upon the classes of the training instances similar to it, as determined by some similarity function. It differs from other instance-based learners in that it uses an entropy-based distance function. These experiments are quite similar to what we did with the SVM Regression and SMOreg learners and Weka predictors. In Additive Regression, we found the correlation coefficient to be 0.8884 and the root relative squared error 47.41%. In the case of K-star, we found the correlation coefficient to be 0.9102, the root mean squared error 3062.19 and the root relative squared error 44.98%.
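The residual-fitting loop can be sketched with a one-dimensional decision stump as the (assumed) base learner, a minimal illustration of the same idea rather than Weka's implementation:

```python
def fit_stump(xs, ys):
    """Best single-split regressor: a threshold with left/right means."""
    best = None
    for t in sorted(set(xs))[:-1]:           # last value would leave right side empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - (lm if x <= t else rm)) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def additive_regression(xs, ys, n_rounds=20, shrinkage=0.5):
    """Each round fits a stump to the residuals left by the ensemble
    so far; the prediction is the shrunken sum of all stump outputs."""
    models, residuals = [], list(ys)
    for _ in range(n_rounds):
        stump = fit_stump(xs, residuals)
        models.append(stump)
        residuals = [r - shrinkage * stump(x) for x, r in zip(xs, residuals)]
    return lambda x: sum(shrinkage * m(x) for m in models)
```

A smaller `shrinkage` gives the smoothing, anti-overfitting effect described above, at the cost of more rounds.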

3.2.3 MLP: The Multilayer Perceptron (MLP) is one of the most common neural network structures, as it is simple and effective, and has found a home in a wide assortment of machine learning applications. MLPs are feed-forward neural networks trained with the standard back-propagation algorithm. They are supervised networks, so they require a desired response to be trained; they learn how to transform input data into a desired response, and so are widely used for pattern classification and prediction. With one or two hidden layers, they can approximate virtually any input-output map, and they have been shown to yield accurate predictions in difficult problems (Rumelhart, Hinton, & Williams, 1986, chap. 8). An MLP starts as a network of nodes arranged in three layers (input, hidden and output), with each layer fully connected to the next. The input and output layers serve as nodes to buffer input and output for the model, respectively, and the hidden layer serves to provide a means for input relations to be represented in the output. Before any data are passed to the network, the weights of the nodes are random, which has the effect of making the network much like a newborn's brain: developed but without knowledge. In our research, we used NeuroShell 2 (Release 4.0) to determine the results. For learning purposes, we set the learning rate to 0.1 and the momentum rate to 0.5, with a linear scale function, to get the best results. We found the root relative squared error to be 43.8%, which, as we will see later, was the least among all the methods used.

3.2.4 WNN: The word wavelet is due to Grossmann et al. (1984). Wavelets are a class of functions used to localize a given function in both space and scaling (see the Wavelet entry at http://mathworld.wolfram.com). A family of wavelets can be constructed from a function psi(x), known as the "mother wavelet", which is confined to a finite interval; "daughter wavelets" psi_{a,b}(x) are then formed by translation (b) and dilation (a):

psi_{a,b}(x) = |a|^(-1/2) psi((x - b) / a)

Wavelets were developed independently in the fields of mathematics, quantum physics, electrical engineering and seismic geology, and interchanges between these fields during the last few years have led to many new wavelet applications such as image compression, radar and earthquake prediction; wavelets are especially useful for compressing image data. They have advantages over traditional Fourier methods in analyzing physical situations where the signal contains discontinuities and sharp spikes. In addition to forming an orthogonal basis, wavelets are capable of explicitly representing the behaviour of a function at various resolutions of the input variables. Consequently, multi-resolution analysis has many attractive features for solving engineering problems: in the case of non-uniformly distributed training data, an efficient way of learning is at multiple resolutions, where a wavelet network is first trained to learn the mapping at the coarsest resolution level and, in subsequent stages, is trained to incorporate elements of the mapping at higher and higher resolutions.
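A minimal back-propagation loop with a momentum term can illustrate the training scheme. This is our own sketch of the standard algorithm, not NeuroShell's implementation; the one-dimensional toy data and network size are assumptions.

```python
import math, random

def train_mlp(data, n_hidden=4, lr=0.1, momentum=0.5, epochs=2000):
    """One-hidden-layer MLP (sigmoid hidden units, linear output)
    trained by per-sample back-propagation with momentum.
    `data` is a list of (x, target) pairs."""
    random.seed(1)
    w1 = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(n_hidden)]  # [weight, bias]
    w2 = [random.uniform(-1, 1) for _ in range(n_hidden + 1)]                       # + output bias
    v1 = [[0.0, 0.0] for _ in range(n_hidden)]
    v2 = [0.0] * (n_hidden + 1)
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    for _ in range(epochs):
        for x, t in data:
            h = [sig(w[0] * x + w[1]) for w in w1]
            y = sum(wi * hi for wi, hi in zip(w2, h)) + w2[-1]
            err = y - t
            for j in range(n_hidden):
                d = err * w2[j] * h[j] * (1.0 - h[j])      # back-propagated delta
                v2[j] = -lr * err * h[j] + momentum * v2[j]
                w2[j] += v2[j]
                v1[j][0] = -lr * d * x + momentum * v1[j][0]
                v1[j][1] = -lr * d + momentum * v1[j][1]
                w1[j][0] += v1[j][0]
                w1[j][1] += v1[j][1]
            v2[-1] = -lr * err + momentum * v2[-1]
            w2[-1] += v2[-1]
    def predict(x):
        h = [sig(w[0] * x + w[1]) for w in w1]
        return sum(wi * hi for wi, hi in zip(w2, h)) + w2[-1]
    return predict
```

The momentum term carries a fraction of the previous weight update forward, which damps oscillation during gradient descent.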

The wavelet neural network (WNN) was proposed as a universal tool for functional approximation, and it shows surprising effectiveness in overcoming the poor convergence, or even divergence, encountered in other kinds of neural networks: it can dramatically increase convergence speed (Zhang et al., 2001). The wavelet theory provides useful guidelines for the construction and initialization of such networks, so the training times are significantly reduced (Galvao et al.), with a more meaningful interpretation of the resulting mapping and more efficient training and adaptation of the network compared to conventional methods. Wavelet networks employ activation functions that are dilated and translated versions of a single function f, the "mother wavelet", which is localized in both the space and frequency domains (Becerra, Galvao and Abou-Seada, 2005).

The WNN consists of three layers, an input layer, a hidden layer and an output layer, with each layer fully connected to the nodes in the next layer. The numbers of input and output nodes depend on the numbers of inputs and outputs present in the problem, while the number of hidden nodes is a user-defined parameter (here, any number from 3 to 15). The original training algorithm for a WNN is as follows (Zhang et al., 2001):

1) Specify the number of hidden nodes required. Randomly initialize the dilation parameters a_j, the translation parameters b_j, and the weights of the connections between the input and hidden layers (w_ij) and between the hidden and output layers (W_j).

2) Compute the output value for sample k as

V_k = sum_{j=1..nhn} W_j f( ( sum_{i=1..nin} w_ij x_ki - b_j ) / a_j )    (1)

where nin is the number of input nodes, nhn is the number of hidden nodes and np is the number of training samples. When f(t) is taken as the Morlet mother wavelet, it has the form

f(t) = cos(1.75 t) exp(-t^2 / 2)    (2)

and when taken as the Gaussian wavelet it becomes

f(t) = exp(-t^2)    (3)

Such a WNN is implemented here with the Gaussian wavelet function.
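Formula (1) with the Morlet wavelet (2) can be written out directly. The sketch below covers the forward pass only, with randomly initialized parameters as in step 1; the layer sizes are arbitrary.

```python
import math, random

def morlet(t):
    """Morlet mother wavelet, formula (2)."""
    return math.cos(1.75 * t) * math.exp(-t * t / 2.0)

def wnn_output(x, W, w, a, b):
    """Formula (1): weighted sum over hidden wavelet nodes of the
    dilated (a_j) and translated (b_j) wavelet applied to the
    weighted input sum."""
    out = 0.0
    for j in range(len(W)):
        s = sum(w[j][i] * x[i] for i in range(len(x)))
        out += W[j] * morlet((s - b[j]) / a[j])
    return out

random.seed(0)
nin, nhn = 6, 5                                # 6 inputs, as in Table 1
W = [random.uniform(-1, 1) for _ in range(nhn)]
w = [[random.uniform(-1, 1) for _ in range(nin)] for _ in range(nhn)]
a = [random.uniform(0.5, 2.0) for _ in range(nhn)]
b = [random.uniform(-1, 1) for _ in range(nhn)]
```

Because |f(t)| <= 1 for the Morlet wavelet, the network output is bounded by the sum of the absolute hidden-to-output weights.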

3) Reduce the prediction error by updating W_j, w_ij, a_j and b_j with the gradient-descent rules

Delta W_j(t+1) = -eta * dE/dW_j + alpha * Delta W_j(t)    (4)
Delta w_ij(t+1) = -eta * dE/dw_ij + alpha * Delta w_ij(t)    (5)
Delta a_j(t+1) = -eta * dE/da_j + alpha * Delta a_j(t)    (6)
Delta b_j(t+1) = -eta * dE/db_j + alpha * Delta b_j(t)    (7)

where eta and alpha are the learning and momentum rates respectively, and the error function is taken as

E = [ sum_{k=1..np} (V_k_hat - V_k)^2 / 2 ]^(1/2)    (8)

4) Return to step 2; the process is continued until E satisfies the given error criterion, and the training of the WNN is then complete.

Some problems exist with the original WNN, such as slow convergence, entrapment in local minima and oscillation, and variants such as BFTWNN have been proposed to resolve them (Pan et al., 2008). In our research, we used the software made by Chauhan et al. (2009) for DEWNN (Differential Evolution trained Wavelet Neural Network). The software was initially made for classification purposes; we changed the code from classification to regression and used it on our problem. We set the weight factor to 0.95, the crossover factor to 0.95, the convergence criterion to 0.00001, the population size to 60, the number of hidden nodes to 20, the maximum weight to 102 and the minimum weight to -102 to find the optimum solution. We found the test-set normalized root mean square error to be 0.928441 and the root relative squared error to be 111.2%, which was the highest amongst all the results.

3.2.5 CART: Decision trees form an integral part of machine learning, an important subdiscipline of artificial intelligence. Almost all decision tree algorithms are used for solving classification problems; algorithms like CART, however, solve regression problems as well.

CART (http://www.salford-systems.com) is a robust, easy-to-use decision tree tool that automatically sifts large, complex databases, searching for and isolating significant patterns and relationships, using a combination of exhaustive searches and intensive testing techniques to identify useful tree structures in the data. CART uses recursive partitioning, and the discovered knowledge is used to generate a decision tree, resulting in reliable, easy-to-grasp predictive models in the form of "if-then" rules, which can be used to solve classification or regression problems. Decision tree algorithms induce a binary tree on a given set of training data: each node of the tree contains a binary question (with a yes/no answer) about some feature, and the leaves contain the best prediction based on the training data. A leaf node may be a single member of some class, a probability density function over some discrete class, or a predicted mean value for a continuous feature (or a Gaussian, with mean and standard deviation, for a continuous value). Decision lists are a reduced form in which the answer to each question leads directly to a leaf node. The key elements of a CART analysis are a set of rules for (i) splitting each node in a tree, (ii) deciding when a tree is complete, and (iii) assigning each terminal node to a class outcome (or a predicted value, for regression). CART is powerful because it can deal with incomplete data and with multiple types of features (floats, enumerated sets), both in the input features and the predicted features, and the trees it produces contain humanly readable rules.

In our research, we used Salford Predictive Miner (SPM) to run CART for prediction purposes, training the model using least absolute deviation on the training data. We initially found a root mean squared error of 3367.53, with a total of 5 nodes. However, on growing the tree from 5 nodes to 6, we found better results: the root mean squared error changed to 3107.13, and the root relative squared error is 45.38%, which is very close to MLP. We see from the plots that we got the optimum results on growing the tree from node 5 to node 6.

Figure 1: CART: Plot of relative error vs number of nodes
Figure 2: CART: Plot of percent error vs number of nodes

Figure 3: CART: Tree details showing the splitting rules at each node

It was also seen from the results that, when the optimum number of nodes was kept at 5, 19 out of 24 customers were put in node 1. The RMSE in node 1 was 1846.6 for those 19 customers, which was way less than the total RMSE of 3107.13; the overall increase in error was caused by misclassification, i.e. a high error rate in splitting customers into nodes 4 and 6. In the case of growing the optimum number of nodes to 6, we found that 14 customers were split into node 1, 5 into node 2, 4 into node 3 and 1 into node 6, and the root mean squared error was 2892.89, which is better than the overall error. A summary of the rules, where y is the median predicted contribution of a node, is:

1. if (CM_JULY_DEC95 <= 2278.66 && CM_JAN_JUNE95 <= 3534.06) then y = 1511.7
2. if (CM_JULY_DEC95 <= 2278.66 && CM_JAN_JUNE95 > 3534.06 && CM_JAN_JUNE95 <= 12252.1) then y = 5932.4
3. if (CM_JULY_DEC95 > 2278.66 && CM_JULY_DEC95 <= 2464.75 && CM_JAN_JUNE95 <= 12252.1) then y = 24996
4. if (CM_JULY_DEC95 > 2464.75 && CM_JAN_JUNE95 <= 12252.1 && TOTAL_FREQUENCY <= 14) then y = 6350.25
5. if (CM_JULY_DEC95 > 2464.75 && CM_JAN_JUNE95 <= 12252.1 && TOTAL_FREQUENCY > 14) then y = 19044.64
6. if (CM_JAN_JUNE95 > 12252.1) then y = 38126.25

One obvious conclusion that can be drawn is that CART is more useful than other methods for prediction because of its rules, which give companies the flexibility to decide which customer to put in which node and to choose the optimum number of nodes for their analysis.
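Rules of this form apply directly as a scoring function. The sketch below is based on our reading of the tree output above; the threshold-to-node pairings should be verified against the SPM report.

```python
def cart_predict(cm_jan_june95, cm_july_dec95, total_frequency):
    """Route a customer down the reconstructed tree and return the
    node's median predicted contribution for jan-june 1996."""
    if cm_jan_june95 > 12252.1:
        return 38126.25                      # node 6
    if cm_july_dec95 <= 2278.66:
        return 1511.7 if cm_jan_june95 <= 3534.06 else 5932.4   # nodes 1, 2
    if cm_july_dec95 <= 2464.75:
        return 24996.0                       # node 3
    return 6350.25 if total_frequency <= 14 else 19044.64       # nodes 4, 5
```

This readability is exactly the flexibility advantage of CART noted above: a firm can inspect and adjust each branch.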

4. Results and Comparison of Models: We have used various machine learning techniques to calculate the future value of 24 customers from a sample of 89: SVM, SVM using SMO, Additive Regression and the K-star method in Knime (with the Weka predictor), CART in SPM, WNN, and MLP in NeuroShell. A detailed summary of the final results of the competing models is given in Table 2. We find that MLP gave the least error amongst all these models, but we find CART to be more useful, as it is more helpful in decision-making by setting splitting rules and predicts more accurately for a greater section of the test sample by splitting the sample into various nodes; companies can make better decisions with the help of these rules and the segmentation technique in CART. Because of the small dataset, the error percentages are relatively high; we believe these models will perform better on a large dataset with more input variables, including customer demographics, customer behaviour, etc. One limitation of our study is that we have predicted the future value for only the next time period.

Table 2: Comparison of Competing Models

Model | Correlation coefficient | Root relative squared error
SVMreg | 0.8950 | 46.03%
SMOreg | 0.8889 | 48.36%
Additive Reg. | 0.8884 | 47.41%
K-star | 0.9102 | 44.98%
MLP | NA | 43.8%
CART | NA | 45.38%

Across the models, the root mean squared errors ranged from about 2986 to 3316, and the mean absolute errors from about 2107 to 2514.

Figure 4: Graph of error vs model (MLP, Additive Reg., CART, K-Star, SMOreg, SVMreg)
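The evaluation metrics used throughout (correlation coefficient, root mean squared error, and root relative squared error, i.e. error relative to always predicting the mean) can be computed as follows:

```python
import math

def rmse(actual, pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def rrse(actual, pred):
    """Root relative squared error in %: prediction error relative
    to the naive predictor that always outputs the mean."""
    mean = sum(actual) / len(actual)
    num = sum((a - p) ** 2 for a, p in zip(actual, pred))
    den = sum((a - mean) ** 2 for a in actual)
    return 100.0 * math.sqrt(num / den)

def corr(actual, pred):
    """Pearson correlation coefficient."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(pred) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, pred))
    va = math.sqrt(sum((a - ma) ** 2 for a in actual))
    vp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (va * vp)
```

An RRSE of 100% means the model is no better than predicting the mean, which is why values in the 43-48% range in Table 2 indicate genuine, if modest, predictive power.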

5. Conclusion and Directions for Future Research: In this paper we have presented a review of various approaches and modelling techniques used to determine Customer Lifetime Value. The most common approaches to measuring CLV are the aggregate approach and the individual approach. We have reviewed RFM models, econometric models, diffusion models, Pareto/NBD models, relationship-level and service-level models, and treatments from computer science and stochastic modelling. We have also covered the traditional techniques used to calculate customer loyalty, and found CLV to be a better metric than these measures. The type of approach used to calculate CLV depends on the type of data available and on the kind of result a firm wants. The techniques most frequently applied to determine CLV parameters, or the relationships between them, include decision trees, artificial neural networks, genetic algorithms and support vector machines.

We have also presented a study of measuring CLV by means of various machine learning techniques, using Classification and Regression Trees (CART), Support Vector Machines (SVM), SVM using SMO, the K-Star method, Additive Regression, Multilayer Perceptron (MLP) and Wavelet Neural Network (WNN) to calculate the future value of 24 customers. Emphasis was given to catching the non-linear patterns in the data, which was available for 89 customers with a two-year transaction history. Further, we see that although MLP gives the best result amongst all these models, we would still recommend using CART to calculate CLV, as it segments the customers into various nodes and predicts more precisely for a larger segment of the test-case customers. The splitting rules would also help any firm better understand the classification of a customer into a particular segment, and hence derive more profit from him.

The main limitations of our study have been the projection of customers' future value only till the next period, and some high error rates even amongst the best models, mainly due to the limitations of the dataset. These limitations can be overcome by using datasets that give more information about customer behaviour, demographics, etc.; a large dataset will also allow better predictions, as the training parameters can be estimated better. For better estimation on small datasets, techniques like k-fold cross-validation, which we have not covered, can be taken up as an area of future research. We have also not given much emphasis to feature selection or to the relationships between the input variables used to calculate CLV; producing better results with an integrated approach on this dataset is again an area for future research.

References:

Aeron, H., Kumar, A. and Janakiraman, M. (2010). Application of data mining techniques for customer lifetime value parameters: a review. International Journal of Business Information Systems.

Au, W., Chan, C. and Yao, X. (2003). A novel evolutionary data mining algorithm with applications to churn prediction. IEEE Transactions on Evolutionary Computation, 7(6), 532-545.

Becerra, V. M., Galvao, R. K. H. and Abou-Seada, M. (2005). Neural and wavelet network models for financial distress classification. Data Mining and Knowledge Discovery, 11, 35-55. doi:10.1007/s10618-005-1360-0.

Benoit, D. F. and Van den Poel, D. (2009). Benefits of quantile regression for the analysis of customer lifetime value in a contractual setting: An application in financial services. Expert Systems with Applications, 36(7), 10475-10484.

Berger, P. D. and Nasr, N. I. (1998). Customer lifetime value: Marketing models and applications. Journal of Interactive Marketing, 12(1), 17-30.

Blattberg, R. C., Getz, G. and Thomas, J. S. (2001). Customer Equity: Building and Managing Relationships as Valuable Assets. Boston, MA: Harvard Business School Press.

Chauhan, N., Ravi, V. and Karthik Chandra, D. (2009). Differential evolution trained wavelet neural networks: Application to bankruptcy prediction in banks. Expert Systems with Applications, 36(4), 7659-7665.

Cristianini, N. and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines. Cambridge, UK: Cambridge University Press.

Donkers, B., Verhoef, P. C. and de Jong, M. (2007). Modeling CLV: A test of competing models in the insurance industry. Quantitative Marketing and Economics, 5(2), 163-190.

Dwyer, F. R. (1997). Customer lifetime valuation to support marketing decision making. Journal of Direct Marketing, 11(4), 6-13.

Fader, P. S., Hardie, B. G. S. and Lee, K. L. (2005). RFM and CLV: Using iso-value curves for customer base analysis. Journal of Marketing Research, 42 (November), 415-430.

Gupta, S., Lehmann, D. R. and Stuart, J. A. (2004). Valuing customers. Journal of Marketing Research, 41 (February), 7-18.

Gupta, S., Hanssens, D., Hardie, B., Kahn, W., Kumar, V., Lin, N., Ravishanker, N. and Sriram, S. (2006). Modeling customer lifetime value. Journal of Service Research, 9(2), 139-155.

Haenlein, M., Kaplan, A. M. and Beeser, A. J. (2007). A model to determine customer lifetime value in a retail banking context. European Management Journal, 25(3), 221-234.

Hansotia, B. J. and Wang, P. (1997). Analytical challenges in customer acquisition. Journal of Direct Marketing, 11(2), 7-19.

Hu, M. Y. and Tsoukalas, C. (2003). Explaining consumer choice through neural networks: The stacked generalization approach. European Journal of Operational Research, 146, 650-661.

Kalbfleisch, J. D. and Prentice, R. L. (1980). The Statistical Analysis of Failure Time Data. New York: John Wiley.

Khajvand, M. and Tarokh, M. J. (2011). Estimating customer future value of different customer segments based on adapted RFM model in retail banking context. Procedia Computer Science, 3, 1327-1332.

Kim, Y. S. and Street, W. N. (2004). An intelligent recommendation system for customer targeting: A data mining approach. Decision Support Systems, 37(2), 215-228.

Kumar, V. (2008). Customer Lifetime Value: The Path to Profitability. Foundations and Trends in Marketing, 2(1), 1-96.

Kumar, V. and George, M. (2007). Measuring and maximizing customer equity: a critical analysis. Journal of the Academy of Marketing Science, 35, 157-171.

Kumar, V. and Reinartz, W. (2006). Customer Relationship Management: A Databased Approach. New York: John Wiley.

Levinthal, D. A. and Fichman, M. (1988). Dynamics of interorganizational attachments: Auditor-client relationships. Administrative Science Quarterly, 33, 345-369.

Malthouse, E. C. (2009). The results from the lifetime value and customer equity modeling competition. Journal of Interactive Marketing, 23, 272-275.

Malthouse, E. C. and Blattberg, R. C. (2005). Can we predict customer lifetime value? Journal of Interactive Marketing, 19(1), 2-16.

Reinartz, W., Thomas, J. S. and Kumar, V. (2005). Balancing acquisition and retention resources to maximize customer profitability. Journal of Marketing, 69 (January), 63-79.

Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536.

Rust, R. T., Lemon, K. N. and Zeithaml, V. A. (2004). Return on marketing: Using customer equity to focus marketing strategy. Journal of Marketing, 68 (January), 109-127.

Thomas, J. S. (2001). A methodology for linking customer acquisition to customer retention. Journal of Marketing Research, 38(2), 262-268.

Thomas, J. S., Blattberg, R. C. and Fox, E. J. (2004). Recapturing lost customers. Journal of Marketing Research, 41 (February), 31-45.

Vapnik, V. (1998). Statistical Learning Theory. New York: Wiley.

Venkatesan, R. and Kumar, V. (2004). A customer lifetime value framework for customer selection and resource allocation strategy. Journal of Marketing, 68 (October), 106-125.

Villanueva, J., Yoo, S. and Hanssens, D. M. (2008). The impact of marketing-induced vs. word-of-mouth customer acquisition on customer equity. Journal of Marketing Research, February 2008.

Vinay Kumar, K., Ravi, V., Mahil Carr and Raj Kiran, N. (2008). Software development cost estimation using wavelet neural networks. Journal of Systems and Software, 81(11), 1853-1867.

Yoo, S. and Hanssens, D. M. (2005). Modeling the sales and customer equity effects of the marketing mix. Working paper, Anderson School of Management, University of California, Los Angeles.

Zhang, Q. (1997). Using wavelet network in nonparametric estimation. IEEE Transactions on Neural Networks, 8(2), 227-236.