Churn Management: A Review of Techniques and Approaches for Detecting and Managing Churn

Satyam Computer Services Ltd.
#14 Langford Avenue, Lalbagh Road, Bangalore 560 025, India

September 2004
Abstract

In recent times, customer churn in the telecom sector has assumed great significance. The significant increase in the cost of acquiring new customers is forcing service providers to focus their efforts on effective churn management. Various data mining techniques have been suggested for the prediction of churn. These mainly depend on extracting behavior patterns of customers from historical data and identifying the patterns that are associated with possible churners. The reported techniques for building churn prediction models are based on approaches such as decision trees, neural networks, logistic regression, and rule-based learning. Every approach has advantages and disadvantages in terms of ease of implementation, data requirements, and ease of interpretation of the results, and some of these approaches have been reported to be specifically suitable for churn prediction in the telecom sector. This paper reviews the techniques based on these approaches for churn prediction.

1 Introduction

Until recently, telecom service providers enjoyed the benefits of a rapidly growing subscriber base. With more and more new service providers entering the scene, however, retaining existing customers has assumed paramount importance. Churn, though not a new problem, has assumed significant importance these days, mainly due to the increasing cost of acquiring a new customer and ever-increasing churn rates. Although subscriber bases are growing day by day, the cost of customer acquisition is also progressively increasing. Ever-increasing competition among service providers and growing customer expectations are the main reasons for increasing churn.

The attractions for a customer to switch from an existing service provider can be several: competitive price offerings, discounts and incentives, or better services in terms of quality or value addition. One of the recent initiatives, Wireless Local Number Portability, has further pushed up churn rates in an already stressed telecom segment; studies have indicated that this will have an especially adverse effect on churn among medium-size business customers. In addition to external attractions from competitors, factors such as a customer's lack of understanding of the service scheme, unfulfilled privacy promises, and failure by the service provider to fulfill SLAs all lead to churn. Service providers need to be vigilant to the market initiatives of competitors and respond proactively to contain possible churn of their customers. This paper presents a review of churn management in terms of the available techniques and approaches for detecting and managing churn.
2 Churn Management

Churn is an unavoidable phenomenon which needs to be managed to minimize the potential losses to the business. Though there may be different reasons for a customer to churn, there are some basic factors (SLAs, privacy promises, understanding of the service scheme) that, when unattended, lead to churn. Fulfilling the agreed SLAs, addressing privacy issues with high importance, and proper communication of service offerings can prevent customer churn to a large extent. Table-1 illustrates some of the major factors influencing churn.

Factor                                 Nature of data required for prediction   Importance
Call quality                           Network                                  21%
Pricing options                        Market, Billing                          18%
Corporate Capability                   Customer Service                         17%
Customer Service                       Customer Service                         17%
Credibility/Customer Communications    Customer Service                         10%
Roaming/Coverage                       Network                                   7%
Handset                                Application                               4%
Billing                                Billing                                   3%
Cost of Roaming                        Market, Billing                           3%

Table-1. Factors influencing subscriber satisfaction

Attractive service offerings and competitive pricing are only prerequisites for retaining a customer; in reality, in spite of these, customers do churn. The positive fact is that a large part of the possible churn can be prevented by timely detection and effective retention efforts. The damage-control exercise that follows soon after a customer declares his intention to leave generally yields poor results in comparison with proactive retention efforts based on proper churn prediction. A proactive identification of churners based on prediction models gives a service provider ample opportunity to design and deploy retention efforts, and a proper insight into the reasons behind churn is essential to design those efforts effectively. Therefore, the effectiveness of a prediction model depends not only on its accuracy in predicting a churner but also on how well its results can be interpreted for inferring the possible reasons for churn.
3 Churn Prediction

Churn prediction deals with the identification of customers who are likely to churn in the near future. The basis for such identification tasks is the historical data that contains information about past churners. A behavioral comparison is made between past churners and existing customers, and those customers for whom the comparison suggests similarity are chosen as likely churners.

Service providers maintain huge volumes of data that contain details of customer transactions. Broadly, the data may be categorized as personal and transactional. The personal category includes attributes that represent customer demographic data such as geographic area, age, income, credit rating, and family information. The transactional category includes attributes describing various statistics such as duration of calls, types of calls, and call failures.

Although huge databases are available with service providers, they pose quite a few challenges for usability in churn prediction. The size of the data is generally very large, and this is compounded by the high dimensionality found in such data; exploration of such a large, high-dimensional data set is a complex and quite challenging task. In most dimensionality reduction techniques, an attribute is either selected or ignored for prediction based on its significance, and the significance of an attribute for prediction is evaluated based on domain knowledge; the reduction in dimensionality is thus traded off against possible information loss. There is also the problem of missing values in such large datasets. Prediction accuracy is affected by the inclusion of an attribute with a lot of missing values or by the exclusion of an important attribute, so correct decisions must be made for good prediction results. Providing a good estimate of the missing values is another key issue for good churn prediction. Missing values can be dealt with either at the attribute level or at the record level: at the attribute level, an attribute with too many missing values may be excluded from prediction altogether, while in the record-level correction approach the missing values are replaced with a reasonable estimate based on interpolation methods.

4 Types of Churn Prediction

Most of the early models proposed for churn prediction aimed at classifying a customer as a possible churner or non-churner. Although these models had the advantage of being simple and robust with respect to defects in the input data, they possessed serious limitations for interpreting the reasons behind churn. Subsequently, models were developed to predict churn in terms of the likelihood of churn instead of a two-class classification. These models predict the likelihood of churn for each customer, thus differentiating churners as more probable and less probable; this aids in strategizing the deployment of the limited resources available for retention efforts by choosing the more probable churners. As mentioned earlier, an appropriate response to churn detection requires that the reasons for churn be inferred correctly; the current focus is therefore on developing models that not only predict churn likelihood but also provide a clear insight into the possible reasons for the prediction.

The performance of a prediction model is evaluated primarily in terms of the accuracy of its predictions. Table-2 illustrates a scheme for the performance evaluation of a model.
True positives (churners predicted as churners) and true negatives (non-churners predicted as non-churners) represent the correct classifications made by a model; false positives (non-churners predicted as churners) and false negatives (churners predicted as non-churners) are the misclassifications.

                       Actual
Predicted         Churners          Non-churners
Churners          True positive     False positive
Non-churners      False negative    True negative

Table-2. Churn prediction category matrix

Two terms, precision and recall, defined below, are also important in the evaluation of a model. Precision represents the fraction of actual churners amongst the predicted churners:

Precision = true positives / (true positives + false positives)

Having good precision alone is not sufficient for a good model, because the model could have captured only a fraction of the actual churners and missed out on a lot of them. It is important to capture this information too, and it is represented as recall. Recall represents the fraction of actual churners captured:

Recall = true positives / (true positives + false negatives)

It is desirable that the model capture as many churners as possible while minimizing the false positives amongst its predictions. By predicting a lot of customers as churners, we can achieve a higher recall value, possibly by compromising precision; it is indeed a challenge to develop a model that has both good precision and high recall.

In the case of likelihood prediction models, performance evaluation is not straightforward in terms of precision and recall; for these models, the concept of the lift curve is used for evaluating performance.
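To make the two measures concrete, the cell counts of Table-2 can be plugged into the formulas directly. The short Python sketch below is ours; the function name and the example counts are illustrative, not taken from any study cited here:

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from confusion-matrix counts.

    tp: churners predicted as churners (true positives)
    fp: non-churners predicted as churners (false positives)
    fn: churners predicted as non-churners (false negatives)
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Illustrative counts: 40 churners caught, 10 false alarms, 20 churners missed.
p, r = precision_recall(tp=40, fp=10, fn=20)
print(p)  # 0.8 -> 80% of the predicted churners actually churned
print(r)  # 0.666... -> two-thirds of the actual churners were captured
```

The example also shows the tension described in the text: predicting more customers as churners would raise recall while the extra false alarms pull precision down.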
Once the churn likelihood f(x) of each customer is determined, the customers can be ordered on decreasing likelihood of churn:

f(x_1) ≥ f(x_2) ≥ … ≥ f(x_n)

The lift in the top 10% of the predicted churners is used as a performance measure of the model and is referred to as the Top Decile Lift: it measures the proportion of churners in the top 10% of the predicted churners divided by the proportion of churners in the entire data set. The actual proportion of churn in the top 10% of the predicted churners is given by:

α_10% = (1/n) Σ_{i=1}^{n} E[y_i = 1]
In the formula above, E[y_i = 1] represents the event of a true positive, E[y_i = 0] represents the event of a false positive, and n represents the number of observations in the top 10% of most likely churners. The proportion of churn across the entire data set is given by:

α = (1/N) Σ_{i=1}^{N} E[y_i = 1]

The Lift in Top Decile is then calculated as the ratio of the predicted churn rate to the actual churn rate:

TopDecileLift = α_10% / α

A high lift value indicates that the model has captured a lot of churners in the top decile; for example, a lift value of 2 indicates that the proportion of churners in the top decile is twice that of the entire population. Lift in Top Decile is one of the commonly used criteria for model evaluation, especially in cases where it is not feasible for a service provider to choose a large fraction of the population for the deployment of retention efforts; accordingly, the customers in the top decile are chosen for customer retention. Lift is, however, a criterion of the model's performance only in the chosen decile: the accuracy of a model that performs well in the top decile may go down if we include the top two or three deciles. Some models are likely to be very good at identifying the top likely churners but not as good at identifying mid-range churners, so there is a need for measures that evaluate a model over the entire range. Measures like the area under the lift curve give good insight into the performance of the model over the entire spectrum.
Figure-1 illustrates a typical lift curve. Customers ordered on churn likelihood are placed on the x-axis, and the cumulative percentage of churners captured forms the y-axis; the random curve represents a uniform distribution of churners over the entire customer population. The dotted line shows that 60% of the actual churners are captured in the top 20% of the predicted churners: if a service provider chooses to target 20% of the customer population, it can expect to reach 60% of the actual churners.

[Figure-1. Typical illustration of a lift curve: the model curve rises steeply above the diagonal random curve; the shaded region is the area between the model's cumulative lift curve and the random curve.]

The area between the model's cumulative lift curve and the random curve is measured in terms of the Gini coefficient: the greater the area, the better the performance of the model over the range. The fraction of all customers with churn likelihood above a threshold f(x_T) is:

α_T = (1/N) Σ_{i=1}^{N} E[f(x_i) > f(x_T)]

Similarly, the fraction of all actual churners with churn likelihood above the threshold f(x_T) is:

α̂_T = (1/N_C) Σ_{i=1}^{N} E[f(x_i) > f(x_T), y_i = 1]

where N is the total number of customers and N_C is the total number of actual churners in the entire data set. The Gini coefficient is then given by:

GiniCoefficient = (2/N) Σ_{T=1}^{N} (α̂_T − α_T)
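The top-decile lift and Gini computations described above can be sketched in Python as follows. The ranking-by-score loop mirrors the formulas, while the toy scores and churn labels are invented purely for illustration:

```python
def top_decile_lift(scores, churned):
    """Top-decile lift: churn rate among the top 10% of customers ranked by
    predicted churn likelihood, divided by the overall churn rate."""
    ranked = sorted(zip(scores, churned), key=lambda t: t[0], reverse=True)
    n = max(1, len(ranked) // 10)                 # size of the top decile
    top_rate = sum(c for _, c in ranked[:n]) / n
    overall_rate = sum(churned) / len(churned)
    return top_rate / overall_rate

def gini_coefficient(scores, churned):
    """(2/N) * sum over thresholds of (alpha_hat_T - alpha_T): the area
    between the model's cumulative lift curve and the random curve."""
    ranked = sorted(zip(scores, churned), key=lambda t: t[0], reverse=True)
    n_total, n_churn = len(ranked), sum(churned)
    g, seen, seen_churn = 0.0, 0, 0
    for _, c in ranked:
        seen += 1
        seen_churn += c
        alpha_t = seen / n_total            # fraction of customers above threshold
        alpha_hat_t = seen_churn / n_churn  # fraction of churners above threshold
        g += alpha_hat_t - alpha_t
    return 2 * g / n_total

# Toy data: 10 customers; the two with the highest scores both churned.
scores  = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
churned = [1,    1,   0,   0,   0,   0,   0,   0,   0,   0]
print(top_decile_lift(scores, churned))  # 5.0: top decile is 100% churn vs 20% overall
```

For this perfectly ranked toy data the Gini coefficient comes out at 0.8, close to its maximum, reflecting a model curve far above the random diagonal.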
5 Techniques for Modeling Churn Prediction

Several techniques have been applied to derive models for churn prediction, the basic principles common to all of them being machine learning principles. Decision trees, logistic regression, and neural networks are some of the popular techniques that have been applied for churn prediction. Recently, models based on evolutionary learning of rules have been reported to be highly successful in churn prediction. In this section we briefly describe each of these models.

5.1 Decision Trees

Decision trees are primarily used in classification tasks. Given the training data, the conditions that hold for churners and non-churners are learned and expressed in the form of a tree. Each node in a decision tree is a test condition, and the branching from the node is based on the value of the attribute that is tested. The tree represents a collection of multiple rule sets, one rule set holding true for every instance of a customer record. Given a customer record, classification is done by traversing the tree, testing the value of an attribute at each node and branching accordingly. This process is repeated until a leaf node is reached, and the label of the leaf node (churner or non-churner) is assigned to the customer record under evaluation. A typical decision tree is illustrated below:

Age > 50?
|-- Yes: International calls < 5?
|        |-- Yes: Churner
|        `-- No:  Non-churner
`-- No:  Monthly minutes > 2000?
         |-- Yes: Non-churner
         `-- No:  Churner

Figure-2. Illustration of a typical decision tree

Decision trees are used primarily because of their simplicity and ease of interpretation, though there is a criticism that they are not suitable for capturing complex and non-linear relationships between attributes. The accuracy of decision trees is high; however, the training data requirements are also high. Recently, a model for churn prediction was developed using decision trees: an ensemble of decision trees was able to capture nearly 50% of the churners in its top decile on a particular test data set. Another comparative study reveals that decision trees perform reasonably well and are highly useful in interpreting the results.
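The traversal just described can be sketched for the tree of Figure-2 as nested tests. The dictionary-based record format and the attribute keys below are our own illustrative choices:

```python
def classify(customer):
    """Traverse the Figure-2 decision tree: test one attribute at each node,
    branch on its value, and return the label of the leaf that is reached."""
    if customer["age"] > 50:
        # Left subtree: for older customers, few international calls indicate churn.
        return "Churner" if customer["intl_calls"] < 5 else "Non-churner"
    else:
        # Right subtree: for younger customers, heavy monthly usage indicates retention.
        return "Non-churner" if customer["monthly_minutes"] > 2000 else "Churner"

print(classify({"age": 60, "intl_calls": 2, "monthly_minutes": 900}))   # Churner
print(classify({"age": 30, "intl_calls": 8, "monthly_minutes": 2500}))  # Non-churner
```

Each root-to-leaf path corresponds to one rule set of the kind the text mentions, e.g. Age > 50 AND International calls < 5 implies Churner.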
5.2 Logistic Regression

The prediction task involves the identification of a customer as a churner or non-churner. Since the predicted attribute is associated with only two values, logistic regression techniques are suitable for such tasks: linear regression models are useful for predicting continuous-valued attributes, whereas logistic regression models are suitable for binary attributes. Logistic regression is a modified form of linear regression that obtains a discrete value for the dependent variable; the logistic regression model is simply a non-linear transformation of a linear regression model. This transformation mainly overcomes the limitation of linear regression, which tends to give continuous probability values, often greater than 1 or less than 0. The "logistic" distribution is an S-shaped distribution function similar to the standard normal distribution, and the standard representation of logistic regression is referred to as the logit function. The estimated likelihood of churn is represented as:

P = 1 / [1 + exp(−T)], where T = a + BX

where a represents a constant term, X represents the vector of predictor attributes, and B represents the coefficient vector for the predictor attributes. As T grows large (towards +∞), the exponential term becomes negligible and the probability comes closer to 1. When T becomes very small (towards −∞), the probability of churn tends to 0. When T equals zero, the likelihood is 0.5: it is equiprobable that the customer is a churner or a non-churner. Figure-3 illustrates the comparison between linear and logistic regression.

[Figure-3. Linear regression and logistic regression models: the logistic S-curve keeps the estimate E between 0 and 1, while the linear fit does not.]

Logistic regression models are good at modeling linear relationships between the predictor attributes and aid in determining the predictor attribute values. This technique was reported to yield an accuracy of 92%, and such models were found to perform better when compared with some other models.
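The logit formula can be evaluated directly. In the sketch below, only the functional form P = 1/(1 + exp(−T)) with T = a + B·X comes from the text; the coefficient values are hypothetical and not fitted to any data:

```python
import math

def churn_likelihood(x, coeffs, intercept):
    """Logit model: P = 1 / (1 + exp(-T)) with T = a + B.X."""
    t = intercept + sum(b * xi for b, xi in zip(coeffs, x))
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical coefficients, e.g. for (complaint count, monthly minutes).
coeffs, intercept = [0.8, -0.002], -1.0

print(churn_likelihood([0, 500], coeffs, intercept))  # T = -2 -> ~0.12
print(churn_likelihood([5, 500], coeffs, intercept))  # T =  2 -> ~0.88
print(churn_likelihood([0, 0], coeffs, 0.0))          # T =  0 -> exactly 0.5
```

The three calls reproduce the limiting behavior described above: large positive T pushes the probability towards 1, large negative T towards 0, and T = 0 gives the equiprobable boundary of 0.5.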
5.3 Neural Networks

Neural networks have been applied to various prediction tasks wherein the primary goal was prediction and lesser importance was given to model understanding. They can be applied to a variety of target functions (both discrete and continuous) and in a number of scenarios, and they have also been applied in the area of churn prediction. The basic idea is that each attribute is associated with a weight, and combinations of weighted attributes participate in the prediction task. Given a customer record and the set of predictor attributes, a neural network computes a combination of these inputs and outputs either a 1 or a 0, associating churner or non-churner status with the customer record. The weights are constantly updated during learning to model the correct "effect" of each attribute.

Models employing neural networks can learn complex relationships amongst the predictor attributes and accurately predict churn. A comparative study reports that neural networks had good recall with high prediction accuracy, and fewer misclassifications than the other models evaluated. Another independent evaluation also suggests that neural networks are superior in performance in comparison with other models. However, neural networks need a large volume of training data to arrive at reasonable weights for the predictor attributes, and it also takes a lot of time to arrive at the correct weights. Neural networks have been categorized as a "black box" model because of the difficulty of expressing the semantics of the relationships arrived at; this is one of the reasons why they are hard to interpret, and they do not help in explaining the reasons for churn. In spite of these drawbacks, the accuracy of neural networks outweighs the disadvantages. Logistic regression and decision trees are good at explaining the reasons for churn, whereas neural networks are superior in churn prediction; a possible way of combining the merits of these techniques is to use logistic regression and decision trees for explaining churn behavior while exploiting neural networks for making the actual churn predictions.

5.4 Evolutionary Approach

The necessity of precise interpretation of results has motivated researchers to explore newer models for churn prediction: models that not only have superior performance in terms of prediction accuracy but also provide greater insight into the possible reasons for churn. Rule-based approaches have excellent capability to express relations, and such approaches are quite popular in the context of prediction tasks. A recent work has proposed churn models based on evolutionary learning of rules. The number of terms (conditions) in a rule determines its order, and the terms are joined using connectives to form a rule. A simple second-order rule is shown below:

area_code = 91 ∧ subscription_length ≥ 1 ⇒ Churn = True

The main idea is to iteratively and progressively generate rules that contain m conditions by combining the rules that contain m−1 conditions.
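As a minimal sketch of the weighted-combination idea (not of any specific churn model reported above), the forward pass of a one-hidden-layer network can be written as follows. The weights here are fixed by hand purely for illustration, whereas a real network would learn them from training data:

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def nn_predict(x, hidden_w, out_w):
    """One-hidden-layer forward pass: each hidden unit forms a weighted
    combination of the input attributes; the output unit combines the hidden
    activations and thresholds at 0.5 to emit 1 (churner) or 0 (non-churner)."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in hidden_w]
    out = sigmoid(sum(w * h for w, h in zip(out_w, hidden)))
    return 1 if out >= 0.5 else 0

# Hand-picked illustrative weights; training would adjust these iteratively
# to model the correct "effect" of each attribute.
hidden_w = [[2.0, -1.0], [-1.5, 2.5]]
out_w = [3.0, -3.0]

print(nn_predict([1.0, 0.0], hidden_w, out_w))  # 1 (churner)
print(nn_predict([0.0, 1.0], hidden_w, out_w))  # 0 (non-churner)
```

Note that nothing in the weight matrices explains *why* a record was labeled a churner, which is exactly the black-box limitation the text describes.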
To obtain the rules of order m (in the (m−1)-th iteration), the algorithm generates an initial rule set of order m from the rule set of order m−1 by randomly combining them; this combination step can be done using the evolutionary technique of crossover. The rules are then tested for interestingness, and the rules that do not satisfy this test are pruned. With every rule, the algorithm associates a measure that represents the evidence of truth derived for that rule from the training data. The rules that match the interestingness criteria are output, and this forms the basis for the prediction task. The process is continued until all interesting rules of order m are identified, and the algorithm terminates when there are no more interesting rules of higher order. A sample set of final rules could be:

subscriber_type = individual ∧ area_code = 91 ∧ subscription_length ≥ 1 ⇒ Churn = True
area = rural ∧ mean_monthly_calls ≤ 45 ∧ mean_monthly_income ≤ 3000 ⇒ Churn = True

The performance of the model was reported to be substantially better when compared with neural networks and decision trees. Though the execution speed of decision trees could not be matched, the algorithm obtained a slightly better lift than the other techniques, and the model also explains the reasons for churn clearly in the form of rules.

5.5 Other Approaches

Churn prediction being an important and complex task, other approaches have also been applied, among them the Bayesian approach and the self-organizing map (SOM) based approach. A naive-Bayesian model constructed for churn prediction was reported to have an accuracy of 68% and also helped in identifying the top few attributes related to churn. In another approach, self-organizing maps and U-Matrix techniques were used, with the main focus on understanding the reasons for churn. The approach identified groups amongst customers and summarized the characteristics of the groups using rules; identification of groups leads to an understanding of group characteristics, which in turn helps in identifying churn characteristics. A classifier was constructed out of this, and the results suggest that its accuracy was more than 90%, indicating the usefulness of the approach in making correct predictions.

6 Lifetime Value of a Customer and its Role in Churn Management

Services for customers differ on the customers' needs and the revenues that they generate. In the context of churn management, it is advisable to target for retention those customers who are likely to bring in more revenue; the lifetime value of a customer (LVC) aids in such a decision-making process. Lifetime value is one of the ways to model the revenues associated with a customer: it is the net revenue a service provider is likely to get over the period of the customer's stay. It is interesting to note that the average lifetime value of an existing customer is inversely proportional to the churn likelihood of that customer.
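Rules of the kind shown above can be applied to customer records as simple conjunctions, a customer being flagged when every condition of at least one rule holds. The sketch below mirrors the two sample rules; the record format and helper names are our own:

```python
# Each rule is a list of (attribute, test) conditions joined by AND;
# a customer is predicted to churn if any rule fires.
RULES = [
    [("subscriber_type",     lambda v: v == "individual"),
     ("area_code",           lambda v: v == 91),
     ("subscription_length", lambda v: v >= 1)],
    [("area",                lambda v: v == "rural"),
     ("mean_monthly_calls",  lambda v: v <= 45),
     ("mean_monthly_income", lambda v: v <= 3000)],
]

def predict_churn(customer, rules=RULES):
    """Return True if every condition of at least one rule holds."""
    return any(all(test(customer[attr]) for attr, test in rule) for rule in rules)

c1 = {"subscriber_type": "individual", "area_code": 91, "subscription_length": 2,
      "area": "urban", "mean_monthly_calls": 300, "mean_monthly_income": 9000}
print(predict_churn(c1))  # True: the first rule fires
```

Because the fired rule can be reported back verbatim, this representation explains its predictions in a way a weight matrix cannot.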
A decision by the customer to stay back is coupled with an increment of his lifetime value: when the churn likelihood goes down, the customer's lifetime value increases. Let us consider a scenario where the prediction model generates a likelihood of churn for every customer. The customers can then be ordered on this likelihood and chosen for customer retention. In the normal approach, once the model has made its predictions, we choose the top likely churners for customer retention and generate a strategic response for those customers. However, it is beneficial to deviate from this procedure in certain situations: if it is required to choose only a fraction of the identified churners, it is beneficial to include most of the customers with higher LVCs. The effort should be focused on the customers with higher values in order to have a lesser impact on the overall revenue; this choice is rewarded in the form of the revenues which those customers are likely to bring in over a period of time, if they decide to stay back. So, instead of merely choosing customers based on churn probability, we can also consider the LVC of the identified customers. We assume that the LVC is calculated after considering the various factors that govern it. This is depicted in Fig. 4.

[Figure-4. LVC-based target selection for retention: customer data feeds the churn prediction model; the customers, ordered on likelihood of churn, are combined with an LVC computation in the target-customer-selection step to yield the target customers for retention.]

This plug-in kind of LVC-based selection, applied as a separate step, is desirable but may not yield effective results: the results are still heavily dependent on the model, and the selection based on LVC affects them only to a certain extent. This calls for an integrated approach in which Customer Value (CV) is considered while making the predictions. The CV for a particular customer is computed from the revenue and profit data available for that customer; unlike the LVC, the CV is not dependent on the churn likelihood of the customer. It becomes all the more important to consider CVs in those cases where the CVs of different customers vary drastically. The main requirement of integrated models is that they should be able to accurately identify the "most valuable" (high-CV) customers.

A way of measuring the loss caused by a model is by using a cost function. This function usually has two associated costs: one is the cost of making a needless retention effort (a false positive), and the other is the cost of losing a customer because the model did not predict him (a false negative). The objective is to minimize the combined cost, and we usually select a model that maximizes the gains. The cost factors can have different weights: by giving a greater weight to the cost associated with the loss of a high-CV customer, one can develop a model that minimizes the objective function by selecting most of the high-value customers. Though the accuracy of the model may go down as a result, the net benefits outweigh the losses incurred due to inaccurate predictions.
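The two-cost objective described above can be sketched as follows. The cost figures, and the specific choice of weighting a missed churner by that customer's value, are illustrative assumptions of ours:

```python
def retention_cost(selected, churned, cv, effort_cost=10.0):
    """Combined cost of a target selection: a fixed cost for every needless
    retention effort (false positive) plus, for every churner the selection
    missed (false negative), a loss weighted by that customer's value (CV)."""
    total = 0.0
    for i, churn in enumerate(churned):
        if i in selected and not churn:
            total += effort_cost   # needless retention effort
        elif i not in selected and churn:
            total += cv[i]         # lost a (possibly high-CV) customer
    return total

churned = [1, 0, 1, 0]                    # which customers actually churn
cv      = [500.0, 50.0, 20.0, 300.0]      # customer values

# With budget for one retention effort, targeting the high-CV churner
# (customer 0) is far cheaper overall than targeting the low-CV one:
print(retention_cost({0}, churned, cv))   # 20.0  (only customer 2 is lost)
print(retention_cost({2}, churned, cv))   # 500.0 (customer 0 is lost)
```

Because false negatives on high-CV customers dominate the total, minimizing this objective naturally steers the selection towards the most valuable likely churners, as the text argues.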
7 Ongoing Work in Churn Management

Fig. 5 illustrates the approach for churn prediction in our ongoing work at Satyam. Our approach is to address the churn prediction problem in two steps: 1. optimum segmentation of the subscriber population into classes, based on their demographic and selected transactional characteristics; and 2. generation of a specific churn model for each class, based on the training data specific to that class.

[Figure-5. Multi-class churn prediction: the telecom database (demographics, usage statistics, billing data, service requests, customer value, critical incidence records) feeds subscriber clustering; the class-specific data drives a class-specific churn modeler and a churn-model repository, which in turn support churn scoring and subscriber-specific churn prediction.]
From this we aim to achieve greater prediction accuracy, for the following reasons: 1. each class is somewhat homogeneous in its characteristics, and hence the models derived for it are likely to be more accurate; and 2. instead of one model applicable universally to the entire customer population, we derive a separate model for each subscriber class, so the models are less general (and more specific), resulting in greater accuracy. Currently, we are working on the evolutionary-learning-based algorithm for churn prediction, which can adapt to changing churn characteristics. Further, we are integrating customer value into the churn prediction model, and also as an attribute in subscriber classification.

8 Summary

The problem of churn has assumed enormous significance due to high churn rates and the associated revenue losses. It is important to identify the likely churners amongst the existing customers to avoid revenue losses due to churn. Churn being unavoidable, there have to be effective ways to control it; managing churn is a complex task and requires strategic efforts. It is mandatory to understand the likely causes of churn and devise solutions accordingly. This is achieved by building a good prediction model. The model, apart from making accurate predictions, should also be able to capture the reasons for churn and express them in a lucid manner. Customer value considerations can be integrated into the prediction model so that valuable customers are retained. Ways of addressing the likely churners continue to evolve as churn characteristics continue to vary, and there is a definite need to create models that can provide better insights and higher prediction accuracy.

References

1. KPMG, "Wireless Local Number Portability," white paper. Accessed from: http://www.kpmg.ca/en/industries/ice/communications/documents/WirelessLocalNumberPortability.pdf
2. A. Lemmens and C. Croux, "Bagging and Boosting Classification Trees to Predict Churn," research report, Katholieke Universiteit Leuven. Accessed from: http://www.econ.kuleuven.ac.be/tew/cteo/or_reports/0361.pdf
3. S.V. Nath and R.S. Behara, "Customer Churn Analysis in the Wireless Industry: A Data Mining Approach," Daleen Technologies. Available: http://downloadeast.oracle.com/owsf_2003/40332.pdf
4. M.C. Mozer, et al., "Predicting Subscriber Dissatisfaction and Improving Retention in the Wireless Telecommunications Industry," IEEE Transactions on Neural Networks, 11, 690-696.
5. A. Parekkat, "Operational Comparison of Logistic Regression, Decision Trees & Neural Networks in Modeling Mobile Service Churn," VIEWS 2003: A SAS user group meeting. Accessed from: http://www.amadeus.co.uk/events_resources/conferences/Views 2003/Arun Parekkat Operational comparison of Logistic regression, Decision trees & Neural networks in modeling mobile service churn.doc
6. S.A. Neslin, et al., "Defection Detection: Improving Predictive Accuracy of Customer Churn Models," working paper. Accessed from: http://mba.tuck.dartmouth.edu/pages/faculty/scott.neslin/working_papers.html
7. W.-H. Au, K.C.C. Chan, and X. Yao, "A Novel Evolutionary Data Mining Algorithm With Applications to Churn Prediction," IEEE Transactions on Evolutionary Computation, 7(6), 532-545, Dec. 2003.
8. A. Ultsch, "Emergent Self-Organizing Feature Maps Used for Prediction and Prevention of Churn in Mobile Phone Markets," Journal of Targeting, Measurement and Analysis for Marketing, 10(4), 314-324, 2002.
9. M. Golovnia and D. Steinberg, "Churn Modeling for Mobile Telecommunications." Accessed from: http://www.salford-systems.com/churn.php
10. S. Rosset and E. Neumann, "Integrating Customer Value Considerations into Predictive Modeling," Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM 2003), Melbourne, Florida, USA, 19-22 December 2003.