
Customer Churn Time Prediction in Mobile Telecommunication Industry Using
Ordinal Regression

Rupesh K. Gopal and Saroj K. Meher

Applied Research Group, Satyam Computer Services Limited


Entrepreneurship Center, Indian Institute of Science campus
Bangalore 560 012, India
Ph.: +91-80-23606830; Fax: +91-80-23601011
rupesh.gopal@gmail.com, saroj meher@satyam.com

Abstract. Customer churn is considered to be a core issue in
telecommunication customer relationship management (CRM). Accurate
prediction of churn time or customer tenure is important for developing
appropriate retention strategies. In this paper, we discuss a method based
on ordinal regression to predict the churn time or tenure of mobile
telecommunication customers. Customer tenure is treated as an ordinal outcome
variable, and ordinal regression is used for tenure modeling. We compare
ordinal regression with the state-of-the-art method for tenure prediction,
survival analysis. Our results suggest that ordinal regression could be an
alternative to survival analysis for churn time prediction of mobile
customers. To the best of the authors' knowledge, this is the first attempt
to use ordinal regression as a technique for modeling customer tenure.

1 Introduction

Customer churn is a significant problem for many firms operating in a
contractual or subscription business setting, such as telecommunication
operators, Internet service providers, and cable service operators. In a study
focusing on the role of satisfaction in modeling customer length of stay with
a telecom service provider, Bolton [5] finds that customer satisfaction is
positively correlated with tenure. First, tenure is longer for customers who
have high levels of cumulative satisfaction, and second, the effect of
perceived losses (e.g., transaction failures, bad service quality) on tenure
is negative. New customers are particularly vulnerable: if their experiences
are not satisfactory, the relationship is likely to be short. Customers who
are satisfied with the service provider tend to stay for longer durations.
Ordinal regression (OR) is a type of learning in which the response variable
comes from a finite ordered (i.e., ordinal) set. Just as customer satisfaction
(which is not directly observable) is modeled on an ordinal scale, we can
model customer tenure as an ordinal response variable in order to predict it.
In Section 2 we explain the classical approach to tenure modeling, survival
analysis. In Section 3 we explain ordinal regression for modeling customer
churn times. In Section 4 we discuss our empirical results on a real-world
mobile telecom dataset. In Section 5 we conclude our work.

T. Washio et al. (Eds.): PAKDD 2008, LNAI 5012, pp. 884–889, 2008.
© Springer-Verlag Berlin Heidelberg 2008

2 Classical Approaches for Tenure Modeling


Traditionally, tenure modeling or length of stay modeling belongs to a branch
of statistics called survival analysis. Survival analysis (SA) is concerned
with analyzing the time to occurrence of an event (e.g., time to churn) in the
presence of censored observations [2]. In SA, we begin observing a set of
customers at some well-defined point of time (called the origin time) and then
follow them for some substantial period of time, recording the times at which
customer churn occurs. Some customers may churn after the end of the study
period, i.e., after the censoring time. Such cases are called right-censored
observations [3]. Several parametric, semi-parametric, and non-parametric
survival regression techniques are available as commercial products and are
already part of the telecommunication CRM process. Allison [2] gives a good
insight into the use of survival analysis for modeling time-to-event data
using the commercial statistical package SAS® [4].

In SA, the survival function and the hazard function are used to describe the
status of customer survival during the period of observation. The survival
time T is considered a random variable. The survival function S(t) gives the
probability of survival to time t, that is, S(t) = Pr(T > t) = 1 − P(t), where
P(t) is the c.d.f. of the survival time T. The hazard function h(t) is defined
as the conditional likelihood that a customer will churn at time t, given that
churn did not occur in the interval (0, t), and can be computed from S(t)
using h(t) = −(d/dt) ln S(t). Thus we can compute survival and hazard
probabilities for a customer x_i at each time point t_k in the study. When all
customers are sorted by ascending survival probability at a specified time
t_k, the customers with the lowest predicted survival probabilities have the
highest likelihood of churning at that time [3].
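To make the survival and hazard quantities concrete, the following is a minimal sketch of a discrete-time, non-parametric (Kaplan-Meier style) estimate computed from tenures with right censoring. It is an illustration only and not the procedure used in this paper; the function name `kaplan_meier` and the toy data are assumptions introduced here.

```python
from collections import Counter

def kaplan_meier(tenures, churned):
    """Return [(month, hazard, survival)] for each month with an observed churn.

    tenures: observed months in service, one entry per customer
    churned: True if churn was observed, False if right-censored
    """
    events = Counter(t for t, c in zip(tenures, churned) if c)
    exits = Counter(tenures)      # churners and censored both leave the risk set
    at_risk = len(tenures)
    survival = 1.0
    table = []
    for t in sorted(exits):
        if events[t]:
            hazard = events[t] / at_risk   # discrete hazard at month t
            survival *= 1.0 - hazard       # S(t) is the running product of (1 - h)
            table.append((t, hazard, survival))
        at_risk -= exits[t]
    return table

# Hypothetical toy data: six customers, two of them right-censored.
tenures = [7, 8, 8, 10, 12, 12]
churned = [True, True, False, True, False, True]
for month, hazard, survival in kaplan_meier(tenures, churned):
    print(month, round(hazard, 3), round(survival, 3))
```

Censored customers contribute to the risk set up to their last observed month but never count as churn events, which is what distinguishes this estimate from a plain churn frequency.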

3 Ordinal Regression for Tenure Modeling


In OR, we arrange customer tenure on an ordinal scale such that
t_1 ≺ t_2 ≺ . . . ≺ t_k ≺ . . . ≺ t_M. Chu & Keerthi [6] formulate the OR
problem as a generalization of support vector machines, determining M − 1
thresholds (parallel discriminant hyperplanes) for M ranks that divide the
real line into M consecutive intervals, one for each rank. Alternatively,
Frank & Hall [7] decompose the original OR problem into a set of binary
classification problems. For a review of other OR formulations see [6]. We
have implemented the OR formulation proposed in [7] for the results in this
paper. The original M-class OR problem with ranks {t_1, t_2, . . . , t_M} is
converted into M − 1 nested binary classification problems by using the
ordering of the original ranks. Training starts by deriving new datasets from
the original dataset, one for each of the M − 1 new binary classes. Each
derived dataset contains the same number of samples as the original, with the
same attribute values for each sample, except for the class value: in the kth
derived dataset the class indicates whether t > t_k. In the next step, a
classifier is trained on each of the M − 1 derived datasets. For each sample
(customer) we estimate the probability that it belongs to a target class
(i.e., the churn probability at that time) as follows:

Pr(t = t_1) = 1 − Pr(t = t_2 ∨ t = t_3 ∨ . . . ∨ t = t_M) = 1 − Pr(t > t_1),
Pr(t = t_k) = Pr(t > t_{k−1}) − Pr(t > t_k), k = 2, 3, . . . , M − 1,
Pr(t = t_M) = Pr(t > t_{M−1}).

To predict the churn time of a customer with unknown churn time, the sample is
processed by each of the M − 1 classifiers, the rank probabilities are
computed as above, and the rank with maximum probability is assigned to that
customer.
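The decomposition of [7] can be sketched in a few lines. The snippet below shows how the M − 1 binary targets are derived for one sample and how the M − 1 estimates of Pr(t > t_k) are combined into rank probabilities. The function names are illustrative assumptions, ranks are indexed from 0, and the binary probabilities are taken as given rather than produced by a trained classifier.

```python
def binary_targets(rank_index, num_ranks):
    """Class values of one sample in the M-1 derived datasets.

    Entry k is 1 if the sample's rank lies above the k-th rank, else 0.
    """
    return [1 if rank_index > k else 0 for k in range(num_ranks - 1)]

def rank_probabilities(p_greater):
    """Combine Pr(t > t_k), k = 1..M-1, into Pr(t = t_k), k = 1..M."""
    probs = [1.0 - p_greater[0]]                      # lowest rank
    for k in range(1, len(p_greater)):
        probs.append(p_greater[k - 1] - p_greater[k])  # middle ranks
    probs.append(p_greater[-1])                        # highest rank
    return probs

def predict_rank(p_greater):
    """Index of the rank with maximum combined probability."""
    probs = rank_probabilities(p_greater)
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical outputs of four binary classifiers for a 5-rank problem.
p_greater = [0.9, 0.7, 0.2, 0.1]
print(binary_targets(2, 5))           # a sample whose true rank is the third
print(rank_probabilities(p_greater))
print(predict_rank(p_greater))
```

Because the binary classifiers are trained independently, their outputs need not be monotone in k, so a middle-rank difference can come out slightly negative; the sketch does not correct for that.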
In our experiments, we start by grouping tenures into ranks such that
(t_a, t_b] ≺ (t_b, t_c] ≺ (t_c, ∞), where t_a < t_b < t_c < ∞. It is important
to note that this grouping into ranks could come from domain experts, for
example to find the set of customers who are likely to churn within a 6-12
month period, or the set of customers who are likely to stay for more than one
or two years. In the next step, we repeat the OR experiments within each rank,
with a preference level attached between the atomic time units (that is, at
month level). This hierarchical approach is taken to overcome the problem of
the large number of classes present in the current problem.

4 Experiments and Results


We demonstrate results on the Churn Modeling Tournament data obtained from
the Center for Customer Relationship Management at Duke University [1]. The
data were provided by a major wireless telecommunications company using its
own customer records. We used the calibration dataset, which consists of
100,000 customers, each described by 169 independent variables, a unique
customer identifier, and the churn label (0 for churn and 1 for no-churn).

4.1 Data Preparation


The churn modeling tournament data cannot be used directly for tenure modeling
experiments; we first need to represent the data in a suitable manner. Let
X = {x_i : i = 1, 2, . . . , N} denote the customers, and let a_ij denote the
jth feature value of the ith customer. In the churn modeling data, the number
of months in service is one of the features and churn is the output label.
Customers who are active at the time of sampling are treated as censored
observations, and churners are considered complete observations. Months in
service becomes the output variable. Only right censoring is considered in the
present study. Since all customers in our dataset have been on the network for
at least 7 months, this becomes the origin time for our experiments. We choose
a study period window of 25 months, so the censoring time is the 31st month.
Hence, customers with tenure 7 ≤ t < 32 are considered complete observations,
and customers with tenure t ≥ 32 months are considered censored observations
(right censoring). We bin tenure into five ranks such that A ≺ B ≺ C ≺ D ≺ E.
Customers who churned in the time periods (in months) [7, 11], [12, 15],
[16, 21], [22, 31], and [32, ∞) are assigned ranks A, B, C, D, and E
respectively. Note that censored observations are assigned to the last rank.
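The binning described above can be written down directly. The sketch below maps months in service to the five ranks and flags right-censored observations; the names `RANK_BOUNDS`, `tenure_rank` and `is_censored` are assumptions made for illustration, while the interval boundaries are those stated in the text.

```python
import math

# (rank, first month, last month), both ends inclusive, as given in the text.
RANK_BOUNDS = [
    ("A", 7, 11),
    ("B", 12, 15),
    ("C", 16, 21),
    ("D", 22, 31),
    ("E", 32, math.inf),
]

def tenure_rank(months_in_service):
    """Map months in service to one of the ordinal ranks A..E."""
    for label, first, last in RANK_BOUNDS:
        if first <= months_in_service <= last:
            return label
    raise ValueError("tenure below the origin time of 7 months")

def is_censored(months_in_service):
    """Right-censored: tenure reaches beyond the 25-month study window."""
    return months_in_service >= 32

for months in (7, 15, 16, 31, 40):
    print(months, tenure_rank(months), is_censored(months))
```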

4.2 Data Preprocessing


All attributes with more than 30% missing values, and those that are the sum
of two or more other variables, are removed. Information gain and chi-squared
statistic tests [3] are then used for feature selection. Both methods generate
a ranking of the features. We conduct experiments with the top 30, 40, 50 and
99 ranked features. The model with the 40 top-ranked features selected by the
information gain criterion gives the best result, and hence only this result
is reported. The dataset is randomly split into two halves, one for training
and the other for testing. Both datasets have approximately 25,000 samples. We
use the open source data mining package Weka version 3.5.5 [8] for our
experiments.
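As an illustration of ranking features by information gain, the sketch below computes the gain of discrete-valued features with respect to the class label and keeps the top n. It is a simplified stand-in and not the Weka implementation used for the experiments: numeric attributes would first need to be discretized, and the function and feature names are assumptions.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def rank_features(columns, labels, top_n):
    """Names of the top_n features, ordered by decreasing information gain."""
    scores = {name: information_gain(values, labels)
              for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical toy data: "plan" separates the ranks perfectly, "noise" does not.
labels = ["A", "A", "E", "E"]
columns = {"plan": ["x", "x", "y", "y"], "noise": ["u", "v", "u", "v"]}
print(rank_features(columns, labels, top_n=1))
```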

4.3 Empirical Results


First, we compare OR with multi-class classification (MC). The output consists
of 5 labels {A, B, C, D, E}. For both classifiers, the C4.5 decision tree is
used as the base learner. OR is compared with the one-against-all multi-class
classification scheme. In the case of OR we have a preference level attached
between output labels; this information is absent in the MC setting. The
classification accuracy obtained from OR and MC is 86.21% and 83.8%
respectively. The mean absolute error (MAE) for OR and MC is 0.066 and 0.2512
respectively. MAE is an important measure for comparison between OR and MC
[6]. So far the model is able to predict only a coarse time of churn (for
example, between 12 and 15 months). We therefore repeat our experiments within
each rank to predict churn time at month level. In the following results, MAE
values are given in parentheses. Accuracy and MAE for ranks A, B, C, and D for
OR are 67.51% (0.1339), 70.15% (0.162), 56.85% (0.1488) and 39.57% (0.1262)
respectively. Accuracy and MAE for ranks A, B, C, and D for MC are 64.45%
(0.4548), 63.93% (0.3047), 54.58% (0.2449) and 39.53% (0.1726) respectively.
We notice that OR is consistently more accurate than MC and that its MAE is
consistently lower.
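The following sketch shows why a rank-aware error measure separates OR from MC even when accuracy is equal. It uses one common definition from the OR literature, the mean absolute difference between true and predicted rank positions. This definition is an assumption: the MAE figures reported above may follow a different convention (for instance one based on predicted class probabilities), so the sketch is not expected to reproduce them.

```python
RANKS = "ABCDE"   # ordinal scale, lowest to highest

def accuracy(true_ranks, predicted_ranks):
    """Fraction of samples whose rank is predicted exactly."""
    hits = sum(t == p for t, p in zip(true_ranks, predicted_ranks))
    return hits / len(true_ranks)

def rank_mae(true_ranks, predicted_ranks):
    """Mean absolute distance, in rank positions, between truth and prediction."""
    total = sum(abs(RANKS.index(t) - RANKS.index(p))
                for t, p in zip(true_ranks, predicted_ranks))
    return total / len(true_ranks)

# Hypothetical predictions: both make two mistakes, but of different severity.
truth = list("AABCDE")
near_misses = list("ABBCEE")   # errors are one rank away
far_misses = list("AEBCAE")    # errors are three or four ranks away

print(accuracy(truth, near_misses), rank_mae(truth, near_misses))
print(accuracy(truth, far_misses), rank_mae(truth, far_misses))
```

Both prediction lists have the same accuracy, yet the second is penalized far more heavily because its mistakes land far from the true rank.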
Next, we compare OR with the Cox proportional hazards (PH) model [9] for
predicting customer tenure at month level. We use the PROC PHREG procedure
available in the SAS/STAT® software for the Cox PH model [4]. The Cox PH model
is a semi-parametric survival regression technique that uses partial
likelihood estimation. It is useful when the form of the survival distribution
and hazard function is not known in advance. The results are reported using
cumulative lift curves [1]. Figure 1 shows the cumulative lift curves by OR
and PROC PHREG for months 7 through 11. We notice that PROC PHREG captures
30-60% of churners in the top decile, whereas OR captures only about 20-40% of
churners in the top decile. Figure 2 shows the cumulative lift curves by OR
and PROC PHREG for months 12 through 15. Here PROC PHREG captures at most 20%
of churners in the top decile, whereas OR captures about 20-45%. The lift
curves of ordinal regression and PROC PHREG for months 16 through 21 and
months 22 through 31 are not presented here because of space limitations.
However, the results from PROC PHREG for these time periods are worse than
those of ordinal regression. The full set of results comparing OR and SA is
given in our technical report [10]. We notice that PROC PHREG makes good
predictions about customer churn when tenure is short (7 to 11 months),
whereas its accuracy drops drastically when tenure prediction is desired for
customers who have stayed with the service provider for a considerably longer
time. OR, on the other hand, is seen to make predictions more uniformly. We
anticipate that this is due to the balanced size of the training samples given
to the ordinal regression units for each rank.

Fig. 1. Cumulative lift curves by the OR model (left) and PROC PHREG by
SAS/STAT® (right) for customers who churned at months 7 through 11. Axes:
decile (0 to 10) against percent of churners (0 to 100); one curve per month
(7, 8, 9, 10, 11) plus a random baseline.

Fig. 2. Cumulative lift curves by the OR model (left) and PROC PHREG by
SAS/STAT® (right) for customers who churned at months 12 through 15. Axes:
decile (0 to 10) against percent of churners (0 to 100); one curve per month
(12, 13, 14, 15) plus a random baseline.
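A cumulative lift curve of the kind used in this comparison can be computed as sketched below: customers are sorted by predicted churn score, and for each decile the percentage of all churners captured so far is recorded. The function name, the toy scores and the integer rounding of decile boundaries are assumptions for illustration, not the exact procedure of [1].

```python
def cumulative_lift(scores, churned, bins=10):
    """Percent of all churners captured in the top 1..bins quantiles by score.

    scores:  predicted churn score per customer (higher means more at risk)
    churned: 1 if the customer actually churned in the period, else 0
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_churners = sum(churned)
    captured = []
    for decile in range(1, bins + 1):
        cutoff = decile * len(order) // bins          # customers in top deciles
        hits = sum(churned[i] for i in order[:cutoff])
        captured.append(100.0 * hits / total_churners)
    return captured

# Hypothetical toy data: ten customers, three churners ranked 1st, 2nd and 4th.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
churned = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
for decile, pct in enumerate(cumulative_lift(scores, churned), start=1):
    print(decile, round(pct, 1))
```

A model that ranks customers at random would capture about 10% of churners per decile, which corresponds to the "Random" baseline in the figures.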

5 Conclusion and Future Work


In this paper we discussed the use of ordinal regression for modeling the
tenure of mobile telecommunication customers. Ordinal regression is compared
with multi-class classification and is seen to perform better. Next, we
compared ordinal regression with the state-of-the-art method for tenure
modeling, survival analysis (the Cox PH model). Ordinal regression is seen to
make more uniform predictions about customer tenure than Cox's model. We would
like to emphasize that ordinal regression is seen to perform better than
survival analysis only on the Duke University data. Owing to the difficulty of
obtaining real-world data from telecommunication operators, we were unable to
conduct experiments on further datasets. In future we wish to model customer
tenure in other domains where customer satisfaction plays an important role,
such as the insurance and banking industries. We also wish to compare ordinal
regression with parametric survival regression models, and to compare
different ordinal regression learning schemes on tenure modeling data.

Acknowledgments. We are grateful for the valuable inputs and thoughtful
comments by Chiranjib Bhattacharyya, Department of Computer Science and
Automation at Indian Institute of Science, Bangalore, India. We thank Gururaj
Kallur, Arun Kumar, Sridhar Varadarajan, Srividya Gopalan, Ramya Ramakrishnan
and Narasimhan Balakrishnan for their support and encouragement.

References
1. Neslin, S.A., Gupta, S., Kamakura, W., Lu, J., Mason, C.H.: Defection detection:
Improving predictive accuracy of customer churn models. Working paper series,
Teradata Center for Customer Relationship Management at Duke University (2004)
2. Allison, P.D.: Survival analysis using the SAS system: A practical guide. SAS
Institute Inc., Cary, NC (1995)
3. Lu, J.: Predicting customer churn in telecommunications industry - An application
of survival analysis modeling using SAS. SAS User Group International online
proc., Paper No. 114-27 (2002)
4. SAS Institute Inc.: SAS/STAT® User's Guide, Version 6. SAS Institute Inc., 1, 2
(1989)
5. Bolton, R.N.: A dynamic model for the duration of the customer's relationship
with a continuous service provider: The role of satisfaction. Marketing Science 17,
45–65 (1998)
6. Chu, W., Keerthi, S.S.: Support vector ordinal regression. Neural
Computation 19(3), 145–152 (2007)
7. Frank, E., Hall, M.: A simple approach to ordinal classification. In: Proc. of the
European Conf. on Machine Learning, pp. 145–156 (2001)
8. Witten, I.H., Frank, E.: Data mining: Practical machine learning tools and
techniques with Java implementations. Morgan Kaufmann, San Francisco (2005)
9. Cox, D.R.: Regression models and life tables. J. of the Royal Stat. Soc., Series
B 34, 187–220 (1972)
10. Gopal, R.K., Bhattacharyya, C., Meher, S.K.: Customer churn time prediction in
mobile telecommunication industry using ordinal regression. ARG-TR-Y7-001,
Technical report, Satyam Computer Services Limited (2007)
