
Expert Systems with Applications 37 (2010) 4710–4712


journal homepage: www.elsevier.com/locate/eswa

Short Communication

Churn models for prepaid customers in the cellular telecommunication industry using large data marts

Marcin Owczarczuk
Institute of Econometrics, Warsaw School of Economics, Al. Niepodleglosci 164, 02-554 Warsaw, Poland

Article info

Keywords: Churn prediction, Retention, Wireless, Cellular, CRM

Abstract

In this article, we test the usefulness of popular data mining models for predicting churn among the clients of a Polish cellular telecommunication company. Compared to previous studies on this topic, our research is novel in the following areas: (1) we deal with prepaid clients (previous studies dealt with postpaid clients), who are far more likely to churn, are less stable, and about whom much less is known (no application, demographical or personal data); (2) we have 1381 potential variables derived from the clients' usage (previous studies dealt with data with at most tens of variables); and (3) we test the stability of models across time for all the percentiles of the lift curve: our test sample is collected six months after the estimation of the model. The main finding of our research is that linear models, especially logistic regression, are a very good choice when modelling churn of prepaid clients. Decision trees are unstable in high percentiles of the lift curve, and we do not recommend their usage.
© 2009 Elsevier Ltd. All rights reserved.

E-mail address: mo23628@sgh.waw.pl
0957-4174/$ - see front matter © 2009 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2009.11.083

1. Introduction

1.1. The need for churn models

In telecommunication companies, the retention of customers is one of the key activities of the CRM (customer relationship management) departments. CRM actions are based on direct communication with the customer, for example via SMS or a direct call. During this communication, certain services are offered in order to make the customer stay. The following SMS is an illustrative example: “make a recharge of at least 30 PLN during the next 7 days and you will receive an additional 10 PLN for calls”.¹ When the offer is accepted, the company bears a certain cost associated with the bonus, but also gains the profit generated by the recharge. In addition, the “life” of the customer extends by tens of days, which is the usual time needed to spend the recharge and the bonus. A natural question arises: which clients should be the target group of such marketing actions? These actions should not be addressed to loyal customers who would make a recharge anyway, because this generates only the loss associated with the bonus. In contrast, customers who are likely to churn may change their mind after receiving such a message. So it is important to predict which customers are likely to churn in the near future and to address the marketing message only to them.

¹ PLN is the abbreviation of the Polish currency unit.

The problem of churn prediction, regardless of the economic sector, is well documented; see for example Ngai, Xiu, and Chau (2009) for an overview. As far as churn in the cellular telecommunication industry is concerned, see for example Pendharkar (2009), Wei and Chiu (2002), Hung, Yen, and Wang (2006). In these papers, data is collected on contractual customers. This sector is called postpaid. Churn there is well defined: if the client wants to churn, he or she has to sign a proper document, usually a month in advance. Also, much is known about such customers: personal data like age, gender and address. We also have information about their call detail records (cdr), and we may derive additional variables from cdr, like average minutes of usage.

1.2. Prepaid customers

In this article, we deal with a different type of clients, namely prepaid. In our opinion, modelling prepaid churn is far more challenging. Prepaid clients do not sign any contract and are anonymous, so we do not have any personal data about them. All that is known is their tariff and the usage derived from cdr. Prepaid customers do not pay a monthly subscription fee and their usage is less regular. We also do not have a strict definition of churn. Of course, there is the notion of the expiration of the SIM card, but in our opinion it is not a good definition of churn. This problem is described in the next subsection, and the description is based on the Polish cellular telecommunication market.

1.3. Churn definition

When a prepaid customer makes a recharge, he or she is able to make outgoing calls during a certain period of time. In Poland,
when the value of the recharge equals 30 PLN, this period is usually equal to 1 month (or 30 days, depending on the brand). After that period, the customer is only able to receive calls, also during a certain period of time (usually 365 days, depending on the brand and the value of the recharge). After that period, the SIM card is deactivated. Of course, until the deactivation, the customer may make a recharge, and the expiration date is then extended.

Let us analyze the following example: the client makes a recharge, spends its value during the same day and throws the SIM card away. Under the above regulations, the telecommunication company deactivates the SIM card one year after the real churn. Retention actions are usually addressed shortly before the expected date of churn, so relying on the date of expiration can be very ineffective. So, it is crucial to have a proper definition. In this study, we use the following: “the client churned if he or she had a 6-week period without incoming and outgoing calls”. The moment of churn is the beginning of this period. Because the marketing message should be sent shortly before the client churns, we predict whether a client churns four weeks after the moment of analysis. So, we want to predict, four weeks in advance, the occurrence of six weeks of inactivity.

This definition is the result of a separate analysis and we discuss it here only briefly. The churn definition should allow fast verification: we want to wait as short a time as possible to find out that the client really stopped using the service. The definition should also be certain: we want to be sure that after this period of inactivity, the client makes no call and receives no call until the SIM card deactivation. Six weeks is a compromise between these two goals: clients who did not use the service for six weeks rarely made a call after that period, whereas there are many clients who awake after three, four or five weeks of inactivity.

In addition, if we relied on the date of expiration, we could only use outdated data about clients, because we would have to wait 365 days until the expiration to find out which clients really churned in order to calculate the dependent variable for the models. The telecommunication sector changes very quickly and such a long delay is unacceptable.

1.4. Data mart

Previous studies on churn prediction in the cellular telecommunication market, for example Pendharkar (2009), Wei and Chiu (2002), Hung et al. (2006), used data with a relatively small number of explanatory variables. We are surprised how little was known about the customers. In our study, there are 1381 variables. All of them are derived from cdr, tariffs and components. Variables associated with components usually represent the presence or absence of packages the clients may activate, for example, a package of cheaper calls to selected clients. In comparison to previous studies, our variables derived from the cdr data are far more detailed. For example, we collect data about overall minutes of usage, but also minutes of usage split by days of the week (working days and weekend) and time of calls (morning, midday, night). We also collected data about the “dispersion” of calls: we measured how many calls were made to the most frequent number and received from the most frequent number. Our data was gathered from one of the Polish mobile operators in 2007 and 2008.

2. Churn modelling

In our study, we used the following models: logistic regression, linear regression, Fisher linear discriminant analysis and decision trees. The grounds for this choice are as follows: we want to use interpretable models which give an understanding of what the reason (or at least a symptom) of churn is. We are aware of black-box models like random forests or support vector machines, but we argue that their usage is improper when predicting churn. Linear models like regression or Fisher discriminant analysis have a simple interpretation: a positive coefficient of a variable suggests that larger values of this feature are symptoms of churn. Decision trees also have a clear interpretation, which can be expressed in terms of what-if rules. Interpretable models are also much easier to debug, which is very important when using such a large data mart. Our data mart is a general-purpose mart, not necessarily churn-oriented. So it is very easy to accidentally include irrelevant variables (like the client's identifier) or to treat variables in an improper way, for example, using categorical variables as if they were numerical (for example, the identifier of the client's SIM card status, which is coded as a numerical variable with a few levels). When the model is interpretable and selects only a small subset of significant features, such errors are easy to detect. Also, mistakes made during the data mart generation phase (like a lack of certain attributes for certain clients or abnormal values) may be easily detected when the model uses only a small subset of the variables and does so in a clear way.

2.1. Data

Our data set consists of the train sample (85,274 observations), the calibration sample (36,824 observations) and the test sample (45,497 observations). Data in the train sample and the calibration sample come from a dataset collected at the same time, which was then split randomly into the train and calibration (validation) parts. The test sample was collected six months after the train and calibration samples.

2.2. Models

Since applying regression models and Fisher discriminant analysis directly to such a large data set may be difficult (long computation time, possible numerical instability due to collinearity), we applied the following preliminary variable selection (calculated on the training set).

Student's t-test was applied to each variable. The null hypothesis states that the means of the particular variable among churners and non-churners do not differ. The alternative hypothesis states that there is a significant difference between these two means. Variables that are potentially interesting when modelling churn should show a significant difference between these means. So we selected the 50 variables with the highest absolute value of the t statistic and used these 50 variables in the regression models. We used full regression and regression with stepwise, forward and backward selection based on the Wald test applied to the previously selected 50 variables. As far as decision trees are concerned, we used two versions: with all 1381 variables (decision trees are computationally fast and it is possible to estimate such models) and with the 50 best variables according to the t statistic, similarly to the regression approach.

2.3. Results

We tested our models using lift curves, which measure the ratio of the fraction of churners in the top deciles of the score generated by the models to the fraction of churners in the whole population (lifts are expressed as factors, not as percentages). All the linear models had similar performance, regardless of the additional variable selection method (stepwise, backward, forward, none), but the logistic regression was slightly better than linear regression and Fisher discriminant analysis; therefore, in this article, we present results only for the logistic regression with stepwise selection. Applying preliminary variable selection to decision trees gives similar results to the full decision tree, so we present only decision trees with the
preliminary variable selection. The results for the train dataset are presented in Fig. 1, for the calibration dataset in Fig. 2 and for the test dataset in Fig. 3.

[Fig. 1. Lift curves for train sample. Horizontal axis: quantile (0.0 to 1.0); vertical axis: lift (0 to 15); o = decision tree, + = logistic regression.]

[Fig. 2. Lift curves for calibration sample. Horizontal axis: quantile (0.0 to 1.0); vertical axis: lift (0 to 15); o = decision tree, + = logistic regression.]

[Fig. 3. Lift curves for test sample. Horizontal axis: quantile (0.0 to 1.0); vertical axis: lift (0 to 15); o = decision tree, + = logistic regression.]

We may observe that all the models give similar results for very low quantiles in all data sets. This corresponds to the situation when the target group of the marketing campaign is large. The key difference can be observed for high and medium quantiles. High quantiles correspond to the situation when a really small group of clients should be selected for the marketing campaign.

As far as the calibration sample is concerned, logistic regression (and all the linear models) outperforms decision trees. However, these differences are significant only for medium quantiles. This may suggest that all the models have similar performance in the short term and are valid for the period when the model is built. The key question is whether the models are stable and valid when applied to datasets long after the models were built. When comparing results on the test dataset, we may observe that the lift curve of the decision trees is much lower than the lift curve of the logistic regression for high quantiles. For the logistic regression, the shape of the lift curve is similar for the calibration and test samples. For the decision tree, the shape of the lift curve differs between the calibration and test samples. For the test sample, it is non-monotonic.

This suggests that linear models are more stable, which is a very important aspect of retention programs. Models that get old very quickly need frequent updates, which is time-consuming and costly.

The poor results for the decision trees may be easily explained. Trees use rules of the form “if $x_i < C_i$ then $p(\mathrm{churn}) = q$”. The telecommunication market in Poland evolves very quickly: calls are cheaper and cheaper, so clients make more and more calls. New services are introduced, some components become very common, others become obsolete. So the distribution of the variables shifts in time. When using fixed splits “$x_i < C_i$”, more and more clients start to fulfil these conditions or their negations. As a result, more and more clients fall into certain leaves of the tree and the prediction becomes less precise. In contrast, when constructing the lift curve for the linear models, observations are sorted by the score, which is in fact a linear combination of the variables and has the form “$a_1 x_1 + \cdots + a_k x_k$”. So the shift in distribution generates a shift in the score, but the sorting does not change and the lift curve remains stable.

3. Summary, conclusions and direction for future work

In this article, we evaluated the usefulness of the regression and decision tree approaches to the problem of modelling churn in the prepaid sector of a cellular telecommunication company. Linear models are more stable than decision trees, which get old quickly and whose performance weakens in time, especially in the top deciles of the score. Nevertheless, we showed that prepaid churn can be effectively predicted using a large data mart. As far as future work is concerned, it would be interesting to model churn in the sector that is somewhere between postpaid and prepaid: the mix sector. Mix clients have to sign a contract and personal data is available for them, as for postpaid customers. In addition, they make recharges, which makes them similar to prepaid customers.

References

Hung, S. Y., Yen, D. C., & Wang, H. Y. (2006). Applying data mining to telecom churn management. Expert Systems with Applications, 31, 515–524.
Ngai, E. W. T., Xiu, L., & Chau, D. C. K. (2009). Application of data mining techniques in customer relationship management: A literature review and classification. Expert Systems with Applications, 36, 2592–2602.
Pendharkar, P. C. (2009). Genetic algorithm based neural network approaches for predicting churn in cellular wireless network services. Expert Systems with Applications, 36, 6714–6720.
Wei, C. P., & Chiu, I. T. (2002). Turning telecommunications call details to churn prediction: A data mining approach. Expert Systems with Applications, 23, 103–112.
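
Illustrative sketch (preliminary variable selection, Section 2.2). The ranking of variables by the absolute value of the t statistic can be sketched as follows. This is only a minimal illustration under stated assumptions: it uses the pooled-variance two-sample Student's t statistic, the function names `t_statistic` and `select_top_variables` and the toy data are hypothetical, and the article does not state which software or which variant of the test was actually used. The article keeps 50 of 1381 variables; the toy example ranks three.

```python
import math
from statistics import mean, variance


def t_statistic(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled = ((n_a - 1) * variance(group_a)
              + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    if pooled == 0:
        # Degenerate zero-variance case: ranked last in this sketch.
        return 0.0
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))


def select_top_variables(rows, labels, k):
    """Return the indices of the k columns with the largest |t|."""
    ranked = []
    for j in range(len(rows[0])):
        churners = [row[j] for row, y in zip(rows, labels) if y == 1]
        others = [row[j] for row, y in zip(rows, labels) if y == 0]
        ranked.append((abs(t_statistic(churners, others)), j))
    ranked.sort(reverse=True)
    return [j for _, j in ranked[:k]]


# Hypothetical toy data: column 0 separates the groups, column 1 is noise,
# column 2 is a client identifier that happens to correlate with the label.
rows = [
    [1.0, 5.0, 1],
    [2.0, 3.0, 2],
    [1.5, 4.0, 3],
    [9.0, 4.0, 4],
    [8.0, 5.0, 5],
    [9.5, 3.0, 6],
]
labels = [0, 0, 0, 1, 1, 1]

print(select_top_variables(rows, labels, 2))
```

In this toy data the identifier column ranks second, which mirrors the warning in Section 2 that a general-purpose data mart makes it easy to include irrelevant variables by accident; with an interpretable model and a short list of selected variables, such a column is easy to spot.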

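
Illustrative sketch (lift and stability under a shift, Section 2.3). The argument that a fixed split degrades when the distribution shifts, while sorting by a linear score does not, can be illustrated on toy data. Everything here is an assumption made for illustration: the single usage variable, the coefficient in `linear_score`, the threshold and leaf probabilities in `tree_score`, and above all the form of the shift, which is taken to be the same additive increase for every client. Under a non-uniform shift the ordering of a linear score could change as well. Tied scores are handled by averaging the labels within each tie group, so the result is the expected lift under random tie-breaking.

```python
def expected_lift(scores, labels, fraction):
    """Lift in the top `fraction` of clients ranked by score.

    Clients with equal scores share the mean label of their tie group.
    """
    n = len(scores)
    n_top = max(1, round(n * fraction))
    tied = {}
    for s, y in zip(scores, labels):
        tied.setdefault(s, []).append(y)
    top_scores = sorted(scores, reverse=True)[:n_top]
    expected_churners = sum(sum(tied[s]) / len(tied[s]) for s in top_scores)
    return (expected_churners / n_top) / (sum(labels) / n)


def linear_score(x):
    # Hypothetical coefficient: lower usage gives a higher churn score.
    return -1.0 * x


def tree_score(x):
    # Hypothetical fixed split of the form "if x < C then p(churn) = q".
    return 0.8 if x < 10 else 0.1


usage = [2, 5, 8, 12, 20, 30, 40, 50]   # hypothetical minutes of usage
churned = [1, 1, 0, 0, 0, 0, 0, 0]
shifted = [x + 10 for x in usage]       # assumed uniform increase in usage

for name, score in (("linear", linear_score), ("tree", tree_score)):
    before = expected_lift([score(x) for x in usage], churned, 0.25)
    after = expected_lift([score(x) for x in shifted], churned, 0.25)
    print(name, round(before, 2), round(after, 2))
```

With these toy values the linear score keeps the same ordering after the shift, so its lift in the top quarter is unchanged, whereas every shifted client falls into the same leaf of the fixed split and its lift drops to 1, which is no better than random selection.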