
Using Big Data Analysis to Retain Customers for Telecom Industry

Yuanhu Gu
College of Information Technology and Computer Science
University of the Cordilleras
Baguio, Benguet, Philippines
(+63)9271639145
Tiger18921884431@gmail.com

Alvin R. Malicdem
College of Information Technology
Don Mariano Marcos Memorial State University
La Union, Philippines
(+63)9166458405
amalicdem@dmmmsu.edu.ph

Josephine S. Dela Cruz
College of Information Technology and Computer Science
University of the Cordilleras
Baguio, Benguet, Philippines
(+63)9496513187
delacruzpen@gmail.com

Thelma Domingo Palaoag
College of Information Technology and Computer Science
University of the Cordilleras
Baguio, Benguet, Philippines
(+63)9997134118
tpalaoag@gmail.com

ABSTRACT
Telecommunication markets are becoming increasingly competitive, and customer churn is becoming increasingly serious. In the highly competitive mobile market, Customer Churn Management is becoming more and more critical. In developing countries, most customers switch service providers because of good promotional incentives and lower monthly costs offered by competing service providers. Predicting customer churn quickly and accurately has therefore become very important. In this paper, the researchers analyzed customer churn using big data feature analysis and multi-feature analysis. User data were modeled with the XGBoost algorithm. The model was optimized repeatedly with GridSearchCV as a parameter tuning tool. The accuracy of the model on the test set is 85.1%. The researchers predicted a list of about 11000 customers per month who may be about to churn. Using the K-means clustering method, the 11000 churn target customers per month were classified into three categories, and telecom companies are advised to adopt the solutions found by feature analysis to retain customers. This big data analysis can be used to retain customers for the telecom industry.

CCS Concepts
• Information systems➝Information system applications➝Data mining.

Keywords
Big data analysis; retain customers; customer churn; telecom industry; feature analysis.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
ICCAI '19, April 19–22, 2019, Bali, Indonesia
© 2019 Association for Computing Machinery.
ACM ISBN 978-1-4503-6106-4/19/04…$15.00
DOI: https://doi.org/10.1145/3330482.3330510

1. INTRODUCTION
As the telecom market matures, competition among telecom companies becomes more and more intense. The market is saturated. Some telecom companies are finding various ways to lure customers away from their competitors, so new telecom companies provide various schemes and services to attract customers to switch from a competitor's service to their own [1]. The loss of a customer from a service provider is known as churn [2]. Therefore, established companies should predict churn customers and retain their existing customers.

A telecom company needs to pay attention to churn customer data every month. The researcher works in a telecom company and provides the churn customer report. Last year, the researcher found that the churn rate was 4.73%, which is 1.05 percentage points higher than the monthly average rate of 3.68% for the whole network. The researchers provided comprehensive big data support through data modeling and analysis. The customers' churn characteristics were analyzed and the problem was found. Competitors jointly launched the "King Card" business to the public with an Internet game company, which caused a significant upsurge in the industry market by lowering the tariff and exempting the traffic usage fee for the game company's series of products. This tariff policy, combined with the operation mode of APP cartridge pushing, has a significant impact on the loss of active users of the game.

The researchers analyzed customer characteristics in terms of usage quality, APP access features, the time segments and number of times of playing King of Glory, customers' package attributes, customers' mobile terminal attributes, traffic saturation, terminal compensation value and fee compensation value, and carried out further detailed analysis to find countermeasures. User data were modeled with the XGBoost algorithm, which is an ensemble learning method. This is known as Customer Churn Prediction (CCP) [3]. The model was optimized repeatedly with GridSearchCV as a parameter tuning tool. Finally, the accuracy of the model on the test set is 85.1%. The researchers predicted about 11000
customers per month who may be about to churn. Using the K-means clustering method, the roughly 11000 churn target customers per month are classified into three categories: traffic-restrained users, price-sensitive users and traffic infinite relocation users. The researchers suggested that the company adopt several methods to retain customers. For the three kinds of target churn customers, we successfully retain about 9000 users per month through online public number precise push, off-line stock center call, stage gift traffic, gift wing payment of red envelope, recharge discount, relocation of an unlimited package and other policies. The churn rate is reduced by 0.93%. This big data analysis can be used for Customer Churn Prediction and to retain customers for the telecom industry.

2. CONCEPTUAL FRAMEWORK
There are three main difficulties in customer churn prediction modeling. First, real customer churn data sets are substantially imbalanced. Second, the samples are relatively scattered in feature space. Third, the dimension of the feature space is high, and dimension reduction is necessary for algorithm efficiency [4]. It is therefore challenging to build a good model. The conceptual framework is shown in Figure 1.

[Figure 1: three-stage diagram. INPUT: Data Collecting, Data Cleansing. PROCESS: Feature Analysis, Multi-feature Analysis, Modeling. OUTPUT: Prediction, Assessment, Solution.]
Figure 1. Design framework.

INPUT: Due to the rapid development of information technology, especially the rapid development of the Internet, the dimension of customer attributes is quite large. Data collection is a fundamental and enormous task. The data are also not very clean and need to be cleaned with experience.
PROCESS: High-dimensional features increase computing costs and information redundancy, so feature analysis and multi-feature analysis are needed to find reasonable features to model.
OUTPUT: Customers leave the company for different reasons. In other words, the positive points (churn cases) are dispersed in the feature space [4]. Therefore, the model needs repeated optimization and evaluation. After prediction, the researchers should classify churn customers and put forward suggestions on how to solve the problem.

3. REVIEW OF RELATED LITERATURE
The researchers have worked in a telecom company for many years. Every month, churn customers need close attention, but customer churn is difficult to analyze. The researchers studied material from the internet and from other companies and considered how to predict customer churn. Big Data has significant impact in developing and supporting modern societies [5].

3.1 Customer Churn
The new customers of one telecommunication company are mostly the lost customers of another. Telecommunication companies suffer from losing customers from time to time, which is known as customer churn [4]. It has become harder for telecom providers to acquire new customers, and the need for retaining existing ones has become of paramount importance [6]. So the goal of our project is to analyze and predict which customers are likely to churn by using a dataset of customers. This will help telecom companies know in advance which customers may switch from their service to a competitor's service [1]. Telecom competition in India is similar to that in China, and customer churn analysis has been done there.

3.2 Data Cleansing
Data cleansing can be considered an activity that is performed on the data sets of the data warehouse. The cleansing is done in order to enhance and collectively maintain data consistency and quality [7]. Data cleansing is vital, especially for redefining null values, time segments, etc.

3.3 Feature Analysis
When the number of features is very large relative to the number of observations in your dataset, certain algorithms struggle to train effective models. This is called the "Curse of Dimensionality," and it's especially relevant for clustering algorithms that rely on distance calculations [8]. Feature models have been commonly used to model the variability in software product lines [9]. Feature analysis and multi-feature analysis can help determine which parameters are important. Through these analyses, the reasons for churn may be found. In this study, the researchers carried out extensive feature analysis and multi-feature analysis based on experience.

3.4 Model Building
The paper "DyadChurn: Customer Churn Prediction using Strong Social Ties" states: "We propose a dyadic based churn prediction model, Dyad Churn, where customer churn is modeled through social influence that propagates in the telecom network over strong social ties" [6]. This is an advantage, but it needs a lot of data, and collecting these data is very difficult. At the same time, the modeling workload is also very large. Because the reasons for telecom customer churn are scattered, the accuracy and recall rate of such models are also problems. As a result, churn prediction has become one of the main telecom challenges [10].

4. METHODOLOGY
A telecom company's administrators need to pay attention to churn customer data every month. The researcher works in a telecom company and provides the churn customer report. Big data analysis should be done daily. When a problem is found, big data analysis should also be done to find the cause of the problem and a method to solve it. The methodology is shown in Figure 2.

[Figure 2: flow diagram with boxes labeled Daily big data analysis, Discovered problems, Data Collecting, Feature Analysis, Multi-feature Analysis, Modeling, Prediction, Assessment, Solution.]
Figure 2. Methodology

First, the researchers collect and clean the data, such as: time segment and times of accessing the internet, customers' package attributes, customers' mobile terminal attributes, traffic
saturation, terminal compensation value and fee compensation value, etc. Then, the researchers do feature analysis and multi-feature analysis. It may be necessary to collect new data. New data are requested from other systems; sometimes a new system or method has to be built to collect them.
Second, the researchers build the model through business experience and analysis of the above features. For prediction and assessment, user big data are modeled with the XGBoost algorithm, which is an ensemble learning method. To improve the efficiency and accuracy of prediction, the researchers should select appropriate features and remove the less important parameters. The model is optimized repeatedly with GridSearchCV as a parameter tuning tool. After repeated prediction, evaluation and optimization, a reasonably high accuracy and recall rate can be reached.
Finally, the researchers use the K-means clustering method to classify the prediction result, propose solutions and suggest methods for the company to retain customers.

5. RESULT AND DISCUSSION

5.1 Background
The researcher found that customer churn was 1 percentage point higher than in previous months. Running majority analysis, it was found that churn was concentrated in one feature. Further analysis of users' online behavior showed that these users were playing a game: King of Glory. The analysis data set includes one month of data on users' online behavior and other customer profiles.
According to a market survey, China Unicom partnered with Tencent and launched Unicom's large and small King Card business to the public, which caused a great upsurge in the industry market through lower tariffs and exemption from data traffic usage fees for Tencent products. This tariff policy, combined with the operation mode of Tencent App cartridge pushing, has a significant impact on the loss of active users of Tencent games on local E-surfing mobile.

5.2 Customer Churn Feature Analysis

5.2.1 Times of Accessing APP
The number of times each person accesses the APP King of Glory per month is separated into three classes: seldom (times<30), often (30<=times<100), usually (times>=100).

Figure 3. Times of accessing to APP – churn.

From the analysis of the number of times of accessing the APP King of Glory shown in Figure 3, the researcher found that customers in the "usually" class churn easily. Because they have a large demand for internet data, they readily churn and choose a competitor's service.

5.2.2 Telecom Package Price
Set price (package price) is the minimum fee per month. It includes internet traffic, voice, etc. Set price is divided into bands of 10 or 20 ¥. For example, 50 means 40 < set price <= 50. The relationship between package price and customer churn is shown in Figure 4.

Figure 4. Churn - package price.

From the analysis of mobile phone package prices, the researcher found that customers on the 50 ¥ card package churn very easily. These customers accounted for 27% of the total churn. The churn rate is very high.

5.2.3 Sales Channels
Of the top 10 sales channels of churn customers, 5 are campus channels. The probability of churn among students is relatively high, which is related to students' lack of income and pursuit of cheap tariffs. Of course, students also like playing King of Glory very much.

5.2.4 Internet Traffic Saturation in Telecom Package
The researchers studied the ratio of internet traffic used to the internet traffic included in the package. The analysis of Internet Traffic Saturation – Churn is shown in Figure 5. Usage is separated into five segments, where P is the package internet traffic: low (usage<=80% P), close (80% P<usage<=100% P), exceed (100% P<usage<=130% P), double (130% P<usage<=200% P), high (usage>200% P). 100% P is the saturation point of internet traffic in the telecom package.

Figure 5. Internet traffic saturation – churn.

Customers whose usage is close to or exceeds the saturation of the main telecom package's internet traffic are prone to churn. They fall into two groups:

1. Rational consumer: the package internet traffic is not enough to support the consumer's demand; they pay more attention to cost-effectiveness and often check the residual internet traffic in the package. These users are vulnerable to the unlimited internet
traffic and low fee policies of Unicom "King Card" for Tencent games.
2. Heavy internet traffic consumer: because of a huge demand for internet traffic, usage often exceeds the package allowance, and the user buys overflow packages. The "King Card" policy of unlimited Tencent game traffic is a huge temptation.

5.2.5 Terminal Brand
The researchers analyzed the mobile terminal brands among churn customers, as shown in Figure 6.
Users of OPPO and VIVO account for half of all churn users, and the churn rate of users of these two brands is relatively high. These churn figures do not fully reflect the quality and stickiness of the phones, because the sales volumes of the brands differ, and high-volume brands naturally account for a high proportion of lost users. But this shows that we should pay attention to the mobile phone brand.
The researchers found that terminal price is an important factor. Users of high-end mobile phones are more likely to recognize the service quality of telecommunications products, with high loyalty and relatively low loss. Based on the analysis of the brands used by churn users, users who choose high-end Huawei, Samsung and Xiaomi mobile phones are relatively conservative and not prone to churn.

Figure 6. Terminal brand – churn.

5.3 Customer Multi-feature Analysis
5.3.1 Characteristic Correlation Analysis
Before model training, we analyze the pairwise correlation between features and the correlation between each feature and the label. Features that are strongly correlated with the target label are retained directly. A feature that is strongly correlated with all the other features has no discriminating power, so it can be removed directly.
From the correlation analysis chart shown in Figure 7, red represents positive correlation, blue represents negative correlation, and the deeper the color, the stronger the correlation. Strongly correlated features should be examined with a concrete multi-feature analysis.
The focus is on the following characteristics: the time period of playing King of Glory, the number of times of playing King of Glory, terminal brand, how old or new the terminal is, user balance, whether the user is a telecom broadband user, and traffic within the package.
The above multi-feature analysis, used for feature screening in modeling, can determine through visual display which features to use, and can even improve the accuracy of model prediction by combining two features.

Figure 7. Characteristic correlation analysis.

5.3.2 The Time And Times of Playing APP
The researchers separated customers who played King of Glory into two categories according to the number of accesses (log-ins). One is more access (more than 50 times per month), and the other is fewer access. They are shown in Figure 8 and Figure 9.

Figure 8. Log in more times – churn.

Users who play King of Glory more often (log in more than 50 times) in the middle of the night (from 11 o'clock to 7 o'clock the next day) are more likely to churn than those who play at other times.

Figure 9. Log in fewer times – churn.
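
As a minimal sketch of the split described in Section 5.3.2, the following Python groups users by log-in class (more than 50 log-ins per month versus fewer) and by play period (23:00 to 07:00 versus the rest of the day) and computes the churn rate of each group. The record fields (`logins`, `hour`, `churned`), the helper names and the sample values are hypothetical, as is the choice to treat 07:00 as the first daytime hour; the paper does not give its data schema.

```python
from collections import defaultdict

LOGIN_THRESHOLD = 50  # "more than 50 times per month" (Section 5.3.2)


def login_class(logins):
    """Return 'more' for users above the log-in threshold, else 'fewer'."""
    return "more" if logins > LOGIN_THRESHOLD else "fewer"


def period_of(hour):
    """Map an hour of day (0-23) to 'night' (23:00-06:59) or 'day_evening'."""
    return "night" if hour >= 23 or hour < 7 else "day_evening"


def churn_rate_by_group(records):
    """Compute the churn rate for each (login class, play period) group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [churned, total]
    for r in records:
        key = (login_class(r["logins"]), period_of(r["hour"]))
        counts[key][1] += 1
        if r["churned"]:
            counts[key][0] += 1
    return {key: churned / total for key, (churned, total) in counts.items()}


# Hypothetical sample records, for illustration only (not the paper's data).
sample_users = [
    {"logins": 80, "hour": 1, "churned": True},
    {"logins": 60, "hour": 23, "churned": True},
    {"logins": 70, "hour": 2, "churned": False},
    {"logins": 90, "hour": 20, "churned": True},
    {"logins": 55, "hour": 15, "churned": False},
    {"logins": 10, "hour": 21, "churned": False},
    {"logins": 50, "hour": 3, "churned": False},
    {"logins": 20, "hour": 12, "churned": True},
]

for group, rate in sorted(churn_rate_by_group(sample_users).items()):
    print(group, f"{rate:.2f}")
```

Comparing the rates across the four groups is one way to reproduce the kind of comparison plotted in Figure 8 and Figure 9; with real data, the play period would more likely be derived from session logs than from a single representative hour per user.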
Most users play King of Glory during the day and evening, before 23 o'clock, and in each period users who play more times are more likely to churn than those who play fewer times.

5.3.3 Balance and Telecom Broadband Purchase
Customers are separated into 4 categories according to balance (little, much) and broadband (purchased or not). The analysis of Balance and Broadband Purchase – Churn is shown in Figure 10.

Figure 10. Balance and broadband purchase – churn.

In general, the majority of King of Glory's users are telecom broadband users, who are relatively unlikely to leave the network, and users with large account balances are even less likely to leave. However, for users without telecom broadband and with very small balances, the probability of leaving the network is very high.

5.4 Modeling
Through business experience and analysis of the above features, 35 appropriate features were selected, and the accuracy of the model was improved by combining a single-feature filtering method. The feature importance is shown in Figure 11.

Figure 11. Feature importance.

User data are modeled with the XGBoost algorithm, which is an ensemble learning method. The model is optimized repeatedly with GridSearchCV as a parameter tuning tool. Finally, the accuracy of the model on the test set is 85.1%. The model predicts potential churn customers among active users of King of Glory and yields a list of about 11000 churn customers per month.

5.5 Classify Churn Target Customers
Customer segmentation is an important concept for designing marketing campaigns to improve businesses and increase revenue. Clustering algorithms can help marketing experts to achieve this goal [11]. Using the K-means clustering method, the 11000 churn target customers per month are classified into three categories: traffic-restrained users, price-sensitive users and traffic infinite relocation users. Online marketing is carried out with matching policies for each category: free gift traffic, recharge discount and relocation to an unlimited package, respectively.
For the three kinds of target churn customers, we successfully retain about 9000 potential King of Glory churn customers monthly, and the churn rate is reduced by 0.93%. This study mainly aims to find individual problems, research them in depth and propose solutions. Of course, daily customer churn also needs to be analyzed, but it is generally difficult to identify specific problems.

6. CONCLUSION
Churn is a big problem faced by telecom companies in a fiercely competitive market. The large dimension of customer attributes, the low similarity of reasons for churn and the low churn rate make churn analysis difficult. The researchers conducted feature analysis and multi-feature analysis. Based on experience and feature selection, the researchers selected features with a high degree of correlation to improve prediction accuracy and efficiency. In this paper, the researchers successfully analyzed customer churn. User data are modeled with the XGBoost algorithm. The model is optimized repeatedly with GridSearchCV. The accuracy of the model on the test set is 85.1%. The researchers predicted a list of about 11000 customers per month who may be about to churn. Using the K-means clustering method, the roughly 11000 churn target customers per month are classified into three categories, and telecom companies are advised to adopt several methods to retain customers. Every month about 9000 customers are successfully retained. The churn rate is reduced by 0.93% from 4.73%. At a mobile ARPU of 50 Yuan per user, 5.4 million Yuan of business income was brought in over the whole year through the King of Glory churn defense. At the same time, feature analysis can find solutions to the problem. Additional customer churn features may be added for better prediction. This customer churn prediction through big data analysis may be used in other Asian countries with large populations, such as India, the Philippines, Vietnam and Indonesia, to retain customers for the telecom industry.

7. REFERENCE
[1] Anujkumar Tiwari, Reuben Sam, Shakila Shaikh. Analysis and prediction of churn customers for telecommunication industry. 2017 International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). DOI= https://doi.org/10.1109/I-SMAC.2017.8058343.
[2] Giridhar Maji, Sharmistha Mandal, Souvik Bhattacharya, Soumya Sen. March 2017. Designing combo recharge plans for telecom subscribers using itemset mining technique. 2017 IEEE International Conference on Industrial Technology (ICIT). DOI= https://doi.org/10.1109/ICIT.2017.7915539.
[3] Adnan Amin, Sajid Anwar, Awais Adnan, Muhammad Nawaz, Kaizhu Huang. May 2017. Customer churn prediction in the telecommunication sector using a rough set approach. Neurocomputing, Volume 237, 10 May 2017,
Pages 242-254. DOI= https://doi.org/10.1016/j.neucom.2016.12.009.
[4] Hui Li, Deliang Yang, Lingling Yang, Yao Lu, Xiaola Lin. Oct. 2016. Supervised Massive Data Analysis for Telecommunication Customer Churn Prediction. 2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud). DOI= https://doi.org/10.1109/BDCloud-SocialCom-SustainCom.2016.35.
[5] Rahat Iqbal, Faiyaz Doctor, Brian More, Shahid Mahmud, Usman Yousuf. April 2018. Big data analytics: Computational intelligence techniques and application areas. Technological Forecasting and Social Change, In press, corrected proof, Available online 24 April 2018. DOI= https://doi.org/10.1016/j.techfore.2018.03.024.
[6] Marwa N. Abd-Allah, Samhaa R. El-Beltagy, Akram Salah. July 2017. DyadChurn: Customer Churn Prediction using Strong Social Ties. IDEAS 2017: Proceedings of the 21st International Database Engineering & Applications Symposium. DOI= https://doi.org/10.1145/3105831.3105832.
[7] Saad B. Alotaibi. August 2017. ETDC: An Efficient Technique to Cleanse Data in the Data Warehouse. ICAIP 2017: Proceedings of the International Conference on Advances in Image Processing. DOI= https://doi.org/10.1145/3133264.3133296.
[8] Elitedatascience. May 2017. Dimensionality Reduction Algorithms: Strengths and Weaknesses. https://elitedatascience.com/dimensionality-reduction-algorithms.
[9] Lamia Abo Zaid, Frederic Kleinermann, Olga De Troyer. January 2011. Feature Assembly Framework: towards scalable and reusable feature models. VaMoS '11: Proceedings of the 5th Workshop on Variability Modeling of Software-Intensive Systems, Pages 1-9. DOI= https://doi.org/10.1145/1944892.1944893.
[10] Marwa N. Abd-Allah, Akram Salah, Samhaa R. El-Beltagy. November 2014. Enhanced Customer Churn Prediction using Social Network Analysis. DUBMOD '14: Proceedings of the 3rd Workshop on Data-Driven User Behavioral Modeling and Mining from Social Media. DOI= https://doi.org/10.1145/2665994.2665997.
[11] Wafa Qadadeh, Sherief Abdallah. 2018. Customers Segmentation in the Insurance Company (TIC) Dataset. Procedia Computer Science, Volume 144, 2018, Pages 277-290. DOI= https://doi.org/10.1016/j.procs.2018.10.529.
