
Customer Churn Prediction in Telecom Sector

Swasti Arya
Department of Computer Science and Engineering
Sharda University, Uttar Pradesh, India
swasti.arya24@gmail.com

Yuvraj Harshvardhan
Department of Computer Science and Engineering
Sharda University, Uttar Pradesh, India
yuvrajharshvardhan01@gmail.com

Dr. Priyanka Tyagi
Department of Computer Science and Engineering
Sharda University, Uttar Pradesh, India
priyanka.tyagi@sharda.ac.in

Abstract-The telecommunication sector is an important industry in developing countries. Technological progress and the rapid increase in the number of operators have raised competition in the industry [1]. Customer churn is one of the biggest problems facing mobile phone companies, as it can have a significant impact on sales and revenue. Churn can happen for a variety of reasons, including switching to a competitor, canceling a subscription because of poor customer service, or ending all contact with a brand because of insufficient touch-points [2]. To meet this challenge, predictive modeling tools such as decision trees and random forests are used to identify the customers most likely to churn. This study compares the performance of decision trees and random forests in predicting customer churn in telecommunications. Telecommunication providers collect data about customers, usage behavior, and billing; this data is preprocessed and divided into training and test sets. The decision tree and random forest are then trained on the training set and evaluated on the test set. The most important features for churn prediction are identified using the importance scores of the Random Forest algorithm. The data shows that the most important factors are the number of customers, the monthly price, and the number of registered services. The research concludes that random forests can predict customer churn in the communications industry, and telecommunications companies can use these findings to create retention strategies for customers who are likely to leave.

Keywords: Random Forest, Decision tree, Telecom sector, Prediction, Churning

INTRODUCTION
In the telecommunication industry, churn is the activity of customers discontinuing the services offered and leaving the company, either because they are dissatisfied with the services or because another network provider offers a better product at a price the customer finds more favorable. This leads to a significant loss of revenue and profit and a high cost to the company [3].
The rapid growth of the telecommunication industry in recent years has led to intense competition among operators. Therefore, to maintain business profitability, telecommunication companies must focus on customer retention. Customer churn is a major concern for the telecommunications industry, as customers choose to switch to competing service providers or to full-service telecommunications providers. To meet these challenges, predictive modeling tools are used to identify customers at risk of churning. Decision trees and random forests are popular among these strategies because they can process large amounts of data and identify the key factors that cause customer churn.
A decision tree is a tree model that repeatedly divides the data into smaller subsets based on the features most informative about the target variable. A random forest, on the other hand, is an ensemble learning method that combines many decision trees to improve prediction accuracy while avoiding overfitting. This study compares the performance of decision trees and random forests in predicting mobile customer churn. Customer data, usage statistics, and billing information from telecommunication providers were used in the study; the data is preprocessed and divided into training and test sets.
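The difference between a single decision tree and a random forest can be illustrated with a short Python sketch. It assumes scikit-learn and NumPy and uses a synthetic dataset from `make_classification` as a stand-in for churn data (1 = churned, 0 = stayed); the helper names `fit_forest` and `predict_forest` are hypothetical and do not reproduce the study's implementation. Each tree is trained on a bootstrap sample of the rows and considers only a random subset of features at each split, and the forest predicts by majority vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=50, seed=0):
    """Fit n_trees trees, each on a bootstrap sample of the training rows.

    max_features="sqrt" makes every split consider only a random subset of
    the features, which decorrelates the trees.
    """
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        rows = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(2**31 - 1)))
        trees.append(tree.fit(X[rows], y[rows]))
    return trees


def predict_forest(trees, X):
    """Majority vote of the trees (labels assumed to be 0 and 1)."""
    votes = np.stack([tree.predict(X) for tree in trees])  # (n_trees, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)


# Synthetic stand-in for churn data (hypothetical, not the study's dataset).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=42)

single = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = fit_forest(X_train, y_train)

single_acc = (single.predict(X_test) == y_test).mean()
forest_acc = (predict_forest(forest, X_test) == y_test).mean()
print(f"single tree: {single_acc:.3f}  forest: {forest_acc:.3f}")
```

scikit-learn's `RandomForestClassifier` applies the same bootstrap and feature-subsampling idea, but it combines the trees by averaging their predicted class probabilities rather than counting hard votes.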

In this study, accuracy, precision, recall, and F1 score are used to measure the performance of the decision tree and random forest models on the test data. Additionally, this study uses the importance scores generated by the random forest algorithm to identify the most important factors affecting customer churn. Telecommunications companies can use the findings of this research to develop retention plans that reduce customer churn and increase customer satisfaction. The best model, the random forest classifier, reaches 99% accuracy on the test data.
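The four metrics can be computed directly from the counts of true and false positives and negatives. The following Python sketch is illustrative only: the function name `classification_metrics` and the toy labels are assumptions, not data from this study. It also shows why accuracy alone can mislead: a model that predicts that no customer churns still reaches 80% accuracy on these labels, yet its recall is zero.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary churn label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # churners caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # churners missed
    tn = len(pairs) - tp - fp - fn                               # stayers correctly kept
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, not data from this study: 1 = churned, 0 = stayed.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
model_a = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # catches one of the two churners
model_b = [0] * 10                         # always predicts "stayed"

for name, y_pred in [("model_a", model_a), ("model_b", model_b)]:
    scores = classification_metrics(y_true, y_pred)
    print(name, {k: round(v, 3) for k, v in scores.items()})
# model_a {'accuracy': 0.9, 'precision': 1.0, 'recall': 0.5, 'f1': 0.667}
# model_b {'accuracy': 0.8, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```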

FIGURE 1
FLOW OF THE PAPER

LITERATURE SURVEY

TABLE I
ANALYSIS OF PAST PAPERS
| Title | Authors | Algorithms Used | Methodology | Result |
|---|---|---|---|---|
| A comparison of ML techniques for customer churn prediction | T. Vafeiadis, K. I. Diamantaras, G. Sarigiannidis, K. C. Chatzisavvas | ANN, SVM, Naïve Bayes | Compared the performance of machine learning algorithms in predicting customer churn | SVM-POLY with AdaBoost achieved an accuracy of almost 87% and an F-measure over 84% |
| Churn Prediction of Customer in Telecom Industry using ML Algorithms | G. Hemanth Kumar, S. V. Mohan, V. Kavitha, M. Harish | XGBoost, RF, Logistic Regression | Data preprocessing, data filtering, feature selection, and model training | Accuracy of 0.78 for XGBoost, 0.80 for RF, and 0.79 for Logistic Regression |
| Churn Prediction in Telecommunication Industry Using Rough Set Approach | Changez Khan, Saeed Shehzad, Adnan Amin, Imtiaz Ali, Sajid Anwar | NN, SVM, Decision Tree | Data preprocessing, feature selection, and model training | Neural Network achieved an accuracy of 73.2% |
| A Data-Driven Approach to Improve Customer Churn Prediction Based on Telecom Customer Segmentation | Sérgio Moro, Tianyuan Zhang, Ricardo F. Ramos | Logistic Regression, Fisher discriminant equations | Data collection, data analysis, dataset description, feature selection, and model training | Logistic regression achieved an accuracy of 83.94% |
| Customer Churn Prediction in Telecommunication | Naveed Anwer Butt, Nabgha Hashmi, Dr. Muddesar Iqbal | Decision Tree, Random Forest, Logistic Regression, Support Vector Machine | Data preprocessing, feature selection, dataset splitting, applying machine learning algorithms, and performance evaluation | Random Forest achieved the highest accuracy, 87.5% |
| Customer churn prediction system: a ML approach | Manas Kumar Mishra, Jasroop Singh Chadha, Praveen Lalwani, Pratyush Sethi | Logistic Regression, Naïve Bayes, SVM, AdaBoost, XGBoost Classifier | Data preprocessing, feature selection, dataset splitting, applying machine learning algorithms, and performance evaluation using various metrics | AdaBoost and XGBoost classifiers gave the highest accuracies, 81.71% and 80.8% respectively |
| Methods for churn prediction in the pre-paid mobile telecommunication industry | Gavril Toderean, Horia Beleiu, Ionuț Brândușoiu | SVM, Bayesian network, Neural network | Data preprocessing, feature selection, dataset splitting, applying machine learning algorithms, and performance evaluation using various metrics | Bayesian network gives the highest accuracy |
| Ensemble based approach using a combination of clustering and classification algorithms to enhance customer churn prediction of telecom industry | Syed Fakhar Bilal, Abdulwahab Ali Almazroi, Saba Bashir, Farhan Hassan Khan, Abdulaleem Ali Almazroi | K-means, K-medoids, X-means and random clustering, Deep learning, Naïve Bayes | Data preprocessing, feature selection, dataset splitting, applying machine learning algorithms, and performance evaluation using various metrics | The ensemble-based approach achieved the highest accuracy |
| Predicting customer churn in the telecom industry using Multilayer Perceptron Neural Networks | Omar Adwan, Khalid Jaradat, Osama Harfoushi, Hossam Faris, Nazeeh Ghatasheh | MLPNN | Data preprocessing, feature selection, model training and evaluation | The MLPNN algorithm achieved a high accuracy rate of 91.5% |
| Customer churn prediction in telecom using machine learning in big data platform | Abdelrahim Kasem Ahmad, Assef Jafar, Kadan Aljoumaa | Decision Tree, Random Forest, GBM, XGBoost | Data preprocessing, feature selection, model training and evaluation, model comparison | XGBoost works best, with an accuracy of 93.031% |
| A support vector machine approach for churn prediction in telecom industry | Ali Rodan, Hossam Faris, Jamal Alsakran, Omar Al-Kadi | SVM | Feature selection, data preprocessing, model training and evaluation | SVM achieved the best churn prediction rate, 90.3% |

METHODOLOGY
Churn is the rate at which customers stop doing business with a company. It is measured as the proportion of customers who stop using a product in a given period. Customer loss can be voluntary (the customer decides to switch to a competitor's product or service) or involuntary (for example, the customer moves and can no longer use the company's goods or services). Customer churn is important for businesses to track, as it affects revenue, profitability, and customer satisfaction. Understanding why customers leave can help businesses reduce churn and retain more customers over time.
The following is a detailed methodology:
1. Importing Libraries:
The first step is to import the necessary libraries, such as NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn, and Plotly Express, which are used for data manipulation, visualization, and machine learning algorithms.
2. Basic Dataset Exploration:
In this step, the dataset is loaded and basic statistical information is obtained. The goal is to get an initial understanding of the data, such as the number of rows and columns, data types, duplicated values, and missing values.
Conclusion:
•The dataset has 8946 rows and 38 columns.
•There are missing (NaN) values; the NaN values in each column are treated during cleaning.
•There is no duplicated data.
3. Formatting and Cleaning Dataset:
The data is often not in the right format or contains errors that need to be addressed. In this step, we format the data, remove unnecessary columns, and handle missing data using KNN. We use data visualization techniques to identify errors, such as outliers, and handle them appropriately.
4. Simple Visualization:
Visualizing the data is an essential part of any data analysis. We use basic visualization techniques, such as histograms, to get insights into the data. These insights help us identify patterns and relationships in the data.
Fig. 2 Histogram representing how many people churned and what was the reason
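Steps 2 to 4 above can be sketched in Python with pandas and scikit-learn's `KNNImputer`. The miniature table below is hypothetical: its column names (`Offer`, `Contract`, `Monthly Charge`, `Tenure in Months`, `Customer Status`) and values are assumptions chosen to resemble the categories discussed in this paper, not the study's 8946-row dataset. Treating a missing offer as a `None` category is likewise an assumption, and a cross-tabulation stands in for the histograms so that the sketch has no plotting dependency.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical miniature of the churn table; columns and values are
# illustrative assumptions, not the study's data.
df = pd.DataFrame({
    "Customer ID": ["c1", "c2", "c3", "c4", "c5", "c6"],
    "Offer": ["Offer E", "Offer A", None, "Offer E", "Offer B", None],
    "Contract": ["Month-to-Month", "Two Year", "One Year",
                 "Month-to-Month", "Two Year", "Month-to-Month"],
    "Monthly Charge": [89.1, 25.3, np.nan, 95.0, 30.2, 70.4],
    "Tenure in Months": [2, 60, 24, np.nan, 48, 5],
    "Customer Status": ["Churned", "Stayed", "Stayed",
                        "Churned", "Stayed", "Joined"],
})

# Step 2: basic exploration.
print(df.shape)                          # (rows, columns)
print(df.dtypes)                         # data type of each column
print("duplicated rows:", df.duplicated().sum())
print(df.isna().sum())                   # missing values per column

# Step 3: cleaning.
df = df.drop(columns=["Customer ID"])    # an identifier carries no signal
numeric = df.select_dtypes(include="number").columns
# Fill each numeric gap from the 2 most similar rows (k-nearest neighbours).
df[numeric] = KNNImputer(n_neighbors=2).fit_transform(df[numeric])
# Assumption: a missing offer means the customer took no offer.
df["Offer"] = df["Offer"].fillna("None")

# Step 4: the counts behind a status-by-offer histogram.
print(pd.crosstab(df["Offer"], df["Customer Status"]))

# Churn rate among customers whose outcome is known (Joined excluded).
known = df[df["Customer Status"] != "Joined"]
churn_rate = (known["Customer Status"] == "Churned").mean()
print(f"churn rate: {churn_rate:.2f}")   # 2 of the 5 known customers churned
```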

Fig. 3 Representation of customers who stayed, churned, and joined based on different categories
Fig. 4 Histogram representing customer status on different aspects
Fig. 5 Colormap representing the status of customers who stayed and churned

From data visualization, we can conclude the following causes of churn:
•Offer E is the biggest contributor to customers leaving the company, in contrast to Offers A and B, whose customers tend to stay.
•Customers who do not use services such as Internet Service, Multiple Lines, or Online Backup churn more often than they stay.
•Most visibly, the monthly contract is the main factor behind customers churning.

5. Preprocessing:
Before we can build machine learning models, we need to preprocess the data. This step includes splitting the data into training and test sets and encoding categorical variables. We use feature selection techniques to reduce the number of features used in the models, which helps to reduce overfitting. We delete the rows of customers with the Joined status, so that the prediction focuses only on Churned or Stayed.
6. Modelling:
After preprocessing the data, we are ready to build machine learning models. We use cross-validation techniques to evaluate each model's performance and choose the best-performing model.
7. Decision Tree Classifier:
A decision tree is a model built by repeatedly splitting the data into smaller subsets based on the most significant feature. We use the Decision Tree Classifier algorithm to build a decision tree model.
8. Random Forest Classifier:
A random forest is an ensemble learning method that builds multiple decision trees and combines their predictions to make a final prediction. We use the Random Forest Classifier algorithm to build a random forest model, and we tune its hyperparameters to improve its performance.
9. Conclusion:
In conclusion, we have applied various data analysis techniques and ML algorithms to predict customer churn. We used simple visualization techniques to get insights into the data, preprocessed the data to build ML models, and used decision tree and random forest classifiers to build models. Our best model was the Random Forest Classifier, with an accuracy of 99% on the test data.
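Steps 5 to 8 above can be sketched as a scikit-learn workflow. Everything in the sketch is an assumption for illustration: the data are synthetic, generated with a toy rule in which month-to-month and short-tenure customers churn more often; the column names are hypothetical; the hyperparameter grids are arbitrary; and the feature selection step is omitted. The sketch therefore does not reproduce the 97% and 99% accuracies reported in this paper. It only shows the workflow of dropping Joined customers, one-hot encoding, a stratified train/test split, cross-validated tuning with `GridSearchCV`, test-set metrics, and the random forest's `feature_importances_`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cleaned churn table (hypothetical columns).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "Contract": rng.choice(["Month-to-Month", "One Year", "Two Year"], size=n),
    "Offer": rng.choice(["None", "Offer A", "Offer B", "Offer E"], size=n),
    "Monthly Charge": rng.uniform(20, 120, size=n).round(2),
    "Tenure in Months": rng.integers(1, 72, size=n),
})
# Toy rule: month-to-month and short-tenure customers churn more often.
p_churn = (0.1
           + 0.5 * (df["Contract"] == "Month-to-Month")
           + 0.3 * (df["Tenure in Months"] < 12))
status = np.where(rng.random(n) < p_churn, "Churned", "Stayed")
status[rng.random(n) < 0.05] = "Joined"   # a few newly joined customers
df["Customer Status"] = status

# Step 5: keep only Churned vs Stayed, one-hot encode categoricals, split.
df = df[df["Customer Status"] != "Joined"]
y = (df["Customer Status"] == "Churned").astype(int)
X = pd.get_dummies(df.drop(columns=["Customer Status"]), dtype=int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Steps 6-8: 5-fold cross-validated hyperparameter search for each model.
models = {
    "decision tree": GridSearchCV(
        DecisionTreeClassifier(random_state=42),
        {"max_depth": [3, 5, None]}, cv=5, scoring="accuracy"),
    "random forest": GridSearchCV(
        RandomForestClassifier(random_state=42),
        {"n_estimators": [100, 200], "max_depth": [5, None]},
        cv=5, scoring="accuracy"),
}
results = {}
for name, search in models.items():
    search.fit(X_train, y_train)          # refits the best model on all training data
    pred = search.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
    }
    print(name, search.best_params_, results[name])

# Random forest importance scores, highest first.
forest = models["random forest"].best_estimator_
importance = pd.Series(forest.feature_importances_, index=X.columns)
print(importance.sort_values(ascending=False).head(5))
```

On data like this, the importance ranking should place the contract and tenure columns near the top, since the toy rule uses only those two; on real data, the ranking is what the study uses to identify the main churn factors.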

FIGURE 6
WORKING OF THE MODEL

RESULT AND CONCLUSION
Churn predictions are important for businesses that want to identify customers who are about to leave. This information can be used to retain customers, reduce customer acquisition costs, and increase revenue. In this study, two machine learning methods are used to predict customer churn from customer data: a decision tree classifier and a random forest classifier.
Accuracy is 97% for the decision tree classifier and 99% for the random forest classifier. These high accuracy values indicate that both models predict accurately whether a customer will leave.
However, the Random Forest classifier outperforms the Decision Tree classifier in terms of accuracy. A random forest classifier is an ensemble model that combines multiple decision trees to increase accuracy and reduce overfitting, which might explain why it outperforms the single decision tree in this case.
Based on these findings, it can be concluded that the random forest is the best model for predicting customer churn in this case. Its higher accuracy indicates that it will be more useful for identifying customers at risk of churning and will allow organizations to take steps to retain them.
It should be noted that accuracy is not the only statistic to examine when evaluating machine learning models. Other measures, such as precision, recall, and F1 score, should also be explored, as they provide additional information about the performance of the model.
It is also important to evaluate the model on additional held-out data, to ensure that the data used to train and test the model are representative of the population studied and that the results generalize.
Finally, the findings show that machine learning models can predict customer churn. The high accuracy of the decision tree and random forest classifiers suggests that they can be useful tools for companies to identify customers at risk of churning. The Random Forest classifier outperforms the Decision Tree classifier in terms of accuracy and is the recommended model for predicting customer churn in this scenario; additional validation data can be used to determine its reliability for practical use.

REFERENCES
[1] A. K. Ahmad, A. Jafar, and K. Aljoumaa, “Customer churn prediction in telecom using machine learning in big data platform,” J Big Data, vol. 6, no. 1, Dec. 2019, doi: 10.1186/s40537-019-0191-6.
[2] L. Sook Ling, N. Mustafa, and S. F. Abdul Razak, “Customer churn prediction for telecommunication industry: A Malaysian Case Study,” F1000Res, vol. 10, 2021, doi: 10.12688/f1000research.73597.1.
[3] V. Umayaparvathi and K. Iyakutti, “A Survey on Customer Churn Prediction in Telecom Industry: Datasets, Methods and Metrics,” 2016.
[4] T. Vafeiadis, K. I. Diamantaras, G. Sarigiannidis, and K. C. Chatzisavvas, “A comparison of machine learning techniques for customer churn prediction,” Simul Model Pract Theory, vol. 55, pp. 1–9, Jun. 2015, doi: 10.1016/j.simpat.2015.03.003.
[5] G. H. Kumar, M. Kumar, and M. Harish, “Churn Prediction of Customer in Telecom Industry using Machine Learning Algorithms.”
[6] A. Amin, S. Shehzad, C. Khan, I. Ali, and S. Anwar, “Churn prediction in telecommunication industry using rough set approach,” Studies in Computational Intelligence, vol. 572, pp. 83–95, 2015, doi: 10.1007/978-3-319-10774-5_8.
[7] T. Zhang, S. Moro, and R. F. Ramos, “A Data-Driven Approach to Improve Customer Churn Prediction Based on Telecom Customer Segmentation,” Future Internet, vol. 14, no. 3, Mar. 2022, doi: 10.3390/fi14030094.

[8] N. Hashmi and N. A. Butt, “Customer Churn Prediction in Telecommunication: A Decade Review and Classification,” 2014.
[9] P. Lalwani, M. K. Mishra, J. S. Chadha, and P. Sethi, “Customer churn prediction system: a machine learning approach,” Computing, vol. 104, no. 2, pp. 271–294, Feb. 2022, doi: 10.1007/s00607-021-00908-y.
[10] I. Brândușoiu, G. Toderean, and H. Beleiu, “Methods for churn prediction in the pre-paid mobile telecommunication industry.”
[11] S. F. Bilal, A. A. Almazroi, S. Bashir, F. H. Khan, and A. A. Almazroi, “An ensemble based approach using a combination of clustering and classification algorithms to enhance customer churn prediction in telecom industry,” PeerJ Comput Sci, vol. 8, 2022, doi: 10.7717/PEERJ-CS.854.
[12] O. Adwan, H. Faris, K. Jaradat, O. Harfoushi, and N. Ghatasheh, “Predicting Customer Churn in Telecom Industry using MLP Neural Networks: Modeling and Analysis,” Life Sci J, vol. 11, no. 3, pp. 1097–8135, 2014, doi: 10.7537/marslsj110314.11.
[13] A. Rodan, H. Faris, J. Alsakran, and O. Al-Kadi, “A Support Vector Machine Approach for Churn Prediction in Telecom Industry,” 2014.

AUTHOR INFORMATION
Dr. Priyanka Tyagi, Professor, Department of Computer Engineering, Sharda University
Swasti Gandhi Arya, Student, Department of Computer Engineering, Sharda University
Yuvraj Harshvardhan, Student, Department of Computer Engineering, Sharda University
