
CHURN ANALYSIS

In Telco Industry
Group 5

MUHYUDI ANINDYKA L E
BROTO M VICTOR GUNAWAN
WASESO
CONTENT
1. Customer Churn Prediction Concept
2. Related Theory
3. Data Visualization
4. Logistic Regression Interpretation

SWISS GERMANY UNIVERSITY


1. CUSTOMER CHURN MANAGEMENT (1/2)
Churn management is the practice of identifying valuable customers who are likely to churn from a company and taking proactive steps to retain them.

The telecommunication industry faces fierce competition among service providers. Providers keep introducing attractive new offers, which leads customers to switch from one mobile service provider to another. This customer tendency to switch is referred to as churn.

The importance of Churn management


The percentage of a service's subscribers who discontinue their subscription within a given time period is called the churn rate. A high churn rate means that a company loses a large percentage of its subscribers in each period (for example, every month), while a low churn rate indicates that the company is able to retain its subscribers, for example through quality service. Companies therefore try to lower churn by retaining their customers.
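
The churn rate definition above can be written as a one-line calculation. The sketch below is only an illustration: the function name `churn_rate` and the subscriber counts are hypothetical and do not come from the slides.

```python
def churn_rate(subscribers_lost, subscribers_at_start):
    """Return the churn rate for one period as a percentage.

    subscribers_lost: subscribers who discontinued during the period.
    subscribers_at_start: subscribers at the start of the period.
    """
    if subscribers_at_start <= 0:
        raise ValueError("subscribers_at_start must be positive")
    return 100 * subscribers_lost / subscribers_at_start


# Hypothetical month: 1,000 subscribers at the start, 50 of them left.
monthly_rate = churn_rate(50, 1000)
print(f"Monthly churn rate: {monthly_rate:.1f}%")
```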

Churn can be classified into three categories:


1. Account Churn - the customer is completely lost
2. Product Churn - the customer has downgraded their subscription profile
3. Decreased Spend - the customer has reduced their spending without changing their subscription profile

As customer acquisition costs continue to increase, customer churn management has become critically important. A high churn rate can point to problems such as unsatisfactory service, dissatisfied customers, or the class of customers the company attracts, which makes churn management essential.
1. CUSTOMER CHURN MANAGEMENT (2/2)
However, this is difficult because we often do not know:
1. How to manage the critical customer relationship optimally
2. Who the customer would be
3. What exactly they want
4. What makes them leave and what makes them stay

Churn management involves tracking two customer metrics:
1. Churn Score - calculated from customer behavior
2. Customer Value - calculated on the basis of the customer's desires and satisfaction

Both values are calculated and fed to the company's CRM system through application messaging from analytics applications, and are updated on the consumer component's attribute page. CSRs and other company officials can see a customer's churn score by opening the customer's record. By evaluating the churn score, a company can choose the best-suited actions, generally referred to as customer retention tools.

Customer churn may stem from a single issue or from several issues within a company's functional chain. Revenue issues, improper marketing, sales issues, customer dissatisfaction, network coverage problems, or even issues with new technologies and strategy can all lead to alarming churn statistics.

Context
"Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs."

Summary
In a saturated market for communication service providers, customer acquisition no longer ensures sustainable revenue. The paradigm has moved towards customer retention. Impressing subscribers by demonstrating how well you understand their needs is now the key to success. A better understanding of customers, how to target them, and what they need can help reduce churn and lower marketing costs substantially.
LOGISTIC REGRESSION
• Logistic regression is one of the most popular machine learning algorithms and belongs to the supervised learning techniques.
• Although it has "regression" in its name and estimates a continuous probability, it is mainly used for classification problems.
• Logistic regression is used to predict a categorical dependent variable with the help of independent variables.
• The output of a logistic regression model is a probability, so it always lies between 0 and 1.
• Logistic regression can be used where the probability of one of two classes is required, such as whether it will rain today or not (0 or 1, true or false, etc.).
• Logistic regression is fitted by maximum likelihood estimation: the coefficients are chosen so that the observed data are most probable under the model.
• In logistic regression, we pass the weighted sum of inputs through an activation function that maps any value into the range between 0 and 1. This activation function is known as the sigmoid function, and the resulting curve is called the sigmoid curve or S-curve.
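
The last point, passing a weighted sum through the sigmoid function, can be sketched in a few lines. This is a minimal illustration rather than the model used in this deck: the function names, weights, bias, and input values are all hypothetical.

```python
import math


def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    # Two branches avoid overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(features, weights, bias):
    """Weighted sum of the inputs passed through the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical weights and one hypothetical customer record.
weights = [-0.6, 0.2]
bias = -1.0
customer = [1.5, 3.0]

p_churn = predict_probability(customer, weights, bias)
label = 1 if p_churn >= 0.5 else 0  # 1 = churn, 0 = stay
print(p_churn, label)
```

The 0.5 cut-off used here is only a common default; any threshold between 0 and 1 can be used to turn the probability into a class label.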

DIFFERENCE BETWEEN LINEAR REGRESSION AND LOGISTIC REGRESSION

Linear Regression | Logistic Regression
Used to predict a continuous dependent variable from a given set of independent variables. | Used to predict a categorical dependent variable from a given set of independent variables.
Used for solving regression problems. | Used for solving classification problems.
We predict the value of a continuous variable. | We predict the value of a categorical variable.
We find the best-fit line, from which we can predict the output. | We find the S-curve, with which we can classify the samples.
The least squares method is used to estimate the coefficients. | The maximum likelihood method is used to estimate the coefficients.
The output must be a continuous value, such as price, age, etc. | The output must be a categorical value, such as 0 or 1, Yes or No, etc.
The relationship between the dependent variable and the independent variables must be linear. | A linear relationship between the dependent variable and the independent variables is not required.
Some collinearity between the independent variables may be tolerated. | There should be little or no collinearity between the independent variables.

Summary:
1. Linear regression models the data with continuous numeric values, whereas logistic regression models the data with binary values.
2. Linear regression requires a linear relationship between the dependent and independent variables, whereas logistic regression does not.
3. In linear regression, the independent variables may be correlated with each other. In logistic regression, by contrast, the independent variables should not be correlated with each other.
3. DATA VISUALIZATION
Customer Churn Prediction Model

Variables shown in the model diagram, arranged around the Churn Score:
• Internet payment 1st month (INT_PAY_M1)
• Internet revenue within 6 months (INT_REV_INT_MAX6)
• Growth of Internet payment (INT_PAY_DECR)
• Growth of internet usage (INT_DOWNL_DECR)
• Time average to solve customer problem (ALL_MTTR_M2)
• Time average to solve customer problem (Internet only) (INT_MTTR_M6)
• Growth of Time average to solve customer problem (month by month) (ALL_MTTR_DECR)
• Revenue of internet broadband regional (ALL_TRB_M6)
• Length of Stay of customer (LOS)
• ID of regional division (DIVRE_ID)

[Figure: Churn Data chart with the categories Total, Non-Churn, and Churn; the value 543 appears in the figure]

We classify customers as "0" for stay and "1" for churn.
4. INTERPRETATION (1/4)
Outline:
1. Wald Test
2. Hosmer-Lemeshow (Hoslem) Test
3. Coefficient Interpretation
4. Confusion Matrix

Wald Test
Hypothesis :
H0 : βi = 0
H1 : βi ≠ 0

Decision :
Reject H0 when p-value < α

Analysis :
At a significance level of 5%, the variables Internet payment 1st month (INT_PAY_M1), Internet revenue within 6 months (INT_REV_INT_MAX6), Length of Stay of customer (LOS), Growth of internet usage (INT_DWNL_DECR), Growth of Time average to solve customer problem (month by month) (ALL_MTTR_DECR), and ID of regional division (DIVRE_ID) each have a significant partial effect on Customer Churn (Y_CHURN), while the Revenue of internet broadband regional (ALL_TRB_M6) variable is significant only at the 10% level.

The AIC (Akaike Information Criterion) statistic used for model selection is 4054.6. When comparing candidate models fitted to the same data, the model with the smaller AIC is preferred.
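
The Wald decision rule (reject H0 : βi = 0 when the p-value is below α) can be sketched as follows. The slides do not list the coefficient estimates or their standard errors, so the numbers below are hypothetical and only show the mechanics; the function name `wald_test` is also an assumption.

```python
import math


def wald_test(beta, std_err):
    """Wald z-test of H0: beta = 0 against H1: beta != 0.

    Returns the z statistic and the two-sided p-value from the
    standard normal distribution.
    """
    z = beta / std_err
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical coefficient estimate and standard error.
z, p = wald_test(beta=-0.63, std_err=0.15)
alpha = 0.05
print(f"z = {z:.3f}, p-value = {p:.3g}, reject H0: {p < alpha}")
```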

4. INTERPRETATION (2/4)
Hosmer and Lemeshow Test :

Hypothesis :
H0 : Model is fit
H1 : Model is not fit

Decision :
Reject H0 when p-value < α or Chi-square > Chi-square table

Chi-square table value at 95% with df = 8 : 15.50731

Conclusion :
The p-value is 6.249e-08, which is less than 0.05, and Chi-square (49.039) > Chi-square table (15.51), so H0 is rejected. At a significance level of 5%, the logistic regression model for Customer Churn Prediction does not fit: there are significant differences between the observed results and the results predicted by the model.
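
The reported p-value and critical value can be checked without a statistics library, because the chi-square tail probability has a closed form when the degrees of freedom are even (here df = 8). This is a sketch for verification only; the function name is hypothetical and it is not necessarily how the authors obtained their figures.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!,
    which is valid only for even degrees of freedom.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


chi_square = 49.039   # test statistic reported on the slide
critical = 15.50731   # table value for df = 8 at the 95% level
p_value = chi2_sf_even_df(chi_square, df=8)

print(f"p-value = {p_value:.4g}")
print("reject H0:", p_value < 0.05 and chi_square > critical)
```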

4. INTERPRETATION (3/4)
Coefficient Interpretation
Odds ratios help determine how each variable affects the dependent variable. For a one-unit increase in a variable, the odds of churn are multiplied by that variable's odds ratio: a value above 1 increases the odds, and a value below 1 decreases them.

• Internet payment 1st month: odds ratio 0.5324, so the odds of churn decrease.
• Internet revenue within 6 months: odds ratio 1.2329, so the odds of churn increase.
• Growth of Internet payment: odds ratio 1.0190, so the odds of churn increase.
• Growth of internet usage: odds ratio 1.1280, so the odds of churn increase.
• Time average to solve customer problem: odds ratio 1.0334, so the odds of churn increase.
• Time average to solve customer problem (Internet only): odds ratio 1.0273, so the odds of churn increase.
• Growth of Time average to solve customer problem (month by month): odds ratio 0.8930, so the odds of churn decrease.
• Revenue of internet broadband regional: odds ratio 1.0978, so the odds of churn increase.
• Length of Stay of customer: odds ratio 0.7367, so the odds of churn decrease.
• ID of regional division: odds ratio 0.9069, so the odds of churn decrease.
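
An odds ratio is the exponential of the fitted coefficient, and (odds ratio - 1) × 100 gives the percentage change in the odds of churn per one-unit increase. The sketch below applies this to the odds ratios listed on this slide; the dictionary layout and function names are illustrative assumptions.

```python
import math

# Odds ratios as listed on the slide (label -> odds ratio).
odds_ratios = {
    "Internet payment 1st month": 0.5324,
    "Internet revenue within 6 months": 1.2329,
    "Growth of Internet payment": 1.0190,
    "Growth of internet usage": 1.1280,
    "Time average to solve customer problem": 1.0334,
    "Time average to solve customer problem (Internet only)": 1.0273,
    "Growth of time average to solve customer problem": 0.8930,
    "Revenue of internet broadband regional": 1.0978,
    "Length of Stay of customer": 0.7367,
    "ID of regional division": 0.9069,
}


def percent_change_in_odds(odds_ratio):
    """Percentage change in the odds of churn per one-unit increase."""
    return (odds_ratio - 1.0) * 100.0


def coefficient_from_odds_ratio(odds_ratio):
    """Recover the logistic regression coefficient: beta = ln(odds ratio)."""
    return math.log(odds_ratio)


for name, ratio in odds_ratios.items():
    direction = "increase" if ratio > 1 else "decrease"
    change = abs(percent_change_in_odds(ratio))
    print(f"{name}: odds of churn {direction} by {change:.2f}%")
```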

4. INTERPRETATION (4/4)
Confusion Matrix
• 0 = Customer Stay (Not Churn)
• 1 = Churn Customer

                       Actual Churn    Actual Not Churn
Predicted Churn        142 (TP)        401 (FP)
Predicted Not Churn    612 (FN)        8845 (TN)

• TP : predicted Churn, actual Churn
• TN : predicted Not Churn, actual Not Churn
• FN : predicted Not Churn, actual Churn
• FP : predicted Churn, actual Not Churn

• True : the prediction is correct
• False : the prediction is wrong
• Positive and Negative refer to the class predicted by the model

• Accuracy : (TP+TN)/(TP+TN+FP+FN)
• Sensitivity or Recall (True Positive Rate) : the share of actual churn customers that the model correctly predicts as churn
TP / (FN + TP) = 142/(612+142) = 0.1883
• Specificity : the share of actual non-churn customers that the model correctly predicts as not churn
TN / (TN+FP) = 8845/(8845+401) = 0.9566
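
The three metrics can be recomputed from the four confusion matrix counts. The sketch below uses the counts implied by the sensitivity and specificity formulas above (TP = 142, FN = 612, FP = 401, TN = 8845); the function name is an illustrative assumption.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Return accuracy, sensitivity (recall), and specificity."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of actual churners caught
        "specificity": tn / (tn + fp),  # share of actual stayers kept
    }


metrics = confusion_metrics(tp=142, fn=612, fp=401, tn=8845)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts the accuracy is high while the sensitivity is low, because non-churn customers make up most of the data; accuracy alone therefore says little about how well churners are detected.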

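ROC CURVE

The ROC (Receiver Operating Characteristic) curve is a performance measurement tool for classification problems. It plots the true positive rate against the false positive rate at every classification threshold, which helps in choosing the threshold of a model.

Area Under the Curve (AUC) : 0.642992. An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation. The model's curve lies above the diagonal, meaning a higher true positive rate and/or a lower false positive rate than random guessing, but its ability to separate churn from non-churn customers is modest.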
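
One way to read the AUC is as the probability that a randomly chosen churn customer receives a higher predicted score than a randomly chosen non-churn customer. The sketch below computes the AUC from that pairwise definition. The labels and scores are a small hypothetical example, not the data behind the 0.642992 figure, and the function name is an assumption.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    labels: 1 for churn, 0 for stay.
    scores: predicted churn probabilities, in the same order.
    Tied scores count as half a correctly ranked pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical labels and predicted scores for four customers.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3]
print(auc_from_scores(labels, scores))
```

The pairwise loop is quadratic in the number of customers, which is fine for illustration; a rank-based computation is the usual choice for large datasets.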

THANK YOU
