In Telco Industry
Group 5
MUHYUDI ANINDYKA L E
BROTO M VICTOR GUNAWAN
WASESO
CONTENT
1. Customer Churn Prediction Concept
2. Related Theory
3. Data Visualization
4. Logistic Regression Interpretation
The telecommunication industry faces fierce competition among its service providers. Providers keep introducing
attractive new offers, which leads customers to switch from one mobile service provider to another. This customer
tendency to switch is referred to as churn.
As customer acquisition costs keep increasing, customer churn management has become critically important. A high churn
rate can indicate unsatisfactory service and dissatisfied customers, and it reflects the class of customers a provider
serves, which makes churn management essential.
1. CUSTOMER CHURN MANAGEMENT (2/2)
However, churn management is difficult because we do not know:
1. How to manage the critical customer relationship optimally
2. Who the customer would be
3. What they exactly want
4. What makes them go and what makes them stay
Context
“Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer
retention programs.”
DIFFERENCE BETWEEN LINEAR REGRESSION AND LOGISTIC REGRESSION
Linear Regression | Logistic Regression
Used to predict a continuous dependent variable from a given set of independent variables. | Used to predict a categorical dependent variable from a given set of independent variables.
Used for solving regression problems. | Used for solving classification problems.
We predict the value of a continuous variable. | We predict the value of a categorical variable.
We find the best-fit line, from which we can predict the output. | We find the S-curve, by which we can classify the samples.
The least squares method is used to estimate the coefficients. | The maximum likelihood method is used to estimate the coefficients.
The output must be a continuous value, such as price, age, etc. | The output must be a categorical value, such as 0 or 1, Yes or No, etc.
The relationship between the dependent and independent variables must be linear. | A linear relationship between the dependent and independent variables is not required (the model is linear in the log-odds instead).
Collinearity between the independent variables inflates the variance of the coefficient estimates and should be checked. | Collinearity between the independent variables should likewise be avoided, since it destabilizes the coefficient estimates.
Summary :
1. Linear regression models data using continuous numeric values, whereas logistic regression models data with binary values.
2. Linear regression requires a linear relationship between the dependent and independent variables, whereas logistic regression does not.
3. In both linear and logistic regression, the independent variables should not be strongly correlated with each other, because multicollinearity makes the coefficient estimates unstable.
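To make the contrast above concrete, here is a minimal Python sketch. It is not the group's actual model: the coefficients `b0` and `b1` and the input values are hypothetical and chosen only for illustration. It shows that a linear predictor gives an unbounded continuous output, while logistic regression passes the same predictor through the S-shaped sigmoid to obtain a probability, which is then thresholded into class 0 or 1.

```python
import math

def linear_predict(x, b0, b1):
    """Linear regression: the output is an unbounded continuous value."""
    return b0 + b1 * x

def logistic_predict_proba(x, b0, b1):
    """Logistic regression: sigmoid of the linear predictor, a probability in (0, 1)."""
    z = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-z))

def classify(prob, threshold=0.5):
    """Turn a probability into a class label: 1 (churn) or 0 (stay)."""
    return 1 if prob >= threshold else 0

# Hypothetical coefficients, for illustration only
b0, b1 = -2.0, 0.8

for x in [0, 2.5, 5]:
    y_lin = linear_predict(x, b0, b1)
    p = logistic_predict_proba(x, b0, b1)
    print(f"x={x}: linear={y_lin:.2f}, probability={p:.3f}, class={classify(p)}")
```

The 0.5 threshold is a common default, not something stated on the slides; a churn model may use a different cut-off.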
3. DATA VISUALIZATION
Customer Churn Prediction Model: Churn Data
[Figure: Internet payment 1st month (INT_PAY_M1)]
[Figure: Time average to solve customer problem (Internet only) (INT_MTTR_M6)]
We will classify customers as “0” for stay and “1” for churn.
[Table: Total, Non-Churn, Churn]
4. INTERPRETATION (1/4)
1. Wald Test
2. Hoslem Test
3. Coefficient Interpretation
4. Confusion Matrix
Hypothesis :
H0 : βi = 0
H1 : βi ≠ 0
Decision :
Reject H0 when p-value < α
Analysis :
With a significance level of 5%, the variables Internet payment 1st month (INT_PAY_M1), Internet revenue within 6
months (INT_REV_INT_MAX6), Length of Stay of customer (LOS), Growth of internet usage (INT_DWNL_DECR), Growth
of Time average to solve customer problem (month by month) (ALL_MTTR_DECR) and regional division ID
(DIVRE_ID) each individually (partially) have a significant effect on Customer Churn (Y_CHURN), while
the Revenue of internet broadband regional (ALL_TRB_M6) variable is significant only at the 10% level.
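The Wald decision rule above can be sketched in a few lines of Python. This is an illustrative sketch only: the estimate and standard error passed to `wald_test` are hypothetical, since the slides do not list the fitted coefficients or their standard errors. The statistic is the coefficient divided by its standard error, compared against the standard normal distribution.

```python
import math

def wald_test(beta_hat, std_err):
    """Return the Wald z statistic and two-sided p-value for H0: beta = 0."""
    z = beta_hat / std_err
    # Two-sided p-value under the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

alpha = 0.05
# Hypothetical estimate and standard error, not taken from the slides
z, p = wald_test(beta_hat=-0.63, std_err=0.21)
print(f"z = {z:.3f}, p-value = {p:.4f}")
print("Reject H0" if p < alpha else "Fail to reject H0")
```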
4. INTERPRETATION (2/4)
Hosmer and Lemeshow Test :
Hypothesis :
H0 : Model is fit
H1 : Model is not fit
Decision :
Reject H0 when p-value < α or Chi-square > Chi-square table
Chi-square table 95% with df = 8 : 15.50731
Conclusion :
Because the p-value is 6.249e-08, which is less than 0.05, and Chi-square (49.039) > Chi-square table (15.51), reject H0.
At the 5% significance level, the logistic regression model for Customer Churn Prediction does not fit: there are
significant differences between the observed results and the results predicted by the model.
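The reported p-value can be checked from the chi-square statistic and degrees of freedom given above. The sketch below is one way to do it in plain Python, using the closed-form upper-tail probability that holds for even degrees of freedom. The function name `chi2_sf_even_df` is hypothetical, and the slides do not say which software produced the original figures.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for even, positive df")
    half = x / 2.0
    # sf(x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    total = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * total

alpha = 0.05
chi_square, df = 49.039, 8  # values reported on the slide
p_value = chi2_sf_even_df(chi_square, df)
print(f"p-value = {p_value:.3e}")
print("Reject H0: model does not fit" if p_value < alpha else "Fail to reject H0")
```

Evaluating the same function at the table value 15.50731 gives a tail probability of about 0.05, which confirms the critical value used in the decision rule.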
4. INTERPRETATION (3/4)
Odds ratios help determine how each variable affects the dependent variable. A one-unit increase in a variable
multiplies the odds of churn by its odds ratio: a ratio above 1 raises the odds, a ratio below 1 lowers them.
#A one-unit increase in Internet payment 1st month multiplies the odds of churn by 0.5324 (a decrease).
#Internet revenue within 6 months multiplies the odds of churn by 1.2329 (an increase).
#Growth of Internet payment multiplies the odds of churn by 1.0190 (an increase).
#A one-unit increase in Growth of internet usage multiplies the odds of churn by 1.1280 (an increase).
#Time average to solve customer problem multiplies the odds of churn by 1.0334 (an increase).
#Time average to solve customer problem (internet only) multiplies the odds of churn by 1.0273 (an increase).
#A one-unit increase in growth of time average to solve customer problem (month by month) multiplies the odds of
churn by 0.8930 (a decrease).
#Revenue of internet broadband regional multiplies the odds of churn by 1.0978 (an increase).
#Length of Stay of customer multiplies the odds of churn by 0.7367 (a decrease).
#A one-unit increase in regional division ID multiplies the odds of churn by 0.9069 (a decrease).
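An odds ratio is the exponential of the logistic regression coefficient, so it can be converted back to a coefficient or to a percent change in the odds. The following Python sketch illustrates this with three of the odds ratios listed above. The helper name `percent_change_in_odds` is hypothetical, and the coefficients it prints are implied by the ratios rather than quoted from the slides.

```python
import math

def percent_change_in_odds(odds_ratio):
    """Percent change in the odds of churn for a one-unit increase in the variable."""
    return (odds_ratio - 1.0) * 100.0

# A few odds ratios reported on the slide
odds_ratios = {
    "Internet payment 1st month": 0.5324,
    "Internet revenue within 6 months": 1.2329,
    "Length of Stay of customer": 0.7367,
}

for name, ratio in odds_ratios.items():
    beta = math.log(ratio)  # coefficient on the log-odds scale implied by the ratio
    change = percent_change_in_odds(ratio)
    direction = "raises" if ratio > 1 else "lowers"
    print(f"{name}: beta = {beta:.4f}, {direction} the odds of churn by {abs(change):.2f}%")
```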
4. INTERPRETATION (4/4)
• 0 = Customer Stay (Not Churn)
• 1 = Churn Customer
Confusion matrix:
                          Actual Churn (1)    Actual Not Churn (0)
Predicted Churn (1)             142                   401
Predicted Not Churn (0)         612                  8845
TP : Churn Prediction, actual Churn
TN : Not Churn Prediction, actual not Churn
FN : Not Churn Prediction, actual Churn
FP : Churn Prediction, actual not Churn
o True : Prediction is correct
o False : Prediction is wrong
o Positive & Negative are the result of the prediction model
Accuracy : (TP+TN)/(TP+TN+FP+FN)
Sensitivity or Recall (True Positive Rate) : the ratio of true positive predictions
to all actual positive cases
TP / (FN + TP) = 142/(612+142) = 0.1883
Specificity : the ratio of true negative predictions to all actual negative cases
TN / (TN+FP) = 8845/(8845+401) = 0.9566
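The metrics above follow directly from the four confusion matrix counts. The short Python sketch below recomputes them; the function name `confusion_metrics` is hypothetical, and the counts are those shown on the slide.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return accuracy, sensitivity (recall) and specificity from the four counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # share of actual churners that were caught
    specificity = tn / (tn + fp)  # share of actual stayers correctly identified
    return accuracy, sensitivity, specificity

# Counts from the slide's confusion matrix
tp, fp, fn, tn = 142, 401, 612, 8845

accuracy, sensitivity, specificity = confusion_metrics(tp, fp, fn, tn)
print(f"Accuracy    = {accuracy:.4f}")
print(f"Sensitivity = {sensitivity:.4f}")
print(f"Specificity = {specificity:.4f}")
```

With these counts the accuracy is high while the sensitivity is low, which means the model identifies most staying customers correctly but misses most of the customers who actually churn.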
ROC CURVE
THANK YOU