
CUSTOMER CHURN ANALYSIS

BY MACHINE LEARNING USING PYTHON

Created by:
SUBHAMAY SADHUKHAN, GURU NANAK INSTITUTE OF TECHNOLOGY, 161430110175 of 2016-17
PRITHWISH DAS, GURU NANAK INSTITUTE OF TECHNOLOGY, 161430110145 of 2016-17
RITWICK DAS, GURU NANAK INSTITUTE OF TECHNOLOGY, 161430110150 of 2016-17
SUMIT SAHA, GURU NANAK INSTITUTE OF TECHNOLOGY, 161430110178 of 2016-17



Table of Contents

1. Acknowledgement
2. Project Objective
3. Project Scope
4. Data Description
5. Model Building
6. Code
7. Future Scope of Improvements
8. Certificate



Acknowledgement

I take this opportunity to express my profound gratitude and deep regards to my faculty, Mr. Titas Roy Chowdhury, for his exemplary guidance, monitoring and constant encouragement throughout the course of this project. The blessings, help and guidance given from time to time shall carry me a long way in the journey of life on which I am about to embark.

I am obliged to my project team members for the valuable information provided by them in their respective fields. I am grateful for their cooperation during the period of my assignment.

Subhamay Sadhukhan



Acknowledgement

I take this opportunity to express my profound gratitude and deep regards to my faculty, Mr. Titas Roy Chowdhury, for his exemplary guidance, monitoring and constant encouragement throughout the course of this project. The blessings, help and guidance given from time to time shall carry me a long way in the journey of life on which I am about to embark.

I am obliged to my project team members for the valuable information provided by them in their respective fields. I am grateful for their cooperation during the period of my assignment.

Prithwish Das



Acknowledgement

I take this opportunity to express my profound gratitude and deep regards to my faculty, Mr. Titas Roy Chowdhury, for his exemplary guidance, monitoring and constant encouragement throughout the course of this project. The blessings, help and guidance given from time to time shall carry me a long way in the journey of life on which I am about to embark.

I am obliged to my project team members for the valuable information provided by them in their respective fields. I am grateful for their cooperation during the period of my assignment.

Ritwick Das



Acknowledgement

I take this opportunity to express my profound gratitude and deep regards to my faculty, Mr. Titas Roy Chowdhury, for his exemplary guidance, monitoring and constant encouragement throughout the course of this project. The blessings, help and guidance given from time to time shall carry me a long way in the journey of life on which I am about to embark.

I am obliged to my project team members for the valuable information provided by them in their respective fields. I am grateful for their cooperation during the period of my assignment.

Sumit Saha



Project Scope:

It is no secret that customer retention is a top priority for many companies. Acquiring new customers can be several times more expensive than retaining existing ones. Furthermore, gaining an understanding of the reasons customers churn and estimating the risk associated with individual customers are both powerful components of designing a data-driven retention strategy.

A churn model can be the tool that brings these elements together and provides insights and outputs that drive decision making across an organization.



Project Objective

 Describe the problem:

 Simply put, customer churn occurs when customers or subscribers stop doing business with a company or service. Also known as customer attrition, customer churn is a critical metric because it is much less expensive to retain existing customers than it is to acquire new customers: earning business from new customers means working leads all the way through the sales funnel, utilizing your marketing and sales resources throughout the process. Customer retention, on the other hand, is generally more cost-effective as you’ve already earned the trust and loyalty of existing customers.

 Customer churn impedes growth, so companies should have a defined method for calculating customer churn in a given period of time. By being aware of and monitoring churn rate, organizations are equipped to determine their customer retention success rates and identify strategies for improvement.

 Various organizations calculate customer churn rate in a variety of ways, as churn rate may represent the total number of customers lost, the percentage of customers lost compared to the company’s total customer count, the value of recurring business lost, or the percent of recurring value lost. Other organizations calculate churn rate for a certain period of time, such as quarterly periods or fiscal years. One of the most commonly used methods for calculating customer churn is to divide the number of customers lost during a specified time period by the total number of customers the company had at the beginning of that period.
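
As a quick worked example of that last method (the numbers below are made up purely for illustration):

customers_at_start = 2000   # customers at the beginning of the quarter
customers_lost = 150        # customers who churned during the quarter

churn_rate = customers_lost / customers_at_start
print("Quarterly churn rate: {:.1%}".format(churn_rate))   # 7.5%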



 Causes of Customer Churn

 There are a multitude of issues that can lead customers to leave a business, but there are a few that are considered to be the leading causes of customer churn. The first is poor customer service. One study found that nearly nine out of ten customers have abandoned a business due to a poor experience. We are living and working in the era of the customer, and customers are demanding exceptional customer service and experiences. When they don’t receive it, they flock to competitors and share their negative experiences on social media: 59% of 25-34-year-olds share poor customer experiences online. Poor customer service, therefore, can result in many more customers churning than simply the one customer who had a poor service experience.

 Other causes of customer churn include a poor onboarding process, a lack of ongoing customer success, natural causes that occur for all businesses from time to time, a lack of value, low-quality communications, and a lack of brand loyalty.



 Disadvantages of Customer Churn

 There is a direct relationship between customer lifetime value and the ability to grow your business. As such, the higher your customer churn rate, the lower your chances of growing your business. Even if you have some of the best marketing campaigns in your industry, your bottom line suffers if you are losing customers at a high rate, as the cost of acquiring new customers is so high. Much has been written on the subject of the cost of retaining customers vs. acquiring customers, especially because study after study shows that customer acquisition costs far exceed customer retention costs. Generally, companies spend seven times more on customer acquisition than customer retention, and the average global value of a lost customer is $243. Obviously, customer churn is costly for businesses.

 What is the objective?

In order to build a sustainable SaaS business, companies need to focus their efforts on reducing customer churn. According to the authors of “Leading on the Edge of Chaos”, reducing customer churn by 5% can increase profits 25-125%. Therefore, to reduce churn, most SaaS companies perform customer churn analysis. But what is customer churn analysis and what are its benefits?



Churn Analysis
Customer churn analysis is the analysis of the customer attrition rate in a company. This analysis helps SaaS companies identify the causes of churn and implement effective strategies for retention.

Gainsight understands the negative impact that churn rate can have on company profits. Named a "2014 Cool Vendor for CRM Sales" by Gartner, Gainsight’s customer intelligence and retention process automation technology:

 Gathers available customer behavior, transaction, demographic and usage-pattern data
 Converts structured and unstructured data/information into meaningful insights
 Utilizes these insights to predict customers who are likely to churn
 Identifies the causes of churn and works to resolve those issues
 Engages with customers to foster relationships
 Implements effective programs for customer retention

Statistics show that acquiring new customers can cost five times more than retaining existing customers. Gainsight performs customer churn analysis to reduce churn, control retention and improve performance.



>>How are we planning to solve it?

 Analyse the data frame.

 Drop the columns which have high multicollinearity and low contribution towards the ‘label’.

 Apply different classification models and note down the scores for each model.

 Select the model with the best scores.


Linear Regression Model (LRM)

To predict customer satisfaction, regression analysis is another popular technique, based on a supervised learning model. In this model a data set of past observations is used to predict future values of a numerical target variable from explanatory variables.
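
A minimal sketch of this idea with scikit-learn (the feature and target values below are made up):

import numpy as np
from sklearn import linear_model

# Past observations: explanatory variable X and numerical target y
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 3.9, 6.2, 8.1])

lrm = linear_model.LinearRegression()
lrm.fit(X, y)
print(lrm.predict([[5.0]]))   # estimate a future value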



Naive Bayes Model (NBM)

In this model the probability that a given input sample with variables (X1, ..., Xn) belongs to a particular class y is calculated. Assuming the variables are conditionally independent given the class, the probability is computed as

P(y | X1, ..., Xn) ∝ P(y) × P(X1 | y) × ... × P(Xn | y)



Decision Tree (DT)
The decision tree is a prominent predictive model used for classifying new samples. Building a decision tree consists of two steps: tree building and tree pruning. In tree building, the training set is recursively partitioned in accordance with the values of the attributes. This process continues until every remaining partition contains records with identical values. In this process some values may be removed from the data because they are noisy. In tree pruning, the branches with the largest estimated error rates are selected and removed. Pruning improves prediction accuracy and reduces the complexity of the decision tree.
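
As a small illustration of pruning (assuming a recent scikit-learn, which exposes cost-complexity pruning through the ccp_alpha parameter; the dataset here is just a stand-in):

from sklearn import tree
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# A larger ccp_alpha prunes more branches, trading a little training
# accuracy for a smaller, more general tree.
unpruned = tree.DecisionTreeClassifier().fit(X, y)
pruned = tree.DecisionTreeClassifier(ccp_alpha=0.02).fit(X, y)
print(unpruned.tree_.node_count, pruned.tree_.node_count)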





Data description

>st – state:
State code of each customer.
Data type: string
Value type: categorical
Null percentage: 0
----------------------------------------------------------------------------------
>acclen – account length:
Data type: integer
Value type: continuous
Null percentage: 0
5 number statistics: 1, 74, 101, 127, 243
----------------------------------------------------------------------------------
>arcode – area code:
Data type: integer
Value type: categorical
Null percentage: 0
----------------------------------------------------------------------------------
>phnum – phone number:
Data type: integer
Value type: continuous
Null percentage: 0
----------------------------------------------------------------------------------
>intplan – international plan:
Data type: object
Value type: categorical
Null percentage: 0
----------------------------------------------------------------------------------
>voice – voice plan:
Data type: object
Value type: categorical
Null percentage: 0
----------------------------------------------------------------------------------
>nummailmes – number of voice mail messages:
Data type: integer
Value type: continuous
Null percentage: 0
5 number statistics: 0, 0, 0, 20, 51
----------------------------------------------------------------------------------
>tdmin – total day minutes:
Data type: float
Value type: continuous
Null percentage: 0
5 number statistics: 0, 143.7, 179.4, 216.4, 350.8
----------------------------------------------------------------------------------
>tdcal – total day calls:
Data type: integer
Value type: continuous
Null percentage: 0
5 number statistics: 0, 87, 101, 114, 165
----------------------------------------------------------------------------------
>tdchar – total day charges:
Data type: float
Value type: continuous
Null percentage: 0
5 number statistics: 0, 24.43, 30.5, 36.79, 59.64
----------------------------------------------------------------------------------
>temin – total evening minutes:
Data type: float
Value type: continuous
Null percentage: 0
5 number statistics: 0, 166.6, 201.4, 235.3, 363.7
----------------------------------------------------------------------------------
>tecahr – total evening charges:
Data type: float
Value type: continuous
Null percentage: 0
5 number statistics: 0, 14.16, 17.12, 20, 30.91
----------------------------------------------------------------------------------
>tnmin – total night minutes:
Data type: float
Value type: continuous
Null percentage: 0
5 number statistics: 0, 14.16, 17.12, 20, 30.91
----------------------------------------------------------------------------------
>tncal – total night calls:
Data type: integer
Value type: continuous
Null percentage: 0
5 number statistics: 33, 87, 100, 113, 175
----------------------------------------------------------------------------------
>tnchar – total night charges:
Data type: float
Value type: continuous
Null percentage: 0
5 number statistics: 1.04, 7.52, 9.05, 10.59, 17.77
----------------------------------------------------------------------------------
>timin – total international minutes:
Data type: float
Value type: continuous
Null percentage: 0
5 number statistics: 0, 8.5, 10.3, 12.1, 20
----------------------------------------------------------------------------------
>tical – total international calls:
Data type: integer
Value type: continuous
Null percentage: 0
5 number statistics: 0, 3, 4, 6, 20
----------------------------------------------------------------------------------
>tichar – total international charges:
Data type: float
Value type: continuous
Null percentage: 0
5 number statistics: 0, 2.3, 2.78, 3.27, 5.4
----------------------------------------------------------------------------------
>ncsc – number of customer service calls:
Data type: float
Value type: categorical
Null percentage: 0
----------------------------------------------------------------------------------
>label – churned?
Data type: object
Value type: categorical
Null percentage: 0



Heatmap:

1. From the correlation heatmap we can see that tdmin & tdchar, temin & tecahr, and tnmin & tnchar are collinear.
2. So we drop tnchar, temin and tdchar to reduce multicollinearity; a sketch of how such pairs can be found follows.
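
A small sketch of finding collinear pairs programmatically (it assumes df2 is the frame of continuous columns built later in the code; the 0.9 threshold is illustrative):

# Print highly correlated column pairs as candidates for dropping
corr = df2.corr().abs()
for i, a in enumerate(corr.columns):
    for b in corr.columns[i + 1:]:
        if corr.loc[a, b] > 0.9:
            print(a, b, round(corr.loc[a, b], 3))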



Data analysis:
1. From the data analysis we found that there are 4 categorical columns and 16 continuous columns.

2. We drop some columns which we don’t need in the model, because those columns don’t affect it, namely:
i. st = state
ii. acclen = account length
iii. tdcal = total day calls
iv. tecal = total evening calls
v. temin = total evening minutes
vi. tnchar = total night charges
vii. tdchar = total day charges


Output of our project:

From the output, we can see that the decision tree classifier gives the best results for our data sets, so the decision tree is selected as the optimum model.

>Confusion Matrix:
The confusion matrix for the optimum model is
[[1379 64]
 [ 65 159]]


Model building

Loading The Data

We will work with a toy dataset emulating the data that an e-business company (more precisely, an internet marketplace) might have.
First, create a new project, then import the two main datasets into DSS:
o Events is a log of what happened on your website: what pages users see, what products they like or purchase, and so on.
o Products is a look-up table containing a product id and information about its category and price. Make sure to set the product_id column as bigint in the schema.
As with any data science project, the first step is to explore the data. Start by opening the events dataset: it contains a user id, a timestamp, an event type, a product id and a seller id. For instance, we may want to check the distribution of the event types. From the column header, just click on Analyze.
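
Outside DSS, the same quick check could be done in pandas (the file name and column name below are assumptions about how the events log is stored):

import pandas as pd

events = pd.read_csv("events.csv")
# Distribution of event types, as fractions of all events
print(events["event_type"].value_counts(normalize=True))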



Data Attributes and Labels
The initial ingredient for building any predictive pipeline is data. For churn
specifically, historical data is captured and stored in a data warehouse,
depending on the application domain. The process of churn definition and
establishing data hooks to capture relevant events is highly iterative. It is
very important to keep this in mind as the initial churn definition, with its
associated data hooks, may not be applicable or relevant anymore as a
product or a service matures. That’s why it’s essential for data scientists to
not only monitor the performance of the predictive pipeline over time but
also to pay close attention to the alignment of churn definition with the
product’s changes as they might affect who the churners are.

The specific attributes used in a churn model are highly domain dependent. However, broadly speaking, the most common attributes capture user behavior with regard to the engagement level with a product or service. This can be thought of as the number of times a user logs into her/his account in a week or the amount of time a user spends on a portal. In short, frequency and intensity of usage/engagement are among the strongest signals to predict churn.



Logistic Regression:

Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, logistic regression is a predictive analysis. It is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables. Logistic regressions are sometimes difficult to interpret; the Intellectus Statistics tool easily allows you to conduct the analysis, then interprets the output in plain English.
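
A minimal sketch with scikit-learn, on a made-up dichotomous target:

import numpy as np
from sklearn import linear_model

# One explanatory variable, binary (0/1) outcome
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = linear_model.LogisticRegression()
clf.fit(X, y)
print(clf.predict([[1.2]]), clf.predict_proba([[1.2]]))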

Decision Tree Classifier:

 Introduction:
The decision tree classifier is a simple and widely used classification technique. It applies a straightforward idea to solve the classification problem. The decision tree classifier poses a series of carefully crafted questions about the attributes of the test record. Each time it receives an answer, a follow-up question is asked until a conclusion about the class label of the record is reached.

 Building a Decision Tree:
Building an optimal decision tree is the key problem in decision tree classification. In general, many decision trees can be constructed from a given set of attributes. While some of the trees are more accurate than others, finding the optimal tree is computationally infeasible because of the exponential size of the search space.

However, various efficient algorithms have been developed to construct a reasonably accurate, albeit suboptimal, decision tree in a reasonable amount of time. These algorithms usually employ a greedy strategy that grows a decision tree by making a series of locally optimum decisions about which attribute to use for partitioning the data. For example, Hunt's algorithm, ID3, C4.5, CART and SPRINT are greedy decision tree induction algorithms.

 Algorithm for Decision Tree Induction:

The decision tree induction algorithm works by recursively selecting the best attribute to split the data and expanding the leaf nodes of the tree until the stopping criterion is met. The choice of the best split test condition is determined by comparing the impurity of the child nodes and also depends on which impurity measurement is used. After building the decision tree, a tree-pruning step can be performed to reduce its size. Decision trees that are too large are susceptible to a phenomenon known as overfitting. Pruning helps by trimming the branches of the initial tree in a way that improves the generalization capability of the decision tree.

Here is an example recursive function [2] that builds the tree by choosing the best dividing criterion for the given data set. It is called with a list of rows and then loops through every column (except the last one, which has the result in it), finds every possible value for that column, and divides the dataset into two new subsets. It calculates the weighted-average entropy for every pair of new subsets by multiplying each set’s entropy by the fraction of the items that ended up in each set, and remembers which pair has the lowest entropy. If the best pair of subsets doesn’t have a lower weighted-average entropy than the current set, that branch ends and the counts of the possible outcomes are stored. Otherwise, buildtree is called on each subset and the results of the calls are attached to the True and False branches of the node, eventually constructing an entire tree.

Customer Churn Analysis 53


from math import log

# Helpers used by buildtree (from the same source); included so the
# listing is self-contained and runnable.
class decisionnode:
    def __init__(self, col=-1, value=None, results=None, tb=None, fb=None):
        self.col, self.value, self.results = col, value, results
        self.tb, self.fb = tb, fb   # true/false branches

def divideset(rows, column, value):
    # Split rows in two by comparing the given column against value
    if isinstance(value, (int, float)):
        split = lambda row: row[column] >= value
    else:
        split = lambda row: row[column] == value
    return ([r for r in rows if split(r)], [r for r in rows if not split(r)])

def uniquecounts(rows):
    # Count occurrences of each result (the last column) in the rows
    results = {}
    for row in rows:
        results[row[-1]] = results.get(row[-1], 0) + 1
    return results

def entropy(rows):
    # Entropy: sum of -p*log2(p) over all distinct results
    results, ent = uniquecounts(rows), 0.0
    for r in results:
        p = float(results[r]) / len(rows)
        ent -= p * log(p, 2)
    return ent

def buildtree(rows, scoref=entropy):
    if len(rows) == 0:
        return decisionnode()
    current_score = scoref(rows)

    # Set up some variables to track the best criteria
    best_gain = 0.0
    best_criteria = None
    best_sets = None

    column_count = len(rows[0]) - 1
    for col in range(0, column_count):
        # Generate the list of different values in this column
        column_values = {}
        for row in rows:
            column_values[row[col]] = 1
        # Now try dividing the rows up for each value in this column
        for value in column_values.keys():
            (set1, set2) = divideset(rows, col, value)

            # Information gain
            p = float(len(set1)) / len(rows)
            gain = current_score - p * scoref(set1) - (1 - p) * scoref(set2)
            if gain > best_gain and len(set1) > 0 and len(set2) > 0:
                best_gain = gain
                best_criteria = (col, value)
                best_sets = (set1, set2)

    # Create the sub-branches
    if best_gain > 0:
        trueBranch = buildtree(best_sets[0])
        falseBranch = buildtree(best_sets[1])
        return decisionnode(col=best_criteria[0], value=best_criteria[1],
                            tb=trueBranch, fb=falseBranch)
    else:
        return decisionnode(results=uniquecounts(rows))



Naïve Bayes Classifier :
 Introduction :
Naive Bayes is a simple technique for constructing classifiers:
models that assign class labels to problem instances, represented
as vectors of feature values, where the class labels are drawn
from some finite set. There is not a single algorithm for training
such classifiers, but a family of algorithms based on a common
principle: all naive Bayes classifiers assume that the value of a
particular feature is independent of the value of any other
feature, given the class variable. For example, a fruit may be
considered to be an apple if it is red, round, and about 10 cm in
diameter. A naive Bayes classifier considers each of these features
to contribute independently to the probability that this fruit is an
apple, regardless of any possible correlations between the color,
roundness, and diameter features.
For some types of probability models, naive Bayes classifiers can
be trained very efficiently in a supervised learning setting. In many
practical applications, parameter estimation for naive Bayes
models uses the method of maximum likelihood; in other words,
one can work with the naive Bayes model without
accepting Bayesian probability or using any Bayesian methods.
Despite their naive design and apparently oversimplified
assumptions, naive Bayes classifiers have worked quite well in
many complex real-world situations. In 2004, an analysis of the
Bayesian classification problem showed that there are sound
theoretical reasons for the apparently implausible efficacy of
naïve Bayes classifiers. Still, a comprehensive comparison with
other classification algorithms in 2006 showed that Bayes
classification is outperformed by other approaches, such
as boosted trees or random forests.

Customer Churn Analysis 56


An advantage of naive Bayes is that it requires only a small amount of training data to estimate the parameters necessary for classification.
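
A minimal sketch of the Gaussian variant (the one used later in this project's code), on made-up continuous features:

import numpy as np
from sklearn import naive_bayes

X = np.array([[1.0, 2.0], [1.2, 1.8], [3.0, 4.1], [3.2, 3.9]])
y = np.array([0, 0, 1, 1])

nb = naive_bayes.GaussianNB()
nb.fit(X, y)
print(nb.predict([[1.1, 2.1]]))   # most likely class for a new sample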

K-Nearest Neighbors (K-NN):
 Introduction:

The k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression.
k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms.
Both for classification and regression, a useful technique can be used to assign weights to the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones. For example, a common weighting scheme consists in giving each neighbor a weight of 1/d, where d is the distance to the neighbor.
The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.
A peculiarity of the k-NN algorithm is that it is sensitive to the local structure of the data.



 Algorithm :
The training examples are vectors in a multidimensional feature space,
each with a class label. The training phase of the algorithm consists
only of storing the feature vectors and class labels of the training
samples.
In the classification phase, k is a user-defined constant, and an
unlabeled vector (a query or test point) is classified by assigning the
label which is most frequent among the k training samples nearest to
that query point.
A commonly used distance metric for continuous variables is Euclidean
distance. For discrete variables, such as for text classification, another
metric can be used, such as the overlap metric (or Hamming distance).
In the context of gene expression microarray data, for example, k-NN
has also been employed with correlation coefficients such as Pearson
and Spearman. Often, the classification accuracy of k-NN can be
improved significantly if the distance metric is learned with specialized
algorithms such as Large Margin Nearest Neighbor or Neighbourhood
components analysis.
A drawback of the basic "majority voting" classification occurs when
the class distribution is skewed. That is, examples of a more frequent
class tend to dominate the prediction of the new example, because
they tend to be common among the k nearest neighbors due to their
large number. One way to overcome this problem is to weight the
classification, taking into account the distance from the test point to
each of its k nearest neighbors. The class (or value, in regression
problems) of each of the k nearest points is multiplied by a weight
proportional to the inverse of the distance from that point to the test
point. Another way to overcome skew is by abstraction in data
representation. For example, in a self-organizing map (SOM), each node
is a representative (a center) of a cluster of similar points, regardless of
their density in the original training data. K-NN can then be applied to
the SOM.
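
A short sketch of the classification phase with scikit-learn, including the 1/d distance weighting described above (weights="distance"; the data is made up):

import numpy as np
from sklearn import neighbors

X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y = np.array([0, 0, 0, 1, 1, 1])

# Euclidean distance by default; nearer neighbors get weight 1/d
knn = neighbors.KNeighborsClassifier(n_neighbors=3, weights="distance")
knn.fit(X, y)
print(knn.predict([[4, 4]]))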



 Parameter selection:
The best choice of k depends upon the data; generally, larger values of k reduce the effect of noise on the classification, but make boundaries between classes less distinct. A good k can be selected by various heuristic techniques (see hyperparameter optimization). The special case where the class is predicted to be the class of the closest training sample (i.e. when k = 1) is called the nearest neighbor algorithm.
The accuracy of the k-NN algorithm can be severely degraded by the presence of noisy or irrelevant features, or if the feature scales are not consistent with their importance. Much research effort has been put into selecting or scaling features to improve classification. A particularly popular approach is the use of evolutionary algorithms to optimize feature scaling. Another popular approach is to scale features by the mutual information of the training data with the training classes.
In binary (two-class) classification problems, it is helpful to choose k to be an odd number, as this avoids tied votes. One popular way of choosing the empirically optimal k in this setting is via the bootstrap method.
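
One simple way to pick k, sketched with a cross-validated grid search (it assumes Xtrain/ytrain as built in the project code later; odd values of k avoid ties):

from sklearn import model_selection, neighbors

grid = {"n_neighbors": list(range(1, 17, 2))}   # odd k values only
search = model_selection.GridSearchCV(
    neighbors.KNeighborsClassifier(), grid, scoring="accuracy", cv=5)
search.fit(Xtrain, ytrain.values.ravel())
print(search.best_params_)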



 KNN Regression:
In k-NN regression, the k-NN algorithm is used for estimating continuous variables. One such algorithm uses a weighted average of the k nearest neighbors, weighted by the inverse of their distance. This algorithm works as follows (a sketch follows the steps):

1. Compute the Euclidean or Mahalanobis distance from the query example to the labeled examples.
2. Order the labeled examples by increasing distance.
3. Find a heuristically optimal number k of nearest neighbors, based on RMSE. This is done using cross-validation.
4. Calculate an inverse-distance-weighted average with the k nearest multivariate neighbors.
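
A minimal sketch of steps 1, 2 and 4 (scikit-learn's KNeighborsRegressor with weights="distance" computes the inverse-distance-weighted average over Euclidean neighbors; the data is made up):

import numpy as np
from sklearn import neighbors

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

reg = neighbors.KNeighborsRegressor(n_neighbors=3, weights="distance")
reg.fit(X, y)
print(reg.predict([[2.5]]))   # weighted average of the 3 nearest targets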



Finally accepted model

Code
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import model_selection
from sklearn import linear_model
from sklearn import preprocessing
from sklearn import utils
from sklearn import metrics
from sklearn import tree
from sklearn import feature_selection
from sklearn import neighbors
from sklearn import naive_bayes

# Load the training and test sets (fields separated by ", ")
df = pd.read_csv("d:/folder/churn_train.txt", sep=", ", engine='python')
df1 = pd.read_csv("d:/folder/churn_test.txt", sep=", ", engine='python')



# Encode the categorical columns as integers
le = preprocessing.LabelEncoder()
le1 = preprocessing.LabelEncoder()
df.info()
df["st"] = le.fit_transform(df['st'])
df["intplan"] = le.fit_transform(df['intplan'])
df["voice"] = le.fit_transform(df['voice'])
df["label"] = le.fit_transform(df['label'])
df.dropna(inplace=True)

# Features: drop the identifier (phone number) and the target column
Xtrain = df.drop("phnum", axis=1)
Xtrain = Xtrain.drop("label", axis=1)

ytrain = df[['label']]
Xtrain.describe()
ytrain.describe()
ytrain['label'].value_counts()

# Feature extraction: inspect each column against the label
df['st'].value_counts()
sns.boxplot(y="st", data=df, x="label")
df['st'].isnull().sum()



sns.countplot(y='st',data=df,hue='label')
sns.distplot(df['st'])

df['acclen'].value_counts()
sns.boxplot(y="acclen",data=df,x="label")
sns.distplot(df['acclen'])
qr=np.percentile(df['acclen'],[0,25,50,75,100])
qr

Xtrain['intplan'].value_counts()
sns.boxplot(y="intplan",data=df,x="label")
sns.distplot(df['intplan'])
sns.countplot(y='intplan',data=df,hue='label')

Xtrain['voice'].value_counts()
sns.countplot(y='voice',data=df,hue='label')
sns.distplot(df['voice'])
sns.boxplot(y="voice",data=df,x="label")

Xtrain['nummailmes'].value_counts()
sns.boxplot(y="nummailmes",data=df,x="label")
sns.distplot(df['nummailmes'])



Xtrain['tdmin'].value_counts()
sns.boxplot(y="tdmin",data=df,x="label")
sns.distplot(df['tdmin'])

Xtrain['tdcal'].value_counts()
sns.boxplot(y="tdcal",data=df,x="label")
sns.distplot(df['tdcal'])

Xtrain['tdchar'].value_counts()
sns.boxplot(y="tdchar",data=df,x="label")
sns.distplot(df['tdchar'])

Xtrain['temin'].value_counts()
sns.boxplot(y="temin",data=df,x="label")
sns.distplot(df['temin'])

Xtrain['tecal'].value_counts()
sns.boxplot(y="tecal",data=df,x="label")
sns.distplot(df['tecal'])

Xtrain['tecahr'].value_counts()
sns.boxplot(y="tecahr",data=df,x="label")



sns.distplot(df['tecahr'])
sns.countplot(x='intplan',data=df,hue='label')

Xtrain['tnmin'].value_counts()
sns.boxplot(y="tnmin",data=df,x="label")
sns.distplot(df['tnmin'])

Xtrain['tncal'].value_counts()
sns.boxplot(y="tncal",data=df,x="label")
sns.distplot(df['tncal'])

Xtrain['tnchar'].value_counts()
sns.boxplot(y="tnchar",data=df,x="label")
sns.distplot(df['tnchar'])

Xtrain['timin'].value_counts()
sns.boxplot(y="timin",data=df,x="label")
sns.distplot(df['timin'])

Xtrain['intplan'].value_counts()
sns.boxplot(y="intplan",data=df,x="label")
sns.distplot(df['intplan'])
sns.countplot(y='intplan',data=df,hue='label')



Xtrain['tical'].value_counts()
sns.boxplot(y="tical",data=df,x="label")
sns.distplot(df['tical'])
sns.countplot(x='tical',data=df,hue='label')

Xtrain['tichar'].value_counts()
sns.boxplot(y="tichar",data=df,x="label")
sns.distplot(df['tichar'])

Xtrain['intplan'].value_counts()
sns.boxplot(y="intplan",data=df,x="label")
sns.distplot(df['intplan'])
sns.countplot(x='intplan',data=df,hue='label')

df['ncsc'].value_counts()
sns.boxplot(y="ncsc",data=df,x="label")
sns.distplot(df['ncsc'])
sns.countplot(x='ncsc',data=df,hue='label')

Xtrain=Xtrain.drop("st",axis=1)
Xtrain=Xtrain.drop("acclen",axis=1)
Xtrain=Xtrain.drop("tdcal",axis=1)

Customer Churn Analysis 66


Xtrain=Xtrain.drop("tecal",axis=1)

Xtrain=Xtrain.drop("temin",axis=1)

Xtrain=Xtrain.drop("tnchar",axis=1)
Xtrain=Xtrain.drop("tdchar",axis=1)

# Fit a decision tree and check performance on the training set
ml = tree.DecisionTreeClassifier()
ml.fit(Xtrain, ytrain)
print("AUC:", metrics.roc_auc_score(ytrain, ml.predict(Xtrain)))
print("recall:", metrics.recall_score(ytrain, ml.predict(Xtrain)))

df1["st"]=le1.fit_transform(df1['st'])
df1["intplan"]=le1.fit_transform(df1['intplan'])
df1["voice"]=le1.fit_transform(df1['voice'])
df1["label"]=le1.fit_transform(df1['label'])
df1.dropna(inplace=True)
Xtest=df1.drop("phnum",axis=1)
Xtest=Xtest.drop("label",axis=1)

Customer Churn Analysis 67


Xtest=Xtest.drop("st",axis=1)
Xtest=Xtest.drop("acclen",axis=1)
Xtest=Xtest.drop("tdcal",axis=1)

Xtest=Xtest.drop("tecal",axis=1)

Xtest=Xtest.drop("temin",axis=1)

Xtest=Xtest.drop("tnchar",axis=1)
Xtest=Xtest.drop("tdchar",axis=1)

ytest=df1[['label']]
ml=tree.DecisionTreeClassifier()
ml.fit(Xtrain,ytrain)
print("AUC:",metrics.roc_auc_score(ytest,ml.predict(Xtest)))
print("recall:",metrics.recall_score(ytest,ml.predict(Xtest)))
print("F1:",metrics.precision_score(ytest,ml.predict(Xtest)))
print("F1:",metrics.accuracy_score(ytest,ml.predict(Xtest)))



confmat = metrics.confusion_matrix(ytest, ml.predict(Xtest))
print(confmat)

# Correlation heatmap over the continuous columns
df2 = df
df2 = df2.drop('st', axis=1)
df2 = df2.drop('intplan', axis=1)
df2 = df2.drop('tichar', axis=1)
df2 = df2.drop('voice', axis=1)
df2 = df2.drop('label', axis=1)
sns.heatmap(df2.corr())

def modelstats(Xtrain, Xtest, ytrain, ytest):
    # Compare several classifiers on the same train/test split
    stats = []
    modelnames = ["LR", "DecisionTree", "KNN", "NB"]
    models = list()
    models.append(linear_model.LogisticRegression())
    models.append(tree.DecisionTreeClassifier())
    models.append(neighbors.KNeighborsClassifier())
    models.append(naive_bayes.GaussianNB())
    for name, model in zip(modelnames, models):
        if name == "KNN":
            # Tune the number of neighbours with a grid search on F1
            k = [l for l in range(5, 17, 2)]
            grid = {"n_neighbors": k}
            grid_obj = model_selection.GridSearchCV(
                estimator=model, param_grid=grid, scoring="f1")
            grid_fit = grid_obj.fit(Xtrain, ytrain)
            model = grid_fit.best_estimator_
            model.fit(Xtrain, ytrain)
            name = name + "(" + str(grid_fit.best_params_["n_neighbors"]) + ")"
            print(grid_fit.best_params_)
        else:
            model.fit(Xtrain, ytrain)
        trainprediction = model.predict(Xtrain)
        testprediction = model.predict(Xtest)
        scores = list()
        scores.append(name + "-train")
        scores.append(metrics.accuracy_score(ytrain, trainprediction))
        scores.append(metrics.precision_score(ytrain, trainprediction))
        scores.append(metrics.recall_score(ytrain, trainprediction))
        scores.append(metrics.roc_auc_score(ytrain, trainprediction))
        stats.append(scores)
        scores = list()
        scores.append(name + "-test")
        scores.append(metrics.accuracy_score(ytest, testprediction))
        scores.append(metrics.precision_score(ytest, testprediction))
        scores.append(metrics.recall_score(ytest, testprediction))
        scores.append(metrics.roc_auc_score(ytest, testprediction))
        stats.append(scores)
    colnames = ["MODELNAME", "ACCURACY", "PRECISION", "RECALL", "AUC"]
    return pd.DataFrame(stats, columns=colnames)

print(modelstats(Xtrain, Xtest, ytrain, ytest))



Conclusion and Future Works

 In conclusion, the K-Means method combined with Naïve Bayes demonstrated better sensitivity and accuracy for predicting customer churn in the telecommunications sector. The results obtained after optimization of the parameters for Model A have proven that increasing the number of clusters of the corresponding attribute helps in improving the true positive performance of the model. However, this technique has one shortcoming, which is the trade-off point between correctly predicting the true positive and true negative outputs. The results obtained highlight the impact of the class imbalance problem, which makes it difficult for the classifier to make predictions. The training set, which consists of 3,333 instances, has only 14.49% churned customers while the remaining 85.51% are non-churn instances. Thus, this requires further investigation, theoretically and experimentally, considering several pertinent issues. One of the approaches to improve the model is to study the interval boundaries during the data discretization process; the number of intervals could also be another factor that affects the learning rate of the classifier. Lastly, this technique can also be improved by adopting different machine learning algorithms, such as the support vector machine, the decision tree, as well as the Bayesian network, which allow learning from non-linear data samples.



Certificate

This is to certify that Mr. SUBHAMAY SADHUKHAN, GURU NANAK INSTITUTE OF TECHNOLOGY, registration number 161430110175, has successfully completed a project on ‘Customer Churn Analysis’ with “Machine Learning using Python” under the guidance of Mr. Titas Roy Chowdhury.

________________________________
(Mr. Titas Roy Chowdhury)
Globsyn Finishing School



Certificate

This is to certify that Mr. PRITHWISH DAS, GURU NANAK INSTITUTE OF TECHNOLOGY, registration number 161430110145, has successfully completed a project on ‘Customer Churn Analysis’ with “Machine Learning using Python” under the guidance of Mr. Titas Roy Chowdhury.

________________________________
(Mr. Titas Roy Chowdhury)
Globsyn Finishing School



Certificate

This is to certify that Mr. RITWICK DAS, GURU NANAK INSTITUTE OF TECHNOLOGY, registration number 161430110150, has successfully completed a project on ‘Customer Churn Analysis’ with “Machine Learning using Python” under the guidance of Mr. Titas Roy Chowdhury.

________________________________
(Mr. Titas Roy Chowdhury)
Globsyn Finishing School



Certificate

This is to certify that Mr. SUMIT SAHA, GURU NANAK INSTITUTE OF TECHNOLOGY, registration number 161430110178, has successfully completed a project on ‘Customer Churn Analysis’ with “Machine Learning using Python” under the guidance of Mr. Titas Roy Chowdhury.

________________________________
(Mr. Titas Roy Chowdhury)
Globsyn Finishing School



THANK YOU

