Created by:
SUBHAMAY SADHUKHAN, GURU NANAK INSTITUTE OF TECHNOLOGY, 161430110175 of 2016-17
PRITHWISH DAS, GURU NANAK INSTITUTE OF TECHNOLOGY, 161430110145 of 2016-17
RITWICK DAS, GURU NANAK INSTITUTE OF TECHNOLOGY, 161430110150 of 2016-17
SUMIT SAHA, GURU NANAK INSTITUTE OF TECHNOLOGY, 161430110178 of 2016-17
2 Project Objective
3 Project Scope
4 Data Description
5 Model Building
6 Code
7 Future scope of improvements
8 Certificate
I take this opportunity to express my profound gratitude and deep regards to my faculty
Mr. Titas Roy Chowdhury for his exemplary guidance, monitoring and constant
encouragement throughout the course of this project. The blessings, help and guidance
given from time to time shall carry me a long way in the journey of life on which I am
about to embark.
I am obliged to my project team members for the valuable information provided by them
in their respective fields. I am grateful for their cooperation during the period of my
assignment.
Subhamay Sadhukhan
Prithwish Das
Ritwick Das
Sumit Saha
A churn model can be the tool that brings these elements together and
provides insights and outputs that drive decision making across an
organization.
Gainsight understands the negative impact that churn rate can have on
company profits. Named a 2014 "Cool Vendor" in CRM sales by Gartner,
Gainsight's customer intelligence and retention process automation
technology:
>st – state:
State code of each customer.
Data type: string
Value type: Categorical
Null value percentage: 0
>acclen - account length
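Each attribute entry above (data type, value type, null percentage) can be reproduced with a short pandas check. A minimal sketch on a made-up two-row sample, using the dataset's column abbreviations:

```python
import pandas as pd

# tiny made-up sample mirroring the st / acclen columns
df = pd.DataFrame({"st": ["KS", "OH", "NJ", None],
                   "acclen": [128, 107, 137, 84]})

for col in df.columns:
    null_pct = 100 * df[col].isnull().mean()
    print(col, df[col].dtype, "null %:", null_pct)
```

The same loop over the real training frame yields the per-attribute summaries listed in this section.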
-----------------------------------------------------------------------------------
2. We drop some columns that are not needed because they do not affect the
model, namely:
i. st = state
ii. acclen = account length
iii. tdcal = total daytime calls
iv. tecal = total evening calls
v. temin = total evening minutes
vi. tnchar = total night-time charges
vii. tdchar = total daytime charges
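Dropping the columns listed above can be done in a single pandas call; a sketch on a made-up one-row frame (only `ncsc` and `label` kept for illustration):

```python
import pandas as pd

# made-up row containing the columns to be dropped plus two kept ones
df = pd.DataFrame([{"st": "KS", "acclen": 128, "tdcal": 110, "tecal": 99,
                    "temin": 197.4, "tnchar": 11.01, "tdchar": 45.07,
                    "ncsc": 1, "label": 0}])
drop_cols = ["st", "acclen", "tdcal", "tecal", "temin", "tnchar", "tdchar"]
df = df.drop(columns=drop_cols)
print(list(df.columns))  # ['ncsc', 'label']
```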
The specific attributes used in a churn model are highly domain dependent.
However, broadly speaking, the most common attributes capture user
behavior with regard to engagement level with a product or service. This
can be thought of as the number of times a user logs into her/his
account in a week, or the amount of time a user spends on a portal. In
short, frequency and intensity of usage/engagement are among the
strongest signals for predicting churn.
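An engagement-frequency feature of this kind is easy to derive from raw event logs. A sketch with a made-up login log (the schema and values are invented for illustration), counting logins per user per week with pandas:

```python
import pandas as pd

# made-up login log: one row per login event
logs = pd.DataFrame({"user": ["a", "a", "a", "b"],
                     "ts": pd.to_datetime(["2017-01-02", "2017-01-03",
                                           "2017-01-09", "2017-01-02"])})

# logins per user per calendar week: a simple engagement feature
logs["week"] = logs["ts"].dt.to_period("W")
weekly = logs.groupby(["user", "week"]).size()
print(weekly)
```

The resulting counts (user "a": 2 logins in the first week, 1 in the second) could then be joined back as a model feature.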
Introduction :
Decision Tree Classifier is a simple and widely used classification
technique. It applies a straightforward idea to solve the classification
problem. A decision tree classifier poses a series of carefully crafted
questions about the attributes of the test record. Each time it
receives an answer, a follow-up question is asked until a conclusion
about the class label of the record is reached.
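The question-and-answer process can be illustrated with a toy decision path; the questions and thresholds below are invented purely for illustration, not learned from the project data:

```python
# toy decision path: hypothetical questions, for illustration only
def classify(record):
    # record = (international_plan, customer_service_calls)
    if record[0] == "yes":   # first question
        return "churn"
    if record[1] > 3:        # follow-up question
        return "churn"
    return "no churn"

print(classify(("no", 1)))   # no churn
print(classify(("yes", 0)))  # churn
```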
Build a Decision Tree :
Building an optimal decision tree is a key problem in decision tree
classification. In general, many decision trees can be constructed from a
given set of attributes. While some of the trees are more accurate than
others, finding the optimal tree is computationally infeasible because of
the exponential size of the search space.
def buildtree(rows,scoref):
    # helper functions divideset and uniquecounts and the decisionnode
    # class are assumed to be defined elsewhere in the project
    if len(rows)==0:
        return decisionnode()
    current_score=scoref(rows)
    best_gain=0.0
    best_criteria=None
    best_sets=None
    column_count=len(rows[0])-1
    for col in range(column_count):
        # generate the set of distinct values in this column
        column_values={}
        for row in rows:
            column_values[row[col]]=1
        # try dividing the rows on each value in this column
        for value in column_values:
            (set1,set2)=divideset(rows,col,value)
            p=float(len(set1))/len(rows)
            gain=current_score-p*scoref(set1)-(1-p)*scoref(set2)
            if gain>best_gain and len(set1)>0 and len(set2)>0:
                best_gain=gain
                best_criteria=(col,value)
                best_sets=(set1,set2)
    if best_gain>0:
        trueBranch=buildtree(best_sets[0],scoref)
        falseBranch=buildtree(best_sets[1],scoref)
        return decisionnode(col=best_criteria[0],value=best_criteria[1],
                            tb=trueBranch,fb=falseBranch)
    else:
        return decisionnode(results=uniquecounts(rows))
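The tree-building code above relies on a scoring function `scoref`; a common choice is the entropy of the class distribution. A minimal sketch of `uniquecounts` and an entropy scorer (assuming, as in the code above, that the class label is the last element of each row):

```python
import math

def uniquecounts(rows):
    # count occurrences of each class label (last column)
    results = {}
    for row in rows:
        label = row[-1]
        results[label] = results.get(label, 0) + 1
    return results

def entropy(rows):
    # Shannon entropy of the class distribution, in bits
    results = uniquecounts(rows)
    ent = 0.0
    for count in results.values():
        p = float(count) / len(rows)
        ent -= p * math.log(p, 2)
    return ent

print(entropy([[1, "yes"], [2, "yes"], [3, "no"], [4, "no"]]))  # 1.0
```

A balanced two-class split scores 1.0 bit; a pure node scores 0, so every useful split lowers the weighted score and shows up as positive gain.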
K-Nearest Neighbors (K-NN) :
Introduction :
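K-NN classifies a test record by a majority vote among the k training records closest to it under some distance metric. The idea can be sketched in a few lines of plain Python (the feature values below are made up for illustration, loosely mirroring day minutes and service calls):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    # train: list of (features, label); vote among the k nearest
    # neighbors by Euclidean distance
    nearest = sorted(train, key=lambda fl: math.dist(fl[0], query))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

train = [((120, 1), 0), ((130, 0), 0), ((110, 2), 0),
         ((300, 5), 1), ((280, 4), 1), ((310, 6), 1)]
print(knn_predict(train, (125, 1)))  # 0
print(knn_predict(train, (290, 5)))  # 1
```

scikit-learn's `neighbors.KNeighborsClassifier`, used in the code below, implements the same idea with efficient neighbor search.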
Code
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import model_selection
from sklearn import linear_model
from sklearn import preprocessing
from sklearn import utils
from sklearn import metrics
from sklearn import tree
from sklearn import feature_selection
from sklearn import neighbors
from sklearn import naive_bayes
df=pd.read_csv("d:/folder/churn_train.txt",sep=", ",engine='python')
df1=pd.read_csv("d:/folder/churn_test.txt",sep=", ",engine='python')
# encode the categorical columns as integers (the same encoder is
# applied to the test set further below)
le1=preprocessing.LabelEncoder()
df["st"]=le1.fit_transform(df['st'])
df["intplan"]=le1.fit_transform(df['intplan'])
df["voice"]=le1.fit_transform(df['voice'])
df["label"]=le1.fit_transform(df['label'])
ytrain=df[['label']]
# features: everything except the phone number and the label
Xtrain=df.drop("phnum",axis=1).drop("label",axis=1)
Xtrain.describe()
ytrain.describe()
ytrain['label'].value_counts()
#feature extraction
df['st'].value_counts()
sns.boxplot(y="st",data=df,x="label")
df['st'].isnull().sum()
df['acclen'].value_counts()
sns.boxplot(y="acclen",data=df,x="label")
sns.distplot(df['acclen'])
qr=np.percentile(df['acclen'],[0,25,50,75,100])
qr
Xtrain['intplan'].value_counts()
sns.boxplot(y="intplan",data=df,x="label")
sns.distplot(df['intplan'])
sns.countplot(y='intplan',data=df,hue='label')
Xtrain['voice'].value_counts()
sns.countplot(y='voice',data=df,hue='label')
sns.distplot(df['voice'])
sns.boxplot(y="voice",data=df,x="label")
Xtrain['nummailmes'].value_counts()
sns.boxplot(y="nummailmes",data=df,x="label")
sns.distplot(df['nummailmes'])
Xtrain['tdcal'].value_counts()
sns.boxplot(y="tdcal",data=df,x="label")
sns.distplot(df['tdcal'])
Xtrain['tdchar'].value_counts()
sns.boxplot(y="tdchar",data=df,x="label")
sns.distplot(df['tdchar'])
Xtrain['temin'].value_counts()
sns.boxplot(y="temin",data=df,x="label")
sns.distplot(df['temin'])
Xtrain['tecal'].value_counts()
sns.boxplot(y="tecal",data=df,x="label")
sns.distplot(df['tecal'])
Xtrain['techar'].value_counts()
sns.boxplot(y="techar",data=df,x="label")
Xtrain['tnmin'].value_counts()
sns.boxplot(y="tnmin",data=df,x="label")
sns.distplot(df['tnmin'])
Xtrain['tncal'].value_counts()
sns.boxplot(y="tncal",data=df,x="label")
sns.distplot(df['tncal'])
Xtrain['tnchar'].value_counts()
sns.boxplot(y="tnchar",data=df,x="label")
sns.distplot(df['tnchar'])
Xtrain['timin'].value_counts()
sns.boxplot(y="timin",data=df,x="label")
sns.distplot(df['timin'])
Xtrain['tichar'].value_counts()
sns.boxplot(y="tichar",data=df,x="label")
sns.distplot(df['tichar'])
df['ncsc'].value_counts()
sns.boxplot(y="ncsc",data=df,x="label")
sns.distplot(df['ncsc'])
sns.countplot(x='ncsc',data=df,hue='label')
Xtrain=Xtrain.drop("st",axis=1)
Xtrain=Xtrain.drop("acclen",axis=1)
Xtrain=Xtrain.drop("tdcal",axis=1)
Xtrain=Xtrain.drop("temin",axis=1)
Xtrain=Xtrain.drop("tnchar",axis=1)
Xtrain=Xtrain.drop("tdchar",axis=1)
ml=tree.DecisionTreeClassifier()
ml.fit(Xtrain,ytrain)
print("AUC:",metrics.roc_auc_score(ytrain,ml.predict(Xtrain)))
print("recall:",metrics.recall_score(ytrain,ml.predict(Xtrain)))
df1["st"]=le1.fit_transform(df1['st'])
df1["intplan"]=le1.fit_transform(df1['intplan'])
df1["voice"]=le1.fit_transform(df1['voice'])
df1["label"]=le1.fit_transform(df1['label'])
df1.dropna(inplace=True)
Xtest=df1.drop("phnum",axis=1)
Xtest=Xtest.drop("label",axis=1)
# drop the same columns that were dropped from Xtrain so the train and
# test feature sets match at prediction time
Xtest=Xtest.drop("st",axis=1)
Xtest=Xtest.drop("acclen",axis=1)
Xtest=Xtest.drop("tdcal",axis=1)
Xtest=Xtest.drop("temin",axis=1)
Xtest=Xtest.drop("tnchar",axis=1)
Xtest=Xtest.drop("tdchar",axis=1)
ytest=df1[['label']]
ml=tree.DecisionTreeClassifier()
ml.fit(Xtrain,ytrain)
print("AUC:",metrics.roc_auc_score(ytest,ml.predict(Xtest)))
print("recall:",metrics.recall_score(ytest,ml.predict(Xtest)))
print("precision:",metrics.precision_score(ytest,ml.predict(Xtest)))
print("accuracy:",metrics.accuracy_score(ytest,ml.predict(Xtest)))
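Alongside AUC, recall, precision and accuracy, a confusion matrix makes the error types explicit, which matters for churn because missing a churner (a false negative) is usually costlier than a false alarm. A minimal sketch on made-up labels (1 = churn):

```python
from sklearn import metrics

ytrue = [0, 0, 1, 1, 1, 0]
ypred = [0, 1, 1, 1, 0, 0]
print(metrics.confusion_matrix(ytrue, ypred))
# recall: two of the three true churners are recovered
print("recall:", metrics.recall_score(ytrue, ypred))
```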
# correlation heatmap of the continuous features
df2=df
df2=df2.drop('st',axis=1)
df2=df2.drop('intplan',axis=1)
df2=df2.drop('tichar',axis=1)
df2=df2.drop('voice',axis=1)
df2=df2.drop('label',axis=1)
sns.heatmap(df2.corr())
def modelstats(Xtrain,Xtest,ytrain,ytest):
    stats=[]
    modelnames=["LR","DecisionTree","KNN","NB"]
    models=list()
    models.append(linear_model.LogisticRegression())
    models.append(tree.DecisionTreeClassifier())
    models.append(neighbors.KNeighborsClassifier())
    models.append(naive_bayes.GaussianNB())
    # the original was truncated here; a reasonable completion fits each
    # model and records its test-set accuracy
    for name,model in zip(modelnames,models):
        model.fit(Xtrain,ytrain)
        stats.append((name,metrics.accuracy_score(ytest,model.predict(Xtest))))
    return stats
________________________________
(Mr. Titas Roy Chowdhury)
Globsyn Finishing School