Group Assignment
TEAM: OMICRON
DETAILS:
Name: Ankita Mohapatra (22020343011)
Name: Varun Kumar Sinha (22020343076)
Name: Sundar Bhattacharjee (22020343070)
Name: Anshul Aggarwal (22020343013)
Defining the Problem
A consumer credit card bank is facing the problem of customer attrition. The bank
wants to analyse its data to find the reasons behind this and use them to predict
which customers are likely to drop off.
14. Credit_Limit: It denotes the credit limit of the credit card. Data type is
numerical.
22. Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_1:
It denotes a simple probabilistic (Naive Bayes) classifier with strong (naive)
independence assumptions between the variables named in the column.
23. Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_2:
It denotes a simple probabilistic (Naive Bayes) classifier with strong (naive)
independence assumptions between the variables named in the column.
Summary of the data set using R
Statistical Analysis
Chi-square Test:
Output:
Code:
tab2 <- table(credit_card_churn_updated$Education_Level,
              credit_card_churn_updated$Income_Category)
chisq.test(tab2)
Output:
As the p-value < 0.05, we reject the null hypothesis of independence.
“Education_Level” and “Income_Category” are therefore not independent of
each other.
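The decision rule above (reject independence when p < 0.05) can be sketched in R as follows. The 2 x 3 table uses made-up counts and is only a stand-in for table(Education_Level, Income_Category). The names tab, res, alpha and decision are hypothetical.

```r
# Hypothetical 2 x 3 contingency table (made-up counts, NOT from the
# credit card data set) standing in for table(Education_Level, Income_Category)
tab <- matrix(c(60, 30, 10,
                20, 40, 40),
              nrow = 2, byrow = TRUE,
              dimnames = list(Education = c("Graduate", "High School"),
                              Income = c("Low", "Medium", "High")))

res <- chisq.test(tab)   # Pearson chi-square test of independence
alpha <- 0.05            # significance level used in the report

# Decision rule: reject H0 (the two variables are independent) when p < alpha
if (res$p.value < alpha) {
  decision <- "reject H0: the two variables are associated"
} else {
  decision <- "fail to reject H0: no evidence of association"
}

print(res$statistic)   # chi-square statistic
print(res$parameter)   # degrees of freedom = (rows - 1) * (cols - 1)
print(decision)
```

Yates' continuity correction only applies to 2 x 2 tables, so here the statistic is the plain Pearson sum over all six cells.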
Code:
tab4 <- table(credit_card_churn_updated$Income_Category,
              credit_card_churn_updated$Card_Category)
chisq.test(tab4)
Output:
Code:
res <- t.test(Months_on_book ~ Attrition_Flag,
              data = credit_card_churn_updated, var.equal = TRUE)
res
Output:
Code:
res1 <- t.test(Months_on_book ~ Gender,
               data = credit_card_churn_updated, var.equal = TRUE)
res1
Output:
Code:
res2 <- t.test(Total_Relationship_Count ~ Gender,
               data = credit_card_churn_updated, var.equal = TRUE)
res2
Output:
Code:
res3 <- t.test(Credit_Limit ~ Gender, data = credit_card_churn_updated)
res3
Output:
Code:
res4 <- t.test(Total_Trans_Amt ~ Attrition_Flag,
               data = credit_card_churn_updated)
res4
Output:
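Some of the t-tests above assume equal group variances (var.equal = TRUE, the pooled t-test), while res3 and res4 use R's default Welch test. The sketch below shows one way to choose between them: run an F test for equal variances first with var.test. The data are made-up values and the names df, vt, equal_var and tt are hypothetical. The F test is itself sensitive to non-normality, so many analysts simply use the Welch test by default.

```r
# Hypothetical data (NOT from the credit card data set): a numeric measure
# split into two groups, standing in for e.g. Months_on_book by Attrition_Flag
df <- data.frame(
  value = c(36, 39, 44, 34, 21, 36, 46, 27, 36, 28,
            40, 42, 38, 56, 44, 13, 50, 48, 30, 58),
  group = factor(rep(c("Existing", "Attrited"), each = 10))
)

# F test of H0: the two group variances are equal
vt <- var.test(value ~ group, data = df)

# Use the pooled t-test only when equal variances are not rejected,
# otherwise fall back to Welch's t-test
equal_var <- vt$p.value >= 0.05
tt <- t.test(value ~ group, data = df, var.equal = equal_var)

print(vt$p.value)
print(tt$method)
print(tt$p.value)
```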
Code:
one.way <- aov(Months_on_book ~ Card_Category,
               data = credit_card_churn_updated)
summary(one.way)
Output:
Code:
one.way1 <- aov(Months_Inactive_12_mon ~ Card_Category,
                data = credit_card_churn_updated)
summary(one.way1)
Output:
Code:
one.way2 <- aov(Months_on_book ~ Education_Level,
                data = credit_card_churn_updated)
summary(one.way2)
Output:
Code:
one.way4 <- aov(Total_Trans_Amt ~ Income_Category,
                data = credit_card_churn_updated)
summary(one.way4)
Output:
Correlation:
Code:
cor.test(credit_card_churn_updated$Customer_Age,
         credit_card_churn_updated$Months_on_book)
Output:
Code:
cor.test(credit_card_churn_updated$Months_Inactive_12_mon,
credit_card_churn_updated$Credit_Limit)
Output:
Code:
cor.test(credit_card_churn_updated$Credit_Limit,
         credit_card_churn_updated$Months_on_book)
Output:
Regression:
Assumptions: The following assumptions are made while
building the models.
1. The relationship between the response and the predictors is linear.
2. The error terms have constant variances.
3. The error terms are independent of each other.
4. The error terms are normally distributed.
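The four assumptions above can be checked from a fitted model's residuals. The sketch below uses R's built-in mtcars data as a stand-in model. With the report's data, one would pass reg1, reg2 or reg3 instead of fit. The Breusch-Pagan statistic is computed by hand (Koenker's studentized form, n * R^2 of the squared residuals regressed on the predictors), so no extra package is needed. shapiro.test accepts at most 5000 values, so on a larger data set one would test a random sample of residuals or rely on the Q-Q plot.

```r
# Stand-in model on R's built-in mtcars data (NOT the credit card data)
fit <- lm(mpg ~ wt + hp, data = mtcars)
res <- residuals(fit)

# 1 & 2. Linearity and constant variance, visually (interactive use):
# plot(fit, which = 1)   # residuals vs fitted values

# 2. Constant variance: studentized Breusch-Pagan test by hand,
#    regressing the squared residuals on the same predictors
e2 <- res^2
aux <- lm(e2 ~ wt + hp, data = mtcars)
bp_stat <- nrow(mtcars) * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 2, lower.tail = FALSE)   # df = number of predictors

# 3. Independence: Durbin-Watson statistic (near 2 = no lag-1
#    autocorrelation; only meaningful if the rows have a natural order)
dw <- sum(diff(res)^2) / sum(res^2)

# 4. Normality: Shapiro-Wilk test on the residuals (3 to 5000 values only)
# plot(fit, which = 2)   # normal Q-Q plot (interactive use)
sw <- shapiro.test(res)

cat("BP p =", bp_p, " DW =", dw, " Shapiro-Wilk p =", sw$p.value, "\n")
```

A p-value above 0.05 in the Breusch-Pagan or Shapiro-Wilk test means there is no evidence against constant variance or normality, respectively.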
Models:
1) “Months_Inactive_12_mon” is explained by
“Income_Category” and “Total_Relationship_Count”
Code:
reg2 <- lm(Months_Inactive_12_mon ~ Income_Category + Total_Relationship_Count,
           data = credit_card_churn_updated)
summary(reg2)
Output:
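The adjusted R-squared is less than 0 (the ordinary R-squared can never be
negative, so it is close to 0) and the p-value > 0.05, so this model does not
explain Months_Inactive_12_mon.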
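Ordinary R-squared always lies between 0 and 1. Only the adjusted R-squared, which penalises each extra predictor, can fall below 0. The sketch below reproduces this on made-up data where the predictor carries almost no information. It pulls both values and the overall F-test p-value (the "p-value" on the last line of summary()) out of summary.lm. The names x, y, s and p_overall are hypothetical.

```r
# Hypothetical data (NOT from the credit card data set) in which the
# predictor carries almost no information about the response
x <- 1:10
y <- c(5, 3, 6, 2, 7, 4, 5, 3, 6, 4)
fit <- lm(y ~ x)
s <- summary(fit)

r2     <- s$r.squared       # always between 0 and 1
adj_r2 <- s$adj.r.squared   # 1 - (1 - R^2) * (n - 1) / (n - k - 1); can be < 0

# Overall F test of H0: all slope coefficients are zero
f <- s$fstatistic           # named vector: value, numdf, dendf
p_overall <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

cat("R^2 =", r2, " adjusted R^2 =", adj_r2, " p =", p_overall, "\n")
```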
Code:
reg3 <- lm(Months_on_book ~ Income_Category + Card_Category + Credit_Limit,
           data = credit_card_churn_updated)
summary(reg3)
Output:
Code:
reg1 <- lm(Months_on_book ~ Customer_Age + Gender + Income_Category +
             Card_Category + Months_Inactive_12_mon + Credit_Limit +
             Total_Trans_Amt,
           data = credit_card_churn_updated)
summary(reg1)
Output:
R-squared lies between 0 and 1 and the p-value < 0.05, so the model is
statistically significant. Of the models built, this one gives the best fit.
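A "best fit" claim is usually backed by comparing candidate models on a criterion that penalises extra predictors, such as adjusted R-squared or AIC. This only makes sense for models with the same response. Here that means reg3 and reg1 (both Months_on_book), not reg2. Because reg3's predictors are a subset of reg1's, a partial F test via anova(reg3, reg1) would also apply, provided both models are fitted on the same rows. The sketch below shows these comparisons on R's built-in mtcars data as a stand-in. The models small and large are hypothetical.

```r
# Stand-in nested models on R's built-in mtcars data (NOT the credit card
# data); in the report the analogous pair would be reg3 (smaller) and reg1
small <- lm(mpg ~ wt, data = mtcars)
large <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Adjusted R^2 penalises extra predictors, unlike plain R^2 (higher is better)
adj <- c(small = summary(small)$adj.r.squared,
         large = summary(large)$adj.r.squared)

# Akaike information criterion (lower is better)
aics <- c(small = AIC(small), large = AIC(large))

# Partial F test: do the extra predictors in `large` improve the fit?
cmp <- anova(small, large)
p_extra <- cmp[["Pr(>F)"]][2]

print(adj)
print(aics)
print(p_extra)
```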