Problem 1
1.1 Read the data, do the necessary initial steps, and exploratory data analysis (Univariate, Bi-variate, and multivariate analysis).
Initial steps and exploratory data analysis are done in the Jupyter notebook.
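A minimal sketch of typical initial checks (size, missing values, duplicate rows, summary statistics), assuming pandas. The function name `initial_checks`, the toy DataFrame and its column names are hypothetical stand-ins; the actual loading and checking code lives in the notebook.

```python
import pandas as pd

def initial_checks(df: pd.DataFrame) -> dict:
    """First-pass checks before EDA: size, missing values, duplicates, summary stats."""
    return {
        "shape": df.shape,                       # (rows, columns)
        "missing": df.isnull().sum().to_dict(),  # missing-value count per column
        "duplicates": int(df.duplicated().sum()),  # rows identical to an earlier row
        "summary": df.describe().T,              # univariate statistics per column
    }

# Hypothetical toy data (not the real data set); rows 1 and 3 are duplicates.
toy = pd.DataFrame({
    "spending": [10.0, 12.5, 15.0, 12.5],
    "credit_limit": [3.0, 3.5, 4.0, 3.5],
})
checks = initial_checks(toy)
print(checks["shape"], checks["duplicates"])  # → (4, 2) 1
```

Duplicates and missing values are checked first because both would distort the univariate summaries and the correlation plot that follow.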
Pair Plot
Correlation Plot
The correlation plot above shows mostly positive correlations among the variables, except for min_payment_amt, which is mostly negatively correlated with the others.
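The observation about min_payment_amt can be checked numerically rather than by eye. The sketch below, assuming numpy, computes the correlation matrix and flags any variable whose correlations with the other variables are mostly negative. The data are synthetic and the column names other than min_payment_amt (spending, advance_payments, credit_limit) are assumed for illustration.

```python
import numpy as np

def mostly_negative(data, names):
    """Return variables whose correlations with the other variables are mostly negative."""
    corr = np.corrcoef(data, rowvar=False)  # columns are variables
    flagged = []
    for i, name in enumerate(names):
        others = np.delete(corr[i], i)      # drop the self-correlation of 1.0
        if (others < 0).sum() > len(others) / 2:
            flagged.append(name)
    return flagged

# Synthetic stand-in data: three variables move together, one moves the opposite way.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
data = np.column_stack([
    base + 0.1 * rng.normal(size=200),   # spending (assumed name)
    base + 0.1 * rng.normal(size=200),   # advance_payments (assumed name)
    base + 0.1 * rng.normal(size=200),   # credit_limit (assumed name)
    -base + 0.1 * rng.normal(size=200),  # min_payment_amt
])
names = ["spending", "advance_payments", "credit_limit", "min_payment_amt"]
print(mostly_negative(data, names))  # → ['min_payment_amt']
```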
Cluster 1: customers with a high current balance, high spending in a single purchase, and a high credit limit.
Cluster 2: customers with a lower current balance, lower spending in a single purchase than Cluster 1 customers, and a low credit limit.
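Cluster descriptions such as "high current balance" and "low credit limit" usually come from comparing each feature's mean across clusters. A minimal numpy sketch of that profiling step follows; the function name `profile_clusters`, the two feature names, and the four toy rows are hypothetical.

```python
import numpy as np

def profile_clusters(X, labels, names):
    """Mean of each feature per cluster, used to describe clusters as high or low."""
    profile = {}
    for c in np.unique(labels):
        means = X[labels == c].mean(axis=0)  # feature means for rows in cluster c
        profile[int(c)] = dict(zip(names, means.round(2).tolist()))
    return profile

# Hypothetical toy data: two rows per cluster, two features.
X = np.array([[9.0, 6.0], [10.0, 7.0], [3.0, 2.0], [4.0, 3.0]])
labels = np.array([1, 1, 2, 2])
names = ["current_balance", "credit_limit"]
print(profile_clusters(X, labels, names))
# → {1: {'current_balance': 9.5, 'credit_limit': 6.5}, 2: {'current_balance': 3.5, 'credit_limit': 2.5}}
```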
1.4 Apply K-Means clustering on scaled data and determine optimum clusters. Apply elbow curve and silhouette score. Explain the results properly. Interpret and write inferences on the finalized clusters.
K-Means clustering is done in the Jupyter notebook named ‘data mining problem 1’.
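The elbow curve plots the within-cluster sum of squares (WSS) against the number of clusters k; WSS always falls as k grows, and the "elbow" is where the drop flattens out. The silhouette score averages s = (b - a) / max(a, b) over all points, where a is a point's mean distance to its own cluster and b its mean distance to the nearest other cluster; values closer to 1 mean better-separated clusters. scikit-learn's `KMeans` and `silhouette_score` are the usual tools for this; the sketch below is a dependency-light numpy version on synthetic two-group data, not the notebook's code, and the helper names are hypothetical.

```python
import numpy as np

def zscore(X):
    """Scale each column to mean 0 and standard deviation 1 (K-Means is distance based)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, within-cluster sum of squares)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # k distinct rows as start
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points; keep it if its cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment and WSS (the quantity plotted on the elbow curve).
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    return labels, float(dists[np.arange(len(X)), labels].sum())

def silhouette(X, labels):
    """Mean silhouette score s = (b - a) / max(a, b) over all points."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:                      # singleton cluster: s defined as 0
            scores.append(0.0)
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance to own cluster, self excluded
        b = min(D[i, labels == c].mean()         # mean distance to the nearest other cluster
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Synthetic stand-in data: two well-separated groups with 2 features each.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
Xs = zscore(X)

wss = {k: kmeans(Xs, k)[1] for k in range(1, 7)}                  # elbow curve values
sil = {k: silhouette(Xs, kmeans(Xs, k)[0]) for k in range(2, 7)}  # silhouette needs k >= 2
best_k = max(sil, key=sil.get)
print(best_k)  # → 2
```

The elbow curve and the silhouette score are read together: the elbow suggests a range for k, and the highest silhouette score within that range is a reasonable final choice.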
Elbow Curve
Cluster 1: customers with the least spending, the lowest spend in a single purchase, the lowest credit limit, the lowest advance payments, and the highest minimum payment amount (min_payment_amt).
Cluster 2: customers with the highest spending, the highest advance payments, and the highest spend in a single purchase.
The bank should set up tie-ups with e-commerce sites and offer promotions so that customers spend more using their credit cards. It could also give discount coupons to customers who spend a minimum amount in a month, which would encourage them to use their credit cards more.
Problem 2
2.1 Read the data, do the necessary initial steps, and exploratory data analysis (Univariate, Bi-variate, and multivariate analysis).
Initial steps and exploratory data analysis are done in the Jupyter notebook named ‘data mining problem 2’.
Pair Plot
The pair plot shows that all variables except age are left-skewed.
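The direction of skew read off a pair plot can be confirmed with the sample skewness: a negative value means a long left tail (left-skewed) and a positive value a long right tail (right-skewed). A minimal numpy sketch using the Fisher-Pearson moment estimator follows; the function name is hypothetical, and pandas' `Series.skew()` uses a bias-adjusted version that differs in magnitude but has the same sign.

```python
import numpy as np

def skewness(x):
    """Fisher-Pearson sample skewness g1 = m3 / m2**1.5 (biased moment estimator)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = (d ** 2).mean()  # second central moment
    m3 = (d ** 3).mean()  # third central moment
    return m3 / m2 ** 1.5

print(round(skewness([1, 2, 3, 4, 5]), 3))  # symmetric → 0.0
print(skewness([1, 1, 1, 10]) > 0)          # long right tail → True (right-skewed)
print(skewness([1, 10, 10, 10]) < 0)        # long left tail → True (left-skewed)
```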
Correlation Plot
The correlation plot shows mostly positive correlations between the variables.
2.2 Data Split: Split the data into train and test sets, and build classification models: CART, Random Forest, and Artificial Neural Network.
The classification models are built in the Jupyter notebook named ‘data mining problem 2’.
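The train/test split shuffles the rows once and holds a fixed share back for evaluation, so that test accuracy measures performance on unseen data. The sketch below assumes a 70/30 split and a fixed seed; neither value is stated in this report, and the helper name is hypothetical. In practice scikit-learn's `train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)` does the same job and can also preserve the class balance.

```python
import numpy as np

def train_test_split_idx(n, test_size=0.3, seed=1):
    """Shuffle row indices once and cut them into disjoint train and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)              # random order of 0..n-1
    n_test = int(round(n * test_size))    # size of the held-out test set
    return idx[n_test:], idx[:n_test]     # train indices, test indices

train_idx, test_idx = train_test_split_idx(100)
print(len(train_idx), len(test_idx))  # → 70 30
```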
For CART:
Test data accuracy = 78.23%
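Test accuracy is the share of test rows classified correctly, (TP + TN) / N, so 78.23% means roughly 78 of every 100 test rows were predicted correctly. Precision, recall and F1 come from the same confusion-matrix counts. The sketch below computes them in plain Python on hypothetical toy labels; it is an illustration, not the notebook's evaluation code.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical toy labels: 3 TP, 3 TN, 1 FP, 1 FN.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
m = classification_metrics(y_true, y_pred)
print(f"{m['accuracy']:.2%}")  # → 75.00%
```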
After comparing all the models, Random Forest performs best in this case, with higher accuracy, precision, F1 score, and AUC than the other models.
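AUC, one of the comparison criteria above, is the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case (ties count as half). The sketch below computes it by direct pairwise comparison and picks the model with the higher AUC. The model names and predicted probabilities are hypothetical toy values, not the report's results; for real use scikit-learn's `roc_auc_score` is the standard choice.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative), ties = 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

# Hypothetical test labels and predicted probabilities for two hypothetical models.
y_test = [0, 0, 1, 1]
model_scores = {
    "model_a": [0.1, 0.4, 0.35, 0.8],
    "model_b": [0.2, 0.3, 0.6, 0.9],
}
aucs = {name: roc_auc(y_test, s) for name, s in model_scores.items()}
best = max(aucs, key=aucs.get)
print(aucs, best)  # → {'model_a': 0.75, 'model_b': 1.0} model_b
```

The pairwise loop is O(positives × negatives), which is fine for a sketch; rank-based implementations scale better on large test sets.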