
Personal loan campaign

With PL_XSELL data

22nd May, 2020


Contents
1. Project Objective
2. Libraries and Dataset used
3. Business problem understanding
4. Exploratory Data Analysis
5. Clustering
6. CART Model Building and Evaluation
7. Random Forest model and evaluation
8. Conclusion

1. Project Objective

The objective of this report is to understand the banking data provided and to build CART and
Random Forest models for the personal loan campaign.

This project report covers the following areas:

• Exploratory data analysis
• CART model and performance evaluation, and
• Random forest model and performance evaluation.

2. Libraries and Dataset used

Dataset

The data set used for the project is PL_XSELL.csv, which contains the banking summary data of
20,000 banking customers.

Important Libraries used

S. No  Name of the library  Description
1      DataExplorer         For EDA
2      corrplot             For EDA (correlation plot)
3      caTools              For splitting train and test data
4      rpart                Building the CART tree using the rpart function and setting the control parameters
5      rattle               Function to display the tree
6      ROCR                 Calculating KS, AUC, etc. statistics
7      ineq                 Calculating the GINI index
8      randomForest         For Random Forest model building

3. Business problem understanding


The data set provides a summary of the banking data of the bank's customers, both responders and
non-responders, for a Personal Loan Campaign that was executed by the bank.

20,000 customers were targeted with an offer of a personal loan at a 10% interest rate, of whom
2,512 responded positively. The data is to be used to build classification model(s) that
predict the response of a new set of customers in the future, based on the attributes
available in the data.
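
The campaign figures above imply a heavily imbalanced target, which matters when reading the accuracy of any model built later. The short sketch below works out the response rate and the accuracy of a naive rule that predicts "no response" for everyone; the variable names are illustrative and only the two counts come from the report.

```python
# Campaign figures quoted in the report.
targeted = 20000
responders = 2512

# Share of targeted customers who responded positively.
response_rate = responders / targeted

# Accuracy of a naive baseline that predicts "no response" for every customer.
baseline_accuracy = 1 - response_rate

print(f"Response rate: {response_rate:.2%}")
print(f"Accuracy of the all-negative baseline: {baseline_accuracy:.2%}")
```

Any classifier therefore needs to be judged on more than raw accuracy, since the baseline is already high without identifying a single responder.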

The classification models are built using the following supervised machine learning techniques:

1. Classification and Regression Tree

2. Random Forest

4. Exploratory Data Analysis
Introduction

The data set is 5.3 MB; 85% of the data fields are continuous and 15% of the data fields are
discrete.

There are no missing observations, hence we have a complete dataset for analysis.

Basic distribution of data.

It is observed that the amount of debit transactions, number of credit transactions, total debits,
total credits and total cash withdrawals are right-skewed.

The correlation plot shows no negative correlations.

There are strong positive correlations between the following pairs:

• Total no. of transactions and total no. of debit transactions
• No. of debit transactions and no. of credit transactions
• Total no. of transactions and total no. of credit transactions
• Amount of other bank ATM charges and no. of ATM debits
• Avg. amount debited per mobile banking transaction and amount of mobile debit transactions

5. Clustering
The second part of the question deals with selecting the ideal clustering technique and building a
cluster model.

Centroid-based clustering is the best-suited clustering mechanism for the given dataset, hence k-means
clustering was chosen and a cluster model was built.

The optimum number of clusters is 3, using Euclidean distance and k-means clustering.
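
To illustrate what centroid-based clustering with Euclidean distance does, here is a minimal sketch of Lloyd's k-means algorithm. It is not the code used for this report: the data points are hypothetical, and the farthest-point initialisation is an assumption chosen only to keep the example deterministic.

```python
import math


def farthest_point_init(points, k):
    """Pick k starting centroids: the first point, then repeatedly the point
    farthest from the centroids chosen so far (assumed, for determinism)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with Euclidean distance. Returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


# Hypothetical, already-scaled customer features (not taken from PL_XSELL.csv).
points = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
    (9.0, 1.0), (9.1, 1.2), (8.9, 0.9),
]
centroids, labels = kmeans(points, k=3)
print(labels)
```

On real banking data the features would normally be scaled first, because Euclidean distance is dominated by whichever variable has the largest range.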

6. CART Model Building and Evaluation

Final tree on training data after pruning

Complexity parameter

Success Segment                                           Meaning
Customers with no. of debit transactions less than        Maximum probability of getting a personal loan
6.5, age >25 and <50, and amount of cheque
transactions <8000
Customers with no. of debit transactions greater than     Maximum probability of getting a personal loan
6.5 and no. of credit transactions greater than 3.5

CART- Model evaluation

ROC curve for train and test data respectively

CART- Model evaluation – Train and Test Data

S. No  Method                Train data result     Test data result
1      KS                    0.2302543             0.2104847
2      Area under curve      0.6511022             0.6284212
3      GINI                  0.2653139             0.2626962
4      Classification error  TARGET      0     1   TARGET      0     1
                             0       11884   363   0        5146    51
                             1         937   816   1         750    53

Overall, the CART model's performance is average: on test data the AUC is about 0.63 and the KS about 0.21.
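
For readers unfamiliar with the KS and AUC figures in the table, the sketch below shows one common way to derive them from predicted probabilities: KS as the largest gap between the true positive rate and the false positive rate, and AUC as the trapezoidal area under the ROC curve. The labels and scores are hypothetical, and the function names are illustrative; this is not the ROCR code used for the report.

```python
def roc_curve(labels, scores):
    """Return (fpr, tpr) as the score threshold sweeps from high to low."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Customers with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / negatives)
        tpr.append(tp / positives)
    return fpr, tpr


def ks_statistic(fpr, tpr):
    """Largest gap between the cumulative responder and non-responder rates."""
    return max(t - f for f, t in zip(fpr, tpr))


def area_under_curve(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2 for i in range(1, len(fpr))
    )


# Hypothetical outcomes (1 = responded) and predicted probabilities.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
fpr, tpr = roc_curve(labels, scores)
ks = ks_statistic(fpr, tpr)
auc = area_under_curve(fpr, tpr)
print(f"KS = {ks:.3f}, AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why a test AUC near 0.63 reads as only modest separation.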

7. Random Forest model and evaluation

mtry = 6 OOB error = 6.6%
Searching left ...
mtry = 4 OOB error = 6.99%
-0.05952381 1e-04
Searching right ...
mtry = 9 OOB error = 6.11%
0.07467532 1e-04
mtry = 13 OOB error = 5.9%
0.03391813 1e-04
mtry = 19 OOB error = 5.76%
0.02421308 1e-04
mtry = 28 OOB error = 5.66%
0.01612903 1e-04
mtry = 37 OOB error = 5.55%
0.02017654 1e-04
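
The log above resembles the trace printed by the `tuneRF` function of the randomForest package. Assuming that is its source, the first number under each trial is the relative improvement in OOB error over the best error so far, and the second (1e-04) is the minimum improvement required to keep searching. The sketch below recomputes those improvements from the rounded percentages in the log, so the results agree with the printed values only approximately.

```python
# (mtry, OOB error in %) in the order the log reports them; the first entry
# is the starting point of the search.
trials = [(6, 6.6), (4, 6.99), (9, 6.11), (13, 5.9), (19, 5.76), (28, 5.66), (37, 5.55)]
THRESHOLD = 1e-4  # minimum relative improvement, as printed in the log


def relative_improvements(trials, threshold):
    """Replay the search: compare each trial with the best error seen so far."""
    best_mtry, best_err = trials[0]
    rows = []
    for mtry, err in trials[1:]:
        gain = 1 - err / best_err  # positive when the OOB error went down
        rows.append((mtry, err, gain))
        if gain > threshold:
            best_mtry, best_err = mtry, err
    return best_mtry, best_err, rows


best_mtry, best_err, rows = relative_improvements(trials, THRESHOLD)
for mtry, err, gain in rows:
    print(f"mtry = {mtry:2d}  OOB error = {err:.2f}%  improvement = {gain:+.4f}")
print(f"Lowest OOB error: {best_err}% at mtry = {best_mtry}")
```

Read this way, the search to the left (mtry = 4) made the error worse, while every step to the right improved it, ending at mtry = 37 with the lowest OOB error in the log.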

Random Forest model evaluation

ROC curve for train and test data respectively

Random Forest- Model evaluation – Train and Test Data

S. No  Method                Train data result     Test data result
1      KS                    0.9973088             0.7934068
2      Area under curve      0.9999681             0.956837
3      GINI                  0.7450146             0.6421228
4      Classification error          RF_class              RF_class
                             TARGET      0     1   TARGET      0     1
                             0       12246     1   0        5236     5
                             1         158  1595   1         296   463

The Random Forest model performs well on both train and test data (test AUC of about 0.96, against
about 0.63 for CART), and is more stable than the CART model.
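
The test-data classification tables of the two models can be compared directly. The sketch below derives accuracy, recall and precision from the counts reported in the tables, assuming that rows are the actual TARGET and columns are the predicted class; the function name is illustrative.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Summary rates from a 2x2 confusion matrix (rows actual, columns predicted)."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,   # share of all customers classified correctly
        "recall": tp / (tp + fn),        # share of actual responders that were found
        "precision": tp / (tp + fp),     # share of predicted responders that responded
    }


# Test-data counts as reported in the CART and Random Forest tables.
cart_test = confusion_metrics(tn=5146, fp=51, fn=750, tp=53)
rf_test = confusion_metrics(tn=5236, fp=5, fn=296, tp=463)

for name, metrics in (("CART", cart_test), ("Random forest", rf_test)):
    print(name, {key: round(value, 4) for key, value in metrics.items()})
```

Recall is the figure most relevant to a cross-sell campaign, since it measures how many of the customers who would respond are actually identified by the model.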

8. Conclusion
Hence the bank can use the Random Forest model to identify potential customers to whom it can
sell a personal loan more accurately than with the CART model.
