Personal Loan Campaign Final

Personal loan campaign
With PL_XSELL data
22nd May, 2020

Contents
1. Project Objective
2. Libraries and Dataset used
3. Business problem understanding
4. Exploratory Data Analysis
5. Clustering
6. CART Model Building and Evaluation
7. Random Forest model and evaluation
8. Conclusion
1. Project Objective
The objective of this report is to understand the banking data provided and to build CART and
Random Forest models for the personal loan campaign.
This project report covers the following areas:

• Exploratory data analysis
• CART model and performance evaluation, and
• Random Forest model and performance evaluation.
2. Libraries and Dataset used
Dataset
The data set used for the project is PL_XSELL.csv, which contains the banking summary data of
20,000 banking customers.
Important Libraries used
S. No  Name of the library  Description
1      DataExplorer         For EDA
2      corrplot             For EDA
3      caTools              For splitting train and test data
4      rpart                Building the CART tree using the rpart function and setting the control parameters
5      rattle               Function to display the tree
6      ROCR                 Calculating KS, AUC, etc. statistics
7      ineq                 Calculating the GINI index
8      randomForest         For Random Forest model building
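The train/test split listed against caTools can be illustrated with a short sketch. It is written in Python rather than R, and it is not the report's own code. The 70/30 ratio is an assumption inferred from the classification tables later in the report, which total 14,000 training and 6,000 test rows; the stratification by class, the seed and the function name `stratified_split` are likewise assumptions made for illustration.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=42):
    """Return (train_idx, test_idx), splitting each class separately
    so both parts keep roughly the same share of responders."""
    rng = random.Random(seed)  # seed is arbitrary, chosen for repeatability
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels using the counts stated in the report:
# 2,512 responders (1) among 20,000 customers.
labels = [1] * 2512 + [0] * (20000 - 2512)
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting each class separately keeps the responder rate close to the overall rate in both parts, which matters when responders are a small minority.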
3. Business problem understanding

The data set provides the bank's summary of its customers' banking data, covering responders and
non-responders to a personal loan campaign executed by the bank.
20,000 customers were targeted with an offer of a personal loan at a 10% interest rate, of which
2,512 customers responded positively. The data is to be used to build classification model(s) that
predict the response of a new set of customers in the future, based on the attributes
available in the data.
Classification models are built using the following supervised machine learning techniques:
1. Classification and Regression Tree
2. Random Forest
4. Exploratory Data Analysis
Introduction
The data set is 5.3 MB; 85% of the data fields are continuous and 15% of the data fields are
discrete.
There are no missing observations, hence we have a complete dataset for analysis.
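A completeness check of this kind can be sketched as follows. This is a minimal Python illustration, not the report's R workflow; the sample rows, the column names (`CUST_ID`, `AGE`, `BALANCE`) and the tokens treated as missing are all hypothetical.

```python
import csv
import io


def missing_counts(csv_text, missing_tokens=("", "NA")):
    """Count missing cells per column in CSV text.

    A cell counts as missing when it is absent from a short row or
    when its stripped value is one of `missing_tokens`.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in counts:
            value = row[name]
            if value is None or value.strip() in missing_tokens:
                counts[name] += 1
    return counts


# Hypothetical three-row sample with two gaps, for illustration only.
sample = "CUST_ID,AGE,BALANCE\nC1,34,1200.5\nC2,NA,300\nC3,41,\n"
print(missing_counts(sample))
```

A data set with no missing observations would return zero for every column.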
Basic distribution of data.
It is observed that the amount of debit transactions, the number of credit transactions, total
debits, total credits and total cash withdrawals are right-skewed.
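Right skew can also be checked numerically rather than only by eye. The sketch below, in Python, computes the moment-based sample skewness (third central moment divided by the 1.5 power of the second); a positive value indicates a longer right tail. The example values are made up and are not taken from PL_XSELL.csv.

```python
def sample_skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using population moments.

    Assumes the values are not all identical (m2 must be non-zero).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical transaction amounts: most are small, one is very large.
amounts = [1, 1, 2, 2, 3, 50]
print(sample_skewness(amounts))  # positive for a right-skewed sample
```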
The correlation plot shows no negative correlations.
There are strong positive correlations between the following pairs:
• Total no. of transactions and total no. of debit transactions
• No. of debit transactions and no. of credit transactions
• Total no. of transactions and total no. of credit transactions
• Amount of other bank ATM charges and no. of ATM debits
• Avg. amount debited per mobile banking transaction and amount of mobile debit transactions
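The pairwise figures behind a correlation plot are Pearson coefficients. A minimal Python sketch of that calculation is given here; the debit and credit counts are invented for illustration, and the total is simply their sum, which is one reason a total and its components tend to correlate strongly.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical per-customer counts, not taken from the data set.
debit_txns = [2, 5, 7, 10, 12]
credit_txns = [1, 3, 4, 6, 9]
total_txns = [d + c for d, c in zip(debit_txns, credit_txns)]

print(pearson(total_txns, debit_txns))
print(pearson(debit_txns, credit_txns))
```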
5. Clustering
The second part of the question deals with selecting the ideal clustering technique and building a
cluster model.
Centroid-based clustering is considered the most suitable clustering mechanism for the given
dataset, hence k-means clustering is chosen and the cluster model is built.
The optimum number of clusters is 3, using Euclidean distance and k-means clustering.
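For readers unfamiliar with the technique, k-means with Euclidean distance can be sketched in a few lines of Python. This is an illustrative implementation, not the one used for the report. The deterministic farthest-point initialisation is an assumption chosen to make the example repeatable (standard k-means usually starts from random centres), and the two-dimensional points are made up.

```python
import math


def farthest_point_init(points, k):
    """Pick k starting centroids: the first point, then repeatedly the
    point farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm with Euclidean distance."""
    centroids = farthest_point_init(points, k)
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return centroids, assignments


# Hypothetical points forming three well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
centroids, assignments = kmeans(points, k=3)
print(assignments)
```

On real banking variables, the features would normally be scaled first, since Euclidean distance is dominated by whichever variable has the largest range.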
6. CART Model Building and Evaluation
Final tree on training data after pruning
Complexity parameter
Success Segment                                          Meaning
Customers with no. of debit transactions less than      Maximum probability of getting a personal loan
6.5, age >25 and <50, and amount of cheque
transactions <8000

Customers with no. of debit transactions greater than   Maximum probability of getting a personal loan
6.5 and no. of credit transactions greater than 3.5
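The two success segments read off the pruned tree can be expressed as a simple rule check. The Python sketch below encodes only the thresholds stated in the table; the function and parameter names are hypothetical and do not correspond to column names in PL_XSELL.csv.

```python
def in_success_segment(no_debit_txn, no_credit_txn, age, cheque_amount):
    """Return True when a customer falls in either success segment
    listed in the table (thresholds copied from the report)."""
    # Segment 1: few debit transactions, mid-range age, low cheque amount.
    segment_one = (
        no_debit_txn < 6.5 and 25 < age < 50 and cheque_amount < 8000
    )
    # Segment 2: many debit transactions and several credit transactions.
    segment_two = no_debit_txn > 6.5 and no_credit_txn > 3.5
    return segment_one or segment_two


# Hypothetical customers for illustration.
print(in_success_segment(no_debit_txn=4, no_credit_txn=1, age=35, cheque_amount=5000))
print(in_success_segment(no_debit_txn=9, no_credit_txn=2, age=35, cheque_amount=5000))
```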
CART - Model evaluation
ROC curve for train and test data respectively
CART - Model evaluation – Train and Test Data
S. No  Method                Train data result         Test data result
1      KS                    0.2302543                 0.2104847
2      Area under curve      0.6511022                 0.6284212
3      GINI                  0.2653139                 0.2626962
4      Classification error  TARGET      0      1      TARGET     0     1
                             0       11884    363      0       5146    51
                             1         937    816      1        750    53
Overall, the CART model's performance is average.
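The classification-error counts in the table above can be turned into standard rates. The Python sketch below does this under one assumption: that rows are the actual TARGET and columns are the predicted class, a reading the CART table does not state explicitly (the Random Forest table later labels its columns RF_class, which suggests the same layout).

```python
def classification_metrics(tn, fp, fn, tp):
    """Derive standard rates from the four cells of a 2x2 confusion matrix."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "sensitivity": tp / (tp + fn),  # share of actual responders found
        "specificity": tn / (tn + fp),  # share of non-responders found
        "precision": tp / (tp + fp),    # share of predicted responders that are real
    }


# Counts copied from the CART table, assuming rows = actual, columns = predicted.
cart_train = classification_metrics(tn=11884, fp=363, fn=937, tp=816)
cart_test = classification_metrics(tn=5146, fp=51, fn=750, tp=53)

for name, metrics in (("train", cart_train), ("test", cart_test)):
    print(name, {key: round(value, 4) for key, value in metrics.items()})
```

Under that reading, test accuracy is 5199/6000, about 86.7%, but test sensitivity is only 53/803, about 6.6%: the tree labels most customers as non-responders, so accuracy alone overstates its usefulness on an imbalanced target.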
7. Random Forest model and evaluation
mtry = 6 OOB error = 6.6%

Searching left ...
-0.05952381 1e-04
Searching right ...
0.07467532 1e-04
0.03391813 1e-04
0.02421308 1e-04
0.01612903 1e-04
0.02017654 1e-04
Random Forest model evaluation
ROC curve for train and test data respectively
Random Forest- Model evaluation – Train and Test Data
S. No  Method                Train data result         Test data result
1      KS                    0.9973088                 0.7934068
2      Area under curve      0.9999681                 0.956837
3      GINI                  0.7450146                 0.6421228
4      Classification error       RF_class                  RF_class
                             TARGET      0      1      TARGET     0     1
                             0       12246      1      0       5236     5
                             1         158   1595      1        296   463
The Random Forest model performs well on both train and test data, and is more stable than the
CART model.
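The KS and area-under-curve figures reported for both models come from the ROC curve of predicted scores. The Python sketch below shows how the two statistics can be derived from scores and labels; it is a stand-in for illustration, not the ROCR code used in the report, and the scores and labels are invented. KS is taken here as the largest gap between the true positive rate and the false positive rate, and AUC as the trapezoidal area under the ROC points.

```python
def roc_points(scores, labels):
    """ROC points (fpr, tpr), one per distinct score threshold,
    starting from (0, 0). Labels are 1 for responder, 0 otherwise."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def ks_statistic(points):
    """Largest vertical gap between the TPR and FPR curves."""
    return max(tpr - fpr for fpr, tpr in points)


def auc(points):
    """Trapezoidal area under the ROC points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical predicted probabilities and true responses.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
points = roc_points(scores, labels)
print(ks_statistic(points), auc(points))
```

Comparing these statistics between train and test data, as the tables do, is a quick way to see how much a model's ranking ability drops on unseen customers.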
8. Conclusion
Hence, the bank can use the Random Forest model to identify potential customers to whom it can
sell personal loans more accurately than with the CART model.