Professional Documents
Culture Documents
prediction
A Finance Predictive Model based on DNN and XGboost
1 Introducion
LendingClub is a US peer-to-peer lending company which specialises in lending various
types of loans to urban customers.
When the company receives a loan application, we have to make a decision based on the
applicant’s profile.
Two types of risks are associated with the bank’s decision :
— if the applicant is likely to repay the loan, then not approving it results in a loss
of business to the company ;
— if the applicant is not likely to repay the loan, i.e. he/she is likely to default, then
approving it may lead to a financial loss for the company.
Our aim is to develop the best supervised machine learning model capable to identify
which borrowers will payoff their loans.
We’ll try to do this binary classification task using Deep Neural Network and X-G boost
models.
Link dataset :https://www.kaggle.com/code/faressayah/lending-club-loan-defaulters-predictio
data?select=lending_club_loan_two.csv
1
Below we report some usefull graphs for our analysis :
300000
250000
200000
count
150000
100000
50000
0
Fully Paid Charged Off
loan_status
Figure 1 – loan_status distribution, we can see that our database is strictly unbalanced
1.0
loan_amnt 1 0.17 0.95 0.34 0.017 0.2 -0.078 0.33 0.1 0.22 0.22 -0.11
int_rate 0.17 1 0.16 -0.057 0.079 0.012 0.061 -0.011 0.29 -0.036 -0.083 0.057
installment 0.95 0.16 1 0.33 0.016 0.19 -0.068 0.32 0.12 0.2 0.19 -0.099 0.8
annual_inc 0.34 -0.057 0.33 1 -0.082 0.14 -0.014 0.3 0.028 0.19 0.24 -0.05
dti 0.017 0.079 0.016 -0.082 1 0.14 -0.018 0.064 0.088 0.1 -0.025 -0.015 0.6
open_acc 0.2 0.012 0.19 0.14 0.14 1 -0.018 0.22 -0.13 0.68 0.11 -0.028
pub_rec -0.078 0.061 -0.068 -0.014 -0.018 -0.018 1 -0.1 -0.076 0.02 0.012 0.7 0.4
revol_bal 0.33 -0.011 0.32 0.3 0.064 0.22 -0.1 1 0.23 0.19 0.19 -0.12
revol_util 0.1 0.29 0.12 0.028 0.088 -0.13 -0.076 0.23 1 -0.1 0.0075 -0.087 0.2
total_acc 0.22 -0.036 0.2 0.19 0.1 0.68 0.02 0.19 -0.1 1 0.38 0.042
mort_acc 0.22 -0.083 0.19 0.24 -0.025 0.11 0.012 0.19 0.0075 0.38 1 0.027 0.0
pub_rec_bankruptcies -0.11 0.057 -0.099 -0.05 -0.015 -0.028 0.7 -0.12 -0.087 0.042 0.027 1
loan_amnt
annual_inc
open_acc
int_rate
installment
dti
pub_rec
revol_bal
revol_util
total_acc
mort_acc
pub_rec_bankruptcies
2
0.20
0.15
0.10
0.05
0.00
Source Verified
Not Verified
Verified
verification_status
0.200
0.175
0.150
0.125
0.100
0.075
0.050
0.025
0.000
1 year
10+ years
2 years
3 years
4 years
5 years
6 years
7 years
8 years
9 years
< 1 year
emp_length
3
80000
70000
Charged Off 10403 11224
60000
50000
True label
40000
30000
Fully Paid 1732 87828
20000
10000
Charged Off Fully Paid
Predicted label
80000
70000
Charged Off 10644 10983
60000
50000
True label
40000
30000
Fully Paid 1865 87695
20000
10000
Charged Off Fully Paid
Predicted label
We added other layers using Dropout regolarization and compared the two models.
The regolarization gave us the best results so we procedeed always considering Dropout.
After this we tried to train another model called “diamond” and compare to the prece-
dents one.
All the models we trained and tested give us results very similar but at the end the best
one has been the “inverted pyramid” model with Dropout regolarization.
4
Also in this case we tried to optimize the hyperparamenters, doing Cross Validation,
with Random Search.
50000
Charged Off 6927 7598
40000
True label
30000
20000
Fully Paid 790 58810
10000
50000
Charged Off 6285 8240
40000
True label
30000
20000
Fully Paid 102 59498
10000
5
50000
Charged Off 6581 7944
40000
True label
30000
20000
Fully Paid 327 59273
10000
50000
Charged Off 6863 7662
40000
True label
30000
20000
Fully Paid 643 58957
10000
6
4 Conclusion
In this table we summarize the results of our model :
As we can see we have obtained great results in each metrics but f1 score.
On confusion matrix we have obtained a great prediction of the majority class ∼ 90 %
in every model but a worse one on minority ∼ 54 %. This not so optimal classification
is due to the beginning unbalanced of our dataset.
From our model, DNN predicts better than XGBoost.