pigiback

the “probability I get it back” calculator

Jenny Lin | Insight Winter 2017
pigiback

Problem: You want to invest in peer-to-peer loans, but
what is the RISK level?

pigiback uses machine learning to calculate the
probability of default on personal loans.
TRAINING SET

Mechanism: machine learning classifier to predict loan default

◉ Historical data from Lending Club (1.2M loans)
◉ All public peer-to-peer loans from 2008-2016
◉ Issue: Imbalanced data (about 9 non-defaulted loans for every default)
◉ Individual financial history

www.pigiback.info
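
The 9:1 imbalance noted above is typically handled by random undersampling of the majority class, which is the approach the pipeline slide names. A minimal sketch follows, using only the Python standard library. The function name, the `default` label key, and the toy records are illustrative assumptions, not the project's actual code or data.

```python
import random

def undersample_majority(rows, label_key="default", seed=0):
    """Randomly drop majority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    # Whichever class is smaller is kept whole; the larger one is sampled down.
    minority, majority = sorted((positives, negatives), key=len)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced

# Toy records (hypothetical) with a 9:1 ratio: 900 repaid loans, 100 defaults.
loans = [{"loan_id": i, "default": int(i < 100)} for i in range(1000)]
balanced = undersample_majority(loans)
print(len(balanced), sum(r["default"] for r in balanced))
```

Undersampling discards data, so it is usually applied to the training portion only, leaving the test portion at its natural class ratio.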
PIPELINE

Preprocessing
◉ 111 features
◉ Address multicollinearity
◉ Remove posterior information

Default Prediction
◉ Balance the data: train with the ‘No Default’ class undersampled
◉ Random Forest classifier

Feature Analysis & Cross Validation
◉ Feature importance via decision trees
◉ Stratified 10-fold cross validation (80% to train, 20% test)
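
To make the "stratified 10-folds" step concrete, here is a hedged, standard-library sketch of how stratified folds can be built: each class is shuffled and dealt out round-robin so every fold keeps roughly the original class ratio. The function name and toy labels are assumptions for illustration. In practice a library routine such as scikit-learn's `StratifiedKFold` would normally be used, and the 80/20 hold-out split mentioned on the slide is not shown here.

```python
import random
from collections import defaultdict

def stratified_k_folds(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs whose class mix mirrors the full data."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx

# Toy labels (hypothetical): 100 defaults (1) and 900 non-defaults (0).
labels = [1] * 100 + [0] * 900
splits = list(stratified_k_folds(labels, k=10))
```

Each of the ten test folds here holds 100 loans, 10 of them defaults, so every fold is scored on the same class ratio as the full set.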
Performance across 4 different classifiers

Cross-validation: stratified 10-fold

Random Forest wins!
RELATIVE FEATURE IMPORTANCE

Highlighted features:
◉ Past Defaults
◉ Purpose
◉ Interest Rate

21 features: 99.99% variance explained
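
The slide does not say how the 21 features were chosen or exactly what "99.99% variance explained" measures. One plausible reading, offered here only as an assumption, is that features were ranked by importance and kept until their cumulative share crossed a threshold. The sketch below shows that idea with made-up scores. The function name, feature names, and numbers are all hypothetical.

```python
def top_features_by_cumulative_importance(importances, threshold=0.9999):
    """Keep the highest-ranked features until their share of total importance
    reaches the threshold."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    selected, running = [], 0.0
    for name, score in ranked:
        selected.append(name)
        running += score
        if running / total >= threshold:
            break
    return selected

# Hypothetical importance scores, not values from the project.
scores = {
    "past_defaults": 0.50,
    "purpose": 0.30,
    "interest_rate": 0.15,
    "zip_code": 0.05,
}
selected = top_features_by_cumulative_importance(scores, threshold=0.9)
print(selected)
```

With a random forest, the input scores could come from the fitted model's per-feature importances.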
CROSS VALIDATION
◉ Accuracy = 94.7%
◉ OOB Error = 4.5%
◉ AUC = 95.5%

Confusion matrix (rows: predicted, columns: actual)

                         Actual: No Default    Actual: Default
Predicted: No Default    True (-)  ~1.1M       False (-)  ~1.2K
Predicted: Default       False (+) ~30K        True (+)   ~92K

No Default: 95%, Default: 97% (optimized for high recall)
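
For readers who want to relate the confusion-matrix cells to the summary metrics, the sketch below applies the standard definitions of accuracy, recall, specificity, and precision. The function name is an assumption. The counts passed in are the rounded "~" figures from the slide, so the results only approximate the reported numbers and will not reproduce them exactly.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),        # share of actual defaults caught
        "specificity": tn / (tn + fp),   # share of actual non-defaults cleared
        "precision": tp / (tp + fp),     # share of predicted defaults that are real
    }

# Rounded counts read off the slide: True (-), False (+), False (-), True (+).
metrics = confusion_metrics(tn=1_100_000, fp=30_000, fn=1_200, tp=92_000)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Optimizing for high recall means accepting more false positives (good loans flagged as risky) in exchange for missing fewer actual defaults.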
Pigiback helps individuals minimize risk and maximize return
in the personal loan market