Data Science Methods in Finance
R Tutorial 5
November 29, 2024
Important Instructions
• The purpose of this tutorial is for you to practise some of the key concepts of tree-based
methods.
• It should not be submitted, but we strongly encourage you to work through it before
the discussion on Friday.
Dataset and Setup
This assignment uses loan data from https://bondora.com/en/ and refers to the data file
“LoanData2021.RDS”. The ultimate goal of the assignment is to identify a good model for
predicting default. The dependent variable indicates whether the borrower has defaulted or
not. You can construct this variable with “default = ifelse(!is.na(DefaultDate), 1, 0)" and
“defaultFactor = factor(default, levels = c("1", "0"))". Please use the following variables for
the default predictions:
1. the age of the borrower (Age)
2. the borrower’s total income (IncomeTotal)
3. the gender of the borrower (0 male; 1 female) (Gender)
4. the total liabilities of the borrower (LiabilitiesTotal)
5. the borrower’s number of existing liabilities (ExistingLiabilities)
6. interest rate of the loan (Interest)
7. principal that still needs to be paid by the borrower (PrincipalBalance)
8. current loan duration in months (LoanDuration)
9. education of the borrower (1 Primary education; 2 Basic education; 3 Vocational edu-
cation; 4 Secondary education; 5 Higher education) (Education)
10. whether it’s a new credit customer or not (NewCreditCustomer)
11. amount the borrower received (Amount)
12. value of previous loans (AmountOfPreviousLoansBeforeLoan)
13. income after monthly liabilities (FreeCash)
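The setup described above could be sketched as follows. This is a minimal sketch assuming the file “LoanData2021.RDS” sits in your working directory and that the variable names match the list above:

```r
# Load the Bondora loan data (assumes "LoanData2021.RDS" is in the
# working directory).
loans <- readRDS("LoanData2021.RDS")

# Dependent variable: 1 if a DefaultDate is recorded, 0 otherwise.
loans$default <- ifelse(!is.na(loans$DefaultDate), 1, 0)
loans$defaultFactor <- factor(loans$default, levels = c("1", "0"))

# Keep only the outcome and the thirteen predictors listed above.
vars <- c("default", "defaultFactor", "Age", "IncomeTotal", "Gender",
          "LiabilitiesTotal", "ExistingLiabilities", "Interest",
          "PrincipalBalance", "LoanDuration", "Education",
          "NewCreditCustomer", "Amount",
          "AmountOfPreviousLoansBeforeLoan", "FreeCash")
loans <- loans[, vars]
```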
Question 1
1. Construct a table with descriptive statistics of the variables you use. Make sure to deal
with missing values (if there are any).
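A minimal sketch for this step, assuming the data frame “loans” from the setup above:

```r
# Descriptive statistics and a missing-value check.
summary(loans)
colSums(is.na(loans))    # number of missing values per variable

# One simple way to deal with missing values: drop incomplete rows.
loans <- na.omit(loans)
```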
2. Fit a logistic regression using only age as a predictor and interpret the coefficient on age.
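A sketch using base-R glm, assuming the data frame “loans” with the numeric 0/1 variable “default”:

```r
# Logistic regression of default on age alone.
logitAge <- glm(default ~ Age, data = loans, family = binomial)
summary(logitAge)
# The coefficient on Age is the change in the log-odds of default
# associated with a one-year increase in the borrower's age.
```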
3. Split the dataset into a training sample and a test set. Use 70% for training and 30%
for testing. Fit a logistic regression on the training set and report its accuracy based on
predictions on the test set. Recall that the model will output a probability of default,
which you need to turn into a default prediction (1 or 0), which can then be compared
with the actual defaults.
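The split-and-predict steps could be sketched as follows. The 70/30 proportions and the 0.5 cutoff follow the question; the seed value is an arbitrary choice for reproducibility:

```r
# 70/30 train/test split.
set.seed(1)
n <- nrow(loans)
trainIdx <- sample(seq_len(n), size = round(0.7 * n))
train <- loans[trainIdx, ]
test  <- loans[-trainIdx, ]

# Fit on the training set, predict probabilities on the test set.
logitFit <- glm(default ~ Age, data = train, family = binomial)
probs <- predict(logitFit, newdata = test, type = "response")

# Turn probabilities into 0/1 default predictions and compute accuracy.
pred <- ifelse(probs > 0.5, 1, 0)
mean(pred == test$default)    # test-set accuracy
```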
4. Add the interest variable and compare the estimated accuracy of the model with interest
and age with that of the model with only age. Interpret the coefficient on the interest
variable.
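The same exercise with Interest added, assuming the train/test objects from the previous step:

```r
# Logistic regression with both age and the interest rate.
logitFit2 <- glm(default ~ Age + Interest, data = train, family = binomial)
summary(logitFit2)

pred2 <- ifelse(predict(logitFit2, newdata = test,
                        type = "response") > 0.5, 1, 0)
mean(pred2 == test$default)   # compare with the age-only accuracy
```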
5. Fit a (large) classification tree on the training data using all the features in the
dataset. Plot the tree and report the number of terminal nodes as well as the four most
relevant variables used for splitting. Report the training and test error rates for the
tree. Interpret and compare to the errors from the logistic model.
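A sketch using the tree package (rpart would work as well), assuming the train/test objects from above:

```r
library(tree)

# Classification tree on all predictors (exclude the numeric copy of
# the outcome from the right-hand side).
treeFit <- tree(defaultFactor ~ . - default, data = train)

plot(treeFit)
text(treeFit, pretty = 0)
summary(treeFit)   # terminal nodes, split variables, training error rate

# Test error rate.
treePred <- predict(treeFit, newdata = test, type = "class")
mean(treePred != test$defaultFactor)
```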
6. Prune the tree and use 5-fold cross-validation to determine the optimal pruning parameter.
Use the error rate as the pruning criterion and compare to the error rates of the
tree from the previous question.
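A pruning sketch with cv.tree from the tree package, with K = 5 folds and the misclassification rate as the criterion, as the question asks:

```r
set.seed(1)
cvTree <- cv.tree(treeFit, FUN = prune.misclass, K = 5)
plot(cvTree$size, cvTree$dev, type = "b")   # CV error against tree size

# Prune to the size with the lowest cross-validated error.
bestSize <- cvTree$size[which.min(cvTree$dev)]
prunedFit <- prune.misclass(treeFit, best = bestSize)

prunedPred <- predict(prunedFit, newdata = test, type = "class")
mean(prunedPred != test$defaultFactor)      # test error of the pruned tree
```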
7. Use a random forest to improve predictions. For this, set the seed to 1 and fit 300 trees
using m = √p (rounded to the nearest integer), where m is the number of features considered
at each split and p is the total number of features. Provide a plot of the OOB estimate of
the test error against the number of trees grown. Report the error rate for the training
data, the OOB estimate of the test error rate, as well as the error rate for the test data.
Interpret the outcomes and compare to the accuracy of the other methods.
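A sketch with the randomForest package; with p = 13 predictors, m = round(√13) = 4:

```r
library(randomForest)

set.seed(1)
p <- 13
rfFit <- randomForest(defaultFactor ~ . - default, data = train,
                      ntree = 300, mtry = round(sqrt(p)))

plot(rfFit)   # OOB error estimate against the number of trees grown
rfFit         # printing the object reports the OOB error rate

# Training and test error rates.
mean(predict(rfFit, newdata = train) != train$defaultFactor)
mean(predict(rfFit, newdata = test)  != test$defaultFactor)
```

Note that passing the training data through newdata uses all trees for each observation, so the training error will be near zero; the OOB estimate is the more honest measure of out-of-sample performance.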
8. Provide a plot of the importance of the variables in the random forest and interpret.
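For the forest fitted above, the randomForest package provides this directly:

```r
varImpPlot(rfFit)    # importance plot for all predictors
importance(rfFit)    # mean decrease in Gini for each predictor
```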
9. Tune the random forest to select the optimal hyperparameter m. For this, you can
either set up a grid of different values of m yourself or use the caret package. If you use the
caret package, start by defining the trainControl object, define your validation method,
and add “preProc = c("center", "scale")" to the train object. Compare the in-sample
and out-of-sample accuracy of the tuned random forest with that of the other methods.
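A tuning sketch with the caret package; the particular grid of mtry values and the 5-fold CV choice are assumptions, not prescribed by the question:

```r
library(caret)

# Validation method: 5-fold cross-validation.
ctrl <- trainControl(method = "cv", number = 5)

# Candidate values for m (caret's "mtry" tuning parameter).
grid <- expand.grid(mtry = c(2, 4, 6, 8))

set.seed(1)
rfTuned <- train(defaultFactor ~ . - default, data = train,
                 method = "rf", ntree = 300,
                 trControl = ctrl, tuneGrid = grid,
                 preProc = c("center", "scale"))
rfTuned$bestTune   # the selected m

# In-sample and out-of-sample accuracy.
mean(predict(rfTuned, newdata = train) == train$defaultFactor)
mean(predict(rfTuned, newdata = test)  == test$defaultFactor)
```

Centering and scaling do not affect tree-based splits, so the preProc step is included here only because the question asks for it.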