EndTerm - MLBA - Group 7 Draft

The project analyzes a bank's dataset to identify potential customers for personal loan acceptance, focusing on data preprocessing and model development using Logistic Regression and Decision Tree. Key insights reveal significant class imbalance and the importance of features like Income and CD Account in predicting loan acceptance. The Decision Tree model outperforms Logistic Regression with higher accuracy and recall, making it more effective for the bank's objectives.


Project Report: Predicting Personal Loan Acceptance

Project Title: Identifying Potential Customers for Personal Loan Uptake

Introduction & Dataset Name: This project aims to analyze a dataset from a bank that offers
personal loans to its customers. The primary objective is to identify customers who are most
likely to accept a personal loan offer, thereby helping the bank's management increase the
uptake of these loans. The dataset used for this analysis is bankloan.csv.

1. Data Preprocessing & Key Insights

Outline of Data Preprocessing Steps:

The initial bankloan.csv dataset comprised 5000 entries and 14 columns. The preprocessing
steps were crucial for cleaning the data and preparing it for effective model training.

● Initial Data Inspection:

○ The dataset was loaded into a pandas DataFrame.
○ Initial checks using .head(), .info(), and .describe() revealed the
dataset structure, data types (all numerical), and basic statistical summaries.
○ Crucially, no missing values were found across any of the columns.
● Handling Negative 'Experience' Values:

○ A significant data anomaly was identified in the Experience column: 52 rows
contained negative values (e.g., -3), which is illogical for "experience."
○ These negative values were replaced with the median of all positive
Experience values in the dataset, which was 20.0. This correction ensures
that the Experience feature accurately reflects a customer's professional
background. The minimum Experience value became 0.0 after this step.
● Dropping Irrelevant Columns:

○ The ID column, being a unique identifier, has no predictive power for loan
acceptance; therefore, it was removed.
○ The ZIP Code column, while numerical, represents highly granular geographical
information. Treating it as a numerical feature is inappropriate, and one-hot
encoding it would create too many features (high cardinality), potentially leading
to computational inefficiency and overfitting. For simplicity and focus on direct
customer attributes, ZIP Code was also dropped.
● Handling 'Education' Column:
○ The Education column contained discrete numerical values (1, 2, 3), which
clearly represent ordered categories (e.g., Undergraduate, Graduate,
Advanced/Professional). Given its ordinal nature, this column was retained as is,
without one-hot encoding, as its numerical representation implicitly captures the
hierarchical relationship.
● Feature Scaling:

○ To ensure that features with larger numerical ranges (like Income or Mortgage)
do not disproportionately influence the Logistic Regression model, numerical
features (Age, Experience, Income, CCAvg, Mortgage) were scaled using
StandardScaler. This transforms the data to have a mean of 0 and a standard
deviation of 1. Decision Tree models are generally insensitive to feature scaling,
but it was applied for consistency and Logistic Regression's benefit.
● Data Splitting:

○ The processed dataset was divided into training (70%) and testing (30%) sets.
This separation is vital to evaluate the models' generalization ability on unseen
data.
○ stratify=y was applied during the split to maintain the exact proportion of loan
acceptors (the target variable Personal Loan) in both the training and testing
sets, addressing the inherent class imbalance.
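The steps above can be sketched in pandas and scikit-learn. The frame below is a hypothetical four-row stand-in for bankloan.csv (the real file has 5000 rows and 14 columns); the column names match those used in the report, but the values are invented:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical mini-frame standing in for bankloan.csv; values are invented.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Age": [25.0, 45.0, 39.0, 35.0],
    "Experience": [1.0, -3.0, 15.0, 9.0],   # includes an illogical negative value
    "Income": [49.0, 100.0, 11.0, 80.0],
    "ZIP Code": [91107, 90089, 94720, 94112],
    "Personal Loan": [0, 1, 0, 0],
})

# Replace negative Experience values with the median of the positive ones.
pos_median = df.loc[df["Experience"] > 0, "Experience"].median()
df.loc[df["Experience"] < 0, "Experience"] = pos_median

# Drop the identifier and the high-cardinality ZIP Code column.
df = df.drop(columns=["ID", "ZIP Code"])

# Scale the numerical features to zero mean and unit variance.
num_cols = ["Age", "Experience", "Income"]
df[num_cols] = StandardScaler().fit_transform(df[num_cols])
```

On the real dataset the computed median would be 20.0, as reported above; here it is just the median of the toy column.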

Key Insights from the Preprocessing and Exploratory Data Analysis (EDA):

The EDA revealed crucial characteristics of the dataset and provided insights into factors
influencing personal loan acceptance:

● Significant Class Imbalance: The target variable, Personal Loan, showed a notable
imbalance: only 9.6% of customers accepted a loan (Class 1), while 90.4% did not
(Class 0). This highlights the importance of evaluating models beyond simple accuracy,
focusing on metrics like recall and precision for the minority class.
● Impact of Key Numerical Features:

○ Income and CCAvg (Credit Card Average Spending): These were identified
as the most influential numerical features. Customers with significantly higher
incomes and higher average monthly credit card spending (CCAvg) were
found to be substantially more likely to accept personal loans. The distributions
for loan acceptors were clearly skewed towards higher values for these features.
○ Mortgage: While there was some correlation, the relationship between
Mortgage amount and loan acceptance was not as strong as with Income or
CCAvg.
○ Age and Experience: These features showed very little differentiation in their
distributions between loan acceptors and non-acceptors, indicating a minimal
direct impact on loan acceptance.

● Influence of Key Categorical/Binary Features:

○ CD Account: The presence of a CD Account was a remarkably strong indicator:
customers who already held a CD Account were significantly more likely to
accept a personal loan.
○ Education: Higher Education levels (specifically levels 2 and 3) showed a
clear positive association with personal loan acceptance, suggesting that higher
education might correlate with better financial understanding or different financial
needs.
○ Family: Customers with larger Family sizes (e.g., 3 or 4 members) also
exhibited a slightly higher propensity for loan acceptance.
○ Securities Account, Online, CreditCard: These features showed very little
discernible impact on loan acceptance rates.
● Correlation Analysis:

○ A correlation matrix confirmed the strong positive relationships between
Personal Loan and Income, CCAvg, and CD Account.
○ Education and Mortgage had moderate positive correlations.
○ Features like Age, Experience, Securities Account, Online, and
CreditCard showed very weak or negligible correlations with loan acceptance.
○ A high correlation was noted between Age and Experience (0.99), indicating
multicollinearity, but given their low individual correlation with the target, it was
not a primary concern for the model's predictive power.
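The 9.6% imbalance figure comes from a simple class-share computation. A minimal sketch, using a toy target column constructed to mirror the reported proportions rather than the actual data:

```python
import pandas as pd

# Toy target column mirroring the reported ~9.6% acceptance rate.
loans = pd.Series([1] * 96 + [0] * 904, name="Personal Loan")

# normalize=True returns class proportions instead of raw counts.
shares = loans.value_counts(normalize=True)
print(shares[0], shares[1])  # 0.904 0.096
```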

2. Predictive Model Development: Methodology, Model Performance, and Interpretation

To identify potential customers, two predictive models were developed: Logistic Regression and
Decision Tree. Both were trained and evaluated to assess their effectiveness.

Methodology:

1. Data Preparation: The preprocessed dataset was used, with Personal Loan as the
target variable (y) and all other relevant features (excluding 'ID' and 'ZIP Code') as
predictors (X).
2. Data Splitting: The data was split into training (70%) and testing (30%) sets using
train_test_split with random_state=42 for reproducibility and stratify=y to
preserve the class distribution of Personal Loan in both sets.
3. Feature Scaling (for Logistic Regression): Numerical features (Age, Experience,
Income, CCAvg, Mortgage) were scaled using StandardScaler on the training data,
and this scaler was then applied to the test data. This step standardizes the feature
values, which is beneficial for Logistic Regression's optimization algorithm.
4. Model Instantiation and Training:
○ Logistic Regression: An instance of LogisticRegression from
sklearn.linear_model was created with random_state=42 and
solver='liblinear' (suitable for smaller datasets and regularization). The
model was then fit on X_train and y_train.
○ Decision Tree: An instance of DecisionTreeClassifier from
sklearn.tree was created with random_state=42 and a max_depth=5.
The max_depth parameter was set to control the complexity of the tree, helping
to prevent overfitting and improve interpretability. This model was also fit on
X_train and y_train.
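A sketch of steps 2 to 4 under stated assumptions: the feature matrix here is synthetic noise standing in for the bank data, but the split ratio, random_state, stratification, and model hyperparameters follow the report:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the bank data: 500 rows, 5 features, imbalanced target.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 1.2).astype(int)

# Step 2: 70/30 stratified split for reproducibility and preserved class ratios.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Step 3: fit the scaler on the training data only, then apply it to the test data.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Step 4: the two models with the hyperparameters named above.
logreg = LogisticRegression(random_state=42, solver="liblinear").fit(X_train_s, y_train)
tree = DecisionTreeClassifier(random_state=42, max_depth=5).fit(X_train, y_train)
```

Fitting the scaler only on the training fold avoids leaking test-set statistics into training, which is why the scaler is fit once and then applied to both splits.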

Model Performance and Interpretation:

The performance of both models was evaluated using classification_report,
confusion_matrix, accuracy_score, and roc_auc_score. Visualizations like
confusion matrices and ROC curves were also generated.
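A minimal sketch of that evaluation loop; make_classification serves as a stand-in for the bank data, so the printed numbers will differ from the figures reported below:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score)

# Synthetic imbalanced data standing in for the bank dataset.
X, y = make_classification(n_samples=1000, weights=[0.9], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

clf = DecisionTreeClassifier(random_state=42, max_depth=5).fit(X_tr, y_tr)
pred = clf.predict(X_te)
proba = clf.predict_proba(X_te)[:, 1]   # probability scores for ROC AUC

acc = accuracy_score(y_te, pred)
cm = confusion_matrix(y_te, pred)       # rows are true classes, columns predictions
auc = roc_auc_score(y_te, proba)
print(classification_report(y_te, pred))
```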

Logistic Regression Model

● Overall Accuracy: 95.13%
● ROC AUC Score: 0.9646 (indicates excellent discriminatory power)

Classification Report:

              precision    recall  f1-score   support

           0       0.96      0.99      0.97      1356
           1       0.83      0.62      0.71       144

Confusion Matrix:

              Predicted 0   Predicted 1
Actual 0         1338            18
Actual 1           55            89

True Negatives (TN): 1338 (correctly predicted no loan)
False Positives (FP): 18 (predicted loan, but the customer did not take one)
False Negatives (FN): 55 (predicted no loan, but the customer did take one)
True Positives (TP): 89 (correctly predicted loan)

Interpretation: The Logistic Regression model performs strongly on non-loan
acceptors (Class 0), with high precision and recall. For loan acceptors (Class 1),
however, precision is good (83%) while recall is only moderate (62%): of the
customers it predicts will take a loan, 83% actually do, but it captures only 62%
of the customers who actually take one, missing 38% of the positive cases.

Feature               Coefficient   Absolute Coefficient
CD Account                  3.24            3.24
Income                      2.08            2.08
Education                   1.31            1.31
CreditCard                 -0.92            0.92
Securities Account         -0.82            0.82
Online                     -0.64            0.64
Family                      0.54            0.54
Age                         0.48            0.48
Experience                 -0.48            0.48
CCAvg                       0.20            0.20
Mortgage                    0.06            0.06

Interpretation of Coefficients:

● Strong Positive Impact: CD Account, Income, and Education have the largest
positive coefficients, indicating that the presence of a CD account, higher income, and
higher education levels significantly increase the likelihood of a customer accepting a
personal loan.
● Negative Impact: CreditCard, Securities Account, and Online have negative
coefficients, suggesting that customers having these features might be slightly less likely
to accept a personal loan compared to others.
● Moderate/Minor Impact: Family, Age, Experience, CCAvg, and Mortgage have
smaller coefficients, implying a less pronounced influence on loan acceptance in this
model.
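A ranking like the one above can be read off a fitted model's coef_ attribute. The snippet below is only an illustration: it fits a tiny synthetic dataset and borrows three of the feature names, so the numbers will not match the table:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Three illustrative feature names borrowed from the table; data is synthetic.
features = ["CD Account", "Income", "Education"]
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (2 * X[:, 0] + X[:, 1] > 0).astype(int)

model = LogisticRegression(solver="liblinear", random_state=42).fit(X, y)

# Rank features by the absolute value of their coefficients.
coefs = pd.Series(model.coef_[0], index=features)
ranking = coefs.abs().sort_values(ascending=False)
```

Sorting by absolute value is what the "Absolute Coefficient" column above does: it ranks influence regardless of direction, while the sign of the raw coefficient tells whether the feature pushes toward or away from acceptance.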

Decision Tree Model

● Overall Accuracy: 98.47%
● ROC AUC Score: 0.9951 (indicates superior discriminatory power)

Classification Report:

              precision    recall  f1-score   support

           0       0.99      0.99      0.99      1356
           1       0.90      0.95      0.92       144

Confusion Matrix:

              Predicted 0   Predicted 1
Actual 0         1340            16
Actual 1            7           137

True Negatives (TN): 1340
False Positives (FP): 16
False Negatives (FN): 7
True Positives (TP): 137

Interpretation:

The Decision Tree model significantly outperforms the Logistic Regression model, particularly in
identifying the minority class. Its recall for Class 1 (Personal Loan) is 95%, meaning it correctly
identifies 95% of actual loan acceptors, missing only 7 cases (False Negatives). This makes it a
much more effective tool for the bank's goal of increasing loan uptake.

Feature Importances of Decision Tree:

Feature               Importance
Income                   0.458
Education                0.326
Family                   0.148
CCAvg                    0.045
CD Account               0.011
Age                      0.006
Online                   0.004
Mortgage                 0.003
Experience               0.000
Securities Account       0.000
CreditCard               0.000

Interpretation of Feature Importances:


● Income is by far the most important feature, followed by Education and Family. These
features are used at the top levels of the decision tree to make crucial splits.
● CCAvg and CD Account also contribute, though their individual importance in this
particular tree structure (with max_depth=5) is lower compared to the top three.
● Features like Experience, Securities Account, and CreditCard have negligible
or zero importance in this model, meaning they were not used in the key decision splits.
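These values come directly from the fitted tree's feature_importances_ attribute, which sums to 1 and assigns exactly 0 to features never used in a split. A minimal sketch on synthetic data where, by construction, only the first feature carries signal (the feature names are borrowed for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: only the first feature determines the label.
features = ["Income", "Education", "Family"]
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = (X[:, 0] > 0.5).astype(int)

tree = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X, y)

# Importances sum to 1; features never used in a split get exactly 0.
importances = pd.Series(tree.feature_importances_, index=features)
importances = importances.sort_values(ascending=False)
```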

3. Model Comparison: Logistic Regression vs. Decision Tree

Metric                  Logistic Regression   Decision Tree

Accuracy                      0.9513             0.9847
ROC AUC Score                 0.9646             0.9951
Recall (Class 1)              0.62               0.95
Precision (Class 1)           0.83               0.90
False Negatives (FN)          55                 7

Conclusion on Model Comparison:


As the ROC Curve Comparison plot illustrates, the Decision Tree model (green curve)
demonstrates a superior performance with an Area Under the Curve (AUC) of 0.99, significantly
higher than the Logistic Regression model (blue curve) which has an AUC of 0.96. This
indicates that the Decision Tree model is much better at distinguishing between customers who
will accept a personal loan and those who will not, across various classification thresholds. The
closer the curve is to the top-left corner, the better the model's performance.
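Each curve in such a plot is just the (false positive rate, true positive rate) pairs returned by scikit-learn's roc_curve, swept over thresholds. A toy sketch with invented labels and scores for two hypothetical models (not the reported ones):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Invented labels and scores for two hypothetical models.
y_true = np.array([0, 0, 0, 0, 1, 1])
scores_a = np.array([0.1, 0.2, 0.3, 0.6, 0.7, 0.9])  # positives scored highest
scores_b = np.array([0.3, 0.2, 0.6, 0.4, 0.5, 0.9])  # weaker separation

fpr_a, tpr_a, _ = roc_curve(y_true, scores_a)
fpr_b, tpr_b, _ = roc_curve(y_true, scores_b)
auc_a, auc_b = auc(fpr_a, tpr_a), auc(fpr_b, tpr_b)

# Passing each (fpr, tpr) pair to matplotlib's plot() reproduces the comparison
# figure; the curve hugging the top-left corner has the larger AUC.
```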

The Decision Tree model clearly outperforms the Logistic Regression model for this task,
especially in identifying potential loan acceptors. Its significantly higher recall for the positive
class (0.95 vs. 0.62) means it is much better at identifying customers who will actually take a
loan, leading to fewer missed opportunities for the bank.

4. Assumptions & Limitations

Assumptions:

● Data Representativeness: It is assumed that the provided bankloan dataset is
representative of the bank's overall customer base and reflects real-world customer
behavior regarding personal loan acceptance.
● Feature Relevance: The available features are assumed to contain sufficient
information to predict loan acceptance.
● Data Integrity: It is assumed that, after preprocessing, the data is accurate and free
from major unhandled errors (e.g., that median imputation for 'Experience' is an
acceptable approach).
● Ordinality of Education: It is assumed that the numerical representation of Education
(1, 2, 3) correctly reflects an ordinal relationship and that treating it as such is
appropriate for the models.
● Model Linearity (Logistic Regression): Logistic Regression assumes a linear
relationship between the independent variables and the log-odds of the dependent
variable.
● Independence of Observations: Both models assume that observations (customer
records) are independent of each other.

Limitations:

● Class Imbalance: Despite using stratify during splitting, the significant class
imbalance (9.6% loan acceptors) can still pose a challenge. Models might struggle with
the minority class, and performance metrics (like accuracy) can be misleading if not
considered alongside precision and recall.
● Generalizability: The models are trained on a specific dataset. Their performance on
entirely new customer segments or data from a different time period might vary.
● Feature Engineering Scope: Due to the scope of this project, extensive feature
engineering (e.g., creating interaction terms, more sophisticated handling of ZIP Code
data for geographical segmentation) was not performed, which could potentially further
improve model performance.
● Interpretability vs. Performance Trade-off: While Logistic Regression offers clear
coefficient interpretations, Decision Trees can become complex and less interpretable if
max_depth is not limited. Conversely, simpler Decision Trees might not capture all
complex relationships.
● Decision Tree Instability: Decision Trees can be sensitive to small changes in the
training data (high variance). While random_state ensures reproducibility for a given
run, this inherent instability can be a limitation for deployment. Ensemble methods (like
Random Forest or Gradient Boosting) could address this but were beyond the immediate
scope.
● No Causal Inference: The models identify correlations and predictive patterns, but they
do not establish causal relationships (e.g., high income does not cause loan
acceptance; it is merely strongly associated with it).

5. Conclusion & Key Takeaways

The analysis of the bank loan dataset and the development of predictive models have provided
valuable insights for identifying customers likely to accept personal loan offers.

Key Findings and Model Comparison:

● The dataset exhibited a notable class imbalance, with only 9.6% of customers accepting
personal loans.
● Income, CD Account, and Education Level emerged as the strongest positive
predictors of personal loan acceptance across both models. Customers with higher
incomes, those who possess a CD Account, and those with higher education levels are
significantly more likely to accept a loan.
● The Decision Tree model proved to be superior for this task. It achieved a higher
overall accuracy (98.47%) and, more importantly, a substantially better recall for the
positive class (95% vs. 62% for Logistic Regression). This indicates the Decision Tree is
much more effective at identifying true loan acceptors and minimizing missed
opportunities for the bank.

Key Takeaways for Increasing Loan Uptake:

Based on the insights derived from the models, the bank should primarily focus its marketing
and outreach efforts on customer segments characterized by:

1. High Income: This is the most critical factor. Targeting affluent customers will yield the
highest success rate.
2. Higher Education Levels: Customers with graduate or advanced degrees show a
strong propensity to accept personal loans.
3. Existing CD Account Holders: Customers who already have a Certificate of Deposit
(CD) account with the bank are significantly more likely to accept personal loans. This
group represents a highly promising target due to their existing relationship and likely
financial stability.
4. Larger Families: Customers with more family members also show a higher likelihood of
accepting loans.
5. Higher Credit Card Average Spending (CCAvg): Customers with higher monthly
credit card expenditures are also good candidates.

By prioritizing these characteristics, the bank can optimize its strategies to identify and engage
with the most receptive customers, thereby increasing the uptake of personal loans efficiently.
Further improvements could involve exploring more advanced modeling techniques or delving
deeper into feature engineering.
