Credit Risk
Background of the study and
To predict the credit default rate
To accurately predict whether an individual will repay the credit or whether it will be written off
Analytics Objectives
To run a logistic regression to linearize the dependent category to the extent possible
To run a multilinear regression (MLR) to determine which independent variables play a significant role in determining the customers' ability to repay
Description of the Data (Variable Name and Variable Definition)

Variable Name                 Data Type    Data Description
Loan ID                       Nominal      Transaction number
Customer ID                   Nominal      Customer transaction number
Current Loan Amount           Ratio        Current outstanding debt
Term                          Interval     Tenure of credit
Credit Score                  Ranking      Customer's ability to repay, as scored by an external rating agency
Annual Income                 Ratio        Annual income of the customer
Years in current job          Interval     Customer's tenure in current job
Home Ownership                Categorical  Asset base of the customer
Purpose                       Categorical  Purpose of the loan
Monthly Debt                  Ratio        Time-weighted average of monthly credit consumed
Years of Credit History       Interval     Tenure of the loan, used to analyse the interest remaining
Months since last delinquent  Interval     Months since a contractual obligation to pay on time was last dishonoured
Number of Open Accounts       Ratio        Currently existing additional bank accounts linked to the card
Number of Credit Problems     Ratio        History of credit inefficiency, i.e. due dates violated
Current Credit Balance        Ratio        Amount owed to the credit card company and not yet paid
Maximum Open Credit           Ratio        Credit a financial institution can extend on an already existing line of credit
Bankruptcies                  Categorical  Default
Tax Liens                     Categorical  Tax defaults
Loan Status                   Binary       Status of repayment
Business Analytics Process
Some data points in Current Loan Amount had an extreme value of 9999999. These data points were deleted.
Category reduction
The Years in current job category was recoded to contain only ratio data.
NA values in Months since last delinquent were converted to 0.
The Missing Data Handling tab was used to delete all records with missing values.
The Home Ownership category was reduced to 3 categories, as "Have Mortgage" was merged with "Home Mortgage".
Oversampling of the charged-off cases was performed, as there were 22687 charged-off data points versus 66987 fully paid data points after removing the blanks.
Descriptive statistics show that most of the variables are right skewed; thus, using KNN, the relevant variables were normalized.
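The preparation steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the source appears to have used a spreadsheet-style tool rather than code, the order of the steps is assumed, the sample rows and the "Rent" and "Own Home" category values are made up, and balancing the two classes to equal size is an assumption (the source does not state the target ratio).

```python
import random

SENTINEL = 9999999  # extreme Current Loan Amount value reported in the text


def clean(records):
    """Apply the cleaning steps to a list of dict records (None = missing)."""
    cleaned = []
    for rec in records:
        rec = dict(rec)  # work on a copy so the input is not modified
        # Drop rows carrying the extreme loan amount.
        if rec["Current Loan Amount"] == SENTINEL:
            continue
        # Missing "Months since last delinquent" becomes 0.
        if rec["Months since last delinquent"] is None:
            rec["Months since last delinquent"] = 0
        # Merge "Have Mortgage" into "Home Mortgage".
        if rec["Home Ownership"] == "Have Mortgage":
            rec["Home Ownership"] = "Home Mortgage"
        # Drop rows that still have any missing value.
        if any(value is None for value in rec.values()):
            continue
        cleaned.append(rec)
    return cleaned


def oversample(records, label_key, minority, seed=0):
    """Randomly duplicate minority-class rows until both classes match in size."""
    rng = random.Random(seed)
    minority_rows = [r for r in records if r[label_key] == minority]
    majority_rows = [r for r in records if r[label_key] != minority]
    deficit = max(len(majority_rows) - len(minority_rows), 0)
    extra = [rng.choice(minority_rows) for _ in range(deficit)]
    return majority_rows + minority_rows + extra


# Made-up sample rows, for illustration only.
rows = [
    {"Current Loan Amount": 9999999, "Months since last delinquent": 5,
     "Home Ownership": "Rent", "Annual Income": 50000, "Loan Status": "Fully Paid"},
    {"Current Loan Amount": 12000, "Months since last delinquent": None,
     "Home Ownership": "Have Mortgage", "Annual Income": 40000, "Loan Status": "Charged Off"},
    {"Current Loan Amount": 8000, "Months since last delinquent": 3,
     "Home Ownership": "Rent", "Annual Income": None, "Loan Status": "Fully Paid"},
    {"Current Loan Amount": 15000, "Months since last delinquent": 10,
     "Home Ownership": "Own Home", "Annual Income": 60000, "Loan Status": "Fully Paid"},
    {"Current Loan Amount": 9000, "Months since last delinquent": 2,
     "Home Ownership": "Rent", "Annual Income": 30000, "Loan Status": "Fully Paid"},
]

cleaned = clean(rows)
balanced = oversample(cleaned, "Loan Status", "Charged Off")
```

Oversampling by random duplication keeps every original row and only repeats minority rows, which is one simple way to address the 22687 versus 66987 imbalance mentioned above.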
Data mining Task Applied
Technique used
Logistic Regression
Multilinear Regression (MLR)
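To illustrate the logistic regression technique listed above, here is a minimal sketch that fits the model by batch gradient descent. The source does not say which tool or settings were used, so everything here is an assumption: the single toy feature (imagined as a normalized credit score), the toy labels, the learning rate and the number of epochs.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and intercept b by gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one record: p(x) - y.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, threshold=0.5):
    """Return 0/1 class predictions at the given probability threshold."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold else 0
        for xi in X
    ]


# Toy data (hypothetical): one normalized feature, binary outcome.
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(X, y)
preds = predict(X, w, b)
```

The sign and size of each fitted weight indicate how the corresponding variable moves the predicted probability, which is the kind of reading the analytics objectives call for.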
Figure: ROC curve, AUC = 0.589222 (x-axis: 1 - Specificity; series: LogReg Classifier, Random Classifier)
Figure: Lift chart, training dataset (x-axis: # Cases; series: Cumulative Loan Status_fix when sorted using predicted values, Cumulative Loan Status_fix using average)
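For readers unfamiliar with the AUC figure reported for the ROC curve, the sketch below shows one standard way to compute it: as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The scores and labels are made up for illustration and are not the source's data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities and true classes.
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]

example_auc = auc(scores, labels)        # 7 of 9 pairs ordered correctly
random_auc = auc([0.5] * 6, labels)      # uninformative scores
```

An uninformative classifier scores 0.5 on this measure, which is the "Random Classifier" reference line in the ROC figure.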
Confusion Matrix

                 Predicted Class
Actual Class     1        0
1                3842     1155
0                342      4658

Error Report

Class      # Cases    # Errors    % Error
1          4997       1155        23.11386832
0          5000       342         6.84
Overall    9997       1497        14.97449235
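The error report follows directly from the confusion matrix: for each actual class, the errors are the cases predicted as the other class, and the percentage error is errors divided by cases. A short sketch of that arithmetic, using the counts reported above (the function and variable names are illustrative):

```python
# Confusion matrix from the report: actual class -> predicted class -> count.
confusion = {
    1: {1: 3842, 0: 1155},
    0: {1: 342, 0: 4658},
}


def error_report(cm):
    """Return {class: (cases, errors, pct_error)} plus an "Overall" row."""
    report = {}
    total_cases = 0
    total_errors = 0
    for actual, row in cm.items():
        cases = sum(row.values())
        errors = cases - row[actual]  # everything off the diagonal
        report[actual] = (cases, errors, 100.0 * errors / cases)
        total_cases += cases
        total_errors += errors
    report["Overall"] = (total_cases, total_errors,
                         100.0 * total_errors / total_cases)
    return report


report = error_report(confusion)
for name, (cases, errors, pct) in report.items():
    print(f"{name}: {cases} cases, {errors} errors, {pct:.8f}% error")
```

The same counts give an overall accuracy of 100 minus the overall percentage error, about 85.03%.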