
Capstone Project

PGP-BABI

Title: Taiwan-Customer defaults

Project Notes – I
This first submission of project notes describes the case and sets the agenda for the project. It
analyses the business opportunity and the scope of implementation, and covers exploratory analysis
of the data through univariate and bivariate analysis.
This analysis will form the basis for further model building and for completing the main agenda of
the project.

Mayank Bajpai
PGP-BABI Jan’19
Case Description

The data corresponds to credit data of customers based in Taiwan. It contains customer
demographics and data on the customers' credit history, and it records whether a customer
defaulted on payment in the next month. The agenda of this project is to predict accurately the
occurrence of a payment default and to estimate each customer's probability of default, so that
the bank/credit card company can take the necessary risk mitigation measures.

The key focus of the project is not only to predict the occurrence of a default with accuracy but
also to make business sense of a customer's risk profile, with percentile-wise separation of
high-risk and low-risk clients.
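
The percentile-wise separation described above could be sketched as follows. This is only an illustration in Python (the report's own analysis was done in R); the function name `assign_deciles` and the probabilities are hypothetical, and the sketch assumes a model has already produced a default probability for each customer.

```python
def assign_deciles(probabilities):
    """Return a decile for each probability (1 = highest risk, 10 = lowest risk)."""
    n = len(probabilities)
    # Rank customers from the highest to the lowest predicted probability.
    order = sorted(range(n), key=lambda i: probabilities[i], reverse=True)
    deciles = [0] * n
    for rank, idx in enumerate(order):
        # Integer arithmetic splits the ranking into ten equal-sized bands.
        deciles[idx] = rank * 10 // n + 1
    return deciles

# Hypothetical predicted default probabilities, invented for illustration.
probs = [0.05, 0.62, 0.18, 0.91, 0.33, 0.47, 0.09, 0.75, 0.26, 0.12]
deciles = assign_deciles(probs)
print(list(zip(probs, deciles)))
```

Customers in the first few deciles would then be the high-risk group; where to draw that line is a business decision rather than something the code decides.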

A key objective in this case is to identify defaulters correctly rather than to maximise overall
accuracy. This will ensure that as many defaults as possible are classified as risky customers.
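
A small sketch may help show why overall accuracy and the correct identification of defaulters can diverge. It is written in Python for illustration only (the report's analysis was done in R), and the labels below are invented toy values, not taken from the Taiwan data. "Recall" here means the share of actual defaulters that are flagged as defaulters.

```python
# Toy labels, invented for illustration only (1 = default, 0 = no default).
actual    = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

def accuracy(actual, predicted):
    """Share of all customers whose label was predicted correctly."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def recall(actual, predicted):
    """Share of true defaulters that were flagged as defaulters."""
    true_pos = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    actual_pos = sum(a == 1 for a in actual)
    return true_pos / actual_pos if actual_pos else 0.0

acc = accuracy(actual, predicted)
rec = recall(actual, predicted)
print(f"accuracy = {acc:.2f}, recall on defaulters = {rec:.2f}")
```

In this toy case the predictions are right for 8 of 10 customers, yet only 1 of the 3 defaulters is caught, which is the situation the project aims to avoid.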

Business opportunity/implementation of the case


The BFSI industry generates revenue through credit-based products and services, which are one of
its major revenue-generating areas. This requires the bank/company to lend a certain amount to the
customer and to recover it over time with additional interest, as per the policy.

This puts the bank in a position where recovering the money is crucial to the overall running of
the system, while the interest earned drives operations and profits. If the customer does not pay
the monthly EMIs on time, the person is classified as a defaulter, and if the amount is not
recovered, the loan is characterised as a bad loan. Every year banks/credit systems incur losses of
millions because of the uncertainty in telling a good client from a bad one. EMIs that are not
received on time each month also put pressure on the smooth running of operations.

Thus, if a creditor can predict a bad loan in advance, or can predict the risk that next month's
EMI will not be paid on time, it can take measures that reduce the risk of loan default.

This project serves that purpose: predicting defaults on next month's EMI and estimating the
likelihood of a default on the basis of customer demographics and credit repayment history. This
will help the credit system take decisions, as per the business requirement, to mitigate risk and
increase profitability. It will also allow a cap to be placed on the likelihood of default, letting
the creditor take on risk as per the business requirements.
Data description

Default Payment: (Yes = 1, No = 0)

X1: Amount of the given credit (NT dollar): it includes both the individual consumer credit and
his/her family (supplementary) credit.

X2: Gender (1 = male; 2 = female).

X3: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).

X4: Marital status (1 = married; 2 = single; 3 = others).

X5: Age (year).

X6 - X11: History of past payment. We tracked the past monthly payment records (from April to
September, 2005) as follows: X6 = the repayment status in September, 2005; X7 = the repayment
status in August, 2005; . . .; X11 = the repayment status in April, 2005. The measurement scale for
the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two
months; . . .; 8 = payment delay for eight months; 9 = payment delay for nine months and above.

X12 - X17: Amount of bill statement (NT dollar). X12 = amount of bill statement in September, 2005;
X13 = amount of bill statement in August, 2005; . . .; X17 = amount of bill statement in April, 2005.

X18 - X23: Amount of previous payment (NT dollar). X18 = amount paid in September, 2005; X19 =
amount paid in August, 2005; . . .; X23 = amount paid in April, 2005.


Exploratory Analysis
The data is imported into R; its basic dimensions are given below.

The data file contains a total of 30000 entries spread across 25 fields.
Mentioned below are the fields involved:
The field summary shows that some variables should be treated as factors rather than as numeric
values.
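
As an illustration of treating coded fields as categories rather than numbers, the sketch below decodes X2, X3 and X4 using the labels from the data description. It is written in Python for illustration only (the report's analysis was done in R, where `factor` would be the usual tool); the `decode` helper and the sample records are hypothetical.

```python
# Code-to-label mappings taken from the data description (X2, X3, X4).
GENDER = {1: "male", 2: "female"}
EDUCATION = {1: "graduate school", 2: "university", 3: "high school", 4: "others"}
MARRIAGE = {1: "married", 2: "single", 3: "others"}

def decode(code, mapping):
    """Return the category label; codes without a label are kept as text."""
    return mapping.get(code, str(code))

# Hypothetical records; the values are invented for illustration.
records = [
    {"X2": 2, "X3": 2, "X4": 1},
    {"X2": 1, "X3": 5, "X4": 0},
]

decoded = [
    {
        "X2": decode(r["X2"], GENDER),
        "X3": decode(r["X3"], EDUCATION),
        "X4": decode(r["X4"], MARRIAGE),
    }
    for r in records
]
print(decoded)
```

Once decoded, a value such as 2 in X3 is read as the category "university" and not as a quantity twice the size of 1.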

Univariate Analysis
Sex
Value    Count
M        11888
F        18112

Education
Code     Count
0           14
1        10585
2        14030
3         4917
4          123
5          280
6           51

Marital Status
Code     Count
0           54
1        13659
2        15964
3          323
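
The counts above can be checked against the 30000 entries reported earlier and compared with the codes listed in the data description. The following is a minimal sketch in Python, for illustration only (the report's analysis was done in R); the helper names are hypothetical, and the counts are copied from the tables above.

```python
# Frequency counts copied from the univariate tables above.
sex = {"M": 11888, "F": 18112}
education = {0: 14, 1: 10585, 2: 14030, 3: 4917, 4: 123, 5: 280, 6: 51}
marital = {0: 54, 1: 13659, 2: 15964, 3: 323}

# Codes listed in the data description for X3 (education) and X4 (marital status).
documented = {"education": {1, 2, 3, 4}, "marital": {1, 2, 3}}

def shares(counts):
    """Percentage share of each level, rounded to two decimals."""
    total = sum(counts.values())
    return {k: round(100 * v / total, 2) for k, v in counts.items()}

def undocumented_total(counts, known):
    """Number of records whose code is not listed in the data description."""
    return sum(v for k, v in counts.items() if k not in known)

totals = {name: sum(c.values()) for name, c in
          [("sex", sex), ("education", education), ("marital", marital)]}
sex_shares = shares(sex)
edu_undocumented = undocumented_total(education, documented["education"])
mar_undocumented = undocumented_total(marital, documented["marital"])

print(totals)
print(sex_shares)
print(edu_undocumented, mar_undocumented)
```

Each table sums to 30000. Education codes 0, 5 and 6 and marital status code 0 do not appear in the data description, so those records may need to be regrouped or otherwise handled before modelling.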
