
CREDIT – EDA CASE STUDY

Mr. Murali Krishna Manala


Ms. Prachi Patil
PURPOSE

• To perform exploratory data analysis on customer loan applications, which may help the bank assess the risk associated with customer default behavior.
• To provide inferences and decisions based on the data analysis, enabling the company to channel its business towards making and scaling up profits.

APPROACH

1) Importing and cleaning the data provided.
2) Formatting or grouping the data for an effective analysis.
3) Performing univariate and bivariate analysis on categorical and numerical fields.
4) Drawing useful insights.
UNIVARIATE DISCRETE ANALYSIS FOR AGE GROUPS

• The number of applicants increases with age up to age 40, after which the number of applications declines.
• The second plot shows that the default rate decreases as applicant age increases.
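The age-group analysis above amounts to binning each applicant's age and computing the default rate per bin. A minimal sketch in plain Python, assuming ages are already in years and defaults are 0/1 flags; the bin edges (`AGE_BINS`) and function names are hypothetical, since the deck does not state which groups were used:

```python
from collections import defaultdict

# Hypothetical half-open age bands [low, high); the deck does not list its exact edges.
AGE_BINS = [(20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]


def age_band(age):
    """Return the label of the band containing age, or 'other' if none does."""
    for low, high in AGE_BINS:
        if low <= age < high:
            return f"{low}-{high}"
    return "other"


def default_rate_by_age(records):
    """records: iterable of (age_in_years, defaulted) pairs with defaulted in {0, 1}.

    Returns {band: (applicants, default_rate)}.
    """
    counts = defaultdict(lambda: [0, 0])  # band -> [applicants, defaults]
    for age, defaulted in records:
        band = age_band(age)
        counts[band][0] += 1
        counts[band][1] += defaulted
    return {band: (n, d / n) for band, (n, d) in counts.items()}
```

With toy values `[(25, 1), (27, 0), (35, 0)]` the result is `{'20-30': (2, 0.5), '30-40': (1, 0.0)}`.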
UNIVARIATE DISCRETE ANALYSIS FOR FAMILY STATUS

• The chart shows that most applicants are Married, followed by Single and Civil Marriage. Of these, Single and Civil Marriage applicants tend to default more, while applicants in the Unknown category never defaulted.
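The family-status chart combines two numbers per category: its share of all applicants and its default rate. A sketch of computing both with `collections.Counter`, assuming aligned lists of labels and 0/1 default flags (the function name is illustrative). A category with no defaulters, such as Unknown, simply gets a rate of 0:

```python
from collections import Counter


def share_and_default_rate(categories, defaults):
    """Return {category: (share_of_applicants_pct, default_rate_pct)}.

    categories: one label per applicant (e.g. family status).
    defaults:   0/1 default flags aligned with categories.
    Categories are ordered from most to least common.
    """
    total = len(categories)
    applicants = Counter(categories)
    defaulters = Counter(c for c, d in zip(categories, defaults) if d == 1)
    # A Counter returns 0 for unseen keys, so categories with no defaulters get 0.0.
    return {
        c: (100 * n / total, 100 * defaulters[c] / n)
        for c, n in applicants.most_common()
    }
```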
UNIVARIATE DISCRETE ANALYSIS FOR OCCUPATION

• For most applicants, the occupation is missing. Among recorded occupations, the most common are Laborers and Sales Staff.
• However, the highest default percentage comes from the Low-skill Laborers group.
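Because the occupation field is mostly empty, missing values need an explicit label before counting; otherwise some counting routines ignore them (pandas `value_counts`, for example, drops NaN by default). A minimal sketch, assuming missing entries arrive as `None`, `NaN` or blank strings; the `fill_missing` helper and the toy list are hypothetical:

```python
from collections import Counter


def fill_missing(values, label="Missing"):
    """Replace None, NaN and blank strings with an explicit category label."""
    cleaned = []
    for v in values:
        is_nan = isinstance(v, float) and v != v  # NaN is the only float not equal to itself
        is_blank = isinstance(v, str) and not v.strip()
        cleaned.append(label if v is None or is_nan or is_blank else v)
    return cleaned


# Toy values for illustration only.
occupations = ["Laborers", None, "Sales staff", float("nan"), "", "Laborers"]
counts = Counter(fill_missing(occupations))
# Most common known occupations, ignoring the "Missing" bucket.
top_known = [occ for occ, _ in counts.most_common() if occ != "Missing"][:2]
```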
UNIVARIATE DISCRETE ANALYSIS FOR INCOME TYPE

• Most applicants are in the Working and Commercial Associate categories, and the fewest are Businessmen and Students.
• However, the default percentage is highest in the Maternity Leave and Unemployed groups.
UNIVARIATE CATEGORICAL ANALYSIS FOR ORGANIZATION TYPE

• Most applicants are from Business Entity Type 3, Missing and Self-employed.
• However, the highest default percentages are in the Transport Type 3, Industry Type 13 and Industry Type 8 groups.
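Ranking organization types by default percentage, as done above, can be sketched as follows, assuming (category, 0/1 default) pairs; `top_default_categories` and the toy labels are illustrative only:

```python
from collections import defaultdict


def top_default_categories(pairs, k=3):
    """pairs: iterable of (category, defaulted) with defaulted in {0, 1}.

    Returns the k categories with the highest default percentage as
    (category, default_pct, applicants) tuples, highest first.
    """
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for category, defaulted in pairs:
        totals[category] += 1
        defaults[category] += defaulted
    ranked = sorted(
        ((c, 100 * defaults[c] / totals[c], totals[c]) for c in totals),
        key=lambda row: row[1],
        reverse=True,
    )
    return ranked[:k]
```

Returning the applicant count next to each rate is a deliberate choice: a category with only a handful of applicants can reach the top of the ranking on very few defaults.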
ORDERED/CONT. NUMERICAL VARIABLE ANALYSIS ON WORK EXPERIENCE

• Most applicants have more than 20 years of experience, and the default rate decreases as experience increases.
• Applicants with less than 1 year of experience have a 50% chance of default.
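The experience bands, including the separate "<1 year" band, and the claim that the default rate falls as experience rises can both be checked with a small helper. This sketch assumes experience is already expressed in years; the band edges and function names are hypothetical:

```python
# Hypothetical ordered experience bands (years); the deck's exact edges are not stated.
BAND_ORDER = ["<1", "1-5", "5-10", "10-20", "20+"]


def experience_band(years):
    """Map years of experience to one of the labels in BAND_ORDER."""
    if years < 1:
        return "<1"
    if years < 5:
        return "1-5"
    if years < 10:
        return "5-10"
    if years < 20:
        return "10-20"
    return "20+"


def default_rate_by_band(records):
    """records: iterable of (years_of_experience, defaulted). Returns {band: rate} in band order."""
    totals = {b: 0 for b in BAND_ORDER}
    defaults = {b: 0 for b in BAND_ORDER}
    for years, defaulted in records:
        band = experience_band(years)
        totals[band] += 1
        defaults[band] += defaulted
    # Empty bands are skipped to avoid division by zero.
    return {b: defaults[b] / totals[b] for b in BAND_ORDER if totals[b]}


def is_non_increasing(rates):
    """True if each band's rate is <= the previous band's rate."""
    values = list(rates.values())
    return all(later <= earlier for earlier, later in zip(values, values[1:]))
```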
ORDERED/CONT. NUMERICAL VARIABLE ANALYSIS

• Clients with more dependents tend to have a higher default percentage.
• Similarly, clients whose region rating is 3.0 tend to have a higher default percentage.
ORDERED/CONT. NUMERICAL VARIABLE ANALYSIS – EXT SOURCE

• The better the applicant's external score, the lower the default rate. (Because null values in Ext_Source_3 were imputed with the mean, its distribution looks nearly binomial, whereas Ext_Source_2 is right-skewed.)
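Mean imputation places every formerly missing value at exactly the same point, which creates a spike at the mean and shrinks the spread; this is one plausible reason an imputed column such as Ext_Source_3 looks different from Ext_Source_2. A sketch using the standard `statistics` module with toy scores (not the deck's data); `mean_impute` is a hypothetical helper:

```python
from statistics import mean, pstdev


def mean_impute(values):
    """Replace None entries with the mean of the observed values.

    Returns (imputed_list, fill_value).
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values], fill


# Toy external-score values; None marks a missing score.
scores = [0.2, 0.4, None, 0.6, None, None, 0.8]
imputed, fill = mean_impute(scores)

# Every imputed entry lands exactly on the mean, forming a spike there.
spike = sum(1 for v in imputed if v == fill)
# Adding values at the mean reduces the population standard deviation.
spread_before = pstdev([v for v in scores if v is not None])
spread_after = pstdev(imputed)
```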
ORDERED/CONT. NUMERICAL VARIABLE ANALYSIS – AMT FIELDS

• AMT_GOODS_PRICE, AMT_CREDIT and AMT_ANNUITY do not appear to have any impact on the default rate.
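One way to check that amount fields such as AMT_CREDIT have little effect on defaults is to split them into quartiles and compare the default rate per quartile; roughly equal rates are consistent with the conclusion above. A sketch using `statistics.quantiles` (Python 3.8+), assuming aligned lists of amounts and 0/1 flags; the function name is hypothetical and the deck may have used a different binning:

```python
from statistics import quantiles


def default_rate_by_quartile(amounts, defaults):
    """Return the default rate in each amount quartile, Q1 to Q4.

    amounts:  numeric values such as AMT_CREDIT, one per applicant.
    defaults: aligned 0/1 default flags.
    """
    q1, q2, q3 = quantiles(amounts, n=4)  # three cut points give four groups

    def quartile(x):
        # Values equal to a cut point fall into the lower quartile.
        if x <= q1:
            return 0
        if x <= q2:
            return 1
        if x <= q3:
            return 2
        return 3

    totals = [0, 0, 0, 0]
    hits = [0, 0, 0, 0]
    for x, d in zip(amounts, defaults):
        q = quartile(x)
        totals[q] += 1
        hits[q] += d
    # None marks an empty quartile (possible with many tied values).
    return [h / t if t else None for h, t in zip(hits, totals)]
```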
BIVARIATE ANALYSIS – EDUCATION VS GENDER VS INCOME

• From the heatmap, we can infer that male applicants with Lower Secondary and Secondary education tend to default more, especially in the Low, Medium and High income groups.
• Applicants with an academic degree do not default, except female applicants in the Very High income group.
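The heatmaps in these bivariate slides are default rates laid out over two or more categorical variables. A plain-Python sketch of building that matrix, assuming each applicant is a dict; the field names (`education`, `gender`, `income_band`, `TARGET`) and the function name are assumptions, not names confirmed by the deck:

```python
from collections import defaultdict


def default_rate_pivot(records, row_keys, col_key, target="TARGET"):
    """Build {row_tuple: {col_value: default_rate}}, the matrix a heatmap would colour.

    records:  list of dicts, one per applicant.
    row_keys: field names combined into the row label, e.g. ("education", "gender").
    col_key:  field name used for the columns, e.g. "income_band".
    target:   name of the 0/1 default flag field (assumed name).
    """
    totals = defaultdict(lambda: defaultdict(int))
    hits = defaultdict(lambda: defaultdict(int))
    for r in records:
        row = tuple(r[k] for k in row_keys)
        col = r[col_key]
        totals[row][col] += 1
        hits[row][col] += r[target]
    return {
        row: {col: hits[row][col] / n for col, n in cols.items()}
        for row, cols in totals.items()
    }
```

The same function covers the occupation vs family status vs income slide by passing different field names.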
BIVARIATE ANALYSIS – OCCUPATION VS FAMILY STATUS VS INCOME

• From the heatmap, widowed HR Staff with Very Low and Medium income default more.
• Drivers in a civil marriage with Low income default more.
BIVARIATE ANALYSIS – ORG TYPE VS HOUSING TYPE VS INCOME

From the heatmap, the following applicants have a higher default rate:
• Unemployed applicants living in a municipal apartment
• Working applicants from Transport Type 1 living in a rented apartment
• Applicants on maternity leave
• Working applicants from Industry Type 4 and Insurance living in an office apartment
• Working applicants from Industry Type 8 living with parents
• State servants from Industry Type 3 living in a municipal apartment
• Commercial associates from Industry Type 1 living in a co-op apartment
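One way to make "higher default rate" concrete for these segments is to flag every (organization type, housing type, income type) combination whose rate exceeds the overall default rate. A sketch under that assumption; the threshold choice, field names and function name are all hypothetical:

```python
from collections import defaultdict


def high_risk_segments(records, keys, target="TARGET"):
    """Return segments whose default rate exceeds the overall default rate.

    records: list of dicts, one per applicant, with a 0/1 flag under `target`.
    keys:    field names that define a segment, e.g. ("org_type", "housing_type", "income_type").
    Output:  (segment_tuple, rate, applicants) tuples, highest rate first.
    """
    overall = sum(r[target] for r in records) / len(records)
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in records:
        seg = tuple(r[k] for k in keys)
        totals[seg] += 1
        hits[seg] += r[target]
    flagged = [
        (seg, hits[seg] / totals[seg], totals[seg])
        for seg in totals
        if hits[seg] / totals[seg] > overall
    ]
    return sorted(flagged, key=lambda row: row[1], reverse=True)
```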
CORRELATION MATRIX FOR THE DEFAULT DATASET
DATA CLEANING

TOP 10 CORRELATIONS

Feature 1                     Feature 2                     Score (Defaulters)    Score (Non-defaulters)
OBS_30_CNT_SOCIAL_CIRCLE      OBS_60_CNT_SOCIAL_CIRCLE      0.99827               0.99851
AMT_GOODS_PRICE               AMT_CREDIT                    0.983108              0.987255
REGION_RATING_CLIENT          REGION_RATING_CLIENT_W_CITY   0.956637              0.950149
CNT_FAM_MEMBERS               CNT_CHILDREN                  0.884153              0.876905
DEF_60_CNT_SOCIAL_CIRCLE      DEF_30_CNT_SOCIAL_CIRCLE      0.869016              0.859371
AMT_GOODS_PRICE               AMT_ANNUITY                   0.752895              0.776867
AMT_CREDIT                    AMT_ANNUITY                   0.752195              0.771317
REGION_RATING_CLIENT_W_CITY   REGION_POPULATION_RELATIVE    0.446977              0.539005
REGION_RATING_CLIENT          REGION_POPULATION_RELATIVE    0.443236              0.537301
DEF_30_CNT_SOCIAL_CIRCLE      OBS_60_CNT_SOCIAL_CIRCLE      0.337389              0.331726

The top 10 correlations between variables range from 0.33 to 0.99, and both datasets (defaulters and non-defaulters) show almost the same correlations, except for the region rating variables (REGION_RATING_CLIENT and REGION_RATING_CLIENT_W_CITY) vs REGION_POPULATION_RELATIVE.
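A table like this can be produced by computing pairwise Pearson correlations separately for defaulters and non-defaulters, ranking the pairs in one subset and looking up the same pairs in the other. A stdlib-only sketch, assuming each subset is a dict of equal-length numeric columns with no constant columns (a constant column would divide by zero); ranking by absolute value is an assumption, and all names are illustrative:

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def top_correlations(columns, k=10):
    """columns: {name: values}. Return the k pairs with the largest |r| as ((a, b), r).

    combinations() yields each unordered pair once, like the upper triangle of the matrix.
    """
    pairs = [((a, b), pearson(columns[a], columns[b]))
             for a, b in combinations(columns, 2)]
    return sorted(pairs, key=lambda p: abs(p[1]), reverse=True)[:k]


def compare_groups(defaulters, non_defaulters, k=10):
    """Top-k pairs from the defaulter subset with their correlation in the other subset."""
    return [(pair, r, pearson(non_defaulters[pair[0]], non_defaulters[pair[1]]))
            for pair, r in top_correlations(defaulters, k)]
```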
INFERENCE

Recommended applicants:
• IT Staff
• Applicants holding an academic degree

Risk-associated applicants:
• Applicants with Civil Marriage or Widow status whose occupation is Driver or HR Staff
• Low-skill Laborers
• Male applicants with Lower Secondary or Incomplete Higher education
• Applicants who are unemployed or on maternity leave
