
Bank Loan Case Study

BY
SOURAV KUMAR PANIGRAHI
Project Description

● This case study aims to show how exploratory data analysis (EDA) can be applied to a real-life business scenario.
● EDA techniques help us develop a basic understanding of risk in banking and financial services, which helps minimise the risk of losing money when providing loans.
Approach

● First, understand the datasets so that the questions are easier to answer.
● Detect the missing values and handle them so that we can move forward with the analysis.
● Import the datasets into Google Colab and start coding.
Tech Stack Used

● Microsoft Excel
● Google Colab
● Google Drive (to import the dataset)
1) Present the overall approach of the analysis. Briefly mention the problem statement and the analysis approach.

● Understood the dataset so that the analysis would be easier to carry out.
● Detected the missing values and handled them so that we could move forward with the analysis.
● Imported the datasets into Google Colab and started coding.
● Cleaned the data after identifying the missing values.
● Identified the outliers in each dataset.
● Identified the imbalance in the data and computed the imbalance percentage.
● Explained univariate, segmented univariate and bivariate analysis using different columns from the dataset.
● Explained correlations using heatmaps.
2) Identify the missing data and use an appropriate method to deal with it.

● First, we find the percentage of NaN values in each column.
● round(app_data.isnull().mean().sort_values(ascending=False) * 100)
● app_data = app_data.loc[:, round(app_data.isnull().mean() * 100) < 13.5]
● The first line lists the percentage of missing values in each column, highest first.
● The second line drops every column with more than 13.5% missing values.

The same procedure is applied to the other datasets.
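A minimal, self-contained sketch of the missing-value step, assuming pandas and a toy DataFrame invented for illustration. The column names a, b and c and the helper names missing_percentage and drop_sparse_columns are hypothetical. The 13.5% threshold is the one used in this study. Unlike the snippet above, this sketch compares unrounded percentages.

```python
import numpy as np
import pandas as pd


def missing_percentage(df: pd.DataFrame) -> pd.Series:
    """Percentage of NaN values in each column, largest first."""
    return (df.isnull().mean() * 100).sort_values(ascending=False)


def drop_sparse_columns(df: pd.DataFrame, threshold_pct: float = 13.5) -> pd.DataFrame:
    """Keep only the columns whose missing-value percentage is below threshold_pct."""
    # Boolean Series indexed by column label: True means "keep this column"
    keep_mask = df.isnull().mean() * 100 < threshold_pct
    return df.loc[:, keep_mask]


# Toy data (8 rows): "a" is 12.5% missing, "b" is 50% missing, "c" has no gaps
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "b": [np.nan, np.nan, np.nan, np.nan, 1.0, 2.0, 3.0, 4.0],
    "c": list(range(8)),
})

print(missing_percentage(toy))
cleaned = drop_sparse_columns(toy)
print(cleaned.columns.tolist())  # ['a', 'c']
```

On a real table, the equivalent call would be `app_data = drop_sparse_columns(app_data)`.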


3) Identify whether there are outliers in the dataset. Also, mention why you think each one is an outlier. Again, remember that for this exercise, it is not necessary to remove any data points.

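● We plot box plots of the quantitative variables in the application data (app_data) to identify outliers. A point is treated as an outlier when it lies beyond the whiskers, that is, more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile, far from the bulk of the distribution.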
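The "why is it an outlier" criterion can be made explicit in code. Below is a sketch of the 1.5 × IQR rule. This rule corresponds to the default whisker length (whis=1.5) that matplotlib box plots, and seaborn box plots built on them, use to decide which points to draw as individual dots. The function name iqr_outliers and the sample amounts are hypothetical, chosen only for illustration.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)  # first quartile (linear interpolation)
    q3 = values.quantile(0.75)  # third quartile
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Keep only the points outside the whisker bounds
    return values[(values < lower) | (values > upper)]


# Invented amounts: one value far above the rest
amounts = pd.Series([100, 110, 120, 130, 140, 150, 1000])
print(iqr_outliers(amounts).tolist())  # [1000]
```

Here Q1 = 115 and Q3 = 145, so IQR = 30 and the upper bound is 145 + 1.5 × 30 = 190. Only 1000 lies outside the bounds.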
4) Explain the results of univariate, segmented univariate, bivariate analysis, etc. in business terms.

Univariate analysis

import matplotlib.pyplot as plt

# Histogram and summary statistics of DAYS_BIRTH
Df1_extract = Df3[['DAYS_BIRTH']]
Df1_extract.hist()
plt.show()
Df1_extract.describe()
Segmented univariate analysis
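Segmented univariate analysis looks at one variable separately within each segment defined by another variable. The sketch below assumes that the target column is named TARGET, with 1 for clients with payment difficulties and 0 for all other cases. The AGE_YEARS column, its values and the helper name segmented_summary are hypothetical.

```python
import pandas as pd


def segmented_summary(df: pd.DataFrame, column: str, segment: str = "TARGET") -> pd.DataFrame:
    """Summary statistics of `column`, computed separately for each value of `segment`."""
    return df.groupby(segment)[column].describe()


# Invented data: AGE_YEARS is a hypothetical derived column
toy = pd.DataFrame({
    "TARGET": [0, 0, 0, 1, 1],
    "AGE_YEARS": [45, 50, 55, 25, 35],
})
summary = segmented_summary(toy, "AGE_YEARS")
print(summary[["count", "mean", "50%"]])
```

A clear gap between the segment means or medians is the kind of difference that makes a variable useful for separating the two groups of clients.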
Bivariate analysis
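For bivariate analysis between a categorical variable and the target, one common approach is a normalized cross-tabulation that gives the share of clients with payment difficulties in each category. This is not necessarily the method used in this study. The sketch assumes TARGET is 1 for clients with payment difficulties and 0 otherwise. The CONTRACT_TYPE column, its values and the helper name default_rate_by_category are hypothetical.

```python
import pandas as pd


def default_rate_by_category(df: pd.DataFrame, column: str, target: str = "TARGET") -> pd.Series:
    """Share of rows with target == 1 within each category of `column`, highest first."""
    # normalize="index" turns each row of the crosstab into proportions summing to 1
    table = pd.crosstab(df[column], df[target], normalize="index")
    # Column 1 exists only if at least one row has target == 1
    return table[1].sort_values(ascending=False)


# Invented data
toy = pd.DataFrame({
    "CONTRACT_TYPE": ["cash", "cash", "cash", "cash", "revolving", "revolving"],
    "TARGET": [1, 0, 0, 0, 1, 0],
})
rates = default_rate_by_category(toy, "CONTRACT_TYPE")
print(rates)
```

In this toy data, 1 of 4 cash rows and 1 of 2 revolving rows have TARGET == 1, so the rates are 0.25 and 0.5.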
5) Identify whether there is imbalance in the data. Find the ratio of data imbalance.
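# Difference between each row of Df3_Imbalance and the previous row
f1 = Df3_Imbalance.diff(periods=1, axis=0)
# Maximum value of the last column of Df3_Imbalance
difvalue = Df3_Imbalance[[list(Df3_Imbalance.columns)[-1]]].max()
difvalue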
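The snippet above works with row differences of Df3_Imbalance. A simpler alternative for quantifying imbalance is to count each class of the target and divide the majority count by the minority count. This is shown here as a sketch, not as the author's method. The labels are invented and the function name imbalance_ratio is hypothetical.

```python
import pandas as pd


def imbalance_ratio(target: pd.Series) -> float:
    """Ratio of the majority class count to the minority class count."""
    counts = target.value_counts()
    return counts.max() / counts.min()


target = pd.Series([0] * 9 + [1] * 1)  # invented labels: 90% class 0, 10% class 1
print(target.value_counts(normalize=True) * 100)  # percentage of rows per class
print(imbalance_ratio(target))  # 9.0
```

A ratio of 9.0 means there are nine majority-class rows for every minority-class row. A perfectly balanced target gives 1.0.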
6) Find the top 10 correlations for clients with payment difficulties and for all other cases (target variable).

Heatmap of the correlation between critical quantitative variables for clients with payment difficulties
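One way to obtain the "top 10 correlations" behind such a heatmap is to compute the correlation matrix, keep each variable pair once (upper triangle, diagonal excluded) and rank the pairs by absolute correlation. This sketch rests on a few assumptions. The function name top_correlations and the toy columns are hypothetical, and Pearson correlation (the pandas default) is assumed. To compare the two groups, the function would be called separately on the rows with TARGET == 1 and on the rows with TARGET == 0 (assumed encoding).

```python
import numpy as np
import pandas as pd


def top_correlations(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Return the n variable pairs with the largest absolute Pearson correlation."""
    corr = df.select_dtypes(include="number").corr()
    cols = corr.columns
    # Row/column positions of the upper triangle; k=1 skips the diagonal
    rows, cols_idx = np.triu_indices(len(cols), k=1)
    pairs = pd.Series(
        corr.to_numpy()[rows, cols_idx],
        index=pd.MultiIndex.from_arrays([cols[rows], cols[cols_idx]]),
    ).dropna()  # constant columns give NaN correlations; drop them
    # Rank by absolute value but keep the sign of each correlation
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)


toy = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],   # perfectly positively correlated with x
    "z": [5, 4, 3, 2, 1],    # perfectly negatively correlated with x
    "w": [1, 0, 1, 0, 1],    # uncorrelated with x
    "label": ["a", "b", "c", "d", "e"],  # non-numeric, ignored
})
print(top_correlations(toy, n=3))
```

A column that is constant within a group, such as TARGET once the rows are filtered by it, produces NaN correlations. These are dropped before ranking.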
7) Include visualizations and summarize the most important results in the presentation. You are free to choose the graphs that explain the numerical/categorical variables. Insights should explain why each variable is important for differentiating clients with payment difficulties from all other cases.

Box plots of the quantitative variables to identify outliers

import matplotlib.pyplot as plt
import seaborn as sns

prev_box_2 = ['AMT_APPLICATION', 'AMT_CREDIT', 'AMT_ANNUITY',
              'AMT_GOODS_PRICE', 'DAYS_DECISION']

# One horizontal box plot per quantitative column of Df_prev
for col in prev_box_2:
    plt.figure(figsize=(15, 5))
    sns.boxplot(x=Df_prev[col])
    plt.xticks(rotation=90, fontsize=10)
    plt.show()
THANK YOU
