
Credit Risk

Problem Statement
Businesses can fall into default if they are unable to keep up with their debt obligations. A default leads to a lower credit rating for the company, which in turn reduces its chances of obtaining credit in the future, and the company may have to pay higher interest on existing debts as well as on any new obligations. From an investor's point of view, a company is worth investing in if it is capable of handling its financial obligations, can grow quickly, and is able to manage that growth at scale.

A balance sheet is a financial statement of a company that provides a snapshot of what a company owns, owes, and the amount invested by the shareholders. Thus, it is an important tool that helps evaluate the performance of a business.

The available data includes information from the companies' financial statements for the previous year (2015). Information about the net worth of each company in the following year (2016) is also provided, which can be used to derive the labeled field.

An explanation of the data fields is available in the Data Dictionary, 'Credit Default Data Dictionary.xlsx'.

Hints:

Dependent variable - We need to create a default variable that takes the value 1 when net worth next year is negative and 0 when net worth next year is positive.

Test Train Split - Split the data into Train and Test datasets in a ratio of 67:33 using random_state = 42. Model building is to be done on the Train dataset and model validation on the Test dataset.
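
A minimal sketch of these hints, assuming the data has been loaded with pandas; the file name ("Credit Default Data.xlsx") and the column name ("Networth Next Year") are assumptions, not confirmed names from the data dictionary.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# File and column names below are assumptions, not confirmed
df = pd.read_excel("Credit Default Data.xlsx")

# Dependent variable: 1 if next year's net worth is negative, else 0
df["default"] = (df["Networth Next Year"] < 0).astype(int)

X = df.drop(columns=["default", "Networth Next Year"])
y = df["default"]

# 67:33 train-test split with the fixed random state from the hints
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)
```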

Credit Risk Dataset


Data Dictionary
Market Risk
The dataset contains 6 years of weekly stock price information for 10 different Indian stocks. Calculate the mean and standard deviation of the stock returns and share insights.

Please refer to the attached files.

Market Risk Dataset

You are expected to do the Market Risk Analysis using Python.

Please note the following:

• Please avoid sharing code in the business report; marks may be deducted if code is
included in the report
• Please ensure all the graphs displayed in the report are clearly visible
• Proper interpretation should be provided wherever required

1.8 Build a Random Forest Model on Train Dataset. Also showcase your
model building approach

We performed the train-test split as described in the question and built the random
forest model on the train data.
We built the random forest model with 100 estimators (trees) and fitted it to the
training data. We also set the maximum number of features considered at each split to 6,
which helps with performance tuning.

After fitting the model, we obtained a high out-of-bag score of 0.96, which means the model correctly predicts about 96% of the out-of-bag samples drawn from the training data.
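
A minimal sketch of this step; the random_state value is an assumption added for reproducibility.

```python
from sklearn.ensemble import RandomForestClassifier

# 100 trees, at most 6 features considered at each split,
# with out-of-bag scoring enabled (random_state is assumed)
rf = RandomForestClassifier(
    n_estimators=100,
    max_features=6,
    oob_score=True,
    random_state=42,
)
rf.fit(X_train, y_train)

print(f"Out-of-bag score: {rf.oob_score_:.2f}")  # about 0.96 in our run
```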

1.9 Validate the Random Forest Model on test Dataset and state the
performance metrics. Also state interpretation from the model

After fitting the model on the train dataset, we used it to predict values on the
test dataset. We also generated the confusion matrix and classification report for both
the training and test datasets to measure performance.
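
A short sketch of how the two reports can be generated, reusing the fitted model and splits from the sketches above.

```python
from sklearn.metrics import confusion_matrix, classification_report

# Predict on both sets and report performance for each
for name, X_part, y_part in [("train", X_train, y_train), ("test", X_test, y_test)]:
    y_pred = rf.predict(X_part)
    print(f"--- {name} ---")
    print(confusion_matrix(y_part, y_pred))
    print(classification_report(y_part, y_pred))
```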

Confusion Matrix and Classification Report on the train data:


Confusion Matrix and Classification Report on the test data:

As per the results, the model is over-fit: accuracy is 1.0 and the confusion matrix
shows every observation classified correctly, with no false positives or false negatives.
This means we need to pass different parameter values while creating the model to
eliminate the over-fitting.

We performed a grid search to find better hyperparameters and built the model once
again. After finding the hyperparameters, we recalculated the performance metrics.
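
A sketch of the grid search step; the exact parameter grid and the choice of recall as the scoring metric are assumptions, chosen because recall matters most for catching defaulters.

```python
from sklearn.model_selection import GridSearchCV

# The grid values below are assumptions; these knobs limit tree growth
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [5, 7, 10],
    "min_samples_leaf": [10, 25, 50],
    "max_features": [4, 6],
}

grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="recall",  # assumed: recall matters most for catching defaulters
    cv=3,
)
grid.fit(X_train, y_train)
best_rf = grid.best_estimator_  # tuned model used for the reports below
```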
Confusion Matrix and Classification Report on the train data (after hyperparameter tuning):

Confusion Matrix and Classification Report on the test data (after hyperparameter tuning):

We see high accuracy and high recall on both the training and testing data, and the
model now performs well, eliminating the over-fitting issue that occurred before tuning
the hyperparameters.

1.10 Build an LDA Model on Train Dataset. Also showcase your model
building approach

We built a Linear Discriminant Analysis model on the same train data as above.
We passed no parameters to the model, so it uses all of its default settings when
fitting.
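
A minimal sketch of this step, reusing the train split from earlier.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# LDA with all default parameters, as described above
lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)
```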
1.11 Validate the LDA Model on test Dataset and state the performance
metrics. Also state interpretation from the model

After fitting the model on the training data, we used it to predict values on the
test data.

Confusion Matrix for both train and test data:

The confusion matrices for the training and testing sets show that true positives and
true negatives dominate on both sets, with comparatively few false negatives and false
positives.

Classification Report for both train and test data:

Accuracy is on the higher side for both the training and testing sets, at 94% and 93%
respectively.

Precision is also high, but recall is on the lower side for both the training and
testing sets.

Recall is the weak point of this model, and for this problem recall is what we need to
improve, since the goal is to identify customers at risk of defaulting in the future.
1.12 Compare the performances of Logistic Regression, Random Forest
and LDA models (include ROC Curve)

We consider the recall value along with the precision value as our key performance
metrics, and compare them on the test set for all our models.
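
A sketch of how the ROC curves and AUC scores below can be produced for any of the fitted models; the SMOTE-trained logistic regression model (sketched further below) can be added to the same plot in the same way.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

def plot_roc(model, X_part, y_part, label):
    # Use the predicted probability of the default class
    probs = model.predict_proba(X_part)[:, 1]
    fpr, tpr, _ = roc_curve(y_part, probs)
    auc = roc_auc_score(y_part, probs)
    plt.plot(fpr, tpr, label=f"{label} (AUC = {auc:.2f})")

plot_roc(best_rf, X_test, y_test, "Random Forest")
plot_roc(lda, X_test, y_test, "LDA")
plt.plot([0, 1], [0, 1], linestyle="--")  # chance line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```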

Classification Report (Logistic Regression) along with ROC curve:

Recall was low in the initial logistic regression model, but after applying the SMOTE
technique our performance improved, lifting the recall value.
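
A minimal sketch of the SMOTE step, assuming the imbalanced-learn package; oversampling is applied to the training data only, never to the test data.

```python
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression

# Oversample the minority (defaulter) class in the training set only
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

logit = LogisticRegression(max_iter=1000)  # max_iter raised for convergence
logit.fit(X_res, y_res)
```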

ROC curve on the training data:


ROC Curve on the testing data:

Classification Report (Random Forest) along with ROC curve:

Our random forest model achieved better overall performance metrics, including
precision and recall, on both the training and testing sets.

The AUC score is highest for this model: 0.99 on the training set and 0.98 on the testing set.
The classification report is shown in the figure above.

ROC Curve on the training data:

ROC Curve on the testing data:

Classification Report (LDA) along with ROC curve:

Our LDA model shows performance metrics that are noticeably lower than those of the
other models, Random Forest and Logistic Regression, shown above.

Its AUC score is also lower than that of the other models.

ROC Curve on the training data and testing data:


1.13 State Recommendations from the above models

Below are the recommendations from the above models.

• We choose the Random Forest model as our optimum model, as it has the highest
performance metrics across the board – accuracy, precision and recall – for both
the training and testing sets.
• We chose this model as our best because the logistic regression model gives
better results only when the SMOTE technique is used to address the unbalanced dataset.
Logistic regression gives a good recall value compared to the other models,
but its precision takes a hit for predicted defaulters.
• We set out to identify the bank's customers who are likely to default, so our
model should give a high recall value, but we also need a model that gives a high
precision value.
• The Random Forest model also gives a high AUC of 0.98 on the testing set, which
means better results in identifying potential defaulters – and it is important for
the bank to identify its defaulters.

Market Risk Problem:

We imported the dataset and can see the stock prices of various companies
across the years 2014 to 2021.
The dataset has 314 rows and 11 columns.
Descriptive Statistics:
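
A minimal sketch of the import and inspection step; the file name and the "Date" column are assumptions, not confirmed names.

```python
import pandas as pd

# File name and "Date" column are assumptions, not confirmed names
stocks = pd.read_csv("Market_Risk_Data.csv", parse_dates=["Date"], index_col="Date")

print(stocks.shape)       # 314 rows; 10 stock columns plus the date in the raw file
print(stocks.describe())  # descriptive statistics per stock
```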

2.1 Draw Stock Price Graph (Stock Price vs. Time) for any 2 given stocks
with inference

Below is the Stock Price Graph for 2 major companies – Infosys, Idea Vodafone

Infosys:

The stock price trends upward from 2014 to 2021. There is a slight dip in 2018, but
the company got hold of its problems, and since then the price has risen steeply.

Idea-Vodafone:

In this plot, the stock price decreases steadily over the years. We find the company
carries higher risk as a stock market investment, as it could not improve its share
value even after the merger.
We also plotted the graphs for the other companies; the results are summarized below.

Companies with rising stock prices over the years – Infosys, Indian Hotel, Shree Cement,
Axis Bank.

Companies with high stock price variability across the years – SAIL, Jindal Steel, Jet
Airways, Mahindra & Mahindra.

Companies with declining stock prices over the years – Sun Pharma, Idea Vodafone.

2.2 Calculate Returns for all stocks with inference

We calculate returns as log returns: the difference between the logarithm of each
week's stock price and the logarithm of the previous week's price.

The first row returns a null value, as there is no previous price to difference against.
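
A minimal sketch of this calculation, continuing from the stocks DataFrame above. The log return for period t is ln(P_t) - ln(P_{t-1}), i.e. ln(P_t / P_{t-1}).

```python
import numpy as np

# Log returns: difference of log prices; the first row is NaN
returns = np.log(stocks).diff()
print(returns.head())
```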

Below are the returns calculated for all stocks.


2.3 Calculate Stock Means and Standard Deviation for all stocks with
inference

Below are the mean and standard deviation of the returns for each stock, calculated
using pandas.
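
A minimal sketch, continuing from the returns DataFrame above.

```python
# Mean and standard deviation of weekly returns for each stock
stock_means = returns.mean()
stock_sd = returns.std()

print(stock_means.sort_values(ascending=False))
print(stock_sd.sort_values())
```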

The lower the standard deviation, the lower the risk of investing in that particular
company, and a higher mean indicates a higher average return.

Companies with higher mean – Shree Cement, Infosys


Companies with lesser standard deviation - Shree Cement, Infosys, Sun Pharma
2.4 Draw a plot of Stock Means vs. Standard Deviation and state your
inference

We created a data frame holding each company's mean return as its Average and the
standard deviation of its returns as its Volatility.

We then plotted these two columns – Average return vs. Volatility (standard deviation).

For the x-axis, we drew a reference line at mean = 0; companies to its right have a
positive mean return, so we can see which companies have a higher mean.

For the y-axis, we set a reference at a volatility of 0.02 to identify companies
whose risk is near or below that value.
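
A sketch of this plot, continuing from the means and standard deviations computed above.

```python
import matplotlib.pyplot as plt
import pandas as pd

risk_return = pd.DataFrame({"Average": stock_means, "Volatility": stock_sd})

plt.scatter(risk_return["Average"], risk_return["Volatility"])
for name, row in risk_return.iterrows():
    plt.annotate(name, (row["Average"], row["Volatility"]))
plt.axvline(x=0, linestyle="--")     # stocks to the right have positive mean returns
plt.axhline(y=0.02, linestyle="--")  # stocks below this line carry lower risk
plt.xlabel("Average weekly return")
plt.ylabel("Volatility (standard deviation of returns)")
plt.show()
```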
We can see that Infosys and Shree Cement combine higher mean returns with low
volatility, while Indian Hotel and Axis Bank also show positive mean returns, though
lower than the top two companies.

2.5 Conclusion and Recommendations

Below are the conclusions and recommendations from the market risk analysis.

• As a potential investor, we would recommend investing in Shree Cement, as the
company's share price has been steadily increasing over the years, its mean return
is the highest, and its risk value of 0.0399 is comparatively low.
• Our next recommendation is Infosys: the company's stock price has been increasing
since 2018, even though it faced some instability before that. Its mean return is
the second highest, and its risk value of 0.035 is even lower than Shree Cement's.
• We do not recommend investing in Idea Vodafone, Sun Pharma or Jet Airways, as
these companies' stock prices have been in steady decline over the years owing to
various losses. They combine lower mean returns with higher risk values.
• We also encourage smaller investments in Indian Hotel and Axis Bank. Their stock
prices may not be the best, but their risk values are better than those of the
remaining companies.
