
Credit Risk Management Using Machine Learning

Zoran Ereiz
Faculty of Information Technology
Mostar, Bosnia-Herzegovina
zoran.ereiz@edu.fit.ba

Abstract—Credit risk management is essential to financial institutions because it directly affects business results. Although AI and machine learning are not new, microcredit organizations have been slow to adopt these methods in their credit risk assessment process and still use standard credit scoring based on a linear calculation over a small number of indicators. This scoring model gives mixed and unreliable results. Machine learning, on the other hand, offers a much broader view of a client and can be used to manage not only credit risk but other business risks too. As this paper demonstrates, the prediction accuracy is very high but depends on the quality of the data.

Keywords—AI, machine learning, risk management, credit risk, dataset, prediction

I. INTRODUCTION
Credit risk is one of the most important risks to be managed by a financial institution. Without loan repayment there is no profit, hence the problem of credit risk management is relevant to all financial institutions involved in lending to individuals and legal entities. This is even more true of microcredit organisations, which have only one product: loans. Banks have a diverse portfolio, so the risk is somewhat mitigated, but credit risk is still the most important one to manage.

Credit risk is the economic loss that emanates from the failure of a counterparty to fulfill its contractual obligations (e.g., timely payment of interest or principal), or from the increased risk of default during the term of the transaction. Traditionally, financial firms have employed classical linear, logit, and probit regressions to model credit risk [1].

A standard credit score is largely a linear calculation of a small number (about 50) of positive or negative numerical characteristics about a person and thus misses out on a huge amount of additional personal information that can help to either reduce negative risk or accept positive risk [2].

The management of credit risk of credit portfolios is therefore one of the most important tasks for the financial liquidity and stability of the banking sector, in connection with the increased sensitivity of banks to credit risks and to changes in the development of prices of financial instruments [3].

Credit risk management is a matter of predicting the possibility of a loan default. Different financial institutions manage credit risks in different ways, but in general the goal is always the same.

II. CREDIT SCORING WITH MACHINE LEARNING
The use of AI and machine learning to create a credit risk management model is not new, but in recent years it has been growing rapidly. It is particularly the increased complexity of assessing credit risk that has drawn attention to machine learning. Using machine learning algorithms, a new model could be trained on existing anonymised historical data to make better predictions not only for credit risk but also for other risks, such as the possibility of early repayment resulting in lost interest income, the possible danger of money laundering etc.

With a good model, financial institutions could predict how likely a client is to repay the loan before the maturity date and then take action before this happens (Fig. 1).

Fig. 1. Credit risk management system using machine learning

To mitigate the subjective part of the decision-making process, different scoring models are introduced to evaluate certain parameters that could affect loan repayment. In general, these parameters include, but are not limited to:
• income
• loan amount
• loan term
• client's age
• client's gender
• number of household members
• place of living (urban/rural)
• industry of business (if business loan)
• loan product
• loan cycle
• previous loan days in arrears
• current credit classification
• client's credit history
• credit history for address (municipality)
• various financial indicators

Various combinations of these parameters are scored to identify a potential credit risk. However, these parameters are mostly general (e.g. credit history for municipality, place of living...) and do not give an accurate prediction. At Microcredit Organization EKI (MCO EKI), as of the beginning of July 2019 there were 40812 active loans. Of these, 10185 loans received a bad credit score, of which 1507 are late and 183 are in default. On the other hand, of the 30627 loans that received a good/passing score, 2837 are late and 300 are in default. The scoring system currently in use is clearly not giving good results.
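The MCO EKI loan counts quoted in this section can be turned into rates to make the weakness of the current scoring concrete. The following is a minimal sketch that uses only the figures from the text; the helper `rate` and the variable names are illustrative assumptions, not part of any system at MCO EKI.

```python
# Loan counts quoted in the text for MCO EKI (active loans, July 2019).
scored = {
    "bad":  {"loans": 10185, "late": 1507, "default": 183},
    "good": {"loans": 30627, "late": 2837, "default": 300},
}


def rate(part, whole):
    """Return part/whole as a percentage rounded to two decimals (hypothetical helper)."""
    return round(100.0 * part / whole, 2)


total_loans = sum(group["loans"] for group in scored.values())
total_defaults = sum(group["default"] for group in scored.values())

# Share of late and defaulted loans within each score group.
late_rate = {name: rate(g["late"], g["loans"]) for name, g in scored.items()}
default_rate = {name: rate(g["default"], g["loans"]) for name, g in scored.items()}

# Share of all defaulted loans that the scoring had flagged as bad.
defaults_flagged = rate(scored["bad"]["default"], total_defaults)

print(total_loans, late_rate, default_rate, defaults_flagged)
```

On these figures, roughly 14.8% of bad-scored loans are late compared with about 9.3% of good-scored loans, and only about 38% of the loans in default had received a bad score, so most defaults come from loans that passed the scoring.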

©2019 IEEE
III. PREPARING THE DATASET
During the loan application process various data about the client, the client's household and business are collected. Some of the data is relevant to credit risk prediction and some is not. A dataset with the most relevant information (based on previous experience in the financial institution) was created from a database of 150091 loans. Of these, 40812 are still active while the remainder are closed, i.e. repaid in one way or another. A sample of 3612 loans is used for the purpose of this paper, while the full dataset will be used in the implementation of the model. The sample dataset contains loans disbursed from June 2014 to July 2015 (about a year of disbursed loans).

In cooperation with employees from the Risk and Operations department, a table was created containing the records deemed important to credit risk management. The table consists of the following 29 indicators characterizing the borrower and the loan:
• loLoanID
• AgreementNo
• ContactCode
• Amount
• Term
• GracePeriod
• Cycle
• Business/Nonbusiness
• Product
• Branch
• LO
• WOF
• NoMonthlyTransfers
• FirstInstallmentTransferred
• LateWithPayment
• FirstInstallmentLate
• EarlyRepaid
• NoMonthsRepaid
• CRK_Category
• HH_Members
• HH_Members_No_Income
• RuralUrban
• Municipality
• EmploymentStatus
• Gender
• AgeAtDisbursement
• DocumentedIncome
• IncomePerHH_Member
• Default

Some of these indicators (e.g. EarlyRepaid, WOF, LateWithPayment...) represent information that only becomes available during repayment or after the loan is closed, and as such directly indicate the outcome the model is supposed to predict. After further analysis they were therefore deemed irrelevant for the model and removed from the dataset. The dependent variable in our dataset is a binary indicator, with the value 1 flagging a defaulted loan.

With the General Data Protection Regulation (GDPR) additional impediments are created, since data can no longer be saved and used without consent if there isn't a contractual relationship between the client and the financial institution. This creates problems regarding saving and using data in the model.

At MCO EKI a loan is flagged as delinquent if the client fails to pay a monthly installment in the month it is due. Days in arrears (the number of days from when the installment was due to the actual payment) are also tracked, but as long as the payment is made within the month the client/loan is not delinquent. If the loan is 180 days late it is written off, i.e. deemed uncollectable through the regular procedure, and has to be collected through the courts. Usually, a loan that is written off has also had installments transferred to another month before payment was made.

The first stage is cleaning the dataset. The dataset doesn't contain any N/A or null values, which simplifies this stage. The next step is to remove columns that are irrelevant to the model being built. This step is risky because it could lead to overfitting and poor model performance. The following categories (table columns) were removed because they are unavailable in the application approval process:
• WOF
• NoMonthlyTransfers
• FirstInstallmentTransferred
• LateWithPayment
• FirstInstallmentLate
• EarlyRepaid
• NoMonthsRepaid

The dataset is divided into two groups, train and test, where the training table contains 75% of the dataset. The test table is used to confirm the model after it is trained.

To generate the best-suited model we used BigML's OptiML, an automatic optimization option that allows users to find the best supervised learning model for the submitted data (Fig. 2).
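The column removal and the 75/25 split described in this section can be sketched in a few lines. This is a minimal illustration on synthetic rows, not the procedure actually run on the MCO EKI data: the helper names `drop_columns` and `train_test_split`, the random seed, and the use of a plain (non-stratified) random shuffle are all assumptions, since the paper does not say how the split was drawn.

```python
import random

# Columns the text lists as unavailable during application approval.
POST_APPROVAL_COLUMNS = [
    "WOF", "NoMonthlyTransfers", "FirstInstallmentTransferred",
    "LateWithPayment", "FirstInstallmentLate", "EarlyRepaid", "NoMonthsRepaid",
]


def drop_columns(rows, columns):
    """Return copies of the rows without the given columns (hypothetical helper)."""
    excluded = set(columns)
    return [{k: v for k, v in row.items() if k not in excluded} for row in rows]


def train_test_split(rows, train_fraction=0.75, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Synthetic stand-in for the 3612-loan sample; all values are made up.
rows = [
    {"Amount": 1000 + i, "Term": 12, "WOF": 0, "EarlyRepaid": 0,
     "Default": 1 if i % 10 == 0 else 0}
    for i in range(3612)
]

clean = drop_columns(rows, POST_APPROVAL_COLUMNS)
train, test = train_test_split(clean)
print(len(train), len(test))
```

With 3612 rows, a 75% training share leaves 903 rows for testing, which matches the size of the test set reported for the first model evaluation.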
Fig. 2. OptiML Model Optimization

• OptiML can be used for both classification and regression problems.
• It works by automatically creating and evaluating multiple models with multiple configurations (decision trees, ensembles, logistic regressions, and deepnets) using Bayesian parameter optimization.
• When the process finishes, it provides a list of the best models for comparison and selection of the one that best suits the use case.

The OptiML algorithm is split into two phases. The first, the “parameter search” phase, uses a single holdout set to iteratively find promising sets of parameters. The second, the “validation” phase, iteratively performs Monte Carlo cross-validation on those parameters that are somewhat close to the best (Fig. 3).

Fig. 3. Model creation and evaluation

For this second phase, the algorithm iteratively does new train/test splits for the top half of the algorithms remaining. Thus, the best models will typically have more than one evaluation associated with them.

After completion there is a report showing the elapsed time, the evaluated models, the number of candidate models and the number of created models (Fig. 4).

The best model based on the supplied data was a bootstrap decision forest (Fig. 5).

Fig. 5. Model evaluation first attempt

After running the model on the test data it returned 68 incorrect predictions out of a test dataset of 903, or 7.53%. However, the most important field in this model was LO (Loan Officer), followed by the client’s municipality (Fig. 6).

Fig. 6. Top 5 field importance

This did not make much sense, so a new model was created on a new dataset excluding some other fields:
• ContactCode
• Branch
• LO
• Municipality
• DocumentedIncome

Unfortunately, this caused every model to give poor performance results (Fig. 7).

Fig. 7. Model evaluation second attempt
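The validation phase of OptiML described in this section relies on Monte Carlo cross-validation, i.e. evaluating a model on several freshly drawn random train/test splits and averaging the scores. The sketch below shows the general idea only and is not BigML's implementation: the function names, the number of splits, the 25% test share, the synthetic labels and the toy majority-class "model" are all assumptions chosen to keep the example self-contained.

```python
import random
from statistics import mean


def majority_class(labels):
    """Toy 'model': always predict the most frequent training label."""
    return max(set(labels), key=labels.count)


def monte_carlo_cv(labels, n_splits=5, test_fraction=0.25, seed=0):
    """Average accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    n_test = int(len(indices) * test_fraction)
    scores = []
    for _ in range(n_splits):
        rng.shuffle(indices)  # draw a fresh random split each round
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        prediction = majority_class([labels[i] for i in train_idx])
        correct = sum(labels[i] == prediction for i in test_idx)
        scores.append(correct / n_test)
    return mean(scores), scores


# Synthetic labels: 1 = default, 0 = repaid (10% defaults, made up).
labels = [1 if i % 10 == 0 else 0 for i in range(1000)]
avg_accuracy, per_split = monte_carlo_cv(labels)
print(round(avg_accuracy, 3), len(per_split))
```

Because every round draws a new split, each candidate ends up with several evaluations rather than one, which is consistent with the remark that the best models typically have more than one evaluation associated with them. The majority-class predictor used here also gives a simple baseline that a real credit risk model should beat.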
Fig. 4. Model creation report

IV. CONCLUSION
Financial institutions are increasingly turning to AI and machine learning not only for credit risk management, but also to manage other risks such as financial fraud, money laundering, the risk of not being in compliance with regulations (and thus of being fined by the regulator), client behaviour that could lead to a loss of income for the financial institution etc. All these risks can be managed using various AI and machine learning techniques.
There are some significant practical issues that need to be addressed before AI and machine learning techniques for risk management can reach their full potential. The most important of these is the availability of suitable data. In practice the process of model building is always hampered by the availability and quality of data [4]. The collection process is never perfect or completely accurate, and the data often contain inconsistencies or missing values. Another issue is the availability of skilled staff to implement these new techniques. There is also the issue of how accurate the model is. Due to never-ending changes in the lending market, the model should be treated as a living system that is constantly monitored and adjusted.

However, a machine learning model created this way would enable the financial institution to make better and more objective decisions regarding loan approval, to prevent internal and/or external fraud, to suppress money-laundering activities, and to predict client behavior in order to prevent clients from leaving, for example by attracting them with new custom-designed loan products.

For future work, the full set of data (at the moment 150091 loans) should be used to confirm the findings of this research, and the model should be implemented at MCO EKI.

ACKNOWLEDGMENT
The author would like to thank MCO EKI for making the data available for this research.

REFERENCES
[1] E. I. Altman, "Financial ratios, discriminant analysis and the prediction of corporate bankruptcy," The Journal of Finance, vol. 23, no. 4, pp. 589-609, 1968.
[2] S. Aziz and M. Dowling, "AI and Machine Learning for Risk Management," SSRN Electronic Journal, 2018, doi: 10.2139/ssrn.3201337.
[3] D. Kiseľáková and A. Kiseľák, "Analysis of banking business and its impact on financial stability of economies euro area," Polish Journal of Management Studies, vol. 8, 2013.
[4] J. Galindo and P. Tamayo, "Credit Risk Assessment Using Statistical and Machine Learning: Basic Methodology and Risk Modeling Applications," Computational Economics, vol. 15, pp. 107-143, 2000, doi: 10.1023/A:1008699112516.
[5] A. Petropoulos, V. Siakoulis, E. Stavroulakis, and A. Klamargias, "A robust machine learning approach for credit risk analysis of large loan level datasets using deep learning and extreme gradient boosting," in Are post-crisis statistical initiatives completed?, IFC Bulletins, vol. 49, Bank for International Settlements, 2019.
