Credit Risk
Institutional Affiliation
Table of contents
Introduction
Data description
Descriptive statistics
T-test
ANOVA test
Job categories
Conclusion
References
Introduction
Credit risk has taken on an increasingly prominent role for financial institutions because of
the enormous effect it may have on their stability and profitability. The term "credit risk"
refers to the possibility that a borrower may not repay a loan or otherwise fulfil their
financial obligations to a lender. If a borrower fails to make their payments, the lender may
incur considerable losses, and in extreme circumstances the institution's solvency may be put at
risk. Financial institutions therefore need to manage credit risk efficiently in order to reduce
the chance of default and shield themselves from losses. Assessing the creditworthiness of
potential borrowers, monitoring and evaluating credit risk exposures, determining proper lending
limits and conditions, and putting risk reduction techniques into action are all components of an
efficient credit risk management strategy. Beyond the possibility of financial losses, credit
risk is also closely watched by regulatory authorities. These agencies are responsible for
establishing rules and regulations for credit risk management, and institutions may face hefty
fines and penalties for failing to comply with their requirements. This report examines how the
variables credit amount and credit duration, which are often predictors of credit risk, are
affected by the demographic variables of the respondents. Two statistical tests are used in this
analysis: the t-test and the ANOVA test. The research addresses the following questions:
1. Is the credit amount borrowed the same for all sexes, job categories, and purposes the loans
were used for?
2. Is the credit duration the same for all sexes, job categories, and purposes the loans were
used for?
This report provides a step-by-step explanation of how the objectives were achieved.
Data description
The German credit risk data is one of the datasets most frequently used in credit risk
analysis. It contains personal information on the credit applicants of a German financial
institution. The information consists of twenty different factors, including age, sex, income,
credit history, and employment status. The target variable is a binary indicator of whether the
applicant defaulted on the loan. The dataset is frequently used to build prediction models for
credit risk assessment. It has been applied extensively in scholarly investigations and is open
for public use. It is obtainable through a variety of sites, such as the UCI Machine Learning
Repository and Kaggle (Aithal & Jathanna, 2019).
Because of the convoluted system of categories and symbols used to organize the initial
dataset, it is nearly impossible to comprehend in its raw form. As a result, only a portion of
the full dataset was selected for analysis. A few columns were disregarded because either they
do not contain relevant information or their explanations are not clear.
Descriptive statistics
In this section, we provide the descriptive statistics of the numerical variables that were
analyzed: credit amount, duration, and age.
Statistic            Credit amount   Duration   Age
Mean                 3271.25         20.9       35.5
Standard Error       ...             ...        ...
Median               2319.5          18         33
Mode                 1393            24         27
Standard Deviation   ...             ...        ...
Variance             ...             145.415    ...
Skewness             ...             ...        ...
Range                18174           68         56
Minimum              250             4          19
Maximum              18424           72         75
From the table above, the average credit amount is 3271.25, while the lowest credit
amount recorded was 250 and the highest was 18424. The average loan duration was 20.9 months,
with a minimum of 4 months and a maximum of 72 months. The average age of the participants in
the research was 35.5, while the lowest and the highest ages were 19 and 75, respectively.
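The summary statistics discussed above can be reproduced with Python's standard library. The following is a minimal sketch, not the procedure used in this report: the function name describe and the sample durations are hypothetical, and only the minimum, maximum, and range values are copied from the table. The last step checks that each reported range equals the maximum minus the minimum.

```python
import statistics


def describe(values):
    """Summary statistics of the kind listed in the table, for one numeric column."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
    }


# Hypothetical loan durations in months; NOT taken from the German credit data.
sample_durations = [6, 12, 12, 18, 24, 24, 24, 36, 48]
summary = describe(sample_durations)

# Minimum, maximum, and range as reported in the table for each variable.
reported = {
    "credit amount": (250, 18424, 18174),
    "duration": (4, 72, 68),
    "age": (19, 75, 56),
}

# The range should equal maximum minus minimum for every variable.
range_is_consistent = {
    name: maximum - minimum == value_range
    for name, (minimum, maximum, value_range) in reported.items()
}
```

For all three variables the reported range matches the difference between the reported maximum and minimum, which confirms that the highest age in the data is 75.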
In this section, we also analyze the number of people in each of the categories. The first is the
sex of the borrowers.
As per the chart above, more males than females took out loans. The number of males was nearly
double that of females, so males make up the majority of borrowers in this data. To understand
which sex borrowed a higher amount, we get the average credit amount for each sex:
Sex      Average credit amount
female   2877.774194
male     3448.04058
As per the results above, on average, males had a higher credit amount than females. We also
look at the relationship between sex and the duration of the loan. This is shown in the chart
below.
As per the results of this chart, the credit duration is higher for males than for females.
Next, we examine how the purpose of the loan affects the credit amount and the length of the
loan. Let us start by taking a look at the number of loans taken out for each purpose by
everyone included in the data. The outcomes are displayed below.
According to the findings above, many borrowers take out loans in order to purchase cars.
The third-largest group receives a loan for furniture or equipment, followed by those who
receive one in order to purchase a radio. The smallest group borrows for vacation or other
expenses. The table below shows the average credit amount for each of the loan reasons.
According to the aforementioned figure, vacation loans had the highest average credit amount of
any category, followed by business loans and car loans, respectively. Comparing all loans, loans
for household appliances had the lowest average loan amount. The next step is to examine how
long each loan was taken out for. The graph below displays the outcomes.
Similar to the last instance, the average loan term for a vacation loan was the greatest,
followed by the average loan durations for a business loan and a car loan. A likely reason is
that these loans are frequently quite substantial, so borrowers need a longer period to pay
them back. The following part examines the relationship between the variables of credit amount
and loan term and the job levels of the individuals. We start by looking at the number of
borrowers in each job category.
There were four different employment categories, with Category 2 having the most
borrowers, followed by Category 3 and, finally, Category 1. No one from job category 0 was
present. The average credit amount borrowed in each job category is shown below.
Job category   Average of credit amount
1              2358.52
2              3070.9651
3              5435.4932
According to the preceding table, job category 3 received the most credit on average, then job
category 2, and then job category 1. The relationship between job category and credit duration
is shown below.
Job category   Average of duration
1              16.535
2              21.411111
3              25.168919
According to the data above, credit duration follows a pattern similar to that of the credit
amount: both increase from job category 1 to job category 3.
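The group averages reported in this section (by sex and by job category) are per-group means. The sketch below shows one way to compute them with the standard library; the function name group_means, the field names, and the five records are hypothetical and are not taken from the German credit data.

```python
from collections import defaultdict
from statistics import mean


def group_means(records, group_key, value_key):
    """Average of value_key within each level of group_key."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return {group: mean(values) for group, values in groups.items()}


# Hypothetical records; field names and values are illustrative only.
records = [
    {"sex": "female", "job": 1, "credit_amount": 1200, "duration": 12},
    {"sex": "male", "job": 2, "credit_amount": 3000, "duration": 24},
    {"sex": "female", "job": 2, "credit_amount": 2800, "duration": 18},
    {"sex": "male", "job": 3, "credit_amount": 6000, "duration": 36},
    {"sex": "male", "job": 1, "credit_amount": 1800, "duration": 12},
]

amount_by_sex = group_means(records, "sex", "credit_amount")
duration_by_job = group_means(records, "job", "duration")
```

The same function covers both groupings, since only the grouping field and the averaged field change.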
T-test
The goal of the first objective was to test whether sex has an impact on the credit amount
and duration of the loans. In this section, a statistical test known as the t-test is used. The
t-test is an inferential statistic that is used to compare the means of two groups and to
determine whether they differ significantly. T-tests are used for examining data that has
an approximately normal distribution but unknown variances, such as the outcomes recorded from
flipping a coin a hundred times. A t-test has three components: the t-statistic, the values of
the t-distribution, and the degrees of freedom. The test is used to assess whether a null
hypothesis can be rejected (Liu & Wang, 2021).
Two groups can be compared by employing a t-test on data collected from both groups. The
assumption that the means of both groups are identical is known as the null hypothesis.
Formulae are used to calculate the test values, and the results are compared to standard
values. One then either rejects or fails to reject the hypothesis of no effect. If the null
hypothesis can be rejected, the observed difference between the groups is unlikely to be due to
chance. Statisticians apply a variety of tests in addition to the t-test in order to investigate
a greater number of variables and larger sample sizes. For example, statisticians employ the
z-test when they have a large number of samples to analyze (Kim & Park, 2019).
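The calculation described above can be sketched in Python. This is an illustration under stated assumptions, not the software procedure used in this report: the function names, the two small samples, and the 0.05 threshold default are hypothetical choices, and the p-value helper uses a normal approximation (in the spirit of the z-test mentioned above), which is only reasonable for large samples. The two branches correspond to the "equal variances assumed" and "equal variances not assumed" versions of the test.

```python
import math
from statistics import NormalDist, mean, variance


def two_sample_t(a, b, equal_var=True):
    """Return (t statistic, degrees of freedom) for two independent samples."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a), variance(b)  # sample variances (n - 1)
    if equal_var:
        # Pooled-variance version: equal variances assumed.
        df = n1 + n2 - 2
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        # Welch version: equal variances not assumed.
        se = math.sqrt(v1 / n1 + v2 / n2)
        df = (v1 / n1 + v2 / n2) ** 2 / (
            (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
        )
    return (mean(a) - mean(b)) / se, df


def approx_two_sided_p(t):
    """Normal approximation to the two-sided p-value (large samples only)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


def decide(p_value, alpha=0.05):
    """Decision rule: reject the null hypothesis when p is below alpha."""
    return "reject null" if p_value < alpha else "fail to reject null"


# Hypothetical credit amounts (in thousands); NOT the German credit data.
group_a = [3, 5, 7, 9, 11]
group_b = [1, 2, 3, 4, 5]
t_pooled, df_pooled = two_sample_t(group_a, group_b, equal_var=True)
t_welch, df_welch = two_sample_t(group_a, group_b, equal_var=False)
```

With equal group sizes the two versions give the same t statistic and differ only in their degrees of freedom. For small samples such as these, the p-value should come from the t-distribution rather than from the normal approximation.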
Our first test looks at whether the credit amount borrowed by both sexes is the same. The
hypotheses are:
Null hypothesis: the credit amount borrowed by males and females is the same.
Alternative hypothesis: the credit amount borrowed by males and females is not the same.
From the table above, we have two results for the credit amount: one where equal variances
are assumed and one where equal variances are not assumed. For the two cases, we have
p-values of 0.003 and 0.002, respectively. Generally, a test is interpreted in terms of its
p-value. A p-value that is larger than 0.05 indicates that we fail to reject the null
hypothesis, while a p-value that is lower than 0.05 indicates that we go with the alternative
hypothesis. In our case, both p-values are lower than 0.05, which indicates that we go with the
alternative hypothesis. We, therefore, conclude that the loan amounts borrowed by males and
females are not the same. As per the descriptive analysis in the previous section, we can say
that males borrowed a higher credit amount on average.
The next test looks at how sex influenced the duration of the loan. In short, it tries to
answer the question: is the loan duration on average the same for all sexes? The hypotheses
are:
Null hypothesis: the duration of the loan borrowed is the same for males and females
Alternative hypothesis: the duration of the loan borrowed is not the same for males and females
The results of the t-test are presented as shown in the table below:
Duration (equal variances assumed)   .010   2.122   .822
As per the results of the table above, we note that the p-values of the t-test are 0.010 and
0.007, respectively. In both cases, the p-values are lower than 0.05, which indicates that we
should go with the alternative hypothesis. Thus, we conclude that the credit duration of a loan
is not the same for males and females. From the descriptive statistics presented in the
previous section, we can conclude that the average duration of a loan for males is higher than
that of females. This can be explained in terms of the loan amount that was borrowed, as males
borrowed larger amounts on average.
ANOVA test
Analysis of variance (ANOVA) is a technique used in statistics that splits the observed
aggregate variability within a data set into two parts: systematic factors and random factors.
The systematic factors have a statistical influence on the data set, while the random factors
do not. The ANOVA test is the first step in determining which factors are at play in a given
data set. After the test is finished, an analyst can run additional tests on the systematic
factors that contribute to the inconsistency of the data set.
The ANOVA test may be used to compare the means of more than two groups all at once. The F
statistic, also known as the F-ratio, is the outcome of the ANOVA formula. It compares the
variability between groups with the variability within groups. If there is no significant
difference between the groups, as stated by the null hypothesis, the F-ratio will be close to
1. All potential values of the F statistic are distributed according to the F-distribution, a
family of distributions characterized by the degrees of freedom of the numerator and the
denominator. This test will enable us to understand the impact of the variables purpose and job
category on credit amount and duration. ANOVA is used for these variables because they have
three or more categories, whereas the t-test was used in the previous section because sex has
two categories.
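The F-ratio described above can be computed directly from the group data. The following is a minimal sketch of a one-way ANOVA using the standard library; the function name one_way_anova and the three groups of loan durations are hypothetical and are not taken from the German credit data.

```python
from statistics import mean


def one_way_anova(groups):
    """Between/within sums of squares, degrees of freedom, and the F-ratio."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    group_means = [mean(group) for group in groups]

    # Variability of the group means around the grand mean.
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2 for group, m in zip(groups, group_means)
    )
    # Variability of the observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for group, m in zip(groups, group_means) for x in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f_ratio": f_ratio,
    }


# Hypothetical loan durations (months) for three job categories.
durations_by_job = [[12, 18, 24], [18, 24, 30], [24, 36, 48]]
result = one_way_anova(durations_by_job)
```

The p-value is then read from the F-distribution with the numerator and denominator degrees of freedom returned here; that step is left out because it needs a statistics library beyond the standard one.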
Job categories
The first test we consider is the effect of job category on the credit amount. The question
answered is: is the average credit amount the same for all job categories? The hypotheses of
the ANOVA test are:
Null hypothesis: The average credit amount is the same for all job categories
Alternative hypothesis: The average credit amount is not the same for all job categories
ANOVA: Credit amount
Source           Sum of Squares    df    Mean Square
Within Groups    7068674638.736    996   7097062.890
Total            7959875627.436    999
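Only the Within Groups and Total rows of this table are available here. Because the total sum of squares and degrees of freedom are the sums of the between-groups and within-groups parts, the missing row can be derived by subtraction. The sketch below does this arithmetic; it assumes the sums of squares read 7068674638.736 (within) and 7959875627.436 (total), and the function name complete_anova_table is hypothetical.

```python
def complete_anova_table(ss_within, df_within, ss_total, df_total):
    """Derive the between-groups row and the F-ratio from the other two rows."""
    ss_between = ss_total - ss_within
    df_between = df_total - df_within
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "df_between": df_between,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f_ratio": ms_between / ms_within,
    }


# Values as read from the credit amount by job category table (assumed reading).
job_table = complete_anova_table(
    ss_within=7068674638.736,
    df_within=996,
    ss_total=7959875627.436,
    df_total=999,
)
```

The derived within-groups mean square agrees with the 7097062.890 shown in the table, and the resulting F-ratio is far above 1, which is in line with the very small p-value reported for this test.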
From the above table, the p-value is 0.000. As per the decision rule, when the p-value is
greater than 0.05, we fail to reject the null hypothesis; otherwise, we go with the alternative
hypothesis. In our case, the p-value is lower than 0.05, so we go with the alternative
hypothesis and conclude that the average credit amount is not the same for all of the job
categories. The next table looks at the impact of the job categories on the average credit
duration. In other words, it answers the question: is the average credit duration the same for
all the job categories? The hypotheses are:
Null hypothesis: The average credit duration is the same for all job categories
Alternative hypothesis: The average credit duration is not the same for all job categories
The table below presents the results of the ANOVA analysis that was conducted.
ANOVA: Duration
Source           Sum of Squares   df   Mean Square   F        Sig.
Between Groups   6947.446         3    2315.815      16.675   .000
As per the table above, the p-value is lower than 0.05, which indicates that the credit
duration is not the same for all job categories.
Borrowing limits vary by occupation for a number of reasons. The average salary for
each profession is a major consideration. Employees in executive roles, which tend to pay more,
may be able to take out larger loans than those in entry-level retail roles. Job security is
also a major consideration. Borrowers who have established employment and a regular income
stream may find that lenders are more amenable to providing larger loan amounts. Lenders will
feel more confident lending to such borrowers.
The maximum loanable amounts vary by occupation and, to a lesser extent, by education
and training. Professionals with higher education requirements, such as physicians and attorneys,
may be eligible for larger loans. These general considerations may be supplemented by
sector-specific considerations. It is possible that different occupations within the same
industry (or even in other industries) will have different access to credit based on the
specific lending standards in place. Lending standards in the real estate business, for
instance, might be different from those in other industries.
The purpose of this section was to conduct a test to determine how the purpose of a loan
influences the credit amount and the duration of the loan. In other words, are the average credit
amount and credit duration the same for all loan purposes? Two tests were therefore conducted.
The first test was on how the average credit amount is affected by the loan purpose. The
hypotheses are:
Null hypothesis: The average credit amount is the same for all loan purposes
Alternative hypothesis: The average credit amount is not the same for all loan purposes
ANOVA: Credit amount
Source           Sum of Squares    df    Mean Square
Within Groups    7274985923.716    992   7333655.165
Total            7959875627.436    999
As per the above test, there is a difference in the average credit amount across the loan
purposes. This is because the p-value is lower than 0.05. The next table looks at the impact of
loan purpose on the average credit duration.
Conclusion
The main objective of this analysis was to determine whether the credit amount borrowed
and the credit duration were the same for all sexes, job categories, and purposes the loans
were used for. Two statistical tests, the t-test and ANOVA, were used in the analysis. The
findings revealed that there were significant differences in credit amount between males and
females, with males borrowing a higher amount on average. Additionally, credit duration was
found to vary significantly between sexes, with males having a longer duration on average.
Furthermore, the ANOVA test showed that job categories had a significant influence on both
credit amount and duration, with different job categories associated with varying credit
amounts and durations. These findings highlight the importance of considering demographic
variables when assessing credit risk. The step-by-step explanation of the objectives and tests
provided in this report enhances understanding of the research findings and their implications
for credit risk assessment. A financial institution should always consider these variables to
reduce the risk of borrowers defaulting, as it should have limits on the loans that it can give
to individuals. Observing this ensures that the loan amount and the duration of the loan
offered to individuals are what they can reasonably repay.
References
Aithal, V., & Jathanna, R. D. (2019). Credit risk assessment using machine learning techniques.
3486. https://doi.org/10.35940/ijitee.A4936.119119
Burger, T. (2022). Applying FDR control subsequently to large scale one-way ANOVA testing
https://doi.org/10.1101/2022.08.29.505664
Bussmann, N., Giudici, P., Marinelli, D., & Papenbrock, J. (2021). Explainable machine learning
https://doi.org/10.1007/s10614-020-10042-0
Kim, T. K., & Park, J. H. (2019). More about the basic assumptions of t-test: normality and
https://doi.org/10.4097/kja.d.18.00292
Liu, Q., & Wang, L. (2021). t-Test and ANOVA for data with ceiling and/or floor effects.
Miari, M., Anan, M. T., & Zeina, M. B. (2022). Neutrosophic two way ANOVA. International