

Credit Risk

Student’s First Name, Middle Initial(s), Last Name

Institutional Affiliation

Course Number and Name

Instructor’s Name and Title

Assignment Due Date



Table of contents

Introduction
Data description
Descriptive statistics
T-test
ANOVA test
Job categories
Purpose of the loan
Conclusion
References

Introduction

Because of the enormous effect that it may have on a financial institution's financial

stability and profitability, credit risk has taken on an increasingly prominent role for financial

institutions. The possibility that a borrower may not return a loan or otherwise fulfil their

financial responsibilities to a lender is what is meant by the term "credit risk." If a borrower fails

to make their payments, the lender may incur considerable financial losses, and in extreme

circumstances, it may even put the institution's ability to remain solvent at risk. As a direct

consequence of this, financial institutions are required to practice efficient management of credit

risk in order to reduce the chance of default and shield themselves from financial losses.

Assessing the creditworthiness of potential borrowers, monitoring and evaluating credit risk

exposures, determining the proper lending limits and conditions, and putting risk reduction

techniques into action are all components of an efficient credit risk management strategy. In

addition to the possibility of incurring financial losses, regulatory authorities keep a careful eye

on credit risk. These agencies are responsible for establishing rules and regulations for credit risk

management in order to guarantee the stability of financial institutions. It is possible to incur

hefty fines and penalties for failing to comply with the requirements of regulatory agencies

(Bussmann et al., 2021).

In this study, we aim to contribute to credit risk research by examining how credit amount
and credit duration, two variables that are often used as predictors of credit risk, vary with
the demographic characteristics of the respondents. Two statistical tests are used in this
analysis: the t-test and the ANOVA test. The research questions are as follows:



1. Is the credit amount borrowed the same for all sexes, job categories, and loan purposes?

2. Is the credit duration the same for all sexes, job categories, and loan purposes?

This report will provide a step-by-step explanation of how the objectives were achieved

in addition to a thorough explanation of the tests.

Data description

The German credit risk data is one of the most frequently used datasets in credit risk
analysis. It contains personal information on the credit applicants of a German financial
institution. The data consist of twenty variables, including age, sex, income, credit history,
and employment status. The target variable is a binary indicator of whether the applicant
defaulted on the loan. The dataset is frequently used to build prediction models for credit
risk assessment, has been applied extensively in scholarly work, and is publicly available
through sources such as the UCI Machine Learning Repository and Kaggle (Aithal & Jathanna,
2019).

The original dataset is organized with a convoluted system of categories and symbols that
makes it nearly impossible to interpret directly. We therefore analyze a subset of the full
dataset. A few columns are disregarded because they either do not contain relevant
information or are not clearly documented.

Descriptive statistics

In this section, we provide the descriptive statistics of the numerical variables that were

used in the analysis. They are presented in the table below.

                     Credit amount   Duration   Age
Mean                 3271.258        20.903     35.546
Standard Error       89.26278        0.381333   0.359724
Median               2319.5          18         33
Mode                 1393            24         27
Standard Deviation   2822.737        12.05881   11.37547
Sample Variance      7967843         145.415    129.4013
Kurtosis             4.29259         0.919781   0.59578
Skewness             1.949628        1.094184   1.020739
Range                18174           68         56
Minimum              250             4          19
Maximum              18424           72         75
Sum                  3271258         20903      35546
Count                1000            1000       1000

From the table above, the average credit amount is 3271.26, the lowest credit amount
recorded is 250, and the highest is 18424. The average loan duration is 20.9 months, with a
minimum of 4 months and a maximum of 72 months. The average age of the participants is 35.5
years, and the youngest and oldest participants are 19 and 75 years old, respectively.
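Several rows of the table can be cross-checked against one another: the standard error is the standard deviation divided by the square root of the count, the sample variance is the square of the standard deviation, and the range is the maximum minus the minimum. The following is a minimal sketch of that check. The dictionary layout and the function name `is_consistent` are illustrative choices and not part of the original analysis; the numbers are copied from the table above.

```python
import math

N = 1000  # "Count" row of the table

# Values copied from the descriptive statistics table.
# The dictionary keys are illustrative names, not column names from the dataset.
TABLE = {
    "credit_amount": {"sd": 2822.737, "se": 89.26278, "var": 7967843,
                      "min": 250, "max": 18424, "range": 18174},
    "duration": {"sd": 12.05881, "se": 0.381333, "var": 145.415,
                 "min": 4, "max": 72, "range": 68},
    "age": {"sd": 11.37547, "se": 0.359724, "var": 129.4013,
            "min": 19, "max": 75, "range": 56},
}


def is_consistent(row, n=N, rel_tol=1e-4):
    """Return True if SE, variance, and range agree with SD, min, and max."""
    se_ok = math.isclose(row["sd"] / math.sqrt(n), row["se"], rel_tol=rel_tol)
    var_ok = math.isclose(row["sd"] ** 2, row["var"], rel_tol=rel_tol)
    range_ok = row["max"] - row["min"] == row["range"]
    return se_ok and var_ok and range_ok


for name, row in TABLE.items():
    print(name, is_consistent(row))
```

A check of this kind is useful mainly for catching transcription errors when a summary table is copied by hand.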

In this section, we also analyze the number of people in each of the categories. The first is the

sex of the respondents, which is provided in the chart below.

As per the chart above, there was a higher number of males who took out loans than females;
the number of male borrowers was nearly double that of female borrowers. To understand which
sex borrowed a higher amount, we compute the average credit amount per loan for each sex, as
presented below.



Row Labels   Average of Credit amount
female       2877.774194
male         3448.04058

As per the table above, males had a higher average credit amount than females. We also look
at how sex relates to the duration of the loan. This is presented in the chart below.

As per the results of this chart, the average credit duration is higher for males than for
females. Next, we examine how the purpose of the loan relates to the credit amount and the
duration of the loan. We start with the number of loans taken for each purpose. The outcomes
are displayed below.

According to the findings above, many borrowers take out loans in order to purchase cars.
Loans for furniture or equipment form the third-largest group, followed by loans to purchase
a radio. The fewest loans are taken for vacations or other expenses. The average credit
amount for each loan purpose is shown below.

According to the figure above, loans for vacations had the highest average credit amount of
any category, followed by business loans and car loans. Loans for household appliances had
the lowest average credit amount. The next step is to examine the average duration of the
loans for each purpose. The graph below displays the outcomes.

Similar to the last instance, the average loan term for a vacation loan was the greatest,

followed by the average loan durations for a business loan and a car loan. The reason for this is

that these loans are frequently quite substantial, and borrowers would need to take out loans for a

very long period to pay them back. The following part examines the relationship between the

variables of credit amount and loan term and the job levels of the individuals. We start by

examining how frequently each employment level occurs.



There were four different employment categories, with Category 2 having the most

employees, followed by Category 3 and finally, Category 1. No one from job category 0 was

present. The employment classifications and average credit amount borrowed by each person are

shown in the table below.

Row Labels   Average of Credit amount
1            2358.52
2            3070.9651
3            5435.4932

According to the preceding table, job category 3 received the most credit, then job

category 2, and then job category 1. The relationship between employment types and credit

lengths is next examined. The table below displays the findings.

Row Labels   Average of Duration
1            16.535
2            21.411111
3            25.168919

According to the table above, credit duration follows the same pattern as credit amount: it
is longest for job category 3, followed by job category 2 and then job category 1.

T-test

The first objective was to test whether sex has an impact on the credit amount and the
duration of the loans. In this section, a statistical test known as the t-test is used. The
t-test is an inferential statistic that is used to compare the means of two groups and to
determine whether they differ significantly. It is appropriate for data that are
approximately normally distributed and whose population variances are unknown. The test has
three components: the t-statistic, the t-distribution, and the degrees of freedom. It is used
to assess whether a null hypothesis can be rejected (Liu & Wang, 2021).

Applying a t-test to data collected from both groups allows us to test the problem statement
formally. The null hypothesis assumes that the means of both groups are identical. The test
statistic is computed from the data and compared against a critical value, and the null
hypothesis is then either rejected or not rejected. If the null hypothesis is rejected, the
observed difference is unlikely to be due to chance alone. The t-test is one of several tests
used for this purpose. Statisticians apply other tests to investigate a greater number of
variables or larger sample sizes; for example, the z-test is used when the sample size is
large (Kim & Park, 2019).
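To make the components of the test concrete, the following is a minimal sketch of how the t-statistic and degrees of freedom are computed for two independent samples, both with equal variances assumed (pooled variance) and without that assumption (Welch's version). The function name `two_sample_t` and the toy samples are hypothetical and serve only as an illustration; they are not the credit data and this is not the software used for the analysis in this report.

```python
import math
import statistics


def two_sample_t(a, b, equal_var=True):
    """Return (t statistic, degrees of freedom) for two independent samples."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)  # sample variances
    if equal_var:
        # Pooled-variance t-test: both groups share one variance estimate.
        df = n_a + n_b - 2
        pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
        std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    else:
        # Welch's t-test: separate variances, Welch-Satterthwaite degrees of freedom.
        std_err = math.sqrt(var_a / n_a + var_b / n_b)
        df = (var_a / n_a + var_b / n_b) ** 2 / (
            (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
        )
    return (mean_a - mean_b) / std_err, df


# Hypothetical toy samples, chosen only to keep the arithmetic easy to follow.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]

t_pooled, df_pooled = two_sample_t(group_a, group_b, equal_var=True)
t_welch, df_welch = two_sample_t(group_a, group_b, equal_var=False)
print(f"pooled: t = {t_pooled:.4f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.4f}, df = {df_welch:.3f}")
```

The p-value is then obtained by comparing the t-statistic with the t-distribution that has the computed degrees of freedom.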

Our first test looks at whether the credit amount borrowed by both sexes is the same. The

hypothesis statements for this test are shown below.

Null hypothesis: the credit amount borrowed by males and females is the same

Alternative hypothesis: the credit amount borrowed by males and females is not the same.

The results of the test are presented in the table below:

Independent Samples Test
t-test for Equality of Means

Variable        Assumption                    Sig. (2-tailed)   Mean Difference   Std. Error Difference
Credit amount   Equal variances assumed       .003              570.266           192.254
Credit amount   Equal variances not assumed   .002              570.266           184.531

The table above gives two results for the credit amount: one where equal variances are
assumed and one where they are not. The p-values for the two cases are 0.003 and 0.002,
respectively. A test is generally interpreted in terms of its p-value. A p-value larger than
0.05 indicates that we fail to reject the null hypothesis, while a p-value lower than 0.05
indicates that we reject it in favor of the alternative hypothesis. In our case, both
p-values are lower than 0.05, so we reject the null hypothesis and conclude that the credit
amount borrowed by males and females is not the same. Based on the descriptive analysis in
the previous section, males borrowed a higher credit amount than females.
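The reported figures can be tied together with a short calculation: the mean difference is the gap between the two group averages from the descriptive section, and the t-statistic is the mean difference divided by its standard error. The sketch below is an illustration under one stated assumption: the two-sided p-value is approximated with the standard normal distribution rather than the t-distribution, which is reasonable here only because the sample has 1000 observations. The function name `t_and_approx_p` is hypothetical.

```python
from statistics import NormalDist


def t_and_approx_p(mean_diff, std_err):
    """Return the t statistic and a normal-approximation two-sided p-value."""
    t = mean_diff / std_err
    # Assumption: with about 1000 observations the t-distribution is close to normal.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Group averages from the descriptive section (male minus female).
mean_diff = 3448.04058 - 2877.774194

# Mean difference and standard errors as reported in the t-test table.
t_equal, p_equal = t_and_approx_p(570.266, 192.254)      # equal variances assumed
t_unequal, p_unequal = t_and_approx_p(570.266, 184.531)  # equal variances not assumed

print(f"mean difference = {mean_diff:.3f}")
print(f"equal variances:   t = {t_equal:.3f}, p = {p_equal:.3f}")
print(f"unequal variances: t = {t_unequal:.3f}, p = {p_unequal:.3f}")
```

Under this approximation the p-values round to the same three decimals as the values in the table.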

The next test looks at how sex influenced the duration of the loan. In short, it answers the
question: is the average loan duration the same for both sexes? The hypothesis statements for
the analysis are shown below:

Null hypothesis: the duration of the loan borrowed is the same for males and females

Alternative hypothesis: the duration of the loan borrowed is not the same for males and females

The results of the t-test are presented as shown in the table below:

Independent Samples Test
t-test for Equality of Means

Variable   Assumption                    Sig. (2-tailed)   Mean Difference   Std. Error Difference
Duration   Equal variances assumed       .010              2.122             .822
Duration   Equal variances not assumed   .007              2.122             .786

As per the table above, the p-values of the t-test are 0.010 and 0.007, respectively. In both
cases, the p-value is lower than 0.05, so we reject the null hypothesis in favor of the
alternative. Thus, we conclude that the credit duration is not the same for males and
females. From the descriptive statistics presented in the previous section, the average loan
duration is higher for males than for females. This may be related to the credit amount, as
males tend to borrow larger amounts.

ANOVA test

Analysis of variance (ANOVA) is a statistical technique that splits the observed aggregate
variability within a data set into two parts: systematic factors and random factors. The
systematic factors have a statistical influence on the data set, while the random factors do
not. The ANOVA test is the first step in determining which factors affect a given data set.
After the test is finished, an analyst runs additional tests on the systematic factors that
measurably contribute to the variability of the data (Burger, 2022).

The ANOVA test may be used to look at the relationship between more than two groups

all at once. The F statistic, also known as the F-ratio, is the outcome of the ANOVA formula and

allows for the comparison of various data sets to identify differences in variability. If there is no

significant difference between the groups, the F-ratio statistic of the ANOVA will be close to 1,

as stated by the null hypothesis. The possible values of the F statistic follow the
F-distribution, a family of distributions characterized by two parameters: the numerator
degrees of freedom and the denominator degrees of freedom. This test enables us to understand
the impact of the variables loan purpose and job category on credit amount and duration.
ANOVA is used for these variables because they have three or more categories, whereas the
t-test was used in the previous section because sex has two categories (Miari et al., 2022).
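The F-ratio described above can be sketched for a one-way layout as the between-group mean square divided by the within-group mean square. The example below is a minimal illustration using hypothetical toy groups rather than the credit data, and the function name `one_way_anova` is an assumption made for this sketch.

```python
import statistics


def one_way_anova(groups):
    """Return (SS between, SS within, df between, df within, F) for a one-way layout."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    # Between-group variation: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: how far each observation lies from its own group mean.
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio


# Hypothetical toy data: three groups of three observations each.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ss_b, ss_w, df_b, df_w, f_value = one_way_anova(toy_groups)
print(f"SS between = {ss_b:.1f}, SS within = {ss_w:.1f}")
print(f"df = ({df_b}, {df_w}), F = {f_value:.1f}")
```

An F-ratio well above 1, as in this toy example, indicates that the group means differ by more than the variation inside the groups would suggest.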

Job categories

The first test considers the effect of job category on the credit amount. The question
answered is: is the average credit amount the same for all job categories? The hypothesis
statements for the ANOVA test are provided below.

Null hypothesis: The average credit amount is the same for all job categories

Alternative hypothesis: The average credit amount is not the same for all job categories

The results of the ANOVA test are presented below:

ANOVA
Credit amount

                 Sum of Squares    df    Mean Square     F        Sig.
Between Groups   891200988.700     3     297066996.233   41.858   .000
Within Groups    7068674638.736    996   7097062.890
Total            7959875627.436    999

From the above table, the p-value is reported as 0.000. As per the decision rule, when the
p-value is greater than 0.05 we fail to reject the null hypothesis; otherwise, we reject it
in favor of the alternative hypothesis. In our case, the p-value is lower than 0.05, so we
reject the null hypothesis and conclude that the credit amount is not the same for all of the
job categories. The next test looks at the impact of the job categories on the average credit
duration. In other words, it answers the question: is the average credit duration the same
for all the job categories? The hypothesis statements for this analysis are presented below:

Null hypothesis: The average credit duration is the same for all job categories

Alternative hypothesis: The average credit duration is not the same for all job categories

The table below presents the results of the ANOVA analysis that was conducted.

ANOVA
Duration

                 Sum of Squares   df    Mean Square   F        Sig.
Between Groups   6947.446         3     2315.815      16.675   .000
Within Groups    138322.145       996   138.878
Total            145269.591       999



As per the table above, the p-value is lower than 0.05, which indicates that the credit

duration is not the same for all job categories.
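The columns of the two job-category ANOVA tables are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, F is the ratio of the two mean squares, and the between-group and within-group sums of squares add up to the total. The following is a minimal sketch of that calculation using the figures reported in the tables; the function name `anova_from_sums` is a hypothetical helper introduced for this illustration.

```python
import math


def anova_from_sums(ss_between, df_between, ss_within, df_within):
    """Return (mean square between, mean square within, F) from sums of squares."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# Figures copied from the ANOVA tables for job category.
credit_ms_b, credit_ms_w, credit_f = anova_from_sums(891200988.700, 3, 7068674638.736, 996)
duration_ms_b, duration_ms_w, duration_f = anova_from_sums(6947.446, 3, 138322.145, 996)

# The between and within sums of squares should add up to the reported totals.
credit_total_ok = math.isclose(891200988.700 + 7068674638.736, 7959875627.436)
duration_total_ok = math.isclose(6947.446 + 138322.145, 145269.591)

print(f"credit amount: F = {credit_f:.3f}, totals match: {credit_total_ok}")
print(f"duration:      F = {duration_f:.3f}, totals match: {duration_total_ok}")
```

With 3 and 996 degrees of freedom, F values of this size correspond to very small p-values, which is consistent with the reported significance of .000.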

Borrowing limits vary by occupation for a number of reasons. The average salary for

each profession is a major consideration. Executive roles, which tend to pay more, may allow

their employees to take out larger loans than entry-level retail ones. Job security is also a major

consideration. Borrowers who have established employment and a regular revenue stream may

find that lenders are more amenable to providing greater loan amounts. Lenders will feel more

secure about getting their money back because of this security.

The maximum loanable amounts vary by occupation and, to a lesser extent, by education

and training. Professionals with higher education requirements, such as physicians and attorneys,

may be eligible for larger loans. These general considerations may be supplemented by sector-

specific considerations. It's possible that different occupations within the same industry (or even

in other industries) will have different access to credit based on the specific lending standards in

place. Lending standards in the real estate business, for instance, might be different from those in

the Internet sector.

Purpose of the loan

The purpose of this section is to test how the purpose of a loan influences the credit amount
and the duration of the loan. In other words, are the average credit amount and credit
duration the same for all loan purposes? Two tests were conducted; the first examines how the
average credit amount is affected by the loan purpose. The hypothesis statements are
presented below.

Null hypothesis: The average credit amount is the same for all loan purposes

Alternative hypothesis: The average credit amount is not the same for all loan purposes

The resulting ANOVA test is presented below.

ANOVA
Credit amount

                 Sum of Squares    df    Mean Square    F        Sig.
Between Groups   684889703.720     7     97841386.246   13.341   .000
Within Groups    7274985923.716    992   7333655.165
Total            7959875627.436    999

As per the above test, the average credit amount is not the same for all loan purposes. This
is because the p-value is lower than 0.05, so the null hypothesis is rejected. The next test
looks at the impact of loan purpose on the credit duration.

Conclusion

The main objective of this analysis was to determine whether the credit amount borrowed and
the credit duration were the same for all sexes, job categories, and loan purposes. Two
statistical tests, the t-test and ANOVA, were used in the analysis. The findings revealed
significant differences in credit amount between males and females, with males borrowing a
higher amount on average. Additionally, credit duration was found to vary significantly
between sexes, with males having a longer duration on average. Furthermore, the ANOVA test
showed that job category had a significant influence on both credit amount and duration.
These findings highlight the importance of considering demographic variables when assessing
credit risk. The step-by-step explanation of the objectives and tests in this report supports
the interpretation of these findings for credit risk assessment. A financial institution
should consider these variables when setting limits on the loans it gives to individuals, in
order to reduce the risk of default. Doing so helps ensure that the loan amount and the loan
duration offered to an individual match what that individual is capable of repaying.



References

Aithal, V., & Jathanna, R. D. (2019). Credit risk assessment using machine learning techniques. International Journal of Innovative Technology and Exploring Engineering, 9(1), 3482-3486. https://doi.org/10.35940/ijitee.A4936.119119

Burger, T. (2022). Applying FDR control subsequently to large scale one-way ANOVA testing in proteomics: practical considerations. bioRxiv, 2022-08. https://doi.org/10.1101/2022.08.29.505664

Bussmann, N., Giudici, P., Marinelli, D., & Papenbrock, J. (2021). Explainable machine learning in credit risk management. Computational Economics, 57, 203-216. https://doi.org/10.1007/s10614-020-10042-0

Kim, T. K., & Park, J. H. (2019). More about the basic assumptions of t-test: normality and sample size. Korean Journal of Anesthesiology, 72(4), 331-335. https://doi.org/10.4097/kja.d.18.00292

Liu, Q., & Wang, L. (2021). t-Test and ANOVA for data with ceiling and/or floor effects. Behavior Research Methods, 53(1), 264-277. https://doi.org/10.3758/s13428-020-01407-

Miari, M., Anan, M. T., & Zeina, M. B. (2022). Neutrosophic two way ANOVA. International Journal of Neutrosophic Science, 18(3), 73-83. http://dx.doi.org/10.54216/IJNS.180306
