
INTRODUCTION:

Credit Scoring

Scoring is a way to apply statistical modeling to a representative database and generate a
numerical score for each borrower or loan. The score can be used to classify individual
borrowers or loans into risk categories.
A credit score can be defined as a numerical value or a categorization
derived from a statistical tool or modeling system used by a person who makes or
arranges a loan to predict the likelihood of certain credit behaviors, including default. The
numerical value or the categorization derived from such analysis may also be referred
to as a risk predictor or risk score.
Thus, a credit score is a number that is intended to predict a
borrower’s propensity to repay a loan; a loan score expands upon the credit score to
include variables relating to loan characteristics — for example, the loan-to-value ratio
for a home mortgage — to create a numerical indicator of the probability that a loan may
default.

CREDIT SCORING PRACTICES IN INDIA

A review of the prevailing credit scoring practices in India reveals the following points:

• Only a few of the Public Sector Banks in the country use credit scoring for evaluating
their retail credit applicants. The same is the case with the old private banks. Many of
these banks even today use judgmental methods for retail credit.
• Banks that use their in-house credit scoring models are not in the habit of regularly
back-testing and validating them. A few banks have been using the same scoring model for 7 to 10
years without revision.
• The same model is used for all types of loans: automobile, credit cards, housing,
personal loans, etc.
• Redlining, i.e. classifying areas as per the repayment track record, is not being done.
• The scoring models are used only at the appraisal stage. Behavioral scoring is not
practiced.
• Generic models are not in use. The banks using credit scoring have their own
customized models. These models are either developed by them or outsourced.
• The parameters/characteristics do not form a part of the application. They are also not
revealed to the applicant.

Rural Credit

The agrarian economy has an important place in India, even though it is no longer the sector
that contributes the highest percentage to the growth rate. The Indian rural scene is rather
dismal, with agriculture deteriorating into a rather unglamorous and unremunerative
sector. There is an urgent need to strengthen the social infrastructure so as to fight
illiteracy and poverty and, simultaneously, to strengthen the physical infrastructure such as
health services, roads, drinking water, dwelling houses, irrigation facilities, quality
agricultural inputs, processing units, produce storage units, and marketing. Bank
credit also has to be delivered in a timely and cost-effective manner.

India, the second fastest growing economy, powered by service sector growth, aided by
high quality intellectual capital and the boom in the manufacturing sector, still has about
28 crore people, predominantly ruralites, below the poverty line.

Adequately managing credit risk in financial institutions is critical for their survival and
growth. In the case of rural lending in general and agricultural lending in particular, the
issue of credit risk is of even greater concern because of the higher levels of perceived
risks resulting from some of the characteristics of rural dwellers and the conditions that
they find themselves in. Extremely poor people are more likely to live in rural than in urban
areas. In addition, fewer people are able to access basic infrastructure services, and these
tend to be of lesser quality or less reliable than in urban areas. Rural residents tend
to be less educated, more often than not they have insecure land tenure, and they live
farther apart than urban populations. Most importantly, agriculture, the mainstay of most
rural economies, tends to be subject to price volatility, weather shocks, and trade
restrictions. As a result, financial institutions that are active in rural areas are likely to
face an elevated level of credit risk and need to manage it well. The lack of good risk
mitigation techniques and high transaction costs can discourage formal financial
institutions from entering and serving rural areas.

In 2000, over seventy percent of India’s population, and roughly three quarters of its
poor, lived in rural areas. The main livelihood in rural India remains agriculture, an
activity characterized by significant time-lags in production and a high degree of
sensitivity to weather conditions. These features of agricultural production make access
to financial instruments critical to a rural household’s ability to smooth income shocks
and make long-term productive investments. However, as is well known, lenders’
inability to perfectly identify the credit-worthiness of potential borrowers and the cost of
enforcing repayment place severe restrictions on rural households’ access to credit.
These problems are potentially more severe for the rural poor who are less able to reduce
lender risk by providing collateral. This also has implications for the geographical
distribution of formal credit lenders. Anticipating insufficient profits, lenders such as
commercial banks may choose not to set up branches in relatively poor rural areas. This,
in turn, by giving lenders in the informal sector monopoly power, may further raise the
interest rates faced by the rural poor and restrict their access to affordable credit.

A belief that the welfare costs of exclusion from the banking sector, especially for the
rural poor, are high has led to widespread government intervention in the banking sector
of low income countries. Examples of such interventions range from interest rate ceilings
on lending to the poor to State-led branch expansion in rural areas. India was home to
some of the largest policy interventions aimed at providing banking for the poor. The
motivation for the Indian interventions can be traced to the All-India Credit Survey
Report (1954). This report showed that four years after independence, the informal credit
sector accounted for the bulk of rural lending, with moneylenders contributing close to 70
percent of the total. The average annual interest rate on these loans exceeded 20 percent.
In contrast, less than one percent of the borrowing was accounted for by commercial
banks. Commercial banks remained confined to urban areas and geared towards the
financing of trade and commerce activities. The survey data also showed a strong positive
relationship between asset ownership and borrowing. The Report concluded that financial
backwardness was a root cause of rural poverty, and that commercial banks needed to be
harnessed to enhance formal credit in rural areas — both to enable poor, rural households
to adopt new technologies and production processes, and to displace ‘evil’ moneylenders
who exploited their monopoly power to charge high rates of interest.

Commercial banking in rural India remained unprofitable. The average default rate for
commercial banks during the 1980s stood at 42 percent (as a share of all loans due for
repayment). Default rates were very similar across types of borrower — a finding
consistent with poor monitoring of borrowers at all levels, and the fact that large scale
loan defaults were very often politically condoned.

In the end, it was the relative unprofitability of rural banking which led to the demise of
social banking in India. In 1991, at the outset of liberalization of the Indian economy, the
Report of the Committee on the Financial System stated that redistributive objectives
“should use the instrumentality of the fiscal rather than the credit system” and that
directed credit programs be phased out and that branch licensing policy be revoked. As a
result, post-1991 rural branch expansion has been limited, and multiple studies suggest that
the access of the rural poor to the banking sector has declined. The share of rural banks in total
banks fell from 58 percent in 1990 to under 50 percent by 2000, and the share of
total bank credit that went to rural areas declined from 15.3 percent in 1988 to 10.6 percent
in 2000. The policy recommendation is that this reduction in formal sector lending
be filled by micro-credit institutions. Despite impressive advances by the Indian micro-credit
sector, it is still unclear whether it will be able to achieve a mobilization of rural
savings and a credit outreach equal to the achievements of the Indian social
banking experiment in the 1970s and 1980s.

Need for Credit for Rural Households

The rural population in India suffers from a great deal of indebtedness and is subject to
exploitation in the credit market through high interest rates and lack of convenient access
to credit. Rural households need credit to fund their day-to-day working capital needs,
to invest in agriculture, and to insure against minor spikes and troughs in
income and expenditure. Since cash flows and savings are small for the majority of
rural households, they typically tend to rely
on credit for other consumption needs such as education, food, housing, and household functions.
To meet these credit needs, rural households need access to financial institutions that
can provide them with credit at lower rates and on more reasonable terms than the traditional
moneylender, thereby helping the rural populace avoid the debt traps that are common in
rural India.

VARIOUS MODELS/METHODS PROPOSED FOR CREDIT SCORING

Introduction:

The objective of credit scoring is to develop models that accurately distinguish Good
applicants (likely to repay) from Bad applicants (unlikely to repay). Until recently, this
distinction was made using a judgmental approach by merely inspecting the application
form details of the applicant. The credit expert then decided upon the creditworthiness of
the applicant, using all possible relevant information concerning his socio-demographic
status, economic conditions, and intentions. The advent of data storage technology has
facilitated financial institutions' ability to store all information regarding the
characteristics and repayment behavior of credit applicants electronically. This has
motivated the need to automate the credit-granting decision by using statistical or
machine-learning algorithms.
Numerous methods have been proposed in the literature to develop credit-risk evaluation
models. These models include traditional statistical methods (e.g., logistic regression,
Steenackers and Goovaerts 1989), nonparametric statistical models (e.g., k-nearest
neighbour, Henley and Hand 1997, and classification trees, Davis et al. 1992) and neural
network models (Desai et al. 1996). Most of these studies focus primarily on developing
classification models with high predictive accuracy without paying any attention to
explaining how the classifications are being made. Clearly, this plays a pivotal role in
credit-risk evaluation, as the evaluator may be required to give a justification for why a
certain credit application is approved or rejected.
Conflicts often arise when comparing the conclusions of some of these studies. For
example, Desai et al. found that neural networks performed significantly better than
linear discriminant analysis for predicting the bad loans, whereas Yobas et al. reported
that the latter outperforms the former. Furthermore, most of these studies only evaluate a
limited number of classification techniques on one particular credit scoring data set.
Hence, the issue of which classification technique to use for credit scoring remains a very
difficult and challenging problem.
The intent of applicant scoring is to forecast the future behavior of a new credit applicant;
behavior scoring tries to predict the future payment behavior of an existing account. The
terms applicant score and behavior score are traditionally used in the context of
discriminant analysis, which is, by far, the most common quantitative technique in credit
management and which, accordingly, receives the most attention in this survey. Other
techniques used are logistic regression, decision trees, expert systems, neural networks,
integer programming, linear programming, and nearest neighbour method. Coffman and
Chandler (1983) observed that behavior scoring lacks the widespread use and the
industry acceptance that applicant scoring enjoys. Another 1983 paper reports on a
survey showing that only one third of the companies surveyed use credit scoring for other
than application scoring (Nelson 1983).

Literature review

1. Statistical methods:

Statistical models, called score-cards or classifiers, use predictor variables
from application forms and other sources to yield estimates of the
probabilities of defaulting.
An accept or reject decision is taken by comparing the estimated probability
of defaulting with a suitable threshold.
Standard statistical methods used in industry for developing score-cards are
Discriminant Analysis, Linear Regression, Logistic Regression and decision
trees.
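
As a hedged illustration of the accept/reject rule described above, the sketch below compares estimated default probabilities with a cutoff and reports the resulting approval rate and the bad rate among approved applicants. The probabilities, outcomes, and the function names `decide` and `summarise` are hypothetical and are not taken from the study's data.

```python
def decide(p_default, threshold):
    """Accept when the estimated probability of default is below the threshold."""
    return "accept" if p_default < threshold else "reject"


def summarise(p_defaults, defaulted, threshold):
    """Return (approval rate, bad rate among approved) for a given cutoff."""
    approved = [d for p, d in zip(p_defaults, defaulted)
                if decide(p, threshold) == "accept"]
    approval_rate = len(approved) / len(p_defaults)
    bad_rate = sum(approved) / len(approved) if approved else 0.0
    return approval_rate, bad_rate


# Hypothetical estimated default probabilities and observed outcomes (1 = defaulted).
p_defaults = [0.05, 0.10, 0.20, 0.35, 0.50, 0.80]
defaulted = [0, 0, 1, 0, 1, 1]

for cutoff in (0.15, 0.40, 0.90):
    rate, bad = summarise(p_defaults, defaulted, cutoff)
    print(f"cutoff={cutoff:.2f} approval={rate:.2f} bad_rate={bad:.2f}")
```

In this toy data a looser cutoff approves more applicants but also raises the share of approved loans that go bad, which is the trade-off a lender tunes when choosing the threshold.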

Discriminant analysis:

The first published account of the use of Discriminant Analysis to produce a
scoring system seems to be that of Durand (1941), who showed that the method
could produce good predictions of credit repayment.
Eisenbeis (1977, 1978) presented a critical assessment of the use of discriminant
analysis in business, finance and economics in general. The criticisms are
discussed in Rosenberg and Gleit (1994).
Other accounts of the use of discriminant analysis in credit scoring are given by
Myers and Forgy (1963), who compared discriminant analysis and regression
analysis, Lane (1972), Apilado et al. (1974) and Moses and Liao (1987).
Grablowsky and Talley (1981) compared linear discriminant analysis and probit
analysis by using data from a large Midwestern retail chain in the USA.

Problems in Applying Discriminant Analysis for Credit Scoring:

Several authors have expressed sharp criticisms regarding the use of discriminant
analysis in credit scoring. Many of these criticisms are really problems inherent in
credit scoring.
Capon (1982) cites several severe methodological problems: since the scoring
system is developed from a sample of people given credit, it is biased when
applied to people seeking credit; scoring systems are developed with samples that are too
small; and credit scorers use arbitrary judgment when assigning an applicant
to a category (e.g., is an executive assistant clerical or managerial?).
Galitz observes that, because scoring systems usually treat people with similar
characteristics identically, important exceptions may be missed.

Regression:

Ordinary linear regression has also been used in credit scoring. Since regression
using dummy variables for the class labels yields a linear combination of the
predicting characteristics which is parallel to the discriminant function
(Lachenbruch, 1975), this method might also be expected to perform reasonably.
Orgler (1970) used regression analysis in a model for commercial loans, and in
1971 used regression analysis to construct a score-card for evaluating outstanding
loans, rather than screening new applications. Since the valuation of outstanding
loans includes information about how the customer has performed so far, it is a
behavioral scoring model. He found that the behavioral characteristics were more
predictive of future loan quality than the application characteristics.
Other studies describing the use of regression include those of Fitzpatrick (1976),
Lucas (1992) and Henley (1995).

Logistic regression:

Logistic regression is a variation of ordinary regression which is used when the
dependent variable is a binary variable (i.e., it takes only two values, which
usually represent the occurrence or non-occurrence of some outcome event) and
the independent (input) variables are continuous, categorical, or both.
Logistic Regression has been widely used in the financial service industry for
credit scoring models.
On theoretical grounds we might suppose that logistic regression is a more
appropriate statistical tool than linear regression, given that two discrete classes
(good and bad) have been defined. In a comparative study, however, Henley
(1995) found that it was no better than the linear regression. He attributed this to
the fact that a relatively large proportion of the applicants whom he studied had
scores associated with estimated probabilities of being good risks between 0.2 and
0.8. When this is the case the logistic curve is very well approximated by a
straight line.
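
The near-linearity of the logistic curve in the middle of its range can be checked numerically. The sketch below is not Henley's analysis; it simply measures the largest gap between the logistic curve and the straight line joining its values at two probabilities, using hypothetical helper names.

```python
import math


def logistic(z):
    """Logistic (sigmoid) function: maps log-odds z to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Inverse of the logistic function: log-odds of probability p."""
    return math.log(p / (1.0 - p))


def max_gap_from_line(p_low, p_high, steps=1000):
    """Largest gap between the logistic curve and the chord joining its
    values at p_low and p_high, scanned along the log-odds axis."""
    z_low, z_high = logit(p_low), logit(p_high)
    slope = (p_high - p_low) / (z_high - z_low)
    gap = 0.0
    for i in range(steps + 1):
        z = z_low + (z_high - z_low) * i / steps
        line = p_low + slope * (z - z_low)
        gap = max(gap, abs(logistic(z) - line))
    return gap


print(max_gap_from_line(0.2, 0.8))    # small: the curve is nearly straight here
print(max_gap_from_line(0.01, 0.99))  # much larger once the tails are included
```

Between probabilities of 0.2 and 0.8 the gap stays below two percentage points, which is consistent with linear and logistic regression giving similar results when most applicants fall in that range.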

Decision Trees:

Another technique for credit decision making is recursive partitioning or decision
trees, developed in the early 1960s by H. Raiffa and his colleagues at the Harvard
Business School (Raiffa and Schlaifer 1961).
In 1972 David Sparks at the University of Richmond used a decision tree to build
a credit scoring model.
Decision trees have gained some popularity and received official recognition
when the Federal Reserve Board, in its published interpretation of the Equal
Credit Opportunity Act, called decision trees an "empirically derived,
demonstrably and statistically sound credit system."
A detailed mathematical discussion of decision trees is given in Breiman et al.
(1984).
One way to apply a decision tree is to associate with each node either the
probability of nonpayment or the profit for the set of people represented by the
node (Makowski). The probability at node ‘n’ will thus be the weighted average
(with the weights determined according to the number of people at a node) of the
probabilities of the children of node ‘n’.
To make a decision on an observation (account), we trace down the tree from the
root node, choosing the appropriate branches for the observation, until we reach
the proper leaf node; comparing the probability of non-payment or profit at the
node to a chosen cutoff yields the decision.
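
The two ideas above, node probabilities as weighted averages of the children and a decision reached by tracing from the root to a leaf, can be sketched as follows. The tree, its split rules, the account counts, and the cutoff are all hypothetical and serve only to illustrate the mechanics.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """A tree node holding the accounts that reach it."""
    n_accounts: int = 0
    p_nonpayment: float = 0.0
    # Internal nodes only: a test on the applicant and the two children.
    test: Optional[Callable[[dict], bool]] = None
    if_true: Optional["Node"] = None
    if_false: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.test is None


def internal(test, if_true, if_false):
    """Build an internal node whose probability is the account-weighted
    average of its children's probabilities."""
    n = if_true.n_accounts + if_false.n_accounts
    p = (if_true.n_accounts * if_true.p_nonpayment
         + if_false.n_accounts * if_false.p_nonpayment) / n
    return Node(n, p, test, if_true, if_false)


def decide(root, applicant, cutoff):
    """Trace from the root to a leaf and compare its probability to the cutoff."""
    node = root
    while not node.is_leaf():
        node = node.if_true if node.test(applicant) else node.if_false
    return "reject" if node.p_nonpayment > cutoff else "accept"


# Hypothetical tree: split on home ownership, then on income.
leaf_owner = Node(n_accounts=600, p_nonpayment=0.05)
leaf_high = Node(n_accounts=300, p_nonpayment=0.10)
leaf_low = Node(n_accounts=100, p_nonpayment=0.40)
renters = internal(lambda a: a["income"] >= 20000, leaf_high, leaf_low)
root = internal(lambda a: a["owns_home"], leaf_owner, renters)

print(round(root.p_nonpayment, 3))
print(decide(root, {"owns_home": False, "income": 12000}, cutoff=0.25))
```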

Integer Programming Approach:

Another major approach to making yes/no credit decisions on an individual basis
is the integer programming approach of Showers and Chakrin (1981) and Kolesar
and Showers (1985). They developed a model to determine which AT&T
telephone customers should be required to leave a deposit. The advantage of
deposits is that they provide protection against bad debt and also serve to deter
risky customers; on the other hand, there is a cost of administering a deposit
policy and they deter some profitable customers. While they wanted a simple
scoring rule, because the customer data were all binary, they felt that classical DA
(Discriminant Analysis) would not be appropriate. They also wanted all weights
on the variables to be 0 or 1 for public policy reasons. The binary data yield a
finite set of possible customer profiles, and they formulated a 0-1 knapsack
problem to determine which profiles should pass.
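
The exact objective and constraints used by Showers and Chakrin are not reproduced here. The sketch below is only a generic 0-1 knapsack over hypothetical customer profiles: it chooses which profiles pass (no deposit required) so as to maximise expected profit while keeping total expected bad debt within a budget. All figures and names are invented for illustration.

```python
def knapsack_01(values, weights, capacity):
    """Classic 0-1 knapsack by dynamic programming.

    Returns (best total value, sorted indices of the chosen items).
    Weights and capacity must be non-negative integers.
    """
    n = len(values)
    # best[i][c] = best value using the first i items with capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v
    # Walk back through the table to recover which items were chosen.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return best[n][capacity], sorted(chosen)


# Hypothetical customer profiles: expected profit if the profile passes
# (no deposit required) and the expected bad debt it brings.
profit = [60, 100, 120, 30]
bad_debt = [10, 20, 30, 25]
budget = 50  # maximum total expected bad debt the lender will tolerate

total, passed = knapsack_01(profit, bad_debt, budget)
print(total, passed)
```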

2. Non Parametric statistical Models:

Non-parametric methods, especially nearest neighbour methods, have been
explored for credit scoring applications, e.g. by Chatterjee and Barcun (1970),
Hand (1986) and Henley and Hand (1996).
The first of these studied personal loan applications to a New York bank and
classified them on the basis of the proportion of cases with identical characteristic
vectors which belonged to the same class (this is feasible since they had only 8
characteristics, all binary).
Hand (1986) compared a variety of classification methods, including nearest
neighbours and recursive partitioning classifiers, on a data set describing
applications for loans for home improvement.
Henley and Hand (1996) described a detailed investigation of nearest neighbour
methods applied to data from a large mail order company. In particular, they
investigated the choice of metric (how to define ‘nearest’) and the choice of the
number of nearest neighbors to consider.
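
A minimal nearest neighbour classifier makes the two design choices mentioned above explicit: the metric and the number of neighbours k. The sketch assumes a Euclidean metric and a small made-up training set; it is not the procedure used by Henley and Hand.

```python
import math
from collections import Counter


def euclidean(a, b):
    """One possible metric; the choice of metric is itself a modelling decision."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3, metric=euclidean):
    """Classify `query` by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: metric(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical applicants: (income in thousands, years in current job).
train_x = [(10, 1), (12, 2), (15, 1), (40, 8), (45, 10), (50, 12)]
train_y = ["bad", "bad", "bad", "good", "good", "good"]

print(knn_predict(train_x, train_y, (42, 9), k=3))
print(knn_predict(train_x, train_y, (13, 1), k=3))
```

In practice the characteristics would be scaled before distances are computed, since otherwise the variable with the largest numeric range dominates the metric.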

3. Neural Network:

The initial work on neural networks was motivated by the study of the human brain and
the idea of neurons as its building blocks.
Artificial intelligence researchers introduced a computing neuron model to simulate
the way neurons work in the human brain. This model provided the basis for many later
neural network developments.
Neural networks (Gallant 1988, Eberhart and Dobbins 1990, Nelson and
Illingworth 1990), which model information processing in the human brain,
consist of input, hidden, and output layers of interconnected neurons. The outputs of neurons in
one layer are combined according to a set of strengths and fed to the next
layer. These strengths allow the network to learn and store associations.
The development of a neural network for credit analysis requires a training stage
in which, for example, the network is given actual information about loan defaults
and successes along with the supporting credit application data (e.g., income and
occupation). This information is used to obtain a best set of strengths.
NNs have served as versatile tools for data analysis in a variety of complex
environments. In finance, they have been successfully applied to bankruptcy and
loan-default prediction and credit evaluation [DeLurgio and Hays (2001) and Jain
and Nag (1995)].
Neural networks have been used successfully in corporate credit decisions and in
fraud detection; though not yet applied to consumer credit, they are actively being
studied and show great promise.
Maves (1991) observes that as markets, products, and economics change, neural
networks can be "retrained" much more quickly than discriminant analysis based
techniques.
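
The layered structure described above (inputs combined through a set of strengths and passed to the next layer) can be sketched as a single forward pass. The strengths below are hand-picked, hypothetical values; the training stage that would estimate them from past loans is deliberately omitted, and the sigmoid activation is an assumption.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """Combine the previous layer's outputs using the connection strengths
    (weights) and pass each weighted sum through a sigmoid activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> hidden layer -> single output neuron."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)[0]


# Hypothetical, hand-picked strengths; in practice they are learned from
# past loans during the training stage.
hidden_w = [[0.8, -0.5], [-0.3, 0.9]]
hidden_b = [0.0, 0.1]
output_w = [[1.2, -1.1]]
output_b = [0.05]

applicant = [0.6, 0.2]  # e.g. scaled income and scaled existing debt
score = forward(applicant, hidden_w, hidden_b, output_w, output_b)
print(round(score, 3))
```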

Despite the tremendous benefits offered by knowledge discovery for businesses, neural
networks are not free from criticism. Most neural networks are of the "black box" kind. This
means that the tool can only produce conclusions without explanations and justification of
the reasons behind such conclusions. This makes accountability and reliability issues difficult
to address. That is why one of the main interests in knowledge discovery research is to find
ways to justify and to explain the knowledge discovery result. Other limitations concern the
high computational requirements of neural networks, usually in the form of computer power
and training time, and the scarcity of experts in the field, which makes some businesses avoid
their use (Marakas, 1999).

PROPOSED MODELS TO BE APPLIED ON THE PROVIDED DATA

Of the various models reviewed above, the models we are going to use are:
• Logit analysis/logistic regression
• Neural network
Using these models, our objective is to find the model that
predicts loan default more accurately.

REASONS FOR SELECTING THESE TWO MODELS

In scoring applications, we work with a binary variable, one that takes only two values:
whether a loan defaulted or not. Although linear regression is sometimes used to estimate
scoring models, logistic regression is preferred because it is designed specifically for the
case where we have a binary dependent variable.
A problem with linear regression is that it can produce predicted probabilities that are
greater than 1 or less than zero, which is not meaningful. The logistic
model prevents this by working with the odds of the event happening, p/(1-p), instead of the
probability p, and modeling the natural logarithm of the odds (hence the name logistic) as a
linear function of the predictors. The log-odds can take any real value, while the implied
probability always stays between 0 and 1. So,

ln(p/(1-p)) = B0 + B1*X1 + ... + Bn*Xn
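
To make the relationship concrete, the sketch below uses the standard log-odds form ln(p/(1-p)) and its inverse. The coefficients and applicant values are hypothetical; they are not the estimates obtained in this study.

```python
import math


def log_odds(p):
    """ln(p / (1 - p)) for a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def probability(b0, coefs, xs):
    """Invert the logit: p = 1 / (1 + exp(-(B0 + B1*X1 + ... + Bn*Xn)))."""
    z = b0 + sum(b * x for b, x in zip(coefs, xs))
    return 1.0 / (1.0 + math.exp(-z))


def odds_to_probability(odds):
    """p = odds / (odds + 1)."""
    return odds / (odds + 1.0)


# Hypothetical coefficients and applicant values, for illustration only.
b0, coefs = -1.0, [0.8, -0.4]
for xs in ([0, 0], [5, 1], [-3, 4]):
    print(xs, probability(b0, coefs, xs))  # stays between 0 and 1

# A one-unit increase in X1 multiplies the odds by exp(B1).
p_before = probability(b0, coefs, [1, 0])
p_after = probability(b0, coefs, [2, 0])
odds_ratio = (p_after / (1 - p_after)) / (p_before / (1 - p_before))
print(round(odds_ratio, 4), round(math.exp(coefs[0]), 4))
```

The last two printed values agree, which is why the exponentiated coefficient of a variable is read as the multiplicative change in the odds for a one-unit increase in that variable.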

We have then used neural networks to score new applicants because the lender does not
know how the prospect will perform on a loan. The score, therefore, is the prediction of
how the applicant will perform in the future, based on previous experience with similar
applicants. The model scores new individuals with unknown outcomes. Also, a particular
advantage offered by neural network is that it can capture nonlinear relationships.
Another reason for their success is that NNs are more capable than traditional statistics in
effectively modeling complex relationships. Dorsey et al. (1995) conducted an extensive
review of literature and found that neural network models consistently improve
bankruptcy forecasts. They found that classification with NNs performed better, both in-
sample and out-of-sample, than discriminant analysis and logistic regression. Further,
they can avoid problems resulting from strongly correlated independent variables. When
independent variables are highly correlated, as they often are (e.g., financial ratios
sharing the same numerator or denominator), substantial imprecision in estimation of
regression coefficients can result. Fortunately, these interrelationships are less likely to be
a problem in neural network models, as discussed by Klimasauskas (1996) and Tucker
(1998).

Practitioners in the financial services industry are also starting to benefit from the
improved accuracy of neural network-based credit scoring systems. American Express
and Security Pacific Bank are using neural credit scoring systems for credit card fraud
detection and small business loan approval, respectively. Lloyds Bowmaker Motor
Finance, which now employs a neural network credit scoring system for granting
automobile financing, claims that their neural network system is 10% more accurate than
the system they used previously [West (2000)].

EMPIRICAL STUDY:

Effective loan scoring requires large amounts of high-quality data. Many different types
of data are required, including information about loan origination and continuing loan
performance, borrower characteristics, and the financial outcome, i.e., whether the loan
prepays, becomes delinquent, defaults, or pays in full on time. The data must be available
for a long period of time so that important background factors, notably the robust
economy in recent years, can be taken into account in a predictive model.
To assess the effectiveness of credit scoring using statistical and data mining
methods, loan applicant data from ICICI Bank are used. We asked the bank for many
different data fields and, based on our hypothesis, classified these fields
into two major categories: qualitative and quantitative factors. The data
include clients' financial statements, clients' specifications, type and amount of loan, and
clients' performance in loan repayment.

In relation to loan application evaluation, the information that is regarded as important
includes:
• Character: the credit history of the applicant is certainly helpful
• Capital: how much money the applicant is asking for
• Collateral: what the applicant is willing to put up from
his/her own resources
• Capacity: the applicant's repaying ability is one of the most crucial factors; this
information can take the form of how much income he/she earns, how many jobs
he/she has, or how long he/she has been working in the current job
• The financial position of the applicant

ADVANTAGES OF RURAL CREDIT SCORING MODEL:

The credit scoring model developed here based on the sample data would help the bank
in the following way:
• Objectivity & quantitative assessment
• Improved (informed) decision making in a consistent manner
• Improved speed of loan processing as time and manual steps get reduced
• Improved customer service
• Cost efficiency - reduces loss rates while holding approval rates constant
• Allows for risk based pricing
• Improves approval rates while holding loss rates constant
• Reduces training time for new credit staff
• Sharpens/improves analytical skills of credit officers
• Strengthens the banker-borrower relationship because of transparency in the loan
decision-making process

3.3. Further Benefits for the Bank:

• Finance is risk management, and scoring facilitates risk management
• Quantifies risk as the % chance that something ‘bad’ will happen
• Makes risk evaluation explicit, systematic, consistent (not just loan officer’s ‘gut
feeling’)
• Quantifies risk’s links with characteristics
• Therefore, better risk management implies more loans with same effort, greater
outreach, more market share, and greater profits
• Greatest benefit: Strengthen the credit culture of explicit & conscious risk management

LIMITATIONS OF CREDIT SCORING

The accuracy of the scoring systems for underrepresented groups is still an open question.
Accuracy is a very important consideration in using credit scoring. Even if the lender can
lower its costs of evaluating loan applications by using scoring, if the models are not
accurate, these cost savings would be eaten away by poorly performing loans. The
accuracy of a credit scoring system will depend on the care with which it is developed.
The data on which the system is based need to be a rich sample of both well-performing
and poorly performing loans. The data should be up to date, and the models should be re-
estimated frequently to ensure that changes in the relationships between potential factors
and loan performance are captured. If the bank using scoring increases its applicant pool
by mass marketing, it must ensure that the new pool of applicants behaves similarly to the
pool on which the model was built; otherwise, the model may not accurately predict the
behavior of these new applicants. The use of credit scoring itself may change a bank’s
applicant pool in unpredictable ways, since it changes the cost of lending to certain types
of borrowers. Again, this change in applicant pool may hurt the accuracy of a model that
was built using information from the past pool of applicants. Account should be taken not
only of the characteristics of borrowers who were granted credit but also of those who
were denied. Otherwise, a “selection bias” in the loan approval process could lead to bias
in the estimated weights in the scoring model. A model’s accuracy should be tested. A
good model needs to make accurate predictions in good economic times and bad, so the
data on which the model is based should cover both expansions and recessions. And the
testing should be done using loan samples that were not used to develop the model in the
first place.
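
The out-of-sample testing recommended above can be sketched as a simple hold-out split together with a class-wise accuracy summary. The labels, predictions, split fraction, and function names are hypothetical; the sketch shows the bookkeeping only, not the validation actually carried out in this study.

```python
import random


def split_holdout(records, holdout_fraction=0.3, seed=0):
    """Shuffle the records and set aside a fraction for testing only."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * holdout_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (development, hold-out)


def classwise_accuracy(actual, predicted):
    """Share of good (1) and bad (0) loans classified correctly, plus overall."""
    result = {}
    for label, name in ((1, "good"), (0, "bad")):
        idx = [i for i, a in enumerate(actual) if a == label]
        hits = sum(1 for i in idx if predicted[i] == label)
        result[name] = hits / len(idx) if idx else None
    result["overall"] = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
    return result


# Hypothetical labels: 1 = good loan, 0 = bad loan.
actual = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
predicted = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
print(classwise_accuracy(actual, predicted))

development, holdout = split_holdout(range(10), holdout_fraction=0.3)
print(len(development), len(holdout))
```

The model would be estimated on the development records only and the class-wise accuracy reported on the hold-out records.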

METHODOLOGY

The following steps were taken to build the model for credit scoring.
• First, we performed factor analysis of the given data in SPSS to understand which
variables can be grouped together so as to reduce them to a smaller number of factors.
• After reviewing the output and noting the insignificant variables that had very little
impact on the creditworthiness of the customers, these variables were
eliminated and factor analysis was run again.
• Logistic regression was then run with all the independent variables taken initially.
• Next, the raw data (with all of the independent variables taken in the first factor analysis)
were used again as independent variables to run Statistica Neural Network (SNN).
• SNN was then run again with the few selected variables to compare the difference in
accuracy.
• Finally, SNN was run again using the factor scores.

RESULTS AND SUGGESTIONS

FACTOR ANALYSIS:
• When factor analysis was done using the initial independent variables, the
variables were reduced to 7 factors, named Luxurious utilities, Net Liability,
Total Income, Maturity, Commuting mode and Personal factors respectively,
explaining 63.565% of the variation.
• After removing all the insignificant variables, when factor analysis was run using
the selected few variables, these reduced to 6 factors explaining 73.16% of the
variation.
• Taking these 6 factors into consideration, the following patterns were found:

• Good customers were found to own more luxury items than bad customers,
although, as an exception, bad customers had more washing
machines than good customers.
• Similarly, a larger number of people possessing four-wheelers defaulted.
7000

6000

5000
No. of customers

Count of GOOD_CUS
4000 Sum of TV
Sum of MS
3000 Sum of FRIDGE
Sum of WM
2000 Sum of TW
Sum of FW
1000

0
0 1
Luxury items

[Figure: AGE. Share of customers by age group (1-10, 11-20, 21-30, 31-40, 41-50, 51-60,
61-70, 71-80, 101-110).]
[Figure: Customers by age group and customer class. X-axis: Age Group (1-10 to 101-110).
Y-axis: No. of Customers (0 to 2000). Series: GOOD_CUS = 0 and GOOD_CUS = 1.]

[Figure: Customers by educational qualification and customer class. X-axis: Educational
Qualification (codes 0 to 5), grouped by GOOD_CUS (0, 1). Y-axis: No. of Customers
(0 to 3000). Series: Total.]

 The largest number of loans was taken by customers in the age group of 31-40
years.
 The age group of 11-20 years had the highest share of defaulters (72.22%) as a
percentage of the total number of customers in that age group. As a percentage
of the total number of defaulters, the largest share also belonged to the same age group.
 By educational qualification, most customers held some other qualification.
Degree holders (graduates) formed the second-largest group.
 As a percentage of the total number of customers in each qualification group, the
highest share of defaulters was among those who had completed Higher Secondary.
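
The two percentages quoted above measure different things: the share of defaulters within a group, and a group's share of all defaulters. A minimal sketch of both calculations follows. The toy sample and the function name `default_shares` are assumptions for illustration and are not the bank's data; only the GOOD_CUS coding (1 = good, 0 = defaulter) is taken from the report.

```python
from collections import Counter


def default_shares(customers):
    """customers: iterable of (group, good_cus) pairs, where good_cus is
    1 for a good customer and 0 for a defaulter.
    Returns {group: (share of the group that defaulted,
                     group's share of all defaulters)}."""
    totals = Counter()
    defaults = Counter()
    for group, good_cus in customers:
        totals[group] += 1
        if good_cus == 0:
            defaults[group] += 1
    all_defaults = sum(defaults.values())
    shares = {}
    for group, total in totals.items():
        within_group = defaults[group] / total
        of_all_defaulters = defaults[group] / all_defaults if all_defaults else 0.0
        shares[group] = (within_group, of_all_defaulters)
    return shares


# Hypothetical toy sample: (age group, GOOD_CUS).
sample = [
    ("11-20", 0), ("11-20", 0), ("11-20", 1),
    ("31-40", 0), ("31-40", 0), ("31-40", 1), ("31-40", 1), ("31-40", 1),
]
shares = default_shares(sample)
for group, (within_group, of_all) in sorted(shares.items()):
    print(f"{group}: {within_group:.1%} of the group defaulted, "
          f"{of_all:.1%} of all defaulters")
```

In this toy sample the two groups contribute equally to the total number of defaulters, yet the smaller group has the higher default rate, which is why the two measures are reported separately.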

LOGIT REGRESSION

With initial independent variables:

 Running the logit analysis with the initial independent variables gave an overall
correct prediction rate of 93.2% for good and bad customers. Within this, 98% of bad
customers were predicted correctly, against only 89% of good customers.
 The classification table and the variables in the equation, with their respective
exponential betas, are given in ANNEXURE I.
 Based on the exponential beta, Exp(B), of each independent variable, the probability of
the customer paying back the loan is calculated as
p = odds / (odds + 1)
 Exponential betas greater than or equal to 1 were of interest to us because only such
values convert to a probability of 0.5 or more. Not all such variables are statistically
significant, but a one-unit increase in any of them raises the customer's chance of
paying back the loan. For example, if a customer owns one more music system than he
currently does, the odds of repayment are multiplied by Exp(B) = 1.96, which the
formula above converts to a probability of 66.20%.
 On this basis, the important variables were dependents, age, down payment, music
system and qualification description.
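
The conversion p = odds / (odds + 1) can be checked against the Probability column of ANNEXURE I with a short sketch. The Exp(B) values below are read from that annexure; the function name `odds_to_probability` is an assumption for illustration.

```python
def odds_to_probability(odds):
    """Convert odds to a probability: p = odds / (odds + 1)."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (odds + 1.0)


# Exp(B) values as printed in ANNEXURE I for the variables singled out above.
exp_b = {
    "DEPENDEN": 1.124759,
    "AGE": 1.023715,
    "DOWNPAYM": 1.000061,
    "MS": 1.958885,
    "QUAL_DES_A": 1.192443,
}

for name, odds in exp_b.items():
    print(f"{name:<11} Exp(B) = {odds:.6f}   p = {odds_to_probability(odds):.6f}")
```

An Exp(B) of exactly 1 maps to p = 0.5, so the threshold of 1 used in the text is the same as asking whether the converted value is at least one half.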

The logistic regression was run again with the few selected variables, which gave the
following results (ANNEXURE II):
 92.8% of cases were predicted correctly.
 The important variables were income, age, other income, music system, four wheeler,
receivable, received and qualification description.

Finally, the logistic regression was run using the factor scores, and the results
obtained were as follows (ANNEXURE III):
 91.6% of cases were predicted correctly.
 The important factors were factor 3 and factor 6, i.e. total income and personal factors.

Thus logistic regression gave its best results when all the independent variables
were taken into consideration, since this predicted the outcome most
accurately.

NEURAL NETWORK
 SNN was first run using the initial independent variables. The area under the ROC
curve was 96.8%, which we take as the share of results that SNN can predict accurately
(ANNEXURE IV).
 When SNN was run using the few selected variables, the area under the ROC curve came
to 94.69% (ANNEXURE V).
 Finally, running SNN on the factor scores, the area under the ROC curve came to
95.89% (ANNEXURE VI).
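
For readers unfamiliar with the measure, the area under the ROC curve can be computed directly from model scores as the probability that a randomly chosen good customer is scored above a randomly chosen bad one. The sketch below is a generic illustration on made-up labels and scores; it is not SNN output, and the function names are assumptions.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed pairwise: the fraction of
    (positive, negative) pairs in which the positive case has the higher
    score. Tied scores count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy_at_cut(labels, scores, cut=0.5):
    """Share of cases classified correctly when scores above `cut` are
    predicted as 1 and all others as 0."""
    predicted = [1 if s > cut else 0 for s in scores]
    hits = sum(1 for y, p in zip(labels, predicted) if y == p)
    return hits / len(labels)


# Made-up example: 1 = good customer, 0 = bad customer.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(f"AUC      = {roc_auc(labels, scores):.3f}")
print(f"Accuracy = {accuracy_at_cut(labels, scores):.3f}")
```

The two figures differ in this example because the area under the curve measures how well the scores rank customers, while accuracy depends on the chosen cut value.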

SNN led to the same conclusion, although reducing the variables to factors raised the
accuracy above that of the selected-variable run (95.89% against 94.69%).

Comparing the three runs, we found that both logistic regression and the neural network
have a very high degree of accuracy and predict loan default most accurately when all the
independent variables are taken into consideration.

Now that the model has been built, when a new customer arrives with a loan application
the bank can use it to estimate the probability that the customer will default, and so
decide whether or not to grant the requested loan.
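
A minimal sketch of how such a score could be computed for a new applicant is given below, using the B coefficients and constant of the logistic regression in ANNEXURE I and the cut value of .500 from its classification table. The applicant's values are hypothetical, and the units and coding of each variable (for example QUAL_DES_A) are assumptions, since the report does not state them; the function names are illustrative.

```python
import math

# B coefficients and constant as printed in ANNEXURE I.
COEFFICIENTS = {
    "DEPENDEN": 0.117568, "CHILDREN": -0.36313, "INCOME": -2e-06,
    "EXPERIEN": -0.04514, "RENT": -6.5e-05, "AGE": 0.023438,
    "DOWNPAYM": 6.15e-05, "OTHINCOM": -4.5e-05, "TV": -0.37194,
    "MS": 0.672375, "FRIDGE": -0.20884, "WM": -0.4269,
    "TW": -0.51082, "FW": 0.405038, "RCVBL": 0.005423,
    "RCVD": 0.006598, "QUAL_DES_A": 0.176004,
}
CONSTANT = -4.22582


def repayment_probability(applicant):
    """Logistic model: p = 1 / (1 + exp(-(constant + sum of B * x))).
    Variables missing from the applicant record are treated as 0."""
    logit = CONSTANT + sum(
        b * applicant.get(name, 0.0) for name, b in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-logit))


def classify(applicant, cut=0.5):
    """Label the applicant 'good' when the predicted probability of
    GOOD_CUS = 1 exceeds the cut value, otherwise 'bad'."""
    return "good" if repayment_probability(applicant) > cut else "bad"


# Hypothetical applicant; every value here is made up for illustration.
applicant = {
    "DEPENDEN": 2, "CHILDREN": 1, "INCOME": 120000, "EXPERIEN": 10,
    "RENT": 0, "AGE": 35, "DOWNPAYM": 50000, "OTHINCOM": 0,
    "TV": 1, "MS": 1, "FRIDGE": 1, "WM": 0, "TW": 1, "FW": 0,
    "RCVBL": 0, "RCVD": 0, "QUAL_DES_A": 3,
}
p = repayment_probability(applicant)
print(f"Predicted probability of repayment: {p:.3f} -> {classify(applicant)}")
```

Raising MS by one unit in this sketch multiplies the applicant's odds of repayment by Exp(B) for MS, which is how the exponential betas in the annexure are meant to be read.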

Summary Conclusions:

This section presents the credit scoring model for rural loans that we have developed
from the sample data obtained from the Bank. After giving an overview of the key
issues in credit risk management of rural loans, we discussed the major risk drivers and
their importance in assessing the creditworthiness of borrowers. Since banks are firms
balancing risk and return characteristics among alternative opportunities, they cannot
avoid risk. Credit risk is the largest risk faced by banks, even in rural loans. This implies
that rural exposures can typically be managed on a portfolio basis, as many exposures in
the same portfolio have similar risk characteristics. This will enable the bank to diversify
risk and optimize profit in the business, which will ultimately enable it to
comply with the Basel II requirements under the advanced approach. In this direction, we
have also suggested how to use this scoring model for portfolio management of risk. It is
important to note that the entire exercise is based on sample data from a specific region.

To obtain a robust model and a robust tool for mitigating risk in agricultural loans,
which are perceived as risky, for the entire bank, one has to enlarge the data sample and
include many regions in the analysis.
ANNEXURE I

Classification Table (a)

                                   Predicted GOOD_CUS
Observed                           0        1        Percentage Correct
Step 1   GOOD_CUS   0              5436     108      98.1
                    1              717      5790     89.0
         Overall Percentage                          93.2

a. The cut value is .500
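
The percentages in the classification table above follow directly from its four counts. A minimal sketch of the arithmetic is given below; the function and argument names are assumptions for illustration.

```python
def classification_rates(bad_as_bad, bad_as_good, good_as_bad, good_as_good):
    """Return (% of bad customers predicted correctly,
               % of good customers predicted correctly,
               overall % predicted correctly)."""
    bad_total = bad_as_bad + bad_as_good
    good_total = good_as_bad + good_as_good
    bad_rate = 100.0 * bad_as_bad / bad_total
    good_rate = 100.0 * good_as_good / good_total
    overall = 100.0 * (bad_as_bad + good_as_good) / (bad_total + good_total)
    return bad_rate, good_rate, overall


# Counts from the classification table (rows: observed, columns: predicted).
bad_rate, good_rate, overall = classification_rates(5436, 108, 717, 5790)
print(f"Bad customers correct:  {bad_rate:.1f}%")
print(f"Good customers correct: {good_rate:.1f}%")
print(f"Overall correct:        {overall:.1f}%")
```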


Variables in the Equation

                                                                              95.0% C.I. for EXP(B)
                    B         S.E.      Wald      df  Sig.      Exp(B)    Lower        Upper     Probability
Step 1  DEPENDEN    0.117568  0.017577  44.73981  1   2.25E-11  1.124759  1.086670324  1.164182  0.529358
        CHILDREN    -0.36313  0.273642  1.761049  1   0.184494  0.695493  0.406788345  1.189095  0.410201
        INCOME      -2E-06    2.61E-06  0.616354  1   0.432405  0.999998  0.999992833  1.000003  0.499999
        EXPERIEN    -0.04514  0.006001  56.57787  1   5.4E-14   0.955868  0.944692357  0.967177  0.488718
        RENT        -6.5E-05  7.46E-05  0.757171  1   0.384215  0.999935  0.999788883  1.000081  0.499984
        AGE         0.023438  0.001242  356.0435  1   2.05E-79  1.023715  1.021225487  1.02621   0.505859
        DOWNPAYM    6.15E-05  6.2E-06   98.38505  1   3.44E-23  1.000061  1.000049316  1.000074  0.500015
        OTHINCOM    -4.5E-05  1.74E-05  6.604667  1   0.010171  0.999955  0.999921148  0.999989  0.499989
        TV          -0.37194  0.160524  5.368675  1   0.020502  0.689395  0.503302972  0.944292  0.408072
        MS          0.672375  0.142501  22.26334  1   2.38E-06  1.958885  1.481535934  2.590035  0.662035
        FRIDGE      -0.20884  0.100175  4.346334  1   0.037089  0.811522  0.666854986  0.987574  0.447978
        WM          -0.4269   0.11919   12.82819  1   0.000341  0.65253   0.516589813  0.824244  0.394867
        TW          -0.51082  0.128168  15.88469  1   6.73E-05  0.600003  0.466720731  0.771348  0.375001
        FW          0.405038  0.327066  1.533625  1   0.215569  1.499359  0.789780182  2.846461  0.599897
        RCVBL       0.005423  0.01483   0.13373   1   0.714595  1.005438  0.976633834  1.035092  0.501356
        RCVD        0.006598  0.024066  0.075156  1   0.783972  1.00662   0.960240348  1.055239  0.501649
        QUAL_DES_A  0.176004  0.022836  59.4022   1   1.29E-14  1.192443  1.140248576  1.247027  0.543888
        Constant    -4.22582  0.160812  690.5362  1   3.4E-152  0.014613

ANNEXURE II
Classification Table (a)

                                   Predicted GOOD_CUS
Observed                           0        1        Percentage Correct
Step 1   GOOD_CUS   0              5451     93       98.3
                    1              775      5732     88.1
         Overall Percentage                          92.8

a. The cut value is .500

Variables in the Equation

                                                           95.0% C.I. for EXP(B)
                  B       S.E.   Wald     df  Sig.   Exp(B)  Lower   Upper   Probability
Step 1  CHILDREN  -0.593  0.286  4.309    1   0.038  0.552   0.315   0.967   0.35567
        INCOME    0       0      0.137    1   0.711  1       1       1       0.5
        AGE       0.025   0.001  418.536  1   0      1.025   1.023   1.027   0.506173
        OTHINCOM  0       0      9.294    1   0.002  1       1       1       0.5
        TV        -0.319  0.155  4.226    1   0.04   0.727   0.537   0.985   0.420961
        MS        0.738   0.139  28.399   1   0      2.092   1.595   2.745   0.676585
        FRIDGE    -0.367  0.096  14.601   1   0      0.693   0.574   0.836   0.409333
        WM        -0.33   0.115  8.171    1   0.004  0.719   0.574   0.902   0.418266
        TW        -0.51   0.124  16.894   1   0      0.6     0.471   0.766   0.375
        FW        0.427   0.32   1.79     1   0.181  1.533   0.82    2.869   0.605211
        RCVBL     0.005   0.011  0.222    1   0.638  1.005   0.984   1.026   0.501247
        RCVD      0.008   0.018  0.198    1   0.656  1.008   0.974   1.043   0.501992
        QUAL_DES  0.166   0.022  55.946   1   0      1.18    1.13    1.233   0.541284
        Constant  -3.617  0.136  704.676  1   0      0.027

ANNEXURE III
Classification Table (a)

                                   Predicted GOOD_CUS
Observed                           0        1        Percentage Correct
Step 1   GOOD_CUS   0              5498     46       99.2
                    1              966      5541     85.2
         Overall Percentage                          91.6

a. The cut value is .500

Variables in the Equation

                                                             95.0% C.I. for EXP(B)
                  B       S.E.   Wald     df  Sig.   Exp(B)    Lower     Upper     Probability
Step 1  FAC1_2    1.206   0.046  680.45   1   0      3.338     3.049     3.655     0.769479
        FAC2_2    19.732  0.723  744.264  1   0      3.71E+08  8.99E+07  1.53E+09  1
        FAC3_2    0.545   0.169  10.42    1   0.001  1.724     1.239     2.4       0.632893
        FAC4_2    -1.223  0.052  563.753  1   0      0.294     0.266     0.326     0.227202
        FAC5_2    -0.212  0.048  19.736   1   0      0.809     0.737     0.889     0.447208
        FAC6_2    0.138   0.041  11.47    1   0.001  1.148     1.06      1.243     0.534451
        Constant  10.75   0.448  577.026  1   0      46636.75

ANNEXURE IV

ROC Curve:
Misclassification table:

          v1     v2     v1     v2     v1     v2
Total     2647   3379   1488   1525   1409   1603
Correct   2452   3101   1370   1424   1293   1479
Wrong     195    278    118    101    116    124
Unknown   0      0      0      0      0      0
v1        2452   278    1370   101    1293   124
v2        195    3101   118    1424   116    1479

ANNEXURE V

ROC curve:
Misclassification table:

          v1     v2     v1     v2     v1     v2
Total     2624   3402   1495   1518   1425   1587
Correct   2561   2922   1466   1305   1392   1365
Wrong     63     480    29     213    33     222
Unknown   0      0      0      0      0      0
v1        2561   480    1466   213    1392   222
v2        63     2922   29     1305   33     1365

ANNEXURE VI

ROC Curve:
Misclassification table:

          v1     v2     v1     v2     v1     v2
Total     2561   3465   1517   1496   1466   1546
Correct   2343   3166   1400   1379   1328   1406
Wrong     218    299    117    117    138    140
Unknown   0      0      0      0      0      0
v1        2343   299    1400   117    1328   140
v2        218    3166   117    1379   138    1406

