
IMPLEMENTATION OF DATA SCIENCE AND MACHINE LEARNING IN ALTERNATIVE CREDIT SCORING

Based on the data available for alternative credit scoring, we collected the data and organized the
questionnaire into the following categories:
a) Personal information for loan agreement
b) Family Segment
c) Home Appliances & Vehicle
d) Financial Behavior

Each category contains a set of attributes (questions) that help evaluate the creditworthiness
of an individual customer.

a) First comes the personal information for the loan agreement. We set up the following questions to
evaluate the customer through their personal information:
(1) Age
(2) Relationship Status
(3) Physical Disability
(4) Residential Area
(5) Highest level of education
(6) Rent or own your residence

We calculated the age of each applicant from the customer's date of birth. For Relationship Status,
we grouped the responses into the following categories: Single, Married and Separated.
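
The age calculation described above can be sketched in Python as follows. This is a minimal illustration, not the code used in the study: the function name `age_in_years` and the sample dates are assumptions made for the example.

```python
from datetime import date

def age_in_years(date_of_birth: date, on_date: date) -> int:
    """Return the number of completed years between date_of_birth and on_date."""
    years = on_date.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in on_date's year.
    if (on_date.month, on_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

# Hypothetical applicant, used only for illustration.
print(age_in_years(date(1990, 8, 15), date(2024, 3, 1)))  # 33
```

Passing the reference date explicitly, rather than reading the system clock inside the function, keeps the result reproducible when the same application is scored again later.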

Next, we calculate conditional probabilities in R to derive the scores for the different
categories.

1) First comes the Relationship Status.

Here we calculate the probability of each Relationship Status category given that the loan is approved,
based on the previously available dataset.

P(Single | Loan Status = "Y") = 0.4359

P(Married | Loan Status = "Y") = 0.313

P(Widow | Loan Status = "Y") = 0.2511

We then allocate scores based on these probabilities: the higher the probability, the higher
the score.
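
As a rough sketch of this step: the source performs the calculation in R, while the example below uses Python only to stay consistent with the modelling step later in the document. The field names (`relationship_status`, `loan_status`), the toy records, and the rank-based scoring rule are assumptions for illustration. The rank rule does reproduce the Relationship Status scores (3, 2, 1), but the source does not state that every table was derived this way.

```python
from collections import Counter

def conditional_probabilities(records, attribute, target="loan_status", outcome="Y"):
    """Estimate P(attribute = value | target = outcome) from a list of dicts."""
    approved = [r[attribute] for r in records if r[target] == outcome]
    if not approved:
        raise ValueError("no records with the requested outcome")
    counts = Counter(approved)
    return {value: count / len(approved) for value, count in counts.items()}

def rank_scores(probabilities):
    """Give score 1 to the least likely category, 2 to the next, and so on."""
    ordered = sorted(probabilities, key=probabilities.get)
    return {value: rank for rank, value in enumerate(ordered, start=1)}

# Toy records (hypothetical, NOT the real dataset): 3 Single, 2 Married, 1 Widow approved.
toy_records = (
    [{"relationship_status": "Single", "loan_status": "Y"}] * 3
    + [{"relationship_status": "Married", "loan_status": "Y"}] * 2
    + [{"relationship_status": "Widow", "loan_status": "Y"}]
    + [{"relationship_status": "Married", "loan_status": "N"}]
)
toy_probs = conditional_probabilities(toy_records, "relationship_status")

# Probabilities reported in the text for Relationship Status.
reported = {"Single": 0.4359, "Married": 0.313, "Widow": 0.2511}
scores = rank_scores(reported)
print(scores)  # {'Widow': 1, 'Married': 2, 'Single': 3}
```

Ties between categories would receive arbitrary distinct ranks under this rule, so a real implementation would need an explicit tie-breaking policy.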

The scores for all categories are listed below:


For the Personal Information segment:

Relationship Status
Value                 Score
Single                3
Married               2
Widow                 1

Physical Disability
Value                 Score
Yes                   0
No                    1

Highest level of education
Value                 Score
No formal education   1
School                2
High School           3
College               4

Residential Area
Value                 Score
Urban                 2
Rural                 1

Rent or own your residence
Value                 Score
Rented                1
Owned                 2
Other                 0

Next comes the Family Segment:

Dependent members
Value                 Score
Greater than 4        1
3 to 4                2
1 to 2                3
0                     4

Does the customer use LPG
Value                 Score
Yes                   2
No                    1

Does the household own any appliances
Value                 Score
Fridge                4
TV                    3
Oven                  2
Mobile                1

Does the household own any motor vehicle
Value                 Score
Car                   3
Others                2
None                  1

Are you employed / other
Value                 Score
Employed              3
Self-employed         2
Unemployed            1
Student               0

Lastly comes the Financial Segment:

For Daily Income and Expenditure and Monthly Medication Expenses, we record exact numerical values
rather than ranges. We also record the Loan Amount and Loan Period of each loan applicant as exact
numeric values.

Nature of income
Value                    Score
Temporary/Hawking        1
Permanent Market Place   3
Others                   2

Any significant medical expenses
Value                    Score
Yes                      0
No                       1

Credit History
Value                    Score
Yes                      2
Not available            1
No                       0
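
One way to apply score tables like these is a simple lookup per question. The sketch below copies a few of the tables from this document. The dictionary keys (`relationship_status`, `daily_income`, and so on), the function name `encode_applicant`, and the sample applicant are hypothetical names chosen for the example, not part of the original questionnaire.

```python
# Score tables copied from this document; the key names are illustrative assumptions.
SCORE_TABLES = {
    "relationship_status": {"Single": 3, "Married": 2, "Widow": 1},
    "physical_disability": {"Yes": 0, "No": 1},
    "education": {"No formal education": 1, "School": 2, "High School": 3, "College": 4},
    "residential_area": {"Urban": 2, "Rural": 1},
    "residence": {"Rented": 1, "Owned": 2, "Other": 0},
    "credit_history": {"Yes": 2, "Not available": 1, "No": 0},
}

def encode_applicant(answers):
    """Replace each categorical answer with its score from SCORE_TABLES."""
    encoded = {}
    for field, value in answers.items():
        table = SCORE_TABLES.get(field)
        # Fields without a table (e.g. daily income) are assumed numeric and kept as-is.
        encoded[field] = table[value] if table is not None else value
    return encoded

# Hypothetical applicant, used only for illustration.
applicant = {
    "relationship_status": "Married",
    "education": "College",
    "residence": "Owned",
    "credit_history": "Not available",
    "daily_income": 450.0,
}
encoded = encode_applicant(applicant)
print(encoded)
```

An answer that is missing from its table raises a `KeyError`, which surfaces data-entry problems early instead of silently assigning a score.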

Next is the implementation of Machine Learning:

We implemented Logistic Regression to assess whether an applicant is eligible for a loan.

• Logistic regression is a commonly used classification algorithm in machine learning that assigns
data to discrete classes by learning the relationship from a given set of labelled data. We split
our labelled dataset of historic loan applications into train and test sets: 80% is used to train
a logistic regression risk model, and the remaining 20% (our test set) is used to predict the
likelihood of loan repayment and to measure the accuracy of the model.

• Finally, we would use the fitted logistic regression model to identify the business rules for
accepting or rejecting any new loan application.

• An applicant whose predicted probability from the logistic regression model crosses the threshold
value will be eligible for a loan.

• We are implementing this model in Python.
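
The workflow in the bullets above (80/20 split, logistic regression, threshold decision) can be sketched as follows. This is a dependency-free illustration under stated assumptions, not the authors' implementation: the data is synthetic, the two features (education and credit history scores) and the 0.5 threshold are chosen only for the example, and in practice a library such as scikit-learn would typically be used instead of hand-written gradient descent.

```python
import math
import random

def sigmoid(z):
    # Clamp z so math.exp cannot overflow for extreme inputs.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """Predicted probability that the loan is approved (label 1)."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def train(rows, labels, epochs=1000, lr=0.1):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(rows, labels):
            error = predict_proba(weights, bias, x) - y
            grad_b += error
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias

# Synthetic stand-in for the scored questionnaire data (NOT the real dataset).
rng = random.Random(0)
rows, labels = [], []
for _ in range(200):
    education = rng.randint(1, 4)        # score range of the education table
    credit_history = rng.randint(0, 2)   # score range of the credit history table
    rows.append([education, credit_history])
    # Assumed labelling rule, used only to generate example labels.
    labels.append(1 if education + 2 * credit_history + rng.gauss(0, 1) > 4.5 else 0)

# 80% of the rows for training, the remaining 20% for testing.
split = int(0.8 * len(rows))
train_x, test_x = rows[:split], rows[split:]
train_y, test_y = labels[:split], labels[split:]

weights, bias = train(train_x, train_y)

THRESHOLD = 0.5  # assumed cut-off; the source does not state the value used
predictions = [1 if predict_proba(weights, bias, x) >= THRESHOLD else 0 for x in test_x]
accuracy = sum(p == y for p, y in zip(predictions, test_y)) / len(test_y)
print(f"test accuracy: {accuracy:.2f}")
```

The threshold is kept as a separate constant because it is a business decision: raising it approves fewer applicants with higher predicted repayment likelihood, and lowering it does the opposite.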
