
Implementation of Machine Learning and Data Science in Alternative Credit Scoring Technique

An Initiative by Women’s Micro Bank


Proposal & Implementation by Adzguru (PNG) Limited

Available Parameters for Evaluation of Loan Agreement

We have segregated the available data into the following categories:

a) Personal information for loan agreement
b) Family Segment
c) Home Appliances & Vehicle
d) Financial Behavior
Within each category we have a set of attributes/questions that helps evaluate the creditworthiness of each individual customer.
Creation of the Application in Python

We are building the entire application in Python, with the Pandas and NumPy libraries installed for data analysis. Pandas helps us analyze tabular data, and NumPy performs mathematical operations on arrays. Using randomized data, we will build the prediction model.
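A minimal sketch of how such a randomized dataset could be generated with NumPy and Pandas is shown below. Only a handful of the 20 attributes are included, and the column names, value ranges, random seed and randomly assigned Loan_Status label are illustrative assumptions, not the bank's actual data.

```python
import numpy as np
import pandas as pd


def make_random_applicants(n_rows: int, seed: int = 42) -> pd.DataFrame:
    """Build a hypothetical randomized applicant table (illustration only)."""
    rng = np.random.default_rng(seed)  # seeded so the data is reproducible
    return pd.DataFrame({
        # Categorical answers drawn from the questionnaire's value lists
        "Relationship_Status": rng.choice(["Single", "Married", "Widow"], size=n_rows),
        "Residential_Area": rng.choice(["Urban", "Rural"], size=n_rows),
        # Numeric answers: 0-6 dependents and a daily income between 10 and 100
        # (assumed ranges; the source does not state units or limits)
        "Dependents": rng.integers(0, 7, size=n_rows),
        "Daily_Income": rng.uniform(10.0, 100.0, size=n_rows).round(2),
        # Random "Y"/"N" label standing in for historic loan outcomes
        "Loan_Status": rng.choice(["Y", "N"], size=n_rows),
    })


df = make_random_applicants(200)
print(df.head())
print(df["Loan_Status"].value_counts())
```

Seeding the generator keeps the randomized data identical between runs, which makes model results comparable while the application is being tuned.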
Data Collection for Application

Here we take 20 attributes to predict the eligibility of a loan applicant. After collecting each applicant's inputs for these 20 attributes and cleaning the data, we proceed to modelling.
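The cleaning step is not described in detail here. The sketch below shows one common approach with Pandas, assuming the text and numeric column lists are known in advance: rows without a Loan_Status label are dropped, text answers are trimmed, missing text answers go into a hypothetical "Unknown" category, missing numbers are filled with the column median, and exact duplicates are removed. The function and column names are illustrative.

```python
import numpy as np
import pandas as pd


def clean_applicants(df, text_cols, num_cols, label_col="Loan_Status"):
    """Hypothetical cleaning routine for the questionnaire data."""
    # Applications without an outcome label cannot be used for training
    out = df.dropna(subset=[label_col]).copy()
    for col in text_cols:
        # Trim stray whitespace, then mark missing answers with an assumed "Unknown" category
        out[col] = out[col].str.strip().fillna("Unknown")
    # Replace missing numeric answers with the column median
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    # Remove exact duplicate rows (e.g. an application entered twice)
    return out.drop_duplicates().reset_index(drop=True)


raw = pd.DataFrame({
    "Relationship_Status": [" Single", "Married", "Married", None],
    "Daily_Income": [40.0, np.nan, np.nan, 55.0],
    "Loan_Status": ["Y", "N", "N", "Y"],
})
clean = clean_applicants(raw, ["Relationship_Status"], ["Daily_Income"])
print(clean)
```

Here the two "Married" rows become identical once the missing income is filled with the median (47.5), so only one of them is kept.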
Application of Logistic Regression

Here we split the data into training and test sets. The training data is used to fit (train) the model, and the test data is then used to measure its accuracy.

Next, we fit a logistic regression model to the training data for further prediction.
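The split-and-fit steps can be sketched with NumPy alone, as below. This is a from-scratch illustration (batch gradient descent on the mean log-loss), not the bank's production code: the demo features are synthetic, and the function names, learning rate, iteration count and 0.5 cut-off are assumptions. In practice a library implementation such as scikit-learn's LogisticRegression, together with its train_test_split helper, would typically be used instead.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping any real number to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def split_train_test(X, y, test_frac=0.2, seed=0):
    """Shuffle rows, then hold out `test_frac` of them as the test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(round(test_frac * len(y)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


def add_intercept(X):
    """Prepend a column of ones so w[0] acts as the intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Batch gradient descent on the mean log-loss; returns [intercept, weights...]."""
    Xb = add_intercept(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient of the mean log-loss
    return w


def predict_proba(w, X):
    return sigmoid(add_intercept(X) @ w)


# Synthetic demo data (hypothetical): 500 applicants, 3 numeric features
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = (rng.random(500) < sigmoid(X @ true_w + 0.2)).astype(float)

X_train, X_test, y_train, y_test = split_train_test(X, y)  # 80% / 20%
w = fit_logistic(X_train, y_train)
accuracy = np.mean((predict_proba(w, X_test) >= 0.5) == y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Accuracy is measured only on the 20% of rows the model never saw during fitting, which is the purpose of the split.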
Modelling of Alternative Credit Scoring Data by Logistic Regression

Logistic Regression
We have implemented Logistic Regression to check the loan eligibility criteria of an applicant.
• Machine learning scores can offer stronger predictive power, allowing for lower risk and/or higher acceptance rates, and they make it possible to use non-standard data sources that can raise predictive power further.
• Logistic regression is a simple classification approach for estimating the relationship between a categorical (typically binary) dependent variable and several independent variables. The logistic regression model is commonly used by both mission-driven lenders and commercial lenders for binary outcomes, typically in the form of credit scoring.
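For reference, the standard logistic regression model can be written as follows. Here p is the probability that an applicant's Loan Status is "Y", x_1, ..., x_k are the (scored or numeric) attributes, and β_0, ..., β_k are coefficients learned from the training data; this notation is introduced for illustration and is not taken from the source.

```latex
p = P(\text{Loan Status} = \text{Y} \mid x_1, \dots, x_k)
  = \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_j\right)\right)},
\qquad
\ln\frac{p}{1 - p} = \beta_0 + \sum_{j=1}^{k} \beta_j x_j
```

The second form shows that each coefficient β_j is the change in the log-odds of approval for a one-unit increase in x_j, which is what makes the model easy to translate into business rules.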
Logistic Regression Methodology

• Logistic regression is a commonly used classification algorithm in machine learning that allows categorizing data
into discrete classes by learning the relationship from a given set of labeled data.
• We have prepared a randomized dataset that takes the segments and categories mentioned above into account.
• Using 80% of our labelled dataset of historic loan applications, we would train a logistic regression risk model. We would then use this machine learning model to predict the likelihood of loan repayment for the remaining 20% of applications (our test set). Finally, we would use the fitted logistic regression model to identify the business rules for accepting or rejecting any new loan application.
• An applicant whose predicted probability from the logistic regression model crosses the chosen threshold value will be eligible for a loan.
• We are implementing this model in Python.
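The threshold rule can be illustrated with a small pure-Python sketch. The 0.5 default threshold and the example probabilities are hypothetical; the source does not state which threshold the bank would use. Raising the threshold lowers the share of applicants accepted, which is the risk versus acceptance trade-off described earlier.

```python
def eligibility_decisions(probabilities, threshold=0.5):
    """Label each applicant by comparing the predicted probability to a cut-off.

    A probability exactly equal to the threshold is treated as crossing it (an assumption).
    """
    return ["Eligible" if p >= threshold else "Not eligible" for p in probabilities]


def acceptance_rate(probabilities, threshold):
    """Fraction of applicants whose probability reaches the threshold."""
    return sum(p >= threshold for p in probabilities) / len(probabilities)


scores = [0.82, 0.47, 0.65, 0.30]  # hypothetical model outputs
print(eligibility_decisions(scores))  # ['Eligible', 'Not eligible', 'Eligible', 'Not eligible']
for t in (0.4, 0.5, 0.7):
    print(t, acceptance_rate(scores, t))  # 0.75, then 0.5, then 0.25
```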
Application Interface
Creation of the Questionnaire
First comes the personal information for the loan agreement. We have set up the following questions to evaluate customers through their personal information:
(1) Age
(2) Relationship Status
(3) Physical Disability
(4) Residential Area
(5) Highest level of education
(6) Rent or own your residence
We calculate the Age of each applicant from their Date of Birth. Highest level of education is segregated into the following categories: No formal education, School, High School, College. Based on the available dataset, we analysed the categorical data, calculated the probability of being eligible for a loan for each category of an attribute, and then allotted a score to each category based on that calculation.
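Deriving Age from the Date of Birth is simple but easy to get off by one year. A minimal sketch with Python's standard datetime module, assuming age is measured on a given reference date (for example, the application date); the function name is hypothetical:

```python
from datetime import date


def age_from_dob(dob: date, on: date) -> int:
    """Completed years between a date of birth and a reference date."""
    # Subtract one year if the birthday has not yet occurred in the reference year
    had_birthday = (on.month, on.day) >= (dob.month, dob.day)
    return on.year - dob.year - (0 if had_birthday else 1)


print(age_from_dob(date(1990, 8, 15), date(2024, 3, 1)))  # 33
print(age_from_dob(date(1990, 3, 1), date(2024, 3, 1)))   # 34
```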
Personal Information
For example, the Personal Information segment has the following attributes, each segregated into values with corresponding scores used as inputs:

Relationship Status
Value                    Score
Single                   3
Married                  2
Widow                    1

Physical Disability
Value                    Score
Yes                      0
No                       1

Residential Area
Value                    Score
Urban                    2
Rural                    1

Highest level of education
Value                    Score
No formal education      1
School                   2
High School              3
College                  4

Rent or own your residence
Value                    Score
Rented                   1
Owned                    2
Other                    0
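One way to turn the Personal Information answers into these scores is a lookup table. The sketch below copies the score values from the tables above; the dictionary and function names, the attribute order, and the choice to return a list of per-attribute scores (ready to be used as model inputs) are illustrative assumptions. Age is kept as a number and is therefore left out.

```python
# Score tables for the Personal Information segment, copied from the tables above
PERSONAL_SCORES = {
    "Relationship Status": {"Single": 3, "Married": 2, "Widow": 1},
    "Physical Disability": {"Yes": 0, "No": 1},
    "Residential Area": {"Urban": 2, "Rural": 1},
    "Highest level of education": {
        "No formal education": 1, "School": 2, "High School": 3, "College": 4,
    },
    "Rent or own your residence": {"Rented": 1, "Owned": 2, "Other": 0},
}


def encode_personal(answers: dict) -> list:
    """Map one applicant's answers to scores, in the order of PERSONAL_SCORES."""
    scores = []
    for attribute, table in PERSONAL_SCORES.items():
        value = answers[attribute]
        if value not in table:
            raise ValueError(f"Unexpected answer {value!r} for {attribute!r}")
        scores.append(table[value])
    return scores


applicant = {
    "Relationship Status": "Married",
    "Physical Disability": "No",
    "Residential Area": "Urban",
    "Highest level of education": "High School",
    "Rent or own your residence": "Owned",
}
print(encode_personal(applicant))  # [2, 1, 2, 3, 2]
```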
Family & Standard of living
• The Family & Standard of living segment has the following attributes:

Dependent members
Value                    Score
Greater than 4           1
3 to 4                   2
1 to 2                   3
0                        4

Does the customer use LPG
Value                    Score
Yes                      2
No                       1

Does the household own any appliances
Value                    Score
Fridge                   4
TV                       3
Oven                     2
Mobile                   1

Do they own any motor vehicle
Value                    Score
Car                      3
Others                   2
None                     1

Are you employed / Other
Value                    Score
Employed                 3
Self-employed            2
Unemployed               1
Student                  0
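Dependent members is collected as a count but scored by band. A small sketch of the banding, with the band edges taken from the table above (the function name is hypothetical):

```python
def dependents_score(n_dependents: int) -> int:
    """Score the number of dependent members using the bands in the table above."""
    if n_dependents < 0:
        raise ValueError("number of dependents cannot be negative")
    if n_dependents == 0:
        return 4
    if n_dependents <= 2:   # 1 to 2
        return 3
    if n_dependents <= 4:   # 3 to 4
        return 2
    return 1                # greater than 4


print([dependents_score(n) for n in range(7)])  # [4, 3, 3, 2, 2, 1, 1]
```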


Financial Segment

For Daily Income and Expenditure, and for Monthly medication expenses, we take exact numerical values rather than ranges. Nature of income, however, is segregated into the following categories:

Nature of income
Value                    Score
Temporary/Hawking        1
Permanent Market Place   3
Others                   2

Any significant medical expenses
Value                    Score
Yes                      0
No                       1

Credit History
Next we focus on credit history. We also took the Loan Amount and Loan Period of each individual loan applicant as exact numeric values.

Value                    Score
Yes                      2
Not available            1
No                       0
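For the Financial Segment and Credit History, the categorical answers can be mapped to their scores while the exact numeric values are kept as numbers. The Pandas sketch below does both and standardizes the numeric columns (z-scores) so they sit on a comparable scale for logistic regression. The column names and the standardization step are assumptions, not taken from the source.

```python
import pandas as pd

# Score tables copied from the Financial Segment and Credit History tables
CATEGORICAL_SCORES = {
    "Nature_of_income": {"Temporary/Hawking": 1, "Permanent Market Place": 3, "Others": 2},
    "Significant_medical_expenses": {"Yes": 0, "No": 1},
    "Credit_History": {"Yes": 2, "Not available": 1, "No": 0},
}
# Exact numeric answers (hypothetical column names)
NUMERIC_COLS = ["Daily_Income", "Daily_Expenditure", "Monthly_Medication_Expenses",
                "Loan_Amount", "Loan_Period"]


def build_financial_features(df: pd.DataFrame) -> pd.DataFrame:
    features = pd.DataFrame(index=df.index)
    for col, table in CATEGORICAL_SCORES.items():
        features[col] = df[col].map(table)  # answers missing from the table become NaN
    for col in NUMERIC_COLS:
        std = df[col].std(ddof=0)
        # z-score; a constant column is set to 0 to avoid dividing by zero
        features[col] = (df[col] - df[col].mean()) / std if std > 0 else 0.0
    return features


sample = pd.DataFrame({
    "Nature_of_income": ["Temporary/Hawking", "Permanent Market Place"],
    "Significant_medical_expenses": ["No", "Yes"],
    "Credit_History": ["Yes", "Not available"],
    "Daily_Income": [10.0, 30.0],
    "Daily_Expenditure": [8.0, 12.0],
    "Monthly_Medication_Expenses": [0.0, 0.0],
    "Loan_Amount": [500.0, 1500.0],
    "Loan_Period": [6, 12],
})
print(build_financial_features(sample))
```

In a real pipeline the mean and standard deviation should be computed on the training set only and reused for the test set, so that no test information leaks into the model.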
Creation of Scores

Now we calculate conditional probabilities to derive the scores for the categories of each attribute. These calculations were done in R.

1) First comes the Relationship Status. Here we calculate the probability that an approved applicant (Loan Status = "Y") is Single, Married or Widow:

P(Applicant is Single | Loan Status = "Y") = 0.4359
P(Applicant is Married | Loan Status = "Y") = 0.313
P(Applicant is Widow | Loan Status = "Y") = 0.2511

Relationship Status
Value                    Score
Single                   3
Married                  2
Widow                    1

We then allocate scores based on these probabilities: the higher the probability for a category, the higher its score.
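The conditional probabilities were computed in R; for consistency with the rest of the application, the same calculation is sketched here in Python. P(category | Loan Status = "Y") is the share of approved applicants that fall into each category. The function name, record format and toy records are hypothetical.

```python
from collections import Counter


def conditional_probabilities(records, attribute, label_key="Loan_Status", positive="Y"):
    """Estimate P(attribute = category | label = positive) from labelled records."""
    approved = [r[attribute] for r in records if r[label_key] == positive]
    if not approved:
        raise ValueError("no records with the positive label")
    counts = Counter(approved)
    return {category: count / len(approved) for category, count in counts.items()}


# Toy records (hypothetical): six approved and three rejected applicants
records = (
    [{"Relationship_Status": "Single", "Loan_Status": "Y"}] * 3
    + [{"Relationship_Status": "Married", "Loan_Status": "Y"}] * 2
    + [{"Relationship_Status": "Widow", "Loan_Status": "Y"}]
    + [{"Relationship_Status": "Widow", "Loan_Status": "N"}] * 2
    + [{"Relationship_Status": "Married", "Loan_Status": "N"}]
)
print(conditional_probabilities(records, "Relationship_Status"))
# {'Single': 0.5, 'Married': 0.3333333333333333, 'Widow': 0.16666666666666666}
```

This quantity also grows when a category is simply more common among all applicants, so P(Loan Status = "Y" | category), the approval rate within each category, is a possible alternative basis for scoring.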
2) Next comes the attribute "Does the household own any appliances":

P(Applicant has Fridge | Loan Status = "Y") = 0.3972
P(Applicant has TV | Loan Status = "Y") = 0.2381
P(Applicant has Oven | Loan Status = "Y") = 0.193
P(Applicant has Mobile | Loan Status = "Y") = 0.1717

Does the household own any appliances
Value                    Score
Fridge                   4
TV                       3
Oven                     2
Mobile                   1

3) Then we focus on "Do they own any motor vehicle":

P(Car | Loan Status = "Y") = 0.6125
P(2-Wheeler | Loan Status = "Y") = 0.255
P(None | Loan Status = "Y") = 0.132

Do they own any motor vehicle
Value                    Score
Car                      3
Others                   2
None                     1

Again, the higher the probability for a category, the higher the score allocated to it.
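Turning the probabilities into scores is a ranking step. The sketch below ranks categories from lowest to highest probability and assigns consecutive scores, using the probabilities reported above; it reproduces the appliance and motor vehicle score tables. The function name is hypothetical, and mapping the 2-Wheeler probability to the "Others" row is our reading of the source. Some tables start at 0 rather than 1 (for example Physical Disability), so the starting score is a parameter.

```python
def rank_scores(probabilities: dict, start: int = 1) -> dict:
    """Give the lowest-probability category `start`, the next start + 1, and so on.

    Ties keep their insertion order, so tied categories still get distinct scores.
    """
    ordered = sorted(probabilities, key=probabilities.get)  # ascending probability
    return {category: score for score, category in enumerate(ordered, start=start)}


appliances = {"Fridge": 0.3972, "TV": 0.2381, "Oven": 0.193, "Mobile": 0.1717}
vehicles = {"Car": 0.6125, "Others": 0.255, "None": 0.132}  # "Others" = 2-Wheeler

print(rank_scores(appliances))  # {'Mobile': 1, 'Oven': 2, 'TV': 3, 'Fridge': 4}
print(rank_scores(vehicles))    # {'None': 1, 'Others': 2, 'Car': 3}
```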
CONCLUSION
To conclude, logistic regression implemented in Python provides an effective tool for assessing the creditworthiness, or loan eligibility, of applicants to Women’s Micro Bank. We are ready to customise the application as needed.
We look forward to collaborating with and supporting Women’s Micro Bank in its initiative to empower women in every aspect of society.
Thank You
