The research method adopted in this study is the System Development Life Cycle (SDLC). The Waterfall Model is the SDLC approach that was used for the software development. In the waterfall approach, the whole development process is divided into separate phases, and the outcome of one phase serves as the input to the next phase in sequence.
It is important to apply fact-finding tools in the System Development Life Cycle, because tools cannot be used efficiently and effectively without properly extracted facts. Fact-finding techniques are used in the early stages of the System Development Life Cycle, including the system analysis, design and post-implementation review phases. The facts included in any information system can be examined from three angles: the data set used to create useful information, the process functions that carry out the objectives, and the interface designs that interact with users. This study gathers facts through the following methods to analyze the system expectations.
The fact-finding technique employed in this study is observation. The study adopts a secondary method of data collection: the dataset was obtained from the online machine learning repository Kaggle. The dataset has thirteen (13) attributes, twelve (12) of which are feature attributes and one (1) of which is the predicted value. Samples of 614 loan applicants' information were collected to train the model.
The data is available as a CSV (Comma Separated Values) file that can easily be loaded into the system for training the model.
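As an illustration, such a CSV file can be loaded in a single call with pandas. The columns and rows below are a small made-up stand-in for the actual Kaggle file, not the full thirteen-attribute dataset:

```python
import io
import pandas as pd

# A two-row stand-in for the Kaggle loan CSV (column names are illustrative).
csv_text = """Loan_ID,ApplicantIncome,LoanAmount,Credit_History,Loan_Status
LP001,5849,128,1.0,Y
LP002,4583,120,0.0,N
"""

# pd.read_csv loads the whole file at once; the real dataset has 614 rows
# and 13 attributes (12 features plus the Loan_Status label).
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (2, 5) for this stand-in
```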
conditions document, as well as check whether the applicant is a registered member of the society before the loan is granted.
The main goal of this project is to decide whether or not a loan applicant is eligible for a loan, based on the numerous attributes that the user provides as input. These features are given to the machine learning model, which generates a forecast based on how they affect the label. This was accomplished by first looking for a dataset that met both the developer's and the user's requirements.
The interest rate on a loan may rise year after year, necessitating the development of a system that can predict the types of loans available and their rates. This technique can assist the cooperative society in determining which client categories are eligible for a certain loan. By anticipating loan performance, the cooperative society can lower its non-performing assets.
The Unified Modeling Language (UML) is a general-purpose modeling language in the field of software engineering, intended to provide a standard way to visualize the design of a system. It can be used to model the structures of an application, its behaviors and even business processes. The central idea behind the use of UML in this research is to capture the significant details of the system, so that the problem is clearly understood, a solution architecture can be developed, and a chosen implementation scheme can be clearly identified and constructed.
Figure: System architecture. The input passes through data cleansing and processing, the training and testing data are determined, and five models (Logistic Regression, Support Vector Machine, XGBoost, Random Forest, Decision Tree) produce the loan prediction result.
The architecture shows that the dataset is first preprocessed, which involves transforming raw data into an understandable format and checking for missing values. Processing the dataset removes rows or columns that have missing values due to mistakes that might have occurred when entering the data into the CSV file. This is important because it helps prevent runtime errors, such as Not a Number (NaN) errors, that could stop the system from working effectively. The dataset is then normalized, which involves rescaling real-valued numeric attributes into the range 0 to 1, and finally divided into training and testing datasets.
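A minimal sketch of this preprocessing pipeline, using a tiny made-up frame in place of the real dataset (column names and values are illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Toy frame standing in for the loan dataset.
df = pd.DataFrame({
    "ApplicantIncome": [5849, 4583, np.nan, 6000],
    "LoanAmount": [128, 120, 66, np.nan],
    "Loan_Status": [1, 0, 1, 1],
})

# Drop rows with missing values introduced by data-entry mistakes;
# this avoids NaN-related failures at training time.
df = df.dropna()

# Rescale the real-valued numeric attributes into the range [0, 1].
X = MinMaxScaler().fit_transform(df[["ApplicantIncome", "LoanAmount"]])
y = df["Loan_Status"].values

# Divide into training and testing subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.37, random_state=0)
print(X.min(), X.max())  # 0.0 1.0
```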
Figure: Use case diagram. The user uses the system, supplies the input features, the system predicts, and the result is displayed.
The selected architectural design defines all the components that need to be developed, communications with third-party services, user flows and database communications, as well as the front-end representation and behavior of each component. The design is usually kept in the Design Specification Document.
3.3.1 Input Design
Input design is the process of converting a user-oriented description of the input into a computer-based system. Input design facilitates the entry of data into the computer system. In order for the proposed system to perform predictions, it requires specific features of the loan application to be supplied, as shown in the table below.
11 Credit_History Categorical Credit score of applicant
There must be one or more outputs in a system; the output is the reason for a system's design. When a system's output is efficient and accurate, the system is said to have attained its goals. The system's output is the expected consequence of the loan features: the loan eligibility status is displayed as either eligible or not eligible.
3.4.2 Program flowchart
An algorithm is a step-by-step procedure that defines a set of instructions to be executed in a certain order to get the desired output. The flowchart below represents the algorithm used in the study.
Figure 4: Flowchart for the proposed System
The dataset used to train the model is loaded into the system and divided into a training dataset, 63% of the total, and a testing dataset, the remaining 37%. Training the model involves identifying patterns in the data and creating a function that attempts to map the input features to the output feature. The testing data is then used to determine how well the model has learnt from the training dataset and can carry out predictions. The inputs supplied by the user first have to be arranged into a NumPy array before they are passed to the model; the model then uses the function created at the point of learning to map the given inputs to an output in the solution space (and sometimes even outside it, depending on the values supplied). This is what is finally output by the system.
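This final step can be sketched as follows. The tiny training set and Logistic Regression model below stand in for the actual trained model, so the values are purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny illustrative training set: two scaled features, binary loan status.
X_train = np.array([[0.2, 0.1], [0.9, 0.8], [0.1, 0.3], [0.8, 0.9]])
y_train = np.array([0, 1, 0, 1])

model = LogisticRegression().fit(X_train, y_train)

# User-supplied features must be arranged into a 2-D NumPy array
# (one row per applicant) before being passed to the model.
user_input = np.array([[0.85, 0.9]])
prediction = model.predict(user_input)
print("Eligible" if prediction[0] == 1 else "Not eligible")
```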
SN Module Description
1 pickle Used for saving the trained model to disk so it can be reloaded instead of retrained, minimizing execution time.
2 pandas Used to load and save the dataset.
3 numpy The core library for scientific computing in Python.
4 train_test_split Splits arrays or matrices into random train and test subsets.
5 accuracy_score Used to determine how accurate the classification is.
6 MinMaxScaler Scales the ranges of values in the dataset down to between 0 and 1; used to prevent one feature from dominating the others.
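For instance, the role of pickle described above, persisting the trained model so the application can restore it without retraining, can be sketched like this (the one-feature model and data are illustrative):

```python
import pickle
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative one-feature training data.
X = np.array([[0.0], [0.1], [0.9], [1.0]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

# Serialize the trained model; in the application this byte string is
# written to disk once, so later runs reload it instead of retraining.
blob = pickle.dumps(model)
restored = pickle.loads(blob)

# The restored model behaves exactly like the original.
print(restored.predict([[0.95]])[0])
```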
c) PyCharm: PyCharm is a dedicated Python Integrated Development Environment providing a wide range of essential tools for Python developers, integrated to create a convenient environment for productive Python, web and data science development. It was used for the HTML, CSS and scripting.
d) Jupyter Notebook: Jupyter Notebook is an open-source web application that allows data scientists to create and share documents that integrate live code, equations, computational output, visualizations and other multimedia resources, along with explanatory text, in a single document. In this study Jupyter was used to train the model.
CHAPTER FOUR
4.1 Implementation
Implementation involves testing the system to verify that it meets the stated aim and objectives. It also involves training users to handle the system and planning for a smooth conversion.
Once the data was cleaned and insights were gained about the dataset, appropriate machine learning models that fit the dataset were applied. Five machine learning algorithms were selected to predict the dependent variable in the dataset: Logistic Regression, Support Vector Machine, XGBoost, Random Forest and Decision Tree. These algorithms were implemented with the help of Python's scikit-learn library. The predicted outputs obtained from these algorithms were saved in a comma-separated values file, generated by the code at run time.
Once the code is executed and the model trained, an interface is created and connected to the trained model, so that a user can input the features of a loan application and obtain an output as the result of the model's computation.
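The training loop described above can be sketched as below on synthetic stand-in data. XGBoost is left out so that the sketch depends only on scikit-learn (the study used the separate xgboost package for that model), and the output file name is illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the preprocessed loan features and label.
rng = np.random.RandomState(0)
X = rng.rand(200, 4)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.37, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(),
    "Support Vector Machine": SVC(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

rows = []
for name, clf in models.items():
    clf.fit(X_train, y_train)
    rows.append({"Algorithm": name,
                 "Accuracy": accuracy_score(y_test, clf.predict(X_test))})

# Save the predicted-output comparison to a CSV file at run time,
# as the study describes.
pd.DataFrame(rows).to_csv("model_accuracy.csv", index=False)
print(pd.DataFrame(rows))
```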
Test cases were carried out to verify some functionalities of the software. Each test case has its objective and expected outcome, as represented in the table below.
4.3 Results
The software was executed as specified in tables 5 and 6, and the outputs were evaluated to determine the performance of the software. The results for the first test, LPS1, are shown in table 7.

Algorithm            Accuracy score
Logistic Regression  0.83
XGBoost              0.79

From the table above it is clear that Logistic Regression gives a higher accuracy of 83% compared to the other algorithms.
In table 6 the test was to ascertain the workability of the software: to determine whether the software can actually predict the loan status, and also to determine whether a change in a feature can alter the loan status. The results are shown in table 8.
Figure 5: Loan status successful prediction
Figure 7: Change in Loan status due to Change in Selected Features (ii)
Figure 8: Change in Loan status due to Change in Selected Features (i)
Figure 9: Loan status successful prediction due to change in long loan term
duration and high applicant income.
Figure 10: Change in Loan status due to change in credit score, applicant
income and loan status.
same results that were not good enough compared to Logistic Regression and SVM, with accuracies of 79% and 77%.
In LPS3 the aim of the test was to check whether a change in the selected features alters the loan status, depending on the type of features selected. This shows that the system does not only predict loan status, but predicts it according to the data it was trained with.
Hardware requirement:
1. 64-bit PC
Software requirement:
1. Web browser
2. Streamlit
3. SQLite
4. PyCharm IDE
5. Python interpreter
CHAPTER FIVE
5.1 CONCLUSION
Improvements in computing technology have made it possible to examine information that could not previously be captured, processed and analyzed, and new analytical techniques from machine learning can now be applied. This study is an exploratory attempt to use five machine learning algorithms to predict loan eligibility status and to compare their results.
The study shows that machine learning algorithms can achieve a prediction accuracy of 83%, as evaluated by the performance metrics. Given the dataset used in this study, the conclusion is that Logistic Regression and Support Vector Machine are able to generate accurate loan status predictions with lower prediction errors, compared with the results of XGBoost, Random Forest and Decision Tree.
The study has shown that machine learning algorithms are important tools for researchers to use in loan eligibility status predictions. However, these machine learning tools also have limitations.
To conclude, machine learning is very useful for finding the relationships between the attributes and building a model according to those relationships. Using classification algorithms, which are part of machine learning, loan status prediction can be done. Loan status prediction helps applicants check their loan eligibility status with ease, while reducing the rate of risk borne by the cooperative. The algorithm finds relationships among the training data, and the result is applied to the test data, which will be the user's input; the prediction is provided according to the attributes specified.
5.2 RECOMMENDATION
Access to credit is something many people depend on. Using this system, cooperative societies can assess loan applicants at their rightful eligibility status and ensure that loans are not granted to the wrong applicants.
This system is recommended for use by cooperative societies and similar lenders because it gives accurate eligibility predictions, saving them a lot of hassle and time. The system is apt enough in training itself and in predicting loan status from the raw data provided to it.
Whenever a large dataset with many categorical attributes is involved, it is recommended that tree-based classifiers be used for modelling rather than other algorithms.
The efficiency and effectiveness of using machine learning to handle loan status prediction has already been demonstrated in this project. It is therefore recommended that the loan prediction system using machine learning algorithms be adopted by organizations for loan status eligibility, in order to reduce the risk that accompanies approving a loan for the wrong applicant.
APPENDIX
import streamlit as st
import pandas as pd
import pickle
import sqlite3
# from predict_page import predictor
from Altloan import show_page
from Altloan import show_loan_Analysis
from pandas_profiling import ProfileReport
from streamlit_pandas_profiling import st_profile_report
# for visualizations
import matplotlib.pyplot as plt
import seaborn as sns
# from cleanData import stringOutput

conn = sqlite3.connect("data.db")
c = conn.cursor()

header = st.container()
inp = st.container()
pred = st.container()


def create_table():
    c.execute('CREATE TABLE IF NOT EXISTS usertable(username TEXT, password TEXT)')


def add_userdata(username, password):
    c.execute('INSERT INTO usertable(username,password) VALUES (?,?)',
              (username, password))
    conn.commit()


def login_user(username, password):
    c.execute('SELECT * FROM usertable WHERE username=? AND password=?',
              (username, password))
    data = c.fetchall()
    return data


def view_all_users():
    c.execute('SELECT * FROM usertable')
    data = c.fetchall()
    return data


def main():
    menu = ["Index", "Login", "Signup"]
    choice = st.sidebar.selectbox("Menu", menu)
    if choice == "Index":
        st.subheader("Homepage")
        st.title("LOAN PREDICTION SYSTEM!!!")
        # img1 = Image.open('')
        # st.image('bkg1.jpg', use_column_width=True)
        # st.image('background2.png')
    elif choice == "Login":
        st.subheader("Login Section")
        username = st.sidebar.text_input("Username")
        password = st.sidebar.text_input("Password", type='password')
        if st.sidebar.checkbox("Login"):
            create_table()
            result = login_user(username, password)
            # if password == "12345":
            if result:
                st.success("Logged In as {}".format(username))
                # st.image('bkg1.jpg')
                # Dataset_upload()
                task = st.selectbox("Task", ["Predict", "Explore", "Profiles"])
                if task == "Predict":
                    show_page()
                elif task == "Explore":
                    show_loan_Analysis()
                elif task == "Profiles":
                    st.subheader("User Profiles")
                    u_data = view_all_users()
                    clean_db = pd.DataFrame(u_data,
                                            columns=["Username", "Password"])
                    st.dataframe(clean_db)
            else:
                st.warning("Incorrect Username/password")
    elif choice == "Signup":
        st.subheader("Create New Account")
        new_user = st.text_input('Username')
        new_password = st.text_input("Password", type="password")
        if st.button("Signup"):
            create_table()
            add_userdata(new_user, new_password)
            st.success("You have successfully created a valid Account")
            st.info("Go to Login Menu to login")


if __name__ == '__main__':
    main()