
CHAPTER THREE

ANALYSIS AND DESIGN

3.1 System Analysis


System analysis is the study of a business problem domain in order to recommend
improvements and to specify the business requirements and priorities for the
solution. It involves analyzing and understanding a problem, identifying
alternative solutions, choosing the best course of action and then designing the
chosen solution. It also involves determining how existing systems work and the
problems associated with them. Before a new system can be designed, it is
necessary to study the system that is to be improved upon or replaced, if there
is any. System analysis is conducted to study a system or its parts in order to
identify its objectives, and it specifies what the system should do. It involves
the collection of data, the examination of an already existing solution and the
building of the logical model of the system.

The research method adopted in this study is the System Development Life Cycle
(SDLC). The Waterfall model is the SDLC approach that was used for the software
development. In the Waterfall model, the whole process of development is divided
into separate phases, and the outcome of one phase serves as the input for the
next phase in sequence.

3.1.1 Fact Finding


Fact finding is the process of collecting data and information using techniques
such as sampling of existing documents, research, observation, questionnaires,
interviews, prototyping and joint requirements planning. A system analyst uses
suitable fact-finding techniques when developing and implementing a system.
Collecting the required facts is very important when applying tools in the
System Development Life Cycle, because the tools cannot be used efficiently and
effectively without properly extracted facts. Fact-finding techniques are used
in the early stages of the System Development Life Cycle, including the system
analysis phase, design and post-implementation review. The facts included in any
information system can be tested in three steps: the data set used to create
useful information, the process functions that perform the objectives, and the
interface designs used to interact with users. This study gathered facts through
the following methods to analyze the system expectations.
The fact-finding technique employed in this study is observation. The study
adopts the secondary method of data collection. The dataset used was obtained
from the online machine learning repository Kaggle. It has thirteen (13)
attributes, twelve (12) of which are feature attributes and one (1) of which is
the predicted value. Records of 614 loan applicants were collected to train the
model.
The data is available in a CSV (Comma Separated Values) file that can easily be
loaded into the system for training the model.
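
As a rough illustration of reading such a CSV file and separating the feature
attributes from the predicted value, the sketch below uses Python's standard
csv module on a two-row inline sample. The sample rows are made up, only a
subset of the columns is shown, and the label column name Loan_status is an
assumption taken from the output design; in the project the full Kaggle file
would be read from disk instead.

```python
import csv
import io

# Tiny made-up sample standing in for the real CSV file (illustrative values only).
SAMPLE_CSV = """Loan_ID,Gender,Married,Applicantincome,LoanAmount,Credit_History,Loan_status
LP0001,Male,Yes,4583,128,1,Y
LP0002,Female,No,3000,66,0,N
"""

LABEL_COLUMN = "Loan_status"  # assumed name of the predicted attribute


def load_rows(handle):
    """Read CSV rows into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(handle))


def split_features_and_label(rows, label=LABEL_COLUMN):
    """Separate each row into its feature attributes and its label value."""
    features = [{k: v for k, v in row.items() if k != label} for row in rows]
    labels = [row[label] for row in rows]
    return features, labels


rows = load_rows(io.StringIO(SAMPLE_CSV))
features, labels = split_features_and_label(rows)
print(len(rows), "rows,", len(rows[0]), "columns")
print("labels:", labels)
```

Passing an open file object instead of the io.StringIO wrapper would read the
same structure from a file on disk.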

3.1.2 Analysis of the Existing System


The existing loan system in the Staff Multipurpose Cooperative Society
Limited, JOSTUM, relies on manual decision making and requires the applicant to
meet the cooperative society management one on one. The applicant may have to
go to the cooperative society office in order for a decision to be made on
whether the loan will be granted or not.

3.1.3 Problems of the Existing System


When applying for a loan, the applicant may be required to visit the cooperative
society office for proper documentation. The problem with this is that it takes
time: the cooperative society management must manually go over the terms and
conditions document, as well as check whether the applicant is a registered
member of the society, before the loan is granted.

In the staff multipurpose cooperative society environment, the primary goal is
to place the society's assets in safe hands. The cooperative society may grant a
loan following a rigorous process of verification and validation, but there is
no guarantee that the chosen applicant is the most deserving of all applicants.

3.1.4 Advantages of the Proposed System

The main goal of this project is to decide whether or not a loan applicant is
eligible for a loan based on the features that the user provides as input. The
machine learning model is given these features, and it generates a prediction
based on how they affect the label. This was accomplished by first looking for a
dataset that met both the developer's and the user's requirements. The interest
rate on a loan may rise year after year, necessitating the development of a
system that can predict the types of loans available and their rates. This
technique can assist the cooperative society in determining which client
categories are eligible for a certain loan. By anticipating loan performance,
the cooperative society can lower its non-performing assets.

3.2 Modeling the Proposed System


System modelling is the process of developing abstract models of a system,
with each model presenting a different perspective of the system. It is all about
representing a system using graphical notation. Models help the analyst to
understand the functionality of the system.

The Unified Modeling Language (UML) is a general-purpose modeling language in
the field of software engineering that is intended to provide a standard way to
visualize the design of a system. It can be used to model the structure of an
application, its behavior and even business processes. The central idea behind
the use of UML in this research is to capture the significant details about the
system, so that the problem is clearly understood, a solution architecture can
be developed, and a chosen implementation scheme can be clearly identified and
constructed.

3.2.1 Proposed System Architecture


System architecture is the conceptual model that defines the structure,
behaviour and representation of a system. The architecture of the system is
shown in the figure below:

[Figure 1 diagram. Elements shown: Loan; Data cleansing and processing;
Determining the Training and Testing Data; Logistic Regression, Support Vector
Machine, XGBoost, Random Forest, Decision Tree; Predict Best Model; Input;
Result.]

Figure 1: Architecture of the system

The architecture shows that the dataset is first preprocessed, which involves
transforming the raw data into an understandable format and checking for
missing values. Rows or columns with missing values, which may result from
mistakes made when entering the data into the CSV file, are removed. This is
important because it helps prevent runtime errors such as Not a Number (NaN)
errors that could prevent the system from working effectively. The dataset is
then normalized, which involves rescaling real-valued numeric attributes into
the range 0 to 1. Finally, the dataset is divided into training and testing
datasets.
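
The cleaning and normalization steps described above can be sketched in plain
Python as follows. This is only an illustration under stated assumptions: the
rows are made up, the helper names are hypothetical, and the study itself lists
a MinMax scaler module for the rescaling step rather than hand-written code.

```python
def drop_incomplete_rows(rows):
    """Keep only rows in which no value is missing (None or NaN)."""
    def is_missing(value):
        # NaN is the only float value that is not equal to itself.
        return value is None or (isinstance(value, float) and value != value)
    return [row for row in rows if not any(is_missing(v) for v in row)]


def min_max_scale(rows):
    """Rescale every column to the range [0, 1] using (x - min) / (max - min)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            (x - lo) / (hi - lo) if hi != lo else 0.0  # a constant column maps to 0
            for x, lo, hi in zip(row, lows, highs)
        ])
    return scaled


# Made-up rows of (applicant income, loan amount); one row has a missing value.
raw = [[2000.0, 100.0], [4000.0, None], [6000.0, 300.0], [4000.0, 200.0]]
clean = drop_incomplete_rows(raw)
print(min_max_scale(clean))
```

After scaling, the smallest value in each column becomes 0 and the largest
becomes 1, so no single feature dominates the others purely because of its
units.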

3.2.2 Use Case Diagram


A use case diagram is a representation of a user's interaction with the system
that shows the relationship between the user and the different use cases in which
the user is involved. Use case diagrams are a way to capture the system's
functionality and requirements in UML. They capture the dynamic behavior of a
live system. A use case diagram consists of use cases and actors. The system's
use case diagram is shown below.

[Figure 2 diagram. System: Loan Prediction System. Use cases: Use System, Input
Features, Predict, Display, View Result.]

Figure 2: Use case diagram of the proposed system

3.3 System Design


System design is the specification or construction of a technical, computer-
based solution for the business requirements identified in a system analysis. It
gives the overall plan or model of a system consisting of all specifications that
give the system its form and structure i.e. the structural implementation of the
system analysis.

The selected architectural design defines all the components that need to be
developed, communications with third-party services, user flows and database
communications, as well as the front-end representation and behavior of each
component. The design is usually kept in the Design Specification Document.

3.3.1 Input Design
Input design is the process of converting a user-oriented description of the
input into a computer-based system. Input design facilitates the entry of data
into the computer system. In order for the proposed system to perform
predictions, it requires specific features of the loan applicant to be supplied,
as shown in the table below.

Table 2: Input Design

S/N  Field name         Data type    Description
1    Loan_ID            Categorical  Unique loan identification number of the applicant
2    Gender             Categorical  Applicant gender
3    Married            Boolean      Applicant marital status
4    Dependents         Categorical  Applicant dependency (e.g. depends on parents or independent)
5    Education          Categorical  Applicant education level
6    Self_Employee      Boolean      Applicant employment status
7    Applicantincome    Numbers      Monthly income of the user
8    Coapplicantincome  Numbers      Income of the applicant's partner or guarantor
9    LoanAmount         Numbers      Amount of loan applied for by the applicant
10   Loan_Amount_Term   Numbers      Duration of the loan
11   Credit_History     Categorical  Credit score of the applicant
12   Property_Area      Categorical  Applicant investment area

3.3.2 Output Design

Every system must have one or more outputs, since the output is the reason for a
system's design. When a system's output is efficient and accurate, the system is
said to have attained its goals. Here, the output is the prediction that follows
from the loan features supplied: the loan eligibility status is displayed as
either eligible or not eligible.

S/N  Field name   Data type  Description
1    Loan_status  Boolean    The predicted loan status

3.4 Program Design


Program design is the process of translating system requirements into a program
that can be executed on a computer system. The program was designed using the
Logistic Regression machine learning algorithm.
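
To make the idea behind logistic regression concrete, the sketch below shows how
such a model turns a feature vector into an eligible or not-eligible decision: a
weighted sum of the features is passed through the logistic (sigmoid) function
and compared with a threshold. The weights, bias, threshold and feature values
are hypothetical placeholders, not values from the trained model.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(features, weights, bias):
    """Probability of the positive class for one feature vector."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def predict_label(features, weights, bias, threshold=0.5):
    """Return True (eligible) when the probability reaches the threshold."""
    return predict_probability(features, weights, bias) >= threshold


# Hypothetical weights and bias; a trained model would learn these from data.
weights = [2.0, -1.0]
bias = -0.5

print(predict_label([1.0, 0.2], weights, bias))
print(predict_label([0.0, 0.9], weights, bias))
```

Training consists of choosing the weights and bias so that these probabilities
agree with the known loan statuses in the training data as closely as possible.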

3.4.1 Program Architecture


Program architecture refers to the fundamental structures of a software system.

3.4.2 Program Flowchart
An algorithm is a step-by-step procedure that defines a set of instructions to
be executed in a certain order to get the desired output. The flowchart below
represents the algorithm used in the study.

Figure 4: Flowchart for the proposed System

The dataset used to train the model is loaded into the system. It is divided
into a training dataset, which is 63% of the total dataset, and a testing
dataset, which is the remaining 37%. Training the model involves identifying
patterns in the data and creating a function that attempts to map the input
features to the output feature. The testing data is used to determine how well
the model has learnt from the training dataset and can carry out prediction. The
inputs supplied by the user first have to be arranged into a NumPy array before
they are passed to the model. The model then uses the function that was created
at the point of learning to map the inputs to an output in the solution space,
and sometimes even outside it, depending on the values of the inputs supplied.
This output is what the system finally displays.
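
The 63% / 37% division can be illustrated with a small, reproducible shuffle and
split. This is a sketch in plain Python with a hypothetical helper name and a
hypothetical random seed; the study lists a Train_Test_Split module for this
step, and the exact row counts it produced may differ slightly depending on how
the library rounds.

```python
import random


def train_test_split_rows(rows, train_fraction=0.63, seed=42):
    """Shuffle the rows reproducibly and split them into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# 614 placeholder records standing in for the loan applicants.
records = list(range(614))
train, test = train_test_split_rows(records)
print(len(train), len(test))
```

Shuffling before splitting keeps the order of records in the file from biasing
which applicants end up in the testing set, and fixing the seed makes the split
repeatable between runs.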

3.5 Description of Modules


A module is a software component or part of a program that contains one or
more routines. The modules used in this study are explained in the table below.

Table 4: Modules used in the system

SN  Module            Description
1   Pickle            It is used for saving a program's state data to disk so
                      that the program can resume where it left off when
                      restarted. It is used to minimize execution time.
2   Pandas            It is used to load and save the dataset.
3   Numpy             It is the core library for scientific computing in
                      Python.
4   Train_Test_Split  It splits arrays or matrices into random train and test
                      subsets.
5   Accuracy_Score    It is used to determine how accurate the classification
                      is.
6   MinMax Scaler     The ranges of values in the dataset are scaled down to
                      between 0 and 1. It is used to prevent one feature from
                      dominating other features.
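
As a minimal sketch of how the Pickle module saves and restores state, the
example below serializes a stand-in object and loads it back. The dictionary is
a hypothetical placeholder for a trained model; any picklable Python object is
handled the same way.

```python
import pickle

# Stand-in for a trained model: any picklable Python object works the same way.
trained_model = {"weights": [2.0, -1.0], "bias": -0.5}

# Serialize the object to bytes; pickle.dump(obj, file) would write to disk instead.
blob = pickle.dumps(trained_model)

# Later (for example when the web app starts), restore the object without retraining.
restored = pickle.loads(blob)
print(restored == trained_model)
```

Because the model is restored rather than retrained, the application can answer
a prediction request without repeating the training step. Pickle files should
only be loaded from trusted sources, since unpickling can execute arbitrary
code.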

3.6 Choice of Programming Tools


The programming tools that were used in implementing this project are
Streamlit, Python, PyCharm and Jupyter Notebook.

a) Streamlit: Streamlit is an open-source Python library that makes it easy
to create and share custom web apps for machine learning and data science.
b) Python: Python is a high-level, interpreted, general-purpose dynamic
programming language. Its syntax helps programmers to code in fewer steps
compared to other programming languages. It was used to carry out this study
because it supports web applications and has many built-in functions,
extensive support libraries and user-friendly data structures.
c) PyCharm: PyCharm is a dedicated Python Integrated Development
Environment providing a wide range of essential tools for Python
developers, integrated to create a convenient environment for productive
Python, web and data science development. It was used for the HTML,
CSS and scripting.
d) Jupyter Notebook: Jupyter Notebook is an open-source web application
that allows data scientists to create and share documents that integrate
live code, equations, computational output, visualizations and other
multimedia resources, along with explanatory text, in a single document.
In this study Jupyter was used to train the model.

CHAPTER FOUR

IMPLEMENTATION AND RESULT

4.1 Implementation
Implementation involves testing the system to verify if it meets the stated aim
and objectives. It also involves training users to handle the system and plan for
a smooth conversion.

Once the data was cleaned and insights were gained about the dataset,
machine learning models appropriate for the dataset were applied. Five
machine learning algorithms were selected to predict the dependent variable
in the dataset: Logistic Regression, Support Vector Machine, XGBoost, Random
Forest and Decision Tree. These algorithms were implemented with the help of
Python's scikit-learn library. The predicted outputs obtained from these
algorithms were saved in a comma-separated values file, which was generated
by the code at run time.

4.2 Program Testing


After the design and coding of an application, it is imperative to run a test to
ascertain that the actual results match the expected results. There are different
types of testing, but for the purpose of this study system testing was adopted.
The purpose of system testing is to identify and correct errors in the system.

When the code is executed and the model is trained, an interface is created and
connected to the trained model, so that a user can input the features of a loan
applicant and obtain an output computed by the model.

Test cases were carried out to verify some functionalities of the software. Each
test case has its objective and expected outcome, as presented in the tables
below.

Table 5: Test Case 1 (Evaluation of Algorithms)

SN  Test Case  Test Objective                        Expected Outcome
1   LPS1       To evaluate the performance of the    The system should evaluate the
               five algorithms using classification  performance of the algorithms to
               algorithm performance metrics.        show the best to be chosen for
                                                     the model.

Table 6: Test Case 2 (Testing Software Workability)

SN  Test Case  Test Objective                        Expected Outcome
1   LPS2       To test if the system can predict     The system should output the
               loan eligibility status.              predicted loan status.
2   LPS3       To ascertain if a change in feature   A change in feature selection
               selection affects the loan status.    should change the loan status to
                                                     either yes or no.
3   LPS3b      To check if changes in the features   The loan status should change
               selected will affect the loan         accordingly, based on the
               eligibility status.                   trained dataset.

4.3 Results

The software was executed as specified in Tables 5 and 6, and the outputs were
evaluated to determine the performance of the software. The results for the
first test, LPS1, are shown in Table 7.

Accuracy is one metric for evaluating classification models. Informally,
accuracy is the fraction of predictions our model got right. Accuracy is used
here to evaluate the performance of the five algorithms.
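
As a small worked example of this definition, the sketch below computes accuracy
as the number of correct predictions divided by the total number of predictions.
The labels are made up for illustration, and the function is a hand-written
stand-in rather than the library routine used in the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute the accuracy of an empty sample")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels: 1 = eligible, 0 = not eligible.
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
print(accuracy(y_true, y_pred))  # 3 of 4 predictions are correct -> 0.75
```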

Table 7: Results for Test Case 1

Algorithm               Accuracy score
Logistic Regression     0.83
Support Vector Machine  0.81
XGBoost                 0.79
Random Forest           0.77
Decision Tree           0.68

From the above table it is clear that Logistic Regression gives the highest
accuracy, 83%, compared to the other algorithms.
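
The comparison of the five accuracy scores reported for Test Case 1 can be
expressed as a simple selection of the maximum. The sketch below only re-uses
the reported figures; the variable names are illustrative.

```python
# Accuracy scores as reported for Test Case 1.
accuracy_by_model = {
    "Logistic Regression": 0.83,
    "Support Vector Machine": 0.81,
    "XGBoost": 0.79,
    "Random Forest": 0.77,
    "Decision Tree": 0.68,
}

# Pick the model with the highest accuracy and rank the rest below it.
best_model = max(accuracy_by_model, key=accuracy_by_model.get)
ranking = sorted(accuracy_by_model, key=accuracy_by_model.get, reverse=True)
print(best_model)
print(ranking)
```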

The test in Table 6 was to ascertain the workability of the software: to
determine whether the software can actually predict the loan status, and
whether a change in a feature can alter the loan status. The results are shown
in Table 8.

Table 8: Result for Test Case 2

SN  Test Case  Test Objective                  Result                         Reference
1   LPS2       To test if the system can       The system predicted the       Figure 5
               predict loan status.            loan status and displayed it
                                               with a success message.
2   LPS3       To ascertain if a change in     Every change in a feature      Figure 6
               feature selection affects the   affected the loan status.
               loan status.

Figure 5: Loan status successful prediction

Figure 6: Change in Loan status due to Change in Selected Features (i)

Figure 7: Change in Loan status due to Change in Selected Features (ii)

Figure 8: Change in Loan status due to Change in Selected Features (iii)

Figure 9: Loan status successful prediction due to a long loan term duration
and high applicant income.

Figure 10: Change in Loan status due to change in credit score, applicant
income and loan status.

4.4 Discussion of Results


Based on the results obtained from the tests carried out, the system produced
the expected outcomes.

4.4.1 Results from Test Case 1


After comparing the models, it was found that Logistic Regression works best,
with the highest accuracy of 83%.

Next to Logistic Regression is the Support Vector Machine (SVM), which has an
accuracy score of 81%. XGBoost and Random Forest gave almost the same results,
with accuracies of 79% and 77% respectively, which were not as good as those of
Logistic Regression and SVM.

Decision Tree performed poorly compared to the other algorithms.

4.4.2 Results from Test Case 2


Considering the result obtained from LPS2, the system is able to predict the
loan eligibility status and display it for the user to view. This proves that
the system is working and has achieved the aim for which it was trained.

In LPS3, the aim of the test was to check whether a change in feature selection
alters the loan status depending on the features that were selected. The result
shows that the system does not merely predict a loan status; it predicts the
loan status according to the data it was trained with.

4.5 System Requirement


System requirements are the configuration that a system must have in order for
the software to run efficiently. Failure to meet these requirements can result in
performance problems. System requirements can be considered in terms of
hardware and software.

The hardware requirements are:

1. 64-bit PC

2. Hard drive with at least 500 MB of free space

The software requirements are:

1. Web browser

2. Streamlit

3. SQLite

4. PyCharm IDE

5. Python interpreter

CHAPTER FIVE

RECOMMENDATION AND CONCLUSION

5.1 CONCLUSION
Improvements in computing technology have made it possible to examine
information that could not previously be captured, processed and analyzed, and
new analytical techniques from machine learning can now be used. This study is
an exploratory attempt to use five machine learning algorithms to predict loan
eligibility status, and then to compare their results.

The study shows that machine learning algorithms can achieve a prediction
accuracy of 83%, as evaluated by the performance metrics. Given the dataset
used in this study, the conclusion is that Logistic Regression and Support
Vector Machine are able to generate more accurate loan eligibility predictions,
with lower prediction errors, than XGBoost, Random Forest and Decision Tree.

The study has shown that machine learning algorithms are important tools for
researchers to use in loan eligibility status prediction. However, these
machine learning tools also have limitations.

The choice of algorithm depends on a number of factors, such as the size of the
data set, the computing power of the equipment, and the time available to wait
for the results.

To conclude, machine learning is very useful for finding the relationships
between attributes and building a model according to those relationships. Loan
status prediction can be done using a classification algorithm, which is part of
machine learning. Loan status prediction helps applicants check their loan
eligibility status with ease, and reduces the risk borne by the cooperative. The
algorithm finds relationships in the training data, and the result is applied to
the test data, which will be the user's input. The outcome is provided according
to the attributes specified.

5.2 RECOMMENDATION
Owning a house is what every person wishes for. Therefore, using this
system, people can buy houses and real estate at their rightful prices and ensure
that they do not get tricked by sketchy agents.
This system is recommended for use in real estate because it will aid in
giving accurate predictions for setting prices, saving a lot of hassle and
time. The system is apt at training itself and at predicting prices from the
raw data provided to it.
Whenever a large dataset is involved and there is much categorical data, it is
recommended that tree-based regressors be used for modelling rather than other
algorithms.

The efficiency and effectiveness of using machine learning to handle loan status
prediction has been demonstrated in this project. It is therefore recommended
that:

• The loan prediction system using machine learning algorithms should be
adopted by organizations for determining loan eligibility status, in order to
reduce the risk that accompanies loan approval to the wrong applicant.

APPENDIX

# Streamlit front end for the loan prediction system.
import pickle
import sqlite3

# for some basic operations
import numpy as np
import pandas as pd

# for visualizations
import matplotlib.pyplot as plt
import seaborn as sns

import streamlit as st
from pandas_profiling import ProfileReport
from streamlit_pandas_profiling import st_profile_report

# Project module that provides the prediction and exploration pages.
# from predict_page import predictor
# from cleanData import stringOutput
from Altloan import show_page
from Altloan import show_loan_Analysis

# SQLite database that stores the registered users.
conn = sqlite3.connect("data.db")
c = conn.cursor()

header = st.container()
inp = st.container()
pred = st.container()


def create_table():
    """Create the user table if it does not exist yet."""
    c.execute('CREATE TABLE IF NOT EXISTS usertable(username TEXT, password TEXT)')


def add_userdata(username, password):
    """Insert a new user. Note: the password is stored exactly as entered."""
    c.execute('INSERT INTO usertable(username,password) VALUES (?,?)',
              (username, password))
    conn.commit()


def login_user(username, password):
    """Return the rows matching the given credentials (empty list if none)."""
    c.execute('SELECT * FROM usertable WHERE username=? AND password=?',
              (username, password))
    data = c.fetchall()
    return data


def view_all_users():
    """Return every row of the user table."""
    c.execute('SELECT * FROM usertable')
    data = c.fetchall()
    return data


def main():
    menu = ["Index", "Login", "Signup"]
    choice = st.sidebar.selectbox("Menu", menu)

    if choice == "Index":
        st.subheader("Homepage")
        st.title("LOAN PREDICTION SYSTEM!!!")
        # img1 = Image.open('')
        # st.image('bkg1.jpg',use_column_width=True)
        # st.image('background2.png')

    elif choice == "Login":
        st.subheader("Login Section")
        username = st.sidebar.text_input("Username")
        password = st.sidebar.text_input("Password", type='password')
        if st.sidebar.checkbox("Login"):
            create_table()
            result = login_user(username, password)

            # if password=="12345":
            if result:
                st.success("Logged In as {}".format(username))
                # st.image('bkg1.jpg')
                # Dataset_upload()
                task = st.selectbox("Task", ["Predict", "Explore", "Profiles"])
                if task == "Predict":
                    show_page()
                elif task == "Explore":
                    show_loan_Analysis()
                elif task == "Profiles":
                    st.subheader("User Profiles")
                    u_data = view_all_users()
                    clean_db = pd.DataFrame(u_data, columns=["Username", "Password"])
                    st.dataframe(clean_db)
            else:
                st.warning("Incorrect Username/password")

    elif choice == "Signup":
        st.subheader("Create New Account")
        new_user = st.text_input('Username')
        new_password = st.text_input("Password", type="password")

        if st.button("Signup"):
            create_table()
            add_userdata(new_user, new_password)
            st.success("You have successfully created a valid Account")
            st.info("Go to Login Menu to login")


if __name__ == '__main__':
    main()
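
The appendix code stores each password in usertable exactly as the user typed
it. As a hedged sketch of one common alternative, the example below keeps only a
salted hash, using PBKDF2 from Python's standard library. It is not part of the
original system: the table layout, the function signatures, the iteration count
and the in-memory database are assumptions made for illustration.

```python
import hashlib
import hmac
import os
import sqlite3

ITERATIONS = 100_000  # assumed work factor for this illustration


def hash_password(password, salt=None):
    """Return (salt, digest) computed with PBKDF2-HMAC-SHA256."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def create_table(conn):
    """Create a user table that holds a salt and a hash instead of the password."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS usertable("
        "username TEXT PRIMARY KEY, salt BLOB, password_hash BLOB)"
    )


def add_user(conn, username, password):
    """Store a new user with a freshly generated salt."""
    salt, digest = hash_password(password)
    conn.execute("INSERT INTO usertable VALUES (?, ?, ?)", (username, salt, digest))
    conn.commit()


def login_user(conn, username, password):
    """Return True only if the password matches the stored hash."""
    row = conn.execute(
        "SELECT salt, password_hash FROM usertable WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return False
    salt, stored = row
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored)  # constant-time comparison


conn = sqlite3.connect(":memory:")  # throwaway database for the example
create_table(conn)
add_user(conn, "alice", "s3cret")
print(login_user(conn, "alice", "s3cret"))
print(login_user(conn, "alice", "wrong"))
```

With this layout a profiles page could still list usernames, but the stored
values would no longer reveal the passwords themselves.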
