Professional Documents
Culture Documents
2019-2023
A
Project Report
On
Business Repayment Date Prediction
Submitted in partial fulfillment of the requirements
for
7th Semester
Bachelor of Technology in
Computer Science and Engineering
by
1. Vismay Agarwal University Roll No- 19013440058
2. Aayush Marwah University Roll No- 19013440001
3. Pushkar Pandey University Roll No- 19013440033
4. Nadeem Ahmad University Roll No- 19013440028
5. Shivansh Kumar Singh University Roll No- 19013440047
This is to certify that the project work entitled “Business Repayment Date
Signature of External
(Signature)
1. Vismay Agarwal
(Signature)
2. Aayush Marwah
(Signature)
3. Pushkar Pandey
(Signature)
4. Nadeem Ahmad
(Signature)
5. Shivansh Kumar Singh
ACKNOWLEDGEMENT
(Signature)
1. Vismay Agarwal
(Signature)
2. Aayush Marwah
(Signature)
3. Pushkar Pandey
(Signature)
4. Nadeem Ahmad
(Signature)
5. Shivansh Kumar Singh
Content
1. Introduction
• Purpose
• Project Scope
• Product Features
2. System Analysis
• Software Requirements
• Hardware Requirements
3. System Design & Specifications
High Level Design (HLD)
• Project Model
• Libraries Imported
• Structure Chart
• DFD
• E-R Diagram
• UML ( Use Case Diagram)
4. Coding
• Sample Code
5. Testing
• Unit Testing
• Integration Testing
6. Conclusion & Limitations.
7. Reference/Bibliography
INTRODUCTION
PURPOSE:
A predictive model is a statistical technique using machine learning and
data mining to predict and forecast likely future outcomes with the aid of
historical and existing data. It works by analyzing current and historical data
and projecting what it learns on a model generated to forecast likely
outcomes. Predictive modeling can be used to predict just about anything,
from TV ratings and a customer’s next purchase to credit risks and
corporate earnings.
Most predictive models work fast and often complete their calculations in
real time. That’s why banks and retailers can, for example, calculate the risk
of an online mortgage or credit card application and accept or decline the
request almost instantly based on that prediction.
Our project aims to predict the likely date when the borrower may return the
amount due on the product he took from the manufacturer in advance using
this predictive model in machine learning using various algorithms like
linear regression, decision tree, random forest etc.
PROJECT SCOPE:
PRODUCT FEATURES:
SOFTWARE REQUIREMENT:
HARDWARE REQUIREMENT:
PROJECT MODEL:
LIBRARIES IMPORTED:
STRUCTURE CHART:
DATA FLOW DIAGRAM:
ENTITY RELATIONSHIP DIAGRAM:
UML DIAGRAM:
CODING
METHODS:
• Null Implification
• Pre Processing
• Feature Engineering
• Exploratory Data Analysis
• Encoding
• Feature Selection
• Linear Regression
• Decision Tree
NULL IMPLIFICATION:
Removing null columns in data frames since they cause data leaks and
inconsistency.
Finds null columns in data set. Returns true for parameter if null
column is found, else returns false. If a null column parameter is found,
we drop that parameter i.e, remove it from data set.
PRE PROCESSING:
Converts non readable, i.e, non int or float target and associate data
frames into readable dataframes. We do so by using regular
expressions. Here our dataframe is date, month and year. Using
datetime library of python we implement pre-processing.
FEATURE ENGINEERING:
EDA is used to analyze and investigate data sets and summarize their
main characteristics, often employing data visualization methods.
There are way too many outliers in this dataset we can't delete all of
them as they are necessary for our model. But instead we are going to
seat a min/max range and we are going to delete data beside it to
increase efficiency of model. The range that we are going to select is 70
acc to boxplot.
ENCODING:
Now that we have all necessary dataframes, we will try and convert
these dataframes into machine readable data i.e, into integer type. This is
done using encoding. We have implemented label encoding here. We are
applying label encoding on data frames that are in object type.
FEATURE SELECTION:
We can see in the axessubplot graph that there are data leaks in the
linear regression of the train dataset hence, it causes the low
accuracy in prediction.
DECISION TREE:
Here we notice that the accuracy is well over 99% as well as the
axessubplot shows more consistent graph compared to in linear
regression.
TESTING
UNIT TESTING:
INTEGRATION TESTING:
CONCLUSIONS & LIMITATIONS
CONCLUSIONS:
Time also plays a role in how well these techniques work. Though a
model may be successful at one point in time, customer behavior
changes with time and therefore, a model must be updated. The 2008
financial crisis exemplifies how crucial time consideration is because
invalid models were predicting the likelihood of mortgage customers
repaying loans without considering the possibility that U.S. housing
prices might drop.
WEBSITES:
https://www.netsuite.com/portal/resource/articles/financial-management/predictive-
modeling.shtml
https://www.sas.com/en_gb/insights/articles/analytics/a-guide-to-predictive-analytics-and-
machine-learning.html
https://insightsoftware.com/blog/top-5-predictive-analytics-models-and-algorithms/
https://towardsdatascience.com/predicting-loan-repayment-5df4e0023e92
BOOKS:
JOURNAL:
PROJECT REFERENCE: