ENGINEERING, KOLHAPUR
BACHELOR OF TECHNOLOGY
DEPARTMENT OF COMPUTER SCIENCE AND
ENGINEERING
A REPORT
ON
Year 2021-22
CERTIFICATE
This is to certify that Mr. Soham Narayan Bhosale, Mr. Vishwardhan Sunil Chougule, Mr. Nikhil Abhiman Jadhav, Mr. Ganesh Vitthal Karande and Mr. Shahid Munir Mangole of CSE (Computer Science and Engineering) have submitted the project report entitled “House Price Prediction System” as per the rules of Shivaji University, Kolhapur, in the year 2021-2022. This project report is a record of the students' own work, carried out by them in a satisfactory manner under supervision and guidance.
We also declare that this project report is our team's own work and has not been copied from anywhere else.
Place:- Kolhapur
Date:-
We also express our gratitude and thanks to all staff members, colleagues and everyone who was directly or indirectly involved in our project.
1. INTRODUCTION
1.2 OBJECTIVE
2. EXISTING SYSTEM
2.1 DESCRIPTION
2.2 DRAWBACKS
3. SYSTEM ARCHITECTURE
4. REQUIREMENT ANALYSIS
5. IMPLEMENTATION DETAILS
5.1 ALGORITHM
5.2 METHODOLOGY
6. RESULT
7. CONCLUSION
8. REFERENCES
House Price Prediction System
1. INTRODUCTION
Machine learning is a subfield of Artificial Intelligence (AI) that works with algorithms and technologies to extract useful information from data. Machine learning methods are well suited to big data, since manually processing vast volumes of data would be impossible without the support of machines. Machine learning in computer science attempts to solve problems algorithmically rather than purely mathematically; it is therefore based on creating algorithms that allow the machine to learn. There are two general groups in machine learning: supervised and unsupervised. In supervised learning, the program is trained on a predetermined (labelled) dataset so that it can make predictions when new data is given. In unsupervised learning, the program tries to find relationships and hidden patterns in the data.
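As a minimal sketch of the difference, assuming scikit-learn is available and using made-up area and price values: the supervised model is given both the inputs and the known prices, while the unsupervised model only sees the inputs and groups them on its own.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Hypothetical data: house area in square feet and price in lakh rupees
area = np.array([[600], [800], [1000], [1200], [3000], [3200], [3400]], dtype=float)
price = np.array([30, 40, 50, 60, 150, 160, 170], dtype=float)

# Supervised: trained on inputs AND known target prices,
# then asked to predict the price of an unseen 1500 sq. ft house.
reg = LinearRegression().fit(area, price)
predicted = reg.predict([[1500.0]])

# Unsupervised: only the inputs are given; KMeans looks for hidden
# structure (here, small vs large houses) without seeing any prices.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(area)
groups = km.labels_
```

Here the regression recovers the price-per-square-foot rule from labelled examples, while KMeans only discovers that the houses fall into two size groups.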
Many machine learning algorithms are used to solve real-world problems today. However, some of them perform better than others in certain circumstances, as stated by the No Free Lunch Theorem. Thus, this project attempts to use regression algorithms and an artificial neural network (ANN) and compare their performance when predicting the values of a given dataset.
Performance will be measured on the task of predicting house prices, because in many regression problems the predicted value depends not on a single feature but on an unknown number of attributes. House prices depend on the specification of each individual house. Houses with the same features may not have the same price because of their location. For instance, a big house may have a higher price if it is located in a desirable, wealthy area than if it is in a poor neighbourhood.
CSE, BVCOEK 1
Based on observation, some people know the house prices of particular areas, but many people are unaware of them. There is no proper web-based application that can satisfy a user's need to know the house price in any particular area. This is a limitation: although house prices can be stored, people tend to keep them secret, so buyers make phone calls to find out the prices in an area. Phone calls, however, offer far fewer features than a web-based system. For example, a customer may call an agent about a particular house and still not learn its exact price.
1.2 Objective
To develop a web-based system that helps people predict house prices.
To help advertise the house price prediction service and other services of a company by making the system available online.
2. EXISTING SYSTEM
2.1 Description
The resulting data is fed into a machine learning model. To find the optimal algorithm and parameters for the model, we mainly employ K-fold cross-validation and the GridSearchCV approach.
The linear regression model turns out to produce the best results for our data, with a score of more than 80%, which is a reasonable result.
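A minimal sketch of this model-selection step, assuming scikit-learn and synthetic data. The helper name `find_best_model`, the three candidate models and their parameter grids are illustrative assumptions; the report does not list the grids actually searched.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeRegressor


def find_best_model(X, y):
    """Hypothetical helper: rank candidate regressors by mean K-fold R^2."""
    # The same shuffled 5-fold split is reused for every candidate
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    candidates = {
        "linear_regression": (LinearRegression(), {"fit_intercept": [True, False]}),
        "lasso": (Lasso(), {"alpha": [0.5, 1.0]}),
        "decision_tree": (DecisionTreeRegressor(random_state=0), {"max_depth": [2, 5, None]}),
    }
    results = []
    for name, (model, grid) in candidates.items():
        # For regressors, GridSearchCV scores with R^2 by default
        search = GridSearchCV(model, grid, cv=cv)
        search.fit(X, y)
        results.append((name, search.best_score_, search.best_params_))
    # Highest mean cross-validated score first
    return sorted(results, key=lambda r: r[1], reverse=True)


# Synthetic, roughly linear data (illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)
ranking = find_best_model(X, y)
```

On data that really is linear, plain linear regression comes out on top, which mirrors the outcome reported above; on real data the ranking has to be checked rather than assumed.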
Next, we export the model as a pickle file (Bengaluru_House_Data.pickle); pickling serializes Python objects into a byte stream. In addition, so that the frontend can interact with the locations (columns), we export them into a JSON file (columns.json).
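A minimal sketch of the export step, assuming a scikit-learn model. The file names come from the report; the helpers `export_artifacts` and `load_artifacts` and the `"data_columns"` key inside columns.json are hypothetical choices for illustration.

```python
import json
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LinearRegression


def export_artifacts(model, feature_columns, out_dir="."):
    """Hypothetical helper: save the model and its column names for the frontend."""
    out = Path(out_dir)
    with open(out / "Bengaluru_House_Data.pickle", "wb") as f:
        pickle.dump(model, f)  # serialize the trained model to a byte stream
    with open(out / "columns.json", "w") as f:
        json.dump({"data_columns": list(feature_columns)}, f)  # assumed key name


def load_artifacts(out_dir="."):
    """Hypothetical helper: read both files back, e.g. when the web app starts."""
    out = Path(out_dir)
    with open(out / "Bengaluru_House_Data.pickle", "rb") as f:
        model = pickle.load(f)
    with open(out / "columns.json") as f:
        columns = json.load(f)["data_columns"]
    return model, columns


# Round trip with a tiny model trained on made-up rows (total_sqft, bath, bhk)
X_demo = np.array([[1000, 2, 2], [1500, 2, 3], [2000, 3, 3], [2500, 3, 4]], dtype=float)
model = LinearRegression().fit(X_demo, [50.0, 75.0, 100.0, 125.0])
with tempfile.TemporaryDirectory() as tmp:
    export_artifacts(model, ["total_sqft", "bath", "bhk"], tmp)
    loaded_model, columns = load_artifacts(tmp)
```

Because unpickling can execute arbitrary code, the app should only load pickle files it produced itself.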
2.2 Drawbacks
1. Details are stored on paper.
2. Maintenance is a huge problem.
3. Updating or changing details is a tedious task.
4. Performance does not meet the requirements.
3. SYSTEM ARCHITECTURE
Fig (1)
Fig (2)
Linear Regression
Linear regression is one of the simplest and most popular machine learning algorithms. It is a statistical method used for predictive analysis. Linear regression makes predictions for continuous (real or numeric) variables such as sales, salary, age or product price.
The linear regression algorithm models a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name linear regression. Because the relationship is linear, the model describes how the value of the dependent variable changes with the value of the independent variable.
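For one independent variable, the least-squares line y = b0 + b1·x has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the mean point. A minimal NumPy sketch with made-up area and price values (the helper name is hypothetical):

```python
import numpy as np


def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = b0 + b1 * x with a single feature."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of co-deviations divided by sum of squared x-deviations
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through (x_mean, y_mean)
    b0 = y_mean - b1 * x_mean
    return b0, b1


area = [1000, 1500, 2000, 2500]   # hypothetical square-feet values
price = [52, 77, 102, 127]        # hypothetical prices in lakh rupees
b0, b1 = fit_simple_linear_regression(area, price)
```

These points lie exactly on price = 2 + 0.05·area, so the fit returns b0 = 2 and b1 = 0.05.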
Lasso Regression
Lasso regression is similar to Ridge Regression, except that the penalty term contains the absolute values of the weights instead of their squares. Because it uses absolute values, it can shrink a slope all the way to 0, whereas Ridge Regression can only shrink it close to 0. It is also called L1 regularization.
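The difference between the two penalties can be seen directly in the fitted coefficients. A minimal sketch, assuming scikit-learn and synthetic data in which only two of five features matter; the penalty strength `alpha=0.5` is an arbitrary illustrative choice:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only the first two features influence the target; the other three are noise
y = 4.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)  # L1 penalty on |w|
ridge = Ridge(alpha=0.5).fit(X, y)  # L2 penalty on w^2

# Count coefficients that were shrunk exactly to zero
lasso_zeros = int(np.sum(lasso.coef_ == 0.0))
ridge_zeros = int(np.sum(ridge.coef_ == 0.0))
```

Lasso sets the three irrelevant coefficients exactly to zero, while Ridge only makes them small, which is why Lasso is also useful for feature selection.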
Ridge Regression
Ridge regression is a type of linear regression in which a small amount of bias is introduced so that we get better long-term predictions.
Ridge regression is a regularization technique used to reduce the complexity of the model. It is also called L2 regularization.
In this technique, the cost function is altered by adding a penalty term to it. The amount of bias added to the model is called the Ridge Regression penalty. It is calculated by multiplying lambda (λ) by the sum of the squared weights of the individual features.
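With the cost ||y − Xw||² + λ·Σ wⱼ², Ridge has the closed-form solution w = (XᵀX + λI)⁻¹Xᵀy. A minimal NumPy sketch, assuming centred data so that no intercept is needed; the helper names and synthetic data are illustrative:

```python
import numpy as np


def ridge_weights(X, y, lam):
    """Closed-form Ridge solution w = (X^T X + lam * I)^(-1) X^T y (no intercept)."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)  # solving is more stable than inverting A


def ridge_cost(X, y, w, lam):
    """Squared error plus the Ridge penalty lam * sum(w_j^2)."""
    residual = y - X @ w
    return residual @ residual + lam * np.sum(w ** 2)


rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)
X, y = X - X.mean(axis=0), y - y.mean()  # centre so no intercept is required

lambdas = [0.0, 10.0, 100.0]
norms = [np.linalg.norm(ridge_weights(X, y, lam)) for lam in lambdas]
```

With λ = 0 this reduces to ordinary least squares; as λ grows, the size of the weight vector shrinks, which is the extra bias that Ridge trades for lower variance.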
XGBoost
Random Forest
Random Forest is a popular machine learning algorithm that belongs to the supervised learning technique. It can be used for both classification and regression problems in ML. It is based on the concept of ensemble learning, which combines multiple models to solve a complex problem and improve the performance of the model.
As the name suggests, "Random Forest is a classifier that contains a number of decision trees on various subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset." Instead of relying on one decision tree, the random forest takes the prediction from each tree and combines them: for classification it predicts the class with the majority of votes, and for regression, such as predicting house prices, it averages the trees' predictions.
A greater number of trees in the forest generally leads to higher accuracy and reduces the risk of overfitting.
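For regression, the averaging can be checked directly. A minimal sketch, assuming scikit-learn's `RandomForestRegressor` and synthetic area/price data: the fitted trees are exposed as `estimators_`, and the forest's prediction equals the mean of their individual predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(500, 3500, size=(300, 1))              # hypothetical areas (sq. ft)
y = 0.05 * X[:, 0] + rng.normal(scale=5.0, size=300)   # hypothetical prices

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

X_new = np.array([[1200.0], [2400.0]])
# One row of predictions per tree, one column per new house
per_tree = np.array([tree.predict(X_new) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)
forest_prediction = forest.predict(X_new)
```

`manual_average` and `forest_prediction` agree, showing that a regression forest averages its trees rather than voting.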
Support Vector Machine
Support Vector Machine (SVM) is one of the most popular supervised learning algorithms and is used for both classification and regression problems, although it is primarily used for classification.
The goal of the SVM algorithm is to find the best line or decision boundary that separates n-dimensional space into classes, so that new data points can easily be placed in the correct category in the future. This best decision boundary is called a hyperplane.
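With a linear kernel, the hyperplane is w·x + b = 0, and the sign of w·x + b decides the class. A minimal sketch, assuming scikit-learn's `SVC`, which exposes w and b as `coef_` and `intercept_` only for the linear kernel; the "budget" and "premium" houses are made-up examples:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical houses: (area in 1000 sq. ft, number of bathrooms)
X = np.array([[0.6, 1], [0.8, 1], [1.0, 2], [2.8, 3], [3.0, 4], [3.4, 4]], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1])  # 0 = budget, 1 = premium

svm = SVC(kernel="linear", C=1.0).fit(X, labels)

# Hyperplane parameters: w . x + b = 0
w, b = svm.coef_[0], svm.intercept_[0]
new_houses = np.array([[0.9, 1], [3.2, 4]], dtype=float)
scores = new_houses @ w + b          # positive side -> class 1, negative side -> class 0
predicted = svm.predict(new_houses)
```

For house prices themselves, the regression variant (`SVR`) fits a function instead of a separating boundary; the classification case is shown here because it matches the hyperplane description above.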
5. IMPLEMENTATION DETAILS
5.1 Algorithm
Step 1-Start
5.2 Methodology
In this project, the data is pre-processed and the prediction accuracy of the models is evaluated. The experiment has multiple stages that are required to get the prediction results. These stages are:
Pre-processing: both datasets will be checked and pre-processed using several methods, which handle data in different ways. Pre-processing is therefore done over multiple iterations, and each time the accuracy is evaluated with the combination of methods used.
Data splitting: dividing the dataset into two parts is essential, so that the model is trained on one part and evaluated on the other. The dataset will be split 75% for training and 25% for testing.
Evaluation: the accuracy on both datasets will be evaluated by measuring R² and RMSE when training the model, alongside a comparison of the actual prices in the test dataset with the prices predicted by the model.
Performance: alongside the evaluation metrics, the time required to train each model will be measured to show how the algorithms vary in training time.
Correlation: the correlation between each available feature and the house price will be evaluated using the Pearson correlation coefficient, to identify whether the feature has a negative, positive or zero correlation with the house price.
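The split, evaluation, timing and correlation stages can be sketched together as follows, assuming scikit-learn and synthetic data. The helpers `evaluate` and `pearson` and the two example models are hypothetical; RMSE is computed as the square root of the mean squared error.

```python
import time

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))


def evaluate(models, X, y, seed=0):
    """Split 75/25, then report R^2, RMSE and training time for each model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    report = {}
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_train, y_train)
        train_seconds = time.perf_counter() - start
        predicted = model.predict(X_test)  # compared with the actual test prices
        report[name] = {
            "r2": r2_score(y_test, predicted),
            "rmse": float(np.sqrt(mean_squared_error(y_test, predicted))),
            "train_seconds": train_seconds,
        }
    return report


# Hypothetical data: the third feature has no effect on the target
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=400)

report = evaluate({"linear": LinearRegression(), "ridge": Ridge(alpha=1.0)}, X, y)
correlations = [pearson(X[:, j], y) for j in range(X.shape[1])]
```

The correlations come out clearly positive, clearly negative and close to zero for the three features, which are the three cases the Correlation stage is meant to distinguish.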
6. RESULT
This program gives the user a predicted price for a house. It gives users who are completely unaware of house prices an idea of pricing. It provides the prediction as a direct numerical value, which is easy for the user to understand.
The output is very simple and is shown in rupees. It helps many users plan their construction projects based on the price, and it also helps when selling a house.
The user interface of the program is user-friendly and simple to use and understand.
This model is 98% accurate in predicting the price of the house.
It predicts the price closest to the correct one, based on the dataset it was trained on.
The basic functionalities of the House Price Prediction project, written in Python, are described below.
Main Page:
When you run the App.py file, either from an IDE or by executing it directly, you will see the main page shown in the image below. It takes the user's input and predicts the price of the house.
User Input:
In the image below, the user enters the data needed for predicting the house price in the specified fields.
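Behind the input form, the entered values have to be turned into the same feature vector the model was trained on. A minimal sketch, assuming a hypothetical column layout (total_sqft, bath, bhk, then one-hot location columns, as exported to columns.json); `build_feature_vector`, `predict_price` and `_DemoModel` are illustrative names, not taken from App.py.

```python
import numpy as np


def build_feature_vector(columns, location, sqft, bath, bhk):
    """Hypothetical layout: [total_sqft, bath, bhk, <one column per location>]."""
    x = np.zeros(len(columns))
    x[0], x[1], x[2] = sqft, bath, bhk
    key = location.strip().lower()
    if key in columns[3:]:
        x[columns.index(key)] = 1.0  # switch on this location's one-hot column
    # An unknown location leaves every location column at 0
    return x


def predict_price(model, columns, location, sqft, bath, bhk):
    x = build_feature_vector(columns, location, sqft, bath, bhk)
    return round(float(model.predict(x.reshape(1, -1))[0]), 2)


class _DemoModel:
    """Stand-in for the unpickled model: price = 0.05 * sqft (illustration only)."""

    def predict(self, X):
        return X[:, 0] * 0.05


columns = ["total_sqft", "bath", "bhk", "whitefield", "hebbal"]
vector = build_feature_vector(columns, "Hebbal", 1200, 2, 3)
price = predict_price(_DemoModel(), columns, "Hebbal", 1200, 2, 3)
```

Normalising the location to lower case keeps the form input consistent with the stored column names.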
Heatmap
A heatmap is a two-dimensional graphical representation of data in which the individual values contained in a matrix are represented as colours.
The Seaborn package can create annotated heatmaps, which can be adjusted with Matplotlib tools as the creator requires.
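A minimal sketch of an annotated correlation heatmap, assuming pandas, Seaborn and Matplotlib are installed; the `correlation_heatmap` helper and the column names are hypothetical, and the data is synthetic.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def correlation_heatmap(df):
    """Annotated heatmap of Pearson correlations between the numeric columns of df."""
    corr = df.select_dtypes("number").corr()  # Pearson is the pandas default
    fig, ax = plt.subplots(figsize=(6, 5))
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
    ax.set_title("Correlation between features and price")  # Matplotlib tweak
    fig.tight_layout()
    return corr, fig


# Hypothetical data for illustration only
rng = np.random.default_rng(0)
sqft = rng.uniform(500, 3500, 100)
df = pd.DataFrame({
    "total_sqft": sqft,
    "bath": np.round(sqft / 1000) + 1,
    "price": 0.05 * sqft + rng.normal(0, 5, 100),
})
corr, fig = correlation_heatmap(df)
```

`annot=True` writes each correlation value inside its cell, and fixing `vmin`/`vmax` to -1 and 1 keeps the colour scale comparable across datasets.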
Fig (3)
Histogram
A histogram is a graphical representation of data points organized into user-specified ranges. Similar
in appearance to a bar graph, the histogram condenses a data series into an easily interpreted visual by
taking many data points and grouping them into logical ranges or bins.
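The grouping into bins can be seen with NumPy before anything is drawn. A minimal sketch with made-up prices, assuming NumPy and Matplotlib: `np.histogram` returns the count per bin and the bin edges, and `ax.hist` draws the same bins as bars.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

prices = np.array([35, 42, 48, 51, 55, 58, 63, 70, 88, 120], dtype=float)  # hypothetical, lakh rupees

# Four equal-width bins between the minimum (35) and maximum (120) price
counts, edges = np.histogram(prices, bins=4)

fig, ax = plt.subplots()
ax.hist(prices, bins=edges, edgecolor="black")  # the same bins, drawn as bars
ax.set_xlabel("Price (lakh rupees)")
ax.set_ylabel("Number of houses")
```

The edges are 35, 56.25, 77.5, 98.75 and 120, so the counts are 5, 3, 1 and 1; the last bin includes its right edge.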
Fig (4)
Scatter Plot
Matplotlib's scatter() function draws scatter plots. It needs two arrays of the same length, one for the values of the x-axis and one for the values of the y-axis:
Fig (5)
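A minimal sketch of such a plot, assuming Matplotlib and made-up area and price values; each pair (x[i], y[i]) becomes one point.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

total_sqft = np.array([650, 900, 1200, 1500, 1800, 2400])  # x-axis values (hypothetical)
price = np.array([32, 45, 61, 74, 95, 130])                 # y-axis values (hypothetical)
assert len(total_sqft) == len(price), "scatter() needs two arrays of the same length"

fig, ax = plt.subplots()
points = ax.scatter(total_sqft, price)  # one marker per (area, price) pair
ax.set_xlabel("Total area (sq. ft)")
ax.set_ylabel("Price (lakh rupees)")
```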
Subplot
Subplots are groups of axes that exist in a single Matplotlib figure. The subplots() function in the Matplotlib library helps create multiple layouts of subplots and provides control over each individual plot.
Fig (6)
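A minimal sketch, assuming Matplotlib and synthetic data: `plt.subplots(1, 2)` returns one figure and an array of two independent axes, each of which can hold a different kind of plot.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
sqft = rng.uniform(500, 3500, 200)              # hypothetical data
price = 0.05 * sqft + rng.normal(0, 8, 200)

# One figure holding a 1 x 2 grid of axes
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))
axes[0].hist(price, bins=20)
axes[0].set_title("Price distribution")
axes[1].scatter(sqft, price, s=10)
axes[1].set_title("Price vs area")
fig.tight_layout()  # avoid overlapping titles and labels
```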
Pairplot
Pairplot visualization comes in handy for Exploratory Data Analysis (EDA).
A pairplot plots the given data to show the relationships between its variables, which can be continuous or categorical.
Fig (7)
Implemented Schedule
PERIOD WORK TO BE COMPLETED
7. CONCLUSION
Thus, we have completed our project entitled “House Price Prediction” using Machine Learning with Python.
For this project, we studied how to download a dataset, clean the data and train models using machine learning algorithms.
This project helped us develop logical thinking and knowledge of software engineering.
8. REFERENCES
www.google.com
www.w3schools.com
www.geeksforgeeks.org
www.programiz.com
Dr.V.R.Ghorpad
EXTERNAL EXAMINER PRINCIPAL