Final Report Jay

Project Phase II Report
On
House Price Prediction
Submitted for the requirement of
Project course
BACHELOR OF ENGINEERING
COMPUTER SCIENCE & ENGINEERING
Submitted to: Submitted By:
Rohini Bawa(e12228) 1. Jay Gautam 20BCS4962
(Supervisor) 2. Rahamtullah Sheikh 20BCS5161
Rohini 3 Shreya Lakhanpal 20BCS5123
4. Tathagat Chaubey 20BCS5081
Manjit Singh (e11549) 5. Jatin Kumar Mahaldar 20BCS5165
(Co-Supervisor)
Signature:
i
Jay Gautam
20BCS4962
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
CHANDIGARH UNIVERSITY, GHARUAN
June 2022
ABSTRACT
House price forecasting is an important topic of real estate. The literature attempts
to derive useful knowledge from historical data of property markets. Machine
learning techniques are applied to analyze historical property transactions in India
to discover useful models for house buyers and sellers. Revealed is the high
discrepancy between house prices in the most expensive and most affordable
suburbs in the city of Mumbai. Moreover, experiments demonstrate that the
Multiple Linear Regression that is based on mean squared error measurement is a
competitive approach.
TABLE OF CONTENT
Acknowledgements......................................................................... v
1. Introduction.................................................................................1
ii
Jay Gautam
20BCS4962
1.1 Goal..........................................................................................1
1.2 Need of the application..........................................................1
2. DATASET...................................................................................2
2.2Steps in Preparing Data for Model........................................2
2.3Data Exploration.....................................................................3
2.4Data Visualization...................................................................3
2.5Data Selection..........................................................................4
2.6Data Transformation……………………………………….5-6
3. System Analysis...........................................................................7
3.1 ER Diagram............................................................................8
3.2 Data Flow Diagram................................................................8
3.3 Class Diagram.........................................................................8
4. Design...........................................................................................9
4.1 Design Goals............................................................................9
4.2Architectural Design..........................................................9-10
5.LANGUAGE AND LIBRARIES ...........................................................10
5.1HTML……………………………………………………10
5.2CSS………………………………………………………………...10
5.3 Java script ………………………………………………………..10
5.4 JSX…………………………………………………………………11
5.5 Python…………………………………………………………..11
5.6 Libraries used …………………………………………………11
6 Modules ………………………………………………………12
6.1 Regression Model……………………………………….12
6.2 Random Forest Regression Model…………………….12-13
6.3 XG Boost Regressor Model ……………………………13
7 Result and discussion………………………………………..14
8 Conclusion ……………………………………………………15
iii
Jay Gautam
20BCS4962
Acknowledgements
I would like to thank my major professor to Dr. Manjit Singh
for his constant guidance and help throughout the project. I would also like to thank
Er. Rohini Bawa for graciously accepting to be on my committee.
iv
Jay Gautam
20BCS4962
Finally, I would like to thank my family and my friends for all the support and
encouragement.
v
Jay Gautam
20BCS4962
1. INTRODUCTION
1.1 Goal
These are the Parameters on which we will evaluate ourselves-
Create an effective price prediction model
Validate the model’s prediction accuracy
Identify the important home price attributes which feed the model’s predictive power
1.2 Need Of Application
Having lived in India for so many years if there is one thing that I had been taking for
granted, it’s that housing and rental prices continue to rise. Since the housing crisis of
2008, housing prices have recovered remarkably well, especially in major housing
markets. However, in the 4th quarter of 2016, I was surprised to read that Bombay
housing prices had fallen the most in the last 4 years. In fact, median resale prices for
condos and coops fell 6.3%, marking the first time there was a decline since Q1 of
2017. The decline has been partly attributed to political uncertainty domestically and
abroad and the 2014 election. So, to maintain the transparency among customers and
also the comparison can be made easy through this model. If customer finds the price
of house at some given website higher than the price predicted by the model, so he can
reject that house.
6
Jay Gautam
20BCS4962
1
2. DATASET
2.1 Steps for preparing Meta model

Here we have web scrapped the Data from 99acres.com website which is one of the leading real
estate websites operating in INDIA.
Our Data contains Delhi and Pune area only.
7
Jay Gautam
20BCS4962
2
2.2 Data Exploration
Data exploration is the first step in data analysis and typically involves summarizing
the main characteristics of a data set, including its size, accuracy, initial patterns in the
data and other attributes. It is commonly conducted by data analysts using visual
analytics tools, but it can also be done in more advanced statistical software, Python.
Before it can conduct analysis on data collected by multiple data sources and stored in
data warehouses, an organization must know how many cases are in a data set, what
variables are included, how many missing values there are and what general hypotheses
the data is likely to support. An initial exploration of the data set can help answer these
questions by familiarizing analysts with the data with which they are working.
2.3 Data Visualization
Data visualization is the graphical representation of information and data. By using
visual elements like charts, graphs, and maps, data visualization tools provide an
accessible way to see and understand trends, outliers, and patterns in data. In the world
8
Jay Gautam
20BCS4962
of Big Data, data visualization tools and technologies are essential to analyse massive
amounts of information and make data-driven
2.4 Data Selection
Data selection is defined as the process of determining the appropriate data type
and source, as well as suitable instruments to collect data. Data selection precedes the
actual practice of data collection. This definition distinguishes data selection from
selective data reporting (selectively excluding data that is not supportive of a research
hypothesis) and interactive/active data selection (using collected data for monitoring
activities/events, or conducting secondary data analyses). The process of selecting
suitable data for a research project can impact data integrity.
The primary objective of data selection is the determination of appropriate data type,
source, and instrument(s) that allow investigators to adequately answer research
questions. This determination is often discipline-specific and is primarily driven by the
nature of the investigation, existing literature, and accessibility to necessary data.
9
Jay Gautam
20BCS4962
4
2.5 Data Transformation

The log transformation can be used to make highly skewed distributions less skewed.
This can be valuable both for making patterns in the data more interpretable and for
helping to meet the assumptions of inferential statistics.
It is hard to discern a pattern in the upper panel whereas the strong relationship is
shown clearly in the lower panel. The comparison of the means of log-transformed data
is actually a comparison of geometric means. This occurs because, as shown below,
the anti-log of the arithmetic mean of log-transformed values is the geometric mean.
10
Jay Gautam
20BCS4962
5
11
Jay Gautam
20BCS4962
6
3. SYSTEM ANALYSIS
3.1 ER Diagram
12
Jay Gautam
20BCS4962
7
3.2 Data Flow Diagram
3.3 Class Diagram
13
Jay Gautam
20BCS4962
8
4. Design
4.1 design goals
The design of the web application involves the design of the forms for listing the area,
search for location, display the complete specification for the location, and design a
comparison that is easy to use.
Design of an interactive application that enables the user to filter the products based on
different parameters.
14
Jay Gautam
20BCS4962
Design of an application that has features like drag and drop etc.
Design of application that decreases data transfers between the client and the server.
4.2 Architectural Context Diagram
In this context diagram, the information provided to and received from the online house
prediction is identified. The arrows represent the information received or generated by
the application. The closed boxes represent the set of sources and sinks of information.
5. Languages And Libraries
5.1 HTML
HTML code ensures the proper formatting of text and images for your Internet
browser. Without HTML, a browser would not know how to display text as elements or
15
Jay Gautam
20BCS4962
load images or other elements. HTML also provides a basic structure of the page, upon
which Cascading Style Sheets are overlaid to change its appearance. One could think of
HTML as the bones (structure) of a web page, and CSS as its skin (appearance).
5.2 CSS
CSS is among the core languages of the open web and is standardized across Web
browsers according to W3C specifications. Previously, development of various parts of
CSS specification was done synchronously, which allowed versioning of the latest
recommendations. You might have heard about CSS1, CSS2.1, CSS3. However, CSS4
has never become an official version.
5.3 Java Script

JavaScript is a lightweight, cross-platform, and interpreted scripting language. It is
well-known for the development of web pages; many non-browser environments also
use it. JavaScript can be used for Client-side developments as well as Server-
side developments. JavaScript contains a standard library of objects, like Array, Date,
and Math, and a core set of language elements like operators, control structures.
5.4 JSX
JSX stands for JavaScript XML. It is simply a syntax extension of JavaScript. It allows
us to directly write HTML in React (within JavaScript code). It is easy to create a
template using JSX in React, but it is not a simple template language instead it comes
with the full power of JavaScript.
It is faster than normal JavaScript as it performs optimizations while translating to
regular JavaScript. Instead of separating the markup and logic in separated files, React
16
Jay Gautam
20BCS4962
uses components for this purpose. We will learn about components in detail in further
articles.
5.5 Python
Python is widely used in scientific and numeric computing:
SciPy is a collection of packages for mathematics, science, and engineering.
Pandas is a data analysis and modelling library.
IPython is a powerful interactive shell that features easy editing and recording of a
work session, and supports visualizations and parallel computing.
The Software Carpentry Course teaches basic skills for scientific computing, running
bootcamps and providing open-access teaching materials
5.6 libraries used for this project are –
Beautiful soap Selinium
Panda Scrapy
Numpy Scikit-learn
6. Modules used
6.1 Regression Model

Linear Regression is a machine learning algorithm based on supervised learning.
It performs a regression task. Regression models a target prediction value based on
independent variables.
It is mostly used for finding out the relationship between variables and forecasting.
17
Jay Gautam
20BCS4962
Real Vs Predicted
6.2 R andom Forest Regression Model
A Random Forest is an ensemble technique capable of performing both regression and
classification tasks with the use of multiple decision trees and a technique called
Bootstrap Aggregation, commonly known as bagging.
Bagging, in the Random Forest method, involves training each decision tree on a
different data sample where sampling is done with replacement.
The basic idea behind this is to combine multiple decision trees in determining the final
output rather than relying on individual decision trees.
18
Jay Gautam
20BCS4962
Real Vs Predicted
6.3 XG Boost Regressor Model
XG Boost stands for eXtreme Gradient Boosting.
The XG Boost library implements the gradient boosting decision tree algorithm.
Boosting is an ensemble technique where new models are added to correct the errors
made by existing models.
Models are added sequentially until no further improvements can be made.
Real Vs Predicted
19
Jay Gautam
20BCS4962
13
7. Result and discussion
So, our study showed that…
Linear Regression displayed the best performance for this Dataset and can be used for
deploying purposes.
Random Forest Regressor and XGBoost Regressor are far behind, so can’t be
recommended for further deployment purposes.
RMSE Bar Graph
20
Jay Gautam
20BCS4962
The Model is deployed through Python Web App Flask in collaboration with HTML
and CSS
14
8. Conclusion
So, our Aim is achieved as we have successfully ticked all our parameters as mentioned
in our Aim Column. It is seen that circle rate is the most effective attribute in predicting
the house price and that the Linear Regression is the most effective model for our
Dataset with RMSE bar graph.
21
Jay Gautam
20BCS4962
22
Jay Gautam
20BCS4962

Final Report Jay

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Final Report Jay

Uploaded by

Copyright:

Available Formats

Project Phase II Report

House Price Prediction

Submitted for the requirement of

COMPUTER SCIENCE & ENGINEERING

Submitted to: Submitted By:

Rohini Bawa(e12228) 1. Jay Gautam 20BCS4962

(Supervisor) 2. Rahamtullah Sheikh 20BCS5161

Rohini 3 Shreya Lakhanpal 20BCS5123

4. Tathagat Chaubey 20BCS5081

Manjit Singh (e11549) 5. Jatin Kumar Mahaldar 20BCS5165

to derive useful knowledge from historical data of property markets. Machine

learning techniques are applied to analyze historical property transactions in India

suburbs in the city of Mumbai. Moreover, experiments demonstrate that the

Multiple Linear Regression that is based on mean squared error measurement is a

I would like to thank my major professor to Dr. Manjit Singh

Er. Rohini Bawa for graciously accepting to be on my committee.

These are the Parameters on which we will evaluate ourselves-

Create an effective price prediction model

Validate the model’s prediction accuracy

1.2 Need Of Application

reject that house.

2.1 Steps for preparing Meta model

estate websites operating in INDIA.

Our Data contains Delhi and Pune area only.

2.2 Data Exploration

2.3 Data Visualization

Data visualization is the graphical representation of information and data. By using

amounts of information and make data-driven

2.4 Data Selection

activities/events, or conducting secondary data analyses). The process of selecting

suitable data for a research project can impact data integrity.

source, and instrument(s) that allow investigators to adequately answer research

questions. This determination is often discipline-specific and is primarily driven by the

nature of the investigation, existing literature, and accessibility to necessary data.

2.5 Data Transformation

helping to meet the assumptions of inferential statistics.

is actually a comparison of geometric means. This occurs because, as shown below,

3.2 Data Flow Diagram

3.3 Class Diagram

4.1 design goals

comparison that is easy to use.

4.2 Architectural Context Diagram

prediction is identified. The arrows represent the information received or generated by

5. Languages And Libraries

browsers according to W3C specifications. Previously, development of various parts of

has never become an official version.

5.3 Java Script

use it. JavaScript can be used for Client-side developments as well as Server-

side developments. JavaScript contains a standard library of objects, like Array, Date,

and Math, and a core set of language elements like operators, control structures.

JSX stands for JavaScript XML. It is simply a syntax extension of JavaScript. It allows

us to directly write HTML in React (within JavaScript code). It is easy to create a

with the full power of JavaScript.

It is faster than normal JavaScript as it performs optimizations while translating to

Python is widely used in scientific and numeric computing:

SciPy is a collection of packages for mathematics, science, and engineering.

Pandas is a data analysis and modelling library.

work session, and supports visualizations and parallel computing.

bootcamps and providing open-access teaching materials

5.6 libraries used for this project are –

Beautiful soap Selinium

6.1 Regression Model

It performs a regression task. Regression models a target prediction value based on