
A Project Report on

HOUSE PRICE PREDICTION USING MACHINE LEARNING

Submitted in partial fulfillment of the requirements for the
Summer Internship of
BACHELOR OF TECHNOLOGY
IN
INFORMATION TECHNOLOGY

Submitted by

B. Vinay Kumar 18075A1202
I. Yoginath 18075A1204
K. Rakesh 18075A1205
T. Sai Chandra 18075A1212

PROJECT GUIDE
Miss. Anusha Prakash,
Associate Professor,
Dept. of Information Technology,
VNRVJIET,

DEPARTMENT OF INFORMATION TECHNOLOGY


VNR Vignana Jyothi Institute of Engineering & Technology
(Autonomous Institute, Accredited by NAAC with ‘A++’ grade and NBA)
Bachupally, Nizampet (S.O.), Hyderabad - 500 090
Department of Information Technology

DECLARATION
We hereby declare that the project entitled “House Price Prediction Using Machine
Learning”, submitted for the B.Tech degree, is our original work and has not formed
the basis for the award of any degree, associateship, fellowship or any other
similar title.

Signatures of the Students:

B. Vinay Kumar (18075A1202)
I. Yoginath (18075A1204)
K. Rakesh (18075A1205)
T. Sai Chandra (18075A1212)

Place:
Date:
ACKNOWLEDGEMENT

We express our deep sense of gratitude to our beloved President, Dr. D. N. Rao,
VNR Vignana Jyothi Institute of Engineering & Technology, for the valuable
guidance and for permitting us to carry out this project.

With immense pleasure, we record our deep sense of gratitude to our beloved
Principal, Dr. C. D. Naidu, for permitting us to carry out this project.

We express our deep sense of gratitude to our beloved professor G. Suresh Reddy,
Associate Professor and Head, Department of Information Technology, VNR Vignana
Jyothi Institute of Engineering & Technology, Hyderabad-90, for the valuable
guidance and suggestions, keen interest and thorough encouragement extended
throughout the period of the project work.

We take immense pleasure in expressing our deep sense of gratitude to our beloved
guide, Miss Anusha Prakash, Associate Professor in Information Technology,
VNR Vignana Jyothi Institute of Engineering & Technology, Hyderabad, for her
valuable suggestions and rare insights, and for being a constant source of
encouragement and inspiration throughout our project work.

We express our thanks to all those who contributed to the successful
completion of our project work.

B. Vinay Kumar

I. Yoginath

K. Rakesh

T. Sai Chandra
INDEX
CHAPTER 1: INTRODUCTION
1.1 Need and Motivation
1.2 Present Problems Faced
1.3 Proposed Model
1.4 Objective
CHAPTER 2: GATHERING AND PROCESSING THE DATA
2.1 Dataset
2.2 Data Exploration
2.3 Data Visualization
2.4 Data Selection
CHAPTER 3: MODEL BUILDING
3.1 Introduction to Model Building
3.2 Models Used: Regression Model
3.3 Technologies Used
CHAPTER 4: IMPLEMENTATION
4.1 Reading Data
4.2 Data Preprocessing
4.3 Linear Regression Model
4.4 Website / User Interface
RESULTS
FUTURE WORK
ABSTRACT:

In today’s world, everyone wishes for a house that suits their lifestyle and
provides amenities according to their needs. House prices change frequently and
are often exaggerated. Many factors have to be taken into consideration when
predicting house prices, such as location, number of rooms, and basic local
amenities. We use the Random Forest regression algorithm, a machine learning
technique, to predict house prices from the attributes in the dataset.

CHAPTER 1: INTRODUCTION

Aim: We will evaluate ourselves against the following goals:

• Create an effective price prediction model
• Validate the model’s prediction accuracy
• Identify the important home price attributes that feed the model’s predictive power
1.1 Need and Motivation
Trends in housing prices indicate the current economic situation and are also a concern
to users. Many factors have an impact on house prices, such as the number of bedrooms
and bathrooms, and a house’s price also depends on its area. Predicting house prices
manually is difficult and generally not very accurate, which is why many systems have
been developed for house price prediction. The aim of this system is to build a model
that gives a good prediction of a house’s price based on its other variables. We used
Random Forest regression on the Ames dataset, and it gave good accuracy. With this
project, users can view the predicted price of a particular house.

1.2 Problems Faced:
The problems faced when buying a house:
1) Buying a house is stressful.
2) Buyers are generally not aware of the factors that influence house prices.
3) Buyers face many other difficulties during the purchase.
Hence, buyers trust real estate agents to handle the communication between buyers
and sellers and to draw up the legal contract for the transfer. This introduces a
middleman and increases the cost of houses.

1.3 Proposed Model: RANDOM FOREST REGRESSION

• Random Forest regression is a machine learning algorithm based on
supervised learning.
• It performs a regression task: regression models a target prediction value
based on independent variables.
• Like other regression methods, it is mostly used to find relationships between
variables and for forecasting.
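As a rough illustration of how a random forest produces a regression prediction, the sketch below fits scikit-learn's RandomForestRegressor to synthetic data. The two features (living area and number of bedrooms), their ranges and the price formula are made up for illustration and are not the project's dataset. The sketch also checks that the forest's prediction for a new house equals the average of its individual decision trees' predictions, which is how scikit-learn's random forest regressor combines its trees.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy data standing in for the house features:
# column 0 = living area in square feet, column 1 = number of bedrooms.
rng = np.random.default_rng(0)
area = rng.uniform(500, 4000, size=200)
bedrooms = rng.integers(1, 6, size=200)
X = np.column_stack([area, bedrooms])
# Synthetic price: grows with area and bedrooms, plus noise (not real data).
y = 150 * area + 10_000 * bedrooms + rng.normal(0, 20_000, size=200)

# An ensemble of 100 regression trees, each fit on a bootstrap sample of the rows.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Predict the price of a hypothetical 2000 sq ft, 3-bedroom house.
new_house = np.array([[2000, 3]])
forest_pred = model.predict(new_house)[0]

# The forest's prediction is the average of the individual trees' predictions.
tree_preds = [tree.predict(new_house)[0] for tree in model.estimators_]
print(f"forest: {forest_pred:.0f}, mean of trees: {np.mean(tree_preds):.0f}")
```

Averaging many trees, each trained on a different resample of the data, smooths out the errors of any single tree, which is the main reason a forest usually generalizes better than one deep tree.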
1.4 Objective:

• The main objective of this project is to demonstrate and implement a
methodology that predicts house prices from attributes such as location, area
and number of rooms, using the Random Forest regression model.

• The model’s predictions are tested on data it has not seen during training,
to check how accurately it generalizes.
CHAPTER 2: GATHERING AND PROCESSING THE DATA

The dataset used in this project was taken from Kaggle. It contains information
on house prices and the features of each house.

2.1 Dataset:
The dataset looks as follows:

Preview of Dataset
2.2 Data Exploration:

Data exploration is the first step in data analysis and typically involves summarizing the
main characteristics of a dataset, including its size, accuracy, initial patterns in the data
and other attributes. It is commonly conducted by data analysts using visual analytics
tools, but it can also be done with more advanced statistical software or a programming
language such as Python. Before analysing data collected from multiple sources and stored
in data warehouses, an organization must know how many cases are in the dataset, which
variables are included, how many values are missing and which general hypotheses the data
is likely to support. An initial exploration of the dataset helps answer these questions by
familiarizing analysts with the data they are working with. We divided the data 9:1
for training and testing, respectively.
fig 2.1.0 Dataset features
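The exploration questions above (how many cases, which variables, how many missing values) map onto a few pandas calls. The sketch below is a minimal example under assumptions: the small DataFrame and its column names (price, bedrooms, sqft_living, grade) are hypothetical stand-ins for the Kaggle data, which in practice would be loaded with pd.read_csv. The last step makes the 9:1 train/test split with scikit-learn's train_test_split.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical rows standing in for the Kaggle file; in practice:
# df = pd.read_csv(path_to_csv)  # path_to_csv is a placeholder
df = pd.DataFrame({
    "price": [250_000, 520_000, 180_000, 610_000, 480_000,
              1_200_000, 260_000, 300_000, 230_000, 330_000],
    "bedrooms": [3, 3, 2, 4, 3, 5, 3, 3, 2, 3],
    "sqft_living": [1200, 2500, 800, 2000, 1700, 5000, 1700, 1100, 1000, np.nan],
    "grade": [7, 7, 6, 7, 8, 11, 7, 7, 6, 7],
})

# How many cases (rows) and variables (columns) are there?
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")

# Which variables are included, and what is the type of each?
print(df.dtypes)

# How many values are missing in each column?
missing = df.isna().sum()
print(missing)

# Count, mean, standard deviation, min, quartiles and max of each numeric column.
print(df.describe())

# 9:1 split: 90% of the rows for training, 10% held out for testing.
train_df, test_df = train_test_split(df, test_size=0.1, random_state=42)
print(len(train_df), "training rows,", len(test_df), "test rows")
```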

2.3 Data Visualization:

Data visualization is the graphical representation of information and
data. By using visual elements like charts, graphs, and maps, data
visualization tools provide an accessible way to see and understand
trends, outliers, and patterns in data. In the world of Big Data, data
visualization tools and technologies are essential to analyse
massive amounts of information and make data-driven decisions.

fig 2.2.0 Data visualisation


fig 2.2.1 grade vs price
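A plot like the “grade vs price” figure can be produced with Matplotlib. The sketch below assumes columns named grade and price and uses a few made-up rows in place of the real data. It draws a scatter plot and writes it to an in-memory PNG buffer; passing a filename to savefig would save it to disk instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draw without opening a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows; the real data would come from the Kaggle dataset.
df = pd.DataFrame({
    "grade": [5, 6, 6, 7, 7, 8, 8, 9, 10, 11],
    "price": [150_000, 200_000, 240_000, 310_000, 330_000,
              450_000, 480_000, 650_000, 900_000, 1_300_000],
})

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(df["grade"], df["price"], alpha=0.7)  # one point per house
ax.set_xlabel("grade")
ax.set_ylabel("price")
ax.set_title("grade vs price")
fig.tight_layout()

# Write the figure as PNG into memory; fig.savefig("grade_vs_price.png") would save a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

A scatter plot suits this pair because grade takes a handful of discrete values while price is continuous, so the vertical spread at each grade shows how much price varies within a grade.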

2.4 Data Selection


Data selection is defined as the process of determining the appropriate data type and source, as well
as suitable instruments to collect data. Data selection precedes the actual practice of data collection.
This definition distinguishes data selection from selective data reporting and interactive/active data
selection (using collected data for monitoring activities/events, or conducting secondary data
analyses). The process of selecting suitable data for a research project can impact data integrity. The
primary objective of data selection is the determination of appropriate data type, source, and
instrument(s) that allow investigators to adequately answer research questions. This determination
is often discipline-specific and is primarily driven by the nature of the investigation, existing
literature, and accessibility to necessary data sources.

fig 2.3.1 Correlation heatmap
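One common way to use a correlation heatmap when choosing features is to keep those whose correlation with the target is strong. The sketch below computes the Pearson correlation matrix with pandas (the numbers a heatmap would colour) and keeps the features whose absolute correlation with price exceeds a threshold. The data, column names and the 0.5 cut-off are hypothetical; the report does not state which selection rule was used.

```python
import pandas as pd

# Hypothetical sample; column names are assumptions.
df = pd.DataFrame({
    "price":       [200, 250, 300, 350, 400, 450, 500, 550],
    "sqft_living": [1000, 1200, 1500, 1700, 2000, 2300, 2500, 2700],
    "grade":       [6, 6, 7, 7, 8, 8, 9, 9],
    "yr_built":    [1990, 1965, 2001, 1950, 1980, 2010, 1972, 1995],
})

corr = df.corr()  # Pearson correlation matrix: the values behind a correlation heatmap
target_corr = corr["price"].drop("price")  # correlation of each feature with the target

THRESHOLD = 0.5  # hypothetical cut-off
selected = target_corr[target_corr.abs() > THRESHOLD].index.tolist()
print(target_corr.round(2))
print("selected features:", selected)
```

Using the absolute value keeps strongly negative correlations too. Pearson correlation only captures linear relationships, so a feature with a weak score can still be useful to a non-linear model such as a random forest.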


CHAPTER 3: MODEL BUILDING

3.1 Introduction to Model Building:


The model was built in three steps:

● Collect the data and split it into training and test sets.
● Train the machine learning model, which in this case is linear regression.
● Predict the cost of the house based on the graphs and the inputs.
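The three steps above can be sketched with scikit-learn as follows. This is an illustration under assumptions: a single synthetic feature (living area) and a made-up price formula stand in for the real dataset, and test_size=0.3 follows the 7:3 split described in Section 4.4. The model is linear regression, as named in the second step.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Step 1: collect data (synthetic stand-in; real features come from the dataset).
rng = np.random.default_rng(1)
X = rng.uniform(500, 4000, size=(100, 1))                    # living area, sq ft
y = 120 * X[:, 0] + 50_000 + rng.normal(0, 10_000, 100)      # price

# Step 1 (continued): split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Step 2: train the model (linear regression, as named in the step above).
model = LinearRegression().fit(X_train, y_train)

# Step 3: predict the cost of a new house with a hypothetical area of 1800 sq ft.
predicted = model.predict([[1800]])[0]
print(f"predicted price for 1800 sq ft: {predicted:,.0f}")
```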

3.2 MODELS USED: RANDOM FOREST REGRESSION


• Random Forest regression is a machine learning algorithm
based on supervised learning.
• It performs a regression task: regression models a target prediction value
based on independent variables.
• It is an ensemble method: it trains many decision trees, each on a random
bootstrap sample of the training data, and averages their predictions.
• Like other regression methods, it is used to find relationships between
variables and for forecasting.
RANDOM FOREST REGRESSION GRAPH
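The Aim in Chapter 1 includes identifying the attributes that drive the model’s predictions. A fitted random forest in scikit-learn exposes impurity-based importances through feature_importances_, and the sketch below ranks features with it. The features, including a deliberately irrelevant random_noise column, and the price formula are synthetic assumptions, so the ranking says nothing about the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (assumption): price depends strongly on area, weakly on
# bedrooms, and not at all on the random_noise column.
rng = np.random.default_rng(2)
n = 300
sqft = rng.uniform(500, 4000, n)
bedrooms = rng.integers(1, 6, n).astype(float)
random_noise = rng.normal(size=n)
X = np.column_stack([sqft, bedrooms, random_noise])
y = 150 * sqft + 5_000 * bedrooms + rng.normal(0, 10_000, n)

feature_names = ["sqft_living", "bedrooms", "random_noise"]
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances: non-negative and summing to 1; a larger value
# means the feature accounted for more of the variance reduction in the trees.
ranking = sorted(zip(feature_names, model.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name:>13}: {score:.3f}")
```

Impurity-based importances can favour features with many distinct values; scikit-learn’s permutation_importance is an alternative when that bias matters.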
3.3 TECHNOLOGIES USED
1) Python
2) NumPy and pandas for data cleaning
3) Matplotlib for data visualization
4) scikit-learn (sklearn) for model building
5) Jupyter Notebook, Visual Studio Code and PyCharm as IDEs
6) Python Flask for the HTTP server
7) HTML/CSS/JavaScript for the UI
CHAPTER 4: IMPLEMENTATION

4.1 Reading Data:

4.2 Data Preprocessing:

Many real-world datasets contain missing values for various reasons. They
are often encoded as NaNs, blanks or other placeholders. Training a model
on a dataset that has many missing values can drastically reduce the machine
learning model’s quality. Many scikit-learn estimators assume that all values
are numerical and meaningful, and they raise an error when the input contains
NaN. One way to handle this problem is to drop the observations that have
missing data; however, this risks losing data points with valuable information.
A better strategy is often to impute the missing values, that is, to fill them
in from the known data.
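Both strategies from the paragraph above can be compared in a few lines. The sketch uses pandas’ dropna to discard incomplete rows, and scikit-learn’s SimpleImputer with the median strategy to fill the gaps instead. The data and column names are hypothetical, and the median is only one of several possible imputation choices (mean, most frequent value, or a constant are others SimpleImputer supports).

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "bedrooms": [3, np.nan, 2, 4, 3],
    "sqft_living": [1200, 2500, np.nan, 2000, 1700],
})

# Option 1: drop every row that has a missing value (loses 2 of 5 rows here).
dropped = df.dropna()
print(len(dropped), "rows kept after dropna")

# Option 2: impute, filling each gap with the median of its column.
imputer = SimpleImputer(strategy="median")
filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(filled)
```

In a real pipeline the imputer should be fitted on the training split only and then applied to the test split, so that statistics from the test data do not leak into training.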

4.3 Handling NA Values:

fig 4.2.1 Heatmap for missing values

4.4 Training and Testing

It is useful to evaluate our model once it is trained. We want to know whether it has
learned properly from the training split of the data. There are three possible
situations:
1) The model did not learn the data well and cannot predict even the outcomes of
the training set. This is called underfitting, and it is caused by high bias.
2) The model learned the training data too well, to the point of memorizing it,
and cannot generalize to new data. This is called overfitting, and it is caused
by high variance.
3) The model has the right balance between bias and variance: it learned well
and can correctly predict the outcomes on new data.

We split our dataset into training and testing data in a 7:3 ratio:
70% of the data is used to train the model and 30% is used as
testing data to evaluate the model.
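One way to tell the three situations apart is to compare the R² score (defined in Section 4.5) on the training and test splits. The sketch below is illustrative only: it uses synthetic non-linear data and a single decision tree (the building block of a random forest) rather than the project’s model, and it varies the tree depth. A depth-1 tree scores poorly on both splits (underfitting), an unlimited tree scores almost perfectly on the training data but noticeably worse on the test data (overfitting), and a moderate depth keeps the two scores close together.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic, non-linear target with noise (assumption, not the project's data).
rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(300, 1))
y = 100 * np.sin(X[:, 0]) + rng.normal(0, 20, 300)

# 7:3 split: 70% of the rows train the model, 30% are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

results = {}
for depth in (1, 4, None):  # too simple, moderate, unlimited
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    train_r2 = r2_score(y_train, tree.predict(X_train))
    test_r2 = r2_score(y_test, tree.predict(X_test))
    results[depth] = (train_r2, test_r2)
    print(f"max_depth={depth}: train R2 = {train_r2:.2f}, test R2 = {test_r2:.2f}")
```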

4.5 Defining a Performance Metric

It is difficult to measure the quality of a given model without quantifying its
performance on the training and testing data. This is typically done using some type of
performance metric, whether it is an error measure, the goodness of fit, or some other
useful measurement. For this project, we calculate the coefficient of determination, R²,
to quantify the model’s performance. The coefficient of determination is a useful
statistic in regression analysis, as it often describes how “good” a model is at
making predictions.

R² is at most 1. It measures the proportion of the variance in the target variable
that the model’s predictions explain. On test data it usually lies between 0 and 1,
but it can be negative when the model does worse than always predicting the mean.
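In symbols, with y_i the actual price of house i, ŷ_i its predicted price and ȳ the mean actual price over the n houses being evaluated, the standard definition (the one scikit-learn’s r2_score computes) is:

```latex
R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}
    = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

If every prediction equals ȳ, the numerator and denominator are equal and R² = 0; if every prediction is exact, the numerator is 0 and R² = 1; a model that does worse than the mean makes the numerator larger than the denominator, so R² becomes negative.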
● A model with an R² of 0 is no better than a model that always predicts the mean
of the target variable.
● A model with an R² of 1 perfectly predicts the target variable.
● A value between 0 and 1 indicates the proportion of the variance in the target
variable that the features explain through this model.
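R² can be computed directly as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below implements that with NumPy and compares the result with scikit-learn’s r2_score; the four actual/predicted value pairs are arbitrary toy numbers, not project results. It also shows the reference cases from the list above: predicting the mean gives 0, and a model worse than the mean gives a negative score.

```python
import numpy as np
from sklearn.metrics import r2_score

def r_squared(y_true, y_pred):
    """R² = 1 - (residual sum of squares) / (total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # error left by the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # error of always predicting the mean
    return 1.0 - ss_res / ss_tot

actual = [3.0, -0.5, 2.0, 7.0]
predicted = [2.5, 0.0, 2.0, 8.0]
print(r_squared(actual, predicted), r2_score(actual, predicted))  # both about 0.949

mean_only = [np.mean(actual)] * len(actual)
print(r_squared(actual, mean_only))  # 0.0: no better than the mean

reversed_pred = actual[::-1]
print(r_squared(actual, reversed_pred))  # negative: worse than the mean
```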
