ML Report Abhishek Awhale 2

A
Mini Project Report

on
“House Price Prediction using Area of Houses in Monroe, New Jersey”
submitted in partial fulfilment of the requirements
of the degree of
Bachelor of Technology – B.Tech ITDS
by
Abhishek Awhale
URN NO: 2019-B-24072001B
Under the Guidance of

Siddharth Nanda
December 2021
School of Engineering
Ajeenkya D Y Patil University, Pune
Declaration of Originality
I, Abhishek Awhale, URN 2019-B-24072001B, hereby declare that this project entitled “House
Price Prediction using Area of House in Monroe, New Jersey” presents my original work carried
out as a bachelor's student at the School of Engineering, Ajeenkya D Y Patil University, Pune,
Maharashtra. To the best of my knowledge, this project report contains no material previously
published or written by another person, nor any material presented by me for the award of any
degree or diploma of Ajeenkya D Y Patil University, Pune or any other institution. Any
contribution made to this research by others, with whom I have worked at Ajeenkya D Y Patil
University, Pune or elsewhere, is explicitly acknowledged in the project report. Works of other
authors cited in this project report have been duly acknowledged under the sections “Reference”
or “Bibliography”. I also declare that I have adhered to all principles of academic honesty and
integrity and have not misrepresented or fabricated or falsified any idea/data/fact/source in my
submission.
I am fully aware that in case of any non-compliance detected in future, the Academic Council of
Ajeenkya D Y Patil University, Pune may withdraw the degree awarded to me on the basis of the
present project report.
Date: 09th December 2020

Place: Lohegaon, Pune
Abhishek Awhale
Acknowledgement
I remain immensely obliged to Prof. Siddharth Nanda, Project Supervisor, for providing
me with the idea for this topic, for his invaluable support in garnering resources for me,
whether information or computers, and for the guidance and supervision that made this
Internal Project happen.
I would like to thank Prof. Siddharth Nanda, Program Coordinator B.Tech, and Dr.
Biswajeet Champaty, Head of Department for their invaluable support.
Working on this Internal Project has indeed been a fulfilling experience.
Abhishek Awhale
Dec 2021
CERTIFICATE
This is to certify that the project entitled “House Price Prediction using Area of
House in Monroe, New Jersey” is a bona fide work of “Abhishek Awhale” (Roll
No.01) submitted to the Ajeenkya D Y Patil University, Pune in partial fulfillment
of the requirement for the award of the degree of “Bachelor of Technology (B.Tech)
in Information Technology in Data Science”.
Prof. Siddharth Nanda

Project Supervisor
Dec 2021
Supervisor’s Certificate
This is to certify that the project entitled “House Price Prediction using Area of
House in Monroe, New Jersey” submitted by Nikhil Shinde, URN: 2019-B-
17062000A, is a record of original work carried out by him/her under my supervision
and guidance in partial fulfillment of the requirements of the degree of Bachelor of
Technology (B.Tech) at School of Engineering, Ajeenkya D Y Patil University,
Pune, Maharashtra 412105. Neither this project report nor any part of it has been
submitted earlier for any degree or diploma to any institute or university in India or
abroad.
Prof. Siddharth Nanda

Project Supervisor
Abstract
The House Price Index (HPI) is commonly used to estimate changes in housing prices. Since housing
prices are strongly correlated with other factors such as location, area and population, predicting
the price of an individual house requires information beyond the HPI.
A considerable number of papers have adopted traditional machine learning approaches to predict
housing prices accurately, but they rarely consider the performance of individual models and
neglect less popular yet complex models.
Therefore, to explore the impact of various features on prediction methods, this paper applies both
traditional and advanced machine learning approaches to investigate the differences among several
advanced models. It also comprehensively validates multiple techniques for implementing regression
models and provides an optimistic result for housing price prediction.
Keywords: Housing Price Prediction; Linear Regression; Machine Learning; Stacked Generalization
TABLE OF CONTENTS
DECLARATION OF ORIGINALITY
ACKNOWLEDGEMENT
CERTIFICATE
SUPERVISOR’S CERTIFICATE
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
CHAPTER 1: INTRODUCTION
1.1 Introduction
CHAPTER 2: CODE IMPLEMENTATION
CHAPTER 3: RESULTS AND VISUALIZATION
CHAPTER 4: CONCLUSION
CHAPTER 5: REFERENCES
❖ List of Figures
• Fig1: Importing packages

• Fig2: Data reading
• Fig3: Describe data
• Fig4: prices vs area
• Fig5: Regression Model
• Fig6: Data Training
• Fig7: Predicting Accuracy
❖ List of Tables
• Table 1: Abbreviation Table

• Table 2: Library and their uses
List of Abbreviations
MSE Mean Squared Error
MAE Mean Absolute Error
LR Linear Regression
Table 1: Abbreviation Table
1. INTRODUCTION
1.1 INTRODUCTION
In this notebook, we learn how to use scikit-learn to implement simple linear regression. We
download a dataset of house prices and the corresponding house areas. Then we split our data into
training and test sets, create a model using the training set, evaluate the model using the test set,
and finally use the model to predict unknown values.
We have downloaded a home prices dataset, homeprices.csv, which contains the prices of houses in
Monroe, New Jersey, together with their areas.
A train/test split divides the dataset into mutually exclusive training and testing sets. The model
is then trained with the training set and tested with the testing set. This gives a more accurate
estimate of out-of-sample accuracy, because the testing set is not part of the data used to train
the model, which is more realistic for real-world problems.
We know the outcome of each data point in the testing set, which makes it well suited for
evaluation, and since these data points were not used to train the model, the model has no
knowledge of their outcomes. In essence, this is truly out-of-sample testing.
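The split described above can be sketched with scikit-learn's `train_test_split`. This is a minimal illustration under stated assumptions, not the code in Chapter 2 (which fits on the whole file): the area/price values are hypothetical placeholders rather than rows of homeprices.csv, and the 80/20 ratio and `random_state=42` are arbitrary choices.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical area/price pairs (NOT the real homeprices.csv rows)
df = pd.DataFrame({
    "area":  [2600, 2800, 3000, 3200, 3400, 3600, 3800, 4000, 4200, 4400],
    "price": [550000, 560000, 565000, 610000, 620000,
              680000, 690000, 725000, 740000, 760000],
})

X = df[["area"]]   # feature matrix (2-D, one column)
y = df["price"]    # target vector

# Hold out 20% of the rows; training and testing sets are mutually exclusive
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

# Evaluate only on rows the model never saw while fitting
test_mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"train rows: {len(X_train)}, test rows: {len(X_test)}, test MAE: {test_mae:.0f}")
```

With ten rows and `test_size=0.2`, eight rows are used for fitting and two are held out for the out-of-sample check.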
Dataset source
• AREA e.g. 2600

• PRICE e.g. 550000
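A minimal sketch of the file layout the code in Chapter 2 expects, assuming homeprices.csv has lowercase `area` and `price` headers (as `df.area` and `df.price` in that code imply). Only the example row listed above is used; `io.StringIO` stands in for the real file so the snippet runs anywhere.

```python
import io

import pandas as pd

# Stand-in for homeprices.csv: header plus the example row from the report
csv_text = "area,price\n2600,550000\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df.columns.tolist())  # column names the rest of the code relies on
print(df)
```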
Linear regression is the most basic regression algorithm: it makes predictions by computing a
weighted sum of the input features plus a bias term. A simple linear regression is just the equation
of a line,
$$\hat{y} = \theta_0 + \theta_1 x$$
where:
$\hat{y}$: is the dependent variable, the value the model predicts; in this case the price.
$x$: is the independent variable the model uses to predict $\hat{y}$; in this case the area.
$\theta_0$: is the intercept with the price axis.
$\theta_1$: is the slope of the model.
The Scikit-learn library provides a linear model that calculates the values of $\theta_0$ and
$\theta_1$, exposed as the fitted model's intercept_ and coef_ attributes.
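For a single feature, the intercept and slope that scikit-learn reports are the ordinary least-squares solution, θ1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and θ0 = ȳ − θ1·x̄. The sketch below computes them directly with NumPy and checks them against `LinearRegression`. The data points are hypothetical and lie exactly on price = 150000 + 150·area, so both methods should recover θ0 = 150000 and θ1 = 150.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical (area, price) points on the line price = 150000 + 150 * area
x = np.array([2600.0, 3000.0, 3200.0, 3600.0, 4000.0])
y = np.array([540000.0, 600000.0, 630000.0, 690000.0, 750000.0])

# Closed-form least squares for one feature
x_mean, y_mean = x.mean(), y.mean()
theta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)  # slope
theta0 = y_mean - theta1 * x_mean                                          # intercept

# scikit-learn expects a 2-D feature matrix, hence reshape(-1, 1)
reg = LinearRegression().fit(x.reshape(-1, 1), y)

print(theta0, theta1)
print(reg.intercept_, reg.coef_[0])
```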
S. No. Used For Tools
1 For Data Visualization matplotlib
2 For Data Analysis pandas
3 For Numerical Operations NumPy
4 For Model Building and Evaluation scikit-learn
Table 2: Library and their uses
2. CODE IMPLEMENTATION
#Importing Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#using scikit
from sklearn import linear_model
from sklearn.metrics import mean_absolute_error as mae
# Data Prep
df = pd.read_csv('C:/Users/Abhishek/OneDrive/Desktop/csv/homeprices.csv')
df
#Looking at the data (summarization)
%matplotlib inline
plt.xlabel('area')
plt.ylabel('price')
plt.scatter(df.area,df.price,color='red',marker='+')
# plot price vs Area

new_df = df.drop('price',axis='columns')
new_df
price = df.price
price
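#Linear Regression: fit price as a function of area
reg = linear_model.LinearRegression()
reg.fit(new_df,price)
# Predict the price for an area of 3300; pass a DataFrame with the same
# column name used in fit() to avoid scikit-learn's feature-name warning
reg.predict(pd.DataFrame({'area': [3300]}))
reg.coef_       # slope (theta_1)
reg.intercept_  # intercept (theta_0)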
#Generate CSV file with list of home price predictions
area_df = pd.read_csv("C:/Users/Abhishek/OneDrive/Desktop/csv/areas.csv")
area_df
p = reg.predict(area_df)
p
area_df['prices']=p
area_df
area_df.to_csv("C:/Users/Abhishek/OneDrive/Desktop/prediction.csv")
#to show how my linear equation line looks

%matplotlib inline
plt.xlabel('area',fontsize=20)
plt.ylabel('price',fontsize=20)
plt.scatter(df.area,df.price,color="red",marker='+')
plt.plot(df.area,reg.predict(df[['area']]),color='blue')
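#check MAE and MSE (true prices first, predicted prices second)
from sklearn.metrics import mean_absolute_error as mae
from sklearn.metrics import mean_squared_error as mse
predicted = reg.predict(new_df)  # predictions for the areas the model was fitted on
mae(price,predicted) # mean absolute error
mse(price,predicted) # mean squared error
np.sqrt(mse(price,predicted)) # root mean squared error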
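To make the error metrics concrete, the sketch below computes MAE, MSE and RMSE by hand with NumPy and checks them against scikit-learn. The true and predicted prices are hypothetical. As in the metric calls above, the true values come first and the predictions second, and RMSE is taken as the square root of MSE so that it is expressed in price units.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical true prices and model predictions
y_true = np.array([550000.0, 565000.0, 610000.0])
y_pred = np.array([540000.0, 575000.0, 600000.0])

errors = y_true - y_pred        # residuals: [10000, -10000, 10000]
mae = np.mean(np.abs(errors))   # mean absolute error
mse = np.mean(errors ** 2)      # mean squared error
rmse = np.sqrt(mse)             # root mean squared error

print(mae, mse, rmse)  # 10000.0 100000000.0 10000.0
print(mean_absolute_error(y_true, y_pred), mean_squared_error(y_true, y_pred))
```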
3. RESULTS AND VISUALIZATION
Importing Needed packages
Fig1: Importing packages
Reading the data in
Fig2: Data reading
Plot prices with respect to Areas
Fig6: prices vs areas
Linear Regression Model
Fig7: Regression Model
Train data distribution
Fig8: Data Training
Computing MAE and MSE
Fig9: Predicting Accuracy
4. CONCLUSION
It is always good practice to visualize the data to see whether there is a linear tendency, because
the plot may reveal that the data is not linear but non-linear. Here the data was linear, so linear
regression was a good choice. The differences found with the three methods show that using more data
to predict a value is mostly wise, but using a lot of data can also result in overfitting, so one
must be careful. In this case the data does not show any sign of overfitting, because the MAE kept
decreasing with the new data while the value was increasing, so there was no discrepancy. With the
last model, we can say that the fit is quite representative and the model fits the data well.
We then move to a different dataset, since home prices mostly depend linearly on all of their
independent parameters. We also explore different kinds of curves, such as exponential ones, and try
to find the best-fitting curve for determining home prices.
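One simple way to compare a linear curve with an exponential one is to fit the exponential model price = a·e^(b·area) as a straight line on log(price), and then compare both fits by MAE. This is a sketch under stated assumptions: the data are hypothetical and generated from an exponential curve, so the exponential fit is expected to win here. With real data the comparison could go either way.

```python
import numpy as np

# Hypothetical data generated from price = 100000 * exp(0.0005 * area)
area = np.array([2600.0, 3000.0, 3200.0, 3600.0, 4000.0])
price = 100000.0 * np.exp(0.0005 * area)

# Linear fit: price ~ m * area + c (np.polyfit returns the highest degree first)
m, c = np.polyfit(area, price, 1)
linear_pred = m * area + c

# Exponential fit: log(price) ~ log(a) + b * area, a straight line in log space
b, log_a = np.polyfit(area, np.log(price), 1)
exp_pred = np.exp(log_a) * np.exp(b * area)

linear_mae = np.mean(np.abs(price - linear_pred))
exp_mae = np.mean(np.abs(price - exp_pred))
print(f"linear MAE: {linear_mae:.1f}, exponential MAE: {exp_mae:.1f}")
```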
The analysis is done with the Python scikit-learn library in Jupyter notebooks. The accuracy of each
model is verified using the (root) mean squared error and the mean absolute error.
• We import the essential packages to perform linear regression.
• We then train a model on the dataset with scikit-learn, evaluate its accuracy, and show the fit in
the form of a regression line.
5. REFERENCES
1. YouTube
2. GitHub