House Price Prediction

1) The document discusses building models to predict house prices in California using various census data metrics for different block groups. 2) It outlines the process of getting data, preparing it for machine learning algorithms, selecting and training a model, fine-tuning the model, and presenting results. 3) EDA of the data found outliers, relatively normal distributions, high correlation between house prices and median income, and valuable information in longitude and latitude that could be used for feature engineering.

Uploaded by

Sanidhya pasari

HOUSE PRICE PREDICTION
Using Machine Learning
Introduction
We will build models to predict house prices in California using California
Census data.
The data consists of metrics such as population, median income, and median
house price for each block group in California. A block group typically has a
population of 600 to 3,000.
The ultimate goal of the project is to build a prediction engine capable of
predicting a district's median housing price.
Project Process
Step 1: Get the data.
Step 2: Discover and visualize the data to gain insights.
Step 3: Prepare the data for Machine Learning algorithms.
Step 4: Select a model and train it.
Step 5: Fine-tune your model.
Step 6: Present your model.


Data Preparation
df.info(), df.describe() and df.head() are usually among the first things to
run on a pandas DataFrame. They show the feature names and types, the summary
statistics (including minimum and maximum), and the first few rows
respectively, which gives an initial impression of the data.
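As a minimal sketch of these three calls, the snippet below builds a tiny stand-in table. The values are made up for illustration, and the column names other than median_income, median_house_value and ocean_proximity are assumptions about the dataset rather than something the slides state.

```python
import io

import pandas as pd

# Hypothetical miniature of the housing table; values are made up.
df = pd.DataFrame(
    {
        "median_income": [8.3, 7.2, 5.6, 3.8, 2.1],
        "total_rooms": [880, 7099, 1467, 1274, 1627],
        "median_house_value": [452600, 358500, 352100, 341300, 120000],
        "ocean_proximity": ["NEAR BAY", "NEAR BAY", "<1H OCEAN", "INLAND", "INLAND"],
    }
)

# info(): column names, dtypes and non-null counts (captured as text here).
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)

# describe(): count, mean, std, min, quartiles and max of the numeric columns.
summary = df.describe()
print(summary)

# head(): the first rows of the table (five by default, three requested here).
first_rows = df.head(3)
print(first_rows)
```

On the real dataset the same three calls would be run on the DataFrame loaded from the census file.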
DATA ANALYSIS
Scatter Matrix
This plot reveals a few things. First, the correlation between median income
and median house value is indeed very strong: you can clearly see the upward
trend, and the points are not too dispersed.
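The strength of such a trend can also be checked numerically. The sketch below computes the Pearson correlation coefficient by hand on a few made-up (income, house value) pairs. The numbers are illustrative only and are not taken from the census data. For the plot itself, pandas provides pandas.plotting.scatter_matrix.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up sample: income rises and house value tends to rise with it.
median_income = [1.5, 2.0, 3.0, 4.5, 6.0, 8.0]
median_house_value = [90000, 150000, 140000, 260000, 250000, 450000]

r = pearson(median_income, median_house_value)
print(f"Pearson r = {r:.3f}")
```

A value close to 1 corresponds to the tight upward trend described above.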
Ocean Proximity
EDA Conclusions
Conclusions from EDA:
1) There are outliers in all features.
2) In general, the distributions are close to normal.
3) There is a high correlation (~0.7) between the target variable
(median_house_value) and median_income.
4) Several features are highly correlated with each other (multicollinearity).
5) Longitude and latitude carry valuable information and should be used in
feature generation.
6) The largest number of houses falls in the <1H OCEAN category.
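Two of these checks are easy to sketch in code. The example below flags outliers with the common 1.5 x IQR rule and counts category frequencies. The IQR rule is an assumption here, since the slides do not say how outliers were identified, and the sample values are made up.

```python
import statistics
from collections import Counter


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up median_income sample with one extreme value.
median_income = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 15.0]
outliers = iqr_outliers(median_income)
print("outliers:", outliers)

# Made-up ocean_proximity sample; count how often each category occurs.
ocean_proximity = ["<1H OCEAN", "INLAND", "<1H OCEAN", "NEAR BAY", "<1H OCEAN", "INLAND"]
counts = Counter(ocean_proximity)
most_common_category = counts.most_common(1)[0][0]
print("most common category:", most_common_category)
```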
Feature Engineering
Feature engineering is the creation and modification of the feature matrix. It
is important and quite cyclic: we want to feed the model a feature matrix that
helps it learn something useful.
We want the input data to be as relevant as possible to the prediction of the
target variable, with as little overlap between features as possible.
Features with very high correlation teach a model similar things multiple
times, so consider combining them and dropping the others.
We create three new features: "rooms_per_household", "bedrooms_per_room" and
"population_per_household".
The new bedrooms_per_room attribute is much more correlated with the
median house value than the total number of rooms or bedrooms.
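A minimal sketch of how these three ratio features might be derived with pandas follows. The source column names (total_rooms, total_bedrooms, population, households) are assumptions about the dataset, and the rows are made up.

```python
import pandas as pd

# Hypothetical rows; the raw column names are assumed, not taken from the slides.
housing = pd.DataFrame(
    {
        "total_rooms": [880, 7099, 1467],
        "total_bedrooms": [129, 1106, 190],
        "population": [322, 2401, 496],
        "households": [126, 1138, 177],
    }
)

# Ratios turn block-level totals into comparable per-household or per-room rates.
housing["rooms_per_household"] = housing["total_rooms"] / housing["households"]
housing["bedrooms_per_room"] = housing["total_bedrooms"] / housing["total_rooms"]
housing["population_per_household"] = housing["population"] / housing["households"]

print(housing[["rooms_per_household", "bedrooms_per_room", "population_per_household"]])
```

Once the new columns exist, their correlation with median_house_value can be compared against that of the raw totals.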
Model Selection
After trying all the models, Random Forest comes out as one of the best, with an error (RMSE) of
$50,182. The error is still considerably higher on the validation sets than on the training set,
which suggests the model may be overfitting.

[Figure: bar chart of RMSE values (axis from 0 to 80,000) for Linear Regression, Decision Tree, Random Forest and XGBoost]
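The comparison of training and validation error can be sketched with scikit-learn as below. The data is synthetic, the hyperparameters are arbitrary, and XGBoost is left out because it is a separate package, so this illustrates the procedure and does not reproduce the figures above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the prepared feature matrix and target (an assumption).
rng = np.random.default_rng(42)
X = rng.uniform(0.5, 10.0, size=(400, 3))
y = 40000 * X[:, 0] + 5000 * np.sin(X[:, 1]) + rng.normal(0, 20000, size=400)

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=42),
}


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {
        "train": rmse(y_train, model.predict(X_train)),
        "valid": rmse(y_valid, model.predict(X_valid)),
    }
    print(f"{name}: train={results[name]['train']:.0f} valid={results[name]['valid']:.0f}")
```

A validation RMSE well above the training RMSE is the overfitting signal the slide refers to.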
Model Fine Tuning

After fine-tuning the model we reach the following conclusions:
The prediction error can fluctuate anywhere between $45,685 and $49,691.
The R² score is around 81 percent.
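The slides do not say how the error range was obtained. One common approach, assumed here, is a confidence interval around the mean squared error on a held-out set, converted back to an RMSE. The sketch below uses a normal approximation and made-up predictions, and also computes the R² score from its definition.

```python
import math
import statistics


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse_interval(y_true, y_pred, confidence=0.95):
    """Return (low, point, high) RMSE using a normal approximation on squared errors."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    mean_se = statistics.fmean(squared_errors)
    sem = statistics.stdev(squared_errors) / math.sqrt(len(squared_errors))
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    low = max(mean_se - z * sem, 0.0)  # a mean squared error cannot be negative
    high = mean_se + z * sem
    return math.sqrt(low), math.sqrt(mean_se), math.sqrt(high)


# Made-up true values and predictions, for illustration only.
y_true = [120000, 200000, 310000, 150000, 420000, 275000, 90000, 500000]
y_pred = [150000, 180000, 290000, 210000, 380000, 300000, 130000, 440000]

low, point, high = rmse_interval(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(f"RMSE {point:.0f}, interval [{low:.0f}, {high:.0f}], R2 {r2:.3f}")
```

With only a handful of samples a t-distribution would be the more careful choice; the normal approximation keeps the sketch dependency-free.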
Conclusions and Next Steps
I believe this model could be tuned further to improve accuracy, either by
adding new data or by engineering new features. The same approach could be
used to predict house prices in other geographic locations after retraining on
local data and adjusting the features and parameters.
Thank you!
