
Regression:

Predicting House Prices


Emily Fox & Carlos Guestrin
Machine Learning Specialization
University of Washington
©2015 Emily Fox & Carlos Guestrin, Machine Learning Specialization
Predicting house prices



How much is my house worth?

I want to list my house for sale


How much is my house worth?

$$ ????



Look at recent sales in my neighborhood
•  How much did they sell for?



Plot recent house sales
(Past 2 years)
[Plot: price ($), y, versus square feet (sq.ft.), x]

Terminology:
x – feature, covariate, or predictor
y – observation or response

Predict your house by similar houses
[Plot: price ($), y, versus square feet (sq.ft.), x]

No house sold recently had exactly the same sq.ft.

Predict your house by similar houses
[Plot: price ($), y, versus square feet (sq.ft.), x]

•  Look at average price in range
•  Still only 2 houses!
•  Throwing out info from all other sales

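As a rough illustration of the averaging idea above, the sketch below averages the prices of houses whose size falls within a window around yours. The sales data, the function name average_price_in_range, and the 150 sq.ft. window are all made up for illustration and are not from the slides.

```python
from statistics import mean

# Made-up (sq.ft., sale price in $) pairs; not data from the slides.
recent_sales = [
    (1100, 250_000),
    (1450, 310_000),
    (1900, 405_000),
    (2050, 420_000),
    (2600, 540_000),
    (3100, 610_000),
]

def average_price_in_range(sales, sqft, window):
    """Average the prices of houses within `window` sq.ft. of `sqft`.

    Returns (average, count); the average is None when no house is in range.
    """
    nearby = [price for size, price in sales if abs(size - sqft) <= window]
    return (mean(nearby) if nearby else None), len(nearby)

my_sqft = 2000  # hypothetical size of "my house"
estimate, n_used = average_price_in_range(recent_sales, my_sqft, window=150)
print(f"estimate from {n_used} of {len(recent_sales)} sales: {estimate}")
```

With a narrow window only a couple of sales contribute to the estimate, which is the limitation the slide points out: every other sale is thrown away.
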
Linear regression



Use a linear regression model
Fit a line through the data
[Plot: price ($), y, versus square feet (sq.ft.), x]

\[ f(x) = w_0 + w_1 x \]

w_0, w_1: parameters of model

Use a linear regression model
Fit a line through the data
[Plot: price ($), y, versus square feet (sq.ft.), x]

\[ f_w(x) = w_0 + w_1 x \]

function parameterized by w = (w_0, w_1)

Which line?
[Plot: price ($), y, versus square feet (sq.ft.), x]

\[ f_w(x) = w_0 + w_1 x \]

different parameters w

“Cost” of using a given line
Residual sum of squares (RSS)
[Plot: price ($), y, versus square feet (sq.ft.), x]

\[
\begin{aligned}
\mathrm{RSS}(w_0, w_1) ={} & (\$_{\text{house 1}} - [w_0 + w_1\,\text{sq.ft.}_{\text{house 1}}])^2 \\
 & + (\$_{\text{house 2}} - [w_0 + w_1\,\text{sq.ft.}_{\text{house 2}}])^2 \\
 & + (\$_{\text{house 3}} - [w_0 + w_1\,\text{sq.ft.}_{\text{house 3}}])^2 \\
 & + \dots \quad \text{[include all houses]}
\end{aligned}
\]

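The RSS definition above translates almost directly into code. The following is a minimal sketch in Python; the function names predict and rss, the four sales, and the two candidate lines are assumptions made up for illustration.

```python
def predict(w0, w1, sqft):
    """Evaluate the line f_w(x) = w0 + w1 * x at x = sqft."""
    return w0 + w1 * sqft

def rss(w0, w1, sqft, price):
    """Residual sum of squares of the line (w0, w1) over all houses."""
    return sum((p - predict(w0, w1, x)) ** 2 for x, p in zip(sqft, price))

# Made-up sales (sq.ft., price in $); not data from the slides.
sqft = [1000, 1500, 2000, 2500]
price = [200_000, 310_000, 390_000, 500_000]

# Two hypothetical candidate lines; the smaller RSS indicates the better fit.
rss_a = rss(0, 200, sqft, price)        # f(x) = 200 x
rss_b = rss(100_000, 100, sqft, price)  # f(x) = 100000 + 100 x
print(rss_a, rss_b)
```

Comparing rss_a and rss_b is one way to answer "which line?" for any two candidates: the line with the lower cost is preferred.
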
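Find “best” line
Minimize cost over all possible w_0, w_1
[Plot: price ($), y, versus square feet (sq.ft.), x]

\[
\begin{aligned}
\mathrm{RSS}(w_0, w_1) ={} & (\$_{\text{house 1}} - [w_0 + w_1\,\text{sq.ft.}_{\text{house 1}}])^2 \\
 & + (\$_{\text{house 2}} - [w_0 + w_1\,\text{sq.ft.}_{\text{house 2}}])^2 \\
 & + (\$_{\text{house 3}} - [w_0 + w_1\,\text{sq.ft.}_{\text{house 3}}])^2 \\
 & + \dots \quad \text{[include all houses]}
\end{aligned}
\]

ŵ: the parameters that minimize RSS
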
Predicting your house price
[Plot: price ($), y, versus square feet (sq.ft.), x]

\[ f_{\hat{w}}(x) = \hat{w}_0 + \hat{w}_1 x \]

Best guess of your house price:

\[ \hat{y} = \hat{w}_0 + \hat{w}_1\,\text{sq.ft.}_{\text{your house}} \]

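The slides defer the fitting algorithm ("algorithms to come…"), so the sketch below swaps in the standard closed-form least-squares solution for a single feature to find ŵ and then form a prediction. The function names, the made-up sales, and the 1800 sq.ft. house are assumptions for illustration only.

```python
def fit_line(x, y):
    """Return (w0_hat, w1_hat) minimizing RSS for f(x) = w0 + w1 * x.

    Closed form for one feature:
    slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2),
    intercept = mean_y - slope * mean_x.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    w1_hat = sxy / sxx
    w0_hat = y_mean - w1_hat * x_mean
    return w0_hat, w1_hat

def rss(w0, w1, x, y):
    """Residual sum of squares of the line (w0, w1)."""
    return sum((yi - (w0 + w1 * xi)) ** 2 for xi, yi in zip(x, y))

# Made-up sales (sq.ft., price in $); not data from the slides.
sqft = [1000, 1500, 2000, 2500]
price = [200_000, 310_000, 390_000, 500_000]

w0_hat, w1_hat = fit_line(sqft, price)

# Best guess for a hypothetical 1800 sq.ft. house.
my_price = w0_hat + w1_hat * 1800
print(w0_hat, w1_hat, my_price)
```

Any other choice of (w0, w1) gives an RSS at least as large on these sales, which is what "best" means here.
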
Adding higher order effects



Fit data with a line or … ?
[Plot: price ($), y, versus square feet (sq.ft.), x]

You show your friend your analysis


Fit data with a line or … ?
[Plot: price ($), y, versus square feet (sq.ft.), x]

Dude, it’s not a linear relationship!

What about a quadratic function?
[Plot: price ($), y, versus square feet (sq.ft.), x]

\[ f_w(x) = w_0 + w_1 x + w_2 x^2 \]

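A quadratic can be fit with the same least-squares criterion as the line. The sketch below uses NumPy's polyfit as one convenient way to do this; it is not necessarily the method used in the course, and the data (in thousands of sq.ft. and thousands of dollars) and helper names are made up for illustration.

```python
import numpy as np

# Made-up sales; sizes in thousands of sq.ft., prices in thousands of $.
sqft_k = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
price_k = np.array([210.0, 290.0, 390.0, 520.0, 660.0, 830.0])

def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit; returns weights ordered w0, w1, ..., w_degree."""
    # np.polyfit returns the highest power first, so reverse to match the slides.
    return np.polyfit(x, y, degree)[::-1]

def evaluate(w, x):
    """Evaluate f_w(x) = w0 + w1 x + w2 x^2 + ... for weights w."""
    return sum(w_j * x ** j for j, w_j in enumerate(w))

def rss(w, x, y):
    """Residual sum of squares of the polynomial with weights w."""
    return float(np.sum((y - evaluate(w, x)) ** 2))

w_line = fit_polynomial(sqft_k, price_k, 1)
w_quad = fit_polynomial(sqft_k, price_k, 2)
print("line RSS:", rss(w_line, sqft_k, price_k))
print("quadratic RSS:", rss(w_quad, sqft_k, price_k))
```

Because every line is also a quadratic with w_2 = 0, the quadratic's RSS on the fitted data can never exceed the line's.
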
Even higher order polynomial
[Plot: price ($), y, versus square feet (sq.ft.), x]

I can minimize your RSS

Do you believe this fit?
[Plot: price ($), y, versus square feet (sq.ft.), x]

My house isn’t worth so little

Evaluating overfitting via training/test split


Do you believe this fit?
[Plot: price ($), y, versus square feet (sq.ft.), x]

Minimizes RSS, but bad predictions

What about a quadratic function?
[Plot: price ($), y, versus square feet (sq.ft.), x]

\[ f_w(x) = w_0 + w_1 x + w_2 x^2 \]

How to choose model order/complexity

•  Want good predictions, but can’t observe future
•  Simulate predictions
1.  Remove some houses
2.  Fit model on remaining
3.  Predict held-out houses

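Step 1 above, removing some houses, can be sketched as a random split. The function name train_test_split, the 25% held-out fraction, the fixed seed, and the eight houses below are assumptions chosen for illustration, not values from the slides.

```python
import random

def train_test_split(houses, test_fraction=0.25, seed=0):
    """Shuffle the houses and hold out a fraction of them.

    Returns (remaining_houses, held_out_houses).
    """
    shuffled = list(houses)                # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    n_held_out = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_held_out:], shuffled[:n_held_out]

# Made-up sales (sq.ft., price in $); not data from the slides.
houses = [
    (1000, 200_000), (1200, 250_000), (1500, 310_000), (1800, 352_000),
    (2000, 390_000), (2200, 430_000), (2500, 500_000), (3000, 590_000),
]

remaining, held_out = train_test_split(houses)
print(len(remaining), "houses to fit on,", len(held_out), "held out")
```

The model is then fit only on the remaining houses (step 2) and its predictions are compared with the actual prices of the held-out houses (step 3).
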
Training/test split

Terminology:
– training set: the houses used to fit the model
– test set: the held-out houses used to assess predictions

Training error
[Plot: price ($), y, versus square feet (sq.ft.), x]

\[
\begin{aligned}
\text{Training error}(w) ={} & (\$_{\text{train 1}} - f_w(\text{sq.ft.}_{\text{train 1}}))^2 \\
 & + (\$_{\text{train 2}} - f_w(\text{sq.ft.}_{\text{train 2}}))^2 \\
 & + (\$_{\text{train 3}} - f_w(\text{sq.ft.}_{\text{train 3}}))^2 \\
 & + \dots \quad \text{[include all training houses]}
\end{aligned}
\]

Minimize to find ŵ

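Test error
[Plot: price ($), y, versus square feet (sq.ft.), x]

\[
\begin{aligned}
\text{Test error}(\hat{w}) ={} & (\$_{\text{test 1}} - f_{\hat{w}}(\text{sq.ft.}_{\text{test 1}}))^2 \\
 & + (\$_{\text{test 2}} - f_{\hat{w}}(\text{sq.ft.}_{\text{test 2}}))^2 \\
 & + (\$_{\text{test 3}} - f_{\hat{w}}(\text{sq.ft.}_{\text{test 3}}))^2 \\
 & + \dots \quad \text{[include all test houses]}
\end{aligned}
\]

Assess predictions using ŵ
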
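Training error and test error use the same sum of squared errors, only over different houses. The sketch below fits a line on a made-up training set with the standard single-feature closed form (an assumption; the slides do not specify the fitting algorithm) and then evaluates both errors. All names and data are illustrative.

```python
def fit_line(x, y):
    """Closed-form least-squares line fit; returns (w0_hat, w1_hat)."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    w1_hat = sxy / sxx
    return y_mean - w1_hat * x_mean, w1_hat

def squared_error(w, houses):
    """Sum of squared prediction errors of the line w = (w0, w1) over houses."""
    w0, w1 = w
    return sum((price - (w0 + w1 * sqft)) ** 2 for sqft, price in houses)

# Made-up split of (sq.ft., price in $) pairs; not data from the slides.
train = [(1000, 200_000), (1500, 310_000), (2000, 390_000), (2500, 500_000)]
test = [(1200, 250_000), (2200, 430_000)]

# Fit on the training houses only.
w_hat = fit_line([s for s, _ in train], [p for _, p in train])

training_error = squared_error(w_hat, train)  # minimized when choosing w_hat
test_error = squared_error(w_hat, test)       # assesses predictions on unseen houses
print(training_error, test_error)
```

The test houses play no part in choosing ŵ, so the test error stands in for how the model would do on houses it has not seen.
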
Training/Test Curves
[Plot: error versus model complexity]

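One way to trace such curves is to fit polynomials of increasing degree on the training set and record both errors for each. The sketch below assumes polynomial degree as the measure of model complexity and uses NumPy's polyfit and polyval; the data (thousands of sq.ft., thousands of dollars) and the range of degrees are made up for illustration.

```python
import numpy as np

# Made-up data: sizes in thousands of sq.ft., prices in thousands of $.
train_x = np.array([1.0, 1.3, 1.6, 1.9, 2.2, 2.5, 2.8, 3.1])
train_y = np.array([205.0, 262.0, 300.0, 371.0, 410.0, 489.0, 530.0, 618.0])
test_x = np.array([1.15, 2.05, 2.95])
test_y = np.array([230.0, 392.0, 575.0])

def squared_error(coeffs, x, y):
    """Sum of squared errors of the fitted polynomial on (x, y)."""
    return float(np.sum((y - np.polyval(coeffs, x)) ** 2))

degrees = [1, 2, 3, 4, 5]
training_errors, test_errors = [], []
for degree in degrees:
    coeffs = np.polyfit(train_x, train_y, degree)  # fit on training houses only
    training_errors.append(squared_error(coeffs, train_x, train_y))
    test_errors.append(squared_error(coeffs, test_x, test_y))

for d, tr, te in zip(degrees, training_errors, test_errors):
    print(f"degree {d}: training error {tr:.1f}, test error {te:.1f}")
```

Training error can only stay the same or fall as the degree grows, because each higher-degree model contains the lower-degree ones. Test error carries no such guarantee, which is why it is the quantity to compare when choosing among models.
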


Adding other features



Predictions just based on house size
[Plot: price ($), y, versus square feet (sq.ft.), x]

Only 1 bathroom! Not same as my 3 bathrooms

Add more features
[Plot: price ($), y, versus square feet (sq.ft.), x1, and # bathrooms, x2]

\[ f_w(x) = w_0 + w_1\,\text{sq.ft.} + w_2\,\#\text{bath} \]

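With two features the fit becomes a plane rather than a line, but the least-squares idea is unchanged. The sketch below uses NumPy's linalg.lstsq as one possible solver (an assumption, not necessarily the course's method). The houses are made up, and their prices are constructed to lie exactly on a plane so the recovered weights are easy to check.

```python
import numpy as np

# Made-up houses: columns are (sq.ft., # bathrooms).
features = np.array([
    [1000, 1],
    [1500, 2],
    [1800, 1],
    [2000, 3],
    [2400, 2],
    [3000, 3],
], dtype=float)

# Made-up prices generated as 50000 + 150 * sq.ft. + 20000 * #bath.
price = np.array([220_000, 315_000, 340_000, 410_000, 450_000, 560_000], dtype=float)

# Prepend a column of ones so that w0 acts as the intercept.
X = np.column_stack([np.ones(len(features)), features])

# Least-squares solution for w = (w0, w1, w2).
w_hat, *_ = np.linalg.lstsq(X, price, rcond=None)

def predict(w, sqft, n_bath):
    """Evaluate f_w(x) = w0 + w1 * sq.ft. + w2 * #bath."""
    return w[0] + w[1] * sqft + w[2] * n_bath

print(w_hat)
print(predict(w_hat, 2000, 3))
```

Two houses of the same size but different bathroom counts now receive different predictions, which addresses the objection on the previous slide.
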
How many features to use?
•  Possible choices:
- Square feet
- # bathrooms
- # bedrooms
- Lot size
- Year built
- …
•  See Regression Course!



Other regression examples



Salary after ML specialization

[Figure label: hard work]

•  How much will your salary be? (y = $$)
•  Depends on x = performance in courses, quality of capstone project, # of forum responses, …


Salary after ML specialization

\[ \hat{y} = \hat{w}_0 + \hat{w}_1\,\text{performance} + \hat{w}_2\,\text{capstone} + \hat{w}_3\,\text{forum} \]

informed by other students who completed specialization

Stock prediction
•  Predict the price of a stock
•  Depends on
- Recent history of stock price
- News events
- Related commodities



Tweet popularity
•  How many people will retweet your tweet?
•  Depends on # followers, # of followers of followers, features of text tweeted, popularity of hashtag, # of past retweets, …


Smart houses
•  Smart houses have many distributed sensors
•  What’s the temperature at your desk? (no sensor)
-  Learn spatial function to predict temp
•  Also depends on
-  Thermostat setting
-  Blinds open/closed or window tint
-  Vents
-  Temperature outside
-  Time of day



Summary for regression



What you can do now…
•  Describe the input (features) and output (real-valued predictions) of a regression model
•  Calculate a goodness-of-fit metric (e.g., RSS)
•  Estimate model parameters by minimizing RSS (algorithms to come…)
•  Exploit the estimated model to form predictions
•  Perform a training/test split of the data
•  Analyze performance of various regression models in terms of test error
•  Use test error to avoid overfitting when selecting amongst candidate models
•  Describe a regression model using multiple features
•  Describe other applications where regression is useful

