
Regression:

Predicting House Prices


Emily Fox & Carlos Guestrin
Machine Learning Specialization
University of Washington
2015 Emily Fox & Carlos Guestrin, Machine Learning Specialization
Predicting house prices



How much is my house worth?

I want to list my house for sale


$$ ????



Look at recent sales in my neighborhood
How much did they sell for?



Plot recent house sales
(Past 2 years)
[Figure: scatter plot of recent sales, with price ($) on the y axis and square feet (sq.ft.) on the x axis]

Terminology:
x: feature, covariate, or predictor
y: observation or response

Predict your house by similar houses
[Figure: price ($), y, versus square feet (sq.ft.), x]

No house sold recently had exactly the same sq.ft.

Look at average price in range
Still only 2 houses!
Throwing out info from all other sales

Linear regression



Use a linear regression model
Fit a line through the data
[Figure: price ($), y, versus square feet (sq.ft.), x]

\( f(x) = w_0 + w_1 x \), where \( w_0 \) and \( w_1 \) are the parameters of the model

\( f_w(x) = w_0 + w_1 x \), a function parameterized by \( w = (w_0, w_1) \)

Which line?
[Figure: price ($), y, versus square feet (sq.ft.), x]

\( f_w(x) = w_0 + w_1 x \) for different parameters \( w \)

Cost of using a given line
Residual sum of squares (RSS)
[Figure: price ($), y, versus square feet (sq.ft.), x]

\[
\mathrm{RSS}(w_0, w_1) =
    (\$_{\text{house 1}} - [w_0 + w_1\,\text{sq.ft.}_{\text{house 1}}])^2
  + (\$_{\text{house 2}} - [w_0 + w_1\,\text{sq.ft.}_{\text{house 2}}])^2
  + (\$_{\text{house 3}} - [w_0 + w_1\,\text{sq.ft.}_{\text{house 3}}])^2
  + \dots \text{ [include all houses]}
\]

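As a rough illustration of the RSS cost, the sketch below scores two candidate lines against a handful of sales. The three sales, the function name `rss`, and both candidate lines are made up for this example and do not come from the slides.

```python
import numpy as np

def rss(w0, w1, sqft, price):
    """Residual sum of squares of the line w0 + w1 * sqft against observed prices."""
    residuals = price - (w0 + w1 * sqft)
    return float(np.sum(residuals ** 2))

# Hypothetical sales, for illustration only.
sqft = np.array([1000.0, 1500.0, 2000.0])
price = np.array([200000.0, 290000.0, 410000.0])

# Two candidate lines: the one with the smaller RSS fits these sales better.
rss_a = rss(0.0, 200.0, sqft, price)       # $200 per sq.ft., no intercept
rss_b = rss(50000.0, 100.0, sqft, price)   # a much flatter line
print(rss_a, rss_b)
```

Each candidate line gets a single number, so comparing lines reduces to comparing their RSS values.
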
Find best line
Minimize cost over all possible \( w_0, w_1 \)
[Figure: price ($), y, versus square feet (sq.ft.), x]

\[
\mathrm{RSS}(w_0, w_1) =
    (\$_{\text{house 1}} - [w_0 + w_1\,\text{sq.ft.}_{\text{house 1}}])^2
  + (\$_{\text{house 2}} - [w_0 + w_1\,\text{sq.ft.}_{\text{house 2}}])^2
  + (\$_{\text{house 3}} - [w_0 + w_1\,\text{sq.ft.}_{\text{house 3}}])^2
  + \dots \text{ [include all houses]}
\]

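The slides defer the minimization algorithms to later material. As one possible sketch, a single-feature line has a standard closed-form least-squares solution, used below. The function name `fit_line` and the three sales are assumptions for this example.

```python
import numpy as np

def fit_line(x, y):
    """Return (w0, w1) minimizing RSS for the model w0 + w1 * x (closed form)."""
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x.
    w1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means.
    w0 = y_mean - w1 * x_mean
    return float(w0), float(w1)

# Hypothetical sales, for illustration only.
sqft = np.array([1000.0, 1500.0, 2000.0])
price = np.array([200000.0, 290000.0, 410000.0])

w0_hat, w1_hat = fit_line(sqft, price)
print(w0_hat, w1_hat)
```

`np.polyfit(sqft, price, 1)` should give the same line, with the slope listed first.
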
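Predicting your house price
[Figure: price ($), y, versus square feet (sq.ft.), x]

\( f_{\hat{w}}(x) = \hat{w}_0 + \hat{w}_1 x \), where \( \hat{w} = (\hat{w}_0, \hat{w}_1) \) are the parameters that minimize RSS

Best guess of your house price:
\( \hat{y} = \hat{w}_0 + \hat{w}_1\,\text{sq.ft.}_{\text{your house}} \)
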
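Once the parameters are estimated, forming a prediction is a single evaluation of the fitted line. In this minimal sketch the fitted values and the house size are hypothetical placeholders, not numbers from the slides.

```python
def predict_price(w0_hat, w1_hat, sqft):
    """Evaluate the fitted line at a given square footage."""
    return w0_hat + w1_hat * sqft

# Hypothetical fitted parameters and house size, for illustration only.
w0_hat, w1_hat = -15000.0, 210.0
my_house_sqft = 1800.0

best_guess = predict_price(w0_hat, w1_hat, my_house_sqft)
print(best_guess)
```
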
Adding higher order effects


Fit data with a line or ?
[Figure: price ($), y, versus square feet (sq.ft.), x]

You show your friend your analysis
"Dude, it's not a linear relationship!"

What about a quadratic function?
[Figure: price ($), y, versus square feet (sq.ft.), x]

\( f_w(x) = w_0 + w_1 x + w_2 x^2 \)

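The quadratic model is still linear in the parameters, so the same least-squares machinery applies once \( x^2 \) is added as an extra column. The sketch below uses made-up data that is exactly quadratic (in thousands of sq.ft. and thousands of dollars) and NumPy's `np.linalg.lstsq`. The variable names are assumptions for this example.

```python
import numpy as np

# Hypothetical data, for illustration only: x in thousands of sq.ft.,
# price in thousands of dollars, generated from an exact quadratic.
x = np.array([0.8, 1.2, 1.6, 2.0, 2.4])
price = 100.0 + 50.0 * x + 30.0 * x ** 2

# Design matrices: columns [1, x] for the line, [1, x, x^2] for the quadratic.
X_line = np.column_stack([np.ones_like(x), x])
X_quad = np.column_stack([np.ones_like(x), x, x ** 2])

w_line, *_ = np.linalg.lstsq(X_line, price, rcond=None)
w_quad, *_ = np.linalg.lstsq(X_quad, price, rcond=None)

rss_line = float(np.sum((price - X_line @ w_line) ** 2))
rss_quad = float(np.sum((price - X_quad @ w_quad) ** 2))
print(w_quad, rss_line, rss_quad)
```

On this data the quadratic fit recovers the generating coefficients, and its RSS is lower than the line's.
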
Even higher order polynomial
[Figure: price ($), y, versus square feet (sq.ft.), x]

"I can minimize your RSS"

Do you believe this fit?
[Figure: price ($), y, versus square feet (sq.ft.), x]

"My house isn't worth so little"

Evaluating overfitting via training/test split


Do you believe this fit?
[Figure: price ($), y, versus square feet (sq.ft.), x]

Minimizes RSS, but bad predictions

What about a quadratic function?
[Figure: price ($), y, versus square feet (sq.ft.), x]

\( f_w(x) = w_0 + w_1 x + w_2 x^2 \)

How to choose model order/complexity

Want good predictions, but can't observe future
Simulate predictions
1. Remove some houses
2. Fit model on remaining
3. Predict held-out houses

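The three steps above can be sketched as follows. The data is synthetic, and the 20% hold-out fraction, the random seed, and the use of `np.polyfit` for the fit are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic sales, for illustration only
# (sq.ft. in thousands, price in thousands of dollars).
sqft = rng.uniform(0.8, 3.0, size=10)
price = 40.0 + 180.0 * sqft + rng.normal(0.0, 20.0, size=10)

# 1. Remove some houses: hold out a random 20% of the sales.
order = rng.permutation(len(sqft))
n_test = 2
test_idx, train_idx = order[:n_test], order[n_test:]

# 2. Fit the model on the remaining houses (np.polyfit lists the slope first).
w1_hat, w0_hat = np.polyfit(sqft[train_idx], price[train_idx], 1)

# 3. Predict the held-out houses and compare with what they actually sold for.
predicted = w0_hat + w1_hat * sqft[test_idx]
for actual, guess in zip(price[test_idx], predicted):
    print(f"actual {actual:7.1f}   predicted {guess:7.1f}")
```

The held-out houses play the role of "future" sales: the model never saw them during fitting.
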
Training/test split

Terminology:
training set: the houses used to fit the model
test set: the held-out houses used to assess predictions

Training error
[Figure: price ($), y, versus square feet (sq.ft.), x]

\[
\text{Training error}(w) =
    (\$_{\text{train 1}} - f_w(\text{sq.ft.}_{\text{train 1}}))^2
  + (\$_{\text{train 2}} - f_w(\text{sq.ft.}_{\text{train 2}}))^2
  + (\$_{\text{train 3}} - f_w(\text{sq.ft.}_{\text{train 3}}))^2
  + \dots \text{ [include all training houses]}
\]

Minimize to find \( \hat{w} \)

Test error
[Figure: price ($), y, versus square feet (sq.ft.), x]

\[
\text{Test error}(\hat{w}) =
    (\$_{\text{test 1}} - f_{\hat{w}}(\text{sq.ft.}_{\text{test 1}}))^2
  + (\$_{\text{test 2}} - f_{\hat{w}}(\text{sq.ft.}_{\text{test 2}}))^2
  + (\$_{\text{test 3}} - f_{\hat{w}}(\text{sq.ft.}_{\text{test 3}}))^2
  + \dots \text{ [include all test houses]}
\]

Assess predictions using \( \hat{w} \)

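Training error and test error use the same sum-of-squares formula, evaluated on different houses. A minimal sketch, assuming made-up training and test sales and a line fitted with `np.polyfit`:

```python
import numpy as np

def sum_squared_error(w0, w1, sqft, price):
    """Sum of squared differences between actual and predicted prices."""
    return float(np.sum((price - (w0 + w1 * sqft)) ** 2))

# Hypothetical sales, for illustration only
# (sq.ft. in thousands, price in thousands of dollars).
train_sqft = np.array([1.0, 1.5, 2.0, 2.5])
train_price = np.array([210.0, 300.0, 405.0, 480.0])
test_sqft = np.array([1.2, 2.2])
test_price = np.array([250.0, 430.0])

# Fit on the training houses only (np.polyfit lists the slope first).
w1_hat, w0_hat = np.polyfit(train_sqft, train_price, 1)

# Same formula, different houses.
training_error = sum_squared_error(w0_hat, w1_hat, train_sqft, train_price)
test_error = sum_squared_error(w0_hat, w1_hat, test_sqft, test_price)
print(training_error, test_error)
```

The fitted parameters minimize the training error by construction. The test error is not minimized by the fit, which is what makes it a useful check on predictions.
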
Training/Test Curves
[Figure: training and test error curves, with error on the y axis and model complexity on the x axis]

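One way to trace such curves is to fit polynomials of increasing degree and record both errors for each. The sketch below uses synthetic data (a quadratic trend plus noise). The degrees tried, the 20/10 split, and the seed are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data, for illustration only: x in thousands of sq.ft.,
# y in thousands of dollars, a quadratic trend plus noise.
x = rng.uniform(0.5, 3.0, size=30)
y = 100.0 + 50.0 * x + 30.0 * x ** 2 + rng.normal(0.0, 15.0, size=30)

x_train, y_train = x[:20], y[:20]
x_test, y_test = x[20:], y[20:]

degrees = [1, 2, 3, 4, 5]
train_errors, test_errors = [], []
for d in degrees:
    # Fit a degree-d polynomial on the training houses only.
    w_hat = np.polyfit(x_train, y_train, d)
    train_errors.append(float(np.sum((y_train - np.polyval(w_hat, x_train)) ** 2)))
    test_errors.append(float(np.sum((y_test - np.polyval(w_hat, x_test)) ** 2)))

for d, tr, te in zip(degrees, train_errors, test_errors):
    print(f"degree {d}: training error {tr:10.1f}, test error {te:10.1f}")
```

Because each higher-degree model contains the lower-degree ones, training error cannot increase with degree. Test error carries no such guarantee and often starts rising once the model overfits.
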


Adding other features



Predictions just based on house size
[Figure: price ($), y, versus square feet (sq.ft.), x]

Only 1 bathroom! Not same as my 3 bathrooms

Add more features

\( f_w(x) = w_0 + w_1\,\text{sq.ft.} + w_2\,\#\text{bath} \)
[Figure: price ($), y, plotted against square feet (sq.ft.), x1, and a second axis, x2]

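With two features the fit has the same shape as before, with one more column in the design matrix. A minimal sketch, assuming made-up sales generated from known coefficients so the recovered parameters can be checked, and `np.linalg.lstsq` as the solver:

```python
import numpy as np

# Hypothetical sales, for illustration only, generated from known coefficients.
sqft = np.array([1000.0, 1500.0, 1800.0, 2400.0, 3000.0])
bath = np.array([1.0, 2.0, 1.0, 3.0, 2.0])
price = 50000.0 + 150.0 * sqft + 20000.0 * bath

# Design matrix with columns [1, sq.ft., #bath], matching w0, w1, w2.
X = np.column_stack([np.ones_like(sqft), sqft, bath])
w_hat, *_ = np.linalg.lstsq(X, price, rcond=None)
w0_hat, w1_hat, w2_hat = w_hat

# Predict a 2,000 sq.ft. house with 3 bathrooms.
my_price = w0_hat + w1_hat * 2000.0 + w2_hat * 3.0
print(w_hat, my_price)
```

Here the prediction accounts for the bathroom count, so two houses of the same size no longer have to get the same price.
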
How many features to use?
Possible choices:
-Square feet
-# bathrooms
-# bedrooms
-Lot size
-Year built
-
See Regression Course!



Other regression examples



Salary after ML specialization

hard work

How much will your salary be? (y = $$)


Depends on x = performance in courses, quality of capstone project, # of forum responses, ...



\( \hat{y} = \hat{w}_0 + \hat{w}_1\,\text{performance} + \hat{w}_2\,\text{capstone} + \hat{w}_3\,\text{forum} \)
informed by other students who completed the specialization

Stock prediction
Predict the price of a stock
Depends on
-Recent history of stock price
-News events
-Related commodities



Tweet popularity
How many people will retweet your tweet?
Depends on # followers, # of followers of followers, features of text tweeted, popularity of hashtag, # of past retweets, ...



Smart houses
Smart houses have many distributed sensors
What's the temperature at your desk? (no sensor)
- Learn spatial function to predict temp
Also depends on
- Thermostat setting
- Blinds open/closed or window tint
- Vents
- Temperature outside
- Time of day



Summary for regression



What you can do now
- Describe the input (features) and output (real-valued predictions) of a regression model
- Calculate a goodness-of-fit metric (e.g., RSS)
- Estimate model parameters by minimizing RSS (algorithms to come)
- Exploit the estimated model to form predictions
- Perform a training/test split of the data
- Analyze performance of various regression models in terms of test error
- Use test error to avoid overfitting when selecting amongst candidate models
- Describe a regression model using multiple features
- Describe other applications where regression is useful
