Supervised Learning
Linear Regression: Model and Algorithms
We want to figure out how much this house will sell for.
How much is this house worth?
Data (input → output pairs):
(x1 = sq.ft., y1 = $)
(x2 = sq.ft., y2 = $)
(x3 = sq.ft., y3 = $)
(x4 = sq.ft., y4 = $)
(x5 = sq.ft., y5 = $)
…

Input vs. Output:
• y is the quantity of interest
• assume y can be predicted from x
Model –
How we assume the world works

Regression model:
[Plot: y = price ($) vs. input x]
Simple linear regression model

yi = w1 xi + εi

w1 is the parameter
(regression coefficient)

f(x) = w1 x

[Plots: price ($) vs. x[1] = square feet (sq.ft.) with the fitted line f(x) = w1 x; price ($) vs. x[2] = # bathrooms, e.g., a house with 3 bathrooms]
Many possible inputs
- Square feet
- # bathrooms
- # bedrooms
- Lot size
- Year built
-…
General notation

Output: y (scalar)
Inputs: x = (x[1], x[2], …, x[d]) (d-dim vector)
e.g., x[1] = sq. ft., x[2] = #baths, and so on.

Notational conventions:
• training set: {(xi, yi)}i=1..n
• xi = input of ith data point/observation (vector); yi is its output
• xi[j] = jth input of ith data point (scalar)
• n = number of observations; d = number of input features
Generic linear regression model

Model: given feature vector xi = (xi[1], xi[2], …, xi[d]),

yi = w1 xi[1] + w2 xi[2] + … + wd xi[d] + εi
   = ∑j=1..d wj xi[j] + εi = wTxi + εi = xiTw + εi

e.g., feature d = x[d] = lot size
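The model's prediction (minus the noise term εi) is just a dot product. A minimal NumPy sketch, with made-up feature values and weights purely for illustration:

```python
import numpy as np

# Hypothetical feature vector for one house: [sq.ft., #baths, lot size]
x_i = np.array([2000.0, 3.0, 8000.0])
# Hypothetical weights w = (w1, ..., wd)
w = np.array([150.0, 10000.0, 2.0])

# Prediction: w^T x_i = sum_j w[j] * x_i[j]
y_hat = w @ x_i
print(y_hat)  # 300000 + 30000 + 16000 = 346000.0
```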
Fitting the linear regression model

Residual sum of squares (RSS):
RSS(w1) = ∑i=1..n (yi − w1 xi)²

[Plot: price ($) vs. x[2] = # bathrooms]
RSS for multiple regression

RSS(w) = ∑i=1..n (yi − wTxi)² = (y − Xw)T(y − Xw)

[Plot: fitted plane over x[1] = square feet (sq.ft.) and x[2] = # bathrooms]
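The elementwise sum and the matrix form of RSS agree, which a few lines of NumPy can verify. All numbers below are toy values invented for this check:

```python
import numpy as np

# Toy data: n = 3 observations, d = 2 features
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 1.0]])
y = np.array([5.0, 4.0, 8.0])
w = np.array([2.0, 1.0])

# Elementwise: RSS(w) = sum_i (y_i - w^T x_i)^2
rss_sum = sum((y[i] - w @ X[i]) ** 2 for i in range(len(y)))

# Matrix form: RSS(w) = (y - Xw)^T (y - Xw)
r = y - X @ w
rss_mat = r @ r

print(rss_sum, rss_mat)  # both 2.0
```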
Our specific optimization problem

ŵ = argminw RSS(w) = argminw ∑i=1..n (yi − wTxi)²

[Plot: RSS surface over x[1] = square feet (sq.ft.) and x[2] = # bathrooms]
1. Solve for ∇RSS(w) = 0

Gradient of RSS:
∇RSS(w) = ∇[(y − Xw)T(y − Xw)]
Closed-form solution

∇RSS(w) = ∇[(y − Xw)T(y − Xw)]
        = −2XT(y − Xw)
Solution to the normal equations XTXŵ = XTy:

ŵ = (XTX)⁻¹XTy
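A sketch of the closed-form fit on toy data. In practice one solves the normal equations directly (e.g., with `np.linalg.solve` or `np.linalg.lstsq`) rather than forming the inverse explicitly; the data below are invented, generated from known weights with no noise so the fit recovers them exactly:

```python
import numpy as np

# Toy data from known weights w_true = [3, -1], no noise term
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
w_true = np.array([3.0, -1.0])
y = X @ w_true

# Normal equations: (X^T X) w_hat = X^T y
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat)  # recovers [ 3. -1.]
```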
2. Gradient descent

Gradient descent: repeatedly move in the direction that reduces the value of the function.
Gradient descent for linear regression:
repeatedly move in the direction of the negative gradient:

w(t+1) ← w(t) − η ∇RSS(w(t)) = w(t) + 2η XT(y − Xw(t))

since ∇RSS(w(t)) = −2XT(y − Xw(t)).
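The update above can be sketched as a short loop. The step size η and iteration count below are arbitrary choices that happen to converge on this toy problem, not recommended defaults:

```python
import numpy as np

# Same toy data as before: targets from known weights [3, -1], no noise
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
y = X @ np.array([3.0, -1.0])

eta = 0.05          # step size (small enough to converge here)
w = np.zeros(2)     # initial w(0)

for t in range(500):
    grad = -2 * X.T @ (y - X @ w)   # gradient of RSS at w(t)
    w = w - eta * grad              # w(t+1) = w(t) - eta * gradient

print(w)  # approaches [ 3. -1.]
```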
Interpreting elementwise

Update to the jth feature weight:

wj(t+1) ← wj(t) + 2η ∑i=1..n xi[j](yi − ŷi(w(t)))

where ŷi(w(t)) = xiTw(t) is the current prediction for the ith point.

[Plot: price ($) vs. x[2] = # bathrooms]
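The jth component of the matrix update is exactly this elementwise formula; a quick consistency check with toy numbers:

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 1.0]])
y = np.array([5.0, 4.0, 8.0])
w = np.array([0.5, -0.5])
eta = 0.01

y_hat = X @ w                              # predictions y_hat_i = x_i^T w
# Matrix form: w + 2*eta * X^T (y - y_hat)
w_matrix = w + 2 * eta * X.T @ (y - y_hat)
# Elementwise: w_j + 2*eta * sum_i x_i[j] * (y_i - y_hat_i)
w_elem = np.array([w[j] + 2 * eta * sum(X[i, j] * (y[i] - y_hat[i])
                                        for i in range(len(y)))
                   for j in range(len(w))])

print(np.allclose(w_matrix, w_elem))  # True
```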