House Price Prediction
https://www.youtube.com/watch?v=IpGxLWOIZy4

Motivating example: some houses sold for $70,000, others for $160,000 — given only its size, what price (???) should we predict for a new house?
[Figure: training data — Price (in $10,000’s, 0–20) vs. Size (feet²)]
Linear Regression
[Figure: the same scatter of × points with a straight line fit through them — Price (in $10,000’s) vs. Size (feet²)]
Today’s Agenda
● Linear Regression with One Variable
○ Model Representation
○ Cost Function
○ Gradient Descent
Training set of housing prices:

Size in feet² (x)    Price ($) in 1000’s (y)
2104                 460
1416                 232
1534                 315
852                  178
...                  ...

Notation:
m = number of training examples
x’s = “input” variable / features
y’s = “output” variable / “target” variable
(x, y) = one training example
Training set
      ↓
Learning algorithm
      ↓
      h (hypothesis)

Size of house (x)  →  h  →  Estimated price (y)

h maps x’s to y’s.
How do we represent h?
[Figure: scatter of y (estimated price) vs. x (size of house), with a straight line fit through the × points]

Represent h as a straight line: $h_\theta(x) = \theta_0 + \theta_1 x$.
This is linear regression with one variable, also called univariate linear regression.
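As a concrete illustration (a sketch added here, not from the original slides), the univariate hypothesis is just a line with intercept $\theta_0$ and slope $\theta_1$:

```python
def h(theta0, theta1, x):
    """Univariate linear hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# With theta0 = 50 and theta1 = 0.06 (the values used in a later slide),
# a 1000 ft^2 house is estimated at 50 + 0.06 * 1000 = 110, i.e. $110,000.
print(h(50.0, 0.06, 1000.0))  # -> 110.0
```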
Cost Function
Training Set:

Size in feet² (x)    Price ($) in 1000’s (y)
2104                 460
1416                 232
1534                 315
852                  178
...                  ...
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$

The $\theta_i$’s are the parameters of the model.
How do we choose the $\theta_i$’s?
[Figure: three example lines for $h_\theta(x) = \theta_0 + \theta_1 x$ on 0–3 axes —
left: $\theta_0 = 1.5$, $\theta_1 = 0$ (horizontal line at 1.5);
middle: $\theta_0 = 0$, $\theta_1 = 0.5$ (line through the origin, slope 0.5);
right: $\theta_0 = 1$, $\theta_1 = 0.5$ (intercept 1, slope 0.5)]
[Figure: scatter of y vs. x with the line $h_\theta(x) = \theta_0 + \theta_1 x$; the vertical gaps between the × points and the line are the prediction errors $h_\theta(x) - y$]

Idea: choose $\theta_0$, $\theta_1$ so that $h_\theta(x)$ is close to $y$ for our training examples $(x, y)$:

$$\min_{\theta_0, \theta_1} \; \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$

Parameters: $\theta_0$, $\theta_1$

Cost Function: $J(\theta_0, \theta_1) = \dfrac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$

Simplified version (fix $\theta_0 = 0$, so the line passes through the origin):

Hypothesis: $h_\theta(x) = \theta_1 x$
Parameter: $\theta_1$
Cost Function: $J(\theta_1) = \dfrac{1}{2m} \sum_{i=1}^{m} \left( \theta_1 x^{(i)} - y^{(i)} \right)^2$
Goal: $\min_{\theta_1} J(\theta_1)$
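A minimal sketch of the squared-error cost (illustrative code, not from the slides; the training set is assumed to be two plain Python lists):

```python
def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = (1 / 2m) * sum_i (h_theta(x_i) - y_i)^2."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# The training set from the slide: size in feet^2 (x), price in $1000's (y)
xs = [2104, 1416, 1534, 852]
ys = [460, 232, 315, 178]
print(cost(0.0, 0.2, xs, ys))  # cost of the candidate line h(x) = 0.2 * x
```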
Cost Function Intuition I

$h_\theta(x)$ (for fixed $\theta_1$, this is a function of $x$)   vs.   $J(\theta_1)$ (a function of the parameter $\theta_1$)

[Figure: left — training points (1,1), (2,2), (3,3) with the line $h_\theta(x) = \theta_1 x$ drawn for $\theta_1 = 1$, $\theta_1 = 0.5$, and $\theta_1 = 0$; right — the corresponding values of $J(\theta_1)$ plotted over $-0.5 \leq \theta_1 \leq 2.5$]

For $\theta_1 = 1$ the line passes through every training point, so $J(1) = 0$.
For $\theta_1 = 0.5$ and $\theta_1 = 0$ the prediction errors grow, so $J(0.5)$ and $J(0)$ are larger.
Sweeping $\theta_1$ and plotting $J(\theta_1)$ traces out a bowl-shaped curve whose minimum sits at $\theta_1 = 1$ — the best-fitting line.
Cost Function Intuition II
$h_\theta(x)$ (for fixed $\theta_0$, $\theta_1$, this is a function of $x$)   vs.   $J(\theta_0, \theta_1)$ (a function of the parameters $\theta_0$, $\theta_1$)

[Figure: scatter of Price ($) in 1000’s (0–500) vs. Size in feet² (0–3000), with the candidate line $h_\theta(x) = 50 + 0.06x$, i.e. $\theta_0 = 50$ and $\theta_1 = 0.06$. With two parameters, $J(\theta_0, \theta_1)$ is a bowl-shaped surface over the $(\theta_0, \theta_1)$ plane rather than a curve.]
[Figure: left — the housing data with one candidate line $h_\theta(x)$; right — a contour plot of $J(\theta_0, \theta_1)$ with an × marking the $(\theta_0, \theta_1)$ of that line. Successive frames move the × around: lines that fit the data poorly sit far from the center of the contours, and the best fit sits at the minimum in the middle.]
Gradient Descent

Have some function $J(\theta_0, \theta_1)$.
Want $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$.

Outline:
● Start with some $\theta_0$, $\theta_1$
● Keep changing $\theta_0$, $\theta_1$ to reduce $J(\theta_0, \theta_1)$ until we hopefully end up at a minimum
Gradient Descent algorithm

repeat until convergence {
    temp0 := $\theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
    temp1 := $\theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
    $\theta_0$ := temp0
    $\theta_1$ := temp1
}
(simultaneous update: compute both temporaries before assigning either parameter)
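One step of the simultaneous update as code (a sketch; `dJ_d0` and `dJ_d1` are hypothetical callables computing the two partial derivatives):

```python
def gradient_step(theta0, theta1, alpha, dJ_d0, dJ_d1):
    """One gradient-descent step with a simultaneous update:
    both derivatives are evaluated at the *old* (theta0, theta1)
    before either parameter is overwritten."""
    temp0 = theta0 - alpha * dJ_d0(theta0, theta1)
    temp1 = theta1 - alpha * dJ_d1(theta0, theta1)
    return temp0, temp1  # theta0 := temp0, theta1 := temp1
```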
Gradient Descent intuition (single parameter $\theta_1 \in \mathbb{R}$):

$$\theta_1 := \theta_1 - \alpha \, \frac{d}{d\theta_1} J(\theta_1)$$

[Figure: the bowl-shaped curve $J(\theta_1)$ with a current point marked on each side of the minimum]

If the current point is to the right of the minimum, the slope is positive ($\frac{d}{d\theta_1} J(\theta_1) \geq 0$), so $\theta_1 := \theta_1 - \alpha \cdot (\text{positive number})$: $\theta_1$ decreases, moving toward the minimum.
If the current point is to the left, the slope is negative ($\frac{d}{d\theta_1} J(\theta_1) \leq 0$), so $\theta_1 := \theta_1 - \alpha \cdot (\text{negative number})$: $\theta_1$ increases, again moving toward the minimum.
[Figure: many tiny × steps creeping down the curve]

If $\alpha$ is too small, gradient descent can be slow.

What will one step of gradient descent, $\theta_1 := \theta_1 - \alpha \frac{d}{d\theta_1} J(\theta_1)$, do at a local optimum?
At a local optimum the slope is zero, $\frac{d}{d\theta_1} J(\theta_1) = 0$, so the update leaves $\theta_1$ unchanged.
Gradient descent can converge to a local minimum, even with the learning rate $\alpha$ fixed:

$$\theta_1 := \theta_1 - \alpha \, \frac{d}{d\theta_1} J(\theta_1)$$

As we approach a local minimum, the derivative shrinks toward zero, so gradient descent automatically takes smaller steps. There is no need to decrease $\alpha$ over time.

[Figure: successive × steps on $J(\theta_1)$, each shorter than the last as the point nears the minimum]
Gradient Descent for Linear Regression

Gradient descent algorithm:
repeat until convergence {
    $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$   (for $j = 0$ and $j = 1$)
}

Linear regression model:
$h_\theta(x) = \theta_0 + \theta_1 x$
$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Working out the partial derivatives:

$$\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) = \frac{\partial}{\partial \theta_j} \, \frac{1}{2m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)^2$$

$$j = 0: \quad \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$$

$$j = 1: \quad \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \cdot x^{(i)}$$
Gradient Descent algorithm (plugging the derivatives in):

repeat until convergence {
    $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
    $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \cdot x^{(i)}$
}
(update $\theta_0$ and $\theta_1$ simultaneously)
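Putting the derivatives into a loop (an illustrative sketch, not code from the lecture); on the toy data from Intuition I it recovers the perfect-fit line $\theta_0 = 0$, $\theta_1 = 1$:

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=1000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x.
    Each step sums the errors over *all* m training examples."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # simultaneous update: both gradients used the old theta0, theta1
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

print(gradient_descent([1, 2, 3], [1, 2, 3]))  # -> approximately (0.0, 1.0)
```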
[Figure: a sequence of gradient-descent steps. Left — the housing data with the current line $h_\theta(x)$; right — the contour plot of $J(\theta_0, \theta_1)$ with a trail of × marks descending from the starting point toward the minimum at the center. As the trail approaches the minimum, the line on the left fits the data better and better.]
“Batch” Gradient Descent

“Batch”: each step of gradient descent uses all $m$ training examples:

Size in feet² (x)    Price ($) in 1000’s (y)
2104                 460
1416                 232
1534                 315
852                  178
...                  ...

$h_\theta(x) = \theta_0 + \theta_1 x$
Multiple Features

Size in feet² (x₁)   Number of bedrooms (x₂)   Number of floors (x₃)   Age of home in years (x₄)   Price ($) in 1000’s (y)
2104                 5                         1                       45                          460
1416                 3                         2                       40                          232
1534                 3                         2                       30                          315
852                  2                         2                       36                          178
...                  ...                       ...                     ...                         ...

Notation:
n = number of features
$x^{(i)}$ = input (features) of the $i$-th training example
$x_j^{(i)}$ = value of feature $j$ in the $i$-th training example
Hypothesis

Previously (one feature): $h_\theta(x) = \theta_0 + \theta_1 x$

Now (four features): $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4$

E.g. $h_\theta(x) = 80 + 0.1\,x_1 + 10\,x_2 + 3\,x_3 - 2\,x_4$
In general, with $n$ features:

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$

For convenience of notation, define $x_0 = 1$. Then

$$x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^{n+1}, \qquad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix} \in \mathbb{R}^{n+1}$$

$$h_\theta(x) = \theta^T x$$

Cost Function: $J(\theta) = \dfrac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
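The vectorized form translates directly to NumPy (a sketch under the assumption that `X` is the m×(n+1) design matrix whose first column holds the $x_0 = 1$ entries):

```python
import numpy as np

def h(theta, X):
    """Vectorized hypothesis h_theta(x) = theta^T x, for every row of X at once."""
    return X @ theta

def cost(theta, X, y):
    """J(theta) = (1 / 2m) * sum_i (h_theta(x_i) - y_i)^2."""
    m = len(y)
    errors = h(theta, X) - y
    return errors @ errors / (2 * m)
```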
Gradient Descent:

repeat {
    $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$   (simultaneously update for every $j = 0, \ldots, n$)
}

Previously (n = 1):
repeat {
    $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
    $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
    (simultaneously update $\theta_0$, $\theta_1$)
}

New algorithm (n ≥ 1):
repeat {
    $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
    (simultaneously update $\theta_j$ for $j = 0, \ldots, n$)
}

Written out (note $x_0^{(i)} = 1$, so the $\theta_0$ update matches the old one):

$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}$
$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_1^{(i)}$
$\theta_2 := \theta_2 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_2^{(i)}$
...
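The n ≥ 1 update rule, vectorized over all parameters at once (an added sketch; as above, `X` is assumed to include the column of ones):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, iterations=1000):
    """Batch gradient descent for any number of features.
    theta := theta - alpha * (1/m) * X^T (X theta - y) performs the
    simultaneous update of every theta_j in one line."""
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(iterations):
        gradient = X.T @ (X @ theta - y) / m
        theta -= alpha * gradient
    return theta
```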
Feature Scaling

Idea: make sure features are on a similar scale (gradient descent converges faster when they are).

E.g. $x_1$ = size (0–2000 feet²) and $x_2$ = number of bedrooms (1–5). Rescale:

$$x_1 = \frac{\text{size (feet}^2\text{)}}{2000}, \qquad x_2 = \frac{\text{number of bedrooms}}{5}$$
Feature Scaling

Get every feature into approximately a $-1 \leq x_i \leq 1$ range.

Mean Normalization

Replace $x_i$ with $x_i - \mu_i$ to make features have approximately zero mean (do not apply to $x_0 = 1$).

E.g. $x_1 = \dfrac{\text{size} - 1000}{2000}$, $x_2 = \dfrac{\#\text{bedrooms} - 2.5}{5}$, giving $-0.5 \leq x_1 \leq 0.5$ and $-0.5 \leq x_2 \leq 0.5$.

In general:

$$x_i := \frac{x_i - \mu_i}{s_i}$$

where $\mu_i$ is the mean of feature $i$ on the training set and $s_i$ is its scale (the range max − min, or the standard deviation).
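A sketch of mean normalization (illustrative code added here; it uses the range max − min as $s_i$, one of the two choices named above):

```python
import numpy as np

def mean_normalize(X):
    """Rescale each feature column to roughly [-0.5, 0.5]:
    x := (x - mean) / (max - min). Do not apply to the x0 = 1 column."""
    mu = X.mean(axis=0)
    s = X.max(axis=0) - X.min(axis=0)
    return (X - mu) / s, mu, s

# sizes and bedroom counts from the slides' table
X = np.array([[2104, 5], [1416, 3], [1534, 3], [852, 2]], dtype=float)
X_norm, mu, s = mean_normalize(X)  # keep mu, s to normalize future inputs the same way
```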
Learning Rate

Making sure gradient descent is working correctly: plot $J(\theta)$ against the number of iterations. $J(\theta)$ should decrease after every iteration; if it increases or oscillates, use a smaller $\alpha$.

[Figure: $J(\theta)$ decreasing and flattening out over 0–300 iterations]

Example automatic convergence test: declare convergence if $J(\theta)$ decreases by less than $10^{-3}$ in one iteration.

To choose $\alpha$, try a range of values:
…, 0.001, …, 0.01, …, 0.1, …, 1, …
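The convergence test as code (an added sketch; `step` and `cost` are hypothetical callables for one gradient-descent update and for evaluating $J(\theta)$):

```python
def run_until_converged(step, cost, theta, tol=1e-3, max_iters=10_000):
    """Iterate until J decreases by less than `tol` in one iteration
    (the slide's example test). The check also fires if J goes *up*,
    which signals that alpha is too large."""
    J_prev = cost(theta)
    for _ in range(max_iters):
        theta = step(theta)
        J = cost(theta)
        if J_prev - J < tol:
            break
        J_prev = J
    return theta
```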