DeepLearning.AI makes these slides available for educational purposes. You may not use or distribute
these slides for commercial purposes. You may make copies of these slides and use or distribute them for
educational purposes as long as you cite DeepLearning.AI as the source of the slides.
Tangent planes
Functions of Two Variables
f(x) = x² is a function of one variable: its graph is a curve, and a point such as (2, 4) lies on it.
f(x, y) = x² + y² is a function of two variables: its graph is a surface over the (x, y) plane.
Finding the Tangent Plane
Slope?
Partial Derivatives
f(x, y) = x² + y²
Slope? The partial derivative.

Treat y as a constant:  ∂f/∂x = 2x
Treat x as a constant:  ∂f/∂y = 2y

fx = ∂f/∂x is the partial derivative of f with respect to x.
fy = ∂f/∂y is the partial derivative of f with respect to y.
Intro To Partial Derivatives
f(x, y) = x² + y²

TASK
Find the partial derivatives of f with respect to x and y.

∂f/∂x = ?
Step 1: Treat all other variables as a constant. In our case y.
Step 2: Differentiate the function using the normal rules of differentiation.
The y² term is a constant, so it differentiates to zero:
∂f/∂x = 2x

∂f/∂y = ?
Step 1: Treat all other variables as a constant. In our case x.
Step 2: Differentiate the function using the normal rules of differentiation.
∂f/∂y = 2y
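These results can be checked with a symbolic-algebra package. A minimal sketch using SymPy (the library choice is ours, not part of the slides):

```python
import sympy as sp

x, y = sp.symbols("x y")
f = x**2 + y**2

# sp.diff(f, x) treats every other symbol (here y) as a constant.
print(sp.diff(f, x))  # 2*x
print(sp.diff(f, y))  # 2*y
```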
Gradients and Gradient Descent
Partial Derivatives (More Examples)
f(x, y) = 3x²y³

TASK
What is the partial derivative of f with respect to x?
Step 1: Treat all other variables as a constant. In our case y.
The constant coefficient 3 and the factor y³ are carried along:
∂f/∂x = 3(2x)y³ = 6xy³

TASK
What is the partial derivative of f with respect to y?
Step 1: Treat all other variables as a constant. In our case x.
∂f/∂y = 3(x²)(3y²) = 9x²y²
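As a numerical sanity check, a centered finite difference should agree with 6xy³ and 9x²y² at any test point. A small sketch (the test point and step size are arbitrary choices):

```python
def f(x, y):
    return 3 * x**2 * y**3

def partial_x(x, y, h=1e-6):
    # Centered difference in x, holding y fixed
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(x, y, h=1e-6):
    # Centered difference in y, holding x fixed
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

x0, y0 = 2.0, 1.5
print(partial_x(x0, y0), 6 * x0 * y0**3)     # both ≈ 40.5
print(partial_y(x0, y0), 9 * x0**2 * y0**2)  # both ≈ 81.0
```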
Gradients
f(x, y) = x² + y²

Treat y as a constant:  ∂f/∂x = 2x
Treat x as a constant:  ∂f/∂y = 2y

The gradient collects both partial derivatives into a single vector:
∇f = [∂f/∂x, ∂f/∂y]ᵀ = [2x, 2y]ᵀ

TASK
Find the gradient of f(x, y) at (2, 3).
The gradient of f(x, y) is given as:
∇f(2, 3) = [2⋅2, 2⋅3]ᵀ = [4, 6]ᵀ
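In code, the gradient is simply a vector holding the two partial derivatives. A sketch with NumPy:

```python
import numpy as np

def grad_f(x, y):
    # ∇f for f(x, y) = x² + y²
    return np.array([2 * x, 2 * y])

print(grad_f(2, 3))  # [4 6]
```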
Gradients and Gradient Descent
Functions of Two Variables
f(x) = x²: the minimum is where the slope = 0.
f′(x) = 0  ⟹  2x = 0  ⟹  x = 0

f(x, y) = x² + y²: the minimum is where both slopes = 0.
∂f/∂x = 0 and ∂f/∂y = 0  ⟹  2x = 0 and 2y = 0  ⟹  (x, y) = (0, 0)
Too hot!
Motivation for Optimization in Two Variables
[Plot: temperature T (°C) over a 5 m × 5 m room, position measured in meters]
Starting from some point in the room, we want to find the coolest spot: the minimum of T, where
∂T/∂x = 0 and ∂T/∂y = 0
Exercise
T = f(x, y) = 85 − (1/90) x²(x − 6) y²(y − 6)
[Plot: T (°C) over the room, position in meters]
Try and calculate ∂f/∂x and ∂f/∂y.
Motivation for Optimization in Two Variables
T = f(x, y) = 85 − (1/90) x²(x − 6) y²(y − 6)

∂f/∂x = −(1/90) x(3x − 12) y²(y − 6) = 0  ⟹  x = 0, x = 4, y = 0, or y = 6
∂f/∂y = −(1/90) x²(x − 6) y(3y − 12) = 0  ⟹  x = 0, x = 6, y = 0, or y = 4

Candidate points for the minima (both partials must vanish):
x = 0, y = 0    T = 85    Maximum
x = 0, y = 4    T = 85    Maximum
x = 0, y = 6    Outside the room
x = 4, y = 0    T = 85    Maximum
x = 4, y = 4    T = 73.6  Minimum
x = 6, y = 0    T = 85    Maximum
x = 6, y = 6    Outside the room
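The table can be confirmed by evaluating T at each candidate point. A short sketch:

```python
def T(x, y):
    return 85 - (1 / 90) * x**2 * (x - 6) * y**2 * (y - 6)

for x, y in [(0, 0), (0, 4), (4, 0), (4, 4), (6, 0)]:
    print(f"T({x}, {y}) = {T(x, y):.1f}")
# Every candidate gives 85.0 except T(4, 4) = 73.6, the coolest spot.
```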
Gradients and Gradient Descent
Locations on the x, y plane: (1, 2), (2, 5), (3, 3)
Fit a line y = mx + b through them.
Linear Regression: Analytical Approach
Goal: Minimize sum of squares cost

For the line y = mx + b, the prediction above x = 1 is (1, m + b), and similarly for the other points. The squared vertical errors are:
(m + b − 2)²    (2m + b − 5)²    (3m + b − 3)²

Their sum expands to:
E(m, b) = 14m² + 3b² + 38 + 12mb − 42m − 20b

At the minimum, both partial derivatives are zero: ∂E/∂m = 0 and ∂E/∂b = 0.

Quiz: Find the partial derivative of E with respect to m.
∂E/∂m = 28m + 12b − 42 = 0

Quiz: Find the partial derivative of E with respect to b.
∂E/∂b = 6b + 12m − 20 = 0

Multiply the second equation by 2:  12b + 24m − 40 = 0
Subtract it from the first:  4m − 2 = 0, so m = 2/4 = 0.5

Substitute back:
6b + 12(0.5) − 20 = 0
6b + 6 − 20 = 0
6b − 14 = 0, so b = 14/6 = 7/3

E(m = 1/2, b = 7/3) ≈ 4.167
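Setting both partials to zero is a 2×2 linear system, so a linear solver finds m and b in one call. A sketch:

```python
import numpy as np

# 28m + 12b = 42 and 12m + 6b = 20, i.e. A @ [m, b] = c
A = np.array([[28.0, 12.0],
              [12.0, 6.0]])
c = np.array([42.0, 20.0])

m, b = np.linalg.solve(A, c)
print(m, b)  # 0.5 2.3333... (m = 1/2, b = 7/3)
```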
Linear Regression: Optimal Solution
The best-fit line y = mx + b through (1, 2), (2, 5), (3, 3) has
m = 1/2, b = 7/3, with cost E(1/2, 7/3) ≈ 4.167
Linear Regression: Gradient Descent
Goal: Minimize sum of squares cost
∂E/∂m = 28m + 12b − 42 = 0
∂E/∂b = 6b + 12m − 20 = 0
m = ?  b = ?  When such equations are too hard to solve directly, Gradient Descent comes to the rescue.
Gradients and Gradient Descent
Hard To Optimize Functions
f(x) = eˣ − log(x)
Minimum? Set the derivative to zero:
f′(x) = eˣ − 1/x = 0,  i.e.  eˣ = 1/x
This equation cannot be solved with algebra alone.
Solution: x = 0.5671...
Also known as the Omega constant.
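The value 0.5671... must be found numerically, for example by bisection on f′(x) = eˣ − 1/x. A sketch (the bracketing interval is our choice):

```python
import math

def fprime(x):
    return math.exp(x) - 1 / x

lo, hi = 0.1, 1.0          # f'(0.1) < 0 and f'(1) > 0, so a root lies between
for _ in range(60):
    mid = (lo + hi) / 2
    if fprime(mid) < 0:
        lo = mid           # root is to the right of mid
    else:
        hi = mid           # root is to the left of mid

print(lo)  # ≈ 0.567143, the Omega constant
```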
Method 1: Try Both Directions
Take a small step in each direction, keep whichever step decreases f, and repeat!
Is there any other way?
Gradients and Gradient Descent
Method 2: Be Clever
Try something smarter: let the slope choose the direction.
Where the slope is negative, step right; where the slope is positive, step left. In both cases:
new point = old point − slope
x1 = x0 − f′(x0)

Big steps can be chaotic, so scale the slope down:
x1 = x0 − 0.01 f′(x0)
x1 = x0 − α f′(x0),  where α is the learning rate

The update takes a big step where the slope is large and a small step where the slope is small. Iterating,
x2 = x1 − α f′(x1), and so on.
This is gradient descent.

Gradient Descent
Function: f(x)    Goal: find minimum of f(x)
Step 1: Define a learning rate α. Choose a starting point x0.
Step 2: Update: xk = xk−1 − α f′(xk−1)
Step 3: Repeat Step 2 until you are close enough to the true minimum x*.


Gradient
Method 2 Descent
1
f (x) = e x − log(x) f′(x) = e x −
x
y

Gradient
Method 2 Descent
1
f (x) = e x − log(x) f′(x) = e x −
x
Start: x = 0.05 Rate: α = 0.005
y



Gradient
Method 2 Descent
1
f (x) = e x − log(x) f′(x) = e x −
x
Start: x = 0.05 Rate: α = 0.005
Find:
f′(0.05) = − 18.9
Move by −0.005 f′(0.05) x ↦ 0.1447
y







Gradient
Method 2 Descent
1
f (x) = e x − log(x) f′(x) = e x −
x
Start: x = 0.05 Rate: α = 0.005
Find:
f′(0.05) = − 18.9
Move by −0.005 f′(0.05) x ↦ 0.1447
y
Find:
f′(0.1447) = − 5.7552
Move by −0.005 f′(0.05) x ↦ 0.1735











Gradient
Method 2 Descent
1
f (x) = e x − log(x) f′(x) = e x −
x
Start: x = 0.05 Rate: α = 0.005
Find:
f′(0.05) = − 18.9
Move by −0.005 f′(0.05) x ↦ 0.1447
y
Find:
f′(0.1447) = − 5.7552
Move by −0.005 f′(0.05) x ↦ 0.1735
Repeat!











Gradient
Method 2 Descent
1
f (x) = e x − log(x) f′(x) = e x −
x
Start: x = 0.05 Rate: α = 0.005
Find:
f′(0.05) = − 18.9
Move by −0.005 f′(0.05) x ↦ 0.1447
y
Find:
f′(0.1447) = − 5.7552
Move by −0.005 f′(0.05) x ↦ 0.1735
Repeat!
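The same iteration as a loop, reproducing the two updates above (a sketch; the iteration count is our choice):

```python
import math

def fprime(x):
    return math.exp(x) - 1 / x

x, alpha = 0.05, 0.005          # starting point and learning rate
for k in range(1000):
    x = x - alpha * fprime(x)   # x_k = x_{k-1} - α f'(x_{k-1})
    if k < 2:
        print(x)                # 0.1447..., then 0.1735...

print(x)  # converges toward 0.5671..., the Omega constant
```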
Gradients and Gradient Descent
Drawbacks of Gradient Descent
[Plot: a curve with several valleys 🤷]
Gradient descent follows only the local slope, so depending on the starting point it can settle in a local minimum rather than the global one.
Gradient Descent With Heat Example
[Plot: temperature T along one direction of the room; starting from a point, each update steps against the slope, walking downhill. Repeat!]
Gradients and Gradient Descent
Idea for Gradient Descent
The gradient ∇f points uphill, so −∇f points downhill.
From a point (x1, y1), take a step along −∇f: a better point!
Method 2: Gradient Descent
T = f(x, y) = 85 − (1/90) x²(x − 6) y²(y − 6)

∇f = [ −(1/90) x(3x − 12) y²(y − 6),  −(1/90) x²(x − 6) y(3y − 12) ]ᵀ

Start: x = 0.5, y = 0.6    Rate: α = 0.05

Find: ∇f(0.5, 0.6) = [−0.1134, −0.0935]ᵀ
Move by −0.05 ∇f(0.5, 0.6):  x ↦ 0.5057,  y ↦ 0.6047

Find: ∇f(0.5057, 0.6047) = [−0.1162, −0.0961]ᵀ
Move by −0.05 ∇f(0.5057, 0.6047):  x ↦ 0.5115,  y ↦ 0.6095

Repeat!
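The two-variable loop in code, reproducing both updates above (a sketch; the iteration count is our choice):

```python
import numpy as np

def grad_T(x, y):
    # ∇T for T(x, y) = 85 − (1/90) x²(x − 6) y²(y − 6)
    dx = -(1 / 90) * x * (3 * x - 12) * y**2 * (y - 6)
    dy = -(1 / 90) * x**2 * (x - 6) * y * (3 * y - 12)
    return np.array([dx, dy])

p, alpha = np.array([0.5, 0.6]), 0.05
for k in range(2000):
    p = p - alpha * grad_T(p[0], p[1])
    if k < 2:
        print(p)   # ≈ [0.5057 0.6047], then ≈ [0.5115 0.6095]

print(p)  # approaches (4, 4), the coolest spot in the room
```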
Gradient Descent
Function: f(x, y)    Goal: find minimum of f(x, y)
Step 1: Define a learning rate α. Choose a starting point (x0, y0).
Step 2: Update: [xk, yk]ᵀ = [xk−1, yk−1]ᵀ − α ∇f(xk−1, yk−1)
Step 3: Repeat Step 2 until you are close enough to the true minimum (x*, y*).
Gradient Descent With Local Minima
[Plot: a surface with several valleys; different starting points lead gradient descent into different local minima.]
Gradients and Gradient Descent
Gradient Descent With Power Lines Example
Fit y = mx + b to (1, 2), (2, 5), (3, 3) by minimizing
E(m, b) = 14m² + 3b² + 38 + 12mb − 42m − 20b
whose minimum is E(m = 1/2, b = 7/3) ≈ 4.167
Linear Regression: Gradient Descent
Goal: Minimize sum of squares cost
[Plot: the cost surface over the (m, b) plane]
m = ?  b = ?  The point (m, b) such that the cost is minimum.

Steps:
Start with (m0, b0) and descend until you find the minimum.
Iterate: (mk+1, bk+1) = (mk, bk) − α ∇E(mk, bk)
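Running this iteration on E(m, b) recovers the analytical answer. A sketch (starting point, learning rate, and iteration count are our choices):

```python
import numpy as np

def grad_E(m, b):
    # ∂E/∂m = 28m + 12b − 42,  ∂E/∂b = 12m + 6b − 20
    return np.array([28 * m + 12 * b - 42, 12 * m + 6 * b - 20])

mb, alpha = np.array([0.0, 0.0]), 0.01
for _ in range(5000):
    mb = mb - alpha * grad_E(mb[0], mb[1])

print(mb)  # ≈ [0.5 2.333]: the analytical minimum m = 1/2, b = 7/3
```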
Gradients and Gradient Descent
Gradient Descent
[Left plot: the points (1, 2), (2, 5), (3, 3) and the line y = mx + b in the (x, y) plane.
Right plot: the square-loss cost surface ℒ over the (m, b) plane.]
Each gradient descent step moves (m, b) downhill on the cost surface ℒ, and the corresponding line y = mx + b moves closer to the data.
Another Example
TV advertisement budget vs. sales:

TV budget | Sales
230.1     | 22.1
44.5      | 10.4
17.2      | 9.3

Again fit y = mx + b, now with multiple observations.
Gradient Descent
In general there are n data points (x1, y1), (x2, y2), …, (xn, yn), and the line y = mx + b predicts the point (x1, m x1 + b) above x1.

Cost (the square loss):
ℒ(m, b) = (1/(2n)) [(m x1 + b − y1)² + ⋯ + (m xn + b − yn)²]

Start at (m0, b0), giving the line y = m0 x + b0, and iterate:
[m1, b1]ᵀ = [m0, b0]ᵀ − α ∇ℒ(m0, b0),  giving y = m1 x + b1
[m2, b2]ᵀ = [m1, b1]ᵀ − α ∇ℒ(m1, b1),  giving y = m2 x + b2
[m3, b3]ᵀ = [m2, b2]ᵀ − α ∇ℒ(m2, b2),  giving y = m3 x + b3
…until after N steps the line y = mN x + bN fits the data.
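A compact implementation of this general loop, demonstrated on the three points from earlier so the result can be checked against m = 1/2, b = 7/3 (hyperparameters are our choices):

```python
import numpy as np

xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.0, 5.0, 3.0])
n = len(xs)

def grad_L(m, b):
    # Gradient of ℒ(m, b) = (1/(2n)) Σ (m xᵢ + b − yᵢ)²
    r = m * xs + b - ys                               # residuals
    return np.array([(r * xs).sum() / n, r.sum() / n])

m, b, alpha = 0.0, 0.0, 0.1
for _ in range(20000):
    m, b = np.array([m, b]) - alpha * grad_L(m, b)

print(m, b)  # ≈ 0.5 2.333, matching the analytical solution
```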
Gradients and Gradient Descent
Conclusion