
Copyright Notice

These slides are distributed under the Creative Commons License.

DeepLearning.AI makes these slides available for educational purposes. You may not use or distribute
these slides for commercial purposes. You may make copies of these slides and use or distribute them for
educational purposes as long as you cite DeepLearning.AI as the source of the slides.

For the rest of the details of the license, see https://creativecommons.org/licenses/by-sa/2.0/legalcode


Gradients and Gradient Descent

Tangent planes
Functions of Two Variables

f(x) = x²                          f(x, y) = x² + y²

[Graph of the parabola f(x) = x² with the point (2, 4) marked, next to the surface of f(x, y) = x² + y².]
Finding the Tangent Plane

Fix y = 4:   f(x, 4) = x² + 4²        d/dx f(x, 4) = 2x,  evaluated at x = 2

Fix x = 2:   f(2, y) = 2² + y²        d/dy f(2, y) = 2y

The tangent plane contains both tangent lines.
Video 2: Introduction to Partial Derivatives

Example with the parabola, show tangent plane and slices


Gradients and Gradient Descent

Partial derivatives - Part 1


Slicing the Space
Partial Derivatives

f(x, y) = x² + y²        Treat y as a constant

This leaves a function of one variable, x. Its slope is the partial derivative:

∂f/∂x = 2x + 0        (the constant term y² has derivative 0)
Partial Derivatives

f(x, y) = x² + y²

Treat y as a constant  →  2x        fx = ∂f/∂x,  the partial derivative of f with respect to x
Treat x as a constant  →  2y        fy = ∂f/∂y,  the partial derivative of f with respect to y
Intro To Partial Derivatives

f(x, y) = x² + y²        Partial derivative notation: ∂f/∂x, ∂f/∂y

TASK: Find the partial derivatives of f with respect to x and y.

Find the partial derivative of f with respect to x:
Step 1: Treat all other variables as a constant. In our case, y.
Step 2: Differentiate the function using the normal rules of differentiation.

f(x, y) = x² + (constant)   →   ∂f/∂x = 2x

Find the partial derivative of f with respect to y:
Step 1: Treat all other variables as a constant. In our case, x.
Step 2: Differentiate the function using the normal rules of differentiation.

f(x, y) = (constant) + y²   →   ∂f/∂y = 2y
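Note: a minimal sketch in plain Python of the two steps above, checking ∂f/∂x = 2x and ∂f/∂y = 2y with finite differences (the helper names and step size h are choices made here, not part of the slides):

def f(x, y):
    return x**2 + y**2

def partial_x(f, x, y, h=1e-6):
    # Approximate ∂f/∂x: nudge x while y is held fixed ("treated as a constant").
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-6):
    # Approximate ∂f/∂y: nudge y while x is held fixed.
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

print(partial_x(f, 2.0, 4.0))  # ≈ 4.0, matching 2x at x = 2
print(partial_y(f, 2.0, 4.0))  # ≈ 8.0, matching 2y at y = 4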





Gradients and Gradient Descent

Partial derivatives - Part 2


Partial Derivatives (More Examples)

f(x, y) = 3x²y³

TASK: Find the partial derivative of f with respect to x.
Step 1: Treat all other variables as a constant. In our case, y.
Step 2: Differentiate the function using the normal rules of differentiation.

3 is a constant coefficient, and y³ is treated as a constant coefficient as well, so differentiating with respect to x gives:

∂f/∂x = 3(2x)y³ = 6xy³
Partial Derivatives (More Examples)

f(x, y) = 3x²y³

TASK: What is the partial derivative of f with respect to y?
Step 1: Treat all other variables as a constant. In our case, x.
Step 2: Differentiate the function using the normal rules of differentiation.

∂f/∂y = 3(x²)(3y²) = 9x²y²
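Note: a quick sanity check of both results, sketched with SymPy (assuming it is installed; the symbol names are arbitrary):

import sympy as sp

x, y = sp.symbols('x y')
f = 3 * x**2 * y**3

print(sp.diff(f, x))  # 6*x*y**3    -> matches ∂f/∂x = 6xy³
print(sp.diff(f, y))  # 9*x**2*y**2 -> matches ∂f/∂y = 9x²y²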



Gradients and Gradient Descent

Gradients
Gradient

f(x, y) = x² + y²

Treat y as a constant:  ∂f/∂x = 2x        Treat x as a constant:  ∂f/∂y = 2y

The gradient collects both partial derivatives into one vector:

∇f = [∂f/∂x, ∂f/∂y] = [2x, 2y]

TASK: Find the gradient of f(x, y) at (2, 3).

∇f(2, 3) = [2 ⋅ 2, 2 ⋅ 3] = [4, 6]
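Note: a tiny sketch of this evaluation in code (NumPy assumed; grad_f is a name chosen here):

import numpy as np

def grad_f(x, y):
    # Gradient of f(x, y) = x^2 + y^2: both partial derivatives stacked in a vector.
    return np.array([2 * x, 2 * y])

print(grad_f(2, 3))  # [4 6]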




Gradients and Gradient Descent

Gradients and maxima/minima
Functions of Two Variables

f(x) = x²                                f(x, y) = x² + y²
Minimum is when slope = 0                Minimum is when both slopes = 0

f′(x) = 0                                ∂f/∂x = 0  and  ∂f/∂y = 0
2x = 0                                   2x = 0  and  2y = 0
x = 0                                    (x, y) = (0, 0)
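Note: a short check of this (SymPy assumed; symbol names chosen here) that solves both slope equations simultaneously:

import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + y**2
print(sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y]))  # {x: 0, y: 0}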

Gradients and Gradient Descent

Optimization with gradients: An example
Motivation for Optimization in Two Variables

Too hot! You are in a 5 m × 5 m room, and the temperature T (°C) varies with your position (x, y), measured in meters.

Starting from some point, you want to find the coolest spot. At the minimum, both slopes are zero:

∂T/∂x = 0   and   ∂T/∂y = 0
Exercise

T = f(x, y) = 85 − (1/90) x²(x − 6) y²(y − 6)

Try to calculate ∂f/∂x and ∂f/∂y.

[Surface plot of T (°C) over the room, x and y in meters.]
Motivation for Optimization in Two Variables

T = f(x, y) = 85 − (1/90) x²(x − 6) y²(y − 6)

∂f/∂x = −(1/90) x(3x − 12) y²(y − 6) = 0   →   x = 0,  x = 4,  y = 0,  y = 6

∂f/∂y = −(1/90) x²(x − 6) y(3y − 12) = 0   →   x = 0,  x = 6,  y = 0,  y = 4
Motivation for Optimization in Two Variables

T = f(x, y) = 85 − (1/90) x²(x − 6) y²(y − 6)

Candidate points for the minimum:

x = 0, y = 0      T = 85      Maximum
x = 0, y = 4      T = 85      Maximum
x = 0, y = 6                  Outside the room
x = 4, y = 0      T = 85      Maximum
x = 4, y = 4      T = 73.6    Minimum
x = 6, y = 0      T = 85
x = 6, y = 6                  Outside the room
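Note: a small sketch in plain Python (the helper name T is chosen here) that evaluates the temperature at every candidate point as a check on this table:

def T(x, y):
    # Temperature model from the slides.
    return 85 - (1 / 90) * x**2 * (x - 6) * y**2 * (y - 6)

candidates = [(0, 0), (0, 4), (0, 6), (4, 0), (4, 4), (6, 0), (6, 6)]
for x, y in candidates:
    print(f"T({x}, {y}) = {T(x, y):.1f}")
# Every candidate gives 85 except (4, 4), where T ≈ 73.6 -- the coolest spot.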
Gradients and Gradient Descent

Optimization using gradients - Analytical method
Linear Regression: Analytical Approach

Locations on the x-y plane: (1, 2), (2, 5), (3, 3).

An initial line y = mx + b represents where to pass the connection on the x-y plane. The cost of connecting each location to the power line is its squared vertical distance to the line.

Goal: Find m, b that minimize the sum of squares cost.
Linear Regression: Analytical Approach

Goal: Minimize sum of squares cost

The line's height at x = 1, 2, 3 is m + b, 2m + b, 3m + b, so the cost is

(m + b − 2)² + (2m + b − 5)² + (3m + b − 3)²

  =   m² + b² + 4 + 2mb − 4m − 4b
    + 4m² + b² + 25 + 4mb − 20m − 10b
    + 9m² + b² + 9 + 6mb − 18m − 6b

E(m, b) = 14m² + 3b² + 38 + 12mb − 42m − 20b
Linear Regression: Analytical Approach

Goal: Minimize sum of squares cost

E(m, b) = 14m² + 3b² + 38 + 12mb − 42m − 20b

Quiz: Find the partial derivative of E with respect to m, and with respect to b.

∂E/∂m = 28m + 12b − 42

∂E/∂b = 6b + 12m − 20
Linear Regression: Analytical Approach

Goal: Minimize sum of squares cost

E(m, b) = 14m² + 3b² + 38 + 12mb − 42m − 20b

∂E/∂m = 28m + 12b − 42 = 0
∂E/∂b = 6b + 12m − 20 = 0

Solve for m and b. Doubling the second equation gives 12b + 24m − 40 = 0, and subtracting it from the first leaves

4m − 2 = 0   →   m = 2/4 = 0.5

Substituting back:

6b + 12(0.5) − 20 = 0   →   6b + 6 − 20 = 0   →   6b − 14 = 0   →   b = 14/6 = 7/3
Linear Regression: Optimal Solution

y = mx + b   with   m = 1/2,  b = 7/3

E(m = 1/2, b = 7/3) ≈ 4.167
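Note: a compact numeric check of this solution (NumPy assumed). The matrix form is just the two zero-gradient equations rearranged; solving them directly works here because the system is small and linear.

import numpy as np

# 28m + 12b = 42 and 12m + 6b = 20: the partial derivatives of E set to zero.
A = np.array([[28.0, 12.0],
              [12.0, 6.0]])
rhs = np.array([42.0, 20.0])
m, b = np.linalg.solve(A, rhs)

E = 14*m**2 + 3*b**2 + 38 + 12*m*b - 42*m - 20*b
print(m, b, E)  # 0.5  2.333...  ~4.167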
Linear Regression: Gradient Descent

Goal: Minimize sum of squares cost

E(m, b) = 14m² + 3b² + 38 + 12mb − 42m − 20b

∂E/∂m = 28m + 12b − 42 = 0
∂E/∂b = 6b + 12m − 20 = 0

m = ?,  b = ?        Gradient Descent to the rescue
Gradients and Gradient Descent

Optimization using Gradient Descent in one variable - Part 1
Hard To Optimize Functions

f(x) = eˣ − log(x)

Minimum? Set f′(x) = 0:

f′(x) = eˣ − 1/x = 0   →   eˣ = 1/x

Solution: x = 0.5671..., also known as the Omega constant. It has no closed-form expression, so solving f′(x) = 0 by hand is not practical here.
Method 1: Try Both Directions

Is there any other way? One simple idea: from the current point, take a small step to the left and a small step to the right, keep whichever one lowers f(x), and repeat!
Gradients and Gradient Descent

Optimization using Gradient Descent in one variable - Part 2
Method 2: Be Clever

Try something smarter: use the slope to decide which way to step.

If the slope is negative, step right; if the slope is positive, step left. In both cases you move against the sign of the slope:

new point = old point − slope
x₁ = x₀ − f′(x₀)

Big steps can be chaotic, so scale the slope down:

x₁ = x₀ − 0.01 f′(x₀)        or, in general,        x₁ = x₀ − α f′(x₀)

α is the learning rate.
Method 2: Be Clever

x₁ = x₀ − α f′(x₀)

A large slope gives a big step; a small slope gives a small step, so the updates slow down as you approach the minimum.

Iterate the update:   x₂ = x₁ − α f′(x₁),   ...,   x₂₀ = x₁₉ − α f′(x₁₉)

This is gradient descent.
Gradient Descent

Function: f(x)          Goal: find the minimum of f(x)

Step 1:
  Define a learning rate α
  Choose a starting point x₀
Step 2:
  Update: xₖ = xₖ₋₁ − α f′(xₖ₋₁)
Step 3:
  Repeat Step 2 until you are close enough to the true minimum x*
Method 2: Gradient Descent

f(x) = eˣ − log(x)        f′(x) = eˣ − 1/x

Start: x = 0.05          Rate: α = 0.005

Find:  f′(0.05) = −18.9
Move by −0.005 f′(0.05):        x ↦ 0.1447

Find:  f′(0.1447) = −5.7552
Move by −0.005 f′(0.1447):      x ↦ 0.1735

Repeat!
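Note: a minimal sketch of this loop in Python (NumPy assumed; the iteration count is a choice made here, not from the slides):

import numpy as np

def f_prime(x):
    # Derivative of f(x) = e^x - log(x).
    return np.exp(x) - 1.0 / x

x = 0.05        # starting point from the slides
alpha = 0.005   # learning rate from the slides
for _ in range(1000):
    x = x - alpha * f_prime(x)

print(x)  # approaches the Omega constant, x ≈ 0.5671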











Gradients and Gradient Descent

Optimization using Gradient Descent in one variable - Part 3
What Is a Good Learning Rate?

Too large: the steps overshoot and can bounce around chaotically.
Too small: the steps are tiny and convergence is very slow.
Just right: steady progress toward the minimum.

Unfortunately, there is no rule to give the best learning rate α.
Drawbacks of Gradient Descent

Gradient descent can get stuck in a local minimum. How do you find the global minimum? One remedy: take multiple initial starting points for gradient descent and keep the best result.
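Note: a sketch of the multiple-starting-points remedy. The function here is a made-up example with two minima (not from the slides), and the starting points, learning rate, and step count are choices made for this sketch.

# Hypothetical function with two minima, to illustrate restarts.
def f(x):
    return x**4 - 3 * x**2 + x

def f_prime(x):
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, alpha=0.01, steps=500):
    x = x0
    for _ in range(steps):
        x = x - alpha * f_prime(x)
    return x

# Run from several starting points and keep the lowest value found.
starts = [-2.0, -0.5, 0.5, 2.0]
results = [gradient_descent(x0) for x0 in starts]
best = min(results, key=f)
print(best, f(best))  # the run that reached the deeper (global) minimum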

Gradients and Gradient Descent

Optimization using Gradient Descent in two variables - Part 1
Gradient Descent With Heat Example

[Sequence of plots of T: a point moves step by step toward the coolest spot. Repeat!]
Gradients and Gradient Descent

Optimization using Gradient Descent in two variables - Part 2
Idea for Gradient Descent

Initial position: (x₀, y₀)

Direction of greatest ascent: ∇f
Direction of greatest descent: −∇f

Updated position: (x₁, y₁) = (x₀, y₀) − α ∇f        Better point!
Method 2: Gradient Descent

T = f(x, y) = 85 − (1/90) x²(x − 6) y²(y − 6)

Start: x = 0.5, y = 0.6          Rate: α = 0.05

∇f = [∂f/∂x, ∂f/∂y] = [ −(1/90) x(3x − 12) y²(y − 6),  −(1/90) x²(x − 6) y(3y − 12) ]

∇f(0.5, 0.6) = [−0.1134, −0.0935]

Move by −0.05 ∇f(0.5, 0.6):     x ↦ 0.5057,  y ↦ 0.6047
Method 2: Gradient Descent

Start: x = 0.5, y = 0.6          Rate: α = 0.05

Find:  ∇f(0.5057, 0.6047) = [−0.1162, −0.0961]

Move by −0.05 ∇f(0.5057, 0.6047):     x ↦ 0.5115,  y ↦ 0.6095

Repeat!
Gradient Descent

Function: f(x, y)          Goal: find the minimum of f(x, y)

Step 1:
  Define a learning rate α
  Choose a starting point (x₀, y₀)
Step 2:
  Update: (xₖ, yₖ) = (xₖ₋₁, yₖ₋₁) − α ∇f(xₖ₋₁, yₖ₋₁)
Step 3:
  Repeat Step 2 until you are close enough to the true minimum (x*, y*)
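Note: a minimal sketch of this two-variable loop on the temperature example (NumPy assumed; the iteration count is a choice made here, the start point and rate are from the slides):

import numpy as np

def grad_T(x, y):
    # Gradient of T(x, y) = 85 - (1/90) x^2 (x - 6) y^2 (y - 6).
    dTdx = -(1 / 90) * x * (3 * x - 12) * y**2 * (y - 6)
    dTdy = -(1 / 90) * x**2 * (x - 6) * y * (3 * y - 12)
    return np.array([dTdx, dTdy])

point = np.array([0.5, 0.6])   # starting point from the slides
alpha = 0.05                   # learning rate from the slides
for _ in range(5000):
    point = point - alpha * grad_T(*point)

print(point)  # approaches the minimum near (4, 4)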






Gradient Descent With Local Minima

[Plots of a surface with several valleys: depending on the starting point, gradient descent can settle in a local minimum.]
Gradients and Gradient Descent

Optimization using Gradient Descent - Least squares
Gradient Descent With Power Lines Example

Points: (1, 2), (2, 5), (3, 3)          Line: y = mx + b

E(m, b) = 14m² + 3b² + 38 + 12mb − 42m − 20b

E(m = 1/2, b = 7/3) ≈ 4.167
Linear Regression: Gradient Descent

Goal: Minimize sum of squares cost

∇E = [28m + 12b − 42,  6b + 12m − 20]

m = ?,  b = ?        The point (m, b) where the cost is minimum.

Start from some initial point on the cost surface and descend until you find the minimum.

Steps:
  Start with (m₀, b₀)
  Iterate:  (mₖ₊₁, bₖ₊₁) = (mₖ, bₖ) − α ∇E(mₖ, bₖ)
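Note: a sketch of this iteration in code (NumPy assumed; the starting point, learning rate, and iteration count are choices made here, not from the slides):

import numpy as np

def grad_E(m, b):
    # Gradient of E(m, b) = 14m^2 + 3b^2 + 38 + 12mb - 42m - 20b.
    return np.array([28 * m + 12 * b - 42, 12 * m + 6 * b - 20])

m, b = 0.0, 0.0     # an arbitrary starting point
alpha = 0.01        # a learning rate chosen for this sketch
for _ in range(10000):
    m, b = np.array([m, b]) - alpha * grad_E(m, b)

print(m, b)  # approaches m = 0.5, b = 7/3, matching the analytical solution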

Gradients and Gradient Descent

Optimization using Gradient Descent - Least squares with multiple observations
Gradient Descent

For the points (1, 2), (2, 5), (3, 3) and the line y = mx + b, the square-loss cost is a surface over the (m, b) plane. Each gradient descent step moves (m, b) downhill on that cost surface, and the corresponding line fits the points better and better.
Another Example

TV advertisement budget → Number of sales

TV budget    Sales
230.1        22.1
44.5         10.4
17.2         9.3
...          ...

Goal: Predict sales in terms of TV budget
Tool: Linear regression, y = mx + b, with multiple observations
Gradient Descent

Data: (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ).   Model: y = mx + b, so the prediction at x₁ is (x₁, m x₁ + b).

Loss (cost):

ℒ(m, b) = 1/(2n) [ (m x₁ + b − y₁)² + ⋯ + (m xₙ + b − yₙ)² ]

Start with some (m₀, b₀), giving the line y = m₀x + b₀, and iterate gradient descent:

(m₁, b₁) = (m₀, b₀) − α ∇ℒ(m₀, b₀)
(m₂, b₂) = (m₁, b₁) − α ∇ℒ(m₁, b₁)
(m₃, b₃) = (m₂, b₂) − α ∇ℒ(m₂, b₂)
...
(m_N, b_N) = (m_{N−1}, b_{N−1}) − α ∇ℒ(m_{N−1}, b_{N−1})

The final line y = m_N x + b_N fits the observations.
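Note: a sketch of least-squares gradient descent over n observations (NumPy assumed). As a check, the data here are the three power-line points, so the result should match m = 0.5, b = 7/3; the learning rate and iteration count are choices made for this sketch. For real data such as the TV budgets, you would typically rescale x or use a much smaller learning rate.

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 5.0, 3.0])
n = len(x)

def grad_loss(m, b):
    # Gradient of L(m, b) = 1/(2n) * sum_i (m*x_i + b - y_i)^2.
    residual = m * x + b - y          # prediction errors
    dm = np.sum(residual * x) / n     # dL/dm
    db = np.sum(residual) / n         # dL/db
    return np.array([dm, db])

m, b = 0.0, 0.0
alpha = 0.1
for _ in range(5000):
    m, b = np.array([m, b]) - alpha * grad_loss(m, b)

print(m, b)  # approaches m = 0.5, b = 2.333...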

Gradients and Gradient Descent

Conclusion
