
Copyright Notice

These slides are distributed under the Creative Commons License.

DeepLearning.AI makes these slides available for educational purposes. You may not use or distribute
these slides for commercial purposes. You may make copies of these slides and use or distribute them for
educational purposes as long as you cite DeepLearning.AI as the source of the slides.

For the rest of the details of the license, see https://creativecommons.org/licenses/by-sa/2.0/legalcode


Gradients and Gradient Descent

Tangent planes
Functions of Two Variables

f(x) = x²                          f(x, y) = x² + y²

[Graph of the parabola f(x) = x² with the point (2, 4) marked, next to the surface of f(x, y) = x² + y².]
Finding the Tangent Plane

Fix y = 4:   f(x, 4) = x² + 4²        d/dx f(x, 4) = 2x,  evaluated at x = 2

Fix x = 2:   f(2, y) = 2² + y²        d/dy f(2, y) = 2y

The tangent plane contains both tangent lines.
Video 2: Introduction to Partial Derivatives

Example with the parabola, show tangent plane and slices


Gradients and Gradient Descent

Partial derivatives - Part 1


Slicing the Space
Partial Derivatives

f(x, y) = x² + y²        Treat y as a constant

This leaves a function of one variable, x. Its slope is the partial derivative:

∂f/∂x = 2x + 0        (the constant term y² has derivative 0)
Partial Derivatives

f(x, y) = x² + y²

Treat y as a constant  →  2x        fx = ∂f/∂x,  the partial derivative of f with respect to x
Treat x as a constant  →  2y        fy = ∂f/∂y,  the partial derivative of f with respect to y
Intro To Partial Derivatives

f(x, y) = x² + y²        Partial derivative notation: ∂f/∂x, ∂f/∂y

TASK: Find the partial derivatives of f with respect to x and y.

Find the partial derivative of f with respect to x:
Step 1: Treat all other variables as a constant. In our case, y.
Step 2: Differentiate the function using the normal rules of differentiation.

f(x, y) = x² + (constant)   →   ∂f/∂x = 2x

Find the partial derivative of f with respect to y:
Step 1: Treat all other variables as a constant. In our case, x.
Step 2: Differentiate the function using the normal rules of differentiation.

f(x, y) = (constant) + y²   →   ∂f/∂y = 2y
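Note: a minimal sketch in plain Python of the two steps above, checking ∂f/∂x = 2x and ∂f/∂y = 2y with finite differences (the helper names and step size h are choices made here, not part of the slides):

def f(x, y):
    return x**2 + y**2

def partial_x(f, x, y, h=1e-6):
    # Approximate ∂f/∂x: nudge x while y is held fixed ("treated as a constant").
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-6):
    # Approximate ∂f/∂y: nudge y while x is held fixed.
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

print(partial_x(f, 2.0, 4.0))  # ≈ 4.0, matching 2x at x = 2
print(partial_y(f, 2.0, 4.0))  # ≈ 8.0, matching 2y at y = 4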





Gradients and Gradient Descent

Partial derivatives - Part 2


Partial Derivatives (More Examples)

f(x, y) = 3x²y³

TASK: Find the partial derivative of f with respect to x.
Step 1: Treat all other variables as a constant. In our case, y.
Step 2: Differentiate the function using the normal rules of differentiation.

3 is a constant coefficient, and y³ is treated as a constant coefficient as well, so differentiating with respect to x gives:

∂f/∂x = 3(2x)y³ = 6xy³
Partial Derivatives (More Examples)

f(x, y) = 3x²y³

TASK: What is the partial derivative of f with respect to y?
Step 1: Treat all other variables as a constant. In our case, x.
Step 2: Differentiate the function using the normal rules of differentiation.

∂f/∂y = 3(x²)(3y²) = 9x²y²
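Note: a quick sanity check of both results, sketched with SymPy (assuming it is installed; the symbol names are arbitrary):

import sympy as sp

x, y = sp.symbols('x y')
f = 3 * x**2 * y**3

print(sp.diff(f, x))  # 6*x*y**3    -> matches ∂f/∂x = 6xy³
print(sp.diff(f, y))  # 9*x**2*y**2 -> matches ∂f/∂y = 9x²y²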



Gradients and Gradient Descent

Gradients
Gradient

f(x, y) = x² + y²

Treat y as a constant:  ∂f/∂x = 2x        Treat x as a constant:  ∂f/∂y = 2y

The gradient collects both partial derivatives into one vector:

∇f = [∂f/∂x, ∂f/∂y] = [2x, 2y]

TASK: Find the gradient of f(x, y) at (2, 3).

∇f(2, 3) = [2 ⋅ 2, 2 ⋅ 3] = [4, 6]
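Note: a tiny sketch of this evaluation in code (NumPy assumed; grad_f is a name chosen here):

import numpy as np

def grad_f(x, y):
    # Gradient of f(x, y) = x^2 + y^2: both partial derivatives stacked in a vector.
    return np.array([2 * x, 2 * y])

print(grad_f(2, 3))  # [4 6]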




Gradients and Gradient Descent

Gradients and maxima/minima
Functions of Two Variables

f(x) = x²                                f(x, y) = x² + y²
Minimum is when slope = 0                Minimum is when both slopes = 0

f′(x) = 0                                ∂f/∂x = 0  and  ∂f/∂y = 0
2x = 0                                   2x = 0  and  2y = 0
x = 0                                    (x, y) = (0, 0)
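Note: a short check of this (SymPy assumed; symbol names chosen here) that solves both slope equations simultaneously:

import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + y**2
print(sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y]))  # {x: 0, y: 0}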

Gradients and Gradient Descent

Optimization with gradients: An example
Motivation for Optimization in Two Variables

Too hot! You are in a 5 m × 5 m room, and the temperature T (°C) varies with your position (x, y), measured in meters.

Starting from some point, you want to find the coolest spot. At the minimum, both slopes are zero:

∂T/∂x = 0   and   ∂T/∂y = 0
Exercise

T = f(x, y) = 85 − (1/90) x²(x − 6) y²(y − 6)

Try to calculate ∂f/∂x and ∂f/∂y.

[Surface plot of T (°C) over the room, x and y in meters.]
Motivation for Optimization in Two Variables

T = f(x, y) = 85 − (1/90) x²(x − 6) y²(y − 6)

∂f/∂x = −(1/90) x(3x − 12) y²(y − 6) = 0   →   x = 0,  x = 4,  y = 0,  y = 6

∂f/∂y = −(1/90) x²(x − 6) y(3y − 12) = 0   →   x = 0,  x = 6,  y = 0,  y = 4
Motivation for Optimization in Two Variables

T = f(x, y) = 85 − (1/90) x²(x − 6) y²(y − 6)

Candidate points for the minimum:

x = 0, y = 0      T = 85      Maximum
x = 0, y = 4      T = 85      Maximum
x = 0, y = 6                  Outside the room
x = 4, y = 0      T = 85      Maximum
x = 4, y = 4      T = 73.6    Minimum
x = 6, y = 0      T = 85
x = 6, y = 6                  Outside the room
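Note: a small sketch in plain Python (the helper name T is chosen here) that evaluates the temperature at every candidate point as a check on this table:

def T(x, y):
    # Temperature model from the slides.
    return 85 - (1 / 90) * x**2 * (x - 6) * y**2 * (y - 6)

candidates = [(0, 0), (0, 4), (0, 6), (4, 0), (4, 4), (6, 0), (6, 6)]
for x, y in candidates:
    print(f"T({x}, {y}) = {T(x, y):.1f}")
# Every candidate gives 85 except (4, 4), where T ≈ 73.6 -- the coolest spot.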
Gradients and Gradient Descent

Optimization using gradients - Analytical method
Linear Regression: Analytical Approach

Locations on the x-y plane: (1, 2), (2, 5), (3, 3).

An initial line y = mx + b represents where to pass the connection on the x-y plane. The cost of connecting each location to the power line is its squared vertical distance to the line.

Goal: Find m, b that minimize the sum of squares cost.
Linear Regression: Analytical Approach

Goal: Minimize sum of squares cost

The line's height at x = 1, 2, 3 is m + b, 2m + b, 3m + b, so the cost is

(m + b − 2)² + (2m + b − 5)² + (3m + b − 3)²

  =   m² + b² + 4 + 2mb − 4m − 4b
    + 4m² + b² + 25 + 4mb − 20m − 10b
    + 9m² + b² + 9 + 6mb − 18m − 6b

E(m, b) = 14m² + 3b² + 38 + 12mb − 42m − 20b
Linear Regression: Analytical Approach

Goal: Minimize sum of squares cost

E(m, b) = 14m² + 3b² + 38 + 12mb − 42m − 20b

Quiz: Find the partial derivative of E with respect to m, and with respect to b.

∂E/∂m = 28m + 12b − 42

∂E/∂b = 6b + 12m − 20
Linear Regression: Analytical Approach

Goal: Minimize sum of squares cost

E(m, b) = 14m² + 3b² + 38 + 12mb − 42m − 20b

∂E/∂m = 28m + 12b − 42 = 0
∂E/∂b = 6b + 12m − 20 = 0

Solve for m and b. Doubling the second equation gives 12b + 24m − 40 = 0, and subtracting it from the first leaves

4m − 2 = 0   →   m = 2/4 = 0.5

Substituting back:

6b + 12(0.5) − 20 = 0   →   6b + 6 − 20 = 0   →   6b − 14 = 0   →   b = 14/6 = 7/3
Linear Regression: Optimal Solution

y = mx + b   with   m = 1/2,  b = 7/3

E(m = 1/2, b = 7/3) ≈ 4.167
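Note: a compact numeric check of this solution (NumPy assumed). The matrix form is just the two zero-gradient equations rearranged; solving them directly works here because the system is small and linear.

import numpy as np

# 28m + 12b = 42 and 12m + 6b = 20: the partial derivatives of E set to zero.
A = np.array([[28.0, 12.0],
              [12.0, 6.0]])
rhs = np.array([42.0, 20.0])
m, b = np.linalg.solve(A, rhs)

E = 14*m**2 + 3*b**2 + 38 + 12*m*b - 42*m - 20*b
print(m, b, E)  # 0.5  2.333...  ~4.167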
Linear Regression: Gradient Descent

Goal: Minimize sum of squares cost

E(m, b) = 14m² + 3b² + 38 + 12mb − 42m − 20b

∂E/∂m = 28m + 12b − 42 = 0
∂E/∂b = 6b + 12m − 20 = 0

m = ?,  b = ?        Gradient Descent to the rescue
Gradients and Gradient Descent

Optimization using Gradient Descent in one variable - Part 1
Hard To Optimize Functions

f(x) = eˣ − log(x)

Minimum? Set f′(x) = 0:

f′(x) = eˣ − 1/x = 0   →   eˣ = 1/x

Solution: x = 0.5671..., also known as the Omega constant. It has no closed-form expression, so solving f′(x) = 0 by hand is not practical here.
Method 1: Try Both Directions

Is there any other way? One simple idea: from the current point, take a small step to the left and a small step to the right, keep whichever one lowers f(x), and repeat!
Gradients and Gradient Descent

Optimization using Gradient Descent in one variable - Part 2
Method 2: Be Clever

Try something smarter: use the slope to decide which way to step.

If the slope is negative, step right; if the slope is positive, step left. In both cases you move against the sign of the slope:

new point = old point − slope
x₁ = x₀ − f′(x₀)

Big steps can be chaotic, so scale the slope down:

x₁ = x₀ − 0.01 f′(x₀)        or, in general,        x₁ = x₀ − α f′(x₀)

α is the learning rate.
Method 2: Be Clever

x₁ = x₀ − α f′(x₀)

A large slope gives a big step; a small slope gives a small step, so the updates slow down as you approach the minimum.

Iterate the update:   x₂ = x₁ − α f′(x₁),   ...,   x₂₀ = x₁₉ − α f′(x₁₉)

This is gradient descent.
Gradient Descent

Function: f(x)          Goal: find the minimum of f(x)

Step 1:
  Define a learning rate α
  Choose a starting point x₀
Step 2:
  Update: xₖ = xₖ₋₁ − α f′(xₖ₋₁)
Step 3:
  Repeat Step 2 until you are close enough to the true minimum x*
Method 2: Gradient Descent

f(x) = eˣ − log(x)        f′(x) = eˣ − 1/x

Start: x = 0.05          Rate: α = 0.005

Find:  f′(0.05) = −18.9
Move by −0.005 f′(0.05):        x ↦ 0.1447

Find:  f′(0.1447) = −5.7552
Move by −0.005 f′(0.1447):      x ↦ 0.1735

Repeat!
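Note: a minimal sketch of this loop in Python (NumPy assumed; the iteration count is a choice made here, not from the slides):

import numpy as np

def f_prime(x):
    # Derivative of f(x) = e^x - log(x).
    return np.exp(x) - 1.0 / x

x = 0.05        # starting point from the slides
alpha = 0.005   # learning rate from the slides
for _ in range(1000):
    x = x - alpha * f_prime(x)

print(x)  # approaches the Omega constant, x ≈ 0.5671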











Gradients and Gradient Descent

Optimization using Gradient Descent in one variable - Part 3
What Is a Good Learning Rate?

Too large: the steps overshoot and can bounce around chaotically.
Too small: the steps are tiny and convergence is very slow.
Just right: steady progress toward the minimum.

Unfortunately, there is no rule to give the best learning rate α.
Drawbacks of Gradient Descent

Gradient descent can get stuck in a local minimum. How do you find the global minimum? One remedy: take multiple initial starting points for gradient descent and keep the best result.
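Note: a sketch of the multiple-starting-points remedy. The function here is a made-up example with two minima (not from the slides), and the starting points, learning rate, and step count are choices made for this sketch.

# Hypothetical function with two minima, to illustrate restarts.
def f(x):
    return x**4 - 3 * x**2 + x

def f_prime(x):
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, alpha=0.01, steps=500):
    x = x0
    for _ in range(steps):
        x = x - alpha * f_prime(x)
    return x

# Run from several starting points and keep the lowest value found.
starts = [-2.0, -0.5, 0.5, 2.0]
results = [gradient_descent(x0) for x0 in starts]
best = min(results, key=f)
print(best, f(best))  # the run that reached the deeper (global) minimum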

Gradients and Gradient Descent

Optimization using Gradient Descent in two variables - Part 1
Gradient Descent With Heat Example

[Sequence of plots of T: a point moves step by step toward the coolest spot. Repeat!]
Gradients and Gradient Descent

Optimization using Gradient Descent in two variables - Part 2
Idea for Gradient Descent

Initial position: (x₀, y₀)

Direction of greatest ascent: ∇f
Direction of greatest descent: −∇f

Updated position: (x₁, y₁) = (x₀, y₀) − α ∇f        Better point!
Method 2: Gradient Descent

T = f(x, y) = 85 − (1/90) x²(x − 6) y²(y − 6)

Start: x = 0.5, y = 0.6          Rate: α = 0.05

∇f = [∂f/∂x, ∂f/∂y] = [ −(1/90) x(3x − 12) y²(y − 6),  −(1/90) x²(x − 6) y(3y − 12) ]

∇f(0.5, 0.6) = [−0.1134, −0.0935]

Move by −0.05 ∇f(0.5, 0.6):     x ↦ 0.5057,  y ↦ 0.6047
Method 2: Gradient Descent

Start: x = 0.5, y = 0.6          Rate: α = 0.05

Find:  ∇f(0.5057, 0.6047) = [−0.1162, −0.0961]

Move by −0.05 ∇f(0.5057, 0.6047):     x ↦ 0.5115,  y ↦ 0.6095

Repeat!
Gradient Descent

Function: f(x, y)          Goal: find the minimum of f(x, y)

Step 1:
  Define a learning rate α
  Choose a starting point (x₀, y₀)
Step 2:
  Update: (xₖ, yₖ) = (xₖ₋₁, yₖ₋₁) − α ∇f(xₖ₋₁, yₖ₋₁)
Step 3:
  Repeat Step 2 until you are close enough to the true minimum (x*, y*)
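Note: a minimal sketch of this two-variable loop on the temperature example (NumPy assumed; the iteration count is a choice made here, the start point and rate are from the slides):

import numpy as np

def grad_T(x, y):
    # Gradient of T(x, y) = 85 - (1/90) x^2 (x - 6) y^2 (y - 6).
    dTdx = -(1 / 90) * x * (3 * x - 12) * y**2 * (y - 6)
    dTdy = -(1 / 90) * x**2 * (x - 6) * y * (3 * y - 12)
    return np.array([dTdx, dTdy])

point = np.array([0.5, 0.6])   # starting point from the slides
alpha = 0.05                   # learning rate from the slides
for _ in range(5000):
    point = point - alpha * grad_T(*point)

print(point)  # approaches the minimum near (4, 4)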






Gradient Descent With Local Minima

[Plots of a surface with several valleys: depending on the starting point, gradient descent can settle in a local minimum.]
Gradients and Gradient Descent

Optimization using Gradient Descent - Least squares
Gradient Descent With Power Lines Example

Points: (1, 2), (2, 5), (3, 3)          Line: y = mx + b

E(m, b) = 14m² + 3b² + 38 + 12mb − 42m − 20b

E(m = 1/2, b = 7/3) ≈ 4.167
Linear Regression: Gradient Descent

Goal: Minimize sum of squares cost

∇E = [28m + 12b − 42,  6b + 12m − 20]

m = ?,  b = ?        The point (m, b) where the cost is minimum.

Start from some initial point on the cost surface and descend until you find the minimum.

Steps:
  Start with (m₀, b₀)
  Iterate:  (mₖ₊₁, bₖ₊₁) = (mₖ, bₖ) − α ∇E(mₖ, bₖ)
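Note: a sketch of this iteration in code (NumPy assumed; the starting point, learning rate, and iteration count are choices made here, not from the slides):

import numpy as np

def grad_E(m, b):
    # Gradient of E(m, b) = 14m^2 + 3b^2 + 38 + 12mb - 42m - 20b.
    return np.array([28 * m + 12 * b - 42, 12 * m + 6 * b - 20])

m, b = 0.0, 0.0     # an arbitrary starting point
alpha = 0.01        # a learning rate chosen for this sketch
for _ in range(10000):
    m, b = np.array([m, b]) - alpha * grad_E(m, b)

print(m, b)  # approaches m = 0.5, b = 7/3, matching the analytical solution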

Gradients and Gradient Descent

Optimization using Gradient Descent - Least squares with multiple observations
Gradient Descent

For the points (1, 2), (2, 5), (3, 3) and the line y = mx + b, the square-loss cost is a surface over the (m, b) plane. Each gradient descent step moves (m, b) downhill on that cost surface, and the corresponding line fits the points better and better.
Another Example

TV advertisement budget → Number of sales

TV budget    Sales
230.1        22.1
44.5         10.4
17.2         9.3
...          ...

Goal: Predict sales in terms of TV budget
Tool: Linear regression, y = mx + b, with multiple observations
Gradient Descent

Data: (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ).   Model: y = mx + b, so the prediction at x₁ is (x₁, m x₁ + b).

Loss (cost):

ℒ(m, b) = 1/(2n) [ (m x₁ + b − y₁)² + ⋯ + (m xₙ + b − yₙ)² ]

Start with some (m₀, b₀), giving the line y = m₀x + b₀, and iterate gradient descent:

(m₁, b₁) = (m₀, b₀) − α ∇ℒ(m₀, b₀)
(m₂, b₂) = (m₁, b₁) − α ∇ℒ(m₁, b₁)
(m₃, b₃) = (m₂, b₂) − α ∇ℒ(m₂, b₂)
...
(m_N, b_N) = (m_{N−1}, b_{N−1}) − α ∇ℒ(m_{N−1}, b_{N−1})

The final line y = m_N x + b_N fits the observations.
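Note: a sketch of least-squares gradient descent over n observations (NumPy assumed). As a check, the data here are the three power-line points, so the result should match m = 0.5, b = 7/3; the learning rate and iteration count are choices made for this sketch. For real data such as the TV budgets, you would typically rescale x or use a much smaller learning rate.

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 5.0, 3.0])
n = len(x)

def grad_loss(m, b):
    # Gradient of L(m, b) = 1/(2n) * sum_i (m*x_i + b - y_i)^2.
    residual = m * x + b - y          # prediction errors
    dm = np.sum(residual * x) / n     # dL/dm
    db = np.sum(residual) / n         # dL/db
    return np.array([dm, db])

m, b = 0.0, 0.0
alpha = 0.1
for _ in range(5000):
    m, b = np.array([m, b]) - alpha * grad_loss(m, b)

print(m, b)  # approaches m = 0.5, b = 2.333...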

Gradients and Gradient Descent

Conclusion
