Lecture 4 Backpropagation
Backpropagation
Amit Sethi
Electrical Engineering, IIT Bombay
Learning objectives
• Write the derivative of a nested function using the chain rule
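As a quick worked instance of this objective (my own example, not from the slides): for a nested function $h(x) = f(g(x))$ the chain rule gives $h'(x) = f'(g(x))\, g'(x)$; for example, $h(x) = \sin(x^2)$ yields $h'(x) = 2x \cos(x^2)$.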
Gradient of a function of a vector
• $\nabla f(\boldsymbol{x}) = \nabla f(x_1, x_2) = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} \\[6pt] \dfrac{\partial f}{\partial x_2} \end{bmatrix}$
• At a minimum or a maximum the gradient is the zero vector: the function is flat in every direction
• The negative of the gradient gives the direction of steepest descent towards a minimum
• Take a small step in that direction and repeat
[Figure: surface plot of f(x1, x2) showing gradient descent steps towards a minimum]
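The step itself can be written as the usual gradient descent update (the step-size symbol $\eta$ is my notation, not the slide's):
$\boldsymbol{x} \leftarrow \boldsymbol{x} - \eta\, \nabla f(\boldsymbol{x}), \qquad \eta > 0 \text{ small}$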
[Figure: computation graph of a two-layer network: input x is multiplied by W1 and shifted by b1 to give Z1, which passes through ReLU to give A1; A1 is multiplied by W2 and shifted by b2 to give Z2, which passes through SoftMax to give A2; A2 and the target enter the CE loss]
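A minimal NumPy sketch of the forward and backward pass for the network in this figure; the layer sizes, random initialization, and any names beyond those in the figure (x, W1, b1, Z1, A1, W2, b2, Z2, A2, target) are my assumptions, not the lecture's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward pass: x -> Z1 = W1 x + b1 -> A1 = ReLU(Z1)
#                 -> Z2 = W2 A1 + b2 -> A2 = SoftMax(Z2) -> CE loss
x = rng.standard_normal(4)                    # input (size 4 assumed)
W1, b1 = rng.standard_normal((5, 4)), np.zeros(5)
W2, b2 = rng.standard_normal((3, 5)), np.zeros(3)
target = np.array([0.0, 1.0, 0.0])            # one-hot target (3 classes assumed)

Z1 = W1 @ x + b1
A1 = np.maximum(Z1, 0.0)                      # ReLU
Z2 = W2 @ A1 + b2
A2 = np.exp(Z2 - Z2.max())                    # SoftMax, shifted for stability
A2 /= A2.sum()
loss = -np.sum(target * np.log(A2))           # cross-entropy

# Backward pass: for SoftMax + cross-entropy, dLoss/dZ2 = A2 - target
dZ2 = A2 - target
dW2 = np.outer(dZ2, A1)
db2 = dZ2
dA1 = W2.T @ dZ2
dZ1 = dA1 * (Z1 > 0)                          # ReLU passes gradient only where Z1 > 0
dW1 = np.outer(dZ1, x)
db1 = dZ1

W2 -= 0.1 * dW2                               # one gradient descent step (learning rate 0.1 assumed)
```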
Vector-valued functions and Jacobians
• We often deal with functions that give multiple
outputs
• Let $\boldsymbol{f}(\boldsymbol{x}) = \begin{bmatrix} f_1(\boldsymbol{x}) \\ f_2(\boldsymbol{x}) \end{bmatrix} = \begin{bmatrix} f_1(x_1, x_2, x_3) \\ f_2(x_1, x_2, x_3) \end{bmatrix}$
• Thinking in terms of a vector of functions makes the representation less cumbersome and the computations more efficient
• Then the Jacobian is
$\boldsymbol{J}(\boldsymbol{f}) = \begin{bmatrix} \dfrac{\partial \boldsymbol{f}}{\partial x_1} & \dfrac{\partial \boldsymbol{f}}{\partial x_2} & \dfrac{\partial \boldsymbol{f}}{\partial x_3} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \dfrac{\partial f_1}{\partial x_3} \\[6pt] \dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} & \dfrac{\partial f_2}{\partial x_3} \end{bmatrix}$
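To make the definition concrete, here is a small NumPy sketch that approximates a Jacobian with central differences; the example function and the helper name numerical_jacobian are my own, not from the lecture.

```python
import numpy as np

def f(x):
    # Example vector-valued function f: R^3 -> R^2
    return np.array([x[0] * x[1], x[1] + x[2] ** 2])

def numerical_jacobian(f, x, eps=1e-6):
    # J[i, j] = d f_i / d x_j, approximated by central differences
    y = f(x)
    J = np.zeros((y.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (f(x + step) - f(x - step)) / (2 * eps)
    return J

x = np.array([1.0, 2.0, 3.0])
print(numerical_jacobian(f, x))
# Analytically: [[x2, x1, 0], [0, 1, 2*x3]] = [[2, 1, 0], [0, 1, 6]]
```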
Jacobian of each layer
• Compute the derivatives of a higher layer’s
output with respect to those of the lower
layer
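Written out for the two-layer network above, backpropagation chains these per-layer Jacobians (a sketch in the figure's symbols, treating $\partial \mathcal{L}/\partial \boldsymbol{A}_2$ as a row vector, with $\mathcal{L}$ the CE loss):
$\dfrac{\partial \mathcal{L}}{\partial \boldsymbol{x}} = \dfrac{\partial \mathcal{L}}{\partial \boldsymbol{A}_2}\, \dfrac{\partial \boldsymbol{A}_2}{\partial \boldsymbol{Z}_2}\, \dfrac{\partial \boldsymbol{Z}_2}{\partial \boldsymbol{A}_1}\, \dfrac{\partial \boldsymbol{A}_1}{\partial \boldsymbol{Z}_1}\, \dfrac{\partial \boldsymbol{Z}_1}{\partial \boldsymbol{x}}$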
• A one-dimensional warm-up for second derivatives: $f(x) = ax^2 + bx + c, \quad f'(x) = 2ax + b, \quad f''(x) = 2a$
• And, $H(f(\boldsymbol{x})) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} \\[6pt] \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} \end{bmatrix} = \begin{bmatrix} 10 & 4 \\ 4 & 6 \end{bmatrix}$
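For instance (coefficients chosen by me to reproduce the matrix above), $f(\boldsymbol{x}) = 5x_1^2 + 4x_1x_2 + 3x_2^2$ has gradient $\nabla f = (10x_1 + 4x_2,\; 4x_1 + 6x_2)$, so its Hessian is the constant matrix $\begin{bmatrix} 10 & 4 \\ 4 & 6 \end{bmatrix}$.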
Saddle points, Hessian and long local furrows
[Figure: a loss surface with a saddle point, and a realistic picture with multiple local minima and local maxima]
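At a critical point, where $\nabla f = \boldsymbol{0}$, the standard second-order test classifies it by the eigenvalues of the Hessian $H$: all positive gives a local minimum, all negative a local maximum, and mixed signs a saddle point; eigenvalues near zero correspond to nearly flat directions, the long furrows the title refers to.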