Professional Documents
Culture Documents
Now, we have
n
X
fi = Aij xj , i = 1, · · · , m.
j=1
∂fi
= xq .
∂Aiq
f (B) = B T B =: K ∈ Rn×n
Then, we have
∂K
∈ R(n×n)×(m×n) .
∂B
Moreover
∂Kpq
∈ R1×(m×n) , forp, q = 1, · · · n
∂B
where Kpq is the (p, q)th entry of K = f (B)
Gradients of Matrices with respect to Matrices
Clearly, we have
∂pqij = Biq if j = p, p ̸= q
∂pqij = Bip if j = q, p ̸= q
∂pqij = 2Biq if j = p, p = q
∂pqij = 0 otherwise
where p, q, j = 1, · · · , n i = 1, · · · , m
Useful Identities for Computing Gradients
∂ T ∂f (X ) T
▶ ∂X f (X ) = ( ∂X )
∂ ∂f (X )
▶ ∂X tr (f (X )) = tr ( ∂X )
▶ ∂ −1 ∂f (X ) )
∂X det(f (X )) = det(f (X ))tr (f (X ) ∂X
∂ −1 −1 ∂f (X ) −1
▶ ∂X f (X ) = −f (X ) ∂X f (X )
Useful Identities for Computing Gradients
∂aT X −1 b
▶ ∂X = −(X −1 )T ab T (X −1 )T
▶ ∂x T a T
∂x = a
▶ ∂aT x T
∂x = a
T
∂a Xb
▶ T
∂X = ab
▶ ∂x T B T T
∂x = x (B + B )
∂ T
▶ ∂s (x − As) W (x − As) = −2(x − As)T WA
for symmetric W .
Backpropagation and Automatic Differentiation
Taking dervatives
df 2x + 2xexp(x 2 )
= p − sin(x 2 + exp(x 2 ))(2x + 2xexp(x 2 ))
dx 2 x 2 + exp(x 2 )
1
= 2x( p − sin(x 2 + exp(x 2 )))(1 + exp(x 2 ))
2 x + exp(x 2 )
2
Motivation
f0 := x
fi := σi (Ai−1 fi−1 + bi−1 ), i = 1, · · · , K .
∂L
If the partial derivatives
∂θi+1 are computed, then the computation
∂L
can be reused to compute ∂θ i
.
Example
Let
√
a = x 2 , b = exp(a), c = a + b, d = c, e = cos(c) ⇒ f = d + e
Example
∂a
⇒ = 2x
∂x
∂b
= exp(a)
∂a
∂c ∂c
= 1=
∂a ∂b
∂d 1
= √
∂c 2 c
∂e
= −sin(c)
∂c
∂f ∂f
= 1=
∂d ∂e
Example
Thus, we have
∂f ∂f ∂d ∂f ∂e
= +
∂c ∂d ∂c ∂e ∂c
∂f ∂f ∂c
=
∂b ∂c ∂b
∂f ∂f ∂b ∂f ∂c
= +
∂a ∂b ∂a ∂c ∂a
∂f ∂f ∂a
=
∂x ∂a ∂x
Example
xi = gi (xPa(xi ) )
Now,
∂f
f = xD ⇒ =1
∂D
For other variables, using chain rule, we get
∂f X ∂f ∂xj X ∂f ∂gi
= =
∂xi ∂xj ∂xi ∂xj ∂xi
xj :xi ∈Pa(xj ) xj :xi ∈Pa(xj )