Matrix Calculus
From too much study, and from extreme passion, cometh madnesse.
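The gradient of a differentiable real function $f(x) : \mathbb{R}^K \to \mathbb{R}$ with respect to its vector argument is

$$\nabla f(x) \triangleq \begin{bmatrix} \dfrac{\partial f(x)}{\partial x_1} \\[6pt] \dfrac{\partial f(x)}{\partial x_2} \\ \vdots \\ \dfrac{\partial f(x)}{\partial x_K} \end{bmatrix} \in \mathbb{R}^K \qquad (2053)$$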
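while the second-order gradient of the twice differentiable real function with respect to its vector argument is traditionally called the Hessian, the $K\times K$ matrix of second-order partial derivatives $\partial^2 f(x)/\partial x_i\,\partial x_j$, each interpreted as an iterated first-order derivative. For twice continuously differentiable $f$ the order of differentiation is immaterial, so the Hessian is symmetric:

$$\frac{\partial^2 f(x)}{\partial x_1\,\partial x_2} = \frac{\partial}{\partial x_2}\!\left(\frac{\partial f(x)}{\partial x_1}\right) = \frac{\partial}{\partial x_1}\!\left(\frac{\partial f(x)}{\partial x_2}\right) = \frac{\partial^2 f(x)}{\partial x_2\,\partial x_1} \qquad (2055)$$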
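The definitions (2053) and (2055) can be checked numerically. The following is a minimal NumPy sketch, not part of the original text: it approximates the gradient entrywise by central differences and builds each Hessian column by differencing that numerical gradient, so symmetry is not imposed by construction. The test function $f(x) = x_1^2x_2 + \sin(x_1x_3)$, the evaluation point, and the step sizes are arbitrary assumptions.

```python
import numpy as np

def num_grad(f, x, h=1e-5):
    """Central-difference estimate of the gradient (2053): entry k approximates df/dx_k."""
    g = np.zeros(x.size)
    for k in range(x.size):
        e = np.zeros(x.size)
        e[k] = h
        g[k] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def num_hessian(f, x, h=1e-3):
    """Column j differentiates the numerical gradient along e_j,
    so entry (i, j) approximates d/dx_j (df/dx_i)."""
    K = x.size
    H = np.zeros((K, K))
    for j in range(K):
        e = np.zeros(K)
        e[j] = h
        H[:, j] = (num_grad(f, x + e) - num_grad(f, x - e)) / (2 * h)
    return H

# Illustrative test function (an assumption): f(x) = x1^2 x2 + sin(x1 x3)
def f(x):
    return x[0] ** 2 * x[1] + np.sin(x[0] * x[2])

x = np.array([0.3, -1.2, 0.7])
s, c = np.sin(x[0] * x[2]), np.cos(x[0] * x[2])
grad_exact = np.array([2 * x[0] * x[1] + x[2] * c, x[0] ** 2, x[0] * c])
hess_exact = np.array([[2 * x[1] - x[2] ** 2 * s, 2 * x[0], c - x[0] * x[2] * s],
                       [2 * x[0], 0.0, 0.0],
                       [c - x[0] * x[2] * s, 0.0, -x[0] ** 2 * s]])

G = num_grad(f, x)
H = num_hessian(f, x)
print(np.allclose(G, grad_exact, atol=1e-6))
print(np.allclose(H, H.T, atol=1e-5), np.allclose(H, hess_exact, atol=1e-5))
```

Because each Hessian column is an independent difference of gradients, agreement between `H` and `H.T` is a genuine numerical check of (2055) rather than an artifact of the difference formula.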
Dattorro, Convex Optimization Euclidean Distance Geometry, Mεβoo, 2005, v2020.02.29. 599
600 APPENDIX D. MATRIX CALCULUS
where the gradient of each real entry is with respect to vector x as in (2053).
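The gradient of real function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}$ on matrix domain is

$$\nabla g(X) \triangleq \begin{bmatrix} \frac{\partial g(X)}{\partial X_{11}} & \frac{\partial g(X)}{\partial X_{12}} & \cdots & \frac{\partial g(X)}{\partial X_{1L}} \\[4pt] \frac{\partial g(X)}{\partial X_{21}} & \frac{\partial g(X)}{\partial X_{22}} & \cdots & \frac{\partial g(X)}{\partial X_{2L}} \\ \vdots & \vdots & & \vdots \\ \frac{\partial g(X)}{\partial X_{K1}} & \frac{\partial g(X)}{\partial X_{K2}} & \cdots & \frac{\partial g(X)}{\partial X_{KL}} \end{bmatrix} \in \mathbb{R}^{K\times L} \qquad (2060)$$

$$= \begin{bmatrix} \nabla_{X(:,1)}\, g(X) & & & \\ & \nabla_{X(:,2)}\, g(X) & & \\ & & \ddots & \\ & & & \nabla_{X(:,L)}\, g(X) \end{bmatrix} \in \mathbb{R}^{K\times 1\times L}$$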
where gradient $\nabla_{X(:,i)}$ is with respect to the $i$ th column of $X$. The strange appearance of (2060) in $\mathbb{R}^{K\times 1\times L}$ is meant to suggest a third dimension perpendicular to the page (not a diagonal matrix).
D.1 The word matrix comes from the Latin for womb ; related to the prefix matri- derived from mater
meaning mother.
D.1. GRADIENT, DIRECTIONAL DERIVATIVE, TAYLOR SERIES 601
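Because the gradient of the product (2068) requires total change with respect to change in each entry of matrix $X$, the $Xb$ vector must make an inner product with each vector in that second dimension of the cubix indicated by dotted line segments;

$$\nabla_X(X^{\mathrm T}a)\,Xb = \begin{bmatrix} \begin{bmatrix} a_1 \\ 0 \end{bmatrix} & \begin{bmatrix} 0 \\ a_1 \end{bmatrix} \\[8pt] \begin{bmatrix} a_2 \\ 0 \end{bmatrix} & \begin{bmatrix} 0 \\ a_2 \end{bmatrix} \end{bmatrix} \underbrace{\begin{bmatrix} b_1X_{11} + b_2X_{12} \\ b_1X_{21} + b_2X_{22} \end{bmatrix}}_{\in\,\mathbb{R}^{2\times1\times2}}$$

$$= \begin{bmatrix} a_1(b_1X_{11} + b_2X_{12}) & a_1(b_1X_{21} + b_2X_{22}) \\ a_2(b_1X_{11} + b_2X_{12}) & a_2(b_1X_{21} + b_2X_{22}) \end{bmatrix} \in \mathbb{R}^{2\times2} = ab^{\mathrm T}X^{\mathrm T} \qquad (2072)$$

where the cubix appears as a complete $2\times2\times2$ matrix. In like manner for the second term $\nabla_X(g)\,f$,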
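$$\nabla_X(Xb)\,X^{\mathrm T}a = \begin{bmatrix} \begin{bmatrix} b_1 \\ 0 \end{bmatrix} & \begin{bmatrix} b_2 \\ 0 \end{bmatrix} \\[8pt] \begin{bmatrix} 0 \\ b_1 \end{bmatrix} & \begin{bmatrix} 0 \\ b_2 \end{bmatrix} \end{bmatrix} \underbrace{\begin{bmatrix} X_{11}a_1 + X_{21}a_2 \\ X_{12}a_1 + X_{22}a_2 \end{bmatrix}}_{\in\,\mathbb{R}^{2\times1\times2}} = X^{\mathrm T}ab^{\mathrm T} \in \mathbb{R}^{2\times2} \qquad (2073)$$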
The solution

$$\nabla_X\, a^{\mathrm T}X^2 b = ab^{\mathrm T}X^{\mathrm T} + X^{\mathrm T}ab^{\mathrm T} \qquad (2074)$$

can be found from Table D.2.1 or verified using (2067).
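A numerical check of (2072)-(2074), assuming NumPy and a randomly drawn $2\times2$ example: the hypothetical helper `num_grad_matrix` fills entry $(k,l)$ with a central difference in $X_{kl}$, which is the entrywise definition (2060), and the result is compared with the sum of the two cubix products.

```python
import numpy as np

def num_grad_matrix(g, X, h=1e-6):
    """Hypothetical helper: entry (k, l) is a central difference of g in X_kl, as in (2060)."""
    G = np.zeros_like(X)
    for k in range(X.shape[0]):
        for l in range(X.shape[1]):
            E = np.zeros_like(X)
            E[k, l] = h
            G[k, l] = (g(X + E) - g(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(0)
a, b = rng.standard_normal(2), rng.standard_normal(2)
X = rng.standard_normal((2, 2))

g = lambda M: a @ M @ M @ b          # g(X) = a^T X^2 b

first = np.outer(a, b) @ X.T         # a b^T X^T, the cubix product (2072)
second = X.T @ np.outer(a, b)        # X^T a b^T, the cubix product (2073)

# Entry (k, l) of (2072) written out: a_k times the l th entry of Xb
Xb = X @ b
explicit = np.array([[a[0] * Xb[0], a[0] * Xb[1]],
                     [a[1] * Xb[0], a[1] * Xb[1]]])

numeric = num_grad_matrix(g, X)
print(np.allclose(numeric, first + second, atol=1e-6), np.allclose(explicit, first))
```

Since $a^{\mathrm T}X^2b$ is quadratic in $X$, the central differences are exact apart from rounding, so any disagreement would point to an error in the closed form rather than in the step size.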
$$\nabla_X\, g\big(f(X)^{\mathrm T},\, h(X)^{\mathrm T}\big) = \nabla_X f^{\mathrm T}\, \nabla_f\, g + \nabla_X h^{\mathrm T}\, \nabla_h\, g \qquad (2086)$$
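where $e_k$ is the $k$ th standard basis vector in $\mathbb{R}^K$ while $e_l$ is the $l$ th standard basis vector in $\mathbb{R}^L$. The total number of partial derivatives equals $KLMN$, while the gradient is defined in their terms; the $mn$ th entry of the gradient is

$$\nabla g_{mn}(X) = \begin{bmatrix} \frac{\partial g_{mn}(X)}{\partial X_{11}} & \frac{\partial g_{mn}(X)}{\partial X_{12}} & \cdots & \frac{\partial g_{mn}(X)}{\partial X_{1L}} \\[4pt] \frac{\partial g_{mn}(X)}{\partial X_{21}} & \frac{\partial g_{mn}(X)}{\partial X_{22}} & \cdots & \frac{\partial g_{mn}(X)}{\partial X_{2L}} \\ \vdots & \vdots & & \vdots \\ \frac{\partial g_{mn}(X)}{\partial X_{K1}} & \frac{\partial g_{mn}(X)}{\partial X_{K2}} & \cdots & \frac{\partial g_{mn}(X)}{\partial X_{KL}} \end{bmatrix} \in \mathbb{R}^{K\times L} \qquad (2093)$$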
Each entry of (2093), multiplied as $\frac{\partial g_{mn}(X)}{\partial X_{kl}}\,Y_{kl}$, may be interpreted as the change in $g_{mn}$ at $X$ when the change in $X_{kl}$ is equal to $Y_{kl}$, the $kl$ th entry of any $Y \in \mathbb{R}^{K\times L}$. Because the total change in $g_{mn}(X)$ due to $Y$ is the sum of change with respect to each and every $X_{kl}$, the $mn$ th entry of the directional derivative is the corresponding total differential [462, §15.8]
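$$dg_{mn}(X)\big|_{dX\to Y} = \sum_{k,l} \frac{\partial g_{mn}(X)}{\partial X_{kl}}\,Y_{kl} = \operatorname{tr}\big(\nabla g_{mn}(X)^{\mathrm T}\,Y\big) \qquad (2097)$$

$$= \sum_{k,l}\, \lim_{\Delta t\to0} \frac{g_{mn}(X + \Delta t\,Y_{kl}\,e_k e_l^{\mathrm T}) - g_{mn}(X)}{\Delta t} \qquad (2098)$$

$$= \lim_{\Delta t\to0} \frac{g_{mn}(X + \Delta t\,Y) - g_{mn}(X)}{\Delta t} \qquad (2099)$$

$$= \frac{d}{dt}\bigg|_{t=0} g_{mn}(X + t\,Y) \qquad (2100)$$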
where $t \in \mathbb{R}$. Assuming finite $Y$, equation (2099) is called the Gâteaux differential [50, App.A.5] [265, §D.2.1] [474, §5.28], whose existence is implied by existence of the Fréchet differential (the sum in (2097)). [337, §7.2] Each may be understood as the change in $g_{mn}$ at $X$ when the change in $X$ is equal in magnitude and direction to $Y$ (see footnote D.2). Hence the directional derivative,

$$\overset{\rightarrow Y}{dg}(X) \triangleq \begin{bmatrix} dg_{11}(X) & dg_{12}(X) & \cdots & dg_{1N}(X) \\ dg_{21}(X) & dg_{22}(X) & \cdots & dg_{2N}(X) \\ \vdots & \vdots & & \vdots \\ dg_{M1}(X) & dg_{M2}(X) & \cdots & dg_{MN}(X) \end{bmatrix}\Bigg|_{dX\to Y} \in \mathbb{R}^{M\times N}$$
Yet for all $X \in \operatorname{dom} g$, any $Y \in \mathbb{R}^{K\times L}$, and some open interval of $t \in \mathbb{R}$

$$g(X + t\,Y) = g(X) + t\,\overset{\rightarrow Y}{dg}(X) + O(t^2) \qquad (2103)$$
which is the first-order multidimensional Taylor series expansion about $X$. [462, §18.4] [203, §2.3.4] Differentiating with respect to $t$ and then setting $t = 0$ isolates the second term of the expansion. Thus differentiating and zeroing $g(X + t\,Y)$ in $t$ is an operation equivalent to individually differentiating and zeroing every entry $g_{mn}(X + t\,Y)$ as in (2100). So the directional derivative of $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}^{M\times N}$ in any direction $Y \in \mathbb{R}^{K\times L}$ evaluated at $X \in \operatorname{dom} g$ becomes
$$\overset{\rightarrow Y}{dg}(X) = \frac{d}{dt}\bigg|_{t=0} g(X + t\,Y) \in \mathbb{R}^{M\times N} \qquad (2104)$$
D.2 Although $Y$ is a matrix, we may regard it as a vector in $\mathbb{R}^{KL}$.
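The agreement of the Fréchet sum (2097) with the Gâteaux quotient (2099)-(2100) can be illustrated numerically. This sketch assumes NumPy and picks the matrix-valued function $g(X) = X^{-1}$ purely for illustration; each limit is replaced by a symmetric difference quotient, which has the same limit. The closed form $-X^{-1}YX^{-1}$ used for comparison is the standard first derivative of the inverse.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 3
X = 3 * np.eye(K) + 0.5 * rng.standard_normal((K, K))   # kept well conditioned
Y = rng.standard_normal((K, K))                          # an arbitrary direction
g = np.linalg.inv        # illustrative g(X) = X^{-1}, mapping R^{KxK} to R^{KxK}
h = 1e-6

# Frechet form (2097): sum over k, l of (dg/dX_kl) Y_kl, each partial by central differences
frechet = np.zeros((K, K))
for k in range(K):
    for l in range(K):
        E = np.zeros((K, K))
        E[k, l] = h
        dg_dXkl = (g(X + E) - g(X - E)) / (2 * h)   # partials of every entry g_mn
        frechet += dg_dXkl * Y[k, l]

# Gateaux form (2099)-(2100): differentiate g(X + tY) in t at t = 0
gateaux = (g(X + h * Y) - g(X - h * Y)) / (2 * h)

Xi = np.linalg.inv(X)
exact = -Xi @ Y @ Xi
print(np.allclose(frechet, gateaux, atol=1e-6), np.allclose(gateaux, exact, atol=1e-6))
```

The Fréchet route costs $KL$ perturbations of $X$, while the Gâteaux route needs only one perturbation along $Y$; that economy is why (2104) is the form used in practice.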
[Figure: graph of $f(x)$ near the point $(\alpha, f(\alpha))$, with $f(\alpha + t\,y)$, the gradient $\nabla_x f(\alpha)$, a vector $\upsilon$, and $\partial\mathcal{H}$.]
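[371, §2.1, §5.4.5] [43, §6.3.1] which is simplest. In the case of a real function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}$,

$$\overset{\rightarrow Y}{dg}(X) = \operatorname{tr}\big(\nabla g(X)^{\mathrm T}\,Y\big) \qquad (2126)$$

In the case $g(X) : \mathbb{R}^K \to \mathbb{R}$,

$$\overset{\rightarrow Y}{dg}(X) = \nabla g(X)^{\mathrm T}\,Y \qquad (2129)$$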
Unlike the gradient, the directional derivative does not expand dimension; directional derivative (2104) retains the dimensions of $g$. The derivative with respect to $t$ makes the directional derivative resemble ordinary calculus (§D.2); e.g., when $g(X)$ is linear, $\overset{\rightarrow Y}{dg}(X) = g(Y)$. [337, §7.2]
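For a real function, (2126) reduces the directional derivative to a single trace. A minimal sketch, assuming NumPy, the example $g(X) = \log\det X$ on a positive definite $X$ (its gradient $X^{-\mathrm T}$ appears in the tables of §D.2), and a hypothetical linear function $\operatorname{tr}(AX)$ to illustrate that a linear $g$ has directional derivative $g(Y)$:

```python
import numpy as np

rng = np.random.default_rng(2)
K = 3
C = rng.standard_normal((K, K))
X = C @ C.T + K * np.eye(K)      # symmetric positive definite, so log det X is defined
Y = rng.standard_normal((K, K))  # an arbitrary direction
h = 1e-6

def logdet(M):
    _, value = np.linalg.slogdet(M)   # log of |det M|; det M > 0 for the matrices used here
    return value

# (2126) with g = log det, whose gradient is X^{-T}
via_gradient = np.trace(np.linalg.inv(X) @ Y)                 # tr((X^{-T})^T Y)
via_t = (logdet(X + h * Y) - logdet(X - h * Y)) / (2 * h)     # d/dt at t = 0, as in (2104)

# A linear g(X) = tr(A X): its directional derivative equals g(Y)
A = rng.standard_normal((K, K))
lin = lambda M: np.trace(A @ M)
lin_dir = (lin(X + h * Y) - lin(X - h * Y)) / (2 * h)
print(np.isclose(via_gradient, via_t, atol=1e-6), np.isclose(lin_dir, lin(Y), atol=1e-6))
```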
$$\overset{\rightarrow X - X^\star}{df}(X) \ge 0 \qquad (2105)$$
⋄
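$$\nabla^2 g(X) \overset{T_1}{=} \begin{bmatrix} \nabla\dfrac{\partial g(X)}{\partial X_{11}} & \nabla\dfrac{\partial g(X)}{\partial X_{12}} & \cdots & \nabla\dfrac{\partial g(X)}{\partial X_{1L}} \\[8pt] \nabla\dfrac{\partial g(X)}{\partial X_{21}} & \nabla\dfrac{\partial g(X)}{\partial X_{22}} & \cdots & \nabla\dfrac{\partial g(X)}{\partial X_{2L}} \\ \vdots & \vdots & & \vdots \\ \nabla\dfrac{\partial g(X)}{\partial X_{K1}} & \nabla\dfrac{\partial g(X)}{\partial X_{K2}} & \cdots & \nabla\dfrac{\partial g(X)}{\partial X_{KL}} \end{bmatrix} \in \mathbb{R}^{K\times L\times M\times N\times K\times L} \qquad (2114)$$

$$\nabla^2 g(X) \overset{T_2}{=} \begin{bmatrix} \dfrac{\partial\nabla g(X)}{\partial X_{11}} & \dfrac{\partial\nabla g(X)}{\partial X_{12}} & \cdots & \dfrac{\partial\nabla g(X)}{\partial X_{1L}} \\[8pt] \dfrac{\partial\nabla g(X)}{\partial X_{21}} & \dfrac{\partial\nabla g(X)}{\partial X_{22}} & \cdots & \dfrac{\partial\nabla g(X)}{\partial X_{2L}} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial\nabla g(X)}{\partial X_{K1}} & \dfrac{\partial\nabla g(X)}{\partial X_{K2}} & \cdots & \dfrac{\partial\nabla g(X)}{\partial X_{KL}} \end{bmatrix} \in \mathbb{R}^{K\times L\times K\times L\times M\times N} \qquad (2115)$$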
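Assuming the limits exist, we may state the partial derivative of the $mn$ th entry of $g$ with respect to the $kl$ th and $ij$ th entries of $X$;

$$\frac{\partial^2 g_{mn}(X)}{\partial X_{kl}\,\partial X_{ij}} = \frac{\partial}{\partial X_{ij}}\!\left(\frac{\partial g_{mn}(X)}{\partial X_{kl}}\right) = \lim_{\Delta t\to0} \frac{\dfrac{\partial g_{mn}(X + \Delta t\,e_k e_l^{\mathrm T})}{\partial X_{ij}} - \dfrac{\partial g_{mn}(X)}{\partial X_{ij}}}{\Delta t}$$

$$= \lim_{\Delta\tau,\,\Delta t\to0} \frac{\big(g_{mn}(X + \Delta t\,e_k e_l^{\mathrm T} + \Delta\tau\,e_i e_j^{\mathrm T}) - g_{mn}(X + \Delta t\,e_k e_l^{\mathrm T})\big) - \big(g_{mn}(X + \Delta\tau\,e_i e_j^{\mathrm T}) - g_{mn}(X)\big)}{\Delta\tau\,\Delta t} \qquad (2116)$$

Scaling each perturbation by the corresponding entry of $Y$ gives

$$\frac{\partial^2 g_{mn}(X)}{\partial X_{kl}\,\partial X_{ij}}\,Y_{kl}Y_{ij} = \lim_{\Delta\tau,\,\Delta t\to0} \frac{\big(g_{mn}(X + \Delta t\,Y_{kl}e_k e_l^{\mathrm T} + \Delta\tau\,Y_{ij}e_i e_j^{\mathrm T}) - g_{mn}(X + \Delta t\,Y_{kl}e_k e_l^{\mathrm T})\big) - \big(g_{mn}(X + \Delta\tau\,Y_{ij}e_i e_j^{\mathrm T}) - g_{mn}(X)\big)}{\Delta\tau\,\Delta t}$$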
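$$d^2 g_{mn}(X)\big|_{dX\to Y} = \sum_{i,j}\sum_{k,l} \frac{\partial^2 g_{mn}(X)}{\partial X_{kl}\,\partial X_{ij}}\,Y_{kl}\,Y_{ij} = \operatorname{tr}\Big(\nabla_X \operatorname{tr}\big(\nabla g_{mn}(X)^{\mathrm T}\,Y\big)^{\mathrm T}\,Y\Big) \qquad (2118)$$

$$= \sum_{i,j}\, \lim_{\Delta t\to0} \frac{\dfrac{\partial g_{mn}(X + \Delta t\,Y)}{\partial X_{ij}} - \dfrac{\partial g_{mn}(X)}{\partial X_{ij}}}{\Delta t}\,Y_{ij} \qquad (2119)$$

$$= \lim_{\Delta t\to0} \frac{g_{mn}(X + 2\Delta t\,Y) - 2g_{mn}(X + \Delta t\,Y) + g_{mn}(X)}{\Delta t^2} \qquad (2120)$$

$$= \frac{d^2}{dt^2}\bigg|_{t=0} g_{mn}(X + t\,Y) \qquad (2121)$$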
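Hence the second directional derivative,

$$\overset{\rightarrow Y}{dg^2}(X) \triangleq \begin{bmatrix} d^2 g_{11}(X) & d^2 g_{12}(X) & \cdots & d^2 g_{1N}(X) \\ d^2 g_{21}(X) & d^2 g_{22}(X) & \cdots & d^2 g_{2N}(X) \\ \vdots & \vdots & & \vdots \\ d^2 g_{M1}(X) & d^2 g_{M2}(X) & \cdots & d^2 g_{MN}(X) \end{bmatrix}\Bigg|_{dX\to Y} \in \mathbb{R}^{M\times N}$$

$$= \begin{bmatrix} \operatorname{tr}\Big(\nabla\operatorname{tr}\big(\nabla g_{11}(X)^{\mathrm T}Y\big)^{\mathrm T}Y\Big) & \operatorname{tr}\Big(\nabla\operatorname{tr}\big(\nabla g_{12}(X)^{\mathrm T}Y\big)^{\mathrm T}Y\Big) & \cdots & \operatorname{tr}\Big(\nabla\operatorname{tr}\big(\nabla g_{1N}(X)^{\mathrm T}Y\big)^{\mathrm T}Y\Big) \\ \operatorname{tr}\Big(\nabla\operatorname{tr}\big(\nabla g_{21}(X)^{\mathrm T}Y\big)^{\mathrm T}Y\Big) & \operatorname{tr}\Big(\nabla\operatorname{tr}\big(\nabla g_{22}(X)^{\mathrm T}Y\big)^{\mathrm T}Y\Big) & \cdots & \operatorname{tr}\Big(\nabla\operatorname{tr}\big(\nabla g_{2N}(X)^{\mathrm T}Y\big)^{\mathrm T}Y\Big) \\ \vdots & \vdots & & \vdots \\ \operatorname{tr}\Big(\nabla\operatorname{tr}\big(\nabla g_{M1}(X)^{\mathrm T}Y\big)^{\mathrm T}Y\Big) & \operatorname{tr}\Big(\nabla\operatorname{tr}\big(\nabla g_{M2}(X)^{\mathrm T}Y\big)^{\mathrm T}Y\Big) & \cdots & \operatorname{tr}\Big(\nabla\operatorname{tr}\big(\nabla g_{MN}(X)^{\mathrm T}Y\big)^{\mathrm T}Y\Big) \end{bmatrix}$$

$$= \begin{bmatrix} \sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{11}(X)}{\partial X_{kl}\,\partial X_{ij}}Y_{kl}Y_{ij} & \sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{12}(X)}{\partial X_{kl}\,\partial X_{ij}}Y_{kl}Y_{ij} & \cdots & \sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{1N}(X)}{\partial X_{kl}\,\partial X_{ij}}Y_{kl}Y_{ij} \\ \sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{21}(X)}{\partial X_{kl}\,\partial X_{ij}}Y_{kl}Y_{ij} & \sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{22}(X)}{\partial X_{kl}\,\partial X_{ij}}Y_{kl}Y_{ij} & \cdots & \sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{2N}(X)}{\partial X_{kl}\,\partial X_{ij}}Y_{kl}Y_{ij} \\ \vdots & \vdots & & \vdots \\ \sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{M1}(X)}{\partial X_{kl}\,\partial X_{ij}}Y_{kl}Y_{ij} & \sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{M2}(X)}{\partial X_{kl}\,\partial X_{ij}}Y_{kl}Y_{ij} & \cdots & \sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{MN}(X)}{\partial X_{kl}\,\partial X_{ij}}Y_{kl}Y_{ij} \end{bmatrix} \qquad (2122)$$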
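The limit (2120) is a forward second difference applied to every entry $g_{mn}$ at once, so it can be evaluated for shrinking steps and compared with a known second derivative. This sketch assumes NumPy and the illustrative choice $g(X) = X^{-1}$, whose second directional derivative $2X^{-1}YX^{-1}YX^{-1}$ is stated as (2135); the step sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
K = 3
X = 4 * np.eye(K) + 0.3 * rng.standard_normal((K, K))   # well conditioned
Y = 0.2 * rng.standard_normal((K, K))
g = np.linalg.inv          # illustrative g(X) = X^{-1}, so M x N = K x K

Xi = np.linalg.inv(X)
exact = 2 * Xi @ Y @ Xi @ Y @ Xi     # d^2/dt^2 of (X + tY)^{-1} at t = 0

def second_difference(dt):
    """The quotient inside the limit (2120), applied to every entry g_mn at once."""
    return (g(X + 2 * dt * Y) - 2 * g(X + dt * Y) + g(X)) / dt ** 2

errors = []
for dt in (1e-2, 1e-3, 1e-4):
    errors.append(np.max(np.abs(second_difference(dt) - exact)))
print(errors)   # shrinks roughly in proportion to dt, as expected of a forward difference
```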
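Yet for all $X \in \operatorname{dom} g$, any $Y \in \mathbb{R}^{K\times L}$, and some open interval of $t \in \mathbb{R}$

$$g(X + t\,Y) = g(X) + t\,\overset{\rightarrow Y}{dg}(X) + \frac{1}{2!}\,t^2\,\overset{\rightarrow Y}{dg^2}(X) + O(t^3) \qquad (2124)$$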
which is the second-order multidimensional Taylor series expansion about $X$. [462, §18.4] [203, §2.3.4] Differentiating twice with respect to $t$ and then setting $t = 0$ isolates the third term of the expansion. Thus differentiating and zeroing $g(X + t\,Y)$ in $t$ is an operation equivalent to individually differentiating and zeroing every entry $g_{mn}(X + t\,Y)$ as in (2121). So the second directional derivative of $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}^{M\times N}$ becomes [371, §2.1, §5.4.5] [43, §6.3.1]
$$\overset{\rightarrow Y}{dg^2}(X) = \frac{d^2}{dt^2}\bigg|_{t=0} g(X + t\,Y) \in \mathbb{R}^{M\times N} \qquad (2125)$$

which is again simplest (confer (2104)). Like the first, the second directional derivative retains the dimensions of $g$.
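One way to see the $O(t^3)$ remainder in (2124) is to halve $t$ and watch the residual shrink by about $2^3 = 8$. The sketch assumes NumPy and again uses $g(X) = X^{-1}$ as an illustrative function, with first and second directional derivatives $-X^{-1}YX^{-1}$ and $2X^{-1}YX^{-1}YX^{-1}$; the value of $t$ is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(4)
K = 3
X = 4 * np.eye(K) + 0.3 * rng.standard_normal((K, K))
Y = 0.5 * rng.standard_normal((K, K))
Xi = np.linalg.inv(X)

d1 = -Xi @ Y @ Xi                # first directional derivative of X^{-1}
d2 = 2 * Xi @ Y @ Xi @ Y @ Xi    # second directional derivative, (2135)

def residual(t):
    """Frobenius norm of what the second-order expansion (2124) leaves over."""
    taylor2 = Xi + t * d1 + 0.5 * t ** 2 * d2
    return np.linalg.norm(np.linalg.inv(X + t * Y) - taylor2)

t = 1e-2
ratio = residual(t) / residual(t / 2)
print(ratio)   # near 2**3 = 8 when the t^3 term dominates the remainder
```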
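$$\overset{\rightarrow Y}{dg^2}(X) = \operatorname{tr}\Big(\nabla_X \operatorname{tr}\big(\nabla g(X)^{\mathrm T}Y\big)^{\mathrm T}\,Y\Big) = \operatorname{tr}\Big(\nabla_X\, \overset{\rightarrow Y}{dg}(X)^{\mathrm T}\,Y\Big) \qquad (2127)$$

$$\overset{\rightarrow Y}{dg^3}(X) = \operatorname{tr}\bigg(\nabla_X \operatorname{tr}\Big(\nabla_X \operatorname{tr}\big(\nabla g(X)^{\mathrm T}Y\big)^{\mathrm T}\,Y\Big)^{\mathrm T}\,Y\bigg) = \operatorname{tr}\Big(\nabla_X\, \overset{\rightarrow Y}{dg^2}(X)^{\mathrm T}\,Y\Big) \qquad (2128)$$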
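$$\overset{\rightarrow Y}{dg^2}(X) = Y^{\mathrm T}\,\nabla^2 g(X)\,Y \qquad (2130)$$

$$\overset{\rightarrow Y}{dg^3}(X) = \nabla_X\big(Y^{\mathrm T}\,\nabla^2 g(X)\,Y\big)^{\mathrm T}\,Y \qquad (2131)$$

and so on.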
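For a function of a vector, (2130) and (2131) can be compared with derivatives of $t \mapsto g(x + t\,y)$. This NumPy sketch assumes the illustrative choice $g(x) = \sum_i x_i^4$, whose Hessian is the diagonal matrix with entries $12x_i^2$; (2131) then gives $\sum_i 24x_iy_i^3$. Central finite differences along $y$ supply the comparison values, with step sizes chosen by assumption.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(4)
y = rng.standard_normal(4)

g = lambda v: np.sum(v ** 4)        # illustrative g: R^4 -> R
hess = np.diag(12 * x ** 2)         # its Hessian at x

d2_formula = y @ hess @ y           # (2130): Y^T grad^2 g(X) Y
# (2131): the gradient in x of y^T grad^2 g(x) y is 24 x * y^2; take its inner product with y
d3_formula = (24 * x * y ** 2) @ y

phi = lambda t: g(x + t * y)        # g restricted to the line through x along y
h2, h3 = 1e-4, 1e-2
d2_numeric = (phi(h2) - 2 * phi(0) + phi(-h2)) / h2 ** 2
d3_numeric = (phi(2 * h3) - 2 * phi(h3) + 2 * phi(-h3) - phi(-2 * h3)) / (2 * h3 ** 3)
print(np.isclose(d2_formula, d2_numeric, atol=1e-4), np.isclose(d3_formula, d3_numeric, atol=1e-4))
```

Because $g$ is a quartic, the central third difference is exact apart from rounding, which is why a fairly large step can be used for it.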
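$$g(X + \mu Y) = g(X) + \mu\,\overset{\rightarrow Y}{dg}(X) + \frac{1}{2!}\,\mu^2\,\overset{\rightarrow Y}{dg^2}(X) + \frac{1}{3!}\,\mu^3\,\overset{\rightarrow Y}{dg^3}(X) + O(\mu^4) \qquad (2132)$$

or on some open interval of $\|Y - X\|_2$

$$g(Y) = g(X) + \overset{\rightarrow Y-X}{dg}(X) + \frac{1}{2!}\,\overset{\rightarrow Y-X}{dg^2}(X) + \frac{1}{3!}\,\overset{\rightarrow Y-X}{dg^3}(X) + O(\|Y - X\|^4) \qquad (2133)$$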
which are third-order expansions about $X$. The mean value theorem from calculus is what ensures finite order of the series. [462] [51, §1.1] [50, App.A.5] [265, §0.4] These somewhat unbelievable formulae (see footnote D.3) imply that a function can be determined over the whole of its domain by knowing its value and all its directional derivatives at a single point $X$.
$$\overset{\rightarrow Y}{dg^2}(X) = \frac{d^2}{dt^2}\bigg|_{t=0} g(X + t\,Y) = 2X^{-1}YX^{-1}YX^{-1} \qquad (2135)$$
D.3 For example, the real continuous and differentiable function of a real variable $f(x) = e^{-1/x^2}$ (extended by $f(0) = 0$) has no Taylor series expansion about $x = 0$ of any practical use, because each derivative equals 0 there.
$$\overset{\rightarrow Y}{dg^3}(X) = \frac{d^3}{dt^3}\bigg|_{t=0} g(X + t\,Y) = -6X^{-1}YX^{-1}YX^{-1}YX^{-1} \qquad (2136)$$
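Let’s find the Taylor series expansion of $g$ (here $g(X) = X^{-1}$) about $X = I$: Since $g(I) = I$, and the first three directional derivatives at $I$ are $-Y$, $2Y^2$, and $-6Y^3$, for $\|Y\|_2 < 1$ ($\mu = 1$ in (2132))

$$g(I + Y) = (I + Y)^{-1} = I - Y + Y^2 - Y^3 + \cdots$$

If $Y$ is small, $(X + Y)^{-1} \approx X^{-1} - X^{-1}YX^{-1}$.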
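The series about $X = I$ and the first-order approximation of the inverse can be checked with a short NumPy sketch. The expansion of $(I+Y)^{-1}$ about $I$ is $I - Y + Y^2 - Y^3 + \cdots$ when $\|Y\|_2 < 1$; the random $Y$ below is rescaled to spectral norm $0.5$ by assumption, and the 40-term truncation is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(6)
K = 4
Y = rng.standard_normal((K, K))
Y *= 0.5 / np.linalg.norm(Y, 2)     # rescale so that ||Y||_2 = 0.5 < 1 (an assumption)
I = np.eye(K)

# Partial sums of I - Y + Y^2 - Y^3 + ..., the expansion of (I + Y)^{-1} about X = I
partial = np.zeros((K, K))
term = I.copy()
for _ in range(40):
    partial += term
    term = -term @ Y
print(np.allclose(partial, np.linalg.inv(I + Y)))

# First-order approximation about a general X: (X + Y)^{-1} ~ X^{-1} - X^{-1} Y X^{-1}
X = 3 * np.eye(K) + 0.3 * rng.standard_normal((K, K))
small = 1e-3 * Y
Xi = np.linalg.inv(X)
err = np.linalg.norm(np.linalg.inv(X + small) - (Xi - Xi @ small @ Xi))
print(err)   # second order in the size of the perturbation
```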
D.1.8.1 first-order
$$\operatorname{tr}\big(\nabla_X g_{mn}(X + t\,Y)^{\mathrm T}\,Y\big) = \frac{d}{dt}\, g_{mn}(X + t\,Y) \qquad (2140)$$
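which is valid at $t = 0$, of course, when $X \in \operatorname{dom} g$. In the important case of a real function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}$, from (2126) we have simply

$$\operatorname{tr}\big(\nabla_X g(X + t\,Y)^{\mathrm T}\,Y\big) = \frac{d}{dt}\, g(X + t\,Y) \qquad (2141)$$

When, additionally, $g(X) : \mathbb{R}^K \to \mathbb{R}$ has vector argument, from (2129)

$$\nabla_X g(X + t\,Y)^{\mathrm T}\,Y = \frac{d}{dt}\, g(X + t\,Y) \qquad (2142)$$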
D.4 Had we instead set $g(Y) = (I + Y)^{-1}$, then the equivalent expansion would have been about $X = 0$.

D.5 Justified by replacing $X$ with $X + t\,Y$ in (2097)-(2099); beginning,

$$dg_{mn}(X + t\,Y)\big|_{dX\to Y} = \sum_{k,l} \frac{\partial g_{mn}(X + t\,Y)}{\partial X_{kl}}\,Y_{kl}$$
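For $g(X) = w^{\mathrm T}X^{\mathrm T}Xw$,

$$\operatorname{tr}\big(\nabla_X g(X + t\,Y)^{\mathrm T}\,Y\big) = \operatorname{tr}\big(2ww^{\mathrm T}(X^{\mathrm T} + t\,Y^{\mathrm T})Y\big) \qquad (2143)$$

$$= 2w^{\mathrm T}(X^{\mathrm T}Y + t\,Y^{\mathrm T}Y)w \qquad (2144)$$

$$\frac{d}{dt}\, g(X + t\,Y) = \frac{d}{dt}\, w^{\mathrm T}(X + t\,Y)^{\mathrm T}(X + t\,Y)w \qquad (2145)$$

$$= w^{\mathrm T}\big(X^{\mathrm T}Y + Y^{\mathrm T}X + 2t\,Y^{\mathrm T}Y\big)w \qquad (2146)$$

$$= 2w^{\mathrm T}(X^{\mathrm T}Y + t\,Y^{\mathrm T}Y)w \qquad (2147)$$

$$\operatorname{tr}\big(\nabla_X g(X + t\,Y)^{\mathrm T}\,Y\big) = 2w^{\mathrm T}(X^{\mathrm T}Y + t\,Y^{\mathrm T}Y)w = 2\operatorname{tr}\big(ww^{\mathrm T}(X^{\mathrm T} + t\,Y^{\mathrm T})Y\big)$$

$$\operatorname{tr}\big(\nabla_X g(X)^{\mathrm T}\,Y\big) = 2\operatorname{tr}\big(ww^{\mathrm T}X^{\mathrm T}Y\big) \qquad (2148)$$

$$\Leftrightarrow \qquad \nabla_X g(X) = 2Xww^{\mathrm T}$$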
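The example (2143)-(2148) can be replayed numerically at a nonzero $t$, which is what (2141) asserts beyond $t = 0$. A NumPy sketch, assuming random $X, Y \in \mathbb{R}^{3\times2}$ and $w \in \mathbb{R}^2$ (sizes and the value $t = 0.7$ chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(7)
K, L = 3, 2
X = rng.standard_normal((K, L))
Y = rng.standard_normal((K, L))
w = rng.standard_normal(L)

g = lambda M: w @ M.T @ M @ w            # g(X) = w^T X^T X w
grad = lambda M: 2 * M @ np.outer(w, w)  # (2148): gradient 2 X w w^T

t, h = 0.7, 1e-6                          # t need not be 0 in (2141)
lhs = np.trace(grad(X + t * Y).T @ Y)                        # left side of (2141)
rhs = (g(X + (t + h) * Y) - g(X + (t - h) * Y)) / (2 * h)    # d/dt g(X + tY)
closed = 2 * w @ (X.T @ Y + t * Y.T @ Y) @ w                 # (2144) and (2147)
print(np.isclose(lhs, rhs, atol=1e-6), np.isclose(lhs, closed))
```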
D.1.8.2 second-order
Likewise removing the evaluation at $t = 0$ from (2125),

$$\overset{\rightarrow Y}{dg^2}(X + t\,Y) = \frac{d^2}{dt^2}\, g(X + t\,Y) \qquad (2149)$$
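we can find a similar relationship between second-order gradient and second derivative: In the general case $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}^{M\times N}$, from (2118) and (2121),

$$\operatorname{tr}\Big(\nabla_X \operatorname{tr}\big(\nabla_X g_{mn}(X + t\,Y)^{\mathrm T}\,Y\big)^{\mathrm T}\,Y\Big) = \frac{d^2}{dt^2}\, g_{mn}(X + t\,Y) \qquad (2150)$$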
From (2130), in the simpler case where real function $g(X) : \mathbb{R}^K \to \mathbb{R}$ has vector argument,

$$Y^{\mathrm T}\,\nabla_X^2 g(X + t\,Y)\,Y = \frac{d^2}{dt^2}\, g(X + t\,Y) \qquad (2152)$$
$$\nabla^2 g(X)_{kl} = \nabla h(X)_{kl} = -\big(X^{-1}e_k e_l^{\mathrm T}X^{-1}\big)^{\mathrm T} \in \mathbb{R}^{K\times K} \qquad (2159)$$
From all these first- and second-order expressions, we may generate new ones by evaluating
both sides at arbitrary t (in some open interval) but only after differentiation.
D.2.1 algebraic
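$$\nabla_x (Ax - b) = A^{\mathrm T}$$

$$\nabla_x \big(x^{\mathrm T}A - b^{\mathrm T}\big) = A$$

$$\nabla_x \big(x^{\mathrm T}Ax + 2x^{\mathrm T}By + y^{\mathrm T}Cy\big) = \big(A + A^{\mathrm T}\big)x + 2By$$

$$\nabla_x\, (x + y)^{\mathrm T}A(x + y) = \big(A + A^{\mathrm T}\big)(x + y)$$

$$\nabla_x^2 \big(x^{\mathrm T}Ax + 2x^{\mathrm T}By + y^{\mathrm T}Cy\big) = A + A^{\mathrm T}$$

$$\nabla_X\, a^{\mathrm T}X^{-1}b = -X^{-\mathrm T}ab^{\mathrm T}X^{-\mathrm T}$$

$$\nabla_X (X^{-1})_{kl} = \frac{\partial X^{-1}}{\partial X_{kl}} = -X^{-1}e_k e_l^{\mathrm T}X^{-1}, \qquad \text{confer (2095), (2159)}$$

$$\nabla_x\, a^{\mathrm T}x^{\mathrm T}xb = 2x\,a^{\mathrm T}b \qquad\qquad \nabla_X\, a^{\mathrm T}X^{\mathrm T}Xb = X\big(ab^{\mathrm T} + ba^{\mathrm T}\big)$$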
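Entries of the algebraic table can be spot-checked by central differences. The sketch assumes NumPy; the helper `num_grad_matrix` is hypothetical (not a library function) and applies the entrywise definition (2060) to vector or matrix arguments alike. Only three rows of the table are checked here, at random test points.

```python
import numpy as np

def num_grad_matrix(g, X, h=1e-6):
    """Hypothetical helper: central difference of g in each entry of X (vector or matrix)."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (g(X + E) - g(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(8)
K = 3
a, b, x = (rng.standard_normal(K) for _ in range(3))
X = 3 * np.eye(K) + 0.5 * rng.standard_normal((K, K))
Xi = np.linalg.inv(X)

checks = {
    "grad a^T X^-1 b": (lambda M: a @ np.linalg.inv(M) @ b, X,
                        -Xi.T @ np.outer(a, b) @ Xi.T),
    "grad a^T X^T X b": (lambda M: a @ M.T @ M @ b, X,
                         X @ (np.outer(a, b) + np.outer(b, a))),
    "grad a^T x^T x b": (lambda v: (v @ v) * (a @ b), x, 2 * x * (a @ b)),
}
for name, (fun, point, closed_form) in checks.items():
    print(name, np.allclose(num_grad_matrix(fun, point), closed_form, atol=1e-6))
```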
algebraic continued
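$$\frac{d}{dt}(X + t\,Y) = Y$$

$$\frac{d}{dt}\, B^{\mathrm T}(X + t\,Y)^{-1}A = -B^{\mathrm T}(X + t\,Y)^{-1}\,Y\,(X + t\,Y)^{-1}A$$

$$\frac{d}{dt}\, B^{\mathrm T}(X + t\,Y)^{-\mathrm T}A = -B^{\mathrm T}(X + t\,Y)^{-\mathrm T}\,Y^{\mathrm T}(X + t\,Y)^{-\mathrm T}A$$

$$\frac{d}{dt}\, B^{\mathrm T}(X + t\,Y)^{\mu}A = \ldots, \qquad -1 \le \mu \le 1,\ X, Y \in \mathbb{S}_+^M$$

$$\frac{d^2}{dt^2}\, B^{\mathrm T}(X + t\,Y)^{-1}A = 2B^{\mathrm T}(X + t\,Y)^{-1}\,Y\,(X + t\,Y)^{-1}\,Y\,(X + t\,Y)^{-1}A$$

$$\frac{d^3}{dt^3}\, B^{\mathrm T}(X + t\,Y)^{-1}A = -6B^{\mathrm T}(X + t\,Y)^{-1}\,Y\,(X + t\,Y)^{-1}\,Y\,(X + t\,Y)^{-1}\,Y\,(X + t\,Y)^{-1}A$$

$$\frac{d}{dt}\big((X + t\,Y)^{\mathrm T}A(X + t\,Y)\big) = Y^{\mathrm T}AX + X^{\mathrm T}AY + 2t\,Y^{\mathrm T}AY$$

$$\frac{d^2}{dt^2}\big((X + t\,Y)^{\mathrm T}A(X + t\,Y)\big) = 2Y^{\mathrm T}AY$$

$$\frac{d}{dt}\big((X + t\,Y)^{\mathrm T}A(X + t\,Y)\big)^{-1} = -\big((X + t\,Y)^{\mathrm T}A(X + t\,Y)\big)^{-1}\big(Y^{\mathrm T}AX + X^{\mathrm T}AY + 2t\,Y^{\mathrm T}AY\big)\big((X + t\,Y)^{\mathrm T}A(X + t\,Y)\big)^{-1}$$

$$\frac{d}{dt}\big((X + t\,Y)A(X + t\,Y)\big) = YAX + XAY + 2t\,YAY$$

$$\frac{d^2}{dt^2}\big((X + t\,Y)A(X + t\,Y)\big) = 2YAY$$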
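$$\nabla_{\operatorname{vec}X}^2 \operatorname{tr}\big(AXBX^{\mathrm T}\big) = \nabla_{\operatorname{vec}X}^2 \operatorname{vec}(X)^{\mathrm T}\big(B^{\mathrm T}\otimes A\big)\operatorname{vec}X = B\otimes A^{\mathrm T} + B^{\mathrm T}\otimes A \qquad (2076)$$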
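Identity (2076) rests on $\operatorname{tr}(AXBX^{\mathrm T}) = \operatorname{vec}(X)^{\mathrm T}(B^{\mathrm T}\otimes A)\operatorname{vec}X$ with column-stacking vec. A NumPy sketch, assuming random $A \in \mathbb{R}^{3\times3}$, $B \in \mathbb{R}^{2\times2}$, $X \in \mathbb{R}^{3\times2}$; `reshape(..., order="F")` implements column stacking, and the Hessian is compared with mixed central second differences.

```python
import numpy as np

rng = np.random.default_rng(9)
K, L = 3, 2
A = rng.standard_normal((K, K))
B = rng.standard_normal((L, L))
X = rng.standard_normal((K, L))

vec = lambda M: M.reshape(-1, order="F")          # stack columns
unvec = lambda v: v.reshape((K, L), order="F")
f = lambda M: np.trace(A @ M @ B @ M.T)

# The quadratic-form identity behind (2076)
print(np.isclose(f(X), vec(X) @ np.kron(B.T, A) @ vec(X)))

H = np.kron(B, A.T) + np.kron(B.T, A)              # right-hand side of (2076)

# Mixed central second differences in the entries of vec X
# (exact for a quadratic function, apart from rounding)
n, h, v = K * L, 1e-3, vec(X)
H_num = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        ei = np.zeros(n)
        ei[i] = h
        ej = np.zeros(n)
        ej[j] = h
        H_num[i, j] = (f(unvec(v + ei + ej)) - f(unvec(v + ei - ej))
                       - f(unvec(v - ei + ej)) + f(unvec(v - ei - ej))) / (4 * h * h)
print(np.allclose(H_num, H, atol=1e-6))
```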
D.2. TABLES OF GRADIENTS AND DERIVATIVES 617
D.2.3 trace
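$$\nabla_x\, \mu x = \mu I \qquad\qquad \nabla_X \operatorname{tr}\mu X = \nabla_X\, \mu\operatorname{tr}X = \mu I$$

$$\nabla_x\, \mathbf{1}^{\mathrm T}\delta(x)^{-1}\mathbf{1} = \frac{d}{dx}\,x^{-1} = -x^{-2} \qquad\qquad \nabla_X \operatorname{tr}X^{-1} = -X^{-2\mathrm T}$$

$$\nabla_x\, \mathbf{1}^{\mathrm T}\delta(x)^{-1}y = -\delta(x)^{-2}y \qquad\qquad \nabla_X \operatorname{tr}(X^{-1}Y) = \nabla_X \operatorname{tr}(YX^{-1}) = -X^{-\mathrm T}Y^{\mathrm T}X^{-\mathrm T}$$

$$\frac{d}{dx}\,x^{\mu} = \mu x^{\mu-1} \qquad\qquad \nabla_X \operatorname{tr}X^{\mu} = \mu X^{\mu-1}, \quad X \in \mathbb{S}^M$$

$$\nabla_X \operatorname{tr}X^j = jX^{(j-1)\mathrm T}$$

$$\nabla_x (b - a^{\mathrm T}x)^{-1} = (b - a^{\mathrm T}x)^{-2}a \qquad\qquad \nabla_X \operatorname{tr}\big((B - AX)^{-1}\big) = \big((B - AX)^{-2}A\big)^{\mathrm T}$$

$$\nabla_X \operatorname{tr}\big((X + Y)^{\mathrm T}(X + Y)\big) = 2(X + Y) = \nabla_X \|X + Y\|_F^2$$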
trace continued
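$$\frac{d}{dt}\operatorname{tr} g(X + t\,Y) = \operatorname{tr}\frac{d}{dt}\, g(X + t\,Y) \qquad [273, \text{p.491}]$$

$$\frac{d}{dt}\operatorname{tr}(X + t\,Y) = \operatorname{tr}Y$$

$$\frac{d}{dt}\operatorname{tr}^j(X + t\,Y) = j\operatorname{tr}^{j-1}(X + t\,Y)\operatorname{tr}Y$$

$$\frac{d}{dt}\operatorname{tr}(X + t\,Y)^j = j\operatorname{tr}\big((X + t\,Y)^{j-1}Y\big) \qquad (\forall j)$$

$$\frac{d}{dt}\operatorname{tr}\big((X + t\,Y)Y\big) = \operatorname{tr}Y^2$$

$$\frac{d}{dt}\operatorname{tr}\big((X + t\,Y)^kY\big) = \frac{d}{dt}\operatorname{tr}\big(Y(X + t\,Y)^k\big) = k\operatorname{tr}\big((X + t\,Y)^{k-1}Y^2\big), \qquad k \in \{0, 1, 2\}$$

$$\frac{d}{dt}\operatorname{tr}\big((X + t\,Y)^kY\big) = \frac{d}{dt}\operatorname{tr}\big(Y(X + t\,Y)^k\big) = \operatorname{tr}\sum_{i=0}^{k-1}(X + t\,Y)^i\,Y\,(X + t\,Y)^{k-1-i}\,Y$$

$$\frac{d}{dt}\operatorname{tr}\big((X + t\,Y)^{-1}Y\big) = -\operatorname{tr}\big((X + t\,Y)^{-1}Y(X + t\,Y)^{-1}Y\big)$$

$$\frac{d}{dt}\operatorname{tr}\big(B^{\mathrm T}(X + t\,Y)^{-1}A\big) = -\operatorname{tr}\big(B^{\mathrm T}(X + t\,Y)^{-1}Y(X + t\,Y)^{-1}A\big)$$

$$\frac{d}{dt}\operatorname{tr}\big(B^{\mathrm T}(X + t\,Y)^{-\mathrm T}A\big) = -\operatorname{tr}\big(B^{\mathrm T}(X + t\,Y)^{-\mathrm T}Y^{\mathrm T}(X + t\,Y)^{-\mathrm T}A\big)$$

$$\frac{d}{dt}\operatorname{tr}\big(B^{\mathrm T}(X + t\,Y)^{-k}A\big) = \ldots, \qquad k > 0$$

$$\frac{d}{dt}\operatorname{tr}\big(B^{\mathrm T}(X + t\,Y)^{\mu}A\big) = \ldots, \qquad -1 \le \mu \le 1,\ X, Y \in \mathbb{S}_+^M$$

$$\frac{d^2}{dt^2}\operatorname{tr}\big(B^{\mathrm T}(X + t\,Y)^{-1}A\big) = 2\operatorname{tr}\big(B^{\mathrm T}(X + t\,Y)^{-1}Y(X + t\,Y)^{-1}Y(X + t\,Y)^{-1}A\big)$$

$$\frac{d}{dt}\operatorname{tr}\big((X + t\,Y)^{\mathrm T}A(X + t\,Y)\big) = \operatorname{tr}\big(Y^{\mathrm T}AX + X^{\mathrm T}AY + 2t\,Y^{\mathrm T}AY\big)$$

$$\frac{d^2}{dt^2}\operatorname{tr}\big((X + t\,Y)^{\mathrm T}A(X + t\,Y)\big) = 2\operatorname{tr}\big(Y^{\mathrm T}AY\big)$$

$$\frac{d}{dt}\operatorname{tr}\Big(\big((X + t\,Y)^{\mathrm T}A(X + t\,Y)\big)^{-1}\Big) = -\operatorname{tr}\Big(\big((X + t\,Y)^{\mathrm T}A(X + t\,Y)\big)^{-1}\big(Y^{\mathrm T}AX + X^{\mathrm T}AY + 2t\,Y^{\mathrm T}AY\big)\big((X + t\,Y)^{\mathrm T}A(X + t\,Y)\big)^{-1}\Big)$$

$$\frac{d}{dt}\operatorname{tr}\big((X + t\,Y)A(X + t\,Y)\big) = \operatorname{tr}\big(YAX + XAY + 2t\,YAY\big)$$

$$\frac{d^2}{dt^2}\operatorname{tr}\big((X + t\,Y)A(X + t\,Y)\big) = 2\operatorname{tr}(YAY)$$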
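$$\frac{d}{dx}\log x = x^{-1} \qquad\qquad \nabla_X \log\det X = X^{-\mathrm T}$$

$$\nabla_X^2 \log\det(X)_{kl} = \frac{\partial X^{-\mathrm T}}{\partial X_{kl}} = -\big(X^{-1}e_k e_l^{\mathrm T}X^{-1}\big)^{\mathrm T}, \qquad \text{confer (2112), (2159)}$$

$$\frac{d}{dx}\log x^{-1} = -x^{-1} \qquad\qquad \nabla_X \log\det X^{-1} = -X^{-\mathrm T}$$

$$\frac{d}{dx}\log x^{\mu} = \mu x^{-1} \qquad\qquad \nabla_X \log\det^{\mu}X = \mu X^{-\mathrm T}$$

$$\nabla_X \log\det X^{\mu} = \mu X^{-\mathrm T}$$

$$\nabla_x \log(a^{\mathrm T}x + b) = a\,\frac{1}{a^{\mathrm T}x + b} \qquad\qquad \nabla_X \log\det(AX + B) = A^{\mathrm T}(AX + B)^{-\mathrm T}$$

$$\frac{d}{dt}\log\det(X + t\,Y) = \operatorname{tr}\big((X + t\,Y)^{-1}Y\big)$$

$$\frac{d^2}{dt^2}\log\det(X + t\,Y) = -\operatorname{tr}\big((X + t\,Y)^{-1}Y(X + t\,Y)^{-1}Y\big)$$

$$\frac{d}{dt}\log\det(X + t\,Y)^{-1} = -\operatorname{tr}\big((X + t\,Y)^{-1}Y\big)$$

$$\frac{d^2}{dt^2}\log\det(X + t\,Y)^{-1} = \operatorname{tr}\big((X + t\,Y)^{-1}Y(X + t\,Y)^{-1}Y\big)$$

$$\frac{d}{dt}\log\det\big(\delta(A(x + t\,y) + a)^2 + \mu I\big) = \operatorname{tr}\Big(\big(\delta(A(x + t\,y) + a)^2 + \mu I\big)^{-1}\, 2\delta(A(x + t\,y) + a)\,\delta(Ay)\Big)$$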
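A NumPy sketch of three log-determinant rows, assuming a randomly generated positive definite $X$ and an arbitrary direction $Y$. `numpy.linalg.slogdet` returns the sign and the logarithm of the absolute value of the determinant; only the second is used, since $\det(X + tY) > 0$ for the small $t$ involved. The step sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(10)
K = 3
C = rng.standard_normal((K, K))
X = C @ C.T + K * np.eye(K)       # positive definite (an assumption of this sketch)
Y = rng.standard_normal((K, K))   # an arbitrary direction
Xi = np.linalg.inv(X)

logdet = lambda M: np.linalg.slogdet(M)[1]
phi = lambda t: logdet(X + t * Y)

h = 1e-4
d1 = np.trace(Xi @ Y)                    # d/dt log det(X + tY) at t = 0
d2 = -np.trace(Xi @ Y @ Xi @ Y)          # d^2/dt^2 log det(X + tY) at t = 0
d1_num = (phi(h) - phi(-h)) / (2 * h)
d2_num = (phi(h) - 2 * phi(0) + phi(-h)) / h ** 2

# Gradient of log det X, entry by entry; the table gives X^{-T}
G = np.zeros((K, K))
for k in range(K):
    for l in range(K):
        E = np.zeros((K, K))
        E[k, l] = 1e-6
        G[k, l] = (logdet(X + E) - logdet(X - E)) / 2e-6
print(np.isclose(d1, d1_num, atol=1e-6), np.isclose(d2, d2_num, atol=1e-4),
      np.allclose(G, Xi.T, atol=1e-6))
```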
D.2.5 determinant
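$$\frac{d}{dt}\det(X + t\,Y) = \det(X + t\,Y)\operatorname{tr}\big((X + t\,Y)^{-1}Y\big)$$

$$\frac{d^2}{dt^2}\det(X + t\,Y) = \det(X + t\,Y)\Big(\operatorname{tr}^2\big((X + t\,Y)^{-1}Y\big) - \operatorname{tr}\big((X + t\,Y)^{-1}Y(X + t\,Y)^{-1}Y\big)\Big)$$

$$\frac{d}{dt}\det(X + t\,Y)^{-1} = -\det(X + t\,Y)^{-1}\operatorname{tr}\big((X + t\,Y)^{-1}Y\big)$$

$$\frac{d^2}{dt^2}\det(X + t\,Y)^{-1} = \det(X + t\,Y)^{-1}\Big(\operatorname{tr}^2\big((X + t\,Y)^{-1}Y\big) + \operatorname{tr}\big((X + t\,Y)^{-1}Y(X + t\,Y)^{-1}Y\big)\Big)$$

$$\frac{d}{dt}\det{}^{\mu}(X + t\,Y) = \mu\det{}^{\mu}(X + t\,Y)\operatorname{tr}\big((X + t\,Y)^{-1}Y\big)$$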
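The determinant rows can be checked the same way. This sketch assumes NumPy and a random well-conditioned $3\times3$ $X$; since $\det(X + tY)$ is then a cubic polynomial in $t$, its central second difference is exact apart from rounding. The step size is an assumption.

```python
import numpy as np

rng = np.random.default_rng(11)
K = 3
X = 3 * np.eye(K) + 0.3 * rng.standard_normal((K, K))
Y = 0.3 * rng.standard_normal((K, K))
det = np.linalg.det
phi = lambda t: det(X + t * Y)       # a cubic polynomial in t when K = 3
psi = lambda t: 1.0 / phi(t)         # det(X + tY)^{-1}

Xi = np.linalg.inv(X)
T1 = np.trace(Xi @ Y)                # tr((X + tY)^{-1} Y) at t = 0
T2 = np.trace(Xi @ Y @ Xi @ Y)
d1 = det(X) * T1                     # first derivative of det(X + tY) at t = 0
d2 = det(X) * (T1 ** 2 - T2)         # second derivative of det(X + tY) at t = 0
dinv2 = (T1 ** 2 + T2) / det(X)      # second derivative of det(X + tY)^{-1} at t = 0

h = 1e-3
d1_num = (phi(h) - phi(-h)) / (2 * h)
d2_num = (phi(h) - 2 * phi(0) + phi(-h)) / h ** 2
dinv2_num = (psi(h) - 2 * psi(0) + psi(-h)) / h ** 2
print(np.isclose(d1, d1_num, atol=1e-5), np.isclose(d2, d2_num, atol=1e-5),
      np.isclose(dinv2, dinv2_num, atol=1e-5))
```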
D.2.6 logarithmic
Matrix logarithm.
$$\frac{d}{dt}\log(X + t\,Y)^{\mu} = \mu Y(X + t\,Y)^{-1} = \mu(X + t\,Y)^{-1}Y, \qquad XY = YX$$

$$\frac{d}{dt}\log(I - t\,Y)^{\mu} = -\mu Y(I - t\,Y)^{-1} = -\mu(I - t\,Y)^{-1}Y \qquad [273, \text{p.493}]$$
D.2.7 exponential
Matrix exponential. [98, §3.6, §4.5] [440, §5.4]
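$$\nabla_X\, e^{\operatorname{tr}(Y^{\mathrm T}X)} = \nabla_X \det e^{Y^{\mathrm T}X} = e^{\operatorname{tr}(Y^{\mathrm T}X)}\,Y \qquad (\forall X, Y)$$

$$\nabla_X \operatorname{tr} e^{YX} = e^{Y^{\mathrm T}X^{\mathrm T}}Y^{\mathrm T} = Y^{\mathrm T}e^{X^{\mathrm T}Y^{\mathrm T}} \qquad (\forall X, Y)$$

$$\nabla_X \operatorname{tr}\big(Ae^{YX}\big) = \ldots$$

$$\nabla_x\, \mathbf{1}^{\mathrm T}e^{Ax} = A^{\mathrm T}e^{Ax}$$

$$\nabla_x \log\big(\mathbf{1}^{\mathrm T}e^x\big) = \frac{1}{\mathbf{1}^{\mathrm T}e^x}\,e^x$$

$$\nabla_x^2 \log\big(\mathbf{1}^{\mathrm T}e^x\big) = \frac{1}{\mathbf{1}^{\mathrm T}e^x}\left(\delta(e^x) - \frac{1}{\mathbf{1}^{\mathrm T}e^x}\,e^x e^{x\,\mathrm T}\right)$$

$$\nabla_x \prod_{i=1}^k x_i^{1/k} = \frac{1}{k}\left(\prod_{i=1}^k x_i^{1/k}\right)(1/x)$$

$$\nabla_x^2 \prod_{i=1}^k x_i^{1/k} = -\frac{1}{k}\left(\prod_{i=1}^k x_i^{1/k}\right)\left(\delta(x)^{-2} - \frac{1}{k}(1/x)(1/x)^{\mathrm T}\right)$$

$$\frac{d}{dt}e^{tY} = e^{tY}Y = Ye^{tY}$$

$$\frac{d}{dt}e^{X + t\,Y} = e^{X + t\,Y}Y = Ye^{X + t\,Y}, \qquad XY = YX$$

$$\frac{d^2}{dt^2}e^{X + t\,Y} = e^{X + t\,Y}Y^2 = Ye^{X + t\,Y}Y = Y^2e^{X + t\,Y}, \qquad XY = YX$$

$$\frac{d^j}{dt^j}e^{\operatorname{tr}(X + t\,Y)} = e^{\operatorname{tr}(X + t\,Y)}\operatorname{tr}^j(Y)$$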
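The log-sum-exp and geometric-mean rows can be confirmed numerically. The sketch assumes NumPy, random test points (entries in $[1, 2]$ for the geometric mean, which needs positive arguments), and Hessians built by differencing a central-difference gradient; the helper names `num_grad` and `num_hess` are hypothetical.

```python
import numpy as np

def num_grad(f, x, h=1e-5):
    """Central-difference gradient of a real function of a vector."""
    g = np.zeros(x.size)
    for k in range(x.size):
        e = np.zeros(x.size)
        e[k] = h
        g[k] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def num_hess(f, x, h=1e-3):
    """Column j differentiates the numerical gradient along the j th unit vector."""
    cols = [(num_grad(f, x + e) - num_grad(f, x - e)) / (2 * h) for e in h * np.eye(x.size)]
    return np.column_stack(cols)

rng = np.random.default_rng(12)

# log(1^T e^x): gradient and Hessian from the table
x = rng.standard_normal(4)
lse = lambda v: np.log(np.sum(np.exp(v)))
s = np.sum(np.exp(x))
grad_lse = np.exp(x) / s
hess_lse = (np.diag(np.exp(x)) - np.outer(np.exp(x), np.exp(x)) / s) / s

# Geometric mean prod_i x_i^(1/k): gradient and Hessian from the table
z = rng.uniform(1.0, 2.0, size=4)
k = z.size
geo = lambda v: np.prod(v) ** (1.0 / k)
grad_geo = geo(z) / (k * z)
hess_geo = -(geo(z) / k) * (np.diag(1 / z ** 2) - np.outer(1 / z, 1 / z) / k)

print(np.allclose(num_grad(lse, x), grad_lse, atol=1e-6),
      np.allclose(num_hess(lse, x), hess_lse, atol=1e-5),
      np.allclose(num_grad(geo, z), grad_geo, atol=1e-6),
      np.allclose(num_hess(geo, z), hess_geo, atol=1e-5))
```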