
Mathematics for Machine Learning

Additional Exercises

Marc Peter Deisenroth, A. Aldo Faisal, Cheng Soon Ong


Last update: 2020-05-16

In this living document, we provide additional exercises (including solutions) for the mathematics chapters
of our book Mathematics for Machine Learning, published by Cambridge University Press (2020). Possible
solutions are shown in blue. They may not be unique or optimal.
If you find mistakes, please raise a GitHub issue at
https://github.com/mml-book/mml-book.github.io/issues.

Chapter 2
1. Find all solutions of the inhomogeneous system of linear equations Ax = b, where
(a)
$$A := \begin{bmatrix} 1 & 2 \\ 3 & 0 \\ -1 & 2 \end{bmatrix}, \qquad b := \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}.$$
To determine the general solution of the inhomogeneous system of linear equations, a good start
is to compute the reduced row echelon form of the augmented system [A|b]:
$$\left[\begin{array}{cc|c} 1 & 2 & 1 \\ 3 & 0 & 0 \\ -1 & 2 & 1 \end{array}\right] \begin{array}{l} \\ -3R_1 \\ +R_1 \end{array} \rightsquigarrow \left[\begin{array}{cc|c} 1 & 2 & 1 \\ 0 & -6 & -3 \\ 0 & 4 & 2 \end{array}\right] \begin{array}{l} +\frac{1}{3}R_2 \\ \cdot(-\frac{1}{6}) \\ +\frac{2}{3}R_2 \end{array} \rightsquigarrow \left[\begin{array}{cc|c} 1 & 0 & 0 \\ 0 & 1 & \frac{1}{2} \\ 0 & 0 & 0 \end{array}\right]$$
From the last row of this augmented system, we see that 0x1 + 0x2 = 0, which is always true.
From the other rows, we obtain x1 = 0 and x2 = 1/2, so that
$$x = \begin{bmatrix} 0 \\ \frac{1}{2} \end{bmatrix}$$

is the unique solution of the system of linear equations Ax = b.
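As a quick numerical sanity check (our addition, not part of the book's solution), we can solve the system with NumPy; since A is not square, we use numpy.linalg.lstsq:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 0.0], [-1.0, 2.0]])
b = np.array([1.0, 0.0, 1.0])

# Least-squares solve; for a consistent system this recovers the exact solution.
x, _, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print(x)          # [0.  0.5]
print(A @ x - b)  # ~[0. 0. 0.], so the system is consistent
print(rank)       # 2 = number of unknowns, so the solution is unique
```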


(b)
$$A := \begin{bmatrix} 1 & 2 & 3 \\ 0 & 2 & 2 \end{bmatrix}, \qquad b := \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$
The general solution consists of a particular solution of the inhomogeneous system and all solutions
of the homogeneous system Ax = 0. An efficient way to determine the general solution is via the
reduced row echelon form (RREF) of the augmented system [A|b]:
$$\left[\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 0 & 2 & 2 & 1 \end{array}\right] \begin{array}{l} -R_2 \\ \cdot\frac{1}{2} \end{array} \rightsquigarrow \left[\begin{array}{ccc|c} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & \frac{1}{2} \end{array}\right]$$

• From the RREF, we can read out a particular solution (not unique) by using the pivot columns
as
$$x_p = \begin{bmatrix} 0 \\ \frac{1}{2} \\ 0 \end{bmatrix} \in \mathbb{R}^3.$$

Here, we set x1 to the right-hand side of the augmented RREF in the first row, and x2 to the right-hand side in the second row. Since xp ∈ R3 (otherwise the matrix-vector multiplication Ax = b would not be defined), we set the third coordinate to x3 = 0.
• Next, we determine all solutions of the homogeneous system of linear equations Ax = 0. From
the left-hand side of the augmented RREF, we can immediately read out the solutions as
$$\lambda \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}, \quad \lambda \in \mathbb{R},$$

where we used the Minus-1 trick.


• Putting everything together, we obtain the set of all solutions of the system Ax = b as
$$S = \left\{ x \in \mathbb{R}^3 \,:\, x = \begin{bmatrix} 0 \\ \frac{1}{2} \\ 0 \end{bmatrix} + \lambda \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix},\; \lambda \in \mathbb{R} \right\}.$$
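A short NumPy sketch (illustrative, not from the original text) confirms both the particular solution and the homogeneous direction:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0], [0.0, 2.0, 2.0]])
b = np.array([1.0, 1.0])

x_p = np.array([0.0, 0.5, 0.0])  # particular solution
v = np.array([1.0, 1.0, -1.0])   # direction from the Minus-1 trick

print(np.allclose(A @ x_p, b))              # True
print(A @ v)                                # [0. 0.], so v spans the kernel
print(np.allclose(A @ (x_p + 7.3 * v), b))  # True for any lambda
```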
 

2. Compute the matrix products AB, if possible, where


(a)
$$A := \begin{bmatrix} 1 & 2 & 3 \\ 0 & -1 & 2 \end{bmatrix}, \qquad B := \begin{bmatrix} 4 & -1 & 2 \\ 0 & 2 & 1 \end{bmatrix}$$

This matrix multiplication is not defined since A ∈ R2×3 and B ∈ R2×3. For the matrix product to be defined, the "neighboring" dimensions (columns of A and rows of B) would need to match. Here, they are 3 and 2, respectively, and thus do not match.
(b)
$$A := \begin{bmatrix} 1 & 2 & 3 \\ 0 & -1 & 2 \end{bmatrix}, \qquad B := \begin{bmatrix} 4 & -1 \\ 2 & 0 \\ 2 & 1 \end{bmatrix}$$

 
$$AB = \begin{bmatrix} 14 & 2 \\ 2 & 2 \end{bmatrix},$$

where (for example) 14 = 1 · 4 + 2 · 2 + 3 · 2.
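For reference, a minimal NumPy check (our addition):

```python
import numpy as np

A = np.array([[1, 2, 3], [0, -1, 2]])
B = np.array([[4, -1], [2, 0], [2, 1]])

print(A @ B)
# [[14  2]
#  [ 2  2]]

# The 2x3 matrix B from part (a) would raise a ValueError here,
# since a (2,3) @ (2,3) product is undefined.
```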


3. Find the intersection L1 ∩ L2 , where L1 and L2 are affine spaces (subspaces that are offset from 0)
defined as
$$L_1 := \underbrace{\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}}_{=:p_1} + \underbrace{\operatorname{span}\!\left[\begin{bmatrix} -3 \\ -2 \\ 1 \end{bmatrix}\right]}_{=:U_1}, \qquad L_2 := \underbrace{\begin{bmatrix} 10 \\ 6 \\ -2 \end{bmatrix}}_{=:p_2} + \underbrace{\operatorname{span}\!\left[\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 5 \\ 4 \\ 1 \end{bmatrix}\right]}_{=:U_2}.$$

x ∈ L1 ⇐⇒ x = p1 + αb1

for some α ∈ R. We defined b1 as the basis vector of U1 . Similarly,

x ∈ L2 ⇐⇒ x = p2 + β1 c1 + β2 c2

for some β1 , β2 ∈ R and U2 = span[c1 , c2 ]. Therefore, for all x ∈ L1 ∩ L2 both conditions must hold
and we arrive at

x ∈ L1 ∩ L2 ⇐⇒ ∃α, β1 , β2 ∈ R : αb1 − β1 c1 − β2 c2 = p2 − p1

which leads to the inhomogeneous system of linear equations Aλ = b, where λ = [α, β1, β2]⊤ and
   
$$A := \begin{bmatrix} -3 & -1 & -5 \\ -2 & -1 & -4 \\ 1 & -1 & -1 \end{bmatrix}, \qquad b := p_2 - p_1 = \begin{bmatrix} 9 \\ 6 \\ -3 \end{bmatrix}.$$

We bring the augmented system [A|b] into reduced row echelon form using Gaussian elimination:
$$\left[\begin{array}{ccc|c} 1 & 0 & 1 & -3 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]$$

and read out the particular solution α = −3 ⇒ ξ = p1 − 3b1 = [10, 6, −2]⊤ = p2.


To find the general solution, we need to look at the intersection of the direction spaces U1 ∩ U2 . The
corresponding RREF that we obtain is identical to the submatrix
$$\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{bmatrix}$$

of the reduced row echelon form of the augmented system. We obtain α = −β2 and β1 = −2β2, so that every solution of the homogeneous system contributes the direction x = αb1 = −β2 b1. Hence,
$$U_1 \cap U_2 = \operatorname{span}\!\left[\begin{bmatrix} -3 \\ -2 \\ 1 \end{bmatrix}\right].$$

We then arrive at the final solution
$$L_1 \cap L_2 = \begin{bmatrix} 10 \\ 6 \\ -2 \end{bmatrix} + \operatorname{span}\!\left[\begin{bmatrix} -3 \\ -2 \\ 1 \end{bmatrix}\right] = L_1,$$

i.e., L1 ⊆ L2 .
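The result can be cross-checked numerically; the following sketch (our addition) verifies that the affine spaces intersect and that U1 is contained in U2:

```python
import numpy as np

p1, b1 = np.array([1.0, 0.0, 1.0]), np.array([-3.0, -2.0, 1.0])
p2 = np.array([10.0, 6.0, -2.0])
c1, c2 = np.array([1.0, 1.0, 1.0]), np.array([5.0, 4.0, 1.0])

# Solve alpha*b1 - beta1*c1 - beta2*c2 = p2 - p1.
A = np.column_stack([b1, -c1, -c2])
lam, *_ = np.linalg.lstsq(A, p2 - p1, rcond=None)
print(np.allclose(A @ lam, p2 - p1))  # True: L1 and L2 intersect

# Adding b1 to {c1, c2} does not increase the rank, so U1 is a subspace of U2.
print(np.linalg.matrix_rank(np.column_stack([c1, c2])))      # 2
print(np.linalg.matrix_rank(np.column_stack([c1, c2, b1])))  # 2
```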

Chapter 3
1. Consider R3 with ⟨·, ·⟩ defined for all x, y ∈ R3 as
$$\langle x, y\rangle := x^\top A y, \qquad A := \begin{bmatrix} 4 & 2 & 1 \\ 0 & 4 & -1 \\ 1 & -1 & 5 \end{bmatrix}.$$

Is ⟨·, ·⟩ an inner product?


We will show that ⟨·, ·⟩ is not symmetric, i.e., ⟨x, y⟩ ≠ ⟨y, x⟩ for suitably chosen x, y.
We choose x := [1, 1, 0]⊤ and y := [1, 2, 0]⊤. Then ⟨x, y⟩ = 16 and ⟨y, x⟩ = 14 ≠ 16. Therefore, ⟨·, ·⟩ is not an inner product.
In general, we can see directly that A is not symmetric. For a symmetric A, we would additionally need to check that it is positive definite (e.g., via the eigenvalues of A).
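A two-line numerical illustration (our addition):

```python
import numpy as np

A = np.array([[4, 2, 1], [0, 4, -1], [1, -1, 5]])
x, y = np.array([1, 1, 0]), np.array([1, 2, 0])

print(x @ A @ y, y @ A @ x)  # 16 14 -> <.,.> is not symmetric
print(np.allclose(A, A.T))   # False: A itself is not symmetric
```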

Chapter 4
1. Compute the determinants of the following matrices:
(a)
$$A := \begin{bmatrix} 1 & 0 & -3 & 0 & 9 \\ 3 & 7 & 10 & 3 & 17 \\ 4 & 0 & 11 & 0 & 1 \\ 6 & 0 & 8 & 0 & -3 \\ 5 & 1 & 6 & -1 & 8 \end{bmatrix}$$


$$\begin{vmatrix} 1 & 0 & -3 & 0 & 9 \\ 3 & 7 & 10 & 3 & 17 \\ 4 & 0 & 11 & 0 & 1 \\ 6 & 0 & 8 & 0 & -3 \\ 5 & 1 & 6 & -1 & 8 \end{vmatrix} = \begin{vmatrix} 1 & 0 & -3 & 0 & 9 \\ 18 & 10 & 28 & 0 & 41 \\ 4 & 0 & 11 & 0 & 1 \\ 6 & 0 & 8 & 0 & -3 \\ 5 & 1 & 6 & -1 & 8 \end{vmatrix},$$
where we added 3 times the last row to the second row, which leaves the determinant unchanged. Now, we develop the determinant about the fourth column, whose only non-zero entry is the −1 in the last row:
$$\det(A) = (-1)(-1)^{5+4}\begin{vmatrix} 1 & 0 & -3 & 9 \\ 18 & 10 & 28 & 41 \\ 4 & 0 & 11 & 1 \\ 6 & 0 & 8 & -3 \end{vmatrix} \overset{\text{2nd col.}}{=} 10\begin{vmatrix} 1 & -3 & 9 \\ 4 & 11 & 1 \\ 6 & 8 & -3 \end{vmatrix} = 10(-33 - 18 + 288 - 594 - 8 - 36) = -4010,$$
where we used the rule of Sarrus for the last 3 × 3 determinant.
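A one-line sanity check with NumPy (our addition):

```python
import numpy as np

A = np.array([[1, 0, -3, 0, 9],
              [3, 7, 10, 3, 17],
              [4, 0, 11, 0, 1],
              [6, 0, 8, 0, -3],
              [5, 1, 6, -1, 8]])
print(round(np.linalg.det(A)))  # -4010
```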


(b)
$$B := \begin{bmatrix} 2 & 0 & 4 & 5 \\ 1 & 1 & 1 & 1 \\ 9 & 2 & 0 & 0 \\ 0 & 0 & 2 & 3 \end{bmatrix}$$


Developing about the second column (whose non-zero entries are 1 and 2), we obtain
$$\begin{vmatrix} 2 & 0 & 4 & 5 \\ 1 & 1 & 1 & 1 \\ 9 & 2 & 0 & 0 \\ 0 & 0 & 2 & 3 \end{vmatrix} \overset{\text{col. 2}}{=} \begin{vmatrix} 2 & 4 & 5 \\ 9 & 0 & 0 \\ 0 & 2 & 3 \end{vmatrix} - 2\begin{vmatrix} 2 & 4 & 5 \\ 1 & 1 & 1 \\ 0 & 2 & 3 \end{vmatrix} = -9\begin{vmatrix} 4 & 5 \\ 2 & 3 \end{vmatrix} - 2\left(-2\begin{vmatrix} 2 & 5 \\ 1 & 1 \end{vmatrix} + 3\begin{vmatrix} 2 & 4 \\ 1 & 1 \end{vmatrix}\right)$$
$$= -9(12 - 10) - 2\big({-2}(2 - 5) + 3(2 - 4)\big) = -18 - 2(6 - 6) = -18.$$

We could also have seen that the second 3 × 3 matrix after the development about the 2nd column is rank deficient (its third row equals its first row minus twice its second row), which means its determinant is 0.
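Again, a brief NumPy confirmation, including the rank-deficiency observation (our addition):

```python
import numpy as np

B = np.array([[2, 0, 4, 5],
              [1, 1, 1, 1],
              [9, 2, 0, 0],
              [0, 0, 2, 3]])
print(round(np.linalg.det(B)))  # -18

M = np.array([[2, 4, 5], [1, 1, 1], [0, 2, 3]])
print(np.linalg.matrix_rank(M))  # 2, i.e., rank deficient -> det(M) = 0
```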
2. Consider an endomorphism Φ : R3 → R3 with transformation matrix
$$A = \begin{bmatrix} 4 & 0 & -2 \\ 1 & 3 & -2 \\ 1 & 2 & -1 \end{bmatrix}.$$

(a) Compute the characteristic polynomial of A and determine all eigenvalues.
We have
$$p(\lambda) = \det(A - \lambda I) = \begin{vmatrix} 4-\lambda & 0 & -2 \\ 1 & 3-\lambda & -2 \\ 1 & 2 & -1-\lambda \end{vmatrix} \overset{\text{1st row}}{=} (4-\lambda)\begin{vmatrix} 3-\lambda & -2 \\ 2 & -1-\lambda \end{vmatrix} - 2\begin{vmatrix} 1 & 3-\lambda \\ 1 & 2 \end{vmatrix}$$
$$= (4-\lambda)\big((3-\lambda)(-1-\lambda) + 4\big) - 2\big(2 - (3-\lambda)\big) = -\lambda^3 + 6\lambda^2 - 11\lambda + 6.$$

Now, we need to find the eigenvalues, i.e., the roots of p(λ):
$$-\lambda^3 + 6\lambda^2 - 11\lambda + 6 = 0 \iff \lambda^3 - 6\lambda^2 + 11\lambda - 6 = 0 \iff (\lambda - 1)(\lambda - 2)(\lambda - 3) = 0.$$
Therefore, the eigenvalues are 1, 2, 3.
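These roots are easily checked numerically (our addition):

```python
import numpy as np

A = np.array([[4, 0, -2], [1, 3, -2], [1, 2, -1]])
print(np.sort(np.linalg.eigvals(A)).round(6))  # [1. 2. 3.]
print(np.roots([-1, 6, -11, 6]).round(6))      # roots of p(lambda)
```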


(b) Compute bases of all eigenspaces.
We use Gaussian elimination to determine E1 = ker(A − I):

$$\begin{bmatrix} 3 & 0 & -2 \\ 1 & 2 & -2 \\ 1 & 2 & -2 \end{bmatrix} \begin{array}{l} -3R_2 \\ \\ -R_2 \end{array} \rightsquigarrow \begin{bmatrix} 0 & -6 & 4 \\ 1 & 2 & -2 \\ 0 & 0 & 0 \end{bmatrix} \begin{array}{l} \cdot(-\frac{1}{6}) \\ +\frac{1}{3}R_1 \\ \\ \end{array} \overset{\text{swap } R_1, R_2}{\rightsquigarrow} \begin{bmatrix} 1 & 0 & -\frac{2}{3} \\ 0 & 1 & -\frac{2}{3} \\ 0 & 0 & 0 \end{bmatrix}$$

Therefore,
$$E_1 = \operatorname{span}\!\left[\begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}\right].$$

We use Gaussian elimination again to determine E2 = ker(A − 2I):
$$\begin{bmatrix} 2 & 0 & -2 \\ 1 & 1 & -2 \\ 1 & 2 & -3 \end{bmatrix} \begin{array}{l} -2R_2 \\ \\ -R_2 \end{array} \rightsquigarrow \begin{bmatrix} 0 & -2 & 2 \\ 1 & 1 & -2 \\ 0 & 1 & -1 \end{bmatrix} \begin{array}{l} +2R_3 \\ -R_3 \\ \\ \end{array} \overset{\text{reorder rows}}{\rightsquigarrow} \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}$$

and we obtain
$$E_2 = \operatorname{span}\!\left[\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}\right].$$

Finally, E3 = ker(A − 3I), which we compute via Gaussian elimination:
$$\begin{bmatrix} 1 & 0 & -2 \\ 1 & 0 & -2 \\ 1 & 2 & -4 \end{bmatrix} \begin{array}{l} \\ -R_1 \\ -R_1 \end{array} \rightsquigarrow \begin{bmatrix} 1 & 0 & -2 \\ 0 & 0 & 0 \\ 0 & 2 & -2 \end{bmatrix} \begin{array}{l} \\ \text{swap with } R_3 \\ \cdot\frac{1}{2} \end{array} \rightsquigarrow \begin{bmatrix} 1 & 0 & -2 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}$$

such that
$$E_3 = \operatorname{span}\!\left[\begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}\right].$$
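Each eigenpair can be verified by checking Av = λv (our addition):

```python
import numpy as np

A = np.array([[4, 0, -2], [1, 3, -2], [1, 2, -1]])
for lam, v in [(1, [2, 2, 3]), (2, [1, 1, 1]), (3, [2, 1, 1])]:
    v = np.array(v, dtype=float)
    print(lam, np.allclose(A @ v, lam * v))  # True for all three pairs
```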

(c) Determine a transformation matrix B such that B^{-1}AB is a diagonal matrix, and provide this diagonal matrix.
The desired matrix B consists of all eigenvectors (as the columns of the matrix) and is given by
$$B = \begin{bmatrix} 2 & 1 & 2 \\ 2 & 1 & 1 \\ 3 & 1 & 1 \end{bmatrix}.$$

The corresponding diagonal matrix is then
$$B^{-1}AB = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}.$$

Note that this is the diagonal matrix with the eigenvalues on the diagonal. If you compute B^{-1}AB explicitly, you should get the same answer.
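The diagonalization can be confirmed directly (our addition):

```python
import numpy as np

A = np.array([[4, 0, -2], [1, 3, -2], [1, 2, -1]])
B = np.array([[2, 1, 2], [2, 1, 1], [3, 1, 1]])
print(np.round(np.linalg.inv(B) @ A @ B, 6))  # diag(1, 2, 3)
```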
3. Consider the matrix
$$A = \begin{bmatrix} -3 & 4 & 4 \\ -5 & 9 & 5 \\ -7 & 4 & 8 \end{bmatrix}.$$
The aim is to find a matrix M ∈ R3×3 such that M^2 = A (a "square root" of A).
(a) Find an invertible matrix P and a diagonal matrix D such that A = PDP^{-1}.
The characteristic polynomial of A is p(λ) = −λ^3 + 14λ^2 − 49λ + 36. An obvious root of this polynomial is 1, and we can factorize p(λ) = −(λ − 1)(λ − 4)(λ − 9), which gives us the eigenvalues
1, 4, 9.
We use Gaussian elimination to compute the eigenspace E1 = ker(A − I), and we get
$$E_1 = \operatorname{span}\!\left[\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}\right].$$
Similarly, we get
$$E_4 = \operatorname{span}\!\left[\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}\right], \qquad E_9 = \operatorname{span}\!\left[\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}\right].$$
We then define the invertible matrix P and the diagonal matrix D as
$$P = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 2 \\ 1 & -1 & 1 \end{bmatrix}, \qquad D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 9 \end{bmatrix},$$
so that A = PDP^{-1}.
(b) Let M be in R3×3 and let us assume that M^2 = A. Let us consider N = P^{-1}MP. Show that N^2 = D. Then prove that N commutes with D, i.e., ND = DN.
Exploiting the associativity of matrix multiplication, we obtain
$$N^2 = (P^{-1}MP)(P^{-1}MP) = P^{-1}M(PP^{-1})MP = P^{-1}M^2P = P^{-1}AP = D$$
and, therefore,
$$ND = N(N^2) = N^3 = (N^2)N = DN.$$

(c) Explain why N is thus necessarily diagonal.


Hint: Note that all the diagonal values of D are distinct.
Intuitively, as D is diagonal, the product ND scales the columns of N while DN scales the rows of N. But since ND = DN and D has pairwise distinct values on its diagonal, N has to be diagonal. Let us prove this result formally.

Let us denote by ni,j the coefficient of the matrix N at row i and column j, and let di denote the i-th coefficient on the diagonal of D. Note that in our example, i and j range over {1, 2, 3}, but this result extends to matrices of arbitrary size. Let i and j be in {1, 2, 3}. The coefficient of ND at row i and column j is equal to ni,j dj, while that of DN is equal to di ni,j. The matrix equality ND = DN yields
$$\forall i, j \in \{1, 2, 3\}: \quad n_{i,j}d_j = d_i n_{i,j},$$
i.e.,
$$\forall i, j \in \{1, 2, 3\}: \quad n_{i,j}(d_j - d_i) = 0. \tag{1}$$
In general, a product is zero if and only if at least one of its factors is zero. As all the values on the diagonal of D are distinct, (1) is equivalent to
$$\forall i, j \in \{1, 2, 3\}: \quad (i \neq j) \implies (n_{i,j} = 0),$$
which ensures that N is diagonal. Note that if two values on the diagonal of D were equal, N would not necessarily be diagonal, and we would have infinitely many candidates for N, and thus as many for M.
(d) What can you say about N's possible values? Compute a matrix M whose square is equal to A. How many different such matrices are there?
We can write N as N = diag(n1, n2, n3), and N^2 = D requires that n1^2 = 1, n2^2 = 4, and n3^2 = 9. As all diagonal values are positive, each has exactly two real square roots. Therefore, we have 8 possible values for N, which we gather in the following set:
$$\{\operatorname{diag}(n_1, n_2, n_3) \mid n_1 \in \{-1, +1\},\; n_2 \in \{-2, +2\},\; n_3 \in \{-3, +3\}\}.$$
Now, let us set N = diag(1, 2, 3) and compute the product M = PNP^{-1}. First, Gaussian elimination gives us
$$P^{-1} = \frac{1}{2}\begin{bmatrix} 3 & -1 & -1 \\ 2 & 0 & -2 \\ -1 & 1 & 1 \end{bmatrix},$$
and we find one square root of A as
$$M = PNP^{-1} = \begin{bmatrix} 0 & 1 & 1 \\ -1 & 3 & 1 \\ -2 & 1 & 3 \end{bmatrix}.$$

We can check that M^2 indeed equals A. Choosing any of the 8 possible values of N yields a square root of A in this way, and these square roots are pairwise distinct. Hence, there are equally many such matrices M.
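The following sketch (our addition) verifies all eight square roots at once:

```python
import numpy as np
from itertools import product

A = np.array([[-3, 4, 4], [-5, 9, 5], [-7, 4, 8]])
P = np.array([[1, 0, 1], [0, 1, 2], [1, -1, 1]])
P_inv = np.linalg.inv(P)

for n1, n2, n3 in product([-1, 1], [-2, 2], [-3, 3]):
    M = P @ np.diag([n1, n2, n3]) @ P_inv
    assert np.allclose(M @ M, A)  # every choice of signs squares to A

print(np.round(P @ np.diag([1, 2, 3]) @ P_inv))  # the matrix M above
```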
4. https://github.com/mml-book/mml-book.github.io/issues/338
Let A ∈ Rm×n . Show that:
• AA^⊤ and A^⊤A have identical non-zero eigenvalues.
• If q is an eigenvector of AA^⊤, then A^⊤q is an eigenvector of A^⊤A.
• If p is an eigenvector of A^⊤A, then Ap is an eigenvector of AA^⊤.

• We start by showing that if λ ≠ 0 is an eigenvalue of AA^⊤, then it is also a non-zero eigenvalue of A^⊤A.
Let λ ≠ 0 be an eigenvalue of AA^⊤ and q a corresponding eigenvector, i.e., (AA^⊤)q = λq. Then
$$(A^\top A)A^\top q = A^\top(AA^\top q) = A^\top(\lambda q) = \lambda A^\top q.$$
We now need to show that A^⊤q ≠ 0 before we can conclude that λ is an eigenvalue of A^⊤A. Assume A^⊤q = 0. Then AA^⊤q = 0 as well, which contradicts AA^⊤q = λq ≠ 0 (q is an eigenvector, so q ≠ 0, and λ ≠ 0).
Therefore, A^⊤q ≠ 0, and λ is an eigenvalue of A^⊤A with A^⊤q as a corresponding eigenvector.

• Let us now consider the case where λ ≠ 0 is an eigenvalue of A^⊤A. We want to show that λ is also an eigenvalue of AA^⊤.
Let λ ≠ 0 be an eigenvalue of A^⊤A and p a corresponding eigenvector, i.e., (A^⊤A)p = λp. Then
$$(AA^\top)Ap = A(A^\top Ap) = A(\lambda p) = \lambda Ap.$$
Similar to above, we need to show that Ap ≠ 0 before we can draw our conclusion. Assume Ap = 0. Then also A^⊤Ap = 0, i.e., λp = 0, and since λ ≠ 0 this would imply p = 0, contradicting the assumption that p is an eigenvector of A^⊤A. Therefore, Ap ≠ 0.
Therefore, λ ≠ 0 is an eigenvalue of AA^⊤, and a corresponding eigenvector is Ap.
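A randomized numerical illustration of the statement (our addition; any generic A of full column rank will do):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

ev_big = np.linalg.eigvalsh(A @ A.T)    # 4 eigenvalues, one of them ~0
ev_small = np.linalg.eigvalsh(A.T @ A)  # 3 eigenvalues
print(np.round(ev_big, 6))
print(np.round(ev_small, 6))
print(np.allclose(ev_big[1:], ev_small))  # non-zero eigenvalues coincide
```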

Chapter 5
1. Consider
$$f := Ax,$$
where A ∈ R3×2 and x ∈ R2. Compute the partial derivative
$$\frac{\partial f}{\partial A}.$$

• We start by determining the dimension of the partial derivative. Knowing the dimensions of A and x, it follows that f ∈ R3. Therefore, ∂f/∂A ∈ R3×(3×2).
• We look at every element of f := [f1, f2, f3]⊤ and determine the corresponding partial derivatives. By definition,
$$f_i = \sum_{j=1}^{2} A_{ij}x_j$$

for i = 1, 2, 3. Therefore,
$$\frac{\partial f_i}{\partial A_{ij}} = x_j, \qquad \frac{\partial f_i}{\partial A_{kj}} = 0 \quad \text{for } k \neq i.$$
This then gives
$$\begin{array}{llllll}
\frac{\partial f_1}{\partial A_{11}} = x_1 & \frac{\partial f_1}{\partial A_{12}} = x_2 & \frac{\partial f_1}{\partial A_{21}} = 0 & \frac{\partial f_1}{\partial A_{22}} = 0 & \frac{\partial f_1}{\partial A_{31}} = 0 & \frac{\partial f_1}{\partial A_{32}} = 0 \\
\frac{\partial f_2}{\partial A_{11}} = 0 & \frac{\partial f_2}{\partial A_{12}} = 0 & \frac{\partial f_2}{\partial A_{21}} = x_1 & \frac{\partial f_2}{\partial A_{22}} = x_2 & \frac{\partial f_2}{\partial A_{31}} = 0 & \frac{\partial f_2}{\partial A_{32}} = 0 \\
\frac{\partial f_3}{\partial A_{11}} = 0 & \frac{\partial f_3}{\partial A_{12}} = 0 & \frac{\partial f_3}{\partial A_{21}} = 0 & \frac{\partial f_3}{\partial A_{22}} = 0 & \frac{\partial f_3}{\partial A_{31}} = x_1 & \frac{\partial f_3}{\partial A_{32}} = x_2
\end{array}$$
We now have all 18 entries that we need to construct the 3 × 3 × 2 partial derivative, which can be done in the following way (where we store the partial derivatives in df):
$$df[i, j, k] = \frac{\partial f_i}{\partial A_{jk}}.$$

From above, we see that
$$df[:, :, 1] = \frac{\partial f}{\partial A_{:1}} = \begin{bmatrix} x_1 & 0 & 0 \\ 0 & x_1 & 0 \\ 0 & 0 & x_1 \end{bmatrix} \in \mathbb{R}^{3\times 3}, \qquad df[:, :, 2] = \frac{\partial f}{\partial A_{:2}} = \begin{bmatrix} x_2 & 0 & 0 \\ 0 & x_2 & 0 \\ 0 & 0 & x_2 \end{bmatrix} \in \mathbb{R}^{3\times 3},$$
which is what we expect if we compute the partial derivative of a vector f ∈ R3 with respect to a column vector A:i ∈ R3 of the matrix A.
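This structure can be confirmed numerically, e.g., with finite differences (our addition; since f = Ax is linear, the finite-difference quotient is exact up to floating-point error):

```python
import numpy as np

x = np.array([2.0, 5.0])
A = np.zeros((3, 2))
eps = 1e-6

# df[i, j, k] approximates d f_i / d A_{jk} (0-based indices in code).
df = np.zeros((3, 3, 2))
for j in range(3):
    for k in range(2):
        E = np.zeros((3, 2))
        E[j, k] = eps
        df[:, j, k] = ((A + E) @ x - A @ x) / eps

print(np.round(df[:, :, 0]))  # x1 * I_3
print(np.round(df[:, :, 1]))  # x2 * I_3
```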
• An alternative approach is to vectorize A, compute the partial derivatives, and then re-assemble
them afterwards. Here, we define a vector
$$a = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_6 \end{bmatrix} := \begin{bmatrix} A_{11} \\ A_{21} \\ A_{31} \\ A_{12} \\ A_{22} \\ A_{32} \end{bmatrix} \in \mathbb{R}^6,$$

which consists of the stacked columns of A. Using this vector, we obtain the elements of f as

$$f_1 = a_1x_1 + a_4x_2, \qquad f_2 = a_2x_1 + a_5x_2, \qquad f_3 = a_3x_1 + a_6x_2.$$

The partial derivative of f ∈ R3 with respect to a ∈ R6 results in the 3 × 6 matrix
$$\frac{\partial f}{\partial a} = \left[\begin{array}{ccc|ccc} x_1 & 0 & 0 & x_2 & 0 & 0 \\ 0 & x_1 & 0 & 0 & x_2 & 0 \\ 0 & 0 & x_1 & 0 & 0 & x_2 \end{array}\right] \in \mathbb{R}^{3\times 6}.$$

We can now get the desired partial derivative as
$$df[:, :, 1] = \begin{bmatrix} x_1 & 0 & 0 \\ 0 & x_1 & 0 \\ 0 & 0 & x_1 \end{bmatrix}, \qquad df[:, :, 2] = \begin{bmatrix} x_2 & 0 & 0 \\ 0 & x_2 & 0 \\ 0 & 0 & x_2 \end{bmatrix}.$$
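The vectorization route maps directly to code via the identity ∂(Ax)/∂vec(A) = x^⊤ ⊗ I (our addition; vec stacks the columns of A):

```python
import numpy as np

x = np.array([2.0, 5.0])

# Jacobian of f = Ax with respect to a = vec(A): the 3x6 matrix [x1*I | x2*I].
J = np.kron(x, np.eye(3))
print(J)

# Re-assemble the 3x3x2 tensor from the two 3x3 blocks.
df = np.stack([J[:, :3], J[:, 3:]], axis=-1)
print(df.shape)  # (3, 3, 2)
```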

Chapter 6
Chapter 7
