Additional Exercises
In this living document, we provide additional exercises (including solutions) for the mathematics chapters
of our book Mathematics for Machine Learning, published by Cambridge University Press (2020). Possible
solutions are shown in blue. They may not be unique or optimal.
If you find mistakes, please raise a GitHub issue at
https://github.com/mml-book/mml-book.github.io/issues.
Chapter 2
1. Find all solutions of the inhomogeneous system of linear equations Ax = b, where
(a)
$$A := \begin{bmatrix} 1 & 2 \\ 3 & 0 \\ -1 & 2 \end{bmatrix}, \quad b := \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}.$$
To determine the general solution of the inhomogeneous system of linear equations, a good start
is to compute the reduced row echelon form of the augmented system [A|b]:
$$\left[\begin{array}{cc|c} 1 & 2 & 1 \\ 3 & 0 & 0 \\ -1 & 2 & 1 \end{array}\right]
\begin{matrix} \\ -3R_1 \\ +R_1 \end{matrix}
\rightsquigarrow
\left[\begin{array}{cc|c} 1 & 2 & 1 \\ 0 & -6 & -3 \\ 0 & 4 & 2 \end{array}\right]
\begin{matrix} +\frac{1}{3}R_2 \\ \cdot\left(-\frac{1}{6}\right) \\ +\frac{2}{3}R_2 \end{matrix}
\rightsquigarrow
\left[\begin{array}{cc|c} 1 & 0 & 0 \\ 0 & 1 & \frac{1}{2} \\ 0 & 0 & 0 \end{array}\right]$$
From the last row of this augmented system, we see that 0x1 + 0x2 = 0, which is always true.
From the other rows, we obtain x₁ = 0 and x₂ = 1/2, so that
$$x = \begin{bmatrix} 0 \\ \tfrac{1}{2} \end{bmatrix}.$$
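As a quick numerical sanity check (not part of the original solution), a short NumPy sketch confirms that this vector solves the system:

```python
import numpy as np

# System from part (a): A ∈ R^{3×2}, b ∈ R^3.
A = np.array([[1., 2.],
              [3., 0.],
              [-1., 2.]])
b = np.array([1., 0., 1.])

x = np.array([0., 0.5])          # solution read off the RREF
print(np.allclose(A @ x, b))     # → True
```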
• From the RREF, we can read out a particular solution (not unique) by using the pivot columns
as
$$x_p = \begin{bmatrix} 0 \\ \tfrac{1}{2} \\ 0 \end{bmatrix} \in \mathbb{R}^3.$$
Here, we set x1 to the right-hand side of the augmented RREF in the first row, and x2 to
the right-hand side of the augmented RREF in the second row. Since xp ∈ R3 (otherwise the
matrix-vector multiplication Ax = b would not be defined), the third coordinate x3 = 0.
• Next, we determine all solutions of the homogeneous system of linear equations Ax = 0. From
the left-hand side of the augmented RREF, we can immediately read out the solutions as
$$\lambda \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}, \quad \lambda \in \mathbb{R}.$$
This matrix multiplication is not defined since A ∈ ℝ^{2×3} and B ∈ ℝ^{2×3}. For the matrix product AB to be defined, the “neighboring” dimensions (columns of A and rows of B) would need to match. Here, they are 3 and 2, respectively.
(b)
$$A := \begin{bmatrix} 1 & 2 & 3 \\ 0 & -1 & 2 \end{bmatrix}, \quad B := \begin{bmatrix} 4 & -1 \\ 2 & 0 \\ 2 & 1 \end{bmatrix}$$
Here, the product is defined, and we obtain
$$AB = \begin{bmatrix} 14 & 2 \\ 2 & 2 \end{bmatrix}.$$
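One can verify this product with a short NumPy sketch (a sanity check, not part of the original solution):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [0., -1., 2.]])
B = np.array([[4., -1.],
              [2., 0.],
              [2., 1.]])

# Inner dimensions match (3 columns of A, 3 rows of B), so AB is defined.
print(A @ B)  # → [[14. 2.] [2. 2.]]
```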
$$x \in L_1 \iff x = p_1 + \alpha b_1$$
for some α ∈ ℝ. We defined b₁ as the basis vector of U₁. Similarly,
x ∈ L2 ⇐⇒ x = p2 + β1 c1 + β2 c2
for some β1 , β2 ∈ R and U2 = span[c1 , c2 ]. Therefore, for all x ∈ L1 ∩ L2 both conditions must hold
and we arrive at
$$x \in L_1 \cap L_2 \iff \exists \alpha, \beta_1, \beta_2 \in \mathbb{R}: \alpha b_1 - \beta_1 c_1 - \beta_2 c_2 = p_2 - p_1,$$
which leads to the inhomogeneous system of linear equations Aλ = b, where λ = [α, β₁, β₂]^⊤ and
$$A := \begin{bmatrix} -3 & -1 & -5 \\ -2 & -1 & -4 \\ 1 & -1 & -1 \end{bmatrix}, \quad b := p_2 - p_1 = \begin{bmatrix} 9 \\ 6 \\ -3 \end{bmatrix}.$$
We bring the augmented system [A|b] into reduced row echelon form using Gaussian elimination:
$$\left[\begin{array}{ccc|c} 1 & 0 & 1 & -3 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]$$
From the second row of the reduced row echelon form of the augmented system, we obtain β₁ = −2β₂, such that
$$U_1 \cap U_2 = \operatorname{span}\left[\begin{bmatrix} -3 \\ -2 \\ 1 \end{bmatrix}\right].$$
i.e., L1 ⊆ L2 .
Chapter 3
1. Consider ℝ³ with ⟨·, ·⟩ defined for all x, y ∈ ℝ³ as
$$\langle x, y\rangle := x^\top A y\,, \quad A := \begin{bmatrix} 4 & 2 & 1 \\ 0 & 4 & -1 \\ 1 & -1 & 5 \end{bmatrix}.$$
Chapter 4
1. Compute the determinants of the following matrices:
(a)
$$A := \begin{bmatrix} 1 & 0 & -3 & 0 & 9 \\ 3 & 7 & 10 & 3 & 17 \\ 4 & 0 & 11 & 0 & 1 \\ 6 & 0 & 8 & 0 & -3 \\ 5 & 1 & 6 & -1 & 8 \end{bmatrix}$$
$$\begin{vmatrix} 1 & 0 & -3 & 0 & 9 \\ 3 & 7 & 10 & 3 & 17 \\ 4 & 0 & 11 & 0 & 1 \\ 6 & 0 & 8 & 0 & -3 \\ 5 & 1 & 6 & -1 & 8 \end{vmatrix} = \begin{vmatrix} 1 & 0 & -3 & 0 & 9 \\ 18 & 10 & 28 & 0 & 41 \\ 4 & 0 & 11 & 0 & 1 \\ 6 & 0 & 8 & 0 & -3 \\ 5 & 1 & 6 & -1 & 8 \end{vmatrix}$$
where we added 3 times the last row to the second row. Now, we develop the determinant about the fourth column:
$$\det(A) = (-1)(-1)^{5+4} \begin{vmatrix} 1 & 0 & -3 & 9 \\ 18 & 10 & 28 & 41 \\ 4 & 0 & 11 & 1 \\ 6 & 0 & 8 & -3 \end{vmatrix} \overset{\text{2nd col}}{=} 10 \begin{vmatrix} 1 & -3 & 9 \\ 4 & 11 & 1 \\ 6 & 8 & -3 \end{vmatrix}$$
$$= 10(-33 - 18 + 288 - 594 - 8 - 36) = -4010,$$
(b)
$$\begin{vmatrix} 2 & 0 & 4 & 5 \\ 1 & 1 & 1 & 1 \\ 9 & 2 & 0 & 0 \\ 0 & 0 & 2 & 3 \end{vmatrix} \overset{\text{col 2}}{=} \begin{vmatrix} 2 & 4 & 5 \\ 9 & 0 & 0 \\ 0 & 2 & 3 \end{vmatrix} - 2\begin{vmatrix} 2 & 4 & 5 \\ 1 & 1 & 1 \\ 0 & 2 & 3 \end{vmatrix}$$
$$= -9(12 - 10) - 2\big(-2(2 - 5) + 3(2 - 4)\big) = -18 - 2(6 - 6) = -18.$$
We could have seen that the second 3 × 3-matrix after the development about the 2nd column
is rank deficient (the third row is the first row minus twice the second row), which results in a
determinant of 0.
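Both results can be double-checked with a short NumPy sketch (using the matrices as reconstructed above):

```python
import numpy as np

A = np.array([[1, 0, -3, 0, 9],
              [3, 7, 10, 3, 17],
              [4, 0, 11, 0, 1],
              [6, 0, 8, 0, -3],
              [5, 1, 6, -1, 8]], dtype=float)
B = np.array([[2, 0, 4, 5],
              [1, 1, 1, 1],
              [9, 2, 0, 0],
              [0, 0, 2, 3]], dtype=float)

# np.linalg.det returns floats; round to recover the integer determinants.
print(round(np.linalg.det(A)))  # → -4010
print(round(np.linalg.det(B)))  # → -18
```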
2. Consider an endomorphism Φ : R3 → R3 with transformation matrix
$$A = \begin{bmatrix} 4 & 0 & -2 \\ 1 & 3 & -2 \\ 1 & 2 & -1 \end{bmatrix}.$$
(a) Compute the characteristic polynomial of A and determine all eigenvalues.
We have
$$p(\lambda) = \det(A - \lambda I) = \begin{vmatrix} 4-\lambda & 0 & -2 \\ 1 & 3-\lambda & -2 \\ 1 & 2 & -1-\lambda \end{vmatrix} \overset{\text{1st row}}{=} (4-\lambda)\begin{vmatrix} 3-\lambda & -2 \\ 2 & -1-\lambda \end{vmatrix} - 2\begin{vmatrix} 1 & 3-\lambda \\ 1 & 2 \end{vmatrix}$$
$$= (4-\lambda)\big((3-\lambda)(-1-\lambda) + 4\big) - 2\big(2 - (3-\lambda)\big)$$
$$= (4-\lambda)(3-\lambda)(-1-\lambda) + 4(4-\lambda) - 4 + 2(3-\lambda) = -\lambda^3 + 6\lambda^2 - 11\lambda + 6.$$
The eigenvalues are the roots of the characteristic polynomial:
$$-\lambda^3 + 6\lambda^2 - 11\lambda + 6 = 0 \iff \lambda^3 - 6\lambda^2 + 11\lambda - 6 = 0 \iff (\lambda - 1)(\lambda - 2)(\lambda - 3) = 0,$$
i.e., the eigenvalues of A are λ₁ = 1, λ₂ = 2, λ₃ = 3.
For λ₁ = 1, the eigenspace is E₁ = ker(A − I), which we compute via Gaussian elimination:
$$A - I = \begin{bmatrix} 3 & 0 & -2 \\ 1 & 2 & -2 \\ 1 & 2 & -2 \end{bmatrix} \rightsquigarrow \begin{bmatrix} 1 & 0 & -\frac{2}{3} \\ 0 & 1 & -\frac{2}{3} \\ 0 & 0 & 0 \end{bmatrix}.$$
Therefore,
$$E_1 = \operatorname{span}\left[\begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}\right].$$
For λ₂ = 2, we similarly obtain
$$E_2 = \operatorname{span}\left[\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}\right],$$
and for λ₃ = 3,
$$E_3 = \operatorname{span}\left[\begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}\right].$$
(c) Determine a transformation matrix B such that B −1 AB is a diagonal matrix and provide this
diagonal matrix.
The desired matrix B consists of the eigenvectors as its columns:
$$B = \begin{bmatrix} 2 & 1 & 2 \\ 2 & 1 & 1 \\ 3 & 1 & 1 \end{bmatrix}.$$
Then B⁻¹AB = diag(1, 2, 3), the diagonal matrix with the eigenvalues on its diagonal. If you compute B⁻¹AB explicitly, you should get exactly this matrix.
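A quick NumPy sketch (a numerical check, not part of the original solution) confirms the diagonalization:

```python
import numpy as np

A = np.array([[4., 0., -2.],
              [1., 3., -2.],
              [1., 2., -1.]])
B = np.array([[2., 1., 2.],   # eigenvectors as columns
              [2., 1., 1.],
              [3., 1., 1.]])

D = np.linalg.inv(B) @ A @ B
print(np.allclose(D, np.diag([1., 2., 3.])))  # → True
```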
3. Consider the matrix
$$A = \begin{bmatrix} -3 & 4 & 4 \\ -5 & 9 & 5 \\ -7 & 4 & 8 \end{bmatrix}$$
The aim is to find a matrix M ∈ ℝ^{3×3} such that M² = A (a “square root” of A).
(a) Find an invertible matrix P and a diagonal matrix D such that A = P DP −1 .
The characteristic polynomial of A is p(λ) = −λ3 + 14λ2 − 49λ + 36. An obvious root of this
polynomial is 1, and we can factorize p(λ) = −(λ − 1)(λ − 4)(λ − 9), which gives us the eigenvalues
1, 4, 9.
We use Gaussian elimination to compute the eigenspace E₁ = ker(A − 1·I), and we get
$$E_1 = \operatorname{span}\left[\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}\right].$$
Similarly, we get
$$E_4 = \operatorname{span}\left[\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}\right], \quad E_9 = \operatorname{span}\left[\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}\right].$$
We then define the invertible matrix P and the diagonal matrix D as
$$P = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 2 \\ 1 & -1 & 1 \end{bmatrix}, \quad D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 9 \end{bmatrix},$$
so that A = PDP⁻¹.
(b) Let M be in ℝ^{3×3} and let us assume that M² = A. Let us consider N = P⁻¹MP. Show that N² = D. Then prove that N commutes with D, i.e., ND = DN.
Exploiting the associativity of matrix multiplication, we obtain
$$N^2 = (P^{-1}MP)(P^{-1}MP) = P^{-1}M(PP^{-1})MP = P^{-1}M^2P = P^{-1}AP = D$$
and, therefore,
$$ND = N(N^2) = N^3 = (N^2)N = DN.$$
Let us denote by n_{i,j} the coefficient of matrix N at row i and column j, and let d_i denote the i-th coefficient on the diagonal of D. Note that in our example, i and j range over {1, 2, 3}, but this result extends to matrices of arbitrary size. Let i and j be in {1, 2, 3}. The coefficient of ND at row i and column j is equal to n_{i,j} d_j, while that of DN is equal to d_i n_{i,j}. The matrix equality ND = DN yields
$$\forall i, j \in \{1, 2, 3\}: \quad n_{i,j} d_j = n_{i,j} d_i,$$
i.e.,
$$\forall i, j \in \{1, 2, 3\}: \quad n_{i,j}(d_j - d_i) = 0. \tag{1}$$
In general, a product is null if and only if at least one of its factors is null. But as all the values on the diagonal of D are different, (1) is equivalent to
$$\forall i, j \in \{1, 2, 3\}: \quad (i \neq j) \implies (n_{i,j} = 0),$$
which ensures that N is diagonal. Note that if two values on the diagonal of D were equal, N would not necessarily be diagonal, and we would have infinitely many candidates for N, and thus as many for M.
(d) What can you say about N ’s possible values? Compute a matrix M , whose square is equal to A.
How many different such matrices are there?
We can write N as N = diag(n₁, n₂, n₃), and N² = D requires that n₁² = 1, n₂² = 4, and n₃² = 9. As all diagonal values are positive, we have exactly two distinct square roots for each one. Therefore, we have 8 possible values for N, which we gather in the following set:
$$\{\operatorname{diag}(n_1, n_2, n_3) \mid n_1 \in \{-1, +1\},\ n_2 \in \{-2, +2\},\ n_3 \in \{-3, +3\}\}.$$
Now, let us set N = diag(1, 2, 3) and compute the product M = PNP⁻¹. First, Gaussian elimination gives us
$$P^{-1} = \frac{1}{2}\begin{bmatrix} 3 & -1 & -1 \\ 2 & 0 & -2 \\ -1 & 1 & 1 \end{bmatrix},$$
and we find one square root of A as
$$M = PNP^{-1} = \begin{bmatrix} 0 & 1 & 1 \\ -1 & 3 & 1 \\ -2 & 1 & 3 \end{bmatrix}.$$
We can check that M² indeed equals A. Each of the 8 possible values of N yields a different square root of A via M = PNP⁻¹. Hence, there are equally many different such matrices M.
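One can verify the computed square root numerically with a short NumPy sketch:

```python
import numpy as np

P = np.array([[1., 0., 1.],
              [0., 1., 2.],
              [1., -1., 1.]])
N = np.diag([1., 2., 3.])

M = P @ N @ np.linalg.inv(P)  # one square root of A
A = np.array([[-3., 4., 4.],
              [-5., 9., 5.],
              [-7., 4., 8.]])

print(np.allclose(M @ M, A))  # → True
```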
4. https://github.com/mml-book/mml-book.github.io/issues/338
Let A ∈ ℝ^{m×n}. Show that:
• AA^⊤ and A^⊤A have identical non-zero eigenvalues.
• If q is an eigenvector of AA^⊤, then A^⊤q is an eigenvector of A^⊤A.
• If p is an eigenvector of A^⊤A, then Ap is an eigenvector of AA^⊤.
We now need to show that A^⊤q ≠ 0 before we can conclude that λ is an eigenvalue of A^⊤A. Assume A^⊤q = 0. Then it would follow that AA^⊤q = 0, which contradicts AA^⊤q = λq ≠ 0 since q is an eigenvector of AA^⊤ with associated eigenvalue λ ≠ 0. Therefore, A^⊤q ≠ 0.
Therefore, λ is an eigenvalue of A^⊤A with A^⊤q as the corresponding eigenvector.
• Let us now consider the case where λ ≠ 0 is an eigenvalue of A^⊤A. We want to show that λ is also an eigenvalue of AA^⊤.
Let λ ≠ 0 be an eigenvalue of A^⊤A and p be a corresponding eigenvector, i.e., (A^⊤A)p = λp. Then
$$(AA^\top)(Ap) = A(A^\top A)p = A(\lambda p) = \lambda(Ap).$$
Similar to above, we now need to show that Ap ≠ 0 before we can draw our conclusions. Assume Ap = 0. Then 0 = A^⊤(Ap) = (A^⊤A)p = λp with λ ≠ 0, which would imply p = 0. This contradicts our assumption that p is an eigenvector of A^⊤A. Therefore, Ap ≠ 0.
Therefore, λ ≠ 0 is an eigenvalue of AA^⊤, and a corresponding eigenvector is Ap.
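The first bullet can be illustrated numerically. The following sketch uses a hypothetical example matrix A ∈ ℝ^{2×3} (any choice works) and compares the non-zero eigenvalues of AA^⊤ and A^⊤A:

```python
import numpy as np

# Hypothetical example matrix; any A ∈ R^{m×n} works.
A = np.array([[1., 2., 3.],
              [4., 5., 6.]])

ev1 = np.linalg.eigvalsh(A @ A.T)   # eigenvalues of AA^T (2 of them)
ev2 = np.linalg.eigvalsh(A.T @ A)   # eigenvalues of A^T A (3 of them)

# Keep only the non-zero eigenvalues (up to numerical noise).
nz1 = np.sort(ev1[np.abs(ev1) > 1e-9])
nz2 = np.sort(ev2[np.abs(ev2) > 1e-9])
print(np.allclose(nz1, nz2))  # → True
```

Since A^⊤A is 3 × 3 but has rank 2 here, it carries one extra zero eigenvalue, which is exactly the difference the filtering step removes.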
Chapter 5
1. Consider
$$f := Ax, \quad A \in \mathbb{R}^{3\times 2}, \quad x \in \mathbb{R}^2,$$
and compute the partial derivative ∂f/∂A.
• We start by determining the dimension of the partial derivative. Knowing the dimensions of A and x, it follows that f ∈ ℝ³. Therefore, ∂f/∂A ∈ ℝ^{3×(3×2)}.
• We look at every element of f = [f₁, f₂, f₃]^⊤ and determine the corresponding partial derivatives. By definition,
$$f_i = \sum_{j=1}^{2} A_{ij} x_j$$
for i = 1, 2, 3. Therefore,
$$\frac{\partial f_i}{\partial A_{ij}} = x_j\,, \qquad \frac{\partial f_i}{\partial A_{kj}} = 0$$
for k ≠ i. This then gives
$$\frac{\partial f_1}{\partial A_{11}} = x_1, \quad \frac{\partial f_1}{\partial A_{12}} = x_2, \quad \frac{\partial f_1}{\partial A_{21}} = \frac{\partial f_1}{\partial A_{22}} = \frac{\partial f_1}{\partial A_{31}} = \frac{\partial f_1}{\partial A_{32}} = 0,$$
$$\frac{\partial f_2}{\partial A_{21}} = x_1, \quad \frac{\partial f_2}{\partial A_{22}} = x_2, \quad \frac{\partial f_2}{\partial A_{11}} = \frac{\partial f_2}{\partial A_{12}} = \frac{\partial f_2}{\partial A_{31}} = \frac{\partial f_2}{\partial A_{32}} = 0,$$
$$\frac{\partial f_3}{\partial A_{31}} = x_1, \quad \frac{\partial f_3}{\partial A_{32}} = x_2, \quad \frac{\partial f_3}{\partial A_{11}} = \frac{\partial f_3}{\partial A_{12}} = \frac{\partial f_3}{\partial A_{21}} = \frac{\partial f_3}{\partial A_{22}} = 0.$$
We now have all 18 entries that we need to construct our 3 × 3 × 2 partial derivative, which can be done in the following way (where we store the partial derivatives in df):
$$df[i, j, k] = \frac{\partial f_i}{\partial A_{jk}}.$$
From above, we see that
$$df[:, :, 1] = \frac{\partial f}{\partial A_{:,1}} = \begin{bmatrix} x_1 & 0 & 0 \\ 0 & x_1 & 0 \\ 0 & 0 & x_1 \end{bmatrix} \in \mathbb{R}^{3\times 3}, \qquad df[:, :, 2] = \frac{\partial f}{\partial A_{:,2}} = \begin{bmatrix} x_2 & 0 & 0 \\ 0 & x_2 & 0 \\ 0 & 0 & x_2 \end{bmatrix} \in \mathbb{R}^{3\times 3},$$
which is what we expect if we compute the partial derivative of a vector f ∈ ℝ³ with respect to a column vector A_{:,i} ∈ ℝ³ of matrix A.
• An alternative approach is to vectorize A, compute the partial derivatives, and then re-assemble them afterwards. Here, we define a vector
$$a = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_6 \end{bmatrix} := \begin{bmatrix} A_{11} \\ A_{21} \\ A_{31} \\ A_{12} \\ A_{22} \\ A_{32} \end{bmatrix} \in \mathbb{R}^6,$$
which consists of the stacked columns of A. Using this vector, we obtain the elements of f as
$$f_1 = a_1 x_1 + a_4 x_2, \quad f_2 = a_2 x_1 + a_5 x_2, \quad f_3 = a_3 x_1 + a_6 x_2.$$
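The tensor df can also be built and checked numerically. The following NumPy sketch uses hypothetical values for A and x (and 0-based indices instead of the 1-based ones above), and verifies the analytic result against a finite-difference approximation, which is exact up to rounding because f is linear in A:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])   # hypothetical A ∈ R^{3×2}
x = np.array([7., 8.])     # hypothetical x ∈ R^2

f = lambda A: A @ x

# Analytic result: df[i, j, k] = x[k] if i == j else 0 (0-based indices).
df = np.zeros((3, 3, 2))
for i in range(3):
    df[i, i, :] = x

# Finite-difference check: perturb one entry of A at a time.
eps = 1e-6
for j in range(3):
    for k in range(2):
        E = np.zeros_like(A)
        E[j, k] = eps
        approx = (f(A + E) - f(A)) / eps
        assert np.allclose(df[:, j, k], approx)

print(np.allclose(df[:, :, 0], x[0] * np.eye(3)))  # → True
```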
Chapter 6
Chapter 7