
Linear Algebra and Geometry 3

Inner product spaces, quadratic forms, and more advanced problem solving

Least squares, SVD, and pseudoinverse (Moore–Penrose inverse)

Hania Uscka-Wehlou, Ph.D. (2009, Uppsala University: Mathematics)
University teacher in mathematics, Sweden
Notation and terminology

pseudo-inverse or pseudoinverse

denoted A⁺ or A†
(Recall from the diagonal-matrix slides: the n-th power, n ∈ ℕ⁺, of a diagonal matrix A is the diagonal matrix with the powers aᵢᵢⁿ along the diagonal, and this generalises to products of any number of diagonal matrices.)

Video 88: Algorithm for inverse matrices, an example.
Extra material: notes with the motivation for the algorithm (for 2-by-2 and 3-by-3 inverse matrices).

Example: Compute the inverse of the matrix

    A = ⎡ 2 1 1 ⎤
        ⎢ 0 6 4 ⎥
        ⎣ 0 2 2 ⎦

The invertible matrix theorem

Let A be a square n-by-n matrix over ℝ. The following statements are equivalent (i.e., they are either all true or all false for any given matrix):

1. A is invertible; that is, A has an inverse (A is nonsingular).
2. There is an n-by-n matrix B such that AB = Iₙ = BA.
3. A is row-equivalent to the n-by-n identity matrix Iₙ.
4. A has n pivot positions (or: n leading 1's after row reduction to RREF).
5. A has full rank; that is, rank(A) = n.
6. The matrix A can be expressed as a finite product of elementary matrices.
7. The equation Ax = 0 has only the trivial solution x = 0.
8. The equation Ax = b has exactly one solution for some b in ℝⁿ.
9. The equation Ax = b has exactly one solution for each b in ℝⁿ.
10. det A ≠ 0.
11. The columns of A are linearly independent.
12. The columns of A span ℝⁿ.
13. Col(A) = ℝⁿ.
14. The columns of A form a basis of ℝⁿ.
15. The transpose Aᵀ is an invertible matrix (hence the rows of A are linearly independent, span ℝⁿ, and form a basis of ℝⁿ).
16. A is column-equivalent to the n-by-n identity matrix Iₙ.
17. The linear transformation mapping x to Ax is a bijection from ℝⁿ to ℝⁿ.
18. The kernel of A is trivial; that is, it contains only the null vector as an element: ker(A) = null(A) = {0}.
19. The number 0 is not an eigenvalue of A.
20. The matrix A has a left inverse (that is, there exists a B such that BA = I) or a right inverse (that is, there exists a C such that AC = I), in which case both left and right inverses exist and B = C = A⁻¹.

(Recall: λ is an eigenvalue of A exactly when the system (A − λI)v = 0 has infinitely many solutions, not only the trivial one v = 0; the eigenvectors corresponding to λ are the non-trivial solutions.)

Video 89: Matrix inverse, Problem 1.
Problem 1: Determine if the matrix A is invertible and, if it is invertible, compute its inverse. Solve the system of linear equations Ax = b for every RHS b = (b₁, b₂, b₃)ᵀ.
Extra material: notes with solved Problem 1.

Video 90: Matrix inverse, Problem 2.
Problem 2: Determine all the values of c for which the following matrix is invertible:

    A = ⎡ c c c ⎤
        ⎢ 1 c 1 ⎥
        ⎣ 0 0 c ⎦

Extra material: notes with solved Problem 2.

Video 91: Matrix equations, Problem 3.
Problem 3: Find the matrix A that solves the given matrix equation.
Extra material: notes with solved Problem 3.

Video 92: Matrix equations, Problem 4.
Problem 4: Find the matrix A that solves the given matrix equation.
Extra material: notes with solved Problem 4.
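Statement 10 of the theorem is the usual way to settle Problem 2: the matrix is invertible exactly where its determinant is non-zero. A minimal NumPy sketch; note that the matrix entries are reconstructed from a garbled slide, so treat them as an assumption:

```python
import numpy as np

def M(c):
    # Matrix of Problem 2 (entries reconstructed from the garbled slide).
    return np.array([[c, c, c],
                     [1.0, c, 1.0],
                     [0.0, 0.0, c]])

# Check det M(c) = c^2 (c - 1) on a grid of sample values ...
for c in np.linspace(-3, 3, 61):
    assert abs(np.linalg.det(M(c)) - c**2 * (c - 1)) < 1e-9

# ... so M(c) is invertible exactly when c is not 0 and not 1.
assert abs(np.linalg.det(M(0.0))) < 1e-12
assert abs(np.linalg.det(M(1.0))) < 1e-12
assert abs(np.linalg.det(M(2.0))) > 1e-9   # e.g. c = 2 gives det = 4
```

Expanding the determinant along the third row gives c·(c² − c) = c²(c − 1), which is what the grid check confirms numerically.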
In this section: A ∈ ℝ^(m×n)

Solve the system Ax = b by taking x = A⁻¹b.

Find the inverse T_A⁻¹ : ℝᵐ → ℝⁿ to T_A : ℝⁿ → ℝᵐ.

Problem: not all systems are consistent, and not all functions are invertible.
Video 188: Least Squares

    b − Ax̂ ∈ C⊥
    C⊥ = Null(Aᵀ)
    Aᵀ(b − Ax̂) = 0

The normal equation (Video 194):

    AᵀAx̂ = Aᵀb
    x̂ = (AᵀA)⁻¹Aᵀb   if the columns of A are linearly independent

By the Invertible Matrix Theorem applied to AᵀA: AᵀA is invertible exactly when the columns of A are linearly independent, which happens when all its eigenvalues are positive.

Only for "long" matrices (m ⩾ n): overdetermined systems.

Note: the nullspace of Aᵀ must be mapped onto the zero vector in ℝⁿ.
If all the columns of A are linearly independent, we can find an "inverse" (pseudoinverse) which gives us the unique least squares solution to the system Ax = b:

    A† = (AᵀA)⁻¹Aᵀ

Matrix A has full column rank.
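The full-column-rank formula can be checked numerically; a minimal NumPy sketch (the matrix and right-hand side are arbitrary illustration values):

```python
import numpy as np

# A "long" matrix (m >= n) with linearly independent columns (full column rank).
A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
b = np.array([1.0, 2.0, 3.0])

# Pseudoinverse via the full-column-rank formula: A+ = (A^T A)^{-1} A^T.
A_pinv = np.linalg.inv(A.T @ A) @ A.T

# The least squares solution x^ = A+ b solves the normal equation A^T A x^ = A^T b.
x_hat = A_pinv @ b
assert np.allclose(A.T @ A @ x_hat, A.T @ b)

# It agrees with NumPy's built-in least squares solver and pseudoinverse.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_hat, x_lstsq)
assert np.allclose(A_pinv, np.linalg.pinv(A))
```

The same `A_pinv` also reappears later as VΣ†Uᵀ; for full-column-rank matrices the two constructions coincide.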


The decomposition A = UΣVᵀ, written out block by block:

    A (m×n) = U (m×m) · Σ (m×n) · Vᵀ (n×n)

    U = ( u₁ | u₂ | … | uᵣ | uᵣ₊₁ | … | u_m )   with orthonormal (ON) columns,
    Σ has σ₁, …, σᵣ on the diagonal and zeros elsewhere,
    Vᵀ has orthonormal (ON) rows v₁ᵀ, v₂ᵀ, …, vₙᵀ.

Here uᵢ ∈ ℝᵐ, vᵢ ∈ ℝⁿ, σᵢ ∈ ℝ.

A suggestion: take A† = VΣ⁻¹Uᵀ.
Theorem (Singular Value Decomposition): Let A ∈ ℝ^(m×n) be a matrix with rank r. Then A can be written in the form

    A = UΣVᵀ

For the r×r block of singular values:

    Σᵣ = diag(σ₁, σ₂, …, σᵣ)   ⇒   Σᵣ† = Σᵣ⁻¹ = diag(1/σ₁, 1/σ₂, …, 1/σᵣ)

For the full m×n matrix Σ, with zero blocks padding Σᵣ:

    Σ = ⎡ Σᵣ 0 ⎤ (m×n)   ⇒   Σ† = ⎡ Σᵣ⁻¹ 0 ⎤ (n×m)
        ⎣ 0  0 ⎦               ⎣ 0    0 ⎦

A correction of the suggestion: take A† = VΣ†Uᵀ (the pseudoinverse).

Problem 3: Compute the pseudoinverse for the matrix

    A = ⎡ 1 1 ⎤
        ⎢ 0 1 ⎥
        ⎣ 1 0 ⎦

from V192, using two methods.
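Problem 3 can be spot-checked in NumPy; a sketch of both methods, the full-column-rank formula and the SVD route (the 1e-12 cutoff for "zero" singular values is an arbitrary choice):

```python
import numpy as np

# The matrix from Problem 3.
A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])

# Method 1: A has full column rank, so A+ = (A^T A)^{-1} A^T.
pinv1 = np.linalg.inv(A.T @ A) @ A.T

# Method 2: A = U Sigma V^T, then A+ = V Sigma+ U^T, where Sigma+ (n x m)
# has the reciprocals of the nonzero singular values on its diagonal.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
r = int(np.sum(s > 1e-12))             # rank = number of nonzero singular values
Sigma_pinv = np.zeros((A.shape[1], A.shape[0]))
Sigma_pinv[:r, :r] = np.diag(1.0 / s[:r])
pinv2 = Vt.T @ Sigma_pinv @ U.T

assert np.allclose(pinv1, pinv2)
assert np.allclose(pinv1, np.linalg.pinv(A))
```

Both methods produce the same 2×3 matrix, matching `np.linalg.pinv`.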

Pseudoinverse:

    A = UΣVᵀ (m×n)    A† = VΣ†Uᵀ (n×m)

    AA†A = (UΣVᵀ)(VΣ†Uᵀ)(UΣVᵀ) = UΣΣ†ΣVᵀ = UΣVᵀ = A

    A†AA† = (VΣ†Uᵀ)(UΣVᵀ)(VΣ†Uᵀ) = VΣ†ΣΣ†Uᵀ = VΣ†Uᵀ = A†

    AA† = (UΣVᵀ)(VΣ†Uᵀ) = UΣΣ†Uᵀ (m×m): projection matrix on Col A (see Video 190)

    A†A = (VΣ†Uᵀ)(UΣVᵀ) = VΣ†ΣVᵀ (n×n): projection matrix on Row A

Symmetric: because only symmetric matrices can be orthogonally diagonalized! See Video 146.
Video 126: The projection of x onto an orthogonal basis vector b_k:

    (x·b_k / ∥b_k∥²) b_k = (1/∥b_k∥²)(b_k b_kᵀ) x

Projection matrix:

    P = (1/∥b₁∥²)(b₁b₁ᵀ) + (1/∥b₂∥²)(b₂b₂ᵀ) + ⋯ + (1/∥bᵣ∥²)(bᵣbᵣᵀ)

Let M be a subspace of the Euclidean space Eⁿ, with dimension r < n. If {b₁, b₂, …, bᵣ} is an orthogonal basis for M and x is any vector in Eⁿ, then

    Px = proj_M x = ( (1/∥b₁∥²)(b₁b₁ᵀ) + (1/∥b₂∥²)(b₂b₂ᵀ) + ⋯ + (1/∥bᵣ∥²)(bᵣbᵣᵀ) ) x
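The sum-of-outer-products formula can be sketched as follows (the orthogonal basis vectors b1, b2 are an arbitrary example, not from the course):

```python
import numpy as np

# An orthogonal (not necessarily orthonormal) basis of a plane M in R^3.
b1 = np.array([1.0, 1.0, 0.0])
b2 = np.array([1.0, -1.0, 1.0])
assert abs(b1 @ b2) < 1e-12            # the basis really is orthogonal

# P = (1/||b1||^2) b1 b1^T + (1/||b2||^2) b2 b2^T  (sum of rank-1 outer products)
P = np.outer(b1, b1) / (b1 @ b1) + np.outer(b2, b2) / (b2 @ b2)

x = np.array([2.0, 0.0, 5.0])
proj = P @ x

# The projection lies in M and the residual x - Px is orthogonal to M.
assert np.allclose(P @ proj, proj)      # P is idempotent
assert abs((x - proj) @ b1) < 1e-9
assert abs((x - proj) @ b2) < 1e-9
```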

Problem 4: Determine the max and the min values of 2x₁ + 3x₂ − 2x₃ with the constraint x₁² + x₂² + (x₂ − x₃)² = 1 (i.e., on an ellipsoid), using the Cauchy–Schwarz inequality for the following inner product on ℝ³.
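The signs in Problem 4 are reconstructed from a garbled source, so treat the objective and constraint as assumptions. Under that reading, the substitution y = (x₁, x₂, x₂ − x₃) turns the constraint into the unit sphere and the objective into 2y₁ + y₂ + 2y₃, so Cauchy–Schwarz bounds it by ∥(2, 1, 2)∥ = 3. A numerical sanity check of that bound:

```python
import numpy as np

rng = np.random.default_rng(1)

# Sample points on the constraint set x1^2 + x2^2 + (x2 - x3)^2 = 1
# via y = (x1, x2, x2 - x3) on the unit sphere.
y = rng.standard_normal((100000, 3))
y /= np.linalg.norm(y, axis=1, keepdims=True)
x1, x2, x3 = y[:, 0], y[:, 1], y[:, 1] - y[:, 2]

f = 2 * x1 + 3 * x2 - 2 * x3           # objective; equals 2 y1 + y2 + 2 y3

# Cauchy-Schwarz bound: |2 y1 + y2 + 2 y3| <= ||(2, 1, 2)|| = 3.
assert np.all(np.abs(f) <= 3 + 1e-9)
assert f.max() > 2.99 and f.min() < -2.99   # the bound is (nearly) attained
```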
Video 67: Inner and outer products. Let

    u = (u₁, u₂, u₃, u₄)ᵀ,   v = (v₁, v₂, v₃, v₄)ᵀ

Inner product (1×n times n×1 — a scalar):

    uᵀv = u·v = u₁v₁ + u₂v₂ + u₃v₃ + u₄v₄

Outer product (n×1 times 1×n — a matrix with rank 1; scaled outer products build the projection matrix on u):

    uvᵀ = ⎡ u₁v₁ u₁v₂ u₁v₃ u₁v₄ ⎤
          ⎢ u₂v₁ u₂v₂ u₂v₃ u₂v₄ ⎥
          ⎢ u₃v₁ u₃v₂ u₃v₃ u₃v₄ ⎥
          ⎣ u₄v₁ u₄v₂ u₄v₃ u₄v₄ ⎦
Video 135: Least squares solution to Ax = b

C = Col(A) is a finite-dimensional subspace of ℝᵐ. If b ∈ ℝᵐ, then proj_C b is the best approximation to b from C in the sense that

    ∥b − proj_C b∥ ⩽ ∥b − u∥

for every vector u ∈ C. Each such u equals Ax for some x ∈ ℝⁿ, and Ax̂ = u = proj_C b; writing C⊥ = Null(Aᵀ) and v = proj_{C⊥} b, we get

    ∥b − Ax̂∥ ⩽ ∥b − Ax∥   for all x ∈ ℝⁿ.
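The best-approximation property can be spot-checked numerically; a minimal sketch comparing the least squares residual against random competitors (sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3))        # overdetermined: 6 equations, 3 unknowns
b = rng.standard_normal(6)

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
best = np.linalg.norm(b - A @ x_hat)   # ||b - A x^||, with A x^ = proj_C b

# ||b - A x^|| <= ||b - A x|| for every x in R^n:
for _ in range(1000):
    x = rng.standard_normal(3)
    assert best <= np.linalg.norm(b - A @ x) + 1e-12
```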

Vector x* = A†b is an exact solution to Ax = AA†b, so it is a least squares solution to Ax = b.



If Ax = b has a solution, then x* = A†b is the solution with minimal norm.

Because the system is consistent, b ∈ Col A, so b = AA†b (AA† is the projection matrix on Col A), and therefore x* is a solution.

Does it have the least norm?

Let x ∈ ℝⁿ be any solution, i.e., Ax = b. Consider its orthogonal decomposition via A†A (the projection on the row space of A):

    x = (A†A)x + (I − A†A)x = A†b + (I − A†A)x

The Pythagorean Theorem gives us:

    ∥x∥² = ∥A†b∥² + ∥(I − A†A)x∥² ⩾ ∥A†b∥²

This shows that

    ∥x∥ ⩾ ∥A†b∥

for all solutions x to the system Ax = b.
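A quick numerical check of the minimal-norm claim (matrix sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 4))        # "wide" matrix, rank 2
b = A @ rng.standard_normal(4)         # b in Col A, so Ax = b is consistent

A_pinv = np.linalg.pinv(A)
x_star = A_pinv @ b
assert np.allclose(A @ x_star, b)      # x* = A+ b really solves the system

P_row = A_pinv @ A                     # A+ A: projection onto Row A
for _ in range(1000):
    z = rng.standard_normal(4)
    z = z - P_row @ z                  # (I - A+ A) z lies in Null(A)
    x = x_star + z                     # another exact solution
    assert np.allclose(A @ x, b)
    assert np.linalg.norm(x) >= np.linalg.norm(x_star) - 1e-9
```

Every other solution differs from x* by a null-space component, and the Pythagorean relation makes it at least as long, exactly as in the proof above.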


Summary

Pseudoinverse: A† ∈ ℝ^(n×m) for a matrix A ∈ ℝ^(m×n); if A maps ℝⁿ → ℝᵐ, then A† maps ℝᵐ → ℝⁿ.

Properties: AA†A = A, A†AA† = A†, and both AA† and A†A are symmetric.

Existence: A† always exists and is unique.

Computing:
if A has full column rank: A† = (AᵀA)⁻¹Aᵀ
if A is "diagonal": A† ∈ ℝ^(n×m) is also "diagonal", with the reciprocals of the non-zero diagonal entries of A
in general: A† = VΣ†Uᵀ if A = UΣVᵀ

More properties:
AA† is an orthogonal projection matrix onto Col A
A†A is an orthogonal projection matrix onto Row A
A†b is a least squares solution to Ax = b
if the system Ax = b is consistent, then A†b is its solution with the smallest norm.
