
The formula for the orthogonal projection

Let V be a subspace of R^n. To find the matrix of the orthogonal projection onto V, the way we
first discussed, takes three steps:
(1) Find a basis ~v1, ~v2, . . . , ~vm for V.
(2) Turn the basis ~vi into an orthonormal basis ~u1, ~u2, . . . , ~um, using the Gram-Schmidt algorithm.
(3) Your answer is P = \sum_i ~ui ~ui^T. Note that this is an n × n matrix: we are multiplying a column
vector by a row vector, instead of the other way around.
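(If you like to experiment, here is a minimal numpy sketch of steps (2) and (3). The basis vectors are a hypothetical choice, namely the ones from the example below, and the variable names are mine; any linearly independent list would work the same way.)

```python
import numpy as np

# A hypothetical basis for V; these happen to be the vectors from the example below.
basis = [np.array([1., 2., 3., 4.]), np.array([5., 6., 7., 8.])]

# Step (2): Gram-Schmidt: subtract the components along the earlier u's, then normalize.
us = []
for v in basis:
    w = v - sum(np.dot(v, u) * u for u in us)
    us.append(w / np.linalg.norm(w))

# Step (3): P is the sum of the outer products u_i u_i^T, an n x n matrix.
P = sum(np.outer(u, u) for u in us)
```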
It is often better to combine steps (2) and (3). (Note that you still need to find a basis!) Here
is the result: Let A be the matrix with columns ~vi . Then

P = A(A^T A)^{-1} A^T

Your textbook states this formula without proof in Section 5.4, so I thought I’d write up the
proof. I urge you to also understand the other ways of dealing with orthogonal projection
that our book discusses, and not simply memorize the formula.
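The combined formula is just as short in code. Here is the same hedged sketch, now building A column by column and applying the formula directly; up to rounding error it agrees with the Gram-Schmidt construction sketched above.

```python
import numpy as np

# The same hypothetical basis, now placed as the columns of A.
A = np.array([[1., 5.],
              [2., 6.],
              [3., 7.],
              [4., 8.]])

# The combined formula: P = A (A^T A)^{-1} A^T.
P = A @ np.linalg.inv(A.T @ A) @ A.T
```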

Example
Let V be the span of the vectors (1 2 3 4)^T and (5 6 7 8)^T. These two vectors are linearly
independent (since they are not proportional), so
\[
A = \begin{pmatrix} 1 & 5 \\ 2 & 6 \\ 3 & 7 \\ 4 & 8 \end{pmatrix}.
\]
Then
\[
A^T A = \begin{pmatrix} 30 & 70 \\ 70 & 174 \end{pmatrix},
\qquad
(A^T A)^{-1} = \begin{pmatrix} \tfrac{87}{160} & -\tfrac{7}{32} \\ -\tfrac{7}{32} & \tfrac{3}{32} \end{pmatrix}.
\]
Note that A^T and A are not square, but the product A^T A is, so (A^T A)^{-1} makes sense.
Then we have
\[
A(A^T A)^{-1} A^T =
\begin{pmatrix} 1 & 5 \\ 2 & 6 \\ 3 & 7 \\ 4 & 8 \end{pmatrix}
\begin{pmatrix} \tfrac{87}{160} & -\tfrac{7}{32} \\ -\tfrac{7}{32} & \tfrac{3}{32} \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \end{pmatrix}
=
\begin{pmatrix}
\tfrac{7}{10} & \tfrac{2}{5} & \tfrac{1}{10} & -\tfrac{1}{5} \\
\tfrac{2}{5} & \tfrac{3}{10} & \tfrac{1}{5} & \tfrac{1}{10} \\
\tfrac{1}{10} & \tfrac{1}{5} & \tfrac{3}{10} & \tfrac{2}{5} \\
-\tfrac{1}{5} & \tfrac{1}{10} & \tfrac{2}{5} & \tfrac{7}{10}
\end{pmatrix}.
\]
You can have fun checking that this is, indeed, the matrix of orthogonal projection onto V .
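If you would rather let the computer have the fun, here is one way the check might go. The assertions test symmetry, idempotence (P^2 = P), and that P fixes the columns of A; this is a sanity check of the example, not a proof.

```python
import numpy as np

A = np.array([[1., 5.], [2., 6.], [3., 7.], [4., 8.]])
P = A @ np.linalg.inv(A.T @ A) @ A.T

assert np.allclose(P, P.T)      # P is symmetric
assert np.allclose(P @ P, P)    # P is idempotent: projecting twice changes nothing
assert np.allclose(P @ A, A)    # P fixes both spanning vectors (the columns of A)
```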

Why is A^T A invertible?
This formula only makes sense if A^T A is invertible. So let’s prove that it is. Suppose, for the
sake of contradiction, that ~c = (c1, c2, . . . , cm) is a nonzero vector in the kernel of A^T A. Then
A^T A~c = ~0 and so
0 = ~c^T A^T A ~c = (A~c)^T (A~c) = |A~c|^2.
Since the only vector with length 0 is ~0, this shows that A~c = ~0.
But A~c = c1~v1 + c2~v2 + · · · + cm~vm, and we assumed that ~v1, ~v2, . . . , ~vm was a basis for V, so
there are no linear relations among the ~vi. So we can’t have c1~v1 + c2~v2 + · · · + cm~vm = ~0 with ~c nonzero. This
is our contradiction, and we deduce that the kernel of A^T A is trivial after all, so A^T A is invertible.
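The identity ~c^T A^T A ~c = |A~c|^2 that drives this argument is easy to spot-check numerically; the matrix and the test vector below are arbitrary choices of mine, not anything from the textbook.

```python
import numpy as np

A = np.array([[1., 5.], [2., 6.], [3., 7.], [4., 8.]])
c = np.array([2., -1.])  # an arbitrary test vector

lhs = c @ (A.T @ A) @ c        # c^T (A^T A) c
rhs = np.dot(A @ c, A @ c)     # |A c|^2
assert np.isclose(lhs, rhs)    # the two sides agree, and both are >= 0
```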
Direct proof
Let’s first check that P~w is the right thing in two particular cases.
Case 1: ~w is in V. In this case, ~w = A~c for some ~c. Then
P~w = A(A^T A)^{-1} A^T (A~c) = A(A^T A)^{-1} (A^T A)~c = A~c = ~w,
which is what we want.
Case 2: ~w is in V^⊥. Since V = Image(A), we have V^⊥ = Image(A)^⊥ = Ker(A^T), and we see
that A^T ~w = ~0. Then
P~w = A(A^T A)^{-1} A^T ~w = A(A^T A)^{-1} ~0 = ~0,
which is, again, what we want.
For a general ~w, write ~w = ~w1 + ~w2 where ~w1 is in V and ~w2 is in V^⊥. Then
P~w = P~w1 + P~w2 = ~w1 + ~0 = ~w1,
where we used Case 1 to compute P~w1 and Case 2 to compute P~w2. Taking ~w to ~w1 is, indeed,
exactly what orthogonal projection is supposed to do.
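Here is a quick numerical illustration of both cases, using the example matrix from earlier. The vector (1, -2, 1, 0)^T is one particular choice orthogonal to both columns of A, so it lies in V^⊥; nothing about it is special beyond that.

```python
import numpy as np

A = np.array([[1., 5.], [2., 6.], [3., 7.], [4., 8.]])
P = A @ np.linalg.inv(A.T @ A) @ A.T

w1 = A @ np.array([3., -2.])        # Case 1: some vector in V (any A @ c will do)
w2 = np.array([1., -2., 1., 0.])    # Case 2: orthogonal to both columns of A, so in V-perp

assert np.allclose(P @ w1, w1)           # vectors in V are left alone
assert np.allclose(P @ w2, np.zeros(4))  # vectors in V-perp are sent to 0
```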

Proof using orthogonal bases


Our vectors ~v1, ~v2, . . . , ~vm are a basis for V, but not an orthonormal basis. However, V does
have an orthonormal basis. Let ~u1, ~u2, . . . , ~um be this basis. Let’s write Q for the matrix whose
columns are the ~ui. The condition that the ~ui are orthonormal is the same as Q^T Q = Id_m. Let
~vj = \sum_i R_{ij} ~ui. We can rewrite this using matrices: A = QR.
I claim that R is invertible. Notice that R is square (it’s m × m). If R~x = ~0 for some nonzero ~x, then A~x =
QR~x = ~0. But the columns of A are linearly independent, so A is injective, a contradiction. Since
Rank(R) = Rank(R^T), this also shows that R^T is invertible.
Now we plug in:
P = A(A^T A)^{-1} A^T
  = (QR)((QR)^T (QR))^{-1} (QR)^T
  = QR(R^T Q^T QR)^{-1} R^T Q^T
  = QR(R^T R)^{-1} R^T Q^T            because Q^T Q = Id
  = QR R^{-1} (R^T)^{-1} R^T Q^T      we already argued that R and R^T are invertible
  = QQ^T
As your textbook explains (Theorem 5.3.10), when the columns of Q are an orthonormal basis
of V, then QQ^T is the matrix of orthogonal projection onto V.
Note that we needed to argue that R and R^T were invertible before using the formula (R^T R)^{-1} =
R^{-1}(R^T)^{-1}. By contrast, A and A^T are not invertible (they’re not even square), so it doesn’t make
sense to write (A^T A)^{-1} = A^{-1}(A^T)^{-1}.
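As a final check, numpy’s QR factorization produces a Q with orthonormal columns and a square invertible R whenever A has linearly independent columns, so you can verify the conclusion P = QQ^T numerically; the matrix below is again the one from the example.

```python
import numpy as np

A = np.array([[1., 5.], [2., 6.], [3., 7.], [4., 8.]])

# numpy's reduced QR factorization: Q is 4 x 2 with orthonormal columns, R is 2 x 2.
Q, R = np.linalg.qr(A)

assert np.allclose(A, Q @ R)                                   # A = QR
assert np.allclose(Q.T @ Q, np.eye(2))                         # Q^T Q = Id_2
assert np.allclose(Q @ Q.T, A @ np.linalg.inv(A.T @ A) @ A.T)  # Q Q^T is the projection P
```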
