Let V be a subspace of R^n. To find the matrix of the orthogonal projection onto V, the way we
first discussed, takes three steps:
(1) Find a basis ~v1 , ~v2 , . . . , ~vm for V .
(2) Turn the basis ~vi into an orthonormal basis ~ui, using the Gram-Schmidt algorithm.
(3) Your answer is P = Σ_i ~ui ~ui^T. Note that this is an n × n matrix; we are multiplying a column
vector by a row vector instead of the other way around.
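If you are willing to trust NumPy for the arithmetic, the three steps can be sketched as follows (a minimal illustration, not part of the course materials; the function name is mine):

```python
import numpy as np

def projection_via_gram_schmidt(vectors):
    """Build P = sum of u_i u_i^T, where the u_i come from
    applying Gram-Schmidt to the given basis vectors."""
    us = []
    for v in vectors:
        u = v.astype(float)
        for prev in us:
            u = u - (prev @ u) * prev   # subtract the component along prev
        u = u / np.linalg.norm(u)       # normalize
        us.append(u)
    # Step (3): sum the outer products u_i u_i^T (each one is n x n)
    return sum(np.outer(u, u) for u in us)

P = projection_via_gram_schmidt([np.array([1, 2, 3, 4]),
                                 np.array([5, 6, 7, 8])])
```

Note that each `np.outer(u, u)` is the column-times-row product described in step (3).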
It is often better to combine steps (2) and (3). (Note that you still need to find a basis!) Here
is the result: Let A be the matrix with columns ~vi . Then
P = A(A^T A)^{-1} A^T
Your textbook states this formula without proof in Section 5.4, so I thought I'd write up the
proof. I urge you to also understand the other ways of dealing with orthogonal projection
that our book discusses, and not simply memorize the formula.
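The formula is also easy to evaluate numerically. Here is a minimal NumPy sketch (the variable names are mine), assuming the columns of A hold the basis vectors:

```python
import numpy as np

# Columns of A are the basis vectors; no orthonormalization is needed.
A = np.array([[1, 5],
              [2, 6],
              [3, 7],
              [4, 8]], dtype=float)

# P = A (A^T A)^{-1} A^T
P = A @ np.linalg.inv(A.T @ A) @ A.T
```

In serious numerical work one would prefer `np.linalg.solve(A.T @ A, A.T)` (or a QR factorization) over forming the inverse explicitly, but the line above matches the formula as written.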
Example
Let V be the span of the vectors (1 2 3 4)^T and (5 6 7 8)^T. These two vectors are linearly
independent (since they are not proportional), so
\[
A = \begin{pmatrix} 1 & 5 \\ 2 & 6 \\ 3 & 7 \\ 4 & 8 \end{pmatrix}.
\]
Then
\[
A^T A = \begin{pmatrix} 30 & 70 \\ 70 & 174 \end{pmatrix}, \qquad
(A^T A)^{-1} = \begin{pmatrix} 87/160 & -7/32 \\ -7/32 & 3/32 \end{pmatrix}.
\]
Note that A^T and A are not square, but the product A^T A is, so (A^T A)^{-1} makes sense.
Then we have
\[
A(A^T A)^{-1} A^T =
\begin{pmatrix} 1 & 5 \\ 2 & 6 \\ 3 & 7 \\ 4 & 8 \end{pmatrix}
\begin{pmatrix} 87/160 & -7/32 \\ -7/32 & 3/32 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \end{pmatrix}
=
\begin{pmatrix}
7/10 & 2/5 & 1/10 & -1/5 \\
2/5 & 3/10 & 1/5 & 1/10 \\
1/10 & 1/5 & 3/10 & 2/5 \\
-1/5 & 1/10 & 2/5 & 7/10
\end{pmatrix}.
\]
You can have fun checking that this is, indeed, the matrix of orthogonal projection onto V .
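One way to have that fun is numerically (a NumPy sketch; the test vector z is mine): the matrix should be symmetric, idempotent, fix the two spanning vectors, and kill anything orthogonal to them.

```python
import numpy as np

# The projection matrix from the example, with every entry over 10.
P = np.array([[ 7,  4,  1, -2],
              [ 4,  3,  2,  1],
              [ 1,  2,  3,  4],
              [-2,  1,  4,  7]]) / 10.0

v1 = np.array([1.0, 2.0, 3.0, 4.0])   # spans V together with v2
v2 = np.array([5.0, 6.0, 7.0, 8.0])
z  = np.array([1.0, -2.0, 1.0, 0.0])  # orthogonal to both v1 and v2
```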
Why is AT A invertible?
This formula only makes sense if A^T A is invertible. So let's prove that it is. Suppose, for the
sake of contradiction, that ~c = (c1, c2, . . . , cm) is a nonzero vector in the kernel of A^T A. Then
A^T A~c = ~0 and so
0 = ~c^T A^T A ~c = (A~c)^T (A~c) = |A~c|^2.
Since the only vector with length 0 is ~0, this shows that A~c = ~0.
But A~c = c1~v1 + c2~v2 + · · · + cm~vm, and we assumed that ~v1, ~v2, . . . , ~vm was a basis for V, so
there are no linear relations between the ~vi. So we can't have c1~v1 + c2~v2 + · · · + cm~vm = ~0. This
is our contradiction, and we deduce that the kernel of A^T A is trivial after all, so A^T A is invertible.
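The key identity in this proof, ~c^T (A^T A) ~c = |A~c|^2, can be seen concretely for the matrix A of the example (a small sketch; the test vector c is mine):

```python
import numpy as np

A = np.array([[1, 5],
              [2, 6],
              [3, 7],
              [4, 8]], dtype=float)

# The identity from the proof: c^T (A^T A) c = (Ac)^T (Ac) = |Ac|^2,
# so the quadratic form is zero only when Ac = 0.
c = np.array([2.0, -1.0])            # an arbitrary nonzero test vector
lhs = c @ (A.T @ A) @ c
rhs = np.linalg.norm(A @ c) ** 2
```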
Direct proof
Let's first check that P~w is the right thing in two particular cases.
Case 1: ~w is in V. In this case, ~w = A~c for some ~c. Then
P~w = A(A^T A)^{-1} A^T (A~c) = A(A^T A)^{-1} (A^T A)~c = A~c = ~w,
which is what we want.
Case 2: ~w is in V^⊥. Since V = Image(A), we have V^⊥ = Image(A)^⊥ = Ker(A^T), and we see
that A^T ~w = ~0. Then
P~w = A(A^T A)^{-1} A^T ~w = A(A^T A)^{-1} ~0 = ~0,
which is, again, what we want.
For a general ~w, write ~w = ~w1 + ~w2 where ~w1 is in V and ~w2 is in V^⊥. Then
P~w = P~w1 + P~w2 = ~w1 + ~0 = ~w1,
where we used Case 1 to compute P~w1 and Case 2 to compute P~w2. Taking ~w to ~w1 is, indeed,
exactly what orthogonal projection is supposed to do.
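This decomposition is easy to exhibit numerically (a NumPy sketch for the example above; the test vector w is mine): P~w gives the piece in V, and the remainder lands in V^⊥.

```python
import numpy as np

A = np.array([[1, 5],
              [2, 6],
              [3, 7],
              [4, 8]], dtype=float)
P = A @ np.linalg.inv(A.T @ A) @ A.T

w = np.array([1.0, 1.0, 0.0, 0.0])   # an arbitrary test vector
w1 = P @ w         # the component of w in V       (Case 1 applies to it)
w2 = w - w1        # the component of w in V-perp  (Case 2 applies to it)
```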