
MA 106: Linear Algebra

Lecture 18

Prof. B.V. Limaye


IIT Dharwad

Thursday, 15 February 2018



Orthogonal Projection
Let K denote either R or C. Recall that in Lecture 13, we
have defined the (perpendicular) projection of x ∈ Kn×1 in the
direction of nonzero y ∈ Kn×1 as follows:
Py(x) := (⟨y, x⟩/⟨y, y⟩) y.
In particular, if y is a unit vector, then Py(x) = ⟨y, x⟩y.
We noted that the vector Py(x) is a scalar multiple of the
vector y, and proved the important relation
(x − Py(x)) ⊥ y.
As a consequence,
∥x − Py(x)∥² = ⟨x − Py(x), x − Py(x)⟩
             = ⟨x, x − Py(x)⟩
             = ∥x∥² − ⟨x, Py(x)⟩.
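
As a quick aside (not part of the lecture), here is a minimal
Python/NumPy sketch of this formula; np.vdot computes ⟨y, x⟩ with
the conjugation in the first argument, as in our convention.

import numpy as np

def proj(y, x):
    # Py(x) = (<y, x>/<y, y>) y, with <y, x> conjugate-linear in the
    # first argument and linear in the second.
    return (np.vdot(y, x) / np.vdot(y, y)) * y

x = np.array([3.0, 4.0])
y = np.array([1.0, 1.0])
p = proj(y, x)
print(p)                    # [3.5 3.5]
print(np.vdot(y, x - p))    # 0.0, confirming (x - Py(x)) is orthogonal to y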
More generally, let Y be a nonzero subspace of Kn×1 . We
would like to find a (perpendicular) projection of x ∈ Kn×1 into
Y , that is, we want to find y ∈ Y such that (x − y) ∈ Y ⊥ .
(This y is ‘the foot of the perpendicular’ from x into Y .)
If u1 , . . . , uk is an orthonormal basis for the subspace Y , then
a vector belongs to Y ⊥ if and only if it is orthogonal to each uj
for j = 1, . . . , k. As we saw while studying G-S OP, the vector

ỹ := x − Pu1 (x) − · · · − Puk (x) = x − ⟨u1 , x⟩u1 − · · · − ⟨uk , x⟩uk

is orthogonal to each uj for j = 1, . . . , k, and so ỹ ∈ Y ⊥ .


Since the vector y := ⟨u1 , x⟩u1 + · · · + ⟨uk , x⟩uk belongs to
Y , it is a (perpendicular) projection of x in Y .
The following result shows that this is the only vector in Y
that works!
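
As an aside, a minimal Python/NumPy sketch (my own illustration,
not from the slides) of this recipe: the columns of U below are
assumed to form an orthonormal basis of Y.

import numpy as np

def project_onto_subspace(U, x):
    # The columns of U are assumed to be an orthonormal basis u1, ..., uk of Y.
    # Then y = <u1, x> u1 + ... + <uk, x> uk = U (U* x), and x - y lies in Y-perp.
    y = U @ (U.conj().T @ x)
    return y, x - y

# Y = span of the orthonormal vectors (1, 0, 0) and (0, 1, 1)/sqrt(2) in R^3
U = np.array([[1.0, 0.0],
              [0.0, 1.0 / np.sqrt(2)],
              [0.0, 1.0 / np.sqrt(2)]])
y, y_tilde = project_onto_subspace(U, np.array([1.0, 2.0, 4.0]))
print(y)                       # [1. 3. 3.]
print(U.conj().T @ y_tilde)    # approximately [0. 0.]: y_tilde is orthogonal to each uj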



Proposition (Projection Theorem)
Let Y be a subspace of Kn×1 . Then for every x ∈ Kn×1 , there
are unique y ∈ Y and ỹ ∈ Y ⊥ such that x = y + ỹ, that is,
Kn×1 = Y ⊕ Y ⊥ . The map PY : Kn×1 → Kn×1 given by
PY (x) = y is linear and satisfies (PY )2 = PY .
In fact, if u1 , . . . , uk is an orthonormal basis for Y , then

PY (x) = ⟨u1 , x⟩u1 + · · · + ⟨uk , x⟩uk and ỹ = x − y.

Proof. If Y = {0}, then Y ⊥ = Kn×1 , and so every x ∈ Kn×1 can be written as x = 0 + x.


Suppose Y ̸= {0}, and let u1 , . . . , uk be an orthonormal basis
for Y . For x ∈ Kn×1 , define

PY (x) := ⟨u1 , x⟩u1 + · · · + ⟨uk , x⟩uk .

Clearly, y := PY (x) ∈ Y . Define ỹ := x − PY (x). Then


⟨uj, ỹ⟩ = ⟨uj, x − PY(x)⟩
        = ⟨uj, x⟩ − ∑_{ℓ=1}^{k} ⟨uℓ, x⟩⟨uj, uℓ⟩
        = ⟨uj, x⟩ − ⟨uj, x⟩ = 0
for j = 1, . . . , k by orthonormality. Hence ỹ ∈ Y ⊥ . Thus
x = y + ỹ with y ∈ Y and ỹ ∈ Y ⊥ . This proves existence.
To prove uniqueness, let z ∈ Y and z̃ ∈ Y ⊥ be such that
x = z + z̃. Then (y − z) ∈ Y and also y − z = (z̃ − ỹ) ∈ Y ⊥ ,
so that (y − z) ∈ Y ∩ Y ⊥ . Hence (y − z) ⊥ (y − z), and so
y − z = 0. Thus z = y, and in turn, z̃ = ỹ.
The map PY is linear since the inner product is linear in the
second variable. Also, PY (uj ) = uj for all j = 1, . . . , k. Hence
PY (x) = x if and only if x ∈ Y . As a consequence,
PY2 (x) = PY (PY (x)) = PY (x) for all x ∈ Kn×1 .
Let Y be a subspace of Kn×1 . Then the linear map
PY : Kn×1 → Kn×1 whose existence and uniqueness is proved
in the above result is called the orthogonal projection of
Kn×1 onto the subspace Y .
Given x ∈ Kn×1 , we shall now use the orthogonal projection
onto Y to find a vector in Y which is closest to x.
Let E be a nonempty subset of Kn×1 and let x ∈ Kn×1 .
A best approximation to x from E is an element y0 ∈ E
such that ∥x − y0 ∥ ≤ ∥x − y∥ for all y ∈ E.
A best approximation to a vector from a subset of vectors may
not exist, and even if it exists, it may not be unique. In this
light, the following result is noteworthy.



Proposition
Let Y be a subspace of Kn×1 and let x ∈ Kn×1 . Then there is
a unique best approximation to x from Y , namely, PY (x).
Further, PY (x) is the unique element of Y such that
x − PY (x) is orthogonal to Y . Also, the square of the distance
from x to its best approximation from Y is
∥x − PY (x)∥2 = ∥x∥2 − ⟨x, PY (x)⟩.
Proof. We note that PY(x) ∈ Y and (x − PY(x)) ∈ Y ⊥ .
Let y ∈ Y . Then PY(x) − y also belongs to Y . Hence by the
Pythagoras theorem,
∥x − y∥² = ∥(x − PY(x)) + (PY(x) − y)∥²
         = ∥x − PY(x)∥² + ∥PY(x) − y∥²
         ≥ ∥x − PY(x)∥²,
where equality holds if and only if y = PY (x). This shows that
PY (x) is the unique best approximation to x from Y .
Further, let y ∈ Y be such that (x − y) ∈ Y ⊥ . Then
x = y + (x − y), and since Kn×1 = Y ⊕ Y ⊥ , it follows that
y = PY (x).
Finally, since PY(x) ∈ Y and (x − PY(x)) ∈ Y ⊥ ,
∥x − PY(x)∥² = ⟨x − PY(x), x − PY(x)⟩
             = ⟨x, x − PY(x)⟩
             = ∥x∥² − ⟨x, PY(x)⟩.

[Figure: a vector x, its orthogonal projection PY(x) lying in the subspace Y, and the perpendicular x − PY(x) from x to Y.]
Remark
As we have seen before, if u1 , . . . , uk is an orthonormal basis
for Y , then PY (x) = ⟨u1 , x⟩u1 + · · · + ⟨uk , x⟩uk , and so


∥x − PY(x)∥² = ∥x∥² − ⟨x, PY(x)⟩ = ∥x∥² − ∑_{j=1}^{k} |⟨x, uj⟩|².

We now give an application of the above considerations to a
linear system Ax = b, where A ∈ Km×n and b ∈ Km×1 .
Let A = [c1 · · · cn] in terms of its n columns. Now the
above system has a solution x := [x1 · · · xn]^T if and only if
x1 c1 + · · · + xn cn = b, that is, b belongs to the column space
C(A) of A.
Suppose b ̸∈ C(A). We would like to find the best
approximation to b from the subspace C(A) of Km×1 .
Starting with the first column c1 of A, and using the G-S OP,
we find an ordered orthonormal basis (u1 , . . . , uk ) of C(A),
where k ≤ n. The best approximation a to b from C(A) is

a := P_{C(A)}(b) = ∑_{j=1}^{k} ⟨uj, b⟩ uj = ∑_{j=1}^{k} (uj∗ b) uj.
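
As an aside, a hedged Python/NumPy sketch of this recipe (my own,
not from the slides): np.linalg.qr orthonormalizes the columns of A
by a Gram-Schmidt-like process, and, provided the columns of A are
linearly independent, the resulting Q gives the projection onto C(A).

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # random 5x3 matrix with (almost surely) independent columns
b = rng.standard_normal(5)

# Reduced QR factorization: the columns of Q form an orthonormal basis of C(A).
Q, R = np.linalg.qr(A)
a = Q @ (Q.conj().T @ b)          # a = sum_j <uj, b> uj, the best approximation to b from C(A)
print(A.conj().T @ (b - a))       # approximately [0 0 0]: the residual is orthogonal to C(A)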

Example
Let A := [1 1; 1 0; 0 1] (rows separated by semicolons) and
b := [1 0 −5]^T. Then C(A) is the span of
the 2 column vectors [1 1 0]^T and [1 0 1]^T of A. Then
u1 := [1/√2  1/√2  0]^T and u2 := [1/√6  −1/√6  2/√6]^T
form an orthonormal basis for C(A). Hence the best
approximation a to b from C(A) is
(u1∗ b)u1 + (u2∗ b)u2 = (1/2)[1 1 0]^T − (3/2)[1 −1 2]^T = [−1 2 −3]^T.
Let us calculate the distance from b to its best approximation
a from C(A). Clearly, it is equal to
∥b − a∥ = ∥[1 0 −5]^T − [−1 2 −3]^T∥ = ∥[2 −2 −2]^T∥ = 2√3.
Also, according to an earlier formula, the square of this
distance is equal to
∥b∥² − |⟨b, u1⟩|² − |⟨b, u2⟩|² = 26 − 1/2 − 27/2 = 12,
so that the desired distance is equal to √12 = 2√3, as
obtained above.
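
This example can be checked numerically; here is a small
Python/NumPy sketch (my own, not from the slides) reproducing a
and the distance in both ways.

import numpy as np

# Orthonormal basis of C(A) from the example above, and the vector b.
u1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
u2 = np.array([1.0, -1.0, 2.0]) / np.sqrt(6)
b = np.array([1.0, 0.0, -5.0])

# Best approximation a = (u1* b) u1 + (u2* b) u2
a = np.vdot(u1, b) * u1 + np.vdot(u2, b) * u2
print(a)    # [-1.  2. -3.]

# The distance from b to a, computed directly and via the earlier formula
print(np.linalg.norm(b - a))                                                       # 2*sqrt(3) = 3.4641...
print(np.sqrt(np.vdot(b, b) - abs(np.vdot(u1, b))**2 - abs(np.vdot(u2, b))**2))    # the same value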



We now explain an alternative approach for finding the best
approximation y0 to b from C(A). It avoids the G-S OP.
Since y0 ∈ C(A), there is x0 ∈ Kn×1 such that y0 = Ax0 . Now
by the previous proposition, y0 is the unique element of C(A)
such that (b − y0 ) ∈ C(A)⊥ , that is, c∗j (b − y0 ) = 0 for each
j = 1, . . . , n, which means A∗ (b − y0 ) = 0.
Thus for x ∈ Kn×1 , A∗ (b − Ax) = 0 ⇐⇒ y0 = Ax. Hence
y0 can be obtained by finding a solution x of the normal
equations A∗A x = A∗b, and then letting y0 := Ax.
This system of n equations in n unknowns always has a
solution, namely x0 . Suppose rank A = n, that is, the columns
of the matrix A are linearly independent. Then rank A∗ A = n,
and the unique solution of the normal equations is given by
x := (A∗ A)−1 A∗ b. In this case, the best approximation to b
from C(A) is y0 := Ax = A(A∗ A)−1 A∗ b. In particular, if m = n,
then A is invertible, and y0 = b since b ∈ C(A).
On the other hand, suppose rank A < n, that is, the columns
of A are linearly dependent. Then rank A∗ A < n, and the
normal equations have infinitely many solutions, but if x is any
one of them, then Ax = y0 .
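
A minimal Python/NumPy sketch of this approach (my own
illustration): when A has full column rank the normal equations are
solved directly; otherwise np.linalg.lstsq returns one least squares
solution x, and Ax is still the best approximation y0. The matrix
and vector of the next example are used as a check.

import numpy as np

def best_approximation(A, b):
    # Returns y0 = A x, where x solves the normal equations A* A x = A* b.
    if np.linalg.matrix_rank(A) == A.shape[1]:
        x = np.linalg.solve(A.conj().T @ A, A.conj().T @ b)
    else:
        # Rank-deficient case: np.linalg.lstsq returns one least squares solution.
        x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return A @ x

A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 0.0, -5.0])
print(best_approximation(A, b))    # [-1.  2. -3.]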
Example
Let A := [1 1; 1 0; 0 1] ∈ K3×2 and b := [1 0 −5]^T ∈ K3×1 . We find
the best approximation to b from the column space of A using
normal equations. Here
A∗A = [2 1; 1 2] and A∗b = [1 −4]^T.
Now the normal equations A∗Ax = A∗b, that is,
[2 1; 1 2] [x1 x2]^T = [1 −4]^T, have a unique solution, namely
x = [x1 x2]^T = [2 −3]^T. Hence Ax = [−1 2 −3]^T as before.
   
Next, let B := [1 1 1; 1 0 1/2; 0 1 1/2] and b := [1 0 −5]^T. Then C(B) = C(A)
since the third column of B is the average of the first two
columns of A, and so the best approximation to b from C(B)
is the same column vector [−1 2 −3]^T as before. But here
B∗B = [2 1 3/2; 1 2 3/2; 3/2 3/2 3/2] and B∗b = [1 −4 −3/2]^T.
Now the normal equations B∗Bx = B∗b have many solutions,
such as [2 −3 0]^T or [3 −2 −2]^T, since the columns of
B are linearly dependent. However,
B [2 −3 0]^T = B [3 −2 −2]^T = [−1 2 −3]^T as before.
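
A short Python/NumPy check of this rank-deficient example (my own):
np.linalg.lstsq returns one least squares solution of the normal
equations, and Bx is the same projection for every such solution.

import numpy as np

B = np.array([[1.0, 1.0, 1.0],
              [1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5]])
b = np.array([1.0, 0.0, -5.0])

x, *_ = np.linalg.lstsq(B, b, rcond=None)    # one least squares solution (the minimum-norm one)
print(B @ x)                                 # [-1.  2. -3.]
print(B @ np.array([2.0, -3.0, 0.0]))        # the same projection from another solution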



Least Squares Approximation
Suppose a large number of data points (a1 , b1 ), . . . , (an , bn ) in
R2 are given, and we want to find a straight line passing
through these points. If all these points are collinear, then we
may join any two of them by a straight line, which will work
for us. If, however, these points are not collinear (which is
most often the case), then we would like to find a straight line
t = x1 + s x2 which will be most suitable in the following sense.
Consider the ‘error’ |x1 + aj x2 − bj | caused by the straight
line not passing through the data point (aj , bj ) for
j = 1, . . . , n. We may collect these errors and attempt to
minimize the least squares error given by
(∑_{j=1}^{n} |x1 + aj x2 − bj |²)^{1/2}.
This is known as the least squares problem.
To solve this problem, let
A := [1 a1; 1 a2; . . . ; 1 an],  x := [x1 x2]^T  and  b := [b1 b2 · · · bn]^T.
Then Ax = [x1 + a1 x2   x1 + a2 x2   · · ·   x1 + an x2]^T.
The least squares problem is to find x := [x1 x2]^T such that
∥Ax − b∥² = ∑_{j=1}^{n} |x1 + aj x2 − bj |²
is minimised, that is, to find the best approximation to the
vector b from the column space C(A) of A.
As we have seen, this is done either by orthonormalizing the
columns of A, or by finding a vector x0 := [x1 x2]^T such that
b − Ax0 is orthogonal to C(A) using the normal equations.
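
A minimal Python/NumPy sketch of the normal equations route for
the straight-line fit (my own illustration, with made-up data):

import numpy as np

def fit_line(a, b):
    # a, b: arrays of the data (a_j, b_j); returns (x1, x2) for the line t = x1 + s*x2
    # by solving the normal equations A^T A x = A^T b.
    A = np.column_stack([np.ones_like(a), a])
    return np.linalg.solve(A.T @ A, A.T @ b)

a = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([1.0, 3.0, 5.0, 6.0])
print(fit_line(a, b))    # [1.2 1.7], i.e. t = 1.2 + 1.7 s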
Instead of fitting a straight line t = x1 + s x2 to the data
points, we may attempt to fit a curve given by a polynomial
t = x1 + s x2 + s² x3 + · · · + s^k xk+1 of degree k ∈ N, to the
data points (a1 , b1 ), . . . , (an , bn ). To solve this problem, let
A := [1 a1 a1² · · · a1^k; 1 a2 a2² · · · a2^k; . . . ; 1 an an² · · · an^k],
x := [x1 x2 · · · xk+1]^T and b := [b1 b2 · · · bn]^T.
Then A x is equal to
[x1 + a1 x2 + · · · + a1^k xk+1   · · ·   x1 + an x2 + · · · + an^k xk+1]^T.
As before, the least squares problem is to minimize
∥Ax − b∥² = ∑_{j=1}^{n} |x1 + aj x2 + · · · + aj^k xk+1 − bj |²

by finding the best approximation to the vector b from the
column space C(A) of A.
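
Similarly, a hedged Python/NumPy sketch of the degree-k fit (my own;
np.vander builds the matrix of powers of the aj):

import numpy as np

def fit_polynomial(a, b, k):
    # Columns of A are 1, a_j, a_j^2, ..., a_j^k (a Vandermonde matrix);
    # np.linalg.lstsq returns a least squares solution even when A is rank-deficient.
    A = np.vander(a, k + 1, increasing=True)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x    # the coefficients x1, ..., x_{k+1}

a = np.array([-1.0, 0.0, 1.0, 2.0])
b = np.array([2.0, 1.0, 2.0, 5.0])
print(fit_polynomial(a, b, 2))    # approximately [1. 0. 1.], i.e. t = 1 + s^2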
Example
Let us find a straight line t = x1 + s x2 that best fits the data
points (−1, 1), (1, 1), (2, 3) in the least squares sense. For this
purpose, we find the orthogonal projection of the vector
b := [1 1 3]^T into the column space of the matrix
A := [1 −1; 1 1; 1 2]. Then
A∗A = [3 2; 2 6] and A∗b = [5 6]^T.
Hence the unique solution of the normal equations
A∗Ax = A∗b is x = [x1 x2]^T := [9/7 4/7]^T. Thus the
straight line which best fits the data is 7t = 9 + 4s.
Next, we find a parabola t = x1 + s x2 + s² x3 that best fits the
data points (−1, 1), (1, 1), (2, 3), (−2, 2) in the least squares
sense.



   
Let A := [1 −1 1; 1 1 1; 1 2 4; 1 −2 4] and b := [1 1 3 2]^T.
Then A∗A = [4 0 10; 0 10 0; 10 0 34] and A∗b = [7 2 22]^T.
The unique solution of the normal equations A∗Ax = A∗b is
x = [x1 x2 x3]^T := [1/2 1/5 1/2]^T. Thus the parabola
which best fits the data is 10t = 5 + 2s + 5s².
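
Both fits can be reproduced numerically; a short Python/NumPy
check (my own, not from the slides) with the same data:

import numpy as np

# Straight line through (-1, 1), (1, 1), (2, 3): solve A* A x = A* b
A = np.array([[1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0, 3.0])
print(np.linalg.solve(A.T @ A, A.T @ b))       # [9/7, 4/7], i.e. 7t = 9 + 4s

# Parabola through (-1, 1), (1, 1), (2, 3), (-2, 2)
A2 = np.array([[1.0, -1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 2.0, 4.0], [1.0, -2.0, 4.0]])
b2 = np.array([1.0, 1.0, 3.0, 2.0])
print(np.linalg.solve(A2.T @ A2, A2.T @ b2))   # [0.5, 0.2, 0.5], i.e. 10t = 5 + 2s + 5s^2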
[Figure: left, the data points (−1, 1), (1, 1), (2, 3) for the straight-line fit; right, the data points (−1, 1), (1, 1), (2, 3), (−2, 2) for the parabola fit, plotted in the (s, t)-plane.]
