
Chapter 4

Inner Product Spaces



4.1 The dot product in Rp
You should recall from first year the dot product of two
vectors in Rp :
a · b = a1 b1 + · · · + ap bp .
We can now summarise its major properties by saying the dot
product is a real-valued, symmetric, bilinear map on Rp .
It is also positive definite:
Definition 4.1 A bilinear F-valued map, T, on Fp is positive
definite if for all a ∈ Fp, T(a, a) ≥ 0, and T(a, a) = 0 if and only
if a = 0.
Recall the length of a vector in Rp is ||a|| = √(a · a), and positive
definiteness implies that the zero vector is the only vector of
zero length.



We can clearly borrow the definition of the dot product for
fields other than R, but we run into problems:
• In C2, if a = (1, i)^T then a · a = 1² + i² = 0 but a ≠ 0.

• Suppose F = Z3, V = F3 and a = (1, 1, 1)^T; then
  a · a = 1 + 1 + 1 = 0 but a ≠ 0.

We can fix up positive definiteness in Cn (at a cost, as we
shall see), but with finite fields the problem cannot be made to
go away.
So in this chapter, we will explicitly stick to vector spaces over
R or C, and I will say which we are looking at.



Theorem 4.1 (Cauchy-Schwarz Inequality in Rp ) For any
a, b ∈ Rp we have

−||a|| ||b|| ≤ a · b ≤ ||a|| ||b||.

Proof: We may assume a ≠ 0, b ≠ 0. Why?

For arbitrary λ ∈ R, consider

q(λ) = ||a − λb||² = λ²(b · b) − 2λ(a · b) + (a · a) .

Clearly q(λ) ≥ 0 for all λ, and q(λ) is a quadratic in λ.

Hence q(λ) has non-positive discriminant:

4(a · b)² − 4(b · b)(a · a) ≤ 0,

i.e. (a · b)² ≤ ||a||² ||b||².

The result follows. ∎


Corollary 4.2 If a ≠ 0, b ≠ 0 then

−1 ≤ (a · b)/(||a|| ||b||) ≤ 1.

Note: In R2, a · b = ||a|| ||b|| cos θ, where 0 ≤ θ ≤ π is the angle
between the vectors:

[Figure: two vectors a and b drawn from the origin, with the angle θ between them.]
Definition 4.2 If a, b ∈ Rn are non-zero then the angle θ
between a and b is defined by
cos θ = (a · b)/(||a|| ||b||) ,   θ ∈ [0, π].

We call non-zero vectors a and b orthogonal if a · b = 0.
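A quick numerical illustration (a Python/NumPy sketch, not part of the course notes): Cauchy-Schwarz guarantees that the quotient in definition 4.2 lies in [−1, 1], so its arccos is always a well-defined angle.

import numpy as np

a = np.array([1.0, 2.0, 2.0])
b = np.array([3.0, 1.0, -4.0])

quotient = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(abs(quotient) <= 1)        # Cauchy-Schwarz: |a . b| <= ||a|| ||b||
print(np.arccos(quotient))       # the angle theta in [0, pi]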


Example 4.1 In R17 let U = {x : xi = 0, 1 ≤ i ≤ 7} and
V = {x : xi = 0, 8 ≤ i ≤ 17}.
Note that every vector in U is orthogonal to every vector in V .
Furthermore dim U = 10, dim V = 7 and U ∩ V = {0}, therefore
U ⊕ V = R17 .
We have a special name for subspaces like this:
Definition 4.3 Let X ≤ Rp for some p. The space

Y = {y ∈ Rp : y · x = 0 for all x ∈ X}

is called the orthogonal complement of X , X ⊥ .


Proposition 4.3 For any X ≤ Rp , Rp = X ⊕ X ⊥ and
(X ⊥ )⊥ = X .
Proof: Deferred until we have more tools, when it is easy: see
Theorem 4.11. 
Definition 4.4 A set S = {v1, . . . , vk} ⊆ Rp of non-zero
vectors is orthogonal if vi · vj = 0 for i ≠ j.
We say S is orthonormal if vi · vj = δij, where δij = 1 if i = j
and δij = 0 if i ≠ j.
A standard example of an orthonormal set is the standard
basis.
Lemma 4.4 An orthogonal set S in Rp is linearly independent.
Proof: Assume S is as above and that Σ_{i=1}^k αi vi = 0. Then for
each j

0 = 0 · vj = ( Σ_{i=1}^k αi vi ) · vj = αj ||vj||² .

Hence all αj = 0 and S is independent (and so k ≤ p). ∎


Theorem 4.5 (The Triangle Inequality) For a, b ∈ Rp,

||a + b|| ≤ ||a|| + ||b||.

[Figure: triangle with sides of lengths ||a||, ||b|| and ||a + b||.]
Proof:

||a + b||2 = (a + b) · (a + b)
= ||a||2 + 2a · b + ||b||2
≤ ||a||2 + 2||a|| ||b|| + ||b||2
(by Cauchy-Schwarz)
= (||a|| + ||b||)2 .

Since both sides are non-negative, taking square roots gives ||a + b|| ≤ ||a|| + ||b||. ∎



4.2 Dot Product in Cp
As the example in the previous section suggested, the dot
product in Cp must be different from that in Rp .
The problem we hit was that the square of a complex number
can be negative.
However, z z̄ = |z|² ≥ 0 for all z ∈ C, so if a ∈ Cp with
a^T = (a1, . . . , ap) we can try

||a||² = Σ_{i=1}^p ā_i a_i = ā^T a .

Then ||a|| ∈ R+ and ||a|| = 0 ⇐⇒ a = 0.

This suggests the next definition. . .



Definition 4.5 The standard dot product on Cp is defined by
a · b = Σ_{i=1}^p ā_i b_i = ā^T b .

As in the real case, the length or norm of a vector in Cp is

||a|| = √(a · a) .

We will use a* as a useful shorthand for ā^T from now on, so
that a · b = a*b.
Some texts (or lecturers) define the standard dot product on
Cp as a^T b̄ instead.
The mathematics is basically the same, but I find my definition
more natural.
Just take care to check which convention is used.

Example 4.2 In C2,

(1, i)^T · (1 − i, 2i)^T = 1(1 − i) + (−i)(2i) = 3 − i
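If you want to experiment, NumPy's vdot conjugates its first argument, which matches this convention (a small sketch, not part of the slides):

import numpy as np

a = np.array([1, 1j])
b = np.array([1 - 1j, 2j])

print(np.vdot(a, b))            # conj(a)^T b = 3 - 1j, as above
print(np.vdot(a, a).real)       # ||a||^2 = 2: real and non-negative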

Lemma 4.6 The standard dot product on Cp has the following
properties:
a) a · (λb + c) = λ(a · b) + a · c for λ ∈ C.
b) b · a = conj(a · b), the complex conjugate of a · b.
c) (λb + c) · a = λ̄(b · a) + c · a for λ ∈ C.
d) ||a|| ≥ 0, and ||a|| = 0 ⇐⇒ a = 0 .
Proof: EXERCISE 



We see from this lemma that the standard dot product on Cp is
not bilinear or symmetric, but is positive definite.
We refer to property c) in lemma 4.6 as conjugate linearity,
and property b) as conjugate symmetry.
The term sesquilinearity is sometimes used for properties a)
and c) together.
While the angle between complex vectors does not really
make sense, we do use the same definition for orthogonality in
Cp as in Rp: for example (1, i)^T and (1, −i)^T are orthogonal.
In addition, Cauchy-Schwarz (|u · v| ≤ ||u|| ||v||), the triangle
inequality, orthogonality, orthonormality, orthogonal
subspaces all work nicely for Cp .
But rather than explore that, we turn to the more general
setting. . .
4.3 Inner Product Spaces
For the rest of this section, F = R or F = C: I shall usually
state and prove theorems in the F = C case; F = R is
analogous but often more straightforward.
Definition 4.6 If V is a vector space over F then an inner
product on V is a function h , i : V × V → F, that is, for all
u, v ∈ V hu, vi ∈ F, such that
IP1 hu, v + wi = hu, vi + hu, wi .
IP2 hu, αvi = αhu, vi .
IP3 hv, ui = conj(hu, vi) .
IP4 hv, vi is real and > 0 if v ≠ 0 and = 0 if v = 0.
We call V with h , i an inner product space.
The norm of a vector is then ||v|| = hv, vi^{1/2}.
If F = R, then IP3 is just symmetry: hv, ui = hu, vi.
Example 4.3 The standard dot products on Rp and Cp are
clearly inner products.
Example 4.4 For C[0, 1], the vector space of all continuous
real valued functions on [0, 1], the function
hf, gi = ∫_0^1 f(t) g(t) dt

is also an inner product.


The only non-trivial fact is:

if f ≠ 0 then hf, f i > 0 .

(This is an EXERCISE in elementary analysis.)


We can generalise this to any interval [a, b] of course, and even
to real valued functions on closed bounded subsets of Rn .



Example 4.5 Let X, Y ∈ Mp,q (C), and define

hX, Y i = tr(X ∗ Y )

This is an inner product on Mp,q (C)


As usual, only positive definiteness needs any work:
By definition of matrix multiplication, the row i, column j
element of X*X is

(X*X)_ij = Σ_{k=1}^p (X*)_ik X_kj = Σ_{k=1}^p conj(x_ki) x_kj ,

and hence

tr(X*X) = Σ_{i=1}^q Σ_{k=1}^p |x_ki|² ≥ 0 .

Clearly, tr(X*X) = 0 iff X is the zero matrix. ∎
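A small numerical check of this inner product (a Python/NumPy sketch with arbitrarily chosen matrices, not part of the slides):

import numpy as np

X = np.array([[1, 2j], [3, 4]])
Y = np.array([[0, 1], [1j, 2]])

print(np.trace(X.conj().T @ Y))        # <X, Y> = tr(X* Y)
print(np.trace(X.conj().T @ X).real)   # <X, X> = sum of |x_ij|^2 = 30
print(np.linalg.norm(X)**2)            # agrees with the Frobenius norm squared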


Example 4.6 Now let V = R2 and define

hu, vi = u1 v1 − u1 v2 − u2 v1 + 7 u2 v2 = u^T A v ,   where A = [ 1  −1 ; −1  7 ] is symmetric.

Then hu, ui = u1² − 2 u1 u2 + 7 u2² = (u1 − u2)² + 6 u2² .

So hu, ui is clearly non-negative, and is zero iff u = 0.


It is easy to check that the other properties of an inner product
hold for hu, vi (EXERCISE), and so hu, vi defines an inner
product on R2 .

Example 4.7 Let ℓ2 be the vector space of real sequences
{ak} that are square summable: Σ_{k=1}^∞ ak² is finite.

Show that h{ak}, {bk}i = Σ_{k=1}^∞ ak bk is an inner product on ℓ2.

SOLUTION: The tricky part here is proving the definition of
h{ak}, {bk}i actually makes sense: checking the properties of
an inner product hold is then a routine EXERCISE.

But the am/gm inequality, (1/2)(x + y) ≥ √(xy), implies that

|ak bk| ≤ (1/2)(ak² + bk²) .

Then the comparison test implies that Σ_{k=1}^∞ ak bk is absolutely
convergent, hence converges, so the definition does make
sense. ∎
Lemma 4.7 Let V be an inner product space. Then for
u, v, w ∈ V and α ∈ C:
a) ||u|| > 0 if and only if u ≠ 0.
b) hu + v, wi = hu, wi + hv, wi.
c) hαu, vi = ᾱhu, vi and ||αu|| = |α| ||u||.
d) hx, ui = 0 for all u if and only if x = 0.
e) |hu, vi| ≤ ||u|| ||v|| (Cauchy-Schwarz inequality)
f) ||u + v|| ≤ ||u|| + ||v|| (the triangle inequality).
Proof: Parts a) to d) are routine EXERCISES, and f) will follow
from e) as in the proof of theorem 4.5 with one minor
modification.
Cauchy-Schwarz requires a slightly different argument, but as
before we assume neither u nor v are zero.

Set A = ||u||², B = |hu, vi| and C = ||v||².
There is a complex number z with |z| = 1 such that zhu, vi = B
(why?).
Then if λ is real

hu − zλv, u − zλvi = hu, ui − zλhu, vi − z̄λhv, ui + λ²hv, vi .

The expression on the left is real and non-negative, and
z̄hv, ui = conj(zhu, vi) = B, so for all real λ,

A − 2Bλ + Cλ² ≥ 0 .

Since C ≠ 0, we can set λ = B/C and rearranging gives

B² ≤ AC ,

which is the Cauchy-Schwarz inequality. ∎



Notes
a) Example 4.6 can obviously be generalised in Rn by
choosing a suitable symmetric matrix A.
We can even use the same idea in Cn , by defining
hu, vi = u∗ Av where A must be Hermitian: A∗ = A.
Not every symmetric (or Hermitian) matrix will define an
inner product of course: A = [ 1  0 ; 0  −1 ] would not do.
But it is only the positive definiteness that fails, and as
long as the symmetric (Hermitian) matrix is invertible a lot
of interesting results still hold.
For example, Special Relativity can be considered
(mathematically) as a branch of linear algebra with an
indefinite inner product.



b) Example 4.4 can also be generalised in other ways. For
example, if w(t) > 0 is continuous, then
hf, gi = ∫_0^1 f(t) g(t) w(t) dt

also gives an inner product on C[0, 1].


We call w(t) a weighting function.
c) If we choose the weighting function appropriately, and/or
restrict the set of allowable functions, we could even
create an inner product for functions over R.
We are heading into infinite dimensional inner product
spaces here, which is more properly studied in courses
on functional analysis.



d) We can use the Cauchy-Schwarz inequality

| ∫_0^1 f(t) g(t) dt | ≤ √( ∫_0^1 f(t)² dt ) √( ∫_0^1 g(t)² dt )

and the triangle inequality

√( ∫_0^1 (f(t) + g(t))² dt ) ≤ √( ∫_0^1 f(t)² dt ) + √( ∫_0^1 g(t)² dt )

to obtain results on C[0, 1] which are hard to prove directly.


4.4 Orthogonality and Orthonormality
Many of our earlier definitions can be easily adapted to the
general case:
Definition 4.7 Let V be an inner product space. Non-zero
vectors u and v are orthogonal if hu, vi = 0; we will use the
notation u ⊥ v for this.
A set S = {v1, . . . , vk} ⊆ V of non-zero vectors is orthogonal
if hvi, vji = 0 for i ≠ j.
We say S is orthonormal if hvi, vji = δij, where δij = 1 if i = j
and δij = 0 if i ≠ j.
Clearly, we can always rescale each member of an orthogonal
set to make it orthonormal: just divide each vi by its length
hvi , vi i1/2 , which we know is non-zero.
Also, with the same proof as lemma 4.4, we note that
orthogonal sets are linearly independent.
Example 4.8 Show that in M2,2(R) the set

{ (1/2)[ 1 1 ; 1 1 ],  (1/2)[ 1 −1 ; 1 −1 ],  (1/2)[ −1 −1 ; 1 1 ],  (1/2)[ 1 −1 ; −1 1 ] }

is an orthonormal basis if hX, Y i = tr(X*Y).

SOLUTION: Calling the matrices M1, . . . , M4 in the order
given, we have, for example

M1*M1 = (1/2)[ 1 1 ; 1 1 ],   M2*M1 = (1/2)[ 1 1 ; −1 −1 ],

and so hM1, M1i = 1 and hM2, M1i = 0.

I leave the rest of the inner products as an EXERCISE.
The set is thus linearly independent and is a basis as dim(M2,2(R)) = 4. ∎
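For those who like to check such calculations by machine, here is a sketch in Python/NumPy (not part of the slides) verifying that the Gram matrix of all pairwise inner products is the identity:

import numpy as np

M = [np.array(m) / 2 for m in
     ([[1, 1], [1, 1]], [[1, -1], [1, -1]], [[-1, -1], [1, 1]], [[1, -1], [-1, 1]])]

G = np.array([[np.trace(X.T @ Y) for Y in M] for X in M])   # all <Mi, Mj>
print(np.allclose(G, np.eye(4)))                            # True: the set is orthonormal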

Example 4.9 Consider the continuous functions on [−π, π],
C[−π, π], with the inner product hf, gi = ∫_{−π}^{π} f(x) g(x) dx.
Note that for p, q ∈ Z,

∫_{−π}^{π} sin(px) sin(qx) dx = (1/2) ∫_{−π}^{π} [ cos((p − q)x) − cos((p + q)x) ] dx

is zero if p ≠ q. So

S = {sin(x), sin(2x), sin(3x), . . . }

is an infinite orthogonal set in this inner product space.


EXERCISE: prove similarly that

C = {1, cos(x), cos(2x), cos(3x), . . . }

is an infinite orthogonal set in this inner product space.


Definition 4.8 In an inner product space V let v ≠ 0.
The projection of u ∈ V onto v is defined as

proj_v(u) = ( hv, ui / hv, vi ) v .

Note: u − αv ⊥ v ⇐⇒ u − αv = u − proj_v(u).

[Figure: in R2, u decomposes as proj_v(u) along v plus a component orthogonal to v.]
Note that the projection operator is linear (EXERCISE):

projv (αx + y) = α projv (x) + projv (y) .

(This is one way of remembering that you use hv, ui in the


complex case and not hu, vi.)
Example 4.10 Consider C2 with the standard inner product:
find the projection of u = (−5 + 9i, −3 − 2i)^T onto v = (1 + i, 2 − i)^T.
SOLUTION: We have

hv, ui = (1 − i)(−5 + 9i) + (2 + i)(−3 − 2i) = 7i ,
hv, vi = |1 + i|² + |2 − i|² = 7 .

So proj_v(u) = i (1 + i, 2 − i)^T = (−1 + i, 1 + 2i)^T. ∎

Note that x = u − proj_v(u) = (−4 + 8i, −4 − 4i)^T and

hx, vi = (−4 − 8i)(1 + i) + (−4 + 4i)(2 − i) = 0 .

Example 4.11 In C[−π, π] from example 4.9, find the
projection of f(x) = x onto g(x) = sin(px) for p ∈ Z.
SOLUTION: We have

hg, f i = ∫_{−π}^{π} x sin(px) dx = · · · = (−1)^{p+1} 2π/p ,
hg, gi = ∫_{−π}^{π} sin²(px) dx = · · · = π .

And so

proj_g(f) = (2/p) (−1)^{p+1} sin(px). ∎
Some of you will recognise the sort of calculation we are doing
here: hg, f i is a Fourier coefficient (up to a scale factor).
We can push this idea on a little more. . .

Lemma 4.8 If S = {v1, . . . , vk} is an orthogonal set of
non-zero vectors in an inner product space V and v ∈ span(S)
then

v = Σ_{i=1}^k proj_{vi}(v).

If S is an orthonormal set, v = Σ_{i=1}^k hvi, vi vi.

Proof: Since S is orthogonal, it is an independent set so there
are unique αi with v = α1 v1 + · · · + αk vk. So

hvi, vi = Σ_{j=1}^k hvi, αj vji = αi hvi, vii .

Thus αi = hvi, vi / hvi, vii and so αi vi = proj_{vi}(v).
The second part follows trivially. ∎
Corollary 4.9 If {e1, . . . , en} is an orthonormal basis for V
then v = Σ_{i=1}^n hei, vi ei.

These last two results illustrate one reason we are interested


in orthogonal and orthonormal bases: we can find coordinates
in them without solving linear equations.
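To see corollary 4.9 in action numerically (a Python/NumPy sketch, not part of the slides): with an orthonormal basis the coordinates are just inner products, so no linear system needs to be solved.

import numpy as np

# the rows form an orthonormal basis of R^3
e = np.array([[1, 2, 2], [2, 1, -2], [2, -2, 1]]) / 3.0

v = np.array([4.0, -1.0, 5.0])
coords = e @ v                      # i-th coordinate is <e_i, v>
print(np.allclose(coords @ e, v))   # v = sum_i <e_i, v> e_i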
Orthonormal bases have many other uses, which we will cover
later, but firstly we move on to look at a way of creating
orthonormal bases in any (finite dimensional) inner product
space.



4.5 The Gram-Schmidt Process
Theorem 4.10 (Gram-Schmidt Process) Every finite
dimensional inner product space has an orthonormal basis.
Proof: The proof is constructive: we show how to turn any
given basis into an orthogonal one and then normalise.
Suppose S = {v1, . . . , vp} is a basis for V over F. Then let

w1 = v1
w2 = v2 − proj_{w1}(v2)
w3 = v3 − proj_{w1}(v3) − proj_{w2}(v3)
...
w_{k+1} = v_{k+1} − Σ_{j=1}^k proj_{wj}(v_{k+1})


You should be aware of how this looks in Rp :
[Figure: v1 = w1; w2 is v2 minus its projection onto w1; w3 is v3 minus its projections onto w1 and w2.]
We now have to prove that {w1 , . . . , wp } is an orthogonal
basis, which we do by induction:
Let P (k) be the proposition that

span{v1 , . . . , vk } = span{w1 , . . . , wk }
and wk ⊥ wi , 1 ≤ i ≤ k − 1

Then P (1) is true: why? As we have chosen w1 = v1 .



Next, let k ≥ 2 and assume P (j) for all j from 1 to k − 1, that is
for all j < k

span{v1 , . . . , vj } = span{w1 , . . . , wj }
and wj ⊥ wi , 1 ≤ i ≤ j − 1

Now wk = vk − Σ_{i=1}^{k−1} ( hwi, vki / hwi, wii ) wi and as the vi are l.i.,

wk ∉ span{v1, . . . , vk−1} = span{w1, . . . , wk−1} .

As wk ≠ 0,

span{v1, . . . , vk} = span{w1, . . . , wk−1, vk} = span{w1, . . . , wk}

Finally, for 1 ≤ j ≤ k − 1,

hwj, wki = hwj, vk − Σ_{i=1}^{k−1} proj_{wi}(vk)i
         = hwj, vki − Σ_{i=1}^{k−1} ( hwi, vki / hwi, wii ) hwj, wii
         = hwj, vki − ( hwj, vki / hwj, wji ) hwj, wji = 0 .

And hence P (k) is true.


The result now follows by induction. 
I will usually use the notation ei = wi/||wi|| (sometimes I will use
qi) to denote the members of the orthonormal basis generated
by the Gram-Schmidt process.
Example 4.12 Apply the Gram-Schmidt process to the
vectors

v1 = (1, 2, 2)^T,   v2 = (3, 1, −4)^T,   v3 = (3, 1, 2)^T

in R3 with the standard dot product.

SOLUTION:

e1 = (1/3)(1, 2, 2)^T,   e2 = (1/3)(2, 1, −2)^T,   e3 = (1/3)(2, −2, 1)^T
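The process is easy to code. A minimal Python/NumPy sketch of classical Gram-Schmidt (assuming the input vectors are linearly independent; not part of the slides) reproduces this example:

import numpy as np

def gram_schmidt(vectors):
    es = []
    for v in vectors:
        w = v - sum(np.dot(e, v) * e for e in es)   # subtract projections onto earlier e_i
        es.append(w / np.linalg.norm(w))            # normalise
    return es

vs = [np.array([1.0, 2, 2]), np.array([3.0, 1, -4]), np.array([3.0, 1, 2])]
for e in gram_schmidt(vs):
    print(3 * e)    # should print (1, 2, 2), (2, 1, -2), (2, -2, 1)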

Example 4.13 Find an orthonormal basis of C3 (with its
standard dot product) containing a multiple of (1, i, 1)^T.
SOLUTION:

{ (1/√3)(1, i, 1)^T,   (1/√6)(2, −i, −1)^T,   (1/√2)(0, −i, 1)^T }

Example 4.14 Find an orthonormal basis for span({1, t, t²}) as
a subspace of C[0, 1] with respect to hf, gi = ∫_0^1 f(t) g(t) dt.
SOLUTION:

{ 1,   2√3 (t − 1/2),   6√5 (t² − t + 1/6) }


4.6 Orthogonal Complements
Definition 4.10 Let X ≤ V for some inner product space V .
The space

Y = {y ∈ V : hy, xi = 0 for all x ∈ X}

is called the orthogonal complement of X , X ⊥ .

We now have a generalisation of proposition 4.3


Theorem 4.11 Suppose V is a finite dimensional inner
product space and W ≤ V . Then

V = W ⊕ W⊥ and (W ⊥ )⊥ = W .



Proof: Let {v1 , . . . , vk } be an orthonormal basis for W
(Theorem 4.10) and complete to an orthonormal basis of V .

B = {v1 , . . . , vk , . . . , vn }.

Let X = span{vk+1 , . . . , vn }. Clearly

X ∩ W = {0 }, X ⊕W =V and X ⊥ W

So X ≤ W ⊥ .
But if v ∈ W⊥ then hvi, vi = 0 for 1 ≤ i ≤ k, and so
v = Σ_{i=k+1}^n αi vi for some αi, and hence v ∈ X, so W⊥ ≤ X.

Thus W ⊥ = X and V = W ⊕ W ⊥ .
By symmetry (W ⊥ )⊥ = W . 



This result is not true for infinite dimensions:
Example 4.15 Returning to ℓ2 from example 4.7, let V be the
subspace of all sequences with only finitely many non-zero
terms.
That this is a proper subspace is straightforward, since, for
example {n−1 } ∈ ℓ2 but is not in V .
I claim that V⊥ is trivial: for suppose {ak} ∈ V⊥, and that
ai ≠ 0 for some i.
Then the sequence {bk} with bi = 1 and all the other terms
zero is in V, and h{ak}, {bk}i = ai ≠ 0, a contradiction.
So clearly ℓ2 is not the sum of V and its orthogonal
complement.
Infinite dimensional inner product spaces and their properties
are topics studied in courses in Functional Analysis such as
MATH3611.
Corollary 4.12 Suppose V is a finite dimensional inner
product space, W ≤ V and v ∈ V. Then

v = a + b,   a ∈ W,   b ∈ W⊥,

where a and b are unique.

We say a = proj_W(v) and b = proj_{W⊥}(v).
(Of course, a = Σ_{i=1}^k proj_{vi}(v) and b = v − a = Σ_{i=k+1}^n proj_{vi}(v), for a
basis of V as in the proof of Theorem 4.11.)

In the same way that projection onto a vector is linear, for any
W ≤ V , projW is a linear map on V .
It follows from the corollary that im(projW ) = W and
ker(projW ) = W ⊥ (in finite dimensions at least).

Example 4.16 Consider span(1, t, t², √t) ≤ C[0, 1] with the
usual inner product.
Find the projection of √t onto W = span(1, t, t²).
SOLUTION: We have an orthonormal basis for W from
example 4.14, so we can use that.

proj_W(√t) = (2/35)(3 + 24t − 10t²)  ∎
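This sort of calculation can be checked symbolically. A sketch using Python's sympy (not part of the slides), with the orthonormal basis from example 4.14:

import sympy as sp

t = sp.Symbol('t')

def ip(f, g):
    return sp.integrate(f * g, (t, 0, 1))

basis = [sp.Integer(1),
         2*sp.sqrt(3)*(t - sp.Rational(1, 2)),
         6*sp.sqrt(5)*(t**2 - t + sp.Rational(1, 6))]

f = sp.sqrt(t)
proj = sum(ip(e, f) * e for e in basis)    # sum of projections onto each basis vector
print(sp.expand(proj))                     # should give 6/35 + 48*t/35 - 4*t**2/7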


Lemma 4.13 In a finite dimensional inner product space V with
W ≤ V:
a) If v ∈ V then v − projW (v) is in W ⊥ .
b) If w ∈ W then projW (w) = w. Consequently, the
projection mapping is idempotent: that is,
(projW ) ◦ (projW ) = projW .
c) For all v ∈ V we have ||proj_W(v)|| ≤ ||v||.
d) The function proj_W + proj_{W⊥} is the identity function on V.

Proof: All are straightforward corollaries of the previous result,
and left as EXERCISES. ∎


Lemma 4.14 Let W be a subspace of a finite-dimensional
inner product space V, and let v ∈ V. For every w ∈ W we
have

||v − w|| ≥ ||v − proj_W(v)|| ,

with equality if and only if w = proj_W(v), i.e. v − w ∈ W⊥.
Proof: Let x = proj_W(v) and let w ∈ W; then v − x ⊥ w − x,
using lemma 4.13 a), since the latter is in W. Then

||v − w||² = ||(v − x) − (w − x)||² = ||v − x||² + ||w − x||²
           ≥ ||v − x||² .   (*)

Moreover, equality holds in the given result if and only if
equality holds in (*), if and only if ||w − x||² = 0, if and only if
w = x. ∎
You may find it easier to make sense of this proof if you
visualise the situation where W is a plane in R3 .
The point of this last lemma is that the projection onto a
subspace solves a minimising problem:
If v is not in W , what point in W is closest to it? (Closest in the
sense defined by the inner product.)
We will return to this idea when we consider least squares
problems later in this chapter.
But first I want to look at how inner products interact with
linear maps and matrices.



4.7 Adjoints
If we were to fix a vector x in inner product space V , then the
properties of the inner product tell us that the map v 7→ hx, vi
is a linear map from V to the base field, F.
Maps from V to the field F are called covectors, and the set
of all covectors is called the dual space V ∗ .
From corollary 3.14, for finite dimensional spaces, V and V ∗
have the same dimension, and so they are isomorphic
(theorem 3.11).
It turns out that on a finite-dimensional inner product space,
every covector can be written as an inner product with a fixed
vector, and in the real case this leads us to a canonical (basis
independent) isomorphism between V and V ∗ . . .



Lemma 4.15 If (V, h , i) is a finite dimensional inner product
space and T : V → F is linear then there is a unique vector
t ∈ V such that for all v ∈ V , T (v) = ht, vi.
Proof: Let B = {ei} be an orthonormal basis of V and define
scalars ti = T(ei).
Then I claim that the vector t = Σ_i conj(ti) ei is the vector we want
(in the real case the conjugates change nothing).
Because firstly, if v = Σ_i vi ei is any vector,

T(v) = Σ_i vi T(ei) = Σ_i vi ti .

But on the other hand

ht, vi = h Σ_i conj(ti) ei, Σ_j vj ej i = Σ_{i,j} ti vj hei, eji = Σ_i ti vi ,

and so t is suitable.
Uniqueness follows from definiteness as usual: if s is another
such vector then we have ht, vi = hs, vi for all v ∈ V .
But then ht − s, vi = 0 for all v ∈ V , and so t − s = 0. 

For real IP space, the map sending covector T to vector t is an


isomorphism between V and the dual space V ∗ .
Firstly we check linearity:
Let λ ∈ R, T (v) = ht, vi and S(v) = hs, vi, then

(λT + S)(v) = λT (v) + S(v) = λht, vi + hs, vi = hλt + s, vi

So λT + S maps to λt + s in the real case.


Since we have already proved uniqueness, the map is
one-to-one.
But as V and V ∗ have the same dimension, this means the
map in Lemma 4.15 is an isomorphism (theorem 3.9).



The isomorphism T 7→ t is called raising the index and we
write t = T ♯ .
Its inverse, taking a vector t to the linear map T with
T (v) = ht, vi, is called lowering the index, and we write
T = t♭ .
In the case of Rn with the standard dot product, v · w = vT w,
so v♭ = vT and the action of v♭ is just matrix multiplication.
This is why we can think of the dual space Rn∗ as row vectors.

Note: in the complex case we would have conjugate linearity,


and so have an isomorphism to the conjugate dual space V ∗ .
We will not get involved in this!



We can use these isomorphisms to find a way to “reverse” a
linear map T : V → W between inner product spaces.
Theorem 4.16 For any linear map T : V → W between
finite-dimensional inner product spaces, there is a unique
linear map T ∗ : W → V called the adjoint of T with

hw, T (v)i = hT ∗ (w), vi (1)

for all v ∈ V and w ∈ W .


(NB the inner product on the left is in W , on the right is in V ).
Proof: I am going to prove uniqueness here, and leave the
existence to the end of the section.
Suppose we have two linear maps A and B with

hw, T (v)i = hA(w), vi = hB(w), vi

for all v ∈ V and w ∈ W .


Then for all v ∈ V and w ∈ W , h(A − B)(w), vi = 0.
By positive definiteness, (A − B)(w) = 0 for all w ∈ W .
Hence A = B and the adjoint (if it exists) is unique.
See the end of the section for the rest of the proof. 

Uniqueness implies we do not need to use the proof to find


the adjoint: if we find a linear map T ∗ satisfying equation (1)
then it is the adjoint.
For example, in the case of the standard inner product on Rp ,
suppose T (v) = Av for some p × q matrix A.
Then
w · Av = wT Av = (AT w)T v = (AT w) · v .

So T ∗ (w) = AT w – the adjoint “is just the transpose”.
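A one-line numerical confirmation of this (a Python/NumPy sketch with arbitrary data, not part of the slides):

import numpy as np

A = np.array([[1.0, 2, 0], [0, 1, 3]])   # T(v) = Av maps R^3 to R^2
v = np.array([1.0, -2, 4])
w = np.array([3.0, 5])

print(np.isclose(w @ (A @ v), (A.T @ w) @ v))   # <w, T(v)> = <A^T w, v>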

Similarly for Cp we find T*(w) = Ā^T w – the adjoint is the
conjugate transpose, and we refer to Ā^T as the adjoint of A,
and write it as A*.

Uniqueness also allows us the easy result:


Proposition 4.17 For any inner product space, V , the identity
map on V is its own adjoint.
Proof: Trivial. 

Example 4.18 Consider the integral inner product
hp, qi = ∫_{−1}^{1} p(t) q(t) dt on both P2(R) and P1(R). Find the
adjoint of the differentiation map T : P2(R) → P1(R), p ↦ p′.
SOLUTION: In order to find T*, we need to determine
r0, r1, r2 in terms of q0, q1, such that the equation

∫_{−1}^{1} (q0 + q1 t)(p1 + 2 p2 t) dt = ∫_{−1}^{1} (r0 + r1 t + r2 t²)(p0 + p1 t + p2 t²) dt

is true for all p0, p1, p2.

Using Maple, we find that we need, for all p0, p1, p2,

( (2/3) r2 + 2 r0 ) p0 + ( (2/3) r1 − 2 q0 ) p1 + ( (2/5) r2 − (4/3) q1 + (2/3) r0 ) p2 = 0

Clearly, that implies each coefficient of the pi is zero, and
solving we find that the adjoint is T* : P1(R) → P2(R) where

T*(q0 + q1 t) = −(5/2) q1 + 3 q0 t + (15/2) q1 t². ∎
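The Maple computation is easy to reproduce with Python's sympy (a sketch, not part of the slides):

import sympy as sp

t, q0, q1 = sp.symbols('t q0 q1')
r0, r1, r2 = sp.symbols('r0 r1 r2')
p0, p1, p2 = sp.symbols('p0 p1 p2')

def ip(f, g):
    return sp.integrate(f * g, (t, -1, 1))

lhs = ip(q0 + q1*t, sp.diff(p0 + p1*t + p2*t**2, t))   # <q, T(p)>
rhs = ip(r0 + r1*t + r2*t**2, p0 + p1*t + p2*t**2)     # <T*(q), p>

# equate the coefficients of p0, p1, p2 and solve for r0, r1, r2
eqs = [sp.Eq(sp.expand(lhs - rhs).coeff(p), 0) for p in (p0, p1, p2)]
print(sp.solve(eqs, (r0, r1, r2)))   # expect r0 = -5*q1/2, r1 = 3*q0, r2 = 15*q1/2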

The adjoint will change if the inner product does, of course.


For example, if the integration in the previous example had
been over [0, 1], the adjoint would have been

T ∗ (q0 + q1 t) = 2q1 − 6q0 + (12q0 − 24q1 )t + 20q1 t2 .



Theorem 4.18 Suppose that V and W are finite-dimensional
inner product spaces; let S and T be linear transformations
from V to W . Then
a) (S + T )∗ = S ∗ + T ∗
b) for any scalar α we have (αT )∗ = ᾱ T ∗ ;
c) (T ∗ )∗ = T .
d) if U : W → X is linear then (U ◦ T )∗ = T ∗ ◦ U ∗
Proof: All these results follow from uniqueness and linearity.
For example for (a) (the rest are an EXERCISE) we have

h(S + T )∗ (w), vi = hw, (S + T )(v)i = hw, S(v)i + hw, T (v)i


= hS ∗ (w), vi + hT ∗ (w), vi
= h(S ∗ + T ∗ )(w), vi

So by uniqueness of the adjoint (S + T )∗ = S ∗ + T ∗ . 


Using corollary 4.9 we can now show that in an orthonormal
basis, the matrix of the adjoint of a map is the adjoint
(conjugate transpose) of its matrix:
Theorem 4.19 Let V and W be finite-dimensional inner
product spaces with orthonormal bases B and C respectively.
If A is the matrix of the linear transformation T : V → W with
respect to bases B and C , then the matrix of T ∗ : W → V with
respect to bases C and B is the adjoint of A.
Proof: Let B = {e1 , . . . , ep }, C = {f1 , . . . , fq } and A = (aij ).
By definition of the matrix of a linear map, the ith column of A
is the coordinate vector with respect to C of T (ei ).
The element aji is the j th component of this coordinate vector;
since C is orthonormal, corollary 4.9 gives

aji = hfj , T (ei )i .



Now let M be the matrix of T ∗ : W → V with respect to bases
C and B .
By the same argument as before

mij = ith component of [T ∗ (fj )]B
    = hei, T*(fj)i
    = hT(ei), fji
    = conj( hfj, T(ei)i ) = conj(aji) .

Thus M is the conjugate of the transpose of A. ∎

I invite you to show that { 1/√2,  (√6/2) t,  (3√10/4)(t² − 1/3) } is an
orthonormal basis of P2(R) for the inner product in
example 4.18 and check the formula for the adjoint we found
in example 4.18.
We finish this section by completing the proof of the existence
of the adjoint:
Theorem 4.16 For any linear map T : V → W between
finite-dimensional inner product spaces, there is a unique
linear map T ∗ : W → V called the adjoint of T with

hw, T (v)i = hT ∗ (w), vi (1)

for all v ∈ V and w ∈ W .


Proof: Recall we have already proved uniqueness.
Although this existence part is true for real and complex inner
product spaces, I am only going to prove it in the real case, as
that shows the important part.
For the complex case our raising and lowering maps need
judicious conjugations in order to preserve linearity, and that
obscures what is happening.



So assuming the field is R, pick any w ∈ W , and define the
map ω : V → F by

ω(v) = hw, T (v)i = w♭ (T (v)) .

Since T and the inner product are linear, ω is linear and hence
is in V ∗ .
Now define T ∗ (w) = ω ♯ ∈ V , so that hT ∗ (w), vi = hw, T (v)i.
Now T ∗ is linear as for all v ∈ V

hT ∗ (λw1 + w2 ), vi = hλw1 + w2 , T (v)i


= λhw1 , T (v)i + hw2 , T (v)i
= λhT ∗ (w1 ), vi + hT ∗ (w2 ), vi
= hλT ∗ (w1 ) + T ∗ (w2 ), vi .
So T ∗ (λw1 + w2 ) = λT ∗ (w1 ) + T ∗ (w2 ) from lemma 4.7 d),
i.e. T ∗ is linear. 
4.8 Maps with Special Adjoints
Definition 4.11 Let T : V → V be a linear map on a
finite-dimensional inner product space V . Then T is said to be
• unitary if T ∗ = T −1 ;
• an isometry if ||T(v)|| = ||v|| for all v ∈ V ;
• self-adjoint or Hermitian if T ∗ = T .
Two properties of these types of maps are easy:
Proposition 4.20 a) A map T is unitary if and only if T ∗ is
unitary.
b) The set of all unitary transformations on V forms a group
under composition.
Proof: EXERCISE 



Theorem 4.21 Let T be a linear map on a finite-dimensional
inner product space V . Then the following are equivalent:
a) T is an isometry;
b) hT (v), T (w)i = hv, wi for all v, w ∈ V (i.e. T preserves
inner products);
c) T is unitary (i.e. T ∗ ◦ T is the identity);
d) T ∗ is an isometry;
e) if {e1 , . . . , en } is an orthonormal basis for V then so is
{T (e1 ), . . . , T (en )}.
Proof (Outline). To show a) ⇒ b) we need the polarisation
identities, for example (real case)

hx, yi = (1/4)( ||x + y||² − ||x − y||² ) .
See the problem sheets for the complex one.
For b) ⇒ c), note that for all v and w we have

hv, wi = hT ∗ ◦ T (v), wi

and lemma 4.7 d) implies T ∗ ◦ T is the identity.


To show that c) implies d), observe that T ◦ T ∗ is also the
identity and thus for all v

hv, vi = hv, T ◦ T ∗ (v)i = hT ∗ (v), T ∗ (v)i .

And d) ⇒ a) now follows from theorem 4.18 c) (T = T ∗∗ ).


Finally, b) ⇒ e) is straightforward, and e) ⇒ a) follows from
applying lemma 4.8 (coordinates in orthonormal bases) to
both v and T (v). 
I leave the details as an EXERCISE.



For matrices we have similar definitions:
Definition 4.12 A matrix A ∈ Mp,p (C) is called unitary if
A∗ = A−1 and Hermitian if A∗ = A.
A matrix A ∈ Mp,p (R) is called orthogonal if AT = A−1 and
symmetric if AT = A.
If A is a Hermitian matrix, the map x 7→ Ax is clearly a
Hermitian map (with the standard inner product), and similarly
for unitary, symmetric etc.
Applying theorem 4.21 e) to the standard bases we get:
Corollary 4.22 The columns of a p × p matrix A are an
orthonormal basis of Cp if and only if A is unitary.
The columns of a p × p matrix A are an orthonormal basis of Rp
if and only if A is orthogonal.
The same results apply to rows.

Example 4.19 From examples 4.12 and 4.13 we have the
orthogonal matrix Q1 and the unitary matrix Q2, where

Q1 = (1/3) [ 1  2  2 ]        Q2 = [ 1/√3    2/√6    0    ]
           [ 2  1 −2 ]             [ i/√3   −i/√6    1/√2 ]
           [ 2 −2  1 ]             [ 1/√3   −1/√6    i/√2 ]

It is easy to see that the rows of Q1 are an orthonormal basis
of R3.
Checking that the rows of Q2 are also an orthonormal basis of
C3 is only a little more work.
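A numerical check that Q2 really is unitary (a Python/NumPy sketch, not part of the slides):

import numpy as np

s2, s3, s6 = np.sqrt(2), np.sqrt(3), np.sqrt(6)
Q2 = np.array([[1/s3,   2/s6,  0],
               [1j/s3, -1j/s6, 1/s2],
               [1/s3,  -1/s6,  1j/s2]])

print(np.allclose(Q2.conj().T @ Q2, np.eye(3)))   # Q2* Q2 = I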



4.9 QR Factorisations
Theorem 4.23 If A is p × q of rank q (so p ≥ q) then we can
write A = QR where Q is a p × q matrix with orthonormal
columns, and R is a q × q invertible upper triangular matrix.
Proof: One way of proving this is to express the
Gram-Schmidt algorithm in terms of matrices:
Let A have columns v1, . . . , vq, and define

w1 = v1 ,   q1 = w1/||w1|| ;
wk = vk − Σ_{j=1}^{k−1} hqj, vki qj ,   qk = wk/||wk|| ,   2 ≤ k ≤ q.

Notice I am using the orthonormal vectors (the qi) in this
process, not the orthogonal vectors (the wi).


We make the vi the subjects of the equations:

v1 = ||w1|| q1
vk = Σ_{j=1}^{k−1} hqj, vki qj + ||wk|| qk ,   2 ≤ k ≤ q.

So that

A = ( v1 | · · · | vq ) = ( q1 | · · · | qq ) R = QR ,

where R is the upper triangular matrix with diagonal entries
||w1||, . . . , ||wq|| and with (j, k) entry hqj, vki for j < k.
Here Q has orthonormal columns and R is upper triangular
and invertible. ∎
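The proof translates directly into code. A Python/NumPy sketch of this Gram-Schmidt QR (assuming full column rank; an illustrative sketch rather than a production algorithm, see the remarks on numerical stability in section 4.10):

import numpy as np

def qr_gram_schmidt(A):
    A = np.asarray(A, dtype=complex)
    p, q = A.shape
    Q = np.zeros((p, q), dtype=complex)
    R = np.zeros((q, q), dtype=complex)
    for k in range(q):
        w = A[:, k].copy()
        for j in range(k):
            R[j, k] = np.vdot(Q[:, j], A[:, k])   # <q_j, v_k>
            w -= R[j, k] * Q[:, j]
        R[k, k] = np.linalg.norm(w)               # ||w_k||
        Q[:, k] = w / R[k, k]
    return Q, R

A = np.array([[2j, -2, 0], [1, 2j, -2], [2, 3j, 1]])   # the matrix of example 4.22 below
Q, R = qr_gram_schmidt(A)
print(np.allclose(Q @ R, A), np.round(R, 10))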
Example 4.20 Returning to examples 4.12 and 4.13 we have

[ 1  3  3 ]         [ 1  2  2 ] [ 3 −1  3 ]
[ 2  1  1 ] = (1/3) [ 2  1 −2 ] [ 0  5  1 ]
[ 2 −4  2 ]         [ 2 −2  1 ] [ 0  0  2 ]

and

[ 1  1  0 ]   [ 1/√3   2/√6    0   ] [ √3   1/√3   −i/√3 ]
[ i  0  1 ] = [ i/√3  −i/√6   1/√2 ] [ 0    √6/3    i/√6 ]
[ 1  0  0 ]   [ 1/√3  −1/√6   i/√2 ] [ 0    0       1/√2 ]


In the case where A is square (and so invertible), Q will also
be square and either unitary (if complex) or orthogonal (if
real).
In the non-square cases, we can modify our QR factorisation
to make Q square, at the expense of making R not square:
Corollary 4.24 Let A ∈ Mp,q(F) with p > q and rank(A) = q.
Then we can write A = Q̃ R̃, with Q̃ being p × p unitary (or
orthogonal), and R̃ being p × q, of rank q and in echelon form.
Proof: Let A = QR as in theorem 4.23.
We create Q̃ by completing the columns of Q to an
orthonormal basis of Fp, so Q̃ is unitary (or orthogonal).
We create R̃ by adding p − q rows of zeros to the bottom of R.
It follows that Q̃ R̃ = QR = A. ∎


Example 4.21 Find both QR factorisations of

A = [ 1  1  1 ]
    [−1  0  0 ]
    [ 0  1  0 ]
    [ 0  0  1 ]

SOLUTION:

A = [ 1/√2   1/√6   1/√12    1/2 ] [ √2   1/√2   1/√2 ]
    [−1/√2   1/√6   1/√12    1/2 ] [ 0    √6/2   1/√6 ]
    [ 0      2/√6  −1/√12   −1/2 ] [ 0    0      2/√3 ]
    [ 0      0      3/√12   −1/2 ] [ 0    0      0    ]

(The factorisation of theorem 4.23 uses the first three columns
of this Q together with the first three rows of this R.)


Example 4.22 Find a QR factorisation of

A = [ 2i  −2   0 ]
    [ 1   2i  −2 ]
    [ 2   3i   1 ]

SOLUTION:

A = (1/3) [ 2i   2  −i ] [ 3  4i  0 ]
          [ 1   2i  −2 ] [ 0   1  i ]
          [ 2    i   2 ] [ 0   0  2 ]


4.10 The Method of Least Squares
Suppose we have a lot of data and in trying to fit a model to it
we find ourselves solving a system of linear equations Ax = b.
Also suppose that we want to take all the data into account
and not introduce too many unknowns, and thus end up
solving p equations in q unknowns with p > q .
Typically, there will be no exact solution: what’s the best that
can be done?
Firstly, of course, we need to know what we mean by "best".

The answer to both questions (due to Legendre and Gauss)


turns out to be minimize the sum of the squares of the errors.
So we minimize kAx − bk using the standard inner product on
Rp or Cp : we call these solutions the least squares solutions
to the system.
Now Ax is in the column space of A, so lemma 4.14 tells us that
at a least squares solution, Ax is the projection of b onto col(A).
Another way of saying this is that we want to pick x so that
Ax − b is in col(A)⊥ . But

Ax − b ∈ col(A)⊥ ⇐⇒ hAx − b, Azi = 0 ∀z


⇐⇒ hA∗ (Ax − b), zi = 0 ∀z
⇐⇒ A∗ Ax − A∗ b = 0
⇐⇒ A∗ Ax = A∗ b .

We have proved:
Theorem 4.25 A least squares solution to the system of
equations Ax = b is a solution to the equations A∗ Ax = A∗ b
which are known as the normal equations.



The natural questions to ask are: are there always solutions to
the normal equations, and if so how many solutions are there?
This is not hard to answer: since Ax is the projection onto the
column space of A, the least squares solution to Ax = b
always exists, and will be unique as long as the columns of A
are independent.
So if A has independent columns, this argument implies that
A∗ A is invertible – a fact that you can prove directly without
much difficulty – and so the solution to the normal equations is

x = (A∗ A)−1 A∗ b . (2)

This gives us a formula for the projection onto col(A):

projcol(A) (x) = A(A∗ A)−1 A∗ x



The commonest use of least squares is finding the line of
best fit to a set of points in R2 : given a set of points
{(xi , yi ) : i = 1, . . . , n} we look for a least squares solution to
the equations {yi = a + b xi}.
In matrix form we have Ax = b with x = (a, b)^T, b ∈ Rn the
column vector of the yi and A ∈ Mn,2(R) of the shape

A = [ 1  x1 ]
    [ 1  x2 ]
    [ :   : ]       and so   A*A = [  n      Σ xi  ]
    [ 1  xn ]                      [ Σ xi    Σ xi² ] .

As long as at least 2 of the xi are different, the columns of A are


independent and so there is a unique line of best fit and we
can use equation (2) (rather than row reduction).
Example 4.23 Find the line of best fit to the points

(1, 2), (2, 3), (3, 2), (4, 0)

in the least squares sense.


SOLUTION: Plotting the points gives

[Figure: the four data points plotted in the plane.]

Normal equations:   [ 4   10 ] [ α ]   [ 7  ]
                    [ 10  30 ] [ β ] = [ 14 ] .

Line of best fit is y = 3.5 − 0.7x.
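The same numbers drop out of a few lines of Python/NumPy (a sketch, not part of the slides):

import numpy as np

xs = np.array([1.0, 2, 3, 4])
ys = np.array([2.0, 3, 2, 0])

A = np.column_stack([np.ones_like(xs), xs])     # columns: 1 and x_i
coef = np.linalg.solve(A.T @ A, A.T @ ys)       # normal equations A*A x = A*b
print(coef)                                     # [ 3.5 -0.7]
print(np.linalg.lstsq(A, ys, rcond=None)[0])    # the library routine agrees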


Example 4.24 Consider

W = {x ∈ R3 : 2x1 + 3x2 − x3 = 0} ≤ R3 .

Find the projection matrix onto W .


Check your result by showing that the normal to W is
projected to 0.
SOLUTION: Projection is

(1/14) [ 10  −6   2 ]
       [ −6   5   3 ]
       [  2   3  13 ]
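A sketch of the check in Python/NumPy (not part of the slides), using the formula proj_col(A) = A(A*A)^{-1}A* with a basis of W as the columns of A:

import numpy as np

A = np.array([[1.0, 0], [0, 1], [2, 3]])   # columns satisfy 2x1 + 3x2 - x3 = 0, so they span W
P = A @ np.linalg.inv(A.T @ A) @ A.T       # projection matrix onto W

n = np.array([2.0, 3, -1])                 # normal vector to W
print(np.round(14 * P))                    # compare with the matrix above
print(np.allclose(P @ n, 0))               # the normal is projected to 0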


Example 4.25 Find the quadratic of best fit for the points from
example 4.23:
(1, 2), (2, 3), (3, 2), (4, 0)
SOLUTION:

[Figure: the four data points with the fitted quadratic.]

Answer: y = −1/4 + (61/20)t − (3/4)t²  ∎
We could go on of course: note that there will be a cubic going
exactly through these four points.
I will leave you to prove that if the original equations have a
solution, the least squares method will give it.
A practical difficulty with the least squares procedure, if we
calculate with decimal approximations, is that A∗ A may be
ill–conditioned, meaning that numerical inversion or
row-reduction of the matrix incurs large rounding errors.
It may improve matters if we make use of a QR factorisation for
A, but the Gram-Schmidt method can also throw up numerical
issues (e.g. subtracting two nearly equal numbers).
There is a modified Gram-Schmidt process that minimises
these issues, but there is a method of finding a QR
factorisation called the Householder algorithm that is
purposely designed to avoid all these problems. It is rather
clumsy for hand calculations though.
I will leave it to you to investigate these matters if you are
interested, but it is important to see how we could use a QR
factorisation to find least squares solutions.



So suppose we are solving Ax = b and we have a QR
factorisation of A as A = QR where R is square and invertible.
Then the normal equations in the form A∗ Ax = A∗ b are

R∗ Q∗ QRx = R∗ Q∗ b so that R∗ Rx = R∗ Q∗ b

since orthogonality of the columns of Q means that Q∗ Q = I


even if Q is not square.
But if R is invertible so is R∗ and we have

Rx = Q∗ b

as the normal equations, and these can be solved by


back-substitution.

Example 4.26 Use a QR factorisation to find the line of best
fit to the points
(1, 2), (2, 3), (3, 2), (4, 0)
in the least squares sense.
SOLUTION: We have

A = [ 1  1 ]   [ 1/2   −3√5/10 ]
    [ 1  2 ] = [ 1/2    −√5/10 ]  [ 2   5  ]
    [ 1  3 ]   [ 1/2     √5/10 ]  [ 0   √5 ]
    [ 1  4 ]   [ 1/2    3√5/10 ]

and so Q*y = ( 7/2, −7√5/10 )^T, leading to the same solution
y = 3.5 − 0.7x. ∎
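Finally, the same calculation via a library QR (a Python/NumPy sketch, not part of the slides; NumPy may choose different signs for Q and R, but the solution is the same):

import numpy as np

A = np.column_stack([np.ones(4), np.array([1.0, 2, 3, 4])])
y = np.array([2.0, 3, 2, 0])

Q, R = np.linalg.qr(A)              # reduced QR: Q is 4x2, R is 2x2 upper triangular
x = np.linalg.solve(R, Q.T @ y)     # solve R x = Q* y
print(x)                            # [ 3.5 -0.7]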
