
3. Inner products

Hilbert spaces, inner products, orthogonal and orthonormal families (Chapter 7 of the text
and Chapter 6 of [K])

Recall that Fourier series, as well as Fourier sine and cosine series, share a very important
property: orthogonality. To be precise,
\[
\int_0^{\pi} \sin mx \,\sin nx \, dx = \int_0^{\pi} \cos mx \,\cos nx \, dx = 0 \quad \text{if } m \neq n,\ m, n \in \mathbb{N} \cup \{0\},
\]
and
\[
\int_0^{2\pi} \sin nx \,\cos mx \, dx = 0 \quad \text{for all } m, n \in \mathbb{N} \cup \{0\}.
\]
These are the crucial properties that enable us to compute the Fourier coefficients easily. Let
us relate this to a notion we learn in linear algebra.
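As a quick sanity check, here is a short numerical verification of these orthogonality relations (a sketch using scipy.integrate.quad; the sample indices and tolerance are arbitrary choices, not part of the notes):

```python
import numpy as np
from scipy.integrate import quad

# Check the orthogonality relations numerically for a few sample indices.
for m in range(4):
    for n in range(4):
        if m != n:
            ss, _ = quad(lambda x: np.sin(m * x) * np.sin(n * x), 0, np.pi)
            cc, _ = quad(lambda x: np.cos(m * x) * np.cos(n * x), 0, np.pi)
            assert abs(ss) < 1e-10 and abs(cc) < 1e-10
        sc, _ = quad(lambda x: np.sin(n * x) * np.cos(m * x), 0, 2 * np.pi)
        assert abs(sc) < 1e-10  # holds for all m, n
print("orthogonality relations verified for indices 0..3")
```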

Definition 3.1. ⟨·, ·⟩ is said to be an inner product on a real vector space V if for all
f, g, h ∈ V , we have
(1) ⟨f, g⟩ = ⟨g, f⟩ (symmetric);
(2) ⟨f, g + ch⟩ = ⟨f, g⟩ + c⟨f, h⟩ and ⟨f + ch, g⟩ = ⟨f, g⟩ + c⟨h, g⟩ for all c ∈ R (bilinear);
(3) ⟨f, f⟩ ≥ 0, with equality only when f = 0 (positive definite).

Remark 3.2. (i) Instead of ⟨·, ·⟩, the notation (·, ·) is used in your textbook.
(ii) Occasionally, we may also need to talk about a complex inner product on a complex
vector space. In that case, (1) and (2) above have to be modified: (1) becomes conjugate
symmetry, ⟨f, g⟩ = \overline{⟨g, f⟩}, and (2) becomes linearity in one argument together with
conjugate-linearity in the other.


Examples of vector spaces and inner products that you might have seen before: R^n, C^n,
the space of polynomials on R (or on R^n), and the space of polynomials of degree less than m.

Next, the following vector spaces are examples that we will use in this module.
(1) C[−π, π]: space of continuous functions on [−π, π].
(2) PC[−π, π]: space of piecewise continuous functions on [−π, π].
(3) PC^1[−π, π]: space of piecewise smooth functions on [−π, π].
(4) L^2[−π, π]: space of square integrable functions on [−π, π], or the space of square
integrable periodic functions of period 2π.
(5) ℓ^2: space of square summable sequences.
(6) Space of periodic functions with period 2π.
(7) Space of piecewise continuous periodic functions with period 2π.
(8) Space of piecewise smooth periodic functions with period 2π.
(9) S_{2π}: the space of infinitely differentiable periodic functions of period 2π.

A not so standard example of a vector space and inner product:
\[
S = \Big\{ \sum_{k=1}^{\infty} a_k \sin kx : \sum_{k=1}^{\infty} a_k^2 < \infty \Big\}.
\]
Again, one could define quite a number of inner products on it.

(1) ⟨f, g⟩ = \int_0^{\pi} f(x)g(x)\,dx;
(2) ⟨\sum_{k=1}^{\infty} a_k \sin kx, \sum_{k=1}^{\infty} b_k \sin kx⟩ = \sum_{k=1}^{\infty} a_k b_k.

Actually, if you have taken a course on measure and integral, you might be able to see
that S = L^2[0, π].
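A small numerical sketch comparing the two inner products on truncated elements of S (the truncation length and coefficients are arbitrary; since \int_0^{\pi} \sin^2 kx\,dx = \pi/2, inner product (1) equals (π/2) times inner product (2) on such elements):

```python
import numpy as np

# Truncated elements of S: f = sum a_k sin(kx), g = sum b_k sin(kx) on [0, pi].
a = np.array([1.0, -0.5, 0.25])
b = np.array([0.3, 0.7, -1.0])

x = np.linspace(0, np.pi, 200001)
f = sum(ak * np.sin((k + 1) * x) for k, ak in enumerate(a))
g = sum(bk * np.sin((k + 1) * x) for k, bk in enumerate(b))

dx = x[1] - x[0]
ip1 = np.sum(f * g) * dx          # inner product (1): approximate integral over [0, pi]
ip2 = np.dot(a, b)                # inner product (2): sum of a_k b_k

print(ip1, (np.pi / 2) * ip2)     # the two agree up to the factor pi/2
```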
Remarks:

Definition 3.3. A set F in a vector space with inner product ⟨·, ·⟩ is said to be orthogonal
if (we usually assume 0 ∉ F)

⟨f, g⟩ = 0 for all f, g ∈ F, f ≠ g.

Examples:
(1) Standard basis of the Euclidean space R^n under the usual inner product.

(2) {cos kx, sin kx : k = 0, 1, 2, · · · } under inner product:

(3) {sin kx : k ∈ N} under inner product:

(4) {cos kx : k ∈ N} under inner product:

(5) {1, x, 1/3 − x²} under inner product:

Definition 3.4. An orthogonal set F in a vector space with inner product is said to be
orthonormal if ⟨f, f⟩ = 1 for all f ∈ F.

Examples (under the respective inner products mentioned above):


(1) Standard Euclidean basis.

(2) {1/√(2π), (1/√π) cos kx, (1/√π) sin kx : k ∈ N},

(3) {1/√π, √(2/π) cos kx : k ∈ N},

(4) {√(2/π) sin kx : k ∈ N}.

Question: can we make {1, x, 1/3 − x²} orthonormal?
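Here is a small numerical sketch (assuming the inner product ⟨f, g⟩ = \int_{-1}^{1} f(x)g(x)\,dx, under which this set is orthogonal): dividing each function by its norm makes the family orthonormal.

```python
import numpy as np
from scipy.integrate import quad

# Assumed inner product <f, g> = integral of f*g over [-1, 1].
def ip(f, g):
    return quad(lambda x: f(x) * g(x), -1, 1)[0]

family = [lambda x: 1.0, lambda x: x, lambda x: 1.0 / 3.0 - x ** 2]

# Check pairwise orthogonality, then compute each norm.
for i in range(3):
    for j in range(i + 1, 3):
        assert abs(ip(family[i], family[j])) < 1e-12

norms = [np.sqrt(ip(f, f)) for f in family]
print(norms)  # approximately [sqrt(2), sqrt(2/3), sqrt(8/45)]
# The orthonormal family is { f / ||f|| : f in the family }.
```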

In general, if we have an inner product ⟨·, ·⟩ on a vector space, it is clear that ‖x‖ = √⟨x, x⟩
defines a norm; that is,

Let us now recall from linear algebra:

(1) the Cauchy–Schwarz inequality: |⟨x, y⟩| ≤ ‖x‖ ‖y‖;
(2) the parallelogram rule: ‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖².
Indeed, in general, we have the Pythagorean law:
\[
\Big\| \sum_{j=1}^{k} v_j \Big\|^2 = \sum_{j=1}^{k} \|v_j\|^2 \quad \text{if } \langle v_j, v_l \rangle = 0 \text{ whenever } j \neq l. \tag{P}
\]
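A quick numerical sanity check of these facts (a sketch with random vectors in R^5; the dimension and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(5), rng.standard_normal(5)
norm = np.linalg.norm

# Cauchy-Schwarz inequality
assert abs(np.dot(x, y)) <= norm(x) * norm(y) + 1e-12
# Parallelogram rule
assert np.isclose(norm(x + y) ** 2 + norm(x - y) ** 2,
                  2 * norm(x) ** 2 + 2 * norm(y) ** 2)
# Pythagorean law for an orthogonal pair: project y off x first
y_perp = y - (np.dot(x, y) / np.dot(x, x)) * x
assert np.isclose(norm(x + y_perp) ** 2, norm(x) ** 2 + norm(y_perp) ** 2)
print("all three checks passed")
```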

Indeed, inner products are used in linear algebra to define the angle between two
vectors in V . It is easy to see that one can define an inner product on any finite dimensional
vector space (do it yourselves). This is unfortunately not true for infinite dimensional
vector spaces. If such an inner product exists, we say this vector space is an inner
product space, and it is of course also a normed linear space. A complete inner product
space is said to be a Hilbert space.
Examples

Homework 3.5. Let v_1, · · · , v_n be a basis of a complex vector space V . Define
\[
\Big\langle \sum_{j=1}^{n} \alpha_j v_j, \sum_{j=1}^{n} \beta_j v_j \Big\rangle = \sum_{j=1}^{n} \alpha_j \overline{\beta_j}, \quad \text{for all } \alpha_j, \beta_j \in \mathbb{C}.
\]
Show that this defines an inner product on V and hence show that V is a complete inner
product space (Hilbert space).
Finally, try to find as many inner products on this space V as possible (excluding constant
multiples of one another) if you could.

In view of the Pythagorean law, we realize that it is nice to have a family of nonzero
vectors F = {v_α}_{α∈I} with ⟨v_α, v_β⟩ = 0 whenever v_α ≠ v_β, v_α, v_β ∈ F. We call such a
family an orthogonal family. Moreover, it will be easier in computation if it is normalized,
that is, ‖v_α‖ = 1 for all α ∈ I. In this case, it will be called an orthonormal family, and
we will have, by the Pythagorean law,
\[
\Big\| \sum_{j=1}^{n} \alpha_j v_j \Big\|^2 = \sum_{j=1}^{n} |\alpha_j|^2 \quad \text{for all } \alpha_j \in \mathbb{R} \text{ or } \mathbb{C},\ v_j \in F \text{ for all } j.
\]

We observe that if v = \sum_{j=1}^{n} \alpha_j v_j, then α_j = ⟨v, v_j⟩ for all j, and this remains true as
n → ∞.
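A small numerical illustration of recovering the coefficients from inner products (a sketch in R^6 with an orthonormal family obtained from a QR factorization; the dimensions and coefficients are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
# Columns of Q form an orthonormal family {v_1, v_2, v_3} in R^6.
Q, _ = np.linalg.qr(rng.standard_normal((6, 3)))
alpha = np.array([2.0, -1.0, 0.5])

v = Q @ alpha                       # v = sum_j alpha_j v_j
recovered = Q.T @ v                 # <v, v_j> recovers each alpha_j
print(np.allclose(recovered, alpha))                  # True
print(np.isclose(np.dot(v, v), np.sum(alpha ** 2)))   # Pythagorean law: True
```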

Since such a family has this nice structure, we wonder whether we can find an (orthonor-
mal) family of vectors such that every vector in V is either a linear combination of
vectors in this family or the limit of such linear combinations (i.e., an infinite sum
in terms of such vectors; the family will then be known as a basis). We will simply refer
to such a family as an orthonormal basis.

Let us state some simple facts below.

Proposition 3.6. (1) An orthogonal family of vectors is linearly independent.


(2) Every Hilbert space has a maximal (complete) orthonormal set.

Proof. The first one is easy, while the second one needs the "Gram–Schmidt process" and
"Zorn's lemma".

Definition. An orthonormal set {e_1, e_2, · · · } in a vector space V is said to be an
orthonormal basis of V if for any f ∈ V , there exist unique {C_k} ⊂ R such that
\[
f = \sum_{k=1}^{\infty} C_k e_k.
\]
In particular, it is a complete orthonormal set.


Remark: we only talk about vector spaces with a countably infinite orthonormal basis
in the above. You should have already seen the case of finite dimensional vector spaces
in linear algebra. Unfortunately, none of the infinite dimensional vector spaces discussed
above is a Hilbert space, as they are not complete.

Next, we have Bessel's inequality and Parseval's identity.

Theorem 3.7. Let F be a family of orthonormal vectors in V (we will assume in this
module that it is at most countable). Then
\[
\sum_{v_\alpha \in F} |\langle v, v_\alpha \rangle|^2 \le \|v\|^2 \quad \text{for all } v \in V \qquad \text{(Bessel's inequality).}
\]
Next, if this family is complete, and hence it is an orthonormal basis with
Span{v_α : v_α ∈ F} = V (the vector space itself), then we have Parseval's identity:
\[
\sum_{v_\alpha \in F} |\langle v, v_\alpha \rangle|^2 = \|v\|^2 \quad \text{for all } v \in V.
\]

Proof.
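As a numerical illustration of Bessel's inequality (a sketch; the orthonormal family {sin kx/√π : k ∈ N} on [−π, π] and the test function f(x) = x are assumptions made only for this example):

```python
import numpy as np
from scipy.integrate import quad

# Orthonormal family on [-pi, pi]: phi_k(x) = sin(kx)/sqrt(pi), k = 1, 2, ...
f = lambda x: x
norm_f_sq = quad(lambda x: f(x) ** 2, -np.pi, np.pi)[0]   # ||f||^2 = 2*pi^3/3

coeffs = [quad(lambda x: f(x) * np.sin(k * x) / np.sqrt(np.pi), -np.pi, np.pi)[0]
          for k in range(1, 11)]
bessel_sum = sum(c ** 2 for c in coeffs)

print(bessel_sum, "<=", norm_f_sq)   # Bessel: partial sums never exceed ||f||^2
# As more terms are added, the sum approaches ||f||^2 (Parseval).
```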

The above has many consequences. First of all, we have

Proposition 3.8 (Best approximation by a family of orthonormal vectors).
Let {v_1, · · · , v_n} be an orthonormal family of vectors. Then for all u ∈ V ,
\[
\inf\{\|u - v\| : v \in \mathrm{Span}\{v_1, \ldots, v_n\} = M\} = \Big\| \sum_{j=1}^{n} \langle u, v_j \rangle v_j - u \Big\|.
\]
Thus, the infimum is achieved. Moreover, the minimizer is unique. We will then say that the vector
\[
\sum_{j=1}^{n} \langle u, v_j \rangle v_j = P_M u
\]
is the orthogonal projection of u to M . By uniqueness, this P_M u is
independent of the choice of orthonormal basis of M . Moreover, u − P_M u is perpendicular
to M and
(P) ‖P_M u − P_M v‖ ≤ ‖u − v‖.
Note that (P) implies the projection map P_M is continuous.

Remark. If u ∈ M , then P_M (u) = u.
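A short sketch of the orthogonal projection in R^n (the subspace and vector are arbitrary; an orthonormal basis of M is obtained via QR):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 2))        # M = span of the two columns of A
Q, _ = np.linalg.qr(A)                 # orthonormal basis {v_1, v_2} of M
u = rng.standard_normal(5)

P_M_u = Q @ (Q.T @ u)                  # sum_j <u, v_j> v_j

# u - P_M u is perpendicular to M, and P_M u is the closest point of M to u.
print(np.allclose(A.T @ (u - P_M_u), 0))                                   # True
other_point = A @ rng.standard_normal(2)                                   # some other point of M
print(np.linalg.norm(u - P_M_u) <= np.linalg.norm(u - other_point))        # True
```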

Homework 3.9. Let {v_j}_{j∈N} be an orthonormal basis of a Hilbert space H and x, y ∈ H.
Show that
\[
\langle x, y \rangle = \sum_{j=1}^{\infty} \langle x, v_j \rangle \langle y, v_j \rangle.
\]

Least squares approximation in R^n

We consider least squares approximation of a vector in R^n out of a linear subspace M .

Problem 1. Let {a_1, . . . , a_m} be a basis of a subspace M of the inner product space R^n,
where m < n. For any b ∈ R^n, we want to find v ∈ M such that
\[
\|v - b\| = \inf_{\tilde v \in M} \|\tilde v - b\|.
\]

This is the same as finding α* = (α*_1, . . . , α*_m) ∈ R^m such that
\[
\Big\| \sum_{k=1}^{m} \alpha_k^* a_k - b \Big\| = \min_{\alpha \in \mathbb{R}^m} \Big\| \sum_{k=1}^{m} \alpha_k a_k - b \Big\|.
\]

Let A = [a_1 . . . a_m] be the n × m matrix with columns a_k, 1 ≤ k ≤ m. Then Problem
1 can be rewritten in the form:

Problem 1′. Find α* ∈ R^m such that
\[
\|A\alpha^* - b\| = \min_{\alpha \in \mathbb{R}^m} \|A\alpha - b\|.
\]

First note that AR^m (the range) is a linear subspace of R^n and we may write any
element inside AR^m as Aα with α ∈ R^m. Recall that the best approximation in AR^m is
just the 'orthogonal projection'
\[
P_M b = \sum_{i=1}^{m} \alpha_i^* a_i
\]
of b onto M = AR^m, and the vector α* = (α*_1, . . . , α*_m) satisfies the normal equations
\[
\begin{pmatrix}
\langle a_1, a_1 \rangle & \cdots & \langle a_m, a_1 \rangle \\
\vdots & \ddots & \vdots \\
\langle a_1, a_m \rangle & \cdots & \langle a_m, a_m \rangle
\end{pmatrix}
\begin{pmatrix} \alpha_1^* \\ \vdots \\ \alpha_m^* \end{pmatrix}
=
\begin{pmatrix} \langle b, a_1 \rangle \\ \vdots \\ \langle b, a_m \rangle \end{pmatrix},
\tag{3.1}
\]
or equivalently, in matrix form,
\[
A^T A\, \alpha^* = A^T b. \tag{3.2}
\]

Indeed, it is easy to establish (3.2): since we know b − P_M b is orthogonal to every vector
in M , for any x ∈ R^m we have, since Ax ∈ M ,
\[
\langle b - P_M b, Ax \rangle = 0 \quad \text{and hence} \quad \langle b, Ax \rangle = \langle P_M b, Ax \rangle.
\]
Thus, if P_M b = Aα*, we have
\[
\langle A^T b, x \rangle = \langle b, Ax \rangle = \langle A\alpha^*, Ax \rangle = \langle A^T A \alpha^*, x \rangle \quad \text{for all } x \in \mathbb{R}^m,
\]
and hence we have (3.2).
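A minimal numerical sketch of (3.2) (random data; np.linalg.lstsq solves the same least squares problem directly, so it serves as a cross-check):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 8, 3
A = rng.standard_normal((n, m))     # columns a_1, ..., a_m (a basis of M)
b = rng.standard_normal(n)

alpha_star = np.linalg.solve(A.T @ A, A.T @ b)          # normal equations (3.2)
alpha_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]      # direct least squares

print(np.allclose(alpha_star, alpha_lstsq))             # True
print(np.allclose(A.T @ (b - A @ alpha_star), 0))       # residual is orthogonal to M
```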

Question: what if the given {a1 , · · · , am } is also orthonormal?

Application:
Let S := {(x_k, y_k) ∈ R², 1 ≤ k ≤ n}, x_k's distinct, be a finite set of points in R²
and W be a finite dimensional space of continuous functions. Find the least squares
approximation f* ∈ W to the points in S, that is, solve the following discrete least
squares problem
\[
\Big( \sum_{k=1}^{n} |y_k - f^*(x_k)|^2 \Big)^{1/2} = \min_{f \in W} \Big( \sum_{k=1}^{n} |y_k - f(x_k)|^2 \Big)^{1/2}.
\]
Let m be the dimension of the space W (m usually less than n), and {f_1, . . . , f_m}
be a basis of W . Then every function f ∈ W can be written as f = \sum_{i=1}^{m} \alpha_i f_i for some
(α_1, . . . , α_m) ∈ R^m. Thus the problem reduces to finding (α*_1, · · · , α*_m) ∈ R^m such that
\[
\Big( \sum_{k=1}^{n} \Big| y_k - \sum_{i=1}^{m} \alpha_i^* f_i(x_k) \Big|^2 \Big)^{1/2}
= \min_{(\alpha_1, \ldots, \alpha_m) \in \mathbb{R}^m} \Big( \sum_{k=1}^{n} \Big| y_k - \sum_{i=1}^{m} \alpha_i f_i(x_k) \Big|^2 \Big)^{1/2}.
\]

Therefore it suffices to find w* such that
\[
\|b - w^*\| = \min_{w \in M} \|b - w\|,
\]
where ‖ · ‖ is the usual Euclidean norm on R^n, b = (y_1, . . . , y_n) ∈ R^n, and M is the
subspace of R^n spanned by the vectors
\[
e_i := (f_i(x_1), \ldots, f_i(x_n)), \quad 1 \le i \le m. \tag{3.3}
\]
If {e_1, . . . , e_m} is linearly independent in R^n, then by Proposition 3.8 there is a unique
function f = \sum_{i=1}^{m} \alpha_i f_i ∈ W which is the solution of the problem. Moreover, the vector
(α_1, . . . , α_m) satisfies the following linear system
\[
\begin{pmatrix}
\sum_{k=1}^{n} |f_1(x_k)|^2 & \cdots & \sum_{k=1}^{n} f_m(x_k) f_1(x_k) \\
\vdots & \ddots & \vdots \\
\sum_{k=1}^{n} f_1(x_k) f_m(x_k) & \cdots & \sum_{k=1}^{n} |f_m(x_k)|^2
\end{pmatrix}
\begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_m \end{pmatrix}
=
\begin{pmatrix} \sum_{k=1}^{n} y_k f_1(x_k) \\ \vdots \\ \sum_{k=1}^{n} y_k f_m(x_k) \end{pmatrix}.
\tag{3.4}
\]
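A short sketch of this reduction (hypothetical data and basis functions chosen only for illustration; the design matrix below has the vectors e_i of (3.3) as columns, and (3.4) is solved directly):

```python
import numpy as np

# Hypothetical data points and a 3-dimensional space W = span{1, sin x, cos x}.
rng = np.random.default_rng(4)
x = np.linspace(0, 4, 25)
y = 1.5 + 0.8 * np.sin(x) - 0.3 * np.cos(x) + 0.05 * rng.standard_normal(25)

basis = [lambda t: np.ones_like(t), np.sin, np.cos]
E = np.column_stack([fi(x) for fi in basis])      # columns are the vectors e_i of (3.3)

alpha = np.linalg.solve(E.T @ E, E.T @ y)         # the linear system (3.4)
f_star = E @ alpha                                # values of the least squares fit at x_k
print(alpha)
```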

Notice that we need {e_i = (f_i(x_1), · · · , f_i(x_n)) : i = 1, · · · , m} to be linearly indepen-
dent. This is usually true when n ≥ m. Let us now look at a special case:

Example 3.10 (Linear regression). Let (x_k, y_k) ∈ R², 1 ≤ k ≤ n, x_k's distinct. Find the
least squares approximation (the best approximation in ℓ²-norm) using a line, that is,
\[
\min_{l} \Big( \sum_{k=1}^{n} |y_k - l(x_k)|^2 \Big)^{1/2},
\]
where l is a linear function on the plane.

Write l(x) = α_0 + α_1 x. Then
\[
\min_{l} \Big( \sum_{k=1}^{n} |y_k - l(x_k)|^2 \Big)^{1/2}
= \min_{\alpha_0, \alpha_1 \in \mathbb{R}} \Big( \sum_{k=1}^{n} (y_k - \alpha_0 - \alpha_1 x_k)^2 \Big)^{1/2}
= \min_{\alpha \in \mathbb{R}^2} \|y - A\alpha\|_2,
\]
where A = [1  x], 1 = (1, . . . , 1)^T with n 1's, x = (x_1, . . . , x_n)^T, y = (y_1, . . . , y_n)^T and
α = (α_0, α_1)^T. Therefore, by the characterization of best approximation, the coefficients
α*_0, α*_1 of the best approximation satisfy the following equation:
\[
\begin{pmatrix} \langle 1, 1 \rangle & \langle x, 1 \rangle \\ \langle 1, x \rangle & \langle x, x \rangle \end{pmatrix}
\begin{pmatrix} \alpha_0^* \\ \alpha_1^* \end{pmatrix}
=
\begin{pmatrix} \langle y, 1 \rangle \\ \langle y, x \rangle \end{pmatrix}.
\]

By direct computation, we have
\[
\langle 1, 1 \rangle = n, \quad
\langle x, 1 \rangle = \sum_{k=1}^{n} x_k, \quad
\langle x, x \rangle = \sum_{k=1}^{n} x_k^2, \quad
\langle y, 1 \rangle = \sum_{k=1}^{n} y_k, \quad
\langle y, x \rangle = \sum_{k=1}^{n} x_k y_k.
\]
Therefore
\[
l^*(x) = \alpha_0^* + \alpha_1^* x,
\]


where
\[
\alpha_0^* = \frac{\big(\sum_{k=1}^{n} y_k\big)\big(\sum_{k=1}^{n} x_k^2\big) - \big(\sum_{k=1}^{n} y_k x_k\big)\big(\sum_{k=1}^{n} x_k\big)}{n \sum_{k=1}^{n} x_k^2 - \big(\sum_{k=1}^{n} x_k\big)^2},
\qquad
\alpha_1^* = \frac{n \sum_{k=1}^{n} x_k y_k - \big(\sum_{k=1}^{n} x_k\big)\big(\sum_{k=1}^{n} y_k\big)}{n \sum_{k=1}^{n} x_k^2 - \big(\sum_{k=1}^{n} x_k\big)^2}.
\]
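A quick numerical check of these closed-form formulas (hypothetical data; np.polyfit fits the same least squares line, so it serves as a cross-check):

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0, 10, 20))
y = 2.0 + 0.5 * x + 0.3 * rng.standard_normal(20)
n = len(x)

denom = n * np.sum(x ** 2) - np.sum(x) ** 2
a0 = (np.sum(y) * np.sum(x ** 2) - np.sum(y * x) * np.sum(x)) / denom
a1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / denom

slope, intercept = np.polyfit(x, y, 1)              # degree-1 least squares fit
print(np.allclose([a0, a1], [intercept, slope]))    # True
```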

More generally, consider the following regression problem.

Example 3.11. Let (x_k, y_k) ∈ R², 1 ≤ k ≤ n, x_k's distinct. Find the best approximation
(in ℓ²-norm) using a polynomial of degree at most m, that is,
\[
\min_{p} \Big( \sum_{k=1}^{n} |y_k - p(x_k)|^2 \Big)^{1/2},
\]
where p(x) = α_0 + α_1 x + . . . + α_m x^m.

For the case that W = P_m, the space of all polynomials of degree at most m, the vectors
in (3.3) are linearly independent in R^n if m < n. Therefore there is a unique polynomial
p* of degree at most m such that
\[
\Big( \sum_{k=1}^{n} |y_k - p^*(x_k)|^2 \Big)^{1/2} = \min_{p \in P_m} \Big( \sum_{k=1}^{n} |y_k - p(x_k)|^2 \Big)^{1/2}.
\]
Indeed, when m = n − 1, the above minimum will be zero (see Appendix).



Gram determinant [K, chap 6, 6.5]

Recall that the matrix A^T A (the coefficient matrix of (3.1)) plays an important role. Indeed,
its determinant is called the Gram determinant of {a_1, · · · , a_m}, and we denote it by
G(a_1, · · · , a_m). With this notation, we may express the minimum distance of b to AR^m as
\[
\|b - P_M b\|^2 = G(b, a_1, \ldots, a_m)/G(a_1, \ldots, a_m).
\]
Note that again the formula is easy when the given vectors a_i are orthonormal.
Proof:
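A numerical check of the distance formula (a sketch with randomly chosen vectors; G(·) is computed as the determinant of the corresponding Gram matrix):

```python
import numpy as np

def gram_det(*vectors):
    V = np.column_stack(vectors)
    return np.linalg.det(V.T @ V)        # Gram determinant G(v_1, ..., v_k)

rng = np.random.default_rng(6)
a1, a2, b = rng.standard_normal(4), rng.standard_normal(4), rng.standard_normal(4)

A = np.column_stack([a1, a2])
alpha = np.linalg.solve(A.T @ A, A.T @ b)        # normal equations
dist_sq = np.sum((b - A @ alpha) ** 2)           # ||b - P_M b||^2

print(np.isclose(dist_sq, gram_det(b, a1, a2) / gram_det(a1, a2)))   # True
```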

Best approximation in L^2
We now extend our study to more general inner product spaces. Recall that
\[
\langle f, g \rangle = \int_a^b f g \, dx
\]
defines an inner product on various subspaces of L^2.


In particular, let us recall the following orthogonal families of functions. Instead of
using polynomials, we use trigonometric functions.

Example:
{1/√(2π), (1/√π) cos kx, (1/√π) sin kx : k ∈ N} is an orthonormal basis of the space of
square integrable periodic functions of period 2π.

{1/√π, √(2/π) cos kx : k ∈ N} is an orthonormal basis of the space of even square integrable
periodic functions of period 2π, and {√(2/π) sin kx : k ∈ N} is an orthonormal basis of the
space of odd square integrable periodic functions of period 2π.

Best approximation in the mean (in L^2) (§ 64)

Theorem 3.12. Let {ϕ_k} be an orthonormal set. If c_k = (f, ϕ_k) for all k, then
for any n ∈ N and {γ_k} ⊂ R,
\[
\int_a^b \Big| f(x) - \sum_{k=1}^{n} \gamma_k \varphi_k(x) \Big|^2 dx \;\ge\; \int_a^b \Big| f(x) - \sum_{k=1}^{n} c_k \varphi_k(x) \Big|^2 dx.
\]

Proof.

Best approximation in the mean for Fourier series (§ 66)

Let f be a piecewise continuous function on [0, L] and let
\[
a_0 + \sum_{k=1}^{\infty} a_k \cos \frac{2k\pi x}{L} + \sum_{k=1}^{\infty} b_k \sin \frac{2k\pi x}{L}
\]
be the Fourier series of f on [0, L]. Then for all n and {a'_k, b'_k} ⊂ R,
\[
\int_0^L \Big| f(x) - \Big( a_0' + \sum_{k=1}^{n} a_k' \cos \frac{2k\pi x}{L} + \sum_{k=1}^{n} b_k' \sin \frac{2k\pi x}{L} \Big) \Big|^2 dx
\;\ge\;
\int_0^L \Big| f(x) - \Big( a_0 + \sum_{k=1}^{n} a_k \cos \frac{2k\pi x}{L} + \sum_{k=1}^{n} b_k \sin \frac{2k\pi x}{L} \Big) \Big|^2 dx
\]
(best approximation in the mean).

Example 3.13. Recall that
\[
\sum_{k=1}^{\infty} \frac{2(-1)^{k+1} \sin kx}{k} = x \quad \text{if } -\pi < x < \pi.
\]
Thus,
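As a numerical illustration of the best-approximation property (a sketch on [−π, π] with f(x) = x: the mean-square error of the partial Fourier sum is compared with the error after perturbing one coefficient):

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 20001)
f = x
dx = x[1] - x[0]
N = 5

fourier = sum(2 * (-1) ** (k + 1) / k * np.sin(k * x) for k in range(1, N + 1))
perturbed = fourier + 0.1 * np.sin(3 * x)     # same trig polynomial space, other coefficients

err_fourier = np.sum((f - fourier) ** 2) * dx
err_other = np.sum((f - perturbed) ** 2) * dx
print(err_fourier < err_other)                # True: the Fourier coefficients minimize the error
```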

Parseval's identity or Parseval's equation (§ 65)

Theorem 3.14. Let {ϕ_k(x) : k ∈ N} be an orthonormal basis of L^2[a, b] (the space of square
integrable functions). Then for any f ∈ L^2[a, b],
\[
\int_a^b |f(x)|^2 \, dx = \sum_{k=1}^{\infty} c_k^2 \qquad \text{(Parseval's equation/identity)}
\]
where c_k = \int_a^b f(x) \varphi_k(x) \, dx. Note that f = \sum_{k=1}^{\infty} c_k \varphi_k.
In particular, let f be a piecewise continuous function on [0, L] and let
\[
a_0 + \sum_{k=1}^{\infty} a_k \cos \frac{2k\pi x}{L} + \sum_{k=1}^{\infty} b_k \sin \frac{2k\pi x}{L}
\]
be the Fourier series of f on (0, L). Then
\[
\int_0^L |f(x)|^2 \, dx = \frac{L}{2} \Big( 2a_0^2 + \sum_{k=1}^{\infty} (a_k^2 + b_k^2) \Big) \qquad \text{(Parseval's equation/identity).}
\]

Example 3.15. From the previous example, using the Fourier series of x on [−π, π], we
have
\[
\int_{-\pi}^{\pi} x^2 \, dx = \pi \sum_{k=1}^{\infty} \Big( \frac{2(-1)^{k+1}}{k} \Big)^2 = 4\pi \sum_{k=1}^{\infty} \frac{1}{k^2}.
\]
Hence
\[
\sum_{k=1}^{\infty} \frac{1}{k^2} = \pi^2/6.
\]
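A quick numerical sanity check of this computation (partial sums only; a sketch):

```python
import numpy as np

lhs = 2 * np.pi ** 3 / 3                      # integral of x^2 over [-pi, pi]
partial = 4 * np.pi * sum(1.0 / k ** 2 for k in range(1, 100001))
print(lhs, partial)                           # the partial sum approaches the integral
print(sum(1.0 / k ** 2 for k in range(1, 100001)), np.pi ** 2 / 6)
```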

[Exercise] Find the Parseval equations for each of the following:
\[
(1) \quad 12 \sum_{k=1}^{\infty} \frac{(-1)^k}{k^3} \sin kx = x^3 - \pi^2 x, \quad -\pi < x < \pi.
\]
\[
(2) \quad \frac{\pi^2}{3} + 4 \sum_{k=1}^{\infty} \frac{(-1)^k \cos kx}{k^2} = x^2 \quad \text{for all } -\pi \le x \le \pi.
\]

Parseval's equation for Fourier cosine series and sine series (§ 66)
Similarly, if f is a piecewise continuous function on [0, π], then
\[
\int_0^{\pi} |f(x)|^2 \, dx = \frac{\pi}{2} \sum_{k=1}^{\infty} b_k^2, \qquad b_k = \frac{2}{\pi} \int_0^{\pi} f(x) \sin kx \, dx,
\]
and
\[
\int_0^{\pi} |f(x)|^2 \, dx = \frac{\pi}{2} \Big( 2a_0^2 + \sum_{k=1}^{\infty} a_k^2 \Big), \qquad
a_k = \frac{2}{\pi} \int_0^{\pi} f(x) \cos kx \, dx \ \text{for } k \in \mathbb{N}, \quad a_0 = \frac{1}{\pi} \int_0^{\pi} f(x) \, dx.
\]

Proof of Parseval's identity.

First, let us consider the special subset S_{2π} (the space of infinitely differentiable functions
on R that are periodic with period 2π).

Lemma. Let a_k and b_k be as before. Then \sum_{k=1}^{\infty} |a_k| and \sum_{k=1}^{\infty} |b_k| both converge.
Proof.

We now consider least squares approximation of a function in a weighted L^2-space.

A weight w on a subinterval [a, b] of R is a positive, continuous and integrable function
on [a, b], except that w(a) or w(b) may be ∞. (Indeed, we can deal with more general
weights if you have taken 'measure and integral'.)

Example 3.16. (1) w_1(x) := 1, x ∈ [−1, 1].

(2) w_2(x) := (1 − x²)^{−1/2}, x ∈ [−1, 1].

(3) The Gaussian function w_3 defined by w_3(x) = (2π)^{−1/2} e^{−x²/2} on (−∞, ∞).

For a weight w on [a, b], we define the weighted L^2-space by
\[
L^2_w[a, b] := \{ f : f \text{ is measurable on } [a, b] \text{ and } \|f\|_{2,w} < \infty \},
\]
where
\[
\|f\|_{2,w} := \Big( \int_a^b |f(x)|^2 w(x) \, dx \Big)^{1/2}.
\]

One may verify that L^2_w[a, b] is an inner product space with the inner product ⟨·, ·⟩_w on
L^2_w[a, b] defined by
\[
\langle f, g \rangle_w = \int_a^b f(x) g(x) w(x) \, dx, \qquad \forall f, g \in L^2_w[a, b].
\]

Let C[a, b] be the space of all continuous functions on [a, b]. If [a, b] is a finite interval,
then for any f ∈ C[a, b],
\[
\|f\|_{2,w}^2 = \int_a^b |f(x)|^2 w(x) \, dx \le \|f\|_{\infty}^2 \int_a^b w(x) \, dx < \infty.
\]
Hence, if P_n is the space of all polynomials of degree at most n, then
\[
P_n \subset C[a, b] \subset L^2_w[a, b].
\]



By our previous results, we have the following result on least squares approximation by
polynomials in a weighted L^2-space.

Theorem 3.17. Let w be a weight on a finite interval [a, b] and let f ∈ L^2_w[a, b]. Then
p*_n ∈ P_n is the least squares approximation to f out of P_n if and only if
\[
\langle f - p_n^*, p \rangle_w = 0 \qquad \forall p \in P_n.
\]
Moreover, p_n^*(x) = \sum_{k=0}^{n} \alpha_k^* x^k, where
\[
\begin{pmatrix}
\langle 1, 1 \rangle_w & \cdots & \langle x^n, 1 \rangle_w \\
\vdots & \ddots & \vdots \\
\langle 1, x^n \rangle_w & \cdots & \langle x^n, x^n \rangle_w
\end{pmatrix}
\begin{pmatrix} \alpha_0^* \\ \vdots \\ \alpha_n^* \end{pmatrix}
=
\begin{pmatrix} \langle f, 1 \rangle_w \\ \vdots \\ \langle f, x^n \rangle_w \end{pmatrix}.
\tag{3.5}
\]
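A small sketch of solving (3.5) numerically (assumed setup, chosen only for illustration: the weight w_1 ≡ 1 on [−1, 1], f(x) = e^x, and n = 2; the Gram entries and right-hand side are computed with scipy.integrate.quad):

```python
import numpy as np
from scipy.integrate import quad

a, b, n = -1.0, 1.0, 2
w = lambda x: 1.0                     # weight w_1 from Example 3.16
f = lambda x: np.exp(x)

# Gram matrix <x^j, x^i>_w and right-hand side <f, x^i>_w from (3.5).
G = np.array([[quad(lambda x: x ** j * x ** i * w(x), a, b)[0] for j in range(n + 1)]
              for i in range(n + 1)])
rhs = np.array([quad(lambda x: f(x) * x ** i * w(x), a, b)[0] for i in range(n + 1)])

alpha = np.linalg.solve(G, rhs)       # coefficients of the least squares polynomial p_n^*
print(alpha)
```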

Appendix: Lagrange interpolation

We want to construct the polynomial p_n of degree ≤ n which interpolates the points
(x_i, y_i), i = 0, 1, . . . , n. For k = 0, 1, . . . , n, let
\[
l_k(x) = \frac{(x - x_0) \cdots (x - x_{k-1})(x - x_{k+1}) \cdots (x - x_n)}{(x_k - x_0) \cdots (x_k - x_{k-1})(x_k - x_{k+1}) \cdots (x_k - x_n)}
= \frac{\prod_{i=0, i \neq k}^{n} (x - x_i)}{\prod_{i=0, i \neq k}^{n} (x_k - x_i)}. \tag{3.6}
\]
Then l_k is a polynomial of degree n, and
\[
l_k(x_j) = \delta_{kj} =
\begin{cases}
1, & j = k, \\
0, & j \neq k,
\end{cases}
\qquad j = 0, 1, \ldots, n. \tag{3.7}
\]
Let
\[
p_n(x) = \sum_{k=0}^{n} y_k l_k(x). \tag{3.8}
\]

Then p_n is a polynomial of degree ≤ n, and
\[
p_n(x_j) = \sum_{k=0}^{n} y_k l_k(x_j) = y_j, \qquad j = 0, 1, \ldots, n.
\]
Hence the RHS of (3.8) is the unique polynomial of degree ≤ n which interpolates the data
(x_j, y_j), j = 0, 1, . . . , n, and is called the Lagrange interpolating polynomial. (3.8) is called
the Lagrange interpolation formula. In particular,

Theorem 3.18. Let f be a real-valued function defined on a set containing {x_0, x_1, . . . , x_n},
where the x_i's are all distinct. Then
\[
p_n(x) = \sum_{k=0}^{n} f(x_k) l_k(x), \tag{3.9}
\]
where l_k is given by (3.6), is the unique polynomial of degree ≤ n such that
\[
p_n(x_i) = f(x_i), \qquad i = 0, 1, \ldots, n.
\]
(We say that p_n interpolates f at the nodes/knots x_0, x_1, . . . , x_n.)

Remark. If we add an extra data point, we have to recompute the interpolating polynomial
completely.
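A minimal sketch of the Lagrange interpolation formula (3.8) (hypothetical nodes and data; the cross-check against numpy's exact degree-n polynomial fit agrees up to rounding):

```python
import numpy as np

def lagrange_interp(x_nodes, y_nodes, x):
    """Evaluate the Lagrange interpolating polynomial (3.8) at the points x."""
    total = np.zeros_like(x, dtype=float)
    for k, (xk, yk) in enumerate(zip(x_nodes, y_nodes)):
        lk = np.ones_like(x, dtype=float)
        for i, xi in enumerate(x_nodes):
            if i != k:
                lk *= (x - xi) / (xk - xi)      # the basis polynomial l_k of (3.6)
        total += yk * lk
    return total

x_nodes = np.array([0.0, 1.0, 2.0, 3.0])
y_nodes = np.array([1.0, 2.0, 0.0, 5.0])
x = np.linspace(0, 3, 7)

p = lagrange_interp(x_nodes, y_nodes, x)
q = np.polyval(np.polyfit(x_nodes, y_nodes, 3), x)   # degree-3 polynomial through 4 points
print(np.allclose(p, q))                                                 # True
print(np.allclose(lagrange_interp(x_nodes, y_nodes, x_nodes), y_nodes))  # interpolates the data
```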
