
Numerical Analysis Module 2

Fundamentals of Vector Spaces


Sachin C. Patwardhan
Dept. of Chemical Engineering,
Indian Institute of Technology, Bombay
Powai, Mumbai, 400 076, India.
Email: sachinp@iitb.ac.in

Contents

1 Introduction

2 Basics of Vector Spaces
  2.1 Linear Vector Space
  2.2 Subspace, Basis and Dimension
  2.3 Exercise

3 Transformations

4 Magnitudes
  4.1 Normed Linear Spaces
  4.2 Induced Matrix Norms
  4.3 Convergence and Banach Spaces
  4.4 Exercises

5 Angles
  5.1 Inner Product Spaces
  5.2 Orthogonal Projections
  5.3 Orthogonal Projections in R^n
  5.4 Gram-Schmidt Process and Orthogonal Polynomials
  5.5 Generalized Fourier Series
  5.6 Exercise

6 Summary

7 Appendices
  7.1 Appendix A: Computation of Matrix Norms
    7.1.1 Computation of the 2-Norm [8][5]
    7.1.2 Computation of the 1-Norm [5]
    7.1.3 Computation of the ∞-Norm of a Matrix [5]
  7.2 Appendix B: Necessary and Sufficient Conditions for Unconstrained Optimality
    7.2.1 Preliminaries
    7.2.2 Necessary Condition for Optimality
    7.2.3 Sufficient Condition for Optimality
1 Introduction
In the previous module, we derived different abstract equation forms, namely linear algebraic equations, nonlinear algebraic equations, DAEs, ODE-IVPs, ODE-BVPs and PDEs. Are these equation forms fundamentally different from each other? To see the unity among the apparent diversity, we need to acquaint ourselves with the generalized concept of a vector space.
When we begin to use the concept of vectors for formulating mathematical models for physical systems, we start with the concept of a vector in the three dimensional coordinate space. From the mathematical viewpoint, the three dimensional space can be looked upon as a set of objects, called vectors, which satisfy certain generic properties. While working with mathematical modeling, we need to deal with a variety of such sets containing different types of objects. It is possible to distill the essential properties satisfied by all the vectors in the three dimensional vector space and develop a more general concept of a vector space, which is a set of objects that satisfy these generic properties. Such a generalization can provide a unified view of problem formulations and solution techniques. Generalization of the concept of the vector and the three dimensional vector space to any general set is not sufficient. To work with these sets of generalized vectors, we also need to generalize various algebraic and geometric concepts, such as the magnitude of a vector, convergence of a sequence of vectors, limits, the angle between two vectors, orthogonality, etc. Understanding the fundamentals of vector spaces also helps in developing a unified view of many seemingly different numerical schemes. In this module, the fundamentals of vector spaces are briefly introduced. A more detailed treatment of these topics can be found in Luenberger [4] and Kreyszig [1].
A word of advice before we begin to study these grand generalizations. While dealing with the generalization of geometric notions in three dimensions to more general vector spaces, it is difficult to visualize vectors and surfaces as we can do in the three dimensional vector space. However, if you understand the geometrical concepts in the three dimensional space well, then you can develop a good understanding of the corresponding concept in any general vector space. In short, it is enough to know your school geometry well. We are only building qualitatively similar structures on the other sets of interest.

2 Basics of Vector Spaces


2.1 Linear Vector Space
When vectors are introduced to us in high-school or junior college textbooks, the first concept that we are taught is the parallelogram law of vector addition. Given two arbitrary vectors, say u and v in R^3,

u = i − j + 2k  and  v = i + j − k

and some arbitrary real numbers, say α and β, we learn that

Scalar multiplication of the vectors u and v by the scalars α and β, respectively, yields new vectors αu and βv that are also in R^3, i.e.

αu = [α  −α  2α]^T ∈ R^3  and  βv = [β  β  −β]^T ∈ R^3

Adding αu and βv, i.e. αu + βv, yields another vector in R^3, i.e.

αu + βv = [α + β  −α + β  2α − β]^T ∈ R^3

The vector αu + βv forms one of the diagonals of the parallelogram with sides αu and βv.

In fact, if u and v happen to represent two independent directions in a plane, then any other vector in the plane can be reached as a linear combination of these two vectors. For example, any vector lying in the x-y plane can be reached as a linear combination of the following two independent directions

u = [1  1  0]^T  and  v = [1  −1  0]^T

Note that the scalars here are chosen as the set of real numbers with a purpose. If we happen to restrict the scalars to some other set, say the set of integers, then we will not be able to reach an arbitrary point in the plane through the use of the parallelogram law. What, then, is so special about the law of vector addition and the set of real numbers as the choice of scalars? To get insight into this, we need to understand the concepts of closure and field.

Definition 1 (Closure): A set is said to be closed under an operation when any two elements of the set subjected to the operation yield a third element belonging to the same set.

Definition 2 (Field): A field is a set of elements closed under addition, subtraction, multiplication and division (by nonzero elements).

Note that the set of integers is closed under addition, multiplication and subtraction. However, this set is not closed under division. On the other hand, the set of real numbers (R) and the set of complex numbers (C) are closed under addition, subtraction, multiplication and division. Moreover, the set of real numbers (R) and the set of complex numbers (C) are scalar fields, while the set of integers is not a field. Also, it is easy to see that closure under vector addition and scalar multiplication are two fundamental properties of vectors in R^3. These properties form the basis for generalizing the concept of a vector space. A vector space is defined as a nonempty set of elements which is closed under addition and scalar multiplication. Thus, associated with every vector space is a set of scalars F (also called the scalar field or coefficient field) used to define scalar multiplication on the space. In functional analysis, the scalars will always be taken to be the set of real numbers (R) or complex numbers (C).

Definition 3 (Vector Space): A vector space X is a set of elements called vectors and an associated scalar field F, together with two operations. The first operation, called addition, associates with any two vectors x, y ∈ X a vector x + y ∈ X, the sum of x and y. The second operation, called scalar multiplication, associates with any vector x ∈ X and any scalar α ∈ F a vector αx (a scalar multiple of x by α).

Thus, when X is a linear vector space, given any vectors x, y ∈ X and any scalars α, β ∈ R, the element αx + βy ∈ X. This implies that the well known parallelogram law in three dimensions also holds true in any vector space. Given a vector space X and a scalar field F, it is easy to show that the following properties hold for any x, y, z ∈ X and any scalars α, β ∈ F:

1. Commutative law: x + y = y + x

2. Associative law: x + (y + z) = (x + y) + z

3. There exists a null vector 0 such that x + 0 = x for all x ∈ X

4. Distributive laws: α(x + y) = αx + αy, (α + β)x = αx + βx and α(βx) = (αβ)x

5. αx = 0 when α = 0 and αx = x when α = 1

6. For convenience, (−1)x is defined as −x and called the negative of a vector. We have x + (−x) = 0, where 0 represents the zero vector in X.

Let us examine some examples of vector spaces. To begin with, let us consider the set X consisting of all n-tuples of the form

x = [x1  x2  ...  xn]^T      (1)

where xi ∈ R, together with F ≡ R. Given any two elements from this set, say x, y ∈ X, and arbitrary scalars α, β ∈ R, it is easy to see that the n-tuple αx + βy is also contained in X. The set considered here is the n-dimensional real coordinate space, i.e. X ≡ R^n. Suppose, in the previous example, we consider a set of n-tuples where xi ∈ C together with F ≡ C. It is straightforward to show that, for any x, y ∈ C^n and arbitrary scalars α, β ∈ C, the n-tuple αx + βy is also contained in C^n. Note that the choice of the scalar field is critical while defining a vector space. For example, consider (X ≡ R^n, F ≡ C). Since for any x ∈ X and any α ∈ C the vector αx ∉ R^n, this combination of set X and scalar field F does not form a vector space.
The generalized definition of a vector space permits us to view many interesting sets as vector spaces other than R^n and C^n. For example, consider the set of all m × n matrices with real elements together with the scalar field R, i.e. X ≡ R^m × R^n, F ≡ R. It is easy to see that, if A, B ∈ X, then αA + βB ∈ X for arbitrary α, β ∈ R, and thus X is a vector space. Note that a "vector" in this space is an m × n matrix and the null vector corresponds to the m × n null matrix. Another example of a vector space is the set of all infinite sequences of real numbers, denoted as X ≡ l∞, together with F ≡ R. A typical vector x of this space has the form

x = (x1, x2, ..., xk, ...)

Given x, y ∈ l∞ and arbitrary scalars α, β ∈ R, the linear combination αx + βy is also contained in l∞. Thus, the combination (X ≡ l∞, F ≡ R) also qualifies to be a vector space. Note that a single vector in this set has infinitely many elements, which is qualitatively different from the 2-, 3- or n-tuples that we have considered so far.
Are there other sets in which a single vector consists of infinitely many elements? There are plenty of them which we encounter in many engineering problems. Consider the set of all real valued continuous functions over an interval [a, b], denoted as C[a, b], together with F ≡ R. We write x = y if x(t) = y(t) for all t ∈ [a, b]. The null vector 0 in this space is the function which is zero everywhere on [a, b], i.e.

f(t) = 0 for all t ∈ [a, b]

If x(t) and y(t) are vectors from this space and α is a real scalar, then (x + y)(t) = x(t) + y(t) and (αx)(t) = αx(t) are also elements of C[a, b]. Thus, the pair (X ≡ C[a, b], F ≡ R) forms a vector space. Other examples of such sets that qualify as vector spaces are (a) the collection of all continuous and n times differentiable functions over an interval [a, b], i.e. X ≡ C^(n)[a, b], together with F ≡ R, and (b) the set of all functions {f(t) : t ∈ [a, b]} for which

∫_a^b |f(t)|^p dt < ∞

holds, which is denoted as the linear space Lp[a, b].


Let X and Y be two vector spaces. Then their product space, denoted by X × Y, is the set of ordered pairs (x, y) such that x ∈ X, y ∈ Y. If z^(1) = (x^(1), y^(1)) and z^(2) = (x^(2), y^(2)) are two elements of X × Y, then it is easy to show that αz^(1) + βz^(2) ∈ X × Y for any scalars (α, β). Thus, a product space is a linear vector space.

Example 4: Let X = C[a, b] and Y = R; then the product space X × Y = C[a, b] × R forms a linear vector space. Such product spaces arise in the context of ordinary differential equations.

2.2 Subspace, Basis and Dimension


Note that we are dealing with a set of vectors

{x^(k) : k = 1, 2, ..., m}      (2)

The individual elements in the set are indexed using the superscript (k). Now, if X = R^n and x^(k) ∈ R^n represents the k-th vector in the set, then it is a vector with n components, which are represented as follows

x^(k) = [x1^(k)  x2^(k)  ...  xn^(k)]^T      (3)

Similarly, if X = l∞ and x^(k) ∈ l∞ represents the k-th vector in the set, then x^(k) represents an infinite sequence with elements denoted as follows

x^(k) = [x1^(k)  ...  xi^(k)  ...]^T      (4)

In three dimensions, we often work with subsets like lines or planes. For example, consider the set S of all vectors

x = αx^(1) + βx^(2)

where α, β ∈ R are arbitrary scalars and

x^(1) = [2  −1  0]^T  and  x^(2) = [4  0  −1]^T      (5)

In other words, S represents the set of all possible linear combinations of {x^(1), x^(2)}. This set defines a plane passing through the origin in R^3. It is interesting to note that vectors belonging to this set obey the properties of a vector space, i.e. it is easy to show that

x ∈ S ⇒ αx ∈ S for arbitrary α  and  x, y ∈ S ⇒ x + y ∈ S      (6)

On the other hand, consider the set S1, which is the collection of all vectors of the form

x = αx^(1) + βx^(2) + b      (7)

where b = [1  2  3]^T ∉ S. Now, consider two vectors x, y ∈ S1 represented as

x = α1 x^(1) + β1 x^(2) + b  and  y = α2 x^(1) + β2 x^(2) + b

The vector (x + y),

x + y = (α1 + α2) x^(1) + (β1 + β2) x^(2) + 2b ∉ S1

Also, the vector γx = γ(α1 x^(1) + β1 x^(2)) + γb ∉ S1 for an arbitrary γ ≠ 1. Thus, the lines or planes passing through the origin differ from the rest and belong to a special class of subsets of R^3. Any two dimensional plane passing through the origin of R^3 is a sub-space of R^3. The origin must be included in the set for it to qualify as a sub-space. Note that a plane which does not pass through the origin is not a sub-space. The concept of a sub-space can be generalized as follows.

Definition 5 (Subspace): A non-empty subset M of a vector space X is called a subspace of X if every vector αx + βy is in M for arbitrary α, β ∈ F whenever x and y are both in M. Every subspace always contains the null vector, i.e. the origin of the space X.

Thus, the fundamental property of objects (elements) in a vector space is that they can be constructed by simply adding other elements in the space. This property is formally defined as follows.

Definition 6 (Linear Combination): A linear combination of vectors x^(1), x^(2), ..., x^(m) in a vector space is of the form α1 x^(1) + α2 x^(2) + ... + αm x^(m), where (α1, ..., αm) are scalars.

Definition 7 (Span of a Set of Vectors): Let S be a subset of a vector space X. The set generated by all possible linear combinations of elements of S is called the span of S and denoted as [S]. The span of S is a subspace of X.

Definition 8 (Linear Dependence): A vector x is said to be linearly dependent upon a set S of vectors if x can be expressed as a linear combination of vectors from S. Alternatively, x is linearly dependent upon S if x belongs to the span of S, i.e. x ∈ [S]. A vector is said to be linearly independent of the set S if it is not linearly dependent on S. A necessary and sufficient condition for the set of vectors x^(1), x^(2), ..., x^(m) to be linearly independent is that the expression

Σ_{i=1}^{m} αi x^(i) = 0      (8)

implies that αi = 0 for all i = 1, 2, ..., m.

Example 9: Show that the functions 1, exp(t), exp(2t), exp(3t) are linearly independent over any interval [a, b].

Let us assume that the vectors (1, e^t, e^{2t}, e^{3t}) are linearly dependent, i.e. there are constants (α, β, γ, δ), not all equal to zero, such that

α + β e^t + γ e^{2t} + δ e^{3t} = 0 holds for all t ∈ [a, b]      (9)

Taking the derivative of both sides, the above equality implies

e^t (β + 2γ e^t + 3δ e^{2t}) = 0 holds for all t ∈ [a, b]

Since e^t > 0 holds for all t ∈ [a, b], the above equation implies that

β + 2γ e^t + 3δ e^{2t} = 0 holds for all t ∈ [a, b]      (10)

Taking the derivative of both sides, the above equality implies

e^t (2γ + 6δ e^t) = 0 holds for all t ∈ [a, b]

which implies that

2γ + 6δ e^t = 0 holds for all t ∈ [a, b]      (11)

⇒ e^t = −γ/(3δ) holds for all t ∈ [a, b]

which is absurd, since e^t is not a constant function. Thus, equality (11) holds only for γ = δ = 0. With γ = δ = 0, equality (10) holds only when β = 0, and equality (9) then holds only when α = 0. Thus, the vectors (1, e^t, e^{2t}, e^{3t}) are linearly independent.
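The argument above is analytical. As a quick numerical cross-check (not a proof), one can sample these functions on a grid and inspect the rank of the resulting matrix; full column rank is consistent with linear independence. A minimal Python sketch, with an arbitrarily assumed interval [0, 1] and grid size:

```python
import numpy as np

# Sample the functions 1, e^t, e^2t, e^3t on [a, b] and inspect the rank
# of the sample matrix; full column rank (= 4) is consistent with the
# linear independence established analytically above.
a, b = 0.0, 1.0                      # assumed interval
t = np.linspace(a, b, 50)
M = np.column_stack([np.ones_like(t), np.exp(t), np.exp(2 * t), np.exp(3 * t)])
print(np.linalg.matrix_rank(M))      # prints 4
```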

Definition 10 (Basis): A set S ⊂ X of linearly independent vectors in X is said to be a basis for the space X if S generates X, i.e. X = [S].

A vector space having a finite basis (i.e. spanned by a set of vectors with a finite number of elements) is said to be finite dimensional. All other vector spaces are said to be infinite dimensional. We characterize a finite dimensional space by the number of elements in a basis. Any two bases for a finite dimensional vector space contain the same number of elements.

Example 11: Consider the system of linear algebraic equations

Ax = [1  2  4;  1  2  4;  2  4  8] [x1; x2; x3] = [0; 0; 0]

It is easy to see that matrix A has rank equal to one and its columns (and rows) are linearly dependent. Thus, it is possible to obtain non-zero solutions to the above equation, which can be re-written as follows

[1; 1; 2] x1 + [2; 2; 4] x2 + [4; 4; 8] x3 = [0; 0; 0]

Two possible solutions are

x^(1) = [2  −1  0]^T  and  x^(2) = [4  0  −1]^T

In fact, x^(1) and x^(2) are linearly independent, and any linear combination of these two vectors, i.e.

x = αx^(1) + βx^(2)

for any scalars (α, β) ∈ R, satisfies

Ax = A(αx^(1) + βx^(2)) = αAx^(1) + βAx^(2) = 0

Thus, the solutions can be represented by the set M ≡ span{x^(1), x^(2)}, which forms a two dimensional subspace of R^3.
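The rank and null space claims in this example are easy to verify numerically. A small sketch using numpy (the SVD-based null space construction below is one standard choice, not the only one):

```python
import numpy as np

# Verify Example 11: A has rank one, and its null space is two dimensional.
A = np.array([[1.0, 2.0, 4.0],
              [1.0, 2.0, 4.0],
              [2.0, 4.0, 8.0]])
print(np.linalg.matrix_rank(A))      # prints 1
_, s, Vt = np.linalg.svd(A)
N = Vt[1:]                           # rows spanning the 2-D null space
print(np.allclose(A @ N.T, 0))       # True: each basis vector solves Ax = 0
```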

Example 12 (Basis, Span and Sub-spaces):

1. Let S = {v} where v = [1  2  3  4  5]^T, and let us define the span of S as [S] = αv, where α ∈ R represents a scalar. Here, [S] is a one dimensional vector space and a subspace of R^5.

2. Let S = {v^(1), v^(2)} where

   v^(1) = [1  2  3  4  5]^T,  v^(2) = [5  4  3  2  1]^T      (12)

   Here the span of S (i.e. [S]) is a two dimensional subspace of R^5.

3. Consider the set of n-th order polynomials on the interval [0, 1]. A possible basis for this space is

   p^(1)(z) = 1, p^(2)(z) = z, p^(3)(z) = z^2, ..., p^(n+1)(z) = z^n      (13)

   Any vector p(z) from this space can be expressed as

   p(z) = θ0 p^(1)(z) + θ1 p^(2)(z) + ... + θn p^(n+1)(z) = θ0 + θ1 z + ... + θn z^n      (14)

   Note that [S] in this case is an (n + 1) dimensional subspace of C[0, 1].

4. Consider the set of continuous functions

   S = {f1(t) = 1, f2(t) = e^t, f3(t) = e^{2t}, f4(t) = e^{3t}}

   defined on the interval [a, b]. The span of the set S forms a 4 dimensional sub-space of C[a, b].

5. Consider the set of continuous functions over the interval [−π, π], i.e. C[−π, π]. A well known basis for this space is

   x^(0)(z) = 1, x^(1)(z) = cos(z), x^(2)(z) = sin(z),      (15)
   x^(3)(z) = cos(2z), x^(4)(z) = sin(2z), ...      (16)

   It can be shown that C[−π, π] is an infinite dimensional vector space.

6. Consider the space X of all n × n real valued matrices. The set of all symmetric real valued n × n matrices, say S1, is a subspace of X. This follows from the fact that the matrix αA + βB is a real valued symmetric matrix for arbitrary scalars α, β ∈ R when A, B ∈ S1. Similarly, the set of all skew symmetric real valued n × n matrices, say S2, is also a subspace of X. This follows from the fact that for any A, B ∈ S2 and for arbitrary scalars α, β ∈ R,

   (αA + βB)^T = −(αA + βB) ∈ S2

   On the other hand, the set of all positive definite real valued n × n matrices, say S3, is not a subspace of X. This is because, if A ∈ S3, then −A ∉ S3.

2.3 Exercise
1. Decide the linear dependence or independence of

   (a) (1,1,2), (1,2,1), (3,1,1)
   (b) x^(1) − x^(2), x^(2) − x^(3), x^(3) − x^(4), x^(4) − x^(1) for any x^(1), x^(2), x^(3), x^(4)
   (c) (1,1,0), (1,0,0), (0,1,1), (x,y,z) for any scalars x, y, z

2. Describe geometrically the subspaces of R^3 spanned by the following sets

   (a) (0,0,0), (0,1,0), (0,2,0)
   (b) (0,0,1), (0,1,1), (0,2,0)
   (c) all six of the vectors given in 2(a) and 2(b)
   (d) the set of all vectors with positive components

3. While solving problems using a digital computer, arithmetic operations can be performed only with a limited precision due to finite word lengths. Consider the vector space X ≡ R and discuss which of the laws of algebra (associative, distributive, commutative) are not satisfied for the floating point arithmetic in a digital computer.

4. Consider the space X of all n × n matrices. Find a basis for this vector space and show that the set of all lower triangular n × n matrices forms a subspace of X.

5. Consider the set, X, consisting of all real valued 2 × 2 matrices. Find a basis for this vector space. Also, consider the subspace, S, of X consisting of all real valued symmetric 2 × 2 matrices. Find a basis for S. What are the dimensions of X and S?

6. Does the set of functions of the form

   f(t) = 1/(a + bt)

   constitute a linear vector space?

7. Give an example of a function which is in L1[0, 1] but not in L2[0, 1].

8. Show that the polynomials p1(t) = 1, p2(t) = t and p3(t) = t^2 are linearly independent over the interval [0, 1].

9. Show that the set of solutions of the differential equation

   d²x/dt² + x = 0

   is a linear space. What is the dimension of this space?

3 Transformations
Using the generalized concepts of vectors and vector spaces, we can look at mathematical models in engineering as transformations, which map a subset of vectors from one vector space to a subset of another space.

Definition 13 (Transformation): Let X and Y be linear spaces and let M be a subset of X. A rule which associates with every element x ∈ M an element y ∈ Y is said to be a transformation from X to Y with domain M. If y corresponds to x under the transformation, we write y = T(x), where T(·) is called an operator.

The set of all elements for which an operator T is defined is called the domain of T, and the set of all elements generated by transforming elements in the domain by T is called the range of T. If for every y ∈ Y there is at most one x ∈ M for which T(x) = y, then T(·) is said to be one to one. If for every y ∈ Y there is at least one x ∈ M, then T is said to map M onto Y. A transformation is said to be invertible if it is one to one and onto.

Definition 14 (Linear Transformation): A transformation T mapping a vector space X into a vector space Y is said to be linear if for every x^(1), x^(2) ∈ X and all scalars α, β we have

T(αx^(1) + βx^(2)) = αT(x^(1)) + βT(x^(2))      (17)

Note that any transformation that does not satisfy the above definition is not a linear transformation.

Example 15 (Linear Operators):

1. Consider the transformation

   y = Ax      (18)

   where y ∈ R^m, x ∈ R^n, A ∈ R^m × R^n is an (m × n) matrix and T(x) = Ax. Whether this mapping is onto R^m depends on the rank of the matrix. Now, consider two vectors x^(1), x^(2) ∈ R^n and two arbitrary scalars (α, β). Since

   T(αx^(1) + βx^(2)) = A(αx^(1) + βx^(2)) = αAx^(1) + βAx^(2) = αT(x^(1)) + βT(x^(2))

   T(x) = Ax is a linear operator.

2. Consider the transformation

   y = Ax + b      (19)

   where y, b ∈ R^m, x ∈ R^n, A ∈ R^m × R^n is an (m × n) matrix and T(x) = Ax + b. Note that b ≠ 0. Now, consider two vectors x^(1), x^(2) ∈ R^n and any two arbitrary scalars (α, β). Since

   T(αx^(1) + βx^(2)) = A(αx^(1) + βx^(2)) + b

   while

   αT(x^(1)) + βT(x^(2)) = αAx^(1) + βAx^(2) + (α + β)b

   we have, in general,

   T(αx^(1) + βx^(2)) ≠ αT(x^(1)) + βT(x^(2))

   Thus, T(x) = Ax + b is not a linear operator (a numerical illustration of this failure is sketched after this list).

3. Consider the transformation involving differentiation, i.e.

   y(t) = dx(t)/dt

   where t ∈ [a, b]. Here, T(x) = dx/dt is an operator from X ≡ C^(1)[a, b], the space of continuously differentiable functions, to the space of continuous functions, i.e. Y ≡ C[a, b]. Now, given two continuously differentiable functions x(t), z(t) and any two arbitrary scalars (α, β), it is easy to show that

   d(αx(t) + βz(t))/dt = α dx(t)/dt + β dz(t)/dt

   i.e. T(x) is a linear operator. On the other hand, a transformation such as

   y(t) = [dx(t)/dt]^2

   i.e. T(x) = [dx(t)/dt]^2, is also an operator from X ≡ C^(1)[a, b] to Y ≡ C[a, b]. However, T(x) in this case is not a linear operator because

   [d(x(t) + z(t))/dt]^2 = [dx(t)/dt]^2 + [dz(t)/dt]^2 + 2 [dx(t)/dt][dz(t)/dt]      (20)
                         ≠ [dx(t)/dt]^2 + [dz(t)/dt]^2      (21)

4. Consider the transformation defined by the definite integration operator, i.e.

   T[x(τ)] = ∫_a^b x(τ) dτ

   which maps X ≡ {space of integrable functions over [a, b]} into Y ≡ R. It is easy to check that this is a linear transformation.
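To make the distinction between cases 1 and 2 concrete, the following small Python sketch (with arbitrarily chosen A, b, x^(1), x^(2), α and β) checks superposition numerically; this is the illustration referred to in case 2 above:

```python
import numpy as np

# Superposition holds for T(x) = A x but fails for T(x) = A x + b.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))      # assumed example data
b = rng.standard_normal(3)
x1, x2 = rng.standard_normal(2), rng.standard_normal(2)
alpha, beta = 2.0, -1.5

T_affine = lambda x: A @ x + b
print(np.allclose(T_affine(alpha * x1 + beta * x2),
                  alpha * T_affine(x1) + beta * T_affine(x2)))   # False
print(np.allclose(A @ (alpha * x1 + beta * x2),
                  alpha * (A @ x1) + beta * (A @ x2)))           # True
```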

Example 16 (ODE and PDE Operators):

1. Consider the ODE-IVP

   dx/dt = f[t, x(t)], t ∈ [0, ∞)      (22)

   with the initial condition x(0) = x0. Defining the product space Y = C^(1)[0, ∞) × R, the transformation T : C^(1)[0, ∞) → Y can be stated as

   T[x(t)] ≡ [dx/dt − f(t, x(t)), x(0)]

   and the ODE-IVP can be represented as

   T[x(t)] = (0(t), x0)

   where 0 represents the zero function over the interval [0, ∞), i.e. 0(t) = 0 for t ∈ [0, ∞). If f(t, x(t)) is a linear function of x(t), then T[x(t)] is a linear operator; else it is a nonlinear operator.

2. Consider the ODE-BVP

   a d²u/dz² + b du/dz + c g(u) = 0  (0 ≤ z ≤ 1)

   B.C. at z = 0: f1(du(0)/dz, u(0)) = α0
   B.C. at z = 1: f2(du(1)/dz, u(1)) = α1

   In this case, the transformation T[u(z)] defined as

   T[u(z)] ≡ (a d²u(z)/dz² + b du(z)/dz + c g(u(z)), f1(u′(0), u(0)), f2(u′(1), u(1)))

   maps the space X ≡ C^(2)[0, 1] to Y = C[0, 1] × R × R, and the ODE-BVP can be represented as follows

   T[u(z)] = (0(z), α0, α1)

3. Consider the general PDE

   a ∂²u/∂z² + b ∂u/∂z + c g(u) − ∂u/∂t = 0

   defined over 0 < z < 1 and t ≥ 0, with the initial and boundary conditions specified as follows

   u(z, 0) = h(z) for 0 < z < 1

   B.C. at z = 0: f1(∂u(0, t)/∂z, u(0, t)) = α0 for t ≥ 0
   B.C. at z = 1: f2(∂u(1, t)/∂z, u(1, t)) = α1 for t ≥ 0

   In this case, the transformation T[u(z, t)] defined as

   T[u(z, t)] ≡ (a ∂²u(z, t)/∂z² + b ∂u(z, t)/∂z + c g(u(z, t)) − ∂u/∂t, u(z, 0), f1(u′(0, t), u(0, t)), f2(u′(1, t), u(1, t)))

   maps the space X ≡ C^(2)[0, 1] × C^(1)[0, ∞) to Y = C[0, 1] × C[0, 1] × R × R, and the PDE can be represented as follows

   T[u(z, t)] = (0(z, t), h(z), α0, α1)

From these examples, it is clear that these seemingly dissimilar transformations can be represented in a unified manner as follows

y = T(x)      (23)

where T : X → Y is such that x ∈ X and y ∈ M ⊂ Y. Here, the set M can be the entire space Y or a sub-space of Y. To understand this better, let us look at some examples.

Example 17: Consider a system of linear algebraic equations

Ax = [1  0  1;  1  1  0;  0  1  1] [x1; x2; x3] = b

It is desired to show that the set of all solutions of this equation for an arbitrary vector b is the same as R^3. It is easy to see that matrix A has rank equal to three and its columns (and rows) are linearly independent. Since the columns are linearly independent, a unique solution x ∈ R^3 can be found for any arbitrary vector b ∈ R^3. Now, let us find the general solution x for an arbitrary vector b by computing A⁻¹ as follows

x = A⁻¹ b = [1/2  1/2  −1/2;  −1/2  1/2  1/2;  1/2  −1/2  1/2] b
  = b1 [1/2; −1/2; 1/2] + b2 [1/2; 1/2; −1/2] + b3 [−1/2; 1/2; 1/2]
  = b1 v^(1) + b2 v^(2) + b3 v^(3)

By definition

b1 v^(1) + b2 v^(2) + b3 v^(3) ∈ span{v^(1), v^(2), v^(3)}

for an arbitrary b ∈ R^3, and, since the vectors {v^(1), v^(2), v^(3)} are linearly independent, we have

M ≡ span{v^(1), v^(2), v^(3)} ≡ R^3

i.e. the set of all possible solutions x of the system of equations under consideration is identical to the entire space R^3.
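The conclusions of this example can be cross-checked numerically; a brief sketch (the particular b below is an arbitrary choice):

```python
import numpy as np

# Verify Example 17: A is full rank, A inverse has entries +/- 1/2, and
# a unique solution exists for an arbitrary right hand side b.
A = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
print(np.linalg.matrix_rank(A))      # prints 3
print(np.linalg.inv(A))              # entries are +/- 0.5
b = np.array([1.0, 2.0, 3.0])        # any b in R^3 works
x = np.linalg.solve(A, b)
print(np.allclose(A @ x, b))         # True
```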

Example 18: Consider the third order linear ordinary differential equation

d³u/dt³ + 6 d²u/dt² + 11 du/dt + 6u = 0

defined over C^(3)[0, ∞), i.e. the set of thrice differentiable functions over [0, ∞), together with the initial conditions

u(0) = a, du/dt|_{t=0} = b and d²u/dt²|_{t=0} = c

The roots of the characteristic polynomial, i.e. of

p³ + 6p² + 11p + 6 = 0

are p = −1, p = −2 and p = −3. Thus, the general solution of the ODE can be written as

u(t) = α e^{−t} + β e^{−2t} + γ e^{−3t}

where (α, β, γ) ∈ R are arbitrary scalars. Since the vectors {e^{−t}, e^{−2t}, e^{−3t}} are linearly independent, the set of solutions can be represented as M ≡ span{e^{−t}, e^{−2t}, e^{−3t}}, which forms a three dimensional sub-space of C^(3)[0, ∞). Thus, the set of all possible solutions of the ODE forms a 3 dimensional subspace of C^(3)[0, ∞).

Example 19: Consider the ODE-BVP

d²u(z)/dz² + λ² u(z) = 0 for 0 < z < 1
B.C. 1 (at z = 0): u(0) = 0
B.C. 2 (at z = 1): u(1) = 0

The general solution of this ODE-BVP, which satisfies the boundary conditions, is given by

u(z) = α1 sin(πz) + α2 sin(2πz) + α3 sin(3πz) + ... = Σ_{i=1}^{∞} αi sin(iπz)

where (α1, α2, ...) ∈ R are arbitrary scalars. The set of vectors {sin(πz), sin(2πz), sin(3πz), ...} is linearly independent and forms a basis for C^(2)[0, 1], i.e. the set of twice differentiable continuous functions on the interval [0, 1], i.e.

M ≡ C^(2)[0, 1] = span{sin(πz), sin(2πz), sin(3πz), ...}

The concept of a linear vector space and transformations defined on vector spaces allows us to arrive at a unified representation of seemingly different problems encountered in engineering applications. A large number of problems arising in applied engineering mathematics can be stated as follows [3]:

Solve the equation y = T(x)      (24)
where x ∈ M ⊂ X, y ∈ Y

Here, X and Y are vector spaces and the operator T : M → Y. In engineering parlance, x, y and T represent the input, output and model, respectively. Linz [3] proposes the following broad classification of problems encountered in computational mathematics.

Direct Problems: Given the operator T and x, find y. In this case, we are trying to compute the output of a given system of equations for a given input. The computation of definite integrals is an example of this type.

Inverse Problems: Given the operator T and y, find x. In this case we are looking for the input which generates the observed output. Solving a system of simultaneous (linear/nonlinear) algebraic equations, ordinary and partial differential equations and integral equations are examples of this category. In fact, a majority of system design problems, in which we are expected to decide the inputs x for given specifications of the outputs y, belong to this class.

Identification Problems: Given x and y, find the operator T. In this case, we try to find the laws governing the system from the knowledge of the relation between the inputs and outputs. Problems involving model parameter estimation from measured input-output data, such as estimating reaction rate parameters or developing transfer function models, belong to this class of problems.

The direct problems can be treated relatively easily. The inverse problems and the identification problems are more difficult to solve and form the central theme of this numerical analysis course.

4 Magnitudes
4.1 Normed Linear Spaces
In three dimensional space, we use lengths or magnitudes to compare any two vectors. Generalization of the concept of length/magnitude of a vector in the three dimensional vector space to an arbitrary vector space is achieved by defining a scalar valued function called the norm of a vector.

Definition 20 (Normed Linear Vector Space): A normed linear vector space is a vector space X on which there is defined a real valued function which maps each element x ∈ X into a real number ||x||, called the norm of x. The norm satisfies the following axioms:

1. ||x|| ≥ 0 for all x ∈ X; ||x|| = 0 if and only if x = 0 (zero vector)

2. ||x + y|| ≤ ||x|| + ||y|| for each x, y ∈ X (triangle inequality)

3. ||αx|| = |α| ||x|| for all scalars α and each x ∈ X

Example 21 (Vector norms):

1. (R^n, ||·||_1): the space R^n with the 1-norm: ||x||_1 ≡ Σ_{i=1}^{n} |xi|

2. (R^n, ||·||_2): the Euclidean space R^n with the 2-norm:

   ||x||_2 ≡ [Σ_{i=1}^{n} (xi)²]^{1/2}

3. (R^n, ||·||_p): the space R^n with the p-norm:

   ||x||_p ≡ [Σ_{i=1}^{n} |xi|^p]^{1/p}      (25)

   where p is a positive integer

4. (R^n, ||·||_∞): the space R^n with the ∞-norm: ||x||_∞ ≡ max_i |xi|

5. The n-dimensional complex space (C^n) with the p-norm:

   ||x||_p ≡ [Σ_{i=1}^{n} |xi|^p]^{1/p}      (26)

   where p is a positive integer

6. The space of infinite sequences (l∞) with the p-norm: an element in this space, say x ∈ l∞, is an infinite sequence of numbers

   x = {x1, x2, ..., xn, ...}      (27)

   such that the p-norm defined as

   ||x||_p ≡ [Σ_{i=1}^{∞} |xi|^p]^{1/p} < ∞      (28)

   is bounded for every x ∈ l∞, where p is an integer.

7. (C[a, b], ||x(t)||_∞): the normed linear space C[a, b] together with the infinity norm

   ||x(t)||_∞ ≡ max_{a ≤ t ≤ b} |x(t)|      (29)

   It is easy to see that ||x(t)||_∞ defined above qualifies to be a norm, since

   max |x(t) + y(t)| ≤ max [|x(t)| + |y(t)|] ≤ max |x(t)| + max |y(t)|      (30)
   max |αx(t)| = max |α| |x(t)| = |α| max |x(t)|      (31)

8. Other types of norms which can be defined on the set of continuous functions over [a, b] are as follows

   ||x(t)||_1 ≡ ∫_a^b |x(t)| dt      (32)

   ||x(t)||_2 ≡ [∫_a^b |x(t)|² dt]^{1/2}      (33)
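The finite dimensional norms above are straightforward to compute. A short numpy sketch comparing the built-in norms with the defining formulas for an arbitrarily chosen vector:

```python
import numpy as np

# Compare numpy's vector norms with the defining formulas (25) and the
# 1- and infinity-norm definitions above.
x = np.array([3.0, -4.0, 1.0])
print(np.linalg.norm(x, 1), np.abs(x).sum())           # 1-norm
print(np.linalg.norm(x, 2), np.sqrt((x ** 2).sum()))   # 2-norm
print(np.linalg.norm(x, np.inf), np.abs(x).max())      # infinity-norm
```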

Example 22: Determine whether (a) max |df(t)/dt|, (b) max |x(t)| + max |x′(t)|, (c) |x(a)| + max |x′(t)| and (d) |x(a)| max |x(t)| can serve as valid definitions of a norm on C^(2)[a, b].

Solution: (a) max |df(t)/dt|: For this to be a norm function, Axiom 1 in the definition of normed vector spaces requires

||f(t)|| = 0 ⇒ f(t) is the zero vector in C^(2)[a, b], i.e. f(t) = 0 for all t ∈ [a, b]

However, consider a constant function, i.e. g(t) = c for all t ∈ [a, b], where c is some non-zero value. It is easy to see that

max |dg(t)/dt| = 0

even when g(t) does not correspond to the zero vector. Thus, the above function violates Axiom 1 in the definition of a normed vector space and, consequently, cannot qualify as a norm.

(b) max |x(t)| + max |x′(t)|: For any non-zero function x(t) ∈ C^(2)[a, b], Axiom 1 is satisfied. Axiom 2 follows from the following inequality

||x(t) + y(t)|| = max |x(t) + y(t)| + max |x′(t) + y′(t)|
≤ [max |x(t)| + max |y(t)|] + [max |x′(t)| + max |y′(t)|]
= [max |x(t)| + max |x′(t)|] + [max |y(t)| + max |y′(t)|]
= ||x(t)|| + ||y(t)||

It is easy to show that Axiom 3 is also satisfied for all scalars α. Thus, the given function defines a norm on C^(2)[a, b].

(c) |x(a)| + max |x′(t)|: For any non-zero function x(t) ∈ C^(2)[a, b], Axiom 1 is satisfied. Axiom 2 follows from the following inequality

||x(t) + y(t)|| = |x(a) + y(a)| + max |x′(t) + y′(t)|
≤ [|x(a)| + |y(a)|] + [max |x′(t)| + max |y′(t)|]
= [|x(a)| + max |x′(t)|] + [|y(a)| + max |y′(t)|]
= ||x(t)|| + ||y(t)||

Axiom 3 is also satisfied for any α, as

||αx(t)|| = |αx(a)| + max |αx′(t)| = |α| [|x(a)| + max |x′(t)|] = |α| ||x||

(d) |x(a)| max |x(t)|: Consider a non-zero function x(t) in C^(2)[a, b] such that x(a) = 0 and max |x(t)| ≠ 0. Then ||x(t)|| = 0 even though x(t) is not the zero vector; thus, Axiom 1 is not satisfied for all vectors x(t) ∈ C^(2)[a, b] and the above function does not qualify to be a norm on C^(2)[a, b].

4.2 Induced Matrix Norms


We have already mentioned that the set of all m × n matrices with real entries (or complex entries) can be viewed as a linear vector space. In this subsection, we briefly introduce the concept of the induced norm of a matrix, which plays a vital role in numerical analysis. A norm of a matrix can be interpreted as the amplification power of the matrix. To develop a numerical measure for the ill conditioning of a matrix, we first have to quantify this amplification power of the matrix.

Definition 23 (Induced Matrix Norm): The induced norm of an m × n matrix A is defined as a mapping from R^m × R^n to R+ such that

||A|| = max_{x ≠ 0} ||Ax|| / ||x||      (34)

In other words, ||A|| bounds the amplification power of the matrix, i.e.

||Ax|| / ||x|| ≤ ||A|| for all x ∈ R^n, x ≠ 0      (35)

The equality holds for at least one non-zero vector x ∈ R^n. An alternate way of defining the matrix norm is as follows

||A|| = max_{||x̂|| = 1} ||Ax̂||      (36)

Defining x̂ as x̂ = x / ||x||, it is easy to see that these two definitions are equivalent. The following conditions are satisfied for any matrices A, B ∈ R^m × R^n

1. ||A|| > 0 if A ≠ [0] and ||[0]|| = 0

2. ||αA|| = |α| ||A||

3. ||A + B|| ≤ ||A|| + ||B||

The induced norms, i.e. the norms of a matrix induced by vector norms on R^m and R^n, can be interpreted as the maximum gain or amplification factor of the matrix. Commonly used matrix norms are as follows.

2-norm:

||A||_2 = max_{x ≠ 0} ||Ax||_2 / ||x||_2 = [λmax(A^T A)]^{1/2}      (37)

where λmax(A^T A) and λmin(A^T A) denote the maximum and minimum magnitude eigenvalues of A^T A, respectively (refer to Appendix A for details of the derivation).

1-norm: maximum over column sums of |aij| (refer to Appendix A for details of the derivation)

||A||_1 = max_{x ≠ 0} ||Ax||_1 / ||x||_1 = max_{1 ≤ j ≤ n} [Σ_{i=1}^{m} |aij|]      (38)

∞-norm: maximum over row sums of |aij| (refer to Appendix A for details of the derivation)

||A||_∞ = max_{x ≠ 0} ||Ax||_∞ / ||x||_∞ = max_{1 ≤ i ≤ m} [Σ_{j=1}^{n} |aij|]      (39)

Remark 24: There are other matrix norms, such as the Frobenius norm, which are not induced matrix norms. The Frobenius norm is defined as follows

||A||_F = [Σ_{i=1}^{m} Σ_{j=1}^{n} |aij|²]^{1/2}
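The formulas (37)-(39) can be checked numerically against library routines; a minimal sketch with an arbitrarily chosen matrix:

```python
import numpy as np

# Induced matrix norms: 2-norm from the largest eigenvalue of A^T A,
# 1-norm from column sums, infinity-norm from row sums.
A = np.array([[1.0, -2.0],
              [3.0,  4.0]])
print(np.sqrt(np.linalg.eigvalsh(A.T @ A).max()), np.linalg.norm(A, 2))
print(np.abs(A).sum(axis=0).max(), np.linalg.norm(A, 1))        # column sums
print(np.abs(A).sum(axis=1).max(), np.linalg.norm(A, np.inf))   # row sums
```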

4.3 Convergence and Banach Spaces


In a normed linear space X, the set of all vectors x ∈ X such that ||x − x̄|| ≤ 1 is called the unit ball centered at x̄. A unit ball in (R², ||·||_2) is the set of all vectors in the circle with the origin at the center and radius equal to one, while a unit ball in (R³, ||·||_2) is the set of all points in the unit sphere with the origin at the center. A schematic representation of a unit ball in C[0, 1], when the maximum norm is used, is shown below. The unit ball in C[0, 1] is the set of all functions f(z) such that |f(z)| ≤ 1, where z ∈ [0, 1].

[Figure: Schematic representation of a unit ball in C[0, 1]]

Once we have defined a norm in a vector space, we can proceed to generalize the concept of convergence of a sequence of vectors. The concept of convergence is central to all iterative numerical methods.

Definition 25 (Cauchy Sequence): A sequence {x^(k)} in a normed linear space is said to be a Cauchy sequence if ||x^(n) − x^(m)|| → 0 as n, m → ∞, i.e. given an ε > 0, there exists an integer N such that ||x^(n) − x^(m)|| < ε for all n, m ≥ N.

Definition 26 (Convergence): In a normed linear space, an infinite sequence of vectors {x^(k) : k = 1, 2, ...} is said to converge to a vector x* if the sequence {||x* − x^(k)|| : k = 1, 2, ...} of real numbers converges to zero. In this case we write x^(k) → x*.

In particular, a sequence {x^(k)} in R^n converges if and only if each component of the vector sequence converges. If a sequence converges, then its limit is unique.

Example 27 (Convergent sequence): Consider the sequence of vectors represented as

x^(k) = [1 + (0.2)^k;  1 + (0.9)^k;  3/(1 + (−0.5)^k);  (0.8)^k] → [1; 1; 3; 0]      (40)

for k = 0, 1, 2, .... This is a convergent sequence with respect to any p-norm defined on R^4. It can be shown that it is a Cauchy sequence. Note that each element of the vector converges to a limit in this case.
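A brief sketch tracking the distance of x^(k) from its limit confirms that the sequence in equation (40) converges in any p-norm:

```python
import numpy as np

# Distance from the limit [1, 1, 3, 0] shrinks with k in the 2- and
# infinity-norms (other p-norms behave the same way here).
x_star = np.array([1.0, 1.0, 3.0, 0.0])
for k in (1, 5, 10, 20):
    xk = np.array([1 + 0.2 ** k, 1 + 0.9 ** k,
                   3 / (1 + (-0.5) ** k), 0.8 ** k])
    print(k, np.linalg.norm(xk - x_star, 2),
          np.linalg.norm(xk - x_star, np.inf))
```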

Every convergent sequence is a Cauchy sequence. Moreover, when we are working in R^n or C^n, all Cauchy sequences are convergent. However, all Cauchy sequences in a general vector space need not be convergent. Cauchy sequences in some vector spaces exhibit such strange behavior, and this motivates the concept of completeness of a vector space.

Definition 28 (Banach Space): A normed linear space X is said to be complete if every Cauchy sequence has a limit in X. A complete normed linear space is called a Banach space.

Examples of Banach spaces are

(R^n, ||·||_1), (R^n, ||·||_2), (R^n, ||·||_∞)
(C^n, ||·||_1), (C^n, ||·||_2), (l∞, ||·||_1), (l∞, ||·||_2)

The concept of Banach spaces can be better understood if we consider an example of a vector space where a Cauchy sequence is not convergent, i.e. the space under consideration is an incomplete normed linear space. Note that, even if we find one Cauchy sequence in this space which does not converge, it is sufficient to prove that the space is not complete.

Example 29: Let X = (Q, ||·||_1), i.e. the set of rational numbers (Q) with the scalar field also as the set of rational numbers (Q) and the norm defined as

||x||_1 = |x|      (41)

A vector in this space is a rational number. In this space, we can construct Cauchy sequences which do not converge to rational numbers (or rather, they converge to irrational numbers). For example, the well known Cauchy sequence

x^(1) = 1 + 1/(1!)
x^(2) = 1 + 1/(1!) + 1/(2!)
...
x^(n) = 1 + 1/(1!) + 1/(2!) + ... + 1/(n!)

converges to e, which is an irrational number. Similarly, consider the sequence

x^(n+1) = 4 − (1/x^(n))

Starting from the initial point x^(0) = 1, we can generate the sequence of rational numbers

3/1, 11/3, 41/11, ...

which converges to 2 + √3 as n → ∞. Thus, the limits of the above sequences lie outside the space X and the space is incomplete.
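The second sequence is easy to reproduce exactly, since every iterate is a ratio of integers; a short sketch using Python's exact rational arithmetic:

```python
from fractions import Fraction
import math

# Each iterate of x_(n+1) = 4 - 1/x_n is an exact rational number,
# yet the sequence approaches the irrational limit 2 + sqrt(3).
x = Fraction(1)
for _ in range(6):
    x = 4 - Fraction(1) / x
    print(x, float(x))
print(2 + math.sqrt(3))              # the limit lies outside Q
```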

Example 30: Consider the sequence of functions in the space of twice differentiable continuous functions C^(2)(−∞, ∞),

f^(k)(t) = 1/2 + (1/π) tan⁻¹(kt)

defined on the interval −∞ < t < ∞, for all integers k. The range of each function is (0, 1). As k → ∞, the sequence of continuous functions converges to the discontinuous function

u*(t) = 0 for −∞ < t < 0
      = 1 for 0 < t < ∞

Example 31: Let X = (C[0, 1], ||·||_1), i.e. the space of continuous functions on [0, 1] with the one norm defined on it, i.e.

||x(t)||_1 = ∫_0^1 |x(t)| dt      (42)

and let us define a sequence [4]

x^(n)(t) = 0 for 0 ≤ t ≤ (1/2 − 1/n)
         = n(t − 1/2) + 1 for (1/2 − 1/n) ≤ t ≤ 1/2      (43)
         = 1 for t ≥ 1/2

Each member is a continuous function and the sequence is Cauchy, as

||x^(n) − x^(m)|| = (1/2) |1/n − 1/m| → 0      (44)

However, as can be observed from Figure 1, the sequence does not converge to a continuous function.

[Figure 1: Sequence of continuous functions]

The concepts of convergence, Cauchy sequences and completeness of a space assume importance in the analysis of iterative numerical techniques. Any iterative numerical method generates a sequence of vectors, and we have to assess whether the sequence is Cauchy in order to terminate the iterations. To a beginner, it may appear that the concept of an incomplete vector space does not have much use in practice. It may be noted, however, that when we compute numerical solutions using any computer, we are working in finite dimensional incomplete vector spaces. In any computer with finite precision, any irrational number, such as π or e, is approximated by a rational number due to the finite precision. In fact, even if we want to find a solution in R^n, while using a finite precision computer to compute it, we actually end up working in Q^n and not in R^n.

4.4 Exercises
1. Over a normed space (X, ||·||), we define a function of two variables d(u, v) = ||u − v||. Show that d(u, v) is a distance function; in other words, d(u, v) has the following properties of an ordinary distance between two points:

   (a) d(u, v) ≥ 0 for any u, v ∈ X, and d(u, v) = 0 if and only if u = v;
   (b) d(u, v) = d(v, u) for any u, v ∈ X;
   (c) (the triangle inequality) d(u, w) ≤ d(u, v) + d(v, w) for any u, v, w ∈ X.

   A linear space endowed with a distance function is called a metric space.

2. Determine which of the following definitions are valid as definitions for norms in C^(2)[a, b]:

   (a) max |x(t)| + max |x′(t)|
   (b) max |x′(t)|
   (c) |x(a)| + max |x′(t)|
   (d) |x(a)| max |x(t)|

3. In a normed linear space X, the set of all vectors x ∈ X such that ||x − x̄|| ≤ 1 is called the unit ball centered at x̄.

   (a) Sketch unit balls in R² when the 1, 2 and ∞ norms are used.
   (b) Sketch the unit ball in C[0, 1] when the maximum norm is used.
   (c) Can you draw a picture of the unit ball in L2[0, 1]?

4. Consider the vector space X ≡ {set of real valued 2 × 2 matrices} together with the scalar field F ≡ R. Now, consider the subset S ⊂ X where S consists of all invertible 2 × 2 real valued matrices. Does the set S form a subspace of X? Justify your answer.

5. Two norms ||·||_a and ||·||_b are said to be equivalent if there exist two positive constants c1 and c2, independent of x, such that

   c1 ||x||_a ≤ ||x||_b ≤ c2 ||x||_a

   (a) Show that in R^n the 2-norm (Euclidean norm) and the ∞-norm (maximum norm) are equivalent.
   (b) Show that in R^n the 1-norm and the ∞-norm (maximum norm) are equivalent.

6. Show that

   | ||x|| − ||y|| | ≤ ||x − y||

7. A norm ||·||_a is said to be stronger than another norm ||·||_b if

   lim_{k→∞} ||x^(k)||_a = 0 ⇒ lim_{k→∞} ||x^(k)||_b = 0

   but not vice versa. For C[0, 1], show that the maximum norm is stronger than the 2-norm.

8. Show that the function ||x||_{2,W} : R^n → R defined as

   ||x||_{2,W} = √(x^T W x)

   defines a norm on R^n when W is a positive definite matrix.

9. Consider real valued m × n matrices A and B. Prove the following inequality [8]

   ||A + B|| ≤ ||A|| + ||B||

10. Consider real valued square matrices A and B. Prove the following inequalities/identities [8]

    ||AB|| ≤ ||A|| ||B||
    ||A||_2 = ||A^T||_2

11. Consider real valued square nonsingular matrices A and B. Show that

    ||A⁻¹ − B⁻¹|| ≤ ||A⁻¹|| ||B⁻¹|| ||A − B||

12. Consider an arbitrary real valued square matrix A. Show that λmax(A) is not a satisfactory norm of A.

13. Consider a real valued symmetric positive definite matrix A. Show that ||A||_2 = λmax(A).

14. Consider the matrix

    A = [2  −1  0;  −1  2  −1;  0  −1  2]

    (a) The eigenvalues of A are 2 − √2, 2, 2 + √2. Find the eigenvectors of A and diagonalize A, i.e. express A = ΨΛΨ⁻¹.
    (b) Find the 1, 2 and ∞ norms of A.

15. Consider a real valued square matrix A. Show that even max_i |λi| is not a satisfactory norm of a matrix, by finding a 2 × 2 counter example to the following inequalities

    λmax(A + B) ≤ λmax(A) + λmax(B)
    λmax(AB) ≤ λmax(A) λmax(B)

5 Angles
Similar to magnitude/length of a vector, another important concept in three dimensional
space that needs to be generalized is the angle between any two vectors.

5.1 Inner Product Spaces
Given any two unit vectors in R³, say x̂ and ŷ, the angle θ between these two vectors is defined using the inner (or dot) product of the two vectors as

cos(θ) = (x̂)^T ŷ = (x^T y) / (||x||_2 ||y||_2)      (45)
       = x̂1 ŷ1 + x̂2 ŷ2 + x̂3 ŷ3      (46)

The fact that the cosine of the angle between any two unit vectors is always less than one can be stated as

|cos(θ)| = |<x̂, ŷ>| ≤ 1      (47)

Moreover, vectors x and y are called orthogonal if x^T y = 0. Orthogonality is probably the most useful concept while working in three dimensional Euclidean space. Inner product spaces and Hilbert spaces generalize these simple geometrical concepts in three dimensional Euclidean space to higher or infinite dimensional vector spaces.

Definition 32 (Inner Product Space): An inner product space is a linear vector space X together with an inner product defined on X × X. Corresponding to each pair of vectors x, y ∈ X, the inner product <x, y> of x and y is a scalar. The inner product satisfies the following axioms:

1. <x, y> = <y, x>* (where * denotes the complex conjugate)

2. <x + y, z> = <x, z> + <y, z>

3. <λx, y> = λ<x, y> and <x, λy> = λ*<x, y>

4. <x, x> ≥ 0, and <x, x> = 0 if and only if x = 0.

Definition 33 (Hilbert Space): A complete inner product space is called a Hilbert space.

Here are some examples of commonly used inner products and Hilbert spaces.

Example 34 (Inner Product Spaces):

1. X ≡ R^n with the inner product defined as

   <x, y> = x^T y = Σ_{i=1}^{n} xi yi      (48)

   <x, x> = Σ_{i=1}^{n} (xi)² = ||x||²_2      (49)

   is a Hilbert space.

2. X ≡ R^n with the inner product defined as

   <x, y>_W = x^T W y      (50)

   where W is a positive definite matrix, is a Hilbert space. The corresponding 2-norm is defined as ||x||_{W,2} = √(<x, x>_W) = √(x^T W x).

3. X ≡ C^n with the inner product defined as

   <x, y> = Σ_{i=1}^{n} xi yi*      (51)

   <x, x> = Σ_{i=1}^{n} xi xi* = Σ_{i=1}^{n} |xi|² = ||x||²_2      (52)

   is a Hilbert space.

4. The set of real valued square integrable functions on the interval [a, b] with the inner product defined as

   <x, y> = ∫_a^b x(t) y(t) dt      (53)

   is a Hilbert space, denoted as L2[a, b]. Well known examples of spaces of this type are L2[−π, π] and L2[0, 2π], which are considered while developing Fourier series expansions of continuous functions on [−π, π] or [0, 2π] using sin(nθ) and cos(nθ) as basis functions.

5. The space of polynomial functions on [a, b] with the inner product

   <x, y> = ∫_a^b x(t) y(t) dt      (54)

   is an inner product space. This is a subspace of L2[a, b].

6. The space of complex valued square integrable functions on [a, b] with the inner product

   <x, y> = ∫_a^b x(t) y(t)* dt      (55)

   is an inner product space.

Axioms 2 and 3 imply that the inner product is linear in the first entry. The quantity <x, x>^{1/2} is a candidate function for defining a norm on the inner product space. Axioms 1 and 3 imply that ||λx|| = |λ| ||x||, and axiom 4 implies that ||x|| > 0 for x ≠ 0. If we show that √<x, x> satisfies the triangle inequality, then √<x, x> defines a norm on the space X. We first prove the Cauchy-Schwarz inequality, which is a generalization of equation (47), and then proceed to show that √<x, x> defines the well known 2-norm on X, i.e. ||x||_2 = √<x, x>.

Lemma 35 (Cauchy-Schwarz Inequality): Let X denote an inner product space. For all x, y ∈ X, the following inequality holds

|<x, y>| ≤ [<x, x>]^{1/2} [<y, y>]^{1/2}      (56)

The equality holds if and only if x = λy or y = 0.

Proof: If y = 0, the equality holds trivially, so we assume y ≠ 0. Then, for all scalars λ, we have

0 ≤ <x − λy, x − λy> = <x, x> − λ<y, x> − λ*<x, y> + |λ|² <y, y>      (57)

In particular, if we choose λ = <y, x> / <y, y>, then, using axiom 1 in the definition of the inner product, we have

λ* = <y, x>* / <y, y> = <x, y> / <y, y>      (58)

⇒ λ<y, x> + λ*<x, y> = 2 <x, y> <y, x> / <y, y>      (59)
                      = 2 <x, y> <x, y>* / <y, y> = 2 |<x, y>|² / <y, y>      (60)

while |λ|² <y, y> = |<x, y>|² / <y, y>. Substituting these terms in (57),

⇒ 0 ≤ <x, x> − |<x, y>|² / <y, y>      (61)

or |<x, y>| ≤ √(<x, x> <y, y>).
The triangle inequality can be established easily using the Cauchy-Schwarz inequality as follows

<x + y, x + y> = <x, x> + <x, y> + <y, x> + <y, y>      (62)
≤ <x, x> + 2 |<x, y>| + <y, y>      (63)
≤ <x, x> + 2 √(<x, x> <y, y>) + <y, y>      (64)

⇒ √<x + y, x + y> ≤ √<x, x> + √<y, y>      (65)

Thus, the candidate function √<x, x> satisfies all the properties necessary to define a norm, i.e.

√<x, x> ≥ 0 for all x ∈ X and √<x, x> = 0 iff x = 0      (66)
√<λx, λx> = |λ| √<x, x>      (67)
√<x + y, x + y> ≤ √<x, x> + √<y, y> (triangle inequality)      (68)

Thus, the function √<x, x> indeed defines a norm on the inner product space X. In fact, the inner product defines the well known 2-norm on X, i.e.

||x||_2 = √<x, x>      (69)

and the triangle inequality can be stated as

||x + y||²_2 ≤ ||x||²_2 + 2 ||x||_2 ||y||_2 + ||y||²_2 = [||x||_2 + ||y||_2]²      (70)

or ||x + y||_2 ≤ ||x||_2 + ||y||_2      (71)

Definition 36 (Angle): The angle θ between any two vectors in an inner product space is defined by

θ = cos⁻¹( <x, y> / (||x||_2 ||y||_2) )      (72)
Definition 37 (Orthogonal Vectors): In an inner product space X, two vectors x, y ∈ X are said to be orthogonal if <x, y> = 0. We symbolize this by x ⊥ y. A vector x is said to be orthogonal to a set S (written as x ⊥ S) if x ⊥ z for each z ∈ S.

Just as orthogonality has many consequences in three dimensional geometry, it has many implications in any inner-product/Hilbert space [4]. The Pythagoras theorem, which is probably the most important result of plane geometry, is true in any inner product space.

Lemma 38: If x ⊥ y in an inner product space, then ||x + y||²_2 = ||x||²_2 + ||y||²_2.

Proof: ||x + y||²_2 = <x + y, x + y> = ||x||²_2 + ||y||²_2 + <x, y> + <y, x> = ||x||²_2 + ||y||²_2, since <x, y> = <y, x> = 0.

Definition 39 (Orthogonal Set): A set of vectors S in an inner product space X is said to be an orthogonal set if x ⊥ y for each x, y ∈ S with x ≠ y. The set is said to be orthonormal if, in addition, each vector in the set has norm equal to unity.

Note that an orthogonal set of non-zero vectors is a linearly independent set. We often prefer to work with an orthonormal basis, as any vector can be uniquely represented in terms of its components along the orthonormal directions. Common examples of such orthonormal bases are (a) the unit vectors along the coordinate directions in R^n and (b) the functions {sin(nt) : n = 1, 2, ...} and {cos(nt) : n = 1, 2, ...} in L2[0, 2π].

Example 40: Show that the function <x, y>_W : R^n × R^n → R defined as

<x, y>_W = x^T W y

defines an inner product on R^n when W is a symmetric positive definite matrix.

Solution: For <x, y>_W = x^T W y to qualify as an inner product, it must satisfy the four axioms in the definition of the inner product. We have

<x, y>_W = x^T W y and <y, x>_W = y^T W x

Since W is symmetric, i.e. W^T = W,

x^T W y = (x^T W y)^T = y^T W^T x = y^T W x

Thus, axiom 1 holds for any x, y ∈ R^n.

<x + y, z>_W = (x + y)^T W z = x^T W z + y^T W z = <x, z>_W + <y, z>_W

Thus, axiom 2 holds for any x, y, z ∈ R^n.

<λx, y>_W = (λx)^T W y = λ(x^T W y) = λ<x, y>_W
<x, λy>_W = x^T W (λy) = λ(x^T W y) = λ<x, y>_W

Thus, axiom 3 holds for any x, y ∈ R^n. Since W is positive definite, it follows that <x, x>_W = x^T W x > 0 if x ≠ 0 and <x, x>_W = x^T W x = 0 if x = 0. Thus, axiom 4 holds for any x ∈ R^n. Since all four axioms are satisfied, <x, y>_W = x^T W y is a valid definition of an inner product.
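A small numerical sketch of this weighted inner product (the matrix W below is an assumed example of a symmetric positive definite matrix):

```python
import numpy as np

# Weighted inner product <x, y>_W = x^T W y and the norm it induces.
W = np.array([[2.0, 0.5],
              [0.5, 1.0]])           # symmetric positive definite (assumed)
x = np.array([1.0, -1.0])
y = np.array([2.0, 3.0])
ip = lambda u, v: u @ W @ v
print(ip(x, y), ip(y, x))            # equal values: axiom 1 (symmetry)
print(np.sqrt(ip(x, x)))             # induced norm ||x||_{W,2}
```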

Example 41: The triangle inequality asserts that, for any two vectors x and y belonging to an inner product space,

||x + y||_2 ≤ ||x||_2 + ||y||_2

Does the Cauchy-Schwarz inequality follow from the triangle inequality? Under what condition does the Schwarz inequality become an equality?

Solution: Squaring both sides, we have

||x + y||²_2 = <x + y, x + y> ≤ [||y||_2 + ||x||_2]²

<x, x> + <y, y> + 2<x, y> ≤ ||y||²_2 + ||x||²_2 + 2 ||y||_2 ||x||_2

||y||²_2 + ||x||²_2 + 2<x, y> ≤ ||y||²_2 + ||x||²_2 + 2 ||y||_2 ||x||_2

Cancelling the common term ||y||²_2 + ||x||²_2 appearing on both sides, the above inequality reduces to

<x, y> ≤ ||y||_2 ||x||_2      (73)

The triangle inequality, applied to the vectors x and −y, also implies that

||x − y||²_2 = <x − y, x − y> ≤ [||y||_2 + ||x||_2]²

<x, x> + <y, y> − 2<x, y> ≤ ||y||²_2 + ||x||²_2 + 2 ||y||_2 ||x||_2

Cancelling the common term on both sides, this inequality reduces to

−||y||_2 ||x||_2 ≤ <x, y>      (74)

Combining inequalities (73) and (74), we arrive at the Cauchy-Schwarz inequality

−||y||_2 ||x||_2 ≤ <x, y> ≤ ||y||_2 ||x||_2      (75)

i.e.

|<x, y>| ≤ ||y||_2 ||x||_2      (76)

The Cauchy-Schwarz inequality reduces to an equality when y = αx.

[Figure 2: Schematic representation of the projection of a vector b on the line along a]

5.2 Orthogonal Projections


Consider an inner product space X. Suppose we are given a vector b ∈ X and a direction a ∈ X. We want to express b as the sum of two vectors, one along the direction of a and another perpendicular to a. To begin with, let us construct a unit vector â along a, i.e.

â = a / ||a||_2 = a / √<a, a>

such that <â, â> = 1. Now, we are looking for a vector p = θâ, where θ ∈ R is a scalar, such that the vector b − p is orthogonal to a, i.e.

<â, b − p> = <â, b − θâ> = 0      (77)

⇒ θ = <â, b> / <â, â> = <â, b>      (78)

Thus, the projection of b along the direction â, i.e. p, can be expressed as

p = θâ = <â, b> â      (79)

Figure 2 illustrates this concept for X ≡ R³.
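A minimal numerical sketch of equations (77)-(79) in R³, with arbitrarily chosen vectors a and b:

```python
import numpy as np

# Project b along the direction of a and verify that the residual
# b - p is orthogonal to a, as required by equation (77).
a = np.array([1.0, 2.0, 2.0])
b = np.array([3.0, 1.0, 0.0])
a_hat = a / np.linalg.norm(a)        # unit vector along a
theta = a_hat @ b                    # theta = <a_hat, b>, equation (78)
p = theta * a_hat                    # projection, equation (79)
print(p, (b - p) @ a)                # second value ~ 0 (orthogonality)
```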


More generally, consider an m-dimensional subspace S of an inner product space X such that

S = span{a^(1), a^(2), ..., a^(m)}

where the vectors a^(1), a^(2), ..., a^(m) ∈ X are linearly independent. Given an arbitrary point b ∈ X, the problem is to find a point p in the subspace S such that b − p is orthogonal to S. We can define unit vectors

â^(i) = a^(i) / ||a^(i)||_2 = a^(i) / √<a^(i), a^(i)>  for i = 1, 2, ..., m

As p ∈ S, we have

p = θ1 â^(1) + θ2 â^(2) + ... + θm â^(m) = Σ_{i=1}^{m} θi â^(i)      (80)

The requirement that b − p be orthogonal to S implies that b − p is orthogonal to each â^(i) for i = 1, 2, ..., m, i.e.

<â^(i), b − p> = <â^(i), b − Σ_{j=1}^{m} θj â^(j)> = 0  for i = 1, 2, ..., m      (81)

or

<â^(i), Σ_{j=1}^{m} θj â^(j)> = Σ_{j=1}^{m} θj <â^(i), â^(j)> = <â^(i), b>  for i = 1, 2, ..., m      (82)

Note that these are m linear equations in the m unknowns {θi : i = 1, 2, ..., m}. Collecting the above set of equations, we arrive at the following matrix equation

[<â^(1), â^(1)>  <â^(1), â^(2)>  ...  <â^(1), â^(m)>;
 <â^(2), â^(1)>  <â^(2), â^(2)>  ...  <â^(2), â^(m)>;
 ... ;
 <â^(m), â^(1)>  <â^(m), â^(2)>  ...  <â^(m), â^(m)>] [θ1; θ2; ...; θm] = [<â^(1), b>; <â^(2), b>; ...; <â^(m), b>]      (83)

which can be solved to find {θi : i = 1, 2, ..., m}. In particular, if the set {â^(1), â^(2), ..., â^(m)} is orthonormal, i.e.

<â^(i), â^(j)> = 0 when i ≠ j

or, equivalently, the set {â^(1), â^(2), ..., â^(m)} is an orthonormal basis of S, then it follows that

[θ1; θ2; ...; θm] = [<â^(1), b>; <â^(2), b>; ...; <â^(m), b>]      (84)
[Figure 3: Schematic representation of the projection of a vector b on a subspace spanned by vectors (u, v)]

The situation is exactly the same when we are given a point b ∈ R³ and a plane S in R³ which is spanned by two linearly independent vectors {a^(1), a^(2)}. We would like to find the distance of b from S, i.e. a point p ∈ S such that ||p − b||_2 is minimum (see Figure 3). Again, from school geometry, we know that such a point can be obtained by drawing a perpendicular from b to S; p is the point where this perpendicular meets S (see Figure 3). We would like to formally derive this result using optimization.

It may be noted that, to project a vector b on the subspace S, it is not necessary to generate the unit vectors {â^(i) : i = 1, 2, ..., m}. One can directly work with the set {a^(i) : i = 1, 2, ..., m}. Let us express p ∈ S as

p = θ1 a^(1) + θ2 a^(2) + ... + θm a^(m) = Σ_{i=1}^{m} θi a^(i)      (85)

Requirement that b p is orthogonal to S implies that b p is orthogonal to each a(j) for


i = 1; 2; :::; m, i.e.
* +
X
m
a(i) ; b p = a(i) ; b ja
(j)
= 0 for i = 1; 2; :::; m (86)
j=1

or * +
X
m X
m
a(i) ; ja
(j)
= i a(i) ; a(j) = a(i) ; b for i = 1; 2; :::m (87)
j=1 j=1

38
Note that these are m linear equations in m unknowns, f i : i = 1; 2; :::mg. Collecting the
above set of equations, we arrive at the following matrix equation
2 (1) (1) 32 3 2 (1) 3
a ;a a(1) ; a(2) :::: a(1) ; a(m) 1 a ;b
6 a(2) ; a(1) a(2) ; a(2) a(2) ; a(m) 7 6 7 6 7
6 :::: 7 6 2 7 6 a(2) ; b 7
6 76 7=6 7 (88)
4 ::::: ::::: ::::: ::::: 5 4 :::: 5 4 :::: 5
a(m) ; a(1) a(m) ; a(2) ::::: a(m) ; a(m) m a(m) ; b
De…ning matrix, G, and vector, f ; as
2 (1) (1) 3 2 3
a ;a a(1) ; a(2) :::: a(1) ; a(m) a(1) ; b
6 a(2) ; a(1) a(2) ; a(2) :::: a(2) ; a(m) 7 6 a(2) ; b 7
6 7 6 7
G =6 7 ; f =6 7 (89)
4 ::::: ::::: ::::: ::::: 5 4 :::: 5
(m) (1)
a ;a a(m) ; a(2) ::::: a(m) ; a(m) a(m) ; b
eqation (88) can be expressed as a linear algebraic equation

G =f (90)

The m m matrix G on the L.H.S. of (90) is called as the Gram matrix. When we work with
real valued vectors and the inner product is a map from X X to R, then it is easy to show
that G is a symmetric matrix, i.e. GT = G: Moreover, since vectors a(1) ; a(2) ; :::::::; a(m)
are assumed to be linearly independent, the Gram matrix is nonsingular. The equation we
have derived here represnts a very general result called projection theorem, which holds
in any Hilbert space.
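For a concrete feel of equation (90), a short sketch follows, assuming $X = \mathbb{R}^4$ with $\langle x, y \rangle = x^T y$; the basis vectors below are arbitrary linearly independent choices.

```python
import numpy as np

# sketch: project b on S = span{a1, a2} in R^4 by solving G theta = f, eq. (90);
# the columns of A are arbitrary linearly independent choices
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0],
              [1.0, 1.0]])             # columns are a1, a2
b = np.array([1.0, 2.0, 3.0, 4.0])

G = A.T @ A                            # Gram matrix, eq. (89)
f = A.T @ b
theta = np.linalg.solve(G, f)          # solve G theta = f
p = A @ theta                          # projection of b on S

print(A.T @ (b - p))                   # ~0: (b - p) is orthogonal to S
```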

Theorem 42 (Classical Projection Theorem [4]): Let $X$ be a Hilbert space and $S$ a finite dimensional subspace of $X$. Corresponding to any vector $b \in X$, there is a unique vector $p \in S$ such that $\|b - p\|_2 \leq \|b - x\|_2$ for any vector $x \in S$. Furthermore, a necessary and sufficient condition for $p \in S$ to be the unique minimizing vector is that the vector $(b - p)$ is orthogonal to $S$.

Consider the error vector, $e$, defined as

$$e = b - p = b - \sum_{j=1}^{m} \theta_j a^{(j)}$$

and

$$\varphi = \|b - p\|_2^2 = \left\langle b - \sum_{j=1}^{m} \theta_j a^{(j)}, \; b - \sum_{j=1}^{m} \theta_j a^{(j)} \right\rangle$$

$$= \langle b, b \rangle - 2 \sum_{j=1}^{m} \theta_j \left\langle a^{(j)}, b \right\rangle + \left\langle \sum_{j=1}^{m} \theta_j a^{(j)}, \sum_{j=1}^{m} \theta_j a^{(j)} \right\rangle$$

We want to find the vector $p$ such that $\varphi = \|b - p\|_2^2$ is smallest. Thus, the problem of finding $p$ can be recast as follows

$$p = \underset{\{\theta_i : i = 1, 2, \ldots, m\}}{\text{Min}} \; \varphi = \underset{\{\theta_i : i = 1, 2, \ldots, m\}}{\text{Min}} \; \|b - p\|_2^2$$

Using the necessary condition for optimality (see Appendix B), it follows that, at the optimum point, the following set of equations holds

$$\frac{\partial \varphi}{\partial \theta_i} = 0 \quad \text{for } i = 1, 2, \ldots, m \qquad (91)$$

i.e.

$$\frac{\partial \varphi}{\partial \theta_i} = -\left\langle a^{(i)}, b \right\rangle - \left\langle b, a^{(i)} \right\rangle + \sum_{j=1}^{m} \theta_j \left\langle a^{(i)}, a^{(j)} \right\rangle + \sum_{j=1}^{m} \theta_j \left\langle a^{(j)}, a^{(i)} \right\rangle = 0 \qquad (92)$$

$$\text{for } i = 1, 2, \ldots, m$$

The necessary condition for optimality implies that

$$\left\langle a^{(i)}, b - \sum_{j=1}^{m} \theta_j a^{(j)} \right\rangle = \left\langle a^{(i)}, b - p \right\rangle = 0 \quad \text{for } i = 1, 2, \ldots, m$$

i.e. $b - p$ is perpendicular to $S$. The set of equations (92) can be rearranged as follows

$$\sum_{j=1}^{m} \left\langle a^{(j)}, a^{(i)} \right\rangle \theta_j = \left\langle a^{(i)}, b \right\rangle \quad \text{for } i = 1, 2, \ldots, m \qquad (93)$$

which is the same as equation (88). Further, it can be shown that $G$ is a positive definite matrix and the solution $\widehat{\theta} = G^{-1} f$ yields the global minimum, i.e. we get the smallest possible $\|b - p\|_2^2$ for $\theta = \widehat{\theta}$.

Example 43 Consider the Hilbert space $X \equiv L_2[0, 2\pi]$ and a vector $g(t) = e^t \in L_2[0, 2\pi]$. We want to find a component $p(t)$ of $g(t)$ in the 2-dimensional subspace, $S$, spanned by the vectors

$$f_1(t) = 1, \quad f_2(t) = t \quad \text{and} \quad S = \text{span}\{f_1(t), f_2(t)\}$$

such that the function $e(t) = g(t) - p(t)$ is orthogonal to $S$. Since $p(t) \in S$, it is of the form

$$p(t) = \theta_1 f_1(t) + \theta_2 f_2(t) = \theta_1 + \theta_2 t$$

and the coefficients $(\theta_1, \theta_2)$ can be found by solving

$$\begin{bmatrix} \langle f_1, f_1 \rangle & \langle f_1, f_2 \rangle \\ \langle f_2, f_1 \rangle & \langle f_2, f_2 \rangle \end{bmatrix}
\begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix}
= \begin{bmatrix} \langle f_1, e^t \rangle \\ \langle f_2, e^t \rangle \end{bmatrix}$$

where

$$\langle f_1, f_1 \rangle = 2\pi; \quad \langle f_1, f_2 \rangle = \int_0^{2\pi} (1 \cdot t) \, dt = \frac{(2\pi)^2}{2}; \quad \langle f_2, f_2 \rangle = \int_0^{2\pi} t^2 \, dt = \frac{(2\pi)^3}{3}$$

$$\langle f_1, e^t \rangle = \int_0^{2\pi} (1 \cdot e^t) \, dt = e^{2\pi} - 1 \quad \text{and} \quad \langle f_2, e^t \rangle = \int_0^{2\pi} (t \cdot e^t) \, dt = \left[ t e^t - e^t \right]_0^{2\pi} = e^{2\pi}(2\pi - 1) + 1$$

$$\begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix}
= \begin{bmatrix} 2\pi & \dfrac{(2\pi)^2}{2} \\ \dfrac{(2\pi)^2}{2} & \dfrac{(2\pi)^3}{3} \end{bmatrix}^{-1}
\begin{bmatrix} e^{2\pi} - 1 \\ e^{2\pi}(2\pi - 1) + 1 \end{bmatrix}$$
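This example is easy to verify numerically. The sketch below builds the Gram system from the closed-form integrals derived above, solves for $(\theta_1, \theta_2)$, and checks the residual orthogonality with a simple Riemann sum; it is a verification aid only.

```python
import numpy as np

T = 2.0 * np.pi
# Gram matrix and right-hand side from the closed-form integrals above
G = np.array([[T,          T**2 / 2.0],
              [T**2 / 2.0, T**3 / 3.0]])
f = np.array([np.exp(T) - 1.0,
              np.exp(T) * (T - 1.0) + 1.0])

theta = np.linalg.solve(G, f)      # coefficients of p(t) = theta1 + theta2 * t
print(theta)

# residual g - p should be orthogonal to f1(t) = 1 and f2(t) = t
t = np.linspace(0.0, T, 200001)
dt = t[1] - t[0]
r = np.exp(t) - (theta[0] + theta[1] * t)
print(np.sum(r) * dt, np.sum(t * r) * dt)   # both ~0, up to quadrature error
```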

5.3 Orthogonal Projections in Rn

A special case of interest from the engineering viewpoint is when $X = \mathbb{R}^n$ and the linearly independent set is

$$\left\{ a^{(1)}, a^{(2)}, \ldots, a^{(m)} \right\} \subset \mathbb{R}^n$$

with $m < n$. Let $S = \text{span}\left\{ a^{(1)}, a^{(2)}, \ldots, a^{(m)} \right\}$ represent an $m$-dimensional subspace of $\mathbb{R}^n$. Given a vector $b \in \mathbb{R}^n$, we are interested in finding the orthogonal projection, $p$, of $b$ in $S$. Let us consider a motivating example first. We will then state the general result.

Example 44 Consider the problem of finding an approximate correlation relating the specific heat of a gas at constant pressure, $C_p$, to temperature in a certain temperature range $[T_a, T_b]$. Let us assume that we have obtained three "measurements" of the specific heat, $\{C_{p1}, C_{p2}, C_{p3}\}$, at three different temperatures $\{T_1, T_2, T_3\} \in [T_a, T_b]$. Let us propose an approximate model of the form

$$C_p = \theta_1 + \theta_2 T + e = \widehat{C}_p + e \qquad (94)$$

where $(\theta_1, \theta_2)$ represent parameters of the correlation, $e$ represents the approximation error and $\widehat{C}_p$ represents the estimate of $C_p$ based on temperature $T$. Thus, we have three equations

$$C_{p1} = \theta_1 + \theta_2 T_1 + e_1 \qquad (95)$$
$$C_{p2} = \theta_1 + \theta_2 T_2 + e_2 \qquad (96)$$
$$C_{p3} = \theta_1 + \theta_2 T_3 + e_3 \qquad (97)$$

which can be rearranged as follows

$$\begin{bmatrix} C_{p1} \\ C_{p2} \\ C_{p3} \end{bmatrix}
= \begin{bmatrix} 1 & T_1 \\ 1 & T_2 \\ 1 & T_3 \end{bmatrix}
\begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix}
+ \begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix}$$

Thus, we are looking for a vector $p$,

$$p \equiv \begin{bmatrix} \widehat{C}_{p1} \\ \widehat{C}_{p2} \\ \widehat{C}_{p3} \end{bmatrix}
= \theta_1 \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + \theta_2 \begin{bmatrix} T_1 \\ T_2 \\ T_3 \end{bmatrix}
= \theta_1 a^{(1)} + \theta_2 a^{(2)}$$

that lies in the two dimensional subspace $S = \text{span}\left\{ a^{(1)}, a^{(2)} \right\}$ of $\mathbb{R}^3$ such that $\|b - p\|_2^2 = \|e\|_2^2$ is as small as possible, where

$$b \equiv \begin{bmatrix} C_{p1} \\ C_{p2} \\ C_{p3} \end{bmatrix} \quad \text{and} \quad e \equiv \begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix}$$

Let us assume that, for any $x, y \in \mathbb{R}^3$, the inner product is defined as $\langle x, y \rangle = x^T y$. Then,

$$G = \begin{bmatrix} \langle a^{(1)}, a^{(1)} \rangle & \langle a^{(1)}, a^{(2)} \rangle \\ \langle a^{(2)}, a^{(1)} \rangle & \langle a^{(2)}, a^{(2)} \rangle \end{bmatrix}
= \begin{bmatrix} 3 & T_1 + T_2 + T_3 \\ T_1 + T_2 + T_3 & T_1^2 + T_2^2 + T_3^2 \end{bmatrix}$$

$$f = \begin{bmatrix} \langle a^{(1)}, b \rangle \\ \langle a^{(2)}, b \rangle \end{bmatrix}
= \begin{bmatrix} C_{p1} + C_{p2} + C_{p3} \\ C_{p1} T_1 + C_{p2} T_2 + C_{p3} T_3 \end{bmatrix}$$

and $\widehat{\theta} = G^{-1} f$. The vector $p$ obtained this way is such that $e = b - p \perp p$ and $\|e\|_2^2 = e^T e = e_1^2 + e_2^2 + e_3^2$ is as small as possible. In other words, $\widehat{\theta}$ represents the least squares estimate of the parameter vector $\theta$. There is an alternate way to arrive at the same result. Let us define a matrix

$$A = \begin{bmatrix} a^{(1)} & a^{(2)} \end{bmatrix} = \begin{bmatrix} 1 & T_1 \\ 1 & T_2 \\ 1 & T_3 \end{bmatrix}$$

Then, it is easy to show that

$$A^T A = \begin{bmatrix} (a^{(1)})^T \\ (a^{(2)})^T \end{bmatrix} \begin{bmatrix} a^{(1)} & a^{(2)} \end{bmatrix}
= \begin{bmatrix} (a^{(1)})^T a^{(1)} & (a^{(1)})^T a^{(2)} \\ (a^{(2)})^T a^{(1)} & (a^{(2)})^T a^{(2)} \end{bmatrix}
= \begin{bmatrix} \langle a^{(1)}, a^{(1)} \rangle & \langle a^{(1)}, a^{(2)} \rangle \\ \langle a^{(2)}, a^{(1)} \rangle & \langle a^{(2)}, a^{(2)} \rangle \end{bmatrix}$$

$$A^T b = \begin{bmatrix} (a^{(1)})^T b \\ (a^{(2)})^T b \end{bmatrix}
= \begin{bmatrix} \langle a^{(1)}, b \rangle \\ \langle a^{(2)}, b \rangle \end{bmatrix}$$

Thus,

$$\widehat{\theta}_{LS} = \left( A^T A \right)^{-1} A^T b$$
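A sketch of this computation follows; the temperatures and heat-capacity values below are made-up placeholders, not measurements from the text.

```python
import numpy as np

# hypothetical data; any three (T, Cp) pairs would do for this sketch
T  = np.array([300.0, 350.0, 400.0])
Cp = np.array([1.05, 1.12, 1.20])

A = np.column_stack([np.ones_like(T), T])    # A = [a1 a2]
theta = np.linalg.solve(A.T @ A, A.T @ Cp)   # least squares estimate
p = A @ theta                                # fitted values (projection of Cp)
e = Cp - p                                   # approximation error

print(theta, A.T @ e)                        # e is orthogonal to the columns of A
```

In practice, `np.linalg.lstsq(A, Cp, rcond=None)`, which avoids forming $A^T A$ explicitly, is numerically preferable; the normal-equation form is used here because it mirrors the derivation above. The same code extends directly to Examples 45 and 46 below by changing the columns of $A$.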

Now, consider the general case where $S = \text{span}\left\{ a^{(1)}, a^{(2)}, \ldots, a^{(m)} \right\}$ and we want to project an arbitrary vector $b \in \mathbb{R}^n$ on $S$. Let us assume that, for any $x, y \in \mathbb{R}^n$, the inner product is defined as $\langle x, y \rangle = x^T y$. If we define the matrix

$$A = \begin{bmatrix} a^{(1)} & a^{(2)} & \cdots & a^{(m)} \end{bmatrix}$$

then it is easy to show that

$$G = A^T A \quad \text{and} \quad f = A^T b \qquad (98)$$

Thus, the least squares estimate of the parameter vector $\theta$ is

$$\widehat{\theta}_{LS} = \left( A^T A \right)^{-1} A^T b \qquad (99)$$

and

$$p = A \widehat{\theta}_{LS} = A \left( A^T A \right)^{-1} A^T b \qquad (100)$$

Here,

$$P_r = A \left( A^T A \right)^{-1} A^T \qquad (101)$$

is called a projection matrix; it projects onto $S$. The component orthogonal to $S$ is given as

$$e = b - p = b - A \left( A^T A \right)^{-1} A^T b = \left[ I - P_r \right] b \qquad (102)$$

Thus, the projection matrix $P_r$ facilitates splitting a vector $b$ into two components: $p \in S$ and $e = b - p \perp S$. The projection matrix has an interesting property: it equals its own square, i.e. it is idempotent:

$$(P_r)^2 = \left[ A \left( A^T A \right)^{-1} A^T \right] \left[ A \left( A^T A \right)^{-1} A^T \right] = A \left( A^T A \right)^{-1} A^T = P_r \qquad (103)$$

Thus,

$$(P_r)^2 b = P_r (P_r b) = P_r (p) = (P_r) b = p \qquad (104)$$

This makes perfect sense from the geometric viewpoint: since $p \in S$, the projection of $p$ on $S$ is $p$ itself.
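The idempotency in equation (103) can be checked directly; the following is a minimal sketch with an arbitrary full-column-rank $A$.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])                 # arbitrary full-column-rank matrix

Pr = A @ np.linalg.inv(A.T @ A) @ A.T      # projection matrix, eq. (101)
print(np.allclose(Pr @ Pr, Pr))            # True: Pr^2 = Pr, eq. (103)

b = np.array([1.0, 3.0, 2.0])
p = Pr @ b                                 # component of b in S
e = (np.eye(3) - Pr) @ b                   # component orthogonal to S, eq. (102)
print(np.dot(p, e))                        # ~0
```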

Example 45 Let us revisit the problem of finding an approximate correlation relating the specific heat of a gas at constant pressure, $C_p$, to temperature in a certain temperature range $[T_a, T_b]$. Let us now assume that we have obtained five "measurements" of the specific heat, $\{C_{p1}, C_{p2}, C_{p3}, C_{p4}, C_{p5}\}$, at five different temperatures $\{T_1, T_2, T_3, T_4, T_5\} \in [T_a, T_b]$. This time let us propose an approximate model of the form

$$C_p = \theta_1 + \theta_2 T + \theta_3 T^2 + e = \widehat{C}_p + e \qquad (105)$$

where $(\theta_1, \theta_2, \theta_3)$ represent parameters of the correlation, $e$ represents the approximation error and $\widehat{C}_p$ represents the estimate of $C_p$ based on temperature $T$. The resulting set of equations can be rearranged as follows

$$C_p = \begin{bmatrix} C_{p1} \\ C_{p2} \\ C_{p3} \\ C_{p4} \\ C_{p5} \end{bmatrix}
= \begin{bmatrix} 1 & T_1 & T_1^2 \\ 1 & T_2 & T_2^2 \\ 1 & T_3 & T_3^2 \\ 1 & T_4 & T_4^2 \\ 1 & T_5 & T_5^2 \end{bmatrix}
\begin{bmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \end{bmatrix}
+ \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \\ e_5 \end{bmatrix}
= A\theta + e$$

Thus, we are looking for the projection of the vector $b \equiv C_p$ on the 3-dimensional subspace $S$ of $\mathbb{R}^5$ which is spanned by

$$a^{(1)} = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \quad
a^{(2)} = \begin{bmatrix} T_1 \\ T_2 \\ T_3 \\ T_4 \\ T_5 \end{bmatrix}, \quad
a^{(3)} = \begin{bmatrix} T_1^2 \\ T_2^2 \\ T_3^2 \\ T_4^2 \\ T_5^2 \end{bmatrix}$$

The least squares solution for the vector $\theta$ can be obtained as

$$\widehat{\theta}_{LS} = \left( A^T A \right)^{-1} A^T b$$

and $p = A \widehat{\theta}_{LS}$.

Example 46 It was found that the yield of a chemical reaction, $Y$, is a function of the operating temperature and pressure. Experiments have been carried out at 100 different combinations $\{(T_1, P_1), (T_2, P_2), \ldots, (T_{100}, P_{100})\}$ and the corresponding reaction yields $\{Y_1, Y_2, \ldots, Y_{100}\}$ have been recorded. A model relating the reaction yield to $(T, P)$ is proposed as follows

$$Y = \theta_1 + \theta_2 T + \theta_3 P + e = \widehat{Y} + e \qquad (106)$$

where $(\theta_1, \theta_2, \theta_3)$ represent parameters of the correlation, $e$ represents the approximation error and $\widehat{Y}$ represents the estimate of $Y$ based on $(T, P)$. Thus, we have 100 equations of the form

$$Y_i = \theta_1 + \theta_2 T_i + \theta_3 P_i + e_i \quad \text{for } i = 1, 2, \ldots, 100$$

The resulting set of equations can be rearranged as follows

$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_{100} \end{bmatrix}
= \begin{bmatrix} 1 & T_1 & P_1 \\ 1 & T_2 & P_2 \\ \vdots & \vdots & \vdots \\ 1 & T_{100} & P_{100} \end{bmatrix}
\begin{bmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \end{bmatrix}
+ \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_{100} \end{bmatrix}
= A\theta + e$$

Thus, we are looking for the projection of the vector $b \equiv Y$ on the 3-dimensional subspace $S$ of $\mathbb{R}^{100}$ which is spanned by

$$a^{(1)} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}, \quad
a^{(2)} = \begin{bmatrix} T_1 \\ T_2 \\ \vdots \\ T_{100} \end{bmatrix}, \quad
a^{(3)} = \begin{bmatrix} P_1 \\ P_2 \\ \vdots \\ P_{100} \end{bmatrix}$$

The least squares solution for the vector $\theta$ can be obtained as

$$\widehat{\theta}_{LS} = \left( A^T A \right)^{-1} A^T b$$

and $p = \widehat{Y} = A \widehat{\theta}_{LS}$. Note that $A^T A$ is a $3 \times 3$ matrix and $A^T b$ is a $3 \times 1$ vector, irrespective of the number of measurements.

5.4 Gram-Schmidt Process and Orthogonal Polynomials

Orthogonal projections play a vital role in computational engineering. Given any linearly independent set of vectors in an inner product space $X$, it is possible to construct an orthonormal set using the concept of orthogonal projections. This procedure is called the Gram-Schmidt procedure. Consider a linearly independent set of vectors $\left\{ x^{(i)} : i = 1, 2, \ldots, n \right\}$ in an inner product space. We define $\hat{z}^{(1)}$ as

$$\hat{z}^{(1)} = \frac{x^{(1)}}{\|x^{(1)}\|_2} \qquad (107)$$

We form the unit vector $\hat{z}^{(2)}$ in two steps:

$$z^{(2)} = x^{(2)} - \left\langle x^{(2)}, \hat{z}^{(1)} \right\rangle \hat{z}^{(1)} \qquad (108)$$

where $\left\langle x^{(2)}, \hat{z}^{(1)} \right\rangle$ is the component of $x^{(2)}$ along $\hat{z}^{(1)}$, and

$$\hat{z}^{(2)} = \frac{z^{(2)}}{\|z^{(2)}\|_2} \qquad (109)$$

By direct calculation it can be verified that $\hat{z}^{(1)} \perp \hat{z}^{(2)}$. The remaining orthonormal vectors $\hat{z}^{(i)}$ are defined by induction. The vector $z^{(k)}$ is formed according to the equation

$$z^{(k)} = x^{(k)} - \sum_{i=1}^{k-1} \left\langle x^{(k)}, \hat{z}^{(i)} \right\rangle \hat{z}^{(i)} \qquad (110)$$

and

$$\hat{z}^{(k)} = \frac{z^{(k)}}{\|z^{(k)}\|_2}, \quad k = 1, 2, \ldots, n \qquad (111)$$

It can be verified by direct computation that $z^{(k)} \perp \hat{z}^{(j)}$ for all $j < k$ as follows:

$$\left\langle z^{(k)}, \hat{z}^{(j)} \right\rangle = \left\langle x^{(k)}, \hat{z}^{(j)} \right\rangle - \sum_{i=1}^{k-1} \left\langle x^{(k)}, \hat{z}^{(i)} \right\rangle \left\langle \hat{z}^{(i)}, \hat{z}^{(j)} \right\rangle \qquad (112)$$

$$= \left\langle x^{(k)}, \hat{z}^{(j)} \right\rangle - \left\langle x^{(k)}, \hat{z}^{(j)} \right\rangle = 0 \qquad (113)$$
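Equations (110)-(111) translate directly into code. The sketch below assumes $X = \mathbb{R}^n$ with $\langle x, y \rangle = x^T y$ as the default; for other inner products one would pass a different `inner` function. Running it on the vectors of Example 47 below reproduces $\hat{z}^{(1)}$, $\hat{z}^{(2)}$ and $\hat{z}^{(3)}$.

```python
import numpy as np

def gram_schmidt(xs, inner=np.dot):
    """Orthonormalize linearly independent vectors xs, eqs. (110)-(111)."""
    zs = []
    for x in xs:
        z = x.astype(float)
        for zh in zs:
            z = z - inner(x, zh) * zh           # subtract projections, eq. (110)
        zs.append(z / np.sqrt(inner(z, z)))     # normalize, eq. (111)
    return zs

xs = [np.array([1.0, 0.0, 1.0]),
      np.array([1.0, 0.0, 0.0]),
      np.array([2.0, 1.0, 0.0])]
for z in gram_schmidt(xs):
    print(z)
# (1/sqrt(2), 0, 1/sqrt(2)), (1/sqrt(2), 0, -1/sqrt(2)), (0, 1, 0)
```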

Example 47 (Gram-Schmidt Procedure in R3): Consider $X = \mathbb{R}^3$ with $\langle x, y \rangle = x^T y$. Given a set of three linearly independent vectors in $\mathbb{R}^3$,

$$x^{(1)} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad
x^{(2)} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad
x^{(3)} = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} \qquad (114)$$

we want to construct an orthonormal set. Applying the Gram-Schmidt procedure,

$$\hat{z}^{(1)} = \frac{x^{(1)}}{\|x^{(1)}\|_2} = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ 0 \\ \frac{1}{\sqrt{2}} \end{bmatrix} \qquad (115)$$

$$z^{(2)} = x^{(2)} - \left\langle x^{(2)}, \hat{z}^{(1)} \right\rangle \hat{z}^{(1)} \qquad (116)$$

$$= \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} - \frac{1}{\sqrt{2}} \begin{bmatrix} \frac{1}{\sqrt{2}} \\ 0 \\ \frac{1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} \frac{1}{2} \\ 0 \\ -\frac{1}{2} \end{bmatrix}$$

$$\hat{z}^{(2)} = \frac{z^{(2)}}{\|z^{(2)}\|_2} = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ 0 \\ -\frac{1}{\sqrt{2}} \end{bmatrix} \qquad (117)$$

$$z^{(3)} = x^{(3)} - \left\langle x^{(3)}, \hat{z}^{(1)} \right\rangle \hat{z}^{(1)} - \left\langle x^{(3)}, \hat{z}^{(2)} \right\rangle \hat{z}^{(2)}$$

$$= \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} - \sqrt{2} \begin{bmatrix} \frac{1}{\sqrt{2}} \\ 0 \\ \frac{1}{\sqrt{2}} \end{bmatrix} - \sqrt{2} \begin{bmatrix} \frac{1}{\sqrt{2}} \\ 0 \\ -\frac{1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \qquad (118)$$

$$\hat{z}^{(3)} = \frac{z^{(3)}}{\|z^{(3)}\|_2} = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}^T$$
Note that the vectors in the orthonormal set depend on the definition of the inner product. Suppose we define the inner product as

$$\langle x, y \rangle_W = x^T W y \qquad (119)$$

$$W = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}$$

where $W$ is a positive definite matrix. Then, the length of $x^{(1)}$ is $\|x^{(1)}\|_{W,2} = \sqrt{6}$ and the unit vector $\hat{z}^{(1)}$ becomes

$$\hat{z}^{(1)} = \frac{x^{(1)}}{\|x^{(1)}\|_{W,2}} = \begin{bmatrix} \frac{1}{\sqrt{6}} \\ 0 \\ \frac{1}{\sqrt{6}} \end{bmatrix} \qquad (120)$$

The remaining two orthonormal vectors have to be computed using the inner product defined by equation (119).
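The weighted normalization in equation (120) is a one-line check; the sketch below verifies it (and the earlier `gram_schmidt` sketch would handle the full set if passed `inner_W`).

```python
import numpy as np

W = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])
x1 = np.array([1.0, 0.0, 1.0])

inner_W = lambda x, y: x @ W @ y               # <x, y>_W = x^T W y, eq. (119)
print(x1 / np.sqrt(inner_W(x1, x1)))           # (1/sqrt(6), 0, 1/sqrt(6)), eq. (120)
```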

Example 48 (Gram-Schmidt Procedure in C[a,b]): Let $X$ represent the set of continuous functions on the interval $-1 \leq t \leq 1$ with the inner product defined as

$$\langle x(t), y(t) \rangle = \int_{-1}^{1} x(t) y(t) \, dt \qquad (121)$$

Given a set of four linearly independent vectors

$$x^{(1)}(t) = 1, \quad x^{(2)}(t) = t, \quad x^{(3)}(t) = t^2, \quad x^{(4)}(t) = t^3 \qquad (122)$$

we intend to generate an orthonormal set. Applying the Gram-Schmidt procedure,

$$\hat{z}^{(1)}(t) = \frac{x^{(1)}(t)}{\|x^{(1)}(t)\|} = \frac{1}{\sqrt{2}} \qquad (123)$$

$$\left\langle \hat{z}^{(1)}(t), x^{(2)}(t) \right\rangle = \int_{-1}^{1} \frac{t}{\sqrt{2}} \, dt = 0 \qquad (124)$$

$$z^{(2)}(t) = t - \left\langle x^{(2)}, \hat{z}^{(1)} \right\rangle \hat{z}^{(1)} = t = x^{(2)}(t) \qquad (125)$$

$$\hat{z}^{(2)} = \frac{z^{(2)}}{\|z^{(2)}\|} \qquad (126)$$

$$\left\| z^{(2)}(t) \right\|^2 = \int_{-1}^{1} t^2 \, dt = \left[ \frac{t^3}{3} \right]_{-1}^{1} = \frac{2}{3} \qquad (127)$$

$$\left\| z^{(2)}(t) \right\| = \sqrt{\frac{2}{3}} \qquad (128)$$

$$\hat{z}^{(2)}(t) = \sqrt{\frac{3}{2}} \, t \qquad (129)$$

$$z^{(3)}(t) = x^{(3)}(t) - \left\langle x^{(3)}(t), \hat{z}^{(1)}(t) \right\rangle \hat{z}^{(1)}(t) - \left\langle x^{(3)}(t), \hat{z}^{(2)}(t) \right\rangle \hat{z}^{(2)}(t)$$

$$= t^2 - \left( \int_{-1}^{1} \frac{t^2}{\sqrt{2}} \, dt \right) \hat{z}^{(1)}(t) - \left( \int_{-1}^{1} \sqrt{\frac{3}{2}} \, t^3 \, dt \right) \hat{z}^{(2)}(t)$$

$$= t^2 - \frac{1}{3} - 0 = t^2 - \frac{1}{3} \qquad (130)$$

$$\hat{z}^{(3)}(t) = \frac{z^{(3)}(t)}{\|z^{(3)}(t)\|} \qquad (131)$$

where

$$\left\| z^{(3)}(t) \right\|^2 = \left\langle z^{(3)}(t), z^{(3)}(t) \right\rangle = \int_{-1}^{1} \left( t^2 - \frac{1}{3} \right)^2 dt \qquad (132)$$

$$= \int_{-1}^{1} \left( t^4 - \frac{2}{3} t^2 + \frac{1}{9} \right) dt = \left[ \frac{t^5}{5} - \frac{2t^3}{9} + \frac{t}{9} \right]_{-1}^{1} = \frac{2}{5} - \frac{4}{9} + \frac{2}{9} = \frac{8}{45}$$

$$\left\| z^{(3)}(t) \right\| = \sqrt{\frac{8}{45}} = \frac{2}{3} \sqrt{\frac{2}{5}} \qquad (133)$$

The orthonormal polynomials constructed above are the well-known Legendre polynomials. It turns out that

$$\hat{z}_n(t) = \sqrt{\frac{2n+1}{2}} \, P_n(t), \quad (n = 0, 1, 2, \ldots) \qquad (134)$$

where

$$P_n(t) = \frac{(-1)^n}{2^n \, n!} \frac{d^n}{dt^n} \left[ \left( 1 - t^2 \right)^n \right] \qquad (135)$$

are the Legendre polynomials. It can be shown that this set of polynomials forms an orthonormal basis for the set of continuous functions on $[-1, 1]$. The first few elements in this orthogonal set are as follows:

$$P_0(t) = 1; \quad P_1(t) = t; \quad P_2(t) = \frac{1}{2}(3t^2 - 1); \quad P_3(t) = \frac{1}{2}(5t^3 - 3t)$$

$$P_4(t) = \frac{1}{8}(35t^4 - 30t^2 + 3); \quad P_5(t) = \frac{1}{8}(63t^5 - 70t^3 + 15t)$$
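The orthonormality of the three polynomials constructed above is easy to confirm numerically; the following sketch uses a simple Riemann-sum quadrature on $[-1, 1]$.

```python
import numpy as np

t = np.linspace(-1.0, 1.0, 400001)
dt = t[1] - t[0]

z1 = np.full_like(t, 1.0 / np.sqrt(2.0))     # z1(t) = 1/sqrt(2), eq. (123)
z2 = np.sqrt(3.0 / 2.0) * t                  # z2(t) = sqrt(3/2) t, eq. (129)
z3 = (t**2 - 1.0/3.0) / np.sqrt(8.0/45.0)    # z3(t), eqs. (130) and (133)

for u in (z1, z2, z3):
    print([round(np.sum(u * v) * dt, 4) for v in (z1, z2, z3)])
# prints (approximately) the rows of the 3x3 identity matrix
```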
Example 49 (Gram-Schmidt Procedure in Other Spaces)

1. Shifted Legendre polynomials: $X = C[0, 1]$ with the inner product defined as

$$\langle x(t), y(t) \rangle = \int_0^1 x(t) y(t) \, dt \qquad (136)$$

These polynomials are generated starting from the linearly independent vectors

$$x^{(1)}(t) = 1, \quad x^{(2)}(t) = t, \quad x^{(3)}(t) = t^2, \quad x^{(4)}(t) = t^3 \qquad (137)$$

and applying the Gram-Schmidt process.

2. Hermite polynomials: $X \equiv L_2(-\infty, \infty)$, i.e. the space of continuous functions over $(-\infty, \infty)$ with the 2-norm defined on it and

$$\langle x(t), y(t) \rangle = \int_{-\infty}^{\infty} \exp(-t^2) \, x(t) y(t) \, dt \qquad (138)$$

Apply Gram-Schmidt to the following set of vectors in $L_2(-\infty, \infty)$:

$$x^{(1)}(t) = 1, \quad x^{(2)}(t) = t, \qquad (139)$$

$$x^{(3)}(t) = t^2, \quad \ldots, \quad x^{(k)}(t) = t^{k-1}, \quad \ldots \qquad (140)$$

The first few elements in this orthogonal set are as follows:

$$H_0(t) = 1; \quad H_1(t) = 2t; \quad H_2(t) = 4t^2 - 2; \quad H_3(t) = 8t^3 - 12t$$

$$H_4(t) = 16t^4 - 48t^2 + 12; \quad H_5(t) = 32t^5 - 160t^3 + 120t$$

3. Laguerre polynomials: $X \equiv L_2(0, \infty)$, i.e. the space of continuous functions over $(0, \infty)$ with the 2-norm defined on it and

$$\langle x(t), y(t) \rangle = \int_0^{\infty} e^{-t} x(t) y(t) \, dt \qquad (141)$$

Apply Gram-Schmidt to the following set of vectors in $L_2(0, \infty)$:

$$x^{(1)}(t) = 1, \quad x^{(2)}(t) = t, \qquad (142)$$

$$x^{(3)}(t) = t^2, \quad \ldots, \quad x^{(k)}(t) = t^{k-1}, \quad \ldots \qquad (143)$$

The first few Laguerre polynomials are as follows:

$$L_0(t) = 1; \quad L_1(t) = 1 - t; \quad L_2(t) = 1 - 2t + \frac{1}{2} t^2$$

$$L_3(t) = 1 - 3t + \frac{3}{2} t^2 - \frac{1}{6} t^3; \quad L_4(t) = 1 - 4t + 3t^2 - \frac{2}{3} t^3 + \frac{1}{24} t^4$$

5.5 Generalized Fourier Series

Consider an inner product space $X$ together with an orthonormal basis $\{\hat{z}^{(1)}, \hat{z}^{(2)}, \ldots, \hat{z}^{(i)}, \ldots\}$ for $X$. Now, any element $x \in X$ can be expressed as

$$x = \left\langle x, \hat{z}^{(1)} \right\rangle \hat{z}^{(1)} + \left\langle x, \hat{z}^{(2)} \right\rangle \hat{z}^{(2)} + \cdots + \left\langle x, \hat{z}^{(i)} \right\rangle \hat{z}^{(i)} + \cdots$$

This represents the Fourier expansion of $x$ in terms of the orthonormal basis. Some examples of Fourier series expansions are as follows:

- $X \equiv \mathbb{R}^3$: Expressing a vector as $v = x\,\mathbf{i} + y\,\mathbf{j} + z\,\mathbf{k}$ is the Fourier expansion of $v$.

- $X \equiv C[-\pi, \pi]$: Expressing a continuous function $f(t) \in X$ in terms of the orthonormal basis $\left\{ \dfrac{\cos(jt)}{\sqrt{\pi}}, \dfrac{\sin(jt)}{\sqrt{\pi}} : j = 0, 1, 2, \ldots \right\}$ is the Fourier expansion of $f(t)$.

- $X \equiv C[0, 1]$: Expressing a continuous function $f(t) \in X$ in terms of the shifted Legendre polynomials is the Fourier expansion of $f(t)$.
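As a concrete sketch of such an expansion, the snippet below computes the first few Fourier coefficients $c_i = \langle f, \hat{z}^{(i)} \rangle$ of an arbitrarily chosen $f(t) = t^2$ on $[-\pi, \pi]$ with respect to the orthonormal trigonometric basis, using a simple Riemann-sum quadrature.

```python
import numpy as np

f = lambda t: t**2                         # an arbitrary example function
t = np.linspace(-np.pi, np.pi, 400001)
dt = t[1] - t[0]

basis = [np.full_like(t, 1.0 / np.sqrt(2.0 * np.pi))]   # constant mode
for j in (1, 2, 3):
    basis += [np.cos(j * t) / np.sqrt(np.pi),
              np.sin(j * t) / np.sqrt(np.pi)]

coeffs = [np.sum(f(t) * z) * dt for z in basis]         # c_i = <f, z_i>
print(np.round(coeffs, 4))    # sine coefficients vanish since f is even
```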

5.6 Exercise
1. Compute the matrix that projects every point in the plane onto the line $x + 2y = 0$.

2. Find the projection $p = A\theta$ of $b$ on the two dimensional subspace spanned by the columns of the matrix $A$, where

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix}, \quad b = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$$

Verify that the vector $e = b - p$ is perpendicular to the columns of $A$.

3. Find the best straight line fit, $y = \theta_1 + \theta_2 t + e$, to the following measurements in the least squares sense and sketch your solution:

$$y = 2 \text{ at } t = -1; \quad y = 0 \text{ at } t = 0$$
$$y = -3 \text{ at } t = 1; \quad y = -5 \text{ at } t = 2$$

4. Suppose that instead of a straight line, we fit a parabola to the measurements in the previous exercise,

$$y = \theta_1 + \theta_2 t + \theta_3 t^2 + e$$

Find the least squares estimates of the parameters $(\theta_1, \theta_2, \theta_3)$.

5. It is desired to fit the heat capacity data for methylcyclohexane (C7H14) to a linear function of temperature,

$$C_p = \theta_1 + \theta_2 T + e$$

where $C_p$ is the heat capacity and $T$ is the absolute temperature. Determine the least squares estimates of the model parameters from the following data:

    cp (kJ/(kg K))   T (K)     cp (kJ/(kg K))   T (K)
    1.426            150       1.627            230
    1.469            170       1.661            240
    1.516            190       1.696            250
    1.567            210       1.732            260

6. Use all of the data given in the following table to fit the two-dimensional model for the diffusion coefficient D as a function of temperature (T) and weight fraction (X),

$$D = \theta_1 + \theta_2 T + \theta_3 X + e$$

such that the sum of the squares of the approximation errors is minimized, and estimate $\widehat{D}$ at T = 22, X = 0.36.

    T (deg C)            20     20     20     25     25     25     30     30     30
    X                    0.3    0.4    0.5    0.3    0.4    0.5    0.3    0.4    0.5
    D x 10^5 (cm^2/s)    0.823  0.639  0.43   0.973  0.751  0.506  1.032  0.824  0.561

7. Consider $X = \mathbb{R}^3$ with $\langle x, y \rangle_W = x^T W y$. Given a set of three linearly independent vectors in $\mathbb{R}^3$,

$$x^{(1)} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \quad
x^{(2)} = \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}, \quad
x^{(3)} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$$

construct an orthonormal set by applying the Gram-Schmidt procedure with

$$\langle x, y \rangle_W = x^T W y, \quad
W = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}$$

8. Gram-Schmidt procedure in C[a, b]: Let $X$ represent the set of continuous functions on the interval $0 \leq t \leq 1$ with the inner product defined as

$$\langle x(t), y(t) \rangle = \int_0^1 w(t) x(t) y(t) \, dt$$

Given a set of three linearly independent vectors,

$$x^{(1)}(t) = 1, \quad x^{(2)}(t) = t, \quad x^{(3)}(t) = t^2$$

find the orthonormal set of vectors if (a) $w(t) = 1$ (shifted Legendre polynomials) and (b) $w(t) = t(1 - t)$ (Jacobi polynomials).

9. Show that in C[a, b] with the maximum norm we cannot define an inner product $\langle x, y \rangle$ such that $\langle x, x \rangle^{1/2} = \|x\|_\infty$. In other words, show that in C[a, b] the function

$$\langle x(t), y(t) \rangle = \max_{t} |x(t) y(t)|$$

cannot define an inner product.

10. In $C^{(1)}[a, b]$, is

$$\langle x, y \rangle = \int_a^b x'(t) y'(t) \, dt + x(a) y(a)$$

an inner product?

11. Show that in $C^{(1)}[a, b]$ the integral

$$\langle x, y \rangle = \int_a^b w(t) x(t) y(t) \, dt$$

with $w(t) > 0$ defines an inner product.

12. Show that the parallelogram law holds in any inner product space:

$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$$

Does it hold in C[a, b] with the maximum norm?

13. Consider a real inner product space $(X, \langle \cdot, \cdot \rangle)$ together with the norm induced by the inner product, $\|x\| = \sqrt{\langle x, x \rangle}$. Show that the following identity (known as the polarization identity) holds:

$$\langle x, y \rangle = \frac{1}{4} \left( \|x + y\|^2 - \|x - y\|^2 \right)$$

14. Assume $x^{(1)}, x^{(2)}, \ldots, x^{(n)}$ are mutually orthogonal non-zero vectors in an inner product space. Show that they are linearly independent.

15. Suppose the set $\{\hat{z}^{(1)}, \hat{z}^{(2)}, \ldots, \hat{z}^{(n)}, \ldots\}$ represents an orthonormal basis of a real inner product space, $X$, i.e. given any vector $x \in X$, we can represent $x$ as

$$x = \sum_{i=1}^{\infty} x_i \hat{z}^{(i)}$$

Show that, for $x, y \in X$,

$$\langle x, y \rangle = \sum_{i=1}^{\infty} \left\langle x, \hat{z}^{(i)} \right\rangle \left\langle y, \hat{z}^{(i)} \right\rangle = \sum_{i=1}^{\infty} x_i y_i$$

Also, show that Parseval's equality holds, i.e.

$$\|x\|^2 = \sum_{i=1}^{\infty} \left\langle x, \hat{z}^{(i)} \right\rangle^2 = \sum_{i=1}^{\infty} |x_i|^2$$

for any $x \in X$.

Note: This is a very important result. As an illustration, consider the Hilbert space $L_2[-\pi, \pi]$ together with the orthonormal basis

$$\left\{ \hat{z}^{(1)}(t) = \frac{1}{\sqrt{2\pi}}, \; \hat{z}^{(2)}(t) = \frac{\cos(t)}{\sqrt{\pi}}, \; \hat{z}^{(3)}(t) = \frac{\sin(t)}{\sqrt{\pi}}, \; \ldots \right\}$$

Parseval's equality states that, for an arbitrary function $f(t) \in L_2[-\pi, \pi]$, if we find the Fourier coefficients

$$f(t) = \sum_{i=1}^{\infty} c_i \hat{z}^{(i)}, \quad c_i = \left\langle f(t), \hat{z}^{(i)} \right\rangle = \int_{-\pi}^{\pi} f(t) \hat{z}^{(i)}(t) \, dt$$

then

$$\|f(t)\|^2 = \int_{-\pi}^{\pi} |f(t)|^2 \, dt = \sum_{i=1}^{\infty} |c_i|^2$$

16. It can be shown that the sequence

$$\hat{z}^{(k)}(t) = \sqrt{2/\pi} \, \sin(kt)$$

forms an orthonormal basis in $L_2[0, \pi]$. It is desired to represent $f(t) = e^t$ in terms of this basis. Find the first two Fourier coefficients.

17. Consider the inner product space $L_2[0, \pi]$ with the inner product defined as

$$\langle f(t), g(t) \rangle = \int_0^{\pi} f(t) g(t) \, dt$$

The following two vectors are orthonormal in $L_2[0, \pi]$:

$$f(t) = \sqrt{2/\pi} \, \sin(2t) \quad \text{and} \quad g(t) = \sqrt{2/\pi} \, \sin(3t)$$

Show that they are linearly independent. Further, find the orthogonal projection, $p(t)$, of the vector $e^t$ on the subspace spanned by $f(t)$ and $g(t)$.

18. Suppose that, in the previous problem, we instead choose the two linearly independent functions

$$f(t) = 1 \quad \text{and} \quad g(t) = t^2$$

Find the orthogonal projection, $p(t)$, of the vector $e^t$ on the subspace spanned by $f(t)$ and $g(t)$.

6 Summary

In this module, some fundamental concepts from functional analysis have been reviewed. We began with the concept of a general vector space and defined algebraic and geometric structures such as the norm and the inner product, together with induced matrix norms, which play an important role in the analysis of numerical schemes. The inner product generalizes the concept of the dot product and, with it, the angle between vectors. We then interpreted the notion of orthogonality in a general inner product space and developed the Gram-Schmidt process, which can generate an orthonormal set from any linearly independent set. The definitions of the inner product and orthogonality paved the way to generalize the concept of projecting a vector onto any subspace of an inner product space.

7 Appendices
7.1 Appendix A: Computation of Matrix Norms
7.1.1 Computation of the 2-norm [8][5]

To begin with, we state an important result that is needed to compute the 2-norm of a matrix.

Theorem 50 Let $B \in \mathbb{R}^m \times \mathbb{R}^m$ represent a square symmetric matrix. Then, (a) $B$ has only real eigenvalues and orthogonal eigenvectors, and (b) $B$ is always diagonalizable as

$$B = \Psi \Lambda \Psi^T$$

where $\Psi$ is a unitary matrix, i.e. $\Psi^T \Psi = I$, whose columns are the eigenvectors of $B$, and $\Lambda$ is a diagonal matrix with the eigenvalues of $B$ appearing on the diagonal.

Now, consider the 2-norm of a matrix $A \in \mathbb{R}^m \times \mathbb{R}^n$, which is defined as follows

$$\|A\|_2 = \max_{x \neq 0} \frac{\|Ax\|_2}{\|x\|_2} \qquad (144)$$

Squaring both sides we have

$$\|A\|_2^2 = \max_{x \neq 0} \frac{(Ax)^T (Ax)}{x^T x} = \max_{x \neq 0} \frac{x^T B x}{x^T x}$$

where $B = A^T A$ is a symmetric matrix. Since $B$ is symmetric, we can diagonalize it as

$$B = \Psi \Lambda \Psi^T \qquad (145)$$

where $\Psi$ is a matrix with the eigenvectors of $B$ as columns and $\Lambda$ is the diagonal matrix with the eigenvalues of $B$ ($= A^T A$) on the main diagonal. Note that in this case $\Psi$ is a unitary matrix, i.e.,

$$\Psi^T \Psi = I, \quad \text{i.e.} \quad \Psi^T = \Psi^{-1} \qquad (146)$$

and the eigenvectors are orthogonal. Also, since

$$x^T B x = (Ax)^T (Ax) = \|Ax\|_2^2 \geq 0$$

$B$ is a positive semi-definite matrix and the eigenvalues, $\lambda_i$, of $A^T A$ are non-negative. Using the fact that $\Psi$ is unitary, we can write

$$x^T x = x^T \Psi \Psi^T x = y^T y \qquad (147)$$

where $y = \Psi^T x$. This implies that

$$\frac{x^T B x}{x^T x} = \frac{y^T \Lambda y}{y^T y} \qquad (148)$$

Suppose the eigenvalues, $\lambda_i$, of $A^T A$ are numbered such that

$$0 \leq \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n \qquad (149)$$

Then, we have

$$\frac{y^T \Lambda y}{y^T y} = \frac{\lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2}{y_1^2 + y_2^2 + \cdots + y_n^2} \leq \lambda_n \qquad (150)$$

which implies that

$$\frac{y^T \Lambda y}{y^T y} = \frac{x^T B x}{x^T x} = \frac{x^T (A^T A) x}{x^T x} \leq \lambda_n \qquad (151)$$

The equality holds at the corresponding eigenvector $v^{(n)}$ of $A^T A$, i.e.,

$$\frac{(v^{(n)})^T (A^T A) v^{(n)}}{(v^{(n)})^T v^{(n)}} = \frac{(v^{(n)})^T \lambda_n v^{(n)}}{(v^{(n)})^T v^{(n)}} = \lambda_n \qquad (152)$$

Thus, the 2-norm of matrix $A$ can be computed as follows

$$\|A\|_2^2 = \left[ \max_{x \neq 0} \frac{\|Ax\|_2}{\|x\|_2} \right]^2 = \lambda_{\max}(A^T A) \qquad (153)$$

i.e.

$$\|A\|_2 = \left[ \lambda_{\max}(A^T A) \right]^{1/2} \qquad (154)$$

where $\lambda_{\max}(A^T A)$ denotes the maximum magnitude eigenvalue, i.e. the spectral radius, of $A^T A$.
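Equation (154) can be checked numerically; the sketch below compares the eigenvalue formula with a direct library norm computation for an arbitrary matrix.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])                 # arbitrary example

lam = np.linalg.eigvalsh(A.T @ A)          # eigenvalues of A^T A (all >= 0)
print(np.sqrt(lam.max()))                  # [lambda_max(A^T A)]^(1/2), eq. (154)
print(np.linalg.norm(A, 2))                # library 2-norm: same value
```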

7.1.2 Computation of the 1-Norm [5]

Consider the 1-norm of a matrix $A \in \mathbb{R}^m \times \mathbb{R}^n$, which is defined as follows

$$\|A\|_1 = \max_{x \neq 0} \frac{\|Ax\|_1}{\|x\|_1} = \max_{\|\hat{x}\|_1 = 1} \|A\hat{x}\|_1 \qquad (155)$$

Now, the $i$'th component of the vector $A\hat{x}$ can be written as

$$(A\hat{x})_i = \sum_{j=1}^{n} a_{ij} \hat{x}_j$$

and

$$\|A\hat{x}\|_1 = \sum_{i=1}^{m} \left| \sum_{j=1}^{n} a_{ij} \hat{x}_j \right| \leq \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}| \, |\hat{x}_j| = \sum_{j=1}^{n} \left[ \sum_{i=1}^{m} |a_{ij}| \right] |\hat{x}_j|$$

Define

$$C_j = \sum_{i=1}^{m} |a_{ij}|$$

and let

$$C_{\max} = \max_{1 \leq j \leq n} C_j$$

Then, since $\sum_{j=1}^{n} |\hat{x}_j| = \|\hat{x}\|_1 = 1$, it follows that

$$\|A\hat{x}\|_1 \leq \sum_{j=1}^{n} C_j |\hat{x}_j| \leq C_{\max} \sum_{j=1}^{n} |\hat{x}_j| = C_{\max}$$

Suppose $C_j = C_{\max}$ for $j = k$. Then, we can choose a vector $\hat{x}$ as follows:

$$\hat{x}_k = 1 \quad \text{and} \quad \hat{x}_j = 0 \text{ when } j \neq k$$

For this choice of $\hat{x}$, we have

$$\|A\hat{x}\|_1 = C_{\max} = \sum_{i=1}^{m} |a_{ik}|$$

and thus

$$\|A\|_1 = C_{\max} = \max_{1 \leq j \leq n} \left[ \sum_{i=1}^{m} |a_{ij}| \right]$$

i.e. $\|A\|_1$ equals the maximum column sum of the $|a_{ij}|$.

7.1.3 Computation of the ∞-Norm of a Matrix [5]

Consider the ∞-norm of a matrix $A \in \mathbb{R}^m \times \mathbb{R}^n$, which is defined as follows

$$\|A\|_\infty = \max_{x \neq 0} \frac{\|Ax\|_\infty}{\|x\|_\infty} = \max_{\|\hat{x}\|_\infty = 1} \|A\hat{x}\|_\infty \qquad (156)$$

Let us define $v = A\hat{x}$. Now, using the definition of the ∞-norm of a vector, we have

$$\|A\hat{x}\|_\infty = \|v\|_\infty = \max_{i} |v_i| = \max_{i} \left| \sum_{j=1}^{n} a_{ij} \hat{x}_j \right| \leq \max_{i} \sum_{j=1}^{n} |a_{ij}| \, |\hat{x}_j|$$

Since $\|\hat{x}\|_\infty = 1$, it follows that $|\hat{x}_j| \leq 1$ and

$$\|A\hat{x}\|_\infty \leq \max_{i} \sum_{j=1}^{n} |a_{ij}| \, |\hat{x}_j| \leq \max_{i} \sum_{j=1}^{n} |a_{ij}|$$

Suppose that the maximum occurs for $i = k$. Then, choosing $\hat{x}$ such that

$$\hat{x}_j = \text{sign}(a_{kj})$$

we obtain the equality, i.e.

$$\|A\|_\infty = \max_{i} \sum_{j=1}^{n} |a_{ij}|$$

Thus, $\|A\|_\infty$ equals the maximum row sum of the $|a_{ij}|$.
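Both results reduce to one-line column or row sums; the following is a quick numerical check against the library implementations.

```python
import numpy as np

A = np.array([[ 1.0, -2.0,  3.0],
              [-4.0,  5.0, -6.0]])         # arbitrary example

print(np.abs(A).sum(axis=0).max())         # maximum column sum
print(np.linalg.norm(A, 1))                # library 1-norm: same value

print(np.abs(A).sum(axis=1).max())         # maximum row sum
print(np.linalg.norm(A, np.inf))           # library infinity-norm: same value
```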

7.2 Appendix B: Necessary and Sufficient Conditions for Unconstrained Optimality

7.2.1 Preliminaries

Consider a real valued scalar function $\varphi(x) : \mathbb{R}^n \rightarrow \mathbb{R}$ defined for any $x \in \mathbb{R}^n$.

Definition 51 (Global Minimum): If there exists a point $x^* \in \mathbb{R}^n$ such that $\varphi(x^*) < \varphi(x)$ for any $x \in \mathbb{R}^n$, then $x^*$ is called the global minimum of $\varphi(x)$.

Definition 52 ($\varepsilon$-neighborhood): An $\varepsilon$-neighborhood of a point $\overline{x}$ is defined as the set $N(\overline{x}, \varepsilon) = \{x : \|x - \overline{x}\| \leq \varepsilon\}$.

Definition 53 (Local Minimum): If there exists an $\varepsilon$-neighborhood $N(\overline{x}, \varepsilon)$ around $\overline{x}$ such that $\varphi(\overline{x}) < \varphi(x)$ for each $x \in N(\overline{x}, \varepsilon)$ with $x \neq \overline{x}$, then $\overline{x}$ is called a local minimum.

Before we prove the necessary and sufficient conditions for optimality, we review some relevant definitions from linear algebra.

Definition 54 (Positive Definite Matrix): An $n \times n$ matrix $A$ is called positive definite if for every $x \in \mathbb{R}^n$

$$x^T A x > 0 \qquad (157)$$

whenever $x \neq 0$.

Definition 55 (Positive Semi-definite Matrix): An $n \times n$ matrix $A$ is called positive semi-definite if for every $x \in \mathbb{R}^n$ we have

$$x^T A x \geq 0 \qquad (158)$$

Definition 56 (Negative Definite Matrix): An $n \times n$ matrix $A$ is called negative definite if for every $x \in \mathbb{R}^n$

$$x^T A x < 0 \qquad (159)$$

whenever $x \neq 0$.

Definition 57 (Negative Semi-definite Matrix): An $n \times n$ matrix $A$ is called negative semi-definite if for every $x \in \mathbb{R}^n$ we have

$$x^T A x \leq 0 \qquad (160)$$
7.2.2 Necessary Condition for Optimality

The necessary condition for optimality, which can be used to establish whether a given point is a stationary (maximum or minimum) point, is given by the following theorem.

Theorem 58 If $\varphi(x)$ is continuous and differentiable and has an extreme (or stationary) point (i.e., a maximum or a minimum) at $x = \overline{x}$, then

$$\nabla \varphi(\overline{x}) = \left[ \frac{\partial \varphi}{\partial x_1} \;\; \frac{\partial \varphi}{\partial x_2} \;\; \cdots \;\; \frac{\partial \varphi}{\partial x_n} \right]^T_{x = \overline{x}} = 0 \qquad (161)$$

Proof: Suppose $x = \overline{x}$ is a minimum point and one of the partial derivatives, say the $k$'th one, does not vanish at $x = \overline{x}$. Then, by Taylor's theorem,

$$\varphi(\overline{x} + \Delta x) = \varphi(\overline{x}) + \sum_{i=1}^{n} \frac{\partial \varphi}{\partial x_i}(\overline{x}) \, \Delta x_i + R_2(\overline{x}, \Delta x) \qquad (162)$$

Choosing $\Delta x_i = 0$ for $i \neq k$, we have

$$\varphi(\overline{x} + \Delta x) - \varphi(\overline{x}) = \frac{\partial \varphi}{\partial x_k}(\overline{x}) \, \Delta x_k + R_2(\overline{x}, \Delta x) \qquad (163)$$

Since $R_2(\overline{x}, \Delta x)$ is of order $(\Delta x_i)^2$, the terms of order $\Delta x_i$ will dominate over the higher order terms for sufficiently small $\Delta x$. Thus, the sign of $\varphi(\overline{x} + \Delta x) - \varphi(\overline{x})$ is decided by the sign of

$$\frac{\partial \varphi}{\partial x_k}(\overline{x}) \, \Delta x_k$$

Suppose

$$\frac{\partial \varphi}{\partial x_k}(\overline{x}) > 0 \qquad (164)$$

Then, choosing $\Delta x_k < 0$ implies

$$\varphi(\overline{x} + \Delta x) - \varphi(\overline{x}) < 0 \;\Rightarrow\; \varphi(\overline{x} + \Delta x) < \varphi(\overline{x}) \qquad (165)$$

and $\varphi(x)$ can be further reduced by reducing $\Delta x_k$. This contradicts the assumption that $x = \overline{x}$ is a minimum point. Similarly, if

$$\frac{\partial \varphi}{\partial x_k}(\overline{x}) < 0 \qquad (166)$$

then choosing $\Delta x_k > 0$ implies

$$\varphi(\overline{x} + \Delta x) - \varphi(\overline{x}) < 0 \;\Rightarrow\; \varphi(\overline{x} + \Delta x) < \varphi(\overline{x}) \qquad (167)$$

and $\varphi(x)$ can again be further reduced. This contradicts the assumption that $x = \overline{x}$ is a minimum point. Thus, $x = \overline{x}$ can be a minimum of $\varphi(x)$ only if

$$\frac{\partial \varphi}{\partial x_k}(\overline{x}) = 0 \quad \text{for } k = 1, 2, \ldots, n \qquad (168)$$

Similar arguments can be made if $x = \overline{x}$ is a maximum of $\varphi(x)$.

7.2.3 Sufficient Condition for Optimality

The sufficient condition for optimality, which can be used to establish whether a stationary point is a maximum or a minimum, is given by the following theorem.

Theorem 59 A sufficient condition for a stationary point $x = \overline{x}$ to be an extreme point (i.e., a maximum or a minimum) is that the matrix $\left[ \dfrac{\partial^2 \varphi}{\partial x_i \partial x_j} \right]$ (the Hessian of $\varphi$) evaluated at $x = \overline{x}$ is

1. positive definite when $x = \overline{x}$ is a minimum

2. negative definite when $x = \overline{x}$ is a maximum

Proof: Using the Taylor series expansion, we have

$$\varphi(\overline{x} + \Delta x) = \varphi(\overline{x}) + \sum_{i=1}^{n} \frac{\partial \varphi}{\partial x_i}(\overline{x}) \, \Delta x_i + \frac{1}{2!} \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{\partial^2 \varphi(\overline{x} + \lambda \Delta x)}{\partial x_i \partial x_j} \, \Delta x_i \Delta x_j \quad (0 < \lambda < 1) \qquad (169)$$

Since $x = \overline{x}$ is a stationary point we have

$$\nabla \varphi(\overline{x}) = 0 \qquad (170)$$

Thus, the above equation reduces to

$$\varphi(\overline{x} + \Delta x) - \varphi(\overline{x}) = \frac{1}{2!} \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{\partial^2 \varphi(\overline{x} + \lambda \Delta x)}{\partial x_i \partial x_j} \, \Delta x_i \Delta x_j \quad (0 < \lambda < 1) \qquad (171)$$

This implies that the sign of $\varphi(\overline{x} + \Delta x) - \varphi(\overline{x})$ at the extreme point $\overline{x}$ is the same as the sign of the R.H.S. Since the second partial derivatives $\dfrac{\partial^2 \varphi}{\partial x_i \partial x_j}$ are continuous in the neighborhood of $x = \overline{x}$, their values at $x = \overline{x} + \lambda \Delta x$ have the same signs as their values at $x = \overline{x}$ for all sufficiently small $\Delta x$. If the quantity

$$\sum_{i=1}^{n} \sum_{j=1}^{n} \frac{\partial^2 \varphi(\overline{x} + \lambda \Delta x)}{\partial x_i \partial x_j} \, \Delta x_i \Delta x_j \simeq (\Delta x)^T \left[ \nabla^2 \varphi(\overline{x}) \right] \Delta x \geq 0 \qquad (172)$$

for all $\Delta x$, then $x = \overline{x}$ is a local minimum. In other words, if the Hessian matrix $\left[ \nabla^2 \varphi(\overline{x}) \right]$ is positive semi-definite, then $x = \overline{x}$ is a local minimum. If the quantity

$$\sum_{i=1}^{n} \sum_{j=1}^{n} \frac{\partial^2 \varphi(\overline{x} + \lambda \Delta x)}{\partial x_i \partial x_j} \, \Delta x_i \Delta x_j \simeq (\Delta x)^T \left[ \nabla^2 \varphi(\overline{x}) \right] \Delta x \leq 0 \qquad (173)$$

for all $\Delta x$, then $x = \overline{x}$ is a local maximum. In other words, if the Hessian matrix $\left[ \nabla^2 \varphi(\overline{x}) \right]$ is negative semi-definite, then $x = \overline{x}$ is a local maximum.

It may be noted that the need to define positive definite or negative definite matrices arises naturally from geometric considerations while qualifying a stationary point in multi-dimensional optimization problems. Whether a symmetric matrix is positive (semi-)definite, negative (semi-)definite or indefinite can be established using algebraic conditions, such as the signs of the eigenvalues of the matrix. If the eigenvalues of the matrix are all real and non-negative (i.e. $\lambda_i \geq 0$ for all $i$), then the matrix is positive semi-definite. If the eigenvalues of the matrix are all real and non-positive (i.e. $\lambda_i \leq 0$ for all $i$), then the matrix is negative semi-definite. When the eigenvalues have mixed signs, the matrix is indefinite.
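For a differentiable $\varphi$ these conditions are easy to check numerically. The sketch below uses a quadratic $\varphi(x) = \frac{1}{2} x^T Q x - c^T x$, whose gradient is $Qx - c$ (so the stationary point solves $Qx = c$) and whose Hessian is $Q$; the particular $Q$ and $c$ are arbitrary choices.

```python
import numpy as np

Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])              # symmetric Hessian of the quadratic
c = np.array([1.0, 2.0])

x_bar = np.linalg.solve(Q, c)           # stationary point: grad = Q x - c = 0
print(Q @ x_bar - c)                    # ~0, necessary condition, eq. (161)
print(np.linalg.eigvalsh(Q))            # all eigenvalues > 0 => Hessian positive
                                        # definite, so x_bar is a minimum (Thm. 59)
```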

References

[1] Kreyszig, E.; Introductory Functional Analysis with Applications, Wiley, New York, 1978.

[2] Limaye, B. V.; Functional Analysis (3rd Ed.), New Age International, New Delhi, 2014.

[3] Linz, P.; Theoretical Numerical Analysis, Dover, New York, 1979.

[4] Luenberger, D. G.; Optimization by Vector Space Methods, Wiley, New York, 1969.

[5] Phillips, G. M. and P. J. Taylor; Theory and Applications of Numerical Analysis, Academic Press, 1996.

[6] Pushpavanam, S.; Mathematical Methods in Chemical Engineering, Prentice Hall of India, New Delhi, 1998.

[7] Ramkrishna, D. and N. R. Amundson; Linear Operator Methods in Chemical Engineering with Applications to Transport and Chemical Reaction Systems, Prentice Hall, 1985.

[8] Strang, G.; Linear Algebra and Its Applications, Harcourt Brace Jovanovich, New York, 1988.

[9] Strang, G.; Introduction to Applied Mathematics, Wellesley-Cambridge Press, MA, 1986.