Contents
1 Introduction
3 Transformations
4 Magnitudes
  4.1 Normed Linear Spaces
  4.2 Induced Matrix Norms
  4.3 Convergence and Banach Spaces
  4.4 Exercises
5 Angles
  5.1 Inner Product Spaces
  5.2 Orthogonal Projections
  5.3 Orthogonal Projections in Rⁿ
  5.4 Gram-Schmidt Process and Orthogonal Polynomials
  5.5 Generalized Fourier Series
  5.6 Exercises
6 Summary
7 Appendices
  7.1 Appendix A: Computation of Matrix Norms
    7.1.1 Computation of the 2-Norm [8][5]
    7.1.2 Computation of the 1-Norm [5]
    7.1.3 Computation of the Infinity-Norm of a Matrix [5]
  7.2 Appendix B: Necessary and Sufficient Conditions for Unconstrained Optimality
    7.2.1 Preliminaries
    7.2.2 Necessary Condition for Optimality
    7.2.3 Sufficient Condition for Optimality
1 Introduction
In the previous module, we derived different abstract equation forms, namely linear algebraic equations, nonlinear algebraic equations, DAEs, ODE-IVPs, ODE-BVPs and PDEs. Are these equation forms fundamentally different from each other? To see the unity among the apparent diversity, we need to acquaint ourselves with the generalized concept of a vector space.
When we begin to use the concept of vectors for formulating mathematical models of physical systems, we start with the concept of a vector in three dimensional coordinate space. From the mathematical viewpoint, the three dimensional space can be looked upon as a set of objects, called vectors, which satisfy certain generic properties. While working with mathematical modeling, we need to deal with a variety of such sets containing different types of objects. It is possible to distill the essential properties satisfied by all the vectors in the three dimensional vector space and develop a more general concept of a vector space, which is a set of objects that satisfy these generic properties. Such a generalization can provide a unified view of problem formulations and solution techniques. Generalizing the concept of the vector and of the three dimensional vector space to an arbitrary set is not sufficient by itself. To work with these sets of generalized vectors, we also need to generalize various algebraic and geometric concepts, such as the magnitude of a vector, convergence of a sequence of vectors, limits, the angle between two vectors, orthogonality, etc. Understanding the fundamentals of vector spaces also helps in developing a unified view of many seemingly different numerical schemes. In this module, the fundamentals of vector spaces are briefly introduced. A more detailed treatment of these topics can be found in Luenberger [4] and Kreyszig [1].
A word of advice before we begin to study these grand generalizations. While dealing with the generalization of geometric notions in three dimensions to more general vector spaces, it is difficult to visualize vectors and surfaces as we can in the three dimensional vector space. However, if you understand the geometrical concepts in three dimensional space well, then you can develop a good understanding of the corresponding concepts in any general vector space. In short, it is enough to know your school geometry well. We are only building qualitatively similar structures on the other sets of interest.
vectors, say u and v in R³,
    u = i − j + 2k and v = i + j − k
The vector u + v forms one of the diagonals of the parallelogram with sides u and v.
In fact, if u and v happen to represent two independent directions in a plane, then any other vector in the plane can be reached as a linear combination of these two vectors. For example, any vector lying in the x-y plane can be reached as a linear combination of the following two independent directions:
    u = [1, 1, 0]ᵀ and v = [1, −1, 0]ᵀ
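As a small numerical aside (not part of the original notes), the claim that any vector in the plane is reachable as a linear combination of two independent directions can be checked with NumPy. The directions u and v below follow the example above, with the sign lost in extraction read as a minus:

```python
import numpy as np

# Two independent directions in the x-y plane (v's minus sign is a
# reconstruction of the garbled original).
u = np.array([1.0, 1.0, 0.0])
v = np.array([1.0, -1.0, 0.0])

# An arbitrary target vector in the x-y plane.
target = np.array([3.0, -7.0, 0.0])

# Solve [u v] [alpha, beta]^T = target for the combination coefficients.
coeffs, *_ = np.linalg.lstsq(np.column_stack([u, v]), target, rcond=None)
alpha, beta = coeffs

# The parallelogram law recovers the target exactly.
assert np.allclose(alpha * u + beta * v, target)
```

Restricting alpha and beta to integers, as the text notes, would make most points of the plane unreachable.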
Note that the scalars here are chosen as the set of real numbers with a purpose. If we happen to restrict the scalars to some other set, say the set of integers, then we will not be able to reach an arbitrary point in the plane through the use of the parallelogram law. What, then, is so special about the law of vector addition and the set of real numbers as the choice of scalars? To get insight into this, we need to understand the concepts of closure and field.
Definition 1 (Closure): A set is said to be closed under an operation when any two elements of the set subjected to the operation yield a third element belonging to the same set.
Definition 2 (Field): A field is a set of elements closed under addition, subtraction, multiplication and division.
Note that the set of integers is closed under addition, multiplication and subtraction. However, this set is not closed under division. On the other hand, the set of real numbers (R) and the set of complex numbers (C) are closed under addition, subtraction, multiplication and division. Moreover, the set of real numbers (R) and the set of complex numbers (C) are scalar fields, whereas the set of integers is not a field. Also, it is easy to see that closure under vector addition and scalar multiplication are the two fundamental properties of vectors in R³. These properties form the basis for generalizing the concept of a vector space. A vector space is defined as a nonempty set of elements which is closed under addition and scalar multiplication. Thus, associated with every vector space is a set of scalars F (also called the scalar field or coefficient field) used to define scalar multiplication on the space. In functional analysis, the scalars will always be taken to be the set of real numbers (R) or complex numbers (C).
Definition 3 (Vector Space): A vector space X is a set of elements called vectors and an associated scalar field F, together with two operations. The first operation is called addition, which associates with any two vectors x, y ∈ X a vector x + y ∈ X, the sum of x and y. The second operation is called scalar multiplication, which associates with any vector x ∈ X and any scalar α ∈ F a vector αx (a scalar multiple of x by α).
Thus, when X is a linear vector space, given any vectors x, y ∈ X and any scalars α, β ∈ R, the element αx + βy ∈ X. This implies that the well known parallelogram law in three dimensions also holds true in any vector space.
Given a vector space X and a scalar field F, it is easy to show that the following properties hold for any x, y, z ∈ X and any scalars α, β ∈ F:
1. Commutative law: x + y = y + x
2. Associative law: x + (y + z) = (x + y) + z
Let us examine some examples of vector spaces. To begin with, consider the set X consisting of all n-tuples of the form
    x = [x₁ x₂ ... xₙ]ᵀ    (1)
where xᵢ ∈ R, together with F ≡ R. Given any two elements from this set, say x, y ∈ X, and arbitrary scalars α, β ∈ R, it is easy to see that the n-tuple αx + βy is also contained in X. The set considered here is the n dimensional real coordinate space, i.e. X ≡ Rⁿ. Suppose, in the previous example, we instead consider a set of n-tuples where xᵢ ∈ C, together with F ≡ C. It is straightforward to show that, for any x, y ∈ Cⁿ and arbitrary scalars α, β ∈ C, the n-tuple αx + βy is also contained in Cⁿ. Note that the choice of scalar field is critical while defining a vector space. For example, consider (X ≡ Rⁿ, F ≡ C). Since for any x ∈ X and any α ∈ C the vector αx ∉ Rⁿ, this combination of set X and scalar field F does not form a vector space.
The generalized definition of a vector space permits us to view many interesting sets, other than Rⁿ and Cⁿ, as vector spaces. For example, consider the set of all m × n matrices with real elements together with the scalar field R, i.e. X ≡ Rᵐ × Rⁿ, F ≡ R. It is easy to see that, if A, B ∈ X, then αA + βB ∈ X for arbitrary α, β ∈ R, and thus X is a vector space. Note that a "vector" in this space is an m × n matrix and the null vector corresponds to the m × n null matrix. Another example of a vector space is the set of all infinite sequences of real numbers, denoted as X ≡ l∞, together with F ≡ R. A typical vector x of this space has the form
    x = (x₁, x₂, ..., x_k, ...)
Given any x, y ∈ l∞ and arbitrary scalars α, β ∈ R, the linear combination αx + βy is also contained in l∞. Thus, the combination (X ≡ l∞, F ≡ R) also qualifies as a vector space. Note that a single vector in this set has infinitely many elements, which is qualitatively different from the 2-, 3- or n-tuples that we have considered so far.
Are there other sets in which a single vector consists of infinitely many elements? There are plenty of them which we encounter in many engineering problems. Consider the set of all real valued continuous functions over an interval [a, b], denoted as C[a, b], together with F ≡ R. We write x = y if x(t) = y(t) for all t ∈ [a, b]. The null vector 0 in this space is the function which is zero everywhere on [a, b], i.e.,
    0(t) = 0 for all t ∈ [a, b]
If x(t) and y(t) are vectors from this space and α is a real scalar, then (x + y)(t) = x(t) + y(t) and (αx)(t) = αx(t) are also elements of C[a, b]. Thus, the pair X ≡ C[a, b], F ≡ R forms a vector space. Other examples of such sets that qualify as vector spaces are (a) the collection of all continuous and n times differentiable functions over an interval [a, b], i.e. X ≡ C^(n)[a, b], together with F ≡ R, and (b) the set of all functions {f(t) : t ∈ [a, b]} for which
    ∫_a^b |f(t)|ᵖ dt < ∞
The individual elements in a set of vectors are indexed using a superscript (k). Now, if X = Rⁿ and x^(k) ∈ Rⁿ represents the k-th vector in the set, then it is a vector with n components, which are represented as follows
    x^(k) = [x₁^(k) x₂^(k) ... xₙ^(k)]ᵀ    (3)
Similarly, if X = l∞ and x^(k) ∈ l∞ represents the k-th vector in the set, then x^(k) represents an infinite sequence with elements denoted as follows
    x^(k) = [x₁^(k) ... xᵢ^(k) ...]ᵀ    (4)
In three dimensions, we often work with subsets like lines or planes. For example, consider the set S of all vectors
    x = α₁ x^(1) + α₂ x^(2)
where x^(1), x^(2) ∈ R³ are two fixed linearly independent vectors and α₁, α₂ ∈ R are arbitrary scalars. In other words, S represents the set of all possible linear combinations of {x^(1), x^(2)}. This set defines a plane passing through the origin in R³. It is interesting to note that the vectors belonging to this set obey the properties of a vector space, i.e. it is easy to show that αx + βy ∈ S for any x, y ∈ S and any scalars α, β ∈ R.
On the other hand, consider the set S₁, which is the collection of all vectors of the form x = α₁x^(1) + α₂x^(2) + b, where b is a fixed non-zero vector. Given two elements x = α₁x^(1) + α₂x^(2) + b and y = β₁x^(1) + β₂x^(2) + b from S₁, the vector
    x + y = (α₁ + β₁) x^(1) + (α₂ + β₂) x^(2) + 2b ∉ S₁
Also, the vector αx = (αα₁x^(1) + αα₂x^(2)) + αb ∉ S₁ for an arbitrary α ≠ 1. Thus, the lines or planes passing through the origin differ from the rest and belong to a special class of subsets of R³. Any two dimensional plane passing through the origin of R³ is a sub-space of R³. The origin must be included in the set for it to qualify as a sub-space. Note that a plane which does not pass through the origin is not a sub-space. The concept of a sub-space can be generalized as follows.
Thus, the fundamental property of objects (elements) in a vector space is that they can be constructed by simply adding other elements in the space. This property is formally defined as follows.
Definition 6 (Linear Combination): A linear combination of vectors x^(1), x^(2), ..., x^(m) in a vector space is of the form α₁x^(1) + α₂x^(2) + ... + α_m x^(m), where (α₁, ..., α_m) are scalars.
Definition 7 (Span of a Set of Vectors): Let S be a subset of a vector space X. The set generated by all possible linear combinations of elements of S is called the span of S and is denoted as [S]. The span of S is a subspace of X.
Definition 8 (Linear Dependence): A vector x is said to be linearly dependent upon a set S of vectors if x can be expressed as a linear combination of vectors from S. Alternatively, x is linearly dependent upon S if x belongs to the span of S, i.e. x ∈ [S]. A vector is said to be linearly independent of the set S if it is not linearly dependent on S. A necessary and sufficient condition for the set of vectors x^(1), x^(2), ..., x^(m) to be linearly independent is that the expression
    Σ_{i=1}^{m} αᵢ x^(i) = 0    (8)
holds only when αᵢ = 0 for every i = 1, 2, ..., m.
Example 9 Show that the functions 1, exp(t), exp(2t), exp(3t) are linearly independent over any interval [a, b].
Let us assume that the vectors (1, e^t, e^{2t}, e^{3t}) are linearly dependent, i.e. there are constants (α, β, γ, δ), not all equal to zero, such that
    α + β e^t + γ e^{2t} + δ e^{3t} = 0    (9)
Taking the derivative on both sides, the above equality implies
    β e^t + 2γ e^{2t} + 3δ e^{3t} = 0
Since e^t > 0 holds for all t ∈ [a, b], dividing by e^t yields
    β + 2γ e^t + 3δ e^{2t} = 0    (10)
Taking the derivative on both sides and dividing by e^t once more, the above equality implies
    2γ + 6δ e^t = 0    (11)
Differentiating once again gives 6δ e^t = 0, which is absurd unless δ = 0. Thus, equality (11) holds only for γ = δ = 0. With γ = δ = 0, equality (10) holds only when β = 0, and equality (9) then holds only when α = 0. Thus, the vectors (1, e^t, e^{2t}, e^{3t}) are linearly independent.
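The conclusion of Example 9 can also be supported numerically (this check is not part of the original notes): sample the four functions on an interval and verify that the Gram matrix of their approximate L₂ inner products is nonsingular, which can only happen for a linearly independent set:

```python
import numpy as np

# Sample 1, e^t, e^2t, e^3t on [0, 1].
t = np.linspace(0.0, 1.0, 2001)
F = np.stack([np.ones_like(t), np.exp(t), np.exp(2 * t), np.exp(3 * t)])

# Gram matrix G_ij = <f_i, f_j>, approximated by a rectangle rule.
dt = t[1] - t[0]
G = (F * dt) @ F.T

# The Gram determinant of a linearly independent set is strictly positive.
assert np.linalg.det(G) > 0
```

A dependent set (e.g. replacing e^{3t} by 2e^t) would make the Gram determinant vanish up to rounding error.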
A vector space having a finite basis (i.e. spanned by a set of vectors with a finite number of elements) is said to be finite dimensional. All other vector spaces are said to be infinite dimensional. We characterize a finite dimensional space by the number of elements in a basis. Any two bases for a finite dimensional vector space contain the same number of elements.
It is easy to see that the matrix A has rank equal to one and the columns (and rows) are linearly dependent. Thus, it is possible to obtain non-zero solutions to the above equation, which can be re-written as follows
    [1, 1, 2]ᵀ x₁ + [2, 2, 4]ᵀ x₂ + [4, 4, 8]ᵀ x₃ = [0, 0, 0]ᵀ
In fact, x^(1) and x^(2) are linearly independent, and any linear combination of these two vectors, i.e.
    x = α x^(1) + β x^(2)
for any scalars (α, β) ∈ R satisfies Ax = 0. Thus, the solutions can be represented by the set M ≡ span{x^(1), x^(2)}, which forms a two dimensional subspace of R³.
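A quick numerical check of this example (not in the original text), assuming the rank-one matrix A = [1 2 4; 1 2 4; 2 4 8] implied by the column decomposition above:

```python
import numpy as np

# The rank-one matrix reconstructed from the column form in the text.
A = np.array([[1.0, 2.0, 4.0],
              [1.0, 2.0, 4.0],
              [2.0, 4.0, 8.0]])

# rank(A) = 1, so the solution set of Ax = 0 is two dimensional.
assert np.linalg.matrix_rank(A) == 1

# An orthonormal basis for the null space from the SVD: the right singular
# vectors belonging to the (numerically) zero singular values span {x : Ax = 0}.
U, s, Vt = np.linalg.svd(A)
null_basis = Vt[1:]          # last two rows, since rank(A) = 1

for x in null_basis:
    assert np.allclose(A @ x, 0.0)
```

Any linear combination of the two null-space basis vectors is again a solution, illustrating that the solution set is a subspace.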
2. Let S = {v^(1), v^(2)}, where
    v^(1) = [1, 2, 3, 4, 5]ᵀ, v^(2) = [5, 4, 3, 2, 1]ᵀ    (12)
3. Consider the set of n-th order polynomials on the interval [0, 1]. A possible basis for this space is
    p^(1)(z) = 1, p^(2)(z) = z, p^(3)(z) = z², ..., p^(n+1)(z) = zⁿ    (13)
Any vector p(z) from this space can be expressed as the linear combination p(z) = α₁ p^(1)(z) + α₂ p^(2)(z) + ... + α_{n+1} p^(n+1)(z).
defined on the interval [a, b]. The span of the set S forms a 4 dimensional sub-space of C[a, b].
6. Consider the space X of all n × n real valued matrices. The set of all symmetric real valued n × n matrices, say S₁, is a subspace of X. This follows from the fact that the matrix αA + βB is a real valued symmetric matrix for arbitrary scalars α, β ∈ R when A, B ∈ S₁. Similarly, the set of all skew-symmetric real valued n × n matrices, say S₂, is also a subspace of X. This follows from the fact that for any A, B ∈ S₂ and for arbitrary scalars α, β ∈ R
    (αA + βB)ᵀ = −(αA + βB) ∈ S₂
On the other hand, the set of all positive definite real valued n × n matrices, say S₃, is not a subspace of X. This is because, if A ∈ S₃, then −A ∉ S₃.
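These closure arguments can be illustrated numerically (an addition to the notes, using NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two random skew-symmetric matrices in S2: M - M^T is always skew-symmetric.
M, N = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
A, B = M - M.T, N - N.T

# Closure of S2: an arbitrary linear combination stays skew-symmetric.
alpha, beta = 2.5, -1.3
C = alpha * A + beta * B
assert np.allclose(C.T, -C)

# S3 (positive definite matrices) is not a subspace: negating a positive
# definite matrix flips the sign of every eigenvalue.
P = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # eigenvalues 1 and 3, both > 0
assert np.all(np.linalg.eigvalsh(P) > 0)
assert not np.all(np.linalg.eigvalsh(-P) > 0)
```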
2.3 Exercises
1. Decide the linear dependence or independence of
3. While solving problems using a digital computer, arithmetic operations can be performed only with a limited precision due to finite word lengths. Consider the vector space X ≡ R and discuss which of the laws of algebra (associative, distributive, commutative) are not satisfied by floating point arithmetic in a digital computer.
4. Consider the space X of all n × n matrices. Find a basis for this vector space and show that the set of all lower triangular n × n matrices forms a subspace of X.
5. Consider a set X consisting of all real valued 2 × 2 matrices. Find a basis for this vector space. Also, consider the subspace S of X consisting of all real valued symmetric 2 × 2 matrices. Find a basis for S. What are the dimensions of X and S?
7. Give an example of a function which is in L₁[0, 1] but not in L₂[0, 1].
8. Show that the polynomials p₁(t) = 1, p₂(t) = t and p₃(t) = t² are linearly independent over the interval [0, 1].
Show that the set of solutions of the differential equation
    d²x/dt² + x = 0
is a linear space. What is the dimension of this space?
3 Transformations
Using the generalized concepts of vectors and vector spaces, we can look at the mathematical
models in engineering as transformations, which map a subset of vectors from one vector
space to a subset in another space.
The set of all elements for which an operator T is defined is called the domain of T, and the set of all elements generated by transforming elements in the domain by T is called the range of T. If for every y ∈ Y there is at most one x ∈ M for which T(x) = y, then T(·) is said to be one-to-one. If for every y ∈ Y there is at least one x ∈ M, then T is said to map M onto Y. A transformation is said to be invertible if it is one-to-one and onto.
1. Consider the transformation
    y = Ax    (18)
where y ∈ Rᵐ, x ∈ Rⁿ, A ∈ Rᵐ × Rⁿ is an (m × n) matrix and T(x) = Ax. Whether this mapping is onto Rᵐ depends on the rank of the matrix. Now, consider two vectors x^(1), x^(2) ∈ Rⁿ and two arbitrary scalars (α, β). Since
    T(α x^(1) + β x^(2)) = A(α x^(1) + β x^(2)) = α Ax^(1) + β Ax^(2) = α T(x^(1)) + β T(x^(2))
the transformation T(·) is linear.
2. Consider the transformation
    y(t) = dx(t)/dt
where t ∈ [a, b]. Here, T(x) = dx/dt is an operator from X ≡ C^(1)[a, b], the space of continuously differentiable functions, to the space of continuous functions, i.e., Y ≡ C[a, b]. Now, given two functions x(t), z(t) ∈ C^(1)[a, b] and any two arbitrary scalars (α, β), it is easy to show that
    d(α x(t) + β z(t))/dt = α dx(t)/dt + β dz(t)/dt
so this operator is also linear.
Similarly, the square of the derivative, i.e.
    T(x) = [dx(t)/dt]²
is also an operator from X ≡ C^(1)[a, b] to Y ≡ C[a, b]. However, T(x) in this case is not a linear operator because
    [d(α x(t) + β z(t))/dt]² = α² [dx(t)/dt]² + β² [dz(t)/dt]² + 2αβ [dx(t)/dt][dz(t)/dt]    (20)
    ≠ α [dx(t)/dt]² + β [dz(t)/dt]²    (21)
which maps X ≡ {the space of integrable functions over [a, b]} into Y ≡ R. It is easy to check that this is a linear transformation.
Consider the ODE-IVP dx/dt = f(t, x(t)) with the initial condition x(0) = ξ. Defining the product space Y = C[0, ∞) × R, the transformation T : C^(1)[0, ∞) → Y can be stated as
    T[x(t)] = (dx(t)/dt − f(t, x(t)), x(0))
and the ODE-IVP can be written as T[x(t)] = (0(t), ξ), where 0 represents the zero function over the interval [0, ∞), i.e., 0(t) = 0 for t ∈ [0, ∞). If f(t, x(t)) is a linear function of x(t), then T[x(t)] is a linear operator; else it is a nonlinear operator.
    B.C. at z = 0 : f₁(du(0)/dz, u(0)) = 0
    B.C. at z = 1 : f₂(du(1)/dz, u(1)) = 0
In this case, the transformation T[u(z)] defined as
    T[u(z)] = (a d²u(z)/dz² + b du(z)/dz + c g(u(z)), f₁(u′(0), u(0)), f₂(u′(1), u(1)))
maps the space X ≡ C^(2)[0, 1] to Y = C^(2)[0, 1] × R × R, and the ODE-BVP can be represented as
    T[u(z)] = (0(z), 0, 0)
Consider the PDE
    a ∂²u/∂z² + b ∂u/∂z + c g(u) − ∂u/∂t = 0
defined over 0 < z < 1 and t ≥ 0, with the initial and boundary conditions specified as follows:
    u(z, 0) = h(z) for 0 < z < 1
    B.C. at z = 0 : f₁(∂u(0, t)/∂z, u(0, t)) = 0 for t ≥ 0
    B.C. at z = 1 : f₂(∂u(1, t)/∂z, u(1, t)) = 0 for t ≥ 0
In this case, the transformation
    T[u(z, t)] = (a ∂²u(z, t)/∂z² + b ∂u(z, t)/∂z + c g(u(z, t)) − ∂u/∂t, u(z, 0), f₁(u′(0, t), u(0, t)), f₂(u′(1, t), u(1, t)))
maps the space X ≡ C^(2)[0, 1] × C^(1)[0, ∞) to Y = C^(2)[0, 1] × C[0, 1] × R × R, and the PDE can be represented as
    T[u(z, t)] = (0(z, t), h(z), 0, 0)
From these examples, it is clear that these seemingly dissimilar transformations can be represented in a unified manner as follows
    y = T(x)    (23)
where T : X → Y is such that x ∈ X and y ∈ M ⊆ Y. Here, the set M can be the entire space Y or a sub-space of Y. To understand this better, let us look at some examples.
Example 17 Consider the system of linear algebraic equations
    Ax = [1, 0, 1; 1, 1, 0; 0, 1, 1] [x₁, x₂, x₃]ᵀ = b
It is desired to show that the set of all solutions of this equation, for an arbitrary vector b, is the same as R³.
It is easy to see that the matrix A has rank equal to three and the columns (and rows) are linearly independent. Since the columns are linearly independent, a unique solution x ∈ R³ can be found for any arbitrary vector b ∈ R³. Now, let us find the general solution x for an arbitrary vector b by computing A⁻¹ as follows
    x = A⁻¹ b = (1/2) [1, 1, −1; −1, 1, 1; 1, −1, 1] b
      = b₁ [1/2, −1/2, 1/2]ᵀ + b₂ [1/2, 1/2, −1/2]ᵀ + b₃ [−1/2, 1/2, 1/2]ᵀ
      = b₁ v^(1) + b₂ v^(2) + b₃ v^(3)
By definition,
    b₁ v^(1) + b₂ v^(2) + b₃ v^(3) ∈ span{v^(1), v^(2), v^(3)}
for an arbitrary b ∈ R³, and, since the vectors v^(1), v^(2), v^(3) are linearly independent, we have
    span{v^(1), v^(2), v^(3)} ≡ R³
i.e. the set of all possible solutions x of the system of equations under consideration is identical to the entire space R³.
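Example 17 can be verified numerically (this check is an addition; the ±1/2 sign pattern of A⁻¹ is recovered from A itself, since the printed signs were lost):

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])

# Full rank: the columns are linearly independent.
assert np.linalg.matrix_rank(A) == 3

# The inverse has entries +/- 1/2 with the sign pattern below.
Ainv = np.linalg.inv(A)
expected = 0.5 * np.array([[ 1.0,  1.0, -1.0],
                           [-1.0,  1.0,  1.0],
                           [ 1.0, -1.0,  1.0]])
assert np.allclose(Ainv, expected)

# Any b in R^3 yields a unique solution x = b1 v1 + b2 v2 + b3 v3.
b = np.array([3.0, -1.0, 2.0])
x = np.linalg.solve(A, b)
assert np.allclose(A @ x, b)
```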
    p³ + 6p² + 11p + 6 = 0
are p = −1, p = −2 and p = −3. Thus, the general solution of the ODE can be written as
    u(t) = α e^{−t} + β e^{−2t} + γ e^{−3t}
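The minus signs of these roots appear to have been lost in the printed text; a one-line numerical check confirms that the roots of p³ + 6p² + 11p + 6 = 0 are −1, −2 and −3:

```python
import numpy as np

# Roots of the characteristic polynomial p^3 + 6p^2 + 11p + 6 = 0,
# computed as eigenvalues of the companion matrix.
roots = np.roots([1.0, 6.0, 11.0, 6.0])
assert np.allclose(np.sort(roots.real), [-3.0, -2.0, -1.0])
```

Since (p + 1)(p + 2)(p + 3) = p³ + 6p² + 11p + 6, the factored form gives the same answer symbolically.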
The general solution of this ODE-BVP, which satisfies the boundary conditions, is given by
    u(z) = α₁ sin(πz) + α₂ sin(2πz) + α₃ sin(3πz) + ... = Σ_{i=1}^{∞} αᵢ sin(iπz)
where (α₁, α₂, ...) ∈ R are arbitrary scalars. The set of vectors {sin(πz), sin(2πz), sin(3πz), ...} is linearly independent and spans the set
    M ≡ span{sin(πz), sin(2πz), sin(3πz), ...} ⊂ C^(2)[0, 1]
where C^(2)[0, 1] is the set of twice differentiable continuous functions on the interval [0, 1].
The concepts of a linear vector space and of transformations defined on vector spaces allow us to arrive at a unified representation of seemingly different problems encountered in engineering applications. A large number of problems arising in applied engineering mathematics can be stated as follows [3]:
    y = T(x)
Here, X and Y are vector spaces and the operator T : M → Y. In engineering parlance, x, y and T represent the input, the output and the model, respectively. Linz [3] proposes the following broad classification of problems encountered in computational mathematics:
Direct Problems: Given the operator T and x, find y. In this case, we are trying to compute the output of a given system of equations for a given input. The computation of definite integrals is an example of this type.
Inverse Problems: Given the operator T and y, find x. In this case, we are looking for the input which generates the observed output. Solving systems of simultaneous (linear/nonlinear) algebraic equations, ordinary and partial differential equations, and integral equations are examples of this category. In fact, a majority of system design problems, in which we are expected to decide the inputs x for given specifications of the outputs y, belong to this class.
Identification Problems: Given x and y, find T. In this case, we try to find the laws governing the system from a knowledge of the relation between the inputs and outputs. Problems involving model parameter estimation from measured input-output data, such as estimating reaction rate parameters or developing transfer function models, belong to this class of problems.
The direct problems can be treated relatively easily. The inverse problems and the identification problems are more difficult to solve and form the central theme of this numerical analysis course.
4 Magnitudes
4.1 Normed Linear Spaces
In three dimensional space, we use lengths or magnitudes to compare any two vectors. The concept of the length/magnitude of a vector in three dimensional space is generalized to an arbitrary vector space by defining a scalar valued function called the norm of a vector.
Definition 20 (Normed Linear Vector Space): A normed linear vector space is a vector space X on which there is defined a real valued function which maps each element x ∈ X into a real number ‖x‖, called the norm of x. The norm satisfies the following axioms:
1. ‖x‖ ≥ 0 for all x ∈ X, and ‖x‖ = 0 if and only if x is the zero vector
2. ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ X (triangle inequality)
3. ‖αx‖ = |α| ‖x‖ for all x ∈ X and every scalar α
Commonly used normed linear spaces include the following.
1. (Rⁿ, ‖·‖₁): Euclidean space Rⁿ with the 1-norm: ‖x‖₁ ≡ Σ_{i=1}^{n} |xᵢ|
2. (Rⁿ, ‖·‖₂): Euclidean space Rⁿ with the 2-norm:
    ‖x‖₂ ≡ [Σ_{i=1}^{n} (xᵢ)²]^{1/2}
More generally, (Rⁿ, ‖·‖_p) with the p-norm:
    ‖x‖_p ≡ [Σ_{i=1}^{n} |xᵢ|ᵖ]^{1/p}    (25)
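The definitions above translate directly into code. The following check (an addition to the notes; the ∞-norm max|xᵢ|, whose definition does not survive in this extract, is included for completeness) compares each formula against NumPy's built-in implementation:

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

# 1-norm: sum of absolute values.
assert np.isclose(np.sum(np.abs(x)), np.linalg.norm(x, 1))

# 2-norm: Euclidean length.
assert np.isclose(np.sqrt(np.sum(x**2)), np.linalg.norm(x, 2))

# General p-norm, equation (25), here with p = 3.
p = 3
assert np.isclose(np.sum(np.abs(x)**p)**(1 / p), np.linalg.norm(x, p))

# Infinity-norm: largest absolute component.
assert np.isclose(np.max(np.abs(x)), np.linalg.norm(x, np.inf))
```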
6. Space of infinite sequences (l∞) with the p-norm: an element in this space, say x ∈ l∞, is an infinite sequence of numbers x = (x₁, x₂, ..., x_k, ...).
7. (C[a, b], ‖x(t)‖∞): the normed linear space C[a, b] together with the infinity norm
    ‖x(t)‖∞ ≡ max_{a≤t≤b} |x(t)|    (29)
The triangle inequality holds since
    max |x(t) + y(t)| ≤ max [|x(t)| + |y(t)|] ≤ max |x(t)| + max |y(t)|    (30)
8. Other types of norms, which can be defined on the set of continuous functions over [a, b], are as follows:
    ‖x(t)‖₁ ≡ ∫_a^b |x(t)| dt    (32)
    ‖x(t)‖₂ ≡ [∫_a^b |x(t)|² dt]^{1/2}    (33)
Example 22 Determine whether (a) max |dx(t)/dt|, (b) max |x(t)| + max |x′(t)|, (c) |x(a)| + max |x′(t)| and (d) |x(a)| max |x(t)| can serve as valid definitions of a norm on C^(2)[a, b].
Solution: (a) max |dx(t)/dt|: For this to be a norm function, Axiom 1 in the definition of a normed vector space requires
    ‖x(t)‖ = 0 ⇒ x(t) is the zero vector in C^(2)[a, b], i.e. x(t) = 0 for all t ∈ [a, b]
However, consider a constant function, i.e. g(t) = c for all t ∈ [a, b], where c is some non-zero value. It is easy to see that
    max |dg(t)/dt| = 0
even when g(t) does not correspond to the zero vector. Thus, the above function violates Axiom 1 in the definition of a normed vector space and, consequently, cannot qualify as a norm.
(b) max |x(t)| + max |x′(t)|: For any non-zero function x(t) ∈ C^(2)[a, b], Axiom 1 is satisfied. Axiom 2 follows from the inequality
    max |x(t) + y(t)| + max |x′(t) + y′(t)| ≤ (max |x(t)| + max |x′(t)|) + (max |y(t)| + max |y′(t)|)
It is easy to show that Axiom 3 is also satisfied for all scalars α. Thus, the given function defines a norm on C^(2)[a, b].
(c) |x(a)| + max |x′(t)|: For any non-zero function x(t) ∈ C^(2)[a, b], Axiom 1 is satisfied. Axiom 2 follows from the inequality
    |x(a) + y(a)| + max |x′(t) + y′(t)| ≤ |x(a)| + |y(a)| + max |x′(t)| + max |y′(t)|
Axiom 3 is also satisfied for any α since
    |α x(a)| + max |α x′(t)| = |α| (|x(a)| + max |x′(t)|)
(d) |x(a)| max |x(t)|: Consider a non-zero function x(t) in C^(2)[a, b] such that x(a) = 0 and max |x(t)| ≠ 0. Then, Axiom 1 is not satisfied for all vectors x(t) ∈ C^(2)[a, b], and the above function does not qualify as a norm on C^(2)[a, b].
The norm of a matrix A, induced by given vector norms on its domain and range spaces, is defined as
    ‖A‖ = max_{x≠0} (‖Ax‖ / ‖x‖)    (34)
In other words, ‖A‖ bounds the amplification power of the matrix, i.e.
    ‖Ax‖ / ‖x‖ ≤ ‖A‖ for all x ∈ Rⁿ, x ≠ 0    (35)
and the equality holds for at least one non-zero vector x ∈ Rⁿ. An alternate way of defining the matrix norm is as follows
    ‖A‖ = max_{‖x̂‖=1} ‖Ax̂‖    (36)
Defining x̂ as
    x̂ = x / ‖x‖
it is easy to see that these two definitions are equivalent. The following conditions are satisfied for any matrices A, B ∈ Rᵐ × Rⁿ:
1. ‖A‖ > 0 for every A ≠ 0, and ‖A‖ = 0 only for the null matrix
2. ‖αA‖ = |α| ‖A‖
3. ‖A + B‖ ≤ ‖A‖ + ‖B‖
The induced norms, i.e. the norms of a matrix induced by vector norms on Rᵐ and Rⁿ, can be interpreted as the maximum gain or amplification factor of the matrix. Commonly used matrix norms are as follows.
2-norm:
    ‖A‖₂ = max_{x≠0} (‖Ax‖₂ / ‖x‖₂) = [λ_max(AᵀA)]^{1/2}    (37)
where λ_max(AᵀA) and λ_min(AᵀA) denote the maximum and minimum magnitude eigenvalues of AᵀA, respectively (refer to the Appendix for details of the derivation).
1-norm: maximum over the column sums of |aᵢⱼ| (refer to the Appendix for details of the derivation):
    ‖A‖₁ = max_{x≠0} (‖Ax‖₁ / ‖x‖₁) = max_{1≤j≤n} [Σ_{i=1}^{m} |aᵢⱼ|]    (38)
∞-norm: maximum over the row sums of |aᵢⱼ| (refer to the Appendix for details of the derivation):
    ‖A‖∞ = max_{x≠0} (‖Ax‖∞ / ‖x‖∞) = max_{1≤i≤m} [Σ_{j=1}^{n} |aᵢⱼ|]    (39)
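Formulas (37)-(39) can be evaluated directly and compared against NumPy's matrix-norm routines (an illustrative addition, not part of the original notes):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

# (37): 2-norm as the square root of the largest eigenvalue of A^T A.
two_norm = np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A)))

# (38): 1-norm as the maximum column sum of absolute values.
one_norm = np.max(np.sum(np.abs(A), axis=0))

# (39): infinity-norm as the maximum row sum of absolute values.
inf_norm = np.max(np.sum(np.abs(A), axis=1))

assert np.isclose(two_norm, np.linalg.norm(A, 2))
assert one_norm == np.linalg.norm(A, 1) == 6.0
assert inf_norm == np.linalg.norm(A, np.inf) == 7.0
```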
Remark 24 There are other matrix norms, such as the Frobenius norm, which are not induced matrix norms. The Frobenius norm is defined as follows
    ‖A‖_F = [Σ_{i=1}^{n} Σ_{j=1}^{n} |aᵢⱼ|²]^{1/2}
Schematic representation of a unit ball in C[0,1]
Once we have defined a norm on a vector space, we can proceed to generalize the concept of convergence of a sequence of vectors. The concept of convergence is central to all iterative numerical methods.
for k = 0, 1, 2, ... is a convergent sequence with respect to any p-norm defined on R⁴. It can be shown that it is a Cauchy sequence. Note that each element of the vector converges to a limit in this case.
While every convergent sequence is Cauchy, a Cauchy sequence in a general vector space need not be convergent. Cauchy sequences in some vector spaces exhibit such strange behavior, and this motivates the concept of completeness of a vector space.
Example 29 Let X = (Q, ‖·‖₁), i.e. the set of rational numbers (Q) with the scalar field also taken as the set of rational numbers (Q) and the norm defined as ‖x‖₁ = |x|. A vector in this space is a rational number. In this space, we can construct Cauchy sequences which do not converge to a rational number (or rather, which converge to irrational numbers). For example, the well known Cauchy sequence
    x^(1) = 1/1
    x^(2) = 1/1 + 1/(2!)
    ...
    x^(n) = 1/1 + 1/(2!) + ... + 1/(n!)
converges to e − 1, which is not a rational number. Another such sequence is generated by the recursion
    x^(n+1) = 4 − (1/x^(n))
Starting from the initial point x^(0) = 1, this recursion generates a sequence of rational numbers which converges to the irrational number 2 + √3, a root of x² − 4x + 1 = 0.
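This behavior can be demonstrated with exact rational arithmetic (an illustrative addition; the lost operator in the printed recursion is read as a minus sign):

```python
from fractions import Fraction
import math

# Iterate x_(n+1) = 4 - 1/x_n starting from x_0 = 1.
# Every iterate is an exact rational number.
x = Fraction(1)
for _ in range(25):
    x = 4 - 1 / x

# Yet the sequence converges to the irrational fixed point 2 + sqrt(3),
# the root of x = 4 - 1/x, i.e. x^2 - 4x + 1 = 0.
limit = 2 + math.sqrt(3)
assert abs(float(x) - limit) < 1e-12
```

The iterates form a Cauchy sequence in (Q, |·|) whose limit lies outside Q, which is exactly the incompleteness the example illustrates.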
Example 30 Consider the following sequence of functions in the space of twice differentiable continuous functions C^(2)(−∞, ∞):
    f^(k)(t) = 1/2 + (1/π) tan⁻¹(kt)
defined on the interval −∞ < t < ∞ for all integers k. The range of each function is (0, 1). As k → ∞, the sequence of continuous functions converges to the discontinuous function
    u^(∞)(t) = 0 for −∞ < t < 0
    u^(∞)(t) = 1 for 0 < t < ∞
Example 31 Let X = (C[0, 1], ‖·‖₁), i.e. the space of continuous functions on [0, 1] with the one norm defined on it, i.e.
    ‖x(t)‖₁ = ∫_0^1 |x(t)| dt    (42)
For the Cauchy sequence of continuous functions sketched in Figure 1,
    ‖x^(n) − x^(m)‖₁ = (1/2) |1/n − 1/m| → 0 as n, m → ∞    (44)
However, as can be observed from Figure 1, the sequence does not converge to a continuous function.
Figure 1: Sequence of continuous functions

The concepts of convergence, Cauchy sequences and completeness of a space assume importance in the analysis of iterative numerical techniques. Any iterative numerical method generates a sequence of vectors, and we have to assess whether the sequence is Cauchy in order to terminate the iterations. To a beginner, it may appear that the concept of an incomplete vector space does not have much use in practice. It may be noted that, when we compute numerical solutions using any computer, we are working in finite dimensional incomplete vector spaces. In any computer with finite precision, any irrational number, such as π or e, is approximated by a rational number. In fact, even if we want to find a solution in Rⁿ, while using a finite precision computer to compute it, we actually end up working in Qⁿ and not in Rⁿ.
4.4 Exercises
1. On a normed space (X, ‖·‖), we define a function of two variables d(u, v) = ‖u − v‖. Show that d(u, v) is a distance function; in other words, d(u, v) has the following properties of an ordinary distance between two points:
2. Determine which of the following are valid definitions of norms on C^(2)[a, b]:
(a) max jx(t)j + max jx0 (t)j
(b) max jx0 (t)j
(c) jx(a)j + max jx0 (t)j
(d) jx(a)j max jx(t)j
3. In a normed linear space X, the set of all vectors x ∈ X such that ‖x − x̄‖ ≤ 1 is called the unit ball centered at x̄.
4. Consider the vector space X ≡ {set of real valued 2 × 2 matrices} together with the scalar field F ≡ R. Now, consider the subset S ⊂ X consisting of all invertible 2 × 2 real valued matrices. Does the set S form a subspace of X? Justify your answer.
5. Two norms ‖·‖_a and ‖·‖_b are said to be equivalent if there exist two positive constants c₁ and c₂, independent of x, such that
    c₁ ‖x‖_a ≤ ‖x‖_b ≤ c₂ ‖x‖_a
(a) Show that in Rⁿ the 2-norm (Euclidean norm) and the ∞-norm (maximum norm) are equivalent.
(b) Show that in Rⁿ the 1-norm and the ∞-norm (maximum norm) are equivalent.
6. Show that
    | ‖x‖ − ‖y‖ | ≤ ‖x − y‖
A norm ‖·‖_a is said to be stronger than a norm ‖·‖_b if
    lim_{k→∞} ‖x^(k)‖_a = 0 ⇒ lim_{k→∞} ‖x^(k)‖_b = 0
but not vice versa. For C[0, 1], show that the maximum norm is stronger than the 2-norm.
9. Consider real valued m × n matrices A and B. Prove the following inequality [8]:
    ‖A + B‖ ≤ ‖A‖ + ‖B‖
10. Consider real valued square matrices A and B. Prove the following inequalities/identities [8]:
    ‖AB‖ ≤ ‖A‖ ‖B‖
    ‖A‖₂ = ‖Aᵀ‖₂
11. Consider real valued square nonsingular matrices A and B. Show that
    ‖A⁻¹ − B⁻¹‖ ≤ ‖A⁻¹‖ ‖B⁻¹‖ ‖A − B‖
12. Consider an arbitrary real valued square matrix A. Show that λ_max(A) is not a satisfactory norm of A.
13. Consider a real valued symmetric positive definite matrix A. Show that ‖A‖₂ = λ_max(A).
15. Consider a real valued square matrix A. Show that even max |λᵢ| is not a satisfactory norm of a matrix, by finding a 2 × 2 counterexample to the following inequalities:
5 Angles
Similar to the magnitude/length of a vector, another important concept in three dimensional space that needs to be generalized is the angle between any two vectors.
5.1 Inner Product Spaces
Given any two unit vectors in R³, say x̂ and ŷ, the angle θ between these two vectors is defined using the inner (or dot) product of the two vectors as
    cos(θ) = (x̂)ᵀ ŷ = xᵀy / (‖x‖₂ ‖y‖₂)    (45)
           = x̂₁ŷ₁ + x̂₂ŷ₂ + x̂₃ŷ₃    (46)
The fact that the cosine of the angle between any two unit vectors is always less than or equal to one can be stated as
    |cos(θ)| = |⟨x̂, ŷ⟩| ≤ 1    (47)
Moreover, vectors x and y are called orthogonal if xᵀy = 0. Orthogonality is probably the most useful concept while working in three dimensional Euclidean space. Inner product spaces and Hilbert spaces generalize these simple geometrical concepts in three dimensional Euclidean space to higher or infinite dimensional vector spaces.
Definition 32 (Inner Product Space): An inner product space is a linear vector space $X$ together with an inner product defined on $X \times X$. Corresponding to each pair of vectors $x, y \in X$, the inner product $\langle x, y \rangle$ of $x$ and $y$ is a scalar. The inner product satisfies the following axioms:

1. $\langle x, y \rangle = \overline{\langle y, x \rangle}$ (for a real vector space, $\langle x, y \rangle = \langle y, x \rangle$)

2. $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$

3. $\langle \lambda x, y \rangle = \lambda \langle x, y \rangle$

4. $\langle x, x \rangle \ge 0$, and $\langle x, x \rangle = 0$ if and only if $x = 0$
Here are some examples of commonly used inner products and Hilbert spaces.
1. $X \equiv R^n$ with inner product defined as
$$\langle x, y \rangle = x^T y = \sum_{i=1}^{n} x_i y_i \tag{48}$$
$$\langle x, x \rangle = \sum_{i=1}^{n} (x_i)^2 = \|x\|_2^2 \tag{49}$$
is a Hilbert space.

3. $X \equiv C^n$ (complex valued $n$-vectors) with inner product defined as $\langle x, y \rangle = \sum_{i=1}^{n} x_i \overline{y}_i$, so that
$$\langle x, x \rangle = \sum_{i=1}^{n} x_i \overline{x}_i = \sum_{i=1}^{n} |x_i|^2 = \|x\|_2^2 \tag{52}$$
is a Hilbert space.
4. The set of real valued square integrable functions on interval $[a, b]$ with inner product defined as
$$\langle x, y \rangle = \int_a^b x(t) \, y(t) \, dt \tag{53}$$
is a Hilbert space, denoted as $L_2[a, b]$. Well known examples of spaces of this type are the sets of continuous functions in $L_2[-\pi, \pi]$ or $L_2[0, 2\pi]$, which are considered while developing Fourier series expansions of continuous functions on $[-\pi, \pi]$ or $[0, 2\pi]$ using $\sin(n\theta)$ and $\cos(n\theta)$ as basis functions.

5. The set of real valued functions on $[a, b]$ with inner product
$$\langle x, y \rangle = \int_a^b x(t) \, y(t) \, dt \tag{54}$$

6. The space of complex valued square integrable functions on $[a, b]$ with inner product
$$\langle x, y \rangle = \int_a^b x(t) \, \overline{y(t)} \, dt \tag{55}$$
Axioms 2 and 3 imply that the inner product is linear in the first entry. The quantity $\langle x, x \rangle^{\frac{1}{2}}$ is a candidate function for defining a norm on the inner product space. Axioms 1 and 3 imply that $\|\lambda x\| = |\lambda| \, \|x\|$ and axiom 4 implies that $\|x\| > 0$ for $x \ne 0$. If we show that $\sqrt{\langle x, x \rangle}$ satisfies the triangle inequality, then $\sqrt{\langle x, x \rangle}$ defines a norm on space $X$. We first prove the Cauchy-Schwarz inequality, which is a generalization of equation (47), and proceed to show that $\sqrt{\langle x, x \rangle}$ defines the well known 2-norm on $X$, i.e. $\|x\|_2 = \sqrt{\langle x, x \rangle}$.

For $y \ne 0$ and any scalar $\lambda$, axiom 4 gives
$$0 \le \langle x + \lambda y, x + \lambda y \rangle = \langle x, x \rangle + \lambda \langle y, x \rangle + \overline{\lambda} \langle x, y \rangle + |\lambda|^2 \langle y, y \rangle$$
In particular, if we choose $\lambda = -\dfrac{\langle x, y \rangle}{\langle y, y \rangle}$, then, using axiom 1 in the definition of an inner product, we have
$$\overline{\lambda} = -\frac{\overline{\langle x, y \rangle}}{\langle y, y \rangle} = -\frac{\langle y, x \rangle}{\langle y, y \rangle} \tag{58}$$
$$\Rightarrow \lambda \langle y, x \rangle = \overline{\lambda} \langle x, y \rangle = -\frac{\langle x, y \rangle \langle y, x \rangle}{\langle y, y \rangle} \tag{59}$$
$$= -\frac{\langle x, y \rangle \, \overline{\langle x, y \rangle}}{\langle y, y \rangle} = -\frac{|\langle x, y \rangle|^2}{\langle y, y \rangle} \tag{60}$$
$$\Rightarrow 0 \le \langle x, x \rangle - \frac{|\langle x, y \rangle|^2}{\langle y, y \rangle} \tag{61}$$
or
$$|\langle x, y \rangle| \le \sqrt{\langle x, x \rangle \, \langle y, y \rangle}$$
The triangle inequality can be established easily using the Cauchy-Schwarz inequality; for a real inner product space,
$$\|x + y\|_2^2 = \|x\|_2^2 + 2\langle x, y \rangle + \|y\|_2^2 \le \|x\|_2^2 + 2\|x\|_2 \|y\|_2 + \|y\|_2^2 = \left(\|x\|_2 + \|y\|_2\right)^2$$

Definition 36 (Angle): The angle $\theta$ between any two vectors in an inner product space is defined by
$$\theta = \cos^{-1}\left(\frac{\langle x, y \rangle}{\|x\|_2 \, \|y\|_2}\right) \tag{72}$$

Definition 37 (Orthogonal Vectors): In an inner product space $X$, two vectors $x, y \in X$ are said to be orthogonal if $\langle x, y \rangle = 0$. We symbolize this by $x \perp y$. A vector $x$ is said to be orthogonal to a set $S$ (written as $x \perp S$) if $x \perp z$ for each $z \in S$.

Just as orthogonality has many consequences in three dimensional geometry, it has many implications in any inner-product/Hilbert space [4]. The Pythagoras theorem, which is probably the most important result in plane geometry, is true in any inner product space.
Lemma 38: If $x \perp y$ in an inner product space, then $\|x + y\|_2^2 = \|x\|_2^2 + \|y\|_2^2$.

Note that an orthogonal set of non-zero vectors is a linearly independent set. We often prefer to work with an orthonormal basis, as any vector can be uniquely represented in terms of components along the orthonormal directions. Common examples of such orthonormal bases are (a) the unit vectors along the coordinate directions in $R^n$ and (b) the functions $\{\sin(nt) : n = 1, 2, ...\}$ and $\{\cos(nt) : n = 1, 2, ...\}$ in $L_2[0, 2\pi]$ (after normalization).
$$\langle x, y \rangle_W = x^T W y$$
Thus, axiom A3 holds for any $x, y \in R^n$. Since $W$ is positive definite, it follows that $\langle x, x \rangle_W = x^T W x > 0$ if $x \ne 0$ and $\langle x, x \rangle_W = x^T W x = 0$ if $x = 0$. Thus, axiom A4 holds for any $x \in R^n$. Since all four axioms are satisfied, $\langle y, x \rangle_W = y^T W x$ is a valid definition of an inner product.
Example 41: The triangle inequality asserts that, for any two vectors $x$ and $y$ belonging to an inner product space,
$$\|x + y\|_2 \le \|y\|_2 + \|x\|_2$$
Does the Cauchy-Schwarz inequality follow from the triangle inequality? Under what condition does the Schwarz inequality become an equality?

Solution: Squaring both sides, we have
$$\|x + y\|_2^2 = \|x\|_2^2 + 2\langle x, y \rangle + \|y\|_2^2 \le \|x\|_2^2 + 2\|x\|_2 \|y\|_2 + \|y\|_2^2$$
i.e.
$$\langle x, y \rangle \le \|y\|_2 \, \|x\|_2 \tag{73}$$
Repeating the argument with $y$ replaced by $-y$ yields
$$-\|y\|_2 \, \|x\|_2 \le \langle x, y \rangle \tag{74}$$
Combining inequalities (73) and (74), we arrive at the Cauchy-Schwarz inequality, i.e.
$$|\langle x, y \rangle| \le \|y\|_2 \, \|x\|_2 \tag{76}$$
The Cauchy-Schwarz inequality reduces to an equality when $y = \alpha x$ for some scalar $\alpha$.
5.2 Orthogonal Projections

Figure 2: Schematic representation of the projection of a point, b, on the line along a.

Consider the projection $p = \theta \widehat{a}$ of a point $b$ on the line along a unit vector $\widehat{a}$ (see Figure 2). The error $b - p$ must be orthogonal to $\widehat{a}$, i.e.
$$\langle \widehat{a}, b - p \rangle = \langle \widehat{a}, b - \theta \widehat{a} \rangle = 0 \tag{77}$$
$$\Rightarrow \theta = \frac{\langle \widehat{a}, b \rangle}{\langle \widehat{a}, \widehat{a} \rangle} = \langle \widehat{a}, b \rangle \tag{78}$$
Thus, the projection of $b$ along direction $\widehat{a}$, i.e. $p$, can be expressed as
$$p = \theta \widehat{a} = \langle \widehat{a}, b \rangle \, \widehat{a} \tag{79}$$
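The relations (77)-(79) are easy to check numerically. In the sketch below the vectors are chosen arbitrarily for illustration:

```python
import numpy as np

# Projection of a point b on the line along a (vectors chosen arbitrarily).
a = np.array([1.0, 2.0, 2.0])
b = np.array([3.0, 1.0, 0.0])

a_hat = a / np.linalg.norm(a)   # unit vector along the line
theta = a_hat @ b               # theta = <a_hat, b>, as in Eq. (78)
p = theta * a_hat               # p = <a_hat, b> a_hat, as in Eq. (79)
e = b - p                       # error vector

print(p)            # projection of b on the line
print(a_hat @ e)    # ~0: the error is orthogonal to the line, Eq. (77)
```
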
where the vectors $a^{(1)}, a^{(2)}, ...., a^{(m)} \in R^n$ are linearly independent vectors. Given an arbitrary point $b \in X$, the problem is to find a point $p$ in subspace $S$ such that $b - p$ is orthogonal to $S$. We can define unit vectors
$$\widehat{a}^{(i)} = \frac{a^{(i)}}{\left\|a^{(i)}\right\|_2} = \frac{a^{(i)}}{\sqrt{\left\langle a^{(i)}, a^{(i)} \right\rangle}} \quad \text{for } i = 1, 2, ..., m$$
As $p \in S$, we have
$$p = \theta_1 \widehat{a}^{(1)} + \theta_2 \widehat{a}^{(2)} + .... + \theta_m \widehat{a}^{(m)} = \sum_{i=1}^{m} \theta_i \widehat{a}^{(i)} \tag{80}$$
Requiring that $b - p$ be orthogonal to each $\widehat{a}^{(i)}$ yields
$$\left\langle \widehat{a}^{(i)}, \sum_{j=1}^{m} \theta_j \widehat{a}^{(j)} \right\rangle = \sum_{j=1}^{m} \theta_j \left\langle \widehat{a}^{(i)}, \widehat{a}^{(j)} \right\rangle = \left\langle \widehat{a}^{(i)}, b \right\rangle \quad \text{for } i = 1, 2, ...m \tag{82}$$
Note that these are $m$ linear equations in $m$ unknowns, $\{\theta_i : i = 1, 2, ...m\}$. Collecting the above set of equations, we arrive at the following matrix equation
$$\begin{bmatrix}
\left\langle \widehat{a}^{(1)}, \widehat{a}^{(1)} \right\rangle & \left\langle \widehat{a}^{(1)}, \widehat{a}^{(2)} \right\rangle & .... & \left\langle \widehat{a}^{(1)}, \widehat{a}^{(m)} \right\rangle \\
\left\langle \widehat{a}^{(2)}, \widehat{a}^{(1)} \right\rangle & \left\langle \widehat{a}^{(2)}, \widehat{a}^{(2)} \right\rangle & .... & \left\langle \widehat{a}^{(2)}, \widehat{a}^{(m)} \right\rangle \\
..... & ..... & ..... & ..... \\
\left\langle \widehat{a}^{(m)}, \widehat{a}^{(1)} \right\rangle & \left\langle \widehat{a}^{(m)}, \widehat{a}^{(2)} \right\rangle & ..... & \left\langle \widehat{a}^{(m)}, \widehat{a}^{(m)} \right\rangle
\end{bmatrix}
\begin{bmatrix} \theta_1 \\ \theta_2 \\ .... \\ \theta_m \end{bmatrix}
=
\begin{bmatrix}
\left\langle \widehat{a}^{(1)}, b \right\rangle \\ \left\langle \widehat{a}^{(2)}, b \right\rangle \\ .... \\ \left\langle \widehat{a}^{(m)}, b \right\rangle
\end{bmatrix} \tag{83}$$
Figure 3: Schematic representation of the projection of a vector b on a subspace spanned by vectors (u, v)

The situation is exactly the same when we are given a point $b \in R^3$ and a plane $S$ in $R^3$, which is spanned by two linearly independent vectors $a^{(1)}, a^{(2)}$. We would like to find the distance of $b$ from $S$, i.e., a point $p \in S$ such that $\|p - b\|_2$ is minimum (see Figure 3). Again, from school geometry, we know that such a point can be obtained by drawing a perpendicular from $b$ to $S$; $p$ is the point where this perpendicular meets $S$ (see Figure 3). We would like to formally derive this result using optimization.

It may be noted that, to project vector $b$ on subspace $S$, it is not necessary to generate the unit vectors $\widehat{a}^{(i)} : i = 1, 2, ..., m$. One can directly work with the set $\left\{a^{(i)} : i = 1, 2, ..., m\right\}$.
Expressing $p \in S$ as
$$p = \theta_1 a^{(1)} + \theta_2 a^{(2)} + .... + \theta_m a^{(m)} = \sum_{i=1}^{m} \theta_i a^{(i)} \tag{85}$$
and requiring that $b - p$ be orthogonal to each $a^{(i)}$, we obtain
$$\left\langle a^{(i)}, \sum_{j=1}^{m} \theta_j a^{(j)} \right\rangle = \sum_{j=1}^{m} \theta_j \left\langle a^{(i)}, a^{(j)} \right\rangle = \left\langle a^{(i)}, b \right\rangle \quad \text{for } i = 1, 2, ...m \tag{87}$$
Note that these are $m$ linear equations in $m$ unknowns, $\{\theta_i : i = 1, 2, ...m\}$. Collecting the above set of equations, we arrive at the following matrix equation
$$\begin{bmatrix}
\left\langle a^{(1)}, a^{(1)} \right\rangle & \left\langle a^{(1)}, a^{(2)} \right\rangle & .... & \left\langle a^{(1)}, a^{(m)} \right\rangle \\
\left\langle a^{(2)}, a^{(1)} \right\rangle & \left\langle a^{(2)}, a^{(2)} \right\rangle & .... & \left\langle a^{(2)}, a^{(m)} \right\rangle \\
..... & ..... & ..... & ..... \\
\left\langle a^{(m)}, a^{(1)} \right\rangle & \left\langle a^{(m)}, a^{(2)} \right\rangle & ..... & \left\langle a^{(m)}, a^{(m)} \right\rangle
\end{bmatrix}
\begin{bmatrix} \theta_1 \\ \theta_2 \\ .... \\ \theta_m \end{bmatrix}
=
\begin{bmatrix}
\left\langle a^{(1)}, b \right\rangle \\ \left\langle a^{(2)}, b \right\rangle \\ .... \\ \left\langle a^{(m)}, b \right\rangle
\end{bmatrix} \tag{88}$$
Defining the matrix $G$ and the vector $f$ as
$$G = \begin{bmatrix}
\left\langle a^{(1)}, a^{(1)} \right\rangle & \left\langle a^{(1)}, a^{(2)} \right\rangle & .... & \left\langle a^{(1)}, a^{(m)} \right\rangle \\
\left\langle a^{(2)}, a^{(1)} \right\rangle & \left\langle a^{(2)}, a^{(2)} \right\rangle & .... & \left\langle a^{(2)}, a^{(m)} \right\rangle \\
..... & ..... & ..... & ..... \\
\left\langle a^{(m)}, a^{(1)} \right\rangle & \left\langle a^{(m)}, a^{(2)} \right\rangle & ..... & \left\langle a^{(m)}, a^{(m)} \right\rangle
\end{bmatrix} \; ; \quad
f = \begin{bmatrix}
\left\langle a^{(1)}, b \right\rangle \\ \left\langle a^{(2)}, b \right\rangle \\ .... \\ \left\langle a^{(m)}, b \right\rangle
\end{bmatrix} \tag{89}$$
equation (88) can be expressed as a linear algebraic equation
$$G \, \theta = f \tag{90}$$
The $m \times m$ matrix $G$ on the L.H.S. of (90) is called the Gram matrix. When we work with real valued vectors and the inner product is a map from $X \times X$ to $R$, it is easy to show that $G$ is a symmetric matrix, i.e. $G^T = G$. Moreover, since the vectors $a^{(1)}, a^{(2)}, ......., a^{(m)}$ are assumed to be linearly independent, the Gram matrix is nonsingular. The equation we have derived here represents a very general result, called the projection theorem, which holds in any Hilbert space.
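A small numerical sketch of this construction (the basis vectors and b below are arbitrary illustrative choices): build G and f, solve the Gram system, and verify that the residual is orthogonal to the subspace.

```python
import numpy as np

# Projection of b on S = span{a1, a2} in R^4 via the Gram matrix, Eq. (88)-(90).
a1 = np.array([1.0, 0.0, 1.0, 0.0])
a2 = np.array([0.0, 1.0, 1.0, 1.0])
A = np.column_stack([a1, a2])
b = np.array([1.0, 2.0, 3.0, 4.0])

G = A.T @ A                    # Gram matrix, G_ij = <a^(i), a^(j)>
f = A.T @ b                    # f_i = <a^(i), b>
theta = np.linalg.solve(G, f)  # solve G theta = f
p = A @ theta                  # projection of b on S

print(G)               # symmetric and nonsingular
print(A.T @ (b - p))   # ~0: b - p is orthogonal to S
```
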
To see that this construction indeed minimizes the distance, define
$$\varphi = \|b - p\|_2^2 = \left\langle b - \sum_{j=1}^{m} \theta_j a^{(j)}, \; b - \sum_{j=1}^{m} \theta_j a^{(j)} \right\rangle$$
$$= \langle b, b \rangle - 2 \sum_{j=1}^{m} \theta_j \left\langle a^{(j)}, b \right\rangle + \left\langle \sum_{j=1}^{m} \theta_j a^{(j)}, \sum_{j=1}^{m} \theta_j a^{(j)} \right\rangle$$
We want to find the vector $p$ such that $\varphi = \|b - p\|_2^2$ is smallest. Thus, the problem of finding $p$ can be recast as follows
$$p : \; \underset{\{\theta_i : i = 1, 2, ...m\}}{\mathrm{Min}} \; \varphi = \underset{\{\theta_i : i = 1, 2, ...m\}}{\mathrm{Min}} \; \|b - p\|_2^2$$
Using the necessary condition for optimality (see Appendix B), it follows that, at the optimum point, the following set of equations holds
$$\frac{\partial \varphi}{\partial \theta_i} = 0 \quad \text{for } i = 1, 2, ..., m \tag{91}$$
i.e.
$$\frac{\partial \varphi}{\partial \theta_i} = -\left\langle a^{(i)}, b \right\rangle - \left\langle b, a^{(i)} \right\rangle + \left\langle a^{(i)}, \sum_{j=1}^{m} \theta_j a^{(j)} \right\rangle + \left\langle \sum_{j=1}^{m} \theta_j a^{(j)}, a^{(i)} \right\rangle = 0 \tag{92}$$
for $i = 1, 2, ..., m$, which is the same as equation (88). Further, it can be shown that $G$ is a positive definite matrix and the solution $\widehat{\theta} = G^{-1} f$ yields the global minimum, i.e. we get the smallest possible $\|b - p\|_2^2$ for $\theta = \widehat{\theta}$.
such that the function $e(t) = g(t) - p(t)$ is orthogonal to $S$. Since $p(t) \in S$, it is of the form $p(t) = \theta_1 f_1(t) + \theta_2 f_2(t)$, where $f_1(t) = 1$ and $f_2(t) = t$. Here,
$$\langle f_1, f_1 \rangle = \int_0^{2\pi} dt = 2\pi \; ; \quad \langle f_1, f_2 \rangle = \int_0^{2\pi} (1 \cdot t) \, dt = \frac{(2\pi)^2}{2} \; ; \quad \langle f_2, f_2 \rangle = \int_0^{2\pi} t^2 \, dt = \frac{(2\pi)^3}{3}$$
$$\left\langle f_1, e^t \right\rangle = \int_0^{2\pi} \left(1 \cdot e^t\right) dt = \left(e^{2\pi} - 1\right) \quad \text{and} \quad \left\langle f_2, e^t \right\rangle = \int_0^{2\pi} \left(t \cdot e^t\right) dt = \left[t e^t - e^t\right]_0^{2\pi} = e^{2\pi}(2\pi - 1) + 1$$
Solving the resulting Gram system yields
$$\begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 2\pi & \dfrac{(2\pi)^2}{2} \\[2mm] \dfrac{(2\pi)^2}{2} & \dfrac{(2\pi)^3}{3} \end{bmatrix}^{-1} \begin{bmatrix} e^{2\pi} - 1 \\ e^{2\pi}(2\pi - 1) + 1 \end{bmatrix}$$
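The orthogonality of the error e(t) to both basis functions can be verified numerically; in the sketch below the L2 integrals are approximated on a fine trapezoidal grid:

```python
import numpy as np

# Solve the Gram system for projecting g(t) = exp(t) on span{1, t} in L2[0, 2*pi]
# and check that e(t) = g(t) - p(t) is orthogonal to f1(t) = 1 and f2(t) = t.
L = 2 * np.pi
G = np.array([[L,        L**2 / 2],
              [L**2 / 2, L**3 / 3]])
f = np.array([np.exp(L) - 1.0,
              np.exp(L) * (L - 1.0) + 1.0])
theta = np.linalg.solve(G, f)

# trapezoidal quadrature weights on a fine grid
t = np.linspace(0.0, L, 200_001)
w = np.full_like(t, t[1] - t[0])
w[0] *= 0.5
w[-1] *= 0.5

err = np.exp(t) - (theta[0] + theta[1] * t)
print(w @ err, w @ (err * t))   # both ~ 0: <e, f1> and <e, f2> vanish
```
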
5.3 Orthogonal Projections in $R^n$

Consider a set of linearly independent vectors $\left\{a^{(1)}, a^{(2)}, ...., a^{(m)}\right\}$ in $R^n$ such that $m < n$. Let $S = span\left\{a^{(1)}, a^{(2)}, ...., a^{(m)}\right\}$ represent an $m$ dimensional subspace of $R^n$. Given a vector $b \in R^n$, we are interested in finding the orthogonal projection, $p$, of $b$ in $S$. Let us consider a motivating example first. We will then state the general result.
Example 44: Consider the problem of finding an approximate correlation relating the specific heat of a gas at constant pressure, $C_p$, as a function of temperature in a certain temperature range $[T_a, T_b]$. Let us assume that we have obtained three "measurements" of the specific heat, $\{C_{p1}, C_{p2}, C_{p3}\}$, at three different temperatures $\{T_1, T_2, T_3\} \in [T_a, T_b]$. Let us propose an approximate model of the form
$$C_p = \theta_1 + \theta_2 T + e = \widehat{C}_p + e \tag{94}$$
The resulting set of equations is
$$C_{p1} = \theta_1 + \theta_2 T_1 + e_1 \tag{95}$$
$$C_{p2} = \theta_1 + \theta_2 T_2 + e_2 \tag{96}$$
$$C_{p3} = \theta_1 + \theta_2 T_3 + e_3 \tag{97}$$
Thus, we are looking for a vector
$$p \equiv \begin{bmatrix} \widehat{C}_{p1} \\ \widehat{C}_{p2} \\ \widehat{C}_{p3} \end{bmatrix} = \theta_1 \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + \theta_2 \begin{bmatrix} T_1 \\ T_2 \\ T_3 \end{bmatrix} = \theta_1 a^{(1)} + \theta_2 a^{(2)}$$
that lies in the two dimensional subspace $S = span\left\{a^{(1)}, a^{(2)}\right\}$ of $R^3$ such that $\|b - p\|_2^2 = \|e\|_2^2$ is as small as possible, where
$$b \equiv \begin{bmatrix} C_{p1} \\ C_{p2} \\ C_{p3} \end{bmatrix} \quad \text{and} \quad e \equiv \begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix}$$
Let us assume that, for any $x, y \in R^3$, the inner product is defined as $\langle x, y \rangle = x^T y$. Then,
$$G = \begin{bmatrix} \left\langle a^{(1)}, a^{(1)} \right\rangle & \left\langle a^{(1)}, a^{(2)} \right\rangle \\ \left\langle a^{(2)}, a^{(1)} \right\rangle & \left\langle a^{(2)}, a^{(2)} \right\rangle \end{bmatrix} = \begin{bmatrix} 3 & T_1 + T_2 + T_3 \\ T_1 + T_2 + T_3 & T_1^2 + T_2^2 + T_3^2 \end{bmatrix}$$
$$f = \begin{bmatrix} \left\langle a^{(1)}, b \right\rangle \\ \left\langle a^{(2)}, b \right\rangle \end{bmatrix} = \begin{bmatrix} C_{p1} + C_{p2} + C_{p3} \\ C_{p1} T_1 + C_{p2} T_2 + C_{p3} T_3 \end{bmatrix}$$
and $\widehat{\theta} = G^{-1} f$. The vector $p$ obtained using this $\widehat{\theta}$ is such that $e = b - p \perp p$ and $\|e\|_2^2 = e^T e = e_1^2 + e_2^2 + e_3^2$ is as small as possible. In other words, $\widehat{\theta}$ represents the least squares estimate of the parameter vector $\theta$. There is an alternate way to arrive at the same result. Let us define a matrix
$$A = \begin{bmatrix} a^{(1)} & a^{(2)} \end{bmatrix} = \begin{bmatrix} 1 & T_1 \\ 1 & T_2 \\ 1 & T_3 \end{bmatrix}$$
Then, it is easy to show that
$$A^T A = \begin{bmatrix} \left(a^{(1)}\right)^T \\ \left(a^{(2)}\right)^T \end{bmatrix} \begin{bmatrix} a^{(1)} & a^{(2)} \end{bmatrix} = \begin{bmatrix} \left(a^{(1)}\right)^T a^{(1)} & \left(a^{(1)}\right)^T a^{(2)} \\ \left(a^{(2)}\right)^T a^{(1)} & \left(a^{(2)}\right)^T a^{(2)} \end{bmatrix} = \begin{bmatrix} \left\langle a^{(1)}, a^{(1)} \right\rangle & \left\langle a^{(1)}, a^{(2)} \right\rangle \\ \left\langle a^{(2)}, a^{(1)} \right\rangle & \left\langle a^{(2)}, a^{(2)} \right\rangle \end{bmatrix}$$
$$A^T b = \begin{bmatrix} \left(a^{(1)}\right)^T b \\ \left(a^{(2)}\right)^T b \end{bmatrix} = \begin{bmatrix} \left\langle a^{(1)}, b \right\rangle \\ \left\langle a^{(2)}, b \right\rangle \end{bmatrix}$$
Thus,
$$\widehat{\theta}_{LS} = \left(A^T A\right)^{-1} A^T b$$
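Example 44 can be carried out numerically; the temperatures and "measurements" below are made-up values for illustration only:

```python
import numpy as np

# Least squares fit Cp = theta1 + theta2*T from three illustrative measurements.
T = np.array([300.0, 350.0, 400.0])
Cp = np.array([1.05, 1.10, 1.18])

A = np.column_stack([np.ones_like(T), T])       # A = [a^(1) a^(2)]
theta_ls = np.linalg.solve(A.T @ A, A.T @ Cp)   # (A^T A)^{-1} A^T b
e = Cp - A @ theta_ls

print(theta_ls)
print(A.T @ e)   # ~0: the residual is orthogonal to span{a^(1), a^(2)}
```

The same estimate is returned by numpy's built-in solver `np.linalg.lstsq(A, Cp, rcond=None)`.
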
Now, consider the general case where $S = span\left\{a^{(1)}, a^{(2)}, ...., a^{(m)}\right\}$ and we want to project an arbitrary vector $b \in R^n$ on $S$. Let us assume that, for any $x, y \in R^n$, the inner product is defined as $\langle x, y \rangle = x^T y$. If we define the matrix
$$A = \begin{bmatrix} a^{(1)} & a^{(2)} & ... & a^{(m)} \end{bmatrix}$$
then $G = A^T A$, $f = A^T b$, and
$$p = A \, \widehat{\theta}_{LS} = A \left[\left(A^T A\right)^{-1} A^T\right] b \tag{100}$$
Here,
$$P_r = A \left[\left(A^T A\right)^{-1} A^T\right] \tag{101}$$
is called a projection matrix and it projects on $S$. Also, the component orthogonal to $S$ is given as
$$e = b - p = b - A \left[\left(A^T A\right)^{-1} A^T\right] b = \left[I - P_r\right] b \tag{102}$$
Thus, the projection matrix $P_r$ facilitates splitting a vector $b$ into two components: $p \in S$ and $e = b - p \perp S$. The projection matrix has an interesting property: it is equal to its own square, i.e. it is idempotent.
$$\left(P_r\right)^2 = A \left[\left(A^T A\right)^{-1} A^T\right] A \left[\left(A^T A\right)^{-1} A^T\right] = A \left[\left(A^T A\right)^{-1} A^T\right] = P_r \tag{103}$$
Thus,
$$\left(P_r\right)^2 b = P_r \left(P_r b\right) = P_r(p) = P_r \, b = p \tag{104}$$
This makes perfect sense from the geometric viewpoint: since $p \in S$, the projection of $p$ on $S$ is $p$ itself.
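The idempotency property (103)-(104) is easy to confirm numerically for an arbitrary matrix A with linearly independent columns:

```python
import numpy as np

# Projection matrix Pr = A (A^T A)^{-1} A^T and its idempotency, Eq. (101)-(104).
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))          # arbitrary full-column-rank matrix
Pr = A @ np.linalg.inv(A.T @ A) @ A.T

b = rng.standard_normal(6)
p = Pr @ b
e = (np.eye(6) - Pr) @ b                 # e = (I - Pr) b, Eq. (102)

print(np.allclose(Pr @ Pr, Pr))          # True: Pr^2 = Pr
print(np.allclose(Pr @ p, p))            # True: projecting twice changes nothing
print(np.allclose(A.T @ e, 0))           # True: e is orthogonal to S
```
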
Example 45: Let us revisit the problem of finding an approximate correlation relating the specific heat of a gas at constant pressure, $C_p$, as a function of temperature in a certain temperature range $[T_a, T_b]$. Let us now assume that we obtained five "measurements" of the specific heat, $\{C_{p1}, C_{p2}, C_{p3}, C_{p4}, C_{p5}\}$, at five different temperatures $\{T_1, T_2, T_3, T_4, T_5\} \in [T_a, T_b]$. This time let us propose an approximate model of the form
$$C_p = \theta_1 + \theta_2 T + \theta_3 T^2 + e = \widehat{C}_p + e \tag{105}$$
where $(\theta_1, \theta_2, \theta_3)$ represent parameters of the correlation, $e$ represents the approximation error and $\widehat{C}_p$ represents the estimate of $C_p$ based on temperature $T$. The resulting set of equations can be rearranged as follows
$$C_p = \begin{bmatrix} C_{p1} \\ C_{p2} \\ C_{p3} \\ C_{p4} \\ C_{p5} \end{bmatrix} = \begin{bmatrix} 1 & T_1 & T_1^2 \\ 1 & T_2 & T_2^2 \\ 1 & T_3 & T_3^2 \\ 1 & T_4 & T_4^2 \\ 1 & T_5 & T_5^2 \end{bmatrix} \begin{bmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \\ e_5 \end{bmatrix} = A\theta + e$$
and $p = A \, \widehat{\theta}_{LS}$.
Example 46: It was found that the yield of a chemical reaction, $Y$, is a function of the operating temperature and pressure. Experiments have been carried out at 100 different combinations $\{(T_1, P_1), (T_2, P_2), ..., (T_{100}, P_{100})\}$ and the corresponding reaction yields $\{Y_1, Y_2, ...., Y_{100}\}$ have been recorded. A model relating the reaction yield and $(T, P)$ is proposed as follows
$$Y = \theta_1 + \theta_2 T + \theta_3 P + e = \widehat{Y} + e \tag{106}$$
Thus, we are looking for the projection of the vector $b \equiv Y$ on the 3 dimensional subspace $S$ of $R^{100}$ which is spanned by
$$a^{(1)} = \begin{bmatrix} 1 \\ 1 \\ ... \\ 1 \end{bmatrix} \; ; \quad a^{(2)} = \begin{bmatrix} T_1 \\ T_2 \\ ... \\ T_{100} \end{bmatrix} \; ; \quad a^{(3)} = \begin{bmatrix} P_1 \\ P_2 \\ ... \\ P_{100} \end{bmatrix}$$
and $p = \widehat{Y} = A \, \widehat{\theta}_{LS}$. Note that $A^T A$ is a $3 \times 3$ matrix and $A^T b$ is a $3 \times 1$ vector.
5.4 Gram-Schmidt Process and Orthogonal Polynomials

Given a set of linearly independent vectors $\left\{x^{(1)}, x^{(2)}, ...., x^{(n)}\right\}$ in an inner product space, the Gram-Schmidt process generates an orthonormal set $\left\{\widehat{z}^{(1)}, \widehat{z}^{(2)}, ...., \widehat{z}^{(n)}\right\}$ as follows. We begin by normalizing the first vector
$$\widehat{z}^{(1)} = \frac{x^{(1)}}{\left\|x^{(1)}\right\|_2} \tag{107}$$
We form a unit vector $\widehat{z}^{(2)}$ in two steps:
$$z^{(2)} = x^{(2)} - \left\langle x^{(2)}, \widehat{z}^{(1)} \right\rangle \widehat{z}^{(1)} \tag{108}$$
$$\widehat{z}^{(2)} = \frac{z^{(2)}}{\left\|z^{(2)}\right\|_2} \tag{109}$$
In general, at the $k$'th step,
$$z^{(k)} = x^{(k)} - \sum_{i=1}^{k-1} \left\langle x^{(k)}, \widehat{z}^{(i)} \right\rangle \widehat{z}^{(i)} \tag{110}$$
$$\widehat{z}^{(k)} = \frac{z^{(k)}}{\left\|z^{(k)}\right\|_2} \; ; \quad k = 1, 2, ........., n \tag{111}$$
It can be verified by direct computation that $z^{(k)} \perp \widehat{z}^{(j)}$ for all $j < k$ as follows
$$\left\langle z^{(k)}, \widehat{z}^{(j)} \right\rangle = \left\langle x^{(k)}, \widehat{z}^{(j)} \right\rangle - \sum_{i=1}^{k-1} \left\langle x^{(k)}, \widehat{z}^{(i)} \right\rangle \left\langle \widehat{z}^{(i)}, \widehat{z}^{(j)} \right\rangle \tag{112}$$
$$= \left\langle x^{(k)}, \widehat{z}^{(j)} \right\rangle - \left\langle x^{(k)}, \widehat{z}^{(j)} \right\rangle = 0 \tag{113}$$
since $\left\langle \widehat{z}^{(i)}, \widehat{z}^{(j)} \right\rangle = 1$ when $i = j$ and $0$ otherwise.
$$\widehat{z}^{(2)} = \frac{z^{(2)}}{\left\|z^{(2)}\right\|_2} = \begin{bmatrix} \dfrac{1}{\sqrt{2}} \\ 0 \\ \dfrac{1}{\sqrt{2}} \end{bmatrix} \tag{117}$$
$$\widehat{z}^{(3)} = \frac{z^{(3)}}{\left\|z^{(3)}\right\|_2} = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}^T$$
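The procedure (107)-(111) translates directly into code; the starting vectors below are an arbitrary linearly independent set chosen for illustration:

```python
import numpy as np

# Gram-Schmidt procedure of Eq. (107)-(111) in R^n with <x, y> = x^T y.
def gram_schmidt(X):
    """Orthonormalize the (linearly independent) columns of X."""
    Z = np.zeros_like(X, dtype=float)
    for k in range(X.shape[1]):
        z = X[:, k].astype(float).copy()
        for i in range(k):                      # remove components along z^(i)
            z -= (X[:, k] @ Z[:, i]) * Z[:, i]
        Z[:, k] = z / np.linalg.norm(z)         # normalize, Eq. (111)
    return Z

X = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
Z = gram_schmidt(X)
print(np.round(Z.T @ Z, 12))   # identity matrix: the columns are orthonormal
```
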
Note that the vectors in the orthonormal set will depend on the definition of the inner product. Suppose we define the inner product as
$$\langle x, y \rangle_W = x^T W y \tag{119}$$
where
$$W = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}$$
is a positive definite matrix. Then, the length of $x^{(1)}$ becomes $\left\|x^{(1)}\right\|_{W,2} = \sqrt{6}$ and the unit vector $\widehat{z}^{(1)}$ becomes
$$\widehat{z}^{(1)} = \frac{x^{(1)}}{\left\|x^{(1)}\right\|_{W,2}} = \begin{bmatrix} \dfrac{1}{\sqrt{6}} \\ 0 \\ \dfrac{1}{\sqrt{6}} \end{bmatrix} \tag{120}$$
The remaining two orthonormal vectors have to be computed using the inner product defined by equation (119).
$$\widehat{z}^{(1)}(t) = \frac{x^{(1)}(t)}{\left\|x^{(1)}(t)\right\|} = \frac{1}{\sqrt{2}} \tag{123}$$
$$\left\langle \widehat{z}^{(1)}(t), x^{(2)}(t) \right\rangle = \int_{-1}^{1} \frac{t}{\sqrt{2}} \, dt = 0 \tag{124}$$
so that $z^{(2)}(t) = t$ and $\widehat{z}^{(2)}(t) = \sqrt{\dfrac{3}{2}} \, t$. Next,
$$z^{(3)}(t) = x^{(3)}(t) - \left\langle x^{(3)}(t), \widehat{z}^{(1)}(t) \right\rangle \widehat{z}^{(1)}(t) - \left\langle x^{(3)}(t), \widehat{z}^{(2)}(t) \right\rangle \widehat{z}^{(2)}(t)$$
$$= t^2 - \left(\int_{-1}^{1} \frac{1}{\sqrt{2}} \, t^2 \, dt\right) \widehat{z}^{(1)}(t) - \left(\int_{-1}^{1} \sqrt{\frac{3}{2}} \, t^3 \, dt\right) \widehat{z}^{(2)}(t)$$
$$= t^2 - \frac{1}{3} - 0 = t^2 - \frac{1}{3} \tag{130}$$
$$\widehat{z}^{(3)}(t) = \frac{z^{(3)}(t)}{\left\|z^{(3)}(t)\right\|} \tag{131}$$
where
$$\left\|z^{(3)}(t)\right\|^2 = \left\langle z^{(3)}(t), z^{(3)}(t) \right\rangle = \int_{-1}^{1} \left(t^2 - \frac{1}{3}\right)^2 dt \tag{132}$$
$$= \int_{-1}^{1} \left(t^4 - \frac{2}{3} t^2 + \frac{1}{9}\right) dt = \left[\frac{t^5}{5} - \frac{2 t^3}{9} + \frac{t}{9}\right]_{-1}^{1}$$
$$= \frac{2}{5} - \frac{4}{9} + \frac{2}{9} = \frac{18 - 10}{45} = \frac{8}{45}$$
$$\left\|z^{(3)}(t)\right\| = \sqrt{\frac{8}{45}} = \frac{2}{3} \sqrt{\frac{2}{5}} \tag{133}$$
The orthonormal polynomials constructed above are the well known Legendre polynomials. It turns out that
$$\widehat{z}_n(t) = \sqrt{\frac{2n + 1}{2}} \, P_n(t) \; ; \quad (n = 0, 1, 2.......) \tag{134}$$
where
$$P_n(t) = \frac{(-1)^n}{2^n \, n!} \frac{d^n}{dt^n} \left[\left(1 - t^2\right)^n\right] \tag{135}$$
are the Legendre polynomials. It can be shown that this set of polynomials forms an orthogonal basis for the set of continuous functions on $[-1, 1]$. The first few elements in this orthogonal set are as follows
$$P_0(t) = 1 \; ; \quad P_1(t) = t \; ; \quad P_2(t) = \frac{1}{2}\left(3t^2 - 1\right) \; ; \quad P_3(t) = \frac{1}{2}\left(5t^3 - 3t\right)$$
$$P_4(t) = \frac{1}{8}\left(35 t^4 - 30 t^2 + 3\right) \; ; \quad P_5(t) = \frac{1}{8}\left(63 t^5 - 70 t^3 + 15 t\right)$$
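Equation (134) can be verified numerically by running Gram-Schmidt on {1, t, t²} in L2[−1, 1]; in the sketch below the inner products are approximated by trapezoidal quadrature:

```python
import numpy as np

# Gram-Schmidt on {1, t, t^2} in L2[-1, 1]; the result should match the
# scaled Legendre polynomials sqrt((2n+1)/2) * P_n(t) of Eq. (134).
t = np.linspace(-1.0, 1.0, 200_001)
w = np.full_like(t, t[1] - t[0])
w[0] *= 0.5
w[-1] *= 0.5
inner = lambda f, g: w @ (f * g)          # trapezoidal approximation of <f, g>

Z = []
for x in [np.ones_like(t), t, t**2]:
    z = x.copy()
    for zi in Z:
        z = z - inner(x, zi) * zi         # remove components along earlier z's
    Z.append(z / np.sqrt(inner(z, z)))    # normalize

P2 = 0.5 * (3 * t**2 - 1)                 # Legendre polynomial P_2(t)
print(np.max(np.abs(Z[2] - np.sqrt(5 / 2) * P2)))   # ~ 0
```
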
Example 49 (Gram-Schmidt Procedure in other Spaces): These polynomials are generated starting from the linearly independent vectors $\left\{1, t, t^2, t^3, ....\right\}$.

3. Laguerre Polynomials: $X \equiv L_2(0, \infty)$, i.e. the space of continuous functions over $(0, \infty)$ with the 2-norm defined on it and
$$\langle x(t), y(t) \rangle = \int_0^{\infty} e^{-t} \, x(t) \, y(t) \, dt \tag{141}$$
5.5 Generalized Fourier Series

Consider an inner product space $X$ together with an orthonormal basis $\left\{\widehat{z}^{(1)}, \widehat{z}^{(2)}, ...., \widehat{z}^{(i)}, ....\right\}$ for $X$. Now, any element $x \in X$ can be expressed as
$$x = \left\langle x, \widehat{z}^{(1)} \right\rangle \widehat{z}^{(1)} + \left\langle x, \widehat{z}^{(2)} \right\rangle \widehat{z}^{(2)} + ..... + \left\langle x, \widehat{z}^{(i)} \right\rangle \widehat{z}^{(i)} + .....$$
This represents the Fourier expansion of $x$ in terms of the orthonormal basis. Some examples of the Fourier series expansion are as follows:

$X \equiv R^3$: Expressing a vector as
$$\mathbf{v} = x\,\mathbf{i} + y\,\mathbf{j} + z\,\mathbf{k}$$
5.6 Exercise

1. Compute a matrix that projects every point in the plane onto the line $x + 2y = 0$.

3. Find the best straight line fit, $y = \theta_1 + \theta_2 t + e$, to the following measurements in the least squares sense and sketch your solution:
$$y = 2 \text{ at } t = -1 \; ; \quad y = 0 \text{ at } t = 0$$
$$y = 3 \text{ at } t = 1 \; ; \quad y = 5 \text{ at } t = 2$$
5. It is desired to fit the heat capacity data for methylcyclohexane ($C_7H_{14}$) to a linear function of temperature
$$C_p = \theta_1 + \theta_2 T + e$$
where $C_p$ is the heat capacity and $T$ is the absolute temperature. Determine the least squares estimates of the model parameters from the given data.

6. Use all of the data given in the following table to fit the following two-dimensional model for the diffusion coefficient $D$ as a function of temperature ($T$) and weight fraction ($X$)
$$D = \theta_1 + \theta_2 T + \theta_3 X + e$$
such that the sum of the squares of the approximation errors is minimized, and estimate $\widehat{D}$ at $T = 22$, $X = 0.36$.

T (°C):                20     20     20     25     25     25     30     30     30
X:                     0.3    0.4    0.5    0.3    0.4    0.5    0.3    0.4    0.5
D x 10^5 (cm^2/s):     0.823  0.639  0.43   0.973  0.751  0.506  1.032  0.824  0.561

7. Consider $R^3$ with the weighted inner product
$$\langle x, y \rangle_W = x^T W y \; ; \quad W = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}$$
and apply the Gram-Schmidt procedure with this inner product.
8. Gram-Schmidt Procedure in $C[a, b]$: Let $X$ represent the set of continuous functions on the interval $0 \le t \le 1$ with the inner product defined as
$$\langle x(t), y(t) \rangle = \int_0^1 w(t) \, x(t) \, y(t) \, dt$$
Find the orthonormal set of vectors if (a) $w(t) = 1$ (shifted Legendre polynomials) and (b) $w(t) = t(1 - t)$ (Jacobi polynomials).

9. Show that in $C[a, b]$ with the maximum norm we cannot define an inner product $\langle x, y \rangle$ such that $\langle x, x \rangle^{1/2} = \|x\|_{\infty}$. In other words, show that in $C[a, b]$ the following function
$$\langle x(t), y(t) \rangle = \max_t |x(t) \, y(t)|$$
cannot define an inner product.
12. Show that the parallelogram law, $\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$, holds in any inner product space.

13. Consider a real inner product space $(X, \langle \cdot, \cdot \rangle)$ together with the norm induced by the inner product, $\|x\| = \sqrt{\langle x, x \rangle}$. Show that the following identity (known as the polarization identity) holds:
$$\langle x, y \rangle = \frac{1}{4} \left[\|x + y\|^2 - \|x - y\|^2\right]$$
14. Assume $x^{(1)}, x^{(2)}, ..., x^{(n)}$ are mutually orthogonal non-zero vectors in an inner product space. Show that they are linearly independent.

16. Let $\left\{\widehat{z}^{(i)}\right\}$ be an orthonormal basis and let $x_i = \left\langle x, \widehat{z}^{(i)} \right\rangle$. Show that
$$\|x\|^2 = \sum_{i=1}^{\infty} \left| \left\langle x, \widehat{z}^{(i)} \right\rangle \right|^2 = \sum_{i=1}^{\infty} |x_i|^2$$
for any $x \in X$.

Note: This is a very important result. As an illustration, consider the Hilbert space $L_2[-\pi, \pi]$ together with the orthonormal basis
$$\left\{ \widehat{z}^{(1)}(t) = \frac{1}{\sqrt{2\pi}} \; ; \; \widehat{z}^{(2)}(t) = \frac{1}{\sqrt{\pi}} \cos(t) \; ; \; \widehat{z}^{(3)}(t) = \frac{1}{\sqrt{\pi}} \sin(t) \; ; \; .... \right\}$$
Then, with $c_i = \left\langle f, \widehat{z}^{(i)} \right\rangle$,
$$\|f(t)\|^2 = \int_{-\pi}^{\pi} |f(t)|^2 \, dt = \sum_{i=1}^{\infty} |c_i|^2$$
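As a quick numerical check of this identity, take f(t) = t on [−π, π]: only the sine coefficients are nonzero, c_n = ⟨t, sin(nt)/√π⟩ = (−1)^(n+1) 2√π / n, and the sum of c_n² should converge to ‖f‖² = ∫ t² dt = 2π³/3. A sketch using a partial sum:

```python
import numpy as np

# Parseval check for f(t) = t on L2[-pi, pi] with the orthonormal basis
# {1/sqrt(2 pi), cos(nt)/sqrt(pi), sin(nt)/sqrt(pi), ...}.
n = np.arange(1, 200_000)
c_sq = (2 * np.sqrt(np.pi) / n) ** 2   # |c_n|^2 (the alternating sign drops out)

lhs = 2 * np.pi**3 / 3                 # ||f||^2 = integral of t^2 on [-pi, pi]
rhs = c_sq.sum()                       # partial sum of |c_n|^2

print(lhs, rhs)                        # the partial sum approaches ||f||^2
```
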
17. Consider the inner product space $L_2[0, \pi]$ with the inner product defined as follows
$$\langle f(t), g(t) \rangle = \int_0^{\pi} f(t) \, g(t) \, dt$$
Show that $f(t)$ and $g(t)$ are linearly independent. Further, find the orthogonal projection, $p(t)$, of the vector $e^t$ on the subspace spanned by $f(t)$ and $g(t)$.
6 Summary

In this chapter, some fundamental concepts from functional analysis have been reviewed. We began with the concept of a general vector space and defined algebraic and geometric structures such as the norm and the inner product. The inner product generalizes the concept of the dot product and allows us to define the angle between vectors. We then interpreted the notion of orthogonality in a general inner product space and developed the Gram-Schmidt process, which can generate an orthonormal set from any linearly independent set. The definitions of the inner product and orthogonality paved the way to generalize the concept of projecting a vector onto any subspace of an inner product space. We also discussed induced matrix norms, which play an important role in the analysis of numerical schemes.
7 Appendices

7.1 Appendix A: Computation of Matrix Norms

7.1.1 Computation of the 2-Norm [8][5]

To begin with, we state an important result that is needed to compute the 2-norm of a matrix.

Theorem 50: Let $B \in R^{m \times m}$ represent a square symmetric matrix. Then, (a) $B$ has only real eigenvalues and orthogonal eigenvectors, and (b) $B$ is always diagonalizable as
$$B = \Psi \Lambda \Psi^T$$
where $\Psi$ is a matrix with eigenvectors of $B$ as columns and $\Lambda$ is the diagonal matrix with eigenvalues of $B$ on the main diagonal.

Now, consider the induced 2-norm
$$\|A\|_2 = \max_{x \ne 0} \frac{\|Ax\|_2}{\|x\|_2} \tag{144}$$
Since $\|Ax\|_2^2 = x^T \left(A^T A\right) x$, we apply the above theorem with $B = A^T A$. Note that in this case $\Psi$ is a unitary matrix, i.e.,
$$\Psi^T \Psi = I \; , \; \text{i.e.} \; \Psi^T = \Psi^{-1} \tag{146}$$
and the eigenvectors are orthogonal. Also, since $A^T A$ is positive semi-definite, its eigenvalues can be ordered as
$$0 \le \lambda_1 \le \lambda_2 \le ...... \le \lambda_n \tag{149}$$
Define $y = \Psi^T x$, so that $x^T B x = y^T \Lambda y$ and $x^T x = y^T y$.
Then, we have
$$\frac{y^T \Lambda y}{y^T y} = \frac{\left(\lambda_1 y_1^2 + \lambda_2 y_2^2 + ...... + \lambda_n y_n^2\right)}{\left(y_1^2 + y_2^2 + ...... + y_n^2\right)} \le \lambda_n \tag{150}$$
which implies that
$$\frac{y^T \Lambda y}{y^T y} = \frac{x^T B x}{x^T x} = \frac{x^T \left(A^T A\right) x}{x^T x} \le \lambda_n \tag{151}$$
The equality holds only at the corresponding eigenvector of $A^T A$, i.e.,
$$\frac{\left[v^{(n)}\right]^T \left(A^T A\right) v^{(n)}}{\left[v^{(n)}\right]^T v^{(n)}} = \frac{\left[v^{(n)}\right]^T \lambda_n v^{(n)}}{\left[v^{(n)}\right]^T v^{(n)}} = \lambda_n \tag{152}$$
Thus, the 2-norm of matrix $A$ can be computed as follows
$$\|A\|_2^2 = \max_{x \ne 0} \frac{\|Ax\|_2^2}{\|x\|_2^2} = \lambda_{\max}\left(A^T A\right) \tag{153}$$
i.e.
$$\|A\|_2 = \left[\lambda_{\max}\left(A^T A\right)\right]^{1/2} \tag{154}$$
where $\lambda_{\max}\left(A^T A\right)$ denotes the maximum magnitude eigenvalue, i.e. the spectral radius, of $A^T A$.
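Equation (154) is easy to confirm numerically for an arbitrary matrix:

```python
import numpy as np

# Check ||A||_2 = sqrt(lambda_max(A^T A)), Eq. (154), for an arbitrary matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))

lam = np.linalg.eigvalsh(A.T @ A)    # real eigenvalues of the symmetric A^T A
norm2 = np.sqrt(lam.max())

print(norm2, np.linalg.norm(A, 2))   # the two values agree
```
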
7.1.2 Computation of the 1-Norm [5]

Consider the induced 1-norm
$$\|A\|_1 = \max_{x \ne 0} \frac{\|Ax\|_1}{\|x\|_1} = \max_{\|\widehat{x}\|_1 = 1} \|A \widehat{x}\|_1 \tag{155}$$
and
$$\|A \widehat{x}\|_1 = \sum_{i=1}^{m} \left| \sum_{j=1}^{n} a_{ij} \widehat{x}_j \right| \le \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}| \, |\widehat{x}_j| = \sum_{j=1}^{n} \left[\sum_{i=1}^{m} |a_{ij}|\right] |\widehat{x}_j|$$
Define
$$C_j = \left[\sum_{i=1}^{m} |a_{ij}|\right]$$
and let
$$C_{\max} = \max_{1 \le j \le n} C_j$$
Then, since $\sum_{j=1}^{n} |\widehat{x}_j| = \|\widehat{x}\|_1 = 1$, it follows that
$$\|A \widehat{x}\|_1 \le \sum_{j=1}^{n} C_j \, |\widehat{x}_j| \le C_{\max} \left(\sum_{j=1}^{n} |\widehat{x}_j|\right) = C_{\max}$$
Suppose $C_j = C_{\max}$ for $j = k$. Then, we can choose a vector $\widehat{x}$ as follows
$$\widehat{x}_k = 1 \; \text{ and } \; \widehat{x}_j = 0 \text{ when } j \ne k$$
For this choice of $\widehat{x}$, we have
$$\|A \widehat{x}\|_1 = C_{\max} = \left[\sum_{i=1}^{m} |a_{ik}|\right]$$
Thus, the upper bound $C_{\max}$ is attained, and $\|A\|_1 = C_{\max}$, i.e. the 1-norm of a matrix equals its maximum absolute column sum.

7.1.3 Computation of the $\infty$-Norm of a Matrix [5]

For the induced $\infty$-norm, $\|A\|_{\infty} = \max_{\|\widehat{x}\|_{\infty} = 1} \|A \widehat{x}\|_{\infty}$, we have
$$\|A \widehat{x}\|_{\infty} = \max_{i} \left| \sum_{j=1}^{n} a_{ij} \widehat{x}_j \right| \le \max_{i} \sum_{j=1}^{n} |a_{ij}| \, |\widehat{x}_j|$$
Since $\|\widehat{x}\|_{\infty} = 1$, it follows that $|\widehat{x}_j| \le 1$ and
$$\|A \widehat{x}\|_{\infty} \le \max_{i} \sum_{j=1}^{n} |a_{ij}| \, |\widehat{x}_j| \le \max_{i} \sum_{j=1}^{n} |a_{ij}|$$
Suppose that the maximum occurs for $i = k$. Then, choosing $\widehat{x}$ such that
$$\widehat{x}_j = sign(a_{kj})$$
attains this upper bound, so that $\|A\|_{\infty} = \max_{i} \sum_{j=1}^{n} |a_{ij}|$, i.e. the $\infty$-norm of a matrix equals its maximum absolute row sum.
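The column-sum and row-sum formulas derived above can be checked against numpy's built-in matrix norms:

```python
import numpy as np

# ||A||_1 = max column sum and ||A||_inf = max row sum, checked numerically.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 5))

norm_1 = np.abs(A).sum(axis=0).max()     # C_max, the maximum column sum
norm_inf = np.abs(A).sum(axis=1).max()   # the maximum row sum

print(norm_1, np.linalg.norm(A, 1))
print(norm_inf, np.linalg.norm(A, np.inf))
```
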
7.2 Appendix B: Necessary and Sufficient Conditions for Unconstrained Optimality

7.2.1 Preliminaries
Definition 51 (Global Minimum): If there exists a point $x^* \in R^N$ such that $\Phi(x^*) < \Phi(x)$ for any $x \in R^N$, then $x^*$ is called the global minimum of $\Phi(x)$.

Definition 53 (Local Minimum): If there exists an $\varepsilon$-neighborhood $N_{\varepsilon}(\overline{x})$ around $\overline{x}$ such that $\Phi(\overline{x}) < \Phi(x)$ for each $x \in N_{\varepsilon}(\overline{x})$ with $x \ne \overline{x}$, then $\overline{x}$ is called a local minimum.

Before we prove the necessary and sufficient conditions for optimality, we revise some relevant definitions from linear algebra. A square symmetric matrix $A$ is called positive semi-definite if
$$x^T A x \ge 0 \tag{158}$$
for every $x \in R^N$, and negative semi-definite if
$$x^T A x \le 0 \tag{160}$$
for every $x \in R^N$.
7.2.2 Necessary Condition for Optimality

The necessary condition for optimality, which can be used to establish whether a given point is a stationary (maximum or minimum) point, is given by the following theorem.

Theorem 58: If $\Phi(x)$ is continuous and differentiable and has an extreme (or stationary) point (i.e. maximum or minimum) at $x = \overline{x}$, then
$$\nabla \Phi(\overline{x}) = \left[\frac{\partial \Phi}{\partial x_1} \;\; \frac{\partial \Phi}{\partial x_2} \;\; .............. \;\; \frac{\partial \Phi}{\partial x_N}\right]^T_{x = \overline{x}} = 0 \tag{161}$$

Proof: Suppose $x = \overline{x}$ is a minimum point and one of the partial derivatives, say the $k$'th one, does not vanish at $x = \overline{x}$. Then, by Taylor's theorem,
$$\Phi(\overline{x} + \Delta x) = \Phi(\overline{x}) + \sum_{i=1}^{N} \frac{\partial \Phi}{\partial x_i}(\overline{x}) \, \Delta x_i + R_2(\overline{x}, \Delta x) \tag{162}$$
Choosing $\Delta x_i = 0$ for $i \ne k$, we have
$$\Phi(\overline{x} + \Delta x) - \Phi(\overline{x}) = \frac{\partial \Phi}{\partial x_k}(\overline{x}) \, \Delta x_k + R_2(\overline{x}, \Delta x) \tag{163}$$
Since $R_2(\overline{x}, \Delta x)$ is of order $\left(\Delta x_k\right)^2$, the term of order $\Delta x_k$ will dominate over the higher order term for sufficiently small $\Delta x$. Thus, the sign of $\Phi(\overline{x} + \Delta x) - \Phi(\overline{x})$ is decided by the sign of
$$\Delta x_k \, \frac{\partial \Phi}{\partial x_k}(\overline{x})$$
Suppose
$$\frac{\partial \Phi}{\partial x_k}(\overline{x}) > 0 \tag{164}$$
Then, choosing $\Delta x_k < 0$ implies
$$\Phi(\overline{x} + \Delta x) - \Phi(\overline{x}) < 0$$
and $\Phi(x)$ can be further reduced by reducing $\Delta x_k$. This contradicts the assumption that $x = \overline{x}$ is a minimum point. Similarly, if
$$\frac{\partial \Phi}{\partial x_k}(\overline{x}) < 0 \tag{166}$$
then, choosing $\Delta x_k > 0$ implies
$$\Phi(\overline{x} + \Delta x) - \Phi(\overline{x}) < 0$$
and $\Phi(x)$ can be further reduced by increasing $\Delta x_k$. This contradicts the assumption that $x = \overline{x}$ is a minimum point. Thus, $x = \overline{x}$ will be a minimum of $\Phi(x)$ only if
$$\frac{\partial \Phi}{\partial x_k}(\overline{x}) = 0 \quad \text{for } k = 1, 2, ..., N \tag{168}$$
Similar arguments can be made if $x = \overline{x}$ is a maximum of $\Phi(x)$.
7.2.3 Sufficient Condition for Optimality

The sufficient condition for optimality, which can be used to establish whether a stationary point is a maximum or a minimum, is obtained as follows. By Taylor's theorem, for some $\lambda$ with $0 < \lambda < 1$,
$$\Phi(\overline{x} + \Delta x) = \Phi(\overline{x}) + \sum_{i=1}^{N} \frac{\partial \Phi}{\partial x_i}(\overline{x}) \, \Delta x_i + \frac{1}{2!} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{\partial^2 \Phi(\overline{x} + \lambda \Delta x)}{\partial x_i \, \partial x_j} \, \Delta x_i \, \Delta x_j$$
Since $\overline{x}$ is a stationary point,
$$\nabla \Phi(\overline{x}) = 0 \tag{170}$$
and it follows that
$$\Phi(\overline{x} + \Delta x) - \Phi(\overline{x}) = \frac{1}{2!} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{\partial^2 \Phi(\overline{x} + \lambda \Delta x)}{\partial x_i \, \partial x_j} \, \Delta x_i \, \Delta x_j \tag{171}$$
This implies that the sign of $\Phi(\overline{x} + \Delta x) - \Phi(\overline{x})$ at the extreme point $\overline{x}$ is the same as the sign of the R.H.S. Since the second partial derivatives $\dfrac{\partial^2 \Phi}{\partial x_i \, \partial x_j}$ are continuous in the neighborhood of $x = \overline{x}$, their values at $x = \overline{x} + \lambda \Delta x$ will have the same signs as their values at $x = \overline{x}$ for all sufficiently small $\Delta x$. If the quantity
$$\sum_{i=1}^{N} \sum_{j=1}^{N} \frac{\partial^2 \Phi(\overline{x} + \lambda \Delta x)}{\partial x_i \, \partial x_j} \, \Delta x_i \, \Delta x_j \simeq (\Delta x)^T \left[\nabla^2 \Phi(\overline{x})\right] \Delta x \ge 0 \tag{172}$$
for all $\Delta x$, then $x = \overline{x}$ will be a local minimum. In other words, if the Hessian matrix $\left[\nabla^2 \Phi(\overline{x})\right]$ is positive semi-definite, then $x = \overline{x}$ will be a local minimum. If the quantity
$$\sum_{i=1}^{N} \sum_{j=1}^{N} \frac{\partial^2 \Phi(\overline{x} + \lambda \Delta x)}{\partial x_i \, \partial x_j} \, \Delta x_i \, \Delta x_j \simeq (\Delta x)^T \left[\nabla^2 \Phi(\overline{x})\right] \Delta x \le 0 \tag{173}$$
for all $\Delta x$, then $x = \overline{x}$ will be a local maximum. In other words, if the Hessian matrix $\left[\nabla^2 \Phi(\overline{x})\right]$ is negative semi-definite, then $x = \overline{x}$ will be a local maximum.

It may be noted that the need to define positive definite or negative definite matrices arises naturally from geometric considerations while qualifying a stationary point in multi-dimensional optimization problems. Whether a matrix is positive (semi-)definite, negative (semi-)definite or indefinite can be established using algebraic conditions, such as the signs of the eigenvalues of the matrix. If the eigenvalues of a matrix are all real and non-negative (i.e. $\lambda_i \ge 0$ for all $i$), then the matrix is positive semi-definite. If the eigenvalues of a matrix are all real and non-positive (i.e. $\lambda_i \le 0$ for all $i$), then the matrix is negative semi-definite. When the eigenvalues have mixed signs, the matrix is indefinite.
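The conditions above can be applied numerically; the quadratic objective below is a made-up example with a minimum at the origin:

```python
import numpy as np

# Classify a stationary point of phi(x) = x1^2 + 3*x2^2 using the gradient
# (necessary condition) and the Hessian eigenvalues (sufficient condition).
def grad(x):
    return np.array([2.0 * x[0], 6.0 * x[1]])

def hessian(x):
    return np.array([[2.0, 0.0],
                     [0.0, 6.0]])

x_bar = np.array([0.0, 0.0])
eig = np.linalg.eigvalsh(hessian(x_bar))

print(np.allclose(grad(x_bar), 0))                       # True: stationary point
print("local minimum" if np.all(eig >= 0) else "other")  # eigenvalues 2 and 6
```
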
References

[1] Kreyszig, E.; Introductory Functional Analysis with Applications, Wiley, New York, 1978.

[2] Limaye, B. V.; Functional Analysis (3rd Ed.), New Age International, New Delhi, 2014.

[3] Linz, P.; Theoretical Numerical Analysis, Dover, New York, 1979.

[4] Luenberger, D. G.; Optimization by Vector Space Methods, Wiley, New York, 1969.

[5] Phillips, G. M. and P. J. Taylor; Theory and Applications of Numerical Analysis, Academic Press, 1996.

[8] Strang, G.; Linear Algebra and Its Applications, Harcourt Brace Jovanovich, New York, 1988.

[9] Strang, G.; Introduction to Applied Mathematics, Wellesley-Cambridge Press, MA, 1986.