# Applications of Linear Algebra

by
Gordon C. Everstine
5 June 2010
Copyright © 1998–2010 by Gordon C. Everstine.
This book was typeset with LaTeX2e (MiKTeX).
Preface
These lecture notes are intended to supplement a one-semester graduate-level engineering
course at The George Washington University in algebraic methods appropriate to the solution
of engineering computational problems, including systems of linear equations, linear vector
spaces, matrices, least squares problems, Fourier series, and eigenvalue problems. In general,
the mix of topics and level of presentation are aimed at upper-level undergraduates and first-year graduate students in mechanical, aerospace, and civil engineering.
Gordon Everstine
Gaithersburg, Maryland
June 2010
Contents
1 Systems of Linear Equations
  1.1 Definitions
  1.2 Nonsingular Systems of Equations
  1.3 Diagonal and Triangular Systems
  1.4 Gaussian Elimination
  1.5 Operation Counts
  1.6 Partial Pivoting
  1.7 LU Decomposition
  1.8 Matrix Interpretation of Back Substitution
  1.9 LDU Decomposition
  1.10 Determinants
  1.11 Multiple Right-Hand Sides and Matrix Inverses
  1.12 Tridiagonal Systems
  1.13 Iterative Methods
2 Vector Spaces
  2.1 Rectangular Systems of Equations
  2.2 Linear Independence
  2.3 Spanning a Subspace
  2.4 Row Space and Null Space
  2.5 Pseudoinverses
  2.6 Linear Transformations
  2.7 Orthogonal Subspaces
  2.8 Projections Onto Lines
3 Change of Basis
  3.1 Tensors
  3.2 Examples of Tensors
  3.3 Isotropic Tensors
4 Least Squares Problems
  4.1 Least Squares Fitting of Data
  4.2 The General Linear Least Squares Problem
  4.3 Gram-Schmidt Orthogonalization
  4.4 QR Factorization
5 Fourier Series
  5.1 Example
  5.2 Generalized Fourier Series
  5.3 Fourier Expansions Using a Polynomial Basis
  5.4 Similarity of Fourier Series With Least Squares Fitting
6 Eigenvalue Problems
  6.1 Example 1: Mechanical Vibrations
  6.2 Properties of the Eigenvalue Problem
  6.3 Example 2: Principal Axes of Stress
  6.4 Computing Eigenvalues by Power Iteration
  6.5 Inverse Iteration
  6.6 Iteration for Other Eigenvalues
  6.7 Similarity Transformations
  6.8 Positive Definite Matrices
  6.9 Application to Differential Equations
  6.10 Application to Structural Dynamics
Bibliography
Index
List of Figures
1  Two Vectors.
2  Some Vectors That Span the xy-Plane.
3  90° Rotation.
4  Reflection in 45° Line.
5  Projection Onto Horizontal Axis.
6  Rotation by Angle θ.
7  Projection Onto Line.
8  Element Coordinate Systems in the Finite Element Method.
9  Basis Vectors in Polar Coordinate System.
10 Change of Basis.
11 Element Coordinate System for Pin-Jointed Rod.
12 Example of Least Squares Fit of Data.
13 The First Two Vectors in Gram-Schmidt.
14 The Projection of One Vector onto Another.
15 Convergence of Series in Example.
16 A Function with a Jump Discontinuity.
17 The First Two Vectors in Gram-Schmidt.
18 2-DOF Mass-Spring System.
19 Mode Shapes for 2-DOF Mass-Spring System. (a) Undeformed shape, (b) Mode 1 (masses in-phase), (c) Mode 2 (masses out-of-phase).
20 3-DOF Mass-Spring System.
21 Geometrical Interpretation of Sweeping.
22 Elastic System Acted Upon by Forces.
1 Systems of Linear Equations

A system of n linear equations in n unknowns can be expressed in the form

$$\left.\begin{aligned}
A_{11}x_1 + A_{12}x_2 + A_{13}x_3 + \cdots + A_{1n}x_n &= b_1 \\
A_{21}x_1 + A_{22}x_2 + A_{23}x_3 + \cdots + A_{2n}x_n &= b_2 \\
&\ \,\vdots \\
A_{n1}x_1 + A_{n2}x_2 + A_{n3}x_3 + \cdots + A_{nn}x_n &= b_n.
\end{aligned}\right\} \tag{1.1}$$

The problem is to find the numbers x_1, x_2, ..., x_n, given the coefficients A_ij and the right-hand side b_1, b_2, ..., b_n.
In matrix notation,

$$Ax = b, \tag{1.2}$$

where A is the n × n matrix

$$A = \begin{bmatrix}
A_{11} & A_{12} & \cdots & A_{1n} \\
A_{21} & A_{22} & \cdots & A_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
A_{n1} & A_{n2} & \cdots & A_{nn}
\end{bmatrix}, \tag{1.3}$$

b is the right-hand side vector

$$b = \begin{Bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{Bmatrix}, \tag{1.4}$$

and x is the solution vector

$$x = \begin{Bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{Bmatrix}. \tag{1.5}$$
1.1 Definitions

The identity matrix I is the square diagonal matrix with 1s on the diagonal:

$$I = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}. \tag{1.6}$$

In index notation, the identity matrix is the Kronecker delta:

$$I_{ij} = \delta_{ij} = \begin{cases} 1, & i = j, \\ 0, & i \ne j. \end{cases} \tag{1.7}$$
The inverse A⁻¹ of a square matrix A is the matrix such that

$$A^{-1}A = AA^{-1} = I. \tag{1.8}$$

The inverse is unique if it exists. For example, let B be the inverse, so that AB = BA = I.
If C is another inverse, C = CI = C(AB) = (CA)B = IB = B.

A square matrix A is said to be orthogonal if

$$AA^T = A^TA = I. \tag{1.9}$$
That is, an orthogonal matrix is one whose inverse is equal to the transpose.
An upper triangular matrix has only zeros below the diagonal, e.g.,
$$U = \begin{bmatrix}
A_{11} & A_{12} & A_{13} & A_{14} & A_{15} \\
0 & A_{22} & A_{23} & A_{24} & A_{25} \\
0 & 0 & A_{33} & A_{34} & A_{35} \\
0 & 0 & 0 & A_{44} & A_{45} \\
0 & 0 & 0 & 0 & A_{55}
\end{bmatrix}. \tag{1.10}$$
Similarly, a lower triangular matrix has only zeros above the diagonal, e.g.,
$$L = \begin{bmatrix}
A_{11} & 0 & 0 & 0 & 0 \\
A_{21} & A_{22} & 0 & 0 & 0 \\
A_{31} & A_{32} & A_{33} & 0 & 0 \\
A_{41} & A_{42} & A_{43} & A_{44} & 0 \\
A_{51} & A_{52} & A_{53} & A_{54} & A_{55}
\end{bmatrix}. \tag{1.11}$$
A tridiagonal matrix has all its nonzeros on the main diagonal and the two adjacent diagonals,
e.g.,
$$A = \begin{bmatrix}
\times & \times & 0 & 0 & 0 & 0 \\
\times & \times & \times & 0 & 0 & 0 \\
0 & \times & \times & \times & 0 & 0 \\
0 & 0 & \times & \times & \times & 0 \\
0 & 0 & 0 & \times & \times & \times \\
0 & 0 & 0 & 0 & \times & \times
\end{bmatrix}. \tag{1.12}$$
Consider two three-dimensional vectors u and v given by
$$u = u_1 e_1 + u_2 e_2 + u_3 e_3, \quad v = v_1 e_1 + v_2 e_2 + v_3 e_3, \tag{1.13}$$

where e_1, e_2, and e_3 are the mutually orthogonal unit basis vectors in the Cartesian coordinate directions, and u_1, u_2, u_3, v_1, v_2, and v_3 are the vector components. The dot product (or inner product) of u and v is

$$u \cdot v = (u_1 e_1 + u_2 e_2 + u_3 e_3) \cdot (v_1 e_1 + v_2 e_2 + v_3 e_3) = u_1 v_1 + u_2 v_2 + u_3 v_3 = \sum_{i=1}^{3} u_i v_i, \tag{1.14}$$

where

$$e_i \cdot e_j = \delta_{ij}. \tag{1.15}$$
Figure 1: Two Vectors. (A triangle with sides a, b, and c = a − b; the angle between a and b is θ, and the legs a − b cos θ and b sin θ are indicated.)
In n dimensions,

$$u \cdot v = \sum_{i=1}^{n} u_i v_i, \tag{1.16}$$

where n is the dimension of each vector (the number of components). There are several notations for this product, all of which are equivalent:

    inner product notation:  (u, v)
    vector notation:         u · v
    index notation:          Σ_{i=1}^{n} u_i v_i
    matrix notation:         uᵀv or vᵀu
The scalar product can also be written in matrix notation as

$$u^T v = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} \begin{Bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{Bmatrix}. \tag{1.17}$$
For a vector u, the length of u is

$$u = |u| = \sqrt{u \cdot u} = \sqrt{\sum_{i=1}^{n} u_i u_i} = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2}. \tag{1.18}$$

A unit vector is a vector whose length is unity. Two vectors u and v are orthogonal if u · v = 0.
Consider the triangle defined by three vectors a, b, and c (Fig. 1), where

$$c = a - b, \tag{1.19}$$

in which case

$$(a - b\cos\theta)^2 + (b\sin\theta)^2 = c^2, \tag{1.20}$$

where a, b, and c are the lengths of a, b, and c, respectively, and θ is the angle between a and b. Eq. 1.20 can be expanded to yield

$$a^2 - 2ab\cos\theta + b^2\cos^2\theta + b^2\sin^2\theta = c^2 \tag{1.21}$$

or

$$c^2 = a^2 + b^2 - 2ab\cos\theta, \tag{1.22}$$

which is the law of cosines. We can further expand the law of cosines in terms of components to obtain

$$(a_1 - b_1)^2 + (a_2 - b_2)^2 + (a_3 - b_3)^2 = a_1^2 + a_2^2 + a_3^2 + b_1^2 + b_2^2 + b_3^2 - 2ab\cos\theta \tag{1.23}$$

or

$$a_1 b_1 + a_2 b_2 + a_3 b_3 = ab\cos\theta. \tag{1.24}$$

Thus, the dot product of two vectors can alternatively be expressed as

$$a \cdot b = |a||b|\cos\theta = ab\cos\theta, \tag{1.25}$$

where θ is the angle between the two vectors.
1.2 Nonsingular Systems of Equations

If the linear system Ax = b has a unique solution x for every right-hand side b, the system is said to be nonsingular. Usually, one refers to the matrix rather than the system of equations as being nonsingular, since this property is determined by the coefficients A_ij of the matrix and does not depend on the right-hand side b. If A is singular (i.e., not nonsingular), then, for some right-hand sides, the system Ax = b will be inconsistent and have no solutions, and, for other right-hand sides, the equations will be dependent and have an infinite number of solutions.

For example, the system

$$\left.\begin{aligned} 2x_1 + 3x_2 &= 5 \\ 4x_1 + 6x_2 &= 1 \end{aligned}\right\} \tag{1.26}$$

is inconsistent and has no solution. The system

$$\left.\begin{aligned} 2x_1 + 3x_2 &= 5 \\ 4x_1 + 6x_2 &= 10 \end{aligned}\right\} \tag{1.27}$$

is redundant (i.e., has dependent equations) and has an infinite number of solutions. The only possibilities for a square system of linear equations are that the system have no solution, one solution, or an infinite number of solutions.
1.3 Diagonal and Triangular Systems

Several types of systems of equations are particularly easy to solve, including diagonal and triangular systems. For the diagonal matrix

$$A = \begin{bmatrix}
A_{11} & & & & \\
& A_{22} & & & \\
& & A_{33} & & \\
& & & \ddots & \\
& & & & A_{nn}
\end{bmatrix}, \tag{1.28}$$

the system Ax = b has the solution

$$x_1 = \frac{b_1}{A_{11}}, \quad x_2 = \frac{b_2}{A_{22}}, \quad \ldots, \quad x_n = \frac{b_n}{A_{nn}}, \tag{1.29}$$

assuming that, for each i, A_ii ≠ 0. If A_kk = 0 for some k, then x_k does not exist if b_k ≠ 0, and x_k is arbitrary if b_k = 0.
Consider the upper triangular system (A_ij = 0 if i > j)

$$\left.\begin{aligned}
A_{11}x_1 + A_{12}x_2 + A_{13}x_3 + \cdots + A_{1n}x_n &= b_1 \\
A_{22}x_2 + A_{23}x_3 + \cdots + A_{2n}x_n &= b_2 \\
A_{33}x_3 + \cdots + A_{3n}x_n &= b_3 \\
&\ \,\vdots \\
A_{n-1,n-1}x_{n-1} + A_{n-1,n}x_n &= b_{n-1} \\
A_{nn}x_n &= b_n.
\end{aligned}\right\} \tag{1.30}$$
This system is solved by backsolving:

$$x_n = b_n / A_{nn}, \tag{1.31}$$

$$x_{n-1} = (b_{n-1} - A_{n-1,n}\,x_n)/A_{n-1,n-1}, \tag{1.32}$$

and so on until all components of x are found.
The backsolving algorithm can be summarized as follows:

1. Input n, A, b
2. x_n = b_n / A_nn
3. For k = n−1, n−2, ..., 1:  [k = row number]
       x_k = b_k
       For i = k+1, k+2, ..., n:
           x_k = x_k − A_ki x_i
       x_k = x_k / A_kk
4. Output x

Since the only impediments to the successful completion of this algorithm are the divisions, it is apparent that an upper triangular system is nonsingular if, and only if, A_ii ≠ 0 for all i. Alternatively, if any A_ii = 0, the system is singular.
For example, consider the upper triangular system

$$\left.\begin{aligned}
x_1 - 2x_2 + x_3 &= 1 \\
2x_2 + 4x_3 &= 10 \\
-2x_3 &= 8.
\end{aligned}\right\} \tag{1.33}$$

From the backsolving algorithm, the solution is found as

$$\begin{aligned}
x_3 &= -4, \\
x_2 &= [10 - 4(-4)]/2 = 13, \\
x_1 &= 1 + 2(13) - (-4) = 31.
\end{aligned} \tag{1.34}$$
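As a concrete check, the backsolving algorithm above translates almost line-for-line into Python (0-based indexing; the function and variable names here are ours, not from the text). Applied to the example system of Eq. 1.33, it reproduces the solution in Eq. 1.34:

```python
def backsolve(A, b):
    """Solve an upper triangular system A x = b by back substitution,
    working from the last equation up to the first."""
    n = len(b)
    x = [0.0] * n
    for k in range(n - 1, -1, -1):      # row number, from bottom to top
        s = b[k]
        for i in range(k + 1, n):       # subtract known terms A_ki * x_i
            s -= A[k][i] * x[i]
        x[k] = s / A[k][k]
    return x

A = [[1.0, -2.0, 1.0],
     [0.0,  2.0, 4.0],
     [0.0,  0.0, -2.0]]
b = [1.0, 10.0, 8.0]
print(backsolve(A, b))  # [31.0, 13.0, -4.0]
```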
A variation on the back-solving algorithm listed above is obtained if we return the solution in the right-hand side array b. That is, array x could be eliminated by overwriting b and storing the solution in b. With this variation, the back-solving algorithm becomes

1. Input n, A, b
2. b_n = b_n / A_nn
3. For k = n−1, n−2, ..., 1:  [k = row number]
       For i = k+1, k+2, ..., n:
           b_k = b_k − A_ki b_i
       b_k = b_k / A_kk
4. Output b

On output, the solution is returned in b, and the original b is destroyed.
A lower triangular system Ax = b is one for which A_ij = 0 if i < j. This system can be solved directly by forward solving in a manner similar to backsolving for upper triangular systems.
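Forward solving mirrors the backsolve but proceeds from the first equation to the last; a minimal Python sketch (illustrative names, small made-up example):

```python
def forwardsolve(A, b):
    """Solve a lower triangular system A x = b by forward substitution,
    proceeding from the first equation to the last."""
    n = len(b)
    x = [0.0] * n
    for k in range(n):
        s = b[k]
        for i in range(k):              # subtract known terms A_ki * x_i
            s -= A[k][i] * x[i]
        x[k] = s / A[k][k]
    return x

A = [[2.0, 0.0, 0.0],
     [1.0, 3.0, 0.0],
     [1.0, 1.0, 1.0]]
print(forwardsolve(A, [4.0, 11.0, 6.0]))  # [2.0, 3.0, 1.0]
```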
1.4 Gaussian Elimination

The approach widely used to solve general systems of linear equations is Gaussian elimination, for which the procedure is to transform the general system into upper triangular form, and then backsolve. The transformation is based on the following elementary operations, which have no effect on the solution of the system:

1. An equation can be multiplied by a nonzero constant.
2. Two equations can be interchanged.
3. Two equations can be added together and either equation replaced by the sum.
4. An equation can be multiplied by a nonzero constant, added to another equation, and the result used to replace either of the first two equations. (This operation is a combination of the first and third operations.)
We illustrate Gaussian elimination with an example. Consider the system

$$\left.\begin{aligned}
-x_1 + 2x_2 + 3x_3 + x_4 &= 1 \\
-3x_1 + 8x_2 + 8x_3 + x_4 &= 2 \\
x_1 + 2x_2 - 6x_3 + 4x_4 &= -1 \\
2x_1 - 4x_2 - 5x_3 - x_4 &= 0.
\end{aligned}\right\} \tag{1.35}$$

Step 1: Eliminate all coefficients of x_1 except for that in the first equation:

$$\left.\begin{aligned}
-x_1 + 2x_2 + 3x_3 + x_4 &= 1 \\
2x_2 - x_3 - 2x_4 &= -1 \\
4x_2 - 3x_3 + 5x_4 &= 0 \\
x_3 + x_4 &= 2.
\end{aligned}\right\} \tag{1.36}$$
Step 2: Eliminate all coefficients of x_2 except for those in the first two equations:

$$\left.\begin{aligned}
-x_1 + 2x_2 + 3x_3 + x_4 &= 1 \\
2x_2 - x_3 - 2x_4 &= -1 \\
-x_3 + 9x_4 &= 2 \\
x_3 + x_4 &= 2.
\end{aligned}\right\} \tag{1.37}$$

Step 3: Eliminate all coefficients of x_3 except for those in the first three equations:

$$\left.\begin{aligned}
-x_1 + 2x_2 + 3x_3 + x_4 &= 1 \\
2x_2 - x_3 - 2x_4 &= -1 \\
-x_3 + 9x_4 &= 2 \\
10x_4 &= 4.
\end{aligned}\right\} \tag{1.38}$$
This system is now in upper triangular form, and we could complete the solution by backsolving the upper triangular system. Notice in the above procedure that we work with both
sides of the equations simultaneously.

Notice also that we could save some effort in the above procedure by not writing the unknowns at each step, and instead use matrix notation. We define a matrix consisting of the left-hand side coefficient matrix augmented by the right-hand side, and apply to this matrix the same elementary operations used in the three steps above:
$$\begin{bmatrix}
-1 & 2 & 3 & 1 & 1 \\
-3 & 8 & 8 & 1 & 2 \\
1 & 2 & -6 & 4 & -1 \\
2 & -4 & -5 & -1 & 0
\end{bmatrix}
\longrightarrow
\begin{bmatrix}
-1 & 2 & 3 & 1 & 1 \\
0 & 2 & -1 & -2 & -1 \\
0 & 4 & -3 & 5 & 0 \\
0 & 0 & 1 & 1 & 2
\end{bmatrix} \tag{1.39}$$

$$\longrightarrow
\begin{bmatrix}
-1 & 2 & 3 & 1 & 1 \\
0 & 2 & -1 & -2 & -1 \\
0 & 0 & -1 & 9 & 2 \\
0 & 0 & 1 & 1 & 2
\end{bmatrix}
\longrightarrow
\begin{bmatrix}
-1 & 2 & 3 & 1 & 1 \\
0 & 2 & -1 & -2 & -1 \\
0 & 0 & -1 & 9 & 2 \\
0 & 0 & 0 & 10 & 4
\end{bmatrix}. \tag{1.40}$$
Note that this reduction can be performed “in-place” (i.e., by writing over the matrix and
destroying both the original matrix and the right-hand side).
Multiple right-hand sides can be handled simultaneously. For example, if we want the solution of Ax_1 = b_1 and Ax_2 = b_2, we augment the coefficient matrix by both right-hand sides:

$$\begin{bmatrix}
-1 & 2 & 3 & 1 & 1 & 2 \\
-3 & 8 & 8 & 1 & 2 & 3 \\
1 & 2 & -6 & 4 & -1 & 3 \\
2 & -4 & -5 & -1 & 0 & -1
\end{bmatrix}.$$
The Gaussian elimination algorithm (reduction to upper triangular form) can now be summarized as follows:

1. Input n, A, b
2. For k = 1, 2, ..., n−1:  [k = pivot row number]
       For i = k+1, k+2, ..., n:  [i = row number below k]
           m = −A_ik / A_kk  [m = row multiplier]
           For j = k+1, k+2, ..., n:  [j = column number to right of k]
               A_ij = A_ij + m A_kj
           b_i = b_i + m b_k  [rhs]
3. Output A, b
The output of this procedure is an upper triangular system which can be solved by back-
solving.
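The elimination algorithm above can be sketched directly in Python (0-based indexing; the function name is ours). Run on the example system of Eq. 1.35, it reproduces the upper triangular form of Eq. 1.40:

```python
def gauss_eliminate(A, b):
    """Reduce A x = b to upper triangular form by Gaussian elimination
    (no pivoting), modifying A and b in place."""
    n = len(b)
    for k in range(n - 1):             # pivot row
        for i in range(k + 1, n):      # rows below the pivot
            m = -A[i][k] / A[k][k]     # row multiplier
            for j in range(k, n):      # columns from the pivot rightward
                A[i][j] += m * A[k][j]
            b[i] += m * b[k]
    return A, b

A = [[-1.0, 2.0, 3.0, 1.0],
     [-3.0, 8.0, 8.0, 1.0],
     [1.0, 2.0, -6.0, 4.0],
     [2.0, -4.0, -5.0, -1.0]]
b = [1.0, 2.0, -1.0, 0.0]
gauss_eliminate(A, b)
print(A[3], b)  # [0.0, 0.0, 0.0, 10.0] [1.0, -1.0, 2.0, 4.0]
```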
Note that each multiplier m could be saved in A_ik (which is in the lower triangle and not used in the above algorithm) so that additional right-hand sides could be solved later without having to reduce the original system again. Thus, the Gaussian elimination algorithm is modified as follows:

1. Input n, A, b
2. For k = 1, 2, ..., n−1:  [k = pivot row number]
       For i = k+1, k+2, ..., n:  [i = row number below k]
           A_ik = −A_ik / A_kk  [multiplier stored in A_ik]
           For j = k+1, k+2, ..., n:  [j = column number to right of k]
               A_ij = A_ij + A_ik A_kj
           b_i = b_i + A_ik b_k  [rhs]
3. Output A, b
Note further that, if the multipliers are saved in A_ik, the loops involving the coefficient matrix A can (and should) be separated from those involving the right-hand side b:

1. Input n, A, b
2. Elimination
   For k = 1, 2, ..., n−1:  [k = pivot row number]
       For i = k+1, k+2, ..., n:  [i = row number below k]
           A_ik = −A_ik / A_kk  [multiplier stored in A_ik]
           For j = k+1, k+2, ..., n:  [j = column number to right of k]
               A_ij = A_ij + A_ik A_kj
3. RHS modification
   For k = 1, 2, ..., n−1:  [k = pivot row number]
       For i = k+1, k+2, ..., n:  [i = row number below k]
           b_i = b_i + A_ik b_k  [rhs]
4. Output A, b
If the coefficient matrix is banded, the loops above can be shortened to avoid unnecessary operations on zeros. A band matrix, which occurs frequently in applications, is one for which the nonzeros are clustered about the main diagonal. We define the matrix semi-bandwidth w as the maximum "distance" of any nonzero off-diagonal term from the diagonal (including the diagonal). Thus, for a band matrix, the upper limit n in the loops associated with pivot row k above can be replaced by

$$k_{max} = \min(k + w - 1, n). \tag{1.41}$$
The modified algorithm thus becomes

1. Input n, A, b, w  [w = matrix semi-bandwidth]
2. Elimination
   For k = 1, 2, ..., n−1:  [k = pivot row number]
       k_max = min(k + w − 1, n)  [replaces n in the loops]
       For i = k+1, k+2, ..., k_max:  [i = row number below k]
           A_ik = −A_ik / A_kk  [multiplier stored in A_ik]
           For j = k+1, k+2, ..., k_max:  [j = column number to right of k]
               A_ij = A_ij + A_ik A_kj
3. RHS modification
   For k = 1, 2, ..., n−1:  [k = pivot row number]
       k_max = min(k + w − 1, n)  [replaces n in the loop]
       For i = k+1, k+2, ..., k_max:  [i = row number below k]
           b_i = b_i + A_ik b_k  [rhs]
4. Output A, b
A similar modiﬁcation is made to the back-solve algorithm.
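A band-limited version of the elimination and backsolve can be sketched in Python (names are ours; in 0-based indexing the loop bound k_max = min(k + w − 1, n) becomes min(k + w, n)). Here it is exercised on a small tridiagonal system (semi-bandwidth w = 2) whose exact solution is x = (1, 1, 1, 1):

```python
def gauss_eliminate_banded(A, b, w):
    """Banded Gaussian elimination (no pivoting): loops truncated at
    kmax = min(k + w, n) so zeros outside the band are never touched."""
    n = len(b)
    for k in range(n - 1):
        kmax = min(k + w, n)
        for i in range(k + 1, kmax):
            m = -A[i][k] / A[k][k]
            for j in range(k, kmax):
                A[i][j] += m * A[k][j]
            b[i] += m * b[k]
    return A, b

def backsolve_banded(A, b, w):
    """Companion band-limited back substitution."""
    n = len(b)
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = b[k]
        for i in range(k + 1, min(k + w, n)):
            s -= A[k][i] * x[i]
        x[k] = s / A[k][k]
    return x

# Tridiagonal system (w = 2) with known solution x = (1, 1, 1, 1)
A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
b = [1.0, 0.0, 0.0, 1.0]
gauss_eliminate_banded(A, b, 2)
x = backsolve_banded(A, b, 2)
print([round(v, 12) for v in x])  # [1.0, 1.0, 1.0, 1.0]
```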
1.5 Operation Counts

Except for small systems of equations (small n), we can estimate the number of operations required to execute an algorithm by looking only at the operations in the innermost loop. We define an operation as the combination of a multiply and add, since they generally are executed together. Gaussian elimination involves a triply-nested loop. For fixed row number k, the inner loops on i and j are executed n − k times each. Hence,

$$\sum_{k=1}^{n-1}(n-k)^2 = \sum_{m=n-1}^{1} m^2 = \sum_{m=1}^{n-1} m^2 \approx \int_0^{n-1} m^2\,dm = (n-1)^3/3 \approx n^3/3, \tag{1.42}$$

where the approximations all depend on n's being large. Thus, for a fully-populated matrix, Gaussian elimination requires about n³/3 operations.

For a band matrix with a semi-bandwidth w ≪ n, w² operations are performed for each of the n rows (to first order). Thus, for a general band matrix, Gaussian elimination requires about nw² operations.

For backsolving, we consider Step 3 of the backsolve algorithm. In Row k (counting from the bottom), there are approximately k multiply-add operations. Hence,

$$\sum_{k=1}^{n} k = n(n+1)/2 \approx n^2/2. \tag{1.43}$$

Thus, backsolving requires about n²/2 operations for each right-hand side.

Notice that, for large n, the elimination step has many more operations than does the backsolve.
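The estimate in Eq. 1.42 is easy to check numerically; this short script (ours, not from the text) counts the multiply-add operations in the elimination triple loop exactly and compares with n³/3:

```python
def elimination_op_count(n):
    """Exact multiply-add count for Gaussian elimination: the i and j loops
    each run n - k times for pivot row k, giving the sum over k of (n-k)^2."""
    return sum((n - k) ** 2 for k in range(1, n))

n = 100
print(elimination_op_count(n), n ** 3 // 3)  # 328350 333333
```

For n = 100 the exact count (328,350) is already within about 1.5% of the n³/3 estimate.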
1.6 Partial Pivoting

For Gaussian elimination to work, the (current) diagonal entry A_ii must be nonzero at each intermediate step. For example, consider the same example used earlier, but with the equations rearranged:

$$\begin{bmatrix}
-1 & 2 & 3 & 1 & 1 \\
2 & -4 & -5 & -1 & 0 \\
-3 & 8 & 8 & 1 & 2 \\
1 & 2 & -6 & 4 & -1
\end{bmatrix}
\longrightarrow
\begin{bmatrix}
-1 & 2 & 3 & 1 & 1 \\
0 & 0 & 1 & 1 & 2 \\
0 & 2 & -1 & -2 & -1 \\
0 & 4 & -3 & 5 & 0
\end{bmatrix}. \tag{1.44}$$

In this case, the next step would fail, since A_22 = 0. The solution to this problem is to interchange two equations to avoid the zero divisor. If the system is nonsingular, this interchange is always possible.

However, complications arise in Gaussian elimination not only when A_ii = 0 at some step but also sometimes when A_ii is small. For example, consider the system

$$\left.\begin{aligned}
(1.000 \times 10^{-5})x_1 + 1.000x_2 &= 1.000 \\
1.000x_1 + 1.000x_2 &= 2.000,
\end{aligned}\right\} \tag{1.45}$$

the solution of which is (x_1, x_2) = (1.00001000..., 0.99998999...). Assume that we perform the Gaussian elimination on a computer which has four digits of precision. After the one-step elimination, we obtain

$$\left.\begin{aligned}
(1.000 \times 10^{-5})x_1 + 1.000x_2 &= 1.000 \\
-1.000 \times 10^{5}\, x_2 &= -1.000 \times 10^{5},
\end{aligned}\right\} \tag{1.46}$$

whose solution is x_2 = 1.000, x_1 = 0.000.

If we now interchange the original equations, the system is

$$\left.\begin{aligned}
1.000x_1 + 1.000x_2 &= 2.000 \\
(1.000 \times 10^{-5})x_1 + 1.000x_2 &= 1.000,
\end{aligned}\right\} \tag{1.47}$$

which, after elimination, yields

$$\left.\begin{aligned}
1.000x_1 + 1.000x_2 &= 2.000 \\
1.000x_2 &= 1.000,
\end{aligned}\right\} \tag{1.48}$$

whose solution is x_2 = 1.000, x_1 = 1.000, which is correct to the precision available.

We conclude from this example that it is better to interchange equations at Step i so that |A_ii| is as large as possible. The interchanging of equations is called partial pivoting. Thus an alternative version of the Gaussian elimination algorithm would be that given previously except that, at each step (Step i), the ith equation is interchanged with one below so that |A_ii| is as large as possible. In actual practice, the "interchange" can alternatively be performed by interchanging the equation subscripts rather than the equations themselves.
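Partial pivoting adds one step to the elimination loop; a Python sketch (names ours), applied to the ill-conditioned system of Eq. 1.45, recovers both solution components accurately:

```python
def gauss_pivot(A, b):
    """Gaussian elimination with partial pivoting: before eliminating below
    row k, swap in the row (k or below) whose column-k entry is largest in
    magnitude, so the divisor |A[k][k]| is as large as possible."""
    n = len(b)
    for k in range(n - 1):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))  # pivot row
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = -A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] += m * A[k][j]
            b[i] += m * b[k]
    return A, b

# The system of Eq. 1.45: pivoting swaps the two rows before eliminating
A = [[1.0e-5, 1.0], [1.0, 1.0]]
b = [1.0, 2.0]
gauss_pivot(A, b)
x2 = b[1] / A[1][1]
x1 = (b[0] - A[0][1] * x2) / A[0][0]
print(round(x1, 8), round(x2, 8))  # ≈ 1.00001 0.99999
```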
1.7 LU Decomposition

Define M_ik as the identity matrix with the addition of the number m_ik in Row i, Column k:

$$M_{ik} = \begin{bmatrix}
1 & & & & \\
& 1 & & & \\
& & 1 & & \\
& m_{ik} & & \ddots & \\
& & & & 1
\end{bmatrix}. \tag{1.49}$$

For example, for n = 3, M_31 is

$$M_{31} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ m_{31} & 0 & 1 \end{bmatrix}. \tag{1.50}$$

We now consider the effect of multiplying a matrix A by M_ik. For example, for n = 3,

$$M_{31}A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ m_{31} & 0 & 1 \end{bmatrix}
\begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{bmatrix} \tag{1.51}$$

$$= \begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ m_{31}A_{11} + A_{31} & m_{31}A_{12} + A_{32} & m_{31}A_{13} + A_{33} \end{bmatrix}. \tag{1.52}$$

Thus, when A is multiplied by M_31, we obtain a new matrix which has Row 3 replaced by the sum of Row 3 and m_31 × Row 1. In general, multiplication by M_ik has the effect of replacing Row i by Row i + m_ik × Row k. Thus, each step of the Gaussian elimination process is equivalent to multiplying A by some M_ik. For example, for n = 4 (a 4 × 4 matrix), the entire Gaussian elimination can be written as

$$M_{43}M_{42}M_{32}M_{41}M_{31}M_{21}A = U, \tag{1.53}$$

where

$$m_{ik} = -\frac{A_{ik}}{A_{kk}} \tag{1.54}$$

are the multipliers in Gaussian elimination, and U is the upper triangular matrix obtained in the Gaussian elimination. Note that the multipliers m_ik in Eq. 1.54 are computed using the current matrix entries obtained during Gaussian elimination.
If each M_ik⁻¹ exists, then, for n = 4,

$$A = (M_{43}M_{42}M_{32}M_{41}M_{31}M_{21})^{-1}U = M_{21}^{-1}M_{31}^{-1}M_{41}^{-1}M_{32}^{-1}M_{42}^{-1}M_{43}^{-1}U, \tag{1.55}$$

since the inverse of a matrix product is equal to the product of the matrix inverses in reverse order. We observe, by example, that M_ik⁻¹ exists and is obtained from M_ik simply by changing the sign of m_ik (the ik element), since, for n = 4 and M_42,

$$\begin{bmatrix} 1 & & & \\ & 1 & & \\ & & 1 & \\ & m_{42} & & 1 \end{bmatrix}
\begin{bmatrix} 1 & & & \\ & 1 & & \\ & & 1 & \\ & -m_{42} & & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & m_{42} - m_{42} & 0 & 1 \end{bmatrix} = I. \tag{1.56}$$

From Eq. 1.55, for n = 3, we obtain

$$A = M_{21}^{-1}M_{31}^{-1}M_{32}^{-1}U, \tag{1.57}$$

where
$$M_{21}^{-1}M_{31}^{-1}M_{32}^{-1} =
\begin{bmatrix} 1 & & \\ -m_{21} & 1 & \\ & & 1 \end{bmatrix}
\begin{bmatrix} 1 & & \\ & 1 & \\ -m_{31} & & 1 \end{bmatrix}
\begin{bmatrix} 1 & & \\ & 1 & \\ & -m_{32} & 1 \end{bmatrix} \tag{1.58}$$

$$= \begin{bmatrix} 1 & & \\ -m_{21} & 1 & \\ & & 1 \end{bmatrix}
\begin{bmatrix} 1 & & \\ & 1 & \\ -m_{31} & -m_{32} & 1 \end{bmatrix} \tag{1.59}$$

$$= \begin{bmatrix} 1 & & \\ -m_{21} & 1 & \\ -m_{31} & -m_{32} & 1 \end{bmatrix}. \tag{1.60}$$
In general,

$$M_{21}^{-1}M_{31}^{-1}\cdots M_{n1}^{-1}\cdots M_{n,n-1}^{-1} =
\begin{bmatrix}
1 & & & & \\
-m_{21} & 1 & & & \\
-m_{31} & -m_{32} & 1 & & \\
\vdots & \vdots & & \ddots & \\
-m_{n1} & -m_{n2} & \cdots & -m_{n,n-1} & 1
\end{bmatrix}, \tag{1.61}$$

and, for any nonsingular matrix A,

$$A = LU, \tag{1.62}$$
where L is the lower triangular matrix

$$L = \begin{bmatrix}
1 & & & & \\
-m_{21} & 1 & & & \\
-m_{31} & -m_{32} & 1 & & \\
\vdots & \vdots & & \ddots & \\
-m_{n1} & -m_{n2} & \cdots & -m_{n,n-1} & 1
\end{bmatrix}, \tag{1.63}$$
and U is the upper triangular matrix obtained in Gaussian elimination. A lower triangular
matrix with 1s on the diagonal is referred to as a lower unit triangular matrix.
The decomposition A = LU is referred to as the “LU decomposition” or “LU factoriza-
tion” of A. Note that the LU factors are obtained directly from the Gaussian elimination
algorithm, since U is the upper triangular result, and L consists of the negatives of the
row multipliers. Thus, the LU decomposition is merely a matrix interpretation of Gaussian
elimination.
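A minimal in-place LU factorization in Python (no pivoting; names ours) makes the connection explicit: after the call, U occupies the upper triangle and the entries of L (the negatives of the elimination multipliers) occupy the strict lower triangle, exactly the storage scheme described next:

```python
def lu_decompose(A):
    """In-place LU factorization without pivoting: on return, the upper
    triangle of A holds U and the strict lower triangle holds the L
    entries (the negatives of the row multipliers m_ik)."""
    n = len(A)
    for k in range(n - 1):
        for i in range(k + 1, n):
            A[i][k] = A[i][k] / A[k][k]      # L[i][k] = -m_ik
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return A

A = [[2.0, 1.0, 1.0],
     [4.0, -6.0, 0.0],
     [-2.0, 7.0, 2.0]]
lu_decompose(A)
print(A)  # [[2.0, 1.0, 1.0], [2.0, -8.0, -2.0], [-1.0, -1.0, 1.0]]
```

The result packs L (unit diagonal implied) and U into one array, as in the storage scheme below.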
It is also useful to show a common storage scheme for the LU decomposition. If A is factored in place (i.e., the storage locations for A are over-written during the Gaussian elimination process), both L and U can be stored in the same n² storage locations used for A. In particular,

$$\begin{bmatrix}
A_{11} & A_{12} & A_{13} & \cdots & A_{1n} \\
A_{21} & A_{22} & A_{23} & \cdots & A_{2n} \\
A_{31} & A_{32} & A_{33} & \cdots & A_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
A_{n1} & A_{n2} & A_{n3} & \cdots & A_{nn}
\end{bmatrix}
\quad\text{is replaced by}\quad
\begin{bmatrix}
U_{11} & U_{12} & U_{13} & \cdots & U_{1n} \\
L_{21} & U_{22} & U_{23} & \cdots & U_{2n} \\
L_{31} & L_{32} & U_{33} & \cdots & U_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
L_{n1} & L_{n2} & L_{n3} & \cdots & U_{nn}
\end{bmatrix},$$

where the L_ij entries are the negatives of the multipliers obtained during Gaussian elimination. Since L is a lower unit triangular matrix (with 1s on the diagonal), it is not necessary to store the 1s. Each L_ij entry can be stored as it is computed.
We have seen how a lower unit triangular matrix can be formed from the product of M matrices. It can also be shown that the inverse of a matrix product is the product of the individual inverses in reverse order. We have also seen that the inverse of an M matrix is another M matrix: the one with the off-diagonal term negated. Thus, one might infer that the inverse of a lower unit triangular matrix is equal to the same matrix with the terms below the diagonal negated. However, such an inference would be incorrect, since, although the inverse of a lower unit triangular matrix is a lower triangular matrix equal to the product of M matrices, the M matrices are in the wrong order. Thus, the inverse of a lower unit triangular matrix is not equal to the same matrix with the terms below the diagonal negated, as the following simple example illustrates. Consider the matrix
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 5 & 1 & 0 \\ 1 & 8 & 1 \end{bmatrix}. \tag{1.64}$$

We note that

$$A^{-1} \ne \begin{bmatrix} 1 & 0 & 0 \\ -5 & 1 & 0 \\ -1 & -8 & 1 \end{bmatrix}, \tag{1.65}$$

since

$$\begin{bmatrix} 1 & 0 & 0 \\ 5 & 1 & 0 \\ 1 & 8 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ -5 & 1 & 0 \\ -1 & -8 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -40 & 0 & 1 \end{bmatrix} \ne I. \tag{1.66}$$
To see why we cannot compute A⁻¹ by simply negating the terms of A below the diagonal, consider the three matrices

$$B = \begin{bmatrix} 1 & 0 & 0 \\ 5 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
C = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}, \quad
D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 8 & 1 \end{bmatrix}. \tag{1.67}$$

Is A = BCD or A = DCB or both? Notice that

$$BCD = \begin{bmatrix} 1 & 0 & 0 \\ 5 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 8 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 5 & 1 & 0 \\ 1 & 8 & 1 \end{bmatrix} = A, \tag{1.68}$$

and

$$DCB = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 8 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 5 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 5 & 1 & 0 \\ 41 & 8 & 1 \end{bmatrix} \ne A. \tag{1.69}$$

Thus, we conclude that order matters when M matrices are multiplied, and

$$BCD \ne DCB. \tag{1.70}$$

That is, the nice behavior which occurs when multiplying M matrices occurs only if the matrices are multiplied in the right order: the same order as that in which the Gaussian elimination multipliers are computed. To complete this discussion, we compute A⁻¹ as

$$A^{-1} = D^{-1}C^{-1}B^{-1} =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -8 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ -5 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ -5 & 1 & 0 \\ 39 & -8 & 1 \end{bmatrix}. \tag{1.71}$$
1.8 Matrix Interpretation of Back Substitution

Consider the linear system Ax = b, where A has been factored into A = LU. Then

$$LUx = b. \tag{1.72}$$

If we now define Ux = y, Eq. 1.72 is equivalent to the pair of equations

$$\begin{cases} Ly = b, \\ Ux = y, \end{cases} \tag{1.73}$$

which can be solved in two steps, first to compute y and then to compute x. The combination of Gaussian elimination and backsolve can therefore be interpreted as the sequence of two matrix solutions: Eq. 1.73a is the modification of the original right-hand side b by the multipliers to yield y. Eq. 1.73b is the backsolve, since the backsolve operation involves the coefficient matrix U to yield the solution x. The two steps in Eq. 1.73 are often referred to as forward substitution and back substitution, respectively. The combination is sometimes called the forward-backward substitution (FBS). Since the forward and back substitutions each require n²/2 operations for a fully populated matrix, FBS requires n² operations.
A good example of why equations are solved with LU factorization followed by FBS
is provided by transient structural dynamics. Transient response analysis of structures is
concerned with computing the forced response due to general time-dependent loads (e.g.,
blast or shock). Linear finite element modeling results in the matrix differential equation
$$M\ddot{u} + B\dot{u} + Ku = F(t), \tag{1.74}$$

where u(t) is the vector of unknown displacements, t is time, M is the system mass matrix, B is the viscous damping matrix, K is the system stiffness matrix, F is the time-dependent vector of applied forces, and dots denote differentiation with respect to the time t. M, B, and K are constant (independent of time). The problem is to determine u(t), given the initial displacement u_0 and initial velocity u̇_0.
There are several numerical approaches for solving this system of equations, one of which
integrates the equations directly in the time domain. Another approach, the modal super-
position approach, will be discussed as an application of eigenvalue problems in Section 6.10
(p. 88). An example of a direct integrator is the Newmark-beta ﬁnite diﬀerence method (in
time), which results in the recursive relation
_
1
∆t
2
M +
1
2∆t
B+
1
3
K
_
u
n+1
=
1
3
(F
n+1
+F
n
+F
n−1
)
+
_
2
∆t
2
M−
1
3
K
_
u
n
+
_

1
∆t
2
M+
1
2∆t
B−
1
3
K
_
u
n−1
,
(1.75)
where ∆t is the integration time step, and u
n
is the solution at the nth step. If the current
time solution is u
n
, this matrix system is a system of linear algebraic equations whose solution
u
n+1
is the displacement solution at the next time step. The right-hand side depends on
the current solution u
n
and the immediately preceding solution u
n−1
. Since the original
diﬀerential equation is second-order, it makes sense that two consecutive time solutions
are needed to move forward, since a minimum of three consecutive time values is needed to
estimate a second derivative using ﬁnite diﬀerences. Newmark-beta uses the initial conditions
in a special start-up procedure to obtain u
0
and u
−1
to get the process started.
There are various issues with integrators (including stability, time step selection, and the
start-up procedure), but our interest here is in the computational eﬀort. Notice that, in
Eq. 1.75, the coeﬃcient matrices in parentheses that involve the time-independent matrices
M, B, and K do not change during the integration unless the time step ∆t changes. Thus,
Eq. 1.75 is an example of a linear equation system of the form Ax = b for which many
solutions are needed (one for each right-hand side b) for the same coeﬃcient matrix A, but
the right-hand sides are not known in advance. Thus, the solution strategy is to factor the
left-hand side coeﬃcient matrix once into LU, save the factors, form the new rhs at each step,
and then complete the solution with FBS. The computational eﬀort is therefore one matrix
decomposition (LU) for each unique ∆t followed by two matrix/vector multiplications (to form
the rhs) and one FBS at each step. Since LU costs considerably more than multiplication
or FBS, many time steps can be computed for the same eﬀort as one LU. As a result, one
should not change the time step size ∆t unnecessarily.
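The factor-once strategy can be sketched in a few lines of code. This is an illustrative sketch, not code from the course: the helper names (`lu_factor`, `fbs`), the 2-DOF system matrices, and the time step are invented for the example, and pivoting is omitted for clarity.

```python
import numpy as np

def lu_factor(A):
    """Doolittle factorization A = LU (L unit lower triangular), no pivoting."""
    n = A.shape[0]
    L, U = np.eye(n), A.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return L, U

def fbs(L, U, b):
    """Forward-backward substitution: solve L(Ux) = b."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):                        # forward: Ly = b
        y[i] = b[i] - L[i, :i] @ y[:i]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):            # backward: Ux = y
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

# Hypothetical 2-DOF system (illustrative values only)
M = np.eye(2)
B = 0.1 * np.eye(2)
K = np.array([[4.0, -1.0], [-1.0, 4.0]])
dt = 0.01

# Coefficient matrices of Eq. 1.75; the left-hand side is factored ONCE
LHS = M / dt**2 + B / (2 * dt) + K / 3
C1 = 2 * M / dt**2 - K / 3                    # multiplies u_n
C2 = -M / dt**2 + B / (2 * dt) - K / 3        # multiplies u_{n-1}
L, U = lu_factor(LHS)

u_prev = np.array([0.0, 0.0])                 # u_{-1} (start-up value assumed)
u_curr = np.array([0.01, 0.0])                # u_0
for step in range(100):                       # F(t) = 0: free-vibration demo
    rhs = C1 @ u_curr + C2 @ u_prev           # two matrix/vector multiplies
    u_prev, u_curr = u_curr, fbs(L, U, rhs)   # one FBS per step, no refactorization
```

Each step costs only the two multiplications and one FBS; the expensive LU is paid once because ∆t never changes.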
1.9 LDU Decomposition
Recall that any nonsingular square matrix A can be factored into A = LU, where L is a
lower unit triangular matrix, and U is an upper triangular matrix. The diagonal entries in
U can be factored out into a separate diagonal matrix D to yield
A = LDU, (1.76)
where D is a nonsingular diagonal matrix, and U is an upper unit triangular matrix (not
the same U as in LU).
To verify the correctness of this variation of the LU decomposition, let
LU = LDÛ.   (1.77)
For a 3 × 3 matrix,

U = [ U_11  U_12  U_13 ]   [ D_11   0     0   ] [ 1  Û_12  Û_13 ]
    [  0    U_22  U_23 ] = [  0    D_22   0   ] [ 0   1    Û_23 ] = DÛ,   (1.78)
    [  0     0    U_33 ]   [  0     0    D_33 ] [ 0   0     1   ]

where D_ii = U_ii (no sum on i), and Û_ij = U_ij/D_ii (no sum on i). That is,
[ U_11  U_12  U_13 ]   [ U_11   0     0   ] [ 1  U_12/U_11  U_13/U_11 ]
[  0    U_22  U_23 ] = [  0    U_22   0   ] [ 0      1      U_23/U_22 ] .   (1.79)
[  0     0    U_33 ]   [  0     0    U_33 ] [ 0      0          1     ]
For example,

A = [  2   1   1 ]   [  1   0   0 ] [ 2   1   1 ]
    [  4  −6   0 ] = [  2   1   0 ] [ 0  −8  −2 ]   (1.80)
    [ −2   7   2 ]   [ −1  −1   1 ] [ 0   0   1 ]

    [  1   0   0 ] [ 2   0   0 ] [ 1  1/2  1/2 ]
  = [  2   1   0 ] [ 0  −8   0 ] [ 0   1   1/4 ] = LDÛ.   (1.81)
    [ −1  −1   1 ] [ 0   0   1 ] [ 0   0    1  ]
Note that, if A is symmetric (A^T = A),

A = LDU = (LDU)^T = U^T D L^T,   (1.82)

or

U = L^T.   (1.83)

Thus, nonsingular symmetric matrices can be factored into

A = LDL^T.   (1.84)
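A quick numerical check of this splitting, using the example matrix of Eq. 1.80 and a small symmetric matrix for the LDL^T case. The factorization helper is a minimal no-pivoting sketch written for this illustration, not the author's code.

```python
import numpy as np

def lu(A):
    """Minimal Doolittle LU (no pivoting): A = L U, L unit lower triangular."""
    n = A.shape[0]
    L, U = np.eye(n), A.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return L, U

A = np.array([[2.0, 1.0, 1.0],
              [4.0, -6.0, 0.0],
              [-2.0, 7.0, 2.0]])
L, U = lu(A)
D = np.diag(np.diag(U))              # factor the diagonal of U out...
Uhat = np.diag(1 / np.diag(U)) @ U   # ...leaving a unit upper triangular factor
# Now A = L D Uhat (Eq. 1.81), with D = diag(2, -8, 1)

# For a symmetric matrix, Uhat equals L^T, giving A = L D L^T (Eq. 1.84);
# the 2x2 matrix S below is an invented symmetric example
S = np.array([[4.0, 2.0], [2.0, 3.0]])
Ls, Us = lu(S)
Ds = np.diag(np.diag(Us))
Uhat_s = np.diag(1 / np.diag(Us)) @ Us
```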
1.10 Determinants
Let A be an n × n square matrix. For n = 2, we define the determinant of A as

det [ A_11  A_12 ]   | A_11  A_12 |
    [ A_21  A_22 ] = | A_21  A_22 | = A_11 A_22 − A_12 A_21.   (1.85)
For example,

| 1  2 |
| 3  4 | = 1·4 − 2·3 = −2,   (1.86)

where the number of multiply-add operations is 2. For n = 3, we can expand by cofactors
to obtain, for example,
| 1  2  3 |
| 4  5  6 | = 1 | 5  6 | − 2 | 4  6 | + 3 | 4  5 | = 0,   (1.87)
| 7  8  9 |     | 8  9 |     | 7  9 |     | 7  8 |

where the number of operations is approximately 3 · 2 = 6. In general, if we expand a
determinant using cofactors, the number of operations to expand an n × n determinant is n
times the number of operations to expand an (n − 1) × (n − 1) determinant. Thus, using
the cofactor approach, the evaluation of the determinant of an n × n matrix requires about
n! operations (to first order), which is prohibitively expensive for all but the smallest of
matrices.
Several properties of determinants are of interest and stated here without proof:
1. A cofactor expansion can be performed along any matrix row or column.
2. det A^T = det A
3. det(AB) = (det A)(det B)
4. det A^{-1} = (det A)^{-1} (a special case of Property 3)
5. The determinant of a triangular matrix is the product of the diagonals.
6. Interchanging two rows or two columns of a matrix changes the sign of the determinant.
7. Multiplying any single row or column by a constant multiplies the determinant by that
constant. Consequently, for an n × n matrix A and scalar c, det(cA) = c^n det(A).
8. A matrix with a zero row or column has a zero determinant.
9. A matrix with two identical rows or two identical columns has a zero determinant.
10. If a row (or column) of a matrix is a constant multiple of another row (or column), the
determinant is zero.
11. Adding a constant multiple of a row (or column) to another row (or column) leaves
the determinant unchanged.
To illustrate the first property using the matrix of Eq. 1.87, we could also write

| 1  2  3 |
| 4  5  6 | = −2 | 4  6 | + 5 | 1  3 | − 8 | 1  3 | = 0.   (1.88)
| 7  8  9 |      | 7  9 |     | 7  9 |     | 4  6 |
This property is sometimes convenient for matrices with many zeros, e.g.,
| 1  0  3 |
| 0  5  0 | = 5 | 1  3 | = −60.   (1.89)
| 7  0  9 |     | 7  9 |
Now consider the LU factorization A = LU:
det A = (det L)(det U) = (1 · 1 ··· 1)(U_11 U_22 ··· U_nn) = U_11 U_22 ··· U_nn.   (1.90)
This result implies that a nonsingular matrix has a nonzero determinant, since, if Gaussian
elimination results in a U matrix with only nonzeros on the diagonal, U is nonsingular,
which implies that A is nonsingular, and det A ≠ 0. On the other hand, if U has one or
more zeros on the diagonal, U and A are both singular, and det A = 0. Thus, a matrix is
nonsingular if, and only if, its determinant is nonzero.
Eq. 1.90 also shows that, given U, the determinant can be computed with an additional
n operations. Given A, calculating det A requires about n³/3 operations, since computing
the LU factorization by Gaussian elimination requires about n³/3 operations, a very small
number compared to the n! required by a cofactor expansion, as seen in the table below:

 n     n³/3      n!
 5     42        120
10     333       3.6×10⁶
25     5208      1.6×10²⁵
60     7.2×10⁴   8.3×10⁸¹
Determinants are rarely of interest in engineering computations or numerical mathematics,
but, if needed, can be economically computed using the LU factorization.
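The LU-based computation can be sketched as follows. This is an illustrative sketch written for this note (function name invented); it omits pivoting, so it applies when elimination succeeds without row interchanges (each interchange would flip the sign of the determinant).

```python
import numpy as np

def det_via_lu(A):
    """det A = product of the diagonal of U from Gaussian elimination.
    No pivoting here; with row interchanges, each swap flips the sign."""
    U = A.astype(float).copy()
    n = U.shape[0]
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = U[i, k] / U[k, k]
            U[i, k:] -= m * U[k, k:]
    return np.prod(np.diag(U))

# The matrix of Eq. 1.80: U has diagonal (2, -8, 1), so det A = -16.
# This is an O(n^3) computation, versus O(n!) for a cofactor expansion.
A = np.array([[2.0, 1.0, 1.0],
              [4.0, -6.0, 0.0],
              [-2.0, 7.0, 2.0]])
```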
1.11 Multiple Right-Hand Sides and Matrix Inverses
Consider the matrix system Ax = b, where A is an n × n matrix, and both x and b are
n × m matrices. That is, both the right-hand side b and the solution x have m columns, each
corresponding to an independent right-hand side. Each column of the solution x corresponds
to the same column of the right-hand side b, as can be seen if the system is written in block
form as
A[x_1 x_2 x_3] = [b_1 b_2 b_3],   (1.91)

or

[Ax_1 Ax_2 Ax_3] = [b_1 b_2 b_3].   (1.92)
Thus, a convenient and economical approach to solve a system with multiple right-hand sides
is to factor A into A = LU (an n³/3 operation), and then apply FBS (an n² operation) to
each right-hand side. Unless the number m of right-hand sides is large, additional right-hand
sides add little to the overall computational cost of solving the system.
Given a square matrix A, the matrix inverse A^{-1}, if it exists, is the matrix satisfying

AA^{-1} = I = [ 1  0  0  ···  0 ]
              [ 0  1  0  ···  0 ]
              [ 0  0  1  ···  0 ] .   (1.93)
              [ ⋮  ⋮  ⋮  ⋱   ⋮ ]
              [ 0  0  0  ···  1 ]
The calculation of A^{-1} is equivalent to solving a set of equations with coefficient matrix A
and n right-hand sides, each one column of the identity matrix. The level of effort required
is one LU factorization and n FBSs; that is,

Number of operations ≈ (1/3)n³ + n(n²) = (4/3)n³.   (1.94)

Thus, computing an inverse is only four times more expensive than solving a system with one
right-hand side. In fact, if one exploits the special form of the right-hand side (the identity
matrix, which has only one nonzero in each column), it turns out that the inverse can be
computed in n³ operations.
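The LU-plus-n-FBS approach can be sketched as follows; the helper names are invented for this illustration and pivoting is omitted for clarity.

```python
import numpy as np

def lu(A):
    """Minimal Doolittle LU without pivoting (illustrative only)."""
    n = A.shape[0]
    L, U = np.eye(n), A.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return L, U

def fbs(L, U, b):
    """Solve LUx = b by forward, then back, substitution."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = b[i] - L[i, :i] @ y[:i]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

A = np.array([[2.0, 1.0, 1.0],
              [4.0, -6.0, 0.0],
              [-2.0, 7.0, 2.0]])
L, U = lu(A)                                   # one n^3/3 factorization
Ainv = np.column_stack([fbs(L, U, e)           # one FBS (n^2 operations)
                        for e in np.eye(3)])   # per column of the identity
```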
1.12 Tridiagonal Systems
Tridiagonal systems arise in a variety of engineering applications, including the ﬁnite diﬀer-
ence solution of diﬀerential equations (e.g., the Crank-Nicolson ﬁnite diﬀerence method for
solving parabolic partial diﬀerential equations) and cubic spline interpolation.
Tridiagonal systems are particularly easy (and fast) to solve using Gaussian elimination.
It is convenient to solve such systems using the following notation:
d_1 x_1 + u_1 x_2                       = b_1
l_2 x_1 + d_2 x_2 + u_2 x_3             = b_2
          l_3 x_2 + d_3 x_3 + u_3 x_4   = b_3
    ⋮
              l_n x_{n−1} + d_n x_n     = b_n,   (1.95)

where d_i, u_i, and l_i are, respectively, the diagonal, upper, and lower matrix entries in Row
i. All coefficients can now be stored in three one-dimensional arrays, D(), U(), and L(),
instead of a full two-dimensional array A(I,J). The solution algorithm (reduction to upper
triangular form by Gaussian elimination followed by backsolving) can now be summarized
as follows:
1. For k = 1, 2, . . . , n−1: [k = pivot row]
   m = −l_{k+1}/d_k [m = multiplier needed to annihilate term below]
   d_{k+1} = d_{k+1} + m u_k [new diagonal entry in next row]
   b_{k+1} = b_{k+1} + m b_k [new rhs in next row]

2. x_n = b_n/d_n [start of backsolve]

3. For k = n−1, n−2, . . . , 1: [backsolve loop]
   x_k = (b_k − u_k x_{k+1})/d_k

To first order, this algorithm requires 5n operations.
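The three steps transcribe directly into code. This sketch mirrors the algorithm above (it overwrites d and b, just as the text's D() and B() arrays are overwritten); the 4 × 4 test system is invented for the demo.

```python
import numpy as np

def tridiag_solve(l, d, u, b):
    """Solve a tridiagonal system stored as three 1-D arrays (Eq. 1.95).
    l[i], d[i], u[i] are the lower/diagonal/upper entries of row i
    (l[0] and u[n-1] are unused). Overwrites d and b."""
    n = len(d)
    for k in range(n - 1):                 # Step 1: forward elimination
        m = -l[k + 1] / d[k]               # multiplier annihilating l[k+1]
        d[k + 1] += m * u[k]               # new diagonal entry in next row
        b[k + 1] += m * b[k]               # new rhs in next row
    x = np.zeros(n)
    x[n - 1] = b[n - 1] / d[n - 1]         # Step 2: start of backsolve
    for k in range(n - 2, -1, -1):         # Step 3: backsolve loop
        x[k] = (b[k] - u[k] * x[k + 1]) / d[k]
    return x

# Illustrative 4x4 system (values invented for the demo)
l = np.array([0.0, -1.0, -1.0, -1.0])
d = np.array([2.0, 2.0, 2.0, 2.0])
u = np.array([-1.0, -1.0, -1.0, 0.0])
b = np.array([1.0, 0.0, 0.0, 1.0])
x = tridiag_solve(l, d.copy(), u, b.copy())
```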
1.13 Iterative Methods
Gaussian elimination is an example of a direct method, for which the number of operations
required to eﬀect a solution is predictable and ﬁxed. Some systems of equations have char-
acteristics that can be exploited by iterative methods, where the number of operations is
not predictable in advance. For example, systems with large diagonal entries will converge
rapidly with iterative methods.
Consider the linear system Ax = b. In Jacobi's method, one starts by assuming a solution
x^(0), say, and then computing a new approximation to the solution using the equations
x_1^(1) = ( b_1 − A_12 x_2^(0) − A_13 x_3^(0) − ··· − A_1n x_n^(0) ) / A_11
x_2^(1) = ( b_2 − A_21 x_1^(0) − A_23 x_3^(0) − ··· − A_2n x_n^(0) ) / A_22
   ⋮
x_n^(1) = ( b_n − A_n1 x_1^(0) − A_n2 x_2^(0) − ··· − A_{n,n−1} x_{n−1}^(0) ) / A_nn.   (1.96)
In general,

x_i^(k) = ( b_i − A_i1 x_1^(k−1) − A_i2 x_2^(k−1) − ··· − A_{i,i−1} x_{i−1}^(k−1) − A_{i,i+1} x_{i+1}^(k−1) − ··· − A_in x_n^(k−1) ) / A_ii,
    i = 1, 2, . . . , n,   (1.97)
or
x_i^(k) = ( b_i − Σ_{j≠i} A_ij x_j^(k−1) ) / A_ii,   i = 1, 2, . . . , n.   (1.98)
In these equations, superscripts refer to the iteration number. If the vector x does not change
much from one iteration to the next, the process has converged, and we have a solution.
Jacobi’s method is also called iteration by total steps and the method of simultaneous
displacements, since the order in which the equations are processed is irrelevant, and the
updates could in principle be done simultaneously. Thus, Jacobi’s method is nicely suited
to parallel computation.
Notice that, in this algorithm, two consecutive approximations to x are needed at one
time so that convergence can be checked. That is, it is not possible to update the vector x
in place.
The Jacobi algorithm can be summarized as follows:
1. Input n, A, b, N_max, x [x = initial guess]

2. k = 0
3. k = k + 1 [k = iteration number]

4. For i = 1, 2, . . . , n:
   y_i = ( b_i − A_i1 x_1 − A_i2 x_2 − ··· − A_{i,i−1} x_{i−1} − A_{i,i+1} x_{i+1} − ··· − A_in x_n ) / A_ii

5. If converged (compare y with x), output y and exit; otherwise . . .

6. x = y [save latest iterate in x]

7. If k < N_max, go to Step 3; otherwise . . .

8. No convergence; error termination.
In the above algorithm, the old iterates at each step are called x, and the new ones are called
y.
For example, consider the system Ax = b, where
A = [ 2  1  0 ]       [ 1 ]
    [ 1  4  1 ] , b = [ 0 ] .   (1.99)
    [ 0  1  3 ]       [ 0 ]
This system can be written in the form
x_1 = (1 − x_2)/2
x_2 = (−x_1 − x_3)/4
x_3 = −x_2/3.   (1.100)
As a starting vector, we use
x^(0) = (0, 0, 0)^T.   (1.101)
We summarize the iterations in a table:
 k    x_1      x_2      x_3
 0    0        0        0
1 0.5000 0 0
2 0.5000 -0.1250 0
3 0.5625 -0.1250 0.0417
4 0.5625 -0.1510 0.0417
5 0.5755 -0.1510 0.0503
6 0.5755 -0.1564 0.0503
7 0.5782 -0.1564 0.0521
8 0.5782 -0.1576 0.0521
9 0.5788 -0.1576 0.0525
10 0.5788 -0.1578 0.0525
11 0.5789 -0.1578 0.0526
12 0.5789 -0.1579 0.0526
For the precision displayed, this iteration has converged.
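The Jacobi iteration of Eq. 1.98, applied to the example system of Eq. 1.99, can be sketched as follows. This is an illustrative implementation (the function name and tolerance are invented, not from the text).

```python
import numpy as np

def jacobi(A, b, x0, eps=1e-6, nmax=200):
    """Jacobi iteration (Eq. 1.98); returns (solution, iteration count)."""
    x = x0.astype(float).copy()
    for k in range(1, nmax + 1):
        # every component of y uses only the OLD iterate x (total steps)
        y = (b - A @ x + np.diag(A) * x) / np.diag(A)
        if np.max(np.abs(y - x)) < eps:    # absolute-change test (Eq. 1.108)
            return y, k
        x = y
    raise RuntimeError("no convergence")

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 3.0]])
b = np.array([1.0, 0.0, 0.0])
x, iters = jacobi(A, b, np.zeros(3))
# x approaches (11/19, -3/19, 1/19) = (0.5789..., -0.1578..., 0.0526...),
# matching the table above
```

The vectorized update `(b - A @ x + diag(A) * x) / diag(A)` computes b_i − Σ_{j≠i} A_ij x_j over A_ii for all i at once, which is exactly why Jacobi parallelizes well.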
Notice that, in Jacobi's method, when x_3^(k) is computed, we already know x_2^(k) and x_1^(k).
Why not use the new values rather than x_2^(k−1) and x_1^(k−1)? The iterative algorithm which uses
at each step the "improved" values if they are available is called the Gauss-Seidel method.
In general, in Gauss-Seidel,
x_i^(k) = ( b_i − A_i1 x_1^(k) − A_i2 x_2^(k) − ··· − A_{i,i−1} x_{i−1}^(k) − A_{i,i+1} x_{i+1}^(k−1) − ··· − A_in x_n^(k−1) ) / A_ii,
    i = 1, 2, . . . , n.   (1.102)
This formula is used instead of Eq. 1.97.
The Gauss-Seidel algorithm can be summarized as follows:
1. Input n, A, b, N_max, x [x = initial guess]

2. k = 0

3. k = k + 1 [k = iteration number]

4. y = x [save previous iterate in y]

5. For i = 1, 2, . . . , n:
   x_i = ( b_i − A_i1 x_1 − A_i2 x_2 − ··· − A_{i,i−1} x_{i−1} − A_{i,i+1} x_{i+1} − ··· − A_in x_n ) / A_ii

6. If converged (compare x with y), output x and exit; otherwise . . .

7. If k < N_max, go to Step 3; otherwise . . .

8. No convergence; error termination.
For example, consider again

A = [ 2  1  0 ]       [ 1 ]
    [ 1  4  1 ] , b = [ 0 ] .   (1.103)
    [ 0  1  3 ]       [ 0 ]
The system Ax = b can again be written in the form
x_1 = (1 − x_2)/2
x_2 = (−x_1 − x_3)/4
x_3 = −x_2/3,   (1.104)
where we again use the starting vector
x^(0) = (0, 0, 0)^T.   (1.105)
We summarize the Gauss-Seidel iterations in a table:
22
 k    x_1      x_2      x_3
 0    0        0        0
1 0.5000 -0.1250 0.0417
2 0.5625 -0.1510 0.0503
3 0.5755 -0.1564 0.0521
4 0.5782 -0.1576 0.0525
5 0.5788 -0.1578 0.0526
6 0.5789 -0.1578 0.0526
For this example, Gauss-Seidel converges in about half as many iterations as Jacobi.
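A Gauss-Seidel sketch (Eq. 1.102) for the same system; the key difference from Jacobi is that x is overwritten in place, so later components immediately use the improved values. Function name and tolerance are invented for this illustration.

```python
import numpy as np

def gauss_seidel(A, b, x0, eps=1e-6, nmax=200):
    """Gauss-Seidel iteration (Eq. 1.102): each x_i is overwritten in place,
    so components i+1, ..., n immediately use the improved values."""
    n = len(b)
    x = x0.astype(float).copy()
    for k in range(1, nmax + 1):
        y = x.copy()                          # previous iterate, for the test
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.max(np.abs(x - y)) < eps:       # absolute-change test (Eq. 1.108)
            return x, k
    raise RuntimeError("no convergence")

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 3.0]])
b = np.array([1.0, 0.0, 0.0])
x, iters = gauss_seidel(A, b, np.zeros(3))
```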
In Gauss-Seidel iteration, the updates cannot be done simultaneously as in the Jacobi
method, since each component of the new iterate depends upon all previously computed
components. Also, each iterate depends upon the order in which the equations are processed,
so that the Gauss-Seidel method is sometimes called the method of successive displacements
to indicate the dependence of the iterates on the ordering. If the ordering of the equations
is changed, the components of the new iterate will also change. It also turns out that the
rate of convergence depends on the ordering.
In general, if the Jacobi method converges, the Gauss-Seidel method will converge faster
than the Jacobi method. However, this statement is not always true. Consider, for example,
the system Ax = b, where
A = [ 1  2  −2 ]       [ 1 ]
    [ 1  1   1 ] , b = [ 1 ] .   (1.106)
    [ 2  2   1 ]       [ 1 ]
The solution of this system is (−3, 3, 1). For this system, Jacobi converges, and Gauss-Seidel
diverges. On the other hand, consider the system Ax = b with
A = [ 5  4  3 ]       [ 12 ]
    [ 4  7  4 ] , b = [ 15 ] .   (1.107)
    [ 3  4  4 ]       [ 11 ]
For this system, whose solution is (1, 1, 1), Jacobi diverges, and Gauss-Seidel converges.
Generally, for iterative methods such as Jacobi or Gauss-Seidel, the stopping criterion is
either that the absolute change from one iteration to the next is small or that the relative
change from one iteration to the next is small. That is, either
max_i | x_i^(k) − x_i^(k−1) | < ε   (1.108)

or

max_i | x_i^(k) − x_i^(k−1) | < ε max_i | x_i^(k−1) |,   (1.109)
where ε is a small number chosen by the analyst.
We now return to the question of convergence of iterative methods. Consider Jacobi’s
method of solution for the system Ax = b. Deﬁne M as the iteration matrix for Jacobi’s
method; i.e., M is deﬁned as the matrix such that
x^(k) = Mx^(k−1) + d,   (1.110)
where
M = [      0       −A_12/A_11  −A_13/A_11    ···      −A_1n/A_11 ]
    [ −A_21/A_22       0       −A_23/A_22    ···      −A_2n/A_22 ]
    [      ⋮           ⋮            ⋮         ⋱            ⋮     ] ,   d_i = b_i/A_ii.   (1.111)
    [ −A_n1/A_nn  −A_n2/A_nn      ···    −A_{n,n−1}/A_nn     0   ]
Deﬁne the error at the kth iteration as
e^(k) = x^(k) − x,   (1.112)

where x is the exact solution (unknown). By definition (Eq. 1.110),

Mx + d = x.   (1.113)

Thus,

Me^(k) = Mx^(k) − Mx = ( x^(k+1) − d ) − (x − d) = x^(k+1) − x = e^(k+1).   (1.114)
That is, if the iteration matrix M acts on the error e^(k), e^(k+1) results. Thus, to have the
error decrease at each iteration, we conclude that M should be small in some sense. The
last equation can be written in terms of components as

e_i^(k+1) = − Σ_{j≠i} (A_ij/A_ii) e_j^(k)   (1.115)
or, with absolute values,

| e_i^(k+1) | = | Σ_{j≠i} (A_ij/A_ii) e_j^(k) |.   (1.116)
Since, for any numbers c_i,

| Σ_i c_i | ≤ Σ_i |c_i|,   (1.117)
it follows that

| e_i^(k+1) | ≤ Σ_{j≠i} | (A_ij/A_ii) e_j^(k) | = Σ_{j≠i} |A_ij/A_ii| |e_j^(k)| ≤ ( Σ_{j≠i} |A_ij/A_ii| ) |e_max^(k)| = α_i |e_max^(k)|,   (1.118)
where we deﬁne
α_i = Σ_{j≠i} |A_ij/A_ii| = (1/|A_ii|) Σ_{j≠i} |A_ij|.   (1.119)
Thus, Eq. 1.118 implies
| e_max^(k+1) | ≤ α_max | e_max^(k) |.   (1.120)
If α_max < 1, the error decreases with each iteration, where α_max < 1 implies

|A_ii| > Σ_{j≠i} |A_ij|,   i = 1, 2, . . . , n.   (1.121)
A matrix which satisﬁes this condition is said to be strictly diagonally dominant. Therefore,
a suﬃcient condition for Jacobi’s method to converge is that the coeﬃcient matrix A be
strictly diagonally dominant. Notice that the starting vector does not matter, since it did
not enter into this derivation. It turns out that strict diagonal dominance is also a suﬃcient
condition for the Gauss-Seidel method to converge.
Note that strict diagonal dominance is a suﬃcient condition, but not necessary. As a
practical matter, rapid convergence is desired, and rapid convergence can be assured only
if the main diagonal coeﬃcients are large compared to the oﬀ-diagonal coeﬃcients. These
suﬃcient conditions are weak in the sense that the iteration might converge even for systems
not diagonally dominant, as seen in the examples of Eqs. 1.106 and 1.107.
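The α_max test of Eqs. 1.119-1.121 is easy to automate. A minimal sketch (function name invented), applied to the diagonally dominant matrix of Eq. 1.99 and the non-dominant matrix of Eq. 1.106:

```python
import numpy as np

def alpha_max(A):
    """alpha_max of Eq. 1.120: the largest row sum of |A_ij|/|A_ii| over j != i.
    alpha_max < 1 is exactly strict diagonal dominance (Eq. 1.121)."""
    d = np.abs(np.diag(A))
    off = np.sum(np.abs(A), axis=1) - d      # sum of off-diagonal magnitudes per row
    return np.max(off / d)

A_dominant = np.array([[2.0, 1.0, 0.0],      # Eq. 1.99: Jacobi and Gauss-Seidel converge
                       [1.0, 4.0, 1.0],
                       [0.0, 1.0, 3.0]])
A_not = np.array([[1.0, 2.0, -2.0],          # Eq. 1.106: not diagonally dominant
                  [1.0, 1.0, 1.0],
                  [2.0, 2.0, 1.0]])
# alpha_max(A_dominant) = 0.5 < 1, so convergence is guaranteed;
# alpha_max(A_not) = 4 >= 1, so the sufficient condition fails
# (yet Jacobi still converges for Eq. 1.106: the condition is not necessary)
```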
2 Vector Spaces
A vector space is deﬁned as a set of objects, called vectors, which can be combined using
addition and multiplication under the following eight rules:
1. Addition is commutative: a + b = b + a.
2. Addition is associative: (a +b) +c = a + (b +c).
3. There exists a unique zero vector 0 such that a +0 = a.
4. There exists a unique negative vector such that a + (−a) = 0.
5. 1 · a = a.
6. Multiplication is associative: α(βa) = (αβ)a for scalars α, β.
7. Multiplication is distributive with respect to vector addition: α(a +b) = αa + αb.
8. Multiplication is distributive with respect to scalar addition: (α + β)a = αa + βa.
Examples of vector spaces include the following:
1. The space of n-dimensional vectors: a = (a_1, a_2, . . . , a_n).

2. The space of m × n matrices.
3. The space of functions f(x).
Any set of objects that obeys the above eight rules is a vector space.
A subspace is a subset of a vector space which is closed under addition and scalar mul-
tiplication. The phrase “closed under addition and scalar multiplication” means that (1) if
two vectors a and b are in the subspace, their sum a + b also lies in the subspace, and (2)
if a vector a is in the subspace, the scalar multiple αa lies in the subspace. For example,
a two-dimensional plane in 3-D is a subspace. However, polynomials of degree 2 are not a
subspace, since, for example, we can add two polynomials of degree 2,
(−x² + x + 1) + (x² + x + 1) = 2x + 2,   (2.1)
and obtain a result which is a polynomial of degree 1. On the other hand, polynomials of
degree ≤ 2 are a subspace.
The column space of a matrix A consists of all combinations of the columns of A. For
example, consider the 3 × 2 system

[ 1  0 ]         [ b_1 ]
[ 5  4 ] [ u ] = [ b_2 ] .   (2.2)
[ 2  4 ] [ v ]   [ b_3 ]
Since this system has three equations and two unknowns, it may not have a solution. An
alternative way to write this system is

  [ 1 ]     [ 0 ]   [ b_1 ]
u [ 5 ] + v [ 4 ] = [ b_2 ] ,   (2.3)
  [ 2 ]     [ 4 ]   [ b_3 ]
where u and v are scalar multipliers of the two vectors. This system has a solution if, and
only if, the right-hand side b can be written as a linear combination of the two columns.
Geometrically, the system has a solution if, and only if, b lies in the plane formed by the
two vectors. That is, the system has a solution if, and only if, b lies in the column space of
A.
We note that the column space of an m × n matrix A is a subspace of m-dimensional
space R^m. For example, in the 3 × 2 matrix above, the column space is the plane in 3-D
formed by the two vectors (1, 5, 2) and (0, 4, 4).
The null space of a matrix A consists of all vectors x such that Ax = 0.
2.1 Rectangular Systems of Equations
For square systems of equations Ax = b, there is one solution (if A^{-1} exists) or zero or
infinite solutions (if A^{-1} does not exist). The situation with rectangular systems is a little
different.
For example, consider the 3 × 4 system

                 [ x_1 ]
[  1   3  3  2 ] [ x_2 ]   [ b_1 ]
[  2   6  9  5 ] [ x_3 ] = [ b_2 ] .   (2.4)
[ −1  −3  3  0 ] [ x_4 ]   [ b_3 ]
We can simplify this system using elementary row operations to obtain

[  1   3  3  2  b_1 ]      [ 1  3  3  2  b_1        ]
[  2   6  9  5  b_2 ]  →   [ 0  0  3  1  b_2 − 2b_1 ]   (2.5)
[ −1  −3  3  0  b_3 ]      [ 0  0  6  2  b_1 + b_3  ]

                           [ 1  3  3  2  b_1               ]
                       →   [ 0  0  3  1  b_2 − 2b_1        ] ,   (2.6)
                           [ 0  0  0  0  b_3 − 2b_2 + 5b_1 ]

which implies that the solution exists if, and only if,

b_3 − 2b_2 + 5b_1 = 0.   (2.7)
Thus, for this system, even though there are more unknowns than equations, there may be
no solution.
Now assume that a solution of Eq. 2.4 exists. For example, let

b = (1, 5, 5)^T,   (2.8)

in which case the row-reduced system is

[ 1  3  3  2  1 ]
[ 0  0  3  1  3 ] ,   (2.9)
[ 0  0  0  0  0 ]
which implies, from the second equation,

3x_3 + x_4 = 3.   (2.10)

One of the two unknowns (x_4, say) can be picked arbitrarily, in which case

x_3 = 1 − (1/3)x_4.   (2.11)
From the first equation in Eq. 2.9, we obtain

x_1 = 1 − 3x_2 − 3x_3 − 2x_4 = 1 − 3x_2 − 3(1 − x_4/3) − 2x_4 = −2 − 3x_2 − x_4,   (2.12)
which implies that the solution can be written in the form

    [ x_1 ]   [ −2 ]       [ −3 ]       [ −1   ]
x = [ x_2 ] = [  0 ] + x_2 [  1 ] + x_4 [  0   ] .   (2.13)
    [ x_3 ]   [  1 ]       [  0 ]       [ −1/3 ]
    [ x_4 ]   [  0 ]       [  0 ]       [  1   ]

This system thus has a doubly infinite set of solutions, i.e., there are two free parameters.
Now consider the corresponding homogeneous system (with zero right-hand side):
Ax_h = 0,   (2.14)

where, from Eq. 2.9, the row-reduced system is

[ 1  3  3  2  0 ]
[ 0  0  3  1  0 ] ,   (2.15)
[ 0  0  0  0  0 ]
which implies

3x_3 = −x_4   (2.16)

and

x_1 = −3x_2 − 3x_3 − 2x_4 = −3x_2 − x_4.   (2.17)
Thus, the solution of the homogeneous system can be written

          [ −3 ]       [ −1   ]
x_h = x_2 [  1 ] + x_4 [  0   ] .   (2.18)
          [  0 ]       [ −1/3 ]
          [  0 ]       [  1   ]
Therefore, from Eqs. 2.13 and 2.18, the solution of the nonhomogeneous system Ax = b
can be written as the sum of a particular solution x_p and the solution of the corresponding
homogeneous problem Ax_h = 0:

x = x_p + x_h.   (2.19)
Alternatively, for the system Ax = b, given a solution x, any solution of the corresponding
homogeneous system can be added to get another solution, since

Ax = A(x_p + x_h) = Ax_p + Ax_h = b + 0 = b.   (2.20)
The ﬁnal row-reduced matrix is said to be in echelon form if it looks like this:
[ ⓧ  x  x  x  x  x ]
[ 0  ⓧ  x  x  x  x ]
[ 0  0  0  ⓧ  x  x ] ,
[ 0  0  0  0  0  0 ]
where nonzeros are denoted with the letter “x”. The echelon form of a matrix has the
following characteristics:
1. The nonzero rows appear ﬁrst.
2. The pivots (circled above) are the ﬁrst nonzeros in each row.
3. Only zeros appear below each pivot.
4. Each pivot appears to the right of the pivot in the row above.
The number of nonzero rows in the row-reduced (echelon) matrix is referred to as the rank
of the system. Thus, matrix rank is the number of independent rows in a matrix.
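As a numerical check, NumPy's SVD-based `matrix_rank` agrees with the two pivots found by row reduction for the matrix of Eq. 2.4, and the number of free variables is n − r, matching the two free parameters of Eq. 2.13. (This uses the library routine rather than the hand row reduction shown in the text.)

```python
import numpy as np

# The 3x4 coefficient matrix of Eq. 2.4
A = np.array([[1.0, 3.0, 3.0, 2.0],
              [2.0, 6.0, 9.0, 5.0],
              [-1.0, -3.0, 3.0, 0.0]])
r = np.linalg.matrix_rank(A)   # SVD-based rank; matches the 2 pivots found above
nullity = A.shape[1] - r       # free variables: n - r = 4 - 2 = 2 (cf. Eq. 2.13)
```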
2.2 Linear Independence
The vectors v_1, v_2, . . . , v_k are linearly independent if the linear combination

c_1 v_1 + c_2 v_2 + ··· + c_k v_k = 0   (2.21)

implies c_1 = c_2 = ··· = c_k = 0. We note that, if the coefficients c_i are not all zero, then one
vector can be written as a linear combination of the others.
For the special case of two vectors, two vectors (of any dimension) are linearly dependent if
one is a multiple of the other, i.e., v_1 = c v_2. Geometrically, if two vectors are dependent, they
are parallel. Three vectors (of any dimension) are linearly dependent if they are coplanar.
For example, let us determine if the three vectors
v_1 = (3, 3, 3)^T,  v_2 = (4, 5, 4)^T,  v_3 = (2, 7, 4)^T   (2.22)
are independent. Note that Eq. 2.21 implies

                  [ c_1 ]
[ v_1  v_2  v_3 ] [ c_2 ] = 0   (2.23)
                  [ c_3 ]
or

[ 3  4  2 ] [ c_1 ]   [ 0 ]
[ 3  5  7 ] [ c_2 ] = [ 0 ] ,   (2.24)
[ 3  4  4 ] [ c_3 ]   [ 0 ]
so that v_1, v_2, v_3 are independent if c_1 = c_2 = c_3 = 0 is the only solution of this system.
Thus, to determine the independence of vectors, we can arrange the vectors in the columns
of a matrix, and determine the null space of the matrix:
[ 3  4  2  0 ]      [ 3  4  2  0 ]
[ 3  5  7  0 ]  →   [ 0  1  5  0 ] .   (2.25)
[ 3  4  4  0 ]      [ 0  0  2  0 ]
This system implies c_1 = c_2 = c_3 = 0, since an upper triangular system with a nonzero
diagonal is nonsingular. Thus, the three given vectors are independent.
The previous example also shows that the columns of a matrix are linearly independent
if, and only if, the null space of the matrix is the zero vector.
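The same conclusion can be reached numerically: arrange the vectors of Eq. 2.22 as matrix columns and check that the rank equals the number of columns (equivalently, for a square matrix, that the determinant is nonzero). This sketch uses library routines in place of the hand row reduction.

```python
import numpy as np

# Columns are v1, v2, v3 of Eq. 2.22
V = np.array([[3.0, 4.0, 2.0],
              [3.0, 5.0, 7.0],
              [3.0, 4.0, 4.0]])
rank = np.linalg.matrix_rank(V)  # rank 3 => the null space is only the zero vector
det = np.linalg.det(V)           # equivalently, a nonzero determinant
# rank == 3 and det != 0, so the three vectors are independent
```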
Let r denote the rank of a matrix (the number of independent rows). We observe that
the number of independent columns is also r, since each column of an echelon matrix with a
pivot has a nonzero component in a new location. For example, consider the echelon matrix
[ ①  3  3  2 ]
[ 0  0  ③  1 ] ,
[ 0  0  0  0 ]
where the pivots are circled. Column 1 has one nonzero component. Column 2 is not
independent, since it is proportional to Column 1. Column 3 has two nonzero components,
[Figure 2: Some Vectors That Span the xy-Plane — left panel: (1,0) and (0,1); right panel: (1,0) and (1,1).]
so Column 3 cannot be dependent on the ﬁrst two columns. Column 4 also has two nonzero
components, but the column can be written uniquely as a linear combination of Columns 1
and 3 by picking ﬁrst the Column 3 multiplier (1/3) followed by the Column 1 multiplier.
Thus we see that, for a matrix, the row rank is equal to the column rank.
2.3 Spanning a Subspace
If every vector v in a vector space can be expressed as a linear combination

v = c_1 w_1 + c_2 w_2 + ··· + c_k w_k   (2.26)

for some coefficients c_i, then the vectors w_1, w_2, . . . , w_k are said to span the space. For
example, in 2-D, the vectors (1,0) and (0,1) span the xy-plane (Fig. 2). The two vectors
(1,0) and (1,1) also span the xy-plane. Since the number of spanning vectors need not be
minimal, the three vectors (1,0), (0,1), and (1,1) also span the xy-plane, although only two
of the three vectors are needed.
The column space of a matrix is the space spanned by the columns of the matrix. This
deﬁnition is equivalent to that given earlier (all possible combinations of the columns).
A basis for a vector space is a set of vectors which is linearly independent and spans the
space. The requirement for linear independence means that there can be no extra vectors.
For example, the two vectors (1,0) and (0,1) in Fig. 2 form a basis for R². The two vectors
(1,0) and (1,1) also form a basis for R². However, the three vectors (1,0), (0,1), and (1,1)
do not form a basis for R², since one of the three vectors is linearly dependent on the other
two. Thus, we see that all bases for a vector space V contain the same number of vectors;
that number is referred to as the dimension of V .
2.4 Row Space and Null Space
The row space of a matrix A consists of all possible combinations of the rows of A. Consider
again the 3 × 4 example used in the discussion of rectangular systems:
A = [  1   3  3  2 ]      [ 1  3  3  2 ]      [ ①  3  3  2 ]
    [  2   6  9  5 ]  →   [ 0  0  3  1 ]  →   [ 0  0  ③  1 ] = U,   (2.27)
    [ −1  −3  3  0 ]      [ 0  0  6  2 ]      [ 0  0  0  0 ]
where the pivots are circled. A has two independent rows, i.e., rank(A) = 2. Also, since the
rows of the ﬁnal matrix U are simply linear combinations of the rows of A, the row spaces of
both A and U are the same. Again we see that the number of pivots is equal to the rank r,
and that the number of independent columns is also r. That is, the number of independent
columns is equal to the number of independent rows. Hence, for a matrix,
Row rank = Column rank. (2.28)
We recall that the null space of a matrix A consists of all vectors x such that Ax = 0.
With elementary row operations, A can be transformed into echelon form, i.e.,
Ax = 0 → Ux = 0.   (2.29)
For the example of the preceding section,
U = [ 1  3  3  2 ]
    [ 0  0  3  1 ] ,   (2.30)
    [ 0  0  0  0 ]
which implies

x_1 + 3x_2 + 3x_3 + 2x_4 = 0   (2.31)

and

3x_3 + x_4 = 0.   (2.32)

With four unknowns and two independent equations, we could, for example, choose x_4
arbitrarily in Eq. 2.32 and x_2 arbitrarily in Eq. 2.31. In general, for a rectangular system
with m equations and n unknowns (i.e., A is an m × n matrix), the number of free variables
(the dimension of the null space) is n − r, where r is the matrix rank.
We thus summarize the dimensions of the various spaces in the following table. For an
m × n matrix A (m rows, n columns):

Space     Dimension   Subspace of
Row       r           R^n
Column    r           R^m
Null      n − r       R^n

Note that

dim(row space) + dim(null space) = n = number of columns.   (2.33)
This property will be useful in the discussions of pseudoinverses and least squares.
2.5 Pseudoinverses
Rectangular m × n matrices cannot be inverted in the usual sense. However, one can compute
a pseudoinverse by considering either AA^T or A^T A, whichever (it turns out) is smaller. We
consider two cases: m < n and m > n. Notice that, for any matrix A, both AA^T and A^T A
are square and symmetric.
For m < n, consider AA^T, which is a square m × m matrix. If AA^T is invertible,

( AA^T ) ( AA^T )^{-1} = I   (2.34)
or

A [ A^T ( AA^T )^{-1} ] = I,   (2.35)

where the matrix C = A^T ( AA^T )^{-1} in brackets is referred to as the pseudo right inverse.
The reason for choosing the order of the two inverses given in Eq. 2.34 is that this order
allows the matrix A to be isolated in the next equation.
For the second case, m > n, consider A^T A, which is an n × n matrix. If A^T A is invertible,

( A^T A )^{-1} ( A^T A ) = I   (2.36)

or

[ ( A^T A )^{-1} A^T ] A = I,   (2.37)

where the matrix B = ( A^T A )^{-1} A^T in brackets is referred to as the pseudo left inverse.
For example, consider the 2 × 3 matrix

A = [ 4  0  0 ]
    [ 0  5  0 ] ,   (2.38)

in which case

AA^T = [ 4  0  0 ] [ 4  0 ]   [ 16   0 ]
       [ 0  5  0 ] [ 0  5 ] = [  0  25 ] ,   (2.39)
                   [ 0  0 ]
and the pseudo right inverse is

C = A^T ( AA^T )^{-1} = [ 4  0 ] [ 1/16   0   ]   [ 1/4   0  ]
                        [ 0  5 ] [  0    1/25 ] = [  0   1/5 ] .   (2.40)
                        [ 0  0 ]                  [  0    0  ]
We check the correctness of this result by noting that

AC = [ 4  0  0 ] [ 1/4   0  ]   [ 1  0 ]
     [ 0  5  0 ] [  0   1/5 ] = [ 0  1 ] = I.   (2.41)
                 [  0    0  ]
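The pseudo right inverse of Eqs. 2.40-2.41 is easy to reproduce numerically. For a matrix with independent rows such as this one, C also coincides with the Moore-Penrose pseudoinverse computed by `np.linalg.pinv` (a fact not needed by the text, but a convenient cross-check).

```python
import numpy as np

# The matrix of Eq. 2.38
A = np.array([[4.0, 0.0, 0.0],
              [0.0, 5.0, 0.0]])
C = A.T @ np.linalg.inv(A @ A.T)   # pseudo right inverse, Eq. 2.40
# A @ C is the 2x2 identity (Eq. 2.41)
```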
One of the most important properties associated with pseudoinverses is that, for any
matrix A,

rank( A^T A ) = rank( AA^T ) = rank(A).   (2.42)

To show that

rank( A^T A ) = rank(A),   (2.43)

we first recall that, for any matrix A,

dim(row space) + dim(null space) = number of columns   (2.44)

or

r + dim(null space) = n.   (2.45)
Since A^T A and A have the same number of columns (n), it suffices to show that A^T A and
A have the same null space. That is, we want to prove that

Ax = 0 ⇐⇒ A^T Ax = 0.   (2.46)

The forward direction (=⇒) is clear by inspection. For the other direction (⇐=), consider

A^T Ax = 0.   (2.47)

If we take the dot product of both sides with x,

0 = x^T A^T Ax = (Ax)^T (Ax) = |Ax|²,   (2.48)

which implies (since a vector of zero length must be the zero vector)

Ax = 0.   (2.49)
Thus, since A^T A and A have both the same null space and the same number of columns
(n), the dimension r of the row space of the two matrices is the same, and the first part of
Eq. 2.42 is proved. Also, with A^T = B,

rank( AA^T ) = rank( B^T B ) = rank(B) = rank( B^T ) = rank(A),   (2.50)

and the second part of Eq. 2.42 is proved. The implication of Eq. 2.42 is that, if rank(A) =
n (i.e., A's columns are independent), then A^T A is invertible.
It will also be shown in §4 that, for the overdetermined system Ax = b (with m > n),
the pseudoinverse solution using A^T A is equivalent to the least squares solution.
2.6 Linear Transformations
If A is an m × n matrix, and x is an n-vector, the matrix product Ax can be thought of as
a transformation of x into another vector x′:

Ax = x′,   (2.51)

where x′ is an m × 1 vector. We consider some examples:
1. Stretching
Let

A = [ c  0 ]
    [ 0  c ] = cI,   (2.52)

in which case

Ax = [ c  0 ] [ x_1 ]   [ cx_1 ]     [ x_1 ]
     [ 0  c ] [ x_2 ] = [ cx_2 ] = c [ x_2 ] .   (2.53)
[Figure 3: 90° Rotation — the vector (x_1, x_2) rotated through 90°.]
[Figure 4: Reflection in 45° Line — the vector (x_1, x_2) reflected about the 45° line.]
2. 90° Rotation (Fig. 3)
Let

A = [ 0  −1 ]
    [ 1   0 ] ,   (2.54)

in which case

Ax = [ 0  −1 ] [ x_1 ]   [ −x_2 ]
     [ 1   0 ] [ x_2 ] = [  x_1 ] .   (2.55)
3. Reflection in 45° Line (Fig. 4)
Let
A = [0 1; 1 0], (2.56)
in which case
Ax = [0 1; 1 0][x_1; x_2] = [x_2; x_1]. (2.57)
4. Projection Onto Horizontal Axis (Fig. 5)
Let
A = [1 0; 0 0], (2.58)
in which case
Ax = [1 0; 0 0][x_1; x_2] = [x_1; 0]. (2.59)
Figure 5: Projection Onto Horizontal Axis.
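The four transformations above can be verified numerically; a sketch using NumPy, with an arbitrary test vector x = (3, 4):

```python
import numpy as np

x = np.array([3.0, 4.0])

stretch = 2.0 * np.eye(2)                # Eq. 2.52 with c = 2
rot90   = np.array([[0.0, -1.0],
                    [1.0,  0.0]])        # Eq. 2.54
reflect = np.array([[0.0, 1.0],
                    [1.0, 0.0]])         # Eq. 2.56
project = np.array([[1.0, 0.0],
                    [0.0, 0.0]])         # Eq. 2.58

print(stretch @ x)   # [6. 8.]
print(rot90 @ x)     # [-4.  3.]
print(reflect @ x)   # [4. 3.]
print(project @ x)   # [3. 0.]
```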
Note that transformations which can be represented by matrices are linear transformations,
since
A(cx + dy) = c(Ax) + d(Ay), (2.60)
where x and y are vectors, and c and d are scalars. This property implies that knowing how
A transforms each vector in a basis determines how A transforms every vector in the entire
space, since the vectors in the space are linear combinations of the basis vectors:
A(c_1 x_1 + c_2 x_2 + ⋯ + c_n x_n) = c_1 (Ax_1) + c_2 (Ax_2) + ⋯ + c_n (Ax_n), (2.61)
where x_i is the ith basis vector, and c_i is the ith scalar multiplier.
We now continue with the examples of linear transformations:
5. Differentiation of Polynomials
Consider the polynomial of degree 4
p(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4. (2.62)
The basis vectors are
p_1 = 1, p_2 = t, p_3 = t^2, p_4 = t^3, p_5 = t^4, (2.63)
which implies that, in terms of the basis vectors,
p(t) = a_0 p_1 + a_1 p_2 + a_2 p_3 + a_3 p_4 + a_4 p_5. (2.64)
The coefficients a_i become the components of the vector p. If we differentiate the
polynomial, we obtain
p'(t) = a_1 + 2a_2 t + 3a_3 t^2 + 4a_4 t^3 = a_1 p_1 + 2a_2 p_2 + 3a_3 p_3 + 4a_4 p_4. (2.65)
Differentiation of polynomials of degree 4 can thus be written in matrix form as
[0 1 0 0 0; 0 0 2 0 0; 0 0 0 3 0; 0 0 0 0 4][a_0; a_1; a_2; a_3; a_4] = [a_1; 2a_2; 3a_3; 4a_4], (2.66)
so that the matrix which represents differentiation of polynomials of degree 4 is
A_D = [0 1 0 0 0; 0 0 2 0 0; 0 0 0 3 0; 0 0 0 0 4], (2.67)
where the subscript "D" denotes differentiation. Note that this matrix has been written
as a rectangular 4 × 5 matrix, since differentiation reduces the degree of the polynomial
by unity.
6. Integration of Polynomials
Let
p(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3, (2.68)
which can be integrated to obtain
∫_0^t p(τ) dτ = a_0 t + (a_1/2) t^2 + (a_2/3) t^3 + (a_3/4) t^4
= a_0 p_2 + (a_1/2) p_3 + (a_2/3) p_4 + (a_3/4) p_5. (2.69)
Thus, integration of polynomials of degree 3 can be written in matrix form as
[0 0 0 0; 1 0 0 0; 0 1/2 0 0; 0 0 1/3 0; 0 0 0 1/4][a_0; a_1; a_2; a_3] = [0; a_0; a_1/2; a_2/3; a_3/4], (2.70)
so that the matrix which represents integration of polynomials of degree 3 is
A_I = [0 0 0 0; 1 0 0 0; 0 1/2 0 0; 0 0 1/3 0; 0 0 0 1/4], (2.71)
where the subscript "I" denotes integration. We note that integration followed by
differentiation is an inverse operation which restores the vector to its original:
A_D A_I = [0 1 0 0 0; 0 0 2 0 0; 0 0 0 3 0; 0 0 0 0 4][0 0 0 0; 1 0 0 0; 0 1/2 0 0; 0 0 1/3 0; 0 0 0 1/4]
= [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1] = I. (2.72)
Thus, A_D is the pseudo left inverse of A_I. On the other hand, performing the operations
in reverse order (differentiation followed by integration) does not restore the vector to
its original:
A_I A_D = [0 0 0 0; 1 0 0 0; 0 1/2 0 0; 0 0 1/3 0; 0 0 0 1/4][0 1 0 0 0; 0 0 2 0 0; 0 0 0 3 0; 0 0 0 0 4]
= [0 0 0 0 0; 0 1 0 0 0; 0 0 1 0 0; 0 0 0 1 0; 0 0 0 0 1] ≠ I. (2.73)
Integration after differentiation does not restore the original constant term.
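Eqs. 2.72 and 2.73 can be checked numerically; a sketch using NumPy, with the matrix entries taken directly from Eqs. 2.67 and 2.71:

```python
import numpy as np

# Differentiation (Eq. 2.67) and integration (Eq. 2.71) matrices.
AD = np.array([[0., 1., 0., 0., 0.],
               [0., 0., 2., 0., 0.],
               [0., 0., 0., 3., 0.],
               [0., 0., 0., 0., 4.]])
AI = np.array([[0.,   0.,   0.,   0.],
               [1.,   0.,   0.,   0.],
               [0., 1/2.,   0.,   0.],
               [0.,   0., 1/3.,   0.],
               [0.,   0.,   0., 1/4.]])

# Integration followed by differentiation restores the coefficients (Eq. 2.72)...
print(AD @ AI)   # 4x4 identity
# ...but differentiation followed by integration loses the constant term (Eq. 2.73).
print(AI @ AD)   # diag(0, 1, 1, 1, 1)
```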
Figure 6: Rotation by Angle θ.
7. Rotation by Angle θ (Fig. 6)
Consider the 2-D transformation
R = [cos θ −sin θ; sin θ cos θ]. (2.74)
If v is a typical vector in 2-D, v can be written
v = [v_1; v_2] = v_1 [1; 0] + v_2 [0; 1] = v_1 e_1 + v_2 e_2, (2.75)
where e_1 and e_2 are the basis vectors. We now consider the effect of R on the two
basis vectors e_1 and e_2:
R e_1 = [cos θ; sin θ] = e'_1, R e_2 = [−sin θ; cos θ] = e'_2, (2.76)
as shown in Fig. 6. Since e_1 and e_2 are both rotated by θ, so is v. The transformation
R has the following properties:
(a) R_θ and R_{−θ} are inverses (rotation by θ and −θ):
R_{−θ} R_θ = [cos θ sin θ; −sin θ cos θ][cos θ −sin θ; sin θ cos θ] = [1 0; 0 1] = I, (2.77)
where we used sin(−θ) = −sin θ and cos(−θ) = cos θ to compute R_{−θ}. Thus, R
is an orthogonal matrix (R R^T = I). Note also that R_{−θ} = R_θ^{−1} = R_θ^T.
(b) R_{2θ} = R_θ R_θ (rotation through the double angle):
R_θ R_θ = [cos θ −sin θ; sin θ cos θ][cos θ −sin θ; sin θ cos θ] (2.78)
= [cos^2 θ − sin^2 θ   −2 sin θ cos θ; 2 sin θ cos θ   cos^2 θ − sin^2 θ] (2.79)
= [cos 2θ −sin 2θ; sin 2θ cos 2θ] = R_{2θ}. (2.80)
The above equations assume the double-angle trigonometric identities as known,
but an alternative view of these equations would be to assume that R_{2θ} = R_θ R_θ
must be true, and view these equations as a derivation of the double-angle identities.
(c) R_{θ+φ} = R_φ R_θ (rotation through θ, then φ):
R_φ R_θ = [cos φ −sin φ; sin φ cos φ][cos θ −sin θ; sin θ cos θ] (2.81)
= [cos φ cos θ − sin φ sin θ   −cos φ sin θ − sin φ cos θ; sin φ cos θ + cos φ sin θ   −sin φ sin θ + cos φ cos θ] (2.82)
= [cos(φ + θ) −sin(φ + θ); sin(φ + θ) cos(φ + θ)] = R_{φ+θ}. (2.83)
We note that, in 2-D rotations (but not in 3-D), the order of rotations does not
matter, i.e.,
R_φ R_θ = R_θ R_φ. (2.84)
Properties (a) and (b) are special cases of Property (c).
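Properties (a), (b), and (c) can all be checked numerically; a sketch using NumPy, with arbitrary angles θ = 0.3 and φ = 1.1:

```python
import numpy as np

def rot(theta):
    """2-D rotation matrix of Eq. 2.74."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

th, ph = 0.3, 1.1
assert np.allclose(rot(-th) @ rot(th), np.eye(2))          # property (a): inverses
assert np.allclose(rot(th) @ rot(th), rot(2 * th))         # property (b): double angle
assert np.allclose(rot(ph) @ rot(th), rot(ph + th))        # property (c): angle addition
assert np.allclose(rot(ph) @ rot(th), rot(th) @ rot(ph))   # Eq. 2.84: 2-D rotations commute
print("all rotation properties verified")
```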
2.7 Orthogonal Subspaces
For two vectors u = (u_1, u_2, . . . , u_n) and v = (v_1, v_2, . . . , v_n), we recall that the inner product
(or dot product) of u and v is defined as
(u, v) = u · v = u^T v = Σ_{i=1}^{n} u_i v_i. (2.85)
The length of u is
|u| = √(u_1^2 + u_2^2 + ⋯ + u_n^2) = √(u · u) = √(Σ_{i=1}^{n} u_i u_i). (2.86)
The vectors u and v are orthogonal if u · v = 0.
We note first that a set of mutually orthogonal vectors is linearly independent. To prove
this assertion, let vectors v_1, v_2, . . . , v_k be mutually orthogonal. To prove independence, we
must show that
c_1 v_1 + c_2 v_2 + ⋯ + c_k v_k = 0 (2.87)
implies c_i = 0 for all i. If we take the dot product of both sides of Eq. 2.87 with v_1, we
obtain
v_1 · (c_1 v_1 + c_2 v_2 + ⋯ + c_k v_k) = 0, (2.88)
where orthogonality implies v_1 · v_i = 0 for i ≠ 1. Thus, from Eq. 2.88,
c_1 v_1 · v_1 = c_1 |v_1|^2 = 0. (2.89)
Since |v_1| ≠ 0, we conclude that c_1 = 0. Similarly, c_2 = c_3 = ⋯ = c_k = 0, and the original
assertion is proved.
We define two subspaces as orthogonal if every vector in one subspace is orthogonal to
every vector in the other subspace. For example, the xy-plane is a subspace of R^3, and the
orthogonal subspace is the z-axis.
Figure 7: Projection Onto Line.
We now show that the row space of a matrix is orthogonal to the null space. To prove
this assertion, consider a matrix A for which x is in the nullspace, and v is in the row space.
Thus,
Ax = 0 (2.90)
and
v = A^T w (2.91)
for some w. That is, v is a combination of the columns of A^T or rows of A. Then,
v^T x = w^T Ax = 0, (2.92)
thus proving the assertion.
2.8 Projections Onto Lines
Consider two n-dimensional vectors a = (a_1, a_2, . . . , a_n) and b = (b_1, b_2, . . . , b_n). We wish to
project b onto a, as shown in Fig. 7 in 3-D. Let p be the vector obtained by projecting b
onto a. The scalar projection of b onto a is |b| cos θ, where θ is the angle between b and a.
Thus,
p = (|b| cos θ) a/|a|, (2.93)
where the fraction in this expression is the unit vector in the direction of a. Since
a · b = |a||b| cos θ, (2.94)
Eq. 2.93 becomes
p = |b| [(a · b)/(|a||b|)] a/|a| = [(a · b)/(a · a)] a. (2.95)
This result can be obtained by inspection by noting that the scalar projection of b onto a is
the dot product of b with a unit vector in the direction of a. The vector projection would
then be this scalar projection times a unit vector in the direction of a:
p = (b · a/|a|) a/|a|. (2.96)
Figure 8: Element Coordinate Systems in the Finite Element Method (axial member; triangular membrane).
We can also determine the projection matrix for this projection. The projection matrix
is the matrix P that transforms b into p, i.e., P is the matrix such that
p = Pb. (2.97)
Here it is convenient to switch to index notation with the summation convention (in which
repeated indices are summed). From Eq. 2.95,
p_i = [(a_j b_j)/(a · a)] a_i = [(a_i a_j)/(a · a)] b_j = P_ij b_j. (2.98)
Thus, the projection matrix is
P_ij = (a_i a_j)/(a · a) (2.99)
or, in vector and matrix notations,
P = (a a^T)/(a · a) = (a a^T)/(a^T a). (2.100)
The projection matrix P has the following properties:
1. P is symmetric, i.e., P_ij = P_ji.
2. P^2 = P. That is, once a vector has been projected, there is no place else to go with
further projection. This property can be proved using index notation:
(P^2)_ij = (PP)_ij = P_ik P_kj = (a_i a_k a_k a_j)/[(a · a)(a · a)] = [a_i a_j (a · a)]/[(a · a)(a · a)] = (a_i a_j)/(a · a) = P_ij. (2.101)
3. P is singular. That is, given p, it is not possible to reconstruct the original vector b,
since b is not unique.
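The three properties of P can be verified numerically; a sketch using NumPy, with arbitrary vectors a and b chosen for illustration:

```python
import numpy as np

a = np.array([1.0, 2.0, 2.0])
b = np.array([3.0, 0.0, 3.0])

P = np.outer(a, a) / (a @ a)   # projection matrix, Eq. 2.100
p = P @ b                      # projection of b onto a, Eq. 2.97

assert np.allclose(P, P.T)               # property 1: P is symmetric
assert np.allclose(P @ P, P)             # property 2: P^2 = P
assert abs(np.linalg.det(P)) < 1e-12     # property 3: P is singular
print(p)
```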
3 Change of Basis
On many occasions in engineering applications, the need arises to transform vectors and
matrices from one coordinate system to another. For example, in the ﬁnite element method,
it is frequently more convenient to derive element matrices in a local element coordinate
system and then transform those matrices to a global system (Fig. 8). Transformations
Figure 9: Basis Vectors in Polar Coordinate System.
Figure 10: Change of Basis.
are also needed to transform from other orthogonal coordinate systems (e.g., cylindrical or
spherical) to Cartesian coordinates (Fig. 9).
Let the vector v be given by
v = v_1 e_1 + v_2 e_2 + v_3 e_3 = Σ_{i=1}^{3} v_i e_i, (3.1)
where e_i are the basis vectors, and v_i are the components of v. Using the summation
convention, we can omit the summation sign and write
v = v_i e_i, (3.2)
where, if a subscript appears exactly twice, a summation is implied over the range.
An orthonormal basis is defined as a basis whose basis vectors are mutually orthogonal
unit vectors (i.e., vectors of unit length). If e_i is an orthonormal basis,
e_i · e_j = δ_ij = { 1, i = j; 0, i ≠ j }, (3.3)
where δ_ij is the Kronecker delta.
Since bases are not unique, we can express v in two different orthonormal bases:
v = Σ_{i=1}^{3} v_i e_i = Σ_{i=1}^{3} v̄_i ē_i, (3.4)
where v_i are the components of v in the unbarred coordinate system, and v̄_i are the
components in the barred system (Fig. 10). If we take the dot product of both sides of Eq. 3.4
with e_j, we obtain
Σ_{i=1}^{3} v_i e_i · e_j = Σ_{i=1}^{3} v̄_i ē_i · e_j, (3.5)
where e_i · e_j = δ_ij, and we define the 3 × 3 matrix R as
R_ij = ē_i · e_j. (3.6)
Thus, from Eq. 3.5,
v_j = Σ_{i=1}^{3} R_ij v̄_i = Σ_{i=1}^{3} (R^T)_ji v̄_i. (3.7)
Since the matrix product
C = AB (3.8)
can be written using subscript notation as
C_ij = Σ_{k=1}^{3} A_ik B_kj, (3.9)
Eq. 3.7 is equivalent to the matrix product
v = R^T v̄. (3.10)
Similarly, if we take the dot product of Eq. 3.4 with ē_j, we obtain
Σ_{i=1}^{3} v_i e_i · ē_j = Σ_{i=1}^{3} v̄_i ē_i · ē_j, (3.11)
where ē_i · ē_j = δ_ij, and e_i · ē_j = R_ji. Thus,
v̄_j = Σ_{i=1}^{3} R_ji v_i or v̄ = Rv or v = R^{−1} v̄. (3.12)
A comparison of Eqs. 3.10 and 3.12 yields
R^{−1} = R^T or R R^T = I or Σ_{k=1}^{3} R_ik R_jk = δ_ij, (3.13)
where I is the identity matrix (I_ij = δ_ij):
I = [1 0 0; 0 1 0; 0 0 1]. (3.14)
This type of transformation is called an orthogonal coordinate transformation (OCT). A
matrix R satisfying Eq. 3.13 is said to be an orthogonal matrix. That is, an orthogonal
matrix is one whose inverse is equal to the transpose. R is sometimes called a rotation
matrix.
For example, for the coordinate rotation shown in Fig. 10, in 3-D,
R = [cos θ sin θ 0; −sin θ cos θ 0; 0 0 1]. (3.15)
In 2-D,
R = [cos θ sin θ; −sin θ cos θ] (3.16)
and
v_x = v̄_x cos θ − v̄_y sin θ,
v_y = v̄_x sin θ + v̄_y cos θ. (3.17)
We recall that the determinant of a matrix product is equal to the product of the deter-
minants. Also, the determinant of the transpose of a matrix is equal to the determinant of
the matrix itself. Thus, from Eq. 3.13,
det(R R^T) = (det R)(det R^T) = (det R)^2 = det I = 1, (3.18)
and we conclude that, for an orthogonal matrix R,
det R = ±1. (3.19)
The plus sign occurs for rotations, and the minus sign occurs for combinations of rotations
and reﬂections. For example, the orthogonal matrix
R = [1 0 0; 0 1 0; 0 0 −1] (3.20)
indicates a reﬂection in the z direction (i.e., the sign of the z component is changed).
Another property of orthogonal matrices that can be deduced directly from the deﬁnition,
Eq. 3.13, is that the rows and columns of an orthogonal matrix must be unit vectors and
mutually orthogonal. That is, the rows and columns form an orthonormal set.
We note that the length of a vector is unchanged under an orthogonal coordinate
transformation, since the square of the length is given by
v̄_i v̄_i = R_ij v_j R_ik v_k = δ_jk v_j v_k = v_j v_j, (3.21)
where the summation convention was used. That is, the square of the length of a vector is
the same in both coordinate systems.
To summarize, under an orthogonal coordinate transformation, vectors transform
according to the rule
v̄ = Rv or v̄_i = Σ_{j=1}^{3} R_ij v_j, (3.22)
where
R_ij = ē_i · e_j, (3.23)
and
R R^T = R^T R = I. (3.24)
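A numerical check of Eqs. 3.21, 3.22, and 3.24, using NumPy and the 3-D rotation of Eq. 3.15 (the angle and vector here are arbitrary choices):

```python
import numpy as np

theta = 0.7
R = np.array([[ np.cos(theta), np.sin(theta), 0.0],
              [-np.sin(theta), np.cos(theta), 0.0],
              [ 0.0,           0.0,           1.0]])   # Eq. 3.15

v = np.array([1.0, 2.0, 3.0])
v_bar = R @ v                                           # Eq. 3.22

assert np.allclose(R @ R.T, np.eye(3))                  # Eq. 3.24: R is orthogonal
assert np.isclose(np.linalg.norm(v_bar),
                  np.linalg.norm(v))                    # Eq. 3.21: length preserved
assert np.allclose(R.T @ v_bar, v)                      # Eq. 3.10: inverse transform
print(v_bar)
```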
3.1 Tensors
A vector which transforms under an orthogonal coordinate transformation according to the
rule v̄ = Rv is defined as a tensor of rank 1. A tensor of rank 0 is a scalar (a quantity which
is unchanged by an orthogonal coordinate transformation). For example, temperature and
pressure are scalars, since T̄ = T and p̄ = p.
We now introduce tensors of rank 2. Consider a matrix M = (M_ij) which relates two
vectors u and v by
v = Mu or v_i = Σ_{j=1}^{3} M_ij u_j (3.25)
(i.e., the result of multiplying a matrix and a vector is a vector). Also, in a rotated coordinate
system,
v̄ = M̄ ū. (3.26)
Since both u and v are vectors (tensors of rank 1), Eq. 3.25 implies
R^T v̄ = M R^T ū or v̄ = R M R^T ū. (3.27)
By comparing Eqs. 3.26 and 3.27, we obtain
M̄ = R M R^T (3.28)
or, in index notation,
M̄_ij = Σ_{k=1}^{3} Σ_{l=1}^{3} R_ik R_jl M_kl, (3.29)
which is the transformation rule for a tensor of rank 2. In general, a tensor of rank n, which
has n indices, transforms under an orthogonal coordinate transformation according to the
rule
Ā_{ij⋯k} = Σ_{p=1}^{3} Σ_{q=1}^{3} ⋯ Σ_{r=1}^{3} R_ip R_jq ⋯ R_kr A_{pq⋯r}. (3.30)
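Eq. 3.28 can be checked numerically; a sketch using NumPy (a random M and u, and the rotation of Eq. 3.15 with an arbitrary angle):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))      # a rank-2 tensor (matrix) relating u and v
u = rng.standard_normal(3)
v = M @ u                            # Eq. 3.25

theta = 0.4
R = np.array([[ np.cos(theta), np.sin(theta), 0.0],
              [-np.sin(theta), np.cos(theta), 0.0],
              [ 0.0,           0.0,           1.0]])

M_bar = R @ M @ R.T                  # Eq. 3.28
# In the rotated system, the transformed tensor maps the transformed u
# to the transformed v, as required by Eq. 3.26:
assert np.allclose(M_bar @ (R @ u), R @ v)
print("tensor transformation rule verified")
```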
3.2 Examples of Tensors
1. Stress and strain in elasticity
The stress tensor σ is
σ = [σ_11 σ_12 σ_13; σ_21 σ_22 σ_23; σ_31 σ_32 σ_33], (3.31)
where σ_11, σ_22, σ_33 are the direct (normal) stresses, and σ_12, σ_13, and σ_23 are the shear
stresses. The corresponding strain tensor is
ε = [ε_11 ε_12 ε_13; ε_21 ε_22 ε_23; ε_31 ε_32 ε_33], (3.32)
where, in terms of displacements,
ε_ij = (1/2)(u_{i,j} + u_{j,i}) = (1/2)(∂u_i/∂x_j + ∂u_j/∂x_i). (3.33)
The shear strains in this tensor are equal to half the corresponding engineering shear strains.
Both σ and ε transform like second rank tensors.
2. Generalized Hooke’s law
According to Hooke’s law in elasticity, the extension in a bar subjected to an axial force
is proportional to the force, or stress is proportional to strain. In 1-D,
σ = Eε, (3.34)
where E is Young’s modulus, an experimentally determined material property.
In general three-dimensional elasticity, there are nine components of stress σ_ij and nine
components of strain ε_ij. According to generalized Hooke's law, each stress component can
be written as a linear combination of the nine strain components:
σ_ij = c_ijkl ε_kl, (3.35)
where the 81 material constants c_ijkl are experimentally determined, and the summation
convention is being used.
We now prove that c_ijkl is a tensor of rank 4. We can write Eq. 3.35 in terms of stress
and strain in a second orthogonal coordinate system as
R_ki R_lj σ̄_kl = c_ijkl R_mk R_nl ε̄_mn. (3.36)
If we multiply both sides of this equation by R_pj R_oi, and sum repeated indices, we obtain
R_pj R_oi R_ki R_lj σ̄_kl = R_oi R_pj R_mk R_nl c_ijkl ε̄_mn, (3.37)
or, because R is an orthogonal matrix,
δ_ok δ_pl σ̄_kl = σ̄_op = R_oi R_pj R_mk R_nl c_ijkl ε̄_mn. (3.38)
Since, in the second coordinate system,
σ̄_op = c̄_opmn ε̄_mn, (3.39)
we conclude that
c̄_opmn = R_oi R_pj R_mk R_nl c_ijkl, (3.40)
which proves that c_ijkl is a tensor of rank 4.
3. Stiﬀness matrix in ﬁnite element analysis
In the ﬁnite element method of analysis for structures, the forces F acting on an object
in static equilibrium are a linear combination of the displacements u (or vice versa):
Ku = F, (3.41)
where K is referred to as the stiﬀness matrix (with dimensions of force/displacement). In
this equation, u and F contain several subvectors, since u and F are the displacement and
force vectors for all grid points, i.e.,
u = [u_a; u_b; u_c; ⋮], F = [F_a; F_b; F_c; ⋮] (3.42)
for grid points a, b, c, . . ., where
ū_a = R_a u_a, ū_b = R_b u_b, ⋯. (3.43)
Thus,
ū = [ū_a; ū_b; ū_c; ⋮] = [R_a 0 0 ⋯; 0 R_b 0 ⋯; 0 0 R_c ⋯; ⋮ ⋮ ⋮ ⋱][u_a; u_b; u_c; ⋮] = Γu, (3.44)
where Γ is an orthogonal block-diagonal matrix consisting of rotation matrices R:
Γ = [R_a 0 0 ⋯; 0 R_b 0 ⋯; 0 0 R_c ⋯; ⋮ ⋮ ⋮ ⋱], (3.45)
and
Γ^T Γ = I. (3.46)
Similarly,
F̄ = ΓF. (3.47)
Thus, if
K̄ ū = F̄, (3.48)
then
K̄ Γu = ΓF (3.49)
or
(Γ^T K̄ Γ) u = F. (3.50)
That is, the stiffness matrix transforms like other tensors of rank 2:
K = Γ^T K̄ Γ. (3.51)
We illustrate the transformation of a ﬁnite element stiﬀness matrix by transforming the
stiﬀness matrix for the pin-jointed rod element from a local element coordinate system to a
global Cartesian system. Consider the rod shown in Fig. 11 (Figure 11: Element Coordinate
System for Pin-Jointed Rod). For this element, the 4 × 4 2-D
stiffness matrix in the element coordinate system is
K̄ = (AE/L)[1 0 −1 0; 0 0 0 0; −1 0 1 0; 0 0 0 0], (3.52)
where A is the cross-sectional area of the rod, E is Young's modulus of the rod material,
and L is the rod length. In the global coordinate system,
K = Γ^T K̄ Γ, (3.53)
where the 4 × 4 transformation matrix is
Γ = [R 0; 0 R], (3.54)
and the rotation matrix is
R = [cos θ sin θ; −sin θ cos θ]. (3.55)
Thus,
K = (AE/L)[c −s 0 0; s c 0 0; 0 0 c −s; 0 0 s c][1 0 −1 0; 0 0 0 0; −1 0 1 0; 0 0 0 0][c s 0 0; −s c 0 0; 0 0 c s; 0 0 −s c]
= (AE/L)[c^2 cs −c^2 −cs; cs s^2 −cs −s^2; −c^2 −cs c^2 cs; −cs −s^2 cs s^2], (3.56)
where c = cos θ and s = sin θ.
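The rod stiffness transformation of Eqs. 3.52 through 3.56 can be coded directly; a sketch using NumPy, with unit A, E, and L and an arbitrary angle θ = π/6:

```python
import numpy as np

def rod_stiffness(A, E, L, theta):
    """Global 4x4 stiffness matrix of a 2-D pin-jointed rod: K = Gamma^T Kbar Gamma."""
    c, s = np.cos(theta), np.sin(theta)
    Kbar = (A * E / L) * np.array([[ 1., 0., -1., 0.],
                                   [ 0., 0.,  0., 0.],
                                   [-1., 0.,  1., 0.],
                                   [ 0., 0.,  0., 0.]])   # Eq. 3.52
    R = np.array([[ c, s],
                  [-s, c]])                               # Eq. 3.55
    Gamma = np.block([[R, np.zeros((2, 2))],
                      [np.zeros((2, 2)), R]])             # Eq. 3.54
    return Gamma.T @ Kbar @ Gamma                         # Eq. 3.53

K = rod_stiffness(A=1.0, E=1.0, L=1.0, theta=np.pi / 6)
c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
# Compare with the closed form of Eq. 3.56:
K_exact = np.array([[ c*c,  c*s, -c*c, -c*s],
                    [ c*s,  s*s, -c*s, -s*s],
                    [-c*c, -c*s,  c*c,  c*s],
                    [-c*s, -s*s,  c*s,  s*s]])
assert np.allclose(K, K_exact)
print("global stiffness matrix matches Eq. 3.56")
```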
3.3 Isotropic Tensors
An isotropic tensor is a tensor which is independent of coordinate system (i.e., invariant
under an orthogonal coordinate transformation). The Kronecker delta δ_ij is a second rank
tensor and isotropic, since δ̄_ij = δ_ij, and
Ī = R I R^T = R R^T = I. (3.57)
That is, the identity matrix I is invariant under an orthogonal coordinate transformation.
It can be shown in tensor analysis that δ_ij is the only isotropic tensor of rank 2 and,
moreover, δ_ij is the characteristic tensor for all isotropic tensors:

Rank   Isotropic Tensors
1      none
2      c δ_ij
3      none
4      a δ_ij δ_kl + b δ_ik δ_jl + c δ_il δ_jk
odd    none
That is, all isotropic tensors of rank 4 must be of the form shown above, which has three
constants. For example, in generalized Hooke’s law (Eq. 3.35), the material property tensor
c_ijkl has 3^4 = 81 constants (assuming no symmetry). For an isotropic material, c_ijkl must
be an isotropic tensor of rank 4, thus implying at most three independent elastic material
constants (on the basis of tensor analysis alone). The actual number of independent material
constants for an isotropic material turns out to be two rather than three, a result which
depends on the existence of a strain energy function, which implies the additional symmetry
c_ijkl = c_klij.
4 Least Squares Problems
Consider the rectangular system of equations
Ax = b, (4.1)
where A is an m × n matrix with m > n. That is, we are interested in the case where there
are more equations than unknowns. Assume that the system is inconsistent, so that Eq. 4.1
has no solution. Our interest here is to ﬁnd a vector x which “best” ﬁts the data in some
sense.
We state the problem as follows: If A is an m × n matrix, with m > n, and Ax = b, find
x such that the scalar error measure E = |Ax − b| is minimized, where E is the length of
the residual Ax − b (error).
To solve this problem, we consider the square of the error:
E^2 = (Ax − b)^T (Ax − b)
= (x^T A^T − b^T)(Ax − b)
= x^T A^T Ax − x^T A^T b − b^T Ax + b^T b, (4.2)
or, in index notation using the summation convention,
E^2 = x_i (A^T)_ij A_jk x_k − x_i (A^T)_ij b_j − b_i A_ij x_j + b_i b_i
= x_i A_ji A_jk x_k − x_i A_ji b_j − b_i A_ij x_j + b_i b_i. (4.3)
We wish to find the vector x which minimizes E^2. Thus, we set
∂(E^2)/∂x_l = 0 for each l, (4.4)
where we note that
∂x_i/∂x_l = δ_il, (4.5)
where δ_ij is the Kronecker delta defined as
δ_ij = { 1, i = j; 0, i ≠ j }. (4.6)
We then differentiate Eq. 4.3 to obtain
0 = ∂(E^2)/∂x_l = δ_il A_ji A_jk x_k + x_i A_ji A_jk δ_kl − δ_il A_ji b_j − b_i A_ij δ_jl + 0 (4.7)
= A_jl A_jk x_k + x_i A_ji A_jl − A_jl b_j − b_i A_il, (4.8)
where the dummy subscripts k in the ﬁrst term and j in the third term can both be changed
to i. Thus,
0 = ∂(E^2)/∂x_l = A_jl A_ji x_i + A_ji A_jl x_i − A_il b_i − A_il b_i = (2 A^T Ax − 2 A^T b)_l (4.9)
or
A^T Ax = A^T b. (4.10)
The problem solved is referred to as the least squares problem, since we have minimized E,
the square root of the sum of the squares of the components of the vector Ax − b. The
equations, Eq. 4.10, which solve the least squares problem are referred to as the normal
equations.
Since A is an m × n matrix, the coefficient matrix A^T A in Eq. 4.10 is a square n × n
matrix. Moreover, A^T A is also symmetric, since
(A^T A)^T = A^T (A^T)^T = A^T A. (4.11)
If A^T A is invertible, we could "solve" the normal equations to obtain
x = (A^T A)^{−1} A^T b. (4.12)
However, (A^T A)^{−1} may not exist, since the original system, Eq. 4.1, may not have been
over-determined (i.e., rank r < n).
Figure 12: Example of Least Squares Fit of Data.
We note that Eq. 4.10 could have been obtained simply by multiplying both sides of
Eq. 4.1 by A^T, but we would not have known that the result is a least squares solution. We
also recall that the matrix (A^T A)^{−1} A^T in Eq. 4.12 is the pseudo left inverse of A. Thus,
the pseudo left inverse is equivalent to a least squares solution.
We also note that, if A is square and invertible, then
x = (A^T A)^{−1} A^T b = A^{−1} (A^T)^{−1} A^T b = A^{−1} b, (4.13)
as expected.
One of the most important properties associated with the normal equations is that, for
any matrix A,
rank(A^T A) = rank(A), (4.14)
as proved in §2.5 following Eq. 2.42. We recall that the implication of this result is that, if
rank(A) = n (i.e., A's columns are independent), then A^T A is invertible.
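A small numerical sketch of the normal equations, using NumPy (the 4 × 2 system here is an arbitrary example): the solution of Eq. 4.10 agrees with np.linalg.lstsq, which minimizes |Ax − b| directly.

```python
import numpy as np

# Overdetermined system (m = 4 equations, n = 2 unknowns), Eq. 4.1.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([0.0, 1.0, 1.0, 3.0])

# Solve the normal equations A^T A x = A^T b (Eq. 4.10).
x = np.linalg.solve(A.T @ A, A.T @ b)

# The built-in least squares solver gives the same answer.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_lstsq)
print(x)
```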
4.1 Least Squares Fitting of Data
Sometimes one has a large number of discrete values of some function and wants to determine
a fairly simple approximation to the data, but make use of all the data. For example, as
shown in Fig. 12, suppose we know the points (x_1, f_1), (x_2, f_2), . . . , (x_m, f_m), and we want to
approximate f with the low-order polynomial
p(x) = a_0 + a_1 x + a_2 x^2 + ⋯ + a_n x^n, (4.15)
where the number m of points is large, and the order n of the polynomial is small.
The goal of the ﬁt is for p to agree as closely as possible with the data. For example, we
could require that the algebraic sum of the errors would be minimized by p, but positive and
negative errors would cancel each other, thus distorting the evaluation of the ﬁt. We could
alternatively require that the sum of the absolute errors be minimum, but it turns out that
that choice results in a nasty mathematical problem. Hence we pick p(x) such that the sum
of the squares of the errors,
R = [f_1 − p(x_1)]^2 + [f_2 − p(x_2)]^2 + ⋯ + [f_m − p(x_m)]^2, (4.16)
is minimum. We note that the quantities in brackets are the errors at each of the discrete
points where the function is known. R is referred to as the residual.
For example, consider the linear approximation
p(x) = a_0 + a_1 x, (4.17)
where the measure of error is given by
R(a_0, a_1) = [f_1 − (a_0 + a_1 x_1)]^2 + [f_2 − (a_0 + a_1 x_2)]^2 + ⋯ + [f_m − (a_0 + a_1 x_m)]^2. (4.18)
For R a minimum,
∂R/∂a_0 = 0 and ∂R/∂a_1 = 0, (4.19)
which implies, from Eq. 4.18,
2[f_1 − (a_0 + a_1 x_1)](−1) + 2[f_2 − (a_0 + a_1 x_2)](−1) + ⋯ + 2[f_m − (a_0 + a_1 x_m)](−1) = 0,
2[f_1 − (a_0 + a_1 x_1)](−x_1) + 2[f_2 − (a_0 + a_1 x_2)](−x_2) + ⋯ + 2[f_m − (a_0 + a_1 x_m)](−x_m) = 0,
(4.20)
or
m a_0 + (x_1 + x_2 + ⋯ + x_m) a_1 = f_1 + f_2 + ⋯ + f_m,
(x_1 + x_2 + ⋯ + x_m) a_0 + (x_1^2 + x_2^2 + ⋯ + x_m^2) a_1 = f_1 x_1 + f_2 x_2 + ⋯ + f_m x_m.
(4.21)
This is a symmetric system of two linear algebraic equations in two unknowns (a_0, a_1) which
can be solved for a_0 and a_1 to yield p(x).
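The normal equations of Eq. 4.21 can be solved directly; a sketch using NumPy, with hypothetical data points invented for illustration:

```python
import numpy as np

# Data points (x_i, f_i); m = 5 points, values chosen for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
f = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
m = len(x)

# 2x2 symmetric system of Eq. 4.21 for the line p(x) = a0 + a1*x.
M = np.array([[m,       x.sum()],
              [x.sum(), (x**2).sum()]])
rhs = np.array([f.sum(), (x * f).sum()])
a0, a1 = np.linalg.solve(M, rhs)
print(a0, a1)   # intercept and slope of the least squares line
```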
An alternative approach would be to attempt to find (a_0, a_1) such that the linear function
goes through all m points:
a_0 + a_1 x_1 = f_1
a_0 + a_1 x_2 = f_2
a_0 + a_1 x_3 = f_3
⋮
a_0 + a_1 x_m = f_m,
(4.22)
which is an overdetermined system if m > 2. In matrix notation, this system can be written
Aa = f , (4.23)
where
A = [1 x_1; 1 x_2; 1 x_3; ⋮; 1 x_m], f = [f_1; f_2; f_3; ⋮; f_m], a = [a_0; a_1]. (4.24)
This system was solved previously by multiplying by A^T:
A^T A a = A^T f, (4.25)
where
A^T A = [1 1 ⋯ 1; x_1 x_2 ⋯ x_m][1 x_1; 1 x_2; ⋮; 1 x_m] = [m  Σ_i x_i; Σ_i x_i  Σ_i x_i^2], (4.26)
and
A^T f = [1 1 ⋯ 1; x_1 x_2 ⋯ x_m][f_1; f_2; f_3; ⋮; f_m] = [Σ_i f_i; Σ_i x_i f_i]. (4.27)
Thus, Eq. 4.25 becomes
[m  Σ_i x_i; Σ_i x_i  Σ_i x_i^2][a_0; a_1] = [Σ_i f_i; Σ_i x_i f_i], (4.28)
which is identical to Eq. 4.21. That is, both approaches give the same result. The geometric
interpretation of minimizing the sum of squares of errors is equivalent to the algebraic
interpretation of finding x such that |Aa − f| is minimized. Both approaches yield Eq. 4.28.
Hence, the two error measures are also the same; i.e., E^2 = R.
4.2 The General Linear Least Squares Problem
Given m points (x_1, f_1), (x_2, f_2), . . . , (x_m, f_m), we could, in general, perform a least squares
fit using a linear combination of other functions φ_1(x), φ_2(x), . . . , φ_n(x). For example, we
could attempt a fit using the set of functions {e^x, 1/x, sin x}. The functions used in the
linear combination are referred to as basis functions (or basis vectors). Thus, the more
general least squares problem is to find the coefficients a_i in
p(x) = a_1 φ_1(x) + a_2 φ_2(x) + ⋯ + a_n φ_n(x) (4.29)
so that the mean square error, Eq. 4.16, between the function p(x) and the original points
is minimized.
If we use the algebraic approach to set up the equations, we attempt to force p(x) to go
through all m points:
a_1 φ_1(x_1) + a_2 φ_2(x_1) + ⋯ + a_n φ_n(x_1) = f_1
a_1 φ_1(x_2) + a_2 φ_2(x_2) + ⋯ + a_n φ_n(x_2) = f_2
⋮
a_1 φ_1(x_m) + a_2 φ_2(x_m) + ⋯ + a_n φ_n(x_m) = f_m,
(4.30)
which is an over-determined system if m > n. Thus,
Aa = f or A_ij a_j = f_i, (4.31)
where the summation convention is used, and the coefficient matrix is
A_ij = φ_j(x_i). (4.32)
The least squares solution of this problem is therefore found by multiplying by A^T:
(A^T A) a = A^T f, (4.33)
where (using the summation convention)
(A^T A)_ij = (A^T)_ik A_kj = A_ki A_kj = φ_i(x_k) φ_j(x_k) (4.34)
and
(A^T f)_i = (A^T)_ij f_j = A_ji f_j = φ_i(x_j) f_j. (4.35)
Thus, the normal equations for the general linear least squares problem are (in index
notation)
[φ_i(x_k) φ_j(x_k)] a_j = φ_i(x_j) f_j. (4.36)
The order of this system is n, the number of basis functions. The straight line fit considered
earlier is a special case of this formula.
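The general normal equations, Eq. 4.33, with the basis {e^x, 1/x, sin x} mentioned above; a sketch using NumPy, where the data f are manufactured from a known combination so the fit can be checked:

```python
import numpy as np

# Basis functions; {e^x, 1/x, sin x} is the example set mentioned in the text.
phis = [np.exp, lambda t: 1.0 / t, np.sin]

# Sample points and (hypothetical) data, built from an exact combination.
x = np.linspace(1.0, 3.0, 20)
f = 2.0 * np.exp(x) - 0.5 / x + 3.0 * np.sin(x)

# Coefficient matrix A_ij = phi_j(x_i), Eq. 4.32.
A = np.column_stack([phi(x) for phi in phis])

# Normal equations (A^T A) a = A^T f, Eq. 4.33.
a = np.linalg.solve(A.T @ A, A.T @ f)
print(a)   # recovers the coefficients 2.0, -0.5, 3.0
```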
4.3 Gram-Schmidt Orthogonalization
One of the potential problems with the least squares fits described above is that the set of
basis functions is not an orthogonal set (even though the functions may be independent),
and the resulting system of equations (the normal equations) may be hard to solve (i.e., the
system may be poorly conditioned). This situation can be improved by making the set of
basis functions an orthogonal set.
The Gram-Schmidt procedure is a method for finding an orthonormal basis for a set
of independent vectors. Note that, given a set of vectors, we cannot simply use as our
orthonormal basis the set {e^(1), e^(2), e^(3), . . .}, since the given vectors may define a subspace
of a larger dimensional space (e.g., a skewed plane in R^3).
Consider the following problem: Given a set of independent vectors a^(1), a^(2), a^(3), . . ., find
a set of orthonormal vectors q^(1), q^(2), q^(3), . . . that span the same space. (A set of vectors is
orthonormal if the vectors in the set are both mutually orthogonal and normalized to unit
length.)
We start the Gram-Schmidt procedure by choosing the first unit basis vector to be parallel
to the first vector in the set:
q^(1) = a^(1) / |a^(1)|, (4.37)
as shown in Fig. 13. We define the second unit basis vector so that it is coplanar with a^(1)
and a^(2):
q^(2) = [a^(2) − (a^(2) · q^(1)) q^(1)] / |a^(2) − (a^(2) · q^(1)) q^(1)|. (4.38)
Figure 13: The First Two Vectors in Gram-Schmidt.
Since the third unit basis vector must be orthogonal to the first two, we create q^(3) by
subtracting from a^(3) any part of a^(3) which is parallel to either q^(1) or q^(2). Hence,
q^(3) = [a^(3) − (a^(3) · q^(1)) q^(1) − (a^(3) · q^(2)) q^(2)] / |a^(3) − (a^(3) · q^(1)) q^(1) − (a^(3) · q^(2)) q^(2)|, (4.39)
q^(4) = [a^(4) − (a^(4) · q^(1)) q^(1) − (a^(4) · q^(2)) q^(2) − (a^(4) · q^(3)) q^(3)] / |a^(4) − (a^(4) · q^(1)) q^(1) − (a^(4) · q^(2)) q^(2) − (a^(4) · q^(3)) q^(3)|, (4.40)
and so on.
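The procedure of Eqs. 4.37 through 4.40 translates almost line-for-line into code; a minimal sketch using NumPy (this is classical Gram-Schmidt, not the numerically preferred modified variant):

```python
import numpy as np

def gram_schmidt(vectors):
    """Return orthonormal vectors spanning the same space as the inputs (Eqs. 4.37-4.40)."""
    qs = []
    for a in vectors:
        # Subtract from a its components along the q's found so far, then normalize.
        v = a - sum((a @ q) * q for q in qs)
        qs.append(v / np.linalg.norm(v))
    return qs

a1 = np.array([1.0, 1.0, 0.0])
a2 = np.array([1.0, 0.0, 1.0])
q1, q2 = gram_schmidt([a1, a2])

assert np.isclose(q1 @ q2, 0.0)              # mutually orthogonal
assert np.isclose(np.linalg.norm(q1), 1.0)   # unit length
assert np.isclose(np.linalg.norm(q2), 1.0)
print(q1, q2)
```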
4.4 QR Factorization
Consider an m × n matrix A whose columns a^(1), a^(2), a^(3), . . . , a^(n) are independent. If we
have an orthonormal basis q^(1), q^(2), q^(3), . . . , q^(n) for the columns, then each column can be
expanded in terms of the orthonormal basis:
a^(1) = (a^(1) · q^(1)) q^(1) + (a^(1) · q^(2)) q^(2) + ⋯ + (a^(1) · q^(n)) q^(n)
a^(2) = (a^(2) · q^(1)) q^(1) + (a^(2) · q^(2)) q^(2) + ⋯ + (a^(2) · q^(n)) q^(n)
⋮
a^(n) = (a^(n) · q^(1)) q^(1) + (a^(n) · q^(2)) q^(2) + ⋯ + (a^(n) · q^(n)) q^(n)
(4.41)
or, in terms of matrices,
A = [a^(1) a^(2) ⋯ a^(n)] = [q^(1) q^(2) ⋯ q^(n)][a^(1)·q^(1) a^(2)·q^(1) ⋯; a^(1)·q^(2) a^(2)·q^(2) ⋯; ⋮ ⋮ ⋱]. (4.42)
However, if the q vectors were obtained from a Gram-Schmidt process, q^(1) is parallel to
a^(1), and
a^(1) · q^(2) = a^(1) · q^(3) = a^(1) · q^(4) = ⋯ = 0. (4.43)
Also, q^(2) is in the plane of a^(1) and a^(2), in which case
q^(3) · a^(2) = q^(4) · a^(2) = q^(5) · a^(2) = ⋯ = 0. (4.44)
Hence, Eq. 4.42 becomes
A = [a^(1) a^(2) ⋯ a^(n)] = [q^(1) q^(2) ⋯ q^(n)][a^(1)·q^(1) a^(2)·q^(1) a^(3)·q^(1) ⋯; 0 a^(2)·q^(2) a^(3)·q^(2) ⋯; 0 0 a^(3)·q^(3) ⋯; ⋮ ⋮ ⋮ ⋱] (4.45)
or
A = QR, (4.46)
where Q is the matrix formed with the orthonormal columns q^(1), q^(2), q^(3), . . . , q^(n) of the
Gram-Schmidt process, and R is an upper triangular matrix. We thus conclude that every
m × n matrix with independent columns can be factored into A = QR.
The benefit of the QR factorization is that it simplifies the least squares solution. We
recall that the rectangular system
Ax = b (4.47)
with m > n was solved using least squares by solving the normal equations
A^T Ax = A^T b. (4.48)
If A is factored into
A = QR (4.49)
(where A and Q are m × n matrices, and R is an n × n matrix), Eq. 4.48 becomes
R^T Q^T QRx = R^T Q^T b, (4.50)
where, in block form,
Q
T
Q =
_
¸
¸
¸
_
¸
¸
¸
_
q
(1)T
q
(2)T
.
.
.
q
(n)T
_
¸
¸
¸
_
¸
¸
¸
_
_
q
(1)
q
(2)
q
(n)
¸
=
_
¸
¸
¸
¸
¸
_
1 0 0 0
0 1 0 0
0 0 1 0
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
0 0 0 1
_
¸
¸
¸
¸
¸
_
= I. (4.51)
Thus,
R
T
Rx = R
T
Q
T
b, (4.52)
and, since R is upper triangular and invertible, the normal equations become
Rx = Q
T
b. (4.53)
This system can be solved very quickly, since R is an upper triangular matrix. However,
the major computational expense of the least squares problem is factoring A into QR, i.e.,
performing the Gram-Schmidt process to ﬁnd Q and R.
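As an illustration (not from the original notes; a pure-Python sketch with invented sample data), the following solves a small overdetermined system by Gram-Schmidt QR and back substitution, exactly as in Eq. 4.53:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# overdetermined system Ax = b with m = 3 > n = 2 (independent columns)
A = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [1.0, 2.0, 2.0]
cols = [[row[j] for row in A] for j in range(2)]

# Gram-Schmidt: Q gets orthonormal columns; then R = Q^T A is upper triangular
Q = []
for a in cols:
    w = list(a)
    for q in Q:
        c = dot(a, q)
        w = [wi - c * qi for wi, qi in zip(w, q)]
    nrm = math.sqrt(dot(w, w))
    Q.append([wi / nrm for wi in w])

R = [[dot(Q[i], cols[j]) for j in range(2)] for i in range(2)]
R[1][0] = 0.0          # below-diagonal entry vanishes in exact arithmetic

# normal equations reduce to Rx = Q^T b (Eq. 4.53); back-substitute
rhs = [dot(Q[i], b) for i in range(2)]
x = [0.0, 0.0]
for i in (1, 0):
    s = rhs[i] - sum(R[i][j] * x[j] for j in range(i + 1, 2))
    x[i] = s / R[i][i]
```

For this data the least squares solution is x = (2/3, 1/2), the same result the normal equations A^T A x = A^T b would give.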
5 Fourier Series
Consider two functions f(x) and g(x) defined in the interval [a, b] (i.e., a ≤ x ≤ b). We define the inner product (or scalar product) of f and g as

(f, g) = \int_a^b f(x)\,g(x)\,dx.  (5.1)

Note the analogy between this definition for functions and the dot product for vectors.
We define the norm N_f of a function f as

N_f = (f, f)^{1/2} = \sqrt{\int_a^b [f(x)]^2\,dx}.  (5.2)

Note that a new function \bar{f}(x) defined as

\bar{f}(x) = \frac{f(x)}{N_f}  (5.3)

has unit norm, i.e., (\bar{f}, \bar{f}) = 1. The function \bar{f} is said to be normalized over [a, b]. Thus, this definition of the norm of a function is analogous to the length of a vector.
We define two functions f and g as orthogonal over [a, b] if (f, g) = 0. Orthogonality of functions is analogous to the orthogonality of vectors. We define the set of functions φ_i(x), i = 1, 2, ..., as an orthonormal set over [a, b] if (φ_i, φ_j) = δ_{ij}, where δ_{ij} is the Kronecker delta defined as

δ_{ij} = \begin{cases} 1, & i = j, \\ 0, & i ≠ j. \end{cases}  (5.4)

For example, the set of functions

{1, cos x, sin x, cos 2x, sin 2x, cos 3x, sin 3x, ...}

is an orthogonal set over [−π, π], since, for integers m and n,

(cos nx, sin mx) = 0,
(sin nx, sin mx) = (cos nx, cos mx) = 0,  m ≠ n.  (5.5)
To compute the norms of these functions, we note that

\int_{−π}^{π} (1)^2\,dx = 2π, \qquad
\int_{−π}^{π} \sin^2 nx\,dx = \int_{−π}^{π} \cos^2 nx\,dx = π  (5.6)

for all integers n. Thus, the set of functions

\left\{ \frac{1}{\sqrt{2π}},\ \frac{\cos x}{\sqrt{π}},\ \frac{\sin x}{\sqrt{π}},\ \frac{\cos 2x}{\sqrt{π}},\ \frac{\sin 2x}{\sqrt{π}},\ \ldots \right\}

is an orthonormal set over [−π, π].
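Orthonormality of this set can be confirmed numerically. The sketch below (not from the original notes) approximates the inner product of Eq. 5.1 with a midpoint rule over [−π, π]:

```python
import math

def inner(f, g, a=-math.pi, b=math.pi, n=20000):
    """(f, g) = integral of f*g over [a, b], midpoint rule."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h)
               for k in range(n)) * h

one = lambda x: 1.0 / math.sqrt(2.0 * math.pi)   # normalized constant
c1  = lambda x: math.cos(x) / math.sqrt(math.pi)
s2  = lambda x: math.sin(2.0 * x) / math.sqrt(math.pi)

norms = [inner(one, one), inner(c1, c1), inner(s2, s2)]   # should be 1
cross = [inner(one, c1), inner(c1, s2), inner(one, s2)]   # should be 0
```

All self inner products come out 1 and all cross inner products come out 0 (to quadrature accuracy), matching (φ_i, φ_j) = δ_{ij}.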
The Fourier series of a function defined over [−L, L] is an expansion of the function into sines and cosines:

f(x) = A_0 + \sum_{n=1}^{∞} A_n \cos\frac{nπx}{L} + \sum_{n=1}^{∞} B_n \sin\frac{nπx}{L}.  (5.7)

This equation is an expansion in functions orthogonal over [−L, L], analogous to the expansion of a vector in orthogonal basis vectors. The basis functions here are

\left\{ 1,\ \cos\frac{πx}{L},\ \sin\frac{πx}{L},\ \cos\frac{2πx}{L},\ \sin\frac{2πx}{L},\ \cos\frac{3πx}{L},\ \sin\frac{3πx}{L},\ \ldots \right\}.
To determine the coefficients A_n, B_n of the Fourier series, we take the inner product of both sides of Eq. 5.7 with each basis function; that is, we multiply both sides of that equation by a basis function, and integrate. For example,

\int_{−L}^{L} f(x) \cos\frac{mπx}{L}\,dx
= A_0 \int_{−L}^{L} \cos\frac{mπx}{L}\,dx
+ \sum_{n=1}^{∞} A_n \int_{−L}^{L} \cos\frac{nπx}{L}\cos\frac{mπx}{L}\,dx
+ \sum_{n=1}^{∞} B_n \int_{−L}^{L} \sin\frac{nπx}{L}\cos\frac{mπx}{L}\,dx
= \begin{cases} A_0\,(2L), & m = 0, \\ A_m\,(L), & m ≠ 0, \end{cases}  (5.8)

since the integrals in Eq. 5.8 are

\int_{−L}^{L} \cos\frac{mπx}{L}\,dx = \begin{cases} 2L, & m = 0, \\ 0, & m ≠ 0, \end{cases}  (5.9)

\int_{−L}^{L} \cos\frac{nπx}{L}\cos\frac{mπx}{L}\,dx
= \int_{−L}^{L} \sin\frac{nπx}{L}\sin\frac{mπx}{L}\,dx
= \begin{cases} L, & m = n, \\ 0, & m ≠ n, \end{cases}  (5.10)

\int_{−L}^{L} \sin\frac{nπx}{L}\cos\frac{mπx}{L}\,dx = 0  (5.11)
for integers m and n. Similarly, we can evaluate B_n. Thus, the coefficients A_n, B_n in the Fourier series, Eq. 5.7, are

A_0 = \frac{1}{2L} \int_{−L}^{L} f(x)\,dx,  (5.12)

A_n = \frac{1}{L} \int_{−L}^{L} f(x) \cos\frac{nπx}{L}\,dx, \quad n > 0,  (5.13)

B_n = \frac{1}{L} \int_{−L}^{L} f(x) \sin\frac{nπx}{L}\,dx.  (5.14)

Note that A_0 is the average value of f(x) over the domain [−L, L].
The evaluation of the coefficients using integrals could alternatively have been expressed in terms of inner products, in which case, from Eq. 5.7 (with L = π),

(f, cos mx) = A_m (cos mx, cos mx)  (5.15)

or

A_n = \frac{(f, \cos nx)}{(\cos nx, \cos nx)}.  (5.16)

It is interesting to note the similarity between this last formula and the formula for the vector projection p of a vector b onto another vector a (Fig. 14):

p = \left( b \cdot \frac{a}{|a|} \right) \frac{a}{|a|} = \frac{b \cdot a}{a \cdot a}\,a.  (5.17)

Thus, the nth Fourier coefficient can be interpreted as being the "projection" of the function f on the nth basis function (cos nx).

Figure 14: The Projection of One Vector onto Another.
5.1 Example
We evaluate the Fourier series for the function

f(x) = \begin{cases} 0, & −5 < x < 0, \\ 3, & 0 < x < 5. \end{cases}  (5.18)

From Eqs. 5.12–5.14 (with L = 5), the Fourier coefficients are

A_0 = \frac{1}{10} \int_{−5}^{5} f(x)\,dx = \frac{1}{10} \int_0^5 3\,dx = 3/2,  (5.19)

A_n = \frac{1}{5} \int_{−5}^{5} f(x) \cos\frac{nπx}{5}\,dx = \frac{1}{5} \int_0^5 3 \cos\frac{nπx}{5}\,dx = 0, \quad n > 0,  (5.20)

B_n = \frac{1}{5} \int_{−5}^{5} f(x) \sin\frac{nπx}{5}\,dx = \frac{1}{5} \int_0^5 3 \sin\frac{nπx}{5}\,dx
= \begin{cases} 6/(nπ), & n\ \text{odd}, \\ 0, & n\ \text{even}. \end{cases}  (5.21)
Thus the corresponding Fourier series is

f(x) = \frac{3}{2} + \frac{6}{π} \sum_{n\ \text{odd}} \frac{1}{n} \sin\frac{nπx}{5}
= \frac{3}{2} + \frac{6}{π} \left( \sin\frac{πx}{5} + \frac{1}{3}\sin\frac{3πx}{5} + \frac{1}{5}\sin\frac{5πx}{5} + \cdots \right).  (5.22)
The convergence of this series can be seen by evaluating the partial sums

f(x) ≈ f_N(x) = A_0 + \sum_{n=1}^{N} A_n \cos\frac{nπx}{L} + \sum_{n=1}^{N} B_n \sin\frac{nπx}{L}  (5.23)

for different values of the upper limit N. Four such representations of f(x) are shown in Fig. 15 for N = 5, 10, 20, and 40. Since half the sine terms and all the cosine terms (except the constant term) are zero, the four partial sums have 4, 6, 11, and 21 nonzero terms, respectively.

Notice also in Fig. 15 that, as the number of terms included in the Fourier series increases, the overshoot which occurs at a discontinuity does not diminish. This property of Fourier series is referred to as the Gibbs phenomenon. Notice also that, at a discontinuity of the function, the series converges to the average value of the function at the discontinuity.
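These convergence claims are easy to check numerically. The sketch below (not part of the original notes) evaluates the partial sums of Eq. 5.22 at the jump x = 0, where the series converges to the average value 3/2, and at an interior point x = 2.5, where it approaches f(2.5) = 3:

```python
import math

def f_N(x, N):
    """Partial sum of the series in Eq. 5.22 through n = N."""
    s = 1.5                                  # A_0 = 3/2
    for n in range(1, N + 1, 2):             # only odd n contribute
        s += (6.0 / math.pi) * math.sin(n * math.pi * x / 5.0) / n
    return s

mid = f_N(0.0, 41)     # at the jump: every sine term vanishes, leaving 3/2
val = f_N(2.5, 401)    # interior point: slowly approaches f(2.5) = 3
```

The interior value converges only slowly (the coefficients decay like 1/n), consistent with the Gibbs behavior near the discontinuity.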
5.2 Generalized Fourier Series
The expansion of a function in a series of sines and cosines is a special case of an expansion
in other orthogonal functions. In the generalized Fourier series representation of the function
Figure 15: Convergence of Series in Example (partial sums for N = 5, 10, 20, 40).
f(x),

f(x) ≈ \sum_{i=1}^{N} c_i φ_i(x),  (5.24)

we wish to evaluate the coefficients c_i so as to minimize the mean square error

M_N = \int_a^b \left[ f(x) − \sum_{i=1}^{N} c_i φ_i(x) \right]^2 dx,  (5.25)

where (φ_i, φ_j) = δ_{ij} (Kronecker delta). Thus, the series representation in Eq. 5.24 is an expansion in orthonormal functions.
To minimize M_N, we set, for each j,

0 = \frac{∂M_N}{∂c_j} = \int_a^b 2\left[ f(x) − \sum_{i=1}^{N} c_i φ_i(x) \right] [−φ_j(x)]\,dx,  (5.26)

or

\int_a^b f(x)\,φ_j(x)\,dx = \sum_{i=1}^{N} c_i \int_a^b φ_i(x)\,φ_j(x)\,dx = \sum_{i=1}^{N} c_i δ_{ij} = c_j.  (5.27)

That is,

c_i = (f, φ_i).  (5.28)
Alternatively, had we started with the series, Eq. 5.24, we would have obtained the same result by taking the inner product of each side with φ_j:

(f, φ_j) = \sum_{i=1}^{N} c_i (φ_i, φ_j) = \sum_{i=1}^{N} c_i δ_{ij} = c_j.  (5.29)

The expansion coefficients c_i are called the Fourier coefficients, and the series is called the generalized Fourier series. The generalized Fourier series thus has the property that the series representation of a function minimizes the mean square error between the series and the function.
The formula for the generalized Fourier coefficient c_i in Eq. 5.28 can be interpreted as the projection of the function f(x) on the ith basis function φ_i. This interpretation is analogous to the expansion of a vector in terms of the unit basis vectors in the coordinate directions. For example, in three dimensions, the vector v is, in Cartesian coordinates,

v = v_x e_x + v_y e_y + v_z e_z,  (5.30)

where the components v_x, v_y, and v_z are the projections of v on the coordinate basis vectors e_x, e_y, and e_z. That is,

v_x = v \cdot e_x, \quad v_y = v \cdot e_y, \quad v_z = v \cdot e_z.  (5.31)

These components are analogous to the Fourier coefficients c_i.
From Eq. 5.25, we can deduce some additional properties of the Fourier series. From that equation, the mean square error after summing N terms of the series is

M_N = (f, f) − 2\sum_{i=1}^{N} c_i (f, φ_i) + \int_a^b \left[ \sum_{i=1}^{N} c_i φ_i \right]\left[ \sum_{j=1}^{N} c_j φ_j \right] dx
= (f, f) − 2\sum_{i=1}^{N} c_i^2 + \sum_{i=1}^{N}\sum_{j=1}^{N} c_i c_j (φ_i, φ_j)
= (f, f) − 2\sum_{i=1}^{N} c_i^2 + \sum_{i=1}^{N} c_i^2
= (f, f) − \sum_{i=1}^{N} c_i^2.  (5.32)
Since, by definition, M_N ≥ 0, this last equation implies

\sum_{i=1}^{N} c_i^2 ≤ (f, f),  (5.33)

which is referred to as Bessel's Inequality. Since the right-hand side of this inequality (the square of the norm) is fixed for a given function, and each term of the left-hand side is non-negative, a necessary condition for a Fourier series to converge is that c_i → 0 as i → ∞.
From Eq. 5.32, the error after summing N terms of the series is

M_N = (f, f) − \sum_{i=1}^{N} c_i^2,  (5.34)

so that the error after N + 1 terms is

M_{N+1} = (f, f) − \sum_{i=1}^{N+1} c_i^2 = (f, f) − \sum_{i=1}^{N} c_i^2 − c_{N+1}^2 = M_N − c_{N+1}^2.  (5.35)
Figure 16: A Function with a Jump Discontinuity.
That is, adding a new term in the series decreases the error (if that new term is nonzero). Thus, in general, higher order approximations are better than lower order approximations.

The set of basis functions φ_i(x) is defined as a complete set if

\lim_{N→∞} M_N = \lim_{N→∞} \int_a^b \left[ f(x) − \sum_{i=1}^{N} c_i φ_i(x) \right]^2 dx = 0.  (5.36)

If this equation holds, the Fourier series is said to converge in the mean. Having a complete set means that the set of basis functions φ_i(x) is not missing any members. Thus, we could alternatively define the set φ_i(x) as complete if there is no function with positive norm which is orthogonal to each basis function φ_i(x) in the set. It turns out that the set of basis functions in Eq. 5.7 forms a complete set in [−L, L].
Note that convergence in the mean does not imply pointwise convergence. That is, it may not be true that

\lim_{N→∞} \left| f(x) − \sum_{i=1}^{N} c_i φ_i(x) \right| = 0  (5.37)

for all x. For example, the series representation of a discontinuous function f(x) will not have pointwise convergence at the discontinuity (Fig. 16).

The vector analog to completeness for sets of functions can be illustrated by observing that, to express an arbitrary three-dimensional vector in terms of a sum of coordinate basis vectors, three basis vectors are needed. A basis consisting of only two unit vectors is incomplete, since it is incapable of representing arbitrary vectors in three dimensions.
5.3 Fourier Expansions Using a Polynomial Basis
Recall, in the generalized Fourier series, we represent the function f(x) with the expansion

f(x) = \sum_{i=1}^{N} c_i φ_i(x),  (5.38)

although the equality is approximate if N is finite. Let the basis functions be

\{φ_i\} = \{1, x, x^2, x^3, \ldots\},  (5.39)

which is a set of independent, nonorthogonal functions. If we take the inner product of both sides of Eq. 5.38 with φ_j, we obtain

(f, φ_j) = \sum_{i=1}^{N} c_i (φ_i, φ_j),  (5.40)

where, in general, since the basis functions are nonorthogonal,

(φ_i, φ_j) ≠ 0 \quad \text{for } i ≠ j.  (5.41)
Thus, Eq. 5.40 is a set of coupled equations which can be solved to yield c_i:

\left.\begin{aligned}
(f, φ_1) &= (φ_1, φ_1)c_1 + (φ_1, φ_2)c_2 + (φ_1, φ_3)c_3 + \cdots \\
(f, φ_2) &= (φ_2, φ_1)c_1 + (φ_2, φ_2)c_2 + (φ_2, φ_3)c_3 + \cdots \\
(f, φ_3) &= (φ_3, φ_1)c_1 + (φ_3, φ_2)c_2 + (φ_3, φ_3)c_3 + \cdots \\
&\ \ \vdots
\end{aligned}\right\}  (5.42)

or

Ac = b,  (5.43)

where

A_{ij} = (φ_i, φ_j), \quad
c = \begin{Bmatrix} c_1 \\ c_2 \\ c_3 \\ \vdots \\ c_N \end{Bmatrix}, \quad
b = \begin{Bmatrix} (f, φ_1) \\ (f, φ_2) \\ (f, φ_3) \\ \vdots \\ (f, φ_N) \end{Bmatrix},  (5.44)

and φ_i(x) = x^{i−1}.
If we assume the domain 0 ≤ x ≤ 1, we obtain

A_{ij} = \int_0^1 x^{i−1} x^{j−1}\,dx = \int_0^1 x^{i+j−2}\,dx
= \left. \frac{x^{i+j−1}}{i+j−1} \right|_0^1 = \frac{1}{i+j−1}  (5.45)

or

A = \begin{bmatrix}
1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \cdots \\
\frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \cdots \\
\frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \cdots \\
\frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix},  (5.46)

which is referred to as the Hilbert matrix.
There are several problems with this approach:

1. Since the set \{φ_i\} is not orthogonal, we must solve a set of coupled equations to obtain the coefficients c_i.

2. The coefficients c_i depend on the number N of terms in the expansion. That is, to add more terms, all c_i change, not just the new coefficients.

3. The resulting coefficient matrix, the Hilbert matrix, is terribly conditioned even for very small matrices.
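The ill conditioning in point 3 shows up already for tiny Hilbert matrices. As a sketch (not from the original notes), exact rational arithmetic exposes how close to singular the matrix is: the 5 × 5 Hilbert determinant is on the order of 10^{−12}, even though every entry is of order 1:

```python
from fractions import Fraction

def hilbert(n):
    """Hilbert matrix A_ij = 1/(i+j-1), Eq. 5.45, with exact fractions."""
    return [[Fraction(1, i + j - 1) for j in range(1, n + 1)]
            for i in range(1, n + 1)]

def det(M):
    """Determinant by fraction-exact Gaussian elimination (no pivoting
    needed here: the Hilbert matrix is positive definite)."""
    A = [row[:] for row in M]
    n = len(A)
    d = Fraction(1)
    for k in range(n):
        d *= A[k][k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
    return d

d5 = det(hilbert(5))   # exact rational value; tiny -> nearly singular in floats
```

A determinant this small relative to the entries signals that floating-point solves with this matrix lose many digits of accuracy.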
Figure 17: The First Two Vectors in Gram-Schmidt.
These problems can be eliminated if we switch to a set of orthogonal polynomials, which can be found using the Gram-Schmidt process. Let φ_i(x) denote the nonorthogonal set of basis functions

φ_0(x) = 1, \quad φ_1(x) = x, \quad φ_2(x) = x^2, \ldots,  (5.47)

where we have chosen to count the functions starting from zero. We let P_i(x) denote the set of orthogonal basis functions to be determined. These functions will not be required to have unit norm.
To facilitate the application of Gram-Schmidt to functions, we recall the Gram-Schmidt algorithm for vectors. Given a set of independent, nonorthogonal vectors a^{(1)}, a^{(2)}, a^{(3)}, ..., this procedure computes a set of orthonormal vectors q^{(1)}, q^{(2)}, q^{(3)}, ... that span the same space. For example, the first two orthonormal vectors are

q^{(1)} = \frac{a^{(1)}}{|a^{(1)}|}, \qquad
q^{(2)} = \frac{a^{(2)} − (a^{(2)}\cdot q^{(1)})q^{(1)}}{\left|a^{(2)} − (a^{(2)}\cdot q^{(1)})q^{(1)}\right|},  (5.48)

as illustrated in Fig. 17.
For convenience, we choose the domain −1 ≤ x ≤ 1, in which case

(φ_i, φ_j) = 0, \quad i + j = \text{odd},  (5.49)

since the integral of an odd power of x over symmetric limits is zero (i.e., the integrand is an odd function of x).
Thus, by analogy with Gram-Schmidt for vectors, we obtain

P_0(x) = φ_0(x) = 1,  (5.50)

P_1(x) = φ_1(x) − (φ_1, P_0)\frac{P_0(x)}{(P_0, P_0)} = x − (x, 1)\frac{1}{(1, 1)} = x − 0 = x,  (5.51)

P_2(x) = φ_2(x) − (φ_2, P_0)\frac{P_0(x)}{(P_0, P_0)} − (φ_2, P_1)\frac{P_1(x)}{(P_1, P_1)}
= x^2 − (x^2, 1)\frac{1}{(1, 1)} − (x^2, x)\frac{x}{(x, x)},  (5.52)

where

(x^2, 1) = \int_{−1}^{1} x^2\,dx = \frac{2}{3}, \qquad
(1, 1) = \int_{−1}^{1} dx = 2, \qquad
(x^2, x) = 0.  (5.53)

Thus,

P_2(x) = x^2 − \frac{1}{3},  (5.54)

and so on. These polynomials, called Legendre polynomials, are orthogonal over [−1, 1]. Because of their orthogonality, the Legendre polynomials are a better choice for polynomial expansions than the polynomials of the Taylor series, Eq. 5.47.
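The mutual orthogonality of P_0 = 1, P_1 = x, and P_2 = x^2 − 1/3 over [−1, 1] can be checked numerically. This sketch (not from the original notes) uses a midpoint-rule inner product:

```python
# verify (P_i, P_j) = 0 for i != j over [-1, 1], using Eqs. 5.50-5.54
def inner(f, g, n=20000):
    """(f, g) = integral of f*g over [-1, 1], midpoint rule."""
    h = 2.0 / n
    return sum(f(-1.0 + (k + 0.5) * h) * g(-1.0 + (k + 0.5) * h)
               for k in range(n)) * h

P0 = lambda x: 1.0
P1 = lambda x: x
P2 = lambda x: x * x - 1.0 / 3.0    # from Eq. 5.54

checks = [inner(P0, P1), inner(P0, P2), inner(P1, P2)]   # all should vanish
```

All three cross inner products vanish (to quadrature accuracy); subtracting the 1/3 term is exactly what removes the component of x^2 along P_0.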
With the basis functions φ_i given by the Legendre polynomials, Eq. 5.40 becomes

(f, φ_j) = \sum_{i=1}^{N} c_i (φ_i, φ_j) = c_j (φ_j, φ_j) \quad \text{(no sum on } j\text{)},  (5.55)

since each equation in Eq. 5.40 has only one nonzero term on the right-hand side (the term i = j).
5.4 Similarity of Fourier Series With Least Squares Fitting
In the least squares problem, given a finite number m of points (x_i, f_i) and the n-term fitting function

p(x) = a_i φ_i(x),  (5.56)

the normal equations are, from Eq. 4.36,

[φ_i(x_k)\,φ_j(x_k)]\,a_j = f_j\,φ_i(x_j),  (5.57)

where the order of the resulting system is n, the number of basis functions. The summation convention is used, so that a summation is implied over repeated indices.

In the Fourier series problem, given a function f(x) and the generalized Fourier series

f(x) = c_i φ_i(x),  (5.58)

the coefficients are determined from

(φ_i, φ_j)\,c_j = (f, φ_i),  (5.59)

where the summation convention is used again.
Note the similarities between Eqs. 5.57 and 5.59. These two equations are essentially
the same, except that Eq. 5.57 is the discrete form of Eq. 5.59. Also, the basis functions in
Eq. 5.59 are orthogonal, so that Eq. 5.59 is a diagonal system.
6 Eigenvalue Problems
If, for a square matrix A of order n, there exists a vector x ≠ 0 and a number λ such that

Ax = λx,  (6.1)

λ is called an eigenvalue of A, and x is the corresponding eigenvector. Note that Eq. 6.1 can alternatively be written in the form

Ax = λIx,  (6.2)

where I is the identity matrix, or

(A − λI)x = 0,  (6.3)
Figure 18: 2-DOF Mass-Spring System.
which is a system of n linear algebraic equations. If the coefficient matrix A − λI were nonsingular, the unique solution of Eq. 6.3 would be x = 0, which is not of interest. Thus, to obtain a nonzero solution of the eigenvalue problem, A − λI must be singular, which implies that

\det(A − λI) = \begin{vmatrix}
a_{11} − λ & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} − λ & a_{23} & \cdots & a_{2n} \\
a_{31} & a_{32} & a_{33} − λ & \cdots & a_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} − λ
\end{vmatrix} = 0.  (6.4)

We note that this determinant, when expanded, yields a polynomial of degree n referred to as the characteristic polynomial associated with the matrix. The equation obtained by equating this polynomial with zero is referred to as the characteristic equation for the matrix. The eigenvalues are thus the roots of the characteristic polynomial. Since a polynomial of degree n has n roots, A has n eigenvalues (not necessarily real).
Eigenvalue problems arise in many physical applications, including free vibrations of
mechanical systems, buckling of structures, and the calculation of principal axes of stress,
strain, and inertia.
6.1 Example 1: Mechanical Vibrations
Consider the undamped two-degree-of-freedom mass-spring system shown in Fig. 18. We let u_1 and u_2 denote the displacements from equilibrium of the two masses m_1 and m_2. The stiffnesses of the two springs are k_1 and k_2.

For the purposes of computation, let k_1 = k_2 = 1 and m_1 = m_2 = 1. From Newton's second law of motion (F = ma), the equations of motion of this system are

\left.\begin{aligned}
\ddot{u}_1 + 2u_1 − u_2 &= 0 \\
\ddot{u}_2 − u_1 + u_2 &= f_2
\end{aligned}\right\}  (6.5)
where dots denote differentiation with respect to the time t. In matrix notation, this system can be written

\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{Bmatrix} \ddot{u}_1 \\ \ddot{u}_2 \end{Bmatrix}
+ \begin{bmatrix} 2 & −1 \\ −1 & 1 \end{bmatrix}
\begin{Bmatrix} u_1 \\ u_2 \end{Bmatrix}
= \begin{Bmatrix} 0 \\ f_2 \end{Bmatrix}  (6.6)

or

M\ddot{u} + Ku = F,  (6.7)

where

M = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad
K = \begin{bmatrix} 2 & −1 \\ −1 & 1 \end{bmatrix}, \quad
F = \begin{Bmatrix} 0 \\ f_2 \end{Bmatrix}.  (6.8)
These three matrices are referred to as the system’s mass, stiﬀness, and force matrices,
respectively.
With zero applied force (the free undamped vibration problem),

M\ddot{u} + Ku = 0.  (6.9)

We look for nonzero solutions of this equation in the form

u = u_0 \cos ωt.  (6.10)

That is, we look for solutions of Eq. 6.9 which are sinusoidal in time and where both DOF oscillate with the same circular frequency ω. The vector u_0 is the amplitude vector for the solution. The substitution of Eq. 6.10 into Eq. 6.9 yields

−Mω^2 u_0 \cos ωt + Ku_0 \cos ωt = 0.  (6.11)

Since this equation must hold for all time t,

(−ω^2 M + K)u_0 = 0  (6.12)

or

Ku_0 = ω^2 Mu_0  (6.13)

or

M^{−1}Ku_0 = ω^2 u_0.  (6.14)

This is an eigenvalue problem in standard form if we define A = M^{−1}K and λ = ω^2:

Au_0 = λu_0  (6.15)

or

(A − λI)u_0 = 0.  (6.16)

For this problem,

A = M^{−1}K = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 2 & −1 \\ −1 & 1 \end{bmatrix}
= \begin{bmatrix} 2 & −1 \\ −1 & 1 \end{bmatrix}.  (6.17)

Since we want the eigenvalues and eigenvectors of A, it follows from Eq. 6.16 that

\det(A − λI) = \begin{vmatrix} 2−λ & −1 \\ −1 & 1−λ \end{vmatrix}
= (2−λ)(1−λ) − 1 = λ^2 − 3λ + 1 = 0.  (6.18)

Thus,

λ = \frac{3 ± \sqrt{5}}{2} ≈ 0.382 \text{ and } 2.618,  (6.19)

and the circular frequencies ω (the square root of λ) are

ω_1 = \sqrt{λ_1} = 0.618, \qquad ω_2 = \sqrt{λ_2} = 1.618.  (6.20)
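The numbers in Eqs. 6.19–6.22 are easy to verify. The sketch below (not part of the original notes) evaluates the roots of the characteristic polynomial λ^2 − 3λ + 1 = 0 and confirms that the first mode shape of Eq. 6.22, whose first component is exactly (√5 − 1)/2 ≈ 0.618, satisfies Au = λ_1 u:

```python
import math

A = [[2.0, -1.0], [-1.0, 1.0]]          # A = M^{-1} K, Eq. 6.17

# roots of lambda^2 - 3*lambda + 1 = 0 (Eq. 6.18)
lam1 = (3.0 - math.sqrt(5.0)) / 2.0
lam2 = (3.0 + math.sqrt(5.0)) / 2.0
omega1, omega2 = math.sqrt(lam1), math.sqrt(lam2)

# first mode shape (Eq. 6.22); exact first component is (sqrt(5)-1)/2
u1 = [(math.sqrt(5.0) - 1.0) / 2.0, 1.0]
Au1 = [A[0][0] * u1[0] + A[0][1] * u1[1],
       A[1][0] * u1[0] + A[1][1] * u1[1]]
```

The product Au1 equals λ_1 u1 to machine precision, and ω_1 ≈ 0.618, ω_2 ≈ 1.618 match Eq. 6.20.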
Figure 19: Mode Shapes for 2-DOF Mass-Spring System. (a) Undeformed shape, (b) Mode 1 (masses in-phase), (c) Mode 2 (masses out-of-phase).
To obtain the eigenvectors, we solve Eq. 6.16 for each eigenvalue found. For the first eigenvalue, λ = 0.382, we obtain

\begin{bmatrix} 1.618 & −1 \\ −1 & 0.618 \end{bmatrix}
\begin{Bmatrix} u_1 \\ u_2 \end{Bmatrix}
= \begin{Bmatrix} 0 \\ 0 \end{Bmatrix},  (6.21)

which is a redundant system of equations satisfied by

u^{(1)} = \begin{Bmatrix} 0.618 \\ 1 \end{Bmatrix}.  (6.22)

The superscript indicates that this is the eigenvector associated with the first eigenvalue. Note that, since the eigenvalue problem is a homogeneous problem, any nonzero multiple of u^{(1)} is also an eigenvector.

For the second eigenvalue, λ = 2.618, we obtain

\begin{bmatrix} −0.618 & −1 \\ −1 & −1.618 \end{bmatrix}
\begin{Bmatrix} u_1 \\ u_2 \end{Bmatrix}
= \begin{Bmatrix} 0 \\ 0 \end{Bmatrix},  (6.23)

a solution of which is

u^{(2)} = \begin{Bmatrix} 1 \\ −0.618 \end{Bmatrix}.  (6.24)

Physically, ω_i is a natural frequency of the system (in radians per second), and u^{(i)} is the corresponding mode shape. An n-DOF system has n natural frequencies and mode shapes. Note that natural frequencies are not commonly expressed in radians per second, but in cycles per second or Hertz (Hz), where ω = 2πf. The lowest natural frequency for a system is referred to as the fundamental frequency for the system.

The mode shapes for this example are sketched in Fig. 19. For Mode 1, the two masses are vibrating in-phase, and m_2 has the larger displacement, as expected. For Mode 2, the two masses are vibrating out-of-phase.

The determinant approach used above is fine for small matrices but not well-suited to numerical computation involving large matrices.
6.2 Properties of the Eigenvalue Problem
There are several useful properties associated with the eigenvalue problem:
Property 1. Eigenvectors are unique only up to an arbitrary multiplicative constant.
From the eigenvalue problem, Eq. 6.1, if x is a solution, αx is also a solution for any nonzero
scalar α. We note that Eq. 6.1 is a homogeneous equation.
Property 2. The eigenvalues of a triangular matrix are the diagonal entries. For example, let

A = \begin{bmatrix} 1 & 4 & 5 \\ 0 & 4 & 7 \\ 0 & 0 & 9 \end{bmatrix},  (6.25)

for which the characteristic equation is

\det(A − λI) = \begin{vmatrix} 1−λ & 4 & 5 \\ 0 & 4−λ & 7 \\ 0 & 0 & 9−λ \end{vmatrix}
= (1−λ)(4−λ)(9−λ) = 0,  (6.26)

which implies λ = 1, 4, 9. Although the eigenvalues of a triangular matrix are obvious, it turns out that the eigenvectors are not obvious.
Property 3. The eigenvalues of a diagonal matrix are the diagonal entries, and the eigenvectors are

\begin{Bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{Bmatrix}, \quad
\begin{Bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{Bmatrix}, \quad
\begin{Bmatrix} 0 \\ 0 \\ 1 \\ \vdots \\ 0 \end{Bmatrix}, \quad \text{etc.}  (6.27)

For example, let

A = \begin{bmatrix} 11 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 9 \end{bmatrix}.  (6.28)

For λ_1 = 11,

x^{(1)} = \begin{Bmatrix} 1 \\ 0 \\ 0 \end{Bmatrix},  (6.29)

since

Ax^{(1)} = \begin{bmatrix} 11 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 9 \end{bmatrix}
\begin{Bmatrix} 1 \\ 0 \\ 0 \end{Bmatrix}
= \begin{Bmatrix} 11 \\ 0 \\ 0 \end{Bmatrix}
= 11 \begin{Bmatrix} 1 \\ 0 \\ 0 \end{Bmatrix} = λ_1 x^{(1)}.  (6.30)

Similarly,

Ax^{(2)} = 4 \begin{Bmatrix} 0 \\ 1 \\ 0 \end{Bmatrix}, \qquad
Ax^{(3)} = 9 \begin{Bmatrix} 0 \\ 0 \\ 1 \end{Bmatrix}.  (6.31)
Property 4. The sum of the eigenvalues of A is equal to the trace of A; i.e., for a matrix of order n,

\sum_{i=1}^{n} λ_i = \operatorname{tr} A.  (6.32)

(The trace of a matrix is defined as the sum of the main diagonal terms.) To prove this property, we first note, from Eq. 6.4, that

\det(A − λI) = \begin{vmatrix}
a_{11} − λ & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} − λ & a_{23} & \cdots & a_{2n} \\
a_{31} & a_{32} & a_{33} − λ & \cdots & a_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} − λ
\end{vmatrix}
= (−λ)^n + (a_{11} + a_{22} + \cdots + a_{nn})(−λ)^{n−1} + \cdots.  (6.33)

No other products in the determinant contribute to the (−λ)^{n−1} term. For example, the cofactor of a_{12} deletes two matrix terms involving λ, so that the largest power of λ from that cofactor is n − 2. On the other hand, since det(A − λI) is a polynomial of degree n, we can write it in the factored form

\det(A − λI) = (λ_1 − λ)(λ_2 − λ) \cdots (λ_n − λ)
= (−λ)^n + (λ_1 + λ_2 + \cdots + λ_n)(−λ)^{n−1} + \cdots + λ_1 λ_2 λ_3 \cdots λ_n.  (6.34)

Since the polynomials in Eqs. 6.33 and 6.34 must be identically equal for all λ, we equate the coefficients of the (−λ)^{n−1} terms to obtain

λ_1 + λ_2 + \cdots + λ_n = a_{11} + a_{22} + \cdots + a_{nn},  (6.35)

which is the desired result.
Property 5. The product of the eigenvalues equals the determinant of A; i.e.,

λ_1 λ_2 \cdots λ_n = \det A.  (6.36)

This property results immediately from Eq. 6.34 by setting λ = 0.
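Properties 4 and 5 can be spot-checked on the vibration example of §6.1, where A has trace 3 and determinant 1 and the eigenvalues are (3 ± √5)/2. This sketch is illustrative only and not part of the original notes:

```python
import math

A = [[2.0, -1.0], [-1.0, 1.0]]            # matrix from Eq. 6.17
lam1 = (3.0 - math.sqrt(5.0)) / 2.0       # eigenvalues from Eq. 6.19
lam2 = (3.0 + math.sqrt(5.0)) / 2.0

trace = A[0][0] + A[1][1]                          # Property 4: tr A
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]        # Property 5: det A
```

The sum of the eigenvalues is 3 = tr A, and their product is (9 − 5)/4 = 1 = det A.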
Property 6. If the eigenvectors of A are linearly independent, and a matrix S is formed whose columns are the eigenvectors, then the matrix product

S^{−1}AS = Λ = \begin{bmatrix}
λ_1 & & & & \\
& λ_2 & & & \\
& & λ_3 & & \\
& & & \ddots & \\
& & & & λ_n
\end{bmatrix}  (6.37)

is diagonal. This property is proved in block form as follows:

AS = A\left[ x^{(1)}\ x^{(2)}\ \cdots\ x^{(n)} \right]
= \left[ Ax^{(1)}\ Ax^{(2)}\ \cdots\ Ax^{(n)} \right]
= \left[ λ_1 x^{(1)}\ λ_2 x^{(2)}\ \cdots\ λ_n x^{(n)} \right]
= \left[ x^{(1)}\ x^{(2)}\ \cdots\ x^{(n)} \right]
\begin{bmatrix}
λ_1 & & & \\
& λ_2 & & \\
& & \ddots & \\
& & & λ_n
\end{bmatrix} = SΛ.  (6.38)

Since the columns of S are independent, S is invertible, and

S^{−1}AS = Λ \quad \text{or} \quad A = SΛS^{−1}.  (6.39)
Property 7. Distinct eigenvalues yield independent eigenvectors. To prove this property, we consider two eigensolutions:

Ax^{(1)} = λ_1 x^{(1)},  (6.40)

Ax^{(2)} = λ_2 x^{(2)}.  (6.41)

To show that x^{(1)} and x^{(2)} are independent, we must show that

c_1 x^{(1)} + c_2 x^{(2)} = 0  (6.42)

implies c_1 = c_2 = 0. We first multiply Eq. 6.42 by A to obtain

c_1 Ax^{(1)} + c_2 Ax^{(2)} = c_1 λ_1 x^{(1)} + c_2 λ_2 x^{(2)} = 0.  (6.43)

We then multiply Eq. 6.42 by λ_2 to obtain

c_1 λ_2 x^{(1)} + c_2 λ_2 x^{(2)} = 0.  (6.44)

If we subtract these last two equations, we obtain

c_1 (λ_1 − λ_2) x^{(1)} = 0,  (6.45)

which implies c_1 = 0, since the eigenvalues are distinct (λ_1 ≠ λ_2). Similarly, c_2 = 0, and the eigenvectors are independent. From the last two properties, we conclude that a matrix with distinct eigenvalues can always be diagonalized. It turns out that a matrix with repeated eigenvalues cannot always be diagonalized.
Property 8. The eigenvalues and eigenvectors of a real, symmetric matrix are real. To prove this property, we consider the eigenvalue problem

Ax = λx,  (6.46)

where A is a real, symmetric matrix. If we take the complex conjugate of both sides of this equation, we obtain

Ax^* = λ^* x^*,  (6.47)

where the asterisk (*) denotes the complex conjugate. The goal is to show that λ = λ^*. We multiply Eq. 6.46 by x^{*T} (the conjugate transpose) and Eq. 6.47 by x^T to obtain

x^{*T}Ax = λ\,x^{*T}x,  (6.48)

x^T Ax^* = λ^*\,x^T x^*,  (6.49)

where the two left-hand sides are equal, since they are the transposes of each other, and both are scalars (1 × 1 matrices). Thus,

λ\,x^{*T}x = λ^*\,x^T x^*,  (6.50)

where x^{*T}x = x^T x^* ≠ 0 implies λ = λ^* (i.e., λ is real). The eigenvectors are also real, since they are obtained by solving the equation

(A − λI)x = 0  (6.51)

for x, given λ.
In engineering applications, the form of the eigenvalue problem that is of greater interest than Eq. 6.46 is the generalized eigenvalue problem

Ax = λBx,  (6.52)

where both A and B are real and symmetric. Having A and B real and symmetric is not sufficient, as is, to show that the generalized eigenvalue problem has only real eigenvalues, as can be seen from the example

A = \begin{bmatrix} 1 & 0 \\ 0 & −1 \end{bmatrix}, \qquad
B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},  (6.53)

for which Eq. 6.52 has imaginary eigenvalues ±i, where i = \sqrt{−1}. However, it turns out that, if B is not only real and symmetric but also positive definite (a term to be defined later in §6.8, p. 82), the generalized eigenvalue problem has real eigenvalues [10]. Since, in engineering applications, B is usually positive definite, the generalized eigenvalue problems that arise in engineering have real eigenvalues.
Matrix formulations of engineering problems generally yield symmetric coeﬃcient matri-
ces, so this property is of great practical signiﬁcance. As a result, various physical quantities
of interest, which must be real, are guaranteed to be real, including the natural frequencies
of vibration and mode shapes (the eigenvectors) of undamped mechanical systems, buckling
loads in elastic stability problems, and principal stresses.
Property 9. The eigenvectors of a real, symmetric matrix with distinct eigenvalues are mutually orthogonal. To prove this property, we consider from the outset the generalized eigenvalue problem

Ax = λBx,  (6.54)

where A and B are both real, symmetric matrices. For two eigenvalues λ_1 and λ_2,

Ax^{(1)} = λ_1 Bx^{(1)},  (6.55)

Ax^{(2)} = λ_2 Bx^{(2)},  (6.56)

from which it follows that the scalar

λ_1 x^{(2)T}Bx^{(1)} = x^{(2)T}Ax^{(1)} = x^{(1)T}Ax^{(2)} = λ_2 x^{(1)T}Bx^{(2)} = λ_2 x^{(2)T}Bx^{(1)}.  (6.57)

Thus,

(λ_1 − λ_2)\,x^{(2)T}Bx^{(1)} = 0,  (6.58)

and, if the eigenvalues are distinct (λ_1 ≠ λ_2),

x^{(2)T}Bx^{(1)} = 0.  (6.59)

Also, since Ax = λBx,

x^{(2)T}Ax^{(1)} = 0.  (6.60)

The eigenvectors are said to be orthogonal with respect to the matrices A and B.
In mechanical vibrations, A represents the stiffness matrix, and B represents the mass matrix. For x an eigenvector, Eq. 6.54 implies the scalar equation

x^T Ax = λ\,x^T Bx  (6.61)

or

λ = \frac{x^T Ax}{x^T Bx},  (6.62)

which is referred to as the Rayleigh quotient. In mechanical vibrations, the numerator and denominator are the generalized stiffness and mass, respectively, for a given vibration mode, and λ = ω^2 is the square of the circular frequency.

The orthogonality relations proved here also imply that

S^T AS = Λ\,S^T BS,  (6.63)

where S is the matrix of eigenvectors (in the columns), Λ is the diagonal matrix of eigenvalues defined in Eq. 6.37, and S^T AS and S^T BS are both diagonal matrices. To prove Eq. 6.63, we note that, with the matrix S of eigenvectors defined as

S = \left[ x^{(1)}\ x^{(2)}\ \cdots \right],  (6.64)

it follows that

S^T AS = \begin{bmatrix} x^{(1)T} \\ x^{(2)T} \\ \vdots \end{bmatrix}
A \left[ x^{(1)}\ x^{(2)}\ \cdots \right]
= \begin{bmatrix}
x^{(1)T}Ax^{(1)} & x^{(1)T}Ax^{(2)} & \cdots \\
x^{(2)T}Ax^{(1)} & x^{(2)T}Ax^{(2)} & \cdots \\
\vdots & \vdots & \ddots
\end{bmatrix},  (6.65)

where the off-diagonal terms are zero by orthogonality. A similar expression applies for B.
If Ax = λx (i.e., B = I), and the eigenvectors are normalized to unit length, then the matrix S is orthogonal, and

S^T AS = Λ\,S^T IS = Λ  (6.66)

or

A = SΛS^T.  (6.67)

This last result, referred to as the spectral theorem, means that a real, symmetric matrix can be factored into A = SΛS^T, where S has orthonormal eigenvectors in the columns, and Λ is a diagonal matrix of eigenvalues.

Conversely, if the eigenvalues of a matrix A are real, and the eigenvectors are mutually orthogonal, A is necessarily symmetric. Since orthogonal eigenvectors are independent, Eq. 6.39 implies that A can be written in the form

A = SΛS^{−1},  (6.68)

where S is the matrix whose columns are the eigenvectors. If the orthogonal eigenvectors are normalized to unit length, S is also orthogonal (i.e., S^{−1} = S^T), and

A = SΛS^T,  (6.69)

which is a symmetric matrix.
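The spectral theorem can be verified on the symmetric matrix of §6.1. The sketch below (not part of the original notes) builds S from the normalized eigenvectors of A = [[2, −1], [−1, 1]] and reassembles A = SΛS^T; the exact first component of the first eigenvector is g = (√5 − 1)/2:

```python
import math

g = (math.sqrt(5.0) - 1.0) / 2.0
lam = [(3.0 - math.sqrt(5.0)) / 2.0,     # eigenvalues of Eq. 6.19
       (3.0 + math.sqrt(5.0)) / 2.0]

# orthonormal eigenvectors of the symmetric matrix A = [[2,-1],[-1,1]],
# placed in the columns of S (Eq. 6.67)
n = math.sqrt(g * g + 1.0)
S = [[g / n, 1.0 / n],
     [1.0 / n, -g / n]]

# reassemble A = S Lambda S^T
SL = [[S[i][j] * lam[j] for j in range(2)] for i in range(2)]
A_rebuilt = [[sum(SL[i][k] * S[j][k] for k in range(2)) for j in range(2)]
             for i in range(2)]
```

The product reproduces [[2, −1], [−1, 1]] to machine precision, confirming both the factorization and the orthogonality of the two eigenvector columns.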
6.3 Example 2: Principal Axes of Stress
The matrix of stress components in 3-D elasticity is a tensor of rank 2:

σ = \begin{bmatrix}
σ_{11} & σ_{12} & σ_{13} \\
σ_{21} & σ_{22} & σ_{23} \\
σ_{31} & σ_{32} & σ_{33}
\end{bmatrix},  (6.70)

where the diagonal components of the matrix are the normal (or direct) stresses, and the off-diagonal components are the shear stresses. This matrix is symmetric.
Recall that an orthogonal transformation of coordinates transforms σ according to the rule

σ' = RσR^T,  (6.71)

where

R_{ij} = e'_i \cdot e_j,  (6.72)

e'_i and e_j are unit vectors in the coordinate directions, and R is an orthogonal matrix (RR^T = I).
A key problem is to determine whether there is an orthogonal transformation of coordinates (i.e., a rotation of axes) x' = Rx such that the stress tensor σ is diagonalized:

σ' = \begin{bmatrix}
σ'_{11} & 0 & 0 \\
0 & σ'_{22} & 0 \\
0 & 0 & σ'_{33}
\end{bmatrix}
= \begin{bmatrix}
s_1 & 0 & 0 \\
0 & s_2 & 0 \\
0 & 0 & s_3
\end{bmatrix},  (6.73)

where the diagonal stresses in this new coordinate system are denoted s_1, s_2, s_3.
From Eq. 6.71,

σR^T = R^T σ'.  (6.74)

We now let v_i denote the ith column of R^T; i.e.,

R^T = [v_1\ v_2\ v_3],  (6.75)

in which case Eq. 6.74 can be written in matrix form as

σ[v_1\ v_2\ v_3] = [v_1\ v_2\ v_3]
\begin{bmatrix}
s_1 & 0 & 0 \\
0 & s_2 & 0 \\
0 & 0 & s_3
\end{bmatrix}
= [s_1 v_1\ s_2 v_2\ s_3 v_3].  (6.76)

(The various matrices in this equation are conformable from a "block" point of view, since the left-hand side is, for example, the product of a 1 × 1 matrix with a 1 × 3 matrix.) Each column of this equation is

σv_i = s_i v_i \quad \text{(no sum on } i\text{)}.  (6.77)
Thus, the original desire to find a coordinate rotation which would transform the stress tensor to diagonal form reduces to seeking vectors v such that

σv = sv.  (6.78)

Eq. 6.78 is an eigenvalue problem with s the eigenvalue, and v the corresponding eigenvector. The goal in solving Eq. 6.78 (and eigenvalue problems in general) is to find nonzero vectors v which satisfy Eq. 6.78 for some scalar s. Geometrically, the goal in solving Eq. 6.78 is to find nonzero vectors v which, when multiplied by the matrix σ, result in new vectors which are parallel to v.

Eq. 6.78 is equivalent to the matrix system

(σ − sI)v = 0.  (6.79)
For nontrivial solutions (v ≠ 0), the matrix σ − sI must be singular, implying that

\det(σ − sI) = \begin{vmatrix}
σ_{11} − s & σ_{12} & σ_{13} \\
σ_{21} & σ_{22} − s & σ_{23} \\
σ_{31} & σ_{32} & σ_{33} − s
\end{vmatrix} = 0,  (6.80)
which is referred to as the characteristic equation associated with the eigenvalue problem. The characteristic equation is a cubic polynomial in s, since, when expanded, it yields

(σ_{11} − s)[(σ_{22} − s)(σ_{33} − s) − σ_{23}σ_{32}]
− σ_{12}[σ_{21}(σ_{33} − s) − σ_{23}σ_{31}]
+ σ_{13}[σ_{21}σ_{32} − σ_{31}(σ_{22} − s)] = 0  (6.81)

or

−s^3 + θ_1 s^2 − θ_2 s + θ_3 = 0,  (6.82)

where

\left.\begin{aligned}
θ_1 &= σ_{11} + σ_{22} + σ_{33} = σ_{ii} = \operatorname{tr} σ \\
θ_2 &= σ_{22}σ_{33} + σ_{33}σ_{11} + σ_{11}σ_{22} − σ_{31}^2 − σ_{12}^2 − σ_{23}^2
= \tfrac{1}{2}(σ_{ii}σ_{jj} − σ_{ij}σ_{ji}) \\
θ_3 &= \det σ.
\end{aligned}\right\}  (6.83)

[Figure 20: 3-DOF Mass-Spring System (springs $k_1 = 3$, $k_2 = 2$, $k_3 = 1$; masses $m_1 = 1$, $m_2 = 2$, $m_3 = 3$; displacements $u_1(t)$, $u_2(t)$, $u_3(t)$).]
The stresses $s_1$, $s_2$, $s_3$, which are the three solutions of the characteristic equation, are referred to as the principal stresses. The resulting stress tensor is
$$\sigma = \begin{bmatrix} s_1 & 0 & 0 \\ 0 & s_2 & 0 \\ 0 & 0 & s_3 \end{bmatrix}, \quad (6.84)$$
and the coordinate axes of the coordinate system in which the stress tensor is diagonal are referred to as the principal axes of stress or the principal coordinates.
The principal axes of stress are the eigenvectors of the stress tensor, since the (normalized) eigenvectors are columns of $R^T$ (rows of $R$), where
$$R_{ij} = e'_i \cdot e_j. \quad (6.85)$$
Thus, Row i of R consists of the direction cosines of the ith principal axis.
Since the principal stresses are independent of the original coordinate system, the coefficients of the characteristic polynomial, Eq. 6.82, must be invariant with respect to a coordinate rotation. Thus, $\theta_1$, $\theta_2$, and $\theta_3$ are referred to as the invariants of the stress tensor. That is, the three invariants have the same values in all coordinate systems.
In principal coordinates, the stress invariants are
$$\begin{cases} \theta_1 = s_1 + s_2 + s_3 \\ \theta_2 = s_2 s_3 + s_3 s_1 + s_1 s_2 \\ \theta_3 = s_1 s_2 s_3. \end{cases} \quad (6.86)$$
We note that the above theory for eigenvalues, principal axes, and invariants is applicable to all tensors of rank 2, since we did not use the fact that we were dealing specifically with stress. For example, strain is also a tensor of rank 2. A geometrical example of a tensor of rank 2 is the inertia matrix $I_{ij}$, whose matrix elements are moments of inertia. The determination of principal axes of inertia in two dimensions is an eigenvalue problem.
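Numerically, the principal stresses are simply the eigenvalues of the stress tensor, and the invariants can be checked directly. A minimal sketch in Python with NumPy, using a symmetric stress tensor whose values are chosen purely for illustration:

```python
import numpy as np

# Hypothetical symmetric stress tensor (values for illustration only)
sigma = np.array([[50.0,  30.0,  0.0],
                  [30.0, -20.0,  0.0],
                  [ 0.0,   0.0, 10.0]])

# Principal stresses are the eigenvalues; principal axes are the eigenvectors
s, v = np.linalg.eigh(sigma)   # eigh exploits symmetry

# The invariants of Eq. 6.83 ...
theta1 = np.trace(sigma)
theta2 = 0.5 * (np.trace(sigma)**2 - np.trace(sigma @ sigma))
theta3 = np.linalg.det(sigma)

# ... equal their principal-coordinate forms of Eq. 6.86
print(np.allclose(theta1, s.sum()))
print(np.allclose(theta2, s[0]*s[1] + s[1]*s[2] + s[2]*s[0]))
print(np.allclose(theta3, s.prod()))
```

All three checks print `True` regardless of the stress values chosen, since the invariants do not depend on the coordinate system.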
6.4 Computing Eigenvalues by Power Iteration
Determinants are not a practical way to compute eigenvalues and eigenvectors except for very
small systems (n = 2 or 3). Here we show a classical approach to solving the eigenproblem
for large matrices. We will illustrate the procedure with a mechanical vibrations example.
Consider the 3-DOF system shown in Fig. 20. For this system, the mass and stiﬀness matrices
are
$$M = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}, \quad K = \begin{bmatrix} 5 & -2 & 0 \\ -2 & 3 & -1 \\ 0 & -1 & 1 \end{bmatrix}. \quad (6.87)$$
The eigenvalue problem is, from Eq. 6.13,
Ku = λMu (6.88)
or
Au = λu, (6.89)
where u is the vector of displacement amplitudes, and
$$A = M^{-1}K = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & \tfrac{1}{3} \end{bmatrix} \begin{bmatrix} 5 & -2 & 0 \\ -2 & 3 & -1 \\ 0 & -1 & 1 \end{bmatrix} = \begin{bmatrix} 5 & -2 & 0 \\ -1 & \tfrac{3}{2} & -\tfrac{1}{2} \\ 0 & -\tfrac{1}{3} & \tfrac{1}{3} \end{bmatrix}. \quad (6.90)$$
Notice that the product of two symmetric matrices need not be symmetric, even if the ﬁrst
matrix is diagonal.
By deﬁnition, a solution (eigenvector) u of Eq. 6.89 is a vector such that Au is a scalar
multiple of u, where the scalar multiple is the corresponding eigenvalue λ. Geometrically, u is
an eigenvector if Au is parallel to u. That is, an eigenvector of a matrix A is a vector which
is transformed by A into a new vector with the same orientation but (possibly) diﬀerent
length. Thus, our approach to computing a solution of Eq. 6.89 will be to attempt to guess
an eigenvector, check if that vector is transformed by A into a parallel vector, and if, as
expected, our guess is wrong, we try to improve upon the guess iteratively until convergence
is achieved.
For the matrix A given in Eq. 6.90, let our initial guess for an eigenvector be
$$u^{(0)} = \begin{Bmatrix} 1 \\ 0 \\ 0 \end{Bmatrix}, \quad (6.91)$$
in which case
$$Au^{(0)} = \begin{bmatrix} 5 & -2 & 0 \\ -1 & \tfrac{3}{2} & -\tfrac{1}{2} \\ 0 & -\tfrac{1}{3} & \tfrac{1}{3} \end{bmatrix} \begin{Bmatrix} 1 \\ 0 \\ 0 \end{Bmatrix} = \begin{Bmatrix} 5 \\ -1 \\ 0 \end{Bmatrix} = 5 \begin{Bmatrix} 1.0 \\ -0.2 \\ 0.0 \end{Bmatrix} = \lambda_1 u^{(1)}, \quad (6.92)$$
where, to allow easy comparison of $u^{(1)}$ with $u^{(0)}$, we factored the largest component from the right-hand side vector. Since $u^{(0)}$ and $u^{(1)}$ are not parallel, $u^{(0)}$ is not an eigenvector. To determine if $u^{(1)}$ is an eigenvector, we repeat this calculation using $u^{(1)}$:
$$Au^{(1)} = \begin{bmatrix} 5 & -2 & 0 \\ -1 & \tfrac{3}{2} & -\tfrac{1}{2} \\ 0 & -\tfrac{1}{3} & \tfrac{1}{3} \end{bmatrix} \begin{Bmatrix} 1.0 \\ -0.2 \\ 0.0 \end{Bmatrix} = \begin{Bmatrix} 5.40 \\ -1.30 \\ 0.067 \end{Bmatrix} = 5.40 \begin{Bmatrix} 1.00 \\ -0.24 \\ 0.01 \end{Bmatrix} = \lambda_2 u^{(2)}. \quad (6.93)$$
Since $u^{(1)}$ and $u^{(2)}$ are not parallel, we conclude that $u^{(1)}$ is not an eigenvector, and we iterate again. It is useful to summarize all the iterations:
$$\begin{Bmatrix} 1 \\ 0 \\ 0 \end{Bmatrix}, \; \begin{Bmatrix} 1.0 \\ -0.2 \\ 0.0 \end{Bmatrix}, \; \begin{Bmatrix} 1.00 \\ -0.24 \\ 0.01 \end{Bmatrix}, \; \begin{Bmatrix} 1.000 \\ -0.249 \\ 0.015 \end{Bmatrix}, \; \begin{Bmatrix} 1.000 \\ -0.252 \\ 0.016 \end{Bmatrix}, \; \ldots, \; \begin{Bmatrix} 1.0000 \\ -0.2518 \\ 0.0162 \end{Bmatrix}$$
$$5.0 \qquad\quad 5.40 \qquad\quad 5.48 \qquad\quad 5.49 \qquad\quad \ldots \qquad\quad 5.5036$$
Under each iterate for the eigenvector is shown the corresponding iterate for the eigenvalue.
Thus, we conclude that one eigenvalue of A is 5.5036, and the corresponding eigenvector is
$$\begin{Bmatrix} 1.0000 \\ -0.2518 \\ 0.0162 \end{Bmatrix}.$$
This iterative procedure is called the power method or power iteration.
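The iteration summarized above can be sketched in a few lines of Python with NumPy; the matrix is A of Eq. 6.90, and the fixed iteration count is an arbitrary choice that is ample for convergence here:

```python
import numpy as np

# A = M^{-1} K from Eq. 6.90
A = np.array([[ 5.0, -2.0,  0.0],
              [-1.0,  1.5, -0.5],
              [ 0.0, -1/3,  1/3]])

u = np.array([1.0, 0.0, 0.0])       # initial guess u^(0)
for _ in range(50):                 # 50 iterations: ample for this example
    w = A @ u
    lam = w[np.argmax(np.abs(w))]   # factor out the largest component
    u = w / lam

print(round(lam, 4))                # approximately 5.5036
print(np.round(u, 4))               # approximately (1.0000, -0.2518, 0.0162)
```

The converged values agree with the hand iteration above to the four digits shown.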
We now wish to determine which of the three eigenvalues has been found. A symmetric system of order n has n eigenvectors which are orthogonal. Hence, any vector of dimension n can be expanded in a linear combination of such vectors. Let the n eigenvectors of A be $\phi^{(1)}, \phi^{(2)}, \ldots, \phi^{(n)}$, where $|\lambda_1| < |\lambda_2| < \cdots < |\lambda_n|$. Then
$$u^{(0)} = c_1 \phi^{(1)} + c_2 \phi^{(2)} + \cdots + c_n \phi^{(n)} \quad (6.94)$$
and
$$u^{(1)} = Au^{(0)} = c_1 A\phi^{(1)} + c_2 A\phi^{(2)} + \cdots + c_n A\phi^{(n)} = c_1 \lambda_1 \phi^{(1)} + c_2 \lambda_2 \phi^{(2)} + \cdots + c_n \lambda_n \phi^{(n)}. \quad (6.95)$$
Similarly,
$$u^{(2)} = c_1 \lambda_1^2 \phi^{(1)} + c_2 \lambda_2^2 \phi^{(2)} + \cdots + c_n \lambda_n^2 \phi^{(n)}, \quad (6.96)$$
and, in general, after r iterations,
$$u^{(r)} = c_1 \lambda_1^r \phi^{(1)} + c_2 \lambda_2^r \phi^{(2)} + \cdots + c_n \lambda_n^r \phi^{(n)} = c_n \lambda_n^r \left[ \frac{c_1}{c_n} \left( \frac{\lambda_1}{\lambda_n} \right)^r \phi^{(1)} + \frac{c_2}{c_n} \left( \frac{\lambda_2}{\lambda_n} \right)^r \phi^{(2)} + \cdots + \phi^{(n)} \right]. \quad (6.97)$$
Since
$$\left| \frac{\lambda_i}{\lambda_n} \right| < 1 \quad (6.98)$$
for all i < n, it follows that
$$u^{(r)} \to c_n \lambda_n^r \phi^{(n)} \quad \text{as } r \to \infty. \quad (6.99)$$
This last equation shows that power iteration converges to the nth eigenvector (the one
corresponding to the largest eigenvalue). However, the largest eigenvalue is rarely of interest
in engineering applications. For example, in mechanical vibration problems, it is the low
modes that are of interest. In fact, discrete element models such as ﬁnite element models
are necessarily low frequency models, and the higher modes are not physically meaningful if
the original problem was a continuum. In elastic stability problems, the buckling load factor
turns out to be an eigenvalue, and it is the lowest buckling load which is normally of interest.
6.5 Inverse Iteration
We recall that iteration with A in
$$Au = \lambda u \quad (6.100)$$
yields the largest eigenvalue and corresponding eigenvector of A. We could alternatively write the eigenvalue problem as
$$A^{-1}u = \frac{1}{\lambda} u, \quad (6.101)$$
so that iteration with $A^{-1}$ would converge to the largest value of $1/\lambda$ and hence the smallest $\lambda$. This procedure is called inverse iteration. In practice, computing $A^{-1}$ is both expensive and unnecessary. Instead, one solves the system of equations with A as the coefficient matrix:
$$u^{(k)} = \frac{1}{\lambda} A u^{(k+1)}, \quad (6.102)$$
where k is the iteration number. Since, for each iteration, the same coeﬃcient matrix A is
used, one can factor A = LU, save the factors, and repeat the forward-backward substitution
(FBS) for each iteration.
For the mechanical vibration problem, where $A = M^{-1}K$, Eq. 6.102 becomes
$$M u^{(k)} = \frac{1}{\lambda} K u^{(k+1)}. \quad (6.103)$$
K is factored once, and the LU factors saved. Each iteration requires an additional matrix
multiplication and FBS.
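Inverse iteration for the vibration example can be sketched as follows with NumPy. For brevity this sketch calls a fresh solver each pass; in production code one would factor K = LU once and repeat only the forward-backward substitution, as described above:

```python
import numpy as np

# M and K from Eq. 6.87
M = np.diag([1.0, 2.0, 3.0])
K = np.array([[ 5.0, -2.0,  0.0],
              [-2.0,  3.0, -1.0],
              [ 0.0, -1.0,  1.0]])

u = np.array([1.0, 0.0, 0.0])
for _ in range(100):
    w = np.linalg.solve(K, M @ u)        # same coefficient matrix every pass
    lam = 1.0 / w[np.argmax(np.abs(w))]  # estimate of the smallest eigenvalue
    u = w * lam                          # rescale largest component to unity

print(round(lam, 4))                     # approximately 0.1546, the smallest eigenvalue
```

The three eigenvalues of this system are approximately 0.1546, 1.1751, and 5.5036; inverse iteration converges to the first, in contrast to power iteration, which found the last.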
Consider the generalized eigenvalue problem
$$Ku = \lambda Mu, \quad (6.104)$$
where K and M are both real, symmetric matrices. We can shift the origin of the eigenvalue scale with the transformation
$$\lambda = \lambda_0 + \mu, \quad (6.105)$$
where $\lambda_0$ is a specified shift. Eq. 6.104 then becomes
$$Ku = (\lambda_0 + \mu)Mu, \quad (6.106)$$
or
$$(K - \lambda_0 M)u = \mu Mu, \quad (6.107)$$
which is an eigenvalue problem with µ as the eigenvalue. This equation can be arranged for inverse iteration to obtain
$$M u^{(k)} = \frac{1}{\mu} (K - \lambda_0 M) u^{(k+1)}, \quad (6.108)$$
where k is the iteration number, so that inverse iteration would converge to the smallest eigenvalue µ. Thus, by picking $\lambda_0$, we can find the eigenvalue closest to the shift point $\lambda_0$. This procedure is referred to as inverse iteration with a shift. The matrix which must be factored is $K - \lambda_0 M$. A practical application of inverse iteration with a shift would be the mechanical vibration problem in which one seeks the vibration modes closest to a particular frequency.
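A sketch of inverse iteration with a shift, continuing the same example; the shift $\lambda_0 = 1.0$ is a hypothetical choice, and the iteration converges to the eigenvalue nearest it (here the middle one, 1.1751):

```python
import numpy as np

M = np.diag([1.0, 2.0, 3.0])
K = np.array([[ 5.0, -2.0,  0.0],
              [-2.0,  3.0, -1.0],
              [ 0.0, -1.0,  1.0]])

lam0 = 1.0                 # hypothetical shift: seek the eigenvalue closest to 1.0
Ks = K - lam0 * M          # the matrix to be factored (once, in practice)
u = np.ones(3)
for _ in range(100):
    w = np.linalg.solve(Ks, M @ u)
    mu = 1.0 / w[np.argmax(np.abs(w))]
    u = w * mu

print(round(lam0 + mu, 4))  # lambda = lambda0 + mu, approximately 1.1751
```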
[Figure 21: Geometrical Interpretation of Sweeping: the trial vector w, its projection onto the unit vector $M\phi^{(1)}/|M\phi^{(1)}|$, and the swept vector v.]
6.6 Iteration for Other Eigenvalues
We saw from the preceding sections that power iteration converges to the largest eigenvalue,
and inverse iteration converges to the smallest eigenvalue. It is often of interest to ﬁnd other
eigenvalues as well. Here we describe the modiﬁcation to the power iteration algorithm to
allow convergence to the second largest eigenvalue.
Consider again the eigenvalue problem
$$Au = \lambda u, \quad (6.109)$$
where $A = M^{-1}K$, and assume we have already found the largest eigenvalue and its corresponding eigenvector $\phi^{(1)}$. Let $\phi^{(2)}$ denote the eigenvector corresponding to the second largest eigenvalue. Thus,
$$A\phi^{(2)} = \lambda \phi^{(2)}, \quad (6.110)$$
where, by orthogonality,
$$\phi^{(2)T} M \phi^{(1)} = 0 \quad \text{or} \quad \phi^{(2)} \cdot M\phi^{(1)} = 0. \quad (6.111)$$
(The orthogonality relation has been written in both matrix and vector notations.)
To find $\phi^{(2)}$, given $\phi^{(1)}$, we will perform another power iteration but with the added requirement that, at each iteration, the vector used in the iteration is orthogonal to $M\phi^{(1)}$. For example, consider the vector w as a trial second eigenvector. Fig. 21 shows the plane of the two vectors w and $M\phi^{(1)}$. Since a unit vector in the direction of $M\phi^{(1)}$ is
$$\frac{M\phi^{(1)}}{|M\phi^{(1)}|},$$
the length of the projection of w onto $M\phi^{(1)}$ is
$$\frac{w \cdot M\phi^{(1)}}{|M\phi^{(1)}|},$$
and the vector projection of w onto $M\phi^{(1)}$ is
$$\left( \frac{w \cdot M\phi^{(1)}}{|M\phi^{(1)}|} \right) \frac{M\phi^{(1)}}{|M\phi^{(1)}|}.$$
Thus, a new trial vector $\hat{v}$ which is orthogonal to $M\phi^{(1)}$ is
$$\hat{v} = w - \left( \frac{w \cdot M\phi^{(1)}}{|M\phi^{(1)}|} \right) \frac{M\phi^{(1)}}{|M\phi^{(1)}|}. \quad (6.112)$$
This procedure of subtracting from each trial vector (at each iteration) any components
parallel to Mφ
(1)
is sometimes referred to as a sweeping procedure. This procedure is also
equivalent to the Gram-Schmidt orthogonalization procedure discussed previously, except
that here we do not require that v be a unit vector.
Note that, even though A is used in the iteration, we must use either M or K in the
orthogonality relation rather than A or the identity I, since, in general, A is not symmetric
(e.g., Eq. 6.90). With a nonsymmetric A, there is no orthogonality of the eigenvectors with
respect to A or I. If A happens to be symmetric (a special case of the symmetric generalized
eigenvalue problem), then the eigenvectors would be orthogonal with respect to A and I.
Thus, even though we may iterate using A, the orthogonality requires M or K weighting.
M is chosen, since it usually contains more zeros than K and, for the example to follow, is
diagonal.
To illustrate the sweeping procedure, we continue the mechanical vibrations example started earlier. Recall that, from power iteration,
$$A = M^{-1}K = \begin{bmatrix} 5 & -2 & 0 \\ -1 & \tfrac{3}{2} & -\tfrac{1}{2} \\ 0 & -\tfrac{1}{3} & \tfrac{1}{3} \end{bmatrix}, \quad M = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}, \quad \phi^{(1)} = \begin{Bmatrix} 1.0000 \\ -0.2518 \\ 0.0162 \end{Bmatrix}, \quad (6.113)$$
where
$$M\phi^{(1)} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} \begin{Bmatrix} 1.0000 \\ -0.2518 \\ 0.0162 \end{Bmatrix} = \begin{Bmatrix} 1.0000 \\ -0.5036 \\ 0.0486 \end{Bmatrix}, \quad (6.114)$$
a vector whose length is $|M\phi^{(1)}| = 1.1207$. Thus, the unit vector in the direction of $M\phi^{(1)}$ is
$$\frac{M\phi^{(1)}}{|M\phi^{(1)}|} = \begin{Bmatrix} 0.8923 \\ -0.4494 \\ 0.0434 \end{Bmatrix}. \quad (6.115)$$
The iterations are most conveniently displayed with a spreadsheet:
In the table, $\alpha = w^{(i)} \cdot M\phi^{(1)}/|M\phi^{(1)}|$ is the projection length, $\hat{d} = M\phi^{(1)}/|M\phi^{(1)}|$ is the unit sweep direction, $\lambda_i$ is the largest component of the swept vector $w^{(i)} - \alpha\hat{d}$, and $v^{(i)}$ is that vector scaled so its largest component is unity:

 i   w^(i)                       α         w^(i) − α d̂                 λ_i      v^(i)                       Av^(i) = w^(i+1)
 0   (1, 0, 0)                   0.8923    (0.2038, 0.4010, −0.0387)   0.4010   (0.5082, 1.0000, −0.0965)   (0.5410, 1.0401, −0.3655)
 1   (0.5410, 1.0401, −0.3655)   −0.0005   (0.5414, 1.0399, −0.3655)   1.0399   (0.5206, 1.0000, −0.3515)   (0.6030, 1.1552, −0.4505)
 2   (0.6030, 1.1552, −0.4505)   −0.0006   (0.6035, 1.1549, −0.4505)   1.1549   (0.5226, 1.0000, −0.3901)   (0.6130, 1.1725, −0.4634)
 3   (0.6130, 1.1725, −0.4634)   −0.0001   (0.6131, 1.1725, −0.4634)   1.1725   (0.5229, 1.0000, −0.3952)   (0.6145, 1.1747, −0.4651)
 4   (0.6145, 1.1747, −0.4651)   0.0002    (0.6143, 1.1748, −0.4651)   1.1748   (0.5229, 1.0000, −0.3959)   (0.6145, 1.1751, −0.4653)
 5   (0.6145, 1.1751, −0.4653)   0.0000    (0.6145, 1.1751, −0.4653)   1.1751   (0.5229, 1.0000, −0.3960)   (0.6145, 1.1751, −0.4653)
 6   (0.6145, 1.1751, −0.4653)   0.0000    (0.6145, 1.1751, −0.4653)   1.1751   (0.5229, 1.0000, −0.3960)
Thus, the second largest eigenvalue of A is 1.1751, and the corresponding eigenvector is
$$\begin{Bmatrix} 0.5229 \\ 1.0000 \\ -0.3960 \end{Bmatrix}.$$
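The sweeping iteration tabulated above can be reproduced with a short NumPy script. Here $\phi^{(1)}$ is the approximate first eigenvector found earlier (four decimals), so the computed results agree with the table only to about the same number of digits:

```python
import numpy as np

A = np.array([[ 5.0, -2.0,  0.0],
              [-1.0,  1.5, -0.5],
              [ 0.0, -1/3,  1/3]])
M = np.diag([1.0, 2.0, 3.0])
phi1 = np.array([1.0, -0.2518, 0.0162])   # from the earlier power iteration

d = M @ phi1
d = d / np.linalg.norm(d)                 # unit vector along M phi^(1)

w = np.array([1.0, 0.0, 0.0])
for _ in range(100):
    vt = w - (w @ d) * d                  # sweep out the M phi^(1) component
    lam = vt[np.argmax(np.abs(vt))]
    v = vt / lam                          # rescale largest component to unity
    w = A @ v

print(round(lam, 4))                      # approximately 1.1751
print(np.round(v, 4))                     # approximately (0.5229, 1.0000, -0.3960)
```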
6.7 Similarity Transformations
Two matrices A and B are defined as similar if $B = M^{-1}AM$ for some invertible matrix M. The transformation from A to B (or vice versa) is called a similarity transformation.
The main property of interest is that similar matrices have the same eigenvalues. To
prove this property, consider the eigenvalue problem
Ax = λx, (6.116)
and let
$$B = M^{-1}AM \quad \text{or} \quad A = MBM^{-1}. \quad (6.117)$$
The substitution of Eq. 6.117 into Eq. 6.116 yields
$$MBM^{-1}x = \lambda x \quad (6.118)$$
or
$$B(M^{-1}x) = \lambda(M^{-1}x), \quad (6.119)$$
which completes the proof. This equation also shows that, if x is an eigenvector of A, then $M^{-1}x$ is an eigenvector of B, and λ is the eigenvalue.
A similarity transformation can be interpreted as a change of basis. Consider, for example, mechanical vibrations, where the eigenvalue problem computes the natural frequencies and mode shapes of vibration. If the eigenvalues (natural frequencies) do not change under the transformation, the system represented by A has not changed. However, the eigenvectors are transformed from x to $M^{-1}x$, a transformation which can be thought of as simply a change of coordinates.
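The invariance of the eigenvalues is easy to check numerically; the following sketch uses an arbitrary random matrix and an arbitrary (almost surely invertible) transformation matrix, both hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
T = rng.standard_normal((4, 4))     # plays the role of M; invertible in practice
B = np.linalg.inv(T) @ A @ T        # B is similar to A

ea = np.linalg.eigvals(A)
eb = np.linalg.eigvals(B)
print(np.allclose(np.sort(ea.real), np.sort(eb.real)))  # same eigenvalues,
print(np.allclose(np.sort(ea.imag), np.sort(eb.imag)))  # possibly reordered
```

Both checks print `True`: the similarity transformation changes the basis (the eigenvectors) but not the spectrum.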
We recall from Property 6 of the eigenvalue problem (page 69) that, if a matrix M is formed with the eigenvectors of A in the columns, and the eigenvectors are independent,
$$M^{-1}AM = \Lambda = \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix}, \quad (6.120)$$
where Λ is a diagonal matrix populated with the eigenvalues of A. Thus, the way to
“simplify” a matrix (by diagonalizing it) is to ﬁnd its eigenvectors. The eigenvalue problem
is thus the basis for ﬁnding principal stresses and principal strains in elasticity and principal
axes of inertia.
6.8 Positive Deﬁnite Matrices
For a square matrix M, the scalar $Q = x^T M x = x \cdot Mx$ is called a quadratic form. In index notation,
$$Q = \sum_{i=1}^{n} \sum_{j=1}^{n} M_{ij} x_i x_j \quad (6.121)$$
$$= M_{11} x_1^2 + M_{22} x_2^2 + \cdots + (M_{12} + M_{21}) x_1 x_2 + (M_{13} + M_{31}) x_1 x_3 + \cdots. \quad (6.122)$$
A matrix S is said to be symmetric if $S = S^T$ (i.e., $S_{ij} = S_{ji}$). A matrix A is said to be antisymmetric (or skew symmetric) if $A = -A^T$ (i.e., $A_{ij} = -A_{ji}$). For example, an antisymmetric matrix of order 3 is necessarily of the form
$$A = \begin{bmatrix} 0 & A_{12} & A_{13} \\ -A_{12} & 0 & A_{23} \\ -A_{13} & -A_{23} & 0 \end{bmatrix}. \quad (6.123)$$
Any square matrix M can be written as the unique sum of a symmetric and an antisymmetric matrix, $M = S + A$, where
$$S = S^T = \frac{1}{2}(M + M^T), \quad A = -A^T = \frac{1}{2}(M - M^T). \quad (6.124)$$
For example, for a 3 × 3 matrix M,
$$M = \begin{bmatrix} 3 & 5 & 7 \\ 1 & 2 & 8 \\ 9 & 6 & 4 \end{bmatrix} = \begin{bmatrix} 3 & 3 & 8 \\ 3 & 2 & 7 \\ 8 & 7 & 4 \end{bmatrix} + \begin{bmatrix} 0 & 2 & -1 \\ -2 & 0 & 1 \\ 1 & -1 & 0 \end{bmatrix} = S + A. \quad (6.125)$$
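A quick numerical check of the decomposition of Eq. 6.125, together with the quadratic-form property proved next (the test vector x is an arbitrary choice):

```python
import numpy as np

M = np.array([[3.0, 5.0, 7.0],
              [1.0, 2.0, 8.0],
              [9.0, 6.0, 4.0]])

S = 0.5 * (M + M.T)                 # symmetric part
A = 0.5 * (M - M.T)                 # antisymmetric part
x = np.array([1.0, -2.0, 3.0])      # arbitrary test vector

print(np.allclose(S + A, M))             # M = S + A
print(np.isclose(x @ A @ x, 0.0))        # antisymmetric part contributes nothing
print(np.isclose(x @ M @ x, x @ S @ x))  # the quadratic form sees only S
```

All three checks print `True` for any square M and any x.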
[Figure 22: Elastic System Acted Upon by Forces $F_1$, $F_2$, $F_3$.]
The antisymmetric part of a matrix does not contribute to the quadratic form, i.e., if A is antisymmetric, $x^T A x = 0$ for all vectors x. To prove this property, we note that
$$x^T A x = (x^T A x)^T = x^T A^T x = -x^T A x = 0, \quad (6.126)$$
since the scalar is equal to its own negative, thus completing the proof. Thus, for any square matrix M,
$$x^T M x = x^T S x, \quad (6.127)$$
where S is the symmetric part of M.
A square matrix M is defined as positive definite if $x^T M x > 0$ for all $x \neq 0$. M is defined as positive semi-definite if $x^T M x \geq 0$ for all $x \neq 0$.
Quadratic forms are of interest in engineering applications, since, physically, a quadratic
form often corresponds to energy. For example, consider an elastic system (Fig. 22) acted
upon by forces in static equilibrium. Let u denote the vector of displacements, and let F
denote the vector of forces. If the forces and displacements are linearly related by generalized
Hooke’s law,
F = Ku, (6.128)
where K is the system stiﬀness matrix.
The work W done by the forces F acting through the displacements u is then
$$W = \frac{1}{2} u \cdot F = \frac{1}{2} u \cdot Ku = \frac{1}{2} u^T K u, \quad (6.129)$$
which is a quadratic form. Since the work required to deform a mechanical system in
equilibrium (and with suﬃcient constraints to prevent rigid body motion) is positive for any
set of nonzero displacements,
$$W = \frac{1}{2} u \cdot Ku > 0. \quad (6.130)$$
That is, the stiﬀness matrix K for a mechanical system in equilibrium and constrained to
prevent rigid body motion must be positive deﬁnite.
The main property of interest for positive deﬁnite matrices is that a matrix M is positive
deﬁnite if, and only if, all the eigenvalues of M are positive. To prove this statement, we
must prove two properties: (1) positive deﬁniteness implies positive eigenvalues, and (2)
positive eigenvalues imply positive deﬁniteness.
We first prove that positive definiteness implies positive eigenvalues. For the ith eigenvalue $\lambda_i$,
$$Mx^{(i)} = \lambda_i x^{(i)}, \quad (6.131)$$
where $x^{(i)}$ is the corresponding eigenvector. If we form the dot product of both sides of this equation with $x^{(i)}$, we obtain
$$0 < x^{(i)} \cdot Mx^{(i)} = \lambda_i \, x^{(i)} \cdot x^{(i)} = \lambda_i \left| x^{(i)} \right|^2, \quad (6.132)$$
which implies $\lambda_i > 0$, thus completing the first part of the proof.
We now prove that positive eigenvalues imply positive deﬁniteness. Since, for a quadratic
form, only the symmetric part of M matters, we can assume that M is symmetric. From
Property 9 of the eigenvalue problem (page 71), the eigenvectors of M are mutually orthog-
onal, which implies that the eigenvectors are independent. We can therefore expand any
vector x using an eigenvector basis:
$$x = \sum_{i=1}^{n} c_i x^{(i)}, \quad (6.133)$$
where $x^{(i)}$ is the ith eigenvector. Then,
$$Mx = \sum_{i=1}^{n} c_i Mx^{(i)} = \sum_{i=1}^{n} c_i \lambda_i x^{(i)}, \quad (6.134)$$
which implies that
$$x \cdot Mx = \left( \sum_{j=1}^{n} c_j x^{(j)} \right) \cdot \left( \sum_{i=1}^{n} c_i \lambda_i x^{(i)} \right) = \sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j \lambda_i \, x^{(i)} \cdot x^{(j)}. \quad (6.135)$$
However, since the eigenvectors are mutually orthogonal,
$$x^{(i)} \cdot x^{(j)} = 0 \quad \text{for } i \neq j. \quad (6.136)$$
Therefore,
$$x \cdot Mx = \sum_{i=1}^{n} c_i^2 \lambda_i \left| x^{(i)} \right|^2, \quad (6.137)$$
which is positive if all $\lambda_i > 0$, thus completing the second part of the proof. This important property establishes the connection between positive definiteness and eigenvalues.
Two other consequences of positive definiteness are
1. A positive definite matrix is nonsingular, since $Ax = 0$ implies $x^T A x = 0$, which implies $x = 0$, so that $A^{-1}$ exists.
2. Gaussian elimination can be performed without pivoting.
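Both characterizations can be checked numerically; the stiffness matrix of Eq. 6.87 serves as an example of a positive definite matrix:

```python
import numpy as np

K = np.array([[ 5.0, -2.0,  0.0],
              [-2.0,  3.0, -1.0],
              [ 0.0, -1.0,  1.0]])   # stiffness matrix of Eq. 6.87

# Positive definite <=> all eigenvalues positive (K is symmetric)
print(np.all(np.linalg.eigvalsh(K) > 0))   # True

# Consequence 2 in action: Cholesky factorization (elimination exploiting
# symmetry, with no pivoting) succeeds only for a positive definite matrix
L = np.linalg.cholesky(K)
print(np.allclose(L @ L.T, K))             # True
```

`np.linalg.cholesky` raises `LinAlgError` for a matrix that is not positive definite, which makes it a practical (and cheap) definiteness test.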
6.9 Application to Diﬀerential Equations
Consider the ordinary differential equation
$$y' = ay, \quad (6.138)$$
where y(t) is an unknown function to be determined, y'(t) is its derivative, and a is a constant. All solutions of this equation are of the form
$$y(t) = y(0)e^{at}, \quad (6.139)$$
where y(0) is the initial condition.
The generalization of Eq. 6.138 to systems of ordinary differential equations is
$$\begin{cases} y_1' = a_{11} y_1 + a_{12} y_2 + \cdots + a_{1n} y_n \\ y_2' = a_{21} y_1 + a_{22} y_2 + \cdots + a_{2n} y_n \\ \quad \vdots \\ y_n' = a_{n1} y_1 + a_{n2} y_2 + \cdots + a_{nn} y_n, \end{cases} \quad (6.140)$$
where $y_1(t), y_2(t), \ldots, y_n(t)$ are the unknown functions to be determined, and the $a_{ij}$ are constants. In matrix notation, this system can be written
$$y' = Ay, \quad (6.141)$$
where
$$y = \begin{Bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{Bmatrix}, \quad A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}. \quad (6.142)$$
Our principal interest in this section is in solving systems of coupled differential equations like Eq. 6.141. Note that, if the matrix A is a diagonal matrix, Eq. 6.141 is easy to solve, since each equation in the system involves only one unknown function (i.e., the system is uncoupled). Thus, to solve the system $y' = Ay$ for which A is not diagonal, we will attempt a change of variable
$$y = Su, \quad (6.143)$$
where S is a square matrix of constants picked to yield a new system with a diagonal coefficient matrix. The substitution of Eq. 6.143 into Eq. 6.141 yields
$$Su' = ASu \quad (6.144)$$
or
$$u' = \left( S^{-1}AS \right) u = Du, \quad (6.145)$$
where
$$D = S^{-1}AS, \quad (6.146)$$
and we have assumed that S is invertible. The choice for S is now clear: S is the matrix which diagonalizes A. From Property 6 of the eigenvalue problem (page 69), we see that S is the matrix whose columns are the linearly independent eigenvectors of A, and D is the diagonal matrix of eigenvalues:
$$D = S^{-1}AS = \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix} = \Lambda, \quad (6.147)$$
where $\lambda_i$ is the ith eigenvalue of A.
Thus, we can use the following procedure for solving the coupled system of differential equations $y' = Ay$:
1. Find the eigenvalues and eigenvectors of A.
2. Solve the uncoupled (diagonal) system $u' = \Lambda u$, where Λ is the diagonal matrix of eigenvalues of A.
3. Determine y from the equation $y = Su$, where S is the matrix whose columns are the eigenvectors of A.
We illustrate this procedure by solving the system
$$\begin{cases} y_1' = y_1 + y_2 \\ y_2' = 4y_1 - 2y_2 \end{cases} \quad (6.148)$$
with the initial conditions $y_1(0) = 1$, $y_2(0) = 6$. The coefficient matrix for this system is
$$A = \begin{bmatrix} 1 & 1 \\ 4 & -2 \end{bmatrix}. \quad (6.149)$$
Since the characteristic polynomial is
$$\det(A - \lambda I) = \begin{vmatrix} 1-\lambda & 1 \\ 4 & -2-\lambda \end{vmatrix} = \lambda^2 + \lambda - 6 = (\lambda + 3)(\lambda - 2), \quad (6.150)$$
the eigenvalues of A are λ = 2 and λ = −3. The corresponding eigenvectors are (1, 1) and (1, −4), respectively. Thus, a diagonalizing matrix for A is
$$S = \begin{bmatrix} 1 & 1 \\ 1 & -4 \end{bmatrix}, \quad (6.151)$$
and
$$\Lambda = S^{-1}AS = \frac{1}{5} \begin{bmatrix} 4 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 4 & -2 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -4 \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ 0 & -3 \end{bmatrix}. \quad (6.152)$$
Therefore, the substitution y = Su yields the new diagonal system
$$u' = \Lambda u = \begin{bmatrix} 2 & 0 \\ 0 & -3 \end{bmatrix} u \quad (6.153)$$
or
$$u_1' = 2u_1, \quad u_2' = -3u_2, \quad (6.154)$$
whose general solutions are
$$u_1 = c_1 e^{2t}, \quad u_2 = c_2 e^{-3t}. \quad (6.155)$$
Thus, y, the original dependent variable of interest, is given by
$$y = Su = \begin{bmatrix} 1 & 1 \\ 1 & -4 \end{bmatrix} \begin{Bmatrix} u_1 \\ u_2 \end{Bmatrix} = \begin{Bmatrix} c_1 e^{2t} + c_2 e^{-3t} \\ c_1 e^{2t} - 4c_2 e^{-3t} \end{Bmatrix} \quad (6.156)$$
or
$$\begin{cases} y_1 = c_1 e^{2t} + c_2 e^{-3t} \\ y_2 = c_1 e^{2t} - 4c_2 e^{-3t}. \end{cases} \quad (6.157)$$
The application of the initial conditions yields
$$\begin{cases} c_1 + c_2 = 1 \\ c_1 - 4c_2 = 6 \end{cases} \quad (6.158)$$
or $c_1 = 2$, $c_2 = -1$. Thus, the solution of the original system is
$$\begin{cases} y_1 = 2e^{2t} - e^{-3t} \\ y_2 = 2e^{2t} + 4e^{-3t}. \end{cases} \quad (6.159)$$
Notice from Eq. 6.156 that the solution can be written in the form
$$y = [\,v_1 \;\; v_2\,] \begin{Bmatrix} u_1 \\ u_2 \end{Bmatrix} = u_1 v_1 + u_2 v_2 = c_1 e^{\lambda_1 t} v_1 + c_2 e^{\lambda_2 t} v_2, \quad (6.160)$$
where the eigenvectors of A are the columns of S:
$$v_1 = \begin{Bmatrix} 1 \\ 1 \end{Bmatrix}, \quad v_2 = \begin{Bmatrix} 1 \\ -4 \end{Bmatrix}. \quad (6.161)$$
Thus, in general, if the coefficient matrix A of the system $y' = Ay$ is diagonalizable, the general solution can be written in the form
$$y = c_1 e^{\lambda_1 t} v_1 + c_2 e^{\lambda_2 t} v_2 + \cdots + c_n e^{\lambda_n t} v_n, \quad (6.162)$$
where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of A, and $v_i$ is the eigenvector corresponding to $\lambda_i$.
A comparison of the matrix differential equation, Eq. 6.141, with the scalar equation, Eq. 6.138, indicates that the solution of Eq. 6.141 can be written in the form
$$y(t) = e^{At} y(0), \quad (6.163)$$
where the exponential of the matrix At is formally defined [8] by the convergent power series
$$e^{At} = I + At + \frac{(At)^2}{2!} + \frac{(At)^3}{3!} + \cdots. \quad (6.164)$$
From Eq. 6.147,
$$A = S\Lambda S^{-1}, \quad (6.165)$$
where S is a matrix whose columns are the eigenvectors of A. Notice that, for integers k,
$$A^k = (S\Lambda S^{-1})^k = (S\Lambda S^{-1})(S\Lambda S^{-1}) \cdots (S\Lambda S^{-1}) = S\Lambda^k S^{-1}, \quad (6.166)$$
since $S^{-1}$ cancels S. Thus,
$$e^{At} = I + S\Lambda S^{-1} t + \frac{S\Lambda^2 S^{-1} t^2}{2!} + \frac{S\Lambda^3 S^{-1} t^3}{3!} + \cdots \quad (6.167)$$
$$= S \left[ I + \Lambda t + \frac{(\Lambda t)^2}{2!} + \frac{(\Lambda t)^3}{3!} + \cdots \right] S^{-1} = S e^{\Lambda t} S^{-1}. \quad (6.168)$$
Since Λt is a diagonal matrix, the powers $(\Lambda t)^k$ are also diagonal, and $e^{\Lambda t}$ is therefore the diagonal matrix
$$e^{\Lambda t} = \begin{bmatrix} e^{\lambda_1 t} & & & \\ & e^{\lambda_2 t} & & \\ & & \ddots & \\ & & & e^{\lambda_n t} \end{bmatrix}. \quad (6.169)$$
Thus, the solution of $y' = Ay$ can be written in the form
$$y(t) = S e^{\Lambda t} S^{-1} y(0), \quad (6.170)$$
where S is a matrix whose columns are the eigenvectors of A, Λ is the diagonal matrix of
eigenvalues, and the exponential is given by Eq. 6.169.
For the example of Eq. 6.148, Eq. 6.170 yields
$$\begin{Bmatrix} y_1 \\ y_2 \end{Bmatrix} = \frac{1}{5} \begin{bmatrix} 1 & 1 \\ 1 & -4 \end{bmatrix} \begin{bmatrix} e^{2t} & 0 \\ 0 & e^{-3t} \end{bmatrix} \begin{bmatrix} 4 & 1 \\ 1 & -1 \end{bmatrix} \begin{Bmatrix} 1 \\ 6 \end{Bmatrix} = \begin{Bmatrix} 2e^{2t} - e^{-3t} \\ 2e^{2t} + 4e^{-3t} \end{Bmatrix}, \quad (6.171)$$
in agreement with Eq. 6.159.
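This worked example can be checked numerically by building $S e^{\Lambda t} S^{-1} y(0)$ from a computed eigendecomposition and comparing with the closed-form solution of Eq. 6.159 (the evaluation time t = 0.5 is an arbitrary choice):

```python
import numpy as np

A = np.array([[1.0,  1.0],
              [4.0, -2.0]])
y0 = np.array([1.0, 6.0])

lam, S = np.linalg.eig(A)           # eigenvalues and eigenvector matrix
t = 0.5                             # arbitrary evaluation time

# y(t) = S e^{Lambda t} S^{-1} y(0), Eq. 6.170
y = S @ np.diag(np.exp(lam * t)) @ np.linalg.inv(S) @ y0

exact = np.array([2*np.exp(2*t) -   np.exp(-3*t),   # Eq. 6.159
                  2*np.exp(2*t) + 4*np.exp(-3*t)])
print(np.allclose(y, exact))        # True
```

Note that `np.linalg.eig` returns eigenvectors scaled to unit length rather than the integer vectors used above; the product $S e^{\Lambda t} S^{-1}$ is nevertheless the same, since the scaling cancels.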
Note that this discussion of first-order systems includes as a special case the nth-order equation with constant coefficients, since a single nth-order ODE is equivalent to a system of n first-order ODEs. For example, the nth-order equation
$$a_n y^{(n)} + a_{n-1} y^{(n-1)} + a_{n-2} y^{(n-2)} + \cdots + a_1 y' + a_0 y = 0, \quad a_n \neq 0, \quad (6.172)$$
can be replaced by a first-order system if we rename the derivatives as $y^{(i)} = y_{i+1}(t)$, in which case
$$\begin{cases} y_1' = y_2 \\ y_2' = y_3 \\ \quad \vdots \\ y_{n-1}' = y_n \\ y_n' = (-a_0 y_1 - a_1 y_2 - \cdots - a_{n-2} y_{n-1} - a_{n-1} y_n)/a_n. \end{cases} \quad (6.173)$$
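As a sketch of this conversion, consider the hypothetical second-order equation y'' + y' − 6y = 0, chosen so that its characteristic polynomial matches Eq. 6.150; the first-order (companion) coefficient matrix built from Eq. 6.173 then has eigenvalues 2 and −3:

```python
import numpy as np

a = [-6.0, 1.0, 1.0]        # a0, a1, a2 for y'' + y' - 6y = 0 (illustrative)
n = len(a) - 1

A = np.zeros((n, n))
A[:-1, 1:] = np.eye(n - 1)                    # rows y_i' = y_{i+1}
A[-1, :] = [-a[i] / a[n] for i in range(n)]   # last row of Eq. 6.173

print(np.sort(np.linalg.eigvals(A)))          # approximately [-3., 2.]
```

The eigenvalues of the companion matrix are the roots of the original equation's characteristic polynomial, so the first-order machinery of this section applies unchanged.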
6.10 Application to Structural Dynamics
We saw in the discussion of mechanical vibrations (§6.1, p. 65) that the equation of motion for a linear, multi-DOF, undamped mechanical system is
$$M\ddot{u} + Ku = F(t), \quad (6.174)$$
where M is the system mass matrix, K is the system stiffness matrix, F is the time-dependent applied force vector, u(t) is the unknown displacement vector, and dots denote differentiation with respect to the time t. If the system is damped, a viscous damping matrix B is introduced, and the equations of motion become
$$M\ddot{u} + B\dot{u} + Ku = F(t). \quad (6.175)$$
All three system matrices are real and symmetric. Equations like this arise whether the system is a discrete element system such as in Fig. 20 or a continuum modeled with finite elements. The general problem is to integrate these equations in time to determine u(t), given the initial displacement $u_0$ and initial velocity $\dot{u}_0$.
There are several approaches used to solve these equations, one of which is the Newmark
method (Eq. 1.75) discussed on p. 15. Another approach expands the solution in terms of
vibration modes (the modal superposition approach). We recall that, for the free, undamped
vibration problem
Ku = λMu, (6.176)
the eigenvectors are orthogonal with respect to both the mass M and stiﬀness K (Property
9, p. 71). Thus, the eigenvectors could be used to form a basis for the solution space, and
the unknown displacement vector in Eq. 6.175 could be written as a linear combination of
the modes:
$$u(t) = \xi_1(t) x^{(1)} + \xi_2(t) x^{(2)} + \cdots + \xi_m(t) x^{(m)}. \quad (6.177)$$
The number of terms in this expansion would be n (the number of physical DOF) if all
eigenvectors were used in the expansion, but, in practice, a relatively small number m of
modes is usually suﬃcient for acceptable accuracy. The modes selected are generally those
having lowest frequency (eigenvalue). In this equation, the multipliers $\xi_i(t)$ are referred to as the modal coordinates, which can be collected into the array
$$\xi(t) = \begin{Bmatrix} \xi_1(t) \\ \xi_2(t) \\ \vdots \\ \xi_m(t) \end{Bmatrix}. \quad (6.178)$$
Also, as in Property 6 (p. 69), we form the matrix S whose columns are the eigenvectors:
$$S = \left[ x^{(1)} \;\; x^{(2)} \;\; \cdots \;\; x^{(m)} \right], \quad (6.179)$$
in which case Eq. 6.177 can be written in matrix form as
$$u(t) = S\xi(t). \quad (6.180)$$
This equation can be interpreted as a transformation from physical coordinates u to modal coordinates ξ. If this equation is substituted into the equation of motion, Eq. 6.175, and multiplied by $S^T$, we obtain
$$S^T M S \ddot{\xi} + S^T B S \dot{\xi} + S^T K S \xi = S^T F(t) = P(t), \quad (6.181)$$
where $\hat{M} = S^T M S$ and $\hat{K} = S^T K S$ are, by orthogonality, diagonal matrices referred to as the modal mass and stiffness matrices, respectively. The diagonal entries in these two matrices are the generalized masses and stiffnesses for the individual vibration modes. The modal damping matrix $\hat{B} = S^T B S$ is, in general, not a diagonal matrix, since the eigenvalue problem did not involve B. Thus, in modal coordinates, the equations of motion are
$$\hat{M} \ddot{\xi} + \hat{B} \dot{\xi} + \hat{K} \xi = P(t). \quad (6.182)$$
This last equation is similar to Eq. 6.175 except that this equation has m DOF (the number of eigenvectors in the modal expansion), which generally is much smaller than the number n of physical DOF in Eq. 6.175. If $\hat{B}$ is non-diagonal, both equations require a similar time integration. The main benefit of the modal formulation is that, in structural dynamics, it is common to approximate damping with a mode-by-mode frequency-dependent viscous damping coefficient so that $\hat{B}$ is diagonal, in which case the modal equations (Eq. 6.182) uncouple into m scalar equations of the form
$$m_i \ddot{\xi}_i + b_i \dot{\xi}_i + k_i \xi_i = P_i(t), \quad (6.183)$$
where $m_i$ and $k_i$ are the generalized mass and stiffness, respectively, for mode i. This is
the equation of motion for the ith modal coordinate. Since it is a scalar equation with an
arbitrary right-hand side, it has a known analytic solution in terms of an integral. Once all
the scalar equations have been solved, the solution u(t) in terms of physical coordinates is
obtained from Eq. 6.177. Thus we see that a multi-DOF dynamical system can be viewed
as a collection of single-DOF mass-spring-damper systems associated with the modes.
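The modal transformation can be sketched for the 3-DOF example of Eq. 6.87. Since M is diagonal here, the generalized problem reduces to a symmetric standard eigenvalue problem, and orthogonality makes the modal mass and stiffness matrices diagonal:

```python
import numpy as np

M = np.diag([1.0, 2.0, 3.0])
K = np.array([[ 5.0, -2.0,  0.0],
              [-2.0,  3.0, -1.0],
              [ 0.0, -1.0,  1.0]])

# Reduce Ku = lambda*Mu to a symmetric standard problem (M is diagonal here)
Mh = np.diag(1.0 / np.sqrt(np.diag(M)))
lam, Q = np.linalg.eigh(Mh @ K @ Mh)
S = Mh @ Q                    # columns: mass-normalized mode shapes x^(i)

print(np.round(lam, 4))                         # eigenvalues, lowest mode first
print(np.allclose(S.T @ M @ S, np.eye(3)))      # modal mass = I (orthogonality)
print(np.allclose(S.T @ K @ S, np.diag(lam)))   # modal stiffness is diagonal
```

With these mode shapes, the scalar modal equations of Eq. 6.183 have unit generalized masses $m_i$ and generalized stiffnesses $k_i = \lambda_i$.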
Most of the computational expense in the modal expansion method arises from solving
the eigenvalue problem. Thus, the modal method is generally advantageous in structural
dynamics if a small number of modes gives suﬃcient accuracy, if many time steps are required
in the analysis, or if the structure has no localized damping treatments.
Bibliography
[1] J.W. Brown and R.V. Churchill. Fourier Series and Boundary Value Problems.
McGraw-Hill, Inc., New York, seventh edition, 2006.
[2] R.W. Clough and J. Penzien. Dynamics of Structures. McGraw-Hill, Inc., New York,
second edition, 1993.
[3] K.H. Huebner, D.L. Dewhirst, D.E. Smith, and T.G. Byrom. The Finite Element
Method for Engineers. John Wiley and Sons, Inc., New York, fourth edition, 2001.
[4] D.C. Kay. Schaum’s Outline of Theory and Problems of Tensor Calculus. Schaum’s
Outline Series. McGraw-Hill, Inc., New York, 1988.
[5] A.J. Laub. Matrix Analysis for Scientists and Engineers. Society for Industrial and Applied Mathematics, Philadelphia, 2005.
[6] S. Lipschutz. 3000 Solved Problems in Linear Algebra. Schaum’s Outline Series.
McGraw-Hill, Inc., New York, 1988.
[7] S. Lipschutz. Schaum’s Outline of Theory and Problems of Linear Algebra. Schaum’s
Outline Series. McGraw-Hill, Inc., New York, second edition, 1991.
[8] C. Moler and C. Van Loan. Nineteen dubious ways to compute the exponential of a
matrix, twenty-ﬁve years later. SIAM Review, 45(1):3–49, 2003.
[9] M.R. Spiegel. Schaum’s Outline of Fourier Analysis With Applications to Boundary
Value Problems. Schaum’s Outline Series. McGraw-Hill, Inc., New York, 1974.
[10] G. Strang. Linear Algebra and Its Applications. Thomson Brooks/Cole, Belmont, CA,
fourth edition, 2006.
[11] J.S. Vandergraft. Introduction to Numerical Calculations. Academic Press, Inc., New
York, second edition, 1983.
Index
backsolving, 5, 19
matrix interpretation, 14
band matrix, 8
basis for vector space, 30
basis functions, 52, 60
basis vectors, 52
Bessel’s inequality, 60
buckling, 65
change of basis, 40
characteristic equation, 65, 74
column space, 26, 30
completeness, 61
Crank-Nicolson method, 19
determinants, 17, 43
diagonal system, 4
diagonally dominant, 25
diﬀerential equations, 84
dimension of vector space, 30
direct methods, 20
dot product, 2
echelon form of matrix, 28
eigenvalue problems, 64
generalized, 71
properties, 68
sweeping, 80
elementary operations, 6
equation(s)
backsolving, 5
characteristic, 65, 74
diagonal system, 4
diﬀerential, 84
elementary operations, 6
forward solving, 6
Gaussian elimination, 6
homogeneous, 28
inconsistent, 4
lower triangular, 6
nonhomogeneous, 28
nonsingular, 4
normal, 49, 50, 53
partial pivoting, 10
rectangular systems, 26
upper triangular, 5
FBS, 15
ﬁnite diﬀerence method, 15
forward solving, 6
Fourier series, 55, 58
Bessel’s inequality, 60
convergence, 58
Gibbs phenomenon, 58
least squares, 64
polynomial basis, 61
function(s)
basis, 60
complete set, 61
inner product, 55
norm, 56
normalized, 56
orthogonal, 56
orthonormal, 56
scalar product, 55
Gaussian elimination, 6, 84
generalized eigenvalue problem, 71
generalized mass, stiﬀness, 72
Gibbs phenomenon, 58
Gram-Schmidt orthogonalization, 53
Hilbert matrix, 62
inconsistent equations, 4
inner product, 2, 55
invariants, 75
inverse iteration, 78
iterative methods, 20
Jacobi’s method, 20
Kronecker delta, 1, 41, 48, 49, 56
law of cosines, 4
least squares problems, 48
Legendre polynomials, 63
linear independence, 29
LU decomposition, 13
storage scheme, 13
mass matrix, 72
matrix
band, 8
block, 74
Hilbert, 62
identity, 1
inverse, 2, 19
lower triangular, 2
orthogonal, 2, 43
positive definite, 82
projection, 40
pseudoinverse, 31
rank, 28
rotation, 43
symmetric, 40
trace, 69
tridiagonal, 2
upper triangular, 2
mean square error, 52, 59
mechanical vibrations, 65, 75
mode shape, 67
mode superposition, 89
moments of inertia, 75
multiple right-hand sides, 7, 18
multiplier, 8
natural frequency, 67
Newmark method, 15
nonsingular equations, 4
norm of function, 56
normal equations, 49, 50, 53
null space, 26
dimension, 31
orthogonal to row space, 39
operation counts, 9
orthogonal
coordinate transformation, 42
eigenvectors, 71
functions, 56
matrix, 2
subspaces, 38
vectors, 3
orthonormal
basis, 41
set, 56
partial pivoting, 10, 84
pivots, 28
power iteration, 75
convergence, 77
principal stress, 73, 75
projections onto lines, 39
QR factorization, 54
Rayleigh quotient, 72
residual, 48, 50
row space, 30
orthogonal to null space, 39
scalar product
functions, 55
similarity transformation, 81
spanning a subspace, 30
spectral theorem, 73
stiffness matrix, 72
stress
invariants, 75
principal, 75
structural dynamics, 15, 88
integrator, 15
subspace, 26
summation convention, 41, 49
sweeping procedure, 80
tensor(s), 44
examples, 44
isotropic, 48
trace of matrix, 69
transformation(s)
diﬀerentiation, 35
integration, 36
linear, 33
projection, 34
reﬂection, 34
rotation, 34, 37
similarity, 81
stretching, 33
tridiagonal system, 19
unit vector(s), 3, 41
upper triangular system, 5
vector spaces, 25


1 Systems of Linear Equations

A system of n linear equations in n unknowns can be expressed in the form

\[
\begin{cases}
A_{11}x_1 + A_{12}x_2 + A_{13}x_3 + \cdots + A_{1n}x_n = b_1 \\
A_{21}x_1 + A_{22}x_2 + A_{23}x_3 + \cdots + A_{2n}x_n = b_2 \\
\quad\vdots \\
A_{n1}x_1 + A_{n2}x_2 + A_{n3}x_3 + \cdots + A_{nn}x_n = b_n .
\end{cases}
\tag{1.1}
\]

The problem is to find the numbers x1, x2, ..., xn, given the coefficients Aij and the right-hand side b1, b2, ..., bn. In matrix notation,

\[ Ax = b, \tag{1.2} \]

where A is the n × n matrix

\[
A =
\begin{bmatrix}
A_{11} & A_{12} & \cdots & A_{1n} \\
A_{21} & A_{22} & \cdots & A_{2n} \\
\vdots & \vdots & & \vdots \\
A_{n1} & A_{n2} & \cdots & A_{nn}
\end{bmatrix},
\tag{1.3}
\]

b is the right-hand side vector

\[
b = \begin{Bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{Bmatrix},
\tag{1.4}
\]

and x is the solution vector

\[
x = \begin{Bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{Bmatrix}.
\tag{1.5}
\]

1.1 Definitions

The identity matrix I is the square diagonal matrix with 1s on the diagonal:

\[
I =
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}.
\tag{1.6}
\]

In index notation, the identity matrix is the Kronecker delta:

\[
I_{ij} = \delta_{ij} =
\begin{cases}
1, & i = j, \\
0, & i \ne j.
\end{cases}
\tag{1.7}
\]

The inverse A⁻¹ of a square matrix A is the matrix such that

\[ A^{-1}A = AA^{-1} = I. \tag{1.8} \]

The inverse is unique if it exists. To prove uniqueness, let B be the inverse, so that AB = BA = I. If C is another inverse, C = CI = C(AB) = (CA)B = IB = B.

A square matrix A is said to be orthogonal if

\[ AA^T = A^TA = I, \tag{1.9} \]

i.e., an orthogonal matrix is one whose inverse is equal to the transpose.

An upper triangular matrix has only zeros below the diagonal, e.g.,

\[
U =
\begin{bmatrix}
A_{11} & A_{12} & A_{13} & A_{14} & A_{15} \\
0 & A_{22} & A_{23} & A_{24} & A_{25} \\
0 & 0 & A_{33} & A_{34} & A_{35} \\
0 & 0 & 0 & A_{44} & A_{45} \\
0 & 0 & 0 & 0 & A_{55}
\end{bmatrix}.
\tag{1.10}
\]

Similarly, a lower triangular matrix has only zeros above the diagonal, e.g.,

\[
L =
\begin{bmatrix}
A_{11} & 0 & 0 & 0 & 0 \\
A_{21} & A_{22} & 0 & 0 & 0 \\
A_{31} & A_{32} & A_{33} & 0 & 0 \\
A_{41} & A_{42} & A_{43} & A_{44} & 0 \\
A_{51} & A_{52} & A_{53} & A_{54} & A_{55}
\end{bmatrix}.
\tag{1.11}
\]

A tridiagonal matrix has all its nonzeros on the main diagonal and the two adjacent diagonals, e.g.,

\[
A =
\begin{bmatrix}
x & x & 0 & 0 & 0 & 0 \\
x & x & x & 0 & 0 & 0 \\
0 & x & x & x & 0 & 0 \\
0 & 0 & x & x & x & 0 \\
0 & 0 & 0 & x & x & x \\
0 & 0 & 0 & 0 & x & x
\end{bmatrix}.
\tag{1.12}
\]

Consider two three-dimensional vectors u and v given by

\[ u = u_1 e_1 + u_2 e_2 + u_3 e_3, \quad v = v_1 e_1 + v_2 e_2 + v_3 e_3, \tag{1.13} \]

where e1, e2, and e3 are the mutually orthogonal unit basis vectors in the Cartesian coordinate directions, i.e.,

\[ e_i \cdot e_j = \delta_{ij}, \tag{1.14} \]

and u1, u2, u3, v1, v2, and v3 are the vector components. The dot product (or inner product) of u and v is

\[
u \cdot v = (u_1 e_1 + u_2 e_2 + u_3 e_3) \cdot (v_1 e_1 + v_2 e_2 + v_3 e_3)
= u_1 v_1 + u_2 v_2 + u_3 v_3 = \sum_{i=1}^{3} u_i v_i .
\tag{1.15}
\]

In n dimensions,

\[ u \cdot v = \sum_{i=1}^{n} u_i v_i, \tag{1.16} \]

where n is the dimension of each vector (the number of components). For u a vector, the length of u is

\[ u = |u| = \sqrt{u \cdot u}, \tag{1.17} \]

where

\[ u \cdot u = \sum_{i=1}^{n} u_i u_i = u_1^2 + u_2^2 + \cdots + u_n^2 . \tag{1.18} \]

A unit vector is a vector whose length is unity. Two vectors u and v are orthogonal if u · v = 0.

There are several notations for this product, all of which are equivalent: inner product notation (u, v), vector notation u · v, index notation Σ ui vi, and matrix notation uᵀv or vᵀu. The scalar product can also be written in matrix notation as

\[
u^T v = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix}
\begin{Bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{Bmatrix}.
\tag{1.19}
\]

Consider the triangle defined by three vectors a, b, and c (Fig. 1), where c = a − b, where a, b, and c are the lengths of a, b, and c, respectively, and θ is the angle between a and b, in which case

\[ (a - b\cos\theta)^2 + (b\sin\theta)^2 = c^2 . \tag{1.20} \]

[Figure 1: Two Vectors (a triangle with sides a and b, included angle θ, and c = a − b).]

Eq. 1.20 can be expanded to yield

\[ a^2 - 2ab\cos\theta + b^2\cos^2\theta + b^2\sin^2\theta = c^2 \tag{1.21} \]

or

\[ c^2 = a^2 + b^2 - 2ab\cos\theta, \tag{1.22} \]

which is the law of cosines. We can further expand the law of cosines in terms of components to obtain

\[
(a_1 - b_1)^2 + (a_2 - b_2)^2 + (a_3 - b_3)^2
= a_1^2 + a_2^2 + a_3^2 + b_1^2 + b_2^2 + b_3^2 - 2ab\cos\theta
\tag{1.23}
\]

or

\[ a_1 b_1 + a_2 b_2 + a_3 b_3 = ab\cos\theta . \tag{1.24} \]

Thus, the dot product of two vectors can alternatively be expressed as

\[ a \cdot b = |a||b|\cos\theta = ab\cos\theta, \tag{1.25} \]

where θ is the angle between the two vectors.

1.2 Nonsingular Systems of Equations

If the linear system Ax = b has a unique solution x for every right-hand side b, the system is said to be nonsingular. Usually, one refers to the matrix rather than the system of equations as being nonsingular, since this property is determined by the coefficients Aij of the matrix and does not depend on the right-hand side b. If A is singular (i.e., not nonsingular), then, for some right-hand sides, the system Ax = b will be inconsistent and have no solutions, and, for other right-hand sides, the equations will be dependent and have an infinite number of solutions. For example, the system

\[
\begin{cases}
2x_1 + 3x_2 = 5 \\
4x_1 + 6x_2 = 1
\end{cases}
\tag{1.26}
\]

is inconsistent and has no solution. The system

\[
\begin{cases}
2x_1 + 3x_2 = 5 \\
4x_1 + 6x_2 = 10
\end{cases}
\tag{1.27}
\]

is redundant (i.e., has dependent equations) and has an infinite number of solutions. Thus, the only possibilities for a square system of linear equations are that the system have no solution, one solution, or an infinite number of solutions.

1.3 Diagonal and Triangular Systems

Several types of systems of equations are particularly easy to solve, including diagonal and triangular systems. For the diagonal matrix

\[
A =
\begin{bmatrix}
A_{11} & & & & \\
& A_{22} & & & \\
& & A_{33} & & \\
& & & \ddots & \\
& & & & A_{nn}
\end{bmatrix},
\tag{1.28}
\]
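Eq. 1.25 gives a practical way to compute the angle between two vectors from their components. The following minimal Python sketch illustrates this (the function names are illustrative, not from the text):

```python
import math

def dot(u, v):
    # Dot product (Eq. 1.16): sum of component products
    return sum(ui * vi for ui, vi in zip(u, v))

def length(u):
    # Length of a vector (Eq. 1.17)
    return math.sqrt(dot(u, u))

def angle(u, v):
    # Angle between two vectors from a.b = |a||b| cos(theta) (Eq. 1.25)
    return math.acos(dot(u, v) / (length(u) * length(v)))

# The unit basis vectors e1 and e2 are orthogonal, so the angle is 90 degrees
print(math.degrees(angle([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])))  # 90.0
```

A zero dot product, as in this example, corresponds to orthogonal vectors.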

the system Ax = b has the solution

\[ x_1 = \frac{b_1}{A_{11}}, \quad x_2 = \frac{b_2}{A_{22}}, \quad \ldots, \quad x_n = \frac{b_n}{A_{nn}}, \tag{1.29} \]

assuming that, for each i, Aii ≠ 0. If Akk = 0 for some k, then xk does not exist if bk ≠ 0, and xk is arbitrary if bk = 0; in either case, the system is singular.

Consider the upper triangular system (Aij = 0 if i > j)

\[
\begin{cases}
A_{11}x_1 + A_{12}x_2 + A_{13}x_3 + \cdots + A_{1n}x_n = b_1 \\
\phantom{A_{11}x_1 + {}} A_{22}x_2 + A_{23}x_3 + \cdots + A_{2n}x_n = b_2 \\
\phantom{A_{11}x_1 + A_{22}x_2 + {}} A_{33}x_3 + \cdots + A_{3n}x_n = b_3 \\
\quad\vdots \\
A_{n-1,n-1}x_{n-1} + A_{n-1,n}x_n = b_{n-1} \\
\phantom{A_{n-1,n-1}x_{n-1} + {}} A_{nn}x_n = b_n .
\end{cases}
\tag{1.30}
\]

This system is solved by backsolving:

\[ x_n = b_n / A_{nn}, \tag{1.31} \]

\[ x_{n-1} = (b_{n-1} - A_{n-1,n}\,x_n)/A_{n-1,n-1}, \tag{1.32} \]

and so on until all components of x are found. For example, consider the upper triangular system

\[
\begin{cases}
x_1 - 2x_2 + x_3 = 1 \\
\phantom{x_1 - {}} 2x_2 + 4x_3 = 10 \\
\phantom{x_1 - 2x_2 {}} - 2x_3 = 8 .
\end{cases}
\tag{1.33}
\]

From the backsolving algorithm, the solution is found as

\[ x_3 = -4, \quad x_2 = [10 - 4(-4)]/2 = 13, \quad x_1 = 1 + 2(13) - (-4) = 31. \tag{1.34} \]

The backsolving algorithm can be summarized as follows:

1. Input n, A, b
2. xn = bn/Ann
3. For k = n − 1, n − 2, . . . , 1: [k = row number]
     xk = bk
     For i = k + 1, k + 2, . . . , n:
       xk = xk − Aki xi
     xk = xk/Akk
4. Output x

Since the only impediments to the successful completion of this algorithm are the divisions, it is apparent that an upper triangular system is nonsingular if, and only if, Aii ≠ 0 for all i.
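As a sketch, the backsolving algorithm translates directly into Python (an illustrative rendering of the pseudocode above, applied to the example system of Eq. 1.33):

```python
def backsolve(A, b):
    # Solve the upper triangular system A x = b by back substitution.
    # A is a list of rows; only entries on or above the diagonal are used.
    n = len(b)
    x = [0.0] * n
    x[n - 1] = b[n - 1] / A[n - 1][n - 1]
    for k in range(n - 2, -1, -1):          # k = row number, from bottom up
        x[k] = b[k]
        for i in range(k + 1, n):
            x[k] -= A[k][i] * x[i]          # subtract A_ki * x_i
        x[k] /= A[k][k]
    return x

# Upper triangular example of Eq. 1.33
A = [[1.0, -2.0,  1.0],
     [0.0,  2.0,  4.0],
     [0.0,  0.0, -2.0]]
b = [1.0, 10.0, 8.0]
print(backsolve(A, b))  # [31.0, 13.0, -4.0]
```

The divisions by the diagonal entries Akk are the only failure points, consistent with the nonsingularity condition above.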

A variation on the backsolving algorithm listed above is obtained if we return the solution in the right-hand side array b. That is, array x could be eliminated by overwriting b and storing the solution in b. With this variation, the backsolving algorithm becomes

1. Input n, A, b
2. bn = bn/Ann
3. For k = n − 1, n − 2, . . . , 1: [k = row number]
     For i = k + 1, k + 2, . . . , n:
       bk = bk − Aki bi
     bk = bk/Akk
4. Output b

On output, the solution is returned in b, and the original b is destroyed.

A lower triangular system Ax = b is one for which Aij = 0 if i < j. This system can be solved directly by forward solving in a manner similar to backsolving for upper triangular systems.

1.4 Gaussian Elimination

The approach widely used to solve general systems of linear equations is Gaussian elimination, for which the procedure is to transform the general system into upper triangular form, and then backsolve. The transformation is based on the following elementary operations, which have no effect on the solution of the system:

1. An equation can be multiplied by a nonzero constant.
2. Two equations can be interchanged.
3. Two equations can be added together and either equation replaced by the sum.
4. An equation can be multiplied by a nonzero constant, added to another equation, and the result used to replace either of the first two equations. (This operation is a combination of the first and third operations.)

We illustrate Gaussian elimination with an example. Consider the system

\[
\begin{cases}
-x_1 + 2x_2 + 3x_3 + x_4 = 1 \\
-3x_1 + 8x_2 + 8x_3 + x_4 = 2 \\
x_1 + 2x_2 - 6x_3 + 4x_4 = -1 \\
2x_1 - 4x_2 - 5x_3 - x_4 = 0 .
\end{cases}
\tag{1.35}
\]

Step 1: Eliminate all coefficients of x1 except for that in the first equation:

\[
\begin{cases}
-x_1 + 2x_2 + 3x_3 + x_4 = 1 \\
2x_2 - x_3 - 2x_4 = -1 \\
4x_2 - 3x_3 + 5x_4 = 0 \\
x_3 + x_4 = 2 .
\end{cases}
\tag{1.36}
\]

Step 2: Eliminate all coefficients of x2 except for those in the first two equations:

\[
\begin{cases}
-x_1 + 2x_2 + 3x_3 + x_4 = 1 \\
2x_2 - x_3 - 2x_4 = -1 \\
- x_3 + 9x_4 = 2 \\
x_3 + x_4 = 2 .
\end{cases}
\tag{1.37}
\]

Step 3: Eliminate all coefficients of x3 except for those in the first three equations:

\[
\begin{cases}
-x_1 + 2x_2 + 3x_3 + x_4 = 1 \\
2x_2 - x_3 - 2x_4 = -1 \\
- x_3 + 9x_4 = 2 \\
10x_4 = 4 .
\end{cases}
\tag{1.38}
\]

This system is now in upper triangular form, and we could complete the solution by backsolving the upper triangular system.

Notice in the above procedure that we work with both sides of the equations simultaneously. Notice also that we could save some effort in the above procedure by not writing the unknowns at each step, and instead use matrix notation. We define a matrix consisting of the left-hand side coefficient matrix augmented by the right-hand side, and apply to this matrix the same elementary operations used in the three steps above:

\[
\begin{bmatrix}
-1 & 2 & 3 & 1 & 1 \\
-3 & 8 & 8 & 1 & 2 \\
1 & 2 & -6 & 4 & -1 \\
2 & -4 & -5 & -1 & 0
\end{bmatrix}
\longrightarrow
\begin{bmatrix}
-1 & 2 & 3 & 1 & 1 \\
0 & 2 & -1 & -2 & -1 \\
0 & 4 & -3 & 5 & 0 \\
0 & 0 & 1 & 1 & 2
\end{bmatrix}
\tag{1.39}
\]

\[
\longrightarrow
\begin{bmatrix}
-1 & 2 & 3 & 1 & 1 \\
0 & 2 & -1 & -2 & -1 \\
0 & 0 & -1 & 9 & 2 \\
0 & 0 & 1 & 1 & 2
\end{bmatrix}
\longrightarrow
\begin{bmatrix}
-1 & 2 & 3 & 1 & 1 \\
0 & 2 & -1 & -2 & -1 \\
0 & 0 & -1 & 9 & 2 \\
0 & 0 & 0 & 10 & 4
\end{bmatrix}.
\tag{1.40}
\]

Note that this reduction can be performed "in-place" (i.e., by writing over the matrix and destroying both the original matrix and the right-hand side). Multiple right-hand sides can be handled simultaneously. For example, if we want the solution of Ax1 = b1 and Ax2 = b2, we augment the coefficient matrix by both right-hand sides:

\[
\begin{bmatrix}
-1 & 2 & 3 & 1 & 1 & 2 \\
-3 & 8 & 8 & 1 & 2 & 3 \\
1 & 2 & -6 & 4 & -1 & 3 \\
2 & -4 & -5 & -1 & 0 & -1
\end{bmatrix}.
\]

The Gaussian elimination algorithm (reduction to upper triangular form) can now be summarized as follows:

1. Input n, A, b
2. For k = 1, 2, . . . , n − 1: [k = pivot row number]
     For i = k + 1, k + 2, . . . , n: [i = row number below k]
       m = −Aik/Akk [m = row multiplier]
       For j = k + 1, k + 2, . . . , n: [j = column number to right of k]
         Aij = Aij + mAkj
       bi = bi + mbk [rhs]
3. Output A, b

The output of this procedure is an upper triangular system which can be solved by backsolving. Note that each multiplier m could be saved in Aik (which is in the lower triangle and not used in the above algorithm) so that additional right-hand sides could be solved later without having to reduce the original system again. Thus, if the multipliers are saved in Aik, the Gaussian elimination algorithm is modified as follows:

1. Input n, A, b
2. For k = 1, 2, . . . , n − 1: [k = pivot row number]
     For i = k + 1, k + 2, . . . , n: [i = row number below k]
       Aik = −Aik/Akk [multiplier stored in Aik]
       For j = k + 1, k + 2, . . . , n: [j = column number to right of k]
         Aij = Aij + Aik Akj
       bi = bi + Aik bk [rhs]
3. Output A, b

Note further that, if the multipliers are saved in Aik, the loops involving the coefficient matrix A can (and should) be separated from those involving the right-hand side b:

1. Input n, A, b
2. Elimination
   For k = 1, 2, . . . , n − 1: [k = pivot row number]
     For i = k + 1, k + 2, . . . , n: [i = row number below k]
       Aik = −Aik/Akk [multiplier stored in Aik]
       For j = k + 1, k + 2, . . . , n: [j = column number to right of k]
         Aij = Aij + Aik Akj
3. RHS modification
   For k = 1, 2, . . . , n − 1: [k = pivot row number]
     For i = k + 1, k + 2, . . . , n: [i = row number below k]
       bi = bi + Aik bk [rhs]
4. Output A, b

If the coefficient matrix is banded, the loops above can be shortened to avoid unnecessary operations on zeros. A band matrix, which occurs frequently in applications, is one for which the nonzeros are clustered about the main diagonal. We define the matrix semi-bandwidth w as the maximum "distance" of any nonzero off-diagonal term from the diagonal (including the diagonal). Thus, for a band matrix, the upper limit n in the loops associated with pivot row k above can be replaced by

\[ k_{max} = \min(k + w - 1,\ n). \tag{1.41} \]

The modified algorithm thus becomes

1. Input n, A, b, w [w = matrix semi-bandwidth]
2. Elimination
   For k = 1, 2, . . . , n − 1: [k = pivot row number]
     kmax = min(k + w − 1, n) [replaces n in the loops]
     For i = k + 1, k + 2, . . . , kmax: [i = row number below k]
       Aik = −Aik/Akk [multiplier stored in Aik]
       For j = k + 1, k + 2, . . . , kmax: [j = column number to right of k]
         Aij = Aij + Aik Akj
3. RHS modification
   For k = 1, 2, . . . , n − 1: [k = pivot row number]
     kmax = min(k + w − 1, n) [replaces n in the loop]
     For i = k + 1, k + 2, . . . , kmax: [i = row number below k]
       bi = bi + Aik bk [rhs]
4. Output A, b

A similar modification is made to the back-solve algorithm.
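The separated elimination/RHS algorithm can be sketched in Python as follows (an illustrative rendering of the pseudocode, with my own function names); the multipliers are stored in the lower triangle so that a new right-hand side can be processed without repeating the elimination:

```python
def eliminate(A):
    # Reduce A to upper triangular form in place,
    # storing the multiplier -A[i][k]/A[k][k] in A[i][k].
    n = len(A)
    for k in range(n - 1):                  # pivot row
        for i in range(k + 1, n):           # rows below the pivot
            A[i][k] = -A[i][k] / A[k][k]    # multiplier stored in lower triangle
            for j in range(k + 1, n):       # columns to the right of k
                A[i][j] += A[i][k] * A[k][j]

def modify_rhs(A, b):
    # Apply the stored multipliers to a (possibly new) right-hand side.
    n = len(A)
    for k in range(n - 1):
        for i in range(k + 1, n):
            b[i] += A[i][k] * b[k]

# Example system of Eq. 1.35
A = [[-1.0,  2.0,  3.0,  1.0],
     [-3.0,  8.0,  8.0,  1.0],
     [ 1.0,  2.0, -6.0,  4.0],
     [ 2.0, -4.0, -5.0, -1.0]]
b = [1.0, 2.0, -1.0, 0.0]
eliminate(A)
modify_rhs(A, b)
print(b)  # [1.0, -1.0, 2.0, 4.0]
```

The printed right-hand side matches the final augmented column of Eq. 1.40, and the upper triangle of A matches the reduced coefficient matrix there.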

1.5 Operation Counts

Except for small systems of equations (small n), we can estimate the number of operations required to execute an algorithm by looking only at the operations in the innermost loop. We define an operation as the combination of a multiply and add, since they generally are executed together. Gaussian elimination involves a triply-nested loop. For fixed row number k, the inner loops on i and j are executed n − k times each. Hence, the number of operations is

\[
\sum_{k=1}^{n-1} (n-k)^2 = \sum_{m=1}^{n-1} m^2 \approx \int_0^{n-1} m^2\,dm = (n-1)^3/3 \approx n^3/3,
\tag{1.42}
\]

where the approximations all depend on n's being large. Thus, for a fully-populated matrix, Gaussian elimination requires about n³/3 operations. For a band matrix with a semi-bandwidth w ≪ n, w² operations are performed for each of the n rows (to first order). Thus, for a general band matrix, Gaussian elimination requires about nw² operations.

For backsolving, we consider Step 3 of the backsolve algorithm. In Row k (counting from the bottom), there are approximately k multiply-add operations. Hence, the number of operations is

\[ \sum_{k=1}^{n} k = n(n+1)/2 \approx n^2/2. \tag{1.43} \]

Thus, backsolving requires about n²/2 operations for each right-hand side. Notice that, for large n, the elimination step has many more operations than does the backsolve.

1.6 Partial Pivoting

For Gaussian elimination to work, the (current) diagonal entry Aii must be nonzero at each intermediate step. For example, consider the same example used earlier, but with the equations rearranged:

\[
\begin{bmatrix}
-1 & 2 & 3 & 1 & 1 \\
2 & -4 & -5 & -1 & 0 \\
-3 & 8 & 8 & 1 & 2 \\
1 & 2 & -6 & 4 & -1
\end{bmatrix}
\longrightarrow
\begin{bmatrix}
-1 & 2 & 3 & 1 & 1 \\
0 & 0 & 1 & 1 & 2 \\
0 & 2 & -1 & -2 & -1 \\
0 & 4 & -3 & 5 & 0
\end{bmatrix}.
\tag{1.44}
\]

In this case, the next step would fail, since A22 = 0. The solution to this problem is to interchange two equations to avoid the zero divisor. If the system is nonsingular, this interchange is always possible.

However, complications arise in Gaussian elimination not only when Aii = 0 at some step but also sometimes when Aii is small. For example, consider the system

\[
\begin{cases}
(1.000 \times 10^{-5})x_1 + 1.000x_2 = 1.000 \\
1.000x_1 + 1.000x_2 = 2.000,
\end{cases}
\tag{1.45}
\]

the solution of which is (x1, x2) = (1.00001000..., 0.99998999...). Assume that we perform the Gaussian elimination on a computer which has four digits of precision. After the one-step elimination, we obtain

\[
\begin{cases}
(1.000 \times 10^{-5})x_1 + 1.000x_2 = 1.000 \\
-1.000 \times 10^5 x_2 = -1.000 \times 10^5,
\end{cases}
\tag{1.46}
\]

whose solution is x2 = 1.000, x1 = 0.000. If we now interchange the original equations, the system is

\[
\begin{cases}
1.000x_1 + 1.000x_2 = 2.000 \\
(1.000 \times 10^{-5})x_1 + 1.000x_2 = 1.000,
\end{cases}
\tag{1.47}
\]

which, after elimination, yields

\[
\begin{cases}
1.000x_1 + 1.000x_2 = 2.000 \\
1.000x_2 = 1.000,
\end{cases}
\tag{1.48}
\]

whose solution is x2 = 1.000, x1 = 1.000, which is correct to the precision available.

We conclude from this example that it is better to interchange equations at Step i so that |Aii| is as large as possible. The interchanging of equations is called partial pivoting. Thus, an alternative version of the Gaussian elimination algorithm would be that given previously except that, at each step (Step i), the ith equation is interchanged with one below so that |Aii| is as large as possible. In actual practice, the "interchange" can alternatively be performed by interchanging the equation subscripts rather than the equations themselves.
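The four-digit example can be reproduced numerically; this Python sketch (my own simulation, not from the text) rounds every intermediate result to four significant digits and runs the 2 × 2 elimination with and without the interchange:

```python
import math

def round4(x):
    # Round x to four significant digits (simulating a 4-digit machine)
    if x == 0.0:
        return 0.0
    d = math.floor(math.log10(abs(x)))
    return round(x, 3 - d)

def solve2x2(a11, a12, b1, a21, a22, b2):
    # One elimination step plus backsolve, every result rounded to 4 digits
    m = round4(-a21 / a11)
    a22 = round4(a22 + m * a12)
    b2 = round4(b2 + m * b1)
    x2 = round4(b2 / a22)
    x1 = round4((b1 - a12 * x2) / a11)
    return x1, x2

# Eq. 1.45 as given: the tiny pivot 1e-5 destroys x1
print(solve2x2(1.000e-5, 1.000, 1.000, 1.000, 1.000, 2.000))  # (0.0, 1.0)
# With the equations interchanged (partial pivoting): correct to 4 digits
print(solve2x2(1.000, 1.000, 2.000, 1.000e-5, 1.000, 1.000))  # (1.0, 1.0)
```

Interchanging so that the largest available pivot is used recovers the correct answer, as the text concludes.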

1.7 LU Decomposition

Define Mik as the identity matrix with the addition of the number mik in Row i, Column k:

\[
M_{ik} =
\begin{bmatrix}
1 & & & & \\
& 1 & & & \\
& & 1 & & \\
& & & \ddots & \\
& m_{ik} & & & 1
\end{bmatrix}.
\tag{1.49}
\]

For example, for n = 3, M31 is

\[
M_{31} =
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
m_{31} & 0 & 1
\end{bmatrix}.
\tag{1.50}
\]

We now consider the effect of multiplying a matrix A by Mik. For example, for n = 3,

\[
M_{31}A =
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
m_{31} & 0 & 1
\end{bmatrix}
\begin{bmatrix}
A_{11} & A_{12} & A_{13} \\
A_{21} & A_{22} & A_{23} \\
A_{31} & A_{32} & A_{33}
\end{bmatrix}
\tag{1.51}
\]
\[
=
\begin{bmatrix}
A_{11} & A_{12} & A_{13} \\
A_{21} & A_{22} & A_{23} \\
m_{31}A_{11} + A_{31} & m_{31}A_{12} + A_{32} & m_{31}A_{13} + A_{33}
\end{bmatrix}.
\tag{1.52}
\]

Thus, when A is multiplied by M31, we obtain a new matrix which has Row 3 replaced by the sum of Row 3 and m31 × Row 1. In general, multiplication by Mik has the effect of replacing Row i by Row i + mik × Row k. Thus, each step of the Gaussian elimination process is equivalent to multiplying A by some Mik. For example, for n = 4 (a 4 × 4 matrix), the entire Gaussian elimination can be written as

\[ M_{43}M_{42}M_{32}M_{41}M_{31}M_{21}A = U, \tag{1.53} \]

where

\[ m_{ik} = -\frac{A_{ik}}{A_{kk}} \tag{1.54} \]

are the multipliers in Gaussian elimination, and U is the upper triangular matrix obtained in the Gaussian elimination. Note that the multipliers mik in Eq. 1.54 are computed using the current matrix entries obtained during Gaussian elimination.

If each Mik⁻¹ exists, then, for n = 4,

\[
A = (M_{43}M_{42}M_{32}M_{41}M_{31}M_{21})^{-1}U
= M_{21}^{-1}M_{31}^{-1}M_{41}^{-1}M_{32}^{-1}M_{42}^{-1}M_{43}^{-1}U,
\tag{1.55}
\]

since the inverse of a matrix product is equal to the product of the matrix inverses in reverse order. We observe, by example, that Mik⁻¹ exists and is obtained from Mik simply by changing the sign of mik (the ik element), since, for n = 4 and M42,

\[
\begin{bmatrix}
1 & & & \\
& 1 & & \\
& & 1 & \\
& m_{42} & & 1
\end{bmatrix}
\begin{bmatrix}
1 & & & \\
& 1 & & \\
& & 1 & \\
& -m_{42} & & 1
\end{bmatrix}
=
\begin{bmatrix}
1 & & & \\
& 1 & & \\
& & 1 & \\
& m_{42} - m_{42} & & 1
\end{bmatrix}
= I.
\tag{1.56}
\]

From Eq. 1.55, for n = 3, we obtain

\[ A = M_{21}^{-1}M_{31}^{-1}M_{32}^{-1}U, \tag{1.57} \]

where

\[
M_{21}^{-1}M_{31}^{-1}M_{32}^{-1}
=
\begin{bmatrix}
1 & & \\
-m_{21} & 1 & \\
& & 1
\end{bmatrix}
\begin{bmatrix}
1 & & \\
& 1 & \\
-m_{31} & & 1
\end{bmatrix}
\begin{bmatrix}
1 & & \\
& 1 & \\
& -m_{32} & 1
\end{bmatrix}
\tag{1.58}
\]
\[
=
\begin{bmatrix}
1 & & \\
-m_{21} & 1 & \\
-m_{31} & & 1
\end{bmatrix}
\begin{bmatrix}
1 & & \\
& 1 & \\
& -m_{32} & 1
\end{bmatrix}
\tag{1.59}
\]
\[
=
\begin{bmatrix}
1 & & \\
-m_{21} & 1 & \\
-m_{31} & -m_{32} & 1
\end{bmatrix}.
\tag{1.60}
\]

In general,

\[
M_{21}^{-1}M_{31}^{-1}\cdots M_{n1}^{-1}\cdots M_{n,n-1}^{-1}
=
\begin{bmatrix}
1 & & & \\
-m_{21} & 1 & & \\
\vdots & & \ddots & \\
-m_{n1} & -m_{n2} & \cdots & 1
\end{bmatrix},
\tag{1.61}
\]

and, for any nonsingular matrix A,

\[ A = LU, \tag{1.62} \]

where L is the lower unit triangular matrix

\[
L =
\begin{bmatrix}
1 & & & & \\
-m_{21} & 1 & & & \\
-m_{31} & -m_{32} & 1 & & \\
\vdots & & & \ddots & \\
-m_{n1} & -m_{n2} & \cdots & -m_{n,n-1} & 1
\end{bmatrix},
\tag{1.63}
\]

and U is the upper triangular matrix obtained in Gaussian elimination. A lower triangular matrix with 1s on the diagonal is referred to as a lower unit triangular matrix. The decomposition A = LU is referred to as the "LU decomposition" or "LU factorization" of A. Note that the LU factors are obtained directly from the Gaussian elimination algorithm, since U is the upper triangular result, and L consists of the negatives of the row multipliers. Thus, the LU decomposition is merely a matrix interpretation of Gaussian elimination.

It is also useful to show a common storage scheme for the LU decomposition. If A is factored in place (i.e., the storage locations for A are over-written during the Gaussian elimination process),

\[
\begin{bmatrix}
A_{11} & A_{12} & A_{13} & \cdots & A_{1n} \\
A_{21} & A_{22} & A_{23} & \cdots & A_{2n} \\
A_{31} & A_{32} & A_{33} & \cdots & A_{3n} \\
\vdots & \vdots & \vdots & & \vdots \\
A_{n1} & A_{n2} & A_{n3} & \cdots & A_{nn}
\end{bmatrix}
\quad \text{is replaced by} \quad
\begin{bmatrix}
U_{11} & U_{12} & U_{13} & \cdots & U_{1n} \\
L_{21} & U_{22} & U_{23} & \cdots & U_{2n} \\
L_{31} & L_{32} & U_{33} & \cdots & U_{3n} \\
\vdots & \vdots & \vdots & & \vdots \\
L_{n1} & L_{n2} & L_{n3} & \cdots & U_{nn}
\end{bmatrix},
\]

where the Lij entries are the negatives of the multipliers obtained during Gaussian elimination. Each Lij entry can be stored as it is computed. Since L is a lower unit triangular matrix (with 1s on the diagonal), it is not necessary to store the 1s. Thus, both L and U can be stored in the same n² storage locations used for A.

We have seen how a lower unit triangular matrix can be formed from the product of M matrices. We have also seen that the inverse of an M matrix is another M matrix: the one with the off-diagonal term negated, and that the inverse of a matrix product is the product of the individual inverses in reverse order. Thus, one might infer that the inverse of a lower unit triangular matrix is equal to the same matrix with the terms below the diagonal negated. However, such an inference would be incorrect, since, although the inverse of a lower unit triangular matrix is a lower triangular matrix equal to the product of M matrices, the M matrices are in the wrong order. Thus, the inverse of a lower unit triangular matrix is not equal to the same matrix with the terms below the diagonal negated, as the following simple example illustrates. Consider the matrix

\[
A =
\begin{bmatrix}
1 & 0 & 0 \\
5 & 1 & 0 \\
1 & 8 & 1
\end{bmatrix}.
\tag{1.64}
\]

We note that

\[
A^{-1} \ne
\begin{bmatrix}
1 & 0 & 0 \\
-5 & 1 & 0 \\
-1 & -8 & 1
\end{bmatrix},
\tag{1.65}
\]

since

\[
\begin{bmatrix}
1 & 0 & 0 \\
5 & 1 & 0 \\
1 & 8 & 1
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 \\
-5 & 1 & 0 \\
-1 & -8 & 1
\end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
-40 & 0 & 1
\end{bmatrix}
\ne I.
\tag{1.66}
\]

To see why we cannot compute A^{-1} by simply negating the terms of A below the diagonal, consider the three matrices (rows separated by semicolons)

B = [1 0 0; 5 1 0; 0 0 1],  C = [1 0 0; 0 1 0; 1 0 1],  D = [1 0 0; 0 1 0; 0 8 1].  (1.67)

Is A = BCD or A = DCB or both? Notice that

BCD = [1 0 0; 5 1 0; 0 0 1][1 0 0; 0 1 0; 1 0 1][1 0 0; 0 1 0; 0 8 1] = [1 0 0; 5 1 0; 1 8 1] = A,  (1.68)

whereas

DCB = [1 0 0; 0 1 0; 0 8 1][1 0 0; 0 1 0; 1 0 1][1 0 0; 5 1 0; 0 0 1] = [1 0 0; 5 1 0; 41 8 1] ≠ A.  (1.69)

Thus, BCD ≠ DCB, and we conclude that order matters when M matrices are multiplied. That is, the nice behavior which occurs when multiplying M matrices occurs only if the matrices are multiplied in the right order: the same order as that in which the Gaussian elimination multipliers are computed. To complete this discussion, we compute A^{-1} as

A^{-1} = D^{-1} C^{-1} B^{-1}  (1.70)
       = [1 0 0; 0 1 0; 0 -8 1][1 0 0; 0 1 0; -1 0 1][1 0 0; -5 1 0; 0 0 1] = [1 0 0; -5 1 0; 39 -8 1],  (1.71)

which is not the matrix obtained from A by simply negating the terms below the diagonal.

1.8 Matrix Interpretation of Back Substitution

Consider the linear system Ax = b, where A has been factored into A = LU. Then

LUx = b.  (1.72)

If we now define Ux = y, Eq. 1.72 is equivalent to the pair of equations

Ly = b,  Ux = y,  (1.73)

which can be solved in two steps, first to compute y and then to compute x. The two steps in Eq. 1.73 are often referred to as forward substitution and back substitution, respectively. Eq. 1.73a is the modification of the original right-hand side b by the multipliers to yield y. Eq. 1.73b is the backsolve, since the backsolve operation involves the coefficient matrix U to yield the solution x. The combination of Gaussian elimination and backsolve can therefore be interpreted as the sequence of two matrix solutions. The combination is sometimes called the forward-backward substitution (FBS). Since the forward and back substitutions each require n^2/2 operations for a fully populated matrix, FBS requires n^2 operations.

A good example of why equations are solved with LU factorization followed by FBS is provided by transient structural dynamics. Transient response analysis of structures is concerned with computing the forced response due to general time-dependent loads (e.g., blast or shock). Linear finite element modeling results in the matrix differential equation

M ü + B u̇ + K u = F(t),  (1.74)

where u(t) is the vector of unknown displacements, M is the system mass matrix, B is the viscous damping matrix, K is the system stiffness matrix, F is the time-dependent vector of applied forces, t is time, and dots denote differentiation with respect to the time t. The problem is to determine u(t), given the initial displacement u_0 and the initial velocity u̇_0.

There are several numerical approaches for solving this system of equations, one of which integrates the equations directly in the time domain. Another approach, the modal superposition approach, will be discussed as an application of eigenvalue problems in Section 6.10 (p. 88). An example of a direct integrator is the Newmark-beta finite difference method (in time), which results in the recursive relation

[ (1/Δt^2) M + (1/(2Δt)) B + (1/3) K ] u_{n+1}
    = (1/3)(F_{n+1} + F_n + F_{n-1})
    + [ (2/Δt^2) M - (1/3) K ] u_n
    + [ -(1/Δt^2) M + (1/(2Δt)) B - (1/3) K ] u_{n-1},  (1.75)

where Δt is the integration time step, and u_n is the solution at the nth step. The right-hand side depends on the current solution u_n and the immediately preceding solution u_{n-1}. If the current time solution is u_n, this matrix system is a system of linear algebraic equations whose solution u_{n+1} is the displacement solution at the next time step. Since the original differential equation is second-order, it makes sense that two consecutive time solutions are needed to move forward, since a minimum of three consecutive time values is needed to estimate a second derivative using finite differences. Newmark-beta uses the initial conditions in a special start-up procedure to obtain u_0 and u_{-1} to get the process started. There are various issues with integrators (including stability, time step selection, and the start-up procedure), but our interest here is in the computational effort.

Notice that, in Eq. 1.75, the coefficient matrices in brackets that involve the time-independent matrices M, B, and K do not change during the integration unless the time step Δt changes. Thus, Eq. 1.75 is an example of a linear equation system of the form Ax = b for which many solutions are needed (one for each right-hand side b) for the same coefficient matrix A, but the right-hand sides are not known in advance. As a result, the solution strategy is to factor the left-hand side coefficient matrix once into LU, save the factors, form the new rhs at each step, and then complete the solution with FBS. The computational effort is therefore one matrix decomposition (LU) for each unique Δt followed by two matrix/vector multiplications (to form the rhs) and one FBS at each step. Since LU costs considerably more than multiplication or FBS, many time steps can be computed for the same effort as one LU. Thus, one should not change the time step size Δt unnecessarily.
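The factor-once, solve-many strategy can be sketched in code. The following is a minimal illustration in Python (not from the original notes): a Doolittle LU factorization without pivoting, followed by FBS for each new right-hand side. It assumes the matrix needs no row interchanges.

```python
# Sketch (not from the notes): factor A = LU once, then reuse the factors
# to solve A x = b for many right-hand sides by forward-backward
# substitution (FBS).  No pivoting, so A must not require row interchanges.

def lu_factor(A):
    """Doolittle LU: L unit lower triangular, U upper triangular."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [row[:] for row in A]
    for k in range(n - 1):                 # k = pivot row
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]          # Gaussian elimination multiplier
            L[i][k] = m
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
    return L, U

def fbs(L, U, b):
    """Forward-backward substitution: solve L y = b, then U x = y."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):                     # forward substitution
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):           # back substitution
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

A = [[2.0, 1.0, 1.0],
     [4.0, -6.0, 0.0],
     [-2.0, 7.0, 2.0]]
L, U = lu_factor(A)                        # one O(n^3/3) factorization
x1 = fbs(L, U, [4.0, -2.0, 7.0])           # each extra rhs costs only O(n^2)
x2 = fbs(L, U, [1.0, 0.0, 0.0])
```

Once L and U are saved, each additional right-hand side costs only the n^2 operations of FBS, which is the point of the transient dynamics example above.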

1.9 LDU Decomposition

Recall that any nonsingular square matrix A can be factored into A = LU, where L is a lower unit triangular matrix, and U is an upper triangular matrix. The diagonal entries in U can be factored out into a separate diagonal matrix D to yield

A = LDU,  (1.76)

where D is a nonsingular diagonal matrix, and U is an upper unit triangular matrix (not the same U as in LU). That is,

[U11 U12 U13; 0 U22 U23; 0 0 U33] = [U11 0 0; 0 U22 0; 0 0 U33][1 U12/U11 U13/U11; 0 1 U23/U22; 0 0 1].  (1.77)

For example, for a 3 x 3 matrix,

A = [2 1 1; 4 -6 0; -2 7 2] = [1 0 0; 2 1 0; -1 -1 1][2 1 1; 0 -8 -2; 0 0 1]  (1.78)
  = [1 0 0; 2 1 0; -1 -1 1][2 0 0; 0 -8 0; 0 0 1][1 1/2 1/2; 0 1 1/4; 0 0 1] = L D Û.  (1.79)

To verify the correctness of this variation of the LU decomposition, let LU = LDÛ, where

U = [U11 U12 U13; 0 U22 U23; 0 0 U33] = [D11 0 0; 0 D22 0; 0 0 D33][1 Û12 Û13; 0 1 Û23; 0 0 1] = DÛ,  (1.80)

where D_ii = U_ii (no sum on i), and Û_ij = U_ij / D_ii (no sum on i).

Note that, if A is symmetric (A^T = A),

A = LDU = (LDU)^T = U^T D L^T,  (1.83)

which implies L = U^T, or U = L^T. Thus, nonsingular symmetric matrices can be factored into

A = LDL^T.  (1.84)
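As a concrete check (a sketch, not part of the original notes), the following splits the U factor of the 3 x 3 example above into D and a unit upper triangular matrix, then verifies that the product L D Û reproduces A.

```python
# Sketch: given L and U from A = LU for the 3x3 example, factor the
# diagonal of U out into D so that A = L D Uhat, with Uhat unit upper
# triangular, and verify the product.

L = [[1.0, 0.0, 0.0],
     [2.0, 1.0, 0.0],
     [-1.0, -1.0, 1.0]]
U = [[2.0, 1.0, 1.0],
     [0.0, -8.0, -2.0],
     [0.0, 0.0, 1.0]]

n = 3
D = [[U[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]
Uhat = [[U[i][j] / U[i][i] for j in range(n)] for i in range(n)]  # unit diagonal

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = matmul(L, matmul(D, Uhat))   # should reproduce the original A
```

Here Uhat works out to [1 1/2 1/2; 0 1 1/4; 0 0 1] and D = diag(2, -8, 1), matching Eq. 1.79.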

1.10 Determinants

Let A be an n x n square matrix. For n = 2, we define the determinant of A as

det [A11 A12; A21 A22] = |A11 A12; A21 A22| = A11 A22 - A12 A21,  (1.85)

where the number of multiply-add operations is 2. For example,

|1 2; 3 4| = 1·4 - 2·3 = -2.  (1.86)

For n = 3, we can expand by cofactors to obtain, for example,

|1 2 3; 4 5 6; 7 8 9| = 1 |5 6; 8 9| - 2 |4 6; 7 9| + 3 |4 5; 7 8| = 0,  (1.87)

where the number of operations is approximately 3·2 = 6. In general, the number of operations to expand an n x n determinant is n times the number of operations to expand an (n-1) x (n-1) determinant. Thus, using the cofactor approach, the evaluation of the determinant of an n x n matrix requires about n! operations (to first order), which is prohibitively expensive for all but the smallest of matrices.

Several properties of determinants are of interest and stated here without proof:

1. A cofactor expansion can be performed along any matrix row or column.
2. det A^T = det A
3. det(AB) = (det A)(det B)
4. det A^{-1} = (det A)^{-1} (a special case of Property 3)
5. The determinant of a triangular matrix is the product of the diagonals.
6. Interchanging two rows or two columns of a matrix changes the sign of the determinant.
7. Multiplying any single row or column by a constant multiplies the determinant by that constant. Consequently, for an n x n matrix A and scalar c, det(cA) = c^n det(A).
8. A matrix with a zero row or column has a zero determinant.
9. A matrix with two identical rows or two identical columns has a zero determinant.
10. If a row (or column) of a matrix is a constant multiple of another row (or column), the determinant is zero.
11. Adding a constant multiple of a row (or column) to another row (or column) leaves the determinant unchanged.

To illustrate the first property using the matrix of Eq. 1.87, we could also write, expanding along the second column,

|1 2 3; 4 5 6; 7 8 9| = -2 |4 6; 7 9| + 5 |1 3; 7 9| - 8 |1 3; 4 6| = 0.  (1.88)

This property is sometimes convenient for matrices with many zeros, e.g.,

|1 0 3; 0 5 0; 7 0 9| = 5 |1 3; 7 9| = -60,  (1.89)

where the expansion is along the second row.

Determinants are rarely of interest in engineering computations or numerical mathematics, but, if needed, can be economically computed using the LU factorization. Given the LU factorization A = LU,

det A = (det L)(det U) = (1 · 1 · 1 ··· 1)(U11 U22 ··· Unn) = U11 U22 ··· Unn.  (1.90)

Thus, given U, the determinant can be computed with an additional n operations. Since computing the LU factorization by Gaussian elimination requires about n^3/3 operations, calculating det A requires about n^3/3 operations, a very small number compared to the n! required by a cofactor expansion, as seen in the table below:

  n     n^3/3      n!
  5     42         120
  10    333        3.6x10^6
  25    5208       1.6x10^25
  60    7.2x10^4   8.3x10^81

Eq. 1.90 also shows that, if Gaussian elimination results in a U matrix with only nonzeros on the diagonal, U is nonsingular, which implies that A is nonsingular, and det A ≠ 0. On the other hand, if U has one or more zeros on the diagonal, U and A are both singular, and det A = 0. Thus, a matrix is nonsingular if, and only if, its determinant is nonzero.

1.11 Multiple Right-Hand Sides and Matrix Inverses

Consider the matrix system Ax = b, where A is an n x n matrix, and both x and b are n x m matrices. That is, both the right-hand side b and the solution x have m columns, each corresponding to an independent right-hand side. Each column of the solution x corresponds to the same column of the right-hand side b, as can be seen if the system is written in block form as

A [x1 x2 x3] = [b1 b2 b3]  (1.91)

or

[Ax1 Ax2 Ax3] = [b1 b2 b3].  (1.92)

Thus, a convenient and economical approach to solve a system with multiple right-hand sides is to factor A into A = LU (an n^3/3 operation), and then apply FBS (an n^2 operation) to each right-hand side.
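The determinant via LU can be sketched in a few lines (an illustration, not from the notes). No pivoting is used here; in a pivoting code, each row interchange would flip the sign of the result.

```python
# Sketch: compute det A as the product U11 U22 ... Unn after Gaussian
# elimination (valid when no row interchanges are needed).

def det_via_lu(A):
    n = len(A)
    U = [row[:] for row in A]          # work on a copy
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
    det = 1.0
    for i in range(n):
        det *= U[i][i]                 # product of the diagonal of U
    return det

print(det_via_lu([[1.0, 2.0], [3.0, 4.0]]))   # prints -2.0, matching Eq. 1.86
```

This is an O(n^3/3) computation, compared with the O(n!) cofactor expansion.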

Unless the number m of right-hand sides is large, additional right-hand sides add little to the overall computational cost of solving the system.

Given a square matrix A, the matrix inverse A^{-1}, if it exists, is the matrix satisfying

A A^{-1} = I = [1 0 0 ··· 0; 0 1 0 ··· 0; 0 0 1 ··· 0; ··· ; 0 0 0 ··· 1].  (1.93)

The calculation of A^{-1} is equivalent to solving a set of equations with coefficient matrix A and n right-hand sides, each one column of the identity matrix. The level of effort required is one LU factorization and n FBSs, that is,

Number of operations ≈ n^3/3 + n(n^2) = (4/3) n^3.  (1.94)

Thus, computing an inverse is only four times more expensive than solving a system with one right-hand side. In fact, if one exploits the special form of the right-hand side (the identity matrix, which has only one nonzero in each column), it turns out that the inverse can be computed in n^3 operations.

1.12 Tridiagonal Systems

Tridiagonal systems arise in a variety of engineering applications, including the finite difference solution of differential equations (e.g., the Crank-Nicolson finite difference method for solving parabolic partial differential equations) and cubic spline interpolation. It is convenient to solve such systems using the following notation:

d1 x1 + u1 x2                = b1
l2 x1 + d2 x2 + u2 x3        = b2
        l3 x2 + d3 x3 + u3 x4 = b3
        ···
        ln x_{n-1} + dn xn    = bn,  (1.95)

where di, ui, and li are, respectively, the diagonal, upper, and lower matrix entries in Row i. All coefficients can now be stored in three one-dimensional arrays, D(·), U(·), and L(·), instead of a full two-dimensional array A(I,J).

Tridiagonal systems are particularly easy (and fast) to solve using Gaussian elimination. The solution algorithm (reduction to upper triangular form by Gaussian elimination followed by backsolving) can now be summarized as follows:

1. For k = 1, 2, ..., n - 1:  [k = pivot row]
     m = -l_{k+1}/d_k  [m = multiplier needed to annihilate term below]
     d_{k+1} = d_{k+1} + m u_k  [new diagonal entry in next row]
     b_{k+1} = b_{k+1} + m b_k  [new rhs in next row]
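A sketch of the complete tridiagonal solver (the elimination loop followed by the backsolve) in Python; the array names l, d, u mirror the notation of Eq. 1.95. This is an illustration, not code from the notes.

```python
# Sketch of the tridiagonal solver: Gaussian elimination on the three
# arrays l, d, u, then backsolve.  Index i corresponds to row i+1 in the
# text; l[0] is unused.

def solve_tridiagonal(l, d, u, b):
    n = len(d)
    d = d[:]; b = b[:]                     # work on copies
    for k in range(n - 1):                 # k = pivot row
        m = -l[k + 1] / d[k]               # multiplier annihilating l[k+1]
        d[k + 1] += m * u[k]               # new diagonal entry in next row
        b[k + 1] += m * b[k]               # new rhs in next row
    x = [0.0] * n
    x[n - 1] = b[n - 1] / d[n - 1]         # start of backsolve
    for k in range(n - 2, -1, -1):         # backsolve loop
        x[k] = (b[k] - u[k] * x[k + 1]) / d[k]
    return x

# 3x3 example: matrix [2 1 0; 1 4 1; 0 1 3] with rhs (1, 0, 0)
x = solve_tridiagonal([0.0, 1.0, 1.0], [2.0, 4.0, 3.0], [1.0, 1.0],
                      [1.0, 0.0, 0.0])
```

The exact solution of this small example is (11/19, -3/19, 1/19), and the whole computation is O(n) rather than O(n^3/3).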

2. x_n = b_n / d_n  [start of backsolve]
3. For k = n - 1, n - 2, ..., 1:  [backsolve loop]
     x_k = (b_k - u_k x_{k+1}) / d_k

To first order, this algorithm requires 5n operations.

1.13 Iterative Methods

Gaussian elimination is an example of a direct method, for which the number of operations required to effect a solution is predictable and fixed. Some systems of equations have characteristics that can be exploited by iterative methods, where the number of operations is not predictable in advance. In general, systems with large diagonal entries will converge rapidly with iterative methods.

Consider the linear system Ax = b. In Jacobi's method, one starts by assuming a solution x^(0) and then computing a new approximation to the solution using the equations

x_1^(1) = ( b_1 - A_12 x_2^(0) - A_13 x_3^(0) - ··· - A_1n x_n^(0) ) / A_11
x_2^(1) = ( b_2 - A_21 x_1^(0) - A_23 x_3^(0) - ··· - A_2n x_n^(0) ) / A_22
  ···
x_n^(1) = ( b_n - A_n1 x_1^(0) - A_n2 x_2^(0) - ··· - A_{n,n-1} x_{n-1}^(0) ) / A_nn.  (1.96)

In general,

x_i^(k) = ( b_i - A_i1 x_1^(k-1) - ··· - A_{i,i-1} x_{i-1}^(k-1) - A_{i,i+1} x_{i+1}^(k-1) - ··· - A_in x_n^(k-1) ) / A_ii,  i = 1, 2, ..., n,  (1.97)

or

x_i^(k) = ( b_i - Σ_{j≠i} A_ij x_j^(k-1) ) / A_ii,  i = 1, 2, ..., n.  (1.98)

In these equations, superscripts refer to the iteration number. If the vector x does not change much from one iteration to the next, the process has converged, and we have a solution. Notice that, in this algorithm, it is not possible to update the vector x in place; two consecutive approximations to x are needed at one time so that convergence can be checked. Jacobi's method is also called iteration by total steps and the method of simultaneous displacements, since the order in which the equations are processed is irrelevant, and the updates could in principle be done simultaneously. Thus, Jacobi's method is nicely suited to parallel computation.

The Jacobi algorithm can be summarized as follows:

1. Input n, A, b, Nmax, x  [x = initial guess]
2. k = 0

3. k = k + 1  [k = iteration number]
4. For i = 1, 2, ..., n:
     y_i = ( b_i - A_i1 x_1 - A_i2 x_2 - ··· - A_{i,i-1} x_{i-1} - A_{i,i+1} x_{i+1} - ··· - A_in x_n ) / A_ii
5. If converged (compare y with x), output y and exit; otherwise ...
6. x = y  [save latest iterate in x]
7. If k < Nmax, go to Step 3; otherwise ...
8. No convergence; error termination.

In the above algorithm, the old iterates at each step are called x, and the new ones are called y.

For example, consider the system Ax = b, where

A = [2 1 0; 1 4 1; 0 1 3],  b = {1; 0; 0}.  (1.99)

This system can be written in the form

x1 = (1 - x2)/2
x2 = (-x1 - x3)/4
x3 = -x2/3.  (1.100)

As a starting vector, we use

x^(0) = {0; 0; 0}.  (1.101)

We summarize the iterations in a table:

  k    x1       x2        x3
  0    0        0         0
  1    0.5000   0         0
  2    0.5000   -0.1250   0
  3    0.5625   -0.1250   0.0417
  4    0.5625   -0.1510   0.0417
  5    0.5755   -0.1510   0.0503
  6    0.5755   -0.1564   0.0503
  7    0.5782   -0.1564   0.0521
  8    0.5782   -0.1576   0.0521
  9    0.5788   -0.1576   0.0525
  10   0.5788   -0.1578   0.0525
  11   0.5789   -0.1578   0.0526
  12   0.5789   -0.1579   0.0526
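The iterations in the table can be reproduced with a short sketch (not from the notes; the convergence test is omitted and a fixed number of iterations is run instead):

```python
# Sketch of Jacobi's method: every component of the new iterate y is
# computed from the OLD iterate x, so the updates could run in parallel.

def jacobi(A, b, x0, n_iter):
    n = len(b)
    x = x0[:]
    for _ in range(n_iter):
        y = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
        x = y                               # save latest iterate
    return x

A = [[2.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 3.0]]
b = [1.0, 0.0, 0.0]
x = jacobi(A, b, [0.0, 0.0, 0.0], 12)
# approaches the exact solution (11/19, -3/19, 1/19)
```

Twelve iterations reproduce the last row of the table to the precision shown.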

For the precision displayed, this iteration has converged.

Notice that, in Jacobi's method, when x3 is computed, we already know x1^(k) and x2^(k). Why not use the new values rather than x1^(k-1) and x2^(k-1)? The iterative algorithm which uses at each step the "improved" values if they are available is called the Gauss-Seidel method. In general,

x_i^(k) = ( b_i - A_i1 x_1^(k) - A_i2 x_2^(k) - ··· - A_{i,i-1} x_{i-1}^(k) - A_{i,i+1} x_{i+1}^(k-1) - ··· - A_in x_n^(k-1) ) / A_ii,  i = 1, 2, ..., n.  (1.102)

This formula is used instead of Eq. 1.97.

The Gauss-Seidel algorithm can be summarized as follows:

1. Input n, A, b, Nmax, x  [x = initial guess]
2. k = 0
3. k = k + 1  [k = iteration number]
4. y = x  [save previous iterate in y]
5. For i = 1, 2, ..., n:
     x_i = ( b_i - A_i1 x_1 - A_i2 x_2 - ··· - A_{i,i-1} x_{i-1} - A_{i,i+1} x_{i+1} - ··· - A_in x_n ) / A_ii
6. If converged (compare x with y), output x and exit; otherwise ...
7. If k < Nmax, go to Step 3; otherwise ...
8. No convergence; error termination.

We return to the example used to illustrate Jacobi's method:

A = [2 1 0; 1 4 1; 0 1 3],  b = {1; 0; 0},  (1.103)

for which the system Ax = b can again be written in the form

x1 = (1 - x2)/2
x2 = (-x1 - x3)/4
x3 = -x2/3,  (1.104)

where we again use the starting vector

x^(0) = {0; 0; 0}.  (1.105)

We summarize the Gauss-Seidel iterations in a table:

  k    x1       x2        x3
  0    0        0         0
  1    0.5000   -0.1250   0.0417
  2    0.5625   -0.1510   0.0503
  3    0.5755   -0.1564   0.0521
  4    0.5782   -0.1576   0.0525
  5    0.5788   -0.1578   0.0526
  6    0.5789   -0.1579   0.0526

For this example, Gauss-Seidel converges in about half as many iterations as Jacobi.

In Gauss-Seidel iteration, each iterate depends upon the order in which the equations are processed, so that the Gauss-Seidel method is sometimes called the method of successive displacements to indicate the dependence of the iterates on the ordering. If the ordering of the equations is changed, the components of the new iterate will also change. Also, the updates cannot be done simultaneously as in the Jacobi method, since each component of the new iterate depends upon all previously computed components. It also turns out that the rate of convergence depends on the ordering.

Generally, if the Jacobi method converges, the Gauss-Seidel method will converge faster than the Jacobi method. However, this statement is not always true. Consider, for example, the system Ax = b, where

A = [1 2 -2; 1 1 1; 2 2 1],  b = {1; 1; 1}.  (1.106)

The solution of this system is (-3, 3, 1). For this system, Jacobi converges, and Gauss-Seidel diverges. On the other hand, consider the system Ax = b with

A = [5 4 3; 4 7 4; 3 4 4],  b = {12; 15; 11},  (1.107)

whose solution is (1, 1, 1). For this system, Jacobi diverges, and Gauss-Seidel converges.

We now return to the question of convergence of iterative methods. In general, for iterative methods such as Jacobi or Gauss-Seidel, the stopping criterion is either that the absolute change from one iteration to the next is small or that the relative change from one iteration to the next is small. That is, either

max_i | x_i^(k) - x_i^(k-1) | < ε  (1.108)

or

max_i | x_i^(k) - x_i^(k-1) | < ε max_i | x_i^(k-1) |,  (1.109)

where ε is a small number chosen by the analyst.

Consider Jacobi's method of solution for the system Ax = b. Define M as the iteration matrix for Jacobi's method; i.e., M is defined as the matrix such that

x^(k) = M x^(k-1) + d,  (1.110)
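A minimal sketch of Gauss-Seidel (not from the notes): the only change from Jacobi is that each new component is used as soon as it is computed, i.e., the vector x is updated in place.

```python
# Sketch of the Gauss-Seidel method: x is overwritten component by
# component, so later rows in the same sweep already use improved values.

def gauss_seidel(A, b, x0, n_iter):
    n = len(b)
    x = x0[:]
    for _ in range(n_iter):
        for i in range(n):
            x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
    return x

A = [[2.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 3.0]]
b = [1.0, 0.0, 0.0]
x = gauss_seidel(A, b, [0.0, 0.0, 0.0], 6)
# six sweeps already agree with (11/19, -3/19, 1/19) to about four digits
```

Compare with the twelve Jacobi iterations needed in the earlier table for the same accuracy.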

where

M = [ 0            -A12/A11   -A13/A11   ···   -A1n/A11 ;
      -A21/A22     0          -A23/A22   ···   -A2n/A22 ;
      ···
      -An1/Ann     -An2/Ann   ···   -A_{n,n-1}/Ann   0 ]  (1.111)

and

d_i = b_i / A_ii.  (1.112)

By definition (Eq. 1.110), the exact solution x (unknown) satisfies

Mx + d = x.

Define the error at the kth iteration as

e^(k) = x^(k) - x.  (1.113)

Thus,

M e^(k) = M x^(k) - M x = ( x^(k+1) - d ) - ( x - d ) = x^(k+1) - x = e^(k+1).  (1.114)

That is, if the iteration matrix M acts on the error e^(k), e^(k+1) results. Thus, to have the error decrease at each iteration, we conclude that M should be small in some sense. The last equation can be written in terms of components as

e_i^(k+1) = - Σ_{j≠i} (A_ij / A_ii) e_j^(k),  (1.115)

or, with absolute values,

| e_i^(k+1) | = | Σ_{j≠i} (A_ij / A_ii) e_j^(k) |.  (1.116)

Since, for any numbers c_i,

| Σ_i c_i | ≤ Σ_i | c_i |,  (1.117)

it follows that

| e_i^(k+1) | ≤ Σ_{j≠i} | A_ij / A_ii | | e_j^(k) | ≤ ( Σ_{j≠i} | A_ij / A_ii | ) e_max^(k) = α_i e_max^(k),  (1.118)

where we define

α_i = Σ_{j≠i} | A_ij | / | A_ii | = (1 / | A_ii |) Σ_{j≠i} | A_ij |.  (1.119)

Thus,

e_max^(k+1) ≤ α_max e_max^(k),  (1.120)

where α_max is the largest of the α_i. If α_max < 1, the error decreases with each iteration. The condition α_max < 1 implies

| A_ii | > Σ_{j≠i} | A_ij |,  i = 1, 2, ..., n.  (1.121)

A matrix which satisfies this condition is said to be strictly diagonally dominant. Therefore, a sufficient condition for Jacobi's method to converge is that the coefficient matrix A be strictly diagonally dominant. Notice that the starting vector does not matter, since it did not enter into this derivation. Note that strict diagonal dominance is a sufficient condition, but not necessary, as seen in the examples of Eqs. 1.106 and 1.107. These sufficient conditions are weak in the sense that the iteration might converge even for systems not diagonally dominant. It turns out that strict diagonal dominance is also a sufficient condition for the Gauss-Seidel method to converge. As a practical matter, rapid convergence is desired, and rapid convergence can be assured only if the main diagonal coefficients are large compared to the off-diagonal coefficients.

2 Vector Spaces

A vector space is defined as a set of objects, called vectors, which can be combined using addition and multiplication under the following eight rules:

1. Addition is commutative: a + b = b + a.
2. Addition is associative: (a + b) + c = a + (b + c).
3. There exists a unique zero vector 0 such that a + 0 = a.
4. There exists a unique negative vector such that a + (-a) = 0.
5. 1 a = a.
6. Multiplication is associative: α(βa) = (αβ)a for scalars α, β.
7. Multiplication is distributive with respect to scalar addition: (α + β)a = αa + βa.
8. Multiplication is distributive with respect to vector addition: α(a + b) = αa + αb.

Examples of vector spaces include the following:

1. The space of n-dimensional vectors: a = (a1, a2, ..., an).
2. The space of m x n matrices.
3. The space of functions f(x).

Any set of objects that obeys the above eight rules is a vector space.

A subspace is a subset of a vector space which is closed under addition and scalar multiplication. The phrase "closed under addition and scalar multiplication" means that (1) if two vectors a and b are in the subspace, their sum a + b also lies in the subspace, and (2) if a vector a is in the subspace, the scalar multiple αa lies in the subspace. For example, a two-dimensional plane in 3-D is a subspace. Polynomials of degree ≤ 2 are also a subspace. However, polynomials of degree 2 are not a subspace, since, for example, we can add two polynomials of degree 2,

(-x^2 + x + 1) + (x^2 + x + 1) = 2x + 2,  (2.1)

and obtain a result which is a polynomial of degree 1.

2.1 Rectangular Systems of Equations

For square systems of equations Ax = b, there is one solution (if A^{-1} exists) or zero or infinite solutions (if A^{-1} does not exist). The situation with rectangular systems is a little different. For example, consider the 3 x 2 system

[1 0; 5 4; 2 4] {u; v} = {b1; b2; b3}.  (2.2)

Since this system has three equations and two unknowns, it may not have a solution. An alternative way to write this system is

u {1; 5; 2} + v {0; 4; 4} = {b1; b2; b3},  (2.3)

where u and v are scalar multipliers of the two vectors. Thus, the system has a solution if, and only if, the right-hand side b can be written as a linear combination of the two columns. Geometrically, the system has a solution if, and only if, b lies in the plane formed by the two vectors.

The column space of a matrix A consists of all combinations of the columns of A. That is, the system has a solution if, and only if, b lies in the column space of A. For the 3 x 2 matrix above, the column space is the plane in 3-D formed by the two vectors (1, 5, 2) and (0, 4, 4). We note that the column space of an m x n matrix A is a subspace of m-dimensional space Rm. The null space of a matrix A consists of all vectors x such that Ax = 0.

For example, consider the 3 x 4 system

[1 3 3 2; 2 6 9 5; -1 -3 3 0] {x1; x2; x3; x4} = {b1; b2; b3}.  (2.4)

We can simplify this system using elementary row operations to obtain

[1 3 3 2 | b1; 2 6 9 5 | b2; -1 -3 3 0 | b3] → [1 3 3 2 | b1; 0 0 3 1 | b2 - 2b1; 0 0 6 2 | b1 + b3]  (2.5)
→ [1 3 3 2 | b1; 0 0 3 1 | b2 - 2b1; 0 0 0 0 | b3 - 2b2 + 5b1],  (2.6)

which implies that the solution exists if, and only if,

b3 - 2b2 + 5b1 = 0.  (2.7)

Thus, for this system, there may be no solution, even though there are more unknowns than equations.

Now assume that a solution of Eq. 2.4 exists. For example, let

b = {1; 5; 5},  (2.8)

in which case the row-reduced system is

[1 3 3 2 | 1; 0 0 3 1 | 3; 0 0 0 0 | 0],  (2.9)

which implies, from the second equation,

3x3 + x4 = 3.  (2.10)

One of the two unknowns (x4, say) can be picked arbitrarily, in which case

x3 = 1 - x4/3.  (2.11)

From the first equation in Eq. 2.9,

x1 = 1 - 3x2 - 3x3 - 2x4 = 1 - 3x2 - 3(1 - x4/3) - 2x4 = -2 - 3x2 - x4.  (2.12)

Thus, the solution can be written in the form

x = {x1; x2; x3; x4} = {-2; 0; 1; 0} + x2 {-3; 1; 0; 0} + x4 {-1; 0; -1/3; 1}.  (2.13)

This system thus has a doubly infinite set of solutions, i.e., there are two free parameters.
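The doubly infinite solution family of Eq. 2.13 can be verified numerically. The following sketch (not from the notes) checks several choices of the free parameters x2 and x4:

```python
# Sketch: verify that the particular solution plus any combination of the
# two homogeneous solutions satisfies the 3x4 system with b = (1, 5, 5).

A = [[1.0, 3.0, 3.0, 2.0],
     [2.0, 6.0, 9.0, 5.0],
     [-1.0, -3.0, 3.0, 0.0]]
b = [1.0, 5.0, 5.0]

xp = [-2.0, 0.0, 1.0, 0.0]              # particular solution
h1 = [-3.0, 1.0, 0.0, 0.0]              # homogeneous solution (x2 free)
h2 = [-1.0, 0.0, -1.0 / 3.0, 1.0]       # homogeneous solution (x4 free)

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def combo(x2, x4):
    return [p + x2 * a + x4 * c for p, a, c in zip(xp, h1, h2)]

for x2, x4 in [(0.0, 0.0), (2.0, -3.0), (1.5, 7.0)]:
    x = combo(x2, x4)
    assert all(abs(v - w) < 1e-9 for v, w in zip(matvec(A, x), b))
```

Every choice of the two free parameters gives another valid solution, which is exactly the statement x = xp + xh of the next paragraph's decomposition.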

Now consider the corresponding homogeneous system (with zero right-hand side), Axh = 0, for which the row-reduced system is

[1 3 3 2 | 0; 0 0 3 1 | 0; 0 0 0 0 | 0],  (2.14)

which implies

3x3 = -x4  (2.15)

and

x1 = -3x2 - 3x3 - 2x4 = -3x2 - x4.  (2.16)

Thus, the solution of the homogeneous system can be written

xh = x2 {-3; 1; 0; 0} + x4 {-1; 0; -1/3; 1}.  (2.17)

Therefore, from Eqs. 2.13 and 2.17, the solution of the nonhomogeneous system Ax = b can be written as the sum of a particular solution xp and the solution of the corresponding homogeneous problem Axh = 0:

x = xp + xh.  (2.18)

Alternatively, given a solution x, any solution of the corresponding homogeneous system can be added to get another solution, since

Ax = A(xp + xh) = Axp + Axh = b + 0 = b.  (2.19)

The final row-reduced matrix is said to be in echelon form if it looks like this:

[ x x x x x x x x ;
  0 0 x x x x x x ;
  0 0 0 0 x x x x ;
  0 0 0 0 0 0 0 0 ],  (2.20)

where nonzeros are denoted with the letter "x". The echelon form of a matrix has the following characteristics:

1. The nonzero rows appear first.
2. The pivots are the first nonzeros in each row.
3. Only zeros appear below each pivot.
4. Each pivot appears to the right of the pivot in the row above.

The number of nonzero rows in the row-reduced (echelon) matrix is referred to as the rank of the system. Thus, matrix rank is the number of independent rows in a matrix.

2.2 Linear Independence

The vectors v1, v2, ..., vk are linearly independent if the linear combination

c1 v1 + c2 v2 + ··· + ck vk = 0  (2.21)

implies c1 = c2 = ··· = ck = 0. Alternatively, if the coefficients ci are not all zero, then one vector can be written as a linear combination of the others. For the special case of two vectors, the vectors (of any dimension) are linearly dependent if one is a multiple of the other, i.e., v1 = c v2; geometrically, they are parallel. Three vectors (of any dimension) are linearly dependent if they are coplanar.

To determine the independence of vectors, we can arrange the vectors in the columns of a matrix and determine the null space of the matrix. For example, let us determine if the three vectors

v1 = {3; 3; 3},  v2 = {4; 5; 4},  v3 = {2; 7; 4}  (2.22)

are independent. Note that Eq. 2.21 implies

[v1 v2 v3] {c1; c2; c3} = 0  (2.23)

or

[3 4 2; 3 5 7; 3 4 4] {c1; c2; c3} = {0; 0; 0},  (2.24)

so that v1, v2, v3 are independent if c1 = c2 = c3 = 0 is the only solution of this system. We row-reduce to determine the null space of the matrix:

[3 4 2 | 0; 3 5 7 | 0; 3 4 4 | 0] → [3 4 2 | 0; 0 1 5 | 0; 0 0 2 | 0].  (2.25)

This system implies c1 = c2 = c3 = 0, since an upper triangular system with a nonzero diagonal is nonsingular. Thus, the three given vectors are independent. The previous example also shows that the columns of a matrix are linearly independent if, and only if, the null space of the matrix is the zero vector.

For example, consider the echelon matrix

[1 3 3 2; 0 0 3 1; 0 0 0 0],

where the pivots are the first nonzeros (1 and 3) in the nonzero rows. Let r denote the rank of a matrix (the number of independent rows). We observe that the number of independent columns is also r, since each column of an echelon matrix with a pivot has a nonzero component in a new location. Column 1 has one nonzero component. Column 2 is not independent, since it is proportional to Column 1. Column 3 has two nonzero components, so Column 3 cannot be dependent on the first two columns. Column 4 also has two nonzero components.
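Independence can be checked mechanically by row reduction. The sketch below (an illustration, not from the notes) counts pivots, i.e., computes the rank, searching each column for a nonzero pivot:

```python
# Sketch: compute the rank of a matrix by Gaussian elimination and
# pivot counting; three independent columns means rank 3.

def rank(A, tol=1e-12):
    A = [row[:] for row in A]
    m, n = len(A), len(A[0])
    r = 0
    for col in range(n):
        piv = next((i for i in range(r, m) if abs(A[i][col]) > tol), None)
        if piv is None:
            continue                      # no pivot in this column
        A[r], A[piv] = A[piv], A[r]       # move pivot row up
        for i in range(r + 1, m):
            f = A[i][col] / A[r][col]
            for j in range(col, n):
                A[i][j] -= f * A[r][j]
        r += 1
    return r

M = [[3.0, 4.0, 2.0],
     [3.0, 5.0, 7.0],
     [3.0, 4.0, 4.0]]    # columns are v1, v2, v3
print(rank(M))           # prints 3, so the vectors are independent
```

A rank less than the number of columns would indicate a nontrivial null space, i.e., a dependence among the columns.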

However, Column 4 can be written uniquely as a linear combination of Columns 1 and 3 by picking first the Column 3 multiplier (1/3) followed by the Column 1 multiplier.

2.3 Spanning a Subspace

If every vector v in a vector space can be expressed as a linear combination

v = c1 w1 + c2 w2 + ··· + ck wk  (2.26)

for some coefficients ci, then the vectors w1, w2, ..., wk are said to span the space. For example, in 2-D, the two vectors (1,0) and (0,1) span the xy-plane (Fig. 2). The two vectors (1,0) and (1,1) also span the xy-plane, as do the three vectors (1,0), (0,1), and (1,1), although only two of the three vectors are needed.

[Figure 2: Some Vectors That Span the xy-Plane.]

A basis for a vector space is a set of vectors which is linearly independent and spans the space. The requirement for linear independence means that there can be no extra vectors. Thus, the two vectors (1,0) and (0,1) form a basis for R2, and the two vectors (1,0) and (1,1) also form a basis for R2. However, the three vectors (1,0), (0,1), and (1,1) do not form a basis for R2, since one of the three vectors is linearly dependent on the other two. The number of spanning vectors need not be minimal, but all bases for a vector space V contain the same number of vectors; that number is referred to as the dimension of V.

2.4 Row Space and Null Space

The row space of a matrix A consists of all possible combinations of the rows of A. The column space of a matrix is the space spanned by the columns of the matrix; this definition is equivalent to that given earlier (all possible combinations of the columns).

Consider again the 3 x 4 example used in the discussion of rectangular systems:

A = [1 3 3 2; 2 6 9 5; -1 -3 3 0] → [1 3 3 2; 0 0 3 1; 0 0 6 2] → [1 3 3 2; 0 0 3 1; 0 0 0 0] = U,  (2.27)

where the pivots are the leading nonzeros in the nonzero rows of U. Since the rows of the final matrix U are simply linear combinations of the rows of A, elementary row operations leave the row spaces of both A and U the same. Again we see that the number of pivots is equal to the rank r, so A has two independent rows, i.e., rank(A) = 2, and the number of independent columns is also r. Thus we see that, for a matrix, the row rank is equal to the column rank.

This property (row rank equal to column rank) will be useful in the discussions of pseudoinverses and least squares. That is, for any matrix,

Row rank = Column rank.  (2.28)

We recall that the null space of a matrix A consists of all vectors x such that Ax = 0. With elementary row operations, A can be transformed into echelon form:

Ax = 0 → Ux = 0.  (2.29)

For the example of the preceding section,

U = [1 3 3 2; 0 0 3 1; 0 0 0 0],  (2.30)

which implies

x1 + 3x2 + 3x3 + 2x4 = 0  (2.31)

and

3x3 + x4 = 0.  (2.32)

With four unknowns and two independent equations, the number of free variables (the dimension of the null space) is n - r, where r is the matrix rank. Notice that we could, for example, choose x4 arbitrarily in Eq. 2.32 and x2 arbitrarily in Eq. 2.31.

We thus summarize the dimensions of the various spaces for an m x n matrix A (m rows, n columns), i.e., for a rectangular system with m equations and n unknowns, in the following table:

  Space    Dimension    Subspace of
  Row      r            Rn
  Column   r            Rm
  Null     n - r        Rn

Note that

dim(row space) + dim(null space) = n = number of columns.  (2.33)

2.5 Pseudoinverses

Rectangular m x n matrices cannot be inverted in the usual sense. However, for any matrix A, both AA^T and A^T A are square and symmetric. In general, one can compute a pseudoinverse by considering either AA^T or A^T A, whichever (it turns out) is smaller. We consider two cases: m < n and m > n.

For m < n, consider AA^T, which is a square m x m matrix. If AA^T is invertible,

AA^T (AA^T)^{-1} = I  (2.34)

or

A [ A^T (AA^T)^{-1} ] = I,  (2.35)

where the matrix C = A^T (AA^T)^{-1} in brackets is referred to as the pseudo right inverse. For the second case, m > n, consider A^T A, which is an n x n matrix. If A^T A is invertible,

(A^T A)^{-1} (A^T A) = I  (2.36)

or

[ (A^T A)^{-1} A^T ] A = I,  (2.37)

where the matrix B = (A^T A)^{-1} A^T in brackets is referred to as the pseudo left inverse. The reason for choosing the order of the two inverses given in Eq. 2.34 is that this order allows the matrix A to be isolated in the next equation.

For example, consider the 2 x 3 matrix

A = [4 0 0; 0 5 0],  (2.38)

in which case

AA^T = [4 0 0; 0 5 0][4 0; 0 5; 0 0] = [16 0; 0 25],  (2.39)

and the pseudo right inverse is

C = A^T (AA^T)^{-1} = [4 0; 0 5; 0 0][1/16 0; 0 1/25] = [1/4 0; 0 1/5; 0 0].  (2.40)

We check the correctness of this result by noting that

AC = [4 0 0; 0 5 0][1/4 0; 0 1/5; 0 0] = [1 0; 0 1] = I.  (2.41)

One of the most important properties associated with pseudoinverses is that, for any matrix A,

rank(A^T A) = rank(AA^T) = rank(A).  (2.42)

To show that rank(A^T A) = rank(A), we first recall that, for any matrix A,

dim(row space) + dim(null space) = number of columns,  (2.43)

or

r + dim(null space) = n.  (2.44)
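The 2 x 3 example can be checked directly. In this sketch (not from the notes), the 2 x 2 inverse is written out explicitly rather than computed by a library routine:

```python
# Sketch: pseudo right inverse C = A^T (A A^T)^{-1} for the 2x3 example,
# verifying that A C = I.

A = [[4.0, 0.0, 0.0],
     [0.0, 5.0, 0.0]]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

AAT = matmul(A, transpose(A))          # [[16, 0], [0, 25]]
C = matmul(transpose(A), inv2(AAT))    # pseudo right inverse
I2 = matmul(A, C)                      # should be the 2x2 identity
```

Note that C A is not the 3 x 3 identity; the pseudo right inverse only guarantees A C = I.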

Since A^T A and A have the same number of columns (n), it suffices to show that A^T A and A have the same null space. That is, we want to prove that

Ax = 0  ⇐⇒  A^T Ax = 0.  (2.46)

The forward direction (=⇒) is clear by inspection. For the other direction (⇐=), consider A^T Ax = 0. If we take the dot product of both sides with x,

0 = x^T A^T Ax = (Ax)^T (Ax) = |Ax|^2,  (2.47)

which implies (since a vector of zero length must be the zero vector) Ax = 0. Thus, since A^T A and A have both the same null space and the same number of columns (n), the dimension r of the row space of the two matrices is the same, and the first part of Eq. 2.42 is proved. For the second part, with A^T = B,

rank(AA^T) = rank(B^T B) = rank(B) = rank(B^T) = rank(A),  (2.48)

and Eq. 2.42 is proved.

The implication of Eq. 2.42 is that, if rank(A) = n (i.e., A's columns are independent), then A^T A is invertible. It will also be shown in §4 that, for the overdetermined system Ax = b (with m > n), the pseudoinverse solution using A^T A is equivalent to the least squares solution.

2.6 Linear Transformations

If A is an m x n matrix, and x is an n-vector, the matrix product Ax can be thought of as a transformation of x into another vector x', where x' is an m x 1 vector:

Ax = x'.  (2.51)

We consider some examples:

1. Stretching

Let

A = [c 0; 0 c] = cI,  (2.52)

in which case

Ax = [c 0; 0 c] {x1; x2} = {c x1; c x2} = c {x1; x2}.  (2.53)

2. 90° Rotation (Fig. 3)

Let

A = [0 -1; 1 0],  (2.54)

in which case

Ax = [0 -1; 1 0] {x1; x2} = {-x2; x1}.  (2.55)

[Figure 3: 90° Rotation.]

3. Reflection in 45° Line (Fig. 4)

Let

A = [0 1; 1 0],  (2.56)

in which case

Ax = [0 1; 1 0] {x1; x2} = {x2; x1}.  (2.57)

[Figure 4: Reflection in 45° Line.]

4. Projection Onto Horizontal Axis (Fig. 5)

Let

A = [1 0; 0 0],  (2.58)

in which case

Ax = [1 0; 0 0] {x1; x2} = {x1; 0}.  (2.59)

[Figure 5: Projection Onto Horizontal Axis.]
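These transformations are easy to experiment with numerically. A small sketch (not from the notes) applies three of them to a sample vector:

```python
# Sketch: apply the rotation, reflection, and projection matrices of
# Examples 2-4 to the vector (3, 1).

def apply(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

rotate90  = [[0, -1], [1, 0]]    # 90-degree rotation
reflect45 = [[0, 1], [1, 0]]     # reflection in the 45-degree line
project_x = [[1, 0], [0, 0]]     # projection onto the horizontal axis

x = [3, 1]
print(apply(rotate90, x))        # prints [-1, 3]
print(apply(reflect45, x))       # prints [1, 3]
print(apply(project_x, x))       # prints [3, 0]
```

Sketching the three output vectors against (3, 1) reproduces the geometric pictures in Figs. 3 to 5.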

Note that transformations which can be represented by matrices are linear transformations, since

    A(cx + dy) = c(Ax) + d(Ay),                                                 (2.60)

where x and y are vectors, and c and d are scalars. This property implies that knowing how A transforms each vector in a basis determines how A transforms every vector in the entire space, since the vectors in the space are linear combinations of the basis vectors:

    A(c1 x1 + c2 x2 + ··· + cn xn) = c1 (Ax1) + c2 (Ax2) + ··· + cn (Axn),      (2.61)

where xi is the ith basis vector, and ci is the ith scalar multiplier.

We now continue with the examples of linear transformations:

5. Differentiation of Polynomials

Consider the polynomial of degree 4

    p(t) = a0 + a1 t + a2 t² + a3 t³ + a4 t⁴.                                   (2.62)

The basis vectors are

    p1 = 1,  p2 = t,  p3 = t²,  p4 = t³,  p5 = t⁴,                              (2.63)

so that, in terms of the basis vectors,

    p(t) = a0 p1 + a1 p2 + a2 p3 + a3 p4 + a4 p5.                               (2.64)

The coefficients ai become the components of the vector p. If we differentiate the polynomial, we obtain

    p′(t) = a1 + 2a2 t + 3a3 t² + 4a4 t³ = a1 p1 + 2a2 p2 + 3a3 p3 + 4a4 p4.    (2.65)

Differentiation of polynomials of degree 4 can thus be written in matrix form as

    [ 0 1 0 0 0 ] { a0 }   {  a1 }
    [ 0 0 2 0 0 ] { a1 }   { 2a2 }
    [ 0 0 0 3 0 ] { a2 } = { 3a3 }
    [ 0 0 0 0 4 ] { a3 }   { 4a4 },                                             (2.66)
                  { a4 }

so that the matrix which represents differentiation of polynomials of degree 4 is

    AD = [ 0 1 0 0 0 ]
         [ 0 0 2 0 0 ]
         [ 0 0 0 3 0 ]
         [ 0 0 0 0 4 ],                                                         (2.67)

where the subscript "D" denotes differentiation. Note that this matrix has been written as a rectangular 4 × 5 matrix, since differentiation reduces the degree of the polynomial by unity.
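The matrix view of differentiation is easy to try out. The sketch below (NumPy; code and sample polynomial are mine, not from the notes) applies the 4 × 5 differentiation matrix above to a coefficient vector:

```python
import numpy as np

# Rectangular 4 x 5 matrix representing d/dt on quartic polynomials
AD = np.array([[0, 1, 0, 0, 0],
               [0, 0, 2, 0, 0],
               [0, 0, 0, 3, 0],
               [0, 0, 0, 0, 4]], dtype=float)

# p(t) = 1 + 2t + 3t^2 + 4t^3 + 5t^4, stored as coefficients (a0, ..., a4)
p = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

dp = AD @ p
print(dp)  # [ 2.  6. 12. 20.], i.e., p'(t) = 2 + 6t + 12t^2 + 20t^3
```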

6. Integration of Polynomials

Let

    p(t) = a0 + a1 t + a2 t² + a3 t³,                                           (2.68)

which can be integrated to obtain

    ∫₀ᵗ p(τ) dτ = a0 t + (a1/2) t² + (a2/3) t³ + (a3/4) t⁴
                = a0 p2 + (a1/2) p3 + (a2/3) p4 + (a3/4) p5.                    (2.69)

Thus, integration of polynomials of degree 3 can be written in matrix form as

    [ 0   0   0   0  ] { a0 }   {  0   }
    [ 1   0   0   0  ] { a1 }   {  a0  }
    [ 0  1/2  0   0  ] { a2 } = { a1/2 }
    [ 0   0  1/3  0  ] { a3 }   { a2/3 }
    [ 0   0   0  1/4 ]          { a3/4 },                                       (2.70)

so that the matrix which represents integration of polynomials of degree 3 is

    AI = [ 0   0   0   0  ]
         [ 1   0   0   0  ]
         [ 0  1/2  0   0  ]
         [ 0   0  1/3  0  ]
         [ 0   0   0  1/4 ],                                                    (2.71)

where the subscript "I" denotes integration.

We note that integration followed by differentiation is an inverse operation which restores the vector to its original:

    AD AI = [ 1 0 0 0 ]
            [ 0 1 0 0 ]
            [ 0 0 1 0 ] = I.                                                    (2.72)
            [ 0 0 0 1 ]

Thus, AD is the pseudo left inverse of AI. On the other hand, performing the operations in reverse order (differentiation followed by integration) does not restore the vector to its original:

    AI AD = [ 0 0 0 0 0 ]
            [ 0 1 0 0 0 ]
            [ 0 0 1 0 0 ] ≠ I.                                                  (2.73)
            [ 0 0 0 1 0 ]
            [ 0 0 0 0 1 ]

Integration after differentiation does not restore the original constant term.
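Both products can be verified directly; the following sketch (NumPy, illustrative only) forms the two matrices and checks that one order gives the identity while the other loses the constant term:

```python
import numpy as np

AD = np.array([[0, 1, 0, 0, 0],
               [0, 0, 2, 0, 0],
               [0, 0, 0, 3, 0],
               [0, 0, 0, 0, 4]], dtype=float)

AI = np.array([[0, 0,     0,     0   ],
               [1, 0,     0,     0   ],
               [0, 1 / 2, 0,     0   ],
               [0, 0,     1 / 3, 0   ],
               [0, 0,     0,     1 / 4]])

print(AD @ AI)  # 4 x 4 identity: differentiation undoes integration
print(AI @ AD)  # diag(0, 1, 1, 1, 1): the constant term is lost
```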

7. Rotation by Angle θ (Fig. 6)

Consider the 2-D transformation

    R = [ cos θ  −sin θ ]
        [ sin θ   cos θ ].                                                      (2.74)

If v is a typical vector in 2-D, v can be written

    v = [ v1 ] = v1 [ 1 ] + v2 [ 0 ] = v1 e1 + v2 e2,                           (2.75)
        [ v2 ]      [ 0 ]      [ 1 ]

where e1 and e2 are the basis vectors. We now consider the effect of R on the two basis vectors e1 and e2:

    R e1 = [ cos θ ]        R e2 = [ −sin θ ]
           [ sin θ ],              [  cos θ ].                                  (2.76)

Figure 6: Rotation by Angle θ.

As shown in Fig. 6, e1 and e2 are both rotated by θ; hence, so is v.

The transformation R has the following properties:

(a) R_θ and R_{−θ} are inverses (rotation by θ and −θ):

    R_{−θ} R_θ = [  cos θ  sin θ ] [ cos θ  −sin θ ]   [ 1 0 ]
                 [ −sin θ  cos θ ] [ sin θ   cos θ ] = [ 0 1 ] = I,             (2.77)

where we used sin(−θ) = −sin θ and cos(−θ) = cos θ to compute R_{−θ}. Thus, R is an orthogonal matrix (RR^T = I). Note also that R_{−θ} = R_θ^{-1} = R_θ^T.

(b) R_{2θ} = R_θ R_θ (rotation through the double angle):

    R_θ R_θ = [ cos θ  −sin θ ] [ cos θ  −sin θ ]
              [ sin θ   cos θ ] [ sin θ   cos θ ]

            = [ cos²θ − sin²θ   −2 sin θ cos θ ]
              [ 2 sin θ cos θ    cos²θ − sin²θ ]

            = [ cos 2θ  −sin 2θ ]
              [ sin 2θ   cos 2θ ] = R_{2θ}.                                     (2.78)

The above equations assume the double-angle trigonometric identities as known, but an alternative view of these equations would be to assume that R_{2θ} = R_θ R_θ must be true, and to view these equations as a derivation of the double-angle identities.
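Properties (a) and (b), and the orthogonality of R, can be confirmed numerically for any angle; a short NumPy sketch (not part of the original notes, with an arbitrary test angle):

```python
import numpy as np

def R(theta):
    # 2-D rotation through the angle theta
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

t = 0.3
print(np.allclose(R(-t) @ R(t), np.eye(2)))   # True: R(-theta) inverts R(theta)
print(np.allclose(R(t) @ R(t), R(2 * t)))     # True: the double-angle property
print(np.allclose(R(t) @ R(t).T, np.eye(2)))  # True: R is orthogonal
```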

(c) R_{θ+φ} = R_φ R_θ (rotation through θ, then φ):

    R_φ R_θ = [ cos φ  −sin φ ] [ cos θ  −sin θ ]
              [ sin φ   cos φ ] [ sin θ   cos θ ]

            = [ cos φ cos θ − sin φ sin θ   −cos φ sin θ − sin φ cos θ ]
              [ sin φ cos θ + cos φ sin θ   −sin φ sin θ + cos φ cos θ ]

            = [ cos(φ+θ)  −sin(φ+θ) ]
              [ sin(φ+θ)   cos(φ+θ) ] = R_{φ+θ}.                                (2.81)

We note that, in 2-D rotations (but not in 3-D), the order of rotations does not matter, i.e., R_φ R_θ = R_θ R_φ. Properties (a) and (b) are special cases of Property (c).

2.7 Orthogonal Subspaces

For two vectors u = (u1, u2, ..., un) and v = (v1, v2, ..., vn), we recall that the inner product (or dot product) of u and v is defined as

    (u, v) = u · v = u^T v = Σ_{i=1}^n ui vi.                                   (2.85)

The length of u is

    |u| = √(u1² + u2² + ··· + un²) = √(u · u).                                  (2.86)

The vectors u and v are orthogonal if u · v = 0. We define two subspaces as orthogonal if every vector in one subspace is orthogonal to every vector in the other subspace. For example, in R³, the xy-plane is a subspace, and the orthogonal subspace is the z-axis.

We note first that a set of mutually orthogonal vectors is linearly independent. To prove this assertion, let the vectors v1, v2, ..., vk be mutually orthogonal. To prove independence, we must show that

    c1 v1 + c2 v2 + ··· + ck vk = 0                                             (2.87)

implies ci = 0 for all i. If we take the dot product of both sides of Eq. 2.87 with v1, we obtain

    v1 · (c1 v1 + c2 v2 + ··· + ck vk) = 0,                                     (2.88)

where orthogonality implies v1 · vi = 0 for i ≠ 1. Thus, from Eq. 2.88,

    c1 v1 · v1 = c1 |v1|² = 0.                                                  (2.89)

Since |v1| ≠ 0, we conclude that c1 = 0. Similarly, c2 = c3 = ··· = ck = 0, and the original assertion is proved.

We now show that the row space of a matrix is orthogonal to the null space. To prove this assertion, consider a matrix A for which x is in the null space, i.e.,

    Ax = 0,                                                                     (2.90)

and v is in the row space; that is, v is a combination of the columns of A^T (rows of A):

    v = A^T w  for some w.                                                      (2.91)

Then,

    v^T x = w^T A x = 0,                                                        (2.92)

thus proving the assertion.

2.8 Projections Onto Lines

Consider two n-dimensional vectors a = (a1, a2, ..., an) and b = (b1, b2, ..., bn). We wish to project b onto a, as shown in Fig. 7 in 3-D.

Figure 7: Projection Onto Line.

Let p be the vector obtained by projecting b onto a. The scalar projection of b onto a is |b| cos θ, where θ is the angle between b and a. The vector projection is this scalar projection times a unit vector in the direction of a:

    p = (|b| cos θ) a/|a|,                                                      (2.93)

where the fraction in this expression is the unit vector in the direction of a. Since a · b = |a||b| cos θ, Eq. 2.93 becomes

    p = |b| [(a · b)/(|a||b|)] a/|a| = [(a · b)/(a · a)] a.                     (2.94)

This result can be obtained by inspection by noting that the scalar projection of b onto a is the dot product of b with a unit vector in the direction of a. The vector projection is then this scalar projection times a unit vector in the direction of a:

    p = [b · a/|a|] a/|a|.                                                      (2.95)
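The projection formula is one line of NumPy. In the sketch below (illustrative only; the two vectors are made-up data), the residual b − p is checked to be orthogonal to a, as the geometry requires:

```python
import numpy as np

a = np.array([1.0, 2.0, 2.0])
b = np.array([3.0, 0.0, 3.0])

# Vector projection of b onto a: p = (a.b / a.a) a
p = (a @ b) / (a @ a) * a

print(p)            # here a.b = a.a = 9, so p = a = [1. 2. 2.]
print((b - p) @ a)  # 0: the residual is orthogonal to a
```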

We can also determine the projection matrix for this projection, i.e., the matrix P that transforms b into p. That is, P is the matrix such that p = Pb. Here it is convenient to switch to index notation with the summation convention (in which repeated indices are summed). From Eq. 2.95,

    pi = (aj bj / (a · a)) ai = (ai aj / (a · a)) bj = Pij bj.                  (2.97)

Thus, the projection matrix is

    Pij = ai aj / (a · a),                                                      (2.98)

or, in vector and matrix notations,

    P = a a^T / (a · a) = a a^T / (a^T a).                                      (2.99)

The projection matrix P has the following properties:

1. P is symmetric, i.e., Pij = Pji.

2. P² = P. That is, once a vector has been projected, there is no place else to go with further projection. This property can be proved using index notation:

    (P²)ij = (PP)ij = Pik Pkj = [ai ak /(a · a)][ak aj /(a · a)]
           = ai aj (a · a)/[(a · a)(a · a)] = ai aj /(a · a) = Pij.             (2.100)

3. P is singular. That is, given p, it is not possible to reconstruct the original vector b, since b is not unique.

3 Change of Basis

On many occasions in engineering applications, the need arises to transform vectors and matrices from one coordinate system to another. For example, in the finite element method, it is frequently more convenient to derive element matrices in a local element coordinate system and then transform those matrices to a global system (Fig. 8).

Figure 8: Element Coordinate Systems in the Finite Element Method.
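The three properties of P can be confirmed numerically; a brief NumPy sketch (not part of the original notes, with an arbitrary direction vector):

```python
import numpy as np

a = np.array([1.0, 2.0, 2.0])

# Projection matrix P = a a^T / (a^T a)
P = np.outer(a, a) / (a @ a)

print(np.allclose(P, P.T))       # True: P is symmetric
print(np.allclose(P @ P, P))     # True: P^2 = P
print(np.linalg.matrix_rank(P))  # 1, so the 3 x 3 matrix P is singular
```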

Transformations are also needed to transform from other orthogonal coordinate systems (e.g., cylindrical or spherical) to Cartesian coordinates (Fig. 9).

Figure 9: Basis Vectors in Polar Coordinate System.

Let the vector v be given by

    v = v1 e1 + v2 e2 + v3 e3 = Σ_{i=1}^3 vi ei,                                (3.1)

where ei are the basis vectors, and vi are the components of v. Using the summation convention, we can omit the summation sign and write

    v = vi ei,                                                                  (3.2)

where, if a subscript appears exactly twice, a summation is implied over the range.

An orthonormal basis is defined as a basis whose basis vectors are mutually orthogonal unit vectors (i.e., vectors of unit length). If ei is an orthonormal basis,

    ei · ej = δij = { 1,  i = j
                    { 0,  i ≠ j,                                                (3.3)

where δij is the Kronecker delta.

Since bases are not unique, we can express v in two different orthonormal bases:

    v = Σ_{i=1}^3 vi ei = Σ_{i=1}^3 v̄i ēi,                                     (3.4)

where vi are the components of v in the unbarred coordinate system, and v̄i are the components in the barred system (Fig. 10).

Figure 10: Change of Basis.

If we take the dot product of both sides of Eq. 3.4

with ēj, we obtain

    Σ_{i=1}^3 vi (ei · ēj) = Σ_{i=1}^3 v̄i (ēi · ēj),                           (3.5)

where ēi · ēj = δij, and we define the 3 × 3 matrix R as

    Rij = ēi · ej.                                                              (3.6)

Thus, from Eq. 3.5,

    v̄j = Σ_{i=1}^3 Rji vi   or   v̄ = Rv   or   v = R^{-1} v̄.                 (3.7)

Similarly, if we take the dot product of Eq. 3.4 with ej, we obtain

    Σ_{i=1}^3 vi (ei · ej) = Σ_{i=1}^3 v̄i (ēi · ej),                           (3.8)

where ei · ej = δij and ēi · ej = Rij. Thus,

    vj = Σ_{i=1}^3 Rij v̄i = Σ_{i=1}^3 (R^T)ji v̄i,                             (3.9)

which is equivalent to the matrix product

    v = R^T v̄.                                                                 (3.10)

A comparison of Eqs. 3.7 and 3.10 yields

    R^{-1} = R^T   or   RR^T = I   or   Σ_{k=1}^3 Rik Rjk = δij,                (3.13)

where I is the identity matrix (Iij = δij):

    I = [ 1 0 0 ]
        [ 0 1 0 ]
        [ 0 0 1 ].                                                              (3.14)

This type of transformation is called an orthogonal coordinate transformation (OCT). A matrix R satisfying Eq. 3.13 is said to be an orthogonal matrix.
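An orthogonal change of basis is easy to exercise numerically. The sketch below (NumPy; code and sample data are mine, not from the notes) builds a rotation about the z-axis, transforms a vector, and checks the defining properties:

```python
import numpy as np

theta = np.pi / 6
# Rows of R are the barred basis vectors expressed in unbarred coordinates
R = np.array([[ np.cos(theta), np.sin(theta), 0.0],
              [-np.sin(theta), np.cos(theta), 0.0],
              [ 0.0,           0.0,           1.0]])

v = np.array([1.0, 2.0, 3.0])
vbar = R @ v  # components in the barred system

print(np.allclose(R @ R.T, np.eye(3)))  # True: R is orthogonal
print(np.allclose(R.T @ vbar, v))       # True: v = R^T vbar recovers v
print(np.isclose(vbar @ vbar, v @ v))   # True: length is preserved
```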

Thus, an orthogonal matrix is one whose inverse is equal to the transpose, and

    RR^T = R^T R = I.

R is sometimes called a rotation matrix. For example, for the coordinate rotation shown in Fig. 10, in 3-D,

    R = [  cos θ  sin θ  0 ]
        [ −sin θ  cos θ  0 ]
        [    0      0    1 ].                                                   (3.15)

In 2-D,

    R = [  cos θ  sin θ ]
        [ −sin θ  cos θ ],                                                      (3.16)

and

    vx = v̄x cos θ − v̄y sin θ,
    vy = v̄x sin θ + v̄y cos θ.                                                 (3.17)

We recall that the determinant of a matrix product is equal to the product of the determinants. Also, the determinant of the transpose of a matrix is equal to the determinant of the matrix itself. Thus, from Eq. 3.13,

    det(RR^T) = (det R)(det R^T) = (det R)² = det I = 1,                        (3.18)

and we conclude that, for an orthogonal matrix R,

    det R = ±1.                                                                 (3.19)

The plus sign occurs for rotations, and the minus sign occurs for combinations of rotations and reflections. For example, the orthogonal matrix

    R = [ 1 0  0 ]
        [ 0 1  0 ]
        [ 0 0 −1 ]                                                              (3.20)

indicates a reflection in the z direction (i.e., the sign of the z component is changed).

We note that the length of a vector is unchanged under an orthogonal coordinate transformation, since the square of the length is given by

    v̄i v̄i = Rij vj Rik vk = δjk vj vk = vj vj,                                (3.21)

where the summation convention was used. That is, the square of the length of a vector is the same in both coordinate systems.

Another property of orthogonal matrices that can be deduced directly from the definition, Eq. 3.13, is that the rows and columns of an orthogonal matrix must be unit vectors and mutually orthogonal. That is, the rows and columns form an orthonormal set.

To summarize, under an orthogonal coordinate transformation, vectors transform according to the rule

    v̄ = Rv   or   v̄i = Σ_{j=1}^3 Rij vj,                                      (3.22)

where

    Rij = ēi · ej.                                                              (3.23)

3.1 Tensors

A vector which transforms under an orthogonal coordinate transformation according to the rule v̄ = Rv is defined as a tensor of rank 1. A tensor of rank 0 is a scalar (a quantity which is unchanged by an orthogonal coordinate transformation). For example, temperature and pressure are scalars, since T̄ = T and p̄ = p.

We now introduce tensors of rank 2. Consider a matrix M = (Mij) which relates two vectors u and v by

    v = Mu   or   vi = Σ_{j=1}^3 Mij uj                                         (3.25)

(i.e., the result of multiplying a matrix and a vector is a vector). In a rotated coordinate system, the same relationship must hold between the transformed vectors:

    v̄ = M̄ū.                                                                  (3.26)

Since both u and v are vectors (tensors of rank 1), Eq. 3.25 implies

    R^T v̄ = M R^T ū   or   v̄ = R M R^T ū.                                    (3.27)

By comparing Eqs. 3.26 and 3.27, we obtain

    M̄ = R M R^T                                                                (3.28)

or, in index notation,

    M̄ij = Σ_{k=1}^3 Σ_{l=1}^3 Rik Rjl Mkl,                                     (3.29)

which is the transformation rule for a tensor of rank 2. In general, a tensor of rank n, which has n indices, transforms under an orthogonal coordinate transformation according to the rule

    Āij···k = Σ_p Σ_q ··· Σ_r Rip Rjq ··· Rkr Apq···r.                          (3.30)

3.2 Examples of Tensors

1. Stress and strain in elasticity

The stress tensor σ is

    σ = [ σ11 σ12 σ13 ]
        [ σ21 σ22 σ23 ]
        [ σ31 σ32 σ33 ],                                                        (3.31)

where σ11, σ22, σ33 are the direct (normal) stresses, and σ12, σ13, and σ23 are the shear stresses. The corresponding strain tensor is

    ε = [ ε11 ε12 ε13 ]
        [ ε21 ε22 ε23 ]
        [ ε31 ε32 ε33 ].                                                        (3.32)
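The rank-2 transformation rule can be demonstrated with a familiar elasticity result: a state of pure shear, viewed in axes rotated by 45°, becomes a state of equal and opposite normal stresses. The NumPy sketch below (the example state is mine, not from the notes) applies σ̄ = RσR^T:

```python
import numpy as np

theta = np.pi / 4
c, s = np.cos(theta), np.sin(theta)
R = np.array([[ c,   s,   0.0],
              [-s,   c,   0.0],
              [ 0.0, 0.0, 1.0]])

# A state of pure shear in the xy-plane (unit shear stress)
sigma = np.array([[0.0, 1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0]])

# Rank-2 tensor transformation: sigma_bar = R sigma R^T
sigma_bar = R @ sigma @ R.T
print(sigma_bar.round(12))  # diag(1, -1, 0): normal stresses +1 and -1
```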

Here, in terms of displacements,

    εij = (1/2)(ui,j + uj,i) = (1/2)(∂ui/∂xj + ∂uj/∂xi).                        (3.33)

The shear strains in this tensor are equal to half the corresponding engineering shear strains. Both σ and ε transform like second rank tensors.

2. Generalized Hooke's law

According to Hooke's law in elasticity, the extension in a bar subjected to an axial force is proportional to the force, or stress is proportional to strain. In 1-D,

    σ = Eε,                                                                     (3.34)

where E is Young's modulus, an experimentally determined material property.

In general three-dimensional elasticity, there are nine components of stress σij and nine components of strain εij. According to generalized Hooke's law, each stress component can be written as a linear combination of the nine strain components:

    σij = cijkl εkl,                                                            (3.35)

where the 81 material constants cijkl are experimentally determined, and the summation convention is being used.

We now prove that cijkl is a tensor of rank 4. We can write Eq. 3.35 in terms of stress and strain in a second orthogonal coordinate system as

    Rki Rlj σ̄kl = cijkl Rmk Rnl ε̄mn.                                          (3.36)

If we multiply both sides of this equation by Rpj Roi, and sum repeated indices, we obtain

    Roi Rpj Rki Rlj σ̄kl = Roi Rpj Rmk Rnl cijkl ε̄mn,                          (3.37)

or, because R is an orthogonal matrix,

    δok δpl σ̄kl = σ̄op = Roi Rpj Rmk Rnl cijkl ε̄mn.                           (3.38)

Since, in the second coordinate system,

    σ̄op = c̄opmn ε̄mn,                                                         (3.39)

we conclude that

    c̄opmn = Roi Rpj Rmk Rnl cijkl,                                             (3.40)

which proves that cijkl is a tensor of rank 4.

3. Stiffness matrix in finite element analysis

In the finite element method of analysis for structures, the forces F acting on an object in static equilibrium are a linear combination of the displacements u (or vice versa):

    Ku = F,                                                                     (3.41)

where K is referred to as the stiffness matrix (with dimensions of force/displacement). In this equation, u and F contain several subvectors, since u and F are the displacement and force vectors for all grid points, i.e.,

    u = { ua }        F = { Fa }
        { ub },           { Fb }
        { uc }            { Fc }
        {  ⋮ }            {  ⋮ }                                                (3.42)

for grid points a, b, c, .... Similarly,

    ū = { ūa }   { Ra ua }
        { ūb } = { Rb ub } = Γu,                                                (3.43)
        { ūc }   { Rc uc }
        {  ⋮ }   {   ⋮   }

where ūa = Ra ua, ūb = Rb ub, ..., and Γ is an orthogonal block-diagonal matrix consisting of rotation matrices R:

    Γ = [ Ra  0   0  ··· ]
        [ 0   Rb  0  ··· ]
        [ 0   0   Rc ··· ]
        [ ⋮   ⋮   ⋮   ⋱  ],                                                     (3.44)

with

    Γ^T Γ = I.                                                                  (3.45)

Similarly,

    F̄ = ΓF.                                                                    (3.46)

Thus, if

    K̄ū = F̄,                                                                  (3.47)

then

    K̄Γu = ΓF   or   (Γ^T K̄Γ) u = F.                                           (3.48)

That is, the stiffness matrix transforms like other tensors of rank 2:

    K = Γ^T K̄Γ.                                                                (3.50)

We illustrate the transformation of a finite element stiffness matrix by transforming the stiffness matrix for the pin-jointed rod element from a local element coordinate system to a global Cartesian system. Consider the rod shown in Fig. 11. For this element, the 4 × 4 2-D

stiffness matrix in the element coordinate system is

    K̄ = (AE/L) [  1  0 −1  0 ]
                [  0  0  0  0 ]
                [ −1  0  1  0 ]
                [  0  0  0  0 ],                                                (3.52)

where A is the cross-sectional area of the rod, E is Young's modulus of the rod material, and L is the rod length.

Figure 11: Element Coordinate System for Pin-Jointed Rod.

In the global coordinate system,

    K = Γ^T K̄Γ,                                                                (3.53)

where the 4 × 4 transformation matrix is

    Γ = [ R 0 ]
        [ 0 R ],                                                                (3.54)

and the rotation matrix is

    R = [  cos θ  sin θ ]
        [ −sin θ  cos θ ].                                                      (3.55)

Thus, with c = cos θ and s = sin θ,

    K = (AE/L) [ c −s  0  0 ] [  1  0 −1  0 ] [  c  s  0  0 ]
               [ s  c  0  0 ] [  0  0  0  0 ] [ −s  c  0  0 ]
               [ 0  0  c −s ] [ −1  0  1  0 ] [  0  0  c  s ]
               [ 0  0  s  c ] [  0  0  0  0 ] [  0  0 −s  c ]

      = (AE/L) [  c²   cs  −c²  −cs ]
               [  cs   s²  −cs  −s² ]
               [ −c²  −cs   c²   cs ]
               [ −cs  −s²   cs   s² ].                                          (3.56)
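The rod transformation above is a convenient self-check, since the closed-form result in c and s is known. A NumPy sketch (illustrative only; the function name and unit property values are mine):

```python
import numpy as np

def rod_stiffness_global(A, E, L, theta):
    # Local stiffness of a pin-jointed rod transformed to global
    # coordinates via K = Gamma^T Kbar Gamma
    Kbar = (A * E / L) * np.array([[ 1.0, 0.0, -1.0, 0.0],
                                   [ 0.0, 0.0,  0.0, 0.0],
                                   [-1.0, 0.0,  1.0, 0.0],
                                   [ 0.0, 0.0,  0.0, 0.0]])
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, s], [-s, c]])
    Gamma = np.block([[R, np.zeros((2, 2))],
                      [np.zeros((2, 2)), R]])
    return Gamma.T @ Kbar @ Gamma

K = rod_stiffness_global(A=1.0, E=1.0, L=1.0, theta=np.pi / 2)
print(K.round(12))  # for theta = 90 deg, only the y-direction terms survive
```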

3.3 Isotropic Tensors

An isotropic tensor is a tensor which is independent of coordinate system (i.e., invariant under an orthogonal coordinate transformation). For example, the identity matrix I is invariant under an orthogonal coordinate transformation:

    Ī = R I R^T = R R^T = I.                                                    (3.57)

That is, since δ̄ij = δij, the Kronecker delta δij is a second rank tensor and isotropic. It can be shown in tensor analysis that δij is the only isotropic tensor of rank 2 and, moreover, that δij is the characteristic tensor for all isotropic tensors:

    Rank    Isotropic Tensors
    1       none
    2       c δij
    3       none
    4       a δij δkl + b δik δjl + c δil δjk
    odd     none

That is, all isotropic tensors of rank 4 must be of the form shown above, which has three constants.

In generalized Hooke's law (Eq. 3.35), the material property tensor cijkl has 3⁴ = 81 constants (assuming no symmetry). For an isotropic material, cijkl must be an isotropic tensor of rank 4, which implies the additional symmetry cijkl = cklij, thus implying at most three independent elastic material constants (on the basis of tensor analysis alone). The actual number of independent material constants for an isotropic material turns out to be two rather than three, a result which depends on the existence of a strain energy function.

4 Least Squares Problems

Consider the rectangular system of equations

    Ax = b,                                                                     (4.1)

where A is an m × n matrix with m > n. That is, we are interested in the case where there are more equations than unknowns. Assume that the system is inconsistent, so that Eq. 4.1 has no solution. Our interest here is to find a vector x which "best" fits the data in some sense.

We state the problem as follows: If A is an m × n matrix, with m > n, and Ax = b, find x such that the scalar error measure E = |Ax − b| is minimized, where E is the length of the residual Ax − b (error). To solve this problem, we consider the square of the error:

    E² = (Ax − b)^T (Ax − b) = (x^T A^T − b^T)(Ax − b)
       = x^T A^T Ax − x^T A^T b − b^T Ax + b^T b,                               (4.2)

or, in index notation using the summation convention,

    E² = xi (A^T)ij Ajk xk − xi (A^T)ij bj − bi Aij xj + bi bi
       = xi Aji Ajk xk − xi Aji bj − bi Aij xj + bi bi.                         (4.3)

We wish to find the vector x which minimizes E². Thus, we set ∂(E²)/∂xl = 0 for each l, where we note that

    ∂xi/∂xl = δil,                                                              (4.4)

where δij is the Kronecker delta defined as

    δij = { 1,  i = j
          { 0,  i ≠ j.                                                          (4.5)

We then differentiate Eq. 4.3 to obtain

    0 = ∂(E²)/∂xl = δil Aji Ajk xk + xi Aji Ajk δkl − δil Aji bj − bi Aij δjl + 0
                  = Ajl Ajk xk + xi Aji Ajl − Ajl bj − bi Ail,                  (4.6)

where the dummy subscripts k in the first term and j in the third term can both be changed to i. Thus,

    0 = ∂(E²)/∂xl = Ajl Aji xi + Aji Ajl xi − Ail bi − Ail bi = (2A^T Ax − 2A^T b)l, (4.7)

or

    A^T Ax = A^T b.                                                             (4.10)

The equations, Eq. 4.10, which solve the least squares problem are referred to as the normal equations. The problem solved is referred to as the least squares problem, since we have minimized E, the square root of the sum of the squares of the components of the vector Ax − b.

Since A is an m × n matrix, the coefficient matrix A^T A in Eq. 4.10 is a square n × n matrix. Moreover, A^T A is also symmetric, since

    (A^T A)^T = A^T (A^T)^T = A^T A.                                            (4.11)

If A^T A is invertible, we could "solve" the normal equations to obtain

    x = (A^T A)^{-1} A^T b.                                                     (4.12)

However, (A^T A)^{-1} may not exist, since the original system, Eq. 4.1, may not have been over-determined (i.e., rank r < n).
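The normal equations are short to set up in practice. The NumPy sketch below (illustrative only; the small 4 × 2 system is made-up data) solves them and compares against the library least squares routine:

```python
import numpy as np

# Overdetermined system: 4 equations, 2 unknowns
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([0.0, 1.0, 1.0, 3.0])

# Normal equations: A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)
print(x)  # [-0.1  0.9]

# Same answer from the library least squares routine
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_ls))  # True
```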

We also recall that the matrix (A^T A)^{-1} A^T in Eq. 4.12 is the pseudo left inverse of A. We note that, if A is square and invertible, then

    x = (A^T A)^{-1} A^T b = A^{-1} (A^T)^{-1} A^T b = A^{-1} b,                (4.13)

as expected.

We note that Eq. 4.10 could have been obtained simply by multiplying both sides of Eq. 4.1 by A^T, but we would not have known that the result is a least-squares solution.

One of the most important properties associated with the normal equations is that, for any matrix A,

    rank(A^T A) = rank(A),                                                      (4.14)

as proved in §2.5 following Eq. 2.42. We recall that the implication of this result is that, if rank(A) = n (i.e., A's columns are independent), then A^T A is invertible. Thus, for the overdetermined system, the pseudo left inverse is equivalent to a least squares solution.

4.1 Least Squares Fitting of Data

Sometimes one has a large number of discrete values of some function and wants to determine a fairly simple approximation to the data. For example, suppose we know the points (x1, f1), (x2, f2), ..., (xm, fm), where the number m of points is large, and we want to approximate f with the low-order polynomial

    p(x) = a0 + a1 x + a2 x² + ··· + an xⁿ,                                     (4.15)

where the order n of the polynomial is small, as shown in Fig. 12.

Figure 12: Example of Least Squares Fit of Data.

The goal of the fit is for p to agree as closely as possible with the data, but make use of all the data. For example, we could require that the algebraic sum of the errors be minimized by p, but positive and negative errors would cancel each other, thus distorting the evaluation of the fit. We could alternatively require that the sum of the absolute errors be minimum, but it turns out that that choice results in a nasty mathematical problem. Hence we pick p(x) such that the sum of the squares of the errors,

    R = [f1 − p(x1)]² + [f2 − p(x2)]² + ··· + [fm − p(xm)]²,                    (4.16)

is minimum. R is referred to as the residual, and the quantities in brackets are the errors at each of the discrete points where the function is known.

(4.      f   m (4. .  . m 2 1 (4.21) This is a symmetric system of two linear algebraic equations in two unknowns (a0 . consider the linear approximation p(x) = a0 + a1 x. 2[f1 − (a0 + a1 x1 )](−1) + 2[f2 − (a0 + a1 x2 )](−1) + · · · + 2[fm − (a0 + a1 xm )](−1) = 0 2[f1 − (a0 + a1 x1 )](−x1 ) + 2[f2 − (a0 + a1 x2 )](−x2 ) + · · · + 2[fm − (a0 + a1 xm )](−xm ) = 0 (4. . . An alternative approach would be to attempt to ﬁnd (a0 .23) (4. from Eq. For R a minimum. a1 ) = [f1 − (a0 + a1 x1 )]2 + [f2 − (a0 + a1 x2 )]2 + · · · + [fm − (a0 + a1 xm )]2 .24) This system was solved previously by multiplying by AT : AT Aa = AT f . where the measure of error is given by R(a0 .For example.22) .  . 1 xm    f1      f2       f3 f= .20) or ma0 + (x1 + x2 + · · · + xm )a1 = f1 + f2 + · · · + fm (x1 + x2 + · · · + xm )a0 + (x2 + x2 + · · · + x2 )a1 = f1 x1 + f2 x2 + · · · + fm xm . this system can be written Aa = f . .   .25) 51 . which is an overdetermined system if m > 2. . a1 ) such that the linear function goes through all m points: a0 + a1 x 1 = f 1 a0 + a1 x 2 = f 2 a0 + a1 x 3 = f 3 (4.  . a0 + a1 x m = f m .19) (4.   . ∂R ∂R = 0 and = 0.  . (4. where     A=    1 x1 1 x2   1 x3  .18. a1 ) which can be solved for a0 and a1 to yield p(x).17) a= a0 a1 . 4.18) (4. ∂a0 ∂a1 which implies. In matrix notation.

Here

    A^T A = [ 1  1  ···  1 ] [ 1 x1 ]   [  m      Σ xi  ]
            [ x1 x2 ··· xm ] [ 1 x2 ] = [ Σ xi    Σ xi² ],                      (4.26)
                             [ ⋮  ⋮ ]
                             [ 1 xm ]

and

    A^T f = [ 1  1  ···  1 ] { f1 }   {  Σ fi   }
            [ x1 x2 ··· xm ] { f2 } = { Σ xi fi }.                              (4.27)
                             {  ⋮ }
                             { fm }

Thus, Eq. 4.25 becomes

    [  m      Σ xi  ] [ a0 ]   {  Σ fi   }
    [ Σ xi    Σ xi² ] [ a1 ] = { Σ xi fi },                                     (4.28)

which is identical to Eq. 4.21. Hence, both approaches give the same result. The geometric interpretation of minimizing the sum of squares of the errors is equivalent to the algebraic interpretation of finding a such that |Aa − f| is minimized. In fact, the two error measures are also the same: E² = R.

4.2 The General Linear Least Squares Problem

Given m points (x1, f1), (x2, f2), ..., (xm, fm), we could, in general, perform a least squares fit using a linear combination of other functions φ1(x), φ2(x), ..., φn(x). For example, we could attempt a fit using the set of functions {eˣ, 1/x, sin x}. The functions used in the linear combination are referred to as basis functions (or basis vectors). Thus, the more general least squares problem is to find the coefficients ai in

    p(x) = a1 φ1(x) + a2 φ2(x) + ··· + an φn(x)                                 (4.29)

so that the mean square error between the function p(x) and the original points is minimized. If we use the algebraic approach to set up the equations, we attempt to force p(x) to go through all m points:

    a1 φ1(x1) + a2 φ2(x1) + ··· + an φn(x1) = f1
    a1 φ1(x2) + a2 φ2(x2) + ··· + an φn(x2) = f2
        ⋮
    a1 φ1(xm) + a2 φ2(xm) + ··· + an φn(xm) = fm,                               (4.30)

the number of basis functions. (4. Note that.. and the resulting system of equations (the normal equations) may be hard to solve (i. . The Gram-Schmidt procedure is a method for ﬁnding an orthonormal basis for a set of independent vectors. (4. We deﬁne the second unit basis vector so that it is coplanar with a(1) and a(2) : a(2) − (a(2) · q(1) )q(1) q(2) = (2) .which is an over-determined system if m > n.3 Gram-Schmidt Orthogonalization One of the potential problems with the least squares ﬁts described above is that the set of basis functions is not an orthogonal set (even though the functions may be independent).32) The least squares solution of this problem is therefore found by multiplying by AT : (AT A)a = AT f . . q(3) . (4. . the normal equations for the general linear least squares problem are (in index notation) [φi (xk )φj (xk )]aj = φi (xj )fj . e(3) . (4. and the coeﬃcient matrix is Aij = φj (xi ). Aa = f or Aij aj = fi . . a skewed plane in R3 ).g. since the given vectors may deﬁne a subspace of a larger dimensional space (e. . the system may be poorly conditioned). . . ﬁnd a set of orthonormal vectors q(1) .) We start the Gram-Schmidt procedure by choosing the ﬁrst unit basis vector to be parallel to the ﬁrst vector in the set: a(1) (4. Thus. 13.38) |a − (a(2) · q(1) )q(1) | 53 .36) The order of this system is n.35) Thus. q(2) . . (4. e(2) . given a set of vectors. a(2) .37) q(1) = (1) .33) 4.}.31) where the summation convention is used. a(3) .34) (4. (4. This situation can be improved by making the set of basis functions an orthogonal set. we cannot simply use as our orthonormal basis the set {e(1) .e. that span the same space. where (using the summation convention) (AT A)ij = (AT )ik Akj = Aki Akj = φi (xk )φj (xk ) and (AT f )i = (AT )ij fj = Aji fj = φi (xj )fj . The straight line ﬁt considered earlier is a special case of this formula. |a | as shown in Fig. 
Stated as a problem: Given a set of independent vectors a(1), a(2), a(3), ..., find a set of orthonormal vectors q(1), q(2), q(3), ... that span the same space. (A set of vectors is orthonormal if the vectors in the set are both mutually orthogonal and normalized to unit length.)
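The procedure just described translates almost line for line into code. A NumPy sketch (illustrative only; the two sample vectors are mine) of the orthogonalize-then-normalize loop:

```python
import numpy as np

def gram_schmidt(vectors):
    # Subtract from each new vector its components along the unit
    # vectors found so far, then normalize (modified Gram-Schmidt)
    q = []
    for a in vectors:
        v = np.array(a, dtype=float)
        for u in q:
            v -= (v @ u) * u
        q.append(v / np.linalg.norm(v))
    return q

q1, q2 = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
print(q1)            # a(1) normalized: [0.707..., 0.707..., 0]
print(abs(q1 @ q2))  # ~0: the two unit vectors are orthogonal
```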

Figure 13: The First Two Vectors in Gram-Schmidt.

Since the third unit basis vector must be orthogonal to the first two, we create q(3) by subtracting from a(3) any part of a(3) which is parallel to either q(1) or q(2). Hence,

    q(3) = [a(3) − (a(3)·q(1)) q(1) − (a(3)·q(2)) q(2)]
           / |a(3) − (a(3)·q(1)) q(1) − (a(3)·q(2)) q(2)|,                      (4.39)

    q(4) = [a(4) − (a(4)·q(1)) q(1) − (a(4)·q(2)) q(2) − (a(4)·q(3)) q(3)]
           / |a(4) − (a(4)·q(1)) q(1) − (a(4)·q(2)) q(2) − (a(4)·q(3)) q(3)|,   (4.40)

and so on.

4.4 QR Factorization

Consider an m × n matrix A whose columns a(1), a(2), a(3), ..., a(n) are independent. If we have an orthonormal basis q(1), q(2), ..., q(n) for the columns, then each column can be expanded in terms of the orthonormal basis:

    a(1) = (a(1)·q(1)) q(1) + (a(1)·q(2)) q(2) + ··· + (a(1)·q(n)) q(n)
    a(2) = (a(2)·q(1)) q(1) + (a(2)·q(2)) q(2) + ··· + (a(2)·q(n)) q(n)
        ⋮
    a(n) = (a(n)·q(1)) q(1) + (a(n)·q(2)) q(2) + ··· + (a(n)·q(n)) q(n),        (4.41)

or, in terms of matrices,

                                                  [ a(1)·q(1)  a(2)·q(1)  ···  a(n)·q(1) ]
    A = [a(1) a(2) ··· a(n)] = [q(1) q(2) ··· q(n)] [ a(1)·q(2)  a(2)·q(2)  ···  a(n)·q(2) ]
                                                  [     ⋮          ⋮        ⋱       ⋮    ]
                                                  [ a(1)·q(n)  a(2)·q(n)  ···  a(n)·q(n) ]. (4.42)

However, if the q vectors were obtained from a Gram-Schmidt process, q(1) is parallel to a(1), and

    a(1)·q(2) = a(1)·q(3) = a(1)·q(4) = ··· = 0.                                (4.43)

Also, q(2) is in the plane of a(1) and a(2), in which case

    q(3)·a(2) = q(4)·a(2) = q(5)·a(2) = ··· = 0.                                (4.44)

Hence, Eq. 4.42 becomes

                                                  [ a(1)·q(1)  a(2)·q(1)  a(3)·q(1)  ··· ]
    A = [a(1) a(2) ··· a(n)] = [q(1) q(2) ··· q(n)] [     0      a(2)·q(2)  a(3)·q(2)  ··· ]
                                                  [     0          0      a(3)·q(3)  ··· ]
                                                  [     ⋮          ⋮          ⋮       ⋱  ], (4.45)

or

    A = QR,                                                                     (4.46)

where Q is the matrix formed with the orthonormal columns q(1), q(2), q(3), ..., q(n) of the Gram-Schmidt process, and R is an upper triangular matrix. We thus conclude that every m × n matrix with independent columns can be factored into A = QR.

The benefit of the QR factorization is that it simplifies the least squares solution. We recall that the rectangular system

    Ax = b                                                                      (4.47)

with m > n was solved using least squares by solving the normal equations

    A^T Ax = A^T b.                                                             (4.48)

If A is factored into A = QR (where A and Q are m × n matrices, and R is an n × n matrix), Eq. 4.48 becomes

    R^T Q^T QRx = R^T Q^T b,                                                    (4.49)

where, in block form,

    Q^T Q = [ q(1)^T ]                          [ 1 0 ··· 0 ]
            [ q(2)^T ] [ q(1) q(2) ··· q(n) ] = [ 0 1 ··· 0 ]
            [   ⋮    ]                          [ ⋮ ⋮  ⋱  ⋮ ]
            [ q(n)^T ]                          [ 0 0 ··· 1 ] = I.              (4.50)

Thus,

    R^T Rx = R^T Q^T b,                                                         (4.51)

and, since R is upper triangular and invertible, the normal equations become

    Rx = Q^T b.                                                                 (4.52)

This system can be solved very quickly, since R is an upper triangular matrix. However, the major computational expense of the least squares problem is factoring A into QR, i.e., performing the Gram-Schmidt process to find Q and R.

5 Fourier Series

Consider two functions f(x) and g(x) defined in the interval [a, b] (i.e., a ≤ x ≤ b). We define the inner product (or scalar product) of f and g as

    (f, g) = ∫_a^b f(x) g(x) dx.                                                (5.1)

Note the analogy between this definition for functions and the dot product for vectors.
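The QR route to least squares can be tried directly; the NumPy sketch below (illustrative only; the small test system is made-up data) reproduces the normal-equation solution by solving the triangular system with Q^T b on the right-hand side:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([0.0, 1.0, 1.0, 3.0])

# Least squares via QR: with A = QR, the normal equations reduce to R x = Q^T b
Q, R = np.linalg.qr(A)           # reduced factorization: Q is 4 x 2, R is 2 x 2
x = np.linalg.solve(R, Q.T @ b)  # R is triangular, so back substitution suffices

print(x)  # [-0.1  0.9], matching the normal-equation solution
```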

We define the norm $N_f$ of a function f as

    N_f = (f, f)^{1/2} = \left[ \int_a^b [f(x)]^2 \, dx \right]^{1/2}.    (5.2)

Thus, this definition of the norm of a function is analogous to the length of a vector. Note that a new function $\bar{f}(x)$ defined as

    \bar{f}(x) = \frac{f(x)}{N_f}    (5.3)

has unit norm, i.e., $(\bar{f}, \bar{f}) = 1$. The function $\bar{f}$ is said to be normalized over [a, b].

We define two functions f and g as orthogonal over [a, b] if (f, g) = 0. Orthogonality of functions is analogous to the orthogonality of vectors. We define the set of functions $\phi_i(x)$, $i = 1, 2, \ldots$, as an orthonormal set over [a, b] if $(\phi_i, \phi_j) = \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta defined as

    \delta_{ij} = \begin{cases} 1, & i = j, \\ 0, & i \ne j. \end{cases}    (5.4)

For example, the set of functions $\{1, \cos x, \sin x, \cos 2x, \sin 2x, \cos 3x, \sin 3x, \ldots\}$ is an orthogonal set over $[-\pi, \pi]$, since, for integers m and n,

    (\cos nx, \cos mx) = 0, \quad m \ne n,
    (\sin nx, \sin mx) = 0, \quad m \ne n,
    (\cos nx, \sin mx) = 0.    (5.5)

To compute the norms of these functions, we note that

    \int_{-\pi}^{\pi} (1)^2 \, dx = 2\pi, \qquad
    \int_{-\pi}^{\pi} \cos^2 nx \, dx = \int_{-\pi}^{\pi} \sin^2 nx \, dx = \pi    (5.6)

for all integers $n \ge 1$. Thus, the set of functions

    \frac{1}{\sqrt{2\pi}}, \ \frac{\cos x}{\sqrt{\pi}}, \ \frac{\sin x}{\sqrt{\pi}}, \ \frac{\cos 2x}{\sqrt{\pi}}, \ \frac{\sin 2x}{\sqrt{\pi}}, \ldots

is an orthonormal set over $[-\pi, \pi]$.

The Fourier series of a function defined over [-L, L] is an expansion of the function into sines and cosines:

    f(x) = A_0 + \sum_{n=1}^{\infty} A_n \cos \frac{n\pi x}{L} + \sum_{n=1}^{\infty} B_n \sin \frac{n\pi x}{L}.    (5.7)

This equation is an expansion in functions orthogonal over [-L, L], analogous to the expansion of a vector in orthogonal basis vectors. The basis functions here are

    1, \ \cos \frac{\pi x}{L}, \ \sin \frac{\pi x}{L}, \ \cos \frac{2\pi x}{L}, \ \sin \frac{2\pi x}{L}, \ \cos \frac{3\pi x}{L}, \ \sin \frac{3\pi x}{L}, \ldots
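The orthogonality relations (5.5) and the norms (5.6) can be verified numerically. A minimal sketch, assuming NumPy and using a simple composite trapezoidal rule for the inner product (5.1):

```python
import numpy as np

# Check orthogonality and norms of {1, cos nx, sin nx} on [-pi, pi].
x = np.linspace(-np.pi, np.pi, 20001)
dx = x[1] - x[0]

def inner(f, g):
    """Inner product (f, g) over [-pi, pi], composite trapezoidal rule."""
    y = f(x) * g(x)
    return float((y[:-1] + y[1:]).sum() * dx / 2)

one = lambda t: np.ones_like(t)
cos2 = lambda t: np.cos(2 * t)
sin3 = lambda t: np.sin(3 * t)

print(round(inner(cos2, sin3), 6))   # 0.0: different basis functions are orthogonal
print(round(inner(one, one), 6))     # 2*pi, i.e. 6.283185
print(round(inner(cos2, cos2), 6))   # pi, i.e. 3.141593
```

The trapezoidal rule is spectrally accurate for smooth periodic integrands over a full period, so even this crude quadrature reproduces the exact values to many digits.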

To determine the coefficients $A_n$, $B_n$ in the Fourier series, Eq. 5.7, we multiply both sides of that equation by a basis function and integrate. For example,

    \int_{-L}^{L} f(x) \cos \frac{m\pi x}{L} \, dx
      = A_0 \int_{-L}^{L} \cos \frac{m\pi x}{L} \, dx
      + \sum_{n=1}^{\infty} A_n \int_{-L}^{L} \cos \frac{m\pi x}{L} \cos \frac{n\pi x}{L} \, dx
      + \sum_{n=1}^{\infty} B_n \int_{-L}^{L} \sin \frac{n\pi x}{L} \cos \frac{m\pi x}{L} \, dx.    (5.8)

The integrals in Eq. 5.8 are

    \int_{-L}^{L} \cos \frac{m\pi x}{L} \, dx = \begin{cases} 2L, & m = 0, \\ 0, & m \ne 0, \end{cases}    (5.9)

    \int_{-L}^{L} \cos \frac{n\pi x}{L} \cos \frac{m\pi x}{L} \, dx = \begin{cases} L, & m = n \ne 0, \\ 0, & m \ne n, \end{cases}    (5.10)

    \int_{-L}^{L} \sin \frac{n\pi x}{L} \sin \frac{m\pi x}{L} \, dx = \begin{cases} L, & m = n \ne 0, \\ 0, & m \ne n, \end{cases}
    \qquad
    \int_{-L}^{L} \sin \frac{n\pi x}{L} \cos \frac{m\pi x}{L} \, dx = 0    (5.11)

for integers m and n. Thus,

    \int_{-L}^{L} f(x) \cos \frac{m\pi x}{L} \, dx = \begin{cases} A_0 (2L), & m = 0, \\ A_m (L), & m \ne 0, \end{cases}

in which case

    A_0 = \frac{1}{2L} \int_{-L}^{L} f(x) \, dx,    (5.12)

    A_n = \frac{1}{L} \int_{-L}^{L} f(x) \cos \frac{n\pi x}{L} \, dx, \quad n > 0.    (5.13)

Similarly,

    B_n = \frac{1}{L} \int_{-L}^{L} f(x) \sin \frac{n\pi x}{L} \, dx.    (5.14)

Note that $A_0$ is the average value of f(x) over the domain [-L, L].

The evaluation of the coefficients using integrals could alternatively have been expressed in terms of inner products. If we take the inner product of both sides of Eq. 5.7 (with $L = \pi$) with each basis function, we obtain, for example,

    (f, \cos mx) = A_m (\cos mx, \cos mx),    (5.15)

or

    A_n = \frac{(f, \cos nx)}{(\cos nx, \cos nx)}.    (5.16)

Thus, the nth Fourier coefficient can be interpreted as being the "projection" of the function f on the nth basis function (cos nx). It is interesting to note the similarity between this last formula and the formula for the vector projection p of a vector b onto another vector a (Fig. 14):

    p = \left( b \cdot \frac{a}{|a|} \right) \frac{a}{|a|} = \frac{b \cdot a}{a \cdot a} \, a.    (5.17)

Figure 14: The Projection of One Vector onto Another.

5.1 Example

We evaluate the Fourier series for the function

    f(x) = \begin{cases} 0, & -5 < x < 0, \\ 3, & 0 < x < 5. \end{cases}    (5.18)

From Eqs. 5.12 to 5.14 (with L = 5), the Fourier coefficients are

    A_0 = \frac{1}{10} \int_{-5}^{5} f(x) \, dx = \frac{1}{10} \int_0^5 3 \, dx = \frac{3}{2},    (5.19)

    A_n = \frac{1}{5} \int_{-5}^{5} f(x) \cos \frac{n\pi x}{5} \, dx = \frac{1}{5} \int_0^5 3 \cos \frac{n\pi x}{5} \, dx = 0, \quad n > 0,    (5.20)

    B_n = \frac{1}{5} \int_{-5}^{5} f(x) \sin \frac{n\pi x}{5} \, dx = \frac{1}{5} \int_0^5 3 \sin \frac{n\pi x}{5} \, dx
        = \begin{cases} 6/(n\pi), & n \text{ odd}, \\ 0, & n \text{ even}. \end{cases}    (5.21)

Thus, the corresponding Fourier series is

    f(x) = \frac{3}{2} + \frac{6}{\pi} \sum_{n \ \text{odd}} \frac{1}{n} \sin \frac{n\pi x}{5}
         = \frac{3}{2} + \frac{6}{\pi} \left( \sin \frac{\pi x}{5} + \frac{1}{3} \sin \frac{3\pi x}{5} + \frac{1}{5} \sin \frac{5\pi x}{5} + \cdots \right).    (5.22)

The convergence of this series can be seen by evaluating the partial sums

    f(x) \approx f_N(x) = A_0 + \sum_{n=1}^{N} A_n \cos \frac{n\pi x}{L} + \sum_{n=1}^{N} B_n \sin \frac{n\pi x}{L}    (5.23)

for different values of the upper limit N. Four such representations of f(x) are shown in Fig. 15 for N = 5, 10, 20, and 40. Since half the sine terms and all the cosine terms (except the constant term) are zero, the four partial sums have 4, 6, 11, and 21 nonzero terms, respectively. Notice that, at a discontinuity of the function, the series converges to the average value of the function at the discontinuity. Notice also in Fig. 15 that, as the number of terms included in the Fourier series increases, the overshoot which occurs at a discontinuity does not diminish. This property of Fourier series is referred to as the Gibbs phenomenon.

5.2 Generalized Fourier Series

The expansion of a function in a series of sines and cosines is a special case of an expansion in other orthogonal functions. In the generalized Fourier series representation of the function f(x),
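The partial sums (5.23) for this example can be evaluated directly. A minimal sketch, assuming NumPy:

```python
import numpy as np

# Partial sums f_N(x) of the square-wave series (5.22), with L = 5:
# f(x) = 3/2 + (6/pi) * sum over odd n of sin(n*pi*x/5)/n.
def f_N(x, N):
    n = np.arange(1, N + 1, 2)    # odd n only; even-n and cosine terms vanish
    return 1.5 + (6 / np.pi) * np.sum(
        np.sin(np.outer(x, n) * np.pi / 5) / n, axis=1)

x = np.array([2.5])
for N in (5, 10, 20, 40):
    print(N, f_N(x, N)[0])        # approaches f(2.5) = 3 as N grows

# At the discontinuity x = 0, every sine term vanishes, so each partial sum
# equals the average of the left and right limits, (0 + 3)/2:
print(f_N(np.array([0.0]), 41)[0])   # 1.5
```

Plotting f_N on a fine grid reproduces the behavior described in the text: the overshoot near the jump (the Gibbs phenomenon) narrows as N grows but does not shrink in height.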

Figure 15: Convergence of Series in Example. (The four panels show the partial sums on $-5 \le x \le 5$ for N = 5, 10, 20, and 40.)

We wish to evaluate the coefficients $c_i$ in the expansion

    f(x) \approx \sum_{i=1}^{N} c_i \phi_i(x)    (5.24)

so as to minimize the mean square error

    M_N = \int_a^b \left[ f(x) - \sum_{i=1}^{N} c_i \phi_i(x) \right]^2 dx,    (5.25)

where $(\phi_i, \phi_j) = \delta_{ij}$ (Kronecker delta), i.e., the series representation in Eq. 5.24 is an expansion in orthonormal functions. To minimize $M_N$, we set, for each j,

    0 = \frac{\partial M_N}{\partial c_j} = \int_a^b 2 \left[ f(x) - \sum_{i=1}^{N} c_i \phi_i(x) \right] [-\phi_j(x)] \, dx,    (5.26)

or

    \int_a^b f(x) \phi_j(x) \, dx = \sum_{i=1}^{N} c_i \int_a^b \phi_i(x) \phi_j(x) \, dx = \sum_{i=1}^{N} c_i \delta_{ij} = c_j.    (5.27)

That is,

    c_i = (f, \phi_i).    (5.28)

Alternatively, had we started with the series, we would have obtained the same result by taking the inner product of each side with $\phi_j$:

    (f, \phi_j) = \sum_{i=1}^{N} c_i (\phi_i, \phi_j) = \sum_{i=1}^{N} c_i \delta_{ij} = c_j.    (5.29)

The expansion coefficients $c_i$ are called the Fourier coefficients, and the series is called the generalized Fourier series. The generalized Fourier series thus has the property that the
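The least squares character of the coefficients (5.28) can be seen numerically: adding terms to the series can only decrease the mean square error (5.25). A minimal sketch, assuming NumPy, with f(x) = x expanded in the orthonormal basis $\sin(nx)/\sqrt{\pi}$ on $[-\pi, \pi]$:

```python
import numpy as np

# Generalized Fourier coefficients c_i = (f, phi_i), Eq. 5.28, for the
# orthonormal basis phi_n(x) = sin(nx)/sqrt(pi) on [-pi, pi], f(x) = x.
x = np.linspace(-np.pi, np.pi, 20001)
dx = x[1] - x[0]

def inner(u, v):
    w = u * v
    return float((w[:-1] + w[1:]).sum() * dx / 2)   # trapezoidal rule

f = x
errors = []
for N in (1, 2, 4, 8):
    approx = np.zeros_like(x)
    for n in range(1, N + 1):
        phi = np.sin(n * x) / np.sqrt(np.pi)
        approx += inner(f, phi) * phi               # c_n * phi_n(x)
    errors.append(inner(f - approx, f - approx))    # mean square error M_N

print(errors[0] > errors[1] > errors[2] > errors[3])  # True: M_N decreases
```

Each added (nonzero) term lowers the error by exactly the square of its coefficient, which is the content of Eq. 5.35 below.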

series representation of a function minimizes the mean square error between the series and the function.

The formula for the generalized Fourier coefficient $c_i$ in Eq. 5.28 can be interpreted as the projection of the function f(x) on the ith basis function $\phi_i$. This interpretation is analogous to the expansion of a vector in terms of the unit basis vectors in the coordinate directions. For example, in three dimensions, the vector v is, in Cartesian coordinates,

    v = v_x e_x + v_y e_y + v_z e_z,    (5.30)

where the components $v_x$, $v_y$, and $v_z$ are the projections of v on the coordinate basis vectors $e_x$, $e_y$, and $e_z$, i.e.,

    v_x = v \cdot e_x, \quad v_y = v \cdot e_y, \quad v_z = v \cdot e_z.    (5.31)

These components are analogous to the Fourier coefficients $c_i$.

From Eq. 5.25, the mean square error after summing N terms of the series is

    M_N = (f, f) - 2 \sum_{i=1}^{N} c_i (f, \phi_i)
          + \int_a^b \left( \sum_{i=1}^{N} c_i \phi_i \right) \left( \sum_{j=1}^{N} c_j \phi_j \right) dx
        = (f, f) - 2 \sum_{i=1}^{N} c_i^2 + \sum_{i=1}^{N} \sum_{j=1}^{N} c_i c_j (\phi_i, \phi_j)
        = (f, f) - \sum_{i=1}^{N} c_i^2.    (5.32)

Since, by definition, $M_N \ge 0$, this last equation implies

    \sum_{i=1}^{N} c_i^2 \le (f, f),    (5.33)

which is referred to as Bessel's Inequality. Since the right-hand side of this inequality (the square of the norm) is fixed for a given function, and each term of the left-hand side is non-negative, a necessary condition for a Fourier series to converge is that $c_i \to 0$ as $i \to \infty$.

From Eq. 5.32, we can deduce some additional properties of the Fourier series. The error after summing N terms of the series is

    M_N = (f, f) - \sum_{i=1}^{N} c_i^2,    (5.34)

so that the error after N + 1 terms is

    M_{N+1} = (f, f) - \sum_{i=1}^{N+1} c_i^2 = M_N - c_{N+1}^2.    (5.35)

Thus, higher order approximations are better than lower order approximations, in the sense that adding a new term in the series decreases the error (if that new term is nonzero).

The set of basis functions $\phi_i(x)$ is defined as a complete set if

    \lim_{N \to \infty} M_N = \lim_{N \to \infty} \int_a^b \left[ f(x) - \sum_{i=1}^{N} c_i \phi_i(x) \right]^2 dx = 0.    (5.36)

If this equation holds, the Fourier series is said to converge in the mean. Having a complete set means that the set of basis functions $\phi_i(x)$ is not missing any members. Thus, we could alternatively define the set $\phi_i(x)$ as complete if there is no function with positive norm which is orthogonal to each basis function $\phi_i(x)$ in the set. It turns out that the set of basis functions in Eq. 5.7 forms a complete set in [-L, L].

The vector analog to completeness for sets of functions can be illustrated by observing that, to express an arbitrary three-dimensional vector in terms of a sum of coordinate basis vectors, three basis vectors are needed. A basis consisting of only two unit vectors is incomplete, since it is incapable of representing arbitrary vectors in three dimensions.

Note that convergence in the mean does not imply convergence pointwise. That is, it may not be true that

    \lim_{N \to \infty} \left| f(x) - \sum_{i=1}^{N} c_i \phi_i(x) \right| = 0    (5.37)

for all x. For example, the series representation of a discontinuous function f(x) will not have pointwise convergence at the discontinuity (Fig. 16).

Figure 16: A Function with a Jump Discontinuity.

5.3 Fourier Expansions Using a Polynomial Basis

Recall that, in the generalized Fourier series, we represent the function f(x) with the expansion

    f(x) = \sum_{i=1}^{N} c_i \phi_i(x),    (5.38)

although the equality is approximate if N is finite. Let the basis functions be

    \{\phi_i\} = \{1, x, x^2, x^3, \ldots\},    (5.39)

which is a set of independent, nonorthogonal functions. If we take the inner product of both sides of Eq. 5.38 with $\phi_j$, we obtain

    (f, \phi_j) = \sum_{i=1}^{N} c_i (\phi_i, \phi_j),    (5.40)

where, in general, $(\phi_i, \phi_j) \ne 0$ for $i \ne j$, since the basis functions are nonorthogonal. Thus, Eq. 5.40 is a set of coupled equations which can be solved to yield $c_i$:

    (f, \phi_1) = (\phi_1, \phi_1) c_1 + (\phi_1, \phi_2) c_2 + (\phi_1, \phi_3) c_3 + \cdots
    (f, \phi_2) = (\phi_2, \phi_1) c_1 + (\phi_2, \phi_2) c_2 + (\phi_2, \phi_3) c_3 + \cdots
    (f, \phi_3) = (\phi_3, \phi_1) c_1 + (\phi_3, \phi_2) c_2 + (\phi_3, \phi_3) c_3 + \cdots
      \vdots    (5.41)

or

    Ac = b,    (5.42)

where

    A_{ij} = (\phi_i, \phi_j), \quad
    c = \{c_1, c_2, c_3, \ldots, c_N\}, \quad
    b = \{(f, \phi_1), (f, \phi_2), (f, \phi_3), \ldots, (f, \phi_N)\}.    (5.43)

If we assume the domain $0 \le x \le 1$ and $\phi_i(x) = x^{i-1}$, we obtain

    A_{ij} = \int_0^1 x^{i-1} x^{j-1} \, dx = \int_0^1 x^{i+j-2} \, dx
           = \left. \frac{x^{i+j-1}}{i+j-1} \right|_0^1 = \frac{1}{i+j-1},    (5.45)

or

    A = \begin{bmatrix}
          1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \cdots \\
          \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \cdots \\
          \frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \cdots \\
          \frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7} & \cdots \\
          \vdots & \vdots & \vdots & \vdots & \ddots
        \end{bmatrix}.    (5.46)

There are several problems with this approach:

1. Since the set $\{\phi_i\}$ is not orthogonal, we must solve a set of coupled equations to obtain the coefficients $c_i$.

2. The coefficients $c_i$ depend on the number N of terms in the expansion. That is, to add more terms, all $c_i$ change, not just the new coefficients.

3. The resulting coefficient matrix, which is referred to as the Hilbert matrix, is terribly conditioned even for very small matrices.
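The ill conditioning mentioned in item 3 is easy to demonstrate. A minimal sketch, assuming NumPy:

```python
import numpy as np

# Condition number of the Hilbert matrix A_ij = 1/(i + j - 1), Eq. 5.46.
# Even for modest N the matrix is nearly singular, so solving Ac = b for
# the expansion coefficients is numerically hopeless.
def hilbert(n):
    i, j = np.indices((n, n))
    return 1.0 / (i + j + 1)   # zero-based i, j, so this is 1/(i + j - 1)

for n in (4, 8, 12):
    print(n, np.linalg.cond(hilbert(n)))   # grows by orders of magnitude
```

For n = 12 the condition number is already near the reciprocal of double-precision machine epsilon, meaning essentially all digits of the computed coefficients can be lost.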

These problems can be eliminated if we switch to a set of orthogonal polynomials, which can be found using the Gram-Schmidt process. To facilitate the application of Gram-Schmidt to functions, we recall the Gram-Schmidt algorithm for vectors. Given a set of independent, nonorthogonal vectors $a^{(1)}, a^{(2)}, a^{(3)}, \ldots$, this procedure computes a set of orthonormal vectors $q^{(1)}, q^{(2)}, q^{(3)}, \ldots$, that span the same space. For example, the first two orthonormal vectors are

    q^{(1)} = \frac{a^{(1)}}{|a^{(1)}|}, \qquad
    q^{(2)} = \frac{a^{(2)} - (a^{(2)} \cdot q^{(1)}) q^{(1)}}{|a^{(2)} - (a^{(2)} \cdot q^{(1)}) q^{(1)}|},    (5.47)

as illustrated in Fig. 17.

Figure 17: The First Two Vectors in Gram-Schmidt.

Let $\phi_i(x)$ denote the nonorthogonal set of basis functions

    \phi_0(x) = 1, \quad \phi_1(x) = x, \quad \phi_2(x) = x^2, \ldots,    (5.49)

where we have chosen to count the functions starting from zero. We let $P_i(x)$ denote the set of orthogonal basis functions to be determined. These functions will not be required to have unit norm. For convenience, we choose the domain $-1 \le x \le 1$, in which case

    (\phi_i, \phi_j) = 0, \quad i + j = \text{odd},    (5.50)

since the integral of an odd power of x over symmetrical limits is zero (i.e., the integrand is an odd function of x). Thus, by analogy with Gram-Schmidt for vectors, we obtain

    P_0(x) = \phi_0(x) = 1,    (5.51)

    P_1(x) = \phi_1(x) - \frac{(\phi_1, P_0)}{(P_0, P_0)} P_0(x) = x - \frac{(x, 1)}{(1, 1)} \cdot 1 = x - 0 = x,    (5.52)

    P_2(x) = \phi_2(x) - \frac{(\phi_2, P_0)}{(P_0, P_0)} P_0(x) - \frac{(\phi_2, P_1)}{(P_1, P_1)} P_1(x)
           = x^2 - \frac{(x^2, 1)}{(1, 1)} \cdot 1 - \frac{(x^2, x)}{(x, x)} \cdot x,    (5.53)

where

    (x^2, 1) = \int_{-1}^{1} x^2 \, dx = \frac{2}{3}, \qquad
    (1, 1) = \int_{-1}^{1} dx = 2, \qquad
    (x^2, x) = 0.

Thus,

    P_2(x) = x^2 - \frac{1}{3},    (5.54)

and so on. These polynomials, called Legendre polynomials, are orthogonal over [-1, 1]. Because of their orthogonality, the Legendre polynomials are a better choice for polynomial expansions than the polynomials of the Taylor series.
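The computation in Eqs. 5.51 to 5.54 can be mimicked numerically by running Gram-Schmidt on sampled monomials. A minimal sketch, assuming NumPy and a trapezoidal-rule inner product:

```python
import numpy as np

# Gram-Schmidt on {1, x, x^2} over [-1, 1], following Eqs. 5.51-5.54.
x = np.linspace(-1.0, 1.0, 20001)
dx = x[1] - x[0]

def inner(u, v):
    w = u * v
    return float((w[:-1] + w[1:]).sum() * dx / 2)

phi = [np.ones_like(x), x, x**2]      # nonorthogonal monomial basis
P = []
for p in phi:
    for q in P:
        p = p - inner(p, q) / inner(q, q) * q   # subtract projection on q
    P.append(p)

# P[2] should reproduce the Legendre-type polynomial x^2 - 1/3
print(np.max(np.abs(P[2] - (x**2 - 1.0 / 3.0))))  # ~0 (quadrature error only)
```

The same loop extended to higher monomials generates (up to scale) the higher Legendre polynomials.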

With the basis functions $\phi_i$ given by the Legendre polynomials, Eq. 5.40 becomes

    (f, \phi_j) = \sum_{i=1}^{N} c_i (\phi_i, \phi_j) = c_j (\phi_j, \phi_j) \quad \text{(no sum on j)},    (5.55)

since the basis functions are orthogonal, i.e., each equation in Eq. 5.40 has only one nonzero term on the right-hand side (the term i = j), and the system is diagonal.

5.4 Similarity of Fourier Series With Least Squares Fitting

In the least squares problem, given a finite number m of points $(x_i, f_i)$ and the n-term fitting function

    p(x) = a_i \phi_i(x),    (5.56)

where the summation convention is used, so that a summation is implied over repeated indices, the normal equations are

    [\phi_i(x_k) \phi_j(x_k)] a_j = f_j \phi_i(x_j),    (5.57)

where the order of the resulting system is n, the number of basis functions. In the Fourier series problem, given a function f(x) and the generalized Fourier series

    f(x) = c_i \phi_i(x),    (5.58)

the coefficients are determined from

    (\phi_i, \phi_j) c_j = (f, \phi_i),    (5.59)

where the summation convention is used again. Note the similarities between Eqs. 5.57 and 5.59. These two equations are essentially the same, except that Eq. 5.57 is the discrete form of Eq. 5.59. Also, if the basis functions in Eq. 5.59 are orthogonal, Eq. 5.59 is a diagonal system.

6 Eigenvalue Problems

The eigenvalue problem considers, for a square matrix A of order n, whether there exists a vector $x \ne 0$ and a number $\lambda$ such that

    Ax = \lambda x,    (6.1)

where $\lambda$ is called an eigenvalue of A, and x is the corresponding eigenvector. Eq. 6.1 can alternatively be written in the form

    Ax = \lambda I x,    (6.2)

where I is the identity matrix, or

    (A - \lambda I) x = 0,    (6.3)

which is a system of n linear algebraic equations.

If the coefficient matrix $A - \lambda I$ were nonsingular, the unique solution of Eq. 6.3 would be x = 0, which is not of interest. Thus, to obtain a nonzero solution of the eigenvalue problem, $A - \lambda I$ must be singular, which implies that

    \det(A - \lambda I) = \begin{vmatrix}
      a_{11} - \lambda & a_{12} & a_{13} & \cdots & a_{1n} \\
      a_{21} & a_{22} - \lambda & a_{23} & \cdots & a_{2n} \\
      a_{31} & a_{32} & a_{33} - \lambda & \cdots & a_{3n} \\
      \vdots & \vdots & \vdots & \ddots & \vdots \\
      a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} - \lambda
    \end{vmatrix} = 0.    (6.4)

We note that this determinant, when expanded, yields a polynomial of degree n referred to as the characteristic polynomial associated with the matrix. The equation obtained by equating this polynomial with zero is referred to as the characteristic equation for the matrix. Since a polynomial of degree n has n roots, A has n eigenvalues (not necessarily real). The eigenvalues are thus the roots of the characteristic polynomial.

Eigenvalue problems arise in many physical applications, including free vibrations of mechanical systems, buckling of structures, and the calculation of principal axes of stress, strain, and inertia.

6.1 Example 1: Mechanical Vibrations

Consider the undamped two-degree-of-freedom mass-spring system shown in Fig. 18.

Figure 18: 2-DOF Mass-Spring System. (Springs $k_1$ and $k_2$; masses $m_1$ and $m_2$; displacements $u_1(t)$ and $u_2(t)$; applied force $f_2(t)$.)

We let $u_1$ and $u_2$ denote the displacements from the equilibrium of the two masses $m_1$ and $m_2$. The stiffnesses of the two springs are $k_1$ and $k_2$. For the purposes of computation, let $k_1 = k_2 = 1$ and $m_1 = m_2 = 1$. From Newton's second law of motion (F = ma), the equations of motion of this system are

    \ddot{u}_1 + 2u_1 - u_2 = 0
    \ddot{u}_2 - u_1 + u_2 = f_2,    (6.5)

where dots denote differentiation with respect to the time t. In matrix notation, this system can be written

    \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
    \begin{Bmatrix} \ddot{u}_1 \\ \ddot{u}_2 \end{Bmatrix} +
    \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}
    \begin{Bmatrix} u_1 \\ u_2 \end{Bmatrix} =
    \begin{Bmatrix} 0 \\ f_2 \end{Bmatrix}    (6.6)

or

    M \ddot{u} + K u = F,    (6.7)

where

    M = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad
    K = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}, \quad
    F = \begin{Bmatrix} 0 \\ f_2 \end{Bmatrix}.    (6.8)

These three matrices are referred to as the system's mass, stiffness, and force matrices, respectively.

With zero applied force (the free undamped vibration problem),

    M \ddot{u} + K u = 0.    (6.9)

We look for nonzero solutions of this equation in the form

    u = u_0 \cos \omega t.    (6.10)

That is, we look for solutions of Eq. 6.9 which are sinusoidal in time and where both DOF oscillate with the same circular frequency $\omega$. The vector $u_0$ is the amplitude vector for the solution. The substitution of Eq. 6.10 into Eq. 6.9 yields

    -M \omega^2 u_0 \cos \omega t + K u_0 \cos \omega t = 0.    (6.11)

Since this equation must hold for all time t,

    (-\omega^2 M + K) u_0 = 0 \quad \text{or} \quad K u_0 = \omega^2 M u_0 \quad \text{or} \quad M^{-1} K u_0 = \omega^2 u_0.    (6.12)

This is an eigenvalue problem in standard form if we define $A = M^{-1} K$ and $\omega^2 = \lambda$:

    A u_0 = \lambda u_0 \quad \text{or} \quad (A - \lambda I) u_0 = 0.    (6.16)

For this problem,

    A = M^{-1} K = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
                   \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}
                 = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}.    (6.17)

Since we want the eigenvalues and eigenvectors of A, it follows from Eq. 6.16 that

    \det(A - \lambda I) = \begin{vmatrix} 2 - \lambda & -1 \\ -1 & 1 - \lambda \end{vmatrix}
                        = (2 - \lambda)(1 - \lambda) - 1 = \lambda^2 - 3\lambda + 1 = 0.    (6.18)

Thus,

    \lambda = \frac{3 \pm \sqrt{5}}{2} \approx 0.382 \ \text{and} \ 2.618,    (6.19)

and the circular frequencies $\omega$ (the square root of $\lambda$) are

    \omega_1 = \sqrt{\lambda_1} = 0.618, \qquad \omega_2 = \sqrt{\lambda_2} = 1.618.    (6.20)

Here, $\omega_i$ is a natural frequency of the system (in radians per second), and $u^{(i)}$ is the corresponding mode shape. Note that natural frequencies are not commonly expressed in radians per second, but in cycles per second or Hertz (Hz), where $\omega = 2\pi f$. The lowest natural frequency for a system is referred to as the fundamental frequency for the system.

To obtain the eigenvectors, we solve Eq. 6.16 for each eigenvalue found. For the first eigenvalue, $\lambda = 0.382$, we obtain

    \begin{bmatrix} 1.618 & -1 \\ -1 & 0.618 \end{bmatrix}
    \begin{Bmatrix} u_1 \\ u_2 \end{Bmatrix} = \begin{Bmatrix} 0 \\ 0 \end{Bmatrix},    (6.21)

which is a redundant system of equations satisfied by

    u^{(1)} = \begin{Bmatrix} 0.618 \\ 1.000 \end{Bmatrix}.    (6.22)

The superscript indicates that this is the eigenvector associated with the first eigenvalue. Note that, since the eigenvalue problem is a homogeneous problem, any nonzero multiple of $u^{(1)}$ is also an eigenvector. Physically, for Mode 1, the two masses are vibrating in-phase, and, as expected, $m_2$ has the larger displacement.

For the second eigenvalue, $\lambda = 2.618$, we obtain

    \begin{bmatrix} -0.618 & -1 \\ -1 & -1.618 \end{bmatrix}
    \begin{Bmatrix} u_1 \\ u_2 \end{Bmatrix} = \begin{Bmatrix} 0 \\ 0 \end{Bmatrix},    (6.23)

a solution of which is

    u^{(2)} = \begin{Bmatrix} 1 \\ -0.618 \end{Bmatrix}.    (6.24)

Physically, for Mode 2, the two masses are vibrating out-of-phase. The mode shapes for this example are sketched in Fig. 19. An n-DOF system has n natural frequencies and mode shapes.

Figure 19: Mode Shapes for 2-DOF Mass-Spring System. (a) Undeformed shape; (b) Mode 1 (masses in-phase); (c) Mode 2 (masses out-of-phase).

The determinant approach used above is fine for small matrices but not well-suited to numerical computation involving large matrices.
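For a matrix this small, a library eigensolver reproduces the hand calculation directly. A minimal sketch, assuming NumPy:

```python
import numpy as np

# Free vibration of the 2-DOF system: eigenproblem A u0 = lambda u0, A = M^-1 K.
M = np.eye(2)
K = np.array([[2.0, -1.0],
              [-1.0, 1.0]])
A = np.linalg.solve(M, K)          # M^-1 K

lam, U = np.linalg.eig(A)
order = np.argsort(lam)            # sort eigenvalues ascending
lam, U = lam[order], U[:, order]

print(lam)                         # (3 -/+ sqrt(5))/2, i.e. about 0.382 and 2.618
print(np.sqrt(lam))                # circular frequencies, about 0.618 and 1.618

# Mode 1, rescaled so that u2 = 1 (eigenvectors are defined only up to scale):
print(U[:, 0] / U[1, 0])           # about [0.618, 1.000], the in-phase mode
```

Because eigenvectors are unique only up to a multiplicative constant, the solver's normalization differs from the text's; rescaling makes the comparison direct.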

6.2 Properties of the Eigenvalue Problem

There are several useful properties associated with the eigenvalue problem:

Property 1. Eigenvectors are unique only up to an arbitrary multiplicative constant. We note that Eq. 6.1 is a homogeneous equation. From the eigenvalue problem, if x is a solution, $\alpha x$ is also a solution for any nonzero scalar $\alpha$.

Property 2. The eigenvalues of a triangular matrix are the diagonal entries. For example, let

    A = \begin{bmatrix} 1 & 4 & 5 \\ 0 & 4 & 7 \\ 0 & 0 & 9 \end{bmatrix},    (6.25)

for which the characteristic equation is

    \det(A - \lambda I) = \begin{vmatrix} 1 - \lambda & 4 & 5 \\ 0 & 4 - \lambda & 7 \\ 0 & 0 & 9 - \lambda \end{vmatrix}
                        = (1 - \lambda)(4 - \lambda)(9 - \lambda) = 0,    (6.26)

which implies $\lambda$ = 1, 4, 9. Although the eigenvalues of a triangular matrix are obvious, it turns out that the eigenvectors are not obvious.

Property 3. The eigenvalues of a diagonal matrix are the diagonal entries, and the eigenvectors are

    \begin{Bmatrix} 1 \\ 0 \\ 0 \\ \vdots \end{Bmatrix}, \
    \begin{Bmatrix} 0 \\ 1 \\ 0 \\ \vdots \end{Bmatrix}, \
    \begin{Bmatrix} 0 \\ 0 \\ 1 \\ \vdots \end{Bmatrix}, \ldots.    (6.27)

For example, let

    A = \begin{bmatrix} 11 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 9 \end{bmatrix}.    (6.28)

For $\lambda_1 = 11$,

    A x^{(1)} = \begin{bmatrix} 11 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 9 \end{bmatrix}
                \begin{Bmatrix} 1 \\ 0 \\ 0 \end{Bmatrix}
              = \begin{Bmatrix} 11 \\ 0 \\ 0 \end{Bmatrix}
              = 11 \begin{Bmatrix} 1 \\ 0 \\ 0 \end{Bmatrix} = \lambda_1 x^{(1)}.    (6.29)

Similarly,

    A x^{(2)} = 4 \begin{Bmatrix} 0 \\ 1 \\ 0 \end{Bmatrix} = \lambda_2 x^{(2)}, \qquad
    A x^{(3)} = 9 \begin{Bmatrix} 0 \\ 0 \\ 1 \end{Bmatrix} = \lambda_3 x^{(3)}.    (6.30, 6.31)

Property 4. The sum of the eigenvalues of A is equal to the trace of A, i.e.,

    \sum_{i=1}^{n} \lambda_i = \operatorname{tr} A.    (6.32)

(The trace of a matrix is defined as the sum of the main diagonal terms.) To prove this property, we first note, for a matrix of order n, that

    \det(A - \lambda I) = \begin{vmatrix}
      a_{11} - \lambda & a_{12} & a_{13} & \cdots & a_{1n} \\
      a_{21} & a_{22} - \lambda & a_{23} & \cdots & a_{2n} \\
      a_{31} & a_{32} & a_{33} - \lambda & \cdots & a_{3n} \\
      \vdots & \vdots & \vdots & \ddots & \vdots \\
      a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} - \lambda
    \end{vmatrix}
    = (-\lambda)^n + (a_{11} + a_{22} + \cdots + a_{nn})(-\lambda)^{n-1} + \cdots.    (6.33)

For example, the cofactor of $a_{12}$ deletes two matrix terms involving $\lambda$, so that the largest power of $\lambda$ from that cofactor is n - 2. No other products in the determinant contribute to the $(-\lambda)^{n-1}$ term. On the other hand, since $\det(A - \lambda I)$ is a polynomial of degree n, we can write it in the factored form

    \det(A - \lambda I) = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_n - \lambda)
    = (-\lambda)^n + (\lambda_1 + \lambda_2 + \cdots + \lambda_n)(-\lambda)^{n-1} + \cdots + \lambda_1 \lambda_2 \lambda_3 \cdots \lambda_n.    (6.34)

Since the polynomials in Eqs. 6.33 and 6.34 must be identically equal for all $\lambda$, we equate the coefficients of the $(-\lambda)^{n-1}$ terms to obtain

    \lambda_1 + \lambda_2 + \cdots + \lambda_n = a_{11} + a_{22} + \cdots + a_{nn},    (6.35)

which is the desired result.

Property 5. The product of the eigenvalues equals the determinant of A, i.e.,

    \lambda_1 \lambda_2 \cdots \lambda_n = \det A.    (6.36)

This property results immediately from Eq. 6.34 by setting $\lambda = 0$.

Property 6. If the eigenvectors of A are linearly independent, and a matrix S is formed whose columns are the eigenvectors, then the matrix product

    S^{-1} A S = \Lambda = \begin{bmatrix}
      \lambda_1 & & & \\
      & \lambda_2 & & \\
      & & \ddots & \\
      & & & \lambda_n
    \end{bmatrix}    (6.37)
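Properties 4 and 5 are easy to spot-check numerically. A minimal sketch, assuming NumPy; the test matrix is arbitrary:

```python
import numpy as np

# Property 4: sum of eigenvalues = trace.  Property 5: product = determinant.
A = np.array([[2.0, -1.0, 0.0],
              [1.0,  3.0, 1.0],
              [0.0,  2.0, 4.0]])
lam = np.linalg.eigvals(A)

print(np.isclose(lam.sum(), np.trace(A)))               # True
print(np.isclose(np.prod(lam), np.linalg.det(A)))       # True
```

Both checks hold even when some eigenvalues are complex, since complex eigenvalues of a real matrix occur in conjugate pairs and their imaginary parts cancel in the sum and product.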

is diagonal. This property is proved in block form as follows:

    A S = A \begin{bmatrix} x^{(1)} & x^{(2)} & \cdots & x^{(n)} \end{bmatrix}
        = \begin{bmatrix} A x^{(1)} & A x^{(2)} & \cdots & A x^{(n)} \end{bmatrix}
        = \begin{bmatrix} \lambda_1 x^{(1)} & \lambda_2 x^{(2)} & \cdots & \lambda_n x^{(n)} \end{bmatrix}
        = \begin{bmatrix} x^{(1)} & x^{(2)} & \cdots & x^{(n)} \end{bmatrix} \Lambda = S \Lambda.    (6.38)

Since the columns of S are independent, S is invertible, and

    S^{-1} A S = \Lambda \quad \text{or} \quad A = S \Lambda S^{-1}.    (6.39)

It turns out that a matrix with repeated eigenvalues cannot always be diagonalized.

Property 7. Distinct eigenvalues yield independent eigenvectors. To prove this property, we consider two eigensolutions:

    A x^{(1)} = \lambda_1 x^{(1)}, \qquad A x^{(2)} = \lambda_2 x^{(2)}.    (6.40, 6.41)

To show that $x^{(1)}$ and $x^{(2)}$ are independent, we must show that

    c_1 x^{(1)} + c_2 x^{(2)} = 0    (6.42)

implies $c_1 = c_2 = 0$. We first multiply Eq. 6.42 by A to obtain

    c_1 A x^{(1)} + c_2 A x^{(2)} = c_1 \lambda_1 x^{(1)} + c_2 \lambda_2 x^{(2)} = 0.    (6.43)

We then multiply Eq. 6.42 by $\lambda_2$ to obtain

    c_1 \lambda_2 x^{(1)} + c_2 \lambda_2 x^{(2)} = 0.    (6.44)

If we subtract these last two equations, we obtain

    c_1 (\lambda_1 - \lambda_2) x^{(1)} = 0,    (6.45)

which implies $c_1 = 0$, since the eigenvalues are distinct ($\lambda_1 \ne \lambda_2$). Similarly, $c_2 = 0$, and the eigenvectors are independent. From the last two properties, we conclude that a matrix with distinct eigenvalues can always be diagonalized.

Property 8. The eigenvalues and eigenvectors of a real, symmetric matrix are real. To prove this property, we consider the eigenvalue problem

    A x = \lambda x,    (6.46)

(6.51) for x. In engineering applications. to show that the generalized eigenvalue problem has only real eigenvalues. so this property is of great practical signiﬁcance. p. λ is real). λx∗T x = λ∗ xT x∗ . xT Ax∗ = λ∗ xT x∗ . The eigenvectors are also real.46 by x∗T (the conjugate transpose) and Eq. and principal stresses. given λ. which must be real. symmetric matrix. since they are the transposes of each other. it turns out that.53) √ for which Eq. buckling loads in elastic stability problems. (6.49) where the two left-hand sides are equal. 82). and both are scalars (1 × 1 matrices). (6. including the natural frequencies of vibration and mode shapes (the eigenvectors) of undamped mechanical systems. various physical quantities of interest. As a result. if B is not only real and symmetric but also positive deﬁnite (a term to be deﬁned later in §6. 6. (6. the form of the eigenvalue problem that is of greater interest than Eq. Having A and B real and symmetric is not suﬃcient. Since. since they are obtained by solving the equation (A − λI)x = 0 (6. symmetric matrix with distinct eigenvalues are mutually orthogonal.46 is the generalized eigenvalue problem Ax = λBx. 6.52) where both A and B are real and symmetric. 6.47 by xT to obtain x∗T Ax = λx∗T x. If we take the complex conjugate of both sides of this equation. 6. as can be seen from the example A= 1 0 0 −1 . Property 9.54) 71 . in engineering applications.48) (6. We multiply Eq. Matrix formulations of engineering problems generally yield symmetric coeﬃcient matrices.50) where x∗T x = xT x∗ = 0 implies λ = λ∗ (i.. However.e.where A is a real. The eigenvectors of a real. The goal is to show that λ = λ∗ . the generalized eigenvalue problem has real eigenvalues [10]. (6. B is usually positive deﬁnite. the generalized eigenvalue problems that arise in engineering have real eigenvalues. we consider from the outset the generalized eigenvalue problem Ax = λBx. 
To prove this property.8.47) where the asterisk (*) denotes the complex conjugate. (6. where i = −1. Thus.52 has imaginary eigenvalues ±i. we obtain Ax∗ = λ∗ x∗ . as is. B= 0 1 1 0 . are guaranteed to be real.

56) xT Ax .59) (6. if the eigenvalues are distinct (λ1 = λ2 ). . Ax(1) = λ1 Bx(1) . we note that.57) (6. =  x Ax  . with the matrix S of eigenvectors deﬁned as S= it follows that   x(1)T  x(2)T ST AS =  . .61) (6. 6. .where A and B are both real.  . x(2)T Ax(1) = 0. In mechanical vibrations. Thus. from which it follows that the scalar λ1 x(2)T Bx(1) = x(2)T Ax(1) = x(1)T Ax(2) = λ2 x(1)T Bx(2) = λ2 x(2)T Bx(1) . A represents the stiﬀness matrix. 72 . . (6. (6. (λ1 − λ2 )x(2)T Bx(1) = 0. 6. for a given vibration mode. respectively. Also. x(2)T Bx(1) = 0.60) The eigenvectors are said to be orthogonal with respect to the matrices A and B. (6. and λ = ω 2 is the square of the circular frequency. A similar expression applies for B. x(1) x(2) · · · .  (6.58) (6. The orthogonality relations proved here also imply that λ= ST AS = ΛST BS. In mechanical vibrations.62) xT Bx which is referred to as the Rayleigh quotient. Eq. Ax(2) = λ2 Bx(2) .65) where the oﬀ-diagonal terms are zero by orthogonality. .63) where S is the matrix of eigenvectors (in the columns).. and. To prove Eq.37. the numerator and denominator are the generalized stiﬀness and mass. .54 implies the scalar equation xT Ax = λxT Bx or (6. symmetric matrices. 6. For x an eigenvector.64)      A x(1) x(2) · · ·  x(1)T Ax(1) x(1)T Ax(2) · · · (1)  (2)T x(2)T Ax(2) · · ·  . (6. For two eigenvalues λ1 and λ2 . and B represents the mass matrix. Λ is the diagonal matrix of eigenvalues deﬁned in Eq. and ST AS and ST BS are both diagonal matrices.55) (6.63. since Ax = λBx. . .

referred to as the spectral theorem. s2 . S is also orthogonal (i.e. S−1 = ST ).70) where the diagonal components of the matrix are the normal (or direct) stresses. symmetric matrix can be factored into A = SΛST . (6. Eq. and the oﬀ-diagonal components are the shear stresses..72) ei and ej are unit vectors in the coordinate directions. and ST AS = ΛST IS = Λ (6. σ33 The matrix of stress components in 3-D elasticity  σ11 σ12 σ =  σ21 σ22 σ31 σ32 (6.. which is a symmetric matrix. and the eigenvectors are mutually orthogonal.e. 6. 73 .71) where Rij = ei · ej . and A = SΛST . (6. Recall that an orthogonal transformation of coordinates transforms σ according to the rule σ = RσRT . and Λ is a diagonal matrix of eigenvalues. B = I). If the orthogonal eigenvectors are normalized to unit length.68) where S is the matrix whose columns are the eigenvectors. (6.e.66) or A = SΛST . means that a real. then the matrix S is orthogonal. (6.67) This last result. (6. and R is an orthogonal matrix (RRT = I). s3 . Conversely. A key problem is to determine whether there is an orthogonal transformation of coordinates (i.3 Example 2: Principal Axes of Stress is a tensor of rank 2:  σ13 σ23  . This matrix is symmetric.39 implies that A can be written in the form A = SΛS−1 . a rotation of axes) x = Rx such that the stress tensor σ is diagonalized:     σ11 0 0 s1 0 0 σ =  0 σ22 0  =  0 s2 0  .69) 6. if the eigenvalues of a matrix A are real. (6.If Ax = λx (i. A is necessarily symmetric.73) 0 0 σ33 0 0 s3 where the diagonal stresses in this new coordinate system are denoted s1 . Since orthogonal eigenvectors are independent.. and the eigenvectors are normalized to unit length. where S has orthonormal eigenvectors in the columns.

e. since. The goal in solving Eq. Eq.74) (6. 6.79) which is referred to as the characteristic equation associated with the eigenvalue problem.78 (and eigenvalue problems in general) is to ﬁnd nonzero vectors v which satisfy Eq. for example. where   θ1 = σ11 + σ22 + σ33 = σii = tr σ 1 2 2 2 θ2 = σ22 σ33 + σ33 σ11 + σ11 σ22 − σ31 − σ12 − σ23 = 2 (σii σjj − σij σji )  θ3 = det σ. when expanded. yields (σ11 − s)[(σ22 − s)(σ33 − s) − σ23 σ32 ] − σ12 [σ21 (σ33 − s) − σ23 σ31 ] + σ13 [σ21 σ32 − σ31 (σ22 − s)] = 0 or − s3 + θ1 s2 − θ2 s + θ3 = 0. 6.82) (6. i. (6.78) Eq.75) (6. 74 (6. the original desire to ﬁnd a coordinate rotation which would transform the stress tensor to diagonal form reduces to seeking vectors v such that σv = sv. For nontrivial solutions (v = 0).71. The characteristic equation is a cubic polynomial in s. σRT = RT σ . RT = [v1 v2 v3 ]. and v the corresponding eigenvector. Geometrically.) Each column of this equation is σvi = si vi (no sum on i). 0 0 s3 (6.78 for some scalar s. 6.76) (The various matrices in this equation are conformable from a “block” point of view. in which case Eq. 6. when multiplied by the matrix σ.From Eq.83) (6. since the left-hand side is.. We now let vi denote the ith column of RT .78 is to ﬁnd nonzero vectors v which.78 is an eigenvalue problem with s the eigenvalue. 6. (6.77) Thus. (6. 6.80) (6. 6.78 is equivalent to the matrix system (σ − sI)v = 0.81) . the product of a 1 × 1 matrix with a 1 × 3 matrix. implying that det(σ − sI) = σ11 − s σ12 σ13 σ21 σ22 − s σ23 σ31 σ32 σ33 − s = 0. result in new vectors which are parallel to v.74 can be written in matrix form as   s1 0 0 σ[v1 v2 v3 ] = [v1 v2 v3 ]  0 s2 0  = [s1 v1 s2 v2 s3 v3 ]. the goal in solving Eq. the matrix σ − sI must be singular.

Thus. For this system. s2 .87) 0 0 3 0 −1 1 75 . Eq. The determination of principal axes of inertia in two dimensions in an eigenvalue problem. A geometrical example of a tensor of rank 2 is the inertia matrix Iij whose matrix elements are moments of inertia. are referred to as the principal stresses.86)  θ3 = s1 s2 s3 We note that the above theory for eigenvalues. since the (normalized) eigenvectors are columns of RT (rows of R). θ2 . (6.85) Thus. principal axes. Since the principal stresses are independent of the original coordinate system. (6.4 Computing Eigenvalues by Power Iteration Determinants are not a practical way to compute eigenvalues and eigenvectors except for very small systems (n = 2 or 3). the mass and stiﬀness matrices are     1 0 0 5 −2 0 3 −1  .82. Consider the 3-DOF system shown in Fig. strain is also a tensor of rank 2. must be invariant with respect to a coordinate rotation. and invariants is applicable to all tensors of rank 2. For example. θ1 . The stresses s1 . K =  −2 (6. 6. the three invariants have the same values in all coordinate systems. Row i of R consists of the direction cosines of the ith principal axis. Here we show a classical approach to solving the eigenproblem for large matrices. That is. We will illustrate the procedure with a mechanical vibrations example. M =  0 2 0  . where Rij = ei · ej . The resulting stress tensor is   s1 0 0 σ =  0 s2 0  . the stress invariants are   θ1 = s1 + s2 + s3 θ2 = s2 s3 + s3 s1 + s1 s2 (6. s3 . the coeﬃcients of the characteristic polynomial. and θ3 are referred to as the invariants of the stress tensor. The principal axes of stress are the eigenvectors of the stress tensor.84) 0 0 s3 and the coordinate axes of the coordinate system in which the stress tensor is diagonal is referred to as the principal axes of stress or the principal coordinates. since we did not use the fact that we were dealing speciﬁcally with stress. In principal coordinates. 
which are the three solutions of the characteristic equation.E k1 =3       ¡e ¡ e¡ ¡e ¡ u1 (t) k2 =2 ¡e ¡e ¡ e¡ e ¡ E u2 (t) k3 =1 ¡e ¡e ¡ e¡ e¡ E u3 (t) m1 =1 e e      m2 =2 e e      m3 =3 e e      Figure 20: 3-DOF Mass-Spring System. 20. 6.

The eigenvalue problem is

Ku = λMu (6.88)

or

Au = λu, (6.89)

where u is the vector of displacement amplitudes, and

A = M⁻¹K = | 1  0   0  | |  5 −2  0 |   |  5   −2    0  |
           | 0 1/2  0  | | −2  3 −1 | = | −1  3/2 −1/2  | . (6.90)
           | 0  0  1/3 | |  0 −1  1 |   |  0 −1/3  1/3  |

Notice that the product of two symmetric matrices need not be symmetric, even if the first matrix is diagonal. By definition, a solution (eigenvector) u of Eq. 6.89 is a vector such that Au is a scalar multiple of u, where the scalar multiple is the corresponding eigenvalue λ. Geometrically, an eigenvector of a matrix A is a vector which is transformed by A into a new vector with the same orientation but (possibly) different length; i.e., u is an eigenvector if Au is parallel to u. Thus, our approach to computing a solution of Eq. 6.89 will be to attempt to guess an eigenvector, check if that vector is transformed by A into a parallel vector, and, if it is not, try to improve upon the guess iteratively until convergence is achieved.

For the matrix A given in Eq. 6.90, let our initial guess for an eigenvector be

u(0) = (1, 0, 0)ᵀ, (6.91)

in which case

Au(0) = (5, −1, 0)ᵀ = 5(1.0, −0.2, 0.0)ᵀ = λ1u(1), (6.92)

where, to allow easy comparison of u(1) with u(0), we factored the largest component from the right-hand side vector. Since u(0) and u(1) are not parallel, u(0) is not an eigenvector, and our guess was wrong, as expected. To determine if u(1) is an eigenvector, we repeat this calculation using u(1):

Au(1) = (5.4, −1.3, 0.067)ᵀ = 5.4(1.00, −0.24, 0.01)ᵀ = λ2u(2). (6.93)

Since u(1) and u(2) are not parallel, we conclude that u(1) is not an eigenvector, and we iterate again. It is useful to summarize all the iterations:

u(0) = (1, 0, 0)
u(1) = (1.0, −0.2, 0.0)            λ: 5.0
u(2) = (1.00, −0.24, 0.01)         λ: 5.40
u(3) = (1.000, −0.249, 0.015)      λ: 5.48
u(4) = (1.000, −0.252, 0.016)      λ: 5.49
  ···
u(∞) = (1.0000, −0.2518, 0.0162)   λ: 5.5036

Under each iterate for the eigenvector is shown the corresponding iterate for the eigenvalue. Thus, we conclude that one eigenvalue of A is 5.5036, and the corresponding eigenvector is (1.0000, −0.2518, 0.0162)ᵀ. This iterative procedure is called the power method or power iteration.

We now wish to determine which of the three eigenvalues has been found. A symmetric system of order n has n eigenvectors which are orthogonal, and, in general, any vector of dimension n can be expanded in a linear combination of such vectors. Let the n eigenvectors of A be φ(1), φ(2), ..., φ(n), where |λ1| < |λ2| < ··· < |λn|. Then

u(0) = c1φ(1) + c2φ(2) + ··· + cnφ(n) (6.94)

and

u(1) = Au(0) = c1Aφ(1) + c2Aφ(2) + ··· + cnAφ(n) = c1λ1φ(1) + c2λ2φ(2) + ··· + cnλnφ(n). (6.95)

Similarly,

u(2) = c1λ1²φ(1) + c2λ2²φ(2) + ··· + cnλn²φ(n), (6.96)

and, in general, after r iterations,

u(r) = c1λ1^r φ(1) + c2λ2^r φ(2) + ··· + cnλn^r φ(n)
     = cnλn^r [(c1/cn)(λ1/λn)^r φ(1) + (c2/cn)(λ2/λn)^r φ(2) + ··· + φ(n)]. (6.97)

Since

|λi/λn| < 1 (6.98)

for all i < n, it follows that

u(r) → cnλn^r φ(n) as r → ∞. (6.99)

This last equation shows that power iteration converges to the nth eigenvector (the one corresponding to the largest eigenvalue).

However, the largest eigenvalue is rarely of interest in engineering applications. For example, in mechanical vibration problems, it is the low modes that are of interest; the higher modes are not physically meaningful if the original problem was a continuum, and discrete element models such as finite element models are necessarily low frequency models. In elastic stability problems, the buckling load factor turns out to be an eigenvalue, and it is the lowest buckling load which is normally of interest.
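The power iteration just described is easy to script. The sketch below (plain Python, not from the text) reproduces the hand calculation for the matrix A of Eq. 6.90, factoring out the largest component at each step exactly as above:

```python
# Power iteration sketch for A = M^-1 K (Eq. 6.90). Each step multiplies
# by A and then factors out the component of largest magnitude so that
# successive iterates are easy to compare.

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def power_iteration(A, v0, tol=1e-10, max_iter=200):
    v = v0[:]
    lam = 0.0
    for _ in range(max_iter):
        w = mat_vec(A, v)
        lam_new = max(w, key=abs)            # largest-magnitude component
        v_new = [x / lam_new for x in w]
        if all(abs(a - b) < tol for a, b in zip(v_new, v)):
            return lam_new, v_new
        lam, v = lam_new, v_new
    return lam, v

A = [[5.0, -2.0, 0.0],
     [-1.0, 1.5, -0.5],
     [0.0, -1.0 / 3.0, 1.0 / 3.0]]

lam, phi = power_iteration(A, [1.0, 0.0, 0.0])
# lam ≈ 5.5036 and phi ≈ (1.0000, -0.2518, 0.0162), matching the text
```

The normalization by the largest component is the same bookkeeping used in the worked iterations; normalizing by the Euclidean length would work equally well.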

6.5 Inverse Iteration

We recall that iteration with A in

Au = λu (6.100)

yields the largest eigenvalue and corresponding eigenvector of A. We could alternatively write the eigenvalue problem as

A⁻¹u = (1/λ)u, (6.101)

so that iteration with A⁻¹ would converge to the largest value of 1/λ and hence the smallest λ. This procedure is called inverse iteration. In practice, A⁻¹ is not found, since computing A⁻¹ is both expensive and unnecessary. Instead, since the same coefficient matrix A is used for each iteration, one can factor A = LU, save the factors, and, for each iteration, solve the system of equations with A as coefficient matrix:

(1/λ)u(k) = Au(k+1), (6.102)

where k is the iteration number. Each iteration requires an additional matrix multiplication and forward-backward substitution (FBS).

For the mechanical vibration problem, Eq. 6.102 becomes

Mu(k) = (1/λ)Ku(k+1), (6.103)

where k is the iteration number. K is factored once, the LU factors saved, and the FBS repeated for each iteration.

Consider the generalized eigenvalue problem

Ku = λMu, (6.104)

where K and M are both real, symmetric matrices. We can shift the origin of the eigenvalue scale with the transformation

λ = λ0 + µ, (6.105)

where λ0 is a specified shift. Eq. 6.104 then becomes

Ku = (λ0 + µ)Mu (6.106)

or

(K − λ0M)u = µMu, (6.107)

which is an eigenvalue problem with µ as the eigenvalue. This equation can be arranged for inverse iteration to obtain

Mu(k) = (1/µ)(K − λ0M)u(k+1), (6.108)

so that inverse iteration would converge to the smallest eigenvalue µ. Thus, by picking λ0, we can find the eigenvalue closest to the shift point λ0. The matrix which must be factored is K − λ0M. This procedure is referred to as inverse iteration with a shift. A practical application of inverse iteration with a shift would be the mechanical vibration problem in which one seeks the vibration modes closest to a particular frequency.
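A minimal sketch of inverse iteration with a shift (Eqs. 6.105–6.108) follows, in plain Python. A small Gaussian-elimination solver stands in for the factor-once, FBS-per-iteration machinery described above; `gauss_solve` and the other names are illustrative, not from the text. With shift λ0 = 0, the iteration converges to the smallest eigenvalue of the 3-DOF example:

```python
# Inverse iteration with a shift for Ku = lambda*Mu: form K - lam0*M,
# then repeatedly solve (K - lam0*M) w = M u. Converges to the
# eigenvalue closest to the shift lam0.

def gauss_solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def shifted_inverse_iteration(K, Mmat, lam0, iters=50):
    n = len(K)
    A = [[K[i][j] - lam0 * Mmat[i][j] for j in range(n)] for i in range(n)]
    u = [1.0] * n
    mu_inv = 1.0
    for _ in range(iters):
        rhs = [sum(Mmat[i][j] * u[j] for j in range(n)) for i in range(n)]
        w = gauss_solve(A, rhs)
        mu_inv = max(w, key=abs)        # estimate of 1/mu accumulates here
        u = [x / mu_inv for x in w]
    return lam0 + 1.0 / mu_inv, u       # lambda = lam0 + mu

K = [[5.0, -2.0, 0.0], [-2.0, 3.0, -1.0], [0.0, -1.0, 1.0]]
Mmat = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]

lam, u = shifted_inverse_iteration(K, Mmat, 0.0)   # shift 0 -> smallest lambda
```

In production code, a true LU factorization would be computed once and only the forward-backward substitution repeated, as the text emphasizes.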

[Figure 21: Geometrical Interpretation of Sweeping. The figure shows a trial vector w, the vector Mφ(1) with its unit vector Mφ(1)/|Mφ(1)|, and the projection of w onto Mφ(1).]

6.6 Iteration for Other Eigenvalues

We saw from the preceding sections that power iteration converges to the largest eigenvalue, and inverse iteration converges to the smallest eigenvalue. It is often of interest to ﬁnd other eigenvalues as well. Here we describe the modiﬁcation to the power iteration algorithm to allow convergence to the second largest eigenvalue. Consider again the eigenvalue problem Au = λu, (6.109)

where A = M⁻¹K, and assume we have already found the largest eigenvalue and its corresponding eigenvector φ(1). Let φ(2) denote the eigenvector corresponding to the second largest eigenvalue λ2. Thus,

Aφ(2) = λ2φ(2), (6.110)

where, by orthogonality,

φ(2)ᵀMφ(1) = 0 or φ(2) · Mφ(1) = 0. (6.111)

(The orthogonality relation has been written in both matrix and vector notations.) To find φ(2), given φ(1), we will perform another power iteration, but with the added requirement that, at each iteration, the vector used in the iteration is orthogonal to φ(1). For example, consider the vector w as a trial second eigenvector. Fig. 21 shows the plane of the two vectors w and Mφ(1). Since a unit vector in the direction of Mφ(1) is Mφ(1)/|Mφ(1)|, the length of the projection of w onto Mφ(1) is

w · Mφ(1)/|Mφ(1)|,

and the vector projection of w onto Mφ(1) is

[w · Mφ(1)/|Mφ(1)|] Mφ(1)/|Mφ(1)|.

Thus, a new trial vector v̂ which is orthogonal to Mφ(1) is

v̂ = w − [w · Mφ(1)/|Mφ(1)|] Mφ(1)/|Mφ(1)|. (6.112)

This procedure of subtracting from each trial vector (at each iteration) any component parallel to Mφ(1) is sometimes referred to as a sweeping procedure. This procedure is also equivalent to the Gram-Schmidt orthogonalization procedure discussed previously, except that here we do not require that v be a unit vector. Note that, even though A is used in the iteration, we must use either M or K in the orthogonality relation rather than A or the identity I, since, in general, A is not symmetric (e.g., Eq. 6.90), and, with a nonsymmetric A, there is no orthogonality of the eigenvectors with respect to A or I. If A happens to be symmetric (a special case of the symmetric generalized eigenvalue problem), then the eigenvectors would be orthogonal with respect to A and I. Thus, even though we may iterate using A, the orthogonality requires M or K weighting. M is chosen, since it usually contains more zeros than K and, for the example to follow, is diagonal.

To illustrate the sweeping procedure, we continue the mechanical vibrations example started earlier. Recall that, from power iteration,

A = M⁻¹K = |  5   −2    0  |       M = | 1 0 0 |       φ(1) = (1.0000, −0.2518, 0.0162)ᵀ, (6.113)
           | −1  3/2 −1/2  | ,         | 0 2 0 | ,
           |  0 −1/3  1/3  |           | 0 0 3 |

where

Mφ(1) = (1.0000, −0.5036, 0.0486)ᵀ, (6.114)

a vector whose length is |Mφ(1)| = 1.1207. Thus, the unit vector in the direction of Mφ(1) is

Mφ(1)/|Mφ(1)| = (0.8923, −0.4494, 0.0434)ᵀ. (6.115)

The iterations are most conveniently displayed with a spreadsheet:

i | w(i)                      | α = w(i)·Mφ(1)/|Mφ(1)| | w(i) − α Mφ(1)/|Mφ(1)|    | λi     | v(i)                      | Av(i) = w(i+1)
--|---------------------------|------------------------|---------------------------|--------|---------------------------|---------------------------
0 | (1, 0, 0)                 |  0.8923                | (0.2038, 0.4010, −0.0387) | 0.4010 | (0.5082, 1.0000, −0.0965) | (0.5410, 1.0401, −0.3655)
1 | (0.5410, 1.0401, −0.3655) | −0.0005                | (0.5414, 1.0399, −0.3655) | 1.0399 | (0.5206, 1.0000, −0.3515) | (0.6030, 1.1552, −0.4505)
2 | (0.6030, 1.1552, −0.4505) | −0.0006                | (0.6035, 1.1549, −0.4505) | 1.1549 | (0.5226, 1.0000, −0.3901) | (0.6130, 1.1725, −0.4634)
3 | (0.6130, 1.1725, −0.4634) | −0.0001                | (0.6131, 1.1725, −0.4634) | 1.1725 | (0.5229, 1.0000, −0.3952) | (0.6145, 1.1747, −0.4651)
4 | (0.6145, 1.1747, −0.4651) |  0.0002                | (0.6143, 1.1748, −0.4651) | 1.1748 | (0.5229, 1.0000, −0.3959) | (0.6145, 1.1751, −0.4653)
5 | (0.6145, 1.1751, −0.4653) |  0.0000                | (0.6145, 1.1751, −0.4653) | 1.1751 | (0.5229, 1.0000, −0.3960) | (0.6145, 1.1751, −0.4653)
6 | (0.6145, 1.1751, −0.4653) |  0.0000                | (0.6145, 1.1751, −0.4653) | 1.1751 | (0.5229, 1.0000, −0.3960) |
Thus, the second largest eigenvalue of A is 1.1751, and the corresponding eigenvector is (0.5229, 1.0000, −0.3960)ᵀ.
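The sweep-normalize-multiply cycle of the spreadsheet can be scripted directly. The sketch below (plain Python, not from the text) uses the same rounded φ(1) and Mφ(1) from Eqs. 6.113–6.114 and lands on the same second mode:

```python
# Power iteration with sweeping (Eq. 6.112): at every step, subtract
# from the trial vector its component along M*phi1, normalize by the
# largest component, then multiply by A.

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def sweep_iteration(A, Mphi1, w0, iters=40):
    norm = sum(x * x for x in Mphi1) ** 0.5
    e = [x / norm for x in Mphi1]              # unit vector along M*phi1
    w = w0[:]
    lam, v = 0.0, w0[:]
    for _ in range(iters):
        alpha = sum(wi * ei for wi, ei in zip(w, e))
        swept = [wi - alpha * ei for wi, ei in zip(w, e)]   # sweep
        lam = max(swept, key=abs)
        v = [x / lam for x in swept]
        w = mat_vec(A, v)
    return lam, v

A = [[5.0, -2.0, 0.0],
     [-1.0, 1.5, -0.5],
     [0.0, -1.0 / 3.0, 1.0 / 3.0]]
Mphi1 = [1.0000, -0.5036, 0.0486]              # M*phi1 from Eq. 6.114

lam2, phi2 = sweep_iteration(A, Mphi1, [1.0, 0.0, 0.0])
# lam2 ≈ 1.1751 and phi2 ≈ (0.5229, 1.0000, -0.3960)
```

Because the first mode is re-amplified by every multiplication with A, the sweep must be repeated at each iteration, exactly as in the spreadsheet.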

6.7 Similarity Transformations

Two matrices A and B are defined as similar if B = M⁻¹AM for some invertible matrix M. The transformation from A to B (or vice versa) is called a similarity transformation. The main property of interest is that similar matrices have the same eigenvalues. To prove this property, consider the eigenvalue problem

Ax = λx, (6.116)

and let

B = M⁻¹AM or, equivalently, A = MBM⁻¹. (6.117)

The substitution of Eq. 6.117 into Eq. 6.116 yields

MBM⁻¹x = λx (6.118)

or

B(M⁻¹x) = λ(M⁻¹x), (6.119)

which completes the proof: if x is an eigenvector of A, then M⁻¹x is an eigenvector of B, and λ is the eigenvalue. A similarity transformation can be interpreted as a change of basis, a transformation which can be thought of as simply a change of coordinates: the eigenvectors are transformed from x to M⁻¹x, but, since the eigenvalues (natural frequencies) do not change under the transformation, the system represented by A has not changed.

We recall from Property 6 of the eigenvalue problem (page 69) that, if a matrix M is formed with the eigenvectors of A in the columns, and the eigenvectors are independent,

M⁻¹AM = Λ = diag(λ1, λ2, ..., λn), (6.120)

where Λ is a diagonal matrix populated with the eigenvalues of A. Thus, the way to "simplify" a matrix (by diagonalizing it) is to find its eigenvectors. The eigenvalue problem is thus the basis for finding principal stresses and principal strains in elasticity and principal axes of inertia, and for mechanical vibrations, where the eigenvalue problem computes the natural frequencies and mode shapes of vibration.

6.8 Positive Definite Matrices

For a square matrix M, the scalar

Q = xᵀMx = x · Mx (6.121)

is called a quadratic form. In index notation, the quadratic form is

Q = Σi Σj Mij xi xj = M11x1² + M22x2² + ··· + (M12 + M21)x1x2 + (M13 + M31)x1x3 + ···. (6.122)

A matrix S is said to be symmetric if S = Sᵀ (i.e., Sij = Sji). A matrix A is said to be antisymmetric (or skew symmetric) if A = −Aᵀ (i.e., Aij = −Aji). For example, an antisymmetric matrix of order 3 is necessarily of the form

A = |   0    A12   A13 |
    | −A12    0    A23 | . (6.123)
    | −A13  −A23    0  |

Any square matrix M can be written as the unique sum of a symmetric and an antisymmetric matrix, M = S + A, where

S = Sᵀ = (1/2)(M + Mᵀ),  A = −Aᵀ = (1/2)(M − Mᵀ). (6.124)

For example, for a 3 × 3 matrix M,

M = | 3 5 7 |   | 3 3 8 |   |  0  2 −1 |
    | 1 2 8 | = | 3 2 7 | + | −2  0  1 | = S + A. (6.125)
    | 9 6 4 |   | 8 7 4 |   |  1 −1  0 |
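The split in Eq. 6.124 is two lines of code. The sketch below (plain Python) reproduces the example of Eq. 6.125 and also checks that the antisymmetric part contributes nothing to the quadratic form x · Mx; the test vector x is an arbitrary choice, not from the text:

```python
# Symmetric/antisymmetric split (Eq. 6.124) of the 3x3 example in
# Eq. 6.125, plus a check that x.A.x = 0 for the antisymmetric part.

M = [[3.0, 5.0, 7.0], [1.0, 2.0, 8.0], [9.0, 6.0, 4.0]]
n = len(M)

S = [[0.5 * (M[i][j] + M[j][i]) for j in range(n)] for i in range(n)]
A = [[0.5 * (M[i][j] - M[j][i]) for j in range(n)] for i in range(n)]

def quad_form(B, x):
    """Evaluate the quadratic form x^T B x."""
    return sum(x[i] * B[i][j] * x[j] for i in range(n) for j in range(n))

x = [1.0, -2.0, 3.0]   # arbitrary test vector
# S + A reproduces M, and x.A.x = 0, so x.M.x = x.S.x
```

This is the numerical content of Eqs. 6.126–6.127 in the next few paragraphs: only the symmetric part of M matters in a quadratic form.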

A square matrix M is defined as positive definite if xᵀMx > 0 for all x ≠ 0. Similarly, M is defined as positive semi-definite if xᵀMx ≥ 0 for all x ≠ 0. The antisymmetric part of a matrix does not contribute to the quadratic form: if A is antisymmetric, then

xᵀAx = (xᵀAx)ᵀ = xᵀAᵀx = −xᵀAx = 0 (6.126)

for all vectors x, since the scalar is equal to its own negative. Thus, for any square matrix M,

xᵀMx = xᵀSx, (6.127)

where S is the symmetric part of M.

Quadratic forms are of interest in engineering applications, since, physically, a quadratic form often corresponds to energy. For example, consider an elastic system acted upon by forces in static equilibrium (Fig. 22).

[Figure 22: Elastic System Acted Upon by Forces. Forces F1, F2, and F3 act on an elastic body.]

Let u denote the vector of displacements, and let F denote the vector of forces. If the forces and displacements are linearly related by generalized Hooke's law,

F = Ku, (6.128)

where K is the system stiffness matrix. The work W done by the forces F acting through the displacements u is then

W = (1/2)u · F = (1/2)u · Ku = (1/2)uᵀKu, (6.129)

which is a quadratic form. Since the work required to deform a mechanical system in equilibrium (and with sufficient constraints to prevent rigid body motion) is positive for any set of nonzero displacements,

W = (1/2)u · Ku > 0. (6.130)

That is, the stiffness matrix K for a mechanical system in equilibrium and constrained to prevent rigid body motion must be positive definite.

The main property of interest for positive definite matrices is that a matrix M is positive definite if, and only if, all the eigenvalues of M are positive. To prove this statement, we must prove two properties: (1) positive definiteness implies positive eigenvalues, and (2) positive eigenvalues imply positive definiteness. We first prove that positive definiteness implies positive eigenvalues. For the ith eigenvalue λi,

Mx(i) = λi x(i), (6.131)

where x(i) is the corresponding eigenvector. If we form the dot product of both sides of this equation with x(i), we obtain

0 < x(i) · Mx(i) = λi x(i) · x(i) = λi |x(i)|², (6.132)

which implies λi > 0, thus completing the first part of the proof.

We now prove that positive eigenvalues imply positive definiteness. Since, for a quadratic form, only the symmetric part of M matters, we can assume that M is symmetric. From Property 9 of the eigenvalue problem (page 71), the eigenvectors of M are mutually orthogonal, which implies that the eigenvectors are independent. We can therefore expand any vector x using an eigenvector basis:

x = Σi ci x(i), (6.133)

where x(i) is the ith eigenvector. Then,

Mx = Σi ci Mx(i) = Σi ci λi x(i), (6.134)

which implies that

x · Mx = (Σj cj x(j)) · (Σi ci λi x(i)) = Σi Σj ci cj λi x(i) · x(j). (6.135)

However, since the eigenvectors are mutually orthogonal,

x(i) · x(j) = 0 for i ≠ j, (6.136)

and

x · Mx = Σi ci² λi |x(i)|², (6.137)

which is positive if all λi > 0, thus completing the second part of the proof. This important property establishes the connection between positive definiteness and eigenvalues.

Two other consequences of positive definiteness are:

1. A positive definite matrix is non-singular, since Ax = 0 implies xᵀAx = 0 implies x = 0 implies A⁻¹ exists.
2. Gaussian elimination can be performed without pivoting.

6.9 Application to Differential Equations

Consider the ordinary differential equation

y′ = ay, (6.138)

where y(t) is an unknown function to be determined, y′(t) is its derivative, and a is a constant. All solutions of this equation are of the form

y(t) = y(0)e^{at}, (6.139)

where y(0) is the initial condition.

The generalization of Eq. 6.138 to systems of ordinary differential equations is

y1′ = a11y1 + a12y2 + ··· + a1nyn
y2′ = a21y1 + a22y2 + ··· + a2nyn
 ···
yn′ = an1y1 + an2y2 + ··· + annyn, (6.140)

where y1(t), y2(t), ..., yn(t) are the unknown functions to be determined, and the aij are constants. In matrix notation, this system can be written

y′ = Ay, (6.141)

where

y = (y1, y2, ..., yn)ᵀ,  A = | a11 a12 ··· a1n |
                            | a21 a22 ··· a2n | . (6.142)
                            |  ·   ·       ·  |
                            | an1 an2 ··· ann |

Note that, if the matrix A is a diagonal matrix, Eq. 6.141 is easy to solve, since each equation in the system involves only one unknown function (i.e., the system is uncoupled). Our principal interest in this section is in solving systems of coupled differential equations like Eq. 6.141. Thus, to solve the system y′ = Ay for which A is not diagonal, we will attempt a change of variable

y = Su, (6.143)

where S is a square matrix of constants picked to yield a new system with a diagonal coefficient matrix. The substitution of Eq. 6.143 into Eq. 6.141 yields

Su′ = ASu (6.144)

or

u′ = (S⁻¹AS)u = Du, (6.145)

where

D = S⁻¹AS, (6.146)

and we have assumed that S is invertible. The choice for S is now clear: S is the matrix which diagonalizes A. From Property 6 of the eigenvalue problem (page 69), we see that S is the matrix whose columns are the linearly independent eigenvectors of A, and D is the diagonal matrix of eigenvalues:

D = S⁻¹AS = diag(λ1, λ2, λ3, ..., λn) = Λ, (6.147)

where λi is the ith eigenvalue of A.

Thus, we can use the following procedure for solving the coupled system of differential equations y′ = Ay:

1. Find the eigenvalues and eigenvectors of A.
2. Solve the uncoupled (diagonal) system u′ = Λu, where Λ is the diagonal matrix of eigenvalues of A.
3. Determine y from the equation y = Su, where S is the matrix whose columns are the eigenvectors of A.

We illustrate this procedure by solving the system

y1′ = y1 + y2
y2′ = 4y1 − 2y2 (6.148)

with the initial conditions y1(0) = 1, y2(0) = 6. The coefficient matrix for this system is

A = | 1  1 |
    | 4 −2 | . (6.149)

Since the characteristic polynomial is

det(A − λI) = | 1−λ    1  | = λ² + λ − 6 = (λ + 3)(λ − 2), (6.150)
              |  4   −2−λ |

the eigenvalues of A are λ = 2 and λ = −3. The corresponding eigenvectors are (1, 1) and (1, −4), respectively. Thus, a diagonalizing matrix for A is

S = | 1  1 | , (6.151)
    | 1 −4 |

and

Λ = S⁻¹AS = (1/5)| 4  1 | | 1  1 | | 1  1 |   | 2  0 |
                 | 1 −1 | | 4 −2 | | 1 −4 | = | 0 −3 | . (6.152)

Therefore, the substitution y = Su yields the new diagonal system

u′ = Λu = | 2  0 | u (6.153)
          | 0 −3 |

or

u1′ = 2u1,  u2′ = −3u2, (6.154)

whose general solutions are

u1 = c1e^{2t},  u2 = c2e^{−3t}. (6.155)

Thus, y, the original dependent variable of interest, is given by

y = Su = | 1  1 | (u1)   ( c1e^{2t} + c2e^{−3t}  )
         | 1 −4 | (u2) = ( c1e^{2t} − 4c2e^{−3t} ) (6.156)

or

y1 = c1e^{2t} + c2e^{−3t}
y2 = c1e^{2t} − 4c2e^{−3t}. (6.157)

The application of the initial conditions yields

c1 + c2 = 1
c1 − 4c2 = 6 (6.158)

or c1 = 2, c2 = −1. Thus, the solution of the original system is

y1 = 2e^{2t} − e^{−3t}
y2 = 2e^{2t} + 4e^{−3t}. (6.159)

Notice from Eq. 6.156 that the solution can be written in the form

y = [v1 v2](u1, u2)ᵀ = u1v1 + u2v2 = c1e^{λ1t}v1 + c2e^{λ2t}v2, (6.160)

where the eigenvectors of A are the columns of S:

v1 = (1, 1)ᵀ,  v2 = (1, −4)ᵀ. (6.161)

Thus, in general, if the coefficient matrix A of the system y′ = Ay is diagonalizable, the general solution can be written in the form

y = c1e^{λ1t}v1 + c2e^{λ2t}v2 + ··· + cne^{λnt}vn, (6.162)

where λ1, λ2, ..., λn are the eigenvalues of A, and vi is the eigenvector corresponding to λi.

A comparison of the matrix differential equation, Eq. 6.141, with the scalar equation, Eq. 6.138, indicates that the solution of Eq. 6.141 can be written in the form

y(t) = e^{At}y(0), (6.163)

where the exponential of the matrix At is formally defined [8] by the convergent power series

e^{At} = I + At + (At)²/2! + (At)³/3! + ···. (6.164)

From Eq. 6.147,

A = SΛS⁻¹, (6.165)

where S is a matrix whose columns are the eigenvectors of A. Thus, for integers k, since S⁻¹ cancels S,

A^k = (SΛS⁻¹)^k = (SΛS⁻¹)(SΛS⁻¹)···(SΛS⁻¹) = SΛ^kS⁻¹. (6.166)

Thus,

e^{At} = I + SΛS⁻¹t + SΛ²S⁻¹t²/2! + SΛ³S⁻¹t³/3! + ··· (6.167)
       = S[I + Λt + (Λt)²/2! + (Λt)³/3! + ···]S⁻¹ = Se^{Λt}S⁻¹. (6.168)
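Eq. 6.168 reduces the matrix exponential to scalar exponentials of the eigenvalues. The sketch below (plain Python, not from the text) applies it to the worked example, with S⁻¹ = (1/5)[[4, 1], [1, −1]] entered by hand, and compares against the closed form of Eq. 6.159:

```python
# Solution y(t) = S e^{Lambda t} S^{-1} y(0) (Eq. 6.168) for the example
# y1' = y1 + y2, y2' = 4*y1 - 2*y2 with y(0) = (1, 6).

import math

S = [[1.0, 1.0], [1.0, -4.0]]            # eigenvectors in columns
lams = [2.0, -3.0]                       # eigenvalues
S_inv = [[0.8, 0.2], [0.2, -0.2]]        # (1/5)[[4, 1], [1, -1]]
y0 = [1.0, 6.0]

def solve(t):
    c = [sum(S_inv[i][j] * y0[j] for j in range(2)) for i in range(2)]
    u = [c[i] * math.exp(lams[i] * t) for i in range(2)]     # e^{Lambda t} c
    return [sum(S[i][j] * u[j] for j in range(2)) for i in range(2)]

def exact(t):
    """Closed form from Eq. 6.159, for comparison."""
    return [2 * math.exp(2 * t) - math.exp(-3 * t),
            2 * math.exp(2 * t) + 4 * math.exp(-3 * t)]

y_at_1 = solve(1.0)   # evaluate the solution at t = 1
```

The exponential never needs the power series of Eq. 6.164 once A has been diagonalized; only scalar exponentials of the eigenvalues appear.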

Since Λt is a diagonal matrix, the powers (Λt)^k are also diagonal, and e^{Λt} is therefore the diagonal matrix

e^{Λt} = diag(e^{λ1t}, e^{λ2t}, ..., e^{λnt}). (6.169)

Thus, the solution of y′ = Ay can be written in the form

y(t) = Se^{Λt}S⁻¹y(0), (6.170)

where S is a matrix whose columns are the eigenvectors of A, Λ is the diagonal matrix of eigenvalues, and the exponential is given by Eq. 6.169. For the example of Eq. 6.148, Eq. 6.170 yields

(y1, y2)ᵀ = (1/5) | 1  1 | | e^{2t}    0    | | 4  1 | (1)   ( 2e^{2t} − e^{−3t}  )
                  | 1 −4 | |   0    e^{−3t} | | 1 −1 | (6) = ( 2e^{2t} + 4e^{−3t} ) , (6.171)

in agreement with Eq. 6.159.

Note that this discussion of first-order systems includes as a special case the nth-order equation with constant coefficients, since a single nth-order ODE is equivalent to a system of n first-order ODEs. For example, the nth-order equation

an y^(n) + a_{n−1} y^(n−1) + a_{n−2} y^(n−2) + ··· + a1 y′ + a0 y = 0, an ≠ 0, (6.172)

can be replaced by a first-order system if we rename the derivatives as y^(i) = y_{i+1}(t), in which case

y1′ = y2
y2′ = y3
 ···
y′_{n−1} = yn
yn′ = −(a0y1 + a1y2 + ··· + a_{n−2}y_{n−1} + a_{n−1}yn)/an. (6.173)

6.10 Application to Structural Dynamics

We saw in the discussion of mechanical vibrations (§6.1, p. 65) that the equation of motion for a linear, multi-DOF, undamped mechanical system is

Mü + Ku = F(t), (6.174)

where M is the system mass matrix, K is the system stiffness matrix, F is the time-dependent applied force vector, u(t) is the unknown displacement vector, and dots denote differentiation with respect to the time t. If the system is damped, a viscous damping matrix B is introduced, and the equations of motion become

Mü + Bu̇ + Ku = F(t). (6.175)

All three system matrices are real and symmetric. Equations like this arise whether the system is a discrete element system such as in Fig. 20 or a continuum modeled with finite elements. The general problem is to integrate these equations in time to determine u(t), given the initial displacement u0 and initial velocity u̇0. There are several approaches used to solve these equations, one of which is the Newmark method discussed earlier. Another approach expands the solution in terms of vibration modes (the modal superposition approach).

We recall that, for the free, undamped vibration problem

Ku = λMu, (6.176)

the eigenvectors are orthogonal with respect to both the mass M and the stiffness K (Property 9, p. 71). Thus, the eigenvectors could be used to form a basis for the solution space, and the unknown displacement vector in Eq. 6.175 could be written as a linear combination of the modes:

u(t) = ξ1(t)x(1) + ξ2(t)x(2) + ··· + ξm(t)x(m). (6.177)

The number of terms in this expansion would be n (the number of physical DOF) if all eigenvectors were used in the expansion, but, in practice, a relatively small number m of modes is usually sufficient for acceptable accuracy. The modes selected are generally those having lowest frequency (eigenvalue). In this equation, the multipliers ξi(t) are referred to as the modal coordinates, which can be collected into the array

ξ(t) = (ξ1(t), ξ2(t), ..., ξm(t))ᵀ. (6.178)

Also, we form the matrix S whose columns are the eigenvectors:

S = [x(1) x(2) ··· x(m)], (6.179)

in which case Eq. 6.177 can be written in matrix form as

u(t) = Sξ(t). (6.180)

This equation can be interpreted as a transformation from physical coordinates u to modal coordinates ξ. If this equation is substituted into the equation of motion, Eq. 6.175, and multiplied by Sᵀ, we obtain

SᵀMS ξ̈ + SᵀBS ξ̇ + SᵀKS ξ = SᵀF(t) = P(t), (6.181)

where M̄ = SᵀMS and K̄ = SᵀKS are, by orthogonality, respectively, diagonal matrices referred to as the modal mass and stiffness matrices. The diagonal entries in these two matrices are the generalized masses and stiffnesses for the individual vibration modes. Thus, in modal coordinates, the equations of motion are

M̄ξ̈ + B̄ξ̇ + K̄ξ = P(t). (6.182)

The modal damping matrix B̄ = SᵀBS is, in general, not a diagonal matrix, since the eigenvalue problem did not involve B. However,

it is common to approximate damping with a mode-by-mode frequency-dependent viscous damping coefficient so that B̄ is diagonal; this approximation is generally adequate if the structure has no localized damping treatments. In that case, the modal equations (Eq. 6.182) uncouple into m scalar equations of the form

m̄iξ̈i + b̄iξ̇i + k̄iξi = Pi(t), (6.183)

where m̄i and k̄i are the generalized mass and stiffness, respectively, for mode i. This is the equation of motion for the ith modal coordinate. Since it is a scalar equation with an arbitrary right-hand side, it has a known analytic solution in terms of an integral. Thus we see that a multi-DOF dynamical system can be viewed as a collection of single-DOF mass-spring-damper systems associated with the modes. Once all the scalar equations have been solved, the solution u(t) in terms of physical coordinates is obtained from Eq. 6.180.

Eq. 6.182 is similar to Eq. 6.175 except that it has m DOF (the number of eigenvectors in the modal expansion), which generally is much smaller than the number n of physical DOF in Eq. 6.175. If B̄ is non-diagonal, both equations require a similar time integration. Most of the computational expense in the modal expansion method arises from solving the eigenvalue problem, so the modal method is generally advantageous in structural dynamics if a small number of modes gives sufficient accuracy and many time steps are required in the analysis.

Bibliography

[1] J.W. Brown and R.V. Churchill. Fourier Series and Boundary Value Problems. McGraw-Hill, Inc., New York, seventh edition, 2006.
[2] R.W. Clough and J. Penzien. Dynamics of Structures. McGraw-Hill, Inc., New York, second edition, 1993.
[3] K.H. Huebner, D.L. Dewhirst, D.E. Smith, and T.G. Byrom. The Finite Element Method for Engineers. John Wiley and Sons, Inc., New York, fourth edition, 2001.
[4] D.C. Kay. Schaum's Outline of Theory and Problems of Tensor Calculus. Schaum's Outline Series. McGraw-Hill, Inc., New York, 1988.
[5] A.J. Laub. Matrix Analysis for Scientists and Engineers. Society for Industrial and Applied Mathematics, Philadelphia, 2005.
[6] S. Lipschutz. 3000 Solved Problems in Linear Algebra. Schaum's Outline Series. McGraw-Hill, Inc., New York, 1988.
[7] S. Lipschutz. Schaum's Outline of Theory and Problems of Linear Algebra. Schaum's Outline Series. McGraw-Hill, Inc., New York, second edition, 1991.
[8] C. Moler and C. Van Loan. Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Review, 45(1):3–49, 2003.

[9] M.R. Spiegel. Schaum's Outline of Fourier Analysis With Applications to Boundary Value Problems. Schaum's Outline Series. McGraw-Hill, Inc., New York, 1974.
[10] G. Strang. Linear Algebra and Its Applications. Thomson Brooks/Cole, Belmont, CA, fourth edition, 2006.
[11] J.S. Vandergraft. Introduction to Numerical Calculations. Academic Press, Inc., New York, second edition, 1983.
