The Mechanics of Elastic Solids
Volume 1: A Brief Review of Some Mathematical
Preliminaries
Version 1.0
Rohan Abeyaratne
Quentin Berg Professor of Mechanics
Department of Mechanical Engineering
MIT
Copyright © Rohan Abeyaratne, 1987
All rights reserved.
http://web.mit.edu/abeyaratne/lecture_notes.html
December 2, 2006
Electronic Publication
Rohan Abeyaratne
Quentin Berg Professor of Mechanics
Department of Mechanical Engineering
77 Massachusetts Institute of Technology
Cambridge, MA 02139-4307, USA
Copyright © by Rohan Abeyaratne, 1987
All rights reserved
Abeyaratne, Rohan, 1952–
Lecture Notes on The Mechanics of Elastic Solids. Volume 1: A Brief Review of Some Mathematical Preliminaries / Rohan Abeyaratne – 1st Edition – Cambridge, MA:
ISBN-13: 978-0-9791865-0-9
ISBN-10: 0-9791865-0-1
Please send corrections, suggestions and comments to abeyaratne.vol.1@gmail.com
Updated June 25, 2007
Dedicated with admiration and aﬀection
to Matt Murphy and the miracle of science,
for the gift of renaissance.
PREFACE
The Department of Mechanical Engineering at MIT offers a series of graduate-level subjects on the Mechanics of Solids and Structures, which include:
2.071: Mechanics of Solid Materials,
2.072: Mechanics of Continuous Media,
2.074: Solid Mechanics: Elasticity,
2.073: Solid Mechanics: Plasticity and Inelastic Deformation,
2.075: Advanced Mechanical Behavior of Materials,
2.080: Structural Mechanics,
2.094: Finite Element Analysis of Solids and Fluids,
2.095: Molecular Modeling and Simulation for Mechanics, and
2.099: Computational Mechanics of Materials.
Over the years, I have had the opportunity to regularly teach the second and third of
these subjects, 2.072 and 2.074 (formerly known as 2.083), and the current three volumes
comprise the lecture notes I developed for them. The first draft of these notes was
produced in 1987, and they have been corrected, refined and expanded on every following
occasion that I taught these classes. The material in the current presentation is still meant
to be a set of lecture notes, not a textbook. It has been organized as follows:
Volume I: A Brief Review of Some Mathematical Preliminaries
Volume II: Continuum Mechanics
Volume III: Elasticity
My appreciation for mechanics was nucleated by Professors Douglas Amarasekara and
Munidasa Ranaweera of the (then) University of Ceylon, and was subsequently shaped and
grew substantially under the inﬂuence of Professors James K. Knowles and Eli Sternberg
of the California Institute of Technology. I have been most fortunate to have had the
opportunity to apprentice under these inspiring and distinctive scholars. I would especially
like to acknowledge a great many illuminating and stimulating interactions with my mentor,
colleague and friend Jim Knowles, whose inﬂuence on me cannot be overstated.
I am also indebted to the many MIT students who have given me enormous fulﬁllment
and joy to be part of their education.
My understanding of elasticity as well as these notes have also beneﬁtted greatly from
many useful conversations with Kaushik Bhattacharya, Janet Blume, Eliot Fried, Morton E.
Gurtin, Richard D. James, Stelios Kyriakides, David M. Parks, Phoebus Rosakis, Stewart
Silling and Nicolas Triantafyllidis, which I gratefully acknowledge.
Volume I of these notes provides a collection of essential definitions, results, and illustrative
examples, designed to review those aspects of mathematics that will be encountered
in the subsequent volumes. It is most certainly not meant to be a source for learning these
topics for the first time. The treatment is concise, selective and limited in scope. For example,
Linear Algebra is a far richer subject than the treatment here, which is limited to real
three-dimensional Euclidean vector spaces.
The topics covered in Volumes II and III are largely those one would expect to see covered
in such a set of lecture notes. Personal taste has led me to include a few special (but still
well-known) topics. Examples of this include sections on the statistical mechanical theory
of polymer chains and the lattice theory of crystalline solids in the discussion of constitutive
theory in Volume II; and sections on the so-called Eshelby problem and the effective behavior
of two-phase materials in Volume III.
There are a number of Worked Examples at the end of each chapter which are an essential
part of the notes. Many of these examples either provide more details or a proof of a
result that was quoted previously in the text; or illustrate a general concept; or
establish a result that will be used subsequently (possibly in a later volume).
The content of these notes is entirely classical, in the best sense of the word, and none
of the material here is original. I have drawn on a number of sources over the years as I
prepared my lectures. I cannot recall every source I have used but certainly they include
those listed at the end of each chapter. In a more general sense the broad approach and
philosophy taken has been inﬂuenced by:
Volume I: A Brief Review of Some Mathematical Preliminaries
I.M. Gelfand and S.V. Fomin, Calculus of Variations, Prentice Hall, 1963.
J.K. Knowles, Linear Vector Spaces and Cartesian Tensors, Oxford University Press,
New York, 1997.
Volume II: Continuum Mechanics
P. Chadwick, Continuum Mechanics: Concise Theory and Problems, Dover, 1999.
J.L. Ericksen, Introduction to the Thermodynamics of Solids, Chapman and Hall, 1991.
M.E. Gurtin, An Introduction to Continuum Mechanics, Academic Press, 1981.
J. K. Knowles and E. Sternberg, (Unpublished) Lecture Notes for AM136: Finite Elasticity, California Institute of Technology, Pasadena, CA, 1978.
C. Truesdell and W. Noll, The non-linear field theories of mechanics, in Handbuch der Physik, Edited by S. Flügge, Volume III/3, Springer, 1965.
Volume III: Elasticity
M.E. Gurtin, The linear theory of elasticity, in Mechanics of Solids – Volume II, edited by C. Truesdell, Springer-Verlag, 1984.
J. K. Knowles, (Unpublished) Lecture Notes for AM135: Elasticity, California Institute
of Technology, Pasadena, CA, 1976.
A. E. H. Love, A Treatise on the Mathematical Theory of Elasticity, Dover, 1944.
S. P. Timoshenko and J.N. Goodier, Theory of Elasticity, McGraw-Hill, 1987.
The following notation will be used consistently in Volume I: Greek letters will denote real
numbers; lowercase boldface Latin letters will denote vectors; and uppercase boldface Latin
letters will denote linear transformations. Thus, for example, α, β, γ, ... will denote scalars
(real numbers); a, b, c, ... will denote vectors; and A, B, C, ... will denote linear transformations.
In particular, "o" will denote the null vector while "0" will denote the null linear
transformation. As much as possible this notation will also be used in Volumes II and III,
though there will be some lapses (for reasons of tradition).
Contents

1 Matrix Algebra and Indicial Notation
  1.1 Matrix algebra
  1.2 Indicial notation
  1.3 Summation convention
  1.4 Kronecker delta
  1.5 The alternator or permutation symbol
  1.6 Worked Examples

2 Vectors and Linear Transformations
  2.1 Vectors
    2.1.1 Euclidean point space
  2.2 Linear Transformations
  2.3 Worked Examples

3 Components of Tensors. Cartesian Tensors
  3.1 Components of a vector in a basis
  3.2 Components of a linear transformation in a basis
  3.3 Components in two bases
  3.4 Determinant, trace, scalar-product and norm
  3.5 Cartesian Tensors
  3.6 Worked Examples

4 Symmetry: Groups of Linear Transformations
  4.1 An example in two dimensions
  4.2 An example in three dimensions
  4.3 Lattices
  4.4 Groups of Linear Transformations
  4.5 Symmetry of a scalar-valued function
  4.6 Worked Examples

5 Calculus of Vector and Tensor Fields
  5.1 Notation and definitions
  5.2 Integral theorems
  5.3 Localization
  5.4 Worked Examples

6 Orthogonal Curvilinear Coordinates
  6.1 Introductory Remarks
  6.2 General Orthogonal Curvilinear Coordinates
    6.2.1 Coordinate transformation. Inverse transformation
    6.2.2 Metric coefficients, scale moduli
    6.2.3 Inverse partial derivatives
    6.2.4 Components of ∂ê_i/∂x̂_j in the local basis (ê_1, ê_2, ê_3)
  6.3 Transformation of Basic Tensor Relations
    6.3.1 Gradient of a scalar field
    6.3.2 Gradient of a vector field
    6.3.3 Divergence of a vector field
    6.3.4 Laplacian of a scalar field
    6.3.5 Curl of a vector field
    6.3.6 Divergence of a symmetric 2-tensor field
    6.3.7 Differential elements of volume
    6.3.8 Differential elements of area
  6.4 Examples of Orthogonal Curvilinear Coordinates
  6.5 Worked Examples

7 Calculus of Variations
  7.1 Introduction
  7.2 Brief review of calculus
  7.3 A necessary condition for an extremum
  7.4 Application of necessary condition δF = 0
    7.4.1 The basic problem. Euler equation
    7.4.2 An example. The Brachistochrone Problem
    7.4.3 A Formalism for Deriving the Euler Equation
  7.5 Generalizations
    7.5.1 Generalization: Free end-point; Natural boundary conditions
    7.5.2 Generalization: Higher derivatives
    7.5.3 Generalization: Multiple functions
    7.5.4 Generalization: End point of extremal lying on a curve
  7.6 Constrained Minimization
    7.6.1 Integral constraints
    7.6.2 Algebraic constraints
    7.6.3 Differential constraints
  7.7 Weierstrass-Erdmann corner conditions
    7.7.1 Piecewise smooth minimizer with non-smoothness occurring at a prescribed location
    7.7.2 Piecewise smooth minimizer with non-smoothness occurring at an unknown location
  7.8 Generalization to higher dimensional space
  7.9 Second variation
  7.10 Sufficient condition for convex functionals
  7.11 Direct method
    7.11.1 The Ritz method
  7.12 Worked Examples
Chapter 1
Matrix Algebra and Indicial Notation
Notation:
{a} ..... m×1 matrix, i.e. a column matrix with m rows and one column
a_i ..... the element in the i-th row of the column matrix {a}
[A] ..... m×n matrix
A_{ij} ..... the element in the i-th row and j-th column of the matrix [A]
1.1 Matrix algebra
Even though more general matrices can be considered, for our purposes it is suﬃcient to
consider a matrix to be a rectangular array of real numbers that obeys certain rules of
addition and multiplication. A m×n matrix [A] has m rows and n columns:
addition and multiplication. An m×n matrix [A] has m rows and n columns:

[A] =
\begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1n} \\
A_{21} & A_{22} & \cdots & A_{2n} \\
\vdots & \vdots &        & \vdots \\
A_{m1} & A_{m2} & \cdots & A_{mn}
\end{pmatrix};    (1.1)

A_{ij} denotes the element located in the i-th row and j-th column. The column matrix

{x} =
\begin{pmatrix}
x_1 \\ x_2 \\ \vdots \\ x_m
\end{pmatrix}    (1.2)
has m rows and one column. The row matrix

{y} = \{y_1, y_2, \ldots, y_n\}    (1.3)
has one row and n columns. If all the elements of a matrix are zero it is said to be a null
matrix and is denoted by [0] or {0} as the case may be.
Two m×n matrices [A] and [B] are said to be equal if and only if all of their corresponding
elements are equal:

A_{ij} = B_{ij},    i = 1, 2, \ldots, m,    j = 1, 2, \ldots, n.    (1.4)

If [A] and [B] are both m×n matrices, their sum is the m×n matrix [C], denoted by
[C] = [A] + [B], whose elements are

C_{ij} = A_{ij} + B_{ij},    i = 1, 2, \ldots, m,    j = 1, 2, \ldots, n.    (1.5)
If [A] is a p×q matrix and [B] is a q×r matrix, their product is the p×r matrix [C] with
elements

C_{ij} = \sum_{k=1}^{q} A_{ik} B_{kj},    i = 1, 2, \ldots, p,    j = 1, 2, \ldots, r;    (1.6)

one writes [C] = [A][B]. In general [A][B] ≠ [B][A]; therefore, rather than referring to [A][B]
as the product of [A] and [B], we should more precisely refer to [A][B] as [A] post-multiplied
by [B], or as [B] pre-multiplied by [A]. It is worth noting that if two matrices [A] and [B] obey
the equation [A][B] = [0], this does not necessarily mean that either [A] or [B] must be the
null matrix [0]. Similarly, if three matrices [A], [B] and [C] obey [A][B] = [A][C], this does
not necessarily mean that [B] = [C] (even if [A] ≠ [0]). The product of an m×n matrix [A]
by a scalar α is the m×n matrix [B] with components

B_{ij} = αA_{ij},    i = 1, 2, \ldots, m,    j = 1, 2, \ldots, n;    (1.7)

one writes [B] = α[A].
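As a quick numerical check of these two cautions (a sketch using NumPy arrays to stand in for the matrices; the particular entries are illustrative choices, not from the notes):

```python
import numpy as np

# Two 2x2 matrices chosen to show that matrix multiplication does not commute.
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])
B = np.array([[1.0, 0.0],
              [3.0, 1.0]])
AB = A @ B   # [A] post-multiplied by [B]
BA = B @ A   # [A] pre-multiplied by [B]
assert not np.array_equal(AB, BA)

# [C][D] = [0] even though neither [C] nor [D] is the null matrix.
C = np.array([[1.0, 0.0],
              [0.0, 0.0]])
D = np.array([[0.0, 0.0],
              [0.0, 1.0]])
assert np.array_equal(C @ D, np.zeros((2, 2)))
```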
Note that an m_1×n_1 matrix [A_1] can be post-multiplied by an m_2×n_2 matrix [A_2] if and
only if n_1 = m_2. In particular, consider an m×n matrix [A] and an n×1 (column) matrix
{x}. Then we can post-multiply [A] by {x} to get the m×1 column matrix [A]{x}; but we
cannot pre-multiply [A] by {x} (unless m = 1), i.e. {x}[A] does not exist in general.
The transpose of the m×n matrix [A] is the n×m matrix [B] where

B_{ij} = A_{ji}    for each i = 1, 2, \ldots, n, and j = 1, 2, \ldots, m.    (1.8)

Usually one denotes the matrix [B] by [A]^T. One can verify that

[A + B]^T = [A]^T + [B]^T,    [AB]^T = [B]^T [A]^T.    (1.9)
The transpose of a column matrix is a row matrix, and vice versa. Suppose that [A] is an
m×n matrix and that {x} is an m×1 (column) matrix. Then we can pre-multiply [A] by
{x}^T, i.e. {x}^T[A] exists (and is a 1×n row matrix). For any n×1 column matrix {x} note
that

{x}^T{x} = x_1^2 + x_2^2 + \ldots + x_n^2 = \sum_{i=1}^{n} x_i^2.    (1.10)

(By contrast, {x}{x}^T is an n×n matrix, not a scalar.)
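The transpose rules (1.9) and the sum of squares (1.10) can be spot-checked numerically; in this sketch the sizes and the random seed are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # an m x n matrix, here m = 3, n = 4
B = rng.standard_normal((3, 4))
x = rng.standard_normal((4, 1))   # an n x 1 column matrix

# (1.9): transpose of a sum and of a product
assert np.allclose((A + B).T, A.T + B.T)
assert np.allclose((A @ x).T, x.T @ A.T)

# (1.10): {x}^T {x} is a 1x1 matrix holding the sum of squares of the elements
assert np.isclose((x.T @ x).item(), float(np.sum(x**2)))

# whereas {x}{x}^T is n x n
assert (x @ x.T).shape == (4, 4)
```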
An n×n matrix [A] is called a square matrix; the diagonal elements of this matrix are the
A_{ii}'s. A square matrix [A] is said to be symmetric if

A_{ij} = A_{ji}    for each i, j = 1, 2, \ldots, n;    (1.11)

and skew-symmetric if

A_{ij} = −A_{ji}    for each i, j = 1, 2, \ldots, n.    (1.12)

Thus for a symmetric matrix [A] we have [A]^T = [A]; for a skew-symmetric matrix [A] we
have [A]^T = −[A]. Observe that each diagonal element of a skew-symmetric matrix must be
zero.
If the off-diagonal elements of a square matrix are all zero, i.e. A_{ij} = 0 for each i, j =
1, 2, \ldots, n, i ≠ j, the matrix is said to be diagonal. If every diagonal element of a diagonal
matrix is 1 the matrix is called a unit matrix and is usually denoted by [I].
Suppose that [A] is an n×n square matrix and that {x} is an n×1 (column) matrix. Then
we can post-multiply [A] by {x} to get an n×1 column matrix [A]{x}, and pre-multiply the
resulting matrix by {x}^T to get a 1×1 matrix, effectively just a scalar, {x}^T[A]{x}.
Note that

{x}^T[A]{x} = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} x_i x_j.    (1.13)

This is referred to as the quadratic form associated with [A]. In the special case of a diagonal
matrix [A],

{x}^T[A]{x} = A_{11} x_1^2 + A_{22} x_2^2 + \ldots + A_{nn} x_n^2.    (1.14)
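A short sketch comparing the matrix product {x}^T[A]{x} with the explicit double sum of (1.13), and checking the diagonal special case (1.14); the sizes and entries are illustrative:

```python
import numpy as np

n = 4
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

# The quadratic form computed as a matrix product...
q_matrix = x @ A @ x
# ...and as the explicit double sum of (1.13).
q_sum = sum(A[i, j] * x[i] * x[j] for i in range(n) for j in range(n))
assert np.isclose(q_matrix, q_sum)

# Diagonal special case (1.14): only the squared terms survive.
D = np.diag(np.array([2.0, 3.0, 5.0, 7.0]))
assert np.isclose(x @ D @ x, float(np.sum(np.diag(D) * x**2)))
```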
The trace of a square matrix is the sum of the diagonal elements of that matrix and is
denoted by trace[A]:

trace[A] = \sum_{i=1}^{n} A_{ii}.    (1.15)

One can show that

trace([A][B]) = trace([B][A]).    (1.16)
Let det[A] denote the determinant of a square matrix. Then for a 2×2 matrix

det \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
= A_{11}A_{22} − A_{12}A_{21},    (1.17)

and for a 3×3 matrix

det \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{pmatrix}
= A_{11} det \begin{pmatrix} A_{22} & A_{23} \\ A_{32} & A_{33} \end{pmatrix}
− A_{12} det \begin{pmatrix} A_{21} & A_{23} \\ A_{31} & A_{33} \end{pmatrix}
+ A_{13} det \begin{pmatrix} A_{21} & A_{22} \\ A_{31} & A_{32} \end{pmatrix}.    (1.18)

The determinant of an n×n matrix is defined recursively in a similar manner. One can show
that

det([A][B]) = (det[A]) (det[B]).    (1.19)

Note that trace[A] and det[A] are both scalar-valued functions of the matrix [A].
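The identities (1.16) and (1.19) are easy to check numerically; this sketch uses random 3×3 matrices of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# (1.16): the trace of a product is unchanged by swapping the factors.
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# (1.19): the determinant of a product is the product of the determinants.
assert np.isclose(np.linalg.det(A @ B),
                  np.linalg.det(A) * np.linalg.det(B))
```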
Consider a square matrix [A]. For each i = 1, 2, \ldots, n, a row matrix {a}_i can be created
by assembling the elements in the i-th row of [A]: {a}_i = {A_{i1}, A_{i2}, A_{i3}, \ldots, A_{in}}. If the only
scalars α_i for which

α_1 {a}_1 + α_2 {a}_2 + α_3 {a}_3 + \ldots + α_n {a}_n = {0}    (1.20)

are α_1 = α_2 = \ldots = α_n = 0, the rows of [A] are said to be linearly independent. If at least
one of the α's is non-zero, they are said to be linearly dependent, and then at least one row
of [A] can be expressed as a linear combination of the other rows.

Consider a square matrix [A] and suppose that its rows are linearly independent. Then
the matrix is said to be non-singular and there exists a matrix [B], usually denoted by
[B] = [A]^{−1} and called the inverse of [A], for which [B][A] = [A][B] = [I]. For [A] to be
non-singular it is necessary and sufficient that det[A] ≠ 0. If the rows of [A] are linearly
dependent, the matrix is singular and an inverse matrix does not exist.

Consider an n×n square matrix [A]. First consider the (n−1)×(n−1) matrix obtained by
eliminating the i-th row and j-th column of [A]; then consider the determinant of that second
matrix; and finally consider the product of that determinant with (−1)^{i+j}. The number thus
obtained is called the cofactor of A_{ij}. If [B] is the inverse of [A], [B] = [A]^{−1}, then

B_{ij} = (cofactor of A_{ji}) / det[A].    (1.21)
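The cofactor formula (1.21) can be implemented directly; this is a sketch (the helper names `cofactor` and `inverse_via_cofactors` are mine, and the sample matrix is an arbitrary non-singular choice):

```python
import numpy as np

def cofactor(A, i, j):
    """(-1)**(i+j) times the determinant of [A] with row i and column j removed."""
    minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(minor)

def inverse_via_cofactors(A):
    """Build [B] = [A]^{-1} element by element from (1.21): B_ij = cofactor(A_ji)/det[A]."""
    n = A.shape[0]
    detA = np.linalg.det(A)
    B = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            B[i, j] = cofactor(A, j, i) / detA   # note the transposed indices j, i
    return B

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
B = inverse_via_cofactors(A)
assert np.allclose(B @ A, np.eye(3))
assert np.allclose(A @ B, np.eye(3))
```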
If the transpose and inverse of a matrix coincide, i.e. if

[A]^{−1} = [A]^T,    (1.22)

then the matrix is said to be orthogonal. Note that for an orthogonal matrix [A], one has
[A][A]^T = [A]^T[A] = [I] and that det[A] = ±1.
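A rotation matrix gives a concrete orthogonal matrix; this sketch (my own example, with an arbitrary angle) checks the defining property and both signs of the determinant:

```python
import numpy as np

# A rotation about the third axis is orthogonal with determinant +1.
t = 0.7
Q = np.array([[np.cos(t), -np.sin(t), 0.0],
              [np.sin(t),  np.cos(t), 0.0],
              [0.0,        0.0,       1.0]])
assert np.allclose(Q @ Q.T, np.eye(3))
assert np.allclose(Q.T @ Q, np.eye(3))
assert np.isclose(np.linalg.det(Q), 1.0)

# Composing with a reflection preserves orthogonality but flips det to -1.
R = Q @ np.diag([1.0, 1.0, -1.0])
assert np.allclose(R @ R.T, np.eye(3))
assert np.isclose(np.linalg.det(R), -1.0)
```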
1.2 Indicial notation
Consider an n×n square matrix [A] and two n×1 column matrices {x} and {b}. Let A_{ij}
denote the element of [A] in its i-th row and j-th column, and let x_i and b_i denote the elements
in the i-th row of {x} and {b} respectively. Now consider the matrix equation [A]{x} = {b}:

\begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1n} \\
A_{21} & A_{22} & \cdots & A_{2n} \\
\vdots & \vdots &        & \vdots \\
A_{n1} & A_{n2} & \cdots & A_{nn}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}.    (1.23)
Carrying out the matrix multiplication, this is equivalent to the system of linear algebraic
equations

A_{11}x_1 + A_{12}x_2 + \ldots + A_{1n}x_n = b_1,
A_{21}x_1 + A_{22}x_2 + \ldots + A_{2n}x_n = b_2,
\vdots
A_{n1}x_1 + A_{n2}x_2 + \ldots + A_{nn}x_n = b_n.    (1.24)

This system of equations can be written more compactly as

A_{i1}x_1 + A_{i2}x_2 + \ldots + A_{in}x_n = b_i    with i taking each value in the range 1, 2, \ldots, n;    (1.25)

or even more compactly by omitting the statement "with i taking each value in the range
1, 2, \ldots, n", and simply writing

A_{i1}x_1 + A_{i2}x_2 + \ldots + A_{in}x_n = b_i    (1.26)
with the understanding that (1.26) holds for each value of the subscript i in the range i =
1, 2, . . . n. This understanding is referred to as the range convention. The subscript i is called
a free subscript because it is free to take on each value in its range. From here on, we shall
always use the range convention unless explicitly stated otherwise.
Observe that

A_{j1}x_1 + A_{j2}x_2 + \ldots + A_{jn}x_n = b_j    (1.27)

is identical to (1.26); this is because j is a free subscript in (1.27) and so (1.27) is required
to hold "for all j = 1, 2, \ldots, n", and this leads back to (1.24). This illustrates the fact that
the particular choice of index for the free subscript in an equation is not important, provided
that the same free subscript appears in every symbol grouping.¹
As a second example, suppose that f(x_1, x_2, \ldots, x_n) is a function of x_1, x_2, \ldots, x_n. Then,
if we write the equation

∂f/∂x_k = 3x_k,    (1.28)

the index k in it is a free subscript and so takes all values in the range 1, 2, \ldots, n. Thus
(1.28) is a compact way of writing the n equations

∂f/∂x_1 = 3x_1,    ∂f/∂x_2 = 3x_2,    \ldots,    ∂f/∂x_n = 3x_n.    (1.29)
As a third example, the equation

A_{pq} = x_p x_q    (1.30)

has two free subscripts p and q, and each, independently, takes all values in the range
1, 2, \ldots, n. Therefore (1.30) corresponds to the n² equations

A_{11} = x_1 x_1,    A_{12} = x_1 x_2,    \ldots    A_{1n} = x_1 x_n,
A_{21} = x_2 x_1,    A_{22} = x_2 x_2,    \ldots    A_{2n} = x_2 x_n,
\vdots
A_{n1} = x_n x_1,    A_{n2} = x_n x_2,    \ldots    A_{nn} = x_n x_n.    (1.31)

In general, if an equation involves N free indices, each ranging over 1, 2, \ldots, n, then it
represents n^N scalar equations.
In order to be consistent it is important that the same free subscript(s) must appear once,
and only once, in every group of symbols in an equation. For example, in equation (1.26),
since the index i appears once in the symbol group A_{i1}x_1, it must necessarily appear once
in each of the remaining symbol groups A_{i2}x_2, A_{i3}x_3, \ldots, A_{in}x_n and b_i of that equation.
Similarly, since the free subscripts p and q appear in the symbol group on the left-hand
side of equation (1.30), they must also appear in the symbol group on the right-hand side.
An equation of the form A_{pq} = x_i x_j would violate this consistency requirement, as would
A_{i1}x_i + A_{j2}x_2 = 0.

¹ By a "symbol group" we mean a set of terms contained between +, − and = signs.
Note finally that had we adopted the range convention in Section 1.1, we would have
omitted the various "i = 1, 2, \ldots, n" statements there and written, for example, equation (1.4)
for the equality of two matrices as simply A_{ij} = B_{ij}; equation (1.5) for the sum of two
matrices as simply C_{ij} = A_{ij} + B_{ij}; equation (1.7) for the scalar multiple of a matrix as
B_{ij} = αA_{ij}; equation (1.8) for the transpose of a matrix as simply B_{ij} = A_{ji}; equation
(1.11) defining a symmetric matrix as simply A_{ij} = A_{ji}; and equation (1.12) defining a
skew-symmetric matrix as simply A_{ij} = −A_{ji}.
1.3 Summation convention
Next, observe that (1.26) can be written as

\sum_{j=1}^{n} A_{ij} x_j = b_i.    (1.32)

We can simplify the notation even further by agreeing to drop the summation sign and instead
imposing the rule that summation is implied over a subscript that appears twice in a symbol
grouping. With this understanding in force, we would write (1.32) as

A_{ij} x_j = b_i    (1.33)

with summation on the subscript j being implied. A subscript that appears twice in a
symbol grouping is called a repeated or dummy subscript; the subscript j in (1.33) is a
dummy subscript.

Note that

A_{ik} x_k = b_i    (1.34)

is identical to (1.33); this is because k is a dummy subscript in (1.34) and therefore summation
on k is implied in (1.34). Thus the particular choice of index for the dummy subscript
is not important.

In order to avoid ambiguity, no subscript is allowed to appear more than twice in any
symbol grouping. Thus we shall never write, for example, A_{ii} x_i = b_i since, if we did, the
index i would appear three times in the first symbol group.
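NumPy's `einsum` uses a subscript string that mirrors the summation convention exactly: a repeated letter is summed over, a letter appearing once is free. This sketch (my own illustration, with arbitrary data) writes (1.33) and (1.34) in that notation:

```python
import numpy as np

n = 3
rng = np.random.default_rng(3)
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

# A_ij x_j = b_i: the repeated subscript j is summed; the free subscript i survives.
b = np.einsum('ij,j->i', A, x)
assert np.allclose(b, A @ x)

# Renaming the dummy subscript, as in (1.34): A_ik x_k gives the same b_i.
assert np.allclose(np.einsum('ik,k->i', A, x), b)
```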
Summary of Rules:
1. Lowercase Latin subscripts take on values in the range (1, 2, . . . , n).
2. A given index may appear either once or twice in a symbol grouping. If it appears
once, it is called a free index and it takes on each value in its range. If it appears twice,
it is called a dummy index and summation is implied over it.
3. The same index may not appear more than twice in the same symbol grouping.
4. All symbol groupings in an equation must have the same free subscripts.
Free and dummy indices may be changed without altering the meaning of an expression
provided that one does not violate the preceding rules. Thus, for example, we can change
the free subscript p in every term of the equation

A_{pq} x_q = b_p    (1.35)

to any other index, say k, and equivalently write

A_{kq} x_q = b_k.    (1.36)

We can also change the repeated subscript q to some other index, say s, and write

A_{ks} x_s = b_k.    (1.37)

The three preceding equations are identical.
It is important to emphasize that each of the equations in, for example, (1.24) involves
scalar quantities, and therefore the order in which the terms appear within a symbol group
is irrelevant. Thus, for example, (1.24)_1 is equivalent to x_1 A_{11} + x_2 A_{12} + \ldots + x_n A_{1n} = b_1.
Likewise we can write (1.33) equivalently as x_j A_{ij} = b_i. Note that both A_{ij} x_j = b_i
and x_j A_{ij} = b_i represent the matrix equation [A]{x} = {b}; the second equation does not
correspond to {x}[A] = {b}. In an indicial equation it is the location of the subscripts that
is crucial; in particular, it is the location where the repeated subscript appears that tells us
whether {x} multiplies [A] or [A] multiplies {x}.
Note finally that had we adopted the range and summation conventions in Section 1.1,
we would have written equation (1.6) for the product of two matrices as C_{ij} = A_{ik} B_{kj};
equation (1.10) for the product of a matrix by its transpose as {x}^T{x} = x_i x_i; equation
(1.13) for the quadratic form as {x}^T[A]{x} = A_{ij} x_i x_j; and equation (1.15) for the trace as
trace[A] = A_{ii}.
1.4 Kronecker delta
The Kronecker delta, δ_{ij}, is defined by

δ_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}    (1.38)

Note that it represents the elements of the identity matrix. If [Q] is an orthogonal matrix,
then we know that [Q][Q]^T = [Q]^T[Q] = [I]. This implies, in indicial notation, that

Q_{ik} Q_{jk} = Q_{ki} Q_{kj} = δ_{ij}.    (1.39)
The following useful property of the Kronecker delta is sometimes called the substitution
rule. Consider, for example, any column matrix {u} and suppose that one wishes to simplify
the expression u_i δ_{ij}. Recall that u_i δ_{ij} = u_1 δ_{1j} + u_2 δ_{2j} + \ldots + u_n δ_{nj}. Since δ_{ij} is zero unless
i = j, it follows that all terms on the right-hand side vanish trivially except for the one term
for which i = j. Thus the term that survives on the right-hand side is u_j, and so

u_i δ_{ij} = u_j.    (1.40)

Here we have used the facts that (i) since δ_{ij} is zero unless i = j, the expression being
simplified has a non-zero value only if i = j; and (ii) when i = j, δ_{ij} is unity. Thus replacing
the Kronecker delta by unity, and changing the repeated subscript i → j, gives u_i δ_{ij} = u_j.
Similarly, suppose that [A] is a square matrix and one wishes to simplify A_{jk} δ_{jℓ}. Then by the
same reasoning, we replace the Kronecker delta by unity and change the repeated subscript
j → ℓ to obtain²

A_{jk} δ_{jℓ} = A_{ℓk}.    (1.41)
More generally, if δ_{ip} multiplies a quantity C_{ijkℓ} representing n⁴ numbers, one replaces
the Kronecker delta by unity and changes the repeated subscript i → p to obtain

C_{ijkℓ} δ_{ip} = C_{pjkℓ}.    (1.42)

The substitution rule applies even more generally: for any quantity or expression T_{ipq...z}, one
simply replaces the Kronecker delta by unity and changes the repeated subscript i → j to
obtain

T_{ipq...z} δ_{ij} = T_{jpq...z}.    (1.43)
² Observe that these results are immediately apparent by using matrix algebra. In the first example, note
that δ_{ji} u_i (which is equal to the quantity δ_{ij} u_i that is given) is simply the j-th element of the column matrix
[I]{u}. Since [I]{u} = {u} the result follows at once. Similarly, in the second example, δ_{ℓj} A_{jk} is simply the
ℓ,k-element of the matrix [I][A]. Since [I][A] = [A], the result follows.
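The substitution rule can be checked mechanically by contracting against the identity matrix; in this sketch (my own illustration) the Kronecker delta is `np.eye(n)`:

```python
import numpy as np

n = 3
rng = np.random.default_rng(4)
u = rng.standard_normal(n)
A = rng.standard_normal((n, n))
delta = np.eye(n)   # the Kronecker delta delta_ij as the identity matrix

# (1.40): u_i delta_ij = u_j -- contracting with the delta just relabels the index.
assert np.allclose(np.einsum('i,ij->j', u, delta), u)

# (1.41): A_jk delta_jl = A_lk -- again the delta substitutes l for the summed j.
assert np.allclose(np.einsum('jk,jl->lk', A, delta), A)
```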
1.5 The alternator or permutation symbol
We now limit attention to subscripts that range over 1, 2, 3 only. The alternator or permutation
symbol is defined by

e_{ijk} = \begin{cases}
0 & \text{if two or more of the subscripts } i, j, k \text{ are equal,} \\
+1 & \text{if the subscripts } i, j, k \text{ are in cyclic order,} \\
−1 & \text{if the subscripts } i, j, k \text{ are in anti-cyclic order,}
\end{cases}
= \begin{cases}
0 & \text{if two or more of the subscripts } i, j, k \text{ are equal,} \\
+1 & \text{for } (i, j, k) = (1, 2, 3), (2, 3, 1), (3, 1, 2), \\
−1 & \text{for } (i, j, k) = (1, 3, 2), (2, 1, 3), (3, 2, 1).
\end{cases}    (1.44)

Observe from its definition that the sign of e_{ijk} changes whenever any two adjacent subscripts
are switched:

e_{ijk} = −e_{jik} = e_{jki}.    (1.45)
One can show by direct calculation that the determinant of a 3×3 matrix [A] can be written
in either of two forms

det[A] = e_{ijk} A_{1i} A_{2j} A_{3k}    or    det[A] = e_{ijk} A_{i1} A_{j2} A_{k3};    (1.46)

as well as in the form

det[A] = \frac{1}{6} e_{ijk} e_{pqr} A_{ip} A_{jq} A_{kr}.    (1.47)

Another useful identity involving the determinant is

e_{pqr} det[A] = e_{ijk} A_{ip} A_{jq} A_{kr}.    (1.48)

The following relation involving the alternator and the Kronecker delta will be useful in
subsequent calculations:

e_{ijk} e_{pqk} = δ_{ip} δ_{jq} − δ_{iq} δ_{jp}.    (1.49)

It is left to the reader to develop proofs of these identities. They can, of course, be verified
directly, by simply writing out all of the terms in (1.46)–(1.49).
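One way to verify these identities "by writing out all of the terms" is to let a computer do the writing; this sketch (my own construction, indices shifted to 0-based) builds e_{ijk} from its definition and checks (1.46), (1.47) and (1.49):

```python
import numpy as np
from itertools import permutations

# Build the alternator e_ijk directly from its definition (0-based indices).
e = np.zeros((3, 3, 3))
for p in permutations(range(3)):
    i, j, k = p
    # cyclic orderings of (0,1,2) get +1; anti-cyclic orderings get -1
    e[i, j, k] = 1.0 if p in [(0, 1, 2), (1, 2, 0), (2, 0, 1)] else -1.0

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 3))
detA = np.linalg.det(A)

# (1.46), first form: det[A] = e_ijk A_1i A_2j A_3k
assert np.isclose(np.einsum('ijk,i,j,k->', e, A[0], A[1], A[2]), detA)

# (1.47): det[A] = (1/6) e_ijk e_pqr A_ip A_jq A_kr
assert np.isclose(np.einsum('ijk,pqr,ip,jq,kr->', e, e, A, A, A) / 6.0, detA)

# (1.49): e_ijk e_pqk = delta_ip delta_jq - delta_iq delta_jp
d = np.eye(3)
lhs = np.einsum('ijk,pqk->ijpq', e, e)
rhs = np.einsum('ip,jq->ijpq', d, d) - np.einsum('iq,jp->ijpq', d, d)
assert np.allclose(lhs, rhs)
```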
1.6 Worked Examples.
Example (1.1): If [A] and [B] are n×n square matrices and {x}, {y}, {z} are n×1 column matrices, express
the matrix equation

{y} = [A]{x} + [B]{z}

as a set of scalar equations.

Solution: By the rules of matrix multiplication, the element y_i in the i-th row of {y} is obtained by first pairwise
multiplying the elements A_{i1}, A_{i2}, \ldots, A_{in} of the i-th row of [A] by the respective elements x_1, x_2, \ldots, x_n of
{x} and summing; then doing the same for the elements of [B] and {z}; and finally adding the two together.
Thus

y_i = A_{ij} x_j + B_{ij} z_j,

where summation over the dummy index j is implied, and this equation holds for each value of the free index
i = 1, 2, \ldots, n. Note that one can alternatively, and equivalently, write the above equation in any of the
following forms:

y_k = A_{kj} x_j + B_{kj} z_j,    y_k = A_{kp} x_p + B_{kp} z_p,    y_i = A_{ip} x_p + B_{iq} z_q.

Observe that all rules for indicial notation are satisfied by each of the three equations above.
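The indicial form of Example (1.1) can be transcribed directly into `einsum` notation; this sketch (my own, with arbitrary data) confirms it agrees with the matrix form {y} = [A]{x} + [B]{z}:

```python
import numpy as np

n = 3
rng = np.random.default_rng(6)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
x = rng.standard_normal(n)
z = rng.standard_normal(n)

# y_i = A_ij x_j + B_ij z_j, with summation over the dummy index j
y = np.einsum('ij,j->i', A, x) + np.einsum('ij,j->i', B, z)
assert np.allclose(y, A @ x + B @ z)
```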
Example (1.2): The n × n matrices [C], [D] and [E] are defined in terms of the two n × n matrices [A] and [B] by

[C] = [A][B],   [D] = [B][A],   [E] = [A][B]^T.

Express the elements of [C], [D] and [E] in terms of the elements of [A] and [B].
Solution: By the rules of matrix multiplication, the element C_{ij} in the i-th row and j-th column of [C] is obtained by multiplying the elements of the i-th row of [A], pairwise, by the respective elements of the j-th column of [B] and summing. So C_{ij} is obtained by multiplying the elements A_{i1}, A_{i2}, ..., A_{in} by, respectively, B_{1j}, B_{2j}, ..., B_{nj} and summing. Thus

C_{ij} = A_{ik} B_{kj};
note that i and j are both free indices here and so this represents n^2 scalar equations; moreover, summation is carried out over the repeated index k. It follows likewise that the equation [D] = [B][A] leads to
D_{ij} = B_{ik} A_{kj};   or equivalently   D_{ij} = A_{kj} B_{ik},

where the second expression was obtained by simply changing the order in which the terms appear in the first expression (since, as noted previously, the order of terms within a symbol group is insignificant because these are scalar quantities). In order to calculate E_{ij}, we first multiply [A] by [B]^T to obtain E_{ij} = A_{ik} B^T_{kj}. However, by the definition of transposition, the i, j-element of a matrix [B]^T equals the j, i-element of the matrix [B]: B^T_{ij} = B_{ji}, and so we can write

E_{ij} = A_{ik} B_{jk}.
12 CHAPTER 1. MATRIX ALGEBRA AND INDICIAL NOTATION
All four expressions here involve the ik, kj or jk elements of [A] and [B]. The precise locations of the subscripts vary, and the meaning of the terms depends crucially on these locations. It is worth repeating that the location of the repeated subscript k tells us what term multiplies what term.
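The role played by the location of the repeated subscript can be seen concretely by evaluating all three component formulas with `einsum` and comparing against the matrix products (an illustrative check, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4)); B = rng.standard_normal((4, 4))

C = np.einsum('ik,kj->ij', A, B)   # C_ij = A_ik B_kj,  i.e. [A][B]
D = np.einsum('ik,kj->ij', B, A)   # D_ij = B_ik A_kj,  i.e. [B][A]
E = np.einsum('ik,jk->ij', A, B)   # E_ij = A_ik B_jk,  i.e. [A][B]^T

print(np.allclose(C, A @ B), np.allclose(D, B @ A), np.allclose(E, A @ B.T))
```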
Example (1.3): If [S] is any symmetric matrix and [W] is any skew-symmetric matrix, show that

S_{ij} W_{ij} = 0.
Solution: Note that both i and j are dummy subscripts here; therefore there are summations over each of them. Also, there is no free subscript, so this is just a single scalar equation.

Whenever there is a dummy subscript, the choice of the particular index for that dummy subscript is arbitrary, and we can change it to another index, provided that we change both repeated subscripts to the new symbol (and as long as we do not have any subscript appearing more than twice). Thus, for example, since i is a dummy subscript in S_{ij} W_{ij}, we can change i → p and get S_{ij} W_{ij} = S_{pj} W_{pj}. Note that we can change i to any other index except j; if we did change it to j, then there would be four j's and that violates one of our rules.
By changing the dummy indices i → p and j → q, we get S_{ij} W_{ij} = S_{pq} W_{pq}. We can now change dummy indices again, from p → j and q → i, which gives S_{pq} W_{pq} = S_{ji} W_{ji}. On combining these we get

S_{ij} W_{ij} = S_{ji} W_{ji}.

Effectively, we have changed both i and j simultaneously from i → j and j → i.
Next, since [S] is symmetric, S_{ji} = S_{ij}; and since [W] is skew-symmetric, W_{ji} = −W_{ij}. Therefore S_{ji} W_{ji} = −S_{ij} W_{ij}. Using this on the right-hand side of the preceding equation gives

S_{ij} W_{ij} = −S_{ij} W_{ij},

from which it follows that S_{ij} W_{ij} = 0.
Remark: As a special case, take S_{ij} = u_i u_j where {u} is an arbitrary column matrix; note that this [S] is symmetric. It follows that for any skew-symmetric [W],

W_{ij} u_i u_j = 0   for all u_i.
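A quick numerical confirmation of this result and of the special case (illustrative only; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
S = M + M.T          # symmetric: S_ij = S_ji
W = M - M.T          # skew-symmetric: W_ij = -W_ji

# S_ij W_ij : both indices are dummies, so this is a single scalar.
print(abs(np.einsum('ij,ij->', S, W)) < 1e-12)

# Special case: W_ij u_i u_j = 0 for any column matrix {u}.
u = rng.standard_normal(5)
print(abs(u @ W @ u) < 1e-12)
```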
Example (1.4): Show that any matrix [A] can be additively decomposed into the sum of a symmetric matrix and a skew-symmetric matrix.

Solution: Define matrices [S] and [W] in terms of the given matrix [A] as follows:

S_{ij} = (1/2)(A_{ij} + A_{ji}),   W_{ij} = (1/2)(A_{ij} − A_{ji}).

It may be readily verified from these definitions that S_{ij} = S_{ji} and that W_{ij} = −W_{ji}. Thus the matrix [S] is symmetric and [W] is skew-symmetric. Adding the two equations above gives

S_{ij} + W_{ij} = A_{ij},

or in matrix form, [A] = [S] + [W].
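The decomposition is immediate to verify numerically (a sketch, not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))

S = 0.5 * (A + A.T)   # symmetric part
W = 0.5 * (A - A.T)   # skew-symmetric part

print(np.allclose(S, S.T), np.allclose(W, -W.T), np.allclose(S + W, A))
```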
Example (1.5): Show that the quadratic form T_{ij} u_i u_j is unchanged if T_{ij} is replaced by its symmetric part, i.e. show that for any matrix [T],

T_{ij} u_i u_j = S_{ij} u_i u_j   for all u_i,   where   S_{ij} = (1/2)(T_{ij} + T_{ji}).   (i)
Solution: The result follows from the following calculation:

T_{ij} u_i u_j = [ (1/2) T_{ij} + (1/2) T_{ij} + (1/2) T_{ji} − (1/2) T_{ji} ] u_i u_j
             = (1/2)(T_{ij} + T_{ji}) u_i u_j + (1/2)(T_{ij} − T_{ji}) u_i u_j
             = S_{ij} u_i u_j,

where in the last step we have used the facts that A_{ij} = T_{ij} − T_{ji} is skew-symmetric, that B_{ij} = u_i u_j is symmetric, and that for any skew-symmetric matrix [A] and any symmetric matrix [B] one has A_{ij} B_{ij} = 0.
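A one-line numerical check that the skew part of [T] contributes nothing to the quadratic form (illustrative only; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
T = rng.standard_normal((4, 4))
u = rng.standard_normal(4)

S = 0.5 * (T + T.T)   # symmetric part of [T]

# T_ij u_i u_j equals S_ij u_i u_j: the skew part drops out.
print(np.isclose(u @ T @ u, u @ S @ u))
```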
Example (1.6): Suppose that D_{1111}, D_{1112}, ..., D_{111n}, ..., D_{1121}, D_{1122}, ..., D_{112n}, ..., D_{nnnn} are n^4 constants; and let D_{ijkl} denote a generic element of this set, where each of the subscripts i, j, k, l takes all values in the range 1, 2, ..., n. Let [E] be an arbitrary symmetric matrix and define the elements of a matrix [A] by

A_{ij} = D_{ijkl} E_{kl}.

Show that [A] is unchanged if D_{ijkl} is replaced by its “symmetric part” C_{ijkl} where

C_{ijkl} = (1/2)(D_{ijkl} + D_{ijlk}).   (i)
Solution: In a manner entirely analogous to the previous example,

A_{ij} = D_{ijkl} E_{kl} = [ (1/2) D_{ijkl} + (1/2) D_{ijkl} + (1/2) D_{ijlk} − (1/2) D_{ijlk} ] E_{kl}
       = (1/2)(D_{ijkl} + D_{ijlk}) E_{kl} + (1/2)(D_{ijkl} − D_{ijlk}) E_{kl}
       = C_{ijkl} E_{kl},

where in the last step we have used the fact that (D_{ijkl} − D_{ijlk}) E_{kl} = 0, since D_{ijkl} − D_{ijlk} is skew-symmetric in the subscripts k, l, while E_{kl} is symmetric in the subscripts k, l.
Example (1.7): Evaluate the expression δ_{ij} δ_{ik} δ_{jk}.

Solution: By using the substitution rule, first on the repeated index i and then on the repeated index j, we have

δ_{ij} δ_{ik} δ_{jk} = δ_{jk} δ_{jk} = δ_{kk} = δ_{11} + δ_{22} + ... + δ_{nn} = n.
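As a check, the triple contraction can be evaluated directly for, say, n = 5 (illustrative, NumPy assumed):

```python
import numpy as np

n = 5
delta = np.eye(n)
# delta_ij delta_ik delta_jk : summation over i, j and k leaves a scalar.
val = np.einsum('ij,ik,jk->', delta, delta, delta)
print(val == n)
```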
Example (1.8): Given an orthogonal matrix [Q], use indicial notation to solve the matrix equation [Q]{x} = {a} for {x}.

Solution: In indicial form, the equation [Q]{x} = {a} reads

Q_{ij} x_j = a_i.

Multiplying both sides by Q_{ik} gives

Q_{ik} Q_{ij} x_j = Q_{ik} a_i.

Since [Q] is orthogonal, we know from (1.39) that Q_{rp} Q_{rq} = δ_{pq}. Thus the preceding equation simplifies to

δ_{jk} x_j = Q_{ik} a_i,

which, by the substitution rule, reduces further to

x_k = Q_{ik} a_i.

In matrix notation this reads {x} = [Q]^T {a}, which we could, of course, have written down immediately from the fact that {x} = [Q]^{−1} {a} and, for an orthogonal matrix, [Q]^{−1} = [Q]^T.
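Numerically, the indicial solution x_k = Q_{ik} a_i is just multiplication by the transpose, as the following sketch confirms for an orthogonal [Q] built from a QR factorization (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(8)
# Build an orthogonal [Q] from the QR factorization of a random matrix.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
a = rng.standard_normal(3)

# x_k = Q_ik a_i, i.e. {x} = [Q]^T {a}
x = np.einsum('ik,i->k', Q, a)

print(np.allclose(Q @ x, a))
```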
Example (1.9): Consider the function f(x_1, x_2, ..., x_n) = A_{ij} x_i x_j where the A_{ij}'s are constants. Calculate the partial derivatives ∂f/∂x_i.

Solution: We begin by making two general observations. First, note that because of the summation on the indices i and j, it is incorrect to conclude that ∂f/∂x_i = A_{ij} x_j by viewing this in the same way as differentiating the function A_{12} x_1 x_2 with respect to x_1. Second, observe that if we differentiate f with respect to x_i and write ∂f/∂x_i = ∂(A_{ij} x_i x_j)/∂x_i, we would violate our rules because the right-hand side has the subscript i appearing three times in one symbol grouping. In order to get around this difficulty we make use of the fact that the specific choice of the index in a dummy subscript is not significant, and so we can write f = A_{pq} x_p x_q.
Differentiating f and using the fact that [A] is constant gives

∂f/∂x_i = ∂(A_{pq} x_p x_q)/∂x_i = A_{pq} ∂(x_p x_q)/∂x_i = A_{pq} [ (∂x_p/∂x_i) x_q + x_p (∂x_q/∂x_i) ].

Since the x_i's are independent variables, it follows that

∂x_i/∂x_j = 1 if i = j, 0 if i ≠ j;   i.e.   ∂x_i/∂x_j = δ_{ij}.
Using this above gives

∂f/∂x_i = A_{pq} [δ_{pi} x_q + x_p δ_{qi}] = A_{pq} δ_{pi} x_q + A_{pq} x_p δ_{qi},

which, by the substitution rule, simplifies to

∂f/∂x_i = A_{iq} x_q + A_{pi} x_p = A_{ij} x_j + A_{ji} x_j = (A_{ij} + A_{ji}) x_j.
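The result ∂f/∂x_i = (A_{ij} + A_{ji}) x_j can be confirmed against a finite-difference approximation of f (an illustrative sketch; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

grad = (A + A.T) @ x   # the derived formula: df/dx_i = (A_ij + A_ji) x_j

# Compare against central finite differences of f = A_ij x_i x_j.
f = lambda v: v @ A @ v
h = 1e-6
fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(n)])
print(np.allclose(grad, fd, atol=1e-5))
```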
Example (1.10): Suppose that {x}^T [A] {x} = 0 for all column matrices {x}, where the square matrix [A] is independent of {x}. What does this imply about [A]?

Solution: We know from a previous example that if [A] is skew-symmetric and [S] is symmetric then A_{ij} S_{ij} = 0, and as a special case of this that A_{ij} x_i x_j = 0 for all {x}. Thus a sufficient condition for the given equation to hold is that [A] be skew-symmetric. Now we show that this is also a necessary condition.

We are given that A_{ij} x_i x_j = 0 for all x_i. Since this equation holds for all x_i, we may differentiate both sides with respect to x_k and proceed as follows:

0 = ∂(A_{ij} x_i x_j)/∂x_k = A_{ij} ∂(x_i x_j)/∂x_k = A_{ij} (∂x_i/∂x_k) x_j + A_{ij} x_i (∂x_j/∂x_k) = A_{ij} δ_{ik} x_j + A_{ij} x_i δ_{jk},   (i)

where we have used the fact that ∂x_i/∂x_j = δ_{ij} in the last step. On using the substitution rule, this simplifies to

A_{kj} x_j + A_{ik} x_i = (A_{kj} + A_{jk}) x_j = 0.   (ii)

Since this also holds for all x_i, it may be differentiated again with respect to x_i to obtain

(A_{kj} + A_{jk}) ∂x_j/∂x_i = (A_{kj} + A_{jk}) δ_{ji} = A_{ki} + A_{ik} = 0.   (iii)

Thus [A] must necessarily be a skew-symmetric matrix. Therefore it is necessary and sufficient that [A] be skew-symmetric.
Example (1.11): Let C_{ijkl} be a set of n^4 constants. Define the function Ŵ([E]) for all matrices [E] by Ŵ([E]) = W(E_{11}, E_{12}, ..., E_{nn}) = (1/2) C_{ijkl} E_{ij} E_{kl}. Calculate

∂W/∂E_{ij}   and   ∂²W/∂E_{ij} ∂E_{kl}.   (i)
Solution: First, since the E_{ij}'s are independent variables, it follows that

∂E_{pq}/∂E_{ij} = 1 if p = i and q = j, and 0 otherwise;   therefore   ∂E_{pq}/∂E_{ij} = δ_{pi} δ_{qj}.   (ii)

Keeping this in mind and differentiating W(E_{11}, E_{12}, ..., E_{nn}) with respect to E_{ij} gives
∂W/∂E_{ij} = ∂[(1/2) C_{pqrs} E_{pq} E_{rs}]/∂E_{ij}
           = (1/2) C_{pqrs} [ (∂E_{pq}/∂E_{ij}) E_{rs} + E_{pq} (∂E_{rs}/∂E_{ij}) ]
           = (1/2) C_{pqrs} ( δ_{pi} δ_{qj} E_{rs} + δ_{ri} δ_{sj} E_{pq} )
           = (1/2) C_{ijrs} E_{rs} + (1/2) C_{pqij} E_{pq}
           = (1/2)(C_{ijpq} + C_{pqij}) E_{pq},
where we have made use of the substitution rule. (Note that in the first step we wrote W = (1/2) C_{pqrs} E_{pq} E_{rs} rather than W = (1/2) C_{ijkl} E_{ij} E_{kl}, because we would violate our rules for indices had we written ∂[(1/2) C_{ijkl} E_{ij} E_{kl}]/∂E_{ij}.)
Differentiating this once more with respect to E_{kl} gives

∂²W/∂E_{ij} ∂E_{kl} = ∂[(1/2)(C_{ijpq} + C_{pqij}) E_{pq}]/∂E_{kl} = (1/2)(C_{ijpq} + C_{pqij}) δ_{pk} δ_{ql}   (iii)
                   = (1/2)(C_{ijkl} + C_{klij}).   (iv)
Example (1.12): Evaluate the expression e_{ijk} e_{kij}.

Solution: By first using the skew-symmetry property (1.45), then using the identity (1.49), and finally using the substitution rule, we have

e_{ijk} e_{kij} = −e_{ijk} e_{ikj} = −(δ_{jk} δ_{kj} − δ_{jj} δ_{kk}) = −(δ_{jj} − δ_{jj} δ_{kk}) = −(3 − 3 × 3) = 6.
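By the cyclic property, e_{kij} = e_{ijk}, so the value 6 is simply the sum of the squares of the alternator's components; a direct evaluation confirms this (illustrative, NumPy assumed):

```python
import numpy as np

eps = np.zeros((3, 3, 3))
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1.0
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1.0

# e_ijk e_kij : all three indices are dummies, so the result is a scalar.
val = np.einsum('ijk,kij->', eps, eps)
print(val == 6)
```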
Example (1.13): Show that

e_{ijk} S_{jk} = 0   (i)

if and only if the matrix [S] is symmetric.

Solution: First, suppose that [S] is symmetric. Pick and fix the free subscript i at any value i = 1, 2, 3. Then we can think of e_{ijk} as the j, k element of a 3 × 3 matrix. Since e_{ijk} = −e_{ikj}, this is a skew-symmetric matrix. In a previous example we showed that S_{ij} W_{ij} = 0 for any symmetric matrix [S] and any skew-symmetric matrix [W]. Consequently (i) must hold.

Conversely, suppose that (i) holds for some matrix [S]. Multiplying (i) by e_{ipq} and using the identity (1.49) leads to

e_{ipq} e_{ijk} S_{jk} = (δ_{pj} δ_{qk} − δ_{pk} δ_{qj}) S_{jk} = S_{pq} − S_{qp} = 0,

where in the last step we have used the substitution rule. Thus S_{pq} = S_{qp} and so [S] is symmetric.
Remark: Note as a special case of this result that

e_{ijk} v_j v_k = 0   (ii)

for any arbitrary column matrix {v}.
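Both directions of this result are easy to observe numerically (an illustrative sketch; the unsymmetric counterexample relies on a generic random matrix):

```python
import numpy as np

eps = np.zeros((3, 3, 3))
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1.0
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1.0

rng = np.random.default_rng(13)
M = rng.standard_normal((3, 3))
S_sym = M + M.T

# e_ijk S_jk vanishes for a symmetric [S] ...
sym_contr = np.einsum('ijk,jk->i', eps, S_sym)
print(np.allclose(sym_contr, 0.0))
# ... but not, in general, for an unsymmetric matrix.
gen_contr = np.einsum('ijk,jk->i', eps, M)
print(not np.allclose(gen_contr, 0.0))
```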
References
1. R.A. Frazer, W.J. Duncan and A.R. Collar, Elementary Matrices, Cambridge University Press, 1965.
2. R. Bellman, Introduction to Matrix Analysis, McGraw-Hill, 1960.
Chapter 2
Vectors and Linear Transformations
Notation:
α ..... scalar
a ..... vector
A ..... linear transformation
As mentioned in the Preface, Linear Algebra is a far richer subject than the very restricted glimpse provided here might suggest. The discussion in these notes is limited almost entirely to (a) real 3-dimensional Euclidean vector spaces, and (b) linear transformations that carry vectors from one vector space into the same vector space. These notes are designed to review those aspects of linear algebra that will be encountered in our study of continuum mechanics; they are not meant to be a source for learning the subject of linear algebra for the first time.
The following notation will be consistently used: Greek letters will denote real numbers; lowercase boldface Latin letters will denote vectors; and uppercase boldface Latin letters will denote linear transformations. Thus, for example, α, β, γ, ... will denote scalars (real numbers); a, b, c, ... will denote vectors; and A, B, C, ... will denote linear transformations. In particular, “o” will denote the null vector while “0” will denote the null linear transformation.
2.1 Vectors
A vector space V is a collection of elements, called vectors, together with two operations, addition and multiplication by a scalar. The operation of addition has certain properties (which we do not list here) and associates with each pair of vectors x and y in V a vector, denoted by x + y, that is also in V. In particular, it is assumed that there is a unique vector o ∈ V, called the null vector, such that x + o = x. The operation of scalar multiplication has certain properties (which we do not list here) and associates with each vector x ∈ V and each real number α another vector in V, denoted by αx.
Let x_1, x_2, ..., x_k be k vectors in V. These vectors are said to be linearly independent if the only real numbers α_1, α_2, ..., α_k for which

α_1 x_1 + α_2 x_2 + ··· + α_k x_k = o   (2.1)

are the numbers α_1 = α_2 = ... = α_k = 0. If V contains n linearly independent vectors but does not contain n + 1 linearly independent vectors, we say that the dimension of V is n.
Unless stated otherwise, from hereon we restrict attention to 3dimensional vector spaces.
If V is a vector space, any set of three linearly independent vectors {e_1, e_2, e_3} is said to be a basis for V. Given any vector x ∈ V there exists a unique set of numbers ξ_1, ξ_2, ξ_3 such that

x = ξ_1 e_1 + ξ_2 e_2 + ξ_3 e_3;   (2.2)

the numbers ξ_1, ξ_2, ξ_3 are called the components of x in the basis {e_1, e_2, e_3}.
Let U be a subset of a vector space V; we say that U is a subspace (or linear manifold)
of V if, for every x, y ∈ U and every real number α, the vectors x +y and αx are also in U.
Thus a linear manifold U of V is itself a vector space under the same operations of addition
and multiplication by a scalar as in V.
A scalar-product (or inner product or dot product) on V is a function which assigns to each pair of vectors x, y in V a scalar, which we denote by x · y. A scalar-product has certain properties which we do not list here, except to note that it is required that

x · y = y · x   for all x, y ∈ V.   (2.3)

A Euclidean vector space is a vector space together with an inner product on that space. From hereon we shall restrict attention to 3-dimensional Euclidean vector spaces and denote such a space by E^3.
The length (or magnitude or norm) of a vector x is the scalar denoted by |x| and defined by

|x| = (x · x)^{1/2}.   (2.4)

A vector has zero length if and only if it is the null vector. A unit vector is a vector of unit length. The angle θ between two vectors x and y is defined by

cos θ = (x · y)/(|x| |y|),   0 ≤ θ ≤ π.   (2.5)

Two vectors x and y are orthogonal if x · y = 0. It is obvious, but nevertheless helpful, to note that if we are given two vectors x and y with x · y = 0 and y ≠ o, this does not necessarily imply that x = o; on the other hand, if x · y = 0 for every vector y, then x must be the null vector.
An orthonormal basis is a triplet of mutually orthogonal unit vectors e_1, e_2, e_3 ∈ E^3. For such a basis,

e_i · e_j = δ_{ij}   for i, j = 1, 2, 3,   (2.6)

where the Kronecker delta δ_{ij} is defined in the usual way by

δ_{ij} = 1 if i = j,   0 if i ≠ j.   (2.7)
A vector-product (or cross-product) on E^3 is a function which assigns to each ordered pair of vectors x, y ∈ E^3 a vector, which we denote by x × y. The vector-product must have certain properties (which we do not list here), except to note that it is required that

y × x = −x × y   for all x, y ∈ V.   (2.8)

One can show that

x × y = |x| |y| sin θ n,   (2.9)

where θ is the angle between x and y as defined by (2.5), and n is a unit vector in the direction x × y, which therefore is normal to the plane defined by x and y. Since n is parallel to x × y, and since it has unit length, it follows that n = (x × y)/|x × y|. The magnitude |x × y| of the cross-product can be interpreted geometrically as twice the area of the triangle formed by the vectors x and y. A basis {e_1, e_2, e_3} is said to be right-handed if

(e_1 × e_2) · e_3 > 0.   (2.10)
2.1.1 Euclidean point space
A Euclidean point space P, whose elements are called points, is related to a Euclidean vector space E^3 in the following manner. Every ordered pair of points (p, q) is uniquely associated with a vector in E^3, say →pq, such that

(i) →pq = −→qp for all p, q ∈ P.

(ii) →pq + →qr = →pr for all p, q, r ∈ P.

(iii) given an arbitrary point p ∈ P and an arbitrary vector x ∈ E^3, there is a unique point q ∈ P such that x = →pq. Here x is called the position of point q relative to the point p.
Pick and fix an arbitrary point o ∈ P (which we call the origin of P) and an arbitrary basis for E^3 of unit vectors e_1, e_2, e_3. Corresponding to any point p ∈ P there is a unique vector →op = x = x_1 e_1 + x_2 e_2 + x_3 e_3 ∈ E^3. The triplet (x_1, x_2, x_3) are called the coordinates of p in the (coordinate) frame F = {o; e_1, e_2, e_3} comprised of the origin o and the basis vectors e_1, e_2, e_3. If e_1, e_2, e_3 is an orthonormal basis, the coordinate frame {o; e_1, e_2, e_3} is called a rectangular cartesian coordinate frame.
2.2 Linear Transformations.
Consider a three-dimensional Euclidean vector space E^3. Let F be a function (or transformation) which assigns to each vector x ∈ E^3 a second vector y ∈ E^3:

y = F(x),   x ∈ E^3, y ∈ E^3;   (2.11)

F is said to be a linear transformation if it is such that

F(αx + βy) = αF(x) + βF(y)   (2.12)

for all scalars α, β and all vectors x, y ∈ E^3. When F is a linear transformation, we usually omit the parentheses and write Fx instead of F(x). Note that Fx is a vector: it is the image of x under the transformation F.
A linear transformation is defined by the way it operates on vectors in E^3. A geometric example of a linear transformation is the “projection operator” Π which projects vectors onto a given plane P. Let P be the plane normal to the unit vector n; see Figure 2.1.

Figure 2.1: The projection Πx of a vector x onto the plane P.

For any vector x ∈ E^3, Πx ∈ P is the vector obtained by projecting x onto P. It can be verified geometrically that Π is defined by

Πx = x − (x · n)n   for all x ∈ E^3.   (2.13)
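The component form of (2.13) can be tried directly; with n = e_3 the plane P is the e_1–e_2 plane and the projection simply zeroes the third component (a hypothetical numerical illustration, not from the notes):

```python
import numpy as np

# Projection onto the plane normal to n, per (2.13): Pi x = x - (x . n) n.
n = np.array([0.0, 0.0, 1.0])     # plane P is here the e1-e2 plane
x = np.array([1.0, 2.0, 3.0])

Px = x - (x @ n) * n

print(Px @ n == 0.0)              # the projected vector lies in the plane P
print(np.allclose(Px, [1.0, 2.0, 0.0]))
```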
Linear transformations tell us how vectors are mapped into other vectors. In particular, suppose that {y_1, y_2, y_3} are any three vectors in E^3 and that {x_1, x_2, x_3} are any three linearly independent vectors in E^3. Then there is a unique linear transformation F that maps {x_1, x_2, x_3} into {y_1, y_2, y_3}: y_1 = Fx_1, y_2 = Fx_2, y_3 = Fx_3. This follows from the fact that {x_1, x_2, x_3} is a basis for E^3. Therefore any arbitrary vector x can be expressed uniquely in the form x = ξ_1 x_1 + ξ_2 x_2 + ξ_3 x_3; consequently the image Fx of any vector x is given by Fx = ξ_1 y_1 + ξ_2 y_2 + ξ_3 y_3, which is a rule for assigning a unique vector Fx to any given vector x.
The null linear transformation 0 is the linear transformation that takes every vector x into the null vector o. The identity linear transformation I takes every vector x into itself. Thus

0x = o,   Ix = x   for all x ∈ E^3.   (2.14)
Let A and B be linear transformations on E^3 and let α be a scalar. The linear transformations A + B, AB and αA are defined as those linear transformations which are such that

(A + B)x = Ax + Bx   for all x ∈ E^3,   (2.15)
(AB)x = A(Bx)   for all x ∈ E^3,   (2.16)
(αA)x = α(Ax)   for all x ∈ E^3,   (2.17)

respectively; A + B is called the sum of A and B, AB the product, and αA the scalar multiple of A by α. In general,

AB ≠ BA.   (2.18)
The range of a linear transformation A (i.e., the collection of all vectors Ax as x takes all values in E^3) is a subspace of E^3. The dimension of this particular subspace is known as the rank of A. The set of all vectors x for which Ax = o is also a subspace of E^3; it is known as the null space of A.
Given any linear transformation A, one can show that there is a unique linear transformation, usually denoted by A^T, such that

Ax · y = x · A^T y   for all x, y ∈ E^3.   (2.19)

A^T is called the transpose of A. One can show that

(αA)^T = α A^T,   (A + B)^T = A^T + B^T,   (AB)^T = B^T A^T.   (2.20)
A linear transformation A is said to be symmetric if

A = A^T;   (2.21)

skew-symmetric if

A = −A^T.   (2.22)

Every linear transformation A can be represented as the sum of a symmetric linear transformation S and a skew-symmetric linear transformation W as follows:

A = S + W   where   S = (1/2)(A + A^T),   W = (1/2)(A − A^T).   (2.23)
For every skew-symmetric linear transformation W, it may be shown that

Wx · x = 0   for all x ∈ E^3;   (2.24)

moreover, there exists a vector w (called the axial vector of W) which has the property that

Wx = w × x   for all x ∈ E^3.   (2.25)
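In components, a skew-symmetric W and its axial vector are related by w = (W_{32}, W_{13}, W_{21}) — a standard component formula stated here as an aside, not taken from the notes. A quick check of (2.24) and (2.25):

```python
import numpy as np

# A skew-symmetric matrix and its axial vector w = (W_32, W_13, W_21).
W = np.array([[ 0.0, -3.0,  2.0],
              [ 3.0,  0.0, -1.0],
              [-2.0,  1.0,  0.0]])
w = np.array([W[2, 1], W[0, 2], W[1, 0]])   # = (1, 2, 3)

x = np.array([0.5, -1.0, 2.0])
print(np.allclose(W @ x, np.cross(w, x)))   # (2.25): Wx = w x x
print(abs(x @ W @ x) < 1e-12)               # (2.24): Wx . x = 0
```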
Given a linear transformation A, if the only vector x for which Ax = o is the zero vector, then we say that A is non-singular. It follows from this that if A is non-singular then Ax ≠ Ay whenever x ≠ y. Thus, a non-singular transformation A is a one-to-one transformation in the sense that, for any given y ∈ E^3, there is one and only one vector x ∈ E^3 for which Ax = y. Consequently, corresponding to any non-singular linear transformation A, there exists a second linear transformation, denoted by A^{−1} and called the inverse of A, such that Ax = y if and only if x = A^{−1} y, or equivalently, such that

A A^{−1} = A^{−1} A = I.   (2.26)
If {y_1, y_2, y_3} and {x_1, x_2, x_3} are two sets of linearly independent vectors in E^3, then there is a unique non-singular linear transformation F that maps {x_1, x_2, x_3} into {y_1, y_2, y_3}: y_1 = Fx_1, y_2 = Fx_2, y_3 = Fx_3. The inverse of F maps {y_1, y_2, y_3} into {x_1, x_2, x_3}. If both bases {x_1, x_2, x_3} and {y_1, y_2, y_3} are right-handed (or both are left-handed), we say that the linear transformation F preserves the orientation of the vector space.
If two linear transformations A and B are both non-singular, then so is AB; moreover,

(AB)^{−1} = B^{−1} A^{−1}.   (2.27)

If A is non-singular then so is A^T; moreover,

(A^T)^{−1} = (A^{−1})^T,   (2.28)

and so there is no ambiguity in writing this linear transformation as A^{−T}.
A linear transformation Q is said to be orthogonal if it preserves length, i.e., if

|Qx| = |x|   for all x ∈ E^3.   (2.29)

If Q is orthogonal, it follows that it also preserves the inner product:

Qx · Qy = x · y   for all x, y ∈ E^3.   (2.30)

Thus an orthogonal linear transformation preserves both the length of a vector and the angle between two vectors. If Q is orthogonal, it is necessarily non-singular and

Q^{−1} = Q^T.   (2.31)
A linear transformation A is said to be positive definite if

Ax · x > 0   for all x ∈ E^3, x ≠ o;   (2.32)

positive-semi-definite if

Ax · x ≥ 0   for all x ∈ E^3.   (2.33)
A positive definite linear transformation is necessarily non-singular. Moreover, A is positive definite if and only if its symmetric part (1/2)(A + A^T) is positive definite.
Let A be a linear transformation. A subspace U is known as an invariant subspace of A if Av ∈ U for all v ∈ U. Given a linear transformation A, suppose that there exists an associated one-dimensional invariant subspace U. Since U is one-dimensional, it follows that if v ∈ U then any other vector in U can be expressed in the form λv for some scalar λ. Since U is an invariant subspace, we know in addition that Av ∈ U whenever v ∈ U. Combining these two facts shows that Av = λv for all v ∈ U. A vector v and a scalar λ such that

Av = λv   (2.34)

are known, respectively, as an eigenvector and an eigenvalue of A. Each eigenvector of A characterizes a one-dimensional invariant subspace of A. Every linear transformation A (on a 3-dimensional vector space E^3) has at least one eigenvalue.
It can be shown that a symmetric linear transformation A has three real eigenvalues λ_1, λ_2, λ_3 and a corresponding set of three mutually orthogonal eigenvectors e_1, e_2, e_3. The particular basis of E^3 comprised of {e_1, e_2, e_3} is said to be a principal basis of A.
Every eigenvalue of a positive definite linear transformation must be positive, and no eigenvalue of a non-singular linear transformation can be zero. A symmetric linear transformation is positive definite if and only if all three of its eigenvalues are positive.
If e and λ are an eigenvector and eigenvalue of a linear transformation A, then for any positive integer n, it is easily seen that e and λ^n are an eigenvector and an eigenvalue of A^n, where A^n = AA···A (n times); this continues to be true for negative integers m provided A is non-singular and if by A^{−m} we mean (A^{−1})^m, m > 0.
Finally, according to the polar decomposition theorem, given any non-singular linear transformation F, there exist unique symmetric positive definite linear transformations U and V and a unique orthogonal linear transformation R such that

F = RU = VR.   (2.35)

If λ and r are an eigenvalue and eigenvector of U, then it can be readily shown that λ and Rr are an eigenvalue and eigenvector of V.
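A polar decomposition can be computed from the spectral decomposition of F^T F: take U = √(F^T F), then R = F U^{−1} and V = R U R^T. The sketch below (illustrative, not from the notes; NumPy assumed, and the random F is taken to be non-singular) verifies (2.35):

```python
import numpy as np

rng = np.random.default_rng(35)
F = rng.standard_normal((3, 3))   # almost surely non-singular

# U = sqrt(F^T F) via the eigendecomposition of the SPD matrix F^T F,
# then R = F U^{-1} and V = R U R^T.
lam2, e = np.linalg.eigh(F.T @ F)
U = e @ np.diag(np.sqrt(lam2)) @ e.T
R = F @ np.linalg.inv(U)
V = R @ U @ R.T

print(np.allclose(R @ R.T, np.eye(3)))            # R is orthogonal
print(np.allclose(F, R @ U), np.allclose(F, V @ R))
```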
Given two vectors a, b ∈ E^3, their tensor-product is the linear transformation, usually denoted by a ⊗ b, which is such that

(a ⊗ b)x = (x · b)a   for all x ∈ E^3.   (2.36)

Observe that for any x ∈ E^3, the vector (a ⊗ b)x is parallel to the vector a. Thus the range of the linear transformation a ⊗ b is the one-dimensional subspace of E^3 consisting of all vectors parallel to a. The rank of the linear transformation a ⊗ b is thus unity.

For any vectors a, b, c, and d it is easily shown that

(a ⊗ b)^T = b ⊗ a,   (a ⊗ b)(c ⊗ d) = (b · c)(a ⊗ d).   (2.37)

The product of a linear transformation A with the linear transformation a ⊗ b gives

A(a ⊗ b) = (Aa) ⊗ b,   (a ⊗ b)A = a ⊗ (A^T b).   (2.38)
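With a ⊗ b represented by the matrix of components a_i b_j (i.e. `np.outer`), the definition (2.36), the identities (2.37)–(2.38), and the rank-one property can all be checked directly (an illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(37)
a, b, c, d = rng.standard_normal((4, 3))
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

ab = np.outer(a, b)                        # components a_i b_j of a (x) b

print(np.allclose(ab @ x, (x @ b) * a))    # (2.36): (a(x)b)x = (x.b)a
print(np.allclose(ab.T, np.outer(b, a)))                            # (2.37)_1
print(np.allclose(ab @ np.outer(c, d), (b @ c) * np.outer(a, d)))   # (2.37)_2
print(np.allclose(A @ ab, np.outer(A @ a, b)))                      # (2.38)_1
print(np.allclose(ab @ A, np.outer(a, A.T @ b)))                    # (2.38)_2
print(np.linalg.matrix_rank(ab) == 1)      # rank of a (x) b is unity
```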
Let {e_1, e_2, e_3} be an orthonormal basis. Since this is a basis, any vector in E^3, and therefore in particular each of the vectors Ae_1, Ae_2, Ae_3, can be expressed as a unique linear combination of the basis vectors e_1, e_2, e_3. It follows that there exist unique real numbers A_{ij} such that

Ae_j = Σ_{i=1}^{3} A_{ij} e_i,   j = 1, 2, 3,   (2.39)

where A_{ij} is the i-th component of the vector Ae_j. They can equivalently be expressed as A_{ij} = e_i · (Ae_j). The linear transformation A can now be represented as

A = Σ_{i=1}^{3} Σ_{j=1}^{3} A_{ij} (e_i ⊗ e_j).   (2.40)
One refers to the A_{ij}'s as the components of the linear transformation A in the basis {e_1, e_2, e_3}. Note that

Σ_{i=1}^{3} e_i ⊗ e_i = I,   Σ_{i=1}^{3} (Ae_i) ⊗ e_i = A.   (2.41)
Let S be a symmetric linear transformation with eigenvalues λ_1, λ_2, λ_3 and corresponding (mutually orthogonal unit) eigenvectors e_1, e_2, e_3. Since Se_j = λ_j e_j for each j = 1, 2, 3, it follows from (2.39) that the components of S in the principal basis {e_1, e_2, e_3} are S_{11} = λ_1, S_{21} = S_{31} = 0; S_{12} = 0, S_{22} = λ_2, S_{32} = 0; S_{13} = S_{23} = 0, S_{33} = λ_3. It follows from the general representation (2.40) that S admits the representation

S = Σ_{i=1}^{3} λ_i (e_i ⊗ e_i);   (2.42)
this is called the spectral representation of a symmetric linear transformation. It can be readily shown that, for any positive integer n,

S^n = Σ_{i=1}^{3} λ_i^n (e_i ⊗ e_i);   (2.43)

if S is symmetric and non-singular, then

S^{−1} = Σ_{i=1}^{3} (1/λ_i) (e_i ⊗ e_i).   (2.44)

If S is symmetric and positive definite, there is a unique symmetric positive definite linear transformation T such that T^2 = S. We call T the positive definite square root of S and denote it by T = √S. It is readily seen that

√S = Σ_{i=1}^{3} √λ_i (e_i ⊗ e_i).   (2.45)
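The spectral formulas (2.42)–(2.45) translate line-for-line into code once the eigenvalues and eigenvectors of S are in hand (an illustrative sketch; `np.linalg.eigh` handles the symmetric eigenproblem):

```python
import numpy as np

rng = np.random.default_rng(42)
M = rng.standard_normal((3, 3))
S = M @ M.T + 3.0 * np.eye(3)      # symmetric positive definite

lam, e = np.linalg.eigh(S)         # eigenvalues lam_i, orthonormal eigenvectors e_i

# (2.42): spectral representation S = sum lam_i e_i (x) e_i.
S_spec = sum(lam[i] * np.outer(e[:, i], e[:, i]) for i in range(3))
print(np.allclose(S, S_spec))

# (2.43)/(2.44): powers and the inverse share the same eigenvectors.
S3 = sum(lam[i]**3 * np.outer(e[:, i], e[:, i]) for i in range(3))
print(np.allclose(S3, S @ S @ S))
Sinv = sum((1.0 / lam[i]) * np.outer(e[:, i], e[:, i]) for i in range(3))
print(np.allclose(Sinv @ S, np.eye(3)))

# (2.45): the positive definite square root.
T = sum(np.sqrt(lam[i]) * np.outer(e[:, i], e[:, i]) for i in range(3))
print(np.allclose(T @ T, S))
```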
2.3 Worked Examples.
Example 2.1: Given three vectors a, b, c, show that

a · (b × c) = b · (c × a) = c · (a × b).

Solution: By the properties of the vector-product, the vector (a + b) is normal to the vector (a + b) × c. Thus

(a + b) · [(a + b) × c] = 0.

On expanding this out one obtains

a · (a × c) + a · (b × c) + b · (a × c) + b · (b × c) = 0.

Since a is normal to (a × c), and b is normal to (b × c), the first and last terms in this equation vanish. Finally, recall that a × c = −c × a. Thus the preceding equation simplifies to

a · (b × c) = b · (c × a).

This establishes the first part of the result. The second part is shown analogously.
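The cyclic identity is easily confirmed for random vectors (illustrative only; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(21)
a, b, c = rng.standard_normal((3, 3))

t1 = a @ np.cross(b, c)
t2 = b @ np.cross(c, a)
t3 = c @ np.cross(a, b)
print(np.allclose([t1, t2], t3))
```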
Example 2.2: Show that a necessary and sufficient condition for three vectors a, b, c in E^3 – none of which is the null vector – to be linearly dependent is that a · (b × c) = 0.

Solution: To show necessity, suppose that the three vectors a, b, c are linearly dependent. It follows that

αa + βb + γc = o

for some real numbers α, β, γ, at least one of which is non-zero. Taking the vector-product of this equation with c and then taking the scalar-product of the result with a leads to

βa · (b × c) = 0.

Analogous calculations with the other pairs of vectors, keeping in mind that a · (b × c) = b · (c × a) = c · (a × b), lead to

αa · (b × c) = 0,   βa · (b × c) = 0,   γa · (b × c) = 0.

Since at least one of α, β, γ is non-zero, it follows that necessarily a · (b × c) = 0.

To show sufficiency, let a · (b × c) = 0 and assume that a, b, c are linearly independent. We will show that this leads to a contradiction, whence a, b, c must be linearly dependent. By the properties of the vector-product, the vector b × c is normal to the plane defined by the vectors b and c. By assumption, a · (b × c) = 0, and this implies that a is normal to b × c. Since we are in E^3, this means that a must lie in the plane defined by b and c. But then a, b, c cannot be linearly independent, which is the desired contradiction.
Example 2.3: Interpret the quantity a · (b × c) geometrically in terms of the volume of the tetrahedron defined by the vectors a, b, c.

Solution: Consider the tetrahedron formed by the three vectors a, b, c as depicted in Figure 2.2. Its volume is V_0 = (1/3) A_0 h_0, where A_0 is the area of its base and h_0 is its height.

Figure 2.2: Volume of the tetrahedron defined by vectors a, b, c.

Consider the triangle defined by the vectors a and b to be the base of the tetrahedron. Its area A_0 can be written as (1/2) base × height = (1/2)|a| (|b| sin θ), where θ is the angle between a and b. However, from the property (2.9) of the vector-product we have |a × b| = |a| |b| sin θ, and so A_0 = |a × b|/2.

Next, n = (a × b)/|a × b| is a unit vector that is normal to the base of the tetrahedron, and so the height of the tetrahedron is h_0 = c · n; see Figure 2.2.

Therefore

V_0 = (1/3) A_0 h_0 = (1/3) (|a × b|/2) (c · n) = (1/6) (a × b) · c.   (i)

Observe that this provides a geometric explanation for why the vectors a, b, c are linearly dependent if and only if (a × b) · c = 0.
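Packaging the result (i) as a small function makes the geometric statement testable: the (unsigned) volume is (1/6)|(a × b) · c|, and it vanishes exactly when the three vectors are linearly dependent (a hypothetical helper, not from the notes):

```python
import numpy as np

def tet_volume(a, b, c):
    """Unsigned volume (1/6)|(a x b) . c| of the tetrahedron spanned by a, b, c.

    The absolute value is taken because (i) gives a signed quantity:
    it is negative for a left-handed triple of vectors.
    """
    return abs(np.cross(a, b) @ c) / 6.0

# Unit-edge tetrahedron along the coordinate axes has volume 1/6.
print(tet_volume(np.array([1.0, 0.0, 0.0]),
                 np.array([0.0, 1.0, 0.0]),
                 np.array([0.0, 0.0, 1.0])) == 1 / 6)

# Linearly dependent vectors give zero volume.
print(tet_volume(np.array([1.0, 2.0, 3.0]),
                 np.array([2.0, 4.0, 6.0]),
                 np.array([0.0, 1.0, 0.0])) == 0.0)
```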
Example 2.4: Let φ(x) be a scalar-valued function defined on the vector space E^3. If φ is linear, i.e. if φ(αx + βy) = αφ(x) + βφ(y) for all scalars α, β and all vectors x, y, show that φ(x) = c · x for some constant vector c. Remark: This shows that the scalar-product is the most general scalar-valued linear function of a vector.

Solution: Let {e_1, e_2, e_3} be any orthonormal basis for E^3. Then an arbitrary vector x can be written in terms of its components as x = x_1 e_1 + x_2 e_2 + x_3 e_3. Therefore

φ(x) = φ(x_1 e_1 + x_2 e_2 + x_3 e_3),

which because of the linearity of φ leads to

φ(x) = x_1 φ(e_1) + x_2 φ(e_2) + x_3 φ(e_3).

On setting c_i = φ(e_i), i = 1, 2, 3, we find

φ(x) = x_1 c_1 + x_2 c_2 + x_3 c_3 = c · x,

where c = c_1 e_1 + c_2 e_2 + c_3 e_3.
Example 2.5: If two linear transformations A and B have the property that Ax · y = Bx · y for all vectors x and y, show that A = B.

Solution: Since (Ax − Bx) · y = 0 for all vectors y, we may choose y = Ax − Bx in this, leading to |Ax − Bx|^2 = 0. Since the only vector of zero length is the null vector, this implies that

Ax = Bx   for all vectors x,   (i)

and so A = B.
Example 2.6: Let n be a unit vector, and let P be the plane through o normal to n. Let Π and R be the transformations which, respectively, project and reflect a vector in the plane P.

a. Show that Π and R are linear transformations; Π is called the "projection linear transformation" while R is known as the "reflection linear transformation".

b. Show that R(Rx) = x for all $x \in \mathbb{E}^3$.

c. Verify that a reflection linear transformation R is non-singular while a projection linear transformation Π is singular. What is the inverse of R?

d. Verify that a projection linear transformation Π is symmetric and that a reflection linear transformation R is orthogonal.
Figure 2.3: The projection Πx and reflection Rx of a vector x on the plane P.
e. Show that the projection linear transformation and reﬂection linear transformation can be represented
as Π = I −n ⊗n and R = I −2(n ⊗n) respectively.
Solution:
a. Figure 2.3 shows a sketch of the plane P, its unit normal vector n, a generic vector x, its projection
Πx and its reﬂection Rx. By geometry we see that
Πx = x −(x · n)n, Rx = x −2(x · n)n. (i)
These deﬁne the images Πx and Rx of a generic vector x under the transformation Π and R. One
can readily verify that Π and R satisfy the requirement (2.12) of a linear transformation.
b. Applying the definition (i)$_2$ of R to the vector Rx gives
$$R(Rx) = (Rx) - 2\big((Rx) \cdot n\big)n.$$
Replacing Rx on the right-hand side of this equation by (i)$_2$, and expanding the resulting expression, shows that the right-hand side simplifies to x. Thus R(Rx) = x.
c. Applying the definition (i)$_1$ of Π to the vector n gives
$$\Pi n = n - (n \cdot n)n = n - n = o.$$
Therefore Πn = o and (since n ≠ o) we see that o is not the only vector that is mapped to the null vector by Π. The transformation Π is therefore singular.

Next consider the transformation R and consider a vector x that is mapped by it to the null vector, i.e. Rx = o. Using (i)$_2$,
$$x = 2(x \cdot n)n.$$
Taking the scalar-product of this equation with the unit vector n yields x · n = 2(x · n), from which we conclude that x · n = 0. Substituting this into the right-hand side of the preceding equation leads to x = o. Therefore Rx = o if and only if x = o, and so R is non-singular.

To find the inverse of R, recall from part (b) that R(Rx) = x. Operating on both sides of this equation by $R^{-1}$ gives $Rx = R^{-1}x$. Since this holds for all vectors x it follows that $R^{-1} = R$.
d. To show that Π is symmetric we simply use its definition (i)$_1$ to calculate Πx · y and x · Πy for arbitrary vectors x and y. This yields
$$\Pi x \cdot y = \big(x - (x \cdot n)n\big) \cdot y = x \cdot y - (x \cdot n)(y \cdot n)$$
and
$$x \cdot \Pi y = x \cdot \big(y - (y \cdot n)n\big) = x \cdot y - (x \cdot n)(y \cdot n).$$
Thus Πx · y = x · Πy and so Π is symmetric.
To show that R is orthogonal we must show that $RR^T = I$, or $R^T = R^{-1}$. We begin by calculating $R^T$. Recall from the definition (2.19) that the transpose satisfies the requirement $x \cdot R^T y = Rx \cdot y$. Using the definition (i)$_2$ of R on the right-hand side of this equation yields
$$x \cdot R^T y = x \cdot y - 2(x \cdot n)(y \cdot n).$$
We can rearrange the right-hand side of this equation so it reads
$$x \cdot R^T y = x \cdot \big(y - 2(y \cdot n)n\big).$$
Since this holds for all x it follows that $R^T y = y - 2(y \cdot n)n$. Comparing this with (i)$_2$ shows that $R^T = R$. In part (c) we showed that $R^{-1} = R$, and so it now follows that $R^T = R^{-1}$. Thus R is orthogonal.
e. Applying the operation (I − n ⊗ n) to an arbitrary vector x gives
$$(I - n \otimes n)x = x - (n \otimes n)x = x - (x \cdot n)n = \Pi x$$
and so Π = I − n ⊗ n. Similarly
$$(I - 2n \otimes n)x = x - 2(x \cdot n)n = Rx$$
and so R = I − 2n ⊗ n.
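As a numerical sanity check of parts (b)–(e), one can assemble the component matrices of Π = I − n ⊗ n and R = I − 2(n ⊗ n) for a particular unit normal and confirm that Πn = o, that R² = I, and that R is symmetric (hence, with part (c), orthogonal). A minimal Python sketch; the helper functions and the choice of n are mine:

```python
import math

def outer(u, v):
    return [[u[i]*v[j] for j in range(3)] for i in range(3)]

def matvec(M, x):
    return [sum(M[i][j]*x[j] for j in range(3)) for i in range(3)]

def matmul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

I = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

n = [1.0/math.sqrt(3.0)] * 3   # a unit normal to the plane P
nn = outer(n, n)
Pi = [[I[i][j] -     nn[i][j] for j in range(3)] for i in range(3)]  # projection
R  = [[I[i][j] - 2.0*nn[i][j] for j in range(3)] for i in range(3)]  # reflection

Pn = matvec(Pi, n)   # ~ [0, 0, 0]: Pi maps n to o, so Pi is singular
RR = matmul(R, R)    # ~ identity: R(Rx) = x
```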
Example 2.7: If W is a skew symmetric linear transformation, show that
$$Wx \cdot x = 0 \quad \text{for all } x. \qquad (i)$$
Solution: By the definition (2.19) of the transpose, we have $Wx \cdot x = x \cdot W^T x$; and since $W = -W^T$ for a skew symmetric linear transformation, this can be written as $Wx \cdot x = -x \cdot Wx$. Finally, the property (2.3) of the scalar-product allows this to be written as $Wx \cdot x = -Wx \cdot x$, from which the desired result follows.
Example 2.8: Show that $(AB)^T = B^T A^T$.

Solution: First, by the definition (2.19) of the transpose,
$$(AB)x \cdot y = x \cdot (AB)^T y. \qquad (i)$$
Second, note that (AB)x · y = A(Bx) · y. By the definition of the transpose of A we have $A(Bx) \cdot y = Bx \cdot A^T y$; and by the definition of the transpose of B we have $Bx \cdot A^T y = x \cdot B^T A^T y$. Therefore combining these three equations shows that
$$(AB)x \cdot y = x \cdot B^T A^T y. \qquad (ii)$$
Equating these two expressions for (AB)x · y gives $x \cdot (AB)^T y = x \cdot B^T A^T y$ for all vectors x, y, which establishes the desired result.
Example 2.9: If o is the null vector, then show that Ao = o for any linear transformation A.
Solution: The null vector o has the property that when it is added to any vector, the vector remains
unchanged. Therefore x + o = x, and similarly Ax + o = Ax. However operating on the ﬁrst of these
equations by A shows that Ax + Ao = Ax, which when combined with the second equation yields the
desired result.
Example 2.10: If A and B are non-singular linear transformations, show that AB is also non-singular and that $(AB)^{-1} = B^{-1}A^{-1}$.

Solution: Let $C = B^{-1}A^{-1}$. We will show that (AB)C = C(AB) = I and therefore that C is the inverse of AB. (Since the inverse would thus have been shown to exist, necessarily AB must be non-singular.) Observe first that
$$(AB)C = (AB)B^{-1}A^{-1} = A(BB^{-1})A^{-1} = AIA^{-1} = I,$$
and similarly that
$$C(AB) = B^{-1}A^{-1}(AB) = B^{-1}(A^{-1}A)B = B^{-1}IB = I.$$
Therefore (AB)C = C(AB) = I and so C is the inverse of AB.
Example 2.11: If A is non-singular, show that $(A^{-1})^T = (A^T)^{-1}$.

Solution: Since $(A^T)^{-1}$ is the inverse of $A^T$ we have $(A^T)^{-1}A^T = I$. Post-operating on both sides of this equation by $(A^{-1})^T$ gives
$$(A^T)^{-1}A^T(A^{-1})^T = (A^{-1})^T.$$
Recall that $(AB)^T = B^T A^T$ for any two linear transformations A and B. Thus the preceding equation simplifies to
$$(A^T)^{-1}(A^{-1}A)^T = (A^{-1})^T.$$
Since $A^{-1}A = I$ the desired result follows.
Example 2.12: Show that an orthogonal linear transformation Q preserves inner products, i.e. show that Qx · Qy = x · y for all vectors x, y.

Solution: Since
$$(x - y) \cdot (x - y) = x \cdot x + y \cdot y - 2x \cdot y$$
it follows that
$$x \cdot y = \frac{1}{2}\big(|x|^2 + |y|^2 - |x - y|^2\big). \qquad (i)$$
Since this holds for all vectors x, y, it must also hold when x and y are replaced by Qx and Qy:
$$Qx \cdot Qy = \frac{1}{2}\big(|Qx|^2 + |Qy|^2 - |Qx - Qy|^2\big).$$
By definition, an orthogonal linear transformation Q preserves length, i.e. |Qv| = |v| for all vectors v. Thus the preceding equation simplifies to
$$Qx \cdot Qy = \frac{1}{2}\big(|x|^2 + |y|^2 - |x - y|^2\big). \qquad (ii)$$
Since the right-hand sides of the preceding expressions for x · y and Qx · Qy are the same, it follows that Qx · Qy = x · y.
Remark: Thus an orthogonal linear transformation preserves the length of any vector and the inner product
between any two vectors. It follows therefore that an orthogonal linear transformation preserves the angle
between a pair of vectors as well.
Example 2.13: Let Q be an orthogonal linear transformation. Show that

a. Q is non-singular, and that

b. $Q^{-1} = Q^T$.

Solution:

a. To show that Q is non-singular we must show that the only vector x for which Qx = o is the null vector x = o. Suppose that Qx = o for some vector x. Taking the norm of the two sides of this equation leads to |Qx| = |o| = 0. However an orthogonal linear transformation preserves length, and therefore |Qx| = |x|. Consequently |x| = 0. However the only vector of zero length is the null vector, and so necessarily x = o. Thus Q is non-singular.

b. Since Q is orthogonal it preserves the inner product: Qx · Qy = x · y for all vectors x and y. However the property (2.19) of the transpose shows that $Qx \cdot Qy = x \cdot Q^T Qy$. It follows that $x \cdot Q^T Qy = x \cdot y$ for all vectors x and y, and therefore that $Q^T Q = I$. Thus $Q^{-1} = Q^T$.
Example 2.14: If $\alpha_1$ and $\alpha_2$ are two distinct eigenvalues of a symmetric linear transformation A, show that the corresponding eigenvectors $a_1$ and $a_2$ are orthogonal to each other.

Solution: Recall from the definition of the transpose that $Aa_1 \cdot a_2 = a_1 \cdot A^T a_2$, and since A is symmetric that $A = A^T$. Thus
$$Aa_1 \cdot a_2 = a_1 \cdot Aa_2.$$
Since $a_1$ and $a_2$ are eigenvectors of A corresponding to the eigenvalues $\alpha_1$ and $\alpha_2$, we have $Aa_1 = \alpha_1 a_1$ and $Aa_2 = \alpha_2 a_2$. Thus the preceding equation reduces to $\alpha_1\, a_1 \cdot a_2 = \alpha_2\, a_1 \cdot a_2$, or equivalently
$$(\alpha_1 - \alpha_2)(a_1 \cdot a_2) = 0.$$
Since $\alpha_1 \neq \alpha_2$, it follows that necessarily $a_1 \cdot a_2 = 0$.
Example 2.15: If λ and e are an eigenvalue and eigenvector of an arbitrary linear transformation A, show that λ and $P^{-1}e$ are an eigenvalue and eigenvector of the linear transformation $P^{-1}AP$. Here P is an arbitrary non-singular linear transformation.

Solution: Since $PP^{-1} = I$ it follows that $Ae = APP^{-1}e$. However we are told that Ae = λe, whence $APP^{-1}e = \lambda e$. Operating on both sides with $P^{-1}$ gives $P^{-1}APP^{-1}e = \lambda P^{-1}e$, which establishes the result.
Example 2.16: If λ is an eigenvalue of an orthogonal linear transformation Q, show that |λ| = 1.

Solution: Let λ and e be an eigenvalue and corresponding eigenvector of Q. Thus Qe = λe, and so $|Qe| = |\lambda e| = |\lambda|\,|e|$. However, Q preserves length and so |Qe| = |e|. Thus |λ| = 1.

Remark: We will show later that +1 is an eigenvalue of a "proper" orthogonal linear transformation on $\mathbb{E}^3$. The corresponding eigenvector is known as the axis of Q.
Example 2.17: The components of a linear transformation A in an orthonormal basis $\{e_1, e_2, e_3\}$ are the unique real numbers $A_{ij}$ defined by
$$Ae_j = \sum_{i=1}^{3} A_{ij}\, e_i, \qquad j = 1, 2, 3. \qquad (i)$$
Show that the linear transformation A can be represented as
$$A = \sum_{i=1}^{3}\sum_{j=1}^{3} A_{ij}\,(e_i \otimes e_j). \qquad (ii)$$
Solution: Consider the linear transformation on the right-hand side of (ii) and operate it on an arbitrary vector x:
$$\Big(\sum_{i=1}^{3}\sum_{j=1}^{3} A_{ij}\,(e_i \otimes e_j)\Big)x = \sum_{i=1}^{3}\sum_{j=1}^{3} A_{ij}\,(x \cdot e_j)\,e_i = \sum_{i=1}^{3}\sum_{j=1}^{3} A_{ij}\,x_j\, e_i = \sum_{j=1}^{3} x_j \sum_{i=1}^{3} A_{ij}\, e_i,$$
where we have used the facts that $(p \otimes q)r = (q \cdot r)p$ and $x_i = x \cdot e_i$. On using (i) in the right-most expression above, we can continue this calculation as follows:
$$\Big(\sum_{i=1}^{3}\sum_{j=1}^{3} A_{ij}\,(e_i \otimes e_j)\Big)x = \sum_{j=1}^{3} x_j\, Ae_j = A\sum_{j=1}^{3} x_j\, e_j = Ax.$$
The desired result follows from this, since it holds for arbitrary vectors x.
Example 2.18: Let R be a "rotation transformation" that rotates vectors in $\mathbb{E}^3$ through an angle θ, 0 < θ < π, about an axis e (in the sense of the right-hand rule). Show that R can be represented as
$$R = e \otimes e + (e_1 \otimes e_1 + e_2 \otimes e_2)\cos\theta - (e_1 \otimes e_2 - e_2 \otimes e_1)\sin\theta, \qquad (i)$$
where $e_1$ and $e_2$ are any two mutually orthogonal vectors such that $\{e_1, e_2, e\}$ forms a right-handed orthonormal basis for $\mathbb{E}^3$.

Solution: We begin by listing what is given to us in the problem statement. Since the transformation R simply rotates vectors, it necessarily preserves the length of a vector, and so
$$|Rx| = |x| \quad \text{for all vectors } x. \qquad (ii)$$
In addition, since the angle through which R rotates a vector is θ, the angle between any vector x and its image Rx is θ:
$$Rx \cdot x = |x|^2 \cos\theta \quad \text{for all vectors } x. \qquad (iii)$$
Next, since R rotates vectors about the axis e, the angle between any vector x and e must equal the angle between Rx and e:
$$Rx \cdot e = x \cdot e \quad \text{for all vectors } x; \qquad (iv)$$
moreover, it leaves the axis e itself unchanged:
$$Re = e. \qquad (v)$$
And finally, since the rotation is in the sense of the right-hand rule, for any vector x that is not parallel to the axis e, the vectors x, Rx and e must obey the inequality
$$(x \times Rx) \cdot e > 0 \quad \text{for all vectors } x \text{ that are not parallel to } e. \qquad (vi)$$
Let $\{e_1, e_2, e\}$ be a right-handed orthonormal basis. This implies that any vector in $\mathbb{E}^3$, and therefore in particular the vectors $Re_1$, $Re_2$ and $Re$, can be expressed as linear combinations of $e_1$, $e_2$ and e:
$$Re_1 = R_{11}e_1 + R_{21}e_2 + R_{31}e, \qquad Re_2 = R_{12}e_1 + R_{22}e_2 + R_{32}e, \qquad Re = R_{13}e_1 + R_{23}e_2 + R_{33}e, \qquad (vii)$$
for some unique real numbers $R_{ij}$, i, j = 1, 2, 3.

First, it follows from (v) and (vii)$_3$ that
$$R_{13} = 0, \qquad R_{23} = 0, \qquad R_{33} = 1.$$
Second, we conclude from (iv) with the choice $x = e_1$ that $Re_1 \cdot e = 0$. Similarly $Re_2 \cdot e = 0$. These together with (vii) imply that
$$R_{31} = R_{32} = 0.$$
Third, it follows from (iii) with $x = e_1$ and (vii)$_1$ that $R_{11} = \cos\theta$. One similarly shows that $R_{22} = \cos\theta$. Thus
$$R_{11} = R_{22} = \cos\theta.$$
Collecting these results allows us to write (vii) as
$$Re_1 = \cos\theta\, e_1 + R_{21}\, e_2, \qquad Re_2 = R_{12}\, e_1 + \cos\theta\, e_2, \qquad Re = e. \qquad (viii)$$
Fourth, the inequality (vi) with the choice $x = e_1$, together with (viii) and the fact that $\{e_1, e_2, e\}$ forms a right-handed basis, yields $R_{21} > 0$. Similarly the choice $x = e_2$ yields $R_{12} < 0$. Fifth, (ii) with $x = e_1$ gives $|Re_1| = 1$, which in view of (viii)$_1$ requires that $R_{21} = \pm\sin\theta$. Similarly we find that $R_{12} = \pm\sin\theta$. Collecting these results shows that
$$R_{21} = +\sin\theta, \qquad R_{12} = -\sin\theta,$$
since 0 < θ < π. Thus in conclusion we can write (viii) as
$$Re_1 = \cos\theta\, e_1 + \sin\theta\, e_2, \qquad Re_2 = -\sin\theta\, e_1 + \cos\theta\, e_2, \qquad Re = e. \qquad (ix)$$
Finally, recall the representation (2.40) of a linear transformation in terms of its components as defined in (2.39). Applying this to (ix) allows us to write
$$R = \cos\theta\,(e_1 \otimes e_1) + \sin\theta\,(e_2 \otimes e_1) - \sin\theta\,(e_1 \otimes e_2) + \cos\theta\,(e_2 \otimes e_2) + (e \otimes e), \qquad (x)$$
which can be rearranged to give the desired result.
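The representation (i) can be checked numerically: pick a right-handed orthonormal triplet, assemble R from its five dyadic terms, and confirm that it fixes the axis and rotates $e_1$ by θ. A minimal Python sketch; the particular choice of axis (e along the third coordinate direction, so $e_1$, $e_2$ are the first two) and the helpers are mine:

```python
import math

def outer(u, v):
    return [[u[i]*v[j] for j in range(3)] for i in range(3)]

def matvec(M, x):
    return [sum(M[i][j]*x[j] for j in range(3)) for i in range(3)]

e1, e2, e = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]  # right-handed triplet
theta = math.pi / 3.0
c, s = math.cos(theta), math.sin(theta)

# R = e(x)e + (e1(x)e1 + e2(x)e2) cos(theta) - (e1(x)e2 - e2(x)e1) sin(theta)
terms = [(1.0, outer(e, e)),
         (c, outer(e1, e1)), (c, outer(e2, e2)),
         (-s, outer(e1, e2)), (s, outer(e2, e1))]
R = [[sum(w * M[i][j] for w, M in terms) for j in range(3)] for i in range(3)]

Re  = matvec(R, e)    # the axis is left unchanged: Re = e
Re1 = matvec(R, e1)   # e1 is rotated: cos(theta) e1 + sin(theta) e2
```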
Example 2.19: If F is a non-singular linear transformation, show that $F^T F$ is symmetric and positive definite.

Solution: For any linear transformations A and B we know that $(AB)^T = B^T A^T$ and $(A^T)^T = A$. It therefore follows that
$$(F^T F)^T = F^T (F^T)^T = F^T F; \qquad (i)$$
this shows that $F^T F$ is symmetric.

In order to show that $F^T F$ is positive definite, we consider the quadratic form $F^T Fx \cdot x$. By using the property (2.19) of the transpose, we can write
$$F^T Fx \cdot x = (Fx) \cdot (Fx) = |Fx|^2 \geq 0. \qquad (ii)$$
Further, equality holds here if and only if Fx = o, which, since F is non-singular, can happen only if x = o. Thus $F^T Fx \cdot x > 0$ for all vectors x ≠ o, and so $F^T F$ is positive definite.
Example 2.20: Consider a symmetric positive definite linear transformation S. Show that it has a unique symmetric positive definite square root, i.e. show that there is a unique symmetric positive definite linear transformation T for which $T^2 = S$.

Solution: Since S is symmetric and positive definite it has three real positive eigenvalues $\sigma_1, \sigma_2, \sigma_3$ with corresponding eigenvectors $s_1, s_2, s_3$, which may be taken to be orthonormal. Further, we know that S can be represented as
$$S = \sum_{i=1}^{3} \sigma_i\,(s_i \otimes s_i). \qquad (i)$$
If one defines a linear transformation T by
$$T = \sum_{i=1}^{3} \sqrt{\sigma_i}\,(s_i \otimes s_i), \qquad (ii)$$
one can readily verify that T is symmetric, positive definite, and that $T^2 = S$. This establishes the existence of a symmetric positive definite square root of S. What remains is to show uniqueness of this square root.

Suppose that S has two symmetric positive definite square roots $T_1$ and $T_2$: $S = T_1^2 = T_2^2$. Let σ > 0 and s be an eigenvalue and corresponding eigenvector of S. Then Ss = σs, and so $T_1^2 s = \sigma s$. Thus we have
$$(T_1 + \sqrt{\sigma}\,I)(T_1 - \sqrt{\sigma}\,I)s = o. \qquad (iii)$$
If we set $f = (T_1 - \sqrt{\sigma}\,I)s$, this can be written as
$$T_1 f = -\sqrt{\sigma}\,f. \qquad (iv)$$
Thus either f = o or f is an eigenvector of $T_1$ corresponding to the eigenvalue $-\sqrt{\sigma}\,(<0)$. Since $T_1$ is positive definite it cannot have a negative eigenvalue. Thus f = o and so
$$T_1 s = \sqrt{\sigma}\,s. \qquad (v)$$
It similarly follows that $T_2 s = \sqrt{\sigma}\,s$ and therefore that
$$T_1 s = T_2 s. \qquad (vi)$$
This holds for every eigenvector s of S: i.e. $T_1 s_i = T_2 s_i$, i = 1, 2, 3. Since the triplet of eigenvectors forms a basis for the underlying vector space, this in turn implies that $T_1 x = T_2 x$ for any vector x. Thus $T_1 = T_2$.
Example 2.21: Polar Decomposition Theorem: If F is a non-singular linear transformation, show that there exists a unique positive definite symmetric linear transformation U, and a unique orthogonal linear transformation R, such that F = RU.

Solution: It follows from Example 2.19 that $F^T F$ is symmetric and positive definite. It then follows from Example 2.20 that $F^T F$ has a unique symmetric positive definite square root, say, U:
$$U = \sqrt{F^T F}. \qquad (i)$$
Finally, since U is positive definite, it is non-singular, and its inverse $U^{-1}$ exists. Define the linear transformation R through
$$R = FU^{-1}. \qquad (ii)$$
All we have to do is to show that R is orthogonal. But this follows from
$$R^T R = (FU^{-1})^T(FU^{-1}) = (U^{-1})^T F^T F U^{-1} = U^{-1}U^2 U^{-1} = I. \qquad (iii)$$
In this calculation we have used the fact that U, and so $U^{-1}$, is symmetric. This establishes the proposition (except for the uniqueness, which is left as an exercise).
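The construction U = √(FᵀF), R = FU⁻¹ can be followed step by step in a small numerical experiment. The sketch below works in two dimensions, where a symmetric positive definite matrix C has the known closed-form square root √C = (C + √(det C) I)/√(tr C + 2√(det C)); the particular F and the helper functions are mine, not the text's:

```python
import math

def matmul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

F = [[2.0, 1.0], [0.5, 1.5]]         # an arbitrary non-singular F
C = matmul(transpose(F), F)          # C = F^T F, symmetric positive definite

# closed-form square root of a 2x2 symmetric positive definite matrix:
# sqrt(C) = (C + s I)/t with s = sqrt(det C), t = sqrt(tr C + 2 s)
s = math.sqrt(C[0][0]*C[1][1] - C[0][1]*C[1][0])
t = math.sqrt(C[0][0] + C[1][1] + 2.0*s)
U = [[(C[i][j] + (s if i == j else 0.0)) / t for j in range(2)] for i in range(2)]

# R = F U^{-1}; U is 2x2, so invert it directly
dU = U[0][0]*U[1][1] - U[0][1]*U[1][0]
Uinv = [[U[1][1]/dU, -U[0][1]/dU], [-U[1][0]/dU, U[0][0]/dU]]
R = matmul(F, Uinv)

U2  = matmul(U, U)              # recovers C, so U is indeed sqrt(F^T F)
RtR = matmul(transpose(R), R)   # the identity, so R is orthogonal
FU  = matmul(R, U)              # recovers F, i.e. F = RU
```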
Example 2.22: The polar decomposition theorem states that any non-singular linear transformation F can be represented uniquely in the forms F = RU = VR where R is orthogonal and U and V are symmetric and positive definite. Let $\lambda_i, r_i$, i = 1, 2, 3 be the eigenvalues and eigenvectors of U. From Example 2.15 it follows that the eigenvalues of V are the same as those of U and that the corresponding eigenvectors $\ell_i$ of V are given by $\ell_i = Rr_i$. Thus U and V have the spectral decompositions
$$U = \sum_{i=1}^{3} \lambda_i\, r_i \otimes r_i, \qquad V = \sum_{i=1}^{3} \lambda_i\, \ell_i \otimes \ell_i.$$
Show that
$$F = \sum_{i=1}^{3} \lambda_i\, \ell_i \otimes r_i, \qquad R = \sum_{i=1}^{3} \ell_i \otimes r_i.$$
Solution: First, by using the property (2.38)$_1$ and $\ell_i = Rr_i$, we have
$$F = RU = R\sum_{i=1}^{3} \lambda_i\, r_i \otimes r_i = \sum_{i=1}^{3} \lambda_i\,(Rr_i) \otimes r_i = \sum_{i=1}^{3} \lambda_i\, \ell_i \otimes r_i. \qquad (i)$$
Next, since U is non-singular,
$$U^{-1} = \sum_{i=1}^{3} \lambda_i^{-1}\, r_i \otimes r_i,$$
and therefore
$$R = FU^{-1} = \Big(\sum_{i=1}^{3} \lambda_i\, \ell_i \otimes r_i\Big)\Big(\sum_{j=1}^{3} \lambda_j^{-1}\, r_j \otimes r_j\Big) = \sum_{i=1}^{3}\sum_{j=1}^{3} \lambda_i\lambda_j^{-1}\,(\ell_i \otimes r_i)(r_j \otimes r_j).$$
By using the property (2.37)$_2$ and the fact that $r_i \cdot r_j = \delta_{ij}$, we have $(\ell_i \otimes r_i)(r_j \otimes r_j) = (r_i \cdot r_j)(\ell_i \otimes r_j) = \delta_{ij}(\ell_i \otimes r_j)$. Therefore
$$R = \sum_{i=1}^{3}\sum_{j=1}^{3} \lambda_i\lambda_j^{-1}\delta_{ij}\,(\ell_i \otimes r_j) = \sum_{i=1}^{3} \lambda_i\lambda_i^{-1}\,(\ell_i \otimes r_i) = \sum_{i=1}^{3} \ell_i \otimes r_i. \qquad (ii)$$
Example 2.23: Determine the rank and the null space of the linear transformation C = a ⊗ b where a ≠ o, b ≠ o.

Solution: Recall that the rank of any linear transformation A is the dimension of its range. (The range of A is the particular subspace of $\mathbb{E}^3$ comprised of all vectors Ax as x takes all values in $\mathbb{E}^3$.) Since Cx = (b · x)a, the vector Cx is parallel to the vector a for every choice of the vector x. Thus the range of C is the set of vectors parallel to a, and its dimension is one. The linear transformation C therefore has rank one.

Recall that the null space of any linear transformation A is the particular subspace of $\mathbb{E}^3$ comprised of the set of all vectors x for which Ax = o. Since Cx = (b · x)a and a ≠ o, the null space of C consists of all vectors x for which b · x = 0, i.e. the set of all vectors normal to b.
Example 2.24: Let $\lambda_1 \leq \lambda_2 \leq \lambda_3$ be the eigenvalues of the symmetric linear transformation S. Show that S can be expressed in the form
$$S = (I + a \otimes b)(I + b \otimes a), \qquad a \neq o, \; b \neq o, \qquad (i)$$
if and only if
$$0 \leq \lambda_1 \leq 1, \qquad \lambda_2 = 1, \qquad \lambda_3 \geq 1. \qquad (ii)$$
Example 2.25: Calculate the square roots of the identity tensor.

Solution: The identity is certainly a symmetric positive definite tensor. By the result of a previous example on the square root of a symmetric positive definite tensor, it follows that there is a unique symmetric positive definite tensor which is the square root of I. Obviously, this square root is also I. However, there are other square roots of I that are not symmetric positive definite. We are to explore them here: thus we wish to determine a tensor A on $\mathbb{E}^3$ such that $A^2 = I$, $A \neq I$ and $A \neq -I$.
First, if Ax = x for every vector $x \in \mathbb{E}^3$, then, by definition, A = I. Since we are given that A ≠ I, there must exist at least one non-null vector x for which Ax ≠ x; call this vector $f_1$, so that $Af_1 \neq f_1$. Set
$$e_1 = (A - I)f_1; \qquad (i)$$
since $Af_1 \neq f_1$, it follows that $e_1 \neq o$. Observe that
$$(A + I)e_1 = (A + I)(A - I)f_1 = (A^2 - I)f_1 = Of_1 = o. \qquad (ii)$$
Therefore
$$Ae_1 = -e_1 \qquad (iii)$$
and so −1 is an eigenvalue of A with corresponding eigenvector $e_1$. Without loss of generality we can assume that $|e_1| = 1$.
Second, the fact that A ≠ −I, together with $A^2 = I$, similarly implies that there must exist a unit vector $e_2$ for which
$$Ae_2 = e_2, \qquad (iv)$$
from which we conclude that +1 is an eigenvalue of A with corresponding eigenvector $e_2$.
Third, one can show that $\{e_1, e_2\}$ is a linearly independent pair of vectors. To see this, suppose that for some scalars $\xi_1, \xi_2$ one has
$$\xi_1 e_1 + \xi_2 e_2 = o.$$
Operating on this by A yields $\xi_1 Ae_1 + \xi_2 Ae_2 = o$, which on using (iii) and (iv) leads to
$$-\xi_1 e_1 + \xi_2 e_2 = o.$$
Subtracting and adding the preceding two equations shows that $\xi_1 e_1 = \xi_2 e_2 = o$. Since $e_1$ and $e_2$ are eigenvectors, neither of them is the null vector o, and therefore $\xi_1 = \xi_2 = 0$. Therefore $e_1$ and $e_2$ are linearly independent.
Fourth, let $e_3$ be a unit vector that is perpendicular to both $e_1$ and $e_2$. The triplet of vectors $\{e_1, e_2, e_3\}$ is linearly independent and therefore forms a basis for $\mathbb{E}^3$.
Fifth, the components $A_{ij}$ of the tensor A in the basis $\{e_1, e_2, e_3\}$ are given, as usual, by
$$Ae_j = A_{ij}e_i. \qquad (v)$$
Comparing (v) with (iii) yields $A_{11} = -1$, $A_{21} = A_{31} = 0$, and similarly comparing (v) with (iv) yields $A_{22} = 1$, $A_{12} = A_{32} = 0$. The matrix of components of A in this basis is therefore
$$[A] = \begin{pmatrix} -1 & 0 & A_{13} \\ 0 & 1 & A_{23} \\ 0 & 0 & A_{33} \end{pmatrix}. \qquad (vi)$$
It follows that
$$[A^2] = [A]^2 = [A][A] = \begin{pmatrix} 1 & 0 & -A_{13} + A_{13}A_{33} \\ 0 & 1 & A_{23} + A_{23}A_{33} \\ 0 & 0 & A_{33}^2 \end{pmatrix}. \qquad (vii)$$
(Notation: $[A^2]$ is the matrix of components of $A^2$, while $[A]^2$ is the square of the matrix of components of A. Why is $[A^2] = [A]^2$?) However, since $A^2 = I$, the matrix of components of $A^2$ in any basis has to be the identity matrix. Therefore we must have
$$-A_{13} + A_{13}A_{33} = 0, \qquad A_{23} + A_{23}A_{33} = 0, \qquad A_{33}^2 = 1, \qquad (viii)$$
which implies that
$$\text{either} \quad A_{13} \text{ arbitrary}, \;\; A_{23} = 0, \;\; A_{33} = 1, \qquad \text{or} \qquad A_{13} = 0, \;\; A_{23} \text{ arbitrary}, \;\; A_{33} = -1. \qquad (ix)$$
Consequently the matrix [A] must necessarily have one of the two forms
$$\begin{pmatrix} -1 & 0 & \alpha_1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad \text{or} \qquad \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & \alpha_2 \\ 0 & 0 & -1 \end{pmatrix}, \qquad (x)$$
where $\alpha_1$ and $\alpha_2$ are arbitrary scalars.
Sixth, set
$$p_1 = e_1, \qquad q_1 = -e_1 + \frac{\alpha_1}{2}e_3. \qquad (xi)$$
Then
$$p_1 \otimes q_1 = -e_1 \otimes e_1 + \frac{\alpha_1}{2}\,e_1 \otimes e_3,$$
and therefore
$$I + 2p_1 \otimes q_1 = e_1 \otimes e_1 + e_2 \otimes e_2 + e_3 \otimes e_3 - 2e_1 \otimes e_1 + \alpha_1\, e_1 \otimes e_3 = -e_1 \otimes e_1 + e_2 \otimes e_2 + e_3 \otimes e_3 + \alpha_1\, e_1 \otimes e_3.$$
Note from this that the components of the tensor $I + 2p_1 \otimes q_1$ are given by (x)$_1$. Conversely, one can readily verify that the tensor
$$A = I + 2p_1 \otimes q_1 \qquad (xii)$$
has the desired properties $A^2 = I$, $A \neq I$, $A \neq -I$ for any value of the scalar $\alpha_1$.
Alternatively set
$$p_2 = e_2, \qquad q_2 = e_2 + \frac{\alpha_2}{2}e_3. \qquad (xiii)$$
Then
$$p_2 \otimes q_2 = e_2 \otimes e_2 + \frac{\alpha_2}{2}\,e_2 \otimes e_3,$$
and therefore
$$-I + 2p_2 \otimes q_2 = -e_1 \otimes e_1 - e_2 \otimes e_2 - e_3 \otimes e_3 + 2e_2 \otimes e_2 + \alpha_2\, e_2 \otimes e_3 = -e_1 \otimes e_1 + e_2 \otimes e_2 - e_3 \otimes e_3 + \alpha_2\, e_2 \otimes e_3.$$
Note from this that the components of the tensor $-I + 2p_2 \otimes q_2$ are given by (x)$_2$. Conversely, one can readily verify that the tensor
$$A = -I + 2p_2 \otimes q_2 \qquad (xiv)$$
has the desired properties $A^2 = I$, $A \neq I$, $A \neq -I$ for any value of the scalar $\alpha_2$.
Thus the tensors deﬁned in (xii) and (xiv) are both square roots of the identity tensor that are not
symmetric positive deﬁnite.
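One can confirm directly that both matrix forms (x) square to the identity for any value of the free scalar. A minimal Python check (the particular values of $\alpha_1$, $\alpha_2$ are arbitrary choices of mine):

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k]*B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

alpha1, alpha2 = 0.7, -2.3     # arbitrary scalars
A1 = [[-1.0, 0.0, alpha1],
      [ 0.0, 1.0, 0.0],
      [ 0.0, 0.0, 1.0]]        # form (x)_1, i.e. I + 2 p1 (x) q1
A2 = [[-1.0, 0.0, 0.0],
      [ 0.0, 1.0, alpha2],
      [ 0.0, 0.0, -1.0]]       # form (x)_2, i.e. -I + 2 p2 (x) q2

I3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
sq1 = matmul(A1, A1)           # equals I for every alpha1
sq2 = matmul(A2, A2)           # equals I for every alpha2
```

Neither A1 nor A2 is symmetric (for nonzero α), illustrating that these square roots of I fall outside the unique symmetric positive definite one.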
References
1. I.M. Gelfand, Lectures on Linear Algebra, Wiley, New York, 1963.
2. P.R. Halmos, Finite Dimensional Vector Spaces, Van Nostrand, New Jersey, 1958.
3. J.K. Knowles, Linear Vector Spaces and Cartesian Tensors, Oxford University Press, New York, 1997.
Chapter 3

Components of Vectors and Tensors. Cartesian Tensors.

Notation:

α ..... scalar
{a} ..... 3 × 1 column matrix
a ..... vector
$a_i$ ..... i-th component of the vector a in some basis; or i-th element of the column matrix {a}
[A] ..... 3 × 3 square matrix
A ..... linear transformation
$A_{ij}$ ..... i, j component of the linear transformation A in some basis; or i, j element of the square matrix [A]
$C_{ijkl}$ ..... i, j, k, l component of a 4-tensor C in some basis
$T_{i_1 i_2 \ldots i_n}$ ..... $i_1 i_2 \ldots i_n$ component of an n-tensor T in some basis.
3.1 Components of a vector in a basis.

Let $\mathbb{E}^3$ be a three-dimensional Euclidean vector space. A set of three linearly independent vectors $\{e_1, e_2, e_3\}$ forms a basis for $\mathbb{E}^3$ in the sense that an arbitrary vector v can always be expressed as a linear combination of the three basis vectors; i.e. given any $v \in \mathbb{E}^3$, there are unique scalars α, β, γ such that
$$v = \alpha e_1 + \beta e_2 + \gamma e_3. \qquad (3.1)$$
If each basis vector $e_i$ has unit length, and if each pair of distinct basis vectors $e_i$, $e_j$ is mutually orthogonal, we say that $\{e_1, e_2, e_3\}$ forms an orthonormal basis for $\mathbb{E}^3$. Thus, for an orthonormal basis,
$$e_i \cdot e_j = \delta_{ij} \qquad (3.2)$$
where $\delta_{ij}$ is the Kronecker delta. In these notes we shall always restrict attention to orthonormal bases unless explicitly stated otherwise. If the basis is right-handed, one has in addition that
$$e_i \cdot (e_j \times e_k) = e_{ijk} \qquad (3.3)$$
where $e_{ijk}$ is the alternator introduced previously in (1.44).

The components $v_i$ of a vector v in a basis $\{e_1, e_2, e_3\}$ are defined by
$$v_i = v \cdot e_i. \qquad (3.4)$$
The vector can be expressed in terms of its components and the basis vectors as
$$v = v_i e_i. \qquad (3.5)$$
The components of v may be assembled into a column matrix
$$\{v\} = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}. \qquad (3.6)$$

Figure 3.1: Components $\{v_1, v_2, v_3\}$ and $\{v_1', v_2', v_3'\}$ of the same vector v in two different bases.
Even though this is obvious from the definition (3.4), it is still important to emphasize that the components $v_i$ of a vector depend on both the vector v and the choice of basis. Suppose, for example, that we are given two bases $\{e_1, e_2, e_3\}$ and $\{e_1', e_2', e_3'\}$ as shown in Figure 3.1. Then the vector v has one set of components $v_i$ in the first basis and a different set of components $v_i'$ in the second basis:
$$v_i = v \cdot e_i, \qquad v_i' = v \cdot e_i'. \qquad (3.7)$$
Thus the one vector v can be expressed in either of the two equivalent forms
$$v = v_i e_i \quad \text{or} \quad v = v_i' e_i'. \qquad (3.8)$$
The components $v_i$ and $v_i'$ are related to each other (as we shall discuss later), but in general $v_i \neq v_i'$.
Once a basis $\{e_1, e_2, e_3\}$ is chosen and fixed, there is a unique vector x associated with any given column matrix {x} such that the components of x in $\{e_1, e_2, e_3\}$ are {x}. Thus, once the basis is fixed, there is a one-to-one correspondence between column matrices and vectors. It follows, for example, that once the basis is fixed, the vector equation z = x + y can be written equivalently as
$$\{z\} = \{x\} + \{y\} \quad \text{or} \quad z_i = x_i + y_i \qquad (3.9)$$
in terms of the components $x_i$, $y_i$ and $z_i$ in the given basis.
If $u_i$ and $v_i$ are the components of two vectors u and v in a basis, then the scalar-product u · v can be expressed as
$$u \cdot v = u_i v_i; \qquad (3.10)$$
the vector-product u × v can be expressed as
$$u \times v = (e_{ijk} u_j v_k)\, e_i \quad \text{or equivalently as} \quad (u \times v)_i = e_{ijk} u_j v_k, \qquad (3.11)$$
where $e_{ijk}$ is the alternator introduced previously in (1.44).
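The component formula (3.11) translates directly into code: summing $e_{ijk} u_j v_k$ over the repeated subscripts reproduces the usual cross product. A minimal Python sketch, with the alternator built by enumerating permutations (the helper names are mine):

```python
def alternator(i, j, k):
    # e_ijk: +1 for even permutations of (0,1,2), -1 for odd, 0 if any index repeats
    if (i, j, k) in ((0, 1, 2), (1, 2, 0), (2, 0, 1)):
        return 1
    if (i, j, k) in ((0, 2, 1), (2, 1, 0), (1, 0, 2)):
        return -1
    return 0

def cross(u, v):
    # (u x v)_i = e_ijk u_j v_k, summed over the dummy subscripts j and k
    return [sum(alternator(i, j, k) * u[j] * v[k]
                for j in range(3) for k in range(3))
            for i in range(3)]

u, v = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
print(cross(u, v))    # [-3.0, 6.0, -3.0]
```

The result is orthogonal to both u and v, as the vector-product must be.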
3.2 Components of a linear transformation in a basis.

Consider a linear transformation A. Any vector in $\mathbb{E}^3$ can be expressed as a linear combination of the basis vectors $e_1$, $e_2$ and $e_3$. In particular this is true of the three vectors $Ae_1$, $Ae_2$ and $Ae_3$. Let $A_{ij}$ be the i-th component of the vector $Ae_j$, so that
$$Ae_j = A_{ij} e_i. \qquad (3.12)$$
We can also write
$$A_{ij} = e_i \cdot (Ae_j). \qquad (3.13)$$
The 9 scalars $A_{ij}$ are known as the components of the linear transformation A in the basis $\{e_1, e_2, e_3\}$. The components $A_{ij}$ can be assembled into a square matrix:
$$[A] = \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{pmatrix}. \qquad (3.14)$$
The linear transformation A can be expressed in terms of its components $A_{ij}$ and the basis vectors $e_i$ as
$$A = \sum_{j=1}^{3}\sum_{i=1}^{3} A_{ij}\,(e_i \otimes e_j). \qquad (3.15)$$
The components $A_{ij}$ of a linear transformation depend on both the linear transformation A and the choice of basis. Suppose, for example, that we are given two bases $\{e_1, e_2, e_3\}$ and $\{e_1', e_2', e_3'\}$. Then the linear transformation A has one set of components $A_{ij}$ in the first basis and a different set of components $A_{ij}'$ in the second basis:
$$A_{ij} = e_i \cdot (Ae_j), \qquad A_{ij}' = e_i' \cdot (Ae_j'). \qquad (3.16)$$
The components $A_{ij}$ and $A_{ij}'$ are related to each other (as we shall discuss later), but in general $A_{ij} \neq A_{ij}'$.

The components of the linear transformation A = a ⊗ b are
$$A_{ij} = a_i b_j. \qquad (3.17)$$
Once a basis $\{e_1, e_2, e_3\}$ is chosen and fixed, there is a unique linear transformation M associated with any given square matrix [M] such that the components of M in $\{e_1, e_2, e_3\}$ are [M]. Thus, once the basis is fixed, there is a one-to-one correspondence between square matrices and linear transformations. It follows, for example, that the equation y = Ax relating the linear transformation A and the vectors x and y can be written equivalently as
$$\{y\} = [A]\{x\} \quad \text{or} \quad y_i = A_{ij} x_j \qquad (3.18)$$
in terms of the components $A_{ij}$, $x_i$ and $y_i$ in the given basis. Similarly, if A, B and C are linear transformations such that C = AB, then their component matrices [A], [B] and [C] are related by
$$[C] = [A][B] \quad \text{or} \quad C_{ij} = A_{ik} B_{kj}. \qquad (3.19)$$
The component matrix [I] of the identity linear transformation I in any orthonormal basis is the unit matrix; its components are therefore given by the Kronecker delta $\delta_{ij}$. If [A] and $[A^T]$ are the component matrices of the linear transformations A and $A^T$, then $[A^T] = [A]^T$ and $A^T_{ij} = A_{ji}$.
As mentioned in Section 2.2, a symmetric linear transformation S has three real eigenvalues $\lambda_1, \lambda_2, \lambda_3$ and corresponding orthonormal eigenvectors $e_1, e_2, e_3$. The eigenvectors are referred to as the principal directions of S. The particular basis consisting of the eigenvectors is called a principal basis for S. The component matrix [S] of the symmetric linear transformation S in its principal basis is
$$[S] = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix}. \qquad (3.20)$$
As a final remark we note that if we are to establish certain results for vectors and linear transformations, we can, if it is more convenient to do so, pick and fix a basis, and then work with the components in that basis. If necessary, we can revert back to the vectors and linear transformations at the end. For example, the first example in the previous chapter asked us to show that a · (b × c) = b · (c × a). In terms of components, the left-hand side of this reads $a \cdot (b \times c) = a_i (b \times c)_i = a_i e_{ijk} b_j c_k = e_{ijk} a_i b_j c_k$. Similarly the right-hand side reads $b \cdot (c \times a) = b_i (c \times a)_i = b_i e_{ijk} c_j a_k = e_{ijk} a_k b_i c_j$. Since i, j, k are dummy subscripts in the right-most expression, they can be changed to any other subscripts; thus by changing k → i, i → j and j → k we can write $b \cdot (c \times a) = e_{jki} a_i b_j c_k$. Finally, recalling that the sign of $e_{ijk}$ changes when any two adjacent subscripts are switched, we find that $b \cdot (c \times a) = e_{jki} a_i b_j c_k = -e_{jik} a_i b_j c_k = e_{ijk} a_i b_j c_k$, where we have first switched the ki and then the ji in the subscripts of the alternator. The right-most expressions of a · (b × c) and b · (c × a) are identical, and this therefore establishes the desired identity.
3.3 Components in two bases.

Consider a 3-dimensional Euclidean vector space together with two orthonormal bases $\{e_1, e_2, e_3\}$ and $\{e_1', e_2', e_3'\}$. Since $\{e_1, e_2, e_3\}$ forms a basis, any vector, and therefore in particular the vectors $e_i'$, can be represented as a linear combination of the basis vectors $e_1, e_2, e_3$. Let $Q_{ij}$ be the j-th component of the vector $e_i'$ in the basis $\{e_1, e_2, e_3\}$:
$$e_i' = Q_{ij} e_j. \qquad (3.21)$$
By taking the dot-product of (3.21) with $e_k$, one sees that
$$Q_{ij} = e_i' \cdot e_j, \qquad (3.22)$$
and so $Q_{ij}$ is the cosine of the angle between the basis vectors $e_i'$ and $e_j$. Observe from (3.22) that $Q_{ji}$ can also be interpreted as the j-th component of $e_i$ in the basis $\{e_1', e_2', e_3'\}$, whence we also have
$$e_i = Q_{ji} e_j'. \qquad (3.23)$$
The 9 numbers $Q_{ij}$ can be assembled into a square matrix [Q]. This matrix relates the two bases $\{e_1, e_2, e_3\}$ and $\{e_1', e_2', e_3'\}$. Since both bases are orthonormal, it can be readily shown that [Q] is an orthogonal matrix. If, in addition, one basis can be rotated into the other, which means that both bases are right-handed or both are left-handed, then [Q] is a proper orthogonal matrix and det[Q] = +1; if the two bases are related by a reflection, which means that one basis is right-handed and the other is left-handed, then [Q] is an improper orthogonal matrix and det[Q] = −1.
We may now relate the different components of a single vector v in two bases. Let v_i and v'_i be the ith components of the same vector v in the two bases {e_1, e_2, e_3} and {e'_1, e'_2, e'_3}. Then one can show that

v'_i = Q_ij v_j    or equivalently    {v'} = [Q]{v}.    (3.24)

Since [Q] is orthogonal, one also has the inverse relationships

v_i = Q_ji v'_j    or equivalently    {v} = [Q]^T {v'}.    (3.25)

In general, the component matrices {v} and {v'} of a vector v in two different bases are different. A vector whose components in every basis happen to be the same is called an isotropic vector: {v} = [Q]{v} for all orthogonal matrices [Q]. It is possible to show that the only isotropic vector is the null vector o.
Similarly, we may relate the different components of a single linear transformation A in two bases. Let A_ij and A'_ij be the ij-components of the same linear transformation A in the two bases {e_1, e_2, e_3} and {e'_1, e'_2, e'_3}. Then one can show that

A'_ij = Q_ip Q_jq A_pq    or equivalently    [A'] = [Q][A][Q]^T.    (3.26)

Since [Q] is orthogonal, one also has the inverse relationships

A_ij = Q_pi Q_qj A'_pq    or equivalently    [A] = [Q]^T [A'][Q].    (3.27)
In general, the component matrices [A] and [A'] of a linear transformation A in two different bases are different. A linear transformation whose components in every basis happen to be the same is called an isotropic linear transformation: [A] = [Q][A][Q]^T for all orthogonal matrices [Q]. It is possible to show that the most general isotropic symmetric linear transformation is a scalar multiple of the identity, αI, where α is an arbitrary scalar.
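The transformation rules {v'} = [Q]{v} and [A'] = [Q][A][Q]^T can be illustrated numerically. The sketch below (an aside, assuming NumPy; the rotation angle is arbitrary) uses the proper orthogonal matrix of a rotation about e_3 and confirms that basis-independent quantities are unchanged by the change of basis.

```python
# Components of one vector and one linear transformation in two bases.
import numpy as np

t = 0.7  # rotation angle about e_3, arbitrary
Q = np.array([[np.cos(t),  np.sin(t), 0.0],
              [-np.sin(t), np.cos(t), 0.0],
              [0.0,        0.0,       1.0]])
assert np.allclose(Q @ Q.T, np.eye(3))          # [Q] is orthogonal
assert np.isclose(np.linalg.det(Q), 1.0)        # and proper

rng = np.random.default_rng(1)
v = rng.standard_normal(3)
A = rng.standard_normal((3, 3))

v_p = Q @ v          # {v'} = [Q]{v}
A_p = Q @ A @ Q.T    # [A'] = [Q][A][Q]^T

# Basis-independent quantities survive the change of basis:
assert np.isclose(v_p @ v_p, v @ v)                      # |v|^2
assert np.isclose(np.linalg.det(A_p), np.linalg.det(A))  # det A
assert np.isclose(np.trace(A_p), np.trace(A))            # tr A
```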
3.4 Scalar-valued functions of linear transformations. Determinant, trace, scalar-product and norm.

Let Φ(A; e_1, e_2, e_3) be a scalar-valued function that depends on a linear transformation A and a (not necessarily orthonormal) basis {e_1, e_2, e_3}. For example, Φ(A; e_1, e_2, e_3) = Ae_1 · e_1. Certain such functions are in fact independent of the basis, so that for every two (not necessarily orthonormal) bases {e_1, e_2, e_3} and {e'_1, e'_2, e'_3} one has Φ(A; e_1, e_2, e_3) = Φ(A; e'_1, e'_2, e'_3), and in such a case we can simply write Φ(A). One example of such a function is

Φ(A; e_1, e_2, e_3) = [(Ae_1 × Ae_2) · Ae_3] / [(e_1 × e_2) · e_3],    (3.28)
(though it is certainly not obvious that this function is independent of the choice of basis).

Equivalently, let A be a linear transformation and let [A] be the components of A in some basis {e_1, e_2, e_3}. Let φ([A]) be some real-valued function defined on the set of all square matrices. If [A'] are the components of A in some other basis {e'_1, e'_2, e'_3}, then in general φ([A]) ≠ φ([A']). This means that the function φ depends on the linear transformation A and the underlying basis. Certain functions φ have the property that φ([A]) = φ([A']) for all pairs of bases {e_1, e_2, e_3} and {e'_1, e'_2, e'_3}, and therefore such a function depends on the linear transformation only and not the basis. For such a function we may write φ(A).
We first consider two important examples here. Since the components [A] and [A'] of a linear transformation A in two bases are related by [A'] = [Q][A][Q]^T, if we take the determinant of this matrix equation we get

det[A'] = det([Q][A][Q]^T) = det[Q] det[A] det[Q]^T = (det[Q])^2 det[A] = det[A],    (3.29)

since the determinant of an orthogonal matrix is ±1. Therefore, without ambiguity, we may define the determinant of a linear transformation A to be the (basis-independent) scalar-valued function given by

det A = det[A].    (3.30)
We will see in an example at the end of this chapter that the particular function Φ defined in (3.28) is in fact the determinant det A.

Similarly, we may define the trace of a linear transformation A to be the (basis-independent) scalar-valued function given by

trace A = tr[A].    (3.31)

In terms of its components in a basis one has

det A = e_ijk A_1i A_2j A_3k = e_ijk A_i1 A_j2 A_k3,    trace A = A_ii;    (3.32)

see (1.46). It is useful to note the following properties of the determinant of a linear transformation:

det(AB) = det(A) det(B),    det(αA) = α^3 det(A),    det(A^T) = det(A).    (3.33)
As mentioned previously, a linear transformation A is said to be non-singular if the only vector x for which Ax = o is the null vector x = o. Equivalently, one can show that A is non-singular if and only if

det A ≠ 0.    (3.34)

If A is non-singular, then

det(A^{−1}) = 1/det(A).    (3.35)

Suppose that λ and v ≠ o are an eigenvalue and eigenvector of a given linear transformation A. Then by definition, Av = λv, or equivalently (A − λI)v = o. Since v ≠ o it follows that A − λI must be singular and so

det(A − λI) = 0.    (3.36)
The eigenvalues are the roots λ of this cubic equation. The eigenvalues and eigenvectors of a linear transformation do not depend on any choice of basis. Thus the eigenvalues of a linear transformation are also scalar-valued functions of A whose values depend only on A and not the basis: λ_i = λ_i(A). If S is symmetric, its matrix of components in a principal basis is

[S] = | λ_1   0    0  |
      |  0   λ_2   0  |    (3.37)
      |  0    0   λ_3 |.
The particular scalar-valued functions

I_1(A) = tr A,
I_2(A) = 1/2 [(tr A)^2 − tr(A^2)],
I_3(A) = det A,    (3.38)
will appear frequently in what follows. It can be readily verified that for any linear transformation A and all orthogonal linear transformations Q,

I_1(Q^T AQ) = I_1(A),    I_2(Q^T AQ) = I_2(A),    I_3(Q^T AQ) = I_3(A),    (3.39)

and for this reason the three functions (3.38) are said to be invariant under orthogonal transformations. Observe from (3.37) that for a symmetric linear transformation S with eigenvalues λ_1, λ_2, λ_3,

I_1(S) = λ_1 + λ_2 + λ_3,
I_2(S) = λ_1 λ_2 + λ_2 λ_3 + λ_3 λ_1,
I_3(S) = λ_1 λ_2 λ_3.    (3.40)
The mapping (3.40) between invariants and eigenvalues is one-to-one. In addition, one can show that for any linear transformation A and any real number α,

det(A − αI) = −α^3 + I_1(A) α^2 − I_2(A) α + I_3(A).

Note in particular that the cubic equation for the eigenvalues of a linear transformation can be written as

λ^3 − I_1(A) λ^2 + I_2(A) λ − I_3(A) = 0.

Finally, one can show that

A^3 − I_1(A) A^2 + I_2(A) A − I_3(A) I = O,    (3.41)

which is known as the Cayley-Hamilton theorem.
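The invariants (3.38), the characteristic polynomial, and the Cayley-Hamilton theorem can all be checked numerically for a random linear transformation. A minimal sketch, assuming NumPy:

```python
# Check (3.38), det(A - alpha I), and Cayley-Hamilton (3.41) numerically.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))

I1 = np.trace(A)
I2 = 0.5 * (np.trace(A)**2 - np.trace(A @ A))
I3 = np.linalg.det(A)

# det(A - alpha I) = -alpha^3 + I1 alpha^2 - I2 alpha + I3
alpha = 0.3
lhs = np.linalg.det(A - alpha * np.eye(3))
rhs = -alpha**3 + I1 * alpha**2 - I2 * alpha + I3
assert np.isclose(lhs, rhs)

# Cayley-Hamilton: A^3 - I1 A^2 + I2 A - I3 I = O
CH = A @ A @ A - I1 * (A @ A) + I2 * A - I3 * np.eye(3)
assert np.allclose(CH, np.zeros((3, 3)))
```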
One can similarly define scalar-valued functions of two linear transformations A and B. The particular function φ(A, B) defined by

φ(A, B) = tr(AB^T)    (3.42)

will play an important role in what follows. Note that in terms of components in a basis,

φ(A, B) = tr(AB^T) = A_ij B_ij.    (3.43)
This particular scalar-valued function is often known as the scalar product of the two linear transformations A and B and is written as A · B:

A · B = tr(AB^T).    (3.44)

It is natural then to define the magnitude (or norm) of a linear transformation A, denoted by |A|, as

|A| = √(A · A) = √(tr(AA^T)).    (3.45)

Note that in terms of components in a basis,

|A|^2 = A_ij A_ij.    (3.46)

Observe the useful property that if |A| → 0, then each component

A_ij → 0.    (3.47)

This will be used later when we linearize the theory of large deformations.
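In components, (3.44) and (3.45) are just the Frobenius inner product and norm of the component matrices, which a quick numerical sketch (an aside, assuming NumPy) makes concrete:

```python
# A·B = tr(A B^T) = A_ij B_ij, and |A| is the Frobenius norm of [A].
import numpy as np

rng = np.random.default_rng(3)
A, B = rng.standard_normal((2, 3, 3))

dot_AB = np.trace(A @ B.T)                  # A·B = tr(A B^T)
assert np.isclose(dot_AB, np.sum(A * B))    # = A_ij B_ij
assert np.isclose(np.sqrt(np.trace(A @ A.T)),
                  np.linalg.norm(A))        # |A| = Frobenius norm
```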
3.5 Cartesian Tensors

Consider two orthonormal bases {e_1, e_2, e_3} and {e'_1, e'_2, e'_3}. A quantity whose components v_i and v'_i in these two bases are related by

v'_i = Q_ij v_j    (3.48)

is called a 1st-order Cartesian tensor or a 1-tensor. It follows from our preceding discussion that a vector is a 1-tensor.
A quantity whose components A_ij and A'_ij in two bases are related by

A'_ij = Q_ip Q_jq A_pq    (3.49)

is called a 2nd-order Cartesian tensor or a 2-tensor. It follows from our preceding discussion that a linear transformation is a 2-tensor.
The concept of an nth-order tensor can be introduced similarly: let T be a physical entity which, in a given basis {e_1, e_2, e_3}, is defined completely by a set of 3^n ordered numbers T_{i1 i2 ... in}. The numbers T_{i1 i2 ... in} are called the components of T in the basis {e_1, e_2, e_3}. If, for example, T is a scalar, vector or linear transformation, it is represented by 3^0, 3^1 and 3^2 components respectively in the given basis. Let {e'_1, e'_2, e'_3} be a second basis related to the first one by the orthogonal matrix [Q], and let T'_{i1 i2 ... in} be the components of the entity T in the second basis. Then, if for every pair of such bases, these two sets of components are related by

T'_{i1 i2 ... in} = Q_{i1 j1} Q_{i2 j2} ... Q_{in jn} T_{j1 j2 ... jn},    (3.50)

the entity T is called an nth-order Cartesian tensor or more simply an n-tensor.
Note that the components of a tensor in an arbitrary basis can be calculated if its components in any one basis are known.

Two tensors of the same order are added by adding corresponding components.
Recall that the outer-product of two vectors a and b is the 2-tensor C = a ⊗ b whose components are given by C_ij = a_i b_j. This can be generalized to higher-order tensors. Given an n-tensor A and an m-tensor B, their outer-product is the (m + n)-tensor C = A ⊗ B whose components are given by

C_{i1 i2 .. in j1 j2 .. jm} = A_{i1 i2 ... in} B_{j1 j2 ... jm}.    (3.51)
Let A be a 2-tensor with components A_ij in some basis. Then "contracting" A over its subscripts leads to the scalar A_ii. This can be generalized to higher-order tensors. Let A be an n-tensor with components A_{i1 i2 ... in} in some basis. Then "contracting" A over two of its subscripts, say the i_j-th and i_k-th subscripts, leads to the (n − 2)-tensor whose components in this basis are A_{i1 i2 .. i_{j−1} p i_{j+1} ... i_{k−1} p i_{k+1} ... in}. Contracting over two subscripts involves setting those two subscripts equal, and therefore summing over them.
Let a, b and T be entities whose components in a basis are denoted by a_i, b_i and T_ij. Suppose that the components of T in some basis are related to the components of a and b in that same basis by a_i = T_ij b_j. If a and b are 1-tensors, then one can readily show that T is necessarily a 2-tensor. This is called the quotient rule, since it has the appearance of saying that the quotient of two 1-tensors is a 2-tensor. This rule generalizes naturally to tensors of more general order. Suppose that A, B and T are entities whose components in a basis are related by

A_{i1 i2 .. in} = T_{k1 k2 ... kp} B_{j1 j2 ... jm},    (3.52)

where some of the subscripts may be repeated. If it is known that A and B are tensors, then T is necessarily a tensor as well.
In general, the components of a tensor T in two different bases are different: T'_{i1 i2 ... in} ≠ T_{i1 i2 ... in}. However, there are certain special tensors whose components in one basis are the same as those in any other basis; an example of this is the identity 2-tensor I. Such a tensor is said to be isotropic. In general, a tensor T is said to be an isotropic tensor if its components have the same values in all bases, i.e. if

T'_{i1 i2 ... in} = T_{i1 i2 ... in}    (3.53)

in all bases {e_1, e_2, e_3} and {e'_1, e'_2, e'_3}. Equivalently, for an isotropic tensor
T_{i1 i2 ... in} = Q_{i1 j1} Q_{i2 j2} ... Q_{in jn} T_{j1 j2 ... jn}    for all orthogonal matrices [Q].    (3.54)

One can show that (a) the only isotropic 1-tensor is the null vector o; (b) the most general isotropic 2-tensor is a scalar multiple of the identity linear transformation, αI; (c) the most general isotropic 3-tensor is the null 3-tensor; and (d) the most general isotropic 4-tensor C has components (in any basis)

C_ijkl = α δ_ij δ_kl + β δ_ik δ_jl + γ δ_il δ_jk,    (3.55)

where α, β, γ are arbitrary scalars.
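That a 4-tensor of the form (3.55) does satisfy the isotropy condition (3.54) can be checked numerically. The sketch below (an aside, assuming NumPy; the scalars α, β, γ are chosen arbitrarily) assembles C from Kronecker deltas, transforms it with a random orthogonal matrix, and confirms its components are unchanged.

```python
# Verify numerically that (3.55) defines an isotropic 4-tensor.
import numpy as np

d = np.eye(3)                          # delta_ij
alpha, beta, gamma = 2.0, -1.0, 0.5    # arbitrary scalars
C = (alpha * np.einsum('ij,kl->ijkl', d, d)
     + beta * np.einsum('ik,jl->ijkl', d, d)
     + gamma * np.einsum('il,jk->ijkl', d, d))

# A random orthogonal [Q] from a QR factorization.
rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
assert np.allclose(Q @ Q.T, np.eye(3))

# Transform as a 4-tensor: C'_ijkl = Q_ip Q_jq Q_kr Q_ls C_pqrs.
C_p = np.einsum('ip,jq,kr,ls,pqrs->ijkl', Q, Q, Q, Q, C)
assert np.allclose(C_p, C)   # isotropic: components unchanged
```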
3.6 Worked Examples.

In some of the examples below, we are asked to establish certain results for vectors and linear transformations. As noted previously, whenever it is more convenient we may pick and fix a basis, and then work using components in that basis. If necessary, we can revert back to the vectors and linear transformations at the end. We shall do this frequently in what follows and will not bother to explain this each time.

It is also worth pointing out that in some of the examples below, calculations involving vectors and/or linear transformations are carried out without reference to their components. One might have expected such examples to have been presented in Chapter 2. They are contained in the present chapter because they all involve either the determinant or trace of a linear transformation, and we chose to define these quantities in terms of components (even though they are basis independent).
Example 3.1: Suppose that A is a symmetric linear transformation. Show that its matrix of components [A] in any basis is a symmetric matrix.

Solution: According to (3.13), the components of A in the basis {e_1, e_2, e_3} are defined by

A_ji = e_j · Ae_i.    (i)

The property (NNN) of the transpose shows that e_j · Ae_i = A^T e_j · e_i, which, on using the fact that A is symmetric, further simplifies to e_j · Ae_i = Ae_j · e_i; and finally, since the order of the vectors in a scalar product does not matter, we have e_j · Ae_i = e_i · Ae_j. Thus

A_ji = e_i · Ae_j.    (ii)

By (3.13), the right-most term here is the A_ij component of A, and so (ii) yields

A_ji = A_ij.    (iii)

Thus [A] = [A]^T and so the matrix [A] is symmetric.

Remark: Conversely, if it is known that the matrix of components [A] of a linear transformation in some basis is symmetric, then the linear transformation A is also symmetric.
Example 2.5: Choose any convenient basis and calculate the components of the projection linear transformation Π and the reflection linear transformation R in that basis.

Solution: Let e_3 be a unit vector normal to the plane P, and let e_1 and e_2 be any two unit vectors in P such that {e_1, e_2, e_3} forms an orthonormal basis. From an example in the previous chapter we know that the projection transformation Π and the reflection transformation R in the plane P can be written as Π = I − e_3 ⊗ e_3 and R = I − 2(e_3 ⊗ e_3) respectively. Since the components of e_3 in the chosen basis are δ_3i, we find that

Π_ij = δ_ij − (e_3)_i (e_3)_j = δ_ij − δ_3i δ_3j,    R_ij = δ_ij − 2δ_3i δ_3j.
Example 3.2: Consider the scalar-valued function

f(A, B) = trace(AB^T)    (i)

and show that, for all linear transformations A, B, C, and for all scalars α, this function f has the following properties:

i) f(A, B) = f(B, A),
ii) f(αA, B) = α f(A, B),
iii) f(A + C, B) = f(A, B) + f(C, B), and
iv) f(A, A) > 0 provided A ≠ O.

Solution: Let A_ij and B_ij be the components of A and B in an arbitrary basis. In terms of these components, (AB^T)_ij = A_ik (B^T)_kj = A_ik B_jk and so

f(A, B) = A_ik B_ik.    (ii)

It is now trivial to verify that all of the above requirements hold.

Remark: It follows from this that the function f has all of the usual requirements of a scalar product. Therefore we may define the scalar-product of two linear transformations A and B, denoted by A · B, as

A · B = trace(AB^T) = A_ij B_ij.    (iii)

Note that, based on this scalar-product, we can define the magnitude of a linear transformation to be

|A| = √(A · A) = √(A_ij A_ij).    (iv)
Example 3.3: For any two vectors u and v, show that their cross-product u × v is orthogonal to both u and v.

Solution: We are to show, for example, that u · (u × v) = 0. In terms of their components we can write

u · (u × v) = u_i (u × v)_i = u_i (e_ijk u_j v_k) = e_ijk u_i u_j v_k.    (i)

Since e_ijk = −e_jik and u_i u_j = u_j u_i, it follows that e_ijk is skew-symmetric in the subscripts ij while u_i u_j is symmetric in the subscripts ij. Thus it follows from Example 1.3 that e_ijk u_i u_j = 0 and so u · (u × v) = 0. The orthogonality of v and u × v can be established similarly.
Example 3.4: Suppose that a, b, c are any three linearly independent vectors and that F is an arbitrary non-singular linear transformation. Show that

(Fa × Fb) · Fc = det F (a × b) · c.    (i)

Solution: First consider the left-hand side of (i). On using (3.10) and (3.11), we can express this as

(Fa × Fb) · Fc = (Fa × Fb)_i (Fc)_i = e_ijk (Fa)_j (Fb)_k (Fc)_i,    (ii)

and consequently

(Fa × Fb) · Fc = e_ijk (F_jp a_p)(F_kq b_q)(F_ir c_r) = e_ijk F_ir F_jp F_kq a_p b_q c_r.    (iii)

Turning next to the right-hand side of (i), we note that

det F (a × b) · c = det[F] (a × b)_i c_i = det[F] e_ijk a_j b_k c_i = det[F] e_rpq a_p b_q c_r.    (iv)

Recalling the identity e_rpq det[F] = e_ijk F_ir F_jp F_kq in (1.48) for the determinant of a matrix and substituting this into (iv) gives

det F (a × b) · c = e_ijk F_ir F_jp F_kq a_p b_q c_r.    (v)

Since the right-hand sides of (iii) and (v) are identical, it follows that the left-hand sides must also be equal, thus establishing the desired result.
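The identity (Fa × Fb) · Fc = det F (a × b) · c can be spot-checked numerically for a random F and random vectors (a random matrix is non-singular with probability one). A minimal sketch, assuming NumPy:

```python
# Numerical spot-check of (Fa x Fb)·Fc = det F (a x b)·c.
import numpy as np

rng = np.random.default_rng(5)
F = rng.standard_normal((3, 3))
a, b, c = rng.standard_normal((3, 3))

lhs = np.dot(np.cross(F @ a, F @ b), F @ c)
rhs = np.linalg.det(F) * np.dot(np.cross(a, b), c)
assert np.isclose(lhs, rhs)
```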
Example 3.5: Suppose that a, b and c are three non-coplanar vectors in IE^3. Let V_0 be the volume of the tetrahedron defined by these three vectors. Next, suppose that F is a non-singular 2-tensor and let V denote the volume of the tetrahedron defined by the vectors Fa, Fb and Fc. Note that the second tetrahedron is the image of the first tetrahedron under the transformation F. Derive a formula for V in terms of V_0 and F.

[Figure 3.2: Tetrahedron of volume V_0 defined by three non-coplanar vectors a, b and c; and its image under the linear transformation F.]

Solution: Recall from an example in the previous chapter that the volume V_0 of the tetrahedron defined by any three non-coplanar vectors a, b, c is

V_0 = (1/6) (a × b) · c.

The volume V of the tetrahedron defined by the three vectors Fa, Fb, Fc is likewise

V = (1/6) (Fa × Fb) · Fc.

It follows from the result of the previous example that

V/V_0 = det F,

which describes how volumes are mapped by the transformation F.
Example 3.6: Suppose that a and b are two non-colinear vectors in IE^3. Let α_0 be the area of the parallelogram defined by these two vectors and let n_0 be a unit vector that is normal to the plane of this parallelogram. Next, suppose that F is a non-singular 2-tensor and let α and n denote the area and unit normal of the parallelogram defined by the vectors Fa and Fb. Derive formulas for α and n in terms of α_0, n_0 and F.

[Figure 3.3: Parallelogram of area α_0 with unit normal n_0 defined by two non-colinear vectors a and b; and its image under the linear transformation F.]

Solution: By the properties of the vector-product we know that

α_0 = |a × b|,    n_0 = (a × b)/|a × b|;

and similarly that

α = |Fa × Fb|,    n = (Fa × Fb)/|Fa × Fb|.

Therefore

α_0 n_0 = a × b,    and    α n = Fa × Fb.    (i)

But

(Fa × Fb)_s = e_sij (Fa)_i (Fb)_j = e_sij F_ip a_p F_jq b_q.    (ii)

Also recall the identity e_pqr det[F] = e_ijk F_ip F_jq F_kr introduced in (1.48). Multiplying both sides of this identity by F^{−1}_rs leads to

e_pqr det[F] F^{−1}_rs = e_ijk F_ip F_jq F_kr F^{−1}_rs = e_ijk F_ip F_jq δ_ks = e_ijs F_ip F_jq = e_sij F_ip F_jq.    (iii)

Substituting (iii) into (ii) gives

(Fa × Fb)_s = det[F] e_pqr F^{−1}_rs a_p b_q = det[F] e_rpq a_p b_q F^{−1}_rs = det F (a × b)_r F^{−T}_sr = det F (F^{−T}(a × b))_s,

and so, using (i),

α n = α_0 det F (F^{−T} n_0).

This describes how (vectorial) areas are mapped by the transformation F. Taking the norm of this vector equation gives

α/α_0 = |det F| |F^{−T} n_0|;

and substituting this result into the preceding equation gives

n = F^{−T} n_0 / |F^{−T} n_0|.
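The area-mapping formula α n = α_0 det F (F^{−T} n_0) just derived can be checked numerically. A minimal sketch (an aside, assuming NumPy; the sign flip simply ensures det F > 0, as in the derivation above):

```python
# Numerical check of the (Nanson-type) area map alpha n = alpha0 det F F^{-T} n0.
import numpy as np

rng = np.random.default_rng(6)
F = rng.standard_normal((3, 3))
if np.linalg.det(F) < 0:
    F[0] = -F[0]              # flip one row so that det F > 0
a, b = rng.standard_normal((2, 3))

ab = np.cross(a, b)
alpha0, n0 = np.linalg.norm(ab), ab / np.linalg.norm(ab)
Fab = np.cross(F @ a, F @ b)
alpha, n = np.linalg.norm(Fab), Fab / np.linalg.norm(Fab)

rhs = alpha0 * np.linalg.det(F) * (np.linalg.inv(F).T @ n0)
assert np.allclose(alpha * n, rhs)
assert np.isclose(alpha / alpha0,
                  np.linalg.det(F) * np.linalg.norm(np.linalg.inv(F).T @ n0))
```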
Example 3.5: Let {e_1, e_2, e_3} and {e'_1, e'_2, e'_3} be two bases related by nine scalars Q_ij through e'_i = Q_ij e_j. Let Q be the linear transformation whose components in the basis {e_1, e_2, e_3} are Q_ij. Show that

e'_i = Q^T e_i;

thus Q^T is the transformation that carries the first basis into the second.

Solution: Since Q_ij are the components of the linear transformation Q in the basis {e_1, e_2, e_3}, it follows from the definition of components that

Qe_j = Q_ij e_i.

Since [Q] is an orthogonal matrix, one readily sees that Q is an orthogonal transformation. Operating on both sides of the preceding equation with Q^T and using the orthogonality of Q leads to

e_j = Q_ij Q^T e_i.

Multiplying both sides of this by Q_kj and noting by the orthogonality of Q that Q_kj Q_ij = δ_ki, we are now led to

Q_kj e_j = Q^T e_k,    or equivalently    Q^T e_i = Q_ij e_j.

This, together with the given fact that e'_i = Q_ij e_j, yields the desired result.
Example 3.6: Determine the relationship between the components v_i and v'_i of a vector v in two bases.

Solution: The components v_i of v in the basis {e_1, e_2, e_3} are defined by

v_i = v · e_i,

and its components v'_i in the second basis {e'_1, e'_2, e'_3} are defined by

v'_i = v · e'_i.

It follows from this and (3.21) that

v'_i = v · e'_i = v · (Q_ij e_j) = Q_ij v · e_j = Q_ij v_j.

Thus, the components of the vector v in the two bases are related by

v'_i = Q_ij v_j.
Example 3.7: Determine the relationship between the components A_ij and A'_ij of a linear transformation A in two bases.

Solution: The components A_ij of the linear transformation A in the basis {e_1, e_2, e_3} are defined by

A_ij = e_i · (Ae_j),    (i)

and its components A'_ij in a second basis {e'_1, e'_2, e'_3} are defined by

A'_ij = e'_i · (Ae'_j).    (ii)

By first making use of (3.21), and then (i), we can write (ii) as

A'_ij = e'_i · (Ae'_j) = Q_ip e_p · (A Q_jq e_q) = Q_ip Q_jq e_p · (Ae_q) = Q_ip Q_jq A_pq.    (iii)

Thus, the components of the linear transformation A in the two bases are related by

A'_ij = Q_ip Q_jq A_pq.    (iv)
Example 3.8: Suppose that the basis {e'_1, e'_2, e'_3} is obtained by rotating the basis {e_1, e_2, e_3} through an angle θ about the unit vector e_3; see Figure 3.4. Write out the transformation rule for 2-tensors explicitly in this case.

[Figure 3.4: A basis {e'_1, e'_2, e'_3} obtained by rotating the basis {e_1, e_2, e_3} through an angle θ about the unit vector e_3.]

Solution: In view of the given relationship between the two bases, it follows that

e'_1 = cos θ e_1 + sin θ e_2,
e'_2 = −sin θ e_1 + cos θ e_2,
e'_3 = e_3.

The matrix [Q] which relates the two bases is defined by Q_ij = e'_i · e_j, and so it follows that

[Q] = |  cos θ   sin θ   0 |
      | −sin θ   cos θ   0 |
      |    0       0     1 |.
3.6. WORKED EXAMPLES. 59
Substituting this [Q] into [A'] = [Q][A][Q]^T and multiplying out the matrices leads to the 9 equations

A'_11 = (A_11 + A_22)/2 + ((A_11 − A_22)/2) cos 2θ + ((A_12 + A_21)/2) sin 2θ,
A'_12 = (A_12 − A_21)/2 + ((A_12 + A_21)/2) cos 2θ − ((A_11 − A_22)/2) sin 2θ,
A'_21 = −(A_12 − A_21)/2 + ((A_12 + A_21)/2) cos 2θ − ((A_11 − A_22)/2) sin 2θ,
A'_22 = (A_11 + A_22)/2 − ((A_11 − A_22)/2) cos 2θ − ((A_12 + A_21)/2) sin 2θ,
A'_13 = A_13 cos θ + A_23 sin θ,    A'_31 = A_31 cos θ + A_32 sin θ,
A'_23 = A_23 cos θ − A_13 sin θ,    A'_32 = A_32 cos θ − A_31 sin θ,
A'_33 = A_33.

In the special case when [A] is symmetric, and in addition A_13 = A_23 = 0, these nine equations simplify to

A'_11 = (A_11 + A_22)/2 + ((A_11 − A_22)/2) cos 2θ + A_12 sin 2θ,
A'_22 = (A_11 + A_22)/2 − ((A_11 − A_22)/2) cos 2θ − A_12 sin 2θ,
A'_12 = −((A_11 − A_22)/2) sin 2θ + A_12 cos 2θ,

together with A'_13 = A'_23 = 0 and A'_33 = A_33. These are the well-known equations underlying Mohr's circle for transforming 2-tensors in two dimensions.
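The double-angle expressions above can be checked numerically by comparing them against a direct evaluation of [Q][A][Q]^T. A minimal sketch (an aside, assuming NumPy; angle and components chosen at random):

```python
# Compare [A'] = [Q][A][Q]^T against the double-angle formulas.
import numpy as np

t = 0.4                      # rotation angle, arbitrary
c, s = np.cos(t), np.sin(t)
Q = np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])

rng = np.random.default_rng(7)
A = rng.standard_normal((3, 3))
Ap = Q @ A @ Q.T             # components in the rotated basis

m = 0.5 * (A[0, 0] + A[1, 1])    # mean of the 11- and 22-components
d = 0.5 * (A[0, 0] - A[1, 1])
assert np.isclose(Ap[0, 0], m + d * np.cos(2*t)
                  + 0.5 * (A[0, 1] + A[1, 0]) * np.sin(2*t))
assert np.isclose(Ap[0, 1], 0.5 * (A[0, 1] - A[1, 0])
                  + 0.5 * (A[0, 1] + A[1, 0]) * np.cos(2*t) - d * np.sin(2*t))
assert np.isclose(Ap[2, 2], A[2, 2])
```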
Example 3.9:

a. Let a, b and T be entities whose components in some arbitrary basis are a_i, b_i and T_ijk. The components of T in any basis are defined in terms of the components of a and b in that basis by

T_ijk = a_i b_j b_k.    (i)

If a and b are vectors, show that T is a 3-tensor.

b. Suppose that A and B are 2-tensors and that their components in some basis are related by

A_ij = C_ijkl B_kl.    (ii)

Show that the C_ijkl's are the components of a 4-tensor.

Solution:

a. Let a_i, a'_i and b_i, b'_i be the components of a and b in two arbitrary bases. We are told that the components of the entity T in these two bases are defined by

T_ijk = a_i b_j b_k,    T'_ijk = a'_i b'_j b'_k.    (iii)
Since a and b are known to be vectors, their components transform according to the 1-tensor transformation rule

a'_i = Q_ij a_j,    b'_i = Q_ij b_j.    (iv)

Combining equations (iii) and (iv) gives

T'_ijk = a'_i b'_j b'_k = Q_ip a_p Q_jq b_q Q_kr b_r = Q_ip Q_jq Q_kr a_p b_q b_r = Q_ip Q_jq Q_kr T_pqr.    (v)

Therefore the components of T in the two bases transform according to T'_ijk = Q_ip Q_jq Q_kr T_pqr, and so T is a 3-tensor.

b. Let A_ij, B_ij, C_ijkl and A'_ij, B'_ij, C'_ijkl be the components of A, B, C in two arbitrary bases:

A_ij = C_ijkl B_kl,    A'_ij = C'_ijkl B'_kl.    (vi)

We are told that A and B are 2-tensors, whence

A'_ij = Q_ip Q_jq A_pq,    B'_ij = Q_ip Q_jq B_pq,    (vii)

and we must show that C_ijkl is a 4-tensor, i.e. that C'_ijkl = Q_ip Q_jq Q_kr Q_ls C_pqrs. Substituting (vii) into (vi)_2 gives

Q_ip Q_jq A_pq = C'_ijkl Q_kp Q_lq B_pq.    (viii)

Multiplying both sides by Q_im Q_jn and using the orthogonality of [Q], i.e. the fact that Q_ip Q_im = δ_pm, leads to

δ_pm δ_qn A_pq = C'_ijkl Q_im Q_jn Q_kp Q_lq B_pq,    (ix)

which by the substitution rule tells us that

A_mn = C'_ijkl Q_im Q_jn Q_kp Q_lq B_pq,    (x)

or, on using (vi)_1 in this, that

C_mnpq B_pq = C'_ijkl Q_im Q_jn Q_kp Q_lq B_pq.    (xi)

Since this holds for all matrices [B], we must have

C_mnpq = C'_ijkl Q_im Q_jn Q_kp Q_lq.    (xii)

Finally, multiplying both sides by Q_am Q_bn Q_cp Q_dq, using the orthogonality of [Q] and the substitution rule, yields the desired result

Q_am Q_bn Q_cp Q_dq C_mnpq = C'_abcd.    (xiii)
Example 3.10: Verify that the alternator e_ijk has the property that

e_ijk = Q_ip Q_jq Q_kr e_pqr    for all proper orthogonal matrices [Q],    (i)

but that this equality does not hold for all orthogonal matrices [Q].    (ii)

Note from this that the alternator is not an isotropic 3-tensor.
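This property can be illustrated numerically: transforming the alternator with a proper orthogonal matrix reproduces e_ijk, while an improper orthogonal matrix reverses the sign (in fact Q_ip Q_jq Q_kr e_pqr = det[Q] e_ijk). A minimal sketch, assuming NumPy:

```python
# Transform e_ijk with proper and improper orthogonal matrices.
import numpy as np

e = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    e[i, j, k], e[i, k, j] = 1.0, -1.0

rng = np.random.default_rng(8)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] = -Q[:, 0]        # make [Q] proper orthogonal

ep = np.einsum('ip,jq,kr,pqr->ijk', Q, Q, Q, e)
assert np.allclose(ep, e)     # proper orthogonal: e_ijk unchanged

R = Q.copy()
R[:, 0] = -R[:, 0]            # an improper orthogonal matrix, det = -1
er = np.einsum('ip,jq,kr,pqr->ijk', R, R, R, e)
assert np.allclose(er, -e)    # improper: sign reversed
```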
Example 3.11: If C_ijkl is an isotropic 4-tensor, show that necessarily C_iikl = α δ_kl for some scalar α.

Solution: Since C_ijkl is an isotropic 4-tensor, by definition,

C_ijkl = Q_ip Q_jq Q_kr Q_ls C_pqrs

for all orthogonal matrices [Q]. On setting i = j in this; then using the orthogonality of [Q]; and finally using the substitution rule, we are led to

C_iikl = Q_ip Q_iq Q_kr Q_ls C_pqrs = δ_pq Q_kr Q_ls C_pqrs = Q_kr Q_ls C_pprs.

Thus C_iikl obeys

C_iikl = Q_kr Q_ls C_pprs    for all orthogonal matrices [Q],

and therefore it is an isotropic 2-tensor. The desired result now follows since the most general isotropic 2-tensor is a scalar multiple of the identity.
Example 3.12: Show that the most general isotropic vector is the null vector o.

Solution: In order to show this, we must determine the most general vector u which is such that

u_i = Q_ij u_j    for all orthogonal matrices [Q].    (i)

Since (i) is to hold for all orthogonal matrices [Q], it must necessarily hold for the special choice [Q] = −[I]. Then Q_ij = −δ_ij, and so (i) reduces to

u_i = −δ_ij u_j = −u_i;    (ii)

thus u_i = 0 and so u = o.

Conversely, u = o obviously satisfies (i) for all orthogonal matrices [Q]. Thus u = o is the most general isotropic vector.
Example 3.13: Show that the most general isotropic symmetric tensor is a scalar multiple of the identity.

Solution: We must find the most general symmetric 2-tensor A whose components in every basis are the same, i.e.

[A] = [Q][A][Q]^T    for all orthogonal matrices [Q].    (i)

First, since A is symmetric, we know that there is some basis in which [A] is diagonal. Since A is also isotropic, it follows that [A] must therefore be diagonal in every basis. Thus [A] has the form

[A] = | λ_1   0    0  |
      |  0   λ_2   0  |    (ii)
      |  0    0   λ_3 |

in any basis. Thus (i) takes the form

diag(λ_1, λ_2, λ_3) = [Q] diag(λ_1, λ_2, λ_3) [Q]^T    (iii)

for all orthogonal matrices [Q]. Thus (iii) must necessarily hold for the special choice

[Q] = | 0  1  0 |
      | 1  0  0 |    (iv)
      | 0  0  1 |,

in which case (iii) reduces to

diag(λ_1, λ_2, λ_3) = diag(λ_2, λ_1, λ_3).    (v)

Therefore λ_1 = λ_2.

A permutation of this special choice of [Q] similarly shows that λ_2 = λ_3. Thus λ_1 = λ_2 = λ_3 = α, say. Therefore [A] necessarily must have the form [A] = α[I].

Conversely, by direct substitution, [A] = α[I] is readily shown to obey (i) for any orthogonal matrix [Q]. This establishes the result.
Example 3.14: If W is a skew-symmetric tensor, show that there is a vector w such that Wx = w × x for all x ∈ IE.

Solution: Let W_ij be the components of W in some basis and let w be the vector whose components in this basis are defined by

w_i = −(1/2) e_ijk W_jk.    (i)

Then we merely have to show that w has the desired property stated above.

Multiplying both sides of the preceding equation by e_ipq, then using the identity e_ijk e_ipq = δ_jp δ_kq − δ_jq δ_kp, and finally using the substitution rule gives

e_ipq w_i = −(1/2)(δ_jp δ_kq − δ_jq δ_kp) W_jk = −(1/2)(W_pq − W_qp).

Since W is skew-symmetric we have W_ij = −W_ji, and we thus conclude that

W_ij = −e_ijk w_k.

Now, for any vector x,

W_ij x_j = −e_ijk w_k x_j = e_ikj w_k x_j = (w × x)_i.

Thus the vector w defined by (i) has the desired property Wx = w × x.
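The axial-vector construction (i) lends itself to a direct numerical check: building w from a random skew-symmetric W, one finds Wx = w × x for any x. A minimal sketch, assuming NumPy:

```python
# For skew-symmetric W, w_i = -(1/2) e_ijk W_jk satisfies Wx = w x x.
import numpy as np

e = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    e[i, j, k], e[i, k, j] = 1.0, -1.0

rng = np.random.default_rng(9)
M = rng.standard_normal((3, 3))
W = 0.5 * (M - M.T)                      # a skew-symmetric tensor
w = -0.5 * np.einsum('ijk,jk->i', e, W)  # the axial vector of W

x = rng.standard_normal(3)
assert np.allclose(W @ x, np.cross(w, x))
```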
Example 3.15: Verify that the 4-tensor

C_ijkl = α δ_ij δ_kl + β δ_ik δ_jl + γ δ_il δ_jk,    (i)

where α, β, γ are scalars, is isotropic. If this isotropic 4-tensor is to possess the symmetry C_ijkl = C_jikl, show that one must have β = γ.

Solution: In order to verify that C_ijkl are the components of an isotropic 4-tensor, we have to show that C_ijkl = Q_ip Q_jq Q_kr Q_ls C_pqrs for all orthogonal matrices [Q]. The right-hand side of this can be simplified by using the given form of C_ijkl, the substitution rule, and the orthogonality of [Q], as follows:

Q_ip Q_jq Q_kr Q_ls C_pqrs = Q_ip Q_jq Q_kr Q_ls (α δ_pq δ_rs + β δ_pr δ_qs + γ δ_ps δ_qr)
  = α Q_iq Q_jq Q_ks Q_ls + β Q_ir Q_js Q_kr Q_ls + γ Q_is Q_jr Q_kr Q_ls
  = α (Q_iq Q_jq)(Q_ks Q_ls) + β (Q_ir Q_kr)(Q_js Q_ls) + γ (Q_is Q_ls)(Q_jr Q_kr)
  = α δ_ij δ_kl + β δ_ik δ_jl + γ δ_il δ_jk = C_ijkl.    (ii)

This establishes the desired result.

Turning to the second question, enforcing the requirement C_ijkl = C_jikl on (i) leads, after some simplification, to

(β − γ)(δ_ik δ_jl − δ_jk δ_il) = 0.    (iii)

Since this must hold for all values of the free indices i, j, k, l, it must necessarily hold for the special choice i = 1, j = 2, k = 1, l = 2. Therefore (β − γ)(δ_11 δ_22 − δ_21 δ_12) = 0, and so

β = γ.    (iv)

Remark: We have shown that β = γ is necessary if C given in (i) is to have the symmetry C_ijkl = C_jikl. One can readily verify that it is sufficient as well. It is useful for later use to record here that the most general isotropic 4-tensor C with the symmetry property C_ijkl = C_jikl is

C_ijkl = α δ_ij δ_kl + β (δ_ik δ_jl + δ_il δ_jk),    (v)

where α and β are scalars.

Remark: Observe that C_ijkl given by (v) automatically has the symmetry C_ijkl = C_klij.
Example 3.16: If A is a tensor such that
Ax · x = 0 for all x (i)
show that A is necessarily skewsymmetric.
Solution: By deﬁnition of the transpose and the properties of the scalar product, Ax· x = x· A
T
x = A
T
x· x.
Therefore A has the properties that
Ax · x = 0, and A
T
x · x = 0 for all vectors x.
64 CHAPTER 3. COMPONENTS OF TENSORS. CARTESIAN TENSORS
Adding these two equations gives
Sx · x = 0 where S =
A+A
T
.
Observe that S is symmetric. Therefore in terms of components in a principal basis of S,
Sx · x = σ
1
x
2
1
+σ
2
x
2
2
+σ
3
x
2
3
= 0
where the σ
k
’s are the eigenvalues of S. Since this must hold for all real numbers x
k
, it follows that every
eigenvalue must vanish: σ
1
= σ
2
= σ
3
= 0. Therefore S = O whence
A = −A
T
.
Remark: An important consequence of this is that if A is a tensor with the property that Ax · x = 0 for all x, it does not follow that A = O necessarily; A need only be skew-symmetric.
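The two facts in this example can be checked numerically. The sketch below (using NumPy; the random seed and matrices are arbitrary illustrative choices) verifies that the quadratic form Ax · x sees only the symmetric part of A, and that it vanishes identically when A is skew-symmetric:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

# The quadratic form Ax.x depends on A only through its symmetric part
# S = (A + A^T)/2 ...
S = 0.5 * (A + A.T)
assert np.isclose((A @ x) @ x, (S @ x) @ x)

# ... so for a skew-symmetric W (here W = A - A^T), Wx.x vanishes for every x.
W = A - A.T
assert abs((W @ x) @ x) < 1e-12
```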
Example 2.18: For any orthogonal linear transformation Q, show that det Q = ±1.
Solution: Recall that for any two linear transformations A and B we have det(AB) = det A det B and det B = det B^T. Since QQ^T = I it now follows that 1 = det I = det(QQ^T) = det Q det Q^T = (det Q)². The desired result now follows.
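As a numerical sketch of this result (the QR factorization of a random matrix is one standard way of manufacturing an orthogonal Q; any orthogonal Q would do):

```python
import numpy as np

rng = np.random.default_rng(1)
# QR factorization of a random matrix yields an orthogonal Q.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))

assert np.allclose(Q @ Q.T, np.eye(3))          # Q Q^T = I
assert np.isclose(abs(np.linalg.det(Q)), 1.0)   # hence det Q = +1 or -1
```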
Example 2.20: If Q is a proper orthogonal linear transformation on the vector space IE³, show that there exists a vector v such that Qv = v. This vector is known as the axis of Q.

Solution: To show that there is a vector v such that Qv = v, it is sufficient to show that Q has an eigenvalue +1, i.e. that (Q − I)v = o for some v ≠ o, or equivalently that det(Q − I) = 0.

Since QQ^T = I we have Q(Q^T − I) = I − Q. On taking the determinant of both sides and using the fact that det(AB) = det A det B we get

    det Q det(Q^T − I) = det(I − Q).     (i)

Recall that det Q = +1 for a proper orthogonal linear transformation, and that det A = det A^T and det(−A) = (−1)³ det A for a 3-dimensional vector space. Therefore this leads to

    det(Q − I) = −det(Q − I),     (ii)

and the desired result now follows.
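The axis can also be extracted numerically as the eigenvector of Q associated with the eigenvalue +1. A sketch, using a rotation about k (the angle 0.7 is arbitrary):

```python
import numpy as np

# A right-handed rotation through an angle phi about the k axis (proper orthogonal).
phi = 0.7
Q = np.array([[np.cos(phi), -np.sin(phi), 0.0],
              [np.sin(phi),  np.cos(phi), 0.0],
              [0.0,          0.0,         1.0]])

# det(Q - I) = 0, so +1 is an eigenvalue of Q ...
assert np.isclose(np.linalg.det(Q - np.eye(3)), 0.0)

# ... and the corresponding eigenvector is the axis, here k = (0, 0, 1).
w, V = np.linalg.eig(Q)
axis = np.real(V[:, np.argmin(np.abs(w - 1.0))])
axis = axis / np.linalg.norm(axis)
assert np.allclose(Q @ axis, axis)
```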
Example 2.22: For any linear transformation A, show that det(A − µI) = det(Q^T AQ − µI) for all orthogonal linear transformations Q and all scalars µ.

Solution: This follows readily since

    det(Q^T AQ − µI) = det(Q^T AQ − µQ^T Q) = det(Q^T (A − µI)Q) = det Q^T det(A − µI) det Q = det(A − µI).

Remark: Observe from this result that the eigenvalues of Q^T AQ coincide with those of A, so that in particular the same is true of their product and their sum: det(Q^T AQ) = det A and tr(Q^T AQ) = tr A.
Example 2.26: Define a scalar-valued function φ(A; e_1, e_2, e_3) for all linear transformations A and all (not necessarily orthonormal) bases {e_1, e_2, e_3} by

    φ(A; e_1, e_2, e_3) = [Ae_1 · (e_2 × e_3) + e_1 · (Ae_2 × e_3) + e_1 · (e_2 × Ae_3)] / [e_1 · (e_2 × e_3)].

Show that φ(A; e_1, e_2, e_3) is in fact independent of the choice of basis, i.e., show that

    φ(A; e_1, e_2, e_3) = φ(A; e′_1, e′_2, e′_3)

for any two bases {e_1, e_2, e_3} and {e′_1, e′_2, e′_3}. Thus, we can simply write φ(A) instead of φ(A; e_1, e_2, e_3); φ(A) is called a scalar invariant of A.

Pick any orthonormal basis and express φ(A) in terms of the components of A in that basis; and hence show that φ(A) = trace A.
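The basis-independence and the identification φ(A) = trace A can be illustrated numerically; in the sketch below the matrix A and the two (non-orthonormal) bases are arbitrary random choices:

```python
import numpy as np

rng = np.random.default_rng(2)

def phi(A, e1, e2, e3):
    """phi(A; e1, e2, e3) as defined above; the basis need not be orthonormal."""
    num = (np.dot(A @ e1, np.cross(e2, e3))
           + np.dot(e1, np.cross(A @ e2, e3))
           + np.dot(e1, np.cross(e2, A @ e3)))
    return num / np.dot(e1, np.cross(e2, e3))

A = rng.standard_normal((3, 3))
b1 = rng.standard_normal((3, 3))   # rows are one basis
b2 = rng.standard_normal((3, 3))   # rows are another basis

# Two arbitrary bases give the same value, and that value is trace A:
assert np.isclose(phi(A, *b1), phi(A, *b2))
assert np.isclose(phi(A, *b1), np.trace(A))
```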
Example 3.7: Let F(t) be a one-parameter family of non-singular 2-tensors that depends smoothly on the parameter t. Calculate (d/dt) det F(t).

Solution: From the result of Example 3.NNN we have

    (F(t)a × F(t)b) · F(t)c = det F(t) (a × b) · c.

Differentiating this with respect to t gives

    (Ḟ(t)a × F(t)b) · F(t)c + (F(t)a × Ḟ(t)b) · F(t)c + (F(t)a × F(t)b) · Ḟ(t)c = [d/dt det F(t)] (a × b) · c

where we have set Ḟ(t) = dF/dt. We can write this as

    (ḞF⁻¹Fa × Fb) · Fc + (Fa × ḞF⁻¹Fb) · Fc + (Fa × Fb) · ḞF⁻¹Fc = [d/dt det F] (a × b) · c.

In view of the result of Example 3.NNN, this can be written as

    trace(ḞF⁻¹) (Fa × Fb) · Fc = [d/dt det F] (a × b) · c,

and now using the result of Example 3.NNN once more we get

    trace(ḞF⁻¹) det F (a × b) · c = [d/dt det F] (a × b) · c,

or

    d/dt det F = trace(ḞF⁻¹) det F.
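This identity (often called Jacobi's formula) is easy to check by finite differences; the family F(t) = F₀ + tF₁ below is an arbitrary smooth choice used purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
F0 = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
F1 = rng.standard_normal((3, 3))

def F(t):
    # An arbitrary smooth one-parameter family; Fdot = F1.
    return F0 + t * F1

t, h = 0.3, 1e-6
# Central finite difference of det F(t) ...
lhs = (np.linalg.det(F(t + h)) - np.linalg.det(F(t - h))) / (2 * h)
# ... against trace(Fdot F^{-1}) det F:
rhs = np.trace(F1 @ np.linalg.inv(F(t))) * np.linalg.det(F(t))
assert np.isclose(lhs, rhs, rtol=1e-4)
```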
Example 2.30: For any integer N > 0, show that the polynomial

    P_N(A) = c_0 I + c_1 A + c_2 A² + . . . + c_k A^k + . . . + c_N A^N

can be written as a quadratic polynomial of A.

Solution: This follows readily from the Cayley-Hamilton Theorem (3.41) as follows: suppose that A is non-singular so that I_3(A) = det A ≠ 0. Then (3.41) shows that A³ can be written as a linear combination of I, A and A². Next, multiplying this by A tells us that A⁴ can be written as a linear combination of A, A² and A³, and therefore, on using the result of the previous step, as a linear combination of I, A and A². This process can be continued an arbitrary number of times to see that for any integer k, A^k can be expressed as a linear combination of I, A and A². The result thus follows.
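The reduction step can be sketched numerically: the Cayley-Hamilton identity A³ = I₁A² − I₂A + I₃I is verified for a random A, and then used once more (multiplied by A) to express A⁴ through I, A and A²:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
I = np.eye(3)

# Principal scalar invariants of the 3x3 matrix A:
I1 = np.trace(A)
I2 = 0.5 * (np.trace(A)**2 - np.trace(A @ A))
I3 = np.linalg.det(A)

# Cayley-Hamilton: A^3 = I1 A^2 - I2 A + I3 I
A2 = A @ A
A3 = I1 * A2 - I2 * A + I3 * I
assert np.allclose(A3, A2 @ A)

# Multiplying the identity by A reduces A^4 to a combination of A, A^2, A^3,
# and hence (substituting for A^3) to a quadratic in A:
A4 = I1 * A3 - I2 * A2 + I3 * A
assert np.allclose(A4, np.linalg.matrix_power(A, 4))
```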
Example 2.31: For any linear transformation A show that

    det(A − αI) = −α³ + I_1(A) α² − I_2(A) α + I_3(A)

for all real numbers α, where I_1(A), I_2(A) and I_3(A) are the principal scalar invariants of A:

    I_1(A) = trace A,   I_2(A) = 1/2 [(trace A)² − trace(A²)],   I_3(A) = det A.

Example 2.32: Calculate the principal scalar invariants I_1, I_2 and I_3 of the linear transformation a ⊗ b.
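As a numerical hint for Example 2.32 (the vectors a and b below are arbitrary): since a ⊗ b has rank one, its invariants collapse to I₁ = a · b, I₂ = 0, I₃ = 0.

```python
import numpy as np

rng = np.random.default_rng(5)
a = rng.standard_normal(3)
b = rng.standard_normal(3)
T = np.outer(a, b)   # the linear transformation a (x) b

I1 = np.trace(T)
I2 = 0.5 * (np.trace(T)**2 - np.trace(T @ T))
I3 = np.linalg.det(T)

assert np.isclose(I1, np.dot(a, b))   # I1 = a . b
assert np.isclose(I2, 0.0)            # rank one => I2 vanishes
assert np.isclose(I3, 0.0)            # rank one => I3 vanishes
```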
Chapter 4
Characterizing Symmetry: Groups of
Linear Transformations.
Linear transformations are mappings of vector spaces into vector spaces. When an object is mapped using a linear transformation, certain transformations preserve its symmetry while others do not. One way in which to characterize the symmetry of an object is to consider the collection of all linear transformations that preserve its symmetry. The set of such transformations depends on the object: for example, the set of linear transformations that preserve the symmetry of a cube is different from the set of linear transformations that preserve the symmetry of a tetrahedron. In this chapter we touch briefly on the question of characterizing symmetry by linear transformations.

Intuitively a "uniform all-around expansion", i.e. a linear transformation of the form αI that rescales the object by changing its size but not its shape, does not affect symmetry. We are interested in other linear transformations that also preserve symmetry, principally rotations and reflections. In this chapter we shall consider those linear transformations that map the object back into itself. The collection of such transformations has certain important and useful properties.
Figure 4.1: Mapping a square into itself.
4.1 An example in two dimensions.

We begin with an illustrative example. Consider a square, ABCD, which lies in a plane normal to the unit vector k, whose center is at the origin o and whose sides are parallel to the orthonormal vectors {i, j}. Consider mappings that carry the square into itself. The vertex A can be placed in one of 4 positions; see Figure 4.1. Once the location of A has been determined, the vertex B can be placed in one of 2 positions (allowing for reflections, or in just one position if only rotations are permitted). And once the locations of A and B have been fixed, there is no further flexibility and the locations of the remaining vertices are fixed. Thus there are a total of 4 × 2 = 8 symmetry preserving transformations of the square, 4 of which are rotations and 4 of which are reflections.

Consider the 4 rotations. In order to determine them, we (a) identify the axes of rotational symmetry and then (b) determine the number of distinct rotations about each such axis. In the present case there is just 1 axis to consider, viz. k, and we note that 0°, 90°, 180° and 270° rotations about this axis map the square back into itself. Thus the following 4 distinct rotations are symmetry transformations: I, R^{π/2}_k, R^π_k, R^{3π/2}_k, where we are using the notation introduced previously, i.e. R^φ_n is a right-handed rotation through an angle φ about the axis n.

Let G_square denote the set consisting of these 4 symmetry preserving rotations:

    G_square = {I, R^{π/2}_k, R^π_k, R^{3π/2}_k}.
This collection of linear transformations has two important properties: first, observe that the successive application of any two symmetries yields a third symmetry, i.e. if P_1 and P_2 are in G_square, then so is their product P_1 P_2. For example, R^π_k R^{π/2}_k = R^{3π/2}_k, R^{π/2}_k R^{3π/2}_k = I, R^{3π/2}_k R^{3π/2}_k = R^π_k, etc. Second, observe that if P is any member of G_square, then so is its inverse P^{−1}. For example (R^π_k)^{−1} = R^π_k, (R^{3π/2}_k)^{−1} = R^{π/2}_k, etc. As we shall see in Section 4.4, these two properties endow the set G_square with a certain special structure.

Next consider the rotation R^{π/2}_k and observe that every element of the set G_square can be represented in the form (R^{π/2}_k)^n for the integer choices n = 0, 1, 2, 3. Therefore we can say that the set G_square is "generated" by the element R^{π/2}_k.
Finally observe that

    G′_square = {I, R^π_k}

is a subset of G_square and that it too has the properties that if P_1, P_2 ∈ G′_square then their product P_1 P_2 is also in G′_square; and if P ∈ G′_square so is its inverse P^{−1}.
We shall generalize all of this in Section 4.4.
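The closure and inverse properties of the four rotations, and the fact that they are all powers of the 90° rotation, can be checked directly; a sketch in the i, j plane (so each rotation is a 2 × 2 matrix):

```python
import numpy as np

def rot(phi):
    """Right-handed rotation through phi about k, restricted to the i, j plane."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]])

# G_square = {I, R^{pi/2}, R^{pi}, R^{3pi/2}}:
G = [rot(n * np.pi / 2) for n in range(4)]

def member(P, group):
    return any(np.allclose(P, Q) for Q in group)

# Closure: the product of any two elements is again in the set ...
assert all(member(P1 @ P2, G) for P1 in G for P2 in G)
# ... every inverse is in the set ...
assert all(member(np.linalg.inv(P), G) for P in G)
# ... and every element is a power of the generator R^{pi/2}:
assert all(member(np.linalg.matrix_power(rot(np.pi / 2), n), G) for n in range(4))
```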
4.2 An example in three dimensions.

Figure 4.2: Mapping a cube into itself.
Before considering some general theory, it is useful to consider the three-dimensional version of the previous problem. Consider a cube whose center is at the origin o and whose edges are parallel to the orthonormal vectors {i, j, k}, and consider mappings that carry the cube into itself. Consider a vertex A, and its three adjacent vertices B, C, D. The vertex A can be placed in one of 8 positions. Once the location of A has been determined, the vertex B can be placed in one of 3 positions. And once the locations of A and B have been fixed, the vertex C can be placed in one of 2 positions (allowing for reflections, or in just one position if only rotations are permitted). Once the vertices A, B and C have been placed, the locations of the remaining vertices are fixed. Thus there are a total of 8 × 3 × 2 = 48 symmetry preserving transformations of the cube, 24 of which are rotations and 24 of which are reflections.

First, consider the 24 rotations. In order to determine these rotations we again (a) identify all axes of rotational symmetry and then (b) determine the number of distinct rotations about each such axis. In the present case we see that, in addition to the identity transformation I itself, we have the following rotational transformations that preserve symmetry:
1. There are 3 axes that join the center of one face of the cube to the center of the opposite face of the cube, which we can take to be i, j, k (which in materials science are called the {100} directions); and 90°, 180° and 270° rotations about each of these axes map the cube back into the cube. Thus the following 3 × 3 = 9 distinct rotations are symmetry transformations:

    R^{π/2}_i, R^π_i, R^{3π/2}_i, R^{π/2}_j, R^π_j, R^{3π/2}_j, R^{π/2}_k, R^π_k, R^{3π/2}_k.

2. There are 4 axes that join one vertex of the cube to the diagonally opposite vertex of the cube, which we can take to be i + j + k, i − j + k, i + j − k, i − j − k (which in materials science are called the {111} directions); and 120° and 240° rotations about each of these axes map the cube back into the cube. Thus the following 4 × 2 = 8 distinct rotations are symmetry transformations:

    R^{2π/3}_{i+j+k}, R^{4π/3}_{i+j+k}, R^{2π/3}_{i−j+k}, R^{4π/3}_{i−j+k}, R^{2π/3}_{i+j−k}, R^{4π/3}_{i+j−k}, R^{2π/3}_{i−j−k}, R^{4π/3}_{i−j−k}.

3. Finally, there are 6 axes that join the center of one edge of the cube to the center of the diagonally opposite edge of the cube, which we can take to be i + j, i − j, i + k, i − k, j + k, j − k (which in materials science are called the {110} directions); and 180° rotations about each of these axes map the cube back into the cube. Thus the following 6 × 1 = 6 distinct rotations are symmetry transformations:

    R^π_{i+j}, R^π_{i−j}, R^π_{i+k}, R^π_{i−k}, R^π_{j+k}, R^π_{j−k}.
Let G_cube denote the collection of these 24 symmetry preserving rotations:

    G_cube = {I,
              R^{π/2}_i, R^π_i, R^{3π/2}_i, R^{π/2}_j, R^π_j, R^{3π/2}_j, R^{π/2}_k, R^π_k, R^{3π/2}_k,
              R^{2π/3}_{i+j+k}, R^{4π/3}_{i+j+k}, R^{2π/3}_{i−j+k}, R^{4π/3}_{i−j+k}, R^{2π/3}_{i+j−k}, R^{4π/3}_{i+j−k}, R^{2π/3}_{i−j−k}, R^{4π/3}_{i−j−k},
              R^π_{i+j}, R^π_{i−j}, R^π_{i+k}, R^π_{i−k}, R^π_{j+k}, R^π_{j−k}}.     (4.1)
If one considers rotations and reflections, then there are 48 elements in this set, where the 24 reflections are obtained by multiplying each rotation by −I. (It is important to remark that this just happens to be true for the cube, but is not generally true. In general, if R is a rotational symmetry of an object then −R is, of course, a reflection, but it need not describe a reflectional symmetry of the object; e.g. see the example of the tetrahedron discussed later.)
The collection of linear transformations G_cube has two important properties that one can verify: (i) if P_1 and P_2 ∈ G_cube, then their product P_1 P_2 is also in G_cube, and (ii) if P ∈ G_cube, then so is its inverse P^{−1}.
Next, one can verify that every element of the set G_cube can be represented in the form (R^{π/2}_i)^p (R^{π/2}_j)^q (R^{π/2}_k)^r for integer choices of p, q, r. For example the rotation R^{2π/3}_{i+j+k} (about a {111} axis) and the rotation R^π_{i+k} (about a {110} axis) can be represented as

    R^{2π/3}_{i+j+k} = (R^{π/2}_k)^{−1} (R^{π/2}_j)^{−1},     R^π_{i+k} = (R^{π/2}_j)^{−1} (R^{π/2}_k)².

(One way in which to verify this is to use the representation of a rotation tensor determined in Example 2.18.) Therefore we can say that the set G_cube is "generated" by the three elements R^{π/2}_i, R^{π/2}_j and R^{π/2}_k.
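One can check by machine that the three 90° face rotations generate exactly 24 rotations, and that these include the {111} and {110} rotations above. The sketch below closes the generator set under multiplication (Rodrigues' formula is used to build each rotation; since every generator has finite order, repeated right-multiplication starting from I produces the whole generated group):

```python
import numpy as np

def R(axis, phi):
    """Right-handed rotation through phi about the unit vector along axis
    (Rodrigues' formula)."""
    n = np.asarray(axis, dtype=float)
    n = n / np.linalg.norm(n)
    W = np.array([[0.0, -n[2], n[1]],
                  [n[2], 0.0, -n[0]],
                  [-n[1], n[0], 0.0]])
    return np.eye(3) + np.sin(phi) * W + (1.0 - np.cos(phi)) * (W @ W)

i, j, k = np.eye(3)
gens = [R(i, np.pi / 2), R(j, np.pi / 2), R(k, np.pi / 2)]

def member(P, G):
    return any(np.allclose(P, Q) for Q in G)

# Close the generator set under multiplication.
group = [np.eye(3)]
grew = True
while grew:
    grew = False
    for P in list(group):
        for g in gens:
            Q = P @ g
            if not member(Q, group):
                group.append(Q)
                grew = True

assert len(group) == 24                            # the 24 rotations of the cube
assert member(R(i + j + k, 2 * np.pi / 3), group)  # a 111-type rotation
assert member(R(i + k, np.pi), group)              # a 110-type rotation
```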
4.3 Lattices.

A geometric structure of particular interest in solid mechanics is a lattice, and we now make a few observations on the symmetry of lattices. The simplest lattice, a Bravais lattice L{o; ℓ_1, ℓ_2, ℓ_3}, is an infinite set of periodically arranged points in space generated by the translation of a single point o through three linearly independent lattice vectors {ℓ_1, ℓ_2, ℓ_3}:

    L{o; ℓ_1, ℓ_2, ℓ_3} = { x : x = o + Σ_{i=1}^{3} n_i ℓ_i,  n_i ∈ Z }     (4.2)

where Z is the set of integers. Figure 4.3 shows a two-dimensional square lattice and one possible set of lattice vectors ℓ_1, ℓ_2. (It is clear from the figure that different sets of lattice vectors can correspond to the same lattice.)
Figure 4.3: A two-dimensional square lattice with lattice vectors ℓ_1, ℓ_2.
It can be shown that a linear transformation P maps a lattice back into itself if and only if

    P ℓ_i = Σ_{j=1}^{3} M_ij ℓ_j     (4.3)

for some 3 × 3 matrix [M] whose elements M_ij are integers and where det[M] = ±1. Given a lattice, let G_lattice be the set of all linear transformations P that map the lattice back into itself. One can show that if P_1, P_2 ∈ G_lattice then their product P_1 P_2 is also in G_lattice; and if P ∈ G_lattice so is its inverse P^{−1}. The set G_lattice is called the symmetry group of the lattice, and the set of rotations in G_lattice is known as the point group of the lattice. For example the point group of a simple cubic lattice¹ is the set G_cube of 24 rotations given in (4.1).
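The criterion (4.3) is easy to apply numerically: given P and the lattice vectors, solve for [M] and test whether it is an integer matrix with determinant ±1. A two-dimensional sketch for the square lattice (the helper names below are illustrative, not from the text):

```python
import numpy as np

# Lattice vectors of a two-dimensional square lattice, stored as columns.
a = 1.0
L = a * np.eye(2)

def lattice_matrix(P, L):
    """Solve P l_i = sum_j M_ij l_j for [M]: columns of P @ L are the images
    P l_i, which are expanded back in the lattice basis."""
    return np.linalg.solve(L, P @ L).T

def maps_lattice(P, L):
    M = lattice_matrix(P, L)
    return bool(np.allclose(M, np.round(M)) and np.isclose(abs(np.linalg.det(M)), 1.0))

rot90 = np.array([[0.0, -1.0], [1.0, 0.0]])   # 90-degree rotation: a lattice symmetry
c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
rot30 = np.array([[c, -s], [s, c]])           # 30-degree rotation: not a symmetry

assert maps_lattice(rot90, L)
assert not maps_lattice(rot30, L)
```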
4.4 Groups of Linear Transformations.

A collection G of non-singular linear transformations is said to be a group of linear transformations if it possesses the following two properties:

(i) if P_1 ∈ G and P_2 ∈ G then P_1 P_2 ∈ G,

(ii) if P ∈ G then P^{−1} ∈ G.

Note from this that the identity transformation I is necessarily a member of every group G. Clearly the three sets G_square, G_cube and G_lattice encountered in the previous sections are groups. One can show that each of the following sets of linear transformations forms a group:

– the set of all orthogonal linear transformations;

– the set of all proper orthogonal linear transformations;

– the set of all unimodular linear transformations² (i.e. linear transformations with determinant equal to ±1); and

– the set of all proper unimodular linear transformations (i.e. linear transformations with determinant equal to +1).
The generators of a group G are those elements P_1, P_2, . . . , P_n which, when they and their inverses are multiplied among themselves in various combinations, yield all the elements of the group. Generators of the groups G_square and G_cube were given previously.
In general, a collection of linear transformations G′ is said to be a subgroup of a group G if

(i) G′ ⊂ G and

(ii) G′ is itself a group.

¹ There are seven different types of symmetry that arise in Bravais lattices, viz. triclinic, monoclinic, orthorhombic, tetragonal, cubic, trigonal and hexagonal. Because, for example, a cubic lattice can be body-centered or face-centered, and so on, the number of different types of lattices is greater than seven.

² While the determinant of an orthogonal tensor is ±1, the converse is not necessarily true. There are unimodular tensors, e.g. P = I + α i ⊗ j, that are not orthogonal. Thus the unimodular group is not equivalent to the orthogonal group.

One can readily show that the group of proper orthogonal linear transformations is a subgroup of the group of orthogonal linear transformations, which in turn is a subgroup of the group of unimodular linear transformations. In our first example, G′_square is a subgroup of G_square.
It should be mentioned that the general theory of groups deals with collections of elements (together with certain "rules" including "multiplication") where the elements need not be linear transformations. For example the set of all integers Z, with "multiplication" defined as the addition of numbers, the identity taken to be zero, and the inverse of x taken to be −x, is a group. Similarly the set of all matrices of the form

    [ cosh x   sinh x ]
    [ sinh x   cosh x ]     where −∞ < x < ∞,

with "multiplication" defined as matrix multiplication, the identity being the identity matrix, and the inverse being

    [ cosh(−x)   sinh(−x) ]
    [ sinh(−x)   cosh(−x) ],

can be shown to be a group. However, our discussion in these notes is limited to groups of linear transformations.
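The group properties of this one-parameter family of matrices follow from the addition formulas for cosh and sinh; a quick numerical sketch (the parameter values are arbitrary):

```python
import numpy as np

def B(x):
    """Element of the hyperbolic one-parameter family above."""
    return np.array([[np.cosh(x), np.sinh(x)],
                     [np.sinh(x), np.cosh(x)]])

x, y = 0.4, -1.1
# Closure: B(x) B(y) = B(x + y), by the cosh/sinh addition formulas.
assert np.allclose(B(x) @ B(y), B(x + y))
# Identity and inverses: B(0) = I and B(x)^{-1} = B(-x).
assert np.allclose(B(0.0), np.eye(2))
assert np.allclose(np.linalg.inv(B(x)), B(-x))
```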
4.5 Symmetry of a scalar-valued function of symmetric positive-definite tensors.

When we discuss the constitutive behavior of a material in Volume 2, we will encounter a scalar-valued function ψ(C) defined for all symmetric positive-definite tensors C. (This represents the energy in the material and characterizes its mechanical response.) The symmetry of the material will be characterized by a set G of non-singular tensors P which has the property that, for each P ∈ G,

    ψ(C) = ψ(P^T C P) for all symmetric positive-definite C.     (4.4)
It can be readily shown that this set of tensors G is a group. To see this, first let P_1, P_2 ∈ G so that

    ψ(C) = ψ(P_1^T C P_1),     ψ(C) = ψ(P_2^T C P_2),     (4.5)

for all symmetric positive-definite C. Then ψ((P_1 P_2)^T C P_1 P_2) = ψ(P_2^T (P_1^T C P_1) P_2) = ψ(P_1^T C P_1) = ψ(C), where we have used (4.5)_2 and (4.5)_1 in the penultimate and ultimate steps respectively. Thus if P_1 and P_2 are in G, then so is P_1 P_2. Next, suppose that P ∈ G. Since P is non-singular, the equation S = P^T C P provides a one-to-one relation between symmetric positive-definite tensors C and S. Thus, since (4.4) holds for all symmetric positive-definite C, it also holds for all symmetric positive-definite linear transformations S = P^T C P. Substituting this into (4.4) gives ψ(S) = ψ(P^{−T} S P^{−1}) for all symmetric positive-definite S; and so P^{−1} is also in G. Thus the set G of non-singular tensors obeying (4.4) is a group; we shall refer to it as the symmetry group of ψ.
Observe from (4.4) that the symmetry group of ψ contains the elements I and −I, and
as a consequence, if P ∈ G then −P ∈ G also.
To examine an explicit example, consider the function

    ψ(C) = ψ̂(det C).     (4.6)

It is seen trivially that for this ψ̂, equation (4.4) holds if and only if det P = ±1. Thus the symmetry group of this ψ consists of all unimodular tensors (i.e. tensors with determinant equal to ±1).
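Since det(PᵀCP) = (det P)² det C, the invariance (4.4) holds for this ψ exactly when (det P)² = 1. A numerical sketch, taking ψ̂ to be the identity function purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)

def psi(C):
    # Stand-in for psi_hat(det C), with psi_hat taken as the identity.
    return np.linalg.det(C)

# A random symmetric positive-definite C:
A = rng.standard_normal((3, 3))
C = A @ A.T + 3.0 * np.eye(3)

# A unimodular but non-orthogonal P (a shear), cf. the footnote above:
P = np.eye(3)
P[0, 1] = 0.7
assert np.isclose(abs(np.linalg.det(P)), 1.0)
assert np.isclose(psi(P.T @ C @ P), psi(C))      # (4.4) holds

# A spherical P with det != +-1 breaks the invariance:
P2 = 2.0 * np.eye(3)
assert not np.isclose(psi(P2.T @ C @ P2), psi(C))
```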
As a second example consider the function

    ψ(C) = ψ̂(Cn · n)     (4.7)

where n is a given fixed unit vector. Let Q_n be a rotation about the axis n through an arbitrary angle. Then since n is the axis of Q_n we know that Q_n n = n. Therefore

    ψ(Q_n^T C Q_n) = ψ̂(Q_n^T C Q_n n · n) = ψ̂(C Q_n n · Q_n n) = ψ̂(Cn · n) = ψ(C).     (4.8)

The symmetry group of the function (4.7) therefore contains the set of all rotations about n. (Are there any other tensors in G?)
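The chain of equalities (4.8) can be checked numerically for a particular rotation about n; in the sketch below n = k and ψ̂ is taken as the identity, both arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(7)
n = np.array([0.0, 0.0, 1.0])   # the fixed unit vector n, taken here to be k

def psi(C):
    # Stand-in for psi_hat(Cn . n), with psi_hat taken as the identity.
    return n @ C @ n

A = rng.standard_normal((3, 3))
C = A @ A.T + 3.0 * np.eye(3)   # symmetric positive-definite

phi = 1.234                     # arbitrary angle
c, s = np.cos(phi), np.sin(phi)
Qn = np.array([[c, -s, 0.0],
               [s,  c, 0.0],
               [0.0, 0.0, 1.0]])  # rotation about n

assert np.allclose(Qn @ n, n)                 # n is the axis of Qn
assert np.isclose(psi(Qn.T @ C @ Qn), psi(C)) # the invariance (4.8)
```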
The following result will be useful in Volume 2. Let H be some fixed non-singular linear transformation, and consider two functions ψ_1(C) and ψ_2(C), each defined for all symmetric positive-definite tensors C. Suppose that ψ_1 and ψ_2 are related by

    ψ_2(C) = ψ_1(H^T C H) for all symmetric positive-definite tensors C.     (4.9)

If G_1 and G_2 are the symmetry groups of ψ_1 and ψ_2 respectively, then it can be shown that

    G_2 = H G_1 H^{−1}     (4.10)

in the sense that a tensor P ∈ G_1 if and only if the tensor H P H^{−1} ∈ G_2. As a special case of this, if H is a spherical tensor, i.e. if H = αI, then G_1 = G_2.

Next, note that any non-singular tensor P can be written as the product of a spherical tensor αI and a unimodular tensor T as P = (αI)T, provided that we take α = |det P|^{1/3}, since then det T = ±1. This, together with the special case of the result noted in the preceding paragraph, provides a hint of why we might want to limit attention to unimodular tensors rather than consider all non-singular tensors in our discussion of symmetry.
This motivates the following slight modification to our original notion of symmetry of a function ψ(C). We characterize the symmetry of ψ by the set G of unimodular tensors P which have the property that, for each P ∈ G,

    ψ(C) = ψ(P^T C P) for all symmetric positive-definite C.     (4.11)

It can be readily shown that this set of tensors G is also a group, necessarily a subgroup of the unimodular group.
A function ψ is said to be isotropic if its symmetry group G contains all orthogonal tensors. Thus for an isotropic function ψ,

    ψ(C) = ψ(P^T C P)     (4.12)

for all symmetric positive-definite C and all orthogonal P. From a theorem in algebra it follows that an isotropic function ψ depends on C only through its principal scalar invariants defined previously in (3.38), i.e. that there exists a function ψ̂ such that

    ψ(C) = ψ̂(I_1(C), I_2(C), I_3(C))     (4.13)

where

    I_1(C) = trace C,
    I_2(C) = 1/2 [(trace C)² − trace(C²)],
    I_3(C) = det C.     (4.14)
As a second example consider "cubic symmetry", where the symmetry group G coincides with the set of 24 rotations G_cube given in (4.1) plus the corresponding reflections obtained by multiplying these rotations by −I. As noted previously, this group is generated by R^{π/2}_i, R^{π/2}_j, R^{π/2}_k and −I, and contains 24 rotations and 24 reflections. Then, according to a theorem in algebra (see pg. 312 of Truesdell and Noll),

    ψ(C) = ψ̂(i_1(C), i_2(C), i_3(C), i_4(C), i_5(C), i_6(C), i_7(C), i_8(C), i_9(C))     (4.15)

where

    i_1(C) = C_11 + C_22 + C_33,
    i_2(C) = C_22 C_33 + C_33 C_11 + C_11 C_22,
    i_3(C) = C_11 C_22 C_33,
    i_4(C) = C_23² + C_31² + C_12²,
    i_5(C) = C_31² C_12² + C_12² C_23² + C_23² C_31²,
    i_6(C) = C_23 C_31 C_12,
    i_7(C) = C_22 C_12² + C_33 C_31² + C_33 C_23² + C_11 C_12² + C_11 C_31² + C_22 C_23²,
    i_8(C) = C_11 C_31² C_12² + C_22 C_12² C_23² + C_33 C_23² C_31²,
    i_9(C) = C_23² C_22 C_33 + C_31² C_33 C_11 + C_12² C_11 C_22.     (4.16)
If G contains I and all rotations R^φ_n, 0 < φ < 2π, through all angles φ about a fixed axis n, the corresponding symmetry is called transverse isotropy.

If G includes the three elements −R^π_i, −R^π_j, −R^π_k, which represent reflections in the planes normal to i, j and k, the symmetry is called orthotropy.
4.6 Worked Examples.
Example 4.1: Characterize the set H_square of linear transformations that map a square back into a square, including both rotations and reflections.

Figure 4.4: Mapping a square into itself.
Solution: We return to the problem described in Section 4.1 and now consider the set H_square of rotations and reflections that map the square back into itself. The set of rotations that do this was determined earlier:

    G_square = {I, R^{π/2}_k, R^π_k, R^{3π/2}_k}.

As the 4 reflectional symmetries we can pick

    H = reflection in the horizontal axis i,
    V = reflection in the vertical axis j,
    D = reflection in the diagonal with positive slope i + j,
    D′ = reflection in the diagonal with negative slope −i + j,

and so

    H_square = {I, R^{π/2}_k, R^π_k, R^{3π/2}_k, H, V, D, D′}.     (i)

One can verify that H_square is a group, since it possesses the property that if P_1 and P_2 are two transformations in H_square, then so is their product P_1 P_2, e.g. D′ = R^{3π/2}_k H, D = H R^{3π/2}_k, etc.; and if P is any member of H_square, then so is its inverse, e.g. H^{−1} = H, etc.
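The eight transformations and their group properties can be checked with 2 × 2 matrices in the i, j plane (the matrix forms of H, V, D, D′ below follow the usual conventions for these reflections):

```python
import numpy as np

def rot(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]])

# The four reflections in the i, j plane:
H  = np.diag([1.0, -1.0])                   # reflection in the axis i
V  = np.diag([-1.0, 1.0])                   # reflection in the axis j
D  = np.array([[0.0, 1.0], [1.0, 0.0]])     # reflection in the diagonal i + j
Dp = np.array([[0.0, -1.0], [-1.0, 0.0]])   # reflection in the diagonal -i + j

Hsq = [rot(n * np.pi / 2) for n in range(4)] + [H, V, D, Dp]

def member(P, G):
    return any(np.allclose(P, Q) for Q in G)

assert all(member(P1 @ P2, Hsq) for P1 in Hsq for P2 in Hsq)  # closure
assert all(member(np.linalg.inv(P), Hsq) for P in Hsq)        # inverses
```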
Example 4.2: Find the generators of H_square and all subgroups of H_square.

Solution: All elements of H_square can be represented in the form (R^{π/2}_k)^i H^j for integer choices of i = 0, 1, 2, 3 and j = 0, 1:

    R^π_k = (R^{π/2}_k)²,  R^{3π/2}_k = (R^{π/2}_k)³,  I = (R^{π/2}_k)⁴,
    D′ = (R^{π/2}_k)³ H,  V = (R^{π/2}_k)² H,  D = R^{π/2}_k H.

Therefore the group H_square is generated by the two elements H and R^{π/2}_k.
One can verify that the following 8 collections of linear transformations are subgroups of H_square:

    {I, R^{π/2}_k, R^π_k, R^{3π/2}_k},  {I, D, D′, R^π_k},  {I, H, V, R^π_k},  {I, R^π_k},
    {I, D},  {I, D′},  {I, H},  {I, V}.

Geometrically, each of these subgroups leaves some aspect of the square invariant. The first leaves the face invariant, the second leaves a diagonal invariant, the third leaves the axis invariant, the fourth leaves an axis and a diagonal invariant, etc. There are no other subgroups of H_square.
Example 4.3: Characterize the rotational symmetry of a regular tetrahedron.

Figure 4.5: A regular tetrahedron ABCD, three orthonormal vectors {i, j, k} and a unit vector p. The axis k passes through the vertex A and the centroid of the opposite face BCD, while the unit vector p passes through the center of the edge AD and the center of the opposite edge BC.

Solution:

1. There are 4 axes like k in the figure that pass through a vertex of the tetrahedron and the centroid of the opposite face; and right-handed rotations of 120° and 240° about each of these axes map the tetrahedron back onto itself. Thus these 4 × 2 = 8 distinct rotations – of the form R^{2π/3}_k, R^{4π/3}_k, etc. – are symmetry transformations of the tetrahedron.

2. There are three axes like p shown in the figure that pass through the midpoints of a pair of opposite edges; and a right-handed rotation through 180° about each of these axes maps the tetrahedron back onto itself. Thus these 3 × 1 = 3 distinct rotations – of the form R^π_p, etc. – are symmetry transformations of the tetrahedron.

The group G_tetrahedron of rotational symmetries of a tetrahedron therefore consists of these 11 rotations plus the identity transformation I.
Example 4.4: Are all symmetry preserving linear transformations necessarily either rotations or reflections?

Solution: We began this chapter by considering the symmetry of a square, and examining the different ways in which the square could be mapped back into itself. Now consider the example of a two-dimensional a × a square lattice, i.e. the infinite set of points

    L_square = { x : x = n_1 a i + n_2 a j,  n_1, n_2 ∈ Z ≡ Integers }     (i)

depicted in Figure 4.6, and examine the different ways in which this lattice can be mapped back into itself.

Figure 4.6: A two-dimensional a × a square lattice.

We first note that the rotational and reflectional symmetry transformations of an a × a square are also symmetry transformations for the lattice since they leave the lattice invariant. There are however other transformations, that are neither rotations nor reflections, that also leave the lattice invariant. For example, if for every integer n, one rigidly translates the n-th row of the lattice by precisely the amount na in the i direction, one recovers the original lattice. Thus, the "shearing" of the lattice described by the linear transformation

    P = I + i ⊗ j     (ii)

is also a symmetry preserving transformation.
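A quick sketch of this, taking the lattice spacing a = 1 for simplicity: the shear carries every lattice point to another lattice point, yet it is not orthogonal, so it is neither a rotation nor a reflection.

```python
import numpy as np

rng = np.random.default_rng(8)
a = 1.0                                   # lattice spacing (taken as 1 here)
P = np.array([[1.0, 1.0], [0.0, 1.0]])    # the shear I + i (x) j, in the i, j basis

# Every lattice point n1*a*i + n2*a*j is carried to another lattice point:
for n1, n2 in rng.integers(-10, 10, size=(5, 2)):
    x = np.array([n1 * a, n2 * a], dtype=float)
    y = P @ x
    assert np.allclose(y / a, np.round(y / a))

# Yet P is not orthogonal, hence neither a rotation nor a reflection:
assert not np.allclose(P @ P.T, np.eye(2))
```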
Example 4.5: Show that each of the following sets of linear transformations forms a group: all orthogonal
tensors; all proper orthogonal tensors; all unimodular tensors (i.e. tensors with determinant equal to ±1);
and all proper unimodular tensors (i.e. tensors with determinant equal to +1).
Example 4.6: Show that the group of proper orthogonal tensors is a subgroup of the group of orthogonal
tensors, which in turn is a subgroup of the group of unimodular tensors.
Example 4.7: Suppose that a function ψ(C) is defined for all symmetric positive-definite tensors C and that its symmetry group is the set of all orthogonal tensors. Show that ψ depends on C only through its principal scalar invariants, i.e. show that there is a function ψ̂ such that

    ψ(C) = ψ̂(I_1(C), I_2(C), I_3(C))

where I_i(C), i = 1, 2, 3, are the principal scalar invariants of C defined previously in (3.38).
Solution: We are given that ψ has the property that, for all symmetric positive-definite tensors C and all orthogonal tensors Q,

    ψ(C) = ψ(Q^T C Q).     (i)

In order to prove the desired result it is sufficient to show that, if C_1 and C_2 are two symmetric tensors whose principal invariants I_i are the same,

    I_1(C_1) = I_1(C_2),  I_2(C_1) = I_2(C_2),  I_3(C_1) = I_3(C_2),     (ii)

then ψ(C_1) = ψ(C_2).
Recall that the mapping (3.40) between principal invariants and eigenvalues is one-to-one. It follows from this and (ii) that the eigenvalues of C_1 and C_2 are the same. Thus we can write

    C_1 = Σ_{i=1}^{3} λ_i e^{(1)}_i ⊗ e^{(1)}_i,     C_2 = Σ_{i=1}^{3} λ_i e^{(2)}_i ⊗ e^{(2)}_i,     (iii)

where the two sets of orthonormal vectors {e^{(1)}_1, e^{(1)}_2, e^{(1)}_3} and {e^{(2)}_1, e^{(2)}_2, e^{(2)}_3} are the respective principal bases of C_1 and C_2. Since each set of basis vectors is orthonormal, there is an orthogonal tensor R that carries {e^{(1)}_1, e^{(1)}_2, e^{(1)}_3} into {e^{(2)}_1, e^{(2)}_2, e^{(2)}_3}:

    R e^{(1)}_i = e^{(2)}_i,  i = 1, 2, 3.     (iv)
Thus

    R^T (Σ_{i=1}^{3} λ_i e^{(2)}_i ⊗ e^{(2)}_i) R = Σ_{i=1}^{3} λ_i R^T (e^{(2)}_i ⊗ e^{(2)}_i) R = Σ_{i=1}^{3} λ_i (R^T e^{(2)}_i) ⊗ (R^T e^{(2)}_i) = Σ_{i=1}^{3} λ_i e^{(1)}_i ⊗ e^{(1)}_i,     (v)

and so R^T C_2 R = C_1. Therefore ψ(C_1) = ψ(R^T C_2 R) = ψ(C_2), where in the last step we have used (i). This establishes the desired result.
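The key construction in this proof — two symmetric positive-definite tensors with the same eigenvalues are related through the orthogonal tensor built from their principal bases — can be sketched numerically (the eigenvalues and random bases below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(9)
lam = np.array([1.0, 2.5, 4.0])              # common eigenvalues

def spd_with_eigs(lam, rng):
    """Random SPD tensor with prescribed eigenvalues; also return its
    principal basis (columns of Q)."""
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    return Q @ np.diag(lam) @ Q.T, Q

C1, Q1 = spd_with_eigs(lam, rng)
C2, Q2 = spd_with_eigs(lam, rng)

# Same principal invariants ...
assert np.isclose(np.trace(C1), np.trace(C2))
assert np.isclose(np.linalg.det(C1), np.linalg.det(C2))

# ... and the orthogonal R carrying one principal basis into the other
# satisfies R^T C2 R = C1, as in (iv)-(v):
R = Q2 @ Q1.T
assert np.allclose(R @ R.T, np.eye(3))
assert np.allclose(R.T @ C2 @ R, C1)
```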
Example 4.8: Consider a scalar-valued function f(x) that is defined for all vectors x. Let G be the set of all non-singular linear transformations P that have the property that, for each P ∈ G, one has f(x) = f(Px) for all vectors x.

i) Show that G is a group.

ii) Find the most general form of f if G contains the set of all orthogonal transformations.

Solution:

i) Suppose that P_1 and P_2 are in G, i.e. that

    f(x) = f(P_1 x) for all vectors x, and
    f(x) = f(P_2 x) for all vectors x.     (i)
Then

    f((P_1 P_2)x) = f(P_1 (P_2 x)) = f(P_2 x) = f(x)

where in the penultimate and ultimate steps we have used (i)_1 and (i)_2 respectively.

Next, suppose that P ∈ G so that

    f(x) = f(Px) for all vectors x.

Since P is non-singular we can set y = Px and obtain

    f(P^{−1} y) = f(y) for all vectors y.

It thus follows that G has the two defining properties of a group.
ii) If x_1 and x_2 are two vectors that have the same length, we will show that f(x_1) = f(x_2), whence f(x) depends on x only through its length |x|, i.e. there exists a function f̂ such that

    f(x) = f̂(|x|) for all vectors x.

If x_1 and x_2 are two vectors that have the same length, there is a rotation tensor R that carries x_2 to x_1: R x_2 = x_1. Therefore

    f(x_1) = f(R x_2) = f(x_2),

where in the last step we have used the fact that G contains the set of all orthogonal transformations, i.e. that f(x) = f(Px) for all vectors x and all orthogonal P. This establishes the result claimed above.
Example 4.9: Consider a scalar-valued function g(C, m ⊗ m) that is defined for all symmetric positive-definite tensors C and all unit vectors m. Let G be the set of all non-singular linear transformations P that have the property that for each P ∈ G, one has g(C, n ⊗ n) = g(P^T C P, P^T (n ⊗ n) P) for all symmetric positive-definite tensors C and some particular unit vector n. If G contains the set of all orthogonal transformations, show that there exists a function ĝ such that

    g(C, n ⊗ n) = ĝ(I_1(C), I_2(C), I_3(C), I_4(C, n), I_5(C, n))

where I_1(C), I_2(C), I_3(C) are the three fundamental scalar invariants of C and

    I_4(C, n) = Cn · n,     I_5(C, n) = C^2 n · n.
Remark: Observe that with respect to an orthonormal basis {e_1, e_2, e_3} where e_3 = n one has I_4 = C_33 and I_5 = C_31^2 + C_32^2 + C_33^2.
Solution: We are told that

    g(C, n ⊗ n) = g(Q^T C Q, Q^T (n ⊗ n) Q)     (i)

for all orthogonal Q and all symmetric positive-definite C. As in Example 4.7, it is sufficient to show that if C_1 and C_2 are two symmetric positive-definite linear transformations whose "invariants" I_i, i = 1, 2, 3, 4, 5, are the same, i.e.

    I_1(C_1) = I_1(C_2),   I_2(C_1) = I_2(C_2),   I_3(C_1) = I_3(C_2),   I_4(C_1, n) = I_4(C_2, n),   I_5(C_1, n) = I_5(C_2, n),     (ii)
then g(C_1, n ⊗ n) = g(C_2, n ⊗ n). From (ii)_{1,2,3} and the analysis in Example 4.7 it follows that there is an orthogonal tensor R such that R^T C_2 R = C_1. It is readily seen from this that R^T C_2^2 R = C_1^2 as well. It now follows from this, the fact that R is orthogonal, (ii)_{4,5} and the definitions of I_4 and I_5 that

    Rn · Rn = n · n,   C_2 Rn · Rn = C_2 n · n,   C_2^2 Rn · Rn = C_2^2 n · n,     (iii)

and this must hold for all symmetric positive-definite C_2. This implies that

    Rn = ±n and consequently R^T n = ±n,

as may be seen, for example, by expressing (iii) in a principal basis of C_2. Consequently

    g(C_1, n ⊗ n) = g(R^T C_2 R, (R^T n) ⊗ (R^T n)) = g(R^T C_2 R, R^T (n ⊗ n) R) = g(C_2, n ⊗ n),

where we have used (i) in the very last step. This establishes the desired result.
REFERENCES
1. M.A. Armstrong, Groups and Symmetry, Springer-Verlag, 1988.

2. G. Birkhoff and S. MacLane, A Survey of Modern Algebra, Macmillan, 1977.

3. C. Truesdell and W. Noll, The Non-Linear Field Theories of Mechanics, in Handbuch der Physik, Volume III/3, edited by S. Flugge, Springer-Verlag, 1965.

4. A.J.M. Spencer, Theory of invariants, in Continuum Physics, Volume I, edited by A.C. Eringen, Academic Press, 1971.
Chapter 5
Calculus of Vector and Tensor Fields
Notation:

    α ..... scalar
    {a} ..... 3 × 1 column matrix
    a ..... vector
    a_i ..... i-th component of the vector a in some basis; or i-th element of the column matrix {a}
    [A] ..... 3 × 3 square matrix
    A ..... second-order tensor (2-tensor)
    A_ij ..... i, j component of the 2-tensor A in some basis; or i, j element of the square matrix [A]
    C ..... fourth-order tensor (4-tensor)
    C_ijkl ..... i, j, k, l component of the 4-tensor C in some basis
    T_{i1 i2 ... in} ..... i_1 i_2 ... i_n component of the n-tensor T in some basis.
5.1 Notation and definitions.

Let R be a bounded region of three-dimensional space whose boundary is denoted by ∂R, and let x denote the position vector of a generic point in R + ∂R. We shall consider scalar and tensor fields such as φ(x), v(x), A(x) and T(x) defined on R + ∂R. The region R + ∂R and these fields will always be assumed to be sufficiently regular so as to permit the calculations carried out below.
While the subject of the calculus of tensor fields can be dealt with directly, we shall take the more limited approach of working with the components of these fields. The components will always be taken with respect to a single fixed orthonormal basis {e_1, e_2, e_3}. Each component of, say, a vector field v(x) or a 2-tensor field A(x) is effectively a scalar-valued function on three-dimensional space, v_i(x_1, x_2, x_3) and A_ij(x_1, x_2, x_3), and we can use the well-known operations of classical calculus on such fields, such as partial differentiation with respect to x_k.
In order to simplify writing, we shall use the notation that a comma followed by a subscript denotes partial differentiation with respect to the corresponding x-coordinate. Thus, for example, we will write

    φ_{,i} = ∂φ/∂x_i,   φ_{,ij} = ∂²φ/∂x_i∂x_j,   v_{i,j} = ∂v_i/∂x_j,     (5.1)

and so on, where v_i and x_i are the i-th components of the vectors v and x in the basis {e_1, e_2, e_3}.
The gradient of a scalar field φ(x) is a vector field denoted by grad φ (or ∇φ). Its i-th component in the orthonormal basis is

    (grad φ)_i = φ_{,i},     (5.2)

so that

    grad φ = φ_{,i} e_i.

The gradient of a vector field v(x) is a 2-tensor field denoted by grad v (or ∇v). Its ij-th component in the orthonormal basis is

    (grad v)_ij = v_{i,j},     (5.3)

so that

    grad v = v_{i,j} e_i ⊗ e_j.
The gradient of a scalar field φ in the particular direction of the unit vector n is denoted by ∂φ/∂n and defined by

    ∂φ/∂n = ∇φ · n.     (5.4)

The divergence of a vector field v(x) is a scalar field denoted by div v (or ∇ · v). It is given by

    div v = v_{i,i}.     (5.5)

The divergence of a 2-tensor field A(x) is a vector field denoted by div A (or ∇ · A). Its i-th component in the orthonormal basis is

    (div A)_i = A_{ij,j}     (5.6)
so that

    div A = A_{ij,j} e_i.

The curl of a vector field v(x) is a vector field denoted by curl v (or ∇ × v). Its i-th component in the orthonormal basis is

    (curl v)_i = e_{ijk} v_{k,j}     (5.7)

so that

    curl v = e_{ijk} v_{k,j} e_i.

The Laplacians of a scalar field φ(x), a vector field v(x) and a 2-tensor field A(x) are the scalar, vector and 2-tensor fields with components

    ∇²φ = φ_{,kk},   (∇²v)_i = v_{i,kk},   (∇²A)_ij = A_{ij,kk}.     (5.8)
5.2 Integral theorems

Let D be an arbitrary regular subregion of the region R. The divergence theorem allows one to relate a surface integral on ∂D to a volume integral on D. In particular, for a scalar field φ(x)

    ∫_{∂D} φ n dA = ∫_{D} ∇φ dV   or   ∫_{∂D} φ n_k dA = ∫_{D} φ_{,k} dV.     (5.9)

Likewise for a vector field v(x) one has

    ∫_{∂D} v · n dA = ∫_{D} ∇ · v dV   or   ∫_{∂D} v_k n_k dA = ∫_{D} v_{k,k} dV,     (5.10)

as well as

    ∫_{∂D} v ⊗ n dA = ∫_{D} ∇v dV   or   ∫_{∂D} v_i n_k dA = ∫_{D} v_{i,k} dV.     (5.11)
More generally, for an n-tensor field T(x) the divergence theorem gives

    ∫_{∂D} T_{i1 i2 ... in} n_k dA = ∫_{D} ∂/∂x_k (T_{i1 i2 ... in}) dV     (5.12)

where some of the subscripts i_1, i_2, . . . , i_n may be repeated and one of them might equal k.
5.3 Localization
Certain physical principles are described to us in terms of equations that hold on an arbitrary
portion of a body, i.e. in terms of an integral over a subregion D of R. It is often useful
to derive an equivalent statement of such a principle in terms of equations that must hold
at each point x in the body. In what follows, we shall frequently have need to do this, i.e.
convert a “global principle” to an equivalent “local ﬁeld equation”.
Consider for example the scalar field φ(x) that is defined and continuous at all x ∈ R + ∂R and suppose that

    ∫_{D} φ(x) dV = 0 for all subregions D ⊂ R.     (5.13)

We will show that this "global principle" is equivalent to the "local field equation"

    φ(x) = 0 at every point x ∈ R.     (5.14)
Figure 5.1: The region R, a subregion D and a neighborhood B_ε(z) of the point z.
We will prove this by contradiction. Suppose that (5.14) does not hold. This implies that there is a point, say z ∈ R, at which φ(z) ≠ 0. Suppose that φ is positive at this point: φ(z) > 0. Since we are told that φ is continuous, φ is necessarily (strictly) positive in some neighborhood of z as well. Let B_ε(z) be a sphere with its center at z and radius ε > 0. We can always choose ε sufficiently small so that B_ε(z) is a sufficiently small neighborhood of z and

    φ(x) > 0 at all x ∈ B_ε(z).     (5.15)
Now pick a region D which is a subset of B_ε(z). Then φ(x) > 0 for all x ∈ D. Integrating φ over this D gives

    ∫_{D} φ(x) dV > 0     (5.16)

thus contradicting (5.13). An entirely analogous calculation can be carried out in the case φ(z) < 0. Thus our starting assumption must be false and (5.14) must hold.
5.4 Worked Examples.
In all of the examples below the region R will be a bounded regular region and its boundary ∂R will be smooth. All fields are defined on this region and are as smooth as is necessary.
In some of the examples below, we are asked to establish certain results for vector and
tensor ﬁelds. When it is more convenient, we will carry out our calculations by ﬁrst picking
and ﬁxing a basis, and then working with the components in that basis. If necessary, we will
revert back to the vector and tensor ﬁelds at the end. We shall do this frequently in what
follows and will not bother to explain this strategy each time.
Example 5.1: Calculate the gradient of the scalar-valued function φ(x) = Ax · x where A is a constant 2-tensor.

Solution: Writing φ in terms of components,

    φ = A_{ij} x_i x_j.

Calculating the partial derivative of φ with respect to x_k yields

    φ_{,k} = A_{ij} (x_i x_j)_{,k} = A_{ij} (x_{i,k} x_j + x_i x_{j,k}) = A_{ij} (δ_{ik} x_j + x_i δ_{jk}) = A_{kj} x_j + A_{ik} x_i = (A_{kj} + A_{jk}) x_j,

or equivalently ∇φ = (A + A^T)x.
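A finite-difference cross-check of this result (not part of the original notes; the matrix A, the point x and the step size are arbitrary choices):

```python
# Check of Example 5.1: grad(Ax·x) = (A + A^T)x, by central differences.
A = [[1.0, 2.0, 0.0],
     [0.5, 3.0, 1.0],
     [4.0, 0.0, 2.0]]

def phi(x):
    # φ(x) = Ax·x = A_ij x_i x_j
    return sum(A[i][j] * x[i] * x[j] for i in range(3) for j in range(3))

x = [1.0, -2.0, 0.5]
h = 1e-5
grad = []
for k in range(3):
    xp = list(x); xm = list(x)
    xp[k] += h; xm[k] -= h
    grad.append((phi(xp) - phi(xm)) / (2 * h))

exact = [sum((A[k][j] + A[j][k]) * x[j] for j in range(3)) for k in range(3)]
assert all(abs(g - e) < 1e-7 for g, e in zip(grad, exact))
```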
Example 5.2: Let v(x) be a vector field and let v_i(x_1, x_2, x_3) be the i-th component of v in a fixed orthonormal basis {e_1, e_2, e_3}. For each i and j define

    F_{ij} = v_{i,j}.

Show that the F_{ij} are the components of a 2-tensor.
Solution: Since v and x are 1-tensors, their components in a second basis {e′_1, e′_2, e′_3}, with Q_{ij} = e′_i · e_j, obey the transformation rules

    v′_i = Q_{ik} v_k,   v_i = Q_{ki} v′_k   and   x′_j = Q_{jk} x_k,   x_ℓ = Q_{jℓ} x′_j.

Therefore

    F′_{ij} = ∂v′_i/∂x′_j = (∂v′_i/∂x_ℓ)(∂x_ℓ/∂x′_j) = (∂v′_i/∂x_ℓ) Q_{jℓ} = (∂(Q_{ik} v_k)/∂x_ℓ) Q_{jℓ} = Q_{ik} Q_{jℓ} ∂v_k/∂x_ℓ = Q_{ik} Q_{jℓ} F_{kℓ},

which is the transformation rule for a 2-tensor.
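The transformation rule F′_{ij} = Q_{ik} Q_{jℓ} F_{kℓ} can be checked numerically for a linear field. This sketch is not part of the original notes; the rotation Q, the matrix B and the evaluation point are arbitrary choices:

```python
import math

# For the linear field v(x) = Bx one has F_ij = v_i,j = B_ij, so the gradient
# computed in the primed basis should equal Q_ik Q_jl B_kl.
t = 0.7
Q = [[math.cos(t), -math.sin(t), 0.0],
     [math.sin(t),  math.cos(t), 0.0],
     [0.0, 0.0, 1.0]]                    # an orthogonal (rotation) matrix

B = [[1.0, 2.0, 0.0],
     [0.0, 3.0, 1.0],
     [1.0, 0.0, 2.0]]

def v_prime(xp):
    # v'_i(x') = Q_ik v_k(x) with x = Q^T x' and v(x) = Bx
    x = [sum(Q[k][i] * xp[k] for k in range(3)) for i in range(3)]
    vx = [sum(B[i][j] * x[j] for j in range(3)) for i in range(3)]
    return [sum(Q[i][k] * vx[k] for k in range(3)) for i in range(3)]

h = 1e-6
xp0 = [0.3, -1.2, 2.0]
F_fd = [[0.0] * 3 for _ in range(3)]
for j in range(3):
    xp = list(xp0); xm = list(xp0)
    xp[j] += h; xm[j] -= h
    vp, vm = v_prime(xp), v_prime(xm)
    for i in range(3):
        F_fd[i][j] = (vp[i] - vm[i]) / (2 * h)   # F'_ij = ∂v'_i/∂x'_j

expected = [[sum(Q[i][k] * Q[j][l] * B[k][l] for k in range(3) for l in range(3))
             for j in range(3)] for i in range(3)]
assert all(abs(F_fd[i][j] - expected[i][j]) < 1e-6
           for i in range(3) for j in range(3))
```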
Example 5.3: Let φ(x), u(x) and A(x) be scalar, vector and 2-tensor fields respectively. Establish the identities

    a. div (φu) = u · grad φ + φ div u
    b. grad (φu) = u ⊗ grad φ + φ grad u
    c. div (φA) = A grad φ + φ div A
Solution:

a. In terms of components we are asked to show that (φu_i)_{,i} = u_i φ_{,i} + φ u_{i,i}. This follows immediately by expanding (φu_i)_{,i} using the chain rule.

b. In terms of components we are asked to show that (φu_i)_{,j} = u_i φ_{,j} + φ u_{i,j}. Again, this follows immediately by expanding (φu_i)_{,j} using the chain rule.

c. In terms of components we are asked to show that (φA_{ij})_{,j} = A_{ij} φ_{,j} + φ A_{ij,j}. Again, this follows immediately by expanding (φA_{ij})_{,j} using the chain rule.
Example 5.4: If φ(x) and v(x) are a scalar and vector field respectively, show that

    ∇ × (φv) = φ(∇ × v) − v × ∇φ     (i)

Solution: Recall that the curl of a vector field u can be expressed as ∇ × u = e_{ijk} u_{k,j} e_i where e_i is a fixed basis vector. Thus, evaluating ∇ × (φv):

    ∇ × (φv) = e_{ijk} (φ v_k)_{,j} e_i = e_{ijk} φ v_{k,j} e_i + e_{ijk} φ_{,j} v_k e_i = φ ∇ × v + ∇φ × v     (ii)

from which the desired result follows because a × b = −b × a.
Example 5.5: Let u(x) be a vector field and define a second vector field ξ(x) by ξ(x) = curl u(x). Show that

    a. ∇ · ξ = 0;
    b. (∇u − ∇u^T)a = ξ × a for any vector field a(x); and
    c. ξ · ξ = ∇u · ∇u − ∇u · ∇u^T

Solution: Recall that in terms of its components, ξ = curl u = ∇ × u can be expressed as

    ξ_i = e_{ijk} u_{k,j}.     (i)
a. A direct calculation gives

    ∇ · ξ = ξ_{i,i} = (e_{ijk} u_{k,j})_{,i} = e_{ijk} u_{k,ji} = 0     (ii)

where in the last step we have used the fact that e_{ijk} is skew-symmetric in the subscripts i, j, and u_{k,ji} is symmetric in the subscripts i, j (since the order of partial differentiation can be switched), and therefore their product vanishes.
b. Multiplying both sides of (i) by e_{ipq} gives

    e_{ipq} ξ_i = e_{ipq} e_{ijk} u_{k,j} = (δ_{pj} δ_{qk} − δ_{pk} δ_{qj}) u_{k,j} = u_{q,p} − u_{p,q},     (iii)

where we have made use of the identity e_{ipq} e_{ijk} = δ_{pj} δ_{qk} − δ_{pk} δ_{qj} between the alternator and the Kronecker delta introduced in (1.49), as well as the substitution rule. Multiplying both sides of this by a_q and using the fact that e_{ipq} = −e_{piq} gives

    e_{piq} ξ_i a_q = (u_{p,q} − u_{q,p}) a_q,     (iv)

or ξ × a = (∇u − ∇u^T)a.
c. Since (∇u)_ij = u_{i,j} and the inner product of two 2-tensors is A · B = A_{ij} B_{ij}, the right-hand side of the equation we are asked to establish can be written as ∇u · ∇u − ∇u · ∇u^T = (∇u)_ij (∇u)_ij − (∇u)_ij (∇u)_ji = u_{i,j} u_{i,j} − u_{i,j} u_{j,i}. The left-hand side, on the other hand, is ξ · ξ = ξ_i ξ_i. Using (i), the aforementioned identity between the alternator and the Kronecker delta, and the substitution rule leads to the desired result as follows:

    ξ_i ξ_i = (e_{ijk} u_{k,j}) (e_{ipq} u_{q,p}) = (δ_{jp} δ_{qk} − δ_{jq} δ_{kp}) u_{k,j} u_{q,p} = u_{k,j} u_{k,j} − u_{k,j} u_{j,k}.     (v)
Example 5.6: Let u(x), E(x) and S(x) be, respectively, a vector and two 2-tensor fields. These fields are related by

    E = (1/2)(∇u + ∇u^T),   S = 2µE + λ trace(E) 1,     (i)

where λ and µ are constants. Suppose that

    u(x) = b x/r³ where r = |x|, x ≠ 0,     (ii)

and b is a constant. Use (i)_1 to calculate the field E(x) corresponding to the field u(x) given in (ii), and then use (i)_2 to calculate the associated field S(x). Thus verify that the field S(x) corresponding to (ii) satisfies the differential equation:

    div S = o, x ≠ 0.     (iii)
Solution: We proceed in the manner suggested in the problem statement by first using (i)_1 to calculate the E corresponding to the u given by (ii); substituting the result into (i)_2 gives the corresponding S; and finally we can then check whether or not this S satisfies (iii).

In components,

    E_{ij} = (1/2)(u_{i,j} + u_{j,i}),     (iv)

and therefore we begin by calculating u_{i,j}. For this, it is convenient to first calculate ∂r/∂x_j = r_{,j}. Observe by differentiating r² = |x|² = x_i x_i that

    2r r_{,j} = 2x_{i,j} x_i = 2δ_{ij} x_i = 2x_j,     (v)

and therefore

    r_{,j} = x_j/r.     (vi)
Now differentiating the given vector field u_i = b x_i/r³ with respect to x_j gives

    u_{i,j} = (b/r³) x_{i,j} + b x_i (r^{−3})_{,j} = (b/r³) δ_{ij} − 3b (x_i/r⁴) r_{,j} = b δ_{ij}/r³ − 3b (x_i/r⁴)(x_j/r) = b δ_{ij}/r³ − 3b x_i x_j/r⁵.     (vii)
Substituting this into (iv) gives us E_{ij}:

    E_{ij} = (1/2)(u_{i,j} + u_{j,i}) = b (δ_{ij}/r³ − 3 x_i x_j/r⁵).     (viii)
Next, substituting (viii) into (i)_2 gives us S_{ij}:

    S_{ij} = 2µ E_{ij} + λ E_{kk} δ_{ij}
           = 2µb (δ_{ij}/r³ − 3 x_i x_j/r⁵) + λb (δ_{kk}/r³ − 3 x_k x_k/r⁵) δ_{ij}
           = 2µb (δ_{ij}/r³ − 3 x_i x_j/r⁵) + λb (3/r³ − 3 r²/r⁵) δ_{ij}
           = 2µb (δ_{ij}/r³ − 3 x_i x_j/r⁵).     (ix)
Finally we use this to calculate ∂S_{ij}/∂x_j = S_{ij,j}:

    (1/2µb) S_{ij,j} = δ_{ij} (r^{−3})_{,j} − (3/r⁵)(x_i x_j)_{,j} − 3 x_i x_j (r^{−5})_{,j}
                     = δ_{ij} (−3/r⁴) r_{,j} − (3/r⁵)(δ_{ij} x_j + x_i δ_{jj}) − 3 x_i x_j (−5/r⁶) r_{,j}
                     = −3 (δ_{ij}/r⁴)(x_j/r) − (3/r⁵)(x_i + 3x_i) + (15 x_i x_j/r⁶)(x_j/r) = 0.     (x)
Example 5.7: Show that

    ∫_{∂R} x ⊗ n dA = V I,     (i)

where V is the volume of the region R, and x is the position vector of a typical point in R + ∂R.

Solution: In terms of components in a fixed basis, we have to show that

    ∫_{∂R} x_i n_j dA = V δ_{ij}.     (ii)

The result follows immediately by using the divergence theorem (5.11):

    ∫_{∂R} x_i n_j dA = ∫_{R} x_{i,j} dV = ∫_{R} δ_{ij} dV = δ_{ij} ∫_{R} dV = δ_{ij} V.     (iii)
Example 5.8: Let A(x) be a 2-tensor field with the property that

    ∫_{∂D} A(x) n(x) dA = o for all subregions D ⊂ R,     (i)

where n(x) is the unit outward normal vector at a point x on the boundary ∂D. Show that (i) holds if and only if div A = o at each point x ∈ R.

Solution: In terms of components in a fixed basis, we are told that

    ∫_{∂D} A_{ij}(x) n_j(x) dA = 0 for all subregions D ⊂ R.     (ii)

By using the divergence theorem (5.12), this implies that

    ∫_{D} A_{ij,j} dV = 0 for all subregions D ⊂ R.     (iii)

If A_{ij,j} is continuous on R, the localization result established in Section 5.3 allows us to conclude that

    A_{ij,j} = 0 at each x ∈ R.     (iv)

Conversely, if (iv) holds, one can easily reverse the preceding steps to conclude that then (i) also holds. This shows that (iv) is both necessary and sufficient for (i) to hold.
Example 5.9: Let A(x) be a 2-tensor field which satisfies the differential equation div A = o at each point in R. Suppose that in addition

    ∫_{∂D} x × An dA = o for all subregions D ⊂ R.

Show that A must be a symmetric 2-tensor.

Solution: In terms of components we are given that

    ∫_{∂D} e_{ijk} x_j A_{kp} n_p dA = 0,

which on using the divergence theorem yields

    ∫_{D} e_{ijk} (x_j A_{kp})_{,p} dV = ∫_{D} e_{ijk} [δ_{jp} A_{kp} + x_j A_{kp,p}] dV = 0.

We are also given that A_{ij,j} = 0 at each point in R and so the preceding equation simplifies, after using the substitution rule, to

    ∫_{D} e_{ijk} A_{kj} dV = 0.

Since this holds for all subregions D ⊂ R we can localize it to

    e_{ijk} A_{kj} = 0 at each x ∈ R.

Finally, multiplying both sides by e_{ipq} and using the identity e_{ipq} e_{ijk} = δ_{pj} δ_{qk} − δ_{pk} δ_{qj} in (1.49) yields

    (δ_{pj} δ_{qk} − δ_{pk} δ_{qj}) A_{kj} = A_{qp} − A_{pq} = 0

and so A is symmetric.
Example 5.10: Let ε_1(x_1, x_2) and ε_2(x_1, x_2) be defined on a simply connected two-dimensional domain R. Find necessary and sufficient conditions under which there exists a function u(x_1, x_2) such that

    u_{,1} = ε_1,   u_{,2} = ε_2   for all (x_1, x_2) ∈ R.     (i)

Solution: In the presence of sufficient smoothness, the order of partial differentiation does not matter and so we necessarily have u_{,12} = u_{,21}. Therefore a necessary condition for (i) to hold is that ε_1, ε_2 obey

    ε_{1,2} = ε_{2,1}   for all (x_1, x_2) ∈ R.     (ii)
Figure 5.2: (a) Path C from (0, 0) to (x_1, x_2). The curve is parameterized by arc length s as ξ_1 = ξ_1(s), ξ_2 = ξ_2(s), 0 ≤ s ≤ s_0. The unit tangent vector on C, s, has components (s_1, s_2), with s_1 = −n_2, s_2 = n_1. (b) A closed path C′ passing through (0, 0) and (x_1, x_2) and coinciding with C over part of its length. The unit outward normal vector on C′ is n, and it has components (n_1, n_2).
To show that (ii) is also sufficient for the existence of u, we shall provide a formula for explicitly calculating the function u in terms of the given functions ε_1 and ε_2. Let C be an arbitrary regular oriented curve in R that connects (0, 0) to (x_1, x_2). A generic point on the curve is denoted by (ξ_1, ξ_2) and the curve is characterized by the parameterization

    ξ_1 = ξ_1(s),   ξ_2 = ξ_2(s),   0 ≤ s ≤ s_0,     (iii)

where s is arc length on C and (ξ_1(0), ξ_2(0)) = (0, 0) and (ξ_1(s_0), ξ_2(s_0)) = (x_1, x_2). We will show that the function

    u(x_1, x_2) = ∫_0^{s_0} [ε_1(ξ_1(s), ξ_2(s)) ξ′_1(s) + ε_2(ξ_1(s), ξ_2(s)) ξ′_2(s)] ds     (iv)

satisfies the requirement (i) when (ii) holds.
To see this we must first show that the integral (iv) does in fact define a function of (x_1, x_2), i.e. that it does not depend on the path of integration. (Note that if a function u satisfies (i), then so does the function u + constant, and so the dependence on the arbitrary starting point of the integral is to be expected.) Thus consider a closed path C′ that starts and ends at (0, 0) and passes through (x_1, x_2) as sketched in Figure 5.2(b). We need to show that

    ∮_{C′} [ε_1(ξ_1(s), ξ_2(s)) ξ′_1(s) + ε_2(ξ_1(s), ξ_2(s)) ξ′_2(s)] ds = 0.     (v)
Recall that (ξ′_1(s), ξ′_2(s)) are the components of the unit tangent vector on C′ at the point (ξ_1(s), ξ_2(s)): s_1 = ξ′_1(s), s_2 = ξ′_2(s). Observe further from the figure that the components of the unit tangent vector s and the unit outward normal vector n are related by s_1 = −n_2 and s_2 = n_1. Thus the left-hand side of (v) can be written as

    ∮_{C′} (ε_1 s_1 + ε_2 s_2) ds = ∮_{C′} (ε_2 n_1 − ε_1 n_2) ds = ∫_{D′} (ε_{2,1} − ε_{1,2}) dA     (vi)

where we have used the divergence theorem in the last step and D′ is the region enclosed by C′. In view of (ii), this last integral vanishes. Thus the integral (v) vanishes on any closed path C′ and so the integral (iv) is independent of path and depends only on the end points. Thus (iv) does in fact define a function u(x_1, x_2).
Finally it remains to show that the function (iv) satisfies the requirements (i). This is readily seen by writing (iv) in the form

    u(x_1, x_2) = ∫_{(0,0)}^{(x_1,x_2)} [ε_1(ξ_1, ξ_2) dξ_1 + ε_2(ξ_1, ξ_2) dξ_2]     (vii)

and then differentiating this with respect to x_1 and x_2.
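The path independence established above can be observed numerically. The sketch below is not part of the original notes; the fields ε_1, ε_2 (which derive from u = x_1² x_2 + x_2), the two rectangular paths and the quadrature are my choices:

```python
# Numerical check of path independence in Example 5.10, with
# ε1 = 2 x1 x2, ε2 = x1² + 1, which satisfy ε1,2 = ε2,1 = 2 x1.
# Both integrals should give u(1, 2) − u(0, 0) = 4 for u = x1² x2 + x2.
def e1(x1, x2): return 2.0 * x1 * x2
def e2(x1, x2): return x1 * x1 + 1.0

def line_integral(points, n=1000):
    # ∫ (ε1 dξ1 + ε2 dξ2) along the polygonal path through `points`, midpoint rule.
    total = 0.0
    for (a1, a2), (b1, b2) in zip(points, points[1:]):
        for k in range(n):
            t = (k + 0.5) / n
            m1, m2 = a1 + t * (b1 - a1), a2 + t * (b2 - a2)
            total += e1(m1, m2) * (b1 - a1) / n + e2(m1, m2) * (b2 - a2) / n
    return total

path_a = [(0, 0), (1, 0), (1, 2)]
path_b = [(0, 0), (0, 2), (1, 2)]
ua, ub = line_integral(path_a), line_integral(path_b)
assert abs(ua - ub) < 1e-6
assert abs(ua - 4.0) < 1e-6
```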
Example 5.11: Let a_1(x_1, x_2) and a_2(x_1, x_2) be defined on a simply connected two-dimensional domain R. Suppose that a_1 and a_2 satisfy the partial differential equation

    a_{1,1}(x_1, x_2) + a_{2,2}(x_1, x_2) = 0   for all (x_1, x_2) ∈ R.     (i)

Show that (i) holds if and only if there is a function φ(x_1, x_2) such that

    a_1(x_1, x_2) = φ_{,2}(x_1, x_2),   a_2(x_1, x_2) = −φ_{,1}(x_1, x_2).     (ii)

Solution: This is simply a restatement of the previous example in a form that we will find useful in what follows.
Example 5.12: Find the most general vector field u(x) which satisfies the differential equation

    (1/2)(∇u + ∇u^T) = O at all x ∈ R.     (i)

Solution: In terms of components, ∇u = −∇u^T reads:

    u_{i,j} = −u_{j,i}.     (ii)
Differentiating this with respect to x_k, and then changing the order of differentiation, gives

    u_{i,jk} = −u_{j,ik} = −u_{j,ki}.

However by (ii), u_{j,k} = −u_{k,j}. Using this and then changing the order of differentiation leads to

    u_{i,jk} = −u_{j,ki} = u_{k,ji} = u_{k,ij}.

Again, by (ii), u_{k,i} = −u_{i,k}. Using this and changing the order of differentiation once again leads to

    u_{i,jk} = u_{k,ij} = −u_{i,kj} = −u_{i,jk}.

It therefore follows that

    u_{i,jk} = 0.

Integrating this once gives

    u_{i,j} = C_{ij}     (iii)

where the C_{ij}'s are constants. Integrating this once more gives

    u_i = C_{ij} x_j + c_i,     (iv)

where the c_i's are constants. The vector field u(x) must necessarily have this form if (ii) is to hold.

To examine sufficiency, substituting (iv) into (ii) shows that [C] must be skew-symmetric. Thus in summary the most general vector field u(x) that satisfies (i) is

    u(x) = Cx + c

where C is a constant skew-symmetric 2-tensor and c is a constant vector.
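A quick numerical confirmation of sufficiency (not part of the original notes; the skew matrix C, the vector c and the evaluation point are arbitrary choices):

```python
# u(x) = Cx + c with skew-symmetric C: ∇u = C, so sym(∇u) should vanish.
C = [[0.0, 2.0, -1.0],
     [-2.0, 0.0, 3.0],
     [1.0, -3.0, 0.0]]   # C^T = -C
c = [1.0, 0.0, -2.0]

def u(x):
    return [sum(C[i][j] * x[j] for j in range(3)) + c[i] for i in range(3)]

# ∇u by central differences at an arbitrary point, then its symmetric part.
x0 = [0.3, 1.7, -0.9]
h = 1e-6
grad = [[0.0] * 3 for _ in range(3)]
for j in range(3):
    xp = list(x0); xm = list(x0)
    xp[j] += h; xm[j] -= h
    up, um = u(xp), u(xm)
    for i in range(3):
        grad[i][j] = (up[i] - um[i]) / (2 * h)

E = [[0.5 * (grad[i][j] + grad[j][i]) for j in range(3)] for i in range(3)]
assert all(abs(E[i][j]) < 1e-8 for i in range(3) for j in range(3))
```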
Example 5.13: Suppose that a scalar-valued function f(A) is defined for all symmetric tensors A. In terms of components in a fixed basis we have f = f(A_11, A_12, A_13, A_21, . . . , A_33). The partial derivatives of f with respect to A_ij,

    ∂f/∂A_ij,     (i)

are the components of a 2-tensor. Is this tensor symmetric?
Solution: Consider, for example, the particular function f = A · A = A_{ij} A_{ij} which, when written out in components, reads:

    f = f_1(A_11, A_12, A_13, A_21, . . . , A_33) = A_11² + A_22² + A_33² + 2A_12² + 2A_23² + 2A_31².     (ii)

Proceeding formally and differentiating (ii) with respect to A_12, and separately with respect to A_21, gives

    ∂f_1/∂A_12 = 4A_12,   ∂f_1/∂A_21 = 0,     (iii)

which implies that ∂f_1/∂A_12 ≠ ∂f_1/∂A_21.
On the other hand, since A_{ij} is symmetric we can write

    A_{ij} = (1/2)(A_{ij} + A_{ji}).     (iv)

Substituting (iv) into the formula (ii) for f gives f = f_2(A_11, A_12, A_13, A_21, . . . , A_33):

    f_2(A_11, A_12, A_13, A_21, . . . , A_33)
      = A_11² + A_22² + A_33² + 2[(1/2)(A_12 + A_21)]² + 2[(1/2)(A_23 + A_32)]² + 2[(1/2)(A_31 + A_13)]²
      = A_11² + A_22² + A_33² + (1/2)A_12² + A_12 A_21 + (1/2)A_21² + . . . + (1/2)A_31² + A_31 A_13 + (1/2)A_13².     (v)

Note that f_1[A] = f_2[A] for any symmetric matrix [A]. Differentiating f_2 leads to

    ∂f_2/∂A_12 = A_12 + A_21,   ∂f_2/∂A_21 = A_21 + A_12,     (vi)

and so now, ∂f_2/∂A_12 = ∂f_2/∂A_21.
The source of the original difficulty is the fact that the 9 A_{ij}'s in the argument of f_1 are not independent variables since A_{ij} = A_{ji}; and yet we have been calculating partial derivatives as if they were independent. In fact, the original problem statement itself is ill-posed since we are asked to calculate ∂f/∂A_ij but told that [A] is restricted to being symmetric.

Suppose that f_2 is defined by (v) for all matrices [A] and not just symmetric matrices [A]. We see that the values of the functions f_1 and f_2 are equal at all symmetric matrices and so in going from f_1 → f_2, we have effectively relaxed the constraint of symmetry and expanded the domain of definition of f to all matrices [A]. We may differentiate f_2 by treating the 9 A_{ij}'s as independent, and the result can then be evaluated at symmetric matrices. We assume that this is what was meant in the problem statement.
In general, if a function f(A_11, A_12, . . . , A_33) is expressed in symmetric form, by changing A_{ij} → (1/2)(A_{ij} + A_{ji}), then ∂f/∂A_ij will be symmetric; but not otherwise. Throughout these volumes, whenever we encounter a function of a symmetric tensor, we shall always assume that it has been written in symmetric form; and therefore its derivative with respect to the tensor can be assumed to be symmetric.
Remark: We will encounter a similar situation involving tensors whose determinant is unity. On occasion we will have need to differentiate a function g_1(A) defined for all tensors with det A = 1, and we shall do this by extending the definition of the given function and defining a second function g_2(A) for all tensors; g_2 is defined such that g_1(A) = g_2(A) for all tensors with unit determinant. We then differentiate g_2 and evaluate the result at tensors with unit determinant.
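The effect of symmetrizing before differentiating can be seen numerically. This sketch is not part of the original notes; the symmetric matrix A and the step size are arbitrary choices:

```python
# Example 5.13 numerically: write f = A·A in symmetric form,
# f2(A) = Σ_ij [(A_ij + A_ji)/2]², which agrees with f at symmetric A; then
# ∂f2/∂A_ij = A_ij + A_ji is symmetric, and equals 2 A_ij at symmetric A.
def f2(A):
    return sum(((A[i][j] + A[j][i]) / 2.0) ** 2
               for i in range(3) for j in range(3))

A = [[1.0, 0.5, -0.2],
     [0.5, 2.0, 0.7],
     [-0.2, 0.7, 3.0]]   # symmetric

h = 1e-6
D = [[0.0] * 3 for _ in range(3)]
for i in range(3):
    for j in range(3):
        Ap = [row[:] for row in A]; Am = [row[:] for row in A]
        Ap[i][j] += h; Am[i][j] -= h
        D[i][j] = (f2(Ap) - f2(Am)) / (2 * h)   # ∂f2/∂A_ij by central differences

for i in range(3):
    for j in range(3):
        assert abs(D[i][j] - D[j][i]) < 1e-8        # derivative is symmetric
        assert abs(D[i][j] - 2.0 * A[i][j]) < 1e-6  # equals 2 A_ij here
```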
References
1. P. Chadwick, Continuum Mechanics, Chapter 1, Sections 10 and 11, Dover, 1999.
2. M.E. Gurtin, An Introduction to Continuum Mechanics, Chapter 2, Academic Press, 1981.
3. L.A. Segel, Mathematics Applied to Continuum Mechanics, Section 2.3, Dover, New York, 1987.
Chapter 6
Orthogonal Curvilinear Coordinates
6.1 Introductory Remarks
The notes in this section are a somewhat simplified version of notes developed by Professor Eli Sternberg of Caltech. The discussion here, which is a general treatment of orthogonal curvilinear coordinates, is a compromise between a general tensorial treatment that includes oblique coordinate systems and an ad hoc treatment of special orthogonal curvilinear coordinate systems. A summary of the main tensor-analytic results of this section is given in equations (6.32)–(6.37) in terms of the scale factors h_i defined in (6.17) that relate the rectangular cartesian coordinates (x_1, x_2, x_3) to the orthogonal curvilinear coordinates (x̂_1, x̂_2, x̂_3).
It is helpful to begin by reviewing a few aspects of the familiar case of circular cylindrical coordinates. Let {e_1, e_2, e_3} be a fixed orthonormal basis, and let O be a fixed point chosen as the origin. The point O and the basis {e_1, e_2, e_3}, together, constitute a frame which we denote by {O; e_1, e_2, e_3}. Consider a generic point P in R³ whose position vector relative to this origin O is x. The rectangular cartesian coordinates of the point P in the frame {O; e_1, e_2, e_3} are the components (x_1, x_2, x_3) of the position vector x in this basis.

We introduce circular cylindrical coordinates (r, θ, z) through the mappings

    x_1 = r cos θ,   x_2 = r sin θ,   x_3 = z,   for all (r, θ, z) ∈ [0, ∞) × [0, 2π) × (−∞, ∞).     (6.1)
The mapping (6.1) is one-to-one except at r = 0 (i.e. x_1 = x_2 = 0). Indeed (6.1) may be explicitly inverted for r > 0 to give

    r = √(x_1² + x_2²);   cos θ = x_1/r,   sin θ = x_2/r;   z = x_3.     (6.2)
For a general set of orthogonal curvilinear coordinates one cannot, in general, explicitly
invert the coordinate mapping in this way.
The Jacobian determinant of the mapping (6.1) is

    ∆(r, θ, z) = det [ ∂x_1/∂r   ∂x_1/∂θ   ∂x_1/∂z
                       ∂x_2/∂r   ∂x_2/∂θ   ∂x_2/∂z
                       ∂x_3/∂r   ∂x_3/∂θ   ∂x_3/∂z ] = r ≥ 0.

Note that ∆(r, θ, z) = 0 if and only if r = 0 and is otherwise strictly positive; this reflects the invertibility of (6.1) on (r, θ, z) ∈ (0, ∞) × [0, 2π) × (−∞, ∞), and the breakdown in invertibility at r = 0.
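The value ∆ = r can be confirmed by differentiating the mapping (6.1) numerically. This sketch is not part of the original notes; the sample point (r, θ, z) is an arbitrary choice:

```python
import math

# Finite-difference check that the Jacobian determinant of (6.1) equals r.
def x_of(r, theta, z):
    return [r * math.cos(theta), r * math.sin(theta), z]

def jacobian(r, theta, z, h=1e-6):
    q0 = (r, theta, z)
    J = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        qp = list(q0); qm = list(q0)
        qp[j] += h; qm[j] -= h
        xp, xm = x_of(*qp), x_of(*qm)
        for i in range(3):
            J[i][j] = (xp[i] - xm[i]) / (2 * h)   # J_ij = ∂x_i/∂q_j
    return J

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

r, theta, z = 2.0, 0.8, -1.0
assert abs(det3(jacobian(r, theta, z)) - r) < 1e-6
```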
Figure 6.1: Circular cylindrical coordinates (r, θ, z).
The circular cylindrical coordinates (r, θ, z) admit the familiar geometric interpretation illustrated in Figure 6.1. In view of (6.2), one has:

    r = r_0 = constant : circular cylinders, coaxial with the x_3-axis,
    θ = θ_0 = constant : meridional half-planes through the x_3-axis,
    z = z_0 = constant : planes perpendicular to the x_3-axis.
The above surfaces constitute a triply orthogonal family of coordinate surfaces; each "regular point" of E³ (i.e. a point at which r > 0) is the intersection of a unique triplet of (mutually perpendicular) coordinate surfaces. The coordinate lines are the pairwise intersections of the coordinate surfaces; thus, for example, as illustrated in Figure 6.1, the line along which an r-coordinate surface and a z-coordinate surface intersect is a θ-coordinate line. Along any coordinate line only one of the coordinates (r, θ, z) varies, while the other two remain constant.
In terms of the circular cylindrical coordinates the position vector x can be written as

    x = x(r, θ, z) = (r cos θ) e_1 + (r sin θ) e_2 + z e_3.     (6.3)
The vectors

    ∂x/∂r,   ∂x/∂θ,   ∂x/∂z,

are tangent to the coordinate lines corresponding to r, θ and z respectively. The so-called metric coefficients h_r, h_θ, h_z denote the magnitudes of these vectors, i.e.

    h_r = |∂x/∂r|,   h_θ = |∂x/∂θ|,   h_z = |∂x/∂z|,

and so the unit tangent vectors corresponding to the respective coordinate lines r, θ and z are:

    e_r = (1/h_r)(∂x/∂r),   e_θ = (1/h_θ)(∂x/∂θ),   e_z = (1/h_z)(∂x/∂z).
In the present case one has h_r = 1, h_θ = r, h_z = 1 and

    e_r = (1/h_r)(∂x/∂r) = cos θ e_1 + sin θ e_2,
    e_θ = (1/h_θ)(∂x/∂θ) = −sin θ e_1 + cos θ e_2,
    e_z = (1/h_z)(∂x/∂z) = e_3.
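These metric coefficients and unit tangents can be recovered numerically from the mapping (6.3). The sketch below is not part of the original notes; the sample point is an arbitrary choice:

```python
import math

# Check h_r = 1, h_θ = r, h_z = 1 and the unit tangent vectors, via central
# differences of x(r, θ, z) = (r cos θ, r sin θ, z).
def x_of(r, theta, z):
    return [r * math.cos(theta), r * math.sin(theta), z]

def partial(j, r, theta, z, h=1e-6):
    q = [r, theta, z]
    qp = list(q); qm = list(q)
    qp[j] += h; qm[j] -= h
    xp, xm = x_of(*qp), x_of(*qm)
    return [(a - b) / (2 * h) for a, b in zip(xp, xm)]

r, theta, z = 1.5, 0.6, 2.0
t_r, t_th, t_z = (partial(j, r, theta, z) for j in range(3))
norm = lambda v: math.sqrt(sum(c * c for c in v))

assert abs(norm(t_r) - 1.0) < 1e-6    # h_r = 1
assert abs(norm(t_th) - r) < 1e-6     # h_θ = r
assert abs(norm(t_z) - 1.0) < 1e-6    # h_z = 1

e_r = [c / norm(t_r) for c in t_r]
assert abs(e_r[0] - math.cos(theta)) < 1e-6
assert abs(e_r[1] - math.sin(theta)) < 1e-6
```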
The triplet of vectors {e_r, e_θ, e_z} forms a local orthonormal basis at the point x. They are local because they depend on the point x; sometimes, when we need to emphasize this fact, we will write {e_r(x), e_θ(x), e_z(x)}.
In order to calculate the derivatives of various field quantities it is clear that we will need to calculate quantities such as ∂e_r/∂r, ∂e_r/∂θ, . . . etc.; and in order to calculate the components of these derivatives in the local basis we will need to calculate quantities of the form

    e_r · (∂e_r/∂r),   e_θ · (∂e_r/∂r),   e_z · (∂e_r/∂r),
    e_r · (∂e_θ/∂r),   e_θ · (∂e_θ/∂r),   e_z · (∂e_θ/∂r),   . . . etc.     (6.4)
Much of the analysis in the general case to follow, leading eventually to Equation (6.30) in
Subsection 6.2.4, is devoted to calculating these quantities.
Notation: As far as possible, we will consistently denote the fixed cartesian coordinate system and all components and quantities associated with it by symbols such as x_i, e_i, f(x_1, x_2, x_3), v_i(x_1, x_2, x_3), A_ij(x_1, x_2, x_3), etc., and we shall consistently denote the local curvilinear coordinate system and all components and quantities associated with it by similar symbols with "hats" over them, e.g. x̂_i, ê_i, f̂(x̂_1, x̂_2, x̂_3), v̂_i(x̂_1, x̂_2, x̂_3), Â_ij(x̂_1, x̂_2, x̂_3), etc.
6.2 General Orthogonal Curvilinear Coordinates

Let {e_1, e_2, e_3} be a fixed right-handed orthonormal basis, let O be the fixed point chosen as the origin and let {O; e_1, e_2, e_3} be the associated frame. The rectangular cartesian coordinates of the point with position vector x in this frame are

    (x_1, x_2, x_3) where x_i = x · e_i.
6.2.1 Coordinate transformation. Inverse transformation.

We introduce curvilinear coordinates (x̂_1, x̂_2, x̂_3) through a triplet of scalar mappings

    x_i = x_i(x̂_1, x̂_2, x̂_3)   for all (x̂_1, x̂_2, x̂_3) ∈ R̂,     (6.5)

where the domain of definition R̂ is a subset of E³. Each curvilinear coordinate x̂_i belongs to some linear interval L_i, and R̂ = L_1 × L_2 × L_3. For example, in the case of circular cylindrical coordinates we have L_1 = {x̂_1 | 0 ≤ x̂_1 < ∞}, L_2 = {x̂_2 | 0 ≤ x̂_2 < 2π} and L_3 = {x̂_3 | −∞ < x̂_3 < ∞}, and the "box" R̂ is given by R̂ = {(x̂_1, x̂_2, x̂_3) | 0 ≤ x̂_1 < ∞, 0 ≤ x̂_2 < 2π, −∞ < x̂_3 < ∞}. Observe that the "box" R̂ includes some but possibly not all of its faces.
Equation (6.5) may be interpreted as a mapping of R̂ into E³. We shall assume that (x_1, x_2, x_3) ranges over all of E³ as (x̂_1, x̂_2, x̂_3) takes on all values in R̂. We assume further that the mapping (6.5) is one-to-one and sufficiently smooth in the interior of R̂ so that the inverse mapping

    x̂_i = x̂_i(x_1, x_2, x_3)     (6.6)

exists and is appropriately smooth at all (x_1, x_2, x_3) in the image of the interior of R̂.
Note that the mapping (6.5) might not be uniquely invertible on some of the faces of R̂, which are mapped into "singular" lines or surfaces in E³. (For example, in the case of circular cylindrical coordinates, x̂_1 = r = 0 is a singular surface; see Section 6.1.) Points that are not on a singular line or surface will be referred to as "regular points" of E³.
The Jacobian matrix [J] of the mapping (6.5) has elements
J
ij
=
∂x
i
∂ˆ x
j
(6.7)
and by the assumed smoothness and onetooneness of the mapping, the Jacobian determi
nant does not vanish on the interior of
ˆ
R. Without loss of generality we can take therefore
take it to be positive:
det[J] =
1
6
e
ijk
e
pqr
∂x
i
∂ˆ x
p
∂x
j
∂ˆ x
q
∂x
k
∂ˆ x
r
> 0 . (6.8)
The Jacobian matrix of the inverse mapping (6.6) is [J]
−1
.
The coordinate surface ˆ x
i
= constant is deﬁned by
ˆ x
i
(x
1
, x
2
, x
3
) = ˆ x
o
i
= constant, i = 1, 2, 3;
the pairwise intersections of these surfaces are the corresponding coordinate lines, along
which only one of the curvilinear coordinates varies. Thus every regular point of E
3
is the
point of intersection of a unique triplet of coordinate surfaces and coordinate lines, as is
illustrated in Figure 6.2.
Recall that the tangent vector along an arbitrary regular curve
Γ : x = x(t), (α ≤ t ≤ β), (6.9)
can be taken to be
1
˙ x(t) = ˙ x
i
(t)e
i
; it is oriented in the direction of increasing t. Thus in the
case of the special curve
Γ
1
: x = x(ˆ x
1
, c
2
, c
3
), ˆ x
1
∈ L
1
, c
2
= constant, c
3
= constant,
corresponding to a ˆ x
1
coordinate line, the tangent vector can be taken to be ∂x/∂ˆ x
1
. Gen
eralizing this, ∂x/∂ˆ x
i
are tangent vectors and
ˆ e
i
=
1
∂x/∂ˆ x
i

∂x
∂ˆ x
i
(no sum) (6.10)
1
Here and in the sequel a superior dot indicates diﬀerentiation with respect to the parameter t.
Figure 6.2: Orthogonal curvilinear coordinates $(\hat x_1, \hat x_2, \hat x_3)$ and the associated local orthonormal basis vectors $\{\hat e_1, \hat e_2, \hat e_3\}$, with $Q_{ij} = \hat e_i \cdot e_j$. Here $\hat e_i$ is the unit tangent vector along the $\hat x_i$-coordinate line, the sense of $\hat e_i$ being determined by the direction in which $\hat x_i$ increases. The proper orthogonal matrix $[Q]$ characterizes the rotational transformation relating this basis to the rectangular cartesian basis $\{e_1, e_2, e_3\}$.
are the unit tangent vectors along the $\hat x_i$-coordinate lines, each of which points in the sense of increasing $\hat x_i$. Since our discussion is limited to orthogonal curvilinear coordinate systems, we must require
$$\text{for } i \ne j: \quad \hat e_i \cdot \hat e_j = 0 \quad \text{or} \quad \frac{\partial x}{\partial \hat x_i} \cdot \frac{\partial x}{\partial \hat x_j} = 0 \quad \text{or} \quad \frac{\partial x_k}{\partial \hat x_i}\frac{\partial x_k}{\partial \hat x_j} = 0. \tag{6.11}$$

6.2.2 Metric coefficients, scale moduli.

Consider again the arbitrary regular curve $\Gamma$ parameterized by (6.9). If $s(t)$ is the arc length of $\Gamma$, measured from an arbitrary fixed point on $\Gamma$, one has
$$|\dot s(t)| = |\dot x(t)| = \sqrt{\dot x(t) \cdot \dot x(t)}. \tag{6.12}$$
One concludes from (6.12), (6.5) and the chain rule that
$$\left(\frac{ds}{dt}\right)^2 = \frac{dx}{dt}\cdot\frac{dx}{dt} = \frac{dx_k}{dt}\frac{dx_k}{dt} = \left(\frac{\partial x_k}{\partial \hat x_i}\frac{d\hat x_i}{dt}\right)\left(\frac{\partial x_k}{\partial \hat x_j}\frac{d\hat x_j}{dt}\right) = \frac{\partial x_k}{\partial \hat x_i}\frac{\partial x_k}{\partial \hat x_j}\frac{d\hat x_i}{dt}\frac{d\hat x_j}{dt},$$
where
$$\hat x_i(t) = \hat x_i(x_1(t), x_2(t), x_3(t)), \quad (\alpha \le t \le \beta).$$
Thus
$$\left(\frac{ds}{dt}\right)^2 = g_{ij}\frac{d\hat x_i}{dt}\frac{d\hat x_j}{dt} \quad \text{or} \quad (ds)^2 = g_{ij}\, d\hat x_i\, d\hat x_j, \tag{6.13}$$
in which $g_{ij}$ are the metric coefficients of the curvilinear coordinate system under consideration. They are defined by
$$g_{ij} = \frac{\partial x}{\partial \hat x_i}\cdot\frac{\partial x}{\partial \hat x_j} = \frac{\partial x_k}{\partial \hat x_i}\frac{\partial x_k}{\partial \hat x_j}. \tag{6.14}$$
Note that
$$g_{ij} = 0 \quad (i \ne j), \tag{6.15}$$
as a consequence of the orthogonality condition (6.11). Observe that in terms of the Jacobian matrix $[J]$ defined earlier in (6.7) we can write $g_{ij} = J_{ki}J_{kj}$, or equivalently $[g] = [J]^T[J]$.

Because of (6.14) and (6.15) the metric coefficients can be written as
$$g_{ij} = h_i h_j \delta_{ij}, \tag{6.16}$$
where the scale moduli $h_i$ are defined by² ³
$$h_i = \sqrt{g_{\underline{i}\,\underline{i}}} = \sqrt{\frac{\partial x_k}{\partial \hat x_{\underline i}}\frac{\partial x_k}{\partial \hat x_{\underline i}}} = \sqrt{\left(\frac{\partial x_1}{\partial \hat x_i}\right)^2 + \left(\frac{\partial x_2}{\partial \hat x_i}\right)^2 + \left(\frac{\partial x_3}{\partial \hat x_i}\right)^2} > 0, \tag{6.17}$$
noting that $h_i = 0$ is precluded by (6.8). The matrix of metric coefficients is therefore
$$[g] = \begin{bmatrix} h_1^2 & 0 & 0 \\ 0 & h_2^2 & 0 \\ 0 & 0 & h_3^2 \end{bmatrix}. \tag{6.18}$$
From (6.13), (6.14), (6.17) follows
$$(ds)^2 = (h_1\, d\hat x_1)^2 + (h_2\, d\hat x_2)^2 + (h_3\, d\hat x_3)^2 \quad \text{along } \Gamma, \tag{6.19}$$
which reveals the geometric significance of the scale moduli, i.e.
$$h_i = \frac{ds}{d\hat x_i} \quad \text{along the } \hat x_i\text{-coordinate lines}. \tag{6.20}$$
It follows from (6.17), (6.10) and (6.11) that the unit vector $\hat e_i$ can be expressed as
$$\hat e_i = \frac{1}{h_i}\frac{\partial x}{\partial \hat x_i}, \tag{6.21}$$
and therefore the proper orthogonal matrix $[Q]$ relating the two sets of basis vectors is given by
$$Q_{ij} = \hat e_i \cdot e_j = \frac{1}{h_i}\frac{\partial x_j}{\partial \hat x_i}. \tag{6.22}$$

² Here and henceforth the underlining of one of two or more repeated indices indicates suspended summation with respect to this index.
³ Some authors, such as Love, define $h_i$ as $1/\sqrt{g_{ii}}$ instead of as $\sqrt{g_{ii}}$.
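The relations $[g] = [J]^T[J]$, (6.17) and (6.22) are easy to verify numerically. The following Python sketch (our own construction, not part of the text) builds the Jacobian of the cylindrical mapping by central differences and checks that $[g]$ is diagonal with entries $(1, r^2, 1)$:

```python
import math

def jacobian(fmap, q, h=1e-6):
    # Central-difference Jacobian J[i][j] = dx_i / dq_j of a mapping R^3 -> R^3
    J = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        qp, qm = list(q), list(q)
        qp[j] += h; qm[j] -= h
        xp, xm = fmap(qp), fmap(qm)
        for i in range(3):
            J[i][j] = (xp[i] - xm[i]) / (2 * h)
    return J

def cyl(q):
    r, theta, z = q
    return (r * math.cos(theta), r * math.sin(theta), z)

q = (2.0, 0.9, -1.0)
J = jacobian(cyl, q)
# [g] = [J]^T [J] should equal diag(h_r^2, h_theta^2, h_z^2) = diag(1, r^2, 1)
g = [[sum(J[k][i] * J[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
for i in range(3):
    for j in range(3):
        expected = (1.0, q[0] ** 2, 1.0)[i] if i == j else 0.0
        assert abs(g[i][j] - expected) < 1e-5
```

The off-diagonal entries vanish because the coordinate system is orthogonal; the diagonal entries are the squared scale moduli of (6.17).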
6.2.3 Inverse partial derivatives
In view of (6.5), (6.6) one has the identity
$$x_i = x_i(\hat x_1(x_1, x_2, x_3),\ \hat x_2(x_1, x_2, x_3),\ \hat x_3(x_1, x_2, x_3)),$$
so that from the chain rule,
$$\frac{\partial x_i}{\partial \hat x_k}\frac{\partial \hat x_k}{\partial x_j} = \delta_{ij}.$$
Multiply this by $\partial x_i/\partial \hat x_m$, noting the implied contraction on the index $i$, and use (6.14), (6.16) to confirm that
$$\frac{\partial x_j}{\partial \hat x_m} = g_{km}\frac{\partial \hat x_k}{\partial x_j} = h_m^2\, \frac{\partial \hat x_{\underline m}}{\partial x_j}.$$
Thus the inverse partial derivatives are given by
$$\frac{\partial \hat x_i}{\partial x_j} = \frac{1}{h_i^2}\frac{\partial x_j}{\partial \hat x_i}. \tag{6.23}$$
By (6.22), the elements of the matrix $[Q]$ that relates the two coordinate systems can be written in the alternative form
$$Q_{ij} = \hat e_i \cdot e_j = h_i\, \frac{\partial \hat x_i}{\partial x_j}. \tag{6.24}$$
Moreover, (6.23) and (6.17) yield the following alternative expression for the $h_i$:
$$h_i = 1\bigg/\sqrt{\left(\frac{\partial \hat x_i}{\partial x_1}\right)^2 + \left(\frac{\partial \hat x_i}{\partial x_2}\right)^2 + \left(\frac{\partial \hat x_i}{\partial x_3}\right)^2}\,.$$
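Relation (6.23) can be spot-checked numerically. In cylindrical coordinates, with $\hat x_2 = \theta = \operatorname{atan2}(x_2, x_1)$ and $h_\theta = r$, the left side $\partial\theta/\partial x_1$ and the right side $(1/h_\theta^2)\,\partial x_1/\partial\theta$ should both equal $-\sin\theta/r$ (a sketch of ours, not the text's):

```python
import math

r, theta = 1.7, 0.6
h_theta = r

# Left side of (6.23): central difference of theta(x1, x2) = atan2(x2, x1)
x1, x2 = r * math.cos(theta), r * math.sin(theta)
eps = 1e-6
lhs = (math.atan2(x2, x1 + eps) - math.atan2(x2, x1 - eps)) / (2 * eps)

# Right side of (6.23): (1/h_theta^2) * dx_1/dtheta = (1/r^2)(-r sin theta)
rhs = (1.0 / h_theta ** 2) * (-r * math.sin(theta))

assert abs(lhs - rhs) < 1e-8
assert abs(rhs - (-math.sin(theta) / r)) < 1e-12
```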
6.2.4 Components of $\partial \hat e_i/\partial \hat x_j$ in the local basis $(\hat e_1, \hat e_2, \hat e_3)$

In order to calculate the derivatives of various field quantities it is clear that we will need to calculate the quantities $\partial \hat e_i/\partial \hat x_j$; and in order to calculate the components of these derivatives in the local basis we will need to calculate quantities of the form $\hat e_k \cdot \partial \hat e_i/\partial \hat x_j$. Calculating these quantities is an essential prerequisite for the transformation of basic tensor-analytic relations into arbitrary orthogonal curvilinear coordinates, and this subsection is devoted to this calculation.

From (6.21),
$$\frac{\partial \hat e_i}{\partial \hat x_j}\cdot \hat e_k = \left[-\frac{1}{h_i^2}\frac{\partial h_i}{\partial \hat x_j}\frac{\partial x}{\partial \hat x_i} + \frac{1}{h_i}\frac{\partial^2 x}{\partial \hat x_i \partial \hat x_j}\right]\cdot\frac{1}{h_k}\frac{\partial x}{\partial \hat x_k},$$
while by (6.14), (6.17),
$$\frac{\partial x}{\partial \hat x_i}\cdot\frac{\partial x}{\partial \hat x_j} = g_{ij} = \delta_{ij}\, h_i h_j. \tag{6.25}$$
Therefore
$$\frac{\partial \hat e_i}{\partial \hat x_j}\cdot \hat e_k = -\frac{\delta_{ik}}{h_i}\frac{\partial h_i}{\partial \hat x_j} + \frac{1}{h_i h_k}\frac{\partial^2 x}{\partial \hat x_i \partial \hat x_j}\cdot\frac{\partial x}{\partial \hat x_k}. \tag{6.26}$$
In order to express the second derivative term in (6.26) in terms of the scale moduli and their first partial derivatives, we begin by differentiating (6.25) with respect to $\hat x_k$. Thus,
$$\frac{\partial^2 x}{\partial \hat x_i \partial \hat x_k}\cdot\frac{\partial x}{\partial \hat x_j} + \frac{\partial^2 x}{\partial \hat x_j \partial \hat x_k}\cdot\frac{\partial x}{\partial \hat x_i} = \delta_{ij}\,\frac{\partial}{\partial \hat x_k}(h_i h_j). \tag{6.27}$$
If we refer to (6.27) as (a), and let (b) and (c) be the identities resulting from (6.27) when $(i, j, k)$ are replaced by $(j, k, i)$ and $(k, i, j)$, respectively, then $\tfrac{1}{2}\{(b) + (c) - (a)\}$ is readily found to yield
$$\frac{\partial^2 x}{\partial \hat x_i \partial \hat x_j}\cdot\frac{\partial x}{\partial \hat x_k} = \frac{1}{2}\left[\delta_{jk}\frac{\partial}{\partial \hat x_i}(h_j h_k) + \delta_{ki}\frac{\partial}{\partial \hat x_j}(h_k h_i) - \delta_{ij}\frac{\partial}{\partial \hat x_k}(h_i h_j)\right]. \tag{6.28}$$
Substituting (6.28) into (6.26) leads to
$$\frac{\partial \hat e_i}{\partial \hat x_j}\cdot \hat e_k = -\frac{\delta_{ik}}{h_i}\frac{\partial h_i}{\partial \hat x_j} + \frac{1}{2 h_i h_k}\left[\delta_{jk}\frac{\partial}{\partial \hat x_i}(h_j h_k) + \delta_{ki}\frac{\partial}{\partial \hat x_j}(h_k h_i) - \delta_{ij}\frac{\partial}{\partial \hat x_k}(h_i h_j)\right]. \tag{6.29}$$
Equation (6.29) provides the explicit expressions for the terms $\partial \hat e_i/\partial \hat x_j \cdot \hat e_k$ that we sought. Observe the following properties that follow from it:
$$\frac{\partial \hat e_i}{\partial \hat x_j}\cdot \hat e_k = 0 \ \text{ if } (i, j, k) \text{ distinct}, \qquad \frac{\partial \hat e_i}{\partial \hat x_j}\cdot \hat e_k = 0 \ \text{ if } k = i,$$
$$\frac{\partial \hat e_i}{\partial \hat x_i}\cdot \hat e_k = -\frac{1}{h_k}\frac{\partial h_i}{\partial \hat x_k} \ \text{ if } i \ne k, \qquad \frac{\partial \hat e_i}{\partial \hat x_k}\cdot \hat e_k = \frac{1}{h_i}\frac{\partial h_k}{\partial \hat x_i} \ \text{ if } i \ne k. \tag{6.30}$$
6.3 Transformation of Basic Tensor Relations
Let $T$ be a cartesian tensor field of order $N \ge 1$, defined on a region $R \subset E^3$, and suppose that the points of $R$ are regular points of $E^3$ with respect to a given orthogonal curvilinear coordinate system.

The curvilinear components $\hat T_{ijk\ldots n}$ of $T$ are the components of $T$ in the local basis $(\hat e_1, \hat e_2, \hat e_3)$. Thus,
$$\hat T_{ij\ldots n} = Q_{ip} Q_{jq} \ldots Q_{nr}\, T_{pq\ldots r}, \quad \text{where } Q_{ip} = \hat e_i \cdot e_p. \tag{6.31}$$
6.3.1 Gradient of a scalar ﬁeld
Let $\phi(x)$ be a scalar-valued function and let $v(x)$ denote its gradient:
$$v = \nabla\phi \quad \text{or equivalently} \quad v_i = \phi_{,i}.$$
The components of $v$ in the two bases $\{e_1, e_2, e_3\}$ and $\{\hat e_1, \hat e_2, \hat e_3\}$ are related in the usual way by $\hat v_k = Q_{ki} v_i$, and so
$$\hat v_k = Q_{ki} v_i = Q_{ki}\frac{\partial \phi}{\partial x_i}.$$
On using (6.22) this leads to
$$\hat v_k = \frac{1}{h_k}\frac{\partial x_i}{\partial \hat x_k}\frac{\partial \phi}{\partial x_i} = \frac{1}{h_k}\frac{\partial \phi}{\partial x_i}\frac{\partial x_i}{\partial \hat x_k},$$
so that by the chain rule
$$\hat v_k = \frac{1}{h_k}\frac{\partial \hat\phi}{\partial \hat x_k}, \tag{6.32}$$
where we have set
$$\hat\phi(\hat x_1, \hat x_2, \hat x_3) = \phi(x_1(\hat x_1, \hat x_2, \hat x_3),\ x_2(\hat x_1, \hat x_2, \hat x_3),\ x_3(\hat x_1, \hat x_2, \hat x_3)).$$
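Formula (6.32) can be checked numerically against the Cartesian gradient: the curvilinear component $\hat v_k$ must equal $\hat e_k \cdot \nabla\phi$. The Python sketch below (our own, using cylindrical coordinates and the test function $\phi = x_1^2 x_2 + x_3$) compares both routes:

```python
import math

r, theta, z = 1.4, 0.8, 0.3
x1, x2 = r * math.cos(theta), r * math.sin(theta)

# Cartesian gradient of phi = x1^2 * x2 + x3, computed by hand
grad = (2 * x1 * x2, x1 ** 2, 1.0)

# Local basis vectors e_r, e_theta, e_z
e_r = (math.cos(theta), math.sin(theta), 0.0)
e_t = (-math.sin(theta), math.cos(theta), 0.0)
e_z = (0.0, 0.0, 1.0)
dot = lambda a, b: sum(p * q for p, q in zip(a, b))

# phi-hat of (6.32): phi expressed in the curvilinear coordinates
def phihat(r, theta, z):
    return (r * math.cos(theta)) ** 2 * (r * math.sin(theta)) + z

# Curvilinear components (1/h_k) d(phi-hat)/d(xhat_k), with h = (1, r, 1)
eps = 1e-6
v_r = (phihat(r + eps, theta, z) - phihat(r - eps, theta, z)) / (2 * eps)
v_t = (phihat(r, theta + eps, z) - phihat(r, theta - eps, z)) / (2 * eps) / r
v_z = (phihat(r, theta, z + eps) - phihat(r, theta, z - eps)) / (2 * eps)

assert abs(v_r - dot(grad, e_r)) < 1e-6
assert abs(v_t - dot(grad, e_t)) < 1e-6
assert abs(v_z - dot(grad, e_z)) < 1e-6
```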
6.3.2 Gradient of a vector ﬁeld
Let $v(x)$ be a vector-valued function and let $W(x)$ denote its gradient:
$$W = \nabla v \quad \text{or equivalently} \quad W_{ij} = v_{i,j}.$$
The components of $W$ and $v$ in the two bases $\{e_1, e_2, e_3\}$ and $\{\hat e_1, \hat e_2, \hat e_3\}$ are related in the usual way by
$$\hat W_{ij} = Q_{ip} Q_{jq} W_{pq}, \qquad v_p = Q_{np}\,\hat v_n,$$
and therefore
$$\hat W_{ij} = Q_{ip} Q_{jq}\frac{\partial v_p}{\partial x_q} = Q_{ip} Q_{jq}\frac{\partial}{\partial x_q}(Q_{np}\hat v_n) = Q_{ip} Q_{jq}\frac{\partial}{\partial \hat x_m}(Q_{np}\hat v_n)\frac{\partial \hat x_m}{\partial x_q}.$$
Thus by (6.24)⁴
$$\hat W_{ij} = Q_{ip} Q_{jq}\sum_{m=1}^{3}\frac{1}{h_m} Q_{mq}\frac{\partial}{\partial \hat x_m}(Q_{np}\hat v_n).$$
Since $Q_{jq} Q_{mq} = \delta_{mj}$, this simplifies to
$$\hat W_{ij} = Q_{ip}\,\frac{1}{h_j}\frac{\partial}{\partial \hat x_j}(Q_{np}\hat v_n),$$
which, on expanding the terms in parentheses, yields
$$\hat W_{ij} = \frac{1}{h_j}\left[\frac{\partial \hat v_i}{\partial \hat x_j} + Q_{ip}\frac{\partial Q_{np}}{\partial \hat x_j}\hat v_n\right].$$
However by (6.22)
$$Q_{ip}\frac{\partial Q_{np}}{\partial \hat x_j}\hat v_n = Q_{ip}\frac{\partial}{\partial \hat x_j}(\hat e_n \cdot e_p)\,\hat v_n = \left[\hat e_i \cdot \frac{\partial \hat e_n}{\partial \hat x_j}\right]\hat v_n,$$
and so
$$\hat W_{ij} = \frac{1}{h_j}\left[\frac{\partial \hat v_i}{\partial \hat x_j} + \left(\hat e_i \cdot \frac{\partial \hat e_n}{\partial \hat x_j}\right)\hat v_n\right], \tag{6.33}$$
in which the coefficient in brackets is given by (6.29).

⁴ We explicitly use the summation sign in this equation (and elsewhere) when an index is repeated 3 or more times and we wish to sum over it.
6.3.3 Divergence of a vector ﬁeld
Let $v(x)$ be a vector-valued function and let $W(x) = \nabla v(x)$ denote its gradient. Then
$$\operatorname{div} v = \operatorname{trace} W = v_{i,i}.$$
Therefore from (6.33), the invariance of the trace of $W$, and (6.30),
$$\operatorname{div} v = \operatorname{trace} W = \operatorname{trace} \hat W = \hat W_{ii} = \sum_{i=1}^{3}\frac{1}{h_i}\left[\frac{\partial \hat v_i}{\partial \hat x_i} + \sum_{n \ne i}\frac{1}{h_n}\frac{\partial h_i}{\partial \hat x_n}\hat v_n\right].$$
Collecting terms involving $\hat v_1$, $\hat v_2$, and $\hat v_3$ alone, one has
$$\operatorname{div} v = \frac{1}{h_1}\frac{\partial \hat v_1}{\partial \hat x_1} + \frac{\hat v_1}{h_2 h_1}\frac{\partial h_2}{\partial \hat x_1} + \frac{\hat v_1}{h_3 h_1}\frac{\partial h_3}{\partial \hat x_1} + \ldots + \ldots$$
Thus
$$\operatorname{div} v = \frac{1}{h_1 h_2 h_3}\left[\frac{\partial}{\partial \hat x_1}(h_2 h_3\,\hat v_1) + \frac{\partial}{\partial \hat x_2}(h_3 h_1\,\hat v_2) + \frac{\partial}{\partial \hat x_3}(h_1 h_2\,\hat v_3)\right]. \tag{6.34}$$
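A numerical cross-check of (6.34): build a Cartesian vector field from prescribed cylindrical components, differentiate it in Cartesian coordinates, and compare against the curvilinear formula evaluated in closed form. This sketch (ours, not the text's) uses $\hat v_r = r^2$, $\hat v_\theta = r\sin\theta$, $\hat v_z = z$:

```python
import math

def v_cart(x1, x2, x3):
    # Cartesian components of v = vr e_r + vt e_theta + vz e_z
    r, theta = math.hypot(x1, x2), math.atan2(x2, x1)
    vr, vt, vz = r ** 2, r * math.sin(theta), x3
    return (vr * math.cos(theta) - vt * math.sin(theta),
            vr * math.sin(theta) + vt * math.cos(theta),
            vz)

def div_cart(x, eps=1e-5):
    # Cartesian divergence v_{i,i} by central differences
    s = 0.0
    for i in range(3):
        xp, xm = list(x), list(x)
        xp[i] += eps; xm[i] -= eps
        s += (v_cart(*xp)[i] - v_cart(*xm)[i]) / (2 * eps)
    return s

# (6.34) with (h_r, h_theta, h_z) = (1, r, 1), evaluated by hand:
# (1/r) d(r * r^2)/dr + (1/r) d(r sin theta)/dtheta + dz/dz = 3r + cos(theta) + 1
r, theta, z = 1.3, 0.5, 2.0
formula = 3 * r + math.cos(theta) + 1.0

x = (r * math.cos(theta), r * math.sin(theta), z)
assert abs(div_cart(x) - formula) < 1e-5
```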
6.3.4 Laplacian of a scalar ﬁeld
Let $\phi(x)$ be a scalar-valued function. Since
$$\nabla^2\phi = \operatorname{div}(\operatorname{grad}\phi) = \phi_{,kk},$$
the results from Subsections 6.3.1 and 6.3.3 permit us to infer that
$$\nabla^2\phi = \frac{1}{h_1 h_2 h_3}\left[\frac{\partial}{\partial \hat x_1}\left(\frac{h_2 h_3}{h_1}\frac{\partial \hat\phi}{\partial \hat x_1}\right) + \frac{\partial}{\partial \hat x_2}\left(\frac{h_3 h_1}{h_2}\frac{\partial \hat\phi}{\partial \hat x_2}\right) + \frac{\partial}{\partial \hat x_3}\left(\frac{h_1 h_2}{h_3}\frac{\partial \hat\phi}{\partial \hat x_3}\right)\right], \tag{6.35}$$
where we have set
$$\hat\phi(\hat x_1, \hat x_2, \hat x_3) = \phi(x_1(\hat x_1, \hat x_2, \hat x_3),\ x_2(\hat x_1, \hat x_2, \hat x_3),\ x_3(\hat x_1, \hat x_2, \hat x_3)).$$
6.3.5 Curl of a vector ﬁeld
Let $v(x)$ be a vector-valued field and let $w(x)$ be its curl so that
$$w = \operatorname{curl} v \quad \text{or equivalently} \quad w_i = e_{ijk} v_{k,j}.$$
Let
$$W = \nabla v \quad \text{or equivalently} \quad W_{ij} = v_{i,j}.$$
Then as we have shown in an earlier chapter
$$w_i = e_{ijk} W_{kj}, \qquad \hat w_i = e_{ijk}\hat W_{kj}.$$
Consequently from Subsection 6.3.2,
$$\hat w_i = \sum_{j=1}^{3}\frac{1}{h_j}\, e_{ijk}\left[\frac{\partial \hat v_k}{\partial \hat x_j} + \left(\hat e_k \cdot \frac{\partial \hat e_n}{\partial \hat x_j}\right)\hat v_n\right].$$
By (6.30), the second term within the brackets sums out to zero unless $n = j$. Thus, using the third of (6.30), one arrives at
$$\hat w_i = \sum_{j,k=1}^{3}\frac{1}{h_j}\, e_{ijk}\left[\frac{\partial \hat v_k}{\partial \hat x_j} - \frac{1}{h_k}\frac{\partial h_j}{\partial \hat x_k}\hat v_j\right].$$
This yields
$$\hat w_1 = \frac{1}{h_2 h_3}\left[\frac{\partial}{\partial \hat x_2}(h_3 \hat v_3) - \frac{\partial}{\partial \hat x_3}(h_2 \hat v_2)\right], \quad \hat w_2 = \frac{1}{h_3 h_1}\left[\frac{\partial}{\partial \hat x_3}(h_1 \hat v_1) - \frac{\partial}{\partial \hat x_1}(h_3 \hat v_3)\right], \quad \hat w_3 = \frac{1}{h_1 h_2}\left[\frac{\partial}{\partial \hat x_1}(h_2 \hat v_2) - \frac{\partial}{\partial \hat x_2}(h_1 \hat v_1)\right]. \tag{6.36}$$
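A quick sanity check of (6.36), of our own construction: the rigid rotation about the $x_3$-axis has cylindrical components $\hat v_r = 0$, $\hat v_\theta = r$, $\hat v_z = 0$, and its curl should be $2 e_z$; the same field in Cartesian components is $v = (-x_2, x_1, 0)$ with curl $(0, 0, 2)$:

```python
import math

# Third of (6.36) with h = (1, r, 1): w_z = (1/r)[d(r v_theta)/dr - d(v_r)/dtheta]
r = 1.9
eps = 1e-6
w_z = ((r + eps) ** 2 - (r - eps) ** 2) / (2 * eps) / r   # d(r * r)/dr / r
assert abs(w_z - 2.0) < 1e-8

# Cartesian cross-check: v = (-x2, x1, 0), (curl v)_3 = dv2/dx1 - dv1/dx2 = 2
def v(x1, x2, x3):
    return (-x2, x1, 0.0)

x = (0.4, -1.1, 0.2)
curl3 = ((v(x[0] + eps, x[1], x[2])[1] - v(x[0] - eps, x[1], x[2])[1]) / (2 * eps)
         - (v(x[0], x[1] + eps, x[2])[0] - v(x[0], x[1] - eps, x[2])[0]) / (2 * eps))
assert abs(curl3 - 2.0) < 1e-8
```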
6.3.6 Divergence of a symmetric 2tensor ﬁeld
Let $S(x)$ be a symmetric 2-tensor field and let $v(x)$ denote its divergence:
$$v = \operatorname{div} S, \quad S = S^T, \quad \text{or equivalently} \quad v_i = S_{ij,j}, \quad S_{ij} = S_{ji}.$$
The components of $v$ and $S$ in the two bases $\{e_1, e_2, e_3\}$ and $\{\hat e_1, \hat e_2, \hat e_3\}$ are related in the usual way by
$$\hat v_i = Q_{ip} v_p, \qquad S_{ij} = Q_{mi} Q_{nj}\,\hat S_{mn},$$
and consequently
$$\hat v_i = Q_{ip} S_{pj,j} = Q_{ip}\frac{\partial}{\partial x_j}(Q_{mp} Q_{nj}\hat S_{mn}) = Q_{ip}\frac{\partial}{\partial \hat x_k}(Q_{mp} Q_{nj}\hat S_{mn})\frac{\partial \hat x_k}{\partial x_j}.$$
By using (6.24), the orthogonality of the matrix $[Q]$, (6.30) and $\hat S_{ij} = \hat S_{ji}$, we obtain
$$\begin{aligned}\hat v_1 ={}& \frac{1}{h_1 h_2 h_3}\left[\frac{\partial}{\partial \hat x_1}(h_2 h_3\,\hat S_{11}) + \frac{\partial}{\partial \hat x_2}(h_3 h_1\,\hat S_{12}) + \frac{\partial}{\partial \hat x_3}(h_1 h_2\,\hat S_{13})\right] \\ &+ \frac{1}{h_1 h_2}\frac{\partial h_1}{\partial \hat x_2}\hat S_{12} + \frac{1}{h_1 h_3}\frac{\partial h_1}{\partial \hat x_3}\hat S_{13} - \frac{1}{h_1 h_2}\frac{\partial h_2}{\partial \hat x_1}\hat S_{22} - \frac{1}{h_1 h_3}\frac{\partial h_3}{\partial \hat x_1}\hat S_{33},\end{aligned} \tag{6.37}$$
with analogous expressions for $\hat v_2$ and $\hat v_3$.

Equations (6.32)–(6.37) provide the fundamental expressions for the basic tensor-analytic quantities that we will need. Observe that they reduce to their classical rectangular cartesian forms in the special case $x_i = \hat x_i$ (in which case $h_1 = h_2 = h_3 = 1$).
6.3.7 Diﬀerential elements of volume
When evaluating a volume integral over a region $D$, we sometimes find it convenient to transform it from the form
$$\int_D (\ldots)\, dx_1\, dx_2\, dx_3$$
into an equivalent expression of the form
$$\int_D (\ldots)\, d\hat x_1\, d\hat x_2\, d\hat x_3.$$
In order to do this we must relate $dx_1\, dx_2\, dx_3$ to $d\hat x_1\, d\hat x_2\, d\hat x_3$. By (6.22),
$$\det[Q] = \frac{1}{h_1 h_2 h_3}\det[J].$$
However, since $[Q]$ is a proper orthogonal matrix, its determinant takes the value $+1$. Therefore $\det[J] = h_1 h_2 h_3$, and so the basic relation $dx_1\, dx_2\, dx_3 = \det[J]\, d\hat x_1\, d\hat x_2\, d\hat x_3$ leads to
$$dx_1\, dx_2\, dx_3 = h_1 h_2 h_3\, d\hat x_1\, d\hat x_2\, d\hat x_3. \tag{6.38}$$
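As an illustration of (6.38) (our own sketch): in cylindrical coordinates $h_1 h_2 h_3 = r$, so the volume of a cylinder of radius $R$ and height $H$ is the integral of $r$ over the coordinate box, which a midpoint-rule quadrature reproduces as $\pi R^2 H$:

```python
import math

R, H = 1.5, 2.0
n = 200
dr, dt, dz = R / n, 2 * math.pi / n, H / n

# Integrate h1*h2*h3 = r over the box [0,R) x [0,2*pi) x [0,H);
# the integrand is independent of theta and z, so those sums collapse to factors.
vol = 0.0
for i in range(n):
    r = (i + 0.5) * dr                   # midpoint in r
    vol += r * dr * (dt * n) * (dz * n)  # r dr * (2*pi) * H

assert abs(vol - math.pi * R ** 2 * H) < 1e-9
```

The midpoint rule is exact here because the integrand is linear in $r$.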
6.3.8 Diﬀerential elements of area
Let $d\hat A_1$ denote a differential element of (vector) area on a $\hat x_1$-coordinate surface so that $d\hat A_1 = (d\hat x_2\, \partial x/\partial \hat x_2) \times (d\hat x_3\, \partial x/\partial \hat x_3)$. In view of (6.21) this leads to $d\hat A_1 = (d\hat x_2\, h_2 \hat e_2) \times (d\hat x_3\, h_3 \hat e_3) = h_2 h_3\, d\hat x_2\, d\hat x_3\, \hat e_1$. Thus the differential elements of (scalar) area on the $\hat x_1$-, $\hat x_2$- and $\hat x_3$-coordinate surfaces are given by
$$d\hat A_1 = h_2 h_3\, d\hat x_2\, d\hat x_3, \qquad d\hat A_2 = h_3 h_1\, d\hat x_3\, d\hat x_1, \qquad d\hat A_3 = h_1 h_2\, d\hat x_1\, d\hat x_2, \tag{6.39}$$
respectively.
6.4 Some Examples of Orthogonal Curvilinear Coordinate Systems

Circular Cylindrical Coordinates $(r, \theta, z)$:
$$x_1 = r\cos\theta, \quad x_2 = r\sin\theta, \quad x_3 = z; \quad \text{for all } (r, \theta, z) \in [0, \infty)\times[0, 2\pi)\times(-\infty, \infty); \quad h_r = 1, \ h_\theta = r, \ h_z = 1. \tag{6.40}$$

Spherical Coordinates $(r, \theta, \phi)$:
$$x_1 = r\sin\theta\cos\phi, \quad x_2 = r\sin\theta\sin\phi, \quad x_3 = r\cos\theta; \quad \text{for all } (r, \theta, \phi) \in [0, \infty)\times[0, \pi]\times[0, 2\pi); \quad h_r = 1, \ h_\theta = r, \ h_\phi = r\sin\theta. \tag{6.41}$$

Elliptical Cylindrical Coordinates $(\xi, \eta, z)$:
$$x_1 = a\cosh\xi\cos\eta, \quad x_2 = a\sinh\xi\sin\eta, \quad x_3 = z; \quad \text{for all } (\xi, \eta, z) \in [0, \infty)\times(-\pi, \pi]\times(-\infty, \infty); \quad h_\xi = h_\eta = a\sqrt{\sinh^2\xi + \sin^2\eta}, \ h_z = 1. \tag{6.42}$$

Parabolic Cylindrical Coordinates $(u, v, w)$:
$$x_1 = \tfrac{1}{2}(u^2 - v^2), \quad x_2 = uv, \quad x_3 = w; \quad \text{for all } (u, v, w) \in (-\infty, \infty)\times[0, \infty)\times(-\infty, \infty); \quad h_u = h_v = \sqrt{u^2 + v^2}, \ h_w = 1. \tag{6.43}$$
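The scale moduli listed in (6.40)–(6.43) all follow from (6.17), and can be checked mechanically. The sketch below (ours; `scale_moduli` is a hypothetical helper) differentiates the mapping numerically and compares against the tabulated $h_i$ for the spherical and elliptical cylindrical systems (taking $a = 1$ in the latter):

```python
import math

def scale_moduli(fmap, q, eps=1e-6):
    # h_i from (6.17) by central differences of the mapping
    h = []
    for j in range(3):
        qp, qm = list(q), list(q)
        qp[j] += eps; qm[j] -= eps
        xp, xm = fmap(qp), fmap(qm)
        h.append(math.sqrt(sum(((a - b) / (2 * eps)) ** 2 for a, b in zip(xp, xm))))
    return h

# Spherical (6.41): expect (1, r, r sin(theta))
sph = lambda q: (q[0] * math.sin(q[1]) * math.cos(q[2]),
                 q[0] * math.sin(q[1]) * math.sin(q[2]),
                 q[0] * math.cos(q[1]))
r, th = 2.0, 0.7
hs = scale_moduli(sph, (r, th, 1.1))
assert abs(hs[0] - 1) < 1e-6 and abs(hs[1] - r) < 1e-6 and abs(hs[2] - r * math.sin(th)) < 1e-6

# Elliptical cylindrical (6.42), a = 1: expect h_xi = h_eta = sqrt(sinh^2 xi + sin^2 eta)
ell = lambda q: (math.cosh(q[0]) * math.cos(q[1]), math.sinh(q[0]) * math.sin(q[1]), q[2])
xi, eta = 0.8, 0.4
he = scale_moduli(ell, (xi, eta, 0.0))
expect = math.sqrt(math.sinh(xi) ** 2 + math.sin(eta) ** 2)
assert abs(he[0] - expect) < 1e-6 and abs(he[1] - expect) < 1e-6 and abs(he[2] - 1) < 1e-6
```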
6.5 Worked Examples.
Example 6.1: Let $E(x)$ be a symmetric 2-tensor field that is related to a vector field $u(x)$ through
$$E = \tfrac{1}{2}\left(\nabla u + \nabla u^T\right).$$
In a cartesian coordinate system this can be written equivalently as
$$E_{ij} = \frac{1}{2}\left(\frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i}\right).$$
Establish the analogous formulas in a general orthogonal curvilinear coordinate system.

Solution: Using the result from Subsection 6.3.2 and the formulas for $\hat e_k \cdot (\partial \hat e_i/\partial \hat x_j)$, one finds after elementary simplification that
$$\hat E_{11} = \frac{1}{h_1}\frac{\partial \hat u_1}{\partial \hat x_1} + \frac{1}{h_1 h_2}\frac{\partial h_1}{\partial \hat x_2}\hat u_2 + \frac{1}{h_1 h_3}\frac{\partial h_1}{\partial \hat x_3}\hat u_3, \qquad \hat E_{22} = \ldots, \qquad \hat E_{33} = \ldots,$$
$$\hat E_{12} = \hat E_{21} = \frac{1}{2}\left[\frac{h_1}{h_2}\frac{\partial}{\partial \hat x_2}\left(\frac{\hat u_1}{h_1}\right) + \frac{h_2}{h_1}\frac{\partial}{\partial \hat x_1}\left(\frac{\hat u_2}{h_2}\right)\right], \qquad \hat E_{23} = \ldots, \qquad \hat E_{31} = \ldots \tag{i}$$
Example 6.2: Consider a symmetric 2-tensor field $S(x)$ and a vector field $b(x)$ that satisfy the equation
$$\operatorname{div} S + b = o.$$
In a cartesian coordinate system this can be written equivalently as
$$\frac{\partial S_{ij}}{\partial x_j} + b_i = 0.$$
Establish the analogous formulas in a general orthogonal curvilinear coordinate system.

Solution: From the results in Subsection 6.3.6 we have
$$\begin{aligned}&\frac{1}{h_1 h_2 h_3}\left[\frac{\partial}{\partial \hat x_1}(h_2 h_3\,\hat S_{11}) + \frac{\partial}{\partial \hat x_2}(h_3 h_1\,\hat S_{12}) + \frac{\partial}{\partial \hat x_3}(h_1 h_2\,\hat S_{13})\right] \\ &\quad + \frac{1}{h_1 h_2}\frac{\partial h_1}{\partial \hat x_2}\hat S_{12} + \frac{1}{h_1 h_3}\frac{\partial h_1}{\partial \hat x_3}\hat S_{13} - \frac{1}{h_1 h_2}\frac{\partial h_2}{\partial \hat x_1}\hat S_{22} - \frac{1}{h_1 h_3}\frac{\partial h_3}{\partial \hat x_1}\hat S_{33} + \hat b_1 = 0, \\ &\quad \ldots \text{ etc.}\end{aligned} \tag{i}$$
where $\hat b_i = Q_{ip} b_p$.
Example 6.3: Consider circular cylindrical coordinates $(\hat x_1, \hat x_2, \hat x_3) = (r, \theta, z)$ which are related to $(x_1, x_2, x_3)$ through
$$x_1 = r\cos\theta, \quad x_2 = r\sin\theta, \quad x_3 = z, \qquad 0 \le r < \infty, \quad 0 \le \theta < 2\pi, \quad -\infty < z < \infty.$$
Let $f(x)$ be a scalar-valued field, $u(x)$ a vector-valued field, and $S(x)$ a symmetric 2-tensor field. Express the following quantities,
(a) grad $f$,
(b) $\nabla^2 f$,
(c) div $u$,
(d) curl $u$,
(e) $\frac{1}{2}(\nabla u + \nabla u^T)$, and
(f) div $S$
in this coordinate system.

Figure 6.3: Cylindrical coordinates $(r, \theta, z)$ and the associated local curvilinear orthonormal basis $\{e_r, e_\theta, e_z\}$, where $e_r = \cos\theta\, e_1 + \sin\theta\, e_2$, $e_\theta = -\sin\theta\, e_1 + \cos\theta\, e_2$ and $e_z = e_3$.
Solution: We simply need to specialize the basic results established in Section 6.3.
In the present case we have
$$(\hat x_1, \hat x_2, \hat x_3) = (r, \theta, z) \tag{i}$$
and the coordinate mapping (6.5) takes the particular form
$$x_1 = r\cos\theta, \quad x_2 = r\sin\theta, \quad x_3 = z. \tag{ii}$$
The matrix $[\partial x_i/\partial \hat x_j]$ therefore specializes to
$$\begin{bmatrix}\partial x_1/\partial r & \partial x_1/\partial\theta & \partial x_1/\partial z \\ \partial x_2/\partial r & \partial x_2/\partial\theta & \partial x_2/\partial z \\ \partial x_3/\partial r & \partial x_3/\partial\theta & \partial x_3/\partial z\end{bmatrix} = \begin{bmatrix}\cos\theta & -r\sin\theta & 0 \\ \sin\theta & r\cos\theta & 0 \\ 0 & 0 & 1\end{bmatrix},$$
and the scale moduli are
$$h_r = \sqrt{\left(\frac{\partial x_1}{\partial r}\right)^2 + \left(\frac{\partial x_2}{\partial r}\right)^2 + \left(\frac{\partial x_3}{\partial r}\right)^2} = 1, \quad h_\theta = \sqrt{\left(\frac{\partial x_1}{\partial\theta}\right)^2 + \left(\frac{\partial x_2}{\partial\theta}\right)^2 + \left(\frac{\partial x_3}{\partial\theta}\right)^2} = r, \quad h_z = \sqrt{\left(\frac{\partial x_1}{\partial z}\right)^2 + \left(\frac{\partial x_2}{\partial z}\right)^2 + \left(\frac{\partial x_3}{\partial z}\right)^2} = 1. \tag{iii}$$
We use the natural notation
$$(u_r, u_\theta, u_z) = (\hat u_1, \hat u_2, \hat u_3) \tag{iv}$$
for the components of a vector field,
$$(S_{rr}, S_{r\theta}, \ldots) = (\hat S_{11}, \hat S_{12}, \ldots) \tag{v}$$
for the components of a 2-tensor field, and
$$(e_r, e_\theta, e_z) = (\hat e_1, \hat e_2, \hat e_3) \tag{vi}$$
for the unit vectors associated with the local cylindrical coordinate system.

From (ii),
$$x = (r\cos\theta)e_1 + (r\sin\theta)e_2 + z\, e_3,$$
and therefore on using (6.21) and (iii) we obtain the following expressions for the unit vectors associated with the local cylindrical coordinate system:
$$e_r = \cos\theta\, e_1 + \sin\theta\, e_2, \qquad e_\theta = -\sin\theta\, e_1 + \cos\theta\, e_2, \qquad e_z = e_3,$$
which, in this case, could have been obtained geometrically from Figure 6.3.
(a) Substituting (i) and (iii) into (6.32) gives
$$\nabla f = \frac{\partial \hat f}{\partial r}\, e_r + \frac{1}{r}\frac{\partial \hat f}{\partial\theta}\, e_\theta + \frac{\partial \hat f}{\partial z}\, e_z,$$
where we have set $\hat f(r, \theta, z) = f(x_1, x_2, x_3)$.

(b) Substituting (i) and (iii) into (6.35) gives
$$\nabla^2 f = \frac{\partial^2 \hat f}{\partial r^2} + \frac{1}{r}\frac{\partial \hat f}{\partial r} + \frac{1}{r^2}\frac{\partial^2 \hat f}{\partial\theta^2} + \frac{\partial^2 \hat f}{\partial z^2},$$
where we have set $\hat f(r, \theta, z) = f(x_1, x_2, x_3)$.
(c) Substituting (i) and (iii) into (6.34) gives
$$\operatorname{div} u = \frac{\partial u_r}{\partial r} + \frac{u_r}{r} + \frac{1}{r}\frac{\partial u_\theta}{\partial\theta} + \frac{\partial u_z}{\partial z}.$$

(d) Substituting (i) and (iii) into (6.36) gives
$$\operatorname{curl} u = \left(\frac{1}{r}\frac{\partial u_z}{\partial\theta} - \frac{\partial u_\theta}{\partial z}\right)e_r + \left(\frac{\partial u_r}{\partial z} - \frac{\partial u_z}{\partial r}\right)e_\theta + \left(\frac{\partial u_\theta}{\partial r} + \frac{u_\theta}{r} - \frac{1}{r}\frac{\partial u_r}{\partial\theta}\right)e_z.$$
(e) Set $E = \frac{1}{2}(\nabla u + \nabla u^T)$. Substituting (i) and (iii) into (6.33) enables us to calculate $\nabla u$, whence we can calculate $E$. Writing the cylindrical components $\hat E_{ij}$ of $E$ as $(E_{rr}, E_{r\theta}, E_{rz}, \ldots) = (\hat E_{11}, \hat E_{12}, \hat E_{13}, \ldots)$, one finds
$$E_{rr} = \frac{\partial u_r}{\partial r}, \qquad E_{\theta\theta} = \frac{1}{r}\frac{\partial u_\theta}{\partial\theta} + \frac{u_r}{r}, \qquad E_{zz} = \frac{\partial u_z}{\partial z},$$
$$E_{r\theta} = \frac{1}{2}\left(\frac{1}{r}\frac{\partial u_r}{\partial\theta} + \frac{\partial u_\theta}{\partial r} - \frac{u_\theta}{r}\right), \qquad E_{\theta z} = \frac{1}{2}\left(\frac{\partial u_\theta}{\partial z} + \frac{1}{r}\frac{\partial u_z}{\partial\theta}\right), \qquad E_{zr} = \frac{1}{2}\left(\frac{\partial u_z}{\partial r} + \frac{\partial u_r}{\partial z}\right).$$
Alternatively these could have been obtained from the results of Example 6.1.
(f) Finally, substituting (i) and (iii) into (6.37) gives
$$\begin{aligned}\operatorname{div} S ={}& \left(\frac{\partial S_{rr}}{\partial r} + \frac{1}{r}\frac{\partial S_{r\theta}}{\partial\theta} + \frac{\partial S_{rz}}{\partial z} + \frac{S_{rr} - S_{\theta\theta}}{r}\right)e_r + \left(\frac{\partial S_{r\theta}}{\partial r} + \frac{1}{r}\frac{\partial S_{\theta\theta}}{\partial\theta} + \frac{\partial S_{\theta z}}{\partial z} + \frac{2 S_{r\theta}}{r}\right)e_\theta \\ &+ \left(\frac{\partial S_{zr}}{\partial r} + \frac{1}{r}\frac{\partial S_{z\theta}}{\partial\theta} + \frac{\partial S_{zz}}{\partial z} + \frac{S_{zr}}{r}\right)e_z.\end{aligned}$$
Alternatively these could have been obtained from the results of Example 6.2.
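A simple consistency check of the strain formulas in part (e), of our own construction: the rigid rotation $u = \omega\, e_3 \times x$ has cylindrical components $u_r = 0$, $u_\theta = \omega r$, $u_z = 0$, and since a rigid motion produces no strain, every $E_{ij}$ must vanish:

```python
# Rigid rotation about the x3-axis: u_r = 0, u_theta = omega * r, u_z = 0
omega, r, theta = 0.7, 1.6, 0.9
u_r = lambda r, theta: 0.0
u_t = lambda r, theta: omega * r

eps = 1e-6
dur_dth = (u_r(r, theta + eps) - u_r(r, theta - eps)) / (2 * eps)
dut_dr = (u_t(r + eps, theta) - u_t(r - eps, theta)) / (2 * eps)
dut_dth = (u_t(r, theta + eps) - u_t(r, theta - eps)) / (2 * eps)

E_rr = 0.0                                   # du_r/dr
E_tt = dut_dth / r + u_r(r, theta) / r       # (1/r) du_theta/dtheta + u_r/r
E_rt = 0.5 * (dur_dth / r + dut_dr - u_t(r, theta) / r)

assert abs(E_rr) < 1e-9 and abs(E_tt) < 1e-9 and abs(E_rt) < 1e-9
```

Note how the $-u_\theta/r$ term in $E_{r\theta}$ is essential: without it, the rigid rotation would spuriously register a shear strain.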
Example 6.4: Consider spherical coordinates $(\hat x_1, \hat x_2, \hat x_3) = (r, \theta, \phi)$ which are related to $(x_1, x_2, x_3)$ through
$$x_1 = r\sin\theta\cos\phi, \quad x_2 = r\sin\theta\sin\phi, \quad x_3 = r\cos\theta, \qquad 0 \le r < \infty, \quad 0 \le \theta \le \pi, \quad 0 \le \phi < 2\pi.$$
Let $f(x)$ be a scalar-valued field, $u(x)$ a vector-valued field, and $S(x)$ a symmetric 2-tensor field. Express the following quantities,
Figure 6.4: Spherical coordinates $(r, \theta, \phi)$ and the associated local curvilinear orthonormal basis $\{e_r, e_\theta, e_\phi\}$, where $e_r = (\sin\theta\cos\phi)e_1 + (\sin\theta\sin\phi)e_2 + \cos\theta\, e_3$, $e_\theta = (\cos\theta\cos\phi)e_1 + (\cos\theta\sin\phi)e_2 - \sin\theta\, e_3$ and $e_\phi = -\sin\phi\, e_1 + \cos\phi\, e_2$.
(a) grad $f$,
(b) $\nabla^2 f$,
(c) div $u$,
(d) curl $u$,
(e) $\frac{1}{2}(\nabla u + \nabla u^T)$, and
(f) div $S$
in this coordinate system.
Solution: We simply need to specialize the basic results established in Section 6.3.
In the present case we have
$$(\hat x_1, \hat x_2, \hat x_3) = (r, \theta, \phi), \tag{i}$$
and the coordinate mapping (6.5) takes the particular form
$$x_1 = r\sin\theta\cos\phi, \quad x_2 = r\sin\theta\sin\phi, \quad x_3 = r\cos\theta. \tag{ii}$$
The matrix $[\partial x_i/\partial \hat x_j]$ therefore specializes to
$$\begin{bmatrix}\partial x_1/\partial r & \partial x_1/\partial\theta & \partial x_1/\partial\phi \\ \partial x_2/\partial r & \partial x_2/\partial\theta & \partial x_2/\partial\phi \\ \partial x_3/\partial r & \partial x_3/\partial\theta & \partial x_3/\partial\phi\end{bmatrix} = \begin{bmatrix}\sin\theta\cos\phi & r\cos\theta\cos\phi & -r\sin\theta\sin\phi \\ \sin\theta\sin\phi & r\cos\theta\sin\phi & r\sin\theta\cos\phi \\ \cos\theta & -r\sin\theta & 0\end{bmatrix},$$
and the scale moduli are
$$h_r = \sqrt{\left(\frac{\partial x_1}{\partial r}\right)^2 + \left(\frac{\partial x_2}{\partial r}\right)^2 + \left(\frac{\partial x_3}{\partial r}\right)^2} = 1, \quad h_\theta = \sqrt{\left(\frac{\partial x_1}{\partial\theta}\right)^2 + \left(\frac{\partial x_2}{\partial\theta}\right)^2 + \left(\frac{\partial x_3}{\partial\theta}\right)^2} = r, \quad h_\phi = \sqrt{\left(\frac{\partial x_1}{\partial\phi}\right)^2 + \left(\frac{\partial x_2}{\partial\phi}\right)^2 + \left(\frac{\partial x_3}{\partial\phi}\right)^2} = r\sin\theta. \tag{iii}$$
We use the natural notation
$$(u_r, u_\theta, u_\phi) = (\hat u_1, \hat u_2, \hat u_3) \tag{iv}$$
for the components of a vector field,
$$(S_{rr}, S_{r\theta}, S_{r\phi}, \ldots) = (\hat S_{11}, \hat S_{12}, \hat S_{13}, \ldots) \tag{v}$$
for the components of a 2-tensor field, and
$$(e_r, e_\theta, e_\phi) = (\hat e_1, \hat e_2, \hat e_3) \tag{vi}$$
for the unit vectors associated with the local spherical coordinate system.

From (ii),
$$x = (r\sin\theta\cos\phi)e_1 + (r\sin\theta\sin\phi)e_2 + (r\cos\theta)e_3,$$
and therefore on using (6.21) and (iii) we obtain the following expressions for the unit vectors associated with the local spherical coordinate system:
$$e_r = (\sin\theta\cos\phi)e_1 + (\sin\theta\sin\phi)e_2 + \cos\theta\, e_3, \qquad e_\theta = (\cos\theta\cos\phi)e_1 + (\cos\theta\sin\phi)e_2 - \sin\theta\, e_3, \qquad e_\phi = -\sin\phi\, e_1 + \cos\phi\, e_2,$$
which, in this case, could have been obtained geometrically from Figure 6.4.
(a) Substituting (i) and (iii) into (6.32) gives
$$\nabla f = \frac{\partial \hat f}{\partial r}\, e_r + \frac{1}{r}\frac{\partial \hat f}{\partial\theta}\, e_\theta + \frac{1}{r\sin\theta}\frac{\partial \hat f}{\partial\phi}\, e_\phi,$$
where we have set $\hat f(r, \theta, \phi) = f(x_1, x_2, x_3)$.

(b) Substituting (i) and (iii) into (6.35) gives
$$\nabla^2 f = \frac{\partial^2 \hat f}{\partial r^2} + \frac{2}{r}\frac{\partial \hat f}{\partial r} + \frac{1}{r^2}\frac{\partial^2 \hat f}{\partial\theta^2} + \frac{\cot\theta}{r^2}\frac{\partial \hat f}{\partial\theta} + \frac{1}{r^2\sin^2\theta}\frac{\partial^2 \hat f}{\partial\phi^2},$$
where we have set $\hat f(r, \theta, \phi) = f(x_1, x_2, x_3)$.
(c) Substituting (i) and (iii) into (6.34) gives
$$\operatorname{div} u = \frac{1}{r^2\sin\theta}\left[\frac{\partial}{\partial r}(r^2\sin\theta\, u_r) + \frac{\partial}{\partial\theta}(r\sin\theta\, u_\theta) + \frac{\partial}{\partial\phi}(r\, u_\phi)\right].$$

(d) Substituting (i) and (iii) into (6.36) gives
$$\operatorname{curl} u = \frac{1}{r^2\sin\theta}\left[\frac{\partial}{\partial\theta}(r\sin\theta\, u_\phi) - \frac{\partial}{\partial\phi}(r\, u_\theta)\right]e_r + \frac{1}{r\sin\theta}\left[\frac{\partial u_r}{\partial\phi} - \frac{\partial}{\partial r}(r\sin\theta\, u_\phi)\right]e_\theta + \frac{1}{r}\left[\frac{\partial}{\partial r}(r\, u_\theta) - \frac{\partial u_r}{\partial\theta}\right]e_\phi.$$
(e) Set $E = \frac{1}{2}(\nabla u + \nabla u^T)$. We substitute (i) and (iii) into (6.33) to calculate $\nabla u$, from which one can calculate $E$. Writing the spherical components $\hat E_{ij}$ of $E$ as $(E_{rr}, E_{r\theta}, E_{r\phi}, \ldots) = (\hat E_{11}, \hat E_{12}, \hat E_{13}, \ldots)$, one finds
$$E_{rr} = \frac{\partial u_r}{\partial r}, \qquad E_{\theta\theta} = \frac{1}{r}\frac{\partial u_\theta}{\partial\theta} + \frac{u_r}{r}, \qquad E_{\phi\phi} = \frac{1}{r\sin\theta}\frac{\partial u_\phi}{\partial\phi} + \frac{u_r}{r} + \frac{\cot\theta}{r}u_\theta,$$
$$E_{r\theta} = \frac{1}{2}\left(\frac{1}{r}\frac{\partial u_r}{\partial\theta} + \frac{\partial u_\theta}{\partial r} - \frac{u_\theta}{r}\right), \qquad E_{\theta\phi} = \frac{1}{2}\left(\frac{1}{r\sin\theta}\frac{\partial u_\theta}{\partial\phi} + \frac{1}{r}\frac{\partial u_\phi}{\partial\theta} - \frac{\cot\theta}{r}u_\phi\right), \qquad E_{\phi r} = \frac{1}{2}\left(\frac{1}{r\sin\theta}\frac{\partial u_r}{\partial\phi} + \frac{\partial u_\phi}{\partial r} - \frac{u_\phi}{r}\right).$$
Alternatively these could have been obtained from the results of Example 6.1.
(f) Finally, substituting (i) and (iii) into (6.37) gives
$$\begin{aligned}\operatorname{div} S ={}& \left(\frac{\partial S_{rr}}{\partial r} + \frac{1}{r}\frac{\partial S_{r\theta}}{\partial\theta} + \frac{1}{r\sin\theta}\frac{\partial S_{r\phi}}{\partial\phi} + \frac{1}{r}\left[2S_{rr} - S_{\theta\theta} - S_{\phi\phi} + \cot\theta\, S_{r\theta}\right]\right)e_r \\ &+ \left(\frac{\partial S_{r\theta}}{\partial r} + \frac{1}{r}\frac{\partial S_{\theta\theta}}{\partial\theta} + \frac{1}{r\sin\theta}\frac{\partial S_{\theta\phi}}{\partial\phi} + \frac{1}{r}\left[3S_{r\theta} + \cot\theta\,(S_{\theta\theta} - S_{\phi\phi})\right]\right)e_\theta \\ &+ \left(\frac{\partial S_{r\phi}}{\partial r} + \frac{1}{r}\frac{\partial S_{\theta\phi}}{\partial\theta} + \frac{1}{r\sin\theta}\frac{\partial S_{\phi\phi}}{\partial\phi} + \frac{1}{r}\left[3S_{r\phi} + 2\cot\theta\, S_{\theta\phi}\right]\right)e_\phi.\end{aligned}$$
Alternatively these could have been obtained from the results of Example 6.2.
Example 6.5: Show that the matrix $[Q]$ defined by (6.22) is a proper orthogonal matrix.

Proof: From (6.22),
$$Q_{ij} = \frac{1}{h_i}\frac{\partial x_j}{\partial \hat x_i},$$
and therefore
$$Q_{ik} Q_{jk} = \frac{1}{h_i h_j}\frac{\partial x_k}{\partial \hat x_i}\frac{\partial x_k}{\partial \hat x_j} = \frac{1}{h_i h_j}\, g_{ij} = \delta_{ij},$$
where in the penultimate step we have used (6.14) and in the ultimate step we have used (6.16). Thus $[Q]$ is an orthogonal matrix. Next, from (6.22) and (6.7),
$$Q_{ij} = \frac{1}{h_i}\frac{\partial x_j}{\partial \hat x_i} = \frac{1}{h_i} J_{ji},$$
where $J_{ij} = \partial x_i/\partial \hat x_j$ are the elements of the Jacobian matrix. Thus
$$\det[Q] = \frac{1}{h_1 h_2 h_3}\det[J] > 0,$$
where the inequality is a consequence of the inequalities in (6.8) and (6.17). Hence $[Q]$ is proper orthogonal.
References
1. H. Reismann and P.S. Pawlik, Elasticity: Theory and Applications, Wiley, 1980.
2. L.A. Segel, Mathematics Applied to Continuum Mechanics, Dover, New York, 1987.
3. E. Sternberg, (Unpublished) Lecture Notes for AM 135: Elasticity, California Institute of Technology,
Pasadena, California, 1976.
Chapter 7
Calculus of Variations
7.1 Introduction.
Numerous problems in physics can be formulated as mathematical problems in optimization.
For example in optics, Fermat’s principle states that the path taken by a ray of light in
propagating from one point to another is the path that minimizes the travel time. Most
equilibrium theories of mechanics involve ﬁnding a conﬁguration of a system that minimizes
its energy. For example a heavy cable that hangs under gravity between two ﬁxed pegs
adopts the shape that, from among all possible shapes, minimizes the gravitational potential
energy of the system. Or, if we subject a straight beam to a compressive load, its deformed
conﬁguration is the shape which minimizes the total energy of the system. Depending on
the load, the energy minimizing conﬁguration may be straight or bent (buckled). If we dip a
(nonplanar) wire loop into soapy water, the soap ﬁlm that forms across the loop is the one
that minimizes the surface energy (which under most circumstances equals minimizing the
surface area of the soap ﬁlm). Another common problem occurs in geodesics where, given
some surface and two points on it, we want to ﬁnd the path of shortest distance joining those
two points which lies entirely on the given surface.
In each of these problems we have a scalar-valued quantity $F$, such as energy or time, that depends on a function $\phi$, such as the shape or path, and we want to find the function $\phi$ that minimizes the quantity $F$ of interest. Note that the scalar-valued function $F$ is defined on a set of functions. One refers to $F$ as a functional and writes $F\{\phi\}$.
As a specific example, consider the so-called Brachistochrone Problem. We are given two points $(0, 0)$ and $(1, h)$ in the $x, y$-plane, with $h > 0$, that are to be joined by a smooth wire. A bead is released from rest from the point $(0, 0)$ and slides along the wire due to gravity. For what shape of wire is the time of travel from $(0, 0)$ to $(1, h)$ least?
Figure 7.1: Curve $y = \phi(x)$ joining $(0, 0)$ to $(1, h)$ along which a bead slides under gravity.
In order to formulate this problem, let $y = \phi(x)$, $0 \le x \le 1$, describe a generic curve joining $(0, 0)$ to $(1, h)$. Let $s(t)$ denote the distance traveled by the bead along the wire at time $t$, so that $v(t) = ds/dt$ is its corresponding speed. The travel time of the bead is
$$T = \int \frac{ds}{v},$$
where the integral is taken along the entire path. In the question posed to us, we are to find the curve, i.e. the function $\phi(x)$, which makes $T$ a minimum. Since we are to minimize $T$ by varying $\phi$, it is natural to first rewrite the formula for $T$ in a form that explicitly displays its dependency on $\phi$.
Note first, that by elementary calculus, the arc length $ds$ is related to $dx$ by
$$ds = \sqrt{dx^2 + dy^2} = \sqrt{1 + (dy/dx)^2}\, dx = \sqrt{1 + (\phi')^2}\, dx,$$
and so we can write
$$T = \int_0^1 \frac{\sqrt{1 + (\phi')^2}}{v}\, dx.$$
Next, we wish to express the speed $v$ in terms of $\phi$. If $(x(t), y(t))$ denote the coordinates of the bead at time $t$, the conservation of energy tells us that the sum of the potential and kinetic energies does not vary with time:
$$-mg\,\phi(x(t)) + \tfrac{1}{2} m v^2(t) = 0,$$
where the right-hand side is the total energy at the initial instant. Solving this for $v$ gives
$$v = \sqrt{2 g \phi}.$$
Finally, substituting this back into the formula for the travel time gives
$$T\{\phi\} = \int_0^1 \sqrt{\frac{1 + (\phi')^2}{2 g \phi}}\, dx. \tag{7.1}$$
Given a curve characterized by $y = \phi(x)$, this formula gives the corresponding travel time for the bead. Our task is to find, from among all such curves, the one that minimizes $T\{\phi\}$.
This minimization takes place over a set of functions $\phi$. In order to complete the formulation of the problem, we should carefully characterize this set of "admissible functions" (or "test functions"). A generic curve is described by $y = \phi(x)$, $0 \le x \le 1$. Since we are only interested in curves that pass through the points $(0, 0)$ and $(1, h)$ we must require that $\phi(0) = 0$, $\phi(1) = h$. Finally, for analytical reasons we only consider curves that are continuous and have a continuous slope, i.e. $\phi$ and $\phi'$ are both continuous on $[0, 1]$. Thus the set $A$ of admissible functions that we wish to consider is
$$A = \left\{\phi(\cdot)\ \middle|\ \phi : [0, 1] \to \mathbb{R},\ \phi \in C^1[0, 1],\ \phi(0) = 0,\ \phi(1) = h\right\}. \tag{7.2}$$
Our task is to minimize $T\{\phi\}$ over the set $A$.
Remark: Since the shortest distance between two points is given by the straight line that
joins them, it is natural to wonder whether a straight line is also the curve that gives the
minimum travel time. To investigate this, consider (a) a straight line, and (b) a circular arc,
that joins (0, 0) to (1, h). Use (7.1) to calculate the travel time for each of these paths and
show that the straight line is not the path that gives the least travel time.
Remark: One can consider various variants of the Brachistochrone Problem. For example, the
length of the curve joining the two points might be prescribed, in which case the minimization
is to be carried out subject to the constraint that the length is given. Or perhaps the position
of the left hand end might be prescribed as above, but the right hand end of the wire might
be allowed to lie anywhere on the vertical line through x = 1. Or, there might be some
prohibited region of the x, yplane through which the path is disallowed from passing. And
so on.
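In the spirit of the first Remark above, one can compare travel times for competing curves numerically. The classical minimizer of (7.1) is a cycloid, along which $ds/v = \sqrt{a/g}\, dt$ exactly, while the straight line $y = hx$ gives the closed form $T = \sqrt{2(1+h^2)/(g h)}$. The Python sketch below (our own; the bisection tolerance and endpoint choice $h = 1$ are ours) confirms that the straight line is not the fastest path:

```python
import math

g, h = 9.81, 1.0

# Straight line y = h x: closed-form travel time from (7.1)
T_line = math.sqrt(2 * (1 + h ** 2) / (g * h))

# Cycloid through (0,0) and (1,h): x = a(t - sin t), y = a(1 - cos t).
# Solve (t - sin t)/(1 - cos t) = 1/h for the final parameter t by bisection.
f = lambda t: (t - math.sin(t)) / (1 - math.cos(t)) - 1.0 / h
lo, hi = 1e-6, 2 * math.pi - 1e-6
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if f(lo) * f(mid) <= 0:
        hi = mid
    else:
        lo = mid
t_star = 0.5 * (lo + hi)
a = h / (1 - math.cos(t_star))

# Along the cycloid, ds/v = sqrt(a/g) dt, so the travel time is exactly:
T_cycloid = math.sqrt(a / g) * t_star

assert T_cycloid < T_line   # the straight line is not the fastest path
```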
In summary, in the simplest problem in the calculus of variations we are required to find a function $\phi(x) \in C^1[0, 1]$ that minimizes a functional $F\{\phi\}$ of the form
$$F\{\phi\} = \int_0^1 f(x, \phi, \phi')\, dx$$
over an admissible set of test functions $A$. The test functions (or admissible functions) $\phi$ are subject to certain conditions including smoothness requirements; possibly (but not necessarily) boundary conditions at both ends $x = 0, 1$; and possibly (but not necessarily) side constraints of various forms. Other types of problems will also be encountered in what follows.
7.2 Brief review of calculus.
Perhaps it is useful to begin by reviewing the familiar question of minimization in calculus. Consider a subset $A$ of $n$-dimensional space $\mathbb{R}^n$ and let $F(x) = F(x_1, x_2, \ldots, x_n)$ be a real-valued function defined on $A$. We say that $x_o \in A$ is a minimizer of $F$ if¹
$$F(x) \ge F(x_o) \quad \text{for all } x \in A. \tag{7.3}$$
Sometimes we are only interested in finding a "local minimizer", i.e. a point $x_o$ that minimizes $F$ relative to all $x$ that are "close" to $x_o$. In order to speak of such a notion we must have a measure of "closeness". Thus suppose that the vector space $\mathbb{R}^n$ is Euclidean so that a norm is defined on $\mathbb{R}^n$. Then we say that $x_o$ is a local minimizer of $F$ if $F(x) \ge F(x_o)$ for all $x$ in a neighborhood of $x_o$, i.e. if
$$F(x) \ge F(x_o) \quad \text{for all } x \text{ such that } |x - x_o| < r \tag{7.4}$$
for some $r > 0$.
Define the function $\hat F(\varepsilon)$ for $-\varepsilon_0 < \varepsilon < \varepsilon_0$ by
$$\hat F(\varepsilon) = F(x_o + \varepsilon n), \tag{7.5}$$
where $n$ is a fixed vector and $\varepsilon_0$ is small enough to ensure that $x_o + \varepsilon n \in A$ for all $\varepsilon \in (-\varepsilon_0, \varepsilon_0)$. In the presence of sufficient smoothness we can write
$$\hat F(\varepsilon) - \hat F(0) = \hat F'(0)\,\varepsilon + \tfrac{1}{2}\hat F''(0)\,\varepsilon^2 + O(\varepsilon^3). \tag{7.6}$$
Since $F(x_o + \varepsilon n) \ge F(x_o)$ it follows that $\hat F(\varepsilon) \ge \hat F(0)$. Thus if $x_o$ is to be a minimizer of $F$ it is necessary that
$$\hat F'(0) = 0, \qquad \hat F''(0) \ge 0. \tag{7.7}$$

¹ A maximizer of $F$ is a minimizer of $-F$, so we don't need to address maximizing separately from minimizing.
7.2. BRIEF REVIEW OF CALCULUS. 127
It is customary to use the following notation and terminology: we set
$$\delta F(x_0, n) = \widehat{F}'(0), \tag{7.8}$$
which is called the first variation of $F$, and similarly set
$$\delta^2 F(x_0, n) = \widehat{F}''(0), \tag{7.9}$$
which is called the second variation of $F$. At an interior local minimizer $x_0$, one necessarily must have
$$\delta F(x_0, n) = 0 \quad \text{and} \quad \delta^2 F(x_0, n) \geq 0 \quad \text{for all unit vectors } n. \tag{7.10}$$

In the present setting of calculus, we know from (7.5), (7.8) that $\delta F(x_0, n) = \nabla F(x_0) \cdot n$ and that $\delta^2 F(x_0, n) = (\nabla\nabla F(x_0))\,n \cdot n$. Here the vector field $\nabla F$ is the gradient of $F$ and the tensor field $\nabla\nabla F$ is the gradient of $\nabla F$. Therefore (7.10) is equivalent to the requirements that
$$\nabla F(x_0) \cdot n = 0 \quad \text{and} \quad (\nabla\nabla F(x_0))\,n \cdot n \geq 0 \quad \text{for all unit vectors } n, \tag{7.11}$$
or equivalently
$$\sum_{i=1}^n \left.\frac{\partial F}{\partial x_i}\right|_{x = x_0} n_i = 0 \quad \text{and} \quad \sum_{i=1}^n \sum_{j=1}^n \left.\frac{\partial^2 F}{\partial x_i\,\partial x_j}\right|_{x = x_0} n_i n_j \geq 0, \tag{7.12}$$
whence we must have $\nabla F(x_0) = o$ and the Hessian $\nabla\nabla F(x_0)$ must be positive semi-definite.
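To make (7.10)-(7.12) concrete, here is a small numerical illustration of my own (not from the text): for the quadratic function $F(x) = (x_1 - 1)^2 + 2(x_2 + 3)^2$, whose minimizer is $(1, -3)$, finite differences confirm that $\widehat{F}'(0) = 0$ and $\widehat{F}''(0) \geq 0$ along any direction $n$. The function and the directions are arbitrary choices made for the example.

```python
# Finite-difference check of the necessary conditions (7.7)/(7.10) at the
# minimizer of a quadratic F on R^2. F and the directions n are arbitrary
# illustrative choices, not from the text.
import math

def F(x):
    return (x[0] - 1.0)**2 + 2.0 * (x[1] + 3.0)**2

x0 = (1.0, -3.0)          # the minimizer of F
h = 1e-5                  # finite-difference step playing the role of epsilon

for n in [(1.0, 0.0), (0.0, 1.0), (math.sqrt(0.5), math.sqrt(0.5))]:
    Fplus  = F((x0[0] + h * n[0], x0[1] + h * n[1]))
    Fminus = F((x0[0] - h * n[0], x0[1] - h * n[1]))
    Fp  = (Fplus - Fminus) / (2.0 * h)            # approximates Fhat'(0)
    Fpp = (Fplus - 2.0 * F(x0) + Fminus) / h**2   # approximates Fhat''(0)
    print(round(Fp, 6), round(Fpp, 6))
```

For a quadratic the central differences are exact up to round-off: the first variation vanishes and the second variation equals $(\nabla\nabla F)\,n \cdot n = 2n_1^2 + 4n_2^2 > 0$.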
Remark: It is worth recalling that a function need not have a minimizer. For example, the function $F_1(x) = x$ defined on $A_1 = (-\infty, \infty)$ is unbounded as $x \to \pm\infty$. Another example is given by the function $F_2(x) = x$ defined on $A_2 = (-1, 1)$, noting that $F_2 \geq -1$ on $A_2$; however, while the value of $F_2$ can get as close as one wishes to $-1$, it cannot actually achieve the value $-1$ since there is no $x \in A_2$ at which $F_2(x) = -1$; note that $-1 \notin A_2$. Finally, consider the function $F_3(x)$ defined on $A_3 = [-1, 1]$, where $F_3(x) = 1$ for $-1 \leq x \leq 0$ and $F_3(x) = x$ for $0 < x \leq 1$; the value of $F_3$ can get as close as one wishes to $0$ but cannot achieve it since $F_3(0) = 1$. In the first example $A_1$ was unbounded. In the second, $A_2$ was bounded but open. And in the third example $A_3$ was bounded and closed but the function was discontinuous on $A_3$. A sufficient condition for a minimizer to exist is that $A$ be compact (i.e. bounded and closed) and $F$ continuous: it can be shown that if $A$ is compact and if $F$ is continuous on $A$, then $F$ assumes both maximum and minimum values on $A$.
7.3 The basic idea: necessary conditions for a minimum: $\delta F = 0$, $\delta^2 F \geq 0$.
In the calculus of variations, we are typically given a functional $F$ defined on a function space $A$, where $F : A \to \mathbb{R}$, and we are asked to find a function $\phi_0 \in A$ that minimizes $F$ over $A$: i.e. to find $\phi_0 \in A$ for which
$$F\{\phi\} \geq F\{\phi_0\} \quad \text{for all } \phi \in A.$$

Figure 7.2: Two functions $\phi_1$ and $\phi_2$ that are "close" in the sense of the norm $\|\cdot\|_0$ but not in the sense of the norm $\|\cdot\|_1$.
Most often, we will be looking for a local (or relative) minimizer, i.e. for a function $\phi_0$ that minimizes $F$ relative to all "nearby" functions. This requires that we select a norm so that the distance between two functions can be quantified. For a function $\phi$ in the set of functions that are continuous on an interval $[x_1, x_2]$, i.e. for $\phi \in C[x_1, x_2]$, one can define a norm by
$$\|\phi\|_0 = \max_{x_1 \leq x \leq x_2} |\phi(x)|.$$
For a function $\phi$ in the set of functions that are continuous and have continuous first derivatives on $[x_1, x_2]$, i.e. for $\phi \in C^1[x_1, x_2]$, one can define a norm by
$$\|\phi\|_1 = \max_{x_1 \leq x \leq x_2} |\phi(x)| + \max_{x_1 \leq x \leq x_2} |\phi'(x)|;$$
and so on. (Of course the norm $\|\phi\|_0$ can also be used on $C^1[x_1, x_2]$.)
When seeking a local minimizer of a functional $F$ we might say we want to find $\phi_0$ for which
$$F\{\phi\} \geq F\{\phi_0\} \quad \text{for all admissible } \phi \text{ such that } \|\phi - \phi_0\|_0 < r$$
for some $r > 0$. In this case the minimizer $\phi_0$ is being compared with all admissible functions $\phi$ whose values are close to those of $\phi_0$ for all $x_1 \leq x \leq x_2$. Such a local minimizer is called a strong minimizer. On the other hand, when seeking a local minimizer we might say we want to find $\phi_0$ for which
$$F\{\phi\} \geq F\{\phi_0\} \quad \text{for all admissible } \phi \text{ such that } \|\phi - \phi_0\|_1 < r$$
for some $r > 0$. In this case the minimizer is being compared with all functions whose values and whose first derivatives are close to those of $\phi_0$ for all $x_1 \leq x \leq x_2$. Such a local minimizer is called a weak minimizer. A strong minimizer is automatically a weak minimizer.
Unless explicitly stated otherwise, in this chapter we will be examining weak local extrema. The approach for finding such extrema of a functional is essentially the same as that used in the more familiar case of calculus reviewed in the preceding section. Consider a functional $F\{\phi\}$ defined on a function space $A$ and suppose that $\phi_0 \in A$ minimizes $F$. In order to determine $\phi_0$ we consider the one-parameter family of admissible functions
$$\phi(x; \varepsilon) = \phi_0(x) + \varepsilon\,\eta(x) \tag{7.13}$$
that are close to $\phi_0$; here $\varepsilon$ is a real variable in the range $-\varepsilon_0 < \varepsilon < \varepsilon_0$ and $\eta(x)$ is a once continuously differentiable function. Since $\phi$ is to be admissible, we must have $\phi_0 + \varepsilon\eta \in A$ for each $\varepsilon \in (-\varepsilon_0, \varepsilon_0)$. Define a function $\widehat{F}(\varepsilon)$ by
$$\widehat{F}(\varepsilon) = F\{\phi_0 + \varepsilon\eta\}, \qquad -\varepsilon_0 < \varepsilon < \varepsilon_0. \tag{7.14}$$
Since $\phi_0$ minimizes $F$, it follows that $F\{\phi_0 + \varepsilon\eta\} \geq F\{\phi_0\}$, or equivalently $\widehat{F}(\varepsilon) \geq \widehat{F}(0)$. Therefore $\varepsilon = 0$ minimizes $\widehat{F}(\varepsilon)$. The first and second variations of $F$ are defined by $\delta F\{\phi_0, \eta\} = \widehat{F}'(0)$ and $\delta^2 F\{\phi_0, \eta\} = \widehat{F}''(0)$ respectively, and so if $\phi_0$ minimizes $F$, then it is necessary that
$$\delta F\{\phi_0, \eta\} = 0, \qquad \delta^2 F\{\phi_0, \eta\} \geq 0. \tag{7.15}$$
These are necessary conditions on a minimizer $\phi_0$. We cannot go further in general. In any specific problem, such as those in the subsequent sections, the necessary condition $\delta F\{\phi_0, \eta\} = 0$ can be further simplified by exploiting the fact that it must hold for all admissible $\eta$. This allows one to eliminate $\eta$, leading to a condition (or conditions) that involves only the minimizer $\phi_0$.
Remark: Note that when $\eta$ is independent of $\varepsilon$, the functions $\phi_0(x)$ and $\phi_0(x) + \varepsilon\eta(x)$, and their derivatives, are close to each other for small $\varepsilon$. On the other hand, the functions $\phi_0(x)$ and $\phi_0(x) + \varepsilon\sin(x/\varepsilon)$ are close to each other but their derivatives are not close to each other. Throughout these notes we will consider functions $\eta$ that are independent of $\varepsilon$ and so, as noted previously, we will be restricting attention exclusively to weak minimizers.
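The remark above can be checked numerically; the following illustration (my own, with the arbitrary choices $\phi_0 = 0$ and $\varepsilon = 10^{-3}$) estimates the two norms of the perturbation $\varepsilon\sin(x/\varepsilon)$ on a sample grid: it is small in $\|\cdot\|_0$ but of order one in $\|\cdot\|_1$.

```python
# Grid estimate of the ||.||_0 and ||.||_1 sizes of eta(x) = eps*sin(x/eps)
# on [0, 1]; eps and the grid size are arbitrary illustrative choices.
import math

eps = 1e-3
xs = [i / 20000.0 for i in range(20001)]                 # sample grid on [0, 1]
d0 = max(abs(eps * math.sin(x / eps)) for x in xs)       # ~ eps: close in ||.||_0
d1 = d0 + max(abs(math.cos(x / eps)) for x in xs)        # derivative term ~ 1
print(d0, d1)
```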
7.4 Application of the necessary condition $\delta F = 0$ to the basic problem. Euler equation.

7.4.1 The basic problem. Euler equation.
Consider the following class of problems: let $A$ be the set of all continuously differentiable functions $\phi(x)$ defined for $0 \leq x \leq 1$ with $\phi(0) = a$, $\phi(1) = b$:
$$A = \left\{ \phi(\cdot) \;\middle|\; \phi : [0,1] \to \mathbb{R},\; \phi \in C^1[0,1],\; \phi(0) = a,\; \phi(1) = b \right\}. \tag{7.16}$$
Let $f(x, y, z)$ be a given function, defined and smooth for all real $x, y, z$. Define a functional $F\{\phi\}$, for every $\phi \in A$, by
$$F\{\phi\} = \int_0^1 f(x, \phi(x), \phi'(x))\,dx. \tag{7.17}$$
We wish to find a function $\phi \in A$ which minimizes $F\{\phi\}$.

Figure 7.3: The minimizer $\phi_0$ and a neighboring function $\phi_0 + \varepsilon\eta$.
Suppose that $\phi_0(x) \in A$ is a minimizer of $F$, so that $F\{\phi\} \geq F\{\phi_0\}$ for all $\phi \in A$. In order to determine $\phi_0$ we consider the one-parameter family of admissible functions $\phi(x; \varepsilon) = \phi_0(x) + \varepsilon\,\eta(x)$, where $\varepsilon$ is a real variable in the range $-\varepsilon_0 < \varepsilon < \varepsilon_0$ and $\eta(x)$ is a once continuously differentiable function on $[0,1]$; see Figure 7.3. Since $\phi$ must be admissible we need $\phi_0 + \varepsilon\eta \in A$ for each $\varepsilon$. Therefore we must have $\phi(0; \varepsilon) = a$ and $\phi(1; \varepsilon) = b$, which in turn requires that
$$\eta(0) = \eta(1) = 0. \tag{7.18}$$
Pick any function $\eta(x)$ with the property (7.18) and fix it. Define the function $\widehat{F}(\varepsilon) = F\{\phi_0 + \varepsilon\eta\}$ so that
$$\widehat{F}(\varepsilon) = F\{\phi_0 + \varepsilon\eta\} = \int_0^1 f(x, \phi_0 + \varepsilon\eta, \phi_0' + \varepsilon\eta')\,dx. \tag{7.19}$$
We know from the analysis of the preceding section that a necessary condition for $\phi_0$ to minimize $F$ is that
$$\delta F\{\phi_0, \eta\} = \widehat{F}'(0) = 0. \tag{7.20}$$
On using the chain rule, we find $\widehat{F}'(\varepsilon)$ from (7.19) to be
$$\widehat{F}'(\varepsilon) = \int_0^1 \left[ \frac{\partial f}{\partial y}(x, \phi_0 + \varepsilon\eta, \phi_0' + \varepsilon\eta')\,\eta + \frac{\partial f}{\partial z}(x, \phi_0 + \varepsilon\eta, \phi_0' + \varepsilon\eta')\,\eta' \right] dx,$$
and so (7.20) leads to
$$\delta F\{\phi_0, \eta\} = \widehat{F}'(0) = \int_0^1 \left[ \frac{\partial f}{\partial y}(x, \phi_0, \phi_0')\,\eta + \frac{\partial f}{\partial z}(x, \phi_0, \phi_0')\,\eta' \right] dx = 0. \tag{7.21}$$

Thus far we have simply repeated the general analysis of the preceding section in the context of the particular functional (7.17). Our goal is to find $\phi_0$ and so we must eliminate $\eta$ from (7.21). To do this we rearrange the terms in (7.21) into a convenient form and exploit the fact that (7.21) must hold for all functions $\eta$ that satisfy (7.18).
In order to do this we proceed as follows. Integrating the second term in (7.21) by parts gives
$$\int_0^1 \frac{\partial f}{\partial z}\,\eta'\,dx = \left[ \eta\,\frac{\partial f}{\partial z} \right]_{x=0}^{x=1} - \int_0^1 \frac{d}{dx}\!\left( \frac{\partial f}{\partial z} \right) \eta\,dx.$$
However, by (7.18) we have $\eta(0) = \eta(1) = 0$ and therefore the first term on the right-hand side drops out. Thus (7.21) reduces to
$$\int_0^1 \left[ \frac{\partial f}{\partial y} - \frac{d}{dx}\!\left( \frac{\partial f}{\partial z} \right) \right] \eta\,dx = 0. \tag{7.22}$$

Though we have viewed $\eta$ as fixed up to this point, we recognize that the above derivation is valid for all once continuously differentiable functions $\eta(x)$ which have $\eta(0) = \eta(1) = 0$. Therefore (7.22) must hold for all such functions.
Lemma: The following is a basic result from calculus. Let $p(x)$ be a continuous function on $[0,1]$ and suppose that
$$\int_0^1 p(x)\,n(x)\,dx = 0$$
for all continuous functions $n(x)$ with $n(0) = n(1) = 0$. Then
$$p(x) = 0 \quad \text{for } 0 \leq x \leq 1.$$
In view of this Lemma we conclude that the integrand of (7.22) must vanish, and we therefore obtain the differential equation
$$\frac{d}{dx}\!\left[ \frac{\partial f}{\partial z}(x, \phi_0, \phi_0') \right] - \frac{\partial f}{\partial y}(x, \phi_0, \phi_0') = 0 \quad \text{for } 0 \leq x \leq 1. \tag{7.23}$$
This is a differential equation for $\phi_0$, which together with the boundary conditions
$$\phi_0(0) = a, \qquad \phi_0(1) = b, \tag{7.24}$$
provides the mathematical problem governing the minimizer $\phi_0(x)$. The differential equation (7.23) is referred to as the Euler equation (sometimes the Euler-Lagrange equation) associated with the functional (7.17).
Notation: In order to avoid the precise though cumbersome notation above, we shall drop the subscript "0" from the minimizing function $\phi_0$; moreover, we shall write the Euler equation (7.23) as
$$\frac{d}{dx}\!\left[ \frac{\partial f}{\partial \phi'}(x, \phi, \phi') \right] - \frac{\partial f}{\partial \phi}(x, \phi, \phi') = 0, \tag{7.25}$$
where, in carrying out the partial differentiation in (7.25), one treats $x$, $\phi$ and $\phi'$ as if they were independent variables.
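As a quick cross-check of (7.25), a computer algebra system can form the Euler equation for a given integrand. The sketch below (assuming SymPy is available; the integrand $f = (\phi')^2/2$ is an arbitrary illustrative choice) uses SymPy's `euler_equations` helper; for this $f$ the Euler equation reduces to $\phi'' = 0$, so the extremals are straight lines.

```python
# Symbolic Euler equation (7.25) for the illustrative integrand f = (phi')^2 / 2,
# formed with sympy's euler_equations helper (assumes sympy is installed).
import sympy as sp
from sympy.calculus.euler import euler_equations

x = sp.symbols('x')
phi = sp.Function('phi')

f = sp.Derivative(phi(x), x)**2 / 2
eqs = euler_equations(f, phi(x), x)
print(eqs)  # the Euler equation phi''(x) = 0 (up to an overall sign)
```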
7.4.2 An example. The Brachistochrone Problem.

Consider the Brachistochrone Problem formulated in the first example of Section 7.1. Here we have
$$f(x, \phi, \phi') = \sqrt{\frac{1 + (\phi')^2}{2g\phi}}$$
and we wish to find the function $\phi_0(x)$ that minimizes
$$F\{\phi\} = \int_0^1 f(x, \phi(x), \phi'(x))\,dx = \int_0^1 \sqrt{\frac{1 + [\phi'(x)]^2}{2g\phi(x)}}\,dx$$
over the class of functions $\phi(x)$ that are continuous and have continuous first derivatives on $[0,1]$, and satisfy the boundary conditions $\phi(0) = 0$, $\phi(1) = h$. Treating $x$, $\phi$ and $\phi'$ as if they are independent variables and differentiating the function $f(x, \phi, \phi')$ gives
$$\frac{\partial f}{\partial \phi} = -\sqrt{\frac{1 + (\phi')^2}{2g}}\;\frac{1}{2\phi^{3/2}}, \qquad \frac{\partial f}{\partial \phi'} = \frac{\phi'}{\sqrt{2g\phi\,(1 + (\phi')^2)}},$$
and therefore, after cancelling the common factor $1/\sqrt{2g}$, the Euler equation (7.23) specializes to
$$\frac{d}{dx}\!\left[ \frac{\phi'}{\sqrt{\phi\,(1 + (\phi')^2)}} \right] + \frac{\sqrt{1 + (\phi')^2}}{2\phi^{3/2}} = 0, \qquad 0 < x < 1, \tag{7.26}$$
with associated boundary conditions
$$\phi(0) = 0, \qquad \phi(1) = h. \tag{7.27}$$
The minimizer $\phi(x)$ therefore must satisfy the boundary-value problem consisting of the second-order (nonlinear) ordinary differential equation (7.26) and the boundary conditions (7.27).
The rest of this subsection has nothing to do with the calculus of variations; it is simply concerned with solving the boundary-value problem (7.26), (7.27). We can write the differential equation as
$$\left[ \frac{\phi'}{\sqrt{\phi\,(1 + (\phi')^2)}} \right] \frac{d}{d\phi}\!\left[ \frac{\phi'}{\sqrt{\phi\,(1 + (\phi')^2)}} \right] + \frac{1}{2\phi^2} = 0,$$
which can be immediately integrated to give
$$\frac{1}{(\phi'(x))^2} = \frac{\phi(x)}{c^2 - \phi(x)}, \tag{7.28}$$
where $c$ is a constant of integration that is to be determined.
It is most convenient to find the path of fastest descent in parametric form, $x = x(\theta)$, $\phi = \phi(\theta)$, $\theta_1 < \theta < \theta_2$, and to this end we adopt the substitution
$$\phi = \frac{c^2}{2}(1 - \cos\theta) = c^2 \sin^2(\theta/2), \qquad \theta_1 < \theta < \theta_2. \tag{7.29}$$
Differentiating this with respect to $x$ gives
$$\phi'(x) = \frac{c^2}{2}\,\sin\theta\;\theta'(x),$$
so that, together with (7.28) and (7.29), this leads to
$$\frac{dx}{d\theta} = \frac{c^2}{2}(1 - \cos\theta),$$
which integrates to give
$$x = \frac{c^2}{2}(\theta - \sin\theta) + c_1, \qquad \theta_1 < \theta < \theta_2. \tag{7.30}$$
We now turn to the boundary conditions. The requirement $\phi = 0$ at $x = 0$, together with (7.29) and (7.30), gives us $\theta_1 = 0$ and $c_1 = 0$. We thus have
$$x = \frac{c^2}{2}(\theta - \sin\theta), \qquad \phi = \frac{c^2}{2}(1 - \cos\theta), \qquad 0 \leq \theta \leq \theta_2. \tag{7.31}$$
The remaining boundary condition $\phi = h$ at $x = 1$ gives the following two equations for finding the two constants $\theta_2$ and $c$:
$$1 = \frac{c^2}{2}(\theta_2 - \sin\theta_2), \qquad h = \frac{c^2}{2}(1 - \cos\theta_2). \tag{7.32}$$
Once this pair of equations is solved for $c$ and $\theta_2$, then (7.31) provides the solution of the problem. We now address the solvability of (7.32).
To this end, first, if we define the function $p(\theta)$ by
$$p(\theta) = \frac{\theta - \sin\theta}{1 - \cos\theta}, \qquad 0 < \theta < 2\pi, \tag{7.33}$$
then, by dividing the first equation in (7.32) by the second, we see that $\theta_2$ is a root of the equation
$$p(\theta_2) = h^{-1}. \tag{7.34}$$
One can readily verify that the function $p(\theta)$ has the properties
$$p \to 0 \text{ as } \theta \to 0^+, \qquad p \to \infty \text{ as } \theta \to 2\pi^-,$$
$$\frac{dp}{d\theta} = \frac{\cos(\theta/2)}{\sin^3(\theta/2)}\,\big(\tan(\theta/2) - \theta/2\big) > 0 \quad \text{for } 0 < \theta < 2\pi.$$

Figure 7.4: A graph of the function $p(\theta)$ defined in (7.33) versus $\theta$. Note that given any $h > 0$ the equation $h^{-1} = p(\theta)$ has a unique root $\theta = \theta_2 \in (0, 2\pi)$.

Therefore it follows that as $\theta$ goes from $0$ to $2\pi$, the function $p(\theta)$ increases monotonically from $0$ to $\infty$; see Figure 7.4. Therefore, given any $h > 0$, the equation $p(\theta_2) = h^{-1}$ can be solved for a unique value of $\theta_2 \in (0, 2\pi)$. The value of $c$ is then given by $(7.32)_1$.

Thus, in summary, the path of minimum descent is given by the curve defined in (7.31) with the values of $\theta_2$ and $c$ given by (7.34) and $(7.32)_1$ respectively. Figure 7.5 shows that the curve (7.31) is a cycloid: the path traversed by a point on the rim of a wheel that rolls without slipping.
Figure 7.5: A cycloid $x = x(\theta)$, $y = y(\theta)$, with $x(\theta) = R(\theta - \sin\theta)$ and $y(\theta) = R(1 - \cos\theta)$, is generated by rolling a circle of radius $R$ along the $x$-axis as shown, the parameter $\theta$ having the significance of being the angle of rolling.
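Because $p(\theta)$ is strictly increasing, equation (7.34) is easy to solve numerically. The following sketch (my own illustration; the value $h = 0.5$ is an arbitrary choice) finds $\theta_2$ by bisection and recovers $c^2$ from $(7.32)_1$, then checks $(7.32)_2$:

```python
# Solve p(theta_2) = 1/h by bisection (p is monotone on (0, 2*pi)), then
# recover c^2 from (7.32)_1. h = 0.5 is an arbitrary illustrative choice.
import math

def p(theta):
    # equation (7.33)
    return (theta - math.sin(theta)) / (1.0 - math.cos(theta))

def solve_brachistochrone(h, tol=1e-12):
    lo, hi = 1e-9, 2.0 * math.pi - 1e-9
    target = 1.0 / h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p(mid) < target:
            lo = mid
        else:
            hi = mid
    theta2 = 0.5 * (lo + hi)
    c2 = 2.0 / (theta2 - math.sin(theta2))    # c^2 from (7.32)_1
    return theta2, c2

theta2, c2 = solve_brachistochrone(0.5)
endpoint_height = 0.5 * c2 * (1.0 - math.cos(theta2))   # should reproduce h
print(theta2, endpoint_height)
```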
7.4.3 A Formalism for Deriving the Euler Equation

In order to expedite the steps involved in deriving the Euler equation, one usually uses the following formal procedure. First, we adopt the following notation: if $H\{\phi\}$ is any quantity that depends on $\phi$, then by $\delta H$ we mean²
$$\delta H = H\{\phi + \varepsilon\eta\} - H\{\phi\} \quad \text{up to linear terms}, \tag{7.35}$$
that is,
$$\delta H = \varepsilon \left.\frac{d H\{\phi + \varepsilon\eta\}}{d\varepsilon}\right|_{\varepsilon = 0}. \tag{7.36}$$
For example, by $\delta\phi$ we mean
$$\delta\phi = (\phi + \varepsilon\eta) - \phi = \varepsilon\eta; \tag{7.37}$$
by $\delta\phi'$ we mean
$$\delta\phi' = (\phi' + \varepsilon\eta') - \phi' = \varepsilon\eta' = (\delta\phi)'; \tag{7.38}$$
by $\delta f$ we mean
$$\delta f = f(x, \phi + \varepsilon\eta, \phi' + \varepsilon\eta') - f(x, \phi, \phi') = \frac{\partial f}{\partial \phi}(x, \phi, \phi')\,\varepsilon\eta + \frac{\partial f}{\partial \phi'}(x, \phi, \phi')\,\varepsilon\eta' = \frac{\partial f}{\partial \phi}\,\delta\phi + \frac{\partial f}{\partial \phi'}\,\delta\phi'; \tag{7.39}$$
and by $\delta F$, or $\delta \int_0^1 f\,dx$, we mean
$$\delta F = F\{\phi + \varepsilon\eta\} - F\{\phi\} = \varepsilon \left[ \frac{d}{d\varepsilon} F\{\phi + \varepsilon\eta\} \right]_{\varepsilon = 0} = \varepsilon \int_0^1 \left[ \frac{\partial f}{\partial \phi}\,\eta + \frac{\partial f}{\partial \phi'}\,\eta' \right] dx = \int_0^1 \left[ \frac{\partial f}{\partial \phi}\,\delta\phi + \frac{\partial f}{\partial \phi'}\,\delta\phi' \right] dx = \int_0^1 \delta f\,dx. \tag{7.40}$$

We refer to $\delta\phi(x)$ as an admissible variation. When $\eta(0) = \eta(1) = 0$, it follows that $\delta\phi(0) = \delta\phi(1) = 0$.

²Note the following minor change in notation: what we call $\delta H$ here is what we previously would have called $\varepsilon\,\delta H$.
We refer to $\delta F$ as the first variation of the functional $F$. Observe from (7.40) that
$$\delta F = \delta \int_0^1 f\,dx = \int_0^1 \delta f\,dx. \tag{7.41}$$
Finally, observe that the necessary condition for a minimum that we wrote down previously can be written as
$$\delta F\{\phi, \delta\phi\} = 0 \quad \text{for all admissible variations } \delta\phi. \tag{7.42}$$

For purposes of illustration, let us now repeat our previous derivation of the Euler equation using this new notation³. Given the functional $F$, a necessary condition for an extremum of $F$ is
$$\delta F = 0,$$
and so our task is to calculate $\delta F$:
$$\delta F = \delta \int_0^1 f\,dx = \int_0^1 \delta f\,dx.$$
Since $f = f(x, \phi, \phi')$, this in turn leads to⁴
$$\delta F = \int_0^1 \left[ \frac{\partial f}{\partial \phi}\,\delta\phi + \frac{\partial f}{\partial \phi'}\,\delta\phi' \right] dx.$$
From here on we can proceed as before by setting $\delta F = 0$, integrating the second term by parts, and using the boundary conditions and the arbitrariness of an admissible variation $\delta\phi(x)$ to derive the Euler equation.
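The formal rule $\delta F = \varepsilon\,\frac{d}{d\varepsilon} F\{\phi + \varepsilon\eta\}\big|_{\varepsilon=0}$ can also be checked numerically. In the sketch below (my own illustration, with the arbitrary choices $f = (\phi')^2/2$, $\phi(x) = x^2$, $\eta(x) = x(1-x)$ and $\varepsilon = 10^{-4}$), the difference $F\{\phi + \varepsilon\eta\} - F\{\phi\}$ agrees with $\int_0^1 \phi'\,(\delta\phi)'\,dx = -\varepsilon/3$ to within $O(\varepsilon^2)$:

```python
# Numerical check of the first variation for F{phi} = int_0^1 (phi')^2/2 dx
# with phi(x) = x^2 and eta(x) = x(1-x); all choices are illustrative.
N = 10000
dx = 1.0 / N

def F_of(eps):
    # midpoint quadrature of (phi' + eps*eta')^2 / 2 over [0, 1]
    s = 0.0
    for i in range(N):
        x = (i + 0.5) * dx
        slope = 2.0 * x + eps * (1.0 - 2.0 * x)   # phi'(x) + eps*eta'(x)
        s += 0.5 * slope * slope * dx
    return s

eps = 1e-4
dF = F_of(eps) - F_of(0.0)
first_variation = eps * sum((2.0 * (i + 0.5) * dx) * (1.0 - 2.0 * (i + 0.5) * dx) * dx
                            for i in range(N))     # eps * int phi' eta' dx = -eps/3
print(dF, first_variation)
```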
7.5 Generalizations.

7.5.1 Generalization: Free endpoint; Natural boundary conditions.

Consider the following modified problem: suppose that we want to find the function $\phi(x)$, from among all once continuously differentiable functions, that makes the functional
$$F\{\phi\} = \int_0^1 f(x, \phi, \phi')\,dx$$
a minimum. Note that we do not restrict attention here to those functions that satisfy $\phi(0) = a$, $\phi(1) = b$. So the set of admissible functions $A$ is
$$A = \left\{ \phi(\cdot) \;\middle|\; \phi : [0,1] \to \mathbb{R},\; \phi \in C^1[0,1] \right\}. \tag{7.43}$$
Note that this class of admissible functions $A$ is much larger than before. The functional $F\{\phi\}$ is defined for all $\phi \in A$ by
$$F\{\phi\} = \int_0^1 f(x, \phi, \phi')\,dx. \tag{7.44}$$

³If ever in doubt about a particular step during a calculation, always go back to the meaning of the symbols $\delta\phi$, etc., or revert to using $\varepsilon$, $\eta$, etc.
⁴Note that the variation $\delta$ does not operate on $x$, since it is the function $\phi$ that is being varied, not the independent variable $x$. So in particular $\delta f = f_\phi\,\delta\phi + f_{\phi'}\,\delta\phi'$ and not $\delta f = f_x\,\delta x + f_\phi\,\delta\phi + f_{\phi'}\,\delta\phi'$.
We begin by calculating the first variation of $F$:
$$\delta F = \delta \int_0^1 f\,dx = \int_0^1 \delta f\,dx = \int_0^1 \left[ \frac{\partial f}{\partial \phi}\,\delta\phi + \frac{\partial f}{\partial \phi'}\,\delta\phi' \right] dx. \tag{7.45}$$
Integrating the last term by parts yields
$$\delta F = \int_0^1 \left[ \frac{\partial f}{\partial \phi} - \frac{d}{dx}\!\left( \frac{\partial f}{\partial \phi'} \right) \right] \delta\phi\,dx + \left[ \frac{\partial f}{\partial \phi'}\,\delta\phi \right]_0^1. \tag{7.46}$$
Since $\delta F = 0$ at an extremum, we must have
$$\int_0^1 \left[ \frac{\partial f}{\partial \phi} - \frac{d}{dx}\!\left( \frac{\partial f}{\partial \phi'} \right) \right] \delta\phi\,dx + \left[ \frac{\partial f}{\partial \phi'}\,\delta\phi \right]_0^1 = 0 \tag{7.47}$$
for all admissible variations $\delta\phi(x)$. Note that the boundary term in (7.47) does not automatically drop out now, because $\delta\phi(0)$ and $\delta\phi(1)$ do not have to vanish. First restrict attention to those variations $\delta\phi$ with the additional property $\delta\phi(0) = \delta\phi(1) = 0$; equation (7.47) must necessarily hold for all such variations $\delta\phi$. The boundary terms then drop out, and by the Lemma in Section 7.4.1 it follows that
$$\frac{d}{dx}\!\left[ \frac{\partial f}{\partial \phi'} \right] - \frac{\partial f}{\partial \phi} = 0 \quad \text{for } 0 < x < 1. \tag{7.48}$$
This is the same Euler equation as before. Next, return to (7.47) and keep (7.48) in mind. We see that we must have
$$\left.\frac{\partial f}{\partial \phi'}\right|_{x=1} \delta\phi(1) - \left.\frac{\partial f}{\partial \phi'}\right|_{x=0} \delta\phi(0) = 0 \tag{7.49}$$
for all admissible variations $\delta\phi$. Since $\delta\phi(0)$ and $\delta\phi(1)$ are both arbitrary (and not necessarily zero), (7.49) requires that
$$\frac{\partial f}{\partial \phi'} = 0 \quad \text{at } x = 0 \text{ and } x = 1. \tag{7.50}$$
Equation (7.50) provides the boundary conditions to be satisfied by the extremizing function $\phi(x)$. These boundary conditions were determined as part of the extremization; they are referred to as natural boundary conditions, in contrast to boundary conditions that are prescribed as part of the problem statement.
Example: Reconsider the Brachistochrone Problem analyzed previously, but now suppose that we want to find the shape of the wire that commences from $(0, 0)$ and ends somewhere on the vertical line through $x = 1$; see Figure 7.6. The only difference between this and the first Brachistochrone Problem is that here the set of admissible functions is
$$A_2 = \left\{ \phi(\cdot) \;\middle|\; \phi : [0,1] \to \mathbb{R},\; \phi \in C^1[0,1],\; \phi(0) = 0 \right\};$$
note that there is no restriction on $\phi$ at $x = 1$. Our task is to minimize the travel time of the bead $T\{\phi\}$ over the set $A_2$.

Figure 7.6: Curve joining $(0, 0)$ to an arbitrary point on the vertical line through $x = 1$.

The minimizer must satisfy the same Euler equation (7.26) as in the first problem, and the same boundary condition $\phi(0) = 0$ at the left-hand end. To find the natural boundary condition at the other end, recall that
$$f(x, \phi, \phi') = \sqrt{\frac{1 + (\phi')^2}{2g\phi}}.$$
Differentiating this gives
$$\frac{\partial f}{\partial \phi'} = \frac{\phi'}{\sqrt{2g\phi\,(1 + (\phi')^2)}},$$
and so by (7.50) the natural boundary condition is
$$\frac{\phi'}{\sqrt{2g\phi\,(1 + (\phi')^2)}} = 0 \quad \text{at } x = 1,$$
which simplifies to
$$\phi'(1) = 0.$$
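In the parametric solution (7.31), $d\phi/dx = \sin\theta/(1 - \cos\theta)$, so the natural boundary condition $\phi'(1) = 0$ is met exactly when $\theta_2 = \pi$, i.e. when the wire ends at the lowest point of the cycloid. A small numerical check of my own (not from the text):

```python
# With theta_2 = pi the cycloid (7.31) meets x = 1 with zero slope, and the
# exit height works out to h = 2/pi; a quick numerical confirmation.
import math

theta2 = math.pi
c2 = 2.0 / (theta2 - math.sin(theta2))                       # from x = 1 at theta_2
slope_at_end = math.sin(theta2) / (1.0 - math.cos(theta2))   # dphi/dx at theta_2
h = 0.5 * c2 * (1.0 - math.cos(theta2))                      # height of the free end
print(slope_at_end, h)
```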
7.5.2 Generalization: Higher derivatives.

The functional $F\{\phi\}$ considered above involved a function $\phi$ and its first derivative $\phi'$. One can consider functionals that involve higher derivatives of $\phi$, for example
$$F\{\phi\} = \int_0^1 f(x, \phi, \phi', \phi'')\,dx.$$
We begin with the formulation and analysis of a specific example and then turn to some theory.

Figure 7.7: The neutral axis of a beam in reference (straight) and deformed (curved) states. The bold lines represent a cross-section of the beam in the reference and deformed states. In the classical Bernoulli-Euler theory of beams, cross-sections remain perpendicular to the neutral axis; the bending moment is $M = EI\phi'$ and the elastic energy per unit length is $\frac{1}{2}M\phi' = \frac{1}{2}EI(u'')^2$.
Example: The Bernoulli-Euler Beam. Consider an elastic beam of length $L$ and bending stiffness $EI$, which is clamped at its left-hand end. The beam carries a distributed load $p(x)$ along its length and a concentrated force $F$ at the right-hand end $x = L$; both loads act in the $-y$-direction. Let $u(x)$, $0 \leq x \leq L$, be a geometrically admissible deflection of the beam. Since the beam is clamped at the left-hand end, this means that $u(x)$ is any (smooth enough) function that satisfies the geometric boundary conditions
$$u(0) = 0, \qquad u'(0) = 0; \tag{7.51}$$
the boundary condition $(7.51)_1$ describes the geometric condition that the beam is clamped at $x = 0$ and therefore cannot deflect at that point; the boundary condition $(7.51)_2$ describes the geometric condition that the beam is clamped at $x = 0$ and therefore cannot rotate at the left end. The set of admissible test functions that we consider is
$$A = \left\{ u(\cdot) \;\middle|\; u : [0, L] \to \mathbb{R},\; u \in C^4[0, L],\; u(0) = 0,\; u'(0) = 0 \right\}, \tag{7.52}$$
which consists of all "geometrically possible configurations".

From elasticity theory we know that the elastic energy associated with a deformed configuration of the beam is $\frac{1}{2}EI(u'')^2$ per unit length. Therefore the total potential energy of the system is
$$\Phi\{u\} = \int_0^L \frac{1}{2}\,EI\,(u''(x))^2\,dx - \int_0^L p(x)\,u(x)\,dx - F\,u(L), \tag{7.53}$$
where the last two terms represent the potential energy of the distributed and concentrated loading respectively; the negative signs in front of these terms arise because the loads act in the $-y$-direction while $u$ is the deflection in the $+y$-direction. Note that the integrand of the functional involves the higher derivative term $u''$. In addition, note that only two boundary conditions $u(0) = 0$, $u'(0) = 0$ are given, and so we expect to derive additional natural boundary conditions at the right-hand end $x = L$.
The actual deflection of the beam minimizes the potential energy (7.53) over the set (7.52). We proceed in the usual way by calculating the first variation $\delta\Phi$ and setting it equal to zero:
$$\delta\Phi = 0.$$
By using (7.53) this can be written explicitly as
$$\int_0^L EI\,u''\,\delta u''\,dx - \int_0^L p\,\delta u\,dx - F\,\delta u(L) = 0.$$
Twice integrating the first term by parts leads to
$$\int_0^L EI\,u''''\,\delta u\,dx - \int_0^L p\,\delta u\,dx - F\,\delta u(L) - \left[ EI\,u'''\,\delta u \right]_0^L + \left[ EI\,u''\,\delta u' \right]_0^L = 0.$$
The given boundary conditions (7.51) require that an admissible variation $\delta u$ must obey $\delta u(0) = 0$, $\delta u'(0) = 0$. Therefore the preceding equation simplifies to
$$\int_0^L \left( EI\,u'''' - p \right) \delta u\,dx - \left[ EI\,u'''(L) + F \right] \delta u(L) + EI\,u''(L)\,\delta u'(L) = 0.$$
Since this must hold for all admissible variations $\delta u(x)$, it follows in the usual way that the extremizing function $u(x)$ must obey
$$EI\,u''''(x) - p(x) = 0 \quad \text{for } 0 < x < L, \qquad EI\,u'''(L) + F = 0, \qquad EI\,u''(L) = 0. \tag{7.54}$$
Thus the extremizer $u(x)$ obeys the fourth-order linear ordinary differential equation $(7.54)_1$, the prescribed boundary conditions (7.51) and the natural boundary conditions $(7.54)_{2,3}$. The natural boundary condition $(7.54)_2$ describes the mechanical condition that the beam carries a concentrated force $F$ at the right-hand end; and the natural boundary condition $(7.54)_3$ describes the mechanical condition that the beam is free to rotate (and therefore has zero "bending moment") at the right-hand end.
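For constant $EI$ and a uniform load $p(x) = p_0$, the boundary-value problem (7.54), (7.51) can be solved in closed form. The sketch below (assuming SymPy is available; constant coefficients are an assumption made for the example) solves it symbolically and, for $p_0 = 0$, recovers the classical cantilever tip deflection $FL^3/(3EI)$:

```python
# Symbolic solution of EI u'''' = p0 with u(0) = u'(0) = 0, u''(L) = 0 and
# EI u'''(L) + F = 0, i.e. the problem (7.54)/(7.51) with constant EI and
# uniform load p0 (assumes sympy is installed).
import sympy as sp

x, L, EI, p0, F = sp.symbols('x L EI p0 F', positive=True)
C1, C2, C3, C4 = sp.symbols('C1 C2 C3 C4')

# General solution of the fourth-order equation (7.54)_1:
u = p0 * x**4 / (24 * EI) + C1 * x**3 + C2 * x**2 + C3 * x + C4

conditions = [
    u.subs(x, 0),                             # u(0) = 0
    sp.diff(u, x).subs(x, 0),                 # u'(0) = 0
    sp.diff(u, x, 2).subs(x, L),              # u''(L) = 0  (zero moment)
    EI * sp.diff(u, x, 3).subs(x, L) + F,     # EI u'''(L) + F = 0  (shear)
]
constants = sp.solve(conditions, [C1, C2, C3, C4])
u_sol = sp.simplify(u.subs(constants))

tip = sp.simplify(u_sol.subs(p0, 0).subs(x, L))
print(tip)  # F*L**3/(3*EI), the classical cantilever tip deflection
```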
Exercise: Consider the functional
$$F\{\phi\} = \int_0^1 f(x, \phi, \phi', \phi'')\,dx$$
defined on the set of admissible functions $A$ consisting of functions $\phi$ that are defined and four times continuously differentiable on $[0, 1]$ and that satisfy the four boundary conditions
$$\phi(0) = \phi_0, \qquad \phi'(0) = \phi_0', \qquad \phi(1) = \phi_1, \qquad \phi'(1) = \phi_1'.$$
Show that the function $\phi$ that extremizes $F$ over the set $A$ must satisfy the Euler equation
$$\frac{\partial f}{\partial \phi} - \frac{d}{dx}\!\left( \frac{\partial f}{\partial \phi'} \right) + \frac{d^2}{dx^2}\!\left( \frac{\partial f}{\partial \phi''} \right) = 0 \quad \text{for } 0 < x < 1,$$
where, as before, the partial derivatives $\partial f/\partial\phi$, $\partial f/\partial\phi'$ and $\partial f/\partial\phi''$ are calculated by treating $\phi$, $\phi'$ and $\phi''$ as if they are independent variables in $f(x, \phi, \phi', \phi'')$.
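The SymPy helper used earlier also handles higher derivatives, which gives a quick check of this exercise. For the arbitrary illustrative integrand $f = (\phi'')^2/2$, the Euler equation of the exercise reduces to $\phi'''' = 0$ (assumes SymPy is available):

```python
# Higher-derivative Euler equation check: for f = (phi'')^2 / 2 the equation
# d^2/dx^2 (df/dphi'') = 0 reduces to phi''''(x) = 0 (assumes sympy).
import sympy as sp
from sympy.calculus.euler import euler_equations

x = sp.symbols('x')
phi = sp.Function('phi')

f = sp.Derivative(phi(x), x, 2)**2 / 2
eqs = euler_equations(f, phi(x), x)
print(eqs)  # the Euler equation phi''''(x) = 0 (up to an overall sign)
```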
7.5.3 Generalization: Multiple functions.

The functional $F\{\phi\}$ considered above involved a single function $\phi$ and its derivatives. One can consider functionals that involve multiple functions, for example a functional
$$F\{u, v\} = \int_0^1 f(x, u, u', v, v')\,dx$$
that involves two functions $u(x)$ and $v(x)$. We begin with the formulation and analysis of a specific example and then turn to some theory.
Example: The Timoshenko Beam. Consider a beam of length $L$, bending stiffness⁵ $EI$ and shear stiffness $GA$. The beam is clamped at $x = 0$; it carries a distributed load $p(x)$ along its length, which acts in the $-y$-direction, and a concentrated force $F$ at the end $x = L$, also in the $-y$-direction.

In the simplest model of a beam, the so-called Bernoulli-Euler model, the deformed state of the beam is completely defined by the deflection $u(x)$ of the centerline (the neutral axis) of the beam. In that theory, shear deformations are neglected and therefore a cross-section of the beam remains perpendicular to the neutral axis even in the deformed state. Here we discuss a more general theory of beams, one that accounts for shear deformations.

Figure 7.8: The neutral axis of a beam in reference (straight) and deformed (curved) states. The bold lines represent a cross-section of the beam in the reference and deformed states. The thin line is perpendicular to the deformed neutral axis, so that in the classical Bernoulli-Euler theory of beams, where cross-sections remain perpendicular to the neutral axis, the thin line and the bold line would coincide. The angle between the vertical and the bold line is $\phi$. The angle between the neutral axis and the horizontal, which equals the angle between the perpendicular to the neutral axis (the thin line) and the vertical dashed line, is $u'$. The decrease in the angle between the cross-section and the neutral axis is therefore $u' - \phi$.

In the theory considered here, a cross-section of the beam is not constrained to remain perpendicular to the neutral axis. Thus a deformed state of the beam is characterized by two fields: one, $u(x)$, characterizes the deflection of the centerline of the beam at a location $x$, and the second, $\phi(x)$, characterizes the rotation of the cross-section at $x$. (In the Bernoulli-Euler model, $\phi(x) = u'(x)$ since, for small angles, the rotation equals the slope.) The fact that the left-hand end is clamped implies that the point $x = 0$ cannot deflect and that the cross-section at $x = 0$ cannot rotate. Thus we have the geometric boundary conditions
$$u(0) = 0, \qquad \phi(0) = 0. \tag{7.55}$$
Note that the zero-rotation boundary condition is $\phi(0) = 0$ and not $u'(0) = 0$.

In the more accurate beam theory discussed here, the so-called Timoshenko beam theory, one does not neglect shear deformations, and so $u(x)$ and $\phi(x)$ are (geometrically) independent functions. Since the shear strain is defined as the change in angle between two fibers that are initially at right angles to each other, the shear strain in the present situation is
$$\gamma(x) = u'(x) - \phi(x);$$
see Figure 7.8. Observe that in the Bernoulli-Euler theory $\gamma(x) = 0$.

⁵$E$ and $G$ are the Young's modulus and shear modulus of the material, while $I$ and $A$ are the second moment of the cross-section and the area of the cross-section respectively.
Figure 7.9: Basic constitutive relationships for a beam: the moment-curvature relation $M = EI\phi'$ and the shear force-shear strain relation $S = GA\gamma$.

The basic equations of elasticity tell us that the moment-curvature relation for bending is
$$M(x) = EI\,\phi'(x)$$
and that the associated elastic energy per unit length of the beam, $\frac{1}{2}M\phi'$, is
$$\frac{1}{2}\,EI\,(\phi'(x))^2.$$
Similarly, we know from elasticity that the shear force-shear strain relation for a beam is⁶
$$S(x) = GA\,\gamma(x)$$
and that the associated elastic energy per unit length of the beam, $\frac{1}{2}S\gamma$, is
$$\frac{1}{2}\,GA\,(\gamma(x))^2.$$
The total potential energy of the system is thus
$$\Phi = \Phi\{u, \phi\} = \int_0^L \left[ \frac{1}{2}\,EI\,(\phi'(x))^2 + \frac{1}{2}\,GA\,(u'(x) - \phi(x))^2 \right] dx - \int_0^L p\,u(x)\,dx - F\,u(L), \tag{7.56}$$
where the last two terms in this expression represent the potential energy of the distributed and concentrated loading respectively (and the negative signs arise because $u$ is the deflection in the $+y$-direction while the loadings $p$ and $F$ are applied in the $-y$-direction). We allow for the possibility that $p$, $EI$ and $GA$ may vary along the length of the beam and therefore might be functions of $x$.

The displacement and rotation fields $u(x)$ and $\phi(x)$ associated with an equilibrium configuration of the beam minimize the potential energy $\Phi\{u, \phi\}$ over the admissible set $A$ of test functions, where we take
$$A = \left\{ u(\cdot), \phi(\cdot) \;\middle|\; u : [0, L] \to \mathbb{R},\; \phi : [0, L] \to \mathbb{R},\; u \in C^2[0, L],\; \phi \in C^2[0, L],\; u(0) = 0,\; \phi(0) = 0 \right\}.$$
Note that all admissible functions are required to satisfy the geometric boundary conditions (7.55).
To find a minimizer of $\Phi$ we calculate its first variation, which from (7.56) is
$$\delta\Phi = \int_0^L \left[ EI\,\phi'\,\delta\phi' + GA\,(u' - \phi)(\delta u' - \delta\phi) \right] dx - \int_0^L p\,\delta u\,dx - F\,\delta u(L).$$

⁶Since the top and bottom faces of the differential element shown in Figure 7.9 are free of shear traction, we know that the element is not in a state of simple shear. Instead, the shear stress must vary with $y$ such that it vanishes at the top and bottom. In engineering practice, this is taken into account approximately by replacing $GA$ by $\kappa GA$, where the heuristic parameter $\kappa \approx 0.8$-$0.9$.
Integrating the terms involving $\delta u'$ and $\delta\phi'$ by parts gives
$$\delta\Phi = \left[ EI\,\phi'\,\delta\phi \right]_0^L - \int_0^L \frac{d}{dx}\!\left( EI\,\phi' \right) \delta\phi\,dx + \left[ GA\,(u' - \phi)\,\delta u \right]_0^L - \int_0^L \frac{d}{dx}\!\left[ GA\,(u' - \phi) \right] \delta u\,dx - \int_0^L GA\,(u' - \phi)\,\delta\phi\,dx - \int_0^L p\,\delta u\,dx - F\,\delta u(L).$$
Finally, on using the facts that an admissible variation must satisfy $\delta u(0) = 0$ and $\delta\phi(0) = 0$, and collecting like terms, the preceding equation leads to
$$\delta\Phi = EI\,\phi'(L)\,\delta\phi(L) + \left[ GA\big(u'(L) - \phi(L)\big) - F \right] \delta u(L) - \int_0^L \left[ \frac{d}{dx}\!\left( EI\,\phi' \right) + GA\,(u' - \phi) \right] \delta\phi(x)\,dx - \int_0^L \left[ \frac{d}{dx}\!\left[ GA\,(u' - \phi) \right] + p \right] \delta u(x)\,dx. \tag{7.57}$$

At a minimizer we have $\delta\Phi = 0$ for all admissible variations. Since the variations $\delta u(x)$, $\delta\phi(x)$ are arbitrary on $0 < x < L$, and since $\delta u(L)$ and $\delta\phi(L)$ are also arbitrary, it follows from (7.57) that the field equations
$$\frac{d}{dx}\!\left( EI\,\phi' \right) + GA\,(u' - \phi) = 0, \qquad \frac{d}{dx}\!\left[ GA\,(u' - \phi) \right] + p = 0, \qquad 0 < x < L, \tag{7.58}$$
and the natural boundary conditions
$$EI\,\phi'(L) = 0, \qquad GA\big(u'(L) - \phi(L)\big) = F \tag{7.59}$$
must hold.

Thus, in summary, an equilibrium configuration of the beam is described by the deflection $u(x)$ and rotation $\phi(x)$ that satisfy the differential equations (7.58) and the boundary conditions (7.55), (7.59). [Remark: Can you recover the Bernoulli-Euler theory from the Timoshenko theory in the limit as the shear rigidity $GA \to \infty$?]
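The interior field equations (7.58) can also be generated symbolically from the energy density of (7.56). The sketch below (assuming SymPy is available, and assuming constant $EI$, $GA$ and $p$ for simplicity) applies `euler_equations` to the two fields $u$ and $\phi$ and recovers (7.58) up to an overall sign:

```python
# Two-field Euler equations for the Timoshenko energy density of (7.56),
# with EI, GA and p treated as constants (assumes sympy is installed).
import sympy as sp
from sympy.calculus.euler import euler_equations

x = sp.symbols('x')
EI, GA, p = sp.symbols('EI GA p', positive=True)
u = sp.Function('u')
phi = sp.Function('phi')

# bending energy + shear energy - potential of the distributed load
Ldens = (sp.Rational(1, 2) * EI * sp.Derivative(phi(x), x)**2
         + sp.Rational(1, 2) * GA * (sp.Derivative(u(x), x) - phi(x))**2
         - p * u(x))

eqs = euler_equations(Ldens, [u(x), phi(x)], x)
for eq in eqs:
    print(sp.simplify(eq.lhs - eq.rhs))
```

Up to sign, the two printed expressions are $(EI\phi')' + GA(u' - \phi)$ and $\big(GA(u' - \phi)\big)' + p$, i.e. the field equations (7.58).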
Exercise: Consider a smooth function $f(x, y_1, y_2, \ldots, y_n, z_1, z_2, \ldots, z_n)$ defined for all $x, y_1, y_2, \ldots, y_n, z_1, z_2, \ldots, z_n$. Let $\phi_1(x), \phi_2(x), \ldots, \phi_n(x)$ be $n$ once-continuously differentiable functions on $[0, 1]$ with $\phi_i(0) = a_i$, $\phi_i(1) = b_i$. Let $F$ be the functional defined by
$$F\{\phi_1, \phi_2, \ldots, \phi_n\} = \int_0^1 f(x, \phi_1, \phi_2, \ldots, \phi_n, \phi_1', \phi_2', \ldots, \phi_n')\,dx \tag{7.60}$$
on the set of all such admissible functions. Show that the functions $\phi_1(x), \phi_2(x), \ldots, \phi_n(x)$ that extremize $F$ must necessarily satisfy the $n$ Euler equations
$$\frac{d}{dx}\!\left[ \frac{\partial f}{\partial \phi_i'} \right] - \frac{\partial f}{\partial \phi_i} = 0 \quad \text{for } 0 < x < 1, \qquad (i = 1, 2, \ldots, n). \tag{7.61}$$
7.5.4 Generalization: End point of extremal lying on a curve.

Consider the set $A$ of all functions that describe curves in the $x, y$-plane that commence from a given point $(0, a)$ and end at some point on the curve $G(x, y) = 0$. We wish to minimize a functional $F\{\phi\}$ over this set of functions.

Figure 7.10: Curve joining $(0, a)$ to an arbitrary point on the given curve $G(x, y) = 0$.

Suppose that $\phi(x) \in A$ is a minimizer of $F$. Let $x = x_R$ be the abscissa of the point at which the curve $y = \phi(x)$ intersects the curve $G(x, y) = 0$. Observe that $x_R$ is not known a priori and is to be determined along with $\phi$. Moreover, note that the abscissa of the point at which a neighboring curve defined by $y = \phi(x) + \delta\phi(x)$ intersects the curve $G = 0$ is not $x_R$ but $x_R + \delta x_R$; see Figure 7.10.
At the minimizer,
F{φ} =
x
R
0
f(x, φ, φ
)dx
148 CHAPTER 7. CALCULUS OF VARIATIONS
and at a neighboring test function

$$F\{\phi + \delta\phi\} = \int_0^{x_R + \delta x_R} f(x, \phi + \delta\phi, \phi' + \delta\phi')\,dx.$$

Therefore on calculating the first variation δF, which equals the linearized form of F{φ + δφ} − F{φ}, we find

$$\delta F = \int_0^{x_R + \delta x_R} \Bigl[ f(x,\phi,\phi') + f_\phi(x,\phi,\phi')\,\delta\phi + f_{\phi'}(x,\phi,\phi')\,\delta\phi' \Bigr]\,dx - \int_0^{x_R} f(x,\phi,\phi')\,dx$$

where we have set f_φ = ∂f/∂φ and f_{φ'} = ∂f/∂φ'. This leads to
$$\delta F = \int_{x_R}^{x_R + \delta x_R} f(x,\phi,\phi')\,dx + \int_0^{x_R} \Bigl[ f_\phi\,\delta\phi + f_{\phi'}\,\delta\phi' \Bigr]\,dx$$

which in turn reduces to

$$\delta F = f\bigl(x_R, \phi(x_R), \phi'(x_R)\bigr)\,\delta x_R + \int_0^{x_R} \Bigl[ f_\phi\,\delta\phi + f_{\phi'}\,\delta\phi' \Bigr]\,dx.$$

Thus setting the first variation δF equal to zero gives

$$f\bigl(x_R, \phi(x_R), \phi'(x_R)\bigr)\,\delta x_R + \int_0^{x_R} \Bigl[ f_\phi\,\delta\phi + f_{\phi'}\,\delta\phi' \Bigr]\,dx = 0.$$
After integrating the last term by parts we get

$$f\bigl(x_R, \phi(x_R), \phi'(x_R)\bigr)\,\delta x_R + \Bigl[ f_{\phi'}\,\delta\phi \Bigr]_0^{x_R} + \int_0^{x_R} \left[ f_\phi - \frac{d}{dx} f_{\phi'} \right] \delta\phi\,dx = 0$$

which, on using the fact that δφ(0) = 0, reduces to

$$f\bigl(x_R, \phi(x_R), \phi'(x_R)\bigr)\,\delta x_R + f_{\phi'}\bigl(x_R, \phi(x_R), \phi'(x_R)\bigr)\,\delta\phi(x_R) + \int_0^{x_R} \left[ f_\phi - \frac{d}{dx} f_{\phi'} \right] \delta\phi\,dx = 0. \tag{7.62}$$
First limit attention to the subset of all test functions that terminate at the same point (x_R, φ(x_R)) as the minimizer. In this case δx_R = 0 and δφ(x_R) = 0 and so the first two terms in (7.62) vanish. Since this specialized version of equation (7.62) must hold for all such variations δφ(x), this leads to the Euler equation

$$f_\phi - \frac{d}{dx} f_{\phi'} = 0, \quad 0 \le x \le x_R. \tag{7.63}$$
We now return to arbitrary admissible test functions. Substituting (7.63) into (7.62) gives

$$f\bigl(x_R, \phi(x_R), \phi'(x_R)\bigr)\,\delta x_R + f_{\phi'}\bigl(x_R, \phi(x_R), \phi'(x_R)\bigr)\,\delta\phi(x_R) = 0 \tag{7.64}$$
which must hold for all admissible δx_R and δφ(x_R). It is important to observe that since admissible test curves must end on the curve G = 0, the quantities δx_R and δφ(x_R) are not independent of each other. Thus (7.64) does not hold for all δx_R and δφ(x_R); only for those that are consistent with this geometric requirement. The requirement that the minimizing curve and the neighboring test curve terminate on the curve G(x, y) = 0 implies that

$$G\bigl(x_R, \phi(x_R)\bigr) = 0, \qquad G\bigl(x_R + \delta x_R,\ \phi(x_R + \delta x_R) + \delta\phi(x_R + \delta x_R)\bigr) = 0.$$
Note that linearization gives

$$\begin{aligned}
G\bigl(x_R + \delta x_R,\ \phi(x_R + \delta x_R) + \delta\phi(x_R + \delta x_R)\bigr)
&= G\bigl(x_R + \delta x_R,\ \phi(x_R) + \phi'(x_R)\,\delta x_R + \delta\phi(x_R)\bigr) \\
&= G\bigl(x_R, \phi(x_R)\bigr) + G_x\bigl(x_R, \phi(x_R)\bigr)\,\delta x_R + G_y\bigl(x_R, \phi(x_R)\bigr)\bigl[\phi'(x_R)\,\delta x_R + \delta\phi(x_R)\bigr] \\
&= G\bigl(x_R, \phi(x_R)\bigr) + \bigl[G_x\bigl(x_R, \phi(x_R)\bigr) + \phi'(x_R)\,G_y\bigl(x_R, \phi(x_R)\bigr)\bigr]\,\delta x_R + G_y\bigl(x_R, \phi(x_R)\bigr)\,\delta\phi(x_R)
\end{aligned}$$
where we have set G_x = ∂G/∂x and G_y = ∂G/∂y. Setting δG = G(x_R + δx_R, φ(x_R + δx_R) + δφ(x_R + δx_R)) − G(x_R, φ(x_R)) = 0 thus leads to the following relation between the variations δx_R and δφ(x_R):

$$\bigl[G_x\bigl(x_R, \phi(x_R)\bigr) + \phi'(x_R)\,G_y\bigl(x_R, \phi(x_R)\bigr)\bigr]\,\delta x_R + G_y\bigl(x_R, \phi(x_R)\bigr)\,\delta\phi(x_R) = 0. \tag{7.65}$$
Thus (7.64) must hold for all δx_R and δφ(x_R) that satisfy (7.65). This implies that^7

$$\begin{aligned}
f\bigl(x_R, \phi(x_R), \phi'(x_R)\bigr) - \lambda\bigl[G_x\bigl(x_R, \phi(x_R)\bigr) + \phi'(x_R)\,G_y\bigl(x_R, \phi(x_R)\bigr)\bigr] &= 0, \\
f_{\phi'}\bigl(x_R, \phi(x_R), \phi'(x_R)\bigr) - \lambda\,G_y\bigl(x_R, \phi(x_R)\bigr) &= 0,
\end{aligned}$$
for some constant λ (referred to as a Lagrange multiplier). We can use the second equation above to simplify the first equation, which then leads to the pair of equations

$$\begin{aligned}
f\bigl(x_R, \phi(x_R), \phi'(x_R)\bigr) - \phi'(x_R)\,f_{\phi'}\bigl(x_R, \phi(x_R), \phi'(x_R)\bigr) - \lambda\,G_x\bigl(x_R, \phi(x_R)\bigr) &= 0, \\
f_{\phi'}\bigl(x_R, \phi(x_R), \phi'(x_R)\bigr) - \lambda\,G_y\bigl(x_R, \phi(x_R)\bigr) &= 0.
\end{aligned} \tag{7.66}$$
^7 It may be helpful to recall from calculus that if we are to minimize a function I(ε_1, ε_2), we must satisfy the condition dI = (∂I/∂ε_1) dε_1 + (∂I/∂ε_2) dε_2 = 0. But if this minimization is carried out subject to the side constraint J(ε_1, ε_2) = 0, then we must respect the side condition dJ = (∂J/∂ε_1) dε_1 + (∂J/∂ε_2) dε_2 = 0. Under these circumstances, one finds that one must require the conditions ∂I/∂ε_1 = λ ∂J/∂ε_1, ∂I/∂ε_2 = λ ∂J/∂ε_2, where the Lagrange multiplier λ is unknown and is also to be determined. The constraint equation J = 0 provides the extra condition required for this purpose.
Equation (7.66) provides two natural boundary conditions at the right-hand end x = x_R.

In summary: an extremal φ(x) must satisfy the differential equation (7.63) on 0 ≤ x ≤ x_R, the boundary condition φ = a at x = 0, the two natural boundary conditions (7.66) at x = x_R, and the equation G(x_R, φ(x_R)) = 0. (Note that the presence of the additional unknown λ is compensated for by the imposition of the additional condition G(x_R, φ(x_R)) = 0.)
Example: Suppose that G(x, y) = c_1 x + c_2 y + c_3 and that we are to find the curve of shortest length that commences from (0, a) and ends on G = 0.

Since ds = √(dx² + dy²) = √(1 + (φ')²) dx, we are to minimize the functional

$$F = \int_0^{x_R} \sqrt{1 + (\phi')^2}\,dx.$$
Thus

$$f(x, \phi, \phi') = \sqrt{1 + (\phi')^2}, \qquad f_\phi(x, \phi, \phi') = 0 \quad \text{and} \quad f_{\phi'}(x, \phi, \phi') = \frac{\phi'}{\sqrt{1 + (\phi')^2}}. \tag{7.67}$$
On using (7.67), the Euler equation (7.63) can be integrated immediately to give

$$\phi'(x) = \text{constant} \quad \text{for } 0 \le x \le x_R.$$

The boundary condition at the left-hand end is

$$\phi(0) = a,$$

while the boundary conditions (7.66) at the right-hand end give

$$\frac{1}{\sqrt{1 + \phi'^2(x_R)}} = \lambda c_1, \qquad \frac{\phi'(x_R)}{\sqrt{1 + \phi'^2(x_R)}} = \lambda c_2.$$
Finally the condition G(x_R, φ(x_R)) = 0 requires that

$$c_1 x_R + c_2\,\phi(x_R) + c_3 = 0.$$

Solving the preceding equations leads to the minimizer

$$\phi(x) = (c_2/c_1)\,x + a \quad \text{for } 0 \le x \le -\frac{c_1 (a c_2 + c_3)}{c_1^2 + c_2^2}.$$
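The minimizer above can be corroborated numerically: since the shortest admissible curve is a straight segment, minimizing the distance from (0, a) to points of the line G = 0 must reproduce the abscissa x_R = −c_1(ac_2 + c_3)/(c_1² + c_2²). The coefficients below are illustrative choices, not from the text.

```python
import math

# Hypothetical coefficients for the line G(x, y) = c1*x + c2*y + c3 = 0.
c1, c2, c3, a = 1.0, 2.0, -4.0, 1.0

# Endpoint abscissa predicted by the formula in the text.
x_R = -c1 * (a * c2 + c3) / (c1**2 + c2**2)

def dist_to_start(x):
    """Distance from (0, a) to the point of G = 0 with abscissa x."""
    y = -(c1 * x + c3) / c2        # solve G(x, y) = 0 for y (c2 != 0 here)
    return math.hypot(x, y - a)

# Brute-force the minimizing endpoint over a fine grid.
x_best = min((i * 1e-4 for i in range(-100000, 100001)), key=dist_to_start)
print(x_R, x_best)                 # the two should agree to grid resolution
```

Note that the slope c_2/c_1 of the minimizer is perpendicular to the line G = 0 (whose slope is −c_1/c_2), as one expects for a shortest segment.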
7.6 Constrained Minimization
7.6.1 Integral constraints.
Consider a problem of the following general form: find admissible functions φ_1(x), φ_2(x) that minimize

$$F\{\phi_1, \phi_2\} = \int_0^1 f(x, \phi_1(x), \phi_2(x), \phi_1'(x), \phi_2'(x))\,dx \tag{7.68}$$

subject to the constraint

$$G\{\phi_1, \phi_2\} = \int_0^1 g(x, \phi_1(x), \phi_2(x), \phi_1'(x), \phi_2'(x))\,dx = 0, \tag{7.69}$$

where g is a second given smooth function.
For reasons of clarity we shall return to the more detailed approach where we introduce parameters ε_1, ε_2 and functions η_1(x), η_2(x), rather than following the formal approach using variations δφ_1(x), δφ_2(x). Accordingly, suppose that the pair φ_1(x), φ_2(x) is the minimizer. By evaluating F and G on a family of neighboring admissible functions φ_1(x) + ε_1 η_1(x), φ_2(x) + ε_2 η_2(x) we have

$$\begin{aligned}
\hat F(\varepsilon_1, \varepsilon_2) &= F\{\phi_1(x) + \varepsilon_1 \eta_1(x),\ \phi_2(x) + \varepsilon_2 \eta_2(x)\}, \\
\hat G(\varepsilon_1, \varepsilon_2) &= G\{\phi_1(x) + \varepsilon_1 \eta_1(x),\ \phi_2(x) + \varepsilon_2 \eta_2(x)\} = 0.
\end{aligned} \tag{7.70}$$
(7.70)
If we begin by keeping η_1 and η_2 fixed, this is a classical minimization problem for a function of two variables: we are to minimize the function F̂(ε_1, ε_2) with respect to the variables ε_1 and ε_2, subject to the constraint Ĝ(ε_1, ε_2) = 0. A necessary condition for this is that dF̂(ε_1, ε_2) = 0, i.e. that

$$d\hat F = \frac{\partial \hat F}{\partial \varepsilon_1}\,d\varepsilon_1 + \frac{\partial \hat F}{\partial \varepsilon_2}\,d\varepsilon_2 = 0, \tag{7.71}$$
for all dε_1, dε_2 that are consistent with the constraint. Because of the constraint, dε_1 and dε_2 cannot be varied independently. Instead the constraint requires that they be related by

$$d\hat G = \frac{\partial \hat G}{\partial \varepsilon_1}\,d\varepsilon_1 + \frac{\partial \hat G}{\partial \varepsilon_2}\,d\varepsilon_2 = 0. \tag{7.72}$$
If we didn't have the constraint, then (7.71) would imply the usual requirements ∂F̂/∂ε_1 = ∂F̂/∂ε_2 = 0. However when the constraint equation (7.72) holds, (7.71) only requires that

$$\frac{\partial \hat F}{\partial \varepsilon_1} = \lambda\,\frac{\partial \hat G}{\partial \varepsilon_1}, \qquad \frac{\partial \hat F}{\partial \varepsilon_2} = \lambda\,\frac{\partial \hat G}{\partial \varepsilon_2}, \tag{7.73}$$
for some constant λ, or equivalently

$$\frac{\partial}{\partial \varepsilon_1}\bigl(\hat F - \lambda \hat G\bigr) = 0, \qquad \frac{\partial}{\partial \varepsilon_2}\bigl(\hat F - \lambda \hat G\bigr) = 0. \tag{7.74}$$

Therefore minimizing F̂ subject to the constraint Ĝ = 0 is equivalent to minimizing F̂ − λĜ without regard to the constraint; λ is known as a Lagrange multiplier. Proceeding from here on leads to the Euler equation associated with F − λG. The presence of the additional unknown parameter λ is balanced by the availability of the constraint equation G = 0.
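The finite-dimensional conditions (7.73)-(7.74) are easy to verify numerically. In this sketch the functions F̂ and Ĝ are arbitrary illustrative choices (not from the text); at the constrained minimizer the two ratios in (7.73) must produce one and the same multiplier λ.

```python
# Toy check of the Lagrange-multiplier condition (7.73): minimize
# Fhat(e1, e2) = e1^2 + 2*e2^2 subject to Ghat(e1, e2) = e1 + e2 - 1 = 0.
# Both functions are illustrative choices, not from the text.
def Fhat(e1, e2):
    return e1**2 + 2 * e2**2

# Brute-force the constrained minimum along the constraint line e2 = 1 - e1.
e1 = min((i * 1e-5 for i in range(-200000, 200001)),
         key=lambda t: Fhat(t, 1 - t))
e2 = 1 - e1

# Gradients: dFhat = (2*e1, 4*e2), dGhat = (1, 1). Condition (7.73) then
# requires 2*e1 = lambda and 4*e2 = lambda with a single lambda.
lam1, lam2 = 2 * e1, 4 * e2
print(e1, e2, lam1, lam2)   # lam1 and lam2 should agree
```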
Example: Consider a heavy inextensible cable of mass per unit length m that hangs under
gravity. The two ends of the cable are held at the same vertical height, a distance 2H apart.
The cable has a given length L. We know from physics that the cable adopts a shape that
minimizes the potential energy. We are asked to determine this shape.
[Figure: a cable y = φ(x) hanging under gravity g between supports at x = −H and x = H.]
Let y = φ(x), −H ≤ x ≤ H, describe an admissible shape of the cable. The potential energy of the cable is determined by integrating mgφ with respect to the arc length s along the cable which, since ds = √(dx² + dy²) = √(1 + (φ')²) dx, is given by

$$V\{\phi\} = \int_0^L m g\,\phi\,ds = m g \int_{-H}^{H} \phi\,\sqrt{1 + (\phi')^2}\,dx. \tag{7.75}$$
Since the cable is inextensible, its length

$$\ell\{\phi\} = \int_0^L ds = \int_{-H}^{H} \sqrt{1 + (\phi')^2}\,dx \tag{7.76}$$

must equal L. Therefore we are asked to find a function φ(x), with φ(−H) = φ(H), that minimizes V{φ} subject to the constraint ℓ{φ} = L. According to the theory developed above, this function must satisfy the Euler equation associated with the functional V{φ} − λℓ{φ} where the Lagrange multiplier λ is a constant. The resulting boundary value problem together with the constraint ℓ = L yields the shape of the cable φ(x).
Calculating the first variation of V − λmgℓ, where the constant λ is a Lagrange multiplier, leads to the Euler equation

$$\frac{d}{dx}\left[(\phi - \lambda)\,\frac{\phi'}{\sqrt{1 + (\phi')^2}}\right] - \sqrt{1 + (\phi')^2} = 0, \quad -H < x < H.$$
This can be integrated once to yield

$$\phi' = \sqrt{\frac{(\phi - \lambda)^2}{c^2} - 1}$$

where c is a constant of integration. Integrating this again leads to

$$\phi(x) = c \cosh[(x + d)/c] + \lambda, \quad -H < x < H,$$

where d is a second constant of integration. By symmetry, we must have φ'(0) = 0 and therefore d = 0. Thus
$$\phi(x) = c \cosh(x/c) + \lambda, \quad -H < x < H. \tag{7.77}$$

The constant λ in (7.77) is simply a reference height. For example we could take the x-axis to pass through the two pegs, in which case φ(±H) = 0 and then λ = −c cosh(H/c), and so

$$\phi(x) = c\,\bigl[\cosh(x/c) - \cosh(H/c)\bigr], \quad -H < x < H. \tag{7.78}$$

Substituting (7.78) into the constraint condition ℓ{φ} = L, with ℓ given by (7.76), yields

$$L = 2c \sinh(H/c). \tag{7.79}$$
Thus in summary, if equation (7.79) can be solved for c, then (7.78) gives the equation describing the shape of the cable.

All that remains is to examine the solvability of (7.79). To this end set z = H/c and µ = L/(2H). Then we must solve sinh z = µz where µ > 1 is a constant. (The requirement µ > 1 follows from the physical necessity that the distance between the pegs, 2H, be less than the length of the rope, L.) One can show that as z increases from 0 to ∞, the function sinh z − µz starts from the value 0, decreases monotonically to some finite negative value at some z = z_* > 0, and then increases monotonically to ∞. Thus for each µ > 1 the function sinh z − µz vanishes at some unique positive value of z. Consequently (7.79) has a unique root c > 0.
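Equation (7.79) is transcendental, but with z = H/c and µ = L/(2H) it is easily solved by bisection on sinh z − µz. A minimal sketch, with illustrative values of H and L (not from the text):

```python
import math

H, L = 1.0, 3.0             # illustrative peg half-spacing and cable length
mu = L / (2 * H)            # mu = 1.5 > 1, as the physics requires

def h(z):
    return math.sinh(z) - mu * z

# Since mu > 1, h < 0 just to the right of z = 0 and h -> +infinity, so a
# unique positive root exists; bracket it, then bisect.
lo, hi = 1e-6, 1.0
while h(hi) < 0:
    hi *= 2
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if h(mid) < 0 else (lo, mid)

c = H / (0.5 * (lo + hi))
print(c, 2 * c * math.sinh(H / c))   # the second value reproduces L
```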
7.6.2 Algebraic constraints
Now consider a problem of the following general type: find a pair of admissible functions φ_1(x), φ_2(x) that minimizes

$$\int_0^1 f(x, \phi_1, \phi_2, \phi_1', \phi_2')\,dx$$

subject to the algebraic constraint

$$g(x, \phi_1(x), \phi_2(x)) = 0 \quad \text{for } 0 \le x \le 1.$$
One can show that a necessary condition is that the minimizer should satisfy the Euler
equation associated with f −λg. In this problem the Lagrange multiplier λ may be a function
of x.
Example: Consider a conical surface characterized by

$$g(x_1, x_2, x_3) = x_1^2 + x_2^2 - R^2(x_3) = 0, \qquad R(x_3) = x_3 \tan\alpha, \quad x_3 > 0.$$
Let P = (p_1, p_2, p_3) and Q = (q_1, q_2, q_3), q_3 > p_3, be two points on this surface. A smooth wire lies entirely on the conical surface and joins the points P and Q. A bead slides along the wire under gravity, beginning at rest from P. From among all such wires, we are to find the one that gives the minimum travel time.
Figure 7.11: A curve that joins the points (p_1, p_2, p_3) to (q_1, q_2, q_3) and lies on the conical surface x_1² + x_2² − x_3² tan²α = 0.
Suppose that the wire can be described parametrically by x_1 = φ_1(x_3), x_2 = φ_2(x_3) for p_3 ≤ x_3 ≤ q_3. (Not all of the permissible curves can be described this way and so by using this characterization we are limiting ourselves to a subset of all the permitted curves.) Since the curve has to lie on the conical surface it is necessary that

$$g(\phi_1(x_3), \phi_2(x_3), x_3) = 0, \quad p_3 \le x_3 \le q_3. \tag{7.80}$$
The travel time is found by integrating ds/v along the path. The arc length ds along the path is given by

$$ds = \sqrt{dx_1^2 + dx_2^2 + dx_3^2} = \sqrt{(\phi_1')^2 + (\phi_2')^2 + 1}\;dx_3.$$
The conservation of energy tells us that

$$\tfrac{1}{2}\,m v^2(t) - m g\,x_3(t) = -m g\,p_3, \quad \text{or} \quad v = \sqrt{2g(x_3 - p_3)}.$$

Therefore the travel time is

$$T\{\phi_1, \phi_2\} = \int_{p_3}^{q_3} \sqrt{\frac{(\phi_1')^2 + (\phi_2')^2 + 1}{2g(x_3 - p_3)}}\;dx_3.$$
Our task is to minimize T{φ_1, φ_2} over the set of admissible functions

$$A = \Bigl\{ (\phi_1, \phi_2)\ \Big|\ \phi_i : [p_3, q_3] \to \mathbb{R},\ \phi_i \in C^2[p_3, q_3],\ \phi_i(p_3) = p_i,\ \phi_i(q_3) = q_i,\ i = 1, 2 \Bigr\},$$

subject to the constraint

$$g(\phi_1(x_3), \phi_2(x_3), x_3) = 0, \quad p_3 \le x_3 \le q_3.$$
According to the theory developed, the solution is given by solving the Euler equations associated with f − λ(x_3) g where

$$f(x_3, \phi_1, \phi_2, \phi_1', \phi_2') = \sqrt{\frac{(\phi_1')^2 + (\phi_2')^2 + 1}{2g(x_3 - p_3)}} \quad \text{and} \quad g(x_1, x_2, x_3) = x_1^2 + x_2^2 - x_3^2 \tan^2\alpha,$$

subject to the prescribed conditions at the ends and the constraint g(x_1, x_2, x_3) = 0.
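Although the resulting Euler equations are unwieldy, the travel-time functional itself is easy to evaluate for trial curves that satisfy the constraint identically, by parametrizing admissible curves through their azimuth ψ(x_3) on the cone. The half-angle, endpoints and trial curves below are illustrative assumptions, not from the text.

```python
import math

g_grav = 9.81               # gravitational acceleration
alpha = math.pi / 6         # cone half-angle (illustrative)
p3, q3 = 1.0, 3.0           # end values of x3 (illustrative)

def travel_time(dpsi, n=20000):
    """Midpoint-rule value of T for the curve with azimuth rate psi'(x3).

    Writing x1 = R cos(psi), x2 = R sin(psi) with R = x3 tan(alpha) satisfies
    the constraint g = 0 identically, and then
    (phi1')^2 + (phi2')^2 = tan(alpha)^2 + (R psi')^2.
    """
    h = (q3 - p3) / n
    T = 0.0
    for i in range(n):
        x3 = p3 + (i + 0.5) * h     # midpoints avoid the endpoint singularity
        R = x3 * math.tan(alpha)
        num = math.tan(alpha) ** 2 + (R * dpsi(x3)) ** 2 + 1
        T += math.sqrt(num / (2 * g_grav * (x3 - p3))) * h
    return T

t_generator = travel_time(lambda x: 0.0)   # straight generator (psi constant)
t_wound = travel_time(lambda x: 0.5)       # a path that winds around the cone
print(t_generator, t_wound)                # the wound path takes longer
```

The integrand has an integrable 1/√(x_3 − p_3) singularity at the lower limit; the midpoint rule sidesteps it. For these two particular candidates the generator is faster, since winding strictly increases the arc-length factor at every point.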
7.6.3 Differential constraints

Now consider a problem of the following general type: find a pair of admissible functions φ_1(x), φ_2(x) that minimizes

$$\int_0^1 f(x, \phi_1, \phi_2, \phi_1', \phi_2')\,dx$$

subject to the differential equation constraint

$$g(x, \phi_1(x), \phi_2(x), \phi_1'(x), \phi_2'(x)) = 0 \quad \text{for } 0 \le x \le 1.$$

Suppose that the constraint is not integrable, i.e. suppose that there does not exist a function h(x, φ_1(x), φ_2(x)) such that g = dh/dx. (In dynamics, such constraints are called non-holonomic.) One can show that it is necessary that the minimizer satisfy the Euler equations associated with f − λg. In these problems, the Lagrange multiplier λ may be a function of x.
Example: Determine functions φ_1(x) and φ_2(x) that minimize

$$\int_0^1 f(x, \phi_1, \phi_1', \phi_2')\,dx$$

over an admissible set of functions subject to the non-holonomic constraint

$$g(x, \phi_1, \phi_2, \phi_1', \phi_2') = \phi_2 - \phi_1' = 0 \quad \text{for } 0 \le x \le 1. \tag{7.81}$$
According to the theory above, the minimizers satisfy the Euler equations

$$\frac{d}{dx}\left[\frac{\partial h}{\partial \phi_1'}\right] - \frac{\partial h}{\partial \phi_1} = 0, \qquad \frac{d}{dx}\left[\frac{\partial h}{\partial \phi_2'}\right] - \frac{\partial h}{\partial \phi_2} = 0 \quad \text{for } 0 < x < 1, \tag{7.82}$$
where h = f − λg. On substituting for f and g, these Euler equations reduce to

$$\frac{d}{dx}\left[\frac{\partial f}{\partial \phi_1'} + \lambda\right] - \frac{\partial f}{\partial \phi_1} = 0, \qquad \frac{d}{dx}\left[\frac{\partial f}{\partial \phi_2'}\right] + \lambda = 0 \quad \text{for } 0 < x < 1. \tag{7.83}$$

Thus the functions φ_1(x), φ_2(x), λ(x) are determined from the three differential equations (7.81), (7.83).
Remark: Note, by substituting the constraint into the integrand of the functional, that we can equivalently pose this problem as one for determining the function φ_1(x) that minimizes

$$\int_0^1 f(x, \phi_1, \phi_1', \phi_1'')\,dx$$

over an admissible set of functions.
7.7 Piecewise smooth minimizers. Weierstrass-Erdmann corner conditions.

In order to motivate the discussion to follow, first consider the problem of minimizing the functional

$$F\{\phi\} = \int_0^1 \bigl((\phi')^2 - 1\bigr)^2\,dx \tag{7.84}$$

over functions φ with φ(0) = φ(1) = 0.
This is apparently a problem of the classical type where in the present case we are to minimize the integral of f(x, φ, φ') = ((φ')² − 1)² with respect to x over the interval [0, 1]. Assuming that the class of admissible functions consists of those that are C¹[0, 1] and satisfy φ(0) = φ(1) = 0, the minimizer must necessarily satisfy the Euler equation d/dx(∂f/∂φ') − (∂f/∂φ) = 0. In the present case this specializes to 2[(φ')² − 1](2φ') = constant for 0 ≤ x ≤ 1, which in turn gives φ'(x) = constant for 0 ≤ x ≤ 1. On enforcing the boundary conditions φ(0) = φ(1) = 0, this gives φ(x) = 0 for 0 ≤ x ≤ 1. This is an extremizer of F{φ} over the class of admissible functions under consideration. It is readily seen from (7.84) that the value of F at this particular function φ(x) = 0 is F = 1.
Note from (7.84) that F ≥ 0. It is natural to wonder whether there is a function φ*(x) that gives F{φ*} = 0. If so, φ* would be a minimizer. If there is such a function φ*, we know that it cannot belong to the class of admissible functions considered above, since if it did, we would have found it in the preceding calculation. Therefore if there is a function φ* of this type, it does not belong to the set of functions A. The functions in A were required to be C¹[0, 1] and to vanish at the two ends x = 0 and x = 1. Since φ* ∉ A it must fail one or both of these two conditions. The problem statement requires that the boundary conditions hold. Therefore it must be that φ* is not as smooth as C¹[0, 1].
If there is a function φ* such that F{φ*} = 0, it follows from the nonnegative character of the integrand in (7.84) that the integrand itself should vanish almost everywhere in [0, 1]. This requires that φ'(x) = ±1 almost everywhere in [0, 1]. The piecewise linear function

$$\phi^*(x) = \begin{cases} x & \text{for } 0 \le x \le 1/2, \\ 1 - x & \text{for } 1/2 \le x \le 1, \end{cases} \tag{7.85}$$

has this property. It is continuous, is piecewise C¹, and gives F{φ*} = 0. Moreover φ*(x) satisfies the Euler equation except at x = 1/2.
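These values are easy to corroborate numerically by applying a midpoint rule on each smooth piece; the small check below is illustrative only.

```python
# Evaluate F{phi} = integral of ((phi')^2 - 1)^2 over [0, 1] piece by piece,
# for piecewise-linear phi with constant slope on each piece.
def piece(slope, a, b, n=10000):
    """Midpoint-rule integral of ((phi')^2 - 1)^2 over [a, b] for constant slope."""
    h = (b - a) / n
    return sum((slope**2 - 1) ** 2 * h for _ in range(n))

F_zero = piece(0.0, 0.0, 1.0)                          # phi(x) = 0
F_star = piece(1.0, 0.0, 0.5) + piece(-1.0, 0.5, 1.0)  # phi* of (7.85)
print(F_zero, F_star)   # approximately 1, and exactly 0
```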
But is it legitimate for us to consider piecewise smooth functions? If so, are there any restrictions that we must enforce? Physical problems involving discontinuities in certain physical fields or their derivatives often arise when, for example, the problem concerns an interface separating two different materials. A specific example will be considered below.
7.7.1 Piecewise smooth minimizer with non-smoothness occurring at a prescribed location.
Suppose that we wish to extremize the functional

$$F\{\phi\} = \int_0^1 f(x, \phi, \phi')\,dx$$

over some suitable set of admissible functions, and suppose further that we know that the extremal φ(x) is continuous but has a discontinuity in its slope at x = s: i.e. φ'(s−) ≠ φ'(s+), where φ'(s±) denote the limiting values of φ'(s ± ε) as ε → 0. Thus the set of admissible functions is composed of all functions that are smooth on either side of x = s, that are continuous at x = s, and that satisfy the given boundary conditions φ(0) = φ_0, φ(1) = φ_1:

$$A = \{\phi(\cdot)\ |\ \phi : [0, 1] \to \mathbb{R},\ \phi \in C^1([0, s) \cup (s, 1]),\ \phi \in C[0, 1],\ \phi(0) = \phi_0,\ \phi(1) = \phi_1\}.$$

Observe that an admissible function is required to be continuous on [0, 1], required to have a continuous first derivative on either side of x = s, and its first derivative is permitted to have a jump discontinuity at the given location x = s.
Figure 7.12: Extremal φ(x) and a neighboring test function φ(x) + δφ(x), both with kinks at x = s.
Suppose that F is extremized by a function φ(x) ∈ A and suppose that this extremal has a jump discontinuity in its first derivative at x = s. Let δφ(x) be an admissible variation, which means that the neighboring function φ(x) + δφ(x) is also in A: it is C¹ on [0, s) ∪ (s, 1] and may have a jump discontinuity in its first derivative at the location x = s; see Figure 7.12. This implies that

$$\delta\phi(x) \in C([0, 1]), \qquad \delta\phi(x) \in C^1([0, s) \cup (s, 1]), \qquad \delta\phi(0) = \delta\phi(1) = 0.$$
In view of the lack of smoothness at x = s it is convenient to split the integral into two parts and write

$$F\{\phi\} = \int_0^s f(x, \phi, \phi')\,dx + \int_s^1 f(x, \phi, \phi')\,dx,$$

and

$$F\{\phi + \delta\phi\} = \int_0^s f(x, \phi + \delta\phi, \phi' + \delta\phi')\,dx + \int_s^1 f(x, \phi + \delta\phi, \phi' + \delta\phi')\,dx.$$
Upon calculating δF, which by definition equals F{φ + δφ} − F{φ} up to terms linear in δφ, and setting δF = 0, we obtain

$$\int_0^s \bigl(f_\phi\,\delta\phi + f_{\phi'}\,\delta\phi'\bigr)\,dx + \int_s^1 \bigl(f_\phi\,\delta\phi + f_{\phi'}\,\delta\phi'\bigr)\,dx = 0.$$
Integrating the terms involving δφ' by parts leads to

$$\int_0^s \left[f_\phi - \frac{d}{dx}(f_{\phi'})\right]\delta\phi\,dx + \int_s^1 \left[f_\phi - \frac{d}{dx}(f_{\phi'})\right]\delta\phi\,dx + \left[\frac{\partial f}{\partial \phi'}\,\delta\phi\right]_{x=0}^{s-} + \left[\frac{\partial f}{\partial \phi'}\,\delta\phi\right]_{x=s+}^{1} = 0.$$
However, since δφ(0) = δφ(1) = 0, this simplifies to

$$\int_0^1 \left[\frac{\partial f}{\partial \phi} - \frac{d}{dx}\frac{\partial f}{\partial \phi'}\right]\delta\phi(x)\,dx + \left(\left.\frac{\partial f}{\partial \phi'}\right|_{x=s-} - \left.\frac{\partial f}{\partial \phi'}\right|_{x=s+}\right)\delta\phi(s) = 0.$$
First, if we limit attention to variations that are such that δφ(s) = 0, the second term in the equation above vanishes, and only the integral remains. Since δφ(x) can be chosen arbitrarily for all x ∈ (0, 1), x ≠ s, this implies that the term within the brackets in the integrand must vanish at each of these x's. This leads to the Euler equation

$$\frac{\partial f}{\partial \phi} - \frac{d}{dx}\frac{\partial f}{\partial \phi'} = 0 \quad \text{for } 0 < x < 1,\ x \ne s.$$
Second, when this is substituted back into the equation above it, the integral now disappears. Since the resulting equation must hold for all variations δφ(s), it follows that we must have

$$\left.\frac{\partial f}{\partial \phi'}\right|_{x=s-} = \left.\frac{\partial f}{\partial \phi'}\right|_{x=s+}$$
at x = s. This is a "matching condition" or "jump condition" that relates the solution on the left of x = s to the solution on its right. The matching condition shows that even though φ' has a jump discontinuity at x = s, the quantity ∂f/∂φ' is continuous at this point.
Thus in summary an extremal φ must obey the following boundary value problem:

$$\begin{aligned}
&\frac{d}{dx}\frac{\partial f}{\partial \phi'} - \frac{\partial f}{\partial \phi} = 0 \quad \text{for } 0 < x < 1,\ x \ne s, \\
&\phi(0) = \phi_0, \qquad \phi(1) = \phi_1, \\
&\left.\frac{\partial f}{\partial \phi'}\right|_{x=s-} = \left.\frac{\partial f}{\partial \phi'}\right|_{x=s+} \quad \text{at } x = s.
\end{aligned} \tag{7.86}$$
Figure 7.13: Ray of light in a two-phase material: media with refractive indices n_1(x, y) (x < 0) and n_2(x, y) (x > 0) meet at the interface x = 0; the ray runs from (a, A) to (b, B) and makes angles θ− and θ+ with the x-axis on either side.
Example: Consider a two-phase material that occupies all of x, y-space. The material occupying x < 0 is different from the material occupying x > 0, and so x = 0 is the interface between the two materials. In particular, suppose that the refractive indices of the materials occupying x < 0 and x > 0 are n_1(x, y) and n_2(x, y) respectively; see Figure 7.13. We are asked to determine the path y = φ(x), a ≤ x ≤ b, followed by a ray of light travelling from a point (a, A) in the left half-plane to the point (b, B) in the right half-plane. In particular, we are to determine the conditions at the point where the ray crosses the interface between the two media.
According to Fermat's principle, a ray of light travelling between two given points follows the path that it can traverse in the shortest possible time. Also, we know that light travels at a speed c/n(x, y) where n(x, y) is the index of refraction at the point (x, y). Thus the transit time is determined by integrating n/c along the path followed by the light, which, since ds = √(1 + (φ')²) dx, can be written as

$$T\{\phi\} = \int_a^b \frac{1}{c}\,n(x, \phi(x))\,\sqrt{1 + (\phi')^2}\,dx.$$
Thus the problem at hand is to determine φ that minimizes the functional T{φ} over the set of admissible functions

$$A = \{\phi(\cdot)\ |\ \phi \in C[a, b],\ \phi \in C^1([a, 0) \cup (0, b]),\ \phi(a) = A,\ \phi(b) = B\}.$$

Note that this set of admissible functions allows the path followed by the light to have a kink at x = 0 even though the path is continuous.
The functional we are asked to minimize can be written in the standard form

$$T\{\phi\} = \int_a^b f(x, \phi, \phi')\,dx \quad \text{where} \quad f(x, \phi, \phi') = \frac{n(x, \phi)}{c}\,\sqrt{1 + (\phi')^2}.$$

Therefore

$$\frac{\partial f}{\partial \phi'} = \frac{n(x, \phi)}{c}\,\frac{\phi'}{\sqrt{1 + (\phi')^2}}$$

and so the matching condition at the kink at x = 0 requires that

$$\frac{n}{c}\,\frac{\phi'}{\sqrt{1 + (\phi')^2}}$$

be continuous at x = 0.
Observe that, if θ is the angle made by the ray of light with the x-axis at some point along its path, then tan θ = φ' and so sin θ = φ'/√(1 + (φ')²). Therefore the matching condition requires that n sin θ be continuous, or

$$n_+ \sin\theta_+ = n_- \sin\theta_-$$

where n_± and θ_± are the limiting values of n(x, φ(x)) and θ(x) as x → 0±. This is Snell's well-known law of refraction.
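Snell's law can also be recovered directly from Fermat's principle by brute force in the special case of piecewise constant indices: with straight segments on each side, the transit time depends only on the crossing height on the interface x = 0. All numbers below are illustrative assumptions, not from the text.

```python
import math

n1, n2 = 1.0, 1.5           # constant refractive indices for x < 0 and x > 0
a, A = -1.0, 0.0            # start point (a, A) in the left half-plane
b, B = 1.0, 1.0             # end point (b, B) in the right half-plane
c = 1.0                     # speed of light in vacuum (units immaterial here)

def transit_time(y0):
    """Time along the two straight segments meeting the interface at (0, y0)."""
    return (n1 / c) * math.hypot(-a, y0 - A) + (n2 / c) * math.hypot(b, B - y0)

# Brute-force Fermat's principle over the crossing height.
y0 = min((i * 1e-5 for i in range(-100000, 200001)), key=transit_time)

# Angles are measured from the x-axis, i.e. from the normal to the interface.
sin_minus = (y0 - A) / math.hypot(-a, y0 - A)   # incoming side
sin_plus = (B - y0) / math.hypot(b, B - y0)     # outgoing side
print(n1 * sin_minus, n2 * sin_plus)            # should agree: Snell's law
```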
7.7.2 Piecewise smooth minimizer with non-smoothness occurring at an unknown location
Suppose again that we wish to extremize the functional

$$F\{\phi\} = \int_0^1 f(x, \phi, \phi')\,dx$$

over the admissible set of functions

$$A = \{\phi(\cdot)\ |\ \phi : [0, 1] \to \mathbb{R},\ \phi \in C[0, 1],\ \phi \in C^1_p[0, 1],\ \phi(0) = a,\ \phi(1) = b\}.$$

Just as before, the admissible functions are continuous and have a piecewise continuous first derivative. However, in contrast to the preceding case, if there is a discontinuity in the first derivative of φ at some location x = s, the location s is not known a priori and so is also to be determined.
Figure 7.14: Extremal φ(x) with a kink at x = s and a neighboring test function φ(x) + δφ(x) with kinks at x = s and s + δs.
Suppose that F is extremized by the function φ(x) and that it has a jump discontinuity in its first derivative at x = s (we shall say that φ has a "kink" at x = s). Suppose further that φ is C¹ on either side of x = s. Consider a variation δφ(x) that vanishes at the two ends x = 0 and x = 1, is continuous on [0, 1], and is C¹ on [0, 1] except at x = s + δs, where it has a jump discontinuity in its first derivative:

$$\delta\phi \in C[0, 1], \qquad \delta\phi \in C^1[0, s + \delta s) \cup C^1(s + \delta s, 1], \qquad \delta\phi(0) = \delta\phi(1) = 0.$$
Note that φ(x) + δφ(x) has kinks at both x = s and x = s + δs. Note further that we have
varied the function φ(x) and the location of the kink s. See Figure 7.14.
Since the extremal φ(x) has a kink at x = s it is convenient to split the integral and express F{φ} as

$$F\{\phi\} = \int_0^s f(x, \phi, \phi')\,dx + \int_s^1 f(x, \phi, \phi')\,dx.$$
Similarly, since the neighboring function φ(x) + δφ(x) has kinks at x = s and x = s + δs, it is convenient to express F{φ + δφ} by splitting the integral into three terms as follows:

$$F\{\phi + \delta\phi\} = \int_0^s f(x, \phi + \delta\phi, \phi' + \delta\phi')\,dx + \int_s^{s + \delta s} f(x, \phi + \delta\phi, \phi' + \delta\phi')\,dx + \int_{s + \delta s}^1 f(x, \phi + \delta\phi, \phi' + \delta\phi')\,dx.$$
We can now calculate the first variation δF which, by definition, equals F{φ + δφ} − F{φ} up to terms linear in δφ. Calculating δF in this way and setting the result equal to zero leads, after integrating by parts, to

$$\int_0^1 A\,\delta\phi(x)\,dx + B\,\delta\phi(s) + C\,\delta s = 0,$$

where

$$\begin{aligned}
A &= \frac{\partial f}{\partial \phi} - \frac{d}{dx}\frac{\partial f}{\partial \phi'}, \\
B &= \left.\frac{\partial f}{\partial \phi'}\right|_{x=s-} - \left.\frac{\partial f}{\partial \phi'}\right|_{x=s+}, \\
C &= \left.\left(f - \phi'\,\frac{\partial f}{\partial \phi'}\right)\right|_{x=s-} - \left.\left(f - \phi'\,\frac{\partial f}{\partial \phi'}\right)\right|_{x=s+}.
\end{aligned} \tag{7.87}$$
By the arbitrariness of the variations above, it follows in the usual way that A, B and C must all vanish. This leads to the usual Euler equation on (0, s) ∪ (s, 1), and the following two additional requirements at x = s:

$$\left.\frac{\partial f}{\partial \phi'}\right|_{s-} = \left.\frac{\partial f}{\partial \phi'}\right|_{s+}, \tag{7.88}$$

$$\left.\left(f - \phi'\,\frac{\partial f}{\partial \phi'}\right)\right|_{s-} = \left.\left(f - \phi'\,\frac{\partial f}{\partial \phi'}\right)\right|_{s+}. \tag{7.89}$$

The two matching conditions (or jump conditions) (7.88) and (7.89) are known as the Weierstrass-Erdmann corner conditions (the term "corner" referring to the "kink" in φ). Equation (7.88) is the same condition that was derived in the preceding subsection.
Example: Find the extremals of the functional

$$F\{\phi\} = \int_0^4 f(x, \phi, \phi')\,dx = \int_0^4 (\phi' - 1)^2 (\phi' + 1)^2\,dx$$

over the set of piecewise smooth functions subject to the end conditions φ(0) = 0, φ(4) = 2. For simplicity, restrict attention to functions that have at most one point at which φ' has a discontinuity.
Here

$$f(x, \phi, \phi') = \bigl((\phi')^2 - 1\bigr)^2 \tag{7.90}$$

and therefore on differentiating f,

$$\frac{\partial f}{\partial \phi'} = 4\phi'\bigl((\phi')^2 - 1\bigr), \qquad \frac{\partial f}{\partial \phi} = 0. \tag{7.91}$$
Consequently the Euler equation (at points of smoothness) is

$$\frac{d}{dx}\,f_{\phi'} - f_\phi = \frac{d}{dx}\Bigl[4\phi'\bigl((\phi')^2 - 1\bigr)\Bigr] = 0. \tag{7.92}$$
First, consider an extremal that is smooth everywhere. (Such an extremal might not, of course, exist.) In this case the Euler equation (7.92) holds on the entire interval (0, 4) and so we conclude that φ'(x) = constant for 0 ≤ x ≤ 4. On integrating this and using the boundary conditions φ(0) = 0, φ(4) = 2, we find that φ(x) = x/2, 0 ≤ x ≤ 4, is a smooth extremal. In order to compare this with what follows, it is helpful to call this, say, φ_0. Thus

$$\phi_0(x) = \tfrac{1}{2}\,x \quad \text{for } 0 \le x \le 4$$

is a smooth extremal of F.
Next consider a piecewise smooth extremizer of F which has a kink at some location x = s; the value of s ∈ (0, 4) is not known a priori and is to be determined. (Again, such an extremal might not, of course, exist.) The Euler equation (7.92) now holds on either side of x = s and so we find from (7.92) that φ' = c = constant on (0, s) and φ' = d = constant on (s, 4) where c ≠ d (if c = d there would be no kink at x = s and we have already dealt with this case above). Thus

$$\phi'(x) = \begin{cases} c & \text{for } 0 < x < s, \\ d & \text{for } s < x < 4. \end{cases}$$
Integrating this, separately on (0, s) and (s, 4), and enforcing the boundary conditions φ(0) = 0, φ(4) = 2, leads to

$$\phi(x) = \begin{cases} c\,x & \text{for } 0 \le x \le s, \\ d\,(x - 4) + 2 & \text{for } s \le x \le 4. \end{cases} \tag{7.93}$$

Since φ is required to be continuous, we must have φ(s−) = φ(s+), which requires that cs = d(s − 4) + 2, whence

$$s = \frac{2 - 4d}{c - d}. \tag{7.94}$$

Note that s would not exist if c = d.
All that remains is to find c and d, and the two Weierstrass-Erdmann corner conditions (7.88), (7.89) provide us with the two equations for doing this. From (7.90), (7.91) and (7.93),

$$\frac{\partial f}{\partial \phi'} = \begin{cases} 4c\,(c^2 - 1) & \text{for } 0 < x < s, \\ 4d\,(d^2 - 1) & \text{for } s < x < 4, \end{cases}$$

and

$$f - \phi'\,\frac{\partial f}{\partial \phi'} = \begin{cases} -(c^2 - 1)(1 + 3c^2) & \text{for } 0 < x < s, \\ -(d^2 - 1)(1 + 3d^2) & \text{for } s < x < 4. \end{cases}$$

Therefore the Weierstrass-Erdmann corner conditions (7.88) and (7.89), which require respectively the continuity of ∂f/∂φ' and f − φ' ∂f/∂φ' at x = s, give us the pair of simultaneous equations
$$c\,(c^2 - 1) = d\,(d^2 - 1), \qquad (c^2 - 1)(1 + 3c^2) = (d^2 - 1)(1 + 3d^2).$$

Keeping in mind that c ≠ d and solving these equations leads to the two solutions:

$$c = 1,\ d = -1, \qquad \text{and} \qquad c = -1,\ d = 1.$$
Corresponding to the former we find from (7.94) that s = 3, while the latter leads to s = 1. Thus from (7.93) there are two piecewise smooth extremals φ_1(x) and φ_2(x) of the assumed form:

$$\phi_1(x) = \begin{cases} x & \text{for } 0 \le x \le 3, \\ -x + 6 & \text{for } 3 \le x \le 4, \end{cases} \qquad \phi_2(x) = \begin{cases} -x & \text{for } 0 \le x \le 1, \\ x - 2 & \text{for } 1 \le x \le 4. \end{cases}$$
Figure 7.15: Smooth extremal φ_0(x) and piecewise smooth extremals φ_1(x) and φ_2(x).
Figure 7.15 shows graphs of φ_0, φ_1 and φ_2. By evaluating the functional F at each of the extremals φ_0, φ_1 and φ_2, we find

$$F\{\phi_0\} = 9/4, \qquad F\{\phi_1\} = F\{\phi_2\} = 0.$$
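A quick numerical corroboration of these three values (illustrative only), using the constant slope of each extremal on its smooth pieces:

```python
def F_piecewise(pieces, n=4000):
    """Midpoint-rule value of the functional int ((phi')^2 - 1)^2 dx for a
    piecewise-linear phi, given pieces = [(x_left, x_right, slope), ...]."""
    total = 0.0
    for a, b, m in pieces:
        h = (b - a) / n
        total += sum((m**2 - 1) ** 2 * h for _ in range(n))
    return total

F0 = F_piecewise([(0.0, 4.0, 0.5)])                    # phi_0(x) = x/2
F1 = F_piecewise([(0.0, 3.0, 1.0), (3.0, 4.0, -1.0)])  # phi_1, kink at s = 3
F2 = F_piecewise([(0.0, 1.0, -1.0), (1.0, 4.0, 1.0)])  # phi_2, kink at s = 1
print(F0, F1, F2)   # approximately 9/4, 0, 0
```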
Remark: By inspection of the given functional

$$F\{\phi\} = \int_0^4 \bigl((\phi')^2 - 1\bigr)^2\,dx,$$

it is clear that (a) F ≥ 0, and (b) F = 0 if and only if φ' = ±1 everywhere (except at isolated points where φ' may be undefined). The extremals φ_1 and φ_2 have this property and therefore correspond to absolute minimizers of F.
7.8 Generalization to higher dimensional space.
In order to help motivate the way in which we will approach higher-dimensional problems (which will in fact be entirely parallel to the approach we took for one-dimensional problems) we begin with some preliminary observations.

First, consider the one-dimensional variational problem of minimizing a functional

$$F\{\phi\} = \int_0^1 f(x, \phi, \phi', \phi'')\,dx$$
on a set of suitably smooth functions with no prescribed boundary conditions at either end. The analogous two-dimensional problem would be to consider a set of suitably smooth functions φ(x, y) defined on a domain D of the x, y-plane and to minimize a given functional

$$F\{\phi\} = \int_D f\bigl(x, y, \phi, \partial\phi/\partial x, \partial\phi/\partial y, \partial^2\phi/\partial x^2, \partial^2\phi/\partial x\,\partial y, \partial^2\phi/\partial y^2\bigr)\,dA$$

over this set of functions with no boundary conditions prescribed anywhere on the boundary ∂D.
In deriving the Euler equation in the one-dimensional case our strategy was to exploit the fact that the variation δφ(x) was arbitrary in the interior 0 < x < 1 of the domain. This motivated us to express the integrand in the form of some quantity A (independent of any variations) multiplied by δφ(x). Then, the arbitrariness of δφ allowed us to conclude that A must vanish on the entire domain. We approach two-dimensional problems similarly, and our strategy will be to exploit the fact that δφ(x, y) is arbitrary in the interior of D; we therefore attempt to express the integrand as some quantity A that is independent of any variations multiplied by δφ. Similarly concerning the boundary terms: in the one-dimensional case we were able to exploit the fact that δφ and its derivative δφ' are arbitrary at the boundary points x = 0 and x = 1, and this motivated us to express the boundary terms as some quantity B that is independent of any variations multiplied by δφ(0), another quantity C independent of any variations multiplied by δφ'(0), and so on. We approach two-dimensional problems similarly, and our strategy for the boundary terms is to exploit the fact that δφ and its normal derivative ∂(δφ)/∂n are arbitrary on the boundary ∂D. Thus the goal in our calculations will be to express the boundary terms as some quantity independent of any variations multiplied by δφ, another quantity independent of any variations multiplied by ∂(δφ)/∂n, etc. Thus in the two-dimensional case our strategy will be to take the first variation of F and carry out appropriate calculations that lead us to an equation of the form

$$\delta F = \int_D A\,\delta\phi(x, y)\,dA + \int_{\partial D} B\,\delta\phi(x, y)\,ds + \int_{\partial D} C\,\frac{\partial}{\partial n}\bigl(\delta\phi(x, y)\bigr)\,ds = 0 \tag{7.95}$$

where A, B, C are independent of δφ and its derivatives, and the latter two integrals are taken on the boundary of the domain D. We then exploit the arbitrariness of δφ(x, y) in the interior of the domain of integration, and the arbitrariness of δφ and ∂(δφ)/∂n on the boundary ∂D, to conclude that the minimizer must satisfy the partial differential equation A = 0 for (x, y) ∈ D and the boundary conditions B = C = 0 on ∂D.
Next, recall that one of the steps involved in calculating the minimizer of a one-dimensional
problem is integration by parts. This converts a term that is an integral over [0, 1] into terms
168 CHAPTER 7. CALCULUS OF VARIATIONS
that are only evaluated at the boundary points x = 0 and x = 1. The analog of this in higher
dimensions is carried out using the divergence theorem, which in two dimensions reads
∫_D (∂P/∂x + ∂Q/∂y) dA = ∫_{∂D} (P n_x + Q n_y) ds    (7.96)
which expresses the left-hand side, an integral over D, in a form that only involves
terms on the boundary. Here n_x, n_y are the components of the unit normal vector n on ∂D
that points out of D. Note that in the special case where P = ∂χ/∂x and Q = ∂χ/∂y for
some χ(x, y), the integrand of the right-hand side is ∂χ/∂n.
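As a concrete spot-check of (7.96), one can integrate both sides symbolically for a particular choice of fields and domain; the fields P = x²y, Q = xy² and the unit-square domain below are arbitrary illustrative choices, not taken from the text.

```python
import sympy as sp

x, y = sp.symbols('x y')
# Illustrative fields (assumed for this check): P = x^2 y, Q = x y^2
P = x**2 * y
Q = x * y**2

# Left side of (7.96): area integral of dP/dx + dQ/dy over the unit square.
lhs = sp.integrate(sp.diff(P, x) + sp.diff(Q, y), (x, 0, 1), (y, 0, 1))

# Right side: boundary integral of P n_x + Q n_y, edge by edge with the
# outward normals (1,0), (0,1), (-1,0), (0,-1).
rhs = (sp.integrate(P.subs(x, 1), (y, 0, 1))    # right edge
       + sp.integrate(Q.subs(y, 1), (x, 0, 1))  # top edge
       - sp.integrate(P.subs(x, 0), (y, 0, 1))  # left edge
       - sp.integrate(Q.subs(y, 0), (x, 0, 1))) # bottom edge
```

Both integrals evaluate to the same number, as (7.96) requires.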
Remark: The derivative of a function φ(x, y) in the direction of a unit vector m
is written ∂φ/∂m and defined by ∂φ/∂m = ∇φ · m = (∂φ/∂x) m_x + (∂φ/∂y) m_y, where m_x
and m_y are the components of m in the x- and y-directions respectively. On the boundary
∂D of a two-dimensional domain D we frequently need to calculate the derivative of φ in
the directions n and s that are normal and tangential to ∂D. In vector form we have
∇φ = (∂φ/∂x) i + (∂φ/∂y) j = (∂φ/∂n) n + (∂φ/∂s) s
where i and j are unit vectors in the x- and y-directions. Recall also that a function φ(x, y)
and its tangential derivative ∂φ/∂s along the boundary ∂D are not independent of each other
in the following sense: if one knows the values of φ along ∂D one can differentiate φ along
the boundary to get ∂φ/∂s; and conversely if one knows the values of ∂φ/∂s along ∂D one
can integrate it along the boundary to find φ to within a constant. This is why equation
(7.95) does not involve a term of the form E ∂(δφ)/∂s integrated along the boundary ∂D:
such a term can be rewritten as the integral of −(∂E/∂s) δφ along the boundary.
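The normal/tangential decomposition above is easy to check numerically. In the sketch below (the field φ = x²y and the sample point are arbitrary choices), the boundary is the unit circle, so n = (cos t, sin t), s = (−sin t, cos t), and arclength coincides with the parameter t.

```python
import numpy as np

# Sample field (an assumption for this check): phi(x, y) = x^2 y.
phi = lambda xx, yy: xx**2 * yy

t = 0.7                                   # arbitrary boundary point on unit circle
xp, yp = np.cos(t), np.sin(t)
nvec = np.array([np.cos(t), np.sin(t)])   # outward unit normal
svec = np.array([-np.sin(t), np.cos(t)])  # unit tangent

grad = np.array([2*xp*yp, xp**2])         # exact gradient of phi at (xp, yp)
dphidn, dphids = grad @ nvec, grad @ svec

# (1) The decomposition grad(phi) = (dphi/dn) n + (dphi/ds) s.
ok_decomp = np.allclose(dphidn*nvec + dphids*svec, grad)

# (2) dphi/ds equals the derivative of phi taken along the boundary curve
# (central difference in t, since arclength = t on the unit circle).
eps = 1e-6
along = (phi(np.cos(t+eps), np.sin(t+eps))
         - phi(np.cos(t-eps), np.sin(t-eps))) / (2*eps)
ok_tangent = abs(along - dphids) < 1e-8
```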
Example 1: A stretched membrane. A stretched flexible membrane occupies a regular
region D of the x, y-plane. A pressure p(x, y) is applied normal to the surface of the membrane
in the negative z-direction. Let u(x, y) be the resulting deflection of the membrane in the
z-direction. The membrane is fixed along its entire edge ∂D and so

u = 0 for (x, y) ∈ ∂D.    (7.97)
One can show that the potential energy Φ associated with any deﬂection u that is compatible
with the given boundary condition is
Φ{u} = ∫_D ½ |∇u|² dA − ∫_D p u dA
where we have taken the relevant stiﬀness of the membrane to be unity. The actual deﬂection
of the membrane is the function that minimizes the potential energy over the set of test
functions
A = {u : u ∈ C²(D), u = 0 for (x, y) ∈ ∂D}.
Figure 7.16: A stretched elastic membrane whose midplane occupies a region D of the x, y-plane and
whose boundary ∂D is fixed. The membrane surface is subjected to a pressure loading p(x, y) that acts in
the negative z-direction.
Since

Φ{u} = ∫_D ½ (u,x u,x + u,y u,y) dA − ∫_D p u dA,
its first variation is

δΦ = ∫_D (u,x δu,x + u,y δu,y) dA − ∫_D p δu dA,
where an admissible variation δu(x, y) vanishes on ∂D. Here we are using the notation
that a comma followed by a subscript denotes partial differentiation with respect to the
corresponding coordinate; for example u,x = ∂u/∂x and u,xy = ∂²u/∂x∂y. In order to make
use of the divergence theorem and convert the area integral into a boundary integral we
must write the integrand so that it involves terms of the form (...),x + (...),y; see (7.96).
This suggests that we rewrite the preceding equation as

δΦ = ∫_D [ (u,x δu),x + (u,y δu),y − (u,xx + u,yy) δu ] dA − ∫_D p δu dA,
or equivalently as

δΦ = ∫_D [ (u,x δu),x + (u,y δu),y ] dA − ∫_D ( u,xx + u,yy + p ) δu dA.
By using the divergence theorem on the first integral we get

δΦ = ∫_{∂D} ( u,x n_x + u,y n_y ) δu ds − ∫_D ( u,xx + u,yy + p ) δu dA
where n is a unit outward normal along ∂D. We can write this equivalently as

δΦ = ∫_{∂D} (∂u/∂n) δu ds − ∫_D ( ∇²u + p ) δu dA.    (7.98)
Since the variation δu vanishes on ∂D, the first integral drops out and we are left with

δΦ = − ∫_D ( ∇²u + p ) δu dA    (7.99)
which must vanish for all admissible variations δu(x, y). Thus the minimizer satisfies the
partial differential equation

∇²u + p = 0 for (x, y) ∈ D,

which is the Euler equation in this case; it is to be solved subject to the prescribed boundary
condition (7.97). Note that if some part of the boundary of D had not been fixed, then we
would not have δu = 0 on that part, in which case (7.98) and (7.99) would yield the natural
boundary condition ∂u/∂n = 0 on that segment.
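The two characterizations, minimizing Φ{u} and solving ∇²u + p = 0, can be checked against each other numerically. The sketch below (grid size and the uniform pressure are illustrative assumptions) discretizes the unit square with the standard 5-point Laplacian, solves the discrete Euler equation directly, and verifies that gradient descent on the discrete energy converges to the same deflection.

```python
import numpy as np

n = 20                       # interior grid points per side (assumed size)
h = 1.0 / (n + 1)
p = np.ones(n * n)           # uniform pressure p(x, y) = 1 (an assumption)

# Assemble the 5-point Laplacian A so that (A @ u) / h^2 approximates lap(u).
N = n * n
A = np.zeros((N, N))
for i in range(n):
    for j in range(n):
        k = i * n + j
        A[k, k] = -4.0
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ii, jj = i + di, j + dj
            if 0 <= ii < n and 0 <= jj < n:
                A[k, ii * n + jj] = 1.0

# Euler equation lap(u) + p = 0, solved directly.
u_euler = np.linalg.solve(A / h**2, -p)

# Gradient descent on the discrete energy 0.5 u'Ku - p'u with K = -A/h^2;
# the gradient is -(lap(u) + p), so descent drives u to the same solution.
u = np.zeros(N)
for _ in range(5000):
    u -= 0.2 * h**2 * (-(A @ u) / h**2 - p)
err = np.max(np.abs(u - u_euler))
```

The step size 0.2 h² keeps the iteration stable since the largest eigenvalue of the discrete operator is of order 8/h².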
Example 2: The Kirchhoff theory of plates. We consider the bending of a thin plate
according to the so-called Kirchhoff theory. Solely for purposes of mathematical simplicity
we shall assume that the Poisson ratio ν of the material is zero. A discussion of the case
ν ≠ 0 can be found in many books, for example, in "Energy and Finite Element Methods in
Structural Mechanics" by I.H. Shames and C.L. Dym. When ν = 0 the plate bending stiffness
is D = Et³/12, where E is the Young's modulus of the material and t is the thickness of the
plate. The midplane of the plate occupies a domain of the x, y-plane and w(x, y) denotes
the deflection (displacement) of a point on the midplane in the z-direction. The basic
constitutive relationships of elastic plate theory relate the internal moments M_x, M_y, M_xy, M_yx
(see Figure 7.17) to the second derivatives of the displacement field w,xx, w,yy, w,xy by
M_x = −D w,xx,   M_y = −D w,yy,   M_xy = M_yx = −D w,xy,    (7.100)
where a comma followed by a subscript denotes partial differentiation with respect to the
corresponding coordinate and D is the plate bending stiffness; and the shear forces in the
plate are given by

V_x = −D (∇²w),x,   V_y = −D (∇²w),y.    (7.101)
The elastic energy per unit area of the plate midplane is given by

½ ( M_x w,xx + M_y w,yy + M_xy w,xy + M_yx w,yx ) = (D/2) ( w²,xx + w²,yy + 2w²,xy ).    (7.102)
Figure 7.17: A differential element dx × dy × t of a thin plate. A bold arrow represents a force and thus V_x
and V_y are (shear) forces. A bold arrow with two arrow heads represents a moment whose sense is given by
the right-hand rule. Thus M_xy and M_yx are (twisting) moments while M_x and M_y are (bending) moments.
Figure 7.18: Left: A thin elastic plate whose midplane occupies a region Ω of the x, y-plane. The segment
∂Ω₁ of the boundary is clamped while the remainder ∂Ω₂ is free of loading. Right: A rectangular a × b plate
with a load-free edge at its right-hand side x = a, 0 ≤ y ≤ b; there n_x = 1, n_y = 0, s_x = 0, s_y = 1.
It is worth noting the following puzzling question. Consider the rectangular plate shown
in the right-hand diagram of Figure 7.18. Based on Figure 7.17 we know that there is a
bending moment M_x, a twisting moment M_xy, and a shear force V_x acting on any surface
x = constant in the plate. Therefore, in particular, since the right-hand edge x = a is free
of loading, one would expect to have the three conditions M_x = M_xy = V_x = 0 along that
boundary. However we will find that the differential equation to be solved in the interior of
the plate requires (and can only accommodate) two boundary conditions at any point on
the edge. The question then arises as to what the correct boundary conditions on this edge
should be. Our variational approach will give us precisely two natural boundary conditions
on this edge. They will involve M_x, M_xy and V_x but will not require that each of them
vanish individually.
Consider a thin elastic plate whose midplane occupies a domain Ω of the x, y-plane as
shown in the left-hand diagram of Figure 7.18. A normal loading p(x, y) is applied on the
flat face of the plate in the −z-direction. A part of the plate boundary denoted by ∂Ω₁ is
clamped while the remainder ∂Ω₂ is free of any external loading. Thus if w(x, y) denotes the
deflection of the plate in the z-direction we have the geometric boundary conditions

w = ∂w/∂n = 0 for (x, y) ∈ ∂Ω₁.    (7.103)
The total potential energy of the system is

Φ{w} = ∫_Ω [ (D/2)( w²,xx + 2w²,xy + w²,yy ) − p w ] dA    (7.104)
where the ﬁrst group of terms represents the elastic energy in the plate and the last term
represents the potential energy of the pressure loading (the negative sign arising from the
fact that p acts in the minus zdirection while w is the deﬂection in the positive z direction).
This functional Φ is deﬁned on the set of all kinematically admissible deﬂection ﬁelds which
is the set of all suitably smooth functions w(x, y) that satisfy the geometric requirements
(7.103). The actual deﬂection ﬁeld is the one that minimizes the potential energy Φ over this
set.
We now determine the Euler equation and natural boundary conditions associated with
(7.104) by calculating the first variation of Φ{w} and setting it equal to zero:

∫_Ω [ w,xx δw,xx + 2w,xy δw,xy + w,yy δw,yy − (p/D) δw ] dA = 0.    (7.105)
To simplify this we begin by rearranging the terms into a form that will allow us to use
the divergence theorem, thereby converting part of the area integral on Ω into a boundary
integral on ∂Ω. In order to use the divergence theorem we must write the integrand so that
it involves terms of the form (...),x + (...),y; see (7.96). Accordingly we rewrite (7.105) as
0 = ∫_Ω [ w,xx δw,xx + 2w,xy δw,xy + w,yy δw,yy − (p/D) δw ] dA

  = ∫_Ω [ ( w,xx δw,x + w,xy δw,y ),x + ( w,xy δw,x + w,yy δw,y ),y
          − w,xxx δw,x − w,xxy δw,y − w,xyy δw,x − w,yyy δw,y − (p/D) δw ] dA

  = ∫_{∂Ω} [ ( w,xx δw,x + w,xy δw,y ) n_x + ( w,xy δw,x + w,yy δw,y ) n_y ] ds
    − ∫_Ω [ w,xxx δw,x + w,xxy δw,y + w,xyy δw,x + w,yyy δw,y + (p/D) δw ] dA

  = ∫_{∂Ω} I₁ ds − ∫_Ω I₂ dA.    (7.106)
We have used the divergence theorem (7.96) in going from the second equation above to the
third. In order to facilitate further simplification, in the last step we have let I₁ and I₂
denote the integrands of the boundary and area integrals respectively.

To simplify the area integral in (7.106) we again rearrange the terms in I₂ into a form
that will allow us to use the divergence theorem. Thus
∫_Ω I₂ dA = ∫_Ω [ w,xxx δw,x + w,xxy δw,y + w,xyy δw,x + w,yyy δw,y + (p/D) δw ] dA

  = ∫_Ω [ ( w,xxx δw + w,xyy δw ),x + ( w,xxy δw + w,yyy δw ),y
          − ( w,xxxx + 2w,xxyy + w,yyyy − p/D ) δw ] dA

  = ∫_{∂Ω} [ ( w,xxx δw + w,xyy δw ) n_x + ( w,xxy δw + w,yyy δw ) n_y ] ds
    − ∫_Ω [ ∇⁴w − (p/D) ] δw dA

  = ∫_{∂Ω} ( w,xxx n_x + w,xyy n_x + w,xxy n_y + w,yyy n_y ) δw ds − ∫_Ω [ ∇⁴w − p/D ] δw dA

  = ∫_{∂Ω} P₁ δw ds − ∫_Ω P₂ δw dA,    (7.107)
where we have set

P₁ = w,xxx n_x + w,xyy n_x + w,xxy n_y + w,yyy n_y   and   P₂ = ∇⁴w − p/D.    (7.108)
In the preceding calculation, we have used the divergence theorem (7.96) in going from the
second equation in (7.107) to the third, and we have set

∇⁴w = ∇²(∇²w) = w,xxxx + 2w,xxyy + w,yyyy.
Next we simplify the boundary term in (7.106) by converting the derivatives of the
variation with respect to x and y into derivatives with respect to the normal and tangential
coordinates n and s. To do this we use the fact that ∇δw = δw,x i + δw,y j = δw,n n + δw,s s,
from which it follows that δw,x = δw,n n_x + δw,s s_x and δw,y = δw,n n_y + δw,s s_y. Thus from
(7.106),
∫_{∂Ω} I₁ ds = ∫_{∂Ω} [ ( w,xx n_x δw,x + w,xy n_x δw,y ) + ( w,xy n_y δw,x + w,yy n_y δw,y ) ] ds

  = ∫_{∂Ω} [ ( w,xx n_x + w,xy n_y ) δw,x + ( w,xy n_x + w,yy n_y ) δw,y ] ds

  = ∫_{∂Ω} [ ( w,xx n_x + w,xy n_y )( δw,n n_x + δw,s s_x )
            + ( w,xy n_x + w,yy n_y )( δw,n n_y + δw,s s_y ) ] ds

  = ∫_{∂Ω} [ ( w,xx n²_x + 2w,xy n_x n_y + w,yy n²_y ) δw,n
            + ( w,xx n_x s_x + w,xy s_x n_y + w,xy n_x s_y + w,yy n_y s_y ) δw,s ] ds

  = ∫_{∂Ω} [ ( w,xx n²_x + 2w,xy n_x n_y + w,yy n²_y ) δw,n + I₃ ] ds.    (7.109)
To facilitate further simplification we have let I₃ denote the second term in the integrand
of the last expression in (7.109); this term can be written as

∫_{∂Ω} I₃ ds = ∫_{∂Ω} ( w,xx n_x s_x + w,xy s_x n_y + w,xy n_x s_y + w,yy n_y s_y ) δw,s ds

  = ∫_{∂Ω} { [ ( w,xx n_x s_x + w,xy s_x n_y + w,xy n_x s_y + w,yy n_y s_y ) δw ],s
            − ( w,xx n_x s_x + w,xy s_x n_y + w,xy n_x s_y + w,yy n_y s_y ),s δw } ds.    (7.110)
If a field f(x, y) varies smoothly along ∂Ω, and if the curve ∂Ω itself is smooth, then

∫_{∂Ω} (∂f/∂s) ds = 0    (7.111)

since this is an integral over a closed curve.⁸ It follows from this that the first term in the
last expression of (7.110) vanishes and so
∫_{∂Ω} I₃ ds = − ∫_{∂Ω} ( w,xx n_x s_x + w,xy s_x n_y + w,xy n_x s_y + w,yy n_y s_y ),s δw ds.    (7.112)
Substituting (7.112) into (7.109) yields

∫_{∂Ω} I₁ ds = ∫_{∂Ω} P₃ ∂(δw)/∂n ds − ∫_{∂Ω} (∂P₄/∂s) δw ds,    (7.113)
where we have set

P₃ = w,xx n²_x + 2w,xy n_x n_y + w,yy n²_y,
P₄ = w,xx n_x s_x + w,xy n_y s_x + w,xy n_x s_y + w,yy n_y s_y.    (7.114)
Finally, substituting (7.113) and (7.107) into (7.106) leads to

∫_Ω P₂ δw dA − ∫_{∂Ω} [ P₁ + ∂P₄/∂s ] δw ds + ∫_{∂Ω} P₃ ∂(δw)/∂n ds = 0,    (7.115)
which must hold for all admissible variations δw. First restrict attention to variations which
vanish on the boundary ∂Ω and whose normal derivative ∂(δw)/∂n also vanishes on ∂Ω. This
leads us to the Euler equation P₂ = 0:

∇⁴w − p/D = 0 for (x, y) ∈ Ω.    (7.116)
Returning to (7.115) with this gives

− ∫_{∂Ω} [ P₁ + ∂P₄/∂s ] δw ds + ∫_{∂Ω} P₃ ∂(δw)/∂n ds = 0.    (7.117)
Since the portion ∂Ω₁ of the boundary is clamped we have w = ∂w/∂n = 0 for (x, y) ∈ ∂Ω₁.
Thus the variations δw and ∂(δw)/∂n must also vanish on ∂Ω₁, and (7.117) simplifies to

− ∫_{∂Ω₂} [ P₁ + ∂P₄/∂s ] δw ds + ∫_{∂Ω₂} P₃ ∂(δw)/∂n ds = 0    (7.118)

for variations δw and ∂(δw)/∂n that are arbitrary on ∂Ω₂, where ∂Ω₂ is the complement of
∂Ω₁, i.e. ∂Ω = ∂Ω₁ ∪ ∂Ω₂. Thus we conclude that P₁ + ∂P₄/∂s = 0 and P₃ = 0 on ∂Ω₂:
w,xxx n_x + w,xyy n_x + w,xxy n_y + w,yyy n_y
    + ∂/∂s ( w,xx n_x s_x + w,xy n_y s_x + w,xy n_x s_y + w,yy n_y s_y ) = 0,

w,xx n²_x + 2w,xy n_x n_y + w,yy n²_y = 0,

for (x, y) ∈ ∂Ω₂.    (7.119)
⁸ In the present setting one would have this degree of smoothness if there are no concentrated loads applied
on the boundary ∂Ω of the plate and the boundary curve itself has no corners.
Thus, in summary, the Kirchhoff theory of plates for the problem at hand requires that
one solve the field equation (7.116) on Ω subject to the displacement boundary conditions
(7.103) on ∂Ω₁ and the natural boundary conditions (7.119) on ∂Ω₂.
Remark: If we define the moments M_n, M_ns and force V_n by

M_n = −D ( w,xx n_x n_x + w,xy n_x n_y + w,yx n_y n_x + w,yy n_y n_y ),
M_ns = −D ( w,xx n_x s_x + w,xy n_y s_x + w,yx n_x s_y + w,yy n_y s_y ),
V_n = −D ( w,xxx n_x + w,xyy n_x + w,yxx n_y + w,yyy n_y ),    (7.120)
then the two natural boundary conditions can be written as

M_n = 0,   ∂M_ns/∂s + V_n = 0.    (7.121)
As a special case suppose that the plate is rectangular, 0 ≤ x ≤ a, 0 ≤ y ≤ b, and that
the right edge x = a, 0 ≤ y ≤ b is free of load; see the right diagram in Figure 7.18. Then
n_x = 1, n_y = 0, s_x = 0, s_y = 1 on ∂Ω₂ and so (7.120) simplifies to
M_n = −D w,xx,   M_ns = −D w,yx,   V_n = −D ( w,xxx + w,xyy ),    (7.122)
which because of (7.100) shows that in this case M_n = M_x, M_ns = M_xy, V_n = V_x. Thus the
natural boundary conditions (7.121) can be written as

M_x = 0,   ∂M_xy/∂y + V_x = 0.    (7.123)
This answers the question we posed soon after (7.101) as to what the correct boundary
conditions on a free edge should be. We had noted that intuitively we would have expected
the moments and forces to vanish on a free edge and therefore that M_x = M_xy = V_x = 0 there;
but this is in contradiction to the mathematical fact that the differential equation (7.116)
only requires two conditions at each point on the boundary. The two natural boundary
conditions (7.123) require that certain combinations of M_x, M_xy, V_x vanish but not that all
three vanish.
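The reduction of P₃ = 0 and P₁ + ∂P₄/∂s = 0 to the edge conditions (7.123) can be verified symbolically for a generic deflection w(x, y); the sketch below sets n = (1, 0), s = (0, 1) so that arclength along the edge x = a is simply y.

```python
import sympy as sp

x, y, D = sp.symbols('x y D')
w = sp.Function('w')(x, y)

# Edge x = a of the rectangular plate: n = (1, 0), s = (0, 1).
nx, ny, sx, sy = 1, 0, 0, 1

# P1, P3, P4 as defined in (7.108) and (7.114).
P1 = sp.diff(w, x, 3)*nx + sp.diff(w, x, y, y)*nx + sp.diff(w, x, x, y)*ny + sp.diff(w, y, 3)*ny
P3 = sp.diff(w, x, 2)*nx**2 + 2*sp.diff(w, x, y)*nx*ny + sp.diff(w, y, 2)*ny**2
P4 = sp.diff(w, x, 2)*nx*sx + sp.diff(w, x, y)*ny*sx + sp.diff(w, x, y)*nx*sy + sp.diff(w, y, 2)*ny*sy

# Moments and shear force from (7.100) and (7.101).
Mx = -D*sp.diff(w, x, 2)
Mxy = -D*sp.diff(w, x, y)
Vx = -D*(sp.diff(w, x, 3) + sp.diff(w, x, y, y))

# -D*P3 should equal Mx, and -D*(P1 + dP4/ds) should equal dMxy/dy + Vx,
# reproducing the natural boundary conditions (7.123).
bc1 = sp.simplify(-D*P3 - Mx)
bc2 = sp.simplify(-D*(P1 + sp.diff(P4, y)) - (sp.diff(Mxy, y) + Vx))
```

Both residuals vanish identically, so the variational conditions indeed collapse to M_x = 0 and ∂M_xy/∂y + V_x = 0 on this edge.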
Example 3: Minimal surface equation. Let C be a closed curve in R³ as sketched in
Figure 7.19. From among all surfaces S in R³ that have C as their boundary, we wish to
determine the surface that has minimum area. As a physical example, if C corresponds to
a wire loop which we dip in a soapy solution, a thin soap film will form across C. The
surface that forms is the one that, from among all possible surfaces S that are bounded
by C, minimizes the total surface energy of the film; if the surface energy density is
constant, this is the surface with minimal area.
Figure 7.19: The closed curve C in R³ is given. From among all surfaces S in R³ that have C as their
boundary, the surface with minimal area is to be sought. The surface S is described by z = φ(x, y) and the
curve C by z = h(x, y); the curve ∂D is the projection of C onto the x, y-plane.
Let C be a closed curve in R³. Suppose that its projection onto the x, y-plane is denoted
by ∂D and let D denote the simply connected region contained within ∂D; see Figure 7.19.
Suppose that C is characterized by z = h(x, y) for (x, y) ∈ ∂D. Let z = φ(x, y) for (x, y) ∈ D
describe a surface S in R³ that has C as its boundary; necessarily φ = h on ∂D. Thus the
admissible set of functions we are considering is

A = {φ : φ : D → R, φ ∈ C²(D), φ = h on ∂D}.

Consider a rectangular differential element on the x, y-plane that is contained within D.
The vector joining (x, y) to (x + dx, y) is dx = dx i, while the vector joining (x, y) to (x, y + dy)
is dy = dy j. If du and dv are vectors on the surface z = φ(x, y) whose projections are dx
and dy respectively, then we know that

du = dx i + φ_x dx k,   dv = dy j + φ_y dy k.
The vectors du and dv define a parallelogram on the surface z = φ(x, y) and the area of
this parallelogram is |du × dv|. Thus the area of a differential element on S is

|du × dv| = |−φ_x dxdy i − φ_y dxdy j + dxdy k| = √(1 + φ²_x + φ²_y) dxdy.
Consequently the problem at hand is to minimize the functional

F{φ} = ∫_D √(1 + φ²_x + φ²_y) dA

over the set of admissible functions

A = {φ : φ : D → R, φ ∈ C²(D), φ = h on ∂D}.
It is left as an exercise to show that setting the first variation of F equal to zero leads to the
so-called minimal surface equation

(1 + φ²_y) φ_xx − 2 φ_x φ_y φ_xy + (1 + φ²_x) φ_yy = 0.
Remark: See en.wikipedia.org/wiki/Soap bubble and www.susqu.edu/facstaff/b/brakke/ for
additional discussion.
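A quick way to gain confidence in the minimal surface equation is to test it on a known minimal surface. The sketch below uses Scherk's surface z = ln(cos x / cos y), a classical example not discussed in the text, and spot-checks the equation numerically at a few points where both cosines are positive.

```python
import sympy as sp

x, y = sp.symbols('x y')

# Scherk's surface (a known minimal surface, used here as a test case).
phi = sp.log(sp.cos(x) / sp.cos(y))

px, py = sp.diff(phi, x), sp.diff(phi, y)
pxx = sp.diff(phi, x, 2)
pyy = sp.diff(phi, y, 2)
pxy = sp.diff(phi, x, y)

# Left side of the minimal surface equation; it should vanish identically.
residual = (1 + py**2)*pxx - 2*px*py*pxy + (1 + px**2)*pyy

pts = [(0.3, 0.5), (0.7, -0.2), (-0.4, 0.9)]   # sample points, |x|,|y| < pi/2
vals = [abs(residual.subs({x: a, y: b}).evalf()) for a, b in pts]
```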
7.9 Second variation. Another necessary condition for a minimum.

In order to illustrate the basic ideas of this section in the simplest possible setting, we confine
the discussion to the particular functional

F{φ} = ∫₀¹ f(x, φ, φ′) dx
defined over a set of admissible functions A. Suppose that a particular function φ minimizes
F, and that for some given function η, the one-parameter family of functions φ + εη is
admissible for all sufficiently small values of the parameter ε. Define F̂(ε) by

F̂(ε) = F{φ + εη} = ∫₀¹ f(x, φ + εη, φ′ + εη′) dx,

so that by Taylor expansion,

F̂(ε) = F̂(0) + ε F̂′(0) + (ε²/2) F̂″(0) + O(ε³),

where

F̂(0) = ∫₀¹ f(x, φ, φ′) dx = F{φ},

ε F̂′(0) = δF{φ, η},

ε² F̂″(0) = ε² ∫₀¹ { f_φφ η² + 2 f_φφ′ η η′ + f_φ′φ′ (η′)² } dx ≝ δ²F{φ, η}.
7.9. SECOND VARIATION 179
Since φ minimizes F, it follows that ε = 0 minimizes F̂(ε), and consequently that

δ²F{φ, η} ≥ 0,

in addition to the requirement δF{φ, η} = 0. Thus a necessary condition for a function φ to
minimize a functional F is that the second variation of F be nonnegative for all admissible
variations δφ:

δ²F{φ, δφ} = ∫₀¹ { f_φφ (δφ)² + 2 f_φφ′ (δφ)(δφ′) + f_φ′φ′ (δφ′)² } dx ≥ 0,    (7.124)

where we have set δφ = εη. The inequality is reversed if φ maximizes F.
The condition δ²F{φ, η} ≥ 0 is necessary but not sufficient for the functional F to have
a minimum at φ. We shall not discuss sufficient conditions in general in these notes.
Proposition: Legendre Condition: A necessary condition for (7.124) to hold is that

f_φ′φ′(x, φ(x), φ′(x)) ≥ 0 for 0 ≤ x ≤ 1

for the minimizing function φ.
Example: Consider a curve in the x, y-plane characterized by y = φ(x) that begins at (0, φ₀)
and ends at (1, φ₁). From among all such curves, find the one that, when rotated about the
x-axis, generates the surface of minimum area.

Thus we are asked to minimize the functional

F{φ} = ∫₀¹ f(x, φ, φ′) dx where f(x, φ, φ′) = φ √(1 + (φ′)²),

over a set of admissible functions that satisfy the boundary conditions φ(0) = φ₀, φ(1) = φ₁.
A function φ that minimizes F must satisfy the boundary value problem consisting of
the Euler equation and the given boundary conditions:

d/dx [ φ φ′ / √(1 + (φ′)²) ] − √(1 + (φ′)²) = 0,   φ(0) = φ₀,   φ(1) = φ₁.
The general solution of this Euler equation is

φ(x) = α cosh((x − β)/α) for 0 ≤ x ≤ 1,
where the constants α and β are determined through the boundary conditions. To test the
Legendre condition we calculate f_φ′φ′ and find that

f_φ′φ′ = φ / ( 1 + (φ′)² )^{3/2},

which, when evaluated at the particular function φ(x) = α cosh((x − β)/α), yields

f_φ′φ′ |_{φ = α cosh((x−β)/α)} = α / cosh²((x − β)/α).

Therefore as long as α > 0 the Legendre condition is satisfied.
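The claim that the catenary φ(x) = α cosh((x − β)/α) satisfies the Euler equation above can be spot-checked numerically; the parameter values in this sketch are arbitrary choices.

```python
import sympy as sp

x = sp.Symbol('x')
alpha, beta = 1.3, 0.4        # arbitrary sample parameters (assumptions)

phi = alpha * sp.cosh((x - beta) / alpha)
phip = sp.diff(phi, x)

# Euler equation: d/dx[ phi*phi' / sqrt(1 + phi'^2) ] - sqrt(1 + phi'^2) = 0.
residual = sp.diff(phi * phip / sp.sqrt(1 + phip**2), x) - sp.sqrt(1 + phip**2)

# Evaluate the residual at a few points; it should be zero to round-off.
vals = [abs(residual.subs(x, xv).evalf()) for xv in (0.0, 0.5, 1.0)]
```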
7.10 Sufficient condition for minimization of convex functionals

Figure 7.20: A convex curve y = F(x) lies above the tangent line y = F(x₂) + F′(x₂)(x − x₂) through any
point of the curve: F(x₁) ≥ F(x₂) + F′(x₂)(x₁ − x₂) for all x₁, x₂ ∈ D.
We now turn to a brief discussion of sufficient conditions for a minimum for a special
class of functionals. It is useful to begin by reviewing the question of finding the minimum of
a real-valued function of a real variable. A function F(x) defined for x ∈ A with continuous
first derivatives is said to be convex if

F(x₁) ≥ F(x₂) + F′(x₂)(x₁ − x₂) for all x₁, x₂ ∈ A;
7.10. SUFFICIENT CONDITION FOR CONVEX FUNCTIONALS 181
see Figure 7.20 for a geometric interpretation of convexity. If a convex function has a stationary
point, say at x₀, then it follows by setting x₂ = x₀ in the preceding equation that x₀ is a
minimizer of F. Therefore a stationary point of a convex function is necessarily a minimizer.
If F is strictly convex on A, i.e. if F is convex and F(x₁) = F(x₂) + F′(x₂)(x₁ − x₂) if and
only if x₁ = x₂, then F can only have one stationary point and so can only have one interior
minimum.
This is also true for a real-valued function F with continuous first derivatives on a domain
A in Rⁿ, where convexity is defined by⁹

F(x₁) ≥ F(x₂) + δF(x₂, x₁ − x₂) for all x₁, x₂ ∈ A.

If a convex function has a stationary point at, say, x₀, then since δF(x₀, y) = 0 for all y
it follows that x₀ is a minimizer of F. Therefore a stationary point of a convex function
is necessarily a minimizer. If F is strictly convex on A, i.e. if F is convex and F(x₁) =
F(x₂) + δF(x₂, x₁ − x₂) if and only if x₁ = x₂, then F can have only one stationary point
and so can have only one interior minimum.
We now turn to a functional F{φ}, which is said to be convex on A if

F{φ + η} ≥ F{φ} + δF{φ, η} for all φ, φ + η ∈ A.

If F is stationary at φ₀ ∈ A, then δF{φ₀, η} = 0 for all admissible η, and it follows that φ₀
is in fact a minimizer of F. Therefore a stationary point of a convex functional is necessarily
a minimizer.
For example, consider the generic functional

F{φ} = ∫₀¹ f(x, φ, φ′) dx.    (7.125)
Then

δF{φ, η} = ∫₀¹ [ (∂f/∂φ) η + (∂f/∂φ′) η′ ] dx

and so the convexity condition F{φ + η} − F{φ} ≥ δF{φ, η} takes the special form

∫₀¹ [ f(x, φ + η, φ′ + η′) − f(x, φ, φ′) ] dx ≥ ∫₀¹ [ (∂f/∂φ) η + (∂f/∂φ′) η′ ] dx.    (7.126)
In general it might not be simple to test whether this condition holds in a particular case.
It is readily seen that a sufficient condition for (7.126) to hold is that the integrands satisfy
the inequality

f(x, y + v, z + w) − f(x, y, z) ≥ (∂f/∂y) v + (∂f/∂z) w    (7.127)

for all (x, y, z), (x, y + v, z + w) in the domain of f. This is precisely the requirement that
the function f(x, y, z) be a convex function of y, z at fixed x.

⁹ See equation (??) for the definition of δF(x, y).
Thus in summary: if the integrand f of the functional F defined in (7.125) satisfies the
convexity condition (7.127), then a function φ that extremizes F is in fact a minimizer of F.
Note that this is simply a sufficient condition for ensuring that an extremum is a minimum.

Remark: In the special case where f(x, y, z) is independent of y, one sees from basic calculus
that if ∂²f/∂z² > 0 then f(x, z) is a strictly convex function of z at each fixed x.
Example: Geodesics. Find the curve of shortest length that lies entirely on a circular
cylinder of radius a, beginning (in circular cylindrical coordinates (r, θ, ξ)) at (a, θ₁, ξ₁) and
ending at (a, θ₂, ξ₂) as shown in Figure 7.21.
Figure 7.21: A curve x = a cos θ, y = a sin θ, ξ = ξ(θ), θ₁ ≤ θ ≤ θ₂, that lies entirely on a circular cylinder
of radius a, beginning (in circular cylindrical coordinates) at (a, θ₁, ξ₁) and ending at (a, θ₂, ξ₂).
We can characterize a curve in R³ parametrically in circular cylindrical coordinates by
r = r(θ), ξ = ξ(θ), θ₁ ≤ θ ≤ θ₂. When the curve lies on the surface of a circular cylinder of
radius a this specializes to

r = a, ξ = ξ(θ) for θ₁ ≤ θ ≤ θ₂.
7.11. DIRECT METHOD 183
Since the arc length can be written as

ds = √( dr² + r² dθ² + dξ² ) = √( [r′(θ)]² + [r(θ)]² + [ξ′(θ)]² ) dθ = √( a² + [ξ′(θ)]² ) dθ,
our task is to minimize the functional

F{ξ} = ∫_{θ₁}^{θ₂} f(θ, ξ(θ), ξ′(θ)) dθ where f(x, y, z) = √(a² + z²)

over the set of all suitably smooth functions ξ(θ) defined for θ₁ ≤ θ ≤ θ₂ which satisfy
ξ(θ₁) = ξ₁, ξ(θ₂) = ξ₂.
Evaluating the necessary condition δF = 0 leads to the Euler equation. This second-order
differential equation for ξ(θ) can be readily solved, which after using the boundary conditions
ξ(θ₁) = ξ₁, ξ(θ₂) = ξ₂ leads to

ξ(θ) = ξ₁ + [ (ξ₁ − ξ₂)/(θ₁ − θ₂) ] (θ − θ₁).    (7.128)
Direct differentiation of f(x, y, z) = √(a² + z²) shows that

∂²f/∂z² = a² / (a² + z²)^{3/2} > 0,

and so f is a strictly convex function of z. Thus the curve of minimum length is given
uniquely by (7.128), a helix. Note that if the circular cylindrical surface is cut along a
vertical line and unrolled into a flat sheet, this curve unfolds into a straight line.
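The helix (7.128) can be checked numerically: its length should match the unrolled straight-line distance √(a²(θ₂ − θ₁)² + (ξ₂ − ξ₁)²), and any perturbed competitor with the same endpoints should be longer. The endpoint data below are arbitrary sample values.

```python
import numpy as np

# Sample data (assumptions for this check).
a, th1, th2, xi1, xi2 = 2.0, 0.0, np.pi, 0.0, 3.0
theta = np.linspace(th1, th2, 20001)

def length(xi):
    """Arc length of xi(theta) on the cylinder: integral of sqrt(a^2 + xi'^2)."""
    dxi = np.gradient(xi, theta)
    integrand = np.sqrt(a**2 + dxi**2)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(theta)))

xi_helix = xi1 + (xi1 - xi2)/(th1 - th2) * (theta - th1)   # eq. (7.128)
xi_pert = xi_helix + 0.3 * np.sin(theta)                   # same endpoints

L_helix = length(xi_helix)
L_pert = length(xi_pert)
L_unrolled = np.sqrt((a*(th2 - th1))**2 + (xi2 - xi1)**2)  # straight line on the unrolled sheet
```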
7.11 Direct method of the calculus of variations and
minimizing sequences.
We now turn to a different method of seeking minima, and for purposes of introduction, begin
by reviewing the familiar case of a real-valued function f(x) of a real variable x ∈ (−∞, ∞).
Consider the specific example f(x) = x². This function is nonnegative and has a minimum
value of zero which it attains at x = 0. Consider the sequence of numbers

x₀, x₁, x₂, x₃, ..., x_k, ... where x_k = 1/2ᵏ,

and note that

lim_{k→∞} f(x_k) = 0.
Figure 7.22: (a) The function f(x) = x² for −∞ < x < ∞ and (b) the function f(x) = 1 for x ≤ 0,
f(x) = x² for x > 0.
The sequence 1/2, 1/2², ..., 1/2ᵏ, ... is called a minimizing sequence in the sense that
the value of the function f(x_k) converges to the minimum value of f as k → ∞. Moreover,
observe that

lim_{k→∞} x_k = 0

as well, and so the sequence itself converges to the minimizer of f, i.e. to x = 0. This latter
feature is true because in this example

f( lim_{k→∞} x_k ) = lim_{k→∞} f(x_k).
As we know, not all functions have a minimum value, even if they happen to have a finite
greatest lower bound. We now consider an example to illustrate the fact that a minimizing
sequence can be used to find the greatest lower bound of a function that does not have a
minimum. Consider the function f(x) = 1 for x ≤ 0 and f(x) = x² for x > 0. This function
is nonnegative, and in fact, it can take values arbitrarily close to the value 0. However it
does not have a minimum value since there is no value of x for which f(x) = 0 (note that
f(0) = 1). The greatest lower bound or infimum (denoted by "inf") of f is

inf_{−∞<x<∞} f(x) = 0.

Again consider the sequence of numbers

x₀, x₁, x₂, x₃, ..., x_k, ... where x_k = 1/2ᵏ,

and note that

lim_{k→∞} f(x_k) = 0.
In this case the value of the function f(x_k) converges to the infimum of f as k → ∞. However
since

lim_{k→∞} x_k = 0,

the limit of the sequence itself is x = 0, and f(0) is not the infimum of f. This is because in
this example

f( lim_{k→∞} x_k ) ≠ lim_{k→∞} f(x_k).
Returning now to a functional, suppose that we are to find the infimum (or the minimum
if it exists) of a functional F{φ} over an admissible set of functions A. Let

inf_{φ∈A} F{φ} = m (> −∞).

Necessarily there must exist a sequence of functions φ₁, φ₂, ... in A such that

lim_{k→∞} F{φ_k} = m;

such a sequence is called a minimizing sequence.

If the sequence φ₁, φ₂, ... converges to a limiting function φ*, and if

F{ lim_{k→∞} φ_k } = lim_{k→∞} F{φ_k},

then it follows that F{φ*} = m and the function φ* is the minimizer of F. The functions φ_k
of a minimizing sequence can be considered to be approximate solutions of the minimization
problem.

Just as in the second example of this section, in some variational problems the limiting
function φ* of a minimizing sequence φ₁, φ₂, ... does not minimize the functional F; see the
last Example of this section.
7.11.1 The Ritz method

Suppose that we are to minimize a functional F{φ} over an admissible set A. Consider an
infinite sequence of functions φ₁, φ₂, ... in A. Let A_p be the subset of functions in A that
can be expressed as a linear combination of the first p functions φ₁, φ₂, ..., φ_p. In order to
minimize F over the subset A_p we must simply minimize

F̃(α₁, α₂, ..., α_p) = F{ α₁φ₁ + α₂φ₂ + ... + α_p φ_p }
with respect to the real parameters α₁, α₂, ..., α_p. Suppose that the minimum of F on A_p is
denoted by m_p. Clearly A₁ ⊂ A₂ ⊂ A₃ ... ⊂ A and therefore m₁ ≥ m₂ ≥ m₃ ... .¹⁰ Thus, in
the so-called Ritz method, we minimize F over a subset A_p to find an approximate minimizer;
moreover, increasing the value of p improves the approximation in the sense of footnote 10.
Example: Consider an elastic bar of length L and modulus E that is fixed at both ends and
carries a distributed axial load b(x). A displacement field u(x) must satisfy the boundary
conditions u(0) = u(L) = 0 and the associated potential energy is

F{u} = ∫₀ᴸ ½ E (u′)² dx − ∫₀ᴸ b u dx.
We now use the Ritz method to find an approximate displacement field that minimizes
F. Consider the sequence of functions v₁, v₂, v₃, ... where

v_p = sin(pπx/L);

observe that v_p(0) = v_p(L) = 0 for all integers p. Consider the function

u_n(x) = Σ_{p=1}^{n} α_p sin(pπx/L)
´
F(α
1
, α
2
, . . . α
n
) = F{u
n
} =
L
0
1
2
E(u
n
)
2
dx −
L
0
bu
n
dx.
Since

∫₀ᴸ 2 cos(pπx/L) cos(qπx/L) dx = 0 for p ≠ q,  L for p = q,
it follows that

∫₀ᴸ (u_n′)² dx = ∫₀ᴸ [ Σ_{p=1}^{n} α_p (pπ/L) cos(pπx/L) ] [ Σ_{q=1}^{n} α_q (qπ/L) cos(qπx/L) ] dx
             = ½ Σ_{p=1}^{n} α²_p p²π²/L.
´
F(α
1
, α
2
, . . . α
n
) = F{u
n
} =
n
¸
p=1
1
4
E α
2
p
p
2
π
2
L
−α
p
L
0
b sin
pπx
L
dx
(7.129)
¹⁰ If the sequence φ₁, φ₂, ... is complete, and the functional F{φ} is continuous in the appropriate norm,
then one can show that lim_{p→∞} m_p = m.
7.12. WORKED EXAMPLES. 187
To minimize F̃(α₁, α₂, ..., α_n) with respect to α_p we set ∂F̃/∂α_p = 0. This leads to

α_p = [ ∫₀ᴸ b sin(pπx/L) dx ] / [ E p²π² / (2L) ] for p = 1, 2, ..., n.    (7.130)
Therefore by substituting (7.130) into (7.129) we find that the $n$-term Ritz approximation of the energy is
\[
-\sum_{p=1}^{n} \frac{1}{4}\, E\, \alpha_p^2\, \frac{p^2 \pi^2}{L} \qquad \text{where} \qquad \alpha_p = \frac{\displaystyle\int_0^L b \sin \frac{p\pi x}{L}\, dx}{E\, p^2 \pi^2 / (2L)},
\]
and the corresponding approximate displacement field is given by
\[
u_n = \sum_{p=1}^{n} \alpha_p \sin \frac{p\pi x}{L} \qquad \text{where} \qquad \alpha_p = \frac{\displaystyle\int_0^L b \sin \frac{p\pi x}{L}\, dx}{E\, p^2 \pi^2 / (2L)}.
\]
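As a numerical sanity check of (7.130), the sketch below applies these formulas to a constant load $b(x) = b_0$; the values of $E$, $L$, $b_0$ are arbitrary illustrative choices. For this load the exact minimizer of $F$ is $u(x) = b_0\, x(L-x)/(2E)$, and the Ritz sum converges to it rapidly.

```python
import math

# Ritz coefficients (7.130) for the fixed-fixed bar.  E, L, b0 are
# arbitrary illustrative values, and b(x) = b0 is a constant load.
E, L, b0 = 2.0, 1.0, 3.0

def alpha(p, n_quad=2000):
    # alpha_p = (integral of b sin(p pi x/L) dx) / (E p^2 pi^2 / (2L)),
    # with the integral evaluated by the midpoint rule.
    h = L / n_quad
    integral = sum(b0 * math.sin(p * math.pi * (i + 0.5) * h / L) * h
                   for i in range(n_quad))
    return integral / (E * p**2 * math.pi**2 / (2.0 * L))

def u_ritz(x, n):
    return sum(alpha(p) * math.sin(p * math.pi * x / L) for p in range(1, n + 1))

def u_exact(x):
    # Exact minimizer for constant load: E u'' + b0 = 0, u(0) = u(L) = 0.
    return b0 * x * (L - x) / (2.0 * E)

print(u_ritz(L / 2.0, 9), u_exact(L / 2.0))   # the two values nearly agree
```

Only the odd-$p$ coefficients are nonzero here, and they decay like $1/p^3$, which is why a few terms already reproduce the midspan deflection closely.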
References

1. J.L. Troutman, Variational Calculus with Elementary Convexity, Springer-Verlag, 1983.
2. C. Fox, An Introduction to the Calculus of Variations, Dover, 1987.
3. G. Bliss, Lectures on the Calculus of Variations, University of Chicago Press, 1946.
4. L.A. Pars, Calculus of Variations, Wiley, 1963.
5. R. Weinstock, Calculus of Variations with Applications, Dover, 1952.
6. I.M. Gelfand and S.V. Fomin, Calculus of Variations, Prentice-Hall, 1963.
7. T. Mura and T. Koya, Variational Methods in Mechanics, Oxford, 1992.
7.12 Worked Examples.
Example 7.N: Consider two given points $(x_1, h_1)$ and $(x_2, h_2)$, with $h_1 > h_2$, that are to be joined by a smooth wire. The wire is permitted to have any shape, provided that it does not enter the interior of the circular region $(x - x_0)^2 + (y - y_0)^2 \leq R^2$. A bead is released from rest from the point $(x_1, h_1)$ and slides
Figure 7.23: A curve $y = \phi(x)$ joining $(x_1, h_1)$ to $(x_2, h_2)$ which is disallowed from entering the forbidden region $(x - x_0)^2 + (\phi(x) - y_0)^2 < R^2$.
along the wire (without friction) due to gravity. For what shape of wire is the time of travel from $(x_1, h_1)$ to $(x_2, h_2)$ least?
Here the wire may not enter the interior of the prescribed circular region. Therefore in considering different wires that connect $(x_1, h_1)$ to $(x_2, h_2)$, we may only consider those that lie entirely outside this region:
\[
(x - x_0)^2 + (\phi(x) - y_0)^2 \geq R^2, \qquad x_1 \leq x \leq x_2. \tag{i}
\]
The travel time of the bead is again given by (7.1) and the test functions must satisfy the same requirements as in the first example except that, in addition, they must satisfy the (inequality) constraint (i). Our task is to minimize $T\{\phi\}$ over the set $A_1$ subject to the constraint (i).
Example 7.N: Buckling: Consider a beam whose centerline occupies the interval $y = 0$, $0 < x < L$, in an undeformed configuration. A compressive force $P$ is applied at $x = L$ and the beam adopts a buckled shape described by $y = \phi(x)$. Figure 7.24 shows the centerline of the beam in both the undeformed and deformed configurations. The beam is fixed by a pin at $x = 0$; the end $x = L$ is also pinned but is permitted to move along the $x$-axis. The prescribed geometric boundary conditions on the deflected shape of the beam are thus
\[
\phi(0) = \phi(L) = 0.
\]
By geometry, the curvature $\kappa(x)$ of a curve $y = \phi(x)$ is given by
\[
\kappa(x) = \frac{\phi''(x)}{[1 + (\phi'(x))^2]^{3/2}}.
\]
From elasticity theory we know that the bending energy per unit length of a beam is $\frac{1}{2} M \kappa$ and that the bending moment $M$ is related to the curvature $\kappa$ by $M = EI\kappa$, where $EI$ is the bending stiffness of the beam. Thus the bending energy associated with a differential element of the beam is $\frac{1}{2} EI \kappa^2 \, ds$, where $ds$ is arc length along the deformed beam, and so the total bending energy in the beam is
\[
\int_0^L \frac{1}{2}\, EI\, \kappa^2(x)\, ds
\]
where the arc length $s$ is related to the coordinate $x$ by the geometric relation
\[
ds = \sqrt{1 + (\phi'(x))^2}\, dx.
\]
Thus the total bending energy of the beam is
\[
\int_0^L \frac{1}{2}\, EI\, \frac{(\phi'')^2}{[1 + (\phi')^2]^{5/2}}\, dx.
\]

Figure 7.24: An elastic beam in undeformed (lower figure) and buckled (upper figure) configurations.
Next we need to account for the potential energy associated with the compressive force $P$ on the beam. Since the change in length of a differential element is $ds - dx$, the amount by which the right-hand end of the beam moves leftwards is
\[
-\left( \int_0^L ds - \int_0^L dx \right) = -\left( \int_0^L \sqrt{1 + (\phi')^2}\, dx - L \right).
\]
Thus the potential energy associated with the applied force $P$ is
\[
-P \left( \int_0^L \sqrt{1 + (\phi')^2}\, dx - L \right).
\]
Therefore the total potential energy of the system is
\[
\Phi\{\phi\} = \int_0^L \frac{1}{2}\, EI\, \frac{(\phi'')^2}{[1 + (\phi')^2]^{5/2}}\, dx - \int_0^L P \left( \sqrt{1 + (\phi')^2} - 1 \right) dx.
\]
The Euler equation, which for such a functional (with integrand $f(x, \phi, \phi', \phi'')$) has the general form
\[
\frac{d^2}{dx^2}\left( f_{\phi''} \right) - \frac{d}{dx}\left( f_{\phi'} \right) + f_{\phi} = 0,
\]
simplifies in the present case since $f$ does not depend explicitly on $\phi$. The last term above therefore drops out and the resulting equation can be integrated once immediately. This eventually leads to the Euler equation
\[
\frac{d}{dx} \left\{ \frac{\phi''}{[1 + (\phi')^2]^{5/2}} \right\} + \frac{\phi'}{2 [1 + (\phi')^2]^{1/2}} \left\{ \frac{P}{EI/2} + 5 \left[ \frac{\phi''}{[1 + (\phi')^2]^{3/2}} \right]^2 \right\} = c
\]
where $c$ is a constant of integration, and the natural boundary conditions are
\[
\phi''(0) = \phi''(L) = 0.
\]
Example 7.N: Linearize the boundary-value problem in the buckling problem above. Also, approximate the energy and derive the Euler equation associated with it.
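For small slopes, the linearization requested here presumably reduces to $EI\,\phi'''' + P\,\phi'' = 0$ with $\phi = \phi'' = 0$ at both ends, whose smallest eigenvalue is the classical Euler load $P_1 = \pi^2 EI / L^2$. A minimal finite-difference check of that eigenvalue (the values of $EI$, $L$ and the grid size are arbitrary choices):

```python
import numpy as np

# Finite-difference estimate of the smallest buckling load of the
# linearized problem EI w'''' + P w'' = 0, w = w'' = 0 at x = 0 and x = L.
# EI, L and the grid size N are arbitrary illustrative choices.
EI, L, N = 1.0, 1.0, 400
h = L / N

# Second-difference matrix on the interior nodes (pinned-pinned ends).
D2 = (np.diag(-2.0 * np.ones(N - 1))
      + np.diag(np.ones(N - 2), 1)
      + np.diag(np.ones(N - 2), -1)) / h**2

# With w = w'' = 0 at the ends, w'''' discretizes as D2 @ D2, so
# EI D2^2 w = -P D2 w reduces to the symmetric eigenproblem (-EI D2) w = P w.
P_crit = np.linalg.eigvalsh(-EI * D2).min()
print(P_crit, np.pi**2 * EI / L**2)
```

The two printed values agree to the discretization error, which shrinks like $1/N^2$.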
Example 7.N: Let $u(x, t)$, where $0 \leq x \leq L$, $0 \leq t \leq T$, and consider the functional
\[
F\{u\} = \int_0^T \!\! \int_0^L \left( \frac{1}{2}\, u_t^2 - \frac{1}{2}\, u_x^2 \right) dx\, dt.
\]
The associated Euler equation is the wave equation
\[
u_{tt} - u_{xx} = 0.
\]
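A quick finite-difference spot check that a sample function of d'Alembert form, $u(x, t) = \sin(x - t)$ (an arbitrary choice), satisfies this Euler equation:

```python
import math

# u(x, t) = sin(x - t) is a right-moving wave; second central differences
# should give u_tt - u_xx ~ 0 at any sample point.
def u(x, t):
    return math.sin(x - t)

h = 1e-3
x0, t0 = 0.7, 0.3
u_tt = (u(x0, t0 - h) - 2.0 * u(x0, t0) + u(x0, t0 + h)) / h**2
u_xx = (u(x0 - h, t0) - 2.0 * u(x0, t0) + u(x0 + h, t0)) / h**2
print(u_tt - u_xx)   # a value near zero
```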
Example 7.N: Physical example? Consider the functional
\[
F\{u\} = \int_0^T \!\! \int_0^L \left( \frac{1}{2}\, u_t^2 - \frac{1}{2}\, u_x^2 - \frac{1}{2}\, m^2 u^2 \right) dx\, dt.
\]
The associated Euler equation is the Klein-Gordon equation
\[
u_{tt} - u_{xx} + m^2 u = 0.
\]
Example 7.N: Null Lagrangian.
Example 7.2: Soap Film Problem. Consider two circular wires, each of radius R, that are placed coaxially, a
distance H apart. The planes deﬁned by the two circles are parallel to each other and perpendicular to their
common axis. This arrangement of wires is dipped into a soapy bath and taken out. Determine the shape of
the soap ﬁlm that forms.
We shall assume that the soap ﬁlm adopts the shape with minimum surface energy, which implies that we
are to ﬁnd the shape with minimum surface area. Suppose that the ﬁlm spans across the two circular wires.
By symmetry, the surface must coincide with the surface of revolution of some curve $y = \phi(x)$, $-H \leq x \leq H$. By geometry, the surface area of this film is
\[
\text{Area}\{\phi\} = 2\pi \int \phi(x)\, ds = 2\pi \int_{-H}^{H} \phi(x) \sqrt{1 + (\phi')^2}\, dx,
\]
where we have used the fact that $ds = \sqrt{1 + (\phi')^2}\, dx$, and this is to be minimized subject to the requirements
\[
\phi(-H) = \phi(H) = R \qquad \text{and} \qquad \phi(x) \geq 0 \ \text{ for } -H < x < H.
\]
In order to determine the shape that minimizes the surface area we calculate its first variation $\delta \text{Area}$ and set it equal to zero. This gives the Euler equation
\[
\frac{d}{dx} \left[ \frac{\phi \phi'}{\sqrt{1 + (\phi')^2}} \right] - \sqrt{1 + (\phi')^2} = 0,
\]
which we can write as
\[
\frac{\phi \phi'}{\sqrt{1 + (\phi')^2}}\, \frac{d}{dx} \left[ \frac{\phi \phi'}{\sqrt{1 + (\phi')^2}} \right] - \phi \phi' = 0
\]
or
\[
\frac{d}{dx} \left[ \frac{\phi \phi'}{\sqrt{1 + (\phi')^2}} \right]^2 - \frac{d}{dx}\, \phi^2 = 0.
\]
This can be integrated to give
\[
(\phi')^2 = \left( \frac{\phi}{c} \right)^2 - 1
\]
where $c$ is a constant. Integrating again and using the boundary conditions $\phi(H) = \phi(-H) = R$ leads to
\[
\phi(x) = c \cosh \frac{x}{c} \tag{i}
\]
where $c$ is to be determined from the algebraic equation
\[
\cosh \frac{H}{c} = \frac{R}{c}. \tag{ii}
\]
Figure 7.25: Intersection of the curve described by $\zeta = \cosh \xi$ with the straight line $\zeta = \mu \xi$.

Given $H$ and $R$, if this equation can be solved for $c$, then the minimizing shape is given by (i) with this value (or values) of $c$.

To examine the solvability of (ii), set $\xi = H/c$ and $\mu = R/H$, whereupon this equation can be written as
\[
\cosh \xi = \mu \xi.
\]
As seen from Figure 7.25, the graph $\zeta = \cosh \xi$ intersects the straight line $\zeta = \mu \xi$ twice if $\mu > \mu^*$; once if $\mu = \mu^*$; and there is no intersection if $\mu < \mu^*$. Here $\mu^* \approx 1.50888$ is found by solving the pair of algebraic equations $\cosh \xi = \mu^* \xi$, $\sinh \xi = \mu^*$, where the latter equation reflects the tangency of the two curves at the contact point in this limiting case.
Thus in summary, if $R < \mu^* H$ there is no shape of the soap film that extremizes the surface area; if $R = \mu^* H$ there is a unique shape of the soap film given by (i) that extremizes the surface area; and if $R > \mu^* H$ there are two shapes of the soap film given by (i) that extremize the surface area (and further analysis investigating the stability of these configurations is needed in order to determine the physically realized shape).
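The critical value $\mu^*$ is easy to compute: dividing the tangency equation $\cosh \xi = \mu^* \xi$ by $\sinh \xi = \mu^*$ gives $\coth \xi = \xi$, which can be solved by bisection (a minimal sketch):

```python
import math

# At tangency cosh(xi) = mu* xi and sinh(xi) = mu*, so coth(xi) = xi.
# Solve coth(xi) - xi = 0 by bisection, then recover mu* = sinh(xi*).
def f(xi):
    return math.cosh(xi) / math.sinh(xi) - xi

lo, hi = 1.0, 2.0            # f(1) > 0, f(2) < 0: a sign change brackets the root
for _ in range(80):
    mid = 0.5 * (lo + hi)
    if f(mid) > 0.0:
        lo = mid
    else:
        hi = mid
xi_star = 0.5 * (lo + hi)
mu_star = math.sinh(xi_star)
print(xi_star, mu_star)       # mu* ~ 1.50888, as quoted above
```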
Remark: In order to understand what happens when $R < \mu^* H$, consider the following heuristic argument. There are three possible configurations of the soap film to consider: one, the film bridges across from one circular wire to the other but it does not form on the flat faces of the two circular wires themselves (which is the case analyzed above); two, the film forms on each circular wire but does not bridge the two wires; and three, the film does both of the above. We can immediately discard the third case since it involves more surface area than either of the first two cases. Consider the first possibility: the soap film spans across the two circular wires and, as an approximation, suppose that it forms a circular cylinder of radius $R$ and length $2H$. In this case the area of the soap film is $2\pi R (2H)$. In the second case, the soap film covers only the two end regions formed by the circular wires and so the area of the soap film is $2 \times \pi R^2$. Since $4\pi R H < 2\pi R^2$ for $2H < R$, and $4\pi R H > 2\pi R^2$ for $2H > R$, this suggests that the soap film will span across the two circular wires if $R > 2H$, whereas it will not span across the two circular wires if $R < 2H$ (and would instead cover only the two circular ends).
Example 7.4: Minimum Drag Problem. Consider a spacecraft whose shape is to be designed such that the drag on it is a minimum. The outer surface of the spacecraft is composed of two segments as shown in Figure 7.26: the inner portion ($x = 0$ with $0 < y < h_1$ in the figure) is a flat circular disk-shaped nose of radius $h_1$, and the outer portion is obtained by rigidly rotating the curve $y = \phi(x)$, $0 < x < 1$, about the $x$-axis. We are told that $\phi(0) = h_1$, $\phi(1) = h_2$ with the value of $h_2$ being given; the value of $h_1$ however is not prescribed and is to be chosen along with the function $\phi(x)$ such that the drag is minimized.
Figure 7.26: The shape of the spacecraft with minimum drag is generated by rotating the curve $y = \phi(x)$ about the $x$-axis. The spacecraft moves at a speed $V$ in the $-x$-direction; the pressure $p \sim (V \cos\theta)^2$ acts on the differential area $dA = 2\pi \phi\, ds$.
According to the most elementary model of drag (due to Newton), the pressure at some point on a surface is proportional to the square of the normal speed of that point. Thus if the spacecraft has speed $V$ relative to the surrounding medium, the pressure on the body at some generic point is proportional to $(V \cos \theta)^2$, where $\theta$ is the angle shown in Figure 7.26; this acts on a differential area $dA = 2\pi y\, ds = 2\pi \phi\, ds$. The horizontal component of this force is therefore obtained by integrating $dF \cos \theta = (V \cos \theta)^2 \times (2\pi \phi\, ds) \times \cos \theta$ over the entire body. Thus the drag $D$ is given, in suitable units, by
\[
D = \pi h_1^2 + 2\pi \int_0^1 \frac{\phi\, (\phi')^3}{1 + (\phi')^2}\, dx,
\]
where we have used the fact that $ds = dx \sqrt{1 + (\phi')^2}$ and $\cos \theta = \phi' / \sqrt{1 + (\phi')^2}$.
To optimize this we calculate the first variation of $D$, remembering that both the function $\phi$ and the parameter $h_1$ can be varied. Thus we are led to
\begin{align*}
\delta D &= 2\pi h_1\, \delta h_1 + 2\pi \int_0^1 \frac{(\phi')^3}{1 + (\phi')^2}\, \delta \phi \, dx + 2\pi \int_0^1 \phi \left[ \frac{3 (\phi')^2}{1 + (\phi')^2} - \frac{2 (\phi')^4}{[1 + (\phi')^2]^2} \right] \delta \phi' \, dx \\
&= 2\pi h_1\, \delta h_1 + 2\pi \int_0^1 \frac{(\phi')^3}{1 + (\phi')^2}\, \delta \phi \, dx + 2\pi \int_0^1 \left[ \frac{\phi\, (\phi')^2 \left( 3 + (\phi')^2 \right)}{[1 + (\phi')^2]^2} \right] \delta \phi' \, dx.
\end{align*}
Integrating the last term by parts and recalling that $\delta\phi(1) = 0$ and $\delta\phi(0) = \delta h_1$ (since the value of $\phi(1)$ is prescribed but the value of $\phi(0) = h_1$ is not), we are led to
\[
\delta D = 2\pi h_1\, \delta h_1 + 2\pi \int_0^1 \frac{(\phi')^3}{1 + (\phi')^2}\, \delta \phi \, dx - 2\pi \int_0^1 \frac{d}{dx} \left[ \frac{\phi\, (\phi')^2 (3 + (\phi')^2)}{[1 + (\phi')^2]^2} \right] \delta \phi \, dx - \left. \frac{2\pi \phi\, (\phi')^2 (3 + (\phi')^2)}{[1 + (\phi')^2]^2} \right|_{x=0} \delta h_1.
\]
The arbitrariness of $\delta \phi$ and $\delta h_1$ now yields
\[
\frac{d}{dx} \left[ \frac{\phi\, (\phi')^2 (3 + (\phi')^2)}{[1 + (\phi')^2]^2} \right] - \frac{(\phi')^3}{1 + (\phi')^2} = 0 \ \text{ for } 0 < x < 1, \qquad
\left. \frac{(\phi')^2 (3 + (\phi')^2)}{[1 + (\phi')^2]^2} \right|_{x=0} = 1. \tag{i}
\]
The differential equation in (i)$_1$ and the natural boundary condition (i)$_2$ can be readily reduced to
\[
\frac{d}{dx} \left[ \frac{\phi\, (\phi')^3}{[1 + (\phi')^2]^2} \right] = 0 \ \text{ for } 0 < x < 1, \qquad \phi'(0) = 1. \tag{ii}
\]
The differential equation (ii) tells us that
\[
\frac{\phi\, (\phi')^3}{[1 + (\phi')^2]^2} = c_1 \qquad \text{for } 0 < x < 1, \tag{iii}
\]
where $c_1$ is a constant. Together with the given boundary conditions, we are therefore to solve the differential equation (iii) subject to the conditions
\[
\phi(0) = h_1, \qquad \phi(1) = h_2, \qquad \phi'(0) = 1, \tag{iv}
\]
in order to find the shape $\phi(x)$ and the parameter $h_1$. Since $\phi(0) = h_1$ and $\phi'(0) = 1$, evaluating (iii) at $x = 0$ shows that $c_1 = h_1/4$.
It is most convenient to write the solution of (iii) with $c_1 = h_1/4$ parametrically by setting $\phi' = \xi$. This leads to
\[
\phi = \frac{h_1}{4} \left( \xi^{-3} + 2\xi^{-1} + \xi \right), \qquad
x = \frac{h_1}{4} \left( \frac{3}{4}\, \xi^{-4} + \xi^{-2} + \log \xi \right) + c_2, \qquad 1 > \xi > \xi_2,
\]
where $c_2$ is a constant of integration and $\xi$ is the parameter. On physical grounds we expect that the slope $\phi'$ will decrease with increasing $x$ and so we have supposed that $\xi$ decreases as $x$ increases; thus as $x$ increases from $0$ to $1$ we have supposed that $\xi$ decreases from $\xi_1$ to $\xi_2$ (where we know that $\xi_1 = \phi'(0) = 1$). Since $\xi = \phi' = 1$ when $x = 0$ the preceding equation gives $c_2 = -7h_1/16$. Thus
\[
\phi = \frac{h_1}{4} \left( \xi^{-3} + 2\xi^{-1} + \xi \right), \qquad
x = \frac{h_1}{4} \left( \frac{3}{4}\, \xi^{-4} + \xi^{-2} + \log \xi - \frac{7}{4} \right), \qquad 1 > \xi > \xi_2. \tag{v}
\]
The boundary condition $\phi = h_1$, $\phi' = 1$ at $x = 0$ has already been satisfied. The boundary condition $\phi = h_2$, $x = 1$ at $\xi = \xi_2$ requires that
\[
h_2 = \frac{h_1}{4} \left( \xi_2^{-3} + 2\xi_2^{-1} + \xi_2 \right), \qquad
1 = \frac{h_1}{4} \left( \frac{3}{4}\, \xi_2^{-4} + \xi_2^{-2} + \log \xi_2 - \frac{7}{4} \right), \tag{vi}
\]
which are two equations for determining $h_1$ and $\xi_2$. Dividing the first of (vi) by the second yields a single equation for $\xi_2$:
\[
\xi_2^5 - h_2\, \xi_2^4 \log \xi_2 + \frac{7}{4}\, h_2\, \xi_2^4 + 2\xi_2^3 - h_2\, \xi_2^2 + \xi_2 - \frac{3}{4}\, h_2 = 0. \tag{vii}
\]
If this can be solved for $\xi_2$, then either equation in (vi) gives the value of $h_1$ and (v) then provides a parametric description of the optimal shape. For example if we take $h_2 = 1$ then the root of (vii) is $\xi_2 \approx 0.521703$ and then $h_1 \approx 0.350943$.
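The quoted numbers are straightforward to reproduce: a bisection sketch for the root of (vii) with $h_2 = 1$, followed by the first equation of (vi) solved for $h_1$:

```python
import math

# Root of (vii) for h2 = 1 by bisection, then h1 from the first of (vi).
h2 = 1.0

def g(x):
    return (x**5 - h2 * x**4 * math.log(x) + 1.75 * h2 * x**4
            + 2.0 * x**3 - h2 * x**2 + x - 0.75 * h2)

lo, hi = 0.3, 0.9            # g(0.3) < 0 and g(0.9) > 0 bracket the root
for _ in range(80):
    mid = 0.5 * (lo + hi)
    if g(mid) < 0.0:
        lo = mid
    else:
        hi = mid
xi2 = 0.5 * (lo + hi)

# First of (vi): h2 = (h1/4)(xi2^-3 + 2 xi2^-1 + xi2), solved for h1.
h1 = 4.0 * h2 / (xi2**-3 + 2.0 / xi2 + xi2)
print(xi2, h1)               # ~0.521703 and ~0.350943, as quoted above
```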
Example 7.5: Consider the variational problem where we are asked to minimize the functional
\[
F\{\phi\} = \int_0^1 f(\phi, \phi')\, dx
\]
over some admissible set of functions $A$. Note that this is a special case of the standard problem, one in which the function $f(x, \phi, \phi')$ does not depend explicitly on $x$. In the present case $f$ depends on $x$ only through $\phi(x)$ and $\phi'(x)$.

The Euler equation is given, as usual, by
\[
\frac{d}{dx}\left( f_{\phi'} \right) - f_{\phi} = 0 \qquad \text{for } 0 < x < 1.
\]
Multiplying this by $\phi'$ gives
\[
\phi'\, \frac{d}{dx}\left( f_{\phi'} \right) - \phi' f_{\phi} = 0 \qquad \text{for } 0 < x < 1,
\]
which can be written equivalently as
\[
\left[ \frac{d}{dx}\left( \phi' f_{\phi'} \right) - \phi'' f_{\phi'} \right] - \left[ \frac{d}{dx}\, f - \phi'' f_{\phi'} \right] = 0 \qquad \text{for } 0 < x < 1.
\]
Since this simplifies to
\[
\frac{d}{dx}\left( \phi' f_{\phi'} - f \right) = 0 \qquad \text{for } 0 < x < 1,
\]
it follows that in this special case the Euler equation can be integrated once to have the simplified form
\[
\phi' f_{\phi'} - f = \text{constant} \qquad \text{for } 0 < x < 1.
\]
Remark: We could have taken advantage of this in, for example, the preceding problem.
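As a quick numerical illustration of this first integral, take the soap-film integrand $f(\phi, \phi') = \phi\sqrt{1 + (\phi')^2}$ from Example 7.2 and an arbitrarily chosen constant $c$; the quantity $\phi' f_{\phi'} - f$ is then the same constant ($-c$) at every point of the extremal $\phi = c \cosh(x/c)$:

```python
import math

# Along the extremal phi(x) = c*cosh(x/c) of the soap-film integrand
# f(phi, phi') = phi*sqrt(1 + phi'^2) (Example 7.2), the first integral
# phi' f_phi' - f should equal the same constant (here -c) at every x.
c = 0.7   # an arbitrary choice of the constant of integration

def first_integral(x):
    phi = c * math.cosh(x / c)
    dphi = math.sinh(x / c)
    root = math.sqrt(1.0 + dphi**2)
    f = phi * root                  # the integrand
    f_dphi = phi * dphi / root      # its derivative with respect to phi'
    return dphi * f_dphi - f

values = [first_integral(x) for x in (-0.5, 0.0, 0.4, 1.0)]
print(values)   # every entry equals -0.7 up to rounding
```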
Example 7.6: Elastic bar. The following problem arises when examining the equilibrium state of a one-dimensional bar composed of a nonlinearly elastic material. An equilibrium state of the bar is characterized by a displacement field $u(x)$, and the material of which the bar is composed is characterized by a potential $\widehat{W}(u')$. It is convenient to denote the derivative of $\widehat{W}$ by $\hat\sigma$,
\[
\hat\sigma(u') = \widehat{W}{}'(u'),
\]
so that then $\sigma(x) = \hat\sigma(u'(x))$ represents the stress at the point $x$ in the bar. The bar has unit cross-sectional area and occupies the interval $0 \leq x \leq L$ in a reference configuration. The end $x = 0$ of the bar is fixed, so that $u(0) = 0$, a prescribed force $P$ is applied at the end $x = L$, and a distributed force per unit length $b(x)$ is applied along the length of the bar.

An admissible displacement field is required to be continuous on $[0, L]$, piecewise continuously differentiable on $[0, L]$, and to conform to the boundary condition $u(0) = 0$. The total potential energy associated with any admissible displacement field is
\[
V\{u\} = \int_0^L \widehat{W}(u'(x))\, dx - \int_0^L b(x)\, u(x)\, dx - P u(L),
\]
which can be written in the conventional form
\[
V\{u\} = \int_0^L f(x, u, u')\, dx \qquad \text{where} \qquad f(x, u, u') = \widehat{W}(u') - b u - P u'.
\]
The actual displacement field minimizes the potential energy $V$ over the admissible set, and so the three basic ingredients of the theory can now be derived as follows:

i. At any point $x$ at which the displacement field is smooth, the Euler equation
\[
\frac{d}{dx}\left( \frac{\partial f}{\partial u'} \right) - \frac{\partial f}{\partial u} = 0
\]
takes the explicit form
\[
\frac{d}{dx}\, \widehat{W}{}'(u') + b = 0,
\]
which can be written in terms of stress as
\[
\frac{d\sigma}{dx} + b = 0.
\]
ii. The displacement field $u(x)$ satisfies the prescribed boundary condition $u = 0$ at $x = 0$. The natural boundary condition at the right-hand end is given, according to equation (7.50) in Section 7.5.1, by
\[
f_{u'} = 0 \qquad \text{at } x = L,
\]
which in the present case reduces to
\[
\sigma(x) = \widehat{W}{}'(u'(x)) = P \qquad \text{at } x = L.
\]

iii. Finally, suppose that $u'$ has a jump discontinuity at some location $x = s$. Then the first Weierstrass-Erdmann corner condition (7.88) requires that $\partial f / \partial u'$ be continuous at $x = s$, i.e. that the stress $\sigma(x)$ must be continuous at $x = s$:
\[
\sigma \big|_{x = s-} = \sigma \big|_{x = s+}. \tag{i}
\]
The second Weierstrass-Erdmann corner condition (7.89) requires that $f - u'\, \partial f / \partial u'$ be continuous at $x = s$, i.e. that the quantity $W - u' \sigma$ must be continuous at $x = s$:
\[
\big( W - u' \sigma \big)\big|_{x = s-} = \big( W - u' \sigma \big)\big|_{x = s+}. \tag{ii}
\]
Remark: The generalization of the quantity $W - u' \sigma$ to three dimensions is known as the Eshelby tensor.
Figure 7.27: (a) A non-monotonic (rising-falling-rising) stress response function $\hat\sigma(u')$ and (b) the corresponding non-convex energy $\widehat{W}(u')$.
In order to illustrate how a discontinuity in $u'$ can arise in an elastic bar, observe first that according to the first Weierstrass-Erdmann condition (i), the stress $\sigma$ on either side of $x = s$ has to be continuous. Thus if the function $\hat\sigma(u')$ is monotonically increasing, then it follows that $\sigma = \hat\sigma(u')$ has a unique solution $u'$ corresponding to a given $\sigma$, and so $u'$ must also be continuous at $x = s$. On the other hand if $\hat\sigma(u')$ is a non-monotonic function as, for example, shown in Figure 7.27(a), then more than one value of $u'$ can correspond to the same value of $\sigma$, and so in such a case, even though $\sigma(x)$ is continuous at $x = s$, it is possible for $u'$ to be discontinuous, i.e. for $u'(s-) \neq u'(s+)$, as shown in the figure. The energy function $\widehat{W}(u')$ sketched in Figure 7.27(b) corresponds to the stress function $\hat\sigma(u')$ shown in Figure 7.27(a). In particular, the values of $u'$ at which $\hat\sigma$ has a local maximum and local minimum correspond to inflection points of the energy function $\widehat{W}(u')$, since $\widehat{W}{}'' = 0$ when $\hat\sigma{}' = 0$.
The second Weierstrass-Erdmann condition (ii) tells us that the stress $\sigma$ at the discontinuity has to have a special value. To see this we write out (ii) explicitly as
\[
\widehat{W}(u'(s+)) - u'(s+)\, \sigma = \widehat{W}(u'(s-)) - u'(s-)\, \sigma
\]
and then use $\hat\sigma(u') = \widehat{W}{}'(u')$ to express it in the form
\[
\int_{u'(s-)}^{u'(s+)} \hat\sigma(v)\, dv = \sigma \left[ u'(s+) - u'(s-) \right]. \tag{iii}
\]
This implies that the value of $\sigma$ must be such that the area under the stress response curve in Figure 7.27(a) from $u'(s-)$ to $u'(s+)$ must equal the area of the rectangle which has the same base and has height $\sigma$; or equivalently, that the two shaded areas in Figure 7.27(a) must be equal.
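A small numerical check of the equal-area rule (iii), using a hypothetical rising-falling-rising stress response $\hat\sigma(e) = (e - 1)^3 - (e - 1) + 1$ chosen purely for illustration; by its symmetry about $e = 1$, the special corner stress should be $\sigma = 1$, with $u'(s-) = 0$ and $u'(s+) = 2$:

```python
# Hypothetical non-monotonic stress response (an arbitrary illustrative choice):
#   sigma_hat(e) = (e - 1)^3 - (e - 1) + 1,   with e standing for u'.
# It rises, falls, then rises again, and is symmetric about e = 1, so the
# corner stress selected by the equal-area rule (iii) is sigma = 1 with
# u'(s-) = 0 and u'(s+) = 2.
def sigma_hat(e):
    return (e - 1.0)**3 - (e - 1.0) + 1.0

sigma, e_minus, e_plus = 1.0, 0.0, 2.0

# Midpoint-rule quadrature of sigma_hat between the two corner strains.
n = 100000
h = (e_plus - e_minus) / n
integral = sum(sigma_hat(e_minus + (i + 0.5) * h) for i in range(n)) * h

# (iii): area under sigma_hat equals the area of the rectangle of height sigma.
print(integral, sigma * (e_plus - e_minus))   # both ~2.0
```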
Example 7.7: Non-smooth extremal. Find a curve that extremizes
\[
F\{\phi\} = \int_0^1 f(x, \phi(x), \phi'(x))\, dx,
\]
that begins from $(0, a)$, and ends at $(1, b)$ after contacting a given curve $y = g(x)$.

Remark: By identifying the curve $y = g(x)$ with the surface of a mirror and specializing the functional $F$ to the travel time of light, one can thus derive the law of reflection for light.
Example 7.8: Inequality Constraint. Find a curve that extremizes
\[
I(\phi) = \int_0^a (\phi'(x))^3\, dx, \qquad \phi(0) = \phi(a) = 0,
\]
that is prohibited from entering the interior of the circle
\[
(x - a/2)^2 + y^2 = b^2.
\]
Example 7.9: An example to caution against simple-minded discretization. (Due to John Ball). Let
\[
F\{u\} = \int_0^1 \left( u^3(x) - x \right)^2 \left( u'(x) \right)^6 dx
\]
for all functions such that $u(0) = 0$, $u(1) = 1$. Clearly $F\{u\} \geq 0$. Moreover $F\{\bar{u}\} = 0$ for $\bar{u}(x) = x^{1/3}$. Therefore the minimizer of $F\{u\}$ is $\bar{u}(x) = x^{1/3}$.

Discretize the interval $[0, 1]$ into $N$ segments, and take a piecewise linear test function that is linear on each segment. Calculate the functional $F$ at this test function, next minimize it at fixed $N$, and finally take its limit as $N$ tends to infinity. What do you get? (You will get an answer but not the correct one.)

To anticipate this difficulty in a different way, consider a 2-element discretization, and take the continuous test function
\[
u_1(x) =
\begin{cases}
c\,x & \text{for } 0 < x < h, \\[2pt]
\bar{u}(x) & \text{for } h < x < 1,
\end{cases}
\tag{i}
\]
where $c = h^{-2/3}$ for continuity at $x = h$. Calculate $F\{u_1\}$ for this function. Take the limit as $h \to 0$ and observe that $F\{u_1\}$ does not go to zero (i.e. to $F\{\bar{u}\}$).
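A short numerical evaluation makes the difficulty concrete. With $c = h^{-2/3}$ and the exponent $6$ on $u'$ (the form in Mania's classical example of this phenomenon), the substitution $x = ht$ gives $F\{u_1\} = 8/(105\,h)$, which blows up as $h \to 0$ instead of tending to $F\{\bar{u}\} = 0$:

```python
# F{u_1} for the two-piece test function (i) with c = h^(-2/3).  On (h, 1)
# the integrand vanishes since u-bar(x) = x^(1/3) makes u^3 - x = 0, so only
# the linear piece on (0, h) contributes; midpoint-rule quadrature there.
def F_of_u1(h, n=20000):
    c = h**(-2.0 / 3.0)
    dx = h / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        total += ((c * x)**3 - x)**2 * (c**6) * dx
    return total

for h in (0.1, 0.01, 0.001):
    print(h, F_of_u1(h), 8.0 / (105.0 * h))   # the two columns agree
```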
Example 7.10: Legendre necessary condition for a local minimum: Let
\[
F\{\varepsilon\} = \int_0^1 f(x, \phi + \varepsilon \eta, \phi' + \varepsilon \eta')\, dx
\]
for all functions $\eta(x)$ with $\eta(0) = \eta(1) = 0$. Show that
\[
F''(0) = \int_0^1 \left[ f_{\phi\phi}\, \eta^2 + 2 f_{\phi\phi'}\, \eta \eta' + f_{\phi'\phi'}\, (\eta')^2 \right] dx.
\]
Suppose that $F''(0) \geq 0$ for all admissible functions $\eta$. Show that it is necessary that
\[
f_{\phi'\phi'}(x, \phi(x), \phi'(x)) \geq 0 \qquad \text{for } 0 \leq x \leq 1.
\]
Example 7.11: Bending of a thin plate. Consider a thin rectangular plate of dimensions $a \times b$ that occupies the region $A = \{(x, y) \,|\, 0 < x < a,\ 0 < y < b\}$ of the $x, y$-plane. A distributed pressure loading $p(x, y)$ is applied on the planar face of the plate in the $z$-direction, and the resulting deflection of the plate in the $z$-direction is denoted by $w(x, y)$. The edges $x = 0$ and $y = 0$ of the plate are clamped, which implies the geometric restrictions that the plate cannot deflect nor rotate along these edges:
\[
w = 0, \quad \frac{\partial w}{\partial x} = 0 \ \text{ on } x = 0,\ 0 < y < b; \qquad
w = 0, \quad \frac{\partial w}{\partial y} = 0 \ \text{ on } y = 0,\ 0 < x < a; \tag{i}
\]
the edge $y = b$ is hinged, which means that its deflection must be zero but there is no geometric restriction on the slope:
\[
w = 0 \ \text{ on } y = b,\ 0 < x < a; \tag{ii}
\]
and finally the edge $x = a$ is free in the sense that the deflection and the slope are not geometrically restricted in any way.

The potential energy of the plate and loading associated with an admissible deflection field, i.e. a function $w(x, y)$ that obeys (i) and (ii), is given by
\[
\Phi\{w\} = \frac{D}{2} \int_A \left[ \left( \frac{\partial^2 w}{\partial x^2} + \frac{\partial^2 w}{\partial y^2} \right)^2 - 2(1 - \nu) \left( \frac{\partial^2 w}{\partial x^2}\, \frac{\partial^2 w}{\partial y^2} - \left( \frac{\partial^2 w}{\partial x \partial y} \right)^2 \right) \right] dx\, dy - \int_A p\, w \, dx\, dy, \tag{iii}
\]
where $D$ and $\nu$ are constants. The actual deflection of the plate is given by the minimizer of $\Phi$. We are asked to derive the Euler equation and the natural boundary conditions to be satisfied by the minimizing $w$.
Answer: The Euler equation is
\[
\frac{\partial^4 w}{\partial x^4} + 2\, \frac{\partial^4 w}{\partial x^2 \partial y^2} + \frac{\partial^4 w}{\partial y^4} = \frac{p}{D}, \qquad 0 < x < a,\ 0 < y < b, \tag{iv}
\]
and the natural boundary conditions are
\[
\frac{\partial^2 w}{\partial y^2} + \nu\, \frac{\partial^2 w}{\partial x^2} = 0 \quad \text{on } y = b,\ 0 < x < a; \tag{v}
\]
and
\[
\frac{\partial^2 w}{\partial x^2} + \nu\, \frac{\partial^2 w}{\partial y^2} = 0 \quad \text{and} \quad \frac{\partial^3 w}{\partial x^3} + (2 - \nu)\, \frac{\partial^3 w}{\partial x \partial y^2} = 0 \quad \text{on } x = a,\ 0 < y < b. \tag{vi}
\]
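The biharmonic form of (iv) can be spot-checked numerically. The sample field below, $w = \sin(\pi x/a)\sin(\pi y/b)$, is an arbitrary choice used only to exercise the operator (it does not satisfy the mixed boundary conditions of this particular example); analytically, $\nabla^4 w = (\pi^2/a^2 + \pi^2/b^2)^2\, w$.

```python
import math

# Check w_xxxx + 2 w_xxyy + w_yyyy = (pi^2/a^2 + pi^2/b^2)^2 w for the
# sample field w = sin(pi x / a) sin(pi y / b); a, b are arbitrary choices.
a, b = 1.3, 0.8

def w(x, y):
    return math.sin(math.pi * x / a) * math.sin(math.pi * y / b)

def biharmonic_fd(x, y, h=1e-2):
    # Central differences for the fourth derivatives.
    wxxxx = (w(x - 2*h, y) - 4*w(x - h, y) + 6*w(x, y)
             - 4*w(x + h, y) + w(x + 2*h, y)) / h**4
    wyyyy = (w(x, y - 2*h) - 4*w(x, y - h) + 6*w(x, y)
             - 4*w(x, y + h) + w(x, y + 2*h)) / h**4
    wxx = lambda x_, y_: (w(x_ - h, y_) - 2*w(x_, y_) + w(x_ + h, y_)) / h**2
    wxxyy = (wxx(x, y - h) - 2*wxx(x, y) + wxx(x, y + h)) / h**2
    return wxxxx + 2.0 * wxxyy + wyyyy

x0, y0 = 0.4, 0.3
exact = (math.pi**2 / a**2 + math.pi**2 / b**2)**2 * w(x0, y0)
print(biharmonic_fd(x0, y0), exact)   # close to each other
```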
Example 7.12: Consider the functional
\[
F\{\phi\} = \int_0^1 \left[ \left( (\phi')^2 - 1 \right)^2 + \phi^2 \right] dx, \qquad \phi(0) = \phi(1) = 0, \tag{i}
\]
and determine a minimizing sequence $\phi_1, \phi_2, \phi_3, \ldots$ such that $F\{\phi_k\}$ approaches its infimum as $k \to \infty$. If the minimizing sequence itself converges to $\phi^*$, show that $F\{\phi^*\}$ is not the infimum of $F$.

Remark: Note that this functional is non-negative. If the functional takes the value zero, then, since its integrand is the sum of two non-negative terms, each of those terms must vanish individually. Thus we must have $\phi'(x) = \pm 1$ and $\phi(x) = 0$ on the interval $0 < x < 1$. These cannot both be satisfied by a regular function.
Figure 7.28: Sawtooth function $\phi_k(x)$ with $k$ local maxima and linear segments of slope $\pm 1$; each tooth has base $h = 1/k$ and height $h/2$.
Let $\phi_k(x)$ be the piecewise linear sawtooth function with $k$ local maxima as shown in Figure 7.28; the slope of each linear segment is $\pm 1$, and each tooth has base $h = 1/k$ and height $h/2$. Thus as $k$ increases there are more and more teeth, each of which has a smaller base and smaller height. Observe that the first term in the integrand of (i) vanishes identically for any $k$; the second term, which equals the area under the square of $\phi_k$, approaches zero as $k \to \infty$. Thus the family of sawtooth functions is a minimizing sequence of (i). However, note that since $\phi_k(x) \to 0$ at each fixed $x$ as $k \to \infty$, the limiting function to which this sequence converges, i.e. $\phi(x) = 0$, is not a minimizer of (i).
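A numerical sketch of this minimizing sequence: with slopes $\pm 1$ the first term of (i) vanishes, and the remaining term works out to $F\{\phi_k\} = 1/(12 k^2) \to 0$, even though the pointwise limit $\phi = 0$ gives $F\{0\} = 1$:

```python
# Sawtooth minimizing sequence: k teeth, slopes +-1, tooth height 1/(2k).
# The first term of (i) is identically zero, so F{phi_k} reduces to the
# integral of phi_k^2, which equals 1/(12 k^2).
def phi_k(x, k):
    h = 1.0 / k
    n = min(int(x / h), k - 1)    # index of the tooth containing x
    s = x - n * h                  # position within the tooth
    return s if s <= h / 2.0 else h - s

def F_second_term(k, n_quad=100000):
    dx = 1.0 / n_quad
    return sum(phi_k((i + 0.5) * dx, k)**2 for i in range(n_quad)) * dx

for k in (1, 4, 16):
    print(k, F_second_term(k), 1.0 / (12.0 * k**2))   # the columns agree
```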
. . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 7. . . . . . . . 185 7. . .7 CONTENTS Diﬀerential constraints .10 Suﬃcient condition for convex functionals . . . . . . . . . . . . .11 Direct method . . . . . . . . . .2 7. . . . . . 183 7. . . . . . . . . .x 7. . . . . . . . . . . . . .3 7.11. . . . . . . . . . . . . . . . . . . . . . . . . .8 7.6. . . . . . . . 162 . . . . . . . . . . . . . . . . . . .1 Piecewise smooth minimizer with nonsmoothness occuring at a prescribed location. . . . . 158 Piecewise smooth minimizer with nonsmoothness occuring at an unknown location . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12 Worked Examples. . . 157 7. . . .7. . .7. . . . . . . . . . . . . . . . . .1 The Ritz method . . . . . . . . . . . . . . . . . . . . Second variation . . 180 7. . . . . . . . . . . . . . . 187 . . . . . . . . . . . .9 Generalization to higher dimensional space. . . . . . . . . . 155 WeirstrassErdman corner conditions . . . . . 178 7. . . . . . . . . . . . . . .
Chapter 1

Matrix Algebra and Indicial Notation

Notation:
  {a}  .....  m × 1 matrix, i.e. a column matrix with m rows and one column
  ai   .....  element in the i-th row of the column matrix {a}
  [A]  .....  m × n matrix
  Aij  .....  element in the i-th row and j-th column of the matrix [A]

1.1 Matrix algebra

Even though more general matrices can be considered, for our purposes it is sufficient to consider a matrix to be a rectangular array of real numbers that obeys certain rules of addition and multiplication. An m × n matrix [A] has m rows and n columns:

          | A11  A12  ...  A1n |
    [A] = | A21  A22  ...  A2n |        (1.1)
          | ...  ...  ...  ... |
          | Am1  Am2  ...  Amn |

Aij denotes the element located in the i-th row and j-th column. The column matrix

          | x1 |
    {x} = | x2 |        (1.2)
          | .. |
          | xm |
has m rows and one column. The row matrix

    {y} = {y1, y2, ..., yn}        (1.3)

has one row and n columns. Two m × n matrices [A] and [B] are said to be equal if and only if all of their corresponding elements are equal:

    Aij = Bij,    i = 1, 2, ..., m,  j = 1, 2, ..., n.        (1.4)

If all the elements of a matrix are zero it is said to be a null matrix and is denoted by [0] or {0} as the case may be.

If [A] and [B] are both m × n matrices, their sum is the m × n matrix [C] denoted by [C] = [A] + [B] whose elements are

    Cij = Aij + Bij,    i = 1, 2, ..., m,  j = 1, 2, ..., n.        (1.5)

If [A] is a p × q matrix and [B] is a q × r matrix, their product is the p × r matrix [C] with elements

    Cij = Σ_{k=1}^{q} Aik Bkj,    i = 1, 2, ..., p,  j = 1, 2, ..., r;        (1.6)

one writes [C] = [A][B]. Note that an m1 × n1 matrix [A1] can be post-multiplied by an m2 × n2 matrix [A2] if and only if n1 = m2. In particular, consider an m × n matrix [A] and an n × 1 (column) matrix {x}. Then we can post-multiply [A] by {x} to get the m × 1 column matrix [A]{x}, but we cannot pre-multiply [A] by {x} (unless m = 1); i.e. {x}[A] does not exist in general.

In general [A][B] ≠ [B][A]; therefore, rather than referring to [A][B] as the product of [A] and [B], we should more precisely refer to [A][B] as [A] post-multiplied by [B], or [B] pre-multiplied by [A]. It is worth noting that if two matrices [A] and [B] obey the equation [A][B] = [0], this does not necessarily mean that either [A] or [B] has to be the null matrix [0]. Similarly, if three matrices [A], [B] and [C] obey [A][B] = [A][C], this does not necessarily mean that [B] = [C] (even if [A] ≠ [0]).

The product of an m × n matrix [A] by a scalar α is the m × n matrix [B] with components

    Bij = αAij,    i = 1, 2, ..., m,  j = 1, 2, ..., n;        (1.7)

one writes [B] = α[A].

The transpose of the m × n matrix [A] is the n × m matrix [B] where

    Bij = Aji    for each  i = 1, 2, ..., n,  j = 1, 2, ..., m.        (1.8)
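The component formula (1.6) translates directly into nested loops over the free indices i, j and the summed index k. The sketch below is a minimal illustration in plain Python (lists of lists; all names are illustrative, not from the text), and it also demonstrates that [A][B] and [B][A] differ in general:

```python
# Matrix product C_ij = sum_k A_ik B_kj, written as explicit loops
# over the free indices i, j and the summed index k (cf. (1.6)).
def matmul(A, B):
    p, q, r = len(A), len(B), len(B[0])
    assert len(A[0]) == q  # [A] is p x q, [B] must be q x r
    return [[sum(A[i][k] * B[k][j] for k in range(q)) for j in range(r)]
            for i in range(p)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

AB = matmul(A, B)
BA = matmul(B, A)
print(AB)        # [[2, 1], [4, 3]]
print(AB != BA)  # True: matrix multiplication does not commute in general
```

The same helper also makes the dimension rule concrete: post-multiplication is only defined when the inner dimensions agree, which the assertion enforces.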
Usually one denotes the matrix [B] by [A]T. One can verify that

    [A + B]T = [A]T + [B]T,    [AB]T = [B]T [A]T.        (1.9)

The transpose of a column matrix is a row matrix, and vice versa. Suppose that [A] is an m × n matrix and that {x} is an m × 1 (column) matrix. Then we can pre-multiply [A] by {x}T, i.e. {x}T [A] exists (and is a 1 × n row matrix). For any n × 1 column matrix {x} note that

    {x}T {x} = x1^2 + x2^2 + ... + xn^2 = Σ_{i=1}^{n} xi^2.        (1.10)

An n × n matrix [A] is called a square matrix; the diagonal elements of this matrix are the Aii's. A square matrix [A] is said to be symmetric if

    Aij = Aji,    i, j = 1, 2, ..., n;        (1.11)

and skew-symmetric if

    Aij = −Aji,    i, j = 1, 2, ..., n.        (1.12)

Thus for a symmetric matrix [A] we have [A]T = [A], while for a skew-symmetric matrix [A] we have [A]T = −[A]. Observe that each diagonal element of a skew-symmetric matrix must be zero.

If the off-diagonal elements of a square matrix are all zero, i.e. Aij = 0 for each i ≠ j, the matrix is said to be diagonal. If every diagonal element of a diagonal matrix is 1, the matrix is called a unit matrix and is usually denoted by [I].

Suppose that [A] is an n × n square matrix and that {x} is an n × 1 (column) matrix. Then we can post-multiply [A] by {x} to get the n × 1 column matrix [A]{x}, and pre-multiply the resulting matrix by {x}T to get a 1 × 1 matrix, effectively just a scalar, {x}T [A]{x}. Note that

    {x}T [A]{x} = Σ_{i=1}^{n} Σ_{j=1}^{n} Aij xi xj.        (1.13)

This is referred to as the quadratic form associated with [A]. In the special case of a diagonal matrix [A],

    {x}T [A]{x} = A11 x1^2 + A22 x2^2 + ... + Ann xn^2.        (1.14)

The trace of a square matrix is the sum of the diagonal elements of that matrix and is denoted by trace[A]:

    trace[A] = Σ_{i=1}^{n} Aii.        (1.15)
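The double sum in the quadratic form (1.13) and the single sum in the trace (1.15) can be checked with a few lines of plain Python; this is only an illustrative sketch (names are not from the text), including the diagonal special case (1.14):

```python
# Quadratic form {x}^T [A] {x} = sum_i sum_j A_ij x_i x_j (cf. (1.13))
# and trace[A] = sum_i A_ii (cf. (1.15)).
def quadratic_form(A, x):
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

A = [[2, 1], [1, 3]]
x = [1, 2]
print(quadratic_form(A, x))  # 2 + 2 + 2 + 12 = 18
print(trace(A))              # 5

D = [[2, 0], [0, 3]]  # diagonal case: A11 x1^2 + A22 x2^2
print(quadratic_form(D, x) == 2 * 1 ** 2 + 3 * 2 ** 2)  # True
```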
One can show that

    trace([A][B]) = trace([B][A]).        (1.16)

Let det[A] denote the determinant of a square matrix. Then for a 2 × 2 matrix

    det | A11  A12 | = A11 A22 − A12 A21,        (1.17)
        | A21  A22 |

and for a 3 × 3 matrix

    det | A11  A12  A13 | = A11 det | A22  A23 | − A12 det | A21  A23 | + A13 det | A21  A22 |.        (1.18)
        | A21  A22  A23 |           | A32  A33 |           | A31  A33 |           | A31  A32 |
        | A31  A32  A33 |

The determinant of an n × n matrix is defined recursively in a similar manner. One can show that

    det([A][B]) = (det[A]) (det[B]).        (1.19)

Note that trace[A] and det[A] are both scalar-valued functions of the matrix [A].

Consider an n × n square matrix [A]. For each i = 1, 2, ..., n, a row matrix {a}i can be created by assembling the elements in the i-th row of [A]: {a}i = {Ai1, Ai2, Ai3, ..., Ain}. If the only scalars αi for which

    α1 {a}1 + α2 {a}2 + α3 {a}3 + ... + αn {a}n = {0}        (1.20)

are α1 = α2 = ... = αn = 0, the rows of [A] are said to be linearly independent. If at least one of the α's is non-zero, they are said to be linearly dependent, and in that case at least one row of [A] can be expressed as a linear combination of the other rows.

Consider a square matrix [A] and suppose that its rows are linearly independent. Then the matrix is said to be non-singular, and there exists a matrix [B], usually denoted by [B] = [A]−1 and called the inverse of [A], for which [B][A] = [A][B] = [I]. For [A] to be non-singular it is necessary and sufficient that det[A] ≠ 0. If the rows of [A] are linearly dependent, the matrix is singular and an inverse matrix does not exist.

Consider an n × n square matrix [A]. First consider the (n−1) × (n−1) matrix obtained by eliminating the i-th row and j-th column of [A]; then consider the determinant of that matrix; and finally consider the product of that determinant with (−1)^(i+j). The number thus obtained is called the cofactor of Aij. If [B] is the inverse of [A], [B] = [A]−1, then

    Bij = (cofactor of Aji) / det[A].        (1.21)
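The recursive definition of the determinant and the cofactor formula (1.21) for the inverse can both be written in a few lines. The following is a minimal sketch in plain Python (illustrative only, and far less efficient than standard elimination methods); note the transposition Aji inside the inverse, exactly as in (1.21):

```python
# Determinant by recursive cofactor expansion along the first row,
# and the inverse via B_ij = cofactor(A_ji) / det[A] (cf. (1.21)).
def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

def inverse(A):
    n = len(A)
    d = det(A)
    assert d != 0, "matrix is singular: no inverse exists"
    def cof(i, j):
        minor = [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]
        return (-1) ** (i + j) * det(minor)
    # note the transposition: B_ij uses the cofactor of A_ji
    return [[cof(j, i) / d for j in range(n)] for i in range(n)]

A = [[2, 1], [5, 3]]
print(det(A))      # 2*3 - 1*5 = 1
print(inverse(A))  # [[3.0, -1.0], [-5.0, 2.0]]
```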
If the transpose and inverse of a matrix coincide, i.e. if

    [A]−1 = [A]T,        (1.22)

then the matrix is said to be orthogonal. Note that for an orthogonal matrix [A], one has [A][A]T = [A]T [A] = [I] and that det[A] = ±1.

1.2 Indicial notation

Consider an n × n square matrix [A] and two n × 1 column matrices {x} and {b}. Let Aij denote the element of [A] in its i-th row and j-th column, and let xi and bi denote the elements in the i-th row of {x} and {b} respectively. Now consider the matrix equation [A]{x} = {b}:

    | A11  A12  ...  A1n | | x1 |   | b1 |
    | A21  A22  ...  A2n | | x2 | = | b2 |        (1.23)
    | ...  ...  ...  ... | | .. |   | .. |
    | An1  An2  ...  Ann | | xn |   | bn |

Carrying out the matrix multiplication, this is equivalent to the system of linear algebraic equations

    A11 x1 + A12 x2 + ... + A1n xn = b1,
    A21 x1 + A22 x2 + ... + A2n xn = b2,
    ...
    An1 x1 + An2 x2 + ... + Ann xn = bn.        (1.24)

This system of equations can be written more compactly as

    Ai1 x1 + Ai2 x2 + ... + Ain xn = bi    with i taking each value in the range 1, 2, ..., n;        (1.25)

or even more compactly, by omitting the statement "with i taking each value in the range 1, 2, ..., n" and simply writing

    Ai1 x1 + Ai2 x2 + ... + Ain xn = bi,        (1.26)

with the understanding that (1.26) holds for each value of the subscript i in the range i = 1, 2, ..., n. This understanding is referred to as the range convention. The subscript i is called a free subscript because it is free to take on each value in its range. From here on, we shall always use the range convention unless explicitly stated otherwise.
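Read as an algorithm, the indicial statement (1.26) is an outer loop over the free index i, each pass containing a sum over the column index. A minimal sketch in plain Python (all names illustrative, not from the text):

```python
# A_i1 x_1 + ... + A_in x_n = b_i: the free index i stands for n separate
# scalar equations; the inner sum runs over the column index j (cf. (1.26)).
def apply_matrix(A, x):
    n = len(x)
    return [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]

A = [[1, 2], [3, 4]]
x = [5, 6]
b = apply_matrix(A, x)
print(b)  # [1*5 + 2*6, 3*5 + 4*6] = [17, 39]
```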
Observe that

    Aj1 x1 + Aj2 x2 + ... + Ajn xn = bj        (1.27)

is identical to (1.26); this is because j is a free subscript in (1.27), and so (1.27) is required to hold "for all j = 1, 2, ..., n", which leads back to (1.24). This illustrates the fact that the particular choice of index for the free subscript in an equation is not important, provided that the same free subscript appears in every symbol grouping.(1)

    (1) By a "symbol group" we mean a set of terms contained between +, − and = signs.

In order to be consistent, it is important that the same free subscript(s) appear once, and only once, in every group of symbols in an equation. For example, in equation (1.26), since the index i appears once in the symbol group Ai1 x1, it must necessarily appear once in each of the remaining symbol groups Ai2 x2, Ai3 x3, ..., Ain xn and bi of that equation.

As a second example, suppose that f(x1, x2, ..., xn) is a function of x1, x2, ..., xn. Then, if we write the equation

    ∂f/∂xk = 3xk,        (1.28)

the index k in it is a free subscript and so takes all values in the range 1, 2, ..., n. Therefore (1.28) is a compact way of writing the n equations

    ∂f/∂x1 = 3x1,    ∂f/∂x2 = 3x2,    ...,    ∂f/∂xn = 3xn.        (1.29)

As a third example, the equation

    Apq = xp xq        (1.30)

has two free subscripts p and q, and each, independently, takes all values in the range 1, 2, ..., n. Therefore (1.30) corresponds to the n^2 equations

    A11 = x1 x1,   A12 = x1 x2,   ...,   A1n = x1 xn,
    A21 = x2 x1,   A22 = x2 x2,   ...,   A2n = x2 xn,
    ...
    An1 = xn x1,   An2 = xn x2,   ...,   Ann = xn xn.        (1.31)

In general, if an equation involves N free indices, each taking values in a range of n values, then it represents n^N scalar equations. Similarly, since the free subscripts p and q appear in the symbol group on the left-hand
side of equation (1.30), they must also appear in the symbol group on the right-hand side. An equation of the form Apq = xi xj would violate this consistency requirement, as would Ai1 xi + Aj2 x2 = 0.

Note finally that had we adopted the range convention in Section 1.1, we would have omitted the various "i = 1, 2, ..., m, j = 1, 2, ..., n" statements there and written, for example, equation (1.4) for the equality of two matrices as simply Aij = Bij; equation (1.5) for the sum of two matrices as simply Cij = Aij + Bij; equation (1.7) for the scalar multiple of a matrix as Bij = αAij; equation (1.8) for the transpose of a matrix as simply Bij = Aji; equation (1.11) defining a symmetric matrix as simply Aij = Aji; and equation (1.12) defining a skew-symmetric matrix as simply Aij = −Aji.

1.3 Summation convention

Next, observe that (1.26) can be written as

    Σ_{j=1}^{n} Aij xj = bi.        (1.32)

We can simplify the notation even further by agreeing to drop the summation sign and instead imposing the rule that summation is implied over a subscript that appears twice in a symbol grouping. With this understanding in force, we would write (1.32) as

    Aij xj = bi        (1.33)

with summation on the subscript j being implied. A subscript that appears twice in a symbol grouping is called a repeated or dummy subscript; the subscript j in (1.33) is a dummy subscript. Note that

    Aik xk = bi        (1.34)

is identical to (1.33); this is because k is a dummy subscript in (1.34), and therefore summation on k is implied in (1.34). Thus the particular choice of index for the dummy subscript is not important.

In order to avoid ambiguity, no subscript is allowed to appear more than twice in any symbol grouping. Thus we shall never write, for example, Aii xi = bi since, if we did, the index i would appear three times in the first symbol group.
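Two consequences of the summation convention can be verified mechanically: renaming a dummy index changes nothing, and a repeated index by itself (as in Aii) denotes a sum. A minimal sketch in plain Python (illustrative names only):

```python
# Under the summation convention a repeated (dummy) index is summed, and
# the letter chosen for it is immaterial: A_ij x_j and A_ik x_k are the
# same n equations (cf. (1.33), (1.34)).
n = 2
A = [[1, 2], [3, 4]]
x = [5, 6]

b_j = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]  # A_ij x_j
b_k = [sum(A[i][k] * x[k] for k in range(n)) for i in range(n)]  # A_ik x_k
print(b_j == b_k)  # True: renaming the dummy index changes nothing

A_ii = sum(A[i][i] for i in range(n))  # repeated index i implies a sum
print(A_ii)  # 5, i.e. trace[A]
```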
It is important to emphasize that each of the equations in, for example, (1.24) involves scalar quantities, and therefore the order in which the terms appear within a symbol group is irrelevant. Thus, for example, (1.24)1 is equivalent to

    x1 A11 + x2 A12 + ... + xn A1n = b1.        (1.35)

Likewise we can write (1.33) equivalently as xj Aij = bi. Note that both Aij xj = bi and xj Aij = bi represent the matrix equation [A]{x} = {b}; in particular, the second equation does not correspond to {x}[A] = {b}. In an indicial equation it is the location of the subscripts that is crucial; in particular, it is the location where the repeated subscript appears that tells us whether {x} multiplies [A] or [A] multiplies {x}.

We can change the free subscript p in every term of the equation

    Apq xq = bp        (1.36)

to any other index, say k, and equivalently write Akq xq = bk. We can also change the repeated subscript q to some other index, say s, and write

    Aks xs = bk.        (1.37)

The three preceding equations are identical.

Summary of Rules:

1. Lower-case Latin subscripts take on values in the range (1, 2, ..., n).
2. A given index may appear either once or twice in a symbol grouping. If it appears once, it is called a free index and it takes on each value in its range. If it appears twice, it is called a dummy index and summation is implied over it.
3. The same index may not appear more than twice in the same symbol grouping.
4. All symbol groupings in an equation must have the same free subscripts.
5. Free and dummy indices may be changed without altering the meaning of an expression, provided that one does not violate the preceding rules.

Note finally that had we adopted the range and summation conventions in Section 1.1, we would have written equation (1.6) for the product of two matrices as Cij = Aik Bkj; equation (1.10) for the product of a column matrix by its transpose as {x}T {x} = xi xi; equation (1.13) for the quadratic form as {x}T [A]{x} = Aij xi xj; and equation (1.15) for the trace as trace[A] = Aii.
1.4 Kronecker delta

The Kronecker delta, δij, is defined by

    δij = 1 if i = j,    δij = 0 if i ≠ j.        (1.38)

Note that it represents the elements of the identity matrix. If [Q] is an orthogonal matrix, then we know that [Q][Q]T = [Q]T [Q] = [I]. This implies, in indicial notation, that

    Qik Qjk = Qki Qkj = δij.        (1.39)

The following useful property of the Kronecker delta is sometimes called the substitution rule. Consider, for example, any column matrix {u} and suppose that one wishes to simplify the expression ui δij. Recall that

    ui δij = u1 δ1j + u2 δ2j + ... + un δnj.

Since δij is zero unless i = j, it follows that all terms on the right-hand side vanish trivially except for the one term for which i = j. Thus the term that survives on the right-hand side is uj, and so

    ui δij = uj.        (1.40)

Thus we have used the facts that (i) since δij is zero unless i = j, the expression being simplified has a non-zero value only if i = j; and (ii) when i = j, δij is unity. Replacing the Kronecker delta by unity, and changing the repeated subscript i → j, gives ui δij = uj.

Similarly, suppose that [A] is a square matrix and one wishes to simplify Ajk δℓj. By the same reasoning, one replaces the Kronecker delta by unity and changes the repeated subscript j → ℓ to obtain(2)

    Ajk δℓj = Aℓk.        (1.41)

More generally, if δip multiplies a quantity Cijkℓ representing n^4 numbers, one replaces the Kronecker delta by unity and changes the repeated subscript i → p to obtain

    Cijkℓ δip = Cpjkℓ.        (1.42)

The substitution rule applies even more generally: for any quantity or expression Tipq...z,

    Tipq...z δij = Tjpq...z.        (1.43)

    (2) Observe that these results are immediately apparent by using matrix algebra. In the first example, δji ui (which is equal to the quantity ui δij that is given) is simply the j-th element of the column matrix [I]{u}; since [I]{u} = {u}, the result follows at once. Similarly, in the second example, δℓj Ajk is simply the ℓ,k-element of the matrix [I][A]; since [I][A] = [A], the result follows.
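The substitution rule is easy to confirm numerically by building the Kronecker delta as an array and contracting it explicitly. A minimal sketch in plain Python (names illustrative, 0-based indices):

```python
# Numerical check of the substitution rule: u_i d_ij = u_j and
# A_jk d_lj = A_lk (cf. (1.40), (1.41)).
n = 3
delta = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

u = [4, 7, 9]
lhs = [sum(u[i] * delta[i][j] for i in range(n)) for j in range(n)]
print(lhs == u)  # True: contracting with delta just relabels i -> j

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
lhs2 = [[sum(A[j][k] * delta[l][j] for j in range(n)) for k in range(n)]
        for l in range(n)]
print(lhs2 == A)  # True: A_jk d_lj = A_lk
```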
1.5 The alternator or permutation symbol

We now limit attention to subscripts that range over the values 1, 2, 3 only. The alternator or permutation symbol is defined by

    eijk =  0  if two or more of the subscripts i, j, k are equal,
           +1  if the subscripts i, j, k are in cyclic order:      (i, j, k) = (1, 2, 3), (2, 3, 1), (3, 1, 2),
           −1  if the subscripts i, j, k are in anti-cyclic order: (i, j, k) = (3, 2, 1), (2, 1, 3), (1, 3, 2).        (1.44)

Observe from its definition that the sign of eijk changes whenever any two adjacent subscripts are switched:

    eijk = −ejik = ejki.        (1.45)

One can show by direct calculation that the determinant of a 3 × 3 matrix [A] can be written in either of the two forms

    det[A] = eijk A1i A2j A3k    or    det[A] = eijk Ai1 Aj2 Ak3,        (1.46)

as well as in the form

    det[A] = (1/6) eijk epqr Aip Ajq Akr.        (1.47)

Another useful identity involving the determinant is

    epqr det[A] = eijk Aip Ajq Akr.        (1.48)

The following relation involving the alternator and the Kronecker delta will be useful in subsequent calculations:

    eijk epqk = δip δjq − δiq δjp.        (1.49)

It is left to the reader to develop proofs of these identities. They can, of course, be verified directly by simply writing out all of the terms in (1.46)-(1.49).
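Since the subscripts range over only three values, the identities above can be verified by brute force, exactly as the text suggests. A minimal sketch in plain Python (0-based indices; the closed-form expression for the alternator is an illustrative shortcut, not from the text):

```python
# The alternator e_ijk, the determinant formula det[A] = e_ijk A_1i A_2j A_3k
# (cf. (1.46)), and the e-delta identity (1.49), all checked by enumeration.
def alt(i, j, k):
    # sign of the permutation (i, j, k) of (0, 1, 2); 0 if any index repeats
    return (j - i) * (k - i) * (k - j) // 2 if len({i, j, k}) == 3 else 0

def delta(i, j):
    return 1 if i == j else 0

A = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
detA = sum(alt(i, j, k) * A[0][i] * A[1][j] * A[2][k]
           for i in range(3) for j in range(3) for k in range(3))
print(detA)  # 25

ok = all(
    sum(alt(i, j, k) * alt(p, q, k) for k in range(3))
    == delta(i, p) * delta(j, q) - delta(i, q) * delta(j, p)
    for i in range(3) for j in range(3) for p in range(3) for q in range(3))
print(ok)  # True: (1.49) holds for every choice of i, j, p, q
```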
1.6 Worked Examples.

Example (1.1): If [A] and [B] are n × n square matrices and {x}, {y}, {z} are n × 1 column matrices, express the matrix equation

    {y} = [A]{x} + [B]{z}

as a set of scalar equations.

Solution: By the rules of matrix multiplication, the element yi in the i-th row of {y} is obtained by first pair-wise multiplying the elements Ai1, Ai2, ..., Ain of the i-th row of [A] by the respective elements x1, x2, ..., xn of {x} and summing, then doing the same for the elements of [B] and {z}, and finally adding the two together. Thus

    yi = Aij xj + Bij zj,

where summation over the dummy index j is implied, and this equation holds for each value of the free index i = 1, 2, ..., n. Note that one can alternatively, and equivalently, write the above equation in any of the following forms:

    yk = Akj xj + Bkj zj,    yi = Aip xp + Biq zq,    yk = Akp xp + Bkp zp.

Observe that all rules for indicial notation are satisfied by each of the three equations above.

Example (1.2): The n × n matrices [C], [D] and [E] are defined in terms of the two n × n matrices [A] and [B] by

    [C] = [A][B],    [D] = [B][A],    [E] = [A][B]T.

Express the elements of [C], [D] and [E] in terms of the elements of [A] and [B].

Solution: By the rules of matrix multiplication, the element Cij in the i-th row and j-th column of [C] is obtained by multiplying the elements of the i-th row of [A], pair-wise, by the respective elements of the j-th column of [B] and summing. So Cij is obtained by multiplying the elements Ai1, Ai2, ..., Ain by, respectively, the elements B1j, B2j, ..., Bnj and summing. Thus

    Cij = Aik Bkj,

where summation is carried out over the repeated index k; note that i and j are both free indices here, and so this represents n^2 scalar equations. It follows likewise that the equation [D] = [B][A] leads to

    Dij = Bik Akj,    or equivalently    Dij = Akj Bik,

where the second expression was obtained by simply changing the order in which the terms appear in the first expression (since, as noted previously, the order of terms within a symbol group is insignificant because these are scalar quantities). In order to calculate Eij, we first multiply [A] by [B]T to obtain Eij = Aik (BT)kj. However, by definition of transposition, the i,j-element of the matrix [B]T equals the j,i-element of the matrix [B], (BT)ij = Bji, and so we can write

    Eij = Aik Bjk.
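The indicial result of Example (1.2) for [E] = [A][B]T can be checked against a direct computation that forms [B]T explicitly. A minimal sketch in plain Python (helper names are illustrative, not from the text):

```python
# Check of Example (1.2): E_ij = A_ik B_jk gives the same matrix as
# transposing [B] explicitly and then multiplying.
def transpose(B):
    return [list(col) for col in zip(*B)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

n = 2
E_indicial = [[sum(A[i][k] * B[j][k] for k in range(n)) for j in range(n)]
              for i in range(n)]
print(E_indicial)                               # [[17, 23], [39, 53]]
print(E_indicial == matmul(A, transpose(B)))    # True
```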
Example (1.3): If [S] is any symmetric matrix and [W] is any skew-symmetric matrix, show that

    Sij Wij = 0.

Solution: Note that both i and j are dummy subscripts here; therefore there are summations over each of them, and there is no free subscript, so this is just a single scalar equation. Whenever there is a dummy subscript, the choice of the particular index for that dummy subscript is arbitrary, and we can change it to another index, provided that we change both repeated subscripts to the new symbol (and as long as we do not have any subscript appearing more than twice). Thus, for example, since i is a dummy subscript in Sij Wij, we can change i → p and get Sij Wij = Spj Wpj. Note that we can change i to any other index except j; if we did change it to j, then there would be four j's and that violates one of our rules.

By changing the dummy indices i → p and j → q, we get Sij Wij = Spq Wpq. We can now change dummy indices again, from p → j and q → i, which gives Spq Wpq = Sji Wji. On combining these, we get

    Sij Wij = Sji Wji.

Effectively, we have changed both i and j simultaneously from i → j and j → i. Next, since [S] is symmetric, Sji = Sij; and since [W] is skew-symmetric, Wji = −Wij. Therefore Sji Wji = −Sij Wij. Using this in the right-hand side of the preceding equation gives Sij Wij = −Sij Wij, from which it follows that Sij Wij = 0.

Remark: As a special case, take Sij = ui uj where {u} is an arbitrary column matrix; note that this [S] is symmetric. It follows that for any skew-symmetric [W],

    Wij ui uj = 0    for all ui.

Example (1.4): Show that any matrix [A] can be additively decomposed into the sum of a symmetric matrix and a skew-symmetric matrix.

Solution: Define matrices [S] and [W] in terms of the given matrix [A] as follows:

    Sij = (1/2)(Aij + Aji),    Wij = (1/2)(Aij − Aji).

It may be readily verified from these definitions that Sij = Sji and that Wij = −Wji. Thus, the matrix [S] is symmetric and [W] is skew-symmetric. By adding the two equations above, one obtains

    Sij + Wij = Aij,    or in matrix form,    [A] = [S] + [W].

Example (1.5): Show that the quadratic form Tij ui uj is unchanged if Tij is replaced by its symmetric part, i.e. show that for any matrix [T],

    Tij ui uj = Sij ui uj    for all ui,    where  Sij = (1/2)(Tij + Tji).        (i)

Solution: The result follows from the following calculation:

    Tij ui uj = [(1/2)Tij + (1/2)Tij + (1/2)Tji − (1/2)Tji] ui uj
              = (1/2)(Tij + Tji) ui uj + (1/2)(Tij − Tji) ui uj
              = Sij ui uj,

where in the last step we have used the facts that Aij = Tij − Tji is skew-symmetric, that Bij = ui uj is symmetric, and that for any symmetric matrix [B] and any skew-symmetric matrix [A], one has Aij Bij = 0.

Example (1.6): Suppose that D1111, D1112, ..., D111n, D1121, D1122, ..., D112n, ..., Dnnnn are n^4 constants, and let Dijkl denote a generic element of this set, where each of the subscripts i, j, k, l takes all values in the range 1, 2, ..., n. Let [E] be an arbitrary symmetric matrix and define the elements of a matrix [A] by

    Aij = Dijkl Ekl.

Show that [A] is unchanged if Dijkl is replaced by its "symmetric part" Cijkl, where

    Cijkl = (1/2)(Dijkl + Dijlk).        (i)

Solution: In a manner entirely analogous to the previous example,

    Aij = Dijkl Ekl = [(1/2)Dijkl + (1/2)Dijkl + (1/2)Dijlk − (1/2)Dijlk] Ekl
        = (1/2)(Dijkl + Dijlk) Ekl + (1/2)(Dijkl − Dijlk) Ekl
        = Cijkl Ekl,

where in the last step we have used the fact that (Dijkl − Dijlk) Ekl = 0, since Dijkl − Dijlk is skew-symmetric in the subscripts k, l while Ekl is symmetric in the subscripts k, l.

Example (1.7): Evaluate the expression δij δik δjk.

Solution: By using the substitution rule, first on the repeated index i and then on the repeated index j, we have

    δij δik δjk = δjk δjk = δkk = δ11 + δ22 + ... + δnn = n.

Example (1.8): Given an orthogonal matrix [Q], use indicial notation to solve the matrix equation [Q]{x} = {a} for {x}.

Solution: In indicial form, the equation [Q]{x} = {a} reads

    Qij xj = ai.

Multiplying both sides by Qik gives

    Qik Qij xj = Qik ai.

Since [Q] is orthogonal, we know from (1.39) that Qrp Qrq = δpq. Thus the preceding equation simplifies to

    δjk xj = Qik ai,

which, by the substitution rule, reduces further to

    xk = Qik ai.

In matrix notation this reads {x} = [Q]T {a}, which we could, of course, have written down immediately from the fact that {x} = [Q]−1 {a} and, for an orthogonal matrix, [Q]−1 = [Q]T.

Example (1.9): Consider the function f(x1, x2, ..., xn) = Aij xi xj, where the Aij's are constants. Calculate the partial derivatives ∂f/∂xi.

Solution: We begin by making two general observations. First, note that because of the summation on the indices i and j, it is incorrect to conclude that ∂f/∂xi = Aij xj by viewing this in the same way as differentiating the function A12 x1 x2 with respect to x1. Second, observe that if we differentiate f with respect to xi and write ∂f/∂xi = ∂(Aij xi xj)/∂xi, we would violate our rules because the right-hand side has the subscript i appearing three times in one symbol grouping. In order to get around this difficulty, we make use of the fact that the specific choice of the index in a dummy subscript is not significant, and so we can write f = Apq xp xq.

Differentiating f and using the fact that [A] is constant gives

    ∂f/∂xi = ∂(Apq xp xq)/∂xi = Apq ∂(xp xq)/∂xi = Apq [(∂xp/∂xi) xq + xp (∂xq/∂xi)].

Since the xi's are independent variables, it follows that

    ∂xi/∂xj = 0 if i ≠ j,    ∂xi/∂xj = 1 if i = j;    i.e.  ∂xi/∂xj = δij.

Using this above gives

    ∂f/∂xi = Apq [δpi xq + xp δqi] = Apq δpi xq + Apq xp δqi,

which, by the substitution rule, simplifies to

    ∂f/∂xi = Aiq xq + Api xp = Aij xj + Aji xj = (Aij + Aji) xj.

Example (1.10): Suppose that {x}T [A]{x} = 0 for all column matrices {x}, where the square matrix [A] is independent of {x}. What does this imply about [A]?

Solution: We know from a previous example that if [A] is skew-symmetric and [S] is symmetric, then Aij Sij = 0, and as a special case of this that Aij xi xj = 0 for all {x}. Thus a sufficient condition for the given equation to hold is that [A] be skew-symmetric. Now we show that this is also a necessary condition.

We are given that Aij xi xj = 0 for all xi. Since this equation holds for all xi, we may differentiate both sides with respect to xk and proceed as follows:

    0 = ∂(Aij xi xj)/∂xk = Aij ∂(xi xj)/∂xk = Aij (∂xi/∂xk) xj + Aij xi (∂xj/∂xk) = Aij δik xj + Aij xi δjk,        (i)

where we have used the fact that ∂xi/∂xj = δij in the last step. On using the substitution rule, this simplifies to

    Akj xj + Aik xi = (Akj + Ajk) xj = 0.        (ii)

Since this also holds for all xi, it may be differentiated again with respect to xi to obtain

    (Akj + Ajk) ∂xj/∂xi = (Akj + Ajk) δji = Aki + Aik = 0.        (iii)

Thus [A] must necessarily be a skew-symmetric matrix. Therefore it is necessary and sufficient that [A] be skew-symmetric.

Example (1.11): Let Cijkl be a set of n^4 constants. Define the function W([E]) for all matrices [E] by W([E]) = W(E11, E12, ..., Enn) = (1/2) Cijkl Eij Ekl. Calculate

    ∂W/∂Eij    and    ∂^2 W/∂Eij ∂Ekl.        (i)

Solution: First, since the Eij's are independent variables, it follows that

    ∂Epq/∂Eij = 1 if p = i and q = j, and 0 otherwise;    i.e.  ∂Epq/∂Eij = δpi δqj.        (ii)

Keeping this in mind and differentiating W(E11, E12, ..., Enn) with respect to Eij gives

    ∂W/∂Eij = ∂[(1/2) Cpqrs Epq Ers]/∂Eij
            = (1/2) Cpqrs [(∂Epq/∂Eij) Ers + Epq (∂Ers/∂Eij)]
            = (1/2) Cpqrs (δpi δqj Ers + δri δsj Epq)
            = (1/2) Cijrs Ers + (1/2) Cpqij Epq
            = (1/2) (Cijpq + Cpqij) Epq,

where we have made use of the substitution rule. (Note that in the first step we wrote W = (1/2) Cpqrs Epq Ers rather than W = (1/2) Cijkl Eij Ekl, because we would violate our rules for indices had we written ∂((1/2) Cijkl Eij Ekl)/∂Eij.) Differentiating this once more with respect to Ekl gives

    ∂^2 W/∂Eij ∂Ekl = ∂[(1/2)(Cijpq + Cpqij) Epq]/∂Ekl = (1/2)(Cijpq + Cpqij) δpk δql = (1/2)(Cijkl + Cklij).        (iii)

Example (1.12): Evaluate the expression eijk ekij.

Solution: By first using the skew-symmetry property (1.45), then using the identity (1.49), and finally using the substitution rule, we have

    eijk ekij = −eijk eikj = −(δjk δkj − δjj δkk) = −(δjj − δjj δkk) = −(3 − 3 × 3) = 6.

Example (1.13): Show that eijk Sjk = 0 if and only if the matrix [S] is symmetric.

Solution: First, suppose that [S] is symmetric. Pick and fix the free subscript i at any value i = 1, 2, 3. Then we can think of eijk as the j,k-element of a 3 × 3 matrix. Since eijk = −eikj, this is a skew-symmetric matrix. In a previous example we showed that Sjk Wjk = 0 for any symmetric matrix [S] and any skew-symmetric matrix [W]. Consequently

    eijk Sjk = 0        (i)

must hold. Conversely, suppose that (i) holds for some matrix [S]. Multiplying (i) by eipq and using the identity (1.49) leads to

    eipq eijk Sjk = (δpj δqk − δpk δqj) Sjk = Spq − Sqp = 0,        (ii)

where in the last step we have used the substitution rule. Thus Spq = Sqp, and so [S] is symmetric.

Remark: Note as a special case of this result that eijk vj vk = 0 for any arbitrary column matrix {v}.

References

1. R. Bellman, Introduction to Matrix Analysis, McGraw-Hill, 1960.
2. R.A. Frazer, W.J. Duncan and A.R. Collar, Elementary Matrices, Cambridge University Press, 1965.
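Several of the worked examples above can be spot-checked numerically. The sketch below (plain Python, 0-based indices, all names illustrative) verifies the symmetric/skew-symmetric split of Example (1.4), the contraction result of Example (1.3), and the alternator results of Examples (1.12) and (1.13):

```python
# Spot-checks of Examples (1.3), (1.4), (1.12) and (1.13).
def alt(i, j, k):
    # alternator e_ijk for indices in {0, 1, 2}
    return (j - i) * (k - i) * (k - j) // 2 if len({i, j, k}) == 3 else 0

n = 3
A = [[1, 4, 2], [0, 3, 5], [7, 1, 6]]
S = [[(A[i][j] + A[j][i]) / 2 for j in range(n)] for i in range(n)]  # symmetric part
W = [[(A[i][j] - A[j][i]) / 2 for j in range(n)] for i in range(n)]  # skew part

recombined = [[S[i][j] + W[i][j] for j in range(n)] for i in range(n)]
contraction = sum(S[i][j] * W[i][j] for i in range(n) for j in range(n))
print(recombined == A)  # True: [A] = [S] + [W]
print(contraction)      # 0.0:  S_ij W_ij = 0

ee = sum(alt(i, j, k) * alt(k, i, j)
         for i in range(3) for j in range(3) for k in range(3))
print(ee)  # 6: e_ijk e_kij

v = [sum(alt(i, j, k) * S[j][k] for j in range(3) for k in range(3))
     for i in range(3)]
print(v)  # [0.0, 0.0, 0.0]: e_ijk S_jk = 0 for symmetric [S]
```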
Chapter 2

Vectors and Linear Transformations

Notation:
α . . . . . scalar
a . . . . . vector
A . . . . . linear transformation

As mentioned in the Preface, these notes are designed to review those aspects of linear algebra that will be encountered in our study of continuum mechanics; they are not meant to be a source for learning the subject of linear algebra for the first time. Linear Algebra is a far richer subject than the very restricted glimpse provided here might suggest.

The discussion in these notes is limited almost entirely to (a) real 3-dimensional Euclidean vector spaces, and (b) linear transformations that carry vectors from one vector space into the same vector space. The following notation will be consistently used: Greek letters α, β, γ, . . . will denote scalars (real numbers); lowercase boldface Latin letters a, b, c, . . . will denote vectors; and uppercase boldface Latin letters A, B, C, . . . will denote linear transformations. In particular, “o” will denote the null vector while “0” will denote the null linear transformation.
2.1 Vectors

A vector space V is a collection of elements, called vectors, together with two operations, addition and multiplication by a scalar. The operation of addition (which has certain properties we do not list here) associates with each pair of vectors x and y in V a vector denoted by x + y that is also in V. In particular, it is assumed that there is a unique vector o ∈ V, called the null vector, such that x + o = x. The operation of scalar multiplication (which also has certain properties we do not list here) associates with each vector x ∈ V and each real number α another vector in V denoted by αx.

Let x1, x2, . . . , xk be k vectors in V. These vectors are said to be linearly independent if the only real numbers α1, α2, . . . , αk for which

α1 x1 + α2 x2 + · · · + αk xk = o   (2.1)

are the numbers α1 = α2 = . . . = αk = 0. If V contains n linearly independent vectors but does not contain n + 1 linearly independent vectors, we say that the dimension of V is n. Unless stated otherwise, from hereon we restrict attention to 3-dimensional vector spaces.

If V is a vector space, any set of three linearly independent vectors {e1, e2, e3} is said to be a basis for V. Given any vector x ∈ V there exists a unique set of numbers ξ1, ξ2, ξ3 such that

x = ξ1 e1 + ξ2 e2 + ξ3 e3 ;   (2.2)

the numbers ξ1, ξ2, ξ3 are called the components of x in the basis {e1, e2, e3}.

Let U be a subset of a vector space V. We say that U is a subspace (or linear manifold) of V if, for every x, y ∈ U and every real number α, the vectors x + y and αx are also in U. Thus a linear manifold U of V is itself a vector space under the same operations of addition and multiplication by a scalar as in V.

A scalar-product (or inner product or dot product) on V is a function which assigns to each pair of vectors x, y in V a scalar, which we denote by x · y. A scalar-product has certain properties which we do not list here except to note that it is required that

x · y = y · x for all x, y ∈ V.   (2.3)

A Euclidean vector space is a vector space together with an inner product on that space. From hereon we shall restrict attention to 3-dimensional Euclidean vector spaces and denote such a space by E3.
The length (or magnitude or norm) of a vector x is the scalar denoted by |x| and defined by

|x| = (x · x)1/2 .   (2.4)

A vector has zero length if and only if it is the null vector. A unit vector is a vector of unit length. The angle θ between two vectors x and y is defined by

cos θ = (x · y)/(|x| |y|),  0 ≤ θ ≤ π.   (2.5)

Two vectors x and y are orthogonal if x · y = 0. It is obvious, but nevertheless helpful, to note that if we are given two vectors x and y where x · y = 0 and y ≠ o, this does not necessarily imply that x = o; on the other hand, if x · y = 0 for every vector y, then x must be the null vector.

An orthonormal basis is a triplet of mutually orthogonal unit vectors e1, e2, e3 ∈ E3. For such a basis,

ei · ej = δij for i, j = 1, 2, 3,   (2.6)

where the Kronecker delta δij is defined in the usual way by

δij = 1 if i = j, δij = 0 if i ≠ j.   (2.7)

A vector-product (or cross-product) on E3 is a function which assigns to each ordered pair of vectors x, y ∈ E3 a vector, which we denote by x × y. The vector-product must have certain properties (which we do not list here) except to note that it is required that

y × x = −x × y for all x, y ∈ E3.   (2.8)

One can show that

x × y = |x| |y| sin θ n,   (2.9)

where θ is the angle between x and y as defined by (2.5), and n is a unit vector in the direction x × y, which therefore is normal to the plane defined by x and y. Since n is parallel to x × y, and since it has unit length, it follows that n = (x × y)/|x × y|. The magnitude |x × y| of the cross-product can be interpreted geometrically as the area of the parallelogram formed by the vectors x and y.

A basis {e1, e2, e3} is said to be right-handed if

(e1 × e2) · e3 > 0.   (2.10)
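The relations (2.5), (2.8) and (2.9) can be checked on a concrete pair of vectors. A sketch (numpy, not part of the notes):

```python
import numpy as np

x = np.array([2.0, 0.0, 0.0])
y = np.array([1.0, 3.0, 0.0])

xy = np.cross(x, y)

# (2.5): the angle between x and y
cos_t = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.arccos(cos_t)

# (2.9): |x × y| = |x| |y| sin(theta), the area of the parallelogram on x, y
area = np.linalg.norm(x) * np.linalg.norm(y) * np.sin(theta)
```

Here |x × y| and the computed area agree (both equal 6 for this choice), and swapping the arguments of the cross-product flips its sign, as (2.8) requires.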
2.1.1 Euclidean point space

A Euclidean point space P, whose elements are called points, is related to a Euclidean vector space E3 in the following manner. Every ordered pair of points (p, q) is uniquely associated with a vector in E3, say →pq, such that

(i) →pq = −→qp for all p, q ∈ P;
(ii) →pq + →qr = →pr for all p, q, r ∈ P;
(iii) given an arbitrary point p ∈ P and an arbitrary vector x ∈ E3, there is a unique point q ∈ P such that x = →pq. Here x is called the position of point q relative to the point p.

Pick and fix an arbitrary point o ∈ P (which we call the origin of P) and an arbitrary basis for E3 of unit vectors e1, e2, e3. Corresponding to any point p ∈ P there is a unique vector

→op = x = x1 e1 + x2 e2 + x3 e3 ∈ E3.

The triplet (x1, x2, x3) are called the coordinates of p in the (coordinate) frame F = {o; e1, e2, e3} comprised of the origin o and the basis vectors e1, e2, e3. If e1, e2, e3 is an orthonormal basis, the coordinate frame {o; e1, e2, e3} is called a rectangular cartesian coordinate frame.

2.2 Linear Transformations

Consider a three-dimensional Euclidean vector space E3. Let F be a function (or transformation) which assigns to each vector x ∈ E3 a second vector y ∈ E3,

y = F(x), x ∈ E3 ;   (2.11)

y is said to be the image of x under the transformation F. F is said to be a linear transformation if it is such that

F(αx + βy) = αF(x) + βF(y)   (2.12)

for all scalars α, β and all vectors x, y ∈ E3. When F is a linear transformation, we usually omit the parentheses and write Fx instead of F(x). Note that Fx is a vector.

A linear transformation is defined by the way it operates on vectors in E3. A geometric example of a linear transformation is the “projection operator” Π which projects vectors onto a given plane P. Let P be the plane normal to the unit vector n; see Figure 2.1.
It can be verified geometrically that Π is defined by

Πx = x − (x · n)n for all x ∈ E3 ,   (2.13)

where Πx ∈ P is the vector obtained by projecting x onto P.

Figure 2.1: The projection Πx of a vector x onto the plane P.

Linear transformations tell us how vectors are mapped into other vectors. In particular, suppose that {y1, y2, y3} are any three vectors in E3 and that {x1, x2, x3} are any three linearly independent vectors in E3. Then there is a unique linear transformation F that maps {x1, x2, x3} into {y1, y2, y3}:

y1 = Fx1 , y2 = Fx2 , y3 = Fx3 .   (2.14)

This follows from the fact that {x1, x2, x3} is a basis for E3. Therefore any arbitrary vector x can be expressed uniquely in the form x = ξ1 x1 + ξ2 x2 + ξ3 x3 ; consequently the image Fx of any vector x is given by Fx = ξ1 y1 + ξ2 y2 + ξ3 y3 , which is a rule for assigning a unique vector Fx to any given vector x.

Let A and B be linear transformations on E3 and let α be a scalar. The linear transformations A + B, AB and αA are defined as those linear transformations which are such that

(A + B)x = Ax + Bx for all x ∈ E3 ,   (2.15)
(AB)x = A(Bx) for all x ∈ E3 ,   (2.16)
(αA)x = α(Ax) for all x ∈ E3 ,   (2.17)

respectively; A + B is called the sum of A and B, AB the product, and αA the scalar multiple of A by α. In general, AB ≠ BA.

The null linear transformation 0 is the linear transformation that takes every vector x into the null vector o; thus 0x = o. The identity linear transformation I takes every vector x into itself:

Ix = x for all x ∈ E3 .   (2.18)
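The projection operator (2.13) is easy to exercise numerically. A sketch (numpy, not part of the notes), taking n = e3 so that P is the x1-x2 plane:

```python
import numpy as np

n = np.array([0.0, 0.0, 1.0])       # unit normal to the plane P
x = np.array([1.0, 2.0, 5.0])

Px = x - (x @ n) * n                # (2.13): Πx = x − (x·n)n

# Πx lies in P (it is orthogonal to n), and projecting a second time
# changes nothing, since the component along n has already been removed.
Px2 = Px - (Px @ n) * n
```

For this choice Πx = (1, 2, 0): the component of x along n is removed and the in-plane part is untouched.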
The range of a linear transformation A (i.e. the collection of all vectors Ax as x takes all values in E3) is a subspace of E3. The dimension of this particular subspace is known as the rank of A. The set of all vectors x for which Ax = o is also a subspace of E3; it is known as the null space of A.

Given any linear transformation A, one can show that there is a unique linear transformation, usually denoted by AT, such that

Ax · y = x · AT y for all x, y ∈ E3 .   (2.19)

AT is called the transpose of A. One can show that

(αA)T = αAT ,  (A + B)T = AT + BT ,  (AB)T = BT AT .   (2.20)

A linear transformation A is said to be symmetric if A = AT , and skew-symmetric if A = −AT .   (2.21)

Every linear transformation A can be represented as the sum of a symmetric linear transformation S and a skew-symmetric linear transformation W as follows:

A = S + W, where S = (1/2)(A + AT), W = (1/2)(A − AT).   (2.22)

For every skew-symmetric linear transformation W, it may be shown that

Wx · x = 0 for all x ∈ E3 ;   (2.23)

moreover, there exists a vector w (called the axial vector of W) which has the property that

Wx = w × x for all x ∈ E3 .   (2.24)

If the only vector x for which Ax = o is the zero vector, then we say that A is nonsingular. It follows from this that if A is nonsingular then Ax ≠ Ay whenever x ≠ y. Thus, a nonsingular transformation A is a one-to-one transformation in the sense that, for any given y ∈ E3, there is one and only one vector x ∈ E3 for which Ax = y. Consequently, corresponding to any nonsingular linear transformation A, there exists a second linear transformation, denoted by A−1 and called the inverse of A, such that

AA−1 = A−1A = I,   (2.25)

or equivalently, such that Ax = y if and only if x = A−1y. If two linear transformations A and B are both nonsingular, then so is AB; moreover,

(AB)−1 = B−1A−1 .   (2.26)

If A is nonsingular then so is AT ; moreover,

(AT)−1 = (A−1)T ,   (2.27)

and so there is no ambiguity in writing this linear transformation as A−T.

If {y1, y2, y3} and {x1, x2, x3} are two sets of linearly independent vectors in E3, then there is a unique nonsingular linear transformation F that maps {x1, x2, x3} into {y1, y2, y3}: y1 = Fx1, y2 = Fx2, y3 = Fx3. The inverse of F maps {y1, y2, y3} into {x1, x2, x3}. If both bases {x1, x2, x3} and {y1, y2, y3} are right-handed (or both are left-handed) we say that the linear transformation F preserves the orientation of the vector space.

A linear transformation Q is said to be orthogonal if it preserves length, i.e. if

|Qx| = |x| for all x ∈ E3 .   (2.28)

If Q is orthogonal, it follows that it also preserves the inner product:

Qx · Qy = x · y for all x, y ∈ E3 .   (2.29)

Thus an orthogonal linear transformation preserves both the length of a vector and the angle between two vectors. If Q is orthogonal, it is necessarily nonsingular and

Q−1 = QT .   (2.30)

A linear transformation A is said to be positive definite if

Ax · x > 0 for all x ∈ E3 , x ≠ o,   (2.31)

and positive-semi-definite if

Ax · x ≥ 0 for all x ∈ E3 .   (2.32)
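A concrete orthogonal linear transformation is a rotation; the following sketch (numpy, not part of the notes) checks (2.28), (2.29) and (2.30) for a rotation about e3:

```python
import numpy as np

t = 0.3
Q = np.array([[np.cos(t), -np.sin(t), 0.0],
              [np.sin(t),  np.cos(t), 0.0],
              [0.0,        0.0,       1.0]])

x = np.array([1.0, -2.0, 4.0])
y = np.array([0.5,  3.0, 1.0])

# (2.28): length preserved; (2.29): inner product preserved
len_err = np.linalg.norm(Q @ x) - np.linalg.norm(x)
dot_err = (Q @ x) @ (Q @ y) - x @ y
```

Both errors are at round-off level, and the inverse of Q coincides with its transpose, as (2.30) asserts.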
Moreover, A is positive definite if and only if its symmetric part (1/2)(A + AT) is positive definite. A positive definite linear transformation is necessarily nonsingular.

Let A be a linear transformation. A vector v (≠ o) and a scalar λ such that

Av = λv   (2.34)

are known as an eigenvector and an eigenvalue of A, respectively. Every linear transformation A (on a 3-dimensional vector space E3) has at least one eigenvalue. Every eigenvalue of a positive definite linear transformation must be positive, and no eigenvalue of a nonsingular linear transformation can be zero.

If e and λ are an eigenvector and eigenvalue of a linear transformation A, then for any positive integer n it is easily seen that e and λn are an eigenvector and an eigenvalue of An, where An = AA . . . A (n times); this continues to be true for negative integers m provided A is nonsingular, where by A−m we mean (A−1)m, m > 0.

It can be shown that a symmetric linear transformation A has three real eigenvalues λ1, λ2, λ3, and a corresponding set of three mutually orthogonal eigenvectors e1, e2, e3. The particular basis of E3 comprised of {e1, e2, e3} is said to be a principal basis of A. A symmetric linear transformation is positive definite if and only if all three of its eigenvalues are positive.

A subspace U is known as an invariant subspace of A if Av ∈ U for all v ∈ U. Each eigenvector of A characterizes a one-dimensional invariant subspace of A. Conversely, given a linear transformation A, suppose that there exists an associated one-dimensional invariant subspace U. Since U is one-dimensional, it follows that if v ∈ U then any other vector in U can be expressed in the form λv for some scalar λ. Since U is an invariant subspace we know in addition that Av ∈ U whenever v ∈ U. Combining these two facts shows that Av = λv for all v ∈ U.

Finally, given any nonsingular linear transformation F, according to the polar decomposition theorem there exist unique symmetric positive definite linear transformations U and V and a unique orthogonal linear transformation R such that

F = RU = VR.   (2.35)

If λ and r are an eigenvalue and eigenvector of U, then it can be readily shown that λ and Rr are an eigenvalue and eigenvector of V.

Given two vectors a, b ∈ E3, their tensor-product is the linear transformation usually denoted by a ⊗ b, which is such that

(a ⊗ b)x = (x · b)a for all x ∈ E3 .   (2.36)
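The polar decomposition (2.35) can be checked numerically. This sketch (numpy, not part of the notes) builds U as the square root of FTF via the spectral representation of that symmetric matrix, then recovers R and V:

```python
import numpy as np

rng = np.random.default_rng(1)
F = rng.standard_normal((3, 3))          # almost surely nonsingular

# U = sqrt(F^T F), from the eigen-decomposition of the symmetric matrix F^T F
lam, E = np.linalg.eigh(F.T @ F)         # lam_i > 0, columns of E orthonormal
U = E @ np.diag(np.sqrt(lam)) @ E.T      # symmetric positive definite
R = F @ np.linalg.inv(U)                 # then R = F U^{-1} is orthogonal
V = R @ U @ R.T                          # and F = RU = VR
```

Note that V = RUR^T makes VR = RU = F automatic, and that the eigenvalues of V are those of U with eigenvectors rotated by R, as stated above.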
Observe that for any x ∈ E3, the vector (a ⊗ b)x is parallel to the vector a. Thus the range of the linear transformation a ⊗ b is the one-dimensional subspace of E3 consisting of all vectors parallel to a. The rank of the linear transformation a ⊗ b is thus unity. For any vectors a, b, c and d it is easily shown that

(a ⊗ b)T = b ⊗ a,  (a ⊗ b)(c ⊗ d) = (b · c)(a ⊗ d).   (2.37)

The product of a linear transformation A with the linear transformation a ⊗ b gives

A(a ⊗ b) = (Aa) ⊗ b,  (a ⊗ b)A = a ⊗ (AT b).   (2.38)

Let {e1, e2, e3} be an orthonormal basis. Since this is a basis, any vector in E3, and therefore in particular each of the vectors Ae1, Ae2, Ae3, can be expressed as a unique linear combination of the basis vectors e1, e2, e3. It follows that there exist unique real numbers Aij such that

Aej = Σ_{i=1}^{3} Aij ei , j = 1, 2, 3,   (2.39)

where Aij is the ith component of the vector Aej. They can equivalently be expressed as Aij = ei · (Aej). One refers to the Aij's as the components of the linear transformation A in the basis {e1, e2, e3}. The linear transformation A can now be represented as

A = Σ_{i=1}^{3} Σ_{j=1}^{3} Aij (ei ⊗ ej).   (2.40)

Note that

Σ_{i=1}^{3} (Aei) ⊗ ei = A,  Σ_{i=1}^{3} ei ⊗ ei = I.   (2.41)

Let S be a symmetric linear transformation with eigenvalues λ1, λ2, λ3 and corresponding (mutually orthogonal unit) eigenvectors e1, e2, e3. Since Sej = λj ej for each j = 1, 2, 3, it follows from (2.39) that the components of S in the principal basis {e1, e2, e3} are

S11 = λ1 , S22 = λ2 , S33 = λ3 , S12 = S13 = S21 = S23 = S31 = S32 = 0.

It follows from the general representation (2.40) that S admits the representation

S = Σ_{i=1}^{3} λi (ei ⊗ ei).   (2.42)
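The representation (2.42) and the fact that S is diagonal in its principal basis can be confirmed numerically. A sketch (numpy, not part of the notes):

```python
import numpy as np

S = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.0, 0.0, 1.0]])        # a symmetric matrix
lam, e = np.linalg.eigh(S)             # columns of e: orthonormal eigenvectors

# (2.42): S = sum_i lam_i (e_i ⊗ e_i)
S_rebuilt = sum(lam[i] * np.outer(e[:, i], e[:, i]) for i in range(3))

# In the principal basis the components of S are diagonal: S_ij = e_i · (S e_j)
comps = e.T @ S @ e
```

Reassembling S from its eigenpairs reproduces it exactly, and the component matrix in the principal basis is diag(λ1, λ2, λ3).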
This is called the spectral representation of a symmetric linear transformation. It can be readily shown that, for any positive integer n,

Sn = Σ_{i=1}^{3} λin (ei ⊗ ei);   (2.43)

if S is symmetric and nonsingular, then

S−1 = Σ_{i=1}^{3} (1/λi) (ei ⊗ ei);   (2.44)

and if S is symmetric and positive definite, there is a unique symmetric positive definite linear transformation T such that T2 = S. We call T the positive definite square root of S and denote it by T = √S. It is readily seen that

√S = Σ_{i=1}^{3} √λi (ei ⊗ ei).   (2.45)

2.3 Worked Examples.

Example 2.1: Given three vectors a, b, c, show that

a · (b × c) = b · (c × a) = c · (a × b).

Solution: By the properties of the vector-product, the vector (a + b) is normal to the vector (a + b) × c. Thus

(a + b) · [(a + b) × c] = 0.

On expanding this out one obtains

a · (a × c) + a · (b × c) + b · (a × c) + b · (b × c) = 0.

Since a is normal to (a × c), and b is normal to (b × c), the first and last terms in this equation vanish. Next, recall that a × c = −c × a. Thus the preceding equation simplifies to

a · (b × c) = b · (c × a).

This establishes the first part of the result. The second part is shown analogously.

Example 2.2: Show that a necessary and sufficient condition for three vectors a, b, c in E3 – none of which is the null vector – to be linearly dependent is that a · (b × c) = 0.
Solution: To show necessity, suppose that the three vectors a, b, c are linearly dependent. It follows that

αa + βb + γc = o

for some real numbers α, β, γ, at least one of which is non-zero. Taking the vector-product of this equation with c and then taking the scalar-product of the result with a leads to

βa · (b × c) = 0.

Analogous calculations with the other pairs of vectors, keeping in mind that a · (b × c) = b · (c × a) = c · (a × b), lead to

αa · (b × c) = 0,  βa · (b × c) = 0,  γa · (b × c) = 0.

Since at least one of α, β, γ is non-zero it follows that necessarily a · (b × c) = 0.

To show sufficiency, let a · (b × c) = 0 and assume that a, b, c are linearly independent. We will show that this is a contradiction, whence a, b, c must be linearly dependent. By the properties of the vector-product, the vector b × c is normal to the plane defined by the vectors b and c. By assumption, a · (b × c) = 0, and this implies that a is normal to b × c. Since we are in E3 this means that a must lie in the plane defined by b and c, and so a, b, c cannot be linearly independent — a contradiction.

Example 2.3: Interpret the quantity a · (b × c) geometrically in terms of the volume of the tetrahedron defined by the vectors a, b, c.

Solution: Consider the tetrahedron formed by the three vectors a, b, c, as depicted in Figure 2.2. Its volume V0 = (1/3) A0 h0, where A0 is the area of its base and h0 is its height.

Consider the triangle defined by the vectors a and b to be the base of the tetrahedron. Its area A0 can be written as 1/2 base × height = (1/2)|a| (|b| sin θ), where θ is the angle between a and b. However, from the property (2.9) of the vector-product we have |a × b| = |a| |b| sin θ, and so A0 = |a × b|/2.

Next, n = (a × b)/|a × b| is a unit vector that is normal to the base of the tetrahedron, and so the height of the tetrahedron is h0 = c · n; see Figure 2.2. Therefore

V0 = (1/3) A0 h0 = (1/3) (|a × b|/2) (c · n) = (1/6) (a × b) · c.   (i)

Figure 2.2: Volume of the tetrahedron defined by vectors a, b, c: Volume = (1/3) A0 h0, with A0 = |a × b|/2, h0 = c · n and n = (a × b)/|a × b|.
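The steps of Example 2.3 can be traced numerically. A sketch (numpy, not part of the notes), for a simple right-angled choice of edges:

```python
import numpy as np

a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 2.0, 0.0])
c = np.array([0.0, 0.0, 3.0])

ab = np.cross(a, b)
A0 = 0.5 * np.linalg.norm(ab)          # area of the triangular base
n = ab / np.linalg.norm(ab)            # unit normal to the base
h0 = c @ n                             # height of the tetrahedron
V0 = A0 * h0 / 3.0                     # V0 = (1/3) A0 h0
```

For this choice the base area is 1, the height 3, and V0 = 1, which agrees with the triple-product expression (a × b) · c / 6 = 6/6.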
Observe that this provides a geometric explanation for why the vectors a, b, c are linearly dependent if and only if (a × b) · c = 0.

Example 2.4: Let φ(x) be a scalar-valued function defined on the vector space E3. If φ is linear, i.e. if φ(αx + βy) = αφ(x) + βφ(y) for all scalars α, β and all vectors x, y, show that φ(x) = c · x for some constant vector c.

Solution: Let {e1, e2, e3} be any orthonormal basis for E3. Then an arbitrary vector x can be written in terms of its components as x = x1 e1 + x2 e2 + x3 e3. Therefore φ(x) = φ(x1 e1 + x2 e2 + x3 e3), which because of the linearity of φ leads to

φ(x) = x1 φ(e1) + x2 φ(e2) + x3 φ(e3).

On setting ci = φ(ei), i = 1, 2, 3, we find

φ(x) = x1 c1 + x2 c2 + x3 c3 = c · x,

where c = c1 e1 + c2 e2 + c3 e3.

Remark: This shows that the scalar-product is the most general scalar-valued linear function of a vector.

Example 2.5: If two linear transformations A and B have the property that Ax · y = Bx · y for all vectors x and y, show that A = B.

Solution: Since (Ax − Bx) · y = 0 for all vectors y, we may choose y = Ax − Bx in this, leading to |Ax − Bx|2 = 0. Since the only vector of zero length is the null vector, this implies that Ax = Bx for all x, and so A = B.

Example 2.6: Let n be a unit vector, and let P be the plane through o normal to n. Let Π and R be the transformations which, respectively, project and reflect a vector in the plane P.

a. Show that Π and R are linear transformations; Π is called the “projection linear transformation” while R is known as the “reflection linear transformation”.
b. Show that R(Rx) = x for all x ∈ E3.
c. Verify that a reflection linear transformation R is nonsingular while a projection linear transformation Π is singular. What is the inverse of R?
d. Verify that a projection linear transformation Π is symmetric and that a reflection linear transformation R is orthogonal.
e. Show that the projection linear transformation and reflection linear transformation can be represented as Π = I − n ⊗ n and R = I − 2(n ⊗ n) respectively.
Solution:

a. Figure 2.3 shows a sketch of the plane P, its unit normal vector n, a generic vector x, its projection Πx and its reflection Rx. By geometry we see that

Πx = x − (x · n)n,  Rx = x − 2(x · n)n.   (i)

These define the images Πx and Rx of a generic vector x under the transformations Π and R. One can readily verify that Π and R satisfy the requirement (2.12) of a linear transformation.

Figure 2.3: The projection Πx and reflection Rx of a vector x on the plane P.

b. Applying the definition (i)2 of R to the vector Rx gives

R(Rx) = (Rx) − 2[(Rx) · n] n.

Replacing Rx on the right-hand side of this equation by (i)2, and expanding the resulting expression, shows that the right-hand side simplifies to x. Thus R(Rx) = x.

c. Applying the definition (i)1 of Π to the vector n gives

Πn = n − (n · n)n = n − n = o.

Therefore Πn = o and (since n ≠ o) we see that o is not the only vector that is mapped to the null vector by Π. The transformation Π is therefore singular.

Next consider the transformation R, and consider a vector x that is mapped by it to the null vector: Rx = o. Using (i)2, x = 2(x · n)n. Taking the scalar-product of this equation with the unit vector n yields x · n = 2(x · n), from which we conclude that x · n = 0. Substituting this into the right-hand side of the preceding equation leads to x = o. Therefore Rx = o if and only if x = o, and so R is nonsingular.

To find the inverse of R, recall from part (b) that R(Rx) = x. Operating on both sides of this equation by R−1 gives Rx = R−1x. Since this holds for all vectors x it follows that R−1 = R.

d. To show that Π is symmetric we simply use its definition (i)1 to calculate Πx · y and x · Πy for arbitrary vectors x and y. This yields

Πx · y = [x − (x · n)n] · y = x · y − (x · n)(y · n),
x · Πy = x · [y − (y · n)n] = x · y − (x · n)(y · n).

Thus Πx · y = x · Πy and so Π is symmetric.

To show that R is orthogonal we must show that RRT = I, or RT = R−1. We begin by calculating RT. Recall from the definition (2.19) that the transpose satisfies the requirement x · RTy = Rx · y. Using the definition (i)2 of R on the right-hand side of this equation yields

x · RTy = x · y − 2(x · n)(y · n).

We can rearrange the right-hand side of this equation so it reads

x · RTy = x · [y − 2(y · n)n].

Since this holds for all x it follows that RTy = y − 2(y · n)n. Comparing this with (i)2 shows that RT = R. In part (c) we showed that R−1 = R, and so it now follows that RT = R−1. Thus R is orthogonal.

e. Applying the operation (I − n ⊗ n) on an arbitrary vector x gives

(I − n ⊗ n)x = x − (n ⊗ n)x = x − (x · n)n = Πx,

and so Π = I − n ⊗ n. Similarly (I − 2 n ⊗ n)x = x − 2(x · n)n = Rx, and so R = I − 2(n ⊗ n).

Example 2.7: If W is a skew-symmetric linear transformation, show that

Wx · x = 0 for all x.   (i)

Solution: First, by the definition (2.19) of the transpose, we have Wx · x = x · WTx, and since W = −WT for a skew-symmetric linear transformation, this can be written as Wx · x = −x · Wx. Finally, the property (2.3) of the scalar-product allows this to be written as Wx · x = −Wx · x, from which the desired result follows.

Example 2.8: Show that (AB)T = BTAT.

Solution: First, by the definition (2.19) of the transpose,

(AB)x · y = x · (AB)Ty.   (i)
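The conclusions of Example 2.6 can be confirmed with the matrix forms Π = I − n⊗n and R = I − 2(n⊗n). A sketch (numpy, not part of the notes):

```python
import numpy as np

n = np.array([0.0, 1.0, 0.0])              # unit normal to the plane P
Pi = np.eye(3) - np.outer(n, n)            # projection  Π = I − n⊗n
R = np.eye(3) - 2.0 * np.outer(n, n)       # reflection  R = I − 2(n⊗n)

x = np.array([1.0, 2.0, 3.0])

# Π is symmetric but singular (it maps n to o);
# R is orthogonal, equals its own inverse, and R(Rx) = x.
```

Each of parts (b)–(e) then reduces to a one-line matrix identity.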
Second, note that (AB)x · y = A(Bx) · y. By the definition of the transpose of A we have A(Bx) · y = Bx · ATy, and by the definition of the transpose of B we have Bx · ATy = x · BTATy. Therefore, combining these three equations shows that

(AB)x · y = x · BTATy.   (ii)

On equating the two expressions (i) and (ii) for (AB)x · y, we find that x · (AB)Ty = x · BTATy for all vectors x, y, which establishes the desired result.

Example 2.9: If o is the null vector, show that Ao = o for any linear transformation A.

Solution: The null vector o has the property that when it is added to any vector, the vector remains unchanged. Therefore x + o = x, and similarly Ax + o = Ax. However, operating on the first of these equations by A shows that Ax + Ao = Ax, which when combined with the second equation yields the desired result.

Example 2.10: If A and B are nonsingular linear transformations, show that AB is also nonsingular and that (AB)−1 = B−1A−1.

Solution: Let C = B−1A−1. We will show that (AB)C = C(AB) = I and therefore that C is the inverse of AB. (Since the inverse would thus have been shown to exist, necessarily AB must be nonsingular.) Observe first that

(AB)C = (AB)(B−1A−1) = A(BB−1)A−1 = AIA−1 = I,

and similarly that

C(AB) = (B−1A−1)(AB) = B−1(A−1A)B = B−1IB = I.

Therefore (AB)C = C(AB) = I and so C is the inverse of AB.

Example 2.11: If A is nonsingular, show that (A−1)T = (AT)−1.

Solution: Since (AT)−1 is the inverse of AT we have (AT)−1AT = I. Post-operating on both sides of this equation by (A−1)T gives

(AT)−1AT(A−1)T = (A−1)T.

Recall that (AB)T = BTAT for any two linear transformations A and B. Thus the preceding equation simplifies to

(AT)−1(A−1A)T = (A−1)T.

Since A−1A = I the desired result follows.

Example 2.12: Show that an orthogonal linear transformation Q preserves inner products, i.e. show that Qx · Qy = x · y for all vectors x, y.
Solution: Since

(x − y) · (x − y) = x · x + y · y − 2x · y,

it follows that

x · y = (1/2) [ |x|2 + |y|2 − |x − y|2 ].   (i)

Since this holds for all vectors x, y, it must also hold when x and y are replaced by Qx and Qy:

Qx · Qy = (1/2) [ |Qx|2 + |Qy|2 − |Qx − Qy|2 ].   (ii)

By definition, an orthogonal linear transformation preserves length. Therefore |Qx| = |x|, |Qy| = |y| and, since Qx − Qy = Q(x − y), also |Qx − Qy| = |x − y|. Thus (ii) simplifies to

Qx · Qy = (1/2) [ |x|2 + |y|2 − |x − y|2 ].

Since the right-hand sides of the preceding expressions for x · y and Qx · Qy are the same, it follows that Qx · Qy = x · y.

Remark: Thus an orthogonal linear transformation preserves the length of any vector and the inner product between any two vectors. It follows therefore that an orthogonal linear transformation preserves the angle between a pair of vectors as well.

Example 2.13: Let Q be an orthogonal linear transformation, so that |Qv| = |v| for all vectors v. Show that

a. Q is nonsingular, and
b. Q−1 = QT.

Solution: a. To show that Q is nonsingular we must show that the only vector x for which Qx = o is the null vector x = o. Suppose that Qx = o for some vector x. Taking the norm of the two sides of this equation leads to |Qx| = |o| = 0. However an orthogonal linear transformation preserves length, and therefore |Qx| = |x|. Consequently |x| = 0. However the only vector of zero length is the null vector, and so necessarily x = o. Thus Q is nonsingular.

b. Since Q is orthogonal it preserves the inner product: Qx · Qy = x · y for all vectors x and y. However the property (2.19) of the transpose shows that Qx · Qy = x · QTQy. It follows that x · QTQy = x · y for all vectors x and y, and therefore that QTQ = I. Thus Q−1 = QT.

Example 2.14: If α1 and α2 are two distinct eigenvalues of a symmetric linear transformation A, show that the corresponding eigenvectors a1 and a2 are orthogonal to each other.

Solution: Recall from the definition of the transpose that Aa1 · a2 = a1 · ATa2, and since A is symmetric, A = AT. Thus Aa1 · a2 = a1 · Aa2. Since a1 and a2 are eigenvectors of A corresponding to the eigenvalues α1 and α2, we have Aa1 = α1a1 and Aa2 = α2a2. Thus the preceding equation reduces to α1 a1 · a2 = α2 a1 · a2, or equivalently

(α1 − α2)(a1 · a2) = 0.

Since α1 ≠ α2, it follows that necessarily a1 · a2 = 0.

Example 2.15: If λ and e are an eigenvalue and eigenvector of an arbitrary linear transformation A, show that λ and P−1e are an eigenvalue and eigenvector of the linear transformation P−1AP. Here P is an arbitrary nonsingular linear transformation.

Solution: Since PP−1 = I it follows that Ae = APP−1e. However we are told that Ae = λe, whence APP−1e = λe. Operating on both sides with P−1 gives P−1APP−1e = λP−1e, which establishes the result.

Example 2.16: If λ is an eigenvalue of an orthogonal linear transformation Q, show that |λ| = 1.

Solution: Let λ and e be an eigenvalue and corresponding eigenvector of Q. Thus Qe = λe, and so |Qe| = |λe| = |λ| |e|. However, Q preserves length and so |Qe| = |e|. Thus |λ| = 1.

Remark: We will show later that +1 is an eigenvalue of a “proper” orthogonal linear transformation on E3. The corresponding eigenvector is known as the axis of Q.

Example 2.17: The components of a linear transformation A in an orthonormal basis {e1, e2, e3} are the unique real numbers Aij defined by

Aej = Σ_{i=1}^{3} Aij ei , j = 1, 2, 3.   (i)

Show that the linear transformation A can be represented as

A = Σ_{i=1}^{3} Σ_{j=1}^{3} Aij (ei ⊗ ej).   (ii)

Solution: Consider the linear transformation given on the right-hand side of (ii) and operate it on an arbitrary vector x:

[Σ_{i} Σ_{j} Aij (ei ⊗ ej)] x = Σ_{i} Σ_{j} Aij (x · ej) ei = Σ_{i} Σ_{j} Aij xj ei ,

where we have used the facts that (p ⊗ q)r = (q · r)p and xj = x · ej. On using (i) in the right-most expression above, we can continue this calculation as follows:

Σ_{j} xj (Σ_{i} Aij ei) = Σ_{j} xj Aej = A (Σ_{j} xj ej) = Ax.

The desired result follows from this since it holds for arbitrary vectors x.
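Example 2.15 is also easy to check numerically. A sketch (numpy, not part of the notes), with a diagonal A whose eigenpairs are obvious:

```python
import numpy as np

A = np.diag([2.0, 3.0, 5.0])        # an eigenpair of A: lam = 2, e = (1, 0, 0)
lam = 2.0
e = np.array([1.0, 0.0, 0.0])

rng = np.random.default_rng(2)
P = rng.standard_normal((3, 3)) + 3.0 * np.eye(3)   # (almost surely) nonsingular

B = np.linalg.inv(P) @ A @ P        # the similar transformation P^{-1} A P
v = np.linalg.inv(P) @ e            # candidate eigenvector P^{-1} e
```

As the example asserts, Bv = λv: the similarity transformation preserves the eigenvalue and maps the eigenvector by P−1.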
the angle between any vector x and its image Rx is θ: Rx · x = x2 cos θ for all vectors x. it follows from (v) and (vii)3 that Let {e1 . e} be a righthanded orthonormal basis. it leaves the axis e itself unchanged: Re = e.34 CHAPTER 2. These together with (vii) imply that R31 = R32 = 0. e} forms a righthanded orthonormal basis for IE3 . since R rotates vectors about the axis e. . R23 = 0. e2 and e. Second. (i) where e1 and e2 are any two mutually orthogonal vectors such that {e1 . Solution: We begin by listing what is given to us in the problem statement. Re2 and Re. j = 1. (iv) moreover. e2 . 2. 3. can be expressed as linear combinations of e1 . for any vector x that is not parallelel to the axis e. Re = R13 e1 + R23 e2 + R33 e. the angle between any vector x and e must equal the angle between Rx and e: Rx · e = x · e for all vectors x. (v) And ﬁnally. This implies that any vector in E3 . 0 < θ < π. R13 = 0. since the angle through which R rotates a vector is θ. Show that R can be represented as R = e ⊗ e + (e1 ⊗ e1 + e2 ⊗ e2 ) cos θ − (e1 ⊗ e2 − e2 ⊗ e1 ) sin θ. Rx and e must obey the inequality (x × Rx) · e > 0 for all vectors x that are not parallel to e. (iii) Next. (vi) for some unique real numbers Rij .18: Let R be a “rotation transformation” that rotates vectors in IE3 through an angle θ. about an axis e (in the sense of the righthand rule). since the rotation is in the sense of the righthand rule. we conclude from (iv) with the choice x = e1 that Re1 · e = 0. VECTORS AND LINEAR TRANSFORMATIONS The desired result follows from this since this holds for arbitrary vectors x. Similarly Re2 · e = 0. R33 = 1. it necessarily preserves the length of a vector and so Rx = x for all vectors x. e2 . Since the transformation R simply rotates vectors. Re1 = R11 e1 + R21 e2 + R31 e. First. and therefore in particular the vectors Re1 . Example 2. i. (vii) Re2 = R12 e1 + R22 e2 + R32 e. the vectors x. (ii) In addition.
Third, it follows from (iii) with x = e1 and (vii)1 that R11 = cos θ. One similarly shows that R22 = cos θ. Thus R11 = R22 = cos θ, and collecting these results allows us to write (vii) as

Re1 = cos θ e1 + R21 e2,
Re2 = R12 e1 + cos θ e2,    (viii)
Re  = e.

Fourth, (ii) with x = e1 gives |Re1| = 1, which in view of (viii)1 requires that R21 = ± sin θ. Similarly we find that R12 = ± sin θ.

Fifth, the inequality (vi) with the choice x = e1, together with (viii) and the fact that {e1, e2, e} forms a right-handed basis, yields R21 > 0. Similarly the choice x = e2 yields R12 < 0. Since 0 < θ < π, collecting these results shows that R21 = + sin θ and R12 = − sin θ. Thus in conclusion we can write (viii) as

Re1 = cos θ e1 + sin θ e2,
Re2 = − sin θ e1 + cos θ e2,    (ix)
Re  = e.

Finally, recall the representation (2.40) of a linear transformation in terms of its components as defined in (2.39). Applying this to (ix) allows us to write

R = cos θ (e1 ⊗ e1) + sin θ (e2 ⊗ e1) − sin θ (e1 ⊗ e2) + cos θ (e2 ⊗ e2) + (e ⊗ e),    (x)

which can be rearranged to give the desired result.

Example 2.19: If F is a nonsingular linear transformation, show that F^T F is symmetric and positive definite.

Solution: For any linear transformations A and B we know that (AB)^T = B^T A^T and (A^T)^T = A. It therefore follows that

(F^T F)^T = F^T (F^T)^T = F^T F;    (i)

this shows that F^T F is symmetric. In order to show that F^T F is positive definite, we consider the quadratic form F^T Fx · x. By using the property (2.19) of the transpose, we can write

F^T Fx · x = (Fx) · (Fx) = |Fx|^2 ≥ 0.    (ii)

Further, equality holds here if and only if Fx = o, which, since F is nonsingular, can happen only if x = o. Thus F^T Fx · x > 0 for all vectors x ≠ o, and so F^T F is positive definite.
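As a quick numerical sketch of Example 2.19 (not part of the original text), the following pure-Python fragment forms the components of F^T F for one arbitrary sample matrix of components F, checks symmetry exactly, and spot-checks positivity of the quadratic form F^T Fx · x = |Fx|^2 at a few sample vectors x. The particular F is a hypothetical choice (with det F = 5, hence nonsingular); a spot-check at samples illustrates, but of course does not prove, positive definiteness.

```python
# Spot-check of Example 2.19 with a sample nonsingular F (det F = 5).

def matmul(A, B):
    # C_ij = A_ik B_kj for 3x3 component matrices
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

F = [[2.0, 1.0, 0.0],
     [0.0, 1.0, 3.0],
     [1.0, 0.0, 1.0]]          # arbitrary sample, nonsingular

C = matmul(transpose(F), F)    # components of F^T F

# symmetry: C_ij = C_ji
assert all(C[i][j] == C[j][i] for i in range(3) for j in range(3))

# quadratic form equals |Fx|^2 and is positive at sample x != o
for x in ([1, 0, 0], [0, 1, 0], [0, 0, 1], [1, -2, 3]):
    Fx = [sum(F[i][j] * x[j] for j in range(3)) for i in range(3)]
    q = sum(sum(C[i][j] * x[j] for j in range(3)) * x[i] for i in range(3))
    assert abs(q - sum(v * v for v in Fx)) < 1e-12 and q > 0
```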
Example 2.20: Consider a symmetric positive definite linear transformation S. Show that there is a unique symmetric positive definite linear transformation T for which T^2 = S, i.e. that S has a unique symmetric positive definite square root.

Solution: Since S is symmetric and positive definite, it has three real positive eigenvalues σ1, σ2, σ3 with corresponding eigenvectors s1, s2, s3, which may be taken to be orthonormal. Thus we know that S can be represented as

S = Σ σi (si ⊗ si) (summed over i = 1, 2, 3).    (i)

If one defines a linear transformation T by

T = Σ √σi (si ⊗ si) (summed over i = 1, 2, 3),    (ii)

one can readily verify that T is symmetric, positive definite and that T^2 = S. This establishes the existence of a symmetric positive definite square root of S.

What remains is to show uniqueness of this square root. Suppose that S has two symmetric positive definite square roots T1 and T2: S = T1^2 = T2^2. Let σ > 0 and s be an eigenvalue and corresponding eigenvector of S. Then Ss = σs, and so T1^2 s = σs. Thus we have

(T1 + √σ I)(T1 − √σ I) s = o.    (iii)

If we set f = (T1 − √σ I) s, this can be written as

T1 f = −√σ f.    (iv)

Thus either f = o, or f is an eigenvector of T1 corresponding to the eigenvalue −√σ (< 0). Since T1 is positive definite, it cannot have a negative eigenvalue. Thus f = o and so

T1 s = √σ s.    (v)

It similarly follows that T2 s = √σ s, and therefore that

T1 s = T2 s.    (vi)

This holds for every eigenvector s of S: i.e. T1 si = T2 si, i = 1, 2, 3. Since the triplet of eigenvectors forms a basis for the underlying vector space, this in turn implies that T1 x = T2 x for any vector x. Thus T1 = T2.

Example 2.21: Polar Decomposition Theorem: If F is a nonsingular linear transformation, show that there exists a unique positive definite symmetric linear transformation U, and a unique orthogonal linear transformation R, such that F = RU.

Solution: It follows from Example 2.19 that F^T F is symmetric and positive definite. It then follows from Example 2.20 that F^T F has a unique symmetric positive definite square root, say, U:

U = √(F^T F).    (i)
Define the linear transformation R through

R = FU^{-1}.    (ii)

Since U is positive definite, it is nonsingular, and so its inverse U^{-1}, and therefore R, exists. All we have to do is to show that R is orthogonal. But this follows from

R^T R = (FU^{-1})^T (FU^{-1}) = (U^{-1})^T F^T F U^{-1} = U^{-1} U^2 U^{-1} = I.    (iii)

In this calculation we have used the fact that U, and so U^{-1}, are symmetric. This establishes the proposition (except for the uniqueness, which is left as an exercise).

Example 2.22: The polar decomposition theorem states that any nonsingular linear transformation F can be represented uniquely in the forms F = RU = VR, where R is orthogonal and U and V are symmetric and positive definite. Let λi, ri, i = 1, 2, 3, be the eigenvalues and eigenvectors of U. From Example 2.15 it follows that the eigenvalues of V are the same as those of U, and that the corresponding eigenvectors ℓi of V are given by ℓi = Rri. Show that

F = Σ λi (ℓi ⊗ ri),  R = Σ (ℓi ⊗ ri) (summed over i = 1, 2, 3).    (i)

Solution: First, since U and V are symmetric, they have the spectral decompositions

U = Σ λi (ri ⊗ ri),  V = Σ λi (ℓi ⊗ ℓi).    (ii)

Next, by using the property (2.38)1 and ℓi = Rri, we have

F = RU = R Σ λi (ri ⊗ ri) = Σ λi (Rri) ⊗ ri = Σ λi (ℓi ⊗ ri).

Finally, since U is nonsingular,

U^{-1} = Σ λi^{-1} (ri ⊗ ri),

and therefore

R = FU^{-1} = [Σ λi (ℓi ⊗ ri)] [Σ λj^{-1} (rj ⊗ rj)] = Σ Σ λi λj^{-1} (ℓi ⊗ ri)(rj ⊗ rj).

By using the property (2.37)2 and the fact that ri · rj = δij, we have (ℓi ⊗ ri)(rj ⊗ rj) = (ri · rj)(ℓi ⊗ rj) = δij (ℓi ⊗ rj). Therefore

R = Σ Σ λi λj^{-1} δij (ℓi ⊗ rj) = Σ λi λi^{-1} (ℓi ⊗ ri) = Σ (ℓi ⊗ ri).

Example 2.23: Determine the rank and the null space of the linear transformation C = a ⊗ b, where a ≠ o, b ≠ o.
Solution: Recall that the rank of any linear transformation A is the dimension of its range. (The range of A is the particular subspace of E3 comprised of all vectors Ax as x takes all values in E3.) Since Cx = (b · x)a, the vector Cx is parallel to the vector a for every choice of the vector x. Thus the range of C is the set of vectors parallel to a, and its dimension is one. The linear transformation C therefore has rank one.

Recall that the null space of any linear transformation A is the particular subspace of E3 comprised of the set of all vectors x for which Ax = o. Since Cx = (b · x)a and a ≠ o, the null space of C consists of all vectors x for which b · x = 0, i.e. the set of all vectors normal to b.

Example 2.24: Let λ1 ≤ λ2 ≤ λ3 be the eigenvalues of the symmetric linear transformation S. Show that S can be expressed in the form

S = (I + a ⊗ b)(I + b ⊗ a),  a ≠ o,  b ≠ o,    (i)

if and only if

0 ≤ λ1 ≤ 1,  λ2 = 1,  λ3 ≥ 1.    (ii)

Example 2.25: Calculate the square roots of the identity tensor.

Solution: The identity is certainly a symmetric positive definite tensor. By the result of a previous example on the square root of a symmetric positive definite tensor, it follows that there is a unique symmetric positive definite tensor which is the square root of I. Obviously, this square root is also I. However, there are other square roots of I that are not symmetric positive definite. We are to explore them here: thus we wish to determine a tensor A on E3 such that A^2 = I, A ≠ I, A ≠ −I.

First, if Ax = x for every vector x ∈ E3, then, by definition, A = I. Since we are given that A ≠ I, there must exist at least one non-null vector for which Ax ≠ x; call this vector f1, so that Af1 ≠ f1. Set e1 = (A − I) f1. Since Af1 ≠ f1, it follows that e1 ≠ o, and without loss of generality we can assume that |e1| = 1. Observe that

(A + I) e1 = (A + I)(A − I) f1 = (A^2 − I) f1 = O f1 = o.    (iii)

Therefore

Ae1 = −e1,    (iv)

and so −1 is an eigenvalue of A with corresponding eigenvector e1.

Second, the fact that A ≠ −I, together with A^2 = I, similarly implies that there must exist a unit vector e2 for which Ae2 = e2, from which we conclude that +1 is an eigenvalue of A with corresponding eigenvector e2.
Third, one can show that {e1, e2} is a linearly independent pair of vectors. To see this, suppose that for some scalars ξ1, ξ2 one has ξ1 e1 + ξ2 e2 = o. Operating on this by A yields ξ1 Ae1 + ξ2 Ae2 = o, which, on using Ae1 = −e1 and Ae2 = e2, leads to −ξ1 e1 + ξ2 e2 = o. Subtracting and adding the preceding two equations shows that ξ1 e1 = ξ2 e2 = o. Since e1 and e2 are eigenvectors, neither of them is the null vector o, and therefore ξ1 = ξ2 = 0. Therefore e1 and e2 are linearly independent.

Fourth, let e3 be a unit vector that is perpendicular to both e1 and e2. The triplet of vectors {e1, e2, e3} is linearly independent and therefore forms a basis for E3. Let Aij be, as usual, the components of the tensor A in this basis, defined by Aej = Aij ei. Since e1 and e2 are eigenvectors, comparing with Ae1 = −e1 yields A11 = −1, A21 = A31 = 0, and similarly comparing with Ae2 = e2 yields A22 = 1, A12 = A32 = 0. The matrix of components of A in this basis is therefore

        | −1  0  A13 |
[A] =   |  0  1  A23 |    (v)
        |  0  0  A33 |

Fifth, since A^2 = I, the matrix of components of A^2 in any basis has to be the identity matrix. (Notation: [A^2] is the matrix of components of A^2 while [A]^2 is the square of the matrix of components of A. Why is [A^2] = [A]^2?) However,

                 | 1  0  −A13 + A13 A33 |
[A^2] = [A]^2 =  | 0  1   A23 + A23 A33 |    (vi)
                 | 0  0   A33^2         |

Therefore we must have

−A13 + A13 A33 = 0,  A23 + A23 A33 = 0,  A33^2 = 1,    (vii)

which implies that either

A33 = 1, A23 = 0, A13 = arbitrary,  or  A33 = −1, A13 = 0, A23 = arbitrary.    (viii), (ix)

Consequently the matrix [A] must necessarily have one of the two forms

| −1  0  α1 |        | −1  0   0 |
|  0  1   0 |   or   |  0  1  α2 |    (x)
|  0  0   1 |        |  0  0  −1 |
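The two component forms just derived can be spot-checked directly. The following pure-Python fragment (not part of the original text) squares both matrices for a few arbitrary sample values of the free scalar and confirms that each is a square root of the identity that is neither [I] nor −[I]:

```python
# Spot-check: both component forms in (x) satisfy [A]^2 = [I] for any alpha.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

for alpha in (0.0, 0.5, -2.0, 7.0):           # arbitrary sample values
    A1 = [[-1, 0, alpha], [0, 1, 0], [0, 0, 1]]   # first form in (x)
    A2 = [[-1, 0, 0], [0, 1, alpha], [0, 0, -1]]  # second form in (x)
    for A in (A1, A2):
        AA = matmul(A, A)
        assert all(abs(AA[i][j] - I3[i][j]) < 1e-12
                   for i in range(3) for j in range(3))
        assert A != I3                        # A is not the identity
```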
Sixth, consider the first form. Set p1 = e1 and q1 = −e1 + (α1/2) e3. Then

p1 ⊗ q1 = −e1 ⊗ e1 + (α1/2) e1 ⊗ e3,    (xi)

and therefore

I + 2 p1 ⊗ q1 = e1 ⊗ e1 + e2 ⊗ e2 + e3 ⊗ e3 − 2 e1 ⊗ e1 + α1 e1 ⊗ e3 = −e1 ⊗ e1 + e2 ⊗ e2 + e3 ⊗ e3 + α1 e1 ⊗ e3.

Note from this that the components of the tensor I + 2 p1 ⊗ q1 are given by (x)1. Conversely, one can readily verify that the tensor

A = I + 2 p1 ⊗ q1    (xii)

has the desired properties A^2 = I, A ≠ I, A ≠ −I for any value of the scalar α1.

Alternatively, set p2 = e2 and q2 = e2 + (α2/2) e3. Then

p2 ⊗ q2 = e2 ⊗ e2 + (α2/2) e2 ⊗ e3,    (xiii)

and therefore

−I + 2 p2 ⊗ q2 = −e1 ⊗ e1 − e2 ⊗ e2 − e3 ⊗ e3 + 2 e2 ⊗ e2 + α2 e2 ⊗ e3 = −e1 ⊗ e1 + e2 ⊗ e2 − e3 ⊗ e3 + α2 e2 ⊗ e3.

Note from this that the components of the tensor −I + 2 p2 ⊗ q2 are given by (x)2. Conversely, one can readily verify that the tensor

A = −I + 2 p2 ⊗ q2    (xiv)

has the desired properties A^2 = I, A ≠ I, A ≠ −I for any value of the scalar α2.

Thus the tensors defined in (xii) and (xiv) are both square roots of the identity tensor that are not symmetric positive definite.

References

1. I.M. Gelfand, Lectures on Linear Algebra, Wiley, New York, 1963.
2. P.R. Halmos, Finite Dimensional Vector Spaces, Van Nostrand, New Jersey, 1958.
3. J.K. Knowles, Linear Vector Spaces and Cartesian Tensors, Oxford University Press, New York, 1997.
Chapter 3

Components of Vectors and Tensors. Cartesian Tensors.

Notation:
α ......... scalar
{a} ....... 3 × 1 column matrix
a ......... vector
ai ........ i-th component of the vector a in some basis, or i-th element of the column matrix {a}
[A] ....... 3 × 3 square matrix
A ......... linear transformation
Aij ....... i, j component of the linear transformation A in some basis, or i, j element of the square matrix [A]
Cijkl ..... i, j, k, l component of the 4-tensor C in some basis
Ti1 i2 ... in ... i1 i2 ... in component of the n-tensor T in some basis

3.1 Components of a vector in a basis.

Let IE3 be a three-dimensional Euclidean vector space. A set of three linearly independent vectors {e1, e2, e3} forms a basis for IE3, in the sense that an arbitrary vector v can always be expressed as a linear combination of the three basis vectors. Thus, given any v ∈ IE3, there are unique scalars α, β, γ such that

v = α e1 + β e2 + γ e3.    (3.1)

If each basis vector ei has unit length, and if each pair of basis vectors ei, ej is mutually orthogonal, we say that {e1, e2, e3} forms an orthonormal basis for IE3. For an
orthonormal basis,

ei · ej = δij,    (3.2)

where δij is the Kronecker delta. If the basis is right-handed, one has in addition that

ei · (ej × ek) = eijk,    (3.3)

where eijk is the alternator introduced previously in (1.44). In these notes we shall always restrict attention to orthonormal bases unless explicitly stated otherwise.

The components vi of a vector v in a basis {e1, e2, e3} are defined by

vi = v · ei.    (3.4)

The vector can be expressed in terms of its components and the basis vectors as

v = vi ei.    (3.5)

The components of v may be assembled into a column matrix

{v} = (v1, v2, v3)^T.    (3.6)

[Figure 3.1: Components {v1, v2, v3} and {v1', v2', v3'} of the same vector v in two different bases.]

Even though this is obvious from the definition (3.4), it is still important to emphasize that the components vi of a vector depend on both the vector v and the choice of basis. Suppose, for example, that we are given two bases {e1, e2, e3} and {e1', e2', e3'} as shown in
Figure 3.1. Then the vector v has one set of components vi in the first basis and a different set of components vi' in the second basis:

vi = v · ei,  vi' = v · ei'.    (3.7)

Thus the one vector v can be expressed in either of the two equivalent forms

v = vi ei  or  v = vi' ei'.    (3.8)

The components vi and vi' are related to each other (as we shall discuss later) but in general vi ≠ vi'.

Once a basis {e1, e2, e3} is chosen and fixed, there is a one-to-one correspondence between column matrices and vectors. It follows, for example, that once the basis is fixed, there is a unique vector x associated with any given column matrix {x} such that the components of x in {e1, e2, e3} are {x}. Thus, once the basis is fixed, the vector equation z = x + y can be written equivalently as

{z} = {x} + {y}  or  zi = xi + yi    (3.9)

in terms of the components xi, yi and zi in the given basis. If ui and vi are the components of two vectors u and v in a basis, then the scalar product u · v can be expressed as

u · v = ui vi;    (3.10)

the vector product u × v can be expressed as

u × v = (eijk uj vk) ei,  or equivalently  (u × v)i = eijk uj vk,    (3.11)

where eijk is the alternator introduced previously in (1.44).

3.2 Components of a linear transformation in a basis.

Consider a linear transformation A. Any vector in IE3 can be expressed as a linear combination of the basis vectors e1, e2 and e3; in particular this is true of the three vectors Ae1, Ae2 and Ae3. Let Aij be the i-th component of the vector Aej, so that

Aej = Aij ei.    (3.12)
We can also write

Aij = ei · (Aej).    (3.13)

The 9 scalars Aij are known as the components of the linear transformation A in the basis {e1, e2, e3}. The components Aij can be assembled into a square matrix:

        | A11  A12  A13 |
[A] =   | A21  A22  A23 |    (3.14)
        | A31  A32  A33 |

The linear transformation A can be expressed in terms of its components Aij and the basis vectors ei as

A = Σ Σ Aij (ei ⊗ ej) (summed over i, j = 1, 2, 3).    (3.15)

The components of the linear transformation A = a ⊗ b, for example, are Aij = ai bj.

The components Aij of a linear transformation depend on both the linear transformation A and the choice of basis. Suppose, for example, that we are given two bases {e1, e2, e3} and {e1', e2', e3'}. Then the linear transformation A has one set of components Aij in the first basis and a different set of components Aij' in the second basis:

Aij = ei · (Aej),  Aij' = ei' · (Aej').    (3.16)

The components Aij and Aij' are related to each other (as we shall discuss later) but in general

Aij ≠ Aij'.    (3.17)

Once a basis {e1, e2, e3} is chosen and fixed, there is a one-to-one correspondence between square matrices and linear transformations. It follows, for example, that once the basis is fixed, there is a unique linear transformation M associated with any given square matrix [M] such that the components of M in {e1, e2, e3} are [M]. Thus, once the basis is fixed, the equation y = Ax relating the linear transformation A and the vectors x and y can be written equivalently as

{y} = [A]{x}  or  yi = Aij xj    (3.18)

in terms of the components Aij, xi and yi in the given basis. Similarly, if A, B and C are linear transformations such that C = AB, then their component matrices [A], [B] and [C] are related by

[C] = [A][B]  or  Cij = Aik Bkj.    (3.19)
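The correspondences (3.18) and (3.19) can be sketched numerically. In the pure-Python fragment below (not part of the original text), the sample component matrices and vector are arbitrary hypothetical choices; the check confirms that applying the product matrix [C] = [A][B] to {x} agrees with applying [B] and then [A]:

```python
# Sketch of (3.18)-(3.19): once a basis is fixed, vector and linear-
# transformation equations become matrix equations on components.

def matvec(A, x):
    # y_i = A_ij x_j, eq. (3.18)
    return [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]

def matmul(A, B):
    # C_ij = A_ik B_kj, eq. (3.19)
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

A = [[1, 2, 0], [0, 1, 4], [5, 0, 1]]   # arbitrary sample components
B = [[2, 0, 1], [1, 3, 0], [0, 1, 1]]
x = [1, -1, 2]

y = matvec(A, x)            # components of y = Ax
C = matmul(A, B)            # components of C = AB

# (AB)x computed via [C]{x} agrees with A(Bx)
assert matvec(C, x) == matvec(A, matvec(B, x))
```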
The component matrix [I] of the identity linear transformation I in any orthonormal basis is the unit matrix; its components are therefore given by the Kronecker delta δij. If [A] and [A^T] are the component matrices of the linear transformations A and A^T, then [A^T] = [A]^T and (A^T)ij = Aji.

As mentioned in Section 2.2, a symmetric linear transformation S has three real eigenvalues λ1, λ2, λ3 and corresponding orthonormal eigenvectors e1, e2, e3. The particular basis consisting of the eigenvectors is called a principal basis for S, and the eigenvectors are referred to as the principal directions of S. The component matrix [S] of the symmetric linear transformation S in its principal basis is

        | λ1  0   0  |
[S] =   | 0   λ2  0  |    (3.20)
        | 0   0   λ3 |

As a final remark we note that if we are to establish certain results for vectors and linear transformations, we can, if it is more convenient to do so, pick and fix a basis, and then work with the components in that basis. If necessary, we can revert back to the vectors and linear transformations at the end. For example, the first example in the previous chapter asked us to show that a · (b × c) = b · (c × a). In terms of components, the left-hand side of this reads

a · (b × c) = ai (b × c)i = ai eijk bj ck = eijk ai bj ck.

Similarly the right-hand side reads

b · (c × a) = bi (c × a)i = bi eijk cj ak = eijk ak bi cj.

Since i, j, k are dummy subscripts in the right-most expression, they can be changed to any other subscript; thus by changing k → i, i → j and j → k we can write b · (c × a) = ejki ai bj ck. Finally, recalling that the sign of eijk changes when any two adjacent subscripts are switched, we find that

b · (c × a) = ejki ai bj ck = −ejik ai bj ck = eijk ai bj ck,

where we have first switched the ki and then the ji in the subscripts of the alternator. The right-most expressions for a · (b × c) and b · (c × a) are now identical, and therefore this establishes the desired identity.

3.3 Components in two bases.

Consider a 3-dimensional Euclidean vector space together with two orthonormal bases {e1, e2, e3} and {e1', e2', e3'}. Since {e1, e2, e3} forms a basis, any vector, and therefore in particular each of the vectors ei', can be represented as a linear combination of the basis vectors e1, e2, e3. Let Qij be the j-th component of the vector ei' in the basis {e1, e2, e3}:

ei' = Qij ej.    (3.21)

By taking the dot-product of (3.21) with ej, one sees that

Qij = ei' · ej,    (3.22)

and so Qij is the cosine of the angle between the basis vectors ei' and ej.
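The numbers Qij = ei' · ej can be assembled and examined numerically. In the pure-Python sketch below (not part of the original text), the primed basis is obtained by rotating the unprimed one through an arbitrary sample angle t about e3; the check confirms that the resulting matrix [Q] satisfies [Q][Q]^T = [I], i.e. that it is orthogonal, as claimed in the text:

```python
# Sketch: Q_ij = e_i' . e_j for a primed basis rotated about e3 by angle t.
import math

t = 0.7                                               # arbitrary sample angle
e = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]                 # unprimed basis
ep = [[math.cos(t), math.sin(t), 0],                  # e1'
      [-math.sin(t), math.cos(t), 0],                 # e2'
      [0, 0, 1]]                                      # e3'

# Q_ij = e_i' . e_j
Q = [[sum(ep[i][k] * e[j][k] for k in range(3)) for j in range(3)]
     for i in range(3)]

# [Q][Q]^T = [I] to within rounding
QQt = [[sum(Q[i][k] * Q[j][k] for k in range(3)) for j in range(3)]
       for i in range(3)]
assert all(abs(QQt[i][j] - (1.0 if i == j else 0.0)) < 1e-12
           for i in range(3) for j in range(3))
```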
The 9 numbers Qij can be assembled into a square matrix [Q]. This matrix relates the two bases {e1, e2, e3} and {e1', e2', e3'}. Since both bases are orthonormal, it can be readily shown that [Q] is an orthogonal matrix. Observe from (3.21) that Qji can also be interpreted as the j-th component of ei in the basis {e1', e2', e3'}, whence we also have

ei = Qji ej'.    (3.23)

In general, if one basis can be rotated into the other, which means that both bases are right-handed or both are left-handed, then [Q] is a proper orthogonal matrix and det[Q] = +1. If instead the two bases are related by a reflection, which means that one basis is right-handed and the other is left-handed, then [Q] is an improper orthogonal matrix and det[Q] = −1.

We may now relate the different components of a single vector v in two bases. Let vi and vi' be the i-th component of the same vector v in the two bases {e1, e2, e3} and {e1', e2', e3'}. Then one can show that

vi' = Qij vj  or equivalently  {v'} = [Q]{v}.    (3.24)

Since [Q] is orthogonal, one also has the inverse relationships

vi = Qji vj'  or equivalently  {v} = [Q]^T {v'}.    (3.25)

In general, the component matrices {v} and {v'} of a vector v in two different bases are different. A vector whose components in every basis happen to be the same is called an isotropic vector: {v} = [Q]{v} for all orthogonal matrices [Q]. It is possible to show that the only isotropic vector is the null vector o.

Similarly, we may relate the different components of a single linear transformation A in two bases. Let Aij and Aij' be the ij-components of the same linear transformation A in the two bases {e1, e2, e3} and {e1', e2', e3'}. Then one can show that

Aij' = Qip Qjq Apq  or equivalently  [A'] = [Q][A][Q]^T.    (3.26)

Since [Q] is orthogonal, one also has the inverse relationships

Aij = Qpi Qqj Apq'  or equivalently  [A] = [Q]^T [A'][Q].    (3.27)

In general, the component matrices [A] and [A'] of a linear transformation A in two different bases are different. A linear transformation whose components in every basis happen to be the same is called an isotropic linear transformation: [A] = [Q][A][Q]^T for all orthogonal matrices [Q]. It is possible to show that the most general isotropic symmetric linear transformation is a scalar multiple of the identity, αI, where α is an arbitrary scalar.

3.4 Scalar-valued functions of linear transformations. Determinant, trace, scalar-product and norm.

Let Φ(A; e1, e2, e3) be a scalar-valued function that depends on a linear transformation A and a (not-necessarily orthonormal) basis {e1, e2, e3}. One example of such a function is Φ(A; e1, e2, e3) = Ae1 · e1; this function depends on both the linear transformation A and the underlying basis. Certain such functions are in fact independent of the basis, so that for every two (not-necessarily orthonormal) bases {e1, e2, e3} and {e1', e2', e3'} one has Φ(A; e1, e2, e3) = Φ(A; e1', e2', e3'), and in such a case we can simply write Φ(A).

Equivalently, let φ([A]) be some real-valued function defined on the set of all square matrices, and let [A] be the components of a linear transformation A in some basis. If [A'] are the components of A in some other basis, then in general φ([A]) ≠ φ([A']). Certain functions φ, however, have the property that φ([A]) = φ([A']) for all pairs of bases; such a function depends on the linear transformation only and not the basis, and for it we may write φ(A). We first consider two important examples here.
One example of a basis-independent function is

Φ(A; e1, e2, e3) = (Ae1 × Ae2) · Ae3 / (e1 × e2) · e3,    (3.28)

though it is certainly not obvious that this function is independent of the choice of basis. To exhibit a basis-independent function directly, recall that the components [A] and [A'] of a linear transformation A in two bases are related by [A'] = [Q][A][Q]^T. If we take the determinant of this matrix equation we get

det[A'] = det([Q][A][Q]^T) = det[Q] det[A] det[Q]^T = (det[Q])^2 det[A] = det[A],    (3.29)

since the determinant of an orthogonal matrix is ±1. Therefore without ambiguity we may define the determinant of a linear transformation A to be the (basis-independent) scalar-valued function given by

det A = det[A].    (3.30)
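The basis independence of the determinant can be spot-checked numerically. In the pure-Python sketch below (not part of the original text), the sample component matrix [A] and the rotation angle defining [Q] are arbitrary choices; the check confirms det[A'] = det[A] for [A'] = [Q][A][Q]^T:

```python
# Spot-check of (3.29)-(3.30): det is unchanged by a change of orthonormal basis.
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def det3(A):
    # cofactor expansion along the first row
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
          - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
          + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

t = 0.3                                   # arbitrary rotation angle
Q = [[math.cos(t), math.sin(t), 0],
     [-math.sin(t), math.cos(t), 0],
     [0, 0, 1]]
Qt = [[Q[j][i] for j in range(3)] for i in range(3)]

A = [[1, 2, 3], [0, 4, 5], [1, 0, 6]]     # arbitrary sample components
Ap = matmul(matmul(Q, A), Qt)             # [A'] = [Q][A][Q]^T

assert abs(det3(Ap) - det3(A)) < 1e-9
```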
In terms of its components in a basis one has

det A = eijk A1i A2j A3k = eijk Ai1 Aj2 Ak3;    (3.31)

see (1.46). We will see in an example at the end of this chapter that the particular function Φ defined in (3.28) is in fact the determinant det A.

Similarly, we may define the trace of a linear transformation A to be the (basis-independent) scalar-valued function given by

trace A = tr[A].    (3.32)

In terms of components in a basis,

trace A = Aii.    (3.33)

It is useful to note the following properties of the determinant of a linear transformation:

det(AB) = det(A) det(B),  det(αA) = α^3 det(A),  det(A^T) = det(A).    (3.34)

As mentioned previously, a linear transformation A is said to be nonsingular if the only vector x for which Ax = o is the null vector x = o. One can show that A is nonsingular if and only if det A ≠ 0. If A is nonsingular, then

det(A^{-1}) = 1 / det(A).    (3.35)

Suppose that λ and v ≠ o are an eigenvalue and eigenvector of a given linear transformation A. Then by definition Av = λv, or equivalently (A − λI)v = o. Since v ≠ o, it follows that A − λI must be singular, and so

det(A − λI) = 0.    (3.36)

The eigenvalues are the roots λ of this cubic equation. The eigenvalues and eigenvectors of a linear transformation do not depend on any choice of basis. Thus the eigenvalues of a linear transformation are also scalar-valued functions of A whose values depend only on A and not the basis: λi = λi(A). If S is symmetric, its matrix of components in a principal basis is

        | λ1  0   0  |
[S] =   | 0   λ2  0  |    (3.37)
        | 0   0   λ3 |
The particular scalar-valued functions

I1(A) = tr A,  I2(A) = 1/2 [(tr A)^2 − tr(A^2)],  I3(A) = det A    (3.38)

will appear frequently in what follows. It can be readily verified that for any linear transformation A and all orthogonal linear transformations Q,

I1(Q^T AQ) = I1(A),  I2(Q^T AQ) = I2(A),  I3(Q^T AQ) = I3(A),    (3.39)

and for this reason the three functions (3.38) are said to be invariant under orthogonal transformations. Observe from (3.37) that for a symmetric linear transformation with eigenvalues λ1, λ2, λ3,

I1(S) = λ1 + λ2 + λ3,  I2(S) = λ1 λ2 + λ2 λ3 + λ3 λ1,  I3(S) = λ1 λ2 λ3.    (3.40)

The mapping (3.40) between invariants and eigenvalues is one-to-one. Note in particular that the cubic equation for the eigenvalues of a linear transformation can be written as

λ^3 − I1(A) λ^2 + I2(A) λ − I3(A) = 0.    (3.41)

In addition one can show that for any linear transformation A and any real number α,

det(A − αI) = −α^3 + I1(A) α^2 − I2(A) α + I3(A).    (3.42)

Finally, one can show that

A^3 − I1(A) A^2 + I2(A) A − I3(A) I = O,    (3.43)

which is known as the Cayley-Hamilton theorem.

One can similarly define scalar-valued functions of two linear transformations A and B. The particular function φ(A, B) defined by

φ(A, B) = tr(AB^T)

will play an important role in what follows. Note that in terms of components in a basis, φ(A, B) = tr(AB^T) = Aij Bij.
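The invariants (3.38) and the Cayley-Hamilton theorem (3.43) can be spot-checked numerically. The pure-Python sketch below (not part of the original text) uses an arbitrary sample matrix of components; it computes I1, I2, I3 and verifies that A^3 − I1 A^2 + I2 A − I3 I vanishes componentwise:

```python
# Spot-check of (3.38) and the Cayley-Hamilton theorem (3.43).

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def trace(A):
    return A[0][0] + A[1][1] + A[2][2]

def det3(A):
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
          - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
          + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

A = [[2, 1, 0], [0, 3, 1], [1, 0, 1]]   # arbitrary sample components
A2 = matmul(A, A)
A3 = matmul(A2, A)

I1 = trace(A)                            # eq. (3.38)
I2 = 0.5 * (trace(A) ** 2 - trace(A2))
I3 = det3(A)

# A^3 - I1 A^2 + I2 A - I3 I = O, checked component by component
for i in range(3):
    for j in range(3):
        lhs = A3[i][j] - I1 * A2[i][j] + I2 * A[i][j] - I3 * (1 if i == j else 0)
        assert abs(lhs) < 1e-9
```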
This particular scalar-valued function is often known as the scalar product of the two linear transformations A and B and is written as A · B:

A · B = tr(AB^T).    (3.44)

It is natural then to define the magnitude (or norm) of a linear transformation A, denoted by |A|, as

|A| = √(A · A) = √tr(AA^T).    (3.45)

Note that in terms of components in a basis,

A · B = Aij Bij,    (3.46)

|A|^2 = Aij Aij.    (3.47)

Observe the useful property that if |A| → 0, then each component Aij → 0. This will be used later when we linearize the theory of large deformations.

3.5 Cartesian Tensors

Consider two orthonormal bases {e1, e2, e3} and {e1', e2', e3'}, related as before by the orthogonal matrix [Q]. A quantity whose components vi and vi' in these two bases are related by

vi' = Qij vj    (3.48)

is called a 1st-order Cartesian tensor or a 1-tensor. It follows from our preceding discussion that a vector is a 1-tensor. A quantity whose components Aij and Aij' in the two bases are related by

Aij' = Qip Qjq Apq    (3.49)

is called a 2nd-order Cartesian tensor or a 2-tensor. It follows from our preceding discussion that a linear transformation is a 2-tensor.

The concept of an n-th order tensor can be introduced similarly: let T be a physical entity which, in a given basis {e1, e2, e3}, is defined completely by a set of 3^n ordered numbers Ti1 i2 ... in. The numbers Ti1 i2 ... in are called the components of T in the basis {e1, e2, e3}. If, for example, T is a scalar, vector or linear transformation, it is represented by 3^0, 3^1 or 3^2 components respectively in the given basis.
Let {e1', e2', e3'} be a second basis related to the first one by the orthogonal matrix [Q], and let Ti1 i2 ... in' be the components of the entity T in the second basis. Then, if for every pair of such bases these two sets of components are related by

Ti1 i2 ... in' = Qi1 j1 Qi2 j2 ... Qin jn Tj1 j2 ... jn,    (3.50)

the entity T is called an n-th order Cartesian tensor, or more simply an n-tensor. Note that the components of a tensor in an arbitrary basis can be calculated from (3.50) if its components in any one basis are known.

Two tensors of the same order are added by adding corresponding components. Recall that the outer product of two vectors a and b is the 2-tensor C = a ⊗ b whose components are given by Cij = ai bj. This can be generalized to higher-order tensors: given an n-tensor A and an m-tensor B, their outer product is the (m + n)-tensor C = A ⊗ B whose components are given by

Ci1 i2 ... in j1 j2 ... jm = Ai1 i2 ... in Bj1 j2 ... jm.    (3.51)

Let A be a 2-tensor with components Aij in some basis. Contracting over two subscripts involves setting those two subscripts equal, and therefore summing over them; thus "contracting" A over its subscripts leads to the scalar Aii. This can be generalized to higher-order tensors: if A is an n-tensor with components Ai1 i2 ... in in some basis, then "contracting" A over two of its subscripts, say the ij-th and ik-th subscripts, leads to the (n − 2)-tensor whose components in this basis are Ai1 i2 ... ij−1 p ij+1 ... ik−1 p ik+1 ... in, summed over the repeated subscript p.

Let a, b and T be entities whose components in a basis are denoted by ai, bi and Tij. Suppose that the components of T in some basis are related to the components of a and b in that same basis by

ai = Tij bj.    (3.52)

If it is known that a and b are 1-tensors, then one can readily show that T is necessarily a 2-tensor. This is called the quotient rule, since it has the appearance of saying that the quotient of two 1-tensors is a 2-tensor. The rule generalizes naturally to tensors of more general order: if A, B and T are entities whose components in a basis are related by Ai1 i2 ... in = Ti1 i2 ... in j1 j2 ... jm Bj1 j2 ... jm, where some of the subscripts may be repeated, and if it is known that A and B are tensors, then T is necessarily a tensor as well.

In general, the components of a tensor T in two different bases are different: Ti1 i2 ... in' ≠ Ti1 i2 ... in. However, there are certain special tensors whose components in one basis are the
same as those in any other basis; an example of this is the identity 2-tensor I. Such a tensor is said to be isotropic. In general, a tensor T is said to be an isotropic tensor if its components have the same values in all bases, i.e. if

Ti1 i2 ... in' = Ti1 i2 ... in in all bases {e1, e2, e3} and {e1', e2', e3'}.    (3.53)

Equivalently, for an isotropic tensor,

Ti1 i2 ... in = Qi1 j1 Qi2 j2 ... Qin jn Tj1 j2 ... jn for all orthogonal matrices [Q].    (3.54)

One can show that (a) the only isotropic 1-tensor is the null vector o; (b) the most general isotropic 2-tensor is a scalar multiple of the identity linear transformation, αI; (c) the most general isotropic 3-tensor is the null 3-tensor o; and (d) the most general isotropic 4-tensor C has components (in any basis)

Cijkl = α δij δkl + β δik δjl + γ δil δjk,    (3.55)

where α, β and γ are arbitrary scalars.

3.6 Worked Examples.

In some of the examples below, we are asked to establish certain results for vectors and linear transformations. As noted previously, whenever it is more convenient we may pick and fix a basis, and then work using components in that basis. If necessary, we can revert back to the vectors and linear transformations at the end. We shall do this frequently in what follows and will not bother to explain this each time.

One might have expected such examples to have been presented in Chapter 2. They are contained in the present chapter because they all involve either the determinant or trace of a linear transformation, and we chose to define these quantities in terms of components (even though they are basis independent).

It is also worth pointing out that in some of the examples below, calculations involving vectors and/or linear transformations are carried out without reference to their components.

Example 3.1: Suppose that A is a symmetric linear transformation. Show that its matrix of components [A] in any basis is a symmetric matrix.
Solution: According to (3.13), the components of A in the basis {e1, e2, e3} are defined by Aij = ei · Aej. Thus Aji = ej · Aei. The property (NNN) of the transpose shows that ej · Aei = ATej · ei, which, on using the fact that A is symmetric, further simplifies to ej · Aei = Aej · ei; and finally, since the order of the vectors in a scalar product does not matter, we have ej · Aei = ei · Aej. By (3.13), the right-most term here is the component Aij of A, and so Aji = Aij. Thus [A] = [A]T and so the matrix [A] is symmetric.

Remark: Conversely, if it is known that the matrix of components [A] of a linear transformation in some basis is symmetric, then the linear transformation A is also symmetric.

Example 2.5: Choose any convenient basis and calculate the components of the projection linear transformation Π and the reflection linear transformation R in that basis.

Solution: Let e3 be a unit vector normal to the plane P, and let e1 and e2 be any two unit vectors in P such that {e1, e2, e3} forms an orthonormal basis. From an example in the previous chapter we know that the projection transformation Π and the reflection transformation R in the plane P can be written as Π = I − e3 ⊗ e3 and R = I − 2(e3 ⊗ e3) respectively. Since the components of e3 in the chosen basis are δ3i, we find that

    Πij = δij − (e3)i (e3)j = δij − δ3i δ3j,    Rij = δij − 2 δ3i δ3j.

Example 3.2: Consider the scalar-valued function f(A, B) = trace(ABT), and show that, for all linear transformations A, B, C and for all scalars α, this function f has the following properties: (i) f(A, B) = f(B, A); (ii) f(αA, B) = α f(A, B); (iii) f(A + C, B) = f(A, B) + f(C, B); and (iv) f(A, A) > 0 provided A ≠ 0.

Solution: Let Aij and Bij be the components of A and B in an arbitrary basis. Then (ABT)ij = Aik (BT)kj = Aik Bjk, and so f(A, B) = trace(ABT) = Aik Bik.
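The four properties established in Example 3.2 lend themselves to a quick numerical check. The sketch below is an added illustration, not part of the notes; it assumes NumPy is available and tests the properties on randomly chosen linear transformations:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(A, B):
    # f(A, B) = trace(A B^T) = A_ij B_ij
    return np.trace(A @ B.T)

A, B, C = rng.standard_normal((3, 3, 3))
alpha = 2.5

checks = [
    np.isclose(f(A, B), f(B, A)),                  # (i)   symmetry
    np.isclose(f(alpha * A, B), alpha * f(A, B)),  # (ii)  homogeneity
    np.isclose(f(A + C, B), f(A, B) + f(C, B)),    # (iii) additivity
    f(A, A) > 0,                                   # (iv)  positivity for A != 0
    np.isclose(f(A, B), np.sum(A * B)),            # trace(A B^T) = A_ij B_ij
]
all_hold = all(checks)
```

With floating-point tolerance handled by `np.isclose`, all five checks hold for any generic choice of A, B, C.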
In terms of these components it is now trivial to verify that all of the above requirements hold.

Remark: It follows from this that the function f has all of the usual requirements of a scalar product. Therefore we may define the scalar product of two linear transformations A and B, denoted by A · B, as A · B = trace(ABT) = Aij Bij. Based on this scalar product we can, for example, define the magnitude of a linear transformation to be |A| = √(A · A) = √(Aij Aij).

Example 3.3: For any two vectors u and v, show that their cross product u × v is orthogonal to both u and v.

Solution: We are to show, for example, that u · (u × v) = 0. In terms of their components we can write

    u · (u × v) = ui (u × v)i = ui (eijk uj vk) = eijk ui uj vk.

Since eijk = −ejik and ui uj = uj ui, it follows that eijk is skew-symmetric in the subscripts i, j while ui uj is symmetric in those subscripts. Thus it follows from Example 1.3 that eijk ui uj = 0, and so u · (u × v) = 0. The orthogonality of v and u × v can be established similarly.

Example 3.4: Suppose that a, b, c are any three linearly independent vectors and that F is an arbitrary nonsingular linear transformation. Show that

    (Fa × Fb) · Fc = det F (a × b) · c.    (i)

Solution: First consider the left-hand side of (i). On using (3.10) and (3.11), we can express this as

    (Fa × Fb) · Fc = (Fa × Fb)i (Fc)i = eijk (Fa)j (Fb)k (Fc)i,    (ii)

and consequently

    (Fa × Fb) · Fc = eijk (Fjp ap)(Fkq bq)(Fir cr) = eijk Fir Fjp Fkq ap bq cr.    (iii)

Turning next to the right-hand side of (i), we note that

    det F (a × b) · c = det[F] (a × b)i ci = det[F] eijk aj bk ci = det[F] erpq ap bq cr.    (iv)

Recalling the identity erpq det[F] = eijk Fir Fjp Fkq in (1.48) and substituting this into (iv) gives

    det F (a × b) · c = eijk Fir Fjp Fkq ap bq cr.    (v)

Since the right-hand sides of (iii) and (v) are identical, it follows that the left-hand sides must also be equal, thus establishing the desired result.
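Both identities above can be tested numerically. The following sketch is an added illustration, not part of the notes; NumPy is assumed, and random vectors and a generic (hence nonsingular) F are used:

```python
import numpy as np

rng = np.random.default_rng(1)
u, v, a, b, c = rng.standard_normal((5, 3))
F = rng.standard_normal((3, 3))  # generically nonsingular

# Example 3.3: u . (u x v) = 0
orthogonal = np.isclose(np.dot(u, np.cross(u, v)), 0.0)

# Example 3.4: (Fa x Fb) . Fc = det F (a x b) . c
lhs = np.dot(np.cross(F @ a, F @ b), F @ c)
rhs = np.linalg.det(F) * np.dot(np.cross(a, b), c)
det_identity = np.isclose(lhs, rhs)
```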
Example 3.5: Suppose that a, b and c are three noncoplanar vectors in IE3, and let V0 be the volume of the tetrahedron defined by these three vectors. Next, suppose that F is a nonsingular 2-tensor, and let V denote the volume of the tetrahedron defined by the vectors Fa, Fb and Fc. Note that the second tetrahedron is the image of the first tetrahedron under the transformation F. Derive a formula for V in terms of V0 and F.

Figure 3.2: Tetrahedron of volume V0 defined by three noncoplanar vectors a, b, c, and its image of volume V under the linear transformation F.

Solution: Recall from an example in the previous chapter that the volume V0 of the tetrahedron defined by any three noncoplanar vectors a, b, c is

    V0 = (1/6) (a × b) · c.

The volume V of the tetrahedron defined by the three vectors Fa, Fb, Fc is likewise

    V = (1/6) (Fa × Fb) · Fc.

It follows from the result of the previous example that V/V0 = det F, which describes how volumes are mapped by the transformation F.

Example 3.6: Suppose that a and b are two noncolinear vectors in IE3. Let α0 be the area of the parallelogram defined by these two vectors, and let n0 be a unit vector that is normal to the plane of this parallelogram. Next, suppose that F is a nonsingular 2-tensor, and let α and n denote the area and unit normal of the parallelogram defined by the vectors Fa and Fb. Derive formulas for α and n in terms of α0, n0 and F.

Solution: By the properties of the vector product we know that

    α0 = |a × b|,   n0 = (a × b)/|a × b|,

and similarly that

    α = |Fa × Fb|,   n = (Fa × Fb)/|Fa × Fb|.
Figure 3.3: Parallelogram of area α0 with unit normal n0 defined by two noncolinear vectors a and b, and its image of area α with unit normal n under the linear transformation F.

Therefore

    α0 n0 = a × b   and   α n = Fa × Fb.    (i)

But

    (Fa × Fb)s = esij (Fa)i (Fb)j = esij Fip ap Fjq bq.    (ii)

Also recall the identity epqr det[F] = eijk Fip Fjq Fkr introduced in (1.48). Multiplying both sides of this identity by (F−1)rs leads to

    epqr det[F] (F−1)rs = eijk Fip Fjq Fkr (F−1)rs = eijk Fip Fjq δks = eijs Fip Fjq = esij Fip Fjq.    (iii)

Substituting (iii) into (ii) gives

    (Fa × Fb)s = det[F] epqr (F−1)rs ap bq = det F (a × b)r (F−T)sr = det F (F−T(a × b))s,

and so, using (i),

    α n = α0 det F (F−T n0).

This describes how (vectorial) areas are mapped by the transformation F. Taking the norm of this vector equation gives α = α0 |det F| |F−T n0|, and substituting this result into the preceding equation gives

    n = F−T n0 / |F−T n0|.

Example 3.5: Let {e1, e2, e3} and {e1', e2', e3'} be two bases related by nine scalars Qij through ei' = Qij ej. Let Q be the linear transformation whose components in the basis {e1, e2, e3} are Qij. Show that ei' = QT ei; thus QT is the transformation that carries the first basis into the second. Since [Q] is an orthogonal matrix, one readily sees that Q is an orthogonal transformation.
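The volume rule V/V0 = det F and the area rule α n = det F F−T (α0 n0) of Examples 3.5 and 3.6 are easy to confirm numerically. The sketch below is an added illustration, not part of the notes, using NumPy and random data:

```python
import numpy as np

rng = np.random.default_rng(2)
a, b, c = rng.standard_normal((3, 3))
F = rng.standard_normal((3, 3))  # generically nonsingular

# volumes of the tetrahedra defined by (a, b, c) and (Fa, Fb, Fc)
V0 = np.dot(np.cross(a, b), c) / 6.0
V = np.dot(np.cross(F @ a, F @ b), F @ c) / 6.0
volume_rule = np.isclose(V, np.linalg.det(F) * V0)  # V / V0 = det F

# vectorial areas: alpha n = det F F^{-T} (alpha0 n0)
alpha0_n0 = np.cross(a, b)
alpha_n = np.cross(F @ a, F @ b)
area_rule = np.allclose(alpha_n,
                        np.linalg.det(F) * np.linalg.inv(F).T @ alpha0_n0)
```

Here signed volumes are used, so the check holds even when det F < 0.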
Solution: Since Qij are the components of the linear transformation Q in the basis {e1, e2, e3}, it follows from the definition of components that Qej = Qij ei. Operating on both sides of this equation by QT and using the orthogonality of Q leads to ej = Qij QT ei. Multiplying both sides by Qkj, and noting by the orthogonality of Q that Qkj Qij = δki, we are now led to Qkj ej = QT ek, or equivalently QT ei = Qij ej. This, together with the given fact that ei' = Qij ej, yields the desired result.

Example 3.6: Determine the relationship between the components vi and vi' of a vector v in two bases.

Solution: The components vi of v in the basis {e1, e2, e3} are defined by vi = v · ei, and its components vi' in the second basis {e1', e2', e3'} are defined by vi' = v · ei'. It follows from this and (NNN) that

    vi' = v · ei' = v · (Qij ej) = Qij v · ej = Qij vj.

Thus the components of the vector v in the two bases are related by vi' = Qij vj.

Example 3.7: Determine the relationship between the components Aij and Aij' of a linear transformation A in two bases.

Solution: The components Aij of the linear transformation A in the basis {e1, e2, e3} are defined by

    Aij = ei · (Aej),    (i)

and its components Aij' in a second basis {e1', e2', e3'} are defined by

    Aij' = ei' · (Aej').    (ii)
By first making use of (NNN), and then (i), we can write (ii) as

    Aij' = ei' · (Aej') = Qip ep · (A Qjq eq) = Qip Qjq ep · (Aeq) = Qip Qjq Apq.    (iii)

Thus, the components of the linear transformation A in the two bases are related by

    Aij' = Qip Qjq Apq.    (iv)

Example 3.8: Suppose that the basis {e1', e2', e3'} is obtained by rotating the basis {e1, e2, e3} through an angle θ about the unit vector e3; see Figure 3.4. Write out the transformation rule for 2-tensors explicitly in this case.

Figure 3.4: A basis {e1', e2', e3'} obtained by rotating the basis {e1, e2, e3} through an angle θ about the unit vector e3.

Solution: In view of the given relationship between the two bases, it follows that

    e1' = cos θ e1 + sin θ e2,
    e2' = − sin θ e1 + cos θ e2,
    e3' = e3.

The matrix [Q] which relates the two bases is defined by Qij = ei' · ej, and so it follows that

    [Q] = |  cos θ   sin θ   0 |
          | −sin θ   cos θ   0 |
          |    0       0     1 |.
Substituting this [Q] into [A'] = [Q][A][Q]T and multiplying out the matrices leads to the 9 equations

    A11' = (A11 + A22)/2 + (A11 − A22)/2 cos 2θ + (A12 + A21)/2 sin 2θ,
    A12' = (A12 − A21)/2 + (A12 + A21)/2 cos 2θ − (A11 − A22)/2 sin 2θ,
    A21' = −(A12 − A21)/2 + (A12 + A21)/2 cos 2θ − (A11 − A22)/2 sin 2θ,
    A22' = (A11 + A22)/2 − (A11 − A22)/2 cos 2θ − (A12 + A21)/2 sin 2θ,
    A13' = A13 cos θ + A23 sin θ,
    A23' = A23 cos θ − A13 sin θ,
    A31' = A31 cos θ + A32 sin θ,
    A32' = A32 cos θ − A31 sin θ,
    A33' = A33.

In the special case when [A] is symmetric, and in addition A13 = A23 = 0, these nine equations simplify to

    A11' = (A11 + A22)/2 + (A11 − A22)/2 cos 2θ + A12 sin 2θ,
    A22' = (A11 + A22)/2 − (A11 − A22)/2 cos 2θ − A12 sin 2θ,
    A12' = −(A11 − A22)/2 sin 2θ + A12 cos 2θ,

together with A13' = A23' = 0 and A33' = A33. These are the well-known equations underlying Mohr's circle for transforming symmetric 2-tensors in two dimensions.

Example 3.9: Let a, b and T be entities whose components in some arbitrary basis are ai, bi and Tijk.

a. If a and b are vectors, and the components of T in any basis are defined in terms of the components of a and b in that basis by Tijk = ai bj bk, show that T is a 3-tensor.

b. Suppose that A and B are 2-tensors and that their components in some basis are related by Aij = Cijkl Bkl. Show that the Cijkl's are the components of a 4-tensor.

Solution: a. Let ai, bi and ai', bi' be the components of a and b in two arbitrary bases. We are told that the components of the entity T in these two bases are defined by

    Tijk = ai bj bk,    T'ijk = ai' bj' bk'.    (i), (ii)
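The Mohr's-circle relations above can be confirmed against a direct computation of [A'] = [Q][A][Q]T. The sketch below is an added illustration, not part of the notes, using NumPy:

```python
import numpy as np

theta = 0.3
ct, st = np.cos(theta), np.sin(theta)
Q = np.array([[ ct,  st, 0.0],
              [-st,  ct, 0.0],
              [0.0, 0.0, 1.0]])  # Qij = ei' . ej for a rotation about e3

A = np.array([[2.0, 0.7, 0.0],
              [0.7, -1.0, 0.0],
              [0.0, 0.0, 3.0]])  # symmetric, with A13 = A23 = 0

Ap = Q @ A @ Q.T  # components of A in the rotated basis

m = 0.5 * (A[0, 0] + A[1, 1])  # mean in-plane normal component
d = 0.5 * (A[0, 0] - A[1, 1])  # half-difference
c2, s2 = np.cos(2 * theta), np.sin(2 * theta)
mohr_ok = (np.isclose(Ap[0, 0], m + d * c2 + A[0, 1] * s2)
           and np.isclose(Ap[1, 1], m - d * c2 - A[0, 1] * s2)
           and np.isclose(Ap[0, 1], -d * s2 + A[0, 1] * c2)
           and np.isclose(Ap[2, 2], A[2, 2]))
```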
Since a and b are known to be vectors, their components transform according to the 1-tensor transformation rule

    ai' = Qij aj,   bi' = Qij bj.    (iii), (iv)

Combining equations (i)-(iv) gives

    T'ijk = ai' bj' bk' = Qip ap Qjq bq Qkr br = Qip Qjq Qkr ap bq br = Qip Qjq Qkr Tpqr.    (v)

Therefore the components of T in the two bases transform according to T'ijk = Qip Qjq Qkr Tpqr, and so T is a 3-tensor.

b. Let Aij, Bij, Cijkl and A'ij, B'ij, C'ijkl be the components of A, B, C in two arbitrary bases:

    Aij = Cijkl Bkl,   A'ij = C'ijkl B'kl,    (vi)

and we must show that Cijkl is a 4-tensor, i.e. that C'ijkl = Qip Qjq Qkr Qls Cpqrs. We are told that A and B are 2-tensors, whence

    A'ij = Qip Qjq Apq,   B'ij = Qip Qjq Bpq.    (vii)

Substituting (vii) into (vi)2 gives

    Qip Qjq Apq = C'ijkl Qkp Qlq Bpq.    (viii)

Multiplying both sides by Qim Qjn and using the orthogonality of [Q], i.e. the fact that Qip Qim = δpm, leads to

    δpm δqn Apq = C'ijkl Qim Qjn Qkp Qlq Bpq,    (ix)

which by the substitution rule tells us that

    Amn = C'ijkl Qim Qjn Qkp Qlq Bpq,    (x)

or, on using (vi)1 in this,

    Cmnpq Bpq = C'ijkl Qim Qjn Qkp Qlq Bpq.    (xi)

Since this holds for all matrices [B], we must have

    Cmnpq = C'ijkl Qim Qjn Qkp Qlq.    (xii)

Finally, multiplying both sides by Qam Qbn Qcp Qdq, then using the orthogonality of [Q] and the substitution rule, yields the desired result

    Qam Qbn Qcp Qdq Cmnpq = C'abcd.    (xiii)

Example 3.10: Verify that the alternator eijk has the property that

    eijk = Qip Qjq Qkr epqr    (i)

for all proper orthogonal matrices [Q], but that more generally eijk ≠ Qip Qjq Qkr epqr for all orthogonal matrices [Q].    (ii)
Solution: By the identity (1.48), Qip Qjq Qkr epqr = det[Q] eijk. Recall that det[Q] = +1 for a proper orthogonal matrix, so in that case Qip Qjq Qkr epqr = eijk. If, however, [Q] is orthogonal but not proper, then det[Q] = −1 and Qip Qjq Qkr epqr = −eijk ≠ eijk. Note from this that the alternator is not an isotropic 3-tensor.

Example 3.11: If Cijkl is an isotropic 4-tensor, show that necessarily Ciikl = α δkl for some scalar α.

Solution: Since Cijkl is an isotropic 4-tensor, by definition,

    Cijkl = Qip Qjq Qkr Qls Cpqrs    for all orthogonal matrices [Q].

On setting i = j in this, then using the orthogonality of [Q], and finally using the substitution rule, we are led to

    Ciikl = Qip Qiq Qkr Qls Cpqrs = δpq Qkr Qls Cpqrs = Qkr Qls Cpprs.

Thus Ciikl obeys Ciikl = Qkr Qls Cpprs for all orthogonal matrices [Q], and therefore it is an isotropic 2-tensor. The desired result now follows, since the most general isotropic 2-tensor is a scalar multiple of the identity.

Example 3.12: Show that the most general isotropic vector is the null vector o.

Solution: In order to show this we must determine the most general vector u which is such that

    ui = Qij uj    for all orthogonal matrices [Q].    (i)

Since (i) is to hold for all orthogonal matrices [Q], it must necessarily hold for the special choice [Q] = −[I]. Then Qij = −δij, and (i) reduces to ui = −δij uj = −ui; thus ui = 0 and so u = o. Conversely, u = o obviously satisfies (i) for all orthogonal matrices [Q]. Thus u = o is the most general isotropic vector.

Example 3.13: Show that the most general isotropic symmetric tensor is a scalar multiple of the identity.

Solution: We must find the most general symmetric 2-tensor A whose components in every basis are the same, i.e.

    [A] = [Q][A][Q]T    for all orthogonal matrices [Q].    (i)

First, since A is symmetric, we know that there is some basis in which [A] is diagonal. Since A is also isotropic, it follows that [A] must therefore be diagonal in every basis. Thus [A] has the form

    [A] = | λ1  0   0  |
          | 0   λ2  0  |
          | 0   0   λ3 |
in any basis, in which case (i) takes the form

    diag(λ1, λ2, λ3) = [Q] diag(λ1, λ2, λ3) [Q]T    for all orthogonal matrices [Q].    (iii)

Thus (iii) must necessarily hold for the special choice

    [Q] = | 0 0 1 |
          | 1 0 0 |
          | 0 1 0 |,

in which case (iii) reduces to diag(λ1, λ2, λ3) = diag(λ3, λ1, λ2). Therefore λ1 = λ2; a permutation of this special choice of [Q] similarly shows that λ2 = λ3. Thus λ1 = λ2 = λ3 = α, say, and therefore [A] necessarily must have the form [A] = α[I]. Conversely, [A] = α[I] is readily shown to obey (i) for any orthogonal matrix [Q]. This establishes the result.

Example 3.14: If W is a skew-symmetric tensor, show that there is a vector w such that Wx = w × x for all x ∈ IE3.

Solution: Let Wij be the components of W in some basis, and let w be the vector whose components in this basis are defined by

    wi = −(1/2) eijk Wjk.    (i)

Then, by direct substitution, we merely have to show that w has the desired property stated above. Multiplying both sides of (i) by eipq, then using the identity eijk eipq = δjp δkq − δjq δkp, and finally using the substitution rule gives

    eipq wi = −(1/2)(δjp δkq − δjq δkp) Wjk = −(1/2)(Wpq − Wqp).

Since W is skew-symmetric we have Wij = −Wji, and thus conclude that Wij = −eijk wk. Now, for any vector x,

    Wij xj = −eijk wk xj = eikj wk xj = (w × x)i.

Thus the vector w defined by (i) has the desired property Wx = w × x. This establishes the result.
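The axial-vector construction of Example 3.14 can be checked directly. The sketch below is an added illustration, not part of the notes; it builds a random skew-symmetric W, forms w component by component from wi = −(1/2) eijk Wjk, and tests Wx = w × x:

```python
import numpy as np

rng = np.random.default_rng(3)
S = rng.standard_normal((3, 3))
W = S - S.T  # a skew-symmetric tensor, W_ij = -W_ji

# w_i = -(1/2) e_ijk W_jk, written out: w = (W32, W13, W21)
w = np.array([W[2, 1], W[0, 2], W[1, 0]])

x = rng.standard_normal(3)
axial_ok = np.allclose(W @ x, np.cross(w, x))
```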
Example 3.15: Verify that the 4-tensor

    Cijkl = α δij δkl + β δik δjl + γ δil δjk,    (i)

where α, β, γ are scalars, is isotropic. If this isotropic 4-tensor is to possess the symmetry Cijkl = Cjikl, show that one must have β = γ.

Solution: In order to verify that the Cijkl are the components of an isotropic 4-tensor we have to show that

    Cijkl = Qip Qjq Qkr Qls Cpqrs    for all orthogonal matrices [Q].    (ii)

The right-hand side of this can be simplified by using the given form of Cijkl, the substitution rule, and the orthogonality of [Q] as follows:

    Qip Qjq Qkr Qls Cpqrs = Qip Qjq Qkr Qls (α δpq δrs + β δpr δqs + γ δps δqr)
        = α Qiq Qjq Qks Qls + β Qir Qjs Qkr Qls + γ Qis Qjr Qkr Qls
        = α (Qiq Qjq)(Qks Qls) + β (Qir Qkr)(Qjs Qls) + γ (Qis Qls)(Qjr Qkr)
        = α δij δkl + β δik δjl + γ δil δjk
        = Cijkl.

This establishes the desired result.

Turning to the second question, enforcing the requirement Cijkl = Cjikl on (i) leads, after some simplification, to

    (β − γ)(δik δjl − δjk δil) = 0.    (iii)

Since this must hold for all values of the free indices i, j, k, l, it must necessarily hold for the special choice i = 1, j = 2, k = 1, l = 2. Therefore (β − γ)(δ11 δ22 − δ21 δ12) = 0, and so β = γ.

Remark: We have shown that β = γ is necessary if C given in (i) is to have the symmetry Cijkl = Cjikl. One can readily verify that it is sufficient as well. It is useful for later use to record here that the most general isotropic 4-tensor C with the symmetry property Cijkl = Cjikl is

    Cijkl = α δij δkl + β (δik δjl + δil δjk),    (v)

where α and β are scalars. Observe that Cijkl given by (v) automatically has the symmetry Cijkl = Cklij.

Example 3.16: If A is a tensor such that Ax · x = 0 for all vectors x, show that A is necessarily skew-symmetric.

Solution: By definition of the transpose and the properties of the scalar product, Ax · x = x · ATx = ATx · x. Therefore A has the properties that

    Ax · x = 0   and   ATx · x = 0    for all vectors x.    (i)
Adding these two equations gives Sx · x = 0, where S = A + AT. Observe that S is symmetric. Therefore, in terms of components in a principal basis of S,

    Sx · x = σ1 x1² + σ2 x2² + σ3 x3² = 0,

where the σk's are the eigenvalues of S. Since this must hold for all real numbers xk, it follows that every eigenvalue must vanish: σ1 = σ2 = σ3 = 0. Therefore S = O, whence A = −AT, and the desired result now follows.

Remark: An important consequence of this is that if A is a tensor with the property that Ax · x = 0 for all x, it does not follow that A = O necessarily; A need only be skew-symmetric.

Example 2.18: For any orthogonal linear transformation Q, show that det Q = ±1.

Solution: Since QQT = I, it now follows that 1 = det I = det(QQT) = det Q det QT = (det Q)². The desired result now follows.

Example 2.20: If Q is a proper orthogonal linear transformation on the vector space IE3, show that there exists a vector v such that Qv = v. This vector is known as the axis of Q.

Solution: To show that there is a vector v such that Qv = v, it is sufficient to show that Q has an eigenvalue +1, i.e. that (Q − I)v = o for some v ≠ o, or equivalently that det(Q − I) = 0. Recall that for any two linear transformations A and B we have det(AB) = det A det B, that det A = det AT, and that det(−A) = (−1)³ det A for a 3-dimensional vector space. Since QQT = I we have Q(QT − I) = I − Q. On taking the determinant of both sides and using the fact that det(AB) = det A det B, we get

    det Q det(QT − I) = det(I − Q).

Recall that det Q = +1 for a proper orthogonal linear transformation. Therefore this leads to det(Q − I) = −det(Q − I), whence det(Q − I) = 0, and the desired result now follows.

Example 2.22: For any linear transformation A, show that det(A − µI) = det(QTAQ − µI) for all orthogonal linear transformations Q and all scalars µ.

Solution: This follows readily since

    det(QTAQ − µI) = det(QTAQ − µQTQ) = det(QT(A − µI)Q) = det QT det(A − µI) det Q = det(A − µI).
Remark: Observe from this result that the eigenvalues of QTAQ coincide with those of A, so that in particular the same is true of their product and their sum: det(QTAQ) = det A and tr(QTAQ) = tr A.

Example 2.26: Define a scalar-valued function φ(A; e1, e2, e3) for all linear transformations A and all (not necessarily orthonormal) bases {e1, e2, e3} by

    φ(A; e1, e2, e3) = [ Ae1 · (e2 × e3) + e1 · (Ae2 × e3) + e1 · (e2 × Ae3) ] / [ e1 · (e2 × e3) ].

Show that φ(A; e1, e2, e3) = φ(A; e1', e2', e3') for any two bases {e1, e2, e3} and {e1', e2', e3'}; thus φ(A; e1, e2, e3) is in fact independent of the choice of basis, and we can simply write φ(A) instead of φ(A; e1, e2, e3). φ(A) is called a scalar invariant of A. Pick any orthonormal basis, express φ(A) in terms of the components of A in that basis, and hence show that φ(A) = trace A.

Example 2.27: Let F(t) be a one-parameter family of nonsingular 2-tensors that depends smoothly on the parameter t. Calculate (d/dt) det F(t).

Solution: From the result of Example 3.NNN we have

    F(t)a × F(t)b · F(t)c = det F(t) (a × b) · c.

Differentiating this with respect to t gives

    Ḟa × Fb · Fc + Fa × Ḟb · Fc + Fa × Fb · Ḟc = (d/dt det F) (a × b) · c,

where we have set Ḟ(t) = dF/dt. We can write this as

    ḞF⁻¹Fa × Fb · Fc + Fa × ḞF⁻¹Fb · Fc + Fa × Fb · ḞF⁻¹Fc = (d/dt det F) (a × b) · c.

In view of the result of Example 3.NNN, this can be written as

    trace(ḞF⁻¹) Fa × Fb · Fc = (d/dt det F) (a × b) · c,

and now using the result of Example 3.NNN once more we get

    trace(ḞF⁻¹) det F (a × b) · c = (d/dt det F) (a × b) · c,

or

    (d/dt) det F = trace(ḞF⁻¹) det F.
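The formula (d/dt) det F = trace(ḞF⁻¹) det F can be checked against a finite-difference derivative. The sketch below is an added illustration, not part of the notes; the particular family F(t) is an arbitrary smooth choice made only for the test:

```python
import numpy as np

# an arbitrary smooth family F(t) of (generically) nonsingular 2-tensors
A0 = np.array([[0.1, 0.3, 0.0],
               [-0.2, 0.0, 0.5],
               [0.4, 0.1, -0.3]])
B0 = np.eye(3) + 0.1 * np.ones((3, 3))

def F(t):
    return B0 + t * A0 + 0.5 * t**2 * (A0 @ A0)

def Fdot(t):
    # exact derivative of the family above
    return A0 + t * (A0 @ A0)

t = 0.7
lhs = np.trace(Fdot(t) @ np.linalg.inv(F(t))) * np.linalg.det(F(t))

# central-difference approximation of (d/dt) det F(t)
h = 1e-6
rhs = (np.linalg.det(F(t + h)) - np.linalg.det(F(t - h))) / (2.0 * h)
derivative_ok = np.isclose(lhs, rhs, rtol=1e-5)
```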
Example 2.30: For any integer N > 0, show that the polynomial

    PN(A) = c0 I + c1 A + c2 A² + ... + ck A^k + ... + cN A^N

can be written as a quadratic polynomial in A.

Solution: This follows readily from the Cayley-Hamilton theorem (3.41) as follows. Suppose that A is nonsingular, so that I3(A) = det A ≠ 0. Then (3.41) shows that A³ can be written as a linear combination of I, A and A². Next, multiplying this by A tells us that A⁴ can be written as a linear combination of A, A² and A³, and therefore, on using the result of the previous step, as a linear combination of I, A and A². This process can be continued an arbitrary number of times to see that for any integer k, A^k can be expressed as a linear combination of I, A and A². The result thus follows.

Example 2.31: For any linear transformation A, show that

    det(A − αI) = −α³ + I1(A) α² − I2(A) α + I3(A)

for all real numbers α, where I1(A), I2(A) and I3(A) are the principal scalar invariants of A:

    I1(A) = trace A,   I2(A) = (1/2)[(trace A)² − trace(A²)],   I3(A) = det A.

Example 2.32: Calculate the principal scalar invariants I1, I2 and I3 of the linear transformation a ⊗ b.

References

1. H. Jeffreys, Cartesian Tensors, Cambridge University Press, Cambridge, 1931.
2. J.K. Knowles, Linear Vector Spaces and Cartesian Tensors, Oxford University Press, New York, 1997.
3. L.A. Segel, Mathematics Applied to Continuum Mechanics, Dover, New York, 1987.
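The invariants and the Cayley-Hamilton reduction used in Examples 2.30-2.32 can all be verified numerically. The sketch below is an added illustration, not part of the notes, using NumPy with random data:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
I = np.eye(3)

I1 = np.trace(A)
I2 = 0.5 * (np.trace(A) ** 2 - np.trace(A @ A))
I3 = np.linalg.det(A)

# characteristic polynomial: det(A - alpha I) = -alpha^3 + I1 alpha^2 - I2 alpha + I3
alpha = 1.3
char_poly_ok = np.isclose(np.linalg.det(A - alpha * I),
                          -alpha**3 + I1 * alpha**2 - I2 * alpha + I3)

# Cayley-Hamilton: A^3 = I1 A^2 - I2 A + I3 I
cayley_hamilton_ok = np.allclose(A @ A @ A, I1 * (A @ A) - I2 * A + I3 * I)

# invariants of a (x) b: I1 = a . b, while I2 = I3 = 0 since a (x) b has rank one
a, b = rng.standard_normal((2, 3))
D = np.outer(a, b)
dyad_ok = (np.isclose(np.trace(D), np.dot(a, b))
           and np.isclose(0.5 * (np.trace(D) ** 2 - np.trace(D @ D)), 0.0)
           and np.isclose(np.linalg.det(D), 0.0))
```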
Chapter 4

Characterizing Symmetry: Groups of Linear Transformations.

In this chapter we touch briefly on the question of characterizing symmetry by linear transformations. Linear transformations are mappings of vector spaces into vector spaces. When an object is mapped using a linear transformation, certain transformations preserve its symmetry while others don't. One way in which to characterize the symmetry of an object is to consider the collection of all linear transformations that preserve its symmetry. The set of such transformations depends on the object: for example, the set of linear transformations that preserve the symmetry of a cube is different to the set of linear transformations that preserve the symmetry of a tetrahedron.

Intuitively, a "uniform all-around expansion", i.e. a linear transformation of the form αI that rescales the object by changing its size but not its shape, does not affect symmetry. We are interested in other linear transformations that also preserve symmetry, principally rotations and reflections. In this chapter we shall consider those linear transformations that map the object back into itself. The collection of such transformations has certain important and useful properties.
4.1 An example in two dimensions.

We begin with an illustrative example. Consider a square ABCD, which lies in a plane normal to the unit vector k, whose center is at the origin o and whose sides are parallel to the orthonormal vectors {i, j}; see Figure 4.1.

Figure 4.1: Mapping a square into itself.

Consider mappings that carry the square into itself. The vertex A can be placed in one of 4 positions. Once the location of A has been determined, the vertex B can be placed in one of 2 positions (allowing for reflections, or in just one position if only rotations are permitted). And once the locations of A and B have been fixed, there is no further flexibility and the locations of the remaining vertices are fixed. Thus there are a total of 4 × 2 = 8 symmetry-preserving transformations of the square, 4 of which are rotations and 4 of which are reflections.

In order to determine the rotations, we (a) identify the axes of rotational symmetry and then (b) determine the number of distinct rotations about each such axis. In the present case there is just 1 axis to consider, viz. the axis through o in the direction k, and we note that 0°, 90°, 180° and 270° rotations about this axis map the square back into itself. Thus the following 4 distinct rotations are symmetry transformations:

    I, Rk^{π/2}, Rk^{π}, Rk^{3π/2},

where we are using the notation introduced previously: Rn^{φ} is a right-handed rotation through an angle φ about the axis n. Let Gsquare denote the set consisting of these 4 symmetry-preserving rotations:

    Gsquare = { I, Rk^{π/2}, Rk^{π}, Rk^{3π/2} }.
This collection of linear transformations has two important properties. First, observe that the successive application of any two symmetries yields a third symmetry, i.e. if P1 and P2 are in Gsquare, then so is their product P1P2; for example, Rk^{π/2} Rk^{π} = Rk^{3π/2}, Rk^{3π/2} Rk^{3π/2} = Rk^{π}, etc. Second, observe that if P is any member of Gsquare, then so is its inverse P⁻¹; for example, (Rk^{π})⁻¹ = Rk^{π}, (Rk^{π/2})⁻¹ = Rk^{3π/2}, etc. As we shall see in Section 4.4, these two properties endow the set Gsquare with a certain special structure.

Next consider the rotation Rk^{π/2}, and observe that every element of the set Gsquare can be represented in the form (Rk^{π/2})^n for the integer choices n = 0, 1, 2, 3. Therefore we can say that the set Gsquare is "generated" by the element Rk^{π/2}.

Finally, observe that G'square = {I, Rk^{π}} is a subset of Gsquare, and that it too has the properties that if P1, P2 ∈ G'square then their product P1P2 is also in G'square, and if P ∈ G'square so is its inverse P⁻¹. We shall generalize all of this in Section 4.4.

4.2 An example in three dimensions.

Before considering some general theory, it is useful to consider the three-dimensional version of the previous problem.

Figure 4.2: Mapping a cube into itself.

Consider a cube whose center is at the origin o and whose
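The closure, inverse and generator properties of Gsquare can be verified directly with 2 × 2 rotation matrices. The sketch below is an added illustration, not part of the notes; the in-plane matrix of Rk^{π/2} is used as the single generator:

```python
import numpy as np

# R = rotation through pi/2 about k, restricted to the i-j plane
R = np.array([[0.0, -1.0],
              [1.0, 0.0]])

# G_square = { R^n : n = 0, 1, 2, 3 } -- generated by the single element R
G = [np.linalg.matrix_power(R, n) for n in range(4)]

def member(M, group):
    return any(np.allclose(M, P) for P in group)

closure = all(member(P1 @ P2, G) for P1 in G for P2 in G)    # P1 P2 in G
inverses = all(member(np.linalg.inv(P), G) for P in G)       # P^{-1} in G
generated = np.allclose(np.linalg.matrix_power(R, 4), np.eye(2))  # R has order 4
```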
edges are parallel to the orthonormal vectors {i, j, k}; see Figure 4.2. Consider mappings that carry the cube into itself. Consider a vertex A and its three adjacent vertices B, C, D. The vertex A can be placed in one of 8 positions. Once the location of A has been determined, the vertex B can be placed in one of 3 positions, and once the locations of A and B have been fixed, the vertex C can be placed in one of 2 positions (allowing for reflections, or in just one position if only rotations are permitted). Once the vertices A, B and C have been placed, the locations of the remaining vertices are fixed. Thus there are a total of 8 × 3 × 2 = 48 symmetry-preserving transformations of the cube, 24 of which are rotations and 24 of which are reflections.

In order to determine these rotations we again (a) identify all axes of rotational symmetry and then (b) determine the number of distinct rotations about each such axis. In the present case we see that, in addition to the identity transformation I itself, we have the following rotational transformations that preserve symmetry:

1. There are 3 axes that join the center of one face of the cube to the center of the opposite face, which we can take to be i, j, k (which in materials science are called the {100} directions); 90°, 180° and 270° rotations about each of these axes map the cube back into the cube. Thus the following 3 × 3 = 9 distinct rotations are symmetry transformations:

    Ri^{π/2}, Ri^{π}, Ri^{3π/2}, Rj^{π/2}, Rj^{π}, Rj^{3π/2}, Rk^{π/2}, Rk^{π}, Rk^{3π/2}.

2. There are 4 axes that join one vertex of the cube to the diagonally opposite vertex, which we can take to be i + j + k, i − j + k, i + j − k, i − j − k (which in materials science are called the {111} directions); 120° and 240° rotations about each of these axes map the cube back into the cube. Thus the following 4 × 2 = 8 distinct rotations are symmetry transformations:

    R_{i+j+k}^{2π/3}, R_{i+j+k}^{4π/3}, R_{i−j+k}^{2π/3}, R_{i−j+k}^{4π/3}, R_{i+j−k}^{2π/3}, R_{i+j−k}^{4π/3}, R_{i−j−k}^{2π/3}, R_{i−j−k}^{4π/3}.

3. Finally, there are 6 axes that join the center of one edge of the cube to the center of the diagonally opposite edge, which we can take to be i + j, i − j, i + k, i − k, j + k, j − k (which in materials science are called the {110} directions); 180° rotations about each of these axes map the cube back into the cube. Thus the following 6 × 1 = 6 distinct rotations are symmetry transformations:

    R_{i+j}^{π}, R_{i−j}^{π}, R_{i+k}^{π}, R_{i−k}^{π}, R_{j+k}^{π}, R_{j−k}^{π}.

Let Gcube denote the collection of these 24 symmetry-preserving rotations:

    Gcube = { I;  Ri^{π/2}, Ri^{π}, Ri^{3π/2}, Rj^{π/2}, Rj^{π}, Rj^{3π/2}, Rk^{π/2}, Rk^{π}, Rk^{3π/2};
              R_{i±j±k}^{2π/3}, R_{i±j±k}^{4π/3};  R_{i+j}^{π}, R_{i−j}^{π}, R_{i+k}^{π}, R_{i−k}^{π}, R_{j+k}^{π}, R_{j−k}^{π} }.    (4.1)

If one considers rotations and reflections, then there are 48 elements in this set, where the 24 reflections are obtained by multiplying each rotation by −I. (It is important to remark that this just happens to be true for the cube, but it is not generally true: if R is a rotational symmetry of an object, then −R is, of course, a reflection, but it need not describe a reflectional symmetry of the object; see the example of the tetrahedron discussed later.)

The collection of linear transformations Gcube has two important properties that one can verify: (i) if P1 and P2 ∈ Gcube, then their product P1P2 is also in Gcube; and (ii) if P ∈ Gcube, then so is its inverse P⁻¹. (One way in which to verify this is to use the representation of a rotation tensor determined in Example 2.18.)

Next, one can verify that every element of the set Gcube can be represented in the form (Ri^{π/2})^p (Rj^{π/2})^q (Rk^{π/2})^r for integer choices of p, q, r. For example, a rotation about a {111} axis and a rotation about a {110} axis admit the representations

    R_{i+j+k}^{2π/3} = Ri^{π/2} Rj^{π/2},    R_{i+k}^{π} = (Ri^{π/2})² Rj^{π/2}.

Therefore we can say that the set Gcube is "generated" by the three elements Ri^{π/2}, Rj^{π/2} and Rk^{π/2}.
3) for some 3 × 3 matrix [M ] whose elements Mij are integers and where det[M ] = 1. . and if P ∈ Glattice so is its inverse P−1 .72 CHAPTER 4. 3 1 . P2 ∈ Glattice then their product P1 P2 is also in Glattice .3: A twodimensional square lattice with lattice vectors 1. 2 . 2. 3} = {x  x = o + n=1 ni i . a Bravais lattice L{o. 3 }: L{o.3 shows a twodimensional square lattice and one possible set of lattice vectors 1. is an inﬁnite set of periodically arranged points in space generated by the 2 . A geometric structure of particular interest in solid mechanics is a lattice and we now make a few observations on the symmetry of lattices. Given a lattice. 2.3 Lattices. SYMMETRY: GROUPS OF LINEAR TRANSFORMATIONS 4. translation of a single point o through three linearly independent lattice vectors { 1 .) a a j 2 i 1 Figure 4. One can show that if P1 . (It is clear from the ﬁgure that diﬀerent sets of lattice vectors can correspond to the same lattice. The set Glattice is called the symmetry group of the lattice. Figure 4. n i ∈ Z } (4. 2. let Glattice be the set of all linear transformations P that map the lattice back into itself. The simplest lattice.2) where Z is the set of integers. 3 }. 1. It can be shown that a linear transformation P maps a lattice back into itself if and only if 3 P i = j=1 Mij j (4.
The set of rotations in Glattice is known as the point group of the lattice. For example, the point group of a simple cubic lattice is the set Gcube of 24 rotations given in (4.1). There are seven different types of symmetry that arise in Bravais lattices, viz. cubic, tetragonal, orthorhombic, monoclinic, triclinic, trigonal and hexagonal. Because, for example, a cubic lattice can be body-centered or face-centered, and so on, the number of different types of lattices is greater than seven.

4.4 Groups of Linear Transformations.

A collection G of nonsingular linear transformations is said to be a group of linear transformations if it possesses the following two properties:

    (i)  if P1 ∈ G and P2 ∈ G then P1P2 ∈ G, and
    (ii) if P ∈ G then P−1 ∈ G.

Note from this that the identity transformation I is necessarily a member of every group G. Clearly the three sets Gsquare, Gcube and Glattice encountered in the previous sections are groups.

One can show that each of the following sets of linear transformations forms a group:

    – the set of all orthogonal linear transformations,
    – the set of all proper orthogonal linear transformations,
    – the set of all unimodular linear transformations² (i.e. linear transformations with determinant equal to ±1), and
    – the set of all proper unimodular linear transformations (i.e. linear transformations with determinant equal to +1).

In general, a collection G′ of linear transformations is said to be a subgroup of a group G if (i) G′ ⊂ G and (ii) G′ is itself a group.

The generators of a group G are those elements P1, P2, . . . , Pn which, when they and their inverses are multiplied among themselves in various combinations, yield all the elements of the group. Generators of the groups Gsquare and Gcube were given previously.

² While the determinant of an orthogonal tensor is ±1, the converse is not necessarily true: there are unimodular tensors, e.g. P = I + α i ⊗ j, that are not orthogonal. Thus the unimodular group is not equivalent to the orthogonal group.
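The claim that a unimodular tensor need not be orthogonal (e.g. the shear P = I + α i ⊗ j mentioned above) is easy to confirm directly. A sketch, with the arbitrary choice α = 0.7:

```python
alpha = 0.7
P = [[1.0, alpha, 0.0],
     [0.0, 1.0,   0.0],
     [0.0, 0.0,   1.0]]   # components of P = I + alpha i (x) j

# P is upper triangular, so det P is the product of its diagonal entries.
detP = P[0][0]*P[1][1]*P[2][2]
assert detP == 1.0        # unimodular (in fact proper unimodular)

# But P^T P != I, so P is not orthogonal.
PtP = [[sum(P[k][i]*P[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
I3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
assert PtP != I3
```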
One can readily show that the group of proper orthogonal linear transformations is a subgroup of the group of orthogonal linear transformations, which in turn is a subgroup of the group of unimodular linear transformations. Similarly G′square is a subgroup of Gsquare.

It should be mentioned that the general theory of groups deals with collections of elements (together with certain "rules" including "multiplication") where the elements need not be linear transformations. For example, the set of all integers Z, with "multiplication" defined as the addition of numbers, the identity taken to be zero, and the inverse of x taken to be −x, is a group. Similarly the set of all matrices of the form

    [ cosh x   sinh x ]
    [ sinh x   cosh x ]    where −∞ < x < ∞,

with "multiplication" defined as matrix multiplication, the identity being the identity matrix, and the inverse being

    [ cosh(−x)   sinh(−x) ]
    [ sinh(−x)   cosh(−x) ],

can be shown to be a group. However, our discussion in these notes is limited to groups of linear transformations.

4.5 Symmetry of a scalar-valued function of symmetric positive-definite tensors.

When we discuss the constitutive behavior of a material in Volume 2, we will encounter a scalar-valued function ψ(C) defined for all symmetric positive-definite tensors C. (This represents the energy in the material and characterizes its mechanical response.) The symmetry of the material will be characterized by a set G of nonsingular tensors P which has the property that, for each P ∈ G,

    ψ(C) = ψ(PᵀCP)    for all symmetric positive-definite C.    (4.4)
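That the cosh/sinh matrices form a group follows from the addition formulas cosh(x+y) = cosh x cosh y + sinh x sinh y and sinh(x+y) = sinh x cosh y + cosh x sinh y, which give closure, and from H(x)H(−x) = I, which gives the inverse. A quick numerical check (a sketch; the parameter values are arbitrary):

```python
import math

def H(x):
    """The 2x2 matrix [[cosh x, sinh x], [sinh x, cosh x]]."""
    c, s = math.cosh(x), math.sinh(x)
    return ((c, s), (s, c))

def mul(A, B):
    return tuple(tuple(sum(A[i][k]*B[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

a, b = 0.3, -1.1
AB, C = mul(H(a), H(b)), H(a + b)
# Closure: H(a) H(b) = H(a+b).
assert all(abs(AB[i][j] - C[i][j]) < 1e-12 for i in range(2) for j in range(2))

# Inverse: H(x) H(-x) = identity.
E = mul(H(0.3), H(-0.3))
assert all(abs(E[i][j] - (1.0 if i == j else 0.0)) < 1e-12
           for i in range(2) for j in range(2))
```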
Thus if P1 and P2 are in G. (Are there any other tensors in G?) .5. Next. Let Qn be a rotation about the axis n through an arbitrary angle. It is seen trivially that for this ψ. ﬁrst let ψ(C) = ψ(PT CP1 ). if P ∈ G then −P ∈ G also.5)2 and (4. consider the function ψ(C) = ψ det C . equation (4.7) where n is a given ﬁxed unit vector. 1 ψ(C) = ψ(PT CP2 ).8) The symmetry group of the function (4. Then ψ((P1 P2 )T CP1 P2 ) = ψ(PT (PT CP1 )P2 ) = 2 1 ψ(PT CP1 ) = ψ(C) where we have used (4. SYMMETRY OF A SCALARVALUED FUNCTION 75 P1 .e. P2 ∈ G so that It can be readily shown that this set of tensors G is a group.4) gives ψ(S) = ψ(P−T SP−1 ) for all symmetric positivedeﬁnite S.6) as a consequence. To see this. Then since n is the axis of Qn we know that Qn n = n.4) is a group.4. n n (4. Substituting this into (4. 2 (4. the equation S = PT CP provides a onetoone relation between symmetric positive deﬁnite tensors C and S. Observe from (4. since (4. then so is P1 P2 . Thus the set G of nonsingular tensors obeying (4.4) that the symmetry group of ψ contains the elements I and −I. (4. Thus.7) therefore contains the set of all rotations about n. Thus the symmetry group of this ψ consists of all unimodular tensors ( i. tensors with determinant equal to ±1).5)1 in the penultimate and ultimate 1 steps respectively.4) holds for all symmetric positivedeﬁnite C. suppose that P ∈ G. it also holds for all symmetric positivedeﬁnite linear transformations S = PT CP. Therefore ψ(QT CQn ) = ψ QT CQn n · n = ψ CQn n · Qn n = ψ Cn · n = ψ(C).4) holds if and only if det P = ±1.5) for all symmetric positivedeﬁnite C. Since P is nonsingular. As a second example consider the function ψ(C) = ψ Cn · n (4. and so P−1 is also in G. we shall refer to it as the symmetry group of ψ. and To examine an explicit example.
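The transversely isotropic example ψ(C) = ψ̂(Cn · n) lends itself to a direct numerical check: since Qₙn = n for any rotation Qₙ about n, the value of Cn · n is unchanged under C → QₙᵀCQₙ. A sketch (the particular C and the angles are arbitrary choices):

```python
import math

n = (0.0, 0.0, 1.0)                        # the fixed unit vector, taken as e3

C = ((4.0, 1.0, 0.5),                      # an arbitrary symmetric positive-definite C
     (1.0, 3.0, 0.2),
     (0.5, 0.2, 2.0))

def mul(A, B):
    return tuple(tuple(sum(A[i][k]*B[k][j] for k in range(3))
                       for j in range(3)) for i in range(3))

def transpose(A):
    return tuple(tuple(A[j][i] for j in range(3)) for i in range(3))

def Qn(t):
    """Rotation through angle t about n = e3."""
    c, s = math.cos(t), math.sin(t)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))

def psi(C):                                # psi(C) = Cn . n  (= C33 for n = e3)
    return sum(C[i][j]*n[j]*n[i] for i in range(3) for j in range(3))

for t in (0.3, 1.7, 4.0):                  # arbitrary rotation angles
    Q = Qn(t)
    QtCQ = mul(transpose(Q), mul(C, Q))
    assert abs(psi(QtCQ) - psi(C)) < 1e-12
```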
if H is a spherical tensor. Thus for an isotropic function ψ. that there exists a function ψ such that ψ(C) = ψ I1 (C). I3 (C) (4.38).9) If G1 and G2 are the symmetry groups of ψ1 and ψ2 respectively. I2 (C). From a theorem in algebra it follows that an isotropic function ψ depends on C only through its principal scalar invariants deﬁned previously in (3. then G1 = G2 .13) . This motivates the following slight modiﬁcation to our original notion of symmetry of a function ψ(C). and consider two functions ψ1 (C) and ψ2 (C).11) It can be readily shown that this set of tensors G is also a group. necessarily a subgroup of the unimodular group. if H = αI.e. then it can be shown that G2 = HG1 H−1 of this. together with the special case of the result noted in the preceding paragraph provides a hint of why we might want to limit attention to unimodular tensors rather than consider all nonsingular tensors in our discussion of symmetry. A function ψ is said to be isotropic if its symmetry group G contains all orthogonal ψ(C) = ψ(PT CP) (4. Suppose that ψ1 and ψ2 are related by ψ2 (C) = ψ1 (HT CH) for all symmetric positive − deﬁnite tensors C.76 CHAPTER 4.10) in the sense that a tensor P ∈ G1 if and only if the tensor HPH−1 ∈ G2 . (4. As a special case Next. for all symmetric positivedeﬁnite C and all orthogonal P. ψ(C) = ψ(PT CP) for all symmetric positive − deﬁnite C.12) tensors. SYMMETRY: GROUPS OF LINEAR TRANSFORMATIONS The following result will be useful in Volume 2. note that any nonsingular tensor P can be written as the product of a spherical tensor αI and a unimodular tensor T as P = (αI)T provided that we take α = ( det P)1/3 since then det T = ±1. Let H be some ﬁxed nonsingular linear transformation. We characterize the symmetry of ψ by the set G of unimodular tensors P which have the property that. i. This. each deﬁned for all symmetric positivedeﬁnite tensors C. for each P ∈ G. (4.e. i. (4.
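The statement that an isotropic ψ can depend on C only through I1(C), I2(C), I3(C) rests on these invariants being unchanged under C → QᵀCQ for orthogonal Q, and that much can be checked directly. A sketch (the particular C is arbitrary; Q is the cyclic permutation e1 → e2 → e3 → e1, a rotation about i + j + k):

```python
def mul(A, B):
    return tuple(tuple(sum(A[i][k]*B[k][j] for k in range(3))
                       for j in range(3)) for i in range(3))

def transpose(A):
    return tuple(tuple(A[j][i] for j in range(3)) for i in range(3))

def I1(C):                      # trace C
    return C[0][0] + C[1][1] + C[2][2]

def I2(C):                      # 1/2 [ (trace C)^2 - trace(C^2) ]
    return 0.5*(I1(C)**2 - I1(mul(C, C)))

def I3(C):                      # det C
    return (C[0][0]*(C[1][1]*C[2][2] - C[1][2]*C[2][1])
          - C[0][1]*(C[1][0]*C[2][2] - C[1][2]*C[2][0])
          + C[0][2]*(C[1][0]*C[2][1] - C[1][1]*C[2][0]))

C = ((4.0, 1.0, 0.5), (1.0, 3.0, 0.2), (0.5, 0.2, 2.0))   # symmetric positive definite
Q = ((0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))   # orthogonal, det Q = +1

QtCQ = mul(transpose(Q), mul(C, Q))
for inv in (I1, I2, I3):
    assert abs(inv(QtCQ) - inv(C)) < 1e-12
```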
0 < φ < 2π.4. according to a i j k theorem in algebra (see pg 312 of Truesdell and Noll). i6 (C). i4 (C). If G contains I and all rotations Rφ .6 Worked Examples. and contains 24 rotations and 24 reﬂections. C22 C33 + C33 C11 + C11 C22 C11 C22 C33 2 2 2 C23 + C31 + C12 2 2 2 2 2 2 C31 C32 + C12 C23 + C23 C31 C23 C31 C12 2 2 2 2 2 2 C22 C12 + C33 C31 + C33 C23 + C11 C12 + C11 C31 + C22 C23 2 2 2 2 2 2 C11 C31 C12 + C22 C12 C23 + C33 C23 C31 2 2 2 C23 C22 C33 + C31 C33 C11 + C12 C11 C22 (4. 77 (4. −Rπ which represent reﬂections in the i j k planes normal to i.16) n. R . As noted previously. including both rotations and reﬂections. through all angles φ about a ﬁxed axis n If G includes the three elements −Rπ .14) with the set of 24 rotations Gcube given in (4.1: Characterize the set Hsquare of linear transformations that map a square back into a square. j and k. I3 (C) = det C. ψ(C) = ψ i1 (C). i8 (C). .6. 4. i7 (C). where I1 (C) = trace C I2 (C) = 1/2 [(trace C)2 − trace (C2 )] . i2 (C).15) (4. R and −I. this group is generated by π/2 π/2 π/2 As a second example consider “cubic symmetry” where the symmetry group G coincides . −Rπ . Example 4. the corresponding symmetry is called transverse isotropy. i5 (C).1) plus the corresponding reﬂections obtained R by multiplying these rotations by −I. i3 (C). WORKED EXAMPLES. i9 (C) where i1 (C) i2 (C) i3 (C) i4 (C) i5 (C) i6 (C) i7 (C) i8 (C) i9 (C) = = = = = = = = = C11 + C22 + C33 . Then. the symmetry is called orthotropy.
1 and now consider the set rotations and reﬂections Hsquare that map the square back into itself. D. 1. V} . D} . k π/2 D=R π/2 k H. k k k k k D = (R π/2 3 π/2 k ) H. Solution: All elements of Hsquare can be represented in the form (R )i Hj for integer choices of i = 0. Rπ . D . Rπ } . e. . The set of rotations that do this were determined earlier and they are π/2 3π/2 Gsquare = {I. {I. then so is its inverse. Therefore the group Hsquare is generated by the two elements H and Rπ/2 . Rπ . R = (R )3 .g. = reﬂection in the diagonal with negative slope −i + j.2: Find the generators of Hsquare and all subgroups of Hsquare . H. {I. I. R π/2 = reﬂection in the horizontal axis i. Rπ/2 . D . V. 3 k and j = 0. e. k k k As the 4 reﬂectional symmetries we can pick H V D D and so Hsquare = I. Example 4. then so is their product P1 P2 . Solution: We return to the problem describe in Section 4. V. {I. R }. I = (R )4 . {I. I. k . Rπ . (i) One can verify that Hsquare is a group since it possesses the property that if P1 and P2 are two transfor3π/2 3π/2 H. R k 3π/2 k . R . V = (R )2 H. {I.g. = reﬂection in the vertical axis j.4: Mapping a square into itself. H = H etc. Rπ } .78 CHAPTER 4. 1: π/2 π/2 π/2 3π/2 Rπ = (R )2 . D = HR etc. And if P is any member mations in G. D . = reﬂection in the diagonal with positive slope i + j. SYMMETRY: GROUPS OF LINEAR TRANSFORMATIONS j D C i A B Figure 4. D. R3π/2 . 2. H. D = R k k −1 of G. H} . Rπ . One can verify that the following 8 collections of linear transformations are subgroups of Hsquare : I.
the second leaves a diagonal invariant. 2. Now consider the example of a twodimensional a × a . three orthonormal vectors {i. Thus these 4 × 2 = 8 distinct rotations – of the form R . Example 4. etc.4: Are all symmetry preserving linear transformations necessarily either rotations or reﬂections? Solution: We began this chapter by considering the symmetry of a square. The group Gtetrahedron of rotational symmetries of a tetrahedron therefore consists of these 11 rotations plus the identity transformation I. The ﬁrst leaves the face invariant. Thus these 3 × 1 = 3 distinct rotations – of the form Rπ . j. k A B p D j C i Figure 4. Example 4. There are three axes like p shown in the ﬁgure that pass through the midpoints of a pair of opposite edges. while the unit vector p passes through the center of the edge AD and the center of the opposite edge BC.5: A regular tetrahedron ABCD. and righthanded rotations of 120o and 240o about each of these axes maps the 2π/3 4π/3 tetrahedron back onto itself. 79 Geometrically. the third leaves the axis invariant. each of these subgroups leaves some aspect of the square invariant. k k – are symmetry transformations of the tetrahedron. There are 4 axes like k in the ﬁgure that pass through a vertex of the tetrahedron and the centroid of the opposite face. There are no other subgroups of Hsquare . The axis k passes through the vertex A and the centroid of the opposite face BCD. and a righthanded rotation through 180o about each of these axes maps the tetrahedron back onto itself.4.6. – are symmetry p transformations of the tetrahedron. the fourth leaves an axis and a diagonal invariant etc.R . k} and a unit vector p. etc. and examining the diﬀerent ways in which the square could be mapped back into itself.3: Characterize the rotational symmetry of a regular tetrahedron. Solution: 1. WORKED EXAMPLES.
6: Show that the group of proper orthogonal tensors is a subgroup of the group of orthogonal tensors. show that there is a function ψ such that ψ(C) = ψ I1 (C). I3 (C) . Show that ψ depends on C only through its principal scalar invariants. if for every integer n. There are however other transformations. the “shearing” of the lattice described by the linear transformation P = I + ai ⊗ j (ii) is also a symmetry preserving transformation.e. that are neither rotations nor reﬂections.e. n1 . and all proper unimodular tensors (i. a a j 2 i 1 Figure 4.5: Show that each of the following sets of linear transformations forms a group: all orthogonal tensors. i. that also leave the lattice invariant. Thus.e.6. all unimodular tensors (i. For example.6: A twodimensional × square lattice. which in turn is a subgroup of the group of unimodular tensors.e. the set of inﬁnite points Lsquare = {x  x = n1 ai + n2 aj.80 CHAPTER 4. i. tensors with determinant equal to ±1).7: Suppose that a function ψ(C) is deﬁned for all symmetric positive deﬁnite tensors C and that its symmetry group is the set of all orthogonal tensors. n2 ∈ Z ≡ Integers} (i) depicted in Figure 4. and examine the diﬀerent ways in which this lattice can be mapped back into itself. Example 4. Example 4. I2 (C). We ﬁrst note that the rotational and reﬂectional symmetry transformations of a a × a square are also symmetry transformations for the lattice since they leave the lattice invariant. all proper orthogonal tensors. one recovers the original lattice. one rigidly translates the nth row of the lattice by precisely the amount n in the i direction. tensors with determinant equal to +1). Example 4. SYMMETRY: GROUPS OF LINEAR TRANSFORMATIONS square lattice.
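Stepping back to Example 4.3: the count of rotational symmetries of the regular tetrahedron (the identity plus the 11 rotations found there, 12 in all) can be confirmed by brute force. The sketch below assumes the tetrahedron inscribed in the cube [−1, 1]³, with vertices at alternate cube corners; the 48 signed permutation matrices are exactly the orthogonal maps of that cube, and we keep the rotations (det = +1) that map the vertex set onto itself:

```python
from itertools import permutations, product

verts = {(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)}   # tetrahedron vertices

def apply(P, v):
    return tuple(sum(P[i][j]*v[j] for j in range(3)) for i in range(3))

def det(P):
    return (P[0][0]*(P[1][1]*P[2][2] - P[1][2]*P[2][1])
          - P[0][1]*(P[1][0]*P[2][2] - P[1][2]*P[2][0])
          + P[0][2]*(P[1][0]*P[2][1] - P[1][1]*P[2][0]))

rotations = []
for perm, signs in product(permutations(range(3)), product((1, -1), repeat=3)):
    # Signed permutation matrix: row i has entry signs[i] in column perm[i].
    P = tuple(tuple(signs[i] if j == perm[i] else 0 for j in range(3))
              for i in range(3))
    if det(P) == 1 and {apply(P, v) for v in verts} == verts:
        rotations.append(P)

print(len(rotations))   # 12, consistent with Gtetrahedron in Example 4.3
```

Note that the remaining 12 cube rotations carry this tetrahedron onto the complementary one, which is why −R need not be a symmetry here, as remarked earlier.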
38). Since each set of basis vectors is orthonormal. there is an orthogonal tensor R that (1) (1) (1) (2) (2) (2) carries {e1 .40) between principal invariants and eigenvalues is onetoone. (1) (1) (2) (iii) where the two sets of orthonormal vectors {e1 . e2 . WORKED EXAMPLES. I1 (C1 ) = I1 (C2 ). 3. Example 4. i = 1. e2 . (2) i = 1. one has f (x) = f (Px) for all vectors x.4. Thus we can write 3 3 I2 (C1 ) = I2 (C2 ). (ii) C1 = i=1 λi ei (1) ⊗ ei . (v) (1) (1) and so RT C2 R = C1 . i) Show that G is a group. (iv) RT i=1 λi ei (2) ⊗ ei (2) R= i=1 λi RT (ei ⊗ ei )R = (2) (2) i=1 λi (RT ei ) ⊗ (RT ei ) = (2) (2) i=1 λi (ei ⊗ (ei ). 2. e3 } are the respective principal bases of C1 and C2 . 3. for all vectors x. Let G be the set of all nonsingular linear transformations P that have the property that for each P ∈ G. This establishes the desired result. (i) In order to prove the desired result it is suﬃcient to show that. e3 } and {e1 . Recall that the mapping (3. 2. Solution: i) Suppose that P1 and P2 are in G. then ψ(C1 ) = ψ(C2 ). (1) (1) (1) C2 = i=1 (1) λi ei (2) ⊗ ei . i. e3 }: Rei Thus 3 3 3 3 (1) (1) = ei .6.8: Consider a scalarvalued function f (x) that is deﬁned for all vectors x. e3 } into {e1 . if C1 and C2 are two symmetric tensors whose principal invariants Ii are the same.e. ii) Find the most general form of f if G contains the set of all orthogonal transformations. are the principal scalar invariants of C deﬁned previously in (3. 81 Solution: We are given that ψ has the property that for all symmetric positivedeﬁnite tensors C and all orthogonal tensors Q ψ(C) = ψ(QT CQ). where Ii (C). that f (x) f (x) = = f (P1 x) f (P2 x) for all vectors x. e2 . Therefore ψ(C1 ) = ψ(RT C2 R) = ψ(C2 ) where in the last step we have used (i). and (i) . It follows from this and (ii) that the eigenvalues of C1 and C2 are the same. e2 . I3 (C1 ) = I3 (C2 ).
I3 (C1 ) = I3 (C2 ).9: Consider a scalarvalued function g(C. If G contains the set of all orthogonal transformations. Example 4. show that there exists a function g such that g(C. Next. suppose that P ∈ G so that f (x) = f (Px) for all vectors x. I3 (C) are the three fundamental scalar invariants of C and I4 (C. n) = C2 n · n. there is a rotation tensor R that carries x2 to x1 : Rx2 = x1 . This establishes the result claimed above. we will show that f (x1 ) = f (x2 ). n) = Cn · n. it is suﬃcient to show that if C1 and C2 are two symmetric positive deﬁnite linear transformations whose “invariants” Ii . n ⊗ n) = g(PT CP. I5 (C1 . i = 1. n). I4 (C1 .e. 2. e3 } where e3 = n one has I4 = C33 and 2 2 2 I5 = C31 + C32 + C33 . Remark: Observe that with respect to an orthonormal basis {e1 . there exists a function f such that f (x) = f (x) for all vectors x.7. m⊗m) that is deﬁned for all symmetric positivedeﬁnite tensors C and all unit vectors m. PT (n ⊗ n)P) for all symmetric positivedeﬁnite tensors C and some particular unit vector n.e. As in Example 4. QT (n ⊗ n)Q) (i) for all orthogonal Q and all symmetric positive deﬁnite C. Since P is nonsingular we can set y = Px and obtain f (P−1 y) = f (y) for all vectors y. n ⊗ n) = g I1 (C). that f (x) = f (Px) for all vectors x and all orthogonal P. n ⊗ n) = g(QT CQ. I4 (C. ii) If x1 and x2 are two vectors that have the same length. I2 (C).e. I3 (C). where in the last step we have used the fact that G contains the set of all orthogonal transformations. I1 (C1 ) = I1 (C2 ). i. I5 (C. 5” are the same. 4. I2 (C). n). i. If x1 and x2 are two vectors that have the same length. Therefore f (x1 ) = f (Rx2 ) = f (x2 ). Let G be the set of all nonsingular linear transformations P that have the property that for each P ∈ G. It thus follows that G has the two deﬁning properties of a group. I2 (C1 ) = I2 (C2 ). n) where I1 (C). Solution: We are told that g(C. one has g(C. i. 
SYMMETRY: GROUPS OF LINEAR TRANSFORMATIONS f (P1 P2 )x = f P1 (P2 x) = f (P2 x) = f (x) where in the penultimate and ultimate steps we have used (i)1 and (i)2 respectively. n) (ii) . I5 (C. whence f (x) depends on x only through its length x. e2 .82 Then CHAPTER 4. 3. n) = I5 (C2 . n) = I4 (C2 .
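In Example 4.8(ii), the step that needs an orthogonal tensor carrying x2 to x1 whenever |x1| = |x2| can be made concrete with a Householder reflection R = I − 2(v ⊗ v)/(v · v), v = x1 − x2 — a reflection rather than the rotation used in the text, but orthogonality is all the argument requires. A sketch with an arbitrary pair of equal-length vectors:

```python
x1 = (1.0, 2.0, 2.0)          # |x1| = 3
x2 = (3.0, 0.0, 0.0)          # |x2| = 3

v = tuple(a - b for a, b in zip(x1, x2))
vv = sum(c*c for c in v)

# Householder reflection R = I - 2 v (x) v / |v|^2.
R = tuple(tuple((1.0 if i == j else 0.0) - 2.0*v[i]*v[j]/vv
                for j in range(3)) for i in range(3))

# R carries x2 to x1 ...
Rx2 = tuple(sum(R[i][j]*x2[j] for j in range(3)) for i in range(3))
assert all(abs(Rx2[i] - x1[i]) < 1e-12 for i in range(3))

# ... and R is orthogonal (indeed R = R^T with R^2 = I).
RtR = tuple(tuple(sum(R[k][i]*R[k][j] for k in range(3))
                  for j in range(3)) for i in range(3))
assert all(abs(RtR[i][j] - (1.0 if i == j else 0.0)) < 1e-12
           for i in range(3) for j in range(3))
```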
then g(C1, n ⊗ n) = g(C2, n ⊗ n).

From (ii)1,2,3 and the analysis in Example 4.7 it follows that there is an orthogonal tensor R such that RᵀC2R = C1. It now follows from this, the fact that R is orthogonal, (ii)4,5 and the definitions of I4 and I5 that

    Rn · Rn = n · n,    C2Rn · Rn = C2n · n,    C2²Rn · Rn = C2²n · n,    (iii)

and this must hold for all symmetric positive-definite C2. This implies that Rn = ±n, and consequently Rᵀn = ±n, as may be seen, for example, by expressing (iii) in a principal basis of C2. It is readily seen from this that Rᵀ(n ⊗ n)R = n ⊗ n as well. Consequently

    g(C1, n ⊗ n) = g(RᵀC2R, Rᵀ(n ⊗ n)R) = g(C2, n ⊗ n),

where we have used (i) in the very last step. This establishes the desired result.

REFERENCES

1. M.A. Armstrong, Groups and Symmetry, Springer-Verlag, 1988.
2. G. Birkhoff and S. MacLane, A Survey of Modern Algebra, MacMillan, 1977.
3. A.J.M. Spencer, Theory of invariants, in Continuum Physics, Volume I, edited by A.C. Eringen, Academic Press, 1971.
4. C. Truesdell and W. Noll, The nonlinear field theories of mechanics, in Handbuch der Physik, Volume III/3, edited by S. Flugge, Springer-Verlag, 1965.
Chapter 5

Calculus of Vector and Tensor Fields

Notation:

    α — scalar
    {a} — 3 × 1 column matrix
    a — vector
    ai — i-th component of the vector a in some basis, or i-th element of the column matrix {a}
    [A] — 3 × 3 square matrix
    A — second-order tensor (2-tensor)
    Aij — i, j component of the 2-tensor A in some basis, or i, j element of the square matrix [A]
    C — fourth-order tensor (4-tensor)
    Cijkl — i, j, k, l component of the 4-tensor C in some basis
    Ti1 i2 ... in — i1 i2 . . . in component of the n-tensor T in some basis

5.1 Notation and definitions.

Let R be a bounded region of three-dimensional space whose boundary is denoted by ∂R, and let x denote the position vector of a generic point in R + ∂R. We shall consider scalar and tensor fields such as φ(x), v(x), A(x) and T(x) defined on R + ∂R. The components will always be taken with respect to a single fixed orthonormal basis {e1, e2, e3}. The region R + ∂R and these fields will always be assumed to be sufficiently regular so as to permit the calculations carried out below.

While the subject of the calculus of tensor fields can be dealt with directly, we shall take the more limited approach of working with the components of these fields. Each component of, say, a vector field v(x) or a 2-tensor field A(x) is effectively a scalar-valued function on three-dimensional space, e.g. vi(x1, x2, x3) and Aij(x1, x2, x3), where vi and xi are the i-th components of the vectors v and x in the basis {e1, e2, e3}, and we can use the well-known operations of classical calculus on such fields, such as partial differentiation with respect to xk. In order to simplify writing, we shall use the notation that a comma followed by a subscript denotes partial differentiation with respect to the corresponding x-coordinate. Thus, for example, we will write

    φ,i = ∂φ/∂xi,    vi,j = ∂vi/∂xj,    φ,ij = ∂²φ/(∂xi ∂xj),    (5.1)

and so on.

The gradient of a scalar field φ(x) is a vector field denoted by grad φ (or ∇φ). Its i-th component in the orthonormal basis is

    (grad φ)i = φ,i,    so that    grad φ = φ,i ei.    (5.2)

The gradient of a vector field v(x) is a 2-tensor field denoted by grad v (or ∇v). Its ij-th component in the orthonormal basis is

    (grad v)ij = vi,j,    so that    grad v = vi,j ei ⊗ ej.    (5.3)

The gradient of a scalar field φ in the particular direction of the unit vector n is denoted by ∂φ/∂n and defined by

    ∂φ/∂n = ∇φ · n.    (5.4)

The divergence of a vector field v(x) is a scalar field denoted by div v (or ∇ · v). It is given by

    div v = vi,i.    (5.5)

The divergence of a 2-tensor field A(x) is a vector field denoted by div A (or ∇ · A). Its i-th component in the orthonormal basis is

    (div A)i = Aij,j,    so that    div A = Aij,j ei.    (5.6)

The curl of a vector field v(x) is a vector field denoted by curl v (or ∇ × v). Its i-th component in the orthonormal basis is

    (curl v)i = eijk vk,j,    so that    curl v = eijk vk,j ei.    (5.7)

The Laplacians of a scalar field φ(x), a vector field v(x) and a 2-tensor field A(x) are the scalar, vector and 2-tensor fields with components

    ∇²φ = φ,kk,    (∇²v)i = vi,kk,    (∇²A)ij = Aij,kk.    (5.8)

5.2 Integral theorems

Let D be an arbitrary regular subregion of the region R. The divergence theorem allows one to relate a surface integral on ∂D to a volume integral on D. In particular, for a scalar field φ(x),

    ∫∂D φ n dA = ∫D ∇φ dV    or    ∫∂D φ nk dA = ∫D φ,k dV.    (5.9)

Likewise, for a vector field v(x) one has

    ∫∂D v · n dA = ∫D ∇ · v dV    or    ∫∂D vk nk dA = ∫D vk,k dV,    (5.10)

as well as

    ∫∂D v ⊗ n dA = ∫D ∇v dV    or    ∫∂D vi nk dA = ∫D vi,k dV.    (5.11)

More generally, for an n-tensor field T(x) the divergence theorem gives

    ∫∂D Ti1 i2 ... in nk dA = ∫D ∂(Ti1 i2 ... in)/∂xk dV,    (5.12)

where some of the subscripts i1, i2, . . . , in may be repeated and one of them might equal k.
5.3 Localization

Certain physical principles are described to us in terms of equations that hold on an arbitrary portion of a body, i.e. in terms of an integral over a subregion D of R. It is often useful to derive an equivalent statement of such a principle in terms of equations that must hold at each point x in the body. In what follows, we shall frequently have need to do this, i.e. to convert a "global principle" to an equivalent "local field equation".

Consider for example a scalar field φ(x) that is defined and continuous at all x ∈ R + ∂R, and suppose that

    ∫D φ(x) dV = 0    for all subregions D ⊂ R.    (5.13)

We will show that this "global principle" is equivalent to the "local field equation"

    φ(x) = 0    at every point x ∈ R.    (5.14)

Figure 5.1: The region R, a subregion D and a neighborhood Bε(z) of the point z.

We will prove this by contradiction. Suppose that (5.14) does not hold. This implies that there is a point, say z ∈ R, at which φ(z) ≠ 0. Suppose that φ is positive at this point: φ(z) > 0. Since we are told that φ is continuous, φ is necessarily (strictly) positive in some neighborhood of z as well. Let Bε(z) be a sphere with its center at z and radius ε > 0. We can always choose ε sufficiently small so that Bε(z) lies within this neighborhood of z and therefore

    φ(x) > 0    at all x ∈ Bε(z).    (5.15)

Now pick a region D which is a subset of Bε(z). Then φ(x) > 0 for all x ∈ D, and integrating φ over this D gives

    ∫D φ(x) dV > 0,    (5.16)

thus contradicting (5.13). An entirely analogous calculation can be carried out in the case φ(z) < 0. Thus our starting assumption must be false and (5.14) must hold.

5.4 Worked Examples.
In all of the examples below the region R will be a bounded regular region and its boundary ∂R will be smooth. All fields are defined on this region and are as smooth as is necessary. In some of the examples below, we are asked to establish certain results for vector and tensor fields. When it is more convenient, we will carry out our calculations by first picking and fixing a basis, and then working with the components in that basis. If necessary, we will revert back to the vector and tensor fields at the end. We shall do this frequently in what follows and will not bother to explain this strategy each time.
Example 5.1: Calculate the gradient of the scalarvalued function φ(x) = Ax · x where A is a constant 2tensor. Solution: Writing φ in terms of components φ = Aij xi xj . Calculating the partial derivative of φ with respect to xk yields φ,k = Aij (xi xj ),k = Aij (xi,k xj + xi xj,k ) = Aij (δik xj + xi δjk ) = Akj xj + Aik xi = (Akj + Ajk )xj or equivalently φ = (A + AT )x.
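Identities like that of Example 5.1 are easy to spot-check with central differences, since a central difference is exact (up to rounding) on a quadratic such as φ(x) = Ax · x. A sketch, with an arbitrary A and evaluation point x:

```python
A = ((1.0, 2.0, 0.0),
     (0.5, -1.0, 3.0),
     (2.0, 0.0, 1.5))          # components of an arbitrary constant 2-tensor

def phi(x):                     # phi(x) = Ax . x = A_ij x_i x_j
    return sum(A[i][j]*x[i]*x[j] for i in range(3) for j in range(3))

x = (0.7, -1.2, 0.4)
h = 1e-6

# Central-difference approximation to (grad phi)_k.
grad_fd = []
for k in range(3):
    xp = tuple(x[i] + (h if i == k else 0.0) for i in range(3))
    xm = tuple(x[i] - (h if i == k else 0.0) for i in range(3))
    grad_fd.append((phi(xp) - phi(xm)) / (2*h))

# The closed form from Example 5.1: grad phi = (A + A^T) x.
grad_exact = tuple(sum((A[i][j] + A[j][i])*x[j] for j in range(3)) for i in range(3))
assert all(abs(grad_fd[i] - grad_exact[i]) < 1e-6 for i in range(3))
```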
Example 5.2: Let v(x) be a vector ﬁeld and let vi (x1 , x2 , x3 ) be the ith component of v in a ﬁxed orthonormal basis {e1 , e2 , e3 }. For each i and j deﬁne Fij = vi,j . Show that Fij are the components of a 2tensor. Solution: Since v and x are 1tensors, their components obey the transformation rules vi = Qik vk , vi = Qki vk Therefore Fij = and xj = Qjk xk , x = Qj xj
∂vi ∂vi ∂x ∂vi ∂(Qik vk ) ∂vk = = Qj = Qj = Qik Qj = Qik Qj Fk , ∂xj ∂x ∂xj ∂x ∂x ∂x
which is the transformation rule for a 2tensor.
Example 5.3: If φ(x), u(x) and A(x) are a scalar, vector and 2tensor ﬁelds respectively. Establish the identities a. div (φu) = u · grad φ + φ div u
90
CHAPTER 5. CALCULUS OF VECTOR AND TENSOR FIELDS
b. grad (φu) = u ⊗ grad φ + φ grad u c. div (φA) = A grad φ + φ div A
Solution: a. In terms of components we are asked to show that (φui ),i = ui φ,i + φ ui,i . This follows immediately by expanding (φui ),i using the chain rule. b. In terms of components we are asked to show that (φui ),j = ui φ,j + φ ui,j . Again, this follows immediately by expanding (φui ),j using the chain rule. c. In terms of components we are asked to show that (φAij ),j = Aij φ,j + φ Aij,j . Again, this follows immediately by expanding (φAij ),j using the chain rule.
Example 5.4: If φ(x) and v(x) are a scalar and vector ﬁeld respectively, show that × (φv) = φ( × v) − v × φ (i) × u = eijk uk,j ei where ei is a ﬁxed ×v+ φ×v (ii)
Solution: Recall that the curl of a vector ﬁeld u can be expressed as basis vector. Thus evaluating × (φv):
× (φv) = eijk (φvk ),j ei = eijk φ vk,j ei + eijk φ,j vk ei = φ from which the desired result follows because a × b = −b × a.
Example 5.5: Let u(x) be a vector ﬁeld and deﬁne a second vector ﬁeld ξ(x) by ξ(x) = curl u(x). Show that a. · ξ = 0; uT )a = ξ × a for any vector ﬁeld a(x); and u· u− u· uT × u can be expressed as (i)
b. ( u − c. ξ · ξ =
Solution: Recall that in terms of its components, ξ = curl u = ξi = eijk uk,j . a. A direct calculation gives
· ξ = ξi,i = (eijk uk,j ),i = eijk uk,ji = 0
(ii)
where in the last step we have used the fact that eijk is skewsymmetric in the subscripts i, j, and uk,ji is symmetric in the subscripts i, j (since the order of partial diﬀerentiation can be switched) and therefore their product vanishes.
5.4. WORKED EXAMPLES.
b. Multiplying both sides of (i) by eipq gives eipq ξi = eipq eijk uk,j = (δpj δqk − δpk δqj ) uk,j = uq,p − up,q ,
91
(iii)
where we have made use of the identity eipq eijk = δpj δqk − δpk δqj between the alternator and the Kronecker delta infroduced in (1.49) as well as the substitution rule. Multiplying both sides of this by aq and using the fact that eipq = −epiq gives epiq ξi aq = (up,q − uq,p )aq , or ξ × a = ( u − uT )a. (iv)
c. Since ( u)ij = ui,j and the inner product of two 2tensors is A · B = Aij Bij , the righthand side of the equation we are asked to establish can be written as u · u − u · uT = ( u)ij ( u)ij − ( u)ij ( u)ji = ui,j ui,j − ui,j uj,i . The lefthand side on the hand is ξ · ξ = ξi ξi . Using (i), the aforementioned identity between the alternator and the Kronecker delta, and the substitution rule leads to the desired result as follows: ξi ξi = (eijk uk,j ) (eipq up,q ) = (δjp δkq − δjq δkp ) uk,j up,q = uq,p up,q − up,q up,q . (v)
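The alternator–Kronecker delta identity e_ipq e_ijk = δ_pj δ_qk − δ_pk δ_qj used in parts (b) and (c) can be verified exhaustively, since each free index ranges over only three values. A sketch:

```python
def e(i, j, k):
    """Alternator e_ijk on indices 0, 1, 2: +1/-1 for even/odd permutations, else 0."""
    return (i - j) * (j - k) * (k - i) // 2

def d(i, j):
    """Kronecker delta."""
    return 1 if i == j else 0

# Check e_ipq e_ijk = d_pj d_qk - d_pk d_qj for all free indices p, q, j, k.
for p in range(3):
    for q in range(3):
        for j in range(3):
            for k in range(3):
                lhs = sum(e(i, p, q) * e(i, j, k) for i in range(3))
                assert lhs == d(p, j)*d(q, k) - d(p, k)*d(q, j)
```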
Example 5.6: Let u(x), E(x) and S(x) be, respectively, a vector and two 2tensor ﬁelds. These ﬁelds are related by 1 E= u + uT , S = 2µE + λ trace(E) 1, (i) 2 where λ and µ are constants. Suppose that u(x) = b x r3 where r = x, x = 0, (ii)
and b is a constant. Use (i)1 to calculate the ﬁeld E(x) corresponding to the ﬁeld u(x) given in (ii), and then use (i)2 to calculate the associated ﬁeld S(x). Thus verify that the ﬁeld S(x) corresponding to (ii) satisﬁes the diﬀerential equation: div S = o, x = 0. (iii) Solution: We proceed in the manner suggested in the problem statement by ﬁrst using (i)1 to calculate the E corresponding to the u given by (ii); substituting the result into (i)2 gives the corresponding S; and ﬁnally we can then check whether or not this S satisﬁes (iii). In components, 1 (ui,j − uj,i ) , (iv) 2 and therefore we begin by calculting ui,j . For this, it is convenient to ﬁrst calculate ∂r/∂xj = r,j . Observe by diﬀerentiating r2 = x2 = xi xi that Eij = 2rr,j = 2xi,j xi = 2δij xi = 2xj , and therefore r,j = xj . r (v)
(vi)
j r3 xi xj δij = b 3 − 3b 4 r r r = = = b xi δ − 3b 4 r. r r (vii) Substituting this into (iv) gives us Eij : Eij = 1 (ui.j : 1 Sij.j b xi.j dV = R δij dV = δij R dV = δij V.11): xi nj dA = ∂R R xi. gives us Sij : Sij = = 2µ Eij + λEkk δij = 2µb 2µb δij 3xi xj − 3 r r5 δij 3xi xj δkk xk xk − + λb −3 δij 3 5 3 r r r r5 r2 3xi xj 3 δij − 3 5 δij = 2µb − + λb . 3 3 r r r r5 (ix) Finally we use this to calculate ∂Sij /∂xj = Sij.j + bxi (r−3 ).8: Let A(x) be a 2tensor ﬁeld with the property that A(x)n(x) dA = o ∂D for all subregions D ⊂ R.i ) = b 2 xi xj δij −3 5 3 r r .7: Show that ∂R x ⊗ n dA = V I. (i) where V is the volume of the region R. (i) . (viii) Next.j − 3xi xj (r−5 ). and x is the position vector of a typical point in R + ∂R.j r5 − 3 5 (δij xj + xi δjj ) − 3xi xj − 6 r.j r5 r = = = −3 0.j r4 3 (xi xj ).j + uj. substituting (viii) into (i)2 . Solution: In terms of components in a ﬁxed basis.j − δij − 3 r. CALCULUS OF VECTOR AND TENSOR FIELDS Now diﬀerentiating the given vector ﬁeld ui = bxi /r3 with respect to xj gives ui. δij xj 3 15xi xj xj − 5 (xi + 3xi ) + r4 r r r6 r (x) Example 5. ∂R (ii) The result follows immediately by using the divergence theorem (5.j 2µb = δij (r−3 ). (iii) Example 5. we have to show that xi nj dA = V δij .92 CHAPTER 5.j 3 ij r r xi xj δij b 3 − 3b 5 .
5. Finally.j = 0 at each point in R and so the preceding equation simpliﬁes. Show that A must be a symmetric 2tensor. ∂D which on using the divergence theorem yields eijk (xj Akp ). multiplying both sides by eipq and using the identity eipq eijk = δpj δqk − δpk δqj in (1.9: Let A(x) be a 2tensor ﬁeld which satisﬁes the diﬀerential equation div A = o at each point in R. the result established in the previous problem allows us to conclude that Aij. Show that (i) holds if and only if div A = o at each point x ∈ R.j is continuous on R.p ] dV = 0.j = 0 at each x ∈ R. This shows that (iv) is both necessary and suﬃcient for (i) to hold.p dV = D D eijk [δjp Akp + xj Akp. Solution: In terms of components we are given that eijk xj Akp np dA = 0.12). one can easily reverse the preceding steps to conclude that then (i) also holds. (ii) By using the divergence theorem (5. this implies that Aij. we are told that Aij (x)nj (x) dA = 0 ∂D for all subregions D ⊂ R. . WORKED EXAMPLES. after using the substitution rule. 93 where n(x) is the unit outward normal vector at a point x on the boundary ∂D. (iii) If Aij. Example 5. Suppose that in addition ∂D x × An dA = o for all subregions D ⊂ R.j dV = 0 D for all subregions D ⊂ R. (iv) Conversely if (iv) holds.4.49) yields (δpj δqk − δpk δqj )Akj = Aqp − Apq = 0 and so A is symmetric. Solution: In terms of components in a ﬁxed basis. We are also given that Aij. D Since this holds for all subregions D ⊂ R we can localize it to eijk Akj = 0 at each x ∈ R. to eijk Akj dV = 0.
Example 5.10: Let ε1(x1, x2) and ε2(x1, x2) be defined on a simply connected two-dimensional domain R. Find necessary and sufficient conditions under which there exists a function u(x1, x2) such that

u,_1 = ε1, u,_2 = ε2 for all (x1, x2) ∈ R.   (i)

Solution: In the presence of sufficient smoothness, the order of partial differentiation does not matter and so we necessarily have u,_12 = u,_21. Therefore a necessary condition for (i) to hold is that ε1, ε2 obey

ε1,_2 = ε2,_1 for all (x1, x2) ∈ R.   (ii)

To show that (ii) is also sufficient for the existence of u, we shall provide a formula for explicitly calculating the function u in terms of the given functions ε1 and ε2. Let C be an arbitrary regular oriented curve in R that connects (0, 0) and (x1, x2). The curve is parameterized by arc length s as

ξ1 = ξ1(s), ξ2 = ξ2(s), 0 ≤ s ≤ s0,   (iii)

where s is arc length on C and (ξ1(0), ξ2(0)) = (0, 0) and (ξ1(s0), ξ2(s0)) = (x1, x2). A generic point on the curve is denoted by (ξ1, ξ2). We will show that the function

u(x1, x2) = ∫₀^{s0} [ ε1(ξ1(s), ξ2(s)) ξ1′(s) + ε2(ξ1(s), ξ2(s)) ξ2′(s) ] ds   (iv)

satisfies the requirement (i) when (ii) holds.

[Figure 5.2: (a) A path C from (0, 0) to (x1, x2). (b) A closed path C passing through (0, 0) and (x1, x2). The unit tangent vector on C is s, with components (s1, s2); the unit outward normal vector is n, with components (n1, n2); they are related by s1 = −n2, s2 = n1.]

To see this we must first show that the integral (iv) does in fact define a function of (x1, x2), i.e. that it does not depend on the path of integration. (Note that if a function u satisfies (i), then so does the function u + constant, and so the dependence on the arbitrary starting point of the integral is to be expected.) Thus consider a closed path C that starts and ends at (0, 0) and passes through (x1, x2), coinciding with the original curve over part of its length, as sketched in Figure 5.2(b). We need to show that

∮_C [ ε1(ξ1(s), ξ2(s)) ξ1′(s) + ε2(ξ1(s), ξ2(s)) ξ2′(s) ] ds = 0.   (v)

Recall that (ξ1′(s), ξ2′(s)) are the components of the unit tangent vector on C at the point (ξ1(s), ξ2(s)): s1 = ξ1′(s), s2 = ξ2′(s). Observe further from the figure that the components of the unit tangent vector s and the unit outward normal vector n are related by s1 = −n2 and s2 = n1. Thus the left-hand side of (v) can be written as

∮_C ( ε1 s1 + ε2 s2 ) ds = ∮_C ( ε2 n1 − ε1 n2 ) ds = ∫_D ( ε2,_1 − ε1,_2 ) dA,   (vi)

where we have used the divergence theorem in the last step and D is the region enclosed by C. In view of (ii), this last integral vanishes. Thus the integral (v) vanishes on any closed path C, and so the integral (iv) is independent of path and depends only on the end points. Thus (iv) does in fact define a function u(x1, x2).

Finally it remains to show that the function (iv) satisfies the requirements (i). This is readily seen by writing (iv) in the form

u(x1, x2) = ∫_{(0,0)}^{(x1,x2)} ε1(ξ1, ξ2) dξ1 + ε2(ξ1, ξ2) dξ2   (vii)

and then differentiating this with respect to x1 and x2.

Example 5.11: Let a1(x1, x2) and a2(x1, x2) be defined on a simply connected two-dimensional domain R. Suppose that a1 and a2 satisfy the partial differential equation

a1,_1(x1, x2) + a2,_2(x1, x2) = 0 for all (x1, x2) ∈ R.   (i)

Show that (i) holds if and only if there is a function φ(x1, x2) such that

a1(x1, x2) = φ,_2(x1, x2), a2(x1, x2) = −φ,_1(x1, x2).   (ii)

Solution: This is simply a restatement of the previous example in a form that we will find useful in what follows.

Example 5.12: Find the most general vector field u(x) which satisfies the differential equation

½ ( ∇u + ∇uᵀ ) = O at all x ∈ R.   (i)

Solution: In terms of components, ∇u = −∇uᵀ reads

u_{i,j} = −u_{j,i}.   (ii)
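Path-independence of the integral (iv) under the compatibility condition (ii) can be illustrated numerically. In the sketch below (our own test case, not from the notes; numpy assumed available) ε1 and ε2 are derived from u = x1² x2 + sin x2, so ε1,_2 = 2x1 = ε2,_1 holds, and the line integral is evaluated along a straight path and along an L-shaped path:

```python
import numpy as np

# Illustration of Example 5.10: when ε1,2 = ε2,1 the line integral
# u = ∫_C (ε1 dξ1 + ε2 dξ2) is path-independent.  Here ε1 = 2 x1 x2 and
# ε2 = x1² + cos x2, which derive from u = x1² x2 + sin x2 (our test case).
eps1 = lambda x1, x2: 2.0 * x1 * x2
eps2 = lambda x1, x2: x1**2 + np.cos(x2)

def line_integral(path, n=20000):
    """∫ (ε1 dξ1 + ε2 dξ2) along path(t), t in [0,1], by the midpoint rule."""
    t = (np.arange(n) + 0.5) / n
    dt = 1.0 / n
    x1, x2 = path(t)                         # cell midpoints
    x1a, x2a = path(t - 0.5 * dt)            # cell end points
    x1b, x2b = path(t + 0.5 * dt)
    return np.sum(eps1(x1, x2) * (x1b - x1a) + eps2(x1, x2) * (x2b - x2a))

end = (1.3, 0.7)
straight = lambda t: (end[0] * t, end[1] * t)

def l_shaped(t):
    # first along the x1-axis, then straight up to (x1, x2)
    x1 = np.minimum(2 * t, 1.0) * end[0]
    x2 = np.maximum(2 * t - 1.0, 0.0) * end[1]
    return x1, x2

u_straight = line_integral(straight)
u_lshaped = line_integral(l_shaped)
u_exact = end[0]**2 * end[1] + np.sin(end[1])   # u(1.3, 0.7), since u(0,0) = 0
```

Both paths give the same value, which also matches the generating function u.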
Differentiating this with respect to x_k gives u_{i,jk} = −u_{j,ik}. Using this and then changing the order of differentiation gives

u_{i,jk} = −u_{j,ik} = −u_{j,ki}.

However by (ii), u_{j,k} = −u_{k,j}, and so −u_{j,ki} = u_{k,ji} = u_{k,ij}. Again, by (ii), u_{k,i} = −u_{i,k}. Using this and changing the order of differentiation once again leads to u_{k,ij} = −u_{i,kj} = −u_{i,jk}. It therefore follows that u_{i,jk} = −u_{i,jk}, whence

u_{i,jk} = 0.   (iii)

Integrating this once gives u_{i,j} = C_ij where the C_ij's are constants. Integrating this once more gives

u_i = C_ij x_j + c_i,   (iv)

where the c_i's are constants. The vector field u(x) must necessarily have this form if (ii) is to hold. To examine sufficiency, substituting (iv) into (ii) shows that [C] must be skew-symmetric. Thus in summary the most general vector field u(x) that satisfies (i) is

u(x) = C x + c,

where C is a constant skew-symmetric 2-tensor and c is a constant vector.

Example 5.13: Suppose that a scalar-valued function f(A) is defined for all symmetric tensors A. The partial derivatives of f with respect to the components of A,

∂f/∂A_ij,   (i)

are the components of a 2-tensor. Is this tensor symmetric?

Solution: Consider, for example, the particular function f = A · A = A_ij A_ij which, when written out in components, reads

f = f1(A11, A12, A13, A21, ..., A33) = A11² + A22² + A33² + 2A12² + 2A23² + 2A31².   (ii)

Proceeding formally and differentiating (ii) with respect to A12, and separately with respect to A21, gives

∂f1/∂A12 = 4A12, ∂f1/∂A21 = 0,   (iii)

which implies that ∂f1/∂A12 ≠ ∂f1/∂A21.
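The conclusion of Example 5.12 is easy to sanity-check numerically. The sketch below (ours, not part of the notes; numpy assumed available) builds u(x) = Cx + c with a skew-symmetric C, computes its gradient by central differences, and confirms that the symmetric part vanishes:

```python
import numpy as np

# Check of Example 5.12:  u(x) = C x + c  with C skew-symmetric satisfies
# ∇u + ∇uᵀ = O.  The gradient is approximated by central differences.
rng = np.random.default_rng(1)
W = rng.standard_normal((3, 3))
C = 0.5 * (W - W.T)                 # a skew-symmetric constant tensor
cvec = rng.standard_normal(3)       # a constant vector
u = lambda x: C @ x + cvec

def grad(u, x, h=1e-6):
    """Central-difference approximation of ∇u at x, G[i, j] = ∂u_i/∂x_j."""
    G = np.zeros((3, 3))
    for j in range(3):
        e = np.zeros(3); e[j] = h
        G[:, j] = (u(x + e) - u(x - e)) / (2 * h)
    return G

x0 = rng.standard_normal(3)
G = grad(u, x0)
E = 0.5 * (G + G.T)                 # symmetric part of the gradient
```

Since u is affine, G recovers C exactly (to roundoff), and E is the zero tensor.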
On the other hand, since A_ij is symmetric we can write f in symmetric form by changing A_ij → ½ (A_ij + A_ji):

f = f2(A11, A12, ..., A33)
  = A11² + A22² + A33² + 2 [ ½ (A12 + A21) ]² + 2 [ ½ (A23 + A32) ]² + 2 [ ½ (A31 + A13) ]²
  = A11² + A22² + A33² + ½ A12² + A12 A21 + ½ A21² + ... + ½ A31² + A31 A13 + ½ A13².   (v)

Note that the values of the functions f1 and f2 are equal at all symmetric matrices, f1[A] = f2[A] for any symmetric matrix [A]. Suppose that f2 is defined by (v) for all matrices [A] and not just symmetric matrices [A]. In going from f1 → f2, we have effectively relaxed the constraint of symmetry and expanded the domain of definition of f to all matrices [A]. We may differentiate f2 by treating the 9 A_ij's as independent, and the result can then be evaluated at symmetric matrices. Differentiating f2 leads to

∂f2/∂A12 = A12 + A21, ∂f2/∂A21 = A21 + A12,   (vi)

and so now ∂f2/∂A12 = ∂f2/∂A21.

The source of the original difficulty is the fact that the 9 A_ij's in the argument of f1 are not independent variables, since A_ij = A_ji, and yet we have been calculating partial derivatives as if they were independent. In fact, the original problem statement itself is ill-posed, since we are asked to calculate ∂f/∂A_ij but told that [A] is restricted to being symmetric. We assume that what was meant in the problem statement is that f is expressed in symmetric form, as in (v). In general, if a function f(A11, A12, ..., A33) is expressed in symmetric form, then ∂f/∂A_ij will be symmetric, but not otherwise. Throughout these volumes, whenever we encounter a function of a symmetric tensor, we shall always assume that it has been written in symmetric form, and therefore its derivative with respect to the tensor can be assumed to be symmetric.

Remark: We will encounter a similar situation involving tensors whose determinant is unity. On occasion we will have need to differentiate a function g1(A) defined for all tensors with det A = 1, and we shall do this by extending the definition of the given function and defining a second function g2(A) for all tensors; g2 is defined such that g1(A) = g2(A) for all tensors with unit determinant. We then differentiate g2 and evaluate the result at tensors with unit determinant.

References
1. P. Chadwick, Continuum Mechanics, Chapter 1, Sections 10 and 11, Dover, 1999.
2. M.E. Gurtin, An Introduction to Continuum Mechanics, Chapter 2, Academic Press, 1981.
3. L.A. Segel, Mathematics Applied to Continuum Mechanics, Section 2.3, Dover, New York, 1987.
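The symmetrization device of Example 5.13 can be checked symbolically. The sketch below (ours, not part of the notes; sympy assumed available) differentiates the symmetrized function f2 with respect to all nine entries treated as independent and confirms that the resulting 2-tensor is symmetric:

```python
import sympy as sp

# Illustration of Example 5.13: for f(A) = A·A = A_ij A_ij, the symmetrized
# form f2(A) = Σ_ij ((A_ij + A_ji)/2)² has a symmetric derivative ∂f2/∂A_ij.
A = sp.Matrix(3, 3, lambda i, j: sp.Symbol(f'A{i + 1}{j + 1}'))
f2 = sum(((A[i, j] + A[j, i]) / 2)**2 for i in range(3) for j in range(3))

# differentiate treating the 9 entries as independent variables
df2 = sp.Matrix(3, 3, lambda i, j: sp.diff(f2, A[i, j]))
# expected: ∂f2/∂A_ij = A_ij + A_ji, manifestly symmetric in (i, j)
```

Evaluated at a symmetric matrix, ∂f2/∂A_ij reduces to 2A_ij, the usual derivative of A·A.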
Chapter 6

Orthogonal Curvilinear Coordinates

6.1 Introductory Remarks

The notes in this section are a somewhat simplified version of notes developed by Professor Eli Sternberg of Caltech. The discussion here, which is a general treatment of orthogonal curvilinear coordinates, is a compromise between a general tensorial treatment that includes oblique coordinate systems and an ad-hoc treatment of special orthogonal curvilinear coordinate systems. A summary of the main tensor-analytic results of this section is given in equations (6.32)-(6.37), in terms of the scale factors h_i defined in (6.17) that relate the rectangular cartesian coordinates (x1, x2, x3) to the orthogonal curvilinear coordinates (x̂1, x̂2, x̂3).

It is helpful to begin by reviewing a few aspects of the familiar case of circular cylindrical coordinates. Let {e1, e2, e3} be a fixed orthonormal basis, and let O be a fixed point chosen as the origin. The point O and the basis {e1, e2, e3}, together, constitute a frame which we denote by {O; e1, e2, e3}. Consider a generic point P in R³ whose position vector relative to this origin O is x. The rectangular cartesian coordinates of the point P in the frame {O; e1, e2, e3} are the components (x1, x2, x3) of the position vector x in this basis.

We introduce circular cylindrical coordinates (r, θ, z) through the mappings

x1 = r cos θ, x2 = r sin θ, x3 = z,   (6.1)

for all (r, θ, z) ∈ [0, ∞) × [0, 2π) × (−∞, ∞). The mapping (6.1) is one-to-one except at r = 0 (i.e. x1 = x2 = 0). Indeed (6.1) may be
explicitly inverted for r > 0 to give

r = √(x1² + x2²), cos θ = x1/r, sin θ = x2/r, z = x3.   (6.2)

For a general set of orthogonal curvilinear coordinates one cannot, in general, explicitly invert the coordinate mapping in this way. The Jacobian determinant of the mapping (6.1) is

Δ(r, θ, z) = det [ ∂x1/∂r  ∂x1/∂θ  ∂x1/∂z ;
                   ∂x2/∂r  ∂x2/∂θ  ∂x2/∂z ;
                   ∂x3/∂r  ∂x3/∂θ  ∂x3/∂z ] = r ≥ 0.

Note that Δ(r, θ, z) = 0 if and only if r = 0, and it is otherwise strictly positive; this reflects the invertibility of (6.1) on (r, θ, z) ∈ (0, ∞) × [0, 2π) × (−∞, ∞), and the breakdown in invertibility at r = 0.

The circular cylindrical coordinates (r, θ, z) admit the familiar geometric interpretation illustrated in Figure 6.1. One has:

r = r₀ = constant : circular cylinders, coaxial with the x3-axis;
θ = θ₀ = constant : meridional half-planes through the x3-axis;
z = z₀ = constant : planes perpendicular to the x3-axis.

[Figure 6.1: Circular cylindrical coordinates (r, θ, z): the r-, θ- and z-coordinate lines through a generic point.]

The above surfaces constitute a triply orthogonal family of coordinate surfaces. In view of (6.2), each "regular point" of E³ (i.e. a point at which r > 0) is the intersection of a unique triplet of (mutually
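The statement Δ(r, θ, z) = r can be checked numerically. The sketch below (ours, not part of the notes; numpy assumed available) builds the Jacobian matrix of the mapping (6.1) by central differences and evaluates its determinant at a regular point:

```python
import numpy as np

# Numerical check that the Jacobian determinant of (r, θ, z) → (x1, x2, x3)
# equals r for circular cylindrical coordinates.
def x_of(q):
    r, th, z = q
    return np.array([r * np.cos(th), r * np.sin(th), z])

def jacobian(q, step=1e-6):
    """Central-difference Jacobian J[i, j] = ∂x_i/∂q_j."""
    J = np.zeros((3, 3))
    for j in range(3):
        e = np.zeros(3); e[j] = step
        J[:, j] = (x_of(q + e) - x_of(q - e)) / (2 * step)
    return J

q0 = np.array([1.7, 0.8, -0.3])     # a generic regular point (r > 0)
detJ = np.linalg.det(jacobian(q0))  # should be r = 1.7
```

At r = 0 the same computation gives a vanishing determinant, consistent with the breakdown of invertibility there.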
perpendicular) coordinate surfaces. The coordinate lines are the pairwise intersections of the coordinate surfaces. Along any coordinate line only one of the coordinates (r, θ, z) varies, while the other two remain constant; thus for example, as illustrated in Figure 6.1, the line along which an r-coordinate surface and a z-coordinate surface intersect is a θ-coordinate line.

In terms of the circular cylindrical coordinates the position vector x can be written as

x = x(r, θ, z) = (r cos θ) e1 + (r sin θ) e2 + z e3.   (6.3)

The vectors ∂x/∂r, ∂x/∂θ, ∂x/∂z are tangent to the coordinate lines corresponding to r, θ and z respectively. The so-called metric coefficients h_r, h_θ, h_z denote the magnitudes of these vectors,

h_r = |∂x/∂r|, h_θ = |∂x/∂θ|, h_z = |∂x/∂z|,

and so the unit tangent vectors corresponding to the respective coordinate lines r, θ and z are

e_r = (1/h_r) ∂x/∂r, e_θ = (1/h_θ) ∂x/∂θ, e_z = (1/h_z) ∂x/∂z.

In the present case one has h_r = 1, h_θ = r, h_z = 1 and

e_r = (1/h_r) ∂x/∂r = cos θ e1 + sin θ e2,
e_θ = (1/h_θ) ∂x/∂θ = −sin θ e1 + cos θ e2,
e_z = (1/h_z) ∂x/∂z = e3.   (6.4)

The triplet of vectors {e_r, e_θ, e_z} forms a local orthonormal basis at the point x. They are local because they depend on the point x; sometimes, when we need to emphasize this fact, we will write {e_r(x), e_θ(x), e_z(x)}.

In order to calculate the derivatives of various field quantities it is clear that we will need to calculate quantities such as ∂e_r/∂r, ∂e_r/∂θ, ..., and in order to calculate the components of these derivatives in the local basis we will need to calculate quantities of the form e_r·(∂e_r/∂r), e_θ·(∂e_r/∂r), e_z·(∂e_r/∂r), e_r·(∂e_θ/∂r), e_θ·(∂e_θ/∂r), e_z·(∂e_θ/∂r), ....
Much of the analysis in the general case to follow, leading eventually to Equation (6.30) in Subsection 6.2.4, is devoted to calculating these quantities.

Notation: As far as possible, we will consistently denote the fixed cartesian coordinate system and all components and quantities associated with it by symbols such as x_i, e_i, f(x1, x2, x3), v_i(x1, x2, x3), A_ij(x1, x2, x3) etc., and we shall consistently denote the local curvilinear coordinate system and all components and quantities associated with it by similar symbols with "hats" over them, e.g. x̂_i, ê_i, f̂(x̂1, x̂2, x̂3), v̂_i(x̂1, x̂2, x̂3), Â_ij(x̂1, x̂2, x̂3) etc.

6.2 General Orthogonal Curvilinear Coordinates

6.2.1 Coordinate transformation. Inverse transformation.

Let {e1, e2, e3} be a fixed right-handed orthonormal basis, let O be the fixed point chosen as the origin and let {O; e1, e2, e3} be the associated frame. The rectangular cartesian coordinates of the point with position vector x in this frame are (x1, x2, x3), where x_i = x · e_i.

We introduce curvilinear coordinates (x̂1, x̂2, x̂3) through a triplet of scalar mappings

x_i = x_i(x̂1, x̂2, x̂3) for all (x̂1, x̂2, x̂3) ∈ R̂,   (6.5)

where the domain of definition R̂ is a subset of E³. Each curvilinear coordinate x̂_i belongs to some linear interval L_i, and R̂ = L1 × L2 × L3. For example, in the case of circular cylindrical coordinates we have L1 = {x̂1 | 0 ≤ x̂1 < ∞}, L2 = {x̂2 | 0 ≤ x̂2 < 2π} and L3 = {x̂3 | −∞ < x̂3 < ∞}, and the "box" R̂ is given by R̂ = {(x̂1, x̂2, x̂3) | 0 ≤ x̂1 < ∞, 0 ≤ x̂2 < 2π, −∞ < x̂3 < ∞}. Observe that the "box" R̂ includes some but possibly not all of its faces.

Equation (6.5) may be interpreted as a mapping of R̂ into E³. We shall assume that (x1, x2, x3) ranges over all of E³ as (x̂1, x̂2, x̂3) takes on all values in R̂. We assume further that the mapping (6.5) is one-to-one and sufficiently smooth in the interior of R̂, so that the inverse mapping

x̂_i = x̂_i(x1, x2, x3)   (6.6)

exists and is appropriately smooth at all (x1, x2, x3) in the image of the interior of R̂.
Note that the mapping (6.5) might not be uniquely invertible on some of the faces of R̂, which are mapped into "singular" lines or surfaces in E³. (For example, in the case of circular cylindrical coordinates, x̂1 = r = 0 is a singular surface; see Section 6.1.) Points that are not on a singular line or surface will be referred to as "regular points" of E³.

The Jacobian matrix [J] of the mapping (6.5) has elements

J_ij = ∂x_i/∂x̂_j,   (6.7)

and by the assumed smoothness and one-to-oneness of the mapping, the Jacobian determinant does not vanish on the interior of R̂. Without loss of generality we can therefore take it to be positive:

det[J] = (1/6) e_ijk e_pqr (∂x_i/∂x̂_p)(∂x_j/∂x̂_q)(∂x_k/∂x̂_r) > 0.   (6.8)

The Jacobian matrix of the inverse mapping (6.6) is [J]⁻¹.

The coordinate surface x̂_i = constant is defined by

x̂_i(x1, x2, x3) = x̂_i° = constant, i = 1, 2, 3;

the pairwise intersections of these surfaces are the corresponding coordinate lines, along which only one of the curvilinear coordinates varies. Thus every regular point of E³ is the point of intersection of a unique triplet of coordinate surfaces and coordinate lines, as is illustrated in Figure 6.2.

Recall that the tangent vector along an arbitrary regular curve

Γ : x = x(t), (α ≤ t ≤ β),   (6.9)

can be taken to be¹

ẋ(t) = ẋ_i(t) e_i;   (6.10)

it is oriented in the direction of increasing t. Thus in the case of the special curve Γ1 : x = x(x̂1, c2, c3), x̂1 ∈ L1, c2 = constant, c3 = constant, corresponding to a x̂1-coordinate line, the tangent vector can be taken to be ∂x/∂x̂1. Generalizing this, the vectors ∂x/∂x̂_i are tangent to the x̂_i-coordinate lines, and

ê_i = ∂x/∂x̂_i / |∂x/∂x̂_i|   (no sum)

¹Here and in the sequel a superior dot indicates differentiation with respect to the parameter t.
[Figure 6.2: Orthogonal curvilinear coordinates (x̂1, x̂2, x̂3): the x̂1-, x̂2- and x̂3-coordinate surfaces through a point, the associated local orthonormal basis vectors {ê1, ê2, ê3}, and the fixed basis {e1, e2, e3}; Q_ij = ê_i · e_j.]

are the unit tangent vectors along the x̂_i-coordinate lines, the sense of ê_i being determined by the direction in which x̂_i increases. The proper orthogonal matrix [Q], with elements Q_ij = ê_i · e_j, characterizes the rotational transformation relating this local basis to the rectangular cartesian basis {e1, e2, e3}.

Since our discussion is limited to orthogonal curvilinear coordinate systems, we must require for i ≠ j:

ê_i · ê_j = 0, or (∂x/∂x̂_i) · (∂x/∂x̂_j) = 0, or (∂x_k/∂x̂_i)(∂x_k/∂x̂_j) = 0.   (6.11)

6.2.2 Metric coefficients, scale moduli.

Consider again the arbitrary regular curve Γ parameterized by (6.9). If s(t) is the arc length of Γ, measured from an arbitrary fixed point on Γ, one has

ṡ(t) = |ẋ(t)| = √( ẋ(t) · ẋ(t) ).   (6.12)
One concludes from (6.5) and the chain rule that, along Γ, with x̂_i(t) = x̂_i(x1(t), x2(t), x3(t)),

(ds/dt)² = (dx/dt)·(dx/dt) = (dx_k/dt)(dx_k/dt) = (∂x_k/∂x̂_i)(∂x_k/∂x̂_j) (dx̂_i/dt)(dx̂_j/dt) = g_ij (dx̂_i/dt)(dx̂_j/dt), (α ≤ t ≤ β),   (6.13)

in which g_ij are the metric coefficients of the curvilinear coordinate system under consideration. They are defined by

g_ij = (∂x/∂x̂_i) · (∂x/∂x̂_j) = (∂x_k/∂x̂_i)(∂x_k/∂x̂_j).   (6.14)

Note that

g_ij = 0, (i ≠ j),   (6.15)

as a consequence of the orthogonality condition (6.11). Because of (6.15), the metric coefficients can be written as

g_ij = h_i h_j δ_ij,²   (6.16)

where the scale moduli h_i are defined by³

h_i = √g_ii = √( (∂x_k/∂x̂_i)(∂x_k/∂x̂_i) ) = √( (∂x1/∂x̂_i)² + (∂x2/∂x̂_i)² + (∂x3/∂x̂_i)² ) > 0,   (6.17)

noting that h_i = 0 is precluded by (6.8). The matrix of metric coefficients is therefore

[g] = [ h1²  0   0  ;
        0   h2²  0  ;
        0    0  h3² ].   (6.18)

Observe that in terms of the Jacobian matrix [J] defined earlier in (6.7) we can write g_ij = J_ki J_kj, or equivalently [g] = [J]ᵀ[J]. From (6.13), (6.16) and (6.17) follows

(ds)² = (h1 dx̂1)² + (h2 dx̂2)² + (h3 dx̂3)²   (6.19)

along Γ, which reveals the geometric significance of the scale moduli:

h_i = ds/dx̂_i along the x̂_i-coordinate lines.   (6.20)

It follows from (6.17) that the unit vector ê_i can be expressed as

ê_i = (1/h_i) ∂x/∂x̂_i,   (6.21)

and therefore the proper orthogonal matrix [Q] relating the two sets of basis vectors is given by

Q_ij = ê_i · e_j = (1/h_i) ∂x_j/∂x̂_i.   (6.22)

²Here and henceforth the underlining of one of two or more repeated indices indicates suspended summation with respect to this index.
³Some authors, such as Love, define h_i as 1/√g_ii instead of as √g_ii.

6.2.3 Inverse partial derivatives

In view of (6.6) one has the identity x_i = x_i(x̂1(x1, x2, x3), x̂2(x1, x2, x3), x̂3(x1, x2, x3)), so that from the chain rule

(∂x_i/∂x̂_k)(∂x̂_k/∂x_j) = δ_ij.   (6.23)

Multiply this by ∂x_i/∂x̂_m, noting the implied contraction on the index i, and use (6.14), (6.16) to confirm that

(∂x_i/∂x̂_m)(∂x_i/∂x̂_k)(∂x̂_k/∂x_j) = g_mk ∂x̂_k/∂x_j = h_m² ∂x̂_m/∂x_j = ∂x_j/∂x̂_m.

Thus the inverse partial derivatives are given by

∂x̂_i/∂x_j = (1/h_i²) ∂x_j/∂x̂_i.   (6.24)

Moreover, by (6.24) and (6.22), the elements of the matrix [Q] that relates the two coordinate systems can be written in the alternative form

Q_ij = ê_i · e_j = h_i ∂x̂_i/∂x_j,

and (6.24) and (6.17) yield the following alternative expressions for h_i:

h_i = 1 / √( (∂x̂_i/∂x1)² + (∂x̂_i/∂x2)² + (∂x̂_i/∂x3)² ).

6.2.4 Components of ∂ê_i/∂x̂_j in the local basis {ê1, ê2, ê3}

In order to calculate the derivatives of various field quantities it is clear that we will need to calculate the quantities ∂ê_i/∂x̂_j, and in order to calculate the components of these derivatives in the local basis we will need to calculate quantities of the form ê_k · ∂ê_i/∂x̂_j. Calculating these quantities is an essential prerequisite for the transformation of basic tensor-analytic relations into arbitrary orthogonal curvilinear coordinates, and this subsection is devoted to this calculation.

By (6.21),

ê_k · ∂ê_i/∂x̂_j = ê_k · [ −(1/h_i²)(∂h_i/∂x̂_j)(∂x/∂x̂_i) + (1/h_i) ∂²x/∂x̂_i∂x̂_j ]
               = −(δ_ik/h_i)(∂h_i/∂x̂_j) + (1/(h_i h_k)) (∂²x/∂x̂_i∂x̂_j) · (∂x/∂x̂_k).   (6.25)

In order to express the second-derivative term in (6.25) in terms of the scale moduli and their first partial derivatives, we begin by differentiating

(∂x/∂x̂_i) · (∂x/∂x̂_j) = g_ij = δ_ij h_i h_j   (6.26)

with respect to x̂_k:

(∂²x/∂x̂_i∂x̂_k) · (∂x/∂x̂_j) + (∂²x/∂x̂_j∂x̂_k) · (∂x/∂x̂_i) = δ_ij ∂(h_i h_j)/∂x̂_k.   (6.27)

If we refer to (6.27) as (a), and let (b) and (c) be the identities resulting from (6.27) when (i, j, k) are replaced by (j, k, i) and (k, i, j) respectively, then ½{(b) + (c) − (a)} is readily found to yield

(∂²x/∂x̂_i∂x̂_j) · (∂x/∂x̂_k) = ½ [ δ_jk ∂(h_j h_k)/∂x̂_i + δ_ki ∂(h_k h_i)/∂x̂_j − δ_ij ∂(h_i h_j)/∂x̂_k ].   (6.28)

Substituting (6.28) into (6.25) leads to

ê_k · ∂ê_i/∂x̂_j = −(δ_ik/h_i) ∂h_i/∂x̂_j + (1/(2 h_i h_k)) [ δ_jk ∂(h_j h_k)/∂x̂_i + δ_ki ∂(h_k h_i)/∂x̂_j − δ_ij ∂(h_i h_j)/∂x̂_k ].   (6.29)

Equation (6.29) provides the explicit expressions for the terms ê_k · ∂ê_i/∂x̂_j that we sought.
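The relations [g] = [J]ᵀ[J] and h_i = √g_ii give a convenient way to compute scale moduli. The numerical sketch below (ours, not part of the notes; numpy assumed available) does this for spherical coordinates, for which h_r = 1, h_θ = r, h_φ = r sin θ (see Section 6.4):

```python
import numpy as np

# Numerical check of [g] = [J]ᵀ[J] and h_i = √g_ii for spherical
# coordinates (r, θ, φ).
def x_of(q):
    r, th, ph = q
    return np.array([r * np.sin(th) * np.cos(ph),
                     r * np.sin(th) * np.sin(ph),
                     r * np.cos(th)])

def jacobian(q, step=1e-6):
    """Central-difference Jacobian J[i, j] = ∂x_i/∂q_j."""
    J = np.zeros((3, 3))
    for j in range(3):
        e = np.zeros(3); e[j] = step
        J[:, j] = (x_of(q + e) - x_of(q - e)) / (2 * step)
    return J

q0 = np.array([2.0, 0.6, 1.1])          # generic regular point
J = jacobian(q0)
g = J.T @ J                             # metric coefficients
h_moduli = np.sqrt(np.diag(g))          # scale moduli
expected = np.array([1.0, q0[0], q0[0] * np.sin(q0[1])])
```

The off-diagonal entries of [g] vanish (to roundoff) because the coordinate system is orthogonal.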
Observe the following properties that follow from it:

ê_k · ∂ê_i/∂x̂_j = 0 if (i, j, k) are distinct,
ê_k · ∂ê_i/∂x̂_j = 0 if k = i,
ê_k · ∂ê_i/∂x̂_i = −(1/h_k) ∂h_i/∂x̂_k if i ≠ k,
ê_k · ∂ê_i/∂x̂_k = (1/h_i) ∂h_k/∂x̂_i if i ≠ k.   (6.30)

(As a check in circular cylindrical coordinates: the third of these gives e_r · ∂e_θ/∂θ = −(1/h_r) ∂h_θ/∂r = −1, consistent with ∂e_θ/∂θ = −e_r.)
6.3 Transformation of Basic Tensor Relations
Let T be a cartesian tensor field of order N ≥ 1, defined on a region R ⊂ E³, and suppose that the points of R are regular points of E³ with respect to a given orthogonal curvilinear coordinate system. The curvilinear components T̂_ij...n of T are the components of T in the local basis {ê1, ê2, ê3}. Thus,

T̂_ij...n = Q_ip Q_jq ... Q_nr T_pq...r, where Q_ip = ê_i · e_p.   (6.31)
6.3.1 Gradient of a scalar field
Let φ(x) be a scalar-valued function and let v(x) denote its gradient:

v = ∇φ or equivalently v_i = φ,_i.

The components of v in the two bases {e1, e2, e3} and {ê1, ê2, ê3} are related in the usual way by v̂_k = Q_ki v_i, and so

v̂_k = Q_ki ∂φ/∂x_i = (1/h_k) (∂x_i/∂x̂_k) (∂φ/∂x_i),

on using (6.22), so that by the chain rule

v̂_k = (1/h_k) ∂φ̂/∂x̂_k,   (6.32)

where we have set φ̂(x̂1, x̂2, x̂3) = φ( x1(x̂1, x̂2, x̂3), x2(x̂1, x̂2, x̂3), x3(x̂1, x̂2, x̂3) ).
6.3.2 Gradient of a vector field
Let v(x) be a vector-valued function and let W(x) denote its gradient:

W = ∇v or equivalently W_ij = v_i,_j.

The components of W and v in the two bases {e1, e2, e3} and {ê1, ê2, ê3} are related in the usual way by

Ŵ_ij = Q_ip Q_jq W_pq, v_p = Q_np v̂_n,

and therefore

Ŵ_ij = Q_ip Q_jq ∂v_p/∂x_q = Q_ip Q_jq ∂(Q_np v̂_n)/∂x_q = Q_ip Q_jq Σ_m [ ∂(Q_np v̂_n)/∂x̂_m ] ∂x̂_m/∂x_q.

Thus by (6.24)⁴

Ŵ_ij = Q_ip Σ_m Q_jq Q_mq (1/h_m) ∂(Q_np v̂_n)/∂x̂_m.

Since Q_jq Q_mq = δ_mj, this simplifies to

Ŵ_ij = Q_ip (1/h_j) ∂(Q_np v̂_n)/∂x̂_j,

which, on expanding the terms in parentheses, yields

Ŵ_ij = (1/h_j) [ ∂v̂_i/∂x̂_j + Q_ip (∂Q_np/∂x̂_j) v̂_n ].

However by (6.22)

Q_ip (∂Q_np/∂x̂_j) v̂_n = Q_ip [ ∂(ê_n · e_p)/∂x̂_j ] v̂_n = ( ê_i · ∂ê_n/∂x̂_j ) v̂_n,

and so

Ŵ_ij = (1/h_j) [ ∂v̂_i/∂x̂_j + ( ê_i · ∂ê_n/∂x̂_j ) v̂_n ],   (6.33)

in which the coefficient in brackets is given by (6.29).

⁴We explicitly use the summation sign in this equation (and elsewhere) when an index is repeated 3 or more times and we wish to sum over it.
6.3.3 Divergence of a vector field
Let v(x) be a vector-valued function and let W(x) = ∇v denote its gradient. Then

div v = trace W = v_i,_i.

Therefore from (6.33), the invariance of the trace of W, and (6.30),

div v = trace W = trace Ŵ = Σ_i Ŵ_ii = Σ_i (1/h_i) [ ∂v̂_i/∂x̂_i + Σ_{n≠i} (1/h_n)(∂h_i/∂x̂_n) v̂_n ].

Collecting terms involving v̂1, v̂2 and v̂3 alone, one has

div v = (1/h1) ∂v̂1/∂x̂1 + (v̂1/(h2 h1)) ∂h2/∂x̂1 + (v̂1/(h3 h1)) ∂h3/∂x̂1 + ... + ...

Thus

div v = (1/(h1 h2 h3)) [ ∂(h2 h3 v̂1)/∂x̂1 + ∂(h3 h1 v̂2)/∂x̂2 + ∂(h1 h2 v̂3)/∂x̂3 ].   (6.34)
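Formula (6.34) can be checked against a direct Cartesian computation. The symbolic sketch below (ours, not part of the notes; sympy assumed available) does this in circular cylindrical coordinates, where h_r = 1, h_θ = r, h_z = 1, for an arbitrarily chosen smooth field:

```python
import sympy as sp

# Symbolic check of (6.34) in cylindrical coordinates against the Cartesian
# divergence, for the (arbitrarily chosen) field v = x1² e1 + x1 x2 e2 + x3 x2 e3.
r, th, z = sp.symbols('r theta z', positive=True)
x1, x2 = r * sp.cos(th), r * sp.sin(th)

X1, X2, X3 = sp.symbols('X1 X2 X3')
v_cart = sp.Matrix([X1**2, X1 * X2, X3 * X2])
div_cart = sum(sp.diff(v_cart[i], X) for i, X in enumerate((X1, X2, X3)))
div_cart = div_cart.subs({X1: x1, X2: x2, X3: z})

# cylindrical components: v_r = v·e_r, v_θ = v·e_θ, v_z = v·e_z
er = sp.Matrix([sp.cos(th), sp.sin(th), 0])
eth = sp.Matrix([-sp.sin(th), sp.cos(th), 0])
v = v_cart.subs({X1: x1, X2: x2, X3: z})
vr, vth, vz = v.dot(er), v.dot(eth), v[2]

# (6.34) specialized: div v = (1/r) ∂(r v_r)/∂r + (1/r) ∂v_θ/∂θ + ∂v_z/∂z
div_curv = sp.diff(r * vr, r) / r + sp.diff(vth, th) / r + sp.diff(vz, z)
```

The two expressions agree identically, which can be confirmed at any sample point.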
6.3.4 Laplacian of a scalar field
Let φ(x) be a scalar-valued function. Since

∇²φ = div(grad φ) = φ,_kk,

the results from Subsections 6.3.1 and 6.3.3 permit us to infer that

∇²φ = (1/(h1 h2 h3)) [ ∂/∂x̂1( (h2 h3/h1) ∂φ̂/∂x̂1 ) + ∂/∂x̂2( (h3 h1/h2) ∂φ̂/∂x̂2 ) + ∂/∂x̂3( (h1 h2/h3) ∂φ̂/∂x̂3 ) ],   (6.35)

where we have set φ̂(x̂1, x̂2, x̂3) = φ( x1(x̂1, x̂2, x̂3), x2(x̂1, x̂2, x̂3), x3(x̂1, x̂2, x̂3) ).
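As a quick check of (6.35): for f = x1² + x2² + x3² = r² the Cartesian Laplacian is 6, and the formula reproduces this in spherical coordinates. A sketch (ours, not part of the notes; sympy assumed available):

```python
import sympy as sp

# Check of the Laplacian formula (6.35) in spherical coordinates
# (h_r = 1, h_θ = r, h_φ = r sin θ) on f = r², whose Cartesian Laplacian is 6.
r, th, ph = sp.symbols('r theta phi', positive=True)
h1, h2, h3 = 1, r, r * sp.sin(th)
f = r**2

lap = (sp.diff(h2 * h3 / h1 * sp.diff(f, r), r)
       + sp.diff(h3 * h1 / h2 * sp.diff(f, th), th)
       + sp.diff(h1 * h2 / h3 * sp.diff(f, ph), ph)) / (h1 * h2 * h3)
lap = sp.simplify(lap)
```

Only the radial term contributes here, and it collapses to the constant 6 as expected.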
6.3.5 Curl of a vector field
Let v(x) be a vector-valued field and let w(x) be its curl, so that

w = curl v or equivalently w_i = e_ijk v_k,_j.

Let W = ∇v, or equivalently W_ij = v_i,_j. Then, as we have shown in an earlier chapter, w_i = e_ijk W_kj, and consequently ŵ_i = e_ijk Ŵ_kj. From Subsection 6.3.2,

ŵ_i = Σ_j (1/h_j) e_ijk [ ∂v̂_k/∂x̂_j + ( ê_k · ∂ê_n/∂x̂_j ) v̂_n ].

By (6.30), the second term within the brackets sums out to zero unless n = j. Thus, using (6.30), one arrives at

ŵ_i = Σ_{j,k} (1/h_j) e_ijk [ ∂v̂_k/∂x̂_j − (1/h_k)(∂h_j/∂x̂_k) v̂_j ].

This yields

ŵ1 = (1/(h2 h3)) [ ∂(h3 v̂3)/∂x̂2 − ∂(h2 v̂2)/∂x̂3 ],
ŵ2 = (1/(h3 h1)) [ ∂(h1 v̂1)/∂x̂3 − ∂(h3 v̂3)/∂x̂1 ],
ŵ3 = (1/(h1 h2)) [ ∂(h2 v̂2)/∂x̂1 − ∂(h1 v̂1)/∂x̂2 ].   (6.36)
6.3.6 Divergence of a symmetric 2-tensor field
Let S(x) be a symmetric 2-tensor field and let v(x) denote its divergence:

v = div S, S = Sᵀ, or equivalently v_i = S_ij,_j, S_ij = S_ji.

The components of v and S in the two bases {e1, e2, e3} and {ê1, ê2, ê3} are related in the usual way by

v_i = Q_ip v̂_p, S_ij = Q_mi Q_nj Ŝ_mn,

and consequently

v̂_i = Q_ip S_pj,_j = Q_ip ∂(Q_mp Q_nj Ŝ_mn)/∂x_j = Q_ip [ ∂(Q_mp Q_nj Ŝ_mn)/∂x̂_k ] ∂x̂_k/∂x_j.

By using (6.24), the orthogonality of the matrix [Q], (6.30) and Ŝ_ij = Ŝ_ji we obtain

v̂1 = (1/(h1 h2 h3)) [ ∂(h2 h3 Ŝ11)/∂x̂1 + ∂(h3 h1 Ŝ12)/∂x̂2 + ∂(h1 h2 Ŝ13)/∂x̂3 ]
   + (1/(h1 h2)) (∂h1/∂x̂2) Ŝ12 + (1/(h1 h3)) (∂h1/∂x̂3) Ŝ13 − (1/(h1 h2)) (∂h2/∂x̂1) Ŝ22 − (1/(h1 h3)) (∂h3/∂x̂1) Ŝ33,   (6.37)

with analogous expressions for v̂2 and v̂3.
Equations (6.32)-(6.37) provide the fundamental expressions for the basic tensor-analytic quantities that we will need. Observe that they reduce to their classical rectangular cartesian forms in the special case x̂_i = x_i (in which case h1 = h2 = h3 = 1).
6.3.7 Differential elements of volume
When evaluating a volume integral over a region D, we sometimes find it convenient to transform it from the form

∫_D ... dx1 dx2 dx3

into an equivalent expression of the form

∫_D̂ ... dx̂1 dx̂2 dx̂3.

In order to do this we must relate dx1 dx2 dx3 to dx̂1 dx̂2 dx̂3. By (6.22),

det[Q] = det[J] / (h1 h2 h3).

However, since [Q] is a proper orthogonal matrix, its determinant takes the value +1. Therefore det[J] = h1 h2 h3, and so the basic relation dx1 dx2 dx3 = det[J] dx̂1 dx̂2 dx̂3 leads to

dx1 dx2 dx3 = h1 h2 h3 dx̂1 dx̂2 dx̂3.   (6.38)
6.3.8 Differential elements of area
Let dÂ1 denote a differential element of (vector) area on a x̂1-coordinate surface, so that

dÂ1 = ( dx̂2 ∂x/∂x̂2 ) × ( dx̂3 ∂x/∂x̂3 ).

In view of (6.21) this leads to dÂ1 = (h2 dx̂2 ê2) × (h3 dx̂3 ê3) = h2 h3 dx̂2 dx̂3 ê1. Thus the differential elements of (scalar) area on the x̂1-, x̂2- and x̂3-coordinate surfaces are given by

dÂ1 = h2 h3 dx̂2 dx̂3, dÂ2 = h3 h1 dx̂3 dx̂1, dÂ3 = h1 h2 dx̂1 dx̂2,   (6.39)

respectively.
6.4 Some Examples of Orthogonal Curvilinear Coordinate Systems

Circular Cylindrical Coordinates (r, θ, z):
x1 = r cos θ, x2 = r sin θ, x3 = z, for all (r, θ, z) ∈ [0, ∞) × [0, 2π) × (−∞, ∞);
h_r = 1, h_θ = r, h_z = 1.   (6.40)

Spherical Coordinates (r, θ, φ):
x1 = r sin θ cos φ, x2 = r sin θ sin φ, x3 = r cos θ, for all (r, θ, φ) ∈ [0, ∞) × [0, π] × [0, 2π);
h_r = 1, h_θ = r, h_φ = r sin θ.   (6.41)

Elliptical Cylindrical Coordinates (ξ, η, z):
x1 = a cosh ξ cos η, x2 = a sinh ξ sin η, x3 = z, for all (ξ, η, z) ∈ [0, ∞) × (−π, π] × (−∞, ∞);
h_ξ = h_η = a √(sinh²ξ + sin²η), h_z = 1.   (6.42)

Parabolic Cylindrical Coordinates (u, v, w):
x1 = ½ (u² − v²), x2 = u v, x3 = w, for all (u, v, w) ∈ [0, ∞) × (−∞, ∞) × (−∞, ∞);
h_u = h_v = √(u² + v²), h_w = 1.   (6.43)

6.5 Worked Examples.

Example 6.1: Let E(x) be a symmetric 2-tensor field that is related to a vector field u(x) through

E = ½ ( ∇u + ∇uᵀ ).
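The scale moduli listed above can be verified numerically from the definition h_i = |∂x/∂x̂_i|. A sketch (ours, not part of the notes; numpy assumed available) does this for the parabolic cylindrical coordinates (6.43):

```python
import numpy as np

# Numerical check of the scale moduli (6.43) for parabolic cylindrical
# coordinates: h_u = h_v = √(u² + v²), h_w = 1.
def x_of(q):
    u, v, w = q
    return np.array([0.5 * (u**2 - v**2), u * v, w])

def scale_moduli(q, step=1e-6):
    """h_i = |∂x/∂q_i| by central differences."""
    out = np.zeros(3)
    for i in range(3):
        e = np.zeros(3); e[i] = step
        out[i] = np.linalg.norm((x_of(q + e) - x_of(q - e)) / (2 * step))
    return out

q0 = np.array([1.2, 0.5, 2.0])
hs = scale_moduli(q0)
expected = np.array([np.hypot(1.2, 0.5), np.hypot(1.2, 0.5), 1.0])
```

The same routine applied to the mappings (6.40)-(6.42) reproduces the other moduli in the list.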
In a cartesian coordinate system this can be written equivalently as

E_ij = ½ ( ∂u_i/∂x_j + ∂u_j/∂x_i ).

Establish the analogous formulas in a general orthogonal curvilinear coordinate system.

Solution: Using the result from Subsection 6.3.2 and the formulas for ê_k · (∂ê_i/∂x̂_j), one finds after elementary simplification that

Ê11 = (1/h1) ∂û1/∂x̂1 + (1/(h1 h2)) (∂h1/∂x̂2) û2 + (1/(h1 h3)) (∂h1/∂x̂3) û3,
Ê12 = Ê21 = ½ [ (h1/h2) ∂/∂x̂2 ( û1/h1 ) + (h2/h1) ∂/∂x̂1 ( û2/h2 ) ],

with analogous expressions for the remaining components Ê22, Ê33, Ê23, Ê31 etc.

Example 6.2: Consider a symmetric 2-tensor field S(x) and a vector field b(x) that satisfy the equation div S + b = o. In a cartesian coordinate system this can be written equivalently as

∂S_ij/∂x_j + b_i = 0.

Establish the analogous formulas in a general orthogonal curvilinear coordinate system.

Solution: From the results in Subsection 6.3.6 we have

(1/(h1 h2 h3)) [ ∂(h2 h3 Ŝ11)/∂x̂1 + ∂(h3 h1 Ŝ12)/∂x̂2 + ∂(h1 h2 Ŝ13)/∂x̂3 ]
 + (1/(h1 h2)) (∂h1/∂x̂2) Ŝ12 + (1/(h1 h3)) (∂h1/∂x̂3) Ŝ13 − (1/(h1 h2)) (∂h2/∂x̂1) Ŝ22 − (1/(h1 h3)) (∂h3/∂x̂1) Ŝ33 + b̂1 = 0,   (i)

where b̂_i = Q_ip b_p, with analogous expressions for the second and third components.

Example 6.3: Consider circular cylindrical coordinates (x̂1, x̂2, x̂3) = (r, θ, z), which are related to (x1, x2, x3) through

x1 = r cos θ, x2 = r sin θ, x3 = z, 0 ≤ r < ∞, 0 ≤ θ < 2π, −∞ < z < ∞.   (i)

Let f(x) be a scalar-valued field, u(x) a vector-valued field, and S(x) a symmetric 2-tensor field. Express the following quantities in this coordinate system: (a) grad f, (b) ∇²f, (c) div u, (d) curl u, (e) ½(∇u + ∇uᵀ) and (f) div S.

[Figure 6.3: Cylindrical coordinates (r, θ, z) and the associated local curvilinear orthonormal basis {e_r, e_θ, e_z}.]

Solution: We simply need to specialize the basic results established in Section 6.3. In the present case the coordinate mapping (6.5) takes the particular form (i), so that

x = (r cos θ) e1 + (r sin θ) e2 + z e3.

The matrix [∂x_i/∂x̂_j] therefore specializes to

[ ∂x1/∂r  ∂x1/∂θ  ∂x1/∂z ;      [ cos θ  −r sin θ  0 ;
  ∂x2/∂r  ∂x2/∂θ  ∂x2/∂z ;   =    sin θ   r cos θ  0 ;
  ∂x3/∂r  ∂x3/∂θ  ∂x3/∂z ]        0        0       1 ],   (ii)

and the scale moduli are

h_r = √( (∂x1/∂r)² + (∂x2/∂r)² + (∂x3/∂r)² ) = 1,
h_θ = √( (∂x1/∂θ)² + (∂x2/∂θ)² + (∂x3/∂θ)² ) = r,
h_z = √( (∂x1/∂z)² + (∂x2/∂z)² + (∂x3/∂z)² ) = 1.   (iii)

On using (6.21) and (iii) we obtain the following expressions for the unit vectors associated with the local cylindrical coordinate system:

e_r = cos θ e1 + sin θ e2, e_θ = −sin θ e1 + cos θ e2, e_z = e3,

which, in this case, could also have been obtained geometrically from Figure 6.3. We use the natural notation (u_r, u_θ, u_z) = (û1, û2, û3) for the components of a vector field, (e_r, e_θ, e_z) = (ê1, ê2, ê3) for the unit vectors of the local cylindrical coordinate system, and (S_rr, S_rθ, ...) = (Ŝ11, Ŝ12, ...), (E_rr, E_rθ, ...) = (Ê11, Ê12, ...) for the components of 2-tensor fields.

(a) Substituting (i) and (iii) into (6.32) gives

∇f = (∂f/∂r) e_r + (1/r)(∂f/∂θ) e_θ + (∂f/∂z) e_z,

where we have set f(r, θ, z) = f(x1, x2, x3).

(b) Substituting (i) and (iii) into (6.35) gives

∇²f = ∂²f/∂r² + (1/r) ∂f/∂r + (1/r²) ∂²f/∂θ² + ∂²f/∂z².

(c) Substituting (i) and (iii) into (6.34) gives

div u = ∂u_r/∂r + u_r/r + (1/r) ∂u_θ/∂θ + ∂u_z/∂z.

(d) Substituting (i) and (iii) into (6.36) gives

curl u = ( (1/r) ∂u_z/∂θ − ∂u_θ/∂z ) e_r + ( ∂u_r/∂z − ∂u_z/∂r ) e_θ + ( ∂u_θ/∂r + u_θ/r − (1/r) ∂u_r/∂θ ) e_z.

(e) Set E = ½(∇u + ∇uᵀ). Substituting (i) and (iii) into (6.33) enables us to calculate ∇u, whence we can calculate E. Writing the cylindrical components Ê_ij of E as (E_rr, E_rθ, ...), one finds

E_rr = ∂u_r/∂r,
E_θθ = (1/r) ∂u_θ/∂θ + u_r/r,
E_zz = ∂u_z/∂z,
E_rθ = ½ ( (1/r) ∂u_r/∂θ + ∂u_θ/∂r − u_θ/r ),
E_θz = ½ ( ∂u_θ/∂z + (1/r) ∂u_z/∂θ ),
E_zr = ½ ( ∂u_z/∂r + ∂u_r/∂z ).

Alternatively these could have been obtained from the results of Example 6.1.

(f) Finally, substituting (i) and (iii) into (6.37) gives

div S = ( ∂S_rr/∂r + (1/r) ∂S_rθ/∂θ + ∂S_rz/∂z + (S_rr − S_θθ)/r ) e_r
      + ( ∂S_rθ/∂r + (1/r) ∂S_θθ/∂θ + ∂S_θz/∂z + 2 S_rθ/r ) e_θ
      + ( ∂S_zr/∂r + (1/r) ∂S_zθ/∂θ + ∂S_zz/∂z + S_zr/r ) e_z.

Alternatively these could have been obtained from the results of Example 6.2.

Example 6.4: Consider spherical coordinates (x̂1, x̂2, x̂3) = (r, θ, φ), which are related to (x1, x2, x3) through

x1 = r sin θ cos φ, x2 = r sin θ sin φ, x3 = r cos θ, 0 ≤ r < ∞, 0 ≤ θ ≤ π, 0 ≤ φ < 2π.

Let f(x) be a scalar-valued field, u(x) a vector-valued field, and S(x) a symmetric 2-tensor field. Express the following quantities
6. eθ . (b) 2 f (c) div u (d) curl u (e) 1 2 u+ uT and (f) div S in this coordinate system. 1 0 x2 = r sin θ. x3 = z.3: Cylindrical coordinates (r. x2 . The matrix [∂xi /∂ xj ] therefore specializes to ˆ ∂x2 /∂r ∂x3 /∂r ∂x1 /∂r ∂x1 /∂θ ∂x2 /∂θ ∂x3 /∂θ ∂x1 /∂z ∂x2 /∂z ∂x3 /∂z sin θ 0 cos θ −r sin θ r cos θ 0 0 . e3 . Solution: We simply need to specialize the basic results established in Section 6. θ. ez = er = 115 x3 ez eθ z r θ x1 er x2 Figure 6. eθ = − sin θ e1 + cos θ e2.5. cos θ e1 + sin θ e2. θ. In the present case we have (ˆ1 . (ii) (i) = .5) takes the particular form x1 = r cos θ. z) and the associated local curvilinear orthonormal basis {er .3. ez }. WORKED EXAMPLES. x3 ) = (r. z) x ˆ ˆ and the coordinate mapping (6.
. Srθ . (vi) (v) (iv) which. From (ii). ez = e3 . and therefore on using (6. .32) gives f= ∂f ∂r er + 1 ∂f r ∂θ eθ + ∂f ∂z ez where we have set f (r. u2 . (a) Substituting (i) and (iii) into (6. ORTHOGONAL CURVILINEAR COORDINATES hr hθ hz = = = ∂x1 ∂r ∂x1 ∂θ ∂x1 ∂z 2 + 2 ∂x2 ∂r ∂x2 ∂θ ∂x2 ∂z 2 + 2 ∂x3 ∂r ∂x3 ∂θ ∂x3 ∂z 2 = 2 1. . u3 ) u ˆ ˆ for the components of a vector ﬁeld.35) gives 2 f= ∂2f 1 ∂f 1 ∂2f ∂2f + + 2 2 + 2 2 ∂r r ∂r r ∂θ ∂z where we have set f (r. x2 . could have been obtained geometrically from Figure 6. z) = f (x1 .) = (S11 . . (b) Substituting (i) and (iii) into (6.116 and the scale moduli are CHAPTER 6. x = (r cos θ)e1 + (r sin θ)e2 + (z)e3 . in this case. e3 ) e ˆ ˆ for the unit vectors associated with the local cylindrical coordinate system. x2 . S12 .) for the components of a 2tensor ﬁeld. θ. . ez ) = (ˆ1 . uz ) = (ˆ1 .3. x3 ). eθ . x3 ). . (iii) + 2 + 2 = r. 2 + + = 1. z) = f (x1 . We use the natural notation (ur . θ. eθ = − sin θ e1 + cos θ e2 . e2 . and (er . .21) and (iii) we obtain the following expressions for the unit vectors associated with the local cylindrical coordinate system: er = cos θ e1 + sin θ e2 . and ˆ ˆ (Srr . uθ .
(c) Substituting (i) and (iii) into (6.34) gives
\[
\operatorname{div} u = \frac{\partial u_r}{\partial r} + \frac{u_r}{r} + \frac{1}{r}\frac{\partial u_\theta}{\partial \theta} + \frac{\partial u_z}{\partial z}.
\]
(d) Substituting (i) and (iii) into (6.36) gives
\[
\operatorname{curl} u = \left(\frac{1}{r}\frac{\partial u_z}{\partial \theta} - \frac{\partial u_\theta}{\partial z}\right) e_r
+ \left(\frac{\partial u_r}{\partial z} - \frac{\partial u_z}{\partial r}\right) e_\theta
+ \left(\frac{\partial u_\theta}{\partial r} + \frac{u_\theta}{r} - \frac{1}{r}\frac{\partial u_r}{\partial \theta}\right) e_z.
\]
(e) Set \(E = \frac{1}{2}(\nabla u + \nabla u^T)\). Substituting (i) and (iii) into (6.33) enables us to calculate \(\nabla u\), whence we can calculate E. Writing the cylindrical components \(\hat{E}_{ij}\) of E as \((E_{rr}, E_{r\theta}, \ldots) = (\hat{E}_{11}, \hat{E}_{12}, \ldots)\), one finds
\[
E_{rr} = \frac{\partial u_r}{\partial r}, \qquad
E_{\theta\theta} = \frac{1}{r}\frac{\partial u_\theta}{\partial \theta} + \frac{u_r}{r}, \qquad
E_{zz} = \frac{\partial u_z}{\partial z},
\]
\[
E_{r\theta} = \frac{1}{2}\left(\frac{1}{r}\frac{\partial u_r}{\partial \theta} + \frac{\partial u_\theta}{\partial r} - \frac{u_\theta}{r}\right), \qquad
E_{\theta z} = \frac{1}{2}\left(\frac{\partial u_\theta}{\partial z} + \frac{1}{r}\frac{\partial u_z}{\partial \theta}\right), \qquad
E_{zr} = \frac{1}{2}\left(\frac{\partial u_z}{\partial r} + \frac{\partial u_r}{\partial z}\right).
\]
Alternatively these could have been obtained from the results of Example 6.1.

(f) Finally, substituting (i) and (iii) into (6.37) gives
\[
\operatorname{div} S =
\left(\frac{\partial S_{rr}}{\partial r} + \frac{1}{r}\frac{\partial S_{r\theta}}{\partial \theta} + \frac{\partial S_{rz}}{\partial z} + \frac{S_{rr} - S_{\theta\theta}}{r}\right) e_r
+ \left(\frac{\partial S_{r\theta}}{\partial r} + \frac{1}{r}\frac{\partial S_{\theta\theta}}{\partial \theta} + \frac{\partial S_{\theta z}}{\partial z} + \frac{2S_{r\theta}}{r}\right) e_\theta
+ \left(\frac{\partial S_{zr}}{\partial r} + \frac{1}{r}\frac{\partial S_{z\theta}}{\partial \theta} + \frac{\partial S_{zz}}{\partial z} + \frac{S_{zr}}{r}\right) e_z.
\]
Alternatively these could have been obtained from the results of Example 6.2.

Example 6.4: Consider spherical coordinates \((\hat{x}_1, \hat{x}_2, \hat{x}_3) = (r, \theta, \phi)\) which are related to \((x_1, x_2, x_3)\) through
\[
x_1 = r\sin\theta\cos\phi, \quad x_2 = r\sin\theta\sin\phi, \quad x_3 = r\cos\theta, \qquad
0 \le r < \infty, \quad 0 \le \theta \le \pi, \quad 0 \le \phi < 2\pi.
\]
Let f(x) be a scalar-valued field, u(x) a vector-valued field, and S(x) a symmetric 2-tensor field. Express the following quantities in this coordinate system: (a) grad f, (b) \(\nabla^2 f\), (c) div u, (d) curl u, (e) \(\frac{1}{2}(\nabla u + \nabla u^T)\), and (f) div S.

Figure 6.4: Spherical coordinates \((r, \theta, \phi)\) and the associated local curvilinear orthonormal basis \(\{e_r, e_\theta, e_\phi\}\).

Solution: We simply need to specialize the basic results established in Section 6.3. In the present case we have \((\hat{x}_1, \hat{x}_2, \hat{x}_3) = (r, \theta, \phi)\) and the coordinate mapping (6.5) takes the particular form
\[
x_1 = r\sin\theta\cos\phi, \qquad x_2 = r\sin\theta\sin\phi, \qquad x_3 = r\cos\theta.
\tag{i}
\]
The matrix \([\partial x_i/\partial \hat{x}_j]\) therefore specializes to
\[
\begin{pmatrix}
\partial x_1/\partial r & \partial x_2/\partial r & \partial x_3/\partial r \\
\partial x_1/\partial \theta & \partial x_2/\partial \theta & \partial x_3/\partial \theta \\
\partial x_1/\partial \phi & \partial x_2/\partial \phi & \partial x_3/\partial \phi
\end{pmatrix}
=
\begin{pmatrix}
\sin\theta\cos\phi & \sin\theta\sin\phi & \cos\theta \\
r\cos\theta\cos\phi & r\cos\theta\sin\phi & -r\sin\theta \\
-r\sin\theta\sin\phi & r\sin\theta\cos\phi & 0
\end{pmatrix},
\tag{ii}
\]
and the scale moduli are
\[
h_r = 1, \qquad h_\theta = r, \qquad h_\phi = r\sin\theta.
\tag{iii}
\]
Since \(x = (r\sin\theta\cos\phi)e_1 + (r\sin\theta\sin\phi)e_2 + (r\cos\theta)e_3\), on using (ii), (6.21) and (iii) we obtain the following expressions for the unit vectors associated with the local spherical coordinate system:
\[
e_r = (\sin\theta\cos\phi)\, e_1 + (\sin\theta\sin\phi)\, e_2 + \cos\theta\, e_3, \qquad
e_\theta = (\cos\theta\cos\phi)\, e_1 + (\cos\theta\sin\phi)\, e_2 - \sin\theta\, e_3, \qquad
e_\phi = -\sin\phi\, e_1 + \cos\phi\, e_2,
\tag{iv}
\]
which, in this case, could have been obtained geometrically from Figure 6.4. We use the natural notation \((u_r, u_\theta, u_\phi) = (\hat{u}_1, \hat{u}_2, \hat{u}_3)\) for the components of a vector field and \((S_{rr}, S_{r\theta}, S_{r\phi}, \ldots) = (\hat{S}_{11}, \hat{S}_{12}, \hat{S}_{13}, \ldots)\) for the components of a 2-tensor field.

(a) Substituting (i) and (iii) into (6.32) gives
\[
\nabla f = \frac{\partial f}{\partial r}\, e_r + \frac{1}{r}\frac{\partial f}{\partial \theta}\, e_\theta + \frac{1}{r\sin\theta}\frac{\partial f}{\partial \phi}\, e_\phi,
\]
where we have set \(f(r, \theta, \phi) = f(x_1, x_2, x_3)\).

(b) Substituting (i) and (iii) into (6.35) gives
\[
\nabla^2 f = \frac{\partial^2 f}{\partial r^2} + \frac{2}{r}\frac{\partial f}{\partial r} + \frac{1}{r^2}\frac{\partial^2 f}{\partial \theta^2} + \frac{\cot\theta}{r^2}\frac{\partial f}{\partial \theta} + \frac{1}{r^2\sin^2\theta}\frac{\partial^2 f}{\partial \phi^2}.
\]
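A convenient check of the spherical Laplacian formula is to feed it harmonic functions, for which the result must vanish. A minimal Python sketch using finite differences (the two test fields 1/r and x₁ are arbitrary choices; both are harmonic away from the origin):

```python
import math

def laplacian_sph(f, r, th, ph, h=1e-4):
    # spherical formula: f_rr + (2/r) f_r + (1/r^2) f_tt
    #                    + (cot(th)/r^2) f_t + f_pp/(r sin(th))^2
    f0 = f(r, th, ph)
    f_rr = (f(r + h, th, ph) - 2*f0 + f(r - h, th, ph)) / h**2
    f_r  = (f(r + h, th, ph) - f(r - h, th, ph)) / (2*h)
    f_tt = (f(r, th + h, ph) - 2*f0 + f(r, th - h, ph)) / h**2
    f_t  = (f(r, th + h, ph) - f(r, th - h, ph)) / (2*h)
    f_pp = (f(r, th, ph + h) - 2*f0 + f(r, th, ph - h)) / h**2
    return (f_rr + 2*f_r/r + f_tt/r**2
            + f_t/(math.tan(th)*r**2) + f_pp/(r*math.sin(th))**2)

# harmonic test fields expressed in spherical coordinates
inv_r = lambda r, th, ph: 1.0/r                        # f = 1/|x|
x1    = lambda r, th, ph: r*math.sin(th)*math.cos(ph)  # f = x1

for g in (inv_r, x1):
    print(abs(laplacian_sph(g, 1.2, 0.9, 0.5)) < 1e-5)
```

The second test field exercises the θ- and φ-derivative terms, which the radially symmetric 1/r does not.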
(c) Substituting (i) and (iii) into (6.34) gives
\[
\operatorname{div} u = \frac{1}{r^2\sin\theta}\left[\frac{\partial}{\partial r}(r^2\sin\theta\, u_r) + \frac{\partial}{\partial \theta}(r\sin\theta\, u_\theta) + \frac{\partial}{\partial \phi}(r u_\phi)\right].
\]
(d) Substituting (i) and (iii) into (6.36) gives
\[
\operatorname{curl} u =
\frac{1}{r^2\sin\theta}\left[\frac{\partial}{\partial \theta}(r\sin\theta\, u_\phi) - \frac{\partial}{\partial \phi}(r u_\theta)\right] e_r
+ \frac{1}{r\sin\theta}\left[\frac{\partial u_r}{\partial \phi} - \frac{\partial}{\partial r}(r\sin\theta\, u_\phi)\right] e_\theta
+ \frac{1}{r}\left[\frac{\partial}{\partial r}(r u_\theta) - \frac{\partial u_r}{\partial \theta}\right] e_\phi.
\]
(e) Set \(E = \frac{1}{2}(\nabla u + \nabla u^T)\). We substitute (i) and (iii) into (6.33) to calculate \(\nabla u\), from which we can calculate E. Writing the spherical components \(\hat{E}_{ij}\) of E as \((E_{rr}, E_{r\theta}, E_{r\phi}, \ldots) = (\hat{E}_{11}, \hat{E}_{12}, \hat{E}_{13}, \ldots)\), one finds
\[
E_{rr} = \frac{\partial u_r}{\partial r}, \qquad
E_{\theta\theta} = \frac{1}{r}\frac{\partial u_\theta}{\partial \theta} + \frac{u_r}{r}, \qquad
E_{\phi\phi} = \frac{1}{r\sin\theta}\frac{\partial u_\phi}{\partial \phi} + \frac{u_r}{r} + \frac{\cot\theta}{r}\, u_\theta,
\]
\[
E_{r\theta} = \frac{1}{2}\left(\frac{1}{r}\frac{\partial u_r}{\partial \theta} + \frac{\partial u_\theta}{\partial r} - \frac{u_\theta}{r}\right), \qquad
E_{\theta\phi} = \frac{1}{2}\left(\frac{1}{r\sin\theta}\frac{\partial u_\theta}{\partial \phi} + \frac{1}{r}\frac{\partial u_\phi}{\partial \theta} - \frac{\cot\theta}{r}\, u_\phi\right), \qquad
E_{\phi r} = \frac{1}{2}\left(\frac{1}{r\sin\theta}\frac{\partial u_r}{\partial \phi} + \frac{\partial u_\phi}{\partial r} - \frac{u_\phi}{r}\right).
\]
Alternatively these could have been obtained from the results of Example 6.1.

(f) Finally, substituting (i) and (iii) into (6.37) gives
\[
\operatorname{div} S =
\left[\frac{\partial S_{rr}}{\partial r} + \frac{1}{r}\frac{\partial S_{r\theta}}{\partial \theta} + \frac{1}{r\sin\theta}\frac{\partial S_{r\phi}}{\partial \phi} + \frac{1}{r}\left(2S_{rr} - S_{\theta\theta} - S_{\phi\phi} + \cot\theta\, S_{r\theta}\right)\right] e_r
\]
\[
+ \left[\frac{\partial S_{r\theta}}{\partial r} + \frac{1}{r}\frac{\partial S_{\theta\theta}}{\partial \theta} + \frac{1}{r\sin\theta}\frac{\partial S_{\theta\phi}}{\partial \phi} + \frac{1}{r}\left(3S_{r\theta} + \cot\theta\,(S_{\theta\theta} - S_{\phi\phi})\right)\right] e_\theta
\]
\[
+ \left[\frac{\partial S_{r\phi}}{\partial r} + \frac{1}{r}\frac{\partial S_{\theta\phi}}{\partial \theta} + \frac{1}{r\sin\theta}\frac{\partial S_{\phi\phi}}{\partial \phi} + \frac{1}{r}\left(3S_{r\phi} + 2\cot\theta\, S_{\theta\phi}\right)\right] e_\phi.
\]
Alternatively these could have been obtained from the results of Example 6.2.

Example 6.5: Show that the matrix [Q] defined by (6.22) is a proper orthogonal matrix.

Proof: From (6.22),
\[
Q_{ij} = \frac{1}{h_i}\frac{\partial x_j}{\partial \hat{x}_i} = \frac{1}{h_i}\, J_{ji},
\]
where \(J_{ij} = \partial x_i/\partial \hat{x}_j\) are the elements of the Jacobian matrix, and therefore
\[
Q_{ik} Q_{jk} = \frac{1}{h_i h_j}\,\frac{\partial x_k}{\partial \hat{x}_i}\frac{\partial x_k}{\partial \hat{x}_j} = \frac{1}{h_i h_j}\, g_{ij} = \delta_{ij},
\]
where in the penultimate step we have used (6.14) and in the ultimate step we have used (6.16). Thus [Q] is an orthogonal matrix. Next,
\[
\det[Q] = \frac{1}{h_1 h_2 h_3}\det[J] > 0,
\]
where the inequality is a consequence of the inequalities in (6.7). Hence [Q] is proper orthogonal.

References

1. H. Reismann and P.S. Pawlik, Elasticity: Theory and Applications, Wiley, New York, 1980.
2. L.A. Segel, Mathematics Applied to Continuum Mechanics, Dover, New York, 1987.
3. E. Sternberg, (Unpublished) Lecture Notes for AM 135: Elasticity, California Institute of Technology, Pasadena, California, 1976.
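As a numerical footnote to Example 6.5: the proof can be mirrored concretely by building [J] for a particular coordinate system, normalizing its rows by the scale moduli, and checking QQᵀ = I and det[Q] = +1. A minimal Python sketch for spherical coordinates (finite differences stand in for the analytic derivatives; the evaluation point is arbitrary):

```python
import math

def xmap(r, th, ph):
    # the spherical coordinate mapping (x1, x2, x3)
    return (r*math.sin(th)*math.cos(ph),
            r*math.sin(th)*math.sin(ph),
            r*math.cos(th))

def Q_matrix(r, th, ph, h=1e-6):
    # J[i][j] = dx_j/dxhat_i by central differences; h_i = |row i| is the
    # scale modulus; Q has rows J[i]/h_i (the unit vectors e_r, e_th, e_ph)
    coords = [r, th, ph]
    Q = []
    for i in range(3):
        plus, minus = list(coords), list(coords)
        plus[i] += h
        minus[i] -= h
        row = [(a - b)/(2*h) for a, b in zip(xmap(*plus), xmap(*minus))]
        hi = math.sqrt(sum(v*v for v in row))
        Q.append([v/hi for v in row])
    return Q

def det3(M):
    return (M[0][0]*(M[1][1]*M[2][2] - M[1][2]*M[2][1])
          - M[0][1]*(M[1][0]*M[2][2] - M[1][2]*M[2][0])
          + M[0][2]*(M[1][0]*M[2][1] - M[1][1]*M[2][0]))

Q = Q_matrix(1.4, 0.8, 2.1)
QQt = [[sum(Q[i][k]*Q[j][k] for k in range(3)) for j in range(3)] for i in range(3)]
ok_orth = all(abs(QQt[i][j] - (i == j)) < 1e-8 for i in range(3) for j in range(3))
print(ok_orth, abs(det3(Q) - 1.0) < 1e-8)
```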
Chapter 7

Calculus of Variations

7.1 Introduction.

Numerous problems in physics can be formulated as mathematical problems in optimization. Most equilibrium theories of mechanics involve finding a configuration of a system that minimizes its energy. For example, a heavy cable that hangs under gravity between two fixed pegs adopts the shape that, from among all possible shapes, minimizes the gravitational potential energy of the system. If we dip a (nonplanar) wire loop into soapy water, the soap film that forms across the loop is the one that minimizes the surface energy (which under most circumstances equals minimizing the surface area of the soap film). Or, if we subject a straight beam to a compressive load, its deformed configuration is the shape which minimizes the total energy of the system; depending on the load, the energy-minimizing configuration may be straight or bent (buckled). In optics, Fermat's principle states that the path taken by a ray of light in propagating from one point to another is the path that minimizes the travel time. Another common problem occurs in geodesics where, given some surface and two points on it, we want to find the path of shortest distance joining those two points which lies entirely on the given surface.

In each of these problems we have a scalar-valued quantity F, such as energy or time, that depends on a function φ, such as the shape or path, and we want to find the function φ that minimizes the quantity F of interest. Note that the scalar-valued function F is defined on a set of functions. One refers to F as a functional and writes F{φ}.

As a specific example, consider the so-called Brachistochrone Problem. We are given two
points (0, 0) and (1, h) in the x, y-plane, with h > 0, that are to be joined by a smooth wire. A bead is released from rest from the point (0, 0) and slides along the wire due to gravity. For what shape of wire is the time of travel from (0, 0) to (1, h) least?

Figure 7.1: Curve y = φ(x) joining (0, 0) to (1, h) along which a bead slides under gravity; gravity acts in the direction of increasing y.

In order to formulate this problem, let y = φ(x), 0 ≤ x ≤ 1, describe a generic curve joining (0, 0) to (1, h). The travel time of the bead is
\[
T = \int \frac{ds}{v},
\]
where the integral is taken along the entire path. In the question posed to us, we are to find the curve, i.e. the function φ(x), which makes T a minimum. Since we are to minimize T by varying φ, it is natural to first rewrite the formula for T in a form that explicitly displays its dependency on φ.

Note first that, by elementary calculus, the arc length ds is related to dx by
\[
ds = \sqrt{dx^2 + dy^2} = \sqrt{1 + (dy/dx)^2}\;dx = \sqrt{1 + (\phi')^2}\;dx,
\]
and so we can write
\[
T = \int_0^1 \frac{\sqrt{1 + (\phi')^2}}{v}\;dx.
\]
Next, we wish to express the speed v in terms of φ. Let (x(t), y(t)) denote the coordinates of the bead at time t, and let s(t) denote the distance traveled by the bead along the wire at time t, so that v(t) = ds/dt is its corresponding speed. Since the bead is released from rest, the conservation of energy tells us that the sum of the potential and kinetic energies does not vary with time:
\[
-mg\,\phi(x(t)) + \tfrac{1}{2}\, m v^2(t) = 0,
\]
where the right-hand side is the total energy at the initial instant. Solving this for v gives
\[
v = \sqrt{2g\phi}.
\]
Finally, substituting this back into the formula for the travel time gives
\[
T\{\phi\} = \int_0^1 \sqrt{\frac{1 + (\phi')^2}{2g\,\phi}}\;dx.
\tag{7.1}
\]
Given a curve characterized by y = φ(x), this formula gives the corresponding travel time for the bead. Our task is to find, from among all such curves, the one that minimizes T{φ}.

In order to complete the formulation of the problem, we should carefully characterize the set of "admissible functions" (or "test functions") over which the minimization is to take place. Since we are only interested in curves that pass through the points (0, 0) and (1, h), we must require that φ(0) = 0, φ(1) = h. Moreover, for analytical reasons we only consider curves that are continuous and have a continuous slope, i.e. φ and φ′ are both continuous on [0, 1], written φ ∈ C¹[0, 1]. Thus the set A of admissible functions that we wish to consider is
\[
A = \left\{\phi(\cdot) \;\middle|\; \phi : [0,1] \to \mathbb{R},\ \phi \in C^1[0,1],\ \phi(0) = 0,\ \phi(1) = h\right\}.
\]
Our task is to minimize T{φ} over the set A.

Remark: One can consider various variants of the Brachistochrone Problem. For example, the position of the left-hand end might be prescribed as above, but the right-hand end of the wire might be allowed to lie anywhere on the vertical line through x = 1. Or, the length of the curve joining the two points might be prescribed, in which case the minimization is to be carried out subject to the constraint that the length is given. Or, there might be some prohibited region of the x, y-plane through which the path is disallowed from passing. And so on.

Remark: Since the shortest distance between two points is given by the straight line that joins them, it is natural to wonder whether a straight line is also the curve that gives the minimum travel time. To investigate this, consider (a) a straight line, and (b) a circular arc, joining (0, 0) to (1, h). Use (7.1) to calculate the travel time for each of these paths and show that the straight line is not the path that gives the least travel time.
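The comparison suggested in this remark can also be carried out numerically. A minimal Python sketch (midpoint quadrature, which handles the integrable singularity of (7.1) at x = 0; the choices h = 1 and, for the circular arc, the quarter circle centred at (1, 0) are arbitrary illustrations):

```python
import math

g, h = 9.81, 1.0   # h = 1: the wire drops one unit over a unit horizontal run

def travel_time(phi, dphi, n=200_000):
    # T = integral over [0,1] of sqrt((1 + phi'^2)/(2 g phi)) dx, midpoint rule
    dx = 1.0 / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * dx
        total += math.sqrt((1.0 + dphi(x)**2) / (2.0 * g * phi(x))) * dx
    return total

# (a) straight line y = h x
T_line = travel_time(lambda x: h*x, lambda x: h)

# (b) circular arc through (0,0) and (1,h): for h = 1, the quarter circle
#     centred at (1, 0) with radius 1, y = sqrt(2x - x^2)
T_circle = travel_time(lambda x: math.sqrt(2*x - x*x),
                       lambda x: (1 - x) / math.sqrt(2*x - x*x))

print(f"T_line   = {T_line:.4f} s")   # analytic: 2*sqrt((1+h^2)/(2gh)) = 0.6386...
print(f"T_circle = {T_circle:.4f} s")
print(T_circle < T_line)              # the straight line is not optimal
```

For the straight line the integral can also be done in closed form, which gives a useful check on the quadrature.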
In summary, in the simplest problem in the calculus of variations we are required to find a function φ(x) ∈ C¹[0, 1] that minimizes a functional F{φ} of the form
\[
F\{\phi\} = \int_0^1 f(x, \phi, \phi')\,dx
\tag{7.2}
\]
over an admissible set of test functions A. The test functions (or admissible functions) φ are subject to certain conditions, including smoothness requirements, possibly (but not necessarily) boundary conditions at both ends x = 0, 1, and possibly (but not necessarily) side constraints of various forms. Other types of problems will also be encountered in what follows.

7.2 Brief review of calculus.

Perhaps it is useful to begin by reviewing the familiar question of minimization in calculus. Consider a subset A of n-dimensional space Rⁿ and let F(x) = F(x₁, x₂, ..., xₙ) be a real-valued function defined on A. We say that x₀ ∈ A is a minimizer of F if¹
\[
F(x) \ge F(x_0) \quad \text{for all } x \in A.
\tag{7.3}
\]
Sometimes we are only interested in finding a "local minimizer", i.e. a point x₀ that minimizes F relative to all x that are "close" to x₀. In order to speak of such a notion we must have a measure of "closeness". Thus suppose that the vector space Rⁿ is Euclidean, so that a norm is defined on Rⁿ. Then we say that x₀ is a local minimizer of F if F(x) ≥ F(x₀) for all x in a neighborhood of x₀, i.e. if F(x) ≥ F(x₀) for all x such that |x − x₀| < r for some r > 0.

Define the function F̂(ε) for −ε₀ < ε < ε₀ by
\[
\hat{F}(\varepsilon) = F(x_0 + \varepsilon n),
\tag{7.4}
\]
where n is a fixed vector and ε₀ is small enough to ensure that x₀ + εn ∈ A for all ε ∈ (−ε₀, ε₀). Since F(x₀ + εn) ≥ F(x₀) it follows that F̂(ε) ≥ F̂(0). Thus if x₀ is to be a minimizer of F it is necessary that
\[
\hat{F}'(0) = 0,
\tag{7.5}
\]
\[
\hat{F}''(0) \ge 0.
\tag{7.6}
\]
In the presence of sufficient smoothness we can write
\[
\hat{F}(\varepsilon) - \hat{F}(0) = \hat{F}'(0)\,\varepsilon + \tfrac{1}{2}\hat{F}''(0)\,\varepsilon^2 + O(\varepsilon^3).
\tag{7.7}
\]

¹A maximizer of F is a minimizer of −F, so we do not need to address maximizing separately from minimizing.
It is customary to use the following notation and terminology: we set
\[
\delta F(x_0, n) = \hat{F}'(0),
\tag{7.8}
\]
which is called the first variation of F, and similarly set
\[
\delta^2 F(x_0, n) = \hat{F}''(0),
\tag{7.9}
\]
which is called the second variation of F. At an interior local minimizer x₀, one necessarily must have
\[
\delta F(x_0, n) = 0 \quad \text{and} \quad \delta^2 F(x_0, n) \ge 0 \quad \text{for all unit vectors } n.
\tag{7.10}
\]
In the present setting of calculus we know that δF(x₀, n) = ∇F(x₀)·n and that δ²F(x₀, n) = (∇∇F(x₀))n·n. Therefore (7.10) is equivalent to the requirements that
\[
\sum_{i=1}^{n}\left.\frac{\partial F}{\partial x_i}\right|_{x=x_0} n_i = 0
\quad \text{and} \quad
\sum_{i=1}^{n}\sum_{j=1}^{n}\left.\frac{\partial^2 F}{\partial x_i \partial x_j}\right|_{x=x_0} n_i n_j \ge 0,
\tag{7.11}
\]
or equivalently
\[
\nabla F(x_0)\cdot n = 0 \quad \text{and} \quad (\nabla\nabla F(x_0))\,n\cdot n \ge 0 \quad \text{for all unit vectors } n,
\tag{7.12}
\]
whence we must have ∇F(x₀) = o and the Hessian ∇∇F(x₀) must be positive semi-definite. Here the vector field ∇F is the gradient of F and the tensor field ∇∇F is the gradient of ∇F.

Remark: It is worth recalling that a function need not have a minimizer. For example, the function F₁(x) = x defined on A₁ = (−∞, ∞) is unbounded as x → ±∞. Another example is given by the function F₂(x) = x defined on A₂ = (−1, 1): noting that F₂ ≥ −1 on A₂, the value of F₂ can get as close as one wishes to −1, but it cannot actually achieve that value since there is no x ∈ A₂ at which F₂(x) = −1; note that −1 ∉ A₂. Finally, consider the function F₃(x) defined on A₃ = [−1, 1] by F₃(x) = 1 for −1 ≤ x ≤ 0 and F₃(x) = x for 0 < x ≤ 1. The value of F₃ can get as close as one wishes to 0 but cannot achieve it, since F₃(0) = 1. In the first example A₁ was unbounded. In the second, A₂ was bounded but open. And in the third example A₃ was bounded and closed but the function was discontinuous on A₃. In order for a minimizer to exist, A must be compact (i.e. bounded and closed). It can be shown that if A is compact and if F is continuous on A, then F assumes both maximum and minimum values on A.
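Conditions (7.10) are easy to visualize numerically: pick a smooth F with a known interior minimizer, form F̂(ε) = F(x₀ + εn), and difference it in ε. A minimal Python sketch (the quadratic F below is an arbitrary test function, not one from the text):

```python
import math

def F(x):
    # an arbitrary smooth test function with minimizer x0 = (1, -2)
    return (x[0] - 1.0)**2 + 3.0*(x[1] + 2.0)**2

x0 = (1.0, -2.0)

def variations(n, eps=1e-5):
    # dF = F_hat'(0) and d2F = F_hat''(0) for F_hat(e) = F(x0 + e n),
    # approximated by central differences in e
    Fp = F((x0[0] + eps*n[0], x0[1] + eps*n[1]))
    Fm = F((x0[0] - eps*n[0], x0[1] - eps*n[1]))
    return (Fp - Fm)/(2*eps), (Fp - 2*F(x0) + Fm)/eps**2

for k in range(8):                      # unit vectors n at several angles
    a = 2*math.pi*k/8
    d1, d2 = variations((math.cos(a), math.sin(a)))
    print(abs(d1) < 1e-9, d2 >= 0.0)    # first variation 0, second variation >= 0
```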
i.128 CHAPTER 7. For a function φ in the set of functions that are continuous on an interval [x1 . to ﬁnd φo ∈ A for which F {φ} ≥ F {φo } y for all φ ∈ A. (Of course the norm φ0 can also be used on C 1 [x1 . x1 ≤x≤x2 For a function φ in the set of functions that are continuous and have continuous ﬁrst derivatives on [x1 . for φ ∈ C 1 [x1 . b) (1 y = φ1(x) (0.e.e. CALCULUS OF VARIATIONS 7. y = φ2(x) (1. x2 ]. Most often. x2 ] one can deﬁne a norm by φ1 = max φ(x) + x1 ≤x≤x2 x1 ≤x≤x2 max φ (x).3 The basic idea: necessary conditions for a minimum: δF = 0.2: Two functions φ1 and φ2 that are “close” in the sense of the norm  · 0 but not in the sense of the norm  · 1 . we are typically given a functional F deﬁned on a function space A. i. (0 a) 0 x 1 Figure 7. and we are asked to ﬁnd a function φo ∈ A that minimizes F over A: i. we will be looking for a local (or relative) minimizer. one can deﬁne a norm by φ0 = max φ(x). In the calculus of variations.e. This requires that we select a norm so that the distance between two functions can be quantiﬁed. x2 ]. for a function φ0 that minimizes F relative to all “nearby functions”. for φ ∈ C[x1 . x2 ].) . i. where F : A → R. and so on. δ 2F ≥ 0.e. x2 ].
When seeking a local minimizer of a functional F we might say we want to find φ₀ for which
\[
F\{\phi\} \ge F\{\phi_0\} \quad \text{for all admissible } \phi \text{ such that } \|\phi - \phi_0\|_0 < r, \text{ for some } r > 0.
\]
In this case the minimizer φ₀ is being compared with all admissible functions φ whose values are close to those of φ₀ for all x₁ ≤ x ≤ x₂. Such a local minimizer is called a strong minimizer. On the other hand, when seeking a local minimizer we might say we want to find φ₀ for which
\[
F\{\phi\} \ge F\{\phi_0\} \quad \text{for all admissible } \phi \text{ such that } \|\phi - \phi_0\|_1 < r, \text{ for some } r > 0.
\]
In this case the minimizer is being compared with all functions whose values and whose first derivatives are close to those of φ₀ for all x₁ ≤ x ≤ x₂. Such a local minimizer is called a weak minimizer. A strong minimizer is automatically a weak minimizer. Unless explicitly stated otherwise, in this Chapter we will be examining weak local extrema.

The approach for finding such extrema of a functional is essentially the same as that used in the more familiar case of calculus reviewed in the preceding subsection. Consider a functional F{φ} defined on a function space A and suppose that φ₀ ∈ A minimizes F. In order to determine φ₀ we consider the one-parameter family of admissible functions
\[
\phi(x, \varepsilon) = \phi_0(x) + \varepsilon\,\eta(x),
\tag{7.13}
\]
where ε is a real variable in the range −ε₀ < ε < ε₀ and η(x) is a once continuously differentiable function. Since φ is to be admissible, we must have φ₀ + εη ∈ A for each ε ∈ (−ε₀, ε₀). Define a function F̂(ε) by
\[
\hat{F}(\varepsilon) = F\{\phi_0 + \varepsilon\eta\}, \qquad -\varepsilon_0 < \varepsilon < \varepsilon_0.
\tag{7.14}
\]
Since φ₀ minimizes F, it follows that F{φ₀ + εη} ≥ F{φ₀}, or equivalently F̂(ε) ≥ F̂(0). Therefore ε = 0 minimizes F̂(ε), and so if φ₀ minimizes F, then it is necessary that
\[
\delta F\{\phi_0, \eta\} = 0, \qquad \delta^2 F\{\phi_0, \eta\} \ge 0,
\tag{7.15}
\]
where the first and second variations of F are defined by δF{φ₀, η} = F̂′(0) and δ²F{φ₀, η} = F̂″(0) respectively. These are necessary conditions on a minimizer φ₀. We cannot go further in general; however, in any specific problem, such as those in the subsequent sections, the necessary condition δF{φ₀, η} = 0 can be further simplified by exploiting the fact that it must hold for all admissible η. This allows one to eliminate η, leading to a condition (or conditions) that involves only the minimizer φ₀.
Remark: Note that when η is independent of ε, the functions φ₀(x) and φ₀(x) + εη(x), and their derivatives, are close to each other for small ε. On the other hand, the functions φ₀(x) and φ₀(x) + ε sin(x/ε) are close to each other but their derivatives are not. Throughout these notes we will consider functions η that are independent of ε, and so, as noted previously, we will be restricting attention exclusively to weak minimizers.

Figure 7.3: The minimizer φ₀ and a neighboring function φ₀ + εη.

7.4 Application of the necessary condition δF = 0 to the basic problem. Euler equation.

7.4.1 The basic problem. Euler equation.

Consider the following class of problems: let A be the set of all continuously differentiable functions φ(x) defined for 0 ≤ x ≤ 1 with φ(0) = a, φ(1) = b:
\[
A = \left\{\phi(\cdot) \;\middle|\; \phi : [0,1] \to \mathbb{R},\ \phi \in C^1[0,1],\ \phi(0) = a,\ \phi(1) = b\right\}.
\tag{7.16}
\]
Let f(x, y, z) be a given function, defined and smooth for all real x, y, z. Define a functional F{φ}, for every φ ∈ A, by
\[
F\{\phi\} = \int_0^1 f(x, \phi(x), \phi'(x))\,dx.
\tag{7.17}
\]
We wish to find a function φ ∈ A which minimizes F{φ}.
Suppose that φ₀(x) ∈ A is a minimizer of F, so that F{φ} ≥ F{φ₀} for all φ ∈ A. In order to determine φ₀ we consider the one-parameter family of admissible functions
\[
\phi(x, \varepsilon) = \phi_0(x) + \varepsilon\,\eta(x),
\]
where ε is a real variable in the range −ε₀ < ε < ε₀ and η(x) is a once continuously differentiable function on [0, 1]. Since φ must be admissible, we need φ₀ + εη ∈ A for each ε. Therefore we must have φ(0, ε) = a and φ(1, ε) = b, which in turn requires that
\[
\eta(0) = \eta(1) = 0.
\tag{7.18}
\]
Pick any function η(x) with the property (7.18) and fix it. Define the function F̂(ε) = F{φ₀ + εη}, so that
\[
\hat{F}(\varepsilon) = F\{\phi_0 + \varepsilon\eta\} = \int_0^1 f(x, \phi_0 + \varepsilon\eta, \phi_0' + \varepsilon\eta')\,dx.
\tag{7.19}
\]
We know from the analysis of the preceding section that a necessary condition for φ₀ to minimize F is that
\[
\delta F\{\phi_0, \eta\} = \hat{F}'(0) = 0.
\tag{7.20}
\]
On using the chain rule, we find F̂′(ε) from (7.19) to be
\[
\hat{F}'(\varepsilon) = \int_0^1 \left[\frac{\partial f}{\partial y}(x, \phi_0 + \varepsilon\eta, \phi_0' + \varepsilon\eta')\,\eta + \frac{\partial f}{\partial z}(x, \phi_0 + \varepsilon\eta, \phi_0' + \varepsilon\eta')\,\eta'\right] dx,
\]
and so (7.20) leads to
\[
\delta F\{\phi_0, \eta\} = \hat{F}'(0) = \int_0^1 \left[\frac{\partial f}{\partial y}(x, \phi_0, \phi_0')\,\eta + \frac{\partial f}{\partial z}(x, \phi_0, \phi_0')\,\eta'\right] dx = 0.
\tag{7.21}
\]
Thus far we have simply repeated the general analysis of the preceding section in the context of the particular functional (7.17). Our goal is to find φ₀, and so we must eliminate η from (7.21). To do this we rearrange the terms in (7.21) into a convenient form and exploit the fact that (7.21) must hold for all functions η that satisfy (7.18). Integrating the second term in (7.21) by parts gives
\[
\int_0^1 \frac{\partial f}{\partial z}\,\eta'\,dx = \left[\frac{\partial f}{\partial z}\,\eta\right]_{x=0}^{x=1} - \int_0^1 \frac{d}{dx}\!\left(\frac{\partial f}{\partial z}\right)\eta\,dx.
\]
However, by (7.18) we have η(0) = η(1) = 0, and therefore the first term on the right-hand side drops out. Thus (7.21) reduces to
\[
\int_0^1 \left[\frac{\partial f}{\partial y} - \frac{d}{dx}\!\left(\frac{\partial f}{\partial z}\right)\right]\eta\,dx = 0.
\tag{7.22}
\]
Though we have viewed η as fixed up to this point, we recognize that the above derivation is valid for all once continuously differentiable functions η(x) which have η(0) = η(1) = 0. Therefore (7.22) must hold for all such functions.

Lemma: The following is a basic result from calculus. Let p(x) be a continuous function on [0, 1] and suppose that
\[
\int_0^1 p(x)\, n(x)\,dx = 0
\]
for all continuous functions n(x) with n(0) = n(1) = 0. Then p(x) = 0 for 0 ≤ x ≤ 1.

In view of this Lemma, we conclude that the integrand of (7.22) must vanish, and therefore obtain the differential equation
\[
\frac{d}{dx}\!\left[\frac{\partial f}{\partial z}(x, \phi_0, \phi_0')\right] - \frac{\partial f}{\partial y}(x, \phi_0, \phi_0') = 0 \quad \text{for } 0 \le x \le 1.
\tag{7.23}
\]
This is a differential equation for φ₀, which together with the boundary conditions φ₀(0) = a, φ₀(1) = b provides the mathematical problem governing the minimizer φ₀(x). The differential equation (7.23) is referred to as the Euler equation (sometimes the Euler-Lagrange equation) associated with the functional (7.17).

Notation: In order to avoid the (precise though) cumbersome notation above, we shall drop the subscript "0" from the minimizing function φ₀, and moreover, we shall write the Euler equation (7.23) as
\[
\frac{d}{dx}\!\left[\frac{\partial f}{\partial \phi'}(x, \phi, \phi')\right] - \frac{\partial f}{\partial \phi}(x, \phi, \phi') = 0,
\tag{7.25}
\]
where, in carrying out the partial differentiation in (7.25), one treats x, φ and φ′ as if they were independent variables.
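A classic sanity check of the Euler equation: for the arc-length functional, f(x, φ, φ′) = √(1 + (φ′)²), so ∂f/∂φ = 0 and (7.25) reduces to d/dx[φ′/√(1 + (φ′)²)] = 0, i.e. φ′ = constant: the straight line. The following Python sketch corroborates this numerically, showing that admissible perturbations of the straight line only increase the functional (the perturbation shape and amplitudes are arbitrary choices):

```python
import math

def length(phi, dphi, n=10_000):
    # F{phi} = integral over [0,1] of sqrt(1 + phi'^2) dx, midpoint rule
    dx = 1.0/n
    return sum(math.sqrt(1.0 + dphi((k + 0.5)*dx)**2)*dx for k in range(n))

a, b = 0.0, 2.0
straight = length(lambda x: a + (b - a)*x, lambda x: b - a)

# any perturbation eta with eta(0) = eta(1) = 0 increases the length
for amp in (0.05, 0.2, 0.5):
    bent = length(lambda x: a + (b - a)*x + amp*math.sin(math.pi*x),
                  lambda x: (b - a) + amp*math.pi*math.cos(math.pi*x))
    print(bent > straight)
```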
7.4.2 An example. The Brachistochrone Problem.

Consider the Brachistochrone Problem formulated in the first example of Section 7.1. Here we have
\[
f(x, \phi, \phi') = \sqrt{\frac{1 + (\phi')^2}{2g\,\phi}},
\]
and we wish to find the function φ₀(x) that minimizes
\[
F\{\phi\} = \int_0^1 f(x, \phi(x), \phi'(x))\,dx = \int_0^1 \sqrt{\frac{1 + [\phi'(x)]^2}{2g\,\phi(x)}}\;dx
\]
over the class of functions φ(x) that are continuous and have continuous first derivatives on [0, 1] and satisfy the boundary conditions φ(0) = 0, φ(1) = h.

Treating x, φ and φ′ as if they are independent variables and differentiating the function f(x, φ, φ′) gives
\[
\frac{\partial f}{\partial \phi'} = \frac{\phi'}{\sqrt{2g\,\phi\,(1 + (\phi')^2)}},
\qquad
\frac{\partial f}{\partial \phi} = -\sqrt{\frac{1 + (\phi')^2}{2g}}\;\frac{1}{2\,\phi^{3/2}},
\]
and therefore, after cancelling a common factor of \(1/\sqrt{2g}\), the Euler equation (7.23) specializes to
\[
\frac{d}{dx}\!\left[\frac{\phi'}{\sqrt{\phi\,(1 + (\phi')^2)}}\right] + \frac{\sqrt{1 + (\phi')^2}}{2\,\phi^{3/2}} = 0, \qquad 0 < x < 1,
\tag{7.26}
\]
with the associated boundary conditions
\[
\phi(0) = 0, \qquad \phi(1) = h.
\tag{7.27}
\]
The minimizer φ(x) therefore must satisfy the boundary-value problem consisting of the second-order (nonlinear) ordinary differential equation (7.26) and the boundary conditions (7.27). The rest of this subsection has nothing to do with the calculus of variations; it is simply concerned with solving the boundary-value problem (7.26), (7.27).

The differential equation (7.26) can be immediately integrated once to give
\[
\phi(x)\left[1 + (\phi'(x))^2\right] = c^2, \qquad \text{equivalently} \qquad (\phi'(x))^2 = \frac{c^2 - \phi(x)}{\phi(x)},
\]
where c is a constant of integration that is to be determined. It is most convenient to find the path of fastest descent in parametric form, x = x(θ), φ = φ(θ), θ₁ < θ < θ₂, and to this end we adopt the substitution
\[
\phi = \frac{c^2}{2}\,(1 - \cos\theta) = c^2 \sin^2(\theta/2), \qquad \theta_1 < \theta < \theta_2.
\tag{7.28}
\]
Differentiating (7.28) with respect to x gives \(\phi'(x) = \frac{c^2}{2}\sin\theta\;\theta'(x)\); together with the first integral above, this leads to
\[
\frac{dx}{d\theta} = \frac{c^2}{2}\,(1 - \cos\theta),
\tag{7.29}
\]
which integrates to give
\[
x = \frac{c^2}{2}\,(\theta - \sin\theta) + c_1.
\tag{7.30}
\]
The requirement φ(x) = 0 at x = 0, together with (7.28) and (7.30), gives us θ₁ = 0 and c₁ = 0. We thus have
\[
x = \frac{c^2}{2}\,(\theta - \sin\theta), \qquad \phi = \frac{c^2}{2}\,(1 - \cos\theta), \qquad 0 \le \theta \le \theta_2.
\tag{7.31}
\]
The remaining boundary condition φ(x) = h at x = 1 gives the following two equations for finding the two constants θ₂ and c:
\[
1 = \frac{c^2}{2}\,(\theta_2 - \sin\theta_2), \qquad h = \frac{c^2}{2}\,(1 - \cos\theta_2).
\tag{7.32}
\]
Once this pair of equations is solved for c and θ₂, then (7.31) provides the solution of the problem.

We now address the solvability of (7.32). First, by dividing the first equation in (7.32) by the second, we see that, if we define the function p(θ) by
\[
p(\theta) = \frac{\theta - \sin\theta}{1 - \cos\theta}, \qquad 0 < \theta < 2\pi,
\tag{7.33}
\]
then θ₂ is a root of the equation
\[
p(\theta_2) = h^{-1}.
\tag{7.34}
\]
One can readily verify that the function p(θ) has the properties
\[
p \to 0 \text{ as } \theta \to 0^+, \qquad p \to \infty \text{ as } \theta \to 2\pi^-, \qquad
\frac{dp}{d\theta} = \frac{\cos(\theta/2)}{\sin^3(\theta/2)}\left(\tan(\theta/2) - \theta/2\right) > 0 \quad \text{for } 0 < \theta < 2\pi.
\]
Figure 7.4: A graph of the function p(θ) defined in (7.33) versus θ. Note that given any h > 0 the equation h⁻¹ = p(θ) has a unique root θ = θ₂ ∈ (0, 2π).

Therefore it follows that as θ goes from 0 to 2π, the function p(θ) increases monotonically from 0 to ∞. Therefore, given any h > 0, the equation p(θ₂) = h⁻¹ can be solved for a unique value of θ₂ ∈ (0, 2π). The value of c is then given by (7.32)₁.

Thus, in summary, the path of minimum descent is given by the curve defined in (7.31), with the values of θ₂ and c given by (7.34) and (7.32)₁ respectively. Figure 7.5 shows that the curve (7.31) is a cycloid: the path traversed by a point P on the rim of a wheel of radius R that rolls without slipping, the parameter θ having the significance of being the angle of rolling.

Figure 7.5: A cycloid x = x(θ), y = y(θ) is generated by rolling a circle of radius R along the x-axis as shown: x(θ) = R(θ − sin θ), y(θ) = R(1 − cos θ).
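Equations (7.32)-(7.34) are straightforward to solve numerically: since p(θ) is strictly increasing, bisection finds θ₂, and either equation of (7.32) then gives c² (the sketch below uses the second). A minimal Python sketch:

```python
import math

def p(theta):
    # p(theta) = (theta - sin theta)/(1 - cos theta), increasing on (0, 2 pi)
    return (theta - math.sin(theta)) / (1.0 - math.cos(theta))

def solve_cycloid(h, tol=1e-12):
    # find theta2 with p(theta2) = 1/h by bisection, then c^2 from (7.32)_2;
    # endpoints avoid the 0/0 limit of p at theta = 0
    lo, hi = 1e-6, 2*math.pi - 1e-6
    while hi - lo > tol:
        mid = 0.5*(lo + hi)
        if p(mid) < 1.0/h:
            lo = mid
        else:
            hi = mid
    th2 = 0.5*(lo + hi)
    c2 = 2.0*h/(1.0 - math.cos(th2))
    return th2, c2

h = 1.0
th2, c2 = solve_cycloid(h)      # for h = 1, theta2 is about 2.412
x_end = 0.5*c2*(th2 - math.sin(th2))   # should be 1
y_end = 0.5*c2*(1.0 - math.cos(th2))   # should be h
print(abs(x_end - 1.0) < 1e-9, abs(y_end - h) < 1e-9)
```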
7.4.3 A Formalism for Deriving the Euler Equation

In order to expedite the steps involved in deriving the Euler equation, one usually uses the following formal procedure. First, we adopt the following notation: if H{φ} is any quantity that depends on φ, then by δH we mean²
\[
\delta H = H\{\phi + \varepsilon\eta\} - H\{\phi\} \ \text{up to linear terms; that is,} \quad
\delta H = \varepsilon \left.\frac{d}{d\varepsilon} H\{\phi + \varepsilon\eta\}\right|_{\varepsilon=0}.
\tag{7.35}
\]
For example, by δφ we mean
\[
\delta\phi = (\phi + \varepsilon\eta) - (\phi) = \varepsilon\eta,
\tag{7.36}
\]
and by δφ′ we mean
\[
\delta\phi' = (\phi' + \varepsilon\eta') - (\phi') = \varepsilon\eta' = (\delta\phi)'.
\tag{7.37}
\]
We refer to δφ(x) as an admissible variation. When η(0) = η(1) = 0, it follows that
\[
\delta\phi(0) = \delta\phi(1) = 0.
\tag{7.38}
\]
By δf we mean
\[
\delta f = f(x, \phi + \varepsilon\eta, \phi' + \varepsilon\eta') - f(x, \phi, \phi')
= \frac{\partial f}{\partial \phi}(x, \phi, \phi')\,\varepsilon\eta + \frac{\partial f}{\partial \phi'}(x, \phi, \phi')\,\varepsilon\eta'
= \frac{\partial f}{\partial \phi}\,\delta\phi + \frac{\partial f}{\partial \phi'}\,\delta\phi',
\tag{7.39}
\]
and by δF we mean
\[
\delta F = F\{\phi + \varepsilon\eta\} - F\{\phi\}
= \varepsilon \left.\frac{d}{d\varepsilon} F\{\phi + \varepsilon\eta\}\right|_{\varepsilon=0}
= \varepsilon \int_0^1 \left(\frac{\partial f}{\partial \phi}\,\eta + \frac{\partial f}{\partial \phi'}\,\eta'\right) dx
= \int_0^1 \left(\frac{\partial f}{\partial \phi}\,\delta\phi + \frac{\partial f}{\partial \phi'}\,\delta\phi'\right) dx.
\tag{7.40}
\]

²Note the following minor change in notation: what we call δH here is what we previously would have called ε δH.
47) and keep (7.138 CHAPTER 7.49) x=1 for all admissible variations δφ. (7.47) for all admissible variations δφ(x). φ(1) = b. Since δφ(0) and δφ(1) are both arbitrary (and not necessarily zero). 1] (7. First restrict attention to all variations δφ with the additional property δφ(0) = δφ(1) = 0. ∂φ (7. Note that the boundary term in (7.44) We begin by calculating the ﬁrst variation of F : 1 1 1 δF = δ 0 f dx = 0 δf dx = 0 ∂f ∂φ δφ + ∂f ∂φ δφ dx (7.48) in mind.43) Note that the class of admissible functions A is much larger than before. we must have 1 0 ∂f d − ∂φ dx ∂f ∂φ δφ dx + ∂f δφ ∂φ 1 = 0 0 (7. We see that we must have ∂f ∂φ δφ(1) − ∂f ∂φ δφ(0) = 0 x=0 (7. (7.45) Integrating the last term by parts yields 1 δF = 0 d ∂f − ∂φ dx ∂f ∂φ δφ dx + ∂f δφ ∂φ 1 . So the set of admissible functions A is A = φ(·) φ : [0. The boundary terms now drop out and by the Lemma in Section 7. return to (7. 1] → R.47) does not automatically drop out now because δφ(0) and δφ(1) do not have to vanish. CALCULUS OF VARIATIONS a minimum. The functional F {φ} is deﬁned for all φ ∈ A by 1 F {φ} = f (x.46) Since δF = 0 at an extremum.47) must necessarily hold for all such variations δφ. equation (7. φ. 0 (7.49) requires that ∂f = 0 at x = 0 and x = 1.50) ∂φ . Note that we do not restrict attention here to those functions that satisfy φ(0) = a. φ ∈ C 1 [0.4.1 it follows that d ∂f dx ∂φ − ∂f = 0 for 0 < x < 1. φ ) dx. 0 (7. Next.48) This is the same Euler equation as before.
0 1 x g y = φ(x) y Figure 7. 1] → R. φ(0) = 0 . recall that f (x. To ﬁnd the natural boundary condition at the other end. 1]. Our task is to minimize the travel time of the bead T {φ} over the set A2 .50). GENERALIZATIONS. 2gφ and so by (7.50) provides the boundary conditions to be satisﬁed by the extremizing function φ(x). 1 + (φ )2 . 139 Equation (7. the natural boundary coundition is = 0 at x = 1.7. φ ) = Diﬀerentiating this gives ∂f = ∂φ φ 2gφ(1 + (φ )2 ) φ 2gφ(1 + (φ )2 ) . These boundary conditions were determined as part of the extremization. φ ∈ C 1 [0. . note that there is no restriction on φ at x = 1.5. and the same boundary condition φ(0) = 0 at the left hand end. 0) and ends somewhere on the vertical through x = 1.6.6: Curve joining (0.26) as in the ﬁrst problem. φ. Example: Reconsider the Brachistochrone Problem analyzed previously but now suppose that we want to ﬁnd the shape of the wire that commences from (0. The minimizer must satisfy the same Euler equation (7. The only diﬀerence between this and the ﬁrst Brachistochrone Problem is that here the set of admissible functions is A2 = φ(·) φ : [0. see Figure 7. 0) to an arbitrary point on the vertical line through x = 1. they are referred to as natural boundary conditions in contrast to boundary conditions that are given as part of a problem statement.
which simplifies to

φ′(1) = 0.

7.5.2 Generalization: Higher derivatives.

The functional F{φ} considered above involved a function φ and its first derivative φ′. One can consider functionals that involve higher derivatives of φ, for example

F{φ} = ∫₀¹ f(x, φ, φ′, φ″) dx.

We begin with the formulation and analysis of a specific example and then turn to some theory.

Example: The Bernoulli–Euler Beam. Consider an elastic beam of length L and bending stiffness EI, which is clamped at its left-hand end. The beam carries a distributed load p(x) along its length and a concentrated force F at the right-hand end x = L; both loads act in the −y-direction. In the classical Bernoulli–Euler theory of beams, cross-sections remain perpendicular to the neutral axis.

Figure 7.7: The neutral axis of a beam in reference (straight) and deformed (curved) states. The bold lines represent a cross-section of the beam in the reference and deformed states. (Moment–curvature relation M = EIφ′ with φ = u′; energy per unit length = ½Mφ′ = ½EI(u″)².)

Let u(x), 0 ≤ x ≤ L, be a geometrically admissible deflection of the beam. Since the beam is clamped at the left-hand end, this means that u(x) is any (smooth enough) function that satisfies the geometric boundary conditions

u(0) = 0,  u′(0) = 0.   (7.51)
The set of admissible test functions that we consider is

A = { u(·) | u : [0, L] → R, u ∈ C⁴[0, L], u(0) = 0, u′(0) = 0 },   (7.52)

which consists of all "geometrically possible configurations". The boundary condition (7.51)₁ describes the geometric condition that the beam is clamped at x = 0 and therefore cannot deflect at that point; the boundary condition (7.51)₂ describes the geometric condition that the beam is clamped at x = 0 and therefore cannot rotate at the left end.

From elasticity theory we know that the elastic energy associated with a deformed configuration of the beam is (1/2)EI(u″)² per unit length. Therefore the total potential energy of the system is

Φ{u} = ∫₀ᴸ ½ EI(u″(x))² dx − ∫₀ᴸ p(x) u(x) dx − F u(L),   (7.53)

where the last two terms represent the potential energy of the distributed and concentrated loading respectively; the negative sign in front of these terms arises because the loads act in the −y-direction while u is the deflection in the +y-direction. Note that the integrand of the functional involves the higher derivative term u″.

The actual deflection of the beam minimizes the potential energy (7.53) over the set (7.52). We proceed in the usual way by calculating the first variation δΦ and setting it equal to zero: δΦ = 0. By using (7.53) this can be written explicitly as

∫₀ᴸ EI u″ δu″ dx − ∫₀ᴸ p δu dx − F δu(L) = 0.

The given boundary conditions (7.51) require that an admissible variation δu must obey

δu(0) = 0,  δu′(0) = 0.

Twice integrating the first term by parts leads to

∫₀ᴸ EI u⁗ δu dx − ∫₀ᴸ p δu dx − F δu(L) − [ EI u‴ δu ]₀ᴸ + [ EI u″ δu′ ]₀ᴸ = 0.

Since δu(0) = 0 and δu′(0) = 0, the preceding equation simplifies to

∫₀ᴸ (EI u⁗ − p) δu dx − [ EI u‴(L) + F ] δu(L) + EI u″(L) δu′(L) = 0.
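Since δu is arbitrary in (0, L), and δu(L), δu′(L) are arbitrary, each bracketed term above must vanish separately, giving EI u⁗ = p with EI u‴(L) = −F and EI u″(L) = 0. A sketch, not from the text, that checks these relations against the classical cantilever solution for constant EI, p and F; the polynomial coefficients are obtained here by integrating EI u⁗ = p four times and fitting the clamped and natural boundary conditions.

```python
import numpy as np

# Sketch (assumptions: constant EI, constant p, tip force F; cantilever clamped at x = 0).
# Field equation: EI u'''' = p; clamped end: u(0) = u'(0) = 0;
# natural BCs: EI u'''(L) = -F (shear) and EI u''(L) = 0 (moment-free tip).
EI, p, F, L = 2.0, 3.0, 5.0, 1.5

# u(x) = p x^4/(24 EI) + a x^3/6 + b x^2/2  (clamping kills the lower-order terms)
a = -(F + p * L) / EI                 # from EI u'''(L) = -F
b = (F * L + 0.5 * p * L**2) / EI     # from EI u''(L) = 0

u = np.polynomial.Polynomial([0.0, 0.0, b / 2, a / 6, p / (24 * EI)])

# Verify the field equation and the natural boundary conditions.
xs = np.linspace(0.0, L, 7)
assert np.allclose(EI * u.deriv(4)(xs), p)
assert np.isclose(EI * u.deriv(3)(L), -F)
assert np.isclose(EI * u.deriv(2)(L), 0.0)

# Tip deflection matches the classical cantilever formula.
print(u(L), F * L**3 / (3 * EI) + p * L**4 / (8 * EI))
```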
Since this must hold for all admissible variations δu(x), it follows in the usual way that the extremizing function u(x) must obey

EI u⁗(x) − p(x) = 0  for 0 < x < L,
EI u‴(L) + F = 0,   (7.54)
EI u″(L) = 0.

The natural boundary condition (7.54)₂ describes the mechanical condition that the beam carries a concentrated force F at the right-hand end; the natural boundary condition (7.54)₃ describes the mechanical condition that the beam is free to rotate (and therefore has zero "bending moment") at the right-hand end.

Thus the extremizer u(x) obeys the fourth-order linear ordinary differential equation (7.54)₁, the prescribed boundary conditions (7.51), and the natural boundary conditions (7.54)₂,₃.

Exercise: Consider the functional

F{φ} = ∫₀¹ f(x, φ, φ′, φ″) dx

defined on the set of admissible functions A consisting of functions φ that are defined and four times continuously differentiable on [0, 1] and that satisfy the four boundary conditions

φ(0) = φ₀,  φ′(0) = φ₀′,  φ(1) = φ₁,  φ′(1) = φ₁′.

Show that the function φ that extremizes F over the set A must satisfy the Euler equation

∂f/∂φ − d/dx(∂f/∂φ′) + d²/dx²(∂f/∂φ″) = 0  for 0 < x < 1,

where, as before, the partial derivatives ∂f/∂φ, ∂f/∂φ′ and ∂f/∂φ″ are calculated by treating φ, φ′ and φ″ as if they were independent variables in f(x, φ, φ′, φ″).

7.5.3 Generalization: Multiple functions.

The functionals F{φ} considered above involved a single function φ and its derivatives. One can consider functionals that involve multiple functions, for example a functional

F{u, v} = ∫₀¹ f(x, u, u′, v, v′) dx.
We begin with the formulation and analysis of a specific example and then turn to some theory.

Example: The Timoshenko Beam. Consider a beam of length L, bending stiffness⁵ EI and shear stiffness GA. The beam is clamped at x = 0, it carries a distributed load p(x) along its length which acts in the −y-direction, and it carries a concentrated force F at the end x = L, also in the −y-direction.

In the simplest model of a beam, the so-called Bernoulli–Euler model, the deformed state of the beam is completely defined by the deflection u(x) of the centerline (the neutral axis) of the beam. In that theory, shear deformations are neglected, and therefore a cross-section of the beam remains perpendicular to the neutral axis even in the deformed state. Here we discuss a more general theory of beams, one that accounts for shear deformations. In the theory considered here, a cross-section of the beam is not constrained to remain perpendicular to the neutral axis.

Figure 7.8: The neutral axis of a beam in reference (straight) and deformed (curved) states. The bold lines represent a cross-section of the beam in the reference and deformed states. The thin line is perpendicular to the deformed neutral axis, so that in the classical Bernoulli–Euler theory, where cross-sections remain perpendicular to the neutral axis, the thin line and the bold line would coincide. The angle between the neutral axis and the horizontal is u′; the angle between the vertical and the bold line is φ. The decrease in the angle between the cross-section and the neutral axis is therefore u′ − φ.

⁵ E and G are the Young's modulus and shear modulus of the material, while I and A are the second moment of the cross-section and the area of the cross-section respectively.

Thus a deformed state of the beam is characterized by two
fields: one, u(x), characterizes the deflection of the centerline of the beam at a location x, and the second, φ(x), characterizes the rotation of the cross-section at x. (In the Bernoulli–Euler model, φ(x) = u′(x), since for small angles the rotation equals the slope.) In the more accurate beam theory discussed here, the so-called Timoshenko beam theory, one does not neglect shear deformations, and so u(x) and φ(x) are (geometrically) independent functions.

The fact that the left-hand end is clamped implies that the point x = 0 cannot deflect and that the cross-section at x = 0 cannot rotate. Thus we have the geometric boundary conditions

u(0) = 0,  φ(0) = 0.   (7.55)

Note that the zero-rotation boundary condition is φ(0) = 0 and not u′(0) = 0.

Since the shear strain is defined as the change in angle between two fibers that are initially at right angles to each other, the shear strain in the present situation is

γ(x) = u′(x) − φ(x);

see Figure 7.8. Observe that in the Bernoulli–Euler theory γ(x) = 0.

The basic equations of elasticity tell us that the moment–curvature relation for bending is

M(x) = EI φ′(x).

Figure 7.9: Basic constitutive relationships for a beam: shear force S = GAγ and bending moment M = EIφ′.
φ : [0. is 1 GA(γ(x))2 . We allow for the possibility that p. φ(0) = 0}.9 are free of shear traction. GENERALIZATIONS. this is taken into account approximately by replacing GA by κGA where the heuristic parameter κ ≈ 0.55).56) where the last two terms in this expression represent the potential energy of the distributed and concentrated loading respectively (and the negative signs arise because u is the deﬂection in the +ydirection while the loadings p and F are applied in the −ydirection). Since the top and bottom faces of the diﬀerential element shown in Figure 7. To ﬁnd a minimizer of Φ we calculate its ﬁrst variation which from (7. 2 The total potential energy of the system is thus L Φ = Φ{u. and that the associated elastic energy per unit length of the beam. u(0) = 0. Note that all admissible functions are required to satisfy the geometric boundary conditions (7. The displacement and rotation ﬁelds u(x) and φ(x) associated with an equilibrium conﬁguration of the beam minimizes the potential energy Φ{u. φ(·) u : [0. u ∈ C 2 ([0. we know that the element is not in a state of simple shear. 2 145 Similarly. is 1 EI(φ (x))2 . we know from elasticity that the shear forceshear strain relation for a beam is6 S(x) = GAγ(x) and that the associated elastic energy per unit length of the beam. EI and GA may vary along the length of the beam and therefore might be functions of x.8 − 0. (7. the shear stress must vary with y such that it vanishes at the top and bottom. φ} over the admissible set A of test functions where take A = {u(·). (1/2)M φ . Instead.9. L]).5. In engineering practice. (1/2)Sγ. L]). .7. φ ∈ C 2 ([0. l] → R.56) is L L δΦ = 0 6 EIφ δφ + GA(u − φ)(δu − δφ) dx − 0 p δu dx − F δu(L). l] → R. φ} = 0 1 1 EI(φ (x))2 + GA(u (x) − φ(x))2 2 2 L dx − 0 pu(x)dx − F u(L).
Integrating the terms involving δu′ and δφ′ by parts gives

δΦ = [ EIφ′ δφ ]₀ᴸ − ∫₀ᴸ d/dx(EIφ′) δφ dx + [ GA(u′ − φ) δu ]₀ᴸ − ∫₀ᴸ d/dx(GA(u′ − φ)) δu dx − ∫₀ᴸ GA(u′ − φ) δφ dx − ∫₀ᴸ p δu dx − F δu(L).

Finally, on using the facts that an admissible variation must satisfy δu(0) = 0 and δφ(0) = 0, and collecting the like terms, the preceding equation leads to

δΦ = EIφ′(L) δφ(L) + [ GA(u′(L) − φ(L)) − F ] δu(L)
  − ∫₀ᴸ [ d/dx(EIφ′) + GA(u′ − φ) ] δφ(x) dx
  − ∫₀ᴸ [ d/dx(GA(u′ − φ)) + p ] δu(x) dx.   (7.57)

At a minimizer, we have δΦ = 0 for all admissible variations. Since the variations δu(x), δφ(x) are arbitrary on 0 < x < L, and since δu(L) and δφ(L) are also arbitrary, it follows from (7.57) that the field equations

d/dx [ EIφ′ ] + GA(u′ − φ) = 0,  0 < x < L,
d/dx [ GA(u′ − φ) ] + p = 0,  0 < x < L,   (7.58)

and the natural boundary conditions

EIφ′(L) = 0,  GA( u′(L) − φ(L) ) = F,   (7.59)

must hold.

Thus in summary, an equilibrium configuration of the beam is described by the deflection u(x) and rotation φ(x) that satisfy the differential equations (7.58), the boundary conditions (7.55), and the natural boundary conditions (7.59).

[Remark: Can you recover the Bernoulli–Euler theory from the Timoshenko theory in the limit as the shear rigidity GA → ∞?]
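A sketch, not from the text, of the closed-form solution of (7.58)–(7.59) for constant EI and GA with p = 0, which also answers the Remark numerically: as GA → ∞ the tip deflection approaches the Bernoulli–Euler value FL³/(3EI).

```python
import numpy as np

# Sketch (assumptions: constant EI and GA, p = 0, tip force F).
# From (7.58)_2 with p = 0: GA(u' - phi) = const = F, using the natural BC (7.59)_2.
# Then (7.58)_1 gives EI phi'' = -F, and with phi(0) = 0, EI phi'(L) = 0:
#   phi(x) = F (L x - x^2/2) / EI
#   u(x)   = F (L x^2/2 - x^3/6) / EI + F x / GA   (from u' = phi + F/GA, u(0) = 0)
EI, F, L = 2.0, 5.0, 1.5

def tip_deflection(GA):
    return F * L**3 / (3 * EI) + F * L / GA  # bending part + shear part

# Check EI phi'' = -F by finite differences on the stated phi.
x = np.linspace(0.0, L, 201)
phi = F * (L * x - 0.5 * x**2) / EI
d2phi = np.gradient(np.gradient(phi, x), x)

# Shear makes the Timoshenko beam more compliant; the Bernoulli-Euler
# value is recovered in the rigid-shear limit GA -> infinity.
bernoulli_euler = F * L**3 / (3 * EI)
for GA in (1.0, 1e2, 1e6):
    print(GA, tip_deflection(GA) - bernoulli_euler)
```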
Exercise: Consider a smooth function f(x, y₁, y₂, ..., yₙ, z₁, z₂, ..., zₙ) defined for all x, y₁, ..., yₙ, z₁, ..., zₙ. Let φ₁(x), φ₂(x), ..., φₙ(x) be n once-continuously differentiable functions on [0, 1], with φᵢ(0) = aᵢ, φᵢ(1) = bᵢ, (i = 1, 2, ..., n). Let F be the functional defined by

F{φ₁, φ₂, ..., φₙ} = ∫₀¹ f(x, φ₁, ..., φₙ, φ₁′, ..., φₙ′) dx   (7.60)

on the set of all such admissible functions. Show that the functions φ₁(x), φ₂(x), ..., φₙ(x) that extremize F must necessarily satisfy the n Euler equations

d/dx(∂f/∂φᵢ′) − ∂f/∂φᵢ = 0  for 0 < x < 1,  i = 1, 2, ..., n.   (7.61)

7.5.4 Generalization: End point of extremal lying on a curve.

Consider the set A of all functions that describe curves in the x, y-plane that commence from a given point (0, a) and end at some point on the curve G(x, y) = 0; see Figure 7.10. We wish to minimize a functional F{φ} over this set of functions.

Figure 7.10: Curve y = φ(x), and a neighboring curve y = φ(x) + δφ(x), joining (0, a) to an arbitrary point on the given curve G(x, y) = 0; the curves intersect G = 0 at the abscissae xR and xR + δxR respectively.

Let x = xR be the abscissa of the point at which the curve y = φ(x) intersects the curve G(x, y) = 0. Observe that xR is not known a priori and is to be determined along with φ. Moreover, note that the abscissa of the point at which a neighboring curve defined by y = φ(x) + δφ(x) intersects the curve G = 0 is not xR but xR + δxR.

Suppose that φ(x) ∈ A is a minimizer of F. At the minimizer,

F{φ} = ∫₀^{xR} f(x, φ, φ′) dx,  0 ≤ x ≤ xR,
and at a neighboring test function,

F{φ + δφ} = ∫₀^{xR + δxR} f(x, φ + δφ, φ′ + δφ′) dx.

Therefore, on calculating the first variation δF, which equals the linearized form of F{φ + δφ} − F{φ}, we find

δF = ∫_{xR}^{xR + δxR} f(x, φ, φ′) dx + ∫₀^{xR} [ fφ(x, φ, φ′) δφ + fφ′(x, φ, φ′) δφ′ ] dx,

where we have set fφ = ∂f/∂φ and fφ′ = ∂f/∂φ′. This in turn reduces to

δF = f(xR, φ(xR), φ′(xR)) δxR + ∫₀^{xR} ( fφ δφ + fφ′ δφ′ ) dx.

Thus setting the first variation δF equal to zero gives

f(xR, φ(xR), φ′(xR)) δxR + ∫₀^{xR} ( fφ δφ + fφ′ δφ′ ) dx = 0.   (7.62)

After integrating the last term by parts we get

f(xR, φ(xR), φ′(xR)) δxR + [ fφ′ δφ ]₀^{xR} + ∫₀^{xR} ( fφ − d/dx fφ′ ) δφ dx = 0,

which, on using the fact that δφ(0) = 0, reduces to

f(xR, φ(xR), φ′(xR)) δxR + fφ′(xR, φ(xR), φ′(xR)) δφ(xR) + ∫₀^{xR} ( fφ − d/dx fφ′ ) δφ dx = 0.

First limit attention to the subset of all test functions that terminate at the same point (xR, φ(xR)) as the minimizer. In this case δxR = 0 and δφ(xR) = 0, and so the first two terms vanish. Since this specialized version of the equation must hold for all such variations δφ(x), this leads to the Euler equation

fφ − d/dx fφ′ = 0,  0 < x < xR.   (7.63)

We now return to arbitrary admissible test functions. Substituting (7.63) into the preceding equation gives

f(xR, φ(xR), φ′(xR)) δxR + fφ′(xR, φ(xR), φ′(xR)) δφ(xR) = 0.   (7.64)
It is important to observe that, since admissible test curves must end on the curve G = 0, the quantities δxR and δφ(xR) are not independent of each other. Thus (7.64) does not hold for all δxR and δφ(xR), only for those that are consistent with this geometric requirement.

The requirement that the minimizing curve and the neighboring test curve terminate on the curve G(x, y) = 0 implies that

G(xR, φ(xR)) = 0,  G(xR + δxR, φ(xR + δxR) + δφ(xR + δxR)) = 0.

Note that linearization gives

G(xR + δxR, φ(xR + δxR) + δφ(xR + δxR)) = G(xR + δxR, φ(xR) + φ′(xR) δxR + δφ(xR))
 = G(xR, φ(xR)) + Gx(xR, φ(xR)) δxR + Gy(xR, φ(xR)) [ φ′(xR) δxR + δφ(xR) ],

where we have set Gx = ∂G/∂x and Gy = ∂G/∂y. Setting δG = G(xR + δxR, φ(xR + δxR) + δφ(xR + δxR)) − G(xR, φ(xR)) = 0 thus leads to the following relation between the variations δxR and δφ(xR):

[ Gx(xR, φ(xR)) + φ′(xR) Gy(xR, φ(xR)) ] δxR + Gy(xR, φ(xR)) δφ(xR) = 0.   (7.65)

Under these circumstances, (7.64) must hold for all δxR and δφ(xR) that satisfy (7.65). This implies that⁷, for some constant λ (referred to as a Lagrange multiplier),

f(xR, φ(xR), φ′(xR)) − λ [ Gx(xR, φ(xR)) + φ′(xR) Gy(xR, φ(xR)) ] = 0,
fφ′(xR, φ(xR), φ′(xR)) − λ Gy(xR, φ(xR)) = 0.

We can use the second equation above to simplify the first equation, which then leads to the pair of equations

f(xR, φ(xR), φ′(xR)) − φ′(xR) fφ′(xR, φ(xR), φ′(xR)) − λ Gx(xR, φ(xR)) = 0,
fφ′(xR, φ(xR), φ′(xR)) − λ Gy(xR, φ(xR)) = 0.   (7.66)

⁷ It may be helpful to recall from calculus that if we are to minimize a function I(ε₁, ε₂), we must satisfy the condition dI = (∂I/∂ε₁)dε₁ + (∂I/∂ε₂)dε₂ = 0. But if this minimization is carried out subject to the side constraint J(ε₁, ε₂) = 0, then we must respect the side condition dJ = (∂J/∂ε₁)dε₁ + (∂J/∂ε₂)dε₂ = 0, i.e. we may only use increments dε₁, dε₂ that are consistent with the constraint. Under these circumstances, one finds that one must require the conditions ∂I/∂ε₁ = λ∂J/∂ε₁, ∂I/∂ε₂ = λ∂J/∂ε₂, where the Lagrange multiplier λ is unknown and is also to be determined. The constraint equation J = 0 provides the extra condition required for this purpose.
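The finite-dimensional statement in the footnote can be checked directly. A sketch, with an example chosen here rather than taken from the text: minimize I(ε₁, ε₂) = ε₁² + 2ε₂² subject to J(ε₁, ε₂) = ε₁ + ε₂ − 1 = 0, and verify that at the constrained minimum the gradients of I and J are parallel, both components yielding the same multiplier λ.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Example chosen for illustration (not from the text):
# minimize I(e1, e2) = e1^2 + 2 e2^2 subject to J(e1, e2) = e1 + e2 - 1 = 0.
# Eliminating e2 = 1 - e1 turns this into a one-variable minimization.
res = minimize_scalar(lambda t: t**2 + 2 * (1 - t)**2)
e1 = res.x
e2 = 1 - e1

grad_I = np.array([2 * e1, 4 * e2])
grad_J = np.array([1.0, 1.0])

# At the constrained minimum, grad I = lambda * grad J:
# both components give the same Lagrange multiplier.
lam = grad_I / grad_J
print(e1, e2, lam)
```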
Equation (7.66) provides two natural boundary conditions at the right-hand end x = xR. (Note that the presence of the additional unknown λ is compensated for by the imposition of the additional condition G(xR, φ(xR)) = 0.)

Thus in summary: an extremal φ(x) must satisfy the differential equation (7.63) on 0 ≤ x ≤ xR, the boundary condition φ = a at x = 0, the two natural boundary conditions (7.66) at x = xR, and the equation G(xR, φ(xR)) = 0.

Example: Suppose that G(x, y) = c₁x + c₂y + c₃ and that we are to find the curve of shortest length that commences from (0, a) and ends on G = 0. Since ds = √(dx² + dy²) = √(1 + (φ′)²) dx, we are to minimize the functional

F = ∫₀^{xR} √(1 + (φ′)²) dx.

Thus

f(x, φ, φ′) = √(1 + (φ′)²),  fφ(x, φ, φ′) = 0,  fφ′(x, φ, φ′) = φ′ / √(1 + (φ′)²).

The Euler equation (7.63) can be integrated immediately to give

φ′(x) = constant  for 0 ≤ x ≤ xR.   (7.67)

The boundary condition at the left-hand end is φ(0) = a. On using (7.67), the two natural boundary conditions (7.66) at the right-hand end give

1 / √(1 + φ′²(xR)) = λ c₁,  φ′(xR) / √(1 + φ′²(xR)) = λ c₂,

and dividing the second of these by the first gives φ′(xR) = c₂/c₁. Finally, the condition G(xR, φ(xR)) = 0 requires that

c₁ xR + c₂ φ(xR) + c₃ = 0.

Solving the preceding equations leads to the minimizer

φ(x) = (c₂/c₁) x + a  for 0 ≤ x ≤ xR,  where xR = − c₁ (ac₂ + c₃) / (c₁² + c₂²);

this is the straight line through (0, a) perpendicular to the line c₁x + c₂y + c₃ = 0.
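This answer agrees with elementary geometry: the closest point of a line is the foot of the perpendicular dropped from (0, a). A sketch, not from the text and with illustrative values chosen here, checking the formula for xR against the standard projection formula:

```python
import numpy as np

# Check (illustrative values chosen here): the terminal abscissa xR from the
# transversality conditions equals the x-coordinate of the foot of the
# perpendicular dropped from (0, a) onto the line c1*x + c2*y + c3 = 0.
c1, c2, c3, a = 2.0, -1.0, 3.0, 0.5

xR = -c1 * (a * c2 + c3) / (c1**2 + c2**2)

# Standard projection: foot = P - (c . P + c3)/|c|^2 * c, with P = (0, a).
P = np.array([0.0, a])
c = np.array([c1, c2])
foot = P - (c @ P + c3) / (c @ c) * c

print(xR, foot[0])

# The extremal y = (c2/c1) x + a indeed passes through the foot.
print((c2 / c1) * foot[0] + a, foot[1])
```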
7.6 Constrained Minimization

7.6.1 Integral constraints.

Consider a problem of the following general form: find admissible functions φ₁(x), φ₂(x) that minimize

F{φ₁, φ₂} = ∫₀¹ f(x, φ₁(x), φ₂(x), φ₁′(x), φ₂′(x)) dx   (7.68)

subject to the constraint

G{φ₁, φ₂} = ∫₀¹ g(x, φ₁(x), φ₂(x), φ₁′(x), φ₂′(x)) dx = 0.   (7.69)

For reasons of clarity we shall return to the more detailed approach, where we introduce parameters ε₁, ε₂ and functions η₁(x), η₂(x), rather than following the formal approach using variations δφ₁(x), δφ₂(x). Accordingly, suppose that the pair φ₁(x), φ₂(x) is the minimizer. By evaluating F and G on a family of neighboring admissible functions φ₁(x) + ε₁η₁(x), φ₂(x) + ε₂η₂(x), we have

F̂(ε₁, ε₂) = F{φ₁ + ε₁η₁, φ₂ + ε₂η₂},  Ĝ(ε₁, ε₂) = G{φ₁ + ε₁η₁, φ₂ + ε₂η₂} = 0.   (7.70)

If we begin by keeping η₁ and η₂ fixed, this is a classical minimization problem for a function of two variables: we are to minimize the function F̂(ε₁, ε₂) with respect to the variables ε₁ and ε₂, subject to the constraint Ĝ(ε₁, ε₂) = 0. A necessary condition for this is that

dF̂ = (∂F̂/∂ε₁) dε₁ + (∂F̂/∂ε₂) dε₂ = 0   (7.71)

for all dε₁, dε₂ that are consistent with the constraint. If we didn't have the constraint, then (7.71) would imply the usual requirements ∂F̂/∂ε₁ = ∂F̂/∂ε₂ = 0. However, because of the constraint, dε₁ and dε₂ cannot be varied independently; instead the constraint requires that they be related by

dĜ = (∂Ĝ/∂ε₁) dε₁ + (∂Ĝ/∂ε₂) dε₂ = 0.   (7.72)

When (7.72) holds, (7.71) only requires that

∂F̂/∂ε₁ = λ ∂Ĝ/∂ε₁,  ∂F̂/∂ε₂ = λ ∂Ĝ/∂ε₂   (7.73)
for some constant λ, or equivalently

∂/∂ε₁ (F̂ − λĜ) = 0,  ∂/∂ε₂ (F̂ − λĜ) = 0.   (7.74)

Therefore minimizing F̂ subject to the constraint Ĝ = 0 is equivalent to minimizing F̂ − λĜ without regard to the constraint. Proceeding from here on leads to the Euler equation associated with F − λG. The constant λ is known as a Lagrange multiplier; the presence of the additional unknown parameter λ is balanced by the availability of the constraint equation G = 0.

Example: Consider a heavy inextensible cable of mass per unit length m that hangs under gravity. The two ends of the cable are held at the same vertical height, a distance 2H apart. The cable has a given length L. We know from physics that the cable adopts a shape that minimizes the potential energy. We are asked to determine this shape.

(Figure: the cable y = φ(x), −H ≤ x ≤ H, hanging under gravity g.)

Let y = φ(x), −H ≤ x ≤ H, describe an admissible shape of the cable. The potential energy of the cable is determined by integrating mgφ with respect to the arc length s along the cable, which, since ds = √(dx² + dy²) = √(1 + (φ′)²) dx, is given by

V{φ} = ∫₀ᴸ mgφ ds = mg ∫₋ᴴᴴ φ √(1 + (φ′)²) dx.   (7.75)

Since the cable is inextensible, its length

ℓ{φ} = ∫₀ᴸ ds = ∫₋ᴴᴴ √(1 + (φ′)²) dx   (7.76)

must equal L. Therefore we are asked to find a function φ(x), with φ(−H) = φ(H), that minimizes V{φ} subject to the constraint ℓ{φ} = L. According to the theory developed
above, this function must satisfy the Euler equation associated with the functional V{φ} − λmg ℓ{φ}, where the Lagrange multiplier λ is a constant. Calculating the first variation of V − λmg ℓ leads to the Euler equation

d/dx [ (φ − λ) φ′ / √(1 + (φ′)²) ] − √(1 + (φ′)²) = 0,  −H < x < H.   (7.77)

This can be integrated once to yield

(φ′)² = (φ − λ)²/c² − 1,

where c is a constant of integration. Integrating this again leads to

φ = φ(x) = c cosh[(x + d)/c] + λ,  −H < x < H,   (7.78)

where d is a second constant of integration. For symmetry, we must have φ′(0) = 0 and therefore d = 0. Thus

φ(x) = c cosh(x/c) + λ,  −H < x < H.

The constant λ in (7.77) is simply a reference height. For example, we could take the x-axis to pass through the two pegs, in which case φ(±H) = 0; then λ = −c cosh(H/c), and so

φ(x) = c [ cosh(x/c) − cosh(H/c) ],  −H < x < H.

Substituting (7.78) into the constraint condition ℓ{φ} = L gives

L = 2c sinh(H/c).   (7.79)

Thus in summary, given H and L, if equation (7.79) can be solved for c, then (7.78) gives the equation describing the shape of the cable. The resulting boundary value problem together with the constraint ℓ = L yields the shape of the cable φ(x).

All that remains is to examine the solvability of (7.79). To this end set z = H/c and μ = L/(2H). Then we must solve μz = sinh z, where μ > 1 is a constant. (The requirement μ > 1 follows from the physical necessity that the distance between the pegs, 2H, be less than the length of the rope, L.) One can show that as z increases from 0 to ∞, the function sinh z − μz starts from the value 0, decreases monotonically to some finite negative value at some z = z∗ > 0, and then increases monotonically to ∞. Thus for each μ > 1 the function sinh z − μz vanishes at some unique positive value of z, and consequently (7.79) has a unique root c > 0.
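A sketch, not from the text, that solves (7.79) numerically for given H and L (values chosen here for illustration) and confirms that the resulting catenary has the prescribed length:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.integrate import quad

# Illustrative values chosen here: pegs 2H = 2 apart, cable length L = 3.
H, L = 1.0, 3.0
mu = L / (2 * H)  # mu > 1 is required: the cable is longer than the span

# Solve sinh(z) = mu*z for the unique root z > 0, then c = H/z.
z = brentq(lambda z: np.sinh(z) - mu * z, 1e-6, 20.0)
c = H / z

def phi(x):
    # Catenary normalized so that phi(+-H) = 0; it sags below the pegs.
    return c * (np.cosh(x / c) - np.cosh(H / c))

# Arc length of the solution should reproduce the constraint length L.
length, _ = quad(lambda x: np.sqrt(1 + np.sinh(x / c)**2), -H, H)
print(c, length, 2 * c * np.sinh(H / c))
```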
7.6.2 Algebraic constraints.

Now consider a problem of the following general type: find a pair of admissible functions φ₁(x), φ₂(x) that minimizes

∫₀¹ f(x, φ₁, φ₂, φ₁′, φ₂′) dx

subject to the algebraic constraint

g(x, φ₁(x), φ₂(x)) = 0  for 0 ≤ x ≤ 1.

One can show that a necessary condition is that the minimizer should satisfy the Euler equation associated with f − λg. In this problem the Lagrange multiplier λ may be a function of x.

Example: Consider a conical surface characterized by

g(x₁, x₂, x₃) = x₁² + x₂² − R²(x₃) = 0,  R(x₃) = x₃ tan α,  x₃ > 0.

Let P = (p₁, p₂, p₃) and Q = (q₁, q₂, q₃), q₃ > p₃, be two points on this surface. A smooth wire lies entirely on the conical surface and joins the points P and Q. A bead slides along the wire under gravity, beginning at rest from P. From among all such wires, we are to find the one that gives the minimum travel time.

Figure 7.11: A curve that joins the points (p₁, p₂, p₃) and (q₁, q₂, q₃) and lies on the conical surface x₁² + x₂² − x₃² tan²α = 0; gravity g acts along the x₃-axis.

Suppose that the wire can be described parametrically by

x₁ = φ₁(x₃),  x₂ = φ₂(x₃)  for p₃ ≤ x₃ ≤ q₃.

(Not all of the permissible curves can be described this way, and so by using
this characterization we are limiting ourselves to a subset of all the permitted curves.) Since the curve has to lie on the conical surface, it is necessary that

g(φ₁(x₃), φ₂(x₃), x₃) = 0,  p₃ ≤ x₃ ≤ q₃.   (7.80)

The conservation of energy tells us that ½ m v²(t) − m g x₃(t) = −m g p₃, or

v = √(2g(x₃ − p₃)).

The travel time is found by integrating ds/v along the path. The arc length ds along the path is given by

ds = √(dx₁² + dx₂² + dx₃²) = √((φ₁′)² + (φ₂′)² + 1) dx₃.

Therefore the travel time is

T{φ₁, φ₂} = ∫_{p₃}^{q₃} √((φ₁′)² + (φ₂′)² + 1) / √(2g(x₃ − p₃)) dx₃.

Our task is to minimize T{φ₁, φ₂} over the set of admissible functions

A = { (φ₁, φ₂) | φᵢ : [p₃, q₃] → R, φᵢ ∈ C²[p₃, q₃], φᵢ(p₃) = pᵢ, φᵢ(q₃) = qᵢ, i = 1, 2 },

subject to the constraint g(φ₁(x₃), φ₂(x₃), x₃) = 0 for p₃ ≤ x₃ ≤ q₃. According to the theory developed above, the solution is given by solving the Euler equations associated with f − λ(x₃)g, where

f(x₃, φ₁, φ₂, φ₁′, φ₂′) = √((φ₁′)² + (φ₂′)² + 1) / √(2g(x₃ − p₃))  and  g(x₁, x₂, x₃) = x₁² + x₂² − x₃² tan²α,

subject to the prescribed conditions at the ends and the constraint g = 0.

7.6.3 Differential constraints.

Now consider a problem of the following general type: find a pair of admissible functions φ₁(x), φ₂(x) that minimizes

∫₀¹ f(x, φ₁, φ₂, φ₁′, φ₂′) dx
(In dynamics. Remark: Note by substituting the constraint into the integrand of the functional that we can equivalently pose this problem as one for determining the function φ1 (x) that minimizes 1 f (x.83) Thus the functions φ1 (x). φ1 . φ1 (x). φ2 (x)) such that g = dh/dx. φ1 . In these problems. φ2 (x). φ1 (x). φ2 )dx 0 over an admissible set of functions subject to the nonholonomic constraint g(x. Example: Determine functions φ1 (x) and φ2 (x) that minimize 1 f (x. dx ∂φ1 ∂φ1 d ∂f +λ=0 dx ∂φ2 for 0 < x < 1.81). (7. φ1 (x). λ(x) are determined from the three diﬀerential equations (7. φ1 )dx 0 over an admissible set of functions. i. On substituting for f and g. (7. such constraints are called nonholonomic. these Euler equations reduce to d ∂f ∂f +λ − = 0.81) According to the theory above. φ1 .) One can show that it is necessary that the minimizer satisfy the Euler equation associated with f − λg. the minimizers satisfy the Euler equations ∂h d ∂h = 0. for 0 ≤ x ≤ 1. φ2 (x)) = 0.82) where h = f − λg. CALCULUS OF VARIATIONS subject to the diﬀerential equation constraint g(x.83). (7. Suppose that the constraint is not integrable. φ2 . φ1 . suppose that there does not exist a function h(x.e. (7. the Lagrange multiplier λ may be a function of x. − dx ∂φ1 ∂φ1 d ∂h ∂h =0 − dx ∂φ2 ∂φ2 for 0 < x < 1. φ2 ) = φ2 − φ1 = 0. φ2 (x).156 CHAPTER 7. φ1 . . φ1 . for 0 ≤ x ≤ 1.
7.7 Piecewise smooth minimizers. Weierstrass-Erdmann corner conditions.

In order to motivate the discussion to follow, first consider the problem of minimizing the functional

F{φ} = ∫₀¹ ((φ′)² − 1)² dx   (7.84)

over functions φ with φ(0) = φ(1) = 0. This is apparently a problem of the classical type, where in the present case we are to minimize the integral of f(x, φ, φ′) = ((φ′)² − 1)² with respect to x over the interval [0, 1]. Assuming that the class of admissible functions consists of those that are C¹[0, 1] and satisfy φ(0) = φ(1) = 0, the minimizer must necessarily satisfy the Euler equation d/dx(∂f/∂φ′) − ∂f/∂φ = 0. In the present case this specializes to

2[(φ′)² − 1](2φ′) = constant  for 0 ≤ x ≤ 1,

which in turn gives φ′(x) = constant for 0 ≤ x ≤ 1. On enforcing the boundary conditions φ(0) = φ(1) = 0, this gives

φ(x) = 0  for 0 ≤ x ≤ 1.

This is an extremizer of F{φ} over the class of admissible functions under consideration. Note from (7.84) that the value of F at this particular function φ(x) = 0 is F = 1.

Note from (7.84) that F ≥ 0. It is natural to wonder whether there is a function φ∗(x) that gives F{φ∗} = 0. If there is such a function φ∗, it follows from the non-negative character of the integrand in (7.84) that the integrand itself should vanish almost everywhere in [0, 1]. This requires that φ′(x) = ±1 almost everywhere in [0, 1]. If there is a function φ∗ of this type, we know that it cannot belong to the class of admissible functions considered above, since if it did, we would have found it from the preceding calculation. Since φ∗ ∉ A, it must fail one or both of the two defining conditions: the functions in A were required to be C¹[0, 1] and to vanish at the two ends x = 0 and x = 1. The problem statement requires that the boundary conditions must hold. Therefore it must be true that φ∗ is not as smooth as C¹[0, 1].

The piecewise linear function

φ∗(x) = x  for 0 ≤ x ≤ 1/2,  φ∗(x) = 1 − x  for 1/2 ≤ x ≤ 1,   (7.85)

has this property: it is continuous, is piecewise C¹, satisfies the boundary conditions, and gives F{φ∗} = 0. Moreover, φ∗(x) satisfies the Euler equation except at x = 1/2.
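A quick numerical check, not from the text, of both values: the discrete functional evaluates to 1 at the smooth extremal φ = 0 and to (numerically) 0 at the sawtooth φ∗ of (7.85), provided the grid contains the corner x = 1/2 so that every cell slope is exactly ±1.

```python
import numpy as np

# Forward-difference evaluation of F{phi} = int_0^1 ((phi')^2 - 1)^2 dx.
n = 101                      # odd node count so that x = 1/2 is a grid point
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]

def F(phi):
    dphi = np.diff(phi) / h  # piecewise slope on each cell
    return h * np.sum((dphi**2 - 1.0)**2)

phi_zero = np.zeros(n)               # the smooth C^1 extremal, F = 1
phi_star = np.minimum(x, 1.0 - x)    # the sawtooth (7.85), F = 0

print(F(phi_zero), F(phi_star))
```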
But is it legitimate for us to consider piecewise smooth functions? If so, are there any restrictions that we must enforce? Physical problems involving discontinuities in certain physical fields or their derivatives often arise when, for example, the problem concerns an interface separating two different materials. A specific example will be considered below.

7.7.1 Piecewise smooth minimizer with non-smoothness occurring at a prescribed location.

Suppose that we wish to extremize the functional

F{φ} = ∫₀¹ f(x, φ, φ′) dx

over some suitable set of admissible functions, and suppose further that we know that the extremal φ(x) is continuous but has a discontinuity in its slope at x = s: i.e. φ′(s−) ≠ φ′(s+), where φ′(s±) denotes the limiting value of φ′(s ± ε) as ε → 0. Thus the set of admissible functions is composed of all functions that are smooth on either side of x = s, that are continuous at x = s, and that satisfy the given boundary conditions φ(0) = φ₀, φ(1) = φ₁:

A = { φ(·) | φ : [0, 1] → R, φ ∈ C[0, 1], φ ∈ C¹([0, s) ∪ (s, 1]), φ(0) = φ₀, φ(1) = φ₁ }.

Observe that an admissible function is required to be continuous on [0, 1] and to have a continuous first derivative on either side of x = s, while its first derivative is permitted to have a jump discontinuity at the given location x = s.

Figure 7.12: Extremal φ(x) and a neighboring test function φ(x) + δφ(x), both with kinks at x = s.
Suppose that F is extremized by a function φ(x) ∈ A, and suppose that this extremal has a jump discontinuity in its first derivative at x = s. In view of the lack of smoothness at x = s it is convenient to split the integral into two parts and write

F{φ} = ∫₀ˢ f(x, φ, φ′) dx + ∫ₛ¹ f(x, φ, φ′) dx

and

F{φ + δφ} = ∫₀ˢ f(x, φ + δφ, φ′ + δφ′) dx + ∫ₛ¹ f(x, φ + δφ, φ′ + δφ′) dx.

Let δφ(x) be an admissible variation, which means that the neighboring function φ(x) + δφ(x) is also in A. This implies that

δφ ∈ C[0, 1],  δφ ∈ C¹([0, s) ∪ (s, 1]),  δφ(0) = δφ(1) = 0,

i.e. δφ is continuous on [0, 1] and (may) have a jump discontinuity in its first derivative at the location x = s; see Figure 7.12.

Upon calculating δF, which by definition equals F{φ + δφ} − F{φ} up to terms linear in δφ, and setting δF = 0, we obtain

∫₀ˢ (fφ δφ + fφ′ δφ′) dx + ∫ₛ¹ (fφ δφ + fφ′ δφ′) dx = 0.

Integrating the terms involving δφ′ by parts leads to

∫₀ˢ [ fφ − d/dx(fφ′) ] δφ dx + ∫ₛ¹ [ fφ − d/dx(fφ′) ] δφ dx + [ fφ′ δφ ]₀^{s−} + [ fφ′ δφ ]_{s+}¹ = 0.

Since δφ(0) = δφ(1) = 0, this simplifies to

∫₀¹ [ ∂f/∂φ − d/dx(∂f/∂φ′) ] δφ(x) dx + [ (∂f/∂φ′)|_{x=s−} − (∂f/∂φ′)|_{x=s+} ] δφ(s) = 0.

First, if we limit attention to variations that are such that δφ(s) = 0, the second term in the equation above vanishes and only the integral remains. Since δφ(x) can be chosen arbitrarily for all x ∈ (0, 1), x ≠ s, this implies that the term within the brackets in the integrand must vanish at each of these x's. This leads to the Euler equation

∂f/∂φ − d/dx(∂f/∂φ′) = 0  for 0 < x < 1, x ≠ s.

Next, when this is substituted back into the equation above, the integral disappears. Since the resulting equation must hold for all variations δφ(s), it follows that we must have

(∂f/∂φ′)|_{x=s−} = (∂f/∂φ′)|_{x=s+}
at x = s. Thus in summary an extremal φ must obey the following boundary value problem:

    ∂f/∂φ − d/dx (∂f/∂φ′) = 0 for 0 < x < 1, x ≠ s,
    φ(0) = φ0,    φ(1) = φ1,
    ∂f/∂φ′ |x=s− = ∂f/∂φ′ |x=s+ at x = s.    (7.86)

The last of these is a "matching condition" or "jump condition" that relates the solution on the left of x = s to the solution on its right. The matching condition shows that even though φ′ has a jump discontinuity at x = s, the quantity ∂f/∂φ′ is continuous at this point.

Example: Consider a two-phase material that occupies all of x, y-space. The material occupying x < 0 is different from the material occupying x > 0, and so x = 0, −∞ < y < ∞, is the interface between the two materials. In particular, suppose that the refractive indices of the materials occupying x < 0 and x > 0 are n1(x, y) and n2(x, y) respectively. We are asked to determine the path y = φ(x), a ≤ x ≤ b, followed by a ray of light travelling from a point (a, A) in the left half-plane to a point (b, B) in the right half-plane; see Figure 7.13. In particular, we are to determine the conditions at the point where the ray crosses the interface between the two media.

Figure 7.13: Ray of light in a two-phase material: the ray travels from (a, A) in the material with refractive index n1(x, y), x < 0, to (b, B) in the material with refractive index n2(x, y), x > 0, making angles θ− and θ+ with the x-axis on either side of the interface x = 0.
7. y) where n(x. Thus the transit time is determined by integrating n/c along the path followed by the light. b]). WEIRSTRASSERDMAN CORNER CONDITIONS 161 According to Fermat’s principle. y). φ(x)) and θ(x) as x → 0±. Note that this set of admissible functions allows the path followed by the light to have a kink at x = 0 even though the path is continuous. . Observe that. then tan θ = φ and so sin θ = φ / 1 + (φ )2 . b]. φ. φ(b) = B}. φ ) dx a where f (x. y) is the index of refraction at the point (x. This is Snell’s wellknown law of refraction. a ray of light travelling between two given points follows the path that it can traverse in the shortest possible time. The functional we are asked to minimize can be written in the standard form b T {φ} = Therefore f (x. which. Therefore the matching condition requires that n sin θ be continuous. we know that light travels at a speed c/n(x. φ(a) = A. 0) ∪ (0. φ ∈ C 1 ([a. or n+ sin θ+ = n− sin θ− where n± and θ± are the limiting values of n(x. φ ) = n(x. c Thus the problem at hand is to determine φ that minimizes the functional T {φ} over the set of admissible functions A = {φ(·) φ ∈ C[a. Also.7. if θ is the angle made by the ray of light with the xaxis at some point along its path. φ(x)) 1 + (φ )2 dx. since ds = 1 + (φ )2 dx can be written as b T {φ} = a 1 n(x. φ) c 1 + (φ )2 . φ) ∂f = ∂φ c φ 1 + (φ )2 and so the matching condition at the kink at x = 0 requires that n c φ 1 + (φ )2 be continuous at x = 0. n(x. φ.
7.7.2 Piecewise smooth minimizer with non-smoothness occurring at an unknown location

Suppose again that we wish to extremize the functional

    F{φ} = ∫₀¹ f(x, φ, φ′) dx

over the admissible set of functions

    A = { φ(·) | φ: [0, 1] → R, φ ∈ C[0, 1], φ′ piecewise continuous on [0, 1], φ(0) = a, φ(1) = b }.

Just as before, the admissible functions are continuous and have a piecewise continuous first derivative. However, in contrast to the preceding case, if there is a discontinuity in the first derivative of φ at some location x = s, the location s is not known a priori and so is also to be determined.

Suppose that F is extremized by the function φ(x) and that it has a jump discontinuity in its first derivative at x = s (we shall say that φ has a "kink" at x = s). Suppose further that φ is C¹ on either side of x = s.

Consider a variation δφ(x) that vanishes at the two ends x = 0 and x = 1, is continuous on [0, 1], and is C¹ on [0, 1] except at x = s + δs, where it has a jump discontinuity in its first derivative: δφ ∈ C[0, 1], δφ ∈ C¹([0, s + δs) ∪ (s + δs, 1]), δφ(0) = δφ(1) = 0. Note that φ(x) + δφ(x) then has kinks at both x = s and x = s + δs; see Figure 7.14. Note further that we have varied both the function φ(x) and the location of the kink, s.

Figure 7.14: Extremal φ(x) with a kink at x = s and a neighboring test function φ(x) + δφ(x) with kinks at x = s and x = s + δs.
Since the extremal φ(x) has a kink at x = s it is convenient to split the integral and express F{φ} as

    F{φ} = ∫₀ˢ f(x, φ, φ′) dx + ∫ₛ¹ f(x, φ, φ′) dx.

Similarly, since the neighboring function φ(x) + δφ(x) has kinks at x = s and x = s + δs, it is convenient to express F{φ + δφ} by splitting the integral into three terms as follows:

    F{φ + δφ} = ∫₀ˢ f(x, φ + δφ, φ′ + δφ′) dx + ∫ from s to s+δs f(x, φ + δφ, φ′ + δφ′) dx + ∫ from s+δs to 1 f(x, φ + δφ, φ′ + δφ′) dx.

We can now calculate the first variation δF which, by definition, equals F{φ + δφ} − F{φ} up to terms linear in the variations. Calculating δF in this way and setting the result equal to zero leads, after integrating by parts, to

    ∫₀¹ A δφ(x) dx + B δφ(s) + C δs = 0,    (7.87)

where

    A = ∂f/∂φ − d/dx (∂f/∂φ′),
    B = ∂f/∂φ′ |x=s− − ∂f/∂φ′ |x=s+,
    C = [ f − φ′ ∂f/∂φ′ ] |x=s− − [ f − φ′ ∂f/∂φ′ ] |x=s+.

By the arbitrariness of the variations above, it follows in the usual way that A, B and C must all vanish. This leads to the usual Euler equation on (0, s) ∪ (s, 1), and the following two additional requirements at x = s:

    ∂f/∂φ′ |x=s− = ∂f/∂φ′ |x=s+,    (7.88)

    [ f − φ′ ∂f/∂φ′ ] |x=s− = [ f − φ′ ∂f/∂φ′ ] |x=s+.    (7.89)

Equation (7.88) is the same condition that was derived in the preceding subsection. The two matching conditions (or jump conditions) (7.88) and (7.89) are known as the Weierstrass-Erdmann corner conditions (the term "corner" referring to the "kink" in φ).
Example: Find the extremals of the functional

    F(φ) = ∫₀⁴ f(x, φ, φ′) dx = ∫₀⁴ (φ′ − 1)² (φ′ + 1)² dx

over the set of piecewise smooth functions subject to the end conditions φ(0) = 0, φ(4) = 2. For simplicity, restrict attention to functions that have at most one point at which φ′ has a discontinuity.

Here f(x, φ, φ′) = ((φ′)² − 1)², and therefore on differentiating f,

    ∂f/∂φ′ = 4φ′ ((φ′)² − 1),    (7.90)
    ∂f/∂φ = 0.    (7.91)

Consequently the Euler equation (at points of smoothness) is

    d/dx (fφ′) − fφ = d/dx [ 4φ′ ((φ′)² − 1) ] = 0.    (7.92)

First, consider an extremal that is smooth everywhere. (Such an extremal might not, of course, exist.) In this case the Euler equation (7.92) holds on the entire interval (0, 4), and so we conclude that φ′(x) = constant for 0 ≤ x ≤ 4. On integrating this and using the boundary conditions φ(0) = 0, φ(4) = 2, we find that φ(x) = x/2, 0 ≤ x ≤ 4. In order to compare this with what follows, it is helpful to call this, say, φ₀. Thus

    φ₀(x) = x/2 for 0 ≤ x ≤ 4

is a smooth extremal of F.

Next consider a piecewise smooth extremizer of F which has a kink at some location x = s. (Again, such an extremal might not, of course, exist; and the value of s ∈ (0, 4) is not known a priori and is to be determined.) The Euler equation (7.92) now holds on either side of x = s, and so we find from (7.92) that φ′ = c = constant on (0, s) and φ′ = d = constant on (s, 4), where c ≠ d (if c = d there would be no kink at x = s and we have already dealt with this case above). Thus

    φ′(x) = c for 0 < x < s,    φ′(x) = d for s < x < 4.
Integrating this separately on (0, s) and (s, 4), and enforcing the boundary conditions φ(0) = 0, φ(4) = 2, leads to

    φ(x) = cx for 0 ≤ x ≤ s,    φ(x) = d(x − 4) + 2 for s ≤ x ≤ 4.    (7.93)

Since φ is required to be continuous, we must have φ(s−) = φ(s+), which requires that cs = d(s − 4) + 2, whence

    s = (2 − 4d) / (c − d).    (7.94)

Note that s would not exist if c = d.

All that remains is to find c and d, and the two Weierstrass-Erdmann corner conditions (7.88) and (7.89) provide us with the two equations for doing this. From (7.90),

    ∂f/∂φ′ = 4c(c² − 1) for 0 < x < s,    ∂f/∂φ′ = 4d(d² − 1) for s < x < 4,

and

    f − φ′ ∂f/∂φ′ = −(c² − 1)(1 + 3c²) for 0 < x < s,    f − φ′ ∂f/∂φ′ = −(d² − 1)(1 + 3d²) for s < x < 4.

Therefore the Weierstrass-Erdmann corner conditions (7.88) and (7.89), which require respectively the continuity of ∂f/∂φ′ and f − φ′ ∂f/∂φ′ at x = s, give us the pair of simultaneous equations

    c(c² − 1) = d(d² − 1),
    (c² − 1)(1 + 3c²) = (d² − 1)(1 + 3d²).

Keeping in mind that c ≠ d, solving these equations leads to the two solutions c = 1, d = −1, and c = −1, d = 1. Corresponding to the former we find from (7.94) that s = 3, while the latter leads to s = 1. Thus from (7.93) there are two piecewise smooth extremals φ₁(x) and φ₂(x) of the assumed form:

    φ₁(x) = x for 0 ≤ x ≤ 3,    φ₁(x) = −x + 6 for 3 ≤ x ≤ 4;
    φ₂(x) = −x for 0 ≤ x ≤ 1,    φ₂(x) = x − 2 for 1 ≤ x ≤ 4.
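The algebra of this example is easy to check numerically; the sketch below is our own verification and assumes nothing beyond the formulas of the example. It confirms the corner conditions for (c, d) = (1, −1), the kink location s = 3, and the values of F at the three extremals.

```python
# Numerical check of the worked example:
# f = ((phi')^2 - 1)^2 with boundary conditions phi(0) = 0, phi(4) = 2.

def df_dphip(slope):
    # ∂f/∂φ' = 4 φ' ((φ')² − 1), equation (7.90)
    return 4 * slope * (slope**2 - 1)

def f_minus_phip_df(slope):
    # f − φ' ∂f/∂φ' = −((φ')² − 1)(1 + 3(φ')²)
    f = (slope**2 - 1) ** 2
    return f - slope * df_dphip(slope)

c, d = 1, -1
assert df_dphip(c) == df_dphip(d)                 # corner condition (7.88)
assert f_minus_phip_df(c) == f_minus_phip_df(d)   # corner condition (7.89)
s = (2 - 4 * d) / (c - d)                         # equation (7.94)
print(s)  # 3.0

# Evaluate F{phi} = integral of ((phi')^2 - 1)^2 over [0, 4] by the
# midpoint rule; the integrand is piecewise constant for each extremal.
def F(phi_prime, n=40000):
    h = 4.0 / n
    return sum((phi_prime((k + 0.5) * h) ** 2 - 1) ** 2 * h for k in range(n))

F0 = F(lambda x: 0.5)                      # phi0(x) = x/2
F1 = F(lambda x: 1.0 if x < 3 else -1.0)   # phi1, kink at s = 3
F2 = F(lambda x: -1.0 if x < 1 else 1.0)   # phi2, kink at s = 1
print(F0, F1, F2)  # approximately 9/4, 0 and 0
```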
By evaluating the functional F at each of the extremals φ₀, φ₁ and φ₂, we find

    F{φ₀} = 9/4,    F{φ₁} = F{φ₂} = 0.

Figure 7.15 shows graphs of φ₀, φ₁ and φ₂.

Figure 7.15: Smooth extremal φ₀(x) and piecewise smooth extremals φ₁(x) and φ₂(x).

Remark: By inspection of the given functional

    F(φ) = ∫₀⁴ ((φ′)² − 1)² dx,

it is clear that (a) F ≥ 0, and (b) F = 0 if and only if φ′ = ±1 everywhere (except at isolated points where φ′ may be undefined). The extremals φ₁ and φ₂ have this property and therefore correspond to absolute minimizers of F.

7.8 Generalization to higher dimensional space.

In order to help motivate the way in which we will approach higher-dimensional problems (which will in fact be entirely parallel to the approach we took for one-dimensional problems) we begin with some preliminary observations. First, consider the one-dimensional variational problem of minimizing a functional

    F{φ} = ∫₀¹ f(x, φ, φ′) dx
on a set of suitably smooth functions with no boundary conditions prescribed at either end. The analogous two-dimensional problem would be to consider a set of suitably smooth functions φ(x, y) defined on a domain D of the x, y-plane and to minimize a given functional

    F{φ} = ∫∫_D f(x, y, φ, ∂φ/∂x, ∂φ/∂y, ∂²φ/∂x², ∂²φ/∂x∂y, ∂²φ/∂y²) dA

over this set of functions, with no boundary conditions prescribed anywhere on the boundary ∂D.

In the one-dimensional case, the arbitrariness of the variation in the interior motivated us to express the integrand in the form of some quantity A (independent of any variations) multiplied by δφ(x); at the boundary it motivated us to express the boundary terms as one quantity independent of any variations multiplied by δφ(0), another quantity C independent of any variations multiplied by δφ′(0), and so on. We approach two-dimensional problems similarly: our strategy will be to exploit the fact that δφ(x, y) is arbitrary in the interior of D, and so we attempt to express the integrand as some quantity A that is independent of any variations multiplied by δφ; the goal for the boundary terms will likewise be to express them as some quantity independent of any variations multiplied by δφ, another quantity independent of any variations multiplied by ∂(δφ)/∂n, etc. In the resulting expression (7.95) for δF, the quantities A, B and C are independent of δφ and its derivatives, and the latter two integrals are on the boundary of the domain D. We then exploit the arbitrariness of δφ(x, y) on the interior of the domain of integration, and the arbitrariness of δφ and ∂(δφ)/∂n on the boundary ∂D, to conclude that the minimizer must satisfy the partial differential equation A = 0 for (x, y) ∈ D and the boundary conditions B = C = 0 on ∂D.

Next, recall that one of the steps involved in calculating the minimizer of a one-dimensional problem is integration by parts.
Thus in the two-dimensional case our strategy will be to take the first variation of F and carry out appropriate calculations that lead us to an equation of the form

    δF = ∫∫_D A δφ(x, y) dA + ∮_∂D B δφ(x, y) ds + ∮_∂D C ∂(δφ(x, y))/∂n ds = 0.    (7.95)

In deriving the Euler equation in the one-dimensional case, our strategy was to exploit the fact that the variation δφ(x) was arbitrary in the interior 0 < x < 1 of the domain; the arbitrariness of δφ then allowed us to conclude that A must vanish on the entire domain. Similarly, concerning the boundary terms: in the one-dimensional case we were able to exploit the fact that δφ and its derivative δφ′ are arbitrary at the boundary points x = 0 and x = 1, and this motivated us to express the boundary terms as some quantity B that is independent of any variations multiplied by δφ(0), and so on. Our strategy in two dimensions is likewise to exploit the fact that δφ and its normal derivative ∂(δφ)/∂n are arbitrary on the boundary ∂D. Integration by parts converts a term that is an integral over [0, 1] into terms
that are evaluated only at the boundary points x = 0 and 1. The analog of this in higher dimensions is carried out using the divergence theorem, which in two dimensions reads

    ∫∫_D (∂P/∂x + ∂Q/∂y) dA = ∮_∂D (P nx + Q ny) ds,    (7.96)

and which expresses the left-hand side, an integral over D, in a form that only involves terms on the boundary. Here nx, ny are the components of the unit normal vector n on ∂D that points out of D. Note that in the special case where P = ∂χ/∂x and Q = ∂χ/∂y for some χ(x, y), the integrand of the right-hand side is ∂χ/∂n.

On the boundary ∂D of a two-dimensional domain D we frequently need to calculate the derivative of φ in directions n and s that are normal and tangential to ∂D. In vector form we have

    ∇φ = (∂φ/∂x) i + (∂φ/∂y) j = (∂φ/∂n) n + (∂φ/∂s) s,

where i and j are unit vectors in the x- and y-directions respectively. Recall also that a function φ(x, y) and its tangential derivative ∂φ/∂s along the boundary ∂D are not independent of each other, in the following sense: if one knows the values of φ along ∂D, one can differentiate φ along the boundary to get ∂φ/∂s; and conversely, if one knows the values of ∂φ/∂s along ∂D, one can integrate it along the boundary to find φ to within a constant. This is why equation (7.95) does not involve a term of the form E ∂(δφ)/∂s integrated along the boundary ∂D: such a term can be rewritten as the integral of −(∂E/∂s) δφ along the boundary.

Example 1: A stretched membrane. A stretched flexible membrane occupies a regular region D of the x, y-plane. A pressure p(x, y) is applied normal to the surface of the membrane in the negative z-direction. Let u(x, y) be the resulting deflection of the membrane in the z-direction. The membrane is fixed along its entire edge ∂D and so

    u = 0 for (x, y) ∈ ∂D.    (7.97)

One can show that the potential energy Φ associated with any deflection u that is compatible with the given boundary condition is

    Φ{u} = ∫∫_D ½ |∇u|² dA − ∫∫_D p u dA,
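Identity (7.96) is easy to test numerically. The sketch below is our own check (the fields P, Q and the unit-disk domain are arbitrary choices, not taken from the text): both sides of the divergence theorem are evaluated by quadrature and compared.

```python
import math

# Check the two-dimensional divergence theorem (7.96),
#   double integral over D of (P_x + Q_y) dA = loop integral of (P nx + Q ny) ds,
# on the unit disk D for the arbitrarily chosen smooth fields below.
def P(x, y): return x * x * y
def Q(x, y): return x + y ** 3
def div(x, y): return 2 * x * y + 3 * y * y   # P_x + Q_y

# Left-hand side: area integral in polar coordinates by the midpoint rule.
nr, nt = 400, 400
dr, dt = 1.0 / nr, 2 * math.pi / nt
lhs = 0.0
for i in range(nr):
    r = (i + 0.5) * dr
    for j in range(nt):
        t = (j + 0.5) * dt
        lhs += div(r * math.cos(t), r * math.sin(t)) * r * dr * dt

# Right-hand side: on the unit circle n = (cos t, sin t) and ds = dt.
rhs = 0.0
for j in range(nt):
    t = (j + 0.5) * dt
    x, y = math.cos(t), math.sin(t)
    rhs += (P(x, y) * x + Q(x, y) * y) * dt

print(lhs, rhs)  # both are approximately 3*pi/4
```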
Remark: The derivative of a function φ(x, y) in a direction corresponding to a unit vector m is written as ∂φ/∂m and is defined by ∂φ/∂m = ∇φ · m = (∂φ/∂x) mx + (∂φ/∂y) my, where mx and my are the components of m in the x- and y-directions.
where we have taken the relevant stiffness of the membrane to be unity. Here we are using the notation that a comma followed by a subscript denotes partial differentiation with respect to the corresponding coordinate; for example, u,x = ∂u/∂x and u,xy = ∂²u/∂x∂y.

The actual deflection of the membrane is the function that minimizes the potential energy over the set of test functions

    A = { u | u ∈ C²(D), u = 0 for (x, y) ∈ ∂D },

where an admissible variation δu(x, y) vanishes on ∂D.

Figure 7.16: A stretched elastic membrane whose mid-plane occupies a region D of the x, y-plane and whose boundary ∂D is fixed. The membrane surface is subjected to a pressure loading p(x, y) that acts in the negative z-direction.

Since

    Φ{u} = ∫∫_D ½ (u,x² + u,y²) dA − ∫∫_D p u dA,

its first variation is

    δΦ = ∫∫_D (u,x δu,x + u,y δu,y) dA − ∫∫_D p δu dA.

In order to make use of the divergence theorem and convert the area integral into a boundary integral, we must write the integrand so that it involves terms of the form (. . .),x + (. . .),y; see (7.96). This suggests that we rewrite the preceding equation as

    δΦ = ∫∫_D [ (u,x δu),x + (u,y δu),y − (u,xx + u,yy) δu ] dA − ∫∫_D p δu dA,

or equivalently as

    δΦ = ∫∫_D [ (u,x δu),x + (u,y δu),y ] dA − ∫∫_D (u,xx + u,yy + p) δu dA.
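The variational statement can be illustrated with a small finite-difference sketch of our own (the unit-square domain, the uniform pressure p ≡ 1 and the grid size are arbitrary choices): the interior stationarity condition of the discrete energy is exactly the five-point discretization of ∇²u + p = 0, and perturbing the solution can only raise the energy.

```python
import random

# Discrete membrane on the unit square with p = 1 (both our own choices):
# minimize Phi{u} = integral of ( 0.5|grad u|^2 - p u ) with u = 0 on the edge.
n = 20                       # grid spacing h = 1/n
h = 1.0 / n
p = 1.0
u = [[0.0] * (n + 1) for _ in range(n + 1)]

# Gauss-Seidel sweeps for the stationarity condition
# u_ij = (u_E + u_W + u_N + u_S + h^2 p) / 4,
# i.e. the five-point discretization of  laplacian(u) + p = 0.
for _ in range(2000):
    for i in range(1, n):
        for j in range(1, n):
            u[i][j] = 0.25 * (u[i+1][j] + u[i-1][j] + u[i][j+1] + u[i][j-1] + h*h*p)

def energy(v):
    e = 0.0
    for i in range(n):
        for j in range(n + 1):
            e += 0.5 * ((v[i+1][j] - v[i][j]) / h) ** 2 * h * h   # 0.5 u,x^2
            e += 0.5 * ((v[j][i+1] - v[j][i]) / h) ** 2 * h * h   # 0.5 u,y^2
    for i in range(n + 1):
        for j in range(n + 1):
            e -= p * v[i][j] * h * h
    return e

E0 = energy(u)
assert E0 < 0.0   # deflecting beats staying flat, since Phi{0} = 0

# Admissible perturbations (zero on the boundary) can only raise the energy.
random.seed(0)
for _ in range(5):
    v = [row[:] for row in u]
    for i in range(1, n):
        for j in range(1, n):
            v[i][j] += random.uniform(-1e-2, 1e-2)
    assert energy(v) >= E0
print("discrete minimizer found; energy =", round(E0, 6))
```

Minimizing this discrete energy and solving the discrete Euler equation are one and the same computation, which is the content of the derivation above.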
By using the divergence theorem on the first integral we get

    δΦ = ∮_∂D (u,x nx + u,y ny) δu ds − ∫∫_D (u,xx + u,yy + p) δu dA,

where n is a unit outward normal along ∂D. We can write this equivalently as

    δΦ = ∮_∂D (∂u/∂n) δu ds − ∫∫_D (∇²u + p) δu dA,    (7.98)

which must vanish for all admissible variations δu(x, y). Since the variation δu vanishes on ∂D, the first integral drops out and we are left with

    δΦ = − ∫∫_D (∇²u + p) δu dA.    (7.99)

Thus the minimizer satisfies the partial differential equation

    ∇²u + p = 0 for (x, y) ∈ D,

which is the Euler equation in this case; it is to be solved subject to the prescribed boundary condition (7.97). Note that if some part of the boundary of D had not been fixed, then we would not have δu = 0 on that part, in which case (7.98) would yield the natural boundary condition ∂u/∂n = 0 on that segment.

Example 2: The Kirchhoff theory of plates. We consider the bending of a thin plate according to the so-called Kirchhoff theory. The mid-plane of the plate occupies a domain of the x, y-plane, and w(x, y) denotes the deflection (displacement) of a point on the mid-plane in the z-direction. The basic constitutive relationships of elastic plate theory relate the internal moments Mx, My, Mxy, Myx (see Figure 7.17) to the second derivatives of the displacement field w. Solely for purposes of mathematical simplicity we shall assume that the Poisson ratio ν of the material is zero, in which case

    Mx = −D w,xx,    My = −D w,yy,    Mxy = Myx = −D w,xy,    (7.100)

where, when ν = 0, the plate bending stiffness is D = Et³/12, E being the Young's modulus of the material and t the thickness of the plate. A discussion of the case ν ≠ 0 can be found in many books, for example in "Energy and Finite Element Methods in Structural Mechanics" by I.H. Shames and C.L. Dym.
In addition, the shear forces in the plate are given by

    Vx = −D (∇²w),x,    Vy = −D (∇²w),y,    (7.101)

where a comma followed by a subscript denotes partial differentiation with respect to the corresponding coordinate and D is the plate bending stiffness. The elastic energy of the plate, per unit area of the mid-plane, is

    −½ (Mx w,xx + My w,yy + Mxy w,xy + Myx w,yx) = (D/2) (w,xx² + 2 w,xy² + w,yy²).    (7.102)

Figure 7.17: A differential element dx × dy × t of a thin plate, showing the moments Mx, My, Mxy, Myx and the shear forces Vx, Vy on its faces. A bold arrow represents a force, and thus Vx and Vy are (shear) forces; a bold arrow with two arrowheads represents a moment whose sense is given by the right-hand rule, and thus Mxy and Myx are (twisting) moments while Mx and My are (bending) moments.

Figure 7.18: Left: a thin elastic plate whose mid-plane occupies a region Ω of the x, y-plane; the segment ∂Ω1 of the boundary is clamped while the remainder ∂Ω2 is free of loading. Right: a rectangular a × b plate with a load-free edge at its right-hand side x = a, 0 ≤ y ≤ b, on which nx = 1, ny = 0 and sx = 0, sy = 1.
It is worth noting the following puzzling question. Consider the rectangular plate shown in the right-hand diagram of Figure 7.18. Based on Figure 7.17 we know that there is a bending moment Mx, a twisting moment Mxy, and a shear force Vx acting on any surface x = constant in the plate. Therefore, since the right-hand edge x = a is free of loading, one would expect to have the three conditions Mx = Mxy = Vx = 0 along that boundary. However we will find that the differential equation to be solved in the interior of the plate requires (and can only accommodate) two boundary conditions at any point on the edge. The question then arises as to what the correct boundary conditions on this edge should be. Our variational approach will give us precisely two natural boundary conditions on this edge. They will involve Mx, Mxy and Vx, but will not require that each of them must vanish individually.

Consider a thin elastic plate whose mid-plane occupies a domain Ω of the x, y-plane, as shown in the left-hand diagram of Figure 7.18. A part of the plate boundary, denoted by ∂Ω1, is clamped, while the remainder ∂Ω2 is free of any external loading. A normal loading p(x, y) is applied on the flat face of the plate in the −z-direction. Thus if w(x, y) denotes the deflection of the plate in the z-direction, we have the geometric boundary conditions

    w = ∂w/∂n = 0 for (x, y) ∈ ∂Ω1.    (7.103)

The total potential energy of the system is

    Φ{w} = ∫∫_Ω [ (D/2) (w,xx² + 2 w,xy² + w,yy²) − p w ] dA,    (7.104)

where the first group of terms represents the elastic energy in the plate and the last term represents the potential energy of the pressure loading (the negative sign arising from the fact that p acts in the minus z-direction while w is the deflection in the positive z-direction). This functional Φ is defined on the set of all kinematically admissible deflection fields, which is the set of all suitably smooth functions w(x, y) that satisfy the geometric requirements (7.103). The actual deflection field is the one that minimizes the potential energy Φ over this set.

We now determine the Euler equation and natural boundary conditions associated with (7.104) by calculating the first variation of Φ{w}, setting it equal to zero, and dividing through by D:

    ∫∫_Ω [ w,xx δw,xx + 2 w,xy δw,xy + w,yy δw,yy − (p/D) δw ] dA = 0.    (7.105)

To simplify this we begin by rearranging the terms into a form that will allow us to use the divergence theorem,
thereby converting part of the area integral on Ω into a boundary integral on ∂Ω.
Accordingly we rewrite (7.105) as

    0 = ∫∫_Ω [ (w,xx δw,x + w,xy δw,y),x + (w,xy δw,x + w,yy δw,y),y
            − (w,xxx + w,xyy) δw,x − (w,xxy + w,yyy) δw,y − (p/D) δw ] dA
      = ∮_∂Ω I1 ds − ∫∫_Ω I2 dA,    (7.106)

where we have used the divergence theorem (7.96), and where in the last step we have let I1 and I2 denote the integrands of the boundary and area integrals:

    I1 = (w,xx δw,x + w,xy δw,y) nx + (w,xy δw,x + w,yy δw,y) ny,
    I2 = (w,xxx + w,xyy) δw,x + (w,xxy + w,yyy) δw,y + (p/D) δw.

To simplify the area integral in (7.106), we again rearrange the terms in I2 into a form that will allow us to use the divergence theorem; as before, we must write the integrand so that it involves terms of the form (. . .),x + (. . .),y. Thus

    ∫∫_Ω I2 dA = ∫∫_Ω [ ((w,xxx + w,xyy) δw),x + ((w,xxy + w,yyy) δw),y
                − (w,xxxx + 2 w,xxyy + w,yyyy) δw + (p/D) δw ] dA
              = ∮_∂Ω [ (w,xxx + w,xyy) nx + (w,xxy + w,yyy) ny ] δw ds − ∫∫_Ω [ ∇⁴w − p/D ] δw dA
              = ∮_∂Ω P1 δw ds − ∫∫_Ω P2 δw dA,    (7.107)
where we have set

    P1 = (w,xxx + w,xyy) nx + (w,xxy + w,yyy) ny,    P2 = ∇⁴w − p/D,

and where

    ∇⁴w = ∇²(∇²w) = w,xxxx + 2 w,xxyy + w,yyyy.

In the preceding calculation we have used the divergence theorem (7.96) in going from the second equation in (7.107) to the third.

Next we simplify the boundary term in (7.106) by converting the derivatives of the variation with respect to x and y into derivatives with respect to coordinates n and s that are normal and tangential to ∂Ω. To do this we use the fact that

    ∇(δw) = δw,x i + δw,y j = δw,n n + δw,s s,

from which it follows that

    δw,x = δw,n nx + δw,s sx,    δw,y = δw,n ny + δw,s sy.    (7.108)

Thus

    ∮_∂Ω I1 ds = ∮_∂Ω [ (w,xx nx + w,xy ny) δw,x + (w,xy nx + w,yy ny) δw,y ] ds
      = ∮_∂Ω [ (w,xx nx + w,xy ny)(δw,n nx + δw,s sx) + (w,xy nx + w,yy ny)(δw,n ny + δw,s sy) ] ds
      = ∮_∂Ω [ (w,xx nx² + 2 w,xy nx ny + w,yy ny²) δw,n + I3 ] ds,    (7.109)

where we have set

    I3 = (w,xx nx sx + w,xy nx sy + w,xy sx ny + w,yy ny sy) δw,s,

so that

    ∮_∂Ω I3 ds = ∮_∂Ω (w,xx nx sx + w,xy nx sy + w,xy sx ny + w,yy ny sy) δw,s ds.    (7.110)

If a field f(x, y) varies smoothly along ∂Ω, and if the curve ∂Ω itself is smooth, then

    ∮_∂Ω (∂f/∂s) ds = 0    (7.111)
since this is an integral over a closed curve.⁸ Writing the integrand of (7.110) as ∂/∂s [ (. . .) δw ] − [∂(. . .)/∂s] δw, it follows that the first of these terms integrates to zero, and so

    ∮_∂Ω I3 ds = − ∮_∂Ω ∂/∂s (w,xx nx sx + w,xy nx sy + w,xy sx ny + w,yy ny sy) δw ds = − ∮_∂Ω (∂P4/∂s) δw ds.    (7.112)

Substituting (7.112) into (7.109) yields

    ∮_∂Ω I1 ds = ∮_∂Ω P3 ∂(δw)/∂n ds − ∮_∂Ω (∂P4/∂s) δw ds,    (7.113)

where we have set

    P3 = w,xx nx² + 2 w,xy nx ny + w,yy ny²,    P4 = w,xx nx sx + w,xy nx sy + w,xy sx ny + w,yy ny sy.    (7.114)

Finally, substituting (7.107) and (7.113) into (7.106) leads to

    ∫∫_Ω P2 δw dA − ∮_∂Ω [ P1 + ∂P4/∂s ] δw ds + ∮_∂Ω P3 ∂(δw)/∂n ds = 0,    (7.115)

which must hold for all admissible variations δw.

First restrict attention to variations which vanish on the boundary ∂Ω and whose normal derivative ∂(δw)/∂n also vanishes on ∂Ω. This leads us to the Euler equation P2 = 0:

    ∇⁴w − p/D = 0 for (x, y) ∈ Ω.    (7.116)

Returning to (7.115) with this gives

    − ∮_∂Ω [ P1 + ∂P4/∂s ] δw ds + ∮_∂Ω P3 ∂(δw)/∂n ds = 0.    (7.117)

Since the portion ∂Ω1 of the boundary is clamped, we have w = ∂w/∂n = 0 for (x, y) ∈ ∂Ω1, and so the variations δw and ∂(δw)/∂n must also vanish on ∂Ω1. Thus (7.117) simplifies to

    − ∮_∂Ω2 [ P1 + ∂P4/∂s ] δw ds + ∮_∂Ω2 P3 ∂(δw)/∂n ds = 0    (7.118)

for variations δw and ∂(δw)/∂n that are arbitrary on ∂Ω2, where ∂Ω2 is the complement of ∂Ω1, i.e. ∂Ω = ∂Ω1 ∪ ∂Ω2. Thus we conclude that P1 + ∂P4/∂s = 0 and P3 = 0 on ∂Ω2:

    (w,xxx + w,xyy) nx + (w,xxy + w,yyy) ny + ∂/∂s (w,xx nx sx + w,xy nx sy + w,xy sx ny + w,yy ny sy) = 0,
    w,xx nx² + 2 w,xy nx ny + w,yy ny² = 0,
        for (x, y) ∈ ∂Ω2.    (7.119)

⁸ In the present setting one would have this degree of smoothness if there are no concentrated loads applied on the boundary of the plate ∂Ω and the boundary curve itself has no corners.
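Two of the identities used in this derivation can be checked numerically. The sketch below is our own (the test field w = sin 2x cos 3y, the evaluation point and the step sizes are arbitrary choices): it verifies the biharmonic identity ∇⁴w = w,xxxx + 2w,xxyy + w,yyyy by finite differences, and the closed-curve identity (7.111) on the unit circle.

```python
import math

# Arbitrarily chosen test field: w(x, y) = sin(2x) cos(3y), for which
# laplacian(w) = -13 w and hence biharmonic(w) = 169 w.
def w(x, y): return math.sin(2 * x) * math.cos(3 * y)

h = 1e-2

def wyy(x, y):
    return (w(x, y + h) - 2 * w(x, y) + w(x, y - h)) / h ** 2

def d4(x, y, mode):
    # central finite differences for w,xxxx, w,yyyy and w,xxyy
    if mode == "xxxx":
        return (w(x + 2*h, y) - 4*w(x + h, y) + 6*w(x, y)
                - 4*w(x - h, y) + w(x - 2*h, y)) / h ** 4
    if mode == "yyyy":
        return (w(x, y + 2*h) - 4*w(x, y + h) + 6*w(x, y)
                - 4*w(x, y - h) + w(x, y - 2*h)) / h ** 4
    # "xxyy": second x-derivative of the second y-derivative
    return (wyy(x + h, y) - 2 * wyy(x, y) + wyy(x - h, y)) / h ** 2

x0, y0 = 0.3, 0.4
biharmonic = d4(x0, y0, "xxxx") + 2 * d4(x0, y0, "xxyy") + d4(x0, y0, "yyyy")
print(biharmonic, 169 * w(x0, y0))   # the two values agree

# Closed-curve identity (7.111) on the unit circle, where arclength equals
# the angle t: the sum of the increments of f telescopes to zero.
N = 1000
dt = 2 * math.pi / N
f = lambda t: w(math.cos(t), math.sin(t))
loop = sum(f((k + 1) * dt) - f(k * dt) for k in range(N))
print(abs(loop) < 1e-9)   # True
```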
Thus, in summary, the Kirchhoff theory of plates for the problem at hand requires that one solve the field equation (7.116) on Ω subject to the displacement boundary conditions (7.103) on ∂Ω1 and the natural boundary conditions (7.119) on ∂Ω2.

Remark: If we define the moments Mn, Mns and the force Vn by

    Mn = −D (w,xx nx nx + w,xy nx ny + w,yx ny nx + w,yy ny ny),
    Mns = −D (w,xx nx sx + w,xy nx sy + w,yx ny sx + w,yy ny sy),    (7.120)
    Vn = −D (w,xxx nx + w,xyy nx + w,yxx ny + w,yyy ny),

then the two natural boundary conditions (7.119) can be written as

    Mn = 0,    ∂Mns/∂s + Vn = 0.    (7.121)

As a special case suppose that the plate is rectangular, see the right diagram in Figure 7.18, and that the right edge x = a, 0 ≤ y ≤ b, is free of load. Then nx = 1, ny = 0 and sx = 0, sy = 1 on ∂Ω2, and so (7.120) simplifies to

    Mn = −D w,xx,    Mns = −D w,xy,    Vn = −D (w,xxx + w,xyy),    (7.122)

which because of (7.100) and (7.101) shows that in this case Mn = Mx, Mns = Mxy, Vn = Vx. Thus the natural boundary conditions (7.121) can be written as

    Mx = 0,    ∂Mxy/∂y + Vx = 0.    (7.123)

This answers the question we posed soon after (7.101) as to what the correct boundary conditions on a free edge should be. We had noted that intuitively we would have expected the moments and forces to vanish on a free edge, and therefore that Mx = Mxy = Vx = 0 there; but this is in contradiction to the mathematical fact that the differential equation (7.116) only requires two conditions at each point on the boundary. The two natural boundary conditions (7.123) require that certain combinations of Mx, Mxy, Vx vanish, but not that all three vanish individually.

Example 3: Minimal surface equation. Let C be a closed curve in R³, as sketched in Figure 7.19. From among all surfaces S in R³ that have C as their boundary, we wish to
determine the one that has minimum area. Suppose that C is characterized by

    C: z = h(x, y) for (x, y) ∈ ∂D,

where the curve ∂D is the projection of C onto the x, y-plane and D denotes the simply connected region contained within ∂D. Let

    S: z = φ(x, y) for (x, y) ∈ D

describe a surface S in R³ that has C as its boundary; necessarily φ = h on ∂D. Thus the admissible set of functions we are considering is

    A = { φ | φ: D → R, φ ∈ C¹(D), φ = h on ∂D }.

Figure 7.19: The closed curve C in R³ is given, and the curve ∂D is its projection onto the x, y-plane. From among all surfaces S that have C as their boundary, the surface with minimal area is to be sought.

As a physical example, if C corresponds to a wire loop which we dip in a soapy solution, a thin soap film will form across C. The surface that forms is the one that, from among all possible surfaces S bounded by C, minimizes the total surface energy of the film, which (if the surface energy density is constant) is the surface with minimal area.

Consider a rectangular differential element on the x, y-plane that is contained within D. The vector joining (x, y) to (x + dx, y) is dx = dx i, and the vector joining (x, y) to (x, y + dy) is dy = dy j. If du and dv are the vectors on the surface z = φ(x, y) whose projections are dx and dy respectively, then we know that

    du = dx i + φx dx k,    dv = dy j + φy dy k.

The vectors du and dv define a parallelogram on the surface z = φ(x, y), and the area of this parallelogram is |du × dv|. Thus the area of a differential element on S is

    |du × dv| = | − φx dxdy i − φy dxdy j + dxdy k | = √(1 + φx² + φy²) dxdy.
Consequently the problem at hand is to minimize the functional

    F{φ} = ∫∫_D √(1 + φx² + φy²) dA

over the set of admissible functions

    A = { φ | φ: D → R, φ ∈ C²(D), φ = h on ∂D }.

It is left as an exercise to show that setting the first variation of F equal to zero leads to the so-called minimal surface equation

    (1 + φy²) φxx − 2 φx φy φxy + (1 + φx²) φyy = 0.

Remark: See en.wikipedia.org/wiki/Soap_bubble and www.susqu.edu/facstaff/b/brakke/ for additional discussion.

7.9 Second variation: another necessary condition for a minimum.

In order to illustrate the basic ideas of this section in the simplest possible setting, we confine the discussion to the particular functional

    F{φ} = ∫₀¹ f(x, φ, φ′) dx

defined over a set of admissible functions A. Suppose that a particular function φ minimizes F, and that for some given function η the one-parameter family of functions φ + εη is admissible for all sufficiently small values of the parameter ε. Define F̂(ε) by

    F̂(ε) = F{φ + εη} = ∫₀¹ f(x, φ + εη, φ′ + εη′) dx,

so that by Taylor expansion

    F̂(ε) = F̂(0) + ε F̂′(0) + (ε²/2) F̂″(0) + O(ε³),

where

    F̂(0) = ∫₀¹ f(x, φ, φ′) dx = F{φ},
    ε F̂′(0) = ε ∫₀¹ ( fφ η + fφ′ η′ ) dx = δF{φ, η},
    ε² F̂″(0) = ε² ∫₀¹ [ fφφ η² + 2 fφφ′ η η′ + fφ′φ′ (η′)² ] dx = δ²F{φ, η}.
Since φ minimizes F, it follows that ε = 0 minimizes F̂(ε), and consequently that F̂′(0) = 0 and F̂″(0) ≥ 0, i.e. δF{φ, η} = 0 and δ²F{φ, η} ≥ 0. Thus a necessary condition for a function φ to minimize a functional F, in addition to the requirement δF{φ, η} = 0, is that the second variation of F be non-negative for all admissible variations δφ:

    δ²F{φ, δφ} = ∫₀¹ [ fφφ (δφ)² + 2 fφφ′ (δφ)(δφ′) + fφ′φ′ (δφ′)² ] dx ≥ 0,    (7.124)

where we have set δφ = εη. The inequality is reversed if φ maximizes F.

Proposition (Legendre condition): A necessary condition for (7.124) to hold is that

    fφ′φ′(x, φ(x), φ′(x)) ≥ 0

for the minimizing function φ.

The condition δ²F{φ, η} ≥ 0 is necessary but not sufficient for the functional F to have a minimum at φ. We shall not discuss sufficient conditions in general in these notes.

Example: Consider a curve in the x, y-plane characterized by y = φ(x), 0 ≤ x ≤ 1, that begins at (0, φ0) and ends at (1, φ1). From among all such curves, find the one that, when rotated about the x-axis, generates the surface of minimum area. Thus we are asked to minimize the functional

    F{φ} = ∫₀¹ f(x, φ, φ′) dx,    where f(x, φ, φ′) = φ √(1 + (φ′)²),

over a set of admissible functions that satisfy the boundary conditions φ(0) = φ0, φ(1) = φ1. (The area of the surface of revolution is 2π ∫₀¹ φ √(1 + (φ′)²) dx; the constant factor 2π has been dropped since it does not affect the minimizer.)

A function φ that minimizes F must satisfy the boundary value problem consisting of the Euler equation and the given boundary conditions:

    d/dx [ φ φ′ / √(1 + (φ′)²) ] − √(1 + (φ′)²) = 0,    φ(0) = φ0,    φ(1) = φ1.

The general solution of this Euler equation is

    φ(x) = α cosh((x − β)/α) for 0 ≤ x ≤ 1,
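The claim that the catenary solves this Euler equation is easy to confirm numerically. The sketch below is our own check, with α = 1 and β = 0.5 chosen arbitrarily for illustration (in the problem itself α and β would be fixed by the boundary conditions).

```python
import math

# Check that phi(x) = alpha*cosh((x - beta)/alpha) satisfies the Euler
# equation  d/dx[ phi*phi' / sqrt(1 + phi'^2) ] - sqrt(1 + phi'^2) = 0
# for f = phi*sqrt(1 + phi'^2).  alpha and beta are illustrative choices.
alpha, beta = 1.0, 0.5

def phi(x):  return alpha * math.cosh((x - beta) / alpha)
def phip(x): return math.sinh((x - beta) / alpha)

def f_phip(x):
    # df/dphi' evaluated along the catenary
    return phi(x) * phip(x) / math.sqrt(1.0 + phip(x) ** 2)

h = 1e-5
for x in (0.1, 0.3, 0.7, 0.9):
    residual = (f_phip(x + h) - f_phip(x - h)) / (2 * h) - math.sqrt(1.0 + phip(x) ** 2)
    assert abs(residual) < 1e-6
print("Euler equation residual is numerically zero along the catenary")

# The Legendre condition also holds here: alpha / cosh^2((x-beta)/alpha) > 0.
assert all(alpha / math.cosh((x - beta) / alpha) ** 2 > 0 for x in (0.1, 0.5, 0.9))
```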
where the constants α and β are determined through the boundary conditions. To test the Legendre condition we calculate fφ′φ′ and find that

fφ′φ′ = φ / [1 + (φ′)²]^{3/2},

which, when evaluated at the particular function φ(x) = α cosh((x − β)/α), yields

fφ′φ′ |_{φ = α cosh((x−β)/α)} = α / cosh²((x − β)/α).

Therefore as long as α > 0 the Legendre condition is satisfied.

7.10 Sufficient condition for minimization of convex functionals

We now turn to a brief discussion of sufficient conditions for a minimum for a special class of functionals. It is useful to begin by reviewing the question of finding the minimum of a real-valued function of a real variable. A function F(x) defined for x ∈ A with continuous first derivatives is said to be convex if

F(x1) ≥ F(x2) + F′(x2)(x1 − x2)   for all x1, x2 ∈ A;

Figure 7.20: A convex curve y = F(x) lies above the tangent line through any point of the curve.
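As a quick numerical aside (not part of the notes), the value fφ′φ′ = α/cosh²((x − β)/α) computed in the preceding example can be confirmed by central differences; the constants α = 1.3, β = 0.4 below are arbitrary illustrative choices.

```python
import math

def f(phi, phip):
    # integrand f(φ, φ') = φ sqrt(1 + φ'^2) of the minimal-area problem
    return phi * math.sqrt(1.0 + phip * phip)

def f_pp(phi, phip, h=1e-4):
    # second partial derivative ∂²f/∂φ'² by central differences
    return (f(phi, phip + h) - 2.0 * f(phi, phip) + f(phi, phip - h)) / h ** 2

alpha, beta = 1.3, 0.4        # arbitrary catenary parameters (alpha > 0)
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    s = (x - beta) / alpha
    phi, phip = alpha * math.cosh(s), math.sinh(s)
    exact = alpha / math.cosh(s) ** 2     # the value quoted above
    assert abs(f_pp(phi, phip) - exact) < 1e-4
    assert f_pp(phi, phip) > 0.0          # Legendre condition holds for alpha > 0
print("Legendre condition verified along the catenary")
```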
see Figure 7.20 for a geometric interpretation of convexity. F is strictly convex if, in addition, F(x1) = F(x2) + F′(x2)(x1 − x2) if and only if x1 = x2. If a convex function has a stationary point at, say, x0, then it follows by setting x2 = x0 in the preceding inequality that x0 is a minimizer of F. Therefore a stationary point of a convex function is necessarily a minimizer. If F is strictly convex on A, then F can have only one stationary point and so can have only one interior minimum.

This is also true for a real-valued function F with continuous first derivatives on a domain A in Rⁿ, where convexity is defined by⁹

F(x1) ≥ F(x2) + δF(x2, x1 − x2)   for all x1, x2 ∈ A,

and strict convexity requires in addition that F(x1) = F(x2) + δF(x2, x1 − x2) if and only if x1 = x2. If such an F has a stationary point at x0, then since δF(x0, y) = 0 for all y, it follows that x0 is a minimizer of F.

We now turn to a functional F{φ}, which is said to be convex on A if

F{φ + η} ≥ F{φ} + δF{φ, η}   for all φ, φ + η ∈ A.   (7.125)

If F is stationary at φ0 ∈ A, i.e. if δF{φ0, η} = 0 for all admissible η, then it follows that φ0 is in fact a minimizer of F. Therefore a stationary point of a convex functional is necessarily a minimizer.

For example, consider the generic functional F{φ} = ∫₀¹ f(x, φ, φ′) dx. Then

δF{φ, η} = ∫₀¹ [ (∂f/∂φ) η + (∂f/∂φ′) η′ ] dx,

and so the convexity condition F{φ + η} − F{φ} ≥ δF{φ, η} takes the special form

∫₀¹ [ f(x, φ + η, φ′ + η′) − f(x, φ, φ′) − (∂f/∂φ) η − (∂f/∂φ′) η′ ] dx ≥ 0.   (7.126)

In general it might not be simple to test whether this condition holds in a particular case.

⁹ See equation (??) for the definition of δF(x, y).

It is readily seen that a sufficient condition for (7.126) to hold is that the integrands satisfy
the inequality

f(x, y + v, z + w) − f(x, y, z) ≥ (∂f/∂y) v + (∂f/∂z) w   (7.127)

for all (x, y, z) and (x, y + v, z + w) in the domain of f. This is precisely the requirement that the function f(x, y, z) be a convex function of y, z at each fixed x. Thus in summary: if the integrand f of the functional F defined in (7.125) satisfies the convexity condition (7.127), then a function φ that extremizes F is in fact a minimizer of F. Note that this is simply a sufficient condition for ensuring that an extremum is a minimum.

Remark: In the special case where f(x, y, z) is independent of y, one sees from basic calculus that if ∂²f/∂z² > 0 then f(x, z) is a strictly convex function of z at each fixed x.

Example: Geodesics. Find the curve of shortest length that lies entirely on a circular cylinder of radius a, beginning (in circular cylindrical coordinates (r, θ, ξ)) at (a, θ1, ξ1) and ending at (a, θ2, ξ2). We can characterize a curve in R³ parametrically, in circular cylindrical coordinates, by r = r(θ), ξ = ξ(θ), θ1 ≤ θ ≤ θ2. When the curve lies on the surface of a circular cylinder of radius a this specializes to r = a, ξ = ξ(θ) for θ1 ≤ θ ≤ θ2.

Figure 7.21: A curve that lies entirely on a circular cylinder of radius a, described by x = a cos θ, y = a sin θ, ξ = ξ(θ) for θ1 ≤ θ ≤ θ2, beginning at (a, θ1, ξ1) and ending at (a, θ2, ξ2).
Since the arc length can be written as

ds = √(dr² + r² dθ² + dξ²) = √( (r′(θ))² + (r(θ))² + (ξ′(θ))² ) dθ = √( a² + (ξ′(θ))² ) dθ,

our task is to minimize the functional

F{ξ} = ∫_{θ1}^{θ2} f(θ, ξ(θ), ξ′(θ)) dθ   where   f(x, y, z) = √(a² + z²),

over the set of all suitably smooth functions ξ(θ) defined for θ1 ≤ θ ≤ θ2 which satisfy ξ(θ1) = ξ1, ξ(θ2) = ξ2. Evaluating the necessary condition δF = 0 leads to the Euler equation, which, after using the boundary conditions ξ(θ1) = ξ1, ξ(θ2) = ξ2, leads to

ξ(θ) = ξ1 + [(ξ1 − ξ2)/(θ1 − θ2)] (θ − θ1).   (7.128)

Direct differentiation of f(x, y, z) = √(a² + z²) shows that

∂²f/∂z² = a² / (a² + z²)^{3/2} > 0,

and so f is a strictly convex function of z. Thus the curve of minimum length is given uniquely by (7.128), a helix. Note that if the circular cylindrical surface is cut along a vertical line and unrolled into a flat sheet, this curve unfolds into a straight line.

7.11 Direct method of the calculus of variations and minimizing sequences.

We now turn to a different method of seeking minima, and for purposes of introduction begin by reviewing the familiar case of a real-valued function f(x) of a real variable x ∈ (−∞, ∞). Consider the specific example f(x) = x². This function is nonnegative and has a minimum value of zero, which it attains at x = 0. Consider the sequence of numbers x0, x1, x2, x3, . . . where xk = 1/2ᵏ, and note that lim_{k→∞} f(xk) = 0.
Figure 7.22: (a) The function f(x) = x² for −∞ < x < ∞ and (b) the function f(x) = 1 for x ≤ 0, f(x) = x² for x > 0.
The sequence 1/2, 1/2², . . . , 1/2ᵏ, . . . is called a minimizing sequence in the sense that the value of the function f(xk) converges to the minimum value of f as k → ∞. Moreover, observe that lim_{k→∞} xk = 0 as well, and so the sequence itself converges to the minimizer of f, i.e. to x = 0. This latter feature holds because in this example

f( lim_{k→∞} xk ) = lim_{k→∞} f(xk).
As we know, not all functions have a minimum value, even if they happen to have a finite greatest lower bound. We now consider an example to illustrate the fact that a minimizing sequence can be used to find the greatest lower bound of a function that does not have a minimum. Consider the function f(x) = 1 for x ≤ 0 and f(x) = x² for x > 0. This function is nonnegative, and in fact it can take values arbitrarily close to the value 0. However it does not have a minimum value, since there is no value of x for which f(x) = 0 (note that f(0) = 1). The greatest lower bound or infimum (denoted by "inf") of f is

inf_{−∞ < x < ∞} f(x) = 0.
Again consider the sequence of numbers x0, x1, x2, x3, . . . where xk = 1/2ᵏ, and note that lim_{k→∞} f(xk) = 0.
In this case the value of the function f(xk) converges to the infimum of f as k → ∞. However, since lim_{k→∞} xk = 0, the limit of the sequence itself is x = 0, and f(0) is not the infimum of f. This is because in this example

f( lim_{k→∞} xk ) ≠ lim_{k→∞} f(xk).
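The two cases just contrasted are easy to reproduce numerically (a small illustrative script, not from the notes):

```python
def f_smooth(x):
    return x * x                       # attains its minimum value 0 at x = 0

def f_jump(x):
    return 1.0 if x <= 0 else x * x    # infimum 0, never attained (f(0) = 1)

xs = [1.0 / 2 ** k for k in range(60)]           # the minimizing sequence x_k = 1/2^k
assert f_smooth(xs[-1]) < 1e-30 and f_jump(xs[-1]) < 1e-30   # f(x_k) -> 0 in both cases
# ... but the limit point x = 0 of the sequence behaves differently:
assert f_smooth(0.0) == 0.0   # equals lim f(x_k): the limit of the sequence minimizes f
assert f_jump(0.0) == 1.0     # differs from lim f(x_k): the limit does not minimize f
```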
Returning now to a functional, suppose that we are to find the infimum (or the minimum if it exists) of a functional F{φ} over an admissible set of functions A. Let

inf_{φ ∈ A} F{φ} = m (> −∞).
Necessarily there must exist a sequence of functions φ1, φ2, . . . in A such that

lim_{k→∞} F{φk} = m;
such a sequence is called a minimizing sequence. If the sequence φ1, φ2, . . . converges to a limiting function φ*, and if

F{ lim_{k→∞} φk } = lim_{k→∞} F{φk},

then it follows that F{φ*} = m and the function φ* is the minimizer of F. The functions φk of a minimizing sequence can be considered to be approximate solutions of the minimization problem. Just as in the second example of this section, however, in some variational problems the limiting function φ* of a minimizing sequence φ1, φ2, . . . does not minimize the functional F; see the last Example of this section.
7.11.1 The Ritz method
Suppose that we are to minimize a functional F{φ} over an admissible set A. Consider an infinite sequence of functions φ1, φ2, . . . in A, and let Ap be the subset of functions in A that can be expressed as a linear combination of the first p functions φ1, φ2, . . . , φp. In order to minimize F over the subset Ap we must simply minimize the function

F̄(α1, α2, . . . , αp) = F{α1 φ1 + α2 φ2 + . . . + αp φp}
with respect to the real parameters α1, α2, . . . , αp. Suppose that the minimum of F on Ap is denoted by mp. Clearly A1 ⊂ A2 ⊂ A3 ⊂ . . . ⊂ A, and therefore m1 ≥ m2 ≥ m3 ≥ . . . .¹⁰ Thus, in the so-called Ritz method, we minimize F over a subset Ap to find an approximate minimizer; moreover, increasing the value of p improves the approximation in the sense of the preceding footnote.

Example: Consider an elastic bar of length L and modulus E that is fixed at both ends and carries a distributed axial load b(x). A displacement field u(x) must satisfy the boundary conditions u(0) = u(L) = 0, and the associated potential energy is

F{u} = ∫₀ᴸ (1/2) E (u′)² dx − ∫₀ᴸ b u dx.
We now use the Ritz method to find an approximate displacement field that minimizes F. Consider the sequence of functions v1, v2, v3, . . . where

vp = sin(pπx/L);

observe that vp(0) = vp(L) = 0 for all integers p. Consider the function

un(x) = Σ_{p=1}^{n} αp sin(pπx/L)

for any integer n ≥ 1 and evaluate

F̄(α1, α2, . . . , αn) = F{un} = ∫₀ᴸ (1/2) E (un′)² dx − ∫₀ᴸ b un dx.

Since

∫₀ᴸ cos(pπx/L) cos(qπx/L) dx = 0 for p ≠ q,   ∫₀ᴸ cos²(pπx/L) dx = L/2,

it follows that

∫₀ᴸ (un′)² dx = ∫₀ᴸ [ Σ_{p=1}^{n} αp (pπ/L) cos(pπx/L) ] [ Σ_{q=1}^{n} αq (qπ/L) cos(qπx/L) ] dx = Σ_{p=1}^{n} αp² p²π²/(2L).

Therefore

F̄(α1, α2, . . . , αn) = F{un} = Σ_{p=1}^{n} [ (1/4) E αp² p²π²/L − αp ∫₀ᴸ b sin(pπx/L) dx ].   (7.129)
¹⁰ If the sequence φ1, φ2, . . . is complete, and the functional F{φ} is continuous in the appropriate norm, then one can show that lim_{p→∞} mp = m.
To minimize F̄(α1, α2, . . . , αn) with respect to αp we set ∂F̄/∂αp = 0. This leads to

αp = [ ∫₀ᴸ b sin(pπx/L) dx ] / [ E p²π²/(2L) ]   for p = 1, 2, . . . , n.   (7.130)
Therefore, by substituting (7.130) into (7.129), we find that the n-term Ritz approximation of the energy is

− Σ_{p=1}^{n} (1/4) E αp² p²π²/L,   where   αp = [ ∫₀ᴸ b sin(pπx/L) dx ] / [ E p²π²/(2L) ],

and the corresponding approximate displacement field is given by

un = Σ_{p=1}^{n} αp sin(pπx/L)

with the same coefficients αp.
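As a concrete illustration of the Ritz formulas above, take the hypothetical data E = L = 1 and the constant load b(x) = 1 (these values are not from the notes). The exact solution of Eu″ + b = 0, u(0) = u(L) = 0 is then u(x) = x(1 − x)/2, and the Ritz partial sums converge to it rapidly:

```python
import math

E = L = 1.0                       # hypothetical data; b(x) = 1 (constant)

def alpha(p):
    # (7.130): alpha_p = (integral of b sin(p pi x/L) dx) / (E p^2 pi^2/(2L));
    # with b = 1 the integral is (1 - cos p*pi) * L/(p*pi)
    load = (1.0 - math.cos(p * math.pi)) * L / (p * math.pi)
    return load / (E * p ** 2 * math.pi ** 2 / (2.0 * L))

def u_ritz(x, n):
    return sum(alpha(p) * math.sin(p * math.pi * x / L) for p in range(1, n + 1))

def u_exact(x):
    return x * (L - x) / (2.0 * E)    # solves E u'' + b = 0, u(0) = u(L) = 0

err = max(abs(u_ritz(0.01 * i, 25) - u_exact(0.01 * i)) for i in range(101))
assert err < 1e-3
print(err)   # roughly 1e-4 with 25 terms
```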
References

1. J.L. Troutman, Variational Calculus with Elementary Convexity, Springer-Verlag, 1983.
2. C. Fox, An Introduction to the Calculus of Variations, Dover, 1987.
3. G. Bliss, Lectures on the Calculus of Variations, University of Chicago Press, 1946.
4. L.A. Pars, Calculus of Variations, Wiley, 1963.
5. R. Weinstock, Calculus of Variations with Applications, Dover, 1952.
6. I.M. Gelfand and S.V. Fomin, Calculus of Variations, Prentice-Hall, 1963.
7. T. Mura and T. Koya, Variational Methods in Mechanics, Oxford, 1992.
7.12 Worked Examples.
Example 7.N: Consider two given points (x1, h1) and (x2, h2), with h1 > h2, that are to be joined by a smooth wire. The wire is permitted to have any shape, provided that it does not enter the interior of the circular region (x − x0)² + (y − y0)² ≤ R². A bead is released from rest from the point (x1, h1) and slides along the wire (without friction) due to gravity. For what shape of wire is the time of travel from (x1, h1) to (x2, h2) least?

Figure 7.23: A curve y = φ(x) joining (x1, h1) to (x2, h2) which is disallowed from entering the forbidden region (x − x0)² + (φ(x) − y0)² < R².

The travel time of the bead is again given by (7.1), and the test functions must satisfy the same requirements as in the first example except that, in addition, they must satisfy the (inequality) constraint

(x − x0)² + (φ(x) − y0)² ≥ R²,   x1 ≤ x ≤ x2.   (i)

Therefore, in considering different wires that connect (x1, h1) to (x2, h2), we may only consider those that lie entirely outside this region. Our task is to minimize T{φ} over the set A1 subject to the constraint (i).

Example 7.N: Buckling: Consider a beam whose centerline occupies the interval y = 0, 0 < x < L, in an undeformed configuration. The beam is fixed by a pin at x = 0; the end x = L is also pinned, but is permitted to move along the x-axis. A compressive force P is applied at x = L and the beam adopts a buckled shape described by y = φ(x). Figure NNN shows the centerline of the beam in both the undeformed and deformed configurations. The prescribed geometric boundary conditions on the deflected shape of the beam are thus φ(0) = φ(L) = 0. By geometry, the curvature κ(x) of a curve y = φ(x) is given by

κ(x) = φ″(x) / [1 + (φ′(x))²]^{3/2}.

From elasticity theory we know that the bending energy per unit length of a beam is (1/2)Mκ, and that the bending moment M is related to the curvature κ by M = EIκ, where EI is the bending stiffness of the beam. Thus the bending energy associated with a differential element of the beam is (1/2)EIκ² ds, where ds
is arc length along the deformed beam. Thus the total bending energy in the beam is

∫₀ᴸ (1/2) EI κ²(x) ds,

where the arc length s is related to the coordinate x by the geometric relation ds = √(1 + (φ′(x))²) dx. Thus the total bending energy of the beam is

∫₀ᴸ (1/2) EI (φ″)² / [1 + (φ′)²]^{5/2} dx.

Next we need to account for the potential energy associated with the compressive force P on the beam. Since the change in length of a differential element is ds − dx, the amount by which the right hand end of the beam moves leftwards is

∫₀ᴸ ds − ∫₀ᴸ dx = ∫₀ᴸ √(1 + (φ′)²) dx − L.

Thus the potential energy associated with the applied force P is

−P [ ∫₀ᴸ √(1 + (φ′)²) dx − L ].

Therefore the total potential energy of the system is

Φ{φ} = ∫₀ᴸ (1/2) EI (φ″)² / [1 + (φ′)²]^{5/2} dx − P ∫₀ᴸ [ √(1 + (φ′)²) − 1 ] dx.

Figure 7.24: An elastic beam in undeformed (lower figure) and buckled (upper figure) configurations.

For a functional of the form ∫₀ᴸ f(x, φ, φ′, φ″) dx the Euler equation has the general form

d²/dx² ( fφ″ ) − d/dx ( fφ′ ) + fφ = 0.

This equation
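Before pushing the exact equations further, a small-slope sanity check is useful. To quadratic order the energy just assembled reduces to Φ ≈ ∫₀ᴸ [ (1/2)EI(φ″)² − (1/2)P(φ′)² ] dx, whose Euler equation is EIφ⁗ + Pφ″ = 0; an exercise below asks for precisely this linearization. The sketch below, with hypothetical values EI = L = 1 (not from the notes), checks that φ = sin(πx/L) satisfies the linearized equation exactly at the classical Euler load P = π²EI/L²:

```python
import math

EI, L = 1.0, 1.0                      # hypothetical values
P_euler = math.pi ** 2 * EI / L ** 2  # classical Euler buckling load

def residual(x, P):
    # EI phi'''' + P phi'' evaluated for the trial buckled shape phi = sin(pi x/L)
    k = math.pi / L
    phi4 = k ** 4 * math.sin(k * x)       # phi''''
    phi2 = -(k ** 2) * math.sin(k * x)    # phi''
    return EI * phi4 + P * phi2

xs = [0.1 * i for i in range(11)]
assert all(abs(residual(x, P_euler)) < 1e-10 for x in xs)   # a solution at P = P_cr
assert residual(0.5, 0.5 * P_euler) > 1.0                   # not a solution below P_cr
print("Euler load:", P_euler)
```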
simplifies in the present case, since f does not depend explicitly on φ: the last term drops out, and the resulting equation can be integrated once immediately. This eventually leads to the Euler equation

EI d/dx { φ″ / [1 + (φ′)²]^{5/2} } + (5/2) EI (φ″)² φ′ / [1 + (φ′)²]^{7/2} + P φ′ / [1 + (φ′)²]^{1/2} = c,

where c is a constant of integration. The natural boundary conditions are φ″(0) = φ″(L) = 0.

Example 7.N: Linearize the boundary value problem in the buckling problem above: approximate the energy for small deflections and derive the Euler equation associated with it.

Example 7.N: Null lagrangian.

Example 7.N: Consider u(x, t) where 0 ≤ x ≤ L, 0 ≤ t ≤ T. Functional:

F{u} = ∫₀ᵀ ∫₀ᴸ ( (1/2) u_t² − (1/2) u_x² ) dx dt.

Euler equation (wave equation): u_tt − u_xx = 0.

Example 7.N: Physical example? Functional:

F{u} = ∫₀ᵀ ∫₀ᴸ (1/2) ( u_t² − u_x² − m² u² ) dx dt.

Euler equation (Klein-Gordon equation): u_tt − u_xx + m² u = 0.

Example 7.2: Soap Film Problem. Consider two circular wires, each of radius R, that are placed coaxially, a distance 2H apart (in the coordinates used below, the wires lie in the planes x = ±H). The planes defined by the two circles are parallel to each other and perpendicular to their common axis. This arrangement of wires is dipped into a soapy bath and taken out, and a film forms. Suppose that the film spans across the two circular wires. We shall assume that the soap film adopts the shape with minimum surface energy, which implies that we are to find the shape with minimum surface area. Determine the shape of the soap film that forms.
By symmetry, the surface must coincide with the surface of revolution of some curve y = φ(x), −H ≤ x ≤ H. By geometry, the surface area of this film is

Area{φ} = 2π ∫ φ ds = 2π ∫_{−H}^{H} φ √(1 + (φ′)²) dx,

where we have used the fact that ds = √(1 + (φ′)²) dx, and this is to be minimized subject to the requirements

φ(−H) = φ(H) = R,   φ(x) ≥ 0 for −H < x < H.

In order to determine the shape that minimizes the surface area we calculate its first variation δArea and set it equal to zero. This gives the Euler equation

d/dx [ φ φ′ / √(1 + (φ′)²) ] − √(1 + (φ′)²) = 0.

Multiplying through by φ φ′ / √(1 + (φ′)²), this can be written as

d/dx { [ φ φ′ / √(1 + (φ′)²) ]² − φ² } = 0,

which can be integrated to give

(φ′)² = (φ/c)² − 1,

where c is a constant. Integrating again and using the boundary conditions φ(H) = φ(−H) = R leads to

φ(x) = c cosh(x/c),   (i)

where c is to be determined from the algebraic equation

cosh(H/c) = R/c.   (ii)

Given H and R, if this equation can be solved for c, then the minimizing shape is given by (i) with this value (or values) of c. To examine the solvability of (ii), set ξ = H/c and μ = R/H; then the equation can be written as

cosh ξ = μξ.

As seen from Figure 7.25, the graph ζ = cosh ξ intersects the straight line ζ = μξ twice if μ > μ*, once if μ = μ*, and there is no intersection if μ < μ*. Here μ* ≈ 1.50888 is found by solving the pair of algebraic equations

cosh ξ = μ*ξ,   sinh ξ = μ*,

where the latter equation reflects the tangency of the two curves at the contact point in this limiting case.

Figure 7.25: Intersection of the curve described by ζ = cosh ξ with the straight line ζ = μξ.

Thus in summary: if R > μ*H there are two shapes of the soap film given by (i) that extremize the surface area (and further analysis investigating the stability of these configurations is needed in order to determine the physically realized shape); if R = μ*H there is a unique shape of the soap film given by (i) that extremizes the surface area; and if R < μ*H there is no shape of the soap film that extremizes the surface area.
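The critical value μ* quoted above is easy to reproduce: eliminating μ* from the tangency pair gives ξ sinh ξ − cosh ξ = 0, after which μ* = sinh ξ. A bisection sketch (illustrative, not from the notes):

```python
import math

# tangency of zeta = cosh(xi) with zeta = mu*xi requires cosh(xi) = mu*xi and
# sinh(xi) = mu, i.e. xi*sinh(xi) - cosh(xi) = 0; solve by bisection on [1, 1.5]
lo, hi = 1.0, 1.5
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if mid * math.sinh(mid) - math.cosh(mid) > 0.0:
        hi = mid
    else:
        lo = mid
xi_star = 0.5 * (lo + hi)
mu_star = math.sinh(xi_star)

assert abs(mu_star - 1.50888) < 1e-4   # the critical ratio R/H quoted above
print(xi_star, mu_star)
```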
Remark: In order to understand what happens when R < μ*H, consider the following heuristic argument. There are three possible configurations of the soap film to consider: one, the film bridges across from one circular wire to the other but does not form on the flat faces of the two circular wires themselves (which is the case analyzed above); two, the film forms on each circular wire but does not bridge the two wires; and three, the film does both of the above. We can immediately discard the third case, since it involves more surface area than either of the first two cases. Consider the first possibility and suppose, as an approximation, that the film forms a circular cylinder of radius R and length 2H; its area is then 2πR(2H) = 4πRH. In the second case the soap film covers only the two end regions formed by the circular wires, and so its area is 2 × πR². Since 4πRH < 2πR² for 2H < R, and 4πRH > 2πR² for 2H > R, this suggests that the soap film will span across the two circular wires if R > 2H, whereas the soap will not span across the two circular wires if R < 2H (and would instead cover only the two circular ends).

Example 7.4: Minimum Drag Problem. Consider a spacecraft whose shape is to be designed such that the drag on it is a minimum. The outer surface of the spacecraft is composed of two segments, as shown in Figure 7.26: the inner portion (x = 0 with 0 < y < h1 in the figure) is a flat circular disk-shaped nose of radius h1, and the outer portion is obtained by rigidly rotating the curve y = φ(x), 0 < x < 1, about the x-axis. The spacecraft moves at a speed V in the −x-direction. We are told that φ(0) = h1 and φ(1) = h2, with the value of h2 being given; the value of h1, however, is not prescribed and is to be chosen, along with the function φ(x), such that the drag is minimized.

Figure 7.26: The shape of the spacecraft with minimum drag is generated by rotating the curve y = φ(x) about the x-axis. The pressure p ~ (V cos θ)² acts on the differential area dA = 2πφ ds, producing the force dF = p dA.

According to the most elementary model of drag (due to Newton), the pressure at some point on a surface is proportional to the square of the normal speed of that point. Thus if the spacecraft has speed V relative to the surrounding medium, the pressure on the body at some generic point is proportional to (V cos θ)², where θ is the angle shown in Figure 7.26; this pressure acts on a differential area dA = 2πy ds = 2πφ ds. The horizontal component of the resulting force is obtained by integrating dF cos θ = (V cos θ)² × (2πφ ds) × cos θ over the entire body. Thus the drag D is given, in suitable units, by

D = π h1² + 2π ∫₀¹ φ (φ′)³ / [1 + (φ′)²] dx,

where we have used the fact that ds = √(1 + (φ′)²) dx and cos θ = φ′ / √(1 + (φ′)²). To optimize this we calculate the first variation of D, remembering that both the function φ and the parameter h1 can be varied. Thus we are led to

δD = 2π h1 δh1 + 2π ∫₀¹ (φ′)³ / [1 + (φ′)²] δφ dx + 2π ∫₀¹ φ (φ′)² (3 + (φ′)²) / [1 + (φ′)²]² δφ′ dx.
Integrating the last term by parts, and recalling that δφ(1) = 0 and δφ(0) = δh1 (since the value of φ(1) is prescribed but the value of φ(0) = h1 is not), we are led to

δD = 2π h1 δh1 + 2π ∫₀¹ (φ′)³ / [1 + (φ′)²] δφ dx − 2π ∫₀¹ d/dx { φ (φ′)² (3 + (φ′)²) / [1 + (φ′)²]² } δφ dx − 2π [ φ (φ′)² (3 + (φ′)²) / [1 + (φ′)²]² ]_{x=0} δh1.

The arbitrariness of δφ and δh1 now yields

d/dx { φ (φ′)² (3 + (φ′)²) / [1 + (φ′)²]² } − (φ′)³ / [1 + (φ′)²] = 0   for 0 < x < 1,
[ (φ′)² (3 + (φ′)²) / [1 + (φ′)²]² ]_{x=0} = 1.   (i)

The differential equation (i)₁ and the natural boundary condition (i)₂ can be readily reduced to

d/dx { φ (φ′)³ / [1 + (φ′)²]² } = 0 for 0 < x < 1,   φ′(0) = 1.   (ii)

The differential equation (ii)₁ tells us that

φ (φ′)³ = c1 [1 + (φ′)²]²   for 0 < x < 1,   (iii)

where c1 is a constant. Since φ(0) = h1 and φ′(0) = 1, this shows that c1 = h1/4. Together with the given boundary conditions, we are therefore to solve the differential equation (iii) subject to the conditions

φ(0) = h1,   φ′(0) = 1,   φ(1) = h2,   (iv)

in order to find the shape φ(x) and the parameter h1.

It is most convenient to write the solution of (iii) with c1 = h1/4 parametrically, by setting φ′ = ξ. On physical grounds we expect the slope φ′ to decrease with increasing x, and so we suppose that as x increases from 0 to 1 the parameter ξ decreases from ξ1 = φ′(0) = 1 to some value ξ2. This leads to

φ = (h1/4) ( ξ⁻³ + 2ξ⁻¹ + ξ ),   x = (h1/4) ( (3/4) ξ⁻⁴ + ξ⁻² + log ξ ) + c2,   1 > ξ > ξ2,   (v)

where c2 is a constant of integration and ξ is the parameter. Since ξ = φ′ = 1 when x = 0, the second of these gives c2 = −7h1/16, so that

φ = (h1/4) ( ξ⁻³ + 2ξ⁻¹ + ξ ),   x = (h1/4) ( (3/4) ξ⁻⁴ + ξ⁻² + log ξ − 7/4 ),   1 > ξ > ξ2.

The boundary condition φ = h2, x = 1 at ξ = ξ2 requires that

h2 = (h1/4) ( ξ2⁻³ + 2ξ2⁻¹ + ξ2 ),   1 = (h1/4) ( (3/4) ξ2⁻⁴ + ξ2⁻² + log ξ2 − 7/4 ),   (vi)

which are two equations for determining h1 and ξ2. Dividing the first of (vi) by the second yields a single equation for ξ2:

ξ2⁵ + 2ξ2³ + ξ2 − h2 ξ2⁴ log ξ2 + (7/4) h2 ξ2⁴ − h2 ξ2² − (3/4) h2 = 0.   (vii)

If this can be solved for ξ2, then either equation in (vi) gives the value of h1, and (v) then provides a parametric description of the optimal shape. For example, if we take h2 = 1, then the root of (vii) is ξ2 ≈ 0.521703, and then h1 ≈ 0.350943.
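The numbers quoted for the case h2 = 1 can be checked by root-finding on (vii) and then back-substituting into the second equation of (vi):

```python
import math

h2 = 1.0   # the case worked above

def g(xi):
    # left-hand side of equation (vii)
    return (xi ** 5 + 2.0 * xi ** 3 + xi
            - h2 * xi ** 4 * math.log(xi) + 1.75 * h2 * xi ** 4
            - h2 * xi ** 2 - 0.75 * h2)

lo, hi = 0.4, 0.7                 # g changes sign on this bracket
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if g(mid) > 0.0:
        hi = mid
    else:
        lo = mid
xi2 = 0.5 * (lo + hi)

# h1 from the second equation of (vi)
h1 = 4.0 / (0.75 * xi2 ** -4 + xi2 ** -2 + math.log(xi2) - 1.75)

assert abs(xi2 - 0.521703) < 1e-4
assert abs(h1 - 0.350943) < 1e-4
print(xi2, h1)
```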
Example 7.5: Consider the variational problem where we are asked to minimize the functional

F{φ} = ∫₀¹ f(φ, φ′) dx

over some admissible set of functions A. Note that this is a special case of the standard problem, in which the function f(x, φ, φ′) happens not to depend explicitly on x. The Euler equation is given, as usual, by

d/dx ( fφ′ ) − fφ = 0   for 0 < x < 1.

Multiplying this by φ′ gives φ′ d/dx(fφ′) − φ′ fφ = 0, which can be written equivalently as

d/dx ( φ′ fφ′ ) − φ″ fφ′ − φ′ fφ = 0.

Since f does not depend explicitly on x, we have df/dx = fφ φ′ + fφ′ φ″, and so this simplifies to

d/dx [ φ′ fφ′ − f ] = 0   for 0 < x < 1.

It follows that in this special case the Euler equation can be integrated once to have the simplified form

φ′ fφ′ − f = constant   for 0 < x < 1.

Remark: We could have taken advantage of this in, for example, the preceding problem.

Example 7.6: Elastic bar. The following problem arises when examining the equilibrium state of a one-dimensional bar composed of a nonlinearly elastic material. An equilibrium state of the bar is characterized
by a displacement field u(x), and the material of which the bar is composed is characterized by a potential W(u′). It is convenient to denote the derivative of W by σ, i.e. σ(u′) = W′(u′), so that σ(x) = σ(u′(x)) represents the stress at the point x in the bar. The bar has unit cross-sectional area and occupies the interval 0 ≤ x ≤ L in a reference configuration. The end x = 0 of the bar is fixed, so that u(0) = 0; a prescribed force P is applied at the end x = L; and a distributed force per unit length b(x) is applied along the length of the bar.

An admissible displacement field is required to be continuous on [0, L], piecewise continuously differentiable on [0, L], and to conform to the boundary condition u(0) = 0. The total potential energy associated with any admissible displacement field is

V{u} = ∫₀ᴸ W(u′(x)) dx − ∫₀ᴸ b(x) u(x) dx − P u(L),

which can be written in the conventional form

V{u} = ∫₀ᴸ f(x, u, u′) dx   where   f(x, u, u′) = W(u′) − b u − P u′.

The actual displacement field minimizes the potential energy V over the admissible set, and so the three basic ingredients of the theory can now be derived as follows:

i. At any point x at which the displacement field is smooth, the Euler equation d/dx(∂f/∂u′) − ∂f/∂u = 0 takes the explicit form

d/dx W′(u′) + b = 0,

which can be written in terms of stress as dσ/dx + b = 0.

ii. The displacement field u(x) satisfies the prescribed boundary condition u = 0 at x = 0. The natural boundary condition at the right hand end x = L is given, according to equation (7.50) in Section 7.5.1, by fu′ = 0, which in the present case reduces to

σ(x) = W′(u′(x)) = P   at x = L.

iii. Finally, suppose that u′ has a jump discontinuity at some location x = s. Then the first Weierstrass-Erdmann corner condition (7.88) requires that ∂f/∂u′ be continuous at x = s, i.e. that the stress σ(x) must be continuous at x = s:

σ |_{x=s−} = σ |_{x=s+}.   (i)
The second Weierstrass-Erdmann corner condition (7.89) requires that f − u′ ∂f/∂u′ be continuous at x = s, i.e. that the quantity W − u′σ must be continuous at x = s:

(W − u′σ) |_{x=s−} = (W − u′σ) |_{x=s+}.   (ii)

Remark: The generalization of the quantity W − u′σ to 3-dimensions is known as the Eshelby tensor.

In order to illustrate how a discontinuity in u′ can arise in an elastic bar, observe first that according to the first Weierstrass-Erdmann condition (i), the stress σ on either side of x = s has to be continuous. If the function σ(u′) is monotonically increasing, then it follows that σ = σ(u′) has a unique solution u′ corresponding to a given σ, and so u′ must also be continuous at x = s. On the other hand, if σ(u′) is a nonmonotonic function as, for example, shown in Figure 7.27(a), then more than one value of u′ can correspond to the same value of σ, and so in such a case, even though σ(x) is continuous at x = s, it is possible for u′ to be discontinuous, i.e. for u′(s−) ≠ u′(s+). The energy function W(u′) sketched in Figure 7.27(b) corresponds to the stress function σ(u′) shown in Figure 7.27(a); in particular, the values of u′ at which σ has a local maximum and local minimum correspond to inflection points of the energy function W(u′), since W″ = σ′ = 0 there.

Figure 7.27: (a) A nonmonotonic (rising-falling-rising) stress response function σ(u′) and (b) the corresponding nonconvex energy W(u′).

The second Weierstrass-Erdmann condition (ii) tells us that the stress σ at the discontinuity has to have a special value. To see this, we write out (ii) explicitly as

W(u′(s+)) − u′(s+) σ = W(u′(s−)) − u′(s−) σ

and then use σ(u′) = W′(u′) to express it in the form

∫_{u′(s−)}^{u′(s+)} σ(v) dv = σ [ u′(s+) − u′(s−) ].   (iii)

This implies that the value of σ must be such that the area under the stress response curve in Figure 7.27(a) from u′(s−) to u′(s+) equals the area of the rectangle which has the same base and has height σ, or equivalently, that the two shaded areas in Figure 7.27(a) must be equal.
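As a concrete illustration of the equal-area rule, take the hypothetical nonconvex energy W(v) = v⁴/4 − v²/2 (not from the notes), whose stress σ(v) = v³ − v rises, falls and rises again as in Figure 7.27(a). By symmetry the corner stress is σ = 0 with u′(s−) = −1 and u′(s+) = +1, and both corner conditions can be checked directly:

```python
def sigma(v):
    # hypothetical nonmonotone stress response: W(v) = v^4/4 - v^2/2
    return v ** 3 - v

def W(v):
    return v ** 4 / 4.0 - v ** 2 / 2.0

um, up, s = -1.0, 1.0, 0.0   # candidate corner: u'(s-) = -1, u'(s+) = +1, stress s = 0

# first corner condition (i): the stress is continuous across the corner
assert sigma(um) == sigma(up) == 0.0

# second corner condition in the form (iii): integral of sigma = s * (up - um)
n = 100000
dv = (up - um) / n
integral = sum(sigma(um + (j + 0.5) * dv) * dv for j in range(n))   # midpoint rule
assert abs(integral - s * (up - um)) < 1e-9

# equivalently, W - u' sigma is continuous across the corner
assert abs((W(up) - up * s) - (W(um) - um * s)) < 1e-12
```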
Example 7.7: Nonsmooth extremal. Find a curve that extremizes

F{φ} = ∫₀¹ f(x, φ(x), φ′(x)) dx,

that begins from (0, b) and ends at (1, a) after contacting a given curve y = g(x). Remark: By identifying the curve y = g(x) with the surface of a mirror, and specializing the functional F to the travel time of light, one can thus derive the law of reflection for light.

Example 7.8: Inequality Constraint. Find a curve that extremizes

I{φ} = ∫₀ᵃ (φ′(x))³ dx,   φ(0) = φ(a) = 0,

that is prohibited from entering the interior of the circle (x − a/2)² + y² = b².

Example 7.9: An example to caution against simple-minded discretization. (Due to John Ball). Let

F{u} = ∫₀¹ (u³(x) − x)² (u′(x))² dx

for all functions such that u(0) = 0, u(1) = 1. Clearly F{u} ≥ 0. Moreover F{ū} = 0 for ū(x) = x^{1/3}. Therefore the minimizer of F{u} is ū(x) = x^{1/3}. Now discretize the interval [0, 1] into N segments, and take a piecewise linear test function that is linear on each segment. Calculate the functional F at this test function, next minimize it at fixed N, and finally take its limit as N tends to infinity. What do you get? (You will get an answer but not the correct one.) To anticipate this difficulty in a different way, consider a 2-element discretization and take the continuous test function

u1(x) = cx for 0 < x < h,   u1(x) = ū(x) for h < x < 1.   (i)

Calculate F{u1} for this function, take the limit as h → 0, and observe that F{u1} does not go to zero (i.e. to F{ū}).
Example 7.10: Legendre necessary condition for a local minimum: Let

F̂(ε) = ∫₀¹ f(x, φ + εη, φ′ + εη′) dx

for all functions η(x) with η(0) = η(1) = 0. Show that

F̂″(0) = ∫₀¹ [ fφφ η² + 2 fφφ′ η η′ + fφ′φ′ (η′)² ] dx.

Suppose that F̂″(0) ≥ 0 for all admissible functions η. Show that it is then necessary that

fφ′φ′(x, φ(x), φ′(x)) ≥ 0   for 0 ≤ x ≤ 1.

Example 7.11: Bending of a thin plate. Consider a thin rectangular plate of dimensions a × b that occupies the region A = {(x, y) | 0 < x < a, 0 < y < b} of the x, y-plane. A distributed pressure loading p(x, y) is applied on the planar face of the plate in the z-direction, and the resulting deflection of the plate in the z-direction is denoted by w(x, y). The edges x = 0 and y = 0 of the plate are clamped, which implies the geometric restrictions that the plate can neither deflect nor rotate along these edges:

w = 0, ∂w/∂x = 0 on x = 0, 0 < y < b;   (i)
w = 0, ∂w/∂y = 0 on y = 0, 0 < x < a.   (ii)

The edge y = b is hinged, which means that its deflection must be zero but there is no geometric restriction on the slope:

w = 0 on y = b, 0 < x < a;   (iii)

and finally the edge x = a is free, in the sense that the deflection and the slope are not geometrically restricted there in any way.

The potential energy of the plate and loading associated with an admissible deflection field, i.e. a function w(x, y) that obeys (i), (ii) and (iii), is

Φ{w} = (D/2) ∫∫_A { ( ∂²w/∂x² + ∂²w/∂y² )² − 2(1 − ν) [ (∂²w/∂x²)(∂²w/∂y²) − (∂²w/∂x∂y)² ] } dxdy − ∫∫_A p w dxdy,   (iv)

where D and ν are constants. The actual deflection of the plate is given by the minimizer of Φ. We are asked to derive the Euler equation and the natural boundary conditions to be satisfied by the minimizing w.

Answer: The Euler equation is

∂⁴w/∂x⁴ + 2 ∂⁴w/∂x²∂y² + ∂⁴w/∂y⁴ = p/D,   (v)

and the natural boundary conditions are

∂²w/∂y² + ν ∂²w/∂x² = 0 on y = b, 0 < x < a,
and

∂²w/∂x² + ν ∂²w/∂y² = 0,   ∂³w/∂x³ + (2 − ν) ∂³w/∂x∂y² = 0   on x = a, 0 < y < b.   (vi)

Example 7.12: Consider the functional

F{φ} = ∫₀¹ [ ( (φ′)² − 1 )² + φ² ] dx,   φ(0) = φ(1) = 0,   (i)

and determine a minimizing sequence φ1, φ2, φ3, . . ., i.e. a sequence such that F{φk} approaches its infimum as k → ∞. If the minimizing sequence itself converges to a limiting function φ*, show that F{φ*} is not the infimum of F.

Remark: Note that this functional is nonnegative, since its integrand is the sum of two nonnegative terms. If the functional is to take the value zero, then each of those terms must vanish individually; thus we would need φ′(x) = ±1 and φ(x) = 0 on the interval 0 < x < 1, and these cannot both be satisfied by a regular function.

Let φk(x) be the piecewise linear sawtooth function with k local maxima shown in Figure 7.28: the slope of each linear segment is ±1, the base of each tooth is h = 1/k, and the height of each tooth is h/2. Thus as k increases there are more and more teeth, each of which has a smaller base and a smaller height.

Figure 7.28: Sawtooth function φk(x) with k local maxima and linear segments of slope ±1; on a typical tooth, φk(x) = x − nh and then φk(x) = −x + (n + 1)h for nh ≤ x ≤ (n + 1)h.

Observe that the first term in the integrand of (i) vanishes identically for any k, while the second term, which equals the area under the square of φk, approaches zero as k → ∞. Thus the family of sawtooth functions is a minimizing sequence of (i). However, note that since φk(x) → 0 at each fixed x as k → ∞, the limiting function to which this sequence converges is φ(x) = 0, and this function is not a minimizer of (i): indeed F{0} = 1, while the infimum of F is 0.
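For the sawtooth sequence above, the value F{φk} can be evaluated in closed form: the first term of the integrand vanishes identically (the slope is ±1 almost everywhere), and what remains is k times the integral of the squared tooth profile, giving F{φk} = k · h³/12 = 1/(12k²) → 0. A numerical check of this claim:

```python
def F_sawtooth(k, n_per_tooth=2000):
    # F{phi_k} = integral of ((phi')^2 - 1)^2 + phi^2; the slope is +/-1 so the
    # first term vanishes and only the integral of phi_k^2 remains; integrate
    # one tooth by the midpoint rule and multiply by the number of teeth k
    h = 1.0 / k
    dx = h / n_per_tooth
    total = 0.0
    for j in range(n_per_tooth):
        x = (j + 0.5) * dx
        phi = min(x, h - x)          # triangular tooth profile of height h/2
        total += phi * phi * dx
    return k * total

for k in (1, 2, 5, 10, 50):
    assert abs(F_sawtooth(k) - 1.0 / (12.0 * k * k)) < 1e-6
print([round(F_sawtooth(k), 6) for k in (1, 2, 5, 10)])   # decreasing toward 0
```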