Lecture Notes
William G. Faris
May 14, 2002
Contents

1 Linear Algebra
  1.1 Matrices
    1.1.1 Matrix algebra
    1.1.2 Reduced row echelon form
    1.1.3 Problems
    1.1.4 The Jordan form
    1.1.5 Problems
    1.1.6 Quadratic forms
    1.1.7 Spectral theorem
    1.1.8 Circulant (convolution) matrices
    1.1.9 Problems
  1.2 Vector spaces
    1.2.1 Vector spaces
    1.2.2 Linear transformations
    1.2.3 Reduced row echelon form
    1.2.4 Jordan form
    1.2.5 Forms and dual spaces
    1.2.6 Quadratic forms
    1.2.7 Special relativity
    1.2.8 Scalar products and adjoint
    1.2.9 Spectral theorem
  1.3 Vector fields and differential forms
    1.3.1 Coordinate systems
    1.3.2 Vector fields
    1.3.3 Differential forms
    1.3.4 Linearization of a vector field near a zero
    1.3.5 Quadratic approximation to a function at a critical point
    1.3.6 Differential, gradient, and divergence
    1.3.7 Spherical polar coordinates
    1.3.8 Gradient systems
    1.3.9 Hamiltonian systems
    1.3.10 Contravariant and covariant
    1.3.11 Problems

2 Fourier series
  2.1 Orthonormal families
  2.2 L^2 convergence
  2.3 Absolute convergence
  2.4 Pointwise convergence
  2.5 Problems

3 Fourier transforms
  3.1 Introduction
  3.2 L^1 theory
  3.3 L^2 theory
  3.4 Absolute convergence
  3.5 Fourier transform pairs
  3.6 Problems
  3.7 Poisson summation formula
  3.8 Problems
  3.9 PDE Problems

4 Complex integration
  4.1 Complex number quiz
  4.2 Complex functions
    4.2.1 Closed and exact forms
    4.2.2 Cauchy-Riemann equations
    4.2.3 The Cauchy integral theorem
    4.2.4 Polar representation
    4.2.5 Branch cuts
  4.3 Complex integration and residue calculus
    4.3.1 The Cauchy integral formula
    4.3.2 The residue calculus
    4.3.3 Estimates
    4.3.4 A residue calculation
  4.4 Problems
  4.5 More residue calculus
    4.5.1 Jordan's lemma
    4.5.2 A more delicate residue calculation
    4.5.3 Cauchy formula for derivatives
    4.5.4 Poles of higher order
    4.5.5 A residue calculation with a double pole
  4.6 The Taylor expansion
    4.6.1 Radius of convergence
    4.6.2 Riemann surfaces

5 Distributions
  5.1 Properties of distributions
  5.2 Mapping distributions
  5.3 Radon measures
  5.4 Approximate delta functions
  5.5 Problems
  5.6 Tempered distributions
  5.7 Poisson equation
  5.8 Diffusion equation
  5.9 Wave equation
  5.10 Homogeneous solutions of the wave equation
  5.11 Problems
  5.12 Answers to first two problems

6 Bounded Operators
  6.1 Introduction
  6.2 Bounded linear operators
  6.3 Compact operators
  6.4 Hilbert-Schmidt operators
  6.5 Problems
  6.6 Finite rank operators
  6.7 Problems

7 Densely Defined Closed Operators
  7.1 Introduction
  7.2 Subspaces
  7.3 Graphs
  7.4 Operators
  7.5 The spectrum
  7.6 Spectra of inverse operators
  7.7 Problems
  7.8 Self-adjoint operators
  7.9 First order differential operators with a bounded interval: point spectrum
  7.10 Spectral projection and reduced resolvent
  7.11 Generating second-order self-adjoint operators
  7.12 First order differential operators with a semi-infinite interval: residual spectrum
  7.13 First order differential operators with an infinite interval: continuous spectrum
  7.14 Problems
  7.15 A pathological example

8 Normal operators
  8.1 Spectrum of a normal operator
  8.2 Problems
  8.3 Variation of parameters and Green's functions
  8.4 Second order differential operators with a bounded interval: point spectrum
  8.5 Second order differential operators with a semibounded interval: continuous spectrum
  8.6 Second order differential operators with an infinite interval: continuous spectrum
  8.7 The spectral theorem for normal operators
  8.8 Examples: compact normal operators
  8.9 Examples: translation invariant operators and the Fourier transform
  8.10 Examples: Schrödinger operators
  8.11 Subnormal operators
  8.12 Examples: forward translation invariant operators and the Laplace transform
  8.13 Quantum mechanics
  8.14 Problems

9 Calculus of Variations
  9.1 The Euler-Lagrange equation
  9.2 A conservation law
  9.3 Second variation
  9.4 Interlude: The Legendre transform
  9.5 Lagrangian mechanics
  9.6 Hamiltonian mechanics
  9.7 Kinetic and potential energy
  9.8 Problems
  9.9 The path integral
  9.10 Appendix: Lagrange multipliers

10 Perturbation theory
  10.1 The implicit function theorem: scalar case
  10.2 Problems
  10.3 The implicit function theorem: systems
  10.4 Nonlinear differential equations
  10.5 A singular perturbation example
  10.6 Eigenvalues and eigenvectors
  10.7 The self-adjoint case
  10.8 The anharmonic oscillator
Chapter 1
Linear Algebra
1.1 Matrices
1.1.1 Matrix algebra
An m by n matrix A is an array of complex numbers $A_{ij}$ for $1 \leq i \leq m$ and $1 \leq j \leq n$.

The vector space operations are the sum A + B and the scalar multiple cA. Let A and B have the same dimensions. The operations are defined by
$$(A+B)_{ij} = A_{ij} + B_{ij} \qquad (1.1)$$
and
$$(cA)_{ij} = cA_{ij}. \qquad (1.2)$$
The m by n zero matrix is defined by
$$0_{ij} = 0. \qquad (1.3)$$
A matrix is a linear combination of other matrices if it is obtained from them by taking sums of scalar multiples.
Let A be an m by n matrix and B be an n by p matrix. Then the product AB is defined by
$$(AB)_{ik} = \sum_{j=1}^{n} A_{ij} B_{jk}. \qquad (1.4)$$
The n by n identity matrix is defined by
$$I_{ij} = \delta_{ij}. \qquad (1.5)$$
Here $\delta_{ij}$ is the Kronecker delta that is equal to 1 when $i = j$ and to 0 when $i \neq j$.
Matrix algebra satisfies the usual properties of addition and many of the usual properties of multiplication. In particular, we have the associative law
$$(AB)C = A(BC) \qquad (1.6)$$
and the unit law
$$AI = IA = A. \qquad (1.7)$$
Even more important, we have the distributive laws
$$A(B+C) = AB + AC, \qquad (B+C)A = BA + CA. \qquad (1.8)$$
However, multiplication is not commutative; in general $AB \neq BA$.
An n by n matrix A is invertible if there is another n by n matrix B with AB = I and BA = I. In this case we write $B = A^{-1}$. (It turns out that if BA = I, then also AB = I, but this is not obvious at first.) The inverse operation has several nice properties, including $(A^{-1})^{-1} = A$ and $(AB)^{-1} = B^{-1}A^{-1}$.

The notion of division is ambiguous. Suppose B is invertible. Then both $AB^{-1}$ and $B^{-1}A$ exist, but they are usually not equal.
Let A be an n by n square matrix. The trace of A is the sum of the diagonal entries. It is easy to check that tr(AB) = tr(BA) for all such matrices. Although matrices need not commute, their traces do.
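The trace identity is easy to check numerically; a small sketch in Python with NumPy (the notes use Matlab for computation, but any matrix package works):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

# AB and BA are different matrices, but their traces coincide.
print(np.allclose(A @ B, B @ A))          # False: AB != BA
print(np.trace(A @ B), np.trace(B @ A))   # both are 69.0
```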
1.1.2 Reduced row echelon form
An m component vector is an m by 1 matrix. The ith standard basis vector is
the vector with 1 in the ith row and zeros everywhere else.
An m by n matrix R is in reduced row echelon form (rref) if each column
is either the next unit basis vector, or a a linear combination of the previous
unit basis vectors. The columns where unit basis vectors occur are called pivot
columns. The rank r of R is the number of pivot columns.
Theorem. If A is an m by n matrix, then there is an invertible m by m matrix E such that
$$EA = R, \qquad (1.9)$$
where R is in reduced row echelon form. The matrix R is uniquely determined by A.

This theorem allows us to speak of the pivot columns of A and the rank of A. Notice that if A is n by n and has rank n, then R is the identity matrix and E is the inverse of A.
Example. Let
$$A = \begin{pmatrix} 4 & 12 & 2 & 16 & 1 \\ 0 & 0 & 3 & 12 & -2 \\ -1 & -3 & 0 & -2 & -3 \end{pmatrix}. \qquad (1.10)$$
Then the rref of A is
$$R = \begin{pmatrix} 1 & 3 & 0 & 2 & 0 \\ 0 & 0 & 1 & 4 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}. \qquad (1.11)$$
Corollary. Let A have reduced row echelon form R. The null space of A is
the null space of R. That is, the solutions of the homogeneous equation Ax = 0
are the same as the solutions of the homogeneous equation Rx = 0.
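The rref in the example can be reproduced by machine. The problems in these notes use Matlab's rref; an equivalent sketch in Python uses SymPy (a choice of this note, not part of the original text):

```python
import sympy as sp

A = sp.Matrix([[ 4, 12, 2, 16,  1],
               [ 0,  0, 3, 12, -2],
               [-1, -3, 0, -2, -3]])

# rref() returns the reduced row echelon form and the pivot column indices.
R, pivots = A.rref()
print(R)       # matches (1.11)
print(pivots)  # pivot columns 0, 2, 4 (columns 1, 3, 5 in 1-based counting)
```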
Introduce a definition: A matrix is flipped rref if, when flipped left-right (fliplr) and flipped up-down (flipud), it is rref. The way reduced row echelon form is usually defined, one works from left to right and from top to bottom. If one defines the corresponding concept working from right to left and from bottom to top, a sensible name for the result is flipped reduced row echelon form.
Theorem. Let A be an m by n matrix with rank r. Then there is a unique n by n − r matrix N such that the transpose of N is flipped rref, such that the transpose has pivot columns that are the non-pivot columns of A, and such that
$$AN = 0. \qquad (1.12)$$

The columns of N are called the rational basis for the null space of A. It is easy to find N by solving RN = 0. The rational null space matrix N has the property that its transpose is in flipped reduced row echelon form.
Example: In the above example the null space matrix of A is
$$N = \begin{pmatrix} -3 & -2 \\ 1 & 0 \\ 0 & -4 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}. \qquad (1.13)$$
That is, the solutions of Ax = 0 are the vectors of the form x = Nz. In other
words, the columns of N span the null space of A.
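It is a quick check that the rational basis (1.13) really annihilates A; a NumPy sketch:

```python
import numpy as np

A = np.array([[ 4, 12, 2, 16,  1],
              [ 0,  0, 3, 12, -2],
              [-1, -3, 0, -2, -3]])

# Rational basis for the null space, as in (1.13).
N = np.array([[-3, -2],
              [ 1,  0],
              [ 0, -4],
              [ 0,  1],
              [ 0,  0]])

print(A @ N)  # the 3 by 2 zero matrix, confirming AN = 0
```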
One can also use the technique to solve inhomogeneous equations Ax = b. One simply applies the theory to the augmented matrix [A −b]. There is a solution when the last column of the augmented matrix is not a pivot column. A particular solution may be read off from the last column of the rational basis.
Example. Solve Ax = b, where
$$b = \begin{pmatrix} 4 \\ -19 \\ -9 \end{pmatrix}. \qquad (1.14)$$
To accomplish this, let
$$A_1 = \begin{pmatrix} 4 & 12 & 2 & 16 & 1 & -4 \\ 0 & 0 & 3 & 12 & -2 & 19 \\ -1 & -3 & 0 & -2 & -3 & 9 \end{pmatrix}. \qquad (1.15)$$
Then the rref of $A_1$ is
$$R = \begin{pmatrix} 1 & 3 & 0 & 2 & 0 & -3 \\ 0 & 0 & 1 & 4 & 0 & 5 \\ 0 & 0 & 0 & 0 & 1 & -2 \end{pmatrix}. \qquad (1.16)$$
The null space matrix of $A_1$ is
$$N_1 = \begin{pmatrix} -3 & -2 & 3 \\ 1 & 0 & 0 \\ 0 & -4 & -5 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 1 \end{pmatrix}. \qquad (1.17)$$
Thus the solution of Ax = b is
$$x = \begin{pmatrix} 3 \\ 0 \\ -5 \\ 0 \\ 2 \end{pmatrix}. \qquad (1.18)$$
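The particular solution (1.18), read off from the last column of the rational basis, can be checked by substitution; a NumPy sketch:

```python
import numpy as np

A = np.array([[ 4, 12, 2, 16,  1],
              [ 0,  0, 3, 12, -2],
              [-1, -3, 0, -2, -3]])
b = np.array([4, -19, -9])

# First five entries of the last column of N_1, whose final entry is 1;
# since the augmented matrix is [A -b], this gives Ax - b = 0.
x = np.array([3, 0, -5, 0, 2])

print(A @ x)  # equals b
```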
1.1.3 Problems
Recall that the columns of a matrix A are linearly dependent if and only if the
homogeneous equation Ax = 0 has a nontrivial solution. Also, a vector y is in
the span of the columns if and only if the inhomogeneous equation Ax = y has
a solution.
1. Show that if p is a particular solution of the equation Ax = b, then every
other solution is of the form x = p +z, where Az = 0.
2. Consider the matrix
$$A = \begin{pmatrix} 2 & 4 & 5 & 19 & -3 \\ 2 & 4 & 3 & 9 & -2 \\ -4 & -8 & 1 & 17 & 2 \\ 3 & 6 & 1 & -4 & -2 \\ 4 & 8 & 1 & -7 & -3 \\ 2 & 4 & -3 & -21 & 0 \end{pmatrix}.$$
Use Matlab to ﬁnd the reduced row echelon form of the matrix. Use this
to ﬁnd the rational basis for the solution of the homogeneous equation
Az = 0. Check this with the Matlab solution. Write the general solution
of the homogeneous equation.
3. Let A be the matrix given above. Use Matlab to ﬁnd an invertible matrix
E such that EA = R is in reduced echelon form. Find the determinant of
E.
4. Consider the system Ax = b, where A is as above, and
$$b = \begin{pmatrix} 36 \\ 21 \\ 9 \\ 6 \\ 6 \\ -23 \end{pmatrix}.$$
Find the reduced echelon form of the matrix A augmented by the column
−b on the right. Use this to ﬁnd the rational basis for the solution of the
homogeneous equation involving the augmented matrix. Use this to ﬁnd
the general solution of the original inhomogeneous equation.
5. Consider a system of 6 linear equations in 5 unknowns. It could be overdetermined, that is, have no solutions. Or it could have special properties and be underdetermined, that is, have many solutions. Could it be neither, that is, have exactly one solution? Is there such an example? Answer this question, and prove that your answer is correct.

6. Consider a system of 5 linear equations in 6 unknowns. Could it have exactly one solution? Is there such an example? Answer this question, and prove that your answer is correct.
7. Consider the five vectors in $\mathbf{R}^6$ formed by the columns of A. Show that the vector b is in the span of these five vectors. Find explicit weights that give it as a linear combination.

8. Is every vector y in $\mathbf{R}^6$ in the span of these five vectors? If so, prove it. If not, give an explicit example of a vector that is not in the span, and prove that it is not in the span.
9. Are these five vectors linearly independent? Prove that your answer is correct.

10. The vectors in A that are pivot columns of A have the same span as the columns of A, yet are linearly independent. Find these vectors. How many are they? Prove that they are linearly independent.
1.1.4 The Jordan form
Two matrices A, B are said to be similar if there is an invertible matrix P with $P^{-1}AP = B$. Notice that if A and B are similar, then tr(A) = tr(B).
Theorem. Let A be an n by n matrix. Then there is an invertible matrix P such that
$$P^{-1}AP = D + N, \qquad (1.19)$$
where D is diagonal with diagonal entries $D_{kk} = \lambda_k$. Each entry of N is zero, except if $\lambda_k = \lambda_{k-1}$, then it is allowed that $N_{k-1\,k} = 1$.

Important note: Even if the matrix A is real, it may be that the matrices P and D are complex.

The equation may also be written AP = PD + PN. If we let $D_{kk} = \lambda_k$, then this is
$$\sum_{j=1}^{n} A_{ij}P_{jk} = P_{ik}\lambda_k + P_{i\,k-1}N_{k-1\,k}. \qquad (1.20)$$
The $N_{k-1\,k}$ factor is zero, except in cases where $\lambda_k = \lambda_{k-1}$, when it is allowed to be 1. We can also write this in vector form. It is the eigenvalue equation
$$Au_k = \lambda_k u_k \qquad (1.21)$$
except when $\lambda_k = \lambda_{k-1}$, when it may take the form
$$Au_k = \lambda_k u_k + u_{k-1}. \qquad (1.22)$$
The hard thing is to actually find the eigenvalues $\lambda_k$ that form the diagonal elements of the matrix D. This is a nonlinear problem. One way to see this is the following. For each k = 1, ..., n the matrix power $A^k$ is similar to a matrix $(D+N)^k$ with the same diagonal entries as $D^k$. Thus we have the identity
$$\mathrm{tr}(A^k) = \lambda_1^k + \cdots + \lambda_n^k. \qquad (1.23)$$
This is a system of n nonlinear polynomial equations in n unknowns $\lambda_1, \ldots, \lambda_n$. As is well known, there is another way of writing these equations in terms of the characteristic polynomial. This gives a single nth order polynomial equation in one unknown $\lambda$. This single equation has the same n solutions. For n = 2 the equation is quite easy to deal with by hand. For larger n one is often driven to computer algorithms.
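The identity (1.23) is easy to check numerically; a NumPy sketch using the matrix of the example that follows:

```python
import numpy as np

A = np.array([[  1.0,  2.0],
              [-15.0, 12.0]])
lam = np.linalg.eigvals(A)  # eigenvalues, here 7 and 6

# tr(A^k) equals the sum of the k-th powers of the eigenvalues.
for k in (1, 2, 3):
    lhs = np.trace(np.linalg.matrix_power(A, k))
    rhs = np.sum(lam ** k)
    print(k, lhs, rhs)  # the two sides agree for each k
```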
Example: Let
$$A = \begin{pmatrix} 1 & 2 \\ -15 & 12 \end{pmatrix}. \qquad (1.24)$$
The eigenvalues are 7, 6. The eigenvectors are the columns of
$$P = \begin{pmatrix} 1 & 2 \\ 3 & 5 \end{pmatrix}. \qquad (1.25)$$
Let
$$D = \begin{pmatrix} 7 & 0 \\ 0 & 6 \end{pmatrix}. \qquad (1.26)$$
Then AP = PD.
Example: Let
$$A = \begin{pmatrix} 10 & -1 \\ 9 & 4 \end{pmatrix}. \qquad (1.27)$$
The only eigenvalue is 7. The eigenvector is the first column of
$$P = \begin{pmatrix} 1 & 2 \\ 3 & 5 \end{pmatrix}. \qquad (1.28)$$
Let
$$D + N = \begin{pmatrix} 7 & 1 \\ 0 & 7 \end{pmatrix}. \qquad (1.29)$$
Then AP = P(D + N).
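Both examples can be verified by direct multiplication; a NumPy sketch:

```python
import numpy as np

P = np.array([[1.0, 2.0],
              [3.0, 5.0]])

# Diagonalizable example (1.24): AP = PD.
A1 = np.array([[  1.0,  2.0],
               [-15.0, 12.0]])
D = np.diag([7.0, 6.0])
print(np.allclose(A1 @ P, P @ D))   # True

# Defective example (1.27): AP = P(D + N) with a Jordan block.
A2 = np.array([[10.0, -1.0],
               [ 9.0,  4.0]])
DN = np.array([[7.0, 1.0],
               [0.0, 7.0]])
print(np.allclose(A2 @ P, P @ DN))  # True
```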
1.1.5 Problems
1. If A is a square matrix and f is a function defined by a convergent power series, then f(A) is defined. Show that if A is similar to B, then f(A) is similar to f(B).

2. By the Jordan form theorem, A is similar to D + N, where D is diagonal, N is nilpotent, and D, N commute. To say that N is nilpotent is to say that for some $p \geq 1$ the power $N^p = 0$. Show that
$$f(D+N) = \sum_{m=0}^{p-1} \frac{1}{m!} f^{(m)}(D)\,N^m. \qquad (1.30)$$

3. Show that
$$\exp(t(D+N)) = \exp(tD) \sum_{m=0}^{p-1} \frac{1}{m!} N^m t^m. \qquad (1.31)$$
Use this to describe the set of all solutions x = exp(tA)z to the differential equation
$$\frac{dx}{dt} = Ax \qquad (1.32)$$
with initial condition x = z when t = 0.
4. Take
$$A = \begin{pmatrix} 0 & 1 \\ -k & -2c \end{pmatrix}, \qquad (1.33)$$
where $k \geq 0$ and $c \geq 0$. The differential equation describes an oscillator with spring constant k and friction coefficient 2c. Find the eigenvalues and sketch a typical solution in the $x_1, x_2$ plane in each of the following cases: overdamped $c^2 > k > 0$; critically damped $c^2 = k > 0$; underdamped $0 < c^2 < k$; undamped $0 = c^2 < k$; free motion $0 = c^2 = k$.

5. Consider the critically damped case. Find the Jordan form of the matrix A, and find a similarity transformation that transforms A to its Jordan form.
6. If $A = PDP^{-1}$, where the diagonal matrix D has diagonal entries $\lambda_i$, then f(A) may be defined for an arbitrary function f by $f(A) = Pf(D)P^{-1}$, where f(D) is the diagonal matrix with entries $f(\lambda_i)$. Thus, for instance, if each $\lambda_i \geq 0$, then $\sqrt{A}$ is defined. Find the square root of
$$A = \begin{pmatrix} 20 & 40 \\ -8 & -16 \end{pmatrix}. \qquad (1.34)$$
7. Give an example of a matrix A with each eigenvalue $\lambda_i \geq 0$, but for which no square root $\sqrt{A}$ can be defined. Why does the formula in the second problem not work?
1.1.6 Quadratic forms
Given an m by n matrix A, its adjoint is an n by m matrix $A^*$ defined by
$$A^*_{ij} = \bar{A}_{ji}. \qquad (1.35)$$
If we are dealing with real numbers, then the adjoint is just the transpose. The adjoint operation has several nice properties, including $A^{**} = A$ and $(AB)^* = B^*A^*$.

Two matrices A, B are said to be congruent if there is an invertible matrix P with $P^*AP = B$.

Theorem. Let A be an n by n matrix with $A = A^*$. Then there is an invertible matrix P such that
$$P^*AP = D, \qquad (1.36)$$
where D is diagonal with entries 1, −1, and 0.

Define the quadratic form $Q(x) = x^*Ax$. Then the equation says that one may make a change of variables x = Py so that $Q(x) = y^*Dy$.
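For a self-adjoint matrix, one way to produce such a P (a standard construction, though not spelled out in the text) is to diagonalize with a unitary matrix and rescale each eigenvector by $1/\sqrt{|\lambda|}$ for each nonzero eigenvalue $\lambda$; a NumPy sketch:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])  # self-adjoint, eigenvalues 3 and -1

lam, Q = np.linalg.eigh(A)   # A = Q diag(lam) Q^T with Q orthogonal

# Rescale each eigenvector column; zero eigenvalues are left alone.
scale = np.where(np.abs(lam) > 1e-12, 1.0 / np.sqrt(np.abs(lam)), 1.0)
P = Q * scale

D = P.T @ A @ P              # congruent diagonal form
print(np.round(D))           # diagonal entries are 1, -1, or 0
```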
1.1.7 Spectral theorem
A matrix U is said to be unitary if $U^{-1} = U^*$. In the real case this is the same as being an orthogonal matrix.

Theorem. Let A be an n by n matrix with $A = A^*$. Then there is a unitary matrix U such that
$$U^{-1}AU = D, \qquad (1.37)$$
where D is diagonal with real entries.

Example: Let
$$A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}. \qquad (1.38)$$
The eigenvalues are given by
$$D = \begin{pmatrix} 3 & 0 \\ 0 & -1 \end{pmatrix}. \qquad (1.39)$$
A suitable orthogonal matrix P is
$$P = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}. \qquad (1.40)$$
Then AP = PD.
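The example can be reproduced with a numerical eigensolver; NumPy's eigh is designed for self-adjoint matrices (it returns eigenvalues in ascending order, so the columns may be arranged differently than in (1.40)):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

lam, U = np.linalg.eigh(A)   # real eigenvalues, orthonormal eigenvectors
print(lam)                                   # [-1.  3.]
print(np.allclose(A @ U, U @ np.diag(lam)))  # True: AU = UD
print(np.allclose(U.T @ U, np.eye(2)))       # True: U is orthogonal
```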
A matrix A is said to be normal if $AA^* = A^*A$. A self-adjoint matrix ($A^* = A$) is normal. A skew-adjoint matrix ($A^* = -A$) is normal. A unitary matrix ($A^* = A^{-1}$) is normal.

Theorem. Let A be an n by n matrix that is normal. Then there is a unitary matrix U such that
$$U^{-1}AU = D, \qquad (1.41)$$
where D is diagonal.

Notice that this is the eigenvalue equation AU = UD. The columns of U are the eigenvectors, and the diagonal entries of D are the eigenvalues.
Example: Let
$$P = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}. \qquad (1.42)$$
Then P is a rotation by π/4. The eigenvalues are on the diagonal of
$$F = \frac{1}{\sqrt{2}} \begin{pmatrix} 1+i & 0 \\ 0 & 1-i \end{pmatrix}. \qquad (1.43)$$
A suitable unitary matrix Q is
$$Q = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & i \\ -i & -1 \end{pmatrix}. \qquad (1.44)$$
Then PQ = QF.
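The relation PQ = QF can be confirmed with complex arithmetic; a NumPy sketch:

```python
import numpy as np

s = 1.0 / np.sqrt(2.0)
P = s * np.array([[1.0, -1.0],
                  [1.0,  1.0]])   # rotation by pi/4, as in (1.42)

F = s * np.array([[1 + 1j, 0],
                  [0, 1 - 1j]])   # eigenvalues e^{+i pi/4} and e^{-i pi/4}

Q = s * np.array([[1, 1j],
                  [-1j, -1]])     # unitary eigenvector matrix from (1.44)

print(np.allclose(P @ Q, Q @ F))               # True: PQ = QF
print(np.allclose(Q.conj().T @ Q, np.eye(2)))  # True: Q is unitary
```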
1.1.8 Circulant (convolution) matrices
If A is normal, then there are unitary U and diagonal D with AU = UD. But they are difficult to compute. Here is one special but important situation where everything is explicit.

A circulant (convolution) matrix is an n by n matrix such that there is a function a with $A_{pq} = a(p-q)$, where the difference is computed modulo n. [For instance, a 4 by 4 matrix would have the same entry a(3) in the 12, 23, 34, and 41 positions.]

The DFT (discrete Fourier transform) matrix is an n by n unitary matrix given by
$$U_{qk} = \frac{1}{\sqrt{n}} e^{\frac{2\pi i qk}{n}}. \qquad (1.45)$$
Theorem. Let A be a circulant matrix. If U is the DFT matrix, then
$$U^{-1}AU = D, \qquad (1.46)$$
where D is a diagonal matrix with
$$D_{kk} = \hat{a}(k) = \sum_r a(r) e^{-\frac{2\pi i rk}{n}}. \qquad (1.47)$$
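Since U is unitary, $U^{-1} = U^*$, and (1.47) is exactly the discrete Fourier transform of a, which NumPy's fft computes with the same sign convention. A sketch with arbitrary made-up values of a:

```python
import numpy as np

n = 4
a = np.array([2.0, 5.0, 1.0, 3.0])   # arbitrary values a(0), ..., a(3)

# Circulant matrix A_{pq} = a(p - q mod n).
p, q = np.indices((n, n))
A = a[(p - q) % n]

# DFT matrix U_{qk} = e^{2 pi i q k / n} / sqrt(n), as in (1.45).
U = np.exp(2j * np.pi * np.outer(np.arange(n), np.arange(n)) / n) / np.sqrt(n)

D = U.conj().T @ A @ U               # U^{-1} A U with U^{-1} = U^*
print(np.allclose(D, np.diag(np.fft.fft(a))))  # True: D_kk = a-hat(k)
```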
1.1.9 Problems
1. Let
$$A = \begin{pmatrix} 5 & 1 & 4 & 2 & 3 \\ 1 & 2 & 6 & 3 & 1 \\ 4 & 6 & 3 & 0 & 4 \\ 2 & 3 & 0 & 1 & 2 \\ 3 & 1 & 4 & 2 & 3 \end{pmatrix}. \qquad (1.48)$$
Use Matlab to find orthogonal P and diagonal D so that $P^{-1}AP = D$.
2. Find unitary Q and diagonal F so that $Q^{-1}PQ = F$.

3. The orthogonal matrix P is made of rotations in two planes and possibly a reflection. What are the two angles of rotation? Is there a reflection present as well? Explain.

4. Find an invertible matrix R such that $R^*AR = G$, where G is diagonal with all entries ±1 or 0.

5. In the following problems the matrix A need not be square. Let A be a matrix with trivial null space. Show that $A^*A$ is invertible.

6. Let $E = A(A^*A)^{-1}A^*$. Show that $E^* = E$ and $E^2 = E$.

7. Define the pseudo-inverse $A^+ = (A^*A)^{-1}A^*$. Show that $AA^+ = E$ and $A^+A = I$.
8. Let
$$A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \\ 1 & 4 & 16 \\ 1 & 5 & 25 \\ 1 & 6 & 36 \\ 1 & 7 & 49 \end{pmatrix}. \qquad (1.49)$$
Calculate E and check the identities above. Find the eigenvalues of E.
Explain the geometric meaning of E.
9. Find the parameter values x such that Ax best approximates
$$y = \begin{pmatrix} 7 \\ 5 \\ 6 \\ -1 \\ -3 \\ -7 \\ -24 \end{pmatrix}. \qquad (1.50)$$
(Hint: Let $x = A^+y$.) What is the geometric interpretation of Ax?
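The operators E and A⁺ appearing in problems 5 through 9 can be explored numerically for any matrix with trivial null space; the small matrix below is a made-up illustration, not the matrix from the problems:

```python
import numpy as np

# A hypothetical 4 by 2 matrix with linearly independent columns.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])

# Real case: A^* is just the transpose.
Aplus = np.linalg.inv(A.T @ A) @ A.T   # pseudo-inverse (A^* A)^{-1} A^*
E = A @ Aplus                          # orthogonal projection onto range(A)

print(np.allclose(E.T, E))                # True: E^* = E
print(np.allclose(E @ E, E))              # True: E^2 = E
print(np.allclose(Aplus @ A, np.eye(2)))  # True: A^+ A = I
```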
1.2 Vector spaces
1.2.1 Vector spaces
A vector space is a collection V of objects that may be combined with vector
addition and scalar multiplication. There will be two possibilities for the scalars.
They may be elements of the ﬁeld R of real numbers. Or they may be elements
of the ﬁeld C of complex numbers. To handle both cases together, we shall
consider the scalars as belonging to a ﬁeld F.
The vector space addition axioms say that the sum of each two vectors u, v
in V is another vector in the space called u+v in V . Addition must satisfy the
following axioms: associative law
(u +v) +w = u + (v +w) (1.51)
additive identity law
u +0 = 0 +u = u (1.52)
additive inverse law
u + (−u) = (−u) +u = 0 (1.53)
and commutative law
u +v = v +u. (1.54)
The vector space axioms also require that for each scalar c in F and each u
in V there is a vector cu in V . Scalar multiplication must satisfy the following
axioms: distributive law for vector addition
c(u +v) = cu +cv (1.55)
distributive law for scalar addition
(c +d)u = cu +du (1.56)
associative law for scalar multiplication
(cd)u = c(du) (1.57)
identity law for scalar multiplication
1u = u. (1.58)
The elements of a vector space are pictured as arrows starting at a ﬁxed
origin. The vector sum is pictured in terms of the parallelogram law. The sum
of two arrows starting at the origin is the diagonal of the parallelogram formed
by the two arrows.
A subspace of a vector space is a subset that is itself a vector space. The
smallest possible subspace consists of the zero vector at the origin. A one
dimensional vector subspace is pictured as a line through the origin. A two
dimensional vector subspace is pictured as a plane through the origin, and so
on.
1.2.2 Linear transformations
Let V and W be vector spaces. A linear transformation T : V →W is a function
from V to W that preserves the vector space operations. Thus
T(u +v) = Tu +Tv (1.59)
and
T(cu) = cTu. (1.60)
The null space of a linear transformation T : V →W is the set of all vectors
u in V such that Tu = 0. The null space is sometimes called the kernel.
The range of a linear transformation T : V →W is the set of all vectors Tu
in W, where u is in V . The range is sometimes called the image.
Theorem. The null space of a linear transformation T : V →W is a subspace
of V .
Theorem. The range of a linear transformation T : V →W is a subspace of
W.
Theorem. A linear transformation T : V → W is one-to-one if and only if its null space is the zero subspace.

Theorem. A linear transformation T : V → W is onto W if and only if its range is W.

Theorem. A linear transformation T : V → W has an inverse $T^{-1} : W \to V$ if and only if its null space is the zero subspace and its range is W.

An invertible linear transformation will also be called a vector space isomorphism.
Consider a list of vectors $p_1, \ldots, p_n$. They define a linear transformation $P : F^n \to V$, by taking the vector in $F^n$ to be the weights of a linear combination of the $p_1, \ldots, p_n$.

Theorem. A list of vectors is linearly independent if and only if the corresponding linear transformation has zero null space.

Theorem. A list of vectors spans the vector space V if and only if the corresponding linear transformation has range V.

Theorem. A list of vectors is a basis for V if and only if the corresponding linear transformation $P : F^n \to V$ is a vector space isomorphism. In that case the inverse transformation $P^{-1} : V \to F^n$ is the transformation that carries a vector to its coordinates with respect to the basis.

In case there is a basis transformation $P : F^n \to V$, the vector space V is said to be n dimensional.
1.2.3 Reduced row echelon form
Theorem. Let $p_1, \ldots, p_n$ be a list of vectors in an m dimensional vector space V. Let $P : F^n \to V$ be the corresponding transformation. Then there is an isomorphism $E : V \to F^m$ such that
$$EP = R, \qquad (1.61)$$
where R is in reduced row echelon form. The m by n matrix R is uniquely determined by the list of vectors.

Thus for an arbitrary list of vectors, there are certain vectors that are pivot vectors. These vectors are those that are not linear combinations of previous vectors in the list. The reduced row echelon form of a list of vectors expresses the extent to which each vector in the list is a linear combination of previous pivot vectors in the list.
1.2.4 Jordan form
Theorem. Let T : V → V be a linear transformation of an n dimensional vector space to itself. Then there is an invertible matrix $P : \mathbf{R}^n \to V$ such that
$$P^{-1}TP = D + N, \qquad (1.62)$$
where D is diagonal with diagonal entries $D_{kk} = \lambda_k$. Each entry of N is zero, except if $\lambda_k = \lambda_{k-1}$, then it is allowed that $N_{k-1\,k} = 1$.

We can also write this in vector form. The equation TP = P(D + N) is equivalent to the eigenvalue equation
$$Tp_k = \lambda_k p_k \qquad (1.63)$$
except when $\lambda_k = \lambda_{k-1}$, when it may take the form
$$Tp_k = \lambda_k p_k + p_{k-1}. \qquad (1.64)$$
It is diﬃcult to picture such a linear transformation, even for a real vector
space. One way to do it works especially well in the case when the dimension of
V is 2. Then the linear transformation T maps each vector u in V to another
vector Tu in V . Think of each u as determining a point in the plane. Draw
the vector Tu as a vector starting at the point u. Then this gives a picture
of the linear transformation as a vector ﬁeld. The qualitative features of this
vector ﬁeld depend on whether the eigenvalues are real or occur as a complex
conjugate pair. In the real case, they can be both positive, both negative, or
of mixed sign. These give repelling, attracting, and hyperbolic ﬁxed points. In
the complex case the real part can be either positive or negative. In the ﬁrst
case the rotation has magnitude less than π/2, and the vector ﬁeld is a repelling
spiral. In the second case the rotation has magnitude greater than π/2 and the
vector ﬁeld is an attracting spiral. The case of complex eigenvalues with real
part equal to zero is special but important. The transformation is similar to a
rotation that has magnitude ±π/2, and so the vector ﬁeld goes around in closed
elliptical paths.
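This classification can be made concrete by computing eigenvalues numerically. A minimal sketch in numpy (the sample matrices below are hypothetical illustrations, not taken from the text):

```python
import numpy as np

def classify_fixed_point(T, tol=1e-12):
    """Classify the fixed point at 0 of the linear vector field u -> T u."""
    lam = np.linalg.eigvals(T)
    if abs(lam[0].imag) > tol:          # complex conjugate pair
        re = lam[0].real
        if re > tol:
            return "repelling spiral"
        if re < -tol:
            return "attracting spiral"
        return "center (closed elliptical paths)"
    l1, l2 = sorted(lam.real)           # real eigenvalues
    if l1 > 0:
        return "repelling"
    if l2 < 0:
        return "attracting"
    return "hyperbolic"                 # mixed sign (degenerate zeros also land here)

print(classify_fixed_point(np.array([[1.0, 0.0], [0.0, 2.0]])))   # both eigenvalues positive
print(classify_fixed_point(np.array([[0.0, -1.0], [1.0, 0.0]])))  # eigenvalues +i, -i
```

The second example is the pure rotation case, whose orbits are the closed elliptical paths described above.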
1.2.5 Forms and dual spaces
The dual space V* of a vector space V consists of all linear transformations
from V to the field of scalars F. If ω is in V*, then the value of ω on u in V is
written ⟨ω, u⟩.
20 CHAPTER 1. LINEAR ALGEBRA
We think of the elements of F^n as column vectors. The elements of the dual
space of F^n are row vectors. The value of a row vector on a column vector is
given by taking the row vector on the left and the column vector on the right
and using matrix multiplication.
The way to picture an element of V* is via its contour lines. These are
straight lines at regularly spaced intervals. The value of ω on u is obtained by
taking the vector u and seeing how many contour lines it crosses.
Sometimes vectors in the original space V are called contravariant vectors,
while forms in the dual space V* are called covariant vectors.
1.2.6 Quadratic forms
A sesquilinear form is a function that assigns to each ordered pair of vectors u,
v in V a scalar a(u, v). It is required that it be linear in the second variable.
Thus
a(u, v + w) = a(u, v) + a(u, w) (1.65)
and
a(u, cv) = c a(u, v). (1.66)
Furthermore, it is required to be conjugate linear in the first variable. This says
that
a(u + v, w) = a(u, w) + a(v, w) (1.67)
and
a(cu, v) = \bar{c} a(u, v). (1.68)
In the real case one does not need the complex conjugates on the scalars. In
this case this is called a bilinear form.
A sesquilinear form defines a linear transformation from V to V*. (Actually,
in the complex case it is a conjugate linear transformation.) This linear
transformation takes the vector u to the form a(u, ·). So a sesquilinear form may be
viewed as a special kind of linear transformation, but not one from a space to
itself.
A sesquilinear form is sometimes regarded as a covariant tensor. By contrast,
a linear transformation is a mixed tensor.
A sesquilinear form is Hermitian if
a(u, v) = \overline{a(v, u)}. (1.69)
In the real case the complex conjugate does not occur. In this case the bilinear
form is called symmetric.
A Hermitian form defines a quadratic form a(u, u) that is real. The quadratic
form determines the Hermitian form. In fact,
4 Re a(u, v) = a(u + v, u + v) − a(u − v, u − v) (1.70)
determines the real part. Then Im a(u, v) = Re a(iu, v) determines the imaginary
part.
Theorem. Let a be a Hermitian form defined for vectors in an n dimensional
vector space V. Then there exists an invertible linear transformation P : F^n → V
such that
a(Px, Px) = x* D x, (1.71)
where D is diagonal with entries 1, −1, and 0.
It is sometimes possible to get a good picture of a quadratic form. In the
real case when V is two dimensional a quadratic form may be pictured by its
contour lines. There are different cases depending on whether one has +1's or
−1's or a +1, −1 combination.
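The counts of 1, −1, 0 entries of D (the signature) can be read off numerically from the signs of the eigenvalues, since rescaling an eigenvector by 1/√|λ| rescales the eigenvalue to ±1. A hedged numpy sketch; the example matrices are made up for illustration:

```python
import numpy as np

def signature(A, tol=1e-9):
    """Return (n_plus, n_minus, n_zero) for a Hermitian matrix A."""
    lam = np.linalg.eigvalsh(A)  # real eigenvalues of a Hermitian matrix
    return (int(np.sum(lam > tol)),
            int(np.sum(lam < -tol)),
            int(np.sum(np.abs(lam) <= tol)))

# a form of x^2 - y^2 type: one +1 and one -1
print(signature(np.array([[1.0, 0.0], [0.0, -1.0]])))  # (1, 1, 0)
```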
1.2.7 Special relativity
In special relativity V is the four dimensional real vector space of displacements
in spacetime. Thus a vector u takes one event in spacetime into another event
in spacetime. It represents a separation between these spacetime events.
There is a quadratic form g with signature +1, −1, −1, −1. The vector u is
said to be timelike, lightlike, or spacelike according to whether g(u, u) > 0,
g(u, u) = 0, or g(u, u) < 0. The time (perhaps in seconds) between timelike
separated events is √(g(u, u)). The distance (perhaps in light seconds) between
spacelike separated events is √(−g(u, u)).
The light cone is the set of lightlike vectors. The rest of the vector space
consists of three parts: forward timelike vectors, backward timelike vectors,
and spacelike vectors.
Lemma. If two forward timelike vectors u, v have unit length √(g(u, u)) =
√(g(v, v)) = 1, then their difference u − v is spacelike.
Proof: This lemma is at least geometrically plausible. It may be proved by
using the diagonal form D of the quadratic form.
Lemma. If u, v are forward timelike vectors with unit length, then g(u, v) > 1.
Proof: This follows immediately from the previous lemma.
Anti-Schwarz inequality. If u, v are forward timelike vectors, then g(u, v) >
√(g(u, u)) √(g(v, v)).
Proof: This follows immediately from the previous lemma.
Anti-triangle inequality. If u, v are forward timelike vectors, and u + v = w,
then
√(g(w, w)) > √(g(u, u)) + √(g(v, v)). (1.72)
Proof: This follows immediately from the anti-Schwarz inequality.
The anti-triangle inequality is the basis of the twin paradox. In this case
the lengths of the vectors are measured in time units. One twin undergoes no
acceleration; the total displacement in spacetime is given by w. The other twin
begins a journey according to vector u, then changes course and returns to a
meeting event according to vector v. At this meeting the more adventurous
twin is younger.
The anti-triangle inequality is also the basis of the famous conversion of
mass to energy. In this case w is the energy-momentum vector of a particle,
and √(g(w, w)) is its rest mass. (Consider energy, momentum, and mass as all
measured in energy units.) The particle splits into two particles that fly apart
from each other. The energy-momentum vectors of the two particles are u and
v. By conservation of energy-momentum, w = u + v. The total rest mass
√(g(u, u)) + √(g(v, v)) of the two particles is smaller than the original rest mass.
This has released the energy that allows the particles to fly apart.
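These anti-inequalities are easy to test numerically. A sketch using the diagonal Minkowski form with signature +1, −1, −1, −1; the two forward timelike vectors below are arbitrary choices:

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])  # signature +1, -1, -1, -1

def g(u, v):
    return u @ eta @ v

def proper_time(u):
    """sqrt(g(u, u)) for a timelike vector u."""
    return np.sqrt(g(u, u))

# two forward timelike vectors (time component dominates)
u = np.array([2.0, 1.0, 0.0, 0.0])
v = np.array([3.0, -1.0, 1.0, 0.0])
w = u + v

# anti-triangle: the unbroken path w is longer in proper time than u then v
print(proper_time(w), proper_time(u) + proper_time(v))
```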
1.2.8 Scalar products and adjoint
A Hermitian form g(u, v) is an inner product (or scalar product) if g(u, u) > 0
except when u = 0. Often when there is just one scalar product in consideration
it is written g(u, v) = (u, v). However we shall continue to use the more explicit
notation for a while, since it is important to realize that to use these notions
there must be some mechanism that chooses an inner product. In the simplest
cases the inner product arises from elementary geometry, but there can be other
sources for a deﬁnition of a natural inner product.
Remark. An inner product gives a linear transformation from a vector space
to its dual space. Let V be a vector space with inner product g. Let u be in V.
Then there is a corresponding linear form u* in the dual space of V defined by
⟨u*, v⟩ = g(u, v) (1.73)
for all v in V. This form could also be denoted u* = gu.
The picture associated with this construction is that given a vector u represented
by an arrow, the u* = gu is a form with contour lines perpendicular to
the arrow. The longer the vector, the more closely spaced the contour lines.
Theorem. (Riesz representation theorem) An inner product not only gives a
transformation from a vector space to its dual space, but this is an isomorphism.
Let V be a finite dimensional vector space with inner product g. Let ω : V → F
be a linear form in the dual space of V. Then there is a unique vector ω* in V
such that
g(ω*, v) = ⟨ω, v⟩ (1.74)
for all v in V. This vector could also be denoted ω* = g^{-1}ω.
The picture associated with this construction is that given a form ω represented
by contour lines, the ω* = g^{-1}ω is a vector perpendicular to the lines.
The more closely spaced the contour lines, the longer the vector.
Theorem: Let V_1 and V_2 be two finite dimensional vector spaces with inner
products g_1 and g_2. Let A : V_1 → V_2 be a linear transformation. Then there is
another operator A* : V_2 → V_1 such that
g_1(A*u, v) = g_2(u, Av) (1.75)
for every u in V_2 and v in V_1. Equivalently,
g_1(v, A*u) = g_2(Av, u) (1.76)
for every u in V_2 and v in V_1.
The outline of the proof of this theorem is the following. If u is given,
then the map v ↦ g_2(u, Av) is an element of the dual space V_1*. It follows from the
previous theorem that it is represented by a vector. This vector is A*u.
A linear transformation A : V_1 → V_2 is unitary if A* = A^{-1}.
There are more possibilities when the two spaces are the same. A linear
transformation A : V → V is self-adjoint if A* = A. It is skew-adjoint if
A* = −A. It is unitary if A* = A^{-1}.
A linear transformation A : V → V is normal if AA* = A*A. Every operator
that is self-adjoint, skew-adjoint, or unitary is normal.
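In the real finite dimensional case, if the inner products are represented by symmetric positive definite matrices G1 and G2, with g1(x, y) = x^T G1 y, the adjoint has the explicit matrix A* = G1^{-1} A^T G2. A hedged numpy check; the particular G1, G2, A below are arbitrary assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

G1 = np.diag([1.0, 2.0, 3.0])    # inner product on V1: g1(x, y) = x^T G1 y
G2 = np.diag([2.0, 5.0])         # inner product on V2
A = rng.standard_normal((2, 3))  # A : V1 -> V2

# candidate adjoint A* : V2 -> V1 (real case sketch: A* = G1^{-1} A^T G2)
A_star = np.linalg.inv(G1) @ A.T @ G2

u = rng.standard_normal(2)  # u in V2
v = rng.standard_normal(3)  # v in V1

lhs = (A_star @ u) @ G1 @ v  # g1(A* u, v)
rhs = u @ G2 @ (A @ v)       # g2(u, A v)
print(np.isclose(lhs, rhs))  # True
```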
Remark 1. Here is a special case of the definition of adjoint. Fix a vector u
in V. Let A : F → V be defined by Ac = cu. Then A* : V → F is a form u*
such that ⟨u*, z⟩ c = g(z, cu). This justifies the definition ⟨u*, z⟩ = g(u, z).
Remark 2. Here is another special case of the definition of adjoint. Let
A : V → F be given by a form ω. Then the adjoint A* : F → V applied to
c is the vector cω* such that g(cω*, v) = \bar{c} ⟨ω, v⟩. This justifies the definition
g(ω*, v) = ⟨ω, v⟩.
There are standard equalities and inequalities for inner products. Two vectors
u, v are said to be orthogonal (or perpendicular) if g(u, v) = 0.
Theorem. (Pythagoras) If u, v are orthogonal and w = u + v, then
g(w, w) = g(u, u) + g(v, v). (1.77)
Lemma. If u, v are vectors with unit length, then Re g(u, v) ≤ 1.
Proof: Compute 0 ≤ g(u − v, u − v) = 2 − 2 Re g(u, v).
Lemma. If u, v are vectors with unit length, then |g(u, v)| ≤ 1.
Proof: This follows easily from the previous lemma.
Notice that in the real case this says that −1 ≤ g(u, v) ≤ 1. This allows
us to define the angle between two unit vectors u, v to be the number θ with
0 ≤ θ ≤ π and cos(θ) = g(u, v).
Schwarz inequality. We have the inequality |g(u, v)| ≤ √(g(u, u)) √(g(v, v)).
Proof: This follows immediately from the previous lemma.
Triangle inequality. If u + v = w, then
√(g(w, w)) ≤ √(g(u, u)) + √(g(v, v)). (1.78)
Proof: This follows immediately from the Schwarz inequality.
1.2.9 Spectral theorem
Theorem. Let V be an n dimensional vector space with an inner product g. Let
T : V → V be a linear transformation that is normal. Then there is a unitary
transformation U : F^n → V such that
U^{-1} T U = D, (1.79)
where D is diagonal.
We can also write this in vector form. The transformation U defines an
orthonormal basis of vectors. The equation is equivalent to the eigenvalue equation
TU = UD, or, explicitly,
T u_k = λ_k u_k. (1.80)
The distinction between linear transformations and quadratic forms sometimes
shows up even in the context of matrix theory. Let A be a self-adjoint
matrix, regarded as defining a quadratic form. Let B be a self-adjoint matrix
that has all eigenvalues strictly positive. Then B defines a quadratic form that
is in fact a scalar product. The natural linear transformation to consider is then
T = B^{-1}A. The eigenvalue problem for this linear transformation then takes
the form
Au = λBu. (1.81)
Indeed, this version is often encountered in concrete problems. The linear
transformation B^{-1}A is a self-adjoint linear transformation relative to the quadratic
form defined by B. Therefore the eigenvalues λ are automatically real.
Example: Consider a system of linear spring oscillators. The equation of
motion is
M d²x/dt² = −Ax. (1.82)
Here M is the matrix of mass coefficients (a diagonal matrix with strictly positive
entries). The matrix A is real and symmetric, and it is assumed to have
positive eigenvalues. If we try a solution of the form x = z cos(√λ t), we get the
condition
Az = λMz. (1.83)
The λ should be real and positive, so the √λ indeed make sense as resonant
angular frequencies.
In this example the inner product is defined by the mass matrix M. The
matrix M^{-1}A is a self-adjoint operator relative to the inner product defined by
M. That is why the eigenvalues of M^{-1}A are real. In fact, the eigenvalues of the
linear transformation M^{-1}A are also positive. This is because the associated
quadratic form is defined by A, which is positive.
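The generalized eigenvalue problem Az = λMz can be explored numerically. A sketch with a hypothetical two-degree-of-freedom oscillator (the mass and stiffness values are invented): the eigenvalues of M^{-1}A come out real and positive, so their square roots are resonant angular frequencies.

```python
import numpy as np

M = np.diag([1.0, 2.0])                    # mass matrix (diagonal, positive)
A = np.array([[2.0, -1.0], [-1.0, 2.0]])   # stiffness matrix (symmetric, positive)

lam = np.linalg.eigvals(np.linalg.inv(M) @ A)
lam = np.sort(lam.real)   # imaginary parts vanish: M^{-1}A is self-adjoint for M
omega = np.sqrt(lam)      # resonant angular frequencies sqrt(lambda)

print(lam)    # real and positive
print(omega)
```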
1.3 Vector ﬁelds and diﬀerential forms
1.3.1 Coordinate systems
In mathematical modeling it is important to be flexible in the choice of
coordinate system. For this reason we shall consider what one can do with arbitrary
coordinate systems. We consider systems whose state can be described by two
coordinates. However the same ideas work for higher dimensional systems.
Say some part of the system can be described by coordinates u, v. Let α, β
be another coordinate system for that part of the system. Then the scalars α
and β may be expressed as functions of the u and v. Furthermore, the matrix
J = [ ∂α/∂u, ∂α/∂v ; ∂β/∂u, ∂β/∂v ] (1.84)
is invertible at each point, and the scalars u and v may be expressed as functions
of the α and β.
Consider the partial derivative ∂/∂u as the derivative holding the coordinate
v constant. Similarly, the partial derivative ∂/∂v is the derivative holding u
constant. Then the chain rule says that
∂/∂u = (∂α/∂u) ∂/∂α + (∂β/∂u) ∂/∂β. (1.85)
Similarly,
∂/∂v = (∂α/∂v) ∂/∂α + (∂β/∂v) ∂/∂β. (1.86)
These formulas are true when applied to an arbitrary scalar. Notice that these
formulas amount to the application of the transpose of the matrix J to the
partial derivatives.
One must be careful to note that an expression such as ∂/∂u makes no
sense without reference to a coordinate system u, v. For instance, say that we
were interested in coordinate system u, β. Then ∂/∂u would be computed as a
derivative along a curve where β is constant. This would be something quite
different. Thus we would have, for instance, by taking u = α, the relation
∂/∂u = ∂/∂u + (∂β/∂u) ∂/∂β. (1.87)
This seems like complete nonsense, until one realizes that the first ∂/∂u is taken
holding v constant, while the second ∂/∂u is taken holding β constant.
One way out of this kind of puzzle is to be very careful to specify the entire
coordinate system whenever a partial derivative is taken. Alternatively, one can
do as the chemists do and use notation that explicitly speciﬁes what is being
held constant.
One can do similar computations with differentials of the coordinates. Thus,
for instance,
du = (∂u/∂α) dα + (∂u/∂β) dβ (1.88)
and
dv = (∂v/∂α) dα + (∂v/∂β) dβ. (1.89)
Differentials such as these occur in line integrals. Notice that the relations are
quite different: they involve the inverse of the matrix J.
Here it is important to note that expressions such as du are defined quite
independently of the other members of the coordinate system. In fact, let h be
an arbitrary scalar. Then dh is defined and may be expressed in terms of an
arbitrary coordinate system. For instance
dh = (∂h/∂u) du + (∂h/∂v) dv (1.90)
and also
dh = (∂h/∂α) dα + (∂h/∂β) dβ. (1.91)
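The chain rule (1.85) can be tested with finite differences. A sketch taking u, v to be Cartesian coordinates x, y and α, β to be polar coordinates r, θ; the test function h is an arbitrary choice:

```python
import numpy as np

def h(x, y):
    return x**2 * y + np.sin(y)

def partial(f, args, i, eps=1e-6):
    """Central finite difference of f in argument i."""
    a = list(args)
    a[i] += eps; hi = f(*a)
    a[i] -= 2 * eps; lo = f(*a)
    return (hi - lo) / (2 * eps)

x0, y0 = 1.3, 0.7
r0, t0 = np.hypot(x0, y0), np.arctan2(y0, x0)

h_polar = lambda r, t: h(r * np.cos(t), r * np.sin(t))

# chain rule: dh/dx = (dr/dx) dh/dr + (dtheta/dx) dh/dtheta
dr_dx = x0 / r0          # r = sqrt(x^2 + y^2)
dt_dx = -y0 / r0**2      # theta = arctan(y/x)
lhs = partial(h, (x0, y0), 0)
rhs = dr_dx * partial(h_polar, (r0, t0), 0) + dt_dx * partial(h_polar, (r0, t0), 1)
print(np.isclose(lhs, rhs, atol=1e-5))  # True
```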
1.3.2 Vector ﬁelds
For each point P in the space being studied, there is a vector space V (P),
called the tangent space at the point P. The elements of V (P) are pictured as
arrows starting at the point P. There are of course inﬁnitely many vectors in
each V (P). One can pick a linearly independent list that spans V (P) as basis
vectors for V (P).
We consider arbitrary coordinates x, y in the plane. These are not necessarily
Cartesian coordinates. Then there is a corresponding basis at each point
given by the partial derivatives ∂/∂x and ∂/∂y. This kind of basis is called a
coordinate basis.
A vector ﬁeld assigns to each point a vector in the tangent space at that
point. (The vector should depend on the point in a smooth way.) Thus a vector
ﬁeld is a ﬁrst order linear partial diﬀerential operator of the form
L = a ∂/∂x + b ∂/∂y (1.92)
where a and b are functions of x and y. It is sometimes called a contravariant
vector ﬁeld.
With each vector field L there is an associated ordinary differential equation
dx/dt = a = f(x, y) (1.93)
dy/dt = b = g(x, y). (1.94)
Theorem. Let h be a scalar function of x and y. Then along each solution
of the differential equation
dh/dt = Lh. (1.95)
The vector field determines the differential equation. Furthermore, the
differential equation determines the vector field. They are the same thing, expressed
in somewhat different language. The vector field or the differential equation
may be expressed in an arbitrary coordinate system.
A solution of the diﬀerential equation is a parameterized curve such that the
tangent velocity vector at each point of the curve is the value of the vector ﬁeld
at that point.
Theorem. (straightening out theorem) Consider a point where L ≠ 0. Then
near that point there is a new coordinate system u and v such that
L = ∂/∂u. (1.96)
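The theorem dh/dt = Lh can be checked by integrating the differential equation one small step. A sketch with a hypothetical rotation field L = −y ∂/∂x + x ∂/∂y and a hand-rolled fourth-order Runge-Kutta step, using only numpy:

```python
import numpy as np

a = lambda x, y: -y           # L = a d/dx + b d/dy
b = lambda x, y: x
h = lambda x, y: x**2 + 3*y   # an arbitrary scalar function

def rk4_step(x, y, dt):
    def f(p):
        return np.array([a(p[0], p[1]), b(p[0], p[1])])
    p = np.array([x, y])
    k1 = f(p); k2 = f(p + dt/2*k1); k3 = f(p + dt/2*k2); k4 = f(p + dt*k3)
    return p + dt/6*(k1 + 2*k2 + 2*k3 + k4)

x0, y0 = 1.0, 0.5
dt = 1e-3
x1, y1 = rk4_step(x0, y0, dt)

dh_dt = (h(x1, y1) - h(x0, y0)) / dt      # numerical dh/dt along the solution
Lh = a(x0, y0) * 2*x0 + b(x0, y0) * 3     # Lh = a h_x + b h_y at (x0, y0)
print(np.isclose(dh_dt, Lh, atol=1e-2))   # True
```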
1.3.3 Diﬀerential forms
A diﬀerential form assigns to each point a linear form on the tangent space at
that point. (Again the assignment should be smooth.) Thus it is an expression
of the form
ω = p dx +q dy (1.97)
where p and q are functions of x and y. It is sometimes called a covariant vector
ﬁeld.
The value of the differential form on the vector field is the scalar function
⟨ω, L⟩ = p a + q b. (1.98)
It is a scalar function of x and y. Again, a diﬀerential form may be expressed
in an arbitrary coordinate system.
A diﬀerential form may be pictured in a rough way by indicating a bunch of
parallel lines near each point of the space. However, as we shall see, these lines
ﬁt together in a nice way only in a special situation (for an exact form).
A diﬀerential form is said to be exact in a region if there is a smooth scalar
function h deﬁned on the region such that
ω = dh. (1.99)
Explicitly, this says that
p dx + q dy = (∂h/∂x) dx + (∂h/∂y) dy. (1.100)
In this case the contour lines of the diﬀerential form at each tangent space ﬁt
together on a small scale as pieces of the contour curves of the function h.
A differential form is said to be closed in a region if it satisfies the
integrability condition
∂p/∂y = ∂q/∂x (1.101)
in the region.
Theorem. If a form is exact, then it is closed.
Theorem. (Poincar´e lemma) If a form is closed, then it is locally exact.
It is not true that if a form is closed in a region, then it is exact in the
region. Thus, for example, the form (−y dx + x dy)/(x² + y²) is closed in the
plane minus the origin, but it is not exact in this region. On the other hand,
it is locally exact, by the Poincaré lemma. In fact, consider any smaller region,
obtained by removing a line from the origin to infinity. In this smaller region
this form may be represented as dθ, where the angle θ has a discontinuity only
on the line.
When a diﬀerential form is exact it is easy to picture. Then it is of the form
dh. The contour lines of h are curves. The diﬀerential at any point is obtained
by zooming in at the point so close that the contour lines look straight.
A differential form is what occurs as the integrand of a line integral. Let C
be an oriented curve. Then the line integral
∫_C ω = ∫_C p dx + q dy (1.102)
may be computed with an arbitrary coordinate system and with an arbitrary
parameterization of the oriented curve. The numerical answer is always the
same.
If a form is exact in a region, so that ω = dh, and if the curve C goes from
point P to point Q, then the integral
∫_C ω = h(Q) − h(P). (1.103)
Thus the value of the line integral depends only on the end points. It turns out
that the form is exact in the region if and only if the value of the line integral
along each curve in the region depends only on the end points. Another way to
formulate this is to say that a form is exact in a region if and only if the line
integral around each closed curve is zero.
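The closed-but-not-exact example above can be verified numerically: the form passes the closedness test (1.101) away from the origin, yet its line integral around the unit circle is 2π rather than 0, so it cannot be exact in the punctured plane. A numpy sketch:

```python
import numpy as np

p = lambda x, y: -y / (x**2 + y**2)
q = lambda x, y:  x / (x**2 + y**2)

# closedness: dp/dy = dq/dx away from the origin (finite differences)
eps = 1e-6
p_y = (p(1.0, 0.5 + eps) - p(1.0, 0.5 - eps)) / (2 * eps)
q_x = (q(1.0 + eps, 0.5) - q(1.0 - eps, 0.5)) / (2 * eps)
print(np.isclose(p_y, q_x, atol=1e-6))  # True: the form is closed

# line integral around the unit circle, midpoint rule over many small segments
t = np.linspace(0.0, 2 * np.pi, 20001)
xs, ys = np.cos(t), np.sin(t)
xm, ym = (xs[:-1] + xs[1:]) / 2, (ys[:-1] + ys[1:]) / 2
integral = np.sum(p(xm, ym) * np.diff(xs) + q(xm, ym) * np.diff(ys))

print(integral)  # approximately 2*pi, so the form is not exact
```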
1.3.4 Linearization of a vector ﬁeld near a zero
The situation is more complicated at a point where L = 0. At such a point,
there is a linearization of the vector field. Thus u, v are coordinates that vanish
at the point, and the linearization is given by
du/dt = (∂a/∂x) u + (∂a/∂y) v (1.104)
dv/dt = (∂b/∂x) u + (∂b/∂y) v. (1.105)
The partial derivatives are evaluated at the point. Thus this equation has the
matrix form
d/dt [ u ; v ] = [ a_x, a_y ; b_x, b_y ] [ u ; v ]. (1.106)
The idea is that the vector ﬁeld is somehow approximated by this linear vector
ﬁeld. The u and v represent deviations of the x and y from their values at this
point. This linear diﬀerential equation can be analyzed using the Jordan form
of the matrix.
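The linearization (1.106) can be computed by finite differences. A sketch with the hypothetical vector field a = −y + x², b = x + y², which has a zero at the origin:

```python
import numpy as np

a = lambda x, y: -y + x**2
b = lambda x, y:  x + y**2   # vector field with a zero at the origin

def jacobian(x0, y0, eps=1e-6):
    """Finite-difference matrix of partial derivatives in (1.106) at (x0, y0)."""
    ax = (a(x0 + eps, y0) - a(x0 - eps, y0)) / (2 * eps)
    ay = (a(x0, y0 + eps) - a(x0, y0 - eps)) / (2 * eps)
    bx = (b(x0 + eps, y0) - b(x0 - eps, y0)) / (2 * eps)
    by = (b(x0, y0 + eps) - b(x0, y0 - eps)) / (2 * eps)
    return np.array([[ax, ay], [bx, by]])

J = jacobian(0.0, 0.0)
print(np.round(J, 6))        # [[0, -1], [1, 0]]
print(np.linalg.eigvals(J))  # conjugate pure imaginary: the delicate case below
```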
It is natural to ask whether at an isolated zero of the vector field coordinates
can be chosen so that the linear equation is exactly equivalent to the original
nonlinear equation. The answer is: usually, but not always. In particular,
there is a problem when the eigenvalues satisfy certain linear equations with
integer coefficients, such as λ_1 + λ_2 = 0. This situation includes some of the
most interesting cases, such as that of conjugate pure imaginary eigenvalues.
The precise statement of the eigenvalue condition is in terms of integer linear
combinations m_1 λ_1 + m_2 λ_2 with m_1 ≥ 0, m_2 ≥ 0, and m_1 + m_2 ≥ 2. The
condition is that neither λ_1 nor λ_2 may be expressed in this form.
How can this condition be violated? One possibility is to have λ_1 = 2λ_1,
which says λ_1 = 0. Another is to have λ_1 = 2λ_2. More interesting is λ_1 =
2λ_1 + λ_2, which gives λ_1 + λ_2 = 0. This includes the case of conjugate pure
imaginary eigenvalues.
Theorem. (Sternberg linearization theorem). Suppose that a vector ﬁeld
satisﬁes the eigenvalue condition at a zero. Then there is a new coordinate
system near that zero in which the vector ﬁeld is a linear vector ﬁeld.
For the straightening out theorem and the Sternberg linearization theorem,
see E. Nelson, Topics in Dynamics I: Flows, Princeton University Press,
Princeton, NJ, 1969.
1.3.5 Quadratic approximation to a function at a critical
point
The useful way to picture a function h is by its contour lines.
Consider a function h such that at a particular point we have dh = 0. Then
there is a naturally defined quadratic form
d²h = h_{xx} (dx)² + 2 h_{xy} dx dy + h_{yy} (dy)². (1.107)
The partial derivatives are evaluated at the point. This quadratic form is called
the Hessian. It is determined by a symmetric matrix
H = [ h_{xx}, h_{xy} ; h_{xy}, h_{yy} ]. (1.108)
Its value on a tangent vector L at the point is
d²h(L, L) = h_{xx} a² + 2 h_{xy} a b + h_{yy} b². (1.109)
Theorem (Morse lemma). Let h be such that dh vanishes at a point. Let h_0
be the value of h at this point. Suppose that the Hessian is nondegenerate at
the point. Then there is a coordinate system u, v such that near the point
h − h_0 = ε_1 u² + ε_2 v², (1.110)
where ε_1 and ε_2 are constants that are each equal to ±1.
For the Morse lemma, see J. Milnor, Morse Theory, Princeton University
Press, Princeton, NJ, 1969.
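The Morse lemma says the local picture at a nondegenerate critical point is determined by the signs of the Hessian eigenvalues. A hedged sketch classifying a critical point of a hypothetical function by a finite-difference Hessian (1.108):

```python
import numpy as np

def hessian(h, x0, y0, eps=1e-4):
    """Finite-difference symmetric matrix (1.108) at the point (x0, y0)."""
    hxx = (h(x0 + eps, y0) - 2*h(x0, y0) + h(x0 - eps, y0)) / eps**2
    hyy = (h(x0, y0 + eps) - 2*h(x0, y0) + h(x0, y0 - eps)) / eps**2
    hxy = (h(x0 + eps, y0 + eps) - h(x0 + eps, y0 - eps)
           - h(x0 - eps, y0 + eps) + h(x0 - eps, y0 - eps)) / (4 * eps**2)
    return np.array([[hxx, hxy], [hxy, hyy]])

def classify(h, x0, y0):
    lam = np.linalg.eigvalsh(hessian(h, x0, y0))
    if np.all(lam > 0): return "local minimum (+u^2 + v^2)"
    if np.all(lam < 0): return "local maximum (-u^2 - v^2)"
    return "saddle (u^2 - v^2)"

h = lambda x, y: x**2 - x*y - y**2   # dh = 0 at the origin
print(classify(h, 0.0, 0.0))         # saddle
```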
1.3.6 Diﬀerential, gradient, and divergence
It is important to distinguish between differential and gradient. The differential
of a scalar function h is given in an arbitrary coordinate system by
dh = (∂h/∂u) du + (∂h/∂v) dv. (1.111)
On the other hand, the definition of the gradient depends on the quadratic form
that defines the metric:
g = E (du)² + 2F du dv + G (dv)². (1.112)
It is obtained by applying the inverse g^{-1} of the quadratic form to the differential
to obtain the corresponding vector field. The inverse of the matrix
g = [ E, F ; F, G ] (1.113)
is
g^{-1} = (1/|g|) [ G, −F ; −F, E ], (1.114)
where |g| = EG − F² is the determinant of the matrix g. Thus
∇h = (1/|g|) (G ∂h/∂u − F ∂h/∂v) ∂/∂u + (1/|g|) (−F ∂h/∂u + E ∂h/∂v) ∂/∂v. (1.115)
For orthogonal coordinates, when F = 0, the metric has the form g = E (du)² +
G (dv)². In this case, the expression is simpler:
∇h = (1/E) (∂h/∂u) ∂/∂u + (1/G) (∂h/∂v) ∂/∂v. (1.116)
Example: In polar coordinates
g = (dr)² + r² (dθ)² (1.117)
with determinant |g| = r². So
∇h = (∂h/∂r) ∂/∂r + (1/r²) (∂h/∂θ) ∂/∂θ. (1.118)
Another useful operation is the divergence of a vector field. For a vector
field
v = a ∂/∂u + b ∂/∂v (1.119)
the divergence is
∇·v = (1/√g) ∂(√g a)/∂u + (1/√g) ∂(√g b)/∂v. (1.120)
In this formula √g denotes the square root of the determinant of the metric form
in this coordinate system. It is not immediately clear why this is the correct
quantity. However it may ultimately be traced back to the fact that the volume
element in an arbitrary coordinate system is √g du dv. For instance, in polar
coordinates it is r dr dθ.
Thus, for example, in polar coordinates the divergence of
v = a ∂/∂r + b ∂/∂θ (1.121)
is
∇·v = (1/r) ∂(ra)/∂r + (1/r) ∂(rb)/∂θ = (1/r) ∂(ra)/∂r + ∂b/∂θ. (1.122)
The Laplace operator is the divergence of the gradient. Thus
∇·∇h = (1/√g) ∂/∂u [ (1/√g)(G ∂h/∂u − F ∂h/∂v) ] + (1/√g) ∂/∂v [ (1/√g)(−F ∂h/∂u + E ∂h/∂v) ]. (1.123)
Again this simplifies in orthogonal coordinates:
∇·∇h = (1/√g) ∂/∂u [ (√g/E) ∂h/∂u ] + (1/√g) ∂/∂v [ (√g/G) ∂h/∂v ]. (1.124)
In polar coordinates this is
∇·∇h = (1/r) ∂/∂r ( r ∂h/∂r ) + (1/r²) ∂²h/∂θ². (1.125)
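The polar formula (1.125) can be checked against the Cartesian Laplacian h_xx + h_yy with finite differences. A sketch for an arbitrary test function:

```python
import numpy as np

h_cart = lambda x, y: x**3 * y + y**2
h_polar = lambda r, t: h_cart(r * np.cos(t), r * np.sin(t))

def second(f, args, i, eps=1e-4):
    a = list(args)
    a[i] += eps; hi = f(*a)
    a[i] -= 2 * eps; lo = f(*a)
    return (hi - 2 * f(*args) + lo) / eps**2

def first(f, args, i, eps=1e-6):
    a = list(args)
    a[i] += eps; hi = f(*a)
    a[i] -= 2 * eps; lo = f(*a)
    return (hi - lo) / (2 * eps)

x0, y0 = 0.8, 0.6
r0, t0 = np.hypot(x0, y0), np.arctan2(y0, x0)

cartesian = second(h_cart, (x0, y0), 0) + second(h_cart, (x0, y0), 1)
# (1/r) d/dr (r dh/dr) + (1/r^2) d^2h/dtheta^2
polar = (first(lambda r, t: r * first(h_polar, (r, t), 0), (r0, t0), 0) / r0
         + second(h_polar, (r0, t0), 1) / r0**2)
print(np.isclose(cartesian, polar, atol=1e-3))  # True
```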
There are similar definitions in higher dimensions. The important thing to
emphasize is that vector fields are very different from differential forms, and this
difference is highly visible once one leaves Cartesian coordinates. These objects
play different roles. A differential form is the sort of object that fits naturally
into a line integral. On the other hand, a vector field is something that defines
a flow, via the solution of a system of ordinary differential equations.
1.3.7 Spherical polar coordinates
These ideas have analogs in higher dimensions. The diﬀerential has the same
form in any coordinate system. On the other hand, the gradient depends on the
inner product.
Example: Consider spherical polar coordinates, where θ is colatitude and φ
is longitude. Then the metric form is
g = (dr)² + r² (dθ)² + r² sin²(θ) (dφ)². (1.126)
In this case the gradient is
∇h = (∂h/∂r) ∂/∂r + (1/r²) (∂h/∂θ) ∂/∂θ + (1/(r² sin²(θ))) (∂h/∂φ) ∂/∂φ. (1.127)
The divergence of a vector field makes sense in any number of dimensions.
For spherical polar coordinates the volume element is r² sin(θ) dr dθ dφ. The
divergence of
v = a ∂/∂r + b ∂/∂θ + c ∂/∂φ (1.128)
is
∇·v = (1/(r² sin(θ))) ∂(r² sin(θ) a)/∂r + (1/(r² sin(θ))) ∂(r² sin(θ) b)/∂θ + (1/(r² sin(θ))) ∂(r² sin(θ) c)/∂φ. (1.129)
This can be written more simply as
∇·v = (1/r²) ∂(r² a)/∂r + (1/sin(θ)) ∂(sin(θ) b)/∂θ + ∂c/∂φ. (1.130)
The Laplace operator is the divergence of the gradient. In spherical polar
coordinates this is
∇·∇h = (1/r²) ∂/∂r ( r² ∂h/∂r ) + (1/(r² sin(θ))) ∂/∂θ ( sin(θ) ∂h/∂θ ) + (1/(r² sin²(θ))) ∂²h/∂φ². (1.131)
1.3.8 Gradient systems
Now we use Cartesian coordinates x, y and the usual inner product. This inner
product maps a form to a corresponding vector field. Thus the differential form
dh = (∂h/∂x) dx + (∂h/∂y) dy (1.132)
determines a gradient vector field
∇h = (∂h/∂x) ∂/∂x + (∂h/∂y) ∂/∂y. (1.133)
The corresponding differential equation is
dx/dt = h_x (1.134)
dy/dt = h_y. (1.135)
Theorem. Along a solution of a gradient system,
dh/dt = h_x² + h_y² ≥ 0. (1.136)
This is hill climbing. The picture is that at each point, the gradient vector
ﬁeld points uphill in the steepest direction.
Consider a point where dh = 0. Then the linearization of the equation is
given by
du/dt = h_{xx} u + h_{xy} v (1.137)
dv/dt = h_{xy} u + h_{yy} v, (1.138)
where the partial derivatives are evaluated at the point. This is a real symmetric
matrix, and therefore its eigenvalues are real.
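The hill-climbing property (1.136) can be observed numerically: integrate the gradient system with a simple Euler scheme and watch h increase along the solution. The function h below is a hypothetical concave quadratic:

```python
import numpy as np

h  = lambda x, y: -(x - 1)**2 - 2*(y + 0.5)**2   # a hill with peak at (1, -0.5)
hx = lambda x, y: -2*(x - 1)
hy = lambda x, y: -4*(y + 0.5)

x, y = 0.0, 0.0
values = [h(x, y)]
dt = 0.01
for _ in range(1000):
    x, y = x + dt*hx(x, y), y + dt*hy(x, y)   # dx/dt = h_x, dy/dt = h_y
    values.append(h(x, y))

values = np.array(values)
print(values[0], values[-1])     # h increases from -1.5 toward 0
print(round(x, 3), round(y, 3))  # approaches the critical point (1, -0.5)
```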
1.3.9 Hamiltonian systems
Now we use Cartesian coordinates x, y, but now instead of a symmetric inner
product we use a skew-symmetric form to map a form to a corresponding vector.
Thus the differential form
dH = (∂H/∂x) dx + (∂H/∂y) dy (1.139)
determines a Hamiltonian vector field
∇̌H = (∂H/∂y) ∂/∂x − (∂H/∂x) ∂/∂y. (1.140)
The corresponding differential equation is
dx/dt = H_y (1.141)
dy/dt = −H_x. (1.142)
Theorem. (Conservation of energy) Along a solution of a Hamiltonian system,
dH/dt = H_y H_x − H_x H_y = 0. (1.143)
This is conservation of energy. The picture is that at each point the vector
ﬁeld points along one of the contour lines of H.
Consider a point where dH = 0. Then the linearization of the equation is
given by
du/dt = H_{xy} u + H_{yy} v (1.144)
dv/dt = −H_{xx} u − H_{xy} v, (1.145)
where the partial derivatives are evaluated at the point. This is a special kind
of matrix. The eigenvalues λ satisfy the equation
λ² = H_{xy}² − H_{xx} H_{yy}. (1.146)
So they are either real with opposite sign, or they are complex conjugate pure
imaginary. In the real case solutions linger near the equilibrium point. In the
complex case solutions oscillate near the equilibrium point.
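Conservation of energy can be checked numerically. A sketch for the hypothetical Hamiltonian H = (x² + y²)/2, integrating the system (1.141)-(1.142) with a hand-rolled Runge-Kutta step:

```python
import numpy as np

H  = lambda x, y: 0.5 * (x**2 + y**2)
Hx = lambda x, y: x
Hy = lambda x, y: y

def rk4_step(p, dt):
    f = lambda q: np.array([Hy(*q), -Hx(*q)])   # dx/dt = H_y, dy/dt = -H_x
    k1 = f(p); k2 = f(p + dt/2*k1); k3 = f(p + dt/2*k2); k4 = f(p + dt*k3)
    return p + dt/6*(k1 + 2*k2 + 2*k3 + k4)

p = np.array([1.0, 0.0])
E0 = H(*p)
for _ in range(1000):
    p = rk4_step(p, 0.01)

print(abs(H(*p) - E0) < 1e-8)   # energy is conserved along the flow
```

The orbit stays on the contour line H = 1/2, the unit circle, as the theorem predicts.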
1.3.10 Contravariant and covariant
Here is a summary that may be useful. Contravariant objects include points
and vectors (arrows). These are objects that live in the space. They also include
oriented curves (whose tangent vectors are arrows). Thus a vector field
(contravariant) determines solution curves of the associated system of differential
equations (contravariant).
Covariant objects associate numbers to objects that live in the space. They
include scalars, linear forms (elements of the dual space), and bilinear forms.
They also include scalar functions and diﬀerential forms. Often covariant objects
are described by their contours. The diﬀerential of a scalar function (covariant)
is a diﬀerential form (covariant). The integral of a diﬀerential form (a covariant
object) along an oriented curve (a contravariant object) is a number.
There are also mixed objects. A linear transformation is a mixed object,
since it takes vectors as inputs (covariant), but it gives vectors as outputs
(contravariant).
A bilinear form may be used to move from the contravariant world to the
covariant world. The input is a vector, and the output is a linear form. If the
bilinear form is nondegenerate, then there is an inverse that can take a linear
form as an input and give a vector as an output. This occurs for instance when
we take the diﬀerential of a scalar function (covariant) and use a symmetric
bilinear form to get a gradient vector ﬁeld (contravariant). It also occurs when
we take the diﬀerential of a scalar function (covariant) and use an antisymmetric
bilinear form to get a Hamiltonian vector ﬁeld (contravariant).
1.3.11 Problems
1. A vector field that is nonzero at a point can be transformed into a constant
vector field near that point. Pick a point away from the origin and find
coordinates u and v near there so that
−y ∂/∂x + x ∂/∂y = ∂/∂u. (1.147)
2. A vector field that is nonzero at a point can be transformed into a constant
vector field near that point. Pick a point away from the origin and find
coordinates u and v so that
−(y/(x² + y²)) ∂/∂x + (x/(x² + y²)) ∂/∂y = ∂/∂u. (1.148)
3. A differential form usually cannot be transformed into a constant
differential form, but there are special circumstances when that can occur. Is it
possible to find coordinates u and v near a given point (not the origin) so
that
−y dx + x dy = du? (1.149)
4. A differential form usually cannot be transformed into a constant
differential form, but there are special circumstances when that can occur. Is it
possible to find coordinates u and v near a given point (not the origin) so
that
−(y/(x² + y²)) dx + (x/(x² + y²)) dy = du? (1.150)
5. Consider the vector field
L = x(4 − x − y) ∂/∂x + (x − 2)y ∂/∂y. (1.151)
Find its zeros. At each zero, find the linearization. For each linearization,
find the eigenvalues. Use this information to sketch the vector field.
6. Let h be a smooth function. Its gradient expressed with respect to Cartesian
basis vectors ∂/∂x and ∂/∂y is
∇h = (∂h/∂x) ∂/∂x + (∂h/∂y) ∂/∂y. (1.152)
Find the gradient ∇h expressed with respect to polar basis vectors ∂/∂r
and ∂/∂θ.
7. Let H be a smooth function. Its Hamiltonian vector field expressed with
respect to Cartesian basis vectors ∂/∂x and ∂/∂y is
∇̌H = (∂H/∂y) ∂/∂x − (∂H/∂x) ∂/∂y. (1.153)
Find this same vector field expressed with respect to polar basis vectors
∂/∂r and ∂/∂θ.
8. Consider the vector field
L = (1 + x² + y²) y ∂/∂x − (1 + x² + y²) x ∂/∂y. (1.154)
Find its linearization at 0. Show that there is no coordinate system near
0 in which the vector field L is expressed by its linearization. Hint: Solve
the associated system of ordinary differential equations, both for L and
for its linearization. Find the period of a solution in both cases.
9. Here is an example of a fixed point where the linear stability analysis gives an elliptic fixed point but changing to polar coordinates shows the unstable nature of the fixed point:
\[ \frac{dx}{dt} = -y + x(x^2+y^2) \tag{1.155} \]
\[ \frac{dy}{dt} = x + y(x^2+y^2). \tag{1.156} \]
Change the vector ﬁeld to the polar coordinate representation and solve
the corresponding system of ordinary diﬀerential equations.
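As a numerical cross-check (a sketch added here, not part of the original problem; numpy is assumed), one can integrate the system directly and watch the radius grow, which is what the polar form dr/dt = r³, dθ/dt = 1 predicts even though the linearization at 0 is a pure rotation:

```python
import numpy as np

# Forward-Euler integration of dx/dt = -y + x(x^2+y^2), dy/dt = x + y(x^2+y^2).
# The linearization at the origin is a rotation, but the radius still grows.
x, y = 0.1, 0.0
dt = 1e-3
radii = [np.hypot(x, y)]
for _ in range(20000):
    r2 = x * x + y * y
    x, y = x + dt * (-y + x * r2), y + dt * (x + y * r2)
    radii.append(np.hypot(x, y))

print(radii[0], radii[-1])  # the radius has grown despite the elliptic linearization
```

The exact polar solution r(t) = r₀/√(1 − 2r₀²t) even blows up in finite time, so the fixed point is unstable.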
Chapter 2
Fourier series
2.1 Orthonormal families
Let T be the circle parameterized by [0, 2π) or by [−π, π). Let f be a complex
function on T that is integrable. The nth Fourier coeﬃcient is
\[ c_n = \frac{1}{2\pi}\int_0^{2\pi} e^{-inx} f(x)\,dx. \tag{2.1} \]
The goal is to show that f has a representation as a Fourier series
\[ f(x) = \sum_{n=-\infty}^{\infty} c_n e^{inx}. \tag{2.2} \]
There are two problems. One is to interpret the sense in which the series
converges. The second is to show that it actually converges to f.
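As a concrete illustration (added here; numpy assumed), the coefficient integral (2.1) can be approximated by an equally spaced Riemann sum; for the test function f(x) = e^{i3x} the only nonzero coefficient should be c₃ = 1:

```python
import numpy as np

def fourier_coeff(f, n, m=4096):
    # Approximate c_n = (1/2pi) * int_0^{2pi} e^{-inx} f(x) dx by a Riemann sum.
    x = np.arange(m) * 2 * np.pi / m
    return np.mean(np.exp(-1j * n * x) * f(x))

f = lambda x: np.exp(3j * x)
c3 = fourier_coeff(f, 3)
c0 = fourier_coeff(f, 0)
print(c3, c0)
```

For trigonometric polynomials the equally spaced sum is exact up to aliasing, so the answers come out at machine precision.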
This is a huge subject. However, the simplest and most useful theory is in the context of Hilbert space. Let L²(T) be the space of all (Borel measurable) functions such that
\[ \|f\|_2^2 = \frac{1}{2\pi}\int_0^{2\pi} |f(x)|^2\,dx < \infty. \tag{2.3} \]
Then L²(T) is a Hilbert space with inner product
\[ (f,g) = \frac{1}{2\pi}\int_0^{2\pi} \overline{f(x)}\,g(x)\,dx. \tag{2.4} \]
Let
\[ \phi_n(x) = \exp(inx). \tag{2.5} \]
Then the φ_n form an orthonormal family in L²(T). Furthermore, we have the identity
\[ \|f\|_2^2 = \sum_{|n|\le N} |c_n|^2 + \Big\| f - \sum_{|n|\le N} c_n\phi_n \Big\|_2^2. \tag{2.6} \]
In particular, we have Bessel's inequality
\[ \|f\|_2^2 \ge \sum_{|n|\le N} |c_n|^2. \tag{2.7} \]
This shows that
\[ \sum_{n=-\infty}^{\infty} |c_n|^2 < \infty. \tag{2.8} \]
The space of sequences satisfying this condition is called ℓ². Thus we have proved the following theorem.
Theorem. If f is in L²(T), then its sequence of Fourier coefficients is in ℓ².
2.2 L² convergence
It is easy to see that Σ_{|n|≤N} c_n φ_n is a Cauchy sequence in L²(T) as N → ∞. It follows from the completeness of L²(T) that it converges in the L² sense to a sum g in L²(T). That is, we have
\[ \lim_{N\to\infty} \Big\| g - \sum_{|n|\le N} c_n\phi_n \Big\|_2 = 0. \tag{2.9} \]
The only remaining thing to show is that g = f. This, however, requires some
additional ideas.
One way to do it is to take r with 0 < r < 1 and look at the sum
\[ \sum_{n=-\infty}^{\infty} r^{|n|} c_n e^{inx} = \frac{1}{2\pi}\int_0^{2\pi} P_r(x-y)\,f(y)\,dy. \tag{2.10} \]
Here
\[ P_r(x) = \sum_{n=-\infty}^{\infty} r^{|n|} e^{inx} = \frac{1-r^2}{1-2r\cos(x)+r^2}. \tag{2.11} \]
The functions (1/2π)P_r(x) have the properties of an approximate delta function. Each such function is positive and has integral 1 over the periodic interval. Furthermore,
\[ P_r(x) \le \frac{1-r^2}{2r(1-\cos(x))}, \tag{2.12} \]
which approaches zero as r → 1 away from points where cos(x) = 1.
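A quick numerical sketch of these two properties (added here; numpy assumed): the kernel is positive, and its mean over a period, which equals (1/2π) times its integral, is 1.

```python
import numpy as np

def poisson_kernel(r, x):
    # P_r(x) = (1 - r^2) / (1 - 2 r cos(x) + r^2)
    return (1 - r**2) / (1 - 2 * r * np.cos(x) + r**2)

m = 8192
x = np.arange(m) * 2 * np.pi / m
for r in (0.3, 0.9, 0.99):
    vals = poisson_kernel(r, x)
    mean = np.mean(vals)  # equals (1/2pi) * integral over [0, 2pi)
    print(r, vals.min() > 0, mean)
```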
Theorem. If f is continuous on T, then
\[ f(x) = \lim_{r\uparrow 1}\,\sum_{n=-\infty}^{\infty} r^{|n|} c_n e^{inx}, \tag{2.13} \]
and the convergence is uniform.
Proof: It is easy to compute that
\[ \sum_{n=-\infty}^{\infty} r^{|n|} c_n e^{inx} = \frac{1}{2\pi}\int_0^{2\pi} P_r(x-y)f(y)\,dy = \frac{1}{2\pi}\int_0^{2\pi} P_r(z)f(x-z)\,dz. \tag{2.14} \]
Hence
\[ f(x) - \sum_{n=-\infty}^{\infty} r^{|n|} c_n e^{inx} = \frac{1}{2\pi}\int_0^{2\pi} P_r(z)\big(f(x)-f(x-z)\big)\,dz. \tag{2.15} \]
Thus
\[ \Big| f(x) - \sum_{n=-\infty}^{\infty} r^{|n|} c_n e^{inx} \Big| = \Big| \frac{1}{2\pi}\int_0^{2\pi} P_r(z)\big(f(x)-f(x-z)\big)\,dz \Big| \le \frac{1}{2\pi}\int_0^{2\pi} P_r(z)\,\big|f(x)-f(x-z)\big|\,dz. \tag{2.16} \]
Since f is continuous on T, its absolute value is bounded by some constant M. Furthermore, since f is continuous on T, it is uniformly continuous on T. Consider arbitrary ε > 0. It follows from the uniform continuity that there exists δ > 0 such that for all x and all z with |z| < δ we have |f(x) − f(x − z)| < ε/2. Thus the right hand side is bounded by
\[ \frac{1}{2\pi}\int_{|z|<\delta} P_r(z)\big|f(x)-f(x-z)\big|\,dz + \frac{1}{2\pi}\int_{|z|\ge\delta} P_r(z)\big|f(x)-f(x-z)\big|\,dz \le \frac{\epsilon}{2} + \frac{1-r^2}{2r(1-\cos(\delta))}\,2M. \tag{2.17} \]
Take r so close to 1 that the second term is also less than ε/2. Then the difference that is being estimated is less than ε for such r. This proves the result.
Theorem. If f is in L²(T), then
\[ f = \sum_{n=-\infty}^{\infty} c_n\phi_n \tag{2.18} \]
in the sense that
\[ \lim_{N\to\infty} \Big\| f - \sum_{|n|\le N} c_n\phi_n \Big\|_2 = 0. \tag{2.19} \]
Proof: Let f have Fourier coefficients c_n and let g = Σ_n c_n φ_n. Let ε > 0 be arbitrary. Let h be in C(T) with
\[ \|f - h\|_2 < \epsilon/2. \tag{2.20} \]
Suppose h has Fourier coefficients d_n. Then there exists r < 1 such that
\[ \Big\| h - \sum_n r^{|n|} d_n e^{inx} \Big\|_\infty < \epsilon/2. \tag{2.21} \]
It follows that
\[ \Big\| h - \sum_n r^{|n|} d_n e^{inx} \Big\|_2 < \epsilon/2. \tag{2.22} \]
Thus
\[ \Big\| f - \sum_n r^{|n|} d_n e^{inx} \Big\|_2 < \epsilon. \tag{2.23} \]
However, since g is the orthogonal projection of f onto the closed span of the φ_n, it is also the best approximation to f in that span. Thus
\[ \|f - g\|_2 < \epsilon. \tag{2.24} \]
Since ε > 0 is arbitrary, we can only conclude that f = g.
Theorem. If f is in L²(T), then
\[ \|f\|_2^2 = \sum_{n=-\infty}^{\infty} |c_n|^2. \tag{2.25} \]
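A numerical sketch of this identity (added here; numpy assumed): for f(x) = x on [−π, π) one has |c_n|² = 1/n² for n ≠ 0, so the identity recovers Σ 1/n² = π²/6, and a truncated coefficient sum should approach ‖f‖₂² = π²/3 up to the tail of the series.

```python
import numpy as np

m = 20000
x = -np.pi + np.arange(m) * 2 * np.pi / m

def coeff(n):
    # Riemann-sum approximation of c_n for f(x) = x on [-pi, pi)
    return np.mean(np.exp(-1j * n * x) * x)

lhs = np.mean(x**2)   # ||f||_2^2 = pi^2 / 3
rhs = sum(abs(coeff(n))**2 for n in range(-200, 201) if n != 0)
print(lhs, rhs)       # rhs misses only the tail sum over |n| > 200, about 0.01
```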
2.3 Absolute convergence
Define the function spaces
\[ C(T) \subset L^\infty(T) \subset L^2(T) \subset L^1(T). \tag{2.26} \]
The norm ‖f‖_∞ on the first two spaces is the same: the smallest number M such that |f(x)| ≤ M (with the possible exception of a set of x of measure zero). The space C(T) consists of continuous functions; the space L^∞(T) consists of all bounded functions. The norm on L²(T) is given by ‖f‖₂² = (1/2π)∫₀^{2π}|f(x)|² dx. The norm on L¹(T) is given by ‖f‖₁ = (1/2π)∫₀^{2π}|f(x)| dx. Since the integral is a probability average, their relation is
\[ \|f\|_1 \le \|f\|_2 \le \|f\|_\infty. \tag{2.27} \]
Also define the sequence spaces
\[ \ell^1 \subset \ell^2 \subset c_0 \subset \ell^\infty. \tag{2.28} \]
The norm on ℓ¹ is ‖c‖₁ = Σ_n |c_n|. The norm on ℓ² is given by ‖c‖₂² = Σ_n |c_n|². The norms on the last two spaces are the same, that is, ‖c‖_∞ is the smallest M such that |c_n| ≤ M. The space c₀ consists of all sequences with limit 0 at infinity. The relation between these norms is
\[ \|c\|_\infty \le \|c\|_2 \le \|c\|_1. \tag{2.29} \]
We have seen that the Fourier series theorem gives a perfect correspondence between L²(T) and ℓ². For the other spaces the situation is more complex.
Riemann-Lebesgue Lemma. If f is in L¹(T), then the Fourier coefficients of f are in c₀, that is, they approach 0 at infinity.
Proof: Each function in L²(T) has Fourier coefficients in ℓ², so each function in L²(T) has Fourier coefficients that vanish at infinity. The map from a function to its Fourier coefficients gives a continuous map from L¹(T) to ℓ^∞. However, every function in L¹(T) may be approximated arbitrarily closely in L¹(T) norm by a function in L²(T). Hence its coefficients may be approximated arbitrarily well in ℓ^∞ norm by coefficients that vanish at infinity. Therefore the coefficients vanish at infinity.
In summary, the map from a function to its Fourier coefficients gives a continuous map from L¹(T) to c₀. That is, the Fourier coefficients of an integrable function are bounded (this is obvious) and approach zero (Riemann-Lebesgue lemma). Furthermore, the Fourier coefficients determine the function.
The map from Fourier coefficients to functions gives a continuous map from ℓ¹ to C(T). A sequence that is absolutely summable defines a Fourier series that converges absolutely and uniformly to a continuous function.
Theorem. If f is in L²(T), and if f′ exists (in the sense that f is an integral of f′) and f′ is also in L²(T), then the Fourier coefficients are in ℓ¹.
2.4 Pointwise convergence
This gives a very satisfactory picture of Fourier series. However, there is one slightly unsatisfying point. Convergence in the L² sense does not imply convergence at a particular point. Of course, if the derivative is in L², then we have uniform convergence, and in particular convergence at each point. But what if the function is differentiable at one point but has discontinuities at other points? What can we say about convergence at that one point? Fortunately, we can find something about that case by a closer examination of the partial sums.
One looks at the partial sum
\[ \sum_{|n|\le N} c_n e^{inx} = \frac{1}{2\pi}\int_0^{2\pi} D_N(x-y)\,f(y)\,dy. \tag{2.30} \]
Here
\[ D_N(x) = \sum_{|n|\le N} e^{inx} = \frac{\sin\big((N+\frac{1}{2})x\big)}{\sin\big(\frac{1}{2}x\big)}. \tag{2.31} \]
This Dirichlet kernel D_N(x) has at least some of the properties of an approximate delta function. Unfortunately, it is not positive; instead it oscillates wildly for large N at points away from where sin(x/2) = 0. However, the function (1/2π)D_N(x) does have integral 1.
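Both facts are easy to check numerically (a sketch added here; numpy assumed): the geometric-sum closed form agrees with the direct sum, and the mean of D_N over a period is 1 because only the n = 0 term survives the integration.

```python
import numpy as np

def dirichlet(N, x):
    # D_N(x) = sin((N + 1/2) x) / sin(x / 2)
    return np.sin((N + 0.5) * x) / np.sin(0.5 * x)

N, x0 = 10, 0.7
direct = sum(np.exp(1j * n * x0) for n in range(-N, N + 1)).real
closed = dirichlet(N, x0)

m = 4096
grid = (np.arange(m) + 0.5) * 2 * np.pi / m   # offset grid avoids sin(x/2) = 0
integral = np.mean(dirichlet(N, grid))        # (1/2pi) * integral over a period
print(direct, closed, integral)
```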
Theorem. If for some x the function
\[ d_x(z) = \frac{f(x+z) - f(x)}{2\sin(z/2)} \tag{2.32} \]
is in L¹(T), then at that point
\[ f(x) = \sum_{n=-\infty}^{\infty} c_n\phi_n(x). \tag{2.33} \]
Note that if d_x(z) is continuous at z = 0, then its value at z = 0 is d_x(0) = f′(x). So the hypothesis of the theorem is a condition related to differentiability of f at the point x. The conclusion of the theorem is pointwise convergence of the Fourier series at that point. Since f may be discontinuous at other points, it is possible that this Fourier series is not absolutely convergent.
Proof: We have
\[ f(x) - \sum_{|n|\le N} c_n e^{inx} = \frac{1}{2\pi}\int_0^{2\pi} D_N(z)\big(f(x) - f(x-z)\big)\,dz. \tag{2.34} \]
We can write this as
\[ f(x) - \sum_{|n|\le N} c_n e^{inx} = \frac{1}{2\pi}\int_0^{2\pi} 2\sin\big((N+\tfrac{1}{2})z\big)\,d_x(-z)\,dz. \tag{2.35} \]
This goes to zero as N → ∞, by the Riemann-Lebesgue lemma.
2.5 Problems
1. Let f(x) = x defined for −π ≤ x < π. Find the L¹(T), L²(T), and L^∞(T) norms of f, and compare them.
2. Find the Fourier coefficients c_n of f for all n in Z.
3. Find the ℓ^∞, ℓ², and ℓ¹ norms of these Fourier coefficients, and compare them.
4. Use the equality of L² and ℓ² norms to compute
\[ \zeta(2) = \sum_{n=1}^{\infty} \frac{1}{n^2}. \]
5. Compare the ℓ^∞ and L¹ norms for this problem. Compare the L^∞ and ℓ¹ norms for this problem.
6. Use the pointwise convergence at x = π/2 to evaluate the infinite sum
\[ \sum_{k=0}^{\infty} (-1)^k \frac{1}{2k+1}, \]
regarded as a limit of partial sums. Does this sum converge absolutely?
7. Let F(x) = x²/2 defined for −π ≤ x < π. Find the Fourier coefficients of this function.
8. Use the equality of L² and ℓ² norms to compute
\[ \zeta(4) = \sum_{n=1}^{\infty} \frac{1}{n^4}. \]
9. Compare the ℓ^∞ and L¹ norms for this problem. Compare the L^∞ and ℓ¹ norms for this problem.
10. At which points x of T is F(x) continuous? Differentiable? At which points x of T is f(x) continuous? Differentiable? At which x does F′(x) = f(x)? Can the Fourier series of f(x) be obtained by differentiating the Fourier series of F(x) pointwise? (This last question can be answered by inspecting the explicit form of the Fourier series for the two problems.)
Chapter 3
Fourier transforms
3.1 Introduction
Let R be the line parameterized by x. Let f be a complex function on R that is integrable. The Fourier transform f̂ = Ff is
\[ \hat f(k) = \int_{-\infty}^{\infty} e^{-ikx} f(x)\,dx. \tag{3.1} \]
It is a function on the (dual) real line R̂ parameterized by k. The goal is to show that f has a representation as an inverse Fourier transform
\[ f(x) = \int_{-\infty}^{\infty} e^{ikx}\,\hat f(k)\,\frac{dk}{2\pi}. \tag{3.2} \]
There are two problems. One is to interpret the sense in which these integrals converge. The second is to show that the inversion formula actually holds.
The simplest and most useful theory is in the context of Hilbert space. Let L²(R) be the space of all (Borel measurable) complex functions such that
\[ \|f\|_2^2 = \int_{-\infty}^{\infty} |f(x)|^2\,dx < \infty. \tag{3.3} \]
Then L²(R) is a Hilbert space with inner product
\[ (f,g) = \int_{-\infty}^{\infty} \overline{f(x)}\,g(x)\,dx. \tag{3.4} \]
Let L²(R̂) be the space of all (Borel measurable) complex functions such that
\[ \|h\|_2^2 = \int_{-\infty}^{\infty} |h(k)|^2\,\frac{dk}{2\pi} < \infty. \tag{3.5} \]
Then L²(R̂) is a Hilbert space with inner product
\[ (h,u) = \int_{-\infty}^{\infty} \overline{h(k)}\,u(k)\,\frac{dk}{2\pi}. \tag{3.6} \]
We shall see that the correspondence between f and f̂ is a unitary map from L²(R) onto L²(R̂). So this theory is simple and powerful.
3.2 L¹ theory
First, we need to develop the L¹ theory. The space L¹ is a Banach space. Its dual space is L^∞, the space of essentially bounded functions. An example of a function in the dual space is the exponential function φ_k(x) = e^{ikx}. The Fourier transform is then
\[ \hat f(k) = \langle \phi_k, f\rangle = \int_{-\infty}^{\infty} \overline{\phi_k(x)}\,f(x)\,dx, \tag{3.7} \]
where φ_k is in L^∞ and f is in L¹.
Theorem. If f, g are in L¹(R), then the convolution f ∗ g is another function in L¹(R) defined by
\[ (f*g)(x) = \int_{-\infty}^{\infty} f(x-y)\,g(y)\,dy. \tag{3.8} \]
Theorem. If f, g are in L¹(R), then the Fourier transform of the convolution is the product of the Fourier transforms:
\[ \widehat{f*g}(k) = \hat f(k)\,\hat g(k). \tag{3.9} \]
Theorem. Let f^*(x) = \overline{f(-x)}. Then the Fourier transform of f^* is the complex conjugate of f̂.
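The convolution theorem is easy to check on a grid (a sketch added here, not from the notes; numpy assumed), using two Gaussians, a Riemann-sum convolution, and quadrature Fourier integrals:

```python
import numpy as np

L, m = 20.0, 4000
dx = 2 * L / m
x = -L + dx * np.arange(m)

f = np.exp(-x**2)
g = np.exp(-2 * x**2)
full = np.convolve(f, g) * dx           # full discrete convolution, length 2m - 1
conv = full[m // 2 : m // 2 + m]        # samples of (f*g) back on the grid x

def ft(h, k):
    # hat h(k) = integral e^{-ikx} h(x) dx, approximated on the grid
    return np.sum(np.exp(-1j * k * x) * h) * dx

k = 0.7
lhs = ft(conv, k)           # transform of the convolution
rhs = ft(f, k) * ft(g, k)   # product of the transforms
print(lhs, rhs)
```

Since both Gaussians are negligible at the grid edges, the two sides agree to high accuracy.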
Theorem. If f is in L¹(R), then its Fourier transform f̂ is in L^∞(R̂) and satisfies ‖f̂‖_∞ ≤ ‖f‖₁. Furthermore, f̂ is in C₀(R̂), the space of bounded continuous functions that vanish at infinity.
Theorem. If f is in L¹ and is also continuous and bounded, we have the inversion formula in the form
\[ f(x) = \lim_{\epsilon\downarrow 0}\int_{-\infty}^{\infty} e^{ikx}\,\hat\delta_\epsilon(k)\,\hat f(k)\,\frac{dk}{2\pi}, \tag{3.10} \]
where
\[ \hat\delta_\epsilon(k) = \exp(-\epsilon|k|). \tag{3.11} \]
Proof: The inverse Fourier transform of this is
\[ \delta_\epsilon(x) = \frac{1}{\pi}\,\frac{\epsilon}{x^2+\epsilon^2}. \tag{3.12} \]
It is easy to calculate that
\[ \int_{-\infty}^{\infty} e^{ikx}\,\hat\delta_\epsilon(k)\,\hat f(k)\,\frac{dk}{2\pi} = (\delta_\epsilon * f)(x). \tag{3.13} \]
However δ_ε is an approximate delta function. The result follows by taking ε → 0.
3.3 L² theory
The space L² is its own dual space, and it is a Hilbert space. It is the setting for the most elegant and simple theory of the Fourier transform.
Lemma. If f is in L¹(R) and in L²(R), then f̂ is in L²(R̂), and ‖f‖₂² = ‖f̂‖₂².
Proof. Let g = f^* ∗ f. Then g is in L¹ and is continuous and bounded. Furthermore, the Fourier transform of g is |f̂(k)|². Thus
\[ \|f\|_2^2 = g(0) = \lim_{\epsilon\downarrow 0}\int_{-\infty}^{\infty} \hat\delta_\epsilon(k)\,|\hat f(k)|^2\,\frac{dk}{2\pi} = \int_{-\infty}^{\infty} |\hat f(k)|^2\,\frac{dk}{2\pi}. \tag{3.14} \]
Theorem. Let f be in L²(R). For each a, let f_a = 1_{[−a,a]} f. Then f_a is in L¹(R) and in L²(R), and f_a → f in L²(R) as a → ∞. Furthermore, there exists f̂ in L²(R̂) such that f̂_a → f̂ as a → ∞.
Explicitly, this says that the Fourier transform f̂(k) is characterized by
\[ \int_{-\infty}^{\infty} \Big| \hat f(k) - \int_{-a}^{a} e^{-ikx} f(x)\,dx \Big|^2\,\frac{dk}{2\pi} \to 0 \tag{3.15} \]
as a → ∞.
These arguments show that the Fourier transformation F : L²(R) → L²(R̂) defined by Ff = f̂ is well-defined and preserves norm. It is easy to see from the fact that it preserves norm that it also preserves inner product: (Ff, Fg) = (f, g).
Define the inverse Fourier transform F* in the same way, so that if h is in L¹(R̂) and in L²(R̂), then F*h is in L²(R) and is given by the usual inverse Fourier transform formula. Again we can extend the inverse transformation to F* : L²(R̂) → L²(R) so that it preserves norm and inner product.
Now it is easy to check that (F*h, f) = (h, Ff). Take h = Fg. Then (F*Fg, f) = (Fg, Ff) = (g, f). That is, F*Fg = g. Similarly, one may show that FF*u = u. These equations show that F is unitary and that F* = F⁻¹ is the inverse of F. This proves the following result.
Theorem. The Fourier transform F initially defined on L¹(R) ∩ L²(R) extends by continuity to F : L²(R) → L²(R̂). The inverse Fourier transform F* initially defined on L¹(R̂) ∩ L²(R̂) extends by continuity to F* : L²(R̂) → L²(R). These are unitary operators that preserve L² norm and preserve inner product. Furthermore, F* is the inverse of F.
3.4 Absolute convergence
We have seen that the Fourier transform gives a perfect correspondence between L²(R) and L²(R̂). For the other spaces the situation is more complex.
The map from a function to its Fourier transform gives a continuous map from L¹(R) to part of C₀(R̂). That is, the Fourier transform of an integrable function is continuous and bounded (this is obvious) and approaches zero (Riemann-Lebesgue lemma). Furthermore, this map is one-to-one. That is, the Fourier transform determines the function.
The inverse Fourier transform gives a continuous map from L¹(R̂) to C₀(R). This is also a one-to-one transformation.
One useful fact is that if f is in L¹(R) and g is in L²(R), then the convolution f ∗ g is in L²(R). Furthermore, the transform of f ∗ g is f̂(k)ĝ(k), the product of a bounded function with an L²(R̂) function, and therefore is in L²(R̂).
However, the same pattern of the product of a bounded function with an L² function can arise in other ways. For instance, consider the translate f_a of a function f in L²(R) defined by f_a(x) = f(x − a). Then f̂_a(k) = exp(−ika) f̂(k). This is also the product of a bounded function with an L²(R̂) function.
One can think of this last example as a limiting case of a convolution. Let δ_ε be an approximate delta function. Then (δ_ε)_a ∗ f has Fourier transform exp(−ika) δ̂_ε(k) f̂(k). Now let ε → 0. Then (δ_ε)_a ∗ f → f_a, while exp(−ika) δ̂_ε(k) f̂(k) → exp(−ika) f̂(k).
Theorem. If f is in L²(R), and if f′ exists (in the sense that f is an integral of f′) and f′ is also in L²(R), then the Fourier transform f̂ is in L¹(R̂). As a consequence f is in C₀(R).
Proof: Write f̂(k) = (1/√(1+k²)) · √(1+k²) f̂(k). Since f is in L²(R), it follows that f̂(k) is in L²(R̂). Since f′ is in L²(R), it follows that k f̂(k) is in L²(R̂). Hence √(1+k²) f̂(k) is in L²(R̂). Since 1/√(1+k²) is also in L²(R̂), it follows from the Schwarz inequality that f̂(k) is in L¹(R̂).
3.5 Fourier transform pairs
There are some famous Fourier transforms.
Fix σ > 0. Consider the Gaussian
\[ g_\sigma(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,\exp\Big(-\frac{x^2}{2\sigma^2}\Big). \tag{3.16} \]
Its Fourier transform is
\[ \hat g_\sigma(k) = \exp\Big(-\frac{\sigma^2 k^2}{2}\Big). \tag{3.17} \]
Proof: Define the Fourier transform ĝ_σ(k) by the usual formula. Check that
\[ \Big(\frac{d}{dk} + \sigma^2 k\Big)\,\hat g_\sigma(k) = 0. \tag{3.18} \]
This proves that
\[ \hat g_\sigma(k) = C\,\exp\Big(-\frac{\sigma^2 k^2}{2}\Big). \tag{3.19} \]
Now apply the equality of L² norms. This implies that C² = 1. By looking at the case k = 0 it becomes obvious that C = 1.
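As a quadrature sanity check (added here; numpy assumed), the transform of g_σ can be computed directly from the defining integral and compared with exp(−σ²k²/2):

```python
import numpy as np

sigma = 1.5
L, m = 30.0, 6000
dx = 2 * L / m
x = -L + dx * np.arange(m)
# g_sigma(x) = exp(-x^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)
g = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

results = {}
for k in (0.0, 0.5, 2.0):
    results[k] = np.sum(np.exp(-1j * k * x) * g) * dx   # hat g(k) by Riemann sum
    print(k, results[k].real, np.exp(-sigma**2 * k**2 / 2))
```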
Let ε > 0. Introduce the Heaviside function H(k) that is 1 for k > 0 and 0 for k < 0. The two basic Fourier transform pairs are
\[ f_\epsilon(x) = \frac{1}{x-i\epsilon} \tag{3.20} \]
with Fourier transform
\[ \hat f_\epsilon(k) = 2\pi i\,H(-k)\,e^{\epsilon k}, \tag{3.21} \]
and its complex conjugate
\[ \overline{f_\epsilon}(x) = \frac{1}{x+i\epsilon} \tag{3.22} \]
with Fourier transform
\[ \hat{\overline{f_\epsilon}}(k) = -2\pi i\,H(k)\,e^{-\epsilon k}. \tag{3.23} \]
These may be checked by computing the inverse Fourier transform. Notice that f_ε and its conjugate are not in L¹(R).
Take 1/π times the imaginary part. This gives the approximate delta function
\[ \delta_\epsilon(x) = \frac{1}{\pi}\,\frac{\epsilon}{x^2+\epsilon^2} \tag{3.24} \]
with Fourier transform
\[ \hat\delta_\epsilon(k) = e^{-\epsilon|k|}. \tag{3.25} \]
Take the real part. This gives the approximate principal value of 1/x function
\[ p_\epsilon(x) = \frac{x}{x^2+\epsilon^2} \tag{3.26} \]
with Fourier transform
\[ \hat p_\epsilon(k) = -\pi i\,\big[H(k)e^{-\epsilon k} - H(-k)e^{\epsilon k}\big]. \tag{3.27} \]
3.6 Problems
1. Let f(x) = 1/(2a) for −a ≤ x ≤ a and be zero elsewhere. Find the L¹(R), L²(R), and L^∞(R) norms of f, and compare them.
2. Find the Fourier transform of f.
3. Find the L^∞(R̂), L²(R̂), and L¹(R̂) norms of the Fourier transform, and compare them.
4. Compare the L^∞(R̂) and L¹(R) norms for this problem. Compare the L^∞(R) and L¹(R̂) norms for this problem.
5. Use the pointwise convergence at x = 0 to evaluate an improper integral.
6. Calculate the convolution of f with itself.
7. Find the Fourier transform of the convolution of f with itself. Verify in this case that the Fourier transform of the convolution is the product of the Fourier transforms.
3.7 Poisson summation formula
Theorem: Let f be in L¹(R) with f̂ in L¹(R̂) and such that Σ_k |f̂(k)| < ∞. Then
\[ 2\pi \sum_{n} f(2\pi n) = \sum_{k} \hat f(k). \tag{3.28} \]
Proof: Let
\[ S(t) = \sum_{n} f(2\pi n + t). \tag{3.29} \]
Since S(t) is 2π periodic, we can expand
\[ S(t) = \sum_{k} a_k e^{ikt}. \tag{3.30} \]
It is easy to compute that
\[ a_k = \frac{1}{2\pi}\int_0^{2\pi} S(t)\,e^{-ikt}\,dt = \frac{1}{2\pi}\,\hat f(k). \tag{3.31} \]
So the Fourier series of S(t) is absolutely summable. In particular,
\[ S(0) = \sum_{k} a_k. \tag{3.32} \]
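The formula is easy to test numerically (a sketch added here; numpy assumed) with the Gaussian f(x) = e^{−x²/2}, whose transform is f̂(k) = √(2π) e^{−k²/2}; both sides converge so fast that a short truncation suffices:

```python
import numpy as np

# Check 2*pi * sum_n f(2*pi*n) = sum_k hat f(k) for f(x) = exp(-x^2/2),
# with hat f(k) = sqrt(2*pi) * exp(-k^2/2).
ns = np.arange(-50, 51)
lhs = 2 * np.pi * np.sum(np.exp(-(2 * np.pi * ns)**2 / 2))
rhs = np.sqrt(2 * np.pi) * np.sum(np.exp(-ns.astype(float)**2 / 2))
print(lhs, rhs)
```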
3.8 Problems
1. In this problem the Fourier transform is
\[ \hat f(k) = \int_{-\infty}^{\infty} e^{-ixk} f(x)\,dx \]
and the inverse Fourier transform is
\[ f(x) = \int_{-\infty}^{\infty} e^{ixk}\,\hat f(k)\,\frac{dk}{2\pi}. \]
These provide an isomorphism between the Hilbert spaces L²(R, dx) and L²(R, dk/2π). The norm of f in the first space is equal to the norm of f̂ in the second space. We will be interested in the situation where the Fourier transform is band-limited, that is, only waves with |k| ≤ a have nonzero amplitude.
Make the assumption that |k| > a implies f̂(k) = 0. That is, the Fourier transform of f vanishes outside of the interval [−a, a].
Let
\[ g(x) = \frac{\sin(ax)}{ax}. \]
The problem is to prove that
\[ f(x) = \sum_{m=-\infty}^{\infty} f\Big(\frac{m\pi}{a}\Big)\,g\Big(x - \frac{m\pi}{a}\Big). \]
This says that if you know f at multiples of π/a, then you know f at all points.
Hint: Let g_m(x) = g(x − mπ/a). The task is to prove that f(x) = Σ_m c_m g_m(x) with c_m = f(mπ/a). It helps to use the Fourier transforms of these functions. First prove that the Fourier transform of g(x) is given by ĝ(k) = π/a for |k| ≤ a and ĝ(k) = 0 for |k| > a. (Actually, it may be easier to deal with the inverse Fourier transform.) Then prove that ĝ_m(k) = exp(−imπk/a) ĝ(k). Finally, note that the functions ĝ_m(k) are orthogonal.
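The interpolation formula can be watched in action (a sketch added here, not part of the problem set; numpy assumed). The function f(x) = (sin x / x)² is band-limited to [−2, 2], so with a = 2 a truncated sampling sum should reproduce it at an arbitrary point:

```python
import numpy as np

a = 2.0

def g(x):
    # sin(a x) / (a x); numpy's sinc(t) is sin(pi t)/(pi t)
    return np.sinc(a * x / np.pi)

def f(x):
    # (sin x / x)^2, whose transform is a triangle supported on [-2, 2]
    return np.sinc(x / np.pi)**2

x0, M = 0.7, 500
m = np.arange(-M, M + 1)
approx = np.sum(f(m * np.pi / a) * g(x0 - m * np.pi / a))
print(approx, f(x0))
```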
2. In the theory of neural networks one wants to synthesize an arbitrary function from linear combinations of translates of a fixed function. Let f be a function in L²(R). Suppose that the Fourier transform f̂(k) ≠ 0 for all k. Define the translate f_a by f_a(x) = f(x − a). The task is to show that the set of all linear combinations of the functions f_a, where a ranges over all real numbers, is dense in L²(R).
Hint: It is sufficient to show that if g is in L²(R) with (g, f_a) = 0 for all a, then g = 0. (Why is this sufficient?) This can be done using Fourier analysis.
3.9 PDE Problems
1. Consider the initial value problem for the reaction-diffusion equation
\[ \frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2} + u - u^2. \]
We are interested in solutions u with 0 ≤ u ≤ 1. Find the constant solutions of this equation. Find the linear equations that are the linearizations about each of these constant solutions.
2. Find the dispersion relations for each of these linear equations. Find the
range of wave numbers (if any) that are responsible for any instability of
these linear equations.
3. Let z = x−ct and s = t. Find ∂/∂x and ∂/∂t in terms of ∂/∂z and ∂/∂s.
Write the partial diﬀerential equation in these new variables. A traveling
wave solution is a solution u for which ∂u/∂s = 0. Write the ordinary
diﬀerential equation for a traveling wave solution (in terms of du/dz).
4. Write this ordinary diﬀerential equation as a ﬁrst order system. Take
c > 0. Find the ﬁxed points and classify them.
5. Look for a traveling wave solution that goes from 1 at z = −∞ to 0 at
z = +∞. For which values of c are there solutions that remain in the
interval from 0 to 1?
Chapter 4
Complex integration
4.1 Complex number quiz
1. Simplify 1/(3 + 4i).
2. Simplify |1/(3 + 4i)|.
3. Find the cube roots of 1.
4. Here are some identities for complex conjugate. Which ones need correction? \(\overline{z+w} = \bar z + \bar w\), \(\overline{z-w} = \bar z - \bar w\), \(\overline{zw} = \bar z\,\bar w\), \(\overline{z/w} = \bar z/\bar w\). Make suitable corrections, perhaps changing an equality to an inequality.
5. Here are some identities for absolute value. Which ones need correction? |z + w| = |z| + |w|, |z − w| = |z| − |w|, |zw| = |z||w|, |z/w| = |z|/|w|. Make suitable corrections, perhaps changing an equality to an inequality.
6. Define log(z) so that −π < ℑ log(z) ≤ π. Discuss the identities e^{log(z)} = z and log(e^w) = w.
7. Define z^w = e^{w log z}. Find i^i.
8. What is the power series of log(1 +z) about z = 0? What is the radius of
convergence of this power series?
9. What is the power series of cos(z) about z = 0? What is its radius of
convergence?
10. Fix w. How many solutions are there of cos(z) = w with −π < ℜz ≤ π?
4.2 Complex functions
4.2.1 Closed and exact forms
In the following a region will refer to an open subset of the plane. A differential form p dx + q dy is said to be closed in a region R if throughout the region
\[ \frac{\partial q}{\partial x} = \frac{\partial p}{\partial y}. \tag{4.1} \]
It is said to be exact in a region R if there is a function h defined on the region with
\[ dh = p\,dx + q\,dy. \tag{4.2} \]
Theorem. An exact form is closed.
The converse is not true. Consider, for instance, the plane minus the origin. The form (−y dx + x dy)/(x² + y²) is not exact in this region. It is, however, exact in the plane minus the negative axis. In this region
\[ \frac{-y\,dx + x\,dy}{x^2+y^2} = d\theta, \tag{4.3} \]
where −π < θ < π.
Green's theorem. If S is a bounded region with oriented boundary ∂S, then
\[ \int_{\partial S} p\,dx + q\,dy = \iint_S \Big(\frac{\partial q}{\partial x} - \frac{\partial p}{\partial y}\Big)\,dx\,dy. \tag{4.4} \]
Consider a region R and an oriented curve C in R. Then C ∼ 0 (C is homologous to 0) in R means that there is a bounded region S such that S and its oriented boundary ∂S are contained in R with ∂S = C.
Corollary. If p dx + q dy is closed in some region R, and if C ∼ 0 in R, then
\[ \int_C p\,dx + q\,dy = 0. \tag{4.5} \]
If C is an oriented curve, then −C is the oriented curve with the opposite orientation. The sum C₁ + C₂ of two oriented curves is obtained by following one curve and then the other. The difference C₁ − C₂ is defined by following one curve and then the other in the reverse direction.
Consider a region R and two oriented curves C₁ and C₂ in R. Then C₁ ∼ C₂ (C₁ is homologous to C₂) in R means that C₁ − C₂ ∼ 0 in R.
Corollary. If p dx + q dy is closed in some region R, and if C₁ ∼ C₂ in R, then
\[ \int_{C_1} p\,dx + q\,dy = \int_{C_2} p\,dx + q\,dy. \tag{4.6} \]
4.2.2 Cauchy-Riemann equations
Write z = x + iy. Define partial differential operators
\[ \frac{\partial}{\partial z} = \frac{1}{2}\Big(\frac{\partial}{\partial x} + \frac{1}{i}\,\frac{\partial}{\partial y}\Big) \tag{4.7} \]
and
\[ \frac{\partial}{\partial \bar z} = \frac{1}{2}\Big(\frac{\partial}{\partial x} - \frac{1}{i}\,\frac{\partial}{\partial y}\Big). \tag{4.8} \]
(The factor 1/2 is what makes the differentiation rules (4.9) and (4.10) below come out right.) The justification for this definition is the following. Every polynomial in x, y may be written as a polynomial in z, z̄, and conversely. Then for each term in such a polynomial
\[ \frac{\partial}{\partial z}\, z^m \bar z^n = m z^{m-1}\bar z^n \tag{4.9} \]
and
\[ \frac{\partial}{\partial \bar z}\, z^m \bar z^n = z^m\, n\bar z^{n-1}. \tag{4.10} \]
Let w = u + iv be a function f(z) of z = x + iy. Suppose that this satisfies the system of partial differential equations
\[ \frac{\partial w}{\partial \bar z} = 0. \tag{4.11} \]
In this case we say that f(z) is an analytic function of z in this region. Explicitly,
\[ \frac{\partial (u+iv)}{\partial x} - \frac{1}{i}\,\frac{\partial (u+iv)}{\partial y} = 0. \tag{4.12} \]
This gives the Cauchy-Riemann equations
\[ \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \tag{4.13} \]
and
\[ \frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y}. \tag{4.14} \]
4.2.3 The Cauchy integral theorem
Consider an analytic function w = f(z) and the differential form
\[ w\,dz = f(z)\,dz = (u+iv)(dx + i\,dy) = (u\,dx - v\,dy) + i(v\,dx + u\,dy). \tag{4.15} \]
According to the Cauchy-Riemann equations, this is a closed form.
Theorem (Cauchy integral theorem). If f(z) is analytic in a region R, and if C ∼ 0 in R, then
\[ \int_C f(z)\,dz = 0. \tag{4.16} \]
Example: Consider the differential form z^m dz for integer m ≠ −1. When m ≥ 0 this is defined in the entire complex plane; when m < 0 it is defined in the punctured plane (the plane with 0 removed). It is exact, since
\[ z^m\,dz = \frac{1}{m+1}\,d\,z^{m+1}. \tag{4.17} \]
On the other hand, the differential form dz/z is closed but not exact in the punctured plane.
4.2.4 Polar representation
The exponential function is defined by
\[ \exp(z) = e^z = \sum_{n=0}^{\infty} \frac{z^n}{n!}. \tag{4.18} \]
It is easy to check that
\[ e^{x+iy} = e^x e^{iy} = e^x\big(\cos(y) + i\sin(y)\big). \tag{4.19} \]
Sometimes it is useful to represent a complex number in the polar representation
\[ z = x + iy = r\big(\cos(\theta) + i\sin(\theta)\big). \tag{4.20} \]
This can also be written
\[ z = re^{i\theta}. \tag{4.21} \]
From this we derive
\[ dz = dx + i\,dy = dr\,e^{i\theta} + rie^{i\theta}\,d\theta. \tag{4.22} \]
This may also be written
\[ \frac{dz}{z} = \frac{dr}{r} + i\,d\theta. \tag{4.23} \]
Notice that this does not say that dz/z is exact in the punctured plane. The reason is that the angle θ is not defined in this region. However dz/z is exact in a cut plane, that is, a plane that excludes some line running from the origin to infinity.
Let C(0) be a circle of radius r centered at 0. We conclude that
\[ \int_{C(0)} f(z)\,dz = \int_0^{2\pi} f(z)\,z\,i\,d\theta. \tag{4.24} \]
In particular,
\[ \int_{C(0)} \frac{1}{z}\,dz = \int_0^{2\pi} i\,d\theta = 2\pi i. \tag{4.25} \]
By a change of variable, we conclude that for a circle C(z) of radius r centered at z we have
\[ \int_{C(z)} \frac{1}{\xi - z}\,d\xi = 2\pi i. \tag{4.26} \]
4.2.5 Branch cuts
Remember that
\[ \frac{dz}{z} = \frac{dr}{r} + i\,d\theta \tag{4.27} \]
is exact in a cut plane. Therefore
\[ \frac{dz}{z} = d\log(z) \tag{4.28} \]
in a cut plane, where
\[ \log(z) = \log(r) + i\theta. \tag{4.29} \]
Two convenient choices are 0 < θ < 2π (cut along the positive axis) and −π < θ < π (cut along the negative axis).
In the same way one can define such functions as
\[ \sqrt{z} = \exp\Big(\frac{1}{2}\log(z)\Big). \tag{4.30} \]
Again one must make a convention about the cut.
4.3 Complex integration and residue calculus
4.3.1 The Cauchy integral formula
Theorem. (Cauchy integral formula) Let f(ξ) be analytic in a region R. Let C ∼ 0 in R, so that C = ∂S, where S is a bounded region contained in R. Let z be a point in S. Then
\[ f(z) = \frac{1}{2\pi i}\int_C \frac{f(\xi)}{\xi - z}\,d\xi. \tag{4.31} \]
Proof: Let C_δ(z) be a small circle about z. Let R′ be the region R with the point z removed. Then C ∼ C_δ(z) in R′. It follows that
\[ \frac{1}{2\pi i}\int_C \frac{f(\xi)}{\xi - z}\,d\xi = \frac{1}{2\pi i}\int_{C_\delta(z)} \frac{f(\xi)}{\xi - z}\,d\xi. \tag{4.32} \]
It follows that
\[ \frac{1}{2\pi i}\int_C \frac{f(\xi)}{\xi - z}\,d\xi - f(z) = \frac{1}{2\pi i}\int_{C_\delta(z)} \frac{f(\xi) - f(z)}{\xi - z}\,d\xi. \tag{4.33} \]
Consider an arbitrary ε > 0. The function f(ξ) is continuous at ξ = z. Therefore there is a δ so small that for ξ on C_δ(z) the absolute value |f(ξ) − f(z)| ≤ ε. Then the integral on the right hand side has absolute value bounded by
\[ \frac{1}{2\pi}\int_0^{2\pi} \frac{\epsilon}{\delta}\,\delta\,d\theta = \epsilon. \tag{4.34} \]
Therefore the left hand side has absolute value bounded by ε. Since ε is arbitrary, the left hand side is zero.
4.3.2 The residue calculus
Say that f(z) has an isolated singularity at z₀. Let C_δ(z₀) be a circle about z₀ that contains no other singularity. Then the residue of f(z) at z₀ is the integral
\[ \operatorname{res}(z_0) = \frac{1}{2\pi i}\int_{C_\delta(z_0)} f(z)\,dz. \tag{4.35} \]
Theorem. (Residue Theorem) Say that C ∼ 0 in R, so that C = ∂S with the bounded region S contained in R. Suppose that f(z) is analytic in R except for isolated singularities z₁, …, z_k in S. Then
\[ \int_C f(z)\,dz = 2\pi i \sum_{j=1}^{k} \operatorname{res}(z_j). \tag{4.36} \]
Proof: Let R′ be R with the singularities omitted. Consider small circles C₁, …, C_k around these singularities such that C ∼ C₁ + ⋯ + C_k in R′. Apply the Cauchy integral theorem to C − C₁ − ⋯ − C_k.
If f(z) = g(z)/(z − z₀) with g(z) analytic near z₀ and g(z₀) ≠ 0, then f(z) is said to have a pole of order 1 at z₀.
Theorem. If f(z) = g(z)/(z − z₀) has a pole of order 1, then its residue at that pole is
\[ \operatorname{res}(z_0) = g(z_0) = \lim_{z\to z_0}\,(z - z_0) f(z). \tag{4.37} \]
Proof. By the Cauchy integral formula, for each sufficiently small circle C about z₀ the function g(z) satisfies
\[ g(z_0) = \frac{1}{2\pi i}\int_C \frac{g(z)}{z - z_0}\,dz = \frac{1}{2\pi i}\int_C f(z)\,dz. \tag{4.38} \]
This is the residue.
4.3.3 Estimates
Recall that for complex numbers we have |zw| = |z||w| and |z/w| = |z|/|w|. Furthermore, we have
\[ \big|\,|z| - |w|\,\big| \le |z \pm w| \le |z| + |w|. \tag{4.39} \]
When |z| > |w| this allows us to estimate
\[ \frac{1}{|z \pm w|} \le \frac{1}{|z| - |w|}. \tag{4.40} \]
Finally, we have
\[ |e^z| = e^{\Re z}. \tag{4.41} \]
4.3.4 A residue calculation
Consider the task of computing the integral
\[ \int_{-\infty}^{\infty} e^{-ikx}\,\frac{1}{x^2+1}\,dx \tag{4.42} \]
where k is real. This is the Fourier transform of a function that is in L² and also in L¹. The idea is to use the analytic function
\[ f(z) = e^{-ikz}\,\frac{1}{z^2+1}. \tag{4.43} \]
The first thing is to analyze the singularities of this function. There are poles at z = ±i. Furthermore, there is an essential singularity at z = ∞.
First look at the case when k ≤ 0. The essential singularity of the exponential function has the remarkable feature that for ℑz ≥ 0 the absolute value of e^{−ikz} is bounded by one. This suggests looking at a closed oriented curve C_a in the upper half plane. Take C_a to run along the x axis from −a to a and then along the semicircle z = ae^{iθ} from θ = 0 to θ = π. If a > 1 there is a singularity at z = i inside the curve. So the residue is the value of g(z) = e^{−ikz}/(z + i) at z = i, that is, g(i) = e^k/(2i). By the residue theorem
\[ \int_{C_a} e^{-ikz}\,\frac{1}{z^2+1}\,dz = \pi e^{k} \tag{4.44} \]
for each a > 1.
Now let a → ∞. The contribution from the semicircle is bounded by
\[ \int_0^{\pi} \frac{1}{a^2-1}\,a\,d\theta = \pi\,\frac{a}{a^2-1}. \tag{4.45} \]
We conclude that for k ≤ 0
\[ \int_{-\infty}^{\infty} e^{-ikx}\,\frac{1}{x^2+1}\,dx = \pi e^{k}. \tag{4.46} \]
Next look at the case when k ≥ 0. In this case we could look at an oriented curve in the lower half plane. The integral runs from a to −a and then around a semicircle in the lower half plane. The residue at z = −i is e^{−k}/(−2i). By the residue theorem, the integral is −πe^{−k}. We conclude that for all real k
\[ \int_{-\infty}^{\infty} e^{-ikx}\,\frac{1}{x^2+1}\,dx = \pi e^{-|k|}. \tag{4.47} \]
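The residue answer can be spot-checked by direct quadrature (a sketch added here; numpy assumed; the slowly decaying tail limits the accuracy to a few parts in a thousand):

```python
import numpy as np

# Direct quadrature check of int e^{-ikx} / (x^2 + 1) dx = pi * e^{-|k|}.
L, m = 1000.0, 400001
x = np.linspace(-L, L, m)
dx = x[1] - x[0]
results = {}
for k in (-2.0, 0.0, 1.0):
    vals = np.exp(-1j * k * x) / (x**2 + 1)
    # trapezoid rule: full sum minus half the endpoint values
    results[k] = ((vals.sum() - 0.5 * (vals[0] + vals[-1])) * dx).real
    print(k, results[k], np.pi * np.exp(-abs(k)))
```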
4.4 Problems
1. Evaluate
\[ \int_0^{\infty} \frac{1}{\sqrt{x}\,(4+x^2)}\,dx \]
by contour integration. Show all steps, including estimation of integrals that vanish in the limit of large contours.
2. In the following problems $f(z)$ is analytic in some region. We say that $f(z)$ has a root of multiplicity $m$ at $z_0$ if $f(z) = (z-z_0)^m h(z)$, where $h(z)$ is analytic with $h(z_0) \ne 0$. Find the residue of $f'(z)/f(z)$ at such a $z_0$.

3. Say that $f(z)$ has several roots inside the contour $C$. Evaluate
\[
\frac{1}{2\pi i} \int_C \frac{f'(z)}{f(z)}\, dz.
\]

4. Say that
\[
f(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_n z^n
\]
is a polynomial. Furthermore, suppose that $C$ is a contour surrounding the origin on which
\[
|a_k z^k| > |f(z) - a_k z^k|.
\]
Show that on this contour
\[
f(z) = a_k z^k g(z)
\]
where
\[
|g(z) - 1| < 1
\]
on the contour. Use the result of the previous problem to show that the number of roots (counting multiplicity) inside $C$ is $k$.

5. Find the number of roots (counting multiplicity) of $z^6 + 3z^5 + 1$ inside the unit circle.
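Problem 5 can be checked against a numerical root finder. On the unit circle $|3z^5| = 3$ while $|z^6 + 1| \le 2$, so the hypothesis of problem 4 holds with $k = 5$ and five roots should lie inside. The following check (an added illustration using NumPy's companion-matrix root finder) counts the roots directly:

```python
import numpy as np

# Coefficients of z^6 + 3z^5 + 1, highest degree first.
roots = np.roots([1, 3, 0, 0, 0, 0, 1])
inside = int(np.sum(np.abs(roots) < 1))
print(inside)  # 5 roots inside the unit circle, as the counting argument predicts
```

The sixth root sits near $z = -3$, where the $z^6$ and $3z^5$ terms nearly cancel.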
4.5 More residue calculus
4.5.1 Jordan’s lemma
Jordan's lemma says that for $b > 0$ we have
\[
\frac{1}{\pi} \int_0^\pi e^{-b \sin(\theta)}\, d\theta \le \frac{1}{b}. \tag{4.48}
\]
To prove it, it is sufficient to estimate twice the integral over the interval from $0$ to $\pi/2$. On this interval use the inequality $(2/\pi)\theta \le \sin(\theta)$. This gives
\[
\frac{2}{\pi} \int_0^{\pi/2} e^{-2b\theta/\pi}\, d\theta = \frac{1}{b}\left(1 - e^{-b}\right) \le \frac{1}{b}. \tag{4.49}
\]
4.5.2 A more delicate residue calculation
Consider the task of computing the integral
\[
\int_{-\infty}^{\infty} e^{-ikx} \frac{1}{x-i}\, dx \tag{4.50}
\]
where $k$ is real. This is the Fourier transform of a function that is in $L^2$ but not in $L^1$. The idea is to use the analytic function
\[
f(z) = e^{-ikz} \frac{1}{z-i}. \tag{4.51}
\]
The first thing is to analyze the singularities of this function. There is a pole at $z = i$. Furthermore, there is an essential singularity at $z = \infty$.

First look at the case when $k < 0$. Take $C_a$ to run along the $x$ axis from $-a$ to $a$ and then along the semicircle $z = ae^{i\theta}$ in the upper half plane from $\theta = 0$ to $\theta = \pi$. If $a > 1$ there is a singularity at $z = i$ inside the curve. So the residue is the value of $g(z) = e^{-ikz}$ at $z = i$, that is, $g(i) = e^{k}$. By the residue theorem
\[
\int_{C_a} e^{-ikz} \frac{1}{z-i}\, dz = 2\pi i e^{k} \tag{4.52}
\]
for each $a > 1$.

Now let $a \to \infty$. The contribution from the semicircle is bounded using Jordan's lemma:
\[
\int_0^\pi e^{ka \sin(\theta)} \frac{1}{a-1}\, a\, d\theta \le \pi \frac{1}{-ka} \frac{a}{a-1}. \tag{4.53}
\]
We conclude that for $k < 0$
\[
\int_{-\infty}^{\infty} e^{-ikx} \frac{1}{x-i}\, dx = 2\pi i e^{k}. \tag{4.54}
\]
Next look at the case when $k > 0$. In this case we could look at an oriented curve in the lower half plane. The integral runs from $a$ to $-a$ and then around a semicircle in the lower half plane. The residue is zero. We conclude using Jordan's lemma that for $k > 0$
\[
\int_{-\infty}^{\infty} e^{-ikx} \frac{1}{x-i}\, dx = 0. \tag{4.55}
\]
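The pair (4.54) and (4.55) can also be checked numerically. The integral is only conditionally convergent, so the sketch below (an added illustration with arbitrarily chosen truncation) uses a symmetric truncation $[-a, a]$, for which the oscillatory tails are $O(1/(|k|a))$:

```python
import numpy as np

def ft_pole(k, a=500.0, n=400001):
    # Symmetric truncation of the conditionally convergent integral.
    x = np.linspace(-a, a, n)
    f = np.exp(-1j * k * x) / (x - 1j)
    return np.sum(f) * (x[1] - x[0])

for k in (-2.0, -0.5, 0.5, 2.0):
    exact = 2j * np.pi * np.exp(k) if k < 0 else 0.0
    print(k, abs(ft_pole(k) - exact))
```

The jump at $k = 0$ is real: the limit from the left is $2\pi i$, the limit from the right is $0$.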
4.5.3 Cauchy formula for derivatives
Theorem. (Cauchy formula for derivatives) Let $f(\xi)$ be analytic in a region $R$ including a point $z$. Let $C$ be an oriented curve in $R$ such that for each sufficiently small circle $C(z)$ about $z$, $C \sim C(z)$ in $R$. Then the $m$th derivative satisfies
\[
\frac{1}{m!} f^{(m)}(z) = \frac{1}{2\pi i} \int_C \frac{f(\xi)}{(\xi - z)^{m+1}}\, d\xi. \tag{4.56}
\]
Proof: Differentiate the Cauchy integral formula with respect to $z$ a total of $m$ times.
4.5.4 Poles of higher order
If $f(z) = g(z)/(z-z_0)^m$ with $g(z)$ analytic near $z_0$ and $g(z_0) \ne 0$ and $m \ge 1$, then $f(z)$ is said to have a pole of order $m$ at $z_0$.

Theorem. If $f(z) = g(z)/(z-z_0)^m$ has a pole of order $m$, then its residue at that pole is
\[
\mathrm{res}(z_0) = \frac{1}{(m-1)!} g^{(m-1)}(z_0). \tag{4.57}
\]
Proof. By the Cauchy formula for derivatives, for each sufficiently small circle $C$ about $z_0$ the $(m-1)$th derivative satisfies
\[
\frac{1}{(m-1)!} g^{(m-1)}(z_0) = \frac{1}{2\pi i} \int_C \frac{g(z)}{(z-z_0)^m}\, dz = \frac{1}{2\pi i} \int_C f(z)\, dz. \tag{4.58}
\]
The expression given in the theorem is evaluated in practice by using the fact that $g(z) = (z-z_0)^m f(z)$ for $z$ near $z_0$, performing the differentiation, and then setting $z = z_0$. This is routine, but it can be tedious.
4.5.5 A residue calculation with a double pole
Consider the task of computing the integral
\[
\int_{-\infty}^{\infty} e^{-ikx} \frac{1}{(x^2+1)^2}\, dx \tag{4.59}
\]
where $k$ is real. This is the Fourier transform of a function that is in $L^2$ and also in $L^1$. The idea is to use the analytic function
\[
f(z) = e^{-ikz} \frac{1}{(z^2+1)^2}. \tag{4.60}
\]
The first thing is to analyze the singularities of this function. There are poles at $z = \pm i$. Furthermore, there is an essential singularity at $z = \infty$.

First look at the case when $k \le 0$. Consider a closed oriented curve $C_a$ in the upper half plane. Take $C_a$ to run along the $x$ axis from $-a$ to $a$ and then along the semicircle $z = ae^{i\theta}$ from $\theta = 0$ to $\theta = \pi$. If $a > 1$ there is a singularity at $z = i$ inside the curve. The pole there is of order 2. So the residue is calculated by letting $g(z) = e^{-ikz}/(z+i)^2$, taking the derivative, and evaluating at $z = i$, that is, $g'(i) = (1-k)e^{k}/(4i)$. By the residue theorem
\[
\int_{C_a} e^{-ikz} \frac{1}{(z^2+1)^2}\, dz = \frac{1}{2}\pi (1-k) e^{k} \tag{4.61}
\]
for each $a > 1$.

Now let $a \to \infty$. The contribution from the semicircle vanishes in that limit. We conclude that for $k \le 0$
\[
\int_{-\infty}^{\infty} e^{-ikx} \frac{1}{(x^2+1)^2}\, dx = \frac{1}{2}\pi (1-k) e^{k}. \tag{4.62}
\]
Next look at the case when $k \ge 0$. In this case we could look at an oriented curve in the lower half plane. The integral runs from $a$ to $-a$ and then around a semicircle in the lower half plane. The residue at $z = -i$ is $-(1+k)e^{-k}/(4i)$. By the residue theorem, the integral is $-\pi(1+k)e^{-k}/2$. We conclude that for all real $k$
\[
\int_{-\infty}^{\infty} e^{-ikx} \frac{1}{(x^2+1)^2}\, dx = \frac{1}{2}\pi (1+|k|) e^{-|k|}. \tag{4.63}
\]
4.6 The Taylor expansion
4.6.1 Radius of convergence
Theorem. Let $f(z)$ be analytic in a region $R$ including a point $z_0$. Let $C(z_0)$ be a circle centered at $z_0$ such that $C(z_0)$ and its interior are in $R$. Then for $z$ in the interior of $C(z_0)$
\[
f(z) = \sum_{m=0}^{\infty} \frac{1}{m!} f^{(m)}(z_0) (z-z_0)^m. \tag{4.64}
\]
Proof. For each fixed $\xi$ the function of $z$ given by
\[
\frac{1}{\xi - z} = \frac{1}{\xi - z_0 + z_0 - z} = \frac{1}{\xi - z_0} \frac{1}{1 - \frac{z - z_0}{\xi - z_0}} = \frac{1}{\xi - z_0} \sum_{m=0}^{\infty} \left( \frac{z - z_0}{\xi - z_0} \right)^m \tag{4.65}
\]
has a geometric series expansion. Multiply by $(1/2\pi i) f(\xi)\, d\xi$ and integrate around $C(z_0)$. On the left hand side apply the Cauchy integral formula to get $f(z)$.

In each term in the expansion on the right hand side apply the Cauchy formula for derivatives in the form
\[
\frac{1}{2\pi i} \int_{C(z_0)} \frac{f(\xi)}{(\xi - z_0)^{m+1}}\, d\xi = \frac{1}{m!} f^{(m)}(z_0). \tag{4.66}
\]
This theorem is remarkable because it shows that the condition of analyticity implies that the Taylor series always converges. Furthermore, take the radius of the circle $C(z_0)$ as large as possible. The only constraint is that there must be a function that is analytic inside the circle and that extends $f(z)$. Thus one must avoid the singularity of this extension of $f(z)$ that is closest to $z_0$. This explains why the radius of convergence of the series is the distance from $z_0$ to this nearest singularity.
The reason that we talk of an analytic extension is that artiﬁcialities in the
deﬁnition of the function, such as branch cuts, should not matter. On the other
hand, singularities such as poles, essential singularities, and branch points are
intrinsic. The radius of convergence is the distance to the nearest such intrinsic
singularity.
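A familiar real-variable illustration (added here): $f(x) = 1/(1+x^2)$ is perfectly smooth on the whole real line, yet its Taylor series about $x_0 = 0$ converges only for $|x| < 1$, because the analytic extension $1/(1+z^2)$ has poles at $z = \pm i$ at distance 1 from the origin. A short computation shows the partial sums behaving accordingly:

```python
# Taylor series of 1/(1+x^2) about 0 is the sum over m of (-1)^m x^(2m).
# Its radius of convergence is 1: the distance from 0 to the poles at +/- i.
def taylor_partial_sum(x, terms):
    return sum((-1)**m * x**(2 * m) for m in range(terms))

print(taylor_partial_sum(0.5, 50), 1 / (1 + 0.5**2))  # agree inside |x| < 1
print(abs(taylor_partial_sum(1.5, 50)))               # blows up for |x| > 1
```

Nothing on the real line hints at trouble at $|x| = 1$; only the complex singularities explain it.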
If one knows an analytic function near some point, then one knows it all the way out to the radius of convergence of the Taylor series about that point. But then for each point in this larger region there is another Taylor series. The function is then defined out to the radius of convergence associated with that point. The process may be carried out over a larger and larger region, until blocked by some intrinsic singularity. It is known as analytic continuation.

If the analytic continuation process begins at some point and winds around a branch point, then it may lead to a new definition of the analytic function at the original point. This appears to lead to the necessity of introducing an artificial branch cut in the definition of the function. However this may be avoided by introducing the concept of a Riemann surface.
4.6.2 Riemann surfaces
When an analytic function has a branch cut, it is an indication that the function should be thought of not as a function on a region of the complex plane, but instead as a function on a Riemann surface. A Riemann surface corresponds to several copies of a region in the complex plane. These copies are called sheets. Where one makes the transition from one sheet to another depends on the choice of branch cut. But the Riemann surface itself is independent of the notion of sheet.
As an example, take the function $w = \sqrt{z}$. The most natural definition of this function is as a function on the curve $w^2 = z$. This is a kind of two-dimensional parabola in a four-dimensional space of two complex variables. The value of the square root function on the point with coordinates $z, w$ on the parabola $w^2 = z$ is $w$. Notice that if $z$ is given, then there are usually two corresponding values of $w$. Thus if we want to think of $\sqrt{z}$ as a function of a complex variable, then it is ambiguous. But if we think of it as a function on the Riemann surface, then it is perfectly well defined. The Riemann surface has two sheets. If we wish, we may think of sheet I as the complex plane cut along the negative axis. Sheet II is another copy of the complex plane cut along the negative axis. As $z$ crosses the cut, it changes from one sheet to the other. The value of $w$ also varies continuously, and it keeps track of what sheet the $z$ is on.
As another example, take the function $w = \sqrt{z(z-1)}$. The most natural definition of this function is as a function on the curve $w^2 = z(z-1)$. This is a kind of two-dimensional circle in a four-dimensional space of two complex variables. The value of the function on the point with coordinates $z, w$ on the circle $w^2 = z(z-1)$ is $w$. Notice that if $z$ is given, then there are usually two corresponding values of $w$. Thus if we want to think of $\sqrt{z(z-1)}$ as a function of a complex variable, then it is ambiguous. But if we think of it as a function on the Riemann surface, then it is perfectly well defined. The Riemann surface has two sheets. If we wish, we may think of sheet I as the complex plane cut between $0$ and $1$. Sheet II is another copy of the complex plane, also cut between $0$ and $1$. As $z$ crosses the cut, it changes from one sheet to the other. The value of $w$ also varies continuously, and it keeps track of what sheet the $z$ is on.
As a final example, take the function $w = \log z$. The most natural definition of this function is as a function on the curve $\exp(w) = z$. This is a two-dimensional curve in a four-dimensional space of two complex variables. The value of the logarithm function on the point with coordinates $z, w$ on the curve $\exp(w) = z$ is $w$. Notice that if $z$ is given, then there are infinitely many corresponding values of $w$. Thus if we want to think of $\log z$ as a function of a complex variable, then it is ambiguous. But if we think of it as a function on the Riemann surface, then it is perfectly well defined. The Riemann surface has infinitely many sheets. If we wish, we may think of each sheet as the complex plane cut along the negative axis. As $z$ crosses the cut, it changes from one sheet to the other. The value of $w$ also varies continuously, and it keeps track of what sheet the $z$ is on. The infinitely many sheets form a kind of spiral that winds around the origin infinitely many times.
Chapter 5
Distributions
5.1 Properties of distributions
Consider the space $C_c^\infty(\mathbf{R})$ of complex test functions. These are complex functions defined on the real line, infinitely differentiable and with compact support. A distribution is a linear functional from this space to the complex numbers. It must satisfy a certain continuity condition, but we shall ignore that. The value of the distribution $F$ on the test function $\phi$ is written $\langle F, \phi \rangle$.

If $f$ is a locally integrable function, then we may define a distribution
\[
\langle F, \phi \rangle = \int_{-\infty}^{\infty} f(x) \phi(x)\, dx. \tag{5.1}
\]
Thus many functions deﬁne distributions. This is why distributions are also
called generalized functions.
A sequence of distributions $F_n$ is said to converge to a distribution $F$ if for each test function $\phi$ the numbers $\langle F_n, \phi \rangle$ converge to the number $\langle F, \phi \rangle$.
Example. The distribution $\delta_a$ is defined by
\[
\langle \delta_a, \phi \rangle = \phi(a). \tag{5.2}
\]
This is not given by a locally integrable function. However a distribution may be written as a limit of functions. For instance, let $\epsilon > 0$ and consider the function
\[
\delta_\epsilon(x-a) = \frac{1}{\pi} \frac{\epsilon}{(x-a)^2 + \epsilon^2}. \tag{5.3}
\]
The limit of the distributions defined by these locally integrable functions as $\epsilon \downarrow 0$ is $\delta_a$. For this reason the distribution is often written in the incorrect but suggestive notation
\[
\langle \delta_a, \phi \rangle = \int_{-\infty}^{\infty} \delta(x-a) \phi(x)\, dx. \tag{5.4}
\]
Operations on distributions are defined by looking at the example of a distribution defined by a function and applying the integration by parts formula in that case. Thus, for instance, the derivative is defined by
\[
\langle F', \phi \rangle = -\langle F, \phi' \rangle. \tag{5.5}
\]
Example: Consider the locally integrable Heaviside function given by $H(x) = 1$ for $x > 0$, $H(x) = 0$ for $x < 0$. Then $H' = \delta$. Here $\delta$ is given by
\[
\langle \delta, \phi \rangle = \phi(0). \tag{5.6}
\]
Example: Consider the locally integrable log function $f(x) = \log|x|$. Then $f'(x) = \mathrm{PV}\, 1/x$. Here the principal value $\mathrm{PV}\, 1/x$ is given by
\[
\left\langle \mathrm{PV}\, \frac{1}{x}, \phi \right\rangle = \lim_{\epsilon \downarrow 0} \int_{-\infty}^{\infty} \frac{x}{x^2 + \epsilon^2}\, \phi(x)\, dx. \tag{5.7}
\]
This can be seen by writing $\log|x| = \lim_{\epsilon} \log\sqrt{x^2 + \epsilon^2}$.
A distribution can be approximated by more than one sequence of functions. For instance, for each $a > 0$ let $\log_a(|x|) = \log(|x|)$ for $|x| \ge a$ and $\log_a(|x|) = \log(a)$ for $|x| < a$. Then $\log_a(|x|)$ also approaches $\log(|x|)$ as $a \to 0$. So its derivative is another approximating sequence for $\mathrm{PV}\, 1/x$. This says that
\[
\left\langle \mathrm{PV}\, \frac{1}{x}, \phi \right\rangle = \lim_{a \downarrow 0} \int_{|x| > a} \frac{1}{x}\, \phi(x)\, dx. \tag{5.8}
\]
We can use this sequence to compute the derivative of $\mathrm{PV}\, 1/x$. We get
\[
\int_{-\infty}^{\infty} \left( \frac{d}{dx} \mathrm{PV}\, \frac{1}{x} \right) \phi(x)\, dx = -\lim_{a \to 0} \int_{|x|>a} \frac{1}{x}\, \phi'(x)\, dx = \lim_{a \to 0} \left[ \int_{|x|>a} -\frac{1}{x^2}\, \phi(x)\, dx + \frac{\phi(a) + \phi(-a)}{a} \right]. \tag{5.9}
\]
But $[\phi(a) + \phi(-a) - 2\phi(0)]/a$ converges to zero, so this is
\[
\int_{-\infty}^{\infty} \left( \frac{d}{dx} \mathrm{PV}\, \frac{1}{x} \right) \phi(x)\, dx = \lim_{a \to 0} \int_{|x|>a} -\frac{1}{x^2} \left[ \phi(x) - \phi(0) \right] dx. \tag{5.10}
\]
Another operation is multiplication of a distribution by a smooth function $g$ in $C^\infty$. This is defined in the obvious way by
\[
\langle g \cdot F, \phi \rangle = \langle F, g\phi \rangle. \tag{5.11}
\]
Distributions are not functions. They may not have values at points, and in general nonlinear operations are not defined. For instance, the square of a distribution is not always a well defined distribution.
Also, some algebraic operations involving distributions are quite tricky. Consider, for instance, the associative law. Apply this to the three distributions $\delta(x)$, $x$, and $\mathrm{PV}\, 1/x$. Clearly the product $\delta(x) \cdot x = 0$. On the other hand, the product $x \cdot \mathrm{PV}\, 1/x = 1$. So if the associative law were to hold, we would get
\[
0 = 0 \cdot \mathrm{PV}\, \frac{1}{x} = (\delta(x) \cdot x) \cdot \mathrm{PV}\, \frac{1}{x} = \delta(x) \cdot \left( x \cdot \mathrm{PV}\, \frac{1}{x} \right) = \delta(x) \cdot 1 = \delta(x). \tag{5.12}
\]
5.2 Mapping distributions
A test function $\phi$ is naturally viewed as a covariant object, so the distribution $F$ is contravariant. A proper function is a function such that the inverse image of each compact set is compact. It is natural to define the forward push of a distribution $F$ by a smooth proper function $g$ by $\langle g[F], \phi \rangle = \langle F, \phi \circ g \rangle$. Example: If $F$ is the distribution $\delta(x-3)$ and if $u = g(x) = x^2 - 4$, then the forward push is $\delta(u-5)$. This is because
\[
\int \delta(u-5) \phi(u)\, du = \int \delta(x-3) \phi(x^2-4)\, dx.
\]
On the other hand, it is actually more common to think of a distribution as being a covariant object, since a distribution is supposed to be a generalized function. The backward pull of the distribution by a smooth function $g$ is defined in at least some circumstances by
\[
\langle F \circ g, \phi \rangle = \langle F, g[\phi] \rangle. \tag{5.13}
\]
Here
\[
g[\phi](u) = \sum_{g(x) = u} \frac{\phi(x)}{|g'(x)|}. \tag{5.14}
\]
Example. Let $u = g(x) = x^2 - 4$. Then the backward pull of $\delta(u)$ under $g$ is $\delta(x^2-4) = (1/4)(\delta(x-2) + \delta(x+2))$. This is because
\[
g[\phi](u) = \frac{\phi(\sqrt{u+4}) + \phi(-\sqrt{u+4})}{2\sqrt{u+4}}. \tag{5.15}
\]
So if $F = \delta$, then
\[
F \circ g = \frac{1}{4} \left( \delta_2 + \delta_{-2} \right). \tag{5.16}
\]
Example: The backward pull is not always defined. To consider a distribution as a covariant object is a somewhat awkward act in general. Let $u = h(x) = x^2$. The backward pull of $\delta(u)$ by $h$ is $\delta(x^2)$, which is not defined.

Example: The general formula for the pull back of the delta function is
\[
\delta(g(x)) = \sum_{g(a) = 0} \frac{1}{|g'(a)|}\, \delta(x-a). \tag{5.17}
\]
The most important distributions are $\delta$, $\mathrm{PV}\, 1/x$, $1/(x - i0)$, and $1/(x + i0)$. These are the limits of the functions $\delta_\epsilon(x)$, $x/(x^2+\epsilon^2)$, $1/(x - i\epsilon)$, $1/(x + i\epsilon)$ as $\epsilon \downarrow 0$. The relations between these functions are given by
\[
\delta_\epsilon(x) = \frac{1}{\pi} \frac{\epsilon}{x^2 + \epsilon^2} = \frac{1}{2\pi i} \left[ \frac{1}{x - i\epsilon} - \frac{1}{x + i\epsilon} \right] \tag{5.18}
\]
and
\[
\frac{x}{x^2 + \epsilon^2} = \frac{1}{2} \left[ \frac{1}{x - i\epsilon} + \frac{1}{x + i\epsilon} \right]. \tag{5.19}
\]
5.3 Radon measures
A Radon measure is a positive distribution. That is, it is a linear functional $\mu$ on $C_c^\infty(\mathbf{R})$ such that for each test function $\phi$ the condition $\phi \ge 0$ implies that the value $\langle \mu, \phi \rangle \ge 0$. Every Radon measure extends uniquely by continuity to a linear functional on $C_c(\mathbf{R})$, the space of continuous functions with compact support. Each positive locally integrable function $h$ defines a Radon measure by integrating $\phi(x)$ times $h(x)\, dx$. Also, the point mass distributions $\delta_a$ are Radon measures.

It is common to write the value of a Radon measure in the form
\[
\langle \mu, \phi \rangle = \int \phi(x)\, d\mu(x). \tag{5.20}
\]
What is remarkable is that the theory of Lebesgue integration works for Radon measures. That is, given a real function $f \ge 0$ that is only required to be Borel measurable, there is a natural definition of the integral such that
\[
0 \le \int f(x)\, d\mu(x) \le +\infty. \tag{5.21}
\]
Furthermore, if $f$ is a complex Borel measurable function such that $|f|$ has finite integral, then the integral of $f$ is defined and satisfies
\[
\left| \int f(x)\, d\mu(x) \right| \le \int |f(x)|\, d\mu(x) < +\infty. \tag{5.22}
\]
5.4 Approximate delta functions
It might seem that one could replace the notion of distribution by the notion of
a sequence of approximating functions. This is true in some sense, but the fact
is that many diﬀerent sequences may approximate the same distribution. Here
is a result of that nature.
Theorem. Let $\delta_1(u) \ge 0$ be a positive function with integral 1. For each $\epsilon > 0$ define $\delta_\epsilon(x) = \delta_1(x/\epsilon)/\epsilon$. Then the functions $\delta_\epsilon$ converge to the $\delta$ distribution as $\epsilon$ tends to zero.

The convergence takes place in the sense of distributions (smooth test functions with compact support) or even in the sense of Radon measures (continuous test functions with compact support). Notice that there are no continuity or symmetry assumptions on the initial function.

The proof of this theorem is simple. Each $\delta_\epsilon$ has integral one. Consider a bounded continuous function $\phi$. Then
\[
\int_{-\infty}^{\infty} \delta_\epsilon(x) \phi(x)\, dx = \int_{-\infty}^{\infty} \delta_1(u) \phi(\epsilon u)\, du. \tag{5.23}
\]
The dominated convergence theorem shows that this approaches the integral
\[
\int_{-\infty}^{\infty} \delta_1(u) \phi(0)\, du = \phi(0). \tag{5.24}
\]
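The absence of any symmetry assumption is worth emphasizing; even a one-sided, discontinuous bump works. The sketch below (an added illustration with arbitrary parameter choices) takes $\delta_1$ to be the indicator function of $[0, 1)$ and pairs the rescaled functions with the test function $\phi(x) = \cos(x)$:

```python
import numpy as np

def delta1(u):
    # An asymmetric, discontinuous bump with integral 1.
    return np.where((u >= 0.0) & (u < 1.0), 1.0, 0.0)

def pair(phi, eps, a=2.0, n=800001):
    # < delta1(x/eps)/eps, phi > computed by a Riemann sum
    x = np.linspace(-a, a, n)
    return np.sum(delta1(x / eps) / eps * phi(x)) * (x[1] - x[0])

for eps in (0.1, 0.01, 0.001):
    print(eps, pair(np.cos, eps))  # tends to phi(0) = 1
```

The mass concentrates entirely to the right of the origin, yet the limit is still the symmetric point evaluation $\phi(0)$.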
Here is an even stronger result.
Theorem. For each $\epsilon > 0$ let $\delta_\epsilon \ge 0$ be a positive function with integral 1. Suppose that for each $a > 0$
\[
\int_{|x| > a} \delta_\epsilon(x)\, dx \to 0 \tag{5.25}
\]
as $\epsilon \to 0$. Then the functions $\delta_\epsilon$ converge to the $\delta$ distribution as $\epsilon$ tends to zero.

Here is a proof of this more general theorem. Let $H_\epsilon(a) = \int_{-\infty}^{a} \delta_\epsilon(x)\, dx$. Then for each $a < 0$ we have $H_\epsilon(a) \to 0$, and for each $a > 0$ we have $1 - H_\epsilon(a) \to 0$. In other words, for each $a \ne 0$ we have $H_\epsilon(a) \to H(a)$ as $\epsilon \to 0$. Since the functions $H_\epsilon$ are uniformly bounded, it follows from the dominated convergence theorem that $H_\epsilon \to H$ in the sense of distributions. It follows by differentiation that $\delta_\epsilon \to \delta$ in the sense of distributions.
5.5 Problems
If $F$ and $G$ are distributions, and if at least one of them has compact support, then their convolution $F * G$ is defined by
\[
\langle F * G, \phi \rangle = \langle F_x G_y, \phi(x+y) \rangle.
\]
This product is commutative. It is also associative if at least two of the three factors have compact support.
1. If $F$ and $G$ are given by locally integrable functions $f$ and $g$, and at least one has compact support, then $F * G$ is given by a locally integrable function
\[
(f * g)(z) = \int_{-\infty}^{\infty} f(x) g(z-x)\, dx = \int_{-\infty}^{\infty} f(z-y) g(y)\, dy.
\]

2. If $G$ is given by a test function $g$, then $F * g$ is given by a smooth function
\[
(F * g)(z) = \langle F_x, g(z-x) \rangle.
\]
3. Calculate the convolution $1 * \delta'$.

4. Calculate the convolution $\delta' * H$, where $H$ is the Heaviside function.

5. Calculate the convolution $(1 * \delta') * H$ and also calculate the convolution $1 * (\delta' * H)$. What does this say about the associative law for convolution?
6. Let $L$ be a constant coefficient linear differential operator. Let $u$ be a distribution that is a fundamental solution, that is, let $Lu = \delta$. Let $G$ be a distribution with compact support. Show that the convolution $F = u * G$ satisfies the equation $LF = G$. Hint: Write $\langle LF, \phi \rangle = \langle F, L^\dagger \phi \rangle$, where $L^\dagger$ is adjoint to $L$.
7. Take $L = -d^2/dx^2$. Is there a fundamental solution that has support in a bounded interval? Is there a fundamental solution that has support in a semi-infinite interval?
5.6 Tempered distributions
Let $d/dx$ be the operator of differentiation, and let $x$ be the operator of multiplication by the coordinate $x$. The space $\mathcal{S}$ of rapidly decreasing smooth test functions consists of the functions $\phi$ in $L^2$ such that every finite product of the operators $d/dx$ and $x$ applied to $\phi$ is also in $L^2$. The advantage of this definition is that the space $\mathcal{S}$ is clearly invariant under Fourier transformation.

A tempered distribution is a linear functional on $\mathcal{S}$ satisfying the appropriate continuity property. Every tempered distribution restricts to define a distribution. So tempered distributions are more special.

The advantage of tempered distributions is that one can define Fourier transforms $\hat{F}$ of tempered distributions $F$. The definition is
\[
\langle \hat{F}, \phi \rangle = \langle F, \hat{\phi} \rangle. \tag{5.26}
\]
Here $\hat{\phi}$ is the Fourier transform of $\phi$.
Here are some Fourier transforms for functions. The first two are easy. They are
\[
\int_{-\infty}^{\infty} e^{-ikx} \frac{1}{x - i\epsilon}\, dx = 2\pi i\, e^{\epsilon k} H(-k) \tag{5.27}
\]
and
\[
\int_{-\infty}^{\infty} e^{-ikx} \frac{1}{x + i\epsilon}\, dx = -2\pi i\, e^{-\epsilon k} H(k). \tag{5.28}
\]
Subtract these and divide by $2\pi i$. This gives
\[
\int_{-\infty}^{\infty} e^{-ikx} \delta_\epsilon(x)\, dx = e^{-\epsilon |k|}. \tag{5.29}
\]
Instead, add these and divide by 2. This gives
\[
\int_{-\infty}^{\infty} e^{-ikx} \frac{x}{x^2 + \epsilon^2}\, dx = -\pi i\, e^{-\epsilon |k|}\, \mathrm{sign}(k). \tag{5.30}
\]
The corresponding Fourier transforms for distributions are
\[
F[1/(x - i0)] = 2\pi i\, H(-k) \tag{5.31}
\]
and
\[
F[1/(x + i0)] = -2\pi i\, H(k). \tag{5.32}
\]
Also,
\[
F[\delta(x)] = 1 \tag{5.33}
\]
and
\[
F\left[\mathrm{PV}\, \frac{1}{x}\right] = -\pi i\, \mathrm{sign}(k). \tag{5.34}
\]
Example: Here is a more complicated calculation. The derivative of $\mathrm{PV}\, 1/x$ is the distribution
\[
\frac{d}{dx} \mathrm{PV}\, \frac{1}{x} = -\frac{1}{x^2} + c\,\delta(x), \tag{5.35}
\]
where $c$ is the infinite constant
\[
c = \int_{-\infty}^{\infty} \frac{1}{x^2}\, dx. \tag{5.36}
\]
This makes rigorous sense if one interprets it as
\[
\int_{-\infty}^{\infty} \left( \frac{d}{dx} \mathrm{PV}\, \frac{1}{x} \right) \phi(x)\, dx = \lim_{a \to 0} \int_{|x|>a} -\frac{1}{x^2} \left[ \phi(x) - \phi(0) \right] dx. \tag{5.37}
\]
One can get an intuitive picture of this result by graphing the approximating functions. The key formula is
\[
\frac{d}{dx} \frac{x}{x^2 + \epsilon^2} = -\frac{x^2}{(x^2+\epsilon^2)^2} + c_\epsilon \frac{2\epsilon^3}{\pi (x^2+\epsilon^2)^2}, \tag{5.38}
\]
where $c_\epsilon = \pi/(2\epsilon)$. This is an approximation to $-1/x^2$ plus a big constant times an approximation to the delta function.
The Fourier transform of the derivative is obtained by multiplying the Fourier transform of $\mathrm{PV}\, 1/x$ by $ik$. Thus the Fourier transform of $-1/x^2 + c\,\delta(x)$ is $ik$ times $-\pi i\, \mathrm{sign}(k)$, which is $\pi |k|$.
This example is interesting, because at first glance it looks as if the derivative of $\mathrm{PV}\, 1/x$ should be $-1/x^2$, which is negative definite. But the correct answer for the derivative is $-1/x^2 + c\,\delta(x)$, which is actually positive definite. And in fact its Fourier transform is positive.
For each of these formulas there is a corresponding inverse Fourier transform. For instance, the inverse Fourier transform of 1 is
\[
\delta(x) = \int_{-\infty}^{\infty} e^{ikx}\, \frac{dk}{2\pi} = \int_0^{\infty} \cos(kx)\, \frac{dk}{\pi}. \tag{5.39}
\]
Of course such an equation is interpreted by integrating both sides with a test function.
Another formula of the same type is gotten by taking the inverse Fourier transform of $-\pi i\, \mathrm{sign}(k)$. This is
\[
\mathrm{PV}\, \frac{1}{x} = -\pi i \int_{-\infty}^{\infty} e^{ikx}\, \mathrm{sign}(k)\, \frac{dk}{2\pi} = \int_0^{\infty} \sin(kx)\, dk. \tag{5.40}
\]
5.7 Poisson equation
We begin the study of fundamental solutions of partial diﬀerential equations.
These are solutions of the equation Lu = δ, where L is the diﬀerential operator,
and δ is a point source.
Let us start with the equation in one space dimension:
\[
\left( -\frac{d^2}{dx^2} + m^2 \right) u = \delta(x). \tag{5.41}
\]
This is an equilibrium equation that balances a source with losses due to diffusion and to dissipation (when $m > 0$). Fourier transform. This gives
\[
\left( k^2 + m^2 \right) \hat{u}(k) = 1. \tag{5.42}
\]
The solution is
\[
\hat{u}(k) = \frac{1}{k^2 + m^2}. \tag{5.43}
\]
There is no problem of division by zero. The inverse Fourier transform is
\[
u(x) = \frac{1}{2m} e^{-m|x|}. \tag{5.44}
\]
This is the only solution that is a tempered distribution. (The solutions of the homogeneous equation all grow exponentially.)
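The statement that $u(x) = e^{-m|x|}/(2m)$ is a fundamental solution means that $\langle u, -\phi'' + m^2 \phi \rangle = \phi(0)$ for every test function $\phi$. This can be checked directly by quadrature (an added illustration; the value of $m$ and the Gaussian test function are arbitrary choices):

```python
import numpy as np

m = 3.0
u = lambda x: np.exp(-m * np.abs(x)) / (2 * m)

phi = lambda x: np.exp(-x**2)                    # test function
phi2 = lambda x: (4 * x**2 - 2) * np.exp(-x**2)  # its second derivative

x = np.linspace(-20.0, 20.0, 2000001)
dx = x[1] - x[0]
val = np.sum(u(x) * (-phi2(x) + m**2 * phi(x))) * dx
print(val)  # close to phi(0) = 1
```

The delta function on the right side of (5.41) appears here through the jump of $u'$ at the origin; away from zero, $-u'' + m^2 u = 0$ exactly.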
What happens when $m = 0$? This is more subtle. The equation is
\[
-\frac{d^2}{dx^2} u = \delta(x). \tag{5.45}
\]
Fourier transform. This gives
\[
k^2 \hat{u}(k) = 1. \tag{5.46}
\]
Now there is a real question about division by zero. Furthermore, the homogeneous equation has solutions that are tempered distributions, namely linear combinations of $\delta(k)$ and $\delta'(k)$. The final result is that the inhomogeneous equation does have a tempered distribution solution, but it needs careful definition. The solution is
\[
\hat{u}(k) = \frac{1}{k^2} + \infty\, \delta(k). \tag{5.47}
\]
This may be thought of as the derivative of $-\mathrm{PV}\, 1/k$. The inverse Fourier transform of $\mathrm{PV}\, 1/k$ is $(1/2) i\, \mathrm{sign}(x)$. So the inverse Fourier transform of $-d/dk\, \mathrm{PV}\, 1/k$ is $-(-ix)(1/2) i\, \mathrm{sign}(x) = -(1/2)|x|$. Thus
\[
u(x) = -\frac{1}{2} |x| \tag{5.48}
\]
is a solution of the inhomogeneous equation. The solutions of the homogeneous equation are linear combinations of $1$ and $x$. None of these solutions is a good description of diffusive equilibrium. In fact, in one dimension there is no diffusive equilibrium.
The next case that is simple to compute and of practical importance is the equation in dimension 3. This is
\[
\left( -\nabla^2 + m^2 \right) u = \delta(x). \tag{5.49}
\]
This is an equilibrium equation that balances a source with losses due to diffusion and to dissipation (when $m > 0$). Fourier transform. This gives
\[
\left( k^2 + m^2 \right) \hat{u}(k) = 1. \tag{5.50}
\]
The solution is
\[
\hat{u}(k) = \frac{1}{k^2 + m^2}. \tag{5.51}
\]
The inverse Fourier transform in the three dimensional case may be computed by going to polar coordinates. It is
\[
u(x) = \frac{1}{4\pi |x|} e^{-m|x|}. \tag{5.52}
\]
What happens when $m = 0$? The situation is very different in three dimensions. The equation is
\[
-\nabla^2 u = \delta(x). \tag{5.53}
\]
Fourier transform. This gives
\[
k^2 \hat{u}(k) = 1. \tag{5.54}
\]
The inhomogeneous equation has a solution
\[
\hat{u}(k) = \frac{1}{k^2}. \tag{5.55}
\]
But now this is a locally integrable function. It defines a tempered distribution without any regularization. Thus
\[
u(x) = \frac{1}{4\pi |x|} \tag{5.56}
\]
is a solution of the inhomogeneous equation. In three dimensions there is diffusive equilibrium. There is so much room that the effect of the source can be completely compensated by diffusion alone.
5.8 Diﬀusion equation
The diﬀusion equation or heat equation is
_
∂
∂t
−
1
2
σ
2
∂
2
∂x
2
_
u = δ(x)δ(t). (5.57)
It says that the time rate of change is entirely due to diffusion. Fourier transform. We get
\[
\left( i\omega + \frac{1}{2} \sigma^2 k^2 \right) \hat{u}(k, \omega) = 1. \tag{5.58}
\]
This has solution
\[
\hat{u}(k, \omega) = \frac{1}{i\omega + \frac{1}{2}\sigma^2 k^2} = \frac{1}{i}\, \frac{1}{\omega - i\frac{1}{2}\sigma^2 k^2}. \tag{5.59}
\]
Here the only division by zero is when both $\omega$ and $k$ are zero. But this is not so serious, because it is clear how to regularize. We can use the fact that the inverse Fourier transform of $1/(\omega - i\epsilon)$ with $\epsilon > 0$ is $iH(t)e^{-\epsilon t}$. So we have
\[
\hat{u}(k, t) = H(t)\, e^{-\frac{\sigma^2 t k^2}{2}}. \tag{5.60}
\]
This is a Gaussian, so the fundamental solution is
\[
u(x, t) = H(t)\, \frac{1}{\sqrt{2\pi \sigma^2 t}}\, e^{-\frac{x^2}{2\sigma^2 t}}. \tag{5.61}
\]
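Two properties of (5.61) are easy to verify numerically for $t > 0$: the total mass is 1 for every $t$, and the heat equation is satisfied pointwise. The following check (an added illustration; the value of $\sigma$, the grid, and the finite-difference steps are arbitrary choices) uses centered differences:

```python
import numpy as np

sigma = 1.5

def heat_kernel(x, t):
    return np.exp(-x**2 / (2 * sigma**2 * t)) / np.sqrt(2 * np.pi * sigma**2 * t)

x = np.linspace(-30.0, 30.0, 60001)
dx, dt, t = x[1] - x[0], 1e-5, 1.0

mass = np.sum(heat_kernel(x, t)) * dx
u_t = (heat_kernel(x, t + dt) - heat_kernel(x, t - dt)) / (2 * dt)
u_xx = (heat_kernel(x + dx, t) - 2 * heat_kernel(x, t) + heat_kernel(x - dx, t)) / dx**2
print(mass, np.max(np.abs(u_t - 0.5 * sigma**2 * u_xx)))
```

Conservation of mass corresponds to $\hat{u}(0, t) = 1$ in (5.60) for $t > 0$.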
5.9 Wave equation
We will look for the solution of the wave equation with a point source at time
zero that lies in the forward light cone. The wave equation in 1 + 1 dimensions
is
\[
\left( \frac{\partial^2}{\partial t^2} - c^2 \frac{\partial^2}{\partial x^2} \right) u = \delta(x)\delta(t). \tag{5.62}
\]
Fourier transform. We get
\[
\left( -\omega^2 + c^2 k^2 \right) \hat{u}(k, \omega) = 1. \tag{5.63}
\]
This has solution
\[
\hat{u}(k, \omega) = -\frac{1}{(\omega - i0)^2 - c^2 k^2}. \tag{5.64}
\]
The division by zero is serious, but it is possible to regularize. The choice of regularization is not the only possible one, but we shall see that it is the convention that gives propagation into the future. We can write this also as
\[
\hat{u}(k, \omega) = -\frac{1}{2c|k|} \left[ \frac{1}{\omega - i0 - c|k|} - \frac{1}{\omega - i0 + c|k|} \right]. \tag{5.65}
\]
We can use the fact that the inverse Fourier transform of $1/(\omega - i0)$ is $iH(t)$. So we have
\[
\hat{u}(k, t) = -\frac{1}{2c|k|}\, iH(t) \left[ e^{ic|k|t} - e^{-ic|k|t} \right] = \frac{1}{c|k|} \sin(c|k|t) H(t). \tag{5.66}
\]
It is easy to check that this is the Fourier transform of
\[
u(x, t) = \frac{1}{2c} \left[ H(x + ct) - H(x - ct) \right] H(t). \tag{5.67}
\]
The wave equation in 3 + 1 dimensions is
\[
\left( \frac{\partial^2}{\partial t^2} - c^2 \nabla^2 \right) u = \delta(x)\delta(t). \tag{5.68}
\]
Fourier transform. We get
\[
\left( -\omega^2 + c^2 k^2 \right) \hat{u}(k, \omega) = 1. \tag{5.69}
\]
This has solution
\[
\hat{u}(k, \omega) = -\frac{1}{(\omega - i0)^2 - c^2 k^2}. \tag{5.70}
\]
Again we can write this as
\[
\hat{u}(k, \omega) = -\frac{1}{2c|k|} \left[ \frac{1}{\omega - i0 - c|k|} - \frac{1}{\omega - i0 + c|k|} \right]. \tag{5.71}
\]
Thus we have again
\[
\hat{u}(k, t) = -\frac{1}{2c|k|}\, iH(t) \left[ e^{ic|k|t} - e^{-ic|k|t} \right] = \frac{1}{c|k|} \sin(c|k|t) H(t). \tag{5.72}
\]
However the difference is that the $k$ variable is three-dimensional. It is easy to check that this is the Fourier transform of
\[
u(x, t) = \frac{1}{4\pi c |x|}\, \delta(|x| - ct) H(t). \tag{5.73}
\]
This is a beautiful formula. It represents an expanding sphere of influence, going into the future. Inside the sphere it is dark. The solution has an even more beautiful expression that exhibits the symmetry:
\[
u(x, t) = \frac{1}{2\pi c}\, \delta(x^2 - c^2 t^2) H(t). \tag{5.74}
\]
5.10 Homogeneous solutions of the wave equation
The polynomial $\omega^2 - c^2 k^2$ vanishes on an entire cone, so it is not surprising that the wave equation has a number of interesting homogeneous solutions. The most important ones are
\[
\hat{u}(k, t) = \frac{1}{c|k|} \sin(c|k|t) \tag{5.75}
\]
and its derivative
\[
\hat{v}(k, t) = \cos(c|k|t). \tag{5.76}
\]
These are the solutions that are used in constructing solutions of the initial value problem.

It is interesting to see what these solutions look like in the frequency representation. The result is
\[
\hat{u}(k, \omega) = -\pi i \frac{1}{c|k|} \left[ \delta(\omega - c|k|) - \delta(\omega + c|k|) \right] = -2\pi i\, \delta(\omega^2 - c^2 k^2)\, \mathrm{sign}(\omega) \tag{5.77}
\]
and
\[
\hat{v}(k, \omega) = i\omega\, \hat{u}(k, \omega) = \pi \left[ \delta(\omega - c|k|) + \delta(\omega + c|k|) \right] = 2\pi |\omega|\, \delta(\omega^2 - c^2 k^2). \tag{5.78}
\]
5.11 Problems
1. Show that $1/x^{1/3}$ is a locally integrable function and thus defines a distribution. Show that its distribution derivative is
\[
\frac{d}{dx} \frac{1}{x^{1/3}} = -\frac{1}{3} \frac{1}{x^{4/3}} + c\,\delta(x), \tag{5.79}
\]
where
\[
c = \frac{1}{3} \int_{-\infty}^{\infty} \frac{1}{x^{4/3}}\, dx. \tag{5.80}
\]
Hint: To make this rigorous, consider $x/(x^2 + \epsilon^2)^{2/3}$.
2. Show that $1/|x|^{1/3}$ is a locally integrable function and thus defines a distribution. Show that its distribution derivative is
\[
\frac{d}{dx} \frac{1}{|x|^{1/3}} = -\frac{1}{3} \frac{1}{x^{4/3}}\, \mathrm{sign}(x). \tag{5.81}
\]
The right hand side is not locally integrable. Explain the definition of the right hand side as a distribution. Hint: To make this rigorous, consider $1/(x^2 + \epsilon^2)^{1/6}$.
3. Discuss the contrast between the results in the last two problems. It
may help to draw some graphs of the functions that approximate these
distributions.
4. Let $m > 0$. Use Fourier transforms to find a tempered distribution $u$ that is the fundamental solution of the equation
\[
-\frac{d^2}{dx^2} u + m^2 u = \delta(x). \tag{5.82}
\]
Is this a unique tempered distribution solution? Explain.
5. Let $m > 0$. Consider Euclidean space of dimension $n = 3$. Use Fourier transforms to find a tempered distribution $u$ that is the fundamental solution of the equation
\[
-\nabla^2 u + m^2 u = \delta(x). \tag{5.83}
\]
5.12 Answers to ﬁrst two problems
1. The function $1/x^{1/3}$ is locally integrable, while its pointwise derivative $-\frac{1}{3}\frac{1}{x^{4/3}}$ is not. But we do have
\[
\frac{d}{dx} \frac{x}{(x^2 + \epsilon^2)^{2/3}} = -\frac{1}{3} \frac{x^2}{(x^2+\epsilon^2)^{5/3}} + \frac{\epsilon^2}{(x^2+\epsilon^2)^{5/3}}. \tag{5.84}
\]
Let
\[
c_\epsilon = \int_{-\infty}^{\infty} \frac{1}{3} \frac{x^2}{(x^2+\epsilon^2)^{5/3}}\, dx, \tag{5.85}
\]
which is easily seen to be proportional to $1/\epsilon^{1/3}$. From the fundamental theorem of calculus
\[
\int_{-\infty}^{\infty} \frac{\epsilon^2}{(x^2+\epsilon^2)^{5/3}}\, dx = c_\epsilon. \tag{5.86}
\]
Write
\[
\delta_\epsilon(x) = \frac{1}{c_\epsilon} \frac{\epsilon^2}{(x^2+\epsilon^2)^{5/3}}. \tag{5.87}
\]
Then $\delta_\epsilon(x) \to \delta(x)$ in the sense of distributions. Furthermore,
\[
\int_{-\infty}^{\infty} \frac{d}{dx} \frac{1}{x^{1/3}}\, \phi(x)\, dx = \lim_{\epsilon \to 0} \int_{-\infty}^{\infty} \left[ -\frac{1}{3} \frac{x^2}{(x^2+\epsilon^2)^{5/3}} + c_\epsilon\, \delta_\epsilon(x) \right] \phi(x)\, dx. \tag{5.88}
\]
Since for $\phi$ smooth enough
\[
\int c_\epsilon\, \delta_\epsilon(x) \left[ \phi(x) - \phi(0) \right] dx \to 0, \tag{5.89}
\]
this may be written in the simple form
\[
\int_{-\infty}^{\infty} \frac{d}{dx} \frac{1}{x^{1/3}}\, \phi(x)\, dx = \int_{-\infty}^{\infty} -\frac{1}{3} \frac{1}{x^{4/3}} \left[ \phi(x) - \phi(0) \right] dx. \tag{5.90}
\]
Notice that the integrand is integrable for each test function $\phi$.
2. The function $1/|x|^{1/3}$ is locally integrable, while its pointwise derivative $-\frac{1}{3}\frac{1}{x^{4/3}}\,\mathrm{sign}(x)$ is not. We have
\[
\frac{d}{dx} \frac{1}{(x^2+\epsilon^2)^{1/6}} = -\frac{1}{3} \frac{x}{(x^2+\epsilon^2)^{7/6}}. \tag{5.91}
\]
Since this derivative is an odd function, we have
\[
\int_{-\infty}^{\infty} \frac{d}{dx} \frac{1}{|x|^{1/3}}\, \phi(x)\, dx = \lim_{\epsilon \to 0} \int_{-\infty}^{\infty} -\frac{1}{3} \frac{x}{(x^2+\epsilon^2)^{7/6}} \left[ \phi(x) - \phi(0) \right] dx. \tag{5.92}
\]
We can write this as
\[
\int_{-\infty}^{\infty} \frac{d}{dx} \frac{1}{|x|^{1/3}}\, \phi(x)\, dx = \int_{-\infty}^{\infty} -\frac{1}{3} \frac{1}{x^{4/3}}\, \mathrm{sign}(x) \left[ \phi(x) - \phi(0) \right] dx. \tag{5.93}
\]
Again the integrand is integrable.
Chapter 6
Bounded Operators
6.1 Introduction
This chapter deals with bounded linear operators. These are bounded linear transformations of a Hilbert space into itself. In fact, the chapter treats four classes of operators: finite rank, Hilbert-Schmidt, compact, and bounded. Every finite rank operator is Hilbert-Schmidt. Every Hilbert-Schmidt operator is compact. Every compact operator is bounded.

We shall see in the next chapter that it is also valuable to look at an even broader class of operators, those that are closed and densely defined. Every bounded operator (everywhere defined) is closed and densely defined. However the present chapter treats only bounded operators.
6.2 Bounded linear operators
Let H be a Hilbert space (a vector space with an inner product that is a complete metric space). A linear transformation K : H → H is said to be bounded if it maps bounded sets into bounded sets. This is equivalent to there being a constant M with
\[
\|Ku\| \le M\|u\|. \tag{6.1}
\]
Let M₂ be the least such M. This is called the uniform norm of K and is written ‖K‖_∞, or simply ‖K‖ when the context makes this clear. [The subscript in M₂ is supposed to remind us that we are dealing with Hilbert spaces like L². The subscript in ‖K‖_∞, on the other hand, tells us that we are looking at a least upper bound.]
If K and L are bounded operators, their sum K + L and product KL are bounded. Furthermore, we have ‖K + L‖_∞ ≤ ‖K‖_∞ + ‖L‖_∞ and ‖KL‖_∞ ≤ ‖K‖_∞ ‖L‖_∞.
A bounded operator K always has an adjoint K* that is a bounded operator.
It is the unique operator with the property that
\[
(u, K^* v) = (Ku, v) \tag{6.2}
\]
for all u, v in H. Furthermore K** = K, and ‖K*‖_∞ = ‖K‖_∞. For products we have (KL)* = L*K*, in the opposite order.
It is also possible to define the adjoint of an operator from one Hilbert space to another. Here is a special case. Let g be a vector in the Hilbert space H. Then g defines a linear transformation from C to H by sending z to the vector zg. The adjoint is a linear transformation from H to C, denoted by g*. It is the transformation that sends v to (g, v), so g*v = (g, v). The adjointness relation is z̄ g*v = (zg, v), which is just z̄(g, v) = (zg, v).
Let f be another vector. Define the operator K from H to H by Ku = f(g, u), that is, K = fg*. Then the adjoint of K is K* = gf*.
Example: Hilbert-Schmidt integral operators. Let
\[
(Kf)(x) = \int k(x, y) f(y)\,dy, \tag{6.3}
\]
with k in L², that is,
\[
\|k\|_2^2 = \int\!\!\int |k(x, y)|^2\,dx\,dy < \infty. \tag{6.4}
\]
Then K is bounded with norm ‖K‖_∞ ≤ ‖k‖₂.
Proof: Fix x. Apply the Schwarz inequality to the integral over y in the definition of the operator. This gives |(Kf)(x)|² ≤ ∫|k(x, y)|² dy ∫|f(y)|² dy. Now integrate over x.
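A discrete analogue makes this bound concrete: for a matrix acting on a finite dimensional ℓ², the Frobenius norm plays the role of ‖k‖₂. The sketch below (a hypothetical random matrix of arbitrary size, not from the text) estimates the operator norm by power iteration and checks that it never exceeds the Frobenius norm.

```python
import math
import random

random.seed(0)
n = 20
a = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]

# Matrix analogue of ||k||_2: the Frobenius norm.
hs_norm = math.sqrt(sum(x * x for row in a for x in row))

def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]

# Power iteration on A^T A; at every stage ||A v|| with ||v|| = 1 is a
# lower estimate of the operator norm, so the comparison below is safe
# even before the iteration has converged.
v = [1.0] * n
for _ in range(200):
    av = mat_vec(a, v)
    atav = [sum(a[i][j] * av[i] for i in range(n)) for j in range(n)]
    scale = math.sqrt(sum(x * x for x in atav))
    v = [x / scale for x in atav]
op_norm_est = math.sqrt(sum(x * x for x in mat_vec(a, v)))
```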
Example: Interpolation integral operators. Let K be an integral operator such that
\[
M_1 = \sup_y \int |k(x, y)|\,dx < \infty \tag{6.5}
\]
and
\[
M_\infty = \sup_x \int |k(x, y)|\,dy < \infty. \tag{6.6}
\]
(Here we choose to think of k(x, y) as a function of x and y, not as a Schwartz distribution like δ(x − y).) Let 1/p + 1/q = 1. Then for each p with 1 ≤ p ≤ ∞ the norm M_p of K as an operator on L^p is bounded by M_p ≤ M₁^{1/p} M_∞^{1/q}. In particular, as an operator on L² the norm M₂ = ‖K‖_∞ of K is bounded by
\[
M_2 \le \sqrt{M_1 M_\infty}. \tag{6.7}
\]
The reason for the name interpolation operator (which is not standard) is that the bound interpolates for all p between the extreme cases p = 1 (where the bound is M₁) and p = ∞ (where the bound is M_∞).
Proof: Write
\[
|(Kf)(x)| \le \int |k(x, y)|\,|f(y)|\,dy = \int |k(x, y)|^{1/q}\,|k(x, y)|^{1/p}\,|f(y)|\,dy. \tag{6.8}
\]
Apply the Hölder inequality to the integral. This gives
\[
|(Kf)(x)| \le \left(\int |k(x, y)|\,dy\right)^{1/q} \left(\int |k(x, y)|\,|f(y)|^p\,dy\right)^{1/p} \le M_\infty^{1/q} \left(\int |k(x, y)|\,|f(y)|^p\,dy\right)^{1/p}. \tag{6.9}
\]
It follows that
\[
\int |(Kf)(x)|^p\,dx \le M_\infty^{p/q} \int\!\!\int |k(x, y)|\,|f(y)|^p\,dy\,dx \le M_\infty^{p/q} M_1 \int |f(y)|^p\,dy. \tag{6.10}
\]
Take the pth root to get the bound
\[
\left(\int |(Kf)(x)|^p\,dx\right)^{1/p} \le M_\infty^{1/q} M_1^{1/p} \left(\int |f(y)|^p\,dy\right)^{1/p}. \tag{6.11}
\]
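The p = 2 bound can be sampled in a matrix model, where column sums play the role of M₁ (the integral over x) and row sums the role of M_∞ (the integral over y). The matrix and the test vectors below are arbitrary choices, not from the text.

```python
import math
import random

random.seed(1)
n = 30
k = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]

m1 = max(sum(abs(k[i][j]) for i in range(n)) for j in range(n))     # column sums ~ M_1
m_inf = max(sum(abs(k[i][j]) for j in range(n)) for i in range(n))  # row sums ~ M_inf
bound = math.sqrt(m1 * m_inf)                                       # the bound in (6.7)

def l2_stretch(v):
    # ||K v|| / ||v|| in the Euclidean (l^2) norm
    kv = [sum(k[i][j] * v[j] for j in range(n)) for i in range(n)]
    return math.sqrt(sum(x * x for x in kv)) / math.sqrt(sum(x * x for x in v))

# Every sampled stretch is at most the operator norm, hence at most the bound.
worst = max(l2_stretch([random.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(200))
```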
For a bounded operator K the set of points µ such that (µI − K)^{-1} is a bounded operator is called the resolvent set of K. The complement of the resolvent set is called the spectrum of K. When one is only interested in values of µ ≠ 0 it is common to define λ = 1/µ and write
\[
\mu(\mu I - K)^{-1} = (I - \lambda K)^{-1}. \tag{6.12}
\]
While one must be alert to which convention is being employed, this should be recognized as a trivial relabeling.
Theorem. If the complex number µ satisfies ‖K‖_∞ < |µ|, then µ is in the resolvent set of K.
Proof: This is the Neumann series expansion
\[
(\mu I - K)^{-1} = \frac{1}{\mu} \sum_{j=0}^{\infty} \frac{1}{\mu^j} K^j. \tag{6.13}
\]
The jth term has norm ‖K^j‖_∞/|µ|^j ≤ (‖K‖_∞/|µ|)^j. This is the jth term of a convergent geometric series.
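A small numerical sketch of the Neumann series (a hypothetical 2×2 matrix with ‖K‖ < 1 = |µ|; partial sums of (6.13) are compared against the defining property of the inverse):

```python
def matmul(p, q):
    return [[sum(p[i][m] * q[m][j] for m in range(2)) for j in range(2)] for i in range(2)]

k = [[0.0, 0.3], [0.2, 0.1]]      # hypothetical K with ||K|| < 1
mu = 1.0                          # |mu| exceeds ||K||, so the series converges

# Partial sums of (mu I - K)^(-1) = sum_{j>=0} K^j / mu^(j+1).
inv = [[0.0, 0.0], [0.0, 0.0]]
power = [[1.0, 0.0], [0.0, 1.0]]  # holds (K/mu)^j
for _ in range(80):
    for r in range(2):
        for c in range(2):
            inv[r][c] += power[r][c] / mu
    power = matmul(power, [[e / mu for e in row] for row in k])

# (mu I - K) times the partial sum should be (numerically) the identity.
mi_k = [[mu - k[0][0], -k[0][1]], [-k[1][0], mu - k[1][1]]]
prod = matmul(mi_k, inv)
residual = max(abs(prod[r][c] - (1.0 if r == c else 0.0))
               for r in range(2) for c in range(2))
```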
Theorem. If for some n ≥ 1 the complex number µ satisfies ‖K^n‖_∞ < |µ|^n, then µ is in the resolvent set of K.
This is the Neumann series again. But now we only require the estimate for the nth power of the operator. Write j = an + b, where 0 ≤ b < n. Then ‖K^j‖_∞/|µ|^j is bounded by (‖K‖_∞/|µ|)^b (‖K^n‖_∞/|µ|^n)^a. So this is the sum of n convergent geometric series, one for each value of b between 0 and n − 1.
The spectral radius of an operator is the largest value of |µ|, where µ is in the spectrum. The estimate of this theorem gives an upper bound on the spectral radius.
This is a remarkable result, since it has no analog for scalars. However for a
matrix the norm of a power can be considerably smaller than the corresponding
power of the norm, so this result can be quite useful. The most spectacular
application is to Volterra integral operators.
Example: Volterra integral operators. Let H = L²(0, 1) and
\[
(Kf)(x) = \int_0^x k(x, y) f(y)\,dy. \tag{6.14}
\]
Suppose that |k(x, y)| ≤ C for 0 ≤ y ≤ x ≤ 1 and k(x, y) = 0 for 0 ≤ x < y ≤ 1. Then
\[
\|K^n\|_\infty \le \frac{C^n}{(n-1)!}. \tag{6.15}
\]
As a consequence every complex number µ ≠ 0 is in the resolvent set of K.
Proof: Each power K^n with n ≥ 1 is also a Volterra integral operator. We claim that it has kernel k_n(x, y) with
\[
|k_n(x, y)| \le \frac{C^n}{(n-1)!}\,(x - y)^{n-1} \tag{6.16}
\]
for 0 ≤ y ≤ x ≤ 1 and zero otherwise. This follows by induction.
Since |k_n(x, y)| ≤ C^n/(n − 1)!, the Hilbert-Schmidt norm of K^n is also bounded by C^n/(n − 1)!. However this goes to zero faster than any geometric sequence. So the Neumann series converges.
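The factorial decay can be seen numerically. Below, the Volterra operator with kernel k ≡ 1 (so C = 1) is discretized by a left-endpoint rule; the grid size is an arbitrary choice. The Frobenius (Hilbert-Schmidt) norm of the nth matrix power stays below C^n/(n − 1)!.

```python
import math

n = 60                              # grid points on (0,1); an arbitrary choice
h = 1.0 / n
# Left-endpoint discretization of (Kf)(x) = int_0^x f(y) dy  (kernel k = 1, C = 1).
a = [[h if j < i else 0.0 for j in range(n)] for i in range(n)]

def matmul(p, q):
    return [[sum(p[i][m] * q[m][j] for m in range(n)) for j in range(n)] for i in range(n)]

def frob(p):
    # Frobenius norm, the discrete Hilbert-Schmidt norm
    return math.sqrt(sum(x * x for row in p for x in row))

powers = [a]
for _ in range(5):
    powers.append(matmul(powers[-1], a))

hs_norms = [frob(p) for p in powers]                          # ~ ||K^m||_2, m = 1..6
bounds = [1.0 / math.factorial(m - 1) for m in range(1, 7)]   # C^m/(m-1)! with C = 1
```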
The norm of a bounded operator can always be found by calculating the norm of a self-adjoint bounded operator. In fact, ‖K‖²_∞ = ‖K*K‖_∞. Furthermore, the norm of a self-adjoint operator is its spectral radius. This shows that to calculate the norm of a bounded operator K exactly, one need only calculate the spectral radius of K*K and take the square root. Unfortunately, this can be a difficult problem. This is why it is good to have other ways of estimating the norm.
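In finite dimensions this recipe is exact. The sketch below (a 2×2 real matrix standing in for K; the matrix is an arbitrary choice, not from the text) computes the norm as the square root of the largest eigenvalue of KᵀK and checks it against the maximal stretch of unit vectors.

```python
import math

def op_norm_2x2(k):
    # ||K|| = sqrt(largest eigenvalue of K^T K) for a real 2x2 matrix K.
    (a, b), (c, d) = k
    s11, s12, s22 = a * a + c * c, a * b + c * d, b * b + d * d  # S = K^T K
    tr, det = s11 + s22, s11 * s22 - s12 * s12
    return math.sqrt((tr + math.sqrt(tr * tr - 4.0 * det)) / 2.0)

k = [[1.0, 2.0], [0.0, 1.0]]
norm = op_norm_2x2(k)              # exact value for this matrix: 1 + sqrt(2)

# No unit vector is stretched by more than `norm` (sampled over the circle).
stretch = max(
    math.hypot(k[0][0] * math.cos(t) + k[0][1] * math.sin(t),
               k[1][0] * math.cos(t) + k[1][1] * math.sin(t))
    for t in (i * 0.001 for i in range(6284))
)
```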
6.3 Compact operators
Let H be a Hilbert space. A subset S is totally bounded if for every ε > 0 there is a cover of S by a finite number N_ε of balls of radius ε. A totally bounded set is bounded.
Thus for instance, if H is finite dimensional with dimension n and S is a cube of side L, then N_ε ≈ (L/ε)^n. This number N_ε is finite, though it increases as ε decreases. One expects a cube or a ball to be totally bounded only in finite dimensional situations.
However the situation is different for a rectangular shaped region in infinite dimensions. Say that the sides are L₁, L₂, L₃, ..., L_k, ..., and that these decrease to zero. Consider ε > 0. Pick k so large that L_{k+1}, L_{k+2}, ... are all less than ε. Then N_ε ≈ (L₁/ε)(L₂/ε)⋯(L_k/ε). This can increase very rapidly as ε decreases. But such a region, which is fat only in finitely many dimensions and increasingly thin in all the others, is totally bounded.
A linear transformation K : H → H is said to be compact if it maps bounded sets into totally bounded sets. (One can take the bounded set to be the unit ball.) A compact operator is bounded.
If K and L are compact operators, then so is their sum K + L. If K is
compact and L is bounded, then KL and LK are compact.
If the self-adjoint operator K*K is compact, then K is compact.
Proof: Let ε > 0. Then there are u₁, ..., u_k in the unit ball such that for each u in the unit ball there is a j with ‖K*K(u − u_j)‖ < ε/2. But this says that
\[
\|K(u - u_j)\|^2 = (K(u - u_j), K(u - u_j)) = ((u - u_j), K^*K(u - u_j)) \le \|u - u_j\|\,\|K^*K(u - u_j)\| < \epsilon. \tag{6.17}
\]
The adjoint K* of a compact operator K is a compact operator.
Proof: Say that K is compact. Then since K* is bounded, it follows that KK* is compact. It follows from the last result that K* is compact.
Approximation theorem. If K_n is a sequence of compact operators, K is a bounded operator, and ‖K_n − K‖_∞ → 0 as n → ∞, then K is also compact.
The proof is a classical ε/3 argument. Let ε > 0. Choose n so large that ‖K − K_n‖_∞ < ε/3. Since K_n is compact, there are finitely many vectors u₁, ..., u_k in the unit ball such that every vector K_n u with u in the unit ball is within ε/3 of some K_n u_j. Consider an arbitrary u in the unit ball and pick the corresponding u_j. Since
\[
Ku - Ku_j = (K - K_n)u + (K_n u - K_n u_j) + (K_n - K)u_j, \tag{6.18}
\]
it follows that
\[
\|Ku - Ku_j\| \le \|(K - K_n)u\| + \|K_n u - K_n u_j\| + \|(K_n - K)u_j\| \le \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon. \tag{6.19}
\]
This shows that the image of the unit ball under K is totally bounded.
Spectral properties of compact operators. Let K be a compact operator. The
only nonzero points in the spectrum of K are eigenvalues of ﬁnite multiplicity.
The only possible accumulation point of the spectrum is 0.
Notice that there is no general claim that the eigenvectors of K form a basis
for the Hilbert space. The example of a Volterra integral operator provides a
counterexample: The only point in the spectrum is zero.
Let K be a compact operator. We want to consider equations of the form
\[
\mu u = f + Ku, \tag{6.20}
\]
where µ is a parameter. If µ = 0, then this is an equation of the first kind. If µ ≠ 0, then this is an equation of the second kind. Very often an equation of the second kind is written u = f₁ + λKu, where λ = 1/µ and f₁ = λf.
The condition for a unique solution of an equation of the first kind is the existence of the inverse K^{-1}, and the solution is u = −K^{-1}f. Thus the issue for an equation of the first kind is whether µ = 0 is an eigenvalue of K. If µ = 0 is not an eigenvalue, the operator K^{-1} will typically be an unbounded operator that is only defined on a linear subspace of the Hilbert space.
The condition for a unique solution of an equation of the second kind is the existence of the inverse (µI − K)^{-1} for a particular value of µ ≠ 0, and the solution is u = (µI − K)^{-1}f. The issue for an equation of the second kind is whether the given µ ≠ 0 is an eigenvalue of K. If µ is not an eigenvalue, then (µI − K)^{-1} will be a bounded operator. Thus equations of the second kind are much nicer. This is because compact operators have much better spectral properties away from zero.
Spectral theorem for compact self-adjoint operators. Let K be a compact self-adjoint operator. Then there is an orthonormal basis u_j of eigenvectors of K. The eigenvalues µ_j of K are real. Each nonzero eigenvalue is of finite multiplicity. The only possible accumulation point of the eigenvalues is zero. The operator K has the representation
\[
Kf = \sum_j \mu_j u_j (u_j, f). \tag{6.21}
\]
In abbreviated form this is
\[
K = \sum_j \mu_j u_j u_j^*. \tag{6.22}
\]
There is yet another way of writing the spectral theorem for compact operators. Define the unitary operator U from H to ℓ² by (Uf)_j = (u_j, f). Its inverse is the unitary operator from ℓ² to H given by U*c = Σ_j c_j u_j. Let M be the diagonal operator from ℓ² to ℓ² defined by multiplication by µ_j. Then
\[
K = U^* M U. \tag{6.23}
\]
The norm of a compact operator can always be found by calculating the norm of a self-adjoint compact operator. In fact, ‖K‖²_∞ = ‖K*K‖_∞. Furthermore, the norm of a compact self-adjoint operator is its spectral radius, which in this case is the largest value of |µ|, where µ is an eigenvalue. Unfortunately, this can be a difficult computation.
Singular value decomposition for compact operators. Let K be a compact operator. Then there is an orthonormal family u_j and an orthonormal family w_j and a sequence of numbers χ_j ≥ 0 (the singular values of K) approaching zero such that the operator K has the representation
\[
Kf = \sum_j \chi_j w_j (u_j, f). \tag{6.24}
\]
In abbreviated form this is
\[
K = \sum_j \chi_j w_j u_j^*. \tag{6.25}
\]
Sketch of proof: The operator K*K is self-adjoint with positive eigenvalues χ_j². We can write
\[
K^*Kf = \sum_j \chi_j^2 u_j (u_j, f). \tag{6.26}
\]
Then
\[
\sqrt{K^*K}\,f = \sum_j \chi_j u_j (u_j, f). \tag{6.27}
\]
Since ‖Kf‖ = ‖√(K*K) f‖ for each f, we can write K = V√(K*K), where ‖Vg‖ = ‖g‖ for all g = √(K*K) f in the range of √(K*K). This is the well-known polar decomposition. Then
\[
Kf = \sum_j \chi_j w_j (u_j, f), \tag{6.28}
\]
where w_j = V u_j.
There is another way of writing the singular value decomposition of K. Let √(K*K) = U*DU be the spectral representation of √(K*K), where D is diagonal with entries χ_j ≥ 0. Then K = VU*DU = WDU.
It follows from the approximation theorem and from the singular value decomposition that an operator is compact if and only if it is a norm limit of a sequence of finite rank operators.
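A 2×2 real matrix (an arbitrary stand-in for a compact operator, not from the text) illustrates both the singular value decomposition and the finite rank approximation: dropping all but the top singular term leaves an error whose operator norm is exactly the next singular value.

```python
import math

def op_norm(m):
    # Largest singular value: sqrt of the top eigenvalue of M^T M.
    (a, b), (c, d) = m
    s11, s12, s22 = a * a + c * c, a * b + c * d, b * b + d * d
    tr, det = s11 + s22, s11 * s22 - s12 * s12
    return math.sqrt((tr + math.sqrt(tr * tr - 4.0 * det)) / 2.0)

k = [[2.0, 0.0], [1.0, 1.0]]

# Singular values from the eigenvalues of S = K^T K (assumes s12 != 0 here).
(a, b), (c, d) = k
s11, s12, s22 = a * a + c * c, a * b + c * d, b * b + d * d
tr, det = s11 + s22, s11 * s22 - s12 * s12
disc = math.sqrt(tr * tr - 4.0 * det)
chi1, chi2 = math.sqrt((tr + disc) / 2.0), math.sqrt((tr - disc) / 2.0)

# Top unit eigenvector u1 of S, and w1 = K u1 / chi1.
v = (s12, (tr + disc) / 2.0 - s11)
nv = math.hypot(v[0], v[1])
u1 = (v[0] / nv, v[1] / nv)
w1 = ((a * u1[0] + b * u1[1]) / chi1, (c * u1[0] + d * u1[1]) / chi1)

# Rank-one truncation K1 = chi1 w1 u1^*; the error operator has norm chi2.
k1 = [[chi1 * w1[i] * u1[j] for j in range(2)] for i in range(2)]
err = [[k[i][j] - k1[i][j] for j in range(2)] for i in range(2)]
```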
Notice that this theorem gives a fairly clear picture of what a compact operator acting on L² looks like. It is an integral operator with kernel
\[
k(x, y) = \sum_j \chi_j w_j(x) u_j(y), \tag{6.29}
\]
where the χ_j → 0. Of course this representation may be difficult to find in practice. What happens in the case of a Hilbert-Schmidt integral operator is special: the χ_j go to zero sufficiently rapidly that Σ_j |χ_j|² < ∞.
6.4 Hilbert-Schmidt operators
Let H be a Hilbert space. For a positive bounded self-adjoint operator B the trace is defined by tr B = Σ_j (e_j, Be_j), where the e_j form an orthonormal basis. A bounded linear operator K : H → H is said to be Hilbert-Schmidt if tr(K*K) < ∞. The Hilbert-Schmidt norm is ‖K‖₂ = √(tr(K*K)).
Theorem. If K is a Hilbert-Schmidt integral operator with (Kf)(x) = ∫ k(x, y) f(y) dy, then
\[
\|K\|_2^2 = \int\!\!\int |k(x, y)|^2\,dx\,dy. \tag{6.30}
\]
Proof: Let {e₁, e₂, ...} be an orthonormal basis for H. We can always expand a vector in this basis via
\[
u = \sum_j e_j (e_j, u). \tag{6.31}
\]
(Remember the convention adopted here that inner products are linear in the second variable, conjugate linear in the first variable.) Write
\[
Kf = \sum_i e_i \sum_j (e_i, Ke_j)(e_j, f). \tag{6.32}
\]
The matrix elements (e_i, Ke_j) satisfy
\[
\sum_i \sum_j |(e_i, Ke_j)|^2 = \sum_j \sum_i (Ke_j, e_i)(e_i, Ke_j) = \sum_j \|Ke_j\|^2 = \sum_j (e_j, K^*Ke_j) = \mathrm{tr}(K^*K). \tag{6.33}
\]
So the kernel of the integral operator K is
\[
k(x, y) = \sum_i \sum_j (e_i, Ke_j)\, e_i(x)\, e_j(y). \tag{6.34}
\]
This sum is convergent in L²(dx dy).
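The identity tr(K*K) = Σ_{ij} |(e_i, Ke_j)|² in (6.33) can be checked directly for a matrix (hypothetical random complex entries, an arbitrary choice):

```python
import random

random.seed(2)
n = 4
k = [[complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]
     for _ in range(n)]

# K* is the conjugate transpose; form K* K and take its trace.
kstar = [[k[j][i].conjugate() for j in range(n)] for i in range(n)]
ksk = [[sum(kstar[i][m] * k[m][j] for m in range(n)) for j in range(n)] for i in range(n)]
trace = sum(ksk[j][j] for j in range(n)).real

# Sum of squared moduli of the matrix elements.
frob_sq = sum(abs(k[i][j]) ** 2 for i in range(n) for j in range(n))
```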
A Hilbert-Schmidt operator is a bounded operator. It is always true that ‖K‖_∞ ≤ ‖K‖₂.
Theorem. A Hilbert-Schmidt operator is compact.
Proof: Let K be a Hilbert-Schmidt operator. Then K is given by a square summable matrix. So K may be approximated by a sequence of finite-rank operators K_n such that ‖K_n − K‖₂ → 0. In particular, ‖K_n − K‖_∞ → 0. Since each K_n is compact, it follows from the approximation theorem that K is compact.
If K and L are Hilbert-Schmidt operators, their sum K + L is a Hilbert-Schmidt operator and ‖K + L‖₂ ≤ ‖K‖₂ + ‖L‖₂. If K is a Hilbert-Schmidt operator and L is a bounded operator, then the products KL and LK are Hilbert-Schmidt. Furthermore, ‖KL‖₂ ≤ ‖K‖₂‖L‖_∞ and ‖LK‖₂ ≤ ‖L‖_∞‖K‖₂.
The adjoint of a Hilbert-Schmidt operator is a Hilbert-Schmidt operator. Furthermore, ‖K*‖₂ = ‖K‖₂.
Notice that the Hilbert-Schmidt norm of a bounded operator is defined in terms of a self-adjoint operator. In fact, ‖K‖₂ is the square root of the trace of the self-adjoint operator K*K, and the trace is the sum of the eigenvalues. However we do not need to calculate the eigenvalues, since, as we have seen, there are much easier ways to calculate the trace.
6.5 Problems
1. Let H = L² be the Hilbert space of square integrable functions on the line. Create an example of a Hilbert-Schmidt operator that is not an interpolation operator.
2. Let H = L² be the Hilbert space of square integrable functions on the line. Create an example of an interpolation operator that is not a Hilbert-Schmidt operator. Make the example so that the operator is not compact.
3. Find an example of a compact bounded operator on H = L² that is neither a Hilbert-Schmidt nor an interpolation operator.
4. Find an example of a bounded operator on H = L² that is neither a Hilbert-Schmidt nor an interpolation operator, and that is also not a compact operator.
5. Let H be a Hilbert space. Give an example of a Hilbert-Schmidt operator for which the spectral radius is equal to the uniform norm.
6. Let H be a Hilbert space. Give an example of a Hilbert-Schmidt operator for which the spectral radius is very different from the uniform norm.
7. Let H be a Hilbert space. Is it possible for a Hilbert-Schmidt operator to have its Hilbert-Schmidt norm equal to its uniform norm? Describe all possible such situations.
6.6 Finite rank operators
Let H be a Hilbert space. A bounded linear transformation K : H → H is said to be finite rank if its range is finite dimensional. The dimension of the range of K is called the rank of K. A finite rank operator may be represented in the form
\[
Kf = \sum_j z_j (u_j, f), \tag{6.35}
\]
where the sum is finite. In a more abbreviated notation we could write
\[
K = \sum_j z_j u_j^*. \tag{6.36}
\]
Thus in L² this is an integral operator with kernel
\[
k(x, y) = \sum_j z_j(x) u_j(y). \tag{6.37}
\]
If K and L are ﬁnite rank operators, then so is their sum K + L. If K is
ﬁnite rank and L is bounded, then KL and LK are ﬁnite rank.
The adjoint K* of a finite rank operator K is finite rank. The two operators have the same rank.
Every finite rank operator is Hilbert-Schmidt and hence compact. If K is a compact operator, then there exists a sequence of finite rank operators K_n such that ‖K_n − K‖_∞ → 0 as n → ∞.
If K is a finite rank operator and µ ≠ 0, then the calculation of (µI − K)^{-1} or of (I − λK)^{-1} may be reduced to a finite-dimensional matrix problem in the finite dimensional space R(K). This is because K leaves R(K) invariant. Therefore if µ = 1/λ is not an eigenvalue of K acting in R(K), then there is an inverse (I − λK)^{-1} acting in R(K). However this gives a corresponding inverse in the original Hilbert space, by the formula
\[
(I - \lambda K)^{-1} = I + (I - \lambda K)^{-1}\lambda K. \tag{6.38}
\]
Explicitly, to solve
\[
u = \lambda K u + f, \tag{6.39}
\]
write
\[
u = (I - \lambda K)^{-1} f = f + (I - \lambda K)^{-1}\lambda K f. \tag{6.40}
\]
To solve this, let w be the second term on the right hand side, so that u = f + w. Then (I − λK)w = λKf. Write w = Σ_j a_j z_j. Then
\[
\sum_j a_j z_j - \lambda \sum_j z_j \Big(u_j, \sum_r a_r z_r\Big) = \lambda \sum_j z_j (u_j, f). \tag{6.41}
\]
Thus
\[
a_j - \lambda \sum_r (u_j, z_r)\, a_r = \lambda (u_j, f). \tag{6.42}
\]
This is a matrix equation that may be solved whenever 1/λ is not an eigenvalue of the matrix with entries (u_j, z_r).
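The reduction to a matrix equation can be carried out concretely. The sketch below uses a hypothetical rank-two operator on R⁵ (the vectors z_j and u_j, the value of λ, and f are all arbitrary choices, not from the text); substituting w = u − f = Σ_j a_j z_j into (I − λK)w = λKf gives a 2×2 linear system, solved here by Cramer's rule and checked against u = λKu + f.

```python
def dot(p, q):
    return sum(x * y for x, y in zip(p, q))

# Hypothetical rank-two data: K f = z_0 (u_0, f) + z_1 (u_1, f) on R^5.
z = [[1.0, 0.0, 2.0, 0.0, 0.0], [0.0, 1.0, 0.0, 1.0, 0.0]]
u_vecs = [[1.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 1.0]]
lam = 0.3
f = [1.0, 2.0, 3.0, 4.0, 5.0]

def apply_k(v):
    return [z[0][i] * dot(u_vecs[0], v) + z[1][i] * dot(u_vecs[1], v) for i in range(5)]

# The 2x2 system: a_j - lam * sum_r (u_j, z_r) a_r = lam * (u_j, f).
m = [[(1.0 if j == r else 0.0) - lam * dot(u_vecs[j], z[r]) for r in range(2)]
     for j in range(2)]
rhs = [lam * dot(u_vecs[j], f) for j in range(2)]
det = m[0][0] * m[1][1] - m[0][1] * m[1][0]   # nonzero when 1/lam is not an eigenvalue
a = [(rhs[0] * m[1][1] - m[0][1] * rhs[1]) / det,
     (m[0][0] * rhs[1] - m[1][0] * rhs[0]) / det]

u_sol = [f[i] + a[0] * z[0][i] + a[1] * z[1][i] for i in range(5)]
ku = apply_k(u_sol)
residual = max(abs(u_sol[i] - lam * ku[i] - f[i]) for i in range(5))
```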
6.7 Problems
It may help to recall that the problem of inverting I − λK is the same as the problem of showing that µ = 1/λ is not in the spectrum of K.
1. Consider functions in L²(−∞, ∞). Consider the integral equation
\[
f(x) - \lambda \int_{-\infty}^{\infty} \cos\left(\sqrt{x^2 + y^4}\right) e^{-x-y} f(y)\,dy = g(x).
\]
It is claimed that there exists r > 0 such that for every complex number λ with |λ| < r the equation has a unique solution. Prove or disprove. Interpret this as a statement about the spectrum of a certain linear operator.
2. Consider functions in L²(−∞, ∞). Consider the integral equation
\[
f(x) - \lambda \int_{-\infty}^{\infty} \cos\left(\sqrt{x^2 + y^4}\right) e^{-x-y} f(y)\,dy = g(x).
\]
It is claimed that there exists R < ∞ such that for every complex number λ with |λ| > R the equation does not have a unique solution. Prove or disprove. Interpret this as a statement about the spectrum of a certain linear operator.
3. Consider functions in L²(−∞, ∞). Consider the integral equation
\[
f(x) - \lambda \int_{-\infty}^{\infty} e^{-x-y} f(y)\,dy = g(x).
\]
Find all complex numbers λ for which this equation has a unique solution. Find the solution. Interpret this as a statement about the spectrum of a certain linear operator.
4. Consider functions in L²(0, 1). Consider the integral equation
\[
f(x) - \lambda \int_0^x f(y)\,dy = g(x).
\]
Find all complex numbers λ for which this equation has a unique solution. Find the solution. Interpret this as a statement about the spectrum of a certain linear operator. Hint: Differentiate. Solve a first order equation with a boundary condition.
5. Consider functions in L²(0, 1). Consider the integral equation
\[
f(x) - \lambda \left[\int_0^x y(1 - x) f(y)\,dy + \int_x^1 x(1 - y) f(y)\,dy\right] = g(x).
\]
Find all complex numbers λ for which this equation has a unique solution. Interpret this as a statement about the spectrum of a certain linear operator. Hint: The integral operator K has eigenfunctions sin(nπx). Verify this directly. This should also determine the eigenvalues.
Chapter 7
Densely Defined Closed Operators
7.1 Introduction
This chapter deals primarily with densely defined closed operators. Each everywhere defined bounded operator is in particular a densely defined closed operator.
If L is a densely defined closed operator, then so is its adjoint L*. Furthermore, L** = L. If both L and L* have trivial null spaces, then both L^{-1} and L*^{-1} are densely defined closed operators.
A complex number λ is in the resolvent set of a densely defined closed operator L if (L − λI)^{-1} is an everywhere defined bounded operator. A complex number λ is in the spectrum of L if it is not in the resolvent set. A complex number λ is in the spectrum of L if and only if λ̄ is in the spectrum of the adjoint L*.
It is common to divide the spectrum into three disjoint subsets: point spectrum, continuous spectrum, and residual spectrum. (This terminology is misleading, in that it treats limits of point spectrum as continuous spectrum.) In this treatment we divide the spectrum into four disjoint subsets: standard point spectrum, pseudo-continuous spectrum, anomalous point spectrum, and residual spectrum. The adjoint operation maps the first two kinds of spectra into themselves, but it reverses the latter two.
7.2 Subspaces
Let H be a Hilbert space. Let M be a vector subspace of H. The closure M̄ is also a vector subspace of H. The subspace M is said to be closed if M = M̄. The orthogonal complement M^⊥ is a closed subspace of H. Furthermore, (M̄)^⊥ = M^⊥. Finally M^{⊥⊥} = M̄. The nicest subspaces are closed subspaces. For a closed subspace M we always have M^{⊥⊥} = M.
A subspace M is dense in H if M̄ = H. This is equivalent to the condition M^⊥ = {0}.
7.3 Graphs
Let H be a Hilbert space. The direct sum H ⊕ H of H with itself consists of all ordered pairs [u, v], where u, v are each vectors in H. The inner product of [u, v] with [u′, v′] is
\[
([u, v], [u', v']) = (u, u') + (v, v'). \tag{7.1}
\]
A graph is a linear subspace of H ⊕ H. If we have two graphs L₁ and L₂ and if L₁ ⊂ L₂, then we say that L₁ is a restriction of L₂ or L₂ is an extension of L₁.
If L is a graph, then its domain D(L) is the set of all u in H such that there exists a v with [u, v] in L. Its range R(L) is the set of all v in H such that there exists u with [u, v] in L.
If L is a graph, then its null space N(L) is the set of all u in H such that [u, 0] is in L.
If L is a graph, then the inverse graph L^{-1} consists of all [v, u] such that [u, v] is in L.
We say that a graph L is an operator if [0, v] in L implies v = 0. It is easy to see that this is equivalent to saying that N(L^{-1}) = {0} is the zero subspace. When L is an operator and [u, v] is in L, then we write Lu = v. We shall explore the properties of operators in the next section.
Write L̄ for the closure of L. We say L is closed if L = L̄.
If L is a graph, then the adjoint graph L* consists of the pairs [w, z] such that for all [u, v] in L we have (z, u) = (w, v).
If L is a graph, then the adjoint graph L* is always closed. Furthermore, the adjoint of its closure L̄ is the same as the adjoint of L.
Remark: One way to think of the adjoint graph is to define the negative inverse −L^{-1} of a graph L to consist of all the ordered pairs [v, −u] with [u, v] in L. Then the adjoint L* is the orthogonal complement in H ⊕ H of −L^{-1}. That is, each pair [z, w] in the graph L* is orthogonal to each [v, −u] with [u, v] in the graph of L. This says that ([z, w], [v, −u]) = (z, u) − (w, v) = 0 for all such [u, v]. [This says to take the graph with negative reciprocal slope, and then take the perpendicular graph to that.]
Another way to think of this is to define the antisymmetric form ω([z, w], [u, v]) = (z, u) − (w, v). Then the adjoint L* consists of the orthogonal complement of L with respect to ω.
Theorem. If L is a graph, then L** = L̄.
Perhaps the nicest general class of graphs consists of the closed graphs L. For such a graph the adjoint L* is a graph, and L** = L.
It is not hard to check that L*^{-1} = L^{-1}*.
Theorem. N(L*) = R(L)^⊥.
Corollary. The closure of R(L) is N(L*)^⊥.
This corollary is very important in the theory of linear equations. Let L be a linear operator. Suppose that R(L) is a closed subspace of H. Then in this special case the corollary says that R(L) = N(L*)^⊥. Thus a linear equation Lu = g has a solution u if and only if g is orthogonal to all solutions v of L*v = 0. This is called the Fredholm alternative.
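A finite dimensional sketch of the Fredholm alternative (a singular symmetric 2×2 matrix stands in for L, so L* = L and R(L) is automatically closed; the matrix and the right-hand sides are arbitrary choices, not from the text): Lu = g is solvable exactly when g is orthogonal to the null space of L*.

```python
L_mat = [[1.0, 2.0], [2.0, 4.0]]       # rank one and symmetric, so L* = L
v_null = [2.0, -1.0]                   # spans N(L*) = N(L): L_mat @ v_null = 0

def solvable(mat, g, tol=1e-9):
    # Gaussian elimination on the augmented matrix [mat | g]; the system is
    # inconsistent exactly when a zero row of mat carries a nonzero entry
    # in the augmented column.
    n = len(mat)
    aug = [row[:] + [gi] for row, gi in zip(mat, g)]
    r = 0
    for c in range(n):
        piv = max(range(r, n), key=lambda i: abs(aug[i][c]))
        if abs(aug[piv][c]) < tol:
            continue
        aug[r], aug[piv] = aug[piv], aug[r]
        for i in range(n):
            if i != r:
                t = aug[i][c] / aug[r][c]
                aug[i] = [x - t * y for x, y in zip(aug[i], aug[r])]
        r += 1
    return all(abs(row[n]) < tol for row in aug
               if all(abs(x) < tol for x in row[:n]))

def orthogonal_to_null(g, tol=1e-9):
    return abs(g[0] * v_null[0] + g[1] * v_null[1]) < tol

cases = [[1.0, 2.0], [1.0, 0.0], [3.0, 6.0], [0.0, 1.0]]
agreement = all(solvable(L_mat, g) == orthogonal_to_null(g) for g in cases)
```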
7.4 Operators
If L is a graph, then L is an operator if [0, v] in L implies v = 0. It is easy to see that L is an operator precisely when N(L^{-1}) = {0} is the zero subspace. When L is an operator and [u, v] is in L, then we write Lu = v.
Corollary. L is densely defined if and only if L* is an operator.
Proof: Apply the last theorem of the previous section to L^{-1}. This gives N(L*^{-1}) = D(L)^⊥.
Corollary. L̄ is an operator if and only if L* is densely defined.
Proof: Apply the previous corollary to L* and use L** = L̄.
An operator L is said to be closable if L̄ is also an operator. For a densely defined closable operator the adjoint L* is a densely defined closed operator, and L** = L̄. Furthermore, the definition of the adjoint is that w is in the domain of L* and L*w = z if and only if for all u in the domain of L we have (z, u) = (w, Lu), that is,
\[
(L^* w, u) = (w, Lu). \tag{7.2}
\]
Perhaps the nicest general class of operators consists of the densely defined closed operators L. For such an operator the adjoint L* is a densely defined closed operator, and L** = L.
7.5 The spectrum
Theorem. (Closed graph theorem) Let H be a Hilbert space. Let L be a closed operator with domain D(L) = H. Then L is a bounded operator. (The converse is obvious.)
0. Let L be a closed operator. We say that λ is in the resolvent set of L if N(L − λI) = {0} and R(L − λI) = H. In that case (L − λI)^{-1} is a closed operator with domain D((L − λI)^{-1}) = H. By the closed graph theorem, (L − λI)^{-1} is a bounded operator.
We shall usually refer to (L − λI)^{-1} as the resolvent of L. However in some contexts it is convenient to use instead (λI − L)^{-1}, which is of course just the negative. There is no great distinction between these two possible definitions of resolvent. However it is important to be alert to which one is being used.
1. Let L be a closed operator. We say that λ is in the standard point spectrum of L if N(L − λI) ≠ {0} and R(L − λI)^⊥ ≠ {0}.
2. Let L be a closed operator. We say that λ is in the anomalous point spectrum of L if N(L − λI) ≠ {0} and R(L − λI)^⊥ = {0} (that is, R(L − λI) is dense in H).
3. Let L be a closed operator. We say that λ is in the pseudo-continuous spectrum of L if N(L − λI) = {0} and R(L − λI)^⊥ = {0} (so R(L − λI) is dense in H) but R(L − λI) ≠ H. In that case (L − λI)^{-1} is a closed operator with dense domain D((L − λI)^{-1}) not equal to H.
4. Let L be a closed operator. We say that λ is in the residual spectrum of L if N(L − λI) = {0} and R(L − λI)^⊥ ≠ {0}. In that case (L − λI)^{-1} is a closed operator with a domain that is not dense.
Theorem. Let L be a densely defined closed operator and let L* be its adjoint operator. Then:
0. The number λ is in the resolvent set of L if and only if λ̄ is in the resolvent set of L*.
1. The number λ is in the standard point spectrum of L if and only if λ̄ is in the standard point spectrum of L*.
2. The number λ is in the anomalous point spectrum of L if and only if λ̄ is in the residual spectrum of L*.
3. The number λ is in the pseudo-continuous spectrum of L if and only if λ̄ is in the pseudo-continuous spectrum of L*.
4. The number λ is in the residual spectrum of L if and only if λ̄ is in the anomalous point spectrum of L*.
For finite dimensional vector spaces only cases 0 and 1 can occur.
Summary: Let L be a closed, densely defined operator. The complex number λ being in the point spectrum of L is equivalent to λ being an eigenvalue of L. Similarly, λ being in the pseudo-continuous spectrum of L is equivalent to (L − λI)^{-1} being a densely defined, closed, but unbounded operator. Finally, λ being in the residual spectrum of L is equivalent to (L − λI)^{-1} being a closed operator that is not densely defined.
7.6 Spectra of inverse operators
Consider a closed, densely defined operator L with an inverse L^{-1} that is also a closed, densely defined operator. Let λ ≠ 0. Then λ is in the resolvent set of L if and only if 1/λ is in the resolvent set of L^{-1}. In fact, we have the identity
\[
\left(I - \lambda^{-1} L\right)^{-1} + \left(I - \lambda L^{-1}\right)^{-1} = I. \tag{7.3}
\]
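The identity (7.3) is easy to verify for matrices. The sketch below uses a hypothetical invertible 2×2 matrix for L and a λ outside its spectrum (both arbitrary choices, not from the text):

```python
def inv2(m):
    # Inverse of a 2x2 matrix by the adjugate formula.
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

L_mat = [[2.0, 1.0], [0.0, 3.0]]     # invertible; eigenvalues 2 and 3
lam = 5.0                            # not an eigenvalue of L_mat
L_inv = inv2(L_mat)

term1 = inv2([[1.0 - L_mat[0][0] / lam, -L_mat[0][1] / lam],
              [-L_mat[1][0] / lam, 1.0 - L_mat[1][1] / lam]])   # (I - L/lam)^(-1)
term2 = inv2([[1.0 - lam * L_inv[0][0], -lam * L_inv[0][1]],
              [-lam * L_inv[1][0], 1.0 - lam * L_inv[1][1]]])   # (I - lam L^(-1))^(-1)

# The two resolvent-type terms should sum to the identity.
gap = max(abs(term1[i][j] + term2[i][j] - (1.0 if i == j else 0.0))
          for i in range(2) for j in range(2))
```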
One very important situation is when K = L^{-1} is a compact operator. Then we know that all nonzero elements µ of the spectrum of K = L^{-1} are eigenvalues of finite multiplicity, with zero as their only possible accumulation point. It follows that all elements λ = 1/µ of the spectrum of L are eigenvalues of finite multiplicity, with infinity as their only possible accumulation point.
7.7 Problems
If K is a bounded everywhere defined operator, then in particular K is a closed densely defined operator.
If K is a closed densely defined operator, and if both K and K* have trivial null spaces, then L = K^{-1} is also a closed densely defined operator.
1. Let K = L^{-1} be as above. Let λ ≠ 0 and let µ = 1/λ. Find a formula relating the resolvent (L − λ)^{-1} to the resolvent (K − µ)^{-1}.
2. Consider functions in L²(0, 1). Consider the integral operator K given by
\[
(Kf)(x) = \int_0^x f(y)\,dy.
\]
Show that L = K^{-1} exists and is closed and densely defined. Describe the domain of L. Be explicit about boundary conditions. Describe how L acts on the elements of this domain. Show that L* = K*^{-1} is closed and densely defined. Describe the domain of L*. Describe how L* acts on the elements of this domain. Hint: Differentiate.
3. In the preceding problem, find the spectrum of L. Also, find the resolvent (L − λ)^{-1} of L. Hint: Solve a first order linear ordinary differential equation.
4. Consider functions in L²(0, 1). Consider the integral operator K given by
\[
(Kf)(x) = \int_0^x y(1 - x) f(y)\,dy + \int_x^1 x(1 - y) f(y)\,dy.
\]
Show that L = K^{-1} exists and is closed and densely defined. Describe the domain of L. Be explicit about boundary conditions. Describe how L acts on the elements of this domain. Hint: Differentiate twice.
5. In the preceding problem, find the spectrum of L. Find the resolvent (L − λ)^{-1} of L. Hint: Use sin(√λ x) and sin(√λ (1 − x)) as a basis for the solutions of a homogeneous second order linear ordinary differential equation. Solve the inhomogeneous equation by variation of parameters.
6. Let K be a compact operator. Suppose that K and K* have trivial null spaces, so that L = K^{-1} is a closed densely defined operator. Prove that the spectrum of L = K^{-1} consists of isolated eigenvalues of finite multiplicity. To what extent does this result apply to the examples in the previous problems?
7. Let K be a compact self-adjoint operator. Suppose that K has trivial null space, so that L = K^{-1} is a self-adjoint operator. Prove that there exists an orthogonal basis consisting of eigenvectors of L. To what extent does this result apply to the examples in the previous problems?
7.8 Selfadjoint operators
It is diﬃcult to do algebraic operations with closed, densely deﬁned operators,
because their domains may diﬀer. It is always true that (zL)
∗
= ¯ zL
∗
. If K is
bounded everywhere deﬁned, then (L+K)
∗
= L
∗
+K
∗
and (K+L)
∗
= K
∗
+L
∗
.
Furthermore, if K is bounded everywhere deﬁned, then (KL)
∗
= L
∗
K
∗
.
An operator A is selfadjoint if A = A
∗
. A selfadjoint operator is automat
ically closed and densely deﬁned. (Every adjoint A
∗
is automatically closed. If
A
∗
is an operator, then A is densely deﬁned.)
If a selfadjoint operator has trivial null space, then its inverse is also a
selfadjoint operator.
If L is a closed, densely defined operator, then L^∗L is defined on the domain
consisting of all u in D(L) such that Lu is in D(L^∗). It is not obvious that this
is closed and densely defined, much less that it is selfadjoint. However this is
all a consequence of the following theorem.

Theorem. If L is a closed and densely defined operator, then L^∗L is a selfadjoint
operator.
Proof: If L is a closed and densely defined operator, then L^∗ is also a closed
and densely defined operator. Furthermore, L^∗L is an operator with L^∗L ⊂ (L^∗L)^∗.

The Hilbert space H ⊕ H may be written as the direct sum of the two
closed graphs of −L^{−1} and L^∗. Therefore an arbitrary [0, h] for h in H may be
written as the sum [0, h] = [−Lf, f] + [g, L^∗g]. This says that 0 = −Lf + g
and h = f + L^∗g. As a consequence h = f + L^∗Lf. Furthermore, by properties
of projections we have ‖Lf‖² + ‖f‖² ≤ ‖h‖². We have shown that for each
h we can solve (I + L^∗L)f = h and that ‖f‖² ≤ ‖h‖². Thus (I + L^∗L)^{−1} is
everywhere defined and is a bounded operator with norm bounded by one.

Since L^∗L ⊂ (L^∗L)^∗, we have (I + L^∗L)^{−1} ⊂ ((I + L^∗L)^{−1})^∗. It follows that
(I + L^∗L)^{−1} = ((I + L^∗L)^{−1})^∗ is a selfadjoint operator. The conclusion follows.
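A finite-dimensional analogue of the theorem can be checked numerically. The following sketch (assuming NumPy; matrices sidestep the domain issues that make the unbounded case delicate) verifies that I + L^∗L has a selfadjoint inverse of operator norm at most one:

```python
# Finite-dimensional sanity check: for any matrix L, the inverse of
# I + L*L is selfadjoint and a contraction. (Illustrative sketch only;
# the content of the theorem is the unbounded case.)
import numpy as np

rng = np.random.default_rng(0)
n = 6
L = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

A = np.eye(n) + L.conj().T @ L       # I + L*L
Ainv = np.linalg.inv(A)

# The inverse is selfadjoint ...
assert np.allclose(Ainv, Ainv.conj().T)
# ... and has operator norm (largest singular value) at most one.
assert np.linalg.norm(Ainv, 2) <= 1 + 1e-12
```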
7.9 First order differential operators with a bounded interval: point spectrum
In this section we shall see examples of operators with no spectrum at all.
However we shall also see a very pretty and useful example of an operator with
standard point spectrum. This operator is the one behind the theory of Fourier
series.
Example 1A: This example is one where the correct number of boundary
conditions are imposed. In the case of a first order differential operator this
number is one. Let H be the Hilbert space L²(0, 1). Let L₀ be the operator
d/dx acting on functions of the form f(x) = ∫_0^x g(y) dy where g is in H. The
value of L₀ on such a function is g(x). Notice that functions in the domain of L₀
automatically satisfy the boundary condition f(0) = 0. This is an example of a
closed operator. The reason is that (L₀^{−1}g)(x) = ∫_0^x g(y) dy. This is a bounded
operator defined on the entire Hilbert space. So L₀^{−1} and L₀ are both closed.
The adjoint of the inverse is given by ((L₀^{−1})^∗h)(x) = ∫_x^1 h(y) dy. It follows
that L₀^∗ is the operator −L₁, where L₁ is given by d/dx acting on functions
of the form f(x) = −∫_x^1 g(y) dy where g is in H. The value of L₁ on such a
function is g(x). Notice that functions in the domain of L₁ automatically satisfy
the boundary condition f(1) = 0.
The operators L₀ and L₁ each have one boundary condition. They are
negative adjoints of each other. They each have a spectral theory, but it is
extremely pathological. For instance, the resolvent of L₀ is given by

((L₀ − λI)^{−1}g)(x) = ∫_0^x e^{λ(x−y)} g(y) dy.

So there are no points at all in the spectrum of L₀. It is in some sense located
all at infinity.
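The claim that this resolvent formula works for every λ can be checked symbolically. The following sketch (assuming SymPy, with the concrete choices λ = 2 and g(y) = y, both hypothetical) verifies that f = (L₀ − λI)^{−1}g satisfies f′ − λf = g and f(0) = 0:

```python
# Symbolic check of the resolvent formula for L0 = d/dx with f(0) = 0.
import sympy as sp

x, y = sp.symbols('x y')
lam = sp.Integer(2)    # a concrete spectral parameter (hypothetical choice)
g = y                  # a concrete right-hand side (hypothetical choice)

f = sp.integrate(sp.exp(lam * (x - y)) * g, (y, 0, x))

assert sp.simplify(sp.diff(f, x) - lam * f - x) == 0   # (d/dx - lam) f = g
assert f.subs(x, 0) == 0                               # boundary condition f(0) = 0
```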
To see this, consider the operator L₀^{−1}. This operator has spectrum consisting
of the point zero. All the spectral information is hidden at this one point.
This is, by the way, an example of pseudocontinuous spectrum.

Here is one important though somewhat technical point. The domain of L₀
consists precisely of the functions in the range of K₀ = L₀^{−1}. In the example
where K₀ is the integration operator, this is all functions of the form

u(x) = ∫_0^x f(y) dy, (7.4)

where f is in L²(0, 1). These functions u need not be C¹. They belong to a larger
class of functions that are indefinite integrals of L² functions. Such functions
are continuous, but they may have slope discontinuities. The functions u of
course satisfy the boundary condition u(0) = 0. The action of L₀ on a function
u is given by L₀u = u′, where the derivative exists except possibly on a set of
measure zero. However L² functions such as f are defined only up to sets of
measure zero, so this is not a problem.
Now for a really picky question: If u is also regarded as an L² function, then
it is also defined only up to sets of measure zero. So what does u(0) mean?
After all, the set consisting of 0 alone is of measure zero. The answer is that
the general indefinite integral is a function of the form

u(x) = ∫_0^x f(y) dy + C. (7.5)

Among all L² functions given by such an integral expression, there is a subclass
of those for which C = 0. These are the ones satisfying the boundary condition
u(0) = 0. There is another subclass for which C = −∫_0^1 f(y) dy. These are the
ones satisfying u(1) = 0.
A densely defined operator L is said to be selfadjoint if L = L^∗. Similarly,
L is said to be skewadjoint if L = −L^∗.
Example 1B: Here is another example with the correct number of boundary
conditions. Let H be the Hilbert space L²(0, 1). Let L₌ be the operator d/dx
acting on functions of the form f(x) = ∫_0^x g(y) dy + C where g is in H and
∫_0^1 g(y) dy = 0. The value of L₌ on such a function is g(x). Notice that
functions in the domain of L₌ automatically satisfy the boundary condition
f(0) = f(1). The operator L₌ is skewadjoint.
The operator L₌ has periodic boundary conditions. Since it is skewadjoint,
it has an extraordinarily nice spectral theory. The resolvent is

((L₌ − λI)^{−1}g)(x) = [1/(1 − e^λ)] ∫_0^x e^{λ(x−y)} g(y) dy − [1/(1 − e^{−λ})] ∫_x^1 e^{−λ(y−x)} g(y) dy.

The spectrum consists of the numbers 2πin. These are all point spectrum. The
corresponding eigenvectors form a basis that gives the Fourier series expansion
of an arbitrary periodic function with period one.
A densely defined operator L is said to be Hermitian if L ⊂ L^∗. This is
simply the algebraic property that

(Lu, w) = (u, Lw)

for all u, w in D(L). Similarly, L is said to be skewHermitian if L ⊂ −L^∗.
Example 2: The following example illustrates what goes wrong when one
imposes the wrong number of boundary conditions. Let H be the Hilbert space
L²(0, 1). Let L₀₁ be the operator d/dx acting on functions of the form f(x) =
∫_0^x g(y) dy where g is in H and ∫_0^1 g(y) dy = 0. The value of L₀₁ on such a
function is g(x). Notice that functions in the domain of L₀₁ automatically
satisfy the boundary conditions f(0) = 0 and f(1) = 0. The adjoint of L₀₁
is the operator −L, where L is given by d/dx acting on functions of the form
∫_0^x g(y) dy + C, where g is in H. The value of L on such a function is g(x). Notice
that functions in the domain of L need not satisfy any boundary conditions.
From this we see that L₀₁ is skewHermitian.
The operator L is d/dx. It has too few boundary conditions. The operator
L₀₁ has a boundary condition at 0 and at 1. This is too many boundary
conditions. Each of these operators is the negative of the adjoint of the other.
The spectrum of L consists of the entire complex plane, and it is all point
spectrum. The spectrum of L₀₁ also consists of the entire complex plane, and
it is all residual spectrum.
Remark: The operators L₀ and L₁ satisfy L₀₁ ⊂ L₀ ⊂ L and L₀₁ ⊂ L₁ ⊂ L.
Furthermore, the operator L₌ satisfies L₀₁ ⊂ L₌ ⊂ L. Thus there are various
correct choices of boundary conditions, but they may have different spectral
properties.
7.10 Spectral projection and reduced resolvent
Consider a closed densely defined operator L. In this section we shall assume
that the eigenvectors of L span the entire Hilbert space. Consider also an
isolated eigenvalue λ_n. The spectral projection corresponding to λ_n is a (not
necessarily orthogonal) projection onto the corresponding eigenspace. It is given
in terms of the resolvent by

P_n = lim_{λ→λ_n} (λ_n − λ)(L − λI)^{−1}. (7.6)

This is the negative of the residue of the resolvent at λ_n. The reason this works
is that L = ∑_m λ_m P_m and consequently

(L − λI)^{−1} = ∑_m [1/(λ_m − λ)] P_m, (7.7)

at least in the case under consideration, when the eigenvectors span the entire
Hilbert space.
The reduced resolvent corresponding to λ_n is defined as the operator that
inverts (L − λ_n I) in the range of (I − P_n) and is zero in the range of P_n. It is
a solution of the equations

S_n P_n = P_n S_n = 0 (7.8)

and

(L − λ_n I)S_n = I − P_n. (7.9)

It may be expressed in terms of the resolvent by

S_n = lim_{λ→λ_n} (L − λI)^{−1}(I − P_n). (7.10)

When the eigenvectors span this is

S_n = ∑_{m≠n} [1/(λ_m − λ_n)] P_m. (7.11)
Example: Take the skewadjoint operator L₌ = d/dx acting in L²(0, 1)
with periodic boundary conditions. The spectral projection corresponding to
eigenvalue 2πin is the selfadjoint operator

(P_n g)(x) = ∫_0^1 exp(2πin(x − y)) g(y) dy. (7.12)

The reduced resolvent corresponding to eigenvalue 0 is the skewadjoint operator

(S₀ g)(x) = ∫_0^x (1/2 − x + y) g(y) dy + ∫_x^1 (−1/2 − x + y) g(y) dy. (7.13)
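Formula (7.13) can be checked symbolically: applying d/dx to S₀g should recover g minus its mean, and S₀g should itself have mean zero. A sketch (assuming SymPy, with the hypothetical choice g(y) = y):

```python
# Check the reduced resolvent formula (7.13) for L = d/dx, eigenvalue 0.
import sympy as sp

x, y = sp.symbols('x y', real=True)
g = y   # a concrete test function (hypothetical choice)

u = (sp.integrate((sp.Rational(1, 2) - x + y) * g, (y, 0, x))
     + sp.integrate((-sp.Rational(1, 2) - x + y) * g, (y, x, 1)))

gbar = sp.integrate(g, (y, 0, 1))                      # mean of g, i.e. P_0 g
assert sp.simplify(sp.diff(u, x) - (x - gbar)) == 0    # L S_0 g = (1 - P_0) g
assert sp.simplify(sp.integrate(u, (x, 0, 1))) == 0    # P_0 S_0 g = 0
```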
7.11 Generating second order selfadjoint operators

This section exploits the theorem that says that if L is an arbitrary closed
densely defined operator, then L^∗L is a selfadjoint operator. Remember that
L^{∗∗} = L, so LL^∗ is also a selfadjoint operator.
It would be easy to conclude that first order differential operators such as L₀₁
and its adjoint L are of no interest for spectral theory. This is not the case. From
the general theorem LL₀₁ and L₀₁L are selfadjoint second order differential
operators. These, as we shall see, have a nice spectral theory. The operator
LL₀₁ is the operator −d²/dx² with Dirichlet boundary conditions u(0) = 0 and
u(1) = 0. The operator L₀₁L is the operator −d²/dx² with Neumann boundary
conditions u′(0) = 0 and u′(1) = 0. It is amusing to work out other selfadjoint
second order differential operators that may be generated from the first order
differential operators of the preceding sections.
7.12 First order differential operators with a semi-infinite interval: residual spectrum

In this section we shall see examples of operators with anomalous point spectrum
and residual spectrum. These operators underlie the theory of the Laplace
transform.
Example: Let H be the Hilbert space L²(0, ∞). Let L₀ be the operator d/dx
acting on functions f in H of the form f(x) = ∫_0^x g(y) dy where g is in H. The
value of L₀ on such a function is g(x). Notice that functions in the domain of
L₀ automatically satisfy the boundary condition f(0) = 0.

If ℜλ < 0, then we can always find a solution of the equation (L₀ − λI)f = g.
This solution is

f(x) = ∫_0^x e^{λ(x−y)} g(y) dy. (7.14)

This equation defines the bounded operator (L₀ − λI)^{−1} that sends g into f, at
least when ℜλ < 0. Notice that if ℜλ > 0, then the formula gives a result in L²
only if g is orthogonal to e^{−λ̄x}. Thus ℜλ > 0 corresponds to residual spectrum.
Again let H be the Hilbert space L²(0, ∞). Let L be the operator d/dx with
no boundary condition. If ℜλ > 0, then we can always find a solution of
the equation (L − λI)f = g. This solution is

f(x) = −∫_x^∞ e^{λ(x−y)} g(y) dy. (7.15)

This equation defines the bounded operator (L − λI)^{−1} that sends g into f, at
least when ℜλ > 0. On the other hand, if ℜλ < 0, then we have point spectrum.

The relation between these two operators is L₀^∗ = −L. This corresponds to
the fact that ((L₀ − λI)^{−1})^∗ = (−L − λ̄I)^{−1}, which is easy to check directly.
7.13 First order differential operators with an infinite interval: continuous spectrum

In this section we shall see an example of an operator with continuous spectrum.
This is the example that underlies the theory of the Fourier transform.
Example: Let H be the Hilbert space L²(−∞, ∞). Let L be the operator
d/dx acting on functions f in H with derivatives f′ in H.

If ℜλ < 0, then we can always find a solution of the equation (L − λI)f = g.
This solution is

f(x) = ∫_{−∞}^x e^{λ(x−y)} g(y) dy. (7.16)

This equation defines the bounded operator (L − λI)^{−1} that sends g into f, at
least when ℜλ < 0. If ℜλ > 0, then we can always find a solution of the
equation (L − λI)f = g. This solution is

f(x) = −∫_x^∞ e^{λ(x−y)} g(y) dy. (7.17)

This equation defines the bounded operator (L − λI)^{−1} that sends g into f, at
least when ℜλ > 0.

The operator L is skewadjoint, that is, L^∗ = −L. This corresponds to the
fact that ((L − λI)^{−1})^∗ = (−L − λ̄I)^{−1}, which is easy to check directly.
7.14 Problems
1. Let H = L²(0, 1). Let L = −i d/dx with periodic boundary conditions.
Find an explicit formula for (λ − L)^{−1}g. Hint: Solve the first order ordinary
differential equation (λ − L)f = g with the boundary condition f(0) = f(1).

2. Find the eigenvalues and eigenvectors of L. For each eigenvalue λ_n, find
the residue P_n of (λ − L)^{−1} at λ_n.

3. Find the explicit form of the formula g = ∑_n P_n g.

4. Let H = L²(−∞, ∞). Let L = −i d/dx. Let k be real and ε > 0. Find
an explicit formula for (L − k − iε)^{−1}g. Also, find an explicit formula for
(L − k + iε)^{−1}g. Find the explicit form of the expression

δ_ε(L − k)g = [1/(2πi)] [(L − k − iε)^{−1} − (L − k + iε)^{−1}]g.

5. Find the explicit form of the formula

g = ∫_{−∞}^∞ δ_ε(L − k)g dk.

6. Let ε → 0. Find the explicit form of the formula

g = ∫_{−∞}^∞ δ(L − k)g dk.
7.15 A pathological example

Consider H = L²(R) and fix g in H. Define K as the integral operator
with kernel k(x, y) = g(x)δ(y). Consider the domain D(K) to be the set
of all continuous functions in H. Then

(Ku)(x) = g(x)u(0). (7.18)

This K is densely defined but not closed. It has a closure K̄, but this is
not an operator. To see this, let u_n → u and Ku_n = u_n(0)g → v. Then
the pair [u, v] is in the graph of K̄. But we can take u_n → 0 in the Hilbert
space sense, yet with each u_n(0) = C. So this gives the pair [0, Cg] in the
graph of K̄. This is certainly not an operator!

Consider the adjoint K^∗. This is the integral operator with kernel δ(x)g(y).
That is,

(K^∗w)(x) = δ(x) ∫_{−∞}^∞ g(y)w(y) dy. (7.19)

Since

∫_{−∞}^∞ δ(x)² dx = +∞, (7.20)

the δ(x) is not in L². Hence the domain of K^∗ consists of all w with
(g, w) = 0, and K^∗ = 0 on this domain. This is an operator that is closed
but not densely defined. According to the general theory, its adjoint is
K^{∗∗} = K̄, which is not an operator.
Chapter 8
Normal operators
8.1 Spectrum of a normal operator

Theorem (von Neumann). Let L be a densely defined closed operator. Then L^∗L
and LL^∗ are each selfadjoint operators.

A densely defined closed operator is said to be normal if L^∗L = LL^∗.
There are three particularly important classes of normal operators.

1. A selfadjoint operator is an operator L with L^∗ = L.

2. A skewadjoint operator is an operator L with L^∗ = −L.

3. A unitary operator is an operator L with L^∗ = L^{−1}. A unitary operator
is bounded.

For a selfadjoint operator the spectrum is on the real axis. For a skewadjoint
operator the spectrum is on the imaginary axis. For a unitary operator
the spectrum is on the unit circle.
For normal operators there is a different classification of spectrum. Let L be
a normal operator acting in a Hilbert space H. The point spectrum consists of
the eigenvalues of L. The corresponding eigenvectors span a closed subspace M_p
of the Hilbert space. The spectrum of L in this space consists of either what we
have previously called standard point spectrum or of what we have previously
called pseudocontinuous spectrum. This kind of pseudocontinuous spectrum
is not really continuous at all, since it consists of limits of point spectrum.

Let M_c be the orthogonal complement in H of M_p. Then the spectrum
of L restricted to M_c is called the continuous spectrum of L. In our previous
classification the spectrum of L in this space would be pseudocontinuous
spectrum.

With this classification for normal operators the point spectrum and continuous
spectrum can overlap. But they really have nothing to do with each other,
since they take place in orthogonal subspaces.
Spectral theorem for compact normal operators. Let K be a compact normal
operator. (This includes the cases of selfadjoint operators and skewadjoint
operators.) Then K has an orthogonal basis of eigenvectors. The nonzero
eigenvalues have finite multiplicity. The only possible accumulation point of
eigenvalues is zero.

Spectral theorem for normal operators with compact resolvent. Let L be
a normal operator with compact resolvent. (This includes the cases of selfadjoint
operators and skewadjoint operators.) Then L has an orthogonal basis
of eigenvectors. The eigenvalues have finite multiplicity. The only possible
accumulation point of eigenvalues is infinity.
8.2 Problems
1. Perhaps the most beautiful selfadjoint operator is the spherical Laplacian

∆_S = (1/sin θ) ∂/∂θ (sin θ ∂/∂θ) + (1/sin² θ) ∂²/∂φ².

Show by explicit computation that this is a Hermitian operator acting on
L² of the sphere with surface measure sin θ dθ dφ. Pay explicit attention
to what happens at the north pole and south pole when one integrates by
parts.
2. Let r be the radius satisfying r² = x² + y² + z². Let

L = r ∂/∂r

be the Euler operator. Show that the Laplace operator

∆ = ∂²/∂x² + ∂²/∂y² + ∂²/∂z²

is related to L and ∆_S by

∆ = (1/r²)[L(L + 1) + ∆_S].
3. Let p be a polynomial in x, y, z that is harmonic and homogeneous of
degree ℓ. Thus ∆p = 0 and Lp = ℓp. Such a p is called a solid spherical
harmonic. Show that each solid spherical harmonic is an eigenfunction of
∆_S and find the corresponding eigenvalue as a function of ℓ.
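For a quick check of the eigenvalue −ℓ(ℓ + 1), one can apply ∆_S to low-degree surface spherical harmonics symbolically. A sketch (assuming SymPy; cos θ and sin²θ e^{2iφ} are the ℓ = 1 and ℓ = 2 cases):

```python
# Symbolic check that low-degree surface spherical harmonics are
# eigenfunctions of the spherical Laplacian with eigenvalue -l(l+1).
import sympy as sp

theta, phi = sp.symbols('theta phi')

def lap_S(f):
    # (1/sin t) d/dt (sin t df/dt) + (1/sin^2 t) d^2 f/dphi^2
    return (sp.diff(sp.sin(theta) * sp.diff(f, theta), theta) / sp.sin(theta)
            + sp.diff(f, phi, 2) / sp.sin(theta)**2)

f1 = sp.cos(theta)                                 # l = 1
f2 = sp.sin(theta)**2 * sp.exp(2 * sp.I * phi)     # l = 2
assert sp.simplify(lap_S(f1) + 2 * f1) == 0        # eigenvalue -1*(1+1) = -2
assert sp.simplify(lap_S(f2) + 6 * f2) == 0        # eigenvalue -2*(2+1) = -6
```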
4. The restriction of a solid spherical harmonic to the sphere r² = 1 is called
a surface spherical harmonic. The surface spherical harmonics are the
eigenfunctions of ∆_S. Show that surface spherical harmonics for different
values of ℓ are orthogonal in the Hilbert space of L² functions on the
sphere.
5. The dimension of the eigenspace indexed by ℓ is 2ℓ + 1. For ℓ = 0 the
eigenspace is spanned by 1. For ℓ = 1 it is spanned by z, x + iy, and
x − iy. For ℓ = 2 it is spanned by 3z² − r², z(x + iy), z(x − iy), (x + iy)²,
and (x − iy)². For ℓ = 3 it is spanned by 5z³ − 3zr², (5z² − r²)(x + iy),
(5z² − r²)(x − iy), z(x + iy)², z(x − iy)², (x + iy)³, (x − iy)³. Express the
corresponding surface spherical harmonics in spherical coordinates.
6. In the case ℓ = 1 we can write the general spherical harmonic as ax + by + cz.
In the case ℓ = 2 we can write it as ax² + by² + cz² + dxy + eyz + fzx
with an additional condition on the coefficients. What is this condition?
In the case ℓ = 3 we can write it as a₁x³ + b₁y²x + c₁z²x + a₂y³ + b₂z²y +
c₂x²y + a₃z³ + b₃x²z + c₃y²z + dxyz with additional conditions. What are
they?
8.3 Variation of parameters and Green's functions

First look at first order linear ordinary differential operators. Let Lu = p(x)u′ +
r(x)u. Let u₁ be a nonzero solution of the homogeneous equation Lu = 0. The
general solution of the homogeneous equation Lu = 0 is u(x) = c₁u₁(x), where
c₁ is a parameter. The method of variation of parameters gives a solution of
the inhomogeneous equation Lu = f in the form u(x) = c₁(x)u₁(x).

The condition on the parameter is given by plugging u into Lu = f. This
gives p(x)c₁′(x)u₁(x) = f(x). The solution is c₁′(x) = f(x)/(p(x)u₁(x)). The
only difficult part is to integrate this to get the general solution

u(x) = ∫_a^x [u₁(x)/(p(y)u₁(y))] f(y) dy + Cu₁(x). (8.1)
Now look at second order linear ordinary differential operators. Let Lu =
p(x)u″ + r(x)u′ + q(x)u. Let u₁ and u₂ be independent solutions of the homogeneous
equation Lu = 0. The general solution of the homogeneous equation Lu = 0
is u(x) = c₁u₁(x) + c₂u₂(x), where c₁ and c₂ are parameters. The method of
variation of parameters gives a solution of the inhomogeneous equation Lu = f
in the form u(x) = c₁(x)u₁(x) + c₂(x)u₂(x). Not only that, it has the property
that the derivative has the same form, that is, u′(x) = c₁(x)u₁′(x) + c₂(x)u₂′(x).

If this is to be so, then c₁′(x)u₁(x) + c₂′(x)u₂(x) = 0. This is the first
equation. The second equation is given by plugging u into Lu = f. This
gives p(x)(c₁′(x)u₁′(x) + c₂′(x)u₂′(x)) = f(x). This system of two linear equations
is easily solved. Let w(x) = u₁(x)u₂′(x) − u₂(x)u₁′(x). The solution is c₁′(x) =
−u₂(x)f(x)/(p(x)w(x)) and c₂′(x) = u₁(x)f(x)/(p(x)w(x)).
A solution of Lu = f is thus

u(x) = ∫_x^b [u₁(x)u₂(y)/(p(y)w(y))] f(y) dy + ∫_a^x [u₂(x)u₁(y)/(p(y)w(y))] f(y) dy. (8.2)

Furthermore,

u′(x) = ∫_x^b [u₁′(x)u₂(y)/(p(y)w(y))] f(y) dy + ∫_a^x [u₂′(x)u₁(y)/(p(y)w(y))] f(y) dy. (8.3)
Notice that u(a) = Au₁(a) and u′(a) = Au₁′(a), while u(b) = Bu₂(b) and
u′(b) = Bu₂′(b). So this form of the solution is useful for specifying boundary
conditions at a and b.

The general solution is obtained by adding an arbitrary linear combination
C₁u₁(x) + C₂u₂(x). However often we want a particular solution with boundary
conditions at a and b. Then we use the form above. This can also be written

u(x) = (Kf)(x) = ∫_a^b k(x, y)f(y) dy, (8.4)
where

k(x, y) = u₁(x)u₂(y)/(p(y)w(y)) if x < y, and k(x, y) = u₂(x)u₁(y)/(p(y)w(y)) if x > y. (8.5)

Sometimes one thinks of y as a fixed source and writes the equation

L_x k(x, y) = δ(x − y). (8.6)

Of course this is just another way of saying that LK = I.
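The construction can be verified symbolically in the simplest case L = −d²/dx² on (0, 1) with Dirichlet conditions, where u₁(x) = x, u₂(x) = 1 − x, p = −1, and w = −1, so p(y)w(y) = 1 and k(x, y) is exactly the kernel of the operator K from the earlier problem set. A sketch (assuming SymPy, with the hypothetical choice f = 1):

```python
# Check that the variation-of-parameters solution solves -u'' = f with
# u(0) = u(1) = 0, for u1 = x, u2 = 1 - x, p = -1, w = -1 (so p*w = 1).
import sympy as sp

x, y = sp.symbols('x y')
f = sp.Integer(1)   # a concrete right-hand side (hypothetical choice)

u = (sp.integrate(x * (1 - y) * f, (y, x, 1))       # u1(x) u2(y) part
     + sp.integrate((1 - x) * y * f, (y, 0, x)))    # u2(x) u1(y) part

assert sp.simplify(-sp.diff(u, x, 2) - f) == 0      # L u = f
assert u.subs(x, 0) == 0 and u.subs(x, 1) == 0      # Dirichlet conditions
```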
8.4 Second order differential operators with a bounded interval: point spectrum

Example. Consider the selfadjoint differential operator L_D = −d²/dx² on
L²(0, 1) with Dirichlet boundary conditions f(0) = 0 and f(1) = 0 at 0 and
1. Take the solutions u₁(x) = sin(√λ x)/√λ and u₂(x) = sin(√λ(1 − x))/√λ.
These are defined in a way that does not depend on which square root of λ
is taken. (Furthermore, they have obvious values in the limit λ → 0.) Then
p(x) = −1 and w(x) = −sin(√λ)/√λ. This also does not depend on the cut.
The resolvent is thus ((L_D − λ)^{−1}g)(x) = f(x) where

f(x) = [1/(√λ sin(√λ))] [∫_0^x sin(√λ(1 − x)) sin(√λ y) g(y) dy + ∫_x^1 sin(√λ x) sin(√λ(1 − y)) g(y) dy]. (8.7)

The spectrum consists of the points λ = n²π² for n = 1, 2, 3, . . .. This is
standard point spectrum. It is amusing to work out the spectral projection at
the eigenvalue n²π². This is the negative of the residue and is explicitly

(P_n g)(x) = 2 ∫_0^1 sin(nπx) sin(nπy) g(y) dy. (8.8)
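The eigenvalues n²π² can also be seen numerically by discretizing −d²/dx² with the standard second-difference matrix. A sketch (assuming NumPy; the grid size is an arbitrary choice):

```python
# Finite-difference approximation of -d^2/dx^2 on (0,1) with Dirichlet
# conditions; the lowest eigenvalues should approach n^2 pi^2.
import numpy as np

n = 500                      # interior grid points (arbitrary choice)
h = 1.0 / (n + 1)
A = (np.diag(2.0 * np.ones(n)) + np.diag(-np.ones(n - 1), 1)
     + np.diag(-np.ones(n - 1), -1)) / h**2

evals = np.linalg.eigvalsh(A)[:3]
exact = np.array([1.0, 4.0, 9.0]) * np.pi**2
assert np.allclose(evals, exact, rtol=1e-3)
```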
Example. Consider the selfadjoint differential operator L_N = −d²/dx² on
L²(0, 1) with Neumann boundary conditions f′(0) = 0 and f′(1) = 0 at 0 and
1. Take the solutions u₁(x) = cos(√λ x) and u₂(x) = cos(√λ(1 − x)). Then
p(x) = −1 and w(x) = √λ sin(√λ). This also does not depend on the cut. The
resolvent is thus

((L_N − λ)^{−1}g)(x) = −[1/(√λ sin(√λ))] [∫_0^x cos(√λ(1 − x)) cos(√λ y) g(y) dy + ∫_x^1 cos(√λ x) cos(√λ(1 − y)) g(y) dy]. (8.9)

The spectrum consists of the points λ = n²π² for n = 0, 1, 2, 3, . . .. This is
standard point spectrum. The spectral projection at the eigenvalue n²π² for
n = 1, 2, 3, . . . is

(P_n g)(x) = 2 ∫_0^1 cos(nπx) cos(nπy) g(y) dy. (8.10)

For n = 0 it is

(P₀ g)(x) = ∫_0^1 g(y) dy. (8.11)
It is interesting to compute the reduced resolvent of L_N at the eigenvalue 0.
Thus we must compute (L_N − λ)^{−1}ḡ, where ḡ = (1 − P₀)g has zero average, and
then let λ approach zero. This is easy. Expand the cosine functions to second
order. The constant terms may be neglected, since they are orthogonal to ḡ.
This gives

(S₀ ḡ)(x) = ∫_0^x [½(1 − x)² + ½y²] ḡ(y) dy + ∫_x^1 [½x² + ½(1 − y)²] ḡ(y) dy. (8.12)

From this it is easy to work out that

(S₀ g)(x) = ∫_0^x [½(1 − x)² + ½y² − 1/6] g(y) dy + ∫_x^1 [½x² + ½(1 − y)² − 1/6] g(y) dy. (8.13)
8.5 Second order differential operators with a semibounded interval: continuous spectrum

Example. Consider the selfadjoint differential operator L_D = −d²/dx² on
L²(0, ∞) with Dirichlet boundary condition f(0) = 0 at 0. Take the solutions
u₁(x) = sinh(√−λ x)/√−λ and u₂(x) = e^{−√−λ x}. Since sinh(iz) = i sin(z), this
is the same u₁(x) as before. In u₂(x) the square root is taken to be cut on the
negative axis. Then p(x) = −1 and w(x) = −1. The resolvent is

((L_D − λ)^{−1}g)(x) = [1/√−λ] [∫_0^x e^{−√−λ x} sinh(√−λ y) g(y) dy + ∫_x^∞ sinh(√−λ x) e^{−√−λ y} g(y) dy]. (8.14)

The spectrum consists of the positive real axis and is continuous.

It is instructive to compute the resolvent of the selfadjoint differential operator
L_N = −d²/dx² on L²(0, ∞) with Neumann boundary condition f′(0) = 0
at 0. Again the spectrum consists of the positive real axis and is continuous.
8.6 Second order differential operators with an infinite interval: continuous spectrum

Example. Consider the selfadjoint differential operator L = −d²/dx² on L²(−∞, ∞).
There is now no choice of boundary conditions. The resolvent is

((L − λ)^{−1}g)(x) = [1/(2√−λ)] [∫_{−∞}^x e^{−√−λ x} e^{√−λ y} g(y) dy + ∫_x^∞ e^{√−λ x} e^{−√−λ y} g(y) dy]. (8.15)

This can also be written in the form

((L − λ)^{−1}g)(x) = [1/(2√−λ)] ∫_{−∞}^∞ e^{−√−λ |x−y|} g(y) dy. (8.16)

The spectrum consists of the positive real axis and is continuous.
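Formula (8.16) can be checked symbolically on a concrete function: since cos(kx) is formally an eigenfunction of −d²/dx² with eigenvalue k², convolving it with the kernel should divide it by k² − λ. A sketch (assuming SymPy, with the hypothetical choices λ = −1 and k = 2, so the kernel is e^{−|x−y|}/2):

```python
# Check the free resolvent kernel at lam = -1 on g(y) = cos(2y):
# the result should be cos(2x)/(k^2 - lam) = cos(2x)/5.
import sympy as sp

x, y = sp.symbols('x y', real=True)
g = sp.cos(2 * y)

f = (sp.integrate(sp.exp(-(x - y)) * g, (y, -sp.oo, x))
     + sp.integrate(sp.exp(-(y - x)) * g, (y, x, sp.oo))) / 2

assert sp.simplify(f - sp.cos(2 * x) / 5) == 0
```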
8.7 The spectral theorem for normal operators
Throughout the discussion we make the convention that the inner product is
conjugate linear in the ﬁrst variable and linear in the second variable.
The great theorem of spectral theory is the following.
Let H be a Hilbert space. Let L be a normal operator. Then there exists
a set K (which may be taken to be a disjoint union of copies of the line) and
a measure µ on K and a unitary operator U : H → L²(K, µ) and a complex
function λ on K such that

(ULf)(k) = λ(k)(Uf)(k). (8.17)

Thus if we write Λ for the operator of multiplication by the function λ, we get
the representation

L = U^∗ΛU. (8.18)

The theorem is a generalization of the theorem on diagonalization of normal
matrices. If the measure µ is discrete, then the norm in the space L²(µ) is given
by ‖g‖² = ∑_k |g(k)|² µ({k}). The λ_k are the eigenvalues of L. The equation
then says

(ULf)_k = λ_k(Uf)_k. (8.19)

The unitary operator U is given by

(Uf)_k = (ψ_k, f), (8.20)

where the ψ_k are eigenvectors of L normalized so that µ({k}) = 1/(ψ_k, ψ_k).
The inverse of U is given by

U^∗g = ∑_k g_k ψ_k µ({k}). (8.21)
The equation

Lf = U^∗ΛUf (8.22)

says explicitly that

Lf = ∑_k λ_k (ψ_k, f) ψ_k µ({k}). (8.23)

If the measure µ is continuous, then the norm in the space L²(K, µ) is given
by ‖g‖² = ∫ |g(k)|² dµ(k). Then λ(k) is a function of the continuous parameter
k. The equation then says

(ULf)(k) = λ(k)(Uf)(k). (8.24)

In quite general contexts the unitary operator U is given by

(Uf)(k) = (ψ_k, f), (8.25)

but now this equation only makes sense for a dense set of f in the Hilbert space,
and the ψ_k resemble eigenvectors of L, but do not belong to the Hilbert space,
but instead to some larger space, such as a space of slowly growing functions or
of mildly singular distributions. The inverse of U is given formally by

U^∗g = ∫ g(k) ψ_k dµ(k), (8.26)

but this equation must be interpreted in some weak sense. The equation

Lf = U^∗ΛUf (8.27)

says formally that

Lf = ∫ λ(k) (ψ_k, f) ψ_k dµ(k). (8.28)

Since the eigenvectors ψ_k are not in the Hilbert space, it is convenient in
many contexts to forget about them and instead refer to the measure µ and
the function λ and to the operators U and U^∗. The theorem says simply that
every normal operator L is isomorphic to multiplication by a function λ. The
simplicity and power of the equation L = U^∗ΛU cannot be overestimated.

The spectral theorem for normal operators says that every normal operator
is isomorphic (by a unitary operator mapping the Hilbert space to an L² space)
to a multiplication operator (multiplication by some complex valued function
λ). The spectrum is the essential range of the function. This is the set of points
w such that for each ε > 0 the set of all points k where λ(k) is within ε of w
has measure > 0. This is plausible: the function 1/(λ(k) − w) is bounded (up to
sets of measure zero) if and only if w is not in the essential range of λ(k).
8.8 Examples: compact normal operators

The theorem on compact normal operators is a corollary of the general spectral
theorem. Consider a compact normal operator L. Say that it is isomorphic
to multiplication by λ. Fix ε > 0 and look at the part of the space where
|L| ≥ ε. This is just the part of the space that is isomorphic to the part of L²
where |λ| ≥ ε. More explicitly, this is the subspace consisting of all functions g
in L² such that g(k) ≠ 0 only where |λ(k)| ≥ ε. On this part of the space the
operator L maps the unit ball onto a set that contains the ball of radius ε. Since
L is compact, it follows that this part of the space is finite dimensional. This
shows that the subspace where |L| ≥ ε is finite dimensional.
Therefore there are only finitely many eigenvalues, each of finite multiplicity, in this
space. Since ε > 0 is arbitrary, it follows that there are only countably many
isolated eigenvalues of finite multiplicity in the part of the space where |L| > 0.
In the part of the space where L = 0 we can have an eigenvalue 0 of arbitrary
multiplicity (zero, finite, or infinite).
8.9 Examples: translation invariant operators and the Fourier transform

The nicest examples for the continuous case are given by translation invariant
operators acting in H = L²(R, dx). In this case the Fourier transform maps H
into L²(R, dk/(2π)). The Fourier transform is given formally by

(Ff)(k) = (ψ_k, f) = ∫_{−∞}^∞ e^{−ikx} f(x) dx. (8.29)

Here ψ_k(x) = e^{ikx}, and we are using the convention that the inner product is
linear in the second variable. The inverse Fourier transform is

(F^{−1}g)(x) = ∫_{−∞}^∞ e^{ikx} g(k) dk/(2π). (8.30)
Here are some examples:

Example 1: Translation. Let U_a be defined by

(U_a f)(x) = f(x − a). (8.31)

Then U_a is unitary. The spectral representation is given by the Fourier transform.
In fact

(FU_a f)(k) = exp(−ika)(Ff)(k). (8.32)
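The discrete analogue of (8.32) is the shift theorem for the DFT, which is easy to check numerically. A sketch (assuming NumPy; the signal length and shift are arbitrary choices):

```python
# Discrete analogue of (8.32): for the DFT, a circular shift by a samples
# becomes multiplication by the phase exp(-2*pi*i*k*a/N).
import numpy as np

N, a = 128, 5                          # arbitrary choices
rng = np.random.default_rng(1)
f = rng.standard_normal(N)

shifted = np.roll(f, a)                # (U_a f)(x) = f(x - a), periodically
k = np.arange(N)
phase = np.exp(-2j * np.pi * k * a / N)

assert np.allclose(np.fft.fft(shifted), phase * np.fft.fft(f))
```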
Example 2: Convolution. Let C be defined by

(Cf)(x) = ∫_{−∞}^∞ c(x − y)f(y) dy = ∫_{−∞}^∞ c(a)f(x − a) da, (8.33)

where c is an integrable function. Then C is bounded and normal. By integrating
the first example we get

(FCf)(k) = ĉ(k)(Ff)(k), (8.34)

where ĉ is the Fourier transform of c.
Example 3: Differentiation. Let D be defined by

(Df)(x) = df(x)/dx. (8.35)

Then D is skewadjoint. Furthermore, we get

(FDf)(k) = ik(Ff)(k). (8.36)

Notice that the unitary operator in Example 1 may be written

U_a = exp(−aD). (8.37)
Example 4: Second differentiation. Let D² be defined by

(D²f)(x) = d²f(x)/dx². (8.38)

Then D² is selfadjoint. Furthermore, we get

(FD²f)(k) = −k²(Ff)(k). (8.39)

We can take interesting functions of these operators. For instance, (−D² + m²)^{−1}
is convolution by [1/(2m)]e^{−m|x|}, and exp(tD²) is convolution by [1/√(4πt)]e^{−x²/(4t)}.
8.10 Examples: Schrödinger operators

If V(x) is a real locally integrable function that is bounded below, then

H = −D² + V(x) (8.40)

is a well-defined selfadjoint operator. Such an operator is called a Schrödinger
operator.

If V(x) → ∞ as |x| → ∞, then the spectrum of H is point spectrum. Finding
the eigenvalues is a challenge. One case where it is possible to obtain explicit
formulas is when V(x) is a quadratic function.

Also there is an interesting limiting case. If V(x) = 0 for 0 < x < 1 and
V(x) = +∞ elsewhere, then we may think of this as the operator H = −D²
with Dirichlet boundary conditions at the end points of the unit interval. We
know how to find the spectrum in this case.

If, on the other hand, V(x) is integrable on the line, then the spectrum of
H consists of positive continuous spectrum and possibly some strictly negative
eigenvalues. A nice example of this is the square well, where there is a constant
a > 0 with V(x) = −a for 0 < x < 1 and V(x) = 0 otherwise. This is another
case where computations are possible.
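The square well can be explored numerically by discretizing H = −D² + V on a large interval. A sketch (assuming NumPy, with the hypothetical choices of well depth 10 and the truncated interval (−10, 10)); one expects at least one bound-state eigenvalue strictly between −10 and 0:

```python
# Discretize H = -D^2 + V for the square well V = -10 on (0,1), V = 0
# elsewhere, and look for negative (bound-state) eigenvalues.
import numpy as np

n = 1000
xs = np.linspace(-10.0, 10.0, n)       # truncated line (arbitrary choice)
h = xs[1] - xs[0]
V = np.where((xs > 0) & (xs < 1), -10.0, 0.0)

H = (np.diag(2.0 * np.ones(n)) + np.diag(-np.ones(n - 1), 1)
     + np.diag(-np.ones(n - 1), -1)) / h**2 + np.diag(V)
E0 = np.linalg.eigvalsh(H)[0]

assert E0 < 0       # at least one eigenvalue below the continuous spectrum
assert E0 > -10     # and above the bottom of the well
```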
The calculation of the spectral properties of Schrödinger operators is the
main task of quantum physics. However we shall see that Schrödinger operators
play a role in other contexts as well. (One example will be in calculus of
variations.)
8.11 Subnormal operators
The spectral theorem for normal operators is a landmark. However not every
operator is normal. One important class of operators with fascinating spectral
properties consists of the subnormal operators. A subnormal operator is an
operator that is the restriction of a normal operator to an invariant subspace.
First consider the case of bounded operators. An operator S : H → H is subnormal if there exists a larger Hilbert space H′ with H ⊂ H′ as a closed subspace, a normal operator N : H′ → H′ that leaves H invariant, and such that N restricted to H is S.
Example: Let H = L²([0, ∞), dt). For each a ≥ 0 define

(S_a f)(t) = f(t − a) (8.41)

for a ≤ t and

(S_a f)(t) = 0 (8.42)

for 0 ≤ t < a. Then S_a is subnormal.
To see this, consider the bigger space H′ = L²(R, dt) and consider H as the subspace of functions that vanish except on the positive reals. Let U_a be translation by a on H′. If a ≥ 0, then the subspace H is left invariant by U_a. Then S_a is U_a acting in this subspace.
It follows from the spectral theorem that a subnormal operator S is isomorphic by a unitary operator U : H → M to a multiplication operator that sends g(k) to λ(k)g(k). Here M is a closed subspace of an L² space. For each f in H we have Uf in M and (USf)(k) = λ(k)(Uf)(k) in M.
Now it is more difficult to characterize the spectrum. A number w is in the resolvent set if 1/(λ(k) − w) g(k) is in M for every function g(k) in M. However it is not sufficient that this is a bounded function. If, for instance, every function g(k) in M has an extension to an analytic function g(z) defined on some larger region, then one would want 1/(λ(z) − w) g(z) to also be an analytic function in this region. So we need to require also that w is not in the range of the extension λ(z).
Example: Let F be the Fourier transform applied to the Hilbert space H of L² functions that vanish except on the positive axis. Then the image of this transform consists of the subspace M of L² functions g(k) that are boundary values of analytic functions g(z) in the lower half plane. The operator S_a for a > 0 is isomorphic to multiplication by λ(k) = exp(−iak) acting in this subspace. This function extends to a function λ(z) = exp(−iaz) defined in the lower half plane. So the spectrum is the range of this function. But the image of ℑz ≤ 0 under exp(−iaz) is the closed unit disk |w| ≤ 1. So this is the spectrum.
The adjoint of a subnormal operator S : H → H is another operator S∗ : H → H. The adjoint S∗ need not be subnormal.
Lemma. If N : H′ → H′ is normal, H ⊂ H′, and S : H → H is the restriction of N to H, then S∗ : H → H is given by S∗ = PN∗, where P is the orthogonal projection of H′ onto H.
Proof: If u is in H, for each v in H we have

(S∗u, v) = (u, Sv) = (u, Nv) = (N∗u, v). (8.43)

This says that N∗u − S∗u is orthogonal to every v in H. Since S∗u is in H, this implies that S∗u is the orthogonal projection of N∗u onto H.
Theorem. If S is a subnormal operator, then SS∗ ≤ S∗S as quadratic forms, that is, (u, SS∗u) ≤ (u, S∗Su) for all u in H.
Proof: First note that for u in H′ we have

(N∗u, N∗u) = (u, NN∗u) = (u, N∗Nu) = (Nu, Nu). (8.44)

Then for u in H we have

(S∗u, S∗u) = (PN∗u, PN∗u) ≤ (N∗u, N∗u) = (Nu, Nu) = (Su, Su). (8.45)
Corollary. If S is a subnormal operator and Su = 0, then S∗u = 0. Thus the null space of S is contained in the null space of S∗.
Corollary. If S is a subnormal operator and Su = λu, then S∗u = λ̄u. Thus every eigenvector of S is an eigenvector of S∗.
It is not true that every eigenvalue of S∗ is an eigenvalue of S. It is more typical that S∗ has eigenvalues while S does not. In fact we shall see examples in which S∗ has anomalous point spectrum, while S has residual spectrum.
Theorem. If the Hilbert space is finite dimensional, then every subnormal operator is normal.
Proof: Let S be a subnormal operator acting in H. Since the space H is finite dimensional, S has an eigenvector u in H with Su = λu. Since S is subnormal, it follows that S∗u = λ̄u. Let v be a vector in H that is orthogonal to u. Then (Sv, u) = (v, S∗u) = λ̄(v, u) = 0. Thus the orthogonal complement of u in H is also an invariant space, so the operator S restricted to this smaller space is also subnormal. Continue in this way until one finds an orthogonal basis of eigenvectors for S.
There are examples in which one might want to consider unbounded subnormal operators. One possible definition might be the following. Consider an unbounded normal operator N. Thus there is a dense domain D(N) ⊂ H′ such that N : D(N) → H′ is normal. Let H be a Hilbert space that is a closed subspace of H′. Suppose that D(N) ∩ H is dense in H and that N sends vectors in this dense subspace into H. Then if D(S) = D(N) ∩ H and S is the restriction of N to D(S), the operator S is subnormal. Then D(S) is dense in H, and S : D(S) → H is a closed, densely defined operator.
Example: Let H = L²([0, ∞), dt). Define

(Sf)(t) = f′(t) (8.46)

on the domain consisting of all f in H such that f′ is also in H and f(0) = 0. Then S is subnormal.
To see this, consider the bigger space H′ = L²(R, dt) and consider H as the subspace of functions that vanish except on the positive reals. Let N be differentiation on H′. Notice that if f is in D(N) and is also in H, then f(0) = 0 automatically.
If N is a normal operator acting in H′, so N : D(N) → H′, then its adjoint N∗ is also a normal operator, and in fact D(N∗) = D(N). So if S : D(S) → H is subnormal, then D(S) = D(N) ∩ H = D(N∗) ∩ H. Furthermore, if u is in D(N∗) ∩ H, then u is in D(S∗) and S∗u = PN∗u. This can be seen from the computation (u, Sv) = (u, Nv) = (N∗u, v) for all v in D(S).
We conclude that for a subnormal operator D(S) ⊂ D(S∗) and (S∗u, S∗u) ≤ (Su, Su) for all u in D(S). This is also an easy computation: (S∗u, S∗u) = (PN∗u, PN∗u) ≤ (N∗u, N∗u) = (Nu, Nu) = (Su, Su).
Example: For the operator S in the last example we have

(S∗f)(t) = −f′(t) (8.47)

on the domain consisting of all f in H such that f′ is also in H. There is no boundary condition at zero. This is a case where D(S) ⊂ D(S∗) and the two domains are not equal.
8.12 Examples: forward translation invariant operators and the Laplace transform
The example in the last section is the operator theory context for the theory of
the Laplace transform.
The Laplace transform of a function f in L²([0, ∞), dt) is

(Lf)(z) = ∫_0^∞ e^{−zt} f(t) dt. (8.48)

If we think of z = iω on the imaginary axis, then when regarded as a function of ω this is the Fourier transform. However it extends as an analytic function to ω in the lower half plane, that is, to z in the right half plane.
Let a ≥ 0 and let S_a be the operator of right translation filling in with zero as defined in the previous section. Then

(LS_a f)(z) = ∫_a^∞ e^{−zt} f(t − a) dt = e^{−az}(Lf)(z). (8.49)

So S_a is isomorphic to multiplication by e^{−az} acting on the Hilbert space of functions analytic in the right half plane. Its spectrum consists of the closed unit disk.
Let c be an integrable function defined on the positive axis, and let the causal convolution C be defined by

(Cf)(t) = ∫_0^t c(t − u)f(u) du = ∫_0^t c(a)f(t − a) da. (8.50)

Then C is subnormal, and

(LCf)(z) = ĉ(z)(Lf)(z), (8.51)

where ĉ is the Laplace transform of c.
Let D_0 = d/dt be differentiation with zero boundary conditions at the origin. Then

(LD_0 f)(z) = z(Lf)(z). (8.52)

Notice that the boundary condition is essential for integration by parts. The spectrum of D_0 consists of the closed right half plane. We can write exp(−aD_0) = S_a for a ≥ 0. This operator satisfies the differential equation dS_a f/da = −D_0 S_a f for a ≥ 0, provided that f is in the domain of D_0 (and in particular satisfies the boundary condition).
The adjoint of a subnormal operator need not be subnormal. Thus, for instance, the adjoint of S_a is

(S_a∗ f)(t) = f(t + a). (8.53)

Its spectrum is also the closed unit disk, but the interior of the disk consists of point spectrum. Notice that the Laplace transform does not send this into a multiplication operator. In fact,

(LS_a∗ f)(z) = ∫_0^∞ e^{−zt} f(t + a) dt = e^{az} [ (Lf)(z) − ∫_0^a e^{−zt} f(t) dt ]. (8.54)
Similarly, the adjoint of D_0 is −D, where D = d/dt with no boundary condition. Again the Laplace transform does not make this into a multiplication operator. In fact, we have

(LDf)(z) = z(Lf)(z) − f(0). (8.55)

The spectrum of D consists of the closed left half plane. The interior consists of point spectrum. We can write exp(aD) = exp(−aD_0∗) = S_a∗ for a ≥ 0. Even though D does not have a spectral representation as a multiplication operator, this operator satisfies the differential equation dS_a∗ f/da = DS_a∗ f for a ≥ 0, provided f is in the domain of D.
Example: Solve the differential equation (D + k)f = g, with k > 0, with boundary condition f(0) = c. The operator D + k is not invertible, since −k is an eigenvalue of D. However we can write f = u + ce^{−kt} and solve

(D_0 + k)u = g. (8.56)

The solution is u = (D_0 + k)^{−1} g. In terms of Laplace transforms this is û(z) = 1/(z + k) ĝ(z). It follows that u is given by the causal convolution

u(t) = ∫_0^t e^{−k(t−u)} g(u) du. (8.57)

Thus

f(t) = f(0)e^{−kt} + ∫_0^t e^{−k(t−u)} g(u) du. (8.58)
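The solution formula (8.58) can be checked numerically. The sketch below picks g(t) = 1, k = 2, c = 1 (illustrative choices of data), forms the causal convolution as a Riemann sum, and compares with the closed-form solution of f′ + kf = g:

```python
import numpy as np

# Check f(t) = f(0) e^{-k t} + int_0^t e^{-k(t-u)} g(u) du
# for (D + k) f = g, with g(t) = 1, k = 2, f(0) = 1.
k, c = 2.0, 1.0
t = np.linspace(0.0, 3.0, 3001)
dt = t[1] - t[0]
g = np.ones_like(t)

# Causal convolution e^{-k t} * g as a Riemann sum
conv = dt * np.convolve(np.exp(-k * t), g)[:t.size]
f = c * np.exp(-k * t) + conv

# Exact solution of f' + 2 f = 1 with f(0) = 1
exact = np.exp(-k * t) + (1.0 - np.exp(-k * t)) / k
print(np.max(np.abs(f - exact)))  # O(dt) discretization error
```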
8.13 Quantum mechanics
Here is a dictionary of the basic concepts. Fix a Hilbert space. A quantum observable is a selfadjoint operator L. A quantum state is a unit vector u. The expectation of the observable L in the state u in D(L) is

µ = (u, Lu). (8.59)

The variance of the observable L in the state u in D(L) is

σ² = ‖(L − µI)u‖². (8.60)
Here are some observables. The Hilbert space is L²(R, dx). The momentum observable is p = −iℏ d/dx. Here ℏ > 0 is Planck's constant. The position observable is q, which is multiplication by the coordinate x. The Heisenberg uncertainty principle says that for every state the product σ_p σ_q ≥ ℏ/2.
If L is an observable and f is a real function, then f(L) is an observable. Take the case where the function is 1_A, the indicator function of a set A of position coordinate values. Then 1_A(q) has expectation

(u, 1_A(q)u) = ∫_A |u(x)|² dx. (8.61)

This is the probability that the position is in A. Similarly, take the case where the function is 1_B, the indicator function of a set B of momentum coordinate values. Since in the Fourier transform representation p is represented by multiplication by ℏk, it follows that 1_B(p) has expectation

(u, 1_B(p)u) = ∫_{ℏk ∈ B} |û(k)|² dk/(2π). (8.62)

This is the probability that the momentum is in B.
Energy observables are particularly important. The kinetic energy observable is H_0 = p²/(2m) = −ℏ²/(2m) d²/dx². Here m > 0 is the mass. The spectrum of H_0 is the positive real axis. The potential energy observable is V = v(q), which is multiplication by v(x). Here v is a given real function that represents potential energy as a function of the space coordinate. The spectrum of V is the range of the function v. The total energy observable or quantum Hamiltonian is

H = H_0 + V = p²/(2m) + v(q) = −(ℏ²/(2m)) d²/dx² + v(x). (8.63)
If we assume that the function v is bounded, then H is a selfadjoint operator. (In many cases when v is only bounded below it remains a selfadjoint operator.) The problem of investigating its spectrum is of the utmost importance for quantum mechanics. Since H_0 and V do not commute, this is not an easy problem.
Suppose that the total energy observable H is a selfadjoint operator. Then the time evolution operator is the unitary operator exp(−itH/ℏ). A central problem of quantum mechanics is to compute this operator. This is not easy, because while exp(−itH_0/ℏ) and exp(−itV/ℏ) are easy to compute, the operators H_0 and V do not commute. So there is no direct algebraic way to express exp(−itH/ℏ) in terms of the simpler operators. Nevertheless, we shall encounter a beautiful formula for this time evolution operator in the next chapter.
8.14 Problems
1. A particularly fascinating selfadjoint operator is the quantum harmonic oscillator. (This operator also occurs in disguised form in other contexts.) It is

N = (1/2)(−d²/dx² + x² − 1)

acting in L²(−∞, ∞). Show that it factors as

N = A∗A,

where

A = (1/√2)(x + d/dx)

and

A∗ = (1/√2)(x − d/dx).
2. Show that AA∗ = A∗A + I.
3. Solve the equation Au_0 = 0. Show that Nu_0 = 0.
4. Show that if Nu_n = nu_n and u_{n+1} = A∗u_n, then Nu_{n+1} = (n + 1)u_{n+1}. Thus the eigenvalues of N are the natural numbers. These are the standard type of point spectrum.
5. Show that each eigenfunction u_n is a polynomial in x times u_0(x). Find the polynomials for the cases n = 0, 1, 2, 3 explicitly (up to constant factors). Verify that each u_n belongs to the Hilbert space.
6. It may be shown that A∗ is a subnormal operator, and so A is the adjoint of a subnormal operator. Find all eigenvalues (point spectrum) of A. Find each corresponding eigenvector. Verify that it belongs to the Hilbert space.
7. Find all eigenvalues (point spectrum) of A∗. Find the spectrum of A∗. What kind of spectrum is it?
8. If A∗ is indeed a subnormal operator, then we should have A∗A ≤ AA∗ as quadratic forms. Is this the case?
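The eigenvalue structure asserted in problems 3 and 4 can be probed numerically: a finite-difference discretization of N on a large interval (the grid parameters are my own illustrative choices) has lowest eigenvalues close to 0, 1, 2, 3:

```python
import numpy as np

# Finite-difference model of N = (1/2)(-d^2/dx^2 + x^2 - 1) on [-6, 6].
n = 600
x = np.linspace(-6.0, 6.0, n)
h = x[1] - x[0]

D2 = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
      + np.diag(np.ones(n - 1), -1)) / h**2
N = 0.5 * (-D2 + np.diag(x**2) - np.eye(n))

evals = np.linalg.eigvalsh(N)[:4]
print(evals)  # approximately 0, 1, 2, 3
```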
Chapter 9
Calculus of Variations
9.1 The Euler-Lagrange equation
The problem is to find the critical points of

F(y) = ∫_{x_1}^{x_2} f(y, y_x, x) dx. (9.1)

The differential of F is

dF(y)h = ∫_{x_1}^{x_2} (f_y h + f_{y_x} h_x) dx = ∫_{x_1}^{x_2} (f_y − (d/dx) f_{y_x}) h dx + f_{y_x} h |_{x_1}^{x_2}. (9.2)
Thus for the differential to be zero we must have the Euler-Lagrange equation

f_y − (d/dx) f_{y_x} = 0. (9.3)

This is an equation for the critical function y. It is second order, and it has the explicit form

f_y − f_{y y_x} (dy/dx) − f_{y_x y_x} (d²y/dx²) − f_{x y_x} = 0. (9.4)

This equation is linear in d²y/dx². However the coefficients are in general nonlinear expressions in y, dy/dx, and x.
If the y are required to have fixed values y = y_1 at x = x_1 and y = y_2 at x = x_2 at the end points, then the h are required to be zero at the end points. The corresponding boundary term is automatically zero.
If the y and h are free to vary at an end point, then at a critical point one must have f_{y_x} = 0 at the end points.
Sometimes one wants to think of

δF/δy(x) = ∂f/∂y − (d/dx) ∂f/∂y_x (9.5)
as the gradient of F, where the inner product is given by the integral. This expression is then known as the variational derivative. The Euler-Lagrange equation then says that the variational derivative is zero.
Example. Consider the problem of minimizing the length

F(y) = ∫_{x_1}^{x_2} √(1 + y_x²) dx (9.6)

between the points (x_1, y_1) and (x_2, y_2). The boundary conditions are y(x_1) = y_1 and y(x_2) = y_2. The solution of the Euler-Lagrange equation is y_x = C, a curve of constant slope. So the solution is y − y_1 = m(x − x_1), where m = (y_2 − y_1)/(x_2 − x_1).
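A discrete check that the straight line is the minimizer: the polygonal length functional increases under any perturbation vanishing at the end points. The end points (0, 0) and (1, 2) and the bump are arbitrary choices of mine:

```python
import numpy as np

# Discrete length of the polygon through (x_i, y_i); this equals
# sum sqrt(1 + y_x^2) dx for the piecewise-linear interpolant.
x = np.linspace(0.0, 1.0, 401)
dx = x[1] - x[0]

def length(y):
    yx = np.diff(y) / dx
    return np.sum(np.sqrt(1.0 + yx**2)) * dx

line = 2.0 * x                        # straight line from (0,0) to (1,2)
bump = 0.1 * np.sin(np.pi * x)        # perturbation vanishing at the ends

print(length(line), length(line + bump))  # sqrt(5) versus something larger
```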
Example. Consider the problem of minimizing the length

F(y) = ∫_{x_1}^{x_2} √(1 + y_x²) dx (9.7)

between the lines x = x_1 and x = x_2. The boundary conditions for the Euler-Lagrange equation are y_x(x_1) = 0 and y_x(x_2) = 0. The solution of the Euler-Lagrange equation is y_x = C, a curve of constant slope. But to satisfy the boundary conditions C = 0. So the solution is y = A, where A is an arbitrary constant.
9.2 A conservation law
Say that one has a solution of the Euler-Lagrange equation. Then

(d/dx) f = f_y y_x + f_{y_x} y_{xx} + f_x = (d/dx)(y_x f_{y_x}) + f_x. (9.8)

Thus

(d/dx) H + f_x = 0, (9.9)

where

H = y_x f_{y_x} − f. (9.10)

If f_x = 0, then this says that H is constant. This is the conservation law. It is a nonlinear equation for y and dy/dx. It takes the explicit form

y_x f_{y_x} − f = C. (9.11)
Thus it is first order, but in general fully nonlinear in y_x. How in practice does one solve a problem in calculus of variations? There is a way when the function f(y, y_x) does not depend on x. Use the conservation law H(y, dy/dx) = C to solve for dy/dx = α(y). Perhaps this equation can be solved directly. Or write dx = dy/α(y). Integrate both sides to get x in terms of y. Or make a substitution expressing y in terms of a new variable u, and get x and y each in terms of u.
Example 1: This example comes from minimizing the area of a surface of revolution. The unknown is a function y of x between −a and a. The value of y at ±a is r. The element of area is proportional to 2πy ds = 2πy √(1 + y_x²) dx. Let f(y, y_x) = 2πy √(1 + y_x²). Then H = −2πy/√(1 + y_x²) = −C. The differential equation to be solved is (dy/dx)² = k²y² − 1, where k = 2π/C. This has solution y = (1/k) cosh(k(x − x_0)). By symmetry x_0 = 0. So we need to solve rk = cosh(ak). This equation has to be solved for k. Fix r and vary a. When a is small enough, then cosh(ak) cuts the line rk in two points. Thus there are two solutions of the Euler-Lagrange equations, corresponding to y = (1/k) cosh(kx) with the two values of k. The derivative is dy/dx = sinh(kx). The smaller value of k gives the smaller derivative, so this is the minimum area surface satisfying the boundary conditions.
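The transcendental equation rk = cosh(ak) is easy to solve by bisection. A sketch with r = 1 and a = 0.3 (arbitrary values, with a small enough that the two roots exist):

```python
import numpy as np

# Solve r k = cosh(a k) for k by bisection on g(k) = cosh(a k) - r k.
# The smaller root gives the minimal-area catenoid y = (1/k) cosh(k x).
r, a = 1.0, 0.3

def g(k):
    return np.cosh(a * k) - r * k

def bisect(lo, hi, f, tol=1e-12):
    # f must change sign on [lo, hi]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(lo) * f(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

# g(0.1) > 0 > g(3), and g(3) < 0 < g(30), bracketing the two roots
k1 = bisect(0.1, 3.0, g)      # smaller root
k2 = bisect(3.0, 30.0, g)     # larger root
print(k1, k2)
```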
Example 2: This example comes up in finding the maximum area for given arc length. For simplicity consider y as a function of x between −a and a, with value 0 at the two end points. For simplicity let the arc length κ satisfy 2a < κ < πa. The area is the integral from −a to a of y dx, while the length is the integral from −a to a of ds = √(1 + y_x²) dx. This is a Lagrange multiplier problem. The function is f(y, y_x) = y − λ√(1 + y_x²). The conserved quantity is H = λ/√(1 + y_x²) − y = −c_1. The differential equation is dx/dy = (y − c_1)/√(λ² − (y − c_1)²). Thus x − c_2 = √(λ² − (y − c_1)²), which is the equation of a circle (x − c_2)² + (y − c_1)² = λ². The Lagrange multiplier turns out to be the radius of the circle. By symmetry c_2 = 0. Furthermore c_1 = −√(λ² − a²). If we let sin(θ) = a/λ, then the equation sin(θ) < θ < (π/2) sin(θ) translates into 2a < κ = 2λθ < πa.
9.3 Second variation
Say that y is a solution of the Euler-Lagrange equation that has fixed values at the end points. Then the second differential of F is obtained by expanding F(y + h) = F(y) + dF(y)h + (1/2) d²F(y)(h, h) + · · ·. The result is

d²F(y)(h, h) = ∫_{x_1}^{x_2} (f_{yy} h² + 2f_{y y_x} h h_x + f_{y_x y_x} h_x²) dx. (9.12)
The functions h have value 0 at the end points, so we may freely integrate by parts. The result may thus also be written

d²F(y)(h, h) = ∫_{x_1}^{x_2} ((f_{yy} − (d/dx) f_{y y_x}) h² + f_{y_x y_x} h_x²) dx. (9.13)
Yet another form is

d²F(y)(h, h) = ∫_{x_1}^{x_2} [−(d/dx)(f_{y_x y_x} h_x) + (f_{yy} − (d/dx) f_{y y_x}) h] h dx = (Lh, h). (9.14)
We recognize

Lh = −(d/dx)(f_{y_x y_x} h_x) + (f_{yy} − (d/dx) f_{y y_x}) h (9.15)
as a Sturm-Liouville operator. In fact, if f(y, y_x, x) = (1/2) y_x² + g(y, x), then

Lh = −h_{xx} + g_{yy} h (9.16)

is a Schrödinger operator. The coefficient g_{yy} has the solution of the Euler-Lagrange equation inserted, so it is regarded as a function of x. The operator has Dirichlet boundary conditions at the end points of the interval from x_1 to x_2.
9.4 Interlude: The Legendre transform
The Legendre transform of a function L of v is defined as follows. Let p = dL/dv. Find the inverse function defining v as a function of p. Then the Legendre transform is a function H of p satisfying dH/dp = v. The constant of integration is chosen so that H(0) + L(0) = 0.
Example: Let L = e^v − v − 1. Then the derivative is p = e^v − 1. The inverse function is v = log(1 + p). The integral is H = (1 + p) log(1 + p) − p.
There is a remarkable formula relating the Legendre transform of a function to the original function. Let L be a function of v. Define

p = dL/dv. (9.17)

Then

H = pv − L (9.18)

is the Legendre transform of L. In the following we want to think of L as a function of v, while H is a function of the dual variable p. The variable p is covariant and is dual to the variable v, which is contravariant. Thus in this formula v is defined in terms of p by solving the equation p = dL/dv for v.
The fundamental theorem about the Legendre transform is

v = dH/dp. (9.19)
Proof: Let H = pv − L as above. By the product rule and the chain rule and the definition of p we get

dH/dp = v + p (dv/dp) − (dL/dv)(dv/dp) = v + p (dv/dp) − p (dv/dp) = v. (9.20)
Thus the situation is symmetric, and one can go back from H to L in the
same way as one got from L to H. Conclusion: Two functions are Legendre
transforms of each other when their derivatives are inverse functions to each
other.
One would like a condition that guarantees that the equations that define the Legendre transform actually have solutions. In order for p = dL/dv to have a unique solution v, it would be useful for dL/dv to be strictly increasing. This is the same as saying that dp/dv = d²L/dv² > 0. The inverse function then has derivative given by the inverse dv/dp = d²H/dp² > 0.
Example: Let L = e^v − v − 1. Then we have seen that H = (p + 1) log(p + 1) − p. This also follows from the general formula H = pv − L. In fact, since p = e^v − 1 has inverse v = log(1 + p), we have H = pv − L = p log(1 + p) − [(1 + p) − log(1 + p) − 1] = (1 + p) log(1 + p) − p. In this example the second derivatives are e^v and 1/(p + 1). These are both positive, and they are reciprocals of each other.
Example: Let L = v^a/a with 1 < a < ∞ defined for v ≥ 0. Then H = p^b/b with 1 < b < ∞ defined for p ≥ 0. In this case p = v^{a−1} and v = p^{b−1}. The relation between a and b is (a − 1)(b − 1) = 1. This may also be written as 1/a + 1/b = 1. Thus a and b are conjugate exponents. The second derivatives are (a − 1)v^{a−2} and (b − 1)p^{b−2}. Again they are reciprocals.
The Legendre transform plays a fundamental role in thermodynamics and in mechanics. Here are examples from mechanics involving kinetic energy.
Example: Start with L = (1/2)mv². Then p = mv. So v = p/m and H = pv − L = p²/(2m). This H is the Legendre transform of L.
Example: If we start instead with H = p²/(2m), then v = p/m. So p = mv and L = pv − H = (1/2)mv². The Legendre transform of L brings us back to the original H.
Example: A famous relativistic expression for energy is H = √(m²c⁴ + p²c²). It is an amusing exercise to compute that L = −mc²√(1 − v²/c²). The key is the relation p = mv/√(1 − v²/c²) between the derivatives.
The Legendre transform has a generalization to several dimensions. Let L be a function of v_1, . . . , v_n. Define

p_j = ∂L/∂v_j, (9.21)

so dL = Σ_j p_j dv_j. (This is the differential of L, so it is a one-form.) Then

H = Σ_k p_k v_k − L (9.22)

is the Legendre transform of L. In the following we want to think of L as a function of the v_j, while H is a function of the dual variables p_k. The variables p_k are covariant and dual to the variables v_j, which are contravariant. Thus in this formula the v_j are defined in terms of the p_k by solving the equation p_k = ∂L/∂v_k for the v_j.
The fundamental theorem about the Legendre transform in this context is

v_k = ∂H/∂p_k. (9.23)

Proof: Let H = Σ_k p_k v_k − L as above. By the product rule and the chain rule and the definition of p we get

∂H/∂p_j = v_j + Σ_k p_k (∂v_k/∂p_j) − Σ_k (∂L/∂v_k)(∂v_k/∂p_j) = v_j + Σ_k p_k (∂v_k/∂p_j) − Σ_k p_k (∂v_k/∂p_j) = v_j. (9.24)
Thus the situation is symmetric, and one can go back from H to L in the same way as one got from L to H. Conclusion: Two functions are Legendre transforms of each other when their derivatives are inverse functions to each other.
One would like a condition that guarantees that the equations that define the Legendre transform actually have solutions. In order for the equation to have a solution locally, the matrix

∂p_k/∂v_j = ∂²L/∂v_k ∂v_j (9.25)

should be invertible. A particularly nice condition is that this matrix is strictly positive definite. The inverse matrix is then

∂v_j/∂p_k = ∂²H/∂p_j ∂p_k. (9.26)

It is then also strictly positive definite.
9.5 Lagrangian mechanics
One is given a Lagrangian function L that is a function of position q and velocity q̇ and possibly time t. The problem is to find the critical points of

S(q) = ∫_{t_1}^{t_2} L(q, q̇, t) dt. (9.27)

The function q represents position as a function of time. It has fixed values q_1 and q_2 at the end points t_1 and t_2. The differential of S is
dS(q)h = ∫_{t_1}^{t_2} (L_q h + L_{q̇} h_t) dt = ∫_{t_1}^{t_2} (L_q − (d/dt) L_{q̇}) h dt. (9.28)

Here h is a function with values 0 at the end points. Thus for the differential to be zero we must have the Euler-Lagrange equation

∂L/∂q − (d/dt) ∂L/∂q̇ = 0. (9.29)
Say that one has a solution of the Euler-Lagrange equation. Then

(d/dt) L = L_q q_t + L_{q̇} q_{tt} + L_t = (d/dt)(q_t L_{q̇}) + L_t. (9.30)

Thus along a solution

(d/dt) H + ∂L/∂t = 0, (9.31)

where

H = q_t L_{q̇} − L. (9.32)

If L_t = 0, then this says that H is constant. This is the energy conservation law.
Define

p = ∂L/∂q̇. (9.33)

This is the momentum variable. Then the Euler-Lagrange equation says that along a solution

dp/dt = ∂L/∂q. (9.34)
9.6 Hamiltonian mechanics
Let

H = pq̇ − L (9.35)

be the Legendre transform of L, where q̇ is defined in terms of p implicitly by p = ∂L/∂q̇.
In the following we want to think of L as a function of q and q̇ and possibly t, while H is a function of q and p and possibly t. The momentum variable p is covariant and is dual to the velocity variable q̇, which is contravariant. However p is expressed in terms of q and q̇ by solving the Legendre transform equation p = ∂L/∂q̇. According to the properties of the Legendre transform we have

q̇ = ∂H/∂p. (9.36)

In fact, the proof is simple if we remember that q is fixed:

∂H/∂p = q̇ + p (∂q̇/∂p) − (∂L/∂q̇)(∂q̇/∂p) = q̇. (9.37)
It also follows that

∂H/∂q = −∂L/∂q. (9.38)

This also has an easy proof if we remember that p is fixed and so q̇ depends on q:

∂H/∂q = p (∂q̇/∂q) − ∂L/∂q − (∂L/∂q̇)(∂q̇/∂q) = −∂L/∂q. (9.39)
The Euler-Lagrange equation says that along a solution we have

dp/dt = −∂H/∂q. (9.40)

Since p = ∂L/∂q̇ determines the function p in terms of q, dq/dt, and t, it follows that along a solution

dq/dt = ∂H/∂p (9.41)

determines the function dq/dt in terms of q, p, and t. The last two equations are Hamilton's equations. They are a first order system, linear in dq/dt and dp/dt, but nonlinear in q, p, t.
It is easy to check from Hamilton's equations that along a solution

dH/dt = ∂H/∂t. (9.42)

When H does not depend explicitly on t, then this has an integral in which H is constant. This integral is conservation of energy. It gives a situation in which p is related to q. This allows an equation in which dq/dt is related to q. This relation may be highly nonlinear.
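Hamilton's equations are easy to integrate numerically, and conservation of H is a good test of the integrator. A sketch for the harmonic oscillator H = p²/(2m) + (1/2)kq², using the leapfrog scheme (the parameters are my own illustrative choices):

```python
import numpy as np

# Integrate dq/dt = dH/dp, dp/dt = -dH/dq for H = p^2/(2m) + (1/2) k q^2
# with the leapfrog (kick-drift-kick) scheme, and watch H stay constant.
m, k = 1.0, 4.0
dt, steps = 1e-3, 20000
q, p = 1.0, 0.0

def energy(q, p):
    return p**2 / (2 * m) + 0.5 * k * q**2

E0 = energy(q, p)
for _ in range(steps):
    p -= 0.5 * dt * k * q       # half kick: dp/dt = -dH/dq = -k q
    q += dt * p / m             # drift:     dq/dt =  dH/dp = p/m
    p -= 0.5 * dt * k * q       # half kick
print(E0, energy(q, p))  # equal to high accuracy
```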
9.7 Kinetic and potential energy
The most classical situation is when

L = (1/2)mq̇² − V(q). (9.43)

The Euler-Lagrange equation is just

−V′(q) − m (dq_t/dt) = 0. (9.44)

This is Newton's law of motion.
In this problem the momentum is p = mq̇. The Hamiltonian is

H = (1/(2m)) p² + V(q). (9.45)
Let us look at this from the point of view of maximum and minimum. The second differential is determined by the Schrödinger operator

L = −m (d²/dt²) − V″(q(t)). (9.46)

Here q(t) is a function of t satisfying the Euler-Lagrange equation and the boundary conditions. So for instance if this operator with Dirichlet boundary conditions at t_1 and t_2 has strictly positive eigenvalues, then the solution will be a minimum. This will happen, for instance, if V″(q) ≤ 0. In many other cases, the solution will not be a minimum of the action, but only a stationary point.
Example: Take an example with V″(q) < 0. Then the problem is to find q(t) that minimizes the integral from t_1 to t_2 of (1/2)mq̇² − V(q). Here q(t_1) = q_1 and q(t_2) = q_2 are fixed. Clearly there is a tradeoff. One would like the solution to linger as long as possible in the region where V(q) is large. On the other hand, to satisfy the boundary conditions the solution should not move too fast. The solution will start at the point q_1 at time t_1. Then it will go reasonably, but not excessively, rapidly to the region where V(q) is large. There it will linger. Then it will fall back to the point q_2, arriving there at time t_2. This minimization problem seems to have nothing to do with Newton's laws. But it gives the same answer.
9.8 Problems
Let L be a function of q and q̇ given by

L = (1/2)mq̇² − V(q).

Let

S(q) = ∫_{t_1}^{t_2} L dt,

with functions q of t satisfying q = q_1 at t = t_1 and q = q_2 at t = t_2.
1. Show that

dS(q)h = (−m (d/dt) q̇ − V′(q), h),

where h satisfies Dirichlet boundary conditions at t_1 and t_2.
2. Consider a q(t) for which dS(q) = 0. Show that

d²S(q)(h, h) = ∫_{t_1}^{t_2} [m (dh/dt)² − V″(q)h²] dt,

where the functions h satisfy Dirichlet boundary conditions at t_1 and t_2.
3. Consider a q(t) for which dS(q) = 0. Show that

d²S(q)(h, h) = (h, [−m (d²/dt²) − V″(q)]h),

where the operator satisfies Dirichlet boundary conditions at t_1 and t_2.
4. Show that if V (q) is concave down, then the solution q(t) of the variational
problem is actually a minimum.
5. Let H = mq̇² − L = (1/2)mq̇² + V(q). Show that H = E along a solution, where E is a constant.
6. From now on take the example V(q) = −(1/2)kq². Here k > 0. Note the sign. We are interested in solutions with E > 0. Let ω = √(k/m). Show that q = C sinh(ωt) is a solution, and find the constant C in terms of E.
7. Take t_1 = −T and t_2 = T. Take q_1 = −a and q_2 = a. Fix a. Write the boundary condition a = C sinh(ωT) as a relation between T and E. Show that T → 0 implies E → ∞, while T → ∞ implies E → 0.
8. Interpret the result of the last problem intuitively in terms of particle
motion satisfying conservation of energy.
9. Interpret the result of the same problem intuitively in terms of a minimization problem.
9.9 The path integral
Let us return to the problem of evaluating the quantum mechanical time evolution exp(−itH/ℏ). Here the total energy operator H = H_0 + V is the sum of two noncommuting operators, corresponding to kinetic energy and potential energy.
The problem is easy when we just have kinetic energy. Then we must evaluate exp(−itH_0/ℏ), where H_0 = p²/(2m) = −ℏ²/(2m) d²/dx². In the Fourier transform representation this is multiplication by ℏ²k²/(2m). So in the Fourier transform representation the unitary time evolution is multiplication by exp(−it(ℏ/m)k²/2). This is like the heat equation with complex diffusion coefficient iσ² = i(ℏ/m). So the solution is convolution by the inverse Fourier transform, which is 1/√(2πiσ²t) exp(−x²/(2iσ²t)) = 1/√(2πi(ℏ/m)t) exp(ix²/(2(ℏ/m)t)). In other words,

(exp(−itH_0/ℏ)u)(x) = ∫_{−∞}^{∞} exp(im(x − x′)²/(2tℏ)) u(x′) dx′/√(2πi(ℏ/m)t). (9.47)
The calculation of the exponential is also easy when we just have potential energy. In fact,

(exp(−itV/ℏ)u)(x) = exp(−itv(x)/ℏ) u(x). (9.48)
The Trotter product formula is a remarkable formula that expresses the result for the sum H = H_0 + V in terms of the separate results for H_0 and for V. It may be proved under various circumstances, for example when V is a bounded operator. The formula says that

exp(−itH/ℏ)u = lim_{n→∞} [ exp(−i(t/n)H_0/ℏ) exp(−i(t/n)V/ℏ) ]^n u. (9.49)
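The Trotter formula can be watched converging on small matrices, where the exponentials are computable directly. The 2×2 matrices below are arbitrary noncommuting choices of mine; the error decays roughly like 1/n:

```python
import numpy as np

# Matrix exponential by Taylor series (adequate for these small matrices)
def expm(M, terms=40):
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for j in range(1, terms):
        term = term @ M / j
        out = out + term
    return out

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0]])   # A and B do not commute

exact = expm(A + B)
errs = []
for n in (1, 10, 100, 1000):
    step = expm(A / n) @ expm(B / n)
    errs.append(np.max(np.abs(np.linalg.matrix_power(step, n) - exact)))
print(errs)  # shrinking roughly like 1/n
```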
We can write out the Trotter product formula in detail using the results obtained before for H_0 and for V separately. Write Δt = t/n. Then the quantity of interest is (exp(−itH/ℏ)u)(x) = u(x, t) given by the Trotter formula. This works out to be

u(x, t) = lim_{n→∞} ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} exp( (i/ℏ) Σ_{j=0}^{n−1} [ (m/2)((x_{j+1} − x_j)/Δt)² − V(x_j) ] Δt ) u(x_0) dx_{n−1} · · · dx_0 / (√(2πi(ℏ/m)Δt))^n, (9.50)

where x_n = x.
So far this has been rigorous. However now we follow Feynman and take the formal limit as n → ∞ and Δt → 0 with nΔt = t fixed. This gives

(exp(−itH/ℏ)u)(x) = ∫ exp( (i/ℏ) ∫_0^t [ (m/2)(dx(t′)/dt′)² − V(x(t′)) ] dt′ ) u(x_0) Dx, (9.51)
where the integral is over all paths x from some x
0
at time 0 to ﬁxed ﬁnal x at
time t. This can also be written
(exp(−
itH
¯h
)u)(x) =
_
exp
_
i
_
t
0
L(x(t
),
dx(t
)
dt
) dt
¯h
_
u(x
0
) Tx, (9.52)
where
\[
L\left(x(t'),\frac{dx(t')}{dt'}\right) = \frac{m}{2}\left(\frac{dx(t')}{dt'}\right)^2 - V(x(t')) \tag{9.53}
\]
is the Lagrangian. Thus the integrand in the path integral is the exponential of $i$ times the action divided by $\hbar$.
This expression goes some way toward an explanation of the principle of stationary action. Consider a part of the integral near a path that is not a critical point of the action. Then nearby paths will have considerably different values of the action, and after division by $\hbar$ the phase will be very different. So there will be a lot of cancellation. On the other hand, the paths near a critical point all contribute more or less the same value of the action. So there will be no cancellation.
9.10 Appendix: Lagrange multipliers
In this section $y$ will represent the coordinates $y_1, \ldots, y_n$. We are interested in finding the critical points of a scalar function $F(y) = F(y_1, \ldots, y_n)$.

Consider first the problem of finding a critical point of $F(y)$ with no constraints. The usual method is to compute the differential and set it equal to zero, so that the equation to be solved is $dF(y) = 0$.
Consider the problem of finding a critical point of $F(y)$ subject to a constraint $G(y) = \kappa$.

The method of Lagrange multipliers is to say that the differential of $F(y)$ without the constraint must satisfy
\[
dF(y) = \lambda\,dG(y) \tag{9.54}
\]
for some $\lambda$. That is, the function can vary only by relaxing the constraint. The change in the function must be proportional to the change in the constraint. This fundamental equation is an equation for differential forms, and it takes the same form in every coordinate system.
Example: Maximize $y_1 y_2$ subject to $y_1^2 + y_2^2 = \kappa$. The equation is $y_2\,dy_1 + y_1\,dy_2 = \lambda(2y_1\,dy_1 + 2y_2\,dy_2)$. This gives $y_2 = 2\lambda y_1$ and $y_1 = 2\lambda y_2$. Eliminating $\lambda$ we get $y_1^2 = y_2^2$. Combine this with $y_1^2 + y_2^2 = \kappa$. We get $y_1 = \pm\sqrt{\kappa/2}$ and $y_2 = \pm\sqrt{\kappa/2}$. The values of $y_1 y_2$ at these points are $\pm\kappa/2$. If you care, you can also compute that $\lambda = \pm 1/2$.
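A brute-force numeric check of this example (a sketch; the grid search over the circle is just for illustration): the maximum value is $\kappa/2$, and differencing it with respect to $\kappa$ recovers $\lambda = 1/2$.

```python
import math

def max_on_circle(kappa, m=100000):
    # maximize y1 y2 on the circle y1^2 + y2^2 = kappa by a grid search
    best = -float("inf")
    for i in range(m):
        th = 2 * math.pi * i / m
        y1 = math.sqrt(kappa) * math.cos(th)
        y2 = math.sqrt(kappa) * math.sin(th)
        best = max(best, y1 * y2)
    return best

kappa = 2.0
val = max_on_circle(kappa)            # should be kappa / 2 = 1.0

# lambda = dF/dkappa at the constrained maximum, estimated by a central difference
d = 1e-3
lam = (max_on_circle(kappa + d) - max_on_circle(kappa - d)) / (2 * d)
```

The difference quotient reproduces the multiplier because the maximum value $\kappa/2$ is linear in the constraint parameter.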
Say that $\bar y$ is a critical point, so in particular it satisfies the constraint $G(\bar y) = \kappa$. Then $dG(\bar y) = d\kappa$. It follows that $dF(\bar y) = \lambda\,dG(\bar y) = \lambda\,d\kappa$. The conclusion is that the Lagrange multiplier $\lambda$ is the rate of change of the value of the function at the critical point as a function of the constraint parameter, that is, $\lambda = dF(\bar y)/d\kappa$.
Example: In the last example the derivative of the maximum value $\pm\kappa/2$ with respect to $\kappa$ is $\lambda = \pm 1/2$.

In calculations, one wants to maximize $F(y)$ subject to the constraint $G(y) = \kappa$. The idea is to take $\lambda$ arbitrary. Set $\bar F(y) = F(y) - \lambda G(y)$ and require $d\bar F(y) = 0$. This gives the above equation for $y$ and $\lambda$. However one also has the constraint equation for $y$ and $\lambda$. These two are then solved simultaneously.
The method of Lagrange multipliers extends to the case when there are several constraints.

Consider the problem of finding a critical point of $F(y)$ subject to constraints $G_1(y) = \kappa_1, \ldots, G_m(y) = \kappa_m$.

The method of Lagrange multipliers is to say that the differential of $F(y)$ without the constraints must satisfy
\[
dF(y) = \sum_{j=1}^{m}\lambda_j\,dG_j(y) \tag{9.55}
\]
for some $\lambda_1, \ldots, \lambda_m$. That is, the function can vary only by relaxing the constraints. The change in the function must be proportional to the changes in the constraints. This fundamental equation is an equation for differential forms, and it takes the same form in every coordinate system.
Say that $\bar y$ is a critical point, so in particular it satisfies the constraints $G_j(\bar y) = \kappa_j$. Then $dG_j(\bar y) = d\kappa_j$. It follows that $dF(\bar y) = \sum_j \lambda_j\,dG_j(\bar y) = \sum_j \lambda_j\,d\kappa_j$. The conclusion is that the Lagrange multiplier $\lambda_j$ is the rate of change of the value of the function at the critical point as a function of the constraint parameter $\kappa_j$, with the other constraint parameters fixed. Thus $\lambda_j = \partial F(\bar y)/\partial\kappa_j$.
Chapter 10
Perturbation theory
10.1 The implicit function theorem: scalar case
Let $f(u, \epsilon)$ be a function of $u$ and $\epsilon$. Let $f_u(u, \epsilon)$ be the partial derivative of $f$ as a function of $u$. The assumption for the implicit function theorem is that $f(u_0, 0) = 0$ and the partial derivative $f_u(u_0, 0) \neq 0$. There are some other technical assumptions. The conclusion is that the equation
\[
f(u, \epsilon) = 0 \tag{10.1}
\]
defines $u$ as a function of $\epsilon$ for $\epsilon$ sufficiently near 0, in such a way that $\epsilon = 0$ is mapped into $u_0$.
The intuition behind the theorem is that one can expand
\[
f(u, \epsilon) \approx f_u(u_0, 0)(u - u_0) + f_\epsilon(u_0, 0)\epsilon. \tag{10.2}
\]
Then we can solve
\[
u \approx u_0 - f_u(u_0, 0)^{-1} f_\epsilon(u_0, 0)\epsilon. \tag{10.3}
\]
For small $\epsilon$ this is a very good approximation. In fact, this says that at the point $\epsilon = 0$ we have
\[
\frac{du}{d\epsilon} = -f_u(u_0, 0)^{-1} f_\epsilon(u_0, 0). \tag{10.4}
\]
This result is called first order perturbation theory.

For practical calculations one wants to compute the higher derivatives $u^{(n)}$ as a function of $\epsilon$. These are used in the Taylor expansion
\[
u = \sum_{n=0}^{\infty}\frac{u^{(n)}}{n!}\epsilon^n. \tag{10.5}
\]
In first order we have
\[
f_u u^{(1)} + f_\epsilon = 0, \tag{10.6}
\]
which gives the first order result
\[
u^{(1)} = -f_u^{-1} f_\epsilon. \tag{10.7}
\]
Usually we are interested in evaluating this expression at $u = u_0$ and $\epsilon = 0$.
The second order result is obtained by differentiating the first order equation. This gives
\[
f_u u^{(2)} + f_{uu}(u^{(1)})^2 + 2f_{u\epsilon}u^{(1)} + f_{\epsilon\epsilon} = 0. \tag{10.8}
\]
Thus the result of second order perturbation theory is that
\[
u^{(2)} = -f_u^{-1}\left[f_{uu}(u^{(1)})^2 + 2f_{u\epsilon}u^{(1)} + f_{\epsilon\epsilon}\right]. \tag{10.9}
\]
Again we are interested in evaluating this expression at $u = u_0$ and $\epsilon = 0$.
The general pattern is this. The $n$th derivative is given by
\[
f_u u^{(n)} + r_n = 0. \tag{10.10}
\]
Differentiate again. This gives
\[
f_u u^{(n+1)} + [f_{uu}u^{(1)} + f_{u\epsilon}]u^{(n)} + [r_{n,u}u^{(1)} + r_{n,\epsilon}] = 0. \tag{10.11}
\]
Thus at every stage one only has to invert the first derivative $f_u$.
Example: Here is a simple example. Fix $a$. We want to solve $u = a + \epsilon g(u)$ to find $u$ as a function of $\epsilon$. The first derivative is $u' = g(u) + \epsilon g'(u)u'$. The second derivative is $u'' = 2g'(u)u' + \epsilon g''(u)(u')^2 + \epsilon g'(u)u''$. The third derivative is $u''' = 3g''(u)(u')^2 + 3g'(u)u'' + \epsilon$ times other junk. Evaluate these at $\epsilon = 0$, that is, at $u = a$. This gives $u' = g(a)$, $u'' = 2g(a)g'(a)$, and $u''' = 3g(a)^2 g''(a) + 6g(a)g'(a)^2$. It is difficult to see the pattern. However in the next section we will see that there is one, at least in this case.
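These coefficients can be checked numerically. A sketch with the concrete (assumed) choice $g(u) = u^2$ and $a = 1$, so $g(a) = 1$, $g'(a) = 2$, $g''(a) = 2$: the third order Taylor polynomial built from $u'$, $u''$, $u'''$ is compared against the solution of $u = a + \epsilon g(u)$ found by fixed-point iteration.

```python
eps = 0.01
a = 1.0
g = lambda u: u * u        # g(u) = u^2, so g'(u) = 2u and g''(u) = 2

# solve u = a + eps * g(u) by fixed-point iteration (contractive for small eps)
u = a
for _ in range(200):
    u = a + eps * g(u)

# the derivatives from the text, evaluated at u = a:
up = g(a)                                              # u'   = g(a) = 1
upp = 2 * g(a) * (2 * a)                               # u''  = 2 g(a) g'(a) = 4
uppp = 3 * g(a) ** 2 * 2 + 6 * g(a) * (2 * a) ** 2     # u''' = 30

series = a + up * eps + upp / 2 * eps**2 + uppp / 6 * eps**3
```

The truncation error of the cubic polynomial is of fourth order in $\epsilon$, so for $\epsilon = 0.01$ the two numbers agree to about seven digits.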
Example: Another popular way of solving the problem is to substitute $u = a + u'\epsilon + \frac{1}{2}u''\epsilon^2 + \frac{1}{6}u'''\epsilon^3 + \cdots$ in the equation. This gives
\[
a + u'\epsilon + \frac{1}{2}u''\epsilon^2 + \frac{1}{6}u'''\epsilon^3 + \cdots = a + \epsilon\left[g(a) + g'(a)\left(u'\epsilon + \frac{1}{2}u''\epsilon^2 + \cdots\right) + \frac{1}{2}g''(a)(u'\epsilon + \cdots)^2 + \cdots\right]. \tag{10.12}
\]
Equating powers of $\epsilon$ gives the same equations, and these are solved recursively in the same way.
10.2 Problems
There is at least one case when one can evaluate Taylor coefficients to all orders. Say that $x$ is a fixed parameter and one wants to solve
\[
u = x + tg(u) \tag{10.13}
\]
for $u$ as a function of $t$. When $t = 0$ the solution is $u = x$. The Lagrange expansion says that for arbitrary $n \geq 1$ we have
\[
\left.\frac{d^n u}{dt^n}\right|_{t=0} = \frac{d^{n-1}}{dx^{n-1}}g(x)^n. \tag{10.14}
\]
So the expansion to all orders is
\[
u = x + tg(x) + \frac{t^2}{2}\frac{d}{dx}g(x)^2 + \frac{t^3}{6}\frac{d^2}{dx^2}g(x)^3 + \cdots + \frac{t^n}{n!}\frac{d^{n-1}}{dx^{n-1}}g(x)^n + \cdots. \tag{10.15}
\]
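The expansion can be tested with a choice of $g$ for which the derivatives are easy: with $g(u) = e^u$ one has $d^{n-1}/dx^{n-1}\,g(x)^n = n^{n-1}e^{nx}$. A numeric sketch comparing the truncated series against a fixed-point solution of $u = x + te^u$ (the parameter values are arbitrary, chosen small enough that both converge):

```python
import math

t = 0.1
x = 0.0

# solve u = x + t e^u by fixed-point iteration (contractive since t e^u < 1 here)
u = x
for _ in range(200):
    u = x + t * math.exp(u)

# Lagrange expansion: u = x + sum_{n >= 1} t^n / n! * n^{n-1} e^{n x}
series = x + sum(t**n / math.factorial(n) * n ** (n - 1) * math.exp(n * x)
                 for n in range(1, 20))
```

Both numbers are the tree function value $\sum_{n\ge 1} n^{n-1}t^n/n!$ at $t = 0.1$, so they agree to high precision.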
1. Let $w = g(u)$. Show that the equation becomes
\[
w = g(x + wt). \tag{10.16}
\]

2. Show that the function $w$ is the solution of the partial differential equation
\[
\frac{\partial w}{\partial t} = w\frac{\partial w}{\partial x} \tag{10.17}
\]
with initial condition $w = g(x)$ at $t = 0$.
3. Prove the conservation laws
\[
\frac{\partial w^n}{\partial t} = \frac{n}{n+1}\frac{\partial w^{n+1}}{\partial x}. \tag{10.18}
\]

4. Prove that for $n \geq 1$ we have
\[
\frac{\partial^{n-1}w}{\partial t^{n-1}} = \frac{1}{n}\frac{\partial^{n-1}}{\partial x^{n-1}}w^n. \tag{10.19}
\]
5. Prove that the expansion about $t = 0$ is
\[
w = g(x) + \frac{t}{2}\frac{d}{dx}g(x)^2 + \frac{t^2}{6}\frac{d^2}{dx^2}g(x)^3 + \cdots + \frac{t^{n-1}}{n!}\frac{d^{n-1}}{dx^{n-1}}g(x)^n + \cdots. \tag{10.20}
\]

6. Use $u = x + tw$ to prove the Lagrange expansion.
Note: The partial differential equation has a physical interpretation. If $v = -w$, then the equation is
\[
\frac{\partial v}{\partial t} + v\frac{\partial v}{\partial x} = 0 \tag{10.21}
\]
with initial condition $v = h(x)$ at time zero. This is the equation for the velocity of a gas of particles moving freely in one dimension. The motion of a particle moving with the gas is given by
\[
\frac{dx}{dt} = v. \tag{10.22}
\]
The acceleration of a particle is
\[
\frac{dv}{dt} = \frac{\partial v}{\partial t} + \frac{dx}{dt}\frac{\partial v}{\partial x} = 0. \tag{10.23}
\]
That is why it is considered free particle motion. Since $v$ is constant along particle paths $x = x_0 + vt$, we have that $v$ at arbitrary $x, t$ is given in terms of the initial $v = h(x_0)$ at time zero by $v = h(x_0) = h(x - vt)$. So the solution of the equation is
\[
v = h(x - vt). \tag{10.24}
\]
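The implicit solution can be verified numerically: solve $v = h(x - vt)$ by fixed-point iteration (this works before any shock forms) and check the PDE residual by central differences. The profile $h$ and the sample point below are arbitrary choices for the sketch.

```python
import math

def h(x):
    return 0.5 * math.sin(x)

def v(x, t):
    # solve the implicit relation v = h(x - v t) by fixed-point iteration;
    # this is a contraction here since |t h'| <= t/2 < 1
    val = 0.0
    for _ in range(200):
        val = h(x - val * t)
    return val

x0, t0, d = 0.7, 0.5, 1e-5
vt = (v(x0, t0 + d) - v(x0, t0 - d)) / (2 * d)
vx = (v(x0 + d, t0) - v(x0 - d, t0)) / (2 * d)
residual = vt + v(x0, t0) * vx     # should vanish: v_t + v v_x = 0
```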
10.3 The implicit function theorem: systems
We can take $F(u, \epsilon) = 0$ to be a system of $n$ equations for $n$ variables $u$ and an extra variable $\epsilon$. Or more generally we can take $F(u, \epsilon)$ to be a function of $u$ and $\epsilon$, where $u$ ranges over a region $U$ in some Banach space $E$. Then $F(u, \epsilon)$ takes values in a Banach space $E'$. The parameter $\epsilon$ is still real.

We shall often abbreviate the function by $F$. Let $F_u$ be the partial derivative of $F$ as a function of $u$. In the case of a system this is the $n$ by $n$ matrix of partial derivatives of the components of $F$ with respect to the components of $u$. More generally, it is a function on $U$ with values that are linear transformations from $E$ to $E'$.

The assumption for the implicit function theorem is that $F(u_0, 0) = 0$ and the partial derivative $F_u(u_0, 0)$ has a bounded inverse. There are some other technical assumptions. The conclusion is that the equation
\[
F(u, \epsilon) = 0 \tag{10.25}
\]
defines $u$ as a function of $\epsilon$ for $\epsilon$ sufficiently near 0, in such a way that $\epsilon = 0$ is mapped into $u_0$.
For practical calculations one wants to compute the higher derivatives $u^{(n)}$ as a function of $\epsilon$. These are used in the Taylor expansion
\[
u = \sum_{n=0}^{\infty}\frac{u^{(n)}}{n!}\epsilon^n. \tag{10.26}
\]
In first order we have
\[
F_u u^{(1)} + F_\epsilon = 0, \tag{10.27}
\]
which gives the first order result
\[
u^{(1)} = -F_u^{-1}F_\epsilon. \tag{10.28}
\]
Note that $F_\epsilon$ has values in $E'$. On the other hand, $F_u^{-1}$ has values that are linear transformations from $E'$ to $E$. So the right hand side has values in $E$, as it must. Usually we are interested in evaluating this expression at $u = u_0$ and $\epsilon = 0$.
The second order result is obtained by differentiating the first order equation. This gives
\[
F_u u^{(2)} + F_{uu}(u^{(1)}, u^{(1)}) + 2F_{u\epsilon}u^{(1)} + F_{\epsilon\epsilon} = 0. \tag{10.29}
\]
Thus the result of second order perturbation theory is that
\[
u^{(2)} = -F_u^{-1}\left[F_{uu}(u^{(1)}, u^{(1)}) + 2F_{u\epsilon}u^{(1)} + F_{\epsilon\epsilon}\right]. \tag{10.30}
\]
Note that $F_{uu}$ has values that are bilinear functions from $E \times E$ to $E'$. On the other hand, $F_{u\epsilon}$ has values that are linear functions from $E$ to $E'$. Again the right hand side has values in $E$. Again we are interested in evaluating this expression at $u = u_0$ and $\epsilon = 0$.
The general pattern is this. The $n$th derivative is given by
\[
F_u u^{(n)} + R_n = 0. \tag{10.31}
\]
Differentiate again. This gives
\[
F_u u^{(n+1)} + [F_{uu}u^{(1)} + F_{u\epsilon}]u^{(n)} + [R_{n,u}u^{(1)} + R_{n,\epsilon}] = 0. \tag{10.32}
\]
Thus at every stage one only has to invert the first derivative $F_u$.
10.4 Nonlinear diﬀerential equations
Let us look at the simplest nonlinear differential equation
\[
Lu = \epsilon g(u). \tag{10.33}
\]
Here $L$ is the restriction of a linear operator to satisfy an inhomogeneous boundary condition. For example, we could have $L = \partial/\partial t + 1$ acting on functions with $u(0) = 1$. When $\epsilon = 0$ there is a solution $u = u_0$. In the example $u_0(t) = e^{-t}$.

The derivative of $L$ is $L_0$, where $L_0$ is a linear operator with zero boundary conditions. For instance, we could have $L_0 = \partial/\partial t + 1$ acting on functions $h$ with $h(0) = 0$. Thus the first order perturbation equation is $L_0 u' = g(u) + \epsilon g'(u)u'$. The second order equation is $L_0 u'' = 2g'(u)u' + $ terms with $\epsilon$. One can continue in this way. Thus the hierarchy of equations begins $Lu_0 = 0$, $L_0 u' = g(u_0)$, $L_0 u'' = 2g'(u_0)u'$.
It is possible to think of this in another way. Write $u = u_0 + w$. Then the equation $Lu = \epsilon g(u)$ becomes $L_0 w = \epsilon g(u)$, where $L_0$ is the operator defined with zero boundary conditions. Suppose that $L_0$ has an inverse. Then
\[
u = u_0 + \epsilon L_0^{-1}g(u). \tag{10.34}
\]
This is very close in form to the scalar equation examined in the first two sections.
This equation may be solved by the usual procedure. The first derivative is $u' = L_0^{-1}g(u) + \epsilon L_0^{-1}[g'(u)u']$. The second derivative is $u'' = 2L_0^{-1}[g'(u)u'] + \epsilon L_0^{-1}[g'(u)u''] + \epsilon L_0^{-1}[g''(u)(u')^2]$. The third derivative is $u''' = 3L_0^{-1}[g''(u)(u')^2] + 3L_0^{-1}[g'(u)u''] + \epsilon$ times other junk. Evaluate these at $\epsilon = 0$, that is, at $u = u_0$. This gives
\[
u' = L_0^{-1}g(u_0), \tag{10.35}
\]
and
\[
u'' = 2L_0^{-1}[g'(u_0)L_0^{-1}g(u_0)], \tag{10.36}
\]
and
\[
u''' = 3L_0^{-1}[g''(u_0)(L_0^{-1}g(u_0))^2] + 6L_0^{-1}[g'(u_0)L_0^{-1}(g'(u_0)L_0^{-1}g(u_0))]. \tag{10.37}
\]
Notice how the linear operators occur in various places, making it impossible to do the simplification that we found in the scalar case.
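For the running example $L = \partial/\partial t + 1$ with $u(0) = 1$, and the concrete (assumed) choice $g(u) = u^2$, the inverse $L_0^{-1}$ can be evaluated in closed form as $(L_0^{-1}f)(t) = \int_0^t e^{-(t-s)}f(s)\,ds$, and the resulting ODE $u' + u = \epsilon u^2$ is a Bernoulli equation with an exact solution, so the perturbation series can be tested directly. A sketch:

```python
import math

eps = 0.01
t = 1.0

# zeroth order: L u0 = 0 with u0(0) = 1 gives u0(t) = e^{-t}
u0 = math.exp(-t)

# first order: u' = L0^{-1} g(u0); for g(u) = u^2 the integral evaluates
# in closed form to e^{-t} (1 - e^{-t})
u1 = math.exp(-t) * (1 - math.exp(-t))

# second order coefficient: u''/2 = L0^{-1}[g'(u0) L0^{-1} g(u0)]
#                                 = e^{-t} (1 - e^{-t})^2
u2 = math.exp(-t) * (1 - math.exp(-t)) ** 2

approx1 = u0 + eps * u1
approx2 = u0 + eps * u1 + eps**2 * u2

# the ODE u' + u = eps u^2, u(0) = 1 is a Bernoulli equation with exact
# solution u(t) = 1 / ((1 - eps) e^t + eps)
exact = 1.0 / ((1 - eps) * math.exp(t) + eps)
```

The first order error is of size $\epsilon^2$ and the second order error of size $\epsilon^3$, as the series predicts.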
10.5 A singular perturbation example
In quantum mechanics one often wants to solve a problem of the form
\[
-\frac{\hbar^2}{2m}u'' + v(x)u = \lambda u. \tag{10.38}
\]
Here $v(x)$ is a given real function. In many contexts $\lambda$ plays the role of an eigenvalue parameter. The constant $m > 0$ is fixed. The constant $\hbar > 0$ is the perturbation parameter.

The characteristic feature of this problem is that the zero order perturbation is not apparent. Thus setting $\hbar = 0$ it is not clear that there will be a solution at all.
The resolution of this problem is to write the equation in new variables. For simplicity think of the case when $\lambda > v(x)$. Thus
\[
u = Ae^{\frac{iS}{\hbar}}. \tag{10.39}
\]
The equation becomes
\[
A\left[\frac{1}{2m}(S')^2 + v(x) - \lambda\right] = \frac{\hbar^2}{2m}A'' \tag{10.40}
\]
together with
\[
2A'S' + AS'' = 0. \tag{10.41}
\]
Now we can recognize the $\hbar = 0$ limit as
\[
\frac{1}{2m}(S')^2 + v(x) = \lambda. \tag{10.42}
\]
The other equation may be written
\[
(A^2 S')' = 0. \tag{10.43}
\]
These equations have a physical interpretation. The function $S'$ is a velocity that depends on position. The first equation is the classical equation for conservation of energy. The function $A^2$ is a density. The second equation is the conservation law that says that the current $A^2 S'$ is constant in space. Of course these equations are only valid in the zero order approximation, which in this case is called the WKB approximation.

This approximation is relevant to a heuristic understanding of Schrödinger operators of the form
\[
H = -\frac{\hbar^2}{2m}\frac{d^2}{dx^2} + v(x). \tag{10.44}
\]
The Hilbert space is the space of $L^2$ functions on the line. If $v(x)$ is bounded below, then one expects this to be a well-defined self-adjoint operator. In fact, for $\lambda < 0$ sufficiently negative, one expects $(H - \lambda I)^{-1}$ to be a well-defined bounded self-adjoint operator. The reason is that the solutions of the differential equation either decay rapidly at infinity or grow rapidly at infinity. The Green's function is defined by the solutions that decay rapidly at infinity. There is a solution of $Hu = \lambda u$ that is in $L^2$ at $-\infty$ and another solution of the same equation that is in $L^2$ at $+\infty$. The two solutions are matched by a jump condition to form the kernel of the integral operator.
When $v(x)$ is very badly unbounded below, then something quite different happens. Let us look at the example of $v(x) \sim \epsilon x^{2n}$ with $\epsilon < 0$. According to the WKB approximation, we have $S'(x) \sim \pm|x|^n$. It follows that $A^2 \sim |x|^{-n}$. Thus it is plausible that if $n > 1$ then every solution is in $L^2$. These solutions are highly oscillatory and decrease at infinity at a moderate rate.

If this is indeed the case, then the operator $H$ is not self-adjoint. The problem is that when one wants to define a Green's function, there are two linearly independent solutions, and they are both in $L^2$ at $\pm\infty$. So there is no boundary condition at infinity that determines the Green's function in a natural way.
10.6 Eigenvalues and eigenvectors
We have an operator $H_0 + \epsilon V$. We are interested in an isolated eigenvalue $\lambda$ of multiplicity 1. The corresponding eigenvector is $p$. It is only determined up to a constant multiple. The adjoint operator has eigenvalue $\bar\lambda$ with eigenvector $q$. It is convenient to choose $p$ and $q$ so that the inner product $(q, p) = 1$. The associated projection onto the eigenspace is $P$ satisfying $P^2 = P$. It is given by $Pu = (q, u)p$. Thus $p$ is an eigenvector with
\[
(H_0 + \epsilon V - \lambda)p = 0. \tag{10.45}
\]
Furthermore, $(q, (H_0 + \epsilon V - \lambda)u) = 0$ for all $u$. We also have the reduced resolvent $S$ with $SP = PS = 0$ and $S(H_0 + \epsilon V - \lambda I) = 1 - P$.

If we indicate the dependence on $\epsilon$ explicitly, then the equation we want to solve is
\[
(H_0 + \epsilon V - \lambda(\epsilon))p(\epsilon) = 0. \tag{10.46}
\]
We would like to find the perturbation expansion in terms of the $\lambda$ and $p$ and $q$ and $S$ associated with $\epsilon = 0$.
The following discussion is meant to establish two formulas, one for the eigenvalue and one for the eigenvector (not normalized). The first is the formula for the eigenvalue to second order
\[
\lambda(\epsilon) = \lambda + (q, Vp)\epsilon - (q, VSVp)\epsilon^2 + \cdots. \tag{10.47}
\]
The other is the formula for the (unnormalized) eigenvector to second order
\[
p(\epsilon) = p - SVp\,\epsilon + \left[SVSVp - S^2Vp\,(q, Vp)\right]\epsilon^2 + \cdots. \tag{10.48}
\]
These formulas are so important that they should be memorized, at least the one for the eigenvalues.
We now proceed to the derivations. Differentiate the eigenvector equation with respect to $\epsilon$. This gives
\[
(H_0 + \epsilon V - \lambda)\dot p - \dot\lambda p + Vp = 0. \tag{10.49}
\]
Now take the inner product with $q$ on the left. This gives
\[
\dot\lambda = (q, Vp). \tag{10.50}
\]
This gives the first order change of the eigenvalue. That is,
\[
\lambda(\epsilon) = \lambda + (q, Vp)\epsilon + \cdots. \tag{10.51}
\]
Next apply $I - P$ on the left. This gives
\[
(I - P)(H_0 + \epsilon V - \lambda)\dot p + (I - P)Vp = 0, \tag{10.52}
\]
or
\[
(I - P)\dot p = -SVp. \tag{10.53}
\]
Thus
\[
\dot p = -SVp + c_1 p. \tag{10.54}
\]
Notice that the derivative of the eigenvector is not uniquely determined. It is ordinarily harmless to take $c_1 = 0$. This gives the first order change of the eigenvector
\[
p(\epsilon) = p + [-SVp + c_1 p]\epsilon + \cdots. \tag{10.55}
\]
Now differentiate again. This gives
\[
(H_0 + \epsilon V - \lambda)\ddot p - \ddot\lambda p + 2V\dot p - 2\dot\lambda\dot p = 0. \tag{10.56}
\]
Take the inner product with $q$ on the left. This gives
\[
-\ddot\lambda + 2(q, V\dot p) - 2\dot\lambda(q, \dot p) = 0, \tag{10.57}
\]
or
\[
\frac{1}{2}\ddot\lambda = -(q, VSVp). \tag{10.58}
\]
This gives the second order change of the eigenvalue. That is,
\[
\lambda(\epsilon) = \lambda + (q, Vp)\epsilon - (q, VSVp)\epsilon^2 + \cdots. \tag{10.59}
\]
Now apply $1 - P$ on the left. This gives
\[
(1 - P)(H_0 + \epsilon V - \lambda)\ddot p + 2(1 - P)V\dot p - 2\dot\lambda(1 - P)\dot p = 0. \tag{10.60}
\]
Thus
\[
\frac{1}{2}(1 - P)\ddot p = SVSVp - c_1 SVp - S^2Vp\,(q, Vp). \tag{10.61}
\]
Hence
\[
\frac{1}{2}\ddot p = SVSVp - S^2Vp\,(q, Vp) - c_1 SVp + \frac{1}{2}c_2 p. \tag{10.62}
\]
This gives the second order change of the eigenvector
\[
p(\epsilon) = p + [-SVp + c_1 p]\epsilon + \left[SVSVp - S^2Vp\,(q, Vp) - c_1 SVp + \frac{1}{2}c_2 p\right]\epsilon^2 + \cdots. \tag{10.63}
\]
Now let us change notation. We will think of a basis $p_n$ for the Hilbert space and a dual basis $q_m$, with $(q_m, p_n) = \delta_{mn}$. We write $q_m^*$ for the functional that takes $u$ to $(q_m, u)$. Then the various operators have special forms. We take the $p_n$ to be the eigenvectors of $H_0$. Thus $H_0 p_n = \lambda_n p_n$. The operator $P_n = p_n q_n^*$ is the spectral projection associated with the $n$th eigenvalue $\lambda_n$ of $H_0$. The operator
\[
S_n = \sum_{j \neq n}\frac{p_j q_j^*}{\lambda_j - \lambda_n}.
\]
Then the second order expression for the eigenvalue can be written as
\[
\lambda_n(\epsilon) = \lambda_n + (q_n, Vp_n)\epsilon - (q_n, VS_nVp_n)\epsilon^2 + \cdots. \tag{10.64}
\]
Even more explicitly, this is
\[
\lambda_n(\epsilon) = \lambda_n + (q_n, Vp_n)\epsilon - \sum_{j \neq n}(q_n, Vp_j)\frac{1}{\lambda_j - \lambda_n}(q_j, Vp_n)\,\epsilon^2 + \cdots. \tag{10.65}
\]
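Formula (10.65) is easy to test numerically in the self-adjoint case, where $q_j = p_j$ are the standard basis vectors when $H_0$ is diagonal. A sketch with a random symmetric perturbation (the sizes, index, and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, eps = 6, 2, 1e-3

lam = np.arange(N, dtype=float)      # distinct eigenvalues of H0
H0 = np.diag(lam)
B = rng.standard_normal((N, N))
V = (B + B.T) / 2                    # symmetric perturbation

# second order: lam_n + V_nn eps - sum_{j != n} V_nj V_jn / (lam_j - lam_n) eps^2
first = V[n, n]
second = -sum(V[n, j] * V[j, n] / (lam[j] - lam[n]) for j in range(N) if j != n)
approx = lam[n] + first * eps + second * eps**2

# compare with the exact n-th eigenvalue (the ordering is preserved for small eps)
exact = np.linalg.eigvalsh(H0 + eps * V)[n]
```

The residual is of third order in $\epsilon$, far below the second order term.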
The second order expression for the eigenvector $p_n(\epsilon)$ is (taking $c_1 = 0$)
\[
p_n(\epsilon) = p_n - S_nVp_n\,\epsilon + \left[S_nVS_nVp_n - S_n^2Vp_n\,(q_n, Vp_n) + \frac{1}{2}c_2 p_n\right]\epsilon^2 + \cdots. \tag{10.66}
\]
If $H_0$ has discrete spectrum, then (leaving out the last term) this may be written
\[
p_n(\epsilon) = p_n - \sum_{j \neq n}\frac{1}{\lambda_j - \lambda_n}(q_j, Vp_n)\,p_j\,\epsilon + \sum_{j \neq n}\frac{1}{\lambda_j - \lambda_n}\left[\sum_{k \neq n}(q_j, Vp_k)\frac{1}{\lambda_k - \lambda_n}(q_k, Vp_n) - \frac{1}{\lambda_j - \lambda_n}(q_j, Vp_n)(q_n, Vp_n)\right]p_j\,\epsilon^2 + \cdots. \tag{10.67}
\]
10.7 The self-adjoint case

In the self-adjoint case $q = p$, and it is natural to normalize so that $(p(\epsilon), p(\epsilon)) = 1$ to second order. This gives $c_1 = 0$ and
\[
c_2 = -(p, VS^2Vp). \tag{10.68}
\]
So in this case
\[
\lambda(\epsilon) = \lambda + (p, Vp)\epsilon - (p, VSVp)\epsilon^2 + \cdots \tag{10.69}
\]
and
\[
p(\epsilon) = p - SVp\,\epsilon + \left[SVSVp - S^2Vp\,(p, Vp) - \frac{1}{2}(p, VS^2Vp)p\right]\epsilon^2 + \cdots. \tag{10.70}
\]
If $\lambda = \lambda_n$ and $p = p_n$, we can write the coefficient $c_2$ even more explicitly as
\[
c_2 = -\sum_{j \neq n}(p_n, Vp_j)\frac{1}{(\lambda_j - \lambda_n)^2}(p_j, Vp_n). \tag{10.71}
\]
Example: Take
\[
H_0 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \tag{10.72}
\]
and
\[
V = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \tag{10.73}
\]
The eigenvalues are $\lambda(\epsilon) = \pm\sqrt{1 + \epsilon^2}$. However it is interesting to compare this exact result with the result of second order perturbation theory.

Let us look at the perturbation of the eigenvalue $-1$ of $H_0$. For this eigenvalue the spectral projection is
\[
P = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \tag{10.74}
\]
and the eigenvector $p$ is the second column of $P$. The reduced resolvent is
\[
S = \begin{pmatrix} \frac{1}{1-(-1)} & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} \frac{1}{2} & 0 \\ 0 & 0 \end{pmatrix}. \tag{10.75}
\]
So the coefficient in second order perturbation theory is
\[
-(p, VSVp) = -\frac{1}{2}. \tag{10.76}
\]
Thus
\[
\lambda(\epsilon) = -1 - \frac{1}{2}\epsilon^2 + \cdots. \tag{10.77}
\]
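A quick numeric confirmation of this example (a sketch): the lowest eigenvalue of $H_0 + \epsilon V$ agrees with $-\sqrt{1+\epsilon^2}$, and the second order result (10.77) matches it up to an error of order $\epsilon^4$.

```python
import numpy as np

H0 = np.array([[1.0, 0.0], [0.0, -1.0]])
V = np.array([[0.0, 1.0], [1.0, 0.0]])

eps = 0.1
exact = np.min(np.linalg.eigvalsh(H0 + eps * V))   # equals -sqrt(1 + eps^2)
approx = -1.0 - 0.5 * eps**2                        # second order result (10.77)
```

Expanding $-\sqrt{1+\epsilon^2} = -1 - \epsilon^2/2 + \epsilon^4/8 - \cdots$ shows the discrepancy should be about $\epsilon^4/8$.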
10.8 The anharmonic oscillator
In this section we let
\[
H_0 = \frac{1}{2}\left(-\frac{d^2}{dx^2} + x^2 - 1\right). \tag{10.78}
\]
The perturbation $V$ is multiplication by
\[
v(x) = x^4. \tag{10.79}
\]
Let
\[
A = \frac{1}{\sqrt{2}}\left(x + \frac{d}{dx}\right) \tag{10.80}
\]
and
\[
A^* = \frac{1}{\sqrt{2}}\left(x - \frac{d}{dx}\right). \tag{10.81}
\]
Then $AA^* = A^*A + I$ and $H_0 = A^*A$. Furthermore, the solution of $Au_0 = 0$ is
\[
u_0(x) = \frac{1}{\sqrt[4]{\pi}}e^{-\frac{x^2}{2}}. \tag{10.82}
\]
Define for each $n = 0, 1, 2, 3, \ldots$ the vector
\[
u_n = \frac{1}{\sqrt{n!}}A^{*n}u_0. \tag{10.83}
\]
We have $A^*u_m = \sqrt{m+1}\,u_{m+1}$ and $Au_m = \sqrt{m}\,u_{m-1}$. Then the $u_n$ form an orthonormal basis consisting of eigenvectors of $H_0$. Furthermore,
\[
H_0 u_n = nu_n. \tag{10.84}
\]
Say that we are interested in a particular $n$. The corresponding reduced resolvent $S_n$ is given by
\[
S_n u_m = \frac{1}{m - n}u_m \tag{10.85}
\]
for $m \neq n$, and by $S_n u_n = 0$.

The operator multiplication by $x$ may be expressed in terms of $A$ and $A^*$ by
\[
x = \frac{1}{\sqrt{2}}(A + A^*). \tag{10.86}
\]
This allows us to calculate the matrix elements of $x$. These are given by
\[
(u_{m+1}, xu_m) = \frac{1}{\sqrt{2}}(u_{m+1}, A^*u_m) = \frac{1}{\sqrt{2}}\sqrt{m+1} \tag{10.87}
\]
and
\[
(u_{m-1}, xu_m) = \frac{1}{\sqrt{2}}(u_{m-1}, Au_m) = \frac{1}{\sqrt{2}}\sqrt{m}. \tag{10.88}
\]
So each transition is to a neighboring natural number.
So each transition is to a neighboring natural number.
As a warmup, let us compute the matrix element (u
n
, x
2
u
n
). This is the sum
of two possible transition paths, one updown and one downup. The result is
(n + 1)/2 + (n/2) = (2n + 1)/2.
To compute the ﬁrst order perturbation coeﬃcient (u
n
, x
4
u
n
). we need to
look at all transitions that go from n to n in 4 steps. After two steps one is
either at u
n−2
, at u
n
, or at u
n+2
. So
(u
n
, x
4
u
n
) = (u
n
, x
2
u
n−2
)
2
+ (u
n
, x
2
u
n
)
2
+ (u
n
, x
2
u
n+2
)
2
. (10.89)
This is
(u
n
, x
4
u
n
) =
n(n −1)
4
+ (
2n + 1
2
)
2
+
(n + 1)(n + 2)
4
=
6n
2
+ 6n + 3
4
. (10.90)
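The counting argument can be checked with truncated matrices for $A$ and $A^*$. A sketch (the truncation size $N$ is an arbitrary choice; the diagonal entries of $x^4$ are exact as long as $n + 2$ stays below the truncation):

```python
import numpy as np

N = 12
# lowering operator A u_m = sqrt(m) u_{m-1}: entries sqrt(1..N-1) on the superdiagonal
a = np.diag(np.sqrt(np.arange(1, N, dtype=float)), k=1)
x = (a + a.T) / np.sqrt(2)          # x = (A + A*) / sqrt(2)
x4 = np.linalg.matrix_power(x, 4)
# diagonal entries x4[n, n] should equal (6 n^2 + 6 n + 3) / 4 for small n
```

For $n = 0, 1, 2$ the formula gives $3/4$, $15/4$, $39/4$, and the matrix reproduces these exactly.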
Higher order perturbations may be computed with more labor. It turns out that the perturbation series in $\epsilon$ does not converge. Some intuition for this is provided by the observation that $H_0 + \epsilon V$ is not a self-adjoint operator for $\epsilon < 0$. So for $\epsilon < 0$ the eigenvalue problem is meaningless. On the other hand, if the Taylor series had a positive radius of convergence, then it would converge also for some values of $\epsilon < 0$.
2
Contents
1 Linear Algebra 1.1 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1.1 Matrix algebra . . . . . . . . . . . . . . . . . . . . 1.1.2 Reduced row echelon form . . . . . . . . . . . . . . 1.1.3 Problems . . . . . . . . . . . . . . . . . . . . . . . 1.1.4 The Jordan form . . . . . . . . . . . . . . . . . . . 1.1.5 Problems . . . . . . . . . . . . . . . . . . . . . . . 1.1.6 Quadratic forms . . . . . . . . . . . . . . . . . . . 1.1.7 Spectral theorem . . . . . . . . . . . . . . . . . . . 1.1.8 Circulant (convolution) matrices . . . . . . . . . . 1.1.9 Problems . . . . . . . . . . . . . . . . . . . . . . . 1.2 Vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . 1.2.1 Vector spaces . . . . . . . . . . . . . . . . . . . . . 1.2.2 Linear transformations . . . . . . . . . . . . . . . . 1.2.3 Reduced row echelon form . . . . . . . . . . . . . . 1.2.4 Jordan form . . . . . . . . . . . . . . . . . . . . . . 1.2.5 Forms and dual spaces . . . . . . . . . . . . . . . . 1.2.6 Quadratic forms . . . . . . . . . . . . . . . . . . . 1.2.7 Special relativity . . . . . . . . . . . . . . . . . . . 1.2.8 Scalar products and adjoint . . . . . . . . . . . . . 1.2.9 Spectral theorem . . . . . . . . . . . . . . . . . . . 1.3 Vector ﬁelds and diﬀerential forms . . . . . . . . . . . . . 1.3.1 Coordinate systems . . . . . . . . . . . . . . . . . 1.3.2 Vector ﬁelds . . . . . . . . . . . . . . . . . . . . . . 1.3.3 Diﬀerential forms . . . . . . . . . . . . . . . . . . . 1.3.4 Linearization of a vector ﬁeld near a zero . . . . . 1.3.5 Quadratic approximation to a function at a critical 1.3.6 Diﬀerential, gradient, and divergence . . . . . . . . 1.3.7 Spherical polar coordinates . . . . . . . . . . . . . 1.3.8 Gradient systems . . . . . . . . . . . . . . . . . . . 1.3.9 Hamiltonian systems . . . . . . . . . . . . . . . . . 1.3.10 Contravariant and covariant . . . . . . . . . . . . . 1.3.11 Problems . . . . . . . . . . . . . . . . . . . . . . . 3 . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . point . . . . . . . . . . . . . . . . . . . . . . . . 7 7 7 8 10 11 13 14 14 15 15 17 17 18 18 19 19 20 21 22 23 25 25 26 27 28 29 30 31 32 33 34 34
. . . . . .3 Cauchy formula for derivatives . . . . . . . . . . . . . . . . . . .1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5 Problems . . . . . . . . . . 4. . . . . . . . . . . . .3. 3. .1 Jordan’s lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2. . . . . . 3. . 4. . . . . . . . . . . . . . 3.3 The Cauchy integral theorem . . . . . . . .2. . . . . . . . . . . 2. 3. . . . . . . . . . . . . . 4. . . . . . . . . . . . . . . . . . . . . . . . . . .4 Problems .2 Riemann surfaces . . . . . . . 4. . . .1 The Cauchy integral formula . . . . . . . . . . . . . .5 Fourier transform pairs . . . . . . . . . . . CONTENTS 37 37 38 40 41 42 45 45 46 47 47 48 49 50 50 51 53 53 54 54 55 55 56 57 57 57 58 58 59 59 60 60 61 61 62 62 63 63 64 . . . . .1 Closed and exact forms . . . . .2 L1 theory . . . . . . . . . . . . .4 2 Fourier series 2. . . . . . . . .7 Poisson summation formula 3. . . . . . . . . . . . . . .2. . . . . . . . . . . . . . . . . . 4.5. . . . . . . . .4 Poles of higher order . . . .6. .5 A residue calculation with a double pole 4.4 A residue calculation . . 4. . . . 4. . 4. . 2. . . . 4. . . . . . . . . . . . . . . . . . . . . 4. . . . . . .8 Problems . . . . . . . . . . . . . 4. . . . . . . . . . . . . .5. . .6. . 4. . . . . . . . . . . . . . . . . . . . . . . . . .2. . . . . . .5 Branch cuts . . . . . . . . . 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 L2 convergence . . .5 More residue calculus . . . . . . . . . . . .4 Pointwise convergence 2. . . . . . . 3. . . . . . . . .5. . 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 CauchyRiemann equations . 2. .1 Orthonormal families . . . . . . . . . . . . . . . . . . 4. . . . . .9 PDE Problems . .3. . . . . . . . . . . . . . .1 Radius of convergence . . . . . . . . . . . . . . .5. . . . . . . . . . . . . . . . . . . . . .4 Polar representation . . . . . . . . . . . . . . . . . . . . . . . . .3.5. . . . . . . . . 
. . . . . . . . . . . . . . . . . .3 Absolute convergence . . . . . . . . . . . . . . . . . . . . .2. . . . 4.6 The Taylor expansion . . . . . . .1 Complex number quiz . . . . .3 Complex integration and residue calculus . . . . . .3. . . . . .6 Problems . . 3 Fourier transforms 3. . . . . . . . . . . . . . . . 4. .3 Estimates . . . . . . . . . . . . . . 3. . . . . . . . . . 4.2 A more delicate residue calculation . .2 Complex functions . . . . . . . . . 4 Complex integration 4. . . . . . . 4. . . 4. . . . . . . . . . .4 Absolute convergence . . . . . . . . . .2 The residue calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4. . . . . . . . . . . . . . . .3 L2 theory . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . .12 Answers to ﬁrst two problems . . . . . . . . . . 7. . . . . . . . . . . . . . . . . . . . . . . . . . 5. . . . . . . . . . . . . .6 Finite rank operators . . . . . . . . . 7. . . . . . . . . . .2 Mapping distributions . . 5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3 Radon measures . . . . . .4 Approximate delta functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1 Introduction . . .12 First order diﬀerential operators with a semiinﬁnite interval: residual spectrum . . . . equation . . . . .CONTENTS 5 Distributions 5. . . . . . . . . . . . . . . . . . . . 7. . . . .15 A pathological example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5. . . . . . . . . . . . . . . 6. . . . . . . . . . . . . . . . . . . 7 Densely Deﬁned Closed Operators 7. .1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5 The spectrum . . . . . . . .7 Problems . 7. . . . . . . . . . . . . .10 Spectral projection and reduced resolvent . 7. . . . . . .4 HilbertSchmidt operators 6. . . . 7.13 First order diﬀerential operators with an inﬁnite interval: continuous spectrum . . . .2 Subspaces . . . . . . . . . . 7. . . . . . . . . . . . . .4 Operators . . 5.5 Problems . . . .9 First order diﬀerential operators with a bounded interval: point spectrum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6. . . . . . . . . . . . . . . . . . . . . . . . . . . . .11 Generating secondorder selfadjoint operators . . . . . 7. . . . . . . . . . . 5. . . . . . . . . . . . 5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10 Homogeneous solutions of the wave 5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7 Problems .6 Spectra of inverse operators . . . . . . 7. . . . . . . . 
[Contents entries for Chapters 5–7 are garbled in this copy. Recoverable section titles: Properties of distributions; Problems; Tempered distributions; Poisson equation; Diffusion equation; Wave equation; 6 Bounded Operators — Bounded linear operators, Compact operators, Graphs, Selfadjoint operators; Problems.]

8 Normal operators 105
8.1 Spectrum of a normal operator . . . . . . . . . . . . . . . . . . 105
8.2 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
8.3 Variation of parameters and Green's functions . . . . . . . . . . 107
8.4 Second order differential operators with a bounded interval: point spectrum . . . 108
8.5 Second order differential operators with a semibounded interval: continuous spectrum . . . 109
8.6 Second order differential operators with an infinite interval: continuous spectrum . . . 110
8.7 The spectral theorem for normal operators . . . . . . . . . . . . 110
8.8 Examples: compact normal operators . . . . . . . . . . . . . . . 112
8.9 Examples: translation invariant operators and the Fourier transform . . . 112
8.10 Examples: Schrödinger operators . . . . . . . . . . . . . . . . . 113
8.11 Subnormal operators . . . . . . . . . . . . . . . . . . . . . . . . 114
8.12 Examples: forward translation invariant operators and the Laplace transform . . . 116
8.13 Quantum mechanics . . . . . . . . . . . . . . . . . . . . . . . . 118
8.14 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

9 Calculus of Variations 121
9.1 The Euler-Lagrange equation . . . . . . . . . . . . . . . . . . . 121
9.2 A conservation law . . . . . . . . . . . . . . . . . . . . . . . . . 122
9.3 Second variation . . . . . . . . . . . . . . . . . . . . . . . . . . 123
9.4 Interlude: The Legendre transform . . . . . . . . . . . . . . . . 124
9.5 Lagrangian mechanics . . . . . . . . . . . . . . . . . . . . . . . 126
9.6 Hamiltonian mechanics . . . . . . . . . . . . . . . . . . . . . . . 127
9.7 Kinetic and potential energy . . . . . . . . . . . . . . . . . . . . 128
9.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
9.9 The path integral . . . . . . . . . . . . . . . . . . . . . . . . . . 130
9.10 Appendix: Lagrange multipliers . . . . . . . . . . . . . . . . . . 131

10 Perturbation theory 133
10.1 The implicit function theorem: scalar case . . . . . . . . . . . . 133
10.2 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
10.3 The implicit function theorem: systems . . . . . . . . . . . . . . 136
10.4 Nonlinear differential equations . . . . . . . . . . . . . . . . . . 137
10.5 A singular perturbation example . . . . . . . . . . . . . . . . . 138
10.6 Eigenvalues and eigenvectors . . . . . . . . . . . . . . . . . . . 139
10.7 The selfadjoint case . . . . . . . . . . . . . . . . . . . . . . . . 141
10.8 The anharmonic oscillator . . . . . . . . . . . . . . . . . . . . . 142
Chapter 1

Linear Algebra

1.1 Matrices

1.1.1 Matrix algebra

An m by n matrix A is an array of complex numbers Aij for 1 ≤ i ≤ m and 1 ≤ j ≤ n.

The vector space operations are the sum A + B and the scalar multiple cA. Let A and B have the same dimensions. The operations are defined by

(A + B)ij = Aij + Bij  (1.1)

and

(cA)ij = cAij.  (1.2)

The m by n zero matrix is defined by

0ij = 0.  (1.3)

A matrix is a linear combination of other matrices if it is obtained from those matrices by adding scalar multiples of those matrices.

Let A be an m by n matrix and B be an n by p matrix. Then the product AB is defined by

(AB)ik = Σ_{j=1}^{n} Aij Bjk.  (1.4)

The n by n identity matrix is defined by

Iij = δij.  (1.5)

Here δij is the Kronecker delta, equal to 1 when i = j and to 0 when i ≠ j.

Matrix algebra satisfies the usual properties of addition and many of the usual properties of multiplication. In particular, we have the associative law

(AB)C = A(BC).  (1.6)
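The definitions above translate directly into code. The notes use Matlab for computations; the following is an illustrative plain-Python sketch (not part of the original text) of the product formula (1.4), used to check the associative law (1.6) on random integer matrices, where the arithmetic is exact.

```python
import random

def matmul(A, B):
    # (AB)_ik = sum over j of A_ij * B_jk; defined when cols(A) == rows(B)
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

def rand_matrix(m, n):
    return [[random.randint(-5, 5) for _ in range(n)] for _ in range(m)]

A, B, C = rand_matrix(2, 3), rand_matrix(3, 4), rand_matrix(4, 2)
# Associative law (1.6): (AB)C = A(BC), exact for integer entries
assert matmul(matmul(A, B), C) == matmul(A, matmul(B, C))

# Identity law: IM = MI = M for the 2 by 2 identity
I = [[1, 0], [0, 1]]
M = [[5, 6], [7, 8]]
assert matmul(I, M) == M and matmul(M, I) == M
```

The triple comprehension mirrors the index formula (1.4) directly, at the cost of O(mnp) Python-level arithmetic; it is meant for checking definitions, not for performance.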
There is also the unit law

AI = IA = A.  (1.7)

Even more important, we have the distributive laws

A(B + C) = AB + AC
(B + C)A = BA + CA.  (1.8)

However multiplication is not commutative; in general AB ≠ BA.

Let A be an n by n square matrix. The trace of A is the sum of the diagonal entries. It is easy to check that tr(AB) = tr(BA) for all such matrices. Although matrices do not commute, their traces do.

An n by n matrix A is invertible if there is another n by n matrix B with AB = I and BA = I. In this case we write B = A−1. (It turns out that if BA = I, then also AB = I, but this is not obvious at first.) The inverse operation has several nice properties, including (A−1)−1 = A and (AB)−1 = B−1A−1.

The notion of division is ambiguous. Suppose B is invertible. Then both AB−1 and B−1A exist, but they are usually not equal.

1.1.2 Reduced row echelon form

An m component vector is an m by 1 matrix. The ith standard basis vector is the vector with 1 in the ith row and zeros everywhere else.

An m by n matrix R is in reduced row echelon form (rref) if each column is either the next unit basis vector, or a linear combination of the previous unit basis vectors. The columns where unit basis vectors occur are called pivot columns. The rank r of R is the number of pivot columns.

Theorem. If A is an m by n matrix, then there is an m by m matrix E that is invertible and such that

EA = R,  (1.9)

where R is in reduced row echelon form. The matrix R is uniquely determined by A.

This theorem allows us to speak of the pivot columns of A and the rank of A. Notice that if A is n by n and has rank n, then R is the identity matrix and E is the inverse of A.

Example. Let

A =
[ 4  12  2  16   1]
[ 0   0  3  12  −2]
[−1  −3  0  −2  −3]   (1.10)

Then the rref of A is

R =
[1 3 0 2 0]
[0 0 1 4 0]
[0 0 0 0 1]   (1.11)

Corollary. Let A have reduced row echelon form R. Then the solutions of the homogeneous equation Ax = 0 are the same as the solutions of the homogeneous equation Rx = 0. That is, the null space of A is the null space of R.

Introduce a definition: a matrix is flipped rref if, when flipped left-right (fliplr) and flipped up-down (flipud), it is rref. The way reduced row echelon form is usually defined, one works from left to right and from top to bottom. If you try to define a corresponding concept where you work from right to left and from bottom to top, a perhaps sensible name for this is flipped reduced row echelon form.

Theorem. Let A be an m by n matrix with rank r. Then there is a unique n by n − r matrix N such that the transpose of N is flipped rref and such that the transpose has pivot columns that are the non-pivot columns of A and such that

AN = 0.  (1.12)

The columns of N are called the rational basis for the null space of A. It is easy to find N by solving RN = 0. The rational null space matrix N has the property that its transpose is in flipped reduced row echelon form. In other words, the solutions of Ax = 0 are the vectors of the form x = Nz. That is, the columns of N span the null space of A.

Example. In the above example the null space matrix of A is

N =
[−3 −2]
[ 1  0]
[ 0 −4]
[ 0  1]
[ 0  0]   (1.13)

One can also use the technique to solve inhomogeneous equations Ax = b. One simply applies the theory to the augmented matrix [A −b]. There is a solution when the last column of the augmented matrix is not a pivot column. A particular solution may be read off from the last column of the rational basis.

Example. Solve Ax = b, where

b =
[  4]
[−19]
[ −9]   (1.14)

To accomplish this, let

A1 =
[ 4  12  2  16   1  −4]
[ 0   0  3  12  −2  19]
[−1  −3  0  −2  −3   9]   (1.15)

Then the rref of A1 is

R =
[1 3 0 2 0 −3]
[0 0 1 4 0  5]
[0 0 0 0 1 −2]   (1.16)
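The worked example can be replayed by machine. The notes use Matlab's rref; here is a plain-Python substitute (an illustrative sketch, not part of the original text) using exact Fraction arithmetic, checked against the matrices R and N given above.

```python
from fractions import Fraction

def rref(M):
    """Gauss-Jordan reduction to reduced row echelon form, in exact arithmetic."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]   # bring pivot row up
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        r += 1
    return M

A = [[4, 12, 2, 16, 1], [0, 0, 3, 12, -2], [-1, -3, 0, -2, -3]]
R = rref(A)
assert R == [[1, 3, 0, 2, 0], [0, 0, 1, 4, 0], [0, 0, 0, 0, 1]]

# The rational basis N from (1.13) spans the null space: check AN = 0.
N = [[-3, -2], [1, 0], [0, -4], [0, 1], [0, 0]]
for j in range(2):
    for i in range(3):
        assert sum(A[i][k] * N[k][j] for k in range(5)) == 0
```

Fractions avoid the floating-point pivoting issues a naive implementation would otherwise have; Matlab's rref uses a tolerance instead.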
The null space matrix of A1 is

N1 =
[−3 −2  3]
[ 1  0  0]
[ 0 −4 −5]
[ 0  1  0]
[ 0  0  2]
[ 0  0  1]   (1.17)

Thus the solution of Ax = b is

x =
[ 3]
[ 0]
[−5]
[ 0]
[ 2]   (1.18)

1.1.3 Problems

Recall that the columns of a matrix A are linearly dependent if and only if the homogeneous equation Ax = 0 has a nontrivial solution. Also, a vector y is in the span of the columns if and only if the inhomogeneous equation Ax = y has a solution.

In the following problems, consider the matrix

A =
[ 2   4   5    1  −3]
[ 2   4  19   −4  −2]
[−4  −8   3    1   2]
[ 3   6   9   −7  −2]
[ 4   8   1   −3  −3]
[ 2   4  17  −21   0]

and the vector

b =
[ 36]
[ 21]
[  9]
[  6]
[  6]
[−23]

1. Use Matlab to find the reduced row echelon form of the matrix A. Use this to find the rational basis for the solution of the homogeneous equation Az = 0. Write the general solution of the homogeneous equation.

2. Consider the system Ax = b, where A and b are as above. Find the reduced echelon form of the matrix A augmented by the column −b on the right. Use this to find the rational basis for the solution of the homogeneous equation involving the augmented matrix. Use this to find the general solution of the original inhomogeneous equation.

3. Show that if p is a particular solution of the equation Ax = b, then every other solution is of the form x = p + z, where Az = 0. Check this with the Matlab solution.

4. Use Matlab to find an invertible matrix E such that EA = R is in reduced echelon form. Find the determinant of E.

5. Consider the five vectors in R6 formed by the columns of A. Show that the vector b is in the span of these five vectors. Find explicit weights that give it as a linear combination.

6. Is every vector y in R6 in the span of these five vectors? If so, prove it. If not, give an explicit example of a vector that is not in the span, and prove that it is not in the span.

7. Are these five vectors linearly independent? Prove that your answer is correct. The vectors in A that are pivot columns of A have the same span as the columns of A, yet are linearly independent. Find these vectors. How many are they? Prove that they are linearly independent.

8. Consider a system of 6 linear equations in 5 unknowns. It could be overdetermined, that is, have no solutions. Or it could have special properties and be underdetermined, that is, have many solutions. Could it have exactly one solution? Is there such an example? Answer this question, and prove that your answer is correct.

9. Consider a system of 5 linear equations in 6 unknowns. Could it have no solutions? Could it have many solutions? Could it have exactly one solution? Is there such an example? Answer this question, and prove that your answer is correct.

1.1.4 The Jordan form

Two matrices A, B are said to be similar if there is an invertible matrix P with P−1AP = B. Notice that if A and B are similar, then tr(A) = tr(B).

Theorem. Let A be an n by n matrix. Then there is an invertible matrix P such that

P−1AP = D + N,  (1.19)

where D is diagonal with diagonal entries Dkk = λk. Each entry of N is zero, except if λk = λk−1, when it is allowed that Nk−1 k = 1.

Important note: Even if the matrix A is real, it may be that the matrices P and D are complex.

The equation may also be written AP = PD + PN. If we let Dkk = λk, then this is

Σ_{j=1}^{n} Aij Pjk = Pik λk + Pi k−1 Nk−1 k.  (1.20)

The Nk−1 k factor is zero, except in cases where λk = λk−1, when it is allowed to be 1.

We can also write this in vector form. It is the eigenvalue equation

Auk = λk uk  (1.21)

except when λk = λk−1, when it may take the form

Auk = λk uk + uk−1.  (1.22)

The hard thing is to actually find the eigenvalues λk that form the diagonal elements of the matrix D. This is a nonlinear problem. One way to see this is the following. For each k = 1, . . . , n the matrix power A^k is similar to a matrix (D + N)^k with the same diagonal entries as D^k. Thus we have the identity

tr(A^k) = λ1^k + · · · + λn^k.  (1.23)

This is a system of n nonlinear polynomial equations in n unknowns λ1, . . . , λn. As is well known, there is another way of writing these equations in terms of the characteristic polynomial. This gives a single nth order polynomial equation in one unknown λ. This single equation has the same n solutions. For n = 2 the equation is quite easy to deal with by hand. For larger n one is often driven to computer algorithms.

Example: Let

A =
[  1   2]
[−15  12]   (1.24)

The eigenvalues are 7, 6. The eigenvectors are the columns of

P =
[1 2]
[3 5]   (1.25)

Let

D =
[7 0]
[0 6]   (1.26)

Then AP = PD.

Example: Let

A =
[10 −1]
[ 9  4]   (1.27)

The only eigenvalue is 7. The eigenvector is the first column of

P =
[1 2]
[3 5]   (1.28)

Let

D + N =
[7 1]
[0 7]   (1.29)

Then AP = P(D + N).

1.1.5 Problems

1. If A is a square matrix and f is a function defined by a convergent power series, then f(A) is defined. If A = PDP−1, where D is diagonal, then f(A) may be defined for an arbitrary function f by f(A) = P f(D) P−1, where f(D) is the diagonal matrix with entries f(λi). Thus, for instance, if each λi ≥ 0, then √A is defined. Show that if A is similar to B, then f(A) is similar to f(B).

2. By the Jordan form theorem, A is similar to D + N, where D is diagonal, N is nilpotent, and D, N commute. To say that N is nilpotent is to say that for some p ≥ 1 the power N^p = 0. Show that

f(D + N) = Σ_{m=0}^{p−1} (1/m!) f^(m)(D) N^m.  (1.30)

3. Show that

exp(t(D + N)) = exp(tD) Σ_{m=0}^{p−1} (1/m!) N^m t^m.  (1.31)

Use this to describe the set of all solutions x = exp(tA)z to the differential equation

dx/dt = Ax  (1.32)

with initial condition x = z when t = 0.

4. Take

A =
[ 0    1]
[−k  −2c]   (1.33)

where k ≥ 0 and c ≥ 0. The differential equation dx/dt = Ax describes an oscillator with spring constant k and friction coefficient 2c. Find the eigenvalues and sketch a typical solution in the x1, x2 plane in each of the following cases: over damped c^2 > k > 0; critically damped c^2 = k > 0; under damped 0 < c^2 < k; undamped 0 = c^2 < k; free motion 0 = c^2 = k.

5. Consider the critically damped case. Find the Jordan form of the matrix A, and find a similarity transformation that transforms A to its Jordan form.

6. Find the square root of

A =
[20  40]
[−8 −16]   (1.34)

7. Give an example of a matrix A with each eigenvalue λi ≥ 0, but for which no square root √A can be defined. Why does the formula in the second problem not work?
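The two 2 by 2 examples in the Jordan form subsection, and the trace identity (1.23), are easy to verify by machine. The notes use Matlab; the following plain-Python sketch (not part of the original text) performs the same checks with exact integer arithmetic.

```python
def matmul(A, B):
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

# First example: A is diagonalizable, AP = PD, eigenvalues 7 and 6.
A1 = [[1, 2], [-15, 12]]
P = [[1, 2], [3, 5]]
D = [[7, 0], [0, 6]]
assert matmul(A1, P) == matmul(P, D)

# Second example: a Jordan block, AP = P(D + N), single eigenvalue 7.
A2 = [[10, -1], [9, 4]]
DN = [[7, 1], [0, 7]]
assert matmul(A2, P) == matmul(P, DN)

# Trace identity (1.23): tr(A^k) = 7^k + 6^k for the first example.
Ak = A1
for k in range(1, 4):
    assert Ak[0][0] + Ak[1][1] == 7**k + 6**k
    Ak = matmul(Ak, A1)
```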
1.1.6 Quadratic forms

Given an m by n matrix A, its adjoint is an n by m matrix A∗ defined by

A∗ij = \overline{Aji}.  (1.35)

If we are dealing with real numbers, then the adjoint is just the transpose. The adjoint operation has several nice properties, including A∗∗ = A and (AB)∗ = B∗A∗.

Two matrices A, B are said to be congruent if there is an invertible matrix P with P∗AP = B.

Theorem. Let A be an n by n matrix with A = A∗. Then there is an invertible matrix P such that

P∗AP = D,  (1.36)

where D is diagonal with entries 1, −1, and 0.

Define the quadratic form Q(x) = x∗Ax. Then the equation says that one may make a change of variables x = Py so that Q(x) = y∗Dy.

1.1.7 Spectral theorem

A matrix U is said to be unitary if U−1 = U∗. In the real case this is the same as being an orthogonal matrix.

Theorem. Let A be an n by n matrix with A = A∗. Then there is a unitary matrix U such that

U−1AU = D,  (1.37)

where D is diagonal with real entries.

A matrix A is said to be normal if AA∗ = A∗A. A selfadjoint matrix (A∗ = A) is normal. A skewadjoint matrix (A∗ = −A) is normal. A unitary matrix (A∗ = A−1) is normal.

Theorem. Let A be an n by n matrix that is normal. Then there is a unitary matrix U such that

U−1AU = D,  (1.38)

where D is diagonal.

Example: Let

A =
[1 2]
[2 1]   (1.39)

The eigenvalues are given by

D =
[3  0]
[0 −1]   (1.40)

A suitable orthogonal matrix P is

P = (1/√2)
[1 −1]
[1  1]   (1.41)

Then AP = PD.
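The 2 by 2 example above, with eigenvalues 3 and −1, can be verified numerically. This is an illustrative Python sketch (the notes themselves use Matlab); it checks AP = PD and the change of variables Q(Py) = 3 y1^2 − y2^2 for an arbitrarily chosen y.

```python
import math

s = 1 / math.sqrt(2)
A = [[1, 2], [2, 1]]
P = [[s, -s], [s, s]]        # orthogonal: columns are unit eigenvectors of A
D = [[3, 0], [0, -1]]        # the eigenvalues of A

def matmul(X, Y):
    return [[sum(X[i][j] * Y[j][k] for j in range(2)) for k in range(2)]
            for i in range(2)]

# Eigenvalue equation AP = PD
AP, PD = matmul(A, P), matmul(P, D)
assert all(abs(AP[i][j] - PD[i][j]) < 1e-12 for i in range(2) for j in range(2))

# Change of variables in the quadratic form: Q(Py) = y* D y = 3*y1^2 - y2^2
y = [0.3, -1.7]              # arbitrary test vector
x = [P[0][0] * y[0] + P[0][1] * y[1], P[1][0] * y[0] + P[1][1] * y[1]]
Q = x[0] * (A[0][0] * x[0] + A[0][1] * x[1]) + x[1] * (A[1][0] * x[0] + A[1][1] * x[1])
assert abs(Q - (3 * y[0]**2 - y[1]**2)) < 1e-12
```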
The columns of U are the eigenvectors, and the diagonal entries of D are the eigenvalues. Notice that the equation U−1AU = D is the same as the eigenvalue equation AU = UD.

Example: Let

P = (1/√2)
[1 −1]
[1  1]   (1.42)

Then P is a rotation by π/4. The eigenvalues are on the diagonal of

F = (1/√2)
[1+i    0 ]
[ 0   1−i ]   (1.43)

A suitable unitary matrix Q is

Q = (1/√2)
[ 1   i ]
[−i  −1 ]   (1.44)

Then PQ = QF.

1.1.8 Circulant (convolution) matrices

If A is normal, then there are unitary U and diagonal D with AU = UD. But they are difficult to compute. Here is one special but important situation where everything is explicit.

A circulant (convolution) matrix is an n by n matrix A such that there is a function a with

Apq = a(p − q),  (1.45)

where the difference is computed modulo n. [For instance, a 4 by 4 matrix would have the same entry a(3) in the 12, 23, 34, and 41 positions.]

The DFT (discrete Fourier transform) matrix is an n by n unitary matrix given by

Uqk = (1/√n) e^{2πiqk/n}.  (1.46)

Theorem. Let A be a circulant matrix. If U is the DFT matrix, then

U−1AU = D,

where D is a diagonal matrix with

Dkk = â(k) = Σ_r a(r) e^{−2πirk/n}.  (1.47)

Notice that this is the eigenvalue equation AU = UD.

1.1.9 Problems

1. Let

A =
[5 1 4 2 3]
[1 2 6 3 1]
[4 6 3 0 4]
[2 3 0 1 2]
[3 1 4 2 ·]   (1.48)

(the last entry of A is illegible in this copy). Use Matlab to find orthogonal P and diagonal D so that P−1AP = D.

2. Find unitary Q and diagonal F so that Q−1PQ = F.

3. The orthogonal matrix P is made of rotations in two planes and possibly a reflection. What are the two angles of rotation? Is there a reflection present as well? Explain.

4. Find an invertible matrix R such that R∗AR = G, where G is diagonal with all entries ±1 or 0.

In the following problems the matrix A need not be square.

5. Let A be a matrix with trivial null space. Show that A∗A is invertible.

6. Let E = A(A∗A)−1A∗. Show that E∗ = E and E^2 = E.

7. Find the eigenvalues of E. Explain the geometric meaning of E.

8. Define the pseudo-inverse A+ = (A∗A)−1A∗. Show that AA+ = E and A+A = I.

9. Let

A =
[1 1  1]
[1 2  4]
[1 3  9]
[1 4 16]
[1 5 25]
[1 6 36]
[1 7 49]   (1.49)

Calculate E and check the identities above. Find the parameter values x such that Ax best approximates

y =
[  7]
[  5]
[  6]
[ −1]
[ −3]
[ −7]
[−24]   (1.50)

(Hint: Let x = A+y.) What is the geometric interpretation of Ax?
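The circulant theorem can be checked numerically. In this illustrative Python sketch (not from the notes) the symbol values a(0), . . . , a(3) are arbitrary choices for the demonstration, and indices run from 0 rather than 1; we build the 4 by 4 circulant, form the DFT matrix, and verify that U∗AU is diagonal with entries â(k).

```python
import cmath

n = 4
a = [2.0, -1.0, 0.5, 3.0]           # arbitrary symbol a(0..3), chosen for the demo

# Circulant matrix A_pq = a((p - q) mod n)
A = [[a[(p - q) % n] for q in range(n)] for p in range(n)]

# DFT matrix U_qk = exp(2*pi*i*q*k/n) / sqrt(n); unitary, so U^{-1} = U*
U = [[cmath.exp(2j * cmath.pi * q * k / n) / n**0.5 for k in range(n)]
     for q in range(n)]

def matmul(X, Y):
    return [[sum(X[i][j] * Y[j][k] for j in range(n)) for k in range(n)]
            for i in range(n)]

Ustar = [[U[j][i].conjugate() for j in range(n)] for i in range(n)]
D = matmul(Ustar, matmul(A, U))     # should be diagonal

for k in range(n):
    ahat = sum(a[r] * cmath.exp(-2j * cmath.pi * r * k / n) for r in range(n))
    assert abs(D[k][k] - ahat) < 1e-9          # diagonal entries are a-hat(k)
    for j in range(n):
        if j != k:
            assert abs(D[k][j]) < 1e-9         # off-diagonal entries vanish
```

This is the eigenvalue equation AU = UD in action: every circulant is diagonalized by the same unitary matrix, which is why convolution becomes pointwise multiplication under the discrete Fourier transform.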
1.2 Vector spaces

1.2.1 Vector spaces

A vector space is a collection V of objects that may be combined with vector addition and scalar multiplication. There will be two possibilities for the scalars. They may be elements of the field R of real numbers. Or they may be elements of the field C of complex numbers. To handle both cases together, we shall consider the scalars as belonging to a field F.

The vector space addition axioms say that the sum of each two vectors u, v in V is another vector in the space called u + v in V. Addition must satisfy the following axioms:

associative law
(u + v) + w = u + (v + w)  (1.51)

additive identity law
u + 0 = 0 + u = u  (1.52)

additive inverse law
u + (−u) = (−u) + u = 0  (1.53)

and commutative law
u + v = v + u.  (1.54)

The vector space axioms also require that for each scalar c in F and each u in V there is a vector cu in V. Scalar multiplication must satisfy the following axioms:

distributive law for vector addition
c(u + v) = cu + cv  (1.55)

distributive law for scalar addition
(c + d)u = cu + du  (1.56)

associative law for scalar multiplication
(cd)u = c(du)  (1.57)

identity law for scalar multiplication
1u = u.  (1.58)

The elements of a vector space are pictured as arrows starting at a fixed origin. The vector sum is pictured in terms of the parallelogram law: the sum of two arrows starting at the origin is the diagonal of the parallelogram formed by the two arrows.

A subspace of a vector space is a subset that is itself a vector space. The smallest possible subspace consists of the zero vector at the origin. A one dimensional vector subspace is pictured as a line through the origin. A two dimensional vector subspace is pictured as a plane through the origin, and so on.

1.2.2 Linear transformations

Let V and W be vector spaces. A linear transformation T : V → W is a function from V to W that preserves the vector space operations. Thus

T(u + v) = Tu + Tv  (1.59)

and

T(cu) = cTu.  (1.60)

The null space of a linear transformation T : V → W is the set of all vectors u in V such that Tu = 0. The null space is sometimes called the kernel. The range of a linear transformation T : V → W is the set of all vectors Tu in W, where u is in V. The range is sometimes called the image.

Theorem. The null space of a linear transformation T : V → W is a subspace of V.

Theorem. The range of a linear transformation T : V → W is a subspace of W.

Theorem. A linear transformation T : V → W is one-to-one if and only if its null space is the zero subspace.

Theorem. A linear transformation T : V → W is onto W if and only if its range is W.

Theorem. A linear transformation T : V → W has an inverse T−1 : W → V if and only if its null space is the zero subspace and its range is W.

An invertible linear transformation will also be called a vector space isomorphism.

Consider a list of vectors p1, . . . , pn. They define a linear transformation P : Fn → V, by taking the vector in Fn to be the weights of a linear combination of the p1, . . . , pn.

Theorem. A list of vectors is linearly independent if and only if the corresponding linear transformation has zero null space.

Theorem. A list of vectors spans the vector space V if and only if the corresponding linear transformation has range V.

Theorem. A list of vectors is a basis for V if and only if the corresponding linear transformation P : Fn → V is a vector space isomorphism. In that case the inverse transformation P−1 : V → Fn is the transformation that carries a vector to its coordinates with respect to the basis.

In case there is a basis transformation P : Fn → V, the vector space V is said to be n dimensional.

1.2.3 Reduced row echelon form

Theorem. Let p1, . . . , pn be a list of vectors in an m dimensional vector space V. Let P : Fn → V be the corresponding transformation. Then there is an isomorphism E : V → Fm such that

EP = R,  (1.61)
1.2. Draw the vector T u as a vector starting at the point u. and hyperbolic ﬁxed points. We can also write this in vector form. and so the vector ﬁeld goes around in closed elliptical paths. Each entry of N is zero. The equation T P = P (D + N ) is equivalent to the eigenvalue equation T pk = λk pk except when λk = λk−1 . when it may take the form T pk = λk pk + pk−1 . attracting. even for a real vector space. (1. .
w) = a(u. It is required that it be linear in the second variable. a linear transformation a mixed tensor. (1. In fact. w) + a(v. u). it is required to be conjugate linear in the ﬁrst variable. v) = (1. 4 a(u.2. v) = ca(u. v). v). In this case this is called a bilinear form. The value of ω on u is obtained by taking the vector u and seeing how many contour lines it crosses. v) + a(u. LINEAR ALGEBRA We think of the elements of Fn as column vectors. cv) = ca(u. The elements of the dual space of Fn are row vectors.67) and a(cu. u − v) determines the real part.69) In the real case the complex conjugate does not occur.6 Quadratic forms A sesquilinear form is a function that assigns to each ordered pair of vectors u. v). Then part. A sesquilinear form is sometimes regarded as a covariant tensor. A sesquilinear form deﬁnes a linear transformation from V to V ∗ . (1. this says that a(u + v. u) that is real. w) (1. In this case the bilinear form is called symmetric. while forms in the dual space V ∗ are called covariant vectors. These are straight lines at regularly spaced intervals. v in V a scalar a(u.) This linear transformation takes the vector u to the form a(u. The quadratic form determines the Hermitian form. in the complex case it is a conjugate linear transformation. w) (1. ¯ (1. Sometimes vectors in the original space V are called contravariant vectors.65) and a(u. A sesquilinear form is Hermitian if a(u. ·). The way to picture an element of V ∗ is via its contour lines. but not one from a space to itself. v) = a(u + v.70) a(iu. a(u. (Actually. u + v) + a(u − v. v + w) = a(u. A Hermitian form deﬁnes a quadratic form a(u.20 CHAPTER 1.66) Furthermore. v) = a(v. Thus a(u. So a sesquilinear form may be viewed as a special kind of linear transformation. The value of a row vector on a column vector is given by taking the row vector on the left and the column vector on the right and using matrix multiplication. v) determines the imaginary . 1. 
By contrast.68) In the real case one does not need the complex conjugates on the scalars.
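The polarization identity can be tested numerically for the standard Hermitian sesquilinear form a(u, v) = Σ ūk vk on C^2. This is an illustrative sketch, not part of the notes; the random test vectors are arbitrary.

```python
import random

def a(u, v):
    # Standard Hermitian form on C^n: conjugate linear in u, linear in v
    return sum(x.conjugate() * y for x, y in zip(u, v))

random.seed(1)
u = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(2)]
v = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(2)]

plus = [x + y for x, y in zip(u, v)]
minus = [x - y for x, y in zip(u, v)]
# Polarization (1.70): 4 Re a(u,v) = a(u+v, u+v) - a(u-v, u-v)
assert abs(4 * a(u, v).real - (a(plus, plus) - a(minus, minus)).real) < 1e-12

# a(iu, v) = -i a(u, v), so its real part recovers the imaginary part of a(u, v)
iu = [1j * x for x in u]
assert abs(a(iu, v).real - a(u, v).imag) < 1e-12
```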
Theorem. Let a be a Hermitian form deﬁned for vectors in an n dimensional vector space V . Then there exists an invertible linear transformation P : Fn → V such that a(P x, P x) = x∗ Dx, (1.71) where D is diagonal with entries 1, −1, and 0. It is sometimes possible to get a good picture of a quadratic form. In the real case when V is two dimensional a quadratic form may be pictured by its contour lines. There are diﬀerent cases depending on whether one has +1s or −1s or a +1, −1 combination.
1.2.7 Special relativity
In special relativity V is the four dimensional real vector space of displacements in spacetime. Thus a vector u takes one event in spacetime into another event in spacetime. It represents a separation between these spacetime events. There is a quadratic form g with signature +1, −1, −1, −1. The vector u is said to be timelike, lightlike, or spacelike according to whether g(u, u) > 0, g(u, u) = 0, or g(u, u) < 0. The time (perhaps in seconds) between timelike separated events is √g(u, u). The distance (perhaps in light-seconds) between spacelike separated events is √(−g(u, u)). The light cone is the set of lightlike vectors. The rest of the vector space consists of three parts: forward timelike vectors, backward timelike vectors, and spacelike vectors.

Lemma. If two forward timelike vectors u, v have unit length g(u, u) = g(v, v) = 1, then their difference u − v is spacelike.

Proof: This lemma is at least geometrically plausible. It may be proved by using the diagonal form D of the quadratic form.

Lemma. If u, v are forward timelike vectors with unit length, then g(u, v) > 1.

Proof: This follows immediately from the previous lemma.

Anti-Schwarz inequality. If u, v are forward timelike vectors, then g(u, v) > √g(u, u) √g(v, v).

Proof: This follows immediately from the previous lemma.

Anti-triangle inequality. If u, v are forward timelike vectors, and u + v = w, then

√g(w, w) > √g(u, u) + √g(v, v).  (1.72)

Proof: This follows immediately from the anti-Schwarz inequality.

The anti-triangle inequality is the basis of the twin paradox. In this case the lengths of the vectors are measured in time units. One twin undergoes no acceleration; the total displacement in spacetime is given by w. The other twin begins a journey according to vector u, then changes course and returns to a meeting event according to vector v. At this meeting the more adventurous twin is younger.

The anti-triangle inequality is also the basis of the famous conversion of mass to energy. In this case w is the energy-momentum vector of a particle,
and √g(w, w) is its rest mass. (Consider energy, momentum, and mass as all measured in energy units.) The particle splits into two particles that fly apart from each other. The energy-momentum vectors of the two particles are u and v. By conservation of energy-momentum, w = u + v. The total rest mass √g(u, u) + √g(v, v) of the two particles is smaller than the original rest mass. This has released the energy that allows the particles to fly apart.
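The anti-triangle inequality is quick to illustrate numerically. In this sketch (not in the notes) the two forward timelike vectors are arbitrary choices; g is the Minkowski form with signature +1, −1, −1, −1.

```python
import math

def g(u, v):
    # Minkowski form with signature +1, -1, -1, -1
    return u[0] * v[0] - u[1] * v[1] - u[2] * v[2] - u[3] * v[3]

u = (5.0, 1.0, 2.0, 0.0)    # forward timelike: g(u,u) = 25 - 5 = 20 > 0
v = (4.0, -2.0, 1.0, 1.0)   # forward timelike: g(v,v) = 16 - 6 = 10 > 0
w = tuple(x + y for x, y in zip(u, v))

assert g(u, u) > 0 and g(v, v) > 0 and u[0] > 0 and v[0] > 0
# anti-Schwarz: g(u,v) exceeds the product of the Minkowski lengths
assert g(u, v) > math.sqrt(g(u, u)) * math.sqrt(g(v, v))
# anti-triangle (1.72): the stay-at-home twin's proper time is the longest
assert math.sqrt(g(w, w)) > math.sqrt(g(u, u)) + math.sqrt(g(v, v))
```

Note the reversal of the usual Euclidean inequalities: the straight worldline w maximizes, rather than minimizes, the total "length" (proper time).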
1.2.8 Scalar products and adjoint
A Hermitian form g(u, v) is an inner product (or scalar product) if g(u, u) > 0 except when u = 0. Often when there is just one scalar product under consideration it is written g(u, v) = (u, v). However we shall continue to use the more explicit notation for a while, since it is important to realize that to use these notions there must be some mechanism that chooses an inner product. In the simplest cases the inner product arises from elementary geometry, but there can be other sources for a definition of a natural inner product.

Remark. An inner product gives a linear transformation from a vector space to its dual space. Let V be a vector space with inner product g. Let u be in V. Then there is a corresponding linear form u∗ in the dual space of V defined by

⟨u∗, v⟩ = g(u, v)  (1.73)
for all v in V . This form could also be denoted u∗ = gu. The picture associated with this construction is that given a vector u represented by an arrow, the u∗ = gu is a form with contour lines perpendicular to the arrow. The longer the vector, the more closely spaced the contour lines. Theorem. (Riesz representation theorem) An inner product not only gives a transformation from a vector space to its dual space, but this is an isomorphism. Let V be a ﬁnite dimensional vector space with inner product g. Let ω : V → F be a linear form in the dual space of V . Then there is a unique vector ω ∗ in V such that g(ω ∗ , v) = ω, v (1.74) for all v in V . This vector could also be denoted ω ∗ = g −1 ω. The picture associated with this construction is that given a form ω represented by contour lines, the ω ∗ = g −1 ω is a vector perpendicular to the lines. The more closely spaced the contour lines, the longer the vector. Theorem: Let V1 and V2 be two ﬁnite dimensional vector spaces with inner products g1 and g2 . Let A : V1 → V2 be a linear transformation. Then there is another operator A∗ : V2 → V1 such that g1 (A∗ u, v) = g2 (Av, u) for every u in V2 and v in V1 . Equivalently, g1 (A∗ u, v) = g2 (u, Av) for every u in V2 and v in V1 . (1.76) (1.75)
The outline of the proof of this theorem is the following. If u is given, then the map g2(u, A·) is an element of the dual space V1∗. It follows from the previous theorem that it is represented by a vector. This vector is A∗u.

A linear transformation A : V1 → V2 is unitary if A∗ = A−1. There are more possibilities when the two spaces are the same. A linear transformation A : V → V is selfadjoint if A∗ = A. It is skewadjoint if A∗ = −A. It is unitary if A∗ = A−1. A linear transformation A : V → V is normal if AA∗ = A∗A. Every operator that is selfadjoint, skewadjoint, or unitary is normal.

Remark 1. Here is a special case of the definition of adjoint. Fix a vector u in V. Let A : F → V be defined by Ac = cu. Then A∗ : V → F is a form u∗ such that ⟨u∗, z⟩c = g(z, cu). This justifies the definition ⟨u∗, z⟩ = g(u, z).

Remark 2. Here is another special case of the definition of adjoint. Let A : V → F be given by a form ω. Then the adjoint A∗ : F → V applied to c is the vector cω∗ such that g(cω∗, v) = c̄⟨ω, v⟩. This justifies the definition g(ω∗, v) = ⟨ω, v⟩.

There are standard equalities and inequalities for inner products. Two vectors u, v are said to be orthogonal (or perpendicular) if g(u, v) = 0.

Theorem. (Pythagoras) If u, v are orthogonal and w = u + v, then

g(w, w) = g(u, u) + g(v, v).  (1.77)

Lemma. If u, v are vectors with unit length, then Re g(u, v) ≤ 1.

Proof: Compute 0 ≤ g(u − v, u − v) = 2 − 2 Re g(u, v).

Lemma. If u, v are vectors with unit length, then |g(u, v)| ≤ 1.

Proof: This follows easily from the previous lemma.

Notice that in the real case this says that −1 ≤ g(u, v) ≤ 1. This allows us to define the angle between two unit vectors u, v to be the number θ with 0 ≤ θ ≤ π and cos(θ) = g(u, v).

Schwarz inequality. We have the inequality |g(u, v)| ≤ √g(u, u) √g(v, v).

Proof: This follows immediately from the previous lemma.

Triangle inequality. If u + v = w, then

√g(w, w) ≤ √g(u, u) + √g(v, v).  (1.78)
Proof: This follows immediately from the Schwarz inequality.
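Both inequalities are easy to test numerically with the standard complex inner product; an illustrative sketch (not in the notes) with arbitrary random vectors:

```python
import math
import random

def g(u, v):
    # Standard complex inner product, conjugate linear in the first slot
    return sum(x.conjugate() * y for x, y in zip(u, v))

random.seed(2)
u = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(3)]
v = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(3)]
w = [x + y for x, y in zip(u, v)]

norm = lambda x: math.sqrt(g(x, x).real)
# Schwarz inequality: |g(u,v)| <= ||u|| ||v||
assert abs(g(u, v)) <= norm(u) * norm(v) + 1e-12
# Triangle inequality (1.78): ||u + v|| <= ||u|| + ||v||
assert norm(w) <= norm(u) + norm(v) + 1e-12
```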
1.2.9 Spectral theorem
Theorem. Let V be an n dimensional vector space with an inner product g. Let T : V → V be a linear transformation that is normal. Then there is a unitary transformation U : Fn → V such that U −1 T U = D, where D is diagonal. (1.79)
We can also write this in vector form. The equation is equivalent to the eigenvalue equation TU = UD, or, explicitly,

Tuk = λk uk.  (1.80)

The transformation U defines an orthonormal basis of vectors.

The distinction between linear transformations and quadratic forms sometimes shows up even in the context of matrix theory. Let A be a selfadjoint matrix, regarded as defining a quadratic form. Let B be a selfadjoint matrix that has all eigenvalues strictly positive. Then B defines a quadratic form that is in fact a scalar product. The natural linear transformation to consider is then T = B−1A. The linear transformation B−1A is a selfadjoint linear transformation relative to the quadratic form defined by B. Therefore its eigenvalues λ are automatically real. The eigenvalue problem for this linear transformation takes the form

Au = λBu.  (1.81)

Indeed, this version is often encountered in concrete problems.

Example: Consider a system of linear spring oscillators. The equation of motion is

M d²x/dt² = −Ax.  (1.82)

Here M is the matrix of mass coefficients (a diagonal matrix with strictly positive entries). The matrix A is real and symmetric, and it is assumed to have positive eigenvalues. If we try a solution of the form x = z cos(√λ t), we get the condition

Az = λMz.  (1.83)

In this example the inner product is defined by the mass matrix M. The matrix M−1A is a selfadjoint operator relative to the inner product defined by M. That is why the eigenvalues of M−1A are real. Furthermore, the associated quadratic form is defined by A, which is positive, so the eigenvalues of the linear transformation M−1A are also positive. Thus the √λ are real and positive, and they make sense as resonant angular frequencies.
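A numeric sketch of the generalized eigenvalue problem Az = λMz (not from the notes; the masses and spring constants here are hypothetical choices): for a 2 degree-of-freedom system, compute the eigenvalues of M−1A from its characteristic polynomial and check that they are real and positive, as the theory predicts.

```python
import math

M = [[2.0, 0.0], [0.0, 1.0]]        # mass matrix: diagonal, strictly positive
A = [[3.0, -1.0], [-1.0, 1.0]]      # stiffness matrix: symmetric, positive definite

# T = M^{-1} A; since M is diagonal, this is just row scaling
T = [[A[i][j] / M[i][i] for j in range(2)] for i in range(2)]

# Eigenvalues of the 2x2 matrix T from its characteristic polynomial
tr = T[0][0] + T[1][1]
det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
disc = tr * tr - 4 * det
assert disc >= 0                    # real eigenvalues: T is selfadjoint w.r.t. M
lam1 = (tr + math.sqrt(disc)) / 2
lam2 = (tr - math.sqrt(disc)) / 2
assert lam1 > 0 and lam2 > 0        # positive, since A is positive definite

# Resonant angular frequencies sqrt(lambda) of the modes x = z cos(sqrt(lambda) t)
freqs = sorted([math.sqrt(lam1), math.sqrt(lam2)])
assert abs(freqs[0] - math.sqrt(0.5)) < 1e-12
assert abs(freqs[1] - math.sqrt(2.0)) < 1e-12
```

Note that T itself is not symmetric as a matrix; it is selfadjoint only relative to the inner product defined by M, which is exactly the point of the discussion above.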
Furthermore. until one realizes that the ﬁrst ∂/∂u is taken holding v constant.1 Vector ﬁelds and diﬀerential forms Coordinate systems In mathematical modeling it is important to be ﬂexible in the choice of coordinate system. and the scalars u and v may be expressed as functions of the α and β. For this reason we shall consider what one can do with arbitrary coordinate systems. Let α.3. Then the chain rule says that ∂ ∂α ∂ ∂β ∂ = + . one can do as the chemists do and use notation that explicitly speciﬁes what is being held constant. This would be something quite diﬀerent. Similarly. for instance.1. the partial derivative ∂/∂v is the derivative holding u constant. Then ∂/∂u would be computed as a derivative along a curve where β is constant. the relation ∂ ∂ ∂β ∂ = + . v.3. β. One must be careful to note that an expression such as ∂/∂u makes no sense without reference to a coordinate system u. Consider the partial derivative ∂/∂u as the derivative holding the coordinate v constant. We consider systems whose state can be described by two coordinates. by taking u = α. the matrix J= ∂α ∂u ∂β ∂u ∂α ∂v ∂β ∂v (1. ∂u ∂u ∂u ∂β (1.84) is invertible at each point. ∂u ∂u ∂α ∂u ∂β Similarly. One way out of this kind of puzzle is to be very careful to specify the entire coordinate system whenever a partial derivative is taken. Then the scalars α and β may be expressed as functions of the u and v. ∂u ∂u dα + dβ (1.88) du = ∂α ∂β . Thus. Say some part of the system can be described by coordinates u.85) ∂ ∂α ∂ ∂β ∂ = + . (1. Notice that these formulas amount to the application of the transpose of the matrix J to the partial derivatives. However the same ideas work for higher dimensional systems.87) This seems like complete nonsense. (1. One can do similar computations with diﬀerentials of the coordinates. Alternatively. VECTOR FIELDS AND DIFFERENTIAL FORMS 25 1. Thus we would have. For instance. say that we were interested in coordinate system u. 
while the second ∂/∂u is taken holding β constant. β be another coordinate system for that part of the system.3 1. for instance.86) ∂v ∂v ∂α ∂v ∂β These formulas are true when applied to an arbitrary scalar. v.
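The chain rule (1.85) can be checked numerically. Here (u, v) is taken to be Cartesian coordinates (x, y) and (α, β) polar coordinates (r, θ); the scalar h is an arbitrary sample function:

```python
import math

# Check the chain rule (1.85) with (u, v) = (x, y) and (alpha, beta) = (r, theta).
def h_xy(x, y):
    return x * x * y + y          # arbitrary smooth sample scalar

def h_rt(r, t):
    return h_xy(r * math.cos(t), r * math.sin(t))

x0, y0 = 1.2, 0.7
r0, t0 = math.hypot(x0, y0), math.atan2(y0, x0)
eps = 1e-6

def d(f, a, b, i):
    # central finite difference in the i-th argument of f
    if i == 0:
        return (f(a + eps, b) - f(a - eps, b)) / (2 * eps)
    return (f(a, b + eps) - f(a, b - eps)) / (2 * eps)

# Left side: partial h / partial x, holding y constant.
lhs = d(h_xy, x0, y0, 0)

# Right side: (dr/dx) dh/dr + (dtheta/dx) dh/dtheta,
# using dr/dx = x/r and dtheta/dx = -y/r^2.
rhs = (x0 / r0) * d(h_rt, r0, t0, 0) + (-y0 / r0**2) * d(h_rt, r0, t0, 1)

print(lhs, rhs)   # the two values agree to finite-difference accuracy
```

The coefficients x/r and −y/r² are exactly the entries of the transpose of J for this pair of coordinate systems, which is the content of the remark after (1.86).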
One can do similar computations with differentials of the coordinates. Thus

du = (∂u/∂α) dα + (∂u/∂β) dβ    (1.88)

and

dv = (∂v/∂α) dα + (∂v/∂β) dβ.    (1.89)

Notice that these relations are quite different: they involve the inverse of the matrix J. Here it is important to note that expressions such as du are defined quite independently of the other members of the coordinate system. In fact, let h be an arbitrary scalar. Then dh is defined and may be expressed in terms of an arbitrary coordinate system. For instance

dh = (∂h/∂u) du + (∂h/∂v) dv    (1.90)

and also

dh = (∂h/∂α) dα + (∂h/∂β) dβ.    (1.91)

Differentials such as these occur in line integrals.

1.3.2 Vector fields

For each point P in the space being studied, there is a vector space V(P), called the tangent space at the point P. The elements of V(P) are pictured as arrows starting at the point P. There are of course infinitely many vectors in each V(P). One can pick a linearly independent list that spans V(P) as basis vectors for V(P).

We consider arbitrary coordinates x, y in the plane. These are not necessarily Cartesian coordinates. Then there is a corresponding basis at each point given by the partial derivatives ∂/∂x and ∂/∂y. This kind of basis is called a coordinate basis.

A vector field assigns to each point a vector in the tangent space at that point. (The vector should depend on the point in a smooth way.) Thus a vector field is a first order linear partial differential operator of the form

L = a ∂/∂x + b ∂/∂y,    (1.92)

where a and b are functions of x and y. It is sometimes called a contravariant vector field.

With each vector field L there is an associated ordinary differential equation

dx/dt = a = f(x, y)    (1.93)
dy/dt = b = g(x, y).    (1.94)

Theorem. Let h be a scalar function of x and y. Then along each solution of the differential equation,

dh/dt = Lh.    (1.95)
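A minimal sketch of (1.95), using the rotation field L = −y ∂/∂x + x ∂/∂y as a sample: for h = x² + y² one has Lh = (−y)(2x) + (x)(2y) = 0, so h should be constant along solutions of the associated differential equation.

```python
import math

# The ODE associated with L = -y d/dx + x d/dy is dx/dt = -y, dy/dt = x.
def step(x, y, dt):
    # one classical fourth-order Runge-Kutta step
    def f(x, y):
        return -y, x
    k1 = f(x, y)
    k2 = f(x + dt/2*k1[0], y + dt/2*k1[1])
    k3 = f(x + dt/2*k2[0], y + dt/2*k2[1])
    k4 = f(x + dt*k3[0], y + dt*k3[1])
    return (x + dt/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
            y + dt/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))

x, y = 1.0, 0.0
dt = 0.01
for _ in range(int(2 * math.pi / dt)):   # roughly one full revolution
    x, y = step(x, y, dt)

print(x*x + y*y)   # stays near 1: since Lh = 0, h is constant along the flow
```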
The vector field determines the differential equation. Furthermore, the differential equation determines the vector field. They are the same thing, expressed in somewhat different language. A solution of the differential equation is a parameterized curve such that the tangent velocity vector at each point of the curve is the value of the vector field at that point. The vector field or the differential equation may be expressed in an arbitrary coordinate system.

Theorem. (straightening out theorem) Consider a point where L ≠ 0. Then near that point there is a new coordinate system u and v such that

L = ∂/∂u.    (1.96)

1.3.3 Differential forms

A differential form assigns to each point a linear form on the tangent space at that point. (Again the assignment should be smooth.) Thus it is an expression of the form

ω = p dx + q dy,    (1.97)

where p and q are functions of x and y. It is sometimes called a covariant vector field. Again, a differential form may be expressed in an arbitrary coordinate system.

The value of the differential form on the vector field is the scalar function

ω⟨L⟩ = pa + qb.    (1.98)

It is a scalar function of x and y.

A differential form may be pictured in a rough way by indicating a bunch of parallel lines near each point of the space. However, as we shall see, these lines fit together in a nice way only in a special situation (for an exact form).

A differential form is said to be exact in a region if there is a smooth scalar function h defined on the region such that

ω = dh.    (1.99)

Explicitly, this says that

p dx + q dy = (∂h/∂x) dx + (∂h/∂y) dy.    (1.100)

In this case the contour lines of the differential form at each tangent space fit together on a small scale as pieces of the contour curves of the function h.

A differential form is said to be closed in a region if it satisfies the integrability condition

∂p/∂y = ∂q/∂x    (1.101)

in the region.

Theorem. If a form is exact, then it is closed.

A differential form is what occurs as the integrand of a line integral. Let C be an oriented curve. Then the line integral

∫_C ω = ∫_C p dx + q dy    (1.102)

may be computed with an arbitrary coordinate system and with an arbitrary parameterization of the oriented curve. The numerical answer is always the same.

Theorem. If a form is exact in a region, so that ω = dh, and if the curve C goes from point P to point Q, then the integral

∫_C ω = h(Q) − h(P).    (1.103)

Thus the value of the line integral depends only on the end points. In fact, it turns out that the form is exact in the region if and only if the value of the line integral along each curve in the region depends only on the end points. Another way to formulate this is to say that a form is exact in a region if and only if the line integral around each closed curve is zero.

Theorem. (Poincaré lemma) If a form is closed, then it is locally exact.

It is not true that if a form is closed in a region, then it is exact in the region. Thus, for example, the form (−y dx + x dy)/(x² + y²) is closed in the plane minus the origin, but it is not exact in this region. On the other hand, consider any smaller region, obtained by removing a line from the origin to infinity. In this smaller region this form may be represented as dθ, where the angle θ has a discontinuity only on the removed line. Thus, by the Poincaré lemma, it is locally exact.

When a differential form is exact it is easy to picture. Then it is of the form dh. The contour lines of h are curves.
The diﬀerential at any point is obtained by zooming in at the point so close that the contour lines look straight. If a form is exact in a region.
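The closed-but-not-exact example above, ω = (−y dx + x dy)/(x² + y²), can be checked numerically: its line integral around the unit circle is 2π rather than 0, so ω cannot be dh for any single-valued h on the punctured plane.

```python
import math

# Line integral of (-y dx + x dy)/(x^2 + y^2) around the unit circle,
# parameterized by x = cos t, y = sin t, summed over small steps.
n = 100000
total = 0.0
for i in range(n):
    t = 2 * math.pi * (i + 0.5) / n
    x, y = math.cos(t), math.sin(t)
    dx, dy = -math.sin(t), math.cos(t)      # dx/dt and dy/dt
    r2 = x * x + y * y
    total += (-y / r2 * dx + x / r2 * dy) * (2 * math.pi / n)

print(total)   # approximately 2*pi, not 0
```

A nonzero integral around a closed curve is exactly the obstruction to exactness described above.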
1.3.4 Linearization of a vector field near a zero

The situation is more complicated at a point where L = 0. At such a point, there is a linearization of the vector field. Thus u, v are coordinates that vanish at the point, and the linearization is given by

du/dt = (∂a/∂x) u + (∂a/∂y) v    (1.104)
dv/dt = (∂b/∂x) u + (∂b/∂y) v.    (1.105)

The partial derivatives are evaluated at the point. Thus this equation has the matrix form

d/dt (u ; v) = ( a_x   a_y ; b_x   b_y ) (u ; v).    (1.106)

The u and v represent deviations of the x and y from their values at this point. The idea is that the vector field is somehow approximated by this linear vector field. This linear differential equation can be analyzed using the Jordan form of the matrix.

It is natural to ask whether at an isolated zero of the vector field coordinates can be chosen so that the linear equation is exactly equivalent to the original nonlinear equation. The answer is: usually, but not always. In particular, there is a problem when the eigenvalues satisfy certain linear equations with integer coefficients, such as λ₁ + λ₂ = 0. This includes the case of conjugate pure imaginary eigenvalues.

Theorem. (Sternberg linearization theorem) Suppose that a vector field satisfies the eigenvalue condition at a zero. Then there is a new coordinate system near that zero in which the vector field is a linear vector field.

The precise statement of the eigenvalue condition is in terms of integer linear combinations m₁λ₁ + m₂λ₂ with m₁ ≥ 0, m₂ ≥ 0, and m₁ + m₂ ≥ 2. The condition is that neither λ₁ nor λ₂ may be expressed in this form. How can this condition be violated? One possibility is to have λ₁ = 2λ₁, which says λ₁ = 0. Another is to have λ₁ = 2λ₂. More interesting is λ₁ = 2λ₁ + λ₂, which gives λ₁ + λ₂ = 0. This situation includes some of the most interesting cases, such as that of conjugate pure imaginary eigenvalues.

For the straightening out theorem and the Sternberg linearization theorem, see E. Nelson, Topics in Dynamics I: Flows, Princeton University Press, Princeton, NJ, 1969.

1.3.5 Quadratic approximation to a function at a critical point

The useful way to picture a function h is by its contour lines. Consider a function h such that at a particular point we have dh = 0. Then there is a naturally defined quadratic form

d²h = h_xx (dx)² + 2 h_xy dx dy + h_yy (dy)².    (1.107)

The partial derivatives are evaluated at the point. It is determined by a symmetric matrix

H = ( h_xx   h_xy ; h_xy   h_yy ).    (1.108)

Its value on a tangent vector L at the point is

d²h⟨L, L⟩ = h_xx a² + 2 h_xy ab + h_yy b².    (1.109)

This quadratic form is called the Hessian.

Theorem. (Morse lemma) Let h be such that dh vanishes at a point. Let h₀ be the value of h at this point. Suppose that the Hessian is nondegenerate at the point. Then there is a coordinate system u, v such that near the point

h − h₀ = ε₁ u² + ε₂ v²,    (1.110)
where ε₁ and ε₂ are constants that are each equal to ±1.

For the Morse lemma, see J. Milnor, Morse Theory, Princeton University Press, Princeton, NJ, 1969.

1.3.6 Differential, gradient, and divergence

It is important to distinguish between differential and gradient. The differential of a scalar function h is given in an arbitrary coordinate system by

dh = (∂h/∂u) du + (∂h/∂v) dv.    (1.111)

On the other hand, the definition of the gradient depends on the quadratic form that defines the metric:

g = E (du)² + 2F du dv + G (dv)².    (1.112)

The gradient ∇h is obtained by applying the inverse g⁻¹ of the quadratic form to the differential, to obtain the corresponding vector field. The inverse of the matrix

g = ( E   F ; F   G )    (1.113)

is

g⁻¹ = (1/g) ( G   −F ; −F   E ),    (1.114)

where g = EG − F² is the determinant of the matrix g. Thus

∇h = (1/g) (G ∂h/∂u − F ∂h/∂v) ∂/∂u + (1/g) (−F ∂h/∂u + E ∂h/∂v) ∂/∂v.    (1.115)

For orthogonal coordinates, when F = 0, the metric has the form g = E (du)² + G (dv)², and the expression is simpler:

∇h = (1/E) (∂h/∂u) ∂/∂u + (1/G) (∂h/∂v) ∂/∂v.    (1.116)

Example: In polar coordinates g = (dr)² + r² (dθ)², with determinant g = r². So

∇h = (∂h/∂r) ∂/∂r + (1/r²) (∂h/∂θ) ∂/∂θ.    (1.117)
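Since the gradient is defined independently of coordinates, the squared length computed from the polar expression (1.117), namely h_r² + (1/r²) h_θ², must agree with the Cartesian value h_x² + h_y². A finite-difference check with an arbitrary sample scalar:

```python
import math

# Compare |grad h|^2 computed in Cartesian and in polar coordinates.
def h_xy(x, y):
    return x * x * y - y          # arbitrary smooth sample scalar

def h_rt(r, t):
    return h_xy(r * math.cos(t), r * math.sin(t))

x0, y0 = 0.8, 1.1
r0, t0 = math.hypot(x0, y0), math.atan2(y0, x0)
eps = 1e-6

hx = (h_xy(x0 + eps, y0) - h_xy(x0 - eps, y0)) / (2 * eps)
hy = (h_xy(x0, y0 + eps) - h_xy(x0, y0 - eps)) / (2 * eps)
hr = (h_rt(r0 + eps, t0) - h_rt(r0 - eps, t0)) / (2 * eps)
ht = (h_rt(r0, t0 + eps) - h_rt(r0, t0 - eps)) / (2 * eps)

cart = hx * hx + hy * hy                  # Cartesian |grad h|^2
polar = hr * hr + (1.0 / r0**2) * ht * ht # polar |grad h|^2 from (1.117)
print(cart, polar)                        # the two values agree
```

The factor 1/r² in front of the θ term is exactly the inverse-metric entry from (1.114); dropping it would give a coordinate-dependent, and hence wrong, answer.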
Another useful operation is the divergence of a vector field. For a vector field

v = a ∂/∂u + b ∂/∂v,    (1.119)

the divergence is

∇·v = (1/√g) ∂(√g a)/∂u + (1/√g) ∂(√g b)/∂v.    (1.120)

In this formula √g denotes the square root of the determinant of the metric form in this coordinate system. It is not immediately clear why this is the correct quantity. However it may ultimately be traced back to the fact that the volume element in an arbitrary coordinate system is √g du dv. For instance, in polar coordinates it is r dr dθ.

Thus, for example, in polar coordinates the divergence of

v = a ∂/∂r + b ∂/∂θ    (1.121)

is

∇·v = (1/r) ∂(r a)/∂r + (1/r) ∂(r b)/∂θ = (1/r) ∂(r a)/∂r + ∂b/∂θ.    (1.122)

The Laplace operator is the divergence of the gradient. Thus

∇·∇h = (1/√g) ∂/∂u [ (1/√g) (G ∂h/∂u − F ∂h/∂v) ] + (1/√g) ∂/∂v [ (1/√g) (−F ∂h/∂u + E ∂h/∂v) ].    (1.123)

Again this simplifies in orthogonal coordinates:

∇·∇h = (1/√g) ∂/∂u [ (√g/E) ∂h/∂u ] + (1/√g) ∂/∂v [ (√g/G) ∂h/∂v ].    (1.124)

In polar coordinates this is

∇·∇h = (1/r) ∂/∂r ( r ∂h/∂r ) + (1/r²) ∂²h/∂θ².    (1.125)

The important thing to emphasize is that vector fields are very different from differential forms. A differential form is the sort of object that fits naturally into a line integral. A vector field, on the other hand, is something that defines a flow, via the solution of a system of ordinary differential equations. These objects play different roles. The differential of a scalar function has the same form in any coordinate system; the gradient, on the other hand, depends on the inner product, and this difference is highly visible once one leaves Cartesian coordinates.

1.3.7 Spherical polar coordinates

These ideas have analogs in higher dimensions. Example: Consider spherical polar coordinates, where θ is colatitude and φ is longitude. Then the metric form is

g = (dr)² + r² (dθ)² + r² sin²(θ) (dφ)².    (1.126)
In this case the gradient is

∇h = (∂h/∂r) ∂/∂r + (1/r²) (∂h/∂θ) ∂/∂θ + (1/(r² sin²(θ))) (∂h/∂φ) ∂/∂φ.    (1.127)

The divergence of a vector field makes sense in any number of dimensions. The divergence of

v = a ∂/∂r + b ∂/∂θ + c ∂/∂φ    (1.128)

is

∇·v = (1/(r² sin(θ))) ∂(r² sin(θ) a)/∂r + (1/(r² sin(θ))) ∂(r² sin(θ) b)/∂θ + (1/(r² sin(θ))) ∂(r² sin(θ) c)/∂φ.    (1.129)

This can be written more simply as

∇·v = (1/r²) ∂(r² a)/∂r + (1/sin(θ)) ∂(sin(θ) b)/∂θ + ∂c/∂φ.    (1.130)

The Laplace operator is the divergence of the gradient. In spherical polar coordinates this is

∇·∇h = (1/r²) ∂/∂r ( r² ∂h/∂r ) + (1/(r² sin(θ))) ∂/∂θ ( sin(θ) ∂h/∂θ ) + (1/(r² sin²(θ))) ∂²h/∂φ².    (1.131)

For spherical polar coordinates the volume element is r² sin(θ) dr dθ dφ.

1.3.8 Gradient systems

Now we use Cartesian coordinates x, y and the usual inner product. This inner product maps a form to a corresponding vector field. Thus the differential form

dh = (∂h/∂x) dx + (∂h/∂y) dy    (1.132)

determines a gradient vector field

∇h = (∂h/∂x) ∂/∂x + (∂h/∂y) ∂/∂y.    (1.133)

The corresponding differential equation is

dx/dt = h_x    (1.134)
dy/dt = h_y.    (1.135)

Theorem. Along a solution of a gradient system,

dh/dt = h_x² + h_y² ≥ 0.    (1.136)
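The monotonicity in (1.136) can be watched numerically. The sample function h below is made up; its gradient flow climbs toward the maximum of h at the origin:

```python
# Gradient system dx/dt = h_x, dy/dt = h_y for the sample function
# h = -(x^4 + y^4)/4, whose unique maximum is at the origin.
def h(x, y):
    return -(x**4 + y**4) / 4.0

x, y = 1.0, -0.8
dt = 0.001
values = [h(x, y)]
for _ in range(5000):
    hx, hy = -x**3, -y**3          # partial derivatives of h
    x, y = x + dt * hx, y + dt * hy   # explicit Euler step along the gradient
    values.append(h(x, y))

print(values[0], values[-1])   # h increases monotonically along the flow
```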
This is hill climbing. The picture is that at each point, the gradient vector field points uphill in the steepest direction.

Consider a point where dh = 0. Then the linearization of the equation is given by

du/dt = h_xx u + h_xy v    (1.137)
dv/dt = h_xy u + h_yy v,    (1.138)

where the partial derivatives are evaluated at the point. The matrix of coefficients is a real symmetric matrix, and therefore its eigenvalues are real.

1.3.9 Hamiltonian systems

Now we use Cartesian coordinates x, y, but now instead of a symmetric inner product we use a skew-symmetric form to map a form to a corresponding vector. Thus the differential form

dH = (∂H/∂x) dx + (∂H/∂y) dy    (1.139)

determines a Hamiltonian vector field

Ȟ = (∂H/∂y) ∂/∂x − (∂H/∂x) ∂/∂y.    (1.140)

The corresponding differential equation is

dx/dt = H_y    (1.141)
dy/dt = −H_x.    (1.142)

Theorem. (Conservation of Energy) Along a solution of a Hamiltonian system,

dH/dt = H_y H_x − H_x H_y = 0.    (1.143)

This is conservation of energy. The picture is that at each point the vector field points along one of the contour lines of H.
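Conservation of energy (1.143) can be checked numerically. The pendulum Hamiltonian below is a standard sample choice, not one taken from the text:

```python
import math

# Pendulum Hamiltonian H = y^2/2 - cos(x); the Hamiltonian vector field
# gives dx/dt = H_y = y, dy/dt = -H_x = -sin(x)  (eqs. 1.141-1.142).
def H(x, y):
    return 0.5 * y * y - math.cos(x)

def f(x, y):
    return y, -math.sin(x)

x, y = 1.0, 0.0
dt = 0.001
H0 = H(x, y)
for _ in range(10000):
    # classical fourth-order Runge-Kutta step
    k1 = f(x, y)
    k2 = f(x + dt/2*k1[0], y + dt/2*k1[1])
    k3 = f(x + dt/2*k2[0], y + dt/2*k2[1])
    k4 = f(x + dt*k3[0], y + dt*k3[1])
    x += dt/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
    y += dt/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1])

print(H0, H(x, y))   # the energy drift over the whole run is negligible
```

The trajectory traces out a contour line of H in the (x, y) plane, which is the picture described above.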
Consider a point where dH = 0. Then the linearization of the equation is given by

du/dt = H_xy u + H_yy v    (1.144)
dv/dt = −H_xx u − H_xy v,    (1.145)

where the partial derivatives are evaluated at the point. This is a special kind of matrix. The eigenvalues λ satisfy the equation

λ² = H_xy² − H_xx H_yy.    (1.146)

So they are either real with opposite sign, or they are complex conjugate pure imaginary. In the real case solutions linger near the equilibrium point. In the complex case solutions oscillate near the equilibrium point.

1.3.10 Contravariant and covariant

Here is a summary that may be useful. Contravariant objects include points and vectors (arrows). These are objects that live in the space. They also include oriented curves (whose tangent vectors are arrows). Thus a vector field (contravariant) determines solution curves of the associated system of differential equations (contravariant).

Covariant objects associate numbers to objects that live in the space. They include scalars, linear forms (elements of the dual space), and bilinear forms. They also include scalar functions and differential forms. Often covariant objects are described by their contours. The integral of a differential form (a covariant object) along an oriented curve (a contravariant object) is a number. The differential of a scalar function (covariant) is a differential form (covariant).

There are also mixed objects. A linear transformation is a mixed object, since it takes vectors as inputs (covariant), but it gives vectors as outputs (contravariant).

A bilinear form may be used to move from the contravariant world to the covariant world. The input is a vector, and the output is a linear form. If the bilinear form is nondegenerate, then there is an inverse that can take a linear form as an input and give a vector as an output. This occurs for instance when we take the differential of a scalar function (covariant) and use a symmetric bilinear form to get a gradient vector field (contravariant). It also occurs when we take the differential of a scalar function (covariant) and use an antisymmetric bilinear form to get a Hamiltonian vector field (contravariant).

1.3.11 Problems

1. A vector field that is nonzero at a point can be transformed into a constant vector field near that point. Pick a point away from the origin and find coordinates u and v near there so that

−y ∂/∂x + x ∂/∂y = ∂/∂u.    (1.147)

2. Pick a point away from the origin and find coordinates u and v so that

(−y/(x² + y²)) ∂/∂x + (x/(x² + y²)) ∂/∂y = ∂/∂u.    (1.148)

3. A differential form usually cannot be transformed into a constant differential form, but there are special circumstances when that can occur. Is it possible to find coordinates u and v near a given point (not the origin) so that

−y dx + x dy = du?    (1.149)
4. Is it possible to find coordinates u and v near a given point (not the origin) so that

(−y/(x² + y²)) dx + (x/(x² + y²)) dy = du?    (1.150)

5. Consider the vector field

L = x(4 − x − y) ∂/∂x + (x − 2)y ∂/∂y.    (1.151)

Find its zeros. At each zero, find the linearization. For each linearization, find the eigenvalues. Use this information to sketch the vector field.

6. Let h be a smooth function. Its gradient expressed with respect to Cartesian basis vectors ∂/∂x and ∂/∂y is

∇h = (∂h/∂x) ∂/∂x + (∂h/∂y) ∂/∂y.    (1.152)

Find the gradient expressed with respect to polar basis vectors ∂/∂r and ∂/∂θ.

7. Let H be a smooth function. Its Hamiltonian vector field expressed with respect to Cartesian basis vectors ∂/∂x and ∂/∂y is

Ȟ = (∂H/∂y) ∂/∂x − (∂H/∂x) ∂/∂y.    (1.153)

Find this same vector field expressed with respect to polar basis vectors ∂/∂r and ∂/∂θ.

8. Consider the vector field

L = (1 + x² + y²) y ∂/∂x − (1 + x² + y²) x ∂/∂y.    (1.154)

Find its linearization at 0. Show that there is no coordinate system near 0 in which the vector field L is expressed by its linearization. Hint: Solve the associated system of ordinary differential equations, both for L and for its linearization. Find the period of a solution in both cases.

9. Here is an example of a fixed point where the linear stability analysis gives an elliptic fixed point, but changing to polar coordinates shows the unstable nature of the fixed point:

dx/dt = −y + x(x² + y²)    (1.155)
dy/dt = x + y(x² + y²).    (1.156)
Change the vector field to the polar coordinate representation and solve the corresponding system of ordinary differential equations.
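A numerical sketch of what problem 9 asserts (this gives away the radial equation, so skip it if you want to work the problem yourself): in polar coordinates, r dr/dt = x dx/dt + y dy/dt = r⁴, so dr/dt = r³ and dθ/dt = 1, and the radius grows even though the linearization at 0 is a pure rotation.

```python
import math

# Exact radial solution of dr/dt = r^3: r(t) = r0 / sqrt(1 - 2 r0^2 t).
def r_exact(r0, t):
    return r0 / math.sqrt(1.0 - 2.0 * r0 * r0 * t)

# Cross-check by simple Euler integration of the full planar system.
x, y = 0.1, 0.0
dt = 1e-4
t = 0.0
for _ in range(200000):     # integrate up to t = 20
    r2 = x * x + y * y
    x, y = x + dt * (-y + x * r2), y + dt * (x + y * r2)
    t += dt

r_num = math.hypot(x, y)
print(r_num, r_exact(0.1, t))   # both exceed the initial radius 0.1
```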
Chapter 2
Fourier series
2.1 Orthonormal families
Let T be the circle parameterized by [0, 2π) or by [−π, π). Let f be a complex function on T that is integrable. The nth Fourier coefficient is

c_n = (1/2π) ∫₀^{2π} e^{−inx} f(x) dx.    (2.1)

The goal is to show that f has a representation as a Fourier series

f(x) = Σ_{n=−∞}^{∞} c_n e^{inx}.    (2.2)
There are two problems. One is to interpret the sense in which the series converges. The second is to show that it actually converges to f. This is a huge subject. However the simplest and most useful theory is in the context of Hilbert space.

Let L²(T) be the space of all (Borel measurable) functions such that

‖f‖² = (1/2π) ∫₀^{2π} |f(x)|² dx < ∞.    (2.3)

Then L²(T) is a Hilbert space with inner product

(f, g) = (1/2π) ∫₀^{2π} f̄(x) g(x) dx.    (2.4)

Let

φ_n(x) = exp(inx).    (2.5)

Then the φ_n form an orthonormal family in L²(T). Furthermore, we have the identity

‖f‖² = Σ_{|n|≤N} |c_n|² + ‖f − Σ_{|n|≤N} c_n φ_n‖².    (2.6)
In particular, we have Bessel's inequality

‖f‖² ≥ Σ_{|n|≤N} |c_n|².    (2.7)

This shows that

Σ_{n=−∞}^{∞} |c_n|² < ∞.    (2.8)

The space of sequences satisfying this condition is called ℓ². Thus we have proved the following theorem.

Theorem. If f is in L²(T), then its sequence of Fourier coefficients is in ℓ².
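A numerical illustration of Bessel's inequality (2.7), using the sample function f(x) = x on [−π, π), for which |c_n| = 1/|n| and ‖f‖² = π²/3:

```python
import cmath, math

# Fourier coefficients (2.1) of f(x) = x by the midpoint rule.
def c(n, m=4000):
    s = 0.0 + 0.0j
    for i in range(m):
        x = -math.pi + (i + 0.5) * (2 * math.pi / m)
        s += cmath.exp(-1j * n * x) * x
    return s * (2 * math.pi / m) / (2 * math.pi)

norm2 = math.pi ** 2 / 3          # ||f||^2 = (1/2pi) int_{-pi}^{pi} x^2 dx
partial = sum(abs(c(n)) ** 2 for n in range(-30, 31))
print(partial, norm2)             # partial sum is below, but close to, pi^2/3
```

The partial sums of |c_n|² increase toward ‖f‖² without exceeding it, which is exactly what (2.6) and (2.7) describe.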
2.2 L2 convergence
It is easy to see that Σ_{|n|≤N} c_n φ_n is a Cauchy sequence in L²(T) as N → ∞. It follows from the completeness of L²(T) that it converges in the L² sense to a sum g in L²(T). That is, we have

lim_{N→∞} ‖g − Σ_{|n|≤N} c_n φ_n‖² = 0.    (2.9)
The only remaining thing to show is that g = f. This, however, requires some additional ideas. One way to do it is to take r with 0 < r < 1 and look at the sum

Σ_{n=−∞}^{∞} r^{|n|} c_n e^{inx} = (1/2π) ∫₀^{2π} P_r(x − y) f(y) dy.    (2.10)

Here

P_r(x) = Σ_{n=−∞}^{∞} r^{|n|} e^{inx} = (1 − r²)/(1 − 2r cos(x) + r²).    (2.11)
The functions (1/2π) P_r(x) have the properties of an approximate delta function. Each such function is positive and has integral 1 over the periodic interval. Furthermore,

P_r(x) ≤ (1 − r²)/(2r(1 − cos(x))),    (2.12)

which approaches zero as r → 1 away from points where cos(x) = 1.

Theorem. If f is continuous on T, then

f(x) = lim_{r↑1} Σ_{n=−∞}^{∞} r^{|n|} c_n e^{inx},    (2.13)

and the convergence is uniform.
Proof: It is easy to compute that

Σ_{n=−∞}^{∞} r^{|n|} c_n e^{inx} = (1/2π) ∫₀^{2π} P_r(x − y) f(y) dy = (1/2π) ∫₀^{2π} P_r(z) f(x − z) dz.    (2.14)

Hence

f(x) − Σ_{n=−∞}^{∞} r^{|n|} c_n e^{inx} = (1/2π) ∫₀^{2π} P_r(z) (f(x) − f(x − z)) dz.    (2.15)

Thus

|f(x) − Σ_{n=−∞}^{∞} r^{|n|} c_n e^{inx}| = |(1/2π) ∫₀^{2π} P_r(z) (f(x) − f(x − z)) dz| ≤ (1/2π) ∫₀^{2π} P_r(z) |f(x) − f(x − z)| dz.    (2.16)

Since f is continuous on T, its absolute value is bounded by some constant M. Furthermore, since f is continuous on T, it is uniformly continuous on T. Consider arbitrary ε > 0. It follows from the uniform continuity that there exists δ > 0 such that for all x and all z with |z| < δ we have |f(x) − f(x − z)| < ε/2. Thus the right hand side is bounded by

(1/2π) ∫_{|z|<δ} P_r(z) |f(x) − f(x − z)| dz + (1/2π) ∫_{|z|≥δ} P_r(z) |f(x) − f(x − z)| dz ≤ ε/2 + (1 − r²)/(2r(1 − cos(δ))) · 2M.    (2.17)

Take r so close to 1 that the second term is also less than ε/2. Then the difference that is being estimated is less than ε for such r. This proves the result.

Theorem. If f is in L²(T), then
f = Σ_{n=−∞}^{∞} c_n φ_n    (2.18)

in the sense that

lim_{N→∞} ‖f − Σ_{|n|≤N} c_n φ_n‖₂ = 0.    (2.19)
Proof: Let f have Fourier coefficients c_n and let g = Σ_n c_n φ_n. Let ε > 0 be arbitrary. Let h be in C(T) with

‖f − h‖₂ < ε/2.    (2.20)

Suppose h has Fourier coefficients d_n. Then there exists r < 1 such that

‖h − Σ_n r^{|n|} d_n e^{inx}‖_∞ < ε/2.    (2.21)

It follows that

‖h − Σ_n r^{|n|} d_n e^{inx}‖₂ < ε/2.    (2.22)
Thus

‖f − Σ_n r^{|n|} d_n e^{inx}‖₂ < ε.    (2.23)

However, since g is the orthogonal projection of f onto the span of the φ_n, it is also the best approximation of f in the span of the φ_n. Thus

‖f − g‖₂ < ε.    (2.24)

Since ε > 0 is arbitrary, we can only conclude that f = g.

Theorem. If f is in L²(T), then

‖f‖² = Σ_{n=−∞}^{∞} |c_n|².    (2.25)

2.3 Absolute convergence

Define the function spaces

C(T) ⊂ L^∞(T) ⊂ L²(T) ⊂ L¹(T).    (2.26)

The space C(T) consists of continuous functions. The space L^∞(T) consists of all bounded functions; its norm ‖f‖_∞ is the smallest number M such that |f(x)| ≤ M (with the possible exception of a set of x of measure zero). The norms ‖f‖_∞ on the first two spaces are the same. The norm on L²(T) is given by ‖f‖₂² = (1/2π) ∫₀^{2π} |f(x)|² dx. The norm on L¹(T) is given by ‖f‖₁ = (1/2π) ∫₀^{2π} |f(x)| dx. Since the integral is a probability average, their relation is

‖f‖₁ ≤ ‖f‖₂ ≤ ‖f‖_∞.    (2.27)

Also define the sequence spaces

ℓ¹ ⊂ ℓ² ⊂ c₀ ⊂ ℓ^∞.    (2.28)

The norm on ℓ¹ is ‖c‖₁ = Σ_n |c_n|. The norm on ℓ² is given by ‖c‖₂² = Σ_n |c_n|². The norm ‖c‖_∞ on ℓ^∞ is the smallest M such that |c_n| ≤ M. The space c₀ consists of all sequences with limit 0 at infinity; the norms on the last two spaces are the same. The relation between these norms is

‖c‖_∞ ≤ ‖c‖₂ ≤ ‖c‖₁.    (2.29)

We have seen that the Fourier series theorem gives a perfect correspondence between L²(T) and ℓ². For the other spaces the situation is more complex.

Riemann-Lebesgue Lemma. If f is in L¹(T), then the Fourier coefficients of f are in c₀; that is, they approach 0 at infinity.

Proof: Each function in L²(T) has Fourier coefficients in ℓ², so each function in L²(T) has Fourier coefficients that vanish at infinity. The map from a function to its Fourier coefficients gives a continuous map from L¹(T) to ℓ^∞. However, every function in L¹(T) may be approximated arbitrarily closely in L¹(T) norm by a function in L²(T). Hence its coefficients may be approximated arbitrarily well in ℓ^∞ norm by coefficients that vanish at infinity. Therefore the coefficients vanish at infinity.

In summary, the Fourier coefficients of an integrable function are bounded (this is obvious) and approach zero (Riemann-Lebesgue lemma). That is, the map from a function to its Fourier coefficients gives a continuous map from L¹(T) to c₀.

Theorem. If f is in L²(T) and if f′ exists (in the sense that f is an integral of f′) and if f′ is also in L²(T), then the Fourier coefficients are in ℓ¹.

A sequence that is absolutely summable defines a Fourier series that converges absolutely and uniformly to a continuous function. The map from Fourier coefficients to functions gives a continuous map from ℓ¹ to C(T).

2.4 Pointwise convergence

This gives a very satisfactory picture of Fourier series: the Fourier coefficients determine the function. However there is one slightly unsatisfying point. Convergence in the L² sense does not imply convergence at a particular point. Of course, if the derivative is in L² then we have uniform convergence, and in particular convergence at each point. But what if the function is differentiable at one point but has discontinuities at other points? What can we say about convergence at that one point? Fortunately, we can find something about that case by a closer examination of the partial sums.

One looks at the partial sum

Σ_{|n|≤N} c_n e^{inx} = (1/2π) ∫₀^{2π} D_N(x − y) f(y) dy.    (2.30)

Here

D_N(x) = Σ_{|n|≤N} e^{inx} = sin((N + 1/2)x) / sin((1/2)x).    (2.31)

This Dirichlet kernel D_N(x) has at least some of the properties of an approximate delta function. The function (1/2π) D_N(x) does have integral 1. Unfortunately, it is not positive; instead it oscillates wildly for large N at points away from where sin(x/2) = 0.

Theorem. If for some x the function

d_x(z) = (f(x + z) − f(x)) / (2 sin(z/2))    (2.32)

is in L¹(T), then at that point

f(x) = Σ_{n=−∞}^{∞} c_n φ_n(x).    (2.33)

Note that if d_x(z) is continuous at z = 0, then its value at z = 0 is d_x(0) = f′(x). So the hypothesis of the theorem is a condition related to differentiability of f at the point x. The conclusion of the theorem is pointwise convergence of the Fourier series at that point. Since f may be discontinuous at other points, it is possible that this Fourier series is not absolutely convergent.

Proof: We have

f(x) − Σ_{|n|≤N} c_n e^{inx} = (1/2π) ∫₀^{2π} D_N(z) (f(x) − f(x − z)) dz.    (2.34)

We can write this as

f(x) − Σ_{|n|≤N} c_n e^{inx} = (1/2π) ∫₀^{2π} sin((N + 1/2)z) d_x(−z) dz.    (2.35)

This goes to zero as N → ∞, by the Riemann-Lebesgue lemma.

2.5 Problems

1. Let f(x) = x, defined for −π ≤ x < π. Find the Fourier coefficients c_n of f for all n in Z.

2. Find the L¹(T), L²(T), and L^∞(T) norms of f, and compare them.

3. Find the ℓ^∞, ℓ², and ℓ¹ norms of these Fourier coefficients, and compare them.

4. Use the equality of L² and ℓ² norms to compute

ζ(2) = Σ_{n=1}^{∞} 1/n².

5. Compare the L^∞ and ℓ¹ norms for this problem. Compare the ℓ^∞ and L¹ norms for this problem.

6. Use the pointwise convergence at x = π/2 to evaluate the infinite sum

Σ_{k=0}^{∞} (−1)^k / (2k + 1),

regarded as a limit of partial sums. Does this sum converge absolutely?

7. Let F(x) = (1/2) x², defined for −π ≤ x < π. Find the Fourier coefficients of this function.

8. Use the equality of L² and ℓ² norms to compute

ζ(4) = Σ_{n=1}^{∞} 1/n⁴.

9. Compare the L^∞ and ℓ¹ norms for this problem. Compare the ℓ^∞ and L¹ norms for this problem.

10. At which points x of T is F(x) continuous? Differentiable? At which points x of T is f(x) continuous? Differentiable? At which x does F′(x) = f(x)? Can the Fourier series of f(x) be obtained by differentiating the Fourier series of F(x) pointwise? (This last question can be answered by inspecting the explicit form of the Fourier series for the two problems.)
Chapter 3

Fourier transforms

3.1 Introduction

Let R be the line parameterized by x. Let f be a complex function on R that is integrable. The Fourier transform f̂ = Ff is

f̂(k) = ∫_{−∞}^{∞} e^{−ikx} f(x) dx.    (3.1)

It is a function on the (dual) real line R′ parameterized by k. The goal is to show that f has a representation as an inverse Fourier transform

f(x) = ∫_{−∞}^{∞} e^{ikx} f̂(k) dk/(2π).    (3.2)

There are two problems. One is to interpret the sense in which these integrals converge. The second is to show that the inversion formula actually holds.

The simplest and most useful theory is in the context of Hilbert space. Let L²(R) be the space of all (Borel measurable) complex functions such that

‖f‖₂² = ∫_{−∞}^{∞} |f(x)|² dx < ∞.    (3.3)

Then L²(R) is a Hilbert space with inner product

(f, g) = ∫_{−∞}^{∞} f̄(x) g(x) dx.    (3.4)

Let L²(R′) be the space of all (Borel measurable) complex functions such that

‖h‖₂² = ∫_{−∞}^{∞} |h(k)|² dk/(2π) < ∞.    (3.5)

Then L²(R′) is a Hilbert space with inner product

(h, u) = ∫_{−∞}^{∞} h̄(k) u(k) dk/(2π).    (3.6)
We shall see that the correspondence between f and f̂ is a unitary map from L²(R) onto L²(R′). So this theory is simple and powerful.

3.2 L1 theory

First, we need to develop the L¹ theory. The space L¹ is a Banach space. Its dual space is L^∞, the space of essentially bounded functions. An example of a function in the dual space is the exponential function φ_k(x) = e^{ikx}. The Fourier transform is then

f̂(k) = ⟨φ_k, f⟩ = ∫_{−∞}^{∞} φ̄_k(x) f(x) dx,    (3.7)

where φ_k is in L^∞ and f is in L¹.

Theorem. If f is in L¹(R), then its Fourier transform f̂ is in L^∞(R′) and satisfies ‖f̂‖_∞ ≤ ‖f‖₁. Furthermore, f̂ is in C₀(R′), the space of bounded continuous functions that vanish at infinity.

Theorem. If f, g are in L¹(R), then the convolution f ∗ g is another function in L¹(R) defined by

(f ∗ g)(x) = ∫_{−∞}^{∞} f(x − y) g(y) dy.    (3.8)

Theorem. If f, g are in L¹(R), then the Fourier transform of the convolution is the product of the Fourier transforms:

(f ∗ g)^(k) = f̂(k) ĝ(k).    (3.9)

Theorem. Let f*(x) = f̄(−x). Then the Fourier transform of f* is the complex conjugate of f̂.

Theorem. If f is in L¹ and is also continuous and bounded, we have the inversion formula in the form

f(x) = lim_{ε↓0} ∫_{−∞}^{∞} e^{ikx} δ̂_ε(k) f̂(k) dk/(2π),    (3.10)

where

δ̂_ε(k) = exp(−ε|k|).    (3.11)

Proof: The inverse Fourier transform of δ̂_ε is

δ_ε(x) = (1/π) ε/(x² + ε²).    (3.12)

It is easy to calculate that

∫_{−∞}^{∞} e^{ikx} δ̂_ε(k) f̂(k) dk/(2π) = (δ_ε ∗ f)(x).    (3.13)

However δ_ε is an approximate delta function. The result follows by taking ε → 0.
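The family δ_ε in (3.12) can be probed numerically (the Gaussian test function below is an arbitrary sample): each member integrates to 1, and the convolution value (δ_ε ∗ f)(0) approaches f(0) as ε → 0.

```python
import math

# delta_eps(x) = (1/pi) eps/(x^2 + eps^2), the approximate delta in (3.12).
def delta(eps, x):
    return eps / (math.pi * (x * x + eps * eps))

def integrate(g, a=-20.0, b=20.0, m=80000):
    # midpoint rule on a truncated line (the tails are negligible here)
    h = (b - a) / m
    return sum(g(a + (i + 0.5) * h) for i in range(m)) * h

f = lambda x: math.exp(-x * x)   # sample continuous bounded function
for eps in (1.0, 0.1, 0.01):
    mass = integrate(lambda x: delta(eps, x))
    smoothed = integrate(lambda x: delta(eps, x) * f(-x))   # (delta_eps * f)(0)
    print(eps, mass, smoothed)   # mass near 1; smoothed value approaches f(0) = 1
```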
3.3 L² theory

The space L² is its own dual space, and it is a Hilbert space. It is the setting for the most elegant and simple theory of the Fourier transform.

Lemma. If f is in L¹(R) and in L²(R), then f̂ is in L²(R′), and ‖f‖₂² = ‖f̂‖₂².

Proof. Let g = f* ∗ f. Then g is in L¹ and is continuous and bounded. Furthermore, the Fourier transform of g is |f̂(k)|². Thus

‖f‖₂² = g(0) = lim_{ε↓0} ∫_{−∞}^{∞} δ̂_ε(k) |f̂(k)|² dk/(2π) = ∫_{−∞}^{∞} |f̂(k)|² dk/(2π).   (3.14)

Theorem. Let f be in L²(R). For each a, let f_a = 1_{[−a,a]} f. Then f_a is in L¹(R) and in L²(R), and f_a → f in L²(R) as a → ∞. Furthermore, there exists f̂ in L²(R′) such that f̂_a → f̂ as a → ∞.

Explicitly, this says that the Fourier transform f̂(k) is characterized by

∫_{−∞}^{∞} | f̂(k) − ∫_{−a}^{a} e^{−ikx} f(x) dx |² dk/(2π) → 0   (3.15)

as a → ∞. This proves the following result.

Theorem. The Fourier transform F initially defined on L¹(R) ∩ L²(R) extends by continuity to F : L²(R) → L²(R′). Similarly, the inverse Fourier transform F* initially defined on L¹(R′) ∩ L²(R′) extends by continuity to F* : L²(R′) → L²(R). These are unitary operators that preserve L² norm and preserve inner product. Furthermore, F* is the inverse of F.

These arguments show that the Fourier transformation F : L²(R) → L²(R′) defined by F f = f̂ is well defined and preserves norm. It is easy to see from the fact that it preserves norm that it also preserves inner product: (F f, F g) = (f, g). Define the inverse Fourier transform F* in the same way. Again we can extend the inverse transformation to F* : L²(R′) → L²(R) so that it preserves norm and inner product. Now it is easy to check that (F* h, f) = (h, F f). Take h = F g. Then (F* F g, f) = (F g, F f) = (g, f). That is, F* F g = g. Similarly, one may show that F F* u = u. These equations show that F is unitary and that F* = F^{−1} is the inverse of F. In particular, if h is in L¹(R′) and in L²(R′), then F* h is in L²(R) and is given by the usual inverse Fourier transform formula.
3.4 Absolute convergence

We have seen that the Fourier transform gives a perfect correspondence between L²(R) and L²(R′). For the other spaces the situation is more complex.

The map from a function to its Fourier transform gives a continuous map from L¹(R) to part of C₀(R′). That is, the Fourier transform of an integrable function is continuous and bounded (this is obvious) and approaches zero at infinity (Riemann–Lebesgue lemma). Furthermore, the Fourier transform determines the function. The inverse Fourier transform gives a continuous map from L¹(R′) to C₀(R). This is also a one-to-one transformation.

One useful fact is that if f is in L¹(R) and g is in L²(R), then the convolution f ∗ g is in L²(R). Furthermore, (f ∗ g)ˆ(k) = f̂(k) ĝ(k) is the product of a bounded function with an L²(R′) function, and therefore is in L²(R′).

However the same pattern of the product of a bounded function with an L² function can arise in other ways. For instance, consider the translate f_a of a function f in L²(R) defined by f_a(x) = f(x − a). Then f̂_a(k) = exp(−ika) f̂(k). This is also the product of a bounded function with an L²(R′) function.

One can think of this last example as a limiting case of a convolution. Let δ_ε be an approximate δ function. Then (δ_ε)_a ∗ f has Fourier transform exp(−ika) δ̂_ε(k) f̂(k). Now let ε → 0. Then (δ_ε)_a ∗ f → f_a, while exp(−ika) δ̂_ε(k) f̂(k) → exp(−ika) f̂(k).

Theorem. If f is in L²(R) and if f′ exists (in the sense that f is an integral of f′) and if f′ is also in L²(R), then the Fourier transform f̂ is in L¹(R′). As a consequence f is in C₀(R).

Proof: Write f̂(k) = (1/√(1 + k²)) · √(1 + k²) f̂(k). Since f is in L²(R), it follows that f̂(k) is in L²(R′). Since f′ is in L²(R), it follows that k f̂(k) is in L²(R′). Hence √(1 + k²) f̂(k) is in L²(R′). Since 1/√(1 + k²) is also in L²(R′), it follows from the Schwarz inequality that f̂(k) is in L¹(R′).

3.5 Fourier transform pairs

There are some famous Fourier transforms. Fix σ > 0. Consider the Gaussian

g_σ(x) = (1/√(2πσ²)) exp(−x²/(2σ²)).   (3.16)

Its Fourier transform is

ĝ_σ(k) = exp(−σ²k²/2).   (3.17)

Proof: Define the Fourier transform ĝ_σ(k) by the usual formula. Check that

(d/dk + σ²k) ĝ_σ(k) = 0.   (3.18)

This proves that

ĝ_σ(k) = C exp(−σ²k²/2).   (3.19)

Now apply the equality of L² norms. This implies that C² = 1. By looking at the case k = 0 it becomes obvious that C = 1.
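The Gaussian pair above is easy to check numerically; the following sketch (an addition to the notes, with an arbitrary σ and grid) uses the convention f̂(k) = ∫ e^{−ikx} f(x) dx.

```python
import numpy as np

# The Gaussian of variance sigma^2 should have transform exp(-sigma^2 k^2 / 2).
sigma = 1.7
x = np.linspace(-30.0, 30.0, 12001)
dx = x[1] - x[0]
g = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

k = np.linspace(-3.0, 3.0, 25)
ghat = np.array([np.sum(np.exp(-1j * kk * x) * g) * dx for kk in k])
err = float(np.max(np.abs(ghat - np.exp(-sigma**2 * k**2 / 2))))
print(err)
```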
Let ε > 0. Introduce the Heaviside function H(k) that is 1 for k > 0 and 0 for k < 0. The two basic Fourier transform pairs are

f_ε(x) = 1/(x − iε)   (3.20)

with Fourier transform

f̂_ε(k) = 2πi H(−k) e^{εk},   (3.21)

and its complex conjugate

f̄_ε(x) = 1/(x + iε)   (3.22)

with Fourier transform

−2πi H(k) e^{−εk}.   (3.23)

These may be checked by computing the inverse Fourier transform. Notice that f_ε and its conjugate are not in L¹(R).

Take 1/π times the imaginary part. This gives the approximate delta function

δ_ε(x) = (1/π) ε/(x² + ε²)   (3.24)

with Fourier transform

δ̂_ε(k) = e^{−ε|k|}.   (3.25)

Take the real part. This gives the approximate principal value of 1/x function

p_ε(x) = x/(x² + ε²)   (3.26)

with Fourier transform

p̂_ε(k) = −πi [H(k) e^{−εk} − H(−k) e^{εk}].   (3.27)

3.6 Problems

1. Let f(x) = 1/(2a) for −a ≤ x ≤ a and be zero elsewhere. Find the L¹(R), L²(R), and L∞(R) norms of f, and compare them.

2. Find the Fourier transform of f.

3. Find the L∞(R′), L²(R′), and L¹(R′) norms of the Fourier transform, and compare them.

4. Compare the L∞(R′) and L¹(R) norms for this problem. Compare the L∞(R) and L¹(R′) norms for this problem.

5. Use the pointwise convergence at x = 0 to evaluate an improper integral.

6. Calculate the convolution of f with itself.

7. Find the Fourier transform of the convolution of f with itself. Verify in this case that the Fourier transform of the convolution is the product of the Fourier transforms.
3.7 Poisson summation formula

Theorem. Let f be in L¹(R) with f̂ in L¹(R′) and such that Σ_k |f̂(k)| < ∞ (sum over integers k). Then

2π Σ_n f(2πn) = Σ_k f̂(k).   (3.28)

Proof: Let

S(t) = Σ_n f(2πn + t).   (3.29)

Since S(t) is 2π periodic, we can expand

S(t) = Σ_k a_k e^{ikt}.   (3.30)

It is easy to compute that

a_k = (1/2π) ∫_0^{2π} S(t) e^{−ikt} dt = (1/2π) f̂(k).   (3.31)

So the Fourier series of S(t) is absolutely summable. In particular

S(0) = Σ_k a_k.   (3.32)
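The summation formula can be sampled numerically (an added illustration): for the Gaussian f(x) = e^{−x²/2}, whose transform is √(2π) e^{−k²/2}, both sides converge so fast that a short truncated sum suffices.

```python
import numpy as np

# Both sides of 2*pi * sum_n f(2*pi*n) = sum_k fhat(k) for a Gaussian.
n = np.arange(-50, 51)
lhs = 2 * np.pi * np.sum(np.exp(-(2 * np.pi * n)**2 / 2))
rhs = np.sqrt(2 * np.pi) * np.sum(np.exp(-n**2 / 2))
print(lhs, rhs)
```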
3.8 Problems

1. In this problem the Fourier transform is

f̂(k) = ∫_{−∞}^{∞} e^{−ixk} f(x) dx

and the inverse Fourier transform is

f(x) = ∫_{−∞}^{∞} e^{ixk} f̂(k) dk/(2π).

These provide an isomorphism between the Hilbert spaces L²(R, dx) and L²(R′, dk/(2π)). The norm of f in the first space is equal to the norm of f̂ in the second space.

We will be interested in the situation where the Fourier transform is band-limited, that is, only waves with |k| ≤ a have nonzero amplitude. Make the assumption that |k| > a implies f̂(k) = 0. That is, the Fourier transform of f vanishes outside of the interval [−a, a]. Let

g(x) = sin(ax)/(ax).

The problem is to prove that

f(x) = Σ_{m=−∞}^{∞} f(mπ/a) g(x − mπ/a).

This says that if you know f at multiples of π/a, then you know f at all points.

Hint: Let g_m(x) = g(x − mπ/a). The task is to prove that f(x) = Σ_m c_m g_m(x) with c_m = f(mπ/a). It helps to use the Fourier transform of these functions. First prove that the Fourier transform of g(x) is given by ĝ(k) = π/a for |k| ≤ a and ĝ(k) = 0 for |k| > a. (Actually, it may be easier to deal with the inverse Fourier transform.) Then prove that ĝ_m(k) = exp(−imπk/a) ĝ(k). Finally, note that the functions ĝ_m(k) are orthogonal.

2. In the theory of neural networks one wants to synthesize an arbitrary function from linear combinations of translates of a fixed function. Let f be a function in L²(R). Suppose that the Fourier transform f̂(k) ≠ 0 for all k. Define the translate f_a by f_a(x) = f(x − a). The task is to show that the set of all linear combinations of the functions f_a, where a ranges over all real numbers, is dense in L²(R). Hint: It is sufficient to show that if g is in L²(R) with (g, f_a) = 0 for all a, then g = 0. (Why is this sufficient?) This can be done using Fourier analysis.

3.9 PDE Problems

1. Consider the initial value problem for the reaction-diffusion equation

∂u/∂t = ∂²u/∂x² + u − u².

We are interested in solutions u with 0 ≤ u ≤ 1. Find the constant solutions of this equation. Find the linear equations that are the linearizations about each of these constant solutions.

2. Find the dispersion relations for each of these linear equations. Find the range of wave numbers (if any) that are responsible for any instability of these linear equations.

3. Let z = x − ct and s = t. Find ∂/∂x and ∂/∂t in terms of ∂/∂z and ∂/∂s. Write the partial differential equation in these new variables. A traveling wave solution is a solution u for which ∂u/∂s = 0. Write the ordinary differential equation for a traveling wave solution (in terms of du/dz).
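The dispersion relation asked for in the PDE problems can be probed numerically (this sketch is an addition to the notes; the grid and step sizes are ad hoc). The linearization about u = 0 is v_t = v_xx + v, so a mode e^{ikx} should grow at rate 1 − k², unstable exactly for |k| < 1.

```python
import numpy as np

# Time-step the linearized equation v_t = v_xx + v on a periodic grid
# and measure the growth rate of a single cosine mode.
L = 16 * np.pi
N = 512
x = np.linspace(0.0, L, N, endpoint=False)
dx = x[1] - x[0]
dt = 0.2 * dx**2                      # stable explicit Euler step

def growth_rate(k, T=1.0):
    v = np.cos(k * x)                 # k chosen so the mode fits the period
    steps = int(T / dt)
    for _ in range(steps):
        lap = (np.roll(v, 1) - 2 * v + np.roll(v, -1)) / dx**2
        v = v + dt * (lap + v)
    return np.log(np.max(np.abs(v))) / (steps * dt)

rates = {k: growth_rate(k) for k in [0.5, 2.0]}
for k, r in rates.items():
    print(k, r, 1 - k**2)
```

The measured rates track 1 − k²: growth for k = 0.5, decay for k = 2.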
4. Write this ordinary differential equation as a first order system. Take c > 0. Find the fixed points and classify them.

5. Look for a traveling wave solution that goes from 1 at z = −∞ to 0 at z = +∞. For which values of c are there solutions that remain in the interval from 0 to 1?
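A numerical phase-plane experiment (an addition, not a solution key) suggests the answer to problem 5. The traveling wave ODE is u″ + c u′ + u − u² = 0, that is, the system u′ = v, v′ = −c v − u + u²; shooting off the unstable manifold of the fixed point (1, 0), the trajectory stays in [0, 1] for the larger c below but oscillates below 0 for the smaller one (the step size, integration length, and the two sample values of c are arbitrary choices).

```python
import numpy as np

def front_min(c, dz=0.005, zmax=80.0):
    # Shoot from just off (1, 0) along the unstable eigenvector and
    # record the minimum of u along the orbit (classical RK4).
    lam = (-c + np.sqrt(c**2 + 4)) / 2        # unstable eigenvalue at (1, 0)
    u, v = 1.0 - 1e-6, -1e-6 * lam            # start heading toward u < 1
    umin = u
    def rhs(u, v):
        return v, -c * v - u + u * u
    for _ in range(int(zmax / dz)):
        k1u, k1v = rhs(u, v)
        k2u, k2v = rhs(u + dz/2*k1u, v + dz/2*k1v)
        k3u, k3v = rhs(u + dz/2*k2u, v + dz/2*k2v)
        k4u, k4v = rhs(u + dz*k3u, v + dz*k3v)
        u += dz/6 * (k1u + 2*k2u + 2*k3u + k4u)
        v += dz/6 * (k1v + 2*k2v + 2*k3v + k4v)
        umin = min(umin, u)
    return umin

m_fast = front_min(2.5)   # monotone front, stays in [0, 1]
m_slow = front_min(1.0)   # spiral at the origin, dips below 0
print(m_fast, m_slow)
```

This matches the classification of the origin in problem 4: it is a node for c ≥ 2 and a spiral for c < 2.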
Chapter 4

Complex integration

4.1 Complex number quiz

1. Simplify 1/(3 + 4i).

2. Simplify |1/(3 + 4i)|.

3. Find the cube roots of 1.

4. Here are some identities for complex conjugate. Which ones need correction? The conjugate of z + w is z̄ + w̄; of z − w is z̄ − w̄; of zw is z̄ w̄; of z/w is z̄/w̄. Make suitable corrections, perhaps changing an equality to an inequality.

5. Here are some identities for absolute value. Which ones need correction? |z + w| = |z| + |w|, |z − w| = |z| − |w|, |zw| = |z| |w|, |z/w| = |z|/|w|. Make suitable corrections, perhaps changing an equality to an inequality.

6. Define log(z) so that −π < Im log(z) ≤ π. Discuss the identities e^{log(z)} = z and log(e^w) = w.

7. Fix w. How many solutions are there of cos(z) = w with −π < Re z ≤ π?

8. Define z^w = e^{w log z}. Find i^i.

9. What is the power series of log(1 + z) about z = 0? What is the radius of convergence of this power series?

10. What is the power series of cos(z) about z = 0? What is its radius of convergence?
4.2 Complex functions

4.2.1 Closed and exact forms

In the following a region will refer to an open subset of the plane. A differential form p dx + q dy is said to be closed in a region R if throughout the region

∂q/∂x = ∂p/∂y.   (4.1)

It is said to be exact in a region R if there is a function h defined on the region with

dh = p dx + q dy.   (4.2)

Theorem. An exact form is closed.

The converse is not true. Consider, for instance, the plane minus the origin. The form (−y dx + x dy)/(x² + y²) is not exact in this region. It is, however, exact in the plane minus the negative axis. In this region

(−y dx + x dy)/(x² + y²) = dθ,   (4.3)

where −π < θ < π.

Green's theorem. If S is a bounded region with oriented boundary ∂S, then

∫_{∂S} (p dx + q dy) = ∫∫_S (∂q/∂x − ∂p/∂y) dx dy.   (4.4)

Consider a region R and an oriented curve C in R. Then C ∼ 0 (C is homologous to 0) in R means that there is a bounded region S such that S and its oriented boundary ∂S are contained in R with ∂S = C.

Corollary. If p dx + q dy is closed in some region R, and if C ∼ 0 in R, then

∫_C (p dx + q dy) = 0.   (4.5)

If C is an oriented curve, then −C is the oriented curve with the opposite orientation. The sum C₁ + C₂ of two oriented curves is obtained by following one curve and then the other. The difference C₁ − C₂ is defined by following one curve and then the other in the reverse direction.

Consider a region R and two oriented curves C₁ and C₂ in R. Then C₁ ∼ C₂ (C₁ is homologous to C₂) in R means that C₁ − C₂ ∼ 0 in R.

Corollary. If p dx + q dy is closed in some region R, and if C₁ ∼ C₂ in R, then

∫_{C₁} (p dx + q dy) = ∫_{C₂} (p dx + q dy).   (4.6)
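The closed-but-not-exact example above can be seen numerically (an added sketch; the loops are arbitrary): the integral of (−y dx + x dy)/(x² + y²) around a circle enclosing the origin is 2π, while around a circle avoiding the origin it is 0, which is impossible for an exact form.

```python
import numpy as np

n = 20000
t = np.arange(n) * 2 * np.pi / n
dt = 2 * np.pi / n

def loop_integral(cx, cy, r):
    # integrate (-y dx + x dy)/(x^2 + y^2) around the circle of radius r
    # centered at (cx, cy), parameterized by t
    x = cx + r * np.cos(t)
    y = cy + r * np.sin(t)
    dxdt = -r * np.sin(t)
    dydt = r * np.cos(t)
    return float(np.sum((-y * dxdt + x * dydt) / (x**2 + y**2)) * dt)

I1 = loop_integral(0.0, 0.0, 1.0)   # encircles the origin
I2 = loop_integral(3.0, 0.0, 1.0)   # does not
print(I1, I2)
```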
4.2.2 Cauchy-Riemann equations

Write z = x + iy. Define partial differential operators

∂/∂z = (1/2)(∂/∂x + (1/i) ∂/∂y)   (4.7)

and

∂/∂z̄ = (1/2)(∂/∂x − (1/i) ∂/∂y).   (4.8)

The justification for this definition is the following. Every polynomial in x, y may be written as a polynomial in z, z̄, and conversely. Then for each term in such a polynomial

∂/∂z (z^m z̄^n) = m z^{m−1} z̄^n   (4.9)

and

∂/∂z̄ (z^m z̄^n) = z^m n z̄^{n−1}.   (4.10)

Let w = u + iv be a function f(z) of z = x + iy. Suppose that this satisfies the system of partial differential equations

∂w/∂z̄ = 0.   (4.11)

In this case we say that f(z) is an analytic function of z in this region. Explicitly

∂(u + iv)/∂x − ∂(u + iv)/∂(iy) = 0.   (4.12)

This gives the Cauchy-Riemann equations

∂u/∂x = ∂v/∂y   (4.13)

and

∂v/∂x = −∂u/∂y.   (4.14)

4.2.3 The Cauchy integral theorem

Consider an analytic function w = f(z) and the differential form

w dz = f(z) dz = (u + iv)(dx + i dy) = (u dx − v dy) + i(v dx + u dy).   (4.15)

According to the Cauchy-Riemann equations, this is a closed form.

Theorem (Cauchy integral theorem). If f(z) is analytic in a region R, and if C ∼ 0 in R, then

∫_C f(z) dz = 0.   (4.16)
Example: Consider the differential form z^m dz for integer m ≠ −1. When m ≥ 0 this is defined in the entire complex plane; when m < 0 it is defined in the punctured plane (the plane with 0 removed). It is exact, since

z^m dz = (1/(m + 1)) d z^{m+1}.   (4.17)

On the other hand, the differential form dz/z is closed but not exact in the punctured plane.

4.2.4 Polar representation

The exponential function is defined by

exp(z) = e^z = Σ_{n=0}^{∞} z^n/n!.   (4.18)

It is easy to check that

e^{x+iy} = e^x e^{iy} = e^x (cos(y) + i sin(y)).   (4.19)

Sometimes it is useful to represent a complex number in the polar representation

z = x + iy = r(cos(θ) + i sin(θ)).   (4.20)

This can also be written

z = r e^{iθ}.   (4.21)

From this we derive

dz = dx + i dy = dr e^{iθ} + r i e^{iθ} dθ.   (4.22)

This may also be written

dz/z = dr/r + i dθ.   (4.23)

Notice that this does not say that dz/z is exact in the punctured plane. The reason is that the angle θ is not defined in this region. However dz/z is exact in a cut plane, that is, a plane that excludes some line running from the origin to infinity.

Let C(0) be a circle of radius r centered at 0. We conclude that

∫_{C(0)} f(z) dz = ∫_0^{2π} f(z) z i dθ.   (4.24)

In particular,

∫_{C(0)} (1/z) dz = ∫_0^{2π} i dθ = 2πi.   (4.25)

By a change of variable, we conclude that for a circle C(z) of radius r centered at z we have

∫_{C(z)} 1/(ξ − z) dξ = 2πi.   (4.26)
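On the circle the substitution dz = i z dθ turns these contour integrals into sums over θ, which makes them easy to check numerically (an added illustration; the radius and grid size are arbitrary). The integral of z^m dz around C(0) is 2πi for m = −1 and 0 for the other integers m, consistent with the exactness of z^m dz for m ≠ −1.

```python
import numpy as np

r = 2.0
n = 4096
theta = np.arange(n) * 2 * np.pi / n
z = r * np.exp(1j * theta)
dz = 1j * z * (2 * np.pi / n)      # dz = i z d(theta) on the circle

vals = {}
for m in [-3, -2, -1, 0, 1, 2]:
    vals[m] = np.sum(z**m * dz)
    print(m, vals[m])
```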
4.2.5 Branch cuts

Remember that

dz/z = dr/r + i dθ   (4.27)

is exact in a cut plane. Therefore

dz/z = d log(z)   (4.28)

in a cut plane, where

log(z) = log(r) + iθ.   (4.29)

Two convenient choices are 0 < θ < 2π (cut along the positive axis) and −π < θ < π (cut along the negative axis).

In the same way one can define such functions as

√z = exp((1/2) log(z)).   (4.30)

Again one must make a convention about the cut.

4.3 Complex integration and residue calculus

4.3.1 The Cauchy integral formula

Theorem. (Cauchy integral formula) Let f(ξ) be analytic in a region R. Let C ∼ 0 in R, so that C = ∂S, where S is a bounded region contained in R. Let z be a point in S. Then

f(z) = (1/(2πi)) ∫_C f(ξ)/(ξ − z) dξ.   (4.31)

Proof: Let Cδ(z) be a small circle about z. Let R′ be the region R with the point z removed. Then C ∼ Cδ(z) in R′. It follows that

(1/(2πi)) ∫_C f(ξ)/(ξ − z) dξ = (1/(2πi)) ∫_{Cδ(z)} f(ξ)/(ξ − z) dξ.   (4.32)

It follows that

(1/(2πi)) ∫_C f(ξ)/(ξ − z) dξ − f(z) = (1/(2πi)) ∫_{Cδ(z)} (f(ξ) − f(z))/(ξ − z) dξ.   (4.33)

Consider an arbitrary ε > 0. The function f(ξ) is continuous at ξ = z. Therefore there is a δ so small that for ξ on Cδ(z) the absolute value |f(ξ) − f(z)| ≤ ε. Then the integral on the right hand side has absolute value bounded by

(1/(2π)) ∫_0^{2π} (ε/δ) δ dθ = ε.   (4.34)

Therefore the left hand side has absolute value bounded by ε. Since ε is arbitrary, the left hand side is zero.
4.3.2 The residue calculus

Say that f(z) has an isolated singularity at z₀. Let Cδ(z₀) be a circle about z₀ that contains no other singularity. Then the residue of f(z) at z₀ is the integral

res(z₀) = (1/(2πi)) ∫_{Cδ(z₀)} f(z) dz.   (4.35)

Theorem. (Residue Theorem) Say that C ∼ 0 in R, so that C = ∂S with the bounded region S contained in R. Suppose that f(z) is analytic in R except for isolated singularities z₁, …, z_k in S. Then

∫_C f(z) dz = 2πi Σ_{j=1}^{k} res(z_j).   (4.36)

Proof: Let R′ be R with the singularities omitted. Consider small circles C₁, …, C_k around these singularities such that C ∼ C₁ + ⋯ + C_k in R′. Apply the Cauchy integral theorem to C − C₁ − ⋯ − C_k.

If f(z) = g(z)/(z − z₀) with g(z) analytic near z₀ and g(z₀) ≠ 0, then f(z) is said to have a pole of order 1 at z₀.

Theorem. If f(z) = g(z)/(z − z₀) has a pole of order 1, then its residue at that pole is

res(z₀) = g(z₀) = lim_{z→z₀} (z − z₀) f(z).   (4.37)

Proof. By the Cauchy integral formula, for each sufficiently small circle C about z₀ the function g(z) satisfies

g(z₀) = (1/(2πi)) ∫_C g(z)/(z − z₀) dz = (1/(2πi)) ∫_C f(z) dz.   (4.38)

This is the residue.

4.3.3 Estimates

Recall that for complex numbers we have |zw| = |z| |w| and |z/w| = |z|/|w|. Furthermore, we have

| |z| − |w| | ≤ |z ± w| ≤ |z| + |w|.   (4.39)

When |z| > |w| this allows us to estimate

1/|z ± w| ≤ 1/(|z| − |w|).   (4.40)

Finally, we have

|e^z| = e^{Re z}.   (4.41)
4.3.4 A residue calculation

Consider the task of computing the integral

∫_{−∞}^{∞} e^{−ikx} 1/(x² + 1) dx,   (4.42)

where k is real. This is the Fourier transform of a function that is in L² and also in L¹. The idea is to use the analytic function

f(z) = e^{−ikz} 1/(z² + 1).   (4.43)

The first thing is to analyze the singularities of this function. There are poles at z = ±i. Furthermore, there is an essential singularity at z = ∞.

First look at the case when k ≤ 0. The essential singularity of the exponential function has the remarkable feature that for Im z ≥ 0 the absolute value of e^{−ikz} is bounded by one. This suggests looking at a closed oriented curve C_a in the upper half plane. Take C_a to run along the x axis from −a to a and then along the semicircle z = a e^{iθ} from θ = 0 to θ = π. If a > 1 there is a singularity at z = i inside the curve. So the residue is the value of g(z) = e^{−ikz}/(z + i) at z = i, that is, g(i) = e^k/(2i). By the residue theorem

∫_{C_a} e^{−ikz} 1/(z² + 1) dz = π e^k   (4.44)

for each a > 1.

Now let a → ∞. The contribution from the semicircle is bounded by

∫_0^π 1/(a² − 1) · a dθ = π a/(a² − 1).   (4.45)

We conclude that for k ≤ 0

∫_{−∞}^{∞} e^{−ikx} 1/(x² + 1) dx = π e^k.   (4.46)

Next look at the case when k ≥ 0. In this case we could look at an oriented curve in the lower half plane. The integral runs from a to −a and then around a semicircle in the lower half plane. The residue at z = −i is e^{−k}/(−2i). By the residue theorem, the integral over this curve is −π e^{−k}; since the curve traverses the real axis from a to −a, the integral from −∞ to ∞ is π e^{−k}. We conclude that for all real k

∫_{−∞}^{∞} e^{−ikx} 1/(x² + 1) dx = π e^{−|k|}.   (4.47)
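A direct numerical check (an addition, not part of the notes) confirms the value π e^{−|k|}; the truncation at |x| = 2000 and the step size are ad hoc and leave an error of order 1/1000.

```python
import numpy as np

x = np.linspace(-2000.0, 2000.0, 2000001)
dx = x[1] - x[0]
results = {}
for k in [-2.0, 0.0, 1.5]:
    val = np.sum(np.exp(-1j * k * x) / (x**2 + 1)) * dx
    results[k] = val.real
    print(k, results[k], np.pi * np.exp(-abs(k)))
```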
4.4 Problems

1. Evaluate

∫_0^∞ 1/(√x (4 + x²)) dx

by contour integration. Show all steps, including estimation of integrals that vanish in the limit of large contours.

2. In the following problems f(z) is analytic in some region. We say that f(z) has a root of multiplicity m at z₀ if f(z) = (z − z₀)^m h(z), where h(z) is analytic with h(z₀) ≠ 0. Find the residue of f′(z)/f(z) at such a z₀.

3. Say that f(z) has several roots inside the contour C. Evaluate

(1/(2πi)) ∫_C f′(z)/f(z) dz.

4. Say that f(z) = a₀ + a₁z + a₂z² + ⋯ + a_n z^n is a polynomial. Furthermore, suppose that C is a contour surrounding the origin on which

|a_k z^k| > |f(z) − a_k z^k|.

Show that on this contour

f(z) = a_k z^k g(z),

where |g(z) − 1| < 1 on the contour. Use the result of the previous problem to show that the number of roots (counting multiplicity) inside C is k.

5. Find the number of roots (counting multiplicity) of z⁶ + 3z⁵ + 1 inside the unit circle.

4.5 More residue calculus

4.5.1 Jordan's lemma

Jordan's lemma says that for b > 0 we have

(1/π) ∫_0^π e^{−b sin(θ)} dθ ≤ 1/b.   (4.48)

To prove it, it is sufficient to estimate twice the integral over the interval from 0 to π/2. On this interval use the inequality (2/π)θ ≤ sin(θ). This gives

(2/π) ∫_0^{π/2} e^{−2bθ/π} dθ = (1/b)(1 − e^{−b}) ≤ 1/b.   (4.49)
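The inequality in Jordan's lemma is easy to test numerically (an added sketch; the sample values of b are arbitrary), approximating the average over [0, π] by a midpoint rule.

```python
import numpy as np

# (1/pi) * int_0^pi e^{-b sin(theta)} d(theta)  versus the bound 1/b
n = 200000
theta = (np.arange(n) + 0.5) * np.pi / n
bounds = {}
for b in [0.5, 1.0, 5.0, 50.0]:
    bounds[b] = float(np.mean(np.exp(-b * np.sin(theta))))
    print(b, bounds[b], 1.0 / b)
```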
4.5.2 A more delicate residue calculation

Consider the task of computing the integral

∫_{−∞}^{∞} e^{−ikx} 1/(x − i) dx,   (4.50)

where k is real. This is the Fourier transform of a function that is in L² but not in L¹. The idea is to use the analytic function

f(z) = e^{−ikz} 1/(z − i).   (4.51)

The first thing is to analyze the singularities of this function. There is a pole at z = i. Furthermore, there is an essential singularity at z = ∞.

First look at the case when k < 0. Take C_a to run along the x axis from −a to a and then along the semicircle z = a e^{iθ} in the upper half plane from θ = 0 to θ = π. If a > 1 there is a singularity at z = i inside the curve. So the residue is the value of g(z) = e^{−ikz} at z = i, that is, g(i) = e^k. By the residue theorem

∫_{C_a} e^{−ikz} 1/(z − i) dz = 2πi e^k   (4.52)

for each a > 1.

Now let a → ∞. The contribution from the semicircle is bounded using Jordan's lemma:

∫_0^π e^{ka sin(θ)} (1/(a − 1)) a dθ ≤ π (1/(−ka)) (a/(a − 1)).   (4.53)

We conclude that for k < 0

∫_{−∞}^{∞} e^{−ikx} 1/(x − i) dx = 2πi e^k.   (4.54)

Next look at the case when k > 0. In this case we could look at an oriented curve in the lower half plane. The integral runs from a to −a and then around a semicircle in the lower half plane. The residue is zero. We conclude using Jordan's lemma that for k > 0

∫_{−∞}^{∞} e^{−ikx} 1/(x − i) dx = 0.   (4.55)

4.5.3 Cauchy formula for derivatives

Theorem. (Cauchy formula for derivatives) Let f(ξ) be analytic in a region R including a point z. Let C be an oriented curve in R such that for each sufficiently small circle C(z) about z, C ∼ C(z) in R. Then the mth derivative satisfies

(1/m!) f^{(m)}(z) = (1/(2πi)) ∫_C f(ξ)/(ξ − z)^{m+1} dξ.   (4.56)

Proof: Differentiate the Cauchy integral formula with respect to z a total of m times.
4.5.4 Poles of higher order

If f(z) = g(z)/(z − z₀)^m with g(z) analytic near z₀ and g(z₀) ≠ 0 and m ≥ 1, then f(z) is said to have a pole of order m at z₀.

Theorem. If f(z) = g(z)/(z − z₀)^m has a pole of order m, then its residue at that pole is

res(z₀) = g^{(m−1)}(z₀)/(m − 1)!.   (4.57)

Proof. By the Cauchy formula for derivatives, for each sufficiently small circle C about z₀ the (m − 1)th derivative satisfies

(1/(m − 1)!) g^{(m−1)}(z₀) = (1/(2πi)) ∫_C g(z)/(z − z₀)^m dz = (1/(2πi)) ∫_C f(z) dz.   (4.58)

The expression given in the theorem is evaluated in practice by using the fact that g(z) = (z − z₀)^m f(z) for z near z₀, performing the differentiation, and then setting z = z₀. This is routine, but it can be tedious.

4.5.5 A residue calculation with a double pole

Consider the task of computing the integral

∫_{−∞}^{∞} e^{−ikx} 1/(x² + 1)² dx,   (4.59)

where k is real. This is the Fourier transform of a function that is in L² and also in L¹. The idea is to use the analytic function

f(z) = e^{−ikz} 1/(z² + 1)².   (4.60)

The first thing is to analyze the singularities of this function. There are poles at z = ±i, each of order 2. Furthermore, there is an essential singularity at z = ∞.

First look at the case when k ≤ 0. Consider a closed oriented curve C_a in the upper half plane. Take C_a to run along the x axis from −a to a and then along the semicircle z = a e^{iθ} from θ = 0 to θ = π. If a > 1 there is a singularity at z = i inside the curve. So the residue is calculated by letting g(z) = e^{−ikz}/(z + i)², taking the derivative, and evaluating at z = i, that is, g′(i) = (1 − k) e^k/(4i). By the residue theorem

∫_{C_a} e^{−ikz} 1/(z² + 1)² dz = (1/2) π (1 − k) e^k   (4.61)

for each a > 1.

Now let a → ∞. The contribution from the semicircle vanishes in that limit. We conclude that for k ≤ 0

∫_{−∞}^{∞} e^{−ikx} 1/(x² + 1)² dx = (1/2) π (1 − k) e^k.   (4.62)
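The k ≤ 0 value just derived can be confirmed numerically (an added check, with an ad hoc truncation; the integrand decays like 1/x⁴, so a modest window suffices).

```python
import numpy as np

# For k <= 0 the integral should equal (1/2) * pi * (1 - k) * e^k.
x = np.linspace(-200.0, 200.0, 400001)
dx = x[1] - x[0]
results = {}
for k in [-2.0, -0.5, 0.0]:
    val = np.sum(np.exp(-1j * k * x) / (x**2 + 1)**2) * dx
    results[k] = val.real
    print(k, results[k], 0.5 * np.pi * (1 - k) * np.exp(k))
```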
Next look at the case when k ≥ 0. In this case we could look at an oriented curve in the lower half plane. The integral runs from a to −a and then around a semicircle in the lower half plane. The residue at z = −i is −(1 + k) e^{−k}/(4i). By the residue theorem, the integral over this curve is −π(1 + k) e^{−k}/2. We conclude that for all real k

∫_{−∞}^{∞} e^{−ikx} 1/(x² + 1)² dx = (1/2) π (1 + |k|) e^{−|k|}.   (4.63)

4.6 The Taylor expansion

4.6.1 Radius of convergence

Theorem. Let f(z) be analytic in a region R including a point z₀. Let C(z₀) be a circle centered at z₀ such that C(z₀) and its interior are in R. Then for z in the interior of C(z₀)

f(z) = Σ_{m=0}^{∞} (1/m!) f^{(m)}(z₀) (z − z₀)^m.   (4.64)

Proof. For each fixed ξ the function of z given by

1/(ξ − z) = 1/(ξ − z₀ + z₀ − z) = (1/(ξ − z₀)) · 1/(1 − (z − z₀)/(ξ − z₀)) = (1/(ξ − z₀)) Σ_{m=0}^{∞} ((z − z₀)/(ξ − z₀))^m   (4.65)

has a geometric series expansion. Multiply by (1/(2πi)) f(ξ) dξ and integrate around C(z₀). On the left hand side apply the Cauchy integral formula to get f(z). In each term in the expansion on the right hand side apply the Cauchy formula for derivatives in the form

(1/(2πi)) ∫_{C(z₀)} f(ξ)/(ξ − z₀)^{m+1} dξ = (1/m!) f^{(m)}(z₀).   (4.66)

This theorem is remarkable because it shows that the condition of analyticity implies that the Taylor series always converges. Furthermore, take the radius of the circle C(z₀) as large as possible. The only constraint is that there must be a function that is analytic inside the circle and that extends f(z). Thus one must avoid the singularity of this extension of f(z) that is closest to z₀. This explains why the radius of convergence of the series is the distance from z₀ to this nearest singularity.

The reason that we talk of an analytic extension is that artificialities in the definition of the function, such as branch cuts, should not matter. On the other hand, singularities such as poles, essential singularities, and branch points are intrinsic. The radius of convergence is the distance to the nearest such intrinsic singularity.

If one knows an analytic function near some point, then one knows it all the way out to the radius of convergence of the Taylor series about that point.
But then for each point in this larger region there is another Taylor series. The function is then defined out to the radius of convergence associated with that point. The process may be carried out over a larger and larger region, until blocked by some intrinsic singularity. It is known as analytic continuation. If the analytic continuation process begins at some point and winds around a branch point, then it may lead to a new definition of the analytic function at the original point.

4.6.2 Riemann surfaces

When an analytic function has a branch cut, it is an indicator of the fact that the function should be thought of not as a function on a region of the complex plane, but instead as a function on a Riemann surface. A Riemann surface corresponds to several copies of a region in the complex plane. These copies are called sheets. Where one makes the transition from one sheet to another depends on the choice of branch cut. But the Riemann surface itself is independent of the notion of sheet.

As an example, take the function w = √z. The most natural definition of this function is as a function on the curve w² = z. This is a kind of two-dimensional parabola in a four-dimensional space of two complex variables. The value of the square root function on the point with coordinates z, w on the parabola w² = z is w. Notice that if z is given, then there are usually two corresponding values of w. Thus if we want to think of √z as a function of a complex variable, then it is ambiguous. But if we think of it as a function on the Riemann surface, then it is perfectly well defined. The Riemann surface has two sheets. If we wish, we may think of sheet I as the complex plane cut along the negative axis. Sheet II is another copy of the complex plane cut along the negative axis. As z crosses the cut, it changes from one sheet to the other. The value of w also varies continuously, and it keeps track of what sheet the z is on.

As another example, take the function w = √(z(z − 1)). The most natural definition of this function is as a function on the curve w² = z(z − 1). This is a kind of two-dimensional circle in a four-dimensional space of two complex variables. The value of the function on the point with coordinates z, w on the circle w² = z(z − 1) is w. Notice that if z is given, then there are usually two corresponding values of w. Thus if we want to think of √(z(z − 1)) as a function of a complex variable, then it is ambiguous. But if we think of it as a function on the Riemann surface, then it is perfectly well defined. The Riemann surface has two sheets. If we wish, we may think of sheet I as the complex plane cut between 0 and 1. Sheet II is another copy of the complex plane, also cut between 0 and 1. As z crosses the cut, it changes from one sheet to the other. The value of w also varies continuously, and it keeps track of what sheet the z is on.

As a final example, take the function w = log z. The most natural definition of this function is as a function on the curve exp(w) = z. This is a two-dimensional curve in a four-dimensional space of two complex variables. The value of the logarithm function on the point with coordinates z, w on the curve exp(w) = z is w. Notice that if z is given, then there are infinitely many corresponding values of w. Thus if we want to think of log z as a function of a complex variable, then it is ambiguous. But if we think of it as a function on the Riemann surface, then it is perfectly well defined. The Riemann surface has infinitely many sheets. If we wish, we may think of each sheet as the complex plane cut along the negative axis. As z crosses the cut, it changes from one sheet to the other. The value of w also varies continuously, and it keeps track of what sheet the z is on. The infinitely many sheets form a kind of spiral that winds around the origin infinitely many times.
Chapter 5

Distributions

5.1 Properties of distributions

Consider the space C_c^∞(R) of complex test functions. These are complex functions defined on the real line, infinitely differentiable and with compact support. A distribution is a linear functional from this space to the complex numbers. It must satisfy a certain continuity condition, but we shall ignore that. The value of the distribution F on the test function φ is written ⟨F, φ⟩.

If f is a locally integrable function, then we may define a distribution

⟨F, φ⟩ = ∫_{−∞}^{∞} f(x) φ(x) dx.   (5.1)

Thus many functions define distributions. This is why distributions are also called generalized functions.

Example. The distribution δ_a is defined by

⟨δ_a, φ⟩ = φ(a).   (5.2)

This is not given by a locally integrable function. For this reason the distribution is often written in the incorrect but suggestive notation

⟨δ_a, φ⟩ = ∫_{−∞}^{∞} δ(x − a) φ(x) dx.   (5.3)

A sequence of distributions F_n is said to converge to a distribution F if for each test function φ the numbers ⟨F_n, φ⟩ converge to the number ⟨F, φ⟩.

Example. A distribution may be written as a limit of functions. For instance, let ε > 0 and consider the function

δ_ε(x − a) = (1/π) ε/((x − a)² + ε²).   (5.4)

The limit of the distributions defined by these locally integrable functions as ε ↓ 0 is δ_a.

Operations on distributions are defined by looking at the example of a distribution defined by a function and applying the integration by parts formula in that case.
Thus, the derivative is defined by

⟨F′, φ⟩ = −⟨F, φ′⟩.   (5.5)

Example: Consider the locally integrable Heaviside function given by H(x) = 1 for x > 0, H(x) = 0 for x < 0. Then H′ = δ. Here δ is given by

⟨δ, φ⟩ = φ(0).   (5.6)

Example: Consider the locally integrable log function f(x) = log |x|. Then f′(x) = PV 1/x. Here the principal value PV 1/x is given by

⟨PV 1/x, φ⟩ = lim_{ε↓0} ∫_{−∞}^{∞} x/(x² + ε²) φ(x) dx.   (5.7)

This can be seen by writing log |x| = lim_{ε↓0} log √(x² + ε²).

A distribution can be approximated by more than one sequence of functions. For instance, for each a > 0 let log_a(x) = log |x| for |x| ≥ a and log_a(x) = log(a) for |x| < a. Then log_a(x) also approaches log |x| as a → 0. So its derivative is another approximating sequence for PV 1/x. This says that

⟨PV 1/x, φ⟩ = lim_{a↓0} ∫_{|x|>a} (1/x) φ(x) dx.   (5.8)

We can use this sequence to compute the derivative of PV 1/x. We get

∫_{−∞}^{∞} (d/dx PV 1/x) φ(x) dx = −lim_{a→0} ∫_{|x|>a} (1/x) φ′(x) dx = lim_{a→0} ( −∫_{|x|>a} (1/x²) φ(x) dx + (φ(a) + φ(−a))/a ).   (5.9)

But [φ(a) + φ(−a) − 2φ(0)]/a converges to zero, so this is

∫_{−∞}^{∞} (d/dx PV 1/x) φ(x) dx = lim_{a→0} ( −∫_{|x|>a} (1/x²) [φ(x) − φ(0)] dx ).   (5.10)

Another operation is multiplication of a distribution by a smooth function g in C^∞. This is defined in the obvious way by

⟨g · F, φ⟩ = ⟨F, gφ⟩.   (5.11)

Distributions are not functions. They may not have values at points, and in general nonlinear operations are not defined. For instance, the square of a distribution is not always a well defined distribution.

Also, some algebraic operations involving distributions are quite tricky. Consider, for instance, the associative law. Apply this to the three distributions δ(x), x, and PV 1/x. Clearly the product δ(x) · x = 0. On the other hand, the product x · PV 1/x = 1 is one. So if the associative law were to hold, we would get

0 = 0 · PV 1/x = (δ(x) · x) · PV 1/x = δ(x) · (x · PV 1/x) = δ(x) · 1 = δ(x).   (5.12)
5.2 Mapping distributions

A test function φ is naturally viewed as a covariant object, so the distribution F is contravariant. It is natural to define the forward push of a distribution F by a smooth proper function g by

⟨g[F], φ⟩ = ⟨F, φ ∘ g⟩.  (5.13)

A proper function is a function such that the inverse image of each compact set is compact.

Example: If F is the distribution δ(x − 3) and if u = g(x) = x² − 4, then the forward push is δ(u − 5). This is because ∫ δ(u − 5)φ(u) du = ∫ δ(x − 3)φ(x² − 4) dx.

On the other hand, since a distribution is supposed to be a generalized function, it is actually more common to think of a distribution as being a covariant object. The backward pull of the distribution by a smooth function g is defined in at least some circumstances by

⟨F ∘ g, φ⟩ = ⟨F, g[φ]⟩.  (5.14)

Here

g[φ](u) = Σ_{g(x)=u} φ(x)/|g′(x)|.  (5.15)

Example: Let u = g(x) = x² − 4. Then the backward pull of δ(u) under g is δ(x² − 4) = (1/4)(δ(x − 2) + δ(x + 2)). This is because

g[φ](u) = [φ(√(u + 4)) + φ(−√(u + 4))]/(2√(u + 4)).

So if F = δ, then

F ∘ g = (1/4)(δ₂ + δ₋₂).  (5.16)

Example: The backward pull is not always defined. Let u = h(x) = x². The backward pull of δ(u) by h is δ(x²), which is not defined. To consider a distribution as a covariant object is thus a somewhat awkward act in general.

Example: The general formula for the pull back of the delta function is

δ(g(x)) = Σ_{g(a)=0} δ(x − a)/|g′(a)|.  (5.17)

The most important distributions are δ, PV 1/x, 1/(x − i0), and 1/(x + i0). These are the limits of the functions δ_ε(x), x/(x² + ε²), 1/(x − iε), 1/(x + iε) as ε ↓ 0. The relations between these functions are given by

δ_ε(x) = (1/π) ε/(x² + ε²) = (1/2πi)[1/(x − iε) − 1/(x + iε)]  (5.18)

and

x/(x² + ε²) = (1/2)[1/(x − iε) + 1/(x + iε)].  (5.19)
5.3 Radon measures

A Radon measure is a positive distribution. That is, it is a linear functional µ on C_c^∞(R) such that for each test function φ the condition φ ≥ 0 implies that the value ⟨µ, φ⟩ ≥ 0. Every Radon measure extends uniquely by continuity to a linear functional on C_c(R), the space of continuous functions with compact support. Each positive locally integrable function h defines a Radon measure by integrating φ(x) times h(x) dx. Also, the point mass distributions δ_a are Radon measures.

It is common to write the value of a Radon measure in the form

⟨µ, φ⟩ = ∫ φ(x) dµ(x).  (5.20)

What is remarkable is that the theory of Lebesgue integration works for Radon measures. That is, given a real function f ≥ 0 that is only required to be Borel measurable, there is a natural definition of the integral such that

0 ≤ ∫ f(x) dµ(x) ≤ +∞.  (5.21)

Furthermore, if f is a complex Borel measurable function such that |f| has finite integral, then the integral of f is defined and satisfies

|∫ f(x) dµ(x)| ≤ ∫ |f(x)| dµ(x) < +∞.  (5.22)

5.4 Approximate delta functions

It might seem that one could replace the notion of distribution by the notion of a sequence of approximating functions. This is true in some sense, but the fact is that many different sequences may approximate the same distribution. Here is a result of that nature.

Theorem. Let δ₁(u) ≥ 0 be a positive function with integral 1. For each ε > 0 define δ_ε(x) = δ₁(x/ε)/ε. Then the functions δ_ε converge to the δ distribution as ε tends to zero. The convergence takes place in the sense of distributions (smooth test functions with compact support) or even in the sense of Radon measures (continuous test functions with compact support). Notice that there are no continuity or symmetry assumptions on the initial function.

The proof of this theorem is simple. Each δ_ε has integral one. Consider a bounded continuous function φ. Then

∫_{−∞}^{∞} δ_ε(x)φ(x) dx = ∫_{−∞}^{∞} δ₁(u)φ(εu) du.  (5.23)

The dominated convergence theorem shows that this approaches the integral

∫_{−∞}^{∞} δ₁(u)φ(0) du = φ(0).  (5.24)
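The theorem really does need no continuity or symmetry, and this can be seen numerically. The sketch below is not in the notes and assumes NumPy; it takes for δ₁ the deliberately asymmetric, discontinuous indicator of [0, 1) and pairs the rescaled functions with a smooth test function.

```python
import numpy as np

def pair(eps):
    # <delta_eps, phi> with delta_eps(x) = delta_1(x/eps)/eps, where delta_1 is
    # the indicator of [0, 1); the support of delta_eps is then [0, eps)
    x = np.linspace(0.0, eps, 200_001)
    phi = np.cos(x) * np.exp(x)              # a smooth test function, phi(0) = 1
    integrand = phi / eps                    # delta_eps equals 1/eps on [0, eps)
    return float((0.5 * (integrand[:-1] + integrand[1:]) * np.diff(x)).sum())

for eps in [1.0, 0.1, 0.001]:
    print(eps, pair(eps))                    # tends to phi(0) = 1
```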
Here is an even stronger result.

Theorem. For each ε > 0 let δ_ε ≥ 0 be a positive function with integral 1. Suppose that for each a > 0

∫_{|x|>a} δ_ε(x) dx → 0  (5.25)

as ε → 0. Then the functions δ_ε converge to the δ distribution as ε tends to zero.

Here is a proof of this more general theorem. Let H_ε(a) = ∫_{−∞}^{a} δ_ε(x) dx. Then for each a < 0 we have H_ε(a) → 0, and for each a > 0 we have 1 − H_ε(a) → 0. In other words, for each a ≠ 0 we have H_ε(a) → H(a) as ε → 0. Since the functions H_ε are uniformly bounded, it follows from the dominated convergence theorem that H_ε → H in the sense of distributions. It follows by differentiation that δ_ε → δ in the sense of distributions.

5.5 Problems

If F and G are distributions, and if at least one of them has compact support, then their convolution F ∗ G is defined by

⟨F ∗ G, φ⟩ = ⟨F_x G_y, φ(x + y)⟩.

This product is commutative. It is also associative if at least two of the three factors have compact support.

If F and G are given by locally integrable functions f and g, and at least one has compact support, then F ∗ G is given by a locally integrable function

(f ∗ g)(z) = ∫_{−∞}^{∞} f(x)g(z − x) dx = ∫_{−∞}^{∞} f(z − y)g(y) dy.

If G is given by a test function g, then F ∗ g is given by a smooth function

(F ∗ g)(z) = ⟨F_x, g(z − x)⟩.

1. Calculate the convolution 1 ∗ δ′.

2. Calculate the convolution δ′ ∗ H, where H is the Heaviside function.

3. Calculate the convolution (1 ∗ δ′) ∗ H and also calculate the convolution 1 ∗ (δ′ ∗ H). What does this say about the associative law for convolution?

4. Let L be a constant coefficient linear differential operator. Let u be a distribution that is a fundamental solution, that is, let Lu = δ. Let G be a distribution with compact support. Show that the convolution F = u ∗ G satisfies the equation LF = G. Hint: Write ⟨LF, φ⟩ = ⟨F, L†φ⟩, where L† is adjoint to L.
5. Take L = −d²/dx². Is there a fundamental solution that has support in a bounded interval? Is there a fundamental solution that has support in a semi-infinite interval?

5.6 Tempered distributions

Let d/dx be the operator of differentiation, and let x be the operator of multiplication by the coordinate x. The space S of rapidly decreasing smooth test functions consists of the functions φ in L² such that every finite product of the operators d/dx and x applied to φ is also in L². The advantage of this definition is that the space S is clearly invariant under Fourier transformation.

A tempered distribution is a linear functional on S satisfying the appropriate continuity property. Every tempered distribution restricts to define a distribution. So tempered distributions are more special.

The advantage of tempered distributions is that one can define Fourier transforms F̂ of tempered distributions F. The definition is

⟨F̂, φ⟩ = ⟨F, φ̂⟩.  (5.26)

Here φ̂ is the Fourier transform of φ.

Here are some Fourier transforms for functions. The first two are easy. They are

∫_{−∞}^{∞} e^{−ikx} 1/(x − iε) dx = 2πi e^{εk} H(−k)  (5.27)

and

∫_{−∞}^{∞} e^{−ikx} 1/(x + iε) dx = −2πi e^{−εk} H(k).  (5.28)

Subtract these and divide by 2πi. This gives

∫_{−∞}^{∞} e^{−ikx} δ_ε(x) dx = e^{−ε|k|}.  (5.29)

Instead, add these and divide by 2. This gives

∫_{−∞}^{∞} e^{−ikx} x/(x² + ε²) dx = −πi e^{−ε|k|} sign(k).  (5.30)

The corresponding Fourier transforms for distributions are

F[δ(x)] = 1,  (5.31)

F[1/(x − i0)] = 2πi H(−k),  (5.32)

and

F[1/(x + i0)] = −2πi H(k).  (5.33)
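Formula (5.29) can be verified by direct quadrature. The following sketch is not from the notes; it assumes NumPy and that a long truncated trapezoid sum is an adequate stand-in for the improper integral.

```python
import numpy as np

# Check of equation (5.29): the Fourier transform of
# delta_eps(x) = (1/pi) eps/(x^2 + eps^2) is exp(-eps*|k|).

eps = 0.5
x = np.linspace(-2000.0, 2000.0, 2_000_001)
f = eps / (np.pi * (x**2 + eps**2))

def trap(y, x):
    return (0.5 * (y[:-1] + y[1:]) * np.diff(x)).sum()

for k in [-2.0, 0.0, 1.0, 3.0]:
    ft = trap(np.exp(-1j * k * x) * f, x)
    print(k, ft.real, np.exp(-eps * abs(k)))   # the two columns agree closely
```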
Also

F[PV 1/x] = −πi sign(k).  (5.34)

For each of these formulas there is a corresponding inverse Fourier transform. For instance, the inverse Fourier transform of 1 is

δ(x) = ∫_{−∞}^{∞} e^{ikx} dk/(2π) = ∫_0^∞ cos(kx) dk/π.  (5.35)

Of course such an equation is interpreted by integrating both sides with a test function. Another formula of the same type is gotten by taking the inverse Fourier transform of −πi sign(k). This is

PV 1/x = −πi ∫_{−∞}^{∞} e^{ikx} sign(k) dk/(2π) = ∫_0^∞ sin(kx) dk.  (5.36)

Example: Here is a more complicated calculation. The derivative of PV 1/x is the distribution

d/dx PV 1/x = −1/x² + cδ(x),  (5.37)

where c is the infinite constant

c = ∫_{−∞}^{∞} (1/x²) dx.  (5.38)

This makes rigorous sense if one interprets it as

∫_{−∞}^{∞} [d/dx PV 1/x] φ(x) dx = lim_{a→0} −∫_{|x|>a} (1/x²)[φ(x) − φ(0)] dx.  (5.39)

One can get an intuitive picture of this result by graphing the approximating functions. The key formula is

d/dx x/(x² + ε²) = −x²/(x² + ε²)² + c_ε (2/π) ε³/(x² + ε²)²,  (5.40)

where c_ε = π/(2ε). This is an approximation to −1/x² plus a big constant times an approximation to the delta function.

The Fourier transform of the derivative is obtained by multiplying the Fourier transform of PV 1/x by ik. Thus the Fourier transform of −1/x² + cδ(x) is ik times −πi sign(k), which is π|k|. This example is interesting, because it looks at first glance that the derivative of PV 1/x should be −1/x², which is negative definite. But the correct answer for the derivative is −1/x² + cδ(x), which is actually positive definite. And in fact its Fourier transform π|k| is positive.
5.7 Poisson equation

We begin the study of fundamental solutions of partial differential equations. These are solutions of the equation Lu = δ, where L is the differential operator and δ is a point source.

Let us start with the equation in one space dimension:

(−d²/dx² + m²) u = δ(x).  (5.41)

This is an equilibrium equation that balances a source with losses due to diffusion and to dissipation (when m > 0). Fourier transform. This gives

(k² + m²) û(k) = 1.  (5.42)

The solution is

û(k) = 1/(k² + m²).  (5.43)

There is no problem of division by zero. The inverse Fourier transform is

u(x) = (1/(2m)) e^{−m|x|}.  (5.44)

This is the only solution that is a tempered distribution. (The solutions of the homogeneous equation all grow exponentially.)

What happens when m = 0? This is more subtle. The equation is

−d²u/dx² = δ(x).  (5.45)

Fourier transform. This gives

k² û(k) = 1.  (5.46)

Now there is a real question about division by zero. The solution is

û(k) = −d/dk PV 1/k.  (5.47)

This may be thought of as "1/k²", but it needs careful definition. The inverse Fourier transform of PV 1/k is (1/2) i sign(x). So the inverse Fourier transform of −d/dk PV 1/k is −(−ix)(1/2) i sign(x) = −(1/2)|x|. Thus

u(x) = −(1/2)|x|  (5.48)

is a solution of the inhomogeneous equation. Furthermore, the homogeneous equation has solutions that are tempered distributions, namely linear combinations of 1 and x; their Fourier transforms are linear combinations of δ(k) and δ′(k). The final result is that the inhomogeneous equation does have a tempered distribution solution. However, none of these solutions is a good description of diffusive equilibrium. In fact, in one dimension there is no diffusive equilibrium.
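The solution (5.44) can be checked symbolically. The sketch below is not in the notes and assumes SymPy; it verifies that u solves the homogeneous equation away from the origin and that u′ jumps by −1 at x = 0, so that −u″ contributes exactly +1 times δ(x).

```python
import sympy as sp

x, m = sp.symbols('x m', positive=True)
u_right = sp.exp(-m * x) / (2 * m)    # the branch of (5.44) for x > 0; u is even

# homogeneous equation away from the origin
assert sp.simplify(-sp.diff(u_right, x, 2) + m**2 * u_right) == 0

# derivative jump: u'(0+) - u'(0-) = 2 u'(0+) by the even symmetry
jump = 2 * sp.diff(u_right, x).subs(x, 0)
print(jump)   # -1
```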
The next case that is simple to compute and of practical importance is the equation in dimension 3. This is

(−∇² + m²) u = δ(x).  (5.49)

This is an equilibrium equation that balances a source with losses due to diffusion and to dissipation (when m > 0). Fourier transform. This gives

(k² + m²) û(k) = 1.  (5.50)

The solution is

û(k) = 1/(k² + m²).  (5.51)

The inverse Fourier transform in the three dimensional case may be computed by going to polar coordinates. It is

u(x) = (1/(4π|x|)) e^{−m|x|}.  (5.52)

What happens when m = 0? The situation is very different in three dimensions. The equation is

−∇²u = δ(x).  (5.53)

Fourier transform. This gives

k² û(k) = 1.  (5.54)

The inhomogeneous equation has a solution

û(k) = 1/k².  (5.55)

But now this is a locally integrable function. It defines a tempered distribution without any regularization. Thus

u(x) = 1/(4π|x|)  (5.56)

is a solution of the inhomogeneous equation. In three dimensions there is diffusive equilibrium. There is so much room that the effect of the source can be completely compensated by diffusion alone.

5.8 Diffusion equation

The diffusion equation or heat equation is

(∂/∂t − (σ²/2) ∂²/∂x²) u = δ(x)δ(t).  (5.57)
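The Gaussian fundamental solution derived below can be verified directly. This symbolic sketch is not in the notes and assumes SymPy; the kernel is quoted here rather than derived, and the check is that it satisfies the homogeneous equation for t > 0.

```python
import sympy as sp

# For t > 0 the kernel u(x,t) = exp(-x^2/(2 sigma^2 t)) / sqrt(2 pi sigma^2 t)
# satisfies the diffusion equation u_t = (sigma^2/2) u_xx.

x = sp.symbols('x', real=True)
t, sigma = sp.symbols('t sigma', positive=True)

u = sp.exp(-x**2 / (2 * sigma**2 * t)) / sp.sqrt(2 * sp.pi * sigma**2 * t)

residual = sp.diff(u, t) - sp.Rational(1, 2) * sigma**2 * sp.diff(u, x, 2)
print(sp.simplify(residual))   # 0
```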
Equation (5.57) says that the time rate of change is entirely due to diffusion. Fourier transform. We get

(iω + (σ²/2) k²) û(k, ω) = 1.  (5.58)

This has solution

û(k, ω) = 1/(iω + (σ²/2)k²) = (1/i) · 1/(ω − i(σ²/2)k²).  (5.59)

Here the only division by zero is when both ω and k are zero. But this is not so serious, because it is clear how to regularize. We can use the fact that the inverse Fourier transform of 1/(ω − iε) with ε > 0 is iH(t)e^{−εt}. So we have

û(k, t) = H(t) e^{−σ²tk²/2}.  (5.60)

This is a Gaussian, so the fundamental solution is

u(x, t) = H(t) (1/√(2πσ²t)) e^{−x²/(2σ²t)}.  (5.61)

5.9 Wave equation

We will look for the solution of the wave equation with a point source at time zero that lies in the forward light cone. The wave equation in 1 + 1 dimensions is

(∂²/∂t² − c² ∂²/∂x²) u = δ(x)δ(t).  (5.62)

Fourier transform. We get

(−ω² + c²k²) û(k, ω) = 1.  (5.63)

The division by zero is serious, but it is possible to regularize. This has solution

û(k, ω) = −1/((ω − i0)² − c²k²).  (5.64)

The choice of regularization is not the only possible one, but we shall see that it is the convention that gives propagation into the future. We can write this also as

û(k, ω) = −(1/(2ck))[1/(ω − i0 − ck) − 1/(ω − i0 + ck)].  (5.65)

We can use the fact that the inverse Fourier transform of 1/(ω − i0) is iH(t). So we have

û(k, t) = −(1/(2ck)) iH(t)(e^{ickt} − e^{−ickt}) = (1/(ck)) sin(ckt) H(t).  (5.66)
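The function in (5.66) can be checked symbolically. The sketch below is not in the notes and assumes SymPy; for t > 0 it verifies that sin(ckt)/(ck) solves the oscillator equation u_tt = −c²k²u with u(k, 0) = 0 and u_t(k, 0) = 1, which is exactly what the source δ(t) forces.

```python
import sympy as sp

k, c = sp.symbols('k c', positive=True)
t = sp.symbols('t', nonnegative=True)

u = sp.sin(c * k * t) / (c * k)

# oscillator equation in t for each fixed k
assert sp.simplify(sp.diff(u, t, 2) + c**2 * k**2 * u) == 0

# initial values forced by the point source
print(u.subs(t, 0), sp.diff(u, t).subs(t, 0))   # 0 1
```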
It is easy to check that (5.66) is the Fourier transform of

u(x, t) = (1/(2c))[H(x + ct) − H(x − ct)] H(t).  (5.67)

The wave equation in 3 + 1 dimensions is

(∂²/∂t² − c²∇²) u = δ(x)δ(t).  (5.68)

Fourier transform. We get

(−ω² + c²k²) û(k, ω) = 1.  (5.69)

This has solution

û(k, ω) = −1/((ω − i0)² − c²k²).  (5.70)

Again we can write this as

û(k, ω) = −(1/(2ck))[1/(ω − i0 − ck) − 1/(ω − i0 + ck)].  (5.71)

Thus we have again

û(k, t) = (1/(ck)) sin(ckt) H(t).  (5.72)

However the difference is that the k variable is three dimensional. It is easy to check that this is the Fourier transform of

u(x, t) = (1/(4πc|x|)) δ(|x| − ct) H(t).  (5.73)

This is a beautiful formula. It represents an expanding sphere of influence, going into the future. Inside the sphere it is dark. The solution has an even more beautiful expression that exhibits the symmetry:

u(x, t) = (1/(2πc)) δ(x² − c²t²) H(t).  (5.74)

5.10 Homogeneous solutions of the wave equation

The polynomial ω² − c²k² vanishes on an entire cone, so it is not surprising that the wave equation has a number of interesting homogeneous solutions. The most important ones are

û(k, t) = (1/(ck)) sin(ckt)  (5.75)
and its derivative

v̂(k, t) = cos(ckt).  (5.76)

These are the solutions that are used in constructing solutions of the initial value problem.

It is interesting to see what these solutions look like in the frequency representation. The result is

û(k, ω) = −πi (1/(ck))[δ(ω − ck) − δ(ω + ck)] = −2πi δ(ω² − c²k²) sign(ω)  (5.77)

and

v̂(k, ω) = iω û(k, ω) = π[δ(ω − ck) + δ(ω + ck)] = 2π|ω| δ(ω² − c²k²).  (5.78)

5.11 Problems

1. Show that 1/|x|^{1/3} is a locally integrable function and thus defines a distribution. Show that its distribution derivative is

d/dx 1/|x|^{1/3} = −(1/3) (1/|x|^{4/3}) sign(x).  (5.79)

Hint: To make this rigorous, consider 1/(x² + ε²)^{1/6}.

2. Show that 1/x^{1/3} is a locally integrable function and thus defines a distribution. Show that its distribution derivative is

d/dx 1/x^{1/3} = −(1/3) (1/|x|^{4/3}) + cδ(x),  (5.80)

where c is the constant

c = (1/3) ∫_{−∞}^{∞} dx/|x|^{4/3}.  (5.81)

The right hand side is not locally integrable. Explain the definition of the right hand side as a distribution. Hint: To make this rigorous, consider x/(x² + ε²)^{2/3}.

3. Discuss the contrast between the results in the last two problems. It may help to draw some graphs of the functions that approximate these distributions.

4. Let m > 0. Use Fourier transforms to find a tempered distribution u that is the fundamental solution of the equation

−d²u/dx² + m²u = δ(x).  (5.82)

Is this a unique tempered distribution solution? Explain.
5. Let m > 0. Consider Euclidean space of dimension n = 3. Use Fourier transforms to find a tempered distribution u that is the fundamental solution of the equation

−∇²u + m²u = δ(x).  (5.83)

5.12 Answers to first two problems

1. The function 1/|x|^{1/3} is locally integrable, while its pointwise derivative −(1/3)(1/|x|^{4/3}) sign(x) is not. But we do have

d/dx 1/(x² + ε²)^{1/6} = −(1/3) x/(x² + ε²)^{7/6}.  (5.84)

Since this derivative is an odd function, we have

∫_{−∞}^{∞} [d/dx 1/|x|^{1/3}] φ(x) dx = lim_{ε→0} −(1/3) ∫_{−∞}^{∞} x/(x² + ε²)^{7/6} [φ(x) − φ(0)] dx.  (5.85)

We can write this as

∫_{−∞}^{∞} [d/dx 1/|x|^{1/3}] φ(x) dx = −(1/3) ∫_{−∞}^{∞} (1/|x|^{4/3}) sign(x) [φ(x) − φ(0)] dx.  (5.86)

Notice that the integrand is integrable for each test function φ.

2. The function 1/x^{1/3} is locally integrable, while its pointwise derivative −(1/3)(1/|x|^{4/3}) is not. But we do have

d/dx x/(x² + ε²)^{2/3} = −(1/3) x²/(x² + ε²)^{5/3} + ε²/(x² + ε²)^{5/3}.  (5.87)

Let

c_ε = (1/3) ∫_{−∞}^{∞} x²/(x² + ε²)^{5/3} dx,  (5.88)

which is easily seen to be proportional to 1/ε^{1/3}. From the fundamental theorem of calculus this may be written in the simple form

∫_{−∞}^{∞} ε²/(x² + ε²)^{5/3} dx = c_ε.  (5.89)

Write

δ_ε(x) = (1/c_ε) ε²/(x² + ε²)^{5/3}.  (5.90)

Then δ_ε(x) → δ(x) in the sense of distributions. Thus

∫_{−∞}^{∞} [d/dx 1/x^{1/3}] φ(x) dx = lim_{ε→0} ∫_{−∞}^{∞} [−(1/3) x²/(x² + ε²)^{5/3} + c_ε δ_ε(x)] φ(x) dx.  (5.91)

Since for φ smooth enough

∫ c_ε δ_ε(x)[φ(x) − φ(0)] dx → 0,  (5.92)

we get

∫_{−∞}^{∞} [d/dx 1/x^{1/3}] φ(x) dx = −(1/3) ∫_{−∞}^{∞} (1/|x|^{4/3}) [φ(x) − φ(0)] dx.  (5.93)

Again the integrand is integrable for each test function φ.
Chapter 6

Bounded Operators

6.1 Introduction

This chapter deals with bounded linear operators. These are bounded linear transformations of a Hilbert space into itself. Every bounded operator (everywhere defined) is closed and densely defined. However the present chapter treats only bounded operators. We shall see in the next chapter that it is also valuable to look at an even broader class of operators, those that are closed and densely defined.

In fact, the chapter treats four classes of operators: finite rank, Hilbert-Schmidt, compact, and bounded. Every finite rank operator is Hilbert-Schmidt. Every Hilbert-Schmidt operator is compact. Every compact operator is bounded.

6.2 Bounded linear operators

Let H be a Hilbert space (a vector space with an inner product that is a complete metric space). A linear transformation K : H → H is said to be bounded if it maps bounded sets into bounded sets. This is equivalent to there being a constant M with

‖Ku‖ ≤ M‖u‖.  (6.1)

Let M₂ be the least such M. [The subscript in M₂ is supposed to remind us that we are dealing with Hilbert spaces like L².] This is called the uniform norm of K and is written ‖K‖_∞, or simply ‖K‖ when the context makes this clear. The subscript in ‖K‖_∞, on the other hand, tells us that we are looking at a least upper bound.

If K and L are bounded operators, their sum K + L and product KL are bounded. Furthermore, we have ‖K + L‖_∞ ≤ ‖K‖_∞ + ‖L‖_∞ and ‖KL‖_∞ ≤ ‖K‖_∞ ‖L‖_∞.

A bounded operator K always has an adjoint K* that is a bounded operator.
It is the unique operator with the property that

(u, K*v) = (Ku, v)  (6.2)

for all u, v in H. Furthermore K** = K, and ‖K*‖_∞ = ‖K‖_∞. For products we have (KL)* = L*K*, in the opposite order.

It is also possible to define the adjoint of an operator from one Hilbert space to another. Here is a special case. Let g be a vector in the Hilbert space H. Then g defines a linear transformation from C to H by sending z to the vector zg. The adjoint is a linear transformation from H to C, denoted by g*. It is the transformation that sends v to (g, v), so g*v = (g, v). The adjointness relation is z̄ g*v = (zg, v), which is just z̄(g, v) = (zg, v). Let f be another vector. Define the operator K from H to H by Ku = f(g, u), that is, K = fg*. Then the adjoint of K is K* = gf*.

Example: Hilbert-Schmidt integral operators. Let

(Kf)(x) = ∫ k(x, y) f(y) dy  (6.3)

with k in L², that is,

‖k‖₂² = ∫∫ |k(x, y)|² dx dy < ∞.  (6.4)

Then K is bounded with norm ‖K‖_∞ ≤ ‖k‖₂. Proof: Fix x. Apply the Schwarz inequality to the integral over y in the definition of the operator. This gives that |(Kf)(x)|² ≤ ∫ |k(x, y)|² dy ∫ |f(y)|² dy. Now integrate over x.

Example: Interpolation integral operators. Let K be an integral operator such that

M₁ = sup_y ∫ |k(x, y)| dx < ∞  (6.5)

and

M_∞ = sup_x ∫ |k(x, y)| dy < ∞.  (6.6)

(Here we choose to think of k(x, y) as a function of x and y, not as a Schwartz distribution like δ(x − y).) Let 1/p + 1/q = 1. Then for each p with 1 ≤ p ≤ ∞ the norm M_p of K as an operator on L^p is bounded by M_p ≤ M₁^{1/p} M_∞^{1/q}. In particular, as an operator on L² the norm M₂ = ‖K‖_∞ of K is bounded by

M₂ ≤ √(M₁ M_∞).  (6.7)

The reason for the name interpolation operator (which is not standard) is that the bound interpolates for all p between the extreme cases p = 1 (where the bound is M₁) and p = ∞ (where the bound is M_∞).

Proof: Write

|(Kf)(x)| ≤ ∫ |k(x, y)| |f(y)| dy = ∫ |k(x, y)|^{1/q} · |k(x, y)|^{1/p} |f(y)| dy.  (6.8)
Apply the Hölder inequality to the integral. This gives

|(Kf)(x)| ≤ (∫ |k(x, y)| dy)^{1/q} (∫ |k(x, y)| |f(y)|^p dy)^{1/p} ≤ M_∞^{1/q} (∫ |k(x, y)| |f(y)|^p dy)^{1/p}.  (6.9)

It follows that

∫ |(Kf)(x)|^p dx ≤ M_∞^{p/q} ∫∫ |k(x, y)| |f(y)|^p dy dx ≤ M_∞^{p/q} M₁ ∫ |f(y)|^p dy.  (6.10)

Take the pth root to get the bound

(∫ |(Kf)(x)|^p dx)^{1/p} ≤ M_∞^{1/q} M₁^{1/p} (∫ |f(y)|^p dy)^{1/p}.  (6.11)
For a bounded operator K, the set of points µ such that (µI − K)⁻¹ is a bounded operator is called the resolvent set of K. The complement of the resolvent set is called the spectrum of K. When one is only interested in values µ ≠ 0 it is common to define λ = 1/µ and write

µ(µI − K)⁻¹ = (I − λK)⁻¹.  (6.12)

While one must be alert to which convention is being employed, this should be recognized as a trivial relabeling.

Theorem. If the complex number µ satisfies ‖K‖_∞ < |µ|, then µ is in the resolvent set of K.

Proof: This is the Neumann series expansion

(µI − K)⁻¹ = (1/µ) Σ_{j=0}^{∞} K^j/µ^j.  (6.13)

The jth term has norm ‖K^j‖_∞/|µ|^j ≤ (‖K‖_∞/|µ|)^j. This is the jth term of a convergent geometric series.

Theorem. If for some n ≥ 1 the complex number µ satisfies ‖K^n‖_∞ < |µ|^n, then µ is in the resolvent set of K.

Proof: This is the Neumann series again. But now we only require the estimate for the nth power of the operator. Write j = an + b, where 0 ≤ b < n. Then ‖K^j‖_∞/|µ|^j is bounded by (‖K‖_∞/|µ|)^b (‖K^n‖_∞/|µ|^n)^a. So this is the sum of n convergent geometric series, one for each value of b between 0 and n − 1.

The spectral radius of an operator is the largest value of |µ|, where µ is in the spectrum. The estimate of this theorem gives an upper bound on the spectral radius. This is a remarkable result, since it has no analog for scalars. However for a matrix the norm of a power can be considerably smaller than the corresponding power of the norm, so this result can be quite useful. The most spectacular application is to Volterra integral operators.
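The Neumann series (6.13) can be exercised on a matrix. The sketch below is not from the notes and assumes NumPy; it rescales a random matrix so that ‖K‖ = 1/2 < |µ| and compares the partial sums with a direct solve.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
K = rng.standard_normal((n, n))
K /= 2.0 * np.linalg.norm(K, 2)       # rescale so the operator norm of K is 1/2
f = rng.standard_normal(n)
mu = 1.0                              # ||K|| = 0.5 < |mu|

exact = np.linalg.solve(mu * np.eye(n) - K, f)

u = np.zeros(n)
term = f.copy()
for _ in range(60):                   # term holds K^j f / mu^j
    u += term / mu
    term = K @ term / mu

print(np.linalg.norm(u - exact))      # essentially zero
```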
Example: Volterra integral operators. Let H = L²(0, 1) and

(Kf)(x) = ∫₀ˣ k(x, y) f(y) dy.  (6.14)
Suppose that |k(x, y)| ≤ C for 0 ≤ y ≤ x ≤ 1 and k(x, y) = 0 for 0 ≤ x < y ≤ 1. Then

‖Kⁿ‖_∞ ≤ Cⁿ/(n − 1)!.  (6.15)

As a consequence every complex number µ ≠ 0 is in the resolvent set of K.

Proof: Each power Kⁿ with n ≥ 1 is also a Volterra integral operator. We claim that it has kernel kⁿ(x, y) with

|kⁿ(x, y)| ≤ (Cⁿ/(n − 1)!) (x − y)^{n−1}  (6.16)

for 0 ≤ y ≤ x ≤ 1 and zero otherwise. This follows by induction. Since |kⁿ(x, y)| ≤ Cⁿ/(n − 1)!, the Hilbert-Schmidt norm of Kⁿ is also bounded by Cⁿ/(n − 1)!. This goes to zero faster than every power, so the Neumann series converges for every µ ≠ 0.

The norm of a bounded operator can always be found by calculating the norm of a self-adjoint bounded operator. In fact, ‖K‖²_∞ = ‖K*K‖_∞. Furthermore, the norm of a self-adjoint operator is its spectral radius. This shows that to calculate the norm of a bounded operator K exactly, one needs only to calculate the spectral radius of K*K and take the square root. Unfortunately, this can be a difficult problem. This is why it is good to have other ways of estimating the norm.
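The factorial decay (6.15) is easy to see numerically. The sketch below is not from the notes and assumes NumPy; it discretizes the Volterra operator with kernel k(x, y) = 1 for y ≤ x on L²(0, 1) and watches the norms of the powers collapse, consistent with spectral radius zero.

```python
import numpy as np

N = 400
dx = 1.0 / N
K = np.tril(np.ones((N, N))) * dx     # (Kf)(x) ~ sum over y <= x of f(y) dx

norms = []
Kn = np.eye(N)
for n in range(1, 7):
    Kn = Kn @ K
    norms.append(np.linalg.norm(Kn, 2))
    print(n, norms[-1])               # compare with C^n/(n-1)! = 1, 1, 1/2, 1/6, ...
```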
6.3 Compact operators
Let H be a Hilbert space. A subset S is totally bounded if for every ε > 0 there is a cover of S by a finite number N of ε balls. A totally bounded set is bounded. Thus for instance, if H is finite dimensional with dimension n and S is a cube of side L, then N ≈ (L/ε)ⁿ. This number N is finite, though it increases as ε decreases. One expects a cube or a ball to be totally bounded only in finite dimensional situations.

However the situation is different for a rectangular shaped region in infinite dimensions. Say that the sides are L₁, L₂, L₃, ..., L_k, ... and that these decrease to zero. Consider ε > 0. Pick k so large that L_{k+1}, L_{k+2}, ... are all less than ε. Then N ≈ (L₁/ε)(L₂/ε)···(L_k/ε). This can increase very rapidly as ε decreases. But such a region, fat only in finitely many dimensions and increasingly thin in all the others, is totally bounded.

A linear transformation K : H → H is said to be compact if it maps bounded sets into totally bounded sets. (One can take the bounded set to be the unit ball.) A compact operator is bounded.
If K and L are compact operators, then so is their sum K + L. If K is compact and L is bounded, then KL and LK are compact.

Approximation theorem. If K_n is a sequence of compact operators, K is a bounded operator, and ‖K_n − K‖_∞ → 0 as n → ∞, then K is also compact.

The proof is a classical ε/3 argument. Let ε > 0. Choose n so large that ‖K − K_n‖_∞ < ε/3. Since K_n is compact, there are finitely many vectors u₁, ..., u_k in the unit ball such that every vector K_n u with u in the unit ball is within ε/3 of some K_n u_j. Consider an arbitrary u in the unit ball and pick the corresponding u_j. Since

Ku − Ku_j = (K − K_n)u + (K_n u − K_n u_j) + (K_n − K)u_j,

it follows that

‖Ku − Ku_j‖ ≤ ‖(K − K_n)u‖ + ‖K_n u − K_n u_j‖ + ‖(K_n − K)u_j‖ ≤ ε/3 + ε/3 + ε/3 = ε.  (6.17)

This shows that the image of the unit ball under K is totally bounded.

Lemma. If the self-adjoint operator K*K is compact, then K is compact. Proof: Let ε > 0. Then there are u₁, ..., u_k in the unit ball such that for each u in the unit ball there is a j with ‖K*K(u − u_j)‖ < ε²/2. But this says that ‖K(u − u_j)‖² = (K(u − u_j), K(u − u_j)) = ((u − u_j), K*K(u − u_j)) ≤ ‖u − u_j‖ ‖K*K(u − u_j)‖ < ε².

Theorem. The adjoint K* of a compact operator K is a compact operator. Proof: Say that K is compact. Then since K* is bounded, it follows that KK* = (K*)*(K*) is compact. It follows from the last result that K* is compact.

Spectral properties of compact operators. Let K be a compact operator. The only nonzero points in the spectrum of K are eigenvalues of finite multiplicity. The only possible accumulation point of the spectrum is 0. Notice that there is no general claim that the eigenvectors of K form a basis for the Hilbert space. The example of a Volterra integral operator provides a counterexample: the only point in the spectrum is zero.

We want to consider equations of the form

µu = f + Ku,  (6.18)

where µ is a parameter. If µ = 0, then this is an equation of the first kind. If µ ≠ 0, then this is an equation of the second kind.

The condition for a unique solution of an equation of the first kind is the existence of the inverse K⁻¹, and the solution is u = −K⁻¹f. Thus the issue for an equation of the first kind is whether 0 is not an eigenvalue of K. For a compact operator, the inverse K⁻¹ will typically be an unbounded operator that is only defined on a linear subspace of the Hilbert space.

Very often an equation of the second kind is written

u = f₁ + λKu,  (6.19)

where λ = 1/µ and f₁ = λf.
The condition for a unique solution of an equation of the second kind is the existence of the inverse (µI − K)⁻¹ for the particular value of µ ≠ 0. In this case the solution is u = (µI − K)⁻¹f. The issue for an equation of the second kind is thus whether the given µ ≠ 0 is not an eigenvalue of K. Equations of the second kind are much nicer. This is because compact operators have much better spectral properties away from zero. In fact, if µ ≠ 0 is not an eigenvalue, then (µI − K)⁻¹ will be a bounded operator.

The norm of a compact operator can always be found by calculating the norm of a self-adjoint compact operator. In fact, ‖K‖²_∞ = ‖K*K‖_∞. Furthermore, the norm of a compact self-adjoint operator is its spectral radius, which in this case is the largest value of |µ|, where µ is an eigenvalue. Unfortunately, this can be a difficult computation.

Spectral theorem for compact self-adjoint operators. Let K be a compact self-adjoint operator. Then there is an orthonormal basis u_j of eigenvectors of K. The eigenvalues µ_j of K are real. Each nonzero eigenvalue is of finite multiplicity. The only possible accumulation point of the eigenvalues is zero. The operator K has the representation

Kf = Σ_j µ_j u_j (u_j, f).  (6.21)

In abbreviated form this is

K = Σ_j µ_j u_j u_j*.  (6.22)

There is yet another way of writing the spectral theorem for compact self-adjoint operators. Define the unitary operator U from H to ℓ² by (Uf)_j = (u_j, f). Its inverse is the unitary operator from ℓ² to H given by U*c = Σ_j c_j u_j. Let M be the diagonal operator from ℓ² to ℓ² defined by multiplication by µ_j. Then

K = U* M U.  (6.23)

Singular value decomposition for compact operators. Let K be a compact operator. Then there is an orthonormal family u_j and an orthonormal family w_j and a sequence of numbers χ_j ≥ 0 (the singular values of K) approaching zero such that the operator K has the representation

Kf = Σ_j χ_j w_j (u_j, f).  (6.24)

In abbreviated form this is

K = Σ_j χ_j w_j u_j*.  (6.25)
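The decomposition (6.24)-(6.25) can be illustrated in finite dimensions. The sketch below is not from the notes and assumes NumPy; it reconstructs a random matrix from its singular triples and confirms that the uniform norm is the largest singular value.

```python
import numpy as np

rng = np.random.default_rng(1)
K = rng.standard_normal((6, 6))

W, chi, Uh = np.linalg.svd(K)          # columns of W are w_j; rows of Uh are u_j^*

# K = sum_j chi_j w_j u_j^*
recon = sum(chi[j] * np.outer(W[:, j], Uh[j]) for j in range(6))
print(np.allclose(K, recon))           # True

# the uniform norm of K is the largest singular value chi_0
print(np.isclose(np.linalg.norm(K, 2), chi[0]))   # True
```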
Then
  √(K*K) f = ∑_j χ_j u_j (u_j, f),   (6.27)
where D is diagonal with entries χ_j ≥ 0. Since ‖Kf‖ = ‖√(K*K) f‖ for each f, we can write K = V √(K*K), where V g = Kf for each g = √(K*K) f in the range of √(K*K). This is the well-known polar decomposition.

There is another way of writing the singular value decomposition of K. Let √(K*K) = U* D U be the spectral representation of √(K*K). Then K = V U* D U = W D U. Then
  Kf = ∑_j χ_j w_j (u_j, f),   (6.28)
where w_j = V u_j.

It follows from the approximation theorem and from the singular value decomposition that an operator is compact if and only if it is a norm limit of a sequence of finite rank operators.

Notice that this theorem gives a fairly clear picture of what a compact operator acting on L² looks like. It is an integral operator with kernel
  k(x, y) = ∑_j χ_j w_j(x) \overline{u_j(y)},   (6.29)
where the χ_j → 0. Of course this representation may be difficult to find in practice. What happens in the case of a Hilbert-Schmidt integral operator is special: the χ_j go to zero sufficiently rapidly that ∑_j χ_j² < ∞.

6.4 Hilbert-Schmidt operators

Let H be a Hilbert space. For a positive bounded self-adjoint operator B the trace is defined by tr B = ∑_j (e_j, B e_j), where the e_j form an orthonormal basis. A bounded linear operator K : H → H is said to be Hilbert-Schmidt if tr(K*K) < ∞. The Hilbert-Schmidt norm is ‖K‖₂ = √(tr(K*K)).

Notice that the Hilbert-Schmidt norm of a bounded operator is defined in terms of a self-adjoint operator: ‖K‖₂ is the square root of the trace of the self-adjoint operator K*K, and the trace is the sum of the eigenvalues. However we do not need to calculate the eigenvalues, since, as we shall see, there are much easier ways to calculate the trace.

Theorem. If K is a Hilbert-Schmidt integral operator with
  (Kf)(x) = ∫ k(x, y) f(y) dy,   (6.30)
then
  ‖K‖₂² = ∫∫ |k(x, y)|² dx dy.   (6.31)

Proof: Let {e₁, e₂, ...} be an orthonormal basis for H. We can always expand a vector in this basis via
  u = ∑_j e_j (e_j, u).   (6.32)
(Remember the convention adopted here that inner products are linear in the second variable, conjugate linear in the first variable.) Write
  Kf = ∑_i e_i ∑_j (e_i, K e_j)(e_j, f).   (6.33)
The matrix elements (e_i, K e_j) satisfy
  ∑_i ∑_j |(e_i, K e_j)|² = ∑_j ‖K e_j‖² = ∑_j (e_j, K*K e_j) = tr(K*K).   (6.34)
So the kernel of the integral operator K is
  k(x, y) = ∑_i ∑_j (e_i, K e_j) e_i(x) \overline{e_j(y)},
and this sum is convergent in L²(dx dy).

Theorem. A Hilbert-Schmidt operator is a bounded operator. It is always true that ‖K‖∞ ≤ ‖K‖₂.

Theorem. If K and L are Hilbert-Schmidt operators, then so is their sum K + L, and ‖K + L‖₂ ≤ ‖K‖₂ + ‖L‖₂.

Theorem. If K is a Hilbert-Schmidt operator and L is a bounded operator, then the products KL and LK are Hilbert-Schmidt. Furthermore, ‖KL‖₂ ≤ ‖K‖₂ ‖L‖∞ and ‖LK‖₂ ≤ ‖L‖∞ ‖K‖₂.

Theorem. The adjoint of a Hilbert-Schmidt operator is a Hilbert-Schmidt operator, and ‖K*‖₂ = ‖K‖₂.

Theorem. A Hilbert-Schmidt operator is compact.

Proof: Let K be a Hilbert-Schmidt operator. Then K is given by a square-summable matrix. So K may be approximated by a sequence of finite rank operators K_n such that ‖K_n − K‖₂ → 0. In particular, ‖K_n − K‖∞ → 0. Since each K_n is compact, it follows from the approximation theorem that K is compact.

6.5 Problems

1. Let H = L² be the Hilbert space of square integrable functions on the line. Create an example of a Hilbert-Schmidt operator that is not an interpolation operator.

2. Create an example of an interpolation operator that is not a Hilbert-Schmidt operator. Make the example so that the operator is not compact.

3. Find an example of a compact bounded operator on H = L² that is neither a Hilbert-Schmidt nor an interpolation operator.

4. Find an example of a bounded operator on H = L² that is neither a Hilbert-Schmidt nor an interpolation operator, and that is also not a compact operator.
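For intuition about the comparison between the uniform norm and the Hilbert-Schmidt norm that the norm problems of this section ask about, here is a small finite-dimensional check. This is a hedged sketch (the random matrix, the dimension, and the use of NumPy are my own illustrative assumptions): in finite dimensions every operator is Hilbert-Schmidt, ‖K‖₂ is the Frobenius norm, tr(K*K) is the sum of the squared singular values, and ‖K‖∞ is the largest singular value.

```python
import numpy as np

rng = np.random.default_rng(0)
K = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))

# Hilbert-Schmidt norm: ||K||_2^2 = tr(K* K) = sum of squared singular values
hs_norm = np.sqrt(np.trace(K.conj().T @ K).real)
frobenius = np.linalg.norm(K, 'fro')
singular_values = np.linalg.svd(K, compute_uv=False)   # returned in decreasing order

assert np.isclose(hs_norm, frobenius)
assert np.isclose(hs_norm ** 2, np.sum(singular_values ** 2))

# The uniform (operator) norm is the largest singular value, so ||K||_inf <= ||K||_2
uniform_norm = singular_values[0]
assert uniform_norm <= hs_norm + 1e-12
```

Since ‖K‖∞ = ‖K‖₂ forces all but one singular value to vanish, equality has the flavor of the rank-one case.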
5. Give an example of a Hilbert-Schmidt operator for which the spectral radius is very different from the uniform norm.

6. Give an example of a Hilbert-Schmidt operator for which the spectral radius is equal to the uniform norm.

7. Is it possible for a Hilbert-Schmidt operator to have its Hilbert-Schmidt norm equal to its uniform norm? Describe all possible such situations.

6.6 Finite rank operators

Let H be a Hilbert space. A bounded linear transformation K : H → H is said to be finite rank if its range is finite dimensional. The dimension of the range of K is called the rank of K.

A finite rank operator may be represented in the form
  Kf = ∑_j z_j (u_j, f),   (6.35)
where the sum is finite. Thus in L² this is an integral operator with kernel
  k(x, y) = ∑_j z_j(x) \overline{u_j(y)}.   (6.36)
In a more abbreviated notation we could write
  K = ∑_j z_j u_j*.   (6.37)

The adjoint K* of a finite rank operator K is finite rank. The two operators have the same rank. If K and L are finite rank operators, then so is their sum K + L. If K is finite rank and L is bounded, then KL and LK are finite rank. Every finite rank operator is Hilbert-Schmidt and hence compact.

If K is a compact operator, then there exists a sequence of finite rank operators K_n such that ‖K_n − K‖∞ → 0 as n → ∞.

If K is a finite rank operator and μ ≠ 0, then the calculation of (μI − K)⁻¹ or of (I − λK)⁻¹, where λ = 1/μ, may be reduced to a finite-dimensional matrix problem in the finite dimensional space R(K). This is because K leaves R(K) invariant. Therefore if μ = 1/λ is not an eigenvalue of K acting in R(K), then there is an inverse (I − λK)⁻¹ acting in R(K). However this gives a corresponding inverse in the original Hilbert space, by the formula
  (I − λK)⁻¹ = I + (I − λK)⁻¹ λK.   (6.38)
Explicitly, to solve
  u = λKu + f,   (6.39)
write
  u = (I − λK)⁻¹ f = f + (I − λK)⁻¹ λKf.   (6.40)
To solve this, let w be the second term on the right hand side. Then (I − λK)w = λKf, so that u = f + w. Write w = ∑_j a_j z_j. Then
  ∑_j a_j z_j − λ ∑_j z_j (u_j, ∑_r a_r z_r) = λ ∑_j z_j (u_j, f).   (6.41)
Thus
  a_j − λ ∑_r (u_j, z_r) a_r = λ (u_j, f).   (6.42)
This is a matrix equation that may be solved whenever 1/λ is not an eigenvalue of the matrix with entries (u_j, z_r).

6.7 Problems

It may help to recall that the problem of inverting I − λK is the same as the problem of showing that μ = 1/λ is not in the spectrum of K.

1. Consider functions in L²(−∞, ∞). Consider the integral equation
  f(x) − λ ∫_{−∞}^{∞} e^{−|x|−|y|} f(y) dy = g(x).
Find all complex numbers λ for which this equation has a unique solution. Find the solution. Interpret this as a statement about the spectrum of a certain linear operator.

2. Consider functions in L²(−∞, ∞). Consider the integral equation
  f(x) − λ ∫_{−∞}^{∞} cos(x² + y⁴) e^{−|x|−|y|} f(y) dy = g(x).
It is claimed that there exists r > 0 such that for every complex number λ with |λ| < r the equation has a unique solution. Prove or disprove. Interpret this as a statement about the spectrum of a certain linear operator.

3. Consider functions in L²(−∞, ∞). Consider the same integral equation as in the preceding problem. It is claimed that there exists R < ∞ such that for every complex number λ with |λ| > R the equation does not have a unique solution. Prove or disprove. Interpret this as a statement about the spectrum of a certain linear operator.
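The reduction of (I − λK)⁻¹ to a matrix problem can be carried out numerically. The following sketch discretizes L²(0,1) on a midpoint grid and solves u = λKu + f exactly as in the derivation above; the rank-two kernel, grid size, and parameter values are my own illustrative assumptions.

```python
import numpy as np

# Hypothetical rank-2 operator on L^2(0,1): Kf = sum_j z_j (u_j, f)
n = 2000
x = (np.arange(n) + 0.5) / n
dx = 1.0 / n
z = np.array([np.ones(n), x])            # z_j(x)
u_fns = np.array([x, x ** 2])            # u_j(y)
lam = 0.7
f = np.sin(np.pi * x)

def inner(g, h):
    return dx * np.dot(np.conj(g), h)    # (g, h), linear in the second slot

# Reduce (I - lam K)u = f to the matrix equation a_j - lam sum_r (u_j, z_r) a_r = lam (u_j, f)
M = np.array([[inner(uj, zr) for zr in z] for uj in u_fns])
b = np.array([inner(uj, f) for uj in u_fns])
a = np.linalg.solve(np.eye(2) - lam * M, lam * b)
u_sol = f + a @ z                        # u = f + sum_j a_j z_j

# Check the original equation: u - lam K u = f
Ku = sum(zj * inner(uj, u_sol) for zj, uj in zip(z, u_fns))
assert np.allclose(u_sol - lam * Ku, f)
```

The same reduction applies to problem 1 above, whose kernel is rank one.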
4. Consider functions in L²(0, 1). Consider the integral equation
  f(x) − λ ∫₀ˣ f(y) dy = g(x).
Find all complex numbers λ for which this equation has a unique solution. Find the solution. Interpret this as a statement about the spectrum of a certain linear operator. Hint: Differentiate. Solve a first order equation with a boundary condition.

5. Consider functions in L²(0, 1). Consider the integral equation
  f(x) − λ [ ∫₀ˣ y(1 − x) f(y) dy + ∫ₓ¹ x(1 − y) f(y) dy ] = g(x).
Find all complex numbers λ for which this equation has a unique solution. Interpret this as a statement about the spectrum of a certain linear operator. Hint: The integral operator K has eigenfunctions sin(nπx). This should also determine the eigenvalues. Verify this directly.
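For the Volterra equation in problem 4, differentiating suggests the closed-form solution f(x) = g(x) + λ ∫₀ˣ e^{λ(x−y)} g(y) dy, valid for every complex λ, so the spectrum of the Volterra operator consists of zero alone. Here is a hedged numerical check; the grid size, test data, and the simple midpoint quadrature are arbitrary choices.

```python
import numpy as np

n = 4000
x = (np.arange(n) + 0.5) / n
dx = 1.0 / n

def cumint(h):
    """Midpoint-rule quadrature for the cumulative integral of h at the grid points x."""
    return dx * (np.cumsum(h) - h / 2)

g = np.cos(3 * x)
lam = 2.5 + 1.0j   # any complex lambda should work: the Volterra operator has spectrum {0}

# Candidate solution of f - lam * int_0^x f(y) dy = g(x), obtained by differentiating:
f = g + lam * np.exp(lam * x) * cumint(np.exp(-lam * x) * g)

residual = f - lam * cumint(f) - g
assert np.max(np.abs(residual)) < 1e-4
```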
Chapter 7

Densely Defined Closed Operators

7.1 Introduction

This chapter deals primarily with densely defined closed operators. Each everywhere defined bounded operator is in particular a densely defined closed operator. If L is a densely defined closed operator, then so is its adjoint L*. Furthermore, L** = L. If both L and L* have trivial null spaces, then both L⁻¹ and L*⁻¹ are densely defined closed operators.

A complex number λ is in the resolvent set of a densely defined closed operator L if (L − λI)⁻¹ is an everywhere defined bounded operator. A complex number λ is in the spectrum of L if it is not in the resolvent set. A complex number λ is in the spectrum of L if and only if λ̄ is in the spectrum of the adjoint L*.

It is common to divide the spectrum into three disjoint subsets: point spectrum, continuous spectrum, and residual spectrum. (This terminology is misleading, in that it treats limits of point spectrum as continuous spectrum.) In this treatment we divide the spectrum into four disjoint subsets: standard point spectrum, anomalous point spectrum, pseudo-continuous spectrum, and residual spectrum. The adjoint operation maps the first two kinds of spectra into themselves, but it reverses the latter two.

7.2 Subspaces

Let H be a Hilbert space. Let M be a vector subspace of H. The closure M̄ is also a vector subspace of H. The subspace M is said to be closed if M = M̄. The orthogonal complement M⊥ is a closed subspace of H. Furthermore, (M̄)⊥ = M⊥. Finally, M⊥⊥ = M̄.
For a closed subspace M we always have M⊥⊥ = M. A subspace M is dense in H if M̄ = H. This is equivalent to the condition M⊥ = {0}.

7.3 Graphs

Let H be a Hilbert space. The direct sum H ⊕ H of H with itself consists of all ordered pairs [u, v], where u, v are each vectors in H. The inner product of [u, v] with [u′, v′] is
  ([u, v], [u′, v′]) = (u, u′) + (v, v′).   (7.1)
A graph is a linear subspace of H ⊕ H. If we have two graphs L₁ and L₂ and if L₁ ⊂ L₂, then we say that L₁ is a restriction of L₂ or L₂ is an extension of L₁.

If L is a graph, then its domain D(L) is the set of all u in H such that there exists a v with [u, v] in L. Its range R(L) is the set of all v in H such that there exists u with [u, v] in L. If L is a graph, then its null space N(L) is the set of all u in H such that [u, 0] is in L. If L is a graph, then the inverse graph L⁻¹ consists of all [v, u] such that [u, v] is in L.

We say that a graph L is an operator if [0, v] in L implies v = 0. It is easy to see that this is equivalent to saying that N(L⁻¹) = {0} is the zero subspace. When L is an operator and [u, v] is in L, then we write Lu = v. We shall explore the properties of operators in the next section.

Write L̄ for the closure of L. We say L is closed if L = L̄. Perhaps the nicest general class of graphs consists of the closed graphs L.

If L is a graph, then the adjoint graph L* consists of the pairs [w, z] such that for all [u, v] in L we have (z, u) = (w, v). The adjoint L* is itself a graph.

Remark: One way to think of the adjoint graph is to define the negative inverse −L⁻¹ of a graph L to consist of all the ordered pairs [v, −u] with [u, v] in L. Then the adjoint L* is the orthogonal complement in H ⊕ H of −L⁻¹. This says that each pair [w, z] in the graph L* is orthogonal to each [v, −u] with [u, v] in L: the inner product of [w, z] with [v, −u] is (w, v) − (z, u) = 0. [This says to take the graph with negative reciprocal slope, and then take the perpendicular graph to that.]

Another way to think of this is to define the antisymmetric form ω([z, w], [u, v]) = (z, v) − (w, u). Then the adjoint A* of a graph A consists of the orthogonal complement of A with respect to ω.

Theorem. If L is a graph, then the adjoint graph L* is always closed. Furthermore, the adjoint of its closure L̄ is the same as the adjoint of L.

Theorem. If L is a graph, then L** = L̄. In particular, for a closed graph L we have L** = L.

Theorem. If L is a graph, then N(L*) = R(L)⊥.

It is not hard to check that L*⁻¹ = L⁻¹*.
7.4 Operators

If L is a graph, then L is an operator if [0, v] in L implies v = 0. When L is an operator and [u, v] is in L, then we write Lu = v.

Perhaps the nicest general class of operators consists of the densely defined closed operators L. For such an operator the adjoint L* is a densely defined closed operator, and L** = L. When L is densely defined, the definition of the adjoint is that w is in the domain of L* and L*w = z if and only if for all u in the domain of L we have (z, u) = (w, Lu), that is,
  (L*w, u) = (w, Lu).   (7.2)

Corollary. L is densely defined if and only if L* is an operator. An operator L is said to be closable if L̄ is also an operator. For a densely defined closable operator the adjoint L* is a densely defined closed operator, and L** = L̄.

Corollary. Let L be a closed operator. Then R(L)̄ = N(L*)⊥.
Proof: Apply the previous corollary to L* and use L** = L.

This corollary is very important in the theory of linear equations. Suppose that R(L) is a closed subspace of H. Then in this special case the corollary says that R(L) = N(L*)⊥. Thus a linear equation Lu = g has a solution u if and only if g is orthogonal to all solutions v of L*v = 0. This is called the Fredholm alternative.

Theorem. (Closed graph theorem) Let H be a Hilbert space. Let L be a closed operator with domain D(L) = H. Then L is a bounded operator. (The converse is obvious.)

7.5 The spectrum

Let L be a closed, densely defined operator.

0. We say that λ is in the resolvent set of L if N(L − λI) = {0} and R(L − λI) = H. In that case (L − λI)⁻¹ is a closed operator with domain D((L − λI)⁻¹) = H. By the closed graph theorem, (L − λI)⁻¹ is a bounded operator.

We shall usually refer to (L − λI)⁻¹ as the resolvent of L. However in some contexts it is convenient to use instead (λI − L)⁻¹, which is of course just the negative. There is no great distinction between these two possible definitions of resolvent, but it is important to be alert to which one is being used.

We say that λ is in the spectrum of L if it is not in the resolvent set.

1. We say that λ is in the standard point spectrum of L if N(L − λI) ≠ {0} and R(L − λI)⊥ ≠ {0}.
2. We say that λ is in the anomalous point spectrum of L if N(L − λI) ≠ {0} and R(L − λI)⊥ = {0} (that is, R(L − λI) is dense in H).

3. We say that λ is in the pseudo-continuous spectrum of L if N(L − λI) = {0} and R(L − λI)⊥ = {0} (so R(L − λI) is dense in H) but R(L − λI) ≠ H. In that case (L − λI)⁻¹ is a closed operator with dense domain D((L − λI)⁻¹) not equal to H. Thus λ in the pseudo-continuous spectrum of L is equivalent to (L − λI)⁻¹ being a densely defined, closed, but unbounded operator.

4. We say that λ is in the residual spectrum of L if N(L − λI) = {0} and R(L − λI)⊥ ≠ {0}. In that case (L − λI)⁻¹ is a closed operator with a domain that is not dense. Thus λ in the residual spectrum of L is equivalent to (L − λI)⁻¹ being a closed operator that is not densely defined.

The complex number λ is in the point spectrum of L (standard or anomalous) if and only if λ is an eigenvalue of L. For finite dimensional vector spaces only cases 0 and 1 can occur.

Theorem. Let L be a densely defined closed operator and let L* be its adjoint operator. Then:
0. The number λ is in the resolvent set of L if and only if λ̄ is in the resolvent set of L*.
1. The number λ is in the standard point spectrum of L if and only if λ̄ is in the standard point spectrum of L*.
2. The number λ is in the anomalous point spectrum of L if and only if λ̄ is in the residual spectrum of L*.
3. The number λ is in the pseudo-continuous spectrum of L if and only if λ̄ is in the pseudo-continuous spectrum of L*.
4. The number λ is in the residual spectrum of L if and only if λ̄ is in the anomalous point spectrum of L*.

7.6 Spectra of inverse operators

Consider a closed, densely defined operator L with an inverse L⁻¹ that is also a closed, densely defined operator.

Theorem. Let λ ≠ 0. Then λ is in the resolvent set of L if and only if 1/λ is in the resolvent set of L⁻¹. In fact, we have the identity
  (I − λ⁻¹L)⁻¹ + (I − λL⁻¹)⁻¹ = I.   (7.3)

One very important situation is when K = L⁻¹ is a compact operator. Then we know that all nonzero elements μ of the spectrum of K = L⁻¹ are eigenvalues of finite multiplicity, with zero as their only possible accumulation point. It follows that all elements λ = 1/μ of the spectrum of L are eigenvalues of finite multiplicity, with infinity as their only possible accumulation point.
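The theorem above can be checked concretely in finite dimensions: for an invertible matrix, the spectra of L and L⁻¹ are reciprocal, and the identity (I − λ⁻¹L)⁻¹ + (I − λL⁻¹)⁻¹ = I can be verified directly. This is a hedged sketch; the particular matrix and λ are my own illustrative assumptions.

```python
import numpy as np

L = np.array([[2., 1, 0],
              [0, 3, 1],
              [1, 0, 5]])     # invertible: det = 31
Linv = np.linalg.inv(L)
lam = 0.3 + 0.7j              # a point in the resolvent set of L
I = np.eye(3)

# each eigenvalue of L^{-1} is the reciprocal of an eigenvalue of L
eigs_L = np.linalg.eigvals(L)
eigs_Linv = np.linalg.eigvals(Linv)
for e in eigs_Linv:
    assert np.min(np.abs(1 / eigs_L - e)) < 1e-8

# the identity (I - lam^{-1} L)^{-1} + (I - lam L^{-1})^{-1} = I
lhs = np.linalg.inv(I - L / lam) + np.linalg.inv(I - lam * Linv)
assert np.allclose(lhs, I)
```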
7.7 Problems

It may help to recall: if K is a bounded everywhere defined operator, and if both K and K* have trivial null spaces, then L = K⁻¹ is a closed densely defined operator.

1. Suppose that K is a bounded everywhere defined operator and that K and K* have trivial null spaces. Show that L = K⁻¹ exists and is closed and densely defined. Also, show that L* = K*⁻¹ is closed and densely defined.

2. If K is a closed densely defined operator, and if both K and K* have trivial null spaces, show that L = K⁻¹ exists and is closed and densely defined.

3. Let K = L⁻¹ be as above. Let λ ≠ 0 and let μ = 1/λ. Find a formula relating the resolvent (L − λ)⁻¹ to the resolvent (K − μ)⁻¹.

4. Consider functions in L²(0, 1). Consider the integral operator K given by
  (Kf)(x) = ∫₀ˣ f(y) dy.
Suppose that K has trivial null space, so that L = K⁻¹ is a closed densely defined operator. Describe the domain of L, and describe how L acts on the elements of this domain. Be explicit about boundary conditions. Describe the domain of L* and how L* acts on the elements of this domain. Find the resolvent (L − λ)⁻¹ of L. Also, find the spectrum of L. Hint: Differentiate. Solve a first order linear ordinary differential equation.

5. Consider functions in L²(0, 1). Consider the integral operator K given by
  (Kf)(x) = ∫₀ˣ y(1 − x) f(y) dy + ∫ₓ¹ x(1 − y) f(y) dy,
so that L = K⁻¹ is a self-adjoint operator. Describe the domain of L, and describe how L acts on the elements of this domain. Be explicit about boundary conditions. Find the resolvent (L − λ)⁻¹ of L. Also, find the spectrum of L. Hint: Differentiate twice. Solve the inhomogeneous equation by variation of parameters. Use sin(√λ x) and sin(√λ (1 − x)) as a basis for the solutions of a homogeneous second order linear ordinary differential equation.

6. Let K be a compact operator such that K and K* have trivial null spaces, so that L = K⁻¹ is a closed densely defined operator. Prove that the spectrum of L = K⁻¹ consists of isolated eigenvalues of finite multiplicity. To what extent does this result apply to the examples in the previous problems?

7. Let K be a compact self-adjoint operator with trivial null space, so that L = K⁻¹ is a self-adjoint operator. Prove that there exists an orthogonal basis consisting of eigenvectors of L. To what extent does this result apply to the examples in the previous problems?
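The kernel in problem 5 is the Green's function of −d²/dx² with Dirichlet boundary conditions, so K should have eigenvalues 1/(n²π²) with eigenfunctions sin(nπx), and L = K⁻¹ should have eigenvalues n²π². A hedged numerical check by discretizing the kernel (the grid size and tolerances are arbitrary choices):

```python
import numpy as np

# Discretize the kernel k(x,y) = y(1-x) for y <= x, x(1-y) for y >= x, on a midpoint grid.
m = 1000
x = (np.arange(m) + 0.5) / m
dx = 1.0 / m
X, Y = np.meshgrid(x, x, indexing='ij')
k = np.where(Y <= X, Y * (1 - X), X * (1 - Y))
K = k * dx                                  # discretized integral operator (symmetric)

mu = np.sort(np.linalg.eigvalsh(K))[::-1]   # eigenvalues, largest first
# Expect eigenvalues 1/(n^2 pi^2), matching eigenfunctions sin(n pi x)
expected = 1 / (np.arange(1, 6) ** 2 * np.pi ** 2)
assert np.allclose(mu[:5], expected, rtol=1e-3)
```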
7.8 Self-adjoint operators

An operator A is self-adjoint if A = A*. A self-adjoint operator is automatically closed and densely defined. (Every adjoint A* is automatically closed. If A* is an operator, then A is densely defined.) If a self-adjoint operator has trivial null space, then its inverse is also a self-adjoint operator.

It is difficult to do algebraic operations with closed, densely defined operators, because their domains may differ. It is always true that (zL)* = z̄L*. If K is bounded everywhere defined, then (L + K)* = L* + K* and (K + L)* = K* + L*. Furthermore, if K is bounded everywhere defined, then (KL)* = L*K*.

If L is a closed, densely defined operator, then L*L is defined on the domain consisting of all u in D(L) such that Lu is in D(L*). It is not obvious that this is closed and densely defined, much less that it is self-adjoint. Furthermore, LL* is an operator with LL* ⊂ (LL*)*. However this is all a consequence of the following theorem.

Theorem. If L is a closed and densely defined operator, then L*L is a self-adjoint operator.

Proof: If L is a closed and densely defined operator, then L* is also a closed and densely defined operator. The Hilbert space H ⊕ H may be written as the direct sum of the two closed graphs −L⁻¹ and L*. Therefore an arbitrary [0, h] for h in H may be written as the sum [0, h] = [−Lf, f] + [g, L*g]. This says that 0 = −Lf + g and h = f + L*g. As a consequence, h = f + L*Lf. We have shown that for each h we can solve (I + L*L)f = h. Furthermore, by properties of projections we have ‖Lf‖² + ‖f‖² ≤ ‖h‖², and in particular ‖f‖² ≤ ‖h‖². Thus (I + L*L)⁻¹ is everywhere defined and is a bounded operator with norm bounded by one. Since L*L ⊂ (L*L)*, we have (I + L*L)⁻¹ ⊂ (I + L*L)⁻¹*. It follows that (I + L*L)⁻¹ = (I + L*L)⁻¹* is a self-adjoint operator. The conclusion follows.

7.9 First order differential operators with a bounded interval: point spectrum

In this section we shall see examples of operators with no spectrum at all. However we shall also see a very pretty and useful example of an operator with standard point spectrum. This operator is the one behind the theory of Fourier series.

Example 1A: This example is one where the correct number of boundary conditions are imposed. In the case of a first order differential operator this number is one. Let H be the Hilbert space L²(0, 1). Let L₀ be the operator d/dx acting on functions of the form f(x) = ∫₀ˣ g(y) dy where g is in H. The value of L₀ on such a function is g(x). Notice that functions in the domain of L₀ automatically satisfy the boundary condition f(0) = 0.

This is an example of a closed operator. The reason is that (L₀⁻¹ g)(x) = ∫₀ˣ g(y) dy. This is a bounded operator defined on the entire Hilbert space. So L₀ and L₀⁻¹ are both closed, densely defined operators.
The domain of L₀ consists precisely of the functions in the range of K₀ = L₀⁻¹. Explicitly, this is all functions of the form
  u(x) = ∫₀ˣ f(y) dy,   (7.4)
where f is in L²(0, 1). These functions u need not be C¹. They belong to a larger class of functions that are indefinite integrals of L² functions. Such functions are continuous, but they may have slope discontinuities; the derivative exists except possibly on a set of measure zero.

This is one important though somewhat technical point. The functions u of course satisfy the boundary condition u(0) = 0. However L² functions such as f are defined only up to sets of measure zero. So what does u(0) mean? After all, the set consisting of 0 alone is of measure zero. Now for a really picky question: if u is also regarded as an L² function, then it is also defined only up to sets of measure zero. The answer is that the general indefinite integral is a function of the form
  u(x) = ∫₀ˣ f(y) dy + C.   (7.5)
Among all L² functions given by such an integral expression, there is a subclass of those for which C = 0. These are the ones satisfying the boundary condition u(0) = 0. There is another subclass, for which C = −∫₀¹ f(y) dy. These are the ones satisfying u(1) = 0. So this is not a problem.

Example 1B: Here is another example with the correct number of boundary conditions. Let L₁ be the operator d/dx acting on functions of the form f(x) = −∫ₓ¹ g(y) dy where g is in H. The value of L₁ on such a function is g(x). Notice that functions in the domain of L₁ automatically satisfy the boundary condition f(1) = 0.

The operators L₀ and L₁ each have one boundary condition. They are negative adjoints of each other. To see this, consider the operator L₀⁻¹. The adjoint of the inverse is given by
  (L₀⁻¹* h)(x) = ∫ₓ¹ h(y) dy.
It follows that L₀* is the operator −L₁.

For instance, the resolvent of L₀ is given by
  ((L₀ − λI)⁻¹ g)(x) = ∫₀ˣ e^{λ(x−y)} g(y) dy
for every complex λ. So there are no points at all in the spectrum of L₀. Each of L₀ and L₁ has a spectral theory, but the spectral information is in some sense located all at infinity. In the example where K₀ is the integration operator L₀⁻¹, this operator has spectrum consisting of the point zero. This is, by the way, an example of pseudo-continuous spectrum, but it is extremely pathological: all the spectral information is hidden at this one point.

A densely defined operator L is said to be self-adjoint if L = L*. Similarly, L is said to be skew-adjoint if L = −L*.
Thus there are various correct choices of boundary conditions, but they may have different spectral properties.

The next example has periodic boundary conditions. Let H be the Hilbert space L²(0, 1). Let L= be the operator d/dx acting on functions of the form f(x) = ∫₀ˣ g(y) dy + C where g is in H and ∫₀¹ g(y) dy = 0. The value of L= on such a function is g(x). Notice that functions in the domain of L= automatically satisfy the boundary condition f(0) = f(1).

The operator L= is skew-adjoint. Since it is skew-adjoint, it has an extraordinarily nice spectral theory. The spectrum consists of the numbers 2πin, and it is all point spectrum. The corresponding eigenvectors form a basis that gives the Fourier series expansion of an arbitrary periodic function with period one.

Example 2: The following examples illustrate what goes wrong when one imposes the wrong number of boundary conditions. Let L be the operator d/dx acting on functions of the form f(x) = ∫₀ˣ g(y) dy + C, where g is in H. The value of L on such a function is g(x). Notice that functions in the domain of L need not satisfy any boundary conditions. It has too few boundary conditions. The spectrum of L consists of the entire complex plane, and it is all point spectrum.

Similarly, let L01 be the operator d/dx acting on functions of the form f(x) = ∫₀ˣ g(y) dy where g is in H and ∫₀¹ g(y) dy = 0. The value of L01 on such a function is g(x). Notice that functions in the domain of L01 automatically satisfy the boundary conditions f(0) = 0 and f(1) = 0. This is too many boundary conditions. The adjoint of L01 is the operator −L. The spectrum of L01 also consists of the entire complex plane, and it is all residual spectrum.

The resolvent of L= is
  ((L= − λI)⁻¹ g)(x) = 1/(1 − e^λ) ∫₀ˣ e^{λ(x−y)} g(y) dy − 1/(1 − e^{−λ}) ∫ₓ¹ e^{−λ(y−x)} g(y) dy.
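The resolvent formula for L= can be tested on an eigenfunction g(x) = e^{2πikx}, for which (L= − λI)⁻¹g must equal g/(2πik − λ). Below is a hedged sketch; the grid, k, λ, and the simple midpoint quadrature are my own illustrative choices.

```python
import numpy as np

n = 4000
x = (np.arange(n) + 0.5) / n
dx = 1.0 / n
k, lam = 3, 0.7 + 0.2j
g = np.exp(2j * np.pi * k * x)

def cumint_from_0(h):
    # quadrature for int_0^x h(y) dy at the midpoints x
    return dx * (np.cumsum(h) - h / 2)

def cumint_to_1(h):
    # quadrature for int_x^1 h(y) dy at the midpoints x
    s = np.cumsum(h[::-1])[::-1]
    return dx * (s - h / 2)

# the resolvent formula for L= applied to g
h = np.exp(-lam * x) * g
f = np.exp(lam * x) * (cumint_from_0(h) / (1 - np.exp(lam))
                       - cumint_to_1(h) / (1 - np.exp(-lam)))

# since g is an eigenfunction of d/dx with eigenvalue 2*pi*i*k:
expected = g / (2j * np.pi * k - lam)
assert np.max(np.abs(f - expected)) < 1e-4
```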
The operator L01 has a boundary condition at 0 and at 1, while the operator L has no boundary conditions; each of these operators is the negative of the adjoint of the other. Remark: The operators L₀ and L₁ satisfy L01 ⊂ L₀ ⊂ L and L01 ⊂ L₁ ⊂ L. Similarly, the operator L= has L01 ⊂ L= ⊂ L.

A densely defined operator L is said to be Hermitian if L ⊂ L*. This is simply the algebraic property that (Lu, w) = (u, Lw) for all u, w in D(L). Similarly, L is said to be skew-Hermitian if L ⊂ −L*. From this we see that L01 is skew-Hermitian.

7.10 Spectral projection and reduced resolvent

Consider a closed densely defined operator L. In this section we shall assume that the eigenvectors of L span the entire Hilbert space. Consider also an isolated eigenvalue λₙ. The spectral projection corresponding to λₙ is a (not necessarily orthogonal) projection onto the corresponding eigenspace. It is given in terms of the resolvent by
  Pₙ = lim_{λ→λₙ} (λₙ − λ)(L − λI)⁻¹.   (7.6)
This is the negative of the residue of the resolvent at λₙ. The reason this works is that L = ∑_m λ_m P_m and consequently
  (L − λI)⁻¹ = ∑_m 1/(λ_m − λ) P_m,   (7.7)
at least in the case under consideration, when the eigenvectors span the entire Hilbert space.

The reduced resolvent corresponding to λₙ is defined as the operator Sₙ that inverts (L − λₙI) in the range of (I − Pₙ) and is zero in the range of Pₙ. It is a solution of the equations
  Sₙ Pₙ = Pₙ Sₙ = 0   (7.8)
and
  (L − λₙI) Sₙ = I − Pₙ.   (7.9)
It may be expressed in terms of the resolvent by
  Sₙ = lim_{λ→λₙ} (L − λI)⁻¹ (I − Pₙ).   (7.10)
When the eigenvectors span, this is
  Sₙ = ∑_{m≠n} 1/(λ_m − λₙ) P_m.   (7.11)

Example: Take the skew-adjoint operator L= = d/dx acting in L²(0, 1) with periodic boundary conditions. The spectral projection corresponding to eigenvalue 2πin is the self-adjoint operator
  (Pₙ g)(x) = ∫₀¹ exp(2πin(x − y)) g(y) dy.   (7.12)
The reduced resolvent corresponding to eigenvalue 0 is the skew-adjoint operator
  (S₀ g)(x) = ∫₀ˣ (1/2 − x + y) g(y) dy + ∫ₓ¹ (−1/2 − x + y) g(y) dy.   (7.13)

7.11 Generating second-order self-adjoint operators

This section exploits the theorem that says that if L is an arbitrary closed densely defined operator, then L*L is a self-adjoint operator. Remember that L** = L, so LL* is also a self-adjoint operator.
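Before turning to differential operators, the spectral projection and reduced resolvent formulas of the preceding section can be sanity-checked on a matrix. This is a hedged finite-dimensional sketch: the matrix, its eigenvalues, and the numerical tolerances are my own illustrative assumptions.

```python
import numpy as np

# L = S D S^{-1} has distinct eigenvalues {1, 2, 3.5, -1} and
# (not necessarily orthogonal) spectral projections P_m, with L = sum_m lam_m P_m.
D = np.diag([1.0, 2.0, 3.5, -1.0])
S = np.array([[1., 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1]])
L = S @ D @ np.linalg.inv(S)
lam, V = np.linalg.eig(L)
W = np.linalg.inv(V)
P = [np.outer(V[:, m], W[m, :]) for m in range(4)]

n, I = 0, np.eye(4)
# P_n as the limit (lam_n - lam)(L - lam I)^{-1} as lam -> lam_n
eps = 1e-7
approx = -eps * np.linalg.inv(L - (lam[n] + eps) * I)
assert np.allclose(approx, P[n], atol=1e-5)

# reduced resolvent S_n = sum_{m != n} P_m / (lam_m - lam_n)
Sn = sum(P[m] / (lam[m] - lam[n]) for m in range(4) if m != n)
assert np.allclose(Sn @ P[n], 0) and np.allclose(P[n] @ Sn, 0)
assert np.allclose((L - lam[n] * I) @ Sn, I - P[n])
```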
It would be easy to conclude that first order differential operators such as L01 and its adjoint −L are of no interest for spectral theory. This is not the case, as we shall see. From the general theorem, LL01 and L01L are self-adjoint second-order differential operators. The operator LL01 is the operator −d²/dx² with Dirichlet boundary conditions u(0) = 0 and u(1) = 0. The operator L01L is the operator −d²/dx² with Neumann boundary conditions u′(0) = 0 and u′(1) = 0. These, as we shall see, have a nice spectral theory. It is amusing to work out other self-adjoint second order differential operators that may be generated from the first order differential operators of the preceding sections.

7.12 First order differential operators with a semi-infinite interval: residual spectrum

In this section we shall see examples of operators with anomalous point spectrum and residual spectrum. These operators underlie the theory of the Laplace transform.

Example: Let H be the Hilbert space L²(0, ∞). Let L₀ be the operator d/dx acting on functions f in H of the form f(x) = ∫₀ˣ g(y) dy where g is in H. The value of L₀ on such a function is g(x). Notice that functions in the domain of L₀ automatically satisfy the boundary condition f(0) = 0.

If λ < 0, then we can always find a solution of the equation (L₀ − λI)f = g. This solution is
  f(x) = ∫₀ˣ e^{λ(x−y)} g(y) dy.   (7.14)
This equation defines the bounded operator (L₀ − λI)⁻¹ that sends g into f, at least when λ < 0. On the other hand, if λ > 0, then the formula gives a result in L² only if g is orthogonal to e^{−λ̄x}. Thus λ > 0 corresponds to residual spectrum of L₀.

Again let H be the Hilbert space L²(0, ∞). Let L be the operator d/dx with no boundary condition. The relation between these two operators is L₀* = −L.

If λ > 0, then we can always find a solution of the equation (L − λI)f = g. This solution is
  f(x) = −∫ₓ^∞ e^{λ(x−y)} g(y) dy.   (7.15)
This equation defines the bounded operator (L − λI)⁻¹ that sends g into f, at least when λ > 0. If λ < 0, then we have point spectrum of L, since e^{λx} is then an eigenfunction belonging to L²(0, ∞). This corresponds to the fact that (L₀ − λI)⁻¹* = (−L − λ̄I)⁻¹, which is easy to check directly.

7.13 First order differential operators with an infinite interval: continuous spectrum

In this section we shall see an example of an operator with continuous spectrum. This is the example that underlies the theory of the Fourier transform.
Example: Let H be the Hilbert space L²(−∞, ∞). Let L be the operator d/dx acting on functions f in H with derivatives f′ in H. The operator L is skew-adjoint, that is, L* = −L.

If λ < 0, then we can always find a solution of the equation (L − λI)f = g. This solution is
  f(x) = ∫_{−∞}^{x} e^{λ(x−y)} g(y) dy.   (7.16)
This equation defines the bounded operator (L − λI)⁻¹ that sends g into f, at least when λ < 0.

If λ > 0, then we can always find a solution of the equation (L − λI)f = g. This solution is
  f(x) = −∫ₓ^∞ e^{λ(x−y)} g(y) dy.   (7.17)
This equation defines the bounded operator (L − λI)⁻¹ that sends g into f, at least when λ > 0. This corresponds to the fact that (L − λI)⁻¹* = (−L − λ̄I)⁻¹, which is easy to check directly.

7.14 Problems

1. Let H = L²(0, 1). Let L = −i d/dx with periodic boundary conditions. Find the eigenvalues and eigenvectors of L. Find an explicit formula for (λ − L)⁻¹g. Hint: Solve the first order ordinary differential equation (λ − L)f = g with the boundary condition f(0) = f(1).

2. For each eigenvalue λₙ, find the residue Pₙ of (λ − L)⁻¹ at λₙ.

3. Find the explicit form of the formula g = ∑ₙ Pₙ g.

4. Let H = L²(−∞, ∞). Let L = −i d/dx. Let k be real and ε > 0. Find an explicit formula for (L − k − iε)⁻¹g. Also, find an explicit formula for (L − k + iε)⁻¹g.

5. Find the explicit form of the expression
  δ_ε(L − k)g = (1/2πi) [(L − k − iε)⁻¹ − (L − k + iε)⁻¹] g.
Find the explicit form of the formula
  g = ∫_{−∞}^{∞} δ_ε(L − k)g dk.

6. Let ε → 0. Find the explicit form of the formula
  g = ∫_{−∞}^{∞} δ(L − k)g dk.
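For the periodic problems above, the residues Pₙ are the Fourier projections (Pₙg)(x) = e^{2πinx} ∫₀¹ e^{−2πiny} g(y) dy, and g = ∑ₙ Pₙg is the Fourier series expansion. The following is a hedged numerical check of this expansion; the test function, grid, and truncation level are arbitrary choices, and for a smooth periodic function a modest truncation already reproduces g to high accuracy.

```python
import numpy as np

m = 512
x = np.arange(m) / m
g = np.exp(np.cos(2 * np.pi * x))          # a smooth function of period one

# (P_n g)(x) = e^{2 pi i n x} * int_0^1 e^{-2 pi i n y} g(y) dy, by trapezoid rule
modes = range(-20, 21)
coeffs = [np.mean(np.exp(-2j * np.pi * n * x) * g) for n in modes]
partial = sum(c * np.exp(2j * np.pi * n * x) for c, n in zip(coeffs, modes))

assert np.max(np.abs(partial - g)) < 1e-8   # g = sum_n P_n g, up to truncation
```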
7.15 A pathological example

Consider H = L²(ℝ) and fix g in H. Define K as the integral operator with kernel k(x, y) = g(x)δ(y). Consider the domain D(K) to be the set of all continuous functions in H. Then
  (Ku)(x) = g(x) u(0).   (7.18)

This K is densely defined but not closed. It has a closure K̄, but K̄ is not an operator. To see this, let uₙ → u and Kuₙ = uₙ(0)g → v. We can take uₙ → 0 in the Hilbert space sense, yet with each uₙ(0) = C. So this gives the pair [0, Cg] in the graph K̄, which is therefore not an operator.

Consider the adjoint K*. Formally this would be the integral operator with kernel δ(x) times the conjugate of g(y), so that
  (K*w)(x) = δ(x) ∫_{−∞}^{∞} \overline{g(y)} w(y) dy.   (7.19)
Since
  ∫_{−∞}^{∞} δ(x)² dx = +∞,   (7.20)
the δ(x) is not in L². Hence the domain of K* consists of all w with (g, w) = 0, and K* = 0 on this domain. This is an operator that is closed but not densely defined. According to the general theory, its adjoint is K** = K̄, which, as we have seen, is not an operator.
Chapter 8

Normal Operators

8.1 Spectrum of a normal operator

Theorem (von Neumann). Let L be a densely defined closed operator. Then L*L and LL* are each self-adjoint operators.

A densely defined closed operator is said to be normal if L*L = LL*. There are three particularly important classes of normal operators:

1. A self-adjoint operator is an operator L with L* = L. For a self-adjoint operator the spectrum is on the real axis.

2. A skew-adjoint operator is an operator L with L* = −L. For a skew-adjoint operator the spectrum is on the imaginary axis.

3. A unitary operator is an operator L with L* = L⁻¹. A unitary operator is bounded. For a unitary operator the spectrum is on the unit circle.

For normal operators there is a different classification of spectrum. Let L be a normal operator acting in a Hilbert space H. The point spectrum consists of the eigenvalues of L. The corresponding eigenvectors span a closed subspace M_p of the Hilbert space. The spectrum of L in this space consists of either what we have previously called standard point spectrum or of what we have previously called pseudo-continuous spectrum, since it consists of limits of point spectrum. This kind of pseudo-continuous spectrum is not really continuous at all.

Let M_c be the orthogonal complement in H of M_p. Then the spectrum of L restricted to M_c is called the continuous spectrum of L. In our previous classification the spectrum of L in this space would be pseudo-continuous spectrum. With this classification for normal operators the point spectrum and continuous spectrum can overlap. But they really have nothing to do with each other, since they take place in orthogonal subspaces.

Spectral theorem for compact normal operators. Let K be a compact normal operator. (This includes the cases of self-adjoint operators and skew-adjoint operators.) Then K has an orthogonal basis of eigenvectors. The nonzero eigenvalues have finite multiplicity. The only possible accumulation point of eigenvalues is zero.

Spectral theorem for normal operators with compact resolvent. Let L be a normal operator with compact resolvent. (This includes the cases of self-adjoint operators and skew-adjoint operators.) Then L has an orthogonal basis of eigenvectors. The eigenvalues have finite multiplicity. The only possible accumulation point of eigenvalues is infinity.

8.2 Problems

1. Perhaps the most beautiful self-adjoint operator is the spherical Laplacian
  Δ_S = (1/sin θ) ∂/∂θ (sin θ ∂/∂θ) + (1/sin²θ) ∂²/∂φ².
Show by explicit computation that this is a Hermitian operator acting on L² of the sphere with surface measure sin θ dθ dφ. Pay explicit attention to what happens at the north pole and south pole when one integrates by parts.

2. Let r be the radius satisfying r² = x² + y² + z². Let
  L = r ∂/∂r
be the Euler operator. Let p be a polynomial in x, y, z that is harmonic and homogeneous of degree ℓ. Thus Δp = 0 and Lp = ℓp. Such a p is called a solid spherical harmonic. Show that the Laplace operator
  Δ = ∂²/∂x² + ∂²/∂y² + ∂²/∂z²
is related to L and Δ_S by
  Δ = (1/r²) [L(L + 1) + Δ_S].

3. The restriction of a solid spherical harmonic to the sphere r² = 1 is called a surface spherical harmonic. Show that each surface spherical harmonic is an eigenfunction of Δ_S, and find the corresponding eigenvalue as a function of ℓ.

4. Show that surface spherical harmonics for different values of ℓ are orthogonal in the Hilbert space of L² functions on the sphere.

5. The surface spherical harmonics are the eigenfunctions of Δ_S. The dimension of the eigenspace indexed by ℓ is 2ℓ + 1. For ℓ = 0 the eigenspace is spanned by 1. For ℓ = 1 it is spanned by z, x + iy, and x − iy. For ℓ = 2 it is spanned by 3z² − r², z(x + iy), z(x − iy), (x + iy)²,
The general solution of the homogeneous equation Lu = 0 is u(x) = c1 u1 (x). then c1 (x)u1 (x) + c2 (x)u2 (x) = 0. This system of two linear equations is easily solved. What are they? 8. For = 3 it is spanned by 5z 3 − 3zr2 . This is the ﬁrst equation. The method of variation of parameters gives a solution of the inhomogeneous equation Lu = f in the form u(x) = c1 (x)u1 (x).3. In the case = 1 we can write the general spherical harmonic as ax+by+cz. The condition on the parameter is given by plugging u into Lu = f . where c1 and c2 are parameters. The solution is c1 (x) = f (x)/(p(x)u1 (x)). z(x − iy)2 .2) Furthermore. If this is to be so. b u (x) = x u1 (x)u2 (y) f (y) dy + p(y)w(y) x a u2 (x)u1 (y) f (y) dy. (5z 2 − r2 )(x + iy). where c1 is a parameter. p(y)w(y) (8. Express the corresponding surface spherical harmonics in spherical coordinates. VARIATION OF PARAMETERS AND GREEN’S FUNCTIONS 107 and (x − iy)2 . The solution is c1 (x) = −u2 (x)f (x)/(p(x)w(x)) and c2 (x) = u1 (x)f (x)/(p(x)w(x)). The only diﬃcult part is to integrate this to get the general solution x u(x) = a u1 (x) f (y) dy + Cu1 (x). p(y)u1 (y) (8. u (x) = c1 (x)u1 (x) + c2 (x)u2 (x). it has the property that the derivative has the same form. (5z 2 − r2 )(x − iy). A solution of Lu = f is thus b u(x) = x u1 (x)u2 (y) f (y) dy + p(y)w(y) x a u2 (x)u1 (y) f (y) dy. z(x + iy)2 .3) .1) Now look at second order linear ordinary diﬀerential operators. The second equation is given by plugging u into Lu = f . Not only that. Let u1 be a nonzero solution of the homogeneous equation Lu = 0. In the case = 2 we can write it as ax2 + by 2 + cz 2 + dxy + eyz + f zx with an additional condition on the coeﬃcients. (x − iy)3 .3 Variation of parameters and Green’s functions First look at ﬁrst order linear ordinary diﬀerential operators. The method of variation of parameters gives a solution of the inhomogeneous equation Lu = f in the form u(x) = c1 (x)u1 (x) + c2 (x)u2 (x). 
This gives p(x)(c1 u1 (x) + c2 (x)u2 (x)) = f (x). Let Lu = p(x)u + r(x)u + q(x)u.8. 6. that is. The general solution of the homogeneous equation Lu = 0 is u(x) = c1 u1 (x) + c2 u2 (x). Let Lu = p(x)u + r(x)u. This gives p(x)c1 (x)u1 (x) = f (x). Let w(x) = u1 (x)u2 (x) − u2 (x)u1 (x). What is this condition? In the case = 3 we can write it as a1 x3 + b1 y 2 x + c1 z 2 x + a2 y 3 + b2 z 2 y + c2 x2 y + a3 z 3 + b3 x2 z + c3 y 2 z + dxyz with additional conditions. Let u1 and u2 be independent solutions of the homogeneous equation Lu = 0. (x + iy)3 . p(y)w(y) (8.
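The variation of parameters formula can be spot-checked numerically. The following sketch (not part of the original notes) takes L = −d²/dx² on (0, 1) with u1(x) = x and u2(x) = 1 − x, so that p = −1, w = −1, and p(y)w(y) = 1; for f = 1 the resulting solution with boundary values 0 at both ends should be x(1 − x)/2.

```python
# Variation of parameters for L = -d^2/dx^2 on (0,1): a numerical check.
# u1(x) = x vanishes at 0, u2(x) = 1 - x vanishes at 1, p(x) = -1,
# and the Wronskian w = u1 u2' - u2 u1' = -1, so p(y) w(y) = 1.

def solve(f, x, n=2000):
    """u(x) = u1(x) * int_x^1 u2(y) f(y) dy + u2(x) * int_0^x u1(y) f(y) dy,
    evaluated with the trapezoid rule (p*w = 1 here)."""
    def trap(g, a, b):
        h = (b - a) / n
        return (0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n))) * h
    u1 = lambda t: t
    u2 = lambda t: 1.0 - t
    return u1(x) * trap(lambda y: u2(y) * f(y), x, 1.0) + \
           u2(x) * trap(lambda y: u1(y) * f(y), 0.0, x)

# For f = 1 the exact solution of -u'' = 1, u(0) = u(1) = 0 is u(x) = x(1-x)/2.
f = lambda y: 1.0
for x in (0.25, 0.5, 0.75):
    exact = x * (1 - x) / 2
    assert abs(solve(f, x) - exact) < 1e-6
```

The two integrals automatically build in the boundary conditions, since the term multiplying u1 dominates near x = 0 and the term multiplying u2 dominates near x = 1.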
This can also be written

u(x) = (Kf)(x) = ∫_a^b k(x, y) f(y) dy,  (8.4)

where

k(x, y) = u1(x)u2(y)/(p(y)w(y)) if x < y,
k(x, y) = u2(x)u1(y)/(p(y)w(y)) if x > y.  (8.5)

Notice that u(a) = A u1(a) and u′(a) = A u1′(a), while u(b) = B u2(b) and u′(b) = B u2′(b). So this form of the solution is useful for specifying boundary conditions at a and b. The general solution is obtained by adding an arbitrary linear combination C1 u1(x) + C2 u2(x). However often we want the particular solution with boundary conditions at a and b. Then we use the form above.

Sometimes one thinks of y as a fixed source and writes the equation

L_x k(x, y) = δ(x − y).  (8.6)

Of course this is just another way of saying that LK = I.

8.4 Second order differential operators with a bounded interval: point spectrum

Example. Consider the selfadjoint differential operator L_D = −d²/dx² on L²(0, 1) with Dirichlet boundary conditions f(0) = 0 and f(1) = 0 at 0 and 1. Take the solutions u1(x) = sin(√λ x)/√λ and u2(x) = sin(√λ (1 − x))/√λ. These are defined in a way that does not depend on which square root of λ is taken. (Furthermore, they have obvious values in the limit λ → 0.) Then p(x) = −1 and w(x) = −sin(√λ)/√λ, which also does not depend on the cut. The resolvent is thus ((L_D − λ)⁻¹ g)(x) = f(x), where

f(x) = 1/(√λ sin(√λ)) [∫_0^x sin(√λ (1 − x)) sin(√λ y) g(y) dy + ∫_x^1 sin(√λ x) sin(√λ (1 − y)) g(y) dy].  (8.7)

The spectrum consists of the points λ = n²π² for n = 1, 2, 3, .... This is standard point spectrum. It is amusing to work out the spectral projection at the eigenvalue n²π². This is the negative of the residue and is explicitly

(Pn g)(x) = 2 ∫_0^1 sin(nπx) sin(nπy) g(y) dy.  (8.8)

Example. Consider the selfadjoint differential operator L_N = −d²/dx² on L²(0, 1) with Neumann boundary conditions f′(0) = 0 and f′(1) = 0 at 0 and 1. Take the solutions u1(x) = cos(√λ x) and u2(x) = cos(√λ (1 − x)). Then p(x) = −1 and w(x) = √λ sin(√λ). This also does not depend on the cut. The resolvent is thus

((L_N − λ)⁻¹ g)(x) = −1/(√λ sin(√λ)) [∫_0^x cos(√λ (1 − x)) cos(√λ y) g(y) dy + ∫_x^1 cos(√λ x) cos(√λ (1 − y)) g(y) dy].  (8.9)

The spectrum consists of the points λ = n²π² for n = 0, 1, 2, 3, .... This is standard point spectrum. The spectral projection at the eigenvalue n²π² for n = 1, 2, 3, ... is

(Pn g)(x) = 2 ∫_0^1 cos(nπx) cos(nπy) g(y) dy.  (8.10)

For n = 0 it is

(P0 g)(x) = ∫_0^1 g(y) dy.  (8.11)

It is interesting to compute the reduced resolvent of L_N at the eigenvalue 0. Thus we must compute (L_N − λ)⁻¹ ḡ, where ḡ = (1 − P0)g has zero average, and then let λ approach zero. This is easy: expand the cosine functions to second order. This gives

(S0 ḡ)(x) = ∫_0^x ((1/2)(1 − x)² + (1/2)y²) ḡ(y) dy + ∫_x^1 ((1/2)x² + (1/2)(1 − y)²) ḡ(y) dy.  (8.12)

From this it is easy to work out that

(S0 g)(x) = ∫_0^x ((1/2)(1 − x)² + (1/2)y² − 1/6) g(y) dy + ∫_x^1 ((1/2)x² + (1/2)(1 − y)² − 1/6) g(y) dy.  (8.13)

The constant terms may be neglected when acting on ḡ, since they are orthogonal to ḡ.

8.5 Second order differential operators with a semibounded interval: continuous spectrum

Example. Consider the selfadjoint differential operator L_D = −d²/dx² on L²(0, ∞) with Dirichlet boundary condition f(0) = 0 at 0. Take the solutions u1(x) = sinh(√−λ x)/√−λ and u2(x) = e^{−√−λ x}. In u2(x) the square root is taken to be cut on the negative axis. Since sinh(iz) = i sin(z), this u1(x) is the same as before. Then p(x) = −1 and w(x) = −1. The resolvent is

((L_D − λ)⁻¹ g)(x) = 1/√−λ [∫_0^x e^{−√−λ x} sinh(√−λ y) g(y) dy + ∫_x^∞ sinh(√−λ x) e^{−√−λ y} g(y) dy].  (8.14)

The spectrum consists of the positive real axis and is continuous. It is instructive to compute also the resolvent of the selfadjoint differential operator L_N = −d²/dx² on L²(0, ∞) with Neumann boundary condition f′(0) = 0 at 0. Again the spectrum consists of the positive real axis and is continuous.
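As a quick numerical illustration (not in the original notes), one can check with a centered second difference that sin(nπx) is annihilated by L_D − n²π², which is where the point spectrum λ = n²π² comes from.

```python
import math

# Check that sin(n*pi*x) behaves as an eigenfunction of L_D = -d^2/dx^2
# with eigenvalue n^2 pi^2, using a centered second difference.
def second_diff(f, x, h=1e-4):
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

for n in (1, 2, 3):
    f = lambda x, n=n: math.sin(n * math.pi * x)
    for x in (0.2, 0.5, 0.7):
        lhs = -second_diff(f, x)          # apply -d^2/dx^2
        rhs = n**2 * math.pi**2 * f(x)    # eigenvalue times eigenfunction
        assert abs(lhs - rhs) < 1e-2 * max(1.0, abs(rhs))
```

The same check with cos(nπx) (including n = 0) illustrates the Neumann case.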
8.6 Second order differential operators with an infinite interval: continuous spectrum

Example. Consider the selfadjoint differential operator L = −d²/dx² on L²(−∞, ∞). There is now no choice of boundary conditions. The resolvent is

((L − λ)⁻¹ g)(x) = 1/(2√−λ) [∫_{−∞}^x e^{−√−λ x} e^{√−λ y} g(y) dy + ∫_x^∞ e^{√−λ x} e^{−√−λ y} g(y) dy].  (8.15)

This can also be written in the form

((L − λ)⁻¹ g)(x) = 1/(2√−λ) ∫_{−∞}^∞ e^{−√−λ |x−y|} g(y) dy.  (8.16)

The spectrum consists of the positive real axis and is continuous.

8.7 The spectral theorem for normal operators

Throughout the discussion we make the convention that the inner product is conjugate linear in the first variable and linear in the second variable.

The great theorem of spectral theory is the following. Let H be a Hilbert space. Let L be a normal operator. Then there exists a set K (which may be taken to be a disjoint union of copies of the line), a measure µ on K, a unitary operator U : H → L²(K, µ), and a complex function λ on K such that

(U L f)(k) = λ(k)(U f)(k).  (8.17)

Thus if we write Λ for the operator of multiplication by the function λ, we get the representation

L = U* Λ U.  (8.18)

The theorem is a generalization of the theorem on diagonalization of normal matrices. The spectral theorem for normal operators says that every normal operator is isomorphic (by a unitary operator mapping the Hilbert space to an L² space) to a multiplication operator, multiplication by some complex valued function λ. The simplicity and power of the equation L = U*ΛU cannot be overestimated.

If the measure µ is discrete, then the norm in the space L²(µ) is given by

‖g‖² = Σ_k |g(k)|² µ({k}).  (8.19)

The equation then says (U L f)_k = λ_k (U f)_k. The λ_k are the eigenvalues of L. The unitary operator U is given by

(U f)_k = (ψ_k, f),  (8.20)

where the ψ_k are eigenvectors of L normalized so that µ({k}) = 1/(ψ_k, ψ_k). The inverse of U is given by

U* g = Σ_k g_k ψ_k µ({k}).  (8.21)

The equation L f = U*ΛU f says explicitly that

L f = Σ_k λ_k (ψ_k, f) ψ_k µ({k}).  (8.22)

If the measure µ is continuous, then the norm in the space L²(K, µ) is given by

‖g‖² = ∫ |g(k)|² dµ(k).  (8.23)

Then λ(k) is a function of the continuous parameter k. The equation then says

(U L f)(k) = λ(k)(U f)(k).  (8.24)

In quite general contexts the unitary operator U is given by

(U f)(k) = (ψ_k, f),  (8.25)

but now this equation only makes sense for a dense set of f in the Hilbert space, and the ψ_k resemble eigenvectors of L but do not belong to the Hilbert space, but instead to some larger space, such as a space of slowly growing functions or of mildly singular distributions. The inverse of U is given formally by

U* g = ∫ g(k) ψ_k dµ(k).  (8.26)

The equation L f = U*ΛU f says formally that

L f = ∫ λ(k)(ψ_k, f) ψ_k dµ(k),  (8.27)

but this equation must be interpreted in some weak sense. Since the eigenvectors ψ_k are not in the Hilbert space, it is convenient in many contexts to forget about them and instead refer to the measure µ and the function λ and to the operators U and U*.

The spectrum of a multiplication operator is the essential range of the function. This is the set of points w such that for each ε > 0 the set of all points k such that λ(k) is within ε of w has measure > 0. This is obvious: a function λ(k) has 1/(λ(k) − w) bounded if and only if w is not in the essential range of λ(k).
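The explicit resolvent (8.16) can be verified numerically. The following sketch (an illustration, not from the notes) takes λ = −m² with m > 0, so √−λ = m, and g the indicator of [−1, 1]; for |x| < 1 the equation −u″ + m²u = g has the elementary solution used below for comparison.

```python
import math

# Check (8.16): for lambda = -m^2 < 0, u = (L - lambda)^{-1} g is
# u(x) = (1/(2m)) * integral of e^{-m|x-y|} g(y) dy.
# For g = indicator of [-1,1], the solution of -u'' + m^2 u = g on |x| < 1
# is u(x) = 1/m^2 - (e^{-m(x+1)} + e^{-m(1-x)}) / (2 m^2).

m = 1.5

def resolvent(x, n=20000):
    # g vanishes outside [-1,1], so integrate the kernel only there
    h = 2.0 / n
    total = 0.0
    for i in range(n + 1):
        y = -1.0 + i * h
        wgt = 0.5 if i in (0, n) else 1.0     # trapezoid weights
        total += wgt * math.exp(-m * abs(x - y))
    return total * h / (2 * m)

def exact(x):
    return 1 / m**2 - (math.exp(-m * (x + 1)) + math.exp(-m * (1 - x))) / (2 * m**2)

for x in (-0.5, 0.0, 0.7):
    assert abs(resolvent(x) - exact(x)) < 1e-5
```

Since g is bounded and the kernel decays like e^{−m|x−y|}, the integral converges absolutely for every x, as the formula for the continuous spectrum requires.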
8.8 Examples: compact normal operators

The theorem on compact normal operators is a corollary of the general spectral theorem. Consider a compact normal operator L. Say that it is isomorphic to multiplication by λ. Fix ε > 0 and look at the part of the space where |L| ≥ ε; this is the subspace consisting of all functions g in L² such that g(k) = 0 except where |λ(k)| ≥ ε. On this part of the space the operator L maps the unit ball onto a set that contains the ball of radius ε. Since L is compact, it follows that this part of the space is finite dimensional. Therefore there are only finitely many eigenvalues, each of finite multiplicity, in this subspace. Since ε > 0 is arbitrary, it follows that there are only countably many isolated eigenvalues of finite multiplicity in the part of the space where L is nonzero. In the part of the space where L = 0 we can have an eigenvalue 0 of arbitrary multiplicity (zero, finite, or infinite).

8.9 Examples: translation invariant operators and the Fourier transform

The nicest examples for the continuous case are given by translation invariant operators acting in H = L²(R, dx). The spectral representation is given by the Fourier transform, which is given formally by

(F f)(k) = (ψ_k, f) = ∫_{−∞}^∞ e^{−ikx} f(x) dx.  (8.29)

Here ψ_k(x) = e^{ikx}, and we are using the convention that the inner product is linear in the second variable. In this case the Fourier transform maps H into L²(R, dk/(2π)). The inverse Fourier transform is

(F⁻¹ g)(x) = ∫_{−∞}^∞ e^{ikx} g(k) dk/(2π).  (8.30)

Here are some examples:

Example 1: Translation. Let T_a be defined by

(T_a f)(x) = f(x − a).  (8.31)

Then T_a is unitary. In fact

(F T_a f)(k) = exp(−ika)(F f)(k).  (8.32)

Example 2: Convolution. Let C be defined by

(C f)(x) = ∫_{−∞}^∞ c(x − y) f(y) dy = ∫_{−∞}^∞ c(a) f(x − a) da,  (8.33)

where c is an integrable function. Then C is bounded normal.
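Example 1 has a finite-dimensional analog (an illustration added here, not from the notes): the discrete Fourier transform diagonalizes cyclic translation, with the same phase factor e^{−ika} appearing as e^{−2πiak/n}.

```python
import cmath

# Discrete analog of Example 1: the DFT diagonalizes cyclic translation.
# If (T_a f)[j] = f[(j - a) mod n], then DFT(T_a f)[k] = e^{-2 pi i a k / n} DFT(f)[k].

def dft(f):
    n = len(f)
    return [sum(f[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
            for k in range(n)]

n, a = 8, 3
f = [0.5, -1.0, 2.0, 0.0, 1.5, 3.0, -2.5, 1.0]
Tf = [f[(j - a) % n] for j in range(n)]      # cyclically translated sequence

F, TF = dft(f), dft(Tf)
for k in range(n):
    phase = cmath.exp(-2j * cmath.pi * a * k / n)
    assert abs(TF[k] - phase * F[k]) < 1e-9
```

The same mechanism makes circulant (convolution) matrices diagonal in the DFT basis, echoing Section 1.1.8 of these notes.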
Then by integrating the first example we get

(F C f)(k) = ĉ(k)(F f)(k),  (8.34)

where ĉ is the Fourier transform of c.

Example 3: Differentiation. Let D be defined by

(D f)(x) = df(x)/dx.  (8.35)

Then D is skewadjoint. Furthermore, we get

(F D f)(k) = ik(F f)(k).  (8.36)

Notice that the unitary operator in Example 1 may be written

T_a = exp(−aD).  (8.37)

Example 4: Second differentiation. Let D² be defined by

(D² f)(x) = d²f(x)/dx².  (8.38)

Then D² is selfadjoint, and we get

(F D² f)(k) = −k²(F f)(k).  (8.39)

We can take interesting functions of these operators. For instance, (−D² + m²)⁻¹ is convolution by (1/(2m)) e^{−m|x|}, and exp(tD²/2) is convolution by (1/√(2πt)) e^{−x²/(2t)}.

8.10 Examples: Schrödinger operators

If V(x) is a real locally integrable function that is bounded below, then

H = −D² + V(x)  (8.40)

is a well-defined selfadjoint operator. Such an operator is called a Schrödinger operator. Finding the eigenvalues is a challenge. One case where it is possible to obtain explicit formulas is when V(x) is a quadratic function.

If V(x) → ∞ as |x| → ∞, then the spectrum of H is point spectrum. A nice example of this is the square well. If V(x) = 0 for 0 < x < 1 and V(x) = +∞ elsewhere, then we may think of this as the operator H = −D² with Dirichlet boundary conditions at the end points of the unit interval. We know how to find the spectrum in this case.

If on the other hand V(x) is integrable on the line, then the spectrum of H consists of positive continuous spectrum and possibly some strictly negative eigenvalues. Also there is an interesting limiting case, where there is a constant a > 0 with V(x) = −a for 0 < x < 1 and V(x) = 0 otherwise. This is another case where computations are possible.
dt). for instance. For each a ≥ 0 deﬁne (Sa f )(t) = f (t − a) for a ≤ t and (Sa f )(t) = 0 (8. Example: Let F be the Fourier transform applied to the Hilbert space H of L2 functions that vanish except on the positive axis. It follows from the spectral theorem that a subnormal operator S is isomorphic by a unitary operator U : H → M to a multiplication operator that sends g(k) to λ(k)g(k). ∞). a normal operator N : H → H that leaves H invariant.114 CHAPTER 8. An operator S : H → H is subnormal if there exists a larger Hilbert space H with H ⊂ H as a closed subspace.11 Subnormal operators The spectral theorem for normal operators is a landmark. and such that N restricted to H is S. Then the image of this transform consists of the subspace M of L2 functions g(k) that are boundary values of analytic functions g(z) in the lower half plane. One important class of operators with fascinating spectral properties consists of the subnormal operators. Then Sa is subnormal. consider the bigger space H = L2 (R. Let Ua be translation by a on H . dt) and consider H as the subspace of functions that vanish except on the positive reals. This function extends to a function λ(z) = exp(−iaz) deﬁned in the lower half (8.42) for 0 ≤ t < a. If.) 8. Now it is more diﬃcult to characterize the spectrum. So we need to require also that w is not in the range of the extension λ(z). However not every operator is normal. then one would want 1/(λ(z) − w)g(z) to also be an analytic function in this region. every function g(k) in M has an extension to an analytic function g(z) deﬁned on some larger region. However we shall see that Schr¨dinger operao tors play a role in other contexts as well.41) . A number w is in the resolvent set if 1/(λ(k) − w)g(k) is in M for every function g(k) in M . Example: Let H = L2 ([0. If a ≥ 0. The calculation of the spectral properties of Schr¨dinger operators is the o main task of quantum physics. First consider the case of bounded operators. 
NORMAL OPERATORS a > 0 with V (x) = −a for 0 < x < 1 and V (x) = 0 otherwise. A subnormal operator is an operator that is the restriction of a normal operator to an invariant subspace. For each f in H we have U f in M and U Sf (k) = λ(k)(U f )(k) in M . Then Sa is Ua acting in this subspace. (One example will be in calculus of variations. This is another case where computations are possible. Here M is a closed subspace of an L2 space. To see this. then the subspace H is left invariant by Ua . The operator Sa for a > 0 is isomorphic to multiplication by λ(k) = exp(−iak) acting in this subspace. However it is not suﬃcient that this is a bounded function.
v) = (u. Since S is ¯ subnormal. S ∗ u) = (P N ∗ u. The adjoint S ∗ need not be subnormal. S ∗ Su) for all u in H. N ∗ u) = (N u. One possible deﬁnition might be the following. N u) = (Su. so the operator S restricted to this smaller space is also subnormal. (8. If S is a subnormal operator and Su = 0. Thus the null space of S is contained in the null space of S ∗ . Sv) = (u. Let H be a Hilbert space that is a closed subspace of H . Theorem. Then for u in H we have (S ∗ u. where P is the orthogonal projection of H onto H. Let v be a vector in H that is orthogonal ∗ ¯ u) = 0. S u) = λ(v. Thus there is a dense domain D(N ) ⊂ H such that N : D(N ) → H is normal. (u. So the spectrum is the range of this function.45) (8. while S has residual spectrum. There are examples in which one might want to consider unbounded subnormal operators. Theorem. It is not true that every eigenvalue of S ∗ is an eigenvalue of S. If N : H → H is normal. of u in H is also an invariant space. then S ∗ u = 0. Then D(S) is dense in H. the operator S is subnormal. ¯ Corollary.8. S has an eigenvector u in H with Su = λu. and S : H → H is the restriction of N to H. (8. But the image of z ≤ 0 under exp(−iaz) is the unit circle w ≤ 1. v). Su). So this is the spectrum. If S is a subnormal operator. Proof: Let S be a subnormal operator acting in H. Proof: First note that for u in H we have (N ∗ u. Continue in this way until one ﬁnds an orthogonal basis of eigenvectors for S. The adjoint of a subnormal operator S : H → H is another operator S ∗ : H → H. for each v in H we have (S ∗ u. Since the space H is ﬁnite dimensional. P N ∗ u) ≤ (N ∗ u. N ∗ u) = (u. Thus the orthogonal complement to u. SS ∗ u) ≤ (u. Consider an unbounded normal operator N . SUBNORMAL OPERATORS 115 plane. this implies that S ∗ u is the orthogonal projection of N ∗ u onto H. then every subnormal operator is normal. N N ∗ u) = (u. N ∗ N u) = (N u. then S ∗ u = λu. it follows that S ∗ u = λu. densely deﬁned operator. Then (Sv. 
H ⊂ H . Lemma. N u). Thus ∗ every eigenvector of S is an eigenvector of S . If S is a subnormal operator and Su = λu. and S : D(S) → H is a closed. Since S ∗ u is in H. Proof: If u is in H. that is. then SS ∗ ≤ S ∗ S as quadratic forms. It is more typical that S ∗ has eigenvalues while S does not. If the Hilbert space is ﬁnite dimensional. In fact we shall see examples in which S ∗ has anomalous point spectrum. then S ∗ : H → H is given by S ∗ = P N ∗ . u) = (v. Then if D(S) = D(N ) ∩ H and S is the restriction of N to D(S).11. Suppose that D(N ) ∩ H is dense in H and that N sends vectors in this dense subspace into H. . N v) = (N ∗ u.43) This says that N ∗ u − S ∗ u is orthogonal to every v in H.44) Corollary.
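A finite-dimensional-looking illustration of "S* has eigenvalues while S does not" (added here as a sketch, not from the notes): the unilateral right shift on ℓ²(N) is subnormal, being the restriction of the (unitary, hence normal) bilateral shift to an invariant subspace; its adjoint, the backward shift, has every λ in the open unit disk as an eigenvalue, with eigenvector (1, λ, λ², ...).

```python
# The unilateral shift S (right shift on l^2(N)) is subnormal. Its adjoint,
# the backward shift S*, has every lambda with |lambda| < 1 as an eigenvalue,
# with eigenvector v = (1, lambda, lambda^2, ...). A truncated check:

lam = 0.4 - 0.3j            # |lam| = 0.5 < 1
N = 50
v = [lam**n for n in range(N)]
Sstar_v = [v[n + 1] for n in range(N - 1)]   # backward shift drops v[0]
for n in range(N - 1):
    assert abs(Sstar_v[n] - lam * v[n]) < 1e-12

# S itself has no eigenvalues: comparing components in S x = lam x
# forces x = 0 term by term.
```

The vector v is square-summable precisely because |λ| < 1, which is why the point spectrum of the adjoint fills the open disk but not its boundary.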
Example: Let H = L²([0, ∞), dt). Consider the bigger space H′ = L²(R, dt), with H the subspace of functions that vanish except on the positive reals. Let N be differentiation on H′. Define

(S f)(t) = f′(t)  (8.46)

on the domain consisting of all f in H such that f′ is also in H and f(0) = 0. Then S is subnormal. Notice that if f is in D(N) and is also in H, then f(0) = 0 automatically.

If N is a normal operator acting in H′, then its adjoint N* is also a normal operator, and in fact D(N*) = D(N). So if S : D(S) → H is subnormal, then D(S) = D(N) ∩ H = D(N*) ∩ H. Furthermore, if u is in D(N*) ∩ H, then u is in D(S*) and S* u = P N* u. This can be seen from the computation (S* u, v) = (u, S v) = (u, N v) = (N* u, v) for all v in D(S). We conclude that for a subnormal operator D(S) ⊂ D(S*) and (S* u, S* u) ≤ (S u, S u) for all u in D(S). This is also an easy computation:

(S* u, S* u) = (P N* u, P N* u) ≤ (N* u, N* u) = (N u, N u) = (S u, S u).

Example: For the operator S in the last example we have

(S* f)(t) = −f′(t)  (8.47)

on the domain consisting of all f in H such that f′ is also in H. There is no boundary condition at zero. This is a case where D(S) ⊂ D(S*) and the two domains are not equal.

8.12 Examples: forward translation invariant operators and the Laplace transform

The example in the last section is the operator theory context for the theory of the Laplace transform. The Laplace transform of a function f in L²([0, ∞), dt) is

(L f)(z) = ∫_0^∞ e^{−zt} f(t) dt.  (8.48)

If we think of z = iω on the imaginary axis, then when regarded as a function of ω this is the Fourier transform. However it extends as an analytic function to z in the right half plane, that is, to ω in the lower half plane.

Let a ≥ 0 and let S_a be the operator of right translation filling in with zero, as defined in the previous section. Then

(L S_a f)(z) = ∫_a^∞ e^{−zt} f(t − a) dt = e^{−az}(L f)(z).  (8.49)

So S_a is isomorphic to multiplication by e^{−az} acting on the Hilbert space of functions analytic in the right half plane. Its spectrum consists of the closed unit disk.

Let c be an integrable function defined on the positive axis, and let the causal convolution C be defined by

(C f)(t) = ∫_0^t c(t − u) f(u) du = ∫_0^t c(a) f(t − a) da.  (8.50)

Then C is subnormal, and

(L C f)(z) = ĉ(z)(L f)(z),  (8.51)

where ĉ is the Laplace transform of c.

Similarly, the adjoint of S_a is

(S_a* f)(t) = f(t + a).  (8.52)

Its spectrum is also the closed unit disc, but the interior of the disc consists of point spectrum. In fact,

(L S_a* f)(z) = ∫_0^∞ e^{−zt} f(t + a) dt = e^{az} [(L f)(z) − ∫_0^a e^{−zt} f(t) dt].  (8.53)

Notice that the Laplace transform does not send this into a multiplication operator. The adjoint of a subnormal operator need not be subnormal.

Let D_0 = d/dt be differentiation with zero boundary condition at the origin. Then

(L D_0 f)(z) = z(L f)(z),  (8.54)

provided that f is in the domain of D_0 (and in particular satisfies the boundary condition). Notice that the boundary condition is essential for the integration by parts. The spectrum of D_0 consists of the closed right half plane. We can write exp(−aD_0) = S_a for a ≥ 0. This operator satisfies the differential equation dS_a f/da = −D_0 S_a f for a ≥ 0.

The adjoint of D_0 is −D, where D = d/dt with no boundary condition. For D we have

(L D f)(z) = z(L f)(z) − f(0),  (8.55)

provided f is in the domain of D. Again the Laplace transform does not make this into a multiplication operator. The spectrum of D consists of the closed left half plane. The interior consists of point spectrum. We can write exp(aD) = exp(−aD_0)* = S_a* for a ≥ 0. This operator satisfies the differential equation dS_a* f/da = D S_a* f for a ≥ 0.

Example: Even though D does not have a spectral representation as a multiplication operator, it can be used to solve differential equations. Solve the differential equation (D + k)f = g, with k > 0, with boundary condition f(0) = c. The operator D + k is not invertible, since −k is an eigenvalue of D. However we can write f = u + c e^{−kt} and solve

(D_0 + k)u = g.  (8.56)
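As a sketch (not in the original notes), one can solve (D_0 + k)u = g numerically through the causal convolution with e^{−kt}, whose Laplace transform is 1/(z + k), and confirm that the result satisfies u′ + ku = g.

```python
import math

# Sketch: the causal convolution u(t) = int_0^t e^{-k(t-s)} g(s) ds
# inverts D_0 + k (Laplace transform 1/(z+k)); check u' + k u = g numerically.

k = 2.0
g = lambda s: math.cos(s)

def u(t, n=4000):
    if t == 0:
        return 0.0
    h = t / n
    total = 0.0
    for i in range(n + 1):
        s = i * h
        wgt = 0.5 if i in (0, n) else 1.0     # trapezoid weights
        total += wgt * math.exp(-k * (t - s)) * g(s)
    return total * h

for t in (0.5, 1.0, 2.0):
    step = 1e-4
    du = (u(t + step) - u(t - step)) / (2 * step)   # centered derivative
    assert abs(du + k * u(t) - g(t)) < 1e-4
```

Note that u(0) = 0 holds automatically, which is exactly the boundary condition built into D_0.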
The solution is u = (D_0 + k)⁻¹ g. In terms of Laplace transforms this is û(z) = 1/(z + k) ĝ(z). It follows that u is given by the causal convolution

u(t) = ∫_0^t e^{−k(t−s)} g(s) ds.  (8.57)

Thus

f(t) = f(0) e^{−kt} + ∫_0^t e^{−k(t−s)} g(s) ds.  (8.58)

8.13 Quantum mechanics

Here is a dictionary of the basic concepts. Fix a Hilbert space. A quantum state is a unit vector u. A quantum observable is a selfadjoint operator L. The expectation of the observable L in the state u in D(L) is

µ = (u, L u).  (8.59)

The variance of the observable L in the state u in D(L) is

σ² = ‖(L − µI)u‖².  (8.60)

If L is an observable and f is a real function, then f(L) is an observable.

Here are some observables. The Hilbert space is L²(R, dx). The position observable is q, which is multiplication by the coordinate x. Take the case where the function is 1_A, the indicator function of a set A of position coordinate values. Then 1_A(q) has expectation

(u, 1_A(q) u) = ∫_A |u(x)|² dx.  (8.61)

This is the probability that the position is in A.

The momentum observable is p = −iℏ d/dx. Here ℏ > 0 is Planck's constant. Similarly, take the case where the function is 1_B, the indicator function of a set B of momentum coordinate values. Since in the Fourier transform representation p is represented by multiplication by ℏk, it follows that 1_B(p) has expectation

(u, 1_B(p) u) = ∫_{ℏk ∈ B} |û(k)|² dk/(2π).  (8.62)

This is the probability that the momentum is in B.

The Heisenberg uncertainty principle says that for every state the product σ_p σ_q ≥ ℏ/2.

Energy observables are particularly important. The kinetic energy observable is H_0 = p²/(2m) = −ℏ²/(2m) d²/dx². Here m > 0 is the mass. The spectrum of H_0 is the positive real axis. The potential energy observable is V = v(q), which is multiplication by v(x). Here v is a given real function that represents potential energy as a function of the space coordinate. The spectrum of V is the range of the function v. The total energy observable or quantum Hamiltonian is

H = H_0 + V = p²/(2m) + v(q) = −ℏ²/(2m) d²/dx² + v(x).  (8.63)
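The uncertainty bound σ_p σ_q ≥ ℏ/2 is saturated by Gaussian states. The following numeric sketch (added here, with ℏ set to 1) checks this for u(x) = π^{−1/4} e^{−x²/2}, using σ_q² = ∫ x²|u|² dx and σ_p² = ℏ² ∫ |u′(x)|² dx (both means are zero for this state).

```python
import math

# For the Gaussian state u(x) = pi^{-1/4} e^{-x^2/2}, the product
# sigma_p * sigma_q equals hbar/2 exactly (hbar = 1 here).

def trap(g, a, b, n=20000):
    h = (b - a) / n
    return (sum(g(a + i * h) for i in range(1, n)) + 0.5 * (g(a) + g(b))) * h

u = lambda x: math.pi**-0.25 * math.exp(-x * x / 2)
du = lambda x: -x * u(x)                      # derivative of the Gaussian

norm = trap(lambda x: u(x)**2, -10, 10)       # should be 1
sq2 = trap(lambda x: x * x * u(x)**2, -10, 10)   # sigma_q^2 = 1/2
sp2 = trap(lambda x: du(x)**2, -10, 10)          # sigma_p^2 = 1/2

assert abs(norm - 1.0) < 1e-8
assert abs(sq2 - 0.5) < 1e-8
assert abs(sp2 - 0.5) < 1e-8
assert abs(math.sqrt(sq2 * sp2) - 0.5) < 1e-8    # equality in Heisenberg's bound
```

Any non-Gaussian unit vector gives a strictly larger product, which is the content of the uncertainty principle stated above.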
If we assume that the function v is bounded, then H is a selfadjoint operator. (In many cases when v is only bounded below it remains a selfadjoint operator.)

Suppose that the total energy observable H is a selfadjoint operator. Then the time evolution operator is the unitary operator exp(−itH/ℏ). A central problem of quantum mechanics is to compute this operator. This is not easy, because while exp(−itH_0/ℏ) and exp(−itV/ℏ) are easy to compute, the operators H_0 and V do not commute. So there is no direct algebraic way to express exp(−itH/ℏ) in terms of the simpler operators. Nevertheless, we shall encounter a beautiful formula for this time evolution operator in the next chapter.

8.14 Problems

A particularly fascinating selfadjoint operator is the quantum harmonic oscillator. (This operator also occurs in disguised form in other contexts.) It is

N = (1/2)(−d²/dx² + x² − 1)

acting in L²(−∞, ∞). The problem of investigating its spectrum is of the utmost importance for quantum mechanics.

1. Show that it factors as N = A*A, where

A = (1/√2)(x + d/dx) and A* = (1/√2)(x − d/dx).

2. Show that AA* = A*A + I.

3. Solve the equation A u_0 = 0. Verify that u_0 belongs to the Hilbert space. Show that N u_0 = 0.

4. Show that if N u_n = n u_n and u_{n+1} = A* u_n, then N u_{n+1} = (n + 1) u_{n+1}. Verify that each u_n belongs to the Hilbert space. Thus the eigenvalues of N are the natural numbers. These are the standard type of point spectrum.

5. Show that each eigenfunction u_n is a polynomial in x times u_0(x). Find the polynomials for the cases n = 0, 1, 2, 3 explicitly (up to constant factors).

6. Find all eigenvalues (point spectrum) of A. Find each corresponding eigenvector, and verify that it belongs to the Hilbert space. What kind of spectrum is it?

7. Find all eigenvalues (point spectrum) of A*.

8. Find the spectrum of A*.

9. It may be shown that A* is a subnormal operator, and so A is the adjoint of a subnormal operator. If A* is indeed a subnormal operator, then we should have A*A ≤ AA* as quadratic forms. Is this the case?
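As a numeric sketch of Problems 3-5 (not part of the problem set itself), the first eigenfunctions are u_0 = e^{−x²/2}, u_1 = x u_0, and u_2 = (2x² − 1) u_0 up to constants, and one can check N u_n = n u_n pointwise with a finite difference.

```python
import math

# Check N u_n = n u_n for N = (1/2)(-d^2/dx^2 + x^2 - 1) on the first
# Hermite functions: u0 = e^{-x^2/2}, u1 = x u0, u2 = (2x^2 - 1) u0.
# (Normalization constants are irrelevant for the eigenvalue check.)

def N(u, x, h=1e-4):
    d2 = (u(x + h) - 2 * u(x) + u(x - h)) / h**2   # second difference
    return 0.5 * (-d2 + (x * x - 1) * u(x))

u0 = lambda x: math.exp(-x * x / 2)
u1 = lambda x: x * u0(x)
u2 = lambda x: (2 * x * x - 1) * u0(x)

for n, u in [(0, u0), (1, u1), (2, u2)]:
    for x in (0.3, 0.9, 1.7):
        assert abs(N(u, x) - n * u(x)) < 1e-5
```

Applying A* = (x − d/dx)/√2 to u_0 and u_1 reproduces u_1 and u_2 up to constants, which is the ladder structure of Problem 4.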
Chapter 9

Calculus of Variations

9.1 The Euler-Lagrange equation

The problem is to find the critical points of

F(y) = ∫_{x1}^{x2} f(y, yx, x) dx.  (9.1)

The differential of F is

dF(y)h = ∫_{x1}^{x2} (f_y h + f_{yx} h_x) dx = ∫_{x1}^{x2} (f_y − d/dx f_{yx}) h dx + f_{yx} h |_{x1}^{x2}.  (9.2)

If the y are required to have fixed values y = y1 at x = x1 and y = y2 at x = x2 at the end points, then the h are required to be zero at the end points. The corresponding boundary term is automatically zero. Thus for the differential to be zero we must have the Euler-Lagrange equation

f_y − d/dx f_{yx} = 0.  (9.3)

This is an equation for the critical function y. It is second order, and it has the explicit form

f_y − f_{y yx} dy/dx − f_{yx yx} d²y/dx² − f_{x yx} = 0.  (9.4)

This equation is linear in d²y/dx². However the coefficients are in general nonlinear expressions in y, dy/dx, and x.

If the y and h are free to vary at an end point, then at a critical point one must have f_{yx} = 0 at that end point.

Sometimes one wants to think of

δF/δy(x) = ∂f/∂y − d/dx ∂f/∂yx  (9.5)

as the gradient of F, where the inner product is given by the integral. This expression is then known as the variational derivative. The Euler-Lagrange equation then says that the variational derivative is zero.

Example. Consider the problem of minimizing the length

F(y) = ∫_{x1}^{x2} √(1 + yx²) dx  (9.6)

between the points (x1, y1) and (x2, y2). The boundary conditions are y(x1) = y1 and y(x2) = y2. The solution of the Euler-Lagrange equation is yx = C, a curve of constant slope. So the solution is y − y1 = m(x − x1), where m = (y2 − y1)/(x2 − x1).

Example. Consider the problem of minimizing the length

F(y) = ∫_{x1}^{x2} √(1 + yx²) dx  (9.7)

between the lines x = x1 and x = x2. The boundary conditions for the Euler-Lagrange equation are yx(x1) = 0 and yx(x2) = 0. The solution of the Euler-Lagrange equation is yx = C, a curve of constant slope. But to satisfy the boundary conditions, C = 0. So the solution is y = A, where A is an arbitrary constant.

9.2 A conservation law

Say that one has a solution of the Euler-Lagrange equation. Then

d/dx f = f_y yx + f_{yx} yxx + f_x = d/dx (yx f_{yx}) + f_x.  (9.8)

Thus

d/dx H + f_x = 0,  (9.9)

where

H = yx f_{yx} − f.  (9.10)

If f_x = 0, then this says that H is constant. This is the conservation law. It takes the explicit form

yx f_{yx} − f = C.  (9.11)

Thus it is first order. It is a nonlinear equation for y and dy/dx, in general fully nonlinear in yx.

How in practice does one solve a problem in calculus of variations? There is a way when the function f(y, yx) does not depend on x. Use the conservation law H(y, dy/dx) = C to solve for dy/dx = α(y). Perhaps this equation can be solved directly. Or write dx = dy/α(y) and integrate both sides to get x in terms of y. Or make a substitution expressing y in terms of a new variable u, and get x and y each in terms of u.
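The conservation law can be checked concretely. The sketch below (an illustration, not from the notes) takes f(y, y′) = y√(1 + y′²), the surface-of-revolution integrand up to a factor 2π; since f has no explicit x dependence, H = y′f_{y′} − f = −y/√(1 + y′²) should be constant along the solution y = cosh(x), and not along an arbitrary non-solution.

```python
import math

# Conservation law check for f(y, y') = y*sqrt(1 + y'^2):
# H = y' f_{y'} - f simplifies to -y / sqrt(1 + y'^2).
# Along the Euler-Lagrange solution y = cosh(x) this is -1 for every x.

def H(y, yp):
    root = math.sqrt(1 + yp * yp)
    return yp * (y * yp / root) - y * root    # y' f_{y'} - f

for x in (-1.0, -0.2, 0.0, 0.7, 1.5):
    y, yp = math.cosh(x), math.sinh(x)
    assert abs(H(y, yp) + 1.0) < 1e-12

# A non-solution, e.g. y = x^2 + 1, does not conserve H:
vals = [H(x * x + 1, 2 * x) for x in (0.0, 0.5, 1.0)]
assert max(vals) - min(vals) > 0.1
```

This is exactly the first-order equation H(y, dy/dx) = C advocated above: for the catenary it reads y/√(1 + y′²) = const.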
Example 1: This example comes from minimizing the area of a surface of revolution. The element of area is proportional to 2πy ds = 2πy √(1 + y_x²) dx. The function is f(y, y_x) = 2πy √(1 + y_x²). Then H = −2πy/√(1 + y_x²) = −C. The differential equation to be solved is (dy/dx)² = k²y² − 1, where k = 2π/C. This has solution y = (1/k) cosh(k(x − x_0)). For simplicity consider y as a function of x between −a and a. By symmetry x_0 = 0. The value of y at ±a is r. So we need to solve

r k = cosh(ak).

This equation has to be solved for k. When a is small enough, cosh(ak) cuts the line rk in two points. Thus there are two solutions of the Euler-Lagrange equations, corresponding to y = (1/k) cosh(kx) with the two values of k. The derivative is dy/dx = sinh(kx). The smaller value of k gives the smaller derivative, so this is the minimum area surface satisfying the boundary conditions. (Here one fixes r and varies a.)
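The condition rk = cosh(ak) can be solved by elementary root finding. The numbers r = 2, a = 1 below are my own illustrative choices, not values from the notes; they are small enough that the line rk cuts cosh(ak) in the two points described above.

```python
import math

# f(k) = cosh(a k) - r k vanishes exactly at a solution of r k = cosh(a k).
def f(k, r=2.0, a=1.0):
    return math.cosh(a * k) - r * k

def bisect(lo, hi, tol=1e-12):
    # Plain bisection; assumes f changes sign on [lo, hi].
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# f(0) = 1 > 0, f(1) = cosh(1) - 2 < 0, f(3) = cosh(3) - 6 > 0,
# so there are two roots k1 < k2, one in each bracket.
k1 = bisect(0.0, 1.0)
k2 = bisect(1.0, 3.0)
```

The smaller root k1 gives the catenoid of smaller slope, hence the minimum area surface.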
Example 2: This example comes up in finding the maximum area for given arc length. This is a Lagrange multiplier problem. The unknown is a function y of x between −a and a, with value 0 at the two end points. The area is the integral from −a to a of y dx, while the length is the integral from −a to a of ds = √(1 + y_x²) dx. Let f(y, y_x) = y − λ √(1 + y_x²). The conserved quantity is H = −λ/√(1 + y_x²) − y = −c_1. The differential equation is dx/dy = (y − c_1)/√(λ² − (y − c_1)²). Thus x − c_2 = √(λ² − (y − c_1)²), which is the equation of a circle

(x − c_2)² + (y − c_1)² = λ².

By symmetry c_2 = 0. The Lagrange multiplier λ turns out to be the radius of the circle. Furthermore c_1 = −√(λ² − a²). If we let sin(θ) = a/λ, then the equation sin(θ) < θ < (π/2) sin(θ) translates into 2a < κ = 2λθ < πa, where κ is the arc length. For simplicity let the arc length κ satisfy 2a < κ < πa.

9.3 Second variation

Say that y is a solution of the Euler-Lagrange equation that has fixed values at the end points. Then the second differential of F is obtained by expanding

F(y + h) = F(y) + dF(y)h + (1/2) d²F(y)(h, h) + ···.  (9.12)

The result is

d²F(y)(h, h) = ∫_{x_1}^{x_2} (f_{yy} h² + 2 f_{y y_x} h h_x + f_{y_x y_x} h_x²) dx.  (9.13)

The functions h have value 0 at the end points, so we may freely integrate by parts. The result may thus also be written

d²F(y)(h, h) = ∫_{x_1}^{x_2} ((f_{yy} − (d/dx) f_{y y_x}) h² + f_{y_x y_x} h_x²) dx.  (9.14)

Yet another form is

d²F(y)(h, h) = ∫_{x_1}^{x_2} [−(d/dx)(f_{y_x y_x} h_x) + (f_{yy} − (d/dx) f_{y y_x}) h] h dx = (Lh, h).  (9.15)

We recognize

Lh = −(d/dx)(f_{y_x y_x} h_x) + (f_{yy} − (d/dx) f_{y y_x}) h
as a Sturm-Liouville operator. The operator has Dirichlet boundary conditions at the end points of the interval from x_1 to x_2. In fact, if f(y, y_x, x) = (1/2) y_x² + g(y, x), then

Lh = −h_{xx} + g_{yy} h  (9.16)

is a Schrödinger operator. The coefficient g_{yy} has the solution of the Euler-Lagrange equation inserted, so it is regarded as a function of x.

9.4 Interlude: The Legendre transform

The Legendre transform of a function L of v is defined as follows. Define

p = dL/dv.  (9.17)

Then

H = pv − L  (9.18)

is the Legendre transform of L. Thus in this formula v is defined in terms of p by solving the equation p = dL/dv for v. The variable p is covariant and is dual to the variable v, which is contravariant. In the following we want to think of L as a function of v, while H is a function of the dual variable p.

The fundamental theorem about the Legendre transform is

v = dH/dp.  (9.19)

Proof: Let H = pv − L as above. By the product rule and the chain rule and the definition of p we get

dH/dp = v + p (dv/dp) − (dL/dv)(dv/dp) = v + p (dv/dp) − p (dv/dp) = v.  (9.20)

Thus the situation is symmetric, and one can go back from H to L in the same way as one got from L to H. Conclusion: Two functions are Legendre transforms of each other when their derivatives are inverse functions of each other.

One would like a condition that guarantees that the equations that define the Legendre transform actually have solutions. In order for p = dL/dv to have a unique solution v, it would be useful to have dL/dv be strictly increasing. This is the same as saying that dp/dv = d²L/dv² > 0. The inverse function then has derivative given by the inverse dv/dp = d²H/dp² > 0.

Example: Let L = e^v − v − 1. Then the derivative is p = e^v − 1. The inverse function is v = log(1 + p).
Then the Legendre transform is a function H of p satisfying dH/dp = v = log(1 + p). The integral is H = (1 + p) log(1 + p) − p, where the constant of integration is chosen so that H(0) + L(0) = 0. This also follows from the general formula H = pv − L. In fact, since p = e^v − 1 has inverse v = log(1 + p), we have

H = pv − L = p log(1 + p) − [(1 + p) − log(1 + p) − 1] = (1 + p) log(1 + p) − p.

In this example the second derivatives are e^v and 1/(1 + p), and they are reciprocals of each other.
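A quick numerical sanity check (my own, not from the notes): the Legendre transform can also be computed as the supremum H(p) = sup_v (pv − L(v)), and a grid approximation of that supremum should match the closed form just derived.

```python
import math

def L(v):
    # L(v) = e^v - v - 1 from the example above
    return math.exp(v) - v - 1.0

def H_numeric(p, lo=-2.0, hi=3.0, steps=100000):
    # Brute-force approximation of sup_v (p v - L(v)) on a grid.
    dv = (hi - lo) / steps
    return max(p * (lo + i * dv) - L(lo + i * dv) for i in range(steps + 1))

def H_exact(p):
    return (1.0 + p) * math.log(1.0 + p) - p

# The maximizer v = log(1+p) lies inside [lo, hi] for these p values.
errs = [abs(H_numeric(p) - H_exact(p)) for p in (0.5, 1.0, 2.0)]
```

The agreement is limited only by the grid spacing, since the maximizer v = log(1 + p) is interior to the grid.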
Example: Start with L = (1/2)mv². Then p = mv. So v = p/m and H = pv − L = p²/(2m). This H is the Legendre transform of L.

Example: If we start instead with H = p²/(2m), then v = p/m. So p = mv and L = pv − H = (1/2)mv². The Legendre transform of L brings us back to the original H. These are examples from mechanics involving kinetic energy.

Example: A famous relativistic expression for energy is H = √(m²c⁴ + p²c²). It is an amusing exercise to compute that L = −mc² √(1 − v²/c²). The key is the relation p = mv/√(1 − v²/c²) between the derivatives.

Example: Let L = v^a/a with 1 < a < ∞, defined for v ≥ 0. Then H = p^b/b with 1 < b < ∞, defined for p ≥ 0. In this case p = v^{a−1} and v = p^{b−1}. The second derivatives are (a − 1)v^{a−2} and (b − 1)p^{b−2}. Again they are reciprocals of each other. The relation between a and b is (a − 1)(b − 1) = 1. This may also be written as 1/a + 1/b = 1.
Thus a and b are conjugate exponents.) Then H= k pk vk − L.24) ∂pj Thus the situation is symmetric. The key is the relation p = mv/ 1 − v 2 /c2 between the derivatives. Then H = pb /b with 1 < b < ∞ deﬁned for p ≥ 0. Again they are reciprocals. . The Legendre transform of L brings us back to the original H. Thus in this formula the vj are deﬁned in terms of pk by solving the equation pk = ∂L/∂vk for the vj . In this case p = v a−1 and v = pb−1 . ∂vj (9.
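The conjugate-exponent pair can be verified directly. The following check is mine (not from the notes): with p = v^{a−1} one should have pv − L(v) = H(p) exactly, i.e. pv − v^a/a = p^b/b.

```python
# Verify that L(v) = v**a / a and H(p) = p**b / b are Legendre transforms
# when b is the conjugate exponent of a, i.e. 1/a + 1/b = 1.
a = 3.0
b = a / (a - 1.0)  # conjugate exponent

for v in (0.5, 1.0, 2.0, 3.7):
    p = v ** (a - 1.0)          # p = dL/dv
    lhs = p * v - v ** a / a    # p v - L(v)
    rhs = p ** b / b            # H(p)
    assert abs(lhs - rhs) < 1e-9
```

The identity works because pv − v^a/a = v^a (1 − 1/a) = v^a/b and p^b = v^{(a−1)b} = v^a.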
The Legendre transform has a generalization to several dimensions. Let L be a function of v_1, ..., v_n. Define

p_j = ∂L/∂v_j, so that dL = Σ_j p_j dv_j.  (9.21)

(This is the differential of L, so it is a one-form.) Then

H = Σ_k p_k v_k − L  (9.22)

is the Legendre transform of L. Thus in this formula the v_j are defined in terms of the p_k by solving the equations p_k = ∂L/∂v_k for the v_j. The variables p_k are covariant and dual to the variables v_j, which are contravariant. In the following we want to think of L as a function of the v_j, while H is a function of the dual variables p_k.

The fundamental theorem about the Legendre transform in this context is

v_k = ∂H/∂p_k.  (9.23)

Proof: Let H = Σ_k p_k v_k − L as above. By the product rule and the chain rule and the definition of p we get

∂H/∂p_j = v_j + Σ_k p_k (∂v_k/∂p_j) − Σ_k (∂L/∂v_k)(∂v_k/∂p_j) = v_j + Σ_k p_k (∂v_k/∂p_j) − Σ_k p_k (∂v_k/∂p_j) = v_j.  (9.24)

Thus the situation is symmetric, and one can go back from H to L in the same way as one got from L to H. Conclusion: Two functions are Legendre transforms of each other when their derivatives are inverse functions of each other.

One would like a condition that guarantees that the equations that define the Legendre transform actually have solutions. In order for the equations to have a solution locally, the matrix

∂p_k/∂v_j = ∂²L/∂v_j ∂v_k  (9.25)

should be invertible. A particularly nice condition is that this matrix is strictly positive definite. The inverse matrix is then

∂v_j/∂p_k = ∂²H/∂p_j ∂p_k,  (9.26)

and it is then also strictly positive definite.

9.5 Lagrangian mechanics

One is given a Lagrangian function L that is a function of position q and velocity q̇ and possibly time t. The problem is to find the critical points of

S(q) = ∫_{t_1}^{t_2} L(q, q_t, t) dt.  (9.27)

The function q represents position as a function of time. It has fixed values q_1 and q_2 at the end points t_1 and t_2. The differential of S is

dS(q)h = ∫_{t_1}^{t_2} (L_q h + L_{q̇} h_t) dt = ∫_{t_1}^{t_2} (L_q − (d/dt) L_{q̇}) h dt.  (9.28)

Here h is a function with values 0 at the end points. Thus for the differential to be zero we must have the Euler-Lagrange equation

∂L/∂q − (d/dt) ∂L/∂q̇ = 0.  (9.29)

Say that one has a solution of the Euler-Lagrange equation. Then

(d/dt) L = L_q q_t + L_{q̇} q_{tt} + L_t = (d/dt)(q_t L_{q̇}) + L_t.  (9.30)

Thus along a solution

(d/dt) H + ∂L/∂t = 0,  (9.31)

where

H = q_t L_{q̇} − L.  (9.32)

If L_t = 0, then this says that H is constant. This is the energy conservation law.
In the following we want to think of L as a function of q and q̇ and possibly t. Define

p = ∂L/∂q̇.  (9.33)

This is the momentum variable. The momentum variable p is covariant and is dual to the velocity variable q̇, which is contravariant. Then the Euler-Lagrange equation says that along a solution

dp/dt = ∂L/∂q.  (9.34)

9.6 Hamiltonian mechanics

Let

H = pq̇ − L  (9.35)

be the Legendre transform of L, where q̇ is defined in terms of p implicitly by p = ∂L/∂q̇. Here L is a function of q, q̇, and t, while H is a function of q, p, and t. According to the properties of the Legendre transform we have

q̇ = ∂H/∂p.  (9.36)

In fact, the proof is simple if we remember that q is fixed:

∂H/∂p = q̇ + p (∂q̇/∂p) − (∂L/∂q̇)(∂q̇/∂p) = q̇.  (9.37)

It also follows that

∂H/∂q = −∂L/∂q.  (9.38)

This also has an easy proof if we remember that p is fixed, and so q̇ depends on q:

∂H/∂q = p (∂q̇/∂q) − ∂L/∂q − (∂L/∂q̇)(∂q̇/∂q) = −∂L/∂q.  (9.39)

The Euler-Lagrange equation then says that along a solution we have

dp/dt = −∂H/∂q.  (9.40)

Since p = ∂L/∂q̇ determines the function p in terms of q, dq/dt, and t, it follows that along a solution

dq/dt = ∂H/∂p  (9.41)

determines the function dq/dt in terms of q, p, and t. The last two equations are Hamilton's equations. They are a first order system, linear in dq/dt and dp/dt, but in general nonlinear in q, p, and t.
It is easy to check from Hamilton's equations that along a solution

dH/dt = ∂H/∂t.  (9.42)

When H does not depend explicitly on t, then this has an integral in which H is constant. This integral is conservation of energy.

9.7 Kinetic and potential energy

The most classical situation is when

L = (1/2) m q̇² − V(q).  (9.43)

The Euler-Lagrange equation is just Newton's law of motion,

−V′(q) − (d/dt)(m q_t) = 0.  (9.44)

In this problem the momentum is p = mq̇. The Hamiltonian is

H = p²/(2m) + V(q).  (9.45)

In many other situations the relation between p and q̇ may be highly nonlinear; it always allows an equation in which dq/dt is related to q and p.

Let us look at this from the point of view of maximum and minimum. The problem is to find q(t) that minimizes the integral from t_1 to t_2 of (1/2)mq̇² − V(q). Here q(t_1) = q_1 and q(t_2) = q_2 are fixed. This minimization problem seems to have nothing to do with Newton's laws. But it gives the same answer.

Example: Take an example with V(q) < 0. The solution will start at the point q_1 at time t_1. Then it will go reasonably, but not excessively, rapidly to the region where V(q) is large. There it will linger. Then it will fall back to the point q_2, arriving there at time t_2. Clearly there is a tradeoff: one would like the solution to linger as long as possible in the region where V(q) is large, but to satisfy the boundary conditions the solution should not move too fast.

The second differential is determined by the Schrödinger operator

L = −m d²/dt² − V″(q(t)).  (9.46)

Here q(t) is a function of t satisfying the Euler-Lagrange equation and the boundary conditions. So for instance if this operator with Dirichlet boundary conditions at t_1 and t_2 has strictly positive eigenvalues, then the solution will be a minimum. This will happen, for instance, if V″(q) ≤ 0. In many other cases, the solution will not be a minimum of the action, but only a stationary point.
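Hamilton's equations dq/dt = p/m, dp/dt = −V′(q) are easy to integrate numerically. The sketch below is my own illustration (the notes do not discuss numerics): a leapfrog step applied to V(q) = (1/2)kq², for which the conserved energy H = p²/(2m) + V(q) should stay nearly constant along the computed solution.

```python
import math

m, k = 1.0, 4.0      # illustrative mass and spring constant (my choices)
q, p = 1.0, 0.0      # initial position and momentum
dt = 1e-3

def H(q, p):
    # Hamiltonian H = p^2/(2m) + (1/2) k q^2
    return p * p / (2.0 * m) + 0.5 * k * q * q

E0 = H(q, p)
for _ in range(10000):           # integrate up to t = 10
    p -= 0.5 * dt * k * q        # half kick:  dp/dt = -V'(q) = -k q
    q += dt * p / m              # drift:      dq/dt = p/m
    p -= 0.5 * dt * k * q        # half kick
drift = abs(H(q, p) - E0)
```

The leapfrog scheme is symplectic, which is why the energy error stays bounded instead of growing.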
9.8 Problems

Let L be a function of q and q̇ given by

L = (1/2) m q̇² − V(q).

Let S(q) = ∫_{t_1}^{t_2} L dt, with functions q of t satisfying q = q_1 at t = t_1 and q = q_2 at t = t_2.

1. Show that

dS(q)h = (−m d²q/dt² − V′(q), h),

where the functions h satisfy Dirichlet boundary conditions at t_1 and t_2 and the inner product is given by the integral.

2. Consider a q(t) for which dS(q) = 0. Show that

d²S(q)(h, h) = ∫_{t_1}^{t_2} [m (dh/dt)² − V″(q) h²] dt,

where h satisfies Dirichlet boundary conditions at t_1 and t_2.

3. Consider a q(t) for which dS(q) = 0. Show that

d²S(q)(h, h) = (h, [−m d²/dt² − V″(q)] h),

where the operator satisfies Dirichlet boundary conditions at t_1 and t_2.

4. Show that if V(q) is concave down, then the solution q(t) of the variational problem is actually a minimum.

5. Let H = mq̇² − L = (1/2)mq̇² + V(q). Show that H = E along a solution, where E is a constant.

6. From now on take the example V(q) = −(1/2)kq². Here k > 0. Note the sign. Let ω = √(k/m). Show that q = C sinh(ωt) is a solution, and find the constant C in terms of E. We are interested in solutions with E > 0.

7. Take t_1 = −T and t_2 = T, and take q_1 = −a and q_2 = a. Fix a. Write the boundary condition a = C sinh(ωT) as a relation between T and E. Show that T → 0 implies E → ∞, while T → ∞ implies E → 0.

8. Interpret the result of the last problem intuitively in terms of particle motion satisfying conservation of energy. Interpret the result of the same problem intuitively in terms of a minimization problem.
9.9 The path integral

Let us return to the problem of evaluating the quantum mechanical time evolution exp(−itH/ℏ). Here the total energy operator H = H_0 + V is the sum of two noncommuting operators, corresponding to kinetic energy and potential energy, where H_0 = p²/(2m) = −ℏ²/(2m) d²/dx². The quantity of interest is

(exp(−itH/ℏ) u)(x) = u(x, t).

The problem is easy when we just have kinetic energy. Then we must evaluate exp(−itH_0/ℏ). In the Fourier transform representation this is multiplication by exp(−it(ℏ/m)k²/2). So the solution is convolution by the inverse Fourier transform. This is like the heat equation with complex diffusion coefficient iσ² = i(ℏ/m): the kernel is 1/√(2πiσ²t) exp(−x²/(2iσ²t)) = 1/√(2πi(ℏ/m)t) exp(ix²/(2(ℏ/m)t)). Explicitly,
(exp(−itH_0/ℏ) u)(x) = ∫_{−∞}^{∞} exp( im(x − x′)²/(2ℏt) ) u(x′) dx′/√(2πi(ℏ/m)t).  (9.47)

The calculation of the exponential is also easy when we just have potential energy. In that case

(exp(−itV/ℏ) u)(x) = exp(−itv(x)/ℏ) u(x).  (9.48)

The Trotter product formula is a remarkable formula that expresses the result for the sum H = H_0 + V in terms of the separate results for H_0 and for V. The formula says that

exp(−itH/ℏ) u = lim_{n→∞} [exp(−i(t/n)V/ℏ) exp(−i(t/n)H_0/ℏ)]ⁿ u.  (9.49)

It may be proved under various circumstances, for example when V is a bounded operator.

We can write out the Trotter product formula in detail using the results obtained before for H_0 and for V separately. Write Δt = t/n. This gives

u(x, t) = lim_{n→∞} ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} exp( (i/ℏ) Σ_{j=0}^{n−1} [ (m/2)((x_{j+1} − x_j)/Δt)² − V(x_j) ] Δt ) u(x_0) (dx_{n−1} ··· dx_0)/(√(2πi(ℏ/m)Δt))ⁿ,  (9.50)

where x_n = x. So far this has been rigorous. However now we follow Feynman and take the formal limit as n → ∞ and Δt → 0 with nΔt = t fixed. This gives

(exp(−itH/ℏ) u)(x) = ∫ exp( (i/ℏ) ∫_0^t [ (m/2)(dx(t′)/dt′)² − V(x(t′)) ] dt′ ) u(x_0) Dx,  (9.51)

where the integral is over all paths x from some x_0 at time 0 to the fixed final x at time t.
This can also be written

(exp(−itH/ℏ) u)(x) = ∫ exp( (i/ℏ) ∫_0^t L(x(t′), dx(t′)/dt′) dt′ ) u(x_0) Dx,  (9.52)

where

L(x, dx/dt) = (1/2) m (dx/dt)² − V(x)  (9.53)

is the Lagrangian. Thus the integrand in the path integral is the exponential of i times the action divided by ℏ.

This expression goes some way toward an explanation of the principle of stationary action. Consider a part of the integral near a path that is not a critical point of the action. Then nearby paths will have considerably different values of the action, and after division by ℏ the phases will be very different. So there will be a lot of cancelation. On the other hand, the part of the integral near a path that is a critical point will contribute more or less the same value of the action. So there will be no cancelation.

9.10 Appendix: Lagrange multipliers

In this section y will represent the coordinates y_1, ..., y_n.
We are interested in finding the critical points of a scalar function F(y) = F(y_1, ..., y_n).

Consider first the problem of finding a critical point of F(y) with no constraints. The usual method is to compute the differential and set it equal to zero, so that the equation to be solved is dF(y) = 0. This fundamental equation is an equation for differential forms, and it takes the same form in every coordinate system.

Consider the problem of finding a critical point of F(y) subject to a constraint G(y) = κ. The method of Lagrange multipliers is to say that the differential of F(y) without the constraint must satisfy

dF(y) = λ dG(y)  (9.54)

for some λ.

Example: Maximize y_1 y_2 subject to y_1² + y_2² = κ. The equation is y_2 dy_1 + y_1 dy_2 = λ(2y_1 dy_1 + 2y_2 dy_2). This gives y_2 = 2λy_1 and y_1 = 2λy_2. Eliminating λ we get y_1² = y_2². Combine this with y_1² + y_2² = κ. We get y_1 = ±√(κ/2) and y_2 = ±√(κ/2). The values of y_1 y_2 at these points are ±κ/2. If you care, you can also compute that λ = ±1/2.

Say that ȳ is a critical point, so in particular it satisfies the constraint G(ȳ) = κ. Then dG(ȳ) = dκ, and it follows that dF(ȳ) = λ dG(ȳ) = λ dκ. That is,

λ = dF(ȳ)/dκ.  (9.55)

The conclusion is that the Lagrange multiplier λ is the rate of change of the value of the function at the critical point as a function of the constraint parameter. The function can vary only by relaxing the constraint, so the change in the function must be proportional to the change in the constraint.

Example: In the last example the derivative of the maximum value ±κ/2 with respect to κ is λ = ±1/2.

In calculations, one wants to maximize F(y) subject to the constraint G(y) = κ. The idea is to take λ arbitrary. Set F̄(y) = F(y) − λG(y) and require dF̄(y) = 0. However one also has the constraint equation for y and λ. These two are then solved simultaneously. This gives the above equation for y and λ.

The method of Lagrange multipliers extends to the case when there are several constraints. Consider the problem of finding a critical point of F(y) subject to constraints G_1(y) = κ_1, ..., G_m(y) = κ_m. The method of Lagrange multipliers is to say that the differential of F(y) without the constraints must satisfy

dF(y) = Σ_{j=1}^{m} λ_j dG_j(y)  (9.56)

for some λ_1, ..., λ_m. Say that ȳ is a critical point, so in particular it satisfies the constraints G_j(ȳ) = κ_j. Then dG_j(ȳ) = dκ_j, and it follows that dF(ȳ) = Σ_j λ_j dG_j(ȳ) = Σ_j λ_j dκ_j. Thus

λ_j = ∂F(ȳ)/∂κ_j,  (9.57)

with the other constraint parameters fixed. The conclusion is that the Lagrange multiplier λ_j is the rate of change of the value of the function at the critical point as a function of the constraint parameter κ_j. The change in the function must be proportional to the change in the constraints; the function can vary only by relaxing the constraints.
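The worked example can be confirmed by brute force. The check below is my own: it parameterizes the circle y_1² + y_2² = κ by an angle and scans for the maximum of y_1 y_2, which should be κ/2, attained where y_1 = y_2.

```python
import math

kappa = 2.0
r = math.sqrt(kappa)
steps = 200000

# Scan F = y1*y2 around the circle y1 = r cos t, y2 = r sin t.
best_val, best_t = max(
    ((r * math.cos(t)) * (r * math.sin(t)), t)
    for t in (i * 2.0 * math.pi / steps for i in range(steps))
)
y1, y2 = r * math.cos(best_t), r * math.sin(best_t)
```

The maximum value κ/2 also confirms the multiplier interpretation: d(κ/2)/dκ = 1/2 = λ.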
Chapter 10

Perturbation Theory

10.1 The implicit function theorem: scalar case

Let f(u, ε) be a function of u and ε. Let f_u(u, ε) be the partial derivative of f as a function of u. The assumption for the implicit function theorem is that f(u_0, 0) = 0 and the partial derivative f_u(u_0, 0) ≠ 0. There are some other technical assumptions. The conclusion is that the equation

f(u, ε) = 0  (10.1)

defines u as a function of ε for ε sufficiently near 0, in such a way that ε = 0 is mapped into u_0.

The intuition behind the theorem is that one can expand

f(u, ε) ≈ f_u(u_0, 0)(u − u_0) + f_ε(u_0, 0) ε.  (10.2)

Then we can solve

u ≈ u_0 − f_u(u_0, 0)^{−1} f_ε(u_0, 0) ε.  (10.3)

For small ε this is a very good approximation. In fact, this says that at the point ε = 0 we have

du/dε = −f_u(u_0, 0)^{−1} f_ε(u_0, 0).  (10.4)

This result is called first order perturbation theory.

For practical calculations one wants to compute higher derivatives u^{(n)} as functions of ε. These are used in the Taylor expansion

u = Σ_{n=0}^{∞} u^{(n)} εⁿ/n!.  (10.5)

In first order we have

f_u u^{(1)} + f_ε = 0,  (10.6)
which gives the first order result

u^{(1)} = −f_u^{−1} f_ε.  (10.7)

Usually we are interested in evaluating this expression at u = u_0 and ε = 0. The second order result is obtained by differentiating the first order equation. This gives

f_u u^{(2)} + f_{uu} (u^{(1)})² + 2 f_{uε} u^{(1)} + f_{εε} = 0.  (10.8)

Thus the result of second order perturbation theory is that

u^{(2)} = −f_u^{−1} [f_{uu} (u^{(1)})² + 2 f_{uε} u^{(1)} + f_{εε}].  (10.9)

Again we are interested in evaluating this expression at u = u_0 and ε = 0.

The general pattern is this. The nth derivative is given by

f_u u^{(n)} + r_n = 0.  (10.10)

Differentiate again. This gives

f_u u^{(n+1)} + [f_{uu} u^{(1)} + f_{uε}] u^{(n)} + [r_{nu} u^{(1)} + r_{nε}] = 0.  (10.11)

Thus at every stage one only has to invert the first derivative f_u.

Example: Here is a simple example. We want to solve

u = a + ε g(u)  (10.12)

to find u as a function of ε. The first derivative is u′ = g(u) + ε g′(u)u′. The second derivative is u″ = 2g′(u)u′ + ε g″(u)u′² + ε g′(u)u″. The third derivative is u‴ = 3g″(u)u′² + 3g′(u)u″ + ε times other junk. Evaluate these at ε = 0, that is, at u = a. This gives u′ = g(a), u″ = 2g(a)g′(a), and u‴ = 3g(a)²g″(a) + 6g(a)g′(a)². It is difficult to see the pattern, at least in this case. However in the next section we will see that there is one.

Example: Another popular way of solving the problem is to substitute u = a + u′ε + (1/2)u″ε² + (1/6)u‴ε³ + ··· in the equation. This gives

a + u′ε + (1/2)u″ε² + (1/6)u‴ε³ + ··· = a + ε[g(a) + g′(a)(u′ε + (1/2)u″ε² + ···) + (1/2)g″(a)(u′ε + ···)² + ···].

Equating powers of ε gives the same equations, and these are solved recursively in the same way.

10.2 Problems

There is at least one case when one can evaluate Taylor coefficients to all orders. Say that x is a fixed parameter and one wants to solve

u = x + t g(u)  (10.13)

for u as a function of t. When t = 0 the solution is u = x. The Lagrange expansion says that for arbitrary n ≥ 1 we have

dⁿu/dtⁿ |_{t=0} = d^{n−1}/dx^{n−1} [g(x)ⁿ].  (10.14)
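The derivative formulas of the first example can be spot-checked. The choices g(u) = u² and a = 1 below are mine, made so that the exact solution of u = a + εg(u) is available from the quadratic formula.

```python
import math

# For g(u) = u^2, a = 1: g(a) = 1, g'(a) = 2, g''(a) = 2, so the formulas give
# u' = g(a) = 1,  u'' = 2 g(a) g'(a) = 4,  u''' = 3 g(a)^2 g''(a) + 6 g(a) g'(a)^2 = 30.
a = 1.0
g, gp, gpp = 1.0, 2.0, 2.0
u1 = g
u2 = 2.0 * g * gp
u3 = 3.0 * g * g * gpp + 6.0 * g * gp * gp

eps = 1e-2
series = a + u1 * eps + u2 * eps ** 2 / 2.0 + u3 * eps ** 3 / 6.0
# Exact: the smaller root of eps*u^2 - u + 1 = 0.
exact = (1.0 - math.sqrt(1.0 - 4.0 * eps)) / (2.0 * eps)
err = abs(series - exact)  # should be of order eps**4
```

The third order series reproduces the exact solution up to an O(ε⁴) remainder.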
So the expansion to all orders is

u = x + t g(x) + (t²/2!) (d/dx) g(x)² + (t³/3!) (d²/dx²) g(x)³ + ··· + (tⁿ/n!) (d^{n−1}/dx^{n−1}) g(x)ⁿ + ···.  (10.15)

1. Let w = g(u). Show that the equation becomes

w = g(x + wt).  (10.16)

2. Show that the function w is the solution of the partial differential equation

∂w/∂t = w ∂w/∂x  (10.17)

with initial condition w = g(x) at t = 0.

3. Prove the conservation laws

∂wⁿ/∂t = (n/(n + 1)) ∂w^{n+1}/∂x.  (10.18)

4. Prove that for n ≥ 1 we have

∂^{n−1}w/∂t^{n−1} = (1/n) ∂^{n−1}wⁿ/∂x^{n−1}.  (10.19)

5. Prove that at t = 0

w = g(x) + (t/2) (d/dx) g(x)² + (t²/6) (d²/dx²) g(x)³ + ··· + (t^{n−1}/n!) (d^{n−1}/dx^{n−1}) g(x)ⁿ + ···.  (10.20)

6. Use u = x + tw to prove the Lagrange expansion.

Note: The partial differential equation has a physical interpretation. If v = −w, then the equation is

∂v/∂t + v ∂v/∂x = 0.  (10.21)

This is the equation for the velocity of a gas of particles moving freely in one dimension. The motion of a particle moving with the gas is given by dx/dt = v. The acceleration of a particle is

dv/dt = ∂v/∂t + (dx/dt) ∂v/∂x = 0.  (10.22)

That is why it is considered free particle motion. Since v is constant along particle paths x = x_0 + vt, we have that v at arbitrary x, t is given in terms of the initial v = h(x_0) at time zero by v = h(x_0) = h(x − vt). So the solution of the equation is

v = h(x − vt),  (10.23)

with initial condition v = h(x) at time zero.
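The Lagrange expansion can be tested concretely. With the choice g(u) = u² (mine, for tractability) one has d^{n−1}/dx^{n−1} x^{2n} = (2n)!/(n+1)! x^{n+1}, so the expansion of the solution of u = x + tu² has Catalan number coefficients, and the series can be compared to the exact root.

```python
import math

def catalan(n):
    # C_n = (2n)! / (n! (n+1)!)
    return math.factorial(2 * n) // (math.factorial(n) * math.factorial(n + 1))

x, t = 0.5, 0.1   # 4 t x = 0.2 < 1, so the series converges
series = sum(catalan(n) * t ** n * x ** (n + 1) for n in range(25))

# Exact: the smaller root of t*u^2 - u + x = 0.
exact = (1.0 - math.sqrt(1.0 - 4.0 * t * x)) / (2.0 * t)
err = abs(series - exact)
```

Twenty-five terms already agree with the closed form far beyond rounding level for these values.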
10.3 The implicit function theorem: systems

We can take F(u, ε) = 0 to be a system of n equations for n variables u and an extra variable ε. Or more generally, we can take F(u, ε) to be a function of u and ε, where u ranges over a region U in some Banach space E, and F(u, ε) takes values in a Banach space E′. The parameter ε is still real. We shall often abbreviate the function by F.

Let F_u be the partial derivative of F as a function of u. In the case of a system this is the n by n matrix of partial derivatives of the components of F with respect to the components of u. More generally, it is a function on U with values that are linear transformations from E to E′.

The assumption for the implicit function theorem is that F(u_0, 0) = 0 and the partial derivative F_u(u_0, 0) has a bounded inverse. There are some other technical assumptions. The conclusion is that the equation

F(u, ε) = 0  (10.24)

defines u as a function of ε for ε sufficiently near 0, in such a way that ε = 0 is mapped into u_0.
For practical calculations one wants to compute higher derivatives u^{(n)} as functions of ε. These are used in the Taylor expansion

u = Σ_{n=0}^{∞} u^{(n)} εⁿ/n!.  (10.25)

In first order we have

F_u u^{(1)} + F_ε = 0,  (10.26)

which gives the first order result

u^{(1)} = −F_u^{−1} F_ε.  (10.27)

Note that F_ε has values in E′, while F_u^{−1} has values that are linear transformations from E′ to E. So the right hand side has values in E. Usually we are interested in evaluating this expression at u = u_0 and ε = 0.

The second order result is obtained by differentiating the first order equation. This gives

F_u u^{(2)} + F_{uu}(u^{(1)}, u^{(1)}) + 2 F_{uε} u^{(1)} + F_{εε} = 0.  (10.28)

Thus the result of second order perturbation theory is that

u^{(2)} = −F_u^{−1} [F_{uu}(u^{(1)}, u^{(1)}) + 2 F_{uε} u^{(1)} + F_{εε}].  (10.29)

Note that F_{uu} has values that are bilinear functions from E × E to E′. Again the right hand side has values in E. Again we are interested in evaluating this expression at u = u_0 and ε = 0.
The general pattern is this. The nth derivative is given by

F_u u^{(n)} + R_n = 0.  (10.30)

Differentiate again. This gives

F_u u^{(n+1)} + [F_{uu} u^{(1)} + F_{uε}] u^{(n)} + [R_{nu} u^{(1)} + R_{nε}] = 0.  (10.31)

Thus at every stage one only has to invert the first derivative F_u.

10.4 Nonlinear differential equations

Let us look at the simplest nonlinear differential equation

Lu = ε g(u).  (10.32)

Here L is the restriction of a linear operator to satisfy an inhomogeneous boundary condition. For example, we could have L = ∂/∂t + 1 acting on functions with u(0) = 1. When ε = 0 there is a solution u = u_0. In the example u_0(t) = e^{−t}.

This is very close in form to the scalar equation examined in the first two sections. This equation may be solved by the usual procedure. The derivative of L is L_0, where L_0 is the operator defined with zero boundary conditions. For instance, we could have L_0 = ∂/∂t + 1 acting on functions h with h(0) = 0. Suppose that L_0 has an inverse. Thus the first order perturbation equation is

L_0 u′ = g(u) + ε g′(u)u′,

that is, u′ = L_0^{−1} g(u) + ε L_0^{−1}[g′(u)u′]. The second derivative is u″ = 2L_0^{−1}[g′(u)u′] + ε L_0^{−1}[g″(u)u′²] + ε L_0^{−1}[g′(u)u″]. The third derivative is u‴ = 3L_0^{−1}[g″(u)u′²] + 3L_0^{−1}[g′(u)u″] + ε times other junk. Evaluate these at ε = 0, that is, at u = u_0. This gives

u′ = L_0^{−1} g(u_0),  (10.33)

u″ = 2 L_0^{−1}[g′(u_0) L_0^{−1} g(u_0)],  (10.34)

and

u‴ = 3 L_0^{−1}[g″(u_0)(L_0^{−1} g(u_0))²] + 6 L_0^{−1}[g′(u_0) L_0^{−1}(g′(u_0) L_0^{−1} g(u_0))].  (10.35)

Notice how the linear operators occur in various places, making it impossible to do the simplification that we found in the scalar case. One can continue in this way. Thus the hierarchy of equations begins

Lu_0 = 0, L_0 u′ = g(u_0), L_0 u″ = 2g′(u_0)u′.

It is possible to think of this in another way.
Write u = u_0 + w, where w satisfies zero boundary conditions. Then the equation Lu = ε g(u) becomes L_0 w = ε g(u), that is, u = u_0 + ε L_0^{−1} g(u).
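A concrete instance of the example in this section can be checked in closed form. With g(u) = u² (my choice; the notes leave g general), L = d/dt + 1, u(0) = 1, one has u_0 = e^{−t}, and L_0 u′ = g(u_0) = e^{−2t} with u′(0) = 0 gives u′(t) = e^{−t} − e^{−2t}. The equation is of Bernoulli type, so the exact solution is also available for comparison.

```python
import math

eps = 1e-2
t = 1.0

u0 = math.exp(-t)                         # zero order: solves Lu0 = 0, u0(0) = 1
u1 = math.exp(-t) - math.exp(-2.0 * t)    # first order: solves L0 u1 = exp(-2t), u1(0) = 0
approx = u0 + eps * u1

# Exact solution of u' + u = eps*u^2, u(0) = 1 (substitute z = 1/u):
exact = 1.0 / ((1.0 - eps) * math.exp(t) + eps)
err = abs(approx - exact)  # should be of order eps**2
```

One can verify u1 by hand: (d/dt + 1)(e^{−t} − e^{−2t}) = e^{−2t}, as required.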
10.5 A singular perturbation example

In quantum mechanics one often wants to solve a problem of the form

−(ℏ²/2m) u″ + v(x)u = λu.  (10.36)

Here v(x) is a given real function. The constant m > 0 is fixed, and the constant ℏ > 0 is the perturbation parameter. In many contexts λ plays the role of an eigenvalue parameter.

The characteristic feature of this problem is that the zero order perturbation is not apparent. Thus setting ℏ = 0 it is not clear that there will be a solution at all. The resolution of this problem is to write the equation in new variables. Thus

u = A e^{iS/ℏ}.  (10.37)

The equation becomes

(1/2m)(S′)² + v(x) − λ = (ℏ²/2m) A″/A  (10.38)

together with

2A′S′ + AS″ = 0.  (10.39)

Now we can recognize the ℏ = 0 limit as

(1/2m)(S′)² + v(x) = λ.  (10.40)

The other equation may be written

(A²S′)′ = 0.  (10.41)

These equations have a physical interpretation. The first equation is the classical equation for conservation of energy. The derivative S′ determines a velocity S′/m that depends on position. The function A² is a density. The second equation is the conservation law that says that the current A²S′ is constant in space. Of course these equations are only valid in the zero order approximation, which in this case is called the WKB approximation. For simplicity think of the case when λ > v(x).

This approximation is relevant to a heuristic understanding of Schrödinger operators of the form

H = −(ℏ²/2m) d²/dx² + v(x).  (10.42)

The Hilbert space is the space of L² functions on the line. If v(x) is bounded below, then one expects this to be a well-defined selfadjoint operator. In fact, for λ < 0 sufficiently negative, one expects (H − λI)^{−1} to be a well-defined bounded selfadjoint operator.

When v(x) is very badly unbounded below, then something quite different happens. The reason is connected with the behavior of solutions of the differential equation at infinity.
Let us look at the example of v(x) ∼ −x^{2n}. According to the WKB approximation, we have S′(x) ∼ ±xⁿ. It follows that A² ∼ x^{−n}. These solutions are highly oscillatory and decrease at infinity at a moderate rate. Thus it is plausible that if n > 1 then every solution is in L². If this is indeed the case, then the operator H is not selfadjoint. The problem is that when one wants to define a Green's function, the solutions of the differential equation should either decay rapidly at infinity or grow rapidly at infinity. The Green's function is then defined by the solutions that decay rapidly at infinity: there is a solution of Hu = λu that is in L² at −∞ and another solution of the same equation that is in L² at +∞, and the two solutions are matched by a jump condition to form the kernel of the integral operator. In the badly unbounded case, however,
there are two linearly independent solutions, and they are both in L² at ±∞. So there is no boundary condition at infinity that determines the Green's function in a natural way.

10.6 Eigenvalues and eigenvectors

We have an operator H_0 + εV. We are interested in an isolated eigenvalue λ of H_0 of multiplicity 1. The corresponding eigenvector is p. Thus p is an eigenvector with (H_0 − λ)p = 0. The adjoint operator has eigenvalue λ̄ with eigenvector q. It is only determined up to a constant multiple; it is convenient to choose p and q so that the inner product (q, p) = 1. The associated projection onto the eigenspace is P satisfying P² = P. It is given by Pu = (q, u)p. We also have the reduced resolvent S, with SP = PS = 0 and S(H_0 − λI) = I − P.

If we indicate the dependence on ε explicitly, then the equation we want to solve is

(H_0 + εV − λ(ε)) p(ε) = 0.  (10.43)

We would like to find the perturbation expansion in terms of the λ and p and q and S associated with ε = 0.

The following discussion is meant to establish two formulas, one for the eigenvalue and one for the eigenvector (not normalized). These formulas are so important that they should be memorized. The first is the formula for the eigenvalue to second order:

λ(ε) = λ + (q, Vp) ε − (q, VSVp) ε² + ···.  (10.44)

The other is the formula for the (unnormalized) eigenvector to second order:

p(ε) = p − SVp ε + [SVSVp − S²Vp (q, Vp)] ε² + ···.  (10.45)
We now proceed to the derivations. Denote differentiation with respect to ε by a dot. Differentiate the eigenvector equation with respect to ε. This gives

(H0 + εV − λ)ṗ − λ̇p + V p = 0.   (10.49)

Take the inner product with q on the left. This gives

λ̇ = (q, V p).   (10.50)

This gives the first order change of the eigenvalue. That is,

λ(ε) = λ + (q, V p)ε + ···.   (10.51)

Next apply I − P on the left. This gives

(I − P)(H0 + εV − λ)ṗ + (I − P)V p = 0,   (10.52)

or

(I − P)ṗ = −SV p.   (10.53)

Thus

ṗ = −SV p + c1 p.   (10.54)

Notice that the derivative of the eigenvector is not uniquely determined. It is ordinarily harmless to take c1 = 0. This gives the first order change of the eigenvector

p(ε) = p + [−SV p + c1 p]ε + ···.   (10.55)

Now differentiate again. This gives

(H0 + εV − λ)p̈ − λ̈p + 2V ṗ − 2λ̇ṗ = 0.   (10.56)

Now take the inner product with q on the left. This gives

−λ̈ + 2(q, V ṗ) − 2λ̇(q, ṗ) = 0.   (10.57)

Substituting ṗ = −SV p + c1 p and using (q, SV p) = 0 and (q, p) = 1, the terms involving c1 cancel. That is,

½λ̈ = −(q, V SV p).   (10.58)

This gives the second order change of the eigenvalue. That is,

λ(ε) = λ + (q, V p)ε − (q, V SV p)ε² + ···.   (10.59)

Now apply I − P on the left. This gives

(I − P)(H0 + εV − λ)p̈ + 2(I − P)V ṗ − 2λ̇(I − P)ṗ = 0.   (10.60)

Apply S on the left and use S(I − P) = S. Thus

½(I − P)p̈ = SV SV p − c1 SV p − S²V p(q, V p).   (10.61)
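The first order eigenvector change (10.55) with c1 = 0 can also be checked numerically. The 3×3 matrices below are hypothetical and numpy is assumed; the computed eigenvector is rescaled so that (q, p(ε)) = 1, which is the normalization corresponding to the choice c1 = 0.

```python
import numpy as np

# Hypothetical selfadjoint example with an isolated eigenvalue lam = 0 at n = 0.
lam = np.array([0.0, 2.0, 5.0])
H0 = np.diag(lam)
V = np.array([[1.0, 0.4, 0.1],
              [0.4, 0.0, 0.7],
              [0.1, 0.7, 2.0]])
n = 0
p = np.eye(3)[:, n]
S = np.diag([0.0 if j == n else 1.0 / (lam[j] - lam[n]) for j in range(3)])

eps = 1e-3
w, U = np.linalg.eigh(H0 + eps * V)
u = U[:, n] / U[n, n]                    # normalize so that (q, u) = 1
first_order = p - eps * (S @ V @ p)      # p(eps) = p - SVp eps + ...
print(np.linalg.norm(u - first_order))   # O(eps^2)
```

Note that (q, SV p) = 0, so p − εSV p automatically satisfies (q, p(ε)) = 1; that is why the rescaled numerical eigenvector is the right object to compare against.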
Hence

½p̈ = SV SV p − S²V p(q, V p) − c1 SV p + ½c2 p.   (10.62)

This gives the second order change of the eigenvector

p(ε) = p + [−SV p + c1 p]ε + [SV SV p − S²V p(q, V p) − c1 SV p + ½c2 p]ε² + ···.   (10.63)

Now let us change notation. We will think of a basis pn for the Hilbert space and a dual basis qm*, with (qm, pn) = δmn. We write qm* for the functional that takes u to (qm, u). We take the pn to be the eigenvectors of H0. Thus H0 pn = λn pn. Then the various operators have special forms. The operator Pn = pn qn* is the spectral projection associated with the nth eigenvalue λn of H0. The operator Sn = Σ_{j≠n} pj qj*/(λj − λn). Then the second order expression for the eigenvalue can be written as

λn(ε) = λn + (qn, V pn)ε − (qn, V Sn V pn)ε² + ···.   (10.64)

Even more explicitly, this is

λn(ε) = λn + (qn, V pn)ε − Σ_{j≠n} (qn, V pj)(qj, V pn)/(λj − λn) ε² + ···.   (10.65)

The second order expression for the eigenvector pn(ε) is (taking c1 = 0)

pn(ε) = pn − Sn V pn ε + [Sn V Sn V pn − Sn²V pn (qn, V pn) + ½c2 pn]ε² + ···.   (10.66)

If H0 has discrete spectrum, then (leaving out the last term) this may be written

pn(ε) = pn − Σ_{j≠n} (qj, V pn)/(λj − λn) pj ε   (10.67)
        + Σ_{j≠n} [ Σ_{k≠n} (qj, V pk)(qk, V pn)/((λj − λn)(λk − λn)) − (qj, V pn)(qn, V pn)/(λj − λn)² ] pj ε² + ···.   (10.68)

10.7 The selfadjoint case

In the selfadjoint case q = p, and it is natural to normalize so that (p(ε), p(ε)) = 1 to second order. This gives c1 = 0 and c2 = −(p, V S²V p). So in this case

λ(ε) = λ + (p, V p)ε − (p, V SV p)ε² + ···   (10.69)

and

p(ε) = p − SV p ε + [SV SV p − S²V p(p, V p) − ½(p, V S²V p)p]ε² + ···.   (10.70)
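The explicit sum in (10.65) is easy to test numerically in the selfadjoint case, where qj = pj. The 4×4 spectrum and perturbation below are hypothetical, and numpy is assumed.

```python
import numpy as np

# Hypothetical nondegenerate spectrum and a fixed symmetric perturbation.
lam = np.array([0.0, 1.0, 2.5, 4.0])
H0 = np.diag(lam)
V = np.array([[0.2,  0.3, 0.1, 0.4],
              [0.3, -0.5, 0.2, 0.1],
              [0.1,  0.2, 0.7, 0.3],
              [0.4,  0.1, 0.3, 0.0]])
n, eps = 1, 1e-3

# lambda_n(eps) = lambda_n + (pn,V pn) eps
#                 - sum_{j != n} (pn,V pj)(pj,V pn)/(lambda_j - lambda_n) eps^2
second_order = lam[n] + eps * V[n, n] - eps**2 * sum(
    V[n, j] * V[j, n] / (lam[j] - lam[n]) for j in range(4) if j != n)

w = np.linalg.eigvalsh(H0 + eps * V)     # sorted, so w[n] is the level near lam[n]
print(abs(w[n] - second_order))          # error is O(eps^3)
```

Note the sign structure: levels below λn (here λ0 = 0) contribute with λj − λn < 0 and push the eigenvalue up, while levels above push it down.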
If λ = λn and p = pn, we can write the coefficient c2 even more explicitly as

c2 = − Σ_{j≠n} (pn, V pj)(pj, V pn)/(λj − λn)².   (10.71)

Example: Take

H0 = [ 1  0 ; 0  −1 ]   (10.72)

and

V = [ 0  1 ; 1  0 ].   (10.73)

The eigenvalues are λ(ε) = ±√(1 + ε²). However it is interesting to compare this exact result with the result of second order perturbation theory. Let us look at the perturbation of the eigenvalue −1 of H0. For this eigenvalue the spectral projection is

P = [ 0  0 ; 0  1 ]   (10.74)

and the eigenvector p is the second column of P. The reduced resolvent is

S = [ 1/(1−(−1))  0 ; 0  0 ] = [ ½  0 ; 0  0 ].   (10.75)

So the coefficient in second order perturbation theory is

−(p, V SV p) = −½.   (10.76)

Thus

λ(ε) = −1 − ½ε² + ···.   (10.77)

10.8 The anharmonic oscillator

In this section we let

H0 = ½(−d²/dx² + x² − 1).   (10.78)

The perturbation V is multiplication by

v(x) = x⁴.   (10.79)

Let

A = (1/√2)(x + d/dx)   (10.80)

and

A* = (1/√2)(x − d/dx).   (10.81)
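The 2×2 example can be confirmed numerically. This is a sketch assuming numpy, not part of the original notes: the exact lower eigenvalue is −√(1 + ε²), and its difference from the second order result −1 − ½ε² is of order ε⁴.

```python
import numpy as np

H0 = np.array([[1.0, 0.0], [0.0, -1.0]])
V = np.array([[0.0, 1.0], [1.0, 0.0]])

for eps in [0.1, 0.01]:
    exact = np.linalg.eigvalsh(H0 + eps * V)[0]   # lower eigenvalue, -sqrt(1+eps^2)
    approx = -1.0 - eps**2 / 2.0                  # second order perturbation theory
    print(eps, exact - approx)                    # difference is O(eps^4)
```

Shrinking ε by a factor of 10 shrinks the printed difference by about 10⁴, matching the ε⁴/8 term in the Taylor expansion of −√(1 + ε²).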
Then AA* = A*A + I and H0 = A*A. Furthermore, the solution of Au0 = 0 is

u0(x) = π^{−1/4} e^{−x²/2}.   (10.82)

Define for each n = 0, 1, 2, 3, . . . the vector

un = (1/√n!) (A*)ⁿ u0.   (10.83)

Then the un form an orthonormal basis consisting of eigenvectors of H0,

H0 un = n un.   (10.84)

We have A* um = √(m + 1) um+1 and A um = √m um−1.

Say that we are interested in a particular n. The corresponding reduced resolvent Sn is given by

Sn um = 1/(m − n) um   (10.85)

for m ≠ n, and by Sn un = 0.

To compute the first order perturbation coefficient (un, x⁴ un), note that the operator multiplication by x may be expressed in terms of A and A* by

x = (1/√2)(A + A*).   (10.86)

This allows us to calculate the matrix elements of x. These are given by

(um+1, x um) = (1/√2)(um+1, A* um) = √((m + 1)/2)   (10.87)

and

(um−1, x um) = (1/√2)(um−1, A um) = √(m/2).   (10.88)

So each transition is to a neighboring natural number.

As a warmup, let us compute the matrix element (un, x² un). After two steps one is either at un−2, at un, or at un+2. So (un, x² un) is the sum of two possible transition paths from n back to n, one up-down and one down-up. The result is (n + 1)/2 + n/2 = (2n + 1)/2.

To compute (un, x⁴ un), we need to look at all transitions that go from n to n in 4 steps. Since the intermediate state after two steps must be un−2, un, or un+2, this is

(un, x⁴ un) = (un, x² un−2)² + (un, x² un)² + (un, x² un+2)²,   (10.89)

which is

n(n − 1)/4 + ((2n + 1)/2)² + (n + 1)(n + 2)/4 = (6n² + 6n + 3)/4.   (10.90)

Higher order perturbations may be computed with more labor. It turns out that the perturbation series in ε does not converge. Some intuition for this is provided by the observation that H0 + εV is not a selfadjoint operator for ε < 0. So for ε < 0 the eigenvalue problem is meaningless. On the other hand, if the Taylor series had a positive radius of convergence, then it would converge also for some values of ε < 0.
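The matrix element computations above can be checked with a truncated matrix for x in the basis un. This is a sketch assuming numpy; the truncation at size N only affects entries near the edge of the matrix, so the diagonal entries for small n are exact.

```python
import numpy as np

N = 40                                   # truncation size, N >> n
X = np.zeros((N, N))
for m in range(N - 1):
    # (u_{m+1}, x u_m) = (u_m, x u_{m+1}) = sqrt((m+1)/2)
    X[m + 1, m] = X[m, m + 1] = np.sqrt((m + 1) / 2.0)

X2 = X @ X
X4 = X2 @ X2
for n in range(5):
    print(n, X2[n, n], (2 * n + 1) / 2.0)                # (u_n, x^2 u_n)
    print(n, X4[n, n], (6 * n**2 + 6 * n + 3) / 4.0)     # (u_n, x^4 u_n)
```

The agreement reflects exactly the path counting in the text: X2[n, n] sums the up-down and down-up paths, and X4[n, n] sums the squares of the three two-step matrix elements.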