Real Variables
Shlomo Sternberg
May 10, 2005
Introduction.
I have taught the beginning graduate course in real variables and functional
analysis three times in the last five years, and this book is the result. The
course assumes that the student has seen the basics of real variable theory and
point set topology. The elements of the topology of metric spaces are presented
(in the nature of a rapid review) in Chapter I.
The course itself consists of two parts: 1) measure theory and integration,
and 2) Hilbert space theory, especially the spectral theorem and its applications.
In Chapter II I do the basics of Hilbert space theory, i.e. what I can do
without measure theory or the Lebesgue integral. The hero here (and perhaps
for the first half of the course) is the Riesz representation theorem. Included
is the spectral theorem for compact self-adjoint operators and applications of
this theorem to elliptic partial differential equations. The pde material closely
follows the treatment by Bers and Schechter in Partial Differential Equations by
Bers, John and Schechter, AMS (1964).
Chapter III is a rapid presentation of the basics about the Fourier transform.
Chapter IV is concerned with measure theory. The first part follows Carathéodory's
classical presentation. The second part, dealing with Hausdorff measure and dimension, Hutchinson's theorem and fractals, is taken in large part from the book
by Edgar, Measure, Topology, and Fractal Geometry, Springer (1991).
This book contains many more details and beautiful examples and pictures.
Chapter V is a standard treatment of the Lebesgue integral.
Chapters VI and VIII deal with abstract measure theory and integration.
These chapters basically follow the treatment by Loomis in his Abstract Harmonic Analysis.
Chapter VII develops the theory of Wiener measure and Brownian motion
following a classical paper by Ed Nelson published in the Journal of Mathematical Physics in 1964. Then we study the idea of a generalized random process
as introduced by Gelfand and Vilenkin, but from a point of view taught to us
by Dan Stroock.
The rest of the book is devoted to the spectral theorem. We present three
proofs of this theorem. The first, which is currently the most popular, derives
the theorem from the Gelfand representation theorem for Banach algebras. This
is presented in Chapter IX (for bounded operators). In this chapter we again
follow Loomis rather closely.
In Chapter X we extend the proof to unbounded operators, following Loomis
and Reed and Simon's Methods of Modern Mathematical Physics. Then we give
Lorch's proof of the spectral theorem from his book Spectral Theory. This has
the flavor of complex analysis. The third proof, due to Davies and presented at the
end of Chapter XII, replaces complex analysis by almost complex analysis.
The remaining chapters can be considered as giving more specialized information about the spectral theorem and its applications. Chapter XI is devoted to one-parameter semi-groups, and especially to Stone's theorem about
the infinitesimal generator of one-parameter groups of unitary transformations.
Chapter XII discusses some theorems which are of importance in applications of
the spectral theorem to quantum mechanics and quantum chemistry. Chapter
XIII is a brief introduction to the Lax-Phillips theory of scattering.
Contents
2.1.13 Bessel's inequality.
2.1.14 Parseval's equation.
2.1.15 Orthonormal bases.
2.2 Self-adjoint transformations.
2.2.1 Non-negative self-adjoint transformations.
2.3 Compact self-adjoint transformations.
2.4 Fourier's Fourier series.
2.4.1 Proof by integration by parts.
2.4.2 Relation to the operator $\frac{d}{dx}$.
2.4.3 Gårding's inequality, special case.
2.5 The Heisenberg uncertainty principle.
2.6 The Sobolev Spaces.
2.7 Gårding's inequality.
2.8 Consequences of Gårding's inequality.
2.9 Extension of the basic lemmas to manifolds.
2.10 Example: Hodge theory.
2.11 The resolvent.
3 The Fourier Transform.
3.1 Conventions, especially about $2\pi$.
3.2 Convolution goes to multiplication.
3.3 Scaling.
3.4 Fourier transform of a Gaussian is a Gaussian.
3.5 The multiplication formula.
3.6 The inversion formula.
3.7 Plancherel's theorem.
3.8 The Poisson summation formula.
3.9 The Shannon sampling theorem.
3.10 The Heisenberg Uncertainty Principle.
3.11 Tempered distributions.
3.11.1 Examples of Fourier transforms of elements of $\mathcal S'$.
4 Measure theory.
4.1 Lebesgue outer measure.
4.2 Lebesgue inner measure.
4.3 Lebesgue's definition of measurability.
4.4 Carathéodory's definition of measurability.
4.5 Countable additivity.
4.6 $\sigma$-fields, measures, and outer measures.
4.7 Constructing outer measures, Method I.
4.7.1 A pathological example.
4.7.2 Metric outer measures.
4.8 Constructing outer measures, Method II.
4.8.1 An example.
4.9 Hausdorff measure.
4.10 Hausdorff dimension.
4.11 Push forward.
4.12 The Hausdorff dimension of fractals.
4.12.1 Similarity dimension.
4.12.2 The string model.
4.13 The Hausdorff metric and Hutchinson's theorem.
4.14 Affine examples.
4.14.1 The classical Cantor set.
4.14.2 The Sierpinski Gasket.
4.14.3 Moran's theorem.
5 The Lebesgue integral.
5.1 Real valued measurable functions.
5.2 The integral of a non-negative function.
5.3 Fatou's lemma.
5.4 The monotone convergence theorem.
5.5 The space $L^1(X, \mathbb R)$.
5.6 The dominated convergence theorem.
5.7 Riemann integrability.
5.8 The Beppo-Levi theorem.
5.9 $L^1$ is complete.
5.10 Dense subsets of $L^1(\mathbb R, \mathbb R)$.
5.11 The Riemann-Lebesgue Lemma.
5.11.1 The Cantor-Lebesgue theorem.
5.12 Fubini's theorem.
5.12.1 Product $\sigma$-fields.
5.12.2 $\pi$-systems and $\lambda$-systems.
5.12.3 The monotone class theorem.
5.12.4 Fubini for finite measures and bounded functions.
5.12.5 Extensions to unbounded functions and to $\sigma$-finite measures.
6 The Daniell integral.
6.1 The Daniell Integral.
6.2 Monotone class theorems.
6.3 Measure.
6.4 Hölder, Minkowski, $L^p$ and $L^q$.
6.5 $\|\cdot\|_\infty$ is the essential sup norm.
6.6 The Radon-Nikodym Theorem.
6.7 The dual space of $L^p$.
6.7.1 The variations of a bounded functional.
6.7.2 Duality of $L^p$ and $L^q$ when $\mu(S) < \infty$.
6.7.3 The case where $\mu(S) = \infty$.
6.8 Integration on locally compact Hausdorff spaces.
6.8.1 Riesz representation theorems.
6.8.2 Fubini's theorem.
6.9 The Riesz representation theorem redux.
6.9.1 Statement of the theorem.
6.9.2 Propositions in topology.
6.9.3 Proof of the uniqueness of the $\mu$ restricted to $B(X)$.
6.10 Existence.
6.10.1 Definition.
6.10.2 Measurability of the Borel sets.
6.10.3 Compact sets have finite measure.
6.10.4 Interior regularity.
6.10.5 Conclusion of the proof.
8 Haar measure.
8.1 Examples.
8.1.1 $\mathbb R^n$.
8.1.2 Discrete groups.
8.1.3 Lie groups.
8.2 Topological facts.
8.3 Construction of the Haar integral.
8.4 Uniqueness.
8.5 $\mu(G) < \infty$ if and only if G is compact.
8.6 The group algebra.
8.7 The involution.
8.7.1 The modular function.
8.7.2 Definition of the involution.
8.7.3 Relation to convolution.
8.7.4 Banach algebras with involutions.
8.8 The algebra of finite measures.
8.8.1 Algebras and coalgebras.
8.9 Invariant and relatively invariant measures on homogeneous spaces.
10 The spectral theorem.
10.1 Resolutions of the identity.
10.2 The spectral theorem for bounded normal operators.
10.3 Stone's formula.
10.4 Unbounded operators.
10.5 Operators and their domains.
10.6 The adjoint.
10.7 Self-adjoint operators.
10.8 The resolvent.
10.9 The multiplication operator form of the spectral theorem.
10.9.1 Cyclic vectors.
10.9.2 The general case.
10.9.3 The spectral theorem for unbounded self-adjoint operators, multiplication operator form.
10.9.4 The functional calculus.
10.9.5 Resolutions of the identity.
10.10 The Riesz-Dunford calculus.
10.11 Lorch's proof of the spectral theorem.
10.11.1 Positive operators.
10.11.2 The point spectrum.
10.11.3 Partition into pure types.
10.11.4 Completion of the proof.
10.12 Characterizing operators with purely continuous spectrum.
10.13 Appendix. The closed graph theorem.
11 Stone's theorem.
11.1 von Neumann's Cayley transform.
11.1.1 An elementary example.
11.2 Equibounded semi-groups on a Fréchet space.
11.2.1 The infinitesimal generator.
11.3 The differential equation.
11.3.1 The resolvent.
11.3.2 Examples.
11.4 The power series expansion of the exponential.
11.5 The Hille-Yosida theorem.
11.6 Contraction semigroups.
11.6.1 Dissipation and contraction.
11.6.2 A special case: $\exp(t(B - I))$ with $\|B\| \le 1$.
11.7 Convergence of semigroups.
11.8 The Trotter product formula.
11.8.1 Lie's formula.
11.8.2 Chernoff's theorem.
11.8.3 The product formula.
11.8.4 Commutators.
11.8.5 The Kato-Rellich theorem.
11.8.6 Feynman path integrals.
11.9 The Feynman-Kac formula.
11.10 The free Hamiltonian and the Yukawa potential.
11.10.1 The Yukawa potential and the resolvent.
11.10.2 The time evolution of the free Hamiltonian.
13 Scattering theory.
13.1 Examples.
13.1.1 Translation-truncation.
13.1.2 Incoming representations.
13.1.3 Scattering residue.
13.2 Breit-Wigner.
13.3 The representation theorem for strongly contractive semi-groups.
13.4 The Sinai representation theorem.
13.5 The Stone-von Neumann theorem.
Chapter 1
Metric spaces
In other words, we may define the distance function on the quotient space X/R,
i.e. on the space of equivalence classes, by
$$d(\{x\}, \{y\}) := d(u, v), \qquad u \in \{x\},\ v \in \{y\},$$
and this does not depend on the choice of u and v. Axioms 1)-3) for a metric
space continue to hold, but now
$$d(\{x\}, \{y\}) = 0 \iff \{x\} = \{y\}.$$
In other words, X/R is a metric space. Clearly the projection map $x \mapsto \{x\}$ is
an isometry of X onto X/R. (An isometry is a map which preserves distances.)
In particular it is continuous. It is also open.
In short, we have provided a canonical way of passing (via an isometry) from
a pseudo-metric space to a metric space by identifying points which are at zero
distance from one another.
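As a concrete illustration, here is a minimal Python sketch of the identification step for a finite sample of points. The pseudo-metric `d` (which compares first coordinates only, so distinct points on the same vertical line are at distance zero) and the helper names `quotient_classes` and `induced_distance_is_well_defined` are hypothetical choices made for this example, not notation from the text.

```python
from itertools import product

def d(p, q):
    """A pseudo-metric on R^2: only the first coordinates are compared,
    so distinct points on the same vertical line are at distance 0."""
    return abs(p[0] - q[0])

def quotient_classes(points, dist):
    """Group points into classes for x ~ y  <=>  dist(x, y) == 0.
    Comparing with one representative suffices because ~ is transitive:
    dist(x, z) <= dist(x, y) + dist(y, z) = 0."""
    classes = []
    for p in points:
        for cls in classes:
            if dist(p, cls[0]) == 0:
                cls.append(p)
                break
        else:
            classes.append([p])
    return classes

def induced_distance_is_well_defined(classes, dist):
    """Check that dist(u, v) takes a single value for u in one class, v in another."""
    for c1, c2 in product(classes, repeat=2):
        if len({dist(u, v) for u, v in product(c1, c2)}) != 1:
            return False
    return True

points = [(0.0, 0.0), (0.0, 5.0), (1.0, 2.0), (1.0, -3.0), (2.5, 0.0)]
classes = quotient_classes(points, d)
print(len(classes), induced_distance_is_well_defined(classes, d))  # 3 True
```

The five sample points collapse to three classes, and the distance between two classes does not depend on which representatives are used, as the construction above requires.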
A subset A of a pseudo-metric space X is called dense if its closure is the
whole space. From the above construction, the image A/R of A in the quotient
space X/R is again dense. We will use this fact in the next section in the
following form:
If $f : Y \to X$ is an isometry of Y such that f(Y) is a dense set of X, then
f descends to a map F of Y onto a dense set in the metric space X/R.
1.2
Any metric (or pseudo-metric) space can be mapped by a one to one isometry
onto a dense subset of a complete metric (or pseudo-metric) space.
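By the italicized statement of the preceding section, it is enough to prove
this for a pseudo-metric space X. Let $X_{\mathrm{seq}}$ denote the set of Cauchy sequences
in X, and define the distance between the Cauchy sequences $\{x_n\}$ and $\{y_n\}$ to
be
$$d(\{x_n\}, \{y_n\}) := \lim_{n \to \infty} d(x_n, y_n).$$
(The limit exists because $|d(x_n, y_n) - d(x_m, y_m)| \le d(x_n, x_m) + d(y_n, y_m)$, so the real numbers $d(x_n, y_n)$ form a Cauchy sequence.)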
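For instance, in the rationals two different Cauchy sequences can be at distance zero in this sense, and so represent the same point of the completion. The Python sketch below, assuming exact rational arithmetic via `fractions.Fraction`, compares the Newton iterates for $x^2 = 2$ with the decimal truncations of $\sqrt 2$; the function names are ours, chosen only for illustration.

```python
from fractions import Fraction
from math import isqrt

def newton_sqrt2(n):
    """n-th Newton iterate for x^2 = 2 starting at x_0 = 1 (an exact rational)."""
    x = Fraction(1)
    for _ in range(n):
        x = (x + 2 / x) / 2
    return x

def decimal_sqrt2(n):
    """sqrt(2) truncated to n decimal places, as an exact rational."""
    scale = 10 ** n
    return Fraction(isqrt(2 * scale * scale), scale)

# d(x_n, y_n) = |x_n - y_n| tends to 0, so d({x_n}, {y_n}) = 0 even though
# the two sequences have no common terms: they define the same point.
for n in (2, 4, 6, 8):
    print(n, float(abs(newton_sqrt2(n) - decimal_sqrt2(n))))
```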
1.3
Of special interest are vector spaces which have a metric which is compatible
with the vector space properties and which is complete: Let V be a vector space
over the real or complex numbers. A norm is a real valued function
$$v \mapsto \|v\|$$
on V which satisfies
1. $\|v\| \ge 0$ and $\|v\| > 0$ if $v \ne 0$,
2. $\|cv\| = |c|\,\|v\|$ for any real (or complex) number c, and
3. $\|v + w\| \le \|v\| + \|w\|$ for all $v, w \in V$.
Then $d(v, w) := \|v - w\|$ is a metric on V, which satisfies $d(v + u, w + u) = d(v, w)$
for all $v, w, u \in V$. The ball of radius r about the origin is then the set of all v
such that $\|v\| < r$. A vector space equipped with a norm is called a normed
vector space and if it is complete relative to the metric it is called a Banach
space.
Our construction shows that any vector space with a norm can be completed
so that it becomes a Banach space.
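The three axioms are easy to spot-check numerically. The Python sketch below is an illustration only: the choice of $\ell^p$ norms on $\mathbb R^n$ and the names `p_norm` and `check_norm_axioms` are our assumptions. It tests axioms 1-3, together with translation invariance of $d(v, w) = \|v - w\|$, on random vectors, and it shows why $p < 1$ does not give a norm, because the triangle inequality fails.

```python
import math
import random

def p_norm(v, p):
    """The l^p 'norm' on R^n; p = math.inf gives the sup norm."""
    if p == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def add(v, w):
    return [a + b for a, b in zip(v, w)]

def check_norm_axioms(p, trials=1000, n=5, tol=1e-9, seed=0):
    """Spot-check axioms 1-3 and translation invariance of d(v, w) = ||v - w||
    on random vectors; returns False at the first violation found."""
    rng = random.Random(seed)
    for _ in range(trials):
        v, w, u = ([rng.uniform(-1, 1) for _ in range(n)] for _ in range(3))
        c = rng.uniform(-3, 3)
        if p_norm(v, p) <= 0:                                                 # axiom 1 (v != 0 here)
            return False
        if abs(p_norm([c * x for x in v], p) - abs(c) * p_norm(v, p)) > tol:  # axiom 2
            return False
        if p_norm(add(v, w), p) > p_norm(v, p) + p_norm(w, p) + tol:          # axiom 3
            return False
        d_vw = p_norm([a - b for a, b in zip(v, w)], p)
        d_shifted = p_norm([a - b for a, b in zip(add(v, u), add(w, u))], p)
        if abs(d_shifted - d_vw) > tol:                                       # translation invariance
            return False
    return True

print([check_norm_axioms(p) for p in (1, 2, 3, math.inf)])
# For p = 1/2 the triangle inequality already fails for e1 and e2:
print(p_norm([1, 1], 0.5), p_norm([1, 0], 0.5) + p_norm([0, 1], 0.5))  # 4.0 2.0
```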
1.4
Compactness.
A topological space X is said to be compact if it has one (and hence the other)
of the following equivalent properties:
Every open cover has a finite subcover. In more detail: if $\{U_\alpha\}$ is a
collection of open sets with
$$X \subset \bigcup_\alpha U_\alpha$$
1.5
Total Boundedness.
A metric space X is said to be totally bounded if for every $\varepsilon > 0$ there are
finitely many open balls of radius $\varepsilon$ which cover X.
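To see the definition in action, here is a Python sketch. It is our own illustration: the function `greedy_eps_net` and the sample of random points in the unit square are assumptions of the example. It greedily picks centers so that every sample point lies within $\varepsilon$ of a center. The chosen centers are pairwise at least $\varepsilon$ apart, so the disjoint disks of radius $\varepsilon/2$ about them fit in a square of side $1 + \varepsilon$, and their number is bounded independently of the sample size. This bound is the finiteness that total boundedness asserts.

```python
import math
import random

def greedy_eps_net(points, eps):
    """Return centers, chosen among `points`, such that every point lies in the
    open ball of radius eps about some center; centers are pairwise >= eps apart."""
    centers = []
    for p in points:
        if all(math.dist(p, c) >= eps for c in centers):
            centers.append(p)
    return centers

rng = random.Random(1)
eps = 0.1
points = [(rng.random(), rng.random()) for _ in range(5000)]
centers = greedy_eps_net(points, eps)

# Disjoint disks of radius eps/2 about the centers fit in a square of side 1 + eps.
packing_bound = (1 + eps) ** 2 / (math.pi * (eps / 2) ** 2)
print(len(centers), packing_bound)
```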
Theorem 1.5.1 The following assertions are equivalent for a metric space:
1. X is compact.
2. Every sequence in X has a convergent subsequence.
3. X is totally bounded and complete.
Proof that 1. implies 2. Let $\{y_i\}$ be a sequence of points in X. We first show that
there is a point x with the property that for every $\varepsilon > 0$, the open ball of radius
$\varepsilon$ centered at x contains the points $y_i$ for infinitely many i. Suppose not. Then
for any $z \in X$ there is an $\varepsilon_z > 0$ such that the ball $B_{\varepsilon_z}(z)$ contains $y_i$ for only finitely
many i. Since $z \in B_{\varepsilon_z}(z)$, the set of such balls covers X. By compactness,
finitely many of these balls cover X, and hence there are only finitely many i,
a contradiction.
Now choose $i_1$ so that $y_{i_1}$ is in the ball of radius $\frac12$ centered at x. Then
choose $i_2 > i_1$ so that $y_{i_2}$ is in the ball of radius $\frac14$ centered at x, and keep going, choosing $i_k > i_{k-1}$ with $y_{i_k}$ in the ball of radius $2^{-k}$ centered at x.
We have constructed a subsequence so that the points $y_{i_k}$ converge to x. Thus
we have proved that 1. implies 2.
1.6
Separability.
1.7
Second Countability.
1.8
Suppose that conditions 2. and 3. of the theorem hold for the metric space
X. By Proposition 1.6.2, X is separable, and hence by Proposition 1.7.1, X is
1.9
Dini's lemma.
Let X be a metric space and let L denote the space of real valued continuous
functions of compact support. So $f \in L$ means that f is continuous, and the
closure of the set of all x for which $|f(x)| > 0$ is compact. Thus L is a real
vector space, and $f \in L \Rightarrow |f| \in L$. Thus if $f \in L$ and $g \in L$ then $f + g \in L$ and
also $\max(f, g) = \frac12(f + g + |f - g|) \in L$ and $\min(f, g) = \frac12(f + g - |f - g|) \in L$.
For a sequence of elements in L (or more generally in any space of real valued
functions) we write $f_n \downarrow 0$ to mean that the sequence of functions is monotone
decreasing, and at each x we have $f_n(x) \to 0$.
Theorem 1.9.1 Dini's lemma. If $f_n \in L$ and $f_n \downarrow 0$ then $\|f_n\|_\infty \to 0$. In
other words, monotone decreasing convergence to 0 implies uniform convergence
to zero for elements of L.
Proof. Given $\delta > 0$, let $C_n = \{x \mid f_n(x) \ge \delta\}$. Then the $C_n$ are compact (each is a closed subset of the compact support of $f_n$),
$C_n \supset C_{n+1}$, and $\bigcap_k C_k = \emptyset$ since $f_n(x) \to 0$ for every x. Hence a finite intersection is already empty, which
means that $C_n = \emptyset$ for some n. This means that $\|f_n\|_\infty < \delta$ for some n, and
hence, since the sequence is monotone decreasing, for all subsequent n. QED
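Here is a numerical illustration in Python. The particular sequence is our choice, not taken from the text. On [0,1], $f_n(x) = x^n(1 - x)$, extended by 0 outside [0,1], is continuous with compact support, decreases in n, and tends to 0 pointwise, so by Dini's lemma $\|f_n\|_\infty \to 0$. Calculus gives the exact value $\|f_n\|_\infty = n^n/(n+1)^{n+1}$, attained at $x = n/(n+1)$, and the sketch compares this with a grid estimate.

```python
def f(n, x):
    """f_n(x) = x**n * (1 - x) on [0, 1], and 0 elsewhere."""
    return x ** n * (1 - x) if 0 <= x <= 1 else 0.0

def sup_norm_on_grid(n, samples=10001):
    """Grid estimate of ||f_n||_inf (f_n vanishes outside [0, 1])."""
    return max(f(n, i / (samples - 1)) for i in range(samples))

def exact_sup_norm(n):
    """Maximum of x**n * (1 - x), attained at x = n / (n + 1)."""
    return n ** n / (n + 1) ** (n + 1)

for n in (1, 10, 100, 1000):
    print(n, sup_norm_on_grid(n), exact_sup_norm(n))
```

The estimates decrease to 0, as Dini's lemma predicts for this monotone sequence.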
1.10
$$m^*(A) := \inf \Big\{ \sum_j \ell(I_j) \;:\; A \subset \bigcup_j I_j \Big\} \qquad (1.1)$$
Here the length $\ell(I)$ of any interval $I = [a, b]$ is $b - a$, with the same definition
for half open intervals $(a, b]$ or $[a, b)$, or open intervals. Of course if $a = -\infty$
and b is finite or $+\infty$, or if a is finite and $b = +\infty$, the length is infinite. So the
infimum in (1.1) is taken over all covers of A by intervals. By the usual $\varepsilon/2^n$
trick, i.e. by replacing each $I_j = [a_j, b_j]$ by $(a_j - \varepsilon/2^{j+1}, b_j + \varepsilon/2^{j+1})$, we may
assume that the infimum is taken over open intervals. (Equally well, we could
use half open intervals of the form $[a, b)$, for example.)
It is clear that if $A \subset B$ then $m^*(A) \le m^*(B)$, since any cover of B by
intervals is a cover of A. Also, if Z is any set of measure zero, then $m^*(A \cup Z) =
m^*(A)$. In particular, $m^*(Z) = 0$ if Z has measure zero. Also, if $A = [a, b]$ is an
interval, then we can cover it by itself, so
$$m^*([a, b]) \le b - a,$$
and hence the same is true for $(a, b]$, $[a, b)$, or $(a, b)$. If the interval is infinite, it
clearly cannot be covered by a set of intervals whose total length is finite, since
if we lined them up with end points touching they could not cover an infinite
interval. We still must prove that
$$m^*(I) = \ell(I) \qquad (1.2)$$
then
$$d - c \le \sum (b_i - a_i).$$
$$\bigcup_{i=1}^n (a_i, b_i) \quad \text{then} \quad d - c \le \sum_{i=1}^n (b_i - a_i).$$
$$\bigcup_{i=3}^n (a_i, b_i).$$
So by induction
$$d - c \le (b_2 - a_1) + \sum_{i=3}^n (b_i - a_i).$$
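The finite-cover inequality behind (1.2) can be checked mechanically. The Python sketch below is an illustration under our own naming (`covering_chain` is hypothetical). It walks from c to the right, each time choosing an interval that contains the current point and reaches farthest. The chosen intervals $(a_1, b_1), \dots, (a_k, b_k)$ satisfy $a_1 < c$, $a_{i+1} < b_i$ and $b_k > d$, so $d - c < b_k - a_1 \le \sum_i (b_i - a_i)$. This is the same kind of overlap estimate as the induction step above, where overlapping intervals are merged.

```python
def covering_chain(c, d, intervals):
    """Given open intervals (a, b), return a chain of them covering [c, d] in which
    each interval contains the right end point of the previous one, or None if
    [c, d] is not covered."""
    chain, point = [], c
    while True:
        containing = [(a, b) for a, b in intervals if a < point < b]
        if not containing:
            return None                                # `point` is not covered
        a, b = max(containing, key=lambda iv: iv[1])   # reach as far right as possible
        chain.append((a, b))
        if b > d:
            return chain
        point = b                                      # b itself still needs to be covered

intervals = [(-0.1, 0.3), (0.2, 0.6), (0.7, 0.8), (0.5, 1.05)]
chain = covering_chain(0.0, 1.0, intervals)
total = sum(b - a for a, b in chain)
print(chain, total)
print(covering_chain(0.0, 1.0, [(-0.1, 0.5), (0.5, 1.1)]))   # None: 0.5 is missed
```

Each step moves the current point strictly to the right through finitely many end points, so the loop terminates.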
1.11
In the first few sections we repeatedly used an argument which involved choosing this or that element of a set. That we can do so is an axiom known as
The axiom of choice. If F is a function with domain D such that F(x)
is a non-empty set for every $x \in D$, then there exists a function f with domain
D such that $f(x) \in F(x)$ for every $x \in D$.
It has been proved by Gödel that if mathematics is consistent without the
axiom of choice (a big if!) then mathematics remains consistent with the
axiom of choice added.
In fact, it will be convenient for us to take a slightly less intuitive axiom as
our starting point:
Zorn's lemma. Every partially ordered set A has a maximal linearly
ordered subset. If every linearly ordered subset of A has an upper bound, then
A contains a maximal element.
The second assertion is a consequence of the first. For let B be a maximal
linearly ordered subset of A, and x an upper bound for B. Then x is a maximal
element of A, for if $y > x$ then $y \notin B$ (since x is an upper bound for B), and we could add y to B to obtain a larger linearly
ordered set. Thus there is no element in A which is strictly larger than x, which
is what we mean when we say that x is a maximal element.
Zorn's lemma implies the axiom of choice.
Indeed, consider the set A of all functions g defined on subsets of D such
that $g(x) \in F(x)$. We will let dom(g) denote the domain of definition of g. The
set A is not empty, for if we pick a point $x_0 \in D$ and pick $y_0 \in F(x_0)$, then
the function g whose domain consists of the single point $x_0$ and whose value
$g(x_0) = y_0$ gives an element of A. Put a partial order on A by saying that
$g \le h$ if $\mathrm{dom}(g) \subset \mathrm{dom}(h)$ and the restriction of h to dom(g) coincides with g.
A linearly ordered subset means that we have an increasing family of domains
$X_\alpha$, with functions $h_\alpha$ defined consistently with respect to restriction. But this
means that there is a function g defined on the union of these domains, $\bigcup_\alpha X_\alpha$,
whose restriction to each $X_\alpha$ coincides with the corresponding $h_\alpha$. This is clearly
an upper bound. So A has a maximal element f. If the domain of f were not
all of D we could add a single point $x_0$ not in the domain of f and $y_0 \in F(x_0)$,
contradicting the maximality of f. QED
1.12
1.13
Tychonoff's theorem.
Let I be a set, serving as an index set. Suppose that for each $\alpha \in I$ we are
given a non-empty topological space $S_\alpha$. The Cartesian product
$$S := \prod_{\alpha \in I} S_\alpha$$
is defined as the collection of all functions x whose domain is I and such that
$x(\alpha) \in S_\alpha$. This space is not empty by the axiom of choice. We frequently write
$x_\alpha$ instead of $x(\alpha)$ and call $x_\alpha$ the $\alpha$ coordinate of x.
The map
$$f_\alpha : \prod_{\beta \in I} S_\beta \to S_\alpha, \qquad x \mapsto x_\alpha$$
Proof. Let F be a family of closed subsets of S with the property that the
intersection of any finite collection of subsets from this family is not empty. We
must show that the intersection of all the elements of F is not empty. Using
Zorn, extend F to a maximal family $F_0$ of (not necessarily closed) subsets of S
with the property that the intersection of any finite collection of elements of $F_0$
is not empty. For each $\alpha$, the projection $f_\alpha(F_0)$ has the property that there is
a point $x_\alpha \in S_\alpha$ which is in the closure of all the sets belonging to $f_\alpha(F_0)$ (the closures of these sets have the finite intersection property, and $S_\alpha$ is compact). Let
$x \in S$ be the point whose $\alpha$-th coordinate is $x_\alpha$. We will show that x is in the
closure of every element of $F_0$, which will complete the proof.
Let U be an open set containing x. By the definition of the product topology,
there are finitely many $\alpha_i$ and open subsets $U_{\alpha_i} \subset S_{\alpha_i}$ such that
$$x \in \bigcap_{i=1}^n f_{\alpha_i}^{-1}(U_{\alpha_i}) \subset U.$$
$$\bigcap_{i=1}^n f_{\alpha_i}^{-1}(U_{\alpha_i}) \in F_0,$$
1.14
Urysohn's lemma.
$$\overline{V_{1/2}} \subset V_1.$$
Applying our normality assumption to the sets $F_0$ and $V_{1/2}^c$ we can find an open
This is a union of open sets, hence open. Similarly, $f(x) > a$ means that there
is some $r > a$ such that $x \notin \overline{V_r}$. Thus
$$f^{-1}((a, 1]) = \bigcup_{r > a} (\overline{V_r})^c,$$
so both $f^{-1}([0, b))$ and $f^{-1}((a, 1])$ are open. Hence $f^{-1}((a, b)) = f^{-1}([0, b)) \cap f^{-1}((a, 1])$ is open. Since the intervals $[0, b)$, $(a, 1]$ and $(a, b)$
form a basis for the open sets on the interval [0, 1], we see that the inverse image
of any open set under f is open, which says that f is continuous. QED
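In a metric space one does not need the dyadic construction. For disjoint non-empty closed sets A and B, the function $f(x) = d(x, A)/(d(x, A) + d(x, B))$ is continuous, takes values in [0,1], and equals 0 on A and 1 on B. This is a different, standard argument from the one in the text, which works in any normal space. The following is a minimal Python sketch, assuming A and B are finite point sets in the plane so that $d(x, A)$ is a minimum over finitely many points; the helper names are ours.

```python
import math

def dist_to_set(x, S):
    """d(x, S) for a finite set S of points in the plane."""
    return min(math.dist(x, s) for s in S)

def urysohn_metric(A, B):
    """Return f with f = 0 on A, f = 1 on B and 0 <= f <= 1, for disjoint closed A, B.
    The denominator is positive because no point lies in both A and B."""
    def f(x):
        da, db = dist_to_set(x, A), dist_to_set(x, B)
        return da / (da + db)
    return f

A = [(0.0, 0.0), (0.0, 1.0)]
B = [(3.0, 0.0)]
f = urysohn_metric(A, B)
print([f(a) for a in A], [f(b) for b in B], f((1.5, 0.0)))  # [0.0, 0.0] [1.0] 0.5
```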
We will have several occasions to use this result.
1.15
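The Taylor series expansion about the point $\frac12$ for the function $t \mapsto (t + \varepsilon^2)^{1/2}$
converges uniformly on [0, 1] (its radius of convergence is $\frac12 + \varepsilon^2 > \frac12$). So for any $\varepsilon > 0$ there is a polynomial
P such that
$$|P(x^2) - (x^2 + \varepsilon^2)^{1/2}| < \varepsilon \quad \text{on } [-1, 1].$$
Let
$$Q := P - P(0).$$
We have $|P(0) - \varepsilon| < \varepsilon$, so
$$|P(0)| < 2\varepsilon.$$
So $Q(0) = 0$ and $|Q(x^2) - (x^2 + \varepsilon^2)^{1/2}| < 3\varepsilon$ on $[-1, 1]$. Moreover
$$0 \le (x^2 + \varepsilon^2)^{1/2} - |x| \le \varepsilon,$$
so
$$|Q(x^2) - |x|| < 4\varepsilon \quad \text{on } [-1, 1].$$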
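The estimate is easy to test numerically. The Python sketch below builds P as a truncated Taylor polynomial of $(t + \varepsilon^2)^{1/2}$ about $t = \frac12$. The truncation degree 60 is our choice for $\varepsilon = 0.1$; it keeps the truncation error well below $\varepsilon$. The sketch then forms $Q = P - P(0)$ and measures $\max |Q(x^2) - |x||$ on a grid in $[-1, 1]$. It writes $(t + \varepsilon^2)^{1/2} = \sqrt a\,(1 + u)^{1/2}$ with $a = \frac12 + \varepsilon^2$ and $u = (t - \frac12)/a$, and uses the binomial series.

```python
import math

def sqrt_taylor_poly(eps, degree):
    """Return P(t): the Taylor polynomial of (t + eps**2)**0.5 about t0 = 1/2,
    truncated at the given degree."""
    a = 0.5 + eps ** 2           # value of t + eps^2 at the expansion point
    coeffs = []                  # generalized binomial coefficients C(1/2, k)
    c = 1.0
    for k in range(degree + 1):
        coeffs.append(c)
        c *= (0.5 - k) / (k + 1)

    def P(t):
        u = (t - 0.5) / a        # |u| <= 0.5 / a < 1 for t in [0, 1]
        return math.sqrt(a) * sum(ck * u ** k for k, ck in enumerate(coeffs))

    return P

def abs_approximation_error(eps, degree, samples=2001):
    """Return |P(0)| and max |Q(x^2) - |x|| over a grid in [-1, 1], Q = P - P(0)."""
    P = sqrt_taylor_poly(eps, degree)
    p0 = P(0.0)
    Q = lambda s: P(s) - p0      # Q(0) = 0: no constant term
    xs = [-1 + 2 * i / (samples - 1) for i in range(samples)]
    return abs(p0), max(abs(Q(x * x) - abs(x)) for x in xs)

eps = 0.1
p0, err = abs_approximation_error(eps, degree=60)
print(p0 < 2 * eps, err < 4 * eps)  # True True
```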
As Q does not contain a constant term, and A is an algebra, $Q(f^2) \in A$ for any
$f \in A$. Since we are assuming that $|f| \le 1$ we have
$$Q(f^2) \in A,$$
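Let
$$U_{p,q,\varepsilon} := \{x \mid f_{p,q,\varepsilon}(x) < f(x) + \varepsilon\}, \qquad V_{p,q,\varepsilon} := \{x \mid f_{p,q,\varepsilon}(x) > f(x) - \varepsilon\}.$$
Fix q and $\varepsilon$. The sets $U_{p,q,\varepsilon}$ cover S as p varies. Hence a finite number cover S,
since we are assuming that S is compact. We may take the minimum $f_{q,\varepsilon}$ of the
corresponding finite collection of $f_{p,q,\varepsilon}$. The function $f_{q,\varepsilon}$ has the property that
$$f_{q,\varepsilon}(x) < f(x) + \varepsilon$$
and
$$f_{q,\varepsilon}(x) > f(x) - \varepsilon \quad \text{for} \quad x \in \bigcap_p V_{p,q,\varepsilon},$$
where the intersection is again over the same finite set of p's. We have now
found a collection of functions $f_{q,\varepsilon}$ such that
$$f_{q,\varepsilon} < f + \varepsilon$$
and $f_{q,\varepsilon} > f - \varepsilon$ on some neighborhood $V_{q,\varepsilon}$ of q. We may choose a finite number
of q so that the $V_{q,\varepsilon}$ cover all of S. Taking the maximum of the corresponding
$f_{q,\varepsilon}$ gives a function $f_\varepsilon \in A$ with $f - \varepsilon < f_\varepsilon < f + \varepsilon$, i.e.
$$\|f - f_\varepsilon\| < \varepsilon.$$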
disjoint from S, and all we have done is add a disconnected point to S. The
space $S \cup \{p_\infty\}$ is called the one-point compactification of S. In applications of
the Stone-Weierstrass theorem, we shall frequently deal with an algebra
of functions on a locally compact space consisting of functions which vanish
at infinity in the sense that for any $\varepsilon > 0$ there is a compact set C such that
$|f| < \varepsilon$ on the complement of C. We can think of these functions as being
defined on $S \cup \{p_\infty\}$ and all vanishing at $p_\infty$.
We now turn to a second proof of this important theorem.
1.16
Machado's theorem.
Let M be a compact space and let $C_{\mathbb R}(M)$ denote the algebra of continuous
real valued functions on M. We let $\|\cdot\| = \|\cdot\|_\infty$ denote the uniform norm on
$C_{\mathbb R}(M)$. More generally, for any closed set $F \subset M$, we let
$$\|f\|_F = \sup_{x \in F} |f(x)|,$$
so $\|\cdot\| = \|\cdot\|_M$.
If $A \subset C_{\mathbb R}(M)$ is a collection of functions, we will say that a subset $E \subset M$
is a level set (for A) if all the elements of A are constant on the set E. Also,
for any $f \in C_{\mathbb R}(M)$ and any closed set $F \subset M$, we let
$$d_f(F) := \inf_{g \in A} \|f - g\|_F.$$
$$d_f(F) = d_f(M). \qquad (1.3)$$
Let $\mathcal F$ denote the collection of all non-empty closed subsets of M with this
property. Clearly $M \in \mathcal F$, so this collection is not empty. We order $\mathcal F$ by the
reverse of inclusion: $F_1 \prec F_2$ if $F_1 \supset F_2$. Let $\mathcal C$ be a totally ordered subset
of $\mathcal F$. Since M is compact, the intersection of any nested family of non-empty
closed sets is again non-empty. We claim that the intersection of all the sets in
$\mathcal C$ belongs to $\mathcal F$, i.e. satisfies (1.3). Indeed, since $d_f(F) = d_f(M)$ for any $F \in \mathcal C$,
this means that for any $g \in A$, the sets
$$\{x \in F \mid |f(x) - g(x)| \ge d_f(M)\}, \qquad F \in \mathcal C,$$
are non-empty (on the compact set F the continuous function $|f - g|$ attains its maximum $\|f - g\|_F \ge d_f(F) = d_f(M)$). They are also closed and nested, and hence have a non-empty
intersection. So on the set
$$E = \bigcap_{F \in \mathcal C} F$$
we have
$$\|f - g\|_E \ge d_f(M).$$
As this holds for every $g \in A$, we get $d_f(E) \ge d_f(M)$, and the reverse inequality is automatic since $E \subset M$.
So every chain has an upper bound, and hence by Zorn's lemma, there exists a
maximal element, i.e. there exists a non-empty closed subset E satisfying (1.3) which
has the property that no proper subset of E satisfies (1.3). We shall call such a
subset f-minimal.
Theorem 1.16.1 [Machado.] Suppose that $A \subset C_{\mathbb R}(M)$ is a subalgebra which
contains the constants and which is closed in the uniform topology. Then for
every $f \in C_{\mathbb R}(M)$ there exists an A level set satisfying (1.3). In fact, every
f-minimal set is an A level set.
Proof. Let E be an f-minimal set. Suppose it is not an A level set. This
means that there is some $h \in A$ which is not constant on E. Replacing h by
$ah + c$ where a and c are constants, we may arrange that
$$\min_{x \in E} h = 0 \quad \text{and} \quad \max_{x \in E} h = 1.$$
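Let
$$E_0 := \{x \in E \,:\, 0 \le h(x) \le \tfrac23\} \quad\text{and}\quad E_1 := \{x \in E \,:\, \tfrac13 \le h(x) \le 1\}.$$
These are non-empty closed proper subsets of E. Hence the minimality of
E implies that there exist $g_0, g_1 \in A$ such that
$$\|f - g_0\|_{E_0} < d_f(M) \quad\text{and}\quad \|f - g_1\|_{E_1} < d_f(M).$$
Let
$$h_n := (1 - h^n)^{2^n} \quad\text{and}\quad k_n := h_n g_0 + (1 - h_n) g_1 .$$
Both lie in A, and $0 \le h_n \le 1$ on E. Since $(1 - h^n)(1 + h^n) \le 1$, we have
$$h_n \le \frac{1}{(1 + h^n)^{2^n}}.$$
Now the binomial formula implies that for any integer k and any positive number
a we have $ka \le (1 + a)^k$, or $(1 + a)^{-k} \le 1/(ka)$. So we have
$$h_n \le \frac{1}{2^n h^n}.$$
On $E_1 \setminus E_0$ we have $h > \frac23$, so there we have
$$h_n \le \left(\frac34\right)^n \to 0.$$
On $E_0 \setminus E_1$ we have $h < \frac13$, so by Bernoulli's inequality $h_n \ge 1 - 2^n h^n \ge 1 - \left(\frac23\right)^n \to 1$.
Thus $k_n \to g_0$ uniformly on $E_0 \setminus E_1$ and $k_n \to g_1$ uniformly on $E_1 \setminus E_0$. On $E_0 \cap E_1$, $k_n$ is a convex combination of $g_0$ and $g_1$, so $|f - k_n| \le h_n|f - g_0| + (1 - h_n)|f - g_1| < d_f(M)$ there. We
conclude that for n large enough
$$\|f - k_n\|_E < d_f(M),$$
contradicting our assumption that $d_f(E) = d_f(M)$. QED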
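The two estimates on $h_n = (1-h^n)^{2^n}$ used above can be spot-checked numerically. The Python sketch below evaluates $h_n$ at sample values $h \in [0,1]$ (the values h takes on E). It tests three things: $0 \le h_n \le 1$; $h_n \ge 1 - (2/3)^n$ where $h < \frac13$; and $h_n \le (3/4)^n$ where $h > \frac23$. The function names, grid size and tolerance are illustrative choices. This is a spot check, not a proof.

```python
def h_n(h: float, n: int) -> float:
    """The function h_n = (1 - h^n)^(2^n), evaluated where h takes the value h."""
    return (1.0 - h ** n) ** (2 ** n)


def estimates_hold(n: int, samples: int = 300, tol: float = 1e-9) -> bool:
    """Check 0 <= h_n <= 1 and the two bounds used in the proof on a grid of h values."""
    for k in range(samples + 1):
        h = k / samples
        val = h_n(h, n)
        if not (-tol <= val <= 1.0 + tol):
            return False
        if h < 1 / 3 and val < 1.0 - (2 / 3) ** n - tol:   # h_n close to 1 on E0 \ E1
            return False
        if h > 2 / 3 and val > (3 / 4) ** n + tol:         # h_n close to 0 on E1 \ E0
            return False
    return True


print(all(estimates_hold(n) for n in range(1, 21)))  # True
```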
Corollary 1.16.1 [The Stone-Weierstrass Theorem.] If A is a uniformly
closed subalgebra of $C_{\mathbb R}(M)$ which contains the constants and separates points,
then $A = C_{\mathbb R}(M)$.
Proof. The only A-level sets are points. Since the constant $f(a)$ belongs to A and $\|f - f(a)\|_{\{a\}} = 0$, we
conclude that $d_f(M) = 0$. Because A is closed, this means $f \in A$ for any $f \in C_{\mathbb R}(M)$. QED
1.17
This says:
Theorem 1.17.1 [Hahn-Banach]. Let M be a subspace of a normed linear
space B, and let F be a bounded linear function on M . Then F can be extended
so as to be defined on all of B without increasing its norm.
Proof by Zorn. Suppose that we can prove
Proposition 1.17.1 Let M be a subspace of a normed linear space B, and let
F be a bounded linear function on M. Let $y \in B$, $y \notin M$. Then F can be
extended to $M + \{y\}$ without changing its norm.
Then we could order the norm-preserving extensions of F by inclusion, one extension being larger
than another if it is defined on a larger space and agrees with it there. The extension defined on the
union of any family of subspaces ordered by inclusion is again an extension, and
so is an upper bound. The proposition implies that a maximal extension must
be defined on the whole space, since otherwise we could extend it further. So we must
prove the proposition.
I was careful in the statement not to specify whether our spaces are over the
real or complex numbers. Let us first assume that we are dealing with a real
vector space, and then deduce the complex case.
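We may assume that $\|F\| = 1$ (the case $F = 0$ being trivial). Every element of $M + \{y\}$ can be written uniquely as $x + \lambda y$ with
$x \in M$, $\lambda \in \mathbb R$, and any extension has the form $F(x + \lambda y) = F(x) + \lambda a$ for some real number a. Replacing x by $x/\lambda$ for $\lambda \ne 0$, the requirement $|F(x) + \lambda a| \le \|x + \lambda y\|$ reduces to
$$|F(x) + a| \le \|x + y\| \quad \text{for all } x \in M.$$
So we must find a such that
$$-F(x_1) - \|x_1 + y\| \le a \le -F(x_2) + \|x_2 + y\| \quad \text{for all } x_1, x_2 \in M.$$
Such an a exists if and only if
$$F(x_2) - F(x_1) \le \|x_2 + y\| + \|x_1 + y\| \quad \text{for all } x_1, x_2 \in M.$$
But $x_1 - x_2 = (x_1 + y) - (x_2 + y)$, and so using the fact that $\|F\| = 1$ and the
triangle inequality gives
$$|F(x_2) - F(x_1)| \le \|x_2 - x_1\| \le \|x_2 + y\| + \|x_1 + y\|.$$
This completes the proof of the proposition, and hence of the Hahn-Banach
theorem over the real numbers.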
We now deal with the complex case. If B is a complex normed vector
space, then it is also a real vector space, and the real and imaginary parts of a
complex linear function are real linear functions. In other words, we can write
any complex linear function F as
$$F(x) = G(x) + iH(x)$$
where G and H are real linear functions. The fact that F is complex linear says
that $F(ix) = iF(x)$ or
$$G(ix) = -H(x)$$
or
$$H(x) = -G(ix)$$
or
$$F(x) = G(x) - iG(ix).$$
The fact that $\|F\| = 1$ implies that $\|G\| \le 1$. So we can adjoin the real one-dimensional space spanned by y to M and extend the real linear function to it,
keeping the norm 1. Next adjoin the real one-dimensional space spanned by
$iy$ and extend G to it. We now have G extended to $M \oplus \mathbb C y$ with no increase
in norm. Try to define
$$F(z) := G(z) - iG(iz)$$
on $M \oplus \mathbb C y$. This map $M \oplus \mathbb C y \to \mathbb C$ is $\mathbb R$-linear, and coincides with F on
M. We must check that it is complex linear and that its norm is 1. To check
that it is complex linear it is enough to observe that
$$F(iz) = G(iz) - iG(-z) = i[G(z) - iG(iz)] = iF(z).$$
To check the norm, we may, for any z, choose θ so that $e^{i\theta} F(z)$ is real and
non-negative. Then
$$|F(z)| = |e^{i\theta} F(z)| = |F(e^{i\theta} z)| = G(e^{i\theta} z) \le \|e^{i\theta} z\| = \|z\|$$
so $\|F\| \le 1$. QED
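The identity $F(x) = G(x) - iG(ix)$ can be checked concretely on $\mathbb C^3$. In this sketch the coefficient vector `c` and the test vector `z` are hypothetical. The code rebuilds a complex-linear functional from its real part alone and confirms that the result is again complex linear.

```python
# Hypothetical coefficients of the complex-linear functional F(z) = sum_k c_k z_k.
c = [2 + 1j, -0.5j, 3.0]


def F(z):
    """A complex-linear functional on C^3."""
    return sum(ck * zk for ck, zk in zip(c, z))


def G(z):
    """Its real part, which is only R-linear."""
    return F(z).real


def F_from_G(z):
    """Rebuild F from G alone via F(z) = G(z) - i G(iz)."""
    iz = [1j * zk for zk in z]
    return G(z) - 1j * G(iz)


z = [1 - 2j, 0.5 + 0.5j, -1j]
print(abs(F(z) - F_from_G(z)) < 1e-12)                                  # True
print(abs(F_from_G([1j * zk for zk in z]) - 1j * F_from_G(z)) < 1e-12)  # True: complex linear
```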
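Suppose that M is a closed subspace of B and that $y \notin M$. Let d denote
the distance of y to M, so that
$$d := \inf_{x \in M} \|y - x\|.$$
Since M is closed, $d > 0$. Define the linear function F on $M + \{y\}$ by $F(\lambda y - x) := \lambda d$ for $x \in M$. Then
$$\|F\| = \sup \frac{|\lambda d|}{\|\lambda y - x\|} = \sup_{x' \in M} \frac{d}{\|y - x'\|} = \frac{d}{d} = 1.$$
So, by the Hahn-Banach theorem, there is a linear function of norm 1 on all of B which vanishes on M and takes the value d at y.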
We have an embedding
$$B \to B^{**}, \qquad x \mapsto x^{**} \quad\text{where}\quad x^{**}(F) := F(x).$$
1.18
$$\frac{2K}{r} + K.$$
So
$$\|F_n\| \le \frac{2K}{r} + K$$
for all n, proving the theorem from the proposition.
Proof of the proposition. If the proposition is false, we can find $n_1$ such that
$|F_{n_1}(x)| > 1$ at some $x \in B(0, 1)$, and hence in some ball of radius $< \frac12$ about x.
Then we can find an $n_2$ with $|F_{n_2}(z)| > 2$ in some non-empty closed ball of radius
$< \frac13$ lying inside the first ball. Continuing inductively, we choose a subsequence
$n_m$ and a family of nested non-empty balls $B_m$ with $|F_{n_m}(z)| > m$ throughout
$B_m$ and the radii of the balls tending to zero. Since B is complete, there is
a point x common to all these balls, and $\{|F_n(x)|\}$ is unbounded, contrary to
hypothesis. QED
We will have occasion to use this theorem in a reversed form. Recall
that we have the norm preserving injection $B \to B^{**}$ sending $x \mapsto x^{**}$ where
$x^{**}(F) = F(x)$. Since $B^*$ is a Banach space (even if B is incomplete), we have
Corollary 1.18.1 If $\{x_n\}$ is a sequence of elements in a normed linear space
such that the numerical sequence $\{|F(x_n)|\}$ is bounded for each fixed $F \in B^*$,
then the sequence of norms $\{\|x_n\|\}$ is bounded.
Chapter 2
Hilbert space.
2.1.1
Scalar products.
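$$z = \begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix}$$
and $(z, w)$ is given by
$$(z, w) := \sum_{i=1}^{n} z_i \overline{w_i}.$$
$$\sum |a_i|^2 < \infty.$$
Here
$$(a, b) := \sum a_i \overline{b_i}.$$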
2.1.2
$$|(f, g)| \le (f, f)^{1/2} (g, g)^{1/2}. \qquad (2.1)$$
Proof. For any real number t, condition 3. above says that $(f - tg, f - tg) \ge 0$.
Expanding out gives
$$0 \le (f - tg, f - tg) = (f, f) - t[(f, g) + (g, f)] + t^2 (g, g).$$
Since $(g, f) = \overline{(f, g)}$, the bracketed coefficient $(f, g) + (g, f)$ is twice the real
part of $(f, g)$. So the real quadratic form
$$Q(t) := (f, f) - 2\,\mathrm{Re}(f, g)\, t + t^2 (g, g)$$
is nowhere negative. So it cannot have distinct real roots, and hence by the
$b^2 - 4ac$ rule we get
$$4(\mathrm{Re}(f, g))^2 - 4(f, f)(g, g) \le 0$$
or
$$(\mathrm{Re}(f, g))^2 \le (f, f)(g, g). \qquad (2.2)$$
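This is useful and almost, but not quite, what we want. But we may apply this
inequality to $h = e^{i\theta} g$ for any θ. Then $(h, h) = (g, g)$. Choose θ so that
$$(f, g) = r e^{i\theta}$$
where $r = |(f, g)|$. Then $(f, h) = e^{-i\theta}(f, g) = r$ is real, and (2.2) applied to f and h gives $|(f, g)|^2 \le (f, f)(g, g)$, which is (2.1).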
2.1.3
$$\|f\| := (f, f)^{1/2}$$
so we can write the Cauchy-Schwarz inequality as
$$|(f, g)| \le \|f\|\,\|g\|.$$
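The triangle inequality says that
$$\|f + g\| \le \|f\| + \|g\|. \qquad (2.3)$$
Proof.
$$\begin{aligned}
\|f + g\|^2 &= (f + g, f + g) \\
&= (f, f) + 2\,\mathrm{Re}(f, g) + (g, g) \\
&\le (f, f) + 2\|f\|\|g\| + (g, g) \quad \text{by (2.2)} \\
&= \|f\|^2 + 2\|f\|\|g\| + \|g\|^2 \\
&= (\|f\| + \|g\|)^2 .
\end{aligned}$$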
Taking square roots gives the triangle inequality (2.3). Notice that
$$\|cf\| = |c|\,\|f\|. \qquad (2.4)$$
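For the model scalar product $(z, w) = \sum z_i\overline{w_i}$ on $\mathbb C^n$, the Cauchy-Schwarz inequality and the triangle inequality (2.3) can be spot-checked on random vectors. The following is a minimal Python sketch; the helper names and the random seed are arbitrary.

```python
import random


def inner(z, w):
    """(z, w) = sum z_i conj(w_i): linear in z, conjugate-linear in w."""
    return sum(zi * wi.conjugate() for zi, wi in zip(z, w))


def norm(z):
    return inner(z, z).real ** 0.5


def random_vector(n, rng):
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]


rng = random.Random(0)
for _ in range(1000):
    f, g = random_vector(5, rng), random_vector(5, rng)
    assert abs(inner(f, g)) <= norm(f) * norm(g) + 1e-12     # Cauchy-Schwarz
    s = [a + b for a, b in zip(f, g)]
    assert norm(s) <= norm(f) + norm(g) + 1e-12              # triangle inequality (2.3)
print("ok")
```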
A complex vector space V endowed with a scalar product is called a pre-Hilbert space.
Let V be a complex vector space and let $\|\cdot\|$ be a map which assigns to any
$f \in V$ a non-negative real number $\|f\|$ such that $\|f\| > 0$ for all non-zero f. If
$\|\cdot\|$ satisfies the triangle inequality (2.3) and equation (2.4), it is called a norm.
A vector space endowed with a norm is called a normed space. The pre-Hilbert
spaces can be characterized among all normed spaces by the parallelogram law
as we will discuss below.
Later on, we will have to weaken condition (2.4) in our general study. But
it is too complicated to give the general definition right now.
2.1.4
The reason for the prefix "pre" is the following: The distance d defined above
has all the desired properties we might expect of a distance. In particular, we
can define the notions of limit and of a Cauchy sequence as is done for the
real numbers: If $f_n$ is a sequence of elements of V, and $f \in V$, we say that f is
the limit of the $f_n$ and write
$$\lim_{n \to \infty} f_n = f,$$
or $f_n \to f$
2.1.5
(2.5)
$$\sum_i |z_i|^2 \|u_i\|^2.$$
2.1.6
$$\|f + g\|^2 = \|f\|^2 + 2\,\mathrm{Re}(f, g) + \|g\|^2 \qquad (2.6)$$
$$\|f - g\|^2 = \|f\|^2 - 2\,\mathrm{Re}(f, g) + \|g\|^2 \qquad (2.7)$$
Adding (2.6) and (2.7) gives
$$\|f + g\|^2 + \|f - g\|^2 = 2\left(\|f\|^2 + \|g\|^2\right). \qquad (2.8)$$
This is known as the parallelogram law. It is the algebraic expression of the
theorem of Apollonius, which asserts that the sum of the areas of the squares on
the sides of a parallelogram equals the sum of the areas of the squares on the
diagonals.
If we subtract (2.7) from (2.6) we get
$$\mathrm{Re}(f, g) = \frac14\left(\|f + g\|^2 - \|f - g\|^2\right). \qquad (2.9)$$
Since $\mathrm{Re}(f, ig) = \mathrm{Im}(f, g)$, replacing g by ig in (2.9) recovers the imaginary part as well, and we get
$$(f, g) = \frac14\left(\|f + g\|^2 - \|f - g\|^2 + i\|f + ig\|^2 - i\|f - ig\|^2\right). \qquad (2.10)$$
If we now complete a pre-Hilbert space, the right hand side of this equation
is defined on the completion, and is a continuous function there. It therefore
follows that the scalar product extends to the completion and, by continuity,
satisfies all the axioms for a scalar product, plus the completeness condition for
the associated norm. In other words, the completion of a pre-Hilbert space is a
Hilbert space.
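The polarization identity (2.10) says that the scalar product is determined by the norm. The following Python sketch works on $\mathbb C^n$ with $(z, w) = \sum z_i \overline{w_i}$. It recovers $(f, g)$ from four norms and also checks the parallelogram law (2.8). Helper names and the random data are illustrative assumptions.

```python
import random


def inner(z, w):
    """(z, w) = sum z_i conj(w_i)."""
    return sum(zi * wi.conjugate() for zi, wi in zip(z, w))


def norm_sq(z):
    return inner(z, z).real


def plus(f, g, c):
    """The vector f + c g."""
    return [a + c * b for a, b in zip(f, g)]


def polarization(f, g):
    """Recover (f, g) from norms alone, following (2.10)."""
    return 0.25 * (norm_sq(plus(f, g, 1)) - norm_sq(plus(f, g, -1))
                   + 1j * norm_sq(plus(f, g, 1j)) - 1j * norm_sq(plus(f, g, -1j)))


rng = random.Random(1)
f = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)]
g = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)]
print(abs(polarization(f, g) - inner(f, g)) < 1e-10)                 # True
print(abs(norm_sq(plus(f, g, 1)) + norm_sq(plus(f, g, -1))
          - 2 * (norm_sq(f) + norm_sq(g))) < 1e-10)                  # True: (2.8)
```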
2.1.7
definition, then all the axioms on a scalar product hold. The easiest axiom to
verify is
$$(g, f) = \overline{(f, g)}.$$
Indeed, the real part of the right hand side of (2.10) is unchanged under the
interchange of f and g, since $g - f = -(f - g)$ and $\|-h\| = \|h\|$ for any h (one
of the properties of a norm). Also $g + if = i(f - ig)$ and $\|ih\| = \|h\|$, so the last
two terms on the right of (2.10) get interchanged, proving that $(g, f) = \overline{(f, g)}$.
It is just as easy to prove that
$$(if, g) = i(f, g).$$
Indeed, replacing f by if sends $\|f + ig\|^2$ into $\|if + ig\|^2 = \|f + g\|^2$ and sends
$\|f + g\|^2$ into $\|if + g\|^2 = \|i(f - ig)\|^2 = \|f - ig\|^2 = i(-i\|f - ig\|^2)$. So it has the
effect of multiplying the sum of the first and fourth terms by i, and similarly
for the sum of the second and third terms on the right hand side of (2.10).
Now (2.10) implies (2.9). Suppose we replace f, g in (2.8) by $f_1 + g, f_2$ and
by $f_1 - g, f_2$, and subtract the second equation from the first. We get
$$\|f_1 + f_2 + g\|^2 - \|f_1 + f_2 - g\|^2 + \|f_1 - f_2 + g\|^2 - \|f_1 - f_2 - g\|^2 = 2\left(\|f_1 + g\|^2 - \|f_1 - g\|^2\right).$$
In view of (2.9) we can write this as
$$\mathrm{Re}\,(f_1 + f_2, g) + \mathrm{Re}\,(f_1 - f_2, g) = 2\,\mathrm{Re}\,(f_1, g). \qquad (2.11)$$
Now the right hand side of (2.9) vanishes when $f = 0$ since $\|g\| = \|-g\|$. So if
we take $f_1 = f_2 = f$ in (2.11) we get
$$\mathrm{Re}\,(2f, g) = 2\,\mathrm{Re}\,(f, g).$$
We can thus write (2.11) as
$$\mathrm{Re}\,(f_1 + f_2, g) + \mathrm{Re}\,(f_1 - f_2, g) = \mathrm{Re}\,(2f_1, g).$$
In this equation make the substitutions
$$f_1 \mapsto \tfrac12(f_1 + f_2), \qquad f_2 \mapsto \tfrac12(f_1 - f_2).$$
This yields
$$\mathrm{Re}\,(f_1 + f_2, g) = \mathrm{Re}\,(f_1, g) + \mathrm{Re}\,(f_2, g).$$
Since it follows from (2.10) and (2.9) that
$$(f, g) = \mathrm{Re}\,(f, g) - i\,\mathrm{Re}\,(if, g),$$
we conclude that
$$(f_1 + f_2, g) = (f_1, g) + (f_2, g).$$
2.1.8 Orthogonal projection.
$$\frac{(v, y)}{\|y\|^2}. \qquad (2.12)$$
2.1.9
$$\sup_{x \in V,\ \|x\| \ne 0} |\ell(x)| / \|x\|.$$
Theorem 2.1.2 Every continuous linear function on H is given by scalar product by some element of H.
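The proof is a consequence of the theorem about projections applied to
$$N := \ker \ell :$$
If $\ell = 0$ there is nothing to prove. If $\ell \ne 0$ then N is a closed subspace of
codimension one. Choose $v \notin N$. Then there is an $x \in N$ with $(v - x) \perp N$. Let
$$y := \frac{1}{\|v - x\|}(v - x).$$
Then
$$y \perp N \quad\text{and}\quad \|y\| = 1.$$
For any $f \in H$,
$$[f - (f, y)y] \perp y,$$
so, since N has codimension one and $y \perp N$ (so that $H = N \oplus \mathbb C y$),
$$f - (f, y)y \in N$$
or
$$\ell(f) = (f, y)\ell(y).$$
So if we set
$$g := \overline{\ell(y)}\, y$$
then
$$(f, g) = \ell(f)$$
for all $f \in H$. QED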
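The proof can be traced step by step in the finite-dimensional case $H = \mathbb C^3$, with $\ell(f) = \sum c_k f_k$ for a hypothetical coefficient vector `c`. To keep the sketch short, the projection onto $N = \ker \ell$ uses a fact special to this example: $N^\perp$ is spanned by $\bar c$. The remaining steps follow the text, and the representer comes out as $g = \bar c$.

```python
# Hypothetical functional ell(f) = sum_k c_k f_k on C^3.
c = [1 - 1j, 2 + 0j, 0.5j]


def inner(z, w):
    """(z, w) = sum z_i conj(w_i)."""
    return sum(zi * wi.conjugate() for zi, wi in zip(z, w))


def ell(f):
    return sum(ck * fk for ck, fk in zip(c, f))


def riesz_representer(v):
    """Follow the proof: for v with ell(v) != 0, let x be the projection of v onto
    N = ker ell, normalize v - x to get y, and return g = conj(ell(y)) y."""
    u = [ck.conjugate() for ck in c]            # spans N-perp here, since (f, u) = ell(f)
    x = [vk - inner(v, u) / inner(u, u) * uk for vk, uk in zip(v, u)]   # x lies in N
    d = [vk - xk for vk, xk in zip(v, x)]       # v - x is orthogonal to N
    norm_d = inner(d, d).real ** 0.5
    y = [dk / norm_d for dk in d]
    return [ell(y).conjugate() * yk for yk in y]


g = riesz_representer([1, 1, 1])               # ell([1, 1, 1]) = 3 - 0.5j != 0
f = [0.3 - 2j, 1.5, -1 + 1j]
print(abs(inner(f, g) - ell(f)) < 1e-12)       # True
```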
2.1.10
What is $L^2(\mathbf T)$?
We have defined the space $L^2(\mathbf T)$ to be the completion of the space $C(\mathbf T)$ under
the $L^2$ norm $\|f\|_2 = (f, f)^{1/2}$. In particular, every linear function on $C(\mathbf T)$ which
is continuous with respect to this $L^2$ norm extends to a unique continuous
linear function on $L^2(\mathbf T)$. By the Riesz representation theorem we know that
every such continuous linear function is given by scalar product by an element of
$L^2(\mathbf T)$. Thus we may think of the elements of $L^2(\mathbf T)$ as being the linear functions
on $C(\mathbf T)$ which are continuous with respect to the $L^2$ norm. An element of $L^2(\mathbf T)$
should not be thought of as a function, but rather as a linear function on the
space of continuous functions relative to a special norm: the $L^2$ norm.
2.1.11
meaning that the $M_i$ are mutually perpendicular and every element x of M can
be written as
$$x = \sum x_i, \qquad x_i \in M_i.$$
(The orthogonality guarantees that such a decomposition is unique.) Suppose
further that each $M_i$ is such that the projection $\pi_{M_i}$ exists. Then $\pi_M$ exists
and
$$\pi_M(v) = \sum \pi_{M_i}(v). \qquad (2.13)$$
Proof. Clearly the right hand side belongs to M. We must show that $v - \sum_i \pi_{M_i}(v)$
is orthogonal to every element of M. For this it is enough to show that it is
orthogonal to each $M_j$, since every element of M is a sum of elements of the $M_j$.
So suppose $x_j \in M_j$. But $(\pi_{M_i} v, x_j) = 0$ if $i \ne j$. So
$$\Big(v - \sum \pi_{M_i}(v), x_j\Big) = (v - \pi_{M_j}(v), x_j) = 0$$
by the defining property of $\pi_{M_j}$.
2.1.12
We will now put the equations (2.12) and (2.13) together. Suppose that M is
a finite dimensional subspace with an orthonormal basis $\phi_i$. This implies that
M is an orthogonal direct sum of the one-dimensional spaces spanned by the $\phi_i$,
and hence $\pi_M$ exists and is given by
$$\pi_M(v) = \sum a_i \phi_i \quad\text{where}\quad a_i = (v, \phi_i). \qquad (2.14)$$
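Formula (2.14) translates directly into code. This Python sketch projects a vector of $\mathbb C^3$ onto the span of a hypothetical orthonormal pair. It then checks that $v - \pi_M(v)$ is orthogonal to M, and that the Pythagorean identity $\|v\|^2 = \|v - \pi_M(v)\|^2 + \|\pi_M(v)\|^2$ holds.

```python
def inner(z, w):
    """(z, w) = sum z_i conj(w_i)."""
    return sum(zi * wi.conjugate() for zi, wi in zip(z, w))


def project(v, basis):
    """pi_M(v) = sum_i (v, phi_i) phi_i for an orthonormal basis phi_i of M, as in (2.14)."""
    coeffs = [inner(v, phi) for phi in basis]
    return [sum(a * phi[k] for a, phi in zip(coeffs, basis)) for k in range(len(v))]


s = 2 ** -0.5
basis = [[s, s, 0], [0, 0, 1j]]       # a hypothetical orthonormal pair in C^3
v = [1, 2, 3 + 1j]
p = project(v, basis)
r = [vk - pk for vk, pk in zip(v, p)]  # the component of v orthogonal to M
print([abs(inner(r, phi)) < 1e-12 for phi in basis])   # [True, True]
```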
2.1.13
Bessel's inequality.
We now look at the infinite dimensional situation and suppose that we are given
an orthonormal sequence $\{\phi_i\}_1^\infty$. Any $v \in V$ has its Fourier coefficients
$$a_i = (v, \phi_i)$$
relative to the members of this sequence. Bessel's inequality asserts that
$$\sum_{i=1}^\infty |a_i|^2 \le \|v\|^2. \qquad (2.15)$$
Proof. Let
$$v_n := \sum_{i=1}^n a_i \phi_i,$$
so that $v_n$ is the projection of v onto the subspace spanned by the first n of the
$\phi_i$. In any event, $(v - v_n) \perp v_n$, so by the Pythagorean Theorem
$$\|v\|^2 = \|v - v_n\|^2 + \|v_n\|^2 = \|v - v_n\|^2 + \sum_{i=1}^n |a_i|^2.$$
This implies that
$$\sum_{i=1}^n |a_i|^2 \le \|v\|^2,$$
and letting $n \to \infty$ shows that the series on the left of Bessel's inequality
converges and that Bessel's inequality holds.
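Bessel's inequality can be illustrated for the sawtooth $f(x) = x$ on $[-\pi, \pi]$ with the orthonormal functions $e^{inx}$. In this sketch the Fourier coefficients and $\|f\|^2$ are both approximated by the same M-point midpoint rule; M = 4096 is an arbitrary choice. For $|n| < M/2$ the sampled exponentials are exactly orthonormal for that discrete scalar product, so the partial sums stay below $\|f\|^2$ up to rounding.

```python
import cmath
import math

M = 4096                                    # number of quadrature nodes (illustrative)
xs = [-math.pi + (j + 0.5) * 2 * math.pi / M for j in range(M)]


def f(x):
    return x                                # the sawtooth f(x) = x on [-pi, pi]


def coeff(n):
    """Midpoint-rule approximation of a_n = (f, e^{inx}) = (1/2pi) int f(x) e^{-inx} dx."""
    return sum(f(x) * cmath.exp(-1j * n * x) for x in xs) / M


norm_sq = sum(f(x) ** 2 for x in xs) / M    # approximates ||f||^2 = pi^2 / 3

partial = abs(coeff(0)) ** 2
sums = [partial]
for n in range(1, 41):
    partial += abs(coeff(n)) ** 2 + abs(coeff(-n)) ** 2
    sums.append(partial)

print(all(s <= norm_sq + 1e-9 for s in sums))   # True: Bessel's inequality
```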
2.1.14
Parseval's equation.
2.1.15
Orthonormal bases.
We still suppose that V is merely a pre-Hilbert space. We say that an orthonormal sequence $\{\phi_i\}$ is a basis of V if every element of V is the sum of its Fourier
series. For example, one of our tasks will be to show that the exponentials
$\{e^{inx}\}_{n=-\infty}^{\infty}$ form a basis of $C(\mathbf T)$.
If the orthonormal sequence $\phi_i$ is a basis, then any v can be approximated
as closely as we like by finite linear combinations of the $\phi_i$, in fact by the partial
sums of its Fourier series. We say that the finite linear combinations of the $\phi_i$
are dense in V. Conversely, suppose that the finite linear combinations of the
$\phi_i$ are dense in V. This means that for any v and any $\epsilon > 0$ we can find an n
and a set of n complex numbers $b_i$ such that
$$\Big\|v - \sum_{i=1}^n b_i \phi_i\Big\| \le \epsilon.$$
But we know that $v_n$ is the closest vector to v among all the linear combinations
of the first n of the $\phi_i$, so we must have
$$\|v - v_n\| \le \epsilon,$$
and the same holds for every $v_m$ with $m \ge n$, since $\|v - v_m\|^2 = \|v\|^2 - \sum_{i=1}^m |a_i|^2$ decreases with m.
But this says that the Fourier series of v converges to v, i.e. that the $\phi_i$ form
a basis. For example, we know from Fejér's theorem that the exponentials $e^{ikx}$
are dense in $C(\mathbf T)$. Hence we know that they form a basis of the pre-Hilbert
space $C(\mathbf T)$. We will give some alternative proofs of this fact below.
In the case that V is actually a Hilbert space, and not merely a pre-Hilbert
space, there is an alternative and very useful criterion for an orthonormal sequence to be a basis. Let M be the set of all limits of finite linear combinations of
the $\phi_i$. Any Cauchy sequence in M converges (in V) since V is a Hilbert space,
and this limit belongs to M since it is itself a limit of finite linear combinations
of the $\phi_i$ (by the diagonal argument, for example). Thus $V = M \oplus M^\perp$, and
the $\phi_i$ form a basis of M. So the $\phi_i$ form a basis of V if and only if $M^\perp = \{0\}$.
But this is the same as saying that no non-zero vector is orthogonal to all the
$\phi_i$. So we have proved
Proposition 2.1.1 In a Hilbert space, the orthonormal set $\{\phi_i\}$ is a basis if
and only if no non-zero vector is orthogonal to all the $\phi_i$.
2.2
Self-adjoint transformations.
We continue to let V denote a pre-Hilbert space. Let T be a linear transformation of V into itself. This means that for every $v \in V$ the vector $Tv \in V$
is defined and that Tv depends linearly on v: $T(av + bw) = aTv + bTw$ for
any two vectors v and w and any two complex numbers a and b. We recall
from linear algebra that a non-zero vector v is called an eigenvector of T if Tv
is a scalar times v, in other words if $Tv = \lambda v$, where the number λ is called the
corresponding eigenvalue.
A linear transformation T on V is called symmetric if for any pair of
elements v and w of V we have
(T v, w) = (v, T w).
Notice that if v is an eigenvector of a symmetric transformation T with
eigenvalue λ, then
$$\lambda(v, v) = (\lambda v, v) = (Tv, v) = (v, Tv) = (v, \lambda v) = \bar\lambda (v, v),$$
so $\lambda = \bar\lambda$, since $(v, v) \ne 0$. In other words, all eigenvalues of a symmetric transformation are
real.
We will let $S = S(V)$ denote the unit sphere of V, i.e. S denotes the set
of all $\phi \in V$ such that $\|\phi\| = 1$. A linear transformation T is called bounded
if $\|T\phi\|$ is bounded as φ ranges over all of S. If T is bounded, we let
$$\|T\| := \sup_{\phi \in S} \|T\phi\|.$$
Then
$$\|Tv\| \le \|T\|\,\|v\|$$
for all $v \in V$. A linear transformation on a finite dimensional space is automatically bounded, but this is not so for an infinite dimensional space.
Also, for any linear transformation T, we will let $N(T)$ denote the kernel of
T, so
$$N(T) = \{v \in V \mid Tv = 0\},$$
and $R(T)$ denote the range of T, so
$$R(T) := \{v \mid v = Tw \text{ for some } w \in V\}.$$
Both $N(T)$ and $R(T)$ are linear subspaces of V.
For bounded transformations, the phrase self-adjoint is synonymous with
symmetric. Later on we will need to study non-bounded (not everywhere
defined) symmetric transformations, and then a rather subtle and important
distinction will be made between self-adjoint transformations and those which
are merely symmetric. But for the rest of this section we will only be considering bounded linear transformations, and so we will freely use the phrase
self-adjoint, and (usually) drop the adjective bounded since all our transformations will be assumed to be bounded.
We denote the set of all (bounded) self-adjoint transformations by A, or by
A(V ) if we need to make V explicit.
2.2.1
$$|(Tv, w)| \le (Tv, v)^{1/2} (Tw, w)^{1/2}.$$
Now let us assume in addition that T is bounded with norm $\|T\|$. Let us take
$w = Tv$ in the preceding inequality. We get
$$(TTv, Tv)^{1/2} \le \|TTv\|^{1/2} \|Tv\|^{1/2} \le \|T\|^{1/2} \|Tv\|^{1/2} \|Tv\|^{1/2} = \|T\|^{1/2} \|Tv\|,$$
where we have used the defining property of $\|T\|$ in the form $\|TTv\| \le \|T\|\,\|Tv\|$.
Substituting this into the previous inequality we get
$$\|Tv\|^2 \le (Tv, v)^{1/2} \|T\|^{1/2} \|Tv\|.$$
If $\|Tv\| \ne 0$ we may divide this inequality by $\|Tv\|$ to obtain
$$\|Tv\| \le \|T\|^{1/2} (Tv, v)^{1/2}. \qquad (2.17)$$
(2.18)
It also follows from (2.17) that if we have a sequence $\{v_n\}$ of vectors with
$(Tv_n, v_n) \to 0$, then $\|Tv_n\| \to 0$, and so
$$(Tv_n, v_n) \to 0 \ \Rightarrow\ Tv_n \to 0. \qquad (2.19)$$
Notice that if T is a bounded self-adjoint transformation, not necessarily non-negative, then $rI - T$ is a non-negative self-adjoint transformation if $r \ge \|T\|$.
Indeed,
$$((rI - T)v, v) = r(v, v) - (Tv, v) \ge (r - \|T\|)(v, v) \ge 0$$
since, by Cauchy-Schwarz,
$$(Tv, v) \le |(Tv, v)| \le \|Tv\|\,\|v\| \le \|T\|\,\|v\|^2 = \|T\|(v, v).$$
So we may apply the preceding results to $rI - T$.
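Inequality (2.17), and the remark that $rI - T$ is non-negative for $r \ge \|T\|$, can be spot-checked on real symmetric $2 \times 2$ matrices. For these, $\|T\|$ is the largest eigenvalue in absolute value. The sketch below builds non-negative operators as $B^{T}B$ from random B; all names, ranges and tolerances are illustrative.

```python
import math
import random


def sym_eigs(a, b, d):
    """Eigenvalues of the real symmetric matrix [[a, b], [b, d]]."""
    mean, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    return mean - rad, mean + rad


def apply(a, b, d, v):
    """Apply [[a, b], [b, d]] to the vector v."""
    return (a * v[0] + b * v[1], b * v[0] + d * v[1])


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


rng = random.Random(2)
ok = True
for _ in range(1000):
    # T = B^T B with B = [[b11, b12], [b21, b22]] is non-negative self-adjoint.
    b11, b12, b21, b22 = (rng.uniform(-2, 2) for _ in range(4))
    a, b, d = b11**2 + b21**2, b11 * b12 + b21 * b22, b12**2 + b22**2
    norm_T = max(abs(e) for e in sym_eigs(a, b, d))
    v = (rng.gauss(0, 1), rng.gauss(0, 1))
    Tv = apply(a, b, d, v)
    # (2.17): ||Tv|| <= ||T||^{1/2} (Tv, v)^{1/2}
    ok = ok and math.sqrt(dot(Tv, Tv)) <= math.sqrt(norm_T * max(dot(Tv, v), 0.0)) + 1e-9
    # r I - S is non-negative for a general symmetric S when r = ||S||.
    s11, s12, s22 = (rng.uniform(-3, 3) for _ in range(3))
    r = max(abs(e) for e in sym_eigs(s11, s12, s22))
    ok = ok and dot(apply(r - s11, -s12, r - s22, v), v) >= -1e-9
print(ok)  # True
```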
2.3
$$\frac{1}{m_1^2}\, Tw. \qquad \frac{1}{m_1^2}\, T^2 w$$
or
$$T^2 w = m_1^2 w.$$
Also $\|w\| = \|T\| = m_1 \ne 0$. So $w \ne 0$, and w is an eigenvector of $T^2$ with
eigenvalue $m_1^2$. We have
$$0 = (T^2 - m_1^2)w = (T + m_1)(T - m_1)w.$$
If $(T - m_1)w = 0$, then w is an eigenvector of T with eigenvalue $m_1$, and we
normalize by setting
$$\phi_1 := \frac{1}{\|w\|}\, w.$$
Then $\|\phi_1\| = 1$ and
$$T\phi_1 = m_1\phi_1.$$
If $(T - m_1)w \ne 0$ then $y := (T - m_1)w$ is an eigenvector of T with eigenvalue
$-m_1$, and again we normalize by setting
$$\phi_1 := \frac{1}{\|y\|}\, y.$$
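The first case is one of the alternatives in the theorem, so we need to look at
the second alternative.
We first prove that $|r_n| \to 0$. If not, there is some $c > 0$ such that $|r_n| \ge c$ for
all n (since the $|r_n|$ are decreasing). If $i \ne j$, then by the Pythagorean theorem
we have
$$\|T\phi_i - T\phi_j\|^2 = \|r_i\phi_i - r_j\phi_j\|^2 = r_i^2\|\phi_i\|^2 + r_j^2\|\phi_j\|^2.$$
Since $\|\phi_i\| = \|\phi_j\| = 1$ this gives
$$\|T\phi_i - T\phi_j\|^2 = r_i^2 + r_j^2 \ge 2c^2.$$
Hence no subsequence of the $T\phi_i$ can converge, since all these vectors are at distance at least $\sqrt2\, c$ from one another. This contradicts the compactness of T.
Now let $v \in H$, let $w = Tv$, and let $a_i = (v, \phi_i)$ and $b_i = (w, \phi_i) = (v, T\phi_i) = r_i a_i$ be their Fourier coefficients. Then
$$w - \sum_{i=1}^n b_i\phi_i = Tv - \sum_{i=1}^n a_i r_i \phi_i = T\Big(v - \sum_{i=1}^n a_i\phi_i\Big).$$
Now $v - \sum_{i=1}^n a_i\phi_i$ is orthogonal to $\phi_1, \dots, \phi_n$, so, by the way $r_{n+1}$ was chosen,
$$\Big\|T\Big(v - \sum_{i=1}^n a_i\phi_i\Big)\Big\| \le |r_{n+1}|\,\Big\|v - \sum_{i=1}^n a_i\phi_i\Big\|,$$
and by the Pythagorean theorem
$$\Big\|v - \sum_{i=1}^n a_i\phi_i\Big\| \le \|v\|.$$
Hence
$$\Big\|w - \sum_{i=1}^n b_i\phi_i\Big\| = \Big\|T\Big(v - \sum_{i=1}^n a_i\phi_i\Big)\Big\| \le |r_{n+1}|\,\|v\| \to 0.$$
This proves that the Fourier series of w converges to w, concluding the proof of
the theorem.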
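The estimate $\|w - \sum_{i\le n} b_i\phi_i\| \le |r_{n+1}|\,\|v\|$ can be seen concretely for a diagonal operator. This sketch truncates to $K = 2000$ coordinates and uses the hypothetical eigenvalues $r_i = 1/i$, so that $T e_i = r_i e_i$. The standard basis plays the role of the eigenvectors $\phi_i$, and the error of the n-th partial Fourier sum of $w = Tv$ is just the norm of the tail of w.

```python
import math
import random

K = 2000                                    # truncation of l^2 (illustrative)
r = [1.0 / (i + 1) for i in range(K)]       # eigenvalues r_1 >= r_2 >= ... -> 0
rng = random.Random(3)
v = [rng.gauss(0, 1) for _ in range(K)]
w = [ri * vi for ri, vi in zip(r, v)]       # w = Tv, where T e_i = r_i e_i

norm_v = math.sqrt(sum(x * x for x in v))
ok = True
for n in (1, 10, 100, 1000):
    # Fourier coefficients of w in the eigenbasis are b_i = r_i a_i with a_i = v_i,
    # so the error of the n-th partial sum is the norm of the tail of w.
    tail = math.sqrt(sum(w[i] ** 2 for i in range(n, K)))
    ok = ok and tail <= r[n] * norm_v + 1e-12   # r[n] is r_{n+1} in 1-based indexing
print(ok)  # True
```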
The converse of the above result is easy. Here is a version. Suppose that
H is a Hilbert space with an orthonormal basis $\{\phi_i\}$ consisting of eigenvectors
of a bounded operator T, $T\phi_r = \lambda_r\phi_r$, with $\lambda_r \to 0$. Then T is compact. Indeed, for each j choose $N(j)$ such that
$$|\lambda_r| < \frac1j \quad\text{for}\quad r > N(j).$$
We can then let $H_j$ denote the closed subspace spanned by all the eigenvectors
$\phi_r$, $r > N(j)$, so that
$$H = H_j^\perp \oplus H_j$$
is an orthogonal decomposition and $H_j^\perp$ is finite dimensional; in fact it is spanned by
the first $N(j)$ eigenvectors of T.
Now let $\{u_i\}$ be a sequence of vectors with, say, $\|u_i\| \le 1$. We decompose
each element as
$$u_i = u_i' \oplus u_i'', \qquad u_i' \in H_1^\perp,\ u_i'' \in H_1.$$
We can choose a subsequence so that $u_{i_k}'$ converges, because they all belong to
a finite dimensional space. Hence $Tu_{i_k}'$, the $H_1^\perp$ component of $Tu_{i_k}$, also converges, since T is bounded. We can
decompose every element of this subsequence into its $H_2^\perp$ and $H_2$ components,
and choose a subsequence so that the first component converges. Proceeding in
this way, and then using the Cantor diagonal trick of choosing the k-th term of
the k-th selected subsequence, we have found a subsequence such that, for any
fixed j, the $H_j^\perp$ component of $Tu_k$ (in the relabeled subsequence) converges as $k \to \infty$.
But the $H_j$ component of $Tu_k$ has norm at most $1/j$. So the sequence $\{Tu_k\}$ is Cauchy by the triangle inequality, and hence converges.
2.4
2.4.1
We have let $C(\mathbf T)$ denote the space of continuous functions on the real line which
are periodic with period 2π. We will let $C^1(\mathbf T)$ denote the space of periodic
functions which have a continuous first derivative (necessarily periodic), and
$C^2(\mathbf T)$ the space of periodic functions with two continuous derivatives. If f and
g both belong to $C^1(\mathbf T)$ then integration by parts gives
$$\frac{1}{2\pi}\int_{-\pi}^{\pi} f' g\,dx = -\frac{1}{2\pi}\int_{-\pi}^{\pi} f g'\,dx,$$
since the boundary terms, which normally arise in the integration by parts
formula, cancel, due to the periodicity of f and g. If we take $g = e^{-inx}/(-in)$, $n \ne 0$,
the integral on the right hand side of this equation is the Fourier coefficient:
$$c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x) e^{-inx}\,dx.$$
We thus obtain
$$c_n = \frac{1}{in}\,\frac{1}{2\pi}\int_{-\pi}^{\pi} f'(x) e^{-inx}\,dx,$$
so, for $n \ne 0$,
$$|c_n| \le \frac{A}{|n|} \quad\text{where}\quad A := \frac{1}{2\pi}\int_{-\pi}^{\pi} |f'(x)|\,dx.$$
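The bound $|c_n| \le A/|n|$ can be checked for the smooth periodic function $f(x) = e^{\cos x}$. For this f, $A = \frac{1}{2\pi}\int_{-\pi}^{\pi} |\sin x|\, e^{\cos x}\,dx = (e - e^{-1})/\pi$. The choice of f and the quadrature size are illustrative. The coefficients are approximated by the midpoint rule, which is very accurate for smooth periodic integrands.

```python
import cmath
import math

M = 2048                                    # number of quadrature nodes (illustrative)
xs = [-math.pi + (j + 0.5) * 2 * math.pi / M for j in range(M)]


def f(x):
    return math.exp(math.cos(x))            # a smooth 2pi-periodic test function


def c(n):
    """Midpoint-rule approximation of c_n = (1/2pi) int f(x) e^{-inx} dx."""
    return sum(f(x) * cmath.exp(-1j * n * x) for x in xs) / M


# A = (1/2pi) int |f'(x)| dx with f'(x) = -sin(x) e^{cos x}; in closed form (e - 1/e)/pi.
A = (math.e - 1 / math.e) / math.pi
print(all(abs(c(n)) <= A / abs(n) for n in range(-20, 21) if n != 0))   # True
```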
$$\lim_{N, M \to \infty} \sum_{n=-N}^{M} c_n = f(0).$$
Write $f(x) = (f(x) - f(0)) + f(0)$. The Fourier coefficients of any constant
function c all vanish except for the $c_0$ term, which equals c. So the above limit
is trivially true when f is a constant. Hence, in proving the above formula, it is
enough to prove it under the additional assumption that $f(0) = 0$, and we need
to prove that in this case
$$\lim_{N, M \to \infty} (c_{-N} + c_{-N+1} + \cdots + c_M) = 0.$$
The sum in question is $\frac{1}{2\pi}\int_{-\pi}^{\pi} f(x) g_{N,M}(x)\,dx$,
where
$$g_{N,M}(x) = e^{iNx} + e^{i(N-1)x} + \cdots + e^{-iMx} = e^{iNx}\left(1 + e^{-ix} + \cdots + e^{-i(M+N)x}\right) = e^{iNx}\,\frac{1 - e^{-i(M+N+1)x}}{1 - e^{-ix}} = \frac{e^{iNx} - e^{-i(M+1)x}}{1 - e^{-ix}}, \quad x \ne 0,$$
where we have used the formula for a geometric sum. By l'Hôpital's rule, this
extends continuously to the value $M + N + 1$ for $x = 0$. Now $f(0) = 0$, and
since f has two continuous derivatives, the function
$$h(x) := \frac{f(x)}{1 - e^{-ix}},$$
defined for x not a multiple of 2π, extends by l'Hôpital's rule to a function defined at all values, which is continuously differentiable and periodic.
Hence the limit we are computing is
$$\lim_{N, M \to \infty}\left(\frac{1}{2\pi}\int_{-\pi}^{\pi} h(x) e^{iNx}\,dx - \frac{1}{2\pi}\int_{-\pi}^{\pi} h(x) e^{-i(M+1)x}\,dx\right)$$
and we know that each of these terms tends to zero.
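The closed form for $g_{N,M}$, and its value $M + N + 1$ at $x = 0$, can be compared numerically with the defining sum. The following is a small Python sketch; the parameter values are arbitrary.

```python
import cmath


def g_sum(N, M, x):
    """g_{N,M}(x) = sum_{n=-N}^{M} e^{-inx} = e^{iNx} + ... + e^{-iMx}."""
    return sum(cmath.exp(-1j * n * x) for n in range(-N, M + 1))


def g_closed(N, M, x):
    """(e^{iNx} - e^{-i(M+1)x}) / (1 - e^{-ix}), extended by M + N + 1 at x = 0."""
    if x == 0:
        return M + N + 1
    return (cmath.exp(1j * N * x) - cmath.exp(-1j * (M + 1) * x)) / (1 - cmath.exp(-1j * x))


ok = all(abs(g_sum(N, M, x) - g_closed(N, M, x)) < 1e-9
         for N in (0, 3, 10) for M in (0, 5, 17) for x in (0.0, 0.1, 0.7, 2.0, -1.3))
print(ok)                                       # True
print(abs(g_closed(4, 6, 1e-7) - 11) < 1e-4)    # True: continuity at x = 0
```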
We have thus proved that the Fourier series of any twice differentiable periodic function converges uniformly and absolutely to that function. If we consider
the space $C^2(\mathbf T)$ with our usual scalar product
$$(f, g) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f\bar g\,dx,$$
then the functions $e^{inx}$ are dense in this space, since uniform convergence implies
convergence in the $\|\cdot\|$ norm associated to $(\cdot, \cdot)$. So, on general principles, Bessel's
inequality and Parseval's equation hold.
It is not true in general that the Fourier series of a continuous function
converges uniformly to that function (or converges at all in the sense of uniform
convergence). However, it is true that we do have convergence in the $L^2$ norm,
i.e. the Hilbert space $\|\cdot\|$ norm on $C(\mathbf T)$. To prove this, we need only prove that
the exponential functions $e^{inx}$ are dense. Since they are dense in $C^2(\mathbf T)$, it is
enough to prove that $C^2(\mathbf T)$ is dense in $C(\mathbf T)$. For this, let φ be a function defined
on the line with at least two continuous bounded derivatives, of total integral equal to one, and which vanishes rapidly at infinity. A favorite
is the Gauss normal function
$$\phi(x) := \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}.$$
For $t > 0$ set $\phi_t(x) := \frac1t\,\phi\!\left(\frac xt\right)$.
As $t \to 0$ the function $\phi_t$ becomes more and more concentrated about the origin,
but still has total integral one. Hence, for any bounded continuous function f,
the function $\phi_t \star f$ defined by
$$(\phi_t \star f)(x) := \int f(x - y)\phi_t(y)\,dy = \int f(u)\phi_t(x - u)\,du$$
satisfies $\phi_t \star f \to f$ uniformly on any finite interval. From the rightmost expression for $\phi_t \star f$ above we see that $\phi_t \star f$ has two continuous derivatives. From
the first expression we see that $\phi_t \star f$ is periodic if f is. This proves that $C^2(\mathbf T)$
is dense in $C(\mathbf T)$. We have thus proved convergence in the $L^2$ norm.
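The smoothing argument can be watched numerically. This sketch convolves the continuous but non-differentiable periodic function $f(x) = |\sin x|$ with $\phi_t$, the scaled Gauss normal function. It approximates the integral by a midpoint rule truncated to $|y| \le 8t$; both are assumptions of the sketch. It then reports the maximum error on a grid in $[-\pi, \pi]$. The error shrinks as $t \to 0$, and near the kinks it is roughly proportional to t.

```python
import math


def phi(x):
    """Gauss normal function (1/sqrt(2pi)) e^{-x^2/2}."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def f(x):
    return abs(math.sin(x))      # continuous and periodic, with kinks at multiples of pi


def smoothed(x, t, m=800):
    """(phi_t * f)(x) = int f(x - y) phi_t(y) dy with phi_t(y) = phi(y/t)/t,
    truncated to |y| <= 8t and approximated by the midpoint rule with m cells."""
    a = 8 * t
    h = 2 * a / m
    return h * sum(f(x - y) * phi(y / t) / t
                   for y in (-a + (k + 0.5) * h for k in range(m)))


def max_error(t, grid=101):
    xs = [-math.pi + 2 * math.pi * j / (grid - 1) for j in range(grid)]
    return max(abs(smoothed(x, t) - f(x)) for x in xs)


errors = [max_error(t) for t in (0.2, 0.05, 0.01)]
print(errors[0] > errors[1] > errors[2], errors[2] < 0.01)   # True True
```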
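2.4.2
Let $D := \dfrac{d}{dx}$. The exponentials $e^{inx}$ are eigenvectors of D,
in that
$$D e^{inx} = in\,e^{inx}.$$
So they are also eigenvectors of the operator $D^2$, with eigenvalues $-n^2$. Also, on
the space of twice differentiable periodic functions the operator $D^2$ satisfies
$$(D^2 f, g) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f''(x)\overline{g(x)}\,dx = \frac{1}{2\pi} f'(x)\overline{g(x)}\Big|_{-\pi}^{\pi} - \frac{1}{2\pi}\int_{-\pi}^{\pi} f'(x)\overline{g'(x)}\,dx,$$
and the boundary term vanishes by periodicity, so $(D^2 f, g) = -(f', g')$.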
If we can show that every element of $C([-\pi, \pi])$ is a sum of its Fourier series
(in the pre-Hilbert space sense), then the same will be true for $C(\mathbf T)$. So we will
work with $C([-\pi, \pi])$.
We can consider the operator $D^2 - 1$ as a linear map
$$D^2 - 1 : C^2([-\pi, \pi]) \to C([-\pi, \pi]).$$
This map is surjective, meaning that given any continuous function g we can
find a twice differentiable function f satisfying the differential equation
$$f'' - f = g.$$
In fact we can find a whole two-dimensional family of solutions, because we can
add any solution of the homogeneous equation
$$h'' - h = 0$$
to f and still obtain a solution. We could write down an explicit solution for the
equation $f'' - f = g$, but we will not need to. It is enough for us to know that
the solution exists, which follows from the general theory of ordinary differential
equations.
The general solution of the homogeneous equation is given by
$$h(x) = ae^x + be^{-x}.$$
Let
$$M \subset C^2([-\pi, \pi])$$
be the subspace consisting of those functions which satisfy the periodic boundary conditions
$$f(-\pi) = f(\pi), \qquad f'(-\pi) = f'(\pi).$$
Given any f we can always find a solution h of the homogeneous equation such
that $f - h \in M$. Indeed, we need to choose the complex numbers a and b such
that, if h is as given above, then
$$h(\pi) - h(-\pi) = f(\pi) - f(-\pi) \quad\text{and}\quad h'(\pi) - h'(-\pi) = f'(\pi) - f'(-\pi).$$
Collecting coefficients and denoting the right hand sides of these equations by c
and d, we get the linear equations
$$(e^{\pi} - e^{-\pi})(a - b) = c, \qquad (e^{\pi} - e^{-\pi})(a + b) = d,$$
which have a unique solution.
So there exists a unique operator
$$T : C([-\pi, \pi]) \to M$$
with the property that
$$(D^2 - I) \circ T = I.$$
(2.20)
Once we have proved this fact, we know that every element of M can
be expanded in terms of a series consisting of eigenvectors of T with non-zero
eigenvalues. But if
$$Tw = \lambda w$$
then
$$D^2 w = (D^2 - I)w + w = \frac1\lambda[(D^2 - I)\circ T]w + w = \left(\frac1\lambda + 1\right)w.$$
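As a consistency check, T can be computed on the exponentials; this worked example is added here and is not part of the argument above. The function $f = -e^{inx}/(1+n^2)$ lies in M and satisfies $f'' - f = e^{inx}$. Since the solution in M is unique, we get the following, in agreement with $D e^{inx} = in\,e^{inx}$:

```latex
\begin{aligned}
T e^{inx} &= -\frac{1}{1+n^2}\, e^{inx}, \qquad\text{so}\qquad \lambda = -\frac{1}{1+n^2}, \\
D^2 e^{inx} &= \left(\frac{1}{\lambda} + 1\right) e^{inx} = \bigl(-(1+n^2) + 1\bigr)\, e^{inx} = -n^2 e^{inx}.
\end{aligned}
```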
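2.4.3
Gårding's inequality, special case.
We now turn to the compactness. We have already verified that for any $f \in M$
we have
$$([D^2 - 1]f, f) = -(f', f') - (f, f).$$
Taking absolute values we get
$$\|f'\|^2 + \|f\|^2 \le |([D^2 - 1]f, f)|. \qquad (2.21)$$
(We actually get equality here; the more general version of this that we will
develop later will be an inequality.)
Let $u = [D^2 - 1]f$ and use the Cauchy-Schwarz inequality:
$$|([D^2 - 1]f, f)| = |(u, f)| \le \|u\|\,\|f\|.$$
Combined with (2.21) this gives $\|f\| \le \|u\|$, and then $\|f'\|^2 \le \|u\|\,\|f\| \le \|u\|^2$. So if $f = Tu$ with u in the unit sphere S, then
$$\|f\| \le 1 \quad\text{and}\quad \|f'\| \le 1. \qquad (2.22)$$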
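We wish to show that from any sequence of functions satisfying these two conditions we can extract a subsequence which converges. Here convergence means,
of course, with respect to the norm given by
$$\|f\|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} |f(x)|^2\,dx.$$
In fact, we will prove something stronger: that given any sequence of functions
satisfying (2.22) we can find a subsequence which converges in the uniform norm
$$\|f\|_\infty := \max_{x \in [-\pi, \pi]} |f(x)|.$$
Notice that
$$\|f\| = \left(\frac{1}{2\pi}\int_{-\pi}^{\pi} |f(x)|^2\,dx\right)^{1/2} \le \left(\frac{1}{2\pi}\int_{-\pi}^{\pi} (\|f\|_\infty)^2\,dx\right)^{1/2} = \|f\|_\infty,$$
so uniform convergence implies convergence in $\|\cdot\|$. Also, for $a, b \in [-\pi, \pi]$ the Cauchy-Schwarz inequality gives
$$|f(b) - f(a)| = \left|\int_a^b f'(x)\,dx\right| \le |b - a|^{1/2}\left(\int_{-\pi}^{\pi} |f'(x)|^2\,dx\right)^{1/2} = (2\pi)^{1/2}|b - a|^{1/2}\,\|\,|f'|\,\|$$
and
$$\|\,|f'|\,\| = \|f'\| \le 1.$$
We conclude that
$$|f(b) - f(a)| \le (2\pi)^{1/2}|b - a|^{1/2}. \qquad (2.23)$$
Also
$$\min|f| = \left(\frac{1}{2\pi}\int_{-\pi}^{\pi} (\min|f|)^2\,dx\right)^{1/2} \le \left(\frac{1}{2\pi}\int_{-\pi}^{\pi} |f|^2(x)\,dx\right)^{1/2} = \|f\| \le 1$$
and $|b - a| \le 2\pi$, so
$$\|f\|_\infty \le 1 + 2\pi.$$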
Thus the values of all the $f \in T[S]$ are uniformly bounded (they take values
in a disk of radius $1 + 2\pi$), and they are equicontinuous in that (2.23) holds.
This is enough to guarantee that out of every sequence of such f we can choose
a uniformly convergent subsequence.
(We recall how the proof of this goes. Since all the values of all the f are
bounded, at any point we can choose a subsequence so that the values of the
f at that point converge. By passing to a succession of subsequences (and
passing to a diagonal), we can arrange that this holds at any countable set of
points. In particular, we may choose, say, the rational points in $[-\pi, \pi]$. Suppose
that $f_n$ is this subsequence. We claim that (2.23) then implies that the $f_n$ form
a Cauchy sequence in the uniform norm, and hence converge in the uniform norm
to some continuous function. Indeed, for any ε choose δ such that
$$(2\pi)^{1/2}\delta^{1/2} < \frac{\epsilon}{3},$$
choose a finite number of rational points which are within distance δ of any
point of $[-\pi, \pi]$, and choose N sufficiently large that $|f_i - f_j| < \frac{\epsilon}{3}$ at each of
these points r when i and j are $\ge N$. Then at any $x \in [-\pi, \pi]$
$$|f_i(x) - f_j(x)| \le |f_i(x) - f_i(r)| + |f_j(x) - f_j(r)| + |f_i(r) - f_j(r)| < \epsilon,$$
since we can choose r such that the first two, and hence all three,
terms are $< \frac{\epsilon}{3}$.)
2.5
In this section we show how the arguments leading to the Cauchy-Schwarz inequality give one of the most important discoveries of twentieth century physics,
the Heisenberg uncertainty principle.
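$$\sum_{|\lambda_k - \langle\lambda\rangle| \ge r\Delta\lambda} p_k \le \sum_{|\lambda_k - \langle\lambda\rangle| \ge r\Delta\lambda} \frac{(\lambda_k - \langle\lambda\rangle)^2}{r^2(\Delta\lambda)^2}\, p_k \le \frac{1}{r^2(\Delta\lambda)^2}\sum_{\text{all } k} (\lambda_k - \langle\lambda\rangle)^2 p_k = \frac{1}{r^2}.$$
Replacing $\lambda_i$ by $\lambda_i + c$ does not change the variance.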
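The estimate above says that the total probability of $|\lambda_k - \langle\lambda\rangle| \ge r\,\Delta\lambda$ is at most $1/r^2$. It can be checked on a small discrete distribution; the eigenvalues and probabilities below are hypothetical sample data.

```python
import math

# Hypothetical eigenvalues lambda_k and probabilities p_k (summing to 1).
lam = [-2.0, -0.5, 0.0, 1.0, 3.0, 7.0]
p = [0.1, 0.2, 0.3, 0.2, 0.15, 0.05]


def mean_and_sd(values, probs):
    """Return <lambda> and Delta lambda = sqrt(variance)."""
    mean = sum(v * q for v, q in zip(values, probs))
    var = sum((v - mean) ** 2 * q for v, q in zip(values, probs))
    return mean, math.sqrt(var)


mean, sd = mean_and_sd(lam, p)
for r in (1.0, 1.5, 2.0, 3.0):
    tail = sum(q for v, q in zip(lam, p) if abs(v - mean) >= r * sd)
    print(r, tail, tail <= 1 / r ** 2)      # the last column is always True

# Shifting every lambda_k by a constant leaves Delta lambda unchanged.
_, sd_shift = mean_and_sd([v + 10.0 for v in lam], p)
print(abs(sd - sd_shift) < 1e-12)           # True
```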
Now suppose that A is a self-adjoint operator on V, that the $\lambda_i$ are the
eigenvalues of A with eigenvectors $\phi_i$ constituting an orthonormal basis, and
that the $p_i = |(\psi, \phi_i)|^2$ as above.
2.6
Recall that $\mathbf T$ now stands for the n-dimensional torus. Let $P = P(\mathbf T)$ denote
the space of trigonometric polynomials. These are functions on the torus of the
form
$$u(x) = \sum a_\ell e^{i\ell\cdot x}$$
where
$$\ell = (\ell_1, \dots, \ell_n)$$
is an n-tuple of integers and the sum is finite. For each integer t (positive, zero
or negative) we introduce the scalar product
$$(u, v)_t := \sum (1 + \ell\cdot\ell)^t a_\ell \overline{b_\ell}, \qquad (2.24)$$
where $v = \sum b_\ell e^{i\ell\cdot x}$. For $t = 0$ this is the scalar product
$$(u, v)_0 = \frac{1}{(2\pi)^n}\int_{\mathbf T} u(x)\overline{v(x)}\,dx.$$
This differs by a factor of $(2\pi)^n$ from the scalar product that is used by Bers
and Schecter. We will denote the norm corresponding to the scalar product
$(\cdot, \cdot)_s$ by $\|\cdot\|_s$.
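If
$$\Delta := -\left(\frac{\partial^2}{\partial(x^1)^2} + \cdots + \frac{\partial^2}{\partial(x^n)^2}\right),$$
the operator $(1 + \Delta)$ satisfies
$$(1 + \Delta)u = \sum (1 + \ell\cdot\ell)a_\ell e^{i\ell\cdot x},$$
and so
$$((1 + \Delta)^t u, v)_s = (u, (1 + \Delta)^t v)_s = (u, v)_{s+t}$$
and
$$\|(1 + \Delta)^t u\|_s = \|u\|_{s+2t}. \qquad (2.25)$$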
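For any s and t we have the generalized Cauchy-Schwarz inequality
$$|(u, v)_s| \le \|u\|_{s+t}\|v\|_{s-t}. \qquad (2.26)$$
Indeed,
$$(u, v)_s = \sum (1 + \ell\cdot\ell)^s a_\ell\overline{b_\ell} = \sum (1 + \ell\cdot\ell)^{\frac{s+t}{2}} a_\ell\,(1 + \ell\cdot\ell)^{\frac{s-t}{2}}\overline{b_\ell} = ((1 + \Delta)^{\frac{s+t}{2}} u, (1 + \Delta)^{\frac{s-t}{2}} v)_0,$$
so
$$|(u, v)_s| \le \|(1 + \Delta)^{\frac{s+t}{2}} u\|_0\,\|(1 + \Delta)^{\frac{s-t}{2}} v\|_0 = \|u\|_{s+t}\|v\|_{s-t}.$$
The generalized Cauchy-Schwarz inequality reduces to the usual Cauchy-Schwarz inequality when $t = 0$.
Clearly we have
$$\|u\|_s \le \|u\|_t \quad\text{if}\quad s \le t.$$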
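The norms $\|\cdot\|_t$ are easy to compute for trigonometric polynomials stored as dictionaries of coefficients. This Python sketch works on the 2-torus with hypothetical coefficients. It checks the monotonicity $\|u\|_s \le \|u\|_t$ for $s \le t$ and the generalized Cauchy-Schwarz inequality (2.26). Anticipating the delta function discussed below, it also checks that pairing u with the coefficients $b_\ell \equiv 1$ returns $u(0)$.

```python
import cmath
import math

# Trigonometric polynomials on the 2-torus, stored as {ell: a_ell} with ell a
# pair of integers. The coefficients below are hypothetical sample data.
u = {(0, 0): 1.0, (1, 0): 2 - 1j, (0, -3): 0.5j, (2, 2): -1.0}
v = {(0, 0): 0.5, (1, 0): 1j, (2, 2): 3.0, (-1, 4): 2.0}


def weight(ell):
    """1 + ell . ell"""
    return 1 + sum(k * k for k in ell)


def inner_t(u, v, t):
    """(u, v)_t = sum (1 + ell.ell)^t a_ell conj(b_ell), as in (2.24)."""
    return sum(weight(ell) ** t * a * complex(v.get(ell, 0)).conjugate()
               for ell, a in u.items())


def norm_t(u, t):
    return math.sqrt(inner_t(u, u, t).real)


def evaluate(u, x):
    """u(x) = sum a_ell e^{i ell . x}"""
    return sum(a * cmath.exp(1j * sum(l * xi for l, xi in zip(ell, x)))
               for ell, a in u.items())


rel = 1 + 1e-12   # relative slack for rounding
monotone = all(norm_t(u, s) <= rel * norm_t(u, t)
               for s in (-2, -0.5, 0, 1) for t in (-0.5, 0, 1, 2.5) if s <= t)
gen_cs = all(abs(inner_t(u, v, s)) <= rel * norm_t(u, s + t) * norm_t(v, s - t)
             for s in (-1, 0, 1.5) for t in (-2, -0.5, 0, 1, 3))
# Pairing with b_ell = 1 (only the ell in the support of u matter) gives u(0).
ones = {ell: 1.0 for ell in u}
delta_ok = abs(inner_t(u, ones, 0) - evaluate(u, (0.0, 0.0))) < 1e-12
print(monotone, gen_cs, delta_ok)   # True True True
```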
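If $D^p$ denotes a partial derivative,
$$D^p = \frac{\partial^{|p|}}{\partial(x^1)^{p_1}\cdots\partial(x^n)^{p_n}},$$
then
$$D^p u = \sum (i\ell)^p a_\ell e^{i\ell\cdot x}. \qquad (2.27)$$
In particular, $\|D^p u\|_0 \le \|u\|_t$ if $|p| \le t$, and similarly, for integer t,
$$\|u\|_t \le (\text{constant depending on } t)\sum_{|p| \le t} \|D^p u\|_0 \quad\text{if } t \ge 0. \qquad (2.28)$$
So the norms $\|u\|_t$ and
$$\sum_{|p| \le t} \|D^p u\|_0$$
are equivalent.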
We let $H_t$ denote the completion of the space P with respect to the norm
$\|\cdot\|_t$. Each $H_t$ is a Hilbert space, and we have natural embeddings
$$H_t \hookrightarrow H_s \quad\text{if}\quad s < t.$$
Equation (2.25) says that
$$(1 + \Delta)^t : H_{s+2t} \to H_s$$
and is an isometry.
From the generalized Cauchy-Schwarz inequality we also have a natural pairing of
$H_t$ with $H_{-t}$ given by the extension of $(\cdot, \cdot)_0$, so
$$|(u, v)_0| \le \|u\|_t\|v\|_{-t}. \qquad (2.29)$$
In fact, this pairing allows us to identify $H_{-t}$ with the space of continuous linear
functions on $H_t$. Indeed, if φ is a continuous linear function on $H_t$, the Riesz
representation theorem tells us that there is a $w \in H_t$ such that $\phi(u) = (u, w)_t$.
Set
$$v := (1 + \Delta)^t w.$$
Then
$$v \in H_{-t}$$
and
$$(u, v)_0 = (u, (1 + \Delta)^t w)_0 = (u, w)_t = \phi(u).$$
We record this fact as
$$H_{-t} = (H_t)^*. \qquad (2.30)$$
As an illustration of (2.30), observe that the series
$$\sum_\ell (1 + \ell\cdot\ell)^s$$
converges for
$$s < -\frac n2.$$
This means that if we define v by taking
$$b_\ell \equiv 1,$$
then $v \in H_s$ for $s < -\frac n2$. If u, given by $u(x) = \sum_\ell a_\ell e^{i\ell\cdot x}$, is any trigonometric
polynomial, then
$$(u, v)_0 = \sum a_\ell = u(0).$$
So the natural pairing (2.29) allows us to extend the linear function sending
$u \mapsto u(0)$ to all of $H_t$ if $t > \frac n2$. We can now give v its true name: it is the
Dirac delta function δ (on the torus), where
$$(u, \delta)_0 = u(0).$$
So $H_t$ for $t > \frac n2$, as
$$\text{for } |p| \le k. \qquad (2.31)$$
$$\langle T, \phi_k\rangle \to 0.$$
We then obtain
Theorem 2.6.1 [Laurent Schwartz.] $H_{-\infty}$ is the space of all distributions.
In other words, any distribution belongs to $H_t$ for some t.
Proof. Suppose that T is a distribution that does not belong to any $H_t$. This
means that for any $k > 0$ we can find a $C^\infty$ function $\phi_k$ with
$$\|\phi_k\|_k < \frac1k$$
and
$$|\langle T, \phi_k\rangle| \ge 1.$$
But by Lemma 2.6.1 we know that $\|\phi_k\|_k < \frac1k$ implies that $D^p\phi_k \to 0$ uniformly
for any fixed p, contradicting the continuity property of T. QED
Suppose that φ is a $C^\infty$ function on $\mathbf T$. Multiplication by φ is clearly a
bounded operator on $H_0 = L^2(\mathbf T)$, and so it is also a bounded operator on
$H_t$, $t > 0$, since we can expand $D^p(\phi u)$ by applications of Leibniz's rule.
For $t = -s < 0$ we know by the generalized Cauchy-Schwarz inequality that
$$\|\phi u\|_t = \sup \frac{|(v, \phi u)_0|}{\|v\|_s} = \sup \frac{|(\bar\phi v, u)_0|}{\|v\|_s} \le \sup \frac{\|u\|_t\,\|\bar\phi v\|_s}{\|v\|_s}.$$
So in all cases we have
$$\|\phi u\|_t \le (\text{const. depending on } \phi \text{ and } t)\,\|u\|_t. \qquad (2.32)$$
Let
$$L = \sum_{|p| \le m} \alpha_p(x) D^p$$
Proof. We must show that the image of the unit ball B of $H_t$ in $H_s$ can be
covered by finitely many balls of radius ε. Choose N so large that
$$(1 + \ell\cdot\ell)^{(s-t)/2} < \frac{\epsilon}{2}$$
2.7
Gårding's inequality.
(2.34)
for all $u \in H_{t_1}$. This elementary inequality will be the key to several arguments
in this section, where we will combine (2.34) with integration by parts.
A differential operator $L = \sum_{|p| \le m} \alpha_p(x) D^p$ with real coefficients and m
even is called elliptic if there is a constant $c > 0$ such that
$$(-1)^{m/2}\sum_{|p| = m} \alpha_p(x)\xi^p \ge c(\xi\cdot\xi)^{m/2}. \qquad (2.35)$$
In this inequality, the vector $\xi$ is a dummy variable. (Its true invariant significance is that it is a covector, i.e. an element of the cotangent space at $x$.) The expression on the left of this inequality is called the symbol of the operator $L$. It is a homogeneous polynomial of degree $m$ in the variable $\xi$ whose coefficients are functions of $x$. The symbol of $L$ is sometimes written as $\sigma(L)$ or $\sigma(L)(x, \xi)$.
Another way of expressing condition (2.35) is to say that there is a positive constant $c$ such that
$$\sigma(L)(x, \xi) \ge c \quad\text{for all } x \text{ and } \xi \text{ such that } \xi\cdot\xi = 1.$$
We will assume until further notice that the operator L is elliptic and that
m is a positive even integer.
Theorem 2.7.1 [Gårding's inequality.] For every $u \in C^\infty(\mathbb T)$ we have
$$(u, Lu)_0 \ge c_1\|u\|^2_{m/2} - c_2\|u\|^2_0 \qquad (2.36)$$
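Stage 1. $L = \sum_{|p|=m}\alpha_pD^p$ with constant coefficients $\alpha_p$. Writing $u = \sum_\ell a_\ell e^{i\ell\cdot x}$ we have
$$(u, Lu)_0 = \Big(\sum_\ell a_\ell e^{i\ell\cdot x}, \sum_\ell\sum_{|p|=m}\alpha_p(i\ell)^pa_\ell e^{i\ell\cdot x}\Big)_0 \ge c\sum_\ell(\ell\cdot\ell)^{m/2}|a_\ell|^2 \quad\text{by (2.35)}$$
$$= c\sum_\ell\big(1 + (\ell\cdot\ell)^{m/2}\big)|a_\ell|^2 - c\|u\|^2_0 \ge cC\|u\|^2_{m/2} - c\|u\|^2_0$$
where
$$C = \inf_{r\ge 0}\frac{1 + r^{m/2}}{(1 + r)^{m/2}} > 0.$$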
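The constant needed in the last inequality of Stage 1 is the infimum over $r \ge 0$ of $(1 + r^{m/2})/(1 + r)^{m/2}$; its supremum is $1$ (attained at $r = 0$) and would not give a lower bound. The ratio is unchanged under $r \mapsto 1/r$, and by the power mean inequality its minimum is attained at $r = 1$, where it equals $2^{1-m/2}$. As a minimal numerical sketch (the helper name `garding_constant` and the grid are our own choices), both extremes can be approximated on a grid:

```python
import numpy as np

def garding_constant(m, r_max=1e4):
    """Grid approximation of (inf, sup) over r >= 0 of (1 + r**(m/2)) / (1 + r)**(m/2)."""
    k = m / 2
    # dense linear grid near the origin (it contains r = 1), log-spaced tail out to r_max
    r = np.concatenate([np.linspace(0.0, 10.0, 200_001), np.geomspace(10.0, r_max, 2_000)])
    ratio = (1.0 + r ** k) / (1.0 + r) ** k
    return ratio.min(), ratio.max()

for m in (2, 4, 6, 8):
    c_inf, c_sup = garding_constant(m)
    print(f"m={m}: inf ~ {c_inf:.6f}, 2**(1 - m/2) = {2 ** (1 - m / 2):.6f}, sup ~ {c_sup:.6f}")
```

For $m = 2$ the ratio is identically $1$, so Stage 1 holds with $C = 1$ in that case.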
[…] $\sum_{|p|=m}\alpha_p(x)D^p$ and […] where $\epsilon$ is sufficiently small. (How small will be determined very soon in the course of the discussion.) We have
$$(u, L_0u)_0 \ge c_0\|u\|^2_{m/2} - c\|u\|^2_0$$
from stage 1.
We integrate $(u, L_1u)_0$ by parts $m/2$ times. There are no boundary terms since we are on the torus. In integrating by parts some of the derivatives will hit the coefficients. Let us collect all these terms as $I_2$. The other terms we collect as $I_1$, so
$$I_1 = \sum\int b_{p'+p''}\,D^{p'}u\,\overline{D^{p''}u}\,dx$$
where $|p'| = |p''| = m/2$ and $b_r = \pm\alpha_r$. We can estimate this sum by
$$|I_1| \le \epsilon\cdot\text{const.}\,\|u\|^2_{m/2}$$
and so will require that $\epsilon\cdot(\text{const.}) < c_0$.
The remaining terms give a sum of the form
$$I_2 = \sum\int b_{p'q}\,D^{p'}u\,\overline{D^{q}u}\,dx$$
where $|p'| \le m/2$, $|q| < m/2$, so we have
$$|I_2| \le \text{const.}\,\|u\|_{m/2}\,\|u\|_{m/2-1}.$$
Now let us take
$$s = \frac{m}{2} - 1, \qquad t_1 = \frac{m}{2}, \qquad t_2 = 0$$
in (2.34), which yields, for any $\eta > 0$, […]
[…] $\sum_{|p|\le m/2}\|D^pu\|_0$ […]
which is Gårding's inequality. QED
For the time being we will continue to study the case of the torus. But a look ahead is in order. In this last step of the argument, where we applied the partition of unity argument, we have really freed ourselves of the restriction of being on the torus. Once we make the appropriate definitions, we will then get Gårding's inequality for elliptic operators on manifolds. Furthermore, the consequences we are about to draw from Gårding's inequality will be equally valid in the more general setting.
2.8
Consequences of Gårding's inequality.
Proposition 2.8.1 For every integer $t$ there is a constant $c(t) = c(t, L)$ and a positive number $\Lambda = \Lambda(t, L)$ such that
$$\|u\|_t \le c(t)\,\|Lu + \lambda u\|_{t-m} \qquad (2.38)$$
when $\lambda > \Lambda$, for all smooth $u$, and hence for all $u \in H_t$.
Proof. […] $t = s + \frac{m}{2}$. […]
[…] and $(L + \lambda I)^{-1}$ […]
$$M := \iota_m\,(L + \lambda I)^{-1}.$$
2.9
Suppose for the rest of this section that $M$ is compact. Let $\{U_i\}$ be a finite cover of $M$ by coordinate neighborhoods over which $E$ has a given trivialization, and $\phi_i$ a partition of unity subordinate to this cover. Let $\psi_i$ be a diffeomorphism of $U_i$ with an open subset of $\mathbb T^n$ where $n$ is the dimension of $M$. Then if $s$ is a smooth section of $E$, we can think of $(\phi_is)\circ\psi_i^{-1}$ as an $\mathbb R^m$ or $\mathbb C^m$ valued function on $\mathbb T^n$, and consider the sum of the $\|\cdot\|_k$ norms applied to each component. We shall continue to denote this sum by $\|\phi_is\circ\psi_i^{-1}\|_k$ and then define
$$\|s\|_k := \sum_i\|\phi_is\circ\psi_i^{-1}\|_k$$
where the norms on the right are the norms on the torus. These norms depend on the trivializations and on the partitions of unity. But any two such norms are equivalent, and the $\|\cdot\|_0$ norm is equivalent to the intrinsic $L^2$ norm defined above. We define the Sobolev spaces $W_k$ to be the completion of the space of smooth sections of $E$ relative to the norm $\|\cdot\|_k$ for $k \ge 0$, and these spaces are well defined as topological vector spaces independently of the choices. Since Sobolev's lemma holds locally, it goes through unchanged. Similarly Rellich's lemma: if $s_n$ is a sequence of elements of $W_\ell$ which is bounded in the $\|\cdot\|_\ell$ norm for $\ell > k$, then each of the elements $\phi_is_n\circ\psi_i^{-1}$ belongs to $H_\ell$ on the torus, and these are bounded in the $\|\cdot\|_\ell$ norm; hence we can select a subsequence of $\phi_1s_n\circ\psi_1^{-1}$ which converges in $H_k$, then a subsubsequence such that $\phi_is_n\circ\psi_i^{-1}$ converges for $i = 1, 2$, etc., arriving at a subsequence of $s_n$ which converges in $W_k$.
A differential operator $L$ mapping sections of $E$ into sections of $E$ is an operator whose local expression (in terms of a trivialization and a coordinate chart) has the form
$$Ls = \sum_{|p|\le m}\alpha_p(x)D^ps.$$
Here the $\alpha_p$ are linear maps (or matrices if our trivializations are in terms of $\mathbb R^m$).
Under changes of coordinates and trivializations the changes in the coefficients are rather complicated, but the symbol of the differential operator
$$\sigma(L)(\xi) := \sum_{|p|=m}\alpha_p(x)\xi^p, \qquad \xi \in T^*M_x,$$
is well defined.
If we put a Riemann metric on the manifold, we can talk about the length $|\xi|$ of any cotangent vector.
If $L$ is a differential operator from $E$ to itself (i.e. $F = E$) we shall call $L$ even elliptic if $m$ is even and there exists some constant $C > 0$ such that
$$\langle v, \sigma(L)(\xi)v\rangle \ge C|\xi|^m|v|^2$$
for all $x \in M$, $v \in E_x$, $\xi \in T^*M_x$, where $\langle\ ,\ \rangle$ denotes the scalar product on $E_x$. Then Gårding's inequality holds. Indeed, locally, this is just a restatement of the vector valued version of Gårding's inequality that we have already proved for the torus. But Stage 4 in the proof extends unchanged (other than the replacement of scalar valued functions by vector valued functions) to the more general case.
2.10
We assume knowledge of the basic facts about differentiable manifolds, in particular the existence of an operator $d: \Omega^k \to \Omega^{k+1}$ with its usual properties, where $\Omega^k$ denotes the space of exterior $k$-forms. Also, if $M$ is orientable and carries a Riemann metric then the Riemann metric induces a scalar product on the exterior powers of $T^*M$ and also picks out a volume form. So there is an induced scalar product $(\ ,\ ) = (\ ,\ )_k$ on $\Omega^k$ and a formal adjoint $\delta$ of $d$,
$$\delta: \Omega^k \to \Omega^{k-1},$$
which satisfies
$$(d\omega, \tau) = (\omega, \delta\tau)$$
where $\tau$ is a $(k+1)$-form and $\omega$ is a $k$-form. Then
$$\Delta := d\delta + \delta d$$
is a second order differential operator on $\Omega^k$ and satisfies
$$(\Delta\omega, \omega) = \|d\omega\|^2 + \|\delta\omega\|^2$$
where $\|\omega\|^2 = (\omega, \omega)$ is the intrinsic $L^2$ norm (so $\|\cdot\| = \|\cdot\|_0$ in terms of the notation of the preceding section). Furthermore, if
$$\omega = \sum_I\omega_I\,dx^I, \qquad I = (i_1, \dots, i_k),$$
then
$$\Delta\omega = -\sum_I\sum_{i,j}g^{ij}\frac{\partial^2\omega_I}{\partial x^i\,\partial x^j}\,dx^I + \cdots$$
where
$$g^{ij} = \langle dx^i, dx^j\rangle$$
and the $\cdots$ are lower order derivatives. In particular $\Delta$ is elliptic.
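Let $\omega \in \Omega^k$ and suppose that
$$d\omega = 0.$$
Let $C(\omega)$, the cohomology class of $\omega$, be the set of all $\tau \in \Omega^k$ which satisfy
$$\tau = \omega + d\sigma, \qquad \sigma \in \Omega^{k-1},$$
and let
$$\overline{C(\omega)}$$
denote the closure of $C(\omega)$ in the $L^2$ norm. It is a closed affine subspace of the Hilbert space obtained by completing $\Omega^k$ relative to its $L^2$ norm. Let us denote this space by $L^k_2$, so $\overline{C(\omega)}$ is a closed affine subspace of $L^k_2$.
Proposition 2.10.1 If $\omega \in \Omega^k$ and $d\omega = 0$, there exists a unique $\tau \in \overline{C(\omega)}$ such that
$$\|\tau\| \le \|\sigma\| \quad \forall\,\sigma \in \overline{C(\omega)}.$$
Furthermore, $\tau$ is smooth, and
$$d\tau = 0 \quad\text{and}\quad \delta\tau = 0.$$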
[…] $(\tau, d\sigma)$ […] $\epsilon > 0$ […]
so
$$|(\tau, d\sigma)| \le \epsilon\,\|d\sigma\|^2.$$
As $\epsilon$ is arbitrary, this implies that $(\tau, d\sigma) = 0$.
So $(\tau, \Delta\phi) = (\tau, [d\delta + \delta d]\phi) = 0$ for any $\phi \in \Omega^k$. Hence $\tau$ is a weak solution of $\Delta\tau = 0$ and so is smooth. The space $\mathcal H^k$ of weak, and hence smooth, solutions of $\Delta\tau = 0$ is finite dimensional by the general theory. It is called the space of harmonic forms. We have seen that there is a unique harmonic form in the cohomology class of any closed form, so the cohomology groups are finite dimensional. In fact, the general theory tells us that
$$L^k_2 = \bigoplus_\lambda E^k_\lambda, \qquad E^k_0 = \mathcal H^k, \qquad […]$$
which are the fundamental assertions of Hodge theory, together with the assertion proved above that $H\omega$ is the unique minimizing element in its cohomology class.
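We have seen that
$$d + \delta: \bigoplus_kE^{2k}_\lambda \to \bigoplus_kE^{2k+1}_\lambda \quad\text{is an isomorphism for } \lambda \neq 0, \qquad (2.39)$$
and hence
$$\sum_k(-1)^k\dim E^k_\lambda = 0 \quad\text{for } \lambda \neq 0.$$
This shows that the index of the operator $d + \delta$ acting on $\bigoplus_kL^{2k}_2$ is the Euler characteristic of the manifold. (The index of any operator is the difference between the dimensions of the kernel and cokernel.)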
Let $P_{k,\lambda}$ denote the projection of $L^k_2$ onto $E^k_\lambda$. So
$$e^{-t\Delta} = \sum_\lambda e^{-\lambda t}P_{k,\lambda}$$
is the solution of the heat equation on $L^k_2$. As $t \to \infty$ this approaches the operator $H$ projecting $L^k_2$ onto $\mathcal H^k$. Letting $\Delta_k$ denote the operator $\Delta$ on $L^k_2$ we see that
$$\operatorname{tr}e^{-t\Delta_k} = \sum e^{-t\lambda_k}$$
where the sum is over all eigenvalues $\lambda_k$ of $\Delta_k$ counted with multiplicity. It follows from (2.39) that the alternating sum over $k$ of the corresponding sum over non-zero eigenvalues vanishes. Hence
$$\sum_k(-1)^k\operatorname{tr}e^{-t\Delta_k} = \chi(M)$$
is independent of $t$. The index theorem computes this trace for small values of $t$ in terms of local geometric invariants.
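The $t$-independence of the alternating trace can be seen in a finite-dimensional toy model. The sketch below is an analogy of our own choosing, not the operator on $M$: it replaces $\Omega^k$ by cochains on a small simplicial complex and $\Delta_k$ by the combinatorial Hodge Laplacian $d_k^Td_k + d_{k-1}d_{k-1}^T$ (the function name `supertrace_heat` is hypothetical). As in (2.39), $d$ pairs the non-zero eigenvalues across adjacent degrees, so only the kernels survive the alternating sum, which therefore equals the Euler characteristic for every $t$.

```python
import numpy as np

def supertrace_heat(coboundaries, dims, t):
    """Return sum_k (-1)**k * tr exp(-t * Delta_k) for a finite cochain complex.

    coboundaries[k] is the matrix of d_k: C^k -> C^{k+1} (shape dims[k+1] x dims[k]),
    and Delta_k = d_k^T d_k + d_{k-1} d_{k-1}^T is the combinatorial Hodge Laplacian.
    """
    total = 0.0
    for k, n in enumerate(dims):
        lap = np.zeros((n, n))
        if k < len(coboundaries):              # the term d_k^T d_k
            d = coboundaries[k]
            lap += d.T @ d
        if k > 0:                              # the term d_{k-1} d_{k-1}^T
            d_prev = coboundaries[k - 1]
            lap += d_prev @ d_prev.T
        eigenvalues = np.linalg.eigvalsh(lap)  # tr exp(-t Delta) = sum of exp(-t lambda)
        total += (-1) ** k * np.exp(-t * eigenvalues).sum()
    return total

# Triangle with vertices 0, 1, 2 and oriented edges [0,1], [1,2], [0,2]:
# (d0 f)(edge) = f(head) - f(tail).
d0 = np.array([[-1.0, 1.0, 0.0],
               [0.0, -1.0, 1.0],
               [-1.0, 0.0, 1.0]])
# Filled face [0,1,2]: (d1 g)(face) = g[0,1] + g[1,2] - g[0,2]; one checks d1 @ d0 = 0.
d1 = np.array([[1.0, 1.0, -1.0]])

for t in (0.01, 0.5, 2.0, 10.0):
    hollow = supertrace_heat([d0], [3, 3], t)          # a circle: chi = 3 - 3 = 0
    filled = supertrace_heat([d0, d1], [3, 3, 1], t)   # a disk: chi = 3 - 3 + 1 = 1
    print(t, round(hollow, 10), round(filled, 10))
```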
The operator $d + \delta$ is an example of a Dirac operator whose general definition we will not give here. The corresponding assertion and local evaluation are the content of the celebrated Atiyah-Singer index theorem, one of the most important theorems discovered in the twentieth century.
2.11
The resolvent.
In order to connect what we have done here with notation that will come later, it is convenient to let $A = -L$ so that now the operator
$$(zI - A)^{-1}$$
is compact as an operator on $H_0$ for $z$ sufficiently large. (I have dropped the $\iota_m$ which should come in front of this expression.) The operator $A$ now has only […]
[…] where $\phi_n$ is an eigenvector of $A$ with eigenvalue $\lambda_n$ and the $\phi_n$ form an orthonormal basis of $H_0$. Then
$$(zI - A)^{-1}u = \sum_n\frac{1}{z - \lambda_n}a_n\phi_n.$$
The operator $(zI - A)^{-1}$ is called the resolvent of $A$ at the point $z$ and denoted by
$$R(z, A)$$
or simply by $R(z)$ if $A$ is fixed. So
$$R(z, A) := (zI - A)^{-1}$$
for those values of $z \in \mathbb C$ for which the right hand side is defined.
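A finite-dimensional sketch of the eigenvector expansion of $R(z, A)$: a random symmetric matrix with eigenvalues $\le -1$ stands in for $A = -L$ (an assumption made only for illustration), and applying $R(z, A)$ through the eigenbasis is compared with solving $(zI - A)x = u$ directly. The names `resolvent_apply`, `A` and `u` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = -(B @ B.T) - np.eye(5)        # symmetric with eigenvalues <= -1 (a stand-in for A = -L)
lam, phi = np.linalg.eigh(A)      # eigenvalues lam[n], orthonormal eigenvectors phi[:, n]

def resolvent_apply(z, u):
    """R(z, A) u = sum_n a_n / (z - lam_n) phi_n, where a_n = (u, phi_n)."""
    a = phi.T @ u
    return phi @ (a / (z - lam))

u = rng.standard_normal(5)
for z in (0.5, 2.0 + 1.0j, -0.25 - 3.0j):
    direct = np.linalg.solve(z * np.eye(5) - A, u)
    print(z, np.allclose(direct, resolvent_apply(z, u)))
```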
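If $z$ and $\alpha$ are complex numbers with $\operatorname{Re}z > \operatorname{Re}\alpha$, then the integral
$$\int_0^\infty e^{-zt}e^{\alpha t}\,dt$$
converges and equals $(z - \alpha)^{-1}$. Hence, when $\operatorname{Re}z$ exceeds every eigenvalue of $A$,
$$R(z, A) = \int_0^\infty e^{-zt}e^{tA}\,dt,$$
where we may interpret this equation as a shorthand for doing the integral for the coefficient of each eigenvector, as above, or as an actual operator valued integral. We will spend a lot of time later on in this course generalizing this formula and deriving many consequences from it.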
Chapter 3
The space $\mathcal S$ consists of all functions on $\mathbb R$ which are infinitely differentiable and vanish at infinity rapidly with all their derivatives in the sense that
$$\|f\|_{m,n} := \sup_x\{|x^mf^{(n)}(x)|\} < \infty.$$
The $\|\cdot\|_{m,n}$ give a family of semi-norms on $\mathcal S$ making $\mathcal S$ into a Fréchet space, that is, a complete vector space whose topology is determined by a countable family of semi-norms. More about this later in the course. We use the measure
$$\frac{1}{\sqrt{2\pi}}\,dx$$
on $\mathbb R$ and so define the Fourier transform of an element of $\mathcal S$ by
$$\hat f(\xi) := \frac{1}{\sqrt{2\pi}}\int_{\mathbb R}f(x)e^{-ix\xi}\,dx$$
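[…]
$$(f\star g)(x) := \frac{1}{\sqrt{2\pi}}\int_{\mathbb R}f(x - t)g(t)\,dt.$$
3.2
[…]
$$\widehat{f\star g}(\xi) = \frac{1}{2\pi}\int\!\!\int f(x - t)g(t)\,dt\,e^{-ix\xi}\,dx = \frac{1}{2\pi}\int\!\!\int f(u)g(t)e^{-i(u+t)\xi}\,du\,dt = \left(\frac{1}{\sqrt{2\pi}}\int_{\mathbb R}f(u)e^{-iu\xi}\,du\right)\left(\frac{1}{\sqrt{2\pi}}\int_{\mathbb R}g(t)e^{-it\xi}\,dt\right)$$
so
$$\widehat{f\star g} = \hat f\,\hat g.$$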
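The identity $\widehat{f\star g} = \hat f\,\hat g$ can be checked numerically by replacing every integral with a Riemann sum on a truncated grid; for smooth, rapidly decaying functions such as Gaussians this is very accurate. The grid, the two test functions and the helper names `fourier` and `convolve` are assumptions of this sketch; both the transform and the convolution carry the factor $1/\sqrt{2\pi}$, matching the measure used above.

```python
import numpy as np

x = np.linspace(-20.0, 20.0, 4001)      # truncated quadrature grid (an assumption)
dx = x[1] - x[0]

def fourier(values, xi):
    """hat f(xi) = (2 pi)**(-1/2) * integral f(x) exp(-i x xi) dx, as a Riemann sum."""
    return (np.exp(-1j * np.outer(xi, x)) @ values) * dx / np.sqrt(2 * np.pi)

def convolve(f, g, points):
    """(f * g)(p) = (2 pi)**(-1/2) * integral f(p - t) g(t) dt, as a Riemann sum."""
    return np.array([np.sum(f(p - x) * g(x)) for p in points]) * dx / np.sqrt(2 * np.pi)

f = lambda s: np.exp(-s ** 2 / 2)
g = lambda s: np.exp(-(s - 1.0) ** 2)    # a shifted, narrower Gaussian

xi = np.linspace(-3.0, 3.0, 13)
lhs = fourier(convolve(f, g, x), xi)     # transform of the convolution
rhs = fourier(f(x), xi) * fourier(g(x), xi)
print(np.max(np.abs(lhs - rhs)))
```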
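3.3
Scaling.
For any $f \in \mathcal S$ and $a > 0$ define $S_af$ by $(S_af)(x) := f(ax)$. Then setting $u = ax$ so $dx = (1/a)\,du$ we have
$$\widehat{S_af}(\xi) = \frac{1}{\sqrt{2\pi}}\int_{\mathbb R}f(ax)e^{-ix\xi}\,dx = \frac{1}{\sqrt{2\pi}}\int_{\mathbb R}(1/a)f(u)e^{-iu(\xi/a)}\,du$$
so
$$\widehat{S_af} = (1/a)S_{1/a}\hat f.$$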
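3.4
[…]
$$\frac{1}{\sqrt{2\pi}}\int_{\mathbb R}e^{-x^2/2}\,dx = 1.$$
The integral
$$\frac{1}{\sqrt{2\pi}}\int_{\mathbb R}e^{-x^2/2 - x\eta}\,dx$$
converges for all complex values of $\eta$, uniformly in any compact region. Hence it defines an analytic function of $\eta$ that can be evaluated by taking $\eta$ to be real and then using analytic continuation. For real $\eta$ we complete the square and make a change of variables:
$$\frac{1}{\sqrt{2\pi}}\int_{\mathbb R}e^{-x^2/2 - x\eta}\,dx = \frac{1}{\sqrt{2\pi}}\int_{\mathbb R}e^{-(x+\eta)^2/2 + \eta^2/2}\,dx = e^{\eta^2/2}\,\frac{1}{\sqrt{2\pi}}\int_{\mathbb R}e^{-(x+\eta)^2/2}\,dx = e^{\eta^2/2}.$$
Setting $\eta = i\xi$ gives
$$\hat n = n \quad\text{if}\quad n(x) := e^{-x^2/2}.$$
If we set $\psi_\epsilon(x) := e^{-\epsilon^2x^2/2}$ then
$$\hat\psi_\epsilon(x) = \frac{1}{\epsilon}e^{-x^2/2\epsilon^2}.$$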
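A numerical check of $\hat n = n$ for $n(x) = e^{-x^2/2}$, and of $\hat\psi_\epsilon(x) = \frac{1}{\epsilon}e^{-x^2/2\epsilon^2}$ for $\psi_\epsilon(x) = e^{-\epsilon^2x^2/2}$ (which is also the scaling rule applied to $n$, since $\psi_\epsilon = S_\epsilon n$). Integrals are again Riemann sums on a truncated grid; the grid and the value $\epsilon = 0.5$ are arbitrary choices for this sketch.

```python
import numpy as np

x = np.linspace(-40.0, 40.0, 16001)     # truncated grid (an assumption of the sketch)
dx = x[1] - x[0]

def fourier(values, xi):
    """hat f(xi) = (2 pi)**(-1/2) * integral f(x) exp(-i x xi) dx, as a Riemann sum."""
    return (np.exp(-1j * np.outer(xi, x)) @ values) * dx / np.sqrt(2 * np.pi)

xi = np.linspace(-5.0, 5.0, 21)

# hat n = n for n(x) = exp(-x**2 / 2)
err_n = np.max(np.abs(fourier(np.exp(-x ** 2 / 2), xi) - np.exp(-xi ** 2 / 2)))

# psi_eps(x) = exp(-eps**2 x**2 / 2) has transform (1/eps) exp(-xi**2 / (2 eps**2))
eps = 0.5
psi_hat = fourier(np.exp(-eps ** 2 * x ** 2 / 2), xi)
err_psi = np.max(np.abs(psi_hat - np.exp(-xi ** 2 / (2 * eps ** 2)) / eps))
print(err_n, err_psi)
```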
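[…]
$$\int_{\mathbb R}(1/a)(S_{1/a}g)(\xi)\,d\xi = \int_{\mathbb R}g(\xi)\,d\xi$$
[…]
$$\frac{1}{\sqrt{2\pi}}\int_{\mathbb R}\hat\psi_\epsilon(\xi)\,d\xi = 1$$
for all $\epsilon$.
Let
$$\phi := \hat\psi_1 := (\psi_1)\hat{\ }$$
and
$$\phi_\epsilon := (\psi_\epsilon)\hat{\ }.$$
Then
$$\phi_\epsilon(\eta) = \frac{1}{\epsilon}\phi\left(\frac{\eta}{\epsilon}\right)$$
so
$$(\phi_\epsilon\star g)(\xi) - g(\xi) = \frac{1}{\sqrt{2\pi}}\int_{\mathbb R}[g(\xi - \eta) - g(\xi)]\frac{1}{\epsilon}\phi\left(\frac{\eta}{\epsilon}\right)d\eta = \frac{1}{\sqrt{2\pi}}\int_{\mathbb R}[g(\xi - \epsilon\zeta) - g(\xi)]\phi(\zeta)\,d\zeta.$$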
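3.5
[…]
$$\int_{\mathbb R}\hat f(x)g(x)\,dx = \int_{\mathbb R}f(x)\hat g(x)\,dx$$
[…]
$$\frac{1}{\sqrt{2\pi}}\int_{\mathbb R}\int_{\mathbb R}f(y)e^{-ixy}\,dy\,g(x)\,dx.$$
We can write this integral as a double integral and then interchange the order of integration, which gives the right hand side.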
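3.6
[…]
$$f(x) = \frac{1}{\sqrt{2\pi}}\int_{\mathbb R}\hat f(\xi)e^{ix\xi}\,d\xi.$$
To prove this, we first observe that for any $h \in \mathcal S$ the Fourier transform of $x \mapsto e^{i\eta x}h(x)$ is just $\xi \mapsto \hat h(\xi - \eta)$, as follows directly from the definition. Taking $g(t) = e^{ixt}e^{-\epsilon^2t^2/2}$ in the multiplication formula gives
$$\frac{1}{\sqrt{2\pi}}\int_{\mathbb R}\hat f(t)e^{itx}e^{-\epsilon^2t^2/2}\,dt = \frac{1}{\sqrt{2\pi}}\int_{\mathbb R}f(t)\,\phi_\epsilon(t - x)\,dt = (f\star\phi_\epsilon)(x).$$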
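3.7
Plancherel's theorem
Let
$$\tilde f(x) := \overline{f(-x)}.$$
Then the Fourier transform of $\tilde f$ is given by
$$\frac{1}{\sqrt{2\pi}}\int_{\mathbb R}\overline{f(-x)}e^{-ix\xi}\,dx = \frac{1}{\sqrt{2\pi}}\int_{\mathbb R}\overline{f(u)e^{-iu\xi}}\,du = \overline{\hat f(\xi)}$$
so
$$\widehat{\tilde f} = \overline{\hat f}.$$
Thus
$$\widehat{f\star\tilde f} = |\hat f|^2.$$
[…]
$$\frac{1}{\sqrt{2\pi}}\int_{\mathbb R}f(x)\tilde f(0 - x)\,dx = \frac{1}{\sqrt{2\pi}}\int_{\mathbb R}|f(x)|^2\,dx.$$
Thus we have proved Plancherel's formula
$$\frac{1}{\sqrt{2\pi}}\int_{\mathbb R}|f(x)|^2\,dx = \frac{1}{\sqrt{2\pi}}\int_{\mathbb R}|\hat f(x)|^2\,dx.$$
Define $L^2(\mathbb R)$ to be the completion of $\mathcal S$ with respect to the $L^2$ norm given by the left hand side of the above equation. Since $\mathcal S$ is dense in $L^2(\mathbb R)$ we conclude that the Fourier transform extends to a unitary isomorphism of $L^2(\mathbb R)$ onto itself.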
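Plancherel's formula can be checked numerically for a complex-valued test function of our own choosing, computing $\hat f$ by a Riemann sum on the same truncated grid that is used for the frequency variable. The grid size is kept modest because the sketch forms a dense $1601 \times 1601$ matrix of exponentials; this is an illustration, not an efficient transform.

```python
import numpy as np

x = np.linspace(-16.0, 16.0, 1601)      # truncated grid, also used for the xi variable
dx = x[1] - x[0]
f = x * np.exp(-x ** 2 / 2) + 0.5j * np.exp(-(x - 2.0) ** 2)   # a complex test function

# hat f(xi) on the same grid, with the (2 pi)**(-1/2) normalization
f_hat = (np.exp(-1j * np.outer(x, x)) @ f) * dx / np.sqrt(2 * np.pi)

lhs = np.sum(np.abs(f) ** 2) * dx
rhs = np.sum(np.abs(f_hat) ** 2) * dx
print(lhs, rhs)
```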
3.8
[…] $g(x + 2\pi k)$ […]
where
$$a_m = \frac{1}{2\pi}\int_0^{2\pi}h(x)e^{-imx}\,dx = \frac{1}{2\pi}\int_{\mathbb R}g(x)e^{-imx}\,dx = \frac{1}{\sqrt{2\pi}}\hat g(m).$$
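Assuming, as the surviving fragment suggests, that $h(x) = \sum_kg(x + 2\pi k)$ with Fourier coefficients $a_m$ as above, evaluating its Fourier series at $x = 0$ gives the Poisson summation formula $\sum_kg(2\pi k) = \frac{1}{\sqrt{2\pi}}\sum_m\hat g(m)$ (the series converges there for the Gaussians used here). For $g(x) = e^{-x^2/2\sigma^2}$ the scaling rule and $\hat n = n$ give $\hat g(\xi) = \sigma e^{-\sigma^2\xi^2/2}$, so both sides can be compared in pure Python; the truncation limits `K` and `M` are assumptions of the sketch.

```python
import math

def lattice_side(sigma, K=60):
    """sum_k g(2 pi k) for g(x) = exp(-x**2 / (2 sigma**2)), truncated to |k| <= K."""
    return sum(math.exp(-(2 * math.pi * k) ** 2 / (2 * sigma ** 2)) for k in range(-K, K + 1))

def fourier_side(sigma, M=400):
    """(2 pi)**(-1/2) * sum_m hat g(m), using hat g(xi) = sigma * exp(-sigma**2 xi**2 / 2)."""
    total = sum(sigma * math.exp(-(sigma * m) ** 2 / 2) for m in range(-M, M + 1))
    return total / math.sqrt(2 * math.pi)

for sigma in (0.5, 1.0, 3.0, 10.0):
    print(sigma, lattice_side(sigma), fourier_side(sigma))
```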
3.9
[…]
$$f(t) = \frac{1}{\pi}\sum_{n=-\infty}^{\infty}f(n)\frac{\sin\pi(n - t)}{n - t}. \qquad (3.1)$$
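Proof. Let $g$ be the periodic function (of period $2\pi$) which extends $\hat f$, the Fourier transform of $f$. So
$$g(\zeta) = \hat f(\zeta), \qquad \zeta \in [-\pi, \pi],$$
and $g$ is periodic.
Expand $g$ into a Fourier series:
$$g = \sum_{n\in\mathbb Z}c_ne^{in\zeta},$$
where
$$c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi}g(\zeta)e^{-in\zeta}\,d\zeta = \frac{1}{(2\pi)^{1/2}}\cdot\frac{1}{(2\pi)^{1/2}}\int_{-\pi}^{\pi}\hat f(\zeta)e^{-in\zeta}\,d\zeta,$$
or, by the inversion formula (since $\hat f$ vanishes outside $[-\pi, \pi]$),
$$c_n = \frac{1}{(2\pi)^{1/2}}f(-n).$$
But
$$f(t) = \frac{1}{(2\pi)^{1/2}}\int_{-\pi}^{\pi}\hat f(\zeta)e^{it\zeta}\,d\zeta = \frac{1}{(2\pi)^{1/2}}\int_{-\pi}^{\pi}g(\zeta)e^{it\zeta}\,d\zeta = \frac{1}{2\pi}\int_{-\pi}^{\pi}\sum_nf(-n)e^{i(n+t)\zeta}\,d\zeta = \frac{1}{2\pi}\sum_nf(n)\int_{-\pi}^{\pi}e^{i(t-n)\zeta}\,d\zeta,$$
where in the last step we replaced $n$ by $-n$. But
$$\int_{-\pi}^{\pi}e^{i(t-n)\zeta}\,d\zeta = \left.\frac{e^{i(t-n)\zeta}}{i(t - n)}\right|_{-\pi}^{\pi} = \frac{e^{i\pi(t-n)} - e^{-i\pi(t-n)}}{i(t - n)} = 2\frac{\sin\pi(n - t)}{n - t}.$$
QED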
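[…] $\frac{1}{2c}$. (3.2)
[…]
$$f(ax) = \frac{1}{\pi}\sum_nf(na)\frac{\sin\pi(x - n)}{x - n},$$
or setting $t = ax$,
$$f(t) = \sum_{n=-\infty}^{\infty}f(na)\frac{\sin\left(\frac{\pi}{a}(t - na)\right)}{\frac{\pi}{a}(t - na)}. \qquad (3.3)$$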
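A numerical sketch of (3.1). The test function $f(t) = \left(\frac{\sin(\pi t/2)}{\pi t/2}\right)^2$ is our own choice: its Fourier transform is a triangle supported in $[-\pi, \pi]$, so it satisfies the band-limiting hypothesis. The series is truncated at $|n| \le N$ (an assumption; the neglected tail is of order $N^{-2}$ here). NumPy's `np.sinc(u)` is the normalized $\sin(\pi u)/(\pi u)$, so the $n$-th term of (3.1) is `f(n) * np.sinc(t - n)`.

```python
import numpy as np

def f(t):
    # np.sinc(u) = sin(pi u) / (pi u), so f(t) = (sin(pi t / 2) / (pi t / 2))**2
    return np.sinc(np.asarray(t) / 2) ** 2

def shannon_reconstruction(t, N=20_000):
    """Truncated right hand side of (3.1): sum over |n| <= N of f(n) sin(pi(n - t)) / (pi(n - t))."""
    n = np.arange(-N, N + 1)
    return np.sum(f(n) * np.sinc(t - n))

for t in (0.3, 1.7, -4.25, 12.5):
    print(t, f(t), shannon_reconstruction(t))
```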
3.10
[…]
$$\int_{\mathbb R}|f(x)|^2\,dx = 1.$$
Suppose for the moment that these means both vanish. The Heisenberg Uncertainty Principle says that
$$\left(\int|xf(x)|^2\,dx\right)\left(\int|\xi\hat f(\xi)|^2\,d\xi\right) \ge \frac{1}{4}.$$
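Proof. Write $i\xi\hat f(\xi)$ as the Fourier transform of $f'$ and use Plancherel to write the second integral as $\int|f'(x)|^2\,dx$. Then the Cauchy-Schwarz inequality says that the left hand side is at least the square of
$$\int|xf(x)f'(x)|\,dx \ge \left|\int\operatorname{Re}\left(x\overline{f(x)}f'(x)\right)dx\right| = \left|\frac{1}{2}\int x\left(\overline{f(x)}f'(x) + f(x)\overline{f'(x)}\right)dx\right| = \left|\frac{1}{2}\int x\frac{d}{dx}|f|^2\,dx\right| = \left|-\frac{1}{2}\int|f|^2\,dx\right| = \frac{1}{2},$$
where we integrated by parts in the next to last step. QED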
If $f$ has norm one but the mean of the probability density $|f|^2$ is not necessarily zero (and similarly for its Fourier transform) the Heisenberg uncertainty principle says that
$$\left(\int|(x - x_m)f(x)|^2\,dx\right)\left(\int|(\xi - \xi_m)\hat f(\xi)|^2\,d\xi\right) \ge \frac{1}{4},$$
where $x_m$ and $\xi_m$ denote the two means. The general case is reduced to the special case by replacing $f(x)$ by
$$f(x + x_m)e^{-i\xi_mx}.$$
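A numerical sketch of the inequality in the centered case, using Plancherel exactly as in the proof to replace $\int|\xi\hat f(\xi)|^2\,d\xi$ by $\int|f'(x)|^2\,dx$. The derivative is approximated with `np.gradient` (central differences) on a truncated grid, an assumption that limits the accuracy to roughly $10^{-6}$. Both test functions are even, so both means vanish; the Gaussian gives equality, while $e^{-x^4}$ gives a strictly larger product (about $0.34$).

```python
import numpy as np

x = np.linspace(-15.0, 15.0, 30001)     # truncated grid (an assumption of the sketch)
dx = x[1] - x[0]

def uncertainty_product(f):
    """(int |x f|^2 dx) * (int |f'|^2 dx) after normalizing so that int |f|^2 dx = 1.

    By Plancherel, int |f'(x)|^2 dx equals int |xi hat f(xi)|^2 dxi."""
    f = f / np.sqrt(np.sum(np.abs(f) ** 2) * dx)
    f_prime = np.gradient(f, dx)                 # central differences
    spread_x = np.sum(np.abs(x * f) ** 2) * dx
    spread_xi = np.sum(np.abs(f_prime) ** 2) * dx
    return spread_x * spread_xi

gaussian = uncertainty_product(np.exp(-x ** 2 / 2))   # the equality case
quartic = uncertainty_product(np.exp(-x ** 4))
print(gaussian, quartic)
```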
3.11
Tempered distributions.
The space $\mathcal S$ was defined to be the collection of all smooth functions on $\mathbb R$ such that, for all $m$ and $n$,
$$\|f\|_{m,n} := \sup_x\{|x^mf^{(n)}(x)|\} < \infty.$$
The collection of these norms defines a topology on $\mathcal S$ which is much finer than the $L^2$ topology: We declare that a sequence of functions $\{f_k\}$ approaches $g \in \mathcal S$ if and only if
$$\|f_k - g\|_{m,n} \to 0$$
for every $m$ and $n$.
A linear function on $\mathcal S$ which is continuous with respect to this topology is called a tempered distribution.
The space of tempered distributions is denoted by $\mathcal S'$. For example, every element $f \in \mathcal S$ defines a linear function on $\mathcal S$ by
$$\phi \mapsto \langle\phi, f\rangle = \frac{1}{\sqrt{2\pi}}\int_{\mathbb R}\phi(x)\overline{f(x)}\,dx.$$
But this last expression makes sense for any element $f \in L^2(\mathbb R)$, or for any piecewise continuous function $f$ which grows at infinity no faster than some polynomial. For example, if $f \equiv 1$, the linear function associated to $f$ assigns to $\phi$ the value
$$\frac{1}{\sqrt{2\pi}}\int_{\mathbb R}\phi(x)\,dx.$$
This is clearly continuous with respect to the topology of $\mathcal S$ but this function of $\phi$ does not make sense for a general element $\phi$ of $L^2(\mathbb R)$.
Another example of an element of $\mathcal S'$ is the Dirac $\delta$-function which assigns to $\phi \in \mathcal S$ its value at $0$. This is an element of $\mathcal S'$ but makes no sense when evaluated on a general element of $L^2(\mathbb R)$.
If $f \in \mathcal S$, then the Plancherel formula implies that its Fourier transform $\mathcal F(f) = \hat f$ satisfies
$$(\phi, f) = (\mathcal F(\phi), \mathcal F(f)).$$
But we can now use this equation to define the Fourier transform of an arbitrary element of $\mathcal S'$: If $\ell \in \mathcal S'$ we define $\mathcal F(\ell)$ to be the linear function
$$\mathcal F(\ell)(\psi) := \ell(\mathcal F^{-1}(\psi)).$$
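3.11.1
[…]
$$\mathcal F(\delta)(\phi) = \delta(\mathcal F^{-1}(\phi)) = (\mathcal F^{-1}\phi)(0) = \frac{1}{\sqrt{2\pi}}\int_{\mathbb R}\phi(x)\,dx.$$
Comparing with the example above, $\mathcal F(\delta)$ is the tempered distribution associated to the constant function $1$.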
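[…]
$$\mathcal F(x)(\phi) = \frac{1}{2\pi}\int\!\!\int\phi(\xi)e^{ix\xi}x\,d\xi\,dx = \frac{1}{2\pi}\int\!\!\int\phi(\xi)\left(-i\frac{d}{d\xi}e^{ix\xi}\right)d\xi\,dx = \frac{i}{2\pi}\int\!\!\int\frac{d\phi(\xi)}{d\xi}e^{ix\xi}\,d\xi\,dx = i\,\mathcal F\left(\mathcal F^{-1}\left(\frac{d\phi}{dx}\right)\right)(0) = i\frac{d\phi}{dx}(0).$$
Now for an element $f$ of $\mathcal S$ we have
$$\frac{1}{\sqrt{2\pi}}\int\frac{df}{dx}\phi\,dx = -\frac{1}{\sqrt{2\pi}}\int f\frac{d\phi}{dx}\,dx.$$
So we define the derivative of an $\ell \in \mathcal S'$ by
$$\frac{d\ell}{dx}(\phi) = -\ell\left(\frac{d\phi}{dx}\right).$$
So the Fourier transform of $x$ is $-i\frac{d\delta}{dx}$.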
Chapter 4
Measure theory.
4.1
We recall some results from the chapter on metric spaces: For any subset $A \subset \mathbb R$ we defined its Lebesgue outer measure by
$$m^*(A) := \inf\left\{\sum\ell(I_n) : I_n \text{ are intervals with } A \subset \bigcup I_n\right\}. \qquad (4.1)$$
Here the length $\ell(I)$ of any interval $I = [a, b]$ is $b - a$ with the same definition for half open intervals $(a, b]$ or $[a, b)$, or open intervals. Of course if $a = -\infty$ and $b$ is finite or $+\infty$, or if $a$ is finite and $b = +\infty$ the length is infinite. So the infimum in (4.1) is taken over all covers of $A$ by intervals. By the usual $\epsilon/2^n$ trick, i.e. by replacing each $I_j = [a_j, b_j]$ by $(a_j - \epsilon/2^{j+1}, b_j + \epsilon/2^{j+1})$, which increases the total length by at most $\sum_j\epsilon/2^j = \epsilon$, we may assume that the infimum is taken over open intervals. (Equally well, we could use half open intervals of the form $[a, b)$, for example.)
It is clear that if $A \subset B$ then $m^*(A) \le m^*(B)$ since any cover of $B$ by intervals is a cover of $A$. Also, if $Z$ is any set of measure zero, then $m^*(A \cup Z) = m^*(A)$. In particular, $m^*(Z) = 0$ if $Z$ has measure zero. Also, if $A = [a, b]$ is an interval, then we can cover it by itself, so
$$m^*([a, b]) \le b - a,$$
and hence the same is true for $(a, b]$, $[a, b)$, or $(a, b)$. If the interval is infinite, it clearly can not be covered by a set of intervals whose total length is finite, since if we lined them up with end points touching they could not cover an infinite interval. We recall the proof that
$$m^*(I) = \ell(I) \qquad (4.2)$$
[…] then
$$d - c \le \sum(b_i - a_i).$$
[…] $\subset \bigcup_{i=1}^n(a_i, b_i)$ then $d - c \le \sum_{i=1}^n(b_i - a_i)$, […]
$$\bigcup_{i=3}^n(a_i, b_i).$$
So by induction
$$d - c \le (b_2 - a_1) + \sum_{i=3}^n(b_i - a_i).$$
[…] (4.3)
[…] in (4.1). We will see that when we pass to other types of measures this will make a difference.
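We have verified, or can easily verify, the following properties:
1. $m^*(\emptyset) = 0$.
2. $A \subset B \Rightarrow m^*(A) \le m^*(B)$.
3. $m^*\left(\bigcup_iA_i\right) \le \sum_im^*(A_i)$.
[…] then
$$\operatorname{vol}J \le \sum_{I\in C}\operatorname{vol}(I).$$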
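For a finite union of intervals the outer measure can be computed directly by sorting the intervals and merging overlaps. The sketch below does this and compares the result with the sum of the individual lengths, illustrating the subadditivity in property 3 and the inequality $d - c \le \sum(b_i - a_i)$ used in the proof of (4.2). The function name `union_length` and the sample cover are hypothetical.

```python
def union_length(intervals):
    """Total length of a finite union of intervals (a, b), found by merging overlaps.

    For such a set this is its Lebesgue outer measure m*, which never exceeds
    the sum of the individual lengths."""
    total = 0.0
    cur_a = cur_b = None
    for a, b in sorted(intervals):
        if cur_b is None or a > cur_b:      # a gap: close the current merged block
            if cur_b is not None:
                total += cur_b - cur_a
            cur_a, cur_b = a, b
        else:                               # overlap: extend the current block
            cur_b = max(cur_b, b)
    if cur_b is not None:
        total += cur_b - cur_a
    return total

cover = [(0.0, 0.6), (0.5, 1.2), (1.1, 1.5), (3.0, 3.25)]
print(union_length(cover), sum(b - a for a, b in cover))
```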
4.2
Item 5. in the preceding paragraph says that the Lebesgue outer measure of any set is obtained by approximating it from the outside by open sets. The Lebesgue inner measure is defined as
$$m_*(A) = \sup\{m^*(K) : K \subset A,\ K \text{ compact}\}. \qquad (4.4)$$
Clearly
$$m_*(A) \le m^*(A)$$
since $m^*(K) \le m^*(A)$ for any $K \subset A$. We also have
Proposition 4.2.1 For any interval $I$ we have
$$m_*(I) = \ell(I). \qquad (4.5)$$
4.3
[…] (4.6)
[…] (4.7)
If $K$ is a compact set, then $m_*(K) = m^*(K)$ since $K$ is a compact set contained in itself. Hence all compact sets are measurable in the sense of Lebesgue.
If $I$ is a bounded interval, then $I$ is measurable in the sense of Lebesgue by Proposition 4.2.1.
If $m^*(A) = \infty$, we say that $A$ is measurable in the sense of Lebesgue if all of the sets $A \cap [-n, n]$ are measurable.
Proposition 4.3.1 If $A = \bigcup A_i$ is a (finite or) countable disjoint union of sets which are measurable in the sense of Lebesgue, then $A$ is measurable in the sense of Lebesgue and
$$m(A) = \sum_im(A_i).$$
Proof. We may assume that $m(A) < \infty$; otherwise apply the result to $A \cap [-n, n]$ and $A_i \cap [-n, n]$ for each $n$. We have
$$m^*(A) \le \sum_nm^*(A_n) = \sum_nm(A_n).$$
[…] $m(A_n) - \frac{\epsilon}{2^n}$ […] $\sum_nm(A_n)$. […]
[…] is a disjoint union of intervals (some open, some closed, some half open) and $O$ is the disjoint union of the $J_n$. So every open set is a disjoint union of intervals, hence measurable in the sense of Lebesgue.
If $F$ is closed, and $m^*(F) = \infty$, then $F \cap [-n, n]$ is compact, and so $F$ is measurable in the sense of Lebesgue. Suppose that
$$m^*(F) < \infty.$$
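$$G_{1,\epsilon} := \left[-1 + \frac{\epsilon}{2^2}, 1 - \frac{\epsilon}{2^2}\right]\cap F$$
$$G_{2,\epsilon} := \left(\left[-2 + \frac{\epsilon}{2^3}, -1\right]\cap F\right)\cup\left(\left[1, 2 - \frac{\epsilon}{2^3}\right]\cap F\right)$$
$$G_{3,\epsilon} := \left(\left[-3 + \frac{\epsilon}{2^4}, -2\right]\cap F\right)\cup\left(\left[2, 3 - \frac{\epsilon}{2^4}\right]\cap F\right)$$
$$\vdots$$
and set
$$G_\epsilon := \bigcup_iG_{i,\epsilon}.$$
The $G_{i,\epsilon}$ are all compact, and hence measurable in the sense of Lebesgue, and the union in the definition of $G_\epsilon$ is disjoint, so $G_\epsilon$ is measurable in the sense of Lebesgue. Furthermore, the sum of the lengths of the gaps between the intervals that went into the definition of the $G_{i,\epsilon}$ is $\epsilon$. So
$$m(G_\epsilon) + \epsilon = m^*(G_\epsilon) + \epsilon \ge m^*(F) \ge m^*(G_\epsilon) = m(G_\epsilon) = \sum_im(G_{i,\epsilon}).$$
In particular, the sum on the right converges, and hence by considering a finite number of terms, we will have a finite sum whose value is at least $m(G_\epsilon) - \epsilon$. The corresponding union of sets will be a compact set $K_\epsilon$ contained in $F$ with
$$m(K_\epsilon) \ge m^*(F) - 2\epsilon.$$
Hence all closed sets are measurable in the sense of Lebesgue. QED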
Theorem 4.3.1 $A$ is measurable in the sense of Lebesgue if and only if for every $\epsilon > 0$ there is an open set $U \supset A$ and a closed set $F \subset A$ such that
$$m(U \setminus F) < \epsilon.$$
Proof. Suppose that $A$ is measurable in the sense of Lebesgue with $m(A) < \infty$. Then there is an open set $U \supset A$ with $m(U) < m^*(A) + \epsilon/2 = m(A) + \epsilon/2$, and there is a compact set $F \subset A$ with $m(F) \ge m_*(A) - \epsilon/2 = m(A) - \epsilon/2$. Since $U \setminus F$ is open, it is measurable in the sense of Lebesgue, and so is $F$ as it is compact. Also $F$ and $U \setminus F$ are disjoint. Hence by Proposition 4.3.1,
$$m(U \setminus F) = m(U) - m(F) < m(A) + \frac{\epsilon}{2} - \left(m(A) - \frac{\epsilon}{2}\right) = \epsilon.$$
If $A$ is measurable in the sense of Lebesgue, and $m(A) = \infty$, we can apply the above to $A \cap I$ where $I$ is any compact interval. So we can find open sets $U_n \supset A \cap [-n - 2\delta_{n+1}, n + 2\delta_{n+1}]$ and closed sets $F_n \subset A \cap [-n, n]$ with $m(U_n \setminus F_n) < \epsilon/2^n$. Here the $\delta_n$ are sufficiently small positive numbers. We […]
4.4
[…] (4.8)
[…] (4.9)
for all $A$.
Suppose $E$ is measurable in the sense of Lebesgue. Let $\epsilon > 0$. Choose $U \supset E \supset F$ with $U$ open, $F$ closed and $m(U \setminus F) < \epsilon$, which we can do by Theorem 4.3.1. Let $V$ be an open set containing $A$. Then $A \setminus E \subset V \setminus F$ and $A \cap E \subset V \cap U$ so
$$\begin{aligned} m^*(A \setminus E) + m^*(A \cap E) &\le m(V \setminus F) + m(V \cap U)\\ &\le m(V \setminus U) + m(U \setminus F) + m(V \cap U)\\ &\le m(V) + \epsilon. \end{aligned}$$
(We can pass from the second line to the third since both $V \setminus U$ and $V \cap U$ are measurable in the sense of Lebesgue and we can apply Proposition 4.3.1.) Taking the infimum over all open $V$ containing $A$, the right hand side becomes $m^*(A) + \epsilon$, and as $\epsilon$ is arbitrary, we have established (4.9), showing that $E$ is measurable in the sense of Caratheodory.
[…] (4.10)
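4.5
Countable additivity.
The first main theorem in the subject is the following description of $\mathcal M$ and the function $m$ on it:
Theorem 4.5.1 $\mathcal M$ and the function $m: \mathcal M \to [0, \infty]$ have the following properties:
$\mathbb R \in \mathcal M$.
$E \in \mathcal M \Rightarrow E^c \in \mathcal M$.
If $E_n \in \mathcal M$ for $n = 1, 2, 3, \dots$ then $\bigcup_nE_n \in \mathcal M$.
If $F_n \in \mathcal M$ and the $F_n$ are pairwise disjoint, then $F := \bigcup_nF_n \in \mathcal M$ and
$$m(F) = \sum_{n=1}^\infty m(F_n).$$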
Proof. We already know the first two items on the list, and we know that a finite union of sets in $\mathcal M$ is again in $\mathcal M$. We also know the last assertion, which is Proposition 4.3.1. But it will be instructive and useful for us to have a proof starting directly from Caratheodory's definition of measurability:
If $F_1 \in \mathcal M$, $F_2 \in \mathcal M$ and $F_1 \cap F_2 = \emptyset$ then taking
$$A = F_1 \cup F_2, \qquad E_1 = F_1, \qquad E_2 = F_2$$
in (4.10) gives
$$m(F_1 \cup F_2) = m(F_1) + m(F_2).$$
Induction then shows that if $F_1, \dots, F_n$ are pairwise disjoint elements of $\mathcal M$ then their union belongs to $\mathcal M$ and
$$m(F_1 \cup F_2 \cup \cdots \cup F_n) = m(F_1) + m(F_2) + \cdots + m(F_n).$$
More generally, if we let $A$ be arbitrary and take $E_1 = F_1$, $E_2 = F_2$ in (4.10) we get
$$m^*(A) = m^*(A \cap F_1) + m^*(A \cap F_2) + m^*(A \cap (F_1 \cup F_2)^c).$$
If $F_3 \in \mathcal M$ is disjoint from $F_1$ and $F_2$ we may apply (4.8) with $A$ replaced by $A \cap (F_1 \cup F_2)^c$ and $E$ by $F_3$ to get
$$m^*(A \cap (F_1 \cup F_2)^c) = m^*(A \cap F_3) + m^*(A \cap (F_1 \cup F_2 \cup F_3)^c),$$
since […]
$$m^*(A) = \sum_{i=1}^nm^*(A \cap F_i) + m^*(A \cap (F_1 \cup \cdots \cup F_n)^c). \qquad (4.11)$$
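Now suppose that we have a countable family $\{F_i\}$ of pairwise disjoint sets belonging to $\mathcal M$. Since
$$\left(\bigcup_{i=1}^nF_i\right)^c \supset \left(\bigcup_{i=1}^\infty F_i\right)^c$$
[…]
$$m^*(A) \ge \sum_{i=1}^nm^*(A \cap F_i) + m^*\left(A \cap \left(\bigcup_{i=1}^\infty F_i\right)^c\right)$$
[…]
$$m^*(A) \ge \sum_{i=1}^\infty m^*(A \cap F_i) + m^*\left(A \cap \left(\bigcup_{i=1}^\infty F_i\right)^c\right).$$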
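Now given any collection of sets $B_k$ we can find intervals $\{I_{k,j}\}$ with
$$B_k \subset \bigcup_jI_{k,j}$$
and
$$m^*(B_k) + \frac{\epsilon}{2^k} \ge \sum_j\ell(I_{k,j}).$$
So
$$\bigcup_kB_k \subset \bigcup_{k,j}I_{k,j}$$
and hence $m^*\left(\bigcup_kB_k\right) \le \sum_{k,j}\ell(I_{k,j}) \le \sum_km^*(B_k) + \epsilon$; since $\epsilon$ is arbitrary,
$$m^*\left(\bigcup_kB_k\right) \le \sum_km^*(B_k),$$
the inequality being trivially true if the sum on the right is infinite. So
$$\sum_km^*(A \cap F_k) \ge m^*\left(A \cap \bigcup_{i=1}^\infty F_i\right).$$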
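Thus
$$m^*(A) \ge \sum_{i=1}^\infty m^*(A \cap F_i) + m^*\left(A \cap \left(\bigcup_{i=1}^\infty F_i\right)^c\right) \ge m^*\left(A \cap \bigcup_{i=1}^\infty F_i\right) + m^*\left(A \cap \left(\bigcup_{i=1}^\infty F_i\right)^c\right).$$
The extreme right of this inequality is the left hand side of (4.9) applied to
$$E = \bigcup_iF_i,$$
[…]
$$m^*(A) = \sum_im^*(A \cap F_i) + m^*\left(A \cap \left(\bigcup_iF_i\right)^c\right). \qquad (4.12)$$
If we take $A = \bigcup_{i=1}^\infty F_i$ we conclude that
$$m(F) = \sum_{n=1}^\infty m(F_n) \qquad (4.13)$$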
[…] $\bigcup F_j$. […]
$$F_3 := E_3 \setminus (E_1 \cup E_2)$$
etc. The right hand sides all belong to $\mathcal M$ since $\mathcal M$ is closed under taking complements and finite unions and hence intersections, and
$$\bigcup_jF_j = \bigcup_jE_j.$$
[…]
$$m\left(\bigcup_nA_n\right) = \sum_{i=1}^\infty m(B_i) = \lim_n\sum_{i=1}^nm(B_i) = \lim_nm\left(\bigcup_{i=1}^nB_i\right) = \lim_nm(A_n).$$
QED
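Proposition 4.5.3 If $C_n \supset C_{n+1} \supset \cdots$ is a decreasing family of sets in $\mathcal M$ and $m(C_1) < \infty$ then
$$m\left(\bigcap_nC_n\right) = \lim_nm(C_n).$$
[…]
$$\bigcup_n(C_1 \setminus C_n) = C_1 \setminus \bigcap_nC_n.$$
So, since the sets $C_1 \setminus C_n$ increase with $n$,
$$m(C_1) - m\left(\bigcap_nC_n\right) = m\left(\bigcup_n(C_1 \setminus C_n)\right) = \lim_nm(C_1 \setminus C_n) = m(C_1) - \lim_nm(C_n).$$
Subtracting $m(C_1)$ from both sides of the last equation gives the equality in the proposition. QED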
4.6
We will now take the items in Theorem 4.5.1 as axioms: Let $X$ be a set. (Usually $X$ will be a topological space or even a metric space.) A collection $\mathcal F$ of subsets of $X$ is called a $\sigma$-field if:
$X \in \mathcal F$,
If $E \in \mathcal F$ then $E^c = X \setminus E \in \mathcal F$, and
If $\{E_n\}$ is a sequence of elements in $\mathcal F$ then $\bigcup_nE_n \in \mathcal F$.
The intersection of any family of $\sigma$-fields is again a $\sigma$-field, and hence given any collection $\mathcal C$ of subsets of $X$, there is a smallest $\sigma$-field $\mathcal F$ which contains it. Then $\mathcal F$ is called the $\sigma$-field generated by $\mathcal C$.
If $X$ is a metric space, the $\sigma$-field generated by the collection of open sets is called the Borel $\sigma$-field, usually denoted by $\mathcal B$ or $\mathcal B(X)$, and a set belonging to $\mathcal B$ is called a Borel set.
Given a $\sigma$-field $\mathcal F$ a (non-negative) measure is a function
$$m: \mathcal F \to [0, \infty]$$
such that
$m(\emptyset) = 0$ and
Countable additivity: If $F_n$ is a disjoint collection of sets in $\mathcal F$ then
$$m\left(\bigcup_nF_n\right) = \sum_nm(F_n).$$
4.7
[…]
$$m^*(A) := \inf_{\mathcal D\in\mathrm{ccc}(A)}\sum_{D\in\mathcal D}\ell(D). \qquad (4.14)$$
[…] and since this is true for all $\epsilon > 0$ we conclude that $m^*$ is countably subadditive.
So we have verified that $m^*$ defined by (4.14) is an outer measure. We must check that it satisfies the two conditions in the theorem. If $A \in \mathcal C$ then the single element collection $\{A\} \in \mathrm{ccc}(A)$, so $m^*(A) \le \ell(A)$, so the first condition is obvious. As to the second condition, suppose $n^*$ is an outer measure with $n^*(D) \le \ell(D)$ for all $D \in \mathcal C$. Then for any set $A$ and any countable cover $\mathcal D$ of $A$ by elements of $\mathcal C$ we have
$$\sum_{D\in\mathcal D}\ell(D) \ge \sum_{D\in\mathcal D}n^*(D) \ge n^*\left(\bigcup_{D\in\mathcal D}D\right) \ge n^*(A),$$
[…]
4.7.1
A pathological example.
Suppose we take $X = \mathbb R$, and let $\mathcal C$ consist of all half open intervals of the form $[a, b)$. However, instead of taking $\ell$ to be the length of the interval, we take it to be the square root of the length:
$$\ell([a, b)) := (b - a)^{1/2}.$$
I claim that any half open interval (say $[0, 1)$) of length one has $m^*([a, b)) = 1$. (Since $\ell$ is translation invariant, it does not matter which interval we choose.) Indeed, $m^*([0, 1)) \le 1$ by the first condition in the theorem, since $\ell([0, 1)) = 1$. On the other hand, if
$$[0, 1) \subset \bigcup_i[a_i, b_i)$$
[…]
$$\left(\sum_i(b_i - a_i)^{1/2}\right)^2 = \sum_i(b_i - a_i) + \sum_{i\neq j}(b_i - a_i)^{1/2}(b_j - a_j)^{1/2} \ge 1.$$
So $m^*([0, 1)) = 1$.
On the other hand, consider an interval $[a, b)$ of length $2$. Since it covers […]
so
$$m^*(I \cap [-1, 1)) + m^*(I^c \cap [-1, 1)) = 2 > \sqrt{2} \ge m^*([-1, 1)).$$
In other words, the closed unit interval is not measurable relative to the outer measure $m^*$ determined by the theorem. We would like Borel sets to be measurable, and the above computation shows that the measure produced by Method I as above does not have this desirable property. In fact, if we consider two half open intervals $I_1$ and $I_2$ of length one separated by a small distance of size $\epsilon$, say, then their union $I_1 \cup I_2$ is covered by an interval of length $2 + \epsilon$, and hence […]
4.7.2
[…] (4.15)
The condition $d(A, B) > 0$ means that there is an $\epsilon > 0$ (depending on $A$ and $B$) so that $d(x, y) > \epsilon$ for all $x \in A$, $y \in B$. The main result here is due to Caratheodory:
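[…] $\ge \frac{1}{j}\}$. (4.16)
[…] since $(A \cap F) \cup A_j \subset A$. Now
$$A \setminus F = \bigcup_jA_j$$
[…] $\frac{1}{j} - \frac{1}{j+1} > 0$.
[…]
$$m^*\left(\bigcup_{k=1}^nB_{2k-1}\right) = \sum_{k=1}^nm^*(B_{2k-1}), \qquad m^*\left(\bigcup_{k=1}^nB_{2k}\right) = \sum_{k=1}^nm^*(B_{2k}).$$
Both of these are $\le m^*(A_{2n})$ since the unions of the sets involved are contained in $A_{2n}$. Since $m^*(A_{2n})$ is increasing, and assumed bounded, both of the above […]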
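[…]
$$= m^*\left(\bigcup_iA_i\right) = m^*\left(A_j \cup \bigcup_{k\ge j+1}B_k\right) \le m^*(A_j) + \sum_{k=j+1}^\infty m^*(B_k) \le \lim_nm^*(A_n) + \sum_{k=j+1}^\infty m^*(B_k).$$
But the sum on the right can be made as small as we like by choosing $j$ large, since the series converges. Hence
$$m^*(A \setminus F) \le \lim_nm^*(A_n).$$
QED.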
4.8
The axioms for an outer measure are preserved by this limit operation, so $m_{II}$ is an outer measure. If $A$ and $B$ are such that $d(A, B) > 2\epsilon$, then any set of $\mathcal C_\epsilon$ which intersects $A$ does not intersect $B$ and vice versa, so throwing away extraneous sets in a cover of $A \cup B$ which do not intersect either, we see that $m_{II}(A \cup B) = m_{II}(A) + m_{II}(B)$. The method II construction always yields a metric outer measure.
4.8.1
An example.
Let $X$ be the set of all (one sided) infinite sequences of 0's and 1's. So a point of $X$ is an expression of the form
$$a_1a_2a_3\cdots$$
where each $a_i$ is $0$ or $1$. For any finite sequence $\alpha$ of 0's or 1's, let $[\alpha]$ denote the set of all sequences which begin with $\alpha$. We also let $|\alpha|$ denote the length of $\alpha$, that is, the number of bits in $\alpha$. For each
$$0 < r < 1$$
we define a metric $d_r$ on $X$ by: If
$$x = \alpha x', \qquad y = \alpha y'$$
where the first bit in $x'$ is different from the first bit in $y'$ then
$$d_r(x, y) := r^{|\alpha|}.$$
In other words, the distance between two distinct sequences is $r^k$ where $k$ is the length of the longest initial segment where they agree (and $d_r(x, x) := 0$). Clearly $d_r(x, y) \ge 0$ with equality if and only if $x = y$, and $d_r(y, x) = d_r(x, y)$. Also, for any three points $x$, $y$, and $z$ we claim that
$$d_r(x, z) \le \max\{d_r(x, y), d_r(y, z)\}.$$
Indeed, if two of the three points are equal this is obvious. Otherwise, let $j$ denote the length of the longest common prefix of $x$ and $y$, and let $k$ denote the length of the longest common prefix of $y$ and $z$. Let $m = \min(j, k)$. Then the first $m$ bits of $x$ agree with the first $m$ bits of $z$ and so $d_r(x, z) \le r^m = \max(r^j, r^k)$. QED
A metric with this property (which is much stronger than the triangle inequality) is called an ultrametric.
Notice that
$$\operatorname{diam}[\alpha] = r^{|\alpha|}. \qquad (4.17)$$
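The ultrametric inequality can be checked by brute force on finite bit strings, which stand in for infinite sequences in this sketch (an assumption: two strings of the same length that agree completely are treated as the same point). The function `d_r` is a hypothetical helper implementing the definition above.

```python
import random

def d_r(x, y, r):
    """d_r(x, y) = r**k where k is the length of the longest common prefix; d_r(x, x) = 0.

    x and y are bit strings of equal length standing in for infinite sequences."""
    if x == y:
        return 0.0
    k = 0
    while x[k] == y[k]:
        k += 1
    return r ** k

random.seed(1)
points = ["".join(random.choice("01") for _ in range(12)) for _ in range(40)]
r = 0.3
ultrametric = all(
    d_r(x, z, r) <= max(d_r(x, y, r), d_r(y, z, r))
    for x in points for y in points for z in points
)
print(ultrametric)
```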
The metrics for different $r$ are different, and we will make use of this fact shortly. But
Proposition 4.8.1 The spaces $(X, d_r)$ are all homeomorphic under the identity map.
It is enough to show that the identity map is a continuous map from $(X, d_r)$ to $(X, d_s)$ since it is one to one and we can interchange the role of $r$ and $s$. So, given $\epsilon > 0$, we must find a $\delta > 0$ such that if $d_r(x, y) < \delta$ then $d_s(x, y) < \epsilon$. So choose $k$ so that $s^k < \epsilon$. Then letting $\delta = r^k$ will do: if $d_r(x, y) < r^k$ then $x$ and $y$ agree in their first $k + 1$ bits, so $d_s(x, y) \le s^{k+1} < \epsilon$.
So although the metrics are different, the topologies they define are the same.
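[…]
$$\ell([\alpha]) = \left(\frac{1}{2}\right)^k = \left(\frac{1}{2}\right)^{k+1} + \left(\frac{1}{2}\right)^{k+1} = \ell([\alpha 0]) + \ell([\alpha 1]), \qquad k = |\alpha|.$$
So if we also use the metric $d_{1/2}$, we see, by repeating the above, that every $[\alpha]$ can be written as the disjoint union $C_1 \cup \cdots \cup C_n$ of sets in $\mathcal C_\epsilon$ with $\ell([\alpha]) = \sum\ell(C_i)$. Thus $m_{\ell,\mathcal C_\epsilon}([\alpha]) \le \ell([\alpha])$, and so $m_{\ell,\mathcal C_\epsilon}(A) \le m_I(A)$ by the maximality of $m_I$; since the reverse inequality always holds, $m_{II} = m_I$. It also follows from the above computation that
$$m^*([\alpha]) = \ell([\alpha]).$$
There is also something special about the value $s = \frac{1}{3}$: Recall that one of the definitions of the Cantor set $C$ is that it consists of all points $x \in [0, 1]$ which have a base $3$ expansion involving only the symbols $0$ and $2$. Let
$$h: X \to C$$
where $h$ sends the bit $1$ into the symbol $2$, e.g.
$$h(011001\ldots) = .022002\ldots.$$
In other words, for any sequence $z$
$$h(0z) = \frac{h(z)}{3}, \qquad h(1z) = \frac{h(z) + 2}{3}. \qquad (4.18)$$
I claim that:
$$\frac{1}{3}d_{1/3}(x, y) \le |h(x) - h(y)| \le d_{1/3}(x, y). \qquad (4.19)$$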
Proof. If $x$ and $y$ start with different bits, say $x = 0x'$ and $y = 1y'$, then $d_{1/3}(x, y) = 1$ while $h(x)$ lies in the interval $[0, \frac{1}{3}]$ and $h(y)$ lies in the interval $[\frac{2}{3}, 1]$ on the real line. So $h(x)$ and $h(y)$ are at least a distance $\frac{1}{3}$ and at most a distance $1$ apart, which is what (4.19) says. So we proceed by induction. Suppose we know that (4.19) is true when $x = \alpha x'$ and $y = \alpha y'$ with $x'$, $y'$ starting with different digits, and $|\alpha| \le n$. (The above case was where $|\alpha| = 0$.)
[…] $\frac{1}{3}d_{1/3}(x', y')$ […]
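A short exact check of (4.18) and (4.19) with rational arithmetic from Python's `fractions` module. Finite bit strings are read as sequences continued by zeros (an assumption of the sketch; such sequences are genuine points of $X$, so $h$ is evaluated exactly on them). The helpers `h` and `d_third` are hypothetical names for the map $h$ and the metric $d_{1/3}$.

```python
from fractions import Fraction
import random

def h(bits):
    """Ternary point .t1 t2 t3 ... with t_i = 2 * b_i, computed exactly (trailing zeros implied)."""
    return sum((Fraction(2 * int(b), 3 ** (i + 1)) for i, b in enumerate(bits)), Fraction(0))

def d_third(x, y):
    """d_{1/3}(x, y) = (1/3)**k, k = length of the longest common prefix; 0 if x == y."""
    if x == y:
        return Fraction(0)
    k = 0
    while x[k] == y[k]:
        k += 1
    return Fraction(1, 3 ** k)

random.seed(2)
n = 25
ok = True
for _ in range(3000):
    x = "".join(random.choice("01") for _ in range(n))
    j = random.randrange(n)                       # force a common prefix of length >= j
    y = x[:j] + "".join(random.choice("01") for _ in range(n - j))
    d = d_third(x, y)
    ok = ok and (d / 3 <= abs(h(x) - h(y)) <= d)
print(ok)
```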
4.9
Hausdorff measure.
Take $\mathcal C$ to be the collection of all subsets of $X$, and for any positive real number $s$ define
$$\ell_s(A) = \operatorname{diam}(A)^s$$
(with $0^s = 0$). The method II outer measure is called the $s$-dimensional Hausdorff outer measure, and its restriction to the associated $\sigma$-field of (Caratheodory) measurable sets is called the $s$-dimensional Hausdorff measure. We will let $m_{s,\epsilon}$ denote the method I outer measure associated to $\ell_s$ and $\epsilon$, and let $\mathcal H^s$ denote the Hausdorff outer measure of dimension $s$, so that
$$\mathcal H^s(A) = \lim_{\epsilon\to 0}m_{s,\epsilon}(A).$$
For example, we claim that for $X = \mathbb R$, $\mathcal H^1$ is exactly Lebesgue outer measure, which we will denote here by $\mathcal L^*$. Indeed, if $A$ has diameter $r$, then $A$ is contained in a closed interval of length $r$. Hence $\mathcal L^*(A) \le r$. The Method I construction theorem says that $m_{1,\epsilon}$ is the largest outer measure satisfying $m^*(A) \le \operatorname{diam}A$ for sets of diameter less than $\epsilon$. Hence $m_{1,\epsilon}(A) \ge \mathcal L^*(A)$ for all sets $A$ and all $\epsilon$, and so
$$\mathcal H^1 \ge \mathcal L^*.$$
On the other hand, any bounded half open interval $[a, b)$ can be broken up into a finite union of half open intervals of length $< \epsilon$, whose sum of diameters is $b - a$. So $m_{1,\epsilon}([a, b)) \le b - a$. But the method I construction theorem says that $\mathcal L^*$ is the largest outer measure satisfying
$$m^*([a, b)) \le b - a.$$
4.10
Hausdorff dimension.
This last theorem implies that for any Borel set $F$, there is a unique value $s_0$ (which might be $0$ or $\infty$) such that $\mathcal H^t(F) = \infty$ for all $t < s_0$ and $\mathcal H^s(F) = 0$ for all $s > s_0$. This value is called the Hausdorff dimension of $F$. It is one of many competing (and non-equivalent) definitions of dimension. Notice that it is a metric invariant, and in fact is the same for two spaces differing by a Lipschitz homeomorphism with Lipschitz inverse. But it is not a topological invariant. In fact, we shall show that the space $X$ of all sequences of zeros and ones studied above has Hausdorff dimension $1$ relative to the metric $d_{1/2}$ while it has Hausdorff dimension $\log 2/\log 3$ if we use the metric $d_{1/3}$. Since we have shown that $(X, d_{1/3})$ is Lipschitz equivalent to the Cantor set $C$, this will also prove that $C$ has Hausdorff dimension $\log 2/\log 3$.
We first discuss the $d_{1/2}$ case and use the following lemma:
Lemma 4.10.1 If $\operatorname{diam}(A) > 0$, then there is an $\alpha$ such that $A \subset [\alpha]$ and $\operatorname{diam}([\alpha]) = \operatorname{diam}A$.
Proof. Given any set A, it has a longest common prefix. Indeed, consider
the set of lengths of common prefixes of elements of A. This is finite set of
non-negative integers since A has at least two distinct elements. Let n be the
largest of these, and let be a common prefix of this length. Then it is clearly
the longest common prefix of A. Hence A [] and diam([]) = diam A.QED
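Let $\mathcal{C}$ denote the collection of all sets of the form $[\alpha]$ and let $\ell$ be the function on $\mathcal{C}$ given by
$$\ell([\alpha]) = \left(\frac{1}{2}\right)^{|\alpha|},$$
and let $\ell^*$ be the associated method I outer measure, and $m$ the associated measure; all these as we introduced above. For any $A$ of positive diameter, with $\alpha$ as in the lemma, we have
$$\ell^*(A) \le \ell^*([\alpha]) = \operatorname{diam}([\alpha]) = \operatorname{diam}(A).$$
By the method I construction theorem, $m_{1,\epsilon}$ is the largest outer measure with the property that $n(A) \le \operatorname{diam} A$ for sets of diameter $< \epsilon$. Hence $\ell^* \le m_{1,\epsilon}$, and since this is true for all $\epsilon > 0$, we conclude that
$$\ell^* \le H_1.$$
On the other hand, for any $\alpha$ and any $\epsilon > 0$, there is an $n$ such that $2^{-n} < \epsilon$ and $n \ge |\alpha|$. The set $[\alpha]$ is the disjoint union of all sets $[\beta] \subset [\alpha]$ with $|\beta| = n$. There are $2^{n-|\alpha|}$ of these subsets, each having diameter $2^{-n}$. So
$$m_{1,\epsilon}([\alpha]) \le 2^{-|\alpha|}.$$
However $\ell^*$ is the largest outer measure satisfying this inequality for all $[\alpha]$. Hence $m_{1,\epsilon} \le \ell^*$ for all $\epsilon$, so $H_1 \le \ell^*$. In other words
$$H_1 = m.$$
But since we computed that $m(X) = 1$, we conclude that the Hausdorff dimension of $(X, d_{1/2})$ is $1$.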
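Now let us turn to $(X, d_{1/3})$. Then the diameter $\operatorname{diam}_{1/2}$ relative to the metric $d_{1/2}$ and the diameter $\operatorname{diam}_{1/3}$ relative to the metric $d_{1/3}$ are given by
$$\operatorname{diam}_{1/2}([\alpha]) = \left(\frac{1}{2}\right)^k, \qquad \operatorname{diam}_{1/3}([\alpha]) = \left(\frac{1}{3}\right)^k, \qquad k = |\alpha|.$$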
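The cylinder diameters above make the value $\log 2/\log 3$ easy to see numerically. The sketch below is an illustration, not part of the proof. It sums $(\operatorname{diam}_{1/3}[\alpha])^s$ over the $2^k$ cylinders of length $k$, which cover $X$. At $s = \log 2/\log 3$ the sum stays at $1$; below that value it blows up, and above it the sum tends to $0$. These cover sums only bound $H_s$ from above. The matching lower bound needs a measure argument like the one given for $d_{1/2}$. The function name `cylinder_cover_sum` and its parameters are our own.

```python
import math

def cylinder_cover_sum(k, s, base=3.0, letters=2):
    """Sum of (diam [alpha])**s over the letters**k cylinders [alpha] with |alpha| = k.

    In the metric d_{1/base} each such cylinder has diameter (1/base)**k,
    so the sum equals letters**k * base**(-k*s)."""
    diam = base ** (-k)
    return letters ** k * diam ** s

s_star = math.log(2) / math.log(3)   # the claimed dimension of (X, d_{1/3})

for k in (5, 10, 20):
    print(k,
          cylinder_cover_sum(k, s_star - 0.1),   # grows without bound in k
          cylinder_cover_sum(k, s_star),         # equals 1 up to rounding
          cylinder_cover_sum(k, s_star + 0.1))   # tends to 0 in k
```

With `base=2.0` and `s=1.0` the same function returns exactly $1$, matching the $d_{1/2}$ computation above.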
4.11
Push forward.
B G.
k
.
k!
4.12
4.12.1
Similarity dimension.
$$\sum_{i=1}^{n} r_i^s = 1. \qquad (4.20)$$
$$f(t) := \sum_{i=1}^{n} r_i^t.$$
We have
$$f(0) = n$$
and
QED
Definition 4.12.1 The number s in (4.20) is called the similarity dimension
of the ratio list (r1 , . . . , rn ).
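Suppose every $r_i$ lies in $(0, 1)$. Then $f(t) = \sum r_i^t$ is continuous and strictly decreasing, with $f(0) = n$ and $f(t) \to 0$ as $t \to \infty$. So the similarity dimension can be found numerically by bisection. The sketch below assumes a contracting ratio list. Bisection is our choice of root finder; the text does not prescribe a method, and `similarity_dimension` is a hypothetical name.

```python
import math

def similarity_dimension(ratios, tol=1e-12):
    """Return the s with sum(r**s for r in ratios) == 1, found by bisection.

    Assumes a contracting ratio list: at least one ratio and 0 < r < 1 for
    each r. Then f(t) = sum(r**t) is strictly decreasing, f(0) = n >= 1 and
    f(t) -> 0 as t -> infinity, so the root exists and is unique."""
    if not ratios or not all(0 < r < 1 for r in ratios):
        raise ValueError("expected a non-empty list of ratios in (0, 1)")

    def f(t):
        return sum(r ** t for r in ratios)

    lo, hi = 0.0, 1.0
    while f(hi) > 1:        # double hi until f(hi) <= 1 brackets the root
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 1:      # f(mid) > 1 means the root lies to the right of mid
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(similarity_dimension([1/3, 1/3]))       # Cantor ratio list: log 2 / log 3
print(similarity_dimension([1/2, 1/2, 1/2]))  # three halves: log 3 / log 2
```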
Iterated function systems and fractals.
A map $f : X \to Y$ between two metric spaces is called a similarity with similarity ratio $r$ if
$$d_Y(f(x_1), f(x_2)) = r\, d_X(x_1, x_2) \quad \forall\, x_1, x_2 \in X.$$
(Recall that a map is called Lipschitz with Lipschitz constant $r$ if we only had an inequality, $\le$, instead of an equality in the above.)
Let $X$ be a complete metric space, and let $(r_1, \ldots, r_n)$ be a contracting ratio list. A collection
$$(f_1, \ldots, f_n), \quad f_i : X \to X,$$
is called an iterated function system which realizes the contracting ratio list if each
$$f_i : X \to X, \quad i = 1, \ldots, n,$$
is a similarity with ratio $r_i$. We also say that $(f_1, \ldots, f_n)$ is a realization of the ratio list $(r_1, \ldots, r_n)$.
It is a consequence of Hutchinson's theorem, see below, that
(4.23)
4.12.2
e A.
If $x \ne y$ are two elements of $E$, they will have a longest common initial string $\alpha$, and we then define
$$d(x, y) := w_\alpha.$$
This makes $E$ into a complete ultrametric space. Define the maps $g_i : E \to E$ by
$$g_i(x) = ix.$$
That is, $g_i$ shifts the infinite string one unit to the right and inserts the letter $i$ in the initial position. In terms of our metric, clearly $(g_1, \ldots, g_n)$ is a realization of $(r_1, \ldots, r_n)$, and the space $E$ itself is the corresponding fractal set.
We let $[\alpha]$ denote the set of all strings beginning with $\alpha$, i.e. whose first word (of length equal to the length of $\alpha$) is $\alpha$. The diameter of this set is $w_\alpha$.
The Hausdorff dimension of $E$ is $s$.
We begin with a lemma:
Lemma 4.12.1 Let $A \subset E$ have positive diameter. Then there exists a word $\alpha$ such that $A \subset [\alpha]$ and
$$\operatorname{diam}(A) = \operatorname{diam}[\alpha] = w_\alpha.$$
Proof. Since $A$ has at least two elements, there will be a word which is a prefix of one and not the other. So there will be an integer $n$ (possibly zero) which is the length of the longest common prefix of all elements of $A$. Then every element of $A$ will begin with this common prefix, which thus satisfies the conditions of the lemma. QED
The lemma implies that in computing the Hausdorff measure or dimension, we need only consider covers by sets of the form $[\alpha]$. Now if we choose $s$ to be the solution of (4.20), then
$$(\operatorname{diam}[\alpha])^s = \sum_{i=1}^{n} (\operatorname{diam}[\alpha i])^s = (\operatorname{diam}[\alpha])^s \sum_{i=1}^{n} r_i^s.$$
This means that the method II outer measure associated to the function $A \mapsto (\operatorname{diam} A)^s$ coincides with the method I outer measure and assigns to each set $[\alpha]$ the measure $w_\alpha^s$. In particular the measure of $E$ is one, and so the Hausdorff dimension of $E$ is $s$.
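Iterating the identity above from the empty word (whose $w$ is $1$) shows that, at every level $k$, the cylinders $[\alpha]$ with $|\alpha| = k$ carry total $s$-mass exactly $1$. The sketch below checks this by brute force. The ratio list $(1/2, 1/4)$ is our choice, picked because its similarity dimension has the closed form $s = \log \varphi/\log 2$, with $\varphi$ the golden ratio, since $2^{-s} = 1/\varphi$ and $1/\varphi + 1/\varphi^2 = 1$. `level_sum` is a hypothetical helper.

```python
import itertools
import math

def level_sum(ratios, s, k):
    """Sum of w_alpha**s over all words alpha of length k, where w_alpha is
    the product of the ratios indexed by the letters of alpha."""
    total = 0.0
    for word in itertools.product(range(len(ratios)), repeat=k):
        w_alpha = math.prod(ratios[i] for i in word)
        total += w_alpha ** s
    return total

ratios = (1/2, 1/4)
golden = (1 + math.sqrt(5)) / 2
s = math.log(golden) / math.log(2)   # solves (1/2)**s + (1/4)**s = 1

for k in (1, 4, 8, 12):
    print(k, level_sum(ratios, s, k))   # each level carries total mass 1
```

With any other exponent the level sums behave like $(\sum_i r_i^t)^k$ and drift to $0$ or $\infty$. This is why only the solution of (4.20) gives a consistent assignment $[\alpha] \mapsto w_\alpha^s$.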
The universality of E.
Let $(f_1, \ldots, f_n)$ be a realization of $(r_1, \ldots, r_n)$ on a complete metric space $X$. Choose a point $a \in X$ and define $h_0 : E \to X$ by
$$h_0(z) := a.$$
Inductively define the maps $h_p$ by defining $h_{p+1}$ on each of the open sets $[\{i\}]$ by
$$h_{p+1}(iz) := f_i(h_p(z)).$$
The sequence of maps $\{h_p\}$ is Cauchy in the uniform norm. Indeed, if $y \in [\{i\}]$, so that $y = g_i(z)$ for some $z \in E$, then
$$d_X(h_{p+1}(y), h_p(y)) = d_X(f_i(h_p(z)), f_i(h_{p-1}(z))) = r_i\, d_X(h_p(z), h_{p-1}(z)).$$
So if we let $c := \max_i(r_i)$, so that $0 < c < 1$, we have
$$\sup_{y\in E} d_X(h_{p+1}(y), h_p(y)) \le c \sup_{x\in E} d_X(h_p(x), h_{p-1}(x)) \le C c^p$$
for a suitable constant $C$ (for instance $C = \sup_{x\in E} d_X(h_1(x), h_0(x))$). This shows that the $h_p$ converge uniformly to a limit $h$ which satisfies
$$h \circ g_i = f_i \circ h.$$
Now
$$h_{k+1}(E) = \bigcup_{i=1}^{n} f_i(h_k(E)).$$
The proof of Hutchinson's theorem given below uses the contraction fixed point theorem for compact sets under the Hausdorff metric. It shows that the sequence of sets $h_k(E)$ converges to the fractal $K$.
Since the image of $h$ is $K$, which is compact, the image of $[\alpha]$ is $f_\alpha(K)$. Here we use the obvious notation $f_{ij} = f_i \circ f_j$, $f_{ijk} = f_i \circ f_j \circ f_k$, etc. The set $f_\alpha(K)$ has diameter $w_\alpha \operatorname{diam}(K)$. Thus $h$ is Lipschitz with Lipschitz constant $\operatorname{diam}(K)$.
The uniqueness of the map $h$ follows from the above sort of argument.
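To make the construction of $h$ concrete, the sketch below runs it for the Cantor maps $x \mapsto x/3$ and $x \mapsto x/3 + 2/3$ of Section 4.14.1, with starting point $a = 0$. Both choices are ours, made for illustration. Unwinding the recursion gives $h_p(z) = f_{z_1}(f_{z_2}(\cdots f_{z_p}(a)\cdots))$. The code checks the uniform estimate $\sup_z |h_{p+1}(z) - h_p(z)| \le C c^p$ with $c = 1/3$ and $C = 2/3$, using exact rational arithmetic.

```python
from fractions import Fraction
import itertools

# A realization of the ratio list (1/3, 1/3) on X = R: the Cantor maps of
# Section 4.14.1. Letters 0 and 1 stand for the indices 1 and 2.
MAPS = (lambda x: x / 3, lambda x: x / 3 + Fraction(2, 3))

def h_p(z, p, a=Fraction(0)):
    """h_p(z) = f_{z_1}(f_{z_2}(... f_{z_p}(a) ...)), using the first p letters of z.

    This unwinds the recursion h_0 = a, h_{p+1}(iz) = f_i(h_p(z))."""
    x = a
    for letter in reversed(z[:p]):   # the innermost map is applied first
        x = MAPS[letter](x)
    return x

# Uniform Cauchy estimate: sup_z |h_{p+1}(z) - h_p(z)| <= C * c**p, with c = 1/3
# and C = sup_z |h_1(z) - h_0(z)| = 2/3 when a = 0.
for p in range(6):
    gap = max(abs(h_p(w, p + 1) - h_p(w, p))
              for w in itertools.product((0, 1), repeat=p + 1))
    print(p, gap, Fraction(2, 3) / 3 ** p)
```

In this example the limit $h(z)$ is the point whose ternary digits are $2z_1, 2z_2, \ldots$. For instance, $z = (0, 1, 0, 1, \ldots)$ gives $.0202\cdots = 1/4$.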
4.13
Let $X$ be a complete metric space. Let $\mathcal{H}(X)$ denote the space of non-empty compact subsets of $X$. For any $A \in \mathcal{H}(X)$ and any positive number $\epsilon$, let
$$A_\epsilon = \{x \in X \mid d(x, y) \le \epsilon \text{ for some } y \in A\}.$$
We call $A_\epsilon$ the $\epsilon$-collar of $A$. Recall that we defined
$$d(x, A) = \inf_{y\in A} d(x, y)$$
to be the distance from any $x \in X$ to $A$. Then we can write the definition of the $\epsilon$-collar as
$$A_\epsilon = \{x \mid d(x, A) \le \epsilon\}.$$
Notice that the infimum in the definition of $d(x, A)$ is actually achieved, that is, there is some point $y \in A$ such that
$$d(x, A) = d(x, y).$$
This is because $A$ is compact. For a pair of non-empty compact sets $A$ and $B$, define
$$d(A, B) = \max_{x\in A} d(x, B).$$
So
$$d(A, B) \le \epsilon \iff A \subset B_\epsilon.$$
Notice that this condition is not symmetric in $A$ and $B$. So Hausdorff introduced
$$h(A, B) := \max\{d(A, B),\, d(B, A)\}.$$
To prove the triangle inequality for $h$, it is enough to show that $d(A, B) \le d(A, C) + d(C, B)$. For $a \in A$ we have
$$d(a, B) = \min_{b\in B} d(a, b) \le \min_{b\in B}\,[d(a, c) + d(c, b)] = d(a, c) + d(c, B) \le d(a, c) + d(C, B) \quad \forall\, c \in C.$$
The second term in the last expression does not depend on $c$, so minimizing over $c$ gives
$$d(a, B) \le d(a, C) + d(C, B).$$
Maximizing over $a$ on the right gives
$$d(a, B) \le d(A, C) + d(C, B).$$
Maximizing on the left gives the desired
$$d(A, B) \le d(A, C) + d(C, B).$$
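Finite sets are compact, so $d(A, B)$ and $h(A, B)$ can be computed by brute force. The sketch below uses points in the plane with the Euclidean metric. It shows that $d$ is not symmetric and spot-checks the inequality just proved on random sets. The random check is a sanity test, not a proof. All function names are hypothetical.

```python
import math
import random

def dist_to_set(x, B):
    """d(x, B) = min over b in B of d(x, b); the min exists since B is finite."""
    return min(math.dist(x, b) for b in B)

def d(A, B):
    """The one-sided quantity d(A, B) = max over a in A of d(a, B)."""
    return max(dist_to_set(a, B) for a in A)

def hausdorff(A, B):
    """Hausdorff distance h(A, B) = max(d(A, B), d(B, A))."""
    return max(d(A, B), d(B, A))

A = [(0.0, 0.0)]
B = [(0.0, 0.0), (1.0, 0.0)]
print(d(A, B), d(B, A), hausdorff(A, B))   # 0.0 1.0 1.0: d is not symmetric

rng = random.Random(1)

def random_set():
    return [(rng.random(), rng.random()) for _ in range(rng.randint(1, 8))]

for _ in range(500):
    P, Q, R = random_set(), random_set(), random_set()
    assert d(P, Q) <= d(P, R) + d(R, Q) + 1e-12   # the inequality proved above
```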
We sketch the proof of completeness. Let $A_n$ be a sequence of compact non-empty subsets of $X$ which is Cauchy in the Hausdorff metric. Define $A$ to be the set of all $x \in X$ with the property that there exists a sequence of points $x_n \in A_n$ with $x_n \to x$. It is straightforward to prove that $A$ is compact and non-empty and is the limit of the $A_n$ in the Hausdorff metric.
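Suppose that $\phi : X \to X$ is a contraction. Then $\phi$ defines a transformation on the space of subsets of $X$ (which we continue to denote by $\phi$):
$$\phi(A) = \{\phi x \mid x \in A\}.$$
Since $\phi$ is continuous, it carries $\mathcal{H}(X)$ into itself. Let $c$ be the Lipschitz constant of $\phi$. Then
$$d(\phi(A), \phi(B)) = \max_{a\in A} \min_{b\in B} d(\phi(a), \phi(b)) \le \max_{a\in A} \min_{b\in B} c\, d(a, b) = c\, d(A, B).$$
Similarly, $d(\phi(B), \phi(A)) \le c\, d(B, A)$ and hence
$$h(\phi(A), \phi(B)) \le c\, h(A, B). \qquad (4.27)$$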
Putting the previous facts together we get Hutchinson's theorem:
Theorem 4.13.1 Let $T_1, \ldots, T_n$ be contractions on a complete metric space $X$ and let $c$ be the maximum of their Lipschitz constants. Define the Hutchinson operator $T$ on $\mathcal{H}(X)$ by
$$T(A) := T_1(A) \cup \cdots \cup T_n(A).$$
Then $T$ is a contraction with Lipschitz constant $c$.
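Here is a small exact check of Hutchinson's theorem, using the Cantor maps $T_1(x) = x/3$ and $T_2(x) = x/3 + 2/3$ of the next section, acting on finite (hence compact) subsets of $\mathbb{R}$. The code verifies $h(T(A), T(B)) \le h(A, B)/3$ on random finite sets. It then iterates $T$ from $\{0\}$ and from $\{0, 1\}$, showing the distance between the iterates shrink by exactly $1/3$ per step. The random sets and the starting sets are our own choices.

```python
import random
from fractions import Fraction

def T(A):
    """Hutchinson operator for T1(x) = x/3 and T2(x) = x/3 + 2/3 on finite subsets of R."""
    return {x / 3 for x in A} | {x / 3 + Fraction(2, 3) for x in A}

def d(A, B):
    return max(min(abs(a - b) for b in B) for a in A)

def h(A, B):
    return max(d(A, B), d(B, A))

# Contraction with constant c = 1/3, checked exactly on random finite sets.
rng = random.Random(0)
for _ in range(200):
    A = {Fraction(rng.randint(0, 100), 100) for _ in range(rng.randint(1, 6))}
    B = {Fraction(rng.randint(0, 100), 100) for _ in range(rng.randint(1, 6))}
    assert h(T(A), T(B)) <= h(A, B) / 3

# Iterating from {0} and from {0, 1}: the distance is multiplied by exactly 1/3 each step.
A, B = {Fraction(0)}, {Fraction(0), Fraction(1)}
for k in range(1, 6):
    A, B = T(A), T(B)
    print(k, len(B), h(A, B))
```

After $k$ steps, $T^k(\{0, 1\})$ is the set of $2^{k+1}$ endpoints of the $k$-th Cantor stage. $T^k(\{0\})$ is the set of left endpoints, so their Hausdorff distance is the interval length $3^{-k}$.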
4.14
Affine examples
4.14.1
$$T_1 : x \mapsto \frac{x}{3}, \qquad T_2 : x \mapsto \frac{x}{3} + \frac{2}{3}.$$
the limit of the sets $B_n$. But a point such as $.101$ is not in the limit of the $B_n$ and hence not in $C$. This description of $C$ is also due to Cantor. Notice that for any point $a$ with triadic expansion $a = .a_1a_2a_3\cdots$ we have
$$T_1 a = .0a_1a_2a_3\cdots,$$
while
$$T_2 a = .2a_1a_2a_3\cdots.$$
Thus if all the entries in the expansion of $a$ are either zero or two, this will also be true for $T_1 a$ and $T_2 a$. This shows that $C$ (given by this second Cantor description) satisfies $TC \subset C$. On the other hand,
$$T_1(.a_2a_3\cdots) = .0a_2a_3\cdots, \qquad T_2(.a_2a_3\cdots) = .2a_2a_3\cdots$$
4.14.2
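$$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \frac{1}{2}\begin{pmatrix} x \\ y \end{pmatrix} + \frac{1}{2}\begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$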
The fixed point of the Hutchinson operator for this choice of $T_1, T_2, T_3$ is called the Sierpinski gasket, $S$. Take our initial set $A_0$ to be the right triangle with vertices at
$$\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \text{and} \quad \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$
Then each of the $T_i A_0$ is a similar right triangle whose linear dimensions are one-half as large, and which shares one common vertex with the original triangle. In other words,
$$A_1 = T A_0$$
is obtained from our original triangle by deleting the interior of the (reversed) right triangle whose vertices are the midpoints of the sides of our original triangle. Just as in the case of the Cantor set, successive applications of $T$ to this choice of original set amount to successive deletions of "the middle". The Hausdorff limit is the intersection of all of them:
$$S = \bigcap_i A_i.$$
We can also start with the one element set
$$B_0 = \left\{\begin{pmatrix} 0 \\ 0 \end{pmatrix}\right\}.$$
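Starting from $B_0 = \{(0, 0)\}$, the iterates $B_n = T^n B_0$ are finite sets of grid points in the gasket. Only the map $v \mapsto v/2 + (0, 1/2)$ is displayed above. The sketch assumes the other two maps are $v \mapsto v/2$ and $v \mapsto v/2 + (1/2, 0)$, the standard choice for the triangle with vertices $(0,0)$, $(1,0)$, $(0,1)$. The names `MAPS`, `hutchinson` and `B` are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# The map v -> v/2 + (0, 1/2) is the one displayed above. The other two maps,
# v -> v/2 and v -> v/2 + (1/2, 0), are an assumption: the standard choice
# giving the right triangle with vertices (0, 0), (1, 0), (0, 1).
MAPS = (
    lambda p: (p[0] * HALF, p[1] * HALF),
    lambda p: (p[0] * HALF + HALF, p[1] * HALF),
    lambda p: (p[0] * HALF, p[1] * HALF + HALF),
)

def hutchinson(points):
    """T(A) = T1(A) u T2(A) u T3(A) for a finite set of points."""
    return {T(p) for T in MAPS for p in points}

def B(n):
    """B_n = T^n(B_0), starting from the one element set B_0 = {(0, 0)}."""
    points = {(Fraction(0), Fraction(0))}
    for _ in range(n):
        points = hutchinson(points)
    return points

B5 = B(5)
print(len(B5))   # 3**5 = 243 distinct points
# Each step prepends one binary digit pair (0,0), (1,0) or (0,1), so scaling by
# 2**5 gives integer pairs (i, j) whose binary digits never overlap: i & j == 0.
print(all((x * 32).denominator == 1 and int(x * 32) & int(y * 32) == 0 for x, y in B5))
```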
4.14.3
Moran's theorem
(4.28)
(4.29)
$$\frac{V r^d}{D^d}(\operatorname{diam} B)^d.$$
$$\frac{V r^d}{D^d}(\operatorname{diam} B)^d \le 2^d (\operatorname{diam} B)^d$$
or
$$\frac{2^d D^d}{V r^d}.$$
So any integer greater than the right hand side of this inequality (which is independent of $B$) will do.
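Now we turn to the proof of (4.29), which will then complete the proof of Moran's theorem. Let $B$ be a Borel subset of $K$. Then
$$B \subset \bigcup_{\alpha\in Q_B} \overline{O}_\alpha,$$
so
$$h^{-1}(B) \subset \bigcup_{\alpha\in Q_B} [\alpha].$$
Now
$$m([\alpha]) = (\operatorname{diam}[\alpha])^s = \left(\frac{1}{D}\operatorname{diam}(O_\alpha)\right)^s \le \left(\frac{1}{D}\right)^s (\operatorname{diam} B)^s$$
and so
$$m(h^{-1}(B)) \le \sum_{\alpha\in Q_B} m([\alpha]) \le N \frac{1}{D^s}(\operatorname{diam} B)^s.$$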
Chapter 5
$$\phi = \sum_{i=1}^{n} a_i 1_{A_i}, \qquad A_i \in \mathcal{F}. \qquad (5.1)$$
Then, for any $E \in \mathcal{F}$, we would like to define the integral of a simple function $\phi$ over $E$ as
$$\int_E \phi\, dm = \sum_{i=1}^{n} a_i\, m(A_i \cap E) \qquad (5.2)$$
and extend this definition by some sort of limiting process to a broader class of functions.
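As a concrete instance of (5.2), the sketch below takes $m$ to be Lebesgue measure on $\mathbb{R}$. Each $A_i$ and $E$ is assumed to be a bounded interval, so that $m(A_i \cap E)$ is just the length of an overlap. The helper names are hypothetical. The example also shows that the value does not depend on how $\phi$ is written. $2 \cdot 1_{[0,1]} + 3 \cdot 1_{[1/2, 2]}$ (overlapping sets) and $2 \cdot 1_{[0,1/2)} + 5 \cdot 1_{[1/2,1]} + 3 \cdot 1_{(1,2]}$ (disjoint sets) give the same integral over $[0, 1]$.

```python
from fractions import Fraction

def overlap(I, J):
    """Lebesgue measure of the intersection of two bounded intervals I = (a, b), J = (c, d)."""
    return max(Fraction(0), min(I[1], J[1]) - max(I[0], J[0]))

def integral_simple(terms, E):
    """Integral over E of phi = sum a_i 1_{A_i}, following (5.2): sum of a_i * m(A_i ∩ E).

    terms is a list of pairs (a_i, A_i); each A_i and E is taken to be a bounded
    interval and m is Lebesgue measure (an illustrative assumption)."""
    return sum(a * overlap(A, E) for a, A in terms)

phi = [(Fraction(2), (0, 1)), (Fraction(3), (Fraction(1, 2), 2))]
print(integral_simple(phi, (0, 1)))  # 2*1 + 3*(1/2) = 7/2
```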
I haven't yet specified what the range of the functions should be. Certainly, even to get started, we have to allow our functions to take values in a vector space over $\mathbb{R}$, in order that the expression on the right of (5.2) make sense. In fact, I will eventually allow $f$ to take values in a Banach space. However the theory is a bit simpler for real valued functions, where the linear order of the reals makes some arguments easier. Of course it would then be no problem to pass to any finite dimensional space over the reals. But we will on occasion need integrals in infinite dimensional Banach spaces, and that will require a little reworking of the theory.
5.1
Recall that if $(X, \mathcal{F})$ and $(Y, \mathcal{G})$ are spaces with $\sigma$-fields, then
$$f : X \to Y$$
is called measurable if
$$f^{-1}(E) \in \mathcal{F} \quad \forall\, E \in \mathcal{G}. \qquad (5.3)$$
Notice that the collection of subsets of $Y$ for which (5.3) holds is a $\sigma$-field. Hence if (5.3) holds for some collection $\mathcal{C}$, it holds for the $\sigma$-field generated by $\mathcal{C}$. For the next few sections we will take $Y = \mathbb{R}$ and $\mathcal{G} = \mathcal{B}$, the Borel field. Since the collection of open intervals on the line generates the Borel field, a real valued function $f : X \to \mathbb{R}$ is measurable if and only if
$$f^{-1}(I) \in \mathcal{F} \quad \text{for all open intervals } I.$$
Equally well, it is enough to check this for intervals of the form $(-\infty, a)$ for all real numbers $a$.
Proposition 5.1.1 If $F : \mathbb{R}^2 \to \mathbb{R}$ is a continuous function and $f, g$ are two measurable real valued functions on $X$, then $F(f, g)$ is measurable.
Proof. The set $F^{-1}((-\infty, a))$ is an open subset of the plane, and hence can be written as the countable union of products of open intervals $I \times J$. So if we set $h = F(f, g)$, then $h^{-1}((-\infty, a))$ is the countable union of the sets $f^{-1}(I) \cap g^{-1}(J)$ and hence belongs to $\mathcal{F}$. QED
From this elementary proposition we conclude that if $f$ and $g$ are measurable real valued functions then
$f + g$ is measurable (since $(x, y) \mapsto x + y$ is continuous),
$fg$ is measurable (since $(x, y) \mapsto xy$ is continuous), hence
$f 1_A$ is measurable for any $A \in \mathcal{F}$, hence
$f^+$ is measurable since $f^{-1}([0, \infty]) \in \mathcal{F}$, and similarly for $f^-$, so
$|f|$ is measurable and so is $|f - g|$. Hence
$f \wedge g$ and $f \vee g$ are measurable
and so on.
5.2
We are going to allow for the possibility that a function value or an integral might be infinite. We adopt the convention that
$$0 \cdot \infty = 0.$$
(5.4)
where
$$I(E, f) = \left\{\int_E \phi\, dm : 0 \le \phi \le f,\ \phi \text{ simple}\right\}. \qquad (5.5)$$
i,j
(5.8)
Proofs.
(5.9): $I(E, f) \subset I(E, g)$.
(5.10): If $\phi$ is a simple function with $\phi \le f$, then multiplying $\phi$ by $1_E$ gives a function which is still $\le f$ and is still a simple function. The set $I(E, f)$ is unchanged by considering only simple functions of the form $1_E \phi$, and these constitute all simple functions $\le 1_E f$.
(5.11): We have $1_E f \le 1_F f$ and we can apply (5.9) and (5.10).
(5.12): $I(E, af) = aI(E, f)$.
(5.13): In the definition (5.2) all the terms on the right vanish since $m(E \cap A_i) = 0$. So $I(E, f)$ consists of the single element $0$.
EF
EF
Z
1E f dm
1E gdm.
X
But
$$\int 1_E f\, dm + \int 1_{E^c} f\, dm = \int_E f\, dm + \int_{E^c} f\, dm = \int_X f\, dm$$
5.3
Fatou's lemma.
This says:
Theorem 5.3.1 If $\{f_n\}$ is a sequence of non-negative functions, then
$$\int \lim_{n\to\infty} \inf_{k\ge n} f_k\, dm \le \lim_{n\to\infty} \inf_{k\ge n} \int f_k\, dm. \qquad (5.17)$$
Recall that the limit inferior of a sequence of numbers $\{a_n\}$ is defined as follows. Set
$$b_n := \inf_{k\ge n} a_k,$$
so that the sequence $\{b_n\}$ is non-decreasing and hence has a limit (possibly infinite), which is defined as the lim inf. For a sequence of functions, $\liminf f_n$ is obtained by taking $\liminf f_n(x)$ for every $x$.
Consider the sequence of simple functions $\{1_{[n,n+1]}\}$. At each point $x$ the lim inf is $0$; in fact $1_{[n,n+1]}(x)$ becomes and stays $0$ as soon as $n > x$. Thus the left hand side of (5.17) is zero. The numbers which enter into the right hand side are all $1$, so the right hand side is $1$.
Similarly, if we take $f_n = n1_{(0,1/n]}$, the left hand side is $0$ and the right hand side is $1$. So without further assumptions we cannot expect equality in Fatou's lemma: the inequality can be strict.
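The second example can be checked mechanically. Each $f_n = n 1_{(0,1/n]}$ has integral $n \cdot (1/n) = 1$. At each fixed $x$, however, $f_k(x) = 0$ as soon as $k > 1/x$ (and for every $k$ when $x \le 0$), so $\liminf f_n = 0$ pointwise. The sketch below verifies both facts exactly at a grid of sample points, using rational arithmetic. The sample grid and the helper names are our own.

```python
from fractions import Fraction
import math

def f(n, x):
    """f_n = n * 1_{(0, 1/n]} evaluated at the point x."""
    return n if 0 < x <= Fraction(1, n) else 0

def integral_f(n):
    """Integral of the simple function f_n: the value n on a set of measure 1/n."""
    return n * Fraction(1, n)

def eventually_zero(x, upto=1000):
    """Check that f_k(x) = 0 for every k with 1/x < k <= upto (every k if x <= 0),
    so the sequence f_k(x) is eventually 0 and lim inf f_k(x) = 0."""
    first = 1 if x <= 0 else math.floor(1 / x) + 1
    return all(f(k, x) == 0 for k in range(first, upto + 1))

# Right hand side of (5.17): every integral is 1, so the lim inf is 1.
print(all(integral_f(n) == 1 for n in range(1, 100)))                   # True
# Left hand side: lim inf f_n(x) = 0 at every sampled x, so its integral is 0.
print(all(eventually_zero(Fraction(j, 97)) for j in range(-5, 98)))     # True
```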
Proof: Set
$$g_n := \inf_{k\ge n} f_k,$$
so that
$$g_n \le g_{n+1},$$
and set
$$f := \liminf f_n = \lim_{n\to\infty} g_n.$$
Let
$$\phi \le f$$
be a simple function. We must show that
$$\int \phi\, dm \le \lim_{n\to\infty} \inf_{k\ge n} \int f_k\, dm. \qquad (5.18)$$
so $m(D) = \infty$.
Choose some positive number $b$ less than all the positive values taken by $\phi$. This is possible since there are only finitely many such values.
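Let
$$D_n := \{x \mid g_n(x) > b\}.$$
Then $D_n \nearrow D$, since $b < \phi(x) \le \lim_n g_n(x)$ at each point of $D$. Hence $m(D_n) \to m(D) = \infty$. But
$$b\, m(D_n) \le \int_{D_n} g_n\, dm \le \int_{D_n} f_k\, dm \le \int f_k\, dm \quad \forall\, k \ge n,$$
so $\inf_{k\ge n} \int f_k\, dm \ge b\, m(D_n) \to \infty$, and hence $\liminf \int f_n\, dm = \infty$.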
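b) $m(\{x : \phi(x) > 0\}) < \infty$. Choose $\epsilon > 0$ less than the minimum of the positive values taken on by $\phi$, and set
$$\phi_\epsilon(x) = \begin{cases} \phi(x) - \epsilon & \text{if } \phi(x) > 0, \\ 0 & \text{if } \phi(x) = 0. \end{cases}$$
Let
$$C_n := \{x \mid g_n(x) \ge \phi_\epsilon(x)\}$$
and
$$C = \{x : f(x) \ge \phi_\epsilon(x)\}.$$
Then $C_n \nearrow C$. We have
$$\int_{C_n} \phi_\epsilon\, dm \le \int_{C_n} g_n\, dm \le \int_{C_n} f_k\, dm \le \int_C f_k\, dm \le \int f_k\, dm \quad \forall\, k \ge n.$$
So
$$\int_{C_n} \phi_\epsilon\, dm \le \liminf \int f_k\, dm.$$
since $(B_i \cap C_n) \nearrow B_i \cap C = B_i$. So
$$\int \phi_\epsilon\, dm \le \liminf \int f_k\, dm.$$
Now
$$\int \phi_\epsilon\, dm =$$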
5.4
The $f_n$ are increasing and all $\le f$, so the $\int f_n\, dm$ are monotone increasing and all $\le \int f\, dm$. So the limit exists and is $\le \int f\, dm$. On the other hand, Fatou's lemma gives
$$\int f\, dm \le \liminf \int f_n\, dm = \lim \int f_n\, dm.$$
QED
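A quick illustration of the monotone convergence theorem uses an example of our own choosing, not from the text. Take $f(x) = x^{-1/2}$ on $(0, 1]$ with Lebesgue measure, and truncate it to $f_n = \min(f, n)$. Then $f_n = n$ on $(0, 1/n^2]$ and $f_n = x^{-1/2}$ on $[1/n^2, 1]$. So $\int f_n = 1/n + (2 - 2/n) = 2 - 1/n$, which increases to $\int f = 2$ even though $f$ is unbounded. The sketch confirms the formula against a midpoint-rule estimate.

```python
def f(x):
    return x ** -0.5            # on (0, 1]: unbounded, but its integral is 2

def f_n(n, x):
    return min(f(x), n)         # f_1 <= f_2 <= ... and f_n(x) -> f(x) for x > 0

def exact_integral_f_n(n):
    # f_n = n on (0, 1/n**2], and x**-0.5 on [1/n**2, 1]
    return 1 / n + (2 - 2 / n)

def midpoint_integral(g, N=100_000):
    """Midpoint-rule approximation of the integral of g over [0, 1]."""
    h = 1.0 / N
    return h * sum(g((j + 0.5) * h) for j in range(N))

print(midpoint_integral(lambda x: f_n(5, x)), exact_integral_f_n(5))  # agree closely
for n in (1, 2, 10, 100, 1000):
    print(n, exact_integral_f_n(n))   # 2 - 1/n increases to 2
```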
In the monotone convergence theorem we need only know that
$$f_n \nearrow f \quad \text{a.e.}$$
Indeed, let $C$ be the set where convergence holds, so $m(C^c) = 0$. Let $g_n = 1_C f_n$ and $g = 1_C f$. Then
$$g_n \nearrow g \quad \text{everywhere},$$
so we may apply (5.19) to $g_n$ and $g$. But $\int g_n\, dm = \int f_n\, dm$ and $\int g\, dm = \int f\, dm$, so the theorem holds for $f_n$ and $f$ as well.
5.5
We will say an $\mathbb{R}$ valued measurable function is integrable if both $\int f^+\, dm < \infty$ and $\int f^-\, dm < \infty$. If this happens, we set
$$\int f\, dm := \int f^+\, dm - \int f^-\, dm. \qquad (5.20)$$
Since both numbers on the right are finite, this difference makes sense. Some authors prefer to allow one or the other of the numbers (but not both) to be infinite. In that case the right hand side of (5.20) might be $\infty$ or $-\infty$. We will stick with the above convention.
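We will denote the set of all (real valued) integrable functions by $L^1$ or $L^1(X)$ or $L^1(X, \mathbb{R})$, depending on how precise we want to be.
Notice that if $f \le g$ then $f^+ \le g^+$ and $f^- \ge g^-$, all of these functions being non-negative. So
$$\int f^+\, dm \le \int g^+\, dm, \qquad \int f^-\, dm \ge \int g^-\, dm,$$
hence
$$\int f^+\, dm - \int f^-\, dm \le \int g^+\, dm - \int g^-\, dm$$
or
$$f \le g \implies \int f\, dm \le \int g\, dm. \qquad (5.21)$$
$$\int af\, dm = a \int f\, dm. \qquad (5.22)$$
$$\int (f + g)\, dm = \int f\, dm + \int g\, dm. \qquad (5.23)$$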
$$\sum_i \sum_j a_i\, m(A_i \cap B_j) + \sum_i \sum_j b_j\, m(A_i \cap B_j) = \sum_i a_i\, m(A_i) + \sum_j b_j\, m(B_j) = \int f\, dm + \int g\, dm,$$
where we have used the fact that $m$ is additive, and that the $A_i \cap B_j$ are disjoint sets whose union over $j$ is $A_i$ and whose union over $i$ is $B_j$.
f dm is
We also have
Proposition 5.5.1 If h L1 and
R
A
$$\|f\|_1 := \int |f|\, dm$$
5.6
g L1 .
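Then
$$f_n \to f \text{ a.e.} \implies f \in L^1 \text{ and } \int f_n\, dm \to \int f\, dm.$$
Proof. The functions $f_n$ are all integrable, since their positive and negative parts are dominated by $g$. Assume for the moment that $f_n \ge 0$. Then Fatou's lemma says that
$$\int f\, dm \le \liminf \int f_n\, dm.$$
Applying Fatou's lemma to the non-negative functions $g - f_n$ gives
$$\int g\, dm - \int f\, dm \le \int g\, dm - \limsup \int f_n\, dm.$$
Subtracting $\int g\, dm$ gives
$$\limsup \int f_n\, dm \le \int f\, dm.$$
So
$$\limsup \int f_n\, dm \le \int f\, dm \le \liminf \int f_n\, dm,$$
and hence $\int f_n\, dm \to \int f\, dm$.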
5.7
Riemann integrability.
i = 1, . . . , n
ki mi
$k_i = \inf_{x\in I_i} f(x)$
Mi mi
$M_i := \sup_{x\in I_i} f(x)$
Suppose that $f$ is measurable. All the functions in the above inequality are Lebesgue integrable, so dominated convergence implies that
$$\lim U_n = \lim \int_a^b u_n\, dx = \int_a^b u\, dx,$$
where $u = \lim u_n$, with a similar equation for the lower bounds. The Riemann integral is defined as the common value of $\lim L_n$ and $\lim U_n$ whenever these limits are equal.
Proposition 5.7.1 A bounded function $f$ on $[a, b]$ is Riemann integrable if and only if $f$ is continuous almost everywhere.
Proof. Notice that if $x$ is not an endpoint of any interval in the partitions, then $f$ is continuous at $x$ if and only if $u(x) = \ell(x)$. Riemann's condition for integrability says that $\int (u - \ell)\, dm = 0$. Since $u \ge \ell$, this implies that $u = \ell$ a.e. The set of partition endpoints is countable, hence of measure zero, so $f$ is continuous almost everywhere.
Conversely, if $f$ is continuous a.e. then $u = f = \ell$ a.e. Since $u$ is measurable and Lebesgue measure is complete, $f$ is measurable. Since we are assuming that $f$ is bounded, we conclude that $f$ is Lebesgue integrable. As $\ell = f = u$ a.e., their Lebesgue integrals coincide. But the statement that the Lebesgue integral of $u$ is the same as that of $\ell$ is precisely the statement of Riemann integrability. QED
Notice that in the course of the proof we have also shown that the Lebesgue
and Riemann integrals coincide when both exist.
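For a continuous function the upper and lower sums close in on each other, as the proposition predicts. The sketch below uses our own example $f(x) = x^2$ on $[0, 1]$ with $N$ equal subintervals. Since $f$ is increasing there, the inf and sup on each subinterval are attained at its left and right endpoints. Exact arithmetic shows $U_N - L_N = 1/N$ and both sums tend to $1/3$.

```python
from fractions import Fraction

def darboux_sums(N):
    """Lower and upper sums L_N, U_N for f(x) = x**2 on [0, 1] with N equal
    subintervals. Since f is increasing there, the inf and sup on each
    subinterval are attained at its left and right endpoints."""
    h = Fraction(1, N)
    lower = sum((k * h) ** 2 * h for k in range(N))        # left endpoints
    upper = sum(((k + 1) * h) ** 2 * h for k in range(N))  # right endpoints
    return lower, upper

for N in (2, 4, 8, 16, 1024):
    L, U = darboux_sums(N)
    print(N, float(L), float(U), U - L)   # U - L = 1/N; both tend to 1/3
```

Doubling $N$ refines the partition, matching the nested partitions used in the proof.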
5.8
$$\int \sum_{n=1}^{\infty} g_n\, dm = \sum_{n=1}^{\infty} \int g_n\, dm.$$
Proof. We have
$$\int \sum_{k=1}^{n} g_k\, dm = \sum_{k=1}^{n} \int g_k\, dm$$
Then $\sum f_k(x)$ converges to a finite limit for almost all $x$, the sum is integrable, and
$$\int \sum_{k=1}^{\infty} f_k\, dm = \sum_{k=1}^{\infty} \int f_k\, dm.$$
If we set $g := \sum_{n=1}^{\infty} |f_n|$, then $\int g\, dm = \sum_{n=1}^{\infty} \int |f_n|\, dm$,
and we are assuming that this sum is finite. So $g$ is integrable; in particular the set of $x$ for which $g(x) = \infty$ must have measure zero. In other words,
$$\sum_{n=1}^{\infty} |f_n(x)| < \infty \quad \text{for almost every } x.$$
If a series is absolutely convergent, then it is convergent, so we can say that $\sum f_n(x)$ converges almost everywhere. Let
$$f(x) = \sum_{n=1}^{\infty} f_n(x)$$
at all points where the series converges, and set $f(x) = 0$ at all other points.
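Now
$$\left|\sum_{k=1}^{n} f_k(x)\right| \le g(x)$$
for every $n$, so by the dominated convergence theorem
$$\int f\, dm = \lim_{n\to\infty} \int \sum_{k=1}^{n} f_k\, dm = \lim_{n\to\infty} \sum_{k=1}^{n} \int f_k\, dm = \sum_{k=1}^{\infty} \int f_k\, dm.$$
QED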
5.9
L1 is complete.
$$\frac{1}{2}, \quad n \ge n_1.$$
$$\frac{1}{2^2}, \quad n \ge n_2.$$
$$\frac{1}{2^j}.$$
Let
$$f_j := h_{n_{j+1}} - h_{n_j}.$$
Then
$$\int |f_j|\, dm < \frac{1}{2^j},$$
so $\sum f_j$ converges, and
$$\sum_{j=1}^{k} f_j = h_{n_{k+1}} - h_{n_1}.$$
5.10
$$\int f\, dm - \int \phi\, dm = \int (f - \phi)\, dm = \|f - \phi\|_1 < \epsilon,$$
$$\phi = \sum a_i 1_{A_i}$$
(finite sum), where each of the $a_i > 0$, and since $\int \phi\, dm < \infty$ each $A_i$ has finite measure. Since $m(A_i \cap [-n, n]) \to m(A_i)$ as $n \to \infty$, we may choose $n$ sufficiently large so that
$$\|f - \psi\|_1 < 2\epsilon \quad \text{where} \quad \psi = \sum a_i 1_{A_i \cap [-n, n]}.$$
For each of the sets $A_i \cap [-n, n]$ we can find a bounded open set $U_i$ which contains it, and such that $m(U_i \setminus (A_i \cap [-n, n]))$ is as small as we please. So we can find finitely many bounded open sets $U_i$ such that $\|f - \sum a_i 1_{U_i}\|_1$ is as small as we please.
Each $U_i$ is a countable union of disjoint open intervals, $U_i = \bigcup_j I_{i,j}$. Since $m(U_i) = \sum_j m(I_{i,j})$, we can find finitely many $I_{i,j}$, with $j$ ranging over a finite set of integers $J_i$, such that $m\left(\bigcup_{j\in J_i} I_{i,j}\right)$ is as close as we like to $m(U_i)$. So let us call a step function a function of the form $\sum b_i 1_{I_i}$, where the $I_i$ are bounded intervals. We have shown that we can find a step function with positive coefficients which is as close as we like in the $\|\cdot\|_1$ norm to $f$. If $f$ is not necessarily non-negative, we know (by definition!) that $f^+$ and $f^-$ are in $L^1$, and so we can approximate each by a step function. The triangle inequality then gives
Proposition 5.10.1 The step functions are dense in L1 (R, R).
If $[a, b]$, $a < b$, is a finite interval, we can approximate $1_{[a,b]}$ as closely as we like in the $\|\cdot\|_1$ norm by continuous functions. Choose $n$ large enough so that $2/n < b - a$, and take the following function:
it is $0$ for $x < a$,
it rises linearly from $0$ to $1$ on $[a, a + 1/n]$,
it is identically $1$ on $[a + 1/n, b - 1/n]$,
it goes down linearly from $1$ to $0$ on $[b - 1/n, b]$, and
it stays $0$ thereafter.
This function differs from $1_{[a,b]}$ only on the two ramps. There the difference fills two triangles of base $1/n$ and height $1$, so its $\|\cdot\|_1$ distance from $1_{[a,b]}$ is $1/n$, which tends to $0$ as $n \to \infty$. So
Proposition 5.10.2 The continuous functions of compact support are dense in $L^1(\mathbb{R}, \mathbb{R})$.
As a consequence of this proposition, we see that we could have avoided all of measure theory if our sole purpose was to define the space $L^1(\mathbb{R}, \mathbb{R})$. We could have defined it to be the completion of the space of continuous functions of compact support relative to the $\|\cdot\|_1$ norm.
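The trapezoid construction above is easy to check numerically. The sketch builds the continuous approximation of $1_{[a,b]}$ described in the text and estimates its $L^1$ distance from $1_{[a,b]}$ by the midpoint rule; the estimate agrees with the exact value $1/n$. The names `trapezoid` and `l1_distance` are hypothetical, and the midpoint rule is just a convenient quadrature.

```python
def trapezoid(a, b, n, x):
    """Continuous approximation of 1_[a,b]: 0 outside (a, b), rising linearly on
    [a, a + 1/n], equal to 1 on [a + 1/n, b - 1/n], falling linearly on [b - 1/n, b].
    Assumes 2/n < b - a."""
    if x <= a or x >= b:
        return 0.0
    return min(1.0, n * (x - a), n * (b - x))

def l1_distance(a, b, n, N=100_000):
    """Midpoint-rule estimate of the integral of |1_[a,b] - trapezoid| over [a, b];
    outside [a, b] both functions vanish, so nothing else contributes."""
    h = (b - a) / N
    return h * sum(abs(1.0 - trapezoid(a, b, n, a + (j + 0.5) * h)) for j in range(N))

for n in (4, 10, 100):
    print(n, l1_distance(0.0, 1.0, n), 1 / n)   # the two columns agree
```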
5.11
We will state and prove this in the generalized form. Let $h$ be a bounded measurable function on $\mathbb{R}$. We say that $h$ satisfies the averaging condition if
$$\lim_{|c|\to\infty} \frac{1}{|c|} \int_0^c h\, dm = 0. \qquad (5.25)$$
For example, if $h(t) = \cos \xi t$, $\xi \ne 0$, then the expression under the limit sign in the averaging condition is
$$\frac{\sin \xi c}{\xi |c|},$$
which tends to zero as $|c| \to \infty$. Here the oscillations in $h$ are what give rise to the averaging condition. As another example, let
$$h(t) = \begin{cases} 1 & |t| \le 1, \\ 1/|t| & |t| \ge 1. \end{cases}$$
Then the left hand side of (5.25) is, in absolute value,
$$\frac{1}{|c|}(1 + \log |c|), \qquad |c| \ge 1.$$
Here the averaging condition is satisfied because the integral in (5.25) grows more slowly than $|c|$.
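Both examples can be evaluated in closed form from an antiderivative $H$ of $h$ with $H(0) = 0$. The averaging quantity in (5.25) is then $H(c)/|c|$. The sketch below takes $\xi = 2$, an arbitrary choice. It tabulates both averages for growing $|c|$ and includes a midpoint-rule check that the piecewise antiderivative of the second example is correct. Function names are hypothetical.

```python
import math

def average(H, c):
    """(1/|c|) * integral_0^c h, given an antiderivative H of h with H(0) = 0."""
    return H(c) / abs(c)

xi = 2.0

def H_cos(c):
    # antiderivative of h(t) = cos(xi t), vanishing at 0
    return math.sin(xi * c) / xi

def h_recip(t):
    # h(t) = 1 for |t| <= 1 and 1/|t| for |t| >= 1
    return 1.0 if abs(t) <= 1 else 1 / abs(t)

def H_recip(c):
    # antiderivative of h_recip vanishing at 0; it is an odd function of c
    if abs(c) <= 1:
        return c
    return math.copysign(1 + math.log(abs(c)), c)

for c in (10.0, 1e3, 1e6, -1e6):
    print(c, average(H_cos, c), average(H_recip, c))   # both columns tend to 0
```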
Theorem 5.11.1 [Generalized Riemann-Lebesgue Lemma]. Let $f \in L^1([c, d], \mathbb{R})$, $-\infty \le c < d \le \infty$. If $h$ satisfies the averaging condition (5.25) then
$$\lim_{r\to\infty} \int_c^d f(t) h(rt)\, dt = 0. \qquad (5.26)$$
Proof. Our proof will use the density of step functions, Proposition 5.10.1. We first prove the theorem when $f = 1_{[a,b]}$ is the indicator function of a finite interval. Suppose for example that $0 \le a < b$. Then the integral in (5.26) is
$$\int 1_{[a,b]}\, h(rt)\, dt = \int_a^b h(rt)\, dt,$$
or, setting $x = rt$,
$$\frac{1}{r} \int_0^{br} h(x)\, dx - \frac{1}{r} \int_0^{ra} h(x)\, dx,$$
and each of these terms tends to $0$ by hypothesis. The same argument will work for any bounded interval $[a, b]$; we will get a sum or difference of terms as above. So we have proved (5.26) for indicator functions of intervals and hence for step functions.
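Now let $M$ be such that $|h| \le M$ everywhere (or almost everywhere), fix $\epsilon > 0$, and choose a step function $s$ so that
$$\|f - s\|_1 \le \frac{\epsilon}{2M}.$$
Then $fh = (f - s)h + sh$, so
$$\left|\int f(t) h(rt)\, dt\right| = \left|\int (f(t) - s(t)) h(rt)\, dt + \int s(t) h(rt)\, dt\right|$$
$$\le \left|\int (f(t) - s(t)) h(rt)\, dt\right| + \left|\int s(t) h(rt)\, dt\right|$$
$$\le \frac{\epsilon}{2M} M + \left|\int s(t) h(rt)\, dt\right|.$$
Since (5.26) holds for the step function $s$, the last term is less than $\epsilon/2$ for all sufficiently large $r$, and then $\left|\int f(t) h(rt)\, dt\right| < \epsilon$. QED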
5.11.1
2
This says:
Theorem 5.11.2 If a trigonometric series
$$\frac{a_0}{2} + \sum_n d_n \cos(nt - \phi_n), \qquad d_n \in \mathbb{R},$$
E k
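But
$$\cos^2(n_k t - \phi_k) = \frac{1}{2}\left[1 + \cos 2(n_k t - \phi_k)\right],$$
so
$$\int_E \cos^2(n_k t - \phi_k)\, dt = \frac{1}{2} \int_E \left[1 + \cos 2(n_k t - \phi_k)\right] dt$$
$$= \frac{1}{2} m(E) + \frac{1}{2} \int_E \cos 2(n_k t - \phi_k)\, dt$$
$$= \frac{1}{2} m(E) + \frac{1}{2} \int_{\mathbb{R}} 1_E \cos 2(n_k t - \phi_k)\, dt.$$
But $1_E \in L^1(\mathbb{R}, \mathbb{R})$, so the second term on the last line goes to $0$ by the Riemann-Lebesgue Lemma. So the limit is $\frac{1}{2} m(E)$ instead of $0$, a contradiction. QED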
5.12
Fubini's theorem.
This famous theorem asserts that, under suitable conditions, a double integral is equal to an iterated integral. We will prove it for real (and hence finite dimensional) valued functions on $\sigma$-finite measure spaces. (The proof for Banach space valued functions is a bit more tricky, and we shall omit it as we will not need it. This is one of the reasons why we have developed the real valued theory first.) We begin with some facts about product $\sigma$-fields.
5.12.1
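Product $\sigma$-fields.
Let $(X, \mathcal{F})$ and $(Y, \mathcal{G})$ be spaces with $\sigma$-fields. On $X \times Y$ we can consider the collection $\mathcal{P}$ of all sets of the form
$$A \times B, \qquad A \in \mathcal{F},\ B \in \mathcal{G}.$$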
B G.
In other words, an element of P is a cylinder set when one of the factors is the
whole space.
Theorem 5.12.1 .
$\mathrm{pr}_X(x, y) = x$
$\mathrm{pr}_Y(x, y) = y$
QED
5.12.2
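From items 1) and 3) we see that a $\lambda$-system is closed under complementation, and since $\emptyset = X^c$ it contains the empty set. If $\mathcal{B}$ is both a $\pi$-system and a $\lambda$-system, it is closed under any finite union, since $A \cup B = A \cup (B \setminus (A \cap B))$, which is a disjoint union. Any countable union can be written in the form $A = \bigcup A_n$ with $A_n \nearrow A$, where the $A_n$ are finite unions, as we have already argued. So we have proved
Proposition 5.12.1 If $\mathcal{H}$ is both a $\pi$-system and a $\lambda$-system then it is a $\sigma$-field.
Also, we have
Proposition 5.12.2 [Dynkin's lemma.] If $\mathcal{C}$ is a $\pi$-system, then the $\sigma$-field generated by $\mathcal{C}$ is the smallest $\lambda$-system containing $\mathcal{C}$.
Proof. Let $\mathcal{M}$ be the $\sigma$-field generated by $\mathcal{C}$, and $\mathcal{H}$ the smallest $\lambda$-system containing $\mathcal{C}$. So $\mathcal{M} \supset \mathcal{H}$. By the preceding proposition, all we need to do is show that $\mathcal{H}$ is a $\pi$-system.
Let
$$\mathcal{H}_1 := \{A \mid A \cap C \in \mathcal{H}\ \forall\, C \in \mathcal{C}\}.$$
Clearly $\mathcal{H}_1$ is a $\lambda$-system containing $\mathcal{C}$, so $\mathcal{H} \subset \mathcal{H}_1$. This means that $A \cap C \in \mathcal{H}$ for all $A \in \mathcal{H}$ and $C \in \mathcal{C}$.
Let
$$\mathcal{H}_2 := \{A \mid A \cap H \in \mathcal{H}\ \forall\, H \in \mathcal{H}\}.$$
$\mathcal{H}_2$ is again a $\lambda$-system, and it contains $\mathcal{C}$ by what we have just proved. So $\mathcal{H}_2 \supset \mathcal{H}$, which means that the intersection of two elements of $\mathcal{H}$ is again in $\mathcal{H}$, i.e. $\mathcal{H}$ is a $\pi$-system. QED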
5.12.3
$$\sum_{i=0}^{K2^n} \frac{i}{2^n} 1_{A(n,i)}.$$
5.12.4
Let $(X, \mathcal{F}, m)$ and $(Y, \mathcal{G}, n)$ be measure spaces with $m(X) < \infty$ and $n(Y) < \infty$. For every bounded $\mathcal{F} \otimes \mathcal{G}$-measurable function $f$, we know that the function
$$f(x, \cdot) : y \mapsto f(x, y)$$
which is a function of y.
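Proposition 5.12.4 Let $\mathcal{B}$ denote the space of bounded $\mathcal{F} \otimes \mathcal{G}$ measurable functions such that
$\int_Y f(x, y)\, n(dy)$ is an $\mathcal{F}$ measurable function on $X$,
$\int_X f(x, y)\, m(dx)$ is a $\mathcal{G}$ measurable function on $Y$, and
$$\int_X \left(\int_Y f(x, y)\, n(dy)\right) m(dx) = \int_Y \left(\int_X f(x, y)\, m(dx)\right) n(dy). \qquad (5.27)$$
(5.28)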
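both sides being equal on account of the preceding proposition. This measure assigns the value $m(A)n(B)$ to any set $A \times B \in \mathcal{P}$. Since $\mathcal{P}$ is closed under finite intersections and generates $\mathcal{F} \otimes \mathcal{G}$ as a $\sigma$-field, Dynkin's lemma shows that any two finite measures which agree on $\mathcal{P}$ must agree on $\mathcal{F} \otimes \mathcal{G}$. Hence $m \times n$ is the unique measure which assigns the value $m(A)n(B)$ to sets of $\mathcal{P}$.
Furthermore, we know that
$$\int_{X\times Y} f(x, y)\, d(m \times n) = \int_X \left(\int_Y f(x, y)\, n(dy)\right) m(dx) = \int_Y \left(\int_X f(x, y)\, m(dx)\right) n(dy) \qquad (5.29)$$
is true for functions of the form $1_{A\times B}$, and hence by the monotone class theorem it is true for all bounded functions which are measurable relative to $\mathcal{F} \otimes \mathcal{G}$.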
The above assertions are the content of Fubini's theorem for bounded measures and functions. We summarize:
Theorem 5.12.3 Let $(X, \mathcal{F}, m)$ and $(Y, \mathcal{G}, n)$ be measure spaces with $m(X) < \infty$ and $n(Y) < \infty$. There exists a unique measure on $\mathcal{F} \otimes \mathcal{G}$ with the property that
$$(m \times n)(A \times B) = m(A)n(B) \quad \forall\, A \times B \in \mathcal{P}.$$
For any bounded $\mathcal{F} \otimes \mathcal{G}$ measurable function, the double integral is equal to the iterated integral in the sense that (5.29) holds.
5.12.5
Suppose that we temporarily keep the condition that $m(X) < \infty$ and $n(Y) < \infty$. Let $f$ be any non-negative $\mathcal{F} \otimes \mathcal{G}$-measurable function. We know that (5.29) holds for all bounded measurable functions, in particular for all simple functions. We know that we can find a sequence of simple functions $s_n$ such that $s_n \nearrow f$. Hence by several applications of the monotone convergence theorem, (5.29) is true for all non-negative $\mathcal{F} \otimes \mathcal{G}$-measurable functions, in the sense that all three terms are infinite together, or finite together and equal. Now we have agreed to call an $\mathcal{F} \otimes \mathcal{G}$-measurable function $f$ integrable if and only if $f^+$ and $f^-$ have finite integrals. In this case (5.29) holds.
A measure space $(X, \mathcal{F}, m)$ is called $\sigma$-finite if $X = \bigcup_n X_n$ where $m(X_n) < \infty$. In other words, $X$ is $\sigma$-finite if it is a countable union of finite measure spaces. As usual, we can then write $X$ as a countable union of disjoint finite measure spaces. So if $X$ and $Y$ are $\sigma$-finite, we can write the various integrals that occur in (5.29) as sums of integrals over finite measure spaces. A bit of standard argumentation shows that Fubini continues to hold in this case.
Fubini need not hold if $X$ or $Y$ is not $\sigma$-finite. It also need not hold, even in the finite case, if $f$ is neither non-negative nor $m \times n$ integrable. I hope to present the standard counter-examples in the problem set.
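One well-known example shows why integrability matters. It is not necessarily one of the problem-set examples the text has in mind. Take counting measure on $\mathbb{N}$ in each factor (a $\sigma$-finite space) and let $a(i, j) = 1$ if $j = i$, $-1$ if $j = i + 1$, and $0$ otherwise. Every row sums to $0$, while the column sums are $1, 0, 0, \ldots$. So the two iterated sums are $0$ and $1$. Here $\sum |a(i, j)| = \infty$, so $a$ is not $m \times n$ integrable. The sketch computes the exact row and column sums. It then contrasts this with bounded functions on finite measure spaces, where Theorem 5.12.3 guarantees agreement. The particular weights and test function are our own choices.

```python
from fractions import Fraction

def a(i, j):
    """+1 on the diagonal of N x N, -1 just above it, 0 elsewhere."""
    if j == i:
        return 1
    if j == i + 1:
        return -1
    return 0

def row_sum(i):
    # sum over all j >= 0; only j = i and j = i + 1 are non-zero
    return a(i, i) + a(i, i + 1)

def column_sum(j):
    # sum over all i >= 0; only i = j and (for j >= 1) i = j - 1 are non-zero
    return a(j, j) + (a(j - 1, j) if j >= 1 else 0)

N = 1000
print(sum(row_sum(i) for i in range(N)))     # 0 for every N: sum_i sum_j a = 0
print(sum(column_sum(j) for j in range(N)))  # 1 for every N: sum_j sum_i a = 1

def iterated_sums(f, mx, ny):
    """Both iterated integrals of f for finite measures given as {point: mass}."""
    x_first = sum(mx[x] * sum(ny[y] * f(x, y) for y in ny) for x in mx)
    y_first = sum(ny[y] * sum(mx[x] * f(x, y) for x in mx) for y in ny)
    return x_first, y_first

mx = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
ny = {0: Fraction(2, 5), 5: Fraction(3, 5)}
print(iterated_sums(lambda x, y: (x - y) ** 2 - 3 * x * y, mx, ny))  # an equal pair
```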
Chapter 6
6.1
Let $L$ be a vector space of bounded real valued functions on a set $S$ closed under $\wedge$ and $\vee$. For example, $S$ might be a complete metric space, and $L$ might be the space of continuous functions of compact support on $S$.
A map
$$I : L \to \mathbb{R}$$
is called an Integral if
1. $I$ is linear: $I(af + bg) = aI(f) + bI(g)$.
2. $I$ is non-negative: $f \ge 0 \implies I(f) \ge 0$, or equivalently $f \ge g \implies I(f) \ge I(g)$.
3. $f_n \searrow 0 \implies I(f_n) \searrow 0$.
For example, we might take $S = \mathbb{R}^n$, $L$ = the space of continuous functions of compact support on $\mathbb{R}^n$, and $I$ to be the Riemann integral. The first two items on the above list are clearly satisfied. As to the third, we recall Dini's lemma from the notes on metric spaces. It says that a sequence $\{f_n\}$ of continuous functions of compact support on a metric space which satisfies $f_n \searrow 0$ actually converges uniformly to $0$. Furthermore the supports of the $f_n$ are all contained in a fixed compact set, for example the support of $f_1$. This establishes the third item.
The plan is now to successively increase the class of functions on which the integral is defined.
Define
$$U := \{\text{limits of monotone non-decreasing sequences of elements of } L\}.$$
We will use the word "increasing" as synonymous with "monotone non-decreasing" so as to simplify the language.
Lemma 6.1.1 If $f_n$ is an increasing sequence of elements of $L$ and if $k \in L$ satisfies $k \le \lim f_n$ then $\lim I(f_n) \ge I(k)$.
Proof. If $k \in L$ and $\lim f_n \ge k$, then
$$f_n \wedge k \le k \quad \text{and} \quad f_n \ge f_n \wedge k,$$
so $I(f_n) \ge I(f_n \wedge k)$, while
$$[k - (f_n \wedge k)] \searrow 0,$$
so
$$I(k - f_n \wedge k) \searrow 0$$
by 3), or
$$I(f_n \wedge k) \nearrow I(k).$$
Hence $\lim I(f_n) \ge \lim I(f_n \wedge k) = I(k)$. QED
Lemma 6.1.2 [12C] If $\{f_n\}$ and $\{g_n\}$ are increasing sequences of elements of $L$ and $\lim g_n \le \lim f_n$ then $\lim I(g_n) \le \lim I(f_n)$.
Proof. Fix $m$ and take $k = g_m$ in the previous lemma. Then $I(g_m) \le \lim I(f_n)$. Now let $m \to \infty$. QED
Thus
$$f_n \nearrow f \text{ and } g_n \nearrow f \implies \lim I(f_n) = \lim I(g_n),$$
so we may extend $I$ to $U$ by setting
$$I(f) := \lim I(f_n) \quad \text{for } f_n \nearrow f.$$
If $f \in L$, this coincides with our original $I$, since we can take $g_n = f$ for all $n$ in the preceding lemma.
We have now extended $I$ from $L$ to $U$. The next lemma shows that if we now start with $I$ on $U$ and apply the same procedure again, we do not get any further.
Lemma 6.1.3 [12D] If $f_n \in U$ and $f_n \nearrow f$ then $f \in U$ and $I(f_n) \nearrow I(f)$.
for $i \le n$.
Let $n \to \infty$. Then
$$f_i \le \lim h_n \le f.$$
Now let $i \to \infty$. We get
$$f \le \lim h_n \le f.$$
So we have written $f$ as a limit of an increasing sequence of elements of $L$, so $f \in U$. Also
$$I(g_{in}) \le I(h_n) \le I(f_n),$$
so letting $n \to \infty$ we get
$$I(f_i) \le I(f) \le \lim I(f_n),$$
and passing to the limit in $i$ gives $I(f) = \lim I(f_n)$. QED
We have
$$I(f + g) = I(f) + I(g) \quad \text{for } f, g \in U.$$
Define
$$-U := \{-f \mid f \in U\}$$
and
$$I(f) := -I(-f) \quad \forall\, f \in -U.$$
n
X
hi
and
n
X
.
2n
I(hi ) I(fn ) + .
i=1
hi U,
f h
i=1
Since $f_m \in L^1$ we can find a $g_m \in -U$ with $I(f_m) - I(g_m) < \epsilon$, and hence for $m$ large enough $I(h) - I(g_m) < 2\epsilon$. So $f \in L^1$ and $I(f) = \lim I(f_n)$. QED
6.2
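A collection of functions which is closed under monotone increasing and monotone decreasing limits is called a monotone class. $\mathcal{B}$ is defined to be the smallest monotone class containing $L$.
Lemma 6.2.1 Let $h \le k$. If $\mathcal{M}$ is a monotone class which contains $(g \vee h) \wedge k$ for every $g \in L$, then $\mathcal{M}$ contains all $(f \vee h) \wedge k$ for all $f \in \mathcal{B}$.
Proof. The set of $f$ such that $(f \vee h) \wedge k \in \mathcal{M}$ is a monotone class containing $L$ by the distributive laws. QED
Taking $h = k = 0$ this says that the smallest monotone class containing $L^+$, the set of non-negative functions in $L$, is the set $\mathcal{B}^+$, the set of non-negative functions in $\mathcal{B}$.
Here is a series of monotone class theorem style arguments:
Theorem 6.2.1 $f, g \in \mathcal{B} \implies af + bg \in \mathcal{B}$, $f \vee g \in \mathcal{B}$ and $f \wedge g \in \mathcal{B}$.
Proof. For $f \in \mathcal{B}$, let
$$\mathcal{M}(f) := \{g \in \mathcal{B} \mid f + g,\ f \vee g,\ f \wedge g \in \mathcal{B}\}.$$
$\mathcal{M}(f)$ is a monotone class. If $f \in L$ it includes all of $L$, hence all of $\mathcal{B}$. But
$$g \in \mathcal{M}(f) \iff f \in \mathcal{M}(g).$$
So $L \subset \mathcal{M}(g)$ for any $g \in \mathcal{B}$, and since it is a monotone class, $\mathcal{B} \subset \mathcal{M}(g)$. This says that $f, g \in \mathcal{B} \implies f + g \in \mathcal{B}$, $f \wedge g \in \mathcal{B}$ and $f \vee g \in \mathcal{B}$. Similarly, let $\mathcal{M}$ be the class of functions for which $cf \in \mathcal{B}$ for all real $c$. This is a monotone class containing $L$, hence contains $\mathcal{B}$. QED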
6.3
Measure.
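$$f_n(x) = \begin{cases} 1 & \text{if } f(x) \ge a + \frac{1}{n}, \\ n(f(x) - a) & \text{if } a < f(x) < a + \frac{1}{n}, \\ 0 & \text{if } f(x) \le a. \end{cases}$$
We have
$$f_n \nearrow 1_{A_a},$$
so $1_{A_a} \in \mathcal{B}$ and $0 \le 1_{A_a} \le a^{-1} f^+$. QED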
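Theorem 6.3.2 If $f \ge 0$ and $A_a$ is integrable for all $a > 0$ then $f \in \mathcal{B}$.
Proof. For $\delta > 1$ define
$$A_m := \{x \mid \delta^m < f(x) \le \delta^{m+1}\}$$
for $m \in \mathbb{Z}$ and
$$f_\delta := \sum_m \delta^m 1_{A_m}.$$
Each $f_\delta \in \mathcal{B}$. Take
$$\delta_n = 2^{2^{-n}}.$$
Then each successive subdivision divides the previous one into "octaves" and $f_{\delta_m} \nearrow f$. QED
Also
$$f_\delta \le f \le \delta f_\delta$$
and
$$I(f_\delta) = \sum_m \delta^m \mu(A_m) = \int f_\delta\, d\mu.$$
So we have
$$I(f_\delta) \le I(f) \le \delta I(f_\delta)$$
and
$$\int f_\delta\, d\mu \le \int f\, d\mu \le \delta \int f_\delta\, d\mu.$$
So if either of $I(f)$ or $\int f\, d\mu$ is finite they both are, and
$$\left|I(f) - \int f\, d\mu\right| \le (\delta - 1) I(f_\delta) \le (\delta - 1) I(f).$$
So
$$\int f\, d\mu = I(f).$$
If $f \in \mathcal{B}^+$ and $a > 0$ then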
1
6.4
Hölder, Minkowski, $L^p$ and $L^q$.
$$\frac{a^p}{p}$$
while the area between the same curve and the $y$-axis up to $y = b$ is
$$B = \frac{b^q}{q}.$$
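Suppose $b < a^{p-1}$ to fix the ideas. Then the area $ab$ of the rectangle is less than $A + B$, or
$$ab \le \frac{a^p}{p} + \frac{b^q}{q}.$$
Replacing $a$ by $a^{1/p}$ and $b$ by $b^{1/q}$, this becomes
$$a^{1/p} b^{1/q} \le \frac{a}{p} + \frac{b}{q}.$$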
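$$\|f\|_p := \left(\int |f|^p\, d\mu\right)^{1/p}.$$
Now set
$$a = \frac{|f|^p}{\|f\|_p^p}, \qquad b = \frac{|g|^q}{\|g\|_q^q}$$
as functions. Then
$$\int |f||g|\, d\mu \le \|f\|_p \|g\|_q \left(\frac{1}{p}\,\frac{1}{\|f\|_p^p}\int |f|^p\, d\mu + \frac{1}{q}\,\frac{1}{\|g\|_q^q}\int |g|^q\, d\mu\right) = \|f\|_p \|g\|_q.$$
This shows that the left hand side is integrable and that
$$\left|\int fg\, d\mu\right| \le \|f\|_p \|g\|_q, \qquad (6.1)$$
which is known as Hölder's inequality. (If either $\|f\|_p$ or $\|g\|_q$ is $0$ then $fg = 0$ a.e. and Hölder's inequality is trivial.)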
We write
$$(f, g) := \int fg\,d\mu.$$
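Young's inequality, Hölder's inequality (6.1) and the Minkowski inequality that follows can all be tested on a finite measure space. This is a minimal sketch, assuming the measure puts mass `w[i]` at point `i` (so integrals become weighted sums); the function names are illustrative. It also checks the equality case $b = a^{p-1}$ of Young's inequality and that Hölder is sharp for $g = \operatorname{sgn}(f)|f|^{p-1}$.

```python
import random


def lp_norm(f, w, p):
    """||f||_p for the measure putting mass w[i] at point i."""
    return sum(abs(fi) ** p * wi for fi, wi in zip(f, w)) ** (1.0 / p)


def pairing(f, g, w):
    """(f, g) = integral of f * g d(mu)."""
    return sum(fi * gi * wi for fi, gi, wi in zip(f, g, w))


def check_inequalities(p, n=50, seed=0):
    """Return (young_ok, holder_ok, minkowski_ok) on random data for exponent p > 1."""
    rng = random.Random(seed)
    q = p / (p - 1.0)                      # conjugate exponent: 1/p + 1/q = 1
    w = [rng.uniform(0.1, 2.0) for _ in range(n)]
    f = [rng.gauss(0.0, 1.0) for _ in range(n)]
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    tol = 1e-9                             # slack for floating-point rounding
    young_ok = all(
        abs(a) * abs(b) <= abs(a) ** p / p + abs(b) ** q / q + tol
        for a, b in zip(f, g)
    )
    holder_ok = abs(pairing(f, g, w)) <= lp_norm(f, w, p) * lp_norm(g, w, q) + tol
    f_plus_g = [a + b for a, b in zip(f, g)]
    minkowski_ok = lp_norm(f_plus_g, w, p) <= lp_norm(f, w, p) + lp_norm(g, w, p) + tol
    return young_ok, holder_ok, minkowski_ok


if __name__ == "__main__":
    for p in (1.5, 2.0, 3.0):
        print(p, check_inequalities(p))
```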
Proposition 6.4.1 [Minkowski's inequality] If $f, g \in L^p$, $p \ge 1$ then $f + g \in L^p$ and
$$\|f + g\|_p \le \|f\|_p + \|g\|_p.$$
For $p = 1$ this is obvious. If $p > 1$ then $(p-1)q = p$ since $q = p/(p-1)$,
so
$$|f + g|^{p-1} \in L^q$$
and its $\|\cdot\|_q$ norm is
$$\left(\int |f + g|^p\,d\mu\right)^{\frac{p-1}{p}} = \|f + g\|_p^{p-1}.$$
Set
$$g_n := f_n - \sum_{i=n}^{\infty} |f_{i+1} - f_i| \in L^p \quad\text{and}\quad h_n := f_n + \sum_{i=n}^{\infty} |f_{i+1} - f_i| \in L^p.$$
We have
$$g_{n+1} - g_n = f_{n+1} - f_n + |f_{n+1} - f_n| \ge 0$$
so $g_n$ is increasing and similarly $h_n$ is decreasing. Hence $f := \lim g_n \in L^p$ and
$\|f - f_n\|_p \le \|h_n - g_n\|_p \le 2^{-n+2} \to 0$. So the subsequence has a limit which
then must be the limit of the original sequence. QED
Proposition 6.4.2 $L$ is dense in $L^p$ for any $1 \le p < \infty$.
Proof. For $p = 1$ this was a defining property of $L^1$. More generally, suppose
that $f \in L^p$ and that $f \ge 0$. Let
$$A_n := \left\{x : \frac{1}{n} < f(x) < n\right\},$$
and let
$$g_n := f 1_{A_n}.$$
Then $(f - g_n) \searrow 0$ as $n \to \infty$. Choose $n$ sufficiently large so that $\|f - g_n\|_p < \varepsilon/2$.
Since
$$0 \le g_n \le n 1_{A_n} \quad\text{and}\quad \mu(A_n) < n^p I(|f|^p) < \infty$$
we conclude that
$$g_n \in L^1.$$
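Now choose $h \in L^+$ so that
$$\|h - g_n\|_1 < \left(\frac{\varepsilon}{2n}\right)^p.$$
Then
$$\left(I(|h - g_n|^p)\right)^{1/p} = \left(I(|h - g_n|^{p-1}|h - g_n|)\right)^{1/p} \le \left(I(n^{p-1}|h - g_n|)\right)^{1/p} = \left(n^{p-1}\|h - g_n\|_1\right)^{1/p} < \varepsilon/2,$$
so $\|f - h\|_p < \varepsilon$.
6.5
(6.2)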
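Remark. In the statement of the theorem, both sides of (6.2) are allowed to
be $\infty$.
Proof. If $\|f\|_\infty = 0$, then $\|f\|_q = 0$ for all $q > 0$ so the result is trivial in this
case. So let us assume that $\|f\|_\infty > 0$ and let $a$ be any positive number smaller
than $\|f\|_\infty$. In other words,
$$0 < a < \|f\|_\infty.$$
Let
$$A_a := \{x : |f(x)| > a\}.$$
This set has positive measure by the choice of $a$, and its measure is finite since
$f \in L^p$. Also
$$\|f\|_q \ge \left(\int_{A_a} |f|^q\,d\mu\right)^{1/q} \ge a\,\mu(A_a)^{1/q}.$$
Letting $q \to \infty$ gives
$$\liminf_{q \to \infty} \|f\|_q \ge a,$$
and since $a$ was any positive number smaller than $\|f\|_\infty$, $\liminf_{q \to \infty}\|f\|_q \ge \|f\|_\infty$.
In the other direction we may assume $\|f\|_\infty < \infty$; then for $q > p$ we have
$|f|^q \le \|f\|_\infty^{q-p}|f|^p$ almost everywhere, so
$$\|f\|_q \le (\|f\|_p)^{p/q}(\|f\|_\infty)^{1 - p/q}.$$
Letting $q \to \infty$ gives the desired result. QED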
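The limit (6.2) and the two estimates in its proof can be watched numerically. This is a minimal sketch, assuming a measure with finitely many positive point masses (so $\|f\|_\infty = \max|f|$); the name `lq_norm` and the sample data are illustrative. The norm is computed after factoring out $M = \max|f|$, since raising values to a large power $q$ would otherwise overflow.

```python
def lq_norm(values, weights, q):
    """||f||_q for positive point masses `weights`, where f takes `values`.
    Computed as M * (sum (|v|/M)**q w)**(1/q), M = max|v|, so large q does not overflow."""
    m = max(abs(v) for v in values)
    if m == 0.0:
        return 0.0
    s = sum((abs(v) / m) ** q * w for v, w in zip(values, weights))
    return m * s ** (1.0 / q)


values = [0.5, 2.0, 3.0, 1.0]
weights = [1.0, 0.2, 0.01, 5.0]   # all masses positive, so ||f||_inf = max|f| = 3.0

if __name__ == "__main__":
    for q in [1, 2, 4, 16, 64, 256, 1024]:
        print(q, lq_norm(values, weights, q))
```

The set where $|f| > 2.5$ has mass $0.01$ here, so the lower bound $a\,\mu(A_a)^{1/q}$ from the proof approaches $a = 2.5$ only slowly, while $\|f\|_q$ itself tends to $3.0$.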
6.6
Suppose we are given two integrals, $I$ and $J$ on the same space $L$. That is, both
$I$ and $J$ satisfy the three conditions of linearity, positivity, and the monotone
limit property that went into our definition of the term integral. We say that
$J$ is absolutely continuous with respect to $I$ if every set which is $I$ null (i.e.
has measure zero with respect to the measure associated to $I$) is $J$ null.
The integral $I$ is said to be bounded if
$$I(1) < \infty,$$
or, what amounts to the same thing, that
$$\mu_I(S) < \infty$$
f L1 (J).
(6.3)
Let N be the set of all x where g(x) 1. Taking f = 1N in the preceding string
of equalities shows that
J(1N ) nI(1N ).
Since n is arbitrary, we have proved
gn
i=1
g i L1 (I).
i=1
We have
1
1g
almost everywhere
f0 1
f0
almost everywhere
f0 =
so
g=
and
J(f ) = I(f f0 )
1
Then we can apply the rest of the proof of the Radon-Nikodym theorem to $J_{\rm cont}$
to conclude that
$$J_{\rm cont}(f) = I(f f_0)$$
where $f_0 = \sum_{i=1}^{\infty} (1_{N^c} g)^i$ is an element of $L^1(I)$ as before. In particular, $J_{\rm cont}$
is absolutely continuous with respect to $I$.
A second extension is to certain situations where $S$ is not of finite measure.
We say that a function $f$ is locally $L^1$ if $f 1_A \in L^1$ for every set $A$ with
$\mu(A) < \infty$. We say that $S$ is $\sigma$-finite with respect to $\mu$ if $S$ is a countable union
of sets of finite measure. This is the same as saying that $1 = 1_S \in B$. If $S$
is $\sigma$-finite then it can be written as a disjoint union of sets of finite measure.
If $S$ is $\sigma$-finite with respect to both $I$ and $J$ it can be written as the disjoint
union of countably many sets which are both $I$ and $J$ finite. So if $J$ is absolutely
continuous with respect to $I$, we can apply the Radon-Nikodym theorem to each of
these sets of finite measure, and conclude that there is an $f_0$ which is locally $L^1$
with respect to $I$, such that $J(f) = I(f f_0)$ for all $f \in L^1(J)$, and $f_0$ is unique
up to almost everywhere equality.
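On a finite set the Radon-Nikodym density is simply a ratio of point masses, which makes the statement $J(f) = I(f f_0)$ and the role of absolute continuity concrete. The sketch below is illustrative only (the helpers `radon_nikodym` and `integrate` are hypothetical, and measures are represented as dictionaries of point masses); it uses exact fractions and raises an error precisely when $J$ charges an $I$-null point.

```python
from fractions import Fraction


def radon_nikodym(mu_I, mu_J):
    """Density f0 with J(f) = I(f * f0), for measures given as dicts of point masses."""
    f0 = {}
    for x in set(mu_I) | set(mu_J):
        i_mass = Fraction(mu_I.get(x, 0))
        j_mass = Fraction(mu_J.get(x, 0))
        if i_mass == 0:
            if j_mass != 0:
                raise ValueError(f"J is not absolutely continuous w.r.t. I at {x!r}")
            f0[x] = Fraction(0)          # f0 is only determined up to I-null sets
        else:
            f0[x] = j_mass / i_mass
    return f0


def integrate(mu, f):
    """Integral of f (a dict of values) against the point-mass measure mu."""
    return sum(Fraction(m) * f.get(x, 0) for x, m in mu.items())


if __name__ == "__main__":
    I = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
    J = {"a": Fraction(1, 3), "b": Fraction(2, 3)}
    print(radon_nikodym(I, J))
```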
6.7
Recall that Hölder's inequality (6.1) says that
$$\left|\int fg\,d\mu\right| \le \|f\|_p\|g\|_q$$
if $f \in L^p$ and $g \in L^q$ where
$$\frac{1}{p} + \frac{1}{q} = 1.$$
For the rest of this section we will assume without further mention that this
relation between $p$ and $q$ holds. Hölder's inequality implies that we have a map
from
$$L^q \to (L^p)^*$$
sending $g \in L^q$ to the continuous linear function on $L^p$ which sends
$$f \mapsto I(fg) = \int fg\,d\mu.$$
Furthermore, Hölder's inequality says that the norm of this map from $L^q \to (L^p)^*$
is $\le 1$. In particular, this map is injective.
The theorem we want to prove is that under suitable conditions on $S$ and
$I$ (which are more general even than $\sigma$-finiteness) this map is surjective for
$1 \le p < \infty$.
We will first prove the theorem in the case where $\mu(S) < \infty$, that is when $I$
is a bounded integral. For this we will need a lemma:
6.7.1
6.7.2
Theorem 6.7.1 Suppose that $\mu(S) < \infty$ and that $F$ is a bounded linear function on $L^p$ with $1 \le p < \infty$. Then there exists a unique $g \in L^q$ such that
$$F(f) = (f, g) = I(fg).$$
Here $q = p/(p-1)$ if $p > 1$ and $q = \infty$ if $p = 1$.
Proof. Consider the restriction of $F$ to $L$. We know that $F = F^+ - F^-$ where
both $F^+$ and $F^-$ are linear and non-negative and are bounded with respect to
the $\|\cdot\|_p$ norm on $L$. The monotone convergence theorem implies that if $f_n \searrow 0$
then $\|f_n\|_p \to 0$ and the boundedness of $F^+$ with respect to the $\|\cdot\|_p$ norm says that
$$\|f_n\|_p \to 0 \Rightarrow F^+(f_n) \to 0.$$
So $F^+$ satisfies all the axioms for an integral, and so does $F^-$. If $f$ vanishes
outside a set of $I$ measure zero, then $\|f\|_p = 0$. Applied to a function of the form
$f = 1_A$ we conclude that if $A$ has $\mu = \mu_I$ measure zero, then $A$ has measure
zero with respect to the measures determined by $F^+$ or $F^-$. We can apply the
6.7.3
Here the cases $p > 1$ and $p = 1$ may be different, depending on how infinite $S$
is.
Let us first consider the case where $p > 1$. If we restrict the functional $F$ to
any subspace of $L^p$ its norm can only decrease. Consider a subspace consisting
of all functions which vanish outside a subset $S_1$ where $\mu(S_1) < \infty$. We get
$$\sum_{m=1}^{\infty} f_m = f$$
fm hm
Am
6.8
6.8.1
Proof. We apply Proposition 6.7.1 to the case of our $L$ and with the uniform
norm, $\|\cdot\|_\infty$. We get
$$F = F^+ - F^-$$
and an examination of the proof will show that in fact
$$\|F^\pm\|_\infty \le \|F\|_\infty.$$
By the preceding theorem, $F^\pm$ are both integrals. QED
6.8.2 Fubini's theorem.
Theorem 6.8.3 Let $S_1$ and $S_2$ be locally compact Hausdorff spaces and let $I$
and $J$ be non-negative linear functionals on $L(S_1)$ and $L(S_2)$ respectively. Then
$$I_x(J_y h(x, y)) = J_y(I_x h(x, y))$$
for every $h \in L(S_1 \times S_2)$ in the obvious notation, and this common value is an
integral on $L(S_1 \times S_2)$.
Proof via Stone-Weierstrass. The equation in the theorem is clearly true
if $h(x, y) = f(x)g(y)$ where $f \in L(S_1)$ and $g \in L(S_2)$ and so it is true for any
$h$ which can be written as a finite sum of such functions. Let $h$ be a general
element of $L(S_1 \times S_2)$. Then we can find compact subsets $C_1 \subset S_1$ and $C_2 \subset S_2$
such that $h$ is supported in the compact set $C_1 \times C_2$. The functions of the form
$$\sum f_i(x) g_i(y)$$
where the $f_i$ are all supported in $C_1$ and the $g_i$ in $C_2$, and the sum is finite,
form an algebra which separates points. So for any $\varepsilon > 0$ we can find a $k$ of the
above form with
$$\|h - k\|_\infty < \varepsilon.$$
Let $B_1$ and $B_2$ be bounds for $I$ on $L(C_1)$ and $J$ on $L(C_2)$ as provided by Lemma
6.8.1. Then
$$\left|J_y h(x, y) - \sum J(g_i) f_i(x)\right| = |[J_y(h - k)](x)| < \varepsilon B_2.$$
This shows that $J_y h(x, y)$ is the uniform limit of continuous functions supported
in $C_1$ and so $J_y h(x, y)$ is itself continuous and supported in $C_1$. It then follows
that $I_x(J_y h)$ is defined, and that
$$\left|I_x(J_y h(x, y)) - \sum I(f_i) J(g_i)\right| \le \varepsilon B_1 B_2.$$
By the same argument with the roles of $I$ and $J$ interchanged,
$$\left|J_y(I_x h(x, y)) - \sum I(f_i) J(g_i)\right| \le \varepsilon B_1 B_2.$$
Since $\varepsilon$ is arbitrary, this gives the equality in the theorem. Since this (same)
functional is non-negative, it is an integral by the first of the Riesz representation
theorems above. QED
Let $X$ be a locally compact Hausdorff space, and let $L$ denote the space of
continuous functions of compact support on $X$. Recall that the Riesz representation theorem (one of them) asserts that any non-negative linear function $I$ on
$L$ satisfies the starting axioms for the Daniell integral, and hence corresponds
to a measure defined on a $\sigma$-field, and such that $I(f)$ is given by integration
of $f$ relative to this measure for any $f \in L$.
6.9
6.9.1
The proof of this theorem hinges on some topological facts whose true place is
in the chapter on metric spaces, but I will prove them here. The importance
of the theorem is that it will allow us to derive some conclusions about spaces
which are very large (such as the space of all paths in $\mathbb{R}^n$) but are nevertheless
locally compact (in fact compact) Hausdorff spaces. It is because we want to
consider such spaces that the earlier proof, which hinged on taking limits of
sequences in the very definition of the Daniell integral, is insufficient to get at
the results we want.
6.9.2
Propositions in topology.
and
$$\mathrm{Supp}(h) \subset U.$$
Proof. Choose $V$ as in Proposition 6.9.3. By Urysohn's lemma applied to the
compact space $\overline{V}$ we can find a function $h : \overline{V} \to [0, 1]$ such that $h = 1$ on $K$
and $h = 0$ on $\overline{V} \setminus V$. Extend $h$ to be zero on the complement of $V$. Then $h$ does
the trick.
Proposition 6.9.5 Let $X$ be a locally compact Hausdorff space, $f \in L$, i.e. $f$
is a continuous function of compact support on $X$. Suppose that there are open
subsets $U_1, \ldots, U_n$ such that
$$\mathrm{Supp}(f) \subset \bigcup_{i=1}^{n} U_i.$$
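$$L_2 := K \setminus U_2.$$
So $L_1$ and $L_2$ are disjoint compact sets. By Proposition 6.9.1 we can find disjoint
open sets $V_1, V_2$ with
$$L_1 \subset V_1, \quad L_2 \subset V_2.$$
Set
$$K_1 := K \setminus V_1, \quad K_2 := K \setminus V_2.$$
Then $K_1$ and $K_2$ are compact, and
$$K = K_1 \cup K_2, \quad K_1 \subset U_1, \quad K_2 \subset U_2.$$
$$\varphi_2 := h_2 - h_1 h_2.$$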
6.9.3
$$\mu(U) = \sup\{I(f) : f \in L,\ 0 \le f \le 1_U\} \qquad (6.5)$$
$$\mu(U) = \sup\{I(f) : f \in L,\ 0 \le f \le 1_U,\ \mathrm{Supp}(f) \subset U\} \qquad (6.6)$$
for any open set $U$, since either of these equations determines $\mu$ on any open
set $U$ and hence on the Borel field.
Since $f \le 1_U$ and both are measurable functions, it is clear that $\mu(U) = \int 1_U\,d\mu$
is at least as large as the expression on the right hand side of (6.5). This in
turn is at least as large as the right hand side of (6.6) since the supremum in
(6.6) is taken over a smaller set of functions than that of (6.5). So it is enough
to prove that $\mu(U)$ is $\le$ the right hand side of (6.6).
Let $a < \mu(U)$. Interior regularity implies that we can find a compact set
$K \subset U$ with
$$a < \mu(K).$$
Take the $f$ provided by Proposition 6.9.4. Then $a < I(f)$, and so the right hand
side of (6.6) is $\ge a$. Since $a$ was any number $< \mu(U)$, we conclude that $\mu(U)$ is
$\le$ the right hand side of (6.6). QED
6.10 Existence.
We will
• define a function $m^*$ defined on all subsets,
• show that it is an outer measure,
• show that the set of measurable sets in the sense of Carathéodory includes
all the Borel sets, and that
• integration with respect to the associated measure assigns $I(f)$ to every
$f \in L$.
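6.10.1 Definition.
For an open set $U$ define
$$m^*(U) := \sup\{I(f) : f \in L,\ 0 \le f \le 1_U,\ \mathrm{Supp}(f) \subset U\}, \qquad (6.7)$$
and for an arbitrary subset $A \subset X$ define
$$m^*(A) := \inf\{m^*(U) : A \subset U,\ U \text{ open}\}. \qquad (6.8)$$
Since $U$ is contained in itself, this does not change the definition on open sets.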
It is clear that $m^*(\emptyset) = 0$ and that $A \subset B$ implies that $m^*(A) \le m^*(B)$. So
$$m^*\Big(\bigcup_n U_n\Big) \le \sum_n m^*(U_n). \qquad (6.9)$$
Set
$$U := \bigcup_n U_n,$$
$$\bigcup_{n=1}^{N} U_n.$$
$$\sum_{i=1}^{N} I(f_i) \le \sum_{i=1}^{N} m^*(U_i),$$
using the definition (6.7). Replacing the finite sum on the right hand side of
this inequality by the infinite sum, and then taking the supremum over $f$ proves
(6.9), where we use the definition (6.7) once again.
Next let $\{A_n\}$ be any sequence of subsets of $X$. We wish to prove that
$$m^*\Big(\bigcup_n A_n\Big) \le \sum_n m^*(A_n).$$
6.10.2
Let $F$ denote the collection of subsets which are measurable in the sense of
Carathéodory for the outer measure $m^*$. We wish to prove that $F \supset B(X)$.
Since $B(X)$ is the $\sigma$-field generated by the open sets, it is enough to show that
every open set is measurable in the sense of Carathéodory, i.e. that
$$m^*(A) \ge m^*(A \cap U) + m^*(A \cap U^c) \qquad (6.10)$$
for any open set $U$ and any set $A$ with $m^*(A) < \infty$: If $\varepsilon > 0$, choose an open
set $V \supset A$ with
$$m^*(V) \le m^*(A) + \varepsilon$$
which is possible by the definition (6.8). We will show that
$$m^*(V) \ge m^*(V \cap U) + m^*(V \cap U^c) - 2\varepsilon. \qquad (6.11)$$
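and
$$\mathrm{Supp}(f_1) \subset V \cap U$$
with
$$I(f_1) \ge m^*(V \cap U) - \varepsilon.$$
Let $K := \mathrm{Supp}(f_1)$. Then $K \subset U$ and so $K^c \supset U^c$ and $K^c$ is open. Hence
$V \cap K^c$ is an open set and
$$V \cap K^c \supset V \cap U^c.$$
Using the definition (6.7), we can find an $f_2 \in L$ such that
$$f_2 \le 1_{V \cap K^c}$$
and
$$\mathrm{Supp}(f_2) \subset V \cap K^c$$
with
$$I(f_2) \ge m^*(V \cap K^c) - \varepsilon.$$
But $m^*(V \cap K^c) \ge m^*(V \cap U^c)$ since $V \cap K^c \supset V \cap U^c$. So
$$I(f_2) \ge m^*(V \cap U^c) - \varepsilon.$$
So
$$f_1 + f_2 \le 1_K + 1_{V \cap K^c} \le 1_V$$
since $K = \mathrm{Supp}(f_1) \subset V$ and $\mathrm{Supp}(f_2) \subset V \cap K^c$. Also
$$\mathrm{Supp}(f_1 + f_2) \subset K \cup (V \cap K^c) = V.$$
Thus $f = f_1 + f_2 \in L$ and so by (6.7),
$$I(f_1 + f_2) \le m^*(V).$$
This proves (6.11). Since $A \subset V$, (6.11) gives $m^*(A) + \varepsilon \ge m^*(A \cap U) + m^*(A \cap U^c) - 2\varepsilon$
for every $\varepsilon > 0$, which is (6.10), and hence all Borel sets are measurable.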
6.10.3
1
f
1
1
I(f ).
1
So, by (6.8)
1
I(f ) < .
1
Reviewing the preceding argument, we see that we have in fact proved the
more general statement
(K) m (U )
6.10.4 Interior regularity.
We now prove interior regularity, which will be very important for us. We wish
to prove that
$$\mu(U) = \sup\{\mu(K) : K \subset U,\ K \text{ compact}\},$$
for any open set $U$, where, according to (6.7),
$$m^*(U) = \sup\{I(f) : f \in L,\ 0 \le f \le 1_U,\ \mathrm{Supp}(f) \subset U\}.$$
Since $\mathrm{Supp}(f)$ is compact, and contained in $U$, we will be done if we show that
$$f \in L,\ 0 \le f \le 1 \Rightarrow I(f) \le \mu(\mathrm{Supp}(f)).$$
So let $V$ be an open set containing $\mathrm{Supp}(f)$. By definition (6.7),
$$\mu(V) \ge I(f) \qquad (6.12)$$
6.10.5
Finally, we must show that all the elements of $L$ are integrable with respect to
$\mu$ and
$$I(f) = \int f\,d\mu. \qquad (6.13)$$
Since the elements of $L$ are continuous, they are Borel measurable. As every
$f \in L$ can be written as the difference of two non-negative elements of $L$, and as
both sides of (6.13) are linear in $f$, it is enough to prove (6.13) for non-negative
functions.
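Following Lebesgue, divide the $y$-axis up into intervals of size $\varepsilon$. That is,
let $\varepsilon$ be a positive number, and, for every positive integer $n$ set
$$f_n(x) := \begin{cases} 0 & \text{if } f(x) \le (n-1)\varepsilon \\ f(x) - (n-1)\varepsilon & \text{if } (n-1)\varepsilon < f(x) \le n\varepsilon \\ \varepsilon & \text{if } n\varepsilon < f(x). \end{cases}$$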
If $(n-1)\varepsilon \ge \|f\|_\infty$ only the first alternative can occur, so all but finitely many of
the $f_n$ vanish, and they all are continuous and have compact support so belong
to $L$. Also
$$f = \sum f_n$$
this sum being finite, as we have observed, and so
$$I(f) = \sum I(f_n).$$
Set $K_0 := \mathrm{Supp}(f)$ and
$$K_n := \{x : f(x) \ge n\varepsilon\}, \quad n = 1, 2, \ldots.$$
Then the $K_i$ are a nested decreasing collection of compact sets, and
$$\varepsilon 1_{K_n} \le f_n \le \varepsilon 1_{K_{n-1}}.$$
By Propositions 6.10.1 and 6.10.2 we have
$$\varepsilon\mu(K_n) \le I(f_n) \le \varepsilon\mu(K_{n-1}).$$
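On the other hand, the monotonicity of the integral (and its definition) imply
that
$$\varepsilon\mu(K_n) \le \int f_n\,d\mu \le \varepsilon\mu(K_{n-1}).$$
Summing these inequalities gives
$$\varepsilon\sum_{n=1}^{N}\mu(K_n) \le I(f) \le \varepsilon\sum_{n=0}^{N-1}\mu(K_n)$$
and
$$\varepsilon\sum_{n=1}^{N}\mu(K_n) \le \int f\,d\mu \le \varepsilon\sum_{n=0}^{N-1}\mu(K_n),$$
where $N$ is chosen so that $f_n = 0$ for $n > N$. So $I(f)$ and $\int f\,d\mu$ both lie in an interval of
length $\varepsilon(\mu(K_0) - \mu(K_N)) \le \varepsilon\mu(K_0)$, i.e. they are within $\varepsilon\mu(K_0)$
of one another. Since $\varepsilon$ is arbitrary, we have proved (6.13) and completed the
proof of the Riesz representation theorem.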
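Lebesgue's horizontal slicing can be tried on a concrete function. The following is a minimal sketch, assuming the tent function $f(x) = \max(0, 1 - |x|)$ on the real line with Lebesgue measure (an illustrative choice, not from the text, with $\int f\,dx = 1$) and $\varepsilon = 1/m$. It checks that $f = \sum f_n$ pointwise and that the bracketing sums $\varepsilon\sum_{n=1}^{N}\mu(K_n)$ and $\varepsilon\sum_{n=0}^{N-1}\mu(K_n)$ enclose $\int f$ and differ by exactly $\varepsilon\mu(K_0)$ here.

```python
from fractions import Fraction


def f(x):
    """A non-negative continuous function of compact support: the tent on [-1, 1]."""
    return max(0.0, 1.0 - abs(x))


def f_slice(x, n, eps):
    """Lebesgue's f_n: the part of f lying between heights (n - 1) * eps and n * eps."""
    return min(max(f(x) - (n - 1) * eps, 0.0), eps)


def measure_K(n, m):
    """Lebesgue measure of K_n = {f >= n/m} for n >= 1, and of K_0 = Supp(f) = [-1, 1]."""
    if n == 0:
        return Fraction(2)
    return max(Fraction(0), 2 * (1 - Fraction(n, m)))


def slice_bounds(m):
    """eps * sum_{n=1}^{N} mu(K_n) and eps * sum_{n=0}^{N-1} mu(K_n) with eps = 1/m, N = m."""
    eps = Fraction(1, m)
    lower = eps * sum(measure_K(n, m) for n in range(1, m + 1))
    upper = eps * sum(measure_K(n, m) for n in range(0, m))
    return lower, upper


if __name__ == "__main__":
    for m in (2, 10, 100):
        print(m, slice_bounds(m))
```

Since $\max f = 1$, all slices with $n > m$ vanish, which is why $N = m$ is used.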
Chapter 7
Wiener measure.
7.1.1
(7.1)
0t<
m
Y
n R,
R
i=1
$$\frac{1}{(2\pi t)^{n/2}}\, e^{-(x-y)^2/2t} \qquad (7.2)$$
$$n_r(x) := \frac{1}{\sqrt{2\pi r}}\, e^{-x^2/2r}.$$
nr = r 2 S
2
r 2
/2
$$\widehat{S_a f} = \frac{1}{a} S_{1/a} \hat{f},$$
and takes the unit Gaussian into the unit Gaussian. Thus upon Fourier transform, the equation $n_t \star n_s = n_{t+s}$ becomes the obvious fact that
$$e^{-s\xi^2/2}\, e^{-t\xi^2/2} = e^{-(s+t)\xi^2/2}.$$
The same proof (or an iterated version of the one dimensional result) applies in
n-dimensions.
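The semigroup identity $n_t \star n_s = n_{t+s}$ can also be confirmed directly by numerical convolution in one dimension, without the Fourier transform. This is a minimal sketch under stated assumptions: the kernel is $n_r(x) = (2\pi r)^{-1/2}e^{-x^2/2r}$ as above, the convolution integral is truncated to $[-20, 20]$ and evaluated by the midpoint rule, and the names `heat_kernel` and `convolve` are illustrative.

```python
import math


def heat_kernel(r, x):
    """n_r(x) = (2 pi r)^(-1/2) exp(-x^2 / 2r)."""
    return math.exp(-x * x / (2.0 * r)) / math.sqrt(2.0 * math.pi * r)


def convolve(t, s, x, half_width=20.0, steps=4000):
    """Midpoint-rule approximation of (n_t * n_s)(x) = integral of n_t(x - y) n_s(y) dy."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        y = -half_width + (k + 0.5) * h
        total += heat_kernel(t, x - y) * heat_kernel(s, y)
    return total * h


if __name__ == "__main__":
    for x in (0.0, 0.7, -2.0):
        print(x, convolve(0.5, 1.5, x), heat_kernel(2.0, x))
```

For smooth, rapidly decaying integrands like these the midpoint rule is extremely accurate, so the two printed columns agree to many digits.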
So, for each $x \in \mathbb{R}^n$ we have defined a measure on $\Omega$. We denote the
measure corresponding to $I_x$ by $\mathrm{pr}_x$. It is a probability measure in the sense
that $\mathrm{pr}_x(\Omega) = 1$.
The intuitive idea behind the definition of prx is that it assigns probability
prx (E) :=
Z
E1
to the set of all paths which start at x and pass through the set E1 at time
t1 , the set E2 at time t2 etc. and we have denoted this set of paths by E.
7.1.2
tn/2 ex
/2t
(7.4)
/2
f().
(7.5)
t0
(7.6)
Using some language we will introduce later, conditions (7.4) and (7.6) say that
the $T_t$ form a continuous semi-group of operators. If we differentiate (7.5) with
respect to $t$, and let
$$u(t, x) := (T_t f)(x)$$
we see that $u$ is a solution of the heat equation
$$2\frac{\partial u}{\partial t} = \frac{\partial^2 u}{(\partial x^1)^2} + \cdots + \frac{\partial^2 u}{(\partial x^n)^2}.$$
+
(x1 )2
(xn )2
we are tempted to write
Tt = et ,
in analogy to our study of elliptic operators on compact manifolds. We will
spend a lot of time justifying this kind of formula in the non-compact setting
later on in the course.
7.1.3
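The purpose of this subsection is to prove that if we use the measure $\mathrm{pr}_x$, then
the set of discontinuous paths has measure zero.
We begin with some technical issues. We recall that the statement that a
measure $\mu$ is regular means that for any Borel set $A$
$$\mu(A) = \inf\{\mu(G) : A \subset G,\ G \text{ open}\}$$
and for any open set $U$
$$\mu(U) = \sup\{\mu(K) : K \subset U,\ K \text{ compact}\}.$$
This second condition has the following consequence: Suppose that $\Gamma$ is any
collection of open sets which is closed under finite union. If
$$O = \bigcup_{G \in \Gamma} G$$
then
$$\mu(O) = \sup_{G \in \Gamma} \mu(G).$$
Indeed, any compact $K \subset O$ is covered by finitely many members of $\Gamma$, whose union
again lies in $\Gamma$, so $\mu(K) \le \sup_{G \in \Gamma}\mu(G)$.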
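$$\left(\frac{2}{\pi t}\right)^{1/2}\int_r^\infty e^{-x^2/2t}\,dx \le \left(\frac{2}{\pi t}\right)^{1/2}\int_r^\infty \frac{x}{r}\,e^{-x^2/2t}\,dx = \left(\frac{2t}{\pi}\right)^{1/2}\frac{1}{r}\,e^{-r^2/2t}.$$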
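The left hand side of this estimate is the probability that a one-dimensional Gaussian with mean zero and variance $t$ exceeds $r$ in absolute value, which equals $\operatorname{erfc}(r/\sqrt{2t})$. The sketch below (illustrative helper names; it relies only on the standard `math.erfc`) compares that exact value with the bound $(2t/\pi)^{1/2} r^{-1} e^{-r^2/2t}$ and shows how fast the bound collapses as $t \to 0$.

```python
import math


def two_sided_tail(r, t):
    """(2/(pi t))^(1/2) * integral_r^inf exp(-x^2/2t) dx, i.e. P(|X| > r) for X ~ N(0, t)."""
    return math.erfc(r / math.sqrt(2.0 * t))


def tail_bound(r, t):
    """The bound (2t/pi)^(1/2) * (1/r) * exp(-r^2/2t)."""
    return math.sqrt(2.0 * t / math.pi) * math.exp(-r * r / (2.0 * t)) / r


if __name__ == "__main__":
    for t in (1.0, 0.1, 0.01):
        print(t, two_sided_tail(1.0, t), tail_bound(1.0, t))
```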
For fixed $r$ this tends to zero (very fast) as $t \to 0$. In $n$ dimensions $\|y\| > \varepsilon$
(in the Euclidean norm) implies that at least one of its coordinates $y_i$ satisfies
$|y_i| > \varepsilon/\sqrt{n}$
/2nt
(7.7)
Then
$$\mathrm{pr}_x(A) \le 2\rho\left(\tfrac{1}{2}\varepsilon, \delta\right)$$
independently of the number $m$ of steps.
Proof. Let
(7.8)
1
}
2
1
}
2
let
and let
$$D_i := \{\omega \mid |\omega(t_1) - \omega(t_i)| > \varepsilon \text{ and } |\omega(t_1) - \omega(t_k)| \le \varepsilon,\ k = 1, \ldots, i-1\}.$$
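If $\omega \in A$, then $\omega \in D_i$ for some $i$ by the definition of $A$, by taking $i$ to be the
first $j$ that works in the definition of $A$. If $\omega \notin B$ and $\omega \in D_i$ then $\omega \in C_i$
since it has to move a distance of at least $\frac{1}{2}\varepsilon$ to get back from outside the ball
of radius $\varepsilon$ to inside the ball of radius $\frac{1}{2}\varepsilon$. So we have
$$A \subset B \cup \bigcup_{i=1}^{m}(C_i \cap D_i)$$
and hence
$$\mathrm{pr}_x(A) \le \mathrm{pr}_x(B) + \sum_{i=1}^{m}\mathrm{pr}_x(C_i \cap D_i). \qquad (7.9)$$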
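Now we can estimate $\mathrm{pr}_x(C_i \cap D_i)$ as follows. For $\omega$ to belong to this intersection,
we must have $\omega \in D_i$ and then the path moves a distance at least $\frac{1}{2}\varepsilon$ in time
$t_m - t_i$ and these two events are independent, so $\mathrm{pr}_x(C_i \cap D_i) \le \rho(\frac{1}{2}\varepsilon, \delta)\,\mathrm{pr}_x(D_i)$.
Here is this argument in more detail: Let
$$F = 1_{\{(y,z)\,:\,|y - z| > \frac{1}{2}\varepsilon\}}$$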
Z
...
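Since the $D_i$ are disjoint, $\sum_i \mathrm{pr}_x(D_i) \le 1$, and so
$$\mathrm{pr}_x(A) \le \mathrm{pr}_x(B) + \rho(\tfrac{1}{2}\varepsilon, \delta) \le 2\rho(\tfrac{1}{2}\varepsilon, \delta).$$
QED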
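Let
$$E := \{\omega \mid |\omega(t_j) - \omega(t_k)| > 2\varepsilon \text{ for some } 1 \le j < k \le m\}.$$
Then $E \subset A$ since if $|\omega(t_j) - \omega(t_k)| > 2\varepsilon$ then either $|\omega(t_1) - \omega(t_j)| > \varepsilon$ or
$|\omega(t_1) - \omega(t_k)| > \varepsilon$ (or both). So
$$\mathrm{pr}_x(E) \le 2\rho(\tfrac{1}{2}\varepsilon, \delta). \qquad (7.10)$$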
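$$\mathrm{pr}_x(E(a, b, \varepsilon)) \le 2\rho(\tfrac{1}{2}\varepsilon, \delta).$$
Proof. Here is where we are going to use the regularity of the measure. Let
$S$ denote a finite subset of $[a, b]$ and let
$$E(a, b, \varepsilon, S) := \{\omega \mid |\omega(s) - \omega(t)| > 2\varepsilon \text{ for some } s, t \in S\}.$$
Then $E(a, b, \varepsilon, S)$ is an open set and $\mathrm{pr}_x(E(a, b, \varepsilon, S)) \le 2\rho(\frac{1}{2}\varepsilon, \delta)$ for any $S$. The
union over all $S$ of the $E(a, b, \varepsilon, S)$ is $E(a, b, \varepsilon)$. The regularity of the measure
now implies the lemma. QED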
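Let $k$ and $n$ be integers, and set
$$\delta := \frac{1}{n}.$$
Let
$$F(k, \varepsilon, \delta) := \{\omega \mid |\omega(t) - \omega(s)| > 4\varepsilon \text{ for some } t, s \in [0, k], \text{ with } |t - s| < \delta\}.$$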
Then we claim that
( 12 , )
.
(7.11)
E1
to the set of all paths which start at 0 and pass through the set E1 at time t1 ,
the set E2 at time t2 etc. and we have denoted this set of paths by E.
7.1.4 Embedding in $S'$.
Hence the set of $\omega$ with $\int_0^\infty (1+t)^{-2}|\omega(t)|\,dt = \infty$ must have measure zero. QED
Now each element $\omega$ of $W$ defines a tempered distribution, i.e. an element of
$S'$ according to the rule
$$\langle \omega, \varphi \rangle = \int_0^\infty \omega(t)\varphi(t)\,dt. \qquad (7.12)$$
and take these to form a basis for a topology on $W$. Since we put the weak
topology on $S'$ it is clear that the map (7.12) is continuous relative to this new
topology. So it will be sufficient to show that each set $U(\omega)$ is of the form
$A \cap W$ where $A$ is in $B(\Omega)$, the Borel field associated to the (product) topology
on $\Omega$.
So first consider the subsets Vn, () of W consisting of all 1 W such that
sup |1 (t) (t)|
t0
1
.
n
Vn, (),
1
n
7.2
will converge for almost all $\omega$ and hence we get a random variable
$$\langle X, \varphi \rangle$$
where
$$\langle X, \varphi \rangle(\omega) = \int_T X(t)(\omega)\varphi(t)\,dt,$$
the right hand side being defined (almost everywhere) as the limit of the Riemann approximating sums.
The same will be true if $\varphi$ vanishes rapidly at infinity and the sample paths
satisfy (a.e.) a slow growth condition such as given by Proposition 7.1.1 in
addition to being continuous a.e.
The notation $\langle X, \varphi \rangle$ is justified since $\langle X, \varphi \rangle$ clearly depends linearly on $\varphi$.
But now we can make the following definition due to Gelfand. We may
restrict $\varphi$ further by requiring that $\varphi$ belong to $D$ or $S$. We then consider a rule
$Z$ which assigns to each such $\varphi$ a random variable which we might denote by $Z(\varphi)$
or $\langle Z, \varphi \rangle$ and which depends linearly on $\varphi$ and satisfies appropriate continuity
conditions. Such an object is called a generalized random process. The
idea is that (just as in the case of generalized functions) we may not be able to
evaluate $Z(t)$ at a given time $t$, but may be able to evaluate a smeared out
version $Z(\varphi)$.
The purpose of the next few sections is to do the following computation: We
wish to show that for the case of Brownian motion, $\langle X, \varphi \rangle$ is a Gaussian random
variable with mean zero and with variance
$$\int_0^\infty\!\!\int_0^\infty \min(s, t)\varphi(s)\varphi(t)\,ds\,dt.$$
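Before the Gaussian computation, the variance formula can be sanity-checked by simulation. This is a minimal Monte Carlo sketch under stated assumptions: Brownian motion is simulated on $[0, T]$ with $T = 1$ by summing independent normal increments of variance $h$, $\langle X, \varphi \rangle$ is replaced by a Riemann sum, and the double integral of $\min(s,t)\varphi(s)\varphi(t)$ is computed by the midpoint rule. Step counts, path counts, the seed and the function names are arbitrary illustrative choices.

```python
import random


def smeared_brownian_variance(phi, T=1.0, steps=200, paths=2000, seed=1):
    """Monte Carlo estimate of Var <X, phi> for Brownian motion on [0, T] started at 0."""
    rng = random.Random(seed)
    h = T / steps
    samples = []
    for _ in range(paths):
        b, acc = 0.0, 0.0
        for k in range(1, steps + 1):
            b += rng.gauss(0.0, h ** 0.5)     # independent increment with variance h
            acc += b * phi(k * h) * h         # Riemann sum for the integral of X(t) phi(t) dt
        samples.append(acc)
    mean = sum(samples) / paths
    return sum((x - mean) ** 2 for x in samples) / (paths - 1)


def min_kernel_variance(phi, T=1.0, steps=200):
    """Midpoint rule for the double integral of min(s, t) phi(s) phi(t) over [0, T]^2."""
    h = T / steps
    pts = [(k + 0.5) * h for k in range(steps)]
    return sum(min(s, t) * phi(s) * phi(t) for s in pts for t in pts) * h * h


if __name__ == "__main__":
    def one(t):
        return 1.0      # phi = 1 on [0, 1]; the exact variance is 1/3
    print(smeared_brownian_variance(one), min_kernel_variance(one))
```

With $\varphi = 1$ on $[0, 1]$ both numbers should be close to $1/3$; the Monte Carlo value fluctuates by roughly $0.01$ with these settings.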
7.3 Gaussian measures.
7.3.1 Generalities about expectation and variance.
Let $V$ be a vector space (say over the reals and finite dimensional). Let $X$ be
a $V$-valued random variable. That is, we have some measure space $(M, F, \mu)$
(which will be fixed and hidden in this section) where $\mu$ is a probability measure
on $M$, and $X : M \to V$ is a measurable function. If $X$ is integrable, then
$$E(X) := \int_M X\,d\mu$$
$$\mathrm{Var}(AX) = (A \otimes A)\,\mathrm{Var}(X) \qquad (7.13)$$
assuming that $E(X)$ and $\mathrm{Var}(X)$ exist. We can also write this last equation as
$$\mathrm{Var}(AX)(\xi) = \mathrm{Var}(X)(A^*\xi), \qquad (7.14)$$
(7.15)
7.3.2
i i
(7.16)
(7.17)
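$$\phi_X(\xi) = \phi_N(A^*\xi)\,e^{i\langle \xi, a \rangle} = e^{-\frac{1}{2}\mathrm{Id}(A^*\xi) + i\langle \xi, a \rangle}$$
or
$$\phi_X(\xi) = e^{-\mathrm{Var}(X)(\xi)/2 + i\langle \xi, E(X) \rangle}. \qquad (7.18)$$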
(7.19)
then
$$Q(\xi) = \sum_j \lambda_j \xi_j^2.$$
7.3.3
7.3.4
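For example, consider the two dimensional vector space with coordinates $(x_1, x_2)$
and probability density proportional to
$$\exp\left(-\frac{1}{2}\left(\frac{x_1^2}{s} + \frac{(x_2 - x_1)^2}{t - s}\right)\right)$$
where $0 < s < t$. This corresponds to the matrix
$$\begin{pmatrix} \frac{t}{s(t-s)} & -\frac{1}{t-s} \\ -\frac{1}{t-s} & \frac{1}{t-s} \end{pmatrix} = \frac{1}{t-s}\begin{pmatrix} \frac{t}{s} & -1 \\ -1 & 1 \end{pmatrix}$$
whose inverse is
$$\begin{pmatrix} s & s \\ s & t \end{pmatrix} \qquad (7.20)$$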
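The matrix inversion behind (7.20) is easy to verify with exact rational arithmetic. The sketch below (illustrative helper names, standard-library `fractions` only) builds the matrix of the quadratic form $x_1^2/s + (x_2 - x_1)^2/(t-s)$, inverts it, and recovers the covariance $\begin{pmatrix} s & s \\ s & t \end{pmatrix}$, i.e. $\min(s,t)$ in every entry, which is the Brownian covariance appearing in the variance formula of Section 7.2.

```python
from fractions import Fraction


def precision_matrix(s, t):
    """Matrix of the quadratic form x1^2/s + (x2 - x1)^2/(t - s), for 0 < s < t."""
    s, t = Fraction(s), Fraction(t)
    return [[t / (s * (t - s)), -1 / (t - s)],
            [-1 / (t - s), 1 / (t - s)]]


def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def quad_form(m, x):
    """The value sum_ij m[i][j] x[i] x[j]."""
    return sum(m[i][j] * x[i] * x[j] for i in range(2) for j in range(2))


if __name__ == "__main__":
    print(inverse_2x2(precision_matrix(1, 3)))   # entries s, s, s, t with s = 1, t = 3
```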
0st
as claimed .
Let us say that a probability measure $\mu$ on $S'$ is a centered generalized
Gaussian process if every $\varphi \in S$, thought of as a function on the probability
space $(S', \mu)$ is a real valued centered Gaussian random variable; in other words
$\varphi_*(\mu)$ is a centered Gaussian probability measure on the real line. If we denote
this process by $Z$, then we may write $Z(\varphi)$ for the random variable given by
$\varphi$. We clearly have $Z(a\varphi + b\psi) = aZ(\varphi) + bZ(\psi)$ in the sense of addition of
random variables, and so we may think of $Z$ as a rule which assigns, in a linear
fashion, random variables to elements of $S$. With some slight modification
$$Z'(\varphi) := -Z(\varphi').$$
7.4
To see how this derivative works, let us consider what happens for Brownian
motion. Let $\omega$ be a continuous path of slow growth, and set
$$\omega_h(t) := \frac{1}{h}(\omega(t + h) - \omega(t)).$$
The paths are not differentiable (with probability one) so this limit does not
exist as a function. But the limit does exist as a generalized function, assigning
the value
$$-\int_0^\infty \omega(t)\varphi'(t)\,dt$$
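$$\int_0^\infty\!\!\int_0^\infty \min(s, t)\varphi'(s)\varphi'(t)\,ds\,dt = 2\int_0^\infty\left(\int_0^t s\varphi'(s)\,ds\right)\varphi'(t)\,dt.$$
Integrating by parts,
$$\int_0^t s\varphi'(s)\,ds = t\varphi(t) - \int_0^t \varphi(s)\,ds.$$
Also
$$\int_0^\infty t\varphi(t)\varphi'(t)\,dt = -\frac{1}{2}\int_0^\infty \varphi(t)^2\,dt$$
and
$$\int_0^\infty\left(\int_0^t \varphi(s)\,ds\right)\varphi'(t)\,dt = -\int_0^\infty \varphi(t)^2\,dt.$$
Hence the variance is
$$\int_0^\infty \varphi(t)^2\,dt = \int_0^\infty\!\!\int_0^\infty \delta(s - t)\varphi(s)\varphi(t)\,ds\,dt.$$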
Notice that now the covariance function is the generalized function $\delta(s - t)$.
The generalized process (extended to the whole line) with this covariance is
called white noise because it is a Gaussian process which is stationary under
translations in time and its covariance function is $\delta(s - t)$, signifying independent variation at all times, and the Fourier transform of the delta function is a
constant, i.e. assigns equal weight to all frequencies.
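The identity $\int\!\int \min(s,t)\varphi'(s)\varphi'(t)\,ds\,dt = \int \varphi(t)^2\,dt$ underlying the white noise covariance can be checked numerically. This is a minimal sketch, assuming two illustrative test functions on $[0, \infty)$ that decay rapidly, $\varphi_1(t) = e^{-t}$ and $\varphi_2(t) = te^{-t}$ (so $\int\varphi_1^2 = 1/2$ and $\int\varphi_2^2 = 1/4$), with integrals truncated to $[0, 15]$ and evaluated by the midpoint rule.

```python
import math


def double_min_integral(g, length=15.0, steps=500):
    """Midpoint rule for the double integral of min(s, t) g(s) g(t) over [0, length]^2."""
    h = length / steps
    pts = [(k + 0.5) * h for k in range(steps)]
    vals = [g(p) for p in pts]
    total = 0.0
    for s, gs in zip(pts, vals):
        for t, gt in zip(pts, vals):
            total += min(s, t) * gs * gt
    return total * h * h


def square_integral(phi, length=15.0, steps=500):
    """Midpoint rule for the integral of phi(t)^2 over [0, length]."""
    h = length / steps
    return sum(phi((k + 0.5) * h) ** 2 for k in range(steps)) * h


# Illustrative test functions and their derivatives.
def phi1(t):
    return math.exp(-t)


def dphi1(t):
    return -math.exp(-t)


def phi2(t):
    return t * math.exp(-t)


def dphi2(t):
    return (1.0 - t) * math.exp(-t)


if __name__ == "__main__":
    print(double_min_integral(dphi1), square_integral(phi1))   # both close to 1/2
    print(double_min_integral(dphi2), square_integral(phi2))   # both close to 1/4
```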
Chapter 8
Haar measure.
A topological group is a group $G$ which is also a topological space such that
the maps
$$G \times G \to G, \quad (x, y) \mapsto xy$$
and
$$G \to G, \quad x \mapsto x^{-1}$$
are continuous.
$$\ell_a(x) = ax$$
8.1 Examples.
8.1.1 $\mathbb{R}^n$.
8.1.2
Discrete groups.
If G has the discrete topology then the counting measure which assigns the value
one to every one element set {x} is Haar measure.
8.1.3
Lie groups.
We can reformulate the condition of left invariance as follows: Let $I$ denote the
integral associated to the measure $\mu$:
$$I(f) = \int f\,d\mu.$$
Then
$$\int f\,d((\ell_a)_*\mu) = I(\ell_a^* f)$$
where
$$(\ell_a^* f)(x) = f(ax). \qquad (8.1)$$
$$\int 1_{\ell_a^{-1}A}\,d\mu. \qquad (8.2)$$
I claim that every entry of this matrix is a left invariant linear differential form.
Indeed,
$$(\ell_a^* M)(x) = M(ax) = M(a)M(x).$$
Let us write
$$A = M(a).$$
Since $a$ is fixed, $A$ is a constant matrix, and so
$$(\ell_a^* M)^{-1} = (AM)^{-1} = M^{-1}A^{-1}$$
while
$$\ell_a^* dM = d(AM) = A\,dM$$
since $A$ is a constant. So
$$\ell_a^*(M^{-1}dM) = M^{-1}A^{-1}A\,dM = M^{-1}dM.$$
Of course, if the size of $M$ is too small, there might not be enough linearly
independent entries. (In the complex case we want to be able to choose the
real and imaginary parts of these entries to be linearly independent.) But if,
for example, the map $x \mapsto M(x)$ is an immersion, then there will be enough
linearly independent entries to go around.
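For example, consider the group of all two by two real matrices of the form
$$\begin{pmatrix} a & b \\ 0 & 1 \end{pmatrix}, \qquad a \ne 0.$$
This group is sometimes known as the "ax + b group" since
$$\begin{pmatrix} a & b \\ 0 & 1 \end{pmatrix}\begin{pmatrix} x \\ 1 \end{pmatrix} = \begin{pmatrix} ax + b \\ 1 \end{pmatrix}.$$
In other words, $G$ is the group of all translations and rescalings (and re-orientations)
of the real line.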
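We have
$$\begin{pmatrix} a & b \\ 0 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} a^{-1} & -a^{-1}b \\ 0 & 1 \end{pmatrix}$$
and
$$d\begin{pmatrix} a & b \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} da & db \\ 0 & 0 \end{pmatrix}$$
so
$$\begin{pmatrix} a & b \\ 0 & 1 \end{pmatrix}^{-1} d\begin{pmatrix} a & b \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} a^{-1}da & a^{-1}db \\ 0 & 0 \end{pmatrix}. \qquad (8.3)$$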
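Multiplying the two independent entries of (8.3) gives the left invariant 2-form $a^{-2}\,da \wedge db$, which suggests $|a|^{-2}\,da\,db$ as a left Haar measure on the $ax + b$ group. The sketch below checks this with a change of variables: a density $\rho$ is invariant under a map $T$ exactly when $\rho(T(p))\,|\det DT(p)| = \rho(p)$. Group elements are encoded as pairs $(a, b)$ standing for the matrix in the text; the helper names are illustrative. Since left and right multiplication are affine in $(a, b)$, unit finite differences give the Jacobian exactly. The same density is not right invariant: the ratio comes out as $1/|a_0|$, so this group is not unimodular.

```python
from fractions import Fraction


def mul(g, h):
    """Product in the ax + b group, with (a, b) standing for [[a, b], [0, 1]]."""
    a1, b1 = g
    a2, b2 = h
    return (a1 * a2, a1 * b2 + b1)


def jacobian_det(F, p):
    """Jacobian determinant of a map F that is affine in (a, b); unit differences are exact."""
    a, b = p
    base = F((a, b))
    col_a = [u - v for u, v in zip(F((a + 1, b)), base)]
    col_b = [u - v for u, v in zip(F((a, b + 1)), base)]
    return col_a[0] * col_b[1] - col_b[0] * col_a[1]


def haar_density(p):
    """Candidate left Haar density 1/a^2 at the point p = (a, b)."""
    a, _ = p
    return Fraction(1) / (a * a)


def invariance_ratio(transform, p):
    """rho(T(p)) |det DT(p)| / rho(p); equals 1 when rho da db is T-invariant."""
    return haar_density(transform(p)) * abs(jacobian_det(transform, p)) / haar_density(p)


if __name__ == "__main__":
    g0, p = (Fraction(3), Fraction(-2)), (Fraction(5, 2), Fraction(7))
    print(invariance_ratio(lambda q: mul(g0, q), p))   # left multiplication by g0
    print(invariance_ratio(lambda q: mul(q, g0), p))   # right multiplication by g0
```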
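As a second example, consider the group $SU(2)$ of all unitary two by two
matrices with determinant one. Each column of a unitary matrix is a unit
vector, and the columns are orthogonal. We can write the first column of the
matrix as
$$\begin{pmatrix} \alpha \\ \beta \end{pmatrix}, \qquad |\alpha|^2 + |\beta|^2 = 1. \qquad (8.4)$$
The second column is then a unit vector orthogonal to the first, hence proportional to
$\begin{pmatrix} -\bar\beta \\ \bar\alpha \end{pmatrix}$, and the condition that the determinant be one fixes this constant of proportionality to be one. So we can write
$$M = \begin{pmatrix} \alpha & -\bar\beta \\ \beta & \bar\alpha \end{pmatrix}$$
where (8.4) is satisfied. So we can think of $M$ as a complex matrix valued
function on the group $SU(2)$. Since $M$ is unitary, $M^{-1} = M^*$ so
$$M^{-1} = \begin{pmatrix} \bar\alpha & \bar\beta \\ -\beta & \alpha \end{pmatrix}$$
and
$$M^{-1}dM = \begin{pmatrix} \bar\alpha & \bar\beta \\ -\beta & \alpha \end{pmatrix}\begin{pmatrix} d\alpha & -d\bar\beta \\ d\beta & d\bar\alpha \end{pmatrix} = \begin{pmatrix} \bar\alpha\,d\alpha + \bar\beta\,d\beta & -\bar\alpha\,d\bar\beta + \bar\beta\,d\bar\alpha \\ -\beta\,d\alpha + \alpha\,d\beta & \beta\,d\bar\beta + \alpha\,d\bar\alpha \end{pmatrix}.$$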
Each of the real and imaginary parts of the entries is a left invariant one form.
But let us multiply three of these entries directly:
(d + d) (d + d) (d + d)
= (||2 + ||2 )d d (d + d)
d d (d + d).
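We can simplify this expression by differentiating the equation
$$\alpha\bar\alpha + \beta\bar\beta = 1$$
to get
$$\bar\alpha\,d\alpha + \alpha\,d\bar\alpha + \bar\beta\,d\beta + \beta\,d\bar\beta = 0.$$
So for $\beta \ne 0$ we can solve for $d\bar\beta$:
$$d\bar\beta = -\frac{1}{\beta}(\bar\alpha\,d\alpha + \alpha\,d\bar\alpha + \bar\beta\,d\beta).$$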
d).
If we write
d =
(8.5)
as a left invariant three form on SU (2). You might think that this three form
is complex valued, but we shall now give an alternative expression for it which
will show that it is in fact real valued.
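For this introduce polar coordinates in four dimensions as follows: Write
$$\alpha = w + iz, \quad \beta = x + iy, \quad \text{so } x^2 + y^2 + z^2 + w^2 = 1,$$
$$w = \cos\chi, \quad z = \sin\chi\cos\theta, \quad x = \sin\chi\sin\theta\cos\phi, \quad y = \sin\chi\sin\theta\sin\phi,$$
$$0 \le \chi \le \pi, \quad 0 \le \theta \le \pi, \quad 0 \le \phi \le 2\pi.$$
Then
$$d\alpha \wedge d\bar\alpha = (dw + i\,dz) \wedge (dw - i\,dz) = -2i\,dw \wedge dz = -2i\,d(\cos\chi) \wedge d(\sin\chi\cos\theta) = -2i\sin^2\chi\sin\theta\,d\chi \wedge d\theta.$$
Now
$$\beta = \sin\chi\sin\theta\,e^{i\phi}$$
so
$$d\beta = i\beta\,d\phi + \cdots$$
where the missing terms involve $d\chi$ and $d\theta$ and so will disappear when multiplied
by $d\alpha \wedge d\bar\alpha$. Hence
$$d\alpha \wedge d\bar\alpha \wedge d\beta = 2\beta\sin^2\chi\sin\theta\,d\chi \wedge d\theta \wedge d\phi.$$
Finally, we see that the three form (8.5) when expressed in polar coordinates is
$$2\sin^2\chi\sin\theta\,d\chi \wedge d\theta \wedge d\phi.$$
Of course we can multiply this by any constant. If we normalize so that $\mu(G) = 1$
the Haar measure is
$$\frac{1}{2\pi^2}\sin^2\chi\sin\theta\,d\chi\,d\theta\,d\phi.$$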
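The normalizing constant $2\pi^2$ is the total integral of $\sin^2\chi\sin\theta$ over the stated coordinate ranges (the volume of the unit 3-sphere). A short numerical sketch, using only the standard library and illustrative function names, confirms it. Because the integrand factorizes, it multiplies three one-dimensional midpoint sums. It also checks that the polar coordinates above land on the unit sphere $x^2 + y^2 + z^2 + w^2 = 1$.

```python
import math


def haar_total_mass(steps=200):
    """Midpoint-rule value of the integral of sin^2(chi) sin(theta) over
    0 <= chi <= pi, 0 <= theta <= pi, 0 <= phi <= 2 pi (the integrand factorizes)."""
    h = math.pi / steps
    chi_part = sum(math.sin((k + 0.5) * h) ** 2 for k in range(steps)) * h      # pi / 2
    theta_part = sum(math.sin((k + 0.5) * h) for k in range(steps)) * h         # 2
    phi_part = 2.0 * math.pi
    return chi_part * theta_part * phi_part


def polar_point(chi, theta, phi):
    """(w, z, x, y) from the polar coordinates in the text."""
    return (math.cos(chi),
            math.sin(chi) * math.cos(theta),
            math.sin(chi) * math.sin(theta) * math.cos(phi),
            math.sin(chi) * math.sin(theta) * math.sin(phi))


if __name__ == "__main__":
    print(haar_total_mass(), 2 * math.pi ** 2)
```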
8.2 Topological facts.
Proposition 8.2.5 Suppose that $G$ is locally compact. If $f \in L$ then $f$ is uniformly left (and right) continuous. That is, given $\varepsilon > 0$ there is a neighborhood
$V$ of $e$ such that
$$s \in V \Rightarrow |f(sx) - f(x)| < \varepsilon.$$
Equivalently, this says that
$$xy^{-1} \in V \Rightarrow |f(x) - f(y)| < \varepsilon.$$
Proof. Let
$$C := \mathrm{Supp}(f)$$
and let $U$ be a symmetric compact neighborhood of $e$. Consider the set $Z$ of
points $s$ such that
$$|f(sx) - f(x)| < \varepsilon \quad \forall x \in UC.$$
I claim that this contains an open neighborhood $W$ of $e$. Indeed, for each fixed
$y \in UC$ the set of $s$ satisfying this condition at $y$ is an open neighborhood
$W_y$ of $e$, and this $W_y$ works in some neighborhood $O_y$ of $y$. Since $UC$ is compact, finitely many of these $O_y$ cover $UC$, and hence the intersection of the
corresponding $W_y$ forms an open neighborhood $W$ of $e$. Now take
$$V := U \cap W.$$
If $s \in V$ and $x \in UC$ then $|f(sx) - f(x)| < \varepsilon$. If $x \notin UC$, then $sx \notin C$
(since we chose $U$ to be symmetric) and $x \notin C$, so $f(sx) = 0$ and $f(x) = 0$, so
$|f(sx) - f(x)| = 0 < \varepsilon$. QED
In the construction of the Haar integral, we will need this proposition. So it
is exactly at this point where the assumption that G is locally compact comes
in.
8.3
mf
mg
mg
mf
mg
i ci mg
mf
.
mg
(8.7)
(8.8)
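It is clear that
$$(\ell_a f; g) = (f; g) \quad \forall a \in G \qquad (8.9)$$
$$(f_1 + f_2; g) \le (f_1; g) + (f_2; g) \qquad (8.10)$$
$$(cf; g) = c(f; g) \quad \forall c > 0 \qquad (8.11)$$
$$f_1 \le f_2 \Rightarrow (f_1; g) \le (f_2; g). \qquad (8.12)$$
If $f(x) \le \sum c_i g(s_i x)$ for all $x$ and $g(y) \le \sum d_j h(t_j y)$ for all $y$ then
$$f(x) \le \sum_{ij} c_i d_j h(t_j s_i x) \quad \forall x.$$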
ij
f0 6= 0.
Define
Ig (f ) :=
(f ; g)
.
(f0 ; g)
1
Ig (f ).
(f0 ; f )
(8.13)
214
S :=
Sf .
f L+ ,f 6=0
The idea is that I somehow is the limit of the Ig as we restrict the support of
g to lie in smaller and smaller neighborhoods of the identity. We shall prove
that as we make these neighborhoods smaller and smaller, the Ig are closer and
closer to being additive, and so their limit I satisfies the conditions for being
an invariant integral. Here are the details:
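Lemma 8.3.1 Given $f_1$ and $f_2$ in $L^+$ and $\varepsilon > 0$ there exists a neighborhood $V$
of $e$ such that
$$I_g(f_1) + I_g(f_2) \le I_g(f_1 + f_2) + \varepsilon$$
for all $g$ with $\mathrm{Supp}(g) \subset V$.
Proof. Choose $\varphi \in L$ such that $\varphi = 1$ on $\mathrm{Supp}(f_1 + f_2)$. For a given $\delta > 0$ to
be chosen later, let
$$f := f_1 + f_2 + \delta\varphi, \qquad h_1 := \frac{f_1}{f}, \quad h_2 := \frac{f_2}{f}.$$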
i = 1, 2.
cj [hi (s1
j ) + ]
cj [1 + 2].
and
$$I(f_1) + I(f_2) \le I_g(f_1) + I_g(f_2) + 2\varepsilon \le I_g(f_1 + f_2) + 3\varepsilon \le I(f_1 + f_2) + 4\varepsilon.$$
In short, $I$ satisfies
$$I(f_1 + f_2) = I(f_1) + I(f_2)$$
for all $f_1, f_2$ in $L^+$, is left invariant, and $I(cf) = cI(f)$ for $c \ge 0$. As usual,
extend $I$ to all of $L$ by
$$I(f_1 - f_2) = I(f_1) - I(f_2)$$
and this is well defined.
Since for f L+ we have
I(f ) (f ; f0 ) mf /mf0 = kf k /mf0
we see that $I$ is bounded in the sup norm. So it is an integral (by Dini's lemma).
Hence, by the Riesz representation theorem, if $G$ is Hausdorff, we get a regular
left invariant Borel measure $\mu$. This completes the existence part of the main
theorem.
From the fact that $\mu$ is regular, and not the zero measure, we conclude that
there is some compact set $K$ with $\mu(K) > 0$. Let $U$ be any non-empty open set;
replacing $U$ by $u^{-1}U$ for some $u \in U$ (which does not change its measure, by left
invariance) we may assume that $e \in U$. The translates $xU$, $x \in K$ cover $K$, and since
$K$ is compact, a finite number, say $n$ of them, cover $K$. But they all have the same
measure, $\mu(U)$, since $\mu$ is left invariant. Thus
$$\mu(K) \le n\mu(U)$$
implying
$$\mu(U) > 0 \quad \text{for any non-empty open set } U. \qquad (8.14)$$
8.4 Uniqueness.
Z
h(x, y)d(y)d(x) =
h(y 1 , xy)d(y)d(x).
R
From the definition of h the left hand side is f (x)d(x). For the right hand
side
g(x)
.
h(y 1 , xy) = f (y 1 ) R
g(ty 1 )d(t)
Integrating this first with respect to d(y) gives
kg(x)
where k is the constant
Z
k=
f (y 1 )
d(y).
g(ty 1 )d(t)
R
R
Now integrate with respect to . We get f d = k gd so
R
f d
R
,
gd
the right hand side of (8.16), does not depend on , since it equals k which is
expressed in terms of . QED
8.5
(G)
.
(K)
Let $n$ be such that we can find $n$ disjoint sets of the form $x_i K$ but no $n + 1$
disjoint sets of this form. This says that for any $x \in G$, $xK$ can not be disjoint
from all the $x_i K$. Thus
$$G = \bigcup_i x_i K K^{-1}.$$
8.6
(8.17)
where we have fixed, once and for all, a (left) Haar measure $\mu$. The left invariance
(under left multiplication by $x^{-1}$) implies that
$$(f \star g)(x) = \int f(y)g(y^{-1}x)\,d\mu(y). \qquad (8.18)$$
In what follows we will write $dy$ instead of $d\mu(y)$ since we have chosen a fixed
Haar measure $\mu$.
If $A := \mathrm{Supp}(f)$ and $B := \mathrm{Supp}(g)$ then $f(y)g(y^{-1}x)$ is continuous as a
function of $y$ for each fixed $x$ and vanishes unless $y \in A$ and $y^{-1}x \in B$. Thus
$f \star g$ vanishes unless $x \in AB$. Also
(8.19)
I claim that we have the associative law: If $f, g, h \in L$ then
$$(f \star g) \star h = f \star (g \star h). \qquad (8.20)$$
Indeed, using the left invariance of the Haar measure and Fubini we have
$$\begin{aligned} ((f \star g) \star h)(x) &:= \int (f \star g)(xy)h(y^{-1})\,dy \\ &= \iint f(xyz)g(z^{-1})h(y^{-1})\,dz\,dy \\ &= \iint f(xz)g(z^{-1}y)h(y^{-1})\,dz\,dy \\ &= \iint f(xz)g(z^{-1}y)h(y^{-1})\,dy\,dz \\ &= \int f(xz)(g \star h)(z^{-1})\,dz \\ &= (f \star (g \star h))(x). \end{aligned} \qquad (8.21)$$
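On a finite group with the discrete topology, counting measure is a Haar measure (Section 8.1.2), so convolution becomes a finite sum and the associativity (8.20) can be checked directly. The sketch below uses the symmetric group $S_3$ as an illustrative non-commutative example (the permutation encoding and function names are assumptions of the sketch). It compares formula (8.18) with the form $\int f(xy)g(y^{-1})\,dy$ used in the computation above, confirms associativity, and shows that $\delta_a \star \delta_b = \delta_{ab}$, so convolution is not commutative here.

```python
from itertools import permutations

G = list(permutations(range(3)))        # S_3; counting measure is a Haar measure


def compose(p, q):
    """Group product (p q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(3))


def inverse(p):
    inv = [0] * 3
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def conv(f, g):
    """(f * g)(x) = sum_y f(y) g(y^{-1} x): the finite analogue of (8.18)."""
    return {x: sum(f[y] * g[compose(inverse(y), x)] for y in G) for x in G}


def conv_right_form(f, g):
    """(f * g)(x) = sum_y f(x y) g(y^{-1}): the form used in the associativity proof."""
    return {x: sum(f[compose(x, y)] * g[inverse(y)] for y in G) for x in G}


def delta(a):
    """Indicator function of the single element a."""
    return {x: 1 if x == a else 0 for x in G}


f = {x: i + 1 for i, x in enumerate(G)}
g = {x: (-1) ** i * (i * i - 2) for i, x in enumerate(G)}
h = {x: 3 - 2 * i for i, x in enumerate(G)}

if __name__ == "__main__":
    print(conv(conv(f, g), h) == conv(f, conv(g, h)))
```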
8.7 The involution.
8.7.1 The modular function.
(8.22)
= .
0 1
x
In all cases the modular function is continuous (as follows from the uniform
right continuity, Proposition 8.2.5), and from its definition, it follows that
$$\Delta(st) = \Delta(s)\Delta(t).$$
In other words, $\Delta$ is a continuous homomorphism from $G$ to the multiplicative
group of positive real numbers.
The group $G$ is called unimodular if $\Delta \equiv 1$. For example, a commutative
group is obviously unimodular. Also, a compact group is unimodular, because
$G$ has finite measure, and is carried into itself by right multiplication so
$$\Delta(s)\mu(G) = (r_s\mu)(G) = \mu(r_{s^{-1}}(G)) = \mu(G).$$
8.7.2
(8.23)
(8.24)
Similarly,
(rs f )(x) = f (x1 s1 )(x1 ) = (s)`s (f)
or
(rs f ) = (s)`s f.
(8.25)
8.7.3 Relation to convolution.
We claim that
(f ? g) = g ? f
(8.27)
Proof.
(f ? g)(x)
=
=
=
8.7.4
(
g ? f)(x). QED
8.8
In general, the algebra $L^1(G)$ will not have an identity element, since the only
candidate for the identity element would be the $\delta$-function
$$\langle \delta, f \rangle = f(e),$$
and this will not be an honest function unless the topology of $G$ is discrete.
So we need to introduce a different algebra if we want to have an algebra with
identity. If $G$ were a Lie group we could consider the algebra of all distributions.
For a general locally compact Hausdorff group we can proceed as follows: Let
$M(G)$ denote the space of all finite complex measures on $G$: A non-negative
measure $\mu$ is called finite if $\mu(G) < \infty$. A real valued measure is called finite
if its positive and negative parts are finite, and a complex valued measure is
called finite if its real and imaginary parts are finite.
Given two finite measures $\mu$ and $\nu$ on $G$ we can form the product measure $\mu \otimes \nu$
on $G \times G$ and then push this measure forward under the multiplication
map
$$m : G \times G \to G,$$
and so define their convolution by
$$\mu \star \nu := m_*(\mu \otimes \nu).$$
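For finitely supported measures the pushforward definition $\mu \star \nu := m_*(\mu \otimes \nu)$ is just a double sum over pairs $(x, y)$, recorded at the point $xy$. The sketch below again uses the group $S_3$ with point masses as an illustrative case (the encoding and names are assumptions of the sketch). It checks that the point mass $\delta_e$ at the identity is a two-sided identity, that for measures with densities relative to counting (Haar) measure the result agrees with the function convolution $\sum_y f(y)g(y^{-1}x)$, and that total mass is multiplicative.

```python
from collections import defaultdict
from itertools import permutations

G = list(permutations(range(3)))
E = (0, 1, 2)                            # identity element of S_3


def compose(p, q):
    """Group product (p q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(3))


def inverse(p):
    inv = [0] * 3
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def convolve_measures(mu, nu):
    """mu * nu := m_*(mu x nu): push the product measure forward under (x, y) -> x y."""
    out = defaultdict(int)
    for x, a in mu.items():
        for y, b in nu.items():
            out[compose(x, y)] += a * b
    return {z: out[z] for z in G}


def convolve_functions(f, g):
    """Convolution of densities w.r.t. counting (Haar) measure: sum_y f(y) g(y^{-1} x)."""
    return {x: sum(f[y] * g[compose(inverse(y), x)] for y in G) for x in G}


delta_e = {x: (1 if x == E else 0) for x in G}
mu = {x: i + 1 for i, x in enumerate(G)}
nu = {x: 2 * i - 3 for i, x in enumerate(G)}

if __name__ == "__main__":
    print(convolve_measures(delta_e, mu) == mu)
```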
One checks that the convolution of two regular Borel measures is again a regular
Borel measure, and that on measures which are absolutely continuous with respect to Haar measure, this coincides with the convolution as previously defined.
One can also make the algebra of regular finite Borel measures under convolution into a Banach algebra (under the total variation norm). This algebra
does include the -function (which is a measure!) and so has an identity. I will
not go into this matter here except to make a number of vague but important
points.
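On a finite group every measure is a finite sum of point masses, and $m_*(\mu\times\nu)$ simply moves the product mass $\mu(\{x\})\nu(\{y\})$ to the point $xy$. A minimal sketch, assuming the cyclic group $\mathbb{Z}/n$ (written additively) and measures stored as Python dicts; it checks that the point mass at the identity is a unit and that the total variation norm is submultiplicative, as a Banach algebra norm should be.

```python
from collections import defaultdict

def convolve_measures(mu, nu, n):
    """mu * nu = m_*(mu x nu) on Z/n, where m(x, y) = x + y mod n.

    Measures are dicts {point: mass}; each product mass mu({x}) nu({y})
    is moved to the point m(x, y).
    """
    out = defaultdict(float)
    for x, mx in mu.items():
        for y, my in nu.items():
            out[(x + y) % n] += mx * my
    return dict(out)

def total_variation(mu):
    return sum(abs(m) for m in mu.values())

n = 5
mu = {0: 0.5, 2: 1.5}
nu = {1: 2.0, 4: -1.0}          # a signed (real) finite measure
delta_e = {0: 1.0}              # point mass at the identity

print(convolve_measures(delta_e, mu, n) == mu)   # True: delta_e is the identity
conv = convolve_measures(mu, nu, n)
print(sum(conv.values()))                        # 2.0 = mu(G) * nu(G)
print(total_variation(conv) <= total_variation(mu) * total_variation(nu))  # True
```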
8.8.1
An algebra $A$ is a vector space (over the complex numbers) together with a map
$$m : A\otimes A\to A$$
which is subject to various conditions (perhaps the associative law, perhaps the commutative law, perhaps the existence of the identity, etc.). The dual object would be a co-algebra, consisting of a vector space $C$ and a map
$$c : C\to C\otimes C$$
subject to a series of conditions dual to those listed above. If $A$ is finite dimensional, then we have an identification of $(A\otimes A)^*$ with $A^*\otimes A^*$, and so the dual space of a finite dimensional algebra is a coalgebra and vice versa. For infinite dimensional algebras or coalgebras we have to pass to certain topological completions.
For example, let $C_b(G)$ denote the space of continuous bounded functions on $G$ endowed with the uniform norm
$$\|f\|_\infty = \mathrm{l.u.b.}_{x\in G}\{|f(x)|\}.$$
We have a bounded linear map
$$c : C_b(G)\to C_b(G\times G)$$
given by
$$c(f)(x,y) := f(xy).$$
In the case that $G$ is finite, and endowed with the discrete topology, the space $C_b(G)$ is just the space of all functions on $G$, and $C_b(G\times G) = C_b(G)\otimes C_b(G)$, where $C_b(G)\otimes C_b(G)$ can be identified with the space of all functions on $G\times G$ of the form
$$(x,y)\mapsto\sum_i f_i(x)g_i(y)$$
where the sum is finite. In the general case, not every bounded continuous function on $G\times G$ can be written in the above form, but, by Stone-Weierstrass, the space of such functions is dense in $C_b(G\times G)$. So we can say that $C_b(G)$ is almost a co-algebra, or a co-algebra in the topological sense, in that the
for all $f\in C_b$.
Proof. We need to show that $f_n\searrow 0\Rightarrow\langle\ell, f_n\rangle\searrow 0$. Given $\epsilon > 0$, choose
:=
.
1 + 2kf1 k
So
1
.
2
This same inequality then holds with $f_1$ replaced by $f_n$ since the $f_n$ are monotone decreasing. We have the $K$ as in the definition of tightness, and by Dini's
lemma, we can choose $N$ so that
kf1 k
kfn k,K
2A
n > N.
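For a finite abelian group the decomposition of the co-multiplication $c(f)(x,y) = f(xy)$ into a finite sum $\sum_i f_i(x)g_i(y)$ can be written down explicitly using characters. A sketch, assuming $G = \mathbb{Z}/6$ (an arbitrary illustrative choice): expanding $f$ in the characters $\chi_k(x) = e^{2\pi ikx/n}$ gives $f(x+y) = \sum_k\hat f_k\,\chi_k(x)\chi_k(y)$.

```python
import numpy as np

n = 6
rng = np.random.default_rng(1)
f = rng.standard_normal(n)                     # a function on G = Z/6

# f(x) = sum_k fhat[k] * chi_k(x), with chi_k(x) = exp(2 pi i k x / n)
fhat = np.fft.fft(f) / n
pts = np.arange(n)
chi = np.exp(2j * np.pi * np.outer(pts, pts) / n)   # chi[k, x]

# c(f)(x, y) = f(x + y) written as a finite sum of products f_k(x) g_k(y)
cf = np.einsum("k,kx,ky->xy", fhat, chi, chi)
direct = f[(pts[:, None] + pts[None, :]) % n]
print(np.allclose(cf, direct))                 # True
```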
8.9
element $a$ sending the coset $xH$ into $axH$. By abuse of language, we will continue to denote this action by $\ell_a$. So
$$\ell_a(xH) := (ax)H.$$
We can consider the corresponding action on measures
$$\mu\mapsto\ell_{a*}\mu.$$
The measure $\mu$ is said to be invariant if
$$\ell_{a*}\mu = \mu\quad\forall a\in G.$$
The measure $\mu$ on $G/H$ is said to be relatively invariant with modulus $D$ if $D$ is a function on $G$ such that
$$\ell_{a*}\mu = D(a)\mu\quad\forall a\in G.$$
From its definition it follows that
$$D(ab) = D(a)D(b),$$
and it is not hard to see from the ensuing discussion that $D$ is continuous. We will only deal with positive measures here, so $D$ is a continuous homomorphism of $G$ into the multiplicative group of positive real numbers. We call such an object a positive character. The questions we want to address in this section are: what are the possible invariant or relatively invariant measures on $G/H$, and what are their modular functions?
For example, consider the $ax + b$ group acting on the real line. So $G$ is the $ax + b$ group, and $H$ is the subgroup consisting of those elements with $b = 0$, the pure rescalings. So $H$ is the subgroup fixing the origin in the real line, and we can identify $G/H$ with the real line. Let $N\subset G$ be the subgroup consisting of pure translations, so $N$ consists of those elements of $G$ with $a = 1$. The group $N$ acts as translations of the line, and (up to scalar multiple) the only measure on the real line invariant under all translations is Lebesgue measure, $dx$. But
$$h = \begin{pmatrix}a & 0\\ 0 & 1\end{pmatrix}$$
acts on the real line by sending $x\mapsto ax$ and hence
$$\ell_{h*}(dx) = a^{-1}dx.$$
(The push forward of the measure $\mu$ under the map $\phi$ assigns the measure $\mu(\phi^{-1}(A))$ to the set $A$.) So there is no measure on the real line invariant under $G$. On the other hand, the above formula shows that $dx$ is relatively invariant with modular function
$$D\begin{pmatrix}a & b\\ 0 & 1\end{pmatrix} = a^{-1}.$$
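The pushforward computation in the $ax + b$ example can be checked on intervals: if $\phi(t) = at + b$ with $a > 0$, then $\phi^{-1}([c,d])$ has length $(d - c)/a$. A small sketch, assuming group elements are stored as pairs $(a, b)$ standing for the matrix with rows $(a, b)$ and $(0, 1)$, with $a > 0$ (an illustrative encoding); it recovers $D = a^{-1}$ and checks the character property $D(gh) = D(g)D(h)$.

```python
def act(g, t):
    """Action of g = (a, b), i.e. the matrix [[a, b], [0, 1]], on t: t -> a t + b."""
    a, b = g
    return a * t + b

def compose(g, h):
    """Matrix product [[a1, b1], [0, 1]] @ [[a2, b2], [0, 1]]."""
    (a1, b1), (a2, b2) = g, h
    return (a1 * a2, a1 * b2 + b1)

def pushforward_of_dx(g, c, d):
    """(phi_g)_* dx evaluated on [c, d]: Lebesgue measure of phi_g^{-1}([c, d]), for a > 0."""
    a, b = g
    return (d - b) / a - (c - b) / a

def D(g):
    """Ratio of (phi_g)_* dx to dx, computed on the unit interval."""
    return pushforward_of_dx(g, 0.0, 1.0)

g, h = (2.0, 3.0), (0.5, -1.0)
print(D(g))                                    # 0.5, i.e. 1/a
print(D(compose(g, h)), D(g) * D(h))           # 1.0 1.0
print(act(compose(g, h), 0.7), act(g, act(h, 0.7)))   # both about 1.7
```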
(s)
(s)
s H.
(8.28)
f C0 (G)
(8.29)
Lemma 8.9.1 If $B$ is a compact subset of $G/H$ then there exists a compact set $A\subset G$ such that
$$\pi(A) = B.$$
Proof. Since we are assuming that $G$ is locally compact, we can find an open neighborhood $O$ of $e$ in $G$ whose closure $C$ is compact. The sets $\pi(xO)$, $x\in G$, are all open subsets of $G/H$ since $\pi$ is open, and they cover all of $G/H$. In particular, since $B$ is compact, finitely many of them cover $B$, so
$$B\subset\bigcup_i\pi(x_iO)\subset\bigcup_i\pi(x_iC) = \pi\Big(\bigcup_i x_iC\Big),$$
and
$$K := \bigcup_i x_iC$$
is compact, being the finite union of compact sets. The set $\pi^{-1}(B)$ is closed (since its complement is the inverse image of an open set, hence open). So
$$A := K\cap\pi^{-1}(B)$$
is compact, and its image is $B$. QED
Proposition 8.9.1 $J$ is surjective.
Proof. Let $F\in C_0(G/H)$ and let $B = \mathrm{Supp}(F)$. Choose a compact set $A\subset G$ with $\pi(A) = B$ as in the lemma. Choose $\phi\in C_0(G)$ with $\phi\ge 0$ and $\phi > 0$ on $A$. If
$$x\in AH = \pi^{-1}(B)$$
then $\phi(xh) > 0$ for some $h\in H$, and so $J(\phi) > 0$ on $B$. So we may extend the function
$$z\mapsto\frac{F(z)}{J(\phi)(z)}$$
to a continuous function, call it $\gamma$, by defining it to be zero outside $B = \mathrm{Supp}(F)$. The function $g = \gamma\circ\pi$, i.e.
$$g(x) = \gamma(\pi(x)),$$
is hence a continuous function on $G$, and hence
$$f := g\phi$$
is a continuous function of compact support on $G$. Since $g$ is constant on $H$ cosets,
$$J(f)(z) = \gamma(z)J(\phi)(z) = F(z).\qquad\text{QED}$$
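A discrete analogue of this proof can be run step by step. Here we assume, for illustration, $G = \mathbb{Z}/6$, $H = \{0, 3\}$, counting measures, and take $J(f)(xH) = \sum_{h\in H}f(xh)$ as our reading of $J$ in the finite case; the array `phi` plays the role of $\phi$ and `gamma` the role of $F/J(\phi)$.

```python
import numpy as np

n = 6
H = [0, 3]                                      # a subgroup of Z/6
cosets = sorted({frozenset((x + h) % n for h in H) for x in range(n)}, key=min)
coset_of = {x: i for i, C in enumerate(cosets) for x in C}   # the projection pi

def J(f):
    """Finite analogue of J: J(f)(xH) = sum_{h in H} f(x + h)."""
    return np.array([sum(f[(min(C) + h) % n] for h in H) for C in cosets])

F = np.array([2.0, -1.0, 0.5])                  # a function on G/H (three cosets)
phi = np.array([1.0, 0.0, 2.0, 0.0, 3.0, 1.0])  # phi >= 0 with J(phi) > 0 everywhere

gamma = F / J(phi)                              # gamma = F / J(phi) on G/H
f = np.array([gamma[coset_of[x]] * phi[x] for x in range(n)])   # f = (gamma o pi) * phi
print(J(phi))                                   # [1. 3. 3.]
print(np.allclose(J(f), F))                     # True: J(f) = F
```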
K(J((`a f )D))
D(a)1 K((J(`a (f D))))
D(a)1 K(`a (J(f D))))
D(a)1 D(a)K(J(f D)))
M (f ).
=
=
=
=
=
=
=
=
=
=
I(rh f )
K(J((rh f )D))
D(h)K(J(rh (f D)))
D(h)(h)K(J(f d))
D(h)(h)I(f ),
I(D1 `a (f ))
D(a)I(`a (D1 f ))
D(a)I(D1 f )
D(a)K(F ). QED
Chapter 9
(9.1)
Y
(x i e)
9.1 Maximal ideals.
9.1.1 Existence.
9.1.2
For any ring $R$ we let $\mathrm{Mspec}(R)$ denote the set of maximal (proper) two sided ideals of $R$. For any two sided ideal $I$ we let
$$\mathrm{Supp}(I) := \{M\in\mathrm{Mspec}(R) : I\subset M\}.$$
Notice that
$$\mathrm{Supp}(\{0\}) = \mathrm{Mspec}(R)$$
and
$$\mathrm{Supp}(R) = \emptyset.$$
For any family $I_\alpha$ of two sided ideals, a maximal ideal contains all of the $I_\alpha$ if and only if it contains the two sided ideal $\sum_\alpha I_\alpha$. In symbols
$$\bigcap_\alpha\mathrm{Supp}(I_\alpha) = \mathrm{Supp}\Big(\sum_\alpha I_\alpha\Big).$$
Thus the intersection of any collection of sets of the form $\mathrm{Supp}(I)$ is again of this form. Notice also that if
$$A = \mathrm{Supp}(I)$$
then
$$A = \mathrm{Supp}(J)\quad\text{where}\quad J = \bigcap_{M\in A}M.$$
$$A\cup B = \mathrm{Supp}\Big(\bigcap_{M\in A\cup B}M\Big). \qquad (9.2)$$
Indeed, if $N$ is a maximal ideal belonging to $A\cup B$ then it contains the intersection on the right hand side of (9.2), so the right hand side contains the left. We must show the reverse inclusion. So suppose the contrary. This means that there is a maximal ideal $N$ which contains the intersection on the right but does not belong to either $A$ or $B$. Since $N$ does not belong to $A$ (and $A = \mathrm{Supp}(J(A))$), the ideal $J(A) := \bigcap_{M\in A}M$ is not contained in $N$, so $J(A) + N = R$, and hence there exist $a\in J(A)$ and $m\in N$ such that $a + m = e$. Similarly, there exist $b\in J(B)$ and $n\in N$ such that $b + n = e$. But then
$$e = e^2 = (a+m)(b+n) = ab + an + mb + mn.$$
Each of the last three terms on the right belongs to $N$ since it is a two sided ideal, and so does $ab$ since
$$ab\in\Big(\bigcap_{M\in A}M\Big)\Big(\bigcap_{M\in B}M\Big)\subset\bigcap_{M\in A\cup B}M\subset N.$$
Hence $e\in N$, which is impossible since $N$ is a proper ideal.
(For the case of commutative rings, a major advance was to replace maximal
ideals by prime ideals in the preceding construction - giving rise to the notion
of Spec(R) - the prime spectrum of a commutative ring. But the motivation for
this development in commutative algebra came from these constructions in the
theory of Banach algebras.)
9.1.3
9.1.4
Let $S$ be a compact Hausdorff space, and let $C(S)$ denote the ring of continuous complex valued functions on $S$. For each $p\in S$, the map $C(S)\to\mathbb{C}$ given by
$$f\mapsto f(p)$$
is a surjective homomorphism. The kernel of this map consists of all $f$ which vanish at $p$. By the preceding proposition, this is then a maximal ideal, which we shall denote by $M_p$.
Theorem 9.1.2 If $I$ is a proper ideal of $C(S)$, then there is a point $p\in S$ such that
$$I\subset M_p.$$
In particular every maximal ideal in $C(S)$ is of the form $M_p$, so we may identify $\mathrm{Mspec}(C(S))$ with $S$ as a set. This identification is a homeomorphism between the original topology of $S$ and the topology given above on $\mathrm{Mspec}(C(S))$.
Proof. Suppose that for every $p\in S$ there is an $f\in I$ such that $f(p)\neq 0$. Then $|f|^2 = f\bar f\in I$ and $|f(p)|^2 > 0$ and $|f|^2\ge 0$ everywhere. Thus each point of $S$ is contained in a neighborhood $U$ for which there exists a $g\in I$ with $g\ge 0$ everywhere, and $g > 0$ on $U$. Since $S$ is compact, we can cover $S$ with finitely many such neighborhoods. If we take $h$ to be the sum of the corresponding $g$'s, then $h\in I$ and $h > 0$ everywhere. So $h^{-1}\in C(S)$ and $e = 1 = hh^{-1}\in I$, so $I = C(S)$, a contradiction. This proves the first part of the theorem.
To prove the last statement, we must show that the closure of any subset $A\subset S$ in the original topology coincides with its closure in the topology derived from the maximal ideal structure. That is, we must show that
$$\text{closure of } A \text{ in the topology of } S = \mathrm{Supp}\Big(\bigcap_{M\in A}M\Big).$$
Now (identifying each point $p\in A$ with $M_p$)
$$\bigcap_{M\in A}M$$
consists exactly of all continuous functions which vanish at all points of $A$. Any such function must vanish on the closure of $A$ in the topology of $S$. So the left hand side of the above equation is contained in the right hand side. We must show the reverse inclusion. Suppose $p\in S$ does not belong to the closure of $A$ in the topology of $S$. Then Urysohn's Lemma asserts that there is an $f\in C(S)$ which vanishes on $A$ and $f(p)\neq 0$. Thus $p\notin\mathrm{Supp}\big(\bigcap_{M\in A}M\big)$. QED
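The covering argument in the first part of the proof can be watched in the simplest case. The sketch assumes $S = [0,1]$ (sampled on a grid) and the ideal $I$ generated by $f_1(t) = t$ and $f_2(t) = 1 - t$, illustrative choices with no common zero: then $h = |f_1|^2 + |f_2|^2\in I$ is positive, and $1 = (\bar f_1/h)f_1 + (\bar f_2/h)f_2$ exhibits the identity as an element of $I$.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 1001)           # sample points of S = [0, 1]
f1, f2 = t, 1.0 - t                        # generators of I; no common zero on S
h = np.abs(f1) ** 2 + np.abs(f2) ** 2      # conj(f1) f1 + conj(f2) f2, an element of I
print(h.min())                             # about 0.5: h > 0, so 1/h is continuous

a1, a2 = np.conj(f1) / h, np.conj(f2) / h  # continuous coefficients
one = a1 * f1 + a2 * f2                    # a1 f1 + a2 f2 lies in I ...
print(np.allclose(one, 1.0))               # ... and equals 1, so I = C(S)
```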
Theorem 9.1.3 Let $I$ be an ideal in $C(S)$ which is closed in the uniform topology on $C(S)$. Then
$$I = \bigcap_{M\in\mathrm{Supp}(I)}M.$$
Proof. $\mathrm{Supp}(I)$ consists of all points $p$ such that $f(p) = 0$ for all $f\in I$. Since $f$ is continuous, the set of zeros of $f$ is closed, and hence $\mathrm{Supp}(I)$, being the intersection of such sets, is closed. Let $O$ be the complement of $\mathrm{Supp}(I)$ in $S$. Then $O$ is a locally compact space, and the elements of $\bigcap_{M\in\mathrm{Supp}(I)}M$, when restricted to $O$, consist of all functions which vanish at infinity. $I$, when restricted to $O$, is a uniformly closed subalgebra of this algebra. If we could show that the elements of $I$ separate points in $O$ (and do not all vanish at any point of $O$), then the Stone-Weierstrass theorem would tell us that $I$ consists of all continuous functions on $O$ which vanish at infinity, i.e. all continuous functions which vanish on $\mathrm{Supp}(I)$, which is the assertion of the theorem. So let $p$ and $q$ be distinct points of $O$, and let $f\in C(S)$ vanish on $\mathrm{Supp}(I)$ and at $q$ with $f(p) = 1$. Such a function exists by Urysohn's Lemma, again. Let $g\in I$ be such that $g(p)\neq 0$. Such a $g$ exists by the definition of $\mathrm{Supp}(I)$. Then $gf\in I$, $(gf)(q) = 0$, and $(gf)(p)\neq 0$. QED
9.2 Normed algebras.
A normed algebra is an algebra (over the complex numbers) which has a norm as a vector space which satisfies
$$\|xy\|\le\|x\|\|y\|. \qquad (9.3)$$
Since $e = ee$ this implies that
$$\|e\|\le\|e\|^2$$
so (as $e\neq 0$)
$$\|e\|\ge 1.$$
Consider the new norm
$$\|y\|_N := \mathrm{lub}_{\|x\|\neq 0}\,\|yx\|/\|x\|.$$
This still satisfies (9.3). Indeed, if $x$, $y$, and $z$ are such that $yz\neq 0$ then
$$\frac{\|xyz\|}{\|z\|} = \frac{\|xyz\|}{\|yz\|}\cdot\frac{\|yz\|}{\|z\|}\le\|x\|_N\|y\|_N,$$
and the inequality
$$\frac{\|xyz\|}{\|z\|}\le\|x\|_N\|y\|_N$$
is certainly true if $yz = 0$. So taking the sup over all $z\neq 0$ we see that
$$\|xy\|_N\le\|x\|_N\|y\|_N.$$
(9.4)
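For the matrix algebra with the Frobenius norm, which satisfies (9.3) but has $\|I\|_F = \sqrt n > 1$, the renormalized norm $\|y\|_N$ is the largest singular value of $y$, and $\|I\|_N = 1$. A numerical sketch under that assumption; the random sampling only produces lower bounds for the supremum.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

def fro(m):
    """Frobenius norm: submultiplicative, but ||I|| = sqrt(n)."""
    return np.linalg.norm(m, "fro")

def norm_N(y):
    """||y||_N = sup_{x != 0} ||y x||_F / ||x||_F, which equals the largest singular value."""
    return np.linalg.norm(y, 2)

I = np.eye(n)
print(fro(I), norm_N(I))                                 # 2.0 and (up to rounding) 1.0

y = rng.standard_normal((n, n))
x = rng.standard_normal((n, n))
samples = rng.standard_normal((2000, n, n))
best_ratio = max(fro(y @ s) / fro(s) for s in samples)
print(best_ratio <= norm_N(y) + 1e-12)                   # True: samples never exceed the sup
print(norm_N(x @ y) <= norm_N(x) * norm_N(y) + 1e-12)    # True: (9.3) for ||.||_N
print(norm_N(y) <= fro(y))                               # True: the new norm is no larger
```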
9.3
Proof. For each $x\in A$, the values assumed at $x$ by the elements $\ell\in B$ lie in the closed disk $D_{\|x\|}$ of radius $\|x\|$ in $\mathbb{C}$. Thus
$$B\subset\prod_{x\in A}D_{\|x\|}$$
and h(e) = 1.
(9.5)
9.3.1
xn
n=1
xn .
n=1
n
X
xi .
Then if m < n
ksm sn k
n
X
m+1
1
0
1 kxk
kxk
.
1 kxk
(9.6)
1
.
ky 1 k
kxk
.
(a kxk)a
(9.7)
Thus the set of elements having inverses is open and the map $x\mapsto x^{-1}$ is continuous on its domain of definition.
Proof. If $\|x\| < \|y^{-1}\|^{-1}$ then
$$\|y^{-1}x\|\le\|y^{-1}\|\|x\| < 1.$$
Hence $e + y^{-1}x$ has an inverse by the previous proposition. Hence $y + x = y(e + y^{-1}x)$ has an inverse. Also
$$(y+x)^{-1} - y^{-1} = \left[(e + y^{-1}x)^{-1} - e\right]y^{-1} = (y^{-1}x)'\,y^{-1}$$
where $(y^{-1}x)'$ is the adverse of $y^{-1}x$.
From (9.6) and the above expression for $(x+y)^{-1} - y^{-1}$ we see that
$$\|(x+y)^{-1} - y^{-1}\|\le\|(y^{-1}x)'\|\,\|y^{-1}\|\le\frac{\|x\|\,\|y^{-1}\|^2}{1 - \|x\|\,\|y^{-1}\|} = \frac{\|x\|}{a(a - \|x\|)},$$
where $a = \|y^{-1}\|^{-1}$. QED
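The estimate can be tested on matrices, with the operator norm playing the role of the Banach algebra norm. A sketch, assuming random $5\times 5$ matrices (shifted by $5I$ so that $y$ is, almost surely, comfortably invertible): with $a = \|y^{-1}\|^{-1}$ and $\|x\| < a$, the difference $(y+x)^{-1} - y^{-1}$ stays within $\|x\|/(a(a - \|x\|))$.

```python
import numpy as np

rng = np.random.default_rng(2)

def op(m):
    """Operator (spectral) norm."""
    return np.linalg.norm(m, 2)

y = rng.standard_normal((5, 5)) + 5 * np.eye(5)
yinv = np.linalg.inv(y)
a = 1.0 / op(yinv)                              # a = ||y^{-1}||^{-1}

results = []
for scale in [0.1, 0.5, 0.9]:
    x = rng.standard_normal((5, 5))
    x *= scale * a / op(x)                      # now ||x|| = scale * a < a
    lhs = op(np.linalg.inv(y + x) - yinv)
    bound = op(x) / (a * (a - op(x)))
    results.append(bool(lhs <= bound))
print(results)                                  # [True, True, True]
```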
Proposition 9.3.3 If $I$ is a proper ideal then $\|e - x\|\ge 1$ for all $x\in I$.
Proof. Otherwise there would be some $x\in I$ with $\|e - x\| < 1$, so that $e - x$ has an adverse, i.e. $x$ has an inverse. But then $e = x^{-1}x\in I$, which contradicts the hypothesis that $I$ is proper. QED
Proposition 9.3.4 The closure of a proper ideal is proper. In particular, every maximal ideal is closed.
Proof. The closure of an ideal $I$ is clearly an ideal, and all elements in the closure still satisfy $\|e - x\|\ge 1$, so $e$ is not in the closure and the closure is proper. QED
Proposition 9.3.5 If $I$ is a closed ideal in $A$ then $A/I$ is again a Banach algebra.
Proof. The quotient of a Banach space by a closed subspace is again a Banach space. The norm on $A/I$ is given by
$$\|X\| = \inf_{x\in X}\|x\|,$$
so
$$\|XY\|\le\inf_{x\in X,\,y\in Y}\|xy\|\le\inf_{x\in X,\,y\in Y}\|x\|\|y\| = \|X\|\|Y\|.$$
Also, if $E$ is the coset containing $e$ then $E$ is the identity element for $A/I$ and so
$$\|E\|\le 1.$$
But we know that this implies that $\|E\| = 1$. QED
Suppose that A is commutative and M is a maximal ideal of A. We know
that A/M is a field, and the preceding proposition implies that A/M is a normed
field containing the complex numbers. The following famous result implies that
A/M is in fact norm isomorphic to C. It deserves a subsection of its own:
The Gelfand-Mazur theorem.
A division algebra is a (possibly not commutative) algebra in which every nonzero element has an inverse.
Theorem 9.3.3 Every normed division algebra over the complex numbers is isometrically isomorphic to the field of complex numbers.
Proof. Let $A$ be the normed division algebra and $x\in A$. We must show that $x = \lambda e$ for some complex number $\lambda$. Suppose not. Then by the definition of a division algebra, $(x - \lambda e)^{-1}$ exists for all $\lambda\in\mathbb{C}$ and all these elements commute. Thus
$$(x - (\lambda+h)e)^{-1} - (x - \lambda e)^{-1} = h\,(x - (\lambda+h)e)^{-1}(x - \lambda e)^{-1}$$
as can be checked by multiplying both sides of this equation on the right by $x - \lambda e$ and on the left by $x - (\lambda+h)e$. Thus the strong derivative of the function
$$\lambda\mapsto(x - \lambda e)^{-1}$$
exists and is given by the usual formula $(x - \lambda e)^{-2}$. In particular, for any $\ell\in A^*$ the function
$$\lambda\mapsto\ell\left((x - \lambda e)^{-1}\right)$$
is analytic on the entire complex plane. On the other hand for $\lambda\neq 0$ we have
$$(x - \lambda e)^{-1} = \lambda^{-1}(\lambda^{-1}x - e)^{-1}$$
9.3.2
9.3.3
(9.8)
The right hand side of (9.8) makes sense in any algebra, and is called the spectral radius of $x$ and is denoted by $|x|_{sp}$. We claim that
Theorem 9.3.4 In any Banach algebra we have
$$|x|_{sp} = \lim_{n\to\infty}\|x^n\|^{1/n}. \qquad (9.9)$$
and so
X
0
(x) =
(x)n
1
`(xn )n
converges on this disk. Here we use the fact that the Taylor series of a function
of a complex variable converges on any disk contained in the region where it is
analytic. Thus
$$|\ell(\lambda^nx^n)|\to 0$$
for each fixed $\ell\in A^*$ if $|\lambda| < 1/|x|_{sp}$. Considered as a family of linear functions of $\ell$, we see that
$$\ell\mapsto\ell(\lambda^nx^n)$$
is bounded for each fixed $\ell$, and hence by the uniform boundedness principle, there exists a constant $K$ such that
$$\|\lambda^nx^n\| < K$$
for all $n$, for each $\lambda$ in this disk; in other words
$$\|x^n\|^{1/n}\le K^{1/n}(1/|\lambda|)$$
so
1
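The spectral radius formula (9.9) is easy to watch numerically in a matrix algebra. A sketch, assuming the operator norm and an illustrative non-normal upper triangular matrix: its norm is about 10 while its spectral radius is 0.5, yet $\|A^n\|^{1/n}$ decreases toward 0.5.

```python
import numpy as np

A = np.array([[0.5, 10.0],
              [0.0, 0.3]])                      # upper triangular, not normal
spec_radius = max(abs(np.linalg.eigvals(A)))    # 0.5
approx = {n: np.linalg.norm(np.linalg.matrix_power(A, n), 2) ** (1.0 / n)
          for n in (1, 10, 100, 400)}
for n, r in approx.items():
    print(n, r)     # ||A|| is about 10, but the n-th roots decrease toward 0.5
```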
9.3.4
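We repeat the proof: Since $L^1(G)\subset L^2(G)$ this sum converges and
$$\sum_{x\in G}|(f\star g)(x)|\le\sum_{x,y\in G}|f(xy^{-1})|\,|g(y)| = \sum_{y\in G}|g(y)|\sum_{x\in G}|f(xy^{-1})| = \Big(\sum_{y\in G}|g(y)|\Big)\Big(\sum_{w\in G}|f(w)|\Big),$$
i.e.
$$\|f\star g\|\le\|f\|\|g\|.$$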
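For $G = \mathbb{Z}$ and finitely supported $f, g$, the convolution $(f\star g)(x) = \sum_y f(x - y)g(y)$ is ordinary sequence convolution, so the norm inequality just proved can be checked with `numpy.convolve`. A sketch; the random data and supports are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
f = rng.standard_normal(7)        # f supported on {0, ..., 6}
g = rng.standard_normal(5)        # g supported on {0, ..., 4}
fg = np.convolve(f, g)            # (f * g)(x) = sum_y f(x - y) g(y), for x = 0, ..., 10

def l1(v):
    return np.abs(v).sum()

print(l1(fg) <= l1(f) * l1(g))    # True
```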
If $\delta_x\in L^1(G)$ is defined by
$$\delta_x(t) = \begin{cases}1 & \text{if } t = x\\ 0 & \text{otherwise}\end{cases}$$
then
$$\delta_x\star\delta_y = \delta_{xy}.$$
We know that the most general continuous linear function on $L^1(G)$ is obtained from multiplication by an element $\phi$ of $L^\infty(G)$ and then integrating (= summing). That is, it is given by
$$f\mapsto\sum_{x\in G}f(x)\phi(x)$$
continuous function on $G$.
For example, if $G = \mathbb{Z}$ under addition, the condition to be a character says that
$$\phi(m+n) = \phi(m)\phi(n),\qquad|\phi|\le 1.$$
So
$$\phi(n) = \phi(1)^n$$
where
$$\phi(1) = e^{i\theta}$$
for some $\theta\in\mathbb{R}/(2\pi\mathbb{Z})$ (applying $|\phi|\le 1$ to both $\phi(1)$ and $\phi(-1) = \phi(1)^{-1}$ forces $|\phi(1)| = 1$). Thus
$$\hat f(\theta) = \sum_{n\in\mathbb{Z}}f(n)e^{in\theta}$$
is just the Fourier series with coefficients $f(n)$. The image of the Gelfand transform is just the set of Fourier series which converge absolutely. We conclude from Theorem 9.3.5 that if $F$ is an absolutely convergent Fourier series which vanishes nowhere, then $1/F$ has an absolutely convergent Fourier series. Before Gelfand, this was a deep theorem of Wiener.
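Wiener's theorem can be seen concretely in an example. Assume, for illustration, $f(0) = 2$ and $f(\pm 1) = 1/2$, so $F(\theta) = 2 + \cos\theta\ge 1$. A standard computation gives $1/F(\theta) = \sum_n c_ne^{in\theta}$ with $c_n = \rho^{|n|}/\sqrt3$, $\rho = \sqrt3 - 2$, so the coefficients decay geometrically and their absolute values sum to 1. The sketch recovers them with a discrete Fourier transform; with $N$ sample points the aliasing error is of order $|\rho|^N$.

```python
import numpy as np

N = 64
theta = 2 * np.pi * np.arange(N) / N
F = 2 + np.cos(theta)      # f(0) = 2, f(1) = f(-1) = 1/2; F >= 1 never vanishes

# Coefficients c_n with 1/F(theta) = sum_n c_n e^{i n theta}; index -n is stored at N - n.
c = np.fft.fft(1.0 / F) / N

rho = np.sqrt(3) - 2
print(c[0].real, 1 / np.sqrt(3))        # equal up to rounding
print(c[1].real, rho / np.sqrt(3))      # equal up to rounding
print(np.abs(c).sum())                  # about 1.0: the series converges absolutely
```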
To deal with the version of this theorem which treats the Fourier transform
rather than Fourier series, we would have to consider algebras which do not
have an identity element. Most of what we did goes through with only mild
modifications, but I will not go into this, as my goals are elsewhere.
9.4 Self-adjoint algebras.
(f ) = f .
For example, if $A$ is the algebra of bounded operators on a Hilbert space, then the map $T\mapsto T^*$ sending every operator to its adjoint is an example of an involutory anti-automorphism. Another example is $L^1(G)$ under convolution, for a locally compact Hausdorff group $G$, where the involution was the map
$$f\mapsto\tilde f.$$
If A is a semi-simple self-adjoint commutative Banach algebra, the map
x 7 x is an involutory anti-automorphism. It has this further property:
f = f 1 + f2
is invertible.
ag 2 (a2 b2 )g
b(a2 + b2 )
satisfies
h = h.
We have
h(M ) =
So
1 + h(M )2 = 0
contradicting the hypothesis that 1 + h2 is invertible. Now let us apply this
1
f . We have
result to 21 f and to 2i
f = g + ih
where g =
1
1
(f + f ), h = (f f )
2
2i
f A.
(9.11)
(9.12)
(9.13)
kf 2 k = kf k2
= kf 2 + c2 ek kf k2 + c2 .
This says that
a2 + b2 + 2bc + c2 kf k2 + c2
which is impossible if we choose c so that 2bc > kf k2 .
So we have proved that = . Now by definition, if f (M ) = f (N ) for all
$f\in A$, the maximal ideals $M$ and $N$ coincide. So the images of elements of $A$ under the Gelfand transform separate points of $\mathrm{Mspec}(A)$. But every $f\in A$ can be written as
$$f = \frac12(f + f^*) + i\,\frac{1}{2i}(f - f^*),$$
i.e. as a sum $g + ih$ where $\hat g$ and $\hat h$ are real valued. Hence the real valued functions of the form $\hat g$ separate points of $\mathrm{Mspec}(A)$. Hence by the Stone-Weierstrass theorem we know that the image of the Gelfand transform is dense in $C(\mathrm{Mspec}(A))$. Since $A$ is complete and the Gelfand transform is norm preserving, we conclude that the Gelfand transform is surjective. QED
9.4.1 An important generalization.
A Banach algebra with an involution such that (9.11) holds is called a $C^*$ algebra. Notice that we are not assuming that this Banach algebra is commutative. But an element $x$ of such an algebra is called normal if
$$xx^* = x^*x,$$
in other words if $x$ does commute with $x^*$. Then we can repeat the argument at the beginning of the proof of Theorem 9.4.2 to conclude that if $x$ is a normal element of a $C^*$ algebra, then
$$\|x^{2^k}\| = \|x\|^{2^k}. \qquad (9.14)$$
1
(x ae)
b
9.4.2 An important application.
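$$\|T^*T\| = \|T\|^2. \qquad (9.16)$$
Proof.
$$\|T^*T\| = \sup_{\|\phi\|=1}\|T^*T\phi\| = \sup_{\|\phi\|=1,\|\psi\|=1}|(T^*T\phi,\psi)| = \sup_{\|\phi\|=1,\|\psi\|=1}|(T\phi,T\psi)|\ge\sup_{\|\phi\|=1}(T\phi,T\phi) = \|T\|^2$$
so
$$\|T\|^2\le\|T^*T\|\le\|T^*\|\|T\|$$
so
$$\|T\|\le\|T^*\|.$$
Reversing the role of $T$ and $T^*$ gives the reverse inequality, so $\|T\| = \|T^*\|$. Inserting into the preceding inequality gives
$$\|T\|^2\le\|T^*T\|\le\|T\|^2$$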
9.5. THE SPECTRAL THEOREM FOR BOUNDED NORMAL OPERATORS, FUNCTIONAL CALCULUS FORM
so we have the equality (9.16). QED
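In finite dimensions (9.16) and the equality $\|T^*\| = \|T\|$ can be checked directly. A sketch, assuming a random complex $5\times 5$ matrix, the standard inner product on $\mathbb{C}^5$ (so that $T^*$ is the conjugate transpose), and the operator norm computed by `numpy.linalg.norm(..., 2)`.

```python
import numpy as np

rng = np.random.default_rng(4)
T = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
T_star = T.conj().T                    # adjoint for the standard inner product

def op(m):
    """Operator norm (largest singular value)."""
    return np.linalg.norm(m, 2)

print(np.isclose(op(T_star @ T), op(T) ** 2))   # True: (9.16)
print(np.isclose(op(T_star), op(T)))            # True: ||T*|| = ||T||
```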
Thus the map $T\mapsto T^*$ sending every bounded operator on a Hilbert space into its adjoint is an anti-involution on the Banach algebra of all bounded operators, and it satisfies (9.11). We can thus apply Theorem 9.4.2 to conclude:
Theorem 9.4.4 Let $B$ be any commutative subalgebra of the algebra of bounded operators on a Hilbert space which is closed in the strong topology and with the property that $T\in B\Rightarrow T^*\in B$. Then the Gelfand transform $T\mapsto\hat T$ gives a norm preserving isomorphism of $B$ with $C(M)$ where $M = \mathrm{Mspec}(B)$. Furthermore, $\widehat{T^*} = \overline{\hat T}$ for all $T\in B$. In particular, if $T$ is self-adjoint, then $\hat T$ is real valued.
9.5
M 3 h 7 h(T ),
(9.17)
9.5.1
(9.18)
just apply the Stone-Weierstrass theorem to conclude (9.18) for all f . If f is
real then f = f and therefore (f ) = (f ) . If f 0 then we can find a real
valued g C((T ) such that f = g 2 and the square of a self-adjoint operator is
non-negative. QED
In view of this theorem, there is a more suggestive notation for the map .
Since the image of the monomial z is T , and since the image of any polynomial
P (thought of as a function on (T )) is P (T ), we are safe in using the notation
f (T ) := (f )
for any f C((T )).
9.5.2
$$\mathrm{Spec}_B(T) = \mathrm{Spec}_A(T). \qquad (9.19)$$
Remarks:
1. Applied to the case where $A$ is the algebra of all bounded operators on a Hilbert space, and where $B$ is the closed subalgebra generated by $I$, $T$ and $T^*$, we get the spectral theorem for normal operators as promised.
2. If $x - ze$ has no inverse in $A$ it has no inverse in $B$. So
$$\mathrm{Spec}_A(x)\subset\mathrm{Spec}_B(x).$$
We must show the reverse inclusion. We begin by formulating some general results and introducing some notation.
For any associative algebra A we let G(A) denote the set of elements of A
which are invertible (the group-like elements).
Proposition 9.5.1 Let $B$ be a Banach algebra, and let $x_n\in G(B)$ be such that $x_n\to x$ and $x\notin G(B)$. Then
$$\|x_n^{-1}\|\to\infty.$$
Proof. Suppose not. Then there is some $C > 0$ and a subsequence of elements (which we will relabel as $x_n$) such that
$$\|x_n^{-1}\| < C.$$
Then
$$x = x_n\left(e + x_n^{-1}(x - x_n)\right)$$
with $x - x_n\to 0$. In particular, for $n$ large enough
$$\|x_n^{-1}\|\,\|x - x_n\| < 1,$$
so $e + x_n^{-1}(x - x_n)$ is invertible, as is $x_n$, and so $x$ is invertible, contrary to hypothesis. QED
9.5.3
I started out this chapter with the general theory of Banach algebras, went to the Gelfand representation theorem, the special properties of $C^*$ algebras, and then some general facts about how the spectrum of an element can vary with the algebra containing it. I took this route because of the impact the Gelfand representation theorem had on the course of mathematics, especially in algebraic geometry. But the key ideas are
• (9.9), which, for a bounded operator $T$ on a Banach space, says that
$$\max_{\lambda\in\mathrm{Spec}(T)}|\lambda| = \lim_{n\to\infty}\|T^n\|^{1/n},$$
• (9.16), which says that if $T$ is a bounded operator on a Hilbert space then $\|T^*T\| = \|T\|^2$, and
• if $T$ is a bounded operator on a Hilbert space and $TT^* = T^*T$, then it follows from (9.16) that
$$\|T^{2^k}\| = \|T\|^{2^k}.$$
We could prove these facts by the arguments given above and conclude that if $T$ is a normal bounded operator on a Hilbert space then
$$\max_{\lambda\in\mathrm{Spec}(T)}|\lambda| = \|T\|. \qquad (9.20)$$
$$\|P(T)\| = \max_{\lambda\in\mathrm{Spec}(T)}|P(\lambda)|. \qquad (9.21)$$
The norm on the right is the restriction to polynomials of the uniform norm $\|\cdot\|_\infty$ on the space $C(\mathrm{Spec}(T))$.
Now the map
$$P\mapsto P(T)$$
is a homomorphism of the ring of polynomials (in $\lambda$ and $\bar\lambda$) into bounded normal operators on our Hilbert space satisfying
$$\bar P\mapsto P(T)^*$$
and
$$\|P(T)\| = \|P\|_{\infty,\mathrm{Spec}(T)}.$$
The Stone-Weierstrass theorem then allows us to conclude that this homomorphism extends to the ring of continuous functions on $\mathrm{Spec}(T)$ with all the properties stated in Theorem 9.5.1.
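A finite-dimensional sketch of (9.20) and (9.21). It assumes a normal matrix $T = U\,\mathrm{diag}(\lambda)\,U^*$ built from a unitary $U$ (obtained from a QR factorization) and the illustrative polynomial $P(z) = z^2 - 3z + 1$: the operator norms of $T$ and $P(T)$ match the maxima of $|\lambda|$ and $|P(\lambda)|$ over the spectrum. A nilpotent matrix shows that normality cannot be dropped.

```python
import numpy as np

rng = np.random.default_rng(5)
Z = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, _ = np.linalg.qr(Z)                           # a unitary matrix
lam = np.array([1.0, -2.0 + 1.0j, 0.5j, 3.0])
T = U @ np.diag(lam) @ U.conj().T                # normal, with Spec(T) = lam

def op(m):
    return np.linalg.norm(m, 2)

def P_scalar(z):
    """Illustrative polynomial P(z) = z^2 - 3z + 1."""
    return z**2 - 3 * z + 1

def P_matrix(M):
    """The same polynomial applied to a square matrix."""
    return M @ M - 3 * M + np.eye(len(M))

print(np.allclose(T @ T.conj().T, T.conj().T @ T))           # True: T is normal
print(np.isclose(op(T), max(abs(lam))))                       # True: (9.20)
print(np.isclose(op(P_matrix(T)), max(abs(P_scalar(lam)))))   # True: (9.21)

# Without normality (9.20) fails: this nilpotent matrix has Spec = {0} but norm 1.
Nil = np.array([[0.0, 1.0], [0.0, 0.0]])
print(op(Nil), max(abs(np.linalg.eigvals(Nil))))              # norm 1, spectral radius 0
```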
Chapter 10
and P (U ) = P (U )
U 7 (P (U )x, y)
10.1
$$(Tx, y) = \int_M\hat T\,d\mu_{x,y},\qquad\hat T\in C(M).$$
Thus, for each fixed Borel set $U\subset M$, its measure $\mu_{x,y}(U)$ depends linearly on $x$ and anti-linearly on $y$. We have
$$\mu_{x,y}(M) = \int_M 1\,d\mu_{x,y} = (ex, y) = (x, y)$$
so
$$|\mu_{x,y}(M)|\le\|x\|\|y\|.$$
So if $f$ is any bounded Borel function on $M$, the integral
$$\int_M f\,d\mu_{x,y}$$
and
$$\|O(f)\|\le\|f\|_\infty.$$
On continuous functions we have
$$O(\hat T) = T$$
so $O$ is an extension of the inverse of the Gelfand transform from continuous functions to bounded Borel functions. So we know that $O$ is multiplicative and takes complex conjugation into adjoint when restricted to continuous functions. Let us prove these facts for all Borel functions. If $f$ is real we know that $(O(f)y, x)$ is the complex conjugate of $(O(f)x, y)$ since $\mu_{y,x} = \overline{\mu_{x,y}}$. Hence $O(f)$ is self-adjoint if $f$ is real, from which we deduce that
$$O(\bar f) = O(f)^*.$$
Now to the multiplicativity: For $S, T\in B$ we have
$$\int_M\hat S\hat T\,d\mu_{x,y} = (STx, y) = \int_M\hat S\,d\mu_{Tx,y}.$$
Since this holds for all $\hat S\in C(M)$ (for fixed $T, x, y$) we conclude by the uniqueness of the measure that
$$\mu_{Tx,y} = \hat T\mu_{x,y}.$$
Therefore, for any bounded Borel function $f$ we have
$$(Tx, O(f)^*y) = (O(f)Tx, y) = \int_M f\,d\mu_{Tx,y} = \int_M\hat Tf\,d\mu_{x,y}.$$
This holds for all $\hat T\in C(M)$ and so by the uniqueness of the measure again, we conclude that
$$\mu_{x,O(f)^*y} = f\mu_{x,y}$$
and hence
$$(O(fg)x, y) = \int_M gf\,d\mu_{x,y} = \int_M g\,d\mu_{x,O(f)^*y} = (O(g)x, O(f)^*y) = (O(f)O(g)x, y)$$
or
$$O(fg) = O(f)O(g)$$
as desired.
We have now extended the homomorphism from $C(M)$ to $A$ to a homomorphism from the bounded Borel functions on $M$ to bounded operators on $H$.
Now define:
$$P(U) := O(1_U)$$
for any Borel set $U$. The following facts are immediate:
1. $P(\emptyset) = 0$
2. $P(M) = e$, the identity
3. $P(U\cap V) = P(U)P(V)$ and $P(U)^* = P(U)$. In particular, $P(U)$ is a self-adjoint projection operator.
4. If $U\cap V = \emptyset$ then $P(U\cup V) = P(U) + P(V)$.
5. For each fixed $x, y\in H$ the set function $P_{x,y} : U\mapsto(P(U)x, y)$ is a complex valued measure.
Such a $P$ is called a resolution of the identity. It follows from the last item that for any fixed $x\in H$, the map $U\mapsto P(U)x$ is an $H$ valued measure.
We have shown that any commutative closed self-adjoint subalgebra $B$ of the algebra of bounded operators on a Hilbert space $H$ gives rise to a unique resolution of the identity on $M = \mathrm{Mspec}(B)$ such that
$$T = \int_M\hat T\,dP,\quad\text{i.e.}\quad(Tx, y) = \int_M\hat T\,dP_{x,y}. \qquad (10.1)$$
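A finite-dimensional sketch of a resolution of the identity. For a real symmetric matrix $T$ the algebra generated by $T$ is commutative and self-adjoint, and we assume (as holds in finite dimensions) that its maximal ideal space can be identified with the set of eigenvalues; then $P(U)$ is the orthogonal projection onto the sum of the eigenspaces for eigenvalues in $U$. The matrix and the sets are illustrative. The checks mirror items 1 through 4 and (10.1), which here reads $T = \sum_\lambda\lambda P(\{\lambda\})$.

```python
import numpy as np

rng = np.random.default_rng(6)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # orthonormal eigenvectors (columns)
eigs = np.array([1.0, 1.0, 2.0, 4.0, 4.0])         # Spec(T) = {1, 2, 4}
T = Q @ np.diag(eigs) @ Q.T                        # a self-adjoint matrix

def P(U):
    """Orthogonal projection onto the eigenvectors whose eigenvalue lies in the set U."""
    mask = np.array([e in U for e in eigs])
    cols = Q[:, mask]
    return cols @ cols.T

spectrum = {1.0, 2.0, 4.0}
U, V = {1.0, 2.0}, {2.0, 4.0}
print(np.allclose(P(set()), 0), np.allclose(P(spectrum), np.eye(5)))   # items 1, 2
print(np.allclose(P(U & V), P(U) @ P(V)))                              # item 3
print(np.allclose(P({1.0}) + P({4.0}), P({1.0, 4.0})))                 # item 4
print(np.allclose(sum(l * P({l}) for l in spectrum), T))               # (10.1)
```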
Actually, given any resolution of the identity we can give a meaning to the integral
$$\int_M f\,dP$$
$$U_i\cap U_j = \emptyset,\quad i\neq j$$
and $\lambda_1,\dots,\lambda_n\in\mathbb{C}$, define
$$O(s) := \sum_i\lambda_iP(U_i) =: \int_M s\,dP.$$
This is well defined on simple functions (is independent of the expression) and is multiplicative:
$$O(st) = O(s)O(t).$$
Also, since the $P(U)$ are self-adjoint,
$$O(\bar s) = O(s)^*.$$
It is also clear that $O$ is linear and
$$(O(s)x, y) = \int_M s\,dP_{x,y}.$$
As a consequence, we get
$$\|O(s)x\|^2 = (O(s)^*O(s)x, x) = \int_M|s|^2\,dP_{x,x}$$
so
$$\|O(s)x\|^2\le\|s\|_\infty^2\|x\|^2.$$
If we choose $i$ such that $|\lambda_i| = \|s\|_\infty$ and take $x = P(U_i)y\neq 0$, then we see that
$$\|O(s)\| = \|s\|_\infty$$
provided we now take $\|f\|_\infty$ to denote the essential supremum, which means the following:
It follows from the properties of a resolution of the identity that if $U_n$ is a sequence of Borel sets such that $P(U_n) = 0$, then $P(U) = 0$ if $U = \bigcup U_n$. So if $f$ is any complex valued Borel function on $M$, there will exist a largest open subset $V\subset\mathbb{C}$ such that $P(f^{-1}(V)) = 0$. We define the essential range of $f$ to be the complement of $V$, say that $f$ is essentially bounded if its essential range is compact, and then define its essential supremum $\|f\|_\infty$ to be the supremum of $|\lambda|$ for $\lambda$ in the essential range of $f$. Furthermore we identify two essentially bounded functions $f$ and $g$ if $\|f - g\|_\infty = 0$ and call the corresponding space $L^\infty(P)$.
is defined as the strong limit of the integrals of the corresponding simple functions. The map $f\mapsto O(f)$ is linear, multiplicative, and satisfies
$$O(\bar f) = O(f)^*$$
and
$$\|O(f)\| = \|f\|_\infty$$
as before.
If $S$ is a bounded operator on $H$ which commutes with all the $O(f)$ then it commutes with all the $P(U) = O(1_U)$. Conversely, if $S$ commutes with all the $P(U)$ it commutes with all the $O(s)$ for $s$ simple and hence with all the $O(f)$.
Putting it all together we have:
Theorem 10.1.1 Let $B$ be a commutative closed self-adjoint subalgebra of the algebra of all bounded operators on a Hilbert space $H$. Then there exists a resolution of the identity $P$ defined on $M = \mathrm{Mspec}(B)$ such that (10.1) holds. The map $\hat T\mapsto T$ of $C(M)\to B$ given by the inverse of the Gelfand transform extends to a map $O$ from $L^\infty(P)$ to the space of bounded operators on $H$,
$$O(f) = \int_M f\,dP.$$
Furthermore, $P(U)\neq 0$ for any non-empty open set $U$, and an operator $S$ commutes with every element of $B$ if and only if it commutes with all the $P(U)$, in which case it commutes with all the $O(f)$.
We must prove the last two statements. Suppose that $U$ is a non-empty open set with $P(U) = 0$. We may choose $T\neq 0$ such that $\hat T$ is supported in $U$ (by Urysohn's lemma). But then (10.1) implies that $T = 0$, a contradiction.
For any bounded operator $S$ and any $x, y\in H$ and $T\in B$ we have
$$(STx, y) = (Tx, S^*y) = \int_M\hat T\,dP_{x,S^*y}$$
while
$$(TSx, y) = \int_M\hat T\,dP_{Sx,y}.$$
If $ST = TS$ for all $T\in B$ this means that the measures $P_{Sx,y}$ and $P_{x,S^*y}$ are the same, which means that
$$(P(U)Sx, y) = (P(U)x, S^*y) = (SP(U)x, y)$$
for all $x$ and $y$, which means that
$$SP(U) = P(U)S$$
for all $U$. Conversely, if $S$ commutes with all the $P(U)$ then it commutes with every $T = O(\hat T)\in B$. We already know that $SP(U) = P(U)S$ for all $U$ implies that $SO(f) = O(f)S$ for all $f\in L^\infty(P)$. QED
10.2
10.3 Stone's formula.
for some projection valued measure P on R. We also know that every bounded
Borel function on R gives rise to an operator. In particular, if z is a complex
number which is not real, the function
$$\lambda\mapsto\frac{1}{z - \lambda}$$
Since
$$(ze - T) = \int_{\mathbb{R}}(z - \lambda)\,dP(\lambda)$$
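$$\lim_{\epsilon\to 0^+}\frac{1}{2\pi i}\int_a^b\left[R(\lambda - i\epsilon, T) - R(\lambda + i\epsilon, T)\right]d\lambda = \frac12\left(P((a,b)) + P([a,b])\right). \qquad (10.2)$$
Although this formula cries out for a complex variables proof, and I plan to give one later, we can give a direct real variables proof in terms of what we already know. Indeed, let
$$f_\epsilon(x) := \frac{1}{2\pi i}\int_a^b\left(\frac{1}{x - \lambda - i\epsilon} - \frac{1}{x - \lambda + i\epsilon}\right)d\lambda.$$
We have
$$f_\epsilon(x) = \frac1\pi\int_a^b\frac{\epsilon}{(x - \lambda)^2 + \epsilon^2}\,d\lambda = \frac1\pi\left(\arctan\frac{x - a}{\epsilon} - \arctan\frac{x - b}{\epsilon}\right).$$
As $\epsilon\to 0^+$ this converges to
$$\frac12\left(1_{(a,b)} + 1_{[a,b]}\right)(x).$$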
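The scalar computation behind Stone's formula can be checked numerically: the closed form for $f_\epsilon$ agrees with a direct quadrature of its defining integral, and for small $\epsilon$ the values approach $0$, $1/2$, $1$, $1/2$, $0$ at points outside, at the endpoints, and inside $[a,b]$. A sketch with the illustrative choice $[a,b] = [0,1]$ and a midpoint rule.

```python
import numpy as np

def f_eps(x, a, b, eps):
    """Closed form: (arctan((x - a)/eps) - arctan((x - b)/eps)) / pi."""
    return (np.arctan((x - a) / eps) - np.arctan((x - b) / eps)) / np.pi

def f_eps_quad(x, a, b, eps, m=200_000):
    """Midpoint rule for (1/(2 pi i)) int_a^b [1/(x-l-i eps) - 1/(x-l+i eps)] dl."""
    lam = a + (b - a) * (np.arange(m) + 0.5) / m
    integrand = 1 / (x - lam - 1j * eps) - 1 / (x - lam + 1j * eps)
    return (integrand.sum() * (b - a) / m / (2j * np.pi)).real

a, b = 0.0, 1.0
print(np.isclose(f_eps(0.3, a, b, 0.05), f_eps_quad(0.3, a, b, 0.05)))   # True

# As eps -> 0+, f_eps tends to (1_(a,b) + 1_[a,b]) / 2
for x in (-0.5, 0.0, 0.5, 1.0, 1.5):
    print(x, round(f_eps(x, a, b, 1e-9), 6))      # 0, 1/2, 1, 1/2, 0
```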
10.4 Unbounded operators.
Many important operators in Hilbert space that arise in physics and mathematics are unbounded. For example, the operator $D = \frac1i\frac{d}{dx}$ on $L^2(\mathbb{R})$. This operator is not defined on all of $L^2$, and where it is defined it is not bounded as an operator. One of the great achievements of Wintner in the late 1920s, followed by Stone and von Neumann, was to prove a version of the spectral theorem for unbounded self-adjoint operators.
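To see concretely that $D$ is unbounded, take the illustrative family $f_k(x) = e^{ikx - x^2/2}$, which lies in the domain of $D$. Since $Df_k = (k + ix)f_k$, one finds $\|Df_k\|/\|f_k\| = \sqrt{k^2 + 1/2}$, which is unbounded in $k$. The sketch confirms this on a truncated grid, approximating $d/dx$ by central differences.

```python
import numpy as np

x = np.linspace(-12.0, 12.0, 48001)      # truncation of the real line
dx = x[1] - x[0]

def l2_norm(u):
    """Riemann-sum approximation of the L^2 norm on the grid."""
    return np.sqrt(np.sum(np.abs(u) ** 2) * dx)

def ratio(k):
    """||D f_k|| / ||f_k|| for f_k(x) = exp(i k x - x^2 / 2), with D = (1/i) d/dx."""
    f = np.exp(1j * k * x - x**2 / 2)
    Df = np.gradient(f, dx) / 1j          # central differences for d/dx
    return l2_norm(Df) / l2_norm(f)

for k in (0, 5, 20, 80):
    print(k, ratio(k), np.sqrt(k**2 + 0.5))   # the two columns agree closely
```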
There are two (or more) approaches we could take to the proof of this theorem. Both involve the resolvent
$$R_z = R(z, T) = (zI - T)^{-1}. \qquad (10.3)$$
After spending some time explaining what an unbounded operator is and giving the very subtle definition of what an unbounded self-adjoint operator is, we will prove that the resolvent of a self-adjoint operator exists and is a bounded normal operator for all non-real $z$.
We could then apply the spectral theorem for bounded normal operators to derive the spectral theorem for unbounded self-adjoint operators. This is the fastest approach, but depends on the whole machinery of the Gelfand representation theorem that we have developed so far. Or, we could prove the spectral theorem for unbounded self-adjoint operators directly using (a mild modification of) Stone's formula. We will present both methods. In the second method we will follow the treatment by Lorch.
10.5
Equally well, we could start with the linear transformation: Suppose we are given a (not necessarily closed) subspace $D(T)\subset B$ and a linear transformation
$$T : D(T)\to C.$$
We can then consider its graph $\Gamma(T)\subset B\oplus C$ which consists of all
$$\{x, Tx\}.$$
Thus the notion of a graph, and the notion of a linear transformation defined only on a subspace of $B$, are logically equivalent. When we start with $T$ (as will usually be the case) we will write $D(T)$ for the domain of $T$ and $\Gamma(T)$ for the corresponding graph. There is a certain amount of abuse of language here, in that when we write $T$, we mean to include $D(T)$ and hence $\Gamma(T)$ as part of the definition.
A linear transformation is said to be closed if its graph is a closed subspace of $B\oplus C$. Let us disentangle what this says for the operator $T$. It says that if $f_n\in D(T)$ then
$$f_n\to f\text{ and }Tf_n\to g\;\Rightarrow\;f\in D(T)\text{ and }Tf = g.$$
This is a much weaker requirement than continuity. Continuity of $T$ would say that $f_n\to f$ alone implies that $Tf_n$ converges to $Tf$. Closedness says that if we know that both $f_n$ converges and $g_n = Tf_n$ converges, then we can conclude that $f = \lim f_n$ lies in $D(T)$ and that $Tf = g$.
An important theorem, known as the closed graph theorem, says that if $T$ is closed and $D(T)$ is all of $B$ then $T$ is bounded. As we will not need to use this theorem in this lecture, we will not present its proof here.
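10.6 The adjoint.
Suppose that we have a linear operator $T : D(T)\to C$ and let us make the hypothesis that
$$D(T)\text{ is dense in }B.$$
Any element of $B^*$ is then completely determined by its restriction to $D(T)$. Now consider
$$\Gamma(T^*)\subset C^*\oplus B^*$$
defined by
$$\{\ell, m\}\in\Gamma(T^*)\;\Leftrightarrow\;\langle\ell, Tx\rangle = \langle m, x\rangle\quad\forall x\in D(T). \qquad (10.4)$$
Since $D(T)$ is dense, $m$ is determined by $\ell$, so this is the graph of an operator $T^*$, whose domain consists of all $\ell\in C^*$ such that there exists an $m\in B^*$ for which
$$\langle\ell, Tx\rangle = \langle m, x\rangle\quad\forall x\in D(T).$$
If $\{\ell_n, m_n\}\in\Gamma(T^*)$ with $\ell_n\to\ell$ and $m_n\to m$, then the definition of convergence in these spaces implies that for any $x\in D(T)$ we have
$$\langle\ell, Tx\rangle = \lim\langle\ell_n, Tx\rangle = \lim\langle m_n, x\rangle = \langle m, x\rangle.$$
If we let $x$ range over all of $D(T)$ we conclude that $\Gamma(T^*)$ is a closed subspace of $C^*\oplus B^*$. In other words we have proved
Theorem 10.6.1 If $T : D(T)\to C$ is a linear transformation whose domain $D(T)$ is dense in $B$, it has a well defined adjoint $T^*$ whose graph is given by (10.4). Furthermore $T^*$ is a closed operator.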
10.7 Self-adjoint operators.
10.8 The resolvent.
1
.
||
(10.5)
(10.6)
In particular
kf k2 2 kgk2
for all g D(A). Since || > 0, we see that f = 0 g = 0 so (cI A) is
injective on D(A), and furthermore that that (cI A)1 (which is defined on
im (cI A))satisfies (10.5). We must show that this image is all of H.
First we show that the image is dense. For this it is enough to show that
there is no h 6= 0 H which is orthogonal to im (cI A). So suppose that
([cI A]g, h) = 0
g D(A).
Then
(g, ch) = (cg, h) = (Ag, h) g D(A)
which says that h D(A ) and A h = ch. But A is self adjoint so h D(A)
and Ah = ch. Thus
c(h, h) = (ch, h) = (Ah, h) = (h, Ah) = (h, ch) = c(h, h).
Since c 6= c this is impossible unless h = 0. We have now established that the
image of cI A is dense in H.
We now prove that it is all of $H$. So let $f \in H$. We know that we can find
$f_n = (cI - A)g_n, \quad g_n \in D(A)$ with $f_n \to f$.
The sequence $f_n$ is convergent, hence Cauchy, and from (10.5) applied to elements of $\operatorname{im}(cI - A)$ we know that
$\|g_m - g_n\| \le |\mu|^{-1}\|f_m - f_n\|.$
Hence the sequence $\{g_n\}$ is Cauchy, so $g_n \to g$ for some $g \in H$. Since $A$ is a closed operator and $Ag_n = cg_n - f_n \to cg - f$, we conclude that $g \in D(A)$ and $(cI - A)g = f$. QED
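As a finite-dimensional sanity check of (10.5) and (10.6), the following minimal Python sketch replaces $A$ by a 2 × 2 Hermitian matrix. This is an assumption made for illustration only: the matrix, the number $c = \lambda + i\mu$ and the random test vectors are arbitrary choices, and the helper names are hypothetical. It records the largest violation of the identity (10.6) and the largest ratio $\|(cI - A)^{-1}f\|/\|f\|$ observed, to be compared with $1/|\mu|$.

```python
import math
import random

# Hypothetical finite-dimensional stand-in for the self-adjoint operator A.
A = [[2.0, 1 - 1j], [1 + 1j, -1.0]]   # Hermitian: equal to its conjugate transpose
lam, mu = 0.5, 0.25                   # c = lam + i*mu with mu != 0
c = complex(lam, mu)


def shift(z):
    """The matrix zI - A."""
    return [[z - A[0][0], -A[0][1]], [-A[1][0], z - A[1][1]]]


def apply(m, v):
    """Matrix-vector product for 2x2 matrices stored as nested lists."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def inv2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (p, q) = m
    det = a * q - b * p
    return [[q / det, -b / det], [-p / det, a / det]]


def norm(v):
    return math.sqrt(sum(abs(x) ** 2 for x in v))


resolvent = inv2(shift(c))            # (cI - A)^{-1}

random.seed(1)
worst_gap = 0.0     # largest violation of (10.6) seen
worst_ratio = 0.0   # largest ||(cI - A)^{-1} f|| / ||f|| seen
for _ in range(2000):
    g = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(2)]
    f = apply(shift(c), g)                                   # f = (cI - A) g
    rhs = norm(apply(shift(lam), g)) ** 2 + mu ** 2 * norm(g) ** 2
    worst_gap = max(worst_gap, abs(norm(f) ** 2 - rhs))
    worst_ratio = max(worst_ratio, norm(apply(resolvent, f)) / norm(f))

print(worst_gap, worst_ratio, 1 / abs(mu))
```

The bound is far from sharp here because both eigenvalues of this particular matrix lie well away from $\operatorname{Re} c$; moving $\lambda$ onto an eigenvalue brings the ratio up to exactly $1/|\mu|$.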
The operator
$R_z = R_z(A) = (zI - A)^{-1}$
is called the resolvent of A when it exists as a bounded operator. The set
of z C for which the resolvent exists is called the resolvent set and the
complement of the resolvent set is called the spectrum of the operator. The
preceding theorem asserts that the spectrum of a self-adjoint operator is a subset
of the real numbers.
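Let $z$ and $w$ both belong to the resolvent set. We have
$wI - A = (w - z)I + (zI - A).$
Multiplying this equation on the left by $R_w$ gives
$I = (w - z)R_w + R_w(zI - A),$
and then multiplying on the right by $R_z$ gives $R_z = (w - z)R_wR_z + R_w$. Interchanging the roles of $z$ and $w$ shows that $R_z$ and $R_w$ commute, so
$R_z - R_w = (w - z)R_zR_w.$ (10.7)
Since $R_z \to R_w$ as $z \to w$ (this follows, for instance, from (10.8) below), we get
$\lim_{z \to w}\frac{R_z - R_w}{z - w} = -R_w^2.$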
This says that the derivative in the complex sense of the resolvent exists and is given by $-R_z^2$. In other words, the resolvent is a holomorphic operator valued function of $z$.
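The resolvent equation (10.7), the commutation of $R_z$ and $R_w$, and the derivative $-R_w^2$ can all be checked numerically for a matrix. The sketch below assumes a non-normal 2 × 2 matrix $T$ with spectrum $\{1, 3\}$ as a stand-in for the operator; the points $z$, $w$, the step $h$ and all helper names are hypothetical choices.

```python
# Hypothetical bounded operator: a non-normal 2x2 matrix with spectrum {1, 3}.
T = [[1.0, 2.0], [0.0, 3.0]]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def lincomb(a, x, b, y):
    """Entrywise a*x + b*y."""
    return [[a * x[i][j] + b * y[i][j] for j in range(2)] for i in range(2)]


def inv2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (p, q) = m
    det = a * q - b * p
    return [[q / det, -b / det], [-p / det, a / det]]


def resolvent(z):
    """R_z = (zI - T)^{-1}."""
    return inv2([[z - T[0][0], -T[0][1]], [-T[1][0], z - T[1][1]]])


def max_abs(m):
    return max(abs(m[i][j]) for i in range(2) for j in range(2))


z, w = 2 + 1j, 0.5 - 0.5j
Rz, Rw = resolvent(z), resolvent(w)

# (10.7): R_z - R_w - (w - z) R_z R_w should vanish, and R_z R_w = R_w R_z.
eq_error = max_abs(lincomb(1, lincomb(1, Rz, -1, Rw), -(w - z), matmul(Rz, Rw)))
commute_error = max_abs(lincomb(1, matmul(Rz, Rw), -1, matmul(Rw, Rz)))

# Complex difference quotient (R_{w+h} - R_w)/h compared with -R_w^2.
h = 1e-6 * (1 + 1j)
quotient = lincomb(1 / h, resolvent(w + h), -1 / h, Rw)
deriv_error = max_abs(lincomb(1, quotient, 1, matmul(Rw, Rw)))

print(eq_error, commute_error, deriv_error)
```

The difference quotient is taken along a non-real direction on purpose: holomorphy means the limit is the same from every direction.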
To emphasize this holomorphic character of the resolvent, we have
Proposition 10.8.1 Let $z$ belong to the resolvent set. Then the open disk of radius $\|R_z\|^{-1}$ about $z$ belongs to the resolvent set and on this disk we have
$R_w = R_z\left(I + (z - w)R_z + (z - w)^2R_z^2 + \cdots\right).$ (10.8)
Proof. The series on the right converges in the uniform topology since $|z - w| < \|R_z\|^{-1}$. Multiplying this series by $(zI - A) - (z - w)I$ gives $I$. But $zI - A - (z - w)I = wI - A$. So the right hand side is indeed $R_w$. QED
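A sketch of (10.8) under the same kind of assumption: for a hypothetical 2 × 2 matrix $T$, partial sums of the series are compared with a direct inversion at a point $w$ near $z$. Since the Frobenius norm bounds the operator norm from above, choosing $|z - w| < 1/\|R_z\|_F$ keeps $w$ inside the disk of Proposition 10.8.1 (a smaller disk than the optimal one, which is harmless here).

```python
# Hypothetical bounded operator (non-normal 2x2 matrix, spectrum {1, 3}).
T = [[1.0, 2.0], [0.0, 3.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def add(x, y, b=1):
    """Entrywise x + b*y."""
    return [[x[i][j] + b * y[i][j] for j in range(2)] for i in range(2)]


def scale(a, x):
    return [[a * x[i][j] for j in range(2)] for i in range(2)]


def inv2(m):
    (a, b), (p, q) = m
    det = a * q - b * p
    return [[q / det, -b / det], [-p / det, a / det]]


def resolvent(z):
    """R_z = (zI - T)^{-1}."""
    return inv2(add(scale(z, I2), T, -1))


def frobenius(m):
    return sum(abs(m[i][j]) ** 2 for i in range(2) for j in range(2)) ** 0.5


def max_abs(m):
    return max(abs(m[i][j]) for i in range(2) for j in range(2))


z = 2 + 1j
Rz = resolvent(z)
# ||R_z|| <= frobenius(R_z), so this w satisfies |z - w| < ||R_z||^{-1}.
w = z + 0.9j / frobenius(Rz)

step = scale(z - w, Rz)                 # (z - w) R_z
term, partial = I2, scale(0, I2)
for _ in range(400):                    # I + (z-w)R_z + (z-w)^2 R_z^2 + ...
    partial = add(partial, term)
    term = matmul(term, step)
series = matmul(Rz, partial)

series_error = max_abs(add(series, resolvent(w), -1))
print(series_error)
```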
This suggests that we can develop a Cauchy theory of integration of functions such as the resolvent, and we shall do so, eventually leading to a proof of
the spectral theorem for unbounded self-adjoint operators.
However we first give a proof (following the treatment in Reed-Simon) in which we derive the spectral theorem for unbounded operators from the Gelfand representation theorem applied to the closed algebra generated by the bounded normal operators $(iI - A)^{-1}$.
10.9
We first state this theorem for closed commutative self-adjoint algebras of (bounded)
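operators. Recall that self-adjoint in this context means that if $T \in B$ then $T^* \in B$.
$T \mapsto \hat T$
such that
$[(WTW^{-1})f](m) = \hat T(m)f(m).$
In fact, $\mathbf M$ can be taken to be a finite or countable disjoint union of copies of $M = \operatorname{Mspec}(B)$,
$\mathbf M = \bigcup_{i=1}^N M_i, \quad M_i = M, \quad N \in \mathbb Z_+ \cup \{\infty\},$
and
$\hat T(m) = \hat T(m)$ if $m \in M_i = M$,
where on the left $\hat T$ is the function on $\mathbf M$ and on the right it is the Gelfand transform of $T$ on $M$.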
10.9.1
Cyclic vectors.
$\int_M \hat T\, d(P(U)x, y) = 0$
which contradicts the assumption that the linear combinations of the $Tx$ are dense in $H$. QED
Let us continue with the assumption that $x$ is a cyclic vector for $B$. Let
$\mu = \mu_{x,x}$
so
$\mu(U) = (P(U)x, x).$
This is a finite measure on $M$; in fact
$\mu(M) = \|x\|^2.$ (10.9)
10.9.2
Start with any non-zero vector $x_1$ and consider $H_1 = \overline{Bx_1}$, the closure of the linear combinations of the $Tx_1$, $T \in B$. The space $H_1$ is a closed subspace of $H$ which is invariant under $B$, i.e. $TH_1 \subset H_1$ for all $T \in B$. Therefore the space $H_1^\perp$ is also invariant under $B$: if $y \in H_1^\perp$ and $T \in B$ then
$(Sx_1, Ty) = (T^*Sx_1, y) = 0$ for all $S \in B$, since $T^*S \in B$.
Now if $H_1 = H$ we are done, since $x_1$ is a cyclic vector for $B$ acting on $H_1$. If not, choose a non-zero $x_2 \in H_1^\perp$, let $H_2 = \overline{Bx_2}$, and repeat the process. We can choose a countable collection of non-zero vectors $z_i$ whose linear combinations are dense in $H$; this is the separability assumption. So we may choose our $x_i$ to be obtained from orthogonal projections applied to the $z_i$. In other words we have
$H = H_1 \oplus H_2 \oplus H_3 \oplus \cdots$
where this is either a finite or a countable Hilbert space (completed) direct sum.
Let us also take care to choose our $x_n$ so that
$\sum \|x_n\|^2 < \infty,$
which we can do, since $c_nx_n$ is just as good as $x_n$ for any $c_n \neq 0$. We have a unitary isomorphism of $H_n$ with $L^2(M, \mu_n)$ where $\mu_n(U) = (P(U)x_n, x_n)$. In particular,
$\mu_n(M) = \|x_n\|^2.$
So if we take $\mathbf M$ to be the disjoint union of copies $M_n$ of $M$, each with measure $\mu_n$, then the total measure of $\mathbf M$ is finite and
$L^2(\mathbf M) = \bigoplus L^2(M_n, \mu_n)$
where this is either a finite direct sum or a (Hilbert space completion of) a countable direct sum. Thus the theorem for the cyclic case implies the theorem for the general case. QED
10.9.3
and we know that the image of $(iI - A)^{-1}$ is $D(A)$, which is dense in $H$.
Now consider the function $\widehat{(A + iI)^{-1}}$ on $\mathbf M$ given by Theorem 10.9.1. It cannot vanish on any set of positive measure, since any function supported on such a set would be in the kernel of the operator consisting of multiplication by $\widehat{(A + iI)^{-1}}$, and this operator is injective, being unitarily equivalent to the inverse $(A + iI)^{-1}$.
Thus the function
$\hat A := \frac{1}{\widehat{(A + iI)^{-1}}} - i$
is finite almost everywhere on $\mathbf M$ relative to the measure $\mu$, although it might (and generally will) be unbounded. Our plan is to show that under the unitary
$\frac{1}{\widehat{(A + iI)^{-1}}}\, h.$
So
$WAx = Wy - iWx = \frac{1}{\widehat{(A + iI)^{-1}}}\, h - ih = \hat A h.$
QED
10.9.4
on $H$. With a slight abuse of language we might denote this operator by $O(f \circ \hat A)$. However we will use the more suggestive notation
$f(A)$.
The map
$f \mapsto f(A)$
is an algebraic homomorphism, and
• $\bar f(A) = f(A)^*$,
• $\|f(A)\| \le \|f\|_\infty$, where the norm on the left is the uniform operator norm and the norm on the right is the sup norm on $\mathbb R$,
• if $Ax = \lambda x$ then $f(A)x = f(\lambda)x$,
• if $f \ge 0$ then $f(A) \ge 0$ in the operator sense,
• if $f_n \to f$ pointwise and if $\|f_n\|_\infty$ is bounded, then $f_n(A) \to f(A)$ strongly, and
• if $f_n$ is a sequence of Borel functions on the line such that $|f_n(\lambda)| \le |\lambda|$ for all $n$ and for all $\lambda \in \mathbb R$, and if $f_n(\lambda) \to \lambda$ for each fixed $\lambda \in \mathbb R$, then for each $x \in D(A)$
$f_n(A)x \to Ax$.
All of the above statements are obvious except perhaps for the last two, which follow from the dominated convergence theorem. It is also clear from the preceding discussion that the map $f \mapsto f(A)$ is uniquely determined by the above properties.
10.9.5
For each measurable subset $X$ of the real line we can consider its indicator function $1_X$ and hence $1_X(A)$, which we shall denote by $P(X)$. In other words
$P(X) := 1_X(A).$
It follows from the above that
$P(X)^* = P(X)$
$P(X)P(Y) = P(X \cap Y)$
$P(X \cup Y) = P(X) + P(Y)$ if $X \cap Y = \emptyset$
$P(X) = \operatorname{s-lim}_{N \to \infty}\sum_{i=1}^N P(X_i)$ if $X_i \cap X_j = \emptyset$ for $i \neq j$ and $X = \bigcup X_i$
$P(\emptyset) = 0$
$P(\mathbb R) = I.$
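For bounded Borel $g$ we then have
$(g(A)x, y) = \int_{\mathbb R} g(\lambda)\, dP_{x,y}, \quad P_{x,y}(X) := (P(X)x, y),$
which we write symbolically as
$g(A) = \int_{\mathbb R} g(\lambda)\, dP.$
Writing $E_\lambda := P((-\infty, \lambda))$, we have
$E_\lambda^* = E_\lambda$ (10.10)
$E_\lambda^2 = E_\lambda$ (10.11)
$E_\lambda E_\mu = E_\lambda$ for $\lambda \le \mu$ (10.12)
$E_{\lambda_n} \to 0$ strongly if $\lambda_n \to -\infty$ (10.13)
$E_{\lambda_n} \to I$ strongly if $\lambda_n \to +\infty$ (10.14)
$E_{\lambda_n} \to E_\lambda$ strongly if $\lambda_n \nearrow \lambda$ (10.15)
and
$A = \int_{\mathbb R} \lambda\, dE_\lambda.$ (10.16)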
We shall now give an alternative proof of this formula which does not depend
on either the Gelfand representation theorem or any of the limit theorems of
Lebesgue integration. Instead, it depends on the Riesz-Dunford extension of the
Cauchy theory of integration of holomorphic functions along curves to operator
valued holomorphic functions.
10.10
and the usual proof of the existence of the Riemann integral shows that this tends to a limit as the mesh becomes more and more refined and the mesh distance tends to zero. The limit is denoted by
$\int_C S_z\, dz$
and this notation is justified because the change of variables formula for an ordinary integral shows that this value does not depend on the parametrization, but only on the orientation of the curve $C$.
We are going to apply this to $S_z = R_z$, the resolvent of an operator, and the main equations we shall use are the resolvent equation (10.7) and the power series for the resolvent (10.8), which we repeat here:
$R_z - R_w = (w - z)R_zR_w$
and
$R_w = R_z\left(I + (z - w)R_z + (z - w)^2R_z^2 + \cdots\right).$
We proved that the resolvent of a self-adjoint operator exists for all non-real values of $z$. But a lot of the theory goes over for the resolvent
$R_z = R(z, T) = (zI - T)^{-1}$
where $T$ is an arbitrary operator on a Banach space, so long as we restrict ourselves to the resolvent set, i.e. the set where the resolvent exists as a bounded operator. So, following Lorch, Spectral Theory, we first develop some facts about integrating the resolvent in the more general Banach space setting (where our principal application will be to the case where $T$ is a bounded operator).
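For example, suppose that $C$ is a simple closed curve contained in the disk of convergence about $z$ of (10.8), i.e. of the above power series for $R_w$. Then we can integrate the series term by term. But
$\int_C (z - w)^n\, dw = 0$
for all $n \neq -1$, so
$\int_C R_w\, dw = 0.$
Theorem. Let $C$ be a simple closed curve lying in the resolvent set of $T$. Then
$P := \frac{1}{2\pi i}\int_C R_z\, dz$ (10.17)
satisfies $P^2 = P$ and $PT = TP$.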
Proof. Choose a simple closed curve $C'$ disjoint from $C$ but sufficiently close to $C$ so as to be homotopic to $C$ via a homotopy lying in the resolvent set. Thus
$P = \frac{1}{2\pi i}\int_{C'} R_w\, dw$
and so
$(2\pi i)^2P^2 = \int_C R_z\, dz\int_{C'} R_w\, dw = \int_C\int_{C'}\frac{R_w - R_z}{z - w}\, dw\, dz,$
where we have used the resolvent equation (10.7). We write this last expression as a sum of two terms,
$\int_{C'} R_w\left(\int_C\frac{1}{z - w}\, dz\right)dw - \int_C R_z\left(\int_{C'}\frac{1}{z - w}\, dw\right)dz.$
Suppose that we choose $C'$ to lie entirely inside $C$. Then the first expression above is just $(2\pi i)\int_{C'} R_w\, dw$, while the second expression vanishes, all by the elementary Cauchy integral of $1/(z - w)$. Thus we get
$(2\pi i)^2P^2 = (2\pi i)^2P$
or $P^2 = P$. This proves that $P$ is a projection. It commutes with $T$ because it is an integral whose integrand $R_z$ commutes with $T$ for all $z$. QED
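The projection (10.17) can be approximated by discretizing the contour integral. This is a minimal sketch, assuming a 2 × 2 upper triangular matrix $T$ with spectrum $\{1, 3\}$ and a circle of radius 1 about the eigenvalue 1. The trapezoidal rule in the angle is our choice of quadrature (it is very accurate for smooth periodic integrands), not something taken from the text, and the helper names are hypothetical.

```python
import cmath
import math

T = [[1.0, 2.0], [0.0, 3.0]]     # hypothetical example with spectrum {1, 3}
I2 = [[1.0, 0.0], [0.0, 1.0]]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def add(x, y, b=1):
    """Entrywise x + b*y."""
    return [[x[i][j] + b * y[i][j] for j in range(2)] for i in range(2)]


def scale(a, x):
    return [[a * x[i][j] for j in range(2)] for i in range(2)]


def inv2(m):
    (a, b), (p, q) = m
    det = a * q - b * p
    return [[q / det, -b / det], [-p / det, a / det]]


def resolvent(z):
    """R_z = (zI - T)^{-1}."""
    return inv2(add(scale(z, I2), T, -1))


def max_abs(m):
    return max(abs(m[i][j]) for i in range(2) for j in range(2))


def riesz_projection(center, radius, n=256):
    """(1/(2 pi i)) times the counterclockwise integral of R_z dz over |z - center| = radius,
    approximated by the trapezoidal rule in the angle."""
    total = scale(0, I2)
    for k in range(n):
        e = cmath.exp(2j * math.pi * k / n)
        z = center + radius * e
        dz = 1j * radius * e * (2 * math.pi / n)   # z'(theta) * d(theta)
        total = add(total, resolvent(z), dz)
    return scale(1 / (2j * math.pi), total)


P = riesz_projection(1.0, 1.0)        # encloses the eigenvalue 1 but not 3
idempotent_error = max_abs(add(matmul(P, P), P, -1))
commute_error = max_abs(add(matmul(P, T), matmul(T, P), -1))
# Projection onto the eigenvector (1, 0) along the other eigenvector (1, 1):
exact_error = max_abs(add(P, [[1.0, -1.0], [0.0, 0.0]], -1))
print(idempotent_error, commute_error, exact_error)
```

Because $T$ is not normal, $P$ is an oblique (non-orthogonal) projection, which is exactly what the Banach space argument above allows.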
The same argument proves
Theorem 10.10.3 Let $C$ and $C'$ be simple closed curves each lying in the resolvent set, and let $P$ and $P'$ be the corresponding projections given by (10.17). Then $PP' = 0$ if the curves lie exterior to one another, while $PP' = P'$ if $C'$ is interior to $C$.
Let us write
$B' := PB, \quad B'' := (I - P)B$
for the images of the projections $P$ and $I - P$, where $P$ is given by (10.17). Each of these spaces is invariant under $T$ and hence under $R_z$, because $PT = TP$ and hence $PR_z = R_zP$.
For any transformation $S$ commuting with $P$ let us write
$S' := PS = SP = PSP$
(10.18)
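$(zI - T)\frac{1}{2\pi i}\int_C\frac{R_w}{z - w}\, dw = \frac{1}{2\pi i}\int_C\frac{1}{z - w}\, dw\; I + \frac{1}{2\pi i}\int_C R_w\, dw = 0 + P = P$ (for $z$ exterior to $C$).
We have thus proved
$T = T' \oplus T''$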
10.11
10.11.1
Positive operators.
such an operator positive. Clearly the sum of two positive operators is positive, as is the multiple of a positive operator by a non-negative number. Also we write $A_1 \ge A_2$ for two self-adjoint operators if $A_1 - A_2$ is positive.
Proposition 10.11.1 If $A$ is a bounded self-adjoint operator and $A \ge I$ then $A^{-1}$ exists and
$\|A^{-1}\| \le 1.$
Proof. We have
$\|Ax\|\,\|x\| \ge (Ax, x) \ge (x, x) = \|x\|^2$
so
$\|Ax\| \ge \|x\| \quad \forall x \in H.$ (10.19)
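Proof. Suppose $\|A\| \le 1$. Then using Cauchy-Schwarz and then the definition of $\|A\|$ we get
$([I - A]x, x) = (x, x) - (Ax, x) \ge \|x\|^2 - \|Ax\|\,\|x\| \ge \|x\|^2 - \|A\|\,\|x\|^2 \ge 0,$
so $I - A \ge 0$, and the same argument applied to $-A$ gives $I + A \ge 0$, i.e. $-I \le A \le I$.
Conversely, suppose that $-I \le A \le I$. Since $I - A \ge 0$ we know that $\operatorname{Spec}(A) \subset (-\infty, 1]$, and since $I + A \ge 0$ we have $\operatorname{Spec}(A) \subset [-1, \infty)$. So
$\operatorname{Spec}(A) \subset [-1, 1],$
so that the spectral radius of $A$ is at most 1. But for self-adjoint operators we have $\|A^2\| = \|A\|^2$, and hence the formula for the spectral radius gives $\|A\| \le 1$. QED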
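The equivalence of $\|A\| \le 1$ and $-I \le A \le I$ can be spot-checked on random 2 × 2 Hermitian matrices. In this sketch the sampling ranges and helper names are assumptions; the operator norm is computed independently as the square root of the largest eigenvalue of $A^*A = A^2$, which also lets us confirm numerically that it equals the spectral radius.

```python
import math
import random


def herm_eigs(a, b, d):
    """Ascending eigenvalues of the Hermitian matrix [[a, b], [conj(b), d]] with a, d real."""
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    return mean - radius, mean + radius


def operator_norm(a, b, d):
    """||A|| = sqrt(largest eigenvalue of A*A); here A*A = A^2 because A is Hermitian."""
    a2 = a * a + abs(b) ** 2   # entries of A^2 (again Hermitian)
    b2 = b * (a + d)
    d2 = d * d + abs(b) ** 2
    return math.sqrt(herm_eigs(a2, b2, d2)[1])


def between_minus_I_and_I(a, b, d):
    """-I <= A <= I, tested via the spectra of I - A and I + A."""
    low, high = herm_eigs(a, b, d)
    return 1 - high >= 0 and 1 + low >= 0


random.seed(2)
checked = inside = mismatches = 0
radius_gap = 0.0   # | ||A|| - spectral radius |
for _ in range(5000):
    a, d = random.uniform(-1.5, 1.5), random.uniform(-1.5, 1.5)
    b = complex(random.uniform(-1, 1), random.uniform(-1, 1))
    n = operator_norm(a, b, d)
    radius_gap = max(radius_gap, abs(n - max(abs(x) for x in herm_eigs(a, b, d))))
    if abs(n - 1) < 1e-9:
        continue   # skip numerically ambiguous boundary cases
    checked += 1
    inside += n < 1
    mismatches += (n <= 1) != between_minus_I_and_I(a, b, d)
print(checked, inside, mismatches, radius_gap)
```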
10.11.2
1
(i+1 i ).
2
We now let $A$ denote an arbitrary (not necessarily bounded) self-adjoint transformation. We say that $\lambda$ belongs to the point spectrum of $A$ if there exists an $x \in D(A)$ such that $x \neq 0$ and $Ax = \lambda x$, in other words if $\lambda$ is an eigenvalue of $A$. Notice that eigenvectors corresponding to distinct eigenvalues are orthogonal: if $Ax = \lambda x$ and $Ay = \mu y$ then, since $\mu$ is real,
$\lambda(x, y) = (\lambda x, y) = (Ax, y) = (x, Ay) = (x, \mu y) = \mu(x, y),$
implying that $(x, y) = 0$ if $\lambda \neq \mu$.
Also, the fact that a self-adjoint operator is closed implies that the space of eigenvectors corresponding to a fixed eigenvalue is a closed subspace of $H$. We let $N_\lambda$ denote the space of eigenvectors corresponding to an eigenvalue $\lambda$ (together with 0).
We say that $A$ has pure point spectrum if its eigenvectors span $H$, in other words if
$H = \bigoplus N_{\lambda_i}$
where the $\lambda_i$ range over the set of eigenvalues of $A$. Suppose that this is the case. Then let
$M_\lambda := \bigoplus_{\mu < \lambda} N_\mu$
where this denotes the Hilbert space direct sum, i.e. the closure of the algebraic direct sum. Let $E_\lambda$ denote projection onto $M_\lambda$. Then it is immediate that the $E_\lambda$ satisfy (10.10)-(10.15) and that (10.16) holds with the interpretation given in the preceding section. We thus have a proof of the spectral theorem for operators with pure point spectrum.
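For a matrix the construction above is completely explicit. The following sketch assumes a 2 × 2 Hermitian matrix with distinct eigenvalues (a hypothetical example). It builds the projections $P_i$ onto the eigenspaces, defines $E_\lambda$ with the strict inequality $\mu < \lambda$ used for $M_\lambda$, and checks the functional calculus $f(A) = \sum f(\lambda_i)P_i$ and the Stieltjes sum behind (10.16).

```python
import math

# Hypothetical Hermitian example with two distinct eigenvalues.
A = [[2.0, 1 - 1j], [1 + 1j, -1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
ZERO = [[0.0, 0.0], [0.0, 0.0]]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def add(x, y, b=1):
    """Entrywise x + b*y."""
    return [[x[i][j] + b * y[i][j] for j in range(2)] for i in range(2)]


def scale(s, x):
    return [[s * x[i][j] for j in range(2)] for i in range(2)]


def max_abs(m):
    return max(abs(m[i][j]) for i in range(2) for j in range(2))


a, d, b = A[0][0], A[1][1], A[0][1]
mean, radius = (a + d) / 2, math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
lams = [mean - radius, mean + radius]          # the eigenvalues, ascending

# Projection onto N_{lam_i}: (A - lam_j I) / (lam_i - lam_j) with j != i.
projections = [scale(1 / (lams[i] - lams[1 - i]), add(A, I2, -lams[1 - i])) for i in range(2)]


def E(lam):
    """E_lambda: projection onto the span of the eigenspaces N_mu with mu < lambda."""
    out = ZERO
    for mu, p in zip(lams, projections):
        if mu < lam:
            out = add(out, p)
    return out


def f_of_A(f):
    """f(A) = sum_i f(lam_i) P_i for pure point spectrum."""
    out = ZERO
    for mu, p in zip(lams, projections):
        out = add(out, p, f(mu))
    return out


# Stieltjes sum for A = int lam dE_lam: E jumps by P_i just to the right of lam_i.
stieltjes = ZERO
for mu in lams:
    stieltjes = add(stieltjes, add(E(mu + 1e-9), E(mu), -1), mu)
```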
10.11.3
and
$(I - P)[D(A)] = D(A) \cap H_2.$ (10.20)
10.11.4
(10.22)
(10.23)
(10.24)
(10.25)
Similarly
We also have
(x, x) (Ax, x) (x, x) for x im K (m, n).
Proof. We have
([A I]K (m, n)y, K (m, n)y)
=
=
=
=
=
(10.26)
(10.27)
Proof. Let C denote the rectangle of height one parallel to the real axis
and cutting the real axis at the points and . Use similar notation to define
the rectangles C and C . Consider the integrand
Sz := (z )(z )(z )Rz
and let
$T := \frac{1}{2\pi i}\int_C S_z\, dz$
with similar notation for the integrals over the other two rectangles of the same
integrand. Then clearly
T = T + T
and T T = 0.
(10.28)
1
y=
2ir2
10.12
We may assume that 6= 0. For any > 0 we can find a > 0 such
(E+ , ) E , ) <
kk2
for all R. So
Z
d(E , )
R
d(E , ) < .
This says that the band of width $\delta$ about the diagonal has measure less than $\epsilon$. Letting $\delta$ shrink to 0 shows that the diagonal line has measure zero.
We can restate this lemma more abstractly as follows: Consider the Hilbert space $H \hat\otimes H$ (the completion of the tensor product $H \otimes H$). The $E$ and $E$
an eigenvalue of $A \otimes I - I \otimes A$ on $H \hat\otimes H$,
10.13
Lurking in the background of our entire discussion is the closed graph theorem
which says that if a closed linear transformation from one Banach space to
another is everywhere defined, it is in fact bounded. We did not actually use
this theorem, but its statement and proof by Banach greatly clarified the notion of what an unbounded self-adjoint operator is, and explained the Hellinger-Toeplitz theorem as I mentioned earlier. So here I will give the standard proof
of this theorem (essentially a Baire category style argument) taken from Loomis.
In what follows $X$ and $Y$ will denote Banach spaces,
$B_n := B_n(X) = \{x \in X : \|x\| \le n\}$
denotes the ball of radius $n$ about the origin in $X$ and
$U_r = B_r(Y) = \{y \in Y : \|y\| \le r\}$
the ball of radius $r$ about the origin in $Y$.
Lemma 10.13.1 Let
$T : X \to Y$
be a bounded (everywhere defined) linear transformation. If $T[B_1] \cap U_r$ is dense in $U_r$ then
$U_r \subset T[B_1].$
Proof. The set $T[B_1]$ is closed, so it will be enough to show that
$U_{r(1-\epsilon)} \subset T[B_1]$
for any $\epsilon > 0$, or, what is the same thing, that
$U_r \subset \frac{1}{1-\epsilon}T[B_1] = T[B_{1/(1-\epsilon)}].$
1
$T(x_{n+1}) = y_{n+1} - y_n.$
If
$x := \sum_{n=1}^\infty x_n$
2
.
r
QED
Theorem 10.13.2 If $T : X \to Y$ is defined on all of $X$ and is such that $\operatorname{graph}(T)$ is a closed subspace of $X \oplus Y$, then $T$ is bounded.
Proof. Let $\Gamma \subset X \oplus Y$ denote the graph of $T$. By assumption, it is a closed subspace of the Banach space $X \oplus Y$ under the norm $\|\{x, y\}\| = \|x\| + \|y\|$. So $\Gamma$ is a Banach space and the projection
$\Gamma \to X, \quad \{x, y\} \mapsto x$
Chapter 11
Stone's theorem
Recall that if $A$ is a self-adjoint operator on a Hilbert space $H$ we can form the one parameter group of unitary operators
$U(t) = e^{iAt}$
by virtue of a functional calculus which allows us to construct $f(A)$ for any bounded Borel function defined on $\mathbb R$ (if we use our first proof of the spectral theorem using the Gelfand representation theorem) or for any function holomorphic on $\operatorname{Spec}(A)$ if we use our second proof. In any event, the spectral theorem allows us to write
$U(t) = \int_{\mathbb R} e^{i\lambda t}\, dE_\lambda$
$U(s + t) = U(s)U(t)$
and that $U$ depends continuously on $t$. We called this assertion the first half of Stone's theorem. The second half (to be stated more precisely below) asserts the converse: that any one parameter group of unitary transformations can be written in either, hence both, of the above forms.
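The idea that we will follow hinges on the following elementary computation
$\int_0^\infty e^{(-z+ix)t}\, dt = \left.\frac{e^{(-z+ix)t}}{-z + ix}\right|_{t=0}^{\infty} = \frac{1}{z - ix} \quad \text{if } \operatorname{Re} z > 0,$
valid for any real number $x$. If we substitute $A$ for $x$ and write $U(t)$ instead of $e^{iAt}$, this suggests that
$R(z, iA) = (zI - iA)^{-1} = \int_0^\infty e^{-zt}U(t)\, dt \quad \text{if } \operatorname{Re} z > 0,$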
for z lying in the right half plane. We can obtain a similar formula for the left
half plane.
Our previous studies encourage us to believe that once we have found all
these putative resolvents, it should not be so hard to reconstruct A and then
the one-parameter group U (t) = eiAt .
This program works! But because of some of the subtleties involved in the
definition of a self-adjoint operator, we will begin with an important theorem
of von Neumann which we will need, and which will also greatly clarify exactly
what it means to be self-adjoint.
A second matter which will lengthen these proceedings is that while we are at it, we will prove a more general version of Stone's theorem valid in an arbitrary Fréchet space $F$ and for uniformly bounded semigroups rather than unitary groups. Stone proved his theorem to meet the needs of quantum mechanics, where a unitary one parameter group corresponds, via Wigner's theorem, to a one parameter group of symmetries of the logic of quantum mechanics. In more pedestrian terms, unitary one parameter groups arise from solutions of Schrödinger's equation. But many other important equations, for example the heat equations in various settings, require the more general result.
The treatment here will essentially follow that of Yosida, Functional Analysis, especially Chapter IX; Nelson, Topics in Dynamics I: Flows; and Reed and Simon, Methods of Mathematical Physics, II. Fourier Analysis, Self-Adjointness.
11.1
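The group $Gl(2, \mathbb C)$ of all invertible complex two by two matrices acts as fractional linear transformations on the plane: the matrix
$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ sends $z \mapsto \frac{az + b}{cz + d}$.
Two different matrices $M_1$ and $M_2$ give the same fractional linear transformation if and only if $M_1 = \lambda M_2$ for some (non-zero complex) number $\lambda$, as is clear from the definition. Since
$\begin{pmatrix} 1 & -i \\ 1 & i \end{pmatrix}\begin{pmatrix} i & i \\ -1 & 1 \end{pmatrix} = 2i\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$
the fractional linear transformations corresponding to
$\begin{pmatrix} 1 & -i \\ 1 & i \end{pmatrix}$ and $\begin{pmatrix} i & i \\ -1 & 1 \end{pmatrix}$
are inverse to one another.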
It is a theorem in the elementary theory of complex variables that fractional
linear transformations are the only orientation preserving transformations of
the plane which carry circles and lines into circles and lines.
Even without this general theory, an immediate computation shows that $\begin{pmatrix} 1 & -i \\ 1 & i \end{pmatrix}$ carries the (extended) real axis onto the unit circle, and hence its inverse carries the unit circle onto the extended real axis. (Extended means with the point $\infty$ added.) Indeed in the expression
$z = \frac{x - i}{x + i}$
when $x$ is real, the numerator is the complex conjugate of the denominator and hence $|z| = 1$. Under this transformation, the cardinal points $0, 1, \infty$ of the extended real axis are mapped as follows:
$0 \mapsto -1, \quad 1 \mapsto -i, \quad \text{and} \quad \infty \mapsto 1.$
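These statements about $z = (x - i)/(x + i)$ are easy to confirm numerically. This sketch (the function names and sample points are hypothetical) checks that real points land on the unit circle, the images of the cardinal points, the matrix product displayed above, and that $w \mapsto i(1 + w)/(1 - w)$, the map of the matrix $\begin{pmatrix} i & i \\ -1 & 1 \end{pmatrix}$, undoes the transformation.

```python
def cayley(x):
    """The fractional linear map x -> (x - i)/(x + i) of the matrix [[1, -i], [1, i]]."""
    return (x - 1j) / (x + 1j)


def inverse_cayley(w):
    """The map w -> i(1 + w)/(1 - w) of the matrix [[i, i], [-1, 1]]."""
    return 1j * (1 + w) / (1 - w)


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


product = matmul([[1, -1j], [1, 1j]], [[1j, 1j], [-1, 1]])   # expected: 2i times the identity

samples = [-1e6, -3.0, -0.5, 0.0, 0.25, 1.0, 7.0, 1e6]
on_circle = max(abs(abs(cayley(x)) - 1) for x in samples)
round_trip = max(abs(inverse_cayley(cayley(x)) - x) / (1 + abs(x)) for x in samples)
print(on_circle, round_trip, cayley(0.0), cayley(1.0), cayley(1e12))
```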
We might think of (multiplication by) a real number as a self-adjoint transformation on a one dimensional Hilbert space, and (multiplication by) a number
of absolute value one as a unitary operator on a one dimensional Hilbert space.
This suggests in general that if $A$ is a self-adjoint operator, then
$(A - iI)(A + iI)^{-1}$
should be unitary. In fact, we can be much more precise. First some definitions:
An operator $U$, possibly defined only on a subspace of a Hilbert space $H$, is called isometric if
$\|Ux\| = \|x\|$
for all $x$ in its domain of definition.
Recall that in order to define the adjoint $T^*$ of an operator $T$ it is necessary that its domain $D(T)$ be dense in $H$. Otherwise the equation
$(Tx, y) = (x, T^*y) \quad \forall x \in D(T)$
does not determine $T^*y$. A transformation $T$ (in a Hilbert space $H$) is called symmetric if $D(T)$ is dense in $H$, so that $T^*$ is defined, and
$D(T) \subset D(T^*)$ and $T^*x = Tx \quad \forall x \in D(T)$.
Another way of saying the same thing is that $T$ is symmetric if $D(T)$ is dense and
$(Tx, y) = (x, Ty) \quad \forall x, y \in D(T)$.
A self-adjoint transformation is symmetric since $D(T) = D(T^*)$ is one of the requirements of being self-adjoint. Exactly how and why a symmetric operator can fail to be self-adjoint will be clarified in the ensuing discussion. All of the results of this section are due to von Neumann.
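Theorem 11.1.1 Let $T$ be a closed symmetric operator. Then $(T + iI)x = 0$ implies that $x = 0$ for any $x \in D(T)$, so $(T + iI)^{-1}$ exists as an operator on its domain
$D\left((T + iI)^{-1}\right) = \operatorname{im}(T + iI).$
This operator is bounded on its domain and the operator
$U_T := (T - iI)(T + iI)^{-1}$ with $D(U_T) = D\left((T + iI)^{-1}\right) = \operatorname{im}(T + iI)$
$\|(T \pm iI)x\|^2 = \|Tx\|^2 + \|x\|^2 \quad \forall x \in D(T).$ (11.1)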
Taking the plus sign shows that $(T + iI)x = 0 \Rightarrow x = 0$ and also shows that $\|[T + iI]x\| \ge \|x\|$, so
$\|[T + iI]^{-1}y\| \le \|y\|$ for $y \in [T + iI](D(T))$.
If we write $x = [T + iI]^{-1}y$ then (11.1) shows that
$\|U_Ty\|^2 = \|Tx\|^2 + \|x\|^2 = \|y\|^2,$
so $U_T$ is an isometry with domain consisting of all $y = (T + iI)x$, i.e. with domain $D([T + iI]^{-1}) = \operatorname{im}[T + iI]$.
We now show that $U_T$ is closed. So we must show that if $y_n \to y$ and $z_n \to z$, where $z_n = U_Ty_n$, then $y \in D(U_T)$ and $U_Ty = z$. The $y_n$ form a Cauchy sequence and $y_n = [T + iI]x_n$ since $y_n \in \operatorname{im}(T + iI)$. From (11.1) we see that the $x_n$ and the $Tx_n$ form Cauchy sequences, so $x_n \to x$ and $Tx_n \to w$, which implies that $x \in D(T)$ and $Tx = w$ since $T$ is assumed to be closed. But then $(T + iI)x = w + ix = y$, so $y \in D(U_T)$ and $U_Ty = (T - iI)x = w - ix = \lim(Tx_n - ix_n) = \lim z_n = z$. So we have shown that $U_T$ is closed.
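Subtract and add the equations
$y = (T + iI)x$
$U_Ty = (T - iI)x$
to get
$\frac12(I - U_T)y = ix$ and $\frac12(I + U_T)y = Tx.$
If $(I - U_T)y = 0$ then $x = 0$, so $Tx = 0$ and $(I + U_T)y = 0$ as well, whence
$y = \frac12\left([I - U_T]y + [I + U_T]y\right) = 0.$
Thus $(I - U_T)^{-1}$ exists and $y = (I - U_T)^{-1}2ix$, so
$Tx = \frac12(I + U_T)y = \frac12(I + U_T)(I - U_T)^{-1}2ix,$
or
$T = i(I + U_T)(I - U_T)^{-1}$
as required. Furthermore, every $x \in D(T)$ is in $\operatorname{im}(I - U_T)$. This completes the proof of the first half of the theorem.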
Now suppose we start with an isometry $U$ and suppose that $(I - U)y = 0$ for some $y \in D(U)$. Let $z \in \operatorname{im}(I - U)$, so $z = w - Uw$ for some $w$. We have
$(y, z) = (y, w) - (y, Uw) = (Uy, Uw) - (y, Uw) = (Uy - y, Uw) = 0.$
Since we are assuming that $\operatorname{im}(I - U)$ is dense in $H$, the condition $(y, z) = 0$ for all $z \in \operatorname{im}(I - U)$ implies that $y = 0$. Thus $(I - U)^{-1}$ exists, and we may define
$T = i(I + U)(I - U)^{-1}$
with
$D(T) = D\left((I - U)^{-1}\right) = \operatorname{im}(I - U)$
dense in $H$. Suppose that $x = (I - U)u$, $y = (I - U)v \in D(T) = \operatorname{im}(I - U)$. Then
$(Tx, y) = (i(I + U)u, (I - U)v) = i\left[(Uu, v) - (u, Uv)\right] + i\left[(u, v) - (Uu, Uv)\right].$
The second expression in brackets vanishes since $U$ is an isometry. So
$(Tx, y) = i(Uu, v) - i(u, Uv) = (Uu, -iv) + (u, iUv) = ([I - U]u, i[I + U]v) = (x, Ty).$
This shows that $T$ is symmetric.
To see that $U_T = U$ we again write $x = (I - U)u$. We have
$Tx = i(I + U)u$, so $(T + iI)x = 2iu$ and $(T - iI)x = 2iUu$.
Thus $D(U_T) = \{2iu \mid u \in D(U)\} = D(U)$ and
$U_T(2iu) = 2iUu = U(2iu).$
Thus $U = U_T$.
We must still show that $T$ is a closed operator. $T$ maps $x_n = (I - U)u_n$ to $i(I + U)u_n$. If both $(I - U)u_n$ and $(I + U)u_n$ converge, then $u_n$ and $Uu_n$ converge. The fact that $U$ is closed implies that if $u = \lim u_n$ then $u \in D(U)$ and $Uu = \lim Uu_n$. But this means that $(I - U)u_n \to (I - U)u$ and $i(I + U)u_n \to i(I + U)u$, so $T$ is closed. QED
The map $T \mapsto U_T$ from symmetric operators to isometries is called the Cayley transform.
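In finite dimensions every Hermitian matrix is self-adjoint, so its Cayley transform should be unitary, and the inversion formula $T = i(I + U_T)(I - U_T)^{-1}$ should give back $T$. A minimal sketch, assuming a particular 2 × 2 Hermitian matrix $T$ (a hypothetical example; helper names are ours):

```python
def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def add(x, y, b=1):
    """Entrywise x + b*y."""
    return [[x[i][j] + b * y[i][j] for j in range(2)] for i in range(2)]


def scale(s, x):
    return [[s * x[i][j] for j in range(2)] for i in range(2)]


def inv2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (p, q) = m
    det = a * q - b * p
    return [[q / det, -b / det], [-p / det, a / det]]


def adjoint(m):
    """Conjugate transpose."""
    return [[complex(m[j][i]).conjugate() for j in range(2)] for i in range(2)]


def max_abs(m):
    return max(abs(m[i][j]) for i in range(2) for j in range(2))


T = [[2.0, 1 - 1j], [1 + 1j, -1.0]]       # hypothetical Hermitian example
I2 = [[1.0, 0.0], [0.0, 1.0]]

U = matmul(add(T, I2, -1j), inv2(add(T, I2, 1j)))              # (T - iI)(T + iI)^{-1}
unitarity_error = max_abs(add(matmul(adjoint(U), U), I2, -1))   # U*U - I
T_back = scale(1j, matmul(add(I2, U), inv2(add(I2, U, -1))))    # i(I + U)(I - U)^{-1}
recovery_error = max_abs(add(T_back, T, -1))
print(unitarity_error, recovery_error)
```

Here $I - U$ is invertible because the eigenvalues $(\lambda - i)/(\lambda + i)$ of $U$ never equal 1 for real $\lambda$; in infinite dimensions this is where the density of $\operatorname{im}(I - U)$ enters.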
Recall that an isometry is unitary if its domain and image are all of $H$. If $U$ is a closed isometry, then $x_n \in D(U)$ and $x_n \to x$ implies that $Ux_n$ is convergent, hence $x \in D(U)$ and $Ux = \lim Ux_n$. Similarly, if $Ux_n \to y$ then the $x_n$ are Cauchy, hence convergent to an $x$ with $Ux = y$. So for a closed isometry $U$ both $D(U)$ and $\operatorname{im}(U)$ are closed subspaces, and their orthogonal complements $D(U)^\perp$ and $\operatorname{im}(U)^\perp$ measure how far $U$ is from being unitary: if they both reduce to the zero subspace then $U$ is unitary.
For a closed symmetric operator $T$ define
$H_T^+ = \{x \in D(T^*) \mid T^*x = ix\}$ and $H_T^- = \{x \in D(T^*) \mid T^*x = -ix\}.$ (11.2)
$H_T^+ = D(U)^\perp$
and
$H_T^- = (\operatorname{im}(U))^\perp.$
with $x_0 \in D(T)$, $x_+ \in H_T^+$ and $x_- \in H_T^-$, so
$T^*x = Tx_0 + ix_+ - ix_-.$
In particular, $T$ is self-adjoint if and only if $U$ is unitary.
Proof. To say that $x \in D(U)^\perp = D\left((T + iI)^{-1}\right)^\perp$ says that
$(x, (T + iI)y) = 0 \quad \forall y \in D(T).$
So if we set
$x_+ = \frac{1}{2i}x_1$
we have
$x_1 = (T^* + iI)x_+, \quad x_+ \in D(U)^\perp,$
so
$(T^* + iI)x = (T^* + iI)(x_0 + x_+)$
or
$T^*(x - x_0 - x_+) = -i(x - x_0 - x_+).$
Setting
$x_- := x - x_0 - x_+$
we get the desired decomposition $x = x_0 + x_+ + x_-$.
To show that the decomposition is unique, suppose that
$x_0 + x_+ + x_- = 0.$
Applying $T^* + iI$ gives
$0 = (T + iI)x_0 + 2ix_+.$
But $(T + iI)x_0 \in D(U)$ and $x_+ \in D(U)^\perp$, so both terms above must be zero, and so $x_+ = 0$. Also, from the preceding theorem we know that $(T + iI)x_0 = 0 \Rightarrow x_0 = 0$. Hence, since $x_0 = 0$ and $x_+ = 0$, we must also have $x_- = 0$. QED
11.1.1
An elementary example.
Take $H = L^2([0, 1])$ relative to the standard Lebesgue measure. Consider the operator $\frac1i\frac{d}{dt}$, which is defined on all elements of $H$ whose derivative, in the sense of distributions, is again in $L^2([0, 1])$. For any two such elements we have the integration by parts formula
$\left(\frac1i\frac{d}{dt}x, y\right) = \frac1i\left(x(1)\overline{y(1)} - x(0)\overline{y(0)}\right) + \left(x, \frac1i\frac{d}{dt}y\right).$
(Even though in general the value at a point of an element in $L^2$ makes no sense, if $x$ is such that $x' \in L^2$ then $\frac1h\int_0^h x(t)\, dt$ makes sense, and integration by parts using a continuous representative for $x$ shows that the limit of this expression is well defined and equal to $x(0)$ for our continuous representative.) Suppose we take $T = \frac1i\frac{d}{dt}$ but with $D(T)$ consisting of those elements whose derivatives belong to $L^2$ as above, but which in addition satisfy
$x(0) = x(1) = 0.$
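$\left(\frac1i\frac{d}{dt}x, y\right) - \left(x, \frac1i\frac{d}{dt}y\right) = \frac1i\left(e^{i\theta}x(0)\overline{y(1)} - x(0)\overline{y(0)}\right).$
This will vanish for all $x \in D(A_\theta)$ if and only if $y \in D(A_\theta)$. So we see that $A_\theta$ is self-adjoint.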
The moral is that to construct a self-adjoint operator from a differential operator which is symmetric, we may have to supplement it with appropriate boundary conditions.
On the other hand, consider the same operator $\frac1i\frac{d}{dt}$ considered as an unbounded operator on $L^2(\mathbb R)$. We take as its domain the set of all elements $x \in L^2(\mathbb R)$ whose distributional derivatives belong to $L^2(\mathbb R)$ and such that $\lim_{t \to \pm\infty} x(t) = 0$. The functions $e^{\pm t}$ do not belong to $L^2(\mathbb R)$, so the spaces $H_T^\pm$ are zero and our operator is in fact self-adjoint. So the issue of whether or not we must add boundary conditions depends on the nature of the domain where the differential operator is to be defined. A deep analysis of this phenomenon for second order ordinary differential equations was provided by Hermann Weyl in a paper published in 1911. It is safe to say that much of the progress in the theory of self-adjoint operators was in no small measure influenced by a desire to understand and generalize the results of this fundamental paper.
11.2
11.2.1
We are going to begin by showing that every such semigroup has an infinitesimal generator, i.e. can be written in some sense as $T_t = e^{At}$. It is important to observe that we have made a serious change of convention in that we are dropping the $i$ that we have used until now. With this new notation, for example, the infinitesimal generator of a group of unitary transformations will be a skew-adjoint operator rather than a self-adjoint operator. In quantum mechanics, where an observable is a self-adjoint operator, there is a good reason for emphasizing the self-adjoint operators, and hence including the $i$. There are many good reasons for deviating from the physicists' notation, not the least having to do with the theory of Lie algebras. I do not want to go into these reasons now. Some will emerge from the ensuing discussion. But the presence or absence of the $i$ is a cultural divide between physicists and mathematicians.
So we define the operator $A$ as
$Ax = \lim_{t \searrow 0}\frac1t(T_t - I)x.$
That is, $A$ is the operator defined on the domain $D(A)$ consisting of those $x$ for which the limit exists.
Our first task is to show that $D(A)$ is dense in $F$. For this we begin as promised with the putative resolvent
$R(z) := \int_0^\infty e^{-zt}T_t\, dt$ (11.3)
which is defined (by the boundedness and continuity properties of $T_t$) for all $z$ with $\operatorname{Re} z > 0$. We begin by checking that every element of $\operatorname{im} R(z)$ belongs to
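$D(A)$: We have
$\frac1h(T_h - I)R(z)x = \frac1h\int_0^\infty e^{-zt}T_{t+h}x\, dt - \frac1h\int_0^\infty e^{-zt}T_tx\, dt$
$= \frac1h\int_h^\infty e^{-z(r-h)}T_rx\, dr - \frac1h\int_0^\infty e^{-zt}T_tx\, dt = \frac{e^{zh} - 1}{h}\int_h^\infty e^{-zt}T_tx\, dt - \frac1h\int_0^h e^{-zt}T_tx\, dt$
$= \frac{e^{zh} - 1}{h}\left[R(z)x - \int_0^h e^{-zt}T_tx\, dt\right] - \frac1h\int_0^h e^{-zt}T_tx\, dt.$
If we now let $h \to 0$, the integral inside the bracket tends to zero, the factor in front of the bracket tends to $z$, and the last term tends to $x$ since $T_0 = I$. We thus see that
$R(z)x \in D(A)$
and
$AR(z) = zR(z) - I,$
or, rewriting this in a more familiar form,
$(zI - A)R(z) = I.$ (11.4)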
This equation says that R(z) is a right inverse for zI A. It will require a lot
more work to show that it is also a left inverse.
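We will first prove that $D(A)$ is dense in $F$ by showing that $\operatorname{im}(R(z))$ is dense. In fact, taking $s$ to be real, we will show that
$\lim_{s \to \infty} sR(s)x = x \quad \forall x \in F.$ (11.5)
Indeed, since $\int_0^\infty se^{-st}\, dt = 1$, we have
$sR(s)x - x = s\int_0^\infty e^{-st}(T_tx - x)\, dt.$
For any $\epsilon > 0$ we can, by the continuity of $T_t$, find a $\delta > 0$ such that
$p(T_tx - x) < \epsilon \quad \forall\, 0 \le t \le \delta.$
Now let us write
$s\int_0^\infty e^{-st}p(T_tx - x)\, dt = s\int_0^\delta e^{-st}p(T_tx - x)\, dt + s\int_\delta^\infty e^{-st}p(T_tx - x)\, dt.$
The first integral is bounded by
$\epsilon s\int_0^\delta e^{-st}\, dt \le \epsilon s\int_0^\infty e^{-st}\, dt = \epsilon.$
As to the second integral, let $M$ be a bound for $p(T_tx) + p(x)$, which exists by the uniform boundedness of $T_t$. The triangle inequality says that $p(T_tx - x) \le p(T_tx) + p(x)$, so the second integral is bounded by
$M\int_\delta^\infty se^{-st}\, dt = Me^{-s\delta},$
which tends to 0 as $s \to \infty$. Since $p(sR(s)x - x) \le s\int_0^\infty e^{-st}p(T_tx - x)\, dt$ and $\epsilon$ was arbitrary, this proves (11.5).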
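To see (11.3), (11.4) and (11.5) in the simplest possible setting, take $F = \mathbb R^2$ and the rotation group generated by the skew-adjoint matrix $A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$. These choices, the truncation of the integral at $t = 40$, and the use of Simpson's rule are assumptions of this sketch, not part of the text. The Laplace transform of $T_t$ is compared with $(zI - A)^{-1}$, and $sR(s)x$ is seen to approach $x$.

```python
import math

# Hypothetical example: A = [[0, 1], [-1, 0]] is skew-adjoint and T_t = exp(tA) is the
# rotation below, a uniformly bounded group acting on R^2 (standing in for F).


def T(t):
    return [[math.cos(t), math.sin(t)], [-math.sin(t), math.cos(t)]]


def laplace_resolvent(z, t_max=40.0, n=40000):
    """Composite Simpson rule for R(z) = int_0^t_max e^{-zt} T_t dt; for z = 1 the
    neglected tail beyond t_max is of order e^{-40}."""
    h = t_max / n
    total = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(n + 1):
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        factor = weight * math.exp(-z * k * h)
        Tt = T(k * h)
        for i in range(2):
            for j in range(2):
                total[i][j] += factor * Tt[i][j]
    return [[total[i][j] * h / 3 for j in range(2)] for i in range(2)]


def exact_resolvent(z):
    """(zI - A)^{-1} = [[z, 1], [-1, z]] / (z^2 + 1) for the matrix A above."""
    s = z * z + 1
    return [[z / s, 1 / s], [-1 / s, z / s]]


z = 1.0
R_num = laplace_resolvent(z)
R_exact = exact_resolvent(z)
laplace_error = max(abs(R_num[i][j] - R_exact[i][j]) for i in range(2) for j in range(2))

# (11.4): (zI - A) R(z) = I, with zI - A = [[z, -1], [1, z]].
zI_minus_A = [[z, -1.0], [1.0, z]]
product = [[sum(zI_minus_A[i][k] * R_num[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
right_inverse_error = max(abs(product[i][j] - (i == j)) for i in range(2) for j in range(2))

# (11.5): s R(s) x -> x as s -> infinity, using the closed form of R(s).
x = [0.3, -1.2]


def sRs_error(s):
    R = exact_resolvent(s)
    y = [s * (R[0][0] * x[0] + R[0][1] * x[1]), s * (R[1][0] * x[0] + R[1][1] * x[1])]
    return math.hypot(y[0] - x[0], y[1] - x[1])


errors = [sRs_error(s) for s in (10.0, 100.0, 1000.0)]
print(laplace_error, right_inverse_error, errors)
```

In this example $\|sR(s)x - x\| = \|x\|/\sqrt{s^2 + 1}$ exactly, so the convergence in (11.5) is of order $1/s$.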
11.3
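For $x \in D(A)$ we have
$T_tAx = T_t\lim_{h \searrow 0}\frac1h[T_h - I]x = \lim_{h \searrow 0}\frac1h[T_{t+h} - T_t]x = \lim_{h \searrow 0}\frac1h[T_h - I]T_tx,$
so $T_tx \in D(A)$ and
$\lim_{h \searrow 0}\frac1h[T_{t+h} - T_t]x = AT_tx = T_tAx.$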
$\int_0^t \ell(T_sAx)\, ds.$
In turn, it is enough to prove this equality for the real and imaginary parts of $\ell$. So it all boils down to a lemma in the theory of functions of a real variable:
Lemma 11.3.1 Suppose that $f$ is a continuous real valued function of $t$ with the property that the right hand derivative
$\frac{d^+}{dt}f := \lim_{h \searrow 0}\frac{f(t + h) - f(t)}{h} = g(t)$
exists for all $t$ and $g(t)$ is continuous. Then $f$ is differentiable with $f' = g$.
Proof. We first prove that $\frac{d^+}{dt}f \ge 0$ on an interval $[a, b]$ implies that $f(b) \ge f(a)$. Suppose not. Then there exists an $\epsilon > 0$ such that
$f(b) - f(a) < -\epsilon(b - a).$
Set
$F(t) := f(t) - f(a) + \epsilon(t - a).$
Then $F(a) = 0$ and
$\frac{d^+}{dt}F > 0.$
At $a$ this implies that there is some $c > a$ near $a$ with $F(c) > 0$. On the other hand, since $F(b) < 0$ and $F$ is continuous, there will be some point $s < b$ with $F(s) = 0$ and $F(t) < 0$ for $s < t \le b$. This contradicts the fact that $\left[\frac{d^+}{dt}F\right](s) > 0$.
Applying this to $f(t) - mt$ shows that if $\frac{d^+}{dt}f \ge m$ then $f(t_2) - f(t_1) \ge m(t_2 - t_1)$, and if $\frac{d^+}{dt}f(t) \le M$ we can apply the above result to $Mt - f(t)$ to conclude that $f(t_2) - f(t_1) \le M(t_2 - t_1)$. So if $m = \min g(t) = \min\frac{d^+}{dt}f$ on the interval $[t_1, t_2]$ and $M$ is the maximum, we have
$m \le \frac{f(t_2) - f(t_1)}{t_2 - t_1} \le M.$
Letting $t_2 \to t_1$ (from either side) and using the continuity of $g$ shows that $f'(t_1)$ exists and equals $g(t_1)$.
11.3.1
The resolvent.
$R(z) = \int_0^\infty e^{-zt}T_t\, dt$
$x \in D(A)$
$\varphi(0) = 1.$
So
$\varphi(t) = e^{zt},$
which is impossible since $\varphi(t)$ is a bounded function of $t$ and the right hand side of the above equation is not bounded for $t \ge 0$, since the real part of $z$ is positive.
We have from (11.4) that
$(zI - A)R(z)(zI - A)x = (zI - A)x$
and we know that $R(z)(zI - A)x \in D(A)$. From the injectivity of $zI - A$ we conclude that $R(z)(zI - A)x = x$.
From $(zI - A)R(z) = I$ we see that $zI - A$ maps $\operatorname{im} R(z) \subset D(A)$ onto $F$, so certainly $zI - A$ maps $D(A)$ onto $F$ bijectively. Hence
$\operatorname{im}(R(z)) = D(A), \quad \operatorname{im}(zI - A) = F$
and
$R(z) = (zI - A)^{-1}.$
We have already established the following:
$\operatorname{im} R(z, A) = D(A)$ (11.6)
$AR(z, A)x = R(z, A)Ax \quad \forall x \in D(A)$ (11.7)
$AR(z, A)x = zR(z, A)x - x$ (11.8)
$\lim_{z \nearrow \infty} zR(z, A)x = x$ (11.9)
We also have
Theorem 11.3.2 The operator $A$ is closed.
Proof. Suppose that $x_n \in D(A)$, $x_n \to x$ and $y_n \to y$ where $y_n = Ax_n$. We must show that $x \in D(A)$ and $Ax = y$. Set
$z_n := (I - A)x_n$
so $z_n \to x - y$.
11.3.2
Examples.
(11.10)
Translations.
Consider the one parameter group of translations acting on $L^2(\mathbb R)$:
$[U(t)x](s) = x(s - t).$ (11.11)
where now the shift in (11.11) means mod 1. This still allows freedom in the choice of phase between the exiting value of $x$ and its incoming value. Thus we specify a unitary one parameter group when we fix a choice of phase as the effect of passing go. This choice of phase is the origin of the phase factors that we needed to introduce in finding the self-adjoint extensions of $\frac1i\frac{d}{dt}$ acting on functions vanishing at the boundary.
$\frac{1}{\sqrt{2\pi t}}e^{-u^2/2t}.$
We have already verified in our study of the Fourier transform that this is a continuous semi-group (when we set $T_0 = I$) when acting on $S$. In fact, for $x \in S$, we can take the Fourier transform and conclude that
$\widehat{[T_tx]}(\xi) = e^{-\xi^2t/2}\hat x(\xi).$
Differentiating this with respect to $t$ and setting $t = 0$ (and taking the inverse Fourier transform) shows that
$\left.\frac{d}{dt}T_tx\right|_{t=0} = \frac12\frac{d^2}{ds^2}x$
for $x \in S$. We wish to arrive at the same result for $T_t$ acting on $F$. It is easy enough to verify that the operators $T_t$ are continuous in the uniform norm and hence extend to an equibounded semigroup on $F$. We will now verify that the infinitesimal generator $A$ of this semigroup is
$A = \frac12\frac{d^2}{ds^2}.$
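$= \int_{-\infty}^\infty x(v)\int_0^\infty\frac{2\sqrt r}{\sqrt{2\pi}}e^{-\sigma^2 - r(s-v)^2/2\sigma^2}\, d\sigma\, dv$ (setting $t = \sigma^2/r$)
$= \int_{-\infty}^\infty x(v)(r/2)^{1/2}e^{-\sqrt{2r}|s-v|}\, dv,$
where we have used the formula
$\int_0^\infty e^{-(\sigma^2 + c^2/\sigma^2)}\, d\sigma = \frac{\sqrt\pi}{2}e^{-2c}.$ (11.12)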
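Let me postpone the calculation of this integral to the end of the subsection. Assuming the evaluation of this integral we can write
$y_r(s) = \left(\frac r2\right)^{1/2}\left(\int_{-\infty}^s x(v)e^{\sqrt{2r}(v-s)}\, dv + \int_s^\infty x(v)e^{\sqrt{2r}(s-v)}\, dv\right).$
Hence
$y_r'(s) = r\left(\int_s^\infty x(v)e^{\sqrt{2r}(s-v)}\, dv - \int_{-\infty}^s x(v)e^{\sqrt{2r}(v-s)}\, dv\right),$
$y_r''(s) = -2rx(s) + \sqrt2\, r^{3/2}\int_{-\infty}^\infty x(v)e^{-\sqrt{2r}|v-s|}\, dv,$
or
$y_r'' = 2r(y_r - x).$
Comparing this with (11.10), which says that $Ay_r = r(y_r - x)$, we see that indeed
$A = \frac12\frac{d^2}{ds^2}.$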
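Let us now verify the evaluation of the integral in (11.12). Start with the known integral
$\int_0^\infty e^{-x^2}\, dx = \frac{\sqrt\pi}{2}.$
Set $x = \sigma - c/\sigma$ so that $dx = (1 + c/\sigma^2)\, d\sigma$ and $x = 0$ corresponds to $\sigma = \sqrt c$. Since $(\sigma - c/\sigma)^2 = \sigma^2 + c^2/\sigma^2 - 2c$, we get
$\frac{\sqrt\pi}{2} = \int_{\sqrt c}^\infty e^{-(\sigma - c/\sigma)^2}(1 + c/\sigma^2)\, d\sigma = e^{2c}\int_{\sqrt c}^\infty e^{-(\sigma^2 + c^2/\sigma^2)}\, d\sigma + e^{2c}\int_{\sqrt c}^\infty e^{-(\sigma^2 + c^2/\sigma^2)}\frac{c}{\sigma^2}\, d\sigma.$
In the second integral substitute $\tau = c/\sigma$, so that $d\tau = -(c/\sigma^2)\, d\sigma$ and $\sigma^2 + c^2/\sigma^2 = \tau^2 + c^2/\tau^2$; this turns it into $e^{2c}\int_0^{\sqrt c} e^{-(\tau^2 + c^2/\tau^2)}\, d\tau$, and hence
$\frac{\sqrt\pi}{2} = e^{2c}\int_0^\infty e^{-(\sigma^2 + c^2/\sigma^2)}\, d\sigma,$
which is (11.12).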
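Formula (11.12) can also be confirmed numerically. A minimal sketch using composite Simpson's rule on $[0, 12]$; the cutoff, the number of subintervals and the sample values of $c$ are our assumptions, chosen so that both the neglected tail and the quadrature error are negligible.

```python
import math


def integrand(sigma, c):
    """e^{-(sigma^2 + c^2/sigma^2)}, extended by its limit 0 at sigma = 0 (for c > 0)."""
    if sigma == 0.0:
        return 0.0
    return math.exp(-(sigma * sigma + c * c / (sigma * sigma)))


def simpson(f, a, b, n=60000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3


results = {}
for c in (0.1, 0.5, 1.0, 2.0):
    lhs = simpson(lambda s: integrand(s, c), 0.0, 12.0)
    rhs = math.sqrt(math.pi) / 2 * math.exp(-2 * c)
    results[c] = (lhs, rhs)
print(results)
```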
Bochner's theorem.
A complex valued continuous function $F$ is called positive definite if for every continuous function $\varphi$ of compact support we have
$\int_{\mathbb R}\int_{\mathbb R} F(t - s)\varphi(t)\overline{\varphi(s)}\, dt\, ds \ge 0.$ (11.13)
(It is easy to see that the fact that $F$ is a positive definite function implies that $(x, x) \ge 0$ for all $x \in F$.) Passing to the quotient by the subspace of null vectors and completing we obtain a Hilbert space $H$.
Let $U_r$ be defined by $[U_rx](t) = x(t - r)$ as usual. Then
$$(U_rx, U_ry) = \sum_{t,s}F(t-s)\,x(t-r)\,\overline{y(s-r)} = \sum_{t,s}F(t+r-(s+r))\,x(t)\,\overline{y(s)} = (x, y).$$
$$F(r) = \int_{\mathbb R} e^{ir\lambda}\,d\mu_{x,x} = \cdots$$
11.4
$$\exp(tB) = \sum_{k=0}^{\infty}\frac{t^k}{k!}B^k$$
with convergence guaranteed as a result of the convergence of the usual exponential series in one variable. (There are serious problems with this definition
from the point of view of numerical implementation which we will not discuss
here.)
In infinite dimensional spaces some additional assumptions have to be placed
on an operator B before we can conclude that the above series converges. Here
is a very stringent condition which nevertheless suffices for our purposes.
Let $\mathcal F$ be a Fréchet space and $B$ a continuous map of $\mathcal F \to \mathcal F$. We will assume that the $B^k$ are equibounded in the sense that for any defining semi-norm $p$ there is a constant $K$ and a defining semi-norm $q$ such that
$$p(B^kx) \le Kq(x), \qquad k = 1, 2, \ldots, \quad x \in \mathcal F.$$
Here the $K$ and $q$ are required to be independent of $k$ and $x$.
Then
$$p\left(\sum_m^n\frac{t^k}{k!}B^kx\right) \le \sum_m^n\frac{t^k}{k!}\,p(B^kx) \le Kq(x)\sum_m^n\frac{t^k}{k!}$$
and so
$$\sum_0^n\frac{t^k}{k!}B^kx$$
is a Cauchy sequence for each fixed $t$ and $x$ (and uniformly in any compact interval of $t$). It therefore converges to a limit. We will denote the map $x \mapsto \sum_0^\infty\frac{t^k}{k!}B^kx$ by
$$\exp(tB).$$
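The Cauchy estimate above can be watched in a finite dimensional case. The following Python sketch is a minimal illustration, not part of the argument in the text: it assumes $\mathcal F = \mathbb R^2$ with the Euclidean norm and takes for $B$ the rotation by a right angle, so that every $B^k$ is an isometry and the equiboundedness hypothesis holds with $K = 1$. The partial sums should then converge to the rotation of $x$ by the angle $t$. The helper names are hypothetical.

```python
import math


def apply(b, x):
    """Matrix-vector product b x for small dense matrices stored as lists of rows."""
    return [sum(bij * xj for bij, xj in zip(row, x)) for row in b]


def exp_partial_sums(b, x, t, terms):
    """Partial sums S_n = sum_{k=0}^{n} t^k/k! B^k x for n = 0, ..., terms - 1."""
    term = list(x)                       # the k = 0 term, t^0/0! B^0 x
    total = list(x)
    sums = [list(total)]
    for k in range(1, terms):
        term = [t / k * v for v in apply(b, term)]   # now equals t^k/k! B^k x
        total = [s + v for s, v in zip(total, term)]
        sums.append(list(total))
    return sums


def norm(v):
    return math.sqrt(sum(c * c for c in v))


# Rotation by 90 degrees: every power B^k is an isometry, so p(B^k x) <= K q(x) with K = 1, q = p.
B = [[0.0, -1.0], [1.0, 0.0]]
x = [1.0, 0.0]
t = 1.5
S = exp_partial_sums(B, x, t, 40)

# The Cauchy estimate: |S_n - S_m| <= K |x| sum_{k=m+1}^{n} t^k/k!.
m, n = 5, 30
gap = norm([a - b for a, b in zip(S[n], S[m])])
tail = sum(t**k / math.factorial(k) for k in range(m + 1, n + 1))
print(gap, "<=", tail)
print(S[-1], "vs", [math.cos(t), math.sin(t)])   # exp(tB) x is x rotated by the angle t
```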
11.5
This formula shows that $R(z, A)x$ is continuous in $z$. The resolvent equation
$$R(z, A) - R(w, A) = (w - z)R(z, A)R(w, A)$$
then shows that $R(z, A)x$ is complex differentiable in $z$ with derivative $-R(z, A)^2x$. It then follows that $R(z, A)x$ has complex derivatives of all orders given by
$$\frac{d^nR(z, A)x}{dz^n} = (-1)^n\,n!\,R(z, A)^{n+1}x.$$
On the other hand, differentiating the integral formula for the resolvent $n$ times gives
$$\frac{d^nR(z, A)x}{dz^n} = \int_0^\infty e^{-zt}(-t)^nT_tx\,dt,$$
where differentiation under the integral sign is justified by the fact that the $T_t$ are equicontinuous in $t$. Putting the previous two equations together gives
$$(zR(z, A))^{n+1}x = \frac{z^{n+1}}{n!}\int_0^\infty e^{-zt}t^nT_tx\,dt.$$
This implies that for any semi-norm $p$ we have
$$p\left((zR(z, A))^{n+1}x\right) \le \frac{z^{n+1}}{n!}\int_0^\infty e^{-zt}t^n\sup_{t\ge0}p(T_tx)\,dt = \sup_{t\ge0}p(T_tx)$$
since
$$\int_0^\infty e^{-zt}t^n\,dt = \frac{n!}{z^{n+1}}.$$
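For a scalar generator the identity $(zR(z,A))^{n+1}x = \frac{z^{n+1}}{n!}\int_0^\infty e^{-zt}t^nT_tx\,dt$ can be checked directly. In the Python sketch below (an illustration under stated assumptions, with hypothetical helper names), $A$ is multiplication by a real number $a \le 0$ on $\mathbb R$, so that $T_t = e^{at}$ is a contraction semigroup and $R(z, A) = 1/(z - a)$; the infinite integral is truncated at $t = 40$.

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def resolvent_power(z, a, n):
    """(z R(z, A))^{n+1} for the scalar generator A = a, where R(z, A) = 1/(z - a)."""
    return (z / (z - a)) ** (n + 1)


def laplace_side(z, a, n, upper=40.0):
    """z^{n+1}/n! * integral_0^upper e^{-zt} t^n T_t dt with T_t = e^{at}; upper stands in for infinity."""
    integral = simpson(lambda t: math.exp(-z * t) * t**n * math.exp(a * t), 0.0, upper)
    return z ** (n + 1) / math.factorial(n) * integral


z, a = 2.0, -1.0   # a <= 0, so sup_t |T_t| = 1 and the bound in the text says |(zR)^{n+1}| <= 1
for n in range(5):
    print(n, resolvent_power(z, a, n), laplace_side(z, a, n))
```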
$$x \in D(A). \qquad (11.14)$$
The idea of the proof is now this: By the results of the preceding section, we can construct the one parameter semigroup $s \mapsto \exp(sJ_n)$. Set $s = nt$. We can then form $e^{-nt}\exp(ntJ_n)$, which we can write as $\exp(tn(J_n - I)) = \exp(tAJ_n)$ by virtue of (11.14). We expect from (11.5) that
$$\lim_{n\to\infty}J_nx = x \qquad \forall x \in \mathcal F. \qquad (11.15)$$
This then suggests that the limit of the $\exp(tAJ_n)$ should be the desired semi-group.
So we begin by proving (11.15). We first prove it for $x \in D(A)$. For such $x$ we have $(J_n - I)x = n^{-1}J_nAx$ by (11.14), and this approaches zero since the $J_n$ are equibounded. But since $D(A)$ is dense in $\mathcal F$ and the $J_n$ are equibounded we conclude that (11.15) holds for all $x \in \mathcal F$.
Now define
$$T_t^{(n)} := \exp(tAJ_n) = e^{-nt}\sum_{k=0}^\infty\frac{(nt)^k}{k!}J_n^k,$$
$$\cdots$$
$$p(T_t^{(n)}x) \le Kq(x). \qquad (11.16)$$
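$$T_t^{(n)}x - T_t^{(m)}x = \int_0^t\frac{d}{ds}\left(T_{t-s}^{(m)}T_s^{(n)}\right)x\,ds = \int_0^t T_{t-s}^{(m)}T_s^{(n)}(J_n - J_m)Ax\,ds,$$
so
$$p\left(T_t^{(n)}x - T_t^{(m)}x\right) \le Kt\,q((J_n - J_m)Ax).$$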
From (11.15) this implies that the $T_t^{(n)}x$ converge (uniformly in every compact interval of $t$) for $x \in D(A)$, and hence, since $D(A)$ is dense and the $T_t^{(n)}$ are equicontinuous, for all $x \in \mathcal F$. The limiting family of operators $T_t$ are equicontinuous and form a semi-group because the $T_t^{(n)}$ have this property.
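The convergence of $\exp(tAJ_n)$ can be seen in the simplest possible setting. The Python sketch below assumes a scalar generator $A = a$ with $a$ purely imaginary, so that $T_t = e^{at}$ is a unitary group on $\mathbb C$; then $J_n = n/(n - a)$ and $AJ_n = n(J_n - 1)$. It measures the largest error over a grid of the compact interval $[0, 5]$. The function names and the choice $a = 2i$ are illustrative.

```python
import cmath


def yosida_generator(a: complex, n: int) -> complex:
    """A J_n = n (J_n - I), where J_n = n R(n, A) = n/(n - a), for the scalar generator A = a."""
    j_n = n / (n - a)
    return n * (j_n - 1)


def max_error(a: complex, n: int, t_max: float = 5.0, steps: int = 500) -> float:
    """Largest |exp(t A J_n) - exp(t A)| over a grid of the compact interval [0, t_max]."""
    b = yosida_generator(a, n)
    grid = (k * t_max / steps for k in range(steps + 1))
    return max(abs(cmath.exp(t * b) - cmath.exp(t * a)) for t in grid)


a = 2j   # skew-adjoint generator of the unitary group t -> e^{2it}
errors = [max_error(a, n) for n in (10, 100, 1000, 10000)]
print(errors)
```

Since $\mathrm{Re}(AJ_n) \le 0$ here, each approximating semigroup is itself a contraction semigroup, mirroring the equicontinuity used in the proof.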
We must show that the infinitesimal generator of this semi-group is $A$. Let us temporarily denote the infinitesimal generator of this semi-group by $B$, so that we want to prove that $A = B$. Let $x \in D(A)$. We claim that
$$\lim_{n\to\infty}T_t^{(n)}AJ_nx = T_tAx \qquad (11.17)$$
uniformly on every compact interval of $t$. Indeed,
$$\begin{aligned}p\left(T_tAx - T_t^{(n)}AJ_nx\right) &\le p\left(T_tAx - T_t^{(n)}Ax\right) + p\left(T_t^{(n)}Ax - T_t^{(n)}AJ_nx\right)\\ &= p\left((T_t - T_t^{(n)})Ax\right) + p\left(T_t^{(n)}(Ax - J_nAx)\right)\\ &\le p\left((T_t - T_t^{(n)})Ax\right) + Kq(Ax - J_nAx),\end{aligned}$$
where we have used (11.16) to get from the second line to the third. The second term on the right tends to zero as $n \to \infty$ and we have already proved that the first term converges to zero uniformly on every compact interval of $t$. This establishes (11.17).
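$$\begin{aligned}\lim_{n\to\infty}\left(T_t^{(n)}x - x\right) &= \lim_{n\to\infty}\int_0^t T_s^{(n)}AJ_nx\,ds\\ &= \int_0^t\left(\lim_{n\to\infty}T_s^{(n)}AJ_nx\right)ds\\ &= \int_0^t T_sAx\,ds,\end{aligned}$$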
where the passage of the limit under the integral sign is justified by the uniform convergence in $t$ on compact sets. It follows from $T_tx - x = \int_0^tT_sAx\,ds$ that $x$ is in the domain of the infinitesimal generator $B$ of $T_t$ and that $Bx = Ax$. So $B$ is an extension of $A$ in the sense that $D(B) \supset D(A)$ and $Bx = Ax$ on $D(A)$. But since $B$ is the infinitesimal generator of an equibounded semi-group, we know that $(I - B)$ maps $D(B)$ onto $\mathcal F$ bijectively, and we are assuming that $(I - A)$ maps $D(A)$ onto $\mathcal F$ bijectively. Hence $D(A) = D(B)$. QED
In case $\mathcal F$ is a Banach space, so there is a single norm $p = \|\cdot\|$, the hypotheses of the theorem read: $D(A)$ is dense in $\mathcal F$, the resolvents $R(n, A)$ exist for all integers $n = 1, 2, \ldots$ and there is a constant $K$ independent of $n$ and $m$ such that
$$\|(I - n^{-1}A)^{-m}\| \le K, \qquad n = 1, 2, \ldots, \quad m = 1, 2, \ldots. \qquad (11.18)$$
11.6
Contraction semigroups.
In particular, if $A$ satisfies
$$\|(I - n^{-1}A)^{-1}\| \le 1 \qquad (11.19)$$
$$\cdots\ \frac{1}{|\mathrm{Im}\,z|}.$$
11.6.1
$$\cdots \qquad (11.20)$$
$$\cdots \qquad (11.21)$$
by our general power series formula for the resolvent. In particular, for $s$ real and $|s - 1| < 1$ the resolvent exists, and then (11.21) implies that $\|R(s, A)\| \le s^{-1}$. Repeating the process we keep enlarging the resolvent set $\rho(A)$ until it includes the whole positive real axis, and conclude from (11.21) that $\|R(s, A)\| \le s^{-1}$ for all $s > 0$, which implies (11.19). As we are assuming that $D(A)$ is dense we conclude that $A$ generates a contraction semigroup.
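Conversely, suppose that $T_t$ is a contraction semi-group with infinitesimal generator $A$. We know that $\mathrm{Dom}(A)$ is dense. Let $\langle\langle\cdot,\cdot\rangle\rangle$ be any semi-scalar product. Then
$$\mathrm{Re}\,\langle\langle T_tx - x, x\rangle\rangle = \mathrm{Re}\,\langle\langle T_tx, x\rangle\rangle - \|x\|^2 \le \|T_tx\|\,\|x\| - \|x\|^2 \le 0.$$
Dividing by $t$ and letting $t \searrow 0$ we conclude that $\mathrm{Re}\,\langle\langle Ax, x\rangle\rangle \le 0$ for all $x \in D(A)$, i.e. $A$ is dissipative for $\langle\langle\cdot,\cdot\rangle\rangle$. QED
Once again, this gives a direct proof of the existence of the unitary group generated by a skew adjoint operator.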
A useful way of verifying the condition $\mathrm{im}(I - A) = \mathcal F$ is the following: Let $A^* : \mathcal F^* \to \mathcal F^*$ be the adjoint operator, which is defined if we assume that $D(A)$ is dense.
Proposition 11.6.1 Suppose that $A$ is densely defined and closed, and suppose that both $A$ and $A^*$ are dissipative. Then $\mathrm{im}(I - A) = \mathcal F$ and hence $A$ generates a contraction semigroup.
Proof. The fact that $A$ is closed implies that $(I - A)^{-1}$ is closed, and since we know that $(I - A)^{-1}$ is bounded from the fact that $A$ is dissipative, we conclude that $\mathrm{im}(I - A)$ is a closed subspace of $\mathcal F$. If it were not the whole space there would be a non-zero $\ell \in \mathcal F^*$ which vanished on this subspace, i.e.
$$\langle\ell, x - Ax\rangle = 0 \qquad \forall x \in D(A).$$
This implies that $\ell \in D(A^*)$ and $A^*\ell = \ell$, which cannot happen if $A^*$ is dissipative, by (11.21) applied to $A^*$ and $s = 1$. QED
11.6.2
$$\exp(t(B - I)) = e^{-t}\sum_{k=0}^\infty\frac{t^kB^k}{k!},$$
so that, when $\|B\| \le 1$,
$$\|\exp(t(B - I))\| \le e^{-t}\sum_{k=0}^\infty\frac{t^k\|B\|^k}{k!} \le 1.$$
For future use (Chernoff's theorem and the Trotter product formula) we record (and prove) the following inequality:
$$\|[\exp(n(B - I)) - B^n]x\| \le \sqrt n\,\|(B - I)x\| \qquad \forall x \in \mathcal F, \quad n = 1, 2, 3, \ldots. \qquad (11.22)$$
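Proof.
$$\begin{aligned}\|[\exp(n(B - I)) - B^n]x\| &= \left\|e^{-n}\sum_{k=0}^\infty\frac{n^k}{k!}(B^k - B^n)x\right\|\\ &\le e^{-n}\sum_{k=0}^\infty\frac{n^k}{k!}\|(B^k - B^n)x\|\\ &\le e^{-n}\sum_{k=0}^\infty\frac{n^k}{k!}\|(B^{|k-n|} - I)x\|\\ &= e^{-n}\sum_{k=0}^\infty\frac{n^k}{k!}\|(B - I)(I + B + \cdots + B^{|k-n|-1})x\|\\ &\le e^{-n}\sum_{k=0}^\infty\frac{n^k}{k!}|k - n|\,\|(B - I)x\|.\end{aligned}$$
So (11.22) will follow from
$$e^{-n}\sum_{k=0}^\infty\frac{n^k}{k!}|k - n| \le \sqrt n. \qquad (11.23)$$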
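Consider the space of all sequences $a = \{a_0, a_1, \ldots\}$ with finite norm relative to the scalar product
$$(a, b) := e^{-n}\sum_{k=0}^\infty\frac{n^k}{k!}a_k\overline{b_k}.$$
The Cauchy-Schwarz inequality applied to $a_k = |k - n|$ and $b_k = 1$ gives
$$e^{-n}\sum_{k=0}^\infty\frac{n^k}{k!}|k - n| \le \sqrt{e^{-n}\sum_{k=0}^\infty\frac{n^k}{k!}(k - n)^2}\;\sqrt{e^{-n}\sum_{k=0}^\infty\frac{n^k}{k!}}.$$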
The second square root is one, and we recognize the sum under the first square
root as the variance of the Poisson distribution with parameter n, and we know
that this variance is n. QED
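Both (11.23) and (11.22) are easy to test numerically. The Python sketch below is illustrative: it computes the Poisson mean absolute deviation with the recursion $w_{k+1} = w_k\,n/(k+1)$ for the weights, and for (11.22) it assumes the simplest contraction, the rotation $B$ of $\mathbb R^2 \cong \mathbb C$ by an angle $\theta$, realized as multiplication by $e^{i\theta}$ and applied to $x = 1$, so that $\exp(n(B - I))$ is multiplication by $e^{n(e^{i\theta} - 1)}$.

```python
import cmath
import math


def poisson_mean_abs_deviation(n: int) -> float:
    """e^{-n} sum_k n^k/k! |k - n|, building the Poisson weights by w_{k+1} = w_k n/(k+1)."""
    w = math.exp(-n)
    total = 0.0
    k_max = n + 20 * int(math.sqrt(n)) + 50   # the neglected tail carries negligible Poisson mass
    for k in range(k_max + 1):
        total += w * abs(k - n)
        w *= n / (k + 1)
    return total


def chernoff_gap(n: int, theta: float) -> float:
    """|[exp(n(B - I)) - B^n] x| for B = multiplication by e^{i theta} on C and x = 1."""
    b = cmath.exp(1j * theta)
    return abs(cmath.exp(n * (b - 1)) - b**n)


def chernoff_bound(n: int, theta: float) -> float:
    """Right side of (11.22): sqrt(n) |(B - I) x|."""
    return math.sqrt(n) * abs(cmath.exp(1j * theta) - 1)


for n in (1, 4, 16, 64):
    print(n, poisson_mean_abs_deviation(n), math.sqrt(n), chernoff_gap(n, 0.3), chernoff_bound(n, 0.3))
```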
11.7
Convergence of semigroups.
restrictive hypothesis that these all coincide, since in many important applications they won't.
For this purpose we make the following definition. Let us assume that $\mathcal F$ is a Banach space and that $A$ is an operator on $\mathcal F$ defined on a domain $D(A)$. We say that a linear subspace $D \subset D(A)$ is a core for $A$ if the closure $\bar A$ of $A$ and the closure of $A$ restricted to $D$ are the same: $\bar A = \overline{A|D}$. This certainly implies that $D(A)$ is contained in the domain of the closure of $A|D$. In the cases of interest to us $D(A)$ is dense in $\mathcal F$, so that every core of $A$ is dense in $\mathcal F$.
We begin with an important preliminary result:
Proposition 11.7.1 Suppose that $A_n$ and $A$ are dissipative operators, i.e. generators of contraction semi-groups. Let $D$ be a core of $A$. Suppose that for each $x \in D$ we have that $x \in D(A_n)$ for sufficiently large $n$ (depending on $x$) and that
$$A_nx \to Ax. \qquad (11.24)$$
Then for any $z$ with $\mathrm{Re}\,z > 0$ and for all $y \in \mathcal F$
$$R(z, A_n)y \to R(z, A)y. \qquad (11.25)$$
Proof. We know that the $R(z, A_n)$ and $R(z, A)$ are all bounded in norm by $1/\mathrm{Re}\,z$. So it is enough for us to prove convergence on a dense set. Since $(zI - A)D(A) = \mathcal F$ and $D$ is a core for the closed operator $A$, it follows that $(zI - A)D$ is dense in $\mathcal F$. So in proving (11.25) we may assume that $y = (zI - A)x$ with $x \in D$. Then
$$\begin{aligned}\|R(z, A_n)y - R(z, A)y\| &= \|R(z, A_n)(zI - A)x - x\|\\ &= \|R(z, A_n)(zI - A_n)x + R(z, A_n)(A_nx - Ax) - x\|\\ &= \|R(z, A_n)(A_n - A)x\|\\ &\le \frac{1}{\mathrm{Re}\,z}\|(A_n - A)x\| \to 0,\end{aligned}$$
where, in passing from the first line to the second, we are assuming that $n$ is chosen sufficiently large that $x \in D(A_n)$. QED
Theorem 11.7.1 Under the hypotheses of the preceding proposition,
$$(\exp(tA_n))x \to (\exp(tA))x$$
for each $x \in \mathcal F$, uniformly on every compact interval of $t$.
Proof. Let
$$\phi_n(t) := e^{-t}\left[(\exp(tA_n))x - (\exp(tA))x\right] \qquad \text{for } t \ge 0$$
and set $\phi_n(t) = 0$ for $t < 0$. It will be enough to prove that these $\mathcal F$-valued functions converge uniformly in $t$ to 0, and since $D$ is dense and since the operators entering into the definition of $\phi_n$ are uniformly bounded in $n$, it is enough to prove this convergence for $x \in D$. We claim that for fixed $x \in D$ the functions $\phi_n(t)$ are uniformly equi-continuous. To see this observe that
$$\frac{d}{dt}\phi_n(t) = e^{-t}\left[(\exp(tA_n))A_nx - (\exp(tA))Ax\right] - e^{-t}\left[(\exp(tA_n))x - (\exp(tA))x\right]$$
for $t \ge 0$, and the right hand side is uniformly bounded in $t \ge 0$ and $n$.
So to prove that $\phi_n(t)$ converges uniformly in $t$ to 0, it is enough to prove this fact for the convolution $\phi_n \star \rho$ where $\rho$ is any smooth function of compact support, since we can choose the $\rho$ to have small support and integral $\sqrt{2\pi}$, and then $\phi_n(t)$ is close to $(\phi_n \star \rho)(t)$.
Now the Fourier transform of $\phi_n \star \rho$ is the product of their Fourier transforms: $\hat\phi_n\hat\rho$. We have $\hat\phi_n(s) = \cdots$
$$\cdots$$
and $\mathrm{im}\,R(z_0)$ is dense in $\mathcal F$.
11.8
11.8.1
Lie's formula.
(11.26)
$$\|S_n^n - T_n^n\| \le \frac{C}{n^2}\sum_{k=0}^{n-1}\|S_n\|^k\,\|T_n\|^{n-1-k},$$
so, since
$$\|S_n\| \le \exp\tfrac1n(\|A\| + \|B\|) \qquad\text{and}\qquad \|T_n\| \le \exp\tfrac1n(\|A\| + \|B\|)$$
and
$$\left(\exp\tfrac1n(\|A\| + \|B\|)\right)^{n-1} = \exp\tfrac{n-1}{n}(\|A\| + \|B\|) \le \exp(\|A\| + \|B\|),$$
we obtain
$$\|S_n^n - T_n^n\| \le \frac Cn\exp(\|A\| + \|B\|).$$
This same proof works if $A$ and $B$ are self-adjoint operators such that $A + B$ is self-adjoint on the intersection of their domains. For a proof see Reed-Simon vol. I pages 295-296. For applications this is too restrictive. So we give a more general formulation and proof following Chernoff.
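The $O(1/n)$ rate in the estimate above can be observed for $2\times2$ matrices. The Python sketch below assumes, as the surrounding fragments suggest, that $S_n = e^{A/n}e^{B/n}$ and $T_n = e^{(A+B)/n}$, so that $T_n^n = e^{A+B}$. It uses a truncated Taylor series for the exponential (adequate only because the matrices involved have small norm) and the Frobenius norm as an upper bound for the operator norm; the non-commuting matrices $A$ and $B$ are an arbitrary choice.

```python
import math

I2 = [[1.0, 0.0], [0.0, 1.0]]


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def lincomb(p, q, s=1.0):
    """Return p + s q."""
    return [[p[i][j] + s * q[i][j] for j in range(2)] for i in range(2)]


def scale(p, s):
    return [[s * p[i][j] for j in range(2)] for i in range(2)]


def expm_series(m, terms=40):
    """Truncated Taylor series for exp(m); adequate here only because the matrices have small norm."""
    result, term = I2, I2
    for k in range(1, terms):
        term = scale(matmul(term, m), 1.0 / k)
        result = lincomb(result, term)
    return result


def matpow(m, n):
    result = I2
    for _ in range(n):
        result = matmul(result, m)
    return result


def frobenius(m):
    """Frobenius norm, an upper bound for the operator norm."""
    return math.sqrt(sum(v * v for row in m for v in row))


A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]          # AB != BA
target = expm_series(lincomb(A, B))   # T_n^n = exp(A + B)
errors = {}
for n in (1, 10, 100, 1000):
    s_n = matmul(expm_series(scale(A, 1 / n)), expm_series(scale(B, 1 / n)))   # S_n = e^{A/n} e^{B/n}
    errors[n] = frobenius(lincomb(matpow(s_n, n), target, -1.0))              # |S_n^n - T_n^n|
    print(n, errors[n], n * errors[n])
```

The printed products $n\,\|S_n^n - T_n^n\|$ should stay bounded, as the estimate predicts.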
11.8.2
Chernoff's theorem.
$x \in D$.
$$\lim_{n\to\infty}f\left(\frac tn\right)^ny = (\exp tA)y \qquad (11.27)$$
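$$\left\|\exp(tC_n)x - f\left(\frac tn\right)^nx\right\| \le \sqrt n\left\|\left(f\left(\frac tn\right) - I\right)x\right\| = \frac{t}{\sqrt n}\left\|\frac{f(t/n) - I}{t/n}\,x\right\|.$$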
The expression inside the $\|\ \|$ on the right tends to $Ax$, so the whole expression tends to zero. This proves (11.27) for all $x$ in $D$. But since $D$ is dense in $\mathcal F$ and $f(t/n)$ and $\exp tA$ are bounded in norm by 1, it follows that (11.27) holds for all $y \in \mathcal F$. QED
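A standard instance of Chernoff's theorem is $f(s) = (I - sA)^{-1}$, the backward Euler step, for which $f(0) = I$, $f'(0) = A$, and $\|f(s)\| \le 1$ when $A$ is dissipative. The Python sketch below illustrates this under the assumption that $A$ is the skew matrix $\begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}$, so that $\exp(tA)$ is rotation by the angle $t$; it shows $f(t/n)^ny$ approaching $(\exp tA)y$ with an error of order $1/n$. The helper names are hypothetical.

```python
import math


def f(s):
    """f(s) = (I - sA)^{-1} for A = [[0, -1], [1, 0]], written out explicitly; f(0) = I and f'(0) = A."""
    d = 1.0 + s * s
    return [[1.0 / d, -s / d], [s / d, 1.0 / d]]


def apply(m, x):
    return [m[0][0] * x[0] + m[0][1] * x[1], m[1][0] * x[0] + m[1][1] * x[1]]


def chernoff_approx(t, n, y):
    """f(t/n)^n y, the expression under the limit in (11.27)."""
    step = f(t / n)
    for _ in range(n):
        y = apply(step, y)
    return y


def error(t, n):
    """Distance from f(t/n)^n y to exp(tA) y = (cos t, sin t) for y = (1, 0)."""
    u = chernoff_approx(t, n, [1.0, 0.0])
    return math.hypot(u[0] - math.cos(t), u[1] - math.sin(t))


for n in (10, 100, 1000, 10000):
    print(n, error(2.0, n))
```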
11.8.3
(11.28)
11.8.4
Commutators.
11.8.5
This is the starting point of a class of theorems which assert that if $A$ is self-adjoint and if $B$ is a symmetric operator which is small in comparison to $A$, then $A + B$ is self-adjoint.
$0 \le a < 1, \quad x \in D(A).$
$\cdots$
$\|A(A + i)^{-1}\| \le 1.$
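11.8.6
$$H_0 : L^2(\mathbb R^3) \to L^2(\mathbb R^3)$$
given by
$$H_0 := -\left(\frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2} + \frac{\partial^2}{\partial x_3^2}\right).$$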
Here the domain of $H_0$ is taken to be those $\psi \in L^2(\mathbb R^3)$ for which the differential operator on the right, taken in the distributional sense, when applied to $\psi$ gives an element of $L^2(\mathbb R^3)$.
The operator $H_0$ is called the free Hamiltonian of non-relativistic quantum mechanics. The Fourier transform $\mathcal F$ is a unitary isomorphism of $L^2(\mathbb R^3)$ onto $L^2(\mathbb R^3)$ and carries $H_0$ into multiplication by $\xi^2$ whose domain consists of those
t
(exp i V )f
n
Hence we can write the expression under the limit sign in the Trotter product formula, when applied to $f$ and evaluated at $x_0$, as the following formal expression:
3n/2 Z
Z
4it
11.9
An important advance was introduced by Mark Kac in 1951, in which the unitary group $\exp(-it(H_0 + V))$ is replaced by the contraction semi-group $\exp(-t(H_0 + V))$.
Then the techniques of probability theory (in particular the existence of Wiener
measure on the space of continuous paths) can be brought to bear to justify
a formula for the contractive semi-group as an integral over path space. I will
state and prove an elementary version of this formula which follows directly
from what we have done. The assumptions about the potential are physically
unrealistic, but I choose to regard the extension to a more realistic potential as
a technical issue rather than a conceptual one.
Let $V$ be a continuous real valued function of compact support. To each continuous path $\omega$ on $\mathbb R^n$ and for each fixed time $t \ge 0$ we can consider the integral
$$\int_0^tV(\omega(s))\,ds.$$
The map
$$\omega \mapsto \int_0^tV(\omega(s))\,ds \qquad (11.31)$$
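$$\int_0^tV(\omega(s))\,ds = \lim_{m\to\infty}\frac tm\sum_{j=1}^mV\left(\omega\left(\frac{jt}{m}\right)\right) \qquad (11.32)$$
for each fixed $\omega$.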
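Theorem 11.9.1 The Feynman-Kac formula. Let $V$ be a continuous real valued function of compact support on $\mathbb R^n$. Let
$$H = -\Delta + V$$
as an operator on $\mathcal H = L^2(\mathbb R^n)$. Then $H$ is self-adjoint and for every $f \in \mathcal H$
$$\left(e^{-tH}f\right)(x) = \int_{\Omega_x}f(\omega(t))\exp\left(-\int_0^tV(\omega(s))\,ds\right)d\omega_x \qquad (11.33)$$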
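$$\begin{aligned}\cdots &= \int_{\mathbb R^n}\cdots\int_{\mathbb R^n}p\left(x, x_1, \tfrac tm\right)\cdots p\left(x_{m-1}, x_m, \tfrac tm\right)f(x_m)\exp\left(-\frac tm\sum_{j=1}^mV(x_j)\right)dx_1\cdots dx_m\\ &= \int_{\Omega_x}\exp\left(-\frac tm\sum_{j=1}^mV\left(\omega\left(\tfrac{jt}{m}\right)\right)\right)f(\omega(t))\,d\omega_x.\end{aligned}$$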
The integrand (with respect to the Wiener measure $d\omega_x$) converges on all continuous paths, that is to say almost everywhere with respect to $d\omega_x$, to the integrand on the right hand side of (11.33). So to justify (11.33) we must prove that the integral of the limit is the limit of the integral. We will do this by the dominated convergence theorem:
$$\left|\exp\left(-\frac tm\sum_{j=1}^mV\left(\omega\left(\tfrac{jt}{m}\right)\right)\right)f(\omega(t))\right| \le e^{t\max|V|}\,|f(\omega(t))|$$
and
$$\int_{\Omega_x}e^{t\max|V|}\,|f(\omega(t))|\,d\omega_x = e^{t\max|V|}\left(e^{t\Delta}|f|\right)(x) < \infty$$
for almost all $x$. Hence, by the dominated convergence theorem, (11.33) holds for almost all $x$. QED
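The right hand side of (11.33) suggests a Monte Carlo method. The following Python sketch is only an illustration, under assumptions not made in the text: it works in one dimension, samples discretized paths with independent Gaussian increments of variance $2t/m$ (so that the generator is $\Delta = d^2/dx^2$, matching $e^{t\Delta}$ above), and uses the Riemann sum from (11.32) in the exponent. As a sanity check, the test uses a constant potential $V \equiv c$, which violates the compact support hypothesis but gives the exact value $e^{-ct}(1 + 4t)^{-1/2}e^{-x^2/(1+4t)}$ for $f(y) = e^{-y^2}$.

```python
import math
import random


def feynman_kac_estimate(f, V, x, t, m=20, paths=10000, seed=0):
    """Monte Carlo sketch of (e^{-tH} f)(x) for H = -d^2/dx^2 + V on the line.

    Paths are discretized Brownian motion with generator d^2/dx^2, i.e. independent
    Gaussian increments of variance 2t/m. Each path is weighted by the Riemann-sum
    integrand exp(-(t/m) sum_{j=1}^m V(omega(jt/m))) f(omega(t)).
    """
    rng = random.Random(seed)
    dt = t / m
    sd = math.sqrt(2.0 * dt)
    total = 0.0
    for _ in range(paths):
        pos, action = x, 0.0
        for _ in range(m):
            pos += rng.gauss(0.0, sd)
            action += V(pos)
        total += math.exp(-dt * action) * f(pos)
    return total / paths


def f0(y):
    return math.exp(-y * y)


def bump(y):
    """A continuous potential of compact support, as required in Theorem 11.9.1."""
    return max(0.0, 1.0 - abs(y))


print(feynman_kac_estimate(f0, bump, 0.3, 0.5))
```

With a fixed seed the same paths are reused for every potential, so a non-negative $V$ can only lower the estimate relative to $V = 0$.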
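11.10
In this section I want to discuss the following circle of ideas. Consider the operator
$$H_0 : L^2(\mathbb R^3) \to L^2(\mathbb R^3)$$
given by
$$H_0 := -\left(\frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2} + \frac{\partial^2}{\partial x_3^2}\right).$$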
Here the domain of $H_0$ is taken to be those $\psi \in L^2(\mathbb R^3)$ for which the differential operator on the right, taken in the distributional sense, when applied to $\psi$ gives an element of $L^2(\mathbb R^3)$.
The operator $H_0$ has a fancy name. It is called the free Hamiltonian of non-relativistic quantum mechanics. Strictly speaking we should add "for particles of mass one-half in units where Planck's constant is one".
The Fourier transform is a unitary isomorphism of $L^2(\mathbb R^3)$ onto $L^2(\mathbb R^3)$ and carries $H_0$ into multiplication by $\xi^2$ whose domain consists of those $\hat\psi \in L^2(\mathbb R^3)$ such that $\xi^2\hat\psi(\xi)$ belongs to $L^2(\mathbb R^3)$. The operators
$$V(t) : L^2(\mathbb R^3) \to L^2(\mathbb R^3), \qquad \hat\psi(\xi) \mapsto e^{-it\xi^2}\hat\psi(\xi)$$
form a one parameter group of unitary transformations whose infinitesimal generator in the sense of Stone's theorem is the operator consisting of multiplication by $\xi^2$ with domain as given above. [The minus sign before the $i$ in the exponential is the convention used in quantum mechanics. So we write $\exp(-itA)$ for the one-parameter group associated to the self-adjoint operator $A$. I apologize for this (rather irrelevant) notational change, but I want to make the notation in this section consistent with what you will see in physics books.]
Thus the operator of multiplication by $\xi^2$, and hence the operator $H_0$, is a self-adjoint transformation. The operator of multiplication by $\xi^2$ is clearly non-negative and so every point on the negative real axis belongs to its resolvent set. Let us write a point on the negative real axis as $-\mu^2$ where $\mu > 0$. Then the resolvent of multiplication by $\xi^2$ at such a point on the negative real axis is given by multiplication by $-f$, where
$$f(\xi) = f_\mu(\xi) := \frac{1}{\xi^2 + \mu^2}.$$
11.10.1
Nevertheless, we will be able to give some slightly more explicit (and very instructive) representations of these operators as convolutions. For example, we will use the Cauchy residue calculus to compute $\check f$ and we will find, up to factors of powers of $2\pi$, that $\check f$ is the function
$$Y_\mu(x) := \frac{e^{-\mu r}}{r}$$
where $r$ denotes the distance from the origin, i.e. $r^2 = x^2$. This function has an integrable singularity at the origin, and vanishes rapidly at infinity. So convolution by $Y_\mu$ will be well defined and given by the usual formula on elements of $\mathcal S$, and extends to an operator on $L^2(\mathbb R^3)$.
The function $Y_\mu$ is known as the Yukawa potential. Yukawa introduced this function in 1934 to explain the forces that hold the nucleus together. The exponential decay with distance contrasts with that of the ordinary electromagnetic or gravitational potential $1/r$ and, in Yukawa's theory, accounts for the fact that the nuclear forces are short range. In fact, Yukawa introduced a heavy boson to account for the nuclear forces. The role of mesons in nuclear physics was predicted by brilliant theoretical speculation well before any experimental discovery. Here are the details:
Since $f \in L^2$ we can compute its inverse Fourier transform as
$$(2\pi)^{-3/2}\check f = \lim_{R\to\infty}(2\pi)^{-3}\int_{|\xi|\le R}\frac{e^{i\xi\cdot x}}{\xi^2 + \mu^2}\,d\xi. \qquad (11.34)$$
[Figure: the rectangle in the complex $s$-plane with vertices $-R$, $R$, $R + i\sqrt R$ and $-R + i\sqrt R$.]
Here lim means the $L^2$ limit and $|\xi|$ denotes the length of the vector $\xi$, i.e. $|\xi| = \sqrt{\xi^2}$, and we will use similar notation $|x| = r$ for the length of $x$. Assume $x \ne 0$. Let
$$u := \frac{\xi\cdot x}{|\xi|\,|x|}$$
so $u$ is the cosine of the angle between $x$ and $\xi$. Fix $x$ and introduce spherical coordinates in $\xi$ space with $x$ at the north pole and $s = |\xi|$ so that
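$$\begin{aligned}(2\pi)^{-3}\int_{|\xi|\le R}\frac{e^{i\xi\cdot x}}{\xi^2 + \mu^2}\,d\xi &= (2\pi)^{-2}\int_0^R\int_{-1}^1\frac{e^{is|x|u}}{s^2 + \mu^2}\,s^2\,du\,ds\\ &= \frac{1}{(2\pi)^2\,i|x|}\int_{-R}^R\frac{s\,e^{is|x|}}{(s + i\mu)(s - i\mu)}\,ds.\end{aligned}$$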
This last integral is along the bottom of the path in the complex $s$-plane consisting of the boundary of the rectangle as drawn in the figure. On the two vertical sides of the rectangle, the integrand is bounded by some constant times $1/R$, so the contribution of the vertical sides is $O(1/\sqrt R)$. On the top the integrand is $O(e^{-\sqrt R|x|})$. So the limits of these integrals are zero. There is only one pole in the upper half plane, at $s = i\mu$, so the integral is given by $2\pi i$ times the residue there, which equals
$$2\pi i\,\frac{i\mu\,e^{-\mu|x|}}{2i\mu} = \pi i\,e^{-\mu|x|}.$$
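Inserting this back into (11.34) we see that the limit exists and is equal to
$$(2\pi)^{-3/2}\check f = \frac{1}{4\pi}\,\frac{e^{-\mu|x|}}{|x|}.$$
$$\cdots\ \frac{1}{4\pi}\int_{\mathbb R^3}\frac{e^{-\mu|x-y|}}{|x-y|}\,\psi(y)\,dy,$$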
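The result of the residue computation can be cross-checked in the forward direction: the integral $\int_{\mathbb R^3}e^{-i\xi\cdot x}\frac{e^{-\mu|x|}}{4\pi|x|}\,dx$ should equal $\frac{1}{|\xi|^2 + \mu^2}$, which is equivalent to the formula above. For a radial function the angular integration reduces this to $\frac1k\int_0^\infty\sin(kr)\,e^{-\mu r}\,dr$ with $k = |\xi|$. The Python sketch below is illustrative; the cutoff $r = 60/\mu$ and the Simpson rule are assumptions of the sketch.

```python
import math


def simpson(g, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def yukawa_fourier(k: float, mu: float) -> float:
    """Integral over R^3 of e^{-i xi.x} e^{-mu|x|}/(4 pi |x|) dx at |xi| = k > 0.

    The angular integration gives 4 pi r^2 sin(kr)/(kr), so the integral reduces to
    (1/k) * integral_0^infinity sin(k r) e^{-mu r} dr, truncated here at r = 60/mu.
    """
    return simpson(lambda r: math.sin(k * r) * math.exp(-mu * r), 0.0, 60.0 / mu) / k


for k, mu in ((0.5, 1.0), (3.0, 1.0), (2.0, 0.5)):
    print(k, mu, yukawa_fourier(k, mu), 1.0 / (k * k + mu * mu))
```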
11.10.2
The explicit calculation of the operator $U(t)$ is slightly more tricky. The function $\xi \mapsto e^{-it\xi^2}$ is an "imaginary Gaussian", so we expect its inverse Fourier transform to also be an imaginary Gaussian, and then we would have to make sense of convolution by a function which has absolute value one at all points. There are several ways to proceed. One involves integration by parts, and I hope to explain how this works later on in the course in conjunction with the method of stationary phase.
Here I will follow Reed-Simon vol II p.59 and add a little positive term to $it$ and then pass to the limit. In other words, let $\alpha$ be a complex number with positive real part and consider the function
$$\xi \mapsto e^{-\alpha\xi^2}.$$
This function belongs to $\mathcal S$ and its inverse Fourier transform is given by the function
$$x \mapsto (2\alpha)^{-3/2}e^{-x^2/4\alpha}.$$
(In fact, we verified this when $\alpha$ is real, but the integral defining the inverse Fourier transform converges in the entire half plane $\mathrm{Re}\,\alpha > 0$, uniformly in any $\mathrm{Re}\,\alpha > \epsilon$, and so is holomorphic in the right half plane. So the formula for $\alpha$ real and positive implies the formula for $\alpha$ in the half plane.)
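The continuation argument can be tested in one dimension, where the corresponding formula reads $(2\pi)^{-1/2}\int_{\mathbb R}e^{ix\xi}e^{-\alpha\xi^2}\,d\xi = (2\alpha)^{-1/2}e^{-x^2/4\alpha}$; the three dimensional formula is the product of three such factors. The Python sketch below compares a truncated Simpson approximation with the closed form for a few $\alpha$ with $\mathrm{Re}\,\alpha > 0$. It relies on `cmath.sqrt` returning the principal square root, which on the right half plane is the continuation of the positive square root; the truncation at $|\xi| = 25$ is an assumption of the sketch.

```python
import cmath
import math


def inverse_ft_numeric(alpha: complex, x: float, L: float = 25.0, n: int = 50000) -> complex:
    """(2 pi)^{-1/2} * integral_{-L}^{L} e^{i x xi} e^{-alpha xi^2} d xi by the composite Simpson rule."""
    h = 2 * L / n
    total = 0j
    for k in range(n + 1):
        xi = -L + k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * cmath.exp(1j * x * xi - alpha * xi * xi)
    return total * h / 3 / math.sqrt(2 * math.pi)


def inverse_ft_formula(alpha: complex, x: float) -> complex:
    """(2 alpha)^{-1/2} e^{-x^2/(4 alpha)} using the principal square root."""
    return cmath.exp(-x * x / (4 * alpha)) / cmath.sqrt(2 * alpha)


for alpha in (1.0 + 0j, 0.5 + 2j, 0.2 + 1j):
    print(alpha, inverse_ft_numeric(alpha, 1.3), inverse_ft_formula(alpha, 1.3))
```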
We thus have
$$\left(e^{-\alpha H_0}\psi\right)(x) = \left(\frac{1}{4\pi\alpha}\right)^{3/2}\int_{\mathbb R^3}e^{-|x-y|^2/4\alpha}\,\psi(y)\,dy.$$
Here the square root in the coefficient in front of the integral is obtained by continuation from the positive square root on the positive axis. For example, if we take $\alpha = \epsilon + it$, so that $\alpha = i(t - i\epsilon)$, we get
$$(U(t)\psi)(x) = \lim_{\epsilon\searrow0}(U(t - i\epsilon)\psi)(x) = \lim_{\epsilon\searrow0}\left(4\pi i(t - i\epsilon)\right)^{-3/2}\int e^{-|x-y|^2/4i(t-i\epsilon)}\,\psi(y)\,dy.$$
$\cdots$ and the integral converges. For general elements of $L^2$ the operator $U(t)$ is obtained by taking the $L^2$ limit of the above expression for any sequence of elements of $L^1 \cap L^2$ which approximate $\psi$ in $L^2$. Alternatively, we could interpret the above integral as the $\epsilon \searrow 0$ limit of the corresponding expression with $t$ replaced by $t - i\epsilon$.
Chapter 12
12.1
It is a truism in atomic physics or quantum chemistry courses that the eigenstates of the Schrödinger operator for atomic electrons are the bound states, the ones that remain bound to the nucleus, and that the scattering states which fly off in large positive or negative times correspond to the continuous spectrum. The purpose of this section is to give a mathematical justification for this truism. The key result is due to Ruelle (1969), using ergodic theory methods. The more streamlined version presented here comes from the two papers mentioned above. The ergodic theory used is limited to the mean ergodic theorem of von Neumann, which has a very slick proof due to F. Riesz (1939) which I shall give.
12.1.1
Schwarzschild's theorem.
negative time but which remain in a finite region for all future time constitute a set of measure zero. Schwarzschild derived his theorem from the Poincaré recurrence theorem. I learned these two theorems from a course on celestial mechanics by Carl Ludwig Siegel that I attended in 1953. These theorems appear in the last few pages of Siegel's famous Vorlesungen über Himmelsmechanik, which developed out of this course. For proofs I refer to the treatment given there.
The Poincaré recurrence theorem.
This says the following:
Theorem 12.1.1 [The Poincaré recurrence theorem.] Let $S_t$ be a measure preserving flow on a measure space $(M, \mu)$. Let $A$ be a subset of $M$ contained in an invariant set of finite measure. Then outside of a subset of $A$ of measure zero, every point $p$ of $A$ has the property that $S_tp \in A$ for infinitely many times in the future.
The idea is quite simple. If this were not the case, there would be an infinite
sequence of disjoint sets of positive measure all contained in a fixed set of finite
measure.
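A discrete time analogue is easy to experiment with. The Python sketch below is an illustration, not the setting of the theorem: it replaces the flow $S_t$ by the rotation $x \mapsto x + \alpha \pmod 1$ of $[0, 1)$, which preserves Lebesgue measure, takes $A = [0, 0.1)$, and records the times at which a point of $A$ comes back to $A$. The choice $\alpha = (\sqrt5 - 1)/2$ is arbitrary; any irrational rotation number would do.

```python
import math


def visit_times(x0: float, alpha: float, lo: float, hi: float, steps: int) -> list:
    """Times k = 1..steps at which the rotation x -> x + alpha (mod 1) brings x0 back into [lo, hi)."""
    times, x = [], x0
    for k in range(1, steps + 1):
        x = (x + alpha) % 1.0
        if lo <= x < hi:
            times.append(k)
    return times


alpha = (math.sqrt(5) - 1) / 2   # irrational rotation number; Lebesgue measure is invariant
hits = visit_times(0.05, alpha, 0.0, 0.1, 10000)
print(len(hits), hits[:5], hits[-3:])
```

The point keeps returning throughout the run, and the fraction of time spent in $A$ is close to its measure $0.1$.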
Schwarzschild's theorem.
Consider the same set up as in the Poincaré recurrence theorem, but now let $M$ be a metric space with a regular measure, and assume that the $S_t$ are homeomorphisms. Let $A$ be an open set of finite measure, and let $B$ consist of all $p$ such that $S_tp \in A$ for all $t \ge 0$. Let $C$ consist of all $p$ such that $S_tp \in A$ for all $t \in \mathbb R$. Clearly $C \subset B$.
Theorem 12.1.2 [Schwarzschild's theorem.] The measure of $B \setminus C$ is zero.
Again, the proof is straightforward and I refer to Siegel for details. Phrased in more intuitive language, Schwarzschild's theorem says that outside of a set of measure zero, any point $p \in A$ which has the property that $S_tp \in A$ for all $t \ge 0$ also has the property that $S_tp \in A$ for all $t < 0$. The capture orbits have measure zero.
Of course, the catch in the theorem is that one needs to prove that $B$ has positive measure for the theorem to have any content.
Siegel calls the set $B$ the set of points which are weakly stable for the future (with respect to $A$) and $C$ the set of points which are weakly stable for all time.
Application to the solar system?
Suppose one has a mechanical system with kinetic energy $T$ and potential energy $V$.
If the potential energy is bounded from below, then we can find a bound for the kinetic energy in terms of the total energy:
$$T \le aH + b,$$
which is the classical analogue of the key operator bound in the quantum version, see equation (12.7) below. A region of the form
$$\|x\| \le R, \qquad H(x, p) \le E$$
then forms a bounded region of finite measure in phase space. Liouville's theorem asserts that the flow on phase space is volume preserving. So Schwarzschild's theorem applies to this region.
I quote Siegel as to the possible application to the solar system:
Under the unproved assumption that the planetary system is weakly
stable with respect to all time, we can draw the following conclusion:
If the planetary system captures a particle coming in from infinity,
say some external matter, then the new system with this additional
particle is no longer weakly stable with respect to just future time,
and it follows that the particle - or a planet or the sun- must again
be expelled, or a collision must take place. For an interpretation of
the significance of this result, one must, however, consider that we
do not even know whether for n > 2 the solutions of the n-body
problem that are weakly stable with respect to all time form a set
of positive measure.
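12.1.2
$$\cdots\ \frac1T\int_0^TV_tf\,dt$$
$$\cdots\ \frac{d}{dt}V_tg$$
so
$$\frac1T\int_0^TV_th\,dt = \frac1T(V_Tg - g) \to 0.$$
By hypothesis, for any $f \in \mathcal H$ we can, for any $\epsilon > 0$, find an $h$ of the above form such that $\|f - h\| < \frac12\epsilon$, so
$$\left\|\frac1T\int_0^TV_tf\,dt\right\| \le \frac12\epsilon + \left\|\frac1T\int_0^TV_th\,dt\right\|.$$
By then choosing $T$ sufficiently large we can make the second term less than $\frac12\epsilon$.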
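In finite dimensions the mean ergodic theorem can be seen explicitly. The Python sketch below assumes $V_t = \mathrm{diag}(e^{it\lambda_1}, \ldots, e^{it\lambda_d})$ with all $\lambda_j \ne 0$, so that $V_t$ has no non-zero fixed vector. Each component of $\frac1T\int_0^TV_tf\,dt$ is then $f_j\frac{e^{i\lambda_jT} - 1}{i\lambda_jT}$, which is $O(1/T)$; the sketch compares this closed form with a Simpson approximation. The particular $\lambda_j$ and $f$ are arbitrary.

```python
import cmath


def time_average_numeric(lams, coeffs, T, n=20000):
    """(1/T) integral_0^T V_t f dt for V_t = diag(e^{i t lam_j}) and f = coeffs, by the Simpson rule."""
    h = T / n
    out = []
    for lam, c in zip(lams, coeffs):
        total = 0j
        for k in range(n + 1):
            weight = 1 if k in (0, n) else (4 if k % 2 else 2)
            total += weight * cmath.exp(1j * lam * k * h)
        out.append(c * total * h / 3 / T)
    return out


def time_average_exact(lams, coeffs, T):
    """Closed form c (e^{i lam T} - 1)/(i lam T); it is O(1/T) because no lam is zero."""
    return [c * (cmath.exp(1j * lam * T) - 1) / (1j * lam * T) for lam, c in zip(lams, coeffs)]


def norm(v):
    return sum(abs(z) ** 2 for z in v) ** 0.5


lams = [0.5, -1.3, 2.0]   # no zero eigenvalue, i.e. V_t has no non-zero fixed vector
f = [1.0, 2.0, -1.0]
for T in (10.0, 100.0, 1000.0):
    print(T, norm(time_average_exact(lams, f, T)))
```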
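12.1.3
General considerations.
$$M_0 := \left\{f \in \mathcal H \;\middle|\; \lim_{r\to\infty}\sup_t\|(I - F_r)V_tf\|^2 = 0\right\} \qquad (12.1)$$
and
$$M_\infty := \left\{f \in \mathcal H \;\middle|\; \lim_{T\to\infty}\frac1T\int_0^T\|F_rV_tf\|^2\,dt = 0 \text{ for all } r = 1, 2, \ldots\right\}. \qquad (12.2)$$
$$\cdots \qquad (12.3)$$
$$\cdots \le \frac{2|a|^2}{T}\int_0^T\|F_rV_tf_1\|^2\,dt + \frac{2|b|^2}{T}\int_0^T\|F_rV_tf_2\|^2\,dt.$$
Each term on the right converges to 0 as $T \to \infty$, proving that $af_1 + bf_2 \in M_\infty$. This proves 1).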
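Proof of 2. Let $f_n \in M_0$ and suppose that $f_n \to f$. Given $\epsilon > 0$ choose $N$ so that $\|f_n - f\|^2 < \frac14\epsilon$ for all $n > N$. This implies that
$$\|(I - F_r)V_t(f - f_n)\|^2 < \frac14\epsilon$$
for all $t$ and $r$, so
$$\sup_t\|(I - F_r)V_tf\|^2 \le \frac12\epsilon + 2\sup_t\|(I - F_r)V_tf_n\|^2$$
for all $n > N$ and any fixed $r$. We may choose $r$ sufficiently large so that the second term on the right is also less than $\frac12\epsilon$. This proves that $f \in M_0$.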
Let $f_n \in M_\infty$ and suppose that $f_n \to f$. Given $\epsilon > 0$ choose $N$ so that $\|f_n - f\|^2 < \frac14\epsilon$ for all $n > N$. Then
$$\begin{aligned}\frac1T\int_0^T\|F_rV_tf\|^2\,dt &\le \frac2T\int_0^T\|F_rV_t(f - f_n)\|^2\,dt + \frac2T\int_0^T\|F_rV_tf_n\|^2\,dt\\ &\le \frac12\epsilon + \frac2T\int_0^T\|F_rV_tf_n\|^2\,dt.\end{aligned}$$
Fix $n$. For any given $r$ we can choose $T_0$ large enough so that the second term on the right is $< \frac12\epsilon$. This shows that for any fixed $r$ we can find a $T_0$ so that
$$\frac1T\int_0^T\|F_rV_tf\|^2\,dt < \epsilon$$
for all $T > T_0$, proving that $f \in M_\infty$. This proves 2).
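Proof of 3. Let $f \in M_0$ and $g \in M_\infty$, both $\ne 0$. Then, writing $F_r' := I - F_r$,
$$\begin{aligned}|(f, g)|^2 &= \frac1T\int_0^T|(f, g)|^2\,dt = \frac1T\int_0^T|(V_tf, V_tg)|^2\,dt\\ &= \frac1T\int_0^T|(F_r'V_tf, V_tg) + (V_tf, F_rV_tg)|^2\,dt\\ &\le \frac2T\int_0^T|(F_r'V_tf, V_tg)|^2\,dt + \frac2T\int_0^T|(V_tf, F_rV_tg)|^2\,dt\\ &\le \frac{2\|g\|^2}{T}\int_0^T\|F_r'V_tf\|^2\,dt + \frac{2\|f\|^2}{T}\int_0^T\|F_rV_tg\|^2\,dt,\end{aligned}$$
where we used the Cauchy-Schwarz inequality in the last step.
For any $\epsilon > 0$ we may choose $r$ so that
$$\|F_r'V_tf\|^2 < \frac{\epsilon}{4\|g\|^2}$$
for all $t$. We can then choose a $T$ such that
$$\frac1T\int_0^T\|F_rV_tg\|^2\,dt < \frac{\epsilon}{4\|f\|^2}.$$
Then $|(f, g)|^2 < \epsilon$, and since $\epsilon$ is arbitrary, $(f, g) = 0$.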
Proof of 5. By 3) we have $M_\infty \subset M_0^\perp$. By 4) we have $M_0^\perp \subset H_p^\perp = H_c$.
Proposition 12.1.1 is valid without any assumptions whatsoever relating $H$ to the $F_r$. The only place where we used $H$ was in the proof of 4), where we used the fact that if $f$ is an eigenvector of $H$ then it is also an eigenvector of $V_t$, and so we could pull out a scalar.
The goal is to impose sufficient relations between $H$ and the $F_r$ so that
$$H_c \subset M_\infty. \qquad (12.4)$$
If (12.4) holds, then 5) gives
$$M_0^\perp = M_\infty = H_c = H_p^\perp.$$
12.1.4
Recall that the mean ergodic theorem says that if $U_t$ is a unitary one parameter group acting without (non-zero) fixed vectors on a Hilbert space $G$ then
$$\lim_{T\to\infty}\frac1T\int_0^TU_t\psi\,dt = 0$$
for all $\psi \in G$. Let
c.
G = Hc H
We know from our discussion of the spectral theorem (Proposition 10.12.1) that $H \otimes I - I \otimes H$ does not have zero as an eigenvalue acting on $G$. We may
apply the mean ergodic theorem to conclude that
Z
1 T itH
lim
e
f eitH edt = 0
T T 0
$$\lim_{T\to\infty}\frac1T\int_0^T|(e, V_tf)|^2\,dt = 0 \qquad \forall e \in \mathcal H,\ f \in H_c. \qquad (12.5)$$
12.1.5
$\cdots$
Any compact operator in a separable Hilbert space is the norm limit of finite rank operators. So we can find a finite rank operator $T_N$ such that
$$\|F_rS_nE_c - T_N\|^2 < \frac{\epsilon}{12\|f\|^2}.$$
kFr Vt gk2 dt
2
T
kFr Vt (S Sn )f k2 dt +
4
+
3 T
2
4
+
3
T
kFr Vt Sn f k2 dt
2
T
kFr Sn Ec TN k2 kVt f k2 dt +
4
T
kTN Vt k2 dt
kTN Vt k2 dt.
Z
0
4
T
RT
0
4
kTN Vt k dt =
T
2
2N 1 4
kgi k2
1
T
N
2
X
(Vt f, hi )gi
dt
i=0
By (12.5) we can choose $T_0$ so large that this expression is $< \frac13\epsilon$ for all $T > T_0$.
Of course a special case of the theorem will be where all the $S_n = S$, as will be the case for Ruelle's theorem for Kato potentials.
12.1.6
Kato potentials.
(12.6)
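$$\hat\psi \mapsto \|\xi\|^2\hat\psi(\xi)$$
where $\|\xi\|$ denotes the Euclidean norm of $\xi$. Since $\psi$ belongs to the Schwartz space $\mathcal S$, the function
$$\xi \mapsto (1 + \|\xi\|^2)\hat\psi(\xi)$$
belongs to $L^2$, as does the function
$$\xi \mapsto (1 + \|\xi\|^2)^{-1}$$
in three dimensions. Let $\lambda$ denote the function
$$\xi \mapsto \|\xi\|.$$
By the Cauchy-Schwarz inequality we have
$$\|\hat\psi\|_1 = \left|\left((1 + \lambda^2)^{-1}, (1 + \lambda^2)|\hat\psi|\right)\right| \le c\|(\lambda^2 + 1)\hat\psi\|_2 \le c\|\lambda^2\hat\psi\|_2 + c\|\hat\psi\|_2$$
where
$$c^2 = \|(1 + \lambda^2)^{-1}\|_2^2.$$
For any $r > 0$ and any function $\phi \in \mathcal S$ let $\phi_r$ be defined by
$$\hat\phi_r(\xi) = r^3\hat\phi(r\xi).$$
Then
$$\|\hat\phi_r\|_1 = \|\hat\phi\|_1, \qquad \|\hat\phi_r\|_2 = r^{3/2}\|\hat\phi\|_2, \qquad\text{and}\qquad \|\lambda^2\hat\phi_r\|_2 = r^{-1/2}\|\lambda^2\hat\phi\|_2.$$
Applied to $\psi_r$ this gives
$$\|\hat\psi\|_1 \le cr^{-1/2}\|\lambda^2\hat\psi\|_2 + cr^{3/2}\|\hat\psi\|_2.$$
By Plancherel
$$\|\lambda^2\hat\psi\|_2 = \|\Delta\psi\|_2 \qquad\text{and}\qquad \|\hat\psi\|_2 = \|\psi\|_2.$$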
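The estimate can be checked for a concrete $\psi$. The Python sketch below is illustrative: it takes $\hat\psi(\xi) = e^{-|\xi|^2/2}$ (an arbitrary Schwartz function), computes the three norms by radial quadrature truncated at $|\xi| = 20$, and evaluates $c$. The substitution $\rho = \tan\theta$ shows that $c^2 = 4\pi\int_0^\infty\frac{\rho^2}{(1+\rho^2)^2}\,d\rho = 4\pi\int_0^{\pi/2}\sin^2\theta\,d\theta = \pi^2$, so $c = \pi$.

```python
import math


def simpson(g, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def radial_integral(g, upper=20.0):
    """Integral over R^3 of a radial function: 4 pi * integral_0^upper rho^2 g(rho) d rho."""
    return 4 * math.pi * simpson(lambda rho: rho * rho * g(rho), 0.0, upper)


def psi_hat(rho):
    """Illustrative choice psi-hat(xi) = exp(-|xi|^2/2), a Schwartz function."""
    return math.exp(-rho * rho / 2)


l1 = radial_integral(psi_hat)                                                  # ||psi-hat||_1
l2 = math.sqrt(radial_integral(lambda rho: psi_hat(rho) ** 2))                 # ||psi||_2
lap = math.sqrt(radial_integral(lambda rho: (rho ** 2 * psi_hat(rho)) ** 2))   # ||Delta psi||_2

# c^2 = ||(1 + lambda^2)^{-1}||_2^2; after rho = tan(theta) the radial integral is integral_0^{pi/2} sin^2.
c = math.sqrt(4 * math.pi * simpson(lambda th: math.sin(th) ** 2, 0.0, math.pi / 2))


def kato_bound(r):
    """Right side of ||psi-hat||_1 <= c r^{-1/2} ||Delta psi||_2 + c r^{3/2} ||psi||_2."""
    return c * r ** -0.5 * lap + c * r ** 1.5 * l2


for r in (0.01, 0.8, 100.0):
    print(r, l1, kato_bound(r))
```

Making $r$ large shrinks the coefficient of $\|\Delta\psi\|_2$ at the cost of a larger coefficient of $\|\psi\|_2$, which is how the Kato bound with an arbitrarily small first coefficient arises.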
$$\frac{1}{\|x\|} \qquad\text{and}\qquad \frac{1}{\|x_i - x_j\|}$$
are Kato potentials, as is any linear combination of them. So the total Coulomb potential of any system of charged particles is a Kato potential.
By example 12.1.6, the restriction of this potential to the subspace $\{x \mid \sum m_ix_i = 0\}$ is a Kato potential. This is the atomic potential about the center of mass.
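12.1.7
$$\|\Delta\psi\| \le a\|H\psi\| + b\|\psi\|, \qquad a = \frac{1}{1-\alpha} \quad\text{and}\quad b = \frac{\beta(\alpha)}{1-\alpha}, \qquad 0 < \alpha < 1. \qquad (12.7)$$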
If we choose $\alpha < 1$ and then $\mathrm{Re}\,z$ sufficiently negative, we can make the right hand side of this inequality $< 1$. For this range of $z$ we see that $R(z, H) = (zI - H)^{-1}$ is bounded, so the range of $zI - H$ is all of $L^2$. This proves that $H$ is self-adjoint and that its resolvent set contains a half plane $\mathrm{Re}\,z \ll 0$, and so $H$ is bounded from below. Also, for $\psi \in \mathrm{Dom}(\Delta)$ we have
$$-\Delta\psi = H\psi - V\psi,$$
so
$$\|\Delta\psi\| \le \|H\psi\| + \|V\psi\| \le \|H\psi\| + \alpha\|\Delta\psi\| + \beta\|\psi\|,$$
which proves (12.7).
12.1.8
$\cdots$ is $\hat\psi \mapsto g(\xi)\hat\psi(\xi)$.
The operator $f(x)g(p)$ is the norm limit of the operators $f_n(x)g_n(p)$, where $f_n$ is obtained from $f$ by setting $f_n = \mathbf 1_{B_n}f$, $B_n$ being the ball of radius $n$ about the origin, and similarly for $g$. The operator $f_n(x)g_n(p)$ is given by the square integrable kernel
$$K_n(x, y) = f_n(x)\,\check g_n(x - y)$$
and so is compact. Hence $f(x)g(p)$ is compact. We will take
$$g(p) = \frac{1}{1 + p^2} = (1 - \Delta)^{-1}.$$
$$\cdots\ \frac{1}{1 + p^2}(1 - \Delta)R(z, H)$$
12.1.9
Ruelle's theorem.
12.2
12.2.1
$$\cdots\ \frac{1}{|x|^\alpha + 1}.$$
By the functional calculus, $f(H)$ is well defined, and in any spectral representation goes over into multiplication by $f(h)$, which is injective. So $K = f(H)^{-1} - I$ is a well defined (in general unbounded) self-adjoint operator whose spectral representation is multiplication by $h^\alpha$. But the expression for $K$ is independent of the spectral representation. This shows that $H^\alpha = K$ is well defined.
Proposition 12.2.1 Let $H$ be a self-adjoint operator on a Hilbert space $\mathcal H$ and let $\mathrm{Dom}(H)$ be the domain of $H$. Let $0 < \alpha < 1$. Then $f \in \mathrm{Dom}(H)$ if and only if $f \in \mathrm{Dom}(H^\alpha)$ and $H^\alpha f \in \mathrm{Dom}(H^{1-\alpha})$, in which case
$$Hf = H^{1-\alpha}H^\alpha f.$$
$$B_H(f, g) := (H^{1/2}f, H^{1/2}g),$$
1
But H 2 = (H 2 ) so the second part of the proposition follows from the first.
12.2.2
Quadratic forms.
counterexample.
Let $\mathcal H = L^2(\mathbb R)$ and let $D$ consist of all continuous functions of compact support. Let
$$B(f, g) = f(0)\overline{g(0)}.$$
The only candidate for an operator $H$ which satisfies $B(f, g) = (Hf, g)$ is the operator which consists of multiplication by the delta function at the origin. But there is no such operator.
Consider a sequence of uniformly bounded continuous functions $f_n$ of compact support which are all identically one in some neighborhood of the origin and whose support shrinks to the origin. Then $f_n \to 0$ in the norm of $\mathcal H$. Also, $Q(f_n - f_m, f_n - f_m) \equiv 0$, so $Q(f_n - f_m, f_n - f_m) \to 0$. But $Q(f_n, f_n) \equiv 1 \ne 0 = Q(0, 0)$. So $D$ is not complete for the norm $\|\cdot\|_1$, where
$$\|f\|_1 := \left(Q(f) + \|f\|_{\mathcal H}^2\right)^{1/2}.$$
Consider a function $g \in D$ which equals one on the interval $[-1, 1]$, so that $Q(g, g) = 1$. Let $g_n := g - f_n$ with $f_n$ as above. Then $g_n \to g$ in $\mathcal H$ yet $Q(g_n) \equiv 0$. So $Q$ is not lower semi-continuous as a function on $D$.
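The counterexample is easy to make concrete. In the Python sketch below, `f_n` is one hypothetical choice of the functions described above: it equals 1 on $[-\frac1{2n}, \frac1{2n}]$, vanishes outside $[-\frac1n, \frac1n]$ and is linear in between. Then $\|f_n\|^2 = \frac{4}{3n} \to 0$, while $Q(f_n) = |f_n(0)|^2 = 1$ for every $n$; the $L^2$ norm is computed with a midpoint rule, an assumption of the sketch.

```python
def f_n(x: float, n: int) -> float:
    """Continuous, identically 1 on [-1/(2n), 1/(2n)], zero outside [-1/n, 1/n], linear in between."""
    ax = abs(x)
    if ax <= 1 / (2 * n):
        return 1.0
    if ax >= 1 / n:
        return 0.0
    return 2.0 - 2.0 * n * ax


def l2_norm_sq(n: int, points: int = 100000) -> float:
    """Midpoint-rule approximation of the integral of f_n(x)^2 over [-1, 1], which contains the support."""
    h = 2.0 / points
    return h * sum(f_n(-1.0 + (k + 0.5) * h, n) ** 2 for k in range(points))


def q_form(f) -> float:
    """The form Q(f) = B(f, f) = |f(0)|^2."""
    return abs(f(0.0)) ** 2


for n in (1, 10, 100):
    print(n, l2_norm_sq(n), 4 / (3 * n), q_form(lambda x: f_n(x, n)))
```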
We recall the definition of lower semi-continuity:
12.2.3
12.2.4
m, n > N.
$$Q(f, g) + (f, g) = (f, g)_1 = \int_S f\bar g\,d\nu.$$
$$\cdots\ \frac{1}{1 + h}.$$
Then the two previous equations imply that $\mathcal H$ is unitarily equivalent to $L^2(S, \mu)$, i.e.
$$(f, g) = \int_S f\bar g\,d\mu$$
and
$$Q(f, g) = \int_S hf\bar g\,d\mu.$$
This last equation says that $Q$ is the quadratic form associated to the operator $H$ corresponding to multiplication by $h$.
12.2.5
$f \in D$.
$$\mathrm{Dom}(H_1^{1/2}) = \mathrm{Dom}(H_2^{1/2}) = D.$$
Proof. The assumption on the relation between the forms implies that their associated metrics on $D$ are equivalent. So if $D$ is complete with respect to one metric it is complete with respect to the other, and the domains of the square roots of the associated self-adjoint operators both coincide with $D$.
12.2.6
and $\|f_n\| \to 0$.
So
kf k21
=
=
=
m n
m n
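12.3
In this section $\Omega$ will denote a bounded open set in $\mathbb R^N$ with piecewise smooth boundary, $c > 1$ is a constant, and $b$ is a continuous function defined on the closure $\bar\Omega$ of $\Omega$ satisfying
$$c^{-1} < b(x) < c \qquad \forall x \in \bar\Omega,$$
and
$$a = (a_{ij}) = (a_{ij}(x))$$
is a real symmetric matrix valued function of $x$ defined and continuously differentiable on $\bar\Omega$ and satisfying
$$c^{-1}I \le a(x) \le cI \qquad \forall x \in \bar\Omega.$$
Let
$$\mathcal H_b := L^2(\Omega, b\,d^Nx).$$
We let $C^\infty(\bar\Omega)$ denote the space of all functions $f$ which are $C^\infty$ on $\Omega$ and all of whose partial derivatives can be extended to be continuous functions on $\bar\Omega$. We let
$$C_0^\infty(\bar\Omega) \subset C^\infty(\bar\Omega)$$
denote those $f$ satisfying $f(x) = 0$ for $x \in \partial\Omega$.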
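For $f \in C_0^\infty(\bar\Omega)$ we define $Af$ by
$$Af(x) := -b(x)^{-1}\sum_{i,j=1}^N \frac{\partial}{\partial x_i}\left(a_{ij}(x)\frac{\partial f}{\partial x_j}\right).$$
For $f, g \in C_0^\infty(\bar\Omega)$ we integrate by parts. The boundary term vanishes because $g = 0$ on $\partial\Omega$, and we use $a_{ij} = a_{ji}$. This gives
$$(Af, g)_b = -\int_\Omega \sum_{i,j=1}^N \frac{\partial}{\partial x_i}\left(a_{ij}(x)\frac{\partial f}{\partial x_j}\right)\bar g\,d^Nx = \int_\Omega \sum_{ij} a_{ij}\frac{\partial f}{\partial x_i}\frac{\partial\bar g}{\partial x_j}\,d^Nx = (f, Ag)_b. \tag{12.8}$$
Define $Q(f, g)$ to be this common value. Then $Q$ is symmetric, and it is non-negative since $a(x) \ge c^{-1}I$. So it defines a quadratic form associated to the non-negative symmetric operator $A$. We may apply the Friedrichs theorem to conclude the existence of a self-adjoint extension $H$ of $A$ which is associated to the closure of $Q$.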
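A discretized version makes the symmetry in (12.8) concrete. The sketch below is built on several assumptions of ours, none of them from the text:

- $N = 1$;
- the interval $(0, 1)$;
- a standard finite-difference scheme;
- illustrative choices of $a$ and $b$.

Write $D$ for the difference matrix and $B = \operatorname{diag}(b)$. Then the matrix $A = B^{-1}D^T\operatorname{diag}(a)D$ is not symmetric as a matrix. It is, however, symmetric for the weighted scalar product $(u, v)_b = h\,u^TBv$, and $(Au, u)_b = h\,(Du)^T\operatorname{diag}(a)(Du) > 0$ for $u \neq 0$.

```python
import numpy as np

def dirichlet_form_matrices(a_mid, b, h):
    """1-D finite-difference sketch of A f = -b^{-1} (a f')' with f = 0 at both ends.

    a_mid: values of a at the n+1 cell midpoints; b: values of b at the n interior nodes.
    Returns (A, M, B) with M = D^T diag(a) D (discrete -(a f')') and A = B^{-1} M.
    """
    n = len(b)
    D = np.zeros((n + 1, n))  # (D u)_k = (u_{k+1} - u_k) / h, boundary values taken as 0
    for k in range(n + 1):
        if k < n:
            D[k, k] = 1.0 / h
        if k >= 1:
            D[k, k - 1] = -1.0 / h
    M = D.T @ np.diag(a_mid) @ D       # symmetric positive definite matrix
    B = np.diag(b)
    A = np.linalg.solve(B, M)          # A itself is not a symmetric matrix
    return A, M, B

def inner_b(u, v, B, h):
    """Discrete analogue of the weighted scalar product (u, v)_b for real vectors."""
    return h * (u @ B @ v)

n = 40
h = 1.0 / (n + 1)
x_mid = (np.arange(n + 1) + 0.5) * h          # cell midpoints
x_int = np.arange(1, n + 1) * h               # interior nodes
A, M, B = dirichlet_form_matrices(1.5 + 0.5 * np.sin(3 * x_mid),
                                  1.0 + 0.5 * np.cos(2 * x_int), h)
rng = np.random.default_rng(0)
u, v = rng.standard_normal(n), rng.standard_normal(n)
print(inner_b(A @ u, v, B, h), inner_b(u, A @ v, B, h))  # (Au, v)_b = (u, Av)_b up to rounding
print(inner_b(A @ u, u, B, h) > 0)                        # Q(u, u) = (Au, u)_b > 0
```

The choice $a \in [1, 2]$, $b \in [0.5, 1.5]$ satisfies the bounds $c^{-1} < b < c$ and $c^{-1} \le a \le c$ with $c = 2$.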
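The closure of $Q$ is complete relative to the metric determined by Theorem 12.2.1. Our assumptions about $b$ and $a$ guarantee that the metrics of the quadratic forms coming from different choices of $b$ and $a$ are equivalent. All of them are equivalent to the metric coming from the choice $b \equiv 1$ and $a \equiv (\delta_{ij})$, which is
$$\|f\|_1^2 = \int_\Omega \left(|\nabla f|^2 + |f|^2\right)d^Nx, \tag{12.9}$$
where
$$\nabla f = (\partial_1 f, \partial_2 f, \ldots, \partial_N f)$$
and
$$|\nabla f|^2 = \sum_{i=1}^N |\partial_i f|^2.$$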
12.3.1
We can then define the partial derivatives of $f$ in the sense of the theory of distributions, for example
$$\ell_{\partial_i f}(\phi) = -\int_\Omega f\,(\partial_i\phi)\,d^Nx.$$
These partial derivatives may or may not come from elements of $L^2(\Omega, d^Nx)$. We define the space $W^{1,2}(\Omega)$ to consist of those $f \in L^2(\Omega, d^Nx)$ whose first partial derivatives $\partial_i f = \partial f/\partial x_i$ (in the distributional sense) all come from elements of $L^2(\Omega, d^Nx)$. We define a scalar product $(\cdot,\cdot)_1$ on $W^{1,2}(\Omega)$ by
$$(f, g)_1 := \int_\Omega \left\{f(x)\overline{g(x)} + \nabla f(x)\cdot\overline{\nabla g(x)}\right\}d^Nx. \tag{12.10}$$
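and $\partial_i f_n \to g_i$, $i = 1, \ldots, N$. Then for every test function $\phi$
$$(g_i, \phi) = \lim_{n\to\infty}(\partial_i f_n, \phi) = -\lim_{n\to\infty}(f_n, \partial_i\phi) = -(f, \partial_i\phi),$$
which says that $g_i = \partial_i f$.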
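We define $W_0^{1,2}(\Omega)$ to be the closure in $W^{1,2}(\Omega)$ of the subspace $C_c^\infty(\Omega)$. We have $C_c^\infty(\Omega) \subset C_0^\infty(\bar\Omega)$. Let $\bar Q$ be the closure of the form $Q$ defined by (12.8) on $C_0^\infty(\bar\Omega)$. It follows that the domain of $\bar Q$ contains $W_0^{1,2}(\Omega)$.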
We claim:
Lemma 12.3.1 $C_c^\infty(\Omega)$ is dense in $C_0^\infty(\bar\Omega)$ relative to the metric $\|\cdot\|_1$ given by (12.9).
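Proof. By taking real and imaginary parts, it is enough to prove this lemma for real valued functions. For any $\epsilon > 0$ let $F_\epsilon$ be a smooth real valued function on $\mathbb R$ such that
$$F_\epsilon(x) = x \quad |x| > 2\epsilon,$$
$$F_\epsilon(x) = 0 \quad |x| < \epsilon,$$
$$0 \le F_\epsilon'(x) \le 3 \quad \forall x \in \mathbb R.$$
For $f \in C_0^\infty(\bar\Omega)$ define
$$f_\epsilon(x) := F_\epsilon(f(x)),$$
so $f_\epsilon \in C_c^\infty(\Omega)$.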
Also,
0
\B
12.3.2
12.3.3
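Rademacher's theorem says that a Lipschitz function on $\mathbb R^N$ is differentiable almost everywhere, with a bound on its derivative given by the Lipschitz constant.
The following is a variant of this theorem which is useful for our purposes.
Theorem 12.3.1 Let $f$ be a continuous real valued function on $\mathbb R^N$ which vanishes outside a bounded open set and which satisfies
$$|f(x) - f(y)| \le c\|x - y\| \quad \forall x, y \in \mathbb R^N. \tag{12.11}$$
Then $f \in W^{1,2}(\mathbb R^N)$ and
$$\int_{\mathbb R^N}\left(|\nabla f|^2 + |f|^2\right)d^Nx \le |K|c^2\left(N + \operatorname{diam}(K)^2\right),$$
where $K$ denotes the support of $f$.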
x
s
So
$$k_s(x) = 0 \text{ if } \|x\| \ge s, \qquad k_s(x) > 0 \text{ if } \|x\| < s, \qquad \int_{\mathbb R^N} k_s(x)\,d^Nx = 1.$$
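Define $p_s$ by
$$p_s(x) := \int_{\mathbb R^N} k_s(x - z)f(z)\,d^Nz,$$
so $p_s$ is smooth,
$$\operatorname{supp} p_s \subset K_s = \{x \mid d(x, K) \le s\}$$
and
$$p_s(x) - p_s(y) = \int_{\mathbb R^N}\left(f(x - z) - f(y - z)\right)k_s(z)\,d^Nz,$$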
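so
$$|p_s(x) - p_s(y)| \le c\|x - y\|.$$
This implies that $\|\nabla p_s(x)\| \le c$. The mean value theorem then gives $\sup_{x\in\mathbb R^N}|p_s(x)| \le c\operatorname{diam} K_s$, and so
$$\|p_s\|_1^2 \le |K_s|c^2\left(\operatorname{diam}(K_s)^2 + N\right).$$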
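By Plancherel
$$\|p_s\|_1^2 = \int_{\mathbb R^N}(1 + \|\xi\|^2)|\hat p_s(\xi)|^2\,d^N\xi = \int_{\mathbb R^N}(1 + \|\xi\|^2)|h(s\xi)|^2|\hat f(\xi)|^2\,d^N\xi,$$
where
$$h(\xi) = \int_{\mathbb R^N} k(x)e^{-ix\cdot\xi}\,d^Nx.$$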
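The function $h$ is smooth with $h(0) = 1$ and $|h(\xi)| \le 1$ for all $\xi$. By Fatou's lemma
$$\|f\|_1^2 = \int_{\mathbb R^N}(1 + \|\xi\|^2)|\hat f(\xi)|^2\,d^N\xi \le \liminf_{s\to0}\int_{\mathbb R^N}(1 + \|\xi\|^2)|h(s\xi)|^2|\hat f(\xi)|^2\,d^N\xi$$
$$\le \liminf_{s\to0}|K_s|c^2\left(N + \operatorname{diam}(K_s)^2\right) = |K|c^2\left(N + \operatorname{diam}(K)^2\right).$$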
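$$\psi_\epsilon(s) = \begin{cases} 0 & \text{if } |s| \le \epsilon,\\ s & \text{if } |s| \ge 2\epsilon,\\ 2(s - \epsilon) & \text{if } \epsilon \le s \le 2\epsilon,\\ 2(s + \epsilon) & \text{if } -2\epsilon \le s \le -\epsilon.\end{cases}$$
Then set $f_\epsilon = \psi_\epsilon(f)$. Let $O$ be the open set where $f(x) \neq 0$. Then the support of $f_\epsilon$ is contained in the set $S_\epsilon$ of all points whose distance from the complement of $O$ is $\ge \epsilon/c$. Also
$$|f_\epsilon(x) - f_\epsilon(y)| \le 2|f(x) - f(y)| \le 2c\|x - y\|.$$
So we may apply the preceding result to $f_\epsilon$ to conclude that $f_\epsilon \in W^{1,2}(\mathbb R^N)$ and
$$\|f_\epsilon\|_1^2 \le 4|S_\epsilon|c^2\left(N + \operatorname{diam}(O)^2\right).$$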
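The properties of $\psi_\epsilon$ used above are easy to check on a grid. The sketch confirms three of them: $\psi_\epsilon$ has Lipschitz constant $2$, it vanishes on $|s| \le \epsilon$, and it moves no point by more than $\epsilon$. The last property means $f_\epsilon \to f$ uniformly. The grid and the value $\epsilon = 0.1$ are illustrative choices.

```python
import numpy as np

def psi(s, eps):
    """psi_eps: 0 on |s| <= eps, s on |s| >= 2 eps,
    2(s - eps) on [eps, 2 eps] and 2(s + eps) on [-2 eps, -eps]."""
    s = np.asarray(s, dtype=float)
    ramps = 2.0 * (s - np.sign(s) * eps)            # the two linear pieces
    return np.where(np.abs(s) <= eps, 0.0, np.where(np.abs(s) >= 2.0 * eps, s, ramps))

eps = 0.1
s = np.linspace(-1.0, 1.0, 20001)
ds = s[1] - s[0]
slopes = np.abs(np.diff(psi(s, eps))) / ds
print(slopes.max())                        # at most 2, hence |f_eps(x)-f_eps(y)| <= 2c||x-y||
print(np.max(np.abs(psi(s, eps) - s)))     # at most eps
```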
12.4
12.4.1
12.4.2
$$\bigcup_{n\in\mathbb N}(\lambda - \epsilon, \lambda + \epsilon)\times\{n\}$$
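then since
$$L^2(E_{(\lambda-\epsilon,\lambda+\epsilon)}) = \bigoplus_n L^2\left((\lambda - \epsilon, \lambda + \epsilon)\times\{n\}, \mu\right)$$
we conclude that all but finitely many of the summands on the right are zero. This implies that for all but finitely many $n$ we have
$$\mu\left((\lambda - \epsilon, \lambda + \epsilon)\times\{n\}\right) = 0.$$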
For each of the finitely many non-zero summands, we can apply the case $N = 1$ of the following lemma:
Lemma 12.4.1 Let $\mu$ be a measure on $\mathbb R^N$ such that $L^2(\mathbb R^N, \mu)$ is finite dimensional. Then $\mu$ is supported on a finite set, in the following sense. There is some finite set of $m$ distinct points $x_1, \ldots, x_m$, each of positive measure, such that the complement of the union of these points has measure zero.
Proof. For each integer $r$, partition $\mathbb R^N$ into half-open cubes whose vertices have all coordinates of the form $t/2^r$ for integers $t$, so that $\mathbb R^N$ is their disjoint union. The corresponding decomposition of the $L^2$ spaces shows that only finitely many of these cubes have positive measure. As we increase $r$, the cubes with positive measure are nested downward, and their number can not increase beyond $n = \dim L^2(\mathbb R^N, \mu)$. Hence they shrink to at most $n$ distinct points, each of positive measure, and the complement of their union has measure zero.
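In one dimension the proof of Lemma 12.4.1 can be watched directly. The sketch assumes a measure with positive point masses at three hypothetical points, so $\dim L^2(\mathbb R, \mu) = 3$. It counts the dyadic intervals $[t/2^r, (t+1)/2^r)$ of positive measure as $r$ grows. The count never decreases, because the cells are nested, and it never exceeds 3.

```python
import math

def occupied_dyadic_cells(atoms, r):
    """Number of intervals [t/2^r, (t+1)/2^r) carrying positive measure
    for a measure with positive point masses at the given atoms."""
    return len({math.floor(x * 2 ** r) for x in atoms})

atoms = [0.1, 0.35, 0.36]      # hypothetical support points; dim L^2(mu) = 3
counts = [occupied_dyadic_cells(atoms, r) for r in range(13)]
print(counts)                  # [1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3]
```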
We conclude from this lemma that there are at most finitely many points $(s_r, k)$ with $s_r \in (\lambda - \epsilon, \lambda + \epsilon)$ which have positive measure in the spectral representation of $H$. Each of them gives rise to an eigenvector of $H$ with eigenvalue $s_r$, and the complement of these points has measure zero. This shows that $\lambda \in \sigma_d(H)$.
We have proved
Proposition 12.4.1 $\lambda \in \sigma(H)$ belongs to $\sigma_d(H)$ if and only if there is some $\epsilon > 0$ such that the range of the spectral projection $P((\lambda - \epsilon, \lambda + \epsilon))$ is finite dimensional.
12.4.3
12.4.4
Proof.
If the essential spectrum is empty, then the spectrum consists of
eigenvalues of finite multiplicity which have no finite accumulation point, and
so must converge in absolute value to $\infty$. Enumerate the eigenvalues according
to increasing absolute value. Each has finite multiplicity and so we can find
an orthonormal basis of the finite dimensional eigenspace corresponding to each
eigenvalue. The eigenvectors corresponding to distinct eigenvalues are orthogonal. So what we must show is that the space spanned by all these eigenvectors is
dense. Suppose not. The space L orthogonal to all the eigenvectors is invariant
under H. If this space is non-zero, the spectrum of H restricted to this subspace
is not empty, and is a subset of the spectrum of H. So there will be eigenvectors
in L, contrary to the definition of L.
Conversely, suppose that the conditions hold. Let $\{f_n\}$ be the complete set of eigenvectors, with $Hf_n = \lambda_n f_n$. Since the eigenvalues are isolated, we will be done if we show that they constitute the entire spectrum of $H$. Suppose that $z$ does not coincide with any of the $\lambda_n$. We must show that the operator $zI - H$ has a bounded inverse on the domain of $H$. This domain consists of all $f = \sum_n a_n f_n$ such that $\sum_n |a_n|^2 < \infty$ and $\sum_n \lambda_n^2|a_n|^2 < \infty$. But for these $f$
$$\|(zI - H)f\|^2 = \sum_n |z - \lambda_n|^2|a_n|^2 \ge c^2\|f\|^2,$$
where $c := \inf_n |z - \lambda_n| > 0$ because the $\lambda_n$ have no finite accumulation point.
n
1
1
(f, fj )fj )k
kf k.
1 + j
1 + n
j=n+1
360
This shows that n(I + H)1 can be approximated in operator norm by finite
rank operators. So we need only apply the following characterization of compact
operators which we should have stated and proved last semester:
12.4.5
12.4.6
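$$\lambda_n(H) := \inf\{\lambda(L) \mid L \subset \mathrm{Dom}(H),\ \dim L = n\}, \qquad \lambda(L) := \sup\{(Hf, f) \mid f \in L,\ \|f\| = 1\}. \tag{12.12}$$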
The $\lambda_n$ are an increasing family of numbers. We shall show that they constitute
that part of the discrete spectrum of $H$ which lies below the essential spectrum:
Theorem 12.4.3 Let $H$ be a non-negative self-adjoint operator on a Hilbert space $\mathcal H$. Define the numbers $\lambda_n = \lambda_n(H)$ by (12.12). Then one of the following three alternatives holds:
1. $H$ has empty essential spectrum. In this case, either $\lambda_n \to \infty$ and the $\lambda_n$ coincide with the eigenvalues of $H$ repeated according to multiplicity and listed in increasing order, or $\mathcal H$ is finite dimensional and the $\lambda_n$ coincide with the eigenvalues of $H$ repeated according to multiplicity and listed in increasing order.
2. There exists an $a < \infty$ such that $\lambda_n < a$ for all $n$, and $\lim_{n\to\infty}\lambda_n = a$. In this case $a$ is the smallest number in the essential spectrum of $H$, and $\sigma(H) \cap [0, a)$ consists of the $\lambda_n$, which are eigenvalues of $H$ repeated according to multiplicity and listed in increasing order.
3. There exist an $a < \infty$ and an $N$ such that $\lambda_n < a$ for $n \le N$ and $\lambda_m = a$ for all $m > N$. Then $a$ is the smallest number in the essential spectrum of $H$, and $\sigma(H) \cap [0, a)$ consists of $\lambda_1, \ldots, \lambda_N$, which are eigenvalues of $H$ repeated according to multiplicity and listed in increasing order.
Proof. Let $b$ be the smallest point in the essential spectrum of $H$ (so $b = \infty$ in case 1). So $H$ has only isolated eigenvalues of finite multiplicity in $[0, b)$, and these constitute the entire spectrum of $H$ in this interval. Let $\{f_k\}$ be an orthonormal set of eigenvectors corresponding to these eigenvalues $\mu_k$, listed (with multiplicity) in increasing order.
Let $M_n$ denote the space spanned by the first $n$ of these eigenvectors, and let $f \in M_n$. Then $f = \sum_{j=1}^n (f, f_j)f_j$, so
$$Hf = \sum_{j=1}^n \mu_j(f, f_j)f_j$$
and so
$$(Hf, f) = \sum_{j=1}^n \mu_j|(f, f_j)|^2 \le \mu_n\sum_{j=1}^n |(f, f_j)|^2 = \mu_n\|f\|^2,$$
so
$$\lambda_n \le \mu_n.$$
In the other direction, let $L$ be an $n$-dimensional subspace of $\mathrm{Dom}(H)$ and let $P$ denote orthogonal projection of $\mathcal H$ onto $M_{n-1}$, so that
$$Pf = \sum_{j=1}^{n-1}(f, f_j)f_j.$$
1). If there are infinitely many $\mu_n$ in $[0, b)$, they must have a finite accumulation point $a \le b$, and by definition $a$ is in the essential spectrum. Then we must have $a = b$, and we are in case 2). The remaining possibility is that there are only finitely many $\mu_1, \ldots, \mu_M < b$. Then for $k \le M$ we have $\lambda_k = \mu_k$ as above, and also $\lambda_m \ge b$ for $m > M$. Since $b \in \sigma_{ess}(H)$, the space
$$K_\epsilon := P((b - \epsilon, b + \epsilon))\mathcal H$$
is infinite dimensional for all $\epsilon > 0$. Let $\{f_1, f_2, \ldots\}$ be an orthonormal basis of $K_\epsilon$, and let $L$ be the space spanned by the first $m$ of these basis elements. By the spectral theorem, $(Hf, f) \le (b + \epsilon)\|f\|^2$ for any $f \in L$. So for all $m$ we have $\lambda_m \le b + \epsilon$, and we are in case 3).
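The two inequalities in the proof can be checked numerically for a matrix. They are $\lambda(M_n) \le \mu_n$ (with equality) and $\lambda(L) \ge \mu_n$ for every $n$-dimensional $L$. The sketch assumes that $H$ is a random non-negative symmetric $8 \times 8$ matrix, so the whole spectrum is discrete. The helper name is ours.

```python
import numpy as np

def sup_on_subspace(H, L):
    """lambda(L) = sup{(Hf, f) : f in the span of the columns of L, ||f|| = 1}:
    the largest eigenvalue of H compressed to an orthonormal basis of that span."""
    Q, _ = np.linalg.qr(L)
    return np.linalg.eigvalsh(Q.T @ H @ Q)[-1]

rng = np.random.default_rng(1)
G = rng.standard_normal((8, 8))
H = G.T @ G                                   # non-negative symmetric test matrix
mu, F = np.linalg.eigh(H)                     # eigenvalues mu_k in increasing order, eigenvectors f_k
n = 3
print(sup_on_subspace(H, F[:, :n]), mu[n - 1])          # lambda(M_n) = mu_n
worst = min(sup_on_subspace(H, rng.standard_normal((8, n))) for _ in range(500))
print(worst >= mu[n - 1])                               # every n-dimensional L has lambda(L) >= mu_n
```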
In applications (say to chemistry) one deals with self-adjoint operators which
are bounded from below, rather than being non-negative. But this requires just
a trivial shift in stating and applying the preceding theorem. In some of these
applications the bottom of the essential spectrum is at 0, and one is interested
in the lowest eigenvalue $\lambda_1$, which is negative.
12.4.7
$$\lambda_1 = \min_{f\neq0}\frac{(Hf, f)}{(f, f)}.$$
$$\lambda_2 = \min_{f\neq0,\ f\perp f_1}\frac{(Hf, f)}{(f, f)}.$$
This $\lambda_2$ coincides with the $\lambda_2$ given by (12.12), and an $f_2$ which achieves the minimum is an eigenvector of $H$ with eigenvalue $\lambda_2$. Proceeding this way, suppose we have found the first $n$ eigenvalues $\lambda_1, \ldots, \lambda_n$ and corresponding eigenvectors $f_1, \ldots, f_n$. We then define
$$\lambda_{n+1} = \min_{f\neq0,\ f\perp f_1, f\perp f_2, \ldots, f\perp f_n}\frac{(Hf, f)}{(f, f)}.$$
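In finite dimensions the step-by-step minimization is easy to carry out. In the sketch below a real symmetric matrix stands in for $H$, and the function name is ours. At each step the Rayleigh quotient is minimized over the orthogonal complement of the eigenvectors already found. The values agree with the eigenvalues in increasing order, and each minimizer is an eigenvector.

```python
import numpy as np

def successive_minima(H, k):
    """Return lambda_1..lambda_k and minimizers f_1..f_k, where lambda_{j+1} is the minimum of
    (Hf, f)/(f, f) over f != 0 orthogonal to f_1, ..., f_j (H real symmetric)."""
    n = H.shape[0]
    vals, vecs = [], []
    for _ in range(k):
        if vecs:
            F = np.column_stack(vecs)
            U = np.linalg.svd(F, full_matrices=True)[0]
            Q = U[:, F.shape[1]:]              # orthonormal basis of span(f_1..f_j)^perp
        else:
            Q = np.eye(n)
        w, V = np.linalg.eigh(Q.T @ H @ Q)     # minimize the Rayleigh quotient on that subspace
        vals.append(w[0])
        vecs.append(Q @ V[:, 0])
    return np.array(vals), np.column_stack(vecs)

rng = np.random.default_rng(2)
G = rng.standard_normal((7, 7))
H = G + G.T                                    # real symmetric test matrix
vals, Fk = successive_minima(H, 4)
print(vals)
print(np.linalg.eigvalsh(H)[:4])               # the same numbers
```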
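applications, one can frequently find a common core $D$ for the quadratic forms $Q$ associated to the operators. That is,
$$D \subset \mathrm{Dom}(H^{1/2})$$
$$\lambda_n = \inf\{\lambda(L) \mid L \subset \mathrm{Dom}(H),\ \dim L = n\},$$
$$\lambda_n' = \inf\{\lambda(L) \mid L \subset D,\ \dim L = n\},$$
$$\lambda_n'' = \inf\{\lambda(L) \mid L \subset \mathrm{Dom}(H^{1/2}),\ \dim L = n\}.$$
Then
$$\lambda_n = \lambda_n' = \lambda_n''.$$
Proof. We first prove that $\lambda_n' = \lambda_n''$. Since $D \subset \mathrm{Dom}(H^{1/2})$, the condition $L \subset D$ implies $L \subset \mathrm{Dom}(H^{1/2})$, so
$$\lambda_n' \ge \lambda_n''.$$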
n.
ij
satisfies
|(L0 ) 00n | < c00n
12.4.8
The definition (12.12) makes sense in a real finite dimensional vector space. Let $Q$ be a real quadratic form on a finite dimensional real Hilbert space $V$. We can write $Q(f) = (Hf, f)$, where $H$ is a self-adjoint (=symmetric) operator, and then find an orthonormal basis according to (12.12). In terms of such a basis $f_1, \ldots, f_n$, we have
$$Q(f) = \sum_k \lambda_k r_k^2 \quad\text{where}\quad f = \sum_k r_k f_k.$$
$$\frac12\,dS_f = (r_1, \ldots, r_n).$$
So the only possible values of $\lambda$ are $\lambda = \lambda_i$ for some $i$, and the corresponding $f$ is given by $r_j = 0$ for $j \neq i$ and $r_i \neq 0$. This is a watered down version of Theorem 12.4.3. In applications, one is frequently given a basis of $V$ which is not orthonormal. Thus (in terms of the given basis)
$$Q(f) = \sum_{ij} H_{ij}r_ir_j, \quad\text{and}\quad S(f) = \sum_{ij} S_{ij}r_ir_j,$$
where
$$f = \sum_i r_if_i.$$
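i.e.
$$\begin{pmatrix} H_{11} - \lambda S_{11} & H_{12} - \lambda S_{12} & \cdots & H_{1n} - \lambda S_{1n}\\ \vdots & \vdots & & \vdots\\ H_{n1} - \lambda S_{n1} & H_{n2} - \lambda S_{n2} & \cdots & H_{nn} - \lambda S_{nn}\end{pmatrix}\begin{pmatrix} r_1\\ \vdots\\ r_n\end{pmatrix} = 0.$$
This has a non-zero solution $(r_1, \ldots, r_n)$ if and only if
$$\det\begin{pmatrix} H_{11} - \lambda S_{11} & \cdots & H_{1n} - \lambda S_{1n}\\ \vdots & & \vdots\\ H_{n1} - \lambda S_{n1} & \cdots & H_{nn} - \lambda S_{nn}\end{pmatrix} = 0.$$
This is known as the secular equation, because of its earlier use in astronomy to determine the periods of orbits.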
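For a symmetric $H$ and a positive definite overlap matrix $S$, the secular equation can be solved by reducing it to an ordinary symmetric eigenproblem. Write $S = CC^T$ (Cholesky). Then $\det(H - \lambda S) = \det(C)^2\det(C^{-1}HC^{-T} - \lambda I)$. The sketch below uses NumPy only, and the matrices are illustrative. If SciPy is available, `scipy.linalg.eigh(H, S)` solves the same generalized problem directly.

```python
import numpy as np

def secular_roots(H, S):
    """Roots of det(H - lambda S) = 0 for real symmetric H and positive definite S."""
    C = np.linalg.cholesky(S)            # S = C C^T, C lower triangular
    Ci = np.linalg.inv(C)
    return np.linalg.eigvalsh(Ci @ H @ Ci.T)

H = np.array([[1.0, 0.2, 0.0],
              [0.2, 2.0, 0.3],
              [0.0, 0.3, 3.0]])
S = np.array([[1.0, 0.3, 0.0],           # overlap matrix of a non-orthonormal basis
              [0.3, 1.0, 0.3],
              [0.0, 0.3, 1.0]])
roots = secular_roots(H, S)
print(roots)
print([np.linalg.det(H - lam * S) for lam in roots])   # all approximately zero
```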
12.5
It is not hard to check (and we will do so within the next three lectures) that $W^{1,2}(\Omega)$ with this scalar product is a Hilbert space. We let $C_0^\infty(\Omega)$ denote the space of smooth functions of compact support whose support is contained in $\Omega$. We let $W_0^{1,2}(\Omega)$ denote the completion of $C_0^\infty(\Omega)$ with respect to the norm $\|\cdot\|_1$ coming from the scalar product $(\cdot,\cdot)_1$.
We will show that $-\Delta$ defines a non-negative self-adjoint operator, known as the Dirichlet operator associated with $\Omega$, whose quadratic form has domain $W_0^{1,2}(\Omega)$. I want to postpone the proofs of these general facts. Instead I will concentrate on what Rayleigh-Ritz tells us when $\Omega$ is a bounded open subset, which we will assume from now on.
We are going to apply Rayleigh-Ritz to the domain $D(\Omega)$ and the quadratic form $Q(f) = Q(f, f)$ where
$$Q(f, g) := \int_\Omega \nabla f(x)\cdot\overline{\nabla g(x)}\,dx.$$
Define
$$\lambda_n(\Omega) := \inf\{\lambda(L) \mid L \subset C_0^\infty(\Omega),\ \dim(L) = n\}$$
where
$$\lambda(L) = \sup\{Q(f) \mid f \in L,\ \|f\| = 1\}$$
as before.
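then
$$\lim_{m\to\infty}\lambda_n(\Omega_m) = \lambda_n(\Omega)$$
for all $n$.
Proof. For any $\epsilon > 0$ there exists an $n$-dimensional subspace $L$ of $C_0^\infty(\Omega)$ such that $\lambda(L) \le \lambda_n(\Omega) + \epsilon$. There will be a compact subset $K \subset \Omega$ such that all the elements of $L$ have support in $K$. We can then choose $m$ sufficiently large so that $K \subset \Omega_m$. Then
$$\lambda_n(\Omega) \le \lambda_n(\Omega_m) \le \lambda_n(\Omega) + \epsilon.$$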
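For an interval the Dirichlet eigenvalues are explicit: on $(0, \ell)$ they are $(n\pi/\ell)^2$. The sketch below uses a finite-difference discretization, which is our choice and not the text's. It reproduces $1, 4, 9$ for $\ell = \pi$. It also shows $\lambda_1(\Omega_m)$ decreasing towards $\lambda_1(\Omega) = 1$ along the increasing intervals $\Omega_m = (0, \pi - 1/m)$, which are likewise our choice.

```python
import numpy as np

def dirichlet_eigenvalues(length, n_grid=400, k=3):
    """Lowest k eigenvalues of -d^2/dx^2 on (0, length) with f = 0 at both ends,
    approximated by the standard second-order finite-difference matrix."""
    h = length / (n_grid + 1)
    T = (np.diag(np.full(n_grid, 2.0))
         - np.diag(np.ones(n_grid - 1), 1)
         - np.diag(np.ones(n_grid - 1), -1)) / h ** 2
    return np.linalg.eigvalsh(T)[:k]

print(dirichlet_eigenvalues(np.pi))                       # close to 1, 4, 9
lengths = [np.pi - 1.0 / m for m in (1, 2, 4, 8, 16)]     # Omega_m increasing to (0, pi)
first = [dirichlet_eigenvalues(L, k=1)[0] for L in lengths]
print(first)                                              # decreasing towards 1
```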
12.6
Valence.
$$\lambda_1 = \min_{\psi\neq0}\frac{(H\psi, \psi)}{(\psi, \psi)}.$$
Unless one has a clever way of computing $\lambda_1$ by some other means, minimizing the expression on the right over all of $\mathcal H$ is a hopeless task. What is done in practice is to choose a finite dimensional subspace and apply the above minimization over all $\psi$ in that subspace. Similarly, one applies (12.12) to subspaces of that subspace for the higher eigenvalues. The hope is that this yields good approximations to the true eigenvalues.
If $M$ is a finite dimensional subspace of $\mathcal H$, and $P$ denotes projection onto $M$, then applying (12.12) to subspaces of $M$ amounts to finding the eigenvalues of $PHP$ restricted to $M$.
12.6.1
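Consider the case where $M$ is two dimensional with a basis $\psi_1$ and $\psi_2$. The idea is that we have some grounds for believing that the true eigenfunction has characteristics typical of these two elements and is likely to be some linear combination of them. If we set
$$H_{11} := (H\psi_1, \psi_1), \quad H_{12} := (H\psi_1, \psi_2) = H_{21}, \quad H_{22} := (H\psi_2, \psi_2)$$
and
$$S_{11} := (\psi_1, \psi_1), \quad S_{12} := (\psi_1, \psi_2) = S_{21}, \quad S_{22} := (\psi_2, \psi_2),$$
then if these quantities are real we can apply the secular equation
$$\det\begin{pmatrix} H_{11} - \lambda S_{11} & H_{12} - \lambda S_{12}\\ H_{21} - \lambda S_{21} & H_{22} - \lambda S_{22}\end{pmatrix} = 0$$
to determine $\lambda$.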
Suppose that $S_{11} = S_{22} = 1$, i.e. that $\psi_1$ and $\psi_2$ are separately normalized. Also assume that $\psi_1$ and $\psi_2$ are linearly independent. Let
$$\beta := S_{12} = S_{21}.$$
This is sometimes called the overlap integral, since if our Hilbert space is $L^2(\mathbb R^3)$ then $\beta = \int_{\mathbb R^3}\psi_1\bar\psi_2\,dx$. Now
$$H_{11} = (H\psi_1, \psi_1)$$
is the guess that we would make for the lowest eigenvalue (= the lowest energy level) if we took $L$ to be the one dimensional space spanned by $\psi_1$. So let us call this value $E_1$. So $E_1 := H_{11}$, and similarly define $E_2 := H_{22}$. The secular equation becomes
$$(\lambda - E_1)(\lambda - E_2) - (H_{12} - \lambda\beta)^2 = 0.$$
Define $F(\lambda) := (\lambda - E_1)(\lambda - E_2) - (H_{12} - \lambda\beta)^2$. Then $F$ is positive for large values of $|\lambda|$, since $|\beta| < 1$ by Cauchy-Schwarz. $F(\lambda)$ is non-positive at $\lambda = E_1$ or $E_2$, and generically it will be strictly negative at these points. So the lower solution of the secular equation will generically lie strictly below $\min(E_1, E_2)$. The upper solution will generically lie strictly above $\max(E_1, E_2)$. This is known as the no crossing rule and is of great importance in chemistry. I hope to explain the higher dimensional version of this rule (due to Teller-von Neumann and Wigner) later.
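The two roots can be written down explicitly. Expanding $F$ gives
$$F(\lambda) = (1 - \beta^2)\lambda^2 + \left(2H_{12}\beta - E_1 - E_2\right)\lambda + \left(E_1E_2 - H_{12}^2\right).$$
The sketch below solves this quadratic. The numbers are purely illustrative. Since $F(E_1) = -(H_{12} - \beta E_1)^2 \le 0$, the roots are real. They straddle $E_1$ and $E_2$ strictly unless $H_{12} = \beta E_1$ or $H_{12} = \beta E_2$, which is the non-generic case mentioned above.

```python
import math

def two_level_roots(E1, E2, H12, beta):
    """Roots of F(lam) = (lam - E1)(lam - E2) - (H12 - lam*beta)**2, assuming |beta| < 1."""
    a = 1.0 - beta ** 2
    b = -(E1 + E2) + 2.0 * H12 * beta
    c = E1 * E2 - H12 ** 2
    disc = math.sqrt(b * b - 4.0 * a * c)   # non-negative because F(E1) <= 0 and a > 0
    return (-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)

def F(lam, E1, E2, H12, beta):
    return (lam - E1) * (lam - E2) - (H12 - lam * beta) ** 2

lo, hi = two_level_roots(-1.0, -0.5, -0.3, 0.2)   # illustrative values
print(lo, hi)   # lo < min(E1, E2) = -1.0 and hi > max(E1, E2) = -0.5
```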
12.6.2
Hückel theory of hydrocarbons.
In this theory the space $M$ is the $n$-dimensional space where each carbon atom contributes one electron. (The other electrons are occupied with the hydrogen atoms.) It is assumed that the $S$ in the secular equation is the identity matrix. This amounts to the assumption that the basis given by the electrons associated with each carbon atom is an orthonormal basis. It is also assumed that $(Hf, f) = \alpha$ is the same for each basis element. In a crude sense this measures the electron-attracting power of each carbon atom, and hence it is assumed to be the same for all basis elements. If $(Hf_r, f_s) \neq 0$, the atoms $r$ and $s$ are said to be bonded. It is assumed that only nearest neighbor atoms are bonded, in which case $(Hf_r, f_s) = \beta$ is assumed to be independent of $r$ and $s$. So $PHP$ has the form
$$\alpha I + \beta A$$
where $A$ is the adjacency matrix of the graph whose vertices correspond to the carbon atoms and whose edges correspond to the bonded pairs of atoms. If we set
$$x := \frac{E - \alpha}{\beta},$$
then finding the energy levels is the same as finding the eigenvalues $x$ of the adjacency matrix $A$. In particular this is so if we assume that the values of $\alpha$ and $\beta$ are independent of the particular molecule.
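For benzene the graph is a 6-cycle. Its adjacency eigenvalues are $2\cos(2\pi k/6)$, $k = 0, \ldots, 5$, that is $2, 1, 1, -1, -1, -2$. The sketch below computes the Hückel levels $E = \alpha + \beta x$. It uses the placeholder units $\alpha = 0$, $\beta = 1$; in chemistry $\beta$ is negative, which reverses the order of the levels. The function names are ours.

```python
import numpy as np

def cycle_adjacency(n):
    """Adjacency matrix of an n-membered ring of carbon atoms."""
    A = np.zeros((n, n))
    for r in range(n):
        A[r, (r + 1) % n] = A[(r + 1) % n, r] = 1.0
    return A

def huckel_levels(A, alpha, beta):
    """Energy levels E = alpha + beta * x, with x running over the eigenvalues of A."""
    x = np.linalg.eigvalsh(A)
    return alpha + beta * x, x

E, x = huckel_levels(cycle_adjacency(6), alpha=0.0, beta=1.0)   # benzene, placeholder units
print(np.round(x, 6))   # -2, -1, -1, 1, 1, 2
```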
12.7
In this section we present the proof given by Davies of the spectral theorem,
taken from his book Spectral Theory and Differential Operators.
12.7.1
Symbols.
These are functions which vanish more (or grow less) at infinity the more you differentiate them. More precisely, for any real number $\beta$ we let $S^\beta$ denote the space of smooth functions $f$ on $\mathbb R$ such that for each non-negative integer $n$ there is a constant $c_n$ (depending on $f$) such that
$$|f^{(n)}(x)| \le c_n(1 + |x|^2)^{(\beta - n)/2}.$$
It will be convenient to introduce the function
$$\langle z\rangle := (1 + |z|^2)^{1/2}.$$
So we can write the definition of $S^\beta$ as being the space of all smooth functions $f$ such that
$$|f^{(n)}(x)| \le c_n\langle x\rangle^{\beta - n} \tag{12.13}$$
for some $c_n$ and all integers $n \ge 0$. For example, a polynomial of degree $k$ belongs to $S^k$, since every time you differentiate it you lower the degree (and eventually get zero). More generally, a function of the form $P/Q$, where $P$ and $Q$ are polynomials with $Q$ nowhere vanishing, belongs to $S^k$ where $k = \deg P - \deg Q$.
The name symbol comes from the theory of pseudo-differential operators.
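As a quick check of (12.13), take $f(x) = \langle x\rangle^\beta$; this worked example is ours. For $n = 1$,
$$f'(x) = \beta x\langle x\rangle^{\beta - 2}, \qquad |f'(x)| \le |\beta|\,|x|\,\langle x\rangle^{\beta - 2} \le |\beta|\,\langle x\rangle^{\beta - 1},$$
using $|x| \le \langle x\rangle$. In general, for $n \ge 1$, $f^{(n)}$ is a finite sum of constant multiples of $x^j\langle x\rangle^{\beta - 2l}$ with $j - 2l = -n$. The reason is that differentiating such a term produces $x^{j-1}\langle x\rangle^{\beta - 2l}$ and $x^{j+1}\langle x\rangle^{\beta - 2l - 2}$, each lowering $j - 2l$ by one. Each term is bounded by $\langle x\rangle^{\beta - n}$. So $\langle x\rangle^\beta \in S^\beta$, and in particular $\langle x\rangle^{-1}$ belongs to $\bigcup_{\beta<0}S^\beta$.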
12.7.2
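Define
$$A := \bigcup_{\beta<0} S^\beta.$$
$$\|f\|_n := \sum_{r=0}^n \int_{\mathbb R}|f^{(r)}(x)|\langle x\rangle^{r-1}\,dx. \tag{12.14}$$
$f(x) = \int_{-\infty}^x f'(t)\,dt$, so
$$\sup_{x\in\mathbb R}|f(x)| \le \|f\|_1.$$
$$\sum_{r=0}^{n+1}\int_{|x|>m}\left|\frac{d^r}{dx^r}\{f(x)(1 - \phi_m(x))\}\right|\langle x\rangle^{r-1}\,dx.$$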
12.7.3
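Consider complex valued differentiable functions in the $(x, y)$ plane. Define the differential operators
$$\frac{\partial}{\partial\bar z} := \frac12\left(\frac{\partial}{\partial x} + i\frac{\partial}{\partial y}\right), \quad\text{and}\quad \frac{\partial}{\partial z} := \frac12\left(\frac{\partial}{\partial x} - i\frac{\partial}{\partial y}\right).$$
Define the complex valued linear differential forms
$$dz := dx + i\,dy, \qquad d\bar z := dx - i\,dy,$$
so
$$dx = \frac12(dz + d\bar z), \qquad dy = \frac{1}{2i}(dz - d\bar z).$$
So we can write any complex valued differential form $a\,dx + b\,dy$ as $A\,dz + B\,d\bar z$ where
$$A = \frac12(a - ib), \qquad B = \frac12(a + ib).$$
In particular, for any differentiable function $f$ we have
$$df = \frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial y}dy = \frac{\partial f}{\partial z}dz + \frac{\partial f}{\partial\bar z}d\bar z.$$
Also
$$d\bar z\wedge dz = 2i\,dx\wedge dy.$$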
So if $U$ is any bounded region with piecewise smooth boundary, Stokes' theorem gives
$$\int_{\partial U} f\,dz = \int_U d(f\,dz) = \int_U \frac{\partial f}{\partial\bar z}\,d\bar z\wedge dz = 2i\int_U \frac{\partial f}{\partial\bar z}\,dx\,dy.$$
The function $f$ can take values in any Banach space. We will apply this to functions with values in the space of bounded operators on a Hilbert space.
A function $f$ is holomorphic if and only if $\partial f/\partial\bar z = 0$. So the above formula implies the Cauchy integral theorem.
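Here is a variant of Cauchy's integral theorem valid for a function of compact support in the plane:
$$\frac1\pi\int_{\mathbb C}\frac{\partial f}{\partial\bar z}\,\frac{1}{z - w}\,dx\,dy = -f(w). \tag{12.15}$$
Indeed, the integral on the left is the limit as $\epsilon \to 0$ of the integral over $\mathbb C \setminus D_\epsilon$, where $D_\epsilon$ is a disk of radius $\epsilon$ centered at $w$. Now $f$ has compact support, and
$$\frac{\partial}{\partial\bar z}\,\frac{1}{z - w} = 0 \quad (z \neq w).$$
Hence we may write the integral on the left as
$$-\frac{1}{2\pi i}\int_{\partial D_\epsilon}\frac{f(z)}{z - w}\,dz \to -f(w).$$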
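Formula (12.15) can be tested numerically. All choices in the sketch below are ours, not the text's:

- the function $f(x) = e^{-x^2}$;
- the extension $F(x + iy) = (f(x) + iyf'(x))\tau(y)$, with a smooth cutoff $\tau$ equal to 1 for $|y| \le 1$ and 0 for $|y| \ge 2$;
- a midpoint rule with step $h = 0.01$.

The code evaluates $-\frac1\pi\int\frac{\partial F}{\partial\bar z}\frac{1}{z - x_0}\,dx\,dy$ and compares it with $F(x_0) = f(x_0)$. Since $\partial F/\partial\bar z = O(|y|)$ near the real axis, the integrand stays bounded at $z = x_0$. This $F$ has the form of an almost holomorphic extension with $n = 1$.

```python
import numpy as np

def smooth_step(s):
    """C-infinity function equal to 0 for s <= 0 and 1 for s >= 1."""
    s = np.asarray(s, dtype=float)
    a = np.where(s > 0, np.exp(-1.0 / np.where(s > 0, s, 1.0)), 0.0)
    b = np.where(s < 1, np.exp(-1.0 / np.where(s < 1, 1.0 - s, 1.0)), 0.0)
    return a / (a + b)

def tau(y):
    """Cutoff equal to 1 for |y| <= 1 and 0 for |y| >= 2."""
    return smooth_step(2.0 - np.abs(y))

def cauchy_pompeiu_value(x0, h=0.01, X=6.0, Y=2.0):
    """Midpoint-rule value of -(1/pi) * integral of (dF/dzbar) / (z - x0) dx dy,
    where F(x + iy) = (f(x) + i y f'(x)) tau(y) and f(x) = exp(-x^2)."""
    f = lambda x: np.exp(-x ** 2)
    df = lambda x: -2.0 * x * np.exp(-x ** 2)
    d2f = lambda x: (4.0 * x ** 2 - 2.0) * np.exp(-x ** 2)
    xs = np.arange(-X + h / 2, X, h)            # cell midpoints, so z never equals x0
    ys = np.arange(-Y + h / 2, Y, h)
    x, y = np.meshgrid(xs, ys, indexing="ij")
    d = 1e-6
    dtau = (tau(y + d) - tau(y - d)) / (2 * d)  # tau'(y) by a central difference
    # dF/dzbar = (F_x + i F_y) / 2, worked out by hand for this F
    dF = 0.5j * (y * d2f(x) * tau(y) + (f(x) + 1j * y * df(x)) * dtau)
    return -(dF / (x + 1j * y - x0)).sum() * h * h / np.pi

val = cauchy_pompeiu_value(0.3)
print(val, np.exp(-0.09))   # real part close to f(0.3), imaginary part close to 0
```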
12.7.4
(12.16)
(12.17)
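$$\frac{\partial\tilde f_{\sigma,n}}{\partial\bar z} = O(|y|^n),$$
and, in particular,
$$\frac{\partial\tilde f_{\sigma,n}}{\partial\bar z}(x, 0) = 0.$$
We call $\tilde f_{\sigma,n}$ an almost holomorphic extension of $f$.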
12.7.5
The Helffer-Sjöstrand formula.
$z \in \mathbb C$. (12.19)
n
X
r=0
as $y \to 0$, then
$$-\frac1\pi\int_{\mathbb C}\frac{\partial F}{\partial\bar z}R(z, H)\,dx\,dy = 0.$$
Proof of the lemma. Choose $N$ large enough so that the support of $F$ is contained in the square $|x| < N$, $|y| < N$. So the integration is over this square. Let $Q_\delta$ denote this square with the strip $|y| < \delta$ removed. Then $Q_\delta$ consists of two rectangular regions, $Q_\delta = R_+ \cup R_-$, and it is enough to show that the limit as $\delta \to 0$ of the integrals over each of these regions vanishes. Since $R(z, H)$ is holomorphic off the real axis, Stokes' theorem shows that the integral over $R_+$ is equal to
$$\frac{i}{2\pi}\int_{\partial R_+}F(z)R(z, H)\,dz.$$
But $F(z)$ vanishes on the top and on the vertical sides of $R_+$. Along the bottom we have $\|R(z, H)\| \le 1/\delta$, while $F = O(\delta^2)$. So the integral along the bottom path tends to zero, and the same holds for $R_-$. This proves the lemma.
Proof of the theorem. Since the smooth functions of compact support are dense in $A$, it is enough to prove the theorem for $f$ compactly supported. Suppose we make two choices $\sigma_1$ and $\sigma_2$. The corresponding $\tilde f_{\sigma_1,n}$ and $\tilde f_{\sigma_2,n}$ agree in some neighborhood of the $x$-axis, so $\tilde f_{\sigma_1,n} - \tilde f_{\sigma_2,n}$ vanishes in a neighborhood of the $x$-axis. The lemma then implies that the two resulting definitions of $f(H)$ coincide.
This shows that the definition is independent of the choice of $\sigma$. If $m > n \ge 1$ then $\tilde f_{\sigma,m} - \tilde f_{\sigma,n} = O(y^2)$, so another application of the lemma proves that $f(H)$ is independent of the choice of $n$.
Notice that the proof shows that we can choose $\sigma$ to be any smooth function which is identically one in a neighborhood of the real axis and which is compactly supported in the imaginary direction.
12.7.6
Let $w$ be a complex number with a non-zero imaginary part, and consider the function $r_w$ on $\mathbb R$ given by
$$r_w(x) = \frac{1}{w - x}.$$
This function clearly belongs to $A$, and so we can form $r_w(H)$. The purpose of this section is to prove that
$$r_w(H) = R(w, H).$$
We will choose the $\sigma$ in the definition of $\tilde r_w$ so that $w \notin \operatorname{supp}\sigma$. To be specific, choose $\sigma = \tau(\lambda|y|/\langle x\rangle)$ with $\lambda$ large enough that $w \notin \operatorname{supp}\sigma$.
We will choose the $n$ in the definition of $\tilde r_w$ as $n = 1$.
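For each real number $m$ consider the region
$$\Omega_m := \{(x, y) \mid |x| < m \text{ and } m^{-1}\langle x\rangle < |y| < 2m\}.$$
Again, $\Omega_m$ consists of two regions. Each of them has three straight line segment sides (one horizontal at the top (or bottom) and two vertical) and a curved side lying on the hyperbola $|y| = m^{-1}\langle x\rangle$. We can write the integral over $\mathbb C$ as the limit of the integral over $\Omega_m$ as $m \to \infty$. So by Stokes,
$$r_w(H) = \lim_{m\to\infty}\frac{i}{2\pi}\int_{\partial\Omega_m}\tilde r_w(z)R(z, H)\,dz.$$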
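The first summand on the right vanishes when $\lambda|y| \le \langle x\rangle$. We can apply Taylor's formula with remainder to the function $y \mapsto r_w(x + iy)$ in the second term. On the vertical sides $|x| = m$ we have $|r_w(z)| \le c\langle x\rangle^{-1}$ for $m$ large, so we get the estimate
$$|\tilde r_w(z) - r_w(z)| \le c\langle x\rangle^{-1}\mathbf 1_{\{\langle x\rangle < \lambda|y|\}} + c\frac{|y|^2}{\langle x\rangle^3}$$
along each of the four vertical sides. Using $\|R(z, H)\| \le |y|^{-1}$, the integral of the difference along the vertical sides is majorized by
$$c\int_{\lambda^{-1}\langle m\rangle}^{2m}\frac{dy}{my} + c\int_{m^{-1}\langle m\rangle}^{2m}\frac{y\,dy}{\langle m\rangle^3} = O(m^{-1}).$$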
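Along the two horizontal lines (the very top and the very bottom) $\sigma$ vanishes, and $\|r_w(z)(z - H)^{-1}\|$ is of order $m^{-2}$, so these integrals are $O(m^{-1})$. Along the hyperbolic curves $|y| = m^{-1}\langle x\rangle$ we have $\sigma \equiv 1$ (once $m \ge \lambda$), and the Taylor expansion yields
$$|\tilde r_w(z) - r_w(z)| \le c\frac{y^2}{\langle x\rangle^3}$$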
12.7.7
We now show that the map $f \mapsto f(H)$ has the desired properties of a functional calculus; see Theorem 12.7.2 below. First some lemmas:
Lemma 12.7.4 If $f$ is a smooth function of compact support whose support is disjoint from the spectrum of $H$, then $f(H) = 0$.
Proof. We may find a finite number of piecewise smooth curves, disjoint from the spectrum of $H$, which bound a region $U$ containing the support of $\tilde f$, with $U$ itself disjoint from the spectrum of $H$. Then by Stokes
$$f(H) = -\frac1\pi\int_U \frac{\partial\tilde f}{\partial\bar z}R(z, H)\,dx\,dy = \frac{i}{2\pi}\int_{\partial U}\tilde f(z)R(z, H)\,dz = 0,$$
since $\tilde f$ vanishes on $\partial U$.
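Lemma 12.7.5 For all $f, g \in A$
$$(fg)(H) = f(H)g(H).$$
Proof. It is enough to prove this when $f$ and $g$ are smooth functions of compact support. Let $K$ and $L$ be compact sets containing the supports of $\tilde f$ and $\tilde g$, and write $w = u + iv$. The product on the right is given by
$$f(H)g(H) = \frac{1}{\pi^2}\int_{K\times L}\frac{\partial\tilde f}{\partial\bar z}\frac{\partial\tilde g}{\partial\bar w}R(z, H)R(w, H)\,dx\,dy\,du\,dv$$
$$= -\frac1\pi\int_{K\cup L}\left(\tilde f\frac{\partial\tilde g}{\partial\bar z} + \tilde g\frac{\partial\tilde f}{\partial\bar z}\right)R(z, H)\,dx\,dy = -\frac1\pi\int_{\mathbb C}\frac{\partial(\tilde f\tilde g)}{\partial\bar z}R(z, H)\,dx\,dy.$$
Here the second equality uses the resolvent identity $R(z, H)R(w, H) = (R(z, H) - R(w, H))/(w - z)$ together with (12.15). On the other hand,
$$(fg)(H) = -\frac1\pi\int_{\mathbb C}\frac{\partial\widetilde{(fg)}}{\partial\bar z}R(z, H)\,dx\,dy.$$
But
$$\widetilde{(fg)} - \tilde f\tilde g$$
is of compact support and is $O(y^2)$, so Lemma 12.7.3 implies our lemma.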
Lemma 12.7.6
$$\bar f(H) = f(H)^*.$$
This follows from $R(z, H)^* = R(\bar z, H)$.
Lemma 12.7.7
$$\|f(H)\| \le \|f\|_\infty$$
where $\|f\|_\infty$ denotes the sup norm of $f$.
Proof. Choose $c > \|f\|_\infty$ and define
$$g(s) := c - \sqrt{c^2 - |f(s)|^2}.$$
Then $g \in A$ and
$$g^2 = 2cg - |f|^2$$
or
$$\bar ff - cg - c\bar g + \bar gg = 0.$$
By Lemma 12.7.5 and the preceding lemma this implies that
$$f(H)^*f(H) + (c - g(H))^*(c - g(H)) = c^2.$$
But then for any $\psi$ in our Hilbert space,
$$\|f(H)\psi\|^2 \le \|f(H)\psi\|^2 + \|(c - g(H))\psi\|^2 = c^2\|\psi\|^2,$$
proving the lemma.
Let $C_0(\mathbb R)$ denote the space of continuous functions which vanish at $\infty$, with $\|\cdot\|_\infty$ the sup norm. The algebra $A$ is dense in $C_0(\mathbb R)$ by Stone-Weierstrass. The preceding lemma therefore allows us to extend the map $f \mapsto f(H)$ to all of $C_0(\mathbb R)$.
Theorem 12.7.2 If $H$ is a self-adjoint operator on a Hilbert space $\mathcal H$, then there exists a unique linear map
$$f \mapsto f(H)$$
from $C_0(\mathbb R)$ to bounded operators on $\mathcal H$ such that
1. The map $f \mapsto f(H)$ is an algebra homomorphism,
2. $\bar f(H) = f(H)^*$,
3. $\|f(H)\| \le \|f\|_\infty$,
4. If $w$ is a complex number with non-zero imaginary part and $r_w(x) = (w - x)^{-1}$, then
$$r_w(H) = R(w, H),$$
5. If the support of $f$ is disjoint from the spectrum of $H$, then $f(H) = 0$.
We have proved everything except the uniqueness. But item 4) determines the
map on the functions rw and the algebra generated by these functions is dense
by Stone Weierstrass.
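In finite dimensions the five properties can be checked directly. For a Hermitian matrix $H = U\operatorname{diag}(\lambda_k)U^*$, the map $f \mapsto U\operatorname{diag}(f(\lambda_k))U^*$ satisfies items 1-5. By the uniqueness statement, it therefore agrees with the map constructed above. The sketch below is only this finite-dimensional model, not Davies' construction. The test functions, the random matrix and the helper name are illustrative.

```python
import numpy as np

def func_calc(H, f):
    """f(H) = U diag(f(lambda_k)) U^* for a Hermitian matrix H = U diag(lambda_k) U^*."""
    lam, U = np.linalg.eigh(H)
    return (U * f(lam)) @ U.conj().T          # scale column k of U by f(lambda_k)

rng = np.random.default_rng(3)
X = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
H = (X + X.conj().T) / 2                      # Hermitian test matrix

f = lambda s: np.exp(-s ** 2)                 # sup norm 1
g = lambda s: 1.0 / (1.0 + s ** 2)
k = lambda s: np.exp(1j * s) / (1.0 + s ** 2) # complex valued
w = 0.3 + 1.1j
r_w = lambda s: 1.0 / (w - s)
far = np.abs(np.linalg.eigvalsh(H)).max() + 1.0
bump = lambda s: np.maximum(0.0, np.abs(s) - far) * np.exp(-s ** 2)  # zero on [-far, far]

print(np.allclose(func_calc(H, lambda s: f(s) * g(s)), func_calc(H, f) @ func_calc(H, g)))  # item 1
print(np.allclose(func_calc(H, lambda s: np.conj(k(s))), func_calc(H, k).conj().T))         # item 2
print(np.linalg.norm(func_calc(H, f), 2) <= 1.0 + 1e-12)                                     # item 3
print(np.allclose(func_calc(H, r_w), np.linalg.inv(w * np.eye(6) - H)))                      # item 4
print(np.allclose(func_calc(H, bump), 0.0))                                                  # item 5
```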
In order to get the full spectral theorem we will have to extend this functional
calculus from C0 (R) to a larger class of functions, for example to the class of
bounded measurable functions. In fact, Davies proceeds by using the theorem
we have already proved to get the spectral theorem in the form that says that
a self-adjoint operator is unitarily equivalent to a multiplication operator on
an L2 space and then the extended functional calculus becomes evident. First
some definitions:
12.7.8
$$R(z, H) = \sum_{n=0}^\infty z^{-n-1}H^n \qquad (|z| > \|H\|)$$
shows that $R(z, H)L \subset L$. By analytic continuation this holds for all non-real $z$. So if $H$ is a bounded operator and $L$ is invariant in the usual sense, then $L$ is resolvent invariant. We shall see shortly that, conversely, if $L$ is a resolvent invariant subspace for a bounded self-adjoint operator, then it is invariant in the usual sense.
Lemma 12.7.8 If $L$ is a resolvent invariant subspace for a (possibly unbounded) self-adjoint operator, then so is its orthogonal complement.
Proof. If $\psi \in L^\perp$, then for any $\phi \in L$ we have
$$(R(z, H)\psi, \phi) = (\psi, R(\bar z, H)\phi) = 0$$
if $\operatorname{Im} z \neq 0$, so $R(z, H)\psi \in L^\perp$.
Now suppose that $H$ is a bounded self-adjoint operator and that $L$ is a resolvent invariant subspace. For $f \in L$ decompose $Hf$ as $Hf = g + h$, where $g \in L$ and $h \in L^\perp$. The Lemma says that $R(z, H)h \in L^\perp$. But
$$R(z, H)h = R(z, H)(Hf - g) = R(z, H)(Hf - zf + zf - g) = -f + R(z, H)(zf - g) \in L,$$
since $f \in L$, $zf - g \in L$ and $L$ is invariant under $R(z, H)$. Thus $R(z, H)h \in L \cap L^\perp$, so $R(z, H)h = 0$. But $R(z, H)$ is injective, so $h = 0$. We have shown that if $H$ is a bounded self-adjoint operator and $L$ is a resolvent invariant subspace, then $L$ is invariant under $H$.
So from now on we can drop the word resolvent. We will take the word invariant to mean resolvent invariant when dealing with possibly unbounded operators. On bounded operators this coincides with the old notion of invariance.
12.7.9
Cyclic subspaces.
which would prove the lemma. To prove this, choose a sequence $v_m \in \mathrm{Dom}(H)$ with $v_m \to v$, and choose some non-real number $z$. Set
$$w_m := (zI - H)v_m,$$
so that $v_m = R(z, H)w_m$. Let $r_z$ be the function $r_z(s) = (z - s)^{-1}$ on $\mathbb R$ as above, so $v_m = r_z(H)w_m$.
Then
$$f_n(H)v = f_n(H)v_m + f_n(H)(v - v_m) = f_n(H)r_z(H)w_m + f_n(H)(v - v_m) = (f_nr_z)(H)w_m + f_n(H)(v - v_m).$$
So
$$f_n(H)v - v = [(f_nr_z)(H) - r_z(H)]w_m + (v_m - v) + f_n(H)(v - v_m).$$
Now $f_nr_z \to r_z$ in the sup norm on $C_0(\mathbb R)$. So given $\epsilon > 0$, we can first choose $m$ so large that $\|v_m - v\| < \frac13\epsilon$. Since the sup norm of $f_n$ is at most $1$, the third summand above is also less than $\frac13\epsilon$ in norm. We can then choose $n$ sufficiently large that the first term is also less than $\frac13\epsilon$.
Clearly, $L$ is the smallest (resolvent) invariant subspace which contains $v$. Hence, if $H$ is a bounded self-adjoint operator, $L$ is the smallest closed subspace containing all the $H^nv$. So for bounded self-adjoint operators, we have not changed the definition of cyclic subspace.
Proposition 12.7.1 Let $H$ be a (possibly unbounded) self-adjoint operator on a separable Hilbert space $\mathcal H$. Then there exists a (finite or countable) family of orthogonal cyclic subspaces $L_n$ such that $\mathcal H$ is the closure of
$$\bigoplus_n L_n.$$
Proof. Let $\{f_n\}$ be a countable dense subset of $\mathcal H$, and let $L_1$ be the cyclic subspace generated by $f_1$. If $L_1 = \mathcal H$ we are done. If not, there must be some $m$ for which $f_m \notin L_1$. Choose the smallest such $m$, and let $g_2$ be the orthogonal projection of $f_m$ onto $L_1^\perp$. Let $L_2$ be the cyclic subspace generated by $g_2$. If $L_1 \oplus L_2 = \mathcal H$ we are done. If not, there is some $m$ for which $f_m \notin L_1 \oplus L_2$. Choose the smallest such $m$, and let $g_3$ be the projection of $f_m$ onto the orthogonal complement of $L_1 \oplus L_2$. Proceed inductively. Either this comes to an end with $\mathcal H$ a finite direct sum of cyclic subspaces, or it goes on indefinitely. In either case all the $f_i$ belong to the algebraic direct sum in the proposition, and hence the closure of this sum is all of $\mathcal H$.
12.7.10
$$\int_{\mathbb R} f\,d\mu$$
for all $f \in C_0(\mathbb R)$. If the support of $f$ is disjoint from $S$, then $f(H) = 0$. This shows that $\mu$ is supported on $S$, the spectrum of $H$.
We have
$$\int f\bar g\,d\mu = (\bar g(H)f(H)v, v) = (f(H)v, g(H)v)$$
for $f, g \in C_0(\mathbb R)$.
Let $M$ denote the linear subspace of $\mathcal H$ consisting of all $f(H)v$, $f \in C_0(\mathbb R)$. The above equality says that there is an isometry $U$ from $M$ to $C_0(\mathbb R)$ (relative to the $L^2$ norm) such that
$$U(f(H)v) = f.$$
Since $M$ is dense in $\mathcal H$ by hypothesis, this extends to a unitary operator $U$ from $\mathcal H$ to $L^2(S, \mu)$.
Let $f_1, f_2, f \in C_0(\mathbb R)$, and set $\psi_1 = f_1(H)v$, $\psi_2 = f_2(H)v$. So $f_i = U\psi_i$, $i = 1, 2$. Then
$$(f(H)\psi_1, \psi_2) = \int_S ff_1\bar f_2\,d\mu.$$
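$$\begin{aligned} &= HR(w, H)U^{-1}f\\ &= -U^{-1}f + wR(w, H)U^{-1}f\\ &= -U^{-1}f + U^{-1}(wr_wf)\\ &= U^{-1}\left(\left(\frac{w}{w - x} - 1\right)f\right)\\ &= U^{-1}(hr_wf)\\ &= U^{-1}(hg).\end{aligned}$$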
We can now state and prove the general form of the spectral representation theorem. We use the decomposition of $\mathcal H$ into cyclic subspaces and take the generating vectors to have norm $2^{-n}$; then we can proceed as we did last semester. We get the extension of the homomorphism $f \mapsto f(H)$ to bounded measurable functions by proving it in the $L^2$ representation via the dominated convergence theorem. This then gives projection valued measures, etc.
Chapter 13
Scattering theory.
The purpose of this chapter is to give an introduction to the ideas of Lax and Phillips, which are contained in their beautiful book Scattering theory.
Throughout this chapter, $K$ will denote a Hilbert space and $t \mapsto S(t)$, $t \ge 0$, a strongly continuous semi-group of contractions defined on $K$. We assume that $S(t)$ tends strongly to $0$ as $t \to \infty$, in the sense that
$$\lim_{t\to\infty}\|S(t)k\| = 0$$
for each $k \in K$.
13.1
Examples.
13.1.1
Translation - truncation.
(13.1)
$$\|PT_tf\|^2 = \int|f(x)|^2\,dx$$
for $s \ge 0$. Hence $g \in G^\perp \Rightarrow T_s^*g \in G^\perp$, since
$$(T_s^*g, \phi) = (g, T_s\phi) = 0$$
for $\phi \in G$.
13.1.2
Incoming representations.
$$U(t)D \subset D \quad\text{for } t \le 0, \tag{13.2}$$
$$\bigcap_t U(t)D = \{0\}, \tag{13.3}$$
$$\overline{\bigcup_t U(t)D} = \mathcal H. \tag{13.4}$$
$$\bigcup_{t>0}U(t)D$$
13.1.3
Scattering residue.
In the scattering theory example, we want to believe that at large future times
the obstacle has little effect and so there should be both an incoming space
describing the situation long before the interaction with the obstacle, and also an
outgoing space reflecting behavior long after the interaction with the obstacle.
The residual behavior - i.e. the effect of the obstacle - is what is of interest.
For example, in elementary particle physics, this might be observed as a blip in
the scattering cross-section describing a particle of a very short life-time. See
the very elementary discussion of the blip arising in the Breit-Wigner formula
below.
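So let $t \mapsto U(t)$ be a strongly continuous one parameter unitary group on a Hilbert space $\mathcal H$. Let $D_-$ be an incoming subspace for $U$, and let $D_+$ be an outgoing subspace (i.e. incoming for $t \mapsto U(-t)$). Suppose that
$$D_- \perp D_+$$
and let
$$K := [D_- \oplus D_+]^\perp = D_-^\perp \cap D_+^\perp.$$
Let
$$P_\pm := \text{orthogonal projection onto } D_\pm^\perp.$$
Let
$$Z(t) := P_+U(t)P_-, \quad t \ge 0.$$
Claim:
$$Z(t) : K \to K.$$
Proof. Since $P_+$ occurs as the leftmost factor in the definition of $Z$, the image of $Z(t)$ is contained in $D_+^\perp$. We must show that
$$x \in D_-^\perp \Rightarrow P_+U(t)x \in D_-^\perp.$$
We have $U(-t)D_- \subset D_-$ for $t \ge 0$ and $U(t)^* = U(-t)$. It follows that
$$U(t) : D_-^\perp \to D_-^\perp.$$
So
$$U(t)x \in D_-^\perp.$$
Since $D_- \subset D_+^\perp$, the projection $P_+$ is the identity on $D_-$; in particular
$$P_+ : D_- \to D_-,$$
and hence, since $P_+$ is self-adjoint,
$$P_+ : D_-^\perp \to D_-^\perp.$$
Thus $P_+U(t)x \in D_-^\perp$ as required. QED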
By abuse of language, we will now use $Z(t)$ to denote the restriction of $Z(t)$ to $K$. We claim that $t \mapsto Z(t)$ is a semi-group. Indeed, we have
$$P_+U(t)P_+x = P_+U(t)x + P_+U(t)[P_+x - x] = P_+U(t)x,$$
since $[P_+x - x] \in D_+$ and $U(t) : D_+ \to D_+$ for $t \ge 0$. Also $Z(t) = P_+U(t)$ on $K$, since $P_-$ is the identity on $K$. Therefore we may drop the $P_-$ on the right when restricting to $K$, and we have
$$Z(s)Z(t) = P_+U(s)P_+U(t) = P_+U(s)U(t) = P_+U(s + t) = Z(s + t),$$
proving that $Z$ is a semigroup.
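We now show that $Z(t)$ tends strongly to zero. Let $x \in K$ and $\epsilon > 0$. Since $\bigcup_{t<0}U(t)D_+$ is dense in $\mathcal H$, we can find a $T > 0$ and a $y \in D_+$ such that
$$\|x - U(-T)y\| < \epsilon.$$
For $t \ge T$ we have $U(t)U(-T)y = U(t - T)y \in D_+$, so $P_+U(t)U(-T)y = 0$, and hence
$$\|Z(t)x\| < \epsilon.$$
We have proved that $Z$ is a strongly continuous semi-group of contractions on $K$ which tends strongly to zero, i.e. that (13.1) holds.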
13.2
Breit-Wigner.
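$$f_d(t) = \begin{cases}\sqrt{2\mu}\,e^{\mu t}d & t \le 0,\\ 0 & t > 0.\end{cases}$$
Then
$$\|f_d\| = \|d\|,$$
so the map
$$R : K \to L^2(\mathbb R, N), \quad d \mapsto f_d$$
is an isometry. Also
$$P(T_tf_d)(s) = Pf_d(s - t) = e^{-\mu t}f_d(s)$$
so
$$(PT_t) \circ R = R \circ Z(t).$$
This is an example of the representation theorem in the next section.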
If we take the Fourier transform of $f_d$ we obtain the function
$$\sigma \mapsto \frac{1}{\sqrt{2\pi}}\,\frac{\sqrt{2\mu}}{\mu - i\sigma}\,d,$$
whose squared norm, as a function of $\sigma$, is proportional to the Breit-Wigner function
$$\frac{1}{\sigma^2 + \mu^2}.$$
It is this bump appearing in the graph of a scattering experiment which signifies a resonance, i.e. an unstable particle whose lifetime is inversely proportional to the width of the bump.
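The Breit-Wigner shape comes from a one-line integral. Write $\mu > 0$ for the decay rate (our notation), so that $f_d(t)$ is a multiple of $e^{\mu t}d$ for $t \le 0$ and vanishes for $t > 0$. Then
$$\int_{-\infty}^0 e^{\mu t}e^{-i\sigma t}\,dt = \left[\frac{e^{(\mu - i\sigma)t}}{\mu - i\sigma}\right]_{-\infty}^0 = \frac{1}{\mu - i\sigma}, \qquad \left|\frac{1}{\mu - i\sigma}\right|^2 = \frac{1}{\sigma^2 + \mu^2}.$$
The right hand side drops to half its peak value at $\sigma = \pm\mu$, so the full width at half maximum is $2\mu$. Meanwhile $|f_d(t)|^2$ decays like $e^{2\mu t}$ as $t \to -\infty$. This is the inverse relation between lifetime and width described above.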
13.3
$$\frac{d}{dt}\|S(t)k\|^2.$$
if t > r.
13.4
$$\begin{cases} f_d(t + r) & \text{if } t \le -r,\\ 0 & \text{if } -r < t \le 0.\end{cases} \tag{13.5}$$
13.5
Let us show that the Sinai representation theorem implies a version (for $n = 1$) of the Stone - von Neumann theorem:
Theorem 13.5.1 Let $\{U(t)\}$ be a one parameter group of unitary operators, and let $B$ be a self-adjoint operator on a Hilbert space $\mathcal H$. Suppose that
$$U(t)BU(-t) = B - tI. \tag{13.6}$$
$$B - tI = \int(\lambda - t)\,dE_\lambda = \int\lambda\,dE_{\lambda+t}$$
by a change of variables.
We thus obtain
$$U(t)E_\lambda U(-t) = E_{\lambda+t}.$$
Remember that $E_\lambda$ is orthogonal projection onto the subspace associated to $(-\infty, \lambda]$ by the spectral measure associated to $B$. Let $D$ denote the image of $E_0$. Then the preceding equation says that $U(t)D$ is the image of the projection $E_t$. The standard properties of the spectral measure are these: the images of $E_t$ increase with $t$, tend to the whole space as $t \to \infty$, and tend to $\{0\}$ as $t \to -\infty$. These are exactly the conditions that $D$ be incoming for $U(t)$. Hence the Sinai representation theorem is equivalent to the Stone - von Neumann theorem in the above form.
QED
Historically, Sinai proved his representation theorem from the Stone - von
Neumann theorem. Here, following Lax and Phillips, we are proceeding in the
reverse direction.