Kuttler
April 1, 2011
Contents
I Prerequisite Material 11
1 Some Fundamental Concepts 13
1.1 Set Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.1.1 Basic Deﬁnitions . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.1.2 The Schroder Bernstein Theorem . . . . . . . . . . . . . . . . 16
1.1.3 Equivalence Relations . . . . . . . . . . . . . . . . . . . . . . 19
1.2 limsup And liminf . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.3 Double Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2 Row reduced echelon form 27
2.1 Elementary Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.2 The row reduced echelon form of a matrix . . . . . . . . . . . . . . . 35
2.3 Finding the inverse of a matrix . . . . . . . . . . . . . . . . . . . . . 39
2.4 The rank of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3 Sequences 43
3.1 The Inner Product And Dot Product . . . . . . . . . . . . . . . . . . 43
3.1.1 The Dot Product . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.2 Vector Valued Sequences And Their Limits . . . . . . . . . . . . . . 46
3.3 Sequential Compactness . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.4 Closed And Open Sets . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.5 Cauchy Sequences And Completeness . . . . . . . . . . . . . . . . . 55
3.6 Shrinking Diameters . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4 Continuous Functions 61
4.1 Continuity And The Limit Of A Sequence . . . . . . . . . . . . . . . 64
4.2 The Extreme Values Theorem . . . . . . . . . . . . . . . . . . . . . . 66
4.3 Connected Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.4 Uniform Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.5 Sequences And Series Of Functions . . . . . . . . . . . . . . . . . . . 72
4.6 Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.7 Sequences Of Polynomials, Weierstrass Approximation . . . . . . . . 77
4.7.1 The Tietze Extension Theorem . . . . . . . . . . . . . . . . . 81
4.8 The Operator Norm . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.9 Ascoli Arzela Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5 The Mathematical Theory Of Determinants, Basic Linear Algebra 99
5.1 The Function sgn_n . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.2 The Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.2.1 The Deﬁnition . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.2.2 Permuting Rows Or Columns . . . . . . . . . . . . . . . . . . 102
5.2.3 A Symmetric Deﬁnition . . . . . . . . . . . . . . . . . . . . . 103
5.2.4 The Alternating Property Of The Determinant . . . . . . . . 104
5.2.5 Linear Combinations And Determinants . . . . . . . . . . . . 105
5.2.6 The Determinant Of A Product . . . . . . . . . . . . . . . . . 105
5.2.7 Cofactor Expansions . . . . . . . . . . . . . . . . . . . . . . . 106
5.2.8 Formula For The Inverse . . . . . . . . . . . . . . . . . . . . . 108
5.2.9 Cramer’s Rule . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.2.10 Upper Triangular Matrices . . . . . . . . . . . . . . . . . . . 110
5.2.11 The Determinant Rank . . . . . . . . . . . . . . . . . . . . . 110
5.2.12 Determining Whether A Is One To One Or Onto . . . . . . . 112
5.2.13 Schur’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.2.14 Symmetric Matrices . . . . . . . . . . . . . . . . . . . . . . . 116
5.3 The Right Polar Factorization . . . . . . . . . . . . . . . . . . . . . . 117
6 The Derivative 121
6.1 Basic Deﬁnitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
6.2 The Chain Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.3 The Matrix Of The Derivative . . . . . . . . . . . . . . . . . . . . . 123
6.4 A Mean Value Inequality . . . . . . . . . . . . . . . . . . . . . . . . 126
6.5 Existence Of The Derivative, C^1 Functions . . . . . . . . . . . . . . 127
6.6 Higher Order Derivatives . . . . . . . . . . . . . . . . . . . . . . . . 131
6.7 C^k Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
6.7.1 Some Standard Notation . . . . . . . . . . . . . . . . . . . . . 134
6.8 The Derivative And The Cartesian Product . . . . . . . . . . . . . . 134
6.9 Mixed Partial Derivatives . . . . . . . . . . . . . . . . . . . . . . . . 138
6.10 Implicit Function Theorem . . . . . . . . . . . . . . . . . . . . . . . 140
6.10.1 More Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . 146
6.10.2 The Case Of R^n . . . . . . . . . . . . . . . . . . . . . . . . . 147
6.11 Taylor’s Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
6.11.1 Second Derivative Test . . . . . . . . . . . . . . . . . . . . . . 149
6.12 The Method Of Lagrange Multipliers . . . . . . . . . . . . . . . . . . 151
6.13 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
II Integration And Measure 159
7 Measures And Measurable Functions 161
7.1 Open Coverings And Compactness . . . . . . . . . . . . . . . . . . . 161
7.2 An Outer Measure On P(R) . . . . . . . . . . . . . . . . . . . . . . 164
7.3 Measures And Measure Spaces . . . . . . . . . . . . . . . . . . . . . 167
7.4 Measures From Outer Measures . . . . . . . . . . . . . . . . . . . . . 169
7.5 Measurable Functions . . . . . . . . . . . . . . . . . . . . . . . . . . 173
7.6 One Dimensional Lebesgue Stieltjes Measure . . . . . . . . . . . . . 181
7.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
8 The Abstract Lebesgue Integral 189
8.1 Deﬁnition For Nonnegative Measurable Functions . . . . . . . . . . . 189
8.1.1 Riemann Integrals For Decreasing Functions . . . . . . . . . . 189
8.1.2 The Lebesgue Integral For Nonnegative Functions . . . . . . 190
8.2 The Lebesgue Integral For Nonnegative Simple Functions . . . . . . 191
8.3 The Monotone Convergence Theorem . . . . . . . . . . . . . . . . . 193
8.4 Other Deﬁnitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
8.5 Fatou’s Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
8.6 The Integral’s Righteous Algebraic Desires . . . . . . . . . . . . . . . 195
8.7 The Lebesgue Integral, L^1 . . . . . . . . . . . . . . . . . . . . . . . . 196
8.8 The Dominated Convergence Theorem . . . . . . . . . . . . . . . . . 201
8.9 The One Dimensional Lebesgue Stieltjes Integral . . . . . . . . . . . 203
8.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
9 The Lebesgue Integral For Functions Of n Variables 213
9.1 Completion Of Measure Spaces, Approximation . . . . . . . . . . . . 213
9.2 Dynkin Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
9.3 n Dimensional Lebesgue Measure And Integrals . . . . . . . . . . . . 220
9.3.1 Iterated Integrals . . . . . . . . . . . . . . . . . . . . . . . . . 220
9.3.2 n Dimensional Lebesgue Measure And Integrals . . . . . . . . 221
9.3.3 The Sigma Algebra Of Lebesgue Measurable Sets . . . . . . . 225
9.3.4 Fubini’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 226
9.4 Product Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
9.4.1 General Theory . . . . . . . . . . . . . . . . . . . . . . . . . . 228
9.4.2 Completion Of Product Measure Spaces . . . . . . . . . . . . 231
9.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
10 Lebesgue Measurable Sets 237
10.1 The σ Algebra Of Lebesgue Measurable Sets . . . . . . . . . . . . . 237
10.2 Change Of Variables, Linear Maps . . . . . . . . . . . . . . . . . . . 241
10.3 Covering Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
10.4 Diﬀerentiable Functions And Measurability . . . . . . . . . . . . . . 248
10.5 Change Of Variables, Nonlinear Maps . . . . . . . . . . . . . . . . . 250
10.6 The Mapping Is Only One To One . . . . . . . . . . . . . . . . . . . 255
10.7 Mappings Which Are Not One To One . . . . . . . . . . . . . . . . . 256
10.8 Spherical Coordinates In p Dimensions . . . . . . . . . . . . . . . . . 258
10.9 Brouwer Fixed Point Theorem . . . . . . . . . . . . . . . . . . . . . 261
10.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
11 Approximation Theorems 273
11.1 Bernstein Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . 273
11.2 Stone’s Generalization . . . . . . . . . . . . . . . . . . . . . . . . . . 275
11.3 The Case Of Complex Valued Functions . . . . . . . . . . . . . . . . 278
11.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
12 The L^p Spaces 283
12.1 Basic Inequalities And Properties . . . . . . . . . . . . . . . . . . . . 283
12.2 Completeness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
12.3 Minkowski’s Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . 289
12.4 Density Simple Functions . . . . . . . . . . . . . . . . . . . . . . . . 291
12.5 Density Of C_c(Ω) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
12.6 Continuity Of Translation . . . . . . . . . . . . . . . . . . . . . . . . 293
12.7 Separability, Some Special Functions . . . . . . . . . . . . . . . . . . 293
12.8 Convolutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
12.9 Mollifiers And Density Of C_c^∞(R^n) . . . . . . . . . . . . . . . . . 297
12.10 L^∞ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
12.11 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
13 Fourier Transforms 309
13.1 Fourier Transforms Of Functions In G . . . . . . . . . . . . . . . . . 309
13.2 Fourier Transforms Of Just About Anything . . . . . . . . . . . . . . 311
13.2.1 Fourier Transforms Of G* . . . . . . . . . . . . . . . . . . . . 311
13.2.2 Fourier Transforms Of Functions In L^1(R^n) . . . . . . . . . . 315
13.2.3 Fourier Transforms Of Functions In L^2(R^n) . . . . . . . . . . 317
13.2.4 The Schwartz Class . . . . . . . . . . . . . . . . . . . . . . . 322
13.2.5 Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
13.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
14 Fourier Series 331
14.1 Deﬁnition And Basic Properties . . . . . . . . . . . . . . . . . . . . . 331
14.2 The Riemann Lebesgue Lemma . . . . . . . . . . . . . . . . . . . . . 335
14.3 Dini’s Criterion For Convergence . . . . . . . . . . . . . . . . . . . . 335
14.4 Jordan’s Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
14.5 Integrating And Diﬀerentiating Fourier Series . . . . . . . . . . . . . 342
14.6 Diﬀerentiating Fourier Series . . . . . . . . . . . . . . . . . . . . . . 344
14.7 Ways Of Approximating Functions . . . . . . . . . . . . . . . . . . . 346
14.7.1 Uniform Approximation With Trig. Polynomials . . . . . . . 346
14.7.2 Mean Square Approximation . . . . . . . . . . . . . . . . . . 350
14.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
III Further Topics 361
15 Metric Spaces And General Topological Spaces 363
15.1 Metric Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
15.2 Compactness In Metric Space . . . . . . . . . . . . . . . . . . . . . . 365
15.3 Some Applications Of Compactness . . . . . . . . . . . . . . . . . . . 368
15.4 Ascoli Arzela Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 370
15.5 General Topological Spaces . . . . . . . . . . . . . . . . . . . . . . . 374
15.6 Connected Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
15.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
16 Measure Theory And Topology 389
16.1 Borel Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
16.2 Regular Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
16.3 Locally Compact Hausdorﬀ Spaces . . . . . . . . . . . . . . . . . . . 398
16.4 Positive Linear Functionals . . . . . . . . . . . . . . . . . . . . . . . 402
16.5 Lebesgue Measure And Its Properties . . . . . . . . . . . . . . . . . 410
16.6 Change Of Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
16.7 Fubini’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
16.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
17 Extension Theorems 421
17.1 Algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
17.2 Caratheodory Extension Theorem . . . . . . . . . . . . . . . . . . . 423
17.3 The Tychonoﬀ Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 426
17.4 Kolmogorov Extension Theorem . . . . . . . . . . . . . . . . . . . . 428
17.5 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
17.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 439
18 Banach Spaces 441
18.1 Theorems Based On Baire Category . . . . . . . . . . . . . . . . . . 441
18.1.1 Baire Category Theorem . . . . . . . . . . . . . . . . . . . . . 441
18.1.2 Uniform Boundedness Theorem . . . . . . . . . . . . . . . . . 445
18.1.3 Open Mapping Theorem . . . . . . . . . . . . . . . . . . . . . 446
18.1.4 Closed Graph Theorem . . . . . . . . . . . . . . . . . . . . . 448
18.2 Hahn Banach Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 450
18.3 Uniform Convexity Of L^p . . . . . . . . . . . . . . . . . . . . . . . . 458
18.4 Weak And Weak ∗ Topologies . . . . . . . . . . . . . . . . . . . . . . 464
18.4.1 Basic Deﬁnitions . . . . . . . . . . . . . . . . . . . . . . . . . 464
18.4.2 Banach Alaoglu Theorem . . . . . . . . . . . . . . . . . . . . 465
18.4.3 Eberlein Smulian Theorem . . . . . . . . . . . . . . . . . . . 467
18.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470
19 Hilbert Spaces 477
19.1 Basic Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
19.2 Approximations In Hilbert Space . . . . . . . . . . . . . . . . . . . . 483
19.3 Orthonormal Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486
19.4 Fourier Series, An Example . . . . . . . . . . . . . . . . . . . . . . . 488
19.5 General Theory Of Continuous Semigroups . . . . . . . . . . . . . . 490
19.5.1 An Evolution Equation . . . . . . . . . . . . . . . . . . . . . 499
19.5.2 Adjoints, Hilbert Space . . . . . . . . . . . . . . . . . . . . . 502
19.5.3 Adjoints, Reﬂexive Banach Space . . . . . . . . . . . . . . . . 506
19.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510
20 Representation Theorems 515
20.1 Radon Nikodym Theorem . . . . . . . . . . . . . . . . . . . . . . . . 515
20.2 Vector Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521
20.3 Representation Theorems For The Dual Space Of L^p . . . . . . . . . 529
20.4 The Dual Space Of L^∞(Ω) . . . . . . . . . . . . . . . . . . . . . . . 537
20.5 Non σ Finite Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
20.6 The Dual Space Of C_0(X) . . . . . . . . . . . . . . . . . . . . . . . 546
20.7 The Dual Space Of C_0(X), Another Approach . . . . . . . . . . . . 551
20.8 More Attractive Formulations . . . . . . . . . . . . . . . . . . . . . . 553
20.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554
21 Diﬀerentiation With Respect To General Radon Measures 559
21.1 Besicovitch Covering Theorem . . . . . . . . . . . . . . . . . . . . . 559
21.2 Fundamental Theorem Of Calculus For Radon Measures . . . . . . . 563
21.3 Slicing Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566
21.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 572
22 The Bochner Integral 577
22.1 Strong And Weak Measurability . . . . . . . . . . . . . . . . . . . . 577
22.2 The Essential Bochner Integral . . . . . . . . . . . . . . . . . . . . . 585
22.3 The Spaces L^p(Ω; X) . . . . . . . . . . . . . . . . . . . . . . . . . . 590
22.4 Measurable Representatives . . . . . . . . . . . . . . . . . . . . . . . 596
22.5 Vector Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598
22.6 The Riesz Representation Theorem . . . . . . . . . . . . . . . . . . . 603
23 Hausdorﬀ Measure 611
23.1 Deﬁnition Of Hausdorﬀ Measures . . . . . . . . . . . . . . . . . . . . 611
23.1.1 Properties Of Hausdorﬀ Measure . . . . . . . . . . . . . . . . 612
23.2 H^n And m_n . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615
23.3 Technical Considerations . . . . . . . . . . . . . . . . . . . . . . . . . 617
23.3.1 Steiner Symmetrization . . . . . . . . . . . . . . . . . . . . . 619
23.3.2 The Isodiametric Inequality . . . . . . . . . . . . . . . . . . . 621
23.4 The Proper Value Of β (n) . . . . . . . . . . . . . . . . . . . . . . . . 622
23.4.1 A Formula For α(n) . . . . . . . . . . . . . . . . . . . . . . . 623
23.4.2 Hausdorﬀ Measure And Linear Transformations . . . . . . . . 625
A The Hausdorﬀ Maximal Theorem 627
A.1 The Hamel Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 631
Copyright © 2010
Part I
Prerequisite Material
Some Fundamental Concepts
1.1 Set Theory
1.1.1 Basic Deﬁnitions
A set is a collection of things called elements of the set. For example, the set of
integers, the collection of signed whole numbers such as 1, 2, 4, etc. This set, whose
existence will be assumed, is denoted by Z. Other sets could be the set of people in
a family or the set of donuts in a display case at the store. Sometimes braces,
{ }, specify a set by listing the things which are in the set between the braces.
For example, the set of integers between −1 and 2, including these numbers, could
be denoted as {−1, 0, 1, 2}. The notation signifying x is an element of a set S is
written as x ∈ S. Thus, 1 ∈ {−1, 0, 1, 2, 3}. Here are some axioms about sets.
Axioms are statements which are accepted, not proved.
1. Two sets are equal if and only if they have the same elements.
2. To every set A, and to every condition S (x) there corresponds a set B, whose
elements are exactly those elements x of A for which S (x) holds.
3. For every collection of sets there exists a set that contains all the elements
that belong to at least one set of the given collection.
4. The Cartesian product of a nonempty family of nonempty sets is nonempty.
5. If A is a set there exists a set P(A) such that P(A) is the set of all subsets
of A. This is called the power set.
These axioms are referred to as the axiom of extension, axiom of speciﬁcation,
axiom of unions, axiom of choice, and axiom of powers respectively.
It seems fairly clear you should want to believe in the axiom of extension. It is
merely saying, for example, that ¦1, 2, 3¦ = ¦2, 3, 1¦ since these two sets have the
same elements in them. Similarly, it would seem you should be able to specify a
new set from a given set using some “condition” which can be used as a test to
determine whether the element in question is in the set. For example, the set of all
integers which are multiples of 2. This set could be speciﬁed as follows.
{x ∈ Z : x = 2y for some y ∈ Z} .
In this notation, the colon is read as “such that” and in this case the condition is
being a multiple of 2.
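This set-builder notation maps directly onto a set comprehension. A minimal sketch, using a finite range as a stand-in for Z (an assumption of this illustration, since a computer cannot hold all of Z):

```python
# Finite stand-in for Z; the axiom of specification selects from a given set.
Z = set(range(-10, 11))

# {x in Z : x = 2y for some y in Z} -- the colon reads "such that".
evens = {x for x in Z if any(x == 2 * y for y in Z)}

# The condition "x = 2y for some y" picks out exactly the even elements.
assert evens == set(range(-10, 11, 2))
```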
Another example, of political interest, could be the set of all judges who are not
judicial activists. I think you can see this last is not a very precise condition since
there is no way to determine to everyone's satisfaction whether a given judge is an
activist. Also, just because something is grammatically correct does not
mean it makes any sense. For example, consider the following nonsense.
S = {x ∈ set of dogs : it is colder in the mountains than in the winter} .
So what is a condition?
We will leave these sorts of considerations and assume our conditions make sense.
The axiom of unions states that for any collection of sets, there is a set consisting
of all the elements in each of the sets in the collection. Of course this is also open to
further consideration. What is a collection? Maybe it would be better to say “set
of sets” or, given a set whose elements are sets there exists a set whose elements
consist of exactly those things which are elements of at least one of these sets. If 𝒮
is such a set whose elements are sets,
∪{A : A ∈ 𝒮} or ∪𝒮
signify this union.
Something is in the Cartesian product of a set or “family” of sets if it consists
of a single thing taken from each set in the family. Thus (1, 2, 3) ∈ {1, 4, .2} ×
{1, 2, 7} × {4, 3, 7, 9} because it consists of exactly one element from each of the sets
which are separated by ×. Also, this is the notation for the Cartesian product of
finitely many sets. If 𝒮 is a set whose elements are sets,
∏_{A∈𝒮} A
signifies the Cartesian product.
The Cartesian product is the set of choice functions, a choice function being a
function which selects exactly one element of each set of 𝒮. You may think the axiom
of choice, stating that the Cartesian product of a nonempty family of nonempty sets
is nonempty, is innocuous but there was a time when many mathematicians were
ready to throw it out because it implies things which are very hard to believe, things
which never happen without the axiom of choice.
A is a subset of B, written A ⊆ B, if every element of A is also an element of
B. This can also be written as B ⊇ A. A is a proper subset of B, written A ⊂ B
or B ⊃ A, if A is a subset of B but A is not equal to B, A ≠ B. A ∩ B denotes the
intersection of the two sets A and B and it means the set of elements of A which
are also elements of B. The axiom of speciﬁcation shows this is a set. The empty
set is the set which has no elements in it, denoted as ∅. A ∪ B denotes the union
of the two sets A and B and it means the set of all elements which are in either of
the sets. It is a set because of the axiom of unions.
The complement of a set, (the set of things which are not in the given set ) must
be taken with respect to a given set called the universal set which is a set which
contains the one whose complement is being taken. Thus, the complement of A,
denoted as A^C (or more precisely as X \ A) is a set obtained from using the axiom
of specification to write
A^C ≡ {x ∈ X : x ∉ A}
The symbol ∉ means: “is not an element of”. Note the axiom of specification takes
place relative to a given set. Without this universal set it makes no sense to use
the axiom of specification to obtain the complement.
Words such as “all” or “there exists” are called quantiﬁers and they must be
understood relative to some given set. For example, the set of all integers larger
than 3. Or there exists an integer larger than 7. Such statements have to do with a
given set, in this case the integers. Failure to have a reference set when quantiﬁers
are used turns out to be illogical even though such usage may be grammatically
correct. Quantiﬁers are used often enough that there are symbols for them. The
symbol ∀ is read as “for all” or “for every” and the symbol ∃ is read as “there
exists”. Thus ∀∀∃∃ could mean for every upside down A there exists a backwards
E.
DeMorgan's laws are very useful in mathematics. Let 𝒮 be a set of sets each of
which is contained in some universal set U. Then
∪{A^C : A ∈ 𝒮} = (∩{A : A ∈ 𝒮})^C
and
∩{A^C : A ∈ 𝒮} = (∪{A : A ∈ 𝒮})^C .
These laws follow directly from the definitions. Also following directly from the
definitions are:
Let 𝒮 be a set of sets. Then
B ∪ (∪{A : A ∈ 𝒮}) = ∪{B ∪ A : A ∈ 𝒮} .
and: Let 𝒮 be a set of sets. Then
B ∩ (∪{A : A ∈ 𝒮}) = ∪{B ∩ A : A ∈ 𝒮} .
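These identities can be spot-checked on small finite sets. A sketch (the particular sets and universal set are arbitrary choices made for the example):

```python
U = set(range(8))                       # universal set
S = [{1, 2, 3}, {2, 3, 4}, {2, 5}]     # a small set of sets, each contained in U
comp = lambda A: U - A                  # complement relative to U

union_S = set().union(*S)               # union of all sets in S
inter_S = U.intersection(*S)            # intersection of all sets in S

# DeMorgan's laws: union of complements vs. complement of intersection, and dually.
assert set().union(*(comp(A) for A in S)) == comp(inter_S)
assert U.intersection(*(comp(A) for A in S)) == comp(union_S)

# The two distributive laws stated above.
B = {0, 2, 7}
assert B | union_S == set().union(*(B | A for A in S))
assert B & union_S == set().union(*(B & A for A in S))
```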
Unfortunately, there is no single universal set which can be used for all sets.
Here is why: Suppose there were. Call it S. Then you could consider A the set
of all elements of S which are not elements of themselves, this from the axiom of
speciﬁcation. If A is an element of itself, then it fails to qualify for inclusion in A.
Therefore, it must not be an element of itself. However, if this is so, it qualiﬁes for
inclusion in A so it is an element of itself and so this can’t be true either. Thus
the most basic of conditions you could imagine, that of being an element of, is
meaningless and so allowing such a set causes the whole theory to be meaningless.
The solution is to not allow a universal set. As mentioned by Halmos in Naive
Set Theory, “Nothing contains everything”. Always beware of statements involving
quantiﬁers wherever they occur, even this one. This little observation described
above is due to Bertrand Russell and is called Russell’s paradox.
1.1.2 The Schroder Bernstein Theorem
It is very important to be able to compare the size of sets in a rational way. The
most useful theorem in this context is the Schroder Bernstein theorem which is the
main result to be presented in this section. The Cartesian product is discussed
above. The next deﬁnition reviews this and deﬁnes the concept of a function.
Definition 1.1.1 Let X and Y be sets.
X × Y ≡ {(x, y) : x ∈ X and y ∈ Y }
A relation is defined to be a subset of X × Y. A function f, also called a mapping,
is a relation which has the property that if (x, y) and (x, y_1) are both elements of
f, then y = y_1. The domain of f is defined as
D(f) ≡ {x : (x, y) ∈ f} ,
written as f : D(f) → Y.
It is probably safe to say that most people do not think of functions as a type
of relation which is a subset of the Cartesian product of two sets. A function is like
a machine which takes inputs x and makes them into a unique output f (x). Of
course, that is what the above definition says with more precision. An ordered pair
(x, y) which is an element of the function or mapping has an input x and a unique
output y, denoted as f (x), while the name of the function is f. “mapping” is often
a noun meaning function. However, it also is a verb as in “f is mapping A to B
”. That which a function is thought of as doing is also referred to using the word
“maps” as in: f maps X to Y . However, a set of functions may be called a set of
maps so this word might also be used as the plural of a noun. There is no help for
it. You just have to suﬀer with this nonsense.
The following theorem which is interesting for its own sake will be used to prove
the Schroder Bernstein theorem.
Theorem 1.1.2 Let f : X →Y and g : Y →X be two functions. Then there exist
sets A, B, C, D, such that
A∪ B = X, C ∪ D = Y, A∩ B = ∅, C ∩ D = ∅,
f (A) = C, g (D) = B.
The following picture illustrates the conclusion of this theorem.
[Figure: X is split into A and B = g(D); Y is split into C = f(A) and D; f maps A onto C and g maps D onto B.]
Proof: Consider the empty set ∅ ⊆ X. If y ∈ Y \ f (∅), then g (y) ∉ ∅ because
∅ has no elements. Also, if A, B, C, and D are as described above, A also would
have this same property that the empty set has. However, A is probably larger.
Therefore, say A_0 ⊆ X satisfies T if whenever y ∈ Y \ f (A_0), g (y) ∉ A_0.
𝒜 ≡ {A_0 ⊆ X : A_0 satisfies T}.
Let A = ∪𝒜. If y ∈ Y \ f (A), then for each A_0 ∈ 𝒜, y ∈ Y \ f (A_0) and so
g (y) ∉ A_0. Since g (y) ∉ A_0 for all A_0 ∈ 𝒜, it follows g (y) ∉ A. Hence A satisfies
T and is the largest subset of X which does so. Now define
C ≡ f (A) , D ≡ Y \ C, B ≡ X \ A.
It only remains to verify that g (D) = B.
Suppose x ∈ B = X \ A. Then A ∪ {x} does not satisfy T and so there exists
y ∈ Y \ f (A ∪ {x}) ⊆ D such that g (y) ∈ A ∪ {x}. But y ∉ f (A) and so since A
satisfies T, it follows g (y) ∉ A. Hence g (y) = x and so x ∈ g (D).
Theorem 1.1.3 (Schroder Bernstein) If f : X → Y and g : Y → X are one to
one, then there exists h : X →Y which is one to one and onto.
Proof: Let A, B, C, D be the sets of Theorem 1.1.2 and define
h(x) ≡ f (x) if x ∈ A, and h(x) ≡ g^{-1}(x) if x ∈ B.
Then h is the desired one to one and onto mapping.
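For finite sets the proof is effectively an algorithm: the largest A satisfying the property is the greatest fixed point of the monotone map A ↦ X − g(Y − f(A)), reached by iterating downward from A = X. A hedged sketch (the function name is mine; finite sets only, with the injections supplied as dicts):

```python
def schroder_bernstein(X, Y, f, g):
    """Build a bijection h : X -> Y from injections f : X -> Y and g : Y -> X.

    Mirrors the proof: A is the greatest fixed point of the map sending A
    to X minus g(Y minus f(A)); then B = X minus A equals g(D), and h is
    f on A and the inverse of g on B.
    """
    A = set(X)
    while True:
        D = set(Y) - {f[x] for x in A}        # D = Y minus f(A)
        A_next = set(X) - {g[y] for y in D}   # X minus g(D)
        if A_next == A:                       # fixed point reached
            break
        A = A_next
    g_inv = {g[y]: y for y in Y}              # g is injective, so this inverts it
    return {x: f[x] if x in A else g_inv[x] for x in X}

# Tiny example: two injections between equal-sized finite sets.
f = {0: "b", 1: "c", 2: "a"}
g = {"a": 1, "b": 2, "c": 0}
h = schroder_bernstein(f.keys(), g.keys(), f, g)
assert sorted(h.values()) == ["a", "b", "c"]  # h is onto, hence a bijection here
```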
Recall that the Cartesian product may be considered as the collection of choice
functions.
Definition 1.1.4 Let I be a set and let X_i be a set for each i ∈ I. f is a choice
function written as
f ∈ ∏_{i∈I} X_i
if f (i) ∈ X_i for each i ∈ I.
The axiom of choice says that if X_i ≠ ∅ for each i ∈ I, for I a set, then
∏_{i∈I} X_i ≠ ∅.
Sometimes the two functions, f and g are onto but not one to one. It turns out
that with the axiom of choice, a similar conclusion to the above may be obtained.
Corollary 1.1.5 If f : X → Y is onto and g : Y → X is onto, then there exists
h : X →Y which is one to one and onto.
Proof: For each y ∈ Y, f^{-1}(y) ≡ {x ∈ X : f (x) = y} ≠ ∅. Therefore, by the
axiom of choice, there exists f_0^{-1} ∈ ∏_{y∈Y} f^{-1}(y) which is the same as saying that
for each y ∈ Y, f_0^{-1}(y) ∈ f^{-1}(y). Similarly, there exists g_0^{-1}(x) ∈ g^{-1}(x) for all
x ∈ X. Then f_0^{-1} is one to one because if f_0^{-1}(y_1) = f_0^{-1}(y_2), then
y_1 = f (f_0^{-1}(y_1)) = f (f_0^{-1}(y_2)) = y_2.
Similarly g_0^{-1} is one to one. Therefore, by the Schroder Bernstein theorem, there
exists h : X → Y which is one to one and onto.
Definition 1.1.6 A set S is finite if there exists a natural number n and a map
θ which maps {1, · · · , n} one to one and onto S. S is infinite if it is not finite. A
set S is called countable if there exists a map θ mapping N one to one and onto
S. (When θ maps a set A to a set B, this will be written as θ : A → B in the future.)
Here N ≡ {1, 2, · · · }, the natural numbers. S is at most countable if there exists a
map θ : N → S which is onto.
The property of being at most countable is often referred to as being countable
because the question of interest is normally whether one can list all elements of the
set, designating a ﬁrst, second, third etc. in such a way as to give each element of
the set a natural number. The possibility that a single element of the set may be
counted more than once is often not important.
Theorem 1.1.7 If X and Y are both at most countable, then X × Y is also at most
countable. If either X or Y is countable, then X × Y is also countable.
Proof: It is given that there exists a mapping η : N → X which is onto. Define
η (i) ≡ x_i and consider X as the set {x_1, x_2, x_3, · · · }. Similarly, consider Y as the
set {y_1, y_2, y_3, · · · }. It follows the elements of X × Y are included in the following
rectangular array.

(x_1, y_1) (x_1, y_2) (x_1, y_3) · · ·  ← Those which have x_1 in first slot.
(x_2, y_1) (x_2, y_2) (x_2, y_3) · · ·  ← Those which have x_2 in first slot.
(x_3, y_1) (x_3, y_2) (x_3, y_3) · · ·  ← Those which have x_3 in first slot.
    ⋮          ⋮          ⋮

Follow a path through this array as follows, moving back and forth along the
successive diagonals:

(x_1, y_1) → (x_1, y_2)    (x_1, y_3) →
           ↙             ↗
(x_2, y_1)    (x_2, y_2)
    ↓       ↗
(x_3, y_1)

Thus the first element of X × Y is (x_1, y_1), the second element of X × Y is (x_1, y_2),
the third element of X × Y is (x_2, y_1) etc. This assigns a number from N to each
element of X × Y. Thus X × Y is at most countable.
It remains to show the last claim. Suppose without loss of generality that X
is countable. Then there exists α : N → X which is one to one and onto. Let
β : X × Y → N be defined by β ((x, y)) ≡ α^{-1}(x). Thus β is onto N. By the first
part there exists a function from N onto X × Y. Therefore, by Corollary 1.1.5,
there exists a one to one and onto mapping from X × Y to N.
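The diagonal path can be written as a generator. A sketch (for simplicity it traverses each anti-diagonal in one fixed direction, a slight variation on the back-and-forth path in the proof, which still enumerates every pair):

```python
from itertools import count, islice

def diagonal_pairs(x, y):
    """Enumerate pairs (x(i), y(j)) along successive diagonals i + j = n."""
    for n in count(0):
        for i in range(n + 1):
            yield (x(i), y(n - i))

# With x(i) naming x_{i+1} and y(j) naming y_{j+1}:
first = list(islice(diagonal_pairs(lambda i: f"x{i+1}", lambda j: f"y{j+1}"), 4))
print(first)   # [('x1', 'y1'), ('x1', 'y2'), ('x2', 'y1'), ('x1', 'y3')]
```

This matches the proof: the first three pairs enumerated are (x_1, y_1), (x_1, y_2), (x_2, y_1).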
Theorem 1.1.8 If X and Y are at most countable, then X ∪ Y is at most countable.
If either X or Y is countable, then X ∪ Y is countable.
Proof: As in the preceding theorem,
X = {x_1, x_2, x_3, · · · }
and
Y = {y_1, y_2, y_3, · · · } .
Consider the following array consisting of X ∪ Y and a path through it.

x_1 → x_2    x_3 →
    ↙     ↗
y_1 → y_2

Thus the first element of X ∪ Y is x_1, the second is x_2, the third is y_1, the fourth is
y_2 etc.
Consider the second claim. By the first part, there is a map from N onto X ∪ Y.
Suppose without loss of generality that X is countable and α : N → X is one to one
and onto. Then define β (y) ≡ 1 for all y ∈ Y, and β (x) ≡ α^{-1}(x). Thus β maps
X ∪ Y onto N and this shows there exist two onto maps, one mapping X ∪ Y onto N
and the other mapping N onto X ∪ Y. Then Corollary 1.1.5 yields the conclusion.
1.1.3 Equivalence Relations
There are many ways to compare elements of a set other than to say two elements
are equal or the same. For example, in the set of people let two people be equivalent
if they have the same weight. This would not be saying they were the same
person, just that they weighed the same. Often such relations involve considering
one characteristic of the elements of a set and then saying the two elements are
equivalent if they are the same as far as the given characteristic is concerned.
Deﬁnition 1.1.9 Let S be a set. ∼ is an equivalence relation on S if it satisﬁes
the following axioms.
1. x ∼ x for all x ∈ S. (Reﬂexive)
2. If x ∼ y then y ∼ x. (Symmetric)
3. If x ∼ y and y ∼ z, then x ∼ z. (Transitive)
Deﬁnition 1.1.10 [x] denotes the set of all elements of S which are equivalent to
x and [x] is called the equivalence class determined by x or just the equivalence class
of x.
With the above deﬁnition one can prove the following simple theorem.
Theorem 1.1.11 Let ∼ be an equivalence relation deﬁned on a set S and let H denote
the set of equivalence classes. Then if [x] and [y] are two of these equivalence classes,
either x ∼ y and [x] = [y], or it is not true that x ∼ y and [x] ∩ [y] = ∅.
1.2 limsup And liminf
It is assumed in all that is done that ℝ is complete. There are two ways to describe
completeness of ℝ. One is to say that every bounded set has a least upper bound and
a greatest lower bound. The other is to say that every Cauchy sequence converges.
These two equivalent notions of completeness will be taken as given.

The symbol F will mean either ℝ or C. The symbol [−∞, ∞] will mean all real
numbers along with +∞ and −∞, which are points which we pretend are at the
right and left ends of the real line respectively. The inclusion of these make believe
points makes the statement of certain theorems less trouble.
Deﬁnition 1.2.1 For A ⊆ [−∞, ∞], A ≠ ∅, sup A is deﬁned as the least upper
bound in case A is bounded above by a real number and equals ∞ if A is not bounded
above. Similarly inf A is deﬁned to equal the greatest lower bound in case A is
bounded below by a real number and equals −∞ in case A is not bounded below.

Lemma 1.2.2 If {A_n} is an increasing sequence in [−∞, ∞], then

sup {A_n} = lim_{n→∞} A_n.

Similarly, if {A_n} is decreasing, then

inf {A_n} = lim_{n→∞} A_n.
Proof: Let sup({A_n : n ∈ N}) = r. In the ﬁrst case, suppose r < ∞. Then
letting ε > 0 be given, there exists n such that A_n ∈ (r − ε, r]. Since {A_n} is
increasing, it follows that if m > n, then r − ε < A_n ≤ A_m ≤ r and so lim_{n→∞} A_n = r
as claimed. In the case where r = ∞, then if a is a real number, there exists n
such that A_n > a. Since {A_k} is increasing, it follows that if m > n, A_m > a. But
this is what is meant by lim_{n→∞} A_n = ∞. The other case is that r = −∞. But
in this case, A_n = −∞ for all n and so lim_{n→∞} A_n = −∞. The case where A_n is
decreasing is entirely similar.
Sometimes the limit of a sequence does not exist. For example, if a_n = (−1)^n,
then lim_{n→∞} a_n does not exist. This is because the terms of the sequence are a
distance of 2 apart. Therefore there can't exist a single number such that all the
terms of the sequence are ultimately within 1/4 of that number. The nice thing
about limsup and liminf is that they always exist. First here is a simple lemma
and deﬁnition.
Deﬁnition 1.2.3 Denote by [−∞, ∞] the real line along with symbols ∞ and −∞.
It is understood that ∞ is larger than every real number and −∞ is smaller than
every real number. Then if {A_n} is an increasing sequence of points of [−∞, ∞],
lim_{n→∞} A_n equals ∞ if the only upper bound of the set {A_n} is ∞. If {A_n} is
bounded above by a real number, then lim_{n→∞} A_n is deﬁned in the usual way and
equals the least upper bound of {A_n}. If {A_n} is a decreasing sequence of points of
[−∞, ∞], lim_{n→∞} A_n equals −∞ if the only lower bound of the sequence {A_n} is
−∞. If {A_n} is bounded below by a real number, then lim_{n→∞} A_n is deﬁned in the
usual way and equals the greatest lower bound of {A_n}. More simply, if {A_n} is
increasing,

lim_{n→∞} A_n = sup {A_n}

and if {A_n} is decreasing then

lim_{n→∞} A_n = inf {A_n}.
Lemma 1.2.4 Let {a_n} be a sequence of real numbers and let U_n ≡ sup {a_k : k ≥ n}.
Then {U_n} is a decreasing sequence. Also if L_n ≡ inf {a_k : k ≥ n}, then {L_n} is
an increasing sequence. Therefore, lim_{n→∞} L_n and lim_{n→∞} U_n both exist.

Proof: Let W_n be an upper bound for {a_k : k ≥ n}. Then since these sets are
getting smaller, it follows that for m < n, W_m is an upper bound for {a_k : k ≥ n}.
In particular if W_m = U_m, then U_m is an upper bound for {a_k : k ≥ n} and so U_m
is at least as large as U_n, the least upper bound for {a_k : k ≥ n}. The claim that
{L_n} is increasing is similar.
From the lemma, the following deﬁnition makes sense.
Deﬁnition 1.2.5 Let {a_n} be any sequence of points of [−∞, ∞]. Then

lim sup_{n→∞} a_n ≡ lim_{n→∞} sup {a_k : k ≥ n}

lim inf_{n→∞} a_n ≡ lim_{n→∞} inf {a_k : k ≥ n}.
Theorem 1.2.6 Suppose {a_n} is a sequence of real numbers and that lim sup_{n→∞} a_n
and lim inf_{n→∞} a_n are both real numbers. Then lim_{n→∞} a_n exists if and only if

lim inf_{n→∞} a_n = lim sup_{n→∞} a_n

and in this case,

lim_{n→∞} a_n = lim inf_{n→∞} a_n = lim sup_{n→∞} a_n.

Proof: First note that

sup {a_k : k ≥ n} ≥ inf {a_k : k ≥ n}

and so from Theorem 3.2.7,

lim sup_{n→∞} a_n ≡ lim_{n→∞} sup {a_k : k ≥ n} ≥ lim_{n→∞} inf {a_k : k ≥ n} ≡ lim inf_{n→∞} a_n.
Suppose ﬁrst that lim_{n→∞} a_n exists and is a real number. Then by Theorem 3.5.3,
{a_n} is a Cauchy sequence. Therefore, if ε > 0 is given, there exists N such that if
m, n ≥ N, then

|a_n − a_m| < ε/3.

From the deﬁnition of sup {a_k : k ≥ N}, there exists n_1 ≥ N such that

sup {a_k : k ≥ N} ≤ a_{n_1} + ε/3.

Similarly, there exists n_2 ≥ N such that

inf {a_k : k ≥ N} ≥ a_{n_2} − ε/3.

It follows that

sup {a_k : k ≥ N} − inf {a_k : k ≥ N} ≤ |a_{n_1} − a_{n_2}| + 2ε/3 < ε.

Since the sequence {sup {a_k : k ≥ N}}_{N=1}^∞ is decreasing and {inf {a_k : k ≥ N}}_{N=1}^∞
is increasing, it follows from Theorem 3.2.7 that

0 ≤ lim_{N→∞} sup {a_k : k ≥ N} − lim_{N→∞} inf {a_k : k ≥ N} ≤ ε.
Since ε is arbitrary, this shows

lim_{N→∞} sup {a_k : k ≥ N} = lim_{N→∞} inf {a_k : k ≥ N}.   (1.2.1)

Next suppose 1.2.1 holds. Then

lim_{N→∞} (sup {a_k : k ≥ N} − inf {a_k : k ≥ N}) = 0.

Since sup {a_k : k ≥ N} ≥ inf {a_k : k ≥ N}, it follows that for every ε > 0, there
exists N such that

sup {a_k : k ≥ N} − inf {a_k : k ≥ N} < ε.

Thus if m, n > N, then

|a_m − a_n| < ε,

which means {a_n} is a Cauchy sequence. Since ℝ is complete, it follows that
lim_{n→∞} a_n ≡ a exists. By the squeezing theorem, it follows that

a = lim inf_{n→∞} a_n = lim sup_{n→∞} a_n.
With the above theorem, here is how to deﬁne the limit of a sequence of points
in [−∞, ∞].

Deﬁnition 1.2.7 Let {a_n} be a sequence of points of [−∞, ∞]. Then lim_{n→∞} a_n
exists exactly when

lim inf_{n→∞} a_n = lim sup_{n→∞} a_n

and in this case

lim_{n→∞} a_n ≡ lim inf_{n→∞} a_n = lim sup_{n→∞} a_n.
The signiﬁcance of limsup and liminf, in addition to what was just discussed,
is contained in the following theorem which follows quickly from the deﬁnition.
Theorem 1.2.8 Suppose {a_n} is a sequence of points of [−∞, ∞]. Let

λ = lim sup_{n→∞} a_n.

Then if b > λ, it follows there exists N such that whenever n ≥ N,

a_n ≤ b.

If c < λ, then a_n > c for inﬁnitely many values of n. Let

γ = lim inf_{n→∞} a_n.

Then if d < γ, it follows there exists N such that whenever n ≥ N,

a_n ≥ d.

If e > γ, it follows a_n < e for inﬁnitely many values of n.
The proof of this theorem is left as an exercise for you. It follows directly from
the deﬁnition and it is the sort of thing you must do yourself. Here is one other
simple proposition.
Proposition 1.2.9 Let lim_{n→∞} a_n = a > 0. Then

lim sup_{n→∞} a_n b_n = a lim sup_{n→∞} b_n.

Proof: This follows from the deﬁnition. Let λ_n = sup {a_k b_k : k ≥ n}. For all n
large enough, a_n > a − ε where ε is small enough that a − ε > 0. Therefore,

λ_n ≥ sup {b_k : k ≥ n} (a − ε)

for all n large enough. Then

lim sup_{n→∞} a_n b_n = lim_{n→∞} λ_n ≥ lim_{n→∞} (sup {b_k : k ≥ n} (a − ε)) = (a − ε) lim sup_{n→∞} b_n.

Similar reasoning shows

lim sup_{n→∞} a_n b_n ≤ (a + ε) lim sup_{n→∞} b_n.

Now since ε > 0 is arbitrary, the conclusion follows.
1.3 Double Series
Sometimes it is required to consider double series which are of the form

∑_{k=m}^∞ ∑_{j=m}^∞ a_{jk} ≡ ∑_{k=m}^∞ ( ∑_{j=m}^∞ a_{jk} ).

In other words, ﬁrst sum on j, yielding something which depends on k, and then sum
these. The major consideration for these double series is the question of when

∑_{k=m}^∞ ∑_{j=m}^∞ a_{jk} = ∑_{j=m}^∞ ∑_{k=m}^∞ a_{jk}.

In other words, when does it make no diﬀerence which subscript is summed over
ﬁrst? In the case of ﬁnite sums there is no issue here. You can always write

∑_{k=m}^M ∑_{j=m}^N a_{jk} = ∑_{j=m}^N ∑_{k=m}^M a_{jk}
because addition is commutative. However, there are limits involved with inﬁnite
sums and the interchange in order of summation involves taking limits in a diﬀerent
order. Therefore, it is not always true that it is permissible to interchange the
two sums. A general rule of thumb is this: If something involves changing the
order in which two limits are taken, you may not do it without agonizing over
the question. In general, limits foul up algebra and also introduce things which are
counter intuitive. Here is an example. This example is a little technical. It is placed
here just to prove conclusively there is a question which needs to be considered.
Example 1.3.1 Consider the following picture which depicts some of the ordered
pairs (m, n) where m, n are positive integers.

[Figure: a grid of points (m, n), each labeled with the value of a_{mn}. Almost all
labels are 0; one diagonal of points is labeled c, an adjacent diagonal is labeled −c,
and two points near the origin are labeled a and b.]

The numbers next to the points are the values of a_{mn}. You see a_{nn} = 0 for all
n, a_{21} = a, a_{12} = b, a_{mn} = c for (m, n) on the line y = 1 + x whenever m > 1, and
a_{mn} = −c for all (m, n) on the line y = x − 1 whenever m > 2.
Then ∑_{m=1}^∞ a_{mn} = a if n = 1, ∑_{m=1}^∞ a_{mn} = b − c if n = 2, and if n > 2,
∑_{m=1}^∞ a_{mn} = 0. Therefore,

∑_{n=1}^∞ ∑_{m=1}^∞ a_{mn} = a + b − c.

Next observe that ∑_{n=1}^∞ a_{mn} = b if m = 1, ∑_{n=1}^∞ a_{mn} = a + c if m = 2, and
∑_{n=1}^∞ a_{mn} = 0 if m > 2. Therefore,

∑_{m=1}^∞ ∑_{n=1}^∞ a_{mn} = b + a + c

and so the two sums are diﬀerent. Moreover, you can see that by assigning diﬀerent
values of a, b, and c, you can get an example for any two diﬀerent numbers desired.
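The failure of interchange in this example can be checked numerically. The sketch below uses our own helper names; each inner sum is taken over a block large enough to contain every nonzero entry of the relevant rows and columns, so it equals the exact inﬁnite inner sum.

```python
def a_mn(m, n, a, b, c):
    """The array of Example 1.3.1, with 1-indexed m (x-coordinate) and
    n (y-coordinate)."""
    if (m, n) == (2, 1):
        return a
    if (m, n) == (1, 2):
        return b
    if n == m + 1 and m > 1:
        return c        # points on the line y = 1 + x
    if n == m - 1 and m > 2:
        return -c       # points on the line y = x - 1
    return 0

def sum_m_inside(a, b, c, N):
    """sum_{n=1}^{N} sum_{m=1}^{2N} a_mn  (sum on m first)."""
    return sum(sum(a_mn(m, n, a, b, c) for m in range(1, 2 * N + 1))
               for n in range(1, N + 1))

def sum_n_inside(a, b, c, N):
    """sum_{m=1}^{N} sum_{n=1}^{2N} a_mn  (sum on n first)."""
    return sum(sum(a_mn(m, n, a, b, c) for n in range(1, 2 * N + 1))
               for m in range(1, N + 1))
```

With a = 1, b = 2, c = 5, summing on m ﬁrst gives a + b − c = −2 while summing on n ﬁrst gives a + b + c = 8, matching the computation above.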
It turns out that if a_{ij} ≥ 0 for all i, j, then you can always interchange the order
of summation. This is shown next and is based on the following lemma. First, some
notation should be discussed.

Deﬁnition 1.3.2 Let f(a, b) ∈ [−∞, ∞] for a ∈ A and b ∈ B where A, B are
sets, which means that f(a, b) is either a number, ∞, or −∞. The symbol +∞ is
interpreted as a point out at the end of the number line which is larger than every
real number. Of course there is no such number. That is why it is called ∞. The
symbol −∞ is interpreted similarly. Then sup_{a∈A} f(a, b) means sup(S_b) where

S_b ≡ {f(a, b) : a ∈ A}.
Unlike limits, you can take the sup in diﬀerent orders.
Lemma 1.3.3 Let f(a, b) ∈ [−∞, ∞] for a ∈ A and b ∈ B where A, B are sets.
Then

sup_{a∈A} sup_{b∈B} f(a, b) = sup_{b∈B} sup_{a∈A} f(a, b).

Proof: Note that for all a, b, f(a, b) ≤ sup_{b∈B} sup_{a∈A} f(a, b) and therefore, for
all a, sup_{b∈B} f(a, b) ≤ sup_{b∈B} sup_{a∈A} f(a, b). Therefore,

sup_{a∈A} sup_{b∈B} f(a, b) ≤ sup_{b∈B} sup_{a∈A} f(a, b).

Repeat the same argument interchanging a and b, to get the conclusion of the
lemma.
Theorem 1.3.4 Let a_{ij} ≥ 0. Then

∑_{i=1}^∞ ∑_{j=1}^∞ a_{ij} = ∑_{j=1}^∞ ∑_{i=1}^∞ a_{ij}.

Proof: First note there is no trouble in deﬁning these sums because the a_{ij} are
all nonnegative. If a sum diverges, it only diverges to ∞ and so ∞ is the value of
the sum. Next note that

∑_{j=r}^∞ ∑_{i=r}^∞ a_{ij} ≥ sup_n ∑_{j=r}^∞ ∑_{i=r}^n a_{ij}

because for all j,

∑_{i=r}^∞ a_{ij} ≥ ∑_{i=r}^n a_{ij}.

Therefore,

∑_{j=r}^∞ ∑_{i=r}^∞ a_{ij} ≥ sup_n ∑_{j=r}^∞ ∑_{i=r}^n a_{ij} = sup_n lim_{m→∞} ∑_{j=r}^m ∑_{i=r}^n a_{ij}
= sup_n lim_{m→∞} ∑_{i=r}^n ∑_{j=r}^m a_{ij} = sup_n ∑_{i=r}^n lim_{m→∞} ∑_{j=r}^m a_{ij}
= sup_n ∑_{i=r}^n ∑_{j=r}^∞ a_{ij} = lim_{n→∞} ∑_{i=r}^n ∑_{j=r}^∞ a_{ij} = ∑_{i=r}^∞ ∑_{j=r}^∞ a_{ij}.

Interchanging the i and j in the above argument proves the theorem.
Row reduced echelon form
2.1 Elementary Matrices
The elementary matrices result from doing a row operation to the identity matrix.
Deﬁnition 2.1.1 The row operations consist of the following
1. Switch two rows.
2. Multiply a row by a nonzero number.
3. Replace a row by the same row added to a multiple of another row.
We refer to these as the row operations of types 1, 2, and 3 respectively.
The elementary matrices are given in the following deﬁnition.
Deﬁnition 2.1.2 The elementary matrices consist of those matrices which result
by applying a row operation to an identity matrix. Those which involve switching
rows of the identity are called permutation matrices.¹
As an example of why these elementary matrices are interesting, consider the
following.
⎡ 0 1 0 ⎤ ⎡ a b c d ⎤   ⎡ x y z w ⎤
⎢ 1 0 0 ⎥ ⎢ x y z w ⎥ = ⎢ a b c d ⎥
⎣ 0 0 1 ⎦ ⎣ f g h i ⎦   ⎣ f g h i ⎦

A 3 × 4 matrix was multiplied on the left by an elementary matrix which was
obtained from row operation 1 applied to the identity matrix. This resulted in
applying the operation 1 to the given matrix. This is what happens in general.
¹More generally, a permutation matrix is a matrix which comes by permuting the rows of the
identity matrix, not just switching two rows.
Now consider what these elementary matrices look like. First P_{ij}, which involves
switching row i and row j of the identity where i < j. We write

I =
⎡ r_1 ⎤
⎢  ⋮  ⎥
⎢ r_i ⎥
⎢  ⋮  ⎥
⎢ r_j ⎥
⎢  ⋮  ⎥
⎣ r_n ⎦

where

r_j = ( 0 · · · 1 · · · 0 )

with the 1 in the j-th position from the left.

This matrix P_{ij} is of the form

⎡ r_1 ⎤
⎢  ⋮  ⎥
⎢ r_j ⎥
⎢  ⋮  ⎥
⎢ r_i ⎥
⎢  ⋮  ⎥
⎣ r_n ⎦

Now consider what this does to a column vector.

⎡ r_1 ⎤ ⎡ v_1 ⎤   ⎡ v_1 ⎤
⎢  ⋮  ⎥ ⎢  ⋮  ⎥   ⎢  ⋮  ⎥
⎢ r_j ⎥ ⎢ v_i ⎥   ⎢ v_j ⎥
⎢  ⋮  ⎥ ⎢  ⋮  ⎥ = ⎢  ⋮  ⎥
⎢ r_i ⎥ ⎢ v_j ⎥   ⎢ v_i ⎥
⎢  ⋮  ⎥ ⎢  ⋮  ⎥   ⎢  ⋮  ⎥
⎣ r_n ⎦ ⎣ v_n ⎦   ⎣ v_n ⎦
Now we try multiplication of a matrix on the left by this elementary matrix P_{ij}.
Consider

P_{ij} ⎡ a_{11} a_{12} · · · a_{1p} ⎤
       ⎢   ⋮      ⋮           ⋮    ⎥
       ⎢ a_{i1} a_{i2} · · · a_{ip} ⎥
       ⎢   ⋮      ⋮           ⋮    ⎥
       ⎢ a_{j1} a_{j2} · · · a_{jp} ⎥
       ⎢   ⋮      ⋮           ⋮    ⎥
       ⎣ a_{n1} a_{n2} · · · a_{np} ⎦

From the way you multiply matrices this is a matrix which has the indicated
columns.

( P_{ij} ( a_{11}, · · · , a_{i1}, · · · , a_{j1}, · · · , a_{n1} )ᵀ,
  P_{ij} ( a_{12}, · · · , a_{i2}, · · · , a_{j2}, · · · , a_{n2} )ᵀ,
  · · · ,
  P_{ij} ( a_{1p}, · · · , a_{ip}, · · · , a_{jp}, · · · , a_{np} )ᵀ )

= ( ( a_{11}, · · · , a_{j1}, · · · , a_{i1}, · · · , a_{n1} )ᵀ,
    ( a_{12}, · · · , a_{j2}, · · · , a_{i2}, · · · , a_{n2} )ᵀ,
    · · · ,
    ( a_{1p}, · · · , a_{jp}, · · · , a_{ip}, · · · , a_{np} )ᵀ )

=
⎡ a_{11} a_{12} · · · a_{1p} ⎤
⎢   ⋮      ⋮           ⋮    ⎥
⎢ a_{j1} a_{j2} · · · a_{jp} ⎥
⎢   ⋮      ⋮           ⋮    ⎥
⎢ a_{i1} a_{i2} · · · a_{ip} ⎥
⎢   ⋮      ⋮           ⋮    ⎥
⎣ a_{n1} a_{n2} · · · a_{np} ⎦

This has established the following lemma.
Lemma 2.1.3 Let P_{ij} denote the elementary matrix which involves switching the
i-th and the j-th rows of I. Then if P_{ij}, A are conformable, we have

P_{ij} A = B

where B is obtained from A by switching the i-th and the j-th rows.
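Lemma 2.1.3 is easy to check numerically. A small sketch using plain nested lists; the helper names `permutation_matrix` and `mat_mul` are ours:

```python
def permutation_matrix(n, i, j):
    """P_ij: the n x n identity matrix with rows i and j switched
    (0-indexed here, unlike the 1-indexed text)."""
    P = [[1 if r == c else 0 for c in range(n)] for r in range(n)]
    P[i], P[j] = P[j], P[i]
    return P

def mat_mul(A, B):
    """Ordinary matrix multiplication of nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]

A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]
P = permutation_matrix(3, 0, 1)
B = mat_mul(P, A)   # B is A with rows 0 and 1 switched
```

Multiplying P by itself also recovers the identity, which is the identity P_{ij} P_{ij} = I stated later in Theorem 2.1.6.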
Next consider the row operation which involves multiplying the i-th row by a
nonzero constant, c. We write

I =
⎡ r_1 ⎤
⎢ r_2 ⎥
⎢  ⋮  ⎥
⎣ r_n ⎦

where

r_j = ( 0 · · · 1 · · · 0 )

with the 1 in the j-th position from the left. The elementary matrix which results
from applying this operation to the i-th row of the identity matrix is of the form

E(c, i) =
⎡ r_1 ⎤
⎢  ⋮  ⎥
⎢ cr_i ⎥
⎢  ⋮  ⎥
⎣ r_n ⎦

Now consider what this does to a column vector.

⎡ r_1 ⎤ ⎡ v_1 ⎤   ⎡ v_1 ⎤
⎢  ⋮  ⎥ ⎢  ⋮  ⎥   ⎢  ⋮  ⎥
⎢ cr_i ⎥ ⎢ v_i ⎥ = ⎢ cv_i ⎥
⎢  ⋮  ⎥ ⎢  ⋮  ⎥   ⎢  ⋮  ⎥
⎣ r_n ⎦ ⎣ v_n ⎦   ⎣ v_n ⎦

Denote by E(c, i) this elementary matrix which multiplies the i-th row of the identity
by the nonzero constant c. Then from what was just discussed and the way matrices
are multiplied,

E(c, i) ⎡ a_{11} a_{12} · · · a_{1p} ⎤
        ⎢   ⋮      ⋮           ⋮    ⎥
        ⎢ a_{i1} a_{i2} · · · a_{ip} ⎥
        ⎢   ⋮      ⋮           ⋮    ⎥
        ⎣ a_{n1} a_{n2} · · · a_{np} ⎦

equals a matrix having the columns indicated below.

=
⎡ a_{11}  a_{12}  · · · a_{1p} ⎤
⎢   ⋮       ⋮            ⋮    ⎥
⎢ ca_{i1} ca_{i2} · · · ca_{ip} ⎥
⎢   ⋮       ⋮            ⋮    ⎥
⎣ a_{n1}  a_{n2}  · · · a_{np} ⎦
This proves the following lemma.

Lemma 2.1.4 Let E(c, i) denote the elementary matrix corresponding to the row
operation in which the i-th row is multiplied by the nonzero constant c. Thus E(c, i)
involves multiplying the i-th row of the identity matrix by c. Then

E(c, i) A = B

where B is obtained from A by multiplying the i-th row of A by c.
Finally consider the third of these row operations. Letting r_j be the j-th row of
the identity matrix, denote by E(c·i + j) the elementary matrix obtained from
the identity matrix by replacing r_j with r_j + cr_i. In case i < j this will be of the
form

⎡ r_1 ⎤
⎢  ⋮  ⎥
⎢ r_i ⎥
⎢  ⋮  ⎥
⎢ cr_i + r_j ⎥
⎢  ⋮  ⎥
⎣ r_n ⎦

Now consider what this does to a column vector.

⎡ r_1 ⎤ ⎡ v_1 ⎤   ⎡ v_1 ⎤
⎢  ⋮  ⎥ ⎢  ⋮  ⎥   ⎢  ⋮  ⎥
⎢ r_i ⎥ ⎢ v_i ⎥   ⎢ v_i ⎥
⎢  ⋮  ⎥ ⎢  ⋮  ⎥ = ⎢  ⋮  ⎥
⎢ cr_i + r_j ⎥ ⎢ v_j ⎥   ⎢ cv_i + v_j ⎥
⎢  ⋮  ⎥ ⎢  ⋮  ⎥   ⎢  ⋮  ⎥
⎣ r_n ⎦ ⎣ v_n ⎦   ⎣ v_n ⎦

Now from this and the way matrices are multiplied,

E(c·i + j) ⎡ a_{11} a_{12} · · · a_{1p} ⎤
           ⎢   ⋮      ⋮           ⋮    ⎥
           ⎢ a_{i1} a_{i2} · · · a_{ip} ⎥
           ⎢   ⋮      ⋮           ⋮    ⎥
           ⎢ a_{j1} a_{j2} · · · a_{jp} ⎥
           ⎢   ⋮      ⋮           ⋮    ⎥
           ⎣ a_{n1} a_{n2} · · · a_{np} ⎦

equals a matrix of the following form having the indicated columns.

( E(c·i + j) ( a_{11}, · · · , a_{i1}, · · · , a_{j1}, · · · , a_{n1} )ᵀ,
  E(c·i + j) ( a_{12}, · · · , a_{i2}, · · · , a_{j2}, · · · , a_{n2} )ᵀ,
  · · · ,
  E(c·i + j) ( a_{1p}, · · · , a_{ip}, · · · , a_{jp}, · · · , a_{np} )ᵀ )

=
⎡ a_{11}           a_{12}           · · · a_{1p} ⎤
⎢   ⋮                ⋮                     ⋮    ⎥
⎢ a_{i1}           a_{i2}           · · · a_{ip} ⎥
⎢   ⋮                ⋮                     ⋮    ⎥
⎢ a_{j1} + ca_{i1}  a_{j2} + ca_{i2}  · · · a_{jp} + ca_{ip} ⎥
⎢   ⋮                ⋮                     ⋮    ⎥
⎣ a_{n1}           a_{n2}           · · · a_{np} ⎦

The case where i > j is similar. This proves the following lemma in which, as above,
the i-th row of the identity is r_i.
Lemma 2.1.5 Let E(c·i + j) denote the elementary matrix obtained from I by
replacing the j-th row of the identity r_j with cr_i + r_j. Letting the k-th row of A be
a_k,

E(c·i + j) A = B

where B has the same rows as A except the j-th row of B is ca_i + a_j.
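The other two elementary matrices can be checked the same way as the permutation matrix. A sketch with our own helper names, building E(c, i) and E(c·i + j) (0-indexed) and applying them on the left:

```python
def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

def scale_matrix(n, c, i):
    """E(c, i): the identity with row i multiplied by the nonzero c."""
    E = identity(n)
    E[i] = [c * x for x in E[i]]
    return E

def add_row_matrix(n, c, i, j):
    """E(c*i + j): the identity with row j replaced by c*(row i) + (row j).
    Assumes i != j, so this just puts c in position (j, i)."""
    E = identity(n)
    E[j][i] += c
    return E

def mat_mul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]

A = [[1, 2], [3, 4], [5, 6]]
B1 = mat_mul(scale_matrix(3, 10, 1), A)      # row 1 of A multiplied by 10
B2 = mat_mul(add_row_matrix(3, 2, 0, 2), A)  # row 2 becomes 2*(row 0) + row 2
```

Composing E(c·i + j) with E(−c·i + j) recovers the identity, anticipating identity 2.1.1 of the next theorem.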
The above lemmas are summarized in the following theorem.
Theorem 2.1.6 To perform any of the three row operations on a matrix A it sufﬁces
to do the row operation on the identity matrix, obtaining an elementary matrix
E, and then take the product EA. In addition to this, the following identities hold
for the elementary matrices described above.

E(c·i + j) E(−c·i + j) = E(−c·i + j) E(c·i + j) = I   (2.1.1)

E(c, i) E(c⁻¹, i) = E(c⁻¹, i) E(c, i) = I   (2.1.2)

P_{ij} P_{ij} = I   (2.1.3)

Proof: Consider 2.1.1. Starting with I and taking −c times the i-th row added
to the j-th yields E(−c·i + j), which diﬀers from I only in the j-th row. Now
multiplying on the left by E(c·i + j) takes c times the i-th row and adds it to the j-th,
thus restoring the j-th row to its original state. Thus E(c·i + j) E(−c·i + j) =
I. Similarly E(−c·i + j) E(c·i + j) = I. The reasoning is similar for 2.1.2 and
2.1.3.
Deﬁnition 2.1.7 For an n × n matrix A, an n × n matrix B which has the property
that AB = BA = I is denoted by A⁻¹. Such a matrix is called an inverse. When
A has an inverse, it is called invertible.
The following lemma says that if a matrix acts like an inverse, then it is the
inverse. Also, the product of invertible matrices is invertible.
Lemma 2.1.8 If B, C are both inverses of A, then B = C. That is, there exists at
most one inverse of a matrix. If A_1, · · · , A_m are each invertible n × n matrices,
then the product A_1 A_2 · · · A_m is also invertible and

(A_1 A_2 · · · A_m)⁻¹ = A_m⁻¹ A_{m−1}⁻¹ · · · A_1⁻¹.
Proof. From the deﬁnition and associative law of matrix multiplication,

B = BI = B(AC) = (BA)C = IC = C.

This proves the uniqueness of the inverse.

Next suppose A, B are invertible. Then

AB(B⁻¹A⁻¹) = A(BB⁻¹)A⁻¹ = AIA⁻¹ = AA⁻¹ = I

and also

(B⁻¹A⁻¹)AB = B⁻¹(A⁻¹A)B = B⁻¹IB = B⁻¹B = I.

It follows from Deﬁnition 2.1.7 that AB has an inverse and it is B⁻¹A⁻¹. Thus the
case of m = 1, 2 in the claim of the lemma is true. Suppose this claim is true for k.
Then

A_1 A_2 · · · A_k A_{k+1} = (A_1 A_2 · · · A_k) A_{k+1}.

By induction, the two matrices (A_1 A_2 · · · A_k), A_{k+1} are both invertible and

(A_1 A_2 · · · A_k)⁻¹ = A_k⁻¹ · · · A_2⁻¹ A_1⁻¹.

By the case of the product of two invertible matrices shown above,

((A_1 A_2 · · · A_k) A_{k+1})⁻¹ = A_{k+1}⁻¹ (A_1 A_2 · · · A_k)⁻¹ = A_{k+1}⁻¹ A_k⁻¹ · · · A_2⁻¹ A_1⁻¹.
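The reverse-order formula can be veriﬁed for a concrete product of elementary matrices. A sketch with our own helper names; exact rational arithmetic avoids any rounding:

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]

def mat_mul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]

def scale(n, c, i):
    """E(c, i): identity with row i multiplied by c (0-indexed)."""
    E = identity(n)
    E[i] = [c * x for x in E[i]]
    return E

def add_row(n, c, i, j):
    """E(c*i + j): identity with c placed in position (j, i), i != j."""
    E = identity(n)
    E[j][i] += c
    return E

n = 3
E1, E2 = scale(n, Fraction(5), 1), add_row(n, Fraction(7), 0, 2)
E1_inv, E2_inv = scale(n, Fraction(1, 5), 1), add_row(n, Fraction(-7), 0, 2)
product_inverse = mat_mul(E2_inv, E1_inv)   # inverses in REVERSE order
check = mat_mul(mat_mul(E1, E2), product_inverse)
```

Here `check` comes out to the identity matrix, as the lemma predicts; multiplying the inverses in the original order would not.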
We will discuss methods for ﬁnding the inverse later. For now, observe that
Theorem 2.1.6 says that elementary matrices are invertible and that the inverse of
such a matrix is also an elementary matrix. The major conclusion of the above
Lemma and Theorem is the following lemma about linear relationships.

Deﬁnition 2.1.9 Let v_1, · · · , v_k, u be vectors. Then u is said to be a linear combination
of the vectors {v_1, · · · , v_k} if there exist scalars c_1, · · · , c_k such that

u = ∑_{i=1}^k c_i v_i.

We also say that when the above holds for some scalars c_1, · · · , c_k, there exists a
linear relationship between the vector u and the vectors {v_1, · · · , v_k}.
We will discuss this more later, but the following picture illustrates the geometric
signiﬁcance of the vectors which have a linear relationship with two vectors u, v
pointing in diﬀerent directions.

[Figure: the vectors u and v drawn in xyz-space; the vectors having a linear
relationship with u and v ﬁll out the plane containing u and v.]
The following lemma states that linear relationships between columns in a ma
trix are preserved by row operations. This simple lemma is the main result in
understanding all the major questions related to the row reduced echelon form as
well as many other topics.
Lemma 2.1.10 Let A and B be two m × n matrices and suppose B results from a
row operation applied to A. Then the k-th column of B is a linear combination of the
i_1, · · · , i_r columns of B if and only if the k-th column of A is a linear combination of
the i_1, · · · , i_r columns of A. Furthermore, the scalars in the linear combinations are
the same. (The linear relationship between the k-th column of A and the i_1, · · · , i_r
columns of A is the same as the linear relationship between the k-th column of B
and the i_1, · · · , i_r columns of B.)
Proof. Let A be the following matrix in which the a_k are the columns

( a_1 a_2 · · · a_n )

and let B be the following matrix in which the columns are given by the b_k

( b_1 b_2 · · · b_n ).

Then by Theorem 2.1.6 on Page 32, b_k = E a_k where E is an elementary matrix.
Suppose then that one of the columns of A is a linear combination of some other
columns of A. Say

a_k = c_1 a_{i_1} + · · · + c_r a_{i_r}.

Then multiplying by E,

b_k = E a_k = c_1 E a_{i_1} + · · · + c_r E a_{i_r} = c_1 b_{i_1} + · · · + c_r b_{i_r}.
This is really just an extension of the technique for ﬁnding solutions to a linear
system of equations. In solving a system of equations earlier, row operations were
used to exhibit the last column of an augmented matrix as a linear combination of
the preceding columns. The row reduced echelon form makes obvious all linear
relationships between all columns, not just the last column and those preceding it.
2.2 The row reduced echelon form of a matrix
When you do row operations on a matrix, there is an ultimate conclusion. It is
called the row reduced echelon form. We show here that every matrix has such
a row reduced echelon form and that this row reduced echelon form is unique. The
signiﬁcance is that it becomes possible to use the deﬁnite article in referring to the
row reduced echelon form. Hence important conclusions about the original matrix
may be logically deduced from an examination of its unique row reduced echelon
form. First we need the following deﬁnition.
Deﬁnition 2.2.1 Deﬁne special column vectors e_i as follows.

e_i = ( 0 · · · 1 · · · 0 )ᵀ

Thus e_i is the column vector which has all zero entries except for a 1 in the i-th
position down from the top.
Now here is the description of the row reduced echelon form.
Deﬁnition 2.2.2 An m × n matrix is said to be in row reduced echelon form if,
in viewing successive columns from left to right, the ﬁrst nonzero column encountered
is e_1 and, if in viewing the columns of the matrix from left to right you have
encountered e_1, e_2, · · · , e_k, the next column is either e_{k+1} or this next column is a
linear combination of the vectors e_1, e_2, · · · , e_k.

The n × n identity matrix

I = ⎡ 1 0 · · · 0 ⎤
    ⎢ 0 1 · · · 0 ⎥
    ⎢ ⋮ ⋮  ⋱  ⋮ ⎥
    ⎣ 0 0 · · · 1 ⎦

is row reduced. So too are, for example,

A = ⎡ 0 1 ⎤ ,   B = ⎡ 1 0 0 ⎤ ,   C = ⎡ 1 2 0 0 ⎤
    ⎣ 0 0 ⎦        ⎢ 0 1 3 ⎥        ⎢ 0 0 1 0 ⎥
                   ⎣ 0 0 0 ⎦        ⎣ 0 0 0 1 ⎦
Deﬁnition 2.2.3 Given a matrix A, row reduction produces one and only one row
reduced matrix B with A ∼ B. See Corollary 2.2.9.
Theorem 2.2.4 Let A be an m × n matrix. Then A has a row reduced echelon
form determined by a simple process.
Proof. Viewing the columns of A from left to right, take the ﬁrst nonzero
column. Pick a nonzero entry in this column and switch the row containing this
entry with the top row of A. Now divide this new top row by the value of this
nonzero entry to get a 1 in this position and then use row operations to make all
entries below this equal to zero. Thus the ﬁrst nonzero column is now e_1. Denote
the resulting matrix by A_1. Consider the submatrix of A_1 to the right of this
column and below the ﬁrst row. Do exactly the same thing for this submatrix that
was done for A. This time the e_1 will refer to F^{m−1}. Use the ﬁrst 1 obtained by the
above process which is in the top row of this submatrix, and row operations, to zero
out every entry above it in the rows of A_1. Call the resulting matrix A_2. Thus A_2
satisﬁes the conditions of the above deﬁnition up to the column just encountered.
Continue this way till every column has been dealt with and the result must be in
row reduced echelon form.
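The process described in this proof can be sketched as a short routine. This is an illustrative sketch in our own notation, not the text's; exact fractions keep the arithmetic exact.

```python
from fractions import Fraction

def rref(A):
    """Row reduced echelon form of a matrix given as a list of rows,
    following the process of Theorem 2.2.4: locate the next usable
    nonzero column, create a leading 1 there, and use row operations to
    zero out every other entry in that column."""
    M = [[Fraction(x) for x in row] for row in A]
    m, n = len(M), len(M[0])
    row = 0
    for col in range(n):
        # find a row at or below `row` with a nonzero entry in this column
        pivot = next((r for r in range(row, m) if M[r][col] != 0), None)
        if pivot is None:
            continue                       # nothing to do in this column
        M[row], M[pivot] = M[pivot], M[row]            # type 1 operation
        M[row] = [x / M[row][col] for x in M[row]]     # type 2: leading 1
        for r in range(m):                             # type 3 operations
            if r != row and M[r][col] != 0:
                M[r] = [a - M[r][col] * b for a, b in zip(M[r], M[row])]
        row += 1
        if row == m:
            break
    return M
```

Applied to the matrix of Example 2.2.6 below, this reproduces the row reduced echelon form shown there, including the fractions 64/35, −4/35, and −9/35.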
Now here is some terminology which is often used.
Deﬁnition 2.2.5 The ﬁrst pivot column of A is the ﬁrst nonzero column of A
which becomes e_1 in the row reduced echelon form. The next pivot column is the
ﬁrst column after this which becomes e_2 in the row reduced echelon form. The third
is the next column which becomes e_3 in the row reduced echelon form, and so forth.
The algorithm just described for obtaining a row reduced echelon form shows
that these columns are well deﬁned, but we will deal with this issue more carefully
in Corollary 2.2.9 where we show that every matrix corresponds to exactly one row
reduced echelon form.
Example 2.2.6 Determine the pivot columns for the matrix

A = ⎡ 2 1 3  6 2 ⎤
    ⎢ 1 7 8  4 0 ⎥   (2.2.4)
    ⎣ 1 3 4 −2 2 ⎦

As described above, the k-th pivot column of A is the column in which e_k appears
in the row reduced echelon form for the ﬁrst time in reading from left to right. A
row reduced echelon form for A is

⎡ 1 0 1 0  64/35 ⎤
⎢ 0 1 1 0 −4/35 ⎥
⎣ 0 0 0 1 −9/35 ⎦

It follows that columns 1, 2, and 4 in A are pivot columns.
Note that from Lemma 2.1.10 the last column of A has a linear relationship to
the ﬁrst four columns. Namely

⎡ 2 ⎤           ⎡ 2 ⎤            ⎡ 1 ⎤            ⎡  6 ⎤
⎢ 0 ⎥ = (64/35) ⎢ 1 ⎥ + (−4/35) ⎢ 7 ⎥ + (−9/35) ⎢  4 ⎥
⎣ 2 ⎦           ⎣ 1 ⎦            ⎣ 3 ⎦            ⎣ −2 ⎦

This linear relationship is revealed by the row reduced echelon form but it was not
apparent from the original matrix.
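This relationship is just arithmetic and can be conﬁrmed directly; a sketch using exact fractions, with our own variable names:

```python
from fractions import Fraction

# columns 1, 2, 4 of A (the pivot columns) and the last column of A
col1 = [2, 1, 1]
col2 = [1, 7, 3]
col4 = [6, 4, -2]
col5 = [2, 0, 2]

# the scalars read off from the last column of the row reduced echelon form
coeffs = [Fraction(64, 35), Fraction(-4, 35), Fraction(-9, 35)]
combo = [coeffs[0] * x + coeffs[1] * y + coeffs[2] * z
         for x, y, z in zip(col1, col2, col4)]
```

Here `combo` equals the last column of A, exactly as Lemma 2.1.10 guarantees.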
Deﬁnition 2.2.7 Two matrices A, B are said to be row equivalent if B can be
obtained from A by a sequence of row operations. When A is row equivalent to B,
we write A ∼ B.
Proposition 2.2.8 In the notation of Deﬁnition 2.2.7, A ∼ A. If A ∼ B, then
B ∼ A. If A ∼ B and B ∼ C, then A ∼ C.
Proof: That A ∼ A is obvious. Consider the second claim. By Theorem 2.1.6,
there exist elementary matrices E_1, E_2, · · · , E_m such that

B = E_1 E_2 · · · E_m A.

It follows from Lemma 2.1.8 that (E_1 E_2 · · · E_m)⁻¹ exists and equals the product of
the inverses of these matrices in the reverse order. Thus

E_m⁻¹ E_{m−1}⁻¹ · · · E_1⁻¹ B = (E_1 E_2 · · · E_m)⁻¹ B
= (E_1 E_2 · · · E_m)⁻¹ (E_1 E_2 · · · E_m) A = A.

By Theorem 2.1.6, each E_k⁻¹ is an elementary matrix. By Theorem 2.1.6 again, the
above shows that A results from a sequence of row operations applied to B. The
last claim is left for an exercise.
There are three choices for row operations at each step in Theorem 2.2.4. A
natural question is whether the same row reduced echelon matrix always results in
the end from following any sequence of row operations.
We have already made use of the following observation in ﬁnding a linear relationship
between the columns of the matrix A in 2.2.4, but here it is stated more
formally. Now
( x_1, · · · , x_n )ᵀ = x_1 e_1 + · · · + x_n e_n,

so to say two column vectors are equal is to say the column vectors are the same
linear combination of the special vectors e_j.
Corollary 2.2.9 The row reduced echelon form is unique. That is if B, C are two
matrices in row reduced echelon form and both are obtained from A by a sequence
of row operations, then B = C.
Proof: Suppose B and C are both row reduced echelon forms for the matrix
A. It follows that B and C have zero columns in the same positions because row
operations do not aﬀect zero columns. By Proposition 2.2.8, B and C are row
equivalent. Suppose e_1, · · · , e_r occur in B for the ﬁrst time, reading from left to
right, in positions i_1, · · · , i_r respectively. Then from the description of the row
reduced echelon form, each of these columns of B, in positions i_1, · · · , i_r, is not a
linear combination of the preceding columns. Since C is row equivalent to B, it
follows from Lemma 2.1.10 that each column of C in positions i_1, · · · , i_r is not a
linear combination of the preceding columns of C. By the description of the row
reduced echelon form, e_1, · · · , e_r occur for the ﬁrst time in C in positions i_1, · · · , i_r
respectively. Therefore, both B and C have the sequence e_1, e_2, · · · , e_r occurring for
the ﬁrst time in the positions i_1, i_2, · · · , i_r. Since these matrices are row equivalent,
it follows from Lemma 2.1.10 that the columns between the i_k and i_{k+1} positions in
the two matrices are linear combinations, involving the same scalars, of the columns
in the i_1, · · · , i_k positions. Similarly, the columns after the i_r position are linear
combinations of the columns in the i_1, · · · , i_r positions involving the same scalars
in both matrices. This is equivalent to the assertion that each of these columns is
identical in B and C.
Now with the above corollary, here is a very fundamental observation. The
number of nonzero rows in the row reduced echelon form is the same as the number
of pivot columns. Namely, this number is r in both cases where e_1, · · · , e_r are the
pivot columns in the row reduced echelon form. Now consider a matrix which looks
like this: (more columns than rows).

Corollary 2.2.10 Suppose A is an m × n matrix and that m < n. That is, the
number of rows is less than the number of columns. Then one of the columns of A
is a linear combination of the preceding columns of A. Also, there exists x ∈ F^n
such that x ≠ 0 and Ax = 0.
Proof: Since m < n, not all the columns of A can be pivot columns. In reading from left to right, pick the first one which is not a pivot column. Then from the description of the row reduced echelon form, this column is a linear combination of the preceding columns. Say

a_j = x_1 a_1 + ··· + x_{j−1} a_{j−1} .
Therefore, from the way we multiply a matrix times a vector,

A (x_1, ..., x_{j−1}, −1, 0, ..., 0)^T = ( a_1 ··· a_{j−1} a_j ··· a_n ) (x_1, ..., x_{j−1}, −1, 0, ..., 0)^T = 0.
2.3 Finding the inverse of a matrix
We have already discussed the idea of an inverse of a matrix in Definition 2.1.7. Recall that the inverse of an n × n matrix A is a matrix B such that

AB = BA = I,

where I is the identity matrix. In Theorem 2.1.6, it was shown that an elementary matrix is invertible and that its inverse is also an elementary matrix. We also showed in Lemma 2.1.8 that the product of invertible matrices is invertible and that the inverse of this product is the product of the inverses in the reverse order. In this section, we consider the problem of finding an inverse for a given n × n matrix. Recall that A has an inverse, denoted by A^{−1}, if AA^{−1} = A^{−1}A = I. The procedure for finding the inverse is called the Gauss-Jordan procedure.
Procedure 2.3.1 Suppose A is an n × n matrix. To find A^{−1} if it exists, form the augmented n × 2n matrix

(A | I)

and then, if possible, do row operations until you obtain an n × 2n matrix of the form

(I | B) . (2.3.5)

When this has been done, B = A^{−1}. If it is impossible to row reduce to a matrix of the form (I | B), then A has no inverse.
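Procedure 2.3.1 is easy to carry out by machine. The following Python sketch is my own illustration, not part of the text; it uses numpy and adds partial pivoting for numerical stability, which the hand procedure does not require. It row reduces (A | I) and reads off B = A^{−1} from the right half:

```python
import numpy as np

def gauss_jordan_inverse(A, tol=1e-12):
    """Row reduce (A | I) to (I | B); when this succeeds, B = A^{-1}."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])              # the augmented n x 2n matrix (A | I)
    for col in range(n):
        pivot = col + np.argmax(np.abs(M[col:, col]))   # partial pivoting
        if abs(M[pivot, col]) < tol:
            raise ValueError("matrix is not invertible")
        M[[col, pivot]] = M[[pivot, col]]      # swap rows (a row operation)
        M[col] /= M[col, col]                  # scale the pivot row
        for r in range(n):                     # clear the rest of the pivot column
            if r != col:
                M[r] -= M[r, col] * M[col]
    return M[:, n:]                            # the right half is B = A^{-1}
```

For example, the inverse of [[2, 1], [1, 1]] is [[1, −1], [−1, 2]], and multiplying on either side recovers the identity, in line with Theorem 2.3.2 below.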
The procedure just described actually yields a right inverse: a matrix B such that AB = I. Is this right inverse, which we have called the inverse, really the inverse, that is, the matrix which gives the identity when multiplied on either side?
Theorem 2.3.2 Suppose A, B are n × n matrices and AB = I. Then it follows that BA = I also, and so B = A^{−1}. For n × n matrices, the left inverse, right inverse and inverse are all the same thing.
Proof. If AB = I for A, B n × n matrices, is BA = I? If AB = I, there exists a unique solution x to the equation

Bx = y

for any choice of y. In fact,

x = A(Bx) = Ay.

This means the row reduced echelon form of B must be I. Thus every column is a pivot column. Otherwise, there exists a free variable and the solution, if it exists, would not be unique, contrary to what was just shown must happen if AB = I. It follows that a right inverse B^{−1} for B exists. The above procedure yields

(B | I) → (I | B^{−1}) .

Now multiply both sides of the equation AB = I on the right by B^{−1}. Then

A = A(BB^{−1}) = (AB)B^{−1} = B^{−1}.

Thus A is the right inverse of B and so BA = I. This shows that if AB = I, then BA = I also. Exchanging the roles of A and B, we see that if BA = I, then AB = I.
This has shown that in the context of n × n matrices, right inverses, left inverses and inverses are all the same, and this matrix is called A^{−1}.
The following corollary is also of interest.
Corollary 2.3.3 An n × n matrix A has an inverse if and only if the row reduced echelon form of A is I.

Proof. First suppose the row reduced echelon form of A is I. Then Procedure 2.3.1 yields a right inverse for A. By Theorem 2.3.2 this is the inverse. Next suppose A has an inverse. Then there exists a unique solution x to the equation

Ax = y,

given by x = A^{−1}y. It follows that in the augmented matrix

(A | 0)

there are no free variables, and so every column to the left of the bar is a pivot column. Therefore, the row reduced echelon form of A is I.
This also shows the following major theorem.
Theorem 2.3.4 An n × n matrix is invertible if and only if there is a sequence of elementary matrices {E_1, ..., E_s} such that A = E_1 ··· E_s.

Proof: If A is invertible, then there exist elementary matrices E_i such that

E_1 ··· E_{s−1} E_s A = I.

Hence, multiplying by the inverses of these elementary matrices,

A = E_s^{−1} ··· E_2^{−1} E_1^{−1},

and as shown above, the inverse of an elementary matrix is an elementary matrix. This proves the theorem in one direction. The other direction is obvious because, as pointed out above, the product of invertible matrices is invertible.
2.4 The rank of a matrix
With the existence and uniqueness of the row reduced echelon form, it is a natural
step to deﬁne the rank of a matrix.
Definition 2.4.1 Let A be an m × n matrix. Then the rank of A is defined to be the number of pivot columns. From the description of the row reduced echelon form, this is also equal to the number of nonzero rows in the row reduced echelon form. The nullity of A is the number of non-pivot columns. This is the same as the number of free variables in the augmented matrix (A | 0).
The rank is important because of the following proposition.
Proposition 2.4.2 Let A be an m × n matrix which has rank r. Then there exists a set of r columns of A such that every other column is a linear combination of these columns. Furthermore, none of these columns is a linear combination of the other r − 1 columns in the set. The rank of A is no larger than the minimum of m and n. Also the rank added to the nullity equals n.

Proof. Since the rank of A is r, it follows that A has exactly r pivot columns. Thus, in the row reduced echelon form, every column is a linear combination of these pivot columns, and none of the pivot columns is a linear combination of the other pivot columns. By Lemma 2.1.10 the same is true of the columns in the original matrix A. There are at most min(m, n) pivot columns (nonzero rows). Therefore, the rank of A is no larger than min(m, n) as claimed. Since every column is either a pivot column or isn't, this shows that the rank added to the nullity equals n.
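Definition 2.4.1 and the rank-plus-nullity count can be checked numerically. The following sketch is my own illustration, not from the text; the tolerance and pivoting strategy are implementation choices. It counts pivot columns while row reducing and returns nullity as n minus the rank:

```python
import numpy as np

def rank_and_nullity(A, tol=1e-10):
    """Count pivot columns by row reducing A; nullity = n - rank."""
    A = np.asarray(A, dtype=float).copy()
    m, n = A.shape
    row = 0
    rank = 0
    for col in range(n):
        if row >= m:
            break
        pivot = row + np.argmax(np.abs(A[row:, col]))
        if abs(A[pivot, col]) < tol:
            continue                       # not a pivot column: a free variable
        A[[row, pivot]] = A[[pivot, row]]  # swap rows
        A[row] /= A[row, col]              # scale the pivot row
        for r in range(m):                 # clear the rest of the pivot column
            if r != row:
                A[r] -= A[r, col] * A[row]
        rank += 1
        row += 1
    return rank, n - rank
```

On the 3 × 3 matrix with rows (1, 2, 3), (2, 4, 6), (1, 1, 1), the second row is twice the first, so the rank is 2 and the nullity is 1, and indeed rank + nullity = n = 3.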
Sequences
3.1 The Inner Product And Dot Product
The inner product is defined for vectors in C^n or R^n. To avoid making a distinction, I will use the symbol F. Vectors are denoted by bold face letters. Thus a will denote the vector

a = (a_1, ..., a_n) .

Scalars are denoted by letters which are not in bold face.
Definition 3.1.1 Let a, b ∈ F^n. Define (a, b) as

(a, b) ≡ ∑_{k=1}^{n} a_k \overline{b_k} .

This is called the inner product.
With this definition, there are several important properties satisfied by the inner product. In the statement of these properties, α and β will denote scalars, and a, b, c will denote vectors, in other words, points in F^n. The following proposition comes directly from the definition of the inner product.

Proposition 3.1.2 The inner product satisfies the following properties.

(a, b) = \overline{(b, a)} (3.1.1)
(a, a) ≥ 0 and equals zero if and only if a = 0 (3.1.2)
((αa + βb), c) = α(a, c) + β(b, c) (3.1.3)
(c, (αa + βb)) = \overline{α}(c, a) + \overline{β}(c, b) (3.1.4)
|a|² ≡ (a, a) (3.1.5)
Example 3.1.3 Find ((1, 2, 0, −1), (0, i, 2, 3)).

This equals 0 + 2(−i) + 0 + (−1)(3) = −3 − 2i.
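Definition 3.1.1 translates directly into code. The following is a minimal sketch of my own, not part of the text; it computes (a, b) as the sum of a_k times the conjugate of b_k and reproduces Example 3.1.3:

```python
def inner(a, b):
    """Inner product on F^n: sum over k of a_k * conjugate(b_k)."""
    assert len(a) == len(b)
    return sum(x * complex(y).conjugate() for x, y in zip(a, b))

# Example 3.1.3: ((1, 2, 0, -1), (0, i, 2, 3)) = -3 - 2i
print(inner([1, 2, 0, -1], [0, 1j, 2, 3]))   # (-3-2j)
```

Note that (a, a) = ∑ |a_k|² comes out real and nonnegative, which is property 3.1.2.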
For any inner product there is always the Cauchy-Schwarz inequality.

Theorem 3.1.4 The inner product satisfies the inequality

|(a, b)| ≤ |a| |b| . (3.1.6)

Furthermore, equality is obtained if and only if one of a or b is a scalar multiple of the other.
Proof: First define θ ∈ C such that

\overline{θ} (a, b) = |(a, b)| ,  |θ| = 1,

and define a function of t ∈ R,

f(t) = ((a + tθb), (a + tθb)) .

Then by 3.1.2, f(t) ≥ 0 for all t ∈ R. Also from 3.1.3, 3.1.4, 3.1.1, and 3.1.5,

f(t) = (a, (a + tθb)) + (tθb, (a + tθb))
     = (a, a) + t\overline{θ}(a, b) + tθ(b, a) + t²|θ|²(b, b)
     = |a|² + 2t Re(\overline{θ}(a, b)) + |b|² t²
     = |a|² + 2t |(a, b)| + |b|² t².

Now if |b|² = 0, it must be the case that (a, b) = 0 because otherwise you could pick large negative values of t and violate f(t) ≥ 0. Therefore, in this case the Cauchy-Schwarz inequality holds.

In the case that |b| ≠ 0, y = f(t) is a polynomial which opens up, and therefore, if it is always nonnegative, the quadratic formula requires that the discriminant satisfy

4 |(a, b)|² − 4 |a|² |b|² ≤ 0,

since otherwise the function f(t) would have two real zeros and would necessarily have a graph which dips below the t axis. This proves 3.1.6.

It is clear from the axioms of the inner product that equality holds in 3.1.6 whenever one of the vectors is a scalar multiple of the other. It only remains to verify that this is the only way equality can occur. If either vector equals zero, then equality is obtained in 3.1.6, so it can be assumed both vectors are nonzero. Then if equality is achieved, it follows f(t) has exactly one real zero because the discriminant vanishes. Therefore, for some value of t, a + tθb = 0, showing that a is a multiple of b.
You should note that the entire argument was based only on the properties of the inner product listed in 3.1.1 - 3.1.5. This means that whenever something satisfies these properties, the Cauchy-Schwarz inequality holds. There are many other instances of these properties besides vectors in F^n.

The Cauchy-Schwarz inequality allows a proof of the triangle inequality for distances in F^n in much the same way as the triangle inequality for the absolute value.
Theorem 3.1.5 (Triangle inequality) For a, b ∈ F^n,

|a + b| ≤ |a| + |b| (3.1.7)

and equality holds if and only if one of the vectors is a nonnegative scalar multiple of the other. Also

||a| − |b|| ≤ |a − b| . (3.1.8)
Proof: By properties of the inner product and the Cauchy-Schwarz inequality,

|a + b|² = ((a + b), (a + b))
         = (a, a) + (a, b) + (b, a) + (b, b)
         = |a|² + 2 Re(a, b) + |b|²
         ≤ |a|² + 2 |(a, b)| + |b|²
         ≤ |a|² + 2 |a| |b| + |b|² = (|a| + |b|)².

Taking square roots of both sides, you obtain 3.1.7.

It remains to consider when equality occurs. If either vector equals zero, then that vector equals zero times the other vector, and the claim about when equality occurs is verified. Therefore, it can be assumed both vectors are nonzero. To get equality in the second inequality above, Theorem 3.1.4 implies one of the vectors must be a multiple of the other. Say b = αa. Also, to get equality in the first inequality, (a, b) must be a nonnegative real number. Thus

0 ≤ (a, b) = (a, αa) = \overline{α} |a|².

Therefore, α must be a real number which is nonnegative.

To get the other form of the triangle inequality, write a = a − b + b, so

|a| = |a − b + b| ≤ |a − b| + |b| .

Therefore,

|a| − |b| ≤ |a − b| . (3.1.9)

Similarly,

|b| − |a| ≤ |b − a| = |a − b| . (3.1.10)

It follows from 3.1.9 and 3.1.10 that 3.1.8 holds. This is because ||a| − |b|| equals the left side of either 3.1.9 or 3.1.10, and either way ||a| − |b|| ≤ |a − b|.
3.1.1 The Dot Product
If you forget about the conjugate, the resulting product will be referred to as the dot product. Thus

a · b = ∑_{i=1}^{n} a_i b_i .

This dot product satisfies the following properties.

a · b = b · a (3.1.11)
(αa + βb) · c = α(a · c) + β(b · c) (3.1.12)
c · (αa + βb) = α(c · a) + β(c · b) (3.1.13)

Note that it is the same thing as the inner product if the vectors are in R^n. However, in case the vectors are in C^n, this dot product will not satisfy a · a ≥ 0. For example, (1, 2i) · (1, 2i) = 1 − 4 = −3. However, ((1, 2i), (1, 2i)) = 1 + 4 = 5. Usually we are considering R^n, so it makes no difference.
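The failure of a · a ≥ 0 in C^n is easy to see in code. A minimal sketch of my own illustration:

```python
def dot(a, b):
    """Dot product without conjugation: sum over i of a_i * b_i."""
    return sum(x * y for x, y in zip(a, b))

v = [1, 2j]
# (1, 2i) . (1, 2i) = 1 + 4i^2 = -3: not a nonnegative real number,
# while the inner product ((1, 2i), (1, 2i)) = 1 + 4 = 5.
print(dot(v, v))                      # (-3+0j)
print(sum(abs(x) ** 2 for x in v))    # 5.0
```

This is exactly why the conjugate appears in Definition 3.1.1: it makes (a, a) a sum of squared absolute values.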
3.2 Vector Valued Sequences And Their Limits
Functions defined on the set of integers larger than a given integer which have values in a vector space are called vector valued sequences. I will always assume the vector space is a normed vector space. Actually, it will be specialized even more to F^n, although everything can be done for an arbitrary vector space. When it creates no difficulties, I will state certain definitions and easy theorems in the more general context and use the symbol || · || to refer to the norm. Other than this, the notation is almost the same as it was when the sequences had values in C. The main difference is that certain variables are placed in bold face to indicate they are vectors. Even this is not really necessary, but it is conventional to do it. The concept of subsequence is also the same as it was for sequences of numbers. To review,
Definition 3.2.1 Let {a_n} be a sequence and let n_1 < n_2 < n_3 < ··· be any strictly increasing list of integers such that n_1 is at least as large as the first number in the domain of the function. Then if b_k ≡ a_{n_k}, {b_k} is called a subsequence of {a_n}.
Example 3.2.2 Let a_n = (n + 1, sin(1/n)). Then {a_n}_{n=1}^∞ is a vector valued sequence.
The definition of a limit of a vector valued sequence is given next. It is just like the definition given for sequences of scalars. However, here the symbol | · | refers to the usual norm in F^n. In a general normed vector space, it will be denoted by || · ||.
Definition 3.2.3 A vector valued sequence {a_n}_{n=1}^∞ converges to a in a normed vector space V, written as

lim_{n→∞} a_n = a  or  a_n → a,

if and only if for every ε > 0 there exists n_ε such that whenever n ≥ n_ε,

||a_n − a|| < ε.
In words the deﬁnition says that given any measure of closeness ε, the terms of
the sequence are eventually this close to a. Here, the word “eventually” refers to n
being suﬃciently large.
Theorem 3.2.4 If lim_{n→∞} a_n = a and lim_{n→∞} a_n = a_1, then a_1 = a.
Proof: Suppose a_1 ≠ a. Then let 0 < ε < ||a_1 − a|| / 2 in the definition of the limit. It follows there exists n_ε such that if n ≥ n_ε, then ||a_n − a|| < ε and ||a_n − a_1|| < ε. Therefore, for such n,

||a_1 − a|| ≤ ||a_1 − a_n|| + ||a_n − a||
           < ε + ε < ||a_1 − a|| / 2 + ||a_1 − a|| / 2 = ||a_1 − a|| ,

a contradiction.
Theorem 3.2.5 Suppose {a_n} and {b_n} are vector valued sequences and that

lim_{n→∞} a_n = a and lim_{n→∞} b_n = b.

Also suppose x and y are scalars in F. Then

lim_{n→∞} (x a_n + y b_n) = x a + y b. (3.2.14)

Also,

lim_{n→∞} (a_n · b_n) = a · b. (3.2.15)

If {x_n} is a sequence of scalars in F converging to x and if {a_n} is a sequence of vectors in F^n converging to a, then

lim_{n→∞} x_n a_n = x a. (3.2.16)

Also, if {x_k} is a sequence of vectors in F^n, then x_k → x if and only if for each j,

lim_{k→∞} x_k^j = x^j, (3.2.17)

where here

x_k = (x_k^1, ..., x_k^n),  x = (x^1, ..., x^n).
Proof: Consider the first claim. By the triangle inequality,

||x a + y b − (x a_n + y b_n)|| ≤ |x| ||a − a_n|| + |y| ||b − b_n|| .

By definition, there exists n_ε such that if n ≥ n_ε,

||a − a_n|| , ||b − b_n|| < ε / (2(1 + |x| + |y|)),

so for n > n_ε,

||x a + y b − (x a_n + y b_n)|| < |x| ε / (2(1 + |x| + |y|)) + |y| ε / (2(1 + |x| + |y|)) ≤ ε.
Now consider the second. Let ε > 0 be given and choose n_1 such that if n ≥ n_1 then

|a_n − a| < 1.

For such n, it follows from the Cauchy-Schwarz inequality and properties of the inner product that

|a_n · b_n − a · b| ≤ |a_n · b_n − a_n · b| + |a_n · b − a · b|
                   ≤ |a_n| |b_n − b| + |b| |a_n − a|
                   ≤ (|a| + 1) |b_n − b| + |b| |a_n − a| .

Now let n_2 be large enough that for n ≥ n_2,

|b_n − b| < ε / (2(|a| + 1)), and |a_n − a| < ε / (2(|b| + 1)).

Such a number exists because of the definition of limit. Therefore, let

n_ε > max(n_1, n_2).

For n ≥ n_ε,

|a_n · b_n − a · b| ≤ (|a| + 1) |b_n − b| + |b| |a_n − a|
                   < (|a| + 1) ε / (2(|a| + 1)) + |b| ε / (2(|b| + 1)) ≤ ε.

This proves 3.2.15. The claim 3.2.16 is left for you to do.
Finally consider the last claim. If 3.2.17 holds, then from the definition of distance in F^n,

lim_{k→∞} |x − x_k| ≡ lim_{k→∞} √( ∑_{j=1}^{n} |x^j − x_k^j|² ) = 0.

On the other hand, if lim_{k→∞} |x − x_k| = 0, then since |x_k^j − x^j| ≤ |x − x_k|, it follows from the squeezing theorem that

lim_{k→∞} |x_k^j − x^j| = 0.
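The last claim of Theorem 3.2.5 — convergence in F^n is exactly componentwise convergence — can be observed numerically. A minimal sketch; the particular sequence x_k = (1/k, 1 − 1/k) is my own illustrative choice, not from the text:

```python
import math

def norm(v):
    """Euclidean norm on R^n."""
    return math.sqrt(sum(t * t for t in v))

x = (0.0, 1.0)                          # the claimed limit of x_k = (1/k, 1 - 1/k)
for k in (10, 100, 1000):
    xk = (1.0 / k, 1.0 - 1.0 / k)
    d = norm(tuple(a - b for a, b in zip(xk, x)))
    # each component error is squeezed by the norm of the difference ...
    assert all(abs(a - b) <= d for a, b in zip(xk, x))
    # ... and the norm is controlled by the component errors
    assert d <= sum(abs(a - b) for a, b in zip(xk, x))
```

The two assertions are the two directions of the argument: the squeeze |x_k^j − x^j| ≤ |x − x_k|, and the fact that the norm is small once every component error is.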
An important theorem is the one which states that if a sequence converges, so
does every subsequence. You should review Deﬁnition 3.2.1 at this point. The proof
is identical to the one involving sequences of numbers.
Theorem 3.2.6 Let {x_n} be a vector valued sequence with lim_{n→∞} x_n = x, and let {x_{n_k}} be a subsequence. Then lim_{k→∞} x_{n_k} = x.
Proof: Let ε > 0 be given. Then there exists n_ε such that if n > n_ε, then ||x_n − x|| < ε. Suppose k > n_ε. Then n_k ≥ k > n_ε and so

||x_{n_k} − x|| < ε,

showing lim_{k→∞} x_{n_k} = x as claimed.
Theorem 3.2.7 Let {x_n} be a sequence of real numbers and suppose each x_n ≤ l (respectively ≥ l) and lim_{n→∞} x_n = x. Then x ≤ l (respectively ≥ l). More generally, suppose {x_n} and {y_n} are two sequences such that lim_{n→∞} x_n = x and lim_{n→∞} y_n = y. Then if x_n ≤ y_n for all n sufficiently large, then x ≤ y.
Proof: Let ε > 0 be given. Then for n large enough,

l ≥ x_n > x − ε

and so

l + ε ≥ x.

Since ε > 0 is arbitrary, this requires l ≥ x. The other case is entirely similar, or else you could consider −l and {−x_n} and apply the case just considered.

Consider the last claim. There exists N such that if n ≥ N then x_n ≤ y_n and

|x − x_n| + |y − y_n| < ε/2.

Then considering n > N in what follows,

x − y ≤ x_n + ε/2 − (y_n − ε/2) = x_n − y_n + ε ≤ ε.

Since ε was arbitrary, it follows x − y ≤ 0.
Theorem 3.2.8 Let {x_n} be a sequence of vectors and suppose each ||x_n|| ≤ l (respectively ≥ l) and lim_{n→∞} x_n = x. Then ||x|| ≤ l (respectively ≥ l). More generally, suppose {x_n} and {y_n} are two sequences such that lim_{n→∞} x_n = x and lim_{n→∞} y_n = y. Then if ||x_n|| ≤ ||y_n|| for all n sufficiently large, then ||x|| ≤ ||y||.
Proof: It suffices to just prove the second part, since the first part is similar. By the triangle inequality,

| ||x_n|| − ||x|| | ≤ ||x_n − x|| ,

and for large n this is given to be small. Thus {||x_n||} converges to ||x||. Similarly {||y_n||} converges to ||y||. Now the desired result follows from Theorem 3.2.7.
3.3 Sequential Compactness
The following is the deﬁnition of sequential compactness. It is a very useful notion
which can be used to prove existence theorems.
Definition 3.3.1 A set K ⊆ V, a normed vector space, is sequentially compact if whenever {a_n} ⊆ K is a sequence, there exists a subsequence {a_{n_k}} such that this subsequence converges to a point of K.
First of all, it is convenient to consider the sequentially compact sets in F.
Lemma 3.3.2 Let I_k = [a_k, b_k] and suppose that for all k = 1, 2, ...,

I_k ⊇ I_{k+1}.

Then there exists a point c ∈ R which is an element of every I_k.

Proof: Since I_k ⊇ I_{k+1}, this implies

a_k ≤ a_{k+1},  b_k ≥ b_{k+1}. (3.3.18)

Consequently, if k ≤ l,

a_k ≤ a_l ≤ b_l ≤ b_k. (3.3.19)

Now define

c ≡ sup { a_l : l = 1, 2, ... } .

By the first inequality in 3.3.18, and 3.3.19,

a_k ≤ c = sup { a_l : l = k, k + 1, ... } ≤ b_k (3.3.20)

for each k = 1, 2, .... Thus c ∈ I_k for every k.

If this went too fast, the reason for the last inequality in 3.3.20 is that, from 3.3.19, b_k is an upper bound for { a_l : l = k, k + 1, ... }. Therefore, it is at least as large as the least upper bound.
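The lemma's recipe — take c to be the sup of the left endpoints — can be tried on a concrete nested family. The intervals below are my own example, not from the text:

```python
# I_k = [1 - 2**-k, 1 + 3**-k] is a nested family of closed intervals:
# left endpoints increase, right endpoints decrease.
left = [1.0 - 2.0 ** -k for k in range(40)]
right = [1.0 + 3.0 ** -k for k in range(40)]
assert all(left[k] <= left[k + 1] <= right[k + 1] <= right[k] for k in range(39))

# c = sup of the left endpoints lies in every I_k, as Lemma 3.3.2 predicts.
c = max(left)
assert all(left[k] <= c <= right[k] for k in range(40))
```

Here the sup of an increasing bounded list of endpoints is just its largest entry; in the lemma the sup exists by completeness of R, discussed in Section 3.5.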
Theorem 3.3.3 Every closed interval [a, b] is sequentially compact.

Proof: Let {x_n} ⊆ [a, b] ≡ I_0. Consider the two intervals [a, (a+b)/2] and [(a+b)/2, b], each of which has length (b − a)/2. At least one of these intervals contains x_n for infinitely many values of n. Call this interval I_1. Now do for I_1 what was done for I_0: split it in half and let I_2 be the half which contains x_n for infinitely many values of n. Continue this way, obtaining a sequence of nested intervals I_0 ⊇ I_1 ⊇ I_2 ⊇ I_3 ⊇ ···, where the length of I_n is (b − a)/2^n. Now pick n_1 such that x_{n_1} ∈ I_1, then n_2 such that n_2 > n_1 and x_{n_2} ∈ I_2, then n_3 such that n_3 > n_2 and x_{n_3} ∈ I_3, etc. (This can be done because in each case the intervals contained x_n for infinitely many values of n.) By the nested interval lemma there exists a point c contained in all these intervals. Furthermore,

|x_{n_k} − c| < (b − a) 2^{−k},

and so lim_{k→∞} x_{n_k} = c ∈ [a, b].
Theorem 3.3.4 Let

I = ∏_{k=1}^{n} K_k,

where K_k is a sequentially compact set in F. Then I is a sequentially compact set in F^n.
Proof: Let {x_k}_{k=1}^∞ be a sequence of points in I. Let

x_k = (x_k^1, ..., x_k^n).

Thus {x_k^i}_{k=1}^∞ is a sequence of points in K_i. Since K_i is sequentially compact, there exists a subsequence of {x_k}_{k=1}^∞, denoted by {x_{1k}}, such that {x_{1k}^1} converges to x^1 for some x^1 ∈ K_1. Now there exists a further subsequence {x_{2k}} such that {x_{2k}^1} converges to x^1, because by Theorem 3.2.6 subsequences of convergent sequences converge to the same limit as the convergent sequence, and in addition {x_{2k}^2} converges to some x^2 ∈ K_2. Continue taking subsequences such that for {x_{jk}}_{k=1}^∞, it follows {x_{jk}^r} converges to some x^r ∈ K_r for all r ≤ j. Then {x_{nk}}_{k=1}^∞ is the desired subsequence such that the sequence of numbers in F obtained by taking the j-th component of this subsequence converges to some x^j ∈ K_j. It follows from Theorem 3.2.5 that x ≡ (x^1, ..., x^n) ∈ I and is the limit of {x_{nk}}_{k=1}^∞.
Corollary 3.3.5 Any box of the form

[a, b] + i[c, d] ≡ {x + iy : x ∈ [a, b], y ∈ [c, d]}

is sequentially compact in C.

Proof: The given box is essentially [a, b] × [c, d]:

{x_k + iy_k}_{k=1}^∞ ⊆ [a, b] + i[c, d]

is the same as saying (x_k, y_k) ∈ [a, b] × [c, d]. Therefore, there exists (x, y) ∈ [a, b] × [c, d] such that x_k → x and y_k → y. In other words, x_k + iy_k → x + iy and x + iy ∈ [a, b] + i[c, d].
3.4 Closed And Open Sets
The definition of open and closed sets is next.

Definition 3.4.1 Let U be a set of points in a normed vector space V. A point p ∈ U is said to be an interior point if whenever ||x − p|| is sufficiently small, it follows x ∈ U also. The set of points x which are closer to p than δ is denoted by

B(p, δ) ≡ {x ∈ V : ||x − p|| < δ} .

This symbol B(p, δ) is called an open ball of radius δ. Thus a point p is an interior point of U if there exists δ > 0 such that p ∈ B(p, δ) ⊆ U. An open set is one for which every point of the set is an interior point. Closed sets are those which are complements of open sets. Thus H is closed means H^C is open.
Theorem 3.4.2 The intersection of any finite collection of open sets is open. The union of any collection of open sets is open. The intersection of any collection of closed sets is closed, and the union of any finite collection of closed sets is closed.

Proof: To see that any union of open sets is open, note that every point of the union is in at least one of the open sets. Therefore, it is an interior point of that set and hence an interior point of the entire union.

Now let {U_1, ..., U_m} be some open sets and suppose p ∈ ∩_{k=1}^{m} U_k. Then there exists r_k > 0 such that B(p, r_k) ⊆ U_k. Let 0 < r ≤ min(r_1, r_2, ..., r_m). Then B(p, r) ⊆ ∩_{k=1}^{m} U_k, and so the finite intersection is open. Note that if the finite intersection is empty, there is nothing to prove, because it is certainly true in this case that every point in the intersection is an interior point: there aren't any such points.

Suppose {H_1, ..., H_m} is a finite set of closed sets. Then ∪_{k=1}^{m} H_k is closed if its complement is open. However, from DeMorgan's laws,

(∪_{k=1}^{m} H_k)^C = ∩_{k=1}^{m} H_k^C,

a finite intersection of open sets, which is open by what was just shown.

Next let 𝒞 be some collection of closed sets. Then

(∩𝒞)^C = ∪ { H^C : H ∈ 𝒞 } ,

a union of open sets, which is therefore open by the first part of the proof. Thus ∩𝒞 is closed.
Next there is the concept of a limit point, which gives another way of characterizing closed sets.

Definition 3.4.3 Let A be any nonempty set and let x be a point. Then x is said to be a limit point of A if for every r > 0, B(x, r) contains a point of A which is not equal to x.
Example 3.4.4 Consider A = B(x, δ), an open ball in a normed vector space. Then every point of B(x, δ) is a limit point. There are more general situations than normed vector spaces in which this assertion is false.

If z ∈ B(x, δ), consider w_k ≡ z + (1/k)(x − z) for k ∈ N. Then

||w_k − x|| = || z + (1/k)(x − z) − x || = || (1 − 1/k) z − (1 − 1/k) x || = ((k − 1)/k) ||z − x|| < δ,

and also

||w_k − z|| ≤ (1/k) ||x − z|| < δ/k,

so w_k → z. Furthermore, the w_k are distinct. Thus z is a limit point of A as claimed. This is because every ball containing z contains infinitely many of the w_k, and since they are all distinct, they can't all be equal to z.
Similarly, the following holds in any normed vector space.

Theorem 3.4.5 Let A be a nonempty set in V, a normed vector space. A point a is a limit point of A if and only if there exists a sequence {a_n} of distinct points of A which converges to a. Also a nonempty set A is closed if and only if it contains all its limit points.
Proof: Suppose first a is a limit point of A. There exists a_1 ∈ B(a, 1) ∩ A such that a_1 ≠ a. Now supposing distinct points a_1, ..., a_n have been chosen such that none are equal to a and for each k ≤ n, a_k ∈ B(a, 1/k), let

0 < r_{n+1} < min { 1/(n + 1), ||a − a_1||, ..., ||a − a_n|| } .

Then there exists a_{n+1} ∈ B(a, r_{n+1}) ∩ A with a_{n+1} ≠ a. Because of the definition of r_{n+1}, a_{n+1} is not equal to any of the other a_k for k < n + 1. Also, since ||a − a_m|| < 1/m, it follows lim_{m→∞} a_m = a. Conversely, if there exists a sequence of distinct points of A converging to a, then B(a, r) contains all a_n for n large enough. Thus B(a, r) contains infinitely many points of A, since all are distinct. Thus at least one of them is not equal to a. This establishes the first part of the theorem.

Now consider the second claim. If A is closed, then it is the complement of an open set. Since A^C is open, it follows that if a ∈ A^C, then there exists δ > 0 such that B(a, δ) ⊆ A^C, and so no point of A^C can be a limit point of A. In other words, every limit point of A must be in A. Conversely, suppose A contains all its limit points. Then A^C does not contain any limit points of A. It also contains no points of A. Therefore, if a ∈ A^C, since it is not a limit point of A, there exists δ > 0 such that B(a, δ) contains no points of A different from a. However, a itself is not in A because a ∈ A^C. Therefore, B(a, δ) is entirely contained in A^C. Since a ∈ A^C was arbitrary, this shows every point of A^C is an interior point, and so A^C is open.
Closed subsets of sequentially compact sets are sequentially compact.
Theorem 3.4.6 If K is a sequentially compact set in a normed vector space and if H is a closed subset of K, then H is sequentially compact.

Proof: Let {x_n} ⊆ H. Then since K is sequentially compact, there is a subsequence {x_{n_k}} which converges to a point x ∈ K. If x ∉ H, then since H^C is open, it follows there exists B(x, r) such that this open ball contains no points of H. However, this is a contradiction to having x_{n_k} → x, which requires x_{n_k} ∈ B(x, r) for all k large enough. Thus x ∈ H, and this has shown H is sequentially compact.
Definition 3.4.7 A set S ⊆ V, a normed vector space, is bounded if there is some r > 0 such that S ⊆ B(0, r).

Theorem 3.4.8 Every closed and bounded set in F^n is sequentially compact. Conversely, every sequentially compact set in F^n is closed and bounded.
Proof: Let H be a closed and bounded set in F^n. Then H ⊆ B(0, r) for some r. Therefore, if x ∈ H, x = (x^1, ..., x^n), it must be that √( ∑_{i=1}^{n} |x^i|² ) < r, and so each x^i ∈ [−r, r] + i[−r, r] ≡ R_r, a sequentially compact set by Corollary 3.3.5. Thus H is a closed subset of ∏_{i=1}^{n} R_r, which is a sequentially compact set by Theorem 3.3.4. Therefore, by Theorem 3.4.6 it follows H is sequentially compact.

Conversely, suppose K is a sequentially compact set in F^n. If it is not bounded, then there exists a sequence {k_m} such that k_m ∈ K but k_m ∉ B(0, m) for m = 1, 2, .... However, this sequence cannot have any convergent subsequence, because if k_{m_k} → k, then k ∈ B(0, m) for some large enough m, while k_{m_k} ∈ B(0, m)^C for all k large enough, and this is a contradiction because there can only be finitely many points of the sequence in B(0, m). If K is not closed, then it is missing a limit point. Say k_∞ is a limit point of K which is not in K. Pick k_m ∈ B(k_∞, 1/m). Then {k_m} converges to k_∞, and so every subsequence also converges to k_∞ by Theorem 3.2.6. Thus there is no point of K which is a limit of some subsequence of {k_m}, a contradiction.
What are some examples of closed and bounded sets in a general normed vector space, and more specifically in F^n?

Proposition 3.4.9 Let D(z, r) denote the set of points

{w ∈ V : ||w − z|| ≤ r} .

Then D(z, r) is closed and bounded. Also, let S(z, r) denote the set of points

{w ∈ V : ||w − z|| = r} .

Then S(z, r) is closed and bounded. It follows that if V = F^n, then these sets are sequentially compact.

Proof: First note D(z, r) is bounded because

D(z, r) ⊆ B(0, ||z|| + 2r).

Here is why. Let x ∈ D(z, r). Then ||x − z|| ≤ r and so

||x|| ≤ ||x − z|| + ||z|| ≤ r + ||z|| < 2r + ||z|| .

It remains to verify it is closed. Suppose then that y ∉ D(z, r). This means ||y − z|| > r. Consider the open ball B(y, ||y − z|| − r). If x ∈ B(y, ||y − z|| − r), then

||x − y|| < ||y − z|| − r

and so by the triangle inequality,

||z − x|| ≥ ||z − y|| − ||y − x|| > ||x − y|| + r − ||x − y|| = r.

Thus the complement of D(z, r) is open, and so D(z, r) is closed.

For the second type of set, note S(z, r)^C = B(z, r) ∪ D(z, r)^C, the union of two open sets, which by Theorem 3.4.2 is open. Therefore, S(z, r) is a closed set which is clearly bounded because S(z, r) ⊆ D(z, r).
3.5 Cauchy Sequences And Completeness
The concept of completeness is that every Cauchy sequence converges. Cauchy
sequences are those sequences which have the property that ultimately the terms
of the sequence are bunching up. More precisely,
Definition 3.5.1 {a_n} is a Cauchy sequence in a normed vector space V if for all ε > 0, there exists n_ε such that whenever n, m ≥ n_ε,

||a_n − a_m|| < ε.
Theorem 3.5.2 The set of terms (values) of a Cauchy sequence in a normed vector space V is bounded.

Proof: Let ε = 1 in the definition of a Cauchy sequence and let n > n_1. Then from the definition,

||a_n − a_{n_1}|| < 1.

It follows that for all n > n_1,

||a_n|| < 1 + ||a_{n_1}|| .

Therefore, for all n,

||a_n|| ≤ 1 + ||a_{n_1}|| + ∑_{k=1}^{n_1} ||a_k|| .
Theorem 3.5.3 If a sequence {a_n} in V, a normed vector space, converges, then the sequence is a Cauchy sequence.

Proof: Let ε > 0 be given and suppose a_n → a. Then from the definition of convergence, there exists n_ε such that if n > n_ε, it follows that

||a_n − a|| < ε/2.

Therefore, if m, n ≥ n_ε + 1, it follows that

||a_n − a_m|| ≤ ||a_n − a|| + ||a − a_m|| < ε/2 + ε/2 = ε,

showing that, since ε > 0 is arbitrary, {a_n} is a Cauchy sequence.
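Theorem 3.5.3 says convergent sequences bunch up. The Newton iterates for √2 make this concrete; this is a standard illustration of my own choosing, not from the text:

```python
# x_{n+1} = (x_n + 2/x_n) / 2 converges to sqrt(2), so by Theorem 3.5.3
# it is a Cauchy sequence: terms far out are all close to one another.
xs = [1.0]
for _ in range(8):
    xs.append((xs[-1] + 2.0 / xs[-1]) / 2.0)

tail = xs[5:]
assert max(abs(p - q) for p in tail for q in tail) < 1e-9   # terms bunch up
assert abs(xs[-1] ** 2 - 2.0) < 1e-12                       # the limit squares to 2
```

Note that every term here is rational, yet the limit √2 is not: the same sequence viewed inside Q is Cauchy but does not converge there, which is why completeness (Definition 3.5.5 below) is a genuine extra property of a space.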
The following theorem is very useful. It is identical to an earlier theorem. All
that is required is to put things in bold face to indicate they are vectors.
Theorem 3.5.4 Suppose {a_n} is a Cauchy sequence in any normed vector space and there exists a subsequence {a_{n_k}} which converges to a. Then {a_n} also converges to a.

Proof: Let ε > 0 be given. There exists N such that if m, n > N, then

||a_m − a_n|| < ε/2.

Also there exists K such that if k > K, then

||a − a_{n_k}|| < ε/2.

Then let k > max(K, N). Then for such k,

||a_k − a|| ≤ ||a_k − a_{n_k}|| + ||a_{n_k} − a|| < ε/2 + ε/2 = ε.
Definition 3.5.5 If V is a normed vector space having the property that every Cauchy sequence converges, then V is called complete. It is also referred to as a Banach space.

Example 3.5.6 R is given to be complete. This is a fundamental axiom on which calculus is developed.

Given that R is complete, the following lemma is easily obtained.

Lemma 3.5.7 C is complete.
Proof: Let {x_k + iy_k}_{k=1}^∞ be a Cauchy sequence in C. This requires that {x_k} and {y_k} are both Cauchy sequences in R. This follows from the obvious estimates

|x_k − x_m| , |y_k − y_m| ≤ |(x_k + iy_k) − (x_m + iy_m)| .

By completeness of R there exists x ∈ R such that x_k → x, and similarly there exists y ∈ R such that y_k → y. Therefore, since

|(x_k + iy_k) − (x + iy)| ≤ √( (x_k − x)² + (y_k − y)² ) ≤ |x_k − x| + |y_k − y| ,

it follows (x_k + iy_k) → (x + iy).
A simple generalization of this idea yields the following theorem.
Theorem 3.5.8 𝔽ⁿ is complete.
Proof: By Lemma 3.5.7, 𝔽 is complete. Now let {a_m} be a Cauchy sequence in 𝔽ⁿ. Then by the definition of the norm,
|a_m^j − a_k^j| ≤ |a_m − a_k|
where a_m^j denotes the j-th component of a_m. Thus for each j = 1, 2, …, n,
{a_m^j}_{m=1}^∞
is a Cauchy sequence. It follows from Lemma 3.5.7, the completeness of 𝔽, that there exists a^j such that
lim_{m→∞} a_m^j = a^j.
Theorem 3.2.5 implies that lim_{m→∞} a_m = a where
a = (a^1, …, a^n).
3.6 Shrinking Diameters
It is useful to consider another version of the nested interval lemma. This involves
a sequence of sets such that set (n + 1) is contained in set n and such that their
diameters converge to 0. It turns out that if the sets are also closed, then often
there exists a unique point in all of them.
Definition 3.6.1 Let S be a nonempty set in a normed vector space V. Then diam(S) is defined as
diam(S) ≡ sup{||x − y|| : x, y ∈ S}.
This is called the diameter of S.
Theorem 3.6.2 Let {F_n}_{n=1}^∞ be a sequence of nonempty closed sets in 𝔽ⁿ such that
lim_{n→∞} diam(F_n) = 0
and F_n ⊇ F_{n+1} for each n. Then there exists a unique p ∈ ∩_{k=1}^∞ F_k.
Proof: Pick p_k ∈ F_k. This is always possible because by assumption each set is nonempty. Then {p_k}_{k=m}^∞ ⊆ F_m and since the diameters converge to 0, it follows {p_k} is a Cauchy sequence. Therefore, it converges to a point p by completeness of 𝔽ⁿ, discussed in Theorem 3.5.8. Since each F_k is closed, p ∈ F_k for all k. This is because it is a limit of a sequence of points, only finitely many of which are not in the closed set F_k. Therefore, p ∈ ∩_{k=1}^∞ F_k. If q ∈ ∩_{k=1}^∞ F_k, then since both p, q ∈ F_k,
|p − q| ≤ diam(F_k).
It follows, since these diameters converge to 0, that |p − q| ≤ ε for every ε > 0. Hence p = q.
A sequence of sets {G_n} which satisfies G_n ⊇ G_{n+1} for all n is called a nested sequence of sets.
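A concrete instance of Theorem 3.6.2 can be checked by machine. In the hedged Python sketch below (the interval endpoints and the test point q are arbitrary illustrative choices), the nested closed sets F_k = [2 − 1/k, 2 + 1/k] have diameters 2/k → 0, the point p = 2 lies in every F_k, and any other point eventually escapes.

```python
# A concrete instance (in R) of Theorem 3.6.2: nested closed intervals
# F_k = [2 - 1/k, 2 + 1/k] with diameters shrinking to 0 trap the
# unique point p = 2.
def F(k):
    return (2.0 - 1.0 / k, 2.0 + 1.0 / k)   # closed interval endpoints

def diam(interval):
    a, b = interval
    return b - a

# The diameters 2/k shrink to 0:
diams = [diam(F(k)) for k in range(1, 6)]
print(diams)

# p = 2 lies in every F_k; any q != 2 eventually escapes.
p, q = 2.0, 2.001
in_all_p = all(F(k)[0] <= p <= F(k)[1] for k in range(1, 10000))
in_all_q = all(F(k)[0] <= q <= F(k)[1] for k in range(1, 10000))
print(in_all_p, in_all_q)
```

The check over k < 10000 is of course finite; the proof above handles all k.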
3.7 Exercises
1. For a nonempty set S in a normed vector space V, define a function
x → dist(x, S) ≡ inf{||x − y|| : y ∈ S}.
Show
|dist(x, S) − dist(y, S)| ≤ ||x − y||.
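The inequality in this exercise says x ↦ dist(x, S) is 1-Lipschitz. The Python sketch below is a numerical sanity check, not a proof; the finite set S (so the inf is a min) and the random sampling in ℝ² are arbitrary illustrative choices.

```python
import math, random

# Numerical sanity check of Exercise 1: x -> dist(x, S) is 1-Lipschitz.
# S is a finite point set in R^2, the norm is Euclidean.
random.seed(0)
S = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(30)]

def dist(x, S):
    return min(math.hypot(x[0] - s[0], x[1] - s[1]) for s in S)

violations = 0
for _ in range(1000):
    x = (random.uniform(-2, 2), random.uniform(-2, 2))
    y = (random.uniform(-2, 2), random.uniform(-2, 2))
    lhs = abs(dist(x, S) - dist(y, S))
    rhs = math.hypot(x[0] - y[0], x[1] - y[1])
    if lhs > rhs + 1e-12:
        violations += 1
print(violations)
```

Since the inequality is a theorem, no violations should ever appear, whatever the sample.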
2. Let A be a nonempty set in 𝔽ⁿ or more generally in a normed vector space. Define the closure of A to equal the intersection of all closed sets which contain A. This is usually denoted by Ā. Show Ā = A ∪ A′ where A′ consists of the set of limit points of A. Also explain why Ā is closed.
3. The interior of a set was defined above. Tell why the interior of a set is always an open set. The interior of a set A is sometimes denoted by A⁰.
4. Give an example of a set A whose interior is empty but whose closure is all of ℝⁿ.
5. A point p is said to be in the boundary of a nonempty set A if for every r > 0, B(p, r) contains points of A as well as points of A^C. The set of boundary points is sometimes denoted ∂A. In a normed vector space, is it always the case that A ∪ ∂A = Ā? Prove or disprove.
6. Give an example of a ﬁnite dimensional normed vector space where the ﬁeld
of scalars is the rational numbers which is not complete.
7. Explain why, as far as the theorems of this chapter are concerned, ℂⁿ is essentially the same as ℝ²ⁿ.
8. A set A ⊆ ℝⁿ is said to be convex if whenever x, y ∈ A, it follows tx + (1 − t)y ∈ A whenever t ∈ [0, 1]. Show B(z, r) is convex. Also show D(z, r) is convex. If A is convex, does it follow Ā is convex? Explain why or why not.
9. Let A be any nonempty subset of ℝⁿ. The convex hull of A, usually denoted by co(A), is defined as the set of all convex combinations of points in A. A convex combination is of the form Σ_{k=1}^p t_k a_k where each t_k ≥ 0 and Σ_k t_k = 1. Note that p can be any finite number. Show co(A) is convex.
10. Suppose A ⊆ ℝⁿ and z ∈ co(A). Thus z = Σ_{k=1}^p t_k a_k for t_k ≥ 0 and Σ_k t_k = 1. Show there exist n + 1 of the points {a_1, …, a_p} such that z is a convex combination of these n + 1 points. Hint: Show that if p > n + 1, then the vectors {a_k − a_1}_{k=2}^p must be linearly dependent. Conclude from this the existence of scalars {α_i}, not all zero, such that Σ_{i=1}^p α_i a_i = 0 and Σ_{i=1}^p α_i = 0. Now for s ∈ ℝ, z = Σ_{k=1}^p (t_k + sα_k) a_k. Consider small s and adjust till one or more of the t_k + sα_k vanish. Now you are in the same situation as before but with only p − 1 of the a_k. Repeat the argument till you end up with only n + 1, at which time you can't repeat again.
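The procedure in the hint can be carried out concretely. In the Python sketch below (the four corner points, the equal weights, and the hand-computed dependence relation are all illustrative choices), z = (0.5, 0.5) is written with p = 4 > n + 1 = 3 points of ℝ², and shifting the weights along a dependence α with Σα_i = 0 drives some of them to zero while preserving z.

```python
# A worked instance of Exercise 10 (the Caratheodory argument) in R^2.
a = [(0, 0), (1, 0), (0, 1), (1, 1)]
t = [0.25, 0.25, 0.25, 0.25]          # z = sum t_k a_k = (0.5, 0.5)

def combo(t, a):
    return (sum(tk * ak[0] for tk, ak in zip(t, a)),
            sum(tk * ak[1] for tk, ak in zip(t, a)))

z = combo(t, a)

# A dependence among the points: a1 - a2 - a3 + a4 = 0, and the alphas
# sum to 0, so shifting t by s*alpha preserves both z and the total
# weight 1.
alpha = [1, -1, -1, 1]

# The largest |s| keeping all weights nonnegative zeroes at least one
# t_k; here s = -0.25 zeroes two of them at once.
s = -0.25
t_new = [tk + s * ak for tk, ak in zip(t, alpha)]
print(t_new, combo(t_new, a))
```

After one step z is a convex combination of only two of the points, which is even fewer than the n + 1 = 3 the exercise guarantees.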
11. Show that any uncountable set of points in 𝔽ⁿ must have a limit point.
12. Let V be any finite dimensional vector space having a basis {v_1, …, v_n}. For x ∈ V, let
x = Σ_{k=1}^n x_k v_k
so that the scalars x_k are the components of x with respect to the given basis. Define for x, y ∈ V
(x · y) ≡ Σ_{i=1}^n x_i y_i.
Show this is a dot product for V satisfying all the axioms of a dot product presented earlier.
13. In the context of Problem 12, let |x| denote the norm of x which is produced by this inner product and suppose ||·|| is some other norm on V. Thus
|x| ≡ (Σ_i |x_i|²)^{1/2} where x = Σ_k x_k v_k. (3.7.21)
Show there exist positive numbers δ < ∆, independent of x, such that
δ|x| ≤ ||x|| ≤ ∆|x|.
This is referred to by saying the two norms are equivalent. Hint: The top half is easy using the Cauchy Schwarz inequality. The bottom half is somewhat harder. Argue that if it is not so, there exists a sequence {x_k} such that |x_k| = 1 but k⁻¹|x_k| = k⁻¹ ≥ ||x_k||, and then note the vector of components of x_k is on S(0, 1) which was shown to be sequentially compact. Pass to a limit in 3.7.21 and use the assumed inequality to get a contradiction to {v_1, …, v_n} being a basis.
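For V = ℝ² with the standard basis, the inner product norm of Problem 13 is the Euclidean norm, and taking ||·|| to be the 1-norm ||x|| = |x₁| + |x₂|, the equivalence constants can be estimated numerically. In this Python sketch (the sample size is arbitrary, and sampling only suggests rather than proves the extremes) the observed constants approach δ = 1 and ∆ = √2.

```python
import math

# Estimating the equivalence constants of Problem 13 for V = R^2:
# Euclidean norm |x| versus the 1-norm ||x|| = |x_1| + |x_2|.  On the
# Euclidean unit circle, the ratio ||x|| / |x| is just ||x||.
def ratio(theta):
    x = (math.cos(theta), math.sin(theta))
    return abs(x[0]) + abs(x[1])      # |x| = 1 on the unit circle

samples = [ratio(2 * math.pi * k / 10000) for k in range(10000)]
delta, Delta = min(samples), max(samples)
print(delta, Delta)
```

The extremes occur on the axes (ratio 1) and along the diagonals (ratio √2), so δ|x| ≤ ||x|| ≤ ∆|x| holds with δ = 1, ∆ = √2.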
14. It was shown above that in 𝔽ⁿ, the sequentially compact sets are exactly those which are closed and bounded. Show that in any finite dimensional normed vector space V, the closed and bounded sets are those which are sequentially compact.
15. Two norms ||·||₁ and ||·||₂ on a finite dimensional vector space are said to be equivalent if there exist positive numbers δ < ∆ such that
δ||x||₁ ≤ ||x||₂ ≤ ∆||x||₁.
Show the statement that two norms are equivalent is an equivalence relation. Explain, using the result of Problem 13, why any two norms on a finite dimensional vector space are equivalent.
16. A normed vector space V is separable if there is a countable set {w_k}_{k=1}^∞ such that whenever B(x, δ) is an open ball in V, there exists some w_k in this open ball. Show that 𝔽ⁿ is separable. This set of points is called a countable dense set.
17. Let V be any finite dimensional normed vector space with norm ||·||. Using Problem 13, show that V is separable.
18. Suppose V is a normed vector space. Show there exists a countable set of open balls B ≡ {B(x_k, r_k)}_{k=1}^∞ having the remarkable property that any open set U is the union of some subset of B. This collection of balls is called a countable basis. Hint: Use Problem 17 to get a countable dense set of points {x_k}_{k=1}^∞ and then consider balls of the form B(x_k, 1/r) where r ∈ ℕ. Show this collection of balls is countable and then show it has the remarkable property mentioned.
19. Suppose S is any nonempty set in V, a finite dimensional normed vector space. Suppose 𝒞 is a set of open sets such that ∪𝒞 ⊇ S. (Such a collection of sets is called an open cover.) Show, using Problem 18, that there are countably many sets from 𝒞, {U_k}_{k=1}^∞, such that S ⊆ ∪_{k=1}^∞ U_k. This is called the Lindelöf property: every open cover can be reduced to a countable subcover.
20. A set H in a normed vector space is said to be compact if whenever 𝒞 is a set of open sets such that ∪𝒞 ⊇ H, there are finitely many sets of 𝒞, {U_1, …, U_p}, such that
H ⊆ ∪_{i=1}^p U_i.
Show, using Problem 19, that if a set in a finite dimensional normed vector space is sequentially compact, then it must be compact. Next show, using Problem 14, that a set in a finite dimensional normed vector space is compact if and only if it is closed and bounded. Explain why the sets which are compact, closed and bounded, and sequentially compact are the same sets in any finite dimensional normed vector space.
Continuous Functions
Continuous functions are deﬁned as they are for a function of one variable.
Definition 4.0.1 Let V, W be normed vector spaces. A function f : D(f) ⊆ V → W is continuous at x ∈ D(f) if for each ε > 0 there exists δ > 0 such that whenever y ∈ D(f) and
||y − x||_V < δ,
it follows that
||f(x) − f(y)||_W < ε.
A function f is continuous if it is continuous at every point of D(f).
There is a theorem which makes it easier to verify certain functions are continuous without having to always go to the above definition. The statement of this theorem is purposely just a little vague. Some of these things tend to hold in almost any context, certainly for any normed vector space.
Theorem 4.0.2 The following assertions are valid.
1. The function af + bg is continuous at x when f, g are continuous at x ∈ D(f) ∩ D(g) and a, b ∈ 𝔽.
2. If f and g have values in 𝔽ⁿ and they are each continuous at x, then f · g is continuous at x. If g has values in 𝔽 and g(x) ≠ 0 with g continuous, then f/g is continuous at x.
3. If f is continuous at x, f(x) ∈ D(g), and g is continuous at f(x), then g ◦ f is continuous at x.
4. If V is any normed vector space, the function f : V → ℝ given by f(x) = ||x|| is continuous.
5. f is continuous at every point of V if and only if whenever U is an open set in W, f⁻¹(U) is open.
Proof: First consider 1.). Let ε > 0 be given. By assumption, there exists δ₁ > 0 such that whenever |x − y| < δ₁, it follows |f(x) − f(y)| < ε/(2(|a| + |b| + 1)), and there exists δ₂ > 0 such that whenever |x − y| < δ₂, it follows that |g(x) − g(y)| < ε/(2(|a| + |b| + 1)). Then let 0 < δ ≤ min(δ₁, δ₂). If |x − y| < δ, then everything happens at once. Therefore, using the triangle inequality,
|af(x) + bg(x) − (af(y) + bg(y))|
≤ |a||f(x) − f(y)| + |b||g(x) − g(y)|
< |a|(ε/(2(|a| + |b| + 1))) + |b|(ε/(2(|a| + |b| + 1))) < ε.
Now consider 2.). There exists δ₁ > 0 such that if |y − x| < δ₁, then |f(x) − f(y)| < 1. Therefore, for such y,
|f(y)| < 1 + |f(x)|.
It follows that for such y,
|f · g(x) − f · g(y)| ≤ |f(x) · g(x) − g(x) · f(y)| + |g(x) · f(y) − f(y) · g(y)|
≤ |g(x)||f(x) − f(y)| + |f(y)||g(x) − g(y)|
≤ (1 + |g(x)| + |f(y)|)[|g(x) − g(y)| + |f(x) − f(y)|]
≤ (2 + |g(x)| + |f(x)|)[|g(x) − g(y)| + |f(x) − f(y)|].
Now let ε > 0 be given. There exists δ₂ such that if |x − y| < δ₂, then
|g(x) − g(y)| < ε/(2(2 + |g(x)| + |f(x)|)),
and there exists δ₃ such that if |x − y| < δ₃, then
|f(x) − f(y)| < ε/(2(2 + |g(x)| + |f(x)|)).
Now let 0 < δ ≤ min(δ₁, δ₂, δ₃). Then if |x − y| < δ, all the above hold at once and so
|f · g(x) − f · g(y)| ≤ (2 + |g(x)| + |f(x)|)[|g(x) − g(y)| + |f(x) − f(y)|]
< (2 + |g(x)| + |f(x)|)(ε/(2(2 + |g(x)| + |f(x)|)) + ε/(2(2 + |g(x)| + |f(x)|))) = ε.
This proves the first part of 2.). To obtain the second part, let δ₁ be as described above and let δ₀ > 0 be such that for |x − y| < δ₀,
|g(x) − g(y)| < |g(x)|/2
and so by the triangle inequality,
−|g(x)|/2 ≤ |g(y)| − |g(x)| ≤ |g(x)|/2
which implies |g(y)| ≥ |g(x)|/2 and |g(y)| < 3|g(x)|/2.
Then if |x − y| < min(δ₀, δ₁),
|f(x)/g(x) − f(y)/g(y)| = |f(x)g(y) − f(y)g(x)| / |g(x)g(y)|
≤ |f(x)g(y) − f(y)g(x)| / (|g(x)|²/2)
= (2/|g(x)|²)|f(x)g(y) − f(y)g(x)|
≤ (2/|g(x)|²)|f(x)g(y) − f(y)g(y) + f(y)g(y) − f(y)g(x)|
≤ (2/|g(x)|²)[|g(y)||f(x) − f(y)| + |f(y)||g(y) − g(x)|]
≤ (2/|g(x)|²)[(3/2)|g(x)||f(x) − f(y)| + (1 + |f(x)|)|g(y) − g(x)|]
≤ (2/|g(x)|²)(1 + 2|f(x)| + 2|g(x)|)[|f(x) − f(y)| + |g(y) − g(x)|]
≡ M[|f(x) − f(y)| + |g(y) − g(x)|]
where M is defined by
M ≡ (2/|g(x)|²)(1 + 2|f(x)| + 2|g(x)|).
Now let δ₂ be such that if |x − y| < δ₂, then
|f(x) − f(y)| < (ε/2)M⁻¹
and let δ₃ be such that if |x − y| < δ₃, then
|g(y) − g(x)| < (ε/2)M⁻¹.
Then if 0 < δ ≤ min(δ₀, δ₁, δ₂, δ₃) and |x − y| < δ, everything holds and
|f(x)/g(x) − f(y)/g(y)| ≤ M[|f(x) − f(y)| + |g(y) − g(x)|]
< M((ε/2)M⁻¹ + (ε/2)M⁻¹) = ε.
This completes the proof of the second part of 2.)
Note that in these proofs no eﬀort is made to ﬁnd some sort of “best” δ. The
problem is one which has a yes or a no answer. Either it is or it is not continuous.
Now consider 3.). If f is continuous at x, f(x) ∈ D(g), and g is continuous at f(x), then g ◦ f is continuous at x. Let ε > 0 be given. Then there exists η > 0 such that if |y − f(x)| < η and y ∈ D(g), it follows that |g(y) − g(f(x))| < ε. From continuity of f at x, there exists δ > 0 such that if |x − z| < δ and z ∈ D(f), then |f(z) − f(x)| < η. Then if |x − z| < δ and z ∈ D(g ◦ f) ⊆ D(f), all the above hold and so
|g(f(z)) − g(f(x))| < ε.
This proves part 3.).
To verify part 4.), let ε > 0 be given and let δ = ε. Then if ||x − y|| < δ, the triangle inequality implies
|f(x) − f(y)| = | ||x|| − ||y|| | ≤ ||x − y|| < δ = ε.
This proves part 4.).
Next consider 5.). Suppose first f is continuous. Let U be open and let x ∈ f⁻¹(U). This means f(x) ∈ U. Since U is open, there exists ε > 0 such that B(f(x), ε) ⊆ U. By continuity, there exists δ > 0 such that if y ∈ B(x, δ), then f(y) ∈ B(f(x), ε), and so this shows B(x, δ) ⊆ f⁻¹(U), which implies f⁻¹(U) is open since x is an arbitrary point of f⁻¹(U). Next suppose the condition that inverse images of open sets are open. Then apply this condition to the open set B(f(x), ε). The condition says f⁻¹(B(f(x), ε)) is open, and since
x ∈ f⁻¹(B(f(x), ε)),
it follows x is an interior point of f⁻¹(B(f(x), ε)), so there exists δ > 0 such that
B(x, δ) ⊆ f⁻¹(B(f(x), ε)).
This says f(B(x, δ)) ⊆ B(f(x), ε). In other words, whenever ||y − x|| < δ, ||f(y) − f(x)|| < ε, which is the condition for continuity at the point x. Since x is arbitrary, f is continuous, which proves part 5.).
4.1 Continuity And The Limit Of A Sequence
There is a very useful way of thinking of continuity in terms of limits of sequences
found in the following theorem. In words, it says a function is continuous if it takes
convergent sequences to convergent sequences whenever possible.
Theorem 4.1.1 A function f : D(f) → W is continuous at x ∈ D(f) if and only if, whenever x_n → x with x_n ∈ D(f), it follows f(x_n) → f(x).
Proof: Suppose first that f is continuous at x and let x_n → x. Let ε > 0 be given. By continuity, there exists δ > 0 such that if ||y − x|| < δ, then ||f(x) − f(y)|| < ε. However, there exists n_δ such that if n ≥ n_δ, then ||x_n − x|| < δ, and so for all n this large,
||f(x) − f(x_n)|| < ε,
which shows f(x_n) → f(x).
Now suppose the condition about taking convergent sequences to convergent sequences holds at x. Suppose f fails to be continuous at x. Then there exists ε > 0 and x_n ∈ D(f) such that ||x − x_n|| < 1/n, yet
||f(x) − f(x_n)|| ≥ ε.
But this is clearly a contradiction because, although x_n → x, f(x_n) fails to converge to f(x). It follows f must be continuous after all.
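The sequential criterion gives a practical way to exhibit a discontinuity. The hedged Python sketch below (the jump function and the witness sequence are arbitrary illustrative choices) applies Theorem 4.1.1: x_n = 1/n → 0, yet f(x_n) does not converge to f(0), so f cannot be continuous at 0.

```python
# Theorem 4.1.1 in action: a single convergent sequence whose images
# fail to converge to f(x) certifies a discontinuity.
def f(x):
    return 1.0 if x > 0 else 0.0   # jump at 0

x_seq = [1.0 / n for n in range(1, 2000)]   # x_n -> 0 from the right
f_vals = [f(x) for x in x_seq]

# f(0) = 0.0 while f(x_n) = 1.0 for every n, so f(x_n) -/-> f(0).
print(f(0.0), f_vals[-1])
```

By the theorem this one sequence is enough; no ε-δ bookkeeping is needed.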
Theorem 4.1.2 Suppose f : D(f) → ℝ is continuous at x ∈ D(f) and suppose
f(x_n) ≤ l (≥ l)
where {x_n} is a sequence of points of D(f) which converges to x. Then
f(x) ≤ l (≥ l).
Proof: Since f(x_n) ≤ l and f is continuous at x, it follows from the triangle inequality, Theorem 3.2.8 and Theorem 4.1.1 that
f(x) = lim_{n→∞} f(x_n) ≤ l.
The other case is entirely similar.
Another very useful idea involves the automatic continuity of the inverse function under certain conditions.
Theorem 4.1.3 Let K be a sequentially compact set and suppose f : K → f(K) is continuous and one to one. Then f⁻¹ must also be continuous.
Proof: Suppose f(k_n) → f(k). Does it follow k_n → k? If this does not happen, then there exists ε > 0 and a subsequence, still denoted as {k_n}, such that
|k_n − k| ≥ ε. (4.1.1)
Now since K is sequentially compact, there exists a further subsequence, still denoted as {k_n}, such that
k_n → k′ ∈ K.
However, the continuity of f requires
f(k_n) → f(k′)
and so f(k′) = f(k). Since f is one to one, this requires k′ = k, a contradiction to 4.1.1.
4.2 The Extreme Values Theorem
The extreme values theorem says continuous functions achieve their maximum and
minimum provided they are deﬁned on a sequentially compact set.
The next theorem is known as the max min theorem or extreme value theorem.
Theorem 4.2.1 Let K ⊆ 𝔽ⁿ be sequentially compact. Thus K is closed and bounded, and let f : K → ℝ be continuous. Then f achieves its maximum and its minimum on K. This means there exist x₁, x₂ ∈ K such that for all x ∈ K,
f(x₁) ≤ f(x) ≤ f(x₂).
Proof: Let λ = sup{f(x) : x ∈ K}. Next let {λ_k} be an increasing sequence which converges to λ but each λ_k < λ. Therefore, for each k there exists x_k ∈ K such that
f(x_k) > λ_k.
Since K is sequentially compact, there exists a subsequence {x_{k_l}} such that
lim_{l→∞} x_{k_l} = x ∈ K.
Then by continuity of f,
f(x) = lim_{l→∞} f(x_{k_l}) ≥ lim_{l→∞} λ_{k_l} = λ,
which shows f achieves its maximum on K. To see it achieves its minimum, you could repeat the argument with a minimizing sequence, or else you could consider −f and apply what was just shown, −f having its minimum when f has its maximum.
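The maximizing-sequence idea in the proof can be imitated numerically. The Python sketch below (the function sin(3x), the interval [0, 1], and the nested grids are illustrative choices) evaluates a continuous function on finer and finer grids of the sequentially compact set [0, 1]; the grid maxima form a nondecreasing sequence converging to the true maximum 1, attained at x = π/6 ∈ [0, 1].

```python
import math

# A numerical maximizing sequence in the spirit of Theorem 4.2.1:
# f(x) = sin(3x) is continuous on the sequentially compact K = [0, 1].
def f(x):
    return math.sin(3 * x)

def grid_max(m):
    # best value of f over the grid {k/m : k = 0, ..., m}
    return max(f(k / m) for k in range(m + 1))

# Nested grids (each divides the next) give a nondecreasing sequence of
# estimates converging to the maximum f(pi/6) = 1.
estimates = [grid_max(10 ** j) for j in range(1, 5)]
print(estimates)
```

On a non-compact or open domain such a scheme can fail to converge to a sup that is never attained, which is exactly what the theorem rules out for compact K.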
4.3 Connected Sets
Stated informally, connected sets are those which are in one piece. In order to deﬁne
what is meant by this, I will ﬁrst consider what it means for a set to not be in one
piece.
Definition 4.3.1 Let A be a nonempty subset of V, a normed vector space. Then Ā is defined to be the intersection of all closed sets which contain A. Note the whole space V is one such closed set which contains A.
Lemma 4.3.2 Let A be a nonempty set in a normed vector space V. Then Ā is a closed set and
Ā = A ∪ A′
where A′ denotes the set of limit points of A.
Proof: First of all, denote by 𝒞 the set of closed sets which contain A. Then
Ā = ∩𝒞
and this will be closed if its complement is open. However,
Ā^C = ∪{H^C : H ∈ 𝒞}.
Each H^C is open and so the union of all these open sets must also be open. This is because if x is in this union, then it is in at least one of them. Hence it is an interior point of that one. But this implies it is an interior point of the union of them all, which is an even larger set. Thus Ā is closed.
The interesting part is the next claim. First note that from the definition, A ⊆ Ā, so if x ∈ A, then x ∈ Ā. Now consider y ∈ A′ but y ∉ Ā. If y ∉ Ā, a closed set, then there exists B(y, r) ⊆ Ā^C. Thus y cannot be a limit point of A, a contradiction. Therefore,
A ∪ A′ ⊆ Ā.
Next suppose x ∈ Ā and suppose x ∉ A. Then if B(x, r) contains no points of A different than x, since x itself is not in A, it would follow that B(x, r) ∩ A = ∅ and so, recalling that open balls are open, B(x, r)^C is a closed set containing A, so from the definition it also contains Ā, which is contrary to the assertion that x ∈ Ā. Hence if x ∉ A, then x ∈ A′ and so
A ∪ A′ ⊇ Ā.
Now that the closure of a set has been deﬁned it is possible to deﬁne what is
meant by a set being separated.
Definition 4.3.3 A set S in a normed vector space is separated if there exist sets A, B such that
S = A ∪ B, A, B ≠ ∅, and A ∩ B̄ = B ∩ Ā = ∅.
In this case, the sets A and B are said to separate S. A set is connected if it is not separated. Remember Ā denotes the closure of the set A.
Note that the concept of connected sets is deﬁned in terms of what it is not. This
makes it somewhat diﬃcult to understand. One of the most important theorems
about connected sets is the following.
Theorem 4.3.4 Suppose U and V are connected sets having nonempty intersection. Then U ∪ V is also connected.
Proof: Suppose U ∪ V = A ∪ B where A ∩ B̄ = B ∩ Ā = ∅. Consider the sets A ∩ U and B ∩ U. Since the closure of B ∩ U is contained in B̄ and A ∩ B̄ = ∅, the set A ∩ U contains no limit points of B ∩ U; similarly B ∩ U contains no limit points of A ∩ U. It follows one of these sets must be empty, since otherwise U would be separated. It follows that U is contained in either A or B. Similarly, V must be contained in either A or B. Since U and V have nonempty intersection, it follows that both V and U are contained in one of the sets A, B. Therefore, the other must be empty, and this shows U ∪ V cannot be separated and is therefore connected.
The intersection of connected sets is not necessarily connected, as is shown by the following picture. [Figure omitted: two overlapping connected sets U and V whose intersection is not connected.]
Theorem 4.3.5 Let f : X → Y be continuous where Y is a normed vector space
and X is connected. Then f (X) is also connected.
Proof: To do this you show f(X) is not separated. Suppose to the contrary that f(X) = A ∪ B where A and B separate f(X). Then consider the sets f⁻¹(A) and f⁻¹(B). If z ∈ f⁻¹(B), then f(z) ∈ B and so f(z) is not a limit point of A. Therefore, there exists an open set U containing f(z) such that U ∩ A = ∅. But then the continuity of f and Theorem 4.0.2 implies that f⁻¹(U) is an open set containing z such that f⁻¹(U) ∩ f⁻¹(A) = ∅. Therefore, f⁻¹(B) contains no limit points of f⁻¹(A). Similar reasoning implies f⁻¹(A) contains no limit points of f⁻¹(B). It follows that X is separated by f⁻¹(A) and f⁻¹(B), contradicting the assumption that X was connected.
An arbitrary set can be written as a union of maximal connected sets called
connected components. This is the concept of the next deﬁnition.
Definition 4.3.6 Let S be a set and let p ∈ S. Denote by C_p the union of all connected subsets of S which contain p. This is called the connected component determined by p.
Theorem 4.3.7 Let C_p be a connected component of a set S in a normed vector space. Then C_p is a connected set, and if C_p ∩ C_q ≠ ∅, then C_p = C_q.
Proof: Let 𝒞 denote the connected subsets of S which contain p. If C_p = A ∪ B where
A ∩ B̄ = B ∩ Ā = ∅,
then p is in one of A or B. Suppose without loss of generality p ∈ A. Then every set of 𝒞 must also be contained in A since otherwise, as in Theorem 4.3.4, the set would be separated. But this implies B is empty. Therefore, C_p is connected. From this, and Theorem 4.3.4, the second assertion of the theorem is proved.
This shows the connected components of a set are equivalence classes and partition the set.
A set I is an interval in ℝ if and only if whenever x, y ∈ I, then (x, y) ⊆ I. The following theorem is about the connected sets in ℝ.
Theorem 4.3.8 A set C in ℝ is connected if and only if C is an interval.
Proof: Let C be connected. If C consists of a single point p, there is nothing to prove. The interval is just [p, p]. Suppose p < q and p, q ∈ C. You need to show (p, q) ⊆ C. If
x ∈ (p, q) \ C,
let C ∩ (−∞, x) ≡ A and C ∩ (x, ∞) ≡ B. Then C = A ∪ B and the sets A and B separate C, contrary to the assumption that C is connected.
Conversely, let I be an interval. Suppose I is separated by A and B. Pick x ∈ A and y ∈ B. Suppose without loss of generality that x < y. Now define the set
S ≡ {t ∈ [x, y] : [x, t] ⊆ A}
and let l be the least upper bound of S. Then l ∈ Ā so l ∉ B, which implies l ∈ A. But if l ∉ B̄, then for some δ > 0,
(l, l + δ) ∩ B = ∅,
contradicting the definition of l as an upper bound for S. Therefore, l ∈ B̄, which implies l ∉ A after all, a contradiction. It follows I must be connected.
This yields a generalization of the intermediate value theorem from one variable
calculus.
Corollary 4.3.9 Let E be a connected set in a normed vector space and suppose f : E → ℝ and that y ∈ (f(e₁), f(e₂)) where e_i ∈ E. Then there exists e ∈ E such that f(e) = y.
Proof: From Theorem 4.3.5, f(E) is a connected subset of ℝ. By Theorem 4.3.8, f(E) must be an interval. In particular, it must contain y.
The following theorem is a very useful description of the open sets in 1.
Theorem 4.3.10 Let U be an open set in ℝ. Then there exist countably many disjoint open sets {(a_i, b_i)}_{i=1}^∞ such that U = ∪_{i=1}^∞ (a_i, b_i).
Proof: Let p ∈ U and let z ∈ C_p, the connected component determined by p. Since U is open, there exists δ > 0 such that (z − δ, z + δ) ⊆ U. It follows from Theorem 4.3.4 that
(z − δ, z + δ) ⊆ C_p.
This shows C_p is open. By Theorem 4.3.8, this shows C_p is an open interval (a, b) where a, b ∈ [−∞, ∞]. There are therefore at most countably many of these connected components because each must contain a rational number and the rational numbers are countable. Denote by {(a_i, b_i)}_{i=1}^∞ the set of these connected components.
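When an open set U ⊆ ℝ is presented as a finite union of open intervals, the components in Theorem 4.3.10 can be computed by sorting and merging. A hedged Python sketch (the merging rule uses strict overlap, since the union of the touching open intervals (0, 1) and (1, 2) omits the point 1 and so is not connected):

```python
def components(intervals):
    """Merge a finite list of open intervals (a, b) into the disjoint
    maximal open intervals making up their union (its connected
    components).  Touching-but-disjoint open intervals are NOT merged:
    (0, 1) and (1, 2) leave out the point 1."""
    merged = []
    for a, b in sorted(intervals):
        if merged and a < merged[-1][1]:   # genuine overlap with the last piece
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged

U = [(0, 1), (0.5, 2), (3, 4), (3.5, 3.8), (1, 1.5)]
print(components(U))
```

A general open set may need countably many components, as the theorem allows; the finite case above is enough to see the mechanism.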
Deﬁnition 4.3.11 A set E in a normed vector space is arcwise connected if for
any two points, p, q ∈ E, there exists a closed interval, [a, b] and a continuous
function, γ : [a, b] →E such that γ (a) = p and γ (b) = q.
An example of an arcwise connected topological space would be any subset of ℝⁿ which is the continuous image of an interval. Arcwise connected is not the same as connected. A well known example is the following:
{(x, sin(1/x)) : x ∈ (0, 1]} ∪ {(0, y) : y ∈ [−1, 1]}. (4.3.2)
You can verify that this set of points in the normed vector space ℝ² is not arcwise connected but is connected.
Lemma 4.3.12 In a normed vector space, B(z, r) is arcwise connected.
Proof: This is easy from the convexity of the set. If x, y ∈ B(z, r), then let γ(t) = x + t(y − x) for t ∈ [0, 1]. Then
||x + t(y − x) − z|| = ||(1 − t)(x − z) + t(y − z)||
≤ (1 − t)||x − z|| + t||y − z||
< (1 − t)r + tr = r,
showing γ(t) stays in B(z, r).
Proposition 4.3.13 If X is arcwise connected, then it is connected.
Proof: Let X be an arcwise connected set and suppose it is separated. Then
X = A ∪ B where A, B are two separated sets. Pick p ∈ A and q ∈ B. Since
X is given to be arcwise connected, there must exist a continuous function γ :
[a, b] →X such that γ (a) = p and γ (b) = q. But then γ ([a, b]) = (γ ([a, b]) ∩ A) ∪
(γ ([a, b]) ∩ B) and the two sets γ ([a, b]) ∩ A and γ ([a, b]) ∩ B are separated thus
showing that γ ([a, b]) is separated and contradicting Theorem 4.3.8 and Theorem
4.3.5. It follows that X must be connected as claimed.
Theorem 4.3.14 Let U be an open subset of a normed vector space. Then U is
arcwise connected if and only if U is connected. Also the connected components of
an open set are open sets.
Proof: By Proposition 4.3.13 it is only necessary to verify that if U is connected and open in the context of this theorem, then U is arcwise connected. Pick p ∈ U. Say x ∈ U satisfies T if there exists a continuous function γ : [a, b] → U such that γ(a) = p and γ(b) = x. Let
A ≡ {x ∈ U such that x satisfies T}.
If x ∈ A, then Lemma 4.3.12 implies B(x, r) ⊆ U is arcwise connected for small enough r. Thus letting y ∈ B(x, r), there exist intervals [a, b] and [c, d] and continuous functions γ, η having values in U such that γ(a) = p, γ(b) = x, η(c) = x, and η(d) = y. Then let γ₁ : [a, b + d − c] → U be defined as
γ₁(t) ≡ γ(t) if t ∈ [a, b], and γ₁(t) ≡ η(t + c − b) if t ∈ [b, b + d − c].
Then it is clear that γ₁ is a continuous function mapping p to y, showing that B(x, r) ⊆ A. Therefore, A is open. A ≠ ∅ because, since U is open, there is an open ball B(p, δ) containing p which is contained in U and is arcwise connected.
Now consider B ≡ U \ A. I claim this is also open. If B is not open, there exists a point z ∈ B such that every open set containing z is not contained in B. Therefore, letting B(z, δ) be such that z ∈ B(z, δ) ⊆ U, there exist points of A contained in B(z, δ). But then a repeat of the above argument shows z ∈ A also. Hence B is open, and so if B ≠ ∅, then U = B ∪ A and U is separated by the two sets B and A, contradicting the assumption that U is connected.
It remains to verify the connected components are open. Let z ∈ C_p where C_p is the connected component determined by p. Then picking B(z, δ) ⊆ U, C_p ∪ B(z, δ) is connected and contained in U, and so it must also be contained in C_p. Thus z is an interior point of C_p.
As an application, consider the following corollary.
Corollary 4.3.15 Let f : Ω → ℤ be continuous where Ω is a connected open set in a normed vector space. Then f must be a constant.
Proof: Suppose not. Then it achieves two different values, k and l ≠ k. Then Ω = f⁻¹(l) ∪ f⁻¹({m ∈ ℤ : m ≠ l}) and these are disjoint nonempty open sets which separate Ω. To see they are open, note
f⁻¹({m ∈ ℤ : m ≠ l}) = f⁻¹(∪_{m≠l} (m − 1/6, m + 1/6)),
which is the inverse image of an open set, while f⁻¹(l) = f⁻¹((l − 1/6, l + 1/6)), also an open set.
4.4 Uniform Continuity
The concept of uniform continuity is also similar to the one dimensional concept.
Definition 4.4.1 Let f be a function. Then f is uniformly continuous if for every ε > 0, there exists a δ depending only on ε such that if ||x − y|| < δ, then
||f(x) − f(y)|| < ε.
Theorem 4.4.2 Let f : K → 𝔽 be continuous where K is a sequentially compact set in 𝔽ⁿ or more generally a normed vector space. Then f is uniformly continuous on K.
Proof: If this is not true, there exists ε > 0 such that for every δ > 0 there exists a pair of points x_δ and y_δ such that even though ||x_δ − y_δ|| < δ, ||f(x_δ) − f(y_δ)|| ≥ ε. Taking a succession of values for δ equal to 1, 1/2, 1/3, …, and letting the exceptional pair of points for δ = 1/n be denoted by x_n and y_n,
||x_n − y_n|| < 1/n, ||f(x_n) − f(y_n)|| ≥ ε.
Now since K is sequentially compact, there exists a subsequence {x_{n_k}} such that x_{n_k} → z ∈ K. Now n_k ≥ k and so
||x_{n_k} − y_{n_k}|| < 1/k.
Hence
||y_{n_k} − z|| ≤ ||y_{n_k} − x_{n_k}|| + ||x_{n_k} − z|| < 1/k + ||x_{n_k} − z||.
Consequently, y_{n_k} → z also. By continuity of f and Theorem 4.1.2,
0 = ||f(z) − f(z)|| = lim_{k→∞} ||f(x_{n_k}) − f(y_{n_k})|| ≥ ε,
an obvious contradiction. Therefore, the theorem must be true.
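The failure mechanism in this proof, pairs of points drawing together while their images stay apart, is easy to exhibit. The Python sketch below (the function x², the grid, and the bound δ = ε/2 are illustrative choices) shows a single δ serving every point of the compact set [0, 1], while on all of ℝ the pairs x_n = n, y_n = n + 1/n satisfy |x_n − y_n| → 0 yet |f(x_n) − f(y_n)| ≥ 2.

```python
def f(x):
    return x * x

# On [0, 1]: |x^2 - y^2| = |x - y| |x + y| <= 2 |x - y|, so for a given
# eps the single choice delta = eps / 2 works at every point.
eps = 0.1
worst = max(abs(f(k / 1000.0) - f((k + 1) / 1000.0))  # adjacent grid points,
            for k in range(1000))                     # spacing 1/1000 < eps/2
print(worst)

# On all of R uniform continuity fails: the exceptional pairs of the
# proof are x_n = n, y_n = n + 1/n, whose image gap never drops below 2.
gaps = [abs(f(n) - f(n + 1.0 / n)) for n in range(1, 100)]
print(min(gaps))
```

This matches the theorem: [0, 1] is sequentially compact and ℝ is not.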
Recall the closed and bounded subsets of 𝔽ⁿ are those which are sequentially compact.
4.5 Sequences And Series Of Functions
Now it is an easy matter to consider sequences of vector valued functions.
Definition 4.5.1 A sequence of functions is a map defined on ℕ or some set of integers larger than or equal to a given integer m, which has values which are functions. It is written in the form {f_n}_{n=m}^∞ where f_n is a function. It is assumed also that the domain of all these functions is the same.
Here the functions have values in some normed vector space.
The deﬁnition of uniform convergence is exactly the same as earlier only now it
is not possible to draw representative pictures so easily.
Definition 4.5.2 Let {f_n} be a sequence of functions. Then the sequence converges pointwise to a function f if for all x ∈ D, the domain of the functions in the sequence,
f(x) = lim_{n→∞} f_n(x).
Thus you consider for each x ∈ D the sequence of numbers {f_n(x)}, and if this sequence converges for each x ∈ D, the thing it converges to is called f(x).
Definition 4.5.3 Let {f_n} be a sequence of functions defined on D. Then {f_n} is said to converge uniformly to f if it converges pointwise to f and for every ε > 0 there exists N such that for all n ≥ N,
||f(x) − f_n(x)|| < ε
for all x ∈ D.
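The difference between Definition 4.5.2 and Definition 4.5.3 is visible in the classical example f_n(x) = xⁿ. In the Python sketch below (the sampling grids and the cutoff 0.9 are illustrative choices), on [0, 1] the pointwise limit is f = 0 except f(1) = 1 and the sampled sup-gap stays near 1, so the convergence is not uniform; on [0, 0.9] the sup-gap tends to 0.

```python
def sup_gap(n, right_end):
    """Sampled sup over [0, right_end] of |x^n - f(x)|, where f is the
    pointwise limit of x^n on [0, 1]: f = 0 except f(1) = 1."""
    xs = [right_end * k / 10000 for k in range(10001)]
    def limit(x):
        return 1.0 if x == 1.0 else 0.0
    return max(abs(x ** n - limit(x)) for x in xs)

on_01 = [sup_gap(n, 1.0) for n in (5, 50, 500)]   # stays near 1: not uniform
on_09 = [sup_gap(n, 0.9) for n in (5, 50, 500)]   # tends to 0: uniform
print(on_01)
print(on_09)
```

The trouble sits just left of x = 1, where xⁿ is still close to 1 for every fixed n; cutting the interval at 0.9 removes it, which is why the same sequence converges uniformly there.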
Theorem 4.5.4 Let {f_n} be a sequence of continuous functions defined on D and suppose this sequence converges uniformly to f. Then f is also continuous on D. If each f_n is uniformly continuous on D, then f is also uniformly continuous on D.
Proof: Let ε > 0 be given and pick z ∈ D. By uniform convergence, there exists N such that if n > N, then for all x ∈ D,
||f(x) − f_n(x)|| < ε/3. (4.5.3)
Pick such an n. By assumption, f_n is continuous at z. Therefore, there exists δ > 0 such that if ||z − x|| < δ, then
||f_n(x) − f_n(z)|| < ε/3.
It follows that for ||x − z|| < δ,
||f(x) − f(z)|| ≤ ||f(x) − f_n(x)|| + ||f_n(x) − f_n(z)|| + ||f_n(z) − f(z)||
< ε/3 + ε/3 + ε/3 = ε,
which shows that, since ε was arbitrary, f is continuous at z.
In the case where each f_n is uniformly continuous, using the same f_n for which 4.5.3 holds, there exists a δ > 0 such that if ||y − z|| < δ, then
||f_n(z) − f_n(y)|| < ε/3.
Then for ||y − z|| < δ,
||f(y) − f(z)|| ≤ ||f(y) − f_n(y)|| + ||f_n(y) − f_n(z)|| + ||f_n(z) − f(z)||
< ε/3 + ε/3 + ε/3 = ε.
This shows uniform continuity of f.
Definition 4.5.5 Let {f_n} be a sequence of functions defined on D. Then the sequence is said to be uniformly Cauchy if for every ε > 0 there exists N such that whenever m, n ≥ N,
||f_m(x) − f_n(x)|| < ε
for all x ∈ D.
Then the following theorem follows easily.
Theorem 4.5.6 Let ¦f
n
¦ be a uniformly Cauchy sequence of functions deﬁned on
D having values in a complete normed vector space such as F
n
for example. Then
there exists f deﬁned on D such that ¦f
n
¦ converges uniformly to f .
Proof: For each x ∈ D, ¦f
n
(x)¦ is a Cauchy sequence. Therefore, it converges
to some vector f (x). Let ε > 0 be given and let N be such that if n, m ≥ N,
[[f
m
(x) −f
n
(x)[[ < ε/2
for all x ∈ D. Then for any x ∈ D, pick n ≥ N and it follows from Theorem 3.2.8
[[f (x) −f
n
(x)[[ = lim
m→∞
[[f
m
(x) −f
n
(x)[[ ≤ ε/2 < ε.
Corollary 4.5.7 Let $\{f_n\}$ be a uniformly Cauchy sequence of functions continuous on $D$ having values in a complete normed vector space like $\mathbb{F}^n$. Then there exists $f$ defined on $D$ such that $\{f_n\}$ converges uniformly to $f$ and $f$ is continuous. Also, if each $f_n$ is uniformly continuous, then so is $f$.

Proof: This follows from Theorem 4.5.6 and Theorem 4.5.4.
Here is one more fairly obvious theorem.

Theorem 4.5.8 Let $\{f_n\}$ be a sequence of functions defined on $D$ having values in a complete normed vector space like $\mathbb{F}^n$. Then it converges pointwise if and only if the sequence $\{f_n(x)\}$ is a Cauchy sequence for every $x \in D$. It converges uniformly if and only if $\{f_n\}$ is a uniformly Cauchy sequence.

Proof: If the sequence converges pointwise, then by Theorem 3.5.3 the sequence $\{f_n(x)\}$ is a Cauchy sequence for each $x \in D$. Conversely, if $\{f_n(x)\}$ is a Cauchy sequence for each $x \in D$, then $\{f_n(x)\}$ converges for each $x \in D$ because of completeness.

Now suppose $\{f_n\}$ is uniformly Cauchy. Then from Theorem 4.5.6 there exists $f$ such that $\{f_n\}$ converges uniformly on $D$ to $f$. Conversely, if $\{f_n\}$ converges uniformly to $f$ on $D$, then if $\varepsilon > 0$ is given, there exists $N$ such that if $n \ge N$,
$$\|f(x) - f_n(x)\| < \varepsilon/2$$
for every $x \in D$. Then if $m, n \ge N$ and $x \in D$,
$$\|f_n(x) - f_m(x)\| \le \|f_n(x) - f(x)\| + \|f(x) - f_m(x)\| < \varepsilon/2 + \varepsilon/2 = \varepsilon.$$
Thus $\{f_n\}$ is uniformly Cauchy.
Once you understand sequences, it is no problem to consider series.
Definition 4.5.9 Let $\{f_n\}$ be a sequence of functions defined on $D$. Then
$$\left(\sum_{k=1}^{\infty} f_k\right)(x) \equiv \lim_{n\to\infty} \sum_{k=1}^{n} f_k(x) \tag{4.5.4}$$
whenever the limit exists. Thus there is a new function denoted by
$$\sum_{k=1}^{\infty} f_k \tag{4.5.5}$$
and its value at $x$ is given by the limit of the sequence of partial sums in 4.5.4. If for all $x \in D$, the limit in 4.5.4 exists, then 4.5.5 is said to converge pointwise. $\sum_{k=1}^{\infty} f_k$ is said to converge uniformly on $D$ if the sequence of partial sums, $\left\{\sum_{k=1}^{n} f_k\right\}$, converges uniformly. If the indices for the functions start at some other value than 1, you make the obvious modification to the above definition.
Theorem 4.5.10 Let $\{f_n\}$ be a sequence of functions defined on $D$ which have values in a complete normed vector space like $\mathbb{F}^n$. The series $\sum_{k=1}^{\infty} f_k$ converges pointwise if and only if for each $\varepsilon > 0$ and $x \in D$, there exists $N_{\varepsilon,x}$ which may depend on $x$ as well as $\varepsilon$ such that when $q > p \ge N_{\varepsilon,x}$,
$$\left\|\sum_{k=p}^{q} f_k(x)\right\| < \varepsilon.$$
The series $\sum_{k=1}^{\infty} f_k$ converges uniformly on $D$ if for every $\varepsilon > 0$ there exists $N_\varepsilon$ such that if $q > p \ge N_\varepsilon$ then
$$\left\|\sum_{k=p}^{q} f_k(x)\right\| < \varepsilon \tag{4.5.6}$$
for all $x \in D$.

Proof: The first part follows from Theorem 4.5.8. The second part follows from observing the condition is equivalent to the sequence of partial sums forming a uniformly Cauchy sequence, and then by Theorem 4.5.6, these partial sums converge uniformly to a function which is the definition of $\sum_{k=1}^{\infty} f_k$.
Is there an easy way to recognize when 4.5.6 happens? Yes, there is. It is called
the Weierstrass M test.
Theorem 4.5.11 Let $\{f_n\}$ be a sequence of functions defined on $D$ having values in a complete normed vector space like $\mathbb{F}^n$. Suppose there exists $M_n$ such that $\sup\{\|f_n(x)\| : x \in D\} < M_n$ and $\sum_{n=1}^{\infty} M_n$ converges. Then $\sum_{n=1}^{\infty} f_n$ converges uniformly on $D$.
Proof: Let $z \in D$. Then letting $m < n$ and using the triangle inequality,
$$\left\|\sum_{k=1}^{n} f_k(z) - \sum_{k=1}^{m} f_k(z)\right\| \le \sum_{k=m+1}^{n} \|f_k(z)\| \le \sum_{k=m+1}^{\infty} M_k < \varepsilon$$
whenever $m$ is large enough because of the assumption that $\sum_{n=1}^{\infty} M_n$ converges. Therefore, the sequence of partial sums is uniformly Cauchy on $D$ and therefore converges uniformly to $\sum_{k=1}^{\infty} f_k$ on $D$.
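A numerical sketch of the M test may help fix the idea; the series $\sum_k \sin(kx)/k^2$ and the sample grid below are our own illustrative choices, not from the text. Here $M_k = 1/k^2$ dominates $\sup_x |f_k(x)|$, so the uniform Cauchy difference of two partial sums is bounded by a tail of $\sum_k M_k$, independently of $x$.

```python
import math

# Weierstrass M test illustration: f_k(x) = sin(kx)/k^2 on D = [0, 2*pi],
# M_k = 1/k^2, and sum of the M_k converges.
def partial_sum(n, x):
    return sum(math.sin(k * x) / k**2 for k in range(1, n + 1))

xs = [2 * math.pi * i / 200 for i in range(201)]
m, n = 20, 400

# sup over x of |S_n(x) - S_m(x)| is dominated by the tail sum of the M_k,
# a bound that does not depend on x (this is the uniformly Cauchy condition):
sup_diff = max(abs(partial_sum(n, x) - partial_sum(m, x)) for x in xs)
tail = sum(1.0 / k**2 for k in range(m + 1, n + 1))
print(sup_diff <= tail)  # True, by the triangle inequality
```

The tail here is already below $0.05$, so all partial sums past $S_{20}$ stay within that distance of one another everywhere on $D$.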
Theorem 4.5.12 If $\{f_n\}$ is a sequence of continuous functions defined on $D$ and $\sum_{k=1}^{\infty} f_k$ converges uniformly, then the function $\sum_{k=1}^{\infty} f_k$ must also be continuous.

Proof: This follows from Theorem 4.5.4 applied to the sequence of partial sums of the above series, which is assumed to converge uniformly to the function $\sum_{k=1}^{\infty} f_k$.
4.6 Polynomials

General considerations about what a function is have already been considered earlier. For functions of one variable, the special kind of function known as a polynomial has a corresponding version when one considers a function of many variables. This is found in the next definition.
Definition 4.6.1 Let $\alpha$ be an $n$ dimensional multi-index. This means
$$\alpha = (\alpha_1, \cdots, \alpha_n)$$
where each $\alpha_i$ is a positive integer or zero. Also, let
$$|\alpha| \equiv \sum_{i=1}^{n} |\alpha_i|.$$
Then $x^\alpha$ means
$$x^\alpha \equiv x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n}$$
where each $x_j \in \mathbb{F}$. An $n$ dimensional polynomial of degree $m$ is a function of the form
$$p(x) = \sum_{|\alpha| \le m} d_\alpha x^\alpha$$
where the $d_\alpha$ are complex or real numbers. Rational functions are defined as the quotient of two polynomials. Thus these functions are defined on $\mathbb{F}^n$.
For example, $f(x) = x_1 x_2^2 + 7 x_3^4 x_1$ is a polynomial of degree 5 and
$$\frac{x_1 x_2^2 + 7 x_3^4 x_1 + x_2^3}{4 x_1^3 x_2^2 + 7 x_3^2 x_1 - x_2^3}$$
is a rational function.
Note that in the case of a rational function, the domain of the function might not be all of $\mathbb{F}^n$. For example, if
$$f(x) = \frac{x_1 x_2^2 + 7 x_3^4 x_1 + x_2^3}{x_2^2 + 3 x_1^2 - 4},$$
the domain of $f$ would be all complex numbers such that $x_2^2 + 3 x_1^2 \ne 4$.
By Theorem 4.0.2 all polynomials are continuous. To see this, note that the function
$$\pi_k(x) \equiv x_k$$
is a continuous function because of the inequality
$$|\pi_k(x) - \pi_k(y)| = |x_k - y_k| \le |x - y|.$$
Polynomials are simple sums of scalars times products of these functions. Similarly, by this theorem, rational functions, quotients of polynomials, are continuous at points where the denominator is non zero. More generally, if $V$ is a normed vector space, consider a $V$ valued function of the form
$$f(x) \equiv \sum_{|\alpha| \le m} d_\alpha x^\alpha$$
where $d_\alpha \in V$, sort of a $V$ valued polynomial. Then such a function is continuous by application of Theorem 4.0.2 and the above observation about the continuity of the functions $\pi_k$.
Thus there are lots of examples of continuous functions. However, it is even
better than the above discussion indicates. As in the case of a function of one
variable, an arbitrary continuous function can typically be approximated uniformly
by a polynomial. This is the n dimensional version of the Weierstrass approximation
theorem.
4.7 Sequences Of Polynomials, Weierstrass Approximation

Just as an arbitrary continuous function defined on an interval can be approximated uniformly by a polynomial, there exists a similar theorem which is just a generalization of the earlier one which will hold for continuous functions defined on a box or more generally a closed and bounded set. The proof is based on the following lemma.
Lemma 4.7.1 The following estimate holds for $x \in [0, 1]$ and $m \ge 2$.
$$\sum_{k=0}^{m} \binom{m}{k} (k - mx)^2 x^k (1 - x)^{m-k} \le \frac{1}{4} m$$
Proof: First of all, from the binomial theorem,
$$\left(1 - x + e^t x\right)^m = \sum_{k=0}^{m} \binom{m}{k} e^{kt} x^k (1 - x)^{m-k}$$
and so, taking the derivative on both sides with respect to $t$ and then letting $t = 0$,
$$mx = \sum_{k=0}^{m} \binom{m}{k} k x^k (1 - x)^{m-k}.$$
Next take the second derivative of both sides with respect to $t$ and then let $t = 0$. Thus after doing the computations,
$$x^2 m^2 - x^2 m + mx = \sum_{k=0}^{m} \binom{m}{k} k^2 x^k (1 - x)^{m-k}.$$
It follows from the above and the binomial theorem,
$$\sum_{k=0}^{m} \binom{m}{k} (k - mx)^2 x^k (1 - x)^{m-k} = \sum_{k=0}^{m} \binom{m}{k} k^2 x^k (1 - x)^{m-k} + m^2 x^2 - 2mx \sum_{k=0}^{m} \binom{m}{k} k x^k (1 - x)^{m-k}$$
$$= x^2 m^2 - x^2 m + mx + m^2 x^2 - 2 m^2 x^2 = -x^2 m + mx = mx(1 - x),$$
and this achieves its maximum value when $x = 1/2$. Plugging this in, the above is no larger than $\frac{1}{4} m$.
Now let $f$ be a continuous function defined on $[0, 1]$. Let $p_n$ be the polynomial defined by
$$p_n(x) \equiv \sum_{k=0}^{n} \binom{n}{k} f\left(\frac{k}{n}\right) x^k (1 - x)^{n-k}. \tag{4.7.7}$$
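Formula 4.7.7 is directly computable. The following short Python sketch evaluates it for one concrete continuous function; the choice $f(x) = |x - 1/2|$, the grid, and the degrees are our own illustrative picks, not from the text.

```python
from math import comb

# Bernstein polynomial p_n from 4.7.7 for a function f on [0, 1].
def bernstein(f, n, x):
    return sum(comb(n, k) * f(k / n) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

f = lambda x: abs(x - 0.5)          # continuous but not differentiable at 1/2
xs = [i / 200 for i in range(201)]  # sample grid on [0, 1]

def sup_err(n):
    # sampled sup norm of f - p_n over [0, 1]
    return max(abs(f(x) - bernstein(f, n, x)) for x in xs)

print(sup_err(5), sup_err(40), sup_err(200))  # the error shrinks with n
```

The sampled sup error decreases as the degree grows, which is the uniform convergence proved in Lemma 4.7.2 below in the special case $n = 1$ variable.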
Now for $f$ a continuous function defined on $[0, 1]^n$ and for $x = (x_1, \cdots, x_n)$, consider the polynomial
$$p_m(x) \equiv \sum_{k_1=0}^{m} \cdots \sum_{k_n=0}^{m} \binom{m}{k_1} \binom{m}{k_2} \cdots \binom{m}{k_n} x_1^{k_1} (1 - x_1)^{m-k_1} x_2^{k_2} (1 - x_2)^{m-k_2} \cdots x_n^{k_n} (1 - x_n)^{m-k_n} f\left(\frac{k_1}{m}, \cdots, \frac{k_n}{m}\right). \tag{4.7.8}$$
Also define, if $I$ is a set in $\mathbb{R}^n$,
$$\|h\|_I \equiv \sup\{|h(x)| : x \in I\}.$$
Thus $p_m$ converges uniformly to $f$ on a set $I$ if
$$\lim_{m\to\infty} \|p_m - f\|_I = 0.$$
To simplify the notation, let $k = (k_1, \cdots, k_n)$ where each $k_i \in [0, m]$,
$$\frac{k}{m} \equiv \left(\frac{k_1}{m}, \cdots, \frac{k_n}{m}\right),$$
and let
$$\binom{m}{k} \equiv \binom{m}{k_1} \binom{m}{k_2} \cdots \binom{m}{k_n}.$$
Also define
$$\|k\|_\infty \equiv \max\{k_i : i = 1, 2, \cdots, n\},$$
$$x^k (1 - x)^{m-k} \equiv x_1^{k_1} (1 - x_1)^{m-k_1} x_2^{k_2} (1 - x_2)^{m-k_2} \cdots x_n^{k_n} (1 - x_n)^{m-k_n}.$$
Thus in terms of this notation,
$$p_m(x) = \sum_{\|k\|_\infty \le m} \binom{m}{k} x^k (1 - x)^{m-k} f\left(\frac{k}{m}\right).$$
This is the $n$ dimensional version of the Bernstein polynomials which is what results in the case where $n = 1$.
Lemma 4.7.2 For $x \in [0, 1]^n$, $f$ a continuous $\mathbb{F}$ valued function defined on $[0, 1]^n$, and $p_m$ given in 4.7.8, $p_m$ converges uniformly to $f$ on $[0, 1]^n$ as $m \to \infty$.
Proof: The function $f$ is uniformly continuous because it is continuous on the sequentially compact set $[0, 1]^n$. Therefore, there exists $\delta > 0$ such that if $|x - y| < \delta$, then
$$|f(x) - f(y)| < \varepsilon.$$
Denote by $G$ the set of $k$ such that $(k_i - m x_i)^2 < \eta^2 m^2$ for each $i$, where $\eta = \delta/\sqrt{n}$. Note this condition is equivalent to saying that for each $i$, $\left|\frac{k_i}{m} - x_i\right| < \eta$. A short computation shows that by the binomial theorem,
$$\sum_{\|k\|_\infty \le m} \binom{m}{k} x^k (1 - x)^{m-k} = 1$$
and so for $x \in [0, 1]^n$,
$$|p_m(x) - f(x)| \le \sum_{\|k\|_\infty \le m} \binom{m}{k} x^k (1 - x)^{m-k} \left|f\left(\frac{k}{m}\right) - f(x)\right|$$
$$\le \sum_{k \in G} \binom{m}{k} x^k (1 - x)^{m-k} \left|f\left(\frac{k}{m}\right) - f(x)\right| + \sum_{k \in G^C} \binom{m}{k} x^k (1 - x)^{m-k} \left|f\left(\frac{k}{m}\right) - f(x)\right|. \tag{4.7.9}$$
Now for $k \in G$ it follows that for each $i$,
$$\left|\frac{k_i}{m} - x_i\right| < \frac{\delta}{\sqrt{n}} \tag{4.7.10}$$
and so $\left|f\left(\frac{k}{m}\right) - f(x)\right| < \varepsilon$ because the above implies $\left|\frac{k}{m} - x\right| < \delta$. Therefore, the first sum on the right in 4.7.9 is no larger than
$$\sum_{k \in G} \binom{m}{k} x^k (1 - x)^{m-k} \varepsilon \le \sum_{\|k\|_\infty \le m} \binom{m}{k} x^k (1 - x)^{m-k} \varepsilon = \varepsilon.$$
Letting $M \ge \max\{|f(x)| : x \in [0, 1]^n\}$, it follows
$$|p_m(x) - f(x)| \le \varepsilon + 2M \sum_{k \in G^C} \binom{m}{k} x^k (1 - x)^{m-k}$$
$$\le \varepsilon + 2M \left(\frac{1}{\eta^2 m^2}\right)^n \sum_{k \in G^C} \binom{m}{k} \prod_{j=1}^{n} (k_j - m x_j)^2 x^k (1 - x)^{m-k}$$
$$\le \varepsilon + 2M \left(\frac{1}{\eta^2 m^2}\right)^n \sum_{\|k\|_\infty \le m} \binom{m}{k} \prod_{j=1}^{n} (k_j - m x_j)^2 x^k (1 - x)^{m-k}$$
because for each $k \in G^C$,
$$\frac{(k_j - m x_j)^2}{\eta^2 m^2} \ge 1 \text{ for at least one } j.$$
Now by Lemma 4.7.1,
$$|p_m(x) - f(x)| \le \varepsilon + 2M \left(\frac{1}{\eta^2 m^2}\right)^n \left(\frac{m}{4}\right)^n.$$
Therefore, since the right side does not depend on $x$, it follows that for all $m$ sufficiently large,
$$\|p_m - f\|_{[0,1]^n} \le 2\varepsilon$$
and since $\varepsilon$ is arbitrary, this shows $\lim_{m\to\infty} \|p_m - f\|_{[0,1]^n} = 0$.
Theorem 4.7.3 Let $f$ be a continuous function defined on
$$R \equiv \prod_{k=1}^{n} [a_k, b_k].$$
Then there exists a sequence of polynomials $\{p_m\}$ converging uniformly to $f$ on $R$.

Proof: Let $g_k : [0, 1] \to [a_k, b_k]$ be linear, one to one, and onto and let
$$x = g(y) \equiv (g_1(y_1), g_2(y_2), \cdots, g_n(y_n)).$$
Thus $g : [0, 1]^n \to \prod_{k=1}^{n} [a_k, b_k]$ is one to one, onto, and each component function is linear. Then $f \circ g$ is a continuous function defined on $[0, 1]^n$. It follows from Lemma 4.7.2 there exists a sequence of polynomials $\{p_m(y)\}$, each defined on $[0, 1]^n$, which converges uniformly to $f \circ g$ on $[0, 1]^n$. Therefore, $\left\{p_m\left(g^{-1}(x)\right)\right\}$ converges uniformly to $f(x)$ on $R$. But
$$y = (y_1, \cdots, y_n) = \left(g_1^{-1}(x_1), \cdots, g_n^{-1}(x_n)\right)$$
and each $g_k^{-1}$ is linear. Therefore, $\left\{p_m\left(g^{-1}(x)\right)\right\}$ is a sequence of polynomials.
There is a more general version of this theorem which is easy to get. It depends on the Tietze extension theorem, a wonderful little result which is interesting for its own sake.

4.7.1 The Tietze Extension Theorem

To generalize the Weierstrass approximation theorem I will give a special case of the Tietze extension theorem, a very useful result in topology. When this is done, it will be possible to prove the Weierstrass approximation theorem for functions defined on a closed and bounded subset of $\mathbb{R}^n$ rather than a box.
Lemma 4.7.4 Let $S \subseteq \mathbb{R}^n$ be a nonempty subset. Define
$$\operatorname{dist}(x, S) \equiv \inf\{|x - y| : y \in S\}.$$
Then $x \to \operatorname{dist}(x, S)$ is a continuous function satisfying the inequality
$$|\operatorname{dist}(x, S) - \operatorname{dist}(y, S)| \le |x - y|. \tag{4.7.11}$$

Proof: The continuity of $x \to \operatorname{dist}(x, S)$ is obvious if the inequality 4.7.11 is established. So let $x, y \in \mathbb{R}^n$. Without loss of generality, assume $\operatorname{dist}(x, S) \ge \operatorname{dist}(y, S)$ and pick $z \in S$ such that $|y - z| - \varepsilon < \operatorname{dist}(y, S)$. Then
$$|\operatorname{dist}(x, S) - \operatorname{dist}(y, S)| = \operatorname{dist}(x, S) - \operatorname{dist}(y, S) \le |x - z| - (|y - z| - \varepsilon) \le |z - y| + |x - y| - |y - z| + \varepsilon = |x - y| + \varepsilon.$$
Since $\varepsilon$ is arbitrary, this proves 4.7.11.
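For a finite set $S$ the infimum in Lemma 4.7.4 is a minimum, so the distance function and the Lipschitz estimate 4.7.11 can be checked directly. The following minimal Python sketch does this; the random point cloud and the two test points are arbitrary choices of ours.

```python
import math
import random

# dist(x, S) from Lemma 4.7.4, for a finite S in R^2 where inf = min.
def dist(x, S):
    return min(math.dist(x, y) for y in S)

random.seed(0)
S = [(random.random(), random.random()) for _ in range(50)]  # sample set
x, y = (2.0, -1.0), (0.3, 0.7)

# Check the Lipschitz inequality 4.7.11: |dist(x,S) - dist(y,S)| <= |x - y|
lhs = abs(dist(x, S) - dist(y, S))
print(lhs <= math.dist(x, y))  # True
```

The inequality holds for any pair of points, which is exactly what makes $x \to \operatorname{dist}(x, S)$ (uniformly) continuous.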
Lemma 4.7.5 Let $H$, $K$ be two nonempty disjoint closed subsets of $\mathbb{R}^n$. Then there exists a continuous function $g : \mathbb{R}^n \to [-1, 1]$ such that $g(H) = -1/3$, $g(K) = 1/3$, $g(\mathbb{R}^n) \subseteq [-1/3, 1/3]$.

Proof: Let
$$f(x) \equiv \frac{\operatorname{dist}(x, H)}{\operatorname{dist}(x, H) + \operatorname{dist}(x, K)}.$$
The denominator is never equal to zero because if $\operatorname{dist}(x, H) = 0$, then $x \in H$ because $H$ is closed. (To see this, pick $h_k \in B(x, 1/k) \cap H$. Then $h_k \to x$ and since $H$ is closed, $x \in H$.) Similarly, if $\operatorname{dist}(x, K) = 0$, then $x \in K$, and so the denominator is never zero as claimed. Hence $f$ is continuous and from its definition, $f = 0$ on $H$ and $f = 1$ on $K$. Now let $g(x) \equiv \frac{2}{3}\left(f(x) - \frac{1}{2}\right)$. Then $g$ has the desired properties.
Definition 4.7.6 For $f$ a real or complex valued bounded continuous function defined on $M \subseteq \mathbb{R}^n$,
$$\|f\|_M \equiv \sup\{|f(x)| : x \in M\}.$$
Lemma 4.7.7 Suppose $M$ is a closed set in $\mathbb{R}^n$ and suppose $f : M \to [-1, 1]$ is continuous at every point of $M$. Then there exists a function $g$ which is defined and continuous on all of $\mathbb{R}^n$ such that $\|f - g\|_M < \frac{2}{3}$, $g(\mathbb{R}^n) \subseteq [-1/3, 1/3]$.

Proof: Let $H = f^{-1}([-1, -1/3])$, $K = f^{-1}([1/3, 1])$. Thus $H$ and $K$ are disjoint closed subsets of $M$. Suppose first $H$, $K$ are both nonempty. Then by Lemma 4.7.5 there exists $g$ such that $g$ is a continuous function defined on all of $\mathbb{R}^n$ and $g(H) = -1/3$, $g(K) = 1/3$, and $g(\mathbb{R}^n) \subseteq [-1/3, 1/3]$. It follows $\|f - g\|_M < 2/3$. If $H = \emptyset$, then $f$ has all its values in $[-1/3, 1]$ and so letting $g \equiv 1/3$, the desired condition is obtained. If $K = \emptyset$, let $g \equiv -1/3$.
Lemma 4.7.8 Suppose $M$ is a closed set in $\mathbb{R}^n$ and suppose $f : M \to [-1, 1]$ is continuous at every point of $M$. Then there exists a function $g$ which is defined and continuous on all of $\mathbb{R}^n$ such that $g = f$ on $M$ and $g$ has its values in $[-1, 1]$.
Proof: Using Lemma 4.7.7, let $g_1$ be such that $g_1(\mathbb{R}^n) \subseteq [-1/3, 1/3]$ and
$$\|f - g_1\|_M \le \frac{2}{3}.$$
Suppose $g_1, \cdots, g_m$ have been chosen such that $g_j(\mathbb{R}^n) \subseteq [-1/3, 1/3]$ and
$$\left\|f - \sum_{i=1}^{m} \left(\frac{2}{3}\right)^{i-1} g_i\right\|_M < \left(\frac{2}{3}\right)^m. \tag{4.7.12}$$
This has been done for $m = 1$. Then
$$\left\|\left(\frac{3}{2}\right)^m \left(f - \sum_{i=1}^{m} \left(\frac{2}{3}\right)^{i-1} g_i\right)\right\|_M \le 1$$
and so $\left(\frac{3}{2}\right)^m \left(f - \sum_{i=1}^{m} \left(\frac{2}{3}\right)^{i-1} g_i\right)$ can play the role of $f$ in the first step of the proof. Therefore, there exists $g_{m+1}$ defined and continuous on all of $\mathbb{R}^n$ such that its values are in $[-1/3, 1/3]$ and
$$\left\|\left(\frac{3}{2}\right)^m \left(f - \sum_{i=1}^{m} \left(\frac{2}{3}\right)^{i-1} g_i\right) - g_{m+1}\right\|_M \le \frac{2}{3}.$$
Hence
$$\left\|\left(f - \sum_{i=1}^{m} \left(\frac{2}{3}\right)^{i-1} g_i\right) - \left(\frac{2}{3}\right)^m g_{m+1}\right\|_M \le \left(\frac{2}{3}\right)^{m+1}.$$
It follows there exists a sequence $\{g_i\}$ such that each has its values in $[-1/3, 1/3]$ and for every $m$, 4.7.12 holds. Then let
$$g(x) \equiv \sum_{i=1}^{\infty} \left(\frac{2}{3}\right)^{i-1} g_i(x).$$
It follows
$$|g(x)| \le \left|\sum_{i=1}^{\infty} \left(\frac{2}{3}\right)^{i-1} g_i(x)\right| \le \sum_{i=1}^{\infty} \left(\frac{2}{3}\right)^{i-1} \frac{1}{3} \le 1$$
and
$$\left|\left(\frac{2}{3}\right)^{i-1} g_i(x)\right| \le \left(\frac{2}{3}\right)^{i-1} \frac{1}{3},$$
so the Weierstrass M test applies and shows convergence is uniform. Therefore $g$ must be continuous. The estimate 4.7.12 implies $f = g$ on $M$.
The following is the Tietze extension theorem.

Theorem 4.7.9 Let $M$ be a closed nonempty subset of $\mathbb{R}^n$ and let $f : M \to [a, b]$ be continuous at every point of $M$. Then there exists a function $g$ continuous on all of $\mathbb{R}^n$ which coincides with $f$ on $M$ such that $g(\mathbb{R}^n) \subseteq [a, b]$.

Proof: Let $f_1(x) = 1 + \frac{2}{b-a}(f(x) - b)$. Then $f_1$ satisfies the conditions of Lemma 4.7.8 and so there exists $g_1 : \mathbb{R}^n \to [-1, 1]$ such that $g_1$ is continuous on $\mathbb{R}^n$ and equals $f_1$ on $M$. Let $g(x) = (g_1(x) - 1)\left(\frac{b-a}{2}\right) + b$. This works.
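The two affine rescalings in the proof of Theorem 4.7.9 are easy to verify numerically; the concrete values of $a$, $b$ and the sample points below are arbitrary choices of ours. The first map sends $[a, b]$ onto $[-1, 1]$ and the second undoes it, which is exactly why $g = f$ on $M$.

```python
# Rescalings from the proof of Theorem 4.7.9:
#   f_1(x) = 1 + (2/(b-a))(f(x) - b)      maps [a, b] -> [-1, 1]
#   g(x)   = (g_1(x) - 1)((b-a)/2) + b    maps [-1, 1] -> [a, b]
a, b = 2.0, 7.0
to_pm1 = lambda t: 1 + (2 / (b - a)) * (t - b)   # forward rescaling
back   = lambda s: (s - 1) * ((b - a) / 2) + b   # inverse rescaling

endpoints_ok = abs(to_pm1(a) + 1.0) < 1e-12 and abs(to_pm1(b) - 1.0) < 1e-12
round_trip_ok = all(abs(back(to_pm1(t)) - t) < 1e-12 for t in [2.0, 3.5, 7.0])
print(endpoints_ok, round_trip_ok)  # True True
```

So whenever $g_1 = f_1$ on $M$, composing with the inverse rescaling gives $g = f$ there, and $g_1(\mathbb{R}^n) \subseteq [-1, 1]$ forces $g(\mathbb{R}^n) \subseteq [a, b]$.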
With the Tietze extension theorem, here is a better version of the Weierstrass approximation theorem.

Theorem 4.7.10 Let $K$ be a closed and bounded subset of $\mathbb{R}^n$ and let $f : K \to \mathbb{R}$ be continuous. Then there exists a sequence of polynomials $\{p_m\}$ such that
$$\lim_{m\to\infty} \left(\sup\{|f(x) - p_m(x)| : x \in K\}\right) = 0.$$
In other words, the sequence of polynomials converges uniformly to $f$ on $K$.

Proof: By the Tietze extension theorem, there exists an extension of $f$ to a continuous function $g$ defined on all of $\mathbb{R}^n$ such that $g = f$ on $K$. Now since $K$ is bounded, there exist intervals $[a_k, b_k]$ such that
$$K \subseteq \prod_{k=1}^{n} [a_k, b_k] = R.$$
Then by the Weierstrass approximation theorem, Theorem 4.7.3, there exists a sequence of polynomials $\{p_m\}$ converging uniformly to $g$ on $R$. Therefore, this sequence of polynomials converges uniformly to $g = f$ on $K$ as well.

By considering the real and imaginary parts of a function which has values in $\mathbb{C}$, one can generalize the above theorem.
Corollary 4.7.11 Let $K$ be a closed and bounded subset of $\mathbb{R}^n$ and let $f : K \to \mathbb{F}$ be continuous. Then there exists a sequence of polynomials $\{p_m\}$ such that
$$\lim_{m\to\infty} \left(\sup\{|f(x) - p_m(x)| : x \in K\}\right) = 0.$$
In other words, the sequence of polynomials converges uniformly to $f$ on $K$.
4.8 The Operator Norm

It is important to be able to measure the size of a linear operator. The most convenient way is described in the next definition.

Definition 4.8.1 Let $V, W$ be two finite dimensional normed vector spaces having norms $\|\cdot\|_V$ and $\|\cdot\|_W$ respectively. Let $L \in \mathcal{L}(V, W)$. Then the operator norm of $L$, denoted by $\|L\|$, is defined as
$$\|L\| \equiv \sup\{\|Lx\|_W : \|x\|_V \le 1\}.$$
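Definition 4.8.1 can be approximated directly by sampling the unit sphere. The following hedged Python sketch does this for one concrete $2 \times 2$ real matrix with the Euclidean norm on both domain and range; the matrix, the test vector, and the sampling resolution are our own choices, not from the text.

```python
import math

# Estimate ||L|| = sup{||Lx|| : ||x|| <= 1} for a 2x2 matrix by sampling
# the unit circle (the sup over the closed unit ball is attained on it).
A = [[3.0, 1.0], [0.0, 2.0]]

def apply(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

est = max(norm(apply(A, (math.cos(t), math.sin(t))))
          for t in [2 * math.pi * j / 10000 for j in range(10000)])

# Property 2 of Theorem 4.8.3 below: ||Lx|| <= ||L|| ||x|| for every x.
x = (1.0, -2.0)
print(norm(apply(A, x)) <= est * norm(x))  # True
```

The sampled supremum is a lower bound for the true operator norm that becomes sharp as the sampling is refined; for this matrix it sits near the largest singular value, about $3.26$.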
Then the following theorem discusses the main properties of this norm. In the future, I will dispense with the subscript on the symbols for the norm because it is clear from the context which norm is meant. Here is a useful lemma.

Lemma 4.8.2 Let $V$ be a normed vector space having a basis $\{v_1, \cdots, v_n\}$. Let
$$A = \left\{a \in \mathbb{F}^n : \left\|\sum_{k=1}^{n} a_k v_k\right\| \le 1\right\}$$
where $a = (a_1, \cdots, a_n)$. Then $A$ is a closed and bounded subset of $\mathbb{F}^n$.
Proof: First suppose $a \notin A$. Then
$$\left\|\sum_{k=1}^{n} a_k v_k\right\| > 1.$$
Then for $b = (b_1, \cdots, b_n)$, and using the triangle inequality,
$$\left\|\sum_{k=1}^{n} b_k v_k\right\| = \left\|\sum_{k=1}^{n} (a_k - (a_k - b_k)) v_k\right\| \ge \left\|\sum_{k=1}^{n} a_k v_k\right\| - \sum_{k=1}^{n} |a_k - b_k| \|v_k\|,$$
and now it is apparent that if $|a - b|$ is sufficiently small so that each $|a_k - b_k|$ is small enough, this expression is larger than 1. Thus there exists $\delta > 0$ such that $B(a, \delta) \subseteq A^C$, showing that $A^C$ is open. Therefore, $A$ is closed.
Next consider the claim that $A$ is bounded. Suppose this is not so. Then there exists a sequence $\{a_k\}$ of points of $A$,
$$a_k = \left(a_k^1, \cdots, a_k^n\right),$$
such that $\lim_{k\to\infty} |a_k| = \infty$. Then from the definition of $A$,
$$\left\|\sum_{j=1}^{n} \frac{a_k^j}{|a_k|} v_j\right\| \le \frac{1}{|a_k|}. \tag{4.8.13}$$
Let
$$b_k = \left(\frac{a_k^1}{|a_k|}, \cdots, \frac{a_k^n}{|a_k|}\right).$$
Then $|b_k| = 1$, so $b_k$ is contained in the closed and bounded set $S(0, 1)$ which is sequentially compact in $\mathbb{F}^n$. It follows there exists a subsequence, still denoted by $\{b_k\}$, such that it converges to $b \in S(0, 1)$. Passing to the limit in 4.8.13, using the following inequality
$$\left\|\sum_{j=1}^{n} \frac{a_k^j}{|a_k|} v_j - \sum_{j=1}^{n} b^j v_j\right\| \le \sum_{j=1}^{n} \left|\frac{a_k^j}{|a_k|} - b^j\right| \|v_j\|$$
to see that the sum converges to $\sum_{j=1}^{n} b^j v_j$, it follows
$$\sum_{j=1}^{n} b^j v_j = 0$$
and this is a contradiction because $\{v_1, \cdots, v_n\}$ is a basis and not all the $b^j$ can equal zero. Therefore, $A$ must be bounded after all.
Theorem 4.8.3 The operator norm has the following properties.

1. $\|L\| < \infty$.

2. For all $x \in X$, $\|Lx\| \le \|L\| \|x\|$, and if $L \in \mathcal{L}(V, W)$ while $M \in \mathcal{L}(W, Z)$, then $\|ML\| \le \|M\| \|L\|$.

3. $\|\cdot\|$ is a norm. In particular,

   (a) $\|L\| \ge 0$ and $\|L\| = 0$ if and only if $L = 0$, the linear transformation which sends every vector to 0.

   (b) $\|aL\| = |a| \|L\|$ whenever $a \in \mathbb{F}$.

   (c) $\|L + M\| \le \|L\| + \|M\|$.

4. If $L \in \mathcal{L}(V, W)$ for $V, W$ normed vector spaces, $L$ is continuous, meaning that $L^{-1}(U)$ is open whenever $U$ is an open set in $W$.
Proof: First consider 1.). Let $A$ be as in the above lemma. Then
$$\|L\| \equiv \sup\left\{\left\|L\left(\sum_{j=1}^{n} a_j v_j\right)\right\| : a \in A\right\} = \sup\left\{\left\|\sum_{j=1}^{n} a_j L(v_j)\right\| : a \in A\right\} < \infty$$
because $a \to \left\|\sum_{j=1}^{n} a_j L(v_j)\right\|$ is a real valued continuous function defined on a sequentially compact set and so it achieves its maximum.
Next consider 2.). If $x = 0$ there is nothing to show. Assume $x \ne 0$. Then from the definition of $\|L\|$,
$$\left\|L\left(\frac{x}{\|x\|}\right)\right\| \le \|L\|$$
and so, since $L$ is linear, you can multiply on both sides by $\|x\|$ and conclude
$$\|L(x)\| \le \|L\| \|x\|.$$
For the other claim,
$$\|ML\| \equiv \sup\{\|ML(x)\| : \|x\| \le 1\} \le \|M\| \sup\{\|Lx\| : \|x\| \le 1\} \equiv \|M\| \|L\|.$$
Finally consider 3.). If $\|L\| = 0$ then from 2.), $\|Lx\| \le 0$ and so $Lx = 0$ for every $x$, which is the same as saying $L = 0$. Conversely, if $Lx = 0$ for every $x$, then $\|L\| = 0$ by definition. Let $a \in \mathbb{F}$. Then from the properties of the norm in the vector space,
$$\|aL\| \equiv \sup\{\|aLx\| : \|x\| \le 1\} = \sup\{|a| \|Lx\| : \|x\| \le 1\} = |a| \sup\{\|Lx\| : \|x\| \le 1\} \equiv |a| \|L\|.$$
Finally consider the triangle inequality.
$$\|L + M\| \equiv \sup\{\|Lx + Mx\| : \|x\| \le 1\} \le \sup\{\|Mx\| + \|Lx\| : \|x\| \le 1\} \le \sup\{\|Lx\| : \|x\| \le 1\} + \sup\{\|Mx\| : \|x\| \le 1\}$$
because $\|Lx\| \le \sup\{\|Lx\| : \|x\| \le 1\}$, with a similar inequality holding for $M$. Therefore, by definition,
$$\|L + M\| \le \|L\| + \|M\|.$$
Finally consider 4.). Let $L \in \mathcal{L}(V, W)$ and let $U$ be open in $W$ and $v \in L^{-1}(U)$. Thus since $U$ is open, there exists $\delta > 0$ such that
$$L(v) \in B(L(v), \delta) \subseteq U.$$
Then if $w \in V$,
$$\|L(v - w)\| = \|L(v) - L(w)\| \le \|L\| \|v - w\|$$
and so if $\|v - w\|$ is sufficiently small, $\|v - w\| < \delta/\|L\|$, then $L(w) \in B(L(v), \delta)$, which shows $B(v, \delta/\|L\|) \subseteq L^{-1}(U)$, and since $v \in L^{-1}(U)$ was arbitrary, this shows $L^{-1}(U)$ is open.

The operator norm will be very important in the chapter on the derivative.
Part 1.) of Theorem 4.8.3 says that if $L \in \mathcal{L}(V, W)$ where $V$ and $W$ are two normed vector spaces, then there exists $K$ such that for all $v \in V$,
$$\|Lv\|_W \le K \|v\|_V.$$
An obvious case is to let $L = \operatorname{id}$, the identity map on $V$, and let there be two different norms on $V$, $\|\cdot\|_1$ and $\|\cdot\|_2$. Thus $(V, \|\cdot\|_1)$ is a normed vector space and so is $(V, \|\cdot\|_2)$. Then Theorem 4.8.3 implies that
$$\|v\|_2 = \|\operatorname{id}(v)\|_2 \le K_2 \|v\|_1 \tag{4.8.14}$$
while the same reasoning implies there exists $K_1$ such that
$$\|v\|_1 \le K_1 \|v\|_2. \tag{4.8.15}$$
This leads to the following important theorem.

Theorem 4.8.4 Let $V$ be a finite dimensional vector space and let $\|\cdot\|_1$ and $\|\cdot\|_2$ be two norms for $V$. Then these norms are equivalent, which means there exist constants $\delta, \Delta$ such that for all $v \in V$,
$$\delta \|v\|_1 \le \|v\|_2 \le \Delta \|v\|_1.$$
A set $K$ is sequentially compact if and only if it is closed and bounded. Also every finite dimensional normed vector space is complete. Also any closed and bounded subset of a finite dimensional normed vector space is sequentially compact.
Proof: From 4.8.14 and 4.8.15,
$$\|v\|_1 \le K_1 \|v\|_2 \le K_1 K_2 \|v\|_1$$
and so
$$\frac{1}{K_1} \|v\|_1 \le \|v\|_2 \le K_2 \|v\|_1.$$
Next consider the claim that all closed and bounded sets in a finite dimensional normed vector space are sequentially compact. Let $L : \mathbb{F}^n \to V$ be defined by
$$L(a) \equiv \sum_{k=1}^{n} a_k v_k$$
where $\{v_1, \cdots, v_n\}$ is a basis for $V$. Thus $L \in \mathcal{L}(\mathbb{F}^n, V)$ and so by Theorem 4.8.3 this is a continuous function. Hence if $K$ is a closed and bounded subset of $V$ it follows
$$L^{-1}(K) = \mathbb{F}^n \setminus L^{-1}\left(K^C\right) = \mathbb{F}^n \setminus (\text{an open set}) = \text{a closed set}.$$
Also $L^{-1}(K)$ is bounded. To see this, note that $L$ is one to one and onto $V$ and so $L^{-1} \in \mathcal{L}(V, \mathbb{F}^n)$. Therefore,
$$\left|L^{-1}(v)\right| \le \left\|L^{-1}\right\| \|v\| \le \left\|L^{-1}\right\| r$$
where $K \subseteq B(0, r)$. Since $K$ is bounded, such an $r$ exists. Thus $L^{-1}(K)$ is a closed and bounded subset of $\mathbb{F}^n$ and is therefore sequentially compact. It follows that if $\{v_k\}_{k=1}^{\infty} \subseteq K$, there is a subsequence $\{v_{k_l}\}_{l=1}^{\infty}$ such that $\left\{L^{-1} v_{k_l}\right\}$ converges to a point $a \in L^{-1}(K)$. Hence by continuity of $L$,
$$v_{k_l} = L\left(L^{-1}(v_{k_l})\right) \to La \in K.$$
Conversely, suppose $K$ is sequentially compact. I need to verify it is closed and bounded. If it is not closed, then it is missing a limit point, $k_0$. Since $k_0$ is a limit point, there exists $k_n \in B\left(k_0, \frac{1}{n}\right)$ such that $k_n \ne k_0$. Therefore, $\{k_n\}$ has no limit point in $K$ because $k_0 \notin K$. It follows $K$ must be closed. If $K$ is not bounded, then you could pick $k_m \in K$ such that $k_m \notin B(0, m)$, and it follows $\{k_m\}$ cannot have a subsequence which converges, because if $k \in K$, then for large enough $m$, $k \in B(0, m/2)$ and so if $\left\{k_{m_j}\right\}$ is any subsequence, $k_{m_j} \notin B(0, m)$ for all but finitely many $j$. In other words, for any $k \in K$, it is not the limit of any subsequence. Thus $K$ must also be bounded.
Finally consider the claim about completeness. Let $\{v_k\}_{k=1}^{\infty}$ be a Cauchy sequence in $V$. Since $L^{-1}$, defined above, is in $\mathcal{L}(V, \mathbb{F}^n)$, it follows $\left\{L^{-1} v_k\right\}_{k=1}^{\infty}$ is a Cauchy sequence in $\mathbb{F}^n$. This follows from the inequality
$$\left|L^{-1} v_k - L^{-1} v_l\right| \le \left\|L^{-1}\right\| \|v_k - v_l\|.$$
Therefore, there exists $a \in \mathbb{F}^n$ such that $L^{-1} v_k \to a$ and since $L$ is continuous,
$$v_k = L\left(L^{-1}(v_k)\right) \to L(a).$$
Next suppose $K$ is a closed and bounded subset of $V$ and let $\{x_k\}_{k=1}^{\infty}$ be a sequence of vectors in $K$. Let $\{v_1, \cdots, v_n\}$ be a basis for $V$ and let
$$x_k = \sum_{j=1}^{n} x_k^j v_j.$$
Define a norm for $V$ according to
$$\|x\| \equiv \left(\sum_{j=1}^{n} \left|x^j\right|^2\right)^{1/2}, \quad x = \sum_{j=1}^{n} x^j v_j.$$
It is clear most axioms of a norm hold. The triangle inequality also holds because, by the triangle inequality for $\mathbb{F}^n$,
$$\|x + y\| \equiv \left(\sum_{j=1}^{n} \left|x^j + y^j\right|^2\right)^{1/2} \le \left(\sum_{j=1}^{n} \left|x^j\right|^2\right)^{1/2} + \left(\sum_{j=1}^{n} \left|y^j\right|^2\right)^{1/2} \equiv \|x\| + \|y\|.$$
By the first part of this theorem, this norm is equivalent to the norm on $V$. Thus $K$ is closed and bounded with respect to this new norm. It follows that for each $j$, $\left\{x_k^j\right\}_{k=1}^{\infty}$ is a bounded sequence in $\mathbb{F}$, and so by the theorems about sequential compactness in $\mathbb{F}$ it follows, upon taking subsequences $n$ times, there exists a subsequence $x_{k_l}$ such that for each $j$,
$$\lim_{l\to\infty} x_{k_l}^j = x^j$$
for some $x^j$. Hence
$$\lim_{l\to\infty} x_{k_l} = \lim_{l\to\infty} \sum_{j=1}^{n} x_{k_l}^j v_j = \sum_{j=1}^{n} x^j v_j \in K$$
because $K$ is closed.
Example 4.8.5 Let $V$ be a vector space and let $\{v_1, \cdots, v_n\}$ be a basis. Define a norm on $V$ as follows. For $v = \sum_{k=1}^{n} a_k v_k$,
$$\|v\| \equiv \max\{|a_k| : k = 1, \cdots, n\}.$$
In the above example, this is a norm on the vector space $V$. It is clear $\|av\| = |a| \|v\|$ and that $\|v\| \ge 0$ and equals 0 if and only if $v = 0$. The hard part is the triangle inequality. Let $v = \sum_{k=1}^{n} a_k v_k$ and $w = \sum_{k=1}^{n} b_k v_k$. Then
$$\|v + w\| \equiv \max_k \{|a_k + b_k|\} \le \max_k \{|a_k| + |b_k|\} \le \max_k |a_k| + \max_k |b_k| \equiv \|v\| + \|w\|.$$
This shows this is indeed a norm.
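For the max norm of Example 4.8.5 on $\mathbb{R}^n$ with the standard basis, the equivalence constants of Theorem 4.8.4 against the Euclidean norm can be written down explicitly: $\|v\|_\infty \le \|v\|_2 \le \sqrt{n}\,\|v\|_\infty$, so $\delta = 1$ and $\Delta = \sqrt{n}$. Here is a minimal numerical check; the dimension and the random sampling are our own choices.

```python
import math
import random

# Check ||v||_inf <= ||v||_2 <= sqrt(n) * ||v||_inf on random vectors in R^3.
random.seed(1)
n = 3
ok = True
for _ in range(1000):
    v = [random.uniform(-10, 10) for _ in range(n)]
    inf_norm = max(abs(c) for c in v)
    two_norm = math.sqrt(sum(c * c for c in v))
    ok = ok and (inf_norm <= two_norm <= math.sqrt(n) * inf_norm)
print(ok)  # True
```

The general theorem says such constants exist for any pair of norms on a finite dimensional space, even when no closed-form expression for them is available.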
4.9 Ascoli Arzela Theorem

Let $\{f_k\}_{k=1}^{\infty}$ be a sequence of functions defined on a compact set which have values in a finite dimensional normed vector space $V$. The following definition will be of the norm of such a function.

Definition 4.9.1 Let $f : K \to V$ be a continuous function which has values in a finite dimensional normed vector space $V$, where here $K$ is a compact set contained in some normed vector space. Define
$$\|f\| \equiv \sup\{\|f(x)\|_V : x \in K\}.$$
Denote the set of such functions by $C(K; V)$.
Proposition 4.9.2 The above definition yields a norm and in fact $C(K; V)$ is a complete normed linear space.

Proof: This is obviously a vector space. Just verify the axioms. The main thing to show is that the above is a norm. First note that $\|f\| = 0$ if and only if $f = 0$ and $\|\alpha f\| = |\alpha| \|f\|$ whenever $\alpha \in \mathbb{F}$, the field of scalars, $\mathbb{C}$ or $\mathbb{R}$. As to the triangle inequality,
$$\|f + g\| \equiv \sup\{\|(f + g)(x)\| : x \in K\} \le \sup\{\|f(x)\|_V : x \in K\} + \sup\{\|g(x)\|_V : x \in K\} \equiv \|f\| + \|g\|.$$
Furthermore, the function $x \to \|f(x)\|_V$ is continuous thanks to the triangle inequality, which implies
$$\left|\|f(x)\|_V - \|f(y)\|_V\right| \le \|f(x) - f(y)\|_V.$$
Therefore, $\|f\|$ is a well defined nonnegative real number.

It remains to verify completeness. Suppose then $\{f_k\}$ is a Cauchy sequence with respect to this norm. Then from the definition it is a uniformly Cauchy sequence and since by Theorem 4.8.4 $V$ is a complete normed vector space, it follows from Theorem 4.5.6 there exists $f \in C(K; V)$ such that $\{f_k\}$ converges uniformly to $f$. That is,
$$\lim_{k\to\infty} \|f - f_k\| = 0.$$
This proves the proposition.
Theorem 4.8.4 says that closed and bounded sets in a ﬁnite dimensional normed
vector space V are sequentially compact. This theorem typically doesn’t apply to
C (K; V ) because generally this is not a ﬁnite dimensional vector space although
as shown above it is a complete normed vector space. It turns out you need more
than closed and bounded in order to have a subset of C (K; V ) be sequentially
compact.
Definition 4.9.3 Let $T \subseteq C(K; V)$. Then $T$ is equicontinuous if for every $\varepsilon > 0$ there exists $\delta > 0$ such that whenever $|x - y| < \delta$, it follows
$$\|f(x) - f(y)\| < \varepsilon$$
for all $f \in T$. $T$ is bounded if there exists $C$ such that
$$\|f\| \le C$$
for all $f \in T$.
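The point of equicontinuity is that a single $\delta$ serves every member of the family at once. A hedged numerical sketch, with our own choice of family: $f_k(x) = \sin(kx)/k$ on $[0, 1]$ is equicontinuous, since $|f_k'| \le 1$ for every $k$ gives $|f_k(x) - f_k(y)| \le |x - y|$ uniformly in $k$, so $\delta = \varepsilon$ works for the whole family.

```python
import math

# Equicontinuous family f_k(x) = sin(kx)/k: one delta for all k.
def f(k, x):
    return math.sin(k * x) / k

eps = 1e-3
delta = eps                     # a single delta serving every k
pairs = [(x, x + delta / 2) for x in [i / 100 for i in range(100)]]
worst = max(abs(f(k, x) - f(k, y))
            for k in range(1, 200) for (x, y) in pairs)
print(worst < eps)  # True: the bound is uniform over the whole family
```

By contrast, for a family like $g_k(x) = \sin(kx)$ the required $\delta$ shrinks with $k$, so no single $\delta$ works and the family is not equicontinuous.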
Lemma 4.9.4 Let $K$ be a sequentially compact nonempty subset of a finite dimensional normed vector space. Then there exists a countable set $D \equiv \{k_i\}_{i=1}^{\infty}$ such that for every $\varepsilon > 0$ and for every $x \in K$,
$$B(x, \varepsilon) \cap D \ne \emptyset.$$

Proof: Let $n \in \mathbb{N}$. Pick $k_1^n \in K$. If $B\left(k_1^n, \frac{1}{n}\right) \supseteq K$, stop. Otherwise pick
$$k_2^n \in K \setminus B\left(k_1^n, \frac{1}{n}\right).$$
Continue this way till the process ends. It must end because if it didn't, there would exist a convergent subsequence which would imply two of the $k_j^n$ would have to be closer than $1/n$, which is impossible from the construction. Denote this collection of points by $D_n$. Then $D \equiv \cup_{n=1}^{\infty} D_n$. This must work because if $\varepsilon > 0$ is given and $x \in K$, let $1/n < \varepsilon/3$ and the construction implies $x \in B(k_i^n, 1/n)$ for some $k_i^n \in D_n \subseteq D$. Then
$$k_i^n \in B(x, \varepsilon).$$
$D$ is countable because it is the countable union of finite sets.
Definition 4.9.5 More generally, if $K$ is any subset of a normed vector space and there exists a countable set $D$ such that for all $x \in K$ and every $\varepsilon > 0$,
$$B(x, \varepsilon) \cap D \ne \emptyset,$$
then $K$ is called separable.
Now here is another remarkable result about equicontinuous functions.
Lemma 4.9.6 Suppose $\{f_k\}_{k=1}^{\infty}$ is equicontinuous and the functions are defined on a sequentially compact set $K$. Suppose also for each $x \in K$,
$$\lim_{k\to\infty} f_k(x) = f(x).$$
Then in fact $f$ is continuous and the convergence is uniform. That is,
$$\lim_{k\to\infty} \|f_k - f\| = 0.$$

Proof: Uniform convergence would say that for every $\varepsilon > 0$, there exists $n_\varepsilon$ such that if $k, l \ge n_\varepsilon$, then for all $x \in K$,
$$\|f_k(x) - f_l(x)\| < \varepsilon.$$
Thus if the given sequence does not converge uniformly, there exists $\varepsilon > 0$ such that for all $n$, there exist $k, l \ge n$ and $x_n \in K$ such that
$$\|f_k(x_n) - f_l(x_n)\| \ge \varepsilon.$$
Since $K$ is sequentially compact, there exists a subsequence, still denoted by $\{x_n\}$, such that $\lim_{n\to\infty} x_n = x \in K$. Then letting $k, l$ be associated with $n$ as just described,
$$\varepsilon \le \|f_k(x_n) - f_l(x_n)\|_V \le \|f_k(x_n) - f_k(x)\|_V + \|f_k(x) - f_l(x)\|_V + \|f_l(x) - f_l(x_n)\|_V.$$
By equicontinuity, if $n$ is large enough, this implies
$$\varepsilon < \frac{\varepsilon}{3} + \|f_k(x) - f_l(x)\|_V + \frac{\varepsilon}{3}$$
and now taking $n$ still larger if necessary, the middle term on the right in the above is also less than $\varepsilon/3$, which yields a contradiction. Hence convergence is uniform and so it follows from Theorem 4.5.6 the function $f$ is actually continuous and
$$\lim_{k\to\infty} \|f - f_k\| = 0.$$
The Ascoli Arzela theorem is the following.
Theorem 4.9.7 Let K be a closed and bounded subset of a finite dimensional normed vector space and let $\mathcal{F} \subseteq C(K; V)$ where V is a finite dimensional normed vector space. Suppose also that $\mathcal{F}$ is bounded and equicontinuous. Then if $\{f_k\}_{k=1}^{\infty} \subseteq \mathcal{F}$, there exists a subsequence $\{f_{k_l}\}_{l=1}^{\infty}$ which converges to a function $f \in C(K; V)$ in the sense that
$$\lim_{l\to\infty} \|f - f_{k_l}\| = 0.$$
Proof: Denote by $\{f_{(k,n)}\}_{n=1}^{\infty}$ a subsequence of $\{f_{(k-1,n)}\}_{n=1}^{\infty}$ where the index denoted by $(k-1, k-1)$ is always less than the index denoted by $(k, k)$. Also let the countable dense subset of Lemma 4.9.4 be $D = \{d_k\}_{k=1}^{\infty}$. Then consider the following diagram.
$$\begin{array}{ll} f_{(1,1)},\ f_{(1,2)},\ f_{(1,3)},\ f_{(1,4)},\ \cdots & \to d_1 \\ f_{(2,1)},\ f_{(2,2)},\ f_{(2,3)},\ f_{(2,4)},\ \cdots & \to d_1, d_2 \\ f_{(3,1)},\ f_{(3,2)},\ f_{(3,3)},\ f_{(3,4)},\ \cdots & \to d_1, d_2, d_3 \\ f_{(4,1)},\ f_{(4,2)},\ f_{(4,3)},\ f_{(4,4)},\ \cdots & \to d_1, d_2, d_3, d_4 \\ \vdots & \end{array}$$
The meaning is as follows. $\{f_{(1,k)}\}_{k=1}^{\infty}$ is a subsequence of the original sequence which converges at $d_1$. Such a subsequence exists because $\{f_k(d_1)\}_{k=1}^{\infty}$ is contained in a bounded set, so a subsequence converges by Theorem 4.8.4. (It is given to be in a bounded set and so the closure of this bounded set is both closed and bounded, hence sequentially compact.) Now $\{f_{(2,k)}\}_{k=1}^{\infty}$ is a subsequence of the first subsequence which converges at $d_2$. Then by Theorem 3.2.6 this new subsequence continues to converge at $d_1$. Thus, as indicated by the diagram, it converges at both $d_1$ and
$d_2$. Continuing this way explains the meaning of the diagram. Now consider the diagonal subsequence of the original sequence, $\{f_{(k,k)}\}_{k=1}^{\infty}$. For $k \geq n$, this subsequence is a subsequence of the subsequence $\{f_{(n,k)}\}_{k=1}^{\infty}$ and so it converges at $d_1, d_2, \cdots, d_n$. This being true for all n, it follows $\{f_{(k,k)}\}_{k=1}^{\infty}$ converges at every point of D. To save on notation, I shall simply denote this as $\{f_k\}$.
Then letting $d \in D$,
$$\|f_k(x) - f_l(x)\|_V \leq \|f_k(x) - f_k(d)\|_V + \|f_k(d) - f_l(d)\|_V + \|f_l(d) - f_l(x)\|_V$$
Picking d close enough to x and applying equicontinuity,
$$\|f_k(x) - f_l(x)\|_V < 2\varepsilon/3 + \|f_k(d) - f_l(d)\|_V$$
Thus for k, l large enough, the right side is less than $\varepsilon$. This shows that for each $x \in K$, $\{f_k(x)\}_{k=1}^{\infty}$ is a Cauchy sequence and so by completeness of V this converges. Let $f(x)$ be the thing to which it converges. Then f is continuous and the convergence is uniform by Lemma 4.9.6.
4.10 Exercises
1. In Theorem 4.7.3 it is assumed f has values in $\mathbb{F}$. Show there is no change if f has values in V, a normed vector space, provided you redefine a polynomial to be something of the form
$$\sum_{|\alpha| \leq m} a_{\alpha} x^{\alpha}$$
where $a_{\alpha} \in V$.
2. How would you generalize the conclusion of Corollary 4.7.11 to include the
situation where f has values in a ﬁnite dimensional normed vector space?
3. If $\{f_n\}$ and $\{g_n\}$ are sequences of $\mathbb{F}^n$ valued functions defined on D which converge uniformly, show that if a, b are constants, then $af_n + bg_n$ also converges uniformly. If there exists a constant M such that $|f_n(x)|, |g_n(x)| < M$ for all n and for all $x \in D$, show $\{f_n g_n\}$ converges uniformly. Let $f_n(x) \equiv 1/|x|$ for $x \in B(0, 1)$ and let $g_n(x) \equiv (n-1)/n$. Show $\{f_n\}$ converges uniformly on $B(0, 1)$ and $\{g_n\}$ converges uniformly but $\{f_n g_n\}$ fails to converge uniformly.
4. Formulate a theorem for series of functions of n variables which will allow you
to conclude the inﬁnite series is uniformly continuous based on reasonable
assumptions about the functions in the sum.
5. If f and g are real valued functions which are continuous on some set D, show that
$$\min(f, g),\ \max(f, g)$$
are also continuous. Generalize this to any finite collection of continuous functions. Hint: Note $\max(f, g) = \frac{|f - g| + f + g}{2}$. Now recall the triangle inequality, which can be used to show $x \mapsto |x|$ is a continuous function.
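The identity in the hint is easy to sanity-check numerically. Below is a small sketch in plain Python (the function names and the sample functions sin and cos are my choices for illustration, not part of the exercise):

```python
# Numerical check of the identity max(f, g) = (|f - g| + f + g) / 2,
# applied pointwise to the sample continuous functions sin and cos.
import math

def max_formula(a, b):
    # the identity from the hint, applied to two real numbers
    return (abs(a - b) + a + b) / 2

for x in [0.0, 0.5, 1.0, 2.0, -3.0]:
    f, g = math.sin(x), math.cos(x)
    assert abs(max_formula(f, g) - max(f, g)) < 1e-12
    # the analogous identity for min uses a minus sign on |f - g|
    assert abs((f + g - abs(f - g)) / 2 - min(f, g)) < 1e-12
```

Since $|\cdot|$, sums, and quotients by 2 preserve continuity, the identity immediately gives continuity of the pointwise max.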
6. Find an example of a sequence of continuous functions defined on $\mathbb{R}^n$ such that each function is nonnegative and each function has a maximum value equal to 1, but the sequence of functions converges to 0 pointwise on $\mathbb{R}^n \setminus \{0\}$, that is, the set of vectors in $\mathbb{R}^n$ excluding 0.
7. Theorem 4.3.14 says an open subset U of $\mathbb{R}^n$ is arcwise connected if and only if U is connected. Consider the usual Cartesian coordinates relative to axes $x_1, \cdots, x_n$. A square curve is one consisting of a succession of straight line segments each of which is parallel to some coordinate axis. Show an open subset U of $\mathbb{R}^n$ is connected if and only if every two points can be joined by a square curve.
8. Let $x \to h(x)$ be a bounded continuous function. Show the function
$$f(x) = \sum_{n=1}^{\infty} \frac{h(nx)}{n^2}$$
is continuous.
9. Let S be any countable subset of $\mathbb{R}^n$. Show there exists a function f defined on $\mathbb{R}^n$ which is discontinuous at every point of S but continuous everywhere else. Hint: This is real easy if you do the right thing. It involves the Weierstrass M test.
10. By Theorem 4.7.10 there exists a sequence of polynomials converging uniformly to $f(x) = |x|$ on $R \equiv \prod_{k=1}^{n} [-M, M]$. Show there exists a sequence of polynomials $\{p_n\}$ converging uniformly to f on R which has the additional property that for all n, $p_n(0) = 0$.
11. If f is any continuous function defined on K, a sequentially compact subset of $\mathbb{R}^n$, show there exists a series of the form $\sum_{k=1}^{\infty} p_k$, where each $p_k$ is a polynomial, which converges uniformly to f on K. Hint: You should use the Weierstrass approximation theorem to obtain a sequence of polynomials. Then arrange it so the limit of this sequence is an infinite sum.
12. A function f is Hölder continuous if there exists a constant K such that
$$|f(x) - f(y)| \leq K|x - y|^{\alpha}$$
for some $\alpha \leq 1$, for all x, y. Show every Hölder continuous function is uniformly continuous.
13. Consider $f(x) \equiv \mathrm{dist}(x, S)$ where S is a nonempty subset of $\mathbb{R}^n$. Show f is uniformly continuous.
14. Let K be a sequentially compact set in a normed vector space V and let
f : V →W be continuous where W is also a normed vector space. Show f (K)
is also sequentially compact.
15. If f is uniformly continuous, does it follow that $|f|$ is also uniformly continuous? If $|f|$ is uniformly continuous, does it follow that f is uniformly continuous? Answer the same questions with “uniformly continuous” replaced with “continuous”. Explain why.
16. Let $f : D \to \mathbb{R}$ be a function. This function is said to be lower semicontinuous¹ at $x \in D$ if for any sequence $\{x_n\} \subseteq D$ which converges to x it follows
$$f(x) \leq \liminf_{n\to\infty} f(x_n).$$
Suppose D is sequentially compact and f is lower semicontinuous at every point of D. Show that then f achieves its minimum on D.
17. Let $f : D \to \mathbb{R}$ be a function. This function is said to be upper semicontinuous at $x \in D$ if for any sequence $\{x_n\} \subseteq D$ which converges to x it follows
$$f(x) \geq \limsup_{n\to\infty} f(x_n).$$
Suppose D is sequentially compact and f is upper semicontinuous at every point of D. Show that then f achieves its maximum on D.
18. Show that a real valued function defined on $D \subseteq \mathbb{R}^n$ is continuous if and only if it is both upper and lower semicontinuous.
19. Show that a real valued lower semicontinuous function defined on a sequentially compact set achieves its minimum and that an upper semicontinuous function defined on a sequentially compact set achieves its maximum.
20. Give an example of a lower semicontinuous function defined on $\mathbb{R}^n$ which is not continuous and an example of an upper semicontinuous function which is not continuous.
21. Suppose $\{f_{\alpha} : \alpha \in \Lambda\}$ is a collection of continuous functions. Let
$$F(x) \equiv \inf\{f_{\alpha}(x) : \alpha \in \Lambda\}$$
Show F is an upper semicontinuous function. Next let
$$G(x) \equiv \sup\{f_{\alpha}(x) : \alpha \in \Lambda\}$$
Show G is a lower semicontinuous function.
22. Let f be a function. $\mathrm{epi}(f)$ is defined as
$$\{(x, y) : y \geq f(x)\}.$$
It is called the epigraph of f. We say $\mathrm{epi}(f)$ is closed if whenever $(x_n, y_n) \in \mathrm{epi}(f)$ and $x_n \to x$ and $y_n \to y$, it follows $(x, y) \in \mathrm{epi}(f)$. Show f is lower semicontinuous if and only if $\mathrm{epi}(f)$ is closed. What would be the corresponding result equivalent to upper semicontinuous?
¹The notion of lower semicontinuity is very important for functions which are defined on infinite dimensional sets. In more general settings, one formulates the concept differently.
23. The operator norm was defined for L(V, W) above. This is the usual norm used for this vector space of linear transformations. Show that any other norm used on L(V, W) is equivalent to the operator norm. That is, show that if $\|\cdot\|_1$ is another norm, there exist scalars $\delta, \Delta$ such that
$$\delta\|L\| \leq \|L\|_1 \leq \Delta\|L\|$$
for all $L \in L(V, W)$ where here $\|\cdot\|$ denotes the operator norm.
24. One alternative norm which is very popular is as follows. Let $L \in L(V, W)$ and let $(l_{ij})$ denote the matrix of L with respect to some bases. Then the Frobenius norm is defined by
$$\left(\sum_{ij} |l_{ij}|^2\right)^{1/2} \equiv \|L\|_F.$$
Show this is a norm. Other norms are of the form
$$\left(\sum_{ij} |l_{ij}|^p\right)^{1/p}$$
where $p \geq 1$, or even
$$\|L\|_{\infty} = \max_{ij} |l_{ij}|.$$
Show these are also norms.
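These norms are all straightforward to compute, and comparing them on a concrete matrix also hints at the norm equivalence asked for in Problem 23. A plain-Python sketch (the helper names are mine, chosen for illustration):

```python
def frobenius_norm(M):
    # (sum over all entries of |l_ij|^2)^{1/2}
    return sum(abs(x) ** 2 for row in M for x in row) ** 0.5

def p_norm(M, p):
    # (sum over all entries of |l_ij|^p)^{1/p}, for p >= 1
    return sum(abs(x) ** p for row in M for x in row) ** (1 / p)

def max_norm(M):
    # max over all entries of |l_ij|
    return max(abs(x) for row in M for x in row)

L = [[1.0, -2.0], [0.0, 2.0]]
# for a 2 x 2 matrix: max_norm <= frobenius_norm <= 2 * max_norm,
# a special case of the equivalence of norms in finite dimensions
assert max_norm(L) <= frobenius_norm(L) <= 2 * max_norm(L)
assert abs(p_norm(L, 2) - frobenius_norm(L)) < 1e-12
```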
25. Explain why L(V, W) is always a complete normed vector space whenever
V, W are ﬁnite dimensional normed vector spaces for any choice of norm for
L(V, W). Also explain why every closed and bounded subset of L(V, W) is
sequentially compact for any choice of norm on this space.
26. Let $L \in L(V, V)$ where V is a finite dimensional normed vector space. Define
$$e^L \equiv \sum_{k=0}^{\infty} \frac{L^k}{k!}$$
Explain the meaning of this infinite sum and show it converges in $L(V, V)$ for any choice of norm on this space. Now tell how to define $\sin(L)$.
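The convergence is absolute since $\|L^k/k!\| \leq \|L\|^k/k!$, so the partial sums are directly computable. A minimal plain-Python sketch (function names and the truncation point N = 20 are my choices for illustration):

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_add(A, B):
    n = len(A)
    return [[A[i][j] + B[i][j] for j in range(n)] for i in range(n)]

def mat_exp(L, N=20):
    # partial sum of e^L: sum_{k=0}^{N} L^k / k!; the k-th term is built
    # from the previous one by multiplying by L and dividing by k
    n = len(L)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # L^0 = I
    total = [row[:] for row in term]
    for k in range(1, N + 1):
        term = mat_mul(term, L)
        term = [[x / k for x in row] for row in term]
        total = mat_add(total, term)
    return total

# 1 x 1 sanity check: e^{[1]} should be [[e]]
import math
E = mat_exp([[1.0]])
assert abs(E[0][0] - math.e) < 1e-12
```

For sin(L), the same partial-sum idea applies to $\sum_k (-1)^k L^{2k+1}/(2k+1)!$.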
27. Let X be a finite dimensional normed vector space, real or complex. Show that X is separable. Hint: Let $\{v_i\}_{i=1}^{n}$ be a basis and define a map $\theta$ from $\mathbb{F}^n$ to X as follows: $\theta\left(\sum_{k=1}^{n} x_k e_k\right) \equiv \sum_{k=1}^{n} x_k v_k$. Show $\theta$ is continuous and has a continuous inverse. Now let D be a countable dense set in $\mathbb{F}^n$ and consider $\theta(D)$.
28. Let $B(X; \mathbb{R}^n)$ be the space of functions f mapping X to $\mathbb{R}^n$ such that
$$\sup\{|f(x)| : x \in X\} < \infty.$$
Show $B(X; \mathbb{R}^n)$ is a complete normed linear space if we define
$$\|f\| \equiv \sup\{|f(x)| : x \in X\}.$$
29. Let $\alpha \in (0, 1]$. Define, for X a compact subset of $\mathbb{R}^p$,
$$C^{\alpha}(X; \mathbb{R}^n) \equiv \{f \in C(X; \mathbb{R}^n) : \rho_{\alpha}(f) + \|f\| \equiv \|f\|_{\alpha} < \infty\}$$
where
$$\|f\| \equiv \sup\{|f(x)| : x \in X\}$$
and
$$\rho_{\alpha}(f) \equiv \sup\left\{\frac{|f(x) - f(y)|}{|x - y|^{\alpha}} : x, y \in X,\ x \neq y\right\}.$$
Show that $(C^{\alpha}(X; \mathbb{R}^n), \|\cdot\|_{\alpha})$ is a complete normed linear space. This is called a Hölder space. What would this space consist of if $\alpha > 1$?
30. Let $\{f_n\}_{n=1}^{\infty} \subseteq C^{\alpha}(X; \mathbb{R}^n)$ where X is a compact subset of $\mathbb{R}^p$ and suppose
$$\|f_n\|_{\alpha} \leq M$$
for all n. Show there exists a subsequence $n_k$ such that $f_{n_k}$ converges in $C(X; \mathbb{R}^n)$. The given sequence is precompact when this happens. (This also shows the embedding of $C^{\alpha}(X; \mathbb{R}^n)$ into $C(X; \mathbb{R}^n)$ is a compact embedding.) Hint: You might want to use the Ascoli Arzela theorem.
31. This problem is for those who know about the derivative and the integral of a function of one variable. Let $f : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ be continuous and bounded and let $x_0 \in \mathbb{R}^n$. If $x : [0, T] \to \mathbb{R}^n$ and $h > 0$, let
$$\tau_h x(s) \equiv \begin{cases} x_0 & \text{if } s \leq h, \\ x(s - h) & \text{if } s > h. \end{cases}$$
For $t \in [0, T]$, let
$$x_h(t) = x_0 + \int_0^t f(s, \tau_h x_h(s))\, ds.$$
Show using the Ascoli Arzela theorem that there exists a sequence $h \to 0$ such that
$$x_h \to x$$
in $C([0, T]; \mathbb{R}^n)$. Next argue
$$x(t) = x_0 + \int_0^t f(s, x(s))\, ds$$
and conclude the following theorem. If $f : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ is continuous and bounded, and if $x_0 \in \mathbb{R}^n$ is given, there exists a solution to the following initial value problem.
$$x' = f(t, x),\quad t \in [0, T], \qquad x(0) = x_0.$$
This is the Peano existence theorem for ordinary differential equations.
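The delayed scheme is explicitly computable, because the integrand at time s only involves the approximate solution at the earlier time s − h. The sketch below (plain Python, left Riemann sums; I chose f(t, x) = x with $x_0 = 1$ purely for illustration since the limiting solution is then $e^t$; this particular f is not bounded, but on the bounded range explored here the scheme behaves the same):

```python
# Approximate x_h(t) = x0 + integral_0^t f(s, tau_h x_h(s)) ds on a time grid,
# for f(t, x) = x and x0 = 1; the exact limiting solution is x(t) = e^t.
import math

def delayed_solution(h, T=1.0, dt=1e-3, x0=1.0):
    steps = int(T / dt)
    xs = [x0]                     # xs[i] approximates x_h(i * dt)
    integral = 0.0
    for i in range(steps):
        s = i * dt
        # tau_h x_h(s): x0 for s <= h, otherwise the value from time s - h
        delayed = x0 if s <= h else xs[int((s - h) / dt)]
        integral += delayed * dt  # f(s, x) = x, left Riemann sum
        xs.append(x0 + integral)
    return xs[-1]

# as h -> 0 the value at t = 1 approaches e, illustrating x_h -> x
assert abs(delayed_solution(h=0.01) - math.e) < 0.05
assert abs(delayed_solution(h=0.001) - math.e) < abs(delayed_solution(h=0.1) - math.e)
```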
32. Let $D(x_0, r)$ be the closed ball in $\mathbb{R}^n$,
$$\{x : |x - x_0| \leq r\},$$
where this is the usual norm coming from the dot product. Let $P : \mathbb{R}^n \to D(x_0, r)$ be defined by
$$P(x) \equiv \begin{cases} x & \text{if } x \in D(x_0, r), \\ x_0 + r\,\dfrac{x - x_0}{|x - x_0|} & \text{if } x \notin D(x_0, r). \end{cases}$$
Show that $|Px - Py| \leq |x - y|$ for all $x, y \in \mathbb{R}^n$.
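The claimed nonexpansiveness is easy to probe numerically before attempting the proof. A plain-Python sketch (the sample points and helper name are arbitrary choices of mine):

```python
# Check |Px - Py| <= |x - y| for the radial projection onto a closed ball.
import math, itertools

def project(x, center, r):
    # P from the exercise: identity inside the ball, radial projection outside
    d = math.dist(x, center)
    if d <= r:
        return list(x)
    return [c + r * (xi - c) / d for xi, c in zip(x, center)]

center, r = [0.0, 0.0], 1.0
pts = [[0.3, 0.1], [2.0, 0.0], [-3.0, 4.0], [0.0, 5.0], [0.9, -0.1]]
for x, y in itertools.combinations(pts, 2):
    Px, Py = project(x, center, r), project(y, center, r)
    assert math.dist(Px, Py) <= math.dist(x, y) + 1e-12
```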
33. Use Problem 31 to obtain local solutions to the initial value problem where
f is not assumed to be bounded. It is only assumed to be continuous. This
means there is a small interval whose length is perhaps not T such that the
solution to the diﬀerential equation exists on this small interval.
The Mathematical Theory Of Determinants, Basic Linear Algebra
5.1 The Function $\mathrm{sgn}_n$
It is easiest to give a different definition of the determinant which is clearly well defined and then prove the earlier one in terms of Laplace expansion. Let $(i_1, \cdots, i_n)$ be an ordered list of numbers from $\{1, \cdots, n\}$. This means the order is important, so (1, 2, 3) and (2, 1, 3) are different. There will be some repetition between this section and the earlier section on determinants. The main purpose is to give all the missing proofs. Two books which give a good introduction to determinants are Apostol [2] and Rudin [39]. A recent book which also has a good introduction is Baker [4].
The following lemma will be essential in the definition of the determinant.
Lemma 5.1.1 There exists a unique function $\mathrm{sgn}_n$ which maps each list of numbers from $\{1, \cdots, n\}$ to one of the three numbers 0, 1, or −1 which also has the following properties.
$$\mathrm{sgn}_n(1, \cdots, n) = 1 \qquad (5.1.1)$$
$$\mathrm{sgn}_n(i_1, \cdots, p, \cdots, q, \cdots, i_n) = -\mathrm{sgn}_n(i_1, \cdots, q, \cdots, p, \cdots, i_n) \qquad (5.1.2)$$
In words, the second property states that if two of the numbers are switched, the value of the function is multiplied by −1. Also, in the case where n > 1 and $\{i_1, \cdots, i_n\} = \{1, \cdots, n\}$ so that every number from $\{1, \cdots, n\}$ appears in the ordered list $(i_1, \cdots, i_n)$,
$$\mathrm{sgn}_n(i_1, \cdots, i_{\theta-1}, n, i_{\theta+1}, \cdots, i_n) \equiv (-1)^{n-\theta}\,\mathrm{sgn}_{n-1}(i_1, \cdots, i_{\theta-1}, i_{\theta+1}, \cdots, i_n) \qquad (5.1.3)$$
where $n = i_{\theta}$ in the ordered list $(i_1, \cdots, i_n)$.
Proof: To begin with, it is necessary to show the existence of such a function. This is clearly true if n = 1. Define $\mathrm{sgn}_1(1) \equiv 1$ and observe that it works. No switching is possible. In the case where n = 2, it is also clearly true. Let $\mathrm{sgn}_2(1, 2) = 1$ and $\mathrm{sgn}_2(2, 1) = -1$ while $\mathrm{sgn}_2(2, 2) = \mathrm{sgn}_2(1, 1) = 0$ and verify it works. Assuming such a function exists for n, $\mathrm{sgn}_{n+1}$ will be defined in terms of $\mathrm{sgn}_n$. If there are any repeated numbers in $(i_1, \cdots, i_{n+1})$, $\mathrm{sgn}_{n+1}(i_1, \cdots, i_{n+1}) \equiv 0$. If there are no repeats, then n + 1 appears somewhere in the ordered list. Let $\theta$ be the position of the number n + 1 in the list. Thus, the list is of the form $(i_1, \cdots, i_{\theta-1}, n+1, i_{\theta+1}, \cdots, i_{n+1})$. From 5.1.3 it must be that
$$\mathrm{sgn}_{n+1}(i_1, \cdots, i_{\theta-1}, n+1, i_{\theta+1}, \cdots, i_{n+1}) \equiv (-1)^{n+1-\theta}\,\mathrm{sgn}_n(i_1, \cdots, i_{\theta-1}, i_{\theta+1}, \cdots, i_{n+1}).$$
It is necessary to verify this satisfies 5.1.1 and 5.1.2 with n replaced with n + 1. The first of these is obviously true because
$$\mathrm{sgn}_{n+1}(1, \cdots, n, n+1) \equiv (-1)^{n+1-(n+1)}\,\mathrm{sgn}_n(1, \cdots, n) = 1.$$
If there are repeated numbers in $(i_1, \cdots, i_{n+1})$, then it is obvious 5.1.2 holds because both sides would equal zero from the above definition. It remains to verify 5.1.2 in the case where there are no numbers repeated in $(i_1, \cdots, i_{n+1})$. Consider
$$\mathrm{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{p}, \cdots, \overset{s}{q}, \cdots, i_{n+1}\right),$$
where the r above the p indicates the number p is in the $r^{th}$ position and the s above the q indicates that the number q is in the $s^{th}$ position. Suppose first that $r < \theta < s$. Then
$$\mathrm{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{p}, \cdots, \overset{\theta}{n+1}, \cdots, \overset{s}{q}, \cdots, i_{n+1}\right) \equiv (-1)^{n+1-\theta}\,\mathrm{sgn}_n\left(i_1, \cdots, \overset{r}{p}, \cdots, \overset{s-1}{q}, \cdots, i_{n+1}\right)$$
while
$$\mathrm{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{q}, \cdots, \overset{\theta}{n+1}, \cdots, \overset{s}{p}, \cdots, i_{n+1}\right) = (-1)^{n+1-\theta}\,\mathrm{sgn}_n\left(i_1, \cdots, \overset{r}{q}, \cdots, \overset{s-1}{p}, \cdots, i_{n+1}\right)$$
and so, by induction, a switch of p and q introduces a minus sign in the result. Similarly, if $\theta > s$ or if $\theta < r$ it also follows that 5.1.2 holds. The interesting case is when $\theta = r$ or $\theta = s$. Consider the case where $\theta = r$ and note the other case is entirely similar.
$$\mathrm{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{n+1}, \cdots, \overset{s}{q}, \cdots, i_{n+1}\right) = (-1)^{n+1-r}\,\mathrm{sgn}_n\left(i_1, \cdots, \overset{s-1}{q}, \cdots, i_{n+1}\right) \qquad (5.1.4)$$
while
$$\mathrm{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{q}, \cdots, \overset{s}{n+1}, \cdots, i_{n+1}\right) = (-1)^{n+1-s}\,\mathrm{sgn}_n\left(i_1, \cdots, \overset{r}{q}, \cdots, i_{n+1}\right). \qquad (5.1.5)$$
By making $s - 1 - r$ switches, move the q which is in the $(s-1)^{th}$ position in 5.1.4 to the $r^{th}$ position as in 5.1.5. By induction, each of these switches introduces a factor of −1 and so
$$\mathrm{sgn}_n\left(i_1, \cdots, \overset{s-1}{q}, \cdots, i_{n+1}\right) = (-1)^{s-1-r}\,\mathrm{sgn}_n\left(i_1, \cdots, \overset{r}{q}, \cdots, i_{n+1}\right).$$
Therefore,
$$\mathrm{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{n+1}, \cdots, \overset{s}{q}, \cdots, i_{n+1}\right) = (-1)^{n+1-r}\,\mathrm{sgn}_n\left(i_1, \cdots, \overset{s-1}{q}, \cdots, i_{n+1}\right)$$
$$= (-1)^{n+1-r}(-1)^{s-1-r}\,\mathrm{sgn}_n\left(i_1, \cdots, \overset{r}{q}, \cdots, i_{n+1}\right) = (-1)^{n+s}\,\mathrm{sgn}_n\left(i_1, \cdots, \overset{r}{q}, \cdots, i_{n+1}\right)$$
$$= (-1)^{2s-1}(-1)^{n+1-s}\,\mathrm{sgn}_n\left(i_1, \cdots, \overset{r}{q}, \cdots, i_{n+1}\right) = -\mathrm{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{q}, \cdots, \overset{s}{n+1}, \cdots, i_{n+1}\right).$$
This proves the existence of the desired function.
To see this function is unique, note that you can obtain any ordered list of distinct numbers from a sequence of switches. If there exist two functions f and g both satisfying 5.1.1 and 5.1.2, you could start with $f(1, \cdots, n) = g(1, \cdots, n)$ and, applying the same sequence of switches, eventually arrive at $f(i_1, \cdots, i_n) = g(i_1, \cdots, i_n)$. If any numbers are repeated, then 5.1.2 gives both functions are equal to zero for that ordered list.
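The recursion 5.1.3, together with $\mathrm{sgn}_1(1) = 1$ and the value 0 on repeats, pins the function down completely, and it can be transcribed directly into code. A plain-Python sketch (my own transcription; lists are assumed to have length n with entries from $\{1, \cdots, n\}$):

```python
def sgn(lst):
    # sgn_n of Lemma 5.1.1: 0 if any entry repeats; otherwise locate the
    # largest entry n at 1-based position theta and apply 5.1.3:
    # sgn_n(...) = (-1)^(n - theta) * sgn_{n-1}(list with n removed)
    n = len(lst)
    if len(set(lst)) < n:
        return 0
    if n == 1:
        return 1
    theta = lst.index(n) + 1            # 1-based position of n in the list
    rest = lst[:theta - 1] + lst[theta:]
    return (-1) ** (n - theta) * sgn(rest)

assert sgn([1, 2, 3]) == 1      # property 5.1.1
assert sgn([2, 1, 3]) == -1     # one switch flips the sign, property 5.1.2
assert sgn([1, 1, 3]) == 0      # repeated entries give 0
```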
5.2 The Determinant
5.2.1 The Deﬁnition
In what follows, sgn will often be used rather than $\mathrm{sgn}_n$ because the context supplies the appropriate n.
Definition 5.2.1 Let f be a real valued function which has the set of ordered lists of numbers from $\{1, \cdots, n\}$ as its domain. Define
$$\sum_{(k_1, \cdots, k_n)} f(k_1, \cdots, k_n)$$
to be the sum of all the $f(k_1, \cdots, k_n)$ for all possible choices of ordered lists $(k_1, \cdots, k_n)$ of numbers of $\{1, \cdots, n\}$. For example,
$$\sum_{(k_1, k_2)} f(k_1, k_2) = f(1, 2) + f(2, 1) + f(1, 1) + f(2, 2).$$
Definition 5.2.2 Let $(a_{ij}) = A$ denote an n × n matrix. The determinant of A, denoted by det(A), is defined by
$$\det(A) \equiv \sum_{(k_1, \cdots, k_n)} \mathrm{sgn}(k_1, \cdots, k_n)\, a_{1k_1} \cdots a_{nk_n}$$
where the sum is taken over all ordered lists of numbers from $\{1, \cdots, n\}$. Note it suffices to take the sum over only those ordered lists in which there are no repeats because if there are, $\mathrm{sgn}(k_1, \cdots, k_n) = 0$ and so that term contributes 0 to the sum.
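Since lists with repeats contribute nothing, the definition can be transcribed as a sum over permutations. A plain-Python sketch (indices 0-based; the parity-by-inversion-count helper is a standard equivalent of sgn, not the text's recursion):

```python
from itertools import permutations
from math import prod

def sgn(p):
    # sign of a permutation computed from its inversion count
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

def det(A):
    # det(A) = sum over (k_1,...,k_n) of sgn(k_1,...,k_n) a_{1 k_1} ... a_{n k_n}
    n = len(A)
    return sum(sgn(k) * prod(A[i][k[i]] for i in range(n)) for k in permutations(range(n)))

assert det([[1, 2], [3, 4]]) == 1 * 4 - 2 * 3            # the familiar 2 x 2 formula
assert det([[2, 0, 0], [0, 3, 0], [0, 0, 5]]) == 30      # diagonal case
```

The n! terms make this impractical beyond small n, but it is a faithful rendering of the definition.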
Let A be an n × n matrix, $A = (a_{ij})$, and let $(r_1, \cdots, r_n)$ denote an ordered list of n numbers from $\{1, \cdots, n\}$. Let $A(r_1, \cdots, r_n)$ denote the matrix whose $k^{th}$ row is the $r_k^{th}$ row of the matrix A. Thus
$$\det(A(r_1, \cdots, r_n)) = \sum_{(k_1, \cdots, k_n)} \mathrm{sgn}(k_1, \cdots, k_n)\, a_{r_1 k_1} \cdots a_{r_n k_n} \qquad (5.2.6)$$
and
$$A(1, \cdots, n) = A.$$
5.2.2 Permuting Rows Or Columns
Proposition 5.2.3 Let
$$(r_1, \cdots, r_n)$$
be an ordered list of numbers from $\{1, \cdots, n\}$. Then
$$\mathrm{sgn}(r_1, \cdots, r_n)\det(A) = \sum_{(k_1, \cdots, k_n)} \mathrm{sgn}(k_1, \cdots, k_n)\, a_{r_1 k_1} \cdots a_{r_n k_n} \qquad (5.2.7)$$
$$= \det(A(r_1, \cdots, r_n)). \qquad (5.2.8)$$
Proof: Let $(1, \cdots, n) = (1, \cdots, r, \cdots, s, \cdots, n)$ so $r < s$.
$$\det(A(1, \cdots, r, \cdots, s, \cdots, n)) = \qquad (5.2.9)$$
$$\sum_{(k_1, \cdots, k_n)} \mathrm{sgn}(k_1, \cdots, k_r, \cdots, k_s, \cdots, k_n)\, a_{1k_1} \cdots a_{rk_r} \cdots a_{sk_s} \cdots a_{nk_n},$$
and renaming the variables, calling $k_s$, $k_r$ and $k_r$, $k_s$, this equals
$$= \sum_{(k_1, \cdots, k_n)} \mathrm{sgn}(k_1, \cdots, k_s, \cdots, k_r, \cdots, k_n)\, a_{1k_1} \cdots a_{rk_s} \cdots a_{sk_r} \cdots a_{nk_n}$$
$$= \sum_{(k_1, \cdots, k_n)} -\mathrm{sgn}\Bigl(k_1, \cdots, \overbrace{k_r, \cdots, k_s}^{\text{these got switched}}, \cdots, k_n\Bigr)\, a_{1k_1} \cdots a_{sk_r} \cdots a_{rk_s} \cdots a_{nk_n}$$
$$= -\det(A(1, \cdots, s, \cdots, r, \cdots, n)). \qquad (5.2.10)$$
Consequently,
$$\det(A(1, \cdots, s, \cdots, r, \cdots, n)) = -\det(A(1, \cdots, r, \cdots, s, \cdots, n)) = -\det(A)$$
Now letting $A(1, \cdots, s, \cdots, r, \cdots, n)$ play the role of A, and continuing in this way, switching pairs of numbers,
$$\det(A(r_1, \cdots, r_n)) = (-1)^p \det(A)$$
where it took p switches to obtain $(r_1, \cdots, r_n)$ from $(1, \cdots, n)$. By Lemma 5.1.1, this implies
$$\det(A(r_1, \cdots, r_n)) = (-1)^p \det(A) = \mathrm{sgn}(r_1, \cdots, r_n)\det(A)$$
and proves the proposition in the case when there are no repeated numbers in the ordered list $(r_1, \cdots, r_n)$. However, if there is a repeat, say the $r^{th}$ row equals the $s^{th}$ row, then the reasoning of 5.2.9 and 5.2.10 shows that $\det(A(r_1, \cdots, r_n)) = 0$ and also $\mathrm{sgn}(r_1, \cdots, r_n) = 0$, so the formula holds in this case also.
Observation 5.2.4 There are n! ordered lists of distinct numbers from $\{1, \cdots, n\}$.
To see this, consider n slots placed in order. There are n choices for the first slot. For each of these choices, there are n − 1 choices for the second. Thus there are n(n − 1) ways to fill the first two slots. Then for each of these ways there are n − 2 choices left for the third slot. Continuing this way, there are n! ordered lists of distinct numbers from $\{1, \cdots, n\}$ as stated in the observation.
5.2.3 A Symmetric Deﬁnition
With the above, it is possible to give a more symmetric description of the determinant from which it will follow that $\det(A) = \det(A^T)$.
.
Corollary 5.2.5 The following formula for det(A) is valid.
$$\det(A) = \frac{1}{n!} \sum_{(r_1, \cdots, r_n)} \sum_{(k_1, \cdots, k_n)} \mathrm{sgn}(r_1, \cdots, r_n)\,\mathrm{sgn}(k_1, \cdots, k_n)\, a_{r_1 k_1} \cdots a_{r_n k_n}. \qquad (5.2.11)$$
And also $\det(A^T) = \det(A)$ where $A^T$ is the transpose of A. (Recall that for $A^T = (a_{ij}^T)$, $a_{ij}^T = a_{ji}$.)
Proof: From Proposition 5.2.3, if the $r_i$ are distinct,
$$\det(A) = \sum_{(k_1, \cdots, k_n)} \mathrm{sgn}(r_1, \cdots, r_n)\,\mathrm{sgn}(k_1, \cdots, k_n)\, a_{r_1 k_1} \cdots a_{r_n k_n}.$$
Summing over all ordered lists $(r_1, \cdots, r_n)$ where the $r_i$ are distinct (if the $r_i$ are not distinct, $\mathrm{sgn}(r_1, \cdots, r_n) = 0$ and so there is no contribution to the sum),
$$n!\det(A) = \sum_{(r_1, \cdots, r_n)} \sum_{(k_1, \cdots, k_n)} \mathrm{sgn}(r_1, \cdots, r_n)\,\mathrm{sgn}(k_1, \cdots, k_n)\, a_{r_1 k_1} \cdots a_{r_n k_n}.$$
This proves the corollary since the formula gives the same number for A as it does for $A^T$.
5.2.4 The Alternating Property Of The Determinant
Corollary 5.2.6 If two rows or two columns in an n × n matrix A are switched, the determinant of the resulting matrix equals (−1) times the determinant of the original matrix. If A is an n × n matrix in which two rows are equal or two columns are equal then det(A) = 0. Suppose the $i^{th}$ row of A equals $(xa_1 + yb_1, \cdots, xa_n + yb_n)$. Then
$$\det(A) = x\det(A_1) + y\det(A_2)$$
where the $i^{th}$ row of $A_1$ is $(a_1, \cdots, a_n)$ and the $i^{th}$ row of $A_2$ is $(b_1, \cdots, b_n)$, all other rows of $A_1$ and $A_2$ coinciding with those of A. In other words, det is a linear function of each row of A. The same is true with the word “row” replaced with the word “column”.
Proof: By Proposition 5.2.3, when two rows are switched, the determinant of the resulting matrix is (−1) times the determinant of the original matrix. By Corollary 5.2.5 the same holds for columns because the columns of the matrix equal the rows of the transposed matrix. Thus if $A_1$ is the matrix obtained from A by switching two columns,
$$\det(A) = \det(A^T) = -\det(A_1^T) = -\det(A_1).$$
If A has two equal columns or two equal rows, then switching them results in the same matrix. Therefore, $\det(A) = -\det(A)$ and so $\det(A) = 0$.
It remains to verify the last assertion.
$$\det(A) \equiv \sum_{(k_1, \cdots, k_n)} \mathrm{sgn}(k_1, \cdots, k_n)\, a_{1k_1} \cdots (xa_{k_i} + yb_{k_i}) \cdots a_{nk_n}$$
$$= x\sum_{(k_1, \cdots, k_n)} \mathrm{sgn}(k_1, \cdots, k_n)\, a_{1k_1} \cdots a_{k_i} \cdots a_{nk_n} + y\sum_{(k_1, \cdots, k_n)} \mathrm{sgn}(k_1, \cdots, k_n)\, a_{1k_1} \cdots b_{k_i} \cdots a_{nk_n}$$
$$\equiv x\det(A_1) + y\det(A_2).$$
The same is true of columns because $\det(A^T) = \det(A)$ and the rows of $A^T$ are the columns of A.
5.2.5 Linear Combinations And Determinants
Definition 5.2.7 A vector w is a linear combination of the vectors $\{v_1, \cdots, v_r\}$ if there exist scalars $c_1, \cdots, c_r$ such that $w = \sum_{k=1}^{r} c_k v_k$. This is the same as saying $w \in \mathrm{span}\{v_1, \cdots, v_r\}$.
The following corollary is also of great use.
Corollary 5.2.8 Suppose A is an n × n matrix and some column (row) is a linear combination of r other columns (rows). Then det(A) = 0.
Proof: Let $A = \begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix}$ be the columns of A and suppose the condition that one column is a linear combination of r of the others is satisfied. Then by using Corollary 5.2.6 you may rearrange the columns to have the $n^{th}$ column a linear combination of the first r columns. Thus $a_n = \sum_{k=1}^{r} c_k a_k$ and so
$$\det(A) = \det\begin{pmatrix} a_1 & \cdots & a_r & \cdots & a_{n-1} & \sum_{k=1}^{r} c_k a_k \end{pmatrix}.$$
By Corollary 5.2.6,
$$\det(A) = \sum_{k=1}^{r} c_k \det\begin{pmatrix} a_1 & \cdots & a_r & \cdots & a_{n-1} & a_k \end{pmatrix} = 0.$$
The case for rows follows from the fact that $\det(A) = \det(A^T)$.
5.2.6 The Determinant Of A Product
Recall the following deﬁnition of matrix multiplication.
Definition 5.2.9 If A and B are n × n matrices, $A = (a_{ij})$ and $B = (b_{ij})$, then $AB = (c_{ij})$ where
$$c_{ij} \equiv \sum_{k=1}^{n} a_{ik} b_{kj}.$$
One of the most important rules about determinants is that the determinant of a product equals the product of the determinants.
Theorem 5.2.10 Let A and B be n × n matrices. Then
$$\det(AB) = \det(A)\det(B).$$
Proof: Let $c_{ij}$ be the $ij^{th}$ entry of AB. Then by Proposition 5.2.3,
$$\det(AB) = \sum_{(k_1, \cdots, k_n)} \mathrm{sgn}(k_1, \cdots, k_n)\, c_{1k_1} \cdots c_{nk_n}$$
$$= \sum_{(k_1, \cdots, k_n)} \mathrm{sgn}(k_1, \cdots, k_n)\left(\sum_{r_1} a_{1r_1} b_{r_1 k_1}\right) \cdots \left(\sum_{r_n} a_{nr_n} b_{r_n k_n}\right)$$
$$= \sum_{(r_1, \cdots, r_n)} \sum_{(k_1, \cdots, k_n)} \mathrm{sgn}(k_1, \cdots, k_n)\, b_{r_1 k_1} \cdots b_{r_n k_n}\, (a_{1r_1} \cdots a_{nr_n})$$
$$= \sum_{(r_1, \cdots, r_n)} \mathrm{sgn}(r_1, \cdots, r_n)\, a_{1r_1} \cdots a_{nr_n} \det(B) = \det(A)\det(B).$$
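The product rule is easy to check numerically against the permutation definition of the determinant. A plain-Python sketch (the two 3 × 3 integer matrices are arbitrary examples of mine):

```python
from itertools import permutations
from math import prod

def sgn(p):
    # sign of a permutation via its inversion count
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

def det(A):
    # permutation-sum definition of the determinant (Definition 5.2.2)
    n = len(A)
    return sum(sgn(k) * prod(A[i][k[i]] for i in range(n)) for k in permutations(range(n)))

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

A = [[1, 2, 0], [3, -1, 4], [0, 5, 2]]
B = [[2, 0, 1], [1, 1, 0], [-3, 2, 2]]
# integer arithmetic, so det(AB) = det(A) det(B) holds exactly
assert det(mat_mul(A, B)) == det(A) * det(B)
```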
5.2.7 Cofactor Expansions
Lemma 5.2.11 Suppose a matrix is of the form
$$M = \begin{pmatrix} A & * \\ 0 & a \end{pmatrix} \qquad (5.2.12)$$
or
$$M = \begin{pmatrix} A & 0 \\ * & a \end{pmatrix} \qquad (5.2.13)$$
where a is a number and A is an (n−1) × (n−1) matrix and ∗ denotes either a column or a row having length n − 1 and the 0 denotes either a column or a row of length n − 1 consisting entirely of zeros. Then
$$\det(M) = a\det(A).$$
Proof: Denote M by $(m_{ij})$. Thus in the first case, $m_{nn} = a$ and $m_{ni} = 0$ if $i \neq n$ while in the second case, $m_{nn} = a$ and $m_{in} = 0$ if $i \neq n$. From the definition of the determinant,
$$\det(M) \equiv \sum_{(k_1, \cdots, k_n)} \mathrm{sgn}_n(k_1, \cdots, k_n)\, m_{1k_1} \cdots m_{nk_n}$$
Letting $\theta$ denote the position of n in the ordered list $(k_1, \cdots, k_n)$, then using the earlier conventions used to prove Lemma 5.1.1, det(M) equals
$$\sum_{(k_1, \cdots, k_n)} (-1)^{n-\theta}\,\mathrm{sgn}_{n-1}\left(k_1, \cdots, k_{\theta-1}, \overset{\theta}{k_{\theta+1}}, \cdots, \overset{n-1}{k_n}\right) m_{1k_1} \cdots m_{nk_n}$$
Now suppose 5.2.13. Then if $k_n \neq n$, the term involving $m_{nk_n}$ in the above expression equals zero. Therefore, the only terms which survive are those for which $\theta = n$, or in other words, those for which $k_n = n$. Therefore, the above expression reduces to
$$a \sum_{(k_1, \cdots, k_{n-1})} \mathrm{sgn}_{n-1}(k_1, \cdots, k_{n-1})\, m_{1k_1} \cdots m_{(n-1)k_{n-1}} = a\det(A).$$
To get the assertion in the situation of 5.2.12, use Corollary 5.2.5 and 5.2.13 to write
$$\det(M) = \det(M^T) = \det\begin{pmatrix} A^T & 0 \\ * & a \end{pmatrix} = a\det(A^T) = a\det(A).$$
In terms of the theory of determinants, arguably the most important idea is
that of Laplace expansion along a row or a column. This will follow from the above
deﬁnition of a determinant.
Definition 5.2.12 Let $A = (a_{ij})$ be an n × n matrix. Then a new matrix called the cofactor matrix, cof(A), is defined by $\mathrm{cof}(A) = (c_{ij})$ where to obtain $c_{ij}$, delete the $i^{th}$ row and the $j^{th}$ column of A, take the determinant of the (n−1) × (n−1) matrix which results (this is called the $ij^{th}$ minor of A), and then multiply this number by $(-1)^{i+j}$. To make the formulas easier to remember, $\mathrm{cof}(A)_{ij}$ will denote the $ij^{th}$ entry of the cofactor matrix.
The following is the main result. Earlier this was given as a deﬁnition and the
outrageous totally unjustiﬁed assertion was made that the same number would be
obtained by expanding the determinant along any row or column. The following
theorem proves this assertion.
Theorem 5.2.13 Let A be an n × n matrix where $n \geq 2$. Then
$$\det(A) = \sum_{j=1}^{n} a_{ij}\,\mathrm{cof}(A)_{ij} = \sum_{i=1}^{n} a_{ij}\,\mathrm{cof}(A)_{ij}.$$
The first formula consists of expanding the determinant along the $i^{th}$ row and the second expands the determinant along the $j^{th}$ column.
Proof: Let $(a_{i1}, \cdots, a_{in})$ be the $i^{th}$ row of A. Let $B_j$ be the matrix obtained from A by leaving every row the same except the $i^{th}$ row which in $B_j$ equals $(0, \cdots, 0, a_{ij}, 0, \cdots, 0)$. Then by Corollary 5.2.6,
$$\det(A) = \sum_{j=1}^{n} \det(B_j)$$
Denote by $A^{ij}$ the (n−1) × (n−1) matrix obtained by deleting the $i^{th}$ row and the $j^{th}$ column of A. Thus $\mathrm{cof}(A)_{ij} \equiv (-1)^{i+j}\det(A^{ij})$. At this point, recall that from Proposition 5.2.3, when two rows or two columns in a matrix M are switched, this results in multiplying the determinant of the old matrix by −1 to get the determinant of the new matrix. Therefore, by Lemma 5.2.11,
$$\det(B_j) = (-1)^{n-j}(-1)^{n-i}\det\begin{pmatrix} A^{ij} & * \\ 0 & a_{ij} \end{pmatrix} = (-1)^{i+j}\det\begin{pmatrix} A^{ij} & * \\ 0 & a_{ij} \end{pmatrix} = a_{ij}\,\mathrm{cof}(A)_{ij}.$$
Therefore,
$$\det(A) = \sum_{j=1}^{n} a_{ij}\,\mathrm{cof}(A)_{ij}$$
which is the formula for expanding det(A) along the $i^{th}$ row. Also,
$$\det(A) = \det(A^T) = \sum_{j=1}^{n} a_{ij}^T\,\mathrm{cof}(A^T)_{ij} = \sum_{j=1}^{n} a_{ji}\,\mathrm{cof}(A)_{ji}$$
which is the formula for expanding det(A) along the $i^{th}$ column.
Note that this gives an easy way to write a formula for the inverse of an n × n matrix.
5.2.8 Formula For The Inverse
Theorem 5.2.14 $A^{-1}$ exists if and only if $\det(A) \neq 0$. If $\det(A) \neq 0$, then $A^{-1} = (a_{ij}^{-1})$ where
$$a_{ij}^{-1} = \det(A)^{-1}\,\mathrm{cof}(A)_{ji}$$
for $\mathrm{cof}(A)_{ij}$ the $ij^{th}$ cofactor of A.
Proof: By Theorem 5.2.13 and letting $(a_{ir}) = A$, if $\det(A) \neq 0$,
$$\sum_{i=1}^{n} a_{ir}\,\mathrm{cof}(A)_{ir}\det(A)^{-1} = \det(A)\det(A)^{-1} = 1.$$
Now consider
$$\sum_{i=1}^{n} a_{ir}\,\mathrm{cof}(A)_{ik}\det(A)^{-1}$$
when $k \neq r$. Replace the $k^{th}$ column with the $r^{th}$ column to obtain a matrix $B_k$ whose determinant equals zero by Corollary 5.2.6. However, expanding this matrix along the $k^{th}$ column yields
$$0 = \det(B_k)\det(A)^{-1} = \sum_{i=1}^{n} a_{ir}\,\mathrm{cof}(A)_{ik}\det(A)^{-1}$$
Summarizing,
$$\sum_{i=1}^{n} a_{ir}\,\mathrm{cof}(A)_{ik}\det(A)^{-1} = \delta_{rk}.$$
Using the other formula in Theorem 5.2.13, and similar reasoning,
$$\sum_{j=1}^{n} a_{rj}\,\mathrm{cof}(A)_{kj}\det(A)^{-1} = \delta_{rk}$$
This proves that if $\det(A) \neq 0$, then $A^{-1}$ exists with $A^{-1} = (a_{ij}^{-1})$, where
$$a_{ij}^{-1} = \mathrm{cof}(A)_{ji}\det(A)^{-1}.$$
Now suppose $A^{-1}$ exists. Then by Theorem 5.2.10,
$$1 = \det(I) = \det(AA^{-1}) = \det(A)\det(A^{-1})$$
so $\det(A) \neq 0$.
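The inverse formula of Theorem 5.2.14 translates directly into code: transpose the cofactor matrix and divide by the determinant. A plain-Python sketch (the determinant helper uses Laplace expansion along the first row; the 2 × 2 example is mine):

```python
def det(A):
    # Laplace expansion along the first row (Theorem 5.2.13)
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j+1:] for row in A[1:]])
               for j in range(n))

def inverse(A):
    # a^{-1}_{ij} = cof(A)_{ji} / det(A): the transpose of the cofactor
    # matrix (the adjugate) divided by the determinant
    n, d = len(A), det(A)
    def cof(i, j):
        minor = [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]
        return (-1) ** (i + j) * det(minor)
    return [[cof(j, i) / d for j in range(n)] for i in range(n)]

A = [[2.0, 1.0], [5.0, 3.0]]        # det(A) = 1
assert inverse(A) == [[3.0, -1.0], [-5.0, 2.0]]
```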
The next corollary points out that if an n × n matrix A has a right or a left inverse, then it has an inverse.
Corollary 5.2.15 Let A be an n × n matrix and suppose there exists an n × n matrix B such that BA = I. Then $A^{-1}$ exists and $A^{-1} = B$. Also, if there exists C, an n × n matrix such that AC = I, then $A^{-1}$ exists and $A^{-1} = C$.
Proof: Since BA = I, Theorem 5.2.10 implies
$$\det B \det A = 1$$
and so $\det A \neq 0$. Therefore from Theorem 5.2.14, $A^{-1}$ exists. Therefore,
$$A^{-1} = (BA)A^{-1} = B(AA^{-1}) = BI = B.$$
The case where AC = I is handled similarly.
The conclusion of this corollary is that left inverses, right inverses and inverses are all the same in the context of n × n matrices.
Theorem 5.2.14 says that to find the inverse, take the transpose of the cofactor matrix and divide by the determinant. The transpose of the cofactor matrix is called the adjugate or sometimes the classical adjoint of the matrix A. In words, $A^{-1}$ is equal to one over the determinant of A times the adjugate matrix of A.
5.2.9 Cramer’s Rule
In case you are solving a system of equations, $Ax = y$ for x, it follows that if $A^{-1}$ exists,
$$x = (A^{-1}A)x = A^{-1}(Ax) = A^{-1}y$$
thus solving the system. Now in the case that $A^{-1}$ exists, there is a formula for $A^{-1}$ given above. Using this formula,
$$x_i = \sum_{j=1}^{n} a_{ij}^{-1} y_j = \sum_{j=1}^{n} \frac{1}{\det(A)}\,\mathrm{cof}(A)_{ji}\, y_j.$$
By the formula for the expansion of a determinant along a column,
$$x_i = \frac{1}{\det(A)} \det\begin{pmatrix} * & \cdots & y_1 & \cdots & * \\ \vdots & & \vdots & & \vdots \\ * & \cdots & y_n & \cdots & * \end{pmatrix},$$
where here the $i^{th}$ column of A is replaced with the column vector $(y_1, \cdots, y_n)^T$, and the determinant of this modified matrix is taken and divided by det(A). This formula is known as Cramer's rule.
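Cramer's rule is short to implement once a determinant routine is available. A plain-Python sketch (the determinant helper uses cofactor expansion; the 3 × 3 system is an arbitrary example of mine):

```python
def det(A):
    # Laplace expansion along the first row
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j+1:] for row in A[1:]])
               for j in range(n))

def cramer(A, y):
    # x_i = det(A with column i replaced by y) / det(A)
    n, d = len(A), det(A)
    x = []
    for i in range(n):
        Ai = [row[:i] + [y[k]] + row[i+1:] for k, row in enumerate(A)]
        x.append(det(Ai) / d)
    return x

# solve  x + 2z = 5,  2x + y = 3,  y + z = 2
A = [[1.0, 0.0, 2.0], [2.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
y = [5.0, 3.0, 2.0]
x = cramer(A, y)
# verify the solution by substituting back into A x = y
for row, yi in zip(A, y):
    assert abs(sum(r * xi for r, xi in zip(row, x)) - yi) < 1e-12
```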
5.2.10 Upper Triangular Matrices
Definition 5.2.16 A matrix M is upper triangular if $M_{ij} = 0$ whenever $i > j$. Thus such a matrix equals zero below the main diagonal, the entries of the form $M_{ii}$, as shown.
$$\begin{pmatrix} * & * & \cdots & * \\ 0 & * & \ddots & \vdots \\ \vdots & \ddots & \ddots & * \\ 0 & \cdots & 0 & * \end{pmatrix}$$
A lower triangular matrix is defined similarly as a matrix for which all entries above the main diagonal are equal to zero.
With this deﬁnition, here is a simple corollary of Theorem 5.2.13.
Corollary 5.2.17 Let M be an upper (lower) triangular matrix. Then det (M) is
obtained by taking the product of the entries on the main diagonal.
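The corollary can be checked against cofactor expansion on a concrete matrix. A plain-Python sketch (the example matrix is mine):

```python
def det(A):
    # Laplace expansion along the first row
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j+1:] for row in A[1:]])
               for j in range(n))

def diagonal_product(A):
    # the product of the entries on the main diagonal
    p = 1
    for i in range(len(A)):
        p *= A[i][i]
    return p

U = [[2, 7, -1], [0, 3, 5], [0, 0, 4]]   # upper triangular
assert det(U) == diagonal_product(U) == 24
```

The proof idea is visible in the computation: expanding along the first column of a triangular matrix, only the diagonal entry survives at each step.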
5.2.11 The Determinant Rank
Definition 5.2.18 A submatrix of a matrix A is the rectangular array of numbers obtained by deleting some rows and columns of A. Let A be an m × n matrix. The determinant rank of the matrix equals r where r is the largest number such that some r × r submatrix of A has a non zero determinant.
Theorem 5.2.19 If A, an m × n matrix, has determinant rank r, then there exist r rows (columns) of the matrix such that every other row (column) is a linear combination of these r rows (columns).
Proof: Suppose the determinant rank of A = (a_{ij}) equals r. Thus some r × r submatrix has nonzero determinant and there is no larger square submatrix which has nonzero determinant. Suppose such a submatrix is determined by the r columns whose indices are

j_1 < \cdots < j_r

and the r rows whose indices are

i_1 < \cdots < i_r.

I want to show that every row is a linear combination of these rows. Consider the l^{th} row and let p be an index between 1 and n. Form the following (r + 1) × (r + 1) matrix:

\begin{pmatrix} a_{i_1 j_1} & \cdots & a_{i_1 j_r} & a_{i_1 p} \\ \vdots & & \vdots & \vdots \\ a_{i_r j_1} & \cdots & a_{i_r j_r} & a_{i_r p} \\ a_{l j_1} & \cdots & a_{l j_r} & a_{l p} \end{pmatrix}
Of course you can assume l ∉ {i_1, \cdots, i_r} because there is nothing to prove if the l^{th} row is one of the chosen ones. The above matrix has determinant 0. This is because if p ∉ {j_1, \cdots, j_r} then the above would be a submatrix of A which is too large to have nonzero determinant. On the other hand, if p ∈ {j_1, \cdots, j_r} then the above matrix has two columns which are equal so its determinant is still 0.
Expand the determinant of the above matrix along the last column. Let C_k denote the cofactor associated with the entry a_{i_k p}. This is not dependent on the choice of p. Remember, you delete the column and the row the entry is in and take the determinant of what is left and multiply by −1 raised to an appropriate power. Let C denote the cofactor associated with a_{lp}. This is given to be nonzero, it being the determinant of the matrix

\begin{pmatrix} a_{i_1 j_1} & \cdots & a_{i_1 j_r} \\ \vdots & & \vdots \\ a_{i_r j_1} & \cdots & a_{i_r j_r} \end{pmatrix}
Thus

0 = a_{lp} C + \sum_{k=1}^{r} C_k a_{i_k p},

which implies

a_{lp} = \sum_{k=1}^{r} \frac{-C_k}{C} a_{i_k p} ≡ \sum_{k=1}^{r} m_k a_{i_k p}.

Since this is true for every p and since m_k does not depend on p, this has shown the l^{th} row is a linear combination of the i_1, i_2, \cdots, i_r rows. The determinant rank does not change when you replace A with A^T. Therefore, the same conclusion holds for the columns.
5.2.12 Determining Whether A Is One To One Or Onto
The following theorem is of fundamental importance and ties together many of the
ideas presented above.
Theorem 5.2.20 Let A be an n × n matrix. Then the following are equivalent.
1. det (A) = 0.
2. A and A^T are not one to one.
3. A is not onto.
Proof: Suppose det (A) = 0. Then the determinant rank of A = r < n. Therefore, there exist r columns such that every other column is a linear combination of these columns by Theorem 5.2.19. In particular, it follows that for some m, the m^{th} column is a linear combination of all the others. Thus letting A = (a_1 \cdots a_m \cdots a_n) where the columns are denoted by a_i, there exist scalars α_i such that

a_m = \sum_{k \neq m} α_k a_k.

Now consider the column vector x ≡ (α_1 \cdots −1 \cdots α_n)^T. Then

Ax = −a_m + \sum_{k \neq m} α_k a_k = 0.

Since also A0 = 0, it follows A is not one to one. Similarly, A^T is not one to one by the same argument applied to A^T. This verifies that 1.) implies 2.).
Now suppose 2.). Then since A^T is not one to one, it follows there exists x ≠ 0 such that

A^T x = 0.

Taking the transpose of both sides yields

x^T A = 0^T,

where the 0^T is a 1 × n matrix or row vector. Now if Ay = x, then

|x|^2 = x^T (Ay) = (x^T A) y = 0^T y = 0,

contrary to x ≠ 0. Consequently there can be no y such that Ay = x and so A is not onto. This shows that 2.) implies 3.).
Finally, suppose 3.). If 1.) does not hold, then det (A) ≠ 0, but then from Theorem 5.2.14, A^{-1} exists and so for every y ∈ F^n there exists a unique x ∈ F^n such that Ax = y. In fact x = A^{-1} y. Thus A would be onto, contrary to 3.). This shows 3.) implies 1.).
Corollary 5.2.21 Let A be an n × n matrix. Then the following are equivalent.
1. det (A) ≠ 0.
2. A and A^T are one to one.
3. A is onto.
Proof: This follows immediately from the above theorem.
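A numeric illustration of the equivalence (hypothetical matrices, not from the text): a 2 × 2 matrix with zero determinant fails to be one to one, exhibited by a nonzero vector in its kernel, while a matrix with nonzero determinant is onto since Ax = y can be solved for every y.

```python
# Sketch: det = 0 gives a nonzero kernel vector; det != 0 lets us solve Ax = y.

def det2(M):
    return M[0][0]*M[1][1] - M[0][1]*M[1][0]

A = [[1.0, 2.0], [2.0, 4.0]]       # second row is twice the first
assert det2(A) == 0.0
x = [2.0, -1.0]                    # nonzero, yet Ax = 0 = A0, so A is not one to one
assert A[0][0]*x[0] + A[0][1]*x[1] == 0.0
assert A[1][0]*x[0] + A[1][1]*x[1] == 0.0

B = [[1.0, 2.0], [3.0, 4.0]]       # det = -2, nonzero
d = det2(B)
y = [7.0, 11.0]
# solve Bx = y via the inverse: x = B^{-1} y
x = [(B[1][1]*y[0] - B[0][1]*y[1]) / d,
     (-B[1][0]*y[0] + B[0][0]*y[1]) / d]
assert abs(B[0][0]*x[0] + B[0][1]*x[1] - y[0]) < 1e-12
assert abs(B[1][0]*x[0] + B[1][1]*x[1] - y[1]) < 1e-12
```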
5.2.13 Schur’s Theorem
Consider the following system of equations for x_1, x_2, \cdots, x_n:

\sum_{j=1}^{n} a_{ij} x_j = 0,  i = 1, 2, \cdots, m,    (5.2.14)

where m < n. Then the following theorem is a fundamental observation.

Theorem 5.2.22 Let the system of equations be as just described in 5.2.14 where m < n. Then letting

x^T ≡ (x_1, x_2, \cdots, x_n) ∈ F^n,

there exists x ≠ 0 such that the components satisfy each of the equations of 5.2.14. Here F is a field of scalars. Think ℝ or C for example.
Proof: The above system is of the form

Ax = 0

where A is an m × n matrix with m < n. Therefore, if you form the matrix

\begin{pmatrix} A \\ 0 \end{pmatrix},

an n × n matrix having n − m rows of zeros on the bottom, it follows this matrix has determinant equal to 0. Therefore, from Theorem 5.2.20, there exists x ≠ 0 such that Ax = 0.
Definition 5.2.23 A set of vectors {x_1, \cdots, x_k} in F^n, F = ℝ or C, is called an orthonormal set of vectors if

x_i · x_j = (x_i, x_j) = δ_{ij} ≡ \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i ≠ j \end{cases}
Theorem 5.2.24 Let v_1 be a unit vector (|v_1| = 1) in F^n, n > 1. Then there exist vectors {v_2, \cdots, v_n} such that

{v_1, \cdots, v_n}

is an orthonormal set of vectors.
Proof: The equation for x,

v_1 · x = 0,

has a nonzero solution x by Theorem 5.2.22. Pick such a solution and divide by its magnitude to get v_2, a unit vector such that v_1 · v_2 = 0. Hence {v_1, v_2} is an orthonormal set of vectors. Now suppose v_1, \cdots, v_k have been chosen such that {v_1, \cdots, v_k} is an orthonormal set of vectors. Then consider the equations

v_j · x = 0,  j = 1, 2, \cdots, k.

This amounts to the situation of Theorem 5.2.22 in which there are more variables than equations. Therefore, by this theorem, there exists a nonzero x solving all these equations. Divide by its magnitude and this gives v_{k+1}.
The argument gives the following generalization.

Corollary 5.2.25 Let {v_1, \cdots, v_k} be orthonormal in F^n where k < n. Then there exist vectors {v_{k+1}, \cdots, v_n} such that {v_1, \cdots, v_n} is an orthonormal basis for F^n.
Definition 5.2.26 If U is an n × n matrix whose columns form an orthonormal set of vectors, then U is called an orthogonal matrix if it is real and a unitary matrix if it is complex. Note that from the way we multiply matrices,

U^T U = U U^T = I.

Thus U^{-1} = U^T. If U is only unitary, then from the dot product in C^n, we replace the above with

U^* U = U U^* = I,

where the * indicates to take the conjugate of the transpose.
Note the product of orthogonal or unitary matrices is orthogonal or unitary because

(U_1 U_2)^T (U_1 U_2) = U_2^T U_1^T U_1 U_2 = I,
(U_1 U_2)^* (U_1 U_2) = U_2^* U_1^* U_1 U_2 = I.
Two matrices A and B are similar if there is some invertible matrix S such that A = S^{-1} B S. Note that similar matrices have the same characteristic equation because by Theorem 5.2.10, which says the determinant of a product is the product of the determinants,

det (λI − A) = det (λI − S^{-1} B S) = det (S^{-1} (λI − B) S)
= det (S^{-1}) det (λI − B) det (S) = det (S^{-1} S) det (λI − B) = det (λI − B).
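This invariance is easy to check numerically (a hypothetical 2 × 2 example, not from the text): form A = S^{-1} B S and evaluate det (λI − A) and det (λI − B) at a few sample values of λ; the values agree.

```python
# Sketch: similar 2x2 matrices share the same characteristic polynomial.

def det2(M):
    return M[0][0]*M[1][1] - M[0][1]*M[1][0]

def char_poly_value(M, lam):
    # det(lam*I - M)
    return det2([[lam - M[0][0], -M[0][1]], [-M[1][0], lam - M[1][1]]])

def mat_mul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def inv2(S):
    d = det2(S)
    return [[S[1][1]/d, -S[0][1]/d], [-S[1][0]/d, S[0][0]/d]]

B = [[3.0, 1.0], [0.0, 2.0]]       # eigenvalues 3 and 2
S = [[1.0, 2.0], [1.0, 3.0]]       # invertible: det = 1
A = mat_mul(inv2(S), mat_mul(B, S))

for lam in (0.0, 1.0, -2.5):
    assert abs(char_poly_value(A, lam) - char_poly_value(B, lam)) < 1e-9
```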
With this preparation, here is a case of Schur’s theorem.
Theorem 5.2.27 Let A be a real or complex n × n matrix. Then there exists a unitary matrix U such that

U^* A U = T,    (5.2.15)

where T is an upper triangular matrix having the eigenvalues of A on the main diagonal, listed according to multiplicity as zeros of the characteristic equation. If A has all real entries and eigenvalues, then U can be chosen to be orthogonal.
Proof: The theorem is clearly true if A is a 1 × 1 matrix. Just let U = 1, the 1 × 1 matrix which has 1 down the main diagonal and zeros elsewhere. Suppose it is true for (n − 1) × (n − 1) matrices and let A be an n × n matrix. Then let v_1 be a unit eigenvector for A. There exists λ_1 such that

Av_1 = λ_1 v_1,  |v_1| = 1.

By Theorem 5.2.24 there exists {v_1, \cdots, v_n}, an orthonormal set in F^n. Let U_0 be a matrix whose i^{th} column is v_i. Then from the above, it follows U_0 is unitary. Then from the way you multiply matrices (remember that to get the ij^{th} entry of a product, you take the dot product of the i^{th} row of the matrix on the left with the j^{th} column of the matrix on the right), U_0^* A U_0 is of the form

\begin{pmatrix} λ_1 & * & \cdots & * \\ 0 & & & \\ \vdots & & A_1 & \\ 0 & & & \end{pmatrix}
where A_1 is an (n − 1) × (n − 1) matrix. The above matrix is similar to A, so it has the same eigenvalues and indeed the same characteristic equation. Now by induction, there exists an (n − 1) × (n − 1) unitary matrix \bar{U}_1 such that

\bar{U}_1^* A_1 \bar{U}_1 = T_{n−1},

an upper triangular matrix. Consider

U_1 ≡ \begin{pmatrix} 1 & 0 \\ 0 & \bar{U}_1 \end{pmatrix}.
From the way we multiply matrices, this is a unitary matrix and

U_1^* U_0^* A U_0 U_1 = \begin{pmatrix} 1 & 0 \\ 0 & \bar{U}_1^* \end{pmatrix} \begin{pmatrix} λ_1 & * \\ 0 & A_1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & \bar{U}_1 \end{pmatrix} = \begin{pmatrix} λ_1 & * \\ 0 & T_{n−1} \end{pmatrix} ≡ T,

where T is upper triangular. Then let U = U_0 U_1. Then U^* A U = T. If A is real having real eigenvalues, all of the above can be accomplished using the real dot product and using real eigenvectors. Thus U can be orthogonal.
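For a 2 × 2 real matrix with real eigenvalues, the first step of the induction already finishes the job. Here is a numeric sketch (hypothetical example, not from the text): find a unit eigenvector v_1, complete it to an orthonormal basis, and check that U^T A U is upper triangular.

```python
import math

# Sketch: the 2x2 real case of Schur's theorem.

A = [[4.0, 1.0], [2.0, 3.0]]             # eigenvalues 5 and 2, both real

# one eigenvalue from the characteristic equation lam^2 - tr*lam + det = 0
tr = A[0][0] + A[1][1]
det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
lam1 = (tr + math.sqrt(tr*tr - 4*det)) / 2

# an eigenvector for lam1: a nonzero solution of (A - lam1 I) v = 0
v1 = [A[0][1], lam1 - A[0][0]]
if v1 == [0.0, 0.0]:
    v1 = [1.0, 0.0]
n = math.hypot(v1[0], v1[1])
v1 = [v1[0]/n, v1[1]/n]
v2 = [-v1[1], v1[0]]                     # orthonormal completion in R^2

U = [[v1[0], v2[0]], [v1[1], v2[1]]]     # columns v1, v2

def mul(X, Y):
    return [[sum(X[i][k]*Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

UT = [[U[0][0], U[1][0]], [U[0][1], U[1][1]]]
T = mul(UT, mul(A, U))

assert abs(T[1][0]) < 1e-9               # T is upper triangular
assert abs(T[0][0] - lam1) < 1e-9        # the eigenvalue sits on the diagonal
```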
5.2.14 Symmetric Matrices
Recall that a real matrix A is symmetric if A = A^T.
Lemma 5.2.28 A real symmetric matrix has all real eigenvalues.

Proof: Recall the eigenvalues are solutions λ to

det (λI − A) = 0

and so by Theorem 5.2.20, there exists a vector x such that

Ax = λx,  x ≠ 0.

Of course if A is real, it is still possible that the eigenvalue could be complex and if this is the case, then the vector x will also end up being complex. I wish to show that the eigenvalues are all real. Suppose then that λ is an eigenvalue and let x be the corresponding eigenvector described above. Then letting \bar{x} denote the complex conjugate of x, and noting A\bar{x} = \bar{λ}\bar{x},

\bar{λ}\, \bar{x}^T x = (A\bar{x})^T x = \bar{x}^T A^T x = \bar{x}^T A x = \bar{x}^T x\, λ,

and so, canceling the nonzero number \bar{x}^T x, it follows \bar{λ} = λ, showing λ is real.
Theorem 5.2.29 Let A be a real symmetric matrix. Then there exists a diagonal matrix D consisting of the eigenvalues of A down the main diagonal and an orthogonal matrix U such that

U^T A U = D.

Proof: Since A has all real eigenvalues, it follows from Theorem 5.2.27 there exists an orthogonal matrix U such that

U^T A U = T

where T is upper triangular. Now

T^T = U^T A^T U = U^T A U = T,

and so in fact T is a diagonal matrix having the eigenvalues of A down the diagonal.
Theorem 5.2.30 Let A be a real symmetric matrix which has all positive eigenvalues, 0 < λ_1 ≤ λ_2 ≤ \cdots ≤ λ_n. Then

(Ax, x) ≡ x^T A x ≥ λ_1 |x|^2.
Proof: Let U be the orthogonal matrix of Theorem 5.2.29. Then

(Ax, x) = x^T A x = (x^T U) D (U^T x) = (U^T x)^T D (U^T x) = \sum_i λ_i \left| (U^T x)_i \right|^2
≥ λ_1 \sum_i \left| (U^T x)_i \right|^2 = λ_1 (U^T x, U^T x) = λ_1 (U^T x)^T U^T x = λ_1 x^T U U^T x = λ_1 x^T I x = λ_1 |x|^2.
5.3 The Right Polar Factorization
The right polar factorization involves writing a matrix as a product of two other
matrices, one which preserves distances and the other which stretches and distorts.
First here are some lemmas which review and add to many of the topics discussed
so far about adjoints and orthonormal sets and such things.
Lemma 5.3.1 Let A be a symmetric matrix such that all its eigenvalues are nonnegative. Then there exists a symmetric matrix A^{1/2} such that A^{1/2} has all nonnegative eigenvalues and

(A^{1/2})^2 = A.
Proof: Since A is symmetric, it follows from Theorem 5.2.29 there exists a diagonal matrix D having all real nonnegative entries and an orthogonal matrix U such that A = U^T D U. Then denote by D^{1/2} the matrix which is obtained by replacing each diagonal entry of D with its square root. Thus D^{1/2} D^{1/2} = D. Then define

A^{1/2} ≡ U^T D^{1/2} U.

Then

(A^{1/2})^2 = U^T D^{1/2} U U^T D^{1/2} U = U^T D U = A.

Since D^{1/2} is real,

(U^T D^{1/2} U)^T = U^T (D^{1/2})^T (U^T)^T = U^T D^{1/2} U,

so A^{1/2} is symmetric.
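For a 2 × 2 symmetric positive definite matrix, the square root of the lemma has a closed form that follows from the Cayley–Hamilton theorem: M^{1/2} = (M + \sqrt{\det M}\, I) / \sqrt{\operatorname{tr} M + 2\sqrt{\det M}}. This is a hypothetical numeric sketch, not the text's eigenvector construction, but it produces the same matrix.

```python
import math

# Sketch: 2x2 SPD square root via the Cayley-Hamilton closed form; verify (M^{1/2})^2 = M.

def sqrt_spd_2x2(M):
    d = math.sqrt(M[0][0]*M[1][1] - M[0][1]*M[1][0])   # sqrt(det M)
    t = math.sqrt(M[0][0] + M[1][1] + 2*d)             # sqrt(tr M + 2 sqrt(det M))
    return [[(M[0][0] + d)/t, M[0][1]/t],
            [M[1][0]/t, (M[1][1] + d)/t]]

def mul(X, Y):
    return [[sum(X[i][k]*Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

M = [[2.0, 1.0], [1.0, 2.0]]       # eigenvalues 1 and 3
R = sqrt_spd_2x2(M)
R2 = mul(R, R)
assert all(abs(R2[i][j] - M[i][j]) < 1e-9 for i in range(2) for j in range(2))
assert abs(R[0][1] - R[1][0]) < 1e-12    # the square root is symmetric
```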
There is also a useful observation about orthonormal sets of vectors which is
stated in the next lemma.
Lemma 5.3.2 Suppose {x_1, x_2, \cdots, x_r} is an orthonormal set of vectors. Then if c_1, \cdots, c_r are scalars,

\left| \sum_{k=1}^{r} c_k x_k \right|^2 = \sum_{k=1}^{r} |c_k|^2.
Proof: This follows from the definition. From the properties of the dot product and using the fact that the given set of vectors is orthonormal,

\left| \sum_{k=1}^{r} c_k x_k \right|^2 = \left( \sum_{k=1}^{r} c_k x_k, \sum_{j=1}^{r} c_j x_j \right) = \sum_{k,j} c_k \overline{c_j} (x_k, x_j) = \sum_{k=1}^{r} |c_k|^2.
Here is another lemma about preserving distance.
Lemma 5.3.3 Suppose R is an m × n matrix with m > n and R preserves distances. Then R^T R = I.
Proof: Since R preserves distances, |Rx| = |x| for every x. Therefore from the axioms of the dot product,

|x|^2 + |y|^2 + (x, y) + (y, x) = |x + y|^2 = (R(x + y), R(x + y))
= (Rx, Rx) + (Ry, Ry) + (Rx, Ry) + (Ry, Rx)
= |x|^2 + |y|^2 + (R^T Rx, y) + (y, R^T Rx)
and so for all x, y,

(R^T Rx − x, y) + (y, R^T Rx − x) = 0.

Hence for all x, y,

(R^T Rx − x, y) = 0.

Now for given x, y, choose α ∈ {−1, 1} such that

α (R^T Rx − x, y) = |(R^T Rx − x, y)|.

Then

0 = (R^T Rx − x, αy) = α (R^T Rx − x, y) = |(R^T Rx − x, y)|.

Thus |(R^T Rx − x, y)| = 0 for all x, y because the given x, y were arbitrary. Let y = R^T Rx − x to conclude that for all x,

R^T Rx − x = 0,

which says R^T R = I since x is arbitrary.
With this preparation, here is the big theorem about the right polar factorization.

Theorem 5.3.4 Let F be an m × n matrix where m ≥ n. Then there exists a symmetric n × n matrix U which has all nonnegative eigenvalues and an m × n matrix R which preserves distances and satisfies R^T R = I, such that

F = RU.
Proof: Consider F^T F. This is a symmetric matrix because

(F^T F)^T = F^T (F^T)^T = F^T F.

Also the eigenvalues of the n × n matrix F^T F are all nonnegative. This is because if λ is an eigenvalue with eigenvector x,

λ (x, x) = (F^T F x, x) = (Fx, Fx) ≥ 0.

Therefore, by Lemma 5.3.1, there exists an n × n symmetric matrix U having all nonnegative eigenvalues such that

U^2 = F^T F.

Consider the subspace U(F^n). Let {Ux_1, \cdots, Ux_r} be an orthonormal basis for U(F^n) ⊆ F^n. Note that U(F^n) might not be all of F^n. Using Corollary 5.2.25, extend to an orthonormal basis for all of F^n,

{Ux_1, \cdots, Ux_r, y_{r+1}, \cdots, y_n}.
Next observe that {Fx_1, \cdots, Fx_r} is also an orthonormal set of vectors in F^m. This is because

(Fx_k, Fx_j) = (F^T F x_k, x_j) = (U^2 x_k, x_j) = (Ux_k, U^T x_j) = (Ux_k, Ux_j) = δ_{jk}.

Therefore, from Corollary 5.2.25 again, this orthonormal set of vectors can be extended to an orthonormal basis for F^m,

{Fx_1, \cdots, Fx_r, z_{r+1}, \cdots, z_m}.
Thus there are at least as many z_k as there are y_j. Now for x ∈ F^n, since

{Ux_1, \cdots, Ux_r, y_{r+1}, \cdots, y_n}

is an orthonormal basis for F^n, there exist unique scalars

c_1, \cdots, c_r, d_{r+1}, \cdots, d_n

such that

x = \sum_{k=1}^{r} c_k Ux_k + \sum_{k=r+1}^{n} d_k y_k.
Define

Rx ≡ \sum_{k=1}^{r} c_k Fx_k + \sum_{k=r+1}^{n} d_k z_k.    (5.3.16)
Then also there exist scalars b_k such that

Ux = \sum_{k=1}^{r} b_k Ux_k

and so from 5.3.16,

RUx = \sum_{k=1}^{r} b_k Fx_k = F\left( \sum_{k=1}^{r} b_k x_k \right).
Is F(\sum_{k=1}^{r} b_k x_k) = F(x)? Using U^2 = F^T F,

\left( F\left( \sum_{k=1}^{r} b_k x_k \right) − F(x), F\left( \sum_{k=1}^{r} b_k x_k \right) − F(x) \right)
= \left( (F^T F)\left( \sum_{k=1}^{r} b_k x_k − x \right), \sum_{k=1}^{r} b_k x_k − x \right)
= \left( U^2 \left( \sum_{k=1}^{r} b_k x_k − x \right), \sum_{k=1}^{r} b_k x_k − x \right)
= \left( U\left( \sum_{k=1}^{r} b_k x_k − x \right), U\left( \sum_{k=1}^{r} b_k x_k − x \right) \right)
= \left( \sum_{k=1}^{r} b_k Ux_k − Ux, \sum_{k=1}^{r} b_k Ux_k − Ux \right) = 0.

Therefore, F(\sum_{k=1}^{r} b_k x_k) = F(x) and this shows

RUx = Fx.

From 5.3.16 and Lemma 5.3.2, R preserves distances. Therefore, by Lemma 5.3.3, R^T R = I.
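The factorization can be computed numerically for an invertible 2 × 2 matrix (a hypothetical sketch, not the text's proof): take U = (F^T F)^{1/2}, here using the Cayley–Hamilton closed form for a 2 × 2 SPD square root, and R = F U^{-1}; then R^T R = I and RU = F.

```python
import math

# Sketch: right polar factorization F = RU for an invertible 2x2 matrix.

def mul(X, Y):
    return [[sum(X[i][k]*Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(X):
    return [[X[0][0], X[1][0]], [X[0][1], X[1][1]]]

def inv2(X):
    d = X[0][0]*X[1][1] - X[0][1]*X[1][0]
    return [[X[1][1]/d, -X[0][1]/d], [-X[1][0]/d, X[0][0]/d]]

def sqrt_spd_2x2(M):
    # closed form for the SPD square root, from the Cayley-Hamilton theorem
    d = math.sqrt(M[0][0]*M[1][1] - M[0][1]*M[1][0])
    t = math.sqrt(M[0][0] + M[1][1] + 2*d)
    return [[(M[0][0]+d)/t, M[0][1]/t], [M[1][0]/t, (M[1][1]+d)/t]]

F = [[1.0, 2.0], [-3.0, 1.0]]
U = sqrt_spd_2x2(mul(transpose(F), F))   # symmetric with nonnegative eigenvalues
R = mul(F, inv2(U))                      # the distance-preserving factor

RtR = mul(transpose(R), R)
RU = mul(R, U)
I = [[1.0, 0.0], [0.0, 1.0]]
for i in range(2):
    for j in range(2):
        assert abs(RtR[i][j] - I[i][j]) < 1e-9    # R^T R = I
        assert abs(RU[i][j] - F[i][j]) < 1e-9     # F = RU
```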
The Derivative
6.1 Basic Deﬁnitions
The concept of derivative generalizes right away to functions of many variables. However, no attempt will be made to consider derivatives from one side or another. This is because when you consider functions of many variables, there isn't a well defined side. However, it is certainly the case that there are more general notions which include such things. I will present a fairly general notion of the derivative of a function which is defined on a normed vector space and has values in a normed vector space. The case of most interest is that of a function which maps F^n to F^m, but it is no more trouble to consider the extra generality, and it is sometimes useful to have it because you may want to consider functions defined, for example, on subspaces of F^n, and it is nice not to have to bother with ad hoc considerations. Also, you might want to consider F^n with some norm other than the usual one.

In what follows, X, Y will denote normed vector spaces. Thanks to Theorem 4.8.4, all the definitions and theorems given below work the same for any norm given on the vector spaces.

Let U be an open set in X, and let f : U → Y be a function.
Definition 6.1.1 A function g is o(v) if

lim_{v→0} \frac{g(v)}{||v||} = 0.    (6.1.1)

A function f : U → Y is differentiable at x ∈ U if there exists a linear transformation L ∈ L(X, Y) such that

f(x + v) = f(x) + Lv + o(v).

This linear transformation L is the definition of Df(x). This derivative is often called the Frechet derivative.
Note that from Theorem 4.8.4 the question whether a given function is differentiable is independent of the norm used on the finite dimensional vector space. That is, a function is differentiable with one norm if and only if it is differentiable with another norm.
Definition 6.1.1 means the error,

f(x + v) − f(x) − Lv,

converges to 0 faster than ||v||. Thus the above definition is equivalent to saying

lim_{v→0} \frac{||f(x + v) − f(x) − Lv||}{||v||} = 0    (6.1.2)

or equivalently,

lim_{y→x} \frac{||f(y) − f(x) − Df(x)(y − x)||}{||y − x||} = 0.    (6.1.3)

The symbol o(v) should be thought of as an adjective. Thus, if t and k are constants,

o(v) = o(v) + o(v),  o(tv) = o(v),  k o(v) = o(v),

and other similar observations hold.
Theorem 6.1.2 The derivative is well defined.

Proof: First note that for a fixed vector v, o(tv) = o(t). This is because

lim_{t→0} \frac{o(tv)}{|t|} = lim_{t→0} ||v|| \frac{o(tv)}{||tv||} = 0.

Now suppose both L_1 and L_2 work in the above definition. Then let v be any vector and let t be a real scalar which is chosen small enough that tv + x ∈ U. Then

f(x + tv) = f(x) + L_1 tv + o(tv),  f(x + tv) = f(x) + L_2 tv + o(tv).

Therefore, subtracting these two yields (L_2 − L_1)(tv) = o(tv) = o(t). Therefore, dividing by t yields (L_2 − L_1)(v) = \frac{o(t)}{t}. Now let t → 0 to conclude that (L_2 − L_1)(v) = 0. Since this is true for all v, it follows L_2 = L_1.
Lemma 6.1.3 Let f be differentiable at x. Then f is continuous at x and in fact, there exists K > 0 such that whenever ||v|| is small enough,

||f(x + v) − f(x)|| ≤ K ||v||.

Proof: From the definition of the derivative,

f(x + v) − f(x) = Df(x) v + o(v).

Let ||v|| be small enough that \frac{||o(v)||}{||v||} < 1, so that ||o(v)|| ≤ ||v||. Then for such v,

||f(x + v) − f(x)|| ≤ ||Df(x) v|| + ||v|| ≤ (||Df(x)|| + 1) ||v||.

This proves the lemma with K = ||Df(x)|| + 1. Here ||Df(x)|| is the operator norm of the linear transformation Df(x).
6.2 The Chain Rule
With the above lemma, it is easy to prove the chain rule.
Theorem 6.2.1 (The chain rule) Let U and V be open sets, U ⊆ X and V ⊆ Y. Suppose f : U → V is differentiable at x ∈ U and suppose g : V → F^q is differentiable at f(x) ∈ V. Then g ◦ f is differentiable at x and

D(g ◦ f)(x) = D(g(f(x))) D(f(x)).

Proof: This follows from a computation. Let B(x, r) ⊆ U and let r also be small enough that for ||v|| ≤ r, it follows that f(x + v) ∈ V. Such an r exists because f is continuous at x. For ||v|| < r, the definition of differentiability of g and f implies

g(f(x + v)) − g(f(x)) = Dg(f(x)) (f(x + v) − f(x)) + o(f(x + v) − f(x))
= Dg(f(x)) [Df(x) v + o(v)] + o(f(x + v) − f(x))
= D(g(f(x))) D(f(x)) v + o(v) + o(f(x + v) − f(x)).    (6.2.4)

It remains to show o(f(x + v) − f(x)) = o(v).

By Lemma 6.1.3, with K given there, letting ε > 0, it follows that for ||v|| small enough,

||o(f(x + v) − f(x))|| ≤ (ε/K) ||f(x + v) − f(x)|| ≤ (ε/K) K ||v|| = ε ||v||.

Since ε > 0 is arbitrary, this shows o(f(x + v) − f(x)) = o(v) because whenever ||v|| is small enough,

\frac{||o(f(x + v) − f(x))||}{||v||} ≤ ε.

By 6.2.4, this shows

g(f(x + v)) − g(f(x)) = D(g(f(x))) D(f(x)) v + o(v),

which proves the theorem.
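The chain rule can be sanity-checked with finite differences (hypothetical functions f, g : ℝ^2 → ℝ^2, not from the text): the Jacobian of g ◦ f at x should agree with the product of the Jacobian of g at f(x) and the Jacobian of f at x.

```python
import math

# Sketch: numeric check of D(g o f)(x) = Dg(f(x)) Df(x) via central differences.

def f(x):
    return [x[0]**2 + x[1], math.sin(x[1])]

def g(y):
    return [y[0]*y[1], y[0] + y[1]**2]

def jacobian(h, x, eps=1e-6):
    # central differences; column k is the partial with respect to x_k
    J = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(2):
        xp = list(x); xm = list(x)
        xp[k] += eps; xm[k] -= eps
        hp, hm = h(xp), h(xm)
        for i in range(2):
            J[i][k] = (hp[i] - hm[i]) / (2*eps)
    return J

def mul(X, Y):
    return [[sum(X[i][k]*Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

x = [0.7, -0.3]
lhs = jacobian(lambda t: g(f(t)), x)
rhs = mul(jacobian(g, f(x)), jacobian(f, x))
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-4 for i in range(2) for j in range(2))
```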
6.3 The Matrix Of The Derivative
Let X, Y be normed vector spaces, a basis for X being {v_1, \cdots, v_n} and a basis for Y being {w_1, \cdots, w_m}. First note that if π_i : X → F is defined by

π_i v ≡ x_i where v = \sum_k x_k v_k,

then π_i ∈ L(X, F) and so by Theorem 4.8.3, it follows that π_i is continuous, and if lim_{s→t} g(s) = L, then |π_i g(s) − π_i L| ≤ ||π_i|| ||g(s) − L||, and so the i^{th} components converge also.
Suppose that f : U → Y is differentiable. What is the matrix of Df(x) with respect to the given bases? That is, if

Df(x) = \sum_{ij} J_{ij}(x) w_i v_j,

what is J_{ij}(x)? Computing the Gateaux derivative in the direction v_k,

D_{v_k} f(x) ≡ lim_{t→0} \frac{f(x + t v_k) − f(x)}{t} = lim_{t→0} \frac{Df(x)(t v_k) + o(t v_k)}{t}
= Df(x)(v_k) = \sum_{ij} J_{ij}(x) w_i v_j (v_k) = \sum_{ij} J_{ij}(x) w_i δ_{jk} = \sum_i J_{ik}(x) w_i.

It follows

lim_{t→0} π_j \left( \frac{f(x + t v_k) − f(x)}{t} \right) ≡ lim_{t→0} \frac{f_j(x + t v_k) − f_j(x)}{t} ≡ D_{v_k} f_j(x) = π_j \left( \sum_i J_{ik}(x) w_i \right) = J_{jk}(x).

Thus J_{ik}(x) = D_{v_k} f_i(x).
In the case where X = ℝ^n and Y = ℝ^m and v is a unit vector, D_v f_i(x) is the familiar directional derivative in the direction v of the function f_i.
Of course the case where X = F^n and f : U ⊆ F^n → F^m is differentiable and the basis vectors are the usual basis vectors is the case most commonly encountered. What is the matrix of Df(x) taken with respect to the usual basis vectors? Let e_i denote the vector of F^n which has a one in the i^{th} entry and zeroes elsewhere. This is the standard basis for F^n. Denote by J_{ij}(x) the matrix with respect to these basis vectors. Thus

Df(x) = \sum_{ij} J_{ij}(x) e_i e_j.

Then from what was just shown,

J_{ik}(x) = D_{e_k} f_i(x) ≡ lim_{t→0} \frac{f_i(x + t e_k) − f_i(x)}{t} ≡ \frac{∂f_i}{∂x_k}(x) ≡ f_{i,x_k}(x) ≡ f_{i,k}(x),

where the last several symbols are just the usual notations for the partial derivative of the function f_i with respect to the k^{th} variable, where

f(x) ≡ \sum_{i=1}^{m} f_i(x) e_i.
In other words, the matrix of Df(x) is nothing more than the matrix of partial derivatives. The k^{th} column of the matrix (J_{ij}) is

\frac{∂f}{∂x_k}(x) = lim_{t→0} \frac{f(x + t e_k) − f(x)}{t} ≡ D_{e_k} f(x).

Thus the matrix of Df(x) with respect to the usual basis vectors is the matrix of the form

\begin{pmatrix} f_{1,x_1}(x) & f_{1,x_2}(x) & \cdots & f_{1,x_n}(x) \\ \vdots & \vdots & & \vdots \\ f_{m,x_1}(x) & f_{m,x_2}(x) & \cdots & f_{m,x_n}(x) \end{pmatrix}

where the notation g_{,x_k} denotes the k^{th} partial derivative given by the limit,

lim_{t→0} \frac{g(x + t e_k) − g(x)}{t} ≡ \frac{∂g}{∂x_k}.
The above discussion is summarized in the following theorem.
Theorem 6.3.1 Let f : F^n → F^m and suppose f is differentiable at x. Then all the partial derivatives \frac{∂f_i}{∂x_j}(x) exist, and if Jf(x) is the matrix of the linear transformation Df(x) with respect to the standard basis vectors, then the ij^{th} entry is given by \frac{∂f_i}{∂x_j}(x), also denoted as f_{i,j} or f_{i,x_j}.
Definition 6.3.2 In general, the symbol D_v f(x) is defined by

D_v f(x) ≡ lim_{t→0} \frac{f(x + tv) − f(x)}{t}

where t ∈ F. This is often called the Gateaux derivative.
What if all the partial derivatives of f exist? Does it follow that f is differentiable? Consider the following function, f : ℝ^2 → ℝ,

f(x, y) = \begin{cases} \frac{xy}{x^2 + y^2} & \text{if } (x, y) ≠ (0, 0) \\ 0 & \text{if } (x, y) = (0, 0) \end{cases}.

Then from the definition of partial derivatives,

lim_{h→0} \frac{f(h, 0) − f(0, 0)}{h} = lim_{h→0} \frac{0 − 0}{h} = 0

and

lim_{h→0} \frac{f(0, h) − f(0, 0)}{h} = lim_{h→0} \frac{0 − 0}{h} = 0.

However, f is not even continuous at (0, 0), which may be seen by considering the behavior of the function along the line y = x and along the line x = 0. By Lemma 6.1.3 this implies f is not differentiable. Therefore, it is necessary to consider the correct definition of the derivative given above if you want to get a notion which generalizes the concept of the derivative of a function of one variable in such a way as to preserve continuity whenever the function is differentiable.
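The counterexample is easy to probe numerically: both partial difference quotients at the origin vanish, yet along the line y = x the function is identically 1/2 no matter how close to the origin you sample.

```python
# Sketch: f(x, y) = xy/(x^2 + y^2), f(0, 0) = 0, has zero partials at the
# origin but is not continuous there.

def f(x, y):
    return x*y/(x*x + y*y) if (x, y) != (0.0, 0.0) else 0.0

# both partial difference quotients at the origin are exactly 0
for h in (1e-1, 1e-4, 1e-8):
    assert f(h, 0.0)/h == 0.0
    assert f(0.0, h)/h == 0.0

# but along the line y = x the value is constantly 1/2
for t in (1e-1, 1e-4, 1e-8):
    assert abs(f(t, t) - 0.5) < 1e-12
```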
6.4 A Mean Value Inequality
The following theorem will be very useful in much of what follows. It is a version
of the mean value theorem as is the next lemma.
Lemma 6.4.1 Let Y be a normed vector space and suppose h : [0, 1] → Y is differentiable and satisfies

||h′(t)|| ≤ M.

Then

||h(1) − h(0)|| ≤ M.

Proof: Let ε > 0 be given and let

S ≡ {t ∈ [0, 1] : for all s ∈ [0, t], ||h(s) − h(0)|| ≤ (M + ε) s}.

Then 0 ∈ S. Let t = sup S. Then by continuity of h it follows

||h(t) − h(0)|| = (M + ε) t.    (6.4.5)

Suppose t < 1. Then there exist positive numbers h_k decreasing to 0 such that

||h(t + h_k) − h(0)|| > (M + ε)(t + h_k),

and now it follows from 6.4.5 and the triangle inequality that

(M + ε)(t + h_k) < ||h(t + h_k) − h(0)|| ≤ ||h(t + h_k) − h(t)|| + ||h(t) − h(0)|| = ||h(t + h_k) − h(t)|| + (M + ε) t,

and so

||h(t + h_k) − h(t)|| > (M + ε) h_k.

Now dividing by h_k and letting k → ∞,

||h′(t)|| ≥ M + ε,

a contradiction. Hence t = 1, so ||h(1) − h(0)|| ≤ M + ε, and since ε > 0 was arbitrary, the lemma follows.
Theorem 6.4.2 Suppose U is an open subset of X and f : U → Y has the property that Df(x) exists for all x in U and that x + t(y − x) ∈ U for all t ∈ [0, 1]. (The line segment joining the two points lies in U.) Suppose also that for all points on this line segment,

||Df(x + t(y − x))|| ≤ M.

Then

||f(y) − f(x)|| ≤ M ||y − x||.

Proof: Let

h(t) ≡ f(x + t(y − x)).

Then by the chain rule,

h′(t) = Df(x + t(y − x))(y − x)

and so

||h′(t)|| = ||Df(x + t(y − x))(y − x)|| ≤ M ||y − x||.

By Lemma 6.4.1,

||h(1) − h(0)|| = ||f(y) − f(x)|| ≤ M ||y − x||.
6.5 Existence Of The Derivative, C^1 Functions
There is a way to get the differentiability of a function from the existence and continuity of the Gateaux derivatives. This is very convenient because these Gateaux derivatives are taken with respect to a one dimensional variable. The following theorem is the main result.

Theorem 6.5.1 Let X be a normed vector space having basis {v_1, \cdots, v_n} and let Y be another normed vector space having basis {w_1, \cdots, w_m}. Let U be an open set in X and let f : U → Y have the property that the Gateaux derivatives,

D_{v_k} f(x) ≡ lim_{t→0} \frac{f(x + t v_k) − f(x)}{t},

exist and are continuous functions of x. Then Df(x) exists and

Df(x) v = \sum_{k=1}^{n} D_{v_k} f(x) a_k

where

v = \sum_{k=1}^{n} a_k v_k.

Furthermore, x → Df(x) is continuous; that is,

lim_{y→x} ||Df(y) − Df(x)|| = 0.
Proof: Let v = \sum_{k=1}^{n} a_k v_k. Then

f(x + v) − f(x) = f\left( x + \sum_{k=1}^{n} a_k v_k \right) − f(x).

Then letting \sum_{j=1}^{0} ≡ 0, f(x + v) − f(x) is given by

\sum_{k=1}^{n} \left[ f\left( x + \sum_{j=1}^{k} a_j v_j \right) − f\left( x + \sum_{j=1}^{k−1} a_j v_j \right) \right]
= \sum_{k=1}^{n} \left[ f(x + a_k v_k) − f(x) \right] + \sum_{k=1}^{n} \left\{ \left[ f\left( x + \sum_{j=1}^{k} a_j v_j \right) − f(x + a_k v_k) \right] − \left[ f\left( x + \sum_{j=1}^{k−1} a_j v_j \right) − f(x) \right] \right\}    (6.5.6)
Consider the k^{th} term in 6.5.6. Let

h(t) ≡ f\left( x + \sum_{j=1}^{k−1} a_j v_j + t a_k v_k \right) − f(x + t a_k v_k)

for t ∈ [0, 1]. Then

h′(t) = a_k lim_{h→0} \frac{1}{a_k h} \left[ f\left( x + \sum_{j=1}^{k−1} a_j v_j + (t + h) a_k v_k \right) − f(x + (t + h) a_k v_k) − \left( f\left( x + \sum_{j=1}^{k−1} a_j v_j + t a_k v_k \right) − f(x + t a_k v_k) \right) \right]

and this equals

\left[ D_{v_k} f\left( x + \sum_{j=1}^{k−1} a_j v_j + t a_k v_k \right) − D_{v_k} f(x + t a_k v_k) \right] a_k.    (6.5.7)
Now without loss of generality, it can be assumed the norm on X is given by that of Example 4.8.5,

||v|| ≡ max \left\{ |a_k| : v = \sum_{j=1}^{n} a_j v_j \right\},

because by Theorem 4.8.4 all norms on X are equivalent. Therefore, from 6.5.7 and the assumption that the Gateaux derivatives are continuous,

||h′(t)|| = \left|\left| \left[ D_{v_k} f\left( x + \sum_{j=1}^{k−1} a_j v_j + t a_k v_k \right) − D_{v_k} f(x + t a_k v_k) \right] a_k \right|\right| ≤ ε |a_k| ≤ ε ||v||
provided ||v|| is sufficiently small. Since ε is arbitrary, it follows from Lemma 6.4.1 the expression in 6.5.6 is o(v), because this expression equals a finite sum of terms of the form h(1) − h(0) where ||h′(t)|| ≤ ε ||v||. Thus

f(x + v) − f(x) = \sum_{k=1}^{n} [f(x + a_k v_k) − f(x)] + o(v)
= \sum_{k=1}^{n} D_{v_k} f(x) a_k + \sum_{k=1}^{n} [f(x + a_k v_k) − f(x) − D_{v_k} f(x) a_k] + o(v).
Consider the k^{th} term in the second sum.

f(x + a_k v_k) − f(x) − D_{v_k} f(x) a_k = a_k \left[ \frac{f(x + a_k v_k) − f(x)}{a_k} − D_{v_k} f(x) \right],

where the expression in the parentheses converges to 0 as a_k → 0. Thus whenever ||v|| is sufficiently small,

||f(x + a_k v_k) − f(x) − D_{v_k} f(x) a_k|| ≤ ε |a_k| ≤ ε ||v||,

which shows the second sum is also o(v). Therefore,

f(x + v) − f(x) = \sum_{k=1}^{n} D_{v_k} f(x) a_k + o(v).
Defining

Df(x) v ≡ \sum_{k=1}^{n} D_{v_k} f(x) a_k

where v = \sum_k a_k v_k, it follows Df(x) ∈ L(X, Y) and is given by the above formula.

It remains to verify x → Df(x) is continuous.

||(Df(x) − Df(y)) v|| ≤ \sum_{k=1}^{n} ||(D_{v_k} f(x) − D_{v_k} f(y)) a_k||
≤ max \{ |a_k|, k = 1, \cdots, n \} \sum_{k=1}^{n} ||D_{v_k} f(x) − D_{v_k} f(y)||
= ||v|| \sum_{k=1}^{n} ||D_{v_k} f(x) − D_{v_k} f(y)||

and so

||Df(x) − Df(y)|| ≤ \sum_{k=1}^{n} ||D_{v_k} f(x) − D_{v_k} f(y)||,

which proves the continuity of Df because of the assumption that the Gateaux derivatives are continuous.
This motivates the following definition of what it means for a function to be C^1.

Definition 6.5.2 Let U be an open subset of a normed finite dimensional vector space X and let f : U → Y, another finite dimensional normed vector space. Then f is said to be C^1 if there exists a basis for X, {v_1, \cdots, v_n}, such that the Gateaux derivatives

D_{v_k} f(x)

exist on U and are continuous.
Here is another definition of what it means for a function to be C^1.

Definition 6.5.3 Let U be an open subset of a normed finite dimensional vector space X and let f : U → Y, another finite dimensional normed vector space. Then f is said to be C^1 if f is differentiable and x → Df(x) is continuous as a map from U to L(X, Y).
Now the following major theorem states these two definitions are equivalent.

Theorem 6.5.4 Let U be an open subset of a normed finite dimensional vector space X and let f : U → Y, another finite dimensional normed vector space. Then the two definitions above are equivalent.

Proof: It was shown in Theorem 6.5.1 that Definition 6.5.2 implies 6.5.3. Suppose then that Definition 6.5.3 holds. Then if v is any vector,

lim_{t→0} \frac{f(x + tv) − f(x)}{t} = lim_{t→0} \frac{Df(x) tv + o(tv)}{t} = Df(x) v + lim_{t→0} \frac{o(tv)}{t} = Df(x) v.

Thus D_v f(x) exists and equals Df(x) v. By continuity of x → Df(x), this establishes continuity of x → D_v f(x) and proves the theorem.
Note that the proof of the theorem also implies the following corollary.

Corollary 6.5.5 Let U be an open subset of a normed finite dimensional vector space X and let f : U → Y, another finite dimensional normed vector space. If there is a basis of X, {v_1, \cdots, v_n}, such that the Gateaux derivatives D_{v_k} f(x) exist and are continuous, then all Gateaux derivatives D_v f(x) exist and are continuous for all v ∈ X.

From now on, whichever definition is more convenient will be used.
6.6 Higher Order Derivatives
If f : U ⊆ X →Y for U an open set, then
x →Df (x)
is a mapping from U to L(X, Y ), a normed vector space. Therefore, it makes
perfect sense to ask whether this function is also diﬀerentiable.
Definition 6.6.1 The following is the definition of the second derivative:

D^2 f(x) ≡ D(Df(x)).

Thus,

Df(x + v) − Df(x) = D^2 f(x) v + o(v).

This implies

D^2 f(x) ∈ L(X, L(X, Y)),  D^2 f(x)(u)(v) ∈ Y,

and the map

(u, v) → D^2 f(x)(u)(v)

is a bilinear map having values in Y. In other words, the two functions,

u → D^2 f(x)(u)(v),  v → D^2 f(x)(u)(v),

are both linear.
The same pattern applies to taking higher order derivatives. Thus,

D^3 f(x) ≡ D(D^2 f(x)),

and D^3 f(x) may be considered as a trilinear map having values in Y. In general, D^k f(x) may be considered a k linear map. This means the function

(u_1, \cdots, u_k) → D^k f(x)(u_1) \cdots (u_k)

has the property that each map

u_j → D^k f(x)(u_1) \cdots (u_j) \cdots (u_k)

is linear.

Also, instead of writing

D^2 f(x)(u)(v), or D^3 f(x)(u)(v)(w),

the following notation is often used:

D^2 f(x)(u, v) or D^3 f(x)(u, v, w),

with similar conventions for higher derivatives than 3. Another convention which is often used is the notation

D^k f(x) v^k

instead of

D^k f(x)(v, \cdots, v).
Note that for every k, D^k f maps U to a normed vector space. As mentioned above, Df(x) has values in L(X, Y), D²f(x) has values in L(X, L(X, Y)), etc. Thus it makes sense to consider whether D^k f is continuous. This is described in the following definition.
the following deﬁnition.
Deﬁnition 6.6.2 Let U be an open subset of X, a normed vector space and let
f : U → Y. Then f is C^k(U) if f and its first k derivatives are all continuous. Also, D^k f(x), when it exists, can be considered a Y valued multilinear function.
6.7 C^k Functions

Recall that for a C¹ function f,

Df(x)v = Σ_j D_{v_j} f(x) a_j = Σ_{ij} D_{v_j} f_i(x) w_i a_j
       = Σ_{ij} D_{v_j} f_i(x) w_i v_j (Σ_k a_k v_k) = Σ_{ij} D_{v_j} f_i(x) w_i v_j (v)

where Σ_k a_k v_k = v and

f(x) = Σ_i f_i(x) w_i.  (6.7.8)
This is because

w_i v_j (Σ_k a_k v_k) ≡ Σ_k a_k w_i δ_{jk} = w_i a_j.

Thus

Df(x) = Σ_{ij} D_{v_j} f_i(x) w_i v_j.
I propose to iterate this observation, starting with f and then going to Df and then D²f and so forth. Hopefully it will yield a rational way to understand higher order derivatives in the same way that matrices can be used to understand linear transformations. Thus, beginning with the derivative,

Df(x) = Σ_{i j_1} D_{v_{j_1}} f_i(x) w_i v_{j_1}.

Then letting w_i v_{j_1} play the role of w_i in 6.7.8,

D²f(x) = Σ_{i j_1 j_2} D_{v_{j_2}}(D_{v_{j_1}} f_i)(x) w_i v_{j_1} v_{j_2}
       ≡ Σ_{i j_1 j_2} D_{v_{j_1} v_{j_2}} f_i(x) w_i v_{j_1} v_{j_2}.

Then letting w_i v_{j_1} v_{j_2} play the role of w_i in 6.7.8,

D³f(x) = Σ_{i j_1 j_2 j_3} D_{v_{j_3}}(D_{v_{j_1} v_{j_2}} f_i)(x) w_i v_{j_1} v_{j_2} v_{j_3}
       ≡ Σ_{i j_1 j_2 j_3} D_{v_{j_1} v_{j_2} v_{j_3}} f_i(x) w_i v_{j_1} v_{j_2} v_{j_3},

etc. In general, the notation

w_i v_{j_1} v_{j_2} ⋯ v_{j_n}

defines an appropriate linear transformation given by

w_i v_{j_1} v_{j_2} ⋯ v_{j_n} (v_k) = w_i v_{j_1} v_{j_2} ⋯ v_{j_{n−1}} δ_{k j_n}.
The following theorem is important.
Theorem 6.7.1 The function x → D^k f(x) exists and is continuous for k ≤ p if and only if there exists a basis {v_1, ⋯, v_n} for X and a basis {w_1, ⋯, w_m} for Y such that for

f(x) ≡ Σ_i f_i(x) w_i,

it follows that for each i = 1, 2, ⋯, m, all Gateaux derivatives

D_{v_{j_1} v_{j_2} ⋯ v_{j_k}} f_i(x)

for any choice of v_{j_1}, v_{j_2}, ⋯, v_{j_k} and for any k ≤ p exist and are continuous.
Proof: This follows from a repeated application of Theorems 6.5.1 and 6.5.4 at
each new diﬀerentiation.
Deﬁnition 6.7.2 Let X, Y be ﬁnite dimensional normed vector spaces and let U
be an open set in X and f : U →Y be a function,
f(x) = Σ_i f_i(x) w_i

where {w_1, ⋯, w_m} is a basis for Y. Then f is said to be a C^n(U) function if for every k ≤ n, D^k f(x) exists for all x ∈ U and is continuous. This is equivalent to the other condition, which states that for each i = 1, 2, ⋯, m, all Gateaux derivatives

D_{v_{j_1} v_{j_2} ⋯ v_{j_k}} f_i(x)

for any choice of v_{j_1}, v_{j_2}, ⋯, v_{j_k}, where {v_1, ⋯, v_n} is a basis for X, and for any k ≤ n exist and are continuous.
6.7.1 Some Standard Notation
In the case where X = ℝⁿ and the basis chosen is the standard basis, these Gateaux derivatives are just the partial derivatives. Recall the notation for partial derivatives in the following definition.

Definition 6.7.3 Let g : U → X. Then

g_{x_k}(x) ≡ ∂g/∂x_k (x) ≡ lim_{h→0} (g(x + he_k) − g(x))/h.
Higher order partial derivatives are deﬁned in the usual way.
g_{x_k x_l}(x) ≡ ∂²g/∂x_l ∂x_k (x)
and so forth.
A convenient notation which helps to make sense of higher order partial derivatives is presented in the following definition.
Definition 6.7.4 α = (α_1, ⋯, α_n) for α_1, ⋯, α_n positive integers is called a multi-index. For α a multi-index, |α| ≡ α_1 + ⋯ + α_n, and if x ∈ X,

x = (x_1, ⋯, x_n),

and f a function, define

x^α ≡ x_1^{α_1} x_2^{α_2} ⋯ x_n^{α_n},   D^α f(x) ≡ ∂^{|α|} f(x) / (∂x_1^{α_1} ∂x_2^{α_2} ⋯ ∂x_n^{α_n}).
Then in this special case, the following definition is equivalent to the above as a definition of what is meant by a C^k function.

Definition 6.7.5 Let U be an open subset of ℝⁿ and let f : U → Y. Then for k a nonnegative integer, f is C^k if for every |α| ≤ k, D^α f exists and is continuous.
6.8 The Derivative And The Cartesian Product
There are theorems which can be used to get diﬀerentiability of a function based
on existence and continuity of the partial derivatives. A generalization of this was
given above. Here a function deﬁned on a product space is considered. It is very
much like what was presented above and could be obtained as a special case but
to reinforce the ideas, I will do it from scratch because certain aspects of it are
important in the statement of the implicit function theorem.
The following is an important abstract generalization of the concept of partial
derivative presented above. Instead of taking the derivative with respect to one
variable, it is taken with respect to several but not with respect to others. This
vague notion is made precise in the following deﬁnition. First here is a lemma.
Lemma 6.8.1 Suppose U is an open set in X × Y. Then the set U_y defined by

U_y ≡ {x ∈ X : (x, y) ∈ U}

is an open set in X. Here X × Y is a finite dimensional vector space in which the vector space operations are defined componentwise. Thus for a, b ∈ F,

a(x_1, y_1) + b(x_2, y_2) = (ax_1 + bx_2, ay_1 + by_2),

and the norm can be taken to be

||(x, y)|| ≡ max(||x||, ||y||).
Proof: Recall that by Theorem 4.8.4 it does not matter how this norm is defined, and the definition above is convenient. It obviously satisfies most axioms of a norm. The only one which is not obvious is the triangle inequality. I will show this now.

||(x, y) + (x_1, y_1)|| = ||(x + x_1, y + y_1)|| ≡ max(||x + x_1||, ||y + y_1||)
                        ≤ max(||x|| + ||x_1||, ||y|| + ||y_1||).

Suppose then that ||x|| + ||x_1|| ≥ ||y|| + ||y_1||. Then the above equals

||x|| + ||x_1|| ≤ max(||x||, ||y||) + max(||x_1||, ||y_1||) ≡ ||(x, y)|| + ||(x_1, y_1)||.

In case ||x|| + ||x_1|| < ||y|| + ||y_1||, the argument is similar.

Let x ∈ U_y. Then (x, y) ∈ U and so there exists r > 0 such that

B((x, y), r) ⊆ U.

This says that if (u, v) ∈ X × Y is such that ||(u, v) − (x, y)|| < r, then (u, v) ∈ U. Thus if

||(u, y) − (x, y)|| = ||u − x|| < r,

then (u, y) ∈ U. This has just said that B(x, r), the ball taken in X, is contained in U_y.
Of course one could also consider

U_x ≡ {y : (x, y) ∈ U}

in the same way and conclude this set is open in Y. Also, the generalization to many factors yields the same conclusion. In this case, for x ∈ ∏_{i=1}^n X_i, let

||x|| ≡ max(||x_i||_{X_i} : x = (x_1, ⋯, x_n)).

Then a similar argument to the above shows this is a norm on ∏_{i=1}^n X_i.
Corollary 6.8.2 Let U ⊆ ∏_{i=1}^n X_i and let

U_{(x_1,⋯,x_{i−1},x_{i+1},⋯,x_n)} ≡ {x ∈ F^{r_i} : (x_1, ⋯, x_{i−1}, x, x_{i+1}, ⋯, x_n) ∈ U}.

Then U_{(x_1,⋯,x_{i−1},x_{i+1},⋯,x_n)} is an open set in F^{r_i}.
The proof is similar to the above.
Definition 6.8.3 Let g : U ⊆ ∏_{i=1}^n X_i → Y, where U is an open set. Then the map

z → g(x_1, ⋯, x_{i−1}, z, x_{i+1}, ⋯, x_n)

is a function from the open set in X_i,

{z : x = (x_1, ⋯, x_{i−1}, z, x_{i+1}, ⋯, x_n) ∈ U},

to Y. When this map is differentiable, its derivative is denoted by D_i g(x). To aid in the notation, for v ∈ X_i, let θ_i v ∈ ∏_{i=1}^n X_i be the vector (0, ⋯, v, ⋯, 0) where the v is in the i-th slot, and for v ∈ ∏_{i=1}^n X_i, let v_i denote the entry in the i-th slot of v. Thus, by saying

z → g(x_1, ⋯, x_{i−1}, z, x_{i+1}, ⋯, x_n)

is differentiable is meant that for v ∈ X_i sufficiently small,

g(x + θ_i v) − g(x) = D_i g(x) v + o(v).

Note D_i g(x) ∈ L(X_i, Y).
Definition 6.8.4 Let U ⊆ X be an open set. Then f : U → Y is C¹(U) if f is differentiable and the mapping

x → Df(x)

is continuous as a function from U to L(X, Y).
With this deﬁnition of partial derivatives, here is the major theorem.
Theorem 6.8.5 Let g, U, ∏_{i=1}^n X_i be given as in Definition 6.8.3. Then g is C¹(U) if and only if D_i g exists and is continuous on U for each i. In this case, g is differentiable and

Dg(x)(v) = Σ_k D_k g(x) v_k  (6.8.9)

where v = (v_1, ⋯, v_n).
Proof: Suppose then that D_i g exists and is continuous for each i. Note that

Σ_{j=1}^k θ_j v_j = (v_1, ⋯, v_k, 0, ⋯, 0).

Thus Σ_{j=1}^n θ_j v_j = v and define Σ_{j=1}^0 θ_j v_j ≡ 0. Therefore,

g(x + v) − g(x) = Σ_{k=1}^n [g(x + Σ_{j=1}^k θ_j v_j) − g(x + Σ_{j=1}^{k−1} θ_j v_j)].  (6.8.10)
Consider the terms in this sum.

g(x + Σ_{j=1}^k θ_j v_j) − g(x + Σ_{j=1}^{k−1} θ_j v_j) = g(x + θ_k v_k) − g(x) +  (6.8.11)

[g(x + Σ_{j=1}^k θ_j v_j) − g(x + θ_k v_k)] − [g(x + Σ_{j=1}^{k−1} θ_j v_j) − g(x)],  (6.8.12)

and the expression in 6.8.12 is of the form h(v_k) − h(0) where for small w ∈ X_k,

h(w) ≡ g(x + Σ_{j=1}^{k−1} θ_j v_j + θ_k w) − g(x + θ_k w).

Therefore,

Dh(w) = D_k g(x + Σ_{j=1}^{k−1} θ_j v_j + θ_k w) − D_k g(x + θ_k w)

and by continuity, ||Dh(w)|| < ε provided ||v|| is small enough. Therefore, by Theorem 6.4.2, whenever ||v|| is small enough,

||h(v_k) − h(0)|| ≤ ε||v_k|| ≤ ε||v||,

which shows that since ε is arbitrary, the expression in 6.8.12 is o(v). Now in 6.8.11,

g(x + θ_k v_k) − g(x) = D_k g(x) v_k + o(v_k) = D_k g(x) v_k + o(v).

Therefore, referring to 6.8.10,

g(x + v) − g(x) = Σ_{k=1}^n D_k g(x) v_k + o(v),
which shows Dg (x) exists and equals the formula given in 6.8.9.
Next suppose g is C¹. I need to verify that D_k g(x) exists and is continuous. Let v ∈ X_k be sufficiently small. Then

g(x + θ_k v) − g(x) = Dg(x) θ_k v + o(θ_k v) = Dg(x) θ_k v + o(v)

since ||θ_k v|| = ||v||. Then D_k g(x) exists and equals

Dg(x) ◦ θ_k.

Now x → Dg(x) is continuous. Since θ_k is linear, it follows from Theorem 4.8.3 that θ_k : X_k → ∏_{i=1}^n X_i is also continuous.
The way this is usually used is in the following corollary, a case of Theorem 6.8.5
obtained by letting X
i
= F in the above theorem.
Corollary 6.8.6 Let U be an open subset of Fⁿ and let f : U → Fᵐ be C¹ in the sense that all the partial derivatives of f exist and are continuous. Then f is differentiable and

f(x + v) = f(x) + Σ_{k=1}^n ∂f/∂x_k (x) v_k + o(v).
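The first-order expansion in Corollary 6.8.6 can be checked numerically. The following is a sketch only: the map f below and the step sizes are hypothetical choices, not taken from the text. The error f(x+v) − f(x) − Σ_k (∂f/∂x_k)(x) v_k should be o(v), so dividing it by ||v|| should give quantities tending to 0.

```python
import numpy as np

def f(x):
    # a hypothetical C^1 map from R^2 to R^2
    return np.array([x[0]**2 + np.sin(x[1]), x[0] * x[1]])

def partial(f, x, k, h=1e-6):
    # forward-difference approximation of the k-th partial derivative at x
    e = np.zeros_like(x)
    e[k] = h
    return (f(x + e) - f(x)) / h

x = np.array([1.0, 0.5])
ratios = []
for t in [1e-1, 1e-2, 1e-3]:
    v = t * np.array([0.3, -0.7])
    lin = sum(partial(f, x, k) * v[k] for k in range(2))
    ratios.append(np.linalg.norm(f(x + v) - f(x) - lin) / np.linalg.norm(v))
print(ratios)  # decreasing toward 0 as ||v|| shrinks
```

The decreasing ratios illustrate that the remainder is o(v), which is exactly the differentiability conclusion of the corollary.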
6.9 Mixed Partial Derivatives
Continuing with the special case where f is defined on an open set in Fⁿ, I will next consider an interesting result due to Euler in 1734 about mixed partial derivatives. It turns out that the mixed partial derivatives, if continuous, will end up being equal.
Recall the notation

f_x = ∂f/∂x = D_{e_1} f

and

f_{xy} = ∂²f/∂y∂x = D_{e_1 e_2} f.
Theorem 6.9.1 Suppose f : U ⊆ F² → ℝ where U is an open set on which f_x, f_y, f_{xy}, and f_{yx} exist. Then if f_{xy} and f_{yx} are continuous at the point (x, y) ∈ U, it follows

f_{xy}(x, y) = f_{yx}(x, y).
Proof: Since U is open, there exists r > 0 such that B((x, y), r) ⊆ U. Now let |t|, |s| < r/2, t, s real numbers, and consider

∆(s, t) ≡ (1/st){[f(x + t, y + s) − f(x + t, y)] − [f(x, y + s) − f(x, y)]},  (6.9.13)

where the first bracket is h(t) and the second is h(0) for h defined below. Note that (x + t, y + s) ∈ U because

|(x + t, y + s) − (x, y)| = |(t, s)| = (t² + s²)^{1/2} ≤ (r²/4 + r²/4)^{1/2} = r/√2 < r.
As implied above, h(t) ≡ f(x + t, y + s) − f(x + t, y). Therefore, by the mean value theorem and the (one variable) chain rule,

∆(s, t) = (1/st)(h(t) − h(0)) = (1/st) h′(αt) t
        = (1/s)(f_x(x + αt, y + s) − f_x(x + αt, y))

for some α ∈ (0, 1). Applying the mean value theorem again,

∆(s, t) = f_{xy}(x + αt, y + βs)
where α, β ∈ (0, 1).
If the terms f(x + t, y) and f(x, y + s) are interchanged in 6.9.13, ∆(s, t) is unchanged, and the above argument shows there exist γ, δ ∈ (0, 1) such that

∆(s, t) = f_{yx}(x + γt, y + δs).

Letting (s, t) → (0, 0) and using the continuity of f_{xy} and f_{yx} at (x, y),

lim_{(s,t)→(0,0)} ∆(s, t) = f_{xy}(x, y) = f_{yx}(x, y).
The following is obtained from the above by simply ﬁxing all the variables except
for the two of interest.
Corollary 6.9.2 Suppose U is an open subset of X and f : U → ℝ has the property that for two indices k, l, the partials f_{x_k}, f_{x_l}, f_{x_l x_k}, and f_{x_k x_l} exist on U and f_{x_k x_l} and f_{x_l x_k} are both continuous at x ∈ U. Then f_{x_k x_l}(x) = f_{x_l x_k}(x).
By considering the real and imaginary parts of f in the case where f has values
in C you obtain the following corollary.
Corollary 6.9.3 Suppose U is an open subset of Fⁿ and f : U → F has the property that for two indices k, l, the partials f_{x_k}, f_{x_l}, f_{x_l x_k}, and f_{x_k x_l} exist on U and f_{x_k x_l} and f_{x_l x_k} are both continuous at x ∈ U. Then f_{x_k x_l}(x) = f_{x_l x_k}(x).
Finally, by considering the components of f you get the following generalization.
Corollary 6.9.4 Suppose U is an open subset of Fⁿ and f : U → Fᵐ has the property that for two indices k, l, the partials f_{x_k}, f_{x_l}, f_{x_l x_k}, and f_{x_k x_l} exist on U and f_{x_k x_l} and f_{x_l x_k} are both continuous at x ∈ U. Then f_{x_k x_l}(x) = f_{x_l x_k}(x).
It is necessary to assume the mixed partial derivatives are continuous in order
to assert they are equal. The following is a well known example [3].
Example 6.9.5 Let

f(x, y) = xy(x² − y²)/(x² + y²) if (x, y) ≠ (0, 0),  f(x, y) = 0 if (x, y) = (0, 0).
From the definition of partial derivatives it follows immediately that f_x(0, 0) = f_y(0, 0) = 0. Using the standard rules of differentiation, for (x, y) ≠ (0, 0),

f_x = y (x⁴ − y⁴ + 4x²y²)/(x² + y²)²,   f_y = x (x⁴ − y⁴ − 4x²y²)/(x² + y²)².
Now

f_{xy}(0, 0) ≡ lim_{y→0} (f_x(0, y) − f_x(0, 0))/y = lim_{y→0} −y⁴/(y²)² = −1,

while

f_{yx}(0, 0) ≡ lim_{x→0} (f_y(x, 0) − f_y(0, 0))/x = lim_{x→0} x⁴/(x²)² = 1,

showing that although the mixed partial derivatives do exist at (0, 0), they are not equal there.
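The unequal mixed partials in Example 6.9.5 can also be seen numerically. This is a sketch; the difference steps below are arbitrary choices, and the second-order difference quotients mirror the limits computed in the text.

```python
def f(x, y):
    # the function of Example 6.9.5
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x * y * (x**2 - y**2) / (x**2 + y**2)

def fx(x, y, h=1e-6):
    # central difference approximation of f_x
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def fy(x, y, h=1e-6):
    # central difference approximation of f_y
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

h = 1e-3
fxy = (fx(0.0, h) - fx(0.0, -h)) / (2 * h)  # differentiate f_x in y at (0,0)
fyx = (fy(h, 0.0) - fy(-h, 0.0)) / (2 * h)  # differentiate f_y in x at (0,0)
print(fxy, fyx)  # close to -1 and 1 respectively
```

The two difference quotients settle near −1 and 1, matching the hand computation and confirming that continuity of the mixed partials cannot be dropped from Theorem 6.9.1.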
6.10 Implicit Function Theorem
The following lemma is very useful.
Lemma 6.10.1 Let A ∈ L(X, X) where X is a finite dimensional normed vector space and suppose ||A|| ≤ r < 1. Then

(I − A)⁻¹ exists  (6.10.14)

and

||(I − A)⁻¹|| ≤ (1 − r)⁻¹.  (6.10.15)

Furthermore, if

GL(X) ≡ {A ∈ L(X, X) : A⁻¹ exists},

the map A → A⁻¹ is continuous on GL(X) and GL(X) is an open subset of L(X, X).
Proof: Let ||A|| ≤ r < 1. If (I − A)x = 0, then x = Ax, and so if x ≠ 0,

||x|| = ||Ax|| ≤ ||A|| ||x|| < r||x||,

which is a contradiction. Therefore, (I − A) is one to one. Hence it maps a basis of X to a basis of X and is therefore onto. Here is why. Let {v_1, ⋯, v_n} be a basis for X and suppose

Σ_{k=1}^n c_k (I − A) v_k = 0.

Then

(I − A)(Σ_{k=1}^n c_k v_k) = 0,

and since (I − A) is one to one, it follows

Σ_{k=1}^n c_k v_k = 0,

which requires each c_k = 0 because the {v_k} are independent. Hence {(I − A)v_k}_{k=1}^n is a basis for X because there are n of these vectors and every basis has the same size. Therefore, if y ∈ X, there exist scalars c_k such that

y = Σ_{k=1}^n c_k (I − A) v_k = (I − A)(Σ_{k=1}^n c_k v_k),

so (I − A) is onto as claimed. Thus (I − A)⁻¹ ∈ L(X, X) and it remains to estimate its norm.

||x − Ax|| ≥ ||x|| − ||Ax|| ≥ ||x|| − ||A|| ||x|| ≥ ||x||(1 − r).

Letting y = x − Ax so that x = (I − A)⁻¹ y, this shows, since (I − A) is onto, that for all y ∈ X,

||y|| ≥ ||(I − A)⁻¹ y||(1 − r),

and so ||(I − A)⁻¹|| ≤ (1 − r)⁻¹. This proves the first part.
To verify the continuity of the inverse map, let A be invertible. Then

B = A(I − A⁻¹(A − B)),

and so if ||A⁻¹(A − B)|| < 1, which, by Theorem 4.8.3, happens if

||A − B|| < 1/||A⁻¹||,

it follows from the first part of this proof that (I − A⁻¹(A − B))⁻¹ exists and so

B⁻¹ = (I − A⁻¹(A − B))⁻¹ A⁻¹,

which shows the set of invertible elements of L(X, X) is open. Also, if

||A⁻¹(A − B)|| ≤ r < 1,  (6.10.16)
then

||B⁻¹|| ≤ ||A⁻¹||(1 − r)⁻¹.

Now for such B this close to A, so that 6.10.16 holds,

||B⁻¹ − A⁻¹|| = ||B⁻¹(A − B)A⁻¹|| ≤ ||A − B|| ||B⁻¹|| ||A⁻¹||
             ≤ ||A − B|| ||A⁻¹||² (1 − r)⁻¹,

which shows the map which takes a linear transformation to its inverse is continuous.
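Lemma 6.10.1 can be illustrated numerically: when ||A|| ≤ r < 1, the Neumann series I + A + A² + ⋯ converges to (I − A)⁻¹ and the bound 6.10.15 holds. The following sketch uses an arbitrary random matrix scaled to operator norm 1/2.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A *= 0.5 / np.linalg.norm(A, 2)   # rescale so the operator norm is r = 1/2
r = 0.5

# partial sums of the Neumann series I + A + A^2 + ...
S = np.zeros((4, 4))
P = np.eye(4)
for _ in range(100):
    S += P
    P = P @ A

inv = np.linalg.inv(np.eye(4) - A)
series_error = np.linalg.norm(S - inv, 2)
inv_norm = np.linalg.norm(inv, 2)
print(series_error)               # essentially zero
print(inv_norm, 1 / (1 - r))      # the first is bounded by the second
```

The series converges geometrically (the k-th term has norm at most r^k), which is the same estimate that drives the proof of 6.10.15.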
The next theorem is a very useful result in many areas. It will be used in this
section to give a short proof of the implicit function theorem but it is also useful
in studying diﬀerential equations and integral equations. It is sometimes called the
uniform contraction principle.
Theorem 6.10.2 Let X, Y be finite dimensional normed vector spaces. Also let E be a closed subset of X and F a closed subset of Y. Suppose for each (x, y) ∈ E × F, T(x, y) ∈ E and T satisfies

||T(x, y) − T(x′, y)|| ≤ r||x − x′||  (6.10.17)

where 0 < r < 1, and also

||T(x, y) − T(x, y′)|| ≤ M||y − y′||.  (6.10.18)

Then for each y ∈ F there exists a unique “fixed point” for T(·, y), x ∈ E, satisfying

T(x, y) = x,  (6.10.19)

and also, if x(y) is this fixed point,

||x(y) − x(y′)|| ≤ (M/(1 − r))||y − y′||.  (6.10.20)
Proof: First consider the claim there exists a fixed point for the mapping T(·, y). For a fixed y, let g(x) ≡ T(x, y). Now pick any x_0 ∈ E and consider the sequence

x_1 = g(x_0),  x_{k+1} = g(x_k).

Then by 6.10.17,

||x_{k+1} − x_k|| = ||g(x_k) − g(x_{k−1})|| ≤ r||x_k − x_{k−1}||
                 ≤ r²||x_{k−1} − x_{k−2}|| ≤ ⋯ ≤ r^k ||g(x_0) − x_0||.
Now by the triangle inequality,

||x_{k+p} − x_k|| ≤ Σ_{i=1}^p ||x_{k+i} − x_{k+i−1}||
                ≤ Σ_{i=1}^p r^{k+i−1} ||g(x_0) − x_0|| ≤ r^k ||g(x_0) − x_0|| / (1 − r).

Since 0 < r < 1, this shows that {x_k}_{k=1}^∞ is a Cauchy sequence. Therefore, by completeness of E it converges to a point x ∈ E. To see x is a fixed point, use the continuity of g to obtain

x ≡ lim_{k→∞} x_k = lim_{k→∞} x_{k+1} = lim_{k→∞} g(x_k) = g(x).
This proves 6.10.19. To verify 6.10.20,

||x(y) − x(y′)|| = ||T(x(y), y) − T(x(y′), y′)||
                ≤ ||T(x(y), y) − T(x(y), y′)|| + ||T(x(y), y′) − T(x(y′), y′)||
                ≤ M||y − y′|| + r||x(y) − x(y′)||.

Thus

(1 − r)||x(y) − x(y′)|| ≤ M||y − y′||.
This also shows the ﬁxed point for a given y is unique.
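The iteration in the proof of Theorem 6.10.2 is easy to run. The following is a minimal sketch: the map T(x, y) = cos(x)/2 + y/4 is a hypothetical example, a contraction in x with r = 1/2 (since |∂T/∂x| ≤ 1/2) and Lipschitz in y with M = 1/4, so the theorem predicts |x(y) − x(y′)| ≤ (M/(1−r))|y − y′| = |y − y′|/2.

```python
import math

def T(x, y):
    # hypothetical uniform contraction in x, Lipschitz in y
    return 0.5 * math.cos(x) + y / 4.0

def fixed_point(y, x0=0.0, n=100):
    x = x0
    for _ in range(n):      # x_{k+1} = T(x_k, y); converges since r < 1
        x = T(x, y)
    return x

x1, x2 = fixed_point(0.0), fixed_point(0.1)
residual = abs(T(x1, 0.0) - x1)
print(residual)             # essentially zero: x1 is a fixed point
print(abs(x1 - x2))         # within the predicted bound |0.0 - 0.1|/2 = 0.05
```

The geometric error bound r^k ||g(x_0) − x_0||/(1 − r) from the proof also tells how many iterations are needed for a given accuracy.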
The implicit function theorem deals with the question of solving f (x, y) = 0
for x in terms of y and how smooth the solution is. It is one of the most important
theorems in mathematics. The proof I will give holds with no change in the context
of inﬁnite dimensional complete normed vector spaces when suitable modiﬁcations
are made on what is meant by L(X, Y ) . There are also even more general versions
of this theorem than to normed vector spaces.
Recall that for X, Y normed vector spaces, the norm on X × Y is of the form

||(x, y)|| = max(||x||, ||y||).
Theorem 6.10.3 (implicit function theorem) Let X, Y, Z be finite dimensional normed vector spaces and suppose U is an open set in X × Y. Let f : U → Z be in C¹(U) and suppose

f(x_0, y_0) = 0,  D_1 f(x_0, y_0)⁻¹ ∈ L(Z, X).  (6.10.21)

Then there exist positive constants δ, η such that for every y ∈ B(y_0, η) there exists a unique x(y) ∈ B(x_0, δ) such that

f(x(y), y) = 0.  (6.10.22)

Furthermore, the mapping y → x(y) is in C¹(B(y_0, η)).
Proof: Let T(x, y) ≡ x − D_1 f(x_0, y_0)⁻¹ f(x, y). Therefore,

D_1 T(x, y) = I − D_1 f(x_0, y_0)⁻¹ D_1 f(x, y).  (6.10.23)

By continuity of the derivative, which implies continuity of D_1 T, it follows there exists δ > 0 such that if ||(x − x_0, y − y_0)|| < δ, then

||D_1 T(x, y)|| < 1/2.  (6.10.24)

Also, it can be assumed δ is small enough that

||D_1 f(x_0, y_0)⁻¹|| ||D_2 f(x, y)|| < M  (6.10.25)

where M > ||D_1 f(x_0, y_0)⁻¹|| ||D_2 f(x_0, y_0)||. By Theorem 6.4.2, whenever x, x′ ∈ B(x_0, δ) and y ∈ B(y_0, δ),

||T(x, y) − T(x′, y)|| ≤ (1/2)||x − x′||.  (6.10.26)

Solving 6.10.23 for D_1 f(x, y),

D_1 f(x, y) = D_1 f(x_0, y_0)(I − D_1 T(x, y)).
By Lemma 6.10.1 and the assumption that D_1 f(x_0, y_0)⁻¹ exists, it follows that D_1 f(x, y)⁻¹ exists and equals

(I − D_1 T(x, y))⁻¹ D_1 f(x_0, y_0)⁻¹.

By the estimate of Lemma 6.10.1 and 6.10.24,

||D_1 f(x, y)⁻¹|| ≤ 2||D_1 f(x_0, y_0)⁻¹||.  (6.10.27)
Next more restrictions are placed on y to make it even closer to y_0. Let

0 < η < min(δ, δ/(3M)).

Then suppose x ∈ B(x_0, δ) and y ∈ B(y_0, η). Consider

x − D_1 f(x_0, y_0)⁻¹ f(x, y) − x_0 = T(x, y) − x_0 ≡ g(x, y).

Then

D_1 g(x, y) = I − D_1 f(x_0, y_0)⁻¹ D_1 f(x, y) = D_1 T(x, y),

and

D_2 g(x, y) = −D_1 f(x_0, y_0)⁻¹ D_2 f(x, y).

Also note that T(x, y) = x is the same as saying f(x, y) = 0, and also g(x_0, y_0) = 0. Thus by 6.10.25 and Theorem 6.4.2, it follows that for such (x, y) ∈ B(x_0, δ) × B(y_0, η),

||T(x, y) − x_0|| = ||g(x, y)|| = ||g(x, y) − g(x_0, y_0)||
  ≤ ||g(x, y) − g(x, y_0)|| + ||g(x, y_0) − g(x_0, y_0)||
  ≤ M||y − y_0|| + (1/2)||x − x_0|| < δ/3 + δ/2 = 5δ/6 < δ.  (6.10.28)
Also for such (x, y_i), i = 1, 2, Theorem 6.4.2 and 6.10.25 imply

||T(x, y_1) − T(x, y_2)|| = ||D_1 f(x_0, y_0)⁻¹(f(x, y_2) − f(x, y_1))|| ≤ M||y_2 − y_1||.  (6.10.29)
From now on, assume ||x − x_0|| < δ and ||y − y_0|| < η so that 6.10.29, 6.10.27, 6.10.28, 6.10.26, and 6.10.25 all hold. By 6.10.29, 6.10.26, 6.10.28, and the uniform contraction principle, Theorem 6.10.2 applied to E ≡ B(x_0, 5δ/6) and F ≡ B(y_0, η) implies that for each y ∈ B(y_0, η), there exists a unique x(y) ∈ B(x_0, δ) (actually in B(x_0, 5δ/6)) such that T(x(y), y) = x(y), which is equivalent to

f(x(y), y) = 0.

Furthermore,

||x(y) − x(y′)|| ≤ 2M||y − y′||.  (6.10.30)
This proves the implicit function theorem except for the verification that y → x(y) is C¹. This is shown next. Letting v be sufficiently small, Theorem 6.8.5 and Theorem 6.4.2 imply

0 = f(x(y + v), y + v) − f(x(y), y)
  = D_1 f(x(y), y)(x(y + v) − x(y)) + D_2 f(x(y), y) v + o((x(y + v) − x(y), v)).

The last term in the above is o(v) because of 6.10.30. Therefore, using 6.10.27, solve the above equation for x(y + v) − x(y) and obtain

x(y + v) − x(y) = −D_1 f(x(y), y)⁻¹ D_2 f(x(y), y) v + o(v),

which shows that y → x(y) is differentiable on B(y_0, η) and

Dx(y) = −D_1 f(x(y), y)⁻¹ D_2 f(x(y), y).  (6.10.31)

Now it follows from the continuity of D_2 f, D_1 f, the inverse map, 6.10.30, and this formula for Dx(y) that x(·) is C¹(B(y_0, η)).
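Formula 6.10.31 can be checked on a concrete example. The following is a numerical sketch only: the function f(x, y) = x² + y² − 1 and the base point are hypothetical choices. Near (x_0, y_0) = (1/√2, 1/√2), D_1 f = 2x is invertible, the implicitly defined function is x(y) = √(1 − y²), and 6.10.31 predicts Dx(y) = −(D_1 f)⁻¹ D_2 f = −y/x.

```python
import math

def f(x, y):
    # hypothetical constraint: the unit circle
    return x * x + y * y - 1.0

def x_of_y(y, x=math.sqrt(0.5)):
    # solve f(x, y) = 0 for x by Newton's method with y held fixed
    for _ in range(50):
        x -= f(x, y) / (2.0 * x)   # D_1 f(x, y) = 2x
    return x

y0 = math.sqrt(0.5)
h = 1e-5
dx_numeric = (x_of_y(y0 + h) - x_of_y(y0 - h)) / (2 * h)
dx_formula = -y0 / x_of_y(y0)      # -(D_1 f)^{-1} D_2 f at (x(y0), y0)
print(dx_numeric, dx_formula)      # both approximately -1
```

The agreement between the difference quotient and the formula illustrates how 6.10.31 packages the chain rule applied to f(x(y), y) = 0.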
The next theorem is a very important special case of the implicit function theorem known as the inverse function theorem. Actually one can also obtain the implicit function theorem from the inverse function theorem. It is done this way in [33] and in [2].
Theorem 6.10.4 (inverse function theorem) Let x_0 ∈ U, an open set in X, and let f : U → Y where X, Y are finite dimensional normed vector spaces. Suppose

f is C¹(U), and Df(x_0)⁻¹ ∈ L(Y, X).  (6.10.32)

Then there exist open sets W and V such that

x_0 ∈ W ⊆ U,  (6.10.33)

f : W → V is one to one and onto,  (6.10.34)

f⁻¹ is C¹.  (6.10.35)

Proof: Apply the implicit function theorem to the function

F(x, y) ≡ f(x) − y

where y_0 ≡ f(x_0). Thus the function y → x(y) defined in that theorem is f⁻¹. Now let

W ≡ B(x_0, δ) ∩ f⁻¹(B(y_0, η))

and

V ≡ B(y_0, η).
6.10.1 More Derivatives
In the implicit function theorem, suppose f is C^k. Will the implicitly defined function also be C^k? It was shown above that this is the case if k = 1. In fact it holds for any positive integer k.
First of all, consider D_2 f(x(y), y) ∈ L(Y, Z). Let {w_1, ⋯, w_m} be a basis for Y and let {z_1, ⋯, z_n} be a basis for Z. Then D_2 f(x(y), y) has a matrix with respect to these bases. Thus, conserving on notation, denote this matrix by (D_2 f(x(y), y)_{ij}). Thus

D_2 f(x(y), y) = Σ_{ij} D_2 f(x(y), y)_{ij} z_i w_j.

The scalar valued entries of the matrix of D_2 f(x(y), y) have the same differentiability as the function y → D_2 f(x(y), y). This is because the linear projection map π_{ij} mapping L(Y, Z) to F given by π_{ij} L ≡ L_{ij}, the ij-th entry of the matrix of L with respect to the given bases, is continuous thanks to Theorem 4.8.3. Similar considerations apply to D_1 f(x(y), y) and the entries of its matrix, D_1 f(x(y), y)_{ij}, taken with respect to suitable bases. From the formula for the inverse of a matrix, Theorem 5.2.14, the ij-th entries of the matrix of D_1 f(x(y), y)⁻¹, D_1 f(x(y), y)⁻¹_{ij}, also have the same differentiability as y → D_1 f(x(y), y).
Now consider the formula for the derivative of the implicitly defined function in 6.10.31,

Dx(y) = −D_1 f(x(y), y)⁻¹ D_2 f(x(y), y).  (6.10.36)
The above derivative is in L(Y, X). Let {w_1, ⋯, w_m} be a basis for Y and let {v_1, ⋯, v_n} be a basis for X. Letting x_i be the i-th component of x with respect to the basis for X, it follows from Theorem 6.7.1 that y → x(y) will be C^k if all such Gateaux derivatives

D_{w_{j_1} w_{j_2} ⋯ w_{j_r}} x_i(y)

exist and are continuous for r ≤ k and for any i. Consider what is required for this to happen. By 6.10.36,

D_{w_j} x_i(y) = Σ_k (−D_1 f(x(y), y)⁻¹)_{ik} (D_2 f(x(y), y))_{kj} ≡ G_1(x(y), y)  (6.10.37)

where (x, y) → G_1(x, y) is C^{k−1} because it is assumed f is C^k and one derivative has been taken to write the above. If k ≥ 2, then another Gateaux derivative can be taken.
D_{w_j w_k} x_i(y) ≡ lim_{t→0} (G_1(x(y + t w_k), y + t w_k) − G_1(x(y), y))/t
                  = D_1 G_1(x(y), y) Dx(y) w_k + D_2 G_1(x(y), y)
                  ≡ G_2(x(y), y, Dx(y)).

Since a similar result holds for all i and any choice of w_j, w_k, this shows x is at least C². If k ≥ 3, then another Gateaux derivative can be taken because then (x, y, z) → G_2(x, y, z) is C¹ and it has been established that Dx is C¹. Continuing this way shows D_{w_{j_1} w_{j_2} ⋯ w_{j_r}} x_i(y) exists and is continuous for r ≤ k. This proves the following corollary to the implicit and inverse function theorems.
following corollary to the implicit and inverse function theorems.
Corollary 6.10.5 In the implicit and inverse function theorems, you can replace C¹ with C^k in the statements of the theorems for any k ∈ ℕ.
6.10.2 The Case Of ℝⁿ

In many applications of the implicit function theorem,

f : U ⊆ ℝⁿ × ℝᵐ → ℝⁿ

and f(x_0, y_0) = 0 while f is C¹. How can you recognize the condition of the implicit function theorem which says D_1 f(x_0, y_0)⁻¹ exists? This is really not hard. You recall the matrix of the transformation D_1 f(x_0, y_0) with respect to the usual basis vectors is

( f_{1,x_1}(x_0, y_0)  ⋯  f_{1,x_n}(x_0, y_0) )
(        ⋮                        ⋮          )
( f_{n,x_1}(x_0, y_0)  ⋯  f_{n,x_n}(x_0, y_0) )

and so D_1 f(x_0, y_0)⁻¹ exists exactly when the determinant of the above matrix is nonzero. This is the condition to check. In the general case, you just need to verify that D_1 f(x_0, y_0) is one to one, and this can also be accomplished by looking at the matrix of the transformation with respect to some bases on X and Z.
6.11 Taylor’s Formula
First recall the Taylor formula with the Lagrange form of the remainder. It will
only be needed on [0, 1] so that is what I will show.
Theorem 6.11.1 Let h : [0, 1] → ℝ have m + 1 derivatives. Then there exists t ∈ (0, 1) such that

h(1) = h(0) + Σ_{k=1}^m h^{(k)}(0)/k! + h^{(m+1)}(t)/(m + 1)!.
Proof: Let K be a number chosen such that

h(1) − (h(0) + Σ_{k=1}^m h^{(k)}(0)/k! + K) = 0.

Now the idea is to find K. To do this, let

F(t) = h(1) − (h(t) + Σ_{k=1}^m (h^{(k)}(t)/k!)(1 − t)^k + K(1 − t)^{m+1}).
Then F(1) = F(0) = 0. Therefore, by Rolle’s theorem there exists t between 0 and 1 such that F′(t) = 0. Thus,

0 = −F′(t) = h′(t) + Σ_{k=1}^m (h^{(k+1)}(t)/k!)(1 − t)^k
    − Σ_{k=1}^m (h^{(k)}(t)/k!) k(1 − t)^{k−1} − K(m + 1)(1 − t)^m.

And so, reindexing the second sum,

0 = h′(t) + Σ_{k=1}^m (h^{(k+1)}(t)/k!)(1 − t)^k − Σ_{k=0}^{m−1} (h^{(k+1)}(t)/k!)(1 − t)^k − K(m + 1)(1 − t)^m
  = h′(t) + (h^{(m+1)}(t)/m!)(1 − t)^m − h′(t) − K(m + 1)(1 − t)^m,

and so

K = h^{(m+1)}(t)/(m + 1)!.
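Theorem 6.11.1 can be sanity-checked numerically. This sketch uses the hypothetical choice h(t) = e^t (not from the text): since h^{(k)}(t) = e^t, the theorem says e = Σ_{k=0}^m 1/k! + e^t/(m+1)! for some t ∈ (0, 1), so the remainder must lie strictly between 1/(m+1)! and e/(m+1)!.

```python
import math

m = 6
# Taylor polynomial of exp at 0, evaluated at 1
partial_sum = sum(1.0 / math.factorial(k) for k in range(m + 1))
remainder = math.e - partial_sum

# Lagrange form bounds: e^t/(m+1)! for some t in (0, 1)
lo = 1.0 / math.factorial(m + 1)
hi = math.e / math.factorial(m + 1)
print(lo < remainder < hi)  # the remainder lies in the predicted interval
```

Solving e^t/(m+1)! = remainder for t would recover the intermediate point whose existence the theorem asserts.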
Now let f : U → ℝ where U ⊆ X, a normed vector space, and suppose f ∈ C^m(U). Let x ∈ U and let r > 0 be such that

B(x, r) ⊆ U.

Then for ||v|| < r consider

f(x + tv) − f(x) ≡ h(t)

for t ∈ [0, 1]. Then by the chain rule,

h′(t) = Df(x + tv)(v),  h′′(t) = D²f(x + tv)(v)(v),

and continuing in this way,

h^{(k)}(t) = D^{(k)} f(x + tv)(v)(v)⋯(v) ≡ D^{(k)} f(x + tv) v^k.

It follows from Taylor’s formula for a function of one variable given above that

f(x + v) = f(x) + Σ_{k=1}^m D^{(k)} f(x) v^k / k! + D^{(m+1)} f(x + tv) v^{m+1} / (m + 1)!.  (6.11.38)
This proves the following theorem.
Theorem 6.11.2 Let f : U → ℝ and let f ∈ C^{m+1}(U). Then if

B(x, r) ⊆ U

and ||v|| < r, there exists t ∈ (0, 1) such that 6.11.38 holds.
6.11.1 Second Derivative Test
Now consider the case where U ⊆ ℝⁿ and f : U → ℝ is C²(U). Then from Taylor’s theorem, if v is small enough, there exists t ∈ (0, 1) such that

f(x + v) = f(x) + Df(x)v + D²f(x + tv) v² / 2.  (6.11.39)

Consider

D²f(x + tv)(e_i)(e_j) ≡ D(D(f(x + tv)) e_i) e_j = D(∂f(x + tv)/∂x_i) e_j = ∂²f(x + tv)/∂x_j ∂x_i,
where the e_i are the usual basis vectors. Letting

v = Σ_{i=1}^n v_i e_i,

the second derivative term in 6.11.39 reduces to

(1/2) Σ_{i,j} D²f(x + tv)(e_i)(e_j) v_i v_j = (1/2) Σ_{i,j} H_{ij}(x + tv) v_i v_j

where

H_{ij}(x + tv) = D²f(x + tv)(e_i)(e_j) = ∂²f(x + tv)/∂x_j ∂x_i.
Definition 6.11.3 The matrix whose ij-th entry is ∂²f(x)/∂x_j ∂x_i is called the Hessian matrix, denoted as H(x).
From Theorem 6.9.1, this is a symmetric real matrix, thus self adjoint. By the continuity of the second partial derivative,

f(x + v) = f(x) + Df(x)v + (1/2) vᵀ H(x) v + (1/2)(vᵀ (H(x + tv) − H(x)) v),  (6.11.40)

where the last two terms involve ordinary matrix multiplication and

vᵀ = (v_1 ⋯ v_n)

for v_i the components of v relative to the standard basis.
Definition 6.11.4 Let f : D → ℝ where D is a subset of some normed vector space. Then f has a local minimum at x ∈ D if there exists δ > 0 such that for all y ∈ B(x, δ),

f(y) ≥ f(x).

f has a local maximum at x ∈ D if there exists δ > 0 such that for all y ∈ B(x, δ),

f(y) ≤ f(x).
Theorem 6.11.5 Let f : U → ℝ where U is an open subset of ℝⁿ and f is C², and suppose Df(x) = 0. Then if H(x) has all positive eigenvalues, x is a local minimum. If the Hessian matrix H(x) has all negative eigenvalues, then x is a local maximum. If H(x) has a positive eigenvalue, then there exists a direction in which f has a local minimum at x, while if H(x) has a negative eigenvalue, there exists a direction in which f has a local maximum at x.
Proof: Since Df(x) = 0, formula 6.11.40 holds, and by continuity of the second derivative, H(x) is a symmetric matrix. Thus H(x) has all real eigenvalues. Suppose first that H(x) has all positive eigenvalues and that all are larger than δ² > 0. Then by Theorem 5.2.30,

uᵀ H(x) u ≥ δ² |u|².

From 6.11.40 and the continuity of H, if v is small enough,

f(x + v) ≥ f(x) + (1/2)δ²|v|² − (1/4)δ²|v|² = f(x) + (δ²/4)|v|².
This shows the first claim of the theorem. The second claim follows from similar reasoning. Suppose H(x) has a positive eigenvalue λ². Then let v be an eigenvector for this eigenvalue. Then from 6.11.40,

f(x + tv) = f(x) + (1/2) t² vᵀ H(x) v + (1/2) t² (vᵀ (H(x + tv) − H(x)) v),

which implies

f(x + tv) = f(x) + (1/2) t² λ² |v|² + (1/2) t² (vᵀ (H(x + tv) − H(x)) v)
          ≥ f(x) + (1/4) t² λ² |v|²

whenever t is small enough. Thus in the direction v the function has a local minimum at x. The assertion about the local maximum in some direction follows similarly.
This theorem is an analogue of the second derivative test for higher dimensions. As in one dimension, when there is a zero eigenvalue, it may be impossible to determine from the Hessian matrix what the local qualitative behavior of the function is. For example, consider

f_1(x, y) = x⁴ + y²,  f_2(x, y) = −x⁴ + y².

Then Df_i(0, 0) = 0 and for both functions, the Hessian matrix evaluated at (0, 0) equals

( 0 0 )
( 0 2 ),

but the behavior of the two functions is very different near the origin. The second has a saddle point while the first has a minimum there.
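The inconclusive case above can be exhibited numerically. This is a sketch: the Hessians of f_1 = x⁴ + y² and f_2 = −x⁴ + y² at the origin are approximated by second-order central differences (an arbitrary choice of approximation scheme) and both come out as diag(0, 2), even though f_1 has a minimum at the origin and f_2 a saddle.

```python
import numpy as np

def hessian(f, p, h=1e-4):
    # approximate the Hessian of f at p by central differences
    n = len(p)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.eye(n)[i] * h
            ej = np.eye(n)[j] * h
            H[i, j] = (f(p + ei + ej) - f(p + ei - ej)
                       - f(p - ei + ej) + f(p - ei - ej)) / (4 * h * h)
    return H

f1 = lambda p: p[0]**4 + p[1]**2
f2 = lambda p: -p[0]**4 + p[1]**2
H1 = hessian(f1, np.zeros(2))
H2 = hessian(f2, np.zeros(2))
print(np.round(H1, 6))  # diag(0, 2)
print(np.round(H2, 6))  # diag(0, 2) as well
```

Since both Hessians share the zero eigenvalue, Theorem 6.11.5 gives no verdict, and the local behavior must be read off from higher order terms.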
6.12 The Method Of Lagrange Multipliers
As an application of the implicit function theorem, consider the method of Lagrange multipliers from calculus. Recall the problem is to maximize or minimize a function subject to equality constraints. Let f : U → ℝ be a C¹ function where U ⊆ ℝⁿ and let

g_i(x) = 0, i = 1, ⋯, m  (6.12.41)

be a collection of equality constraints with m < n. Now consider the system of nonlinear equations

f(x) = a,
g_i(x) = 0, i = 1, ⋯, m.

x_0 is a local maximum if f(x_0) ≥ f(x) for all x near x_0 which also satisfy the constraints 6.12.41. A local minimum is defined similarly. Let F : U × ℝ → ℝ^{m+1} be defined by

F(x, a) ≡ (f(x) − a, g_1(x), ⋯, g_m(x))ᵀ.  (6.12.42)
Now consider the $(m+1) \times n$ Jacobian matrix, the matrix of the linear transformation $D_1F(x,a)$ with respect to the usual basis for $\mathbb{R}^n$ and $\mathbb{R}^{m+1}$:
$$\begin{pmatrix} f_{x_1}(x_0) & \cdots & f_{x_n}(x_0) \\ g_{1x_1}(x_0) & \cdots & g_{1x_n}(x_0) \\ \vdots & & \vdots \\ g_{mx_1}(x_0) & \cdots & g_{mx_n}(x_0) \end{pmatrix}.$$
If this matrix has rank $m+1$ then some $(m+1) \times (m+1)$ submatrix has nonzero determinant. It follows from the implicit function theorem that there exist $m+1$ variables $x_{i_1}, \dots, x_{i_{m+1}}$ such that the system
$$F(x,a) = 0 \tag{6.12.43}$$
specifies these $m+1$ variables as a function of the remaining $n-(m+1)$ variables and $a$ in an open set of $\mathbb{R}^{n-m}$. Thus there is a solution $(x,a)$ to 6.12.43 for some $x$
close to $x_0$ whenever $a$ is in some open interval. Therefore, $x_0$ cannot be either a local minimum or a local maximum. It follows that if $x_0$ is either a local maximum or a local minimum, then the above matrix must have rank less than $m+1$, which requires the rows to be linearly dependent. Thus there exist $m$ scalars $\lambda_1, \dots, \lambda_m$ and a scalar $\mu$, not all zero, such that
$$\mu\begin{pmatrix} f_{x_1}(x_0) \\ \vdots \\ f_{x_n}(x_0) \end{pmatrix} = \lambda_1\begin{pmatrix} g_{1x_1}(x_0) \\ \vdots \\ g_{1x_n}(x_0) \end{pmatrix} + \cdots + \lambda_m\begin{pmatrix} g_{mx_1}(x_0) \\ \vdots \\ g_{mx_n}(x_0) \end{pmatrix}. \tag{6.12.44}$$
If the column vectors
$$\begin{pmatrix} g_{1x_1}(x_0) \\ \vdots \\ g_{1x_n}(x_0) \end{pmatrix}, \dots, \begin{pmatrix} g_{mx_1}(x_0) \\ \vdots \\ g_{mx_n}(x_0) \end{pmatrix} \tag{6.12.45}$$
are linearly independent, then $\mu \ne 0$ and dividing by $\mu$ yields an expression of the form
$$\begin{pmatrix} f_{x_1}(x_0) \\ \vdots \\ f_{x_n}(x_0) \end{pmatrix} = \lambda_1\begin{pmatrix} g_{1x_1}(x_0) \\ \vdots \\ g_{1x_n}(x_0) \end{pmatrix} + \cdots + \lambda_m\begin{pmatrix} g_{mx_1}(x_0) \\ \vdots \\ g_{mx_n}(x_0) \end{pmatrix} \tag{6.12.46}$$
at every point $x_0$ which is either a local maximum or a local minimum. This proves the following theorem.
Theorem 6.12.1 Let $U$ be an open subset of $\mathbb{R}^n$ and let $f : U \to \mathbb{R}$ be a $C^1$ function. Then if $x_0 \in U$ is either a local maximum or local minimum of $f$ subject to the constraints 6.12.41, then 6.12.44 must hold for some scalars $\mu, \lambda_1, \dots, \lambda_m$, not all equal to zero. If the vectors in 6.12.45 are linearly independent, it follows that an equation of the form 6.12.46 holds.
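Theorem 6.12.1 can be tried on a small hypothetical example, not taken from the text: maximize $f(x,y) = x + y$ subject to $g(x,y) = x^2 + y^2 - 1 = 0$. Equation 6.12.46 reads $(1,1) = \lambda(2x, 2y)$, so $x = y$, and the constraint then forces $x = y = 1/\sqrt{2}$ at the maximum. A brute-force scan of the constraint circle agrees:

```python
import math

def f(x, y):
    return x + y

# Scan the circle x^2 + y^2 = 1 and keep the best point found.
best = max((f(math.cos(t), math.sin(t)), math.cos(t), math.sin(t))
           for t in [2 * math.pi * k / 100000 for k in range(100000)])
fmax, x0, y0 = best

print(round(x0, 4), round(y0, 4))   # both near 1/sqrt(2), about 0.7071
print(round(fmax, 4))               # near sqrt(2), about 1.4142
print(round(1 / (2 * x0), 4))       # the multiplier lambda = f_x / g_x
```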
6.13 Exercises
1. Suppose $L \in \mathcal{L}(X,Y)$ and suppose $L$ is one to one. Show there exists $r > 0$ such that for all $x \in X$,
$$\|Lx\| \ge r\|x\|.$$
Hint: You might argue that $|||x||| \equiv \|Lx\|$ is a norm.
2. Show every polynomial, $\sum_{|\alpha| \le k} d_\alpha x^\alpha$, is $C^k$ for every $k$.
3. If $f : U \to \mathbb{R}$ where $U$ is an open set in $X$ and $f$ is $C^2$, show the mixed Gateaux derivatives, $D_{v_1 v_2} f(x)$ and $D_{v_2 v_1} f(x)$, are equal.
4. Give an example of a function which is differentiable everywhere but at some point fails to have continuous partial derivatives. Thus this function will be an example of a differentiable function which is not $C^1$.
5. The existence of partial derivatives does not imply continuity, as was shown in an example. However, much more can be said than this. Consider
$$f(x,y) = \begin{cases} \left(\dfrac{x^2 - y^4}{x^2 + y^4}\right)^2 & \text{if } (x,y) \ne (0,0), \\ 1 & \text{if } (x,y) = (0,0). \end{cases}$$
Show each Gateaux derivative $D_v f(0)$ exists and equals 0 for every $v$. Also show each Gateaux derivative exists at every other point in $\mathbb{R}^2$. Now consider the curve $x^2 = y^4$ and the curve $y = 0$ to verify the function fails to be continuous at $(0,0)$. This is an example of an everywhere Gateaux differentiable function which is not differentiable and not continuous.
6. Let $f$ be a real valued function defined on $\mathbb{R}^2$ by
$$f(x,y) \equiv \begin{cases} \dfrac{x^3 - y^3}{x^2 + y^2} & \text{if } (x,y) \ne (0,0), \\ 0 & \text{if } (x,y) = (0,0). \end{cases}$$
Determine whether $f$ is continuous at $(0,0)$. Find $f_x(0,0)$ and $f_y(0,0)$. Are the partial derivatives of $f$ continuous at $(0,0)$? Find
$$D_{(u,v)}f((0,0)) \equiv \lim_{t \to 0} \frac{f(t(u,v))}{t}.$$
Is the mapping $(u,v) \to D_{(u,v)}f((0,0))$ linear? Is $f$ differentiable at $(0,0)$?
7. Let $f : V \to \mathbb{R}$ where $V$ is a finite dimensional normed vector space. Suppose $f$ is convex, which means
$$f(tx + (1-t)y) \le tf(x) + (1-t)f(y)$$
whenever $t \in [0,1]$. Suppose also that $f$ is differentiable. Show then that for every $x, y \in V$,
$$(Df(x) - Df(y))(x - y) \ge 0.$$
8. Suppose $f : U \subseteq V \to \mathbb{F}$ where $U$ is an open subset of $V$, a finite dimensional inner product space with the inner product denoted by $(\cdot, \cdot)$. Suppose $f$ is differentiable. Show there exists a unique vector $v(x) \in V$ such that
$$(u, v(x)) = Df(x)u.$$
This special vector is called the gradient and is usually denoted by $\nabla f(x)$. Hint: You might review the Riesz representation theorem presented earlier.
9. Suppose $f : U \to Y$ where $U$ is an open subset of $X$, a finite dimensional normed vector space. Suppose that for all $v \in X$, $D_v f(x)$ exists. Show that whenever $a \in \mathbb{F}$, $D_{av} f(x) = aD_v f(x)$. Explain why if $x \to D_v f(x)$ is continuous then $v \to D_v f(x)$ is linear. Show that if $f$ is differentiable at $x$, then $D_v f(x) = Df(x)v$.
10. Suppose $B$ is an open ball in $X$ and $f : B \to Y$ is differentiable. Suppose also there exists $L \in \mathcal{L}(X,Y)$ such that
$$\|Df(x) - L\| < k$$
for all $x \in B$. Show that if $x_1, x_2 \in B$,
$$|f(x_1) - f(x_2) - L(x_1 - x_2)| \le k|x_1 - x_2|.$$
Hint: Consider $Tx = f(x) - Lx$ and argue $\|DT(x)\| < k$. Then consider Theorem 6.4.2.
11. Let $U$ be an open subset of $X$, $f : U \to Y$ where $X, Y$ are finite dimensional normed vector spaces, and suppose $f \in C^1(U)$ and $Df(x_0)$ is one to one. Then show $f$ is one to one near $x_0$. Hint: Show using the assumption that $f$ is $C^1$ that there exists $\delta > 0$ such that if
$$x_1, x_2 \in B(x_0, \delta),$$
then
$$|f(x_1) - f(x_2) - Df(x_0)(x_1 - x_2)| \le \frac{r}{2}|x_1 - x_2|; \tag{6.13.47}$$
then use Problem 1.
12. Suppose $M \in \mathcal{L}(X,Y)$ and suppose $M$ is onto. Show there exists $L \in \mathcal{L}(Y,X)$ such that
$$LMx = Px$$
where $P \in \mathcal{L}(X,X)$ and $P^2 = P$. Also show $L$ is one to one and onto. Hint: Let $\{y_1, \dots, y_m\}$ be a basis of $Y$ and let $Mx_i = y_i$. Then define
$$Ly = \sum_{i=1}^m \alpha_i x_i \quad\text{where}\quad y = \sum_{i=1}^m \alpha_i y_i.$$
Show $\{x_1, \dots, x_m\}$ is a linearly independent set and show you can obtain $\{x_1, \dots, x_m, \dots, x_n\}$, a basis for $X$ in which $Mx_j = 0$ for $j > m$. Then let
$$Px \equiv \sum_{i=1}^m \alpha_i x_i \quad\text{where}\quad x = \sum_{i=1}^n \alpha_i x_i.$$
13. This problem depends on the result of Problem 12. Let $f : U \subseteq X \to Y$, $f$ is $C^1$, and $Df(x)$ is onto for each $x \in U$. Then show $f$ maps open subsets of $U$ onto open sets in $Y$. Hint: Let $P = LDf(x)$ as in Problem 12. Argue $L$ maps open sets from $Y$ to open sets of the vector space $X_1 \equiv PX$ and $L^{-1}$ maps open sets from $X_1$ to open sets of $Y$. Then $Lf(x+v) = Lf(x) + LDf(x)v + o(v)$. Now for $z \in X_1$, let $h(z) = Lf(x+z) - Lf(x)$. Then $h$ is $C^1$ on some small open subset of $X_1$ containing 0 and $Dh(0) = LDf(x)$, which is seen to be one to one and onto and in $\mathcal{L}(X_1, X_1)$. Therefore, if $r$ is small enough, $h(B(0,r))$ equals an open set $V$ in $X_1$. This is by the inverse function theorem. Hence $L(f(x + B(0,r)) - f(x)) = V$ and so $f(x + B(0,r)) - f(x) = L^{-1}(V)$, an open set in $Y$.
14. Suppose $U \subseteq \mathbb{R}^2$ is an open set and $f : U \to \mathbb{R}^3$ is $C^1$. Suppose $Df(s_0, t_0)$ has rank two and
$$f(s_0, t_0) = \begin{pmatrix} x_0 \\ y_0 \\ z_0 \end{pmatrix}.$$
Show that for $(s,t)$ near $(s_0, t_0)$, the points $f(s,t)$ may be realized in one of the following forms:
$$\{(x, y, \varphi(x,y)) : (x,y) \text{ near } (x_0, y_0)\},$$
$$\{(\varphi(y,z), y, z) : (y,z) \text{ near } (y_0, z_0)\},$$
or
$$\{(x, \varphi(x,z), z) : (x,z) \text{ near } (x_0, z_0)\}.$$
This shows that parametrically defined surfaces can be obtained locally in a particularly simple form.
15. Let $f : U \to Y$, $Df(x)$ exists for all $x \in U$, $B(x_0, \delta) \subseteq U$, and there exists $L \in \mathcal{L}(X,Y)$ such that $L^{-1} \in \mathcal{L}(Y,X)$ and for all $x \in B(x_0, \delta)$,
$$\|Df(x) - L\| < \frac{r}{\|L^{-1}\|}, \quad r < 1.$$
Show that there exists $\varepsilon > 0$ and an open subset $V$ of $B(x_0, \delta)$ such that $f : V \to B(f(x_0), \varepsilon)$ is one to one and onto. Also $Df^{-1}(y)$ exists for each $y \in B(f(x_0), \varepsilon)$ and is given by the formula
$$Df^{-1}(y) = \left[Df\left(f^{-1}(y)\right)\right]^{-1}.$$
Hint: Let
$$T_y(x) \equiv T(x,y) \equiv x - L^{-1}(f(x) - y)$$
for $|y - f(x_0)| < \frac{(1-r)\delta}{2\|L^{-1}\|}$ and consider $\{T_y^n(x_0)\}$. This is a version of the inverse function theorem for $f$ only differentiable, not $C^1$.
16. Recall the $n^{th}$ derivative can be considered a multilinear function defined on $X^n$ with values in some normed vector space. Now define a function denoted as $w_i v_{j_1} \cdots v_{j_n}$ which maps $X^n \to Y$ in the following way:
$$w_i v_{j_1} \cdots v_{j_n}(v_{k_1}, \dots, v_{k_n}) \equiv w_i \delta_{j_1 k_1} \delta_{j_2 k_2} \cdots \delta_{j_n k_n} \tag{6.13.48}$$
and $w_i v_{j_1} \cdots v_{j_n}$ is to be linear in each variable. Thus, for
$$\left(\sum_{k_1=1}^n a_{k_1} v_{k_1}, \dots, \sum_{k_n=1}^n a_{k_n} v_{k_n}\right) \in X^n,$$
$$w_i v_{j_1} \cdots v_{j_n}\left(\sum_{k_1=1}^n a_{k_1} v_{k_1}, \dots, \sum_{k_n=1}^n a_{k_n} v_{k_n}\right) \equiv \sum_{k_1 k_2 \cdots k_n} w_i \left(a_{k_1} a_{k_2} \cdots a_{k_n}\right) \delta_{j_1 k_1} \delta_{j_2 k_2} \cdots \delta_{j_n k_n} = w_i a_{j_1} a_{j_2} \cdots a_{j_n}. \tag{6.13.49}$$
Show each $w_i v_{j_1} \cdots v_{j_n}$ is an $n$ linear $Y$ valued function. Next show the set of $n$ linear $Y$ valued functions is a vector space and these special functions, $w_i v_{j_1} \cdots v_{j_n}$ for all choices of $i$ and the $j_k$, form a basis of this vector space. Find the dimension of the vector space.
17. Minimize $\sum_{j=1}^n x_j$ subject to the constraint $\sum_{j=1}^n x_j^2 = a^2$. Your answer should be some function of $a$, which you may assume is a positive number.
18. Find the point $(x,y,z)$ on the level surface $4x^2 + y^2 - z^2 = 1$ which is closest to $(0,0,0)$.
19. A curve is formed from the intersection of the plane $2x + 3y + z = 3$ and the cylinder $x^2 + y^2 = 4$. Find the point on this curve which is closest to $(0,0,0)$.
20. A curve is formed from the intersection of the plane $2x + 3y + z = 3$ and the sphere $x^2 + y^2 + z^2 = 16$. Find the point on this curve which is closest to $(0,0,0)$.
21. Find the point on the plane $2x + 3y + z = 4$ which is closest to the point $(1, 2, 3)$.
22. Let $A = (A_{ij})$ be an $n \times n$ matrix which is symmetric. Thus $A_{ij} = A_{ji}$ and recall $(Ax)_i = A_{ij}x_j$ where as usual we sum over the repeated index. Show
$$\frac{\partial}{\partial x_i}\left(A_{ij}x_j x_i\right) = 2A_{ij}x_j.$$
Show that when you use the method of Lagrange multipliers to maximize the function $A_{ij}x_j x_i$ subject to the constraint $\sum_{j=1}^n x_j^2 = 1$, the value of $\lambda$ which corresponds to the maximum value of this function is such that $A_{ij}x_j = \lambda x_i$. Thus $Ax = \lambda x$, so $\lambda$ is an eigenvalue of the matrix $A$.
23. Let $x_1, \dots, x_5$ be 5 positive numbers. Maximize their product subject to the constraint that
$$x_1 + 2x_2 + 3x_3 + 4x_4 + 5x_5 = 300.$$
24. Let $f(x_1, \dots, x_n) = x_1^n x_2^{n-1} \cdots x_n^1$. Then $f$ achieves a maximum on the set
$$S \equiv \left\{x \in \mathbb{R}^n : \sum_{i=1}^n ix_i = 1 \text{ and each } x_i \ge 0\right\}.$$
If $x \in S$ is the point where this maximum is achieved, find $x_1/x_n$.
25. Let $(x,y)$ be a point on the ellipse $x^2/a^2 + y^2/b^2 = 1$ which is in the first quadrant. Extend the tangent line through $(x,y)$ till it intersects the $x$ and $y$ axes and let $A(x,y)$ denote the area of the triangle formed by this line and the two coordinate axes. Find the minimum value of the area of this triangle as a function of $a$ and $b$.
26. Maximize $\prod_{i=1}^n x_i^2 \ (\equiv x_1^2 x_2^2 x_3^2 \cdots x_n^2)$ subject to the constraint $\sum_{i=1}^n x_i^2 = r^2$. Show the maximum is $\left(r^2/n\right)^n$. Now show from this that
$$\left(\prod_{i=1}^n x_i^2\right)^{1/n} \le \frac{1}{n}\sum_{i=1}^n x_i^2$$
and finally conclude that if each number $x_i \ge 0$, then
$$\left(\prod_{i=1}^n x_i\right)^{1/n} \le \frac{1}{n}\sum_{i=1}^n x_i$$
and there exist values of the $x_i$ for which equality holds. This says the “geometric mean” is always smaller than the arithmetic mean.
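The chain of inequalities in this exercise can be spot-checked numerically; the random data below are my own choice, and a check is of course not a proof:

```python
import math
import random

# Spot check of the arithmetic mean / geometric mean inequality.
random.seed(0)
for _ in range(1000):
    xs = [random.uniform(0.0, 10.0) for _ in range(5)]
    geometric = math.prod(xs) ** (1.0 / len(xs))
    arithmetic = sum(xs) / len(xs)
    assert geometric <= arithmetic + 1e-12

# Equality when all the x_i coincide:
xs = [2.0] * 5
print(abs(math.prod(xs) ** (1.0 / 5) - sum(xs) / 5) < 1e-9)   # True
```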
27. Maximize $x^2 y^2$ subject to the constraint
$$\frac{x^{2p}}{p} + \frac{y^{2q}}{q} = r^2$$
where $p, q$ are real numbers larger than 1 which have the property that
$$\frac{1}{p} + \frac{1}{q} = 1.$$
Show the maximum is achieved when $x^{2p} = y^{2q}$ and equals $r^2$. Now conclude that if $x, y > 0$, then
$$xy \le \frac{x^p}{p} + \frac{y^q}{q}$$
and there are values of $x$ and $y$ where this inequality is an equation.
Part II
Integration And Measure
Measures And Measurable
Functions
The integral to be discussed next is the Lebesgue integral. This integral is more
general than the Riemann integral of beginning calculus. It is not as easy to deﬁne
as this integral but it is vastly superior in every application. In fact, the Riemann
integral has been obsolete for over 100 years. There exist convergence theorems
for this integral which are not available for the Riemann integral and unlike the
Riemann integral, the Lebesgue integral generalizes readily to abstract settings used
in probability theory. Much of the analysis done in the last 100 years applies to the
Lebesgue integral. For these reasons, and because it is very easy to generalize the
Lebesgue integral to functions of many variables I will present the Lebesgue integral
here. First it is convenient to discuss outer measures, measures, and measurable functions in a general setting.
7.1 Open Coverings And Compactness
This is a good place to put an important theorem about compact sets. The deﬁnition
of what is meant by a compact set follows.
Definition 7.1.1 Let $\mathcal{C}$ denote a collection of open sets in a normed vector space. Then $\mathcal{C}$ is said to be an open cover of a set $K$ if $K \subseteq \cup\mathcal{C}$. Let $K$ be a subset of a normed vector space. Then $K$ is compact if whenever $\mathcal{C}$ is an open cover of $K$ there exist finitely many sets of $\mathcal{C}$, $\{U_1, \dots, U_m\}$, such that
$$K \subseteq \cup_{k=1}^m U_k.$$
In words, every open cover admits a finite subcover.
It was shown earlier that in any ﬁnite dimensional normed vector space the closed
and bounded sets are those which are sequentially compact. The next theorem says
that in any normed vector space, sequentially compact and compact are the same.
¹Actually, this is true more generally than for normed vector spaces. It is also true for metric spaces, those on which there is a distance defined.
First here is a very interesting lemma about the existence of something called a
Lebesgue number, the number r in the next lemma.
Lemma 7.1.2 Let $K$ be a sequentially compact set in a normed vector space and let $\mathcal{C}$ be an open cover of $K$. Then there exists $r > 0$ such that if $x \in K$, then $B(x,r)$ is a subset of some set of $\mathcal{C}$.
Proof: Suppose no such $r$ exists. Then in particular, $1/n$ does not work for each $n \in \mathbb{N}$. Therefore, there exists $x_n \in K$ such that $B\left(x_n, \frac{1}{n}\right)$ is not a subset of any of the sets of $\mathcal{C}$. Since $K$ is sequentially compact, there exists a subsequence $\{x_{n_k}\}$ converging to a point $x$ of $K$. Then there exists $r > 0$ such that $B(x,r) \subseteq U \in \mathcal{C}$ because $\mathcal{C}$ is an open cover. Also $x_{n_k} \in B(x, r/2)$ for all $k$ large enough and also for all $k$ large enough, $1/n_k < r/2$. Therefore, there exists $x_{n_k} \in B(x, r/2)$ with $1/n_k < r/2$. But this is a contradiction because
$$B(x_{n_k}, 1/n_k) \subseteq B(x,r) \subseteq U,$$
contrary to the choice of $x_{n_k}$ which required that $B(x_{n_k}, 1/n_k)$ not be contained in any set of $\mathcal{C}$.
Theorem 7.1.3 Let K be a set in a normed vector space. Then K is compact if
and only if K is sequentially compact. In particular if K is a closed and bounded
subset of a ﬁnite dimensional normed vector space, then K is compact.
Proof: Suppose first $K$ is sequentially compact and let $\mathcal{C}$ be an open cover. Let $r$ be a Lebesgue number as described in Lemma 7.1.2. Pick $x_1 \in K$. Then $B(x_1, r) \subseteq U_1$ for some $U_1 \in \mathcal{C}$. Suppose $\{B(x_i, r)\}_{i=1}^m$ have been chosen such that
$$B(x_i, r) \subseteq U_i \in \mathcal{C}.$$
If their union contains $K$ then $\{U_i\}_{i=1}^m$ is a finite subcover of $\mathcal{C}$. If $\{B(x_i, r)\}_{i=1}^m$ does not cover $K$, then there exists $x_{m+1} \notin \cup_{i=1}^m B(x_i, r)$ and so $B(x_{m+1}, r) \subseteq U_{m+1} \in \mathcal{C}$. This process must stop after finitely many choices of $B(x_i, r)$ because if not, $\{x_k\}_{k=1}^\infty$ would have a subsequence which converges to a point of $K$, which cannot occur because whenever $i \ne j$,
$$\|x_i - x_j\| > r.$$
Therefore, eventually
$$K \subseteq \cup_{k=1}^m B(x_k, r) \subseteq \cup_{k=1}^m U_k.$$
This proves one half of the theorem.
Now suppose $K$ is compact. I need to show it is sequentially compact. Suppose it is not. Then there exists a sequence $\{x_k\}$ which has no convergent subsequence. This requires that $\{x_k\}$ have no limit point, for if it did have a limit point $x$, then $B(x, 1/n)$ would contain infinitely many distinct points of $\{x_k\}$ and so a subsequence of $\{x_k\}$ converging to $x$ could be obtained. Also no $x_k$ is repeated infinitely often, because if there were such, a convergent subsequence could be obtained. Hence
$\cup_{k=m}^\infty \{x_k\} \equiv C_m$ is a closed set, closed because it contains all its limit points. (It has no limit points so it contains them all.) Then letting $U_m = C_m^C$, it follows $\{U_m\}$ is an open cover of $K$ which has no finite subcover. Thus $K$ must be sequentially compact after all.
If $K$ is a closed and bounded set in a finite dimensional normed vector space, then $K$ is sequentially compact by Theorem 4.8.4. Therefore, by the first part of this theorem, it is compact.
Summarizing the above theorem along with Theorem 4.8.4 yields the following
corollary which is often called the Heine Borel theorem.
Corollary 7.1.4 Let X be a ﬁnite dimensional normed vector space and let K ⊆ X.
Then the following are equivalent.
1. K is closed and bounded.
2. K is sequentially compact.
3. K is compact.
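The first half of the proof of Theorem 7.1.3 is constructive, and the construction can be sketched numerically (the grid standing in for $K$ and the radius are my own choices, not from the text):

```python
# Greedy net construction from the proof: repeatedly pick a point of K not
# yet covered and center a ball of radius r there.  Chosen centers are
# pairwise at least r apart, so only finitely many can be picked.
r = 0.3
K = [k / 1000 for k in range(1001)]   # fine grid standing in for K = [0, 1]

centers = []
for x in K:
    if all(abs(x - c) >= r for c in centers):   # x not covered so far
        centers.append(x)

print(centers)   # [0.0, 0.3, 0.6, 0.9]: finitely many balls suffice
print(all(any(abs(x - c) < r for c in centers) for x in K))   # True
```

In the proof, sequential compactness is what rules out the process continuing forever: an infinite set of centers pairwise more than $r$ apart could have no convergent subsequence.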
There is also a very interesting result having to do with coverings of an arbitrary set. The message is that you can always reduce to a countable subcover. Here $\mathbb{F} = \mathbb{C}$ or $\mathbb{R}$.
Lemma 7.1.5 Let $\|x\| \equiv \max\{|x_i|, i = 1, 2, \dots, n\}$ for $x \in \mathbb{F}^n$. Then every set $U$ which is open in $\mathbb{F}^n$ is the countable union of balls of the form $B(x,r)$ where the open ball is defined in terms of the above norm. Also, if $\mathcal{C}$ is any collection of open sets, then there exists a countable subset $\mathcal{C}' \subseteq \mathcal{C}$ such that
$$\cup\mathcal{C} = \cup\mathcal{C}'.$$
Proof: By Theorem 4.8.3, if you consider the two normed vector spaces $(\mathbb{F}^n, |\cdot|)$ and $(\mathbb{F}^n, \|\cdot\|)$, the identity map is continuous in both directions. Therefore, if a set $U$ is open with respect to $|\cdot|$, it follows it is open with respect to $\|\cdot\|$ and the other way around. The other thing to notice is that there exists a countable dense subset of $\mathbb{F}$. The rationals will work if $\mathbb{F} = \mathbb{R}$ and if $\mathbb{F} = \mathbb{C}$, then you use $\mathbb{Q} + i\mathbb{Q}$. Letting $D$ be a countable dense subset of $\mathbb{F}$, $D^n$ is a countable dense subset of $\mathbb{F}^n$. It is countable because it is a finite Cartesian product of countable sets and you can use Theorem 1.1.7 of Page 18 repeatedly. It is dense because if $x \in \mathbb{F}^n$ and $\varepsilon > 0$, then by density of $D$, there exists $d_j \in D$ such that
$$|d_j - x_j| < \varepsilon;$$
then $d \equiv (d_1, \dots, d_n)$ is such that $\|d - x\| < \varepsilon$.
Now consider the set of open balls,
$$\mathcal{B} \equiv \{B(d,r) : d \in D^n,\ r \in \mathbb{Q}\}.$$
This collection of open balls is countable by Theorem 1.1.7 of Page 18. I claim every open set is the union of balls from $\mathcal{B}$. Let $U$ be an open set in $\mathbb{F}^n$ and $x \in U$.
Then there exists $\delta > 0$ such that $B(x, \delta) \subseteq U$. There exists $d \in D^n \cap B(x, \delta/5)$. Then pick a rational number $r$ with $\delta/5 < r < 2\delta/5$ and consider the set $B(d,r)$ of $\mathcal{B}$. Then $x \in B(d,r)$ because $r > \delta/5$. However, it is also the case that $B(d,r) \subseteq B(x,\delta)$ because if $y \in B(d,r)$ then
$$\|y - x\| \le \|y - d\| + \|d - x\| < \frac{2\delta}{5} + \frac{\delta}{5} < \delta.$$
Let $\mathcal{B}' \subseteq \mathcal{B}$ denote those sets of $\mathcal{B}$ which contain some point of $\cup\mathcal{C}$ and which are contained in some set of $\mathcal{C}$. For each $B \in \mathcal{B}'$, let $U_B$ be a single set of $\mathcal{C}$ which contains $B$. Consider a point $x$ of $\cup\mathcal{C}$. Then $x$ is in some set of $\mathcal{B}'$ just described. Therefore,
$$x \in \cup\mathcal{B}' \subseteq \cup\{U_B : B \in \mathcal{B}'\}.$$
Let $\mathcal{C}' \equiv \{U_B : B \in \mathcal{B}'\}$. Since $x$ is arbitrary, this shows that
$$\cup\mathcal{C} \subseteq \cup\mathcal{C}' \subseteq \cup\mathcal{C}.$$
The last condition, which says you can reduce to a countable subcovering, is called the Lindelöf property.
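The proof's $\delta/5$ construction can be traced concretely in one dimension (an assumption to keep the sketch short; $D$ is taken to be decimal rationals rather than all of $\mathbb{Q}$):

```python
from fractions import Fraction

# Given x with B(x, delta) inside an open set, produce a ball with rational
# center d and rational radius r such that x lies in B(d, r), and B(d, r)
# sits inside B(x, delta), following the delta/5 argument of Lemma 7.1.5.
def rational_ball(x, delta):
    d = Fraction(round(x * 10**6), 10**6)      # rational center near x
    assert abs(d - Fraction(x)) < Fraction(delta) / 5
    r = Fraction(3, 10) * Fraction(delta)      # delta/5 < r < 2*delta/5
    return d, r

x, delta = 0.123456789, 0.01
d, r = rational_ball(x, delta)
print(abs(d - Fraction(x)) < r)                    # True: x lies in B(d, r)
print(abs(d - Fraction(x)) + r < Fraction(delta))  # True: B(d, r) inside B(x, delta)
```

Since there are only countably many such pairs $(d, r)$, every open set is a countable union of balls from this family, which is the heart of the lemma.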
7.2 An Outer Measure On $\mathcal{P}(\mathbb{R})$
A measure on $\mathbb{R}$ is like length. I will present something more general than length because it is no trouble to do so and the generalization is useful in many areas of mathematics such as probability.
Definition 7.2.1 The following definition is important.
$$F(x+) \equiv \lim_{y \to x+} F(y), \qquad F(x-) \equiv \lim_{y \to x-} F(y)$$
Thus one of these is the limit from the left and the other is the limit from the right.
Definition 7.2.2 $\mathcal{P}(S)$ denotes the set of all subsets of $S$.
Theorem 7.2.3 Let $F$ be an increasing function defined on $\mathbb{R}$. This will be called an integrator function. There exists a function $\mu : \mathcal{P}(\mathbb{R}) \to [0, \infty]$ which satisfies the following properties.
1. If $A \subseteq B$, then $0 \le \mu(A) \le \mu(B)$, $\mu(\emptyset) = 0$.
2. $\mu\left(\cup_{i=1}^\infty A_i\right) \le \sum_{i=1}^\infty \mu(A_i)$
3. $\mu([a,b]) = F(b+) - F(a-)$,
4. $\mu((a,b)) = F(b-) - F(a+)$
5. $\mu((a,b]) = F(b+) - F(a+)$
6. $\mu([a,b)) = F(b-) - F(a-)$.
Proof: First it is necessary to deﬁne the function µ. This is contained in the
following deﬁnition.
Definition 7.2.4 For $A \subseteq \mathbb{R}$,
$$\mu(A) = \inf\left\{\sum_{i=1}^\infty \left(F(b_i-) - F(a_i+)\right) : A \subseteq \cup_{i=1}^\infty (a_i, b_i)\right\}$$
In words, you look at all coverings of A with open intervals. For each of these
open coverings, you add the “lengths” of the individual open intervals and you take
the inﬁmum of all such numbers obtained.
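For the Lebesgue case $F(x) = x$, the infimum in Definition 7.2.4 can be watched numerically (a sketch with my own choice of set and covers, not from the text):

```python
# Cover A = [0,1] ∪ [2, 2.5] by the open intervals (a_i - e, b_i + e).
# Each cover sum is 1.5 + 4e, so the infimum over these covers is the
# total length 1.5.
def F(x):
    return x   # increasing integrator; here F(b-) - F(a+) = b - a

A = [(0.0, 1.0), (2.0, 2.5)]

def cover_sum(eps):
    # add the "lengths" of the enlarged open intervals
    return sum(F(b + eps) - F(a - eps) for a, b in A)

for eps in [0.1, 0.01, 0.001]:
    print(cover_sum(eps))   # decreases toward 1.5 as eps shrinks
```

Of course the definition takes the infimum over all countable open covers, not just these symmetric enlargements, but for a finite disjoint union of closed intervals these covers already approach the infimum.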
Then 1.) is obvious because if a countable collection of open intervals covers $B$, then it also covers $A$. Thus every sum of lengths which appears in the infimum for $\mu(B)$ also appears in the infimum for $\mu(A)$, so $\mu(A) \le \mu(B)$. Why is $\mu(\emptyset) = 0$? Pick a point of continuity of $F$. Such points exist because $F$ is increasing and so it has only countably many points of discontinuity. Let $a$ be this point. Then $\emptyset \subseteq (a - \delta, a + \delta)$ and so $\mu(\emptyset) \le F(a + \delta) - F(a - \delta)$ for every $\delta > 0$. Letting $\delta \to 0$, it follows that $\mu(\emptyset) = 0$.
Consider 2.). If any $\mu(A_i) = \infty$, there is nothing to prove. The assertion simply is $\infty \le \infty$. Assume then that $\mu(A_i) < \infty$ for all $i$. Then for each $m \in \mathbb{N}$ there exists a countable set of open intervals $\{(a_i^m, b_i^m)\}_{i=1}^\infty$ such that
$$\mu(A_m) + \frac{\varepsilon}{2^m} > \sum_{i=1}^\infty \left(F(b_i^m-) - F(a_i^m+)\right).$$
Then using Theorem 1.3.4 on Page 26,
$$\mu(\cup_{m=1}^\infty A_m) \le \sum_{i,m}\left(F(b_i^m-) - F(a_i^m+)\right) = \sum_{m=1}^\infty \sum_{i=1}^\infty \left(F(b_i^m-) - F(a_i^m+)\right) \le \sum_{m=1}^\infty \left(\mu(A_m) + \frac{\varepsilon}{2^m}\right) = \sum_{m=1}^\infty \mu(A_m) + \varepsilon,$$
and since $\varepsilon$ is arbitrary, this establishes 2.).
Next consider 3.). By definition, there exists a sequence of open intervals $\{(a_i, b_i)\}_{i=1}^\infty$ whose union contains $[a,b]$ such that
$$\mu([a,b]) + \varepsilon \ge \sum_{i=1}^\infty \left(F(b_i-) - F(a_i+)\right).$$
By Theorem 7.1.3, finitely many of these intervals also cover $[a,b]$. It follows there exist finitely many of these intervals, denoted as $\{(a_i, b_i)\}_{i=1}^n$, which overlap, such
that $a \in (a_1, b_1)$, $b_1 \in (a_2, b_2), \dots, b \in (a_n, b_n)$. Therefore,
$$\mu([a,b]) \le \sum_{i=1}^n \left(F(b_i-) - F(a_i+)\right).$$
It follows
$$\sum_{i=1}^n \left(F(b_i-) - F(a_i+)\right) \ge \mu([a,b]) \ge \sum_{i=1}^n \left(F(b_i-) - F(a_i+)\right) - \varepsilon \ge F(b+) - F(a-) - \varepsilon.$$
Therefore,
$$F(b+\delta) - F(a-\delta) \ge \mu([a,b]) \ge F(b+) - F(a-) - \varepsilon.$$
Letting $\delta \to 0$,
$$F(b+) - F(a-) \ge \mu([a,b]) \ge F(b+) - F(a-) - \varepsilon.$$
Since $\varepsilon$ is arbitrary, this shows
$$\mu([a,b]) = F(b+) - F(a-).$$
This establishes 3.).
Consider 4.). For small $\delta > 0$,
$$\mu([a+\delta, b-\delta]) \le \mu((a,b)) \le \mu([a,b]).$$
Therefore, from 3.) and the definition of $\mu$,
$$F(b-\delta) - F(a+\delta) \le F((b-\delta)+) - F((a+\delta)-) = \mu([a+\delta, b-\delta]) \le \mu((a,b)) \le F(b-) - F(a+).$$
Now letting $\delta$ decrease to 0 it follows
$$F(b-) - F(a+) \le \mu((a,b)) \le F(b-) - F(a+).$$
This shows 4.)
Consider 5.). From 3.) and 4.), for small $\delta > 0$,
$$F(b+) - F(a+\delta) \le F(b+) - F((a+\delta)-) = \mu([a+\delta, b]) \le \mu((a,b]) \le \mu((a, b+\delta)) = F((b+\delta)-) - F(a+) \le F(b+\delta) - F(a+).$$
Now let $\delta$ converge to 0 from above to obtain
$$\mu((a,b]) = F(b+) - F(a+).$$
This establishes 5.), and 6.) is entirely similar to 5.).
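The four interval formulas 3.)–6.) can be made concrete with a small numerical sketch (my own example, not from the text): take the integrator $F(x) = x$ for $x < 0$ and $F(x) = x + 1$ for $x \ge 0$, which jumps by 1 at 0. The jump is picked up exactly when the interval contains the point 0.

```python
# Integrator with a unit jump at 0 (a hypothetical example for illustration).
def F(x):
    return x if x < 0 else x + 1

def Fplus(x, h=1e-9):   # numerical stand-in for F(x+), the right-hand limit
    return F(x + h)

def Fminus(x, h=1e-9):  # numerical stand-in for F(x-), the left-hand limit
    return F(x - h)

a, b = -1.0, 0.0
print(Fplus(b) - Fminus(a))    # mu([-1, 0]): endpoint 0 included, about 2
print(Fminus(b) - Fplus(a))    # mu((-1, 0)): endpoint 0 excluded, about 1
print(Fplus(b) - Fplus(a))     # mu((-1, 0]): about 2
print(Fminus(b) - Fminus(a))   # mu([-1, 0)): about 1
```

The difference between the closed and open versions is exactly the jump $F(0+) - F(0-) = 1$, the point mass the resulting Lebesgue Stieltjes measure puts at 0.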
The ﬁrst two conditions of the above theorem are so important that we give
something satisfying them a special name.
Definition 7.2.5 Let $\Omega$ be a nonempty set. A function mapping $\mathcal{P}(\Omega) \to [0, \infty]$ is called an outer measure if it satisfies
1. If $A \subseteq B$, then $0 \le \mu(A) \le \mu(B)$, $\mu(\emptyset) = 0$.
2. $\mu\left(\cup_{i=1}^\infty A_i\right) \le \sum_{i=1}^\infty \mu(A_i)$.
7.3 Measures And Measure Spaces
First here is a definition of a measure. This is what we want to eventually get. What has been obtained above on $\mathcal{P}(\mathbb{R})$ is merely an outer measure along with its values on intervals.
Definition 7.3.1 $\mathcal{S} \subseteq \mathcal{P}(\Omega)$ is called a σ algebra, pronounced “sigma algebra”, if
$$\emptyset, \Omega \in \mathcal{S},$$
$$\text{if } E \in \mathcal{S} \text{ then } E^C \in \mathcal{S},$$
and
$$\text{if } E_i \in \mathcal{S} \text{ for } i = 1, 2, \dots, \text{ then } \cup_{i=1}^\infty E_i \in \mathcal{S}.$$
A function $\mu : \mathcal{S} \to [0, \infty]$ where $\mathcal{S}$ is a σ algebra is called a measure if whenever $\{E_i\}_{i=1}^\infty \subseteq \mathcal{S}$ and the $E_i$ are disjoint, then it follows
$$\mu\left(\cup_{j=1}^\infty E_j\right) = \sum_{j=1}^\infty \mu(E_j).$$
The triple $(\Omega, \mathcal{S}, \mu)$ is often called a measure space. Sometimes people refer to $(\Omega, \mathcal{S})$ as a measurable space, making no reference to the measure. Sometimes $(\Omega, \mathcal{S})$ may also be called a measure space.
The following lemma is obvious and its proof is left for you.
Lemma 7.3.2 Let $\mathcal{C}$ denote a set whose elements are σ algebras. Then $\cap\mathcal{C}$ is also a σ algebra.
For $\mathcal{S} \subseteq \mathcal{P}(\Omega)$, $\sigma(\mathcal{S})$ denotes the smallest σ algebra which contains $\mathcal{S}$. Such a smallest σ algebra exists because $\mathcal{P}(\Omega)$ is a σ algebra. Hence, letting $\mathcal{C}$ denote the set of all σ algebras which contain $\mathcal{S}$, it follows that $\mathcal{C} \ne \emptyset$ and $\sigma(\mathcal{S}) \equiv \cap\mathcal{C}$.
Definition 7.3.3 In any topological space, the Borel sets consist of the smallest σ algebra which contains the open sets. Thus, denoting by $\tau$ the collection of open sets, the Borel sets are $\mathcal{B} \equiv \sigma(\tau)$.
Theorem 7.3.4 Let $\{E_m\}_{m=1}^\infty$ be a sequence of measurable sets in a measure space $(\Omega, \mathcal{F}, \mu)$. Then if $E_n \subseteq E_{n+1} \subseteq E_{n+2} \subseteq \cdots$,
$$\mu(\cup_{i=1}^\infty E_i) = \lim_{n\to\infty} \mu(E_n) \tag{7.3.1}$$
and if $E_n \supseteq E_{n+1} \supseteq E_{n+2} \supseteq \cdots$ and $\mu(E_1) < \infty$, then
$$\mu(\cap_{i=1}^\infty E_i) = \lim_{n\to\infty} \mu(E_n). \tag{7.3.2}$$
Stated more succinctly, $E_k \uparrow E$ implies $\mu(E_k) \uparrow \mu(E)$ and $E_k \downarrow E$ with $\mu(E_1) < \infty$ implies $\mu(E_k) \downarrow \mu(E)$.
Proof: First note that $\cap_{i=1}^\infty E_i = \left(\cup_{i=1}^\infty E_i^C\right)^C \in \mathcal{F}$, so $\cap_{i=1}^\infty E_i$ is measurable. Also note that for $A$ and $B$ sets of $\mathcal{F}$, $A \setminus B \equiv \left(A^C \cup B\right)^C \in \mathcal{F}$. To show 7.3.1, note that 7.3.1 is obviously true if $\mu(E_k) = \infty$ for any $k$. Therefore, assume $\mu(E_k) < \infty$ for all $k$. Thus
$$\mu(E_{k+1} \setminus E_k) + \mu(E_k) = \mu(E_{k+1})$$
and so
$$\mu(E_{k+1} \setminus E_k) = \mu(E_{k+1}) - \mu(E_k).$$
Also,
$$\bigcup_{k=1}^\infty E_k = E_1 \cup \bigcup_{k=1}^\infty (E_{k+1} \setminus E_k)$$
and the sets in the above union are disjoint. Hence
$$\mu(\cup_{i=1}^\infty E_i) = \mu(E_1) + \sum_{k=1}^\infty \mu(E_{k+1} \setminus E_k) = \mu(E_1) + \sum_{k=1}^\infty \left(\mu(E_{k+1}) - \mu(E_k)\right) = \mu(E_1) + \lim_{n\to\infty} \sum_{k=1}^n \left(\mu(E_{k+1}) - \mu(E_k)\right) = \lim_{n\to\infty} \mu(E_{n+1}).$$
This shows part 7.3.1.
To verify 7.3.2,
$$\mu(E_1) = \mu(\cap_{i=1}^\infty E_i) + \mu(E_1 \setminus \cap_{i=1}^\infty E_i);$$
since $\mu(E_1) < \infty$, it follows $\mu(\cap_{i=1}^\infty E_i) < \infty$. Also, $E_1 \setminus \cap_{i=1}^n E_i \uparrow E_1 \setminus \cap_{i=1}^\infty E_i$ and so by 7.3.1,
$$\mu(E_1) - \mu(\cap_{i=1}^\infty E_i) = \mu(E_1 \setminus \cap_{i=1}^\infty E_i) = \lim_{n\to\infty} \mu(E_1 \setminus \cap_{i=1}^n E_i)$$
$$= \mu(E_1) - \lim_{n\to\infty} \mu(\cap_{i=1}^n E_i) = \mu(E_1) - \lim_{n\to\infty} \mu(E_n).$$
Hence, subtracting $\mu(E_1)$ from both sides,
$$\lim_{n\to\infty} \mu(E_n) = \mu(\cap_{i=1}^\infty E_i).$$
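Theorem 7.3.4 can be illustrated numerically using interval length as the measure (an assumption for the example; length is the Lebesgue Stieltjes measure with $F(x) = x$):

```python
# Continuity of measure along monotone sequences of intervals.
def length(interval):
    a, b = interval
    return max(0.0, b - a)

# Increasing: E_n = [0, 1 - 1/n], union is [0, 1), mu(union) = 1.
increasing = [length((0.0, 1.0 - 1.0 / n)) for n in range(1, 10001)]
print(increasing[-1])   # mu(E_n) climbs toward 1

# Decreasing: F_n = [0, 1/n], intersection is {0}, mu = 0.  Note the
# hypothesis mu(F_1) = 1 < infinity is satisfied.
decreasing = [length((0.0, 1.0 / n)) for n in range(1, 10001)]
print(decreasing[-1])   # mu(F_n) falls toward 0
```

The finiteness hypothesis in 7.3.2 matters: with $E_n = [n, \infty)$ every set has infinite length, the intersection is empty, and the limit of the measures is not the measure of the intersection.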
The following deﬁnition is important.
Definition 7.3.5 If something happens except for on a set of measure zero, then it is said to happen a.e., “almost everywhere”. For example, $\{f_k(x)\}$ is said to converge to $f(x)$ a.e. if there is a set of measure zero $N$ such that if $x \notin N$, then $f_k(x) \to f(x)$.
7.4 Measures From Outer Measures
Earlier an outer measure on T (1) was constructed. This can be used to obtain a
measure deﬁned on 1. However, the procedure for doing so is a special case of a
general approach due to Caratheodory in about 1918.
Definition 7.4.1 Let $\Omega$ be a nonempty set and let $\mu : \mathcal{P}(\Omega) \to [0, \infty]$ be an outer measure. For $E \subseteq \Omega$, $E$ is $\mu$ measurable if for all $S \subseteq \Omega$,
$$\mu(S) = \mu(S \setminus E) + \mu(S \cap E). \tag{7.4.3}$$
To help in remembering 7.4.3, think of a measurable set $E$ as a process which divides a given set into two pieces, the part in $E$ and the part not in $E$ as in 7.4.3. In the Bible, there are several incidents recorded in which a process of division resulted in more stuff than was originally present.² Measurable sets are exactly those which are incapable of such a miracle. You might think of the measurable sets as the nonmiraculous sets. The idea is to show that they form a σ algebra on which the outer measure $\mu$ is a measure.
First here is a deﬁnition and a lemma.
Definition 7.4.2 $(\mu\lfloor S)(A) \equiv \mu(S \cap A)$ for all $A \subseteq \Omega$. Thus $\mu\lfloor S$ is the name of a new outer measure, called $\mu$ restricted to $S$.
The next lemma indicates that the property of measurability is not lost by considering this restricted measure.
Lemma 7.4.3 If $A$ is $\mu$ measurable, then $A$ is $\mu\lfloor S$ measurable.
²1 Kings 17, 2 Kings 4, Matthew 14, and Matthew 15 all contain such descriptions. The stuff involved was either oil, bread, flour or fish. In mathematics such things have also been done with sets. In the book by Bruckner, Bruckner and Thompson there is an interesting discussion of the Banach Tarski paradox which says it is possible to divide a ball in $\mathbb{R}^3$ into five disjoint pieces and assemble the pieces to form two disjoint balls of the same size as the first. The details can be found in: The Banach Tarski Paradox by Wagon, Cambridge University Press, 1985. It is known that all such examples must involve the axiom of choice.
Proof: Suppose $A$ is $\mu$ measurable. It is desired to show that for all $T \subseteq \Omega$,
$$(\mu\lfloor S)(T) = (\mu\lfloor S)(T \cap A) + (\mu\lfloor S)(T \setminus A).$$
Thus it is desired to show
$$\mu(S \cap T) = \mu(T \cap A \cap S) + \mu(T \cap S \cap A^C). \tag{7.4.4}$$
But 7.4.4 holds because $A$ is $\mu$ measurable. Apply Definition 7.4.1 to $S \cap T$ instead of $S$.
If $A$ is $\mu\lfloor S$ measurable, it does not follow that $A$ is $\mu$ measurable. Indeed, if you believe in the existence of non measurable sets, you could let $A = S$ for such a $\mu$ non measurable set and verify that $S$ is $\mu\lfloor S$ measurable.
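Condition 7.4.3 can be checked by hand in a tiny finite computation (my own example with length as the outer measure; the interval arithmetic below is only a sketch, and (-1e9, 1e9) stand in for the unbounded complement pieces):

```python
# Splitting a test interval S by an interval E loses no length: this is
# exactly mu(S) = mu(S \ E) + mu(S ∩ E) in this simple case.
def intersect(s, e):
    # intersection of two intervals, returned as a (possibly empty) interval
    a = max(s[0], e[0]); b = min(s[1], e[1])
    return (a, max(a, b))

def length(i):
    return i[1] - i[0]

S = (0.0, 3.0)
E = (1.0, 2.0)

inside = length(intersect(S, E))                      # mu(S ∩ E) = 1
outside = length(intersect(S, (-1e9, E[0]))) \
        + length(intersect(S, (E[1], 1e9)))           # mu(S \ E) = 2
print(inside + outside == length(S))                  # True
```

Of course the point of Definition 7.4.1 is that the identity must hold for every test set $S \subseteq \Omega$, not just intervals; the computation only illustrates what the condition is asking.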
The next theorem is the main result on outer measures which shows that starting
with an outer measure you can obtain a measure.
Theorem 7.4.4 Let $\Omega$ be a set and let $\mu$ be an outer measure on $\mathcal{P}(\Omega)$. The collection of $\mu$ measurable sets $\mathcal{S}$ forms a σ algebra and
If $F_i \in \mathcal{S}$ and $F_i \cap F_j = \emptyset$ for $i \ne j$, then
$$\mu(\cup_{i=1}^\infty F_i) = \sum_{i=1}^\infty \mu(F_i). \tag{7.4.5}$$
If $F_n \subseteq F_{n+1} \subseteq \cdots$, then if $F = \cup_{n=1}^\infty F_n$ and $F_n \in \mathcal{S}$, it follows that
$$\mu(F) = \lim_{n\to\infty} \mu(F_n). \tag{7.4.6}$$
If $F_n \supseteq F_{n+1} \supseteq \cdots$, and if $F = \cap_{n=1}^\infty F_n$ for $F_n \in \mathcal{S}$, then if $\mu(F_1) < \infty$,
$$\mu(F) = \lim_{n\to\infty} \mu(F_n). \tag{7.4.7}$$
This measure space is also complete, which means that if $\mu(F) = 0$ for some $F \in \mathcal{S}$, then if $G \subseteq F$, it follows $G \in \mathcal{S}$ also.
Proof: First note that $\emptyset$ and $\Omega$ are obviously in $\mathcal{S}$. Now suppose $A, B \in \mathcal{S}$. I will show $A \setminus B \equiv A \cap B^C$ is in $\mathcal{S}$. To do so, consider the following picture.
[Figure: a test set $S$ partitioned by $A$ and $B$ into the four regions $S \cap A \cap B$, $S \cap A \cap B^C$, $S \cap A^C \cap B$, and $S \cap A^C \cap B^C$.]
First note that
$$S \setminus (A \setminus B) = (S \setminus A) \cup (S \cap B) = (S \setminus A) \cup (A \cap B \cap S).$$
Since $\mu$ is subadditive and $A, B \in \mathcal{S}$,
$$\mu(S) \le \mu(S \setminus (A \setminus B)) + \mu(S \cap (A \setminus B)) \le \mu(S \setminus A) + \mu(S \cap B \cap A) + \mu\left(S \cap A \cap B^C\right) = \mu(S \setminus A) + \mu(S \cap A) = \mu(S)$$
and so all the inequalities are equal signs. Therefore, since $S$ is arbitrary, this shows $A \setminus B \in \mathcal{S}$.
Since $\Omega \in \mathcal{S}$, this shows that $A \in \mathcal{S}$ if and only if $A^C \in \mathcal{S}$. Now if $A, B \in \mathcal{S}$, $A \cup B = (A^C \cap B^C)^C = (A^C \setminus B)^C \in \mathcal{S}$. By induction, if $A_1, \dots, A_n \in \mathcal{S}$, then so is $\cup_{i=1}^n A_i$. If $A, B \in \mathcal{S}$ with $A \cap B = \emptyset$,
$$\mu(A \cup B) = \mu((A \cup B) \cap A) + \mu((A \cup B) \setminus A) = \mu(A) + \mu(B).$$
By induction, if $A_i \cap A_j = \emptyset$ for $i \ne j$ and $A_i \in \mathcal{S}$,
$$\mu(\cup_{i=1}^n A_i) = \sum_{i=1}^n \mu(A_i). \tag{7.4.8}$$
Now let $A = \cup_{i=1}^\infty A_i$ where $A_i \cap A_j = \emptyset$ for $i \ne j$. Then
$$\sum_{i=1}^\infty \mu(A_i) \ge \mu(A) \ge \mu(\cup_{i=1}^n A_i) = \sum_{i=1}^n \mu(A_i).$$
Since this holds for all $n$, you can take the limit as $n \to \infty$ and conclude
$$\sum_{i=1}^\infty \mu(A_i) = \mu(A)$$
which establishes 7.4.5.
Consider part 7.4.6. Without loss of generality $\mu(F_k) < \infty$ for all $k$, since otherwise there is nothing to show. Suppose $\{F_k\}$ is an increasing sequence of sets of $\mathcal{S}$. Then letting $F_0 \equiv \emptyset$, $\{F_{k+1} \setminus F_k\}_{k=0}^\infty$ is a sequence of disjoint sets of $\mathcal{S}$, since it was shown above that the difference of two sets of $\mathcal{S}$ is in $\mathcal{S}$. Also note that from 7.4.8,
$$\mu(F_{k+1} \setminus F_k) + \mu(F_k) = \mu(F_{k+1})$$
and so if $\mu(F_k) < \infty$, then
$$\mu(F_{k+1} \setminus F_k) = \mu(F_{k+1}) - \mu(F_k).$$
Therefore, letting
$$F \equiv \cup_{k=1}^\infty F_k,$$
which also equals
$$\cup_{k=0}^\infty (F_{k+1} \setminus F_k),$$
it follows from part 7.4.5 just shown that
$$\mu(F) = \sum_{k=0}^\infty \mu(F_{k+1} \setminus F_k) = \lim_{n\to\infty} \sum_{k=0}^n \mu(F_{k+1} \setminus F_k) = \lim_{n\to\infty} \sum_{k=0}^n \left(\mu(F_{k+1}) - \mu(F_k)\right) = \lim_{n\to\infty} \mu(F_{n+1}).$$
In order to establish 7.4.7, let the $F_n$ be as given there. Then, since $(F_1 \setminus F_n)$ increases to $(F_1 \setminus F)$, 7.4.6 implies
$$\lim_{n\to\infty}\left(\mu(F_1) - \mu(F_n)\right) = \mu(F_1 \setminus F).$$
Now $\mu(F_1 \setminus F) + \mu(F) \ge \mu(F_1)$ and so $\mu(F_1 \setminus F) \ge \mu(F_1) - \mu(F)$. Hence
$$\lim_{n\to\infty}\left(\mu(F_1) - \mu(F_n)\right) = \mu(F_1 \setminus F) \ge \mu(F_1) - \mu(F)$$
which implies
$$\lim_{n\to\infty} \mu(F_n) \le \mu(F).$$
But since $F \subseteq F_n$,
$$\mu(F) \le \lim_{n\to\infty} \mu(F_n)$$
and this establishes 7.4.7. Note that it was assumed $\mu(F_1) < \infty$ because $\mu(F_1)$ was subtracted from both sides.
It remains to show $\mathcal{S}$ is closed under countable unions. Recall that if $A \in \mathcal{S}$, then $A^C \in \mathcal{S}$ and $\mathcal{S}$ is closed under finite unions. Let $A_i \in \mathcal{S}$, $A = \cup_{i=1}^\infty A_i$, $B_n = \cup_{i=1}^n A_i$. Then
$$\mu(S) = \mu(S \cap B_n) + \mu(S \setminus B_n) = (\mu\lfloor S)(B_n) + (\mu\lfloor S)(B_n^C). \tag{7.4.9}$$
By Lemma 7.4.3, $B_n$ is $(\mu\lfloor S)$ measurable and so is $B_n^C$. I want to show $\mu(S) \ge \mu(S \setminus A) + \mu(S \cap A)$. If $\mu(S) = \infty$, there is nothing to prove. Assume $\mu(S) < \infty$. Then apply Parts 7.4.7 and 7.4.6 to the outer measure $\mu\lfloor S$ in 7.4.9 and let $n \to \infty$. Thus
$$B_n \uparrow A, \qquad B_n^C \downarrow A^C$$
and this yields $\mu(S) = (\mu\lfloor S)(A) + (\mu\lfloor S)(A^C) = \mu(S \cap A) + \mu(S \setminus A)$. Therefore $A \in \mathcal{S}$, and this proves Parts 7.4.5, 7.4.6, and 7.4.7.
It only remains to verify the assertion about completeness. Letting G and F be
as described above, let S ⊆ Ω. I need to verify

µ(S) ≥ µ(S ∩ G) + µ(S \ G).

However,

µ(S ∩ G) + µ(S \ G) ≤ µ(S ∩ F) + µ(S \ F) + µ(F \ G)
                    = µ(S ∩ F) + µ(S \ F) = µ(S)

because, by assumption, µ(F \ G) ≤ µ(F) = 0.
Corollary 7.4.5 Completeness is the same as saying that if (E \ E′) ∪ (E′ \ E) ⊆
N ∈ ℱ with µ(N) = 0, and E ∈ ℱ, then it follows that E′ ∈ ℱ also.

Proof: If the new condition holds, then suppose G ⊆ F where µ(F) = 0, F ∈ ℱ.
Then (G \ F) ∪ (F \ G) ⊆ F, where the first set is empty, and µ(F) is given to equal 0.
Therefore, G ∈ ℱ.

Now suppose the earlier version of completeness holds and let

(E \ E′) ∪ (E′ \ E) ⊆ N ∈ ℱ

where µ(N) = 0 and E ∈ ℱ. Then we know (E \ E′), (E′ \ E) ∈ ℱ
and both have measure zero. It follows E \ (E \ E′) = E ∩ E′ ∈ ℱ. Hence

E′ = (E ∩ E′) ∪ (E′ \ E) ∈ ℱ.
The measure µ which results from the outer measure of Theorem 7.2.3 is called
the Lebesgue-Stieltjes measure associated with the integrator function F. Its
properties will be discussed more later.
7.5 Measurable Functions

The integral will be defined on measurable functions, which is the next topic
considered. It is sometimes convenient to allow functions to take the value +∞. You
should think of +∞, usually referred to as ∞, as something out at the right end
of the real line; its only importance is the notion of sequences converging to
it. x_n → ∞ exactly when for all l ∈ ℝ there exists N such that if n ≥ N, then
x_n > l. This is what it means for a sequence to converge to ∞. Don’t think of
∞ as a number. It is just a convenient symbol which allows the consideration of
some limit operations more simply. Similar considerations apply to −∞, but this
value is not of very great interest. In fact, the set of most interest for the values of
a function f is the complex numbers or, more generally, some normed vector space.
Recall the notation

f⁻¹(A) ≡ {x : f(x) ∈ A} ≡ [f(x) ∈ A] ≡ [f ∈ A]

in whatever context the notation occurs.
Lemma 7.5.1 Let f : Ω → (−∞, ∞] where ℱ is a σ algebra of subsets of Ω. Then
the following are equivalent.

f⁻¹((d, ∞]) ∈ ℱ for all finite d,
f⁻¹((−∞, d)) ∈ ℱ for all finite d,
f⁻¹([d, ∞]) ∈ ℱ for all finite d,
f⁻¹((−∞, d]) ∈ ℱ for all finite d,
f⁻¹((a, b)) ∈ ℱ for all a, b with −∞ < a < b < ∞.
Proof: First note that the first and the third are equivalent. To see this, observe

f⁻¹([d, ∞]) = ∩_{n=1}^∞ f⁻¹((d − 1/n, ∞]),

and so if the first condition holds, then so does the third. Also,

f⁻¹((d, ∞]) = ∪_{n=1}^∞ f⁻¹([d + 1/n, ∞]),

and so if the third condition holds, so does the first.
Similarly, the second and fourth conditions are equivalent. Now

f⁻¹((−∞, d]) = (f⁻¹((d, ∞]))^C,

so the first and fourth conditions are equivalent. Thus the first four conditions are
equivalent, and if any of them hold, then for −∞ < a < b < ∞,

f⁻¹((a, b)) = f⁻¹((−∞, b)) ∩ f⁻¹((a, ∞]) ∈ ℱ.

Finally, if the last condition holds,

f⁻¹([d, ∞]) = (∪_{k=1}^∞ f⁻¹((d − k, d)))^C ∈ ℱ,

and so the third condition holds. Therefore, all five conditions are equivalent.
This lemma allows for the following definition of a measurable function having
values in (−∞, ∞].

Definition 7.5.2 Let (Ω, ℱ) be a measurable space and let f : Ω → (−∞, ∞]. Then f
is said to be ℱ measurable if any of the equivalent conditions of Lemma 7.5.1 hold.
More generally, when you have a measurable space (Ω, ℱ) and a function g : Ω → X
where X is a topological space, we say that g is measurable if g⁻¹(U) ∈ ℱ for every
open set U.

In the case of (−∞, ∞], you can start with the general definition of measurability
just given and conclude that all of the equivalent conclusions in the above lemma
hold. This was contained in the above proof.
7.5. MEASURABLE FUNCTIONS 175
Theorem 7.5.3 Let f_n and f be functions mapping Ω to (−∞, ∞] where ℱ is a σ
algebra of measurable sets of Ω. Then if each f_n is measurable and f(ω) = lim_{n→∞} f_n(ω),
it follows that f is also measurable. (Pointwise limits of measurable functions are
measurable.)

Proof: The idea is to show f⁻¹((a, ∞]) ∈ ℱ. First,

f⁻¹((a + 1/r, ∞]) ⊆ ∪_{k=1}^∞ ∩_{n=k}^∞ f_n⁻¹((a + 1/r, ∞]) ⊆ f⁻¹([a + 1/r, ∞])

because f(ω) > a + 1/r implies that f_n(ω) > a + 1/r for all n large enough, while if
f_n(ω) > a + 1/r for all n large enough, then f(ω) ∈ [a + 1/r, ∞]. Now simply take a
union over all r ∈ ℕ. Thus

f⁻¹((a, ∞]) ⊆ ∪_{r=1}^∞ ∪_{k=1}^∞ ∩_{n=k}^∞ f_n⁻¹((a + 1/r, ∞])
           ⊆ ∪_{r=1}^∞ f⁻¹([a + 1/r, ∞]) = f⁻¹((a, ∞]).

The expression in the middle is given to be measurable.
Notation 7.5.4 For E a set, let 𝒳_E(ω) be defined by

𝒳_E(ω) = 1 if ω ∈ E,  𝒳_E(ω) = 0 if ω ∉ E.

Let (Ω, ℱ) be a measurable space. A simple function is one which is measurable and
has finitely many values. Thus every simple function is of the form

s(ω) ≡ Σ_{k=1}^m c_k 𝒳_{E_k}(ω),

the c_k being the distinct values of s and E_k being the set on which s = c_k. A
nonnegative simple function is one in which each c_k ∈ [0, ∞].
Lemma 7.5.5 Let s(ω) = Σ_{k=1}^m c_k 𝒳_{E_k}(ω) where the c_k are the distinct values of
s, the E_k being disjoint. Then s is measurable if and only if each E_k is measurable.

Proof: Suppose first s is measurable. Let δ be small enough that

E_k = s⁻¹((c_k − δ, c_k + δ)),

that is, no other value c_j lies in this interval. Then it follows E_k is measurable
because s is given to be measurable.

Next suppose each E_k is measurable. Then for any α, [s > α] is the finite union
of those E_k for which c_k > α, a measurable set.
Note that f⁻¹([a, b)) = f⁻¹((−∞, a))^C ∩ f⁻¹((−∞, b)), which is measurable.
Theorem 7.5.6 A function f ≥ 0 is measurable with respect to the measurable space
(Ω, ℱ) if and only if there exists a sequence of nonnegative simple functions {s_n}
satisfying

0 ≤ s_n(ω),   (7.5.10)
s_n(ω) ≤ s_{n+1}(ω),
f(ω) = lim_{n→∞} s_n(ω) for all ω ∈ Ω.   (7.5.11)

If f is bounded and measurable, the convergence is actually uniform.
Proof: First suppose that f is measurable. Then

f⁻¹([a, b)) = f⁻¹((−∞, a))^C ∩ f⁻¹((−∞, b)) ∈ ℱ.

Letting I ≡ {ω : f(ω) = ∞}, define

t_n(ω) = Σ_{k=0}^{2^n} (k/n) 𝒳_{f⁻¹([k/n, (k+1)/n))}(ω) + n 𝒳_I(ω).

Then t_n(ω) ≤ f(ω) for all ω and lim_{n→∞} t_n(ω) = f(ω) for all ω. This is because
t_n(ω) = n for ω ∈ I and, if f(ω) ∈ [0, (2^n + 1)/n), then

0 ≤ f(ω) − t_n(ω) ≤ 1/n.   (7.5.12)

Thus whenever ω ∉ I, the above inequality will hold for all n large enough. Let

s_1 = t_1,  s_2 = max(t_1, t_2),  s_3 = max(t_1, t_2, t_3),  ...

Then the sequence {s_n} satisfies 7.5.10–7.5.11.

In case f is bounded, the term n𝒳_I(ω) is not present. Therefore, for all n large
enough that 2^n/n ≥ f(ω) for all ω, 7.5.12 holds for all ω. Thus the convergence is
uniform.

Conversely, if f is the limit of nonnegative simple functions, then by Theorem
7.5.3, f is measurable because each of these simple functions is measurable.
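The construction in the proof can be sketched numerically. This is only an illustrative sketch, not part of the text; `t_n` and `s_n` are hypothetical helper names mirroring t_n(ω) = Σ_{k=0}^{2^n} (k/n) 𝒳_{f⁻¹([k/n,(k+1)/n))}(ω) + n 𝒳_I(ω) and s_n = max(t_1, ..., t_n).

```python
import math

def t_n(f, x, n):
    """Value at x of the n-th grid approximation from the proof:
    k/n on f^{-1}([k/n, (k+1)/n)) for k = 0, ..., 2^n, and n where f = infinity."""
    y = f(x)
    if math.isinf(y):
        return float(n)
    if y >= (2 ** n + 1) / n:
        return 0.0  # x lies in none of the intervals [k/n, (k+1)/n), k <= 2^n
    return math.floor(y * n) / n  # largest k/n with k/n <= f(x)

def s_n(f, x, n):
    """s_n = max(t_1, ..., t_n): the increasing sequence of the theorem."""
    return max(t_n(f, x, j) for j in range(1, n + 1))
```

For bounded f, once 2^n/n dominates f the pointwise error 0 ≤ f − s_n is at most 1/n, matching 7.5.12.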
Now specialize to the case where f has values in ℝ.

Definition 7.5.7 Let f : Ω → ℝ. Then

f⁺ ≡ max(f, 0) = (|f| + f)/2,   f⁻ ≡ f⁺ − f = −min(f, 0) = (|f| − f)/2.

Thus f⁺ − f⁻ = f and both f⁺ and f⁻ are nonnegative.

Observation 7.5.8 Note that f⁻ < ∞ since f > −∞. Also note that f = f⁺
when f⁺ > 0 and f⁻ = −f when f⁻ > 0.
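The identities in the definition are easy to sanity-check numerically; this tiny sketch (illustrative only) treats f pointwise as a real number y:

```python
def pos_part(y):
    """f^+ = max(f, 0) = (|f| + f)/2."""
    return max(y, 0.0)

def neg_part(y):
    """f^- = -min(f, 0) = (|f| - f)/2."""
    return -min(y, 0.0)
```

Then pos_part(y) − neg_part(y) equals y and pos_part(y) + neg_part(y) equals |y| for every real y.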
Theorem 7.5.9 Let (Ω, ℱ) be a measurable space. Then f : Ω → ℝ is measurable
if and only if both f⁺ and f⁻ are measurable, if and only if each of these is the
pointwise limit of an increasing sequence of simple functions.
Proof: Suppose first that f is measurable and a ≥ 0. Then from the definition
and the above observation,

(f⁺)⁻¹((a, ∞]) = f⁻¹((a, ∞]) ∈ ℱ.

If a < 0, then from the definition,

(f⁺)⁻¹((a, ∞]) = (f⁺)⁻¹([0, ∞]) = Ω ∈ ℱ.

Thus from Lemma 7.5.1, f⁺ is measurable.
Next consider f⁻. First let a ≥ 0. Then

(f⁻)⁻¹((a, ∞]) = f⁻¹((−∞, −a)) ∈ ℱ.

If a < 0, then

(f⁻)⁻¹((a, ∞]) = (f⁻)⁻¹([0, ∞]) = Ω ∈ ℱ,

and so f⁺, f⁻ are both measurable. The rest, involving the approximating sequences
of simple functions, follows from Theorem 7.5.6.
Conversely, if both f⁺ and f⁻ are measurable, consider the open half plane lying
strictly above the line y = a + x. (The text's picture shows this half plane filled
with open rectangles R_i.) The half plane can be filled with a countable union of
open rectangles of the form R_i = (a_i, b_i) × (c_i, d_i). See Lemma 7.1.5. Then clearly
f(ω) > a if and only if f⁺(ω) > a + f⁻(ω), if and only if the point (f⁻(ω), f⁺(ω))
lies in this half plane, which is the union of countably many open rectangles. Hence

f⁻¹((a, ∞]) = ∪_{i=1}^∞ (f⁺)⁻¹((c_i, d_i)) ∩ (f⁻)⁻¹((a_i, b_i)) ∈ ℱ.
You could also consider the following equation, in which the right side is a
countable union of measurable sets:

[f > a] = ∪_{r∈ℚ} ([f⁺ > a + r] ∩ [f⁻ < r]).

The equation is true because if ω ∈ [f⁺ > a + r] ∩ [f⁻ < r] for some r, then f(ω) =
f⁺(ω) − f⁻(ω) > a + r − r = a. Thus the right side is contained in the left. If
f(ω) > a, then f⁺(ω) > a + f⁻(ω) and so you can choose r a little bigger than
f⁻(ω) such that also f⁺(ω) > a + r. Hence, for that r, ω ∈ [f⁺ > a + r] ∩ [f⁻ < r].
Definition 7.5.10 Let (Ω, ℱ) be a measurable space and let f : Ω → ℂ, the complex
numbers. Then f is said to be measurable if the real and imaginary parts of f
are measurable. Complex simple functions are defined as those which are complex
valued, measurable, and have finitely many values. Thus, by reasoning similar to
the above, every complex simple function is of the form

s(ω) = Σ_{k=1}^n c_k 𝒳_{E_k}(ω)

where the c_k are the distinct values of s and the E_k are disjoint sets.
Lemma 7.5.11 Let s(ω) = Σ_{k=1}^n (a_k + ib_k) 𝒳_{E_k}(ω) where the a_k + ib_k are the
distinct values of s and the E_k are the disjoint sets on which s achieves these
values. Then s is measurable if and only if each E_k is measurable.

Proof: First suppose s is measurable. Then both

Σ_{k=1}^n a_k 𝒳_{E_k}(ω)  and  Σ_{k=1}^n b_k 𝒳_{E_k}(ω)

are measurable, being Re s and Im s. Then E_k = [Re s = a_k] ∩ [Im s = b_k] ∈ ℱ.
Conversely, assume each E_k is measurable. Then Σ_k a_k 𝒳_{E_k} and Σ_k b_k 𝒳_{E_k} are
both measurable because, if h is the first of these,

[h > α] = ∪{E_k : a_k > α} ∈ ℱ.

Similar considerations hold for the second.
Now it is not hard to generalize the above to complex valued functions.

Theorem 7.5.12 f : Ω → ℂ is measurable if and only if there exists a sequence of
complex simple functions {s_j}_{j=1}^∞ such that |s_j| ≤ |f| and s_j(ω) → f(ω) for all ω.
Proof: If f is measurable, the conclusion follows right away from Theorem 7.5.9
applied to the real and imaginary parts of f. Letting h denote one of these real or
imaginary parts, there exist increasing sequences of simple functions {s_k}, {t_k}
converging pointwise to h⁺ and h⁻ respectively. If h(ω) ≥ 0, then h⁻(ω) = 0, so
t_k(ω) = 0. Hence

|h(ω)| = h⁺(ω) ≥ s_k(ω) = |s_k(ω) − t_k(ω)|.

If h(ω) ≤ 0, then h(ω) = −h⁻(ω) and so h⁺(ω) = 0. Hence s_k(ω) = 0 and

|h(ω)| = h⁻(ω) ≥ t_k(ω) = |s_k(ω) − t_k(ω)|.

Either way, |h| ≥ |s_k − t_k| and s_k − t_k → h⁺ − h⁻ = h pointwise.

Thus there exist real simple functions {p_k}, {q_k} converging to Re f and Im f
respectively such that |p_k| ≤ |Re f|, |q_k| ≤ |Im f|. Hence the complex simple
function p_k + iq_k converges pointwise to Re f + i Im f = f and

|p_k + iq_k| = √(p_k² + q_k²) ≤ √((Re f)² + (Im f)²) = |f|.

Conversely, if there exists a sequence of complex simple functions which do what
is expressed above, then their real parts are simple functions converging to the real
part of f and their imaginary parts are simple functions converging to the imaginary
part of f. Then it follows from Theorem 7.5.3 that f is measurable because its real
and imaginary parts are.
Now the following is a major result about continuous functions of complex valued
measurable functions.

Corollary 7.5.13 Let (Ω, ℱ) be a measurable space, let g : ℂⁿ → ℂ be continuous,
and let each f_i : Ω → ℂ be measurable. Then g ◦ f is also measurable, where
f = (f_1, ..., f_n).
Proof: Let {s_j^i}_{j=1}^∞ be a sequence of complex simple functions converging
pointwise to f_i. Let s_j ≡ (s_j^1, ..., s_j^n). Then by continuity of g,

lim_{j→∞} g ◦ s_j = g ◦ f.

It follows that if each g ◦ s_j is measurable, then so is g ◦ f. However, since each s_j
has a simple function in each component, g ◦ s_j has finitely many values, corresponding
to the possible values of the s_j^i. Furthermore, these finitely many values of s_j are
achieved on measurable sets resulting from the finite intersection of the sets on
which the s_j^i are constant. Therefore, g ◦ s_j is measurable because it has finitely
many values which are achieved on measurable sets, and consequently g ◦ f is also
measurable because it is a limit of these measurable functions.
What this says is that you can do pretty much any reasonable algebraic
manipulation to measurable functions and end up with a measurable function.
In particular, linear combinations, products, absolute values, etc. of measurable
functions are all measurable.
Theorem 7.5.14 (Egoroff) Let (Ω, ℱ, µ) be a finite measure space (µ(Ω) < ∞)
and let f_n, f be complex valued functions such that Re f_n, Im f_n are all measurable
and

lim_{n→∞} f_n(ω) = f(ω)

for all ω ∉ E, where µ(E) = 0. Then for every ε > 0 there exists a set F with

F ⊇ E,  µ(F) < ε,

such that f_n converges uniformly to f on F^C.
Proof: First suppose E = ∅, so that convergence is pointwise everywhere. It
follows then that Re f and Im f are pointwise limits of measurable functions and
are therefore measurable. Let

E_{km} = {ω ∈ Ω : |f_n(ω) − f(ω)| ≥ 1/m for some n > k}.

Note that

|f_n(ω) − f(ω)| = √((Re f_n(ω) − Re f(ω))² + (Im f_n(ω) − Im f(ω))²)

and so, by Corollary 7.5.13, [|f_n − f| ≥ 1/m] is measurable. Hence E_{km} is
measurable because

E_{km} = ∪_{n=k+1}^∞ [|f_n − f| ≥ 1/m].

For fixed m, ∩_{k=1}^∞ E_{km} = ∅ because f_n converges to f. Therefore, if ω ∈ Ω, there
exists k such that if n > k, then |f_n(ω) − f(ω)| < 1/m, which means ω ∉ E_{km}. Note
also that

E_{km} ⊇ E_{(k+1)m}.

Since µ(E_{1m}) < ∞, Theorem 7.3.4 on Page 168 implies

0 = µ(∩_{k=1}^∞ E_{km}) = lim_{k→∞} µ(E_{km}).

Let k(m) be chosen such that µ(E_{k(m)m}) < ε2^{−m} and let

F = ∪_{m=1}^∞ E_{k(m)m}.

Then µ(F) < ε because

µ(F) ≤ Σ_{m=1}^∞ µ(E_{k(m)m}) < Σ_{m=1}^∞ ε2^{−m} = ε.

Now let η > 0 be given and pick m_0 such that 1/m_0 < η. If ω ∈ F^C, then

ω ∈ ∩_{m=1}^∞ (E_{k(m)m})^C.

In particular, ω ∈ (E_{k(m_0)m_0})^C, so

|f_n(ω) − f(ω)| < 1/m_0 < η

for all n > k(m_0). This holds for all ω ∈ F^C and so f_n converges uniformly to f on
F^C.

Now if E ≠ ∅, consider {𝒳_{E^C} f_n}_{n=1}^∞. Each 𝒳_{E^C} f_n has measurable real and
imaginary parts, and the sequence converges pointwise to 𝒳_{E^C} f everywhere.
Therefore, from the first part, there exists a set F of measure less than ε such that
on F^C, {𝒳_{E^C} f_n} converges uniformly to 𝒳_{E^C} f. Therefore, on (E ∪ F)^C, {f_n}
converges uniformly to f.
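Egoroff’s theorem can be seen concretely for f_n(x) = xⁿ on [0, 1) with Lebesgue measure, where x^n → 0 pointwise but not uniformly. In this case E_{km} = {x : xⁿ ≥ 1/m for some n > k} is exactly the interval [(1/m)^{1/(k+1)}, 1), so its measure has a closed form. The following numeric sketch, with hypothetical helper names, is an illustration and not from the text:

```python
import math

def E_km_measure(k, m):
    """Lebesgue measure of E_km = {x in [0,1) : x^n >= 1/m for some n > k}
    for f_n(x) = x^n.  Since x^n decreases in n when 0 <= x < 1, this set is
    {x : x^(k+1) >= 1/m} = [(1/m)^(1/(k+1)), 1)."""
    return 1.0 - (1.0 / m) ** (1.0 / (k + 1))

def k_of_m(m, eps):
    """A k large enough that E_km_measure(k, m) < eps * 2**(-m), as in the
    choice of k(m) in the proof.  Closed form, with 1% slack so the strict
    inequality also survives floating-point rounding."""
    if m == 1:
        return 0  # x^n >= 1 never holds on [0, 1), so E_k1 is empty
    delta = eps * 2.0 ** (-m)
    return math.ceil(1.01 * math.log(m) / -math.log1p(-delta)) + 1
```

The union F = ∪_m E_{k(m)m} then has measure below ε, and off F the convergence of xⁿ to 0 is uniform.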
7.6 One Dimensional Lebesgue Stieltjes Measure

Now with these major results about measures, it is time to specialize to the outer
measure of Theorem 7.2.3. The next theorem gives Lebesgue Stieltjes measure on
ℝ. The conditions 7.6.13 and 7.6.14 given below are known respectively as inner
and outer regularity.

Theorem 7.6.1 Let 𝒮 denote the σ algebra of Theorem 7.4.4, associated with the
outer measure µ in Theorem 7.2.3, on which µ is a measure. Then every open
interval is in 𝒮. So are all open and closed sets. Furthermore, if E is any set in 𝒮,

µ(E) = sup {µ(K) : K a closed and bounded set, K ⊆ E}   (7.6.13)
µ(E) = inf {µ(V) : V an open set, V ⊇ E}   (7.6.14)
Proof: The first task is to show (a, b) ∈ 𝒮. I need to show that for every S ⊆ ℝ,

µ(S) ≥ µ(S ∩ (a, b)) + µ(S ∩ (a, b)^C).   (7.6.15)

Suppose first S is an open interval (c, d). If (c, d) has empty intersection with (a, b)
or is contained in (a, b), there is nothing to prove; the above expression reduces to
nothing more than µ(S) = µ(S). Suppose next that (c, d) ⊇ (a, b). In this case,
the right side of the above reduces to

µ((a, b)) + µ((c, a] ∪ [b, d))
  ≤ F(b−) − F(a+) + F(a+) − F(c+) + F(d−) − F(b−)
  = F(d−) − F(c+) ≡ µ((c, d)).

The only other cases are c ≤ a < d ≤ b and a ≤ c < b ≤ d. Consider the first of
these. The right side of 7.6.15 for S = (c, d) is

µ((a, d)) + µ((c, a]) = F(d−) − F(a+) + F(a+) − F(c+)
                      = F(d−) − F(c+) = µ((c, d)).

The last case is entirely similar. Thus 7.6.15 holds whenever S is an open interval.

Now it is clear 7.6.15 also holds if µ(S) = ∞. Suppose then that µ(S) < ∞ and let

S ⊆ ∪_{k=1}^∞ (a_k, b_k)

be a cover such that

µ(S) + ε > Σ_{k=1}^∞ (F(b_k−) − F(a_k+)) = Σ_{k=1}^∞ µ((a_k, b_k)).
Then, since µ is an outer measure, and using what was just shown,

µ(S ∩ (a, b)) + µ(S ∩ (a, b)^C)
  ≤ µ(∪_{k=1}^∞ (a_k, b_k) ∩ (a, b)) + µ(∪_{k=1}^∞ (a_k, b_k) ∩ (a, b)^C)
  ≤ Σ_{k=1}^∞ (µ((a_k, b_k) ∩ (a, b)) + µ((a_k, b_k) ∩ (a, b)^C))
  ≤ Σ_{k=1}^∞ µ((a_k, b_k)) ≤ µ(S) + ε.

Since ε is arbitrary, this shows 7.6.15 holds for any S, and so any open interval is
in 𝒮.
It follows that any open set is in 𝒮. This follows from Theorem 4.3.10, which implies
that if U is open, it is the countable union of disjoint open intervals. Since each of
these open intervals is in 𝒮 and 𝒮 is a σ algebra, their union is also in 𝒮. It follows
every closed set is in 𝒮 also. This is because 𝒮 is a σ algebra and if a set is in 𝒮,
then so is its complement; the closed sets are those which are complements of open
sets.
The assertion of outer regularity is not hard to get. Letting E be any set with
µ(E) < ∞, there exist open intervals {(a_i, b_i)}_{i=1}^∞ covering E such that

µ(E) + ε > Σ_{i=1}^∞ (F(b_i−) − F(a_i+)) = Σ_{i=1}^∞ µ((a_i, b_i)) ≥ µ(V)

where V is the union of the open intervals just mentioned. Thus

µ(E) ≤ µ(V) ≤ µ(E) + ε.

This shows outer regularity. If µ(E) = ∞, there is nothing to show.
Now consider the assertion of inner regularity, 7.6.13. Suppose I is a closed and
bounded interval and E ⊆ I with E ∈ 𝒮. By outer regularity, there exists an open V
containing I ∩ E^C such that

µ(I ∩ E^C) + ε > µ(V).

Then, since µ is additive on 𝒮, it follows that µ(V \ (I ∩ E^C)) < ε. Now K ≡ V^C ∩ I
is a compact subset of E. Also,

E \ (V^C ∩ I) = E ∩ V = V \ E^C ⊆ V \ (I ∩ E^C),

a set of measure less than ε. Therefore,

µ(V^C ∩ I) + ε ≥ µ(V^C ∩ I) + µ(E \ (V^C ∩ I)) = µ(E),

so the desired conclusion holds in the case where E is contained in a compact
interval.
Now suppose E ∈ 𝒮 is arbitrary and let l < µ(E). Then, choosing ε small enough,
l + ε < µ(E) also. Letting E_n ≡ E ∩ [−n, n], it follows from Theorem 7.3.4 that
for n large enough, µ(E_n) > l + ε. Now from what was just shown, there exists
K ⊆ E_n such that µ(K) + ε > µ(E_n). Hence µ(K) > l. This shows 7.6.13.
Definition 7.6.2 When the integrator function is F(x) = x, the Lebesgue
Stieltjes measure just discussed is known as one dimensional Lebesgue measure
and is denoted by m.

Proposition 7.6.3 For Lebesgue measure m, m([a, b]) = m((a, b)) = b − a. Also,
m is translation invariant in the sense that if E is any Lebesgue measurable set,
then m(x + E) = m(E).
Proof: The formula for the measure of an interval comes right away from
Theorem 7.2.3. From this, it follows right away that whenever E is an interval,
m(x + E) = m(E). Every open set is the countable disjoint union of open intervals,
so if E is an open set, then m(x + E) = m(E). What about closed sets? First
suppose H is a closed and bounded set. Then letting (−n, n) ⊇ H,

m(((−n, n) \ H) + x) + m(H + x) = m((−n, n) + x).

Hence, from what was just shown about open sets,

m(H) = m((−n, n)) − m((−n, n) \ H)
     = m((−n, n) + x) − m(((−n, n) \ H) + x) = m(H + x).

Therefore, the translation invariance holds for closed and bounded sets. If H is an
arbitrary closed set, then

m(H + x) = lim_{n→∞} m((H ∩ [−n, n]) + x) = lim_{n→∞} m(H ∩ [−n, n]) = m(H).

It follows right away that if G is the countable intersection of open sets (a G_δ set,
pronounced "g delta" set), then

m((G ∩ (−n, n)) + x) = m(G ∩ (−n, n)).

Now taking n → ∞, m(G + x) = m(G). Similarly, if F is the countable union of
compact sets (an F_σ set, pronounced "F sigma" set), then m(F + x) = m(F). Now,
using Theorem 7.6.1, if E is an arbitrary measurable set, there exist an F_σ set F
and a G_δ set G such that F ⊆ E ⊆ G and m(F) = m(G) = m(E). Then

m(F) = m(x + F) ≤ m(x + E) ≤ m(x + G) = m(G) = m(E) = m(F).
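For finite disjoint unions of intervals, the building blocks in this proof, translation invariance is immediate. A toy sketch (illustrative only), with intervals represented as pairs:

```python
def total_length(intervals):
    """Measure of a disjoint union of intervals, given as (a, b) pairs."""
    return sum(b - a for a, b in intervals)

def translate(intervals, x):
    """The set E + x, interval by interval."""
    return [(a + x, b + x) for a, b in intervals]
```

Here total_length(translate(E, x)) equals total_length(E) for any x, mirroring m(x + E) = m(E).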
7.7 Exercises

1. ♠ Let 𝒞 be a set whose elements are σ algebras of subsets of Ω. Show ∩𝒞 is a
   σ algebra also.
2. Let Ω be any set. Show 𝒫(Ω), the set of all subsets of Ω, is a σ algebra. Now
   let ℒ denote some subset of 𝒫(Ω). Consider all σ algebras which contain ℒ.
   Show the intersection of all these σ algebras which contain ℒ is a σ algebra
   containing ℒ and is the smallest σ algebra containing ℒ, denoted by σ(ℒ).
   When Ω is a normed vector space and ℒ consists of the open sets, σ(ℒ) is
   called the σ algebra of Borel sets.

3. Show that for (Ω, ℱ) a measurable space, f : Ω → ℂ is measurable if and
   only if f⁻¹(open sets) ∈ ℱ. Hint: Note that any open set is composed of
   countably many rectangles of the form (a, b) + i(c, d).
4. ♠ Consider Ω = [0, 1] and let 𝒮 denote all subsets F of [0, 1] such that either
   F^C or F is countable. Note the empty set must be countable. Show 𝒮 is a
   σ algebra. (This is a sick σ algebra.) Now let µ : 𝒮 → [0, ∞] be defined by
   µ(F) = 1 if F^C is countable and µ(F) = 0 if F is countable. Show µ is a
   measure on 𝒮.

5. ♠ Let Ω = ℕ, the positive integers, and let a σ algebra be given by ℱ = 𝒫(ℕ),
   the set of all subsets of ℕ. What are the measurable functions having values
   in ℂ? Let µ(E) be the number of elements of E where E is a subset of ℕ.
   Show µ is a measure.
6. ♠ Let ℱ be a σ algebra of subsets of Ω and suppose ℱ has infinitely many
   elements. Show that ℱ is uncountable. Hint: You might try to show there
   exists a countable sequence of disjoint sets of ℱ, {A_i}. It might be easiest to
   verify this by contradiction if it doesn’t exist rather than by a direct construction;
   however, I have seen this done several ways. Once this has been done, you
   can define a map θ from 𝒫(ℕ) into ℱ which is one to one by θ(S) = ∪_{i∈S} A_i.
   Then argue 𝒫(ℕ) is uncountable and so ℱ is also uncountable.
7. ♠ A probability space is a measure space (Ω, ℱ, P) where the measure P has
   the property that P(Ω) = 1. Such a measure is called a probability measure.
   Random vectors are measurable functions X mapping a probability space
   (Ω, ℱ, P) to ℝⁿ. Thus X(ω) ∈ ℝⁿ for each ω ∈ Ω and P is a probability
   measure defined on the sets of ℱ, a σ algebra of subsets of Ω. For E a Borel
   set in ℝⁿ, define

   µ(E) ≡ P(X⁻¹(E)) ≡ probability that X ∈ E.

   Show this is a well defined probability measure on the Borel sets of ℝⁿ. Thus
   µ(E) = P(X(ω) ∈ E). It is called the distribution.
8. ♠ Let E be a countable subset of ℝ. Show m(E) = 0. Hint: Let the set be
   {e_i}_{i=1}^∞ and let e_i be the center of an open interval of length ε/2^i.

9. ↑ ♠ If S is an uncountable set of irrational numbers, is it necessary that S has
   a rational number as a limit point? Hint: Consider the proof of Problem 8
   when applied to the rational numbers. (This problem was shown to me by
   Lee Erlebach.)
10. ♠ Let µ be a finite measure on the Borel sets of ℝⁿ. Show that µ must be
    regular. Hint: You might let ℱ denote those Borel sets for which µ is regular,
    show the open sets are in ℱ, and show that ℱ is a σ algebra.

11. ♠ If K is a compact subset of an open set V where K, V are in ℝⁿ, show there
    exists a continuous function f : ℝⁿ → [0, 1] such that f(x) = 1 on K and
    spt(f) ≡ {x : f(x) ≠ 0}, referred to as the support of f, is a compact subset
    of V.

12. ♠ Suppose (Ω, ℱ, µ) is a measure space and {E_i} ⊆ ℱ. Suppose also that
    Σ_{i=1}^∞ µ(E_i) < ∞. Let N consist of all the points of Ω which are in infinitely
    many of the sets E_i. Show that µ(N) = 0.
13. ↑ ♠ Suppose µ is a finite regular measure defined on ℬ(ℝⁿ) and E is a Borel
    set. Let 𝒳_E denote the indicator function of E, defined as

    𝒳_E(x) ≡ 1 if x ∈ E, 0 if x ∉ E.

    Show there exists a sequence {f_k} of continuous functions having compact
    supports such that f_k(x) → 𝒳_E(x) for a.e. x. Recall from the above problems
    that the support of f is defined as

    spt(f) ≡ {x : f(x) ≠ 0}.
14. ♠ Consider the following nested sequence of compact sets {P_n}. We let P_1 =
    [0, 1], P_2 = [0, 1/3] ∪ [2/3, 1], etc. To go from P_n to P_{n+1}, delete the open
    interval which is the middle third of each closed interval in P_n. Let P = ∩_{n=1}^∞ P_n.
    Since P is the intersection of nested nonempty compact sets, it follows from
    advanced calculus that P ≠ ∅. Show m(P) = 0. Show there is a one to
    one onto mapping of [0, 1] to P. The set P is called the Cantor set. Thus,
    although P has measure zero, it has the same number of points in it as [0, 1]
    in the sense that there is a one to one and onto mapping from one to the
    other. Hint: There are various ways of doing this last part but the most
    enlightenment is obtained by exploiting the construction of the Cantor set.
    You might also work the next problem to get a way to do it.
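A numeric sketch of the first claim, m(P) = 0: the stage P_n is a union of 2^{n−1} closed intervals of total length (2/3)^{n−1}, so the lengths shrink to zero. The helper names here are illustrative, not from the text, and exact rationals avoid rounding:

```python
from fractions import Fraction

def delete_middle_thirds(intervals):
    """One construction step: remove the open middle third of each [a, b]."""
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out.append((a, a + third))
        out.append((b - third, b))
    return out

def stage_measure(n):
    """Total length of P_n, starting from P_1 = [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n - 1):
        intervals = delete_middle_thirds(intervals)
    return sum(b - a for a, b in intervals)
```

Since P ⊆ P_n for every n, m(P) ≤ (2/3)^{n−1} → 0.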
15. ↑ ♠ Consider the sequence of functions defined in the following way. Let
    f_1(x) = x on [0, 1]. To get from f_n to f_{n+1}, let f_{n+1} = f_n on all intervals where
    f_n is constant. If f_n is nonconstant on [a, b], let f_{n+1}(a) = f_n(a), f_{n+1}(b) =
    f_n(b), with f_{n+1} piecewise linear and equal to (f_n(a) + f_n(b))/2 on the middle
    third of [a, b]. Sketch a few of these and you will see the pattern. (The text
    illustrates with a picture, not reproduced here, how a nonconstant section of
    the graph is modified.)
    Show {f_n} converges uniformly on [0, 1]. If f(x) = lim_{n→∞} f_n(x), show that
    f(0) = 0, f(1) = 1, f is continuous, and f′(x) = 0 for all x ∉ P where P is the
    Cantor set. This function is called the Cantor function. It is a very important
    example to remember. Note it has derivative equal to zero a.e. and yet it
    succeeds in climbing from 0 to 1. Thus it would seem that ∫₀¹ f′(t) dt = 0 ≠
    f(1) − f(0). Is this somehow contradictory to the fundamental theorem of
    calculus? Hint: This isn’t too hard if you focus on getting a careful estimate
    on the difference between two successive functions in the list, considering only
    a typical small interval in which the change takes place. The picture should
    be helpful.
16. ♠ Let m(W) > 0 where W is measurable and W ⊆ [a, b]. Show there exists a
    nonmeasurable subset of W. Hint: Let x ∼ y if x − y ∈ ℚ. Observe that ∼ is an
    equivalence relation on ℝ. See Definition 1.1.9 on Page 20 for a review of this
    terminology. Let 𝒞 be the set of equivalence classes and let ℱ ≡ {C ∩ W : C ∈ 𝒞
    and C ∩ W ≠ ∅}. By the axiom of choice, there exists a set A consisting of
    exactly one point from each of the nonempty sets which are the elements of
    ℱ. Show

    W ⊆ ∪_{r∈ℚ} (A + r),   (a.)
    (A + r_1) ∩ (A + r_2) = ∅ if r_1 ≠ r_2, r_i ∈ ℚ.   (b.)

    Observe that since A ⊆ [a, b], then A + r ⊆ [a − 1, b + 1] whenever |r| < 1. Use
    this to show that a contradiction results whether m(A) = 0 or m(A) > 0. Show
    there exists some set S such that m̄(S) < m̄(S ∩ A) + m̄(S \ A), where m̄ is
    the outer measure determined by m.
17. ↑ ♠ This problem gives a very interesting example found in the book by
    McShane [36]. Let g(x) = x + f(x) where f is the strange function of Problem
    15. Let P be the Cantor set of Problem 14 and let [0, 1] \ P = ∪_{j=1}^∞ I_j where
    each I_j is open and I_j ∩ I_k = ∅ if j ≠ k. These intervals are the connected
    components of the complement of the Cantor set. Show m(g(I_j)) = m(I_j), so

    m(g(∪_{j=1}^∞ I_j)) = Σ_{j=1}^∞ m(g(I_j)) = Σ_{j=1}^∞ m(I_j) = 1.

    Thus m(g(P)) = 1 because g([0, 1]) = [0, 2]. By Problem 16 there exists a set
    A ⊆ g(P) which is nonmeasurable. Define φ(x) = 𝒳_A(g(x)). Thus φ(x) = 0
    unless x ∈ P. Tell why φ is measurable. (Recall m(P) = 0 and Lebesgue
    measure is complete.) Now show that 𝒳_A(y) = φ(g⁻¹(y)) for y ∈ [0, 2]. Tell
    why g⁻¹ is continuous but φ ◦ g⁻¹ is not measurable. (This is an example of
    measurable ◦ continuous ≠ measurable.) Show there exist Lebesgue measurable
    sets which are not Borel measurable. Hint: The function φ is Lebesgue
    measurable. Now show that Borel ◦ measurable = measurable.
18. ♠ Let K be a compact subset of ℝ having no isolated points. Show that there
    exists an increasing continuous function g such that g is constant on every
    connected component of K^C and has values between 0 and 1. If J, L are two
    components with J < L, then the value of g on J is strictly less than its value on
    L. Hint: Let the components be {(a_k, b_k)}. Let a be the first point of K and
    b be the last. Let g_0 be piecewise linear, increasing and continuous, going from
    0 to the left of a to 1 to the right of b. Let g_1 equal (g_0(a_1) + g_0(b_1))/2 on
    (a_1, b_1), and adjust to make it piecewise linear and increasing, going from 0 to 1.
    Next adjust g_1 in a similar way to make it constant on (a_2, b_2). Continue this
    way. Estimate ||g_k − g_{k−1}||_∞ in terms of g_{k−1}(b_k) − g_{k−1}(a_k), and observe
    and use that the intervals (g_{k−1}(a_k), g_{k−1}(b_k)) are disjoint.
19. ↑ ♠ Show that if K is any compact subset of ℝ which has no isolated points,
    there exists a Borel measure µ with the properties µ(K) = 1, µ(E) =
    µ(E ∩ K), and, if H is a proper compact subset of K, then µ(H) < 1. Also,
    µ({p}) = 0 whenever p is a point.

20. ↑ ♠ Let K be an arbitrary compact subset of ℝ. Then there exists a Borel
    measure µ with the properties µ(K) = 1, µ(E) = µ(E ∩ K), and, if H is a
    proper compact subset of K, then µ(H) < 1.

21. Suppose you consider the closed upper half plane determined by the line y =
    x. Can it be covered with countably many rectangles of the form [a, b] × [c, d],
    each of which is contained in the upper half plane? Hint: You must cover
    the points on the line y = x.
The Abstract Lebesgue Integral

The general Lebesgue integral requires a measure space (Ω, ℱ, µ) and, to begin with,
a nonnegative measurable function. I will use Lemma 1.3.3 about interchanging two
supremums frequently. Also, I will use the observation that if {a_n} is an increasing
sequence of points of [0, ∞], then sup_n a_n = lim_{n→∞} a_n, which is obvious from the
definition of sup.
8.1 Definition For Nonnegative Measurable Functions

8.1.1 Riemann Integrals For Decreasing Functions

First of all, the notation

[g < f]

is short for

{ω ∈ Ω : g(ω) < f(ω)},

with other variants of this notation being similar. Also, the convention 0 · ∞ = 0
will be used to simplify the presentation whenever it is convenient to do so.
Definition 8.1.1 Let f : [a, b] → [0, ∞] be decreasing. Define

∫_a^b f(λ) dλ ≡ lim_{M→∞} ∫_a^b M ∧ f(λ) dλ = sup_M ∫_a^b M ∧ f(λ) dλ,

where a ∧ b means the minimum of a and b. Note that for f bounded,

sup_M ∫_a^b M ∧ f(λ) dλ = ∫_a^b f(λ) dλ,

where the integral on the right is the usual Riemann integral, because eventually
M > f. For f a nonnegative decreasing function defined on [0, ∞),

∫_0^∞ f dλ ≡ lim_{R→∞} ∫_0^R f dλ = sup_{R>1} ∫_0^R f dλ = sup_R sup_{M>0} ∫_0^R f ∧ M dλ.

Since decreasing bounded functions are Riemann integrable, the above definition
is well defined. Now here are some obvious properties.
Lemma 8.1.2 Let f be a decreasing nonnegative function defined on an interval
[a, b]. Then if [a, b] = ∪_{k=1}^m I_k, where I_k ≡ [a_k, b_k] and the intervals I_k are
non-overlapping, it follows that

∫_a^b f dλ = Σ_{k=1}^m ∫_{a_k}^{b_k} f dλ.

Proof: This follows from the computation

∫_a^b f dλ ≡ lim_{M→∞} ∫_a^b f ∧ M dλ
          = lim_{M→∞} Σ_{k=1}^m ∫_{a_k}^{b_k} f ∧ M dλ = Σ_{k=1}^m ∫_{a_k}^{b_k} f dλ.

Note that both sides could equal +∞.
8.1.2 The Lebesgue Integral For Nonnegative Functions
Here is the deﬁnition of the Lebesgue integral of a function which is measurable
and has values in [0, ∞].
Definition 8.1.3 Let (Ω, F, µ) be a measure space and suppose f : Ω → [0, ∞] is measurable. Then define
∫ f dµ ≡ ∫_0^∞ µ([f > λ]) dλ
which makes sense because λ → µ([f > λ]) is nonnegative and decreasing.
Note that if f ≤ g, then ∫ f dµ ≤ ∫ g dµ because µ([f > λ]) ≤ µ([g > λ]).
For convenience, Σ_{i=1}^0 a_i ≡ 0.
Lemma 8.1.4 In the situation of the above definition,
∫ f dµ = sup_{h>0} Σ_{i=1}^∞ µ([f > ih]) h
Proof: Let m(h, R) satisfy R − h < h m(h, R) ≤ R. Then lim_{R→∞} m(h, R) = ∞ and so
∫ f dµ ≡ ∫_0^∞ µ([f > λ]) dλ = sup_M sup_R ∫_0^{h m(h,R)} µ([f > λ]) ∧ M dλ
= sup_M sup_{R>0} sup_{h>0} Σ_{k=1}^{m(h,R)} (µ([f > kh]) ∧ M) h
The sum is just a lower sum for the integral ∫_0^{h m(h,R)} µ([f > λ]) ∧ M dλ. Hence, switching the order of the sups, this equals
sup_{h>0} sup_R Σ_{k=1}^{m(h,R)} µ([f > kh]) h = sup_{h>0} Σ_{k=1}^∞ µ([f > kh]) h.
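On a finite measure space, Lemma 8.1.4 can be watched directly: the layer sum Σ_k µ([f > kh]) h is a lower approximation of ∫ f dµ that improves as h shrinks. A small sketch (the weights and values below are made up for illustration):

```python
# Sketch of Lemma 8.1.4 on a finite measure space Ω = {0,...,4}.  The layer
# sum Σ_{k≥1} µ([f > kh]) h is a lower approximation of ∫ f dµ = Σ_ω f(ω) µ({ω})
# and approaches it as h → 0; its sup over h > 0 is the integral.

mu_w = {0: 0.5, 1: 2.0, 2: 1.0, 3: 0.25, 4: 3.0}   # µ({ω}), made-up weights
f = {0: 1.3, 1: 0.0, 2: 2.7, 3: 5.0, 4: 0.4}        # f(ω) ≥ 0, made-up values

exact = sum(f[w] * mu_w[w] for w in mu_w)           # ∫ f dµ

def layer_sum(h):
    """Σ_{k=1}^∞ µ([f > kh]) h; only finitely many terms since f is bounded."""
    total, k = 0.0, 1
    while k * h < max(f.values()):
        total += sum(mu_w[w] for w in mu_w if f[w] > k * h) * h
        k += 1
    return total

approx = layer_sum(1e-4)
print(exact, approx)
```

The layer sum stays below the exact integral but is within about h of it, which is the content of the sup formula.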
8.2 The Lebesgue Integral For Nonnegative Simple Functions
To begin with, here is a useful lemma.
Lemma 8.2.1 If f (λ) = 0 for all λ > a, where f is a decreasing nonnegative
function, then
∫_0^∞ f(λ) dλ = ∫_0^a f(λ) dλ.
Proof: From the deﬁnition,
∫_0^∞ f(λ) dλ = lim_{R→∞} ∫_0^R f(λ) dλ = sup_{R>1} ∫_0^R f(λ) dλ
= sup_{R>1} sup_M ∫_0^R f(λ) ∧ M dλ
= sup_M sup_{R>1} ∫_0^R f(λ) ∧ M dλ
= sup_M sup_{R>1} ∫_0^a f(λ) ∧ M dλ
= sup_M ∫_0^a f(λ) ∧ M dλ ≡ ∫_0^a f(λ) dλ.
Now that the Lebesgue integral of a nonnegative function has been defined, what does it do to a nonnegative simple function? Recall a nonnegative simple function is one which assumes finitely many nonnegative real values on measurable sets. Thus a simple function can be written in the form
s(ω) = Σ_{i=1}^n c_i X_{E_i}(ω)
where the c_i, the distinct values of s, are each nonnegative.
Lemma 8.2.2 Let s(ω) = Σ_{i=1}^p a_i X_{E_i}(ω) be a nonnegative simple function where the E_i are distinct but the a_i might not be. Then
∫ s dµ = Σ_{i=1}^p a_i µ(E_i).   (8.2.1)
Proof: Without loss of generality, assume 0 ≡ a_0 < a_1 ≤ a_2 ≤ ··· ≤ a_p and that µ(E_i) < ∞ for i > 0. Here is why. If µ(E_i) = ∞, then letting a ∈ (a_{i−1}, a_i), by Lemma 8.2.1, the left side would be
∫_0^{a_p} µ([s > λ]) dλ ≥ ∫_{a_0}^{a_i} µ([s > λ]) dλ ≡ sup_M ∫_0^{a_i} µ([s > λ]) ∧ M dλ ≥ sup_M M a_i = ∞
and so both sides are equal to ∞. Thus it can be assumed that for each i, µ(E_i) < ∞.
Then it follows from Lemma 8.2.1 and Lemma 8.1.2,
∫_0^∞ µ([s > λ]) dλ = ∫_0^{a_p} µ([s > λ]) dλ = Σ_{k=1}^p ∫_{a_{k−1}}^{a_k} µ([s > λ]) dλ
= Σ_{k=1}^p (a_k − a_{k−1}) Σ_{i=k}^p µ(E_i) = Σ_{i=1}^p µ(E_i) Σ_{k=1}^i (a_k − a_{k−1}) = Σ_{i=1}^p a_i µ(E_i)
Lemma 8.2.3 If a, b ≥ 0 and if s and t are nonnegative simple functions, then
∫ (as + bt) dµ = a ∫ s dµ + b ∫ t dµ.
Proof: Let
s(ω) = Σ_{i=1}^n α_i X_{A_i}(ω),   t(ω) = Σ_{j=1}^m β_j X_{B_j}(ω)
where the α_i are the distinct values of s and the β_j are the distinct values of t. Clearly as + bt is a nonnegative simple function because it has finitely many values on measurable sets. In fact,
(as + bt)(ω) = Σ_{j=1}^m Σ_{i=1}^n (aα_i + bβ_j) X_{A_i ∩ B_j}(ω)
where the sets A_i ∩ B_j are disjoint and measurable. By Lemma 8.2.2,
∫ (as + bt) dµ = Σ_{j=1}^m Σ_{i=1}^n (aα_i + bβ_j) µ(A_i ∩ B_j)
= a Σ_{i=1}^n Σ_{j=1}^m α_i µ(A_i ∩ B_j) + b Σ_{j=1}^m Σ_{i=1}^n β_j µ(A_i ∩ B_j)
= a Σ_{i=1}^n α_i µ(A_i) + b Σ_{j=1}^m β_j µ(B_j)
= a ∫ s dµ + b ∫ t dµ.
8.3 The Monotone Convergence Theorem
The following is called the monotone convergence theorem. This theorem and related convergence theorems are the reason for using the Lebesgue integral.
Theorem 8.3.1 (Monotone Convergence theorem) Let f have values in [0, ∞] and suppose {f_n} is a sequence of nonnegative measurable functions having values in [0, ∞] and satisfying
lim_{n→∞} f_n(ω) = f(ω) for each ω,
f_n(ω) ≤ f_{n+1}(ω).
Then f is measurable and
∫ f dµ = lim_{n→∞} ∫ f_n dµ.
Proof: By Lemma 8.1.4,
lim_{n→∞} ∫ f_n dµ = sup_n ∫ f_n dµ = sup_n sup_{h>0} Σ_{k=1}^∞ µ([f_n > kh]) h = sup_{h>0} sup_N sup_n Σ_{k=1}^N µ([f_n > kh]) h
= sup_{h>0} sup_N Σ_{k=1}^N µ([f > kh]) h = sup_{h>0} Σ_{k=1}^∞ µ([f > kh]) h = ∫ f dµ,
where sup_n µ([f_n > kh]) = µ([f > kh]) because the sets [f_n > kh] increase to [f > kh].
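A concrete instance of the monotone convergence theorem: on ℕ with counting measure, ∫ f dµ = Σ_k f(k), and the truncations f_n = f · X_{{1,...,n}} increase pointwise to f, so the partial sums must increase to the full sum. A sketch:

```python
# Sketch of the monotone convergence theorem on (ℕ, counting measure), where
# ∫ f dµ = Σ_k f(k).  With f(k) = 1/k² and f_n = f·X_{{1,...,n}}, the f_n
# increase pointwise to f, and ∫ f_n dµ (a partial sum) increases to
# ∫ f dµ = π²/6.

import math

def integral(g, support):                 # ∫ g dµ, µ = counting measure
    return sum(g(k) for k in support)

f = lambda k: 1.0 / k ** 2
partial = [integral(f, range(1, n + 1)) for n in (10, 100, 10_000)]
limit = math.pi ** 2 / 6
print(partial, limit)
```

The partial sums climb monotonically toward π²/6, exactly as the theorem demands.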
To illustrate what goes wrong without the Lebesgue integral, consider the following example.
Example 8.3.2 Let {r_n} denote the rational numbers in [0, 1] and let
f_n(t) ≡ 1 if t ∈ {r_1, ···, r_n}, 0 otherwise.
Then f_n(t) ↑ f(t) where f is the function which is one on the rationals and zero on the irrationals. Each f_n is Riemann integrable (why?) but f is not Riemann integrable. Therefore, you can't write
∫ f dx = lim_{n→∞} ∫ f_n dx.
A metamathematical observation related to this type of example is this. If you
can choose your functions, you don’t need the Lebesgue integral. The Riemann Darboux integral is just fine. It is when you can’t choose your functions and they come
to you as pointwise limits that you really need the superior Lebesgue integral or
at least something more general than the Riemann integral. The Riemann integral
is entirely adequate for evaluating the seemingly endless lists of boring problems
found in calculus books. It is shown later that the two integrals coincide when the
Lebesgue integral is taken with respect to Lebesgue measure and the function being
integrated is Riemann integrable.
8.4 Other Deﬁnitions
To review and summarize the above, if f ≥ 0 is measurable,
∫ f dµ ≡ ∫_0^∞ µ([f > λ]) dλ.   (8.4.2)
Another way to get the same thing for ∫ f dµ is to take an increasing sequence of nonnegative simple functions {s_n} with s_n(ω) → f(ω) and then, by the monotone convergence theorem,
∫ f dµ = lim_{n→∞} ∫ s_n dµ
where if s_n(ω) = Σ_{i=1}^m c_i X_{E_i}(ω),
∫ s_n dµ = Σ_{i=1}^m c_i µ(E_i).
Similarly this also shows that for such a nonnegative measurable function,
∫ f dµ = sup { ∫ s : 0 ≤ s ≤ f, s simple }.
Here is an equivalent deﬁnition of the integral of a nonnegative measurable function.
The fact it is well deﬁned has been discussed above.
Definition 8.4.1 For s a nonnegative simple function, s(ω) = Σ_{k=1}^n c_k X_{E_k}(ω),
∫ s = Σ_{k=1}^n c_k µ(E_k).
For f a nonnegative measurable function,
∫ f dµ = sup { ∫ s : 0 ≤ s ≤ f, s simple }.
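The sup-over-simple-functions formula can be illustrated with the standard dyadic simple functions s_n = min(⌊2ⁿf⌋/2ⁿ, n), which satisfy 0 ≤ s_n ≤ f and increase to f. The sketch below replaces Lebesgue measure on [0, 1] by a fine grid of equal atoms, which is only an approximation, not the real measure:

```python
# Sketch of ∫ f dµ = sup{∫ s dµ : 0 ≤ s ≤ f, s simple} via the standard dyadic
# simple functions s_n = min(⌊2ⁿ f⌋/2ⁿ, n).  Lebesgue measure on [0,1] is
# crudely replaced by a grid of 100000 equal atoms; f(x) = x, so ∫ f dµ = 1/2.

N = 100_000
atoms = [(i + 0.5) / N for i in range(N)]       # atom centers, each of measure 1/N

def integral(g):
    return sum(g(x) for x in atoms) / N

def s_n(n):
    # a simple function: finitely many values, and s_n ≤ f pointwise
    return lambda x: min(int(2 ** n * x) / 2 ** n, n)

f = lambda x: x
approx = [integral(s_n(n)) for n in (1, 4, 12)]
print(approx, integral(f))
```

The integrals of the s_n increase and stay below ∫ f dµ = 1/2, approaching it as n grows.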
8.5 Fatou’s Lemma
The next theorem, known as Fatou’s lemma, is another important result which justifies the use of the Lebesgue integral.
Theorem 8.5.1 (Fatou’s lemma) Let f_n be a nonnegative measurable function with values in [0, ∞]. Let g(ω) = liminf_{n→∞} f_n(ω). Then g is measurable and
∫ g dµ ≤ liminf_{n→∞} ∫ f_n dµ.
In other words,
∫ (liminf_{n→∞} f_n) dµ ≤ liminf_{n→∞} ∫ f_n dµ.
Proof: Let g_n(ω) = inf{f_k(ω) : k ≥ n}. Then
g_n^{−1}([a, ∞]) = ∩_{k=n}^∞ f_k^{−1}([a, ∞]) = ( ∪_{k=n}^∞ ( f_k^{−1}([a, ∞]) )^C )^C ∈ F.
Thus g_n is measurable by Lemma 7.5.1. Also g(ω) = lim_{n→∞} g_n(ω) so g is measurable because it is the pointwise limit of measurable functions. Now the functions g_n form an increasing sequence of nonnegative measurable functions so the monotone convergence theorem applies. This yields
∫ g dµ = lim_{n→∞} ∫ g_n dµ ≤ liminf_{n→∞} ∫ f_n dµ,
the last inequality holding because ∫ g_n dµ ≤ ∫ f_n dµ. (Note that it is not known whether lim_{n→∞} ∫ f_n dµ exists.)
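Fatou’s inequality can be strict. The classical "escaping mass" example on (ℕ, counting measure) is sketched below with a truncated copy of ℕ:

```python
# Sketch of strictness in Fatou's lemma: "escaping mass" on (ℕ, counting
# measure).  With f_n = X_{{n}}, liminf_n f_n = 0 pointwise (here realized as
# an inf over the truncated index range), so the integral of the liminf is 0,
# yet ∫ f_n dµ = 1 for every n.  ℕ is truncated to {1,...,N} for the sketch.

N = 50
f = lambda n: (lambda k: 1.0 if k == n else 0.0)    # f_n as a function of k

integrals = [sum(f(n)(k) for k in range(1, N + 1)) for n in range(1, N + 1)]
pointwise_inf = [min(f(n)(k) for n in range(1, N + 1)) for k in range(1, N + 1)]
lhs = sum(pointwise_inf)                            # ∫ (inf_n f_n) dµ = 0
print(lhs, integrals[:3])
```

Here the left side of Fatou’s inequality is 0 while the right side is 1, so equality genuinely can fail.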
8.6 The Integral’s Righteous Algebraic Desires
The monotone convergence theorem shows the integral wants to be linear. This is
the essential content of the next theorem.
Theorem 8.6.1 Let f, g be nonnegative measurable functions and let a, b be nonnegative numbers. Then af + bg is measurable and
∫ (af + bg) dµ = a ∫ f dµ + b ∫ g dµ.   (8.6.3)
Proof: By Theorem 7.5.6 on Page 175 there exist increasing sequences of nonnegative simple functions s_n → f and t_n → g. Then af + bg, being the pointwise limit of the simple functions as_n + bt_n, is measurable. Now by the monotone convergence theorem and Lemma 8.2.3,
∫ (af + bg) dµ = lim_{n→∞} ∫ (as_n + bt_n) dµ = lim_{n→∞} ( a ∫ s_n dµ + b ∫ t_n dµ ) = a ∫ f dµ + b ∫ g dµ.
As long as you are allowing functions to take the value +∞, you cannot consider
something like f +(−g) and so you can’t very well expect a satisfactory statement
about the integral being linear until you restrict yourself to functions which have
values in a vector space. This is discussed next.
8.7 The Lebesgue Integral, L¹
The functions considered here have values in C, which is a vector space. A function
f with values in C is of the form f = Re f + i Imf where Re f and Imf are real
valued functions. In fact, writing f̄ for the complex conjugate of f,
Re f = (f + f̄)/2,   Im f = (f − f̄)/(2i).
Definition 8.7.1 Let (Ω, F, µ) be a measure space and suppose f : Ω → C. Then f is said to be measurable if both Re f and Im f are measurable real valued functions.
As is always the case for complex numbers, |z|² = (Re z)² + (Im z)². Also, for g a real valued function, one can consider its positive and negative parts defined respectively as
g⁺(x) ≡ (g(x) + |g(x)|)/2,   g⁻(x) ≡ (|g(x)| − g(x))/2.
Thus |g| = g⁺ + g⁻ and g = g⁺ − g⁻, and both g⁺ and g⁻ are measurable nonnegative functions if g is measurable.
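The decomposition is trivial to check pointwise; a short sketch:

```python
# Sketch of the positive and negative parts: g⁺ = (g + |g|)/2 and
# g⁻ = (|g| − g)/2, so g = g⁺ − g⁻, |g| = g⁺ + g⁻, and g⁺, g⁻ ≥ 0.

def pos(x):
    return (x + abs(x)) / 2

def neg(x):
    return (abs(x) - x) / 2

samples = [-3.5, -1.0, 0.0, 0.25, 7.0]
recombined = [(pos(x) - neg(x), pos(x) + neg(x)) for x in samples]
print(recombined)
```

Each pair recovers (g, |g|) exactly, and at most one of g⁺, g⁻ is nonzero at any point.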
Then the following is the definition of what it means for a complex valued function f to be in L¹(Ω).
Definition 8.7.2 Let (Ω, F, µ) be a measure space. Then a complex valued function f is in L¹(Ω) if
∫ |f| dµ < ∞.
For a function in L¹(Ω), the integral is defined as follows.
∫ f dµ ≡ ∫ (Re f)⁺ dµ − ∫ (Re f)⁻ dµ + i ( ∫ (Im f)⁺ dµ − ∫ (Im f)⁻ dµ )
I will show that with this definition, the integral is linear and well defined. First note that it is clearly well defined because all the above integrals are of nonnegative functions and are each equal to a nonnegative real number: for h equal to any of the functions, |h| ≤ |f| and ∫ |f| dµ < ∞.
Here is a lemma which will make it possible to show the integral is linear.
Lemma 8.7.3 Let g, h, g′, h′ be nonnegative measurable functions in L¹(Ω) and suppose that
g − h = g′ − h′.
Then
∫ g dµ − ∫ h dµ = ∫ g′ dµ − ∫ h′ dµ.
Proof: By assumption, g + h′ = g′ + h. Then from the Lebesgue integral’s righteous algebraic desires, Theorem 8.6.1,
∫ g dµ + ∫ h′ dµ = ∫ g′ dµ + ∫ h dµ
which implies the claimed result.
Lemma 8.7.4 Let Re(L¹(Ω)) denote the vector space of real valued functions in L¹(Ω) where the field of scalars is the real numbers. Then ∫ · dµ is linear on Re(L¹(Ω)).
Proof: First observe that from the definition of the positive and negative parts of a function,
(f + g)⁺ − (f + g)⁻ = f⁺ + g⁺ − (f⁻ + g⁻)
because both sides equal f + g. Therefore from Lemma 8.7.3 and the definition, it follows from Theorem 8.6.1 that
∫ (f + g) dµ ≡ ∫ (f + g)⁺ dµ − ∫ (f + g)⁻ dµ = ∫ (f⁺ + g⁺) dµ − ∫ (f⁻ + g⁻) dµ
= ∫ f⁺ dµ + ∫ g⁺ dµ − ( ∫ f⁻ dµ + ∫ g⁻ dµ ) = ∫ f dµ + ∫ g dµ.
What about taking out scalars? First note that if a is real and nonnegative, then (af)⁺ = af⁺ and (af)⁻ = af⁻, while if a < 0, then (af)⁺ = −af⁻ and (af)⁻ = −af⁺. These claims follow immediately from the above definitions of positive and
negative parts of a function. Thus if a < 0 and f ∈ L¹(Ω), it follows from Theorem 8.6.1 that
∫ af dµ ≡ ∫ (af)⁺ dµ − ∫ (af)⁻ dµ = ∫ (−a) f⁻ dµ − ∫ (−a) f⁺ dµ
= −a ∫ f⁻ dµ + a ∫ f⁺ dµ = a ( ∫ f⁺ dµ − ∫ f⁻ dµ ) ≡ a ∫ f dµ.
The case where a ≥ 0 works out similarly but easier.
Now here is the main result.
Theorem 8.7.5 ∫ · dµ is linear on L¹(Ω) and L¹(Ω) is a complex vector space. If f ∈ L¹(Ω), then Re f, Im f, and |f| are all in L¹(Ω). Furthermore, for f ∈ L¹(Ω),
∫ f dµ ≡ ∫ (Re f)⁺ dµ − ∫ (Re f)⁻ dµ + i ( ∫ (Im f)⁺ dµ − ∫ (Im f)⁻ dµ ) ≡ ∫ Re f dµ + i ∫ Im f dµ
and the triangle inequality holds,
| ∫ f dµ | ≤ ∫ |f| dµ.   (8.7.4)
Also, for every f ∈ L¹(Ω) and every ε > 0 there exists a simple function s such that |s| ≤ |f| and
∫ |f − s| dµ < ε.
Proof: First consider the claim that the integral is linear. It was shown above that the integral is linear on Re(L¹(Ω)). Then letting a + ib, c + id be scalars and f, g functions in L¹(Ω),
(a + ib) f + (c + id) g = (a + ib)(Re f + i Im f) + (c + id)(Re g + i Im g)
= c Re(g) − b Im(f) − d Im(g) + a Re(f) + i ( b Re(f) + c Im(g) + a Im(f) + d Re(g) ).
It follows from the definition that
∫ ((a + ib) f + (c + id) g) dµ = ∫ (c Re(g) − b Im(f) − d Im(g) + a Re(f)) dµ + i ∫ (b Re(f) + c Im(g) + a Im(f) + d Re(g)) dµ.   (8.7.5)
Also, from the definition,
(a + ib) ∫ f dµ + (c + id) ∫ g dµ = (a + ib) ( ∫ Re f dµ + i ∫ Im f dµ ) + (c + id) ( ∫ Re g dµ + i ∫ Im g dµ )
which equals
a ∫ Re f dµ − b ∫ Im f dµ + ib ∫ Re f dµ + ia ∫ Im f dµ + c ∫ Re g dµ − d ∫ Im g dµ + id ∫ Re g dµ + ic ∫ Im g dµ.
Using Lemma 8.7.4 and collecting terms, it follows that this reduces to 8.7.5. Thus the integral is linear as claimed.
Consider the claim about approximation with a simple function. Letting h equal any of
(Re f)⁺, (Re f)⁻, (Im f)⁺, (Im f)⁻,   (8.7.6)
it follows from the monotone convergence theorem and Theorem 7.5.6 on Page 175 that there exists a nonnegative simple function s ≤ h such that
∫ |h − s| dµ < ε/4.
Therefore, letting s_1, s_2, s_3, s_4 be such simple functions, approximating respectively the functions listed in 8.7.6, and s ≡ s_1 − s_2 + i(s_3 − s_4),
∫ |f − s| dµ ≤ ∫ |(Re f)⁺ − s_1| dµ + ∫ |(Re f)⁻ − s_2| dµ + ∫ |(Im f)⁺ − s_3| dµ + ∫ |(Im f)⁻ − s_4| dµ < ε.
It is clear from the construction that |s| ≤ |f|.
What about 8.7.4? Let θ ∈ C be such that |θ| = 1 and θ ∫ f dµ = | ∫ f dµ |. Then from what was shown above about the integral being linear,
| ∫ f dµ | = θ ∫ f dµ = ∫ θf dµ = ∫ Re(θf) dµ ≤ ∫ |f| dµ.
If f, g ∈ L¹(Ω), then it is known that for a, b scalars, af + bg is measurable. See Corollary 7.5.13. Also
∫ |af + bg| dµ ≤ ∫ (|a| |f| + |b| |g|) dµ < ∞.
The following corollary follows from this. The conditions of this corollary are sometimes taken as a definition of what it means for a function f to be in L¹(Ω).
Corollary 8.7.6 f ∈ L¹(Ω) if and only if there exists a sequence of complex simple functions {s_n} such that
s_n(ω) → f(ω) for all ω ∈ Ω,
lim_{m,n→∞} ∫ |s_n − s_m| dµ = 0.   (8.7.7)
When f ∈ L¹(Ω),
∫ f dµ ≡ lim_{n→∞} ∫ s_n dµ.   (8.7.8)
Proof: From the above theorem, if f ∈ L¹(Ω) there exists a sequence of simple functions {s_n} such that
∫ |f − s_n| dµ < 1/n,   s_n(ω) → f(ω) for all ω.
Then
∫ |s_n − s_m| dµ ≤ ∫ |s_n − f| dµ + ∫ |f − s_m| dµ ≤ 1/n + 1/m.
Next suppose the existence of the approximating sequence of simple functions. Then f is measurable because its real and imaginary parts are the limit of measurable functions. By Fatou’s lemma,
∫ |f| dµ ≤ liminf_{n→∞} ∫ |s_n| dµ < ∞
because
| ∫ |s_n| dµ − ∫ |s_m| dµ | ≤ ∫ |s_n − s_m| dµ
which is given to converge to 0. Hence { ∫ |s_n| dµ } is a Cauchy sequence and is therefore bounded.
In case f ∈ L¹(Ω), letting {s_n} be the approximating sequence, Fatou’s lemma implies
| ∫ f dµ − ∫ s_n dµ | ≤ ∫ |f − s_n| dµ ≤ liminf_{m→∞} ∫ |s_m − s_n| dµ < ε
provided n is large enough. Hence 8.7.8 follows.
This is a good time to observe the following fundamental observation which
follows from a repeat of the above arguments.
Theorem 8.7.7 Suppose Λ(f) ∈ [0, ∞] for all nonnegative measurable functions
and suppose that for a, b ≥ 0 and f, g nonnegative measurable functions,
Λ(af +bg) = aΛ(f) +bΛ(g) .
In other words, Λ wants to be linear. Then Λ has a unique linear extension to the
complex valued measurable functions.
8.8 The Dominated Convergence Theorem
One of the major theorems in this theory is the dominated convergence theorem.
Before presenting it, here is a technical lemma about limsup and liminf which is
really pretty obvious from the deﬁnition.
Lemma 8.8.1 Let {a_n} be a sequence in [−∞, ∞]. Then lim_{n→∞} a_n exists if and only if
liminf_{n→∞} a_n = limsup_{n→∞} a_n
and in this case, the limit equals the common value of these two numbers.
Proof: Suppose first lim_{n→∞} a_n = a ∈ ℝ. Then, letting ε > 0 be given, a_n ∈ (a − ε, a + ε) for all n large enough, say n ≥ N. Therefore, both inf{a_k : k ≥ n} and sup{a_k : k ≥ n} are contained in [a − ε, a + ε] whenever n ≥ N. It follows limsup_{n→∞} a_n and liminf_{n→∞} a_n are both in [a − ε, a + ε], showing
| liminf_{n→∞} a_n − limsup_{n→∞} a_n | ≤ 2ε.
Since ε is arbitrary, the two must be equal and they both must equal a. Next suppose lim_{n→∞} a_n = ∞. Then if l ∈ ℝ, there exists N such that for n ≥ N,
l ≤ a_n
and therefore, for such n,
l ≤ inf{a_k : k ≥ n} ≤ sup{a_k : k ≥ n}
and this shows, since l is arbitrary, that
liminf_{n→∞} a_n = limsup_{n→∞} a_n = ∞.
The case for −∞ is similar.
Conversely, suppose liminf_{n→∞} a_n = limsup_{n→∞} a_n = a. Suppose first that a ∈ ℝ. Then, letting ε > 0 be given, there exists N such that if n ≥ N,
sup{a_k : k ≥ n} − inf{a_k : k ≥ n} < ε.
Therefore, if k, m > N and a_k > a_m,
|a_k − a_m| = a_k − a_m ≤ sup{a_j : j ≥ N} − inf{a_j : j ≥ N} < ε
showing that {a_n} is a Cauchy sequence. Therefore, it converges to a ∈ ℝ, and as in the first part, the liminf and limsup both equal a. If liminf_{n→∞} a_n = limsup_{n→∞} a_n = ∞, then given l ∈ ℝ, there exists N such that for n ≥ N,
inf_{n>N} a_n > l.
Therefore, lim_{n→∞} a_n = ∞. The case for −∞ is similar.
Here is the dominated convergence theorem.
Theorem 8.8.2 (Dominated Convergence theorem) Let f_n ∈ L¹(Ω) and suppose
f(ω) = lim_{n→∞} f_n(ω),
and there exists a measurable function g with values in [0, ∞]¹ such that
|f_n(ω)| ≤ g(ω) and ∫ g(ω) dµ < ∞.
Then f ∈ L¹(Ω) and
0 = lim_{n→∞} ∫ |f_n − f| dµ = lim_{n→∞} | ∫ f dµ − ∫ f_n dµ |.
Proof: f is measurable by Theorem 7.5.3. Since |f| ≤ g, it follows that f ∈ L¹(Ω) and |f − f_n| ≤ 2g.
By Fatou’s lemma (Theorem 8.5.1),
∫ 2g dµ ≤ liminf_{n→∞} ∫ (2g − |f − f_n|) dµ = ∫ 2g dµ − limsup_{n→∞} ∫ |f − f_n| dµ.
Subtracting ∫ 2g dµ,
0 ≤ − limsup_{n→∞} ∫ |f − f_n| dµ.
Hence
0 ≥ limsup_{n→∞} ∫ |f − f_n| dµ ≥ liminf_{n→∞} ∫ |f − f_n| dµ ≥ | ∫ f dµ − ∫ f_n dµ | ≥ 0.
This proves the theorem by Lemma 8.8.1 because the limsup and liminf are equal.
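A numerical illustration of the dominated convergence theorem, again approximating Lebesgue measure on [0, 1] by a grid of equal atoms: f_n(x) = n x e^{−nx} tends to 0 pointwise and is dominated by the constant 1/e (since t e^{−t} ≤ 1/e), so the integrals must tend to 0:

```python
# Sketch of the dominated convergence theorem with f_n(x) = n x e^{-n x} on
# [0,1]: f_n → 0 pointwise and f_n ≤ 1/e everywhere (t e^{-t} ≤ 1/e), a fixed
# integrable dominating function, so ∫ f_n dµ → 0.  Lebesgue measure on [0,1]
# is crudely approximated by a grid of equal atoms.

import math

N = 100_000
atoms = [(i + 0.5) / N for i in range(N)]

def integral(g):
    return sum(g(x) for x in atoms) / N

ints = [integral(lambda x, n=n: n * x * math.exp(-n * x)) for n in (1, 10, 100)]
bound = max(n * x * math.exp(-n * x) for n in (1, 10, 100) for x in atoms)
print(ints, bound)
```

Without the dominating function nothing would force ∫ f_n → 0; here the decay of the integrals is exactly what the theorem guarantees.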
Corollary 8.8.3 Suppose f_n ∈ L¹(Ω) and f(ω) = lim_{n→∞} f_n(ω). Suppose also there exist measurable functions g_n, g with values in [0, ∞] such that lim_{n→∞} ∫ g_n dµ = ∫ g dµ, g_n(ω) → g(ω) µ a.e., and both ∫ g_n dµ and ∫ g dµ are finite. Also suppose |f_n(ω)| ≤ g_n(ω). Then
lim_{n→∞} ∫ |f − f_n| dµ = 0.
¹Note that, since g is allowed to have the value ∞, it is not known that g ∈ L¹(Ω).
Proof: It is just like the above. This time g + g_n − |f − f_n| ≥ 0 and so by Fatou’s lemma,
∫ 2g dµ − limsup_{n→∞} ∫ |f − f_n| dµ = liminf_{n→∞} ∫ (g_n + g) dµ − limsup_{n→∞} ∫ |f − f_n| dµ
= liminf_{n→∞} ∫ ((g_n + g) − |f − f_n|) dµ ≥ ∫ 2g dµ
and so − limsup_{n→∞} ∫ |f − f_n| dµ ≥ 0. Thus
0 ≥ limsup_{n→∞} ∫ |f − f_n| dµ ≥ liminf_{n→∞} ∫ |f − f_n| dµ ≥ | ∫ f dµ − ∫ f_n dµ | ≥ 0.
Definition 8.8.4 Let E be a measurable subset of Ω. Then
∫_E f dµ ≡ ∫ f X_E dµ.
If L¹(E) is written, the σ algebra is defined as
{E ∩ A : A ∈ F}
and the measure is µ restricted to this smaller σ algebra. Clearly, if f ∈ L¹(Ω), then f X_E ∈ L¹(E), and if f ∈ L¹(E), then letting f̃ be the 0 extension of f off of E, it follows f̃ ∈ L¹(Ω).
8.9 The One Dimensional Lebesgue Stieltjes Integral
Let F be an increasing function defined on ℝ. Let µ be the Lebesgue Stieltjes measure defined in Theorems 7.6.1 and 7.2.3. The conclusions of these theorems are reviewed here.
Theorem 8.9.1 Let F be an increasing function defined on ℝ, an integrator function. There exists a function µ : P(ℝ) → [0, ∞] which satisfies the following properties.
1. If A ⊆ B, then 0 ≤ µ(A) ≤ µ(B), µ(∅) = 0.
2. µ(∪_{i=1}^∞ A_i) ≤ Σ_{i=1}^∞ µ(A_i)
3. µ([a, b]) = F(b+) − F(a−)
4. µ((a, b)) = F(b−) − F(a+)
5. µ((a, b]) = F(b+) − F(a+)
6. µ([a, b)) = F(b−) − F(a−)
where F(b+) ≡ lim_{t→b+} F(t) and F(a−) ≡ lim_{t→a−} F(t).
There also exists a σ algebra S of measurable sets on which µ is a measure, which contains the open sets and also satisfies the regularity conditions
µ(E) = sup{µ(K) : K is a closed and bounded set, K ⊆ E}   (8.9.9)
µ(E) = inf{µ(V) : V is an open set, V ⊇ E}   (8.9.10)
whenever E is a set in S.
The Lebesgue integral taken with respect to this measure is called the Lebesgue Stieltjes integral. Note that any real valued continuous function is measurable with respect to S. This is because if f is continuous, inverse images of open sets are open and open sets are in S. Thus f is measurable because f⁻¹((a, b)) ∈ S. Similarly, if f has complex values, this argument applied to its real and imaginary parts yields the conclusion that f is measurable.
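The interval formulas of Theorem 8.9.1 can be tried on a concrete integrator. The sketch below uses F(t) = t + X_{[0,∞)}(t), which has a unit jump at 0, and takes the one-sided limits numerically with a small eps, which is only a crude stand-in for a genuine limit:

```python
# Sketch of Theorem 8.9.1's interval formulas.  The integrator F(t) = t for
# t < 0 and t + 1 for t ≥ 0 has a unit jump at 0.  One-sided limits F(t±) are
# approximated by evaluating F at t ± eps.

def F(t):
    return t + 1 if t >= 0 else t

def F_plus(t, eps=1e-9):
    return F(t + eps)     # ≈ F(t+)

def F_minus(t, eps=1e-9):
    return F(t - eps)     # ≈ F(t-)

mu_closed = F_plus(1) - F_minus(0)    # µ([0,1]) = 2: length 1 plus the jump at 0
mu_left_open = F_plus(1) - F_plus(0)  # µ((0,1]) = 1: the jump at 0 is excluded
mu_point = F_plus(0) - F_minus(0)     # µ({0}) = 1: the size of the jump
print(mu_closed, mu_left_open, mu_point)
```

Whether an endpoint’s jump is counted depends precisely on which one-sided limits appear in the formula, which is the point of items 3 through 6.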
For f a continuous function, how does the Lebesgue Stieltjes integral compare
with the Darboux Stieltjes integral? To answer this question, here is a technical
lemma.
Lemma 8.9.2 Let D be a countable subset of ℝ and suppose a, b ∉ D. Also suppose f is a continuous function defined on [a, b]. Then there exists a sequence of functions {s_n} of the form
s_n(x) ≡ Σ_{k=1}^{m_n} f(z_{k−1}^n) X_{[z_{k−1}^n, z_k^n)}(x)
such that each z_k^n ∉ D and
sup{ |s_n(x) − f(x)| : x ∈ [a, b] } < 1/n.
Proof: First note that D contains no intervals. To see this, let D = {d_k}_{k=1}^∞. If D had an interval of length 2ε, let I_k be an interval centered at d_k which has length ε/2^k. The sum of the lengths of these intervals is no more than
Σ_{k=1}^∞ ε/2^k = ε.
Thus D cannot contain an interval of length 2ε. Since ε is arbitrary, D cannot contain any interval.
Since f is continuous, it follows from Theorem 4.4.2 on Page 72 that f is uniformly continuous. Therefore, there exists δ > 0 such that if |x − y| ≤ 3δ, then
|f(x) − f(y)| < 1/n.
Now let {x_0, ···, x_{m_n}} be a partition of [a, b] such that |x_i − x_{i−1}| < δ for each i. For k = 1, 2, ···, m_n − 1, let z_k^n ∉ D and |z_k^n − x_k| < δ. Then
|z_k^n − z_{k−1}^n| ≤ |z_k^n − x_k| + |x_k − x_{k−1}| + |x_{k−1} − z_{k−1}^n| < 3δ.
It follows that for each x ∈ [a, b],
| Σ_{k=1}^{m_n} f(z_{k−1}^n) X_{[z_{k−1}^n, z_k^n)}(x) − f(x) | < 1/n.
Proposition 8.9.3 Let f be a continuous function defined on ℝ. Also let F be an increasing function defined on ℝ. Then whenever c, d are not points of discontinuity of F and [a, b] ⊇ [c, d],
∫_a^b X_{[c,d]} f dF = ∫ X_{[c,d]} f dµ.
Here µ is the Lebesgue Stieltjes measure defined above.
Proof: Since F is an increasing function, it can have only countably many discontinuities. The reason for this is that the only kind of discontinuity it can have is a jump, where F(x+) > F(x−). Since F is increasing, the intervals (F(x−), F(x+)) for x a point of discontinuity are disjoint, and each must contain a rational number; since the rational numbers are countable, so are these intervals.
Let D denote this countable set of discontinuities of F. Then if l, r ∉ D and [l, r] ⊆ [a, b], it follows quickly from the definition of the Darboux Stieltjes integral that
∫_a^b X_{[l,r)} dF = F(r) − F(l) = F(r−) − F(l−) = µ([l, r)) = ∫ X_{[l,r)} dµ.
Now let {s_n} be the sequence of step functions of Lemma 8.9.2 such that these step functions converge uniformly to f on [c, d]. Then
| ∫ ( X_{[c,d]} f − X_{[c,d]} s_n ) dµ | ≤ ∫ | X_{[c,d]} (f − s_n) | dµ ≤ (1/n) µ([c, d])
and
| ∫_a^b ( X_{[c,d]} f − X_{[c,d]} s_n ) dF | ≤ ∫_a^b X_{[c,d]} |f − s_n| dF < (1/n) (F(b) − F(a)).
Also, if s_n is given by the formula of Lemma 8.9.2,
∫ X_{[c,d]} s_n dµ = ∫ Σ_{k=1}^{m_n} f(z_{k−1}^n) X_{[z_{k−1}^n, z_k^n)} dµ
= Σ_{k=1}^{m_n} ∫ f(z_{k−1}^n) X_{[z_{k−1}^n, z_k^n)} dµ
= Σ_{k=1}^{m_n} f(z_{k−1}^n) µ([z_{k−1}^n, z_k^n))
= Σ_{k=1}^{m_n} f(z_{k−1}^n) ( F(z_k^n −) − F(z_{k−1}^n −) )
= Σ_{k=1}^{m_n} f(z_{k−1}^n) ( F(z_k^n) − F(z_{k−1}^n) )
= Σ_{k=1}^{m_n} ∫_a^b f(z_{k−1}^n) X_{[z_{k−1}^n, z_k^n)} dF = ∫_a^b s_n dF.
Therefore,
| ∫ X_{[c,d]} f dµ − ∫_a^b X_{[c,d]} f dF | ≤ | ∫ X_{[c,d]} f dµ − ∫ X_{[c,d]} s_n dµ |
+ | ∫ X_{[c,d]} s_n dµ − ∫_a^b s_n dF | + | ∫_a^b s_n dF − ∫_a^b X_{[c,d]} f dF |
≤ (1/n) µ([c, d]) + (1/n) (F(b) − F(a))
and since n is arbitrary, this shows
∫ X_{[c,d]} f dµ − ∫_a^b X_{[c,d]} f dF = 0.
In particular, in the special case where F is continuous and f is continuous,
∫_a^b f dF = ∫ X_{[a,b]} f dµ.
Thus, if F(x) = x, so that the Darboux Stieltjes integral is the usual integral from calculus,
∫_a^b f(t) dt = ∫ X_{[a,b]} f dµ
where µ is the measure which comes from F(x) = x as described above. This measure is often denoted by m. Thus when f is continuous,
∫_a^b f(t) dt = ∫ X_{[a,b]} f dm
and so there is no problem in writing
∫_a^b f(t) dt
for either the Lebesgue or the Riemann integral. Furthermore, when f is continuous,
you can compute the Lebesgue integral by using the fundamental theorem of calculus
because in this case, the two integrals are equal.
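Proposition 8.9.3 can also be watched converge numerically: for continuous f and an integrator F with a jump, the Darboux Stieltjes sums pick up a Lebesgue part plus f evaluated at the jump point times the jump size. A sketch with F(t) = t + X_{[1/2,∞)}(t) and f(t) = t² on [0, 1]:

```python
# Sketch of Proposition 8.9.3: with integrator F(t) = t for t < 0.5 and
# t + 1 for t ≥ 0.5 (unit jump at 0.5) and f(t) = t² on [0, 1], the
# left-endpoint Stieltjes sums converge to ∫ f dµ, which is the Lebesgue part
# ∫₀¹ t² dt = 1/3 plus f(0.5) times the jump size 1.

def F(t):
    return t + (1.0 if t >= 0.5 else 0.0)

f = lambda t: t * t

def stieltjes_sum(n):
    zs = [k / n for k in range(n + 1)]
    return sum(f(zs[k - 1]) * (F(zs[k]) - F(zs[k - 1])) for k in range(1, n + 1))

approx = stieltjes_sum(200_000)
exact = 1.0 / 3.0 + f(0.5) * 1.0
print(approx, exact)
```

The sum tends to 1/3 + 1/4, showing how the Lebesgue Stieltjes measure weights the jump point by f there.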
8.10 Exercises
1. ♠ Let Ω = ℕ = {1, 2, ···}. Let F = P(ℕ), the set of all subsets of ℕ, and let µ(S) = number of elements in S. Thus µ({1}) = 1 = µ({2}), µ({1, 2}) = 2, etc. Show (Ω, F, µ) is a measure space. It is called counting measure. What functions are measurable in this case? For a nonnegative function f defined on ℕ, show
∫_ℕ f dµ = Σ_{k=1}^∞ f(k).
What do the monotone convergence and dominated convergence theorems say about this example?
2. ♠ For the measure space of Problem 1, give an example of a sequence of nonnegative measurable functions {f_n} converging pointwise to a function f, such that inequality is obtained in Fatou’s lemma.
3. ♠ If (Ω, F, µ) is a measure space and f ≥ 0 is measurable, show that if g(ω) = f(ω) a.e. ω and g ≥ 0, then ∫ g dµ = ∫ f dµ. Show that if f, g ∈ L¹(Ω) and g(ω) = f(ω) a.e. then ∫ g dµ = ∫ f dµ.
4. An algebra A of subsets of Ω is a subset of the power set such that Ω is in the algebra and for A, B ∈ A, A ∖ B and A ∪ B are both in A. Let C ≡ {E_i}_{i=1}^∞ be a countable collection of sets and let Ω_1 ≡ ∪_{i=1}^∞ E_i. Show there exists an algebra of sets A such that A ⊇ C and A is countable. Note the difference between this problem and Problem 6. Hint: Let C_1 denote all finite unions of sets of C and Ω_1. Thus C_1 is countable. Now let B_1 denote all complements with respect to Ω_1 of sets of C_1. Let C_2 denote all finite unions of sets of B_1 ∪ C_1. Continue in this way, obtaining an increasing sequence C_n, each of which is countable. Let
A ≡ ∪_{i=1}^∞ C_i.
5. Let A ⊆ P(Ω) where P(Ω) denotes the set of all subsets of Ω. Let σ(A) denote the intersection of all σ algebras which contain A, one of these being P(Ω). Show σ(A) is also a σ algebra.
6. ♠ We say a function g mapping a normed vector space Ω to a normed vector space is Borel measurable if whenever U is open, g⁻¹(U) is a Borel set. (The Borel sets are those sets in the smallest σ algebra which contains the open sets.) Let f : Ω → X and let g : X → Y where X is a normed vector space and Y equals C, ℝ, or (−∞, ∞], and F is a σ algebra of sets of Ω. Suppose f is measurable and g is Borel measurable. Show g ◦ f is measurable.
7. ♠ Let (Ω, F, µ) be a measure space. Define µ̄ : P(Ω) → [0, ∞] by
µ̄(A) = inf{µ(B) : B ⊇ A, B ∈ F}.
Show µ̄ satisfies
µ̄(∅) = 0; if A ⊆ B, µ̄(A) ≤ µ̄(B);
µ̄(∪_{i=1}^∞ A_i) ≤ Σ_{i=1}^∞ µ̄(A_i); µ̄(A) = µ(A) if A ∈ F.
If µ̄ satisfies these conditions, it is called an outer measure. This shows every measure determines an outer measure on the power set.
8. Suppose (Ω, S, µ) is a measure space which may not be complete. Show that one can obtain a larger σ algebra and extend the measure to this larger σ algebra in such a way that the resulting measure space is complete. Do it by considering the outer measure determined by µ and then using Caratheodory’s procedure to get a possibly larger σ algebra on which this outer measure is a measure.
9. ♠ Let {E_i} be a sequence of measurable sets with the property that
Σ_{i=1}^∞ µ(E_i) < ∞.
Let S = {ω ∈ Ω such that ω ∈ E_i for infinitely many values of i}. Show µ(S) = 0 and S is measurable. This is part of the Borel Cantelli lemma. Hint: Write S in terms of intersections and unions. Something is in S means that for every n there exists k > n such that it is in E_k. Remember the tail of a convergent series is small.
10. ↑ ♠ Let {f_n}, f be measurable functions with values in C. {f_n} converges in measure if
lim_{n→∞} µ(x ∈ Ω : |f(x) − f_n(x)| ≥ ε) = 0
for each fixed ε > 0. Prove the theorem of F. Riesz: if f_n converges to f in measure, then there exists a subsequence {f_{n_k}} which converges to f a.e. Hint: Choose n_1 such that
µ(x : |f(x) − f_{n_1}(x)| ≥ 1) < 1/2.
Choose n_2 > n_1 such that
µ(x : |f(x) − f_{n_2}(x)| ≥ 1/2) < 1/2²,
n_3 > n_2 such that
µ(x : |f(x) − f_{n_3}(x)| ≥ 1/3) < 1/2³,
etc. Now consider what it means for f_{n_k}(x) to fail to converge to f(x). Then use Problem 9.
11. ♠ Suppose (Ω, µ) is a finite measure space (µ(Ω) < ∞) and S ⊆ L¹(Ω). Then S is said to be uniformly integrable if for every ε > 0 there exists δ > 0 such that if E is a measurable set satisfying µ(E) < δ, then
∫_E |f| dµ < ε
for all f ∈ S. Show S is uniformly integrable and bounded in L¹(Ω) if there exists an increasing function h which satisfies
lim_{t→∞} h(t)/t = ∞,   sup{ ∫_Ω h(|f|) dµ : f ∈ S } < ∞.
S is bounded if there is some number M such that
∫ |f| dµ ≤ M
for all f ∈ S.
12. ♠ Let (Ω, F, µ) be a measure space and suppose f, g : Ω → (−∞, ∞] are measurable. Prove the sets
{ω : f(ω) < g(ω)} and {ω : f(ω) = g(ω)}
are measurable. Hint: The easy way to do this is to write
{ω : f(ω) < g(ω)} = ∪_{r∈Q} [f < r] ∩ [g > r].
Note that l(x, y) = x − y is not continuous on (−∞, ∞] so the obvious idea doesn’t work.
13. ♠ Let {f_n} be a sequence of real or complex valued measurable functions. Let
S = {ω : {f_n(ω)} converges}.
Show S is measurable. Hint: You might try to exhibit the set where f_n converges in terms of countable unions and intersections using the definition of a Cauchy sequence.
14. ♠ Suppose u_n(t) is a differentiable function for t ∈ (a, b) and suppose that for t ∈ (a, b),
|u_n(t)|, |u_n′(t)| < K_n
where Σ_{n=1}^∞ K_n < ∞. Show
( Σ_{n=1}^∞ u_n(t) )′ = Σ_{n=1}^∞ u_n′(t).
Hint: This is an exercise in the use of the dominated convergence theorem and the mean value theorem.
15. ♠ Suppose {f_n} is a sequence of nonnegative measurable functions defined on a measure space (Ω, S, µ). Show that
∫ Σ_{k=1}^∞ f_k dµ = Σ_{k=1}^∞ ∫ f_k dµ.
Hint: Use the monotone convergence theorem along with the fact the integral is linear.
16. ♠ The integral ∫_{−∞}^∞ f(t) dt will denote the Lebesgue integral taken with respect to one dimensional Lebesgue measure as discussed earlier. Show that for α > 0, t → e^{−αt²} is in L¹(ℝ). The gamma function is defined for x > 0 as
Γ(x) ≡ ∫_0^∞ e^{−t} t^{x−1} dt.
Show t → e^{−t} t^{x−1} is in L¹(ℝ) for all x > 0. Also show that
Γ(x + 1) = xΓ(x),   Γ(1) = 1.
How does Γ(n) for n an integer compare with (n − 1)!?
17. ♠ This problem outlines a treatment of Stirling’s formula, which is a very useful approximation to n!, based on a section in [39]. It is an excellent application of the monotone convergence theorem. Follow and justify the following steps using the convergence theorems for the Lebesgue integral as needed. Here x > 0.
Γ(x + 1) = ∫_0^∞ e^{−t} t^x dt
First change the variables letting t = x(1 + u) to get
Γ(x + 1) = e^{−x} x^{x+1} ∫_{−1}^∞ ( e^{−u} (1 + u) )^x du.
Next make the change of variables u = s √(2/x) to obtain
Γ(x + 1) = √2 e^{−x} x^{x+(1/2)} ∫_{−√(x/2)}^∞ ( e^{−s√(2/x)} (1 + s√(2/x)) )^x ds.
The integrand is increasing in x. This is most easily seen by taking ln of the integrand and then taking the derivative with respect to x. This derivative is positive. Next show the limit of the integrand as x → ∞ is e^{−s²}. This isn’t too bad if you take ln and then use L’Hospital’s rule. Consider the integral. Explain why it must be increasing in x. Next justify the following assertion. Remember the monotone convergence theorem applies to a sequence of functions.
lim_{x→∞} ∫_{−√(x/2)}^∞ ( e^{−s√(2/x)} (1 + s√(2/x)) )^x ds = ∫_{−∞}^∞ e^{−s²} ds
Now Stirling’s formula is
lim_{x→∞} Γ(x + 1) / ( √2 e^{−x} x^{x+(1/2)} ) = ∫_{−∞}^∞ e^{−s²} ds
where this last improper integral equals a well defined constant (why?). It is very easy, when you know something about multiple integrals of functions of more than one variable, to verify this constant is √π, but the necessary mathematical machinery has not yet been presented. It can also be done through much more difficult arguments in the context of functions of only one variable. See [39] for these clever arguments.
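Since ∫_{−∞}^∞ e^{−s²} ds = √π, the limit in Exercise 17 says Γ(x + 1) ≈ √(2π) e^{−x} x^{x+1/2} for large x. A quick numerical sanity check of this form of Stirling’s formula:

```python
# Numerical sanity check of Stirling's formula from Exercise 17.  With
# ∫ e^{-s²} ds = √π, the limit there gives Γ(n+1) = n! ≈ √(2π) e^{-n} n^{n+1/2};
# the ratio of the two sides tends to 1 from above, roughly like 1 + 1/(12n).

import math

def stirling(n):
    return math.sqrt(2 * math.pi) * math.exp(-n) * n ** (n + 0.5)

ratios = [math.factorial(n) / stirling(n) for n in (5, 20, 80)]
print(ratios)
```

The ratios decrease toward 1, already agreeing to within about a tenth of a percent by n = 80.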
18. To show you the power of Stirling's formula, find whether the series

    ∑_{n=1}^∞ n! e^n / n^n

    converges. The ratio test falls flat but you can try it if you like. Now explain why, if n is large enough,

    n! ≥ (1/2) (∫_{-∞}^∞ e^{-s^2} ds) √2 e^{-n} n^{n+(1/2)} ≡ c √2 e^{-n} n^{n+(1/2)}.

    Use this.
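To see the divergence numerically (an aside of mine, not the book's argument): the ratio test gives a_{n+1}/a_n = e/(1 + 1/n)^n → 1, which decides nothing, while Stirling says a_n = n! e^n/n^n ≈ √(2πn) → ∞, so the terms themselves blow up.

```python
import math

def log_term(n):
    """log of a_n = n! e^n / n^n, via lgamma to avoid overflow."""
    return math.lgamma(n + 1.0) + n - n * math.log(n)

for n in (10, 100, 1000):
    print(n, math.exp(log_term(n)), math.sqrt(2.0 * math.pi * n))  # a_n ≈ √(2πn)

# The terms grow without bound, so the series ∑ a_n cannot converge.
assert math.exp(log_term(1000)) > math.exp(log_term(100)) > math.exp(log_term(10)) > 1.0
```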
19. Give a theorem in which the improper Riemann integral coincides with a
suitable Lebesgue integral. (There are many such situations; just find one.)
212 THE ABSTRACT LEBESGUE INTEGRAL
20. ♠Note that ∫_0^∞ (sin x / x) dx is a valid improper Riemann integral defined by

    lim_{R→∞} ∫_0^R (sin x / x) dx

    but this function, sin x/x, is not in L^1([0, ∞)). Why?
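A numerical aside contrasting the two integrals in this exercise (my own sketch; the helper and step size are illustrative): the signed integral settles near π/2, while the integral of |sin x|/x keeps growing, roughly like (2/π) log R.

```python
import math

def both_integrals(R, h=0.001):
    """Midpoint sums for ∫_0^R sin(x)/x dx and ∫_0^R |sin(x)/x| dx."""
    n = int(R / h)
    signed = absolute = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        v = math.sin(x) / x
        signed += v * h
        absolute += abs(v) * h
    return signed, absolute

s50, a50 = both_integrals(50.0)
s800, a800 = both_integrals(800.0)
print(s50, a50)
print(s800, a800)

assert abs(s800 - math.pi / 2.0) < 0.01     # the improper integral exists (value π/2)
assert a800 > a50 + 1.0                     # the absolute integral keeps growing
```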
21. Let f be a nonnegative strictly decreasing function defined on [0, ∞). For 0 ≤ y ≤ f(0), let f^{-1}(y) = x where y ∈ [f(x+), f(x−)]. (Draw a picture. f could have jump discontinuities.) Show that f^{-1} is nonincreasing and that

    ∫_0^∞ f(t) dt = ∫_0^{f(0)} f^{-1}(y) dy.
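A concrete instance of the identity in this exercise (my own check, not the book's): f(t) = e^{-t} is strictly decreasing with f(0) = 1 and f^{-1}(y) = −ln y, and both integrals come out to 1.

```python
import math

h = 1e-4

# Left side: ∫_0^∞ e^{-t} dt, truncated at t = 30 where the tail is ≈ e^{-30}.
lhs = sum(math.exp(-(k + 0.5) * h) * h for k in range(int(30.0 / h)))

# Right side: ∫_0^{f(0)} f^{-1}(y) dy = ∫_0^1 (-ln y) dy.
rhs = sum(-math.log((k + 0.5) * h) * h for k in range(int(1.0 / h)))

print(lhs, rhs)   # both ≈ 1
assert abs(lhs - 1.0) < 1e-3 and abs(rhs - 1.0) < 1e-3
```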
22. If A is m¸S measurable, it does not follow that A is m measurable. Give an
example to show this is the case. You can use the existence of non measurable
sets.
23. ♠If f is a nonnegative Lebesgue measurable function, show there exists a Borel measurable function g such that g(x) = f(x) a.e.
The Lebesgue Integral For Functions Of n Variables

9.1 Completion Of Measure Spaces, Approximation
Suppose (Ω, ℱ, µ) is a measure space. Then it is always possible to enlarge the σ algebra and define a new measure µ̄ on this larger σ algebra such that (Ω, ℱ̄, µ̄) is a complete measure space. Recall this means that if E ∈ ℱ̄ and

    (E \ E′) ∪ (E′ \ E) ⊆ N

where µ̄(N) = 0 with N ∈ ℱ̄, then E′ ∈ ℱ̄ also. The following theorem is the main result. The new measure space is called the completion of the measure space.

Definition 9.1.1 A measure space (Ω, ℱ, µ) is called σ finite if there exists a sequence {Ω_n} ⊆ ℱ such that ∪_n Ω_n = Ω and µ(Ω_n) < ∞.
For example, if X is a finite dimensional normed vector space and µ is a measure defined on B(X) which is finite on compact sets, then you could take Ω_n = B(0, n).
Theorem 9.1.2 Let (Ω, ℱ, µ) be a measure space. Then there exists a measure space (Ω, 𝒢, λ) satisfying

1. (Ω, 𝒢, λ) is a complete measure space.

2. λ = µ on ℱ

3. 𝒢 ⊇ ℱ

4. For every E ∈ 𝒢 there exists G ∈ ℱ such that G ⊇ E and µ(G) = λ(E).

In addition to this, if (Ω, ℱ, µ) is σ finite, then the following approximation result holds.

5. For every E ∈ 𝒢 there exist F ∈ ℱ and G ∈ ℱ such that F ⊆ E ⊆ G and

    µ(G \ F) = λ(G \ F) = 0    (9.1.1)

There is a unique complete measure space (Ω, 𝒢, λ) extending (Ω, ℱ, µ) which satisfies 9.1.1.
Proof: Define the outer measure

    λ(A) ≡ inf {µ(E) : E ⊇ A, E ∈ ℱ},  λ(∅) ≡ 0.

Denote by 𝒢 the σ algebra of λ measurable sets. Then I claim that λ = µ on ℱ. It is clear that λ ≤ µ on ℱ directly from the definition. Now if A ∈ ℱ and λ(A) = ∞, there is nothing to prove. Therefore, assume λ(A) < ∞ and suppose A ⊆ E such that E ∈ ℱ and

    λ(A) + ε > µ(E) ≥ µ(A)

Then since ε is arbitrary, λ ≥ µ also.

From the definition, if λ(S) < ∞ there exists E ⊇ S, E ∈ ℱ, such that

    λ(S) + ε > µ(E) ≥ λ(S)

Let E_n ⊇ S be a decreasing sequence of sets of ℱ such that

    λ(S) ≤ µ(E_n) ≤ λ(S) + 2^{-n}

Now let G = ∩_{n=1}^∞ E_n. It follows that G ⊇ S and λ(S) = µ(G).

Why is ℱ ⊆ 𝒢? Letting λ(S) < ∞ (there is nothing to prove if λ(S) = ∞), let G ∈ ℱ be such that G ⊇ S and λ(S) = µ(G). Then if A ∈ ℱ,

    λ(S) ≤ λ(S ∩ A) + λ(S ∩ A^C) ≤ λ(G ∩ A) + λ(G ∩ A^C)
    = µ(G ∩ A) + µ(G ∩ A^C) = µ(G) = λ(S).

Thus ℱ ⊆ 𝒢.
Finally suppose µ is σ finite. Let Ω = ∪_{n=1}^∞ Ω_n where the Ω_n are disjoint sets of ℱ and µ(Ω_n) < ∞. Letting A ∈ 𝒢, consider A_n ≡ A ∩ Ω_n. From what was just shown, there exists G_n ⊇ A^C ∩ Ω_n, G_n ⊆ Ω_n, such that µ(G_n) = λ(A^C ∩ Ω_n).

[Picture: inside Ω_n, the set G_n contains Ω_n ∩ A^C and meets Ω_n ∩ A.]

Since µ(Ω_n) < ∞, this implies

    λ(G_n \ (A^C ∩ Ω_n)) = λ(G_n) − λ(A^C ∩ Ω_n) = 0.
Now G_n^C ⊆ A ∪ Ω_n^C and so F_n ≡ G_n^C ∩ Ω_n ⊆ A_n and

    λ(A_n \ F_n) = λ((A ∩ Ω_n) \ (G_n^C ∩ Ω_n)) = λ(A ∩ Ω_n ∩ G_n) = λ(A ∩ G_n)
    = λ(G_n \ A^C) ≤ λ(G_n \ (A^C ∩ Ω_n)) = 0.

Letting F = ∪_n F_n, it follows that F ∈ ℱ and

    λ(A \ F) ≤ ∑_{k=1}^∞ λ(A_k \ F_k) = 0.
Also, there exists G_n ⊇ A_n such that µ(G_n) = λ(G_n) = λ(A_n). Since the measures are finite, it follows that λ(G_n \ A_n) = 0. Then letting G = ∪_{n=1}^∞ G_n, it follows that G ⊇ A and

    λ(G \ A) = λ(∪_{n=1}^∞ G_n \ ∪_{n=1}^∞ A_n) ≤ λ(∪_{n=1}^∞ (G_n \ A_n)) ≤ ∑_{n=1}^∞ λ(G_n \ A_n) = 0.

Thus µ(G \ F) = λ(G \ F) = λ(G \ A) + λ(A \ F) = 0.
If you have (λ′, 𝒢′) complete and satisfying 9.1.1, then letting E ∈ 𝒢′, it follows from 5 that there exist F, G ∈ ℱ such that

    F ⊆ E ⊆ G,  µ(G \ F) = 0 = λ(G \ F).

Therefore, by completeness of the two measure spaces, E ∈ 𝒢. The opposite inclusion is similar. Hence 𝒢 = 𝒢′. If E ∈ 𝒢, let F ⊆ E ⊆ G where µ(G \ F) = 0. Then

    λ(E) ≤ µ(G) = µ(F) = λ′(F) ≤ λ′(E)

The opposite inequality holds by the same reasoning. Hence λ = λ′.
Theorem 9.1.3 Let (Ω, ℱ, µ) be a complete measure space and let f ≤ g ≤ h be functions having values in [0, ∞]. Suppose also that f(ω) = h(ω) a.e. ω and that f and h are measurable. Then g is also measurable. If (Ω, 𝒢, λ) is the completion of a σ finite measure space (Ω, ℱ, µ) as described above in Theorem 9.1.2, then if f is measurable with respect to 𝒢 having values in [0, ∞], it follows there exist functions h, g measurable with respect to ℱ, h ≤ f ≤ g, such that g = h a.e. Also if f has complex values and is 𝒢 measurable, there exist h, g which are ℱ measurable such that |h| ≤ |f| ≤ |g| and h = f = g a.e.

Proof: Consider the first claim. g = f + 𝒳_{[g>f]} (g − f). f is given to be measurable. The second term on the right is also measurable because the set [𝒳_{[g>f]} (g − f) > α] equals Ω if α < 0, while if α ≥ 0, the set is contained in [h − f > 0], which is a measurable set of measure zero.
Now consider the last assertion. By Theorem 7.5.6 on Page 175 there exists an increasing sequence of nonnegative simple functions {s_n} measurable with respect to 𝒢 which converges pointwise to f. Letting

    s_n(ω) = ∑_{k=1}^{m_n} c_k^n 𝒳_{E_k^n}(ω)    (9.1.2)

be one of these simple functions, it follows from Theorem 9.1.2 there exist sets F_k^n ∈ ℱ such that F_k^n ⊆ E_k^n ⊆ G_k^n and µ(G_k^n \ F_k^n) = 0. Then let

    b_n(ω) ≡ ∑_{k=1}^{m_n} c_k^n 𝒳_{F_k^n}(ω),  t_n(ω) = ∑_{k=1}^{m_n} c_k^n 𝒳_{G_k^n}(ω).

Hence b_n ≤ s_n ≤ t_n, and t_n = b_n a.e. Let

    g(ω) ≡ lim sup_{n→∞} t_n(ω)

and let

    h(ω) = lim inf_{n→∞} b_n(ω).

Then h, g are ℱ measurable, h ≤ f ≤ g, and both h, g equal f off N ≡ ∪_n [t_n − b_n > 0], because off this set of measure zero both functions equal s_n, which converges to f.

The last claim follows from considering the positive and negative parts of real and imaginary parts.
There is a certain property called regularity which is very important. Recall the definition of the Borel sets in Definition 7.3.3.

Definition 9.1.4 Let (X, ℱ, µ) be a measure space where the σ algebra ℱ contains B(X), the Borel sets. This is called a regular measure space if for every E ∈ ℱ,

    µ(E) = sup {µ(K) : K ⊆ E, K compact}

and

    µ(E) = inf {µ(V) : V ⊇ E, V open}

Note that this is equivalent to the existence of a set F ⊆ E which is the countable union of compact sets and a set G ⊇ E which is the countable intersection of open sets such that

    µ(E) = µ(F) = µ(G).

Sets which are the countable intersection of open sets are called G_δ sets and those which are the countable union of compact sets are called F_σ sets.¹ In the context of a σ finite measure space, this can be strengthened to mean the same as µ(G \ F) = 0 and it is in this form that regularity will usually be referred to. The following theorem is very interesting in this regard.

¹Actually, F_σ usually means that the set is a countable union of closed sets, but in the situations considered in this book it is usually the same thing.
Theorem 9.1.5 Let µ be a measure defined on the Borel sets of ℝ^n which is finite on compact sets. Then if E is any Borel set, there exists F a countable union of compact sets and G, a countable intersection of open sets, such that F ⊆ E ⊆ G and µ(G \ F) = 0.
Proof: First suppose that µ is a finite measure. Let ℱ denote those Borel sets E for which the following two conditions hold; these will be referred to as inner regularity on E and outer regularity on E.

    µ(E) = sup {µ(K) : K compact and K ⊆ E}
    µ(E) = inf {µ(V) : V open and V ⊇ E}

Then ℱ contains the open sets because every open set is the union of an increasing sequence of compact sets. If U is open, simply let

    K_n = {x : dist(x, U^C) ≥ 1/n} ∩ B(0, n).

Therefore, µ(U) = lim_{n→∞} µ(K_n). It is obvious that if U is open, then there exists V open containing U such that µ(V) = µ(U). Just take V = U.
Similar reasoning shows that the closed sets are in ℱ.

Next suppose E ∈ ℱ. There exists K compact, K ⊆ E, such that

    µ(E) < µ(K) + ε.

But since µ is a measure, µ(E) − µ(K) = µ(E \ K). Thus µ(E \ K) < ε. Therefore, K^C is an open set which contains E^C. Since µ(K^C \ E^C) = µ(E \ K), it follows that µ is outer regular on E^C.
By similar reasoning, there exists V ⊇ E, V open, such that µ(V \ E) < ε. It follows that V^C is a closed subset of E^C and µ(E^C \ V^C) = µ(V \ E) < ε. Now

    V^C = ∪_{n=1}^∞ (V^C ∩ B(0, n)) ≡ ∪_{n=1}^∞ K_n,

the union of an increasing sequence of compact sets. Therefore, for all n large enough,

    µ(E^C \ K_n) ≤ µ(E^C \ V^C) + µ(V^C \ K_n) < ε.
This has shown that ℱ is closed with respect to complements.

Next consider E_i ∈ ℱ. There exist open sets V_i ⊇ E_i such that µ(V_i \ E_i) < ε/2^i. Then

    µ((∪_i V_i) \ (∪_i E_i)) ≤ µ(∪_i (V_i \ E_i)) ≤ ∑_{i=1}^∞ µ(V_i \ E_i) < ε.

Thus µ is outer regular on ∪_i E_i.
Next, there exists K_i ⊆ E_i such that µ(K_i) + ε/2^i > µ(E_i). Then

    µ(∪_{i=1}^∞ E_i \ ∪_{i=1}^∞ K_i) ≤ µ(∪_i (E_i \ K_i)) ≤ ∑_{i=1}^∞ µ(E_i \ K_i) < ε.
It follows that

    µ(∪_{i=1}^∞ E_i \ ∪_{i=1}^n K_i) ≤ µ(∪_{i=1}^∞ E_i \ ∪_{i=1}^∞ K_i) + µ(∪_{i=1}^∞ K_i \ ∪_{i=1}^n K_i) < ε

provided n is sufficiently large. Hence ℱ is a σ algebra which contains the open sets. Therefore, ℱ equals the Borel sets. It follows that for any E Borel, there exists F a countable union of compact sets and G a countable intersection of open sets such that G ⊇ E ⊇ F and µ(G \ F) = 0.
Next consider the case where µ is only known to be finite on compact sets. Then let A_1 ≡ B(0, 1), A_2 ≡ B(0, 2) \ B(0, 1), ..., A_n ≡ B(0, n) \ B(0, n − 1), .... Thus µ is finite on each of these Borel sets A_i. Let µ_n(E) ≡ µ(E ∩ A_n). Thus µ_n is a finite Borel measure and for E Borel,

    µ(E) = ∑_{n=1}^∞ µ_n(E).

Applying the above to each µ_n, there exist F_n, a countable union of compact sets, and G_n, a countable intersection of open sets, such that G_n ⊇ E ⊇ F_n and µ_n(G_n \ F_n) = 0. Let G ≡ ∩_n G_n and F ≡ ∪_{n=1}^∞ F_n. Then G is a countable intersection of open sets, F is a countable union of compact sets, G ⊇ E ⊇ F, and

    µ(G \ F) ≤ ∑_{n=1}^∞ µ_n(G_n \ F_n) = 0.
The following little theorem shows that if you have a measure which is regular on B(X), then the completion is also regular on the enlarged σ algebra coming from the completion.

Theorem 9.1.6 Suppose µ is a regular measure defined on B(Ω), the Borel sets of Ω, a topological space. Also suppose µ is σ finite. Then denoting by (Ω, B̄(Ω), µ̄) the completion of (Ω, B(Ω), µ), it follows µ̄ is also regular.

Proof: By Theorem 9.1.2, for F ∈ B̄(Ω) there exist sets H, G of B(Ω) such that H ⊆ F ⊆ G and µ(G \ H) = 0. By regularity of µ, there exist G′ a G_δ set and H′ an F_σ set such that G′ ⊇ G, H′ ⊆ H, and µ(H \ H′) = µ(G′ \ G) = 0. Then

    µ̄(G′ \ H′) = µ(G′ \ H′) ≤ µ(G′ \ G) + µ(G \ H) + µ(H \ H′) = 0.
A repeat of the above argument yields the following corollary.
Corollary 9.1.7 The conclusion of the above theorem holds for Ω replaced with Y
where Y is a closed subset of Ω.
9.2 Dynkin Systems
The approach to n dimensional Lebesgue measure will be based on a very elegant
idea due to Dynkin.
Definition 9.2.1 Let Ω be a set and let 𝒦 be a collection of subsets of Ω. Then 𝒦 is called a π system if ∅, Ω ∈ 𝒦 and whenever A, B ∈ 𝒦, it follows A ∩ B ∈ 𝒦.

For example, if Ω = ℝ^n, an example of a π system would be the set of all open sets. Another example would be sets of the form ∏_{k=1}^n A_k where A_k is a Lebesgue measurable set.
The following is the fundamental lemma which shows these π systems are useful.

Lemma 9.2.2 Let 𝒦 be a π system of subsets of Ω, a set. Also let 𝒢 be a collection of subsets of Ω which satisfies the following three properties.

1. 𝒦 ⊆ 𝒢

2. If A ∈ 𝒢, then A^C ∈ 𝒢

3. If {A_i}_{i=1}^∞ is a sequence of disjoint sets from 𝒢 then ∪_{i=1}^∞ A_i ∈ 𝒢.

Then 𝒢 ⊇ σ(𝒦), where σ(𝒦) is the smallest σ algebra which contains 𝒦.
Proof: First note that if

    ℋ ≡ {𝒢 : 1 − 3 all hold}

then ∩ℋ yields a collection of sets which also satisfies 1 − 3. Therefore, I will assume in the argument that 𝒢 is the smallest collection of sets satisfying 1 − 3, the intersection of all such collections. Let A ∈ 𝒦 and define

    𝒢_A ≡ {B ∈ 𝒢 : A ∩ B ∈ 𝒢}.

I want to show 𝒢_A satisfies 1 − 3 because then it must equal 𝒢, since 𝒢 is the smallest collection of subsets of Ω which satisfies 1 − 3. This will give the conclusion that for A ∈ 𝒦 and B ∈ 𝒢, A ∩ B ∈ 𝒢. This information will then be used to show that if A, B ∈ 𝒢 then A ∩ B ∈ 𝒢. From this it will follow very easily that 𝒢 is a σ algebra, which will imply it contains σ(𝒦). Now here are the details of the argument.

Since 𝒦 is given to be a π system, 𝒦 ⊆ 𝒢_A. Property 3 is obvious because if {B_i} is a sequence of disjoint sets in 𝒢_A, then

    A ∩ ∪_{i=1}^∞ B_i = ∪_{i=1}^∞ (A ∩ B_i) ∈ 𝒢

because A ∩ B_i ∈ 𝒢 and by property 3 of 𝒢.

It remains to verify Property 2, so let B ∈ 𝒢_A. I need to verify that B^C ∈ 𝒢_A. In other words, I need to show that A ∩ B^C ∈ 𝒢. However, consider the following picture in which the shaded area is A \ B = A ∩ B^C.
[Picture: two intersecting sets A and B, with the region A \ B = A ∩ B^C shaded.]
Thus it is easy to see that

    A ∩ B^C = (A^C ∪ (A ∩ B))^C ∈ 𝒢

Here is why. Since B ∈ 𝒢_A, A ∩ B ∈ 𝒢, and since A ∈ 𝒦 ⊆ 𝒢 it follows A^C ∈ 𝒢. It follows the union of the disjoint sets A^C and A ∩ B is in 𝒢, and then from 2 the complement of their union is in 𝒢. Thus 𝒢_A satisfies 1 − 3 and this implies, since 𝒢 is the smallest such collection, that 𝒢_A ⊇ 𝒢. However, 𝒢_A is constructed as a subset of 𝒢 and so 𝒢 = 𝒢_A. This proves that for every B ∈ 𝒢 and A ∈ 𝒦, A ∩ B ∈ 𝒢. Now pick B ∈ 𝒢 and consider

    𝒢_B ≡ {A ∈ 𝒢 : A ∩ B ∈ 𝒢}.

I just proved 𝒦 ⊆ 𝒢_B. The other arguments are identical to show 𝒢_B satisfies 1 − 3 and is therefore equal to 𝒢. This shows that whenever A, B ∈ 𝒢 it follows A ∩ B ∈ 𝒢.

This implies 𝒢 is a σ algebra. To show this, all that is left is to verify 𝒢 is closed under countable unions. Let {A_i} ⊆ 𝒢. Then let A′_1 = A_1 and

    A′_{n+1} ≡ A_{n+1} \ (∪_{i=1}^n A_i) = A_{n+1} ∩ (∩_{i=1}^n A_i^C) = ∩_{i=1}^n (A_{n+1} ∩ A_i^C) ∈ 𝒢

because it was just shown that finite intersections of sets of 𝒢 are in 𝒢. Since the A′_i are disjoint, it follows

    ∪_{i=1}^∞ A_i = ∪_{i=1}^∞ A′_i ∈ 𝒢

Therefore, 𝒢 ⊇ σ(𝒦) because it is a σ algebra which contains 𝒦.
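Lemma 9.2.2 can be checked by brute force on a finite Ω (a sketch I added; the particular π system below is an arbitrary choice): close 𝒦 under complements and disjoint unions only, and verify that the result still contains σ(𝒦).

```python
omega = frozenset(range(4))
K = {frozenset(), omega, frozenset({0, 1}), frozenset({0, 2}), frozenset({0})}  # a π system

def close(start, disjoint_only):
    """Smallest collection containing `start`, closed under complements and
    under pairwise unions (of disjoint pairs only, when disjoint_only is True)."""
    G = set(start)
    while True:
        new = {omega - A for A in G}
        for A in G:
            for B in G:
                if (not disjoint_only) or A.isdisjoint(B):
                    new.add(A | B)
        if new <= G:
            return G
        G |= new

lam = close(K, disjoint_only=True)       # satisfies properties 1-3 of the lemma
sigma_K = close(K, disjoint_only=False)  # the σ algebra generated by K

assert sigma_K <= lam                    # the lemma's conclusion: lam ⊇ σ(K)
assert len(sigma_K) == 16                # here σ(K) is the whole power set of Ω
```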
9.3 n Dimensional Lebesgue Measure And Integrals
9.3.1 Iterated Integrals
Let m denote one dimensional Lebesgue measure. That is, it is the Lebesgue Stieltjes measure which comes from the integrator function F(x) = x. Also let the σ algebra of measurable sets be denoted by ℱ. Recall this σ algebra contained the open sets. Also from the construction given above,

    m([a, b]) = m((a, b)) = b − a
Definition 9.3.1 Let f be a function of n variables and consider the symbol

    ∫ ··· ∫ f(x_1, ..., x_n) dx_{i_1} ··· dx_{i_n}.    (9.3.3)

where (i_1, ..., i_n) is a permutation of the integers {1, 2, ..., n}. The symbol means to first do the Lebesgue integral

    ∫ f(x_1, ..., x_n) dx_{i_1}

yielding a function of the other n − 1 variables given above. Then you do

    ∫ (∫ f(x_1, ..., x_n) dx_{i_1}) dx_{i_2}

and continue this way. The iterated integral is said to make sense if the process just described makes sense at each step. Thus, to make sense, it is required that

    x_{i_1} → f(x_1, ..., x_n)

can be integrated. Either the function has values in [0, ∞] and is measurable or it is a function in L^1. Then it is required that

    x_{i_2} → ∫ f(x_1, ..., x_n) dx_{i_1}

can be integrated, and so forth. The symbol in 9.3.3 is called an iterated integral.
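A finite-sum analogue of the definition above (my illustration): for a nonnegative function, the two orders of iterated (midpoint-rule) integration over [0,1]² agree, here for f(x, y) = x y².

```python
def f(x, y):
    return x * y * y

n = 400
h = 1.0 / n
mid = [(k + 0.5) * h for k in range(n)]

dy_dx = sum(sum(f(x, y) * h for y in mid) * h for x in mid)   # inner dy, then dx
dx_dy = sum(sum(f(x, y) * h for x in mid) * h for y in mid)   # inner dx, then dy

print(dy_dx, dx_dy)            # both approximate ∫_0^1 x dx · ∫_0^1 y² dy = 1/6
assert abs(dy_dx - dx_dy) < 1e-9
assert abs(dy_dx - 1.0 / 6.0) < 1e-4
```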
With the above explanation of iterated integrals, it is now time to deﬁne n
dimensional Lebesgue measure.
9.3.2 n Dimensional Lebesgue Measure And Integrals
With the Lemma about π systems given above and the monotone convergence
theorem, it is possible to give a very elegant and fairly easy deﬁnition of the Lebesgue
integral of a function of n real variables. This is done in the following proposition.
Notation 9.3.2 A set is called F_σ if it is a countable union of compact sets. A set is called G_δ if it is a countable intersection of open sets.
Proposition 9.3.3 There exists a σ algebra ℱ_n of sets of ℝ^n which contains the open sets and a measure m_n defined on this σ algebra such that if f : ℝ^n → [0, ∞) is measurable with respect to ℱ_n then for any permutation (i_1, ..., i_n) of {1, ..., n} it follows

    ∫_{ℝ^n} f dm_n = ∫ ··· ∫ f(x_1, ..., x_n) dx_{i_1} ··· dx_{i_n}    (9.3.4)

In particular, this implies that if A_i is Lebesgue measurable for each i = 1, ..., n then

    m_n(∏_{i=1}^n A_i) = ∏_{i=1}^n m(A_i).

Also for every Borel set E, there exist F an F_σ set and G a G_δ set such that F ⊆ E ⊆ G and m_n(G \ F) = 0.
Proof: Define a π system as

    𝒦 ≡ {∏_{i=1}^n A_i : A_i is Lebesgue measurable}

Also let R_p ≡ [−p, p]^n, the n dimensional rectangle having sides [−p, p]. A set F ⊆ ℝ^n will be said to satisfy property T if for every p ∈ ℕ and any two permutations of {1, 2, ..., n}, (i_1, ..., i_n) and (j_1, ..., j_n), the two iterated integrals

    ∫ ··· ∫ 𝒳_{R_p ∩ F} dx_{i_1} ··· dx_{i_n},  ∫ ··· ∫ 𝒳_{R_p ∩ F} dx_{j_1} ··· dx_{j_n}

make sense and are equal. Now define 𝒢 to be those subsets of ℝ^n which have property T.

Thus 𝒦 ⊆ 𝒢 because if (i_1, ..., i_n) is any permutation of {1, 2, ..., n} and

    A = ∏_{i=1}^n A_i ∈ 𝒦

then

    ∫ ··· ∫ 𝒳_{R_p ∩ A} dx_{i_1} ··· dx_{i_n} = ∏_{i=1}^n m([−p, p] ∩ A_i).
Now suppose F ∈ 𝒢 and let (i_1, ..., i_n) and (j_1, ..., j_n) be two permutations. Then

    R_p = (R_p ∩ F^C) ∪ (R_p ∩ F)

and so

    ∫ ··· ∫ 𝒳_{R_p ∩ F^C} dx_{i_1} ··· dx_{i_n} = ∫ ··· ∫ (𝒳_{R_p} − 𝒳_{R_p ∩ F}) dx_{i_1} ··· dx_{i_n}.

Since R_p ∈ 𝒢, the iterated integrals on the right and hence on the left make sense. Then continuing with the expression on the right and using that F ∈ 𝒢,

    ∫ ··· ∫ (𝒳_{R_p} − 𝒳_{R_p ∩ F}) dx_{i_1} ··· dx_{i_n} = (2p)^n − ∫ ··· ∫ 𝒳_{R_p ∩ F} dx_{i_1} ··· dx_{i_n}
    = (2p)^n − ∫ ··· ∫ 𝒳_{R_p ∩ F} dx_{j_1} ··· dx_{j_n}
    = ∫ ··· ∫ (𝒳_{R_p} − 𝒳_{R_p ∩ F}) dx_{j_1} ··· dx_{j_n} = ∫ ··· ∫ 𝒳_{R_p ∩ F^C} dx_{j_1} ··· dx_{j_n}

which shows that if F ∈ 𝒢 then so is F^C.
Next suppose {F_i}_{i=1}^∞ is a sequence of disjoint sets in 𝒢. Let F = ∪_{i=1}^∞ F_i. I need to show F ∈ 𝒢. Since the sets are disjoint,

    ∫ ··· ∫ 𝒳_{R_p ∩ F} dx_{i_1} ··· dx_{i_n} = ∫ ··· ∫ ∑_{k=1}^∞ 𝒳_{R_p ∩ F_k} dx_{i_1} ··· dx_{i_n}
    = ∫ ··· ∫ lim_{N→∞} ∑_{k=1}^N 𝒳_{R_p ∩ F_k} dx_{i_1} ··· dx_{i_n}
Do the iterated integrals make sense? Note that the iterated integral makes sense for ∑_{k=1}^N 𝒳_{R_p ∩ F_k} as the integrand because it is just a finite sum of functions for which the iterated integral makes sense. Therefore,

    x_{i_1} → ∑_{k=1}^∞ 𝒳_{R_p ∩ F_k}(x)

is measurable and by the monotone convergence theorem,

    ∫ ∑_{k=1}^∞ 𝒳_{R_p ∩ F_k}(x) dx_{i_1} = lim_{N→∞} ∫ ∑_{k=1}^N 𝒳_{R_p ∩ F_k} dx_{i_1}

Now each of the functions

    x_{i_2} → ∫ ∑_{k=1}^N 𝒳_{R_p ∩ F_k} dx_{i_1}

is measurable and so the limit of these functions,

    ∫ ∑_{k=1}^∞ 𝒳_{R_p ∩ F_k}(x) dx_{i_1},

is also measurable. Therefore, one can do another integral to this function. Continuing this way using the monotone convergence theorem, it follows the iterated integral makes sense. The same reasoning shows the iterated integral makes sense for any other permutation.
Now applying the monotone convergence theorem as needed,

    ∫ ··· ∫ 𝒳_{R_p ∩ F} dx_{i_1} ··· dx_{i_n}
    = ∫ ··· ∫ ∑_{k=1}^∞ 𝒳_{R_p ∩ F_k} dx_{i_1} ··· dx_{i_n}
    = ∫ ··· ∫ lim_{N→∞} ∑_{k=1}^N 𝒳_{R_p ∩ F_k} dx_{i_1} ··· dx_{i_n}
    = ∫ ··· ∫ lim_{N→∞} ∑_{k=1}^N (∫ 𝒳_{R_p ∩ F_k} dx_{i_1}) ··· dx_{i_n}
    = lim_{N→∞} ∑_{k=1}^N ∫ ··· ∫ 𝒳_{R_p ∩ F_k} dx_{i_1} ··· dx_{i_n}
    = lim_{N→∞} ∑_{k=1}^N ∫ ··· ∫ 𝒳_{R_p ∩ F_k} dx_{j_1} ··· dx_{j_n}

the last step holding because each F_k ∈ 𝒢. Then repeating the steps above in the opposite order, this equals

    ∫ ··· ∫ ∑_{k=1}^∞ 𝒳_{R_p ∩ F_k} dx_{j_1} ··· dx_{j_n} = ∫ ··· ∫ 𝒳_{R_p ∩ F} dx_{j_1} ··· dx_{j_n}

Thus F ∈ 𝒢. By Lemma 9.2.2, 𝒢 ⊇ σ(𝒦).
Let ℱ_n = σ(𝒦). Each set of the form ∏_{k=1}^n U_k where U_k is an open set is in 𝒦. Also every open set in ℝ^n is a countable union of open sets of this form. This follows from Lemma 7.1.5 on Page 163. Therefore, every open set is in ℱ_n. Thus σ(𝒦) ⊇ B(ℝ^n), the Borel sets of ℝ^n.
For F ∈ ℱ_n define

    m_n(F) ≡ lim_{p→∞} ∫ ··· ∫ 𝒳_{R_p ∩ F} dx_{j_1} ··· dx_{j_n}

where (j_1, ..., j_n) is a permutation of {1, ..., n}. It doesn't matter which one; it was shown above they all give the same result. I need to verify m_n is a measure. Let {F_k} be a sequence of disjoint sets of ℱ_n. Then

    m_n(∪_{k=1}^∞ F_k) = lim_{p→∞} ∫ ··· ∫ ∑_{k=1}^∞ 𝒳_{R_p ∩ F_k} dx_{j_1} ··· dx_{j_n}
    = lim_{p→∞} ∫ ··· ∫ lim_{m→∞} ∑_{k=1}^m 𝒳_{R_p ∩ F_k} dx_{j_1} ··· dx_{j_n}

Using the monotone convergence theorem repeatedly as in the first part of the argument, this equals

    ∑_{k=1}^∞ lim_{p→∞} ∫ ··· ∫ 𝒳_{R_p ∩ F_k} dx_{j_1} ··· dx_{j_n} ≡ ∑_{k=1}^∞ m_n(F_k).
Thus m_n is a measure. Now letting A_k be a Lebesgue measurable set,

    m_n(∏_{k=1}^n A_k) = lim_{p→∞} ∫ ··· ∫ ∏_{k=1}^n 𝒳_{[−p,p] ∩ A_k}(x_k) dx_{j_1} ··· dx_{j_n}
    = lim_{p→∞} ∏_{k=1}^n m([−p, p] ∩ A_k) = ∏_{k=1}^n m(A_k).
Next consider 9.3.4. It was shown above that for F ∈ ℱ_n it follows

    ∫_{ℝ^n} 𝒳_F dm_n = lim_{p→∞} ∫ ··· ∫ 𝒳_{R_p ∩ F} dx_{j_1} ··· dx_{j_n}

Applying the monotone convergence theorem repeatedly on the right, this yields that the iterated integral makes sense and

    ∫_{ℝ^n} 𝒳_F dm_n = ∫ ··· ∫ 𝒳_F dx_{j_1} ··· dx_{j_n}

It follows 9.3.4 holds for every nonnegative simple function in place of f because these are just linear combinations of functions 𝒳_F. Now taking an increasing sequence of nonnegative simple functions {s_k} which converges to a measurable nonnegative function f,

    ∫_{ℝ^n} f dm_n = lim_{k→∞} ∫_{ℝ^n} s_k dm_n = lim_{k→∞} ∫ ··· ∫ s_k dx_{j_1} ··· dx_{j_n} = ∫ ··· ∫ f dx_{j_1} ··· dx_{j_n}

The assertion about regularity on the Borel sets follows from Theorem 9.1.5 because the measure is finite on compact sets.
9.3.3 The Sigma Algebra Of Lebesgue Measurable Sets

In the above section, Lebesgue measure was defined on a σ algebra ℱ_n which was the smallest σ algebra containing 𝒦, the set of measurable rectangles, sets of the form ∏_{i=1}^n E_i for E_i a Lebesgue measurable set. However, this is not a complete measure space. To see this, consider A × B where m_1(B) = 0 and A is a non measurable set. Then A × B is a subset of ℝ × B and m_2(ℝ × B) = 0. However, A × B cannot be in σ(𝒦) because all sets E in σ(𝒦) have the property that E_y ≡ {x : (x, y) ∈ E} is a measurable set. In this case, pick y ∈ B and (A × B)_y = A, which is not measurable. This leads to the following definition.

Definition 9.3.4 The measure space for Lebesgue measure is (ℝ^n, ℱ_n, m_n) where ℱ_n is now taken to be the completion of (ℝ^n, B(ℝ^n), m_n), B(ℝ^n) denoting the Borel sets and m_n denoting n dimensional measure defined on B(ℝ^n).
The important thing about Lebesgue measure is that the measure space is complete and it is a regular measure space.

Theorem 9.3.5 The measure space (ℝ^n, ℱ_n, m_n) is complete and regular. This means that for every E ∈ ℱ_n, there exists an F_σ set F and a G_δ set G such that G ⊇ E ⊇ F and m_n(G \ F) = 0. Also if f ≥ 0 is ℱ_n measurable, then there exists g ≤ f such that g is Borel measurable and g = f a.e.

Proof: Let E ∈ ℱ_n. From Theorem 9.1.2 there exist Borel sets A, B such that A ⊆ E ⊆ B and m_n(B \ A) = 0. Now from Theorem 9.1.5, there exists a G_δ set G containing B and an F_σ set F contained in A such that m_n(A \ F) = m_n(G \ B) = 0. Then

    m_n(G \ F) = m_n(G \ B) + m_n(B \ A) + m_n(A \ F) = 0.
Now consider the last claim. Let s_k be an increasing sequence of simple functions which converges pointwise to f. Say s_k(x) = ∑_{i=1}^{m_k} c_i 𝒳_{E_i}(x) where E_i ∈ ℱ_n. Then let F_i ⊆ E_i be such that F_i is Borel measurable and m_n(E_i \ F_i) = 0. Letting N_k = ∪_{i=1}^{m_k} (E_i \ F_i) and N′ = ∪_{k=1}^∞ N_k, it follows m_n(N′) = 0. Now let N be a Borel measurable set such that N has measure zero and N ⊇ N′. Then each function in the sequence of simple functions given by

    s′_k ≡ 𝒳_{N^C} ∑_{i=1}^{m_k} c_i 𝒳_{F_i}

is Borel measurable, and the sequence converges to a Borel measurable function which equals f off the exceptional set of measure zero N and equals 0 on N.

I defined (ℱ_n, m_n) as the completion of (B(ℝ^n), m_n), but the same thing would have been obtained if I had used the σ algebra σ(𝒦) of the preceding section in place of B(ℝ^n). This can be shown from using the regularity of one dimensional Lebesgue measure. You might try and show this.
9.3.4 Fubini’s Theorem
Formula 9.3.4 is often called Fubini’s theorem. So is the following theorem. In
general, people tend to refer to theorems about the equality of iterated integrals as
Fubini’s theorem, and in fact Fubini did produce such theorems, but so did Tonelli
and some of these theorems presented here and above should be called Tonelli’s
theorem.
Theorem 9.3.6 Let m_n be defined in Proposition 9.3.3 on the σ algebra of sets ℱ_n given there. Suppose f ∈ L^1(ℝ^n, ℱ_n, m_n) (f is ℱ_n measurable). Then if (i_1, ..., i_n) is any permutation of {1, ..., n},

    ∫_{ℝ^n} f dm_n = ∫ ··· ∫ f(x) dx_{i_1} ··· dx_{i_n}.

In particular, iterated integrals for any permutation of {1, ..., n} are all equal.
Proof: It suffices to prove this for f having real values because if this is shown the general case is obtained by taking real and imaginary parts. Since f ∈ L^1(ℝ^n),

    ∫_{ℝ^n} |f| dm_n < ∞

and so both (1/2)(|f| + f) and (1/2)(|f| − f) are in L^1(ℝ^n) and are each nonnegative. Hence from Proposition 9.3.3,

    ∫_{ℝ^n} f dm_n = ∫_{ℝ^n} ((1/2)(|f| + f) − (1/2)(|f| − f)) dm_n
    = ∫_{ℝ^n} (1/2)(|f| + f) dm_n − ∫_{ℝ^n} (1/2)(|f| − f) dm_n
    = ∫ ··· ∫ (1/2)(|f(x)| + f(x)) dx_{i_1} ··· dx_{i_n} − ∫ ··· ∫ (1/2)(|f(x)| − f(x)) dx_{i_1} ··· dx_{i_n}
    = ∫ ··· ∫ ((1/2)(|f(x)| + f(x)) − (1/2)(|f(x)| − f(x))) dx_{i_1} ··· dx_{i_n}
    = ∫ ··· ∫ f(x) dx_{i_1} ··· dx_{i_n}
The following corollary is a convenient way to verify the hypotheses of the above theorem.

Corollary 9.3.7 Suppose f is measurable with respect to ℱ_n and suppose for some permutation (i_1, ..., i_n),

    ∫ ··· ∫ |f(x)| dx_{i_1} ··· dx_{i_n} < ∞

Then f ∈ L^1(ℝ^n).

Proof: By Proposition 9.3.3,

    ∫_{ℝ^n} |f| dm_n = ∫ ··· ∫ |f(x)| dx_{i_1} ··· dx_{i_n} < ∞

and so f is in L^1(ℝ^n).
In using Proposition 9.3.3 or Corollary 9.3.7 or Theorem 9.3.6 when f is only known to be ℱ_n measurable, one typically uses Theorem 9.3.5 to get a function g which is equal to f m_n a.e., but g is Borel and hence ℱ_n measurable. (You use this theorem on the positive and negative parts of the real and imaginary parts of f.) Then every Lebesgue integral mentioned above can be computed by using the iterated integral of g. However, if you are interested in a much fussier result, read the section on completion of product measure spaces below.

Since ℱ_n contains the Borel sets, all the above formulas pertain to the case where f is Borel measurable.
Example 9.3.8 Find the iterated integral

    ∫_0^1 ∫_x^1 (sin(y)/y) dy dx

Notice the limits. The iterated integral equals

    ∫_{ℝ^2} 𝒳_A(x, y) (sin(y)/y) dm_2

where

    A = {(x, y) : 0 ≤ x ≤ y ≤ 1}

Fubini's theorem can be applied because the function (x, y) → sin(y)/y is continuous except at y = 0 and can be redefined to be continuous there. The function is also bounded, so

    (x, y) → 𝒳_A(x, y) (sin(y)/y)

clearly is in L^1(ℝ^2). Therefore,

    ∫_{ℝ^2} 𝒳_A(x, y) (sin(y)/y) dm_2 = ∫ ∫ 𝒳_A(x, y) (sin(y)/y) dx dy
    = ∫_0^1 ∫_0^y (sin(y)/y) dx dy = ∫_0^1 sin(y) dy = 1 − cos(1)
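A numerical sanity check of this example (added by me; midpoint sums over a 1000 × 1000 grid): both orders of integration land near 1 − cos(1) ≈ 0.4597.

```python
import math

n = 1000
h = 1.0 / n
mid = [(k + 0.5) * h for k in range(n)]

# ∫_0^1 ∫_x^1 sin(y)/y dy dx : integrate in y first, over the region y > x.
order1 = sum(sum(math.sin(y) / y * h for y in mid if y > x) * h for x in mid)

# ∫_0^1 ∫_0^y sin(y)/y dx dy : swapped order; the inner dx integral is ≈ y · sin(y)/y.
order2 = sum(sum(h for x in mid if x < y) * (math.sin(y) / y) * h for y in mid)

exact = 1.0 - math.cos(1.0)
print(order1, order2, exact)
assert abs(order1 - exact) < 1e-3 and abs(order2 - exact) < 1e-3
```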
9.4 Product Measures

9.4.1 General Theory

The treatment of Lebesgue measure given above is a special case of something called product measure. You can take the product measure for any two finite or σ finite measures. A measure space (Ω, ℱ, µ) is called σ finite if there are measurable subsets Ω_n such that µ(Ω_n) < ∞ and Ω = ∪_{n=1}^∞ Ω_n.

Given two finite measure spaces (X, ℱ, µ) and (Y, 𝒮, ν), there is a way to define a σ algebra of subsets of X × Y, denoted by ℱ × 𝒮, and a measure, denoted by µ × ν, defined on this σ algebra such that

    (µ × ν)(A × B) = µ(A) ν(B)

whenever A ∈ ℱ and B ∈ 𝒮.

Definition 9.4.1 Let (X, ℱ, µ) and (Y, 𝒮, ν) be two measure spaces. A measurable rectangle is a set of the form A × B where A ∈ ℱ and B ∈ 𝒮.
With this lemma, it is easy to define product measure. Let (X, ℱ, µ) and (Y, 𝒮, ν) be two finite measure spaces. Define 𝒦 to be the set of measurable rectangles A × B, A ∈ ℱ and B ∈ 𝒮. Let

    𝒢 ≡ {E ⊆ X × Y : ∫_Y ∫_X 𝒳_E dµ dν = ∫_X ∫_Y 𝒳_E dν dµ}    (9.4.5)

where in the above, part of the requirement is for all integrals to make sense.
Then 𝒦 ⊆ 𝒢. This is obvious.

Next I want to show that if E ∈ 𝒢 then E^C ∈ 𝒢. Observe 𝒳_{E^C} = 1 − 𝒳_E and so

    ∫_Y ∫_X 𝒳_{E^C} dµ dν = ∫_Y ∫_X (1 − 𝒳_E) dµ dν = ∫_X ∫_Y (1 − 𝒳_E) dν dµ = ∫_X ∫_Y 𝒳_{E^C} dν dµ

which shows that if E ∈ 𝒢, then E^C ∈ 𝒢.
Next I want to show 𝒢 is closed under countable unions of disjoint sets of 𝒢. Let {A_i} be a sequence of disjoint sets from 𝒢. Then

    ∫_Y ∫_X 𝒳_{∪_{i=1}^∞ A_i} dµ dν = ∫_Y ∫_X ∑_{i=1}^∞ 𝒳_{A_i} dµ dν = ∫_Y ∑_{i=1}^∞ ∫_X 𝒳_{A_i} dµ dν
    = ∑_{i=1}^∞ ∫_Y ∫_X 𝒳_{A_i} dµ dν = ∑_{i=1}^∞ ∫_X ∫_Y 𝒳_{A_i} dν dµ
    = ∫_X ∑_{i=1}^∞ ∫_Y 𝒳_{A_i} dν dµ = ∫_X ∫_Y ∑_{i=1}^∞ 𝒳_{A_i} dν dµ
    = ∫_X ∫_Y 𝒳_{∪_{i=1}^∞ A_i} dν dµ,    (9.4.6)

the interchanges between the summation and the integral depending on the monotone convergence theorem. Thus 𝒢 is closed with respect to countable disjoint unions.

From Lemma 9.2.2, 𝒢 ⊇ σ(𝒦). Also the computation in 9.4.6 implies that on σ(𝒦) one can define a measure, denoted by µ × ν, and that for every E ∈ σ(𝒦),

    (µ × ν)(E) = ∫_Y ∫_X 𝒳_E dµ dν = ∫_X ∫_Y 𝒳_E dν dµ.    (9.4.7)
Now here is Fubini's theorem.

Theorem 9.4.2 Let f : X × Y → [0, ∞] be measurable with respect to the σ algebra σ(𝒦) just defined and let µ × ν be the product measure of 9.4.7 where µ and ν are finite measures on (X, ℱ) and (Y, 𝒮) respectively. Then

    ∫_{X×Y} f d(µ × ν) = ∫_Y ∫_X f dµ dν = ∫_X ∫_Y f dν dµ.

Proof: Let {s_n} be an increasing sequence of σ(𝒦) measurable simple functions which converges pointwise to f. The above equation holds for s_n in place of f from what was shown above. The final result follows from passing to the limit and using the monotone convergence theorem.

The symbol ℱ × 𝒮 denotes σ(𝒦).
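For finite measure spaces with finitely many points, the content of this theorem reduces to swapping finite sums; here is a toy check (my own example, with arbitrarily chosen point masses):

```python
# Fubini for two finite measures on finite sets: iterated sums agree.
X = [0, 1, 2]
Y = [0, 1]
mu = {0: 0.5, 1: 1.0, 2: 2.0}       # a finite measure on X (point masses)
nu = {0: 3.0, 1: 0.25}              # a finite measure on Y

def f(x, y):
    return (x + 1) * (y + 2)

int_x_then_y = sum(sum(f(x, y) * mu[x] for x in X) * nu[y] for y in Y)
int_y_then_x = sum(sum(f(x, y) * nu[y] for y in Y) * mu[x] for x in X)

print(int_x_then_y, int_y_then_x)
assert abs(int_x_then_y - int_y_then_x) < 1e-12

# (µ × ν)(A × B) = µ(A) ν(B) for a measurable rectangle A × B.
A, B = {1, 2}, {0}
rect = sum(mu[x] * nu[y] for x in A for y in B)
assert abs(rect - (mu[1] + mu[2]) * nu[0]) < 1e-12
```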
Of course one can generalize right away to measures which are only σ finite.

Theorem 9.4.3 Let f : X × Y → [0, ∞] be measurable with respect to the σ algebra σ(𝒦) just defined and let µ × ν be the product measure of 9.4.7 where µ and ν are σ finite measures on (X, ℱ) and (Y, 𝒮) respectively. Then

    ∫_{X×Y} f d(µ × ν) = ∫_Y ∫_X f dµ dν = ∫_X ∫_Y f dν dµ.
Proof: Since the measures are σ finite, there exist increasing sequences of sets {X_n} and {Y_n} such that µ(X_n) < ∞ and ν(Y_n) < ∞. Then µ and ν restricted to X_n and Y_n respectively are finite. Then from Theorem 9.4.2,

    ∫_{Y_n} ∫_{X_n} f dµ dν = ∫_{X_n} ∫_{Y_n} f dν dµ

Passing to the limit yields

    ∫_Y ∫_X f dµ dν = ∫_X ∫_Y f dν dµ

whenever f is as above. In particular, you could take f = 𝒳_E where E ∈ ℱ × 𝒮 and define

    (µ × ν)(E) ≡ ∫_Y ∫_X 𝒳_E dµ dν = ∫_X ∫_Y 𝒳_E dν dµ.

Then just as in the proof of Theorem 9.4.2, the conclusion of this theorem is obtained.
It is also useful to note that all the above holds for ∏_{i=1}^n X_i in place of X × Y. You would simply modify the definition of 𝒢 in 9.4.5, including all permutations for the iterated integrals, and for 𝒦 you would use sets of the form ∏_{i=1}^n A_i where A_i is measurable. Everything goes through exactly as above. Thus the following is obtained.

Theorem 9.4.4 Let {(X_i, ℱ_i, µ_i)}_{i=1}^n be σ finite measure spaces and let ∏_{i=1}^n ℱ_i denote the smallest σ algebra which contains the measurable boxes of the form ∏_{i=1}^n A_i where A_i ∈ ℱ_i. Then there exists a measure λ defined on ∏_{i=1}^n ℱ_i such that if f : ∏_{i=1}^n X_i → [0, ∞] is ∏_{i=1}^n ℱ_i measurable, and (i_1, ..., i_n) is any permutation of (1, ..., n), then

    ∫ f dλ = ∫_{X_{i_n}} ··· ∫_{X_{i_1}} f dµ_{i_1} ··· dµ_{i_n}
9.4.2 Completion Of Product Measure Spaces
Using Theorem 9.1.3 it is easy to give a generalization to yield a theorem for the
completion of product spaces.
Theorem 9.4.5 Let $\{(X_i,\mathcal{F}_i,\mu_i)\}_{i=1}^n$ be σ finite measure spaces and let $\prod_{i=1}^n\mathcal{F}_i$ denote the smallest σ algebra which contains the measurable boxes of the form $\prod_{i=1}^n A_i$ where $A_i\in\mathcal{F}_i$. Then there exists a measure $\lambda$ defined on $\prod_{i=1}^n\mathcal{F}_i$ such that if $f:\prod_{i=1}^n X_i\to[0,\infty]$ is $\prod_{i=1}^n\mathcal{F}_i$ measurable, and $(i_1,\dots,i_n)$ is any permutation of $(1,\dots,n)$, then
$$\int f\,d\lambda=\int_{X_{i_n}}\cdots\int_{X_{i_1}}f\,d\mu_{i_1}\cdots d\mu_{i_n}$$
Let $\left(\prod_{i=1}^n X_i,\ \overline{\prod_{i=1}^n\mathcal{F}_i},\ \overline{\lambda}\right)$ denote the completion of this product measure space and let
$$f:\prod_{i=1}^n X_i\to[0,\infty]$$
be $\overline{\prod_{i=1}^n\mathcal{F}_i}$ measurable. Then there exists $N\in\prod_{i=1}^n\mathcal{F}_i$ such that $\lambda(N)=0$ and a nonnegative function $f_1$ measurable with respect to $\prod_{i=1}^n\mathcal{F}_i$ such that $f_1=f$ off $N$ and if $(i_1,\dots,i_n)$ is any permutation of $(1,\dots,n)$, then
$$\int f\,d\overline{\lambda}=\int_{X_{i_n}}\cdots\int_{X_{i_1}}f_1\,d\mu_{i_1}\cdots d\mu_{i_n}.$$
Furthermore, $f_1$ may be chosen to satisfy either $f_1\le f$ or $f_1\ge f$.
Proof: This follows immediately from Theorem 9.4.4 and Theorem 9.1.3. By the second theorem, there exists a function $f_1\ge f$ such that $f_1=f$ for all $(x_1,\dots,x_n)\notin N$, a set of $\prod_{i=1}^n\mathcal{F}_i$ having measure zero. Then by Theorem 9.1.2 and Theorem 9.4.4,
$$\int f\,d\overline{\lambda}=\int f_1\,d\lambda=\int_{X_{i_n}}\cdots\int_{X_{i_1}}f_1\,d\mu_{i_1}\cdots d\mu_{i_n}.$$
Since $f_1=f$ off a set of measure zero, I will dispense with the subscript. Also it is customary to write
$$\lambda=\mu_1\times\cdots\times\mu_n$$
and
$$\overline{\lambda}=\overline{\mu_1\times\cdots\times\mu_n}.$$
Thus in more standard notation, one writes
$$\int f\,d(\mu_1\times\cdots\times\mu_n)=\int_{X_{i_n}}\cdots\int_{X_{i_1}}f\,d\mu_{i_1}\cdots d\mu_{i_n}$$
This theorem is often referred to as Fubini’s theorem. The next theorem is also called this.
Corollary 9.4.6 Suppose $f\in L^1\left(\prod_{i=1}^n X_i,\ \prod_{i=1}^n\mathcal{F}_i,\ \mu_1\times\cdots\times\mu_n\right)$ where each $X_i$ is a σ finite measure space. Then if $(i_1,\dots,i_n)$ is any permutation of $(1,\dots,n)$, it follows
$$\int f\,d(\mu_1\times\cdots\times\mu_n)=\int_{X_{i_n}}\cdots\int_{X_{i_1}}f\,d\mu_{i_1}\cdots d\mu_{i_n}.$$
Proof: Just apply Theorem 9.4.5 to the positive and negative parts of the real
and imaginary parts of f.
Here is another easy corollary.
Corollary 9.4.7 Suppose in the situation of Corollary 9.4.6, $f=f_1$ off $N$, a set of $\prod_{i=1}^n\mathcal{F}_i$ having $\overline{\mu_1\times\cdots\times\mu_n}$ measure zero, and that $f_1$ is a complex valued function measurable with respect to $\prod_{i=1}^n\mathcal{F}_i$. Suppose also that for some permutation $(j_1,\dots,j_n)$ of $(1,2,\dots,n)$,
$$\int_{X_{j_n}}\cdots\int_{X_{j_1}}|f_1|\,d\mu_{j_1}\cdots d\mu_{j_n}<\infty.$$
Then
$$f\in L^1\left(\prod_{i=1}^n X_i,\ \overline{\prod_{i=1}^n\mathcal{F}_i},\ \overline{\mu_1\times\cdots\times\mu_n}\right)$$
and the conclusion of Corollary 9.4.6 holds.
Proof: Since $|f_1|$ is $\prod_{i=1}^n\mathcal{F}_i$ measurable, it follows from Theorem 9.4.4 that
$$\infty>\int_{X_{j_n}}\cdots\int_{X_{j_1}}|f_1|\,d\mu_{j_1}\cdots d\mu_{j_n}=\int|f_1|\,d(\mu_1\times\cdots\times\mu_n)$$
$$=\int|f_1|\,d\left(\overline{\mu_1\times\cdots\times\mu_n}\right)=\int|f|\,d\left(\overline{\mu_1\times\cdots\times\mu_n}\right).$$
Thus $f\in L^1\left(\prod_{i=1}^n X_i,\ \overline{\prod_{i=1}^n\mathcal{F}_i},\ \overline{\mu_1\times\cdots\times\mu_n}\right)$ as claimed and the rest follows from Corollary 9.4.6.

The following lemma is also useful.
Lemma 9.4.8 Let $(X,\mathcal{F},\mu)$ and $(Y,\mathcal{S},\nu)$ be σ finite complete measure spaces and suppose $f\ge 0$ is $\overline{\mathcal{F}\times\mathcal{S}}$ measurable. Then for a.e. $x$,
$$y\to f(x,y)$$
is $\mathcal{S}$ measurable. Similarly for a.e. $y$,
$$x\to f(x,y)$$
is $\mathcal{F}$ measurable.

Proof: By Theorem 9.1.3, there exist $\mathcal{F}\times\mathcal{S}$ measurable functions $g$ and $h$ and a set $N\in\mathcal{F}\times\mathcal{S}$ of $\mu\times\nu$ measure zero such that $g\le f\le h$ and for $(x,y)\notin N$, it follows that $g(x,y)=h(x,y)$. Then
$$\int_X\int_Y g\,d\nu\,d\mu=\int_X\int_Y h\,d\nu\,d\mu$$
and so for a.e. $x$,
$$\int_Y g\,d\nu=\int_Y h\,d\nu.$$
Then it follows that for these values of $x$, $g(x,y)=h(x,y)$ for ν a.e. $y$ and so by Theorem 9.1.3 again and the assumption that $(Y,\mathcal{S},\nu)$ is complete, $y\to f(x,y)$ is $\mathcal{S}$ measurable. The other claim is similar.
9.5 Exercises
1. ♠ Find
$$\int_0^2\int_0^{6-2z}\int_{\frac{1}{2}x}^{3-z}(3-z)\cos\left(y^2\right)dy\,dx\,dz.$$

2. Find
$$\int_0^1\int_0^{18-3z}\int_{\frac{1}{3}x}^{6-z}(6-z)\exp\left(y^2\right)dy\,dx\,dz.$$

3. Find
$$\int_0^2\int_0^{24-4z}\int_{\frac{1}{4}y}^{6-z}(6-z)\exp\left(x^2\right)dx\,dy\,dz.$$

4. Find
$$\int_0^1\int_0^{12-4z}\int_{\frac{1}{4}y}^{3-z}\frac{\sin x}{x}\,dx\,dy\,dz.$$

5. Find
$$\int_0^{20}\int_0^1\int_{\frac{1}{5}y}^{5-z}\frac{\sin x}{x}\,dx\,dz\,dy+\int_{20}^{25}\int_0^{5-\frac{1}{5}y}\int_{\frac{1}{5}y}^{5-z}\frac{\sin x}{x}\,dx\,dz\,dy.$$
Hint: You might try doing it in the order $dy\,dx\,dz$.
6. ♠ Explain why for each $t>0$, $x\to e^{-tx}$ is a function in $L^1(\mathbb{R})$ and
$$\int_0^\infty e^{-tx}\,dx=\frac{1}{t}.$$
Thus
$$\int_0^R\frac{\sin(t)}{t}\,dt=\int_0^R\int_0^\infty\sin(t)\,e^{-tx}\,dx\,dt$$
Now explain why you can change the order of integration in the above iterated integral. Then compute what you get. Next pass to a limit as $R\to\infty$ and show
$$\int_0^\infty\frac{\sin(t)}{t}\,dt=\frac{1}{2}\pi$$
7. ♠ Explain why $\int_a^\infty f(t)\,dt\equiv\lim_{r\to\infty}\int_a^r f(t)\,dt$ whenever $f\in L^1(a,\infty)$; that is, $f\mathcal{X}_{[a,\infty)}\in L^1(\mathbb{R})$.
8. ♠ Let $f(y)=g(y)=|y|^{-1/2}$ if $y\in(-1,0)\cup(0,1)$ and $f(y)=g(y)=0$ if $y\notin(-1,0)\cup(0,1)$. For which values of $x$ does it make sense to write the integral $\int_{\mathbb{R}}f(x-y)\,g(y)\,dy$?
9. ♠ Let $E_i$ be a Borel set in $\mathbb{R}$. Show that $\prod_{i=1}^n E_i$ is a Borel set in $\mathbb{R}^n$.
10. ♠ Let $\{a_n\}$ be an increasing sequence of numbers in $(0,1)$ which converges to 1. Let $g_n$ be a nonnegative function which equals zero outside $(a_n,a_{n+1})$ such that $\int g_n\,dx=1$. Now for $(x,y)\in[0,1)\times[0,1)$ define
$$f(x,y)\equiv\sum_{n=1}^\infty g_n(y)\left(g_n(x)-g_{n+1}(x)\right).$$
Explain why this is actually a finite sum for each such $(x,y)$, so there are no convergence questions in the infinite sum. Explain why $f$ is a continuous function on $[0,1)\times[0,1)$. You can extend $f$ to equal zero off $[0,1)\times[0,1)$ if you like. Show the iterated integrals exist but are not equal. In fact, show
$$\int_0^1\int_0^1 f(x,y)\,dy\,dx=1\ne 0=\int_0^1\int_0^1 f(x,y)\,dx\,dy.$$
Does this example contradict the Fubini theorem? Explain why or why not.
11. ♠ Let $f:[a,b]\to\mathbb{R}$ be Riemann integrable. Thus $f$ is a bounded function and by Darboux’s theorem, there exists a unique number between all the upper sums and lower sums of $f$, this number being the Riemann integral. Show that $f$ is Lebesgue measurable and
$$\int_a^b f(x)\,dx=\int_{[a,b]}f\,dm$$
where the second integral in the above is the Lebesgue integral taken with respect to one dimensional Lebesgue measure and the first is the ordinary Riemann integral.
12. ♠ Let $(\Omega,\mathcal{F},\mu)$ be a σ finite measure space and let $f:\Omega\to[0,\infty)$ be measurable. Also let $\phi:[0,\infty)\to\mathbb{R}$ be increasing with $\phi(0)=0$ and $\phi$ a $C^1$ function. Show that
$$\int_\Omega\phi\circ f\,d\mu=\int_0^\infty\phi'(t)\,\mu([f>t])\,dt.$$
Hint: This can be done using the following steps. Let $t_i^n=i2^{-n}$. Show that
$$\mathcal{X}_{[f>t]}(\omega)=\lim_{n\to\infty}\sum_{i=0}^\infty\mathcal{X}_{[f>t_{i+1}^n]}(\omega)\,\mathcal{X}_{[t_i^n,t_{i+1}^n)}(t)$$
Now this is a countable sum of $\mathcal{F}\times\mathcal{B}([0,\infty))$ measurable functions and so it follows that $(t,\omega)\to\mathcal{X}_{[f>t]}(\omega)$ is $\mathcal{F}\times\mathcal{B}([0,\infty))$ measurable. Consequently, so is $\mathcal{X}_{[f>t]}(\omega)\,\phi'(t)$. Note that it is important in the argument to have $f>t$. Now observe
$$\int_\Omega\phi\circ f\,d\mu=\int_\Omega\int_0^{f(\omega)}\phi'(t)\,dt\,d\mu=\int_\Omega\int_0^\infty\mathcal{X}_{[f>t]}(\omega)\,\phi'(t)\,dt\,d\mu$$
Use Fubini’s theorem. For your information, this does not require the measure space to be σ finite. You can use a different argument which ties in to the first definition of the Lebesgue integral. The function $t\to\mu([f>t])$ is called the distribution function.
13. ♠ Give a different proof of the above as follows. First suppose $f$ is a simple function,
$$f(\omega)=\sum_{k=1}^n a_k\,\mathcal{X}_{E_k}(\omega)$$
where the $a_k$ are strictly increasing and $\phi(a_0)=a_0\equiv 0$. Then explain carefully the steps to the following argument.
$$\int_\Omega\phi\circ f\,d\mu=\sum_{i=1}^n\int_{\phi(a_{i-1})}^{\phi(a_i)}\mu([\phi\circ f>t])\,dt=\sum_{i=1}^n\int_{\phi(a_{i-1})}^{\phi(a_i)}\sum_{k=i}^n\mu(E_k)\,dt$$
$$=\sum_{i=1}^n\sum_{k=i}^n\mu(E_k)\int_{a_{i-1}}^{a_i}\phi'(t)\,dt=\sum_{i=1}^n\int_{a_{i-1}}^{a_i}\phi'(t)\sum_{k=i}^n\mu(E_k)\,dt$$
$$=\sum_{i=1}^n\int_{a_{i-1}}^{a_i}\phi'(t)\,\mu([f>t])\,dt=\int_0^\infty\phi'(t)\,\mu([f>t])\,dt$$
Note that this did not require the measure space to be σ finite and comes directly from the definition of the integral.
14. ♠ Give another argument for the above result as follows.
$$\int\phi\circ f\,d\mu=\int_0^\infty\mu([\phi\circ f>t])\,dt=\int_0^\infty\mu\left(\left[f>\phi^{-1}(t)\right]\right)dt$$
and now change the variable in the last integral, letting $\phi(s)=t$. Justify the easy manipulations.
15. ♠ Let $\phi(\alpha x)\le C_\alpha\phi(x)$ for all $\alpha>1$, $\phi(0)=0$, $\phi$ strictly increasing on $[0,\infty)$, and $\phi$ a $C^1$ function, and suppose $(\Omega,\mathcal{F},\mu)$ is a finite measure space. Also suppose $f,g$ are nonnegative measurable functions. Suppose there exists $\beta>1$ such that for all $\lambda>0$ and $1>\delta>0$,
$$\mu([f>\beta\lambda]\cap[g\le\delta\lambda])\le\eta(\delta)\,\mu([f>\lambda])$$
where $\lim_{\delta\to 0+}\eta(\delta)=0$ and $\eta$ is increasing. Show there exists a constant $C$ depending only on $\beta,\eta$ such that
$$\int_\Omega\phi\circ f\,d\mu\le C\int_\Omega\phi\circ g\,d\mu$$
This is called the good lambda inequality.² Hint: Use the above problems. Fill in the details.
$$\int_\Omega\phi(f)\,d\mu=\int_0^\infty\phi'(t)\,\mu([f>t])\,dt=\int_0^\infty\beta\phi'(\beta\lambda)\,\mu([f>\beta\lambda])\,d\lambda$$
$$=\int_0^\infty\beta\phi'(\beta\lambda)\,\mu([f>\beta\lambda]\cap[g\le\delta\lambda])\,d\lambda+\int_0^\infty\beta\phi'(\beta\lambda)\,\mu([f>\beta\lambda]\cap[g>\delta\lambda])\,d\lambda$$
$$\le\int_0^\infty\beta\phi'(\beta\lambda)\,\eta(\delta)\,\mu([f>\lambda])\,d\lambda+\int_0^\infty\beta\phi'(\beta\lambda)\,\mu([g>\delta\lambda])\,d\lambda$$
$$=\eta(\delta)\int_0^\infty\beta\phi'(\beta\lambda)\,\mu([f>\lambda])\,d\lambda+\int_0^\infty\frac{\beta}{\delta}\phi'\left(\beta\frac{t}{\delta}\right)\mu([g>t])\,dt$$
$$=\eta(\delta)\int_\Omega\phi(\beta f)\,d\mu+\int_\Omega\phi\left(\frac{\beta}{\delta}g\right)d\mu\le\eta(\delta)\,C_\beta\int_\Omega\phi(f)\,d\mu+C_{\beta/\delta}\int_\Omega\phi(g)\,d\mu$$
Now adjust $\delta$. This yields the desired result in the case that $\int_\Omega\phi(f)\,d\mu<\infty$. What about the case where $\int_\Omega\phi(f)\,d\mu=\infty$? Does the good lambda estimate hold if $f$ is replaced with $f\wedge m$ for $m$ a positive constant? Recall $\mu(\Omega)<\infty$.

²I have no idea why it is called the good lambda inequality. I am also not sure if there is a bad lambda inequality. It is a remarkable result however.
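The iterated integrals in Exercise 10 can be computed exactly with rational arithmetic. The dyadic choice $a_n=1-2^{-n}$ and normalized indicator functions for $g_n$ below are one concrete instance of the exercise's setup (any such sequence would do); since each $g_n$ is a step function, $f$ is constant on grid cells, and columns and rows beyond those tabulated contribute 0 by the same cancellation visible in the code.

```python
from fractions import Fraction as F

K = 10                                             # number of dyadic intervals tabulated
a = [F(1) - F(1, 2**n) for n in range(1, K + 3)]   # a_1, a_2, ...  (a_n -> 1)
w = [a[i+1] - a[i] for i in range(K + 1)]          # widths of (a_n, a_{n+1})
h = [F(1) / wi for wi in w]                        # heights, so each g_n integrates to 1

def f_cell(i, j):
    """Value of f on the cell (a_i,a_{i+1}) x (a_j,a_{j+1}) (0-based indices);
    in f(x,y) = sum_n g_n(y)(g_n(x) - g_{n+1}(x)) only the n = j term survives."""
    if i == j:
        return h[j] * h[j]
    if i == j + 1:
        return -h[j] * h[j + 1]
    return F(0)

def inner_dy(i):
    """Integral in y of f(x, .) for x in column i: only cells j = i, i-1 contribute."""
    s = f_cell(i, i) * w[i]
    if i >= 1:
        s += f_cell(i, i - 1) * w[i - 1]
    return s

def inner_dx(j):
    """Integral in x of f(., y) for y in row j: only cells i = j, j+1 contribute."""
    return f_cell(j, j) * w[j] + f_cell(j + 1, j) * w[j + 1]

I_yx = sum(inner_dy(i) * w[i] for i in range(K))   # dy inside, dx outside
I_xy = sum(inner_dx(j) * w[j] for j in range(K))   # dx inside, dy outside
print(I_yx, I_xy)   # 1 0
```

Only the first column survives the telescoping in one order, while every row cancels in the other, which is exactly the claimed $1\ne 0$. There is no contradiction with Fubini's theorem because $f$ is not integrable.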
Lebesgue Measurable Sets
10.1 The σ Algebra Of Lebesgue Measurable Sets
The σ algebra of Lebesgue measurable sets is larger than the above σ algebra of
Borel sets or of the earlier σ algebra which came from an application of the π system
lemma. It is convenient to use this larger σ algebra, especially when considering
change of variables formulas, although it is certainly true that one can do most
interesting theorems with the Borel sets only. However, it is in some ways easier to
consider the more general situation and this will be done next.
Definition 10.1.1 The completion of $(\mathbb{R}^p,\mathcal{B}(\mathbb{R}^p),m_p)$ is the Lebesgue measure space. Denote this by $(\mathbb{R}^p,\mathcal{F}_p,m_p)$.

Thus for each $E\in\mathcal{F}_p$,
$$m_p(E)=\inf\{m_p(F):F\supseteq E\text{ and }F\in\mathcal{B}(\mathbb{R}^p)\}$$
It follows that for each $E\in\mathcal{F}_p$ there exists $F\in\mathcal{B}(\mathbb{R}^p)$ such that $F\supseteq E$ and
$$m_p(E)=m_p(F).$$
Theorem 10.1.2 $m_p$ is regular on $\mathcal{F}_p$. In fact, if $E\in\mathcal{F}_p$, there exist sets $F,G$ in $\mathcal{B}(\mathbb{R}^p)$ such that
$$F\subseteq E\subseteq G,$$
$F$ is a countable union of compact sets, $G$ is a countable intersection of open sets, and
$$m_p(G\setminus F)=0.$$
If each $A_k$ is Lebesgue measurable then $\prod_{k=1}^p A_k\in\mathcal{F}_p$ and
$$m_p\left(\prod_{k=1}^p A_k\right)=\prod_{k=1}^p m(A_k).$$
In addition to this, $m_p$ is translation invariant. This means that if $E\in\mathcal{F}_p$ and $x\in\mathbb{R}^p$, then
$$m_p(x+E)=m_p(E).$$
The expression $x+E$ means $\{x+e:e\in E\}$.
Proof: The regularity of $m_p$ on the Borel sets follows from Theorem 9.1.5. Then Theorem 9.1.6 implies $m_p$ is regular. The assertion about the measure of the Cartesian product in the case where each $A_k$ is Borel follows from the fact that $m_p$ is the extension of a measure for which the assertion does hold. See Theorem 9.1.2 on the completion of a measure space. In case the $A_k$ are only Lebesgue measurable, the equation can be obtained from regularity considerations.
It only remains to consider the claim about translation invariance. Let $\mathcal{K}$ denote all sets of the form $\prod_{k=1}^p U_k$ where each $U_k$ is an open set in $\mathbb{R}$. Thus $\mathcal{K}$ is a π system.
$$x+\prod_{k=1}^p U_k=\prod_{k=1}^p(x_k+U_k)$$
which is also a finite Cartesian product of finitely many open sets. Also,
$$m_p\left(x+\prod_{k=1}^p U_k\right)=m_p\left(\prod_{k=1}^p(x_k+U_k)\right)=\prod_{k=1}^p m(x_k+U_k)=\prod_{k=1}^p m(U_k)=m_p\left(\prod_{k=1}^p U_k\right)$$
The step to the last line is obvious because an arbitrary open set in $\mathbb{R}$ is the disjoint union of open intervals, and the lengths of these intervals are unchanged when they are slid to another location.
Now let $\mathcal{G}$ denote those sets $E$ of $\mathcal{F}_p$ (recall that $\mathcal{F}_p$ contains all the measurable rectangles $\prod_{i=1}^p E_i$, $E_i$ Lebesgue measurable) with the property that for each $n\in\mathbb{N}$,
$$m_p(x+E\cap(-n,n)^p)=m_p(E\cap(-n,n)^p)$$
and the set $x+E\cap(-n,n)^p$ is in $\mathcal{F}_p$. Thus $\mathcal{K}\subseteq\mathcal{G}$. If $E\in\mathcal{G}$, then
$$\left(x+E^C\cap(-n,n)^p\right)\cup\left(x+E\cap(-n,n)^p\right)=x+(-n,n)^p$$
which implies $x+E^C\cap(-n,n)^p$ is in $\mathcal{F}_p$ since it equals a difference of two sets in $\mathcal{F}_p$. Now consider the following.
$$m_p\left(x+E^C\cap(-n,n)^p\right)+m_p(E\cap(-n,n)^p)$$
$$=m_p\left(x+E^C\cap(-n,n)^p\right)+m_p(x+E\cap(-n,n)^p)$$
$$=m_p(x+(-n,n)^p)=m_p((-n,n)^p)$$
$$=m_p\left(E^C\cap(-n,n)^p\right)+m_p(E\cap(-n,n)^p)$$
which shows
$$m_p\left(x+E^C\cap(-n,n)^p\right)=m_p\left(E^C\cap(-n,n)^p\right)$$
showing that $E^C\in\mathcal{G}$.
If $\{E_k\}$ is a sequence of disjoint sets of $\mathcal{G}$,
$$m_p\left(x+\left(\cup_{k=1}^\infty E_k\right)\cap(-n,n)^p\right)=m_p\left(\cup_{k=1}^\infty(x+E_k)\cap(-n,n)^p\right)$$
Now the sets $\{(x+E_k)\cap(-n,n)^p\}$ are also disjoint and so the above equals
$$\sum_k m_p(x+E_k\cap(-n,n)^p)=\sum_k m_p(E_k\cap(-n,n)^p)=m_p\left(\cup_{k=1}^\infty E_k\cap(-n,n)^p\right)$$
Thus $\mathcal{G}$ is also closed with respect to countable disjoint unions. It follows from the lemma on π systems that $\mathcal{G}\supseteq\sigma(\mathcal{K})=\mathcal{B}(\mathbb{R}^p)$.

I have just shown that for every $E\in\mathcal{B}(\mathbb{R}^p)$ and any $n\in\mathbb{N}$,
$$m_p(x+E\cap(-n,n)^p)=m_p(E\cap(-n,n)^p)$$
Taking the limit as $n\to\infty$ yields
$$m_p(x+E)=m_p(E).$$
This proves translation invariance on sets in $\mathcal{B}(\mathbb{R}^p)$.
By regularity of $m_p$, for $E\in\mathcal{F}_p$ there exists $G$ a $G_\delta$ set and $F$ an $F_\sigma$ set such that $F\subseteq E\subseteq G$ and $m_p(G\setminus F)=0$. Also, it is obvious that the translation of a $G_\delta$ set is a $G_\delta$ set and the translation of an $F_\sigma$ set is an $F_\sigma$ set. Therefore, from what was just shown,
$$F+x\subseteq E+x\subseteq G+x$$
and $m_p((G+x)\setminus(F+x))=m_p((G\setminus F)+x)=m_p(G\setminus F)=0$. Therefore, by completeness, $E+x$ is Lebesgue measurable and
$$m_p(E+x)\le m_p(G+x)=m_p(G)=m_p(E)=m_p(F)=m_p(F+x)\le m_p(E+x).$$
The following lemma is about the existence of a Borel measurable representative for any Lebesgue measurable function. It follows right away from the above definition of Lebesgue measure as the completion of $(\mathbb{R}^p,\mathcal{B}(\mathbb{R}^p),m_p)$ and Theorem 9.1.3.

Lemma 10.1.3 Let $f$ be measurable with respect to $\mathcal{F}_p$. Then there exists a Borel measurable function $g$ such that $g=f$ $m_p$ a.e.

What is the main importance of regularity of Lebesgue measure? It is related to the density of continuous functions which have compact support in $L^1(\mathbb{R}^n)$.
Definition 10.1.4 Let $f$ be a function. $\operatorname{spt}(f)\equiv\overline{\{x:f(x)\ne 0\}}$. In words, the support is the closure of the set where $f$ is not zero. $\operatorname{spt}(f)$ is called the support of $f$. A function is said to be in $C_c(\mathbb{R}^p)$ if it is continuous and has compact support.
The following remarkable theorem on approximation follows from regularity of the measure.

Theorem 10.1.5 Let $f\in L^1(\mathbb{R}^n)$ and let $\varepsilon>0$. Then there exists $g\in C_c(\mathbb{R}^n)$ such that
$$\int_{\mathbb{R}^n}|f-g|\,dm_n<\varepsilon.$$
Proof: First note that every open set $V$ is the countable union of compact sets. In fact,
$$V=\cup_{k=1}^\infty\overline{B(0,k)}\cap\left\{x\in V:\operatorname{dist}\left(x,V^C\right)\ge 1/k\right\}.$$
Now if $K\subseteq V$ where $V$ is an open set and $K$ is a compact set, it follows from the finite intersection property of compact sets that for all $k$ large enough,
$$K\subseteq B(0,k)\cap\left\{x\in V:\operatorname{dist}\left(x,V^C\right)>1/k\right\}\equiv W\subseteq\overline{B(0,k)}\cap\left\{x\in V:\operatorname{dist}\left(x,V^C\right)\ge 1/k\right\},$$
a compact subset of $V$. Let
$$h(x)=\frac{\operatorname{dist}\left(x,W^C\right)}{\operatorname{dist}\left(x,W^C\right)+\operatorname{dist}(x,K)}.$$
Then since $W$ is an open set containing the compact set $K$, it follows that the denominator is never equal to 0. Therefore, $h$ is continuous. Also, if $x\in K$, then $h=1$ and if $x\notin W$, then $h=0$. It follows that $h$ is in $C_c(\mathbb{R}^n)$, equals 1 on $K$ and vanishes off a compact set which is contained in $V$. We denote this situation with the notation of Rudin,
$$K\prec h\prec W.$$
Now let $f\ge 0$ and $\int f\,dm_n<\infty$. By Theorem 8.7.5, there exists a nonnegative simple function $s(x)=\sum_{k=1}^m c_k\,\mathcal{X}_{E_k}(x)$, $m_n(E_k)<\infty$, such that
$$\int|f-s|\,dm_n<\frac{\varepsilon}{2}.$$
By regularity of the measure, there exists a compact set $K_k$ and an open set $V_k$ such that
$$K_k\subseteq E_k\subseteq V_k,\qquad m_n(V_k\setminus K_k)<\frac{\varepsilon}{2\left(\sum_{k=1}^m c_k\right)}$$
Then from what was just shown, there exists $h_k$ such that $K_k\prec h_k\prec V_k$. Then let $h(x)=\sum_{k=1}^m c_k\,h_k(x)$. Thus $h\in C_c(\mathbb{R}^n)$ and
$$\int|s-h|\,dm_n\le\sum_{k=1}^m c_k\,m_n(V_k\setminus K_k)<\frac{\varepsilon}{2}.$$
Therefore,
$$\int|f-h|\,dm_n\le\int|f-s|\,dm_n+\int|s-h|\,dm_n<\frac{\varepsilon}{2}+\frac{\varepsilon}{2}=\varepsilon.$$
For an arbitrary $f\in L^1(\mathbb{R}^n)$, you simply apply the above result to the positive and negative parts of the real and imaginary parts.
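The distance-quotient function $h$ from the proof is easy to experiment with in one dimension. The sets $K=[2,3]$ and $W=(1,4)$ below are arbitrary illustrative choices (not from the text); the exact distance formulas are specific to intervals.

```python
def dist_K(x):
    """Distance from x to the compact set K = [2, 3]."""
    return max(0.0, 2 - x, x - 3)

def dist_WC(x):
    """Distance from x to the complement of the open set W = (1, 4)."""
    return max(0.0, min(x - 1, 4 - x))

def h(x):
    """h = dist(x, W^C) / (dist(x, W^C) + dist(x, K)); denominator never 0
    since no point lies in both K and W^C."""
    return dist_WC(x) / (dist_WC(x) + dist_K(x))

assert all(h(x) == 1.0 for x in (2.0, 2.5, 3.0))       # h = 1 on K
assert all(h(x) == 0.0 for x in (0.0, 1.0, 4.0, 5.0))  # h = 0 off W
assert 0.0 < h(1.5) < 1.0                              # transition region
```

This is precisely the situation written $K\prec h\prec W$ in the proof: $h$ is continuous, equals 1 on $K$, and vanishes off $W$.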
10.2 Change Of Variables, Linear Maps
The change of variables formula for linear maps is implied by the following major theorem. First, here is some notation. Let $d\widehat{x}_i$ denote
$$dx_1\cdots dx_{i-1}\,dx_{i+1}\cdots dx_p.$$

Theorem 10.2.1 Let $E$ be any Lebesgue measurable set and let $A$ be a $p\times p$ matrix. Then $AE$ is Lebesgue measurable and $m_p(AE)=|\det(A)|\,m_p(E)$. Also, if $E$ is any Lebesgue measurable set, then
$$\int\mathcal{X}_{A(E)}(y)\,dy=\int\mathcal{X}_E(x)\,|\det(A)|\,dx.$$
Proof: Let $Q$ denote $\prod_{i=1}^p(a_i,b_i)$, an open rectangle. First suppose $A$ is an elementary matrix which is obtained from $I$ by adding $\alpha$ times row $i$ to row $j$. Then $\det(A)=1$ and
$$m_p(AQ)\equiv\left(\prod_{i\ne j}\int_{a_i}^{b_i}\right)\int\mathcal{X}_{AQ}\,dx_j\,d\widehat{x}_j=\left(\prod_{i\ne j}\int_{a_i}^{b_i}\right)\int_{a_j+\alpha x_i}^{b_j+\alpha x_i}dx_j\,d\widehat{x}_j=\prod_{i=1}^p|b_i-a_i|=m_p(Q)=|\det(A)|\,m_p(Q)$$
The linear transformation determined by $A$ just shears $Q$ in the direction of $x_j$. It is clear in this case that $AQ$ is an open, hence measurable set.
Next suppose $A$ is an elementary matrix which comes from multiplying the $j^{th}$ row of $I$ by $\alpha\ne 0$. Then this changes the length of one of the sides of the box by the factor $|\alpha|$, resulting in
$$m_p(AQ)=|\alpha|\,|b_j-a_j|\prod_{i\ne j}|b_i-a_i|=|\alpha|\,m_p(Q)=|\det(A)|\,m_p(Q)$$
In case $A$ is an elementary matrix which comes from switching two rows of the identity, then it maps $Q$ to a possibly different rectangle whose side lengths are the same set of numbers as the set of lengths of the sides of $Q$. Therefore, $|\det(A)|=1$ and so
$$m_p(AQ)=m_p(Q)=|\det(A)|\,m_p(Q).$$
Let $R_n=\prod_{i=1}^p(-n,n)$ and let $\mathcal{K}$ consist of all finite intersections of such open boxes as just described. This is clearly a π system. Now let
$$\mathcal{G}\equiv\{E\in\mathcal{B}(\mathbb{R}^p):m_p(A(E\cap R_n))=|\det(A)|\,m_p(E\cap R_n)\text{, and }AE\text{ is Borel}\}$$
where $A$ is any of the above elementary matrices. It is clear that $\mathcal{G}$ is closed with respect to countable disjoint unions. If $E\in\mathcal{G}$, is $E^C\in\mathcal{G}$? Since $A$ is onto, $AE^C=\mathbb{R}^p\setminus AE$ which is Borel. What about the estimate?
$$m_p(A(E\cap R_n))+m_p\left(A\left(E^C\cap R_n\right)\right)=m_p(A(R_n))=|\det(A)|\,m_p(R_n)$$
and so
$$m_p\left(A\left(E^C\cap R_n\right)\right)=|\det(A)|\,m_p(R_n)-m_p(A(E\cap R_n))=|\det(A)|\,(m_p(R_n)-m_p(E\cap R_n))=|\det(A)|\,m_p\left(E^C\cap R_n\right)$$
It was shown above that $\mathcal{G}$ contains $\mathcal{K}$. By Lemma 9.2.2, it follows that $\mathcal{G}\supseteq\sigma(\mathcal{K})=\mathcal{B}(\mathbb{R}^p)$. Therefore, for any elementary $A$ and $E$ a Borel set,
$$|\det(A)|\,m_p(E)=\lim_{n\to\infty}|\det(A)|\,m_p(E\cap R_n)=\lim_{n\to\infty}m_p(A(E\cap R_n))=m_p(A(E)).$$
Now consider $A$ an arbitrary invertible matrix. Then $A$ is the product of elementary matrices $A_1\cdots A_m$ and so if $E$ is Borel, so is $AE$. Also
$$|\det(A)|\,m_p(E)=\prod_{i=1}^m|\det(A_i)|\,m_p(E)=\prod_{i=1}^{m-1}|\det(A_i)|\,m_p(A_mE)$$
$$=\prod_{i=1}^{m-2}|\det(A_i)|\,m_p(A_{m-1}A_mE)=\cdots=m_p(A_1\cdots A_mE)=m_p(AE).$$
In case $A$ is an arbitrary matrix which has rank less than $p$, there exists a sequence of elementary matrices such that
$$A=E_1E_2\cdots E_sB$$
where $B$ is in row reduced echelon form and has at least one row of zeros. Thus if $E$ is any Lebesgue measurable set,
$$AE\subseteq E_1E_2\cdots E_s(B\mathbb{R}^p)$$
However, $B\mathbb{R}^p$ is a Borel set of measure zero because it is contained in a set of the form
$$F\equiv\{x\in\mathbb{R}^p:x_k=\cdots=x_p=0\}$$
and this has measure zero. Therefore, $E_1E_2\cdots E_s(B\mathbb{R}^p)$ is a Borel set of measure zero because
$$m_p(E_1E_2\cdots E_s(B\mathbb{R}^p))=|\det(E_1E_2\cdots E_s)|\,m_p(B\mathbb{R}^p)=0.$$
It follows from completeness of the measure that $AE$ is Lebesgue measurable and
$$m_p(AE)=|\det(A)|\,m_p(E)=0.$$
It has now been shown that for invertible $A$ and $E$ any Borel set,
$$m_p(AE)=|\det(A)|\,m_p(E)$$
and for any Lebesgue measurable set $E$ and $A$ not invertible, the above formula holds. It only remains to verify the formula holds for $A$ invertible and $E$ only Lebesgue measurable. However, in this case, $A$ maps open sets to open sets because its inverse is continuous, and it maps compact sets to compact sets because $x\to Ax$ is continuous. Hence $A$ takes $G_\delta$ sets to $G_\delta$ sets and $F_\sigma$ sets to $F_\sigma$ sets. Let $E$ be Lebesgue measurable. By regularity of the measure, there exist $G$ and $F$, $G_\delta$ and $F_\sigma$ sets respectively, such that $F\subseteq E\subseteq G$ and $m_p(G\setminus F)=0$. Then $AF\subseteq AE\subseteq AG$ and
$$m_p(AG\setminus AF)=m_p(A(G\setminus F))=|\det(A)|\,m_p(G\setminus F)=0.$$
By completeness, $AE$ is Lebesgue measurable. Also
$$|\det(A)|\,m_p(F)=m_p(AF)\le m_p(AE)\le m_p(AG)=|\det(A)|\,m_p(G)=|\det(A)|\,m_p(E)=|\det(A)|\,m_p(F)$$
The above theorem also easily implies the following version of the change of variables formula for linear mappings.

Theorem 10.2.2 Let $f\ge 0$ and suppose it is Lebesgue measurable. Then if $A$ is a $p\times p$ matrix,
$$\int\mathcal{X}_{A(\mathbb{R}^p)}(y)\,f(y)\,dy=\int f(A(x))\,|\det(A)|\,dx.\tag{10.2.1}$$
Proof: From Theorem 10.2.1, the equation is true if $\det(A)=0$. It follows that it suffices to consider only the case where $A^{-1}$ exists. First suppose $f(y)=\mathcal{X}_E(y)$ where $E$ is a Lebesgue measurable set. In this case, $A(\mathbb{R}^p)=\mathbb{R}^p$. Then from Theorem 10.2.1,
$$\int\mathcal{X}_{A(\mathbb{R}^p)}(y)\,f(y)\,dy=m_p(E)=|\det(A)|\,m_p\left(A^{-1}E\right)=\int_{\mathbb{R}^p}|\det(A)|\,\mathcal{X}_{A^{-1}E}(x)\,dx$$
$$=\int_{\mathbb{R}^p}|\det(A)|\,\mathcal{X}_E(Ax)\,dx=\int f(A(x))\,|\det(A)|\,dx$$
It follows from this that 10.2.1 holds whenever $f$ is a nonnegative simple function. Finally, the general result follows from approximating the Lebesgue measurable function with nonnegative simple functions using Theorem 7.5.6 and then applying the monotone convergence theorem.
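The identity $m_p(AE)=|\det(A)|\,m_p(E)$ can be sanity-checked numerically. The matrix, the set $E=[0,1]^2$, the fixed seed, and the Monte Carlo sample size below are all arbitrary choices for illustration, not taken from the text.

```python
import random

random.seed(0)
A = [[2.0, 1.0], [0.0, 3.0]]                       # det A = 6
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]

# E = unit square; AE is a parallelogram. Bound it by the images of the corners.
corners = [(A[0][0] * x + A[0][1] * y, A[1][0] * x + A[1][1] * y)
           for x in (0, 1) for y in (0, 1)]
xs, ys = zip(*corners)
box_area = (max(xs) - min(xs)) * (max(ys) - min(ys))

def in_AE(u, v):
    """y in AE iff A^{-1} y in E; the 2x2 inverse is written out by hand."""
    x = ( A[1][1] * u - A[0][1] * v) / det_A
    y = (-A[1][0] * u + A[0][0] * v) / det_A
    return 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0

n = 200_000
hits = sum(in_AE(random.uniform(min(xs), max(xs)),
                 random.uniform(min(ys), max(ys))) for _ in range(n))
est = box_area * hits / n
print(est)   # should be close to |det A| * m_2([0,1]^2) = 6
```

The estimate agrees with $|\det(A)|\cdot 1=6$ up to Monte Carlo error.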
10.3 Covering Theorems
The Vitali covering theorem is a profound result about coverings of a set in $\mathbb{R}^n$ with balls. The balls can be defined in terms of any norm for $\mathbb{R}^n$. For example, the norm could be
$$||x||\equiv\max\{|x_k|:k=1,\dots,n\}$$
or the usual norm
$$|x|=\sqrt{\sum_k|x_k|^2}$$
or any other. The balls can be either open or closed or neither. The proof given here is from Basic Analysis [32].
Lemma 10.3.1 Let $\mathcal{F}$ be a countable collection of balls satisfying
$$\infty>M\equiv\sup\{r:B(p,r)\in\mathcal{F}\}>0$$
and let $k\in(0,\infty)$. Then there exists $\mathcal{G}\subseteq\mathcal{F}$ such that

if $B(p,r)\in\mathcal{G}$ then $r>k$, (10.3.2)

if $B_1,B_2\in\mathcal{G}$ then $B_1\cap B_2=\emptyset$, (10.3.3)

$\mathcal{G}$ is maximal with respect to 10.3.2 and 10.3.3. (10.3.4)

By this is meant that if $\mathcal{H}$ is a collection of balls satisfying 10.3.2 and 10.3.3, then $\mathcal{H}$ cannot properly contain $\mathcal{G}$.
Proof: If no ball of $\mathcal{F}$ has radius larger than $k$, let $\mathcal{G}=\emptyset$. Assume therefore that some balls have radius larger than $k$. Let $\mathcal{F}\equiv\{B_i\}_{i=1}^\infty$. Now let $B_{n_1}$ be the first ball in the list which has radius greater than $k$. If every ball having radius larger than $k$ intersects this one, then stop; the maximal set is $\{B_{n_1}\}$. Otherwise, let $B_{n_2}$ be the next ball having radius larger than $k$ which is disjoint from $B_{n_1}$. Continue this way obtaining $\{B_{n_i}\}_{i=1}^\infty$, a finite or infinite sequence of disjoint balls having radius larger than $k$. Then let $\mathcal{G}\equiv\{B_{n_i}\}$. To see $\mathcal{G}$ is maximal with respect to 10.3.2 and 10.3.3, suppose $B\in\mathcal{F}$, $B$ has radius larger than $k$, and $\mathcal{G}\cup\{B\}$ satisfies 10.3.2 and 10.3.3. Then at some point in the process, $B$ would have been chosen because it would be the ball of radius larger than $k$ which has the smallest index. Therefore, $B\in\mathcal{G}$ and this shows $\mathcal{G}$ is maximal with respect to 10.3.2 and 10.3.3.

For a ball $B=B(x,r)$, denote by $\widetilde{B}$ the open ball $B(x,4r)$, denoted as $B_0(x,4r)$.
Lemma 10.3.2 Let $\mathcal{F}$ be a countable collection of balls, and let
$$A\equiv\cup\{B:B\in\mathcal{F}\}.$$
Suppose
$$\infty>M\equiv\sup\{r:B(p,r)\in\mathcal{F}\}>0.$$
Then there exists $\mathcal{G}\subseteq\mathcal{F}$ such that $\mathcal{G}$ consists of disjoint balls and
$$A\subseteq\cup\{\widetilde{B}:B\in\mathcal{G}\}.$$
Proof: By Lemma 10.3.1, there exists $\mathcal{G}_1\subseteq\mathcal{F}$ which satisfies 10.3.2, 10.3.3, and 10.3.4 with $k=\frac{2M}{3}$.

Suppose $\mathcal{G}_1,\dots,\mathcal{G}_{m-1}$ have been chosen for $m\ge 2$. Let
$$\mathcal{F}_m=\{B\in\mathcal{F}:B\subseteq\mathbb{R}^n\setminus\overbrace{\cup\{\mathcal{G}_1\cup\cdots\cup\mathcal{G}_{m-1}\}}^{\text{union of the balls in these }\mathcal{G}_j}\},$$
and using Lemma 10.3.1, let $\mathcal{G}_m$ be a maximal collection of disjoint balls from $\mathcal{F}_m$ with the property that each ball has radius larger than $\left(\frac{2}{3}\right)^m M$. Let $\mathcal{G}\equiv\cup_{k=1}^\infty\mathcal{G}_k$.

Let $x\in B(p,r)\in\mathcal{F}$. Choose $m$ such that
$$\left(\frac{2}{3}\right)^m M<r\le\left(\frac{2}{3}\right)^{m-1}M$$
Then $B(p,r)$ must have nonempty intersection with some ball from $\mathcal{G}_1\cup\cdots\cup\mathcal{G}_m$ because if it didn’t, then $\mathcal{G}_m$ would fail to be maximal. Denote by $B(p_0,r_0)$ a ball in $\mathcal{G}_1\cup\cdots\cup\mathcal{G}_m$ which has nonempty intersection with $B(p,r)$. Thus
$$r_0>\left(\frac{2}{3}\right)^m M.$$
Consider the situation in which $w\in B(p_0,r_0)\cap B(p,r)$. Then
$$|x-p_0|\le|x-p|+|p-w|+\overbrace{|w-p_0|}^{\le r_0}\le r+r+r_0\le 2\overbrace{\left(\frac{2}{3}\right)^{m-1}M}^{<\frac{3}{2}r_0}+r_0<2\cdot\frac{3}{2}r_0+r_0=4r_0.$$
This proves the lemma since it shows $B(p,r)\subseteq B_0(p_0,4r_0)$.
You don’t need to assume the set of balls is countable. Let $\mathcal{F}$ be any collection of balls having bounded radii. Let $\mathcal{F}_\varepsilon$ result from replacing each ball $B(x,r)\in\mathcal{F}$ with the open ball $B_0(x,r(1+\varepsilon))$. Thus, letting $A_\varepsilon$ denote the union of these slightly enlarged balls, it follows from Problem 19 on Page 60 or Lemma 7.1.5 on Page 163 that a countable subset of $\mathcal{F}_\varepsilon$, denoted by $\mathcal{F}_\varepsilon'$, has the property that $\cup\mathcal{F}_\varepsilon'=A_\varepsilon$. Thus, letting $\varepsilon=1/4$, the above conclusion follows for these enlarged balls, and the enlarged balls of the previous lemma are of the form $B_0(x,5r)$ where $B(x,r)\in\mathcal{F}$. Note that if $B_0(x,r(1+\varepsilon))\cap B_0(x_1,r_1(1+\varepsilon))=\emptyset$, then $B(x,r)\cap B(x_1,r_1)=\emptyset$. This proves the following proposition.
Proposition 10.3.3 Let $\mathcal{F}$ be a collection of balls, and let
$$A\equiv\cup\{B:B\in\mathcal{F}\}.$$
Suppose
$$\infty>M\equiv\sup\{r:B(p,r)\in\mathcal{F}\}>0.$$
Then there exists $\mathcal{G}\subseteq\mathcal{F}$ such that $\mathcal{G}$ consists of disjoint balls whose closures are also disjoint and
$$A\subseteq\cup\{\widehat{B}:B\in\mathcal{G}\}$$
where for $B=B(x,r)$ a ball, $\widehat{B}$ denotes the open ball $B_0(x,5r)$.
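The selection behind Proposition 10.3.3 can be imitated in one dimension. The greedy pass below is a simplified stand-in for the text's layered construction: taking balls in order of decreasing radius collapses the $\mathcal{G}_m$ stages into a single loop, and for this ordering containment actually holds with the factor 3, so a fortiori with 5. The sample balls are arbitrary.

```python
# (center, radius) pairs in R^1; intervals play the role of balls
balls = [(0.0, 1.0), (0.5, 0.4), (2.5, 0.9), (2.0, 0.3),
         (5.0, 0.2), (5.1, 0.15), (-1.2, 0.25)]

# Greedy: repeatedly take the largest remaining ball disjoint from those chosen.
chosen = []
for c, r in sorted(balls, key=lambda b: -b[1]):
    if all(abs(c - c0) >= r + r0 for c0, r0 in chosen):   # disjoint from chosen
        chosen.append((c, r))

def contained_in_5x(c, r):
    """B(c, r) lies inside the 5-times enlargement of some chosen ball."""
    return any(abs(c - c0) + r <= 5 * r0 for c0, r0 in chosen)

# Every ball intersects a chosen ball of radius >= its own (else it would have
# been chosen), so |c - c0| < r + r0 <= 2 r0 and B(c, r) is inside B0(c0, 3 r0).
assert all(contained_in_5x(c, r) for c, r in balls)
```

The chosen balls are pairwise disjoint, yet their 5-fold enlargements cover the union of the whole collection, which is the content of the proposition.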
Here is the concept of a Vitali covering.

Definition 10.3.4 Let $S$ be a set and let $\mathcal{G}$ be a covering of $S$, meaning that every point of $S$ is contained in a set of $\mathcal{G}$. This covering is said to be a Vitali covering if for each $\varepsilon>0$ and $x\in S$, there exists a set $B\in\mathcal{G}$ containing $x$ whose diameter is less than $\varepsilon$, and there exists an upper bound to the set of diameters of sets of $\mathcal{G}$.
Theorem 10.3.5 Let $E\subseteq\mathbb{R}^p$ be a bounded measurable set and let $\mathcal{F}$ be a collection of balls, open or not, of bounded radii such that $\mathcal{F}$ covers $E$ in the sense of Vitali. Then there exists a countable collection of balls from $\mathcal{F}$ whose closures are disjoint, denoted by $\{B_j\}_{j=1}^\infty$, such that
$$m_p\left(E\setminus\cup_{j=1}^\infty B_j\right)=m_p\left(E\setminus\cup_{j=1}^\infty\overline{B_j}\right)=0.$$
Proof: From the change of variables theorem for linear transformations,
$$m_p(B(x,\alpha r))=m_p(B(0,\alpha r))=\alpha^p\,m_p(B(0,r))=\alpha^p\,m_p(B(x,r)).$$
Let $S(x,r)\equiv\{y:|y-x|=r\}$. Then for each $\varepsilon<r$,
$$m_p(S(x,r))\le m_p(B(x,r+\varepsilon))-m_p(B(x,r-\varepsilon))=m_p(B(0,r+\varepsilon))-m_p(B(0,r-\varepsilon))$$
$$=\left(\left(\frac{r+\varepsilon}{r}\right)^p-\left(\frac{r-\varepsilon}{r}\right)^p\right)m_p(B(0,r))$$
Hence $m_p(S(x,r))=0$.
If $m_p(E)=0$, there is nothing to prove, so assume the measure of this set is positive. By outer regularity of Lebesgue measure, there exists $U$, an open set which satisfies
$$m_p(E)>\left(1-10^{-p}\right)m_p(U),\qquad U\supseteq E.$$
Each point of $E$ is contained in balls of $\mathcal{F}$ of arbitrarily small radii and so there exists a covering of $E$ with balls of $\mathcal{F}$ whose closures are contained in $U$. Therefore, by Proposition 10.3.3, there exist balls $\{B_i\}_{i=1}^\infty\subseteq\mathcal{F}$ such that their closures are disjoint and
$$E\subseteq\cup_{j=1}^\infty\widehat{B}_j,\qquad B_j\subseteq U.$$
Therefore,
$$m_p\left(E\setminus\cup_{j=1}^\infty B_j\right)\le m_p(U)-m_p\left(\cup_{j=1}^\infty B_j\right)<\left(1-10^{-p}\right)^{-1}m_p(E)-\sum_{j=1}^\infty m_p(B_j)$$
$$=\left(1-10^{-p}\right)^{-1}m_p(E)-5^{-p}\sum_{j=1}^\infty m_p\left(\widehat{B}_j\right)\le\left(1-10^{-p}\right)^{-1}m_p(E)-5^{-p}m_p(E)=m_p(E)\,\theta_p$$
where
$$\theta_p\equiv\left(1-10^{-p}\right)^{-1}-5^{-p}<1$$
Thus, there exists $m_1$ large enough that
$$m_p\left(E\setminus\cup_{j=1}^{m_1}B_j\right)\le\theta_p\,m_p(E)$$
Now consider $E\setminus\cup_{j=1}^{m_1}B_j$ and apply the same reasoning to it that was done to $E$. Thus there exists $m_2>m_1$ such that
$$m_p\left(E\setminus\cup_{j=1}^{m_2}B_j\right)\le\theta_p\,m_p\left(E\setminus\cup_{j=1}^{m_1}B_j\right)\le\theta_p^2\,m_p(E)$$
Continuing this way, there exists an increasing subsequence $m_k$ such that
$$m_p\left(E\setminus\cup_{j=1}^{m_k}B_j\right)\le\theta_p^k\,m_p(E)$$
and since $\theta_p<1$ and $m_p(E)<\infty$, this implies $m_p\left(E\setminus\cup_{j=1}^\infty B_j\right)=0.$
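That $\theta_p=\left(1-10^{-p}\right)^{-1}-5^{-p}$ really lies in $(0,1)$ for every dimension, which is what makes the geometric decay argument work, can be confirmed directly (the range of $p$ checked below is an arbitrary choice):

```python
# theta_p < 1 because (1 - 10^-p)^-1 ~ 1 + 10^-p and 5^-p > 10^-p
theta = [(1 - 10**-p)**-1 - 5**-p for p in range(1, 11)]
print(max(theta))
```

Every value is strictly between 0 and 1, approaching 1 from below as $p$ grows.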
You don’t need to assume that E has ﬁnite measure in order to draw the above
conclusion.
Corollary 10.3.6 Let $E\subseteq\mathbb{R}^p$ be a measurable set and let $\mathcal{F}$ be a collection of balls which come from some norm, open or not, but having bounded radii, such that $\mathcal{F}$ covers $E$ in the sense of Vitali. Then there exists a countable collection of balls from $\mathcal{F}$ having disjoint closures, denoted by $\{B_j\}_{j=1}^\infty$, such that $m_p\left(E\setminus\cup_{j=1}^\infty B_j\right)=0$. If $E$ is only assumed to be an arbitrary set with such a Vitali covering of open balls or a countable Vitali covering of arbitrary balls, then the conclusion is the same, except you write $\overline{m_p}\left(E\setminus\cup_{j=1}^\infty B_j\right)=0$.
Proof: Consider $A_n=B(0,n)\setminus\overline{B(0,n-1)}$, $n=1,2,\dots$. Let $E_n=E\cap A_n$. Then $\cup_{n=1}^\infty E_n\cup N=E$ where $N$ is a set of measure zero. From Theorem 10.3.5, there exist balls of $\mathcal{F}$ having disjoint closures, denoted by $\{B_i^n\}_{i=1}^\infty$, such that $m_p\left(E_n\setminus\cup_{i=1}^\infty B_i^n\right)=0$ and each $B_i^n\subseteq A_n$. Then $\{B_i^n:(i,n)\in\mathbb{N}\times\mathbb{N}\}$ is a suitable collection of balls having disjoint closures.

Finally, in case $E$ is not known to be measurable, let $A\equiv\cup\mathcal{F}$, a measurable set because it is either open or at least measurable in case the covering is countable. Thus, applying the above result to $A$ in place of $E$, there exist balls having disjoint closures $\{B_k\}$ such that
$$\overline{m_p}\left(E\setminus\cup_{k=1}^\infty B_k\right)\le\overline{m_p}\left(A\setminus\cup_{k=1}^\infty B_k\right)=m_p\left(A\setminus\cup_{k=1}^\infty B_k\right)=0.$$
10.4 Diﬀerentiable Functions And Measurability
To begin with, certain kinds of functions map measurable sets to measurable sets. It will be assumed that $U$ is an open set in $\mathbb{R}^p$ and that $h:U\to\mathbb{R}^p$ satisfies
$$Dh(x)\text{ exists for all }x\in U.\tag{10.4.5}$$
Note that if
$$h(x)=Lx$$
where $L\in\mathcal{L}(\mathbb{R}^p,\mathbb{R}^p)$, then $L$ is included in 10.4.5 because
$$L(x+v)=L(x)+L(v)+o(v)$$
In fact, $o(v)=0$. Thus multiplication by a $p\times p$ matrix satisfies 10.4.5.
It is convenient in the following lemma to use the norm on $\mathbb{R}^p$ given by
$$||x||=\max\{|x_k|:k=1,2,\dots,p\}.$$
Thus $B(x,r)$ is the open box
$$\prod_{k=1}^p(x_k-r,x_k+r)$$
and so $m_p(B(x,r))=(2r)^p$. Also for a linear transformation $A\in\mathcal{L}(\mathbb{R}^p,\mathbb{R}^p)$,
$$||A||\equiv\sup_{||x||\le 1}||Ax||.$$
It is appropriate to consider the measurability of a certain set in the following lemma. Note that $Dh(x)$ is a matrix whose $i^{th}$ column is $h_{x_i}$, the partial derivative with respect to $x_i$ given by
$$\lim_{t\to 0}\frac{h(x+te_i)-h(x)}{t}$$
Each component of the above difference quotient is a continuous function and so it follows that each component of the above limit is Borel measurable. Therefore, $x\to||Dh(x)||$ is also a Borel measurable function because it equals
$$\sup_{||v||\le 1,\,v\in D}||Dh(x)\,v||$$
where $D$ is a dense countable subset of the unit ball. Since it is the sup of countably many Borel measurable functions, it must also be Borel measurable. It follows that the set
$$T_k\equiv\{x\in T:||Dh(x)||<k\},$$
for $T$ a measurable set, must be measurable as well.
Lemma 10.4.1 Let $h$ satisfy 10.4.5. If $T\subseteq U$ and $m_p(T)=0$, then $m_p(h(T))=0$.

Proof: Let
$$T_k\equiv\{x\in T:||Dh(x)||<k\}$$
and let $\varepsilon>0$ be given. Now by outer regularity, there exists an open set $V$ containing $T_k$ which is contained in $U$ such that $m_p(V)<\varepsilon$. Let $x\in T_k$. Then by differentiability,
$$h(x+v)=h(x)+Dh(x)\,v+o(v)$$
and so there exist arbitrarily small $r_x<1$ such that $B(x,5r_x)\subseteq V$ and whenever $||v||\le 5r_x$, $||o(v)||<k\,||v||$. Thus
$$h(B(x,5r_x))\subseteq B(h(x),6kr_x).$$
From the Vitali covering theorem, there exists a countable disjoint sequence of these balls $\{B(x_i,r_i)\}_{i=1}^\infty$ such that $\{B(x_i,5r_i)\}_{i=1}^\infty=\left\{\widehat{B}_i\right\}_{i=1}^\infty$ covers $T_k$. Then letting $\overline{m_p}$ denote the outer measure determined by $m_p$,
$$\overline{m_p}(h(T_k))\le\overline{m_p}\left(h\left(\cup_{i=1}^\infty\widehat{B}_i\right)\right)\le\sum_{i=1}^\infty\overline{m_p}\left(h\left(\widehat{B}_i\right)\right)\le\sum_{i=1}^\infty m_p(B(h(x_i),6kr_{x_i}))$$
$$=\sum_{i=1}^\infty m_p(B(x_i,6kr_{x_i}))=(6k)^p\sum_{i=1}^\infty m_p(B(x_i,r_{x_i}))\le(6k)^p\,m_p(V)\le(6k)^p\,\varepsilon.$$
Since $\varepsilon>0$ is arbitrary, this shows $m_p(h(T_k))=0$. Now
$$m_p(h(T))=\lim_{k\to\infty}m_p(h(T_k))=0.$$
Lemma 10.4.2 Let $h$ satisfy 10.4.5. If $S$ is a Lebesgue measurable subset of $U$, then $h(S)$ is Lebesgue measurable.

Proof: By Theorem 10.1.2 there exists $F$ which is a countable union of compact sets, $F = \bigcup_{k=1}^{\infty} K_k$, such that
$$F \subseteq S, \quad m_p(S \setminus F) = 0.$$
Then since $h$ is continuous,
$$h(F) = \bigcup_k h(K_k) \in \mathcal{B}(\mathbb{R}^p)$$
because the continuous image of a compact set is compact. Also, $h(S \setminus F)$ is a set of measure zero by Lemma 10.4.1 and so
$$h(S) = h(F) \cup h(S \setminus F) \in \mathcal{F}_p$$
because it is the union of two sets which are in $\mathcal{F}_p$, the $\sigma$-algebra of Lebesgue measurable sets.
In particular, this proves most of the following theorem from a different point of view than the one used before.

Theorem 10.4.3 Let $A$ be a $p \times p$ matrix. Then if $E$ is a Lebesgue measurable set, it follows that $A(E)$ is also a Lebesgue measurable set.

Proof: By Theorem 10.1.2, there exists $F$, a countable union of compact sets $\{K_i\}$, and a set of measure zero $N$ disjoint from $F$ such that $E = F \cup N$, with $m_p(N) = 0$. Since $x \mapsto Ax$ is a continuous map, each $AK_i$ is compact and so $AF$ is a countable union of compact sets. Therefore, it is a Borel set and is therefore Lebesgue measurable. Thus $A(F \cup N) = A(F) \cup A(N)$ and, from what was just shown, $A(N)$ is of measure zero and so it is measurable.
10.5 Change Of Variables, Nonlinear Maps

In this section theorems are proved which yield change of variables formulas for $C^1$ functions. More general versions can be seen in Kuttler [32], Kuttler [33], and Rudin [40]. You can obtain more by exploiting the Radon Nikodym theorem and the Lebesgue fundamental theorem of calculus, two topics which are best studied in a more advanced course. Instead, I will present some good theorems using the Vitali covering theorem directly.

A basic version of the theorems to be presented is the following. If you like, let the balls be defined in terms of the norm
$$\|x\| \equiv \max\{|x_k| : k = 1, \dots, p\}.$$
Lemma 10.5.1 Let $U$ and $V$ be bounded open sets in $\mathbb{R}^p$ and let $h, h^{-1}$ be $C^1$ functions such that $h(U) = V$. Also let $f \in C_c(V)$. Then
$$\int_V f(y)\, dm_p = \int_U f(h(x))\, |\det(Dh(x))|\, dm_p.$$
Proof: First note $h^{-1}(\operatorname{spt}(f))$ is a closed subset of the bounded set $U$ and so it is compact. Thus $x \mapsto f(h(x))|\det(Dh(x))|$ is bounded and continuous.

Let $x \in U$. By the assumption that $h$ and $h^{-1}$ are $C^1$,
$$h(x+v) - h(x) = Dh(x)v + o(v) = Dh(x)\big(v + Dh^{-1}(h(x))\, o(v)\big) = Dh(x)(v + o(v))$$
and so if $r > 0$ is small enough then $B(x, r)$ is contained in $U$ and
$$h(B(x,r)) - h(x) = h(x + B(0,r)) - h(x) \subseteq Dh(x)(B(0, (1+\varepsilon)r)). \quad (10.5.6)$$
Making $r$ still smaller if necessary, one can also obtain
$$|f(y) - f(h(x))| < \varepsilon \quad (10.5.7)$$
for any $y \in h(B(x,r))$ and also
$$\big| f(h(x_1))|\det(Dh(x_1))| - f(h(x))|\det(Dh(x))| \big| < \varepsilon \quad (10.5.8)$$
whenever $x_1 \in B(x,r)$. The collection of such balls is a Vitali cover of $U$. By Corollary 10.3.6 there is a sequence of disjoint closed balls $\{B_i\}$ such that $U = \bigcup_{i=1}^{\infty} B_i \cup N$ where $m_p(N) = 0$. Denote by $x_i$ the center of $B_i$ and $r_i$ the radius. Then by Lemma 10.4.1, the monotone convergence theorem, 10.5.6 - 10.5.8, and the change of variables formula for linear maps, Theorem 10.2.1,
$$\int_V f(y)\, dm_p = \sum_{i=1}^{\infty} \int_{h(B_i)} f(y)\, dm_p \leq \varepsilon m_p(V) + \sum_{i=1}^{\infty} \int_{h(B_i)} f(h(x_i))\, dm_p$$
$$\leq \varepsilon m_p(V) + \sum_{i=1}^{\infty} f(h(x_i))\, m_p(h(B_i)) \leq \varepsilon m_p(V) + \sum_{i=1}^{\infty} f(h(x_i))\, m_p\big(Dh(x_i)(B(0,(1+\varepsilon)r_i))\big)$$
$$= \varepsilon m_p(V) + (1+\varepsilon)^p \sum_{i=1}^{\infty} \int_{B_i} f(h(x_i))\, |\det(Dh(x_i))|\, dm_p$$
$$\leq \varepsilon m_p(V) + (1+\varepsilon)^p \sum_{i=1}^{\infty} \left( \int_{B_i} f(h(x))\, |\det(Dh(x))|\, dm_p + \varepsilon m_p(B_i) \right)$$
$$\leq \varepsilon m_p(V) + (1+\varepsilon)^p \sum_{i=1}^{\infty} \int_{B_i} f(h(x))\, |\det(Dh(x))|\, dm_p + (1+\varepsilon)^p \varepsilon m_p(U)$$
$$= \varepsilon m_p(V) + (1+\varepsilon)^p \int_U f(h(x))\, |\det(Dh(x))|\, dm_p + (1+\varepsilon)^p \varepsilon m_p(U).$$
Since $\varepsilon > 0$ is arbitrary, this shows
$$\int_V f(y)\, dm_p \leq \int_U f(h(x))\, |\det(Dh(x))|\, dm_p \quad (10.5.9)$$
whenever $f \in C_c(V)$. Now $x \mapsto f(h(x))|\det(Dh(x))|$ is in $C_c(U)$ and so, using the same argument with $U$ and $V$ switching roles and replacing $h$ with $h^{-1}$,
$$\int_U f(h(x))\, |\det(Dh(x))|\, dm_p \leq \int_V f\big(h\big(h^{-1}(y)\big)\big)\, \big|\det\big(Dh\big(h^{-1}(y)\big)\big)\big|\, \big|\det\big(Dh^{-1}(y)\big)\big|\, dm_p = \int_V f(y)\, dm_p$$
by the chain rule. This with 10.5.9 proves the lemma.
The next task is to relax the assumption that f is continuous. This will make
use of the following simple lemma.
Lemma 10.5.2 Let $S$ be a nonempty set in $\mathbb{R}^p$ and let
$$\operatorname{dist}(x, S) \equiv \inf\{|x - y| : y \in S\}.$$
Then this function of $x$ is continuous. If $K$ is a compact subset of an open set $G$, there exists a function $f : G \to [0,1]$ such that $\operatorname{spt}(f) \equiv \overline{\{x : f(x) \neq 0\}}$ is a compact subset of $G$ and $f(x) = 1$ for all $x \in K$. This situation is written as $K \prec f \prec G$.
Proof: First consider the claim about the function. Let $x_1, x_2$ be two points; say $\operatorname{dist}(x_1, S) \geq \operatorname{dist}(x_2, S)$. Then let $y \in S$ be such that $\operatorname{dist}(x_2, S) > |x_2 - y| - \varepsilon$. Then
$$|\operatorname{dist}(x_1, S) - \operatorname{dist}(x_2, S)| = \operatorname{dist}(x_1, S) - \operatorname{dist}(x_2, S) \leq |x_1 - y| - (|x_2 - y| - \varepsilon) \leq |x_1 - x_2| + \varepsilon.$$
Since $\varepsilon$ is arbitrary, $|\operatorname{dist}(x_1, S) - \operatorname{dist}(x_2, S)| \leq |x_1 - x_2|$.

Now consider the second claim about $f$. Since $K$ is compact, there exists a positive distance, $2\delta$, between $K$ and $G^C$. (Why?) Now let $D_1, \dots, D_m$ be closed balls of radius $\delta$ which cover $K$. Hence $H \equiv K \cup \bigcup_{i=1}^m D_i$ is a compact set which contains $K$ and is contained in $G$. Now consider the open sets $B(0,k) \cap \big\{x : \operatorname{dist}(x, G^C) > \frac{1}{k}\big\}$. These open sets cover $H$ and are increasing, so there exists one of them, $W$, which contains $H$. Note that the closure of this set is compact because it is closed and bounded. Hence $K \subseteq H \subseteq W \subseteq G$ and $\overline{W}$ is compact. Let
$$f(x) = \frac{\operatorname{dist}(x, W^C)}{\operatorname{dist}(x, W^C) + \operatorname{dist}(x, H)}.$$
Then since $H$ is compact, it has positive distance to $W^C$ and so the denominator is nonzero. Hence the function given has the desired properties.
Note that the lemma proves a little more than needed by having the function
equal 1 on an open set containing K.
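The bump function built in this proof is concrete enough to compute. The following one-dimensional sketch is an illustration only; the specific sets $K = [-1,1]$, $H = [-1.2, 1.2]$, $W = (-1.5, 1.5)$, $G = (-2,2)$ are chosen for the example. It implements $f(x) = \operatorname{dist}(x, W^C)/(\operatorname{dist}(x, W^C) + \operatorname{dist}(x, H))$ and exhibits the properties $K \prec f \prec G$.

```python
# A minimal 1-D sketch of the bump function of Lemma 10.5.2, assuming
# K = [-1, 1], H = [-1.2, 1.2] (K fattened by closed balls),
# W = (-1.5, 1.5), so that K ⊆ H ⊆ W ⊆ G = (-2, 2).

def dist_to_complement(x, a, b):
    """Distance from x to the complement of the open interval (a, b)."""
    if x <= a or x >= b:
        return 0.0
    return min(x - a, b - x)

def dist_to_interval(x, a, b):
    """Distance from x to the closed interval [a, b]."""
    if x < a:
        return a - x
    if x > b:
        return x - b
    return 0.0

def bump(x, H=(-1.2, 1.2), W=(-1.5, 1.5)):
    """f(x) = dist(x, W^C) / (dist(x, W^C) + dist(x, H))."""
    num = dist_to_complement(x, *W)
    den = num + dist_to_interval(x, *H)
    return num / den  # denominator is positive since dist(H, W^C) > 0

# bump equals 1 on H (hence on K), 0 off W, and lies in [0, 1] in between.
```

The denominator never vanishes precisely because $H$ and $W^C$ are a positive distance apart, which is the point made at the end of the proof.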
Corollary 10.5.3 Let $U$ and $V$ be bounded open sets in $\mathbb{R}^p$ and let $h, h^{-1}$ be $C^1$ functions such that $h(U) = V$. Also let $E \subseteq V$ be measurable. Then
$$\int_V \mathcal{X}_E(y)\, dm_p = \int_U \mathcal{X}_E(h(x))\, |\det(Dh(x))|\, dm_p,$$
where $\mathcal{X}_E$ denotes the indicator function of $E$.
Proof: First suppose $E \subseteq H \subseteq V$ where $H$ is compact. By regularity, there exist compact sets $K_k$ and a decreasing sequence of open sets $G_k \subseteq V$ such that
$$K_k \subseteq E \subseteq G_k \text{ and } m_p(G_k \setminus K_k) < 2^{-k}.$$
By Lemma 10.5.2, there exist $f_k$ such that $K_k \prec f_k \prec G_k$. Then $f_k(y) \to \mathcal{X}_E(y)$ a.e. because if $y$ is such that convergence fails, it must be the case that $y$ is in $G_k \setminus K_k$ for infinitely many $k$, while $\sum_k m_p(G_k \setminus K_k) < \infty$. The set of such $y$ equals
$$N = \bigcap_{m=1}^{\infty} \bigcup_{k=m}^{\infty} (G_k \setminus K_k)$$
and so for each $m \in \mathbb{N}$,
$$m_p(N) \leq m_p\Big(\bigcup_{k=m}^{\infty} G_k \setminus K_k\Big) \leq \sum_{k=m}^{\infty} m_p(G_k \setminus K_k) < \sum_{k=m}^{\infty} 2^{-k} = 2^{-(m-1)},$$
showing $m_p(N) = 0$.

Then $f_k(h(x))$ must converge to $\mathcal{X}_E(h(x))$ for all $x \notin h^{-1}(N)$, a set of measure zero by Lemma 10.4.1. Thus $\mathcal{X}_E(h(x)) = \lim_{k \to \infty} f_k(h(x))$ off $h^{-1}(N)$ and so, by completeness of Lebesgue measure, $x \mapsto \mathcal{X}_E(h(x))$ is measurable. Then
$$\int_V f_k(y)\, dm_p = \int_U f_k(h(x))\, |\det(Dh(x))|\, dm_p.$$
Since $V$ is bounded, $\overline{G_1}$ is compact. Therefore, $|\det(Dh(x))|$ is bounded independent of $k$ and so, by the dominated convergence theorem, using a dominating function $\mathcal{X}_V$ in the integral on the left and $\mathcal{X}_{G_1}|\det(Dh)|$ on the right, it follows
$$\int_V \mathcal{X}_E(y)\, dm_p = \int_U \mathcal{X}_E(h(x))\, |\det(Dh(x))|\, dm_p.$$
For an arbitrary measurable $E$, let $V = \bigcup_{k=1}^{\infty} H_k$ where $H_k$ is compact and $H_k \subseteq H_{k+1}$. Let $E_k = H_k \cap E$ replace $E$ in the above and use the monotone convergence theorem, letting $k \to \infty$.
You don’t need to assume the open sets are bounded.
Corollary 10.5.4 Let $U$ and $V$ be open sets in $\mathbb{R}^p$ and let $h, h^{-1}$ be $C^1$ functions such that $h(U) = V$. Also let $E \subseteq V$ be measurable. Then
$$\int_V \mathcal{X}_E(y)\, dm_p = \int_U \mathcal{X}_E(h(x))\, |\det(Dh(x))|\, dm_p.$$

Proof: Since both $h, h^{-1}$ are continuous, $h$ maps open sets to open sets. Let $U_n = B(0,n) \cap U$ where $n$ is large enough that this intersection is nonempty. Let $V_n = h(U_n)$, an open set. Then if $E$ is a Lebesgue measurable set, the above implies
$$\int_{V_n} \mathcal{X}_{E \cap V_n}(y)\, dm_p = \int_{U_n} \mathcal{X}_{E \cap V_n}(h(x))\, |\det Dh(x)|\, dm_p.$$
Hence
$$\int_V \mathcal{X}_{E \cap V_n}(y)\, dm_p = \int_U \mathcal{X}_{E \cap V_n}(h(x))\, |\det Dh(x)|\, dm_p.$$
Now let $n \to \infty$ and use the monotone convergence theorem.
With this corollary, the main theorem follows.
Theorem 10.5.5 Let $U$ and $V$ be open sets in $\mathbb{R}^p$ and let $h, h^{-1}$ be $C^1$ functions such that $h(U) = V$. Then if $g$ is a nonnegative Lebesgue measurable function,
$$\int_V g(y)\, dm_p = \int_U g(h(x))\, |\det(Dh(x))|\, dm_p. \quad (10.5.10)$$

Proof: From Corollary 10.5.4, 10.5.10 holds for any nonnegative simple function in place of $g$. In general, let $\{s_k\}$ be an increasing sequence of simple functions which converges to $g$ pointwise. Then from the monotone convergence theorem,
$$\int_V g(y)\, dm_p = \lim_{k \to \infty} \int_V s_k\, dm_p = \lim_{k \to \infty} \int_U s_k(h(x))\, |\det(Dh(x))|\, dm_p = \int_U g(h(x))\, |\det(Dh(x))|\, dm_p.$$
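As a quick numerical sanity check (an illustration, not part of the text), one can compare midpoint Riemann sums of the two sides of the change of variables formula in one dimension. The particular map $h(x) = x^3 + x$, which carries $U = (0,1)$ one to one onto $V = (0,2)$ with $C^1$ inverse, and the integrand $g(y) = y$ are chosen only for this example; both sides evaluate to $2$.

```python
# Midpoint-rule check of the change of variables formula in dimension one,
# for the illustrative choices h(x) = x^3 + x, U = (0,1), V = h(U) = (0,2),
# and g(y) = y.

def riemann(f, a, b, n=20000):
    """Midpoint Riemann sum approximating the integral of f over (a, b)."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

h = lambda x: x**3 + x
dh = lambda x: 3 * x**2 + 1      # det Dh(x) in dimension one
g = lambda y: y                  # a nonnegative integrand

lhs = riemann(g, 0.0, 2.0)                                # integral over V
rhs = riemann(lambda x: g(h(x)) * abs(dh(x)), 0.0, 1.0)   # integral over U
```

The two sums agree to quadrature accuracy, as the theorem predicts.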
Of course this theorem implies the following corollary by splitting up the function
into the positive and negative parts of the real and imaginary parts.
Corollary 10.5.6 Let $U$ and $V$ be open sets in $\mathbb{R}^p$ and let $h, h^{-1}$ be $C^1$ functions such that $h(U) = V$. Let $g \in L^1(V)$. Then
$$\int_V g(y)\, dm_p = \int_U g(h(x))\, |\det(Dh(x))|\, dm_p.$$
This is a pretty good theorem but it isn't too hard to generalize it. In particular, it is not necessary to assume $h^{-1}$ is $C^1$.

In what follows, it may be convenient to take
$$\|x\| = \max\{|x_i| : i = 1, \dots, p\} \equiv \|x\|_{\infty}$$
and use the operator norm for $A \in \mathcal{L}(\mathbb{R}^p, \mathbb{R}^p)$.
10.6 The Mapping Is Only One To One

The following is Sard's lemma. In the proof, it does not matter which norm you use in defining balls, but it may be easiest to consider the norm $\|x\| \equiv \max\{|x_i| : i = 1, \dots, p\}$.

Lemma 10.6.1 (Sard) Let $U$ be an open set in $\mathbb{R}^p$ and let $h : U \to \mathbb{R}^p$ be differentiable. Let
$$Z \equiv \{x \in U : \det Dh(x) = 0\}.$$
Then $m_p(h(Z)) = 0$.
Proof: For convenience, assume the balls in the following argument come from $\|\cdot\|_{\infty}$. First note that $Z$ is a Borel set: since $h$ is differentiable, each entry of the Jacobian matrix is a pointwise limit of continuous difference quotients and hence Borel measurable, and so the determinant is also Borel measurable.

Suppose that $U$ is a bounded open set. Let $\varepsilon > 0$ be given. Also let $V \supseteq Z$ with $V \subseteq U$ open, and
$$m_p(Z) + \varepsilon > m_p(V).$$
Now let $x \in Z$. Then since $h$ is differentiable at $x$, there exists $\delta_x > 0$ such that if $r < \delta_x$, then $B(x,r) \subseteq V$ and also
$$h(B(x,r)) \subseteq h(x) + Dh(x)(B(0,r)) + B(0, r\eta), \quad \eta < 1.$$
Regard $Dh(x)$ as a $p \times p$ matrix, the matrix of the linear transformation $Dh(x)$ with respect to the usual coordinates. Since $x \in Z$, it follows that there exists an invertible matrix $A$ such that $A\,Dh(x)$ is in row reduced echelon form with a row of zeros on the bottom. Therefore, since $m_p$ is translation invariant,
$$m_p(A(h(B(x,r)))) \leq m_p(A\,Dh(x)(B(0,r)) + AB(0, r\eta)). \quad (10.6.11)$$
The diameter of $A\,Dh(x)(B(0,r))$ is no larger than $\|A\|\,\|Dh(x)\|\,2r$ and it lies in $\mathbb{R}^{p-1} \times \{0\}$. The diameter of $AB(0, r\eta)$ is no more than $\|A\|(2r\eta)$. Therefore, the measure of the right side in 10.6.11 is no more than
$$\big(\|A\|\,\|Dh(x)\|\,2r + \|A\|\,2r\eta\big)^{p-1}\big(\|A\|\,2r\eta\big) \leq C(\|A\|, \|Dh(x)\|)\,(2r)^p\,\eta.$$
Hence from the change of variables formula for linear maps,
$$m_p(h(B(x,r))) \leq \eta\, \frac{C(\|A\|, \|Dh(x)\|)}{|\det(A)|}\, m_p(B(x,r)).$$
Then letting $\delta_x$ be still smaller if necessary, corresponding to sufficiently small $\eta$,
$$m_p(h(B(x,r))) \leq \varepsilon\, m_p(B(x,r)).$$
The balls of this form constitute a Vitali cover of $Z$. Hence, by the Vitali covering theorem, there exists $\{B_i\}_{i=1}^{\infty}$, $B_i = B_i(x_i, r_i)$, a collection of disjoint balls, each of which is contained in $V$, such that $m_p(h(B_i)) \leq \varepsilon m_p(B_i)$ and $m_p(Z \setminus \bigcup_i B_i) = 0$. Hence from Lemma 10.4.1,
$$m_p\Big(h(Z) \setminus \bigcup_i h(B_i)\Big) \leq m_p\Big(h\Big(Z \setminus \bigcup_i B_i\Big)\Big) = 0.$$
Therefore,
$$m_p(h(Z)) \leq \sum_i m_p(h(B_i)) \leq \varepsilon \sum_i m_p(B_i) \leq \varepsilon\, m_p(V) \leq \varepsilon\,(m_p(Z) + \varepsilon).$$
Since $\varepsilon$ is arbitrary, this shows $m_p(h(Z)) = 0$. What if $U$ is not bounded? Then consider $Z_n = Z \cap B(0,n)$. From what was just shown, $h(Z_n)$ has measure zero and so it follows that $h(Z)$ also does, being the countable union of sets of measure zero.
With this important lemma, here is a generalization of Theorem 10.5.5.
Theorem 10.6.2 Let $U$ be an open set and let $h$ be a one to one, $C^1(U)$ function with values in $\mathbb{R}^p$. Then if $g$ is a nonnegative Lebesgue measurable function,
$$\int_{h(U)} g(y)\, dm_p = \int_U g(h(x))\, |\det(Dh(x))|\, dm_p. \quad (10.6.12)$$

Proof: Let $Z = \{x : \det(Dh(x)) = 0\}$, a closed set. Then by the inverse function theorem, $h^{-1}$ is $C^1$ on $h(U \setminus Z)$ and $h(U \setminus Z)$ is an open set. Therefore, from Lemma 10.6.1, $h(Z)$ has measure zero and so by Theorem 10.5.5,
$$\int_{h(U)} g(y)\, dm_p = \int_{h(U \setminus Z)} g(y)\, dm_p = \int_{U \setminus Z} g(h(x))\, |\det(Dh(x))|\, dm_p = \int_U g(h(x))\, |\det(Dh(x))|\, dm_p.$$
10.7 Mappings Which Are Not One To One

Now suppose $h$ is only $C^1$, not necessarily one to one. For
$$U_+ \equiv \{x \in U : |\det Dh(x)| > 0\}$$
and $Z$ the set where $|\det Dh(x)| = 0$, Lemma 10.6.1 implies $m_p(h(Z)) = 0$. For $x \in U_+$, the inverse function theorem implies there exists an open set $B_x \subseteq U_+$ such that $h$ is one to one on $B_x$.

Let $\{B_i\}$ be a countable subset of $\{B_x\}_{x \in U_+}$ such that $U_+ = \bigcup_{i=1}^{\infty} B_i$. Let $E_1 = B_1$. If $E_1, \dots, E_k$ have been chosen, $E_{k+1} = B_{k+1} \setminus \bigcup_{i=1}^{k} E_i$. Thus
$$\bigcup_{i=1}^{\infty} E_i = U_+, \quad h \text{ is one to one on } E_i, \quad E_i \cap E_j = \emptyset \text{ for } i \neq j,$$
and each $E_i$ is a Borel set contained in the open set $B_i$. Now define
$$n(y) \equiv \sum_{i=1}^{\infty} \mathcal{X}_{h(E_i)}(y) + \mathcal{X}_{h(Z)}(y).$$
The sets $h(E_i)$, $h(Z)$ are measurable by Lemma 10.4.2. Thus $n(\cdot)$ is measurable.
Lemma 10.7.1 Let $F \subseteq h(U)$ be measurable. Then
$$\int_{h(U)} n(y)\mathcal{X}_F(y)\, dm_p = \int_U \mathcal{X}_F(h(x))\, |\det Dh(x)|\, dm_p.$$

Proof: Using Lemma 10.6.1 and the monotone convergence theorem, and noting that the $\mathcal{X}_{h(Z)}$ term contributes nothing because $m_p(h(Z)) = 0$,
$$\int_{h(U)} n(y)\mathcal{X}_F(y)\, dm_p = \int_{h(U)} \Big( \sum_{i=1}^{\infty} \mathcal{X}_{h(E_i)}(y) + \mathcal{X}_{h(Z)}(y) \Big) \mathcal{X}_F(y)\, dm_p$$
$$= \sum_{i=1}^{\infty} \int_{h(U)} \mathcal{X}_{h(E_i)}(y)\mathcal{X}_F(y)\, dm_p = \sum_{i=1}^{\infty} \int_{h(B_i)} \mathcal{X}_{h(E_i)}(y)\mathcal{X}_F(y)\, dm_p$$
$$= \sum_{i=1}^{\infty} \int_{B_i} \mathcal{X}_{E_i}(x)\mathcal{X}_F(h(x))\, |\det Dh(x)|\, dm_p = \sum_{i=1}^{\infty} \int_U \mathcal{X}_{E_i}(x)\mathcal{X}_F(h(x))\, |\det Dh(x)|\, dm_p$$
$$= \int_U \sum_{i=1}^{\infty} \mathcal{X}_{E_i}(x)\mathcal{X}_F(h(x))\, |\det Dh(x)|\, dm_p = \int_{U_+} \mathcal{X}_F(h(x))\, |\det Dh(x)|\, dm_p = \int_U \mathcal{X}_F(h(x))\, |\det Dh(x)|\, dm_p.$$
Definition 10.7.2 For $y \in h(U)$, define a function $\#$ according to the formula
$$\#(y) \equiv \text{number of elements in } h^{-1}(y).$$
Observe that
$$\#(y) = n(y) \text{ a.e.} \quad (10.7.13)$$
because $n(y) = \#(y)$ if $y \notin h(Z)$, a set of measure 0. Therefore, $\#$ is a measurable function because of completeness of Lebesgue measure.
Theorem 10.7.3 Let $g \geq 0$, $g$ measurable, and let $h$ be $C^1(U)$. Then
$$\int_{h(U)} \#(y) g(y)\, dm_p = \int_U g(h(x))\, |\det Dh(x)|\, dm_p. \quad (10.7.14)$$

Proof: From 10.7.13 and Lemma 10.7.1, 10.7.14 holds for all $g$, a nonnegative simple function. Approximating an arbitrary measurable nonnegative function $g$ with an increasing pointwise convergent sequence of simple functions and using the monotone convergence theorem yields 10.7.14 for an arbitrary nonnegative measurable function $g$.
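For instance (an illustration, not from the text), $h(x) = x^2$ on $U = (-1,1)$ is $C^1$ but not one to one: every $y \in (0,1)$ has exactly two preimages, so $\#(y) = 2$ a.e. on $h(U) = [0,1)$. A midpoint-rule comparison of the two sides of 10.7.14 with $g(y) = y$ gives 1 on both sides.

```python
# Numerical illustration of Theorem 10.7.3 with the non-injective map
# h(x) = x^2 on U = (-1, 1), where #(y) = 2 a.e. on h(U) = [0, 1),
# and the example integrand g(y) = y.

def riemann(f, a, b, n=20000):
    """Midpoint Riemann sum approximating the integral of f over (a, b)."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

g = lambda y: y
lhs = riemann(lambda y: 2 * g(y), 0.0, 1.0)               # ∫ #(y) g(y) dy
rhs = riemann(lambda x: g(x**2) * abs(2 * x), -1.0, 1.0)  # ∫ g(h(x)) |det Dh(x)| dx
```

The factor $\#(y) = 2$ on the left is exactly what compensates for the two branches $\pm\sqrt{y}$ on the right.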
10.8 Spherical Coordinates In p Dimensions

Sometimes there is a need to deal with spherical coordinates in more than three dimensions. In this section, this concept is defined and formulas are derived for these coordinate systems. Recall polar coordinates are of the form
$$y_1 = \rho \cos\theta, \quad y_2 = \rho \sin\theta$$
where $\rho > 0$ and $\theta \in \mathbb{R}$. Thus these transformation equations are not one to one, but they are one to one on $(0, \infty) \times [0, 2\pi)$. Here I am writing $\rho$ in place of $r$ to emphasize a pattern which is about to emerge. I will consider polar coordinates as spherical coordinates in two dimensions. I will also simply refer to such coordinate systems as polar coordinates regardless of the dimension. This is also the reason I am writing $y_1$ and $y_2$ instead of the more usual $x$ and $y$. Now consider what happens when you go to three dimensions. The situation is depicted in the following picture.

[Figure: a point of $\mathbb{R}^3 = \mathbb{R}^2 \times \mathbb{R}$ at distance $\rho$ from the origin, whose position vector makes the angle $\phi_1$ with the third coordinate axis.]

From this picture, you see that $y_3 = \rho \cos\phi_1$. Also the distance between $(y_1, y_2)$ and $(0,0)$ is $\rho \sin(\phi_1)$. Therefore, using polar coordinates to write $(y_1, y_2)$ in terms of $\theta$ and this distance,
$$y_1 = \rho \sin\phi_1 \cos\theta, \quad y_2 = \rho \sin\phi_1 \sin\theta, \quad y_3 = \rho \cos\phi_1,$$
where $\phi_1 \in \mathbb{R}$ and the transformations are one to one if $\phi_1$ is restricted to be in $[0, \pi]$. What was done is to replace $\rho$ with $\rho \sin\phi_1$ and then to add in $y_3 = \rho \cos\phi_1$. Having done this, there is no reason to stop with three dimensions. Consider the following picture:
[Figure: a point of $\mathbb{R}^4 = \mathbb{R}^3 \times \mathbb{R}$ at distance $\rho$ from the origin, whose position vector makes the angle $\phi_2$ with the fourth coordinate axis.]

From this picture, you see that $y_4 = \rho \cos\phi_2$. Also the distance between $(y_1, y_2, y_3)$ and $(0,0,0)$ is $\rho \sin(\phi_2)$. Therefore, using polar coordinates to write $(y_1, y_2, y_3)$ in terms of $\theta$, $\phi_1$, and this distance,
$$y_1 = \rho \sin\phi_2 \sin\phi_1 \cos\theta, \quad y_2 = \rho \sin\phi_2 \sin\phi_1 \sin\theta, \quad y_3 = \rho \sin\phi_2 \cos\phi_1, \quad y_4 = \rho \cos\phi_2,$$
where $\phi_2 \in \mathbb{R}$ and the transformations will be one to one if
$$\phi_2, \phi_1 \in [0, \pi], \quad \theta \in [0, 2\pi), \quad \rho \in (0, \infty).$$
Continuing this way, given spherical coordinates in $\mathbb{R}^p$, to get the spherical coordinates in $\mathbb{R}^{p+1}$, you let $y_{p+1} = \rho \cos\phi_{p-1}$ and then replace every occurrence of $\rho$ with $\rho \sin\phi_{p-1}$ to obtain $y_1, \dots, y_p$ in terms of $\phi_1, \phi_2, \dots, \phi_{p-1}$, $\theta$, and $\rho$.

It is always the case that $\rho$ measures the distance from the point in $\mathbb{R}^p$ to the origin $0$ in $\mathbb{R}^p$. Each $\phi_i \in \mathbb{R}$ and the transformations will be one to one if each $\phi_i \in [0, \pi]$ and $\theta \in [0, 2\pi)$. It can be shown using math induction that these coordinates map $\prod_{i=1}^{p-2}[0, \pi] \times [0, 2\pi) \times (0, \infty)$ one to one onto $\mathbb{R}^p \setminus \{0\}$.
Theorem 10.8.1 Let $y = h(\phi, \theta, \rho)$ be the spherical coordinate transformation in $\mathbb{R}^p$. Then letting $A = \prod_{i=1}^{p-2}[0, \pi] \times [0, 2\pi)$, it follows $h$ maps $A \times (0, \infty)$ one to one onto $\mathbb{R}^p \setminus \{0\}$. Also $|\det Dh(\phi, \theta, \rho)|$ will always be of the form
$$|\det Dh(\phi, \theta, \rho)| = \rho^{p-1}\, \Phi(\phi, \theta) \quad (10.8.15)$$
where $\Phi$ is a continuous function of $\phi$ and $\theta$.¹ Furthermore, whenever $f$ is Lebesgue measurable and nonnegative,
$$\int_{\mathbb{R}^p} f(y)\, dy = \int_0^{\infty} \rho^{p-1} \int_A f(h(\phi, \theta, \rho))\, \Phi(\phi, \theta)\, d\phi\, d\theta\, d\rho \quad (10.8.16)$$
where here $d\phi\, d\theta$ denotes $dm_{p-1}$ on $A$. The same formula holds if $f \in L^1(\mathbb{R}^p)$.

Proof: Formula 10.8.15 is obvious from the definition of the spherical coordinates. The first claim is also clear from the definition and math induction. It remains to verify 10.8.16. Let $A_0 \equiv \prod_{i=1}^{p-2}(0, \pi) \times (0, 2\pi)$. Then it is clear that $(A \setminus A_0) \times (0, \infty) \equiv N$ is a set of measure zero in $\mathbb{R}^p$. Therefore, from Lemma 10.4.1 it follows $h(N)$ is also a set of measure zero. Therefore, using the change of variables theorem, Fubini's theorem, and Sard's lemma,
$$\int_{\mathbb{R}^p} f(y)\, dy = \int_{\mathbb{R}^p \setminus \{0\}} f(y)\, dy = \int_{\mathbb{R}^p \setminus (\{0\} \cup h(N))} f(y)\, dy = \int_{A_0 \times (0,\infty)} f(h(\phi, \theta, \rho))\, \rho^{p-1}\, \Phi(\phi, \theta)\, dm_p$$
$$= \int \mathcal{X}_{A \times (0,\infty)}(\phi, \theta, \rho)\, f(h(\phi, \theta, \rho))\, \rho^{p-1}\, \Phi(\phi, \theta)\, dm_p = \int_0^{\infty} \rho^{p-1} \Big( \int_A f(h(\phi, \theta, \rho))\, \Phi(\phi, \theta)\, d\phi\, d\theta \Big) d\rho.$$
Now the claim about $f \in L^1$ follows routinely from considering the positive and negative parts of the real and imaginary parts of $f$ in the usual way.

¹Actually it is only a function of the first but this is not important in what follows.
Notation 10.8.2 Often this is written differently. Note that from the spherical coordinate formulas, $f(h(\phi, \theta, \rho)) = f(\rho\omega)$ where $|\omega| = 1$. Letting $S^{p-1}$ denote the unit sphere $\{\omega \in \mathbb{R}^p : |\omega| = 1\}$, the inside integral in the above formula is sometimes written as
$$\int_{S^{p-1}} f(\rho\omega)\, d\sigma$$
where $\sigma$ is a measure on $S^{p-1}$. See [32] for another description of this measure. It isn't an important issue here. Either 10.8.16 or the formula
$$\int_0^{\infty} \rho^{p-1} \Big( \int_{S^{p-1}} f(\rho\omega)\, d\sigma \Big) d\rho$$
will be referred to as polar coordinates and is very useful in establishing estimates. Here $\sigma\big(S^{p-1}\big) \equiv \int_A \Phi(\phi, \theta)\, d\phi\, d\theta$.
Example 10.8.3 For what values of $s$ is the integral $\int_{B(0,R)} \big(1 + |x|^2\big)^s\, dy$ bounded independent of $R$? Here $B(0,R)$ is the ball $\{x \in \mathbb{R}^p : |x| \leq R\}$.

I think you can see immediately that $s$ must be negative, but exactly how negative? It turns out it depends on $p$, and using polar coordinates, you can find just exactly what is needed. From the polar coordinates formula above,
$$\int_{B(0,R)} \big(1 + |x|^2\big)^s\, dy = \int_0^R \int_{S^{p-1}} \big(1 + \rho^2\big)^s \rho^{p-1}\, d\sigma\, d\rho = C_p \int_0^R \big(1 + \rho^2\big)^s \rho^{p-1}\, d\rho.$$
Now the very hard problem has been reduced to considering an easy one variable problem of finding when
$$\int_0^R \rho^{p-1}\big(1 + \rho^2\big)^s\, d\rho$$
is bounded independent of $R$. You need $2s + (p-1) < -1$, so you need $s < -p/2$.
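A quick numerical illustration (not part of the text): for $p = 3$ the threshold is $s < -3/2$. Computing the radial integral by a midpoint rule, the choice $s = -2$ gives values that stabilize as $R$ grows, while $s = -1$ grows roughly linearly in $R$.

```python
# Midpoint-rule evaluation of the radial integral from Example 10.8.3,
# I(s, R) = ∫_0^R ρ^{p-1} (1 + ρ^2)^s dρ, in the case p = 3.

def radial_integral(s, R, p=3, n=20000):
    dr = R / n
    total = 0.0
    for i in range(n):
        rho = (i + 0.5) * dr
        total += rho ** (p - 1) * (1.0 + rho * rho) ** s
    return total * dr

bounded = [radial_integral(-2.0, R) for R in (10.0, 100.0)]    # s < -p/2: bounded
unbounded = [radial_integral(-1.0, R) for R in (10.0, 100.0)]  # s > -p/2: grows
```

The bounded case changes very little between $R = 10$ and $R = 100$, matching the condition $2s + (p-1) < -1$.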
10.9 Brouwer Fixed Point Theorem

The Brouwer fixed point theorem is one of the most significant theorems in mathematics. There exist relatively easy proofs of this important theorem. The proof I am giving here is the one given in Evans [17]. I think it is one of the shortest and easiest proofs of this important theorem. It is based on the following lemma which is an interesting result about cofactors of a matrix.

Recall that for $A$ a $p \times p$ matrix, $\operatorname{cof}(A)_{ij}$ is the determinant of the matrix which results from deleting the $i^{th}$ row and the $j^{th}$ column, multiplied by $(-1)^{i+j}$. In the proof and in what follows, I am using $Dg$ to equal the matrix of the linear transformation $Dg$ taken with respect to the usual basis on $\mathbb{R}^p$. Thus
$$Dg(x) = \sum_{ij} (Dg)_{ij}\, e_i e_j$$
and recall that $(Dg)_{ij} = \partial g_i / \partial x_j$ where $g = \sum_i g_i e_i$.

Lemma 10.9.1 Let $g : U \to \mathbb{R}^p$ be $C^2$ where $U$ is an open subset of $\mathbb{R}^p$. Then
$$\sum_{j=1}^p \operatorname{cof}(Dg)_{ij,j} = 0,$$
where here $(Dg)_{ij} \equiv g_{i,j} \equiv \frac{\partial g_i}{\partial x_j}$. Also, $\operatorname{cof}(Dg)_{ij} = \frac{\partial \det(Dg)}{\partial g_{i,j}}$.
Proof: From the cofactor expansion theorem,
$$\det(Dg) = \sum_{i=1}^p g_{i,j} \operatorname{cof}(Dg)_{ij}$$
and so
$$\frac{\partial \det(Dg)}{\partial g_{i,j}} = \operatorname{cof}(Dg)_{ij} \quad (10.9.17)$$
which shows the last claim of the lemma. Also
$$\delta_{kj} \det(Dg) = \sum_i g_{i,k} (\operatorname{cof}(Dg))_{ij} \quad (10.9.18)$$
because if $k \neq j$ this is just the cofactor expansion of the determinant of a matrix in which the $k^{th}$ and $j^{th}$ columns are equal. Differentiate 10.9.18 with respect to $x_j$ and sum on $j$. This yields
$$\sum_{r,s,j} \delta_{kj} \frac{\partial(\det Dg)}{\partial g_{r,s}}\, g_{r,sj} = \sum_{ij} g_{i,kj} (\operatorname{cof}(Dg))_{ij} + \sum_{ij} g_{i,k} \operatorname{cof}(Dg)_{ij,j}.$$
Hence, using $\delta_{kj} = 0$ if $j \neq k$ and 10.9.17,
$$\sum_{rs} (\operatorname{cof}(Dg))_{rs}\, g_{r,sk} = \sum_{rs} g_{r,ks} (\operatorname{cof}(Dg))_{rs} + \sum_{ij} g_{i,k} \operatorname{cof}(Dg)_{ij,j}.$$
Subtracting the first sum on the right from both sides and using the equality of mixed partials,
$$\sum_i g_{i,k} \Big( \sum_j (\operatorname{cof}(Dg))_{ij,j} \Big) = 0.$$
If $\det(g_{i,k}) \neq 0$ so that $(g_{i,k})$ is invertible, this shows $\sum_j (\operatorname{cof}(Dg))_{ij,j} = 0$. If $\det(Dg) = 0$, let
$$g_k(x) = g(x) + \varepsilon_k x$$
where $\varepsilon_k \to 0$ and $\det(Dg + \varepsilon_k I) \equiv \det(Dg_k) \neq 0$. Then
$$\sum_j (\operatorname{cof}(Dg))_{ij,j} = \lim_{k \to \infty} \sum_j (\operatorname{cof}(Dg_k))_{ij,j} = 0.$$
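The identity can be checked numerically (a sketch, assuming $p = 2$ and an arbitrary smooth test map chosen for the example). In two dimensions $\operatorname{cof}(Dg)$ has first row $(g_{2,2}, -g_{2,1})$, so the row sum $\sum_j \operatorname{cof}(Dg)_{1j,j} = \partial_1 g_{2,2} - \partial_2 g_{2,1}$ vanishes by equality of mixed partials; the finite-difference computation below confirms this at sample points.

```python
# Finite-difference check of Lemma 10.9.1 in the case p = 2, where
# cof(Dg) row 1 is (g_{2,2}, -g_{2,1}) and the identity reduces to the
# equality of mixed partials of g_2. The map g_2 is an arbitrary smooth
# example chosen for the test.

import math

def g2(x, y):
    """Second component of an example map g."""
    return x * y**3 + math.cos(x)

def partial(f, x, y, var, h=1e-5):
    """Central difference approximation of a first partial derivative."""
    if var == 0:
        return (f(x + h, y) - f(x - h, y)) / (2 * h)
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

def cof_row1_divergence(x, y, h=1e-4):
    """Approximates cof(Dg)_{11,1} + cof(Dg)_{12,2} = d_x(dg2/dy) - d_y(dg2/dx)."""
    cof11 = lambda u, v: partial(g2, u, v, 1)    # cof(Dg)_{11} = g_{2,2}
    cof12 = lambda u, v: -partial(g2, u, v, 0)   # cof(Dg)_{12} = -g_{2,1}
    return partial(cof11, x, y, 0, h) + partial(cof12, x, y, 1, h)

# cof_row1_divergence is ≈ 0 at every point.
```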
Definition 10.9.2 Let $h$ be a function defined on an open set $U \subseteq \mathbb{R}^p$. Then $h \in C^k\big(\overline{U}\big)$ if there exists a function $g$ defined on an open set $W$ containing $\overline{U}$ such that $g = h$ on $U$ and $g$ is $C^k(W)$.
In the following lemma, you could use any norm in defining the balls and everything would work the same, but I have in mind the usual norm.
Lemma 10.9.3 There does not exist $h \in C^2\big(\overline{B(0,R)}\big)$ such that $h : \overline{B(0,R)} \to \partial B(0,R)$ which also has the property that $h(x) = x$ for all $x \in \partial B(0,R)$. Such a function is called a retraction.

Proof: Suppose such an $h$ exists. Let $\lambda \in [0,1]$ and let $p_{\lambda}(x) \equiv x + \lambda(h(x) - x)$. This function $p_{\lambda}$ is called a homotopy of the identity map and the retraction $h$. Let
$$I(\lambda) \equiv \int_{B(0,R)} \det(Dp_{\lambda}(x))\, dx.$$
Then using the dominated convergence theorem,
$$I'(\lambda) = \int_{B(0,R)} \sum_{i,j} \frac{\partial \det(Dp_{\lambda}(x))}{\partial p_{\lambda i,j}}\, \frac{\partial p_{\lambda ij}(x)}{\partial \lambda}\, dx = \int_{B(0,R)} \sum_i \sum_j \frac{\partial \det(Dp_{\lambda}(x))}{\partial p_{\lambda i,j}}\, (h_i(x) - x_i)_{,j}\, dx$$
$$= \int_{B(0,R)} \sum_i \sum_j \operatorname{cof}(Dp_{\lambda}(x))_{ij}\, (h_i(x) - x_i)_{,j}\, dx.$$
Now by assumption, $h_i(x) = x_i$ on $\partial B(0,R)$ and so one can form iterated integrals and integrate by parts in each of the one dimensional integrals to obtain
$$I'(\lambda) = -\sum_i \int_{B(0,R)} \sum_j \operatorname{cof}(Dp_{\lambda}(x))_{ij,j}\, (h_i(x) - x_i)\, dx = 0.$$
Therefore, $I(\lambda)$ equals a constant. However,
$$I(0) = m_p(B(0,R)) > 0$$
but
$$I(1) = \int_{B(0,R)} \det(Dh(x))\, dm_p = \int_{\partial B(0,R)} \#(y)\, dm_p = 0$$
because from polar coordinates or other elementary reasoning, $m_p(\partial B(0,R)) = 0$.
The following is the Brouwer fixed point theorem for $C^2$ maps.

Lemma 10.9.4 If $h \in C^2\big(\overline{B(0,R)}\big)$ and $h : \overline{B(0,R)} \to \overline{B(0,R)}$, then $h$ has a fixed point, $x$ such that $h(x) = x$.

Proof: Suppose the lemma is not true. Then for all $x$, $|x - h(x)| \neq 0$. Then define
$$g(x) = h(x) + \frac{x - h(x)}{|x - h(x)|}\, t(x)$$
where $t(x)$ is nonnegative and is chosen such that $g(x) \in \partial B(0,R)$. This mapping is illustrated in the following picture.

[Figure: the points $h(x)$, $x$, and $g(x)$, with $g(x)$ the point where the ray from $h(x)$ through $x$ meets $\partial B(0,R)$.]

If $x \mapsto t(x)$ is $C^2$ near $\overline{B(0,R)}$, it will follow $g$ is a $C^2$ retraction onto $\partial B(0,R)$, contrary to Lemma 10.9.3. Now $t(x)$ is the nonnegative solution $t$ to
$$H(x,t) = |h(x)|^2 + 2\left( h(x), \frac{x - h(x)}{|x - h(x)|} \right) t + t^2 = R^2. \quad (10.9.19)$$
Then
$$H_t(x,t) = 2\left( h(x), \frac{x - h(x)}{|x - h(x)|} \right) + 2t.$$
If this is nonzero for all $x$ near $\overline{B(0,R)}$, it follows from the implicit function theorem that $t$ is a $C^2$ function of $x$. From 10.9.19,
$$2t = -2\left( h(x), \frac{x - h(x)}{|x - h(x)|} \right) \pm \sqrt{ 4\left( h(x), \frac{x - h(x)}{|x - h(x)|} \right)^2 - 4\big(|h(x)|^2 - R^2\big) }$$
and so
$$H_t(x,t) = 2t + 2\left( h(x), \frac{x - h(x)}{|x - h(x)|} \right) = \pm\sqrt{ 4\big(R^2 - |h(x)|^2\big) + 4\left( h(x), \frac{x - h(x)}{|x - h(x)|} \right)^2 }.$$
If $|h(x)| < R$, this is nonzero. If $|h(x)| = R$, then it is still nonzero unless
$$(h(x), x - h(x)) = 0.$$
But this cannot happen because the angle between $h(x)$ and $x - h(x)$ cannot be $\pi/2$. Alternatively, if the above equals zero, you would need
$$(h(x), x) = |h(x)|^2 = R^2$$
which cannot happen unless $x = h(x)$, which is assumed not to happen. Therefore, $x \mapsto t(x)$ is $C^2$ near $\overline{B(0,R)}$ and so $g(x)$ given above contradicts Lemma 10.9.3.
Now it is easy to prove the Brouwer ﬁxed point theorem.
Theorem 10.9.5 Let $f : \overline{B(0,R)} \to \overline{B(0,R)}$ be continuous. Then $f$ has a fixed point.

Proof: If this is not so, there exists $\varepsilon > 0$ such that for all $x \in \overline{B(0,R)}$,
$$|x - f(x)| > \varepsilon.$$
By the Weierstrass approximation theorem, there exists $h$, a polynomial, such that
$$\max\big\{ |h(x) - f(x)| : x \in \overline{B(0,R)} \big\} < \frac{\varepsilon}{2}.$$
Then for all $x \in \overline{B(0,R)}$,
$$|x - h(x)| \geq |x - f(x)| - |h(x) - f(x)| > \varepsilon - \frac{\varepsilon}{2} = \frac{\varepsilon}{2},$$
contradicting Lemma 10.9.4.
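As an illustration of the theorem (not of its proof), here is a continuous self-map of the closed unit disk in $\mathbb{R}^2$. The particular map is made up for the example and happens to be a contraction, so simple iteration locates the fixed point whose existence Brouwer's theorem guarantees; for a general continuous self-map the theorem still gives existence, but iteration need not converge.

```python
# A continuous self-map of the closed unit disk: rotate by 0.5 radians,
# scale by 1/2, then translate by (0.3, 0.1). Since 1/2 + |(0.3, 0.1)| < 1,
# the image stays inside the closed unit ball, and the map is a
# contraction, so iteration converges to its unique fixed point.

import math

def f(x, y):
    c, s = math.cos(0.5), math.sin(0.5)
    u, v = 0.5 * (c * x - s * y), 0.5 * (s * x + c * y)
    return u + 0.3, v + 0.1

x, y = 0.0, 0.0
for _ in range(200):
    x, y = f(x, y)

fx, fy = f(x, y)
# (x, y) is now a fixed point of f to machine precision, inside the disk
```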
10.10 Exercises
1. ♠ Recall the definition of $f_y$. Prove that if $f \in L^1(\mathbb{R}^p)$, then
$$\lim_{y \to 0} \int_{\mathbb{R}^p} |f - f_y|\, dm_p = 0.$$
This is known as continuity of translation. Hint: Use the theorem about being able to approximate an arbitrary function in $L^1(\mathbb{R}^p)$ with a function in $C_c(\mathbb{R}^p)$.
2. ♠ Show that if $a, b \geq 0$ and if $p, q > 0$ are such that
$$\frac{1}{p} + \frac{1}{q} = 1,$$
then
$$ab \leq \frac{a^p}{p} + \frac{b^q}{q}.$$
Hint: You might consider, for fixed $a \geq 0$, the function $h(b) \equiv \frac{a^p}{p} + \frac{b^q}{q} - ab$ and find its minimum.
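A brute-force numerical check of this inequality (Young's inequality) over a grid, with the illustrative choice $p = 3$, $q = 3/2$:

```python
# Check ab <= a^p/p + b^q/q on a grid of nonnegative values,
# for the conjugate exponents p = 3, q = 3/2 (1/p + 1/q = 1).

p, q = 3.0, 1.5
vals = [(a / 10.0, b / 10.0) for a in range(0, 51) for b in range(0, 51)]
gap = min(a**p / p + b**q / q - a * b for a, b in vals)
# gap is nonnegative (up to rounding), with equality when a^p = b^q,
# e.g. at a = b = 0 and a = b = 1
```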
3. ♠ In the context of the previous problem, prove Holder's inequality: if $f, g$ are measurable functions, then
$$\int |f||g|\, d\mu \leq \Big( \int |f|^p\, d\mu \Big)^{1/p} \Big( \int |g|^q\, d\mu \Big)^{1/q}.$$
Hint: If either of the factors on the right equals 0, explain why there is nothing to show. Now let $a = |f| / \big( \int |f|^p\, d\mu \big)^{1/p}$ and $b = |g| / \big( \int |g|^q\, d\mu \big)^{1/q}$. Apply the inequality of the previous problem.
4. ♠ If $f \in L^1(\mathbb{R}^p)$, show there exists $g \in L^1(\mathbb{R}^p)$ which is also Borel measurable such that $g(x) = f(x)$ for a.e. $x$.
5. ♠ Suppose $f, g \in L^1(\mathbb{R}^p)$. Define $f * g(x)$ by
$$\int f(x - y)\, g(y)\, dm_p(y).$$
Show this makes sense for a.e. $x$ and that in fact for a.e. $x$,
$$\int |f(x - y)||g(y)|\, dm_p(y) < \infty.$$
Next show
$$\int |f * g(x)|\, dm_p(x) \leq \int |f|\, dm_p \int |g|\, dm_p.$$
Hint: Use Problem 4. Show first there is no problem if $f, g$ are Borel measurable. The reason for this is that you can use Fubini's theorem to write
$$\int\!\!\int |f(x-y)||g(y)|\, dm_p(y)\, dm_p(x) = \int\!\!\int |f(x-y)||g(y)|\, dm_p(x)\, dm_p(y) = \int |f(z)|\, dm_p \int |g(y)|\, dm_p.$$
Explain. Then explain why if $f$ and $g$ are replaced by functions which are equal to $f$ and $g$ a.e. but are Borel measurable, the convolution is unchanged.
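A discrete analogue (an illustration only): for finitely supported sequences, convolution satisfies the same $\ell^1$ bound, and the proof is the same Fubini-style interchange of sums. The sequences below are made up for the example.

```python
# Discrete analogue of Problem 5: the l^1 norm of a convolution of two
# finitely supported sequences is at most the product of their l^1 norms.

f = [0.5, -1.0, 2.0, 0.25]
g = [1.5, 0.0, -0.5]

conv = [sum(f[i] * g[k - i] for i in range(len(f)) if 0 <= k - i < len(g))
        for k in range(len(f) + len(g) - 1)]

l1 = lambda seq: sum(abs(t) for t in seq)
# l1(conv) <= l1(f) * l1(g), with equality when both sequences are nonnegative
```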
6. ♠ In the situation of Problem 5, show $x \mapsto f * g(x)$ is continuous whenever $g$ is also bounded. Hint: Use Problem 1.
7. ♠ Let $f : [0, \infty) \to \mathbb{R}$ be in $L^1(\mathbb{R}, m)$. The Laplace transform is given by $\hat{f}(x) = \int_0^{\infty} e^{-xt} f(t)\, dt$. Let $f, g$ be in $L^1(\mathbb{R}, m)$, and let $h(x) = \int_0^x f(x - t) g(t)\, dt$. Show $h \in L^1$, and $\hat{h} = \hat{f}\hat{g}$.
8. ♠ Suppose $A$ is covered by a finite collection of balls, $\mathcal{F}$. Show that then there exists a disjoint collection of these balls, $\{B_i\}_{i=1}^p$, such that $A \subseteq \bigcup_{i=1}^p \widehat{B}_i$ where $\widehat{B}_i$ has the same center as $B_i$ but 3 times the radius. Hint: Since the collection of balls is finite, they can be arranged in order of decreasing radius.
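The hinted greedy selection can be sketched on the real line, where a ball is an interval given by a (center, radius) pair; the particular intervals below are made up for the example. Keep the largest ball disjoint from those already kept; then the kept balls, tripled in radius, cover the union of the originals.

```python
# Sketch of the finite covering argument of Problem 8 in one dimension.

def greedy_disjoint(balls):
    """balls: list of (center, radius). Greedily keep, in order of
    decreasing radius, each ball disjoint from all balls kept so far."""
    kept = []
    for c, r in sorted(balls, key=lambda b: -b[1]):
        if all(abs(c - c2) > r + r2 for c2, r2 in kept):
            kept.append((c, r))
    return kept

def covered(x, balls, factor=1.0):
    """True if x lies in some ball dilated by the given factor."""
    return any(abs(x - c) <= factor * r for c, r in balls)

balls = [(0.0, 1.0), (1.5, 0.8), (2.5, 0.6), (-0.7, 0.4)]
kept = greedy_disjoint(balls)
# sample points of the original balls all lie in some tripled kept ball
pts = [c + t * r for c, r in balls for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
ok = all(covered(x, kept, factor=3.0) for x in pts)
```

The reason the factor 3 works is the one in the hint: a skipped ball meets a kept ball of radius at least as large, so it lies inside that kept ball tripled.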
9. Let $f$ be a function defined on an interval $(a, b)$. The Dini derivates are defined as
$$D_+ f(x) \equiv \liminf_{h \to 0+} \frac{f(x+h) - f(x)}{h}, \quad D^+ f(x) \equiv \limsup_{h \to 0+} \frac{f(x+h) - f(x)}{h},$$
$$D_- f(x) \equiv \liminf_{h \to 0+} \frac{f(x) - f(x-h)}{h}, \quad D^- f(x) \equiv \limsup_{h \to 0+} \frac{f(x) - f(x-h)}{h}.$$
Suppose $f$ is continuous on $(a, b)$ and for all $x \in (a, b)$, $D^+ f(x) \geq 0$. Show that then $f$ is increasing on $(a, b)$. Hint: Consider the function $H(x) \equiv f(x)(d - c) - x(f(d) - f(c))$ where $a < c < d < b$. Thus $H(c) = H(d)$. Also it is easy to see that $H$ cannot be constant if $f(d) < f(c)$ due to the assumption that $D^+ f(x) \geq 0$. If there exists $x_1 \in (a, b)$ where $H(x_1) > H(c)$, then let $x_0 \in (c, d)$ be the point where the maximum of $H$ occurs. Consider $D^+ H(x_0)$. If, on the other hand, $H(x) < H(c)$ for all $x \in (c, d)$, then consider $D^+ H(c)$.
10. ↑ Suppose in the situation of the above problem we only know
$$D^+ f(x) \geq 0 \text{ a.e.}$$
Does the conclusion still follow? What if we only know $D^+ f(x) \geq 0$ for every $x$ outside a countable set? Hint: In the case of $D^+ f(x) \geq 0$ a.e., consider the bad function in the exercises for the chapter on the construction of measures which was based on the Cantor set. In the case where $D^+ f(x) \geq 0$ for all but countably many $x$, by replacing $f(x)$ with $\tilde{f}(x) \equiv f(x) + \varepsilon x$, consider the situation where $D^+ \tilde{f}(x) > 0$ for all but countably many $x$. If in this situation $\tilde{f}(c) > \tilde{f}(d)$ for some $c < d$, and $y_0 \in \big(\tilde{f}(d), \tilde{f}(c)\big)$, let
$$z \equiv \sup\big\{ x \in [c, d] : \tilde{f}(x) > y_0 \big\}.$$
Show that $\tilde{f}(z) = y_0$ and $D^+ \tilde{f}(z) \leq 0$. Conclude that if $\tilde{f}$ fails to be increasing, then $D^+ \tilde{f}(z) \leq 0$ for uncountably many points $z$. Now draw a conclusion about $f$.
11. ↑ ♠ Let $f : [a, b] \to \mathbb{R}$ be increasing. Show
$$m\Big( \overbrace{\big\{ x : D^+ f(x) > q > p > D_+ f(x) \big\}}^{N_{pq}} \Big) = 0 \quad (10.10.20)$$
and conclude that aside from a set of measure zero, $D^+ f(x) = D_+ f(x)$. Similar reasoning will show $D^- f(x) = D_- f(x)$ a.e. and $D^+ f(x) = D_- f(x)$ a.e., and so off some set of measure zero, we have
$$D_- f(x) = D^- f(x) = D_+ f(x) = D^+ f(x)$$
which implies the derivative exists and equals this common value. Hint: To show 10.10.20, let $U$ be an open set containing $N_{pq}$ such that $m(N_{pq}) + \varepsilon > m(U)$. For each $x \in N_{pq}$ there exist $y > x$ arbitrarily close to $x$ such that
$$f(y) - f(x) < p(y - x).$$
Thus the set of such intervals $\{[x, y]\}$ which are contained in $U$ constitutes a Vitali cover of $N_{pq}$. Let $\{[x_i, y_i]\}$ be disjoint and
$$m\Big( N_{pq} \setminus \bigcup_i [x_i, y_i] \Big) = 0.$$
Now let $V \equiv \bigcup_i (x_i, y_i)$. Then also we have
$$m\Big( N_{pq} \setminus \overbrace{\bigcup_i (x_i, y_i)}^{=V} \Big) = 0,$$
and so $m(N_{pq} \cap V) = m(N_{pq})$. For each $x \in N_{pq} \cap V$, there exist $y' > x$ arbitrarily close to $x$ such that
$$f(y') - f(x) > q(y' - x).$$
Thus the set of such intervals $\{[x', y']\}$ which are contained in $V$ is a Vitali cover of $N_{pq} \cap V$. Let $\{[x_i', y_i']\}$ be disjoint and
$$m\Big( N_{pq} \cap V \setminus \bigcup_i [x_i', y_i'] \Big) = 0.$$
Then verify the following:
$$\sum_i f(y_i') - f(x_i') > q \sum_i (y_i' - x_i') \geq q\, m(N_{pq} \cap V) = q\, m(N_{pq}) \geq p\, m(N_{pq}) > p(m(U) - \varepsilon)$$
$$\geq p \sum_i (y_i - x_i) - p\varepsilon \geq \sum_i (f(y_i) - f(x_i)) - p\varepsilon \geq \sum_i f(y_i') - f(x_i') - p\varepsilon,$$
and therefore, $(q - p)\, m(N_{pq}) \leq p\varepsilon$. Since $\varepsilon > 0$ is arbitrary, this proves that there is a right derivative a.e. A similar argument does the other cases.
12. ♠ Suppose $f$ is a function in $L^1(\mathbb{R})$ and $f$ is differentiable. Does it follow that $f' \in L^1(\mathbb{R})$? Hint: What if $\phi$ is $C^1$ and vanishes outside $(0, 1)$ (give an example) and $f(x) = \phi(2^p(x - p))$ for $x \in (p, p+1)$, $f(x) = 0$ if $x < 0$?
13. Why is it that if $f \in L^1(\mathbb{R})$, then there exists $g \in C^1(\mathbb{R})$ which vanishes off some finite interval and is as close as desired to $f$ in $L^1(\mathbb{R})$? Consider $g \in C_c(\mathbb{R})$ which is close to $f$ in $L^1(\mathbb{R})$ and then consider
$$g_h(x) \equiv \frac{1}{2h} \int_{x-h}^{x+h} g(t)\, dt.$$
14. Prove Lemma 10.4.1, which says a $C^1$ function maps a set of measure zero to a set of measure zero, using Theorem 10.7.3.
15. ♠ For this problem define $\int_a^{\infty} f(t)\, dt \equiv \lim_{r \to \infty} \int_a^r f(t)\, dt$. Note this coincides with the Lebesgue integral when $f \in L^1(a, \infty)$. Show

(a) $\int_0^{\infty} \frac{\sin(u)}{u}\, du = \frac{\pi}{2}$

(b) $\lim_{r \to \infty} \int_{\delta}^{\infty} \frac{\sin(ru)}{u}\, du = 0$ whenever $\delta > 0$.

(c) If $f \in L^1(\mathbb{R})$, then $\lim_{r \to \infty} \int_{\mathbb{R}} \sin(ru) f(u)\, du = 0$.

Hint: For the first two, use $\frac{1}{u} = \int_0^{\infty} e^{-ut}\, dt$ and apply Fubini's theorem to $\int_0^R \sin u \int_0^{\infty} e^{-ut}\, dt\, du$. For the last part, first establish it for $f$ a $C^1$ function which vanishes off a finite interval and then use the density of this set in $L^1(\mathbb{R})$ to obtain the result. This is called the Riemann Lebesgue lemma.
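Part (a) can be checked numerically (an illustration only): the partial integrals $\operatorname{Si}(R) = \int_0^R \sin(u)/u\, du$ approach $\pi/2$ with error of order $1/R$.

```python
# Midpoint-rule evaluation of the partial integrals of sin(u)/u,
# illustrating part (a) of Problem 15.

import math

def si(R, n):
    """Midpoint rule for the integral of sin(u)/u over (0, R)."""
    du = R / n
    return sum(math.sin((k + 0.5) * du) / ((k + 0.5) * du) for k in range(n)) * du

approx = si(200 * math.pi, 120000)
# approx differs from pi/2 by O(1/R)
```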
16. ↑ ♠ Suppose that $g \in L^1(\mathbb{R})$ and that at some $x > 0$, $g$ is locally Holder continuous from the right and from the left. This means
$$\lim_{r \to 0+} g(x + r) \equiv g(x+)$$
exists,
$$\lim_{r \to 0+} g(x - r) \equiv g(x-)$$
exists, and there exist constants $K, \delta > 0$ and $r \in (0, 1]$ such that for $|x - y| < \delta$,
$$|g(x+) - g(y)| < K|x - y|^r$$
for $y > x$ and
$$|g(x-) - g(y)| < K|x - y|^r$$
for $y < x$. Show that under these conditions,
$$\lim_{r \to \infty} \frac{2}{\pi} \int_0^{\infty} \frac{\sin(ur)}{u} \left( \frac{g(x - u) + g(x + u)}{2} \right) du = \frac{g(x+) + g(x-)}{2}.$$
17. ↑ ♠ Let $f \in L^1(\mathbb{R})$. Then the Fourier transform of $f$ is given by
$$Ff(t) = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(s) e^{-ist}\, ds.$$
Let $g \in L^1(\mathbb{R})$ and suppose $g$ is locally Hölder continuous from the right and from the left at $x$. Show that then
$$\lim_{R\to\infty} \frac{1}{2\pi} \int_{-R}^R e^{ixt} \int_{-\infty}^\infty e^{-ity} g(y)\, dy\, dt = \frac{g(x+) + g(x-)}{2}.$$
This is very interesting. Hint: Show the left side of the above equation reduces to
$$\frac{2}{\pi} \int_0^\infty \frac{\sin(ur)}{u} \left( \frac{g(x-u) + g(x+u)}{2} \right) du$$
and then use Problem 16 to obtain the result.
18. ↑ ♠ A measurable function $g$ defined on $(0, \infty)$ has exponential growth if $|g(t)| \le Ce^{\eta t}$ for some $\eta$. For $\operatorname{Re}(s) > \eta$, define the Laplace transform by
$$Lg(s) \equiv \int_0^\infty e^{-su} g(u)\, du.$$
Assume that $g$ has exponential growth as above and is Hölder continuous from the right and from the left at $t$. Pick $\gamma > \eta$. Show that
$$\lim_{R\to\infty} \frac{1}{2\pi} \int_{-R}^R e^{\gamma t} e^{iyt} Lg(\gamma + iy)\, dy = \frac{g(t+) + g(t-)}{2}.$$
This formula is sometimes written in the form
$$\frac{1}{2\pi i} \int_{\gamma - i\infty}^{\gamma + i\infty} e^{st} Lg(s)\, ds$$
and is called the complex inversion integral for Laplace transforms. It can be used to find inverse Laplace transforms. Hint:
$$\frac{1}{2\pi} \int_{-R}^R e^{\gamma t} e^{iyt} Lg(\gamma + iy)\, dy = \frac{1}{2\pi} \int_{-R}^R e^{\gamma t} e^{iyt} \int_0^\infty e^{-(\gamma + iy)u} g(u)\, du\, dy.$$
Now use Fubini's theorem and do the integral from $-R$ to $R$ to get this equal to
$$\frac{e^{\gamma t}}{\pi} \int_{-\infty}^\infty e^{-\gamma u} g(u) \frac{\sin(R(t-u))}{t-u}\, du$$
where $g$ is the zero extension of $g$ off $[0, \infty)$. Then this equals
$$\frac{e^{\gamma t}}{\pi} \int_{-\infty}^\infty e^{-\gamma(t-u)} g(t-u) \frac{\sin(Ru)}{u}\, du$$
which equals
$$\frac{2e^{\gamma t}}{\pi} \int_0^\infty \frac{g(t-u) e^{-\gamma(t-u)} + g(t+u) e^{-\gamma(t+u)}}{2} \frac{\sin(Ru)}{u}\, du$$
and then apply the result of Problem 16.
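The inversion formula can be checked numerically on a concrete transform pair (again only an illustration, not part of the exercise). Below, $g(t) = e^{-t}$, so $Lg(s) = 1/(s+1)$ with $\eta = -1$, and $\gamma = 0 > \eta$; the truncated integral over $[-R, R]$ is evaluated with Simpson's rule. The helper names are my own.

```python
import cmath
import math

def Lg(s):
    # Laplace transform of g(t) = exp(-t): Lg(s) = 1/(s+1) for Re(s) > -1
    return 1.0 / (s + 1.0)

def bromwich(t, gamma=0.0, R=2000.0, n=200_000):
    """(1/(2 pi)) * integral_{-R}^{R} e^{gamma t} e^{iyt} Lg(gamma + iy) dy,
    computed with composite Simpson's rule (n must be even)."""
    h = 2 * R / n
    total = 0j
    for i in range(n + 1):
        y = -R + i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += w * cmath.exp(complex(gamma * t, y * t)) * Lg(complex(gamma, y))
    return (total * h / 3) / (2 * math.pi)

for t in (0.5, 1.0, 2.0):
    val = bromwich(t)
    print(t, val.real, math.exp(-t))
    assert abs(val.real - math.exp(-t)) < 0.01  # recovers g(t) = e^{-t}
    assert abs(val.imag) < 0.01                 # imaginary part should vanish
```

The truncation error of the $y$ integral is $O(1/(tR))$ by integration by parts, which is why a fairly large $R$ is used.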
19. Let $K$ be a nonempty closed and convex subset of $\mathbb{R}^p$. Recall $K$ is convex means that if $x, y \in K$, then for all $t \in [0, 1]$, $tx + (1-t)y \in K$. Show that if $x \in \mathbb{R}^p$ there exists a unique $z \in K$ such that
$$|x - z| = \min \{|x - y| : y \in K\}.$$
This $z$ will be denoted as $Px$. Hint: First note you do not know $K$ is compact. Establish the parallelogram identity if you have not already done so,
$$|u - v|^2 + |u + v|^2 = 2|u|^2 + 2|v|^2.$$
Then let $\{z_k\}$ be a minimizing sequence,
$$\lim_{k\to\infty} |z_k - x| = \inf \{|x - y| : y \in K\} \equiv \lambda.$$
Now using convexity, explain why
$$\left| \frac{z_k - z_m}{2} \right|^2 + \left| x - \frac{z_k + z_m}{2} \right|^2 = 2 \left| \frac{x - z_k}{2} \right|^2 + 2 \left| \frac{x - z_m}{2} \right|^2$$
and then use this to argue $\{z_k\}$ is a Cauchy sequence. Then if $z_i$ works for $i = 1, 2$, consider $(z_1 + z_2)/2$ to get a contradiction.
20. In Problem 19 show that $Px$ satisfies the following variational inequality:
$$(x - Px) \cdot (y - Px) \le 0$$
for all $y \in K$. Then show that $|Px_1 - Px_2| \le |x_1 - x_2|$. Hint: For the first part note that if $y \in K$, the function $t \to |x - (Px + t(y - Px))|^2$ achieves its minimum on $[0, 1]$ at $t = 0$. For the second part,
$$(x_1 - Px_1) \cdot (Px_2 - Px_1) \le 0, \quad (x_2 - Px_2) \cdot (Px_1 - Px_2) \le 0.$$
Explain why
$$(x_2 - Px_2 - (x_1 - Px_1)) \cdot (Px_2 - Px_1) \ge 0$$
and then use some manipulations and the Cauchy Schwarz inequality to get the desired inequality.
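The conclusions of Problems 19 and 20 can be tested numerically when the projection is available in closed form. For the box $K = [0,1]^p$ (my choice for illustration, not from the text), $Px$ clamps each coordinate, and one can spot check the variational inequality and the nonexpansiveness estimate on random points:

```python
import random

def project_box(x, lo=0.0, hi=1.0):
    """Projection onto the box K = [lo, hi]^p (a closed convex set):
    clamping each coordinate gives the unique nearest point of K."""
    return [min(hi, max(lo, xi)) for xi in x]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

random.seed(0)
p = 5
for _ in range(1000):
    x1 = [random.uniform(-3, 3) for _ in range(p)]
    x2 = [random.uniform(-3, 3) for _ in range(p)]
    P1, P2 = project_box(x1), project_box(x2)
    # variational inequality: (x - Px) . (y - Px) <= 0 for all y in K
    y = [random.uniform(0, 1) for _ in range(p)]
    assert dot([a - b for a, b in zip(x1, P1)],
               [a - b for a, b in zip(y, P1)]) <= 1e-12
    # nonexpansiveness: |Px1 - Px2| <= |x1 - x2|
    assert dist(P1, P2) <= dist(x1, x2) + 1e-12
print("checked")
```

For the box both properties can even be verified coordinate by coordinate, which is a good warm-up for the general argument.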
21. Establish the Brouwer fixed point theorem for any convex compact set in $\mathbb{R}^p$. Hint: If $K$ is a compact and convex set, let $R$ be large enough that the closed ball $D(0, R) \supseteq K$. Let $P$ be the projection onto $K$ as in Problem 20 above. If $f$ is a continuous map from $K$ to $K$, consider $f \circ P$. You want to show $f$ has a fixed point in $K$.
22. In the situation of the implicit function theorem, suppose $f(x_0, y_0) = 0$ and assume $f$ is $C^1$. Show that for $(x, y) \in B(x_0, \delta) \times B(y_0, r)$ where $\delta, r$ are small enough, the mapping
$$x \to T_y(x) \equiv x - D_1 f(x_0, y_0)^{-1} f(x, y)$$
is continuous and maps $B(x_0, \delta)$ to $B(x_0, \delta/2) \subseteq B(x_0, \delta)$. Apply the Brouwer fixed point theorem to obtain a shorter proof of the implicit function theorem.
23. Here is a really interesting little theorem which depends on the Brouwer fixed point theorem. It plays a prominent role in the treatment of the change of variables formula in Rudin's book, [40] and is useful in other contexts as well. The idea is that if a continuous function mapping a ball in $\mathbb{R}^k$ to $\mathbb{R}^k$ doesn't move any point very much, then the image of the ball must contain a slightly smaller ball.

Lemma: Let $B = B(0, r)$, a ball in $\mathbb{R}^k$, and let $F : B \to \mathbb{R}^k$ be continuous and suppose for some $\varepsilon < 1$,
$$|F(v) - v| < \varepsilon r \tag{10.10.21}$$
for all $v \in B$. Then
$$F(B) \supseteq B(0, r(1 - \varepsilon)).$$
Hint: Suppose $a \in B(0, r(1-\varepsilon)) \setminus F(B)$ so it didn't work. First explain why $a \ne F(v)$ for all $v \in B$. Now letting $G : B \to B$ be defined by
$$G(v) \equiv \frac{r(a - F(v))}{|a - F(v)|},$$
it follows $G$ is continuous. Then by the Brouwer fixed point theorem, $G(v) = v$ for some $v \in B$. Explain why $|v| = r$. Then take the inner product with $v$ and explain the following steps.
$$(G(v), v) = |v|^2 = r^2 = \frac{r}{|a - F(v)|} (a - F(v), v)$$
$$= \frac{r}{|a - F(v)|} (a - v + v - F(v), v) = \frac{r}{|a - F(v)|} \left[ (a - v, v) + (v - F(v), v) \right]$$
$$= \frac{r}{|a - F(v)|} \left( (a, v) - |v|^2 + (v - F(v), v) \right) \le \frac{r}{|a - F(v)|} \left( r^2 (1 - \varepsilon) - r^2 + r^2 \varepsilon \right) = 0.$$
24. Using Problem 23 establish the following interesting result. Suppose $f : U \to \mathbb{R}^p$ is differentiable. Let
$$S = \{x \in U : \det Df(x) = 0\}.$$
Show $f(U \setminus S)$ is an open set.
25. Let $K$ be a closed, bounded and convex set in $\mathbb{R}^p$ and let $f : K \to \mathbb{R}^p$ be continuous and let $y \in \mathbb{R}^p$. Show using the Brouwer fixed point theorem there exists a point $x \in K$ such that $P(y - f(x) + x) = x$. Next show that $(y - f(x), z - x) \le 0$ for all $z \in K$. The existence of this $x$ is known as Browder's lemma and it has great significance in the study of certain types of nonlinear operators. Now suppose $f : \mathbb{R}^p \to \mathbb{R}^p$ is continuous and satisfies
$$\lim_{|x|\to\infty} \frac{(f(x), x)}{|x|} = \infty.$$
Show using Browder's lemma that $f$ is onto.
Approximation Theorems

11.1 Bernstein Polynomials

This chapter covers the Stone Weierstrass approximation theorem. To begin with, a very special case is presented which has to do with functions defined on intervals. First here is a little lemma.
Lemma 11.1.1 Let $x \in [0, 1]$. Then
$$\sum_{k=0}^m \binom{m}{k} x^k (1-x)^{m-k} (k - mx)^2 \le \frac{1}{4} m.$$
Proof: Let
$$\varphi(t) \equiv \left(1 - x + e^t x\right)^m = \sum_{k=0}^m \binom{m}{k} e^{kt} x^k (1-x)^{m-k}.$$
Then
$$\varphi'(t) = mxe^t \left(xe^t - x + 1\right)^{m-1},$$
$$\varphi''(t) = mxe^t \left(xe^t - x + 1\right)^{m-2} \left(mxe^t - x + 1\right).$$
Then
$$\varphi'(0) = mx = \sum_{k=0}^m \binom{m}{k} k x^k (1-x)^{m-k},$$
$$\varphi''(0) = mx(mx - x + 1) = \sum_{k=0}^m \binom{m}{k} k^2 x^k (1-x)^{m-k}.$$
Therefore,
$$\sum_{k=0}^m \binom{m}{k} x^k (1-x)^{m-k} (k - mx)^2 = \sum_{k=0}^m \binom{m}{k} x^k (1-x)^{m-k} k^2 - 2mx \sum_{k=0}^m \binom{m}{k} k x^k (1-x)^{m-k} + m^2 x^2$$
$$= mx(mx - x + 1) - 2mx(mx) + m^2 x^2 = mx(1 - x)$$
and this has a maximum when $x = 1/2$ which yields $(1/4)m$.
With this preparation, here is the ﬁrst version of the Weierstrass approximation
theorem.
Theorem 11.1.2 Let $f \in C([0, 1])$ and let
$$p_m(x) \equiv \sum_{k=0}^m \binom{m}{k} x^k (1-x)^{m-k} f\left(\frac{k}{m}\right).$$
Then these polynomials converge uniformly to $f$ on $[0, 1]$.
Proof: Let $\|f\|_\infty$ denote the largest value of $|f|$ and let $\varepsilon > 0$ be given. By uniform continuity of $f$, there exists a $\delta > 0$ such that if $|x - x'| < \delta$, then $|f(x) - f(x')| < \varepsilon/2$. By the binomial theorem,
$$|p_m(x) - f(x)| \le \sum_{k=0}^m \binom{m}{k} x^k (1-x)^{m-k} \left| f\left(\frac{k}{m}\right) - f(x) \right|$$
$$\le \sum_{\left|\frac{k}{m}-x\right| < \delta} \binom{m}{k} x^k (1-x)^{m-k} \left| f\left(\frac{k}{m}\right) - f(x) \right| + 2\|f\|_\infty \sum_{\left|\frac{k}{m}-x\right| \ge \delta} \binom{m}{k} x^k (1-x)^{m-k}.$$
Therefore, this is dominated by
$$\sum_{k=0}^m \binom{m}{k} x^k (1-x)^{m-k} \frac{\varepsilon}{2} + 2\|f\|_\infty \sum_{(k-mx)^2 \ge m^2\delta^2} \binom{m}{k} x^k (1-x)^{m-k}$$
$$\le \frac{\varepsilon}{2} + 2\|f\|_\infty \frac{1}{m^2\delta^2} \sum_{k=0}^m \binom{m}{k} (k - mx)^2 x^k (1-x)^{m-k} \le \frac{\varepsilon}{2} + 2\|f\|_\infty \frac{1}{4} m \frac{1}{\delta^2 m^2} < \varepsilon$$
provided $m$ is large enough.
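As a numerical illustration of Theorem 11.1.2 (my addition, not part of the text), the sketch below evaluates the Bernstein polynomials of $f(x) = |x - 1/2|$, a continuous function that is not differentiable at $1/2$, and watches the uniform error shrink as $m$ grows:

```python
import math

def bernstein(f, m):
    """Return the Bernstein polynomial p_m of f on [0,1] as a callable,
    using the formula from Theorem 11.1.2."""
    def p(x):
        return sum(math.comb(m, k) * x**k * (1 - x)**(m - k) * f(k / m)
                   for k in range(m + 1))
    return p

f = lambda x: abs(x - 0.5)            # continuous, not smooth at 1/2
grid = [i / 400 for i in range(401)]  # sample points for the sup norm
errs = {}
for m in (10, 40, 160):
    p = bernstein(f, m)
    errs[m] = max(abs(p(x) - f(x)) for x in grid)
    print(m, errs[m])
assert errs[160] < errs[40] < errs[10]  # uniform error decreases with m
assert errs[160] < 0.05
```

The error at the kink decays like $1/\sqrt{m}$, which matches the quadratic moment bound of Lemma 11.1.1 driving the proof.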
Definition 11.1.3 For a continuous function $f$ defined on some set $S$, let
$$\|f\|_\infty \equiv \sup \{|f(x)| : x \in S\}.$$
Thus the above theorem can be stated as follows. There exists a sequence of polynomials $\{p_m\}$ such that
$$\lim_{m\to\infty} \|p_m - f\|_\infty = 0.$$
Thus $\|\cdot\|_\infty$ is a norm.
The classical form of the Weierstrass approximation theorem follows.
Corollary 11.1.4 If $f \in C([a, b])$, then there exists a sequence of polynomials which converge uniformly to $f$ on $[a, b]$.

Proof: Let $l : [0, 1] \to [a, b]$ be one to one, linear and onto. Then $f \circ l$ is continuous on $[0, 1]$ and so if $\varepsilon > 0$ is given, there exists a polynomial $p$ such that for all $x \in [0, 1]$,
$$|p(x) - f \circ l(x)| < \varepsilon.$$
Therefore, letting $y = l(x)$, it follows that for all $y \in [a, b]$,
$$\left| p\left(l^{-1}(y)\right) - f(y) \right| < \varepsilon,$$
and $p \circ l^{-1}$ is a polynomial in $y$ because $l$ is linear.
As another corollary, here is the version which will be used in Stone's generalization.

Corollary 11.1.5 Let $f$ be a continuous function defined on $[-M, M]$ with $f(0) = 0$. Then there exists a sequence of polynomials $\{p_m\}$ such that $p_m(0) = 0$ and
$$\lim_{m\to\infty} \|p_m - f\|_\infty = 0.$$
Proof: From Corollary 11.1.4 there exists a sequence of polynomials $\{\hat{p}_m\}$ such that $\|\hat{p}_m - f\|_\infty \to 0$. Simply consider $p_m = \hat{p}_m - \hat{p}_m(0)$.
11.2 Stone's Generalization

Stone's generalization of this theorem must be one of the most elegant things to be found. This is next. First it will be done for a compact subset of $\mathbb{R}^n$ and then for $\mathbb{R}^n$ itself. The theorem can be generalized much further to locally compact Hausdorff spaces; however, it is my intent to mainly feature $\mathbb{R}^n$ and so a simplified version which will be sufficient for the needs of this book will be presented.
Definition 11.2.1 $\mathcal{A}$ is called a real algebra of functions if whenever $f, g \in \mathcal{A}$, so is $fg$ and $\mathcal{A}$ is also a real vector space. The closure of $\mathcal{A}$, denoted by $\overline{\mathcal{A}}$, consists of all functions $f$ such that for some sequence $\{g_k\} \subseteq \mathcal{A}$, $\lim_{k\to\infty} \|g_k - f\|_\infty = 0$. An algebra $\mathcal{A}$ of functions defined on $K$ is said to separate the points if for every $x \ne y$, there exists $\varphi \in \mathcal{A}$ such that $\varphi(x) \ne \varphi(y)$. It is said to annihilate no point if for every $x \in K$, there exists $\varphi \in \mathcal{A}$ such that $\varphi(x) \ne 0$.

Note that if $\mathcal{A}$ is an algebra which annihilates no point and separates the points, then $\overline{\mathcal{A}}$ is also an algebra which has these same properties.
Lemma 11.2.2 Suppose $\mathcal{A}$ is an algebra of continuous functions which are defined on $K$, a compact subset of some $\mathbb{R}^n$, and that the algebra annihilates no point and separates the points. Then for $f \in \overline{\mathcal{A}}$ it follows that $|f|$ is in $\overline{\mathcal{A}}$. If $f, g$ are in $\overline{\mathcal{A}}$, so is $\max(f, g)$ and $\min(f, g)$.

Proof: Say $\|f\|_\infty \le M$. Let $p_m(t) \to |t|$ uniformly on $[-M, M]$ with $p_m(0) = 0$. Thus $p_m(f) \in \overline{\mathcal{A}}$ and $\lim_{m\to\infty} \left\| |f| - p_m(f) \right\|_\infty = 0$. Hence $|f| \in \overline{\mathcal{A}}$. Now note that
$$\max(f, g) = \frac{|g - f| + (f + g)}{2}, \quad \min(f, g) = \frac{(f + g) - |g - f|}{2}.$$
Since $\overline{\mathcal{A}}$ is an algebra, these are in $\overline{\mathcal{A}}$.
Now here is a fundamental observation. If $p, q$ are two points of $K$, then there exists a function of $\mathcal{A}$ which will achieve a desired value at these two points.

Lemma 11.2.3 Let $\mathcal{A}$ be an algebra of functions defined on $K$ which separates points and annihilates no point. Then if $p, q$ are two different points of $K$ and if $c, d \in \mathbb{R}$, then there exists a function $\psi_{pq} \in \mathcal{A}$ such that $\psi_{pq}(p) = c$, $\psi_{pq}(q) = d$.

Proof: By assumption, there exist $\varphi, \psi, \eta \in \mathcal{A}$ such that $\varphi(p) \ne \varphi(q)$, $\psi(p) \ne 0$, $\eta(q) \ne 0$. Let
$$\psi_{pq}(x) \equiv \frac{\varphi(x) - \varphi(q)}{\varphi(p) - \varphi(q)} \frac{c}{\psi(p)} \psi(x) + \frac{\varphi(x) - \varphi(p)}{\varphi(q) - \varphi(p)} \frac{d}{\eta(q)} \eta(x).$$
This clearly works in the sense that it gives the right values at $p, q$. Why is it in $\mathcal{A}$? This is obvious when you expand it and see that you are adding products of functions in $\mathcal{A}$ and scalar multiples of functions in $\mathcal{A}$.
Now here is the first step to Stone's generalization.

Theorem 11.2.4 Let $K$ be a compact set and let $\mathcal{A}$ be a real algebra of continuous functions defined on $K$ such that $\mathcal{A}$ separates the points and annihilates no point. Then for every $f \in C(K)$ and $\varepsilon > 0$, there exists $\psi \in \mathcal{A}$ such that $\|f - \psi\|_\infty \le \varepsilon$.

Proof: Let $f \in C(K)$. I will show there exists $\psi \in \overline{\mathcal{A}}$ such that $\|f - \psi\|_\infty \le \varepsilon$. This will prove the desired result thanks to the definition of $\overline{\mathcal{A}}$.

Pick $p \in K$ and let $\psi_{pq} \in \overline{\mathcal{A}}$ be such that $\psi_{pq}(p) = f(p)$ and $\psi_{pq}(q) = f(q)$. Then for each $q \in K$, there exists an open set $U_q$ containing $q$ such that $\psi_{pq}(z) + \varepsilon > f(z)$ for all $z \in U_q$. Since $K$ is compact, there are finitely many, $U_{q_1}, \ldots, U_{q_m}$, which cover $K$. Therefore, letting $\psi_p(x) \equiv \min \left\{ \psi_{pq_i} \right\}_{i=1}^m$, it follows that $\psi_p(p) = f(p)$ and also $\psi_p(x) + \varepsilon > f(x)$ for all $x$. Now of course $p$ was arbitrary and for each $p$, there exists an open set $V_p$ containing $p$ such that $f(z) > \psi_p(z) - \varepsilon$ for all $z \in V_p$. Since $K$ is compact, there exist finitely many of these $V_p$ which cover $K$, $\{V_{p_i}\}_{i=1}^n$. Then it follows that for all $x$,
$$f(x) > \max \left\{ \psi_{p_i} \right\}_{i=1}^n - \varepsilon.$$
Let $\psi = \max \left\{ \psi_{p_i} \right\}_{i=1}^n$. Then $\psi(x) + \varepsilon > f(x) > \psi(x) - \varepsilon$ and so $\|f - \psi\|_\infty \le \varepsilon$.
The next step is to generalize this to algebras in $C_0(\mathbb{R}^n)$ or more generally to $C_0(X)$ where $X$ is a locally compact Hausdorff space. I will ignore the more abstract case at this time.
Definition 11.2.5 A function $f$ defined on $\mathbb{R}^n$ is said to be in $C_0(\mathbb{R}^n)$ if $f$ is continuous and if $\lim_{x\to\infty} f(x) = 0$. The same definitions involving an algebra of functions still hold. The only difference is that here the functions are not defined on a compact set. Because of the limit condition, $\|f\|_\infty \equiv \sup \{|f(x)| : x \in \mathbb{R}^n\} < \infty$.
The generalization to this case depends on the ideas illustrated in the following picture.

[Figure: the sphere $K$ given by $x_1^2 + \cdots + x_n^2 + (x_{n+1} - 1)^2 = 1$ in $\mathbb{R}^{n+1}$, with top point $P$; the map $\theta$ sends $y \in K \setminus \{P\}$ to the point $x = \theta(y) \in \mathbb{R}^n$.]

The sphere is labeled $K$ and there is a mapping, denoted by $\theta$, which takes the surface of $K \setminus \{P\}$ to $\mathbb{R}^n$ as implied by the picture. Obviously this mapping and its inverse are continuous. Now make the following definition for $f \in C_0(\mathbb{R}^n)$.
$$\hat{f}(y) \equiv \begin{cases} f \circ \theta(y) & \text{if } y \ne P \\ 0 & \text{if } y = P \end{cases}$$
Then $\hat{f}$ is a continuous function defined on $K$. This is because if $y_k \to P$, this is the same thing as $|\theta y_k| \to \infty$ and so, by assumption, $\hat{f}(y_k) \to 0 = \lim_{x\to\infty} f(x) \equiv \hat{f}(P)$. Now suppose $\mathcal{A}$ is an algebra of functions contained in $C_0(\mathbb{R}^n)$ which separates the points of $\mathbb{R}^n$ and annihilates no point of $\mathbb{R}^n$.
Let $\hat{\mathcal{A}}$ denote the algebra of functions consisting of all functions of the form $\hat{g} + c$ where $c$ is a constant and $g \in \mathcal{A}$. Then $\hat{\mathcal{A}}$ is an algebra of functions in $C(K)$ which annihilates no point of $K$ and separates the points of $K$. Since $K$ is a compact subset of $\mathbb{R}^{n+1}$, the conclusion of Theorem 11.2.4 applies. Therefore, if $f \in C_0(\mathbb{R}^n)$, there exists $g_k \in \hat{\mathcal{A}}$ such that
$$\left\| \hat{f} - g_k \right\|_\infty \to 0.$$
Since $\hat{f}(P) = 0$, it can be assumed that $g_k$ is of the form $\hat{h}_k$ for $h_k \in \mathcal{A}$, the original algebra of functions on $\mathbb{R}^n$. Therefore, simply ignoring $P$,
$$\lim_{k\to\infty} \|f - h_k\|_\infty = 0.$$
This proves the following theorem, which is the real form of the Stone Weierstrass approximation theorem.

Theorem 11.2.6 Let $\mathcal{A}$ be a real algebra of functions in $C_0(\mathbb{R}^n)$ which separates points and annihilates no point of $\mathbb{R}^n$. Then if $f$ is any function of $C_0(\mathbb{R}^n)$, there exists a sequence of functions of $\mathcal{A}$, $\{h_k\}_{k=1}^\infty$, such that
$$\lim_{k\to\infty} \|h_k - f\|_\infty = 0.$$
11.3 The Case Of Complex Valued Functions

What about the general case where $C_0(\mathbb{R}^n)$ consists of complex valued functions and the field of scalars is $\mathbb{C}$ rather than $\mathbb{R}$? The following is the version of the Stone Weierstrass theorem which applies to this case. You have to assume that for $f \in \mathcal{A}$ it follows $\overline{f} \in \mathcal{A}$. Such an algebra is called self adjoint.

Theorem 11.3.1 Suppose $\mathcal{A}$ is an algebra of functions in $C_0(\mathbb{R}^n)$ which separates the points, annihilates no point, and has the property that if $f \in \mathcal{A}$, then $\overline{f} \in \mathcal{A}$. Then $\mathcal{A}$ is dense in $C_0(\mathbb{R}^n)$.
Proof: Let $\operatorname{Re}\mathcal{A} \equiv \{\operatorname{Re} f : f \in \mathcal{A}\}$, $\operatorname{Im}\mathcal{A} \equiv \{\operatorname{Im} f : f \in \mathcal{A}\}$. First I will show that $\mathcal{A} = \operatorname{Re}\mathcal{A} + i\operatorname{Im}\mathcal{A} = \operatorname{Im}\mathcal{A} + i\operatorname{Re}\mathcal{A}$. Let $f \in \mathcal{A}$. Then
$$f = \frac{1}{2}\left(f + \overline{f}\right) + \frac{1}{2}\left(f - \overline{f}\right) = \operatorname{Re} f + i\operatorname{Im} f \in \operatorname{Re}\mathcal{A} + i\operatorname{Im}\mathcal{A}$$
and so $\mathcal{A} \subseteq \operatorname{Re}\mathcal{A} + i\operatorname{Im}\mathcal{A}$. Also
$$f = \frac{1}{2i}\left(if - \overline{if}\right) - \frac{i}{2}\left(if + \overline{if}\right) = \operatorname{Im}(if) + i\operatorname{Re}(-if) \in \operatorname{Im}\mathcal{A} + i\operatorname{Re}\mathcal{A}.$$
This proves one half of the desired equality. Now suppose $h \in \operatorname{Re}\mathcal{A} + i\operatorname{Im}\mathcal{A}$. Then $h = \operatorname{Re} g_1 + i\operatorname{Im} g_2$ where $g_i \in \mathcal{A}$. Then since $\operatorname{Re} g_1 = \frac{1}{2}\left(g_1 + \overline{g_1}\right)$, it follows $\operatorname{Re} g_1 \in \mathcal{A}$. Similarly $\operatorname{Im} g_2 \in \mathcal{A}$. Therefore, $h \in \mathcal{A}$. The case where $h \in \operatorname{Im}\mathcal{A} + i\operatorname{Re}\mathcal{A}$ is similar. This establishes the desired equality.
Now $\operatorname{Re}\mathcal{A}$ and $\operatorname{Im}\mathcal{A}$ are both real algebras. I will show this now. First consider $\operatorname{Im}\mathcal{A}$. It is obvious this is a real vector space. It only remains to verify that the product of two functions in $\operatorname{Im}\mathcal{A}$ is in $\operatorname{Im}\mathcal{A}$. Note that from the first part, $\operatorname{Re}\mathcal{A}$ and $\operatorname{Im}\mathcal{A}$ are both subsets of $\mathcal{A}$ because, for example, if $u \in \operatorname{Im}\mathcal{A}$ then $u + i0 \in \operatorname{Im}\mathcal{A} + i\operatorname{Re}\mathcal{A} = \mathcal{A}$. Therefore, if $v, w \in \operatorname{Im}\mathcal{A}$, both $iv$ and $w$ are in $\mathcal{A}$, so $ivw \in \mathcal{A}$ and $\operatorname{Im}(ivw) = vw \in \operatorname{Im}\mathcal{A}$. Similarly, $\operatorname{Re}\mathcal{A}$ is an algebra.

Both $\operatorname{Re}\mathcal{A}$ and $\operatorname{Im}\mathcal{A}$ must separate the points. Here is why: If $x_1 \ne x_2$, then there exists $f \in \mathcal{A}$ such that $f(x_1) \ne f(x_2)$. If $\operatorname{Im} f(x_1) \ne \operatorname{Im} f(x_2)$, this shows there is a function in $\operatorname{Im}\mathcal{A}$, namely $\operatorname{Im} f$, which separates these two points. If $\operatorname{Im} f$ fails to separate the two points, then $\operatorname{Re} f$ must separate the points and so you could consider $\operatorname{Im}(if)$ to get a function in $\operatorname{Im}\mathcal{A}$ which separates these points. This shows $\operatorname{Im}\mathcal{A}$ separates the points. Similarly $\operatorname{Re}\mathcal{A}$ separates the points.
Neither $\operatorname{Re}\mathcal{A}$ nor $\operatorname{Im}\mathcal{A}$ annihilates any point. This is easy to see because if $x$ is a point, there exists $f \in \mathcal{A}$ such that $f(x) \ne 0$. Thus either $\operatorname{Re} f(x) \ne 0$ or $\operatorname{Im} f(x) \ne 0$. If $\operatorname{Im} f(x) \ne 0$, this shows this point is not annihilated by $\operatorname{Im}\mathcal{A}$. If $\operatorname{Im} f(x) = 0$, consider $\operatorname{Im}(if)(x) = \operatorname{Re} f(x) \ne 0$. Similarly, $\operatorname{Re}\mathcal{A}$ does not annihilate any point.

It follows from Theorem 11.2.6 that $\operatorname{Re}\mathcal{A}$ and $\operatorname{Im}\mathcal{A}$ are dense in the real valued functions of $C_0(\mathbb{R}^n)$. Let $f \in C_0(\mathbb{R}^n)$. Then there exist $\{h_n\} \subseteq \operatorname{Re}\mathcal{A}$ and $\{g_n\} \subseteq \operatorname{Im}\mathcal{A}$ such that $h_n \to \operatorname{Re} f$ uniformly and $g_n \to \operatorname{Im} f$ uniformly. Therefore, $h_n + ig_n \in \mathcal{A}$ and it converges to $f$ uniformly.
A repeat of the above argument using Theorem 11.2.4 directly leads to the following corollary.

Corollary 11.3.2 Suppose $\mathcal{A}$ is an algebra of functions in $C(K)$, where $K$ is a compact set, which separates the points of $K$, annihilates no point, and has the property that if $f \in \mathcal{A}$, then $\overline{f} \in \mathcal{A}$. Then $\mathcal{A}$ is dense in $C(K)$, the complex valued continuous functions defined on $K$.
11.4 Exercises

1. ♠ Consider polynomials in $x^3$ on $[0, b]$. Such a polynomial is of the form $p(x) = a_0 + a_1 x^3 + a_2 x^6 + \cdots + a_n x^{3n}$. Show these polynomials are dense in the space of continuous functions defined on $[0, b]$.
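One way to see why Problem 1 should be true (a numeric sketch under my own choice of construction, not the intended proof): if $q$ approximates $g(u) = f(u^{1/3})$ uniformly on $[0, 1]$, then $p(x) = q(x^3)$ is a polynomial in $x^3$ with $\sup |p - f| = \sup |q - g|$. Here $q$ is taken to be a Bernstein polynomial:

```python
import math

def bernstein(g, m):
    """Bernstein polynomial of g on [0,1], same formula as in the text."""
    def q(u):
        return sum(math.comb(m, k) * u**k * (1 - u)**(m - k) * g(k / m)
                   for k in range(m + 1))
    return q

# g(u) = f(u^(1/3)) is continuous because u -> u^(1/3) is; a Bernstein
# polynomial q of g then gives p(x) = q(x^3), a polynomial in x^3.
f = lambda x: x                     # any continuous f works; identity here
g = lambda u: f(u ** (1 / 3))
errs = {}
for m in (20, 200):
    q = bernstein(g, m)
    p = lambda x, q=q: q(x ** 3)
    errs[m] = max(abs(p(i / 500) - f(i / 500)) for i in range(501))
    print(m, errs[m])
assert errs[200] < errs[20] < 0.5   # uniform error decreases with m
assert errs[200] < 0.2
```

Convergence is slow near $0$ because $u \mapsto u^{1/3}$ is only Hölder continuous there, but uniform convergence still holds, which is all the density argument needs.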
2. ♠ Consider polynomials in $\ln(1+x)$ on $[0, b]$. Such a polynomial is of the form $p(x) = a_0 + a_1 (\ln(1+x))^1 + \cdots + a_n (\ln(1+x))^n$. Show these functions are dense in the space of continuous functions defined on $[0, b]$.
3. ♠ Recall the formula for the Bernstein polynomials,
$$p_n(x) = \sum_{k=0}^n \binom{n}{k} f\left(\frac{k}{n}\right) x^k (1-x)^{n-k}.$$
These converged to the continuous function $f$ on $[0, 1]$. Now what if $f$ is $C^1([0, 1])$? Is it true that $p_n'(x)$ yields a sequence of polynomials which converges uniformly to $f'(x)$ on $[0, 1]$? The answer is yes. Prove it. What happens in general when $f$ is $C^n([0, 1])$? This sort of thing is in general false. Consider the sequence of functions $\{f_n(x)\}_{n=1}^\infty = \left\{ \frac{x}{1+nx^2} \right\}_{n=1}^\infty$ and show it converges uniformly to $0$ but $f_n'(0)$ converges to $1$. Thus the Bernstein polynomials are really remarkable.
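The counterexample at the end of Problem 3 is easy to check numerically: $f_n(x) = x/(1+nx^2)$ has $\sup_x |f_n(x)| = 1/(2\sqrt{n}) \to 0$, attained at $x = 1/\sqrt{n}$, while $f_n'(0) = 1$ for every $n$. A quick sketch (illustration only):

```python
def f(n, x):
    return x / (1 + n * x * x)

def fprime(n, x):
    # f_n'(x) = (1 - n x^2) / (1 + n x^2)^2, so f_n'(0) = 1 for every n
    return (1 - n * x * x) / (1 + n * x * x) ** 2

for n in (10, 100, 10_000):
    # sup over [-2, 2] on a fine grid; the true sup is 1/(2 sqrt(n))
    sup = max(abs(f(n, i / 1000 - 2)) for i in range(4001))
    print(n, sup, fprime(n, 0.0))
    assert abs(sup - 1 / (2 * n ** 0.5)) < 1e-3
    assert fprime(n, 0.0) == 1.0
```

So uniform convergence of functions says nothing about convergence of derivatives in general.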
4. ♠ Let $S^1$ denote the unit circle centered at $(0, 0)$. For convenience, consider $S^1$ to be in the complex plane. Thus the points of $S^1$ are of the form $e^{ix}$ for $x \in \mathbb{R}$. Show that the set of linear combinations of the functions $e^{inx}$ for $n \in \mathbb{Z}$ is an algebra and is dense in $C(S^1)$. That is, show that if $f$ is continuous on $S^1$ and complex valued, there exist complex scalars $c_k$ such that
$$\sup \left\{ \left| f\left(e^{ix}\right) - \sum_{k=-n}^n c_k e^{ikx} \right| : x \in \mathbb{R} \right\} < \varepsilon.$$
5. ↑ ♠ Suppose $g$ is a continuous function defined on $\mathbb{R}$ which is $2\pi$ periodic, $g(x + 2\pi) = g(x)$ for all $x$. Then define $f$ on $S^1$ as follows: $f\left(e^{ix}\right) \equiv g(x)$. Explain why this is well defined and yields $f \in C(S^1)$. Hint: If $e^{ix} = e^{iy}$ then $x - y$ is a multiple of $2\pi$. If $e^{ix_n} \to e^{ix}$, you might argue that each $x_n$ and $x$ can be considered in $[0, 2\pi)$ and that the $x_n$ can be chosen such that $x_n \to x$. Now prove the following theorem. If $g$ is continuous on $\mathbb{R}$ and $2\pi$ periodic, then for every $\varepsilon > 0$ there exist complex numbers $c_k$ such that
$$\sup \left\{ \left| g(x) - \sum_{k=-n}^n c_k e^{ikx} \right| : x \in \mathbb{R} \right\} < \varepsilon.$$
6. This problem gives an outline of the way Weierstrass originally proved the Weierstrass approximation theorem. Choose $a_n$ such that
$$\int_{-1}^1 \left(1 - x^2\right)^n a_n\, dx = 1.$$
Show $a_n < \frac{n+1}{2}$ or something like this. Now show that for $\delta \in (0, 1)$,
$$\lim_{n\to\infty} \left( \int_\delta^1 \left(1 - x^2\right)^n a_n\, dx + \int_{-1}^{-\delta} a_n \left(1 - x^2\right)^n dx \right) = 0.$$
Next for $f$ a continuous function defined on $\mathbb{R}$, define the polynomial $p_n(x)$ by
$$p_n(x) \equiv \int_{x-1}^{x+1} a_n \left(1 - (x - t)^2\right)^n f(t)\, dt = \int_{-1}^1 a_n f(x - t) \left(1 - t^2\right)^n dt.$$
Then show $\lim_{n\to\infty} \|p_n - f\|_\infty = 0$ where $\|f\|_\infty = \max \{|f(x)| : x \in [-1, 1]\}$.
7. ♠ Suppose $f \in C_0([0, \infty))$ and also $|f(t)| \le Ce^{-rt}$. Let $\mathcal{A}$ denote the algebra of linear combinations of functions of the form $e^{-st}$ for $s$ sufficiently large. Thus $\mathcal{A}$ is dense in $C_0([0, \infty))$. Show that if
$$\int_0^\infty e^{-st} f(t)\, dt = 0$$
for each $s$ sufficiently large, then $f(t) = 0$. Next consider only $|f(t)| \le Ce^{rt}$ for some $r$. That is, $f$ has exponential growth. Show the same conclusion holds for $f$ if
$$\int_0^\infty e^{-st} f(t)\, dt = 0$$
for all $s$ sufficiently large. This justifies the Laplace transform procedure of differential equations where if the Laplace transforms of two functions are equal, then the two functions are considered to be equal. More can be said about this. Hint: For the last part, consider $g(t) \equiv e^{-2rt} f(t)$ and apply the first part to $g$. If $g(t) = 0$ then so is $f(t)$.
8. ♠ Consider linear combinations of functions of the form $\prod_{i=1}^n \varphi_i(x_i)$ where $\varphi_i \in C_c(\mathbb{R})$. Show that this is an algebra $\mathcal{A}$ of functions in $C_c(\mathbb{R}^n)$. Explain why $\mathcal{A}$ is dense in $C_0(\mathbb{R}^n)$.
9. In the context of the above problem, if $f \in C_c(\mathbb{R}^n)$, define
$$\int f\, dx \equiv \lim_{k\to\infty} \int \cdots \int p_k\, dx_1 \cdots dx_n$$
where $\|p_k - f\|_\infty \to 0$, $p_k$ a function of the sort described in Problem 8. You must show that the limit in the above exists and that this limit is independent of the choice of approximating sequence from $\mathcal{A}$. Now give an unbelievably short and easy proof that the order of iterated integrals can be interchanged for $\int f\, dx$ where $f \in C_c(\mathbb{R}^n)$.
10. ♠ Suppose $\mu$ is a finite outer regular Borel measure defined on a compact Hausdorff space $\Omega$. Show that there exists $K$, a compact subset of $\Omega$, such that $\mu(K) = \mu(\Omega)$ but if $H$ is any proper compact subset of $K$, then $\mu(H) < \mu(K)$. Hint: This uses outer regularity and the finite intersection property.
The $L^p$ Spaces

12.1 Basic Inequalities And Properties

One of the main applications of the Lebesgue integral is to the study of various sorts of function spaces. These are vector spaces whose elements are functions of various types. One of the most important examples of a function space is the space of measurable functions whose absolute values are $p^{th}$ power integrable where $p \ge 1$. These spaces, referred to as $L^p$ spaces, are very useful in applications. In this chapter, $(\Omega, \mathcal{S}, \mu)$ will be a measure space.

Definition 12.1.1 Let $1 \le p < \infty$. Define
$$L^p(\Omega) \equiv \left\{ f : f \text{ is measurable and } \int_\Omega |f(\omega)|^p\, d\mu < \infty \right\}.$$
For each $p > 1$ define $q$ by
$$\frac{1}{p} + \frac{1}{q} = 1.$$
Often one uses $p'$ instead of $q$ in this context.

$L^p(\Omega)$ is a vector space and has a norm. This is similar to the situation for $\mathbb{R}^n$ but the proof requires the following fundamental inequality.
Theorem 12.1.2 (Hölder's inequality) If $f$ and $g$ are measurable functions, then if $p > 1$,
$$\int |f|\,|g|\, d\mu \le \left( \int |f|^p\, d\mu \right)^{\frac{1}{p}} \left( \int |g|^q\, d\mu \right)^{\frac{1}{q}}. \tag{12.1.1}$$
Proof: First here is a proof of Young's inequality.

Lemma 12.1.3 If $p > 1$ and $0 \le a, b$ then
$$ab \le \frac{a^p}{p} + \frac{b^q}{q}.$$
Proof: Consider the following picture.

[Figure: the strictly increasing curve $x = t^{p-1}$ (equivalently $t = x^{q-1}$) in the $(t, x)$ plane, together with the rectangle with sides $a$ on the $t$ axis and $b$ on the $x$ axis.]

From this picture, the sum of the area between the $x$ axis and the curve added to the area between the $t$ axis and the curve is at least as large as $ab$. Using beginning calculus, this is equivalent to the following inequality:
$$ab \le \int_0^a t^{p-1}\, dt + \int_0^b x^{q-1}\, dx = \frac{a^p}{p} + \frac{b^q}{q}.$$
The above picture represents the situation which occurs when $p > 2$ because the graph of the function is concave up. If $2 \ge p > 1$ the graph would be concave down or a straight line. You should verify that the same argument holds in these cases just as well. In fact, the only thing which matters in the above inequality is that the function $x = t^{p-1}$ be strictly increasing.

Note equality occurs when $a^p = b^q$.
Here is an alternate proof.

Lemma 12.1.4 For $a, b \ge 0$,
$$ab \le \frac{a^p}{p} + \frac{b^q}{q}$$
and equality occurs if and only if $a^p = b^q$.

Proof: If $b = 0$, the inequality is obvious. Fix $b > 0$ and consider
$$f(a) \equiv \frac{a^p}{p} + \frac{b^q}{q} - ab.$$
Then $f'(a) = a^{p-1} - b$. This is negative when $a < b^{1/(p-1)}$ and is positive when $a > b^{1/(p-1)}$. Therefore, $f$ has a minimum when $a = b^{1/(p-1)}$. In other words, when $a^p = b^{p/(p-1)} = b^q$ since $1/p + 1/q = 1$. Thus the minimum value of $f$ is
$$\frac{b^q}{p} + \frac{b^q}{q} - b^{1/(p-1)} b = b^q - b^q = 0.$$
It follows $f \ge 0$ and this yields the desired inequality.
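Young's inequality and its equality case are easy to spot check numerically (an illustration, not a proof; the function name is mine):

```python
import random

def young_gap(a, b, p):
    """a^p/p + b^q/q - ab, which Lemma 12.1.4 says is >= 0."""
    q = p / (p - 1)            # conjugate exponent: 1/p + 1/q = 1
    return a**p / p + b**q / q - a * b

random.seed(1)
for _ in range(10_000):
    p = random.uniform(1.01, 10.0)
    a = random.uniform(0.0, 5.0)
    b = random.uniform(0.0, 5.0)
    assert young_gap(a, b, p) >= -1e-9   # inequality holds
# equality case a^p = b^q: take b = a^(p-1), so b^q = a^p
p, a = 3.0, 1.7
b = a ** (p - 1)
assert abs(young_gap(a, b, p)) < 1e-9
print("ok")
```

The choice $b = a^{p-1}$ is exactly where $f'(a) = 0$ in the proof above.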
Proof of Hölder's inequality: If either $\int |f|^p\, d\mu$ or $\int |g|^q\, d\mu$ equals $\infty$, the inequality 12.1.1 is obviously valid because $\infty \ge$ anything. If either $\int |f|^p\, d\mu$ or $\int |g|^q\, d\mu$ equals $0$, then $f = 0$ a.e. or $g = 0$ a.e., and so in this case the left side of the inequality equals $0$ and the inequality is therefore true. Therefore assume both $\int |f|^p\, d\mu$ and $\int |g|^q\, d\mu$ are less than $\infty$ and not equal to $0$. Let
$$\left( \int |f|^p\, d\mu \right)^{1/p} = I(f)$$
and let $\left( \int |g|^q\, d\mu \right)^{1/q} = I(g)$. Then using the lemma,
$$\int \frac{|f|}{I(f)} \frac{|g|}{I(g)}\, d\mu \le \frac{1}{p} \int \frac{|f|^p}{I(f)^p}\, d\mu + \frac{1}{q} \int \frac{|g|^q}{I(g)^q}\, d\mu = 1.$$
Hence,
$$\int |f|\,|g|\, d\mu \le I(f)\, I(g) = \left( \int |f|^p\, d\mu \right)^{1/p} \left( \int |g|^q\, d\mu \right)^{1/q}.$$
This proves Hölder's inequality.
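Taking $\mu$ to be counting measure on a finite set turns 12.1.1 into the familiar inequality for sums, which can be spot checked numerically (illustration only):

```python
import random

def holder_holds(f, g, p):
    """Check sum |f_i g_i| <= (sum |f_i|^p)^(1/p) (sum |g_i|^q)^(1/q)."""
    q = p / (p - 1)
    lhs = sum(abs(a * b) for a, b in zip(f, g))
    rhs = (sum(abs(a)**p for a in f))**(1/p) * (sum(abs(b)**q for b in g))**(1/q)
    return lhs <= rhs + 1e-9   # small slack for floating point

random.seed(2)
for _ in range(1000):
    n = random.randint(1, 20)
    p = random.uniform(1.1, 5.0)
    f = [random.uniform(-2, 2) for _ in range(n)]
    g = [random.uniform(-2, 2) for _ in range(n)]
    assert holder_holds(f, g, p)
print("ok")
```

The case $p = q = 2$ is the Cauchy Schwarz inequality.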
The following lemma will be needed.

Lemma 12.1.5 Suppose $x, y \in \mathbb{C}$. Then
$$|x + y|^p \le 2^{p-1} \left( |x|^p + |y|^p \right).$$
Proof: The function $f(t) = t^p$ is concave up for $t \ge 0$ because $p > 1$. Therefore, the secant line joining two points on the graph of this function must lie above the graph of the function. This is illustrated in the following picture.

[Figure: the graph of $t^p$ over $[\,|x|, |y|\,]$ with the secant line above it and the midpoint $m = (|x| + |y|)/2$ marked.]

Now as shown above,
$$\left( \frac{|x| + |y|}{2} \right)^p \le \frac{|x|^p + |y|^p}{2}$$
which implies
$$|x + y|^p \le (|x| + |y|)^p \le 2^{p-1} \left( |x|^p + |y|^p \right).$$
Note that if $y = \varphi(x)$ is any function for which the graph of $\varphi$ is concave up, you could get a similar inequality by the same argument.
Corollary 12.1.6 (Minkowski inequality) Let $1 \le p < \infty$. Then
$$\left( \int |f + g|^p\, d\mu \right)^{1/p} \le \left( \int |f|^p\, d\mu \right)^{1/p} + \left( \int |g|^p\, d\mu \right)^{1/p}. \tag{12.1.2}$$
Proof: If $p = 1$, this is obvious because it is just the triangle inequality. Let $p > 1$. Without loss of generality, assume
$$\left( \int |f|^p\, d\mu \right)^{1/p} + \left( \int |g|^p\, d\mu \right)^{1/p} < \infty$$
and $\left( \int |f+g|^p\, d\mu \right)^{1/p} \ne 0$ or there is nothing to prove. Therefore, using the above lemma,
$$\int |f + g|^p\, d\mu \le 2^{p-1} \left( \int |f|^p + |g|^p\, d\mu \right) < \infty.$$
Now $|f(\omega) + g(\omega)|^p \le |f(\omega) + g(\omega)|^{p-1} \left( |f(\omega)| + |g(\omega)| \right)$. Also, it follows from the definition of $p$ and $q$ that $p - 1 = \frac{p}{q}$. Therefore, using this and Hölder's inequality,
$$\int |f+g|^p\, d\mu \le \int |f+g|^{p-1} |f|\, d\mu + \int |f+g|^{p-1} |g|\, d\mu$$
$$= \int |f+g|^{\frac{p}{q}} |f|\, d\mu + \int |f+g|^{\frac{p}{q}} |g|\, d\mu$$
$$\le \left( \int |f+g|^p\, d\mu \right)^{\frac{1}{q}} \left( \int |f|^p\, d\mu \right)^{\frac{1}{p}} + \left( \int |f+g|^p\, d\mu \right)^{\frac{1}{q}} \left( \int |g|^p\, d\mu \right)^{\frac{1}{p}}.$$
Dividing both sides by $\left( \int |f+g|^p\, d\mu \right)^{\frac{1}{q}}$ yields 12.1.2.
The following follows immediately from the above.

Corollary 12.1.7 Let $f_i \in L^p(\Omega)$ for $i = 1, 2, \ldots, n$. Then
$$\left( \int \left| \sum_{i=1}^n f_i \right|^p d\mu \right)^{1/p} \le \sum_{i=1}^n \left( \int |f_i|^p\, d\mu \right)^{1/p}.$$
This shows that if $f, g \in L^p$, then $f + g \in L^p$. Also, it is clear that if $a$ is a constant and $f \in L^p$, then $af \in L^p$ because
$$\int |af|^p\, d\mu = |a|^p \int |f|^p\, d\mu < \infty.$$
Thus $L^p$ is a vector space and

a.) $\left( \int |f|^p\, d\mu \right)^{1/p} \ge 0$, $\left( \int |f|^p\, d\mu \right)^{1/p} = 0$ if and only if $f = 0$ a.e.

b.) $\left( \int |af|^p\, d\mu \right)^{1/p} = |a| \left( \int |f|^p\, d\mu \right)^{1/p}$ if $a$ is a scalar.

c.) $\left( \int |f+g|^p\, d\mu \right)^{1/p} \le \left( \int |f|^p\, d\mu \right)^{1/p} + \left( \int |g|^p\, d\mu \right)^{1/p}$.

$f \to \left( \int |f|^p\, d\mu \right)^{1/p}$ would define a norm if $\left( \int |f|^p\, d\mu \right)^{1/p} = 0$ implied $f = 0$. Unfortunately, this is not so because if $f = 0$ a.e. but is nonzero on a set of measure zero, $\left( \int |f|^p\, d\mu \right)^{1/p} = 0$ and this is not allowed. However, all the other properties of a norm are available and so a little thing like a set of measure zero will not prevent the consideration of $L^p$ as a normed vector space if two functions in $L^p$ which differ only on a set of measure zero are considered the same. That is, an element of $L^p$ is really an equivalence class of functions where two functions are equivalent if they are equal a.e. With this convention, here is a definition.
Definition 12.1.8 Let $f \in L^p(\Omega)$. Define
$$\|f\|_p \equiv \|f\|_{L^p} \equiv \left( \int |f|^p\, d\mu \right)^{1/p}.$$
Then with this definition and using the convention that elements in $L^p$ are considered to be the same if they differ only on a set of measure zero, $\|\cdot\|_p$ is a norm on $L^p(\Omega)$ because if $\|f\|_p = 0$ then $f = 0$ a.e. and so $f$ is considered to be the zero function because it differs from $0$ only on a set of measure zero.
12.2 Completeness

The following is an important definition.

Definition 12.2.1 A complete normed linear space is called a Banach space.¹

$L^p$ is a Banach space. This is the next big theorem.

Theorem 12.2.2 The following hold for $L^p(\Omega)$.

a.) $L^p(\Omega)$ is complete.

b.) If $\{f_n\}$ is a Cauchy sequence in $L^p(\Omega)$, then there exists $f \in L^p(\Omega)$ and a subsequence which converges a.e. to $f \in L^p(\Omega)$, and $\|f_n - f\|_p \to 0$.
Proof: Let $\{f_n\}$ be a Cauchy sequence in $L^p(\Omega)$. This means that for every $\varepsilon > 0$ there exists $N$ such that if $n, m \ge N$, then $\|f_n - f_m\|_p < \varepsilon$. Now select a subsequence as follows. Let $n_1$ be such that $\|f_n - f_m\|_p < 2^{-1}$ whenever $n, m \ge n_1$. Let $n_2$ be such that $n_2 > n_1$ and $\|f_n - f_m\|_p < 2^{-2}$ whenever $n, m \ge n_2$. If
1
These spaces are named after Stefan Banach, 18921945. Banach spaces are the basic item of
study in the subject of functional analysis and will be considered later in this book.
There is a recent biography of Banach, R. Katu˙ za, The Life of Stefan Banach, (A. Kostant and
W. Woyczy´ nski, translators and editors) Birkhauser, Boston (1996). More information on Banach
can also be found in a recent short article written by Douglas Henderson who is in the department
of chemistry and biochemistry at BYU.
Banach was born in Austria, worked in Poland and died in the Ukraine but never moved. This
is because borders kept changing. There is a rumor that he died in a German concentration camp
which is apparently not true. It seems he died after the war of lung cancer.
He was an interesting character. He hated taking examinations so much that he did not receive
his undergraduate university degree. Nevertheless, he did become a professor of mathematics due
to his important research. He and some friends would meet in a cafe called the Scottish cafe where
they wrote on the marble table tops until Banach’s wife supplied them with a notebook which
became the ”Scotish notebook” and was eventually published.
$n_1, \ldots, n_k$ have been chosen, let $n_{k+1} > n_k$ and whenever $n, m \ge n_{k+1}$, $\|f_n - f_m\|_p < 2^{-(k+1)}$. The subsequence just mentioned is $\{f_{n_k}\}$. Thus, $\left\| f_{n_k} - f_{n_{k+1}} \right\|_p < 2^{-k}$. Let
$$g_{k+1} = f_{n_{k+1}} - f_{n_k}.$$
Then by the corollary to Minkowski's inequality,
$$\infty > \sum_{k=1}^\infty \|g_{k+1}\|_p \ge \sum_{k=1}^m \|g_{k+1}\|_p \ge \left\| \sum_{k=1}^m |g_{k+1}| \right\|_p$$
for all $m$. It follows that
$$\int \left( \sum_{k=1}^m |g_{k+1}| \right)^p d\mu \le \left( \sum_{k=1}^\infty \|g_{k+1}\|_p \right)^p < \infty \tag{12.2.3}$$
for all $m$ and so the monotone convergence theorem implies that the sum up to $m$ in 12.2.3 can be replaced by a sum up to $\infty$. Thus,
$$\int \left( \sum_{k=1}^\infty |g_{k+1}| \right)^p d\mu < \infty$$
which requires
$$\sum_{k=1}^\infty |g_{k+1}(x)| < \infty \quad \text{a.e. } x.$$
Therefore, $\sum_{k=1}^\infty g_{k+1}(x)$ converges for a.e. $x$ because the functions have values in a complete space, $\mathbb{C}$, and this shows the partial sums form a Cauchy sequence. Now let $x$ be such that this sum is finite. Then define
$$f(x) \equiv f_{n_1}(x) + \sum_{k=1}^\infty g_{k+1}(x) = \lim_{m\to\infty} f_{n_m}(x)$$
since $\sum_{k=1}^m g_{k+1}(x) = f_{n_{m+1}}(x) - f_{n_1}(x)$. Therefore there exists a set $E$ having
measure zero such that
$$\lim_{k\to\infty} f_{n_k}(x) = f(x)$$
for all $x \notin E$. Redefine $f_{n_k}$ to equal $0$ on $E$ and let $f(x) = 0$ for $x \in E$. It then follows that $\lim_{k\to\infty} f_{n_k}(x) = f(x)$ for all $x$. By Fatou's lemma and the Minkowski inequality,
$$\|f - f_{n_k}\|_p = \left( \int |f - f_{n_k}|^p\, d\mu \right)^{1/p} \le \liminf_{m\to\infty} \left( \int |f_{n_m} - f_{n_k}|^p\, d\mu \right)^{1/p} = \liminf_{m\to\infty} \|f_{n_m} - f_{n_k}\|_p$$
$$\le \liminf_{m\to\infty} \sum_{j=k}^{m-1} \left\| f_{n_{j+1}} - f_{n_j} \right\|_p \le \sum_{i=k}^\infty \left\| f_{n_{i+1}} - f_{n_i} \right\|_p \le 2^{-(k-1)}. \tag{12.2.4}$$
Therefore, $f \in L^p(\Omega)$ because
$$\left\| f \right\|_p \leq \left\| f - f_{n_k} \right\|_p + \left\| f_{n_k} \right\|_p < \infty,$$
and $\lim_{k \to \infty} \left\| f_{n_k} - f \right\|_p = 0$. This proves b.).
This has shown $f_{n_k}$ converges to $f$ in $L^p(\Omega)$. It follows the original Cauchy sequence also converges to $f$ in $L^p(\Omega)$. This is a general fact: if a subsequence of a Cauchy sequence converges, then so does the original Cauchy sequence. You should give a proof of this or you could see Theorem 3.5.4.
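The subsequence extraction above is not a mere convenience. A sequence can converge to $0$ in $L^p$ while converging pointwise nowhere, so a subsequence must be chosen to obtain a.e. convergence. A minimal numerical sketch of the classical "typewriter" sequence on $[0, 1)$, discretized on a grid chosen only for illustration:

```python
import numpy as np

# "Typewriter" sequence: f_1 = X_[0,1), f_2 = X_[0,1/2), f_3 = X_[1/2,1),
# f_4 = X_[0,1/4), ...  The L^p norms shrink to 0, but every point is
# covered by one interval in each dyadic block, so f_j(x) equals 1
# infinitely often and converges nowhere pointwise.
N = 1 << 10                          # grid resolution on [0, 1)
x = (np.arange(N) + 0.5) / N
dx = 1.0 / N

def typewriter(j):
    """Indicator of the j-th interval in the typewriter enumeration."""
    k = j.bit_length() - 1           # block k: intervals of length 2^-k
    i = j - (1 << k)                 # position of the interval in block k
    return ((x >= i / 2.0**k) & (x < (i + 1) / 2.0**k)).astype(float)

p = 2.0
norms = [(np.sum(typewriter(j) ** p) * dx) ** (1 / p) for j in range(1, 64)]
# norms decay like 2^(-k/p) -> 0, so the sequence tends to 0 in L^p,
hits = sum(typewriter(j)[0] for j in range(1, 64))
# yet the first grid point is hit once in every block (j = 1, 2, 4, 8, 16, 32),
# so the values at that point never settle; a subsequence must be taken.
```

Along the rapidly convergent subsequence singled out in the proof, with $\left\| f_{n_k} \right\|_p < 2^{-k}$, the pointwise limit does exist a.e.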
12.3 Minkowski’s Inequality
In working with the $L^p$ spaces, the following inequality, also known as Minkowski's inequality, is very useful. It is similar to the Minkowski inequality for sums. To see this, replace the integral $\int_X$ with a finite summation sign and you will see the usual Minkowski inequality, or rather the version of it given in Corollary 12.1.7.
To prove this theorem, first consider a special case of it in which technical considerations which shed no light on the proof are excluded.
Lemma 12.3.1 Let $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{T}, \lambda)$ be finite complete measure spaces and let $f$ be $\mu \times \lambda$ measurable and uniformly bounded. Then the following inequality is valid for $p \geq 1$.
$$\int_X \left( \int_Y |f(x, y)|^p d\lambda \right)^{\frac{1}{p}} d\mu \geq \left( \int_Y \left( \int_X |f(x, y)|\, d\mu \right)^p d\lambda \right)^{\frac{1}{p}}. \quad (12.3.5)$$
Proof: Since $f$ is bounded and $\mu(X), \lambda(Y) < \infty$,
$$\left( \int_Y \left( \int_X |f(x, y)|\, d\mu \right)^p d\lambda \right)^{\frac{1}{p}} < \infty.$$
Let
$$J(y) = \int_X |f(x, y)|\, d\mu.$$
Note there is no problem in writing this for a.e. $y$ because $f$ is measurable and by Lemma 9.4.8 on Page 232. Then by Fubini's theorem,
$$\int_Y \left( \int_X |f(x, y)|\, d\mu \right)^p d\lambda = \int_Y J(y)^{p-1} \int_X |f(x, y)|\, d\mu\, d\lambda = \int_X \int_Y J(y)^{p-1} |f(x, y)|\, d\lambda\, d\mu$$
Now apply Hölder's inequality in the last integral above and recall $p - 1 = \frac{p}{q}$. This yields
$$\int_Y \left( \int_X |f(x, y)|\, d\mu \right)^p d\lambda \leq \int_X \left( \int_Y J(y)^p d\lambda \right)^{\frac{1}{q}} \left( \int_Y |f(x, y)|^p d\lambda \right)^{\frac{1}{p}} d\mu$$
$$= \left( \int_Y J(y)^p d\lambda \right)^{\frac{1}{q}} \int_X \left( \int_Y |f(x, y)|^p d\lambda \right)^{\frac{1}{p}} d\mu = \left( \int_Y \left( \int_X |f(x, y)|\, d\mu \right)^p d\lambda \right)^{\frac{1}{q}} \int_X \left( \int_Y |f(x, y)|^p d\lambda \right)^{\frac{1}{p}} d\mu. \quad (12.3.6)$$
Therefore, dividing both sides by the ﬁrst factor in the above expression,
$$\left( \int_Y \left( \int_X |f(x, y)|\, d\mu \right)^p d\lambda \right)^{\frac{1}{p}} \leq \int_X \left( \int_Y |f(x, y)|^p d\lambda \right)^{\frac{1}{p}} d\mu. \quad (12.3.7)$$
Note that 12.3.7 holds even if the ﬁrst factor of 12.3.6 equals zero.
Now consider the case where f is not assumed to be bounded and where the
measure spaces are σ ﬁnite.
Theorem 12.3.2 Let $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{T}, \lambda)$ be $\sigma$ finite measure spaces and let $f$ be product measurable. Then the following inequality is valid for $p \geq 1$.
$$\int_X \left( \int_Y |f(x, y)|^p d\lambda \right)^{\frac{1}{p}} d\mu \geq \left( \int_Y \left( \int_X |f(x, y)|\, d\mu \right)^p d\lambda \right)^{\frac{1}{p}}. \quad (12.3.8)$$
Proof: Since the two measure spaces are $\sigma$ finite, there exist measurable sets $X_m$ and $Y_k$ such that $X_m \subseteq X_{m+1}$ for all $m$, $Y_k \subseteq Y_{k+1}$ for all $k$, and $\mu(X_m), \lambda(Y_k) < \infty$. Now define
$$f_n(x, y) \equiv \begin{cases} f(x, y) & \text{if } |f(x, y)| \leq n \\ n & \text{if } |f(x, y)| > n. \end{cases}$$
Thus $f_n$ is uniformly bounded and product measurable. By the above lemma,
$$\int_{X_m} \left( \int_{Y_k} |f_n(x, y)|^p d\lambda \right)^{\frac{1}{p}} d\mu \geq \left( \int_{Y_k} \left( \int_{X_m} |f_n(x, y)|\, d\mu \right)^p d\lambda \right)^{\frac{1}{p}}. \quad (12.3.9)$$
Now observe that $|f_n(x, y)|$ increases in $n$ and the pointwise limit is $|f(x, y)|$. Therefore, using the monotone convergence theorem in 12.3.9 yields the same inequality with $f$ replacing $f_n$. Next let $k \to \infty$ and use the monotone convergence theorem again to replace $Y_k$ with $Y$. Finally let $m \to \infty$ in what is left to obtain 12.3.8.
Note that the proof of this theorem depends on two manipulations, the interchange of the order of integration and Hölder's inequality. Note that there is nothing to check in the case of double sums. Thus if $a_{ij} \geq 0$, it is always the case that
$$\left( \sum_j \left( \sum_i a_{ij} \right)^p \right)^{1/p} \leq \sum_i \left( \sum_j a_{ij}^p \right)^{1/p}$$
because the integrals in this case are just sums and $(i, j) \to a_{ij}$ is measurable.
The $L^p$ spaces have many important properties.
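The displayed double-sum inequality is easy to sanity-check numerically. A sketch with a random nonnegative matrix (sizes, exponent, and seed are arbitrary illustrative choices):

```python
import numpy as np

# Check (sum_j (sum_i a_ij)^p)^(1/p) <= sum_i (sum_j a_ij^p)^(1/p)
# for a_ij >= 0: Minkowski's inequality with sums in place of integrals.
rng = np.random.default_rng(0)
p = 3.0
a = rng.random((20, 30))             # rows indexed by i, columns by j

lhs = np.sum(np.sum(a, axis=0) ** p) ** (1 / p)   # sum over i inside
rhs = np.sum(np.sum(a ** p, axis=1) ** (1 / p))   # p-norm in j, then sum over i
ok = lhs <= rhs + 1e-12
```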
12.4 Density Of Simple Functions
Theorem 12.4.1 Let $p \geq 1$ and let $(\Omega, \mathcal{S}, \mu)$ be a measure space. Then the simple functions are dense in $L^p(\Omega)$.
Proof: Recall that a function $f$ having values in $\mathbb{R}$ can be written in the form $f = f^+ - f^-$ where
$$f^+ = \max(0, f), \quad f^- = \max(0, -f).$$
Therefore, an arbitrary complex valued function $f$ is of the form
$$f = \left( \operatorname{Re} f \right)^+ - \left( \operatorname{Re} f \right)^- + i \left( \left( \operatorname{Im} f \right)^+ - \left( \operatorname{Im} f \right)^- \right).$$
If each of these nonnegative functions is approximated by a simple function, it
follows f is also approximated by a simple function. Therefore, there is no loss of
generality in assuming at the outset that f ≥ 0.
Since $f$ is measurable, Theorem 7.5.6 implies there is an increasing sequence of simple functions $\{s_n\}$ converging pointwise to $f(x)$. Now
$$|f(x) - s_n(x)| \leq |f(x)|.$$
By the Dominated Convergence theorem,
$$0 = \lim_{n \to \infty} \int |f(x) - s_n(x)|^p d\mu.$$
Thus simple functions are dense in $L^p$.
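The proof is constructive enough to watch in action. A sketch using the standard dyadic simple functions $s_n = \min\left( \lfloor 2^n f \rfloor / 2^n, n \right)$ (the grid, the particular $f$, and $p = 2$ are illustrative assumptions):

```python
import numpy as np

# Approximate a nonnegative f by increasing dyadic simple functions
# s_n = min(floor(2^n f)/2^n, n), each taking finitely many values,
# and watch the L^p error shrink.
N = 10_000
x = (np.arange(N) + 0.5) / N         # grid on (0, 1), spacing dx
dx = 1.0 / N
f = 3.0 * np.sqrt(x)                 # a bounded nonnegative function
p = 2.0

def simple(n):
    return np.minimum(np.floor(2.0**n * f) / 2.0**n, n)

errors = [(np.sum(np.abs(f - simple(n)) ** p) * dx) ** (1 / p)
          for n in range(1, 8)]
# each s_n <= s_{n+1} <= f, so the errors decrease monotonically to 0
```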
12.5 Density Of $C_c(\Omega)$
For $\Omega$ a topological space, $C_c(\Omega)$ is the space of continuous functions with compact support in $\Omega$. If you have never heard of a topological space, think $\mathbb{R}^p$. Also recall the following definition.
Definition 12.5.1 Let $(\Omega, \mathcal{S}, \mu)$ be a measure space and suppose $(\Omega, \tau)$ is also a topological space. Then $(\Omega, \mathcal{S}, \mu)$ is called a regular measure space if the $\sigma$ algebra of Borel sets is contained in $\mathcal{S}$, and for all $E \in \mathcal{S}$,
$$\mu(E) = \inf \{ \mu(V) : V \supseteq E \text{ and } V \text{ open} \}$$
and if $\mu(E) < \infty$,
$$\mu(E) = \sup \{ \mu(K) : K \subseteq E \text{ and } K \text{ is compact} \}$$
and $\mu(K) < \infty$ for any compact set $K$.
For example $m_p$, Lebesgue measure on $\mathbb{R}^p$, is an example of such a measure by Theorem 10.1.2. However, there are many other examples of regular measure spaces. If the extra generality has no interest, just let $\Omega = \mathbb{R}^p$.
Lemma 12.5.2 Let Ω be a metric space in which the closed balls are compact and
let K be a compact subset of V , an open set. Then there exists a continuous function
f : Ω → [0, 1] such that f(x) = 1 for all x ∈ K and spt(f) is a compact subset of
V . That is, K ≺ f ≺ V.
Proof: First note that $V$ is the increasing union of open sets $\{W_m\}_{m=1}^{\infty}$ having compact closures. Therefore, there exists an $m$ such that $K \subseteq W_m$ since otherwise, you could obtain a nested sequence of nonempty compact sets of the form $K \setminus W_m$ which would have a point in common, contrary to the assertion that $K \subseteq V$. Pick such an $m$. Then let
$$f(x) = \frac{\operatorname{dist}\left( x, W_m^C \right)}{\operatorname{dist}\left( x, W_m^C \right) + \operatorname{dist}(x, K)}$$
This is clearly equal to $1$ on $K$ and equals $0$ off $W_m$ so $\operatorname{spt}(f)$ is compact. It is continuous because the functions are all continuous and the denominator cannot vanish, because if $x \in K$, then it is at positive distance from $W_m^C$.
It is not necessary to be in a metric space to do this. You can accomplish the
same thing using Urysohn’s lemma but this is a topic for another course.
Theorem 12.5.3 Let $(\Omega, \mathcal{S}, \mu)$ be a regular measure space as in Definition 12.5.1 (for example $m_p$ and $\Omega = \mathbb{R}^p$) where the conclusion of Lemma 12.5.2 holds. Then $C_c(\Omega)$ is dense in $L^p(\Omega)$.
Proof: First consider a measurable set $E$ where $\mu(E) < \infty$. Let $K \subseteq E \subseteq V$ where $\mu(V \setminus K) < \varepsilon$. Now let $K \prec h \prec V$. Then
$$\int |h - \mathcal{X}_E|^p d\mu \leq \int \mathcal{X}_{V \setminus K}^p d\mu = \mu(V \setminus K) < \varepsilon.$$
It follows that for each simple function $s$ in $L^p(\Omega)$, there exists $h \in C_c(\Omega)$ such that $\left\| s - h \right\|_p < \varepsilon$. This is because if
$$s(x) = \sum_{i=1}^{m} c_i \mathcal{X}_{E_i}(x)$$
is a simple function in $L^p$ where the $c_i$ are the distinct nonzero values of $s$, each $\mu(E_i) < \infty$ since otherwise $s \notin L^p$ due to the inequality
$$\int |s|^p d\mu \geq |c_i|^p \mu(E_i).$$
Approximating each $\mathcal{X}_{E_i}$ as above by some $h_i \in C_c(\Omega)$ and taking $h = \sum_{i=1}^{m} c_i h_i$ then makes $\left\| s - h \right\|_p$ as small as desired.
By Theorem 12.4.1, simple functions are dense in $L^p(\Omega)$. Therefore, the set of functions $C_c(\Omega)$ is also dense in $L^p(\Omega)$.
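For $\Omega = \mathbb{R}$ and $E$ an interval, the function $h$ with $K \prec h \prec V$ can be taken piecewise linear, and the estimate $\int |h - \mathcal{X}_E|^p d\mu \leq \mu(V \setminus K)$ is visible numerically. A sketch (the interval, margin $d$, and grid are illustrative assumptions):

```python
import numpy as np

# E = [0.3, 0.7], K = [0.3+d, 0.7-d], V = (0.3-d, 0.7+d); h is a
# trapezoid with h = 1 on K and support inside V, so K < h < V.
N = 200_000
x = (np.arange(N) + 0.5) / N
dx = 1.0 / N
a, b, d = 0.3, 0.7, 0.01
indicator = ((x >= a) & (x <= b)).astype(float)

# piecewise-linear h: rises on [a-d, a+d], falls on [b-d, b+d]
h = np.clip(np.minimum(x - (a - d), (b + d) - x) / (2 * d), 0.0, 1.0)

p = 2.0
err = np.sum(np.abs(h - indicator) ** p) * dx   # <= m(V \ K) = 4 d
```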
12.6 Continuity Of Translation
Definition 12.6.1 Let $f$ be a function defined on $U \subseteq \mathbb{R}^n$ and let $w \in \mathbb{R}^n$. Then $f_w$ will be the function defined on $w + U$ by
$$f_w(x) = f(x - w).$$
Theorem 12.6.2 (Continuity of translation in $L^p$) Let $f \in L^p(\mathbb{R}^n)$ with the measure being Lebesgue measure. Then
$$\lim_{w \to 0} \left\| f_w - f \right\|_p = 0.$$
Proof: Let $\varepsilon > 0$ be given and let $g \in C_c(\mathbb{R}^n)$ with $\left\| g - f \right\|_p < \frac{\varepsilon}{3}$. Since Lebesgue measure is translation invariant ($m_n(w + E) = m_n(E)$),
$$\left\| g_w - f_w \right\|_p = \left\| g - f \right\|_p < \frac{\varepsilon}{3}.$$
You can see this from looking at simple functions and passing to the limit or you
could use the change of variables formula to verify it.
Therefore,
$$\left\| f - f_w \right\|_p \leq \left\| f - g \right\|_p + \left\| g - g_w \right\|_p + \left\| g_w - f_w \right\|_p < \frac{2\varepsilon}{3} + \left\| g - g_w \right\|_p. \quad (12.6.10)$$
But $\lim_{w \to 0} g_w(x) = g(x)$ uniformly in $x$ because $g$ is uniformly continuous. Now let $B$ be a large ball containing $\operatorname{spt}(g)$ and let $\delta_1$ be small enough that $B(x, \delta_1) \subseteq B$ whenever $x \in \operatorname{spt}(g)$. There exists $\delta < \delta_1$ such that if $|w| < \delta$, it follows that $|g(x - w) - g(x)| < \varepsilon / 3\left( 1 + m_n(B)^{1/p} \right)$. Therefore,
$$\left\| g - g_w \right\|_p = \left( \int_B |g(x) - g(x - w)|^p dm_n \right)^{1/p} \leq \varepsilon \frac{m_n(B)^{1/p}}{3\left( 1 + m_n(B)^{1/p} \right)} < \frac{\varepsilon}{3}.$$
Therefore, whenever $|w| < \delta$, it follows $\left\| g - g_w \right\|_p < \frac{\varepsilon}{3}$ and so from 12.6.10, $\left\| f - f_w \right\|_p < \varepsilon$.
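Continuity of translation holds even for discontinuous $f$; what fails is uniform convergence of $f_w$ to $f$, not convergence in $L^p$. A numerical sketch with $f = \mathcal{X}_{[0,1]}$, for which $\left\| f_w - f \right\|_p = (2|w|)^{1/p}$ exactly (the grid is an illustrative discretization):

```python
import numpy as np

# ||f_w - f||_p for f the indicator of [0, 1], sampled on a fine grid
# over [-2, 2]; the symmetric difference of [0,1] and [w, 1+w] has
# measure 2|w|, so the norm is (2|w|)^(1/p) and tends to 0 with w.
N = 400_000
x = np.linspace(-2.0, 2.0, N, endpoint=False)
dx = x[1] - x[0]
f = ((x >= 0.0) & (x <= 1.0)).astype(float)
p = 2.0

def norm_shift(w):
    shifted = ((x - w >= 0.0) & (x - w <= 1.0)).astype(float)  # f_w(x) = f(x-w)
    return (np.sum(np.abs(shifted - f) ** p) * dx) ** (1 / p)

vals = [norm_shift(w) for w in (0.4, 0.2, 0.1, 0.05)]
# the norms shrink as w does, matching (2 w)^(1/2) for p = 2
```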
12.7 Separability, Some Special Functions
Here the measure space will be $(\mathbb{R}^n, m_n, \mathcal{F}_n)$, familiar Lebesgue measure. First recall the following definition of a polynomial.
Definition 12.7.1 $\alpha = (\alpha_1, \cdots, \alpha_n)$ for $\alpha_1, \cdots, \alpha_n$ nonnegative integers is called a multiindex. For $\alpha$ a multiindex, $|\alpha| \equiv \alpha_1 + \cdots + \alpha_n$ and if $x \in \mathbb{R}^n$,
$$x = (x_1, \cdots, x_n),$$
and $f$ a function, define
$$x^{\alpha} \equiv x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n}.$$
A polynomial in $n$ variables of degree $m$ is a function of the form
$$p(x) = \sum_{|\alpha| \leq m} a_{\alpha} x^{\alpha}.$$
Here $\alpha$ is a multiindex as just described and $a_{\alpha} \in \mathbb{C}$. Also define for $\alpha = (\alpha_1, \cdots, \alpha_n)$ a multiindex
$$D^{\alpha} f(x) \equiv \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1} \partial x_2^{\alpha_2} \cdots \partial x_n^{\alpha_n}}.$$
Definition 12.7.2 Define $\mathcal{G}_1$ to be the functions of the form $p(x) e^{-a|x|^2}$ where $a > 0$ is rational and $p(x)$ is a polynomial having all rational coefficients, $a_{\alpha}$ being "rational" if it is of the form $a + ib$ for $a, b \in \mathbb{Q}$. Let $\mathcal{G}$ be all finite sums of functions in $\mathcal{G}_1$. Thus $\mathcal{G}$ is an algebra of functions which has the property that if $f \in \mathcal{G}$ then $\overline{f} \in \mathcal{G}$.
Thus there are countably many functions in $\mathcal{G}_1$. This is because, for each $m$, there are countably many choices for the $a_{\alpha}$ for $|\alpha| \leq m$ since there are finitely many $\alpha$ with $|\alpha| \leq m$ and for each such $\alpha$, there are countably many choices for $a_{\alpha}$ since $\mathbb{Q} + i\mathbb{Q}$ is countable. (Why?) Thus there are countably many polynomials having degree no more than $m$. This is true for each $m$ and so the number of different polynomials is a countable union of countable sets which is countable. Now there are countably many choices of $e^{-a|x|^2}$ and so there are countably many functions in $\mathcal{G}_1$ because the Cartesian product of countable sets is countable.
Now $\mathcal{G}$ consists of finite sums of functions in $\mathcal{G}_1$. Therefore, it is countable because for each $m \in \mathbb{N}$, there are countably many such sums which are possible.
I will show now that $\mathcal{G}$ is dense in $L^p(\mathbb{R}^n)$ but first, here is a lemma which follows from the Stone Weierstrass theorem.
Lemma 12.7.3 $\mathcal{G}$ is dense in $C_0(\mathbb{R}^n)$ with respect to the norm
$$\left\| f \right\|_{\infty} \equiv \sup \{ |f(x)| : x \in \mathbb{R}^n \}$$
Proof: By the Stone Weierstrass theorem, it suffices to show $\mathcal{G}$ separates the points and annihilates no point. It was already observed in the above definition that $\overline{f} \in \mathcal{G}$ whenever $f \in \mathcal{G}$. If $y_1 \neq y_2$, suppose first that $|y_1| \neq |y_2|$. Then in this case, you can let $f(x) \equiv e^{-|x|^2}$. Then $f \in \mathcal{G}$ and $f(y_1) \neq f(y_2)$. If $|y_1| = |y_2|$, then suppose $y_{1k} \neq y_{2k}$. This must happen for some $k$ because $y_1 \neq y_2$. Then let $f(x) \equiv x_k e^{-|x|^2}$. Thus $\mathcal{G}$ separates points. Now $e^{-|x|^2}$ is never equal to zero and so $\mathcal{G}$ annihilates no point of $\mathbb{R}^n$.
These functions are clearly quite specialized. Therefore, the following theorem
is somewhat surprising.
Theorem 12.7.4 For each $p \geq 1$, $p < \infty$, $\mathcal{G}$ is dense in $L^p(\mathbb{R}^n)$. Since $\mathcal{G}$ is countable, this shows that $L^p(\mathbb{R}^n)$ is separable.
Proof: Let $f \in L^p(\mathbb{R}^n)$. Then there exists $g \in C_c(\mathbb{R}^n)$ such that $\left\| f - g \right\|_p < \varepsilon$. Now let $b > 0$ be large enough that
$$\int_{\mathbb{R}^n} \left( e^{-b|x|^2} \right)^p dx < \varepsilon^p.$$
Then $x \to g(x) e^{b|x|^2}$ is in $C_c(\mathbb{R}^n) \subseteq C_0(\mathbb{R}^n)$. Therefore, from Lemma 12.7.3 there exists $\psi \in \mathcal{G}$ such that
$$\left\| g e^{b|\cdot|^2} - \psi \right\|_{\infty} < 1$$
Therefore, letting $\varphi(x) \equiv e^{-b|x|^2} \psi(x)$ it follows that $\varphi \in \mathcal{G}$ and for all $x \in \mathbb{R}^n$,
$$|g(x) - \varphi(x)| < e^{-b|x|^2}$$
Therefore,
$$\left( \int_{\mathbb{R}^n} |g(x) - \varphi(x)|^p dx \right)^{1/p} \leq \left( \int_{\mathbb{R}^n} \left( e^{-b|x|^2} \right)^p dx \right)^{1/p} < \varepsilon.$$
It follows
$$\left\| f - \varphi \right\|_p \leq \left\| f - g \right\|_p + \left\| g - \varphi \right\|_p < 2\varepsilon.$$
From now on, we can drop the restriction that the coefficients of the polynomials in $\mathcal{G}$ are rational. We also drop the restriction that $a$ is rational. Thus $\mathcal{G}$ will be finite sums of functions which are of the form $p(x) e^{-a|x|^2}$ where the coefficients of $p$ are complex and $a > 0$.
The following lemma is also interesting even if it is obvious.
Lemma 12.7.5 For $\psi \in \mathcal{G}$, $p$ a polynomial, and $\alpha, \beta$ multiindices, $D^{\alpha} \psi \in \mathcal{G}$ and $p\psi \in \mathcal{G}$. Also
$$\sup \{ |x^{\beta} D^{\alpha} \psi(x)| : x \in \mathbb{R}^n \} < \infty$$
Thus these special functions are infinitely differentiable (smooth). They also have the property that they and all their derivatives vanish as $|x| \to \infty$.
12.8 Convolutions
An important construction is the convolution of two functions. This is deﬁned as
follows.
Definition 12.8.1 Let $f, g$ be functions defined on $\mathbb{R}^n$. Then $f * g(x)$ equals the following integral provided it exists.
$$f * g(x) \equiv \int_{\mathbb{R}^n} f(x - y) g(y)\, dy$$
We have the following fundamental result about convolutions in the context of
Lebesgue measure.
Theorem 12.8.2 Let $f \in L^1(\mathbb{R}^n)$ and $g \in L^p(\mathbb{R}^n)$ for $1 \leq p < \infty$. Then $f * g(x)$ makes sense for a.e. $x$. Furthermore, $f * g(x) = g * f(x)$ a.e., $f * g \in L^p(\mathbb{R}^n)$ and
$$\left\| f * g \right\|_{L^p(\mathbb{R}^n)} \leq \left\| f \right\|_{L^1(\mathbb{R}^n)} \left\| g \right\|_{L^p(\mathbb{R}^n)}$$
Proof: By Lemma 10.1.3, both $f$ and $g$ have Borel measurable representatives. These representatives are equal to $f$ and $g$ respectively in $L^1(\mathbb{R}^n)$ and $L^p(\mathbb{R}^n)$. Thus I will assume without loss of generality that both $f$ and $g$ are Borel measurable. Then using the Minkowski inequality for integrals and translation invariance of Lebesgue measure,
$$\left( \int_{\mathbb{R}^n} \left( \int_{\mathbb{R}^n} |f(x - y) g(y)|\, dy \right)^p dx \right)^{1/p} = \left( \int \left( \int |f(y)| |g(x - y)|\, dy \right)^p dx \right)^{1/p}$$
$$\leq \int |f(y)| \left( \int |g(x - y)|^p dx \right)^{1/p} dy = \int |f(y)| \left( \int |g(x)|^p dx \right)^{1/p} dy = \left\| f \right\|_{L^1(\mathbb{R}^n)} \left\| g \right\|_{L^p(\mathbb{R}^n)}$$
the last step following from translation invariance of Lebesgue measure, Theorem 10.1.2, or from the change of variables formulas. It follows from this inequality that for a.e. $x$,
$$\int |f(x - y)| |g(y)|\, dy < \infty$$
and so, for those values of $x$, it follows $y \to f(x - y) g(y)$ is in $L^1(\mathbb{R}^n)$, and so
$$\int f(x - y) g(y)\, dy \equiv f * g(x)$$
exists. Also note that the above also implies $f * g = g * f$.
To emphasize something which was observed in the proof,
$$\int f(y) g(x - y)\, dy,$$
at those values of $x$ where it makes sense, is unchanged when you change the representative of $f$ and $g$.
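The inequality $\left\| f * g \right\|_{L^p} \leq \left\| f \right\|_{L^1} \left\| g \right\|_{L^p}$ has an exact discrete counterpart for sequences (Young's inequality), so a Riemann-sum convolution must satisfy it as well. A sketch with random nonnegative samples (sizes, spacing, and seed are arbitrary illustrative choices):

```python
import numpy as np

# Discrete check of ||f * g||_p <= ||f||_1 ||g||_p: a Riemann-sum
# convolution of two compactly supported sampled functions.
rng = np.random.default_rng(1)
dx = 1e-3
f = rng.random(2_000)            # samples of f >= 0 on a uniform grid
g = rng.random(3_000)            # samples of g >= 0
p = 2.0

conv = np.convolve(f, g) * dx    # (f*g)(x_k) ~ sum_j f(y_j) g(x_k - y_j) dx
lhs = (np.sum(conv ** p) * dx) ** (1 / p)
rhs = (np.sum(f) * dx) * (np.sum(g ** p) * dx) ** (1 / p)
ok = lhs <= rhs + 1e-9
```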
12.9 Mollifiers And Density Of $C_c^{\infty}(\mathbb{R}^n)$
The special functions defined above are dense in each $L^p(\mathbb{R}^n)$. This is remarkable. However, there is a possibly even more remarkable theorem about infinitely differentiable functions which have compact support.
Definition 12.9.1 Let $U$ be an open subset of $\mathbb{R}^n$. $C_c^{\infty}(U)$ is the vector space of all infinitely differentiable functions which equal zero for all $x$ outside of some compact set contained in $U$. Similarly, $C_c^m(U)$ is the vector space of all functions which are $m$ times continuously differentiable and whose support is a compact subset of $U$.
Example 12.9.2 Let $U = B(z, 2r)$
$$\psi(x) = \begin{cases} \exp \left( \left( |x - z|^2 - r^2 \right)^{-1} \right) & \text{if } |x - z| < r, \\ 0 & \text{if } |x - z| \geq r. \end{cases}$$
Then a little work shows $\psi \in C_c^{\infty}(U)$. The following also is easily obtained.
Lemma 12.9.3 Let $U$ be any open set. Then $C_c^{\infty}(U) \neq \emptyset$.
Proof: Pick $z \in U$ and let $r$ be small enough that $B(z, 2r) \subseteq U$. Then let $\psi \in C_c^{\infty}(B(z, 2r)) \subseteq C_c^{\infty}(U)$ be the function of the above example.
Definition 12.9.4 Let $U = \{x \in \mathbb{R}^n : |x| < 1\}$. A sequence $\{\psi_m\} \subseteq C_c^{\infty}(U)$ is called a mollifier (this is often called an approximate identity if the differentiability is not included) if
$$\psi_m(x) \geq 0, \quad \psi_m(x) = 0 \text{ if } |x| \geq \frac{1}{m},$$
and $\int \psi_m\, dm_n = 1$. Sometimes it may be written as $\{\psi_{\varepsilon}\}$ where $\psi_{\varepsilon}$ satisfies the above conditions except $\psi_{\varepsilon}(x) = 0$ if $|x| \geq \varepsilon$. In other words, $\varepsilon$ takes the place of $1/m$ and in everything that follows $\varepsilon \to 0$ instead of $m \to \infty$.
As before, $\int f(x, y)\, d\mu(y)$ will mean $x$ is fixed and the function $y \to f(x, y)$ is being integrated. To make the notation more familiar, $dx$ is written instead of $dm_n(x)$.
Example 12.9.5 Let
$$\psi \in C_c^{\infty}(B(0, 1)) \quad (B(0, 1) = \{x : |x| < 1\})$$
with $\psi(x) \geq 0$ and $\int \psi\, dm = 1$. Let $\psi_m(x) = c_m \psi(mx)$ where $c_m$ is chosen in such a way that $\int \psi_m\, dm = 1$. By the change of variables theorem, $c_m = m^n$.
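In one dimension $c_m = m$, and the rescaling can be checked numerically. A sketch with the standard bump function (the grid and the values of $m$ are illustrative choices):

```python
import numpy as np

# In 1-d the rescaled bump psi_m(x) = m * psi(m x) keeps total integral 1
# while its support shrinks to [-1/m, 1/m] (the case c_m = m^n with n = 1).
N = 2_000_001
x = np.linspace(-1.5, 1.5, N)
dx = x[1] - x[0]

def psi(t):
    out = np.zeros_like(t)
    inside = np.abs(t) < 1.0
    out[inside] = np.exp(1.0 / (t[inside] ** 2 - 1.0))
    return out

Z = np.sum(psi(x)) * dx          # normalizing constant so that psi/Z integrates to 1
integrals = [np.sum((m / Z) * psi(m * x)) * dx for m in (1, 2, 5, 10)]
# each Riemann sum is 1 up to (tiny) quadrature error
```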
Definition 12.9.6 A function $f$ is said to be in $L^1_{loc}(\mathbb{R}^n, \mu)$ if $f$ is $\mu$ measurable and if $|f| \mathcal{X}_K \in L^1(\mathbb{R}^n, \mu)$ for every compact set $K$. Here $\mu$ is a Radon measure (a Borel measure which is complete and regular, like $m_n$) on $\mathbb{R}^n$. Usually $\mu = m_n$, Lebesgue measure. When this is so, write $L^1_{loc}(\mathbb{R}^n)$ or $L^p(\mathbb{R}^n)$, etc. If $f \in L^1_{loc}(\mathbb{R}^n, \mu)$, and $g \in C_c(\mathbb{R}^n)$,
$$f * g(x) \equiv \int f(y) g(x - y)\, d\mu.$$
Note that here, a specific order is mandated, unlike the earlier treatment given above for convolutions in the context of Lebesgue measure where the order does not matter.
The following lemma will be useful in what follows. It says that one of these very irregular functions in $L^1_{loc}(\mathbb{R}^n, \mu)$ is smoothed out by convolving with a mollifier.
Lemma 12.9.7 Let $f \in L^1_{loc}(\mathbb{R}^n, \mu)$, and $g \in C_c^{\infty}(\mathbb{R}^n)$. Then $f * g$ is an infinitely differentiable function. Here $\mu$ is a Radon measure on $\mathbb{R}^n$. (Complete, regular, and finite on compact sets.)
Proof: Consider the difference quotient for calculating a partial derivative of $f * g$.
$$\frac{f * g(x + te_j) - f * g(x)}{t} = \int f(y) \frac{g(x + te_j - y) - g(x - y)}{t}\, d\mu(y).$$
Using the fact that $g \in C_c^{\infty}(\mathbb{R}^n)$, the quotient,
$$\frac{g(x + te_j - y) - g(x - y)}{t},$$
is uniformly bounded. To see this easily, use Theorem 6.4.2 on Page 126 to get the existence of a constant $M$ depending on
$$\max \{ \left\| Dg(x) \right\| : x \in \mathbb{R}^n \}$$
such that
$$|g(x + te_j - y) - g(x - y)| \leq M |t|$$
for any choice of $x$ and $y$. Therefore, there exists a dominating function for the integrand of the above integral which is of the form $C |f(y)| \mathcal{X}_K$ where $K$ is a compact set depending on the support of $g$. It follows the limit of the difference quotient above passes inside the integral as $t \to 0$ and
$$\frac{\partial}{\partial x_j} (f * g)(x) = \int f(y) \frac{\partial}{\partial x_j} g(x - y)\, d\mu(y).$$
Now letting $\frac{\partial g}{\partial x_j}$ play the role of $g$ in the above argument, partial derivatives of all orders exist. A similar use of the dominated convergence theorem shows all these partial derivatives are also continuous.
The following theorem is a lot like Lemma 10.5.2 except the function is inﬁnitely
diﬀerentiable.
Theorem 12.9.8 Let $K$ be a compact subset of an open set $U$. Then there exists a function $h \in C_c^{\infty}(U)$, such that $h(x) = 1$ for all $x \in K$ and $h(x) \in [0, 1]$ for all $x$.
Proof: Let $r > 0$ be small enough that $K + B(0, 3r) \subseteq U$. The symbol $K + B(0, 3r)$ means
$$\{k + x : k \in K \text{ and } x \in B(0, 3r)\}.$$
Thus this is simply a way to write
$$\cup \{B(k, 3r) : k \in K\}.$$
Think of it as fattening up the set $K$. Let $K_r = K + B(0, r)$. (Figure: the compact set $K$ inside its fattening $K_r$, inside $U$.)
Consider $\mathcal{X}_{K_r} * \psi_m$ where $\psi_m$ is a mollifier. Let $m$ be so large that $\frac{1}{m} < r$. Then from the definition of what is meant by a convolution, and using that $\psi_m$ has support in $B\left( 0, \frac{1}{m} \right)$, $\mathcal{X}_{K_r} * \psi_m = 1$ on $K$ and its support is in $K + B(0, 3r)$. Now using Lemma 12.9.7, $\mathcal{X}_{K_r} * \psi_m$ is also infinitely differentiable. Therefore, let $h = \mathcal{X}_{K_r} * \psi_m$.
Theorem 12.9.9 For each $p \geq 1$, $C_c^{\infty}(\mathbb{R}^n)$ is dense in $L^p(\mathbb{R}^n)$. Here the measure is Lebesgue measure.
Proof: Let $f \in L^p(\mathbb{R}^n)$ and let $\varepsilon > 0$ be given. Choose $g \in C_c(\mathbb{R}^n)$ such that $\left\| f - g \right\|_p < \frac{\varepsilon}{2}$. This can be done by using Theorem 12.5.3. Now let
$$g_m(x) = g * \psi_m(x) \equiv \int g(x - y) \psi_m(y)\, dm_n(y) = \int g(y) \psi_m(x - y)\, dm_n(y)$$
where $\{\psi_m\}$ is a mollifier. It follows from Lemma 12.9.7 that $g_m \in C_c^{\infty}(\mathbb{R}^n)$. It vanishes if $x \notin \operatorname{spt}(g) + B(0, \frac{1}{m})$.
$$\left\| g - g_m \right\|_p = \left( \int \left| g(x) - \int g(x - y) \psi_m(y)\, dm_n(y) \right|^p dm_n(x) \right)^{\frac{1}{p}}$$
$$\leq \left( \int \left( \int |g(x) - g(x - y)| \psi_m(y)\, dm_n(y) \right)^p dm_n(x) \right)^{\frac{1}{p}}$$
$$\leq \int \left( \int |g(x) - g(x - y)|^p dm_n(x) \right)^{\frac{1}{p}} \psi_m(y)\, dm_n(y)$$
$$= \int_{B(0, \frac{1}{m})} \left\| g - g_y \right\|_p \psi_m(y)\, dm_n(y) < \frac{\varepsilon}{2}$$
whenever $m$ is large enough. This follows from the uniform continuity of $g$. Theorem 12.3.2 was used to obtain the second inequality. There is no measurability problem because the function
$$(x, y) \to |g(x) - g(x - y)| \psi_m(y)$$
is continuous. Thus when $m$ is large enough,
$$\left\| f - g_m \right\|_p \leq \left\| f - g \right\|_p + \left\| g - g_m \right\|_p < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
This is a very remarkable result. Functions in $L^p(\mathbb{R}^n)$ don't need to be continuous anywhere and yet every such function is very close in the $L^p$ norm to one which is infinitely differentiable having compact support.
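A numerical sketch of this: mollifying the indicator of an interval produces smooth functions whose $L^p$ distance to the original shrinks as the mollifier's support does (the grid, the kernel discretization, and the values of $m$ are illustrative assumptions):

```python
import numpy as np

# g_m = f * psi_m for f the indicator of [-1/2, 1/2]; in theory each g_m
# is C^infinity, and ||f - g_m||_p -> 0 as the support of psi_m shrinks.
N = 40_000
x = np.linspace(-2.0, 2.0, N, endpoint=False)
dx = x[1] - x[0]
f = (np.abs(x) <= 0.5).astype(float)
p = 2.0

def mollifier(m):
    t = np.arange(-1.0 / m, 1.0 / m + dx, dx)
    k = np.zeros_like(t)
    inside = np.abs(m * t) < 1.0
    k[inside] = np.exp(1.0 / ((m * t[inside]) ** 2 - 1.0))
    return k / (np.sum(k) * dx)      # normalize so the kernel integrates to 1

def lp_error(m):
    g = np.convolve(f, mollifier(m), mode="same") * dx
    return (np.sum(np.abs(f - g) ** p) * dx) ** (1 / p)

errs = [lp_error(m) for m in (2, 4, 8, 16)]
# the error concentrates near the two jumps and decays as m grows
```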
Corollary 12.9.10 Let $U$ be an open set. For each $p \geq 1$, $C_c^{\infty}(U)$ is dense in $L^p(U)$. Here the measure is Lebesgue measure.
Proof: Let $f \in L^p(U)$ and let $\varepsilon > 0$ be given. Choose $g \in C_c(U)$ such that $\left\| f - g \right\|_p < \frac{\varepsilon}{2}$. This is possible because Lebesgue measure restricted to the open set $U$ is regular. Thus the existence of such a $g$ follows from Theorem 12.5.3. Now let
$$g_m(x) = g * \psi_m(x) \equiv \int g(x - y) \psi_m(y)\, dm_n(y) = \int g(y) \psi_m(x - y)\, dm_n(y)$$
where $\{\psi_m\}$ is a mollifier. It follows from Lemma 12.9.7 that $g_m \in C_c^{\infty}(U)$ for all $m$ sufficiently large. It vanishes if $x \notin \operatorname{spt}(g) + B(0, \frac{1}{m})$. Then
$$\left\| g - g_m \right\|_p = \left( \int \left| g(x) - \int g(x - y) \psi_m(y)\, dm_n(y) \right|^p dm_n(x) \right)^{\frac{1}{p}}$$
$$\leq \left( \int \left( \int |g(x) - g(x - y)| \psi_m(y)\, dm_n(y) \right)^p dm_n(x) \right)^{\frac{1}{p}}$$
$$\leq \int \left( \int |g(x) - g(x - y)|^p dm_n(x) \right)^{\frac{1}{p}} \psi_m(y)\, dm_n(y)$$
$$= \int_{B(0, \frac{1}{m})} \left\| g - g_y \right\|_p \psi_m(y)\, dm_n(y) < \frac{\varepsilon}{2}$$
whenever $m$ is large enough. This follows from the uniform continuity of $g$. Theorem 12.3.2 was used to obtain the second inequality. There is no measurability problem because the function
$$(x, y) \to |g(x) - g(x - y)| \psi_m(y)$$
is continuous. Thus when $m$ is large enough,
$$\left\| f - g_m \right\|_p \leq \left\| f - g \right\|_p + \left\| g - g_m \right\|_p < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
Another thing should probably be mentioned. If you have had a course in complex analysis, you may be wondering whether these infinitely differentiable functions having compact support have anything to do with analytic functions, which also have infinitely many derivatives. The answer is no! Recall that if the set of zeros of an analytic function has a limit point, then the function is identically equal to zero. Thus these functions in $C_c^{\infty}(\mathbb{R}^n)$ are not analytic. This is a strictly real analysis phenomenon and has absolutely nothing to do with the theory of functions of a complex variable.
12.10 $L^{\infty}$
Formally the conjugate index to 1 would be $\infty$, regarding $1/\infty$ as $0$. This is also an important space. Sometimes we call it the space of essentially bounded functions, meaning that they are bounded off a set of measure zero.
Definition 12.10.1 Let $(\Omega, \mathcal{S}, \mu)$ be a measure space. $L^{\infty}(\Omega)$ is the vector space of measurable functions such that for some $M > 0$, $|f(x)| \leq M$ for all $x$ outside of some set of measure zero ($|f(x)| \leq M$ a.e.). Define $f = g$ when $f(x) = g(x)$ a.e. and $\left\| f \right\|_{\infty} \equiv \inf \{M : |f(x)| \leq M \text{ a.e.}\}$.
Theorem 12.10.2 $L^{\infty}(\Omega)$ is a Banach space.
Proof: It is clear that $L^{\infty}(\Omega)$ is a vector space. Is $\left\| \cdot \right\|_{\infty}$ a norm?
Claim: If $f \in L^{\infty}(\Omega)$, then $|f(x)| \leq \left\| f \right\|_{\infty}$ a.e.
Proof of the claim: $\left\{ x : |f(x)| \geq \left\| f \right\|_{\infty} + n^{-1} \right\} \equiv E_n$ is a set of measure zero according to the definition of $\left\| f \right\|_{\infty}$. Furthermore, $\left\{ x : |f(x)| > \left\| f \right\|_{\infty} \right\} = \cup_n E_n$ and so it is also a set of measure zero. This verifies the claim.
Now if $\left\| f \right\|_{\infty} = 0$ it follows that $f(x) = 0$ a.e. Also if $f, g \in L^{\infty}(\Omega)$,
$$|f(x) + g(x)| \leq |f(x)| + |g(x)| \leq \left\| f \right\|_{\infty} + \left\| g \right\|_{\infty}$$
a.e. and so $\left\| f \right\|_{\infty} + \left\| g \right\|_{\infty}$ serves as one of the constants $M$ in the definition of $\left\| f + g \right\|_{\infty}$. Therefore,
$$\left\| f + g \right\|_{\infty} \leq \left\| f \right\|_{\infty} + \left\| g \right\|_{\infty}.$$
Next let $c$ be a number. Then $|c f(x)| = |c| |f(x)| \leq |c| \left\| f \right\|_{\infty}$ and so $\left\| c f \right\|_{\infty} \leq |c| \left\| f \right\|_{\infty}$. Therefore, since $c \neq 0$ is arbitrary, $\left\| f \right\|_{\infty} = \left\| c (1/c) f \right\|_{\infty} \leq \left| \frac{1}{c} \right| \left\| c f \right\|_{\infty}$, which implies $|c| \left\| f \right\|_{\infty} \leq \left\| c f \right\|_{\infty}$. Thus $\left\| \cdot \right\|_{\infty}$ is a norm as claimed.
To verify completeness, let $\{f_n\}$ be a Cauchy sequence in $L^{\infty}(\Omega)$ and use the above claim to get the existence of a set of measure zero $E_{nm}$ such that for all $x \notin E_{nm}$,
$$|f_n(x) - f_m(x)| \leq \left\| f_n - f_m \right\|_{\infty}$$
Let $E = \cup_{n, m} E_{nm}$. Thus $\mu(E) = 0$ and for each $x \notin E$, $\{f_n(x)\}_{n=1}^{\infty}$ is a Cauchy sequence in $\mathbb{C}$. Let
$$f(x) = \begin{cases} 0 & \text{if } x \in E \\ \lim_{n \to \infty} f_n(x) & \text{if } x \notin E \end{cases} = \lim_{n \to \infty} \mathcal{X}_{E^C}(x) f_n(x).$$
Then $f$ is clearly measurable because it is the limit of measurable functions. If
$$F_n = \{x : |f_n(x)| > \left\| f_n \right\|_{\infty}\}$$
and $F = \cup_{n=1}^{\infty} F_n$, it follows $\mu(F) = 0$ and that for $x \notin F \cup E$,
$$|f(x)| \leq \liminf_{n \to \infty} |f_n(x)| \leq \liminf_{n \to \infty} \left\| f_n \right\|_{\infty} < \infty$$
because $\{\left\| f_n \right\|_{\infty}\}$ is a Cauchy sequence. ($\left| \left\| f_n \right\|_{\infty} - \left\| f_m \right\|_{\infty} \right| \leq \left\| f_n - f_m \right\|_{\infty}$ by the triangle inequality.) Thus $f \in L^{\infty}(\Omega)$. Let $n$ be large enough that whenever $m > n$,
$$\left\| f_m - f_n \right\|_{\infty} < \varepsilon.$$
Then, if $x \notin E$,
$$|f(x) - f_n(x)| = \lim_{m \to \infty} |f_m(x) - f_n(x)| \leq \liminf_{m \to \infty} \left\| f_m - f_n \right\|_{\infty} < \varepsilon.$$
Hence $\left\| f - f_n \right\|_{\infty} < \varepsilon$ for all $n$ large enough.
12.11 Exercises
1. ♠ Let $E$ be a Lebesgue measurable set in $\mathbb{R}$. Suppose $m(E) > 0$. Consider the set
$$E - E = \{x - y : x \in E, y \in E\}.$$
Show that $E - E$ contains an interval. Hint: Let
$$f(x) = \int \mathcal{X}_E(t) \mathcal{X}_E(x + t)\, dt.$$
Note $f$ is continuous at $0$ and $f(0) > 0$ and use continuity of translation in $L^p$.
2. ♠ Give an example of a sequence of functions in $L^p(\mathbb{R})$ which converges to zero in $L^p$ but does not converge pointwise to $0$. Does this contradict the proof of the theorem that $L^p$ is complete? You don't have to be real precise, just describe it.
3. ♠ Let $g$ be in $L^2([0, 2\pi])$ and extend it to be periodic of period $2\pi$. The $n$th partial sum of the Fourier series of $g$ is defined as $\sum_{k=-n}^{n} c_k e^{ikx}$ where $c_k$ is the Fourier coefficient given by
$$c_k = \frac{1}{2\pi} \int_0^{2\pi} e^{-iky} g(y)\, dy.$$
Show that this particular choice of $c_k$ has the property that if $a_k$ is any other choice of these constants, then
$$\left\| g - \sum_{k=-n}^{n} c_k e^{ikx} \right\|_{L^2(0, 2\pi)} \leq \left\| g - \sum_{k=-n}^{n} a_k e^{ikx} \right\|_{L^2(0, 2\pi)}$$
Thus, the partial sums of the Fourier series give the optimal approximation of $g$ in $L^2(0, 2\pi)$!
4. ↑ ♠ Now show that whenever $g$ is as above,
$$\lim_{n \to \infty} \left\| g - \sum_{k=-n}^{n} c_k e^{ikx} \right\|_{L^2(0, 2\pi)} = 0$$
Thus the Fourier series converges in $L^2(0, 2\pi)$ to the function it came from. Hint: This is not too hard if you use Problem 5 on Page 280. This does not show anything about pointwise convergence of the partial sums of the Fourier series! This will be done later. In fact the question of pointwise convergence of Fourier series to such a function in $L^2(0, 2\pi)$ was an unsolved problem till the middle 1960's.
5. ↑ ♠ Now suppose that $f$ is a continuous $2\pi$ periodic function. Suppose also that for $t \in [0, 2\pi]$,
$$f(t) = f(0) + \int_0^t g(s)\, ds$$
where $g \in L^2(0, 2\pi)$. Letting $S_n g(x) \equiv \sum_{k=-n}^{n} g_k e^{ikx}$ for
$$g_k = \frac{1}{2\pi} \int_0^{2\pi} g(s) e^{-iks}\, ds$$
the Fourier coefficient of $g$, it follows that $S_n g \to g$ in $L^2(0, 2\pi)$.
(a) Show that this convergence implies that for each $t \in [0, 2\pi]$,
$$f(t) = f(0) + \lim_{n \to \infty} \int_0^t S_n g(s)\, ds$$
Now use the integral of the sum equals the sum of the integrals on the right to find that
$$f(t) = f(0) + \lim_{n \to \infty} \sum_{k=-n,\, k \neq 0}^{n} \frac{g_k}{ik} \left( e^{ikt} - 1 \right)$$
(Note that $g_0 = 0$ because $f(2\pi) = f(0)$, so the $k = 0$ term contributes nothing.)
(b) Show that the Fourier coefficient of $f$ is given by $f_k = \frac{g_k}{ik}$ whenever $k \neq 0$. Also verify that
$$\lim_{n \to \infty} \sum_{k=-n,\, k \neq 0}^{n} \frac{g_k}{ik} = \lim_{n \to \infty} \sum_{k=-n,\, k \neq 0}^{n} f_k \equiv C$$
exists. Explain from this why for all $t \in [0, 2\pi]$,
$$f(t) = f(0) - C + \lim_{n \to \infty} \sum_{k=-n,\, k \neq 0}^{n} f_k e^{ikt}$$
(c) Finally, use the proof of the completeness of $L^2$ to argue that for some subsequence $n_m$, $\lim_{m \to \infty} n_m = \infty$,
$$f(t) = \lim_{m \to \infty} \sum_{k=-n_m,\, k \neq 0}^{n_m} f_k e^{ikt} + f_0$$
for a.e. $t \in [0, 2\pi]$, where $f_0$ is the Fourier coefficient of $f$, $f_0 = \frac{1}{2\pi} \int_0^{2\pi} f(t)\, dt$. Explain why $f(0) - C = f_0$. Explain why this gives a result for pointwise convergence of the Fourier series of $f$ and describe this result carefully. A much better result on pointwise convergence will be given later.
6. ♠ The Marcinkiewicz interpolation theorem is a very amazing and useful theorem. Here is a definition.
Definition 12.11.1 $L^p(\Omega) + L^1(\Omega)$ will denote the space of measurable functions $f$ such that $f$ is the sum of a function in $L^p(\Omega)$ and one in $L^1(\Omega)$. Also, if $T : L^p(\Omega) + L^1(\Omega) \to$ space of measurable functions, $T$ is subadditive if
$$|T(f + g)(x)| \leq |Tf(x)| + |Tg(x)|.$$
$T$ is of type $(p, p)$ if there exists a constant $A$ independent of $f \in L^p(\Omega)$ such that
$$\left\| Tf \right\|_p \leq A \left\| f \right\|_p, \quad f \in L^p(\Omega).$$
$T$ is weak type $(p, p)$ if there exists a constant $A$ independent of $f$ such that
$$\mu([x : |Tf(x)| > \alpha]) \leq \left( \frac{A}{\alpha} \left\| f \right\|_p \right)^p, \quad f \in L^p(\Omega).$$
Now here is the Marcinkiewicz interpolation theorem.
Theorem 12.11.2 Let $(\Omega, \mu, \mathcal{S})$ be a $\sigma$ finite measure space, $1 < r < \infty$, and let
$$T : L^1(\Omega) + L^r(\Omega) \to \text{space of measurable functions}$$
be subadditive, weak $(r, r)$, and weak $(1, 1)$. Then $T$ is of type $(p, p)$ for every $p \in (1, r)$ and
$$\left\| Tf \right\|_p \leq A_p \left\| f \right\|_p$$
where the constant $A_p$ depends only on $p$ and the constants in the definition of weak $(1, 1)$ and weak $(r, r)$.
This problem is to prove this wonderful theorem. Recall first Problem 13 on Page 235. From this problem or the previous problem,
$$\int_{\Omega} |f|^p d\mu = \int_0^{\infty} p t^{p-1} \mu([|f| > t])\, dt \quad (12.11.11)$$
There is nothing to prove if $f \notin L^p$ so always assume $f \in L^p$. Now for $\alpha > 0$, let
$$f_1(x) \equiv \begin{cases} f(x) & \text{if } |f(x)| \leq \alpha \\ 0 & \text{if } |f(x)| > \alpha \end{cases}, \quad f_2(x) \equiv \begin{cases} f(x) & \text{if } |f(x)| > \alpha \\ 0 & \text{if } |f(x)| \leq \alpha \end{cases}$$
Thus $f = f_1 + f_2$.
(a) Show that $f_1 \in L^r$ and $f_2 \in L^1$.
(b) Explain why
$$[|Tf| > \alpha] \subseteq \left[ |Tf_1| > \frac{\alpha}{2} \right] \cup \left[ |Tf_2| > \frac{\alpha}{2} \right].$$
(c) Using the weak type estimates and 12.11.11, explain why
$$\int |Tf|^p d\mu = \int_0^{\infty} p \alpha^{p-1} \mu([|Tf| > \alpha])\, d\alpha$$
$$\leq \int_0^{\infty} p \alpha^{p-1} \mu\left( \left[ |Tf_1| > \frac{\alpha}{2} \right] \right) d\alpha + \int_0^{\infty} p \alpha^{p-1} \mu\left( \left[ |Tf_2| > \frac{\alpha}{2} \right] \right) d\alpha$$
$$\leq p \int_0^{\infty} \alpha^{p-1} \left( \frac{2A_r}{\alpha} \left\| f_1 \right\|_{L^r} \right)^r d\alpha + p \int_0^{\infty} \alpha^{p-1} \frac{2A_1}{\alpha} \left\| f_2 \right\|_1 d\alpha.$$
(d) Now explain from the definition of $f_1$ and $f_2$ why the right side equals
$$p (2A_r)^r \int_0^{\infty} \alpha^{p-1-r} \int_{\Omega} |f_1|^r d\mu\, d\alpha + 2A_1 p \int_0^{\infty} \alpha^{p-2} \int_{\Omega} |f_2|\, d\mu\, d\alpha$$
$$= p (2A_r)^r \int_{\Omega} |f|^r \int_{|f(x)|}^{\infty} \alpha^{p-1-r}\, d\alpha\, d\mu + 2A_1 p \int_{\Omega} |f| \int_0^{|f(x)|} \alpha^{p-2}\, d\alpha\, d\mu$$
$$= \left( \frac{2^r A_r^r p}{r - p} + \frac{2 p A_1}{p - 1} \right) \left\| f \right\|^p_{L^p(\Omega)}$$
(e) Note how Fubini's theorem was used. Why were the functions of interest product measurable? After all, $f_i$ is a function of $\alpha$ although this has been suppressed. You might consider the proof of measurability which led to Problem 12 on Page 234.
7. ♠ Let $K$ be a bounded subset of $L^p(\mathbb{R}^n)$ and suppose that for each $\varepsilon > 0$ there exists $G$ such that $G$ is compact with
$$\int_{\mathbb{R}^n \setminus G} |u(x)|^p dx < \varepsilon^p$$
and for all $\varepsilon > 0$, there exists a $\delta > 0$ such that if $|h| < \delta$, then
$$\int |u(x + h) - u(x)|^p dx < \varepsilon^p$$
for all $u \in K$. Show that $K$ is precompact in $L^p(\mathbb{R}^n)$. Hint: Let $\phi_k$ be a mollifier and consider
$$K_k \equiv \{u * \phi_k : u \in K\}.$$
Verify the conditions of the Ascoli Arzela theorem for these functions defined on $G$ and show there is an $\varepsilon$ net for each $\varepsilon > 0$. Can you modify this to let an arbitrary open set take the place of $\mathbb{R}^n$? This is a very important result.
8. ♠ Let $(\Omega, d)$ be a metric space and suppose also that $(\Omega, \mathcal{S}, \mu)$ is a regular measure space such that $\mu(\Omega) < \infty$ and let $f \in L^1(\Omega)$ where $f$ has complex values. Show that for every $\varepsilon > 0$, there exists an open set of measure less than $\varepsilon$, denoted here by $V$, and a continuous function $g$ defined on $\Omega$ such that $f = g$ on $V^C$. Thus, aside from a set of small measure, $f$ is continuous. If $|f(\omega)| \leq M$, show that we can also assume $|g(\omega)| \leq M$. This is called Lusin's theorem. Hint: Use Theorems 12.5.3 and 12.2.2 to obtain a sequence of functions in $C_c(\Omega)$, $\{g_n\}$, which converges pointwise a.e. to $f$ and then use Egoroff's theorem to obtain a small set $W$ of measure less than $\varepsilon/2$ such that convergence is uniform on $W^C$. Now let $F$ be a closed subset of $W^C$ such that $\mu\left( W^C \setminus F \right) < \varepsilon/2$. Let $V = F^C$. Thus $\mu(V) < \varepsilon$ and on $F = V^C$, the convergence of $\{g_n\}$ is uniform, showing that the restriction of $f$ to $V^C$ is continuous. Now use the Tietze extension theorem.
9. ♠ Let $\varphi : \mathbb{R} \to \mathbb{R}$ be convex. This means
$$\varphi(\lambda x + (1 - \lambda) y) \leq \lambda \varphi(x) + (1 - \lambda) \varphi(y)$$
whenever $\lambda \in [0, 1]$. Verify that if $x < y < z$, then $\frac{\varphi(y) - \varphi(x)}{y - x} \leq \frac{\varphi(z) - \varphi(y)}{z - y}$ and that $\frac{\varphi(z) - \varphi(x)}{z - x} \leq \frac{\varphi(z) - \varphi(y)}{z - y}$. Show if $s \in \mathbb{R}$ there exists $\lambda$ such that $\varphi(s) \leq \varphi(t) + \lambda(s - t)$ for all $t$. Show that if $\varphi$ is convex, then $\varphi$ is continuous.
10. ↑ ♠ Prove Jensen's inequality. If $\varphi : \mathbb{R} \to \mathbb{R}$ is convex, $\mu(\Omega) = 1$, and $f : \Omega \to \mathbb{R}$ is in $L^1(\Omega)$, then $\varphi\left( \int_{\Omega} f\, d\mu \right) \leq \int_{\Omega} \varphi(f)\, d\mu$. Hint: Let $s = \int_{\Omega} f\, d\mu$ and use Problem 9.
11. ♠ Let $\frac{1}{p} + \frac{1}{p'} = 1$, $p > 1$, let $f \in L^p(\mathbb{R})$, $g \in L^{p'}(\mathbb{R})$. Show $f * g$ is uniformly continuous on $\mathbb{R}$ and $|(f * g)(x)| \leq \left\| f \right\|_{L^p} \left\| g \right\|_{L^{p'}}$. Hint: You need to consider why $f * g$ exists and then this follows from the definition of convolution and continuity of translation in $L^p$.
12. ♠ $B(p, q) = \int_0^1 x^{p-1} (1 - x)^{q-1}\, dx$, $\Gamma(p) = \int_0^{\infty} e^{-t} t^{p-1}\, dt$ for $p, q > 0$. The first of these is called the beta function, while the second is the gamma function. Show a.) $\Gamma(p + 1) = p\Gamma(p)$; b.) $\Gamma(p)\Gamma(q) = B(p, q)\Gamma(p + q)$.
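Part b.) can be sanity-checked numerically against Python's built-in gamma function, using a midpoint rule for the beta integral (the parameter pairs and grid size are illustrative; taking $p, q > 1$ keeps the integrand bounded):

```python
import math

# Check B(p, q) = Gamma(p) Gamma(q) / Gamma(p + q) numerically.
def beta_numeric(p, q, n=200_000):
    """Midpoint-rule approximation of the beta integral on (0, 1)."""
    h = 1.0 / n
    return sum(((k + 0.5) * h) ** (p - 1) * (1.0 - (k + 0.5) * h) ** (q - 1)
               for k in range(n)) * h

diffs = []
for p, q in [(2.0, 3.0), (1.5, 2.5)]:
    exact = math.gamma(p) * math.gamma(q) / math.gamma(p + q)
    diffs.append(abs(beta_numeric(p, q) - exact))
# both differences are tiny quadrature errors
```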
13. Let f ∈ C_c(0, ∞) and define F(x) = (1/x) ∫₀^x f(t) dt. Show

‖F‖_{L^p(0,∞)} ≤ (p/(p − 1)) ‖f‖_{L^p(0,∞)} whenever p > 1.

Hint: Argue there is no loss of generality in assuming f ≥ 0 and then assume this is so. Integrate ∫₀^∞ |F(x)|^p dx by parts as follows:

∫₀^∞ F^p dx = [x F^p]₀^∞ − p ∫₀^∞ x F^{p−1} F′ dx,

where the boundary term is to be shown equal to 0. Now show xF′ = f − F and use this in the last integral. Complete the argument by using Hölder's inequality and p − 1 = p/q.
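The inequality in this problem can be watched numerically for a particular sample f; the bump sin²(π(t − 1)) supported on [1, 2] and the truncation of the x-integral at 4 are assumptions of this sketch, and truncating only shrinks the left side, so the comparison is conservative.

```python
import math

# Sample f ≥ 0 with compact support in (0, ∞): a smooth bump on [1, 2].
def f(t):
    return math.sin(math.pi * (t - 1.0)) ** 2 if 1.0 <= t <= 2.0 else 0.0

p = 2.0
n = 40000
h = 4.0 / n            # integrate over (0, 4]; f vanishes beyond t = 2
running = 0.0          # ∫_0^x f(t) dt via Riemann sums
lhs = 0.0              # ∫ |F(x)|^p dx (truncated at 4)
rhs = 0.0              # ∫ |f(x)|^p dx
for i in range(1, n + 1):
    x = i * h
    running += f(x) * h
    F = running / x
    lhs += F ** p * h
    rhs += f(x) ** p * h
# Hardy: ||F||_p ≤ (p/(p−1)) ||f||_p
print(lhs ** (1 / p), (p / (p - 1)) * rhs ** (1 / p))
assert lhs ** (1 / p) <= (p / (p - 1)) * rhs ** (1 / p)
```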
14. ↑ Now suppose f ∈ L^p(0, ∞), p > 1, and f is not necessarily in C_c(0, ∞). Show that F(x) = (1/x) ∫₀^x f(t) dt still makes sense for each x > 0. Show the inequality of Problem 13 is still valid. This inequality is called Hardy's inequality. Hint: To show this, use the above inequality along with the density of C_c(0, ∞) in L^p(0, ∞).
15. Suppose f, g ≥ 0. When does equality hold in Hölder's inequality?
16. ♠ Prove Vitali's convergence theorem: Let {f_n} be uniformly integrable and complex valued, µ(Ω) < ∞, and f_n(x) → f(x) a.e. where f is measurable. Then f ∈ L¹ and

lim_{n→∞} ∫_Ω |f_n − f| dµ = 0.

Hint: Use Egoroff's theorem to show {f_n} is a Cauchy sequence in L¹(Ω).
17. ↑ ♠ Show the Vitali convergence theorem implies the dominated convergence theorem for finite measure spaces, but that there exist examples where the Vitali convergence theorem applies and the dominated convergence theorem does not.
18. ♠ Suppose f ∈ L^∞ ∩ L¹. Show lim_{p→∞} ‖f‖_{L^p} = ‖f‖_∞. Hint:

(‖f‖_∞ − ε)^p µ([|f| > ‖f‖_∞ − ε]) ≤ ∫_{[|f| > ‖f‖_∞ − ε]} |f|^p dµ ≤ ∫ |f|^p dµ = ∫ |f|^{p−1} |f| dµ ≤ ‖f‖_∞^{p−1} ∫ |f| dµ.

Now raise both ends to the 1/p power and take lim inf and lim sup as p → ∞. You should get

‖f‖_∞ − ε ≤ lim inf ‖f‖_p ≤ lim sup ‖f‖_p ≤ ‖f‖_∞.
19. ♠ Suppose µ(Ω) < ∞. Show that if 1 ≤ p < q, then L^q(Ω) ⊆ L^p(Ω). Hint: Use Hölder's inequality.
20. Show L¹(ℝ) ⊈ L²(ℝ) and L²(ℝ) ⊈ L¹(ℝ) if Lebesgue measure is used. Hint: Consider 1/√x and 1/x.
21. ♠ Suppose that θ ∈ [0, 1] and r, s, q > 0 with

1/q = θ/r + (1 − θ)/s.

Show that

(∫ |f|^q dµ)^{1/q} ≤ ((∫ |f|^r dµ)^{1/r})^θ ((∫ |f|^s dµ)^{1/s})^{1−θ}.

If q, r, s ≥ 1 this says that

‖f‖_q ≤ ‖f‖_r^θ ‖f‖_s^{1−θ}.

Using this, show that

ln(‖f‖_q) ≤ θ ln(‖f‖_r) + (1 − θ) ln(‖f‖_s).

Hint: ∫ |f|^q dµ = ∫ |f|^{qθ} |f|^{q(1−θ)} dµ. Now note that 1 = θq/r + q(1−θ)/s and use Hölder's inequality.
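Since the interpolation inequality holds for any measure, it can be sanity-checked on a discrete one; the atoms, weights, and exponents below are arbitrary choices.

```python
# ||f||_q ≤ ||f||_r^θ ||f||_s^{1−θ} when 1/q = θ/r + (1−θ)/s, on a discrete measure
vals = [0.4, 2.0, 1.1, 3.3, 0.7]
wts  = [0.5, 0.1, 1.2, 0.3, 0.9]

def norm(p):
    return sum(w * abs(v) ** p for v, w in zip(vals, wts)) ** (1.0 / p)

theta, r, s = 0.4, 1.5, 6.0
q = 1.0 / (theta / r + (1 - theta) / s)
lhs = norm(q)
rhs = norm(r) ** theta * norm(s) ** (1 - theta)
print(lhs, rhs)
assert lhs <= rhs + 1e-12
```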
22. Suppose f is a function in L¹(ℝ) and f is infinitely differentiable. Does it follow that f′ ∈ L¹(ℝ)? Is f′ measurable? Hint: What if φ ∈ C_c^∞(0, 1) and f(x) = φ(2^n(x − n)) for x ∈ (n, n + 1), with f(x) = 0 if x < 0?
23. ♠ For a function f ∈ L¹(ℝ^p), the Fourier transform Ff is given by

Ff(t) ≡ (1/(√(2π))^p) ∫_{ℝ^p} e^{−it·x} f(x) dx,

and the so-called inverse Fourier transform F^{−1}f is defined by

F^{−1}f(t) ≡ (1/(√(2π))^p) ∫_{ℝ^p} e^{it·x} f(x) dx.

Show that if f ∈ L¹(ℝ^p), then lim_{|x|→∞} Ff(x) = 0. Hint: You might try to show this first for f ∈ C_c^∞(ℝ^p).
Fourier Transforms
13.1 Fourier Transforms Of Functions In 𝒢

Let 𝒢 be the functions of Definition 12.7.2 except, for the sake of convenience, remove all references to rational numbers. Thus 𝒢 consists of finite sums of polynomials having coefficients in ℂ times e^{−ax²} for some a > 0. The idea is to first understand the Fourier transform on these very specialized functions.
Definition 13.1.1 For ψ ∈ 𝒢, define the Fourier transform F and the inverse Fourier transform F^{−1} by

Fψ(t) ≡ (2π)^{−n/2} ∫_{ℝⁿ} e^{−it·x} ψ(x) dx,

F^{−1}ψ(t) ≡ (2π)^{−n/2} ∫_{ℝⁿ} e^{it·x} ψ(x) dx,

where t·x ≡ Σ_{i=1}^n t_i x_i. Note there is no problem with this definition because ψ is in L¹(ℝⁿ) and therefore

|e^{it·x} ψ(x)| ≤ |ψ(x)|,

an integrable function.
One reason for using the functions of 𝒢 is that it is very easy to compute the Fourier transform of these functions. The first thing to do is to verify that F and F^{−1} map 𝒢 to 𝒢 and that F^{−1} ∘ F(ψ) = ψ.
Lemma 13.1.2 The following holds for c > 0:

(1/(2π))^{n/2} ∫_{ℝⁿ} e^{−c|t|²} e^{−is·t} dt = (1/(2π))^{n/2} ∫_{ℝⁿ} e^{−c|t|²} e^{is·t} dt
= (1/(2π))^{n/2} e^{−|s|²/(4c)} (√π/√c)^n = (1/(2c))^{n/2} e^{−|s|²/(4c)}.  (13.1.1)
Proof: Consider first the case of one dimension. Let H(s) be given by

H(s) ≡ ∫_ℝ e^{−ct²} e^{−ist} dt = ∫_ℝ e^{−ct²} cos(st) dt.

Then using the dominated convergence theorem to differentiate,

H′(s) = ∫_ℝ (−e^{−ct²}) t sin(st) dt = [ (e^{−ct²}/(2c)) sin(st) ]_{−∞}^∞ − (s/(2c)) ∫_ℝ e^{−ct²} cos(st) dt = −(s/(2c)) H(s).

Also H(0) = ∫_ℝ e^{−ct²} dt. Thus H(0) = ∫_ℝ e^{−cx²} dx ≡ I and so

I² = ∫_{ℝ²} e^{−c(x²+y²)} dx dy = ∫₀^∞ ∫₀^{2π} e^{−cr²} r dθ dr = π/c.

For another proof of this which does not use change of variables and polar coordinates, see Problem 14 below. Hence

H′(s) + (s/(2c)) H(s) = 0,  H(0) = √(π/c).

It follows that H(s) = e^{−s²/(4c)} √π/√c. Hence

(1/√(2π)) ∫_ℝ e^{−ct²} e^{−ist} dt = √(π/c) (1/√(2π)) e^{−s²/(4c)} = (1/(2c))^{1/2} e^{−s²/(4c)}.

This proves the formula in the case of one dimension. The case of the inverse Fourier transform is similar. The n dimensional formula follows from Fubini's theorem.
With these formulas, it is easy to verify that F and F^{−1} map 𝒢 to 𝒢 and that F ∘ F^{−1} = F^{−1} ∘ F = id.
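Formula 13.1.1 in one dimension can be verified numerically; the trapezoid rule on [−R, R] is essentially exact here because the integrand is analytic and rapidly decaying (the values c = 0.7, s = 2.0 below are arbitrary choices for this sketch).

```python
import math, cmath

def gauss_ft(c, s, R=30.0, n=20000):
    # Trapezoid approximation of (1/√(2π)) ∫_ℝ e^{−c t²} e^{−i s t} dt
    h = 2 * R / n
    total = 0.0 + 0.0j
    for k in range(n + 1):
        t = -R + k * h
        w = 0.5 if k in (0, n) else 1.0
        total += w * cmath.exp(-c * t * t - 1j * s * t) * h
    return total / math.sqrt(2 * math.pi)

c, s = 0.7, 2.0
exact = (1 / (2 * c)) ** 0.5 * math.exp(-s * s / (4 * c))   # (1/(2c))^{1/2} e^{−s²/(4c)}
approx = gauss_ft(c, s)
print(approx, exact)
assert abs(approx - exact) < 1e-8
```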
Theorem 13.1.3 Each of F and F^{−1} maps 𝒢 to 𝒢. Also F^{−1} ∘ F(ψ) = ψ and F ∘ F^{−1}(ψ) = ψ.
Proof: To make the notation simpler, ∫ will symbolize (2π)^{−n/2} ∫_{ℝⁿ}. Also, let f_b(x) ≡ e^{−b|x|²}. Then from the above,

F f_b = (2b)^{−n/2} f_{(4b)^{−1}}.

The first claim will be shown if it is shown that Fψ ∈ 𝒢 for ψ(x) = x^α e^{−b|x|²}, because an arbitrary function of 𝒢 is a finite sum of scalar multiples of functions such as ψ. Using Lemma 13.1.2,

Fψ(t) ≡ ∫ e^{−it·x} x^α e^{−b|x|²} dx = (−i)^{−α} D_t^α ( ∫ e^{−it·x} e^{−b|x|²} dx ) = (−i)^{−α} D_t^α ( e^{−|t|²/(4b)} (√π/√b)^n ),

and this is clearly in 𝒢 because it equals a polynomial times e^{−|t|²/(4b)}. Similarly, F^{−1} : 𝒢 → 𝒢. Now consider F^{−1} ∘ F(ψ)(s). From the above, and integrating by parts,

F^{−1} ∘ F(ψ)(s) = (−i)^{−α} ∫ e^{is·t} D_t^α ( ∫ e^{−it·x} e^{−b|x|²} dx ) dt
= (−i)^{−α} (−i)^α s^α ∫ e^{is·t} ( ∫ e^{−it·x} e^{−b|x|²} dx ) dt
= s^α F^{−1}(F(f_b))(s).

Now

F^{−1}(F(f_b))(s) = F^{−1}( (2b)^{−n/2} f_{(4b)^{−1}} )(s) = (2b)^{−n/2} F^{−1}( f_{(4b)^{−1}} )(s)
= (2b)^{−n/2} ( 2(4b)^{−1} )^{−n/2} f_{(4(4b)^{−1})^{−1}}(s) = f_b(s).

Hence F^{−1} ∘ F(ψ)(s) = s^α f_b(s) = ψ(s).
13.2 Fourier Transforms Of Just About Anything

13.2.1 Fourier Transforms Of 𝒢*

Definition 13.2.1 Let 𝒢* denote the vector space of linear functions defined on 𝒢 which have values in ℂ. Thus T ∈ 𝒢* means T : 𝒢 → ℂ and T is linear,

T(aψ + bφ) = aT(ψ) + bT(φ) for all a, b ∈ ℂ, ψ, φ ∈ 𝒢.

Let ψ ∈ 𝒢. Then ψ can be regarded as an element of 𝒢* by defining

ψ(φ) ≡ ∫_{ℝⁿ} ψ(x) φ(x) dx.

Then we have the following important lemma.
Lemma 13.2.2 The following holds for all φ, ψ ∈ 𝒢:

Fψ(φ) = ψ(Fφ),  F^{−1}ψ(φ) = ψ(F^{−1}φ).

Also if ψ ∈ 𝒢 and ψ = 0 in 𝒢*, so that ψ(φ) = 0 for all φ ∈ 𝒢, then ψ = 0 as a function.

Proof:

Fψ(φ) ≡ ∫_{ℝⁿ} Fψ(t) φ(t) dt
= ∫_{ℝⁿ} ( (1/(2π))^{n/2} ∫_{ℝⁿ} e^{−it·x} ψ(x) dx ) φ(t) dt
= ∫_{ℝⁿ} ψ(x) ( (1/(2π))^{n/2} ∫_{ℝⁿ} e^{−it·x} φ(t) dt ) dx
= ∫_{ℝⁿ} ψ(x) Fφ(x) dx ≡ ψ(Fφ).

The other claim is similar. Suppose now ψ(φ) = 0 for all φ ∈ 𝒢. Then ∫_{ℝⁿ} ψφ dx = 0 for all φ ∈ 𝒢. In particular this is true for φ = ψ̄, which is again in 𝒢, so ∫ |ψ|² dx = 0 and ψ = 0.
This lemma suggests a way to define the Fourier transform of something in 𝒢*.

Definition 13.2.3 For T ∈ 𝒢*, define FT, F^{−1}T ∈ 𝒢* by

FT(φ) ≡ T(Fφ),  F^{−1}T(φ) ≡ T(F^{−1}φ).
Lemma 13.2.4 F and F^{−1} are both one to one, onto, and are inverses of each other.

Proof: First note F and F^{−1} are both linear; this follows directly from the definition. Suppose now FT = 0. Then FT(φ) = T(Fφ) = 0 for all φ ∈ 𝒢. But F and F^{−1} map 𝒢 onto 𝒢 because if ψ ∈ 𝒢, then as shown above, ψ = F(F^{−1}(ψ)). Therefore, T = 0 and so F is one to one. Similarly F^{−1} is one to one. Now

F^{−1}(FT)(φ) ≡ (FT)(F^{−1}φ) ≡ T( F(F^{−1}(φ)) ) = Tφ.

Therefore, F^{−1} ∘ F(T) = T. Similarly, F ∘ F^{−1}(T) = T. Thus both F and F^{−1} are one to one and onto and are inverses of each other, as suggested by the notation.

Probably the most interesting things in 𝒢* are functions of various kinds. The following lemma will be useful in considering this situation.
Lemma 13.2.5 If f ∈ L¹_loc(ℝⁿ) and ∫_{ℝⁿ} fφ dx = 0 for all φ ∈ C_c(ℝⁿ), then f = 0 a.e.

Proof: Let E be bounded and Lebesgue measurable. By regularity, there exist a compact set K_k ⊆ E and an open set V_k ⊇ E such that m_n(V_k \ K_k) < 2^{−k}. Let h_k equal 1 on K_k, vanish on V_k^C, and take values between 0 and 1. Then h_k converges to χ_E off ∩_{k=1}^∞ ∪_{l=k}^∞ (V_l \ K_l), a set of measure zero. Hence, by the dominated convergence theorem,

∫ f χ_E dm_n = lim_{k→∞} ∫ f h_k dm_n = 0.

It follows that for E an arbitrary Lebesgue measurable set,

∫ f χ_{B(0,R)} χ_E dm_n = 0.

Let

sgn f = f̄/|f| if |f| ≠ 0, and sgn f = 0 if |f| = 0.

By Theorem 9.1.3, there exists a sequence {s_k} of simple functions converging pointwise to sgn f such that |s_k| ≤ 1. Then by the dominated convergence theorem again,

∫ |f| χ_{B(0,R)} dm_n = lim_{k→∞} ∫ f χ_{B(0,R)} s_k dm_n = 0.

Since R is arbitrary, |f| = 0 a.e.
Corollary 13.2.6 Let f ∈ L¹(ℝⁿ) and suppose

∫_{ℝⁿ} f(x) φ(x) dx = 0

for all φ ∈ 𝒢. Then f = 0 a.e.

Proof: Let ψ ∈ C_c(ℝⁿ). Then by the Stone–Weierstrass approximation theorem, there exists a sequence of functions {φ_k} ⊆ 𝒢 such that φ_k → ψ uniformly. Then by the dominated convergence theorem,

∫ fψ dx = lim_{k→∞} ∫ fφ_k dx = 0.

By Lemma 13.2.5, f = 0.
The next theorem is the main result of this sort.

Theorem 13.2.7 Let f ∈ L^p(ℝⁿ), p ≥ 1, or suppose f is measurable and has polynomial growth,

|f(x)| ≤ K (1 + |x|²)^m for some m ∈ ℕ.

Then if ∫ fψ dx = 0 for all ψ ∈ 𝒢, it follows that f = 0.
Proof: First note that if f ∈ L^p(ℝⁿ) or has polynomial growth, then it makes sense to write the integral ∫ fψ dx described above. This is obvious in the case of polynomial growth. In the case where f ∈ L^p(ℝⁿ) it also makes sense because

∫ |f| |ψ| dx ≤ ( ∫ |f|^p dx )^{1/p} ( ∫ |ψ|^q dx )^{1/q} < ∞,

due to the fact mentioned above that all the functions in 𝒢 are in L^q(ℝⁿ) for every q ≥ 1. Suppose now that f ∈ L^p, p ≥ 1. The case where f ∈ L¹(ℝⁿ) was dealt with in Corollary 13.2.6. Suppose f ∈ L^p(ℝⁿ) for p > 1. Then

|f|^{p−2} f̄ ∈ L^q(ℝⁿ),  (1/p + 1/q = 1),

and by density of 𝒢 in L^q(ℝⁿ) (Theorem 12.7.4), there exists a sequence {g_k} ⊆ 𝒢 such that

‖g_k − |f|^{p−2} f̄‖_q → 0.

Then

∫_{ℝⁿ} |f|^p dx = ∫_{ℝⁿ} f ( |f|^{p−2} f̄ − g_k ) dx + ∫_{ℝⁿ} f g_k dx = ∫_{ℝⁿ} f ( |f|^{p−2} f̄ − g_k ) dx ≤ ‖f‖_{L^p} ‖g_k − |f|^{p−2} f̄‖_q,

which converges to 0. Hence f = 0.

It remains to consider the case where f has polynomial growth. In this case x → f(x) e^{−|x|²} ∈ L¹(ℝⁿ). Therefore, for all ψ ∈ 𝒢,

0 = ∫ f(x) e^{−|x|²} ψ(x) dx,

because e^{−|x|²} ψ(x) ∈ 𝒢. Therefore, by the first part, f(x) e^{−|x|²} = 0 a.e., and so f = 0 a.e.
Note that "polynomial growth" could be replaced with a condition of the form

|f(x)| ≤ K (1 + |x|²)^m e^{k|x|^α},  α < 2,

and the same proof would show that these functions are in 𝒢*. The main thing to observe is that almost all functions of interest are in 𝒢*.
Theorem 13.2.8 Let f be a measurable function with polynomial growth,

|f(x)| ≤ C (1 + |x|²)^N for some N,

or let f ∈ L^p(ℝⁿ) for some p ∈ [1, ∞]. Then f ∈ 𝒢* if

f(φ) ≡ ∫ fφ dx.

Proof: Let f have polynomial growth first. Then the above integral is clearly well defined and so, in this case, f ∈ 𝒢*. Next suppose f ∈ L^p(ℝⁿ) with ∞ ≥ p ≥ 1. Then it is clear again that the above integral is well defined because of the fact that φ is a sum of polynomials times exponentials of the form e^{−c|x|²}, and these are in L^q(ℝⁿ) for q the conjugate exponent. Also φ → f(φ) is clearly linear in both cases.
This has shown that for nearly any reasonable function, you can define its Fourier transform as described above. You could also define the Fourier transform of a finite Borel measure µ because for such a measure,

ψ → ∫_{ℝⁿ} ψ dµ

is a linear functional on 𝒢. This includes the very important case of probability distribution measures. The theoretical basis for this assertion will be given a little later.
13.2.2 Fourier Transforms Of Functions In L¹(ℝⁿ)

First suppose f ∈ L¹(ℝⁿ).
Theorem 13.2.9 Let f ∈ L¹(ℝⁿ). Then Ff(φ) = ∫_{ℝⁿ} gφ dt where

g(t) = (1/(2π))^{n/2} ∫_{ℝⁿ} e^{−it·x} f(x) dx,

and F^{−1}f(φ) = ∫_{ℝⁿ} gφ dt where g(t) = (1/(2π))^{n/2} ∫_{ℝⁿ} e^{it·x} f(x) dx. In short,

Ff(t) ≡ (2π)^{−n/2} ∫_{ℝⁿ} e^{−it·x} f(x) dx,
F^{−1}f(t) ≡ (2π)^{−n/2} ∫_{ℝⁿ} e^{it·x} f(x) dx.
Proof: From the definition and Fubini's theorem,

Ff(φ) ≡ ∫_{ℝⁿ} f(t) Fφ(t) dt = ∫_{ℝⁿ} f(t) ( (1/(2π))^{n/2} ∫_{ℝⁿ} e^{−it·x} φ(x) dx ) dt
= ∫_{ℝⁿ} ( (1/(2π))^{n/2} ∫_{ℝⁿ} f(t) e^{−it·x} dt ) φ(x) dx.

Since φ ∈ 𝒢 is arbitrary, it follows from Theorem 13.2.7 that Ff(x) is given by the claimed formula. The case of F^{−1} is identical.
Here are interesting properties of these Fourier transforms of functions in L¹.

Theorem 13.2.10 If f ∈ L¹(ℝⁿ) and ‖f_k − f‖₁ → 0, then Ff_k and F^{−1}f_k converge uniformly to Ff and F^{−1}f respectively. If f ∈ L¹(ℝⁿ), then F^{−1}f and Ff are both continuous and bounded. Also,

lim_{|x|→∞} F^{−1}f(x) = lim_{|x|→∞} Ff(x) = 0.  (13.2.2)

Furthermore, for f ∈ L¹(ℝⁿ), both Ff and F^{−1}f are uniformly continuous.
Proof: The first claim follows from the inequality

|Ff_k(t) − Ff(t)| ≤ (2π)^{−n/2} ∫_{ℝⁿ} | e^{−it·x} f_k(x) − e^{−it·x} f(x) | dx = (2π)^{−n/2} ∫_{ℝⁿ} |f_k(x) − f(x)| dx = (2π)^{−n/2} ‖f − f_k‖₁,

with a similar argument holding for F^{−1}.
Now consider the second claim of the theorem.

|Ff(t) − Ff(t′)| ≤ (2π)^{−n/2} ∫_{ℝⁿ} | e^{−it·x} − e^{−it′·x} | |f(x)| dx.

The integrand is bounded by 2|f(x)|, a function in L¹(ℝⁿ), and converges to 0 as t′ → t, and so the dominated convergence theorem implies Ff is continuous. To see Ff(t) is uniformly bounded,

|Ff(t)| ≤ (2π)^{−n/2} ∫_{ℝⁿ} |f(x)| dx < ∞.

A similar argument gives the same conclusions for F^{−1}.
It remains to verify 13.2.2 and the claim that Ff and F^{−1}f are uniformly continuous. First,

|Ff(t)| ≤ | (2π)^{−n/2} ∫_{ℝⁿ} e^{−it·x} f(x) dx |.

Now let ε > 0 be given and let g ∈ C_c^∞(ℝⁿ) be such that (2π)^{−n/2} ‖g − f‖₁ < ε/2. Then

|Ff(t)| ≤ (2π)^{−n/2} ∫_{ℝⁿ} |f(x) − g(x)| dx + | (2π)^{−n/2} ∫_{ℝⁿ} e^{−it·x} g(x) dx |
≤ ε/2 + | (2π)^{−n/2} ∫_{ℝⁿ} e^{−it·x} g(x) dx |.

Now integrating by parts, it follows that for ‖t‖_∞ ≡ max{|t_j| : j = 1, …, n} > 0,

|Ff(t)| ≤ ε/2 + (2π)^{−n/2} | (1/‖t‖_∞) ∫_{ℝⁿ} Σ_{j=1}^n |∂g(x)/∂x_j| dx |  (13.2.3)

and this last expression converges to zero as ‖t‖_∞ → ∞. The reason for this is that if t_j ≠ 0, integration by parts with respect to x_j gives

(2π)^{−n/2} ∫_{ℝⁿ} e^{−it·x} g(x) dx = (2π)^{−n/2} (1/(−it_j)) ∫_{ℝⁿ} e^{−it·x} ∂g(x)/∂x_j dx.

Therefore, choose the j for which ‖t‖_∞ = |t_j| and the bound of 13.2.3 holds. Therefore, from 13.2.3, if ‖t‖_∞ is large enough, |Ff(t)| < ε. Similarly, lim_{|t|→∞} F^{−1}f(t) = 0.

Consider the claim about uniform continuity. Let ε > 0 be given. Then there exists R such that if ‖t‖_∞ > R, then |Ff(t)| < ε/2. Since Ff is continuous, it is uniformly continuous on the compact set [−R − 1, R + 1]^n. Therefore, there exists δ₁ such that if ‖t − t′‖_∞ < δ₁ for t′, t ∈ [−R − 1, R + 1]^n, then

|Ff(t) − Ff(t′)| < ε/2.  (13.2.4)

Now let 0 < δ < min(δ₁, 1) and suppose ‖t − t′‖_∞ < δ. If both t, t′ are contained in [−R, R]^n, then 13.2.4 holds. If t ∈ [−R, R]^n and t′ ∉ [−R, R]^n, then both are contained in [−R − 1, R + 1]^n, and so this verifies 13.2.4 in this case. The other case is that neither point is in [−R, R]^n, and in this case,

|Ff(t) − Ff(t′)| ≤ |Ff(t)| + |Ff(t′)| < ε/2 + ε/2 = ε.
There is a very interesting relation between the Fourier transform and convolutions.

Theorem 13.2.11 Let f, g ∈ L¹(ℝⁿ). Then f ∗ g ∈ L¹ and F(f ∗ g) = (2π)^{n/2} Ff Fg.
Proof: Consider

∫_{ℝⁿ} ∫_{ℝⁿ} |f(x−y) g(y)| dy dx.

The function (x, y) → |f(x−y) g(y)| is Lebesgue measurable, and so by Fubini's theorem,

∫_{ℝⁿ} ∫_{ℝⁿ} |f(x−y) g(y)| dy dx = ∫_{ℝⁿ} ∫_{ℝⁿ} |f(x−y) g(y)| dx dy = ‖f‖₁ ‖g‖₁ < ∞.

It follows that for a.e. x, ∫_{ℝⁿ} |f(x−y) g(y)| dy < ∞, and for each of these values of x, the integral ∫_{ℝⁿ} f(x−y) g(y) dy exists and equals a function of x which is in L¹(ℝⁿ), namely f ∗ g(x). Now

F(f ∗ g)(t) ≡ (2π)^{−n/2} ∫_{ℝⁿ} e^{−it·x} f ∗ g(x) dx
= (2π)^{−n/2} ∫_{ℝⁿ} e^{−it·x} ∫_{ℝⁿ} f(x−y) g(y) dy dx
= (2π)^{−n/2} ∫_{ℝⁿ} e^{−it·y} g(y) ∫_{ℝⁿ} e^{−it·(x−y)} f(x−y) dx dy
= (2π)^{n/2} Ff(t) Fg(t).

There are many other considerations involving Fourier transforms of functions in L¹(ℝⁿ). Some others are in the exercises.
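The identity can be checked numerically in one dimension with f = g = e^{−x²}, whose convolution has the closed form √(π/2) e^{−x²/2} (a standard Gaussian computation); both sides of F(f ∗ f) = (2π)^{1/2} (Ff)² then agree to quadrature accuracy. The grids below are choices of this sketch.

```python
import math, cmath

def ft(func, t, R=12.0, n=8000):
    # (1/√(2π)) ∫_{−R}^{R} func(x) e^{−i t x} dx by the trapezoid rule
    h = 2 * R / n
    s = 0.0 + 0.0j
    for k in range(n + 1):
        x = -R + k * h
        w = 0.5 if k in (0, n) else 1.0
        s += w * func(x) * cmath.exp(-1j * t * x) * h
    return s / math.sqrt(2 * math.pi)

f = lambda x: math.exp(-x * x)
# f∗f has the closed form √(π/2) e^{−x²/2} (Gaussian convolved with itself).
conv = lambda x: math.sqrt(math.pi / 2) * math.exp(-x * x / 2)

for t in (0.0, 1.0, 2.5):
    lhs = ft(conv, t)                                  # F(f∗f)(t)
    rhs = (2 * math.pi) ** 0.5 * ft(f, t) ** 2         # √(2π) Ff(t) Ff(t)
    assert abs(lhs - rhs) < 1e-8
print(ft(conv, 1.0).real)
```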
13.2.3 Fourier Transforms Of Functions In L²(ℝⁿ)

Consider Ff and F^{−1}f for f ∈ L²(ℝⁿ). First note that the formula given for Ff and F^{−1}f when f ∈ L¹(ℝⁿ) will not work for f ∈ L²(ℝⁿ) unless f is also in L¹(ℝⁿ). Recall that \overline{a + ib} = a − ib.
Theorem 13.2.12 For φ ∈ 𝒢, ‖Fφ‖₂ = ‖F^{−1}φ‖₂ = ‖φ‖₂.
Proof: First note that for ψ ∈ 𝒢,

F(ψ̄) = \overline{F^{−1}(ψ)},  F^{−1}(ψ̄) = \overline{F(ψ)}.  (13.2.5)

This follows from the definition. For example,

Fψ̄(t) = (2π)^{−n/2} ∫_{ℝⁿ} e^{−it·x} ψ̄(x) dx = \overline{ (2π)^{−n/2} ∫_{ℝⁿ} e^{it·x} ψ(x) dx }.

Let φ, ψ ∈ 𝒢. It was shown above that

∫_{ℝⁿ} (Fφ)ψ(t) dt = ∫_{ℝⁿ} φ(Fψ) dx.

Similarly,

∫_{ℝⁿ} φ(F^{−1}ψ) dx = ∫_{ℝⁿ} (F^{−1}φ)ψ dt.  (13.2.6)

Now 13.2.5 and 13.2.6 imply

∫_{ℝⁿ} |φ|² dx = ∫_{ℝⁿ} φφ̄ dx = ∫_{ℝⁿ} φ \overline{F^{−1}(Fφ)} dx = ∫_{ℝⁿ} φ F(\overline{Fφ}) dx = ∫_{ℝⁿ} Fφ \overline{Fφ} dx = ∫_{ℝⁿ} |Fφ|² dx.

Similarly, ‖φ‖₂ = ‖F^{−1}φ‖₂.
Lemma 13.2.13 Let f ∈ L²(ℝⁿ) and let φ_k → f in L²(ℝⁿ) where φ_k ∈ 𝒢. (Such a sequence exists because of density of 𝒢 in L²(ℝⁿ).) Then Ff and F^{−1}f are both in L²(ℝⁿ) and the following limits take place in L²:

lim_{k→∞} F(φ_k) = F(f),  lim_{k→∞} F^{−1}(φ_k) = F^{−1}(f).
Proof: Let ψ ∈ 𝒢 be given. Then

Ff(ψ) ≡ f(Fψ) ≡ ∫_{ℝⁿ} f(x) Fψ(x) dx = lim_{k→∞} ∫_{ℝⁿ} φ_k(x) Fψ(x) dx = lim_{k→∞} ∫_{ℝⁿ} Fφ_k(x) ψ(x) dx.

Also, by Theorem 13.2.12, {Fφ_k}_{k=1}^∞ is Cauchy in L²(ℝⁿ) and so it converges to some h ∈ L²(ℝⁿ). Therefore, from the above,

Ff(ψ) = ∫_{ℝⁿ} h(x) ψ(x) dx,

which shows that F(f) ∈ L²(ℝⁿ) and h = F(f). The case of F^{−1} is entirely similar.

Since Ff and F^{−1}f are in L²(ℝⁿ), this also proves the following theorem.
Theorem 13.2.14 If f ∈ L²(ℝⁿ), then Ff and F^{−1}f are the unique elements of L²(ℝⁿ) such that for all φ ∈ 𝒢,

∫_{ℝⁿ} Ff(x) φ(x) dx = ∫_{ℝⁿ} f(x) Fφ(x) dx,  (13.2.7)

∫_{ℝⁿ} F^{−1}f(x) φ(x) dx = ∫_{ℝⁿ} f(x) F^{−1}φ(x) dx.  (13.2.8)
Theorem 13.2.15 (Plancherel)

‖f‖₂ = ‖Ff‖₂ = ‖F^{−1}f‖₂.  (13.2.9)

Proof: Use the density of 𝒢 in L²(ℝⁿ) to obtain a sequence {φ_k} converging to f in L²(ℝⁿ). Then by Lemma 13.2.13,

‖Ff‖₂ = lim_{k→∞} ‖Fφ_k‖₂ = lim_{k→∞} ‖φ_k‖₂ = ‖f‖₂.

Similarly, ‖f‖₂ = ‖F^{−1}f‖₂.
The following corollary is a simple generalization of this. To prove it, use the following simple lemma, which comes as a consequence of the Cauchy–Schwarz inequality.

Lemma 13.2.16 Suppose f_k → f in L²(ℝⁿ) and g_k → g in L²(ℝⁿ). Then

lim_{k→∞} ∫_{ℝⁿ} f_k g_k dx = ∫_{ℝⁿ} f g dx.
Proof:

| ∫_{ℝⁿ} f_k g_k dx − ∫_{ℝⁿ} f g dx | ≤ | ∫_{ℝⁿ} f_k g_k dx − ∫_{ℝⁿ} f_k g dx | + | ∫_{ℝⁿ} f_k g dx − ∫_{ℝⁿ} f g dx |
≤ ‖f_k‖₂ ‖g − g_k‖₂ + ‖g‖₂ ‖f_k − f‖₂.

Now {‖f_k‖₂} is a Cauchy sequence and so it is bounded independent of k. Therefore, the above expression is smaller than ε whenever k is large enough.
Corollary 13.2.17 For f, g ∈ L²(ℝⁿ),

∫_{ℝⁿ} f ḡ dx = ∫_{ℝⁿ} Ff \overline{Fg} dx = ∫_{ℝⁿ} F^{−1}f \overline{F^{−1}g} dx.
Proof: First note the above formula is obvious if f, g ∈ 𝒢. To see this, note

∫_{ℝⁿ} Ff \overline{Fg} dx = ∫_{ℝⁿ} Ff(x) \overline{ (1/(2π))^{n/2} ∫_{ℝⁿ} e^{−ix·t} g(t) dt } dx
= ∫_{ℝⁿ} ( (1/(2π))^{n/2} ∫_{ℝⁿ} e^{ix·t} Ff(x) dx ) ḡ(t) dt = ∫_{ℝⁿ} (F^{−1} ∘ F) f(t) ḡ(t) dt
= ∫_{ℝⁿ} f(t) ḡ(t) dt.

The formula with F^{−1} is exactly similar. Now to verify the corollary, let φ_k → f in L²(ℝⁿ) and let ψ_k → g in L²(ℝⁿ). Then by Lemma 13.2.13,

∫_{ℝⁿ} Ff \overline{Fg} dx = lim_{k→∞} ∫_{ℝⁿ} Fφ_k \overline{Fψ_k} dx = lim_{k→∞} ∫_{ℝⁿ} φ_k ψ̄_k dx = ∫_{ℝⁿ} f ḡ dx.

A similar argument holds for F^{−1}.
How does one compute Ff and F^{−1}f?

Theorem 13.2.18 For f ∈ L²(ℝⁿ), let f_r = f χ_{E_r} where E_r is a bounded measurable set with E_r ↑ ℝⁿ. Then the following limits hold in L²(ℝⁿ):

Ff = lim_{r→∞} Ff_r,  F^{−1}f = lim_{r→∞} F^{−1}f_r.
Proof: ‖f − f_r‖₂ → 0, and so ‖Ff − Ff_r‖₂ → 0 and ‖F^{−1}f − F^{−1}f_r‖₂ → 0 by Plancherel's theorem.

What are Ff_r and F^{−1}f_r? Let φ ∈ 𝒢. Then

∫_{ℝⁿ} Ff_r φ dx = ∫_{ℝⁿ} f_r Fφ dx = (2π)^{−n/2} ∫_{ℝⁿ} ∫_{ℝⁿ} f_r(x) e^{−ix·y} φ(y) dy dx = ∫_{ℝⁿ} [ (2π)^{−n/2} ∫_{ℝⁿ} f_r(x) e^{−ix·y} dx ] φ(y) dy.

Since this holds for all φ ∈ 𝒢, a dense subset of L²(ℝⁿ), it follows that

Ff_r(y) = (2π)^{−n/2} ∫_{ℝⁿ} f_r(x) e^{−ix·y} dx.

Similarly,

F^{−1}f_r(y) = (2π)^{−n/2} ∫_{ℝⁿ} f_r(x) e^{ix·y} dx.

This shows that to take the Fourier transform of a function in L²(ℝⁿ), it suffices to take the limit as r → ∞ in L²(ℝⁿ) of (2π)^{−n/2} ∫_{ℝⁿ} f_r(x) e^{−ix·y} dx. A similar procedure works for the inverse Fourier transform.

Note this reduces to the earlier definition in case f ∈ L¹(ℝⁿ). Now consider the convolution of a function in L² with one in L¹.
Theorem 13.2.19 Let h ∈ L²(ℝⁿ) and let f ∈ L¹(ℝⁿ). Then h ∗ f ∈ L²(ℝⁿ),

F^{−1}(h ∗ f) = (2π)^{n/2} F^{−1}h F^{−1}f,
F(h ∗ f) = (2π)^{n/2} Fh Ff,

and

‖h ∗ f‖₂ ≤ ‖h‖₂ ‖f‖₁.  (13.2.10)
Proof: An application of Minkowski's inequality yields

( ∫_{ℝⁿ} ( ∫_{ℝⁿ} |h(x−y)| |f(y)| dy )² dx )^{1/2} ≤ ‖f‖₁ ‖h‖₂.  (13.2.11)

Hence ∫ |h(x−y)| |f(y)| dy < ∞ for a.e. x, and

x → ∫ h(x−y) f(y) dy

is in L²(ℝⁿ). Let E_r ↑ ℝⁿ with m(E_r) < ∞. Thus

h_r ≡ χ_{E_r} h ∈ L²(ℝⁿ) ∩ L¹(ℝⁿ),

and letting φ ∈ 𝒢,

F(h_r ∗ f)(φ) ≡ ∫ (h_r ∗ f)(Fφ) dx
= (2π)^{−n/2} ∫∫∫ h_r(x−y) f(y) e^{−ix·t} φ(t) dt dy dx
= (2π)^{−n/2} ∫∫ ( ∫ h_r(x−y) e^{−i(x−y)·t} dx ) f(y) e^{−iy·t} dy φ(t) dt
= ∫ (2π)^{n/2} Fh_r(t) Ff(t) φ(t) dt.

Since φ is arbitrary and 𝒢 is dense in L²(ℝⁿ),

F(h_r ∗ f) = (2π)^{n/2} Fh_r Ff.

Now by Minkowski's inequality, h_r ∗ f → h ∗ f in L²(ℝⁿ), and it is also clear that h_r → h in L²(ℝⁿ); so, by Plancherel's theorem, one may take the limit in the above and conclude

F(h ∗ f) = (2π)^{n/2} Fh Ff.

The assertion for F^{−1} is similar, and 13.2.10 follows from 13.2.11.
13.2.4 The Schwartz Class

The problem with 𝒢 is that it does not contain C_c^∞(ℝⁿ). I have used it in presenting the Fourier transform because the functions in 𝒢 have a very specific form which made some technical details work out more easily than in any other approach I have seen. The Schwartz class is a larger class of functions which does contain C_c^∞(ℝⁿ) and also has the same nice properties as 𝒢. The functions in the Schwartz class are infinitely differentiable and they vanish very rapidly as |x| → ∞, along with all their partial derivatives. This is a description of these functions, not a specific form involving polynomials times e^{−αx²}. To describe this precisely requires some notation.
Definition 13.2.20 f ∈ S, the Schwartz class, if f ∈ C^∞(ℝⁿ) and for all positive integers N,

ρ_N(f) < ∞,

where

ρ_N(f) = sup{ (1 + |x|²)^N |D^α f(x)| : x ∈ ℝⁿ, |α| ≤ N }.

Thus f ∈ S if and only if f ∈ C^∞(ℝⁿ) and

sup{ |x^β D^α f(x)| : x ∈ ℝⁿ } < ∞  (13.2.12)

for all multi-indices α and β.
Also note that if f ∈ S, then p(f) ∈ S for any polynomial p with p(0) = 0, and that

S ⊆ L^p(ℝⁿ) ∩ L^∞(ℝⁿ)

for any p ≥ 1. To see the assertion about p(f), it suffices to consider the case of the product of two elements of the Schwartz class. If f, g ∈ S, then D^α(fg) is a finite sum of derivatives of f times derivatives of g. Therefore, ρ_N(fg) < ∞ for all N. You may wonder about examples of things in S. Clearly any function in C_c^∞(ℝⁿ) is in S. However there are other functions in S. For example, e^{−x²} is in S, as you can verify for yourself, and so is any function from 𝒢. Note also that the density of C_c(ℝⁿ) in L^p(ℝⁿ) shows that S is dense in L^p(ℝⁿ) for every p ∈ [1, ∞).
Recall the Fourier transform of a function in L¹(ℝⁿ) is given by

Ff(t) ≡ (2π)^{−n/2} ∫_{ℝⁿ} e^{−it·x} f(x) dx.

Therefore, this gives the Fourier transform for f ∈ S. The nice property which S has in common with 𝒢 is that the Fourier transform and its inverse map S one to one onto S. This means I could have presented the whole of the above theory in terms of S rather than in terms of 𝒢. However, it is more technical.
Theorem 13.2.21 If f ∈ S, then Ff and F^{−1}f are also in S.
Proof: To begin with, let α = e_j = (0, …, 0, 1, 0, …, 0), the 1 in the j-th slot. Then

(F^{−1}f(t + he_j) − F^{−1}f(t))/h = (2π)^{−n/2} ∫_{ℝⁿ} e^{it·x} f(x) ( (e^{ihx_j} − 1)/h ) dx.  (13.2.13)

Consider the integrand in 13.2.13:

| e^{it·x} f(x) ( (e^{ihx_j} − 1)/h ) | = |f(x)| | (e^{i(h/2)x_j} − e^{−i(h/2)x_j})/h | = |f(x)| | i sin((h/2)x_j)/(h/2) | ≤ |f(x)| |x_j|,

and this is a function in L¹(ℝⁿ) because f ∈ S. Therefore by the dominated convergence theorem,

∂F^{−1}f(t)/∂t_j = (2π)^{−n/2} ∫_{ℝⁿ} e^{it·x} i x_j f(x) dx = i (2π)^{−n/2} ∫_{ℝⁿ} e^{it·x} x^{e_j} f(x) dx.

Now x^{e_j} f(x) ∈ S, and so one can continue in this way and take derivatives indefinitely. Thus F^{−1}f ∈ C^∞(ℝⁿ) and, from the above argument,

D^α F^{−1}f(t) = (2π)^{−n/2} ∫_{ℝⁿ} e^{it·x} (ix)^α f(x) dx.

To complete showing F^{−1}f ∈ S,

t^β D^α F^{−1}f(t) = (2π)^{−n/2} ∫_{ℝⁿ} e^{it·x} t^β (ix)^α f(x) dx.

Integrate this integral by parts to get

t^β D^α F^{−1}f(t) = (2π)^{−n/2} ∫_{ℝⁿ} i^{|β|} e^{it·x} D^β( (ix)^α f(x) ) dx.  (13.2.14)

Here is how this is done:

∫_ℝ e^{it_j x_j} t_j^{β_j} (ix)^α f(x) dx_j = [ (e^{it_j x_j}/(it_j)) t_j^{β_j} (ix)^α f(x) ]_{−∞}^∞ + i ∫_ℝ e^{it_j x_j} t_j^{β_j − 1} D^{e_j}( (ix)^α f(x) ) dx_j,

where the boundary term vanishes because f ∈ S. Returning to 13.2.14, use the fact that |e^{ia}| = 1 to conclude

|t^β D^α F^{−1}f(t)| ≤ C ∫_{ℝⁿ} |D^β( (ix)^α f(x) )| dx < ∞.

It follows F^{−1}f ∈ S. Similarly Ff ∈ S whenever f ∈ S.
Of course S can be considered a subset of 𝒢* as follows. For ψ ∈ S,

ψ(φ) ≡ ∫_{ℝⁿ} ψφ dx.
Theorem 13.2.22 Let ψ ∈ S. Then (F ∘ F^{−1})(ψ) = ψ and (F^{−1} ∘ F)(ψ) = ψ whenever ψ ∈ S. Also F and F^{−1} map S one to one and onto S.

Proof: The first claim follows from the fact that F and F^{−1} are inverses of each other on 𝒢*, which was established above. For the second, let ψ ∈ S. Then ψ = F(F^{−1}ψ). Thus F maps S onto S. If Fψ = 0, then apply F^{−1} to both sides to conclude ψ = 0. Thus F is one to one and onto. Similarly, F^{−1} is one to one and onto.
13.2.5 Convolution

To begin with, it is necessary to discuss the meaning of φf where f ∈ 𝒢* and φ ∈ 𝒢. What should it mean? First suppose f ∈ L^p(ℝⁿ) or f is measurable with polynomial growth. Then φf also has these properties. Hence, it should be the case that

φf(ψ) = ∫_{ℝⁿ} φfψ dx = ∫_{ℝⁿ} f(φψ) dx.

This motivates the following definition.

Definition 13.2.23 Let T ∈ 𝒢* and let φ ∈ 𝒢. Then φT ≡ Tφ ∈ 𝒢* will be defined by

φT(ψ) ≡ T(φψ).
The next topic is that of convolution. It was just shown that

F(f ∗ φ) = (2π)^{n/2} Fφ Ff,  F^{−1}(f ∗ φ) = (2π)^{n/2} F^{−1}φ F^{−1}f

whenever f ∈ L²(ℝⁿ) and φ ∈ 𝒢, so the same definition is retained in the general case because it makes perfect sense and agrees with the earlier definition.

Definition 13.2.24 Let f ∈ 𝒢* and let φ ∈ 𝒢. Then define the convolution of f with an element of 𝒢 as follows:

f ∗ φ ≡ (2π)^{n/2} F^{−1}(Fφ Ff) ∈ 𝒢*.

There is an obvious question. With this definition, is it true that F^{−1}(f ∗ φ) = (2π)^{n/2} F^{−1}φ F^{−1}f as it was earlier?
Theorem 13.2.25 Let f ∈ 𝒢* and let φ ∈ 𝒢. Then

F(f ∗ φ) = (2π)^{n/2} Fφ Ff,  (13.2.15)

F^{−1}(f ∗ φ) = (2π)^{n/2} F^{−1}φ F^{−1}f.  (13.2.16)
Proof: Note that 13.2.15 follows from Definition 13.2.24, and both assertions hold for f ∈ 𝒢. Consider 13.2.16. Here is a simple formula involving a pair of functions in 𝒢:

(ψ ∗ F^{−1}F^{−1}φ)(x) = ( ∫∫∫ ψ(x−y) e^{iy·y₁} e^{iy₁·z} φ(z) dz dy₁ dy ) / (2π)^n
= ( ∫∫∫ ψ(x−y) e^{−iy·ỹ₁} e^{−iỹ₁·z} φ(z) dz dỹ₁ dy ) / (2π)^n = (ψ ∗ FFφ)(x).

Now for ψ ∈ 𝒢,

(2π)^{n/2} F(F^{−1}φ F^{−1}f)(ψ) ≡ (2π)^{n/2} (F^{−1}φ F^{−1}f)(Fψ) ≡ (2π)^{n/2} F^{−1}f(F^{−1}φ Fψ)
≡ (2π)^{n/2} f( F^{−1}(F^{−1}φ Fψ) ) = f( (2π)^{n/2} F^{−1}( (F F^{−1} F^{−1}φ)(Fψ) ) )
≡ f( ψ ∗ F^{−1}F^{−1}φ ) = f( ψ ∗ FFφ ).  (13.2.17)

Also,

(2π)^{n/2} F^{−1}(Fφ Ff)(ψ) ≡ (2π)^{n/2} (Fφ Ff)(F^{−1}ψ) ≡ (2π)^{n/2} Ff(Fφ F^{−1}ψ) ≡ (2π)^{n/2} f( F(Fφ F^{−1}ψ) )
= f( F( (2π)^{n/2} Fφ F^{−1}ψ ) ) = f( F( (2π)^{n/2} (F^{−1}FFφ) F^{−1}ψ ) )
= f( F( F^{−1}(FFφ ∗ ψ) ) ) = f(FFφ ∗ ψ) = f(ψ ∗ FFφ).  (13.2.18)

The last line follows from the following:

∫ FFφ(x−y) ψ(y) dy = ∫ Fφ(x−y) Fψ(y) dy = ∫ Fψ(x−y) Fφ(y) dy = ∫ ψ(x−y) FFφ(y) dy.

From 13.2.18 and 13.2.17, since ψ was arbitrary,

(2π)^{n/2} F(F^{−1}φ F^{−1}f) = (2π)^{n/2} F^{−1}(Fφ Ff) ≡ f ∗ φ,

which shows 13.2.16.
13.3 Exercises

1. ♠ For f ∈ L¹(ℝⁿ), show that if F^{−1}f ∈ L¹ or Ff ∈ L¹, then f equals a continuous bounded function a.e.

2. ♠ Suppose f, g ∈ L¹(ℝ) and Ff = Fg. Show f = g a.e.

3. Show that if f ∈ L¹(ℝⁿ), then lim_{|x|→∞} Ff(x) = 0.

4. ↑ ♠ Suppose f ∗ f = f or f ∗ f = 0 and f ∈ L¹(ℝ). Show f = 0.
5. ♠ For this problem define ∫_a^∞ f(t) dt ≡ lim_{r→∞} ∫_a^r f(t) dt. Note this coincides with the Lebesgue integral when f ∈ L¹(a, ∞). Show

(a) ∫₀^∞ (sin u)/u du = π/2,
(b) lim_{r→∞} ∫_δ^∞ (sin(ru))/u du = 0 whenever δ > 0,
(c) if f ∈ L¹(ℝ), then lim_{r→∞} ∫_ℝ sin(ru) f(u) du = 0.

Hint: For the first two, use 1/u = ∫₀^∞ e^{−ut} dt and apply Fubini's theorem to ∫₀^R sin(u) ∫₀^∞ e^{−ut} dt du. For the last part, first establish the claim for f ∈ C_c^∞(ℝ) and then use the density of this set in L¹(ℝ) to obtain the result. This is sometimes called the Riemann–Lebesgue lemma.
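Part (a) can be checked numerically. The partial integrals ∫₀^R sin(u)/u du oscillate around π/2 with amplitude roughly 1/R, so this sketch averages two partial integrals half a period apart to damp the oscillation (the cutoff R = 200 is an arbitrary choice).

```python
import math

def dirichlet_partial(R, n_per_unit=200):
    # Composite midpoint rule for ∫_0^R sin(u)/u du
    n = int(R * n_per_unit)
    h = R / n
    return sum(math.sin((k + 0.5) * h) / ((k + 0.5) * h) * h for k in range(n))

# Average two partial integrals half a period apart to damp the O(1/R) tail.
R = 200.0
approx = 0.5 * (dirichlet_partial(R) + dirichlet_partial(R + math.pi))
print(approx, math.pi / 2)
assert abs(approx - math.pi / 2) < 1e-3
```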
6. ↑ ♠ Suppose that g ∈ L¹(ℝ) and that at some x > 0, g is locally Hölder continuous from the right and from the left. This means

lim_{r→0+} g(x + r) ≡ g(x+)

exists,

lim_{r→0+} g(x − r) ≡ g(x−)

exists, and there exist constants K, δ > 0 and r ∈ (0, 1] such that for |x − y| < δ,

|g(x+) − g(y)| < K|x − y|^r

for y > x, and

|g(x−) − g(y)| < K|x − y|^r

for y < x. Show that under these conditions,

lim_{r→∞} (2/π) ∫₀^∞ (sin(ur)/u) ( (g(x−u) + g(x+u))/2 ) du = (g(x+) + g(x−))/2.
7. ♠ Let g ∈ L¹(ℝ) and suppose g is locally Hölder continuous from the right and from the left at x. Show that then

lim_{R→∞} (1/(2π)) ∫_{−R}^R e^{ixt} ∫_{−∞}^∞ e^{−ity} g(y) dy dt = (g(x+) + g(x−))/2.

This is very interesting. If g ∈ L²(ℝ), this shows F^{−1}(Fg)(x) = (g(x+) + g(x−))/2, the midpoint of the jump in g at the point x. In particular, if g ∈ 𝒢, F^{−1}(Fg) = g. Hint: Show the left side of the above equation reduces to

(2/π) ∫₀^∞ (sin(ur)/u) ( (g(x−u) + g(x+u))/2 ) du

and then use Problem 6 to obtain the result.
8. ↑ A measurable function g defined on (0, ∞) has exponential growth if |g(t)| ≤ Ce^{ηt} for some η. For Re(s) > η, define the Laplace transform by

Lg(s) ≡ ∫₀^∞ e^{−su} g(u) du.

Assume that g has exponential growth as above and is Hölder continuous from the right and from the left at t. Pick γ > η. Show that

lim_{R→∞} (1/(2π)) ∫_{−R}^R e^{γt} e^{iyt} Lg(γ + iy) dy = (g(t+) + g(t−))/2.

This formula is sometimes written in the form

(1/(2πi)) ∫_{γ−i∞}^{γ+i∞} e^{st} Lg(s) ds

and is called the complex inversion integral for Laplace transforms. It can be used to find inverse Laplace transforms. Hint:

(1/(2π)) ∫_{−R}^R e^{γt} e^{iyt} Lg(γ + iy) dy = (1/(2π)) ∫_{−R}^R e^{γt} e^{iyt} ∫₀^∞ e^{−(γ+iy)u} g(u) du dy.

Now use Fubini's theorem and do the integral from −R to R to get this equal to

(e^{γt}/π) ∫_{−∞}^∞ e^{−γu} g(u) (sin(R(t−u))/(t−u)) du,

where g is the zero extension of g off [0, ∞). Then this equals

(e^{γt}/π) ∫_{−∞}^∞ e^{−γ(t−u)} g(t−u) (sin(Ru)/u) du,

which equals

(2e^{γt}/π) ∫₀^∞ ( (g(t−u)e^{−γ(t−u)} + g(t+u)e^{−γ(t+u)})/2 ) (sin(Ru)/u) du,

and then apply the result of Problem 6.
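The inversion formula can be tested on the pair g(u) = e^{au}, Lg(s) = 1/(s − a): the truncated Bromwich integral at t = 1 should return e^{a} up to an O(1/R) truncation error. The parameters a = 0.5, γ = 1, R = 400 below are arbitrary choices of this sketch.

```python
import math, cmath

a, gam, t = 0.5, 1.0, 1.0   # g(u) = e^{au}, Lg(s) = 1/(s − a), valid for Re s > a

def bromwich(R, n):
    # (1/2π) ∫_{−R}^{R} e^{γt} e^{iyt} Lg(γ + iy) dy by the trapezoid rule
    h = 2 * R / n
    s = 0.0 + 0.0j
    for k in range(n + 1):
        y = -R + k * h
        w = 0.5 if k in (0, n) else 1.0
        s += w * cmath.exp(complex(gam * t, y * t)) / complex(gam - a, y) * h
    return s / (2 * math.pi)

approx = bromwich(R=400.0, n=160000)
print(approx.real, math.exp(a * t))
assert abs(approx.real - math.exp(a * t)) < 0.02
```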
9. ♠ Suppose f ∈ 𝒢. Show F(f_{x_j})(t) = i t_j Ff(t), where f_{x_j} denotes the partial derivative ∂f/∂x_j.
10. ♠ Let f ∈ 𝒢 and let k be a positive integer. Define

‖f‖_{k,2} ≡ ( ‖f‖₂² + Σ_{|α|≤k} ‖D^α f‖₂² )^{1/2}.

One could also define

|||f|||_{k,2} ≡ ( ∫_{ℝⁿ} |Ff(x)|² (1 + |x|²)^k dx )^{1/2}.

Show both ‖·‖_{k,2} and |||·|||_{k,2} are norms on 𝒢 and that they are equivalent. These are Sobolev space norms. For which values of k does the second norm make sense? How about the first norm?
11. ↑ ♠ Define H^k(ℝⁿ), k ≥ 0, to consist of those f ∈ L²(ℝⁿ) such that

( ∫ |Ff(x)|² (1 + |x|²)^k dx )^{1/2} < ∞,

with

|||f|||_{k,2} ≡ ( ∫ |Ff(x)|² (1 + |x|²)^k dx )^{1/2}.

Show H^k(ℝⁿ) is a Banach space, and that if k is a positive integer, then H^k(ℝⁿ) = { f ∈ L²(ℝⁿ) : there exists {u_j} ⊆ 𝒢 with ‖u_j − f‖₂ → 0 and {u_j} a Cauchy sequence in the norm ‖·‖_{k,2} of Problem 10 }. This is one way to define Sobolev spaces. Hint: One way to do the second part of this is to define a new measure µ by

µ(E) ≡ ∫_E (1 + |x|²)^k dx.

Then show µ is a Borel measure which is inner and outer regular, and show there exists {g_m} such that g_m ∈ 𝒢 and g_m → Ff in L²(µ). Thus g_m = Ff_m with f_m ∈ 𝒢 because F maps 𝒢 onto 𝒢. Then by Problem 10, {f_m} is Cauchy in the norm ‖·‖_{k,2}.
12. ↑ ♠ If $2k>n$, show that if $f\in H^{k}(\mathbb{R}^{n})$, then $f$ equals a bounded continuous function a.e. Hint: Show that for $k$ this large, $Ff\in L^{1}(\mathbb{R}^{n})$, and then use Problem 1. To do this, write
$$|Ff(x)|=|Ff(x)|(1+|x|^{2})^{\frac{k}{2}}(1+|x|^{2})^{-\frac{k}{2}},$$
so
$$\int|Ff(x)|dx=\int|Ff(x)|(1+|x|^{2})^{\frac{k}{2}}(1+|x|^{2})^{-\frac{k}{2}}dx.$$
Use the Cauchy Schwarz inequality. This is an example of a Sobolev imbedding theorem.
13. Let $u\in\mathfrak{G}$. Then $Fu\in\mathfrak{G}$ and so, in particular, it makes sense to form the integral
$$\int_{\mathbb{R}}Fu(x',x_{n})\,dx_{n}$$
where $(x',x_{n})=x\in\mathbb{R}^{n}$. For $u\in\mathfrak{G}$, define $\gamma u(x')\equiv u(x',0)$. Find a constant such that $F(\gamma u)(x')$ equals this constant times the above integral. Hint: By the dominated convergence theorem
$$\int_{\mathbb{R}}Fu(x',x_{n})\,dx_{n}=\lim_{\varepsilon\to0}\int_{\mathbb{R}}e^{-(\varepsilon x_{n})^{2}}Fu(x',x_{n})\,dx_{n}.$$
Now use the definition of the Fourier transform and Fubini's theorem as required in order to obtain the desired relationship.
14. Let $h(x)=\left(\int_{0}^{x}e^{-t^{2}}dt\right)^{2}+\int_{0}^{1}\frac{e^{-x^{2}(1+t^{2})}}{1+t^{2}}dt$. Show that $h'(x)=0$ and $h(0)=\pi/4$. Then let $x\to\infty$ to conclude that $\int_{0}^{\infty}e^{-t^{2}}dt=\sqrt{\pi}/2$. Show that $\int_{-\infty}^{\infty}e^{-t^{2}}dt=\sqrt{\pi}$ and that $\int_{-\infty}^{\infty}e^{-ct^{2}}dt=\frac{\sqrt{\pi}}{\sqrt{c}}$.
15. ♠ Recall that for $f$ a function, $f_{y}(x)=f(x-y)$. Find a relationship between $Ff_{y}(t)$ and $Ff(t)$ given that $f\in L^{1}(\mathbb{R}^{n})$.

16. ♠ For $f\in L^{1}(\mathbb{R}^{n})$, simplify $Ff(t+y)$.

17. ♠ For $f\in L^{1}(\mathbb{R}^{n})$ and $c$ a nonzero real number, show $Ff(ct)=Fg(t)$ where $g(x)=f\left(\frac{x}{c}\right)$.

18. Suppose that $f\in L^{1}(\mathbb{R})$ and that $\int|x|\,|f(x)|\,dx<\infty$. Find a way to use the Fourier transform of $f$ to compute $\int xf(x)\,dx$.
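As a numerical aside (not part of the text), the Gaussian integrals in Problem 14 are easy to check with a short sketch; the truncation at $t=\pm10$ and the step counts are arbitrary choices, chosen so the truncated tail is negligible.

```python
import math

def midpoint_integral(f, a, b, n):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# The tail of e^{-t^2} beyond |t| = 10 is smaller than e^{-100}, so truncating is safe.
half_line = midpoint_integral(lambda t: math.exp(-t * t), 0.0, 10.0, 100_000)
whole_line = midpoint_integral(lambda t: math.exp(-t * t), -10.0, 10.0, 200_000)
c = 3.0   # an arbitrary positive constant for the scaled integral
scaled = midpoint_integral(lambda t: math.exp(-c * t * t), -10.0, 10.0, 200_000)

print(half_line, math.sqrt(math.pi) / 2)           # the columns agree closely
print(whole_line, math.sqrt(math.pi))
print(scaled, math.sqrt(math.pi) / math.sqrt(c))
```

The midpoint rule was chosen only for simplicity; any convergent quadrature would do.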
Fourier Series
14.1 Deﬁnition And Basic Properties
A Fourier series is an expression of the form
$$\sum_{k=-\infty}^{\infty}c_{k}e^{ikx}$$
where this means
$$\lim_{n\to\infty}\sum_{k=-n}^{n}c_{k}e^{ikx}.$$
Obviously such a sequence of partial sums may or may not converge at a particular
value of x.
These series have been important in applied math since the time of Fourier who
was an oﬃcer in Napoleon’s army. He was interested in studying the ﬂow of heat in
cannons and invented the concept to aid him in his study. Since that time, Fourier
series and the mathematical problems related to their convergence have motivated
the development of modern methods in analysis. As recently as the mid 1960’s a
problem related to convergence of Fourier series was solved for the ﬁrst time and
the solution of this problem was a big surprise.¹ This chapter is on the classical
theory of convergence of Fourier series.
If you can approximate a function $f$ with an expression of the form
$$\sum_{k=-\infty}^{\infty}c_{k}e^{ikx}$$
then the function must have the property $f(x+2\pi)=f(x)$ because this is true of every term in the above series. More generally, here is a definition.

¹The question was whether the Fourier series of a function in $L^{2}$ converged a.e. to the function. It turned out that it did, to the surprise of many because it was known that the Fourier series of a function in $L^{1}$ does not necessarily converge to the function a.e. The problem was solved by Carleson in 1965.
Definition 14.1.1 A function $f$ defined on $\mathbb{R}$ is a periodic function of period $T$ if $f(x+T)=f(x)$ for all $x$.

As just explained, Fourier series are useful for representing periodic functions and no other kind of function. There is no loss of generality in studying only functions which are periodic of period $2\pi$. Indeed, if $f$ is a function which has period $T$, you can study this function in terms of the function $g(x)\equiv f\left(\frac{Tx}{2\pi}\right)$ where $g$ is periodic of period $2\pi$.
Definition 14.1.2 For $f\in L^{1}([-\pi,\pi])$ ($f$ measurable and $\int_{-\pi}^{\pi}|f(t)|\,dt<\infty$) and $f$ periodic on $\mathbb{R}$, define the Fourier series of $f$ as
$$\sum_{k=-\infty}^{\infty}c_{k}e^{ikx},\qquad(14.1.1)$$
where
$$c_{k}\equiv\frac{1}{2\pi}\int_{-\pi}^{\pi}f(y)e^{-iky}dy.\qquad(14.1.2)$$
Also define the $n$th partial sum of the Fourier series of $f$ by
$$S_{n}(f)(x)\equiv\sum_{k=-n}^{n}c_{k}e^{ikx}.\qquad(14.1.3)$$
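The coefficient formula 14.1.2 is easy to exercise numerically (this sketch is not part of the text). It approximates $c_{k}$ for the test function $f(x)=x$ by a midpoint sum and compares with the closed form $c_{k}=(-1)^{k}i/k$ obtained later in Example 14.5.3; the test function and step count are arbitrary choices.

```python
import cmath
import math

def fourier_coefficient(f, k, n=20_000):
    """Approximate c_k = (1/2pi) * integral_{-pi}^{pi} f(y) e^{-iky} dy by midpoints."""
    h = 2 * math.pi / n
    total = 0j
    for i in range(n):
        y = -math.pi + (i + 0.5) * h
        total += f(y) * cmath.exp(-1j * k * y)
    return total * h / (2 * math.pi)

for k in (1, 2, 5):
    ck = fourier_coefficient(lambda y: y, k)
    exact = (-1) ** k * 1j / k
    print(k, ck, exact)   # the two columns should agree closely
```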
It may be interesting to see where this formula came from. Suppose then that
$$f(x)=\sum_{k=-\infty}^{\infty}c_{k}e^{ikx},$$
multiply both sides by $e^{-imx}$ and take the integral $\int_{-\pi}^{\pi}$, so that
$$\int_{-\pi}^{\pi}f(x)e^{-imx}dx=\int_{-\pi}^{\pi}\sum_{k=-\infty}^{\infty}c_{k}e^{ikx}e^{-imx}dx.$$
Now switch the sum and the integral on the right side even though there is absolutely no reason to believe this makes any sense. Then
$$\int_{-\pi}^{\pi}f(x)e^{-imx}dx=\sum_{k=-\infty}^{\infty}c_{k}\int_{-\pi}^{\pi}e^{ikx}e^{-imx}dx=c_{m}\int_{-\pi}^{\pi}1\,dx=2\pi c_{m}$$
because $\int_{-\pi}^{\pi}e^{ikx}e^{-imx}dx=0$ if $k\ne m$. It is formal manipulations of the sort just presented which suggest that Definition 14.1.2 might be interesting.
In case $f$ is real valued, $c_{-k}=\overline{c_{k}}$ and so
$$S_{n}f(x)=\frac{1}{2\pi}\int_{-\pi}^{\pi}f(y)\,dy+\sum_{k=1}^{n}2\operatorname{Re}\left(c_{k}e^{ikx}\right).$$
Letting $c_{k}\equiv\alpha_{k}+i\beta_{k}$,
$$S_{n}f(x)=\frac{1}{2\pi}\int_{-\pi}^{\pi}f(y)\,dy+\sum_{k=1}^{n}2\left[\alpha_{k}\cos kx-\beta_{k}\sin kx\right]$$
where
$$c_{k}=\frac{1}{2\pi}\int_{-\pi}^{\pi}f(y)e^{-iky}dy=\frac{1}{2\pi}\int_{-\pi}^{\pi}f(y)(\cos ky-i\sin ky)\,dy$$
which shows that
$$\alpha_{k}=\frac{1}{2\pi}\int_{-\pi}^{\pi}f(y)\cos(ky)\,dy,\qquad\beta_{k}=\frac{-1}{2\pi}\int_{-\pi}^{\pi}f(y)\sin(ky)\,dy.$$
Therefore, letting $a_{k}=2\alpha_{k}$ and $b_{k}=-2\beta_{k}$,
$$a_{k}=\frac{1}{\pi}\int_{-\pi}^{\pi}f(y)\cos(ky)\,dy,\qquad b_{k}=\frac{1}{\pi}\int_{-\pi}^{\pi}f(y)\sin(ky)\,dy$$
and
$$S_{n}f(x)=\frac{a_{0}}{2}+\sum_{k=1}^{n}a_{k}\cos kx+b_{k}\sin kx.\qquad(14.1.4)$$
This is often the way Fourier series are presented in elementary courses where it is only real functions which are to be approximated. However it is easier to stick with Definition 14.1.2.
The partial sums of a Fourier series can be written in a particularly simple form which is presented next.
$$S_{n}f(x)=\sum_{k=-n}^{n}c_{k}e^{ikx}=\sum_{k=-n}^{n}\left(\frac{1}{2\pi}\int_{-\pi}^{\pi}f(y)e^{-iky}dy\right)e^{ikx}$$
$$=\int_{-\pi}^{\pi}\frac{1}{2\pi}\sum_{k=-n}^{n}e^{ik(x-y)}f(y)\,dy\equiv\int_{-\pi}^{\pi}D_{n}(x-y)f(y)\,dy.\qquad(14.1.5)$$
The function
$$D_{n}(t)\equiv\frac{1}{2\pi}\sum_{k=-n}^{n}e^{ikt}$$
is called the Dirichlet kernel.
Theorem 14.1.3 The function $D_{n}$ satisfies the following:

1. $\int_{-\pi}^{\pi}D_{n}(t)\,dt=1$

2. $D_{n}$ is periodic of period $2\pi$

3. $D_{n}(t)=(2\pi)^{-1}\frac{\sin\left(\left(n+\frac{1}{2}\right)t\right)}{\sin\left(\frac{t}{2}\right)}$.

Proof: Part 1 is obvious because $\frac{1}{2\pi}\int_{-\pi}^{\pi}e^{-iky}dy=0$ whenever $k\ne0$ and it equals 1 if $k=0$. Part 2 is also obvious because $t\to e^{ikt}$ is periodic of period $2\pi$. It remains to verify Part 3. Note
$$2\pi D_{n}(t)=\sum_{k=-n}^{n}e^{ikt}=1+2\sum_{k=1}^{n}\cos(kt).$$
Therefore,
$$2\pi D_{n}(t)\sin\left(\frac{t}{2}\right)=\sin\left(\frac{t}{2}\right)+2\sum_{k=1}^{n}\sin\left(\frac{t}{2}\right)\cos(kt)$$
$$=\sin\left(\frac{t}{2}\right)+\sum_{k=1}^{n}\left[\sin\left(\left(k+\frac{1}{2}\right)t\right)-\sin\left(\left(k-\frac{1}{2}\right)t\right)\right]=\sin\left(\left(n+\frac{1}{2}\right)t\right)$$
where the easily verified trig identity $\cos(a)\sin(b)=\frac{1}{2}(\sin(a+b)-\sin(a-b))$ is used to get to the second line.
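The closed form in Part 3 can be checked against the defining sum with a short sketch (not part of the text; the sample points are arbitrary):

```python
import cmath
import math

def dirichlet_sum(n, t):
    """D_n(t) from the defining sum (1/2pi) * sum_{k=-n}^{n} e^{ikt}."""
    return sum(cmath.exp(1j * k * t) for k in range(-n, n + 1)).real / (2 * math.pi)

def dirichlet_closed(n, t):
    """D_n(t) from the closed form sin((n+1/2)t) / (2pi sin(t/2))."""
    return math.sin((n + 0.5) * t) / (2 * math.pi * math.sin(t / 2))

for n in (1, 2, 3, 10):
    for t in (0.3, 1.0, 2.5, -1.7):
        assert abs(dirichlet_sum(n, t) - dirichlet_closed(n, t)) < 1e-9
print("closed form agrees with the defining sum at all sample points")
```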
Here is a picture of the Dirichlet kernels for $n=1,2,$ and $3$. (Figure omitted.)
Note they are not nonnegative but there is a large central positive bump which
gets larger as n gets larger.
It is not reasonable to expect a Fourier series to converge to the function at
every point. To see this, change the value of the function at a single point in
(−π, π) and extend to keep the modiﬁed function periodic. Then the Fourier series
of the modiﬁed function is the same as the Fourier series of the original function and
so if pointwise convergence did take place, it no longer does. However, it is possible
to prove an interesting theorem about pointwise convergence of Fourier series. This
is done next.
14.2 The Riemann Lebesgue Lemma
The Riemann Lebesgue lemma is the basic result which makes possible the study of
pointwise convergence of Fourier series. It is also a major result in other contexts
and serves as a useful example.
Lemma 14.2.1 (Riemann Lebesgue) Let $f\in L^{1}(\mathbb{R})$. Then
$$\lim_{\alpha\to\infty}\int_{\mathbb{R}}f(t)\sin(\alpha t+\beta)\,dt=0.\qquad(14.2.6)$$
Proof: By Theorem 12.9.9, there exists $g\in C_{c}^{\infty}(\mathbb{R})$ such that
$$\|f-g\|_{L^{1}(\mathbb{R})}<\varepsilon/2.$$
Then for $\operatorname{spt}(g)\subseteq(a,b)$,
$$\int_{\mathbb{R}}g(t)\sin(\alpha t+\beta)\,dt=\overbrace{g(t)\frac{-\cos(\alpha t+\beta)}{\alpha}\Big|_{a}^{b}}^{=0}+\int_{a}^{b}g'(t)\frac{\cos(\alpha t+\beta)}{\alpha}\,dt$$
and this converges to 0 as $\alpha\to\infty$ since $g'(t)$ is bounded. Therefore,
$$\left|\int f(t)\sin(\alpha t+\beta)\,dt\right|\le\left|\int f(t)\sin(\alpha t+\beta)\,dt-\int g(t)\sin(\alpha t+\beta)\,dt\right|+\left|\int_{\mathbb{R}}g(t)\sin(\alpha t+\beta)\,dt\right|$$
$$\le\int|f(t)-g(t)|\,dt+\varepsilon/2<\varepsilon/2+\varepsilon/2=\varepsilon$$
provided $\alpha$ is large enough.
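The decay in Lemma 14.2.1 can be watched numerically (an aside, not in the text). For $f$ the indicator of $[0,1]$, which is in $L^{1}$ but not smooth, $\int_{0}^{1}\sin(\alpha t)\,dt=(1-\cos\alpha)/\alpha$, so the integral is $O(1/\alpha)$; the sketch evaluates it by quadrature and checks that bound.

```python
import math

def osc_integral(alpha, n=200_000):
    """Midpoint approximation of integral_0^1 sin(alpha * t) dt."""
    h = 1.0 / n
    return h * sum(math.sin(alpha * (i + 0.5) * h) for i in range(n))

for alpha in (10.0, 100.0, 1000.0):
    val = osc_integral(alpha)
    exact = (1 - math.cos(alpha)) / alpha   # closed form for this particular f
    print(alpha, val, exact)
    assert abs(val) <= 2.0 / alpha + 1e-5   # |integral| <= 2/alpha, which tends to 0
```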
14.3 Dini’s Criterion For Convergence
Fourier series like to converge to the midpoint of the jump of a function under
certain conditions. The condition given for convergence in the following theorem
is due to Dini. It is a generalization of the usual theorem presented in elementary
books on Fourier series methods [3].

Recall
$$\lim_{t\to x+}f(t)\equiv f(x+),\quad\text{and}\quad\lim_{t\to x-}f(t)\equiv f(x-).$$
Theorem 14.3.1 Let $f$ be a periodic function of period $2\pi$ which is in $L^{1}([-\pi,\pi])$. Suppose at some $x$, $f(x+)$ and $f(x-)$ both exist and that the function
$$y\to\left|\frac{f(x-y)-f(x-)+f(x+y)-f(x+)}{y}\right|\equiv h(y)\qquad(14.3.7)$$
is in $L^{1}([-\pi,\pi])$. Then
$$\lim_{n\to\infty}S_{n}f(x)=\frac{f(x+)+f(x-)}{2}.\qquad(14.3.8)$$
Proof:
$$S_{n}f(x)=\int_{-\pi}^{\pi}D_{n}(x-y)f(y)\,dy$$
Change variables $x-y\to y$ and use the periodicity of $f$ and $D_{n}$ along with the formula for $D_{n}(y)$ to write this as
$$S_{n}f(x)=\int_{-\pi}^{\pi}D_{n}(y)f(x-y)\,dy=\int_{0}^{\pi}D_{n}(y)f(x-y)\,dy+\int_{-\pi}^{0}D_{n}(y)f(x-y)\,dy$$
$$=\int_{0}^{\pi}D_{n}(y)\left[f(x-y)+f(x+y)\right]dy=\int_{0}^{\pi}\frac{1}{\pi}\frac{\sin\left(\left(n+\frac{1}{2}\right)y\right)}{\sin\left(\frac{y}{2}\right)}\left(\frac{f(x-y)+f(x+y)}{2}\right)dy.\qquad(14.3.9)$$
Note the function
$$y\to\frac{1}{\pi}\frac{\sin\left(\left(n+\frac{1}{2}\right)y\right)}{\sin\left(\frac{y}{2}\right)},$$
while it is not defined at 0, is at least bounded and by L'Hospital's rule,
$$\lim_{y\to0}\frac{1}{\pi}\frac{\sin\left(\left(n+\frac{1}{2}\right)y\right)}{\sin\left(\frac{y}{2}\right)}=\frac{2n+1}{\pi},$$
so defining it to equal this value at 0 yields a continuous, hence Riemann integrable function and so the above integral at least makes sense. Also from the property that $\int_{-\pi}^{\pi}D_{n}(t)\,dt=1$,
$$f(x+)+f(x-)=\int_{-\pi}^{\pi}D_{n}(y)\left[f(x+)+f(x-)\right]dy=2\int_{0}^{\pi}D_{n}(y)\left[f(x+)+f(x-)\right]dy$$
$$=\int_{0}^{\pi}\frac{1}{\pi}\frac{\sin\left(\left(n+\frac{1}{2}\right)y\right)}{\sin\left(\frac{y}{2}\right)}\left[f(x+)+f(x-)\right]dy.$$
Therefore,
$$\left|S_{n}f(x)-\frac{f(x+)+f(x-)}{2}\right|=\left|\int_{0}^{\pi}\frac{1}{\pi}\frac{\sin\left(\left(n+\frac{1}{2}\right)y\right)}{\sin\left(\frac{y}{2}\right)}\left(\frac{f(x-y)-f(x-)+f(x+y)-f(x+)}{2}\right)dy\right|.\qquad(14.3.10)$$
Now the function
$$y\to\left|\frac{f(x-y)-f(x-)+f(x+y)-f(x+)}{2\sin\left(\frac{y}{2}\right)}\right|\qquad(14.3.11)$$
is in $L^{1}([0,\pi])$ and so, extending it to equal 0 off this interval produces a function which is in $L^{1}(\mathbb{R})$. I need to verify this assertion. When this is done, the desired result will follow right away from the Riemann Lebesgue lemma. There is clearly no measurability difficulty with the above function. Also, by the assumption given, the function equals
$$\left|\frac{y}{2\sin\left(\frac{y}{2}\right)}\right|\left|\frac{f(x-y)-f(x-)+f(x+y)-f(x+)}{y}\right|.$$
The function $\frac{y}{2\sin\left(\frac{y}{2}\right)}$ is bounded on $(0,\pi)$ so the result is in $L^{1}([0,\pi])$.
The following corollary is obtained immediately from the above proof with minor modifications.

Corollary 14.3.2 Let $f$ be a periodic function of period $2\pi$ which is an element of $R([-\pi,\pi])$. Suppose at some $x$, the function
$$y\to\left|\frac{f(x-y)+f(x+y)-2s}{y}\right|\qquad(14.3.12)$$
is in $L^{1}([0,\pi])$. Then
$$\lim_{n\to\infty}S_{n}f(x)=s.\qquad(14.3.13)$$
The following corollary gives an easy to check condition for the Fourier series to converge to the midpoint of the jump.

Corollary 14.3.3 Let $f$ be a periodic function of period $2\pi$ which is an element of $L^{1}([-\pi,\pi])$. Suppose at some $x$, $f(x+)$ and $f(x-)$ both exist and there exist positive constants $K$ and $\delta$ such that whenever $0<y<\delta$,
$$|f(x-y)-f(x-)|\le Ky^{\theta},\qquad|f(x+y)-f(x+)|<Ky^{\theta}\qquad(14.3.14)$$
where $\theta\in(0,1]$. Then
$$\lim_{n\to\infty}S_{n}f(x)=\frac{f(x+)+f(x-)}{2}.\qquad(14.3.15)$$
Proof: The condition 14.3.14 clearly implies Dini's condition 14.3.7. This is because for $0<y<\delta$,
$$\left|\frac{f(x-y)-f(x-)+f(x+y)-f(x+)}{y}\right|\le2Ky^{\theta-1}$$
and so
$$\int_{\varepsilon}^{\pi}\left|\frac{f(x-y)-f(x-)+f(x+y)-f(x+)}{y}\right|dy$$
$$\le\int_{\varepsilon}^{\delta}2Ky^{\theta-1}dy+\frac{1}{\delta}\int_{\delta}^{\pi}\left|f(x-y)-f(x-)+f(x+y)-f(x+)\right|dy.$$
Now
$$\int_{\varepsilon}^{\delta}2Ky^{\theta-1}dy=2K\frac{\delta^{\theta}-\varepsilon^{\theta}}{\theta}$$
which converges to $2K\frac{\delta^{\theta}}{\theta}$ as $\varepsilon\to0$. Thus
$$\lim_{\varepsilon\to0+}\int_{\varepsilon}^{\pi}\left|\frac{f(x-y)-f(x-)+f(x+y)-f(x+)}{y}\right|dy$$
exists and so, from the monotone convergence theorem, the function
$$y\to\frac{f(x-y)-f(x-)+f(x+y)-f(x+)}{y}$$
is in $L^{1}([0,\pi])$. This is the Dini condition.

As pointed out by Apostol [3], where you can read these theorems presented in the context of the Riemann integral, this is a very surprising result because even though the Fourier coefficients depend on the values of the function on all of $[-\pi,\pi]$, the convergence properties depend in this theorem on very local behavior of the function.
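Corollary 14.3.3 can be illustrated numerically (an aside, not in the text). Take $f(x)=e^{x}$ on $[-\pi,\pi)$, extended $2\pi$-periodically; $f$ is Lipschitz from each side at the jump $x=\pi$, and a standard computation (used here without rederivation) gives $c_{k}=(-1)^{k}\sinh\pi/(\pi(1-ik))$. So $S_{n}f(\pi)$ should approach the midpoint $\cosh\pi=(e^{\pi}+e^{-\pi})/2$.

```python
import math

def partial_sum_at_pi(n):
    """S_n f(pi) for f(x) = e^x on [-pi, pi), extended 2*pi-periodically.
    Since e^{ik pi} = (-1)^k, each term c_k e^{ik pi} reduces to
    sinh(pi) / (pi * (1 - ik))."""
    return sum(math.sinh(math.pi) / (math.pi * (1 - 1j * k))
               for k in range(-n, n + 1))

midpoint = math.cosh(math.pi)   # (f(pi-) + f(pi+)) / 2
for n in (10, 100, 1000):
    print(n, partial_sum_at_pi(n).real, midpoint)   # the gap shrinks roughly like 1/n
```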
14.4 Jordan’s Criterion
There is a different condition which implies the Fourier series converges to the midpoint of the jump. In order to prove the theorem, there are some interesting lemmas which are needed.
Lemma 14.4.1 Let $G$ be an increasing function defined on $[a,b]$. Thus $G(x)\le G(y)$ whenever $x<y$. Then $G(x-)=G(x+)=G(x)$ for every $x$ except for a countable set of exceptions.

Proof: Let $S\equiv\{x\in[a,b]:G(x+)>G(x-)\}$. Then there is a rational number in each interval $(G(x-),G(x+))$ and also, since $G$ is increasing, these intervals are disjoint. It follows that there are only countably many such intervals. Therefore, $S$ is countable and if $x\notin S$, $G(x+)=G(x-)$, showing that $G$ is continuous on $S^{C}$ and the claimed equality holds.
The next lemma is called the second mean value theorem for integrals.

Lemma 14.4.2 Let $G$ be an increasing function defined on $[a,b]$ and let $f$ be a continuous function defined on $[a,b]$. Then there exists $t_{0}\in[a,b]$ such that
$$\int_{a}^{b}G(s)f(s)\,ds=G(a)\left(\int_{a}^{t_{0}}f(s)\,ds\right)+G(b-)\left(\int_{t_{0}}^{b}f(s)\,ds\right).\qquad(14.4.16)$$
Proof: Letting $h>0$ define
$$G_{h}(t)\equiv\frac{1}{h^{2}}\int_{t-h}^{t}\int_{s-h}^{s}G(r)\,dr\,ds$$
where $G(x)\equiv G(a)$ for all $x<a$. Thus $G_{h}(a)=G(a)$. Also, from the fundamental theorem of calculus, $G_{h}'(t)\ge0$ and is a continuous function of $t$. Also it is clear that $\lim_{h\to0}G_{h}(t)=G(t-)$ for all $t\in[a,b]$. Letting $F(t)\equiv\int_{a}^{t}f(s)\,ds$,
$$\int_{a}^{b}G_{h}(s)f(s)\,ds=F(t)G_{h}(t)\Big|_{a}^{b}-\int_{a}^{b}F(t)G_{h}'(t)\,dt.\qquad(14.4.17)$$
Now letting $m=\min\{F(t):t\in[a,b]\}$ and $M=\max\{F(t):t\in[a,b]\}$, since $G_{h}'(t)\ge0$,
$$m\int_{a}^{b}G_{h}'(t)\,dt\le\int_{a}^{b}F(t)G_{h}'(t)\,dt\le M\int_{a}^{b}G_{h}'(t)\,dt.$$
Therefore, if $\int_{a}^{b}G_{h}'(t)\,dt\ne0$,
$$m\le\frac{\int_{a}^{b}F(t)G_{h}'(t)\,dt}{\int_{a}^{b}G_{h}'(t)\,dt}\le M$$
and so by the intermediate value theorem from calculus,
$$\left(\int_{a}^{b}G_{h}'(t)\,dt\right)F(t_{h})=\int_{a}^{b}F(t)G_{h}'(t)\,dt$$
for some $t_{h}\in[a,b]$. This is true even if $\int_{a}^{b}G_{h}'(t)\,dt=0$ because in this case, the left side equals 0. Since $G_{h}'\ge0$ and is continuous, it must equal 0 and so the right side is also 0. Therefore, substituting for $\int_{a}^{b}F(t)G_{h}'(t)\,dt$ in 14.4.17,
$$\int_{a}^{b}G_{h}(s)f(s)\,ds=F(t)G_{h}(t)\Big|_{a}^{b}-F(t_{h})\int_{a}^{b}G_{h}'(t)\,dt$$
$$=F(b)G_{h}(b)-F(t_{h})G_{h}(b)+F(t_{h})G_{h}(a)=\left(\int_{t_{h}}^{b}f(s)\,ds\right)G_{h}(b)+\left(\int_{a}^{t_{h}}f(s)\,ds\right)G(a).$$
Now selecting a convergent subsequence, still denoted by $h$, which converges to zero, let $t_{h}\to t_{0}\in[a,b]$. Therefore, using the dominated convergence theorem or simply the continuity of $f$ and the above lemma,
$$\int_{a}^{b}G(s)f(s)\,ds=\int_{a}^{b}G(s-)f(s)\,ds=\lim_{h\to0}\int_{a}^{b}G_{h}(s)f(s)\,ds$$
$$=\lim_{h\to0}\left[\left(\int_{t_{h}}^{b}f(s)\,ds\right)G_{h}(b)+\left(\int_{a}^{t_{h}}f(s)\,ds\right)G(a)\right]=\left(\int_{t_{0}}^{b}f(s)\,ds\right)G(b-)+\left(\int_{a}^{t_{0}}f(s)\,ds\right)G(a).$$
The above lemma will be used in the following lemma from Apostol [3].

Lemma 14.4.3 Let $G$ be increasing. Then for $\delta>0$,
$$\lim_{\alpha\to\infty}\int_{0}^{\delta}G(y)\frac{\sin(\alpha y)}{y}\,dy=\frac{\pi}{2}G(0+)$$
Proof: Let $0<h<\delta$. Then
$$\int_{0}^{\delta}G(y)\frac{\sin(\alpha y)}{y}\,dy=\int_{0}^{h}(G(y)-G(0+))\frac{\sin(\alpha y)}{y}\,dy+G(0+)\int_{0}^{h}\frac{\sin(\alpha y)}{y}\,dy+\int_{h}^{\delta}G(y)\frac{\sin(\alpha y)}{y}\,dy$$
From the mean value theorem above, the first integral equals
$$(G(h-)-G(0+))\int_{0}^{h}\frac{\sin(\alpha y)}{y}\,dy$$
This integral converges as $\alpha\to\infty$ to $\int_{0}^{\infty}\frac{\sin y}{y}\,dy=\frac{\pi}{2}$. Just change the variable and use Problem 6 on Page 233. See also Problem 19 on Page 358 below. Therefore, if $h$ is chosen small enough, the first term is bounded by $\varepsilon/3$. Fix such an $h$. Then as $\alpha\to\infty$ the second term converges to $\frac{\pi}{2}G(0+)$. The last term converges to 0 by the Riemann Lebesgue lemma. Therefore, fixing $h$ as described,
$$\left|\int_{0}^{\delta}G(y)\frac{\sin(\alpha y)}{y}\,dy-\frac{\pi}{2}G(0+)\right|\le\left|\int_{0}^{h}(G(y)-G(0+))\frac{\sin(\alpha y)}{y}\,dy\right|$$
$$+\left|G(0+)\int_{0}^{h}\frac{\sin(\alpha y)}{y}\,dy-\frac{\pi}{2}G(0+)\right|+\left|\int_{h}^{\delta}G(y)\frac{\sin(\alpha y)}{y}\,dy\right|<\frac{\varepsilon}{3}+\frac{\varepsilon}{3}+\frac{\varepsilon}{3}$$
provided $\alpha$ is large enough.
Definition 14.4.4 Let $f:[a,b]\to\mathbb{C}$ be a function. Then $f$ is of bounded variation if
$$\sup\left\{\sum_{i=1}^{n}|f(t_{i})-f(t_{i-1})|:a=t_{0}<\cdots<t_{n}=b\right\}\equiv V(f,[a,b])<\infty$$
where the sums are taken over all possible lists $\{a=t_{0}<\cdots<t_{n}=b\}$. The symbol $V(f,[a,b])$ is known as the total variation on $[a,b]$.
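Definition 14.4.4 can be probed numerically (an aside, not in the text). For piecewise-smooth $f$, the sums $\sum_{i}|f(t_{i})-f(t_{i-1})|$ over finer and finer uniform partitions increase toward $V(f,[a,b])=\int_{a}^{b}|f'(t)|\,dt$; for $f=\sin$ on $[0,2\pi]$ that value is 4.

```python
import math

def variation_on_grid(f, a, b, n):
    """Sum of |f(t_i) - f(t_{i-1})| over the uniform partition with n subintervals.
    This is a lower bound for V(f, [a, b]) and converges to it for piecewise-smooth f."""
    pts = [a + i * (b - a) / n for i in range(n + 1)]
    return sum(abs(f(pts[i]) - f(pts[i - 1])) for i in range(1, n + 1))

coarse = variation_on_grid(math.sin, 0.0, 2 * math.pi, 10)
fine = variation_on_grid(math.sin, 0.0, 2 * math.pi, 100_000)
print(coarse, fine)   # both at most 4; the fine grid is very close to 4

assert coarse <= fine + 1e-12   # refining a nested partition cannot decrease the sum
assert fine <= 4.0 + 1e-9
```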
Lemma 14.4.5 A real valued function $f$, defined on an interval $[a,b]$, is of bounded variation if and only if there are increasing functions $H$ and $G$ defined on $[a,b]$ such that $f(t)=H(t)-G(t)$. A complex valued function is of bounded variation if and only if the real and imaginary parts are of bounded variation.

Proof: For $f$ a real valued function of bounded variation, define an increasing function $H(t)\equiv V(f,[a,t])$ and then note that
$$f(t)=H(t)-\overbrace{[H(t)-f(t)]}^{G(t)}.$$
It is routine to verify that $G(t)$ is increasing. Conversely, if $f(t)=H(t)-G(t)$ where $H$ and $G$ are increasing, the total variation for $H$ is just $H(b)-H(a)$ and the total variation for $G$ is $G(b)-G(a)$. Therefore, the total variation for $f$ is bounded by the sum of these.

The last claim follows from the observation that
$$|f(t_{i})-f(t_{i-1})|\ge\max\left(|\operatorname{Re}f(t_{i})-\operatorname{Re}f(t_{i-1})|,|\operatorname{Im}f(t_{i})-\operatorname{Im}f(t_{i-1})|\right)$$
and
$$|\operatorname{Re}f(t_{i})-\operatorname{Re}f(t_{i-1})|+|\operatorname{Im}f(t_{i})-\operatorname{Im}f(t_{i-1})|\ge|f(t_{i})-f(t_{i-1})|.$$
Since a bounded variation function is the difference of increasing functions, this immediately implies the following corollary.

Corollary 14.4.6 If $f$ is of bounded variation, then
$$\lim_{\alpha\to\infty}\int_{0}^{\delta}f(y)\frac{\sin(\alpha y)}{y}\,dy=\frac{\pi}{2}f(0+)$$
With this corollary, here is the main theorem, the Jordan criterion for pointwise
convergence of the Fourier series.
Theorem 14.4.7 Suppose $f$ is $2\pi$ periodic and is in $L^{1}(-\pi,\pi)$. Suppose also that for some $\delta>0$, $f$ is of bounded variation on $[x-\delta,x+\delta]$. Then
$$\lim_{n\to\infty}S_{n}f(x)=\frac{f(x+)+f(x-)}{2}.\qquad(14.4.18)$$
Proof: First note that from Definition 14.4.4, $\lim_{y\to x-}\operatorname{Re}f(y)$ exists because $\operatorname{Re}f$ is the difference of two increasing functions. Similarly this limit will exist for $\operatorname{Im}f$ by the same reasoning, and limits of the form $\lim_{y\to x+}$ will also exist. Then
$$S_{n}f(x)-\frac{f(x+)+f(x-)}{2}=\int_{-\pi}^{\pi}D_{n}(y)\left(f(x-y)-\frac{f(x+)+f(x-)}{2}\right)dy$$
$$=\int_{0}^{\pi}D_{n}(y)\left[(f(x+y)-f(x+))+(f(x-y)-f(x-))\right]dy.$$
Now the Dirichlet kernel $D_{n}(y)$ is a constant multiple of
$$\sin((n+1/2)y)/\sin(y/2)$$
and so the Riemann Lebesgue lemma implies
$$\lim_{n\to\infty}\int_{\delta}^{\pi}D_{n}(y)\left[(f(x+y)-f(x+))+(f(x-y)-f(x-))\right]dy=0.$$
Thus it suffices to show that
$$\lim_{n\to\infty}\int_{0}^{\delta}D_{n}(y)\left[(f(x+y)-f(x+))+(f(x-y)-f(x-))\right]dy=0.\qquad(14.4.19)$$
Now $y\to(f(x+y)-f(x+))+(f(x-y)-f(x-))\equiv h(y)$ is of bounded variation for $y\in[0,\delta]$ and $\lim_{y\to0+}h(y)=h(0+)=0$. The above limit equals
$$\lim_{n\to\infty}\int_{0}^{\delta}\frac{\sin\left(\left(n+\frac{1}{2}\right)y\right)}{y}\,\frac{y}{\sin\left(\frac{y}{2}\right)}h(y)\,dy=0$$
because $\lim_{y\to0+}\frac{y}{\sin\left(\frac{y}{2}\right)}h(y)=0$, by Corollary 14.4.6.
It is known that neither the Jordan criterion nor the Dini criterion implies the
other. See Problem 21.
14.5 Integrating And Differentiating Fourier Series

You can typically integrate Fourier series term by term and things will work out according to your expectations. More precisely, here is the main theorem.

Theorem 14.5.1 Let $f$ be $2\pi$ periodic and in $L^{1}([-\pi,\pi])$. Then
$$\int_{-\pi}^{x}f(t)\,dt=\int_{-\pi}^{x}a_{0}\,dt+\lim_{n\to\infty}\sum_{k=-n,\,k\ne0}^{n}a_{k}\int_{-\pi}^{x}e^{ikt}dt$$
where $a_{k}$ are the Fourier coefficients of $f$.
To prove this theorem, here is a lemma.

Lemma 14.5.2 Suppose $f\in L^{1}(-\pi,\pi)$ and $\int_{-\pi}^{\pi}f\,dx=0$. Then
$$\int_{-\pi}^{\pi}\int_{-\pi}^{x}f(t)\,dt\,dx=-\int_{-\pi}^{\pi}tf(t)\,dt$$
$$\int_{-\pi}^{\pi}e^{-ikx}\int_{-\pi}^{x}f(t)\,dt\,dx=\int_{-\pi}^{\pi}f(t)\frac{e^{-ikt}}{ik}\,dt$$
Proof: This comes from a use of Fubini's theorem.
$$\int_{-\pi}^{\pi}\int_{-\pi}^{x}f(t)\,dt\,dx=\int_{-\pi}^{\pi}\int_{t}^{\pi}f(t)\,dx\,dt=\int_{-\pi}^{\pi}f(t)\int_{t}^{\pi}dx\,dt=\int_{-\pi}^{\pi}f(t)(\pi-t)\,dt=-\int_{-\pi}^{\pi}tf(t)\,dt.$$
$$\int_{-\pi}^{\pi}e^{-ikx}\int_{-\pi}^{x}f(t)\,dt\,dx=\int_{-\pi}^{\pi}f(t)\int_{t}^{\pi}e^{-ikx}dx\,dt=\int_{-\pi}^{\pi}f(t)\left(\frac{e^{-ikt}}{ik}-\frac{e^{-ik\pi}}{ik}\right)dt=\int_{-\pi}^{\pi}f(t)\frac{e^{-ikt}}{ik}\,dt$$
Proof of the theorem: First suppose $a_{0}\equiv\frac{1}{2\pi}\int_{-\pi}^{\pi}f(t)\,dt=0$. Of course this happens if and only if $\int_{-\pi}^{\pi}f(t)\,dt=0$. Then letting $F(x)=\int_{-\pi}^{x}f(t)\,dt$, it follows that $F(-\pi)=F(\pi)=0$ and that if $F$ is extended to be $2\pi$ periodic, the resulting function is continuous and of bounded variation on any interval. What is the Fourier series of $F$? Denote its Fourier coefficients by $A_{k}$ and the Fourier coefficients of $f$ by $a_{k}$. Then from the lemma,
$$A_{0}=\frac{1}{2\pi}\int_{-\pi}^{\pi}\int_{-\pi}^{x}f(t)\,dt\,dx=-\frac{1}{2\pi}\int_{-\pi}^{\pi}tf(t)\,dt$$
$$A_{k}=\frac{1}{2\pi}\int_{-\pi}^{\pi}e^{-ikx}\int_{-\pi}^{x}f(t)\,dt\,dx=\frac{1}{2\pi}\int_{-\pi}^{\pi}f(t)\frac{e^{-ikt}}{ik}\,dt=\frac{1}{ik}a_{k}$$
Since $F$ is of bounded variation, it follows that for all $x$,
$$F(x)=-\frac{1}{2\pi}\int_{-\pi}^{\pi}tf(t)\,dt+\sum_{k\ne0}\frac{1}{ik}a_{k}e^{ikx}$$
In particular, this is true when $x=-\pi$ and then
$$0=-\frac{1}{2\pi}\int_{-\pi}^{\pi}tf(t)\,dt+\sum_{k\ne0}\frac{1}{ik}a_{k}e^{-ik\pi}$$
It follows that
$$F(x)=\sum_{k\ne0}\frac{1}{ik}a_{k}\left(e^{ikx}-e^{-ik\pi}\right)=\int_{-\pi}^{x}a_{0}\,dt+\lim_{n\to\infty}\sum_{k=-n,\,k\ne0}^{n}a_{k}\int_{-\pi}^{x}e^{ikt}dt$$
which proves the theorem when $a_{0}=0$. Now in general, consider $g\equiv f-a_{0}$. For $k\ne0$,
$$\int_{-\pi}^{\pi}g(t)e^{-ikt}dt=\int_{-\pi}^{\pi}f(t)e^{-ikt}dt\equiv2\pi a_{k}.$$
Then from what was just shown,
$$\int_{-\pi}^{x}g\,dt=\int_{-\pi}^{x}f\,dt-a_{0}(x+\pi)=\sum_{k\ne0}a_{k}\int_{-\pi}^{x}e^{ikt}dt$$
showing that for all $x$,
$$\int_{-\pi}^{x}f\,dt=\int_{-\pi}^{x}a_{0}\,dt+\lim_{n\to\infty}\sum_{k=-n,\,k\ne0}^{n}a_{k}\int_{-\pi}^{x}e^{ikt}dt.$$
Example 14.5.3 Let $f(x)=x$ for $x\in[-\pi,\pi)$ and extend $f$ to make it $2\pi$ periodic. Then the Fourier coefficients of $f$ are
$$a_{0}=0,\qquad a_{k}=\frac{(-1)^{k}i}{k}$$
since $\frac{1}{2\pi}\int_{-\pi}^{\pi}te^{-ikt}dt=\frac{i}{k}\cos\pi k$. Therefore,
$$\int_{-\pi}^{x}t\,dt=\frac{1}{2}x^{2}-\frac{1}{2}\pi^{2}=\lim_{n\to\infty}\sum_{k=-n,\,k\ne0}^{n}\frac{(-1)^{k}i}{k}\int_{-\pi}^{x}e^{ikt}dt=\lim_{n\to\infty}\sum_{k=-n,\,k\ne0}^{n}\frac{(-1)^{k}i}{k}\left(\frac{\sin xk}{k}+i\frac{-\cos xk+(-1)^{k}}{k}\right)$$
For fun, let $x=0$ and conclude
$$-\frac{1}{2}\pi^{2}=\lim_{n\to\infty}\sum_{k=-n,\,k\ne0}^{n}\frac{(-1)^{k}i}{k}\left(i\frac{-1+(-1)^{k}}{k}\right)=\lim_{n\to\infty}\sum_{k=-n,\,k\ne0}^{n}\frac{(-1)^{k+1}}{k}\left(\frac{-1+(-1)^{k}}{k}\right)$$
$$=\lim_{n\to\infty}2\sum_{k=1}^{n}(-1)^{k+1}\frac{-1+(-1)^{k}}{k^{2}}=\sum_{k=1}^{\infty}\frac{-4}{(2k-1)^{2}}$$
and so
$$\frac{\pi^{2}}{8}=\sum_{k=1}^{\infty}\frac{1}{(2k-1)^{2}}$$
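The series at the end of Example 14.5.3 is easy to confirm numerically (an aside, not in the text):

```python
import math

def odd_reciprocal_squares(n):
    """Partial sum sum_{k=1}^{n} 1/(2k-1)^2."""
    return sum(1.0 / (2 * k - 1) ** 2 for k in range(1, n + 1))

target = math.pi ** 2 / 8
for n in (10, 100, 10_000):
    print(n, odd_reciprocal_squares(n), target)   # the tail is O(1/n)
```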
14.6 Differentiating Fourier Series

Of course it is not reasonable to suppose you can differentiate a Fourier series term by term and get good results.

Consider the series for $f(x)=1$ if $x\in(0,\pi]$ and $f(x)=-1$ on $(-\pi,0)$ with $f(0)=0$. In this case $a_{0}=0$ and
$$a_{k}=\frac{1}{2\pi}\left(\int_{0}^{\pi}e^{-ikt}dt-\int_{-\pi}^{0}e^{-ikt}dt\right)=\frac{i}{\pi}\frac{\cos\pi k-1}{k}$$
so the Fourier series is
$$\sum_{k\ne0}\left(\frac{(-1)^{k}-1}{\pi k}\right)ie^{ikx}$$
What happens if you differentiate it term by term? It gives
$$\sum_{k\ne0}-\frac{(-1)^{k}-1}{\pi}e^{ikx}$$
which fails to converge anywhere because the $k$th term fails to converge to 0. This is in spite of the fact that $f$ has a derivative away from 0.
However, it is possible to prove some theorems which let you differentiate a Fourier series term by term. Here is one such theorem.

Theorem 14.6.1 Suppose for $x\in[-\pi,\pi]$,
$$f(x)=\int_{-\pi}^{x}f'(t)\,dt+f(-\pi)$$
where $f$ is also periodic of period $2\pi$ and $f'(t)$ is in $L^{1}([-\pi,\pi])$. Then if
$$f(x)=\sum_{k=-\infty}^{\infty}a_{k}e^{ikx},$$
it follows the Fourier series of $f'$ is
$$\sum_{k=-\infty}^{\infty}a_{k}ike^{ikx}.$$
Proof: Since $f'$ is $2\pi$ periodic (why?), it follows from Theorem 14.5.1 that
$$f(x)-f(-\pi)=\sum_{k=-\infty}^{\infty}b_{k}\left(\int_{-\pi}^{x}e^{ikt}dt\right)$$
where $b_{k}$ is the $k$th Fourier coefficient of $f'$. Thus
$$b_{k}=\frac{1}{2\pi}\int_{-\pi}^{\pi}f'(t)e^{-ikt}dt.$$
If $k=0$, $b_{0}=\frac{1}{2\pi}(f(\pi)-f(-\pi))=0$ by periodicity of $f$. For $k\ne0$,
$$b_{k}=\frac{1}{2\pi}\int_{-\pi}^{\pi}f'(t)\left(\int_{-\pi}^{t}-ike^{-iks}ds+(-1)^{k}\right)dt$$
Since $\int_{-\pi}^{\pi}f'(t)\,dt=0$, this equals
$$\frac{1}{2\pi}\int_{-\pi}^{\pi}f'(t)\left(\int_{-\pi}^{t}-ike^{-iks}ds\right)dt$$
and now using Fubini's theorem
$$=-ik\frac{1}{2\pi}\int_{-\pi}^{\pi}\int_{s}^{\pi}f'(t)e^{-iks}dt\,ds=-ik\frac{1}{2\pi}\int_{-\pi}^{\pi}e^{-iks}(f(\pi)-f(s))\,ds$$
$$=ik\frac{1}{2\pi}\int_{-\pi}^{\pi}f(s)e^{-iks}ds=ika_{k}$$
because $\int_{-\pi}^{\pi}e^{-iks}ds=0$. It follows the Fourier series for $f'$ is
$$\sum_{k=-\infty}^{\infty}ika_{k}e^{ikx}$$
as claimed.
Note the conclusion of this theorem is only about the Fourier series of $f'$. It does not say the Fourier series of $f'$ converges pointwise to $f'$. However, if $f'$ satisfies a Dini condition, then this will also occur. For example, if $f'$ has a bounded derivative at every point, then by the mean value theorem $|f'(x)-f'(y)|\le K|x-y|$, and this is enough to show the Fourier series converges to $f'(x)$.
14.7 Ways Of Approximating Functions

Given above is a theorem about Fourier series converging pointwise to a periodic function or more generally to the midpoint of the jump of the function. Notice that some sort of smoothness of the function approximated was required, the Dini condition or the Jordan condition. It can be shown that if this sort of thing is not present, the Fourier series of a continuous periodic function may fail to converge to it in a very spectacular manner. In fact, Fourier series don't do very well at converging pointwise. However, there is another way of converging at which Fourier series cannot be beat. It is mean square convergence. This is the same as converging in $L^{2}([-\pi,\pi])$.
Definition 14.7.1 Let $f$ be a function defined on an interval $[a,b]$. Then a sequence $\{g_{n}\}$ of functions is said to converge uniformly to $f$ on $[a,b]$ if
$$\lim_{n\to\infty}\sup\{|f(x)-g_{n}(x)|:x\in[a,b]\}\equiv\lim_{n\to\infty}\|f-g_{n}\|_{\infty}=0.$$
The sequence is said to converge mean square to $f$ if
$$\lim_{n\to\infty}\|f-g_{n}\|_{2}\equiv\lim_{n\to\infty}\left(\int_{a}^{b}|f-g_{n}|^{2}dx\right)^{1/2}=0$$
14.7.1 Uniform Approximation With Trig. Polynomials

It turns out that if you don't insist the $a_{k}$ be the Fourier coefficients, then every continuous $2\pi$ periodic function $\theta\to f(\theta)$ can be approximated uniformly with a trig. polynomial of the form
$$p_{n}(\theta)\equiv\sum_{k=-n}^{n}a_{k}e^{ik\theta}$$
This means that for all $\varepsilon>0$ there exists a $p_{n}(\theta)$ such that
$$\|f-p_{n}\|_{\infty}<\varepsilon.$$
Definition 14.7.2 Recall the $n$th partial sum of the Fourier series, $S_{n}f(x)$, is given by
$$S_{n}f(x)=\int_{-\pi}^{\pi}D_{n}(x-y)f(y)\,dy=\int_{-\pi}^{\pi}D_{n}(t)f(x-t)\,dt$$
where $D_{n}(t)$ is the Dirichlet kernel,
$$D_{n}(t)=(2\pi)^{-1}\frac{\sin\left(\left(n+\frac{1}{2}\right)t\right)}{\sin\left(\frac{t}{2}\right)}$$
The $n$th Fejer mean, $\sigma_{n}f(x)$, is the average of the first $n$ of the $S_{n}f(x)$. Thus
$$\sigma_{n+1}f(x)\equiv\frac{1}{n+1}\sum_{k=0}^{n}S_{k}f(x)=\int_{-\pi}^{\pi}\left(\frac{1}{n+1}\sum_{k=0}^{n}D_{k}(t)\right)f(x-t)\,dt$$
The Fejer kernel is
$$F_{n+1}(t)\equiv\frac{1}{n+1}\sum_{k=0}^{n}D_{k}(t).$$
As was the case with the Dirichlet kernel, the Fejer kernel has some properties.

Lemma 14.7.3 The Fejer kernel has the following properties.

1. $F_{n+1}(t)=F_{n+1}(t+2\pi)$

2. $\int_{-\pi}^{\pi}F_{n+1}(t)\,dt=1$

3. $\int_{-\pi}^{\pi}F_{n+1}(t)f(x-t)\,dt=\sum_{k=-n}^{n}b_{k}e^{ikx}$ for a suitable choice of $b_{k}$.

4. $F_{n+1}(t)=\frac{1-\cos((n+1)t)}{4\pi(n+1)\sin^{2}\left(\frac{t}{2}\right)}$, $F_{n+1}(t)\ge0$, $F_{n}(t)=F_{n}(-t)$.

5. For every $\delta>0$,
$$\lim_{n\to\infty}\sup\{F_{n+1}(t):\pi\ge|t|\ge\delta\}=0.$$
In fact, for $|t|\ge\delta$,
$$F_{n+1}(t)\le\frac{2}{(n+1)\sin^{2}\left(\frac{\delta}{2}\right)4\pi}.$$
Proof: Part 1.) is obvious because $F_{n+1}$ is the average of functions for which this is true.

Part 2.) is also obvious for the same reason as Part 1.). Part 3.) is obvious because it is true for $D_{n}$ in place of $F_{n+1}$ and then taking the average yields the same sort of sum.

The last statements in 4.) are obvious from the formula, which is the only hard part of 4.).
$$F_{n+1}(t)=\frac{1}{(n+1)\sin\left(\frac{t}{2}\right)2\pi}\sum_{k=0}^{n}\sin\left(\left(k+\frac{1}{2}\right)t\right)=\frac{1}{(n+1)\sin^{2}\left(\frac{t}{2}\right)2\pi}\sum_{k=0}^{n}\sin\left(\left(k+\frac{1}{2}\right)t\right)\sin\left(\frac{t}{2}\right)$$
Using the identity $\sin(a)\sin(b)=\frac{1}{2}\left(\cos(a-b)-\cos(a+b)\right)$ with $a=\left(k+\frac{1}{2}\right)t$ and $b=\frac{t}{2}$, it follows
$$F_{n+1}(t)=\frac{1}{(n+1)\sin^{2}\left(\frac{t}{2}\right)4\pi}\sum_{k=0}^{n}\left(\cos(kt)-\cos((k+1)t)\right)=\frac{1-\cos((n+1)t)}{(n+1)\sin^{2}\left(\frac{t}{2}\right)4\pi}$$
which completes the demonstration of 4.).

Next consider 5.). Since $F_{n+1}$ is even it suffices to show
$$\lim_{n\to\infty}\sup\{F_{n+1}(t):\pi\ge t\ge\delta\}=0$$
For the given $t$, $|t|\ge\delta$,
$$F_{n+1}(t)\le\frac{1-\cos((n+1)t)}{(n+1)\sin^{2}\left(\frac{\delta}{2}\right)4\pi}\le\frac{2}{(n+1)\sin^{2}\left(\frac{\delta}{2}\right)4\pi}$$
which shows 5.).
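The closed form in Part 4 and the nonnegativity of the Fejer kernel can be checked directly from the definitions (an aside, not in the text; sample points are arbitrary):

```python
import cmath
import math

def dirichlet(n, t):
    """D_n(t) = (1/2pi) * sum_{k=-n}^{n} e^{ikt}."""
    return sum(cmath.exp(1j * k * t) for k in range(-n, n + 1)).real / (2 * math.pi)

def fejer_avg(n_plus_1, t):
    """F_{n+1}(t) as the average of D_0, ..., D_n."""
    n = n_plus_1 - 1
    return sum(dirichlet(k, t) for k in range(n + 1)) / (n + 1)

def fejer_closed(n_plus_1, t):
    """Closed form (1 - cos((n+1)t)) / (4pi (n+1) sin^2(t/2))."""
    n = n_plus_1 - 1
    return (1 - math.cos((n + 1) * t)) / (4 * math.pi * (n + 1) * math.sin(t / 2) ** 2)

for m in (2, 5, 8):
    for t in (0.4, 1.3, 2.9, -2.0):
        assert abs(fejer_avg(m, t) - fejer_closed(m, t)) < 1e-10
        assert fejer_avg(m, t) >= -1e-10   # the Fejer kernel is nonnegative
print("Fejer closed form and nonnegativity verified at sample points")
```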
Here is a picture of the Fejer kernels for n = 2, 4, 6.
2 −3
y
t
3 −1
0.25
0.5
1.0
0.0
1 0
0.75
−2
Note how these kernels are nonnegative, unlike the Dirichlet kernels. Also there
is a large bump in the center which gets increasingly large as n gets larger. The
fact these kernels are nonnegative is what is responsible for the superior ability of
the Fejer means to approximate a continuous function.
Theorem 14.7.4 Let $f$ be a continuous and $2\pi$ periodic function. Then
$$\lim_{n\to\infty}\|f-\sigma_{n+1}f\|_{\infty}=0.$$
Proof: Let $\varepsilon>0$ be given. Then by part 2. of Lemma 14.7.3,
$$|f(x)-\sigma_{n+1}f(x)|=\left|\int_{-\pi}^{\pi}f(x)F_{n+1}(y)\,dy-\int_{-\pi}^{\pi}F_{n+1}(y)f(x-y)\,dy\right|$$
$$=\left|\int_{-\pi}^{\pi}(f(x)-f(x-y))F_{n+1}(y)\,dy\right|\le\int_{-\pi}^{\pi}|f(x)-f(x-y)|F_{n+1}(y)\,dy$$
$$=\int_{-\delta}^{\delta}|f(x)-f(x-y)|F_{n+1}(y)\,dy+\int_{\delta}^{\pi}|f(x)-f(x-y)|F_{n+1}(y)\,dy+\int_{-\pi}^{-\delta}|f(x)-f(x-y)|F_{n+1}(y)\,dy$$
Since $F_{n+1}$ is even and $|f|$ is continuous and periodic, hence bounded by some constant $M$, the above is dominated by
$$\le\int_{-\delta}^{\delta}|f(x)-f(x-y)|F_{n+1}(y)\,dy+4M\int_{\delta}^{\pi}F_{n+1}(y)\,dy$$
Now choose $\delta$ such that for all $x$, it follows that if $|y|<\delta$ then
$$|f(x)-f(x-y)|<\varepsilon/2.$$
This can be done because $f$ is uniformly continuous on $[-\pi,\pi]$. Since $f$ is periodic, it must also be uniformly continuous on $\mathbb{R}$. (why?) Therefore, for this $\delta$, this has shown that for all $x$,
$$|f(x)-\sigma_{n+1}f(x)|\le\varepsilon/2+4M\int_{\delta}^{\pi}F_{n+1}(y)\,dy$$
and now by Lemma 14.7.3 it follows
$$\|f-\sigma_{n+1}f\|_{\infty}\le\varepsilon/2+\frac{8M\pi}{(n+1)\sin^{2}\left(\frac{\delta}{2}\right)4\pi}<\varepsilon$$
provided $n$ is large enough.
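Theorem 14.7.4 can be watched in action (an aside, not in the text). For the continuous $2\pi$-periodic function $f(x)=|x|$ on $[-\pi,\pi]$, the closed-form coefficients are $c_{0}=\pi/2$ and $c_{k}=((-1)^{k}-1)/(\pi k^{2})$, and averaging the partial sums gives the standard weighted form $\sigma_{n+1}f(x)=\sum_{k=-n}^{n}\left(1-\frac{|k|}{n+1}\right)c_{k}e^{ikx}$; both facts are used here without rederivation. The sup-norm error over a grid should shrink as $n$ grows.

```python
import math

def fejer_mean_absx(n, x):
    """sigma_{n+1} f(x) for f(x) = |x|, via the weighted coefficient form.
    Uses c_0 = pi/2 and c_k = ((-1)^k - 1)/(pi k^2), with c_{-k} = c_k real,
    so the k and -k terms combine into 2 cos(kx)."""
    total = math.pi / 2
    for k in range(1, n + 1):
        c_k = ((-1) ** k - 1) / (math.pi * k * k)
        total += 2 * (1 - k / (n + 1)) * c_k * math.cos(k * x)
    return total

def sup_error(n, grid=400):
    """Max |sigma_{n+1} f(x) - |x|| over a uniform grid on [-pi, pi]."""
    return max(abs(fejer_mean_absx(n, -math.pi + 2 * math.pi * i / grid)
                   - abs(-math.pi + 2 * math.pi * i / grid))
               for i in range(grid + 1))

e20, e200 = sup_error(20), sup_error(200)
print(e20, e200)   # the uniform error shrinks as n grows
```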
14.7.2 Mean Square Approximation

The partial sums of the Fourier series of $f$ do a better job approximating $f$ in the mean square sense than any other linear combination of the functions $e^{ik\theta}$ for $|k|\le n$. This will be shown next. It is nothing but a simple computation. Recall the Fourier coefficients are
$$a_{k}=\frac{1}{2\pi}\int_{-\pi}^{\pi}f(\theta)e^{-ik\theta}d\theta.$$
Then using this fact as needed, consider the following computation.
$$\int_{-\pi}^{\pi}\left|f(\theta)-\sum_{k=-n}^{n}b_{k}e^{ik\theta}\right|^{2}d\theta=\int_{-\pi}^{\pi}\left(f(\theta)-\sum_{k=-n}^{n}b_{k}e^{ik\theta}\right)\left(\overline{f(\theta)}-\sum_{l=-n}^{n}\overline{b_{l}}e^{-il\theta}\right)d\theta$$
$$=\int_{-\pi}^{\pi}\left[|f(\theta)|^{2}+\left(\sum_{k=-n}^{n}b_{k}e^{ik\theta}\right)\left(\sum_{l=-n}^{n}\overline{b_{l}}e^{-il\theta}\right)-f(\theta)\sum_{l=-n}^{n}\overline{b_{l}}e^{-il\theta}-\overline{f(\theta)}\sum_{k=-n}^{n}b_{k}e^{ik\theta}\right]d\theta$$
$$=\int_{-\pi}^{\pi}|f(\theta)|^{2}d\theta+\sum_{k,l}b_{k}\overline{b_{l}}\int_{-\pi}^{\pi}e^{ik\theta}e^{-il\theta}d\theta-2\pi\sum_{l}\overline{b_{l}}a_{l}-2\pi\sum_{k}b_{k}\overline{a_{k}}$$
Then adding and subtracting $2\pi\sum_{k}|a_{k}|^{2}$,
$$=\int_{-\pi}^{\pi}|f(\theta)|^{2}d\theta-2\pi\sum_{k}|a_{k}|^{2}+2\pi\left[\sum_{k}|b_{k}|^{2}-\sum_{l}\overline{b_{l}}a_{l}-\sum_{k}b_{k}\overline{a_{k}}+\sum_{k}|a_{k}|^{2}\right]$$
$$=\int_{-\pi}^{\pi}|f(\theta)|^{2}d\theta-2\pi\sum_{k}|a_{k}|^{2}+2\pi\sum_{k}(b_{k}-a_{k})\overline{(b_{k}-a_{k})}=\int_{-\pi}^{\pi}|f(\theta)|^{2}d\theta-2\pi\sum_{k}|a_{k}|^{2}+2\pi\sum_{k}|b_{k}-a_{k}|^{2}$$
Therefore, to make
$$\int_{-\pi}^{\pi}\left|f(\theta)-\sum_{k=-n}^{n}b_{k}e^{ik\theta}\right|^{2}d\theta$$
as small as possible for all choices of $b_{k}$, one should let $b_{k}=a_{k}$, the $k$th Fourier coefficient. Stated another way,
$$\int_{-\pi}^{\pi}\left|f(\theta)-\sum_{k=-n}^{n}b_{k}e^{ik\theta}\right|^{2}d\theta\ge\int_{-\pi}^{\pi}|f(\theta)-S_{n}f(\theta)|^{2}d\theta$$
for any choice of $b_{k}$. In particular,
$$\int_{-\pi}^{\pi}|f(\theta)-\sigma_{n+1}f(\theta)|^{2}d\theta\ge\int_{-\pi}^{\pi}|f(\theta)-S_{n}f(\theta)|^{2}d\theta.\qquad(14.7.20)$$
Also,
$$\int_{-\pi}^{\pi}f(\theta)\overline{S_{n}f(\theta)}\,d\theta=\int_{-\pi}^{\pi}\sum_{k=-n}^{n}\overline{a_{k}}f(\theta)e^{-ik\theta}d\theta=\sum_{k=-n}^{n}\overline{a_{k}}\overbrace{\int_{-\pi}^{\pi}f(\theta)e^{-ik\theta}d\theta}^{=2\pi a_{k}}=2\pi\sum_{k=-n}^{n}|a_{k}|^{2}$$
Similarly,
$$\int_{-\pi}^{\pi}\overline{f(\theta)}S_{n}f(\theta)\,d\theta=2\pi\sum_{k=-n}^{n}|a_{k}|^{2}$$
and a simple computation of the above sort shows that also
$$\int_{-\pi}^{\pi}S_{n}f(\theta)\overline{S_{n}f(\theta)}\,d\theta=2\pi\sum_{k=-n}^{n}|a_{k}|^{2}.$$
Therefore,
$$0\le\int_{-\pi}^{\pi}(f(\theta)-S_{n}f(\theta))\overline{(f(\theta)-S_{n}f(\theta))}\,d\theta=\int_{-\pi}^{\pi}|f(\theta)|^{2}+|S_{n}f(\theta)|^{2}-\overline{f(\theta)}S_{n}f(\theta)-f(\theta)\overline{S_{n}f(\theta)}\,d\theta$$
$$=\int_{-\pi}^{\pi}|f(\theta)|^{2}-|S_{n}f(\theta)|^{2}\,d\theta$$
showing
$$2\pi\sum_{k=-n}^{n}|a_{k}|^{2}\le\int_{-\pi}^{\pi}|S_{n}f(\theta)|^{2}d\theta\le\int_{-\pi}^{\pi}|f(\theta)|^{2}d\theta\qquad(14.7.21)$$
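Inequality 14.7.21 (Bessel's inequality, as it is named in the exercises) can be checked numerically (an aside, not in the text) for $f(x)=x$, whose coefficients $a_{k}=(-1)^{k}i/k$ for $k\ne0$, $a_{0}=0$, were found in Example 14.5.3; note $\int_{-\pi}^{\pi}x^{2}dx=2\pi^{3}/3$.

```python
import math

def bessel_lhs(n):
    """2 pi * sum_{k=-n}^{n} |a_k|^2 for f(x) = x, where |a_k|^2 = 1/k^2 for k != 0."""
    return 2 * math.pi * sum(2.0 / k ** 2 for k in range(1, n + 1))

rhs = 2 * math.pi ** 3 / 3      # integral of |f|^2 over [-pi, pi]
for n in (10, 100, 1000):
    lhs = bessel_lhs(n)
    print(n, lhs, rhs)
    assert lhs <= rhs + 1e-12   # Bessel's inequality

# Equality holds in the limit (Parseval), so the gap closes as n grows.
```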
Now it is easy to prove the following fundamental theorem.
Theorem 14.7.5 Let $f\in L^{2}([-\pi,\pi])$ and periodic of period $2\pi$. Then
$$\lim_{n\to\infty}\int_{-\pi}^{\pi}|f-S_{n}f|^{2}dx=0.$$
Proof: First assume $f$ is continuous and $2\pi$ periodic. Then by 14.7.20,
$$\int_{-\pi}^{\pi}|f-S_{n}f|^{2}dx\le\int_{-\pi}^{\pi}|f-\sigma_{n+1}f|^{2}dx\le\int_{-\pi}^{\pi}\|f-\sigma_{n+1}f\|_{\infty}^{2}dx=2\pi\|f-\sigma_{n+1}f\|_{\infty}^{2}$$
and the last expression converges to 0 by Theorem 14.7.4.
Next suppose f is only in L²([−π, π]). By Theorem 12.5.3, there exists a continuous function g which has compact support such that

‖g𝒳_{[−π,π]} − f𝒳_{[−π,π]}‖_{L²} ≤ ‖g − f𝒳_{[−π,π]}‖_{L²(ℝ)} < ε
Since (−π, π) is a countable union of compact sets, without loss of generality it can be assumed that spt(g) ⊆ (−π, π). This is because, letting K_n = [−π + 1/n, π − 1/n], it follows from the dominated convergence theorem that

lim_{n→∞} ‖g𝒳_{[−π,π]} − g𝒳_{K_n}‖_{L²(ℝ)} = 0

and so g𝒳_{[−π,π]} could be replaced with g𝒳_{K_n} if necessary in the above inequality involving ε.
Extending g to be 2π periodic, it follows

‖f − S_n f‖_{L²([−π,π])} ≤ ‖f − σ_{n+1} f‖_{L²([−π,π])}
≤ ‖f − g‖_{L²([−π,π])} + ‖g − σ_{n+1} g‖_{L²([−π,π])} + ‖σ_{n+1}(g − f)‖_{L²([−π,π])}. (14.7.22)
The last term is no larger than

( ∫_{−π}^{π} ( ∫_{−π}^{π} F_n(y) |g(x − y) − f(x − y)| dy )² dx )^{1/2}

≤ ∫_{−π}^{π} F_n(y) ( ∫_{−π}^{π} |g(x − y) − f(x − y)|² dx )^{1/2} dy

= ∫_{−π}^{π} F_n(y) dy · ‖g − f‖_{L²([−π,π])} = ‖g − f‖_{L²([−π,π])}

Therefore, 14.7.22 is no larger than 3ε where ε was arbitrary.
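A numerical sketch of the theorem (the square-wave test function is an assumption of this demo): the L² error of the partial sums S_n f decreases toward 0 even though f is discontinuous.

```python
import numpy as np

def integrate(y, x):
    # composite trapezoid rule on a uniform grid
    return np.sum((y[:-1] + y[1:]) * np.diff(x)) / 2

theta = np.linspace(-np.pi, np.pi, 40001)
f = np.sign(theta)  # discontinuous, but in L^2([-pi, pi])

def l2_err_partial_sum(n):
    ks = range(-n, n + 1)
    a = [integrate(f * np.exp(-1j * k * theta), theta) / (2 * np.pi) for k in ks]
    Snf = sum(ak * np.exp(1j * k * theta) for k, ak in zip(ks, a))
    return integrate(np.abs(f - Snf) ** 2, theta).real

errs = [l2_err_partial_sum(n) for n in (2, 8, 32)]
assert errs[0] > errs[1] > errs[2]  # the L^2 error shrinks as n grows
```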
14.8. EXERCISES 353
14.8 Exercises
1. ♠ Suppose f has infinitely many derivatives and is also periodic with period 2π. Let the Fourier series of f be

Σ_{k=−∞}^{∞} a_k e^{ikθ}

Show that

lim_{k→∞} k^m a_k = lim_{k→∞} k^m a_{−k} = 0

for every m ∈ ℕ.
2. Let f be a continuous function defined on [−π, π]. Show there exists a polynomial p such that ‖p − f‖ < ε where

‖g‖ ≡ sup {|g(x)| : x ∈ [−π, π]}.

Extend this result to an arbitrary interval. This is another approach to the Weierstrass approximation theorem. Hint: First find a linear function, ax + b = y, such that f − y has the property that it has the same value at both ends of [−π, π]. Therefore, you may consider this as the restriction to [−π, π] of a continuous periodic function F. Now find a trig polynomial

σ(x) ≡ a_0 + Σ_{k=1}^{n} (a_k cos kx + b_k sin kx)

such that ‖σ − F‖ < ε/3. Recall 14.1.4. Now consider the power series of the trig functions, making use of the error estimate for the remainder after m terms.
3. ♠ The inequality established above,

2π Σ_{k=−n}^{n} |a_k|² ≤ ∫_{−π}^{π} |S_n f(θ)|² dθ ≤ ∫_{−π}^{π} |f(θ)|² dθ,

is called Bessel's inequality. Use this inequality to give an easy proof that for all f ∈ L²([−π, π]),

lim_{n→∞} ∫_{−π}^{π} f(x) e^{inx} dx = 0.

Recall that in the Riemann Lebesgue lemma |f| ∈ L¹((a, b]), so while this exercise is easier, it lacks the generality of the earlier proof. Explain why this is less general.
4. ♠ Let f(x) = x for x ∈ (−π, π) and extend to make the resulting function defined on ℝ and periodic of period 2π. Find the Fourier series of f. Verify the Fourier series converges to the midpoint of the jump and use this series to find a nice formula for π/4. Hint: For the last part consider x = π/2.
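A hedged numerical check of where this exercise leads (the coefficients 2(−1)^{n+1}/n are the series one finds, quoted here only so the resulting Leibniz formula can be verified):

```python
import math

# At x = pi/2 the series sum 2(-1)^{n+1} sin(n x)/n should reproduce x,
# which rearranges to pi/4 = 1 - 1/3 + 1/5 - ...
x = math.pi / 2
partial = sum(2 * (-1) ** (n + 1) / n * math.sin(n * x)
              for n in range(1, 200001))
assert abs(partial - x) < 1e-4

# the same fact written directly as Leibniz's series
leibniz = sum((-1) ** k / (2 * k + 1) for k in range(100000))
assert abs(leibniz - math.pi / 4) < 1e-4
```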
5. ♠ Let f(x) = x² on (−π, π) and extend to form a 2π periodic function defined on ℝ. Find the Fourier series of f. Now obtain a famous formula for π²/6 by letting x = π.
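A sketch of the punchline (the expansion x² = π²/3 + Σ 4(−1)ⁿ cos(nx)/n² is the answer to the exercise, assumed here only to check it numerically at x = π, where it yields Σ 1/n² = π²/6):

```python
import math

# Basel sum checked directly
basel = sum(1.0 / n ** 2 for n in range(1, 100001))
assert abs(basel - math.pi ** 2 / 6) < 1e-4

# the Fourier series of x^2 evaluated at x = pi should give pi^2
rhs_at_pi = math.pi ** 2 / 3 + sum(
    4 * (-1) ** n * math.cos(n * math.pi) / n ** 2 for n in range(1, 10001))
assert abs(rhs_at_pi - math.pi ** 2) < 1e-3
```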
6. Let f(x) = cos x for x ∈ (0, π) and define f(x) ≡ −cos x for x ∈ (−π, 0). Now extend this function to make it 2π periodic. Find the Fourier series of f.
7. ♠ Suppose f is piecewise continuous and 2π periodic on ℝ. This means the function is periodic and it is continuous on [−π, π] except for finitely many points. Also suppose that at every point, f(x+) and f(x−) exist. Show that the Fejer means σ_n f(x) → (f(x+) + f(x−))/2. Thus there is no requirement of smoothness of f from one side at all and you still get convergence to the midpoint of the jump.
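Fejer-mean convergence at a jump can be watched numerically. A sketch under assumptions: the square wave is the test function, and σ_n is computed via the equivalent weighted form σ_n f = Σ_{|k|<n} (1 − |k|/n) a_k e^{ikx}.

```python
import numpy as np

def integrate(y, x):
    # composite trapezoid rule on a uniform grid
    return np.sum((y[:-1] + y[1:]) * np.diff(x)) / 2

theta = np.linspace(-np.pi, np.pi, 40001)
f = np.where(theta >= 0, 1.0, -1.0)  # jump at 0: f(0+) = 1, f(0-) = -1

def fejer_mean_at(x, n):
    a = {k: integrate(f * np.exp(-1j * k * theta), theta) / (2 * np.pi)
         for k in range(-n, n + 1)}
    return sum((1 - abs(k) / n) * a[k] * np.exp(1j * k * x)
               for k in range(-n, n + 1)).real

# at the jump, sigma_n f(0) should approach the midpoint (1 + (-1))/2 = 0
vals = [abs(fejer_mean_at(0.0, n)) for n in (4, 16, 64)]
assert vals[-1] < 0.05
```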
8. ♠ Suppose f, g ∈ L²([−π, π]). Show

(1/2π) ∫_{−π}^{π} f \overline{g} dx = Σ_{k=−∞}^{∞} α_k \overline{β_k},

where the α_k are the Fourier coefficients of f and the β_k are the Fourier coefficients of g.
9. Recall the partial summation formula, called the Dirichlet formula, which says that

Σ_{k=p}^{q} a_k b_k = A_q b_q − A_{p−1} b_p + Σ_{k=p}^{q−1} A_k (b_k − b_{k+1}).

Here A_q ≡ Σ_{k=1}^{q} a_k. Also recall Dirichlet's test which says that if lim_{k→∞} b_k = 0, the A_k are bounded, and Σ |b_k − b_{k+1}| converges, then Σ a_k b_k converges. Show the partial sums of Σ_k sin kx are bounded for each x ∈ ℝ. Using this fact and the Dirichlet test above, obtain some theorems which will state that Σ_k a_k sin kx converges for all x.
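The boundedness of the partial sums of Σ sin kx can be spot-checked against the closed-form bound |Σ_{k=1}^{N} sin(kx)| ≤ 1/|sin(x/2)| (the test point x = 0.7 and the range of N are choices of this sketch):

```python
import math

x = 0.7
bound = 1.0 / abs(math.sin(x / 2))  # bound independent of N

s = 0.0
for N in range(1, 5001):
    s += math.sin(N * x)
    # every partial sum stays inside the N-independent bound
    assert abs(s) <= bound + 1e-9
```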
10. Let {a_n} be a sequence of positive numbers having the property that

lim_{n→∞} n a_n = 0

and for all n ∈ ℕ, n a_n ≥ (n + 1) a_{n+1}. Show that if this is so, it follows that the series Σ_{n=1}^{∞} a_n sin nx converges uniformly on ℝ. This is a variation of a very interesting problem found in Apostol's book, [3]. Hint: Use the Dirichlet formula of Problem 9 on Σ k a_k (sin kx)/k and show the partial sums of Σ (sin kx)/k are bounded independent of x. To do this, you might argue the maximum value of the partial sums of this series occur when Σ_{k=1}^{n} cos kx = 0. Sum this series by considering the real part of the geometric series Σ_{k=1}^{q} (e^{ix})^k and then show the partial sums of Σ (sin kx)/k are Riemann sums for a certain finite integral.
11. The problem in Apostol's book mentioned in Problem 10 does not require n a_n to be decreasing and is as follows. Let {a_k}_{k=1}^{∞} be a decreasing sequence of nonnegative numbers which satisfies lim_{n→∞} n a_n = 0. Then

Σ_{k=1}^{∞} a_k sin(kx)

converges uniformly on ℝ. You can find this problem worked out completely in Jones [30]. Fill in the details to the following argument or something like it to obtain a proof. First show that for p < q, and x ∈ (0, π),

| Σ_{k=p}^{q} a_k sin(kx) | ≤ 3 a_p csc(x/2). (14.8.23)

To do this, use summation by parts using the formula

Σ_{k=p}^{q} sin(kx) = [ cos((p − 1/2)x) − cos((q + 1/2)x) ] / (2 sin(x/2)),

which you can establish by taking the imaginary part of a geometric series of the form Σ_{k=1}^{q} (e^{ix})^k or else the approach used above to find a formula for the Dirichlet kernel. Now define

b(p) ≡ sup {n a_n : n ≥ p}.

Thus b(p) → 0, b(p) is decreasing in p, and if k ≥ n, a_k ≤ b(n)/k. Then from 14.8.23 and the assumption that {a_k} is decreasing,

| Σ_{k=p}^{q} a_k sin(kx) | ≤ | Σ_{k=p}^{m} a_k sin(kx) | + | Σ_{k=m+1}^{q} a_k sin(kx) |   (the second term being 0 if m = q)

≤ Σ_{k=p}^{m} (b(k)/k) |sin(kx)| + 3 a_p csc(x/2)   if m < q,
≤ Σ_{k=p}^{q} (b(k)/k) |sin(kx)|   if m = q

≤ Σ_{k=p}^{m} (b(k)/k) kx + 3 a_p (2π/x)   if m < q,
≤ Σ_{k=p}^{q} (b(k)/k) kx   if m = q (14.8.24)

where this uses the inequalities sin(x/2) ≥ x/(2π) and |sin(x)| ≤ |x| for x ∈ (0, π).

There are two cases to consider depending on whether x ≤ 1/q. First suppose that x ≤ 1/q. Then let m = q and use the bottom line of 14.8.24 to write that in this case,

| Σ_{k=p}^{q} a_k sin(kx) | ≤ (1/q) Σ_{k=p}^{q} b(k) ≤ b(p).

If x > 1/q, then q > 1/x and you use the top line of 14.8.24, picking m such that

q > 1/x ≥ m ≥ 1/x − 1.

Then in this case,

| Σ_{k=p}^{q} a_k sin(kx) | ≤ Σ_{k=p}^{m} (b(k)/k) kx + 3 a_p (2π/x)
≤ b(p) x (m − p) + 6π a_p (m + 1)
≤ b(p) x (1/x) + 6π (b(p)/(m + 1)) (m + 1) ≤ 25 b(p).

Therefore, the partial sums of the series Σ a_k sin kx form a uniformly Cauchy sequence and must converge uniformly on (0, π). Now explain why this implies the series converges uniformly on ℝ.
12. ♠ Suppose f(x) = Σ_{k=1}^{∞} a_k sin kx and that the convergence is uniform. Recall something like this holds for power series. Is it reasonable to suppose that f′(x) = Σ_{k=1}^{∞} a_k k cos kx? Explain.
13. ♠ Suppose |u_k(x)| ≤ K_k for all x ∈ D where

Σ_{k=−∞}^{∞} K_k = lim_{n→∞} Σ_{k=−n}^{n} K_k < ∞.

Show that Σ_{k=−∞}^{∞} u_k(x) converges uniformly on D in the sense that for all ε > 0, there exists N such that whenever n > N,

| Σ_{k=−∞}^{∞} u_k(x) − Σ_{k=−n}^{n} u_k(x) | < ε

for all x ∈ D. This is called the Weierstrass M test.
14. ♠ Suppose f is a differentiable function of period 2π and suppose that both f and f′ are in L²([−π, π]) such that for all x ∈ (−π, π) and y sufficiently small,

f(x + y) − f(x) = ∫_{x}^{x+y} f′(t) dt.

Show that the Fourier series of f converges uniformly to f. Hint: First show using the Dini criterion that S_n f(x) → f(x) for all x. Next let Σ_{k=−∞}^{∞} a_k e^{ikx} be the Fourier series for f. Then from the definition of a_k, show that for k ≠ 0, a_k = (1/(ik)) a′_k where a′_k is the Fourier coefficient of f′. Now use Bessel's inequality to argue that Σ_{k=−∞}^{∞} |a′_k|² < ∞ and then show this implies Σ |a_k| < ∞. You might want to use the Cauchy Schwarz inequality to do this part. Then using the version of the Weierstrass M test given in Problem 13, obtain uniform convergence of the Fourier series to f.
15. Let f be a function defined on ℝ. Then f is even if f(θ) = f(−θ) for all θ ∈ ℝ. Also f is called odd if for all θ ∈ ℝ, −f(θ) = f(−θ). Now using the Weierstrass approximation theorem show directly that if h is a continuous even 2π periodic function, then for every ε > 0 there exists an m and constants a_0, …, a_m such that

| h(θ) − Σ_{k=0}^{m} a_k cos^k(θ) | < ε

for all θ ∈ ℝ. Hint: Note the function arccos is continuous and maps [−1, 1] onto [0, π]. Using this, show you can define a continuous function g on [−1, 1] by g(cos θ) = h(θ) for θ on [0, π]. Now use the Weierstrass approximation theorem on [−1, 1].
16. ♠ Show that if f is any odd 2π periodic function, then its Fourier series can be simplified to an expression of the form

Σ_{n=1}^{∞} b_n sin(nx)

and also f(mπ) = 0 for all m ∈ ℕ.
17. ♠ Consider the symbol Σ_{k=1}^{∞} a_k. The infinite sum might not converge. Summability methods are systematic ways of assigning a number to such a symbol. The n^{th} Cesàro mean σ_n is defined as the average of the first n partial sums of the series. Thus

σ_n ≡ (1/n) Σ_{k=1}^{n} S_k where S_k ≡ Σ_{j=1}^{k} a_j.

Show that if Σ_{k=1}^{∞} a_k converges, then lim_{n→∞} σ_n also exists and equals the same thing. Next find an example where, although Σ_{k=1}^{∞} a_k fails to converge, lim_{n→∞} σ_n does exist. This summability method is called Cesàro summability. Recall the Fejer means were obtained in just this way.
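Grandi's series 1 − 1 + 1 − 1 + ⋯ is the classical example asked for here; a quick sketch checks that its partial sums oscillate while the Cesàro means converge to 1/2:

```python
a = [(-1) ** k for k in range(10000)]  # a_0 = 1, a_1 = -1, ...

S, partials = 0, []
for term in a:
    S += term
    partials.append(S)

# sigma_n = average of the first n partial sums
cesaro = [sum(partials[:n]) / n for n in (10, 100, 1000, 10000)]

assert set(partials) == {0, 1}        # partial sums oscillate, no limit
assert abs(cesaro[-1] - 0.5) < 1e-3   # Cesaro means settle at 1/2
```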
18. ♠ Let 0 < r < 1 and for f a continuous periodic function of period 2π consider

A_r f(θ) ≡ Σ_{k=−∞}^{∞} r^{|k|} a_k e^{ikθ}

where the a_k are the Fourier coefficients of f. Show that

lim_{r→1−} A_r f(θ) = f(θ).

Hint: You need to find a kernel and write A_r f as the integral of the kernel convolved with f. Then consider properties of this kernel as was done with the Fejer kernel.
19. Recall the Dirichlet kernel is

D_n(t) ≡ (2π)^{−1} sin((n + 1/2)t) / sin(t/2)

and it has the property that ∫_{−π}^{π} D_n(t) dt = 1. Show first that this implies

(1/2π) ∫_{−π}^{π} [sin(nt) cos(t/2) / sin(t/2)] dt = 1

and this implies

(1/π) ∫_{0}^{π} [sin(nt) cos(t/2) / sin(t/2)] dt = 1.

Next change the variable to show the integral equals

(1/π) ∫_{0}^{nπ} [sin(u) cos(u/2n) / sin(u/2n)] (1/n) du.

Now show that

lim_{n→∞} [sin(u) cos(u/2n) / sin(u/2n)] (1/n) = 2 sin u / u.

Next show that

lim_{n→∞} (1/π) ∫_{0}^{nπ} (2 sin u / u) du = lim_{n→∞} (1/π) ∫_{0}^{nπ} [sin(u) cos(u/2n) / sin(u/2n)] (1/n) du = 1.

Finally show

lim_{R→∞} ∫_{0}^{R} (sin u / u) du = π/2 ≡ ∫_{0}^{∞} (sin u / u) du.

This is a very important improper integral.
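A numeric sketch of the final limit (the midpoint rule and the cutoff R = 400 are choices of this demo; the tail of the integral beyond R is of size roughly 1/R):

```python
import math

def sine_integral(R, steps=200000):
    """Midpoint-rule approximation of integral_0^R sin(u)/u du."""
    h = R / steps
    return sum(math.sin((i + 0.5) * h) / ((i + 0.5) * h)
               for i in range(steps)) * h

approx = sine_integral(400.0)
assert abs(approx - math.pi / 2) < 0.01  # converging to pi/2
```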
20. Recall the Fourier series of a function in L²(−π, π) converges to the function in L²(−π, π). Prove a similar theorem with L²(−π, π) replaced by L²(−mπ, mπ) and the functions

{ (2π)^{−1/2} e^{inx} }_{n∈ℤ}

used in the Fourier series replaced with

{ (2mπ)^{−1/2} e^{i(n/m)x} }_{n∈ℤ}.

Now suppose f is a function in L²(ℝ) satisfying Ff(t) = 0 if |t| > mπ. Show that if this is so, then

f(x) = (1/π) Σ_{n∈ℤ} f(−n/m) [ sin(π(mx + n)) / (mx + n) ].

Here m is a positive integer. This is sometimes called the Shannon sampling theorem. Hint: First note that since Ff ∈ L² and is zero off a finite interval, it follows Ff ∈ L¹. Also

f(t) = (1/√(2π)) ∫_{−mπ}^{mπ} e^{itx} Ff(x) dx

and you can conclude from this that f has all derivatives and they are all bounded. Thus f is a very nice function. You can replace Ff with its Fourier series. Then consider carefully the Fourier coefficient of Ff. Argue it equals f(−n/m), or at least an appropriate constant times this. When you get this, the rest will fall quickly into place if you use that Ff is zero off [−mπ, mπ].
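A sketch under assumptions: f(x) = (sin(πx)/(πx))² has Fourier transform supported in [−2π, 2π], so the sampling formula should hold with m = 2. The truncation range and test point are choices of this demo.

```python
import math

def sinc(t):
    # sin(pi t)/(pi t), with the removable singularity filled in
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)

def f(x):
    return sinc(x) ** 2  # band-limited to [-2*pi, 2*pi]

x, m = 0.3, 2
# (1/pi) sum f(-n/m) sin(pi(mx+n))/(mx+n) = sum f(-n/m) sinc(mx+n)
recon = sum(f(-n / m) * sinc(m * x + n) for n in range(-4000, 4001))
assert abs(recon - f(x)) < 1e-4  # samples at spacing 1/m recover f
```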
21. Show that neither the Jordan nor the Dini criterion for pointwise convergence implies the other criterion. That is, find an example of a function for which Jordan's condition implies pointwise convergence but not Dini's, and then find a function for which Dini works but Jordan does not. Hint: You might try considering something like y = [ln(x)]^{−1} for x > 0, x near 0, to get something for which Jordan works but Dini does not. For the other part, try something like x sin(1/x).
Part III
Further Topics
Metric Spaces And General
Topological Spaces
15.1 Metric Space
Definition 15.1.1 A metric space is a set X and a function d : X × X → [0, ∞) which satisfies the following properties.

d(x, y) = d(y, x)
d(x, y) ≥ 0 and d(x, y) = 0 if and only if x = y
d(x, y) ≤ d(x, z) + d(z, y).
You can check that ℝⁿ and ℂⁿ are metric spaces with d(x, y) = |x − y|. However, there are many others. The definitions of open and closed sets are the same for a metric space as they are for ℝⁿ.
Definition 15.1.2 A set U in a metric space is open if whenever x ∈ U, there exists r > 0 such that B(x, r) ⊆ U. As before, B(x, r) ≡ {y : d(x, y) < r}. Closed sets are those whose complements are open. A point p is a limit point of a set S if for every r > 0, B(p, r) contains infinitely many points of S. A sequence {x_n} converges to a point x if for every ε > 0 there exists N such that if n ≥ N, then d(x, x_n) < ε. {x_n} is a Cauchy sequence if for every ε > 0 there exists N such that if m, n ≥ N, then d(x_n, x_m) < ε.
Lemma 15.1.3 In a metric space X, every ball B(x, r) is open. A set is closed if and only if it contains all its limit points. If p is a limit point of S, then there exists a sequence of distinct points of S, {x_n}, such that lim_{n→∞} x_n = p.
Proof: Let z ∈ B(x, r). Let δ = r −d (x, z) . Then if w ∈ B(z, δ) ,
d (w, x) ≤ d (x, z) +d (z, w) < d (x, z) +r −d (x, z) = r.
Therefore, B(z, δ) ⊆ B(x, r) and this shows B(x, r) is open.
The properties of balls are presented in the following theorem.
364 METRIC SPACES AND GENERAL TOPOLOGICAL SPACES
Theorem 15.1.4 Suppose (X, d) is a metric space. Then the sets {B(x, r) : r > 0, x ∈ X} satisfy

∪ {B(x, r) : r > 0, x ∈ X} = X (15.1.1)

If p ∈ B(x, r_1) ∩ B(z, r_2), there exists r > 0 such that

B(p, r) ⊆ B(x, r_1) ∩ B(z, r_2). (15.1.2)
Proof: Observe that the union of these balls includes the whole space X, so 15.1.1 is obvious. Consider 15.1.2. Let p ∈ B(x, r_1) ∩ B(z, r_2). Consider

r ≡ min (r_1 − d(x, p), r_2 − d(z, p))

and suppose y ∈ B(p, r). Then

d(y, x) ≤ d(y, p) + d(p, x) < r_1 − d(x, p) + d(x, p) = r_1

and so B(p, r) ⊆ B(x, r_1). By similar reasoning, B(p, r) ⊆ B(z, r_2). This proves the theorem.
Let K be a closed set. This means K^C ≡ X \ K is an open set. Let p be a limit point of K. If p ∈ K^C, then since K^C is open, there exists B(p, r) ⊆ K^C. But this contradicts p being a limit point because there are no points of K in this ball. Hence all limit points of K must be in K.

Suppose next that K contains its limit points. Is K^C open? Let p ∈ K^C. Then p is not a limit point of K. Therefore, there exists B(p, r) which contains at most finitely many points of K. Since p ∉ K, it follows that by making r smaller if necessary, B(p, r) contains no points of K. That is, B(p, r) ⊆ K^C, showing K^C is open. Therefore, K is closed.

Suppose now that p is a limit point of S. Let x_1 ∈ (S \ {p}) ∩ B(p, 1). If x_1, …, x_k have been chosen, let

r_{k+1} ≡ min ( d(p, x_i), i = 1, …, k, 1/(k + 1) ).

Let x_{k+1} ∈ (S \ {p}) ∩ B(p, r_{k+1}). This proves the lemma.
Lemma 15.1.5 If {x_n} is a Cauchy sequence in a metric space X and if some subsequence {x_{n_k}} converges to x, then {x_n} converges to x. Also if a sequence converges, then it is a Cauchy sequence.

Proof: Note first that n_k ≥ k because in a subsequence, the indices n_1, n_2, … are strictly increasing. Let ε > 0 be given and let N be such that for k > N, d(x, x_{n_k}) < ε/2 and for m, n ≥ N, d(x_m, x_n) < ε/2. Pick k > N. Then if n > N,

d(x_n, x) ≤ d(x_n, x_{n_k}) + d(x_{n_k}, x) < ε/2 + ε/2 = ε.

Finally, suppose lim_{n→∞} x_n = x. Then there exists N such that if n > N, then d(x_n, x) < ε/2. It follows that for m, n > N,

d(x_n, x_m) ≤ d(x_n, x) + d(x, x_m) < ε/2 + ε/2 = ε.

This proves the lemma.
15.2. COMPACTNESS IN METRIC SPACE 365
15.2 Compactness In Metric Space
Many existence theorems in analysis depend on some set being compact. Therefore,
it is important to be able to identify compact sets. The purpose of this section is
to describe compact sets in a metric space.
Definition 15.2.1 Let A be a subset of X. A is compact if whenever A is contained in the union of a set of open sets, there exist finitely many of these open sets whose union contains A. (Every open cover admits a finite subcover.) A is "sequentially compact" means every sequence has a convergent subsequence converging to an element of A.
In a metric space compact is not the same as closed and bounded!
Example 15.2.2 Let X be any infinite set and define d(x, y) = 1 if x ≠ y while d(x, y) = 0 if x = y.
You should verify the details that this is a metric space because it satisfies the axioms of a metric. The set X is closed and bounded because its complement is ∅, which is clearly open because every point of ∅ is an interior point. (There are none.) Also X is bounded because X = B(x, 2). However, X is clearly not compact because {B(x, 1/2) : x ∈ X} is a collection of open sets whose union contains X, but since they are all disjoint and nonempty, there is no finite subset of these whose union contains X. In fact B(x, 1/2) = {x}.
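A small computational sketch of this example (a finite sample of points stands in for the infinite set X; the axiom check is a spot check, not a proof):

```python
import itertools

X = list(range(50))

def d(x, y):
    # the discrete metric of Example 15.2.2
    return 0 if x == y else 1

# spot-check the three metric axioms on a sample of triples
for x, y, z in itertools.product(X[:10], repeat=3):
    assert d(x, y) == d(y, x)
    assert d(x, y) >= 0 and (d(x, y) == 0) == (x == y)
    assert d(x, y) <= d(x, z) + d(z, y)

def ball(x, r):
    return {y for y in X if d(x, y) < r}

assert ball(7, 0.5) == {7}   # each ball of radius 1/2 is a single point
assert ball(7, 2) == set(X)  # X = B(x, 2), so X is bounded
```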
From this example it is clear something more than closed and bounded is needed.
If you are not familiar with the issues just discussed, ignore them and continue.
Definition 15.2.3 In any metric space, a set E is totally bounded if for every ε > 0 there exists a finite set of points {x_1, …, x_n} such that

E ⊆ ∪_{i=1}^{n} B(x_i, ε).

This finite set of points is called an ε net.
The following proposition tells which sets in a metric space are compact. First
here is an interesting lemma.
Lemma 15.2.4 Let X be a metric space and suppose D is a countable dense subset of X. In other words, it is being assumed X is a separable metric space. Consider the open sets of the form B(d, r) where r is a positive rational number and d ∈ D. Denote this countable collection of open sets by ℬ. Then every open set is the union of sets of ℬ. Furthermore, if 𝒞 is any collection of open sets, there exists a countable subset {U_n} ⊆ 𝒞 such that ∪_n U_n = ∪𝒞.
Proof: Let U be an open set and let x ∈ U. Let B(x, δ) ⊆ U. Then by density of D, there exists d ∈ D ∩ B(x, δ/4). Now pick r ∈ ℚ ∩ (δ/4, 3δ/4) and consider B(d, r). Clearly, B(d, r) contains the point x because r > δ/4. Is B(d, r) ⊆ B(x, δ)? If so, this proves the lemma because x was an arbitrary point of U. Suppose z ∈ B(d, r). Then

d(z, x) ≤ d(z, d) + d(d, x) < r + δ/4 < 3δ/4 + δ/4 = δ.

Now let 𝒞 be any collection of open sets. Each set in this collection is the union of countably many sets of ℬ. Let ℬ′ denote the sets of ℬ which are contained in some set of 𝒞. Thus ∪ℬ′ = ∪𝒞. Then for each B ∈ ℬ′, pick U_B ∈ 𝒞 such that B ⊆ U_B. Then {U_B : B ∈ ℬ′} is a countable collection of sets of 𝒞 whose union equals ∪𝒞. Therefore, this proves the lemma.
Proposition 15.2.5 Let (X, d) be a metric space. Then the following are equivalent.

(X, d) is compact, (15.2.3)
(X, d) is sequentially compact, (15.2.4)
(X, d) is complete and totally bounded. (15.2.5)
Proof: Suppose 15.2.3 and let {x_k} be a sequence. Suppose {x_k} has no convergent subsequence. If this is so, then no value of the sequence is repeated more than finitely many times. Also {x_k} has no limit point, because if it did, there would exist a subsequence which converges. To see this, suppose p is a limit point of {x_k}. Then in B(p, 1) there are infinitely many points of {x_k}. Pick one called x_{k_1}. Now if x_{k_1}, x_{k_2}, …, x_{k_n} have been picked with x_{k_i} ∈ B(p, 1/i), consider B(p, 1/(n + 1)). There are infinitely many points of {x_k} in this ball also. Pick x_{k_{n+1}} such that k_{n+1} > k_n. Then {x_{k_n}}_{n=1}^{∞} is a subsequence which converges to p, and it is assumed this does not happen. Thus {x_k} has no limit points. It follows the set

C_n = {x_k : k ≥ n}

is a closed set because it has no limit points, and if

U_n = C_n^C,

then

X = ∪_{n=1}^{∞} U_n

but there is no finite subcovering, because no value of the sequence is repeated more than finitely many times. Note x_k is not in U_n whenever k > n. This contradicts compactness of (X, d). This shows 15.2.3 implies 15.2.4.
Now suppose 15.2.4 and let {x_n} be a Cauchy sequence. Is {x_n} convergent? By sequential compactness, x_{n_k} → x for some subsequence. By Lemma 15.1.5 it follows that {x_n} also converges to x, showing that (X, d) is complete. If (X, d) is not totally bounded, then there exists ε > 0 for which there is no ε net. Hence there exists a sequence {x_k} with d(x_k, x_l) ≥ ε for all l ≠ k. By Lemma 15.1.5 again, this contradicts 15.2.4 because no subsequence can be a Cauchy sequence and so no subsequence can converge. This shows 15.2.4 implies 15.2.5.
Now suppose 15.2.5. What about 15.2.4? Let {p_n} be a sequence and let {x_i^n}_{i=1}^{m_n} be a 2^{−n} net for n = 1, 2, …. Let

B_n ≡ B(x_{i_n}^n, 2^{−n})

be such that B_n contains p_k for infinitely many values of k and B_n ∩ B_{n+1} ≠ ∅. To do this, suppose B_n contains p_k for infinitely many values of k. Then one of the sets which intersect B_n, B(x_i^{n+1}, 2^{−(n+1)}), must contain p_k for infinitely many values of k, because all these indices of points from {p_n} contained in B_n must be accounted for in one of finitely many sets B(x_i^{n+1}, 2^{−(n+1)}). Thus there exists a strictly increasing sequence of integers n_k such that

p_{n_k} ∈ B_k.

Then if k ≥ l,

d(p_{n_k}, p_{n_l}) ≤ Σ_{i=l}^{k−1} d(p_{n_{i+1}}, p_{n_i}) < Σ_{i=l}^{k−1} 2^{−(i−1)} < 2^{−(l−2)}.

Consequently {p_{n_k}} is a Cauchy sequence. Hence it converges because the metric space is complete. This proves 15.2.4.
Now suppose 15.2.4 and 15.2.5, which have now been shown to be equivalent. Let D_n be a n^{−1} net for n = 1, 2, … and let

D = ∪_{n=1}^{∞} D_n.

Thus D is a countable dense subset of (X, d).

Now let 𝒞 be any set of open sets such that ∪𝒞 ⊇ X. By Lemma 15.2.4, there exists a countable subset of 𝒞,

\overline{𝒞} = {U_n}_{n=1}^{∞},

such that ∪\overline{𝒞} = ∪𝒞. If 𝒞 admits no finite subcover, then neither does \overline{𝒞}, and there exists p_n ∈ X \ ∪_{k=1}^{n} U_k. Then since X is sequentially compact, there is a subsequence {p_{n_k}} such that {p_{n_k}} converges. Say

p = lim_{k→∞} p_{n_k}.

All but finitely many points of {p_{n_k}} are in X \ ∪_{k=1}^{n} U_k. Therefore p ∈ X \ ∪_{k=1}^{n} U_k for each n. Hence

p ∉ ∪_{k=1}^{∞} U_k,

contradicting the construction of {U_n}_{n=1}^{∞} which required that ∪_{n=1}^{∞} U_n ⊇ X. Hence X is compact. This proves the proposition.
Consider ℝⁿ. In this setting totally bounded and bounded are the same. This will yield a proof of the Heine Borel theorem from advanced calculus.
Lemma 15.2.6 A subset of ℝⁿ is totally bounded if and only if it is bounded.
Proof: Let A be totally bounded. Is it bounded? Let x_1, …, x_p be a 1 net for A. Now consider the ball B(0, r + 1) where r > max(|x_i| : i = 1, …, p). If z ∈ A, then z ∈ B(x_j, 1) for some j and so by the triangle inequality,

|z − 0| ≤ |z − x_j| + |x_j| < 1 + r.

Thus A ⊆ B(0, r + 1) and so A is bounded.

Now suppose A is bounded and suppose A is not totally bounded. Then there exists ε > 0 such that there is no ε net for A. Therefore, there exists a sequence of points {a_i} with |a_i − a_j| ≥ ε if i ≠ j. Since A is bounded, there exists r > 0 such that

A ⊆ [−r, r)ⁿ.

(x ∈ [−r, r)ⁿ means x_i ∈ [−r, r) for each i.) Now define 𝒮 to be all cubes of the form

Π_{k=1}^{n} [a_k, b_k)

where

a_k = −r + i 2^{−p} r, b_k = −r + (i + 1) 2^{−p} r,

for i ∈ {0, 1, …, 2^{p+1} − 1}. Thus 𝒮 is a collection of (2^{p+1})ⁿ non overlapping cubes whose union equals [−r, r)ⁿ and whose diameters are all equal to 2^{−p} r √n. Now choose p large enough that the diameter of these cubes is less than ε. This yields a contradiction because one of the cubes must contain infinitely many points of {a_i}. This proves the lemma.
The next theorem is called the Heine Borel theorem and it characterizes the compact sets in ℝⁿ.

Theorem 15.2.7 A subset of ℝⁿ is compact if and only if it is closed and bounded.

Proof: Since a set in ℝⁿ is totally bounded if and only if it is bounded, this theorem follows from Proposition 15.2.5 and the observation that a subset of ℝⁿ is closed if and only if it is complete. This proves the theorem.
15.3 Some Applications Of Compactness
The following corollary is an important existence theorem which depends on compactness.

Corollary 15.3.1 Let X be a compact metric space and let f : X → ℝ be continuous. Then max{f(x) : x ∈ X} and min{f(x) : x ∈ X} both exist.
15.3. SOME APPLICATIONS OF COMPACTNESS 369
Proof: First it is shown f(X) is compact. Suppose 𝒞 is a set of open sets whose union contains f(X). Then since f is continuous, f^{−1}(U) is open for all U ∈ 𝒞. Therefore, {f^{−1}(U) : U ∈ 𝒞} is a collection of open sets whose union contains X. Since X is compact, it follows finitely many of these, f^{−1}(U_1), …, f^{−1}(U_p), contain X in their union. Therefore, f(X) ⊆ ∪_{k=1}^{p} U_k, showing f(X) is compact as claimed.

Now since f(X) is compact, Theorem 15.2.7 implies f(X) is closed and bounded. Therefore, it contains its inf and its sup. Thus f achieves both a maximum and a minimum.
Definition 15.3.2 Let X, Y be metric spaces and f : X → Y a function. f is uniformly continuous if for all ε > 0 there exists δ > 0 such that whenever x_1 and x_2 are two points of X satisfying d(x_1, x_2) < δ, it follows that d(f(x_1), f(x_2)) < ε.
A very important theorem is the following.
Theorem 15.3.3 Suppose f : X → Y is continuous and X is compact. Then f is uniformly continuous.

Proof: Suppose this is not true and that f is continuous but not uniformly continuous. Then there exists ε > 0 such that for all δ > 0 there exist points p_δ and q_δ such that d(p_δ, q_δ) < δ and yet d(f(p_δ), f(q_δ)) ≥ ε. Let p_n and q_n be the points which go with δ = 1/n. By Proposition 15.2.5, {p_n} has a convergent subsequence {p_{n_k}} converging to a point x ∈ X. Since d(p_n, q_n) < 1/n, it follows that q_{n_k} → x also. Therefore,

ε ≤ d(f(p_{n_k}), f(q_{n_k})) ≤ d(f(p_{n_k}), f(x)) + d(f(x), f(q_{n_k}))

but by continuity of f, both d(f(p_{n_k}), f(x)) and d(f(x), f(q_{n_k})) converge to 0 as k → ∞, contradicting the above inequality. This proves the theorem.
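A numerical sketch of the phenomenon (f(x) = x² on the compact set [0, 1]; the grids and δ values are choices made here): the worst-case oscillation over pairs at distance δ shrinks with δ, which is exactly uniform continuity.

```python
def f(x):
    return x * x

def modulus(delta, grid=10000):
    # sup of |f(x) - f(x + delta)| sampled on a fine grid of [0, 1],
    # clamping x + delta to stay inside the compact set
    pts = [i / grid for i in range(grid + 1)]
    return max(abs(f(x) - f(min(1.0, x + delta))) for x in pts)

m1, m2, m3 = modulus(0.1), modulus(0.01), modulus(0.001)
assert m1 > m2 > m3   # the modulus of continuity decreases with delta
assert m3 < 0.01      # and it tends to 0
```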
Another important property of compact sets in a metric space concerns the finite intersection property.

Definition 15.3.4 If every finite subset of a collection of sets has nonempty intersection, the collection has the finite intersection property.
Theorem 15.3.5 Suppose ℱ is a collection of compact sets in a metric space X which has the finite intersection property. Then each of these sets is closed and there exists a point in their intersection. (∩ℱ ≠ ∅.)

Proof: First I show each compact set is closed. Let K be a nonempty compact set and suppose p ∉ K. Then for each x ∈ K, let V_x = B(x, d(p, x)/3) and U_x = B(p, d(p, x)/3), so that U_x and V_x have empty intersection. Then since K is compact, there are finitely many V_x which cover K, say V_{x_1}, …, V_{x_n}. Then let U = ∩_{i=1}^{n} U_{x_i}. It follows p ∈ U and U has empty intersection with K. In fact U has empty intersection with ∪_{i=1}^{n} V_{x_i}. Since U is an open set and p ∈ K^C is arbitrary, it follows K^C is an open set.

Consider now the claim about the intersection. If this were not so, ∪{F^C : F ∈ ℱ} = X and so, in particular, picking some F_0 ∈ ℱ, {F^C : F ∈ ℱ} would be an open cover of F_0. Since F_0 is compact, some finite subcover F_1^C, …, F_m^C exists. But then F_0 ⊆ ∪_{k=1}^{m} F_k^C, which means ∩_{k=0}^{m} F_k = ∅, contrary to the finite intersection property. To see this, note that if x ∈ F_0, then it must fail to be in some F_k and so it is not in ∩_{k=0}^{m} F_k. Since this is true for every x, it follows ∩_{k=0}^{m} F_k = ∅.
Theorem 15.3.6 Let X_i be a compact metric space with metric d_i. Then Π_{i=1}^{m} X_i is also a compact metric space with respect to the metric d(x, y) ≡ max_i (d_i(x_i, y_i)).

Proof: This is most easily seen from sequential compactness. Let {x^k}_{k=1}^{∞} be a sequence of points in Π_{i=1}^{m} X_i. Consider the i^{th} component of x^k, x_i^k. It follows {x_i^k} is a sequence of points in X_i and so it has a convergent subsequence. Compactness of X_1 implies there exists a subsequence of x^k, denoted by {x^{k_1}}, such that

lim_{k_1→∞} x_1^{k_1} → x_1 ∈ X_1.

Now there exists a further subsequence, denoted by {x^{k_2}}, such that in addition to this, x_2^{k_2} → x_2 ∈ X_2. After taking m such subsequences, there exists a subsequence {x^l} such that lim_{l→∞} x_i^l = x_i ∈ X_i for each i. Therefore, letting x ≡ (x_1, …, x_m), x^l → x in Π_{i=1}^{m} X_i. This proves the theorem.
15.4 Ascoli Arzela Theorem
Definition 15.4.1 Let (X, d) be a complete metric space. Then it is said to be locally compact if \overline{B(x, r)} is compact for each r > 0.
Thus if you have a locally compact metric space, then if {a_n} is a bounded sequence, it must have a convergent subsequence.

Let K be a compact subset of ℝⁿ and consider the continuous functions which have values in a locally compact metric space (X, d), where d denotes the metric on X. Denote this space as C(K, X).
Definition 15.4.2 For f, g ∈ C(K, X), where K is a compact subset of ℝⁿ and X is a locally compact complete metric space, define

ρ_K(f, g) ≡ sup {d(f(x), g(x)) : x ∈ K}.

Then ρ_K provides a distance which makes C(K, X) into a metric space.
The Ascoli Arzela theorem is a major result which tells which subsets of C (K, X)
are sequentially compact.
Definition 15.4.3 Let A ⊆ C(K, X) for K a compact subset of ℝⁿ. Then A is said to be uniformly equicontinuous if for every ε > 0 there exists a δ > 0 such that whenever x, y ∈ K with |x − y| < δ and f ∈ A,

d(f(x), f(y)) < ε.
15.4. ASCOLI ARZELA THEOREM 371
The set A is said to be uniformly bounded if for some M < ∞ and a ∈ X,

f(x) ∈ B(a, M)

for all f ∈ A and x ∈ K.

Uniform equicontinuity is like saying that the whole set of functions A is uniformly continuous on K, uniformly for f ∈ A. The version of the Ascoli Arzela theorem I will present here is the following.
Theorem 15.4.4 Suppose K is a nonempty compact subset of ℝⁿ and A ⊆ C(K, X) is uniformly bounded and uniformly equicontinuous. Then if {f_k} ⊆ A, there exists a function f ∈ C(K, X) and a subsequence f_{k_l} such that

lim_{l→∞} ρ_K(f_{k_l}, f) = 0.
To give a proof of this theorem, I will ﬁrst prove some lemmas.
Lemma 15.4.5 If K is a compact subset of ℝⁿ, then there exists D ≡ {x_k}_{k=1}^{∞} ⊆ K such that D is dense in K. Also, for every ε > 0 there exists a finite set of points {x_1, …, x_m} ⊆ K, called an ε net, such that

∪_{i=1}^{m} B(x_i, ε) ⊇ K.
Proof: For m ∈ ℕ, pick x_1^m ∈ K. If every point of K is within 1/m of x_1^m, stop. Otherwise, pick

x_2^m ∈ K \ B(x_1^m, 1/m).

If every point of K is contained in B(x_1^m, 1/m) ∪ B(x_2^m, 1/m), stop. Otherwise, pick

x_3^m ∈ K \ (B(x_1^m, 1/m) ∪ B(x_2^m, 1/m)).

If every point of K is contained in B(x_1^m, 1/m) ∪ B(x_2^m, 1/m) ∪ B(x_3^m, 1/m), stop. Otherwise, pick

x_4^m ∈ K \ (B(x_1^m, 1/m) ∪ B(x_2^m, 1/m) ∪ B(x_3^m, 1/m)).

Continue this way until the process stops, say at N(m). It must stop because if it didn't, there would be a convergent subsequence due to the compactness of K. Ultimately all terms of this convergent subsequence would be closer than 1/m, violating the manner in which they are chosen. Then D = ∪_{m=1}^{∞} ∪_{k=1}^{N(m)} {x_k^m}. This is countable because it is a countable union of countable sets. If y ∈ K and ε > 0, then for some m, 2/m < ε and so B(y, ε) must contain some point of {x_k^m} since otherwise, the process stopped too soon. You could have picked y. This proves the lemma.
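The greedy construction in this proof is directly implementable. A sketch on a random finite point set in the plane (the point set, metric, and ε are assumptions of the demo): points are picked as long as they are at least ε from all previous picks, and when the process stops the picks form an ε net.

```python
import random

random.seed(0)
K = [(random.random(), random.random()) for _ in range(500)]

def d(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def greedy_net(K, eps):
    net = []
    for p in K:
        if all(d(p, q) >= eps for q in net):  # p not yet within eps of the net
            net.append(p)
    return net

eps = 0.2
net = greedy_net(K, eps)
# every point of K is within eps of some net point, as the lemma asserts
assert all(any(d(p, q) < eps for q in net) for p in K)
# net points are pairwise >= eps apart, which is why the process must stop
assert all(d(net[i], net[j]) >= eps
           for i in range(len(net)) for j in range(i + 1, len(net)))
```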
Lemma 15.4.6 Suppose D is defined above and {g_m} is a sequence of functions of A having the property that for every x_k ∈ D,

lim_{m→∞} g_m(x_k) exists.

Then there exists g ∈ C(K, X) such that

lim_{m→∞} ρ(g_m, g) = 0.
Proof: Define g first on D:
g(x_k) ≡ lim_{m→∞} g_m(x_k).
Next I show that {g_m} converges at every point of K. Let x ∈ K and let ε > 0 be given. Choose x_k such that for all f ∈ A,
d(f(x_k), f(x)) < ε/3.
I can do this by the equicontinuity. Now if p, q are large enough, say p, q ≥ M,
d(g_p(x_k), g_q(x_k)) < ε/3.
Therefore, for p, q ≥ M,
d(g_p(x), g_q(x)) ≤ d(g_p(x), g_p(x_k)) + d(g_p(x_k), g_q(x_k)) + d(g_q(x_k), g_q(x)) < ε/3 + ε/3 + ε/3 = ε.
It follows that {g_m(x)} is a Cauchy sequence with values in X. Therefore it converges; let g(x) denote its limit.
Let ε > 0 be given and pick δ > 0 such that whenever x, y ∈ K and |x − y| < δ, it follows d(f(x), f(y)) < ε/3 for all f ∈ A. Now let {x_1, · · · , x_m} be a δ net for K as in Lemma 15.4.5. Since there are only finitely many points in this δ net, it follows that there exists N such that for all p, q ≥ N,
d(g_q(x_i), g_p(x_i)) < ε/3
for all x_i ∈ {x_1, · · · , x_m}. Therefore, for arbitrary x ∈ K, pick x_i ∈ {x_1, · · · , x_m} such that |x_i − x| < δ. Then
d(g_q(x), g_p(x)) ≤ d(g_q(x), g_q(x_i)) + d(g_q(x_i), g_p(x_i)) + d(g_p(x_i), g_p(x)) < ε/3 + ε/3 + ε/3 = ε.
Since N does not depend on the choice of x, it follows this sequence {g_m} is uniformly Cauchy. That is, for every ε > 0, there exists N such that if p, q ≥ N, then
ρ(g_p, g_q) < ε.
15.4. ASCOLI ARZELA THEOREM 373
Next, I need to verify that the function g is continuous. Let N be large enough that whenever p, q ≥ N, the above holds. Then for all x ∈ K,
d(g(x), g_p(x)) ≤ ε/3    (15.4.6)
whenever p ≥ N. This follows from observing that for p, q ≥ N,
d(g_q(x), g_p(x)) < ε/3
and then taking the limit as q → ∞ to obtain 15.4.6. In passing to the limit, you can use the following simple claim.
Claim: In a metric space, if a_n → a, then d(a_n, b) → d(a, b).
Proof of the claim: You note that by the triangle inequality, d(a_n, b) − d(a, b) ≤ d(a_n, a) and d(a, b) − d(a_n, b) ≤ d(a_n, a), and so
|d(a_n, b) − d(a, b)| ≤ d(a_n, a).
Now let p satisfy 15.4.6 for all x whenever p > N. Also pick δ > 0 such that if |x − y| < δ, then
d(g_p(x), g_p(y)) < ε/3.
Then if |x − y| < δ,
d(g(x), g(y)) ≤ d(g(x), g_p(x)) + d(g_p(x), g_p(y)) + d(g_p(y), g(y)) < ε/3 + ε/3 + ε/3 = ε.
Since ε was arbitrary, this shows that g is continuous.
It only remains to verify that ρ(g, g_k) → 0. But this follows from 15.4.6. This proves the lemma.
With these lemmas, it is time to prove Theorem 15.4.4.
Proof of Theorem 15.4.4: Let D = {x_k} be the countable dense subset of K guaranteed by Lemma 15.4.5 and let {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), · · ·} be a subsequence of ℕ such that
lim_{k→∞} f_{(1,k)}(x_1) exists.
This is where the local compactness of X is being used. Now let
{(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), · · ·}
be a subsequence of {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), · · ·} which has the property that
lim_{k→∞} f_{(2,k)}(x_2) exists.
Thus it is also the case that
f_{(2,k)}(x_1) converges to lim_{k→∞} f_{(1,k)}(x_1)
because every subsequence of a convergent sequence converges to the same limit as the original sequence. Continue this way and consider the array

f_{(1,1)}, f_{(1,2)}, f_{(1,3)}, f_{(1,4)}, · · ·   converges at x_1
f_{(2,1)}, f_{(2,2)}, f_{(2,3)}, f_{(2,4)}, · · ·   converges at x_1 and x_2
f_{(3,1)}, f_{(3,2)}, f_{(3,3)}, f_{(3,4)}, · · ·   converges at x_1, x_2, and x_3
⋮

Now let g_k ≡ f_{(k,k)}. Thus g_k is ultimately a subsequence of {f_{(m,k)}}_{k=1}^∞ whenever k > m, and therefore {g_k} converges at each point of D. By Lemma 15.4.6 it follows there exists g ∈ C(K; X) such that
lim_{k→∞} ρ(g, g_k) = 0.
This proves the Ascoli Arzela theorem.
Actually there is an if and only if version of it but the most useful case is what
is presented here. The process used to get the subsequence in the proof is called
the Cantor diagonalization procedure.
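The diagonal scheme itself can be illustrated with indices alone. In the toy model below (not the proof, just its combinatorial skeleton), row 0 is an initial segment of ℕ and each later row keeps every other entry of the previous row, standing in for "pass to a further subsequence"; the diagonal takes the k-th entry of the k-th row, so past index m it stays inside row m and inherits whatever convergence row m enjoys.

```python
def rows(num_rows, length):
    """Row 0 is 0, 1, 2, ...; each later row keeps every other entry of
    the previous row, a stand-in for passing to a subsequence."""
    row = list(range(length))
    out = [row]
    for _ in range(num_rows - 1):
        row = row[::2]
        out.append(row)
    return out

R = rows(6, 200)
diag = [R[k][k] for k in range(6)]  # the Cantor diagonal f_(k,k)
# The diagonal is strictly increasing, so it is a genuine subsequence ...
assert all(a < b for a, b in zip(diag, diag[1:]))
# ... and from index m on it lies inside row m, so it converges wherever
# row m converges.
for m in range(6):
    assert all(diag[k] in R[m] for k in range(m, 6))
```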
15.5 General Topological Spaces
It turns out that metric spaces are not sufficiently general for some applications. This section is a brief introduction to general topology. In making this generalization, the properties of balls which are the conclusion of Theorem 15.1.4 on Page 364 are stated as axioms for a subset of the power set of a given set which will be known as a basis for the topology.
Definition 15.5.1 Let X be a nonempty set and suppose ℬ ⊆ 𝒫(X), the power set of X. Then ℬ is a basis for a topology if it satisfies the following axioms.
1.) Whenever p ∈ A ∩ B for A, B ∈ ℬ, it follows there exists C ∈ ℬ such that p ∈ C ⊆ A ∩ B.
2.) ∪ℬ = X.
Then a subset U of X is an open set if for every point x ∈ U, there exists B ∈ ℬ such that x ∈ B ⊆ U. Thus the open sets are exactly those which can be obtained as a union of sets of ℬ. Denote these subsets of X by the symbol τ and refer to τ as the topology or the set of open sets.
Note that this is simply the analog of saying a set is open exactly when every
point is an interior point.
Proposition 15.5.2 Let X be a set and let ℬ be a basis for a topology as defined above and let τ be the set of open sets determined by ℬ. Then
∅ ∈ τ, X ∈ τ, (15.5.7)
if 𝒞 ⊆ τ, then ∪𝒞 ∈ τ, (15.5.8)
if A, B ∈ τ, then A ∩ B ∈ τ. (15.5.9)
15.5. GENERAL TOPOLOGICAL SPACES 375
Proof: If p ∈ ∅, then there exists B ∈ ℬ such that p ∈ B ⊆ ∅ because there are no points in ∅. Therefore, ∅ ∈ τ. Now if p ∈ X, then by part 2.) of Definition 15.5.1, p ∈ B ⊆ X for some B ∈ ℬ, and so X ∈ τ.
If 𝒞 ⊆ τ, and if p ∈ ∪𝒞, then there exists a set B ∈ 𝒞 such that p ∈ B. However, B is itself a union of sets from ℬ, and so there exists C ∈ ℬ such that p ∈ C ⊆ B ⊆ ∪𝒞. This verifies 15.5.8.
Finally, if A, B ∈ τ and p ∈ A ∩ B, then since A and B are themselves unions of sets of ℬ, it follows there exist A_1, B_1 ∈ ℬ such that A_1 ⊆ A, B_1 ⊆ B, and p ∈ A_1 ∩ B_1. Therefore, by 1.) of Definition 15.5.1 there exists C ∈ ℬ such that p ∈ C ⊆ A_1 ∩ B_1 ⊆ A ∩ B, showing that A ∩ B ∈ τ as claimed. Of course if A ∩ B = ∅, then A ∩ B ∈ τ. This proves the proposition.
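On a finite set every union of basis elements is a finite union, so the topology generated by a basis can be computed exhaustively and the proposition checked directly. A small Python sketch (the particular set X and basis below are made-up examples satisfying axioms 1.) and 2.)):

```python
from itertools import chain, combinations

def topology_from_basis(basis):
    """Open sets are exactly the unions of basis elements (Definition
    15.5.1), including the empty union.  Feasible only for finite bases."""
    basis = [frozenset(b) for b in basis]
    opens = {frozenset()}
    for r in range(1, len(basis) + 1):
        for combo in combinations(basis, r):
            opens.add(frozenset(chain.from_iterable(combo)))
    return opens

X = frozenset({1, 2, 3})
B = [{1}, {2}, {1, 2, 3}]          # satisfies axioms 1.) and 2.)
tau = topology_from_basis(B)

# Proposition 15.5.2: the empty set and X are open, and unions and
# finite intersections of open sets are open.
assert frozenset() in tau and X in tau
assert all((a | b) in tau and (a & b) in tau for a in tau for b in tau)
```

Replacing B with a collection violating axiom 1.), such as [{1, 2}, {2, 3}], makes the intersection check fail, which is the point of the axiom.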
Definition 15.5.3 A set X together with such a collection of its subsets satisfying 15.5.7 - 15.5.9 is called a topological space. τ is called the topology or set of open sets of X.
Deﬁnition 15.5.4 A topological space is said to be Hausdorﬀ if whenever p and q
are distinct points of X, there exist disjoint open sets U, V such that p ∈ U, q ∈ V .
In other words points can be separated with open sets.
[Figure: Hausdorff: disjoint open sets U and V with p ∈ U and q ∈ V.]
Definition 15.5.5 A subset of a topological space is said to be closed if its complement is open. Let p be a point of X and let E ⊆ X. Then p is said to be a limit point of E if every open set containing p contains a point of E distinct from p.
Note that if the topological space is Hausdorﬀ, then this deﬁnition is equivalent
to requiring that every open set containing p contains inﬁnitely many points from
E. Why?
Theorem 15.5.6 A subset, E, of X is closed if and only if it contains all its limit
points.
Proof: Suppose first that E is closed and let x be a limit point of E. Is x ∈ E? If x ∉ E, then E^C is an open set containing x which contains no points of E, a contradiction. Thus x ∈ E.
Now suppose E contains all its limit points. Is the complement of E open? If x ∈ E^C, then x is not a limit point of E because E has all its limit points, and so there exists an open set U containing x such that U contains no point of E other than x. Since x ∉ E, it follows that x ∈ U ⊆ E^C, which implies E^C is an open set because this shows E^C is the union of open sets.
Theorem 15.5.7 If (X, τ) is a Hausdorff space and if p ∈ X, then {p} is a closed set.
Proof: If x ≠ p, there exist open sets U and V such that x ∈ U, p ∈ V and U ∩ V = ∅. Therefore, {p}^C is an open set, so {p} is closed.
Note that the Hausdorff axiom was stronger than needed in order to draw the conclusion of the last theorem. In fact it would have been enough to assume that if x ≠ y, then there exists an open set containing x which does not contain y.
Deﬁnition 15.5.8 A topological space (X, τ) is said to be regular if whenever C is
a closed set and p is a point not in C, there exist disjoint open sets U and V such
that p ∈ U, C ⊆ V . Thus a closed set can be separated from a point not in the
closed set by two disjoint open sets.
[Figure: Regular: disjoint open sets U and V with p ∈ U and C ⊆ V.]
Deﬁnition 15.5.9 The topological space, (X, τ) is said to be normal if whenever
C and K are disjoint closed sets, there exist disjoint open sets U and V such that
C ⊆ U, K ⊆ V . Thus any two disjoint closed sets can be separated with open sets.
[Figure: Normal: disjoint open sets U and V with C ⊆ U and K ⊆ V.]
Definition 15.5.10 Let E be a subset of X. The closure Ē is defined to be the smallest closed set containing E.
Lemma 15.5.11 The above definition is well defined.
Proof: Let 𝒞 denote all the closed sets which contain E. Then 𝒞 is nonempty because X ∈ 𝒞. Also
(∩{A : A ∈ 𝒞})^C = ∪{A^C : A ∈ 𝒞},
an open set, which shows that ∩𝒞 is a closed set and is the smallest closed set which contains E.
Theorem 15.5.12 Ē = E ∪ {limit points of E}.
Proof: Let x ∈ Ē and suppose that x ∉ E. If x is not a limit point either, then there exists an open set U containing x which does not intersect E. But then U^C is a closed set which contains E and does not contain x, contrary to the definition of Ē as the intersection of all closed sets containing E. Therefore, x must be a limit point of E after all.
Now E ⊆ Ē, so suppose x is a limit point of E. Is x ∈ Ē? If H is a closed set containing E which does not contain x, then H^C is an open set containing x which contains no points of E other than x, negating the assumption that x is a limit point of E.
The following is the definition of continuity in terms of general topological spaces. It is really just a generalization of the ε-δ definition of continuity given in calculus.
Definition 15.5.13 Let (X, τ) and (Y, η) be two topological spaces and let f : X → Y. f is continuous at x ∈ X if whenever V is an open set of Y containing f(x), there exists an open set U ∈ τ such that x ∈ U and f(U) ⊆ V. f is continuous if f^{-1}(V) ∈ τ whenever V ∈ η.
You should prove the following.
Proposition 15.5.14 In the situation of Deﬁnition 15.5.13 f is continuous if and
only if f is continuous at every point of X.
Definition 15.5.15 Let (X_i, τ_i) be topological spaces. ∏_{i=1}^n X_i is the Cartesian product. Define a product topology as follows. Let ℬ consist of all sets of the form ∏_{i=1}^n A_i where A_i ∈ τ_i. Then ℬ is a basis for the product topology.
Theorem 15.5.16 The set ℬ of Definition 15.5.15 is a basis for a topology.
Proof: Suppose x ∈ ∏_{i=1}^n A_i ∩ ∏_{i=1}^n B_i where A_i and B_i are open sets. Say
x = (x_1, · · · , x_n).
Then x_i ∈ A_i ∩ B_i for each i. Therefore, x ∈ ∏_{i=1}^n (A_i ∩ B_i) ∈ ℬ and
∏_{i=1}^n (A_i ∩ B_i) ⊆ ∏_{i=1}^n A_i ∩ ∏_{i=1}^n B_i.
The deﬁnition of compactness is also considered for a general topological space.
This is given next.
Definition 15.5.17 A subset E of a topological space (X, τ) is said to be compact if whenever 𝒞 ⊆ τ and E ⊆ ∪𝒞, there exists a finite subset {U_1, · · · , U_n} of 𝒞 such that E ⊆ ∪_{i=1}^n U_i. (Every open covering admits a finite subcovering.) E is precompact if Ē is compact. A topological space is called locally compact if it has a basis ℬ with the property that B̄ is compact for each B ∈ ℬ.
In general topological spaces there may be no concept of “bounded”. Even if
there is, closed and bounded is not necessarily the same as compactness. However,
in any Hausdorﬀ space every compact set must be a closed set.
Theorem 15.5.18 If (X, τ) is a Hausdorff space, then every compact subset must also be a closed set.
Proof: Let K be compact and suppose p ∉ K. For each x ∈ K, there exist open sets U_x and V_x such that
x ∈ U_x, p ∈ V_x, and U_x ∩ V_x = ∅.
Since K is compact, there are finitely many of these sets, U_{x_1}, · · · , U_{x_m}, which cover K. Then let V ≡ ∩_{i=1}^m V_{x_i}. It follows that V is an open set containing p which has empty intersection with each of the U_{x_i}. Consequently, V contains no points of K, and therefore p is not a limit point of K. This proves the theorem.
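The Hausdorff hypothesis cannot simply be dropped. A standard counterexample, stated here as a quick Python check with the space hard-coded, is the two-point Sierpinski space, in which the finite, hence compact, set {1} fails to be closed:

```python
# Sierpinski space: X = {0, 1} with open sets {}, {1}, and {0, 1}.
# This topology is not Hausdorff: the points 0 and 1 cannot be put in
# disjoint open sets, since every open set containing 0 is all of X.
tau = {frozenset(), frozenset({1}), frozenset({0, 1})}

# {1} is compact (any finite set is compact), yet it is not closed,
# because its complement {0} is not one of the open sets.
assert frozenset({0}) not in tau
```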
A useful construction when dealing with locally compact Hausdorﬀ spaces is the
notion of the one point compactiﬁcation of the space.
Definition 15.5.19 Suppose (X, τ) is a locally compact Hausdorff space. Then let X̄ ≡ X ∪ {∞} where ∞ is just the name of some point which is not in X, called the point at infinity. A basis for the topology τ̄ for X̄ is
τ ∪ {K^C : K is a compact subset of X}.
The complement is taken with respect to X̄, and so the open sets K^C are basic open sets which contain ∞.
The reason this is called a compactiﬁcation is contained in the next lemma.
Lemma 15.5.20 If (X, τ) is a locally compact Hausdorff space, then (X̄, τ̄) is a compact Hausdorff space. Also if U is an open set of τ̄, then U \ {∞} is an open set of τ.
Proof: Since (X, τ) is a locally compact Hausdorff space, it follows (X̄, τ̄) is a Hausdorff topological space. The only case which needs checking is the one of p ∈ X and ∞. Since (X, τ) is locally compact, there exists an open set U of τ having compact closure which contains p. Then p ∈ U and ∞ ∈ (Ū)^C, and these are disjoint open sets containing the points p and ∞ respectively. Now let 𝒞 be an open cover of X̄ with sets from τ̄. Then ∞ must be in some set U_∞ from 𝒞, which must contain a set of the form K^C where K is a compact subset of X. Then there exist sets U_1, · · · , U_r from 𝒞 which cover K. Therefore, a finite subcover of X̄ is U_1, · · · , U_r, U_∞.
To see the last claim, suppose U contains ∞, since otherwise there is nothing to show. Notice that if C is a compact set, then X \ C is an open set. Therefore, if x ∈ U \ {∞}, and if X̄ \ C is a basic open set contained in U containing ∞, then if x is in this basic open set of X̄, it is also in the open set X \ C ⊆ U \ {∞}. If x is not in any basic open set of the form X̄ \ C, then x is contained in an open set of τ which is contained in U \ {∞}. Thus U \ {∞} is indeed open in τ. This proves the lemma.
15.6. CONNECTED SETS 379
Definition 15.5.21 If every finite subset of a collection of sets has nonempty intersection, the collection has the finite intersection property.
Theorem 15.5.22 Let 𝒦 be a set whose elements are compact subsets of a Hausdorff topological space (X, τ). Suppose 𝒦 has the finite intersection property. Then ∅ ≠ ∩𝒦.
Proof: Suppose to the contrary that ∅ = ∩𝒦. Then consider
𝒞 ≡ {K^C : K ∈ 𝒦}.
It follows 𝒞 is an open cover of K_0, where K_0 is any particular element of 𝒦. But then there are finitely many K ∈ 𝒦, say K_1, · · · , K_r, such that K_0 ⊆ ∪_{i=1}^r K_i^C, implying that ∩_{i=0}^r K_i = ∅, contradicting the finite intersection property.
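For nested compact intervals the conclusion can be checked by hand. The sketch below uses illustrative numbers only: it verifies both the finite intersection property and the nonempty total intersection for the compact family [0, 1 + 1/n], and the comment records the contrasting non-compact family (0, 1/n), which has the finite intersection property yet empty total intersection.

```python
from fractions import Fraction

# Compact sets K_n = [0, 1 + 1/n] for n = 1, ..., 99.  Any finite
# subfamily intersects, and Theorem 15.5.22 promises the whole
# intersection is nonempty; here it is [0, 100/99] and contains [0, 1].
# (For the OPEN intervals (0, 1/n), every finite subfamily also
# intersects, yet the total intersection over all n is empty -- the
# compactness hypothesis is exactly what rules that behavior out.)
family = [(Fraction(0), 1 + Fraction(1, n)) for n in range(1, 100)]
lo = max(a for a, _ in family)    # left endpoint of the intersection
hi = min(b for _, b in family)    # right endpoint of the intersection
assert lo == 0 and hi == Fraction(100, 99)
assert lo <= hi                   # the intersection [lo, hi] is nonempty
```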
Lemma 15.5.23 Let (X, τ) be a topological space and let ℬ be a basis for τ. Then X is compact if and only if every open cover of X by basic open sets admits a finite subcover.
Proof: Suppose first that X is compact. Then if 𝒞 is an open cover consisting of basic open sets, it admits a finite subcover because basic open sets are, in particular, open sets.
Next suppose that every basic open cover admits a finite subcover and let 𝒞 be an open cover of X. Then define 𝒞̄ to be the collection of basic open sets which are contained in some set of 𝒞. It follows 𝒞̄ is a basic open cover of X, and so it admits a finite subcover {U_1, · · · , U_p}. Now each U_i is contained in an open set of 𝒞. Let O_i be a set of 𝒞 which contains U_i. Then {O_1, · · · , O_p} is an open cover of X. This proves the lemma.
In fact, much more can be said than Lemma 15.5.23. However, this is all which
I will present here.
15.6 Connected Sets
Stated informally, connected sets are those which are in one piece. More precisely,
Definition 15.6.1 A set S in a general topological space is separated if there exist sets A, B such that
S = A ∪ B, A, B ≠ ∅, and A ∩ B̄ = B ∩ Ā = ∅.
In this case, the sets A and B are said to separate S. A set is connected if it is not separated.
One of the most important theorems about connected sets is the following.
Theorem 15.6.2 Suppose U and V are connected sets having nonempty intersection. Then U ∪ V is also connected.
Proof: Suppose U ∪ V = A ∪ B where A ∩ B̄ = B ∩ Ā = ∅. Consider the sets A ∩ U and B ∩ U. Since
(A ∩ U) ∩ (B̄ ∩ U) = (Ā ∩ U) ∩ (B ∩ U) = ∅,
it follows one of these sets must be empty, since otherwise U would be separated. It follows that U is contained in either A or B. Similarly, V must be contained in either A or B. Since U and V have nonempty intersection, it follows that both V and U are contained in one of the sets A, B. Therefore, the other must be empty, and this shows U ∪ V cannot be separated and is therefore connected.
The intersection of connected sets is not necessarily connected as is shown by
the following picture.
[Figure: two connected sets U and V whose intersection is not connected.]
Theorem 15.6.3 Let f : X → Y be continuous, where X and Y are topological spaces and X is connected. Then f(X) is also connected.
Proof: To do this you show f(X) is not separated. Suppose to the contrary that f(X) = A ∪ B where A and B separate f(X). Then consider the sets f^{-1}(A) and f^{-1}(B). If z ∈ f^{-1}(B), then f(z) ∈ B and so f(z) is not a limit point of A. Therefore, there exists an open set U containing f(z) such that U ∩ A = ∅. But then the continuity of f implies that f^{-1}(U) is an open set containing z such that f^{-1}(U) ∩ f^{-1}(A) = ∅. Therefore, f^{-1}(B) contains no limit points of f^{-1}(A). Similar reasoning implies f^{-1}(A) contains no limit points of f^{-1}(B). It follows that X is separated by f^{-1}(A) and f^{-1}(B), contradicting the assumption that X was connected.
An arbitrary set can be written as a union of maximal connected sets called
connected components. This is the concept of the next deﬁnition.
Definition 15.6.4 Let S be a set and let p ∈ S. Denote by C_p the union of all connected subsets of S which contain p. This is called the connected component determined by p.
Theorem 15.6.5 Let C_p be a connected component of a set S in a general topological space. Then C_p is a connected set, and if C_p ∩ C_q ≠ ∅, then C_p = C_q.
Proof: Let 𝒞 denote the connected subsets of S which contain p. If C_p = A ∪ B where
A ∩ B̄ = B ∩ Ā = ∅,
then p is in one of A or B. Suppose without loss of generality p ∈ A. Then every set of 𝒞 must also be contained in A, since otherwise, as in Theorem 15.6.2, the set would be separated. But this implies B is empty. Therefore, C_p is connected. From this and Theorem 15.6.2, the second assertion of the theorem is proved.
This shows the connected components of a set are equivalence classes and partition the set.
A set I is an interval in ℝ if and only if whenever x, y ∈ I, then (x, y) ⊆ I. The following theorem is about the connected sets in ℝ.
Theorem 15.6.6 A set C in ℝ is connected if and only if C is an interval.
Proof: Let C be connected. If C consists of a single point p, there is nothing to prove; the interval is just [p, p]. Suppose p < q and p, q ∈ C. You need to show (p, q) ⊆ C. If
x ∈ (p, q) \ C,
let C ∩ (−∞, x) ≡ A and C ∩ (x, ∞) ≡ B. Then C = A ∪ B and the sets A and B separate C, contrary to the assumption that C is connected.
Conversely, let I be an interval. Suppose I is separated by A and B. Pick x ∈ A and y ∈ B. Suppose without loss of generality that x < y. Now define the set
S ≡ {t ∈ [x, y] : [x, t] ⊆ A}
and let l be the least upper bound of S. Then l ∈ Ā, so l ∉ B, which implies l ∈ A. But if l ∉ B̄, then for some δ > 0,
(l, l + δ) ∩ B = ∅,
contradicting the definition of l as an upper bound for S. Therefore, l ∈ B̄, which implies l ∉ A after all, a contradiction. It follows I must be connected.
The following theorem is a very useful description of the open sets in ℝ.
Theorem 15.6.7 Let U be an open set in ℝ. Then there exist countably many disjoint open sets {(a_i, b_i)}_{i=1}^∞ such that U = ∪_{i=1}^∞ (a_i, b_i).
Proof: Let p ∈ U and let z ∈ C_p, the connected component determined by p. Since U is open, there exists δ > 0 such that (z − δ, z + δ) ⊆ U. It follows from Theorem 15.6.2 that
(z − δ, z + δ) ⊆ C_p.
This shows C_p is open. By Theorem 15.6.6, this shows C_p is an open interval (a, b) where a, b ∈ [−∞, ∞]. There are therefore at most countably many of these connected components, because each must contain a rational number and the rational numbers are countable. Denote by {(a_i, b_i)}_{i=1}^∞ the set of these connected components.
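When an open set is presented as a finite union of open intervals, its connected components can be computed by sorting and merging, a concrete rendering of Theorem 15.6.7. A sketch (the function name is made up for illustration):

```python
def components_of_open_set(intervals):
    """Merge a finite list of open intervals (a, b) into the disjoint
    intervals (a_i, b_i) of Theorem 15.6.7, i.e., the connected
    components of their union."""
    merged = []
    for a, b in sorted(intervals):
        if merged and a < merged[-1][1]:   # overlaps the last component
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            # Starts a new component.  Note the strict inequality above:
            # (0, 1) and (1, 2) stay separate, since the point 1 is
            # missing from the union, and indeed their union is not
            # connected.
            merged.append((a, b))
    return merged

U = [(0, 2), (1, 3), (5, 6), (5.5, 7), (10, 11)]
assert components_of_open_set(U) == [(0, 3), (5, 7), (10, 11)]
```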
Definition 15.6.8 A topological space E is arcwise connected if for any two points p, q ∈ E, there exists a closed interval [a, b] and a continuous function γ : [a, b] → E such that γ(a) = p and γ(b) = q. E is locally connected if it has a basis of connected open sets. E is locally arcwise connected if it has a basis of arcwise connected open sets.
An example of an arcwise connected topological space would be any subset of ℝ^n which is the continuous image of an interval. Locally connected is not the same as connected. A well known example is the following.
{ (x, sin (1/x))