Kung-Ching Chang - Lecture Notes On Calculus of Variations (Peking University Mathematics) - WSPC (2016)


PEKING UNIVERSITY SERIES IN MATHEMATICS
Series Editors: Kung-Ching Chang, Pingwen Zhang, Bin Liu, and Jiping Zhang (Peking University, China)

Vol. 1: An Introduction to Finsler Geometry
by Xiaohuan Mo (Peking University, China)
Vol. 2: Numerical Methods for Exterior Problems
by Lung-An Ying (Peking University & Xiamen University, China)
Vol. 3: Approaches to the Qualitative Theory of Ordinary Differential Equations: Dynamical Systems and Nonlinear Oscillations
by Tongren Ding (Peking University, China)
Vol. 4: Elliptic, Hyperbolic and Mixed Complex Equations with Parabolic Degeneracy: Including Tricomi–Bers and Tricomi–Frankl–Rassias Problems
by Guo Chun Wen (Peking University, China)
Vol. 5: Arbitrage, Credit and Informational Risks
edited by Caroline Hillairet (Ecole Polytechnique, France), Monique Jeanblanc (Université d’Evry, France) and Ying Jiao (Université Lyon I, France)
Vol. 6: Lecture Notes on Calculus of Variations
by Kung-Ching Chang (Peking University, China)

Eh - Lecture Notes on Calculus of Variations.indd 1 02-09-16 4:49:09 PM

Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
Library of Congress Cataloging-in-Publication Data

Names: Zhang, Gongqing. | Zhang, Tan, 1969–
Title: Lecture notes on calculus of variations / by Kung Ching Chang
(Peking University, China) ; translated by: Tan Zhang (Murray State University, USA).
Other titles: Calculus of variations
Description: New Jersey : World Scientific, 2016. | Series: Peking University
series in mathematics ; volume 6 | Includes bibliographical references and index.
Identifiers: LCCN 2016025413 | ISBN 9789813144682 (hardcover : alk. paper) |
ISBN 9789813146235 (softcover : alk. paper)
Subjects: LCSH: Calculus of variations. | Mathematical analysis. | Functionals.
Classification: LCC QA315 .Z434 2016 | DDC 515/.64--dc23
LC record available at https://lccn.loc.gov/2016025413
British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library.
Copyright © 2017 by World Scientific Publishing Co. Pte. Ltd.

All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means,
electronic or mechanical, including photocopying, recording or any information storage and retrieval
system now known or to be invented, without written permission from the publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance
Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy
is not required from the publisher.
Printed in Singapore

August 30, 2016 14:14 ws-book9x6 BC: 10157 – Lecture Notes on Calculus of Variations book page v
Preface
Calculus of variations first appeared around the time calculus was invented, over 300 years ago. It used to be a mandatory course for undergraduate students majoring in mathematics, following calculus and ordinary differential equations. Its main content was to turn variational problems into problems of solving differential equations. However, only a few differential equations have explicit solutions, and this limitation hindered the study. Since the second half of the 20th century, calculus of variations as a course of its own has gradually been condensed or merged into other related courses.

Nevertheless, calculus of variations has close connections to many other branches of mathematics. Its importance is evident from the fact that three of the 23 problems proposed by Hilbert deal with variational problems. Variational problems arise naturally in mechanics, physics, economics, operations research, engineering, and other fields. In particular, since the 1970s, finite element methods and optimization techniques have provided numerical solutions to variational problems, thereby elevating the status of the subject in applied mathematics.

In the past few decades, great developments have taken place in both the theoretical and the applied aspects of calculus of variations. The mathematical community has noted that calculus of variations is quickly becoming a necessity in undergraduate mathematics education; without it, students would struggle with the new demands of modern science and technology. However, there is so far no unanimous agreement on how to remedy this shortcoming, and it will take some time to explore. This book attempts to bring readers up to date in this subject area.

In the academic years between 2006 and 2010, the author taught the course entitled “Calculus of Variations” to both advanced undergraduate students and beginning graduate students in mathematics at the School of Mathematical Sciences, Peking University. The organization of the course content was based on the following three principles:
1. The lectures should introduce not only the classical theory but also the modern development of calculus of variations. Furthermore, more in-depth studies stemming from various research problems were conducted.
2. The course should put its main emphasis on the most frequently used theorems and techniques.
3. The course should welcome a large audience, including students in pure mathematics, numerical mathematics, mathematical statistics, information science, and financial mathematics.

The prerequisites for this course were Mathematical Analysis, Modern Algebra, and Analytic Geometry; in addition, students were expected to be somewhat familiar with Ordinary Differential Equations, Real Analysis, Functional Analysis, Differential Geometry, and Mathematical Physics.

The entire course was divided into three parts: the classical theory of calculus of variations, the existence and regularity of solutions, and special topics, the last of which have played a role in the development of modern calculus of variations.

This book not only introduces the fundamental concepts, basic theorems, and techniques used in calculus of variations, but also reinforces them with abundant examples and counterexamples. In particular, it sheds new light on the definitions and on their interplay and compatibility with existing theorems and methods. Exercises are given at the end of most lectures.

This book is based on these lecture notes.

Due to the experimental nature of the book, I sincerely welcome any input and critique on how to improve it.

I am very grateful to Tian Fu Zhao and Shan Nian Lu from the Higher Education Publishing Company. Their careful proofreading and comments are greatly appreciated.
Kung Ching Chang

Peking University
December 2010
Contents
Preface

1. The theory and problems of calculus of variations
1.1 Introduction
1.2 Functionals
1.3 Typical examples
1.4 More examples

2. The Euler–Lagrange equation
2.1 The necessary condition for the extremal values of functions: a review
2.2 The derivation of the Euler–Lagrange equation
2.3 Boundary conditions
2.4 Examples of solving the Euler–Lagrange equations
Exercises

3. The necessary condition and the sufficient condition on extremal values of functionals
3.1 The extremal values of functions: a revisit
3.2 Second order variations
3.3 The Legendre–Hadamard condition
3.4 The Jacobi field
3.5 Conjugate points
Exercises

4. Strong minima and extremal fields
4.1 Strong minima and weak minima
4.2 A necessary condition for strong minimal value and the Weierstrass excess function
4.3 Extremal fields and strong minima
4.4 Mayer field, Hilbert’s invariant integral
4.5 A sufficient condition for strong minima
4.6∗ The proof of Theorem 4.4 (for the case N > 1)
Exercises

5. The Hamilton–Jacobi theory
5.1 Eikonal and the Carathéodory system of equations
5.2 The Legendre transformation
5.3 The Hamilton system of equations
5.4 The Hamilton–Jacobi equation
5.5∗ Jacobi’s Theorem
Exercises

6. Variational problems involving multivariate integrals
6.1 Derivation of the Euler–Lagrange equation
6.2 Boundary conditions
6.3 Second order variations
6.4 Jacobi fields
Exercises

7. Constrained variational problems
7.1 The isoperimetric problem
7.2 Pointwise constraints
7.3 Variational inequalities
Exercises

8. The conservation law and Noether’s theorem
8.1 One parameter diffeomorphisms and Noether’s theorem
8.2 The energy–momentum tensor and Noether’s theorem
8.3 Interior minima
8.4∗ Applications
Exercises

9. Direct methods
9.1 The Dirichlet’s principle and minimization method
9.2 Weak convergence and weak-∗ convergence
9.3 Weak-∗ sequential compactness
9.4∗ Reflexive spaces and the Eberlein–Šmulian theorem
Exercises

10. Sobolev spaces
10.1 Generalized derivatives
10.2 The space $W^{m,p}(\Omega)$
10.3 Representations of functionals
10.4 Modifiers
10.5 Some important properties of Sobolev spaces and embedding theorems
10.6 The Euler–Lagrange equation
Exercises

11. Weak lower semi-continuity
11.1 Convex sets and convex functions
11.2 Convexity and weak lower semi-continuity
11.3 An existence theorem
11.4∗ Quasi-convexity
Exercises

12. Boundary value problems and eigenvalue problems of linear differential equations
12.1 Linear boundary value problems and orthogonal projections
12.2 The eigenvalue problems
12.3 The eigenfunction expansions
12.4 The minimax description of eigenvalues
Exercises

13. Existence and regularity
13.1 Regularity (n = 1)
13.2 More on regularity (n > 1)
13.3 The solutions of some variational problems
13.4 The limitations of calculus of variations
Exercises

14. The dual least action principle and the Ekeland variational principle
14.1 The conjugate function of a convex function
14.2 The dual least action principle
14.3 The Ekeland variational principle
14.4 The Fréchet derivative and the Palais–Smale condition
14.5 The Nehari technique
Exercises

15. The Mountain Pass Theorem, its generalizations, and applications
15.1 The Mountain Pass Theorem
15.2 Applications

16. Periodic solutions, homoclinic and heteroclinic orbits
16.1 The simple pendulum
16.2 Periodic solutions
16.3 Heteroclinic orbits
16.4 Homoclinic orbits

17. Geodesics and minimal surfaces
17.1 Geodesics
17.2 Minimal surfaces

18. Numerical methods for variational problems
18.1 The Ritz method
18.2 The finite element method
18.3 Cea’s theorem
18.4 An optimization method: the conjugate gradient method

19. Optimal control problems
19.1 The formulation of problems
19.2 The Pontryagin Maximal Principle
19.3 The Bang-Bang principle

20. Functions of bounded variations and image processing
20.1 Functions of bounded variations in one variable: a review
20.2 Functions of bounded variations in several variables
20.3 The relaxation function
20.4 Image restoration and the Rudin–Osher–Fatemi model

Bibliography
Index
Lecture 1
The theory and problems of calculus of variations
1.1 Introduction
Calculus of variations is an important part of mathematical analysis. It is closely related to many other branches of mathematics and has numerous applications; some examples follow.

Many important equations in mathematical physics, differential equations in elastic and plastic mechanics, the biomembrane equation, and differential equations arising from geometry are Euler equations of particular functionals.

Optimal control problems, which appear in both engineering and economics, are often variational problems with constraints of various kinds.

Variational problems also occur frequently in intelligent materials, image processing, and optimal design.

The variational method is the main tool for establishing the existence of solutions of elliptic partial differential equations, and it has become an integral part of the study of partial differential equations. The intimate connections between calculus of variations and partial differential equations are readily seen in Hilbert’s 19th and 20th problems.

Among the numerical methods used for partial differential equations, the finite element method in particular comes directly from variational structures. The rapid development of optimization techniques has made it feasible to compute numerical solutions for the extremal values of variational problems.

The interplay between topology and the calculus of variations has led to a new branch of mathematics, global analysis, and has vastly accelerated the advancement of critical point theory. In particular, Morse theory reveals the interplay between analysis and topology and has since become a core topic in differential topology. In the same vein, Floer homology has also emerged as an heir to these ideas.

Calculus of variations has permeated many other areas, such as Riemannian geometry, Finsler geometry, symplectic geometry, and conformal geometry. Many variational problems with rich geometric background, such as geodesics, minimal surfaces, and harmonic maps, have stimulated research into new theories (for example, geometric measure theory), new methods, and new techniques.

Variational methods play an important role in the study of periodic orbits of Hamiltonian dynamical systems, Mather sets, and chaos.

Malliavin calculus, the stochastic calculus of variations, is an interplay between differential calculus and probability theory. It has become an essential part of financial mathematics.

It is clear that variational problems, theory, and methods have a profound influence on various areas of modern mathematics, including pure mathematics, applied mathematics, numerical mathematics, information science, and mathematical economics. No doubt, calculus of variations has taken center stage in modern mathematics.

Compared with some classical textbooks (for example [LL], [Ka], [GF]), this volume offers the following features:

While introducing the classical theory of calculus of variations, we emphasize both the first order and the second order conditions for a minimum.

Since many examples in partial differential equations, differential geometry, and mathematical physics involve several variables, we give more extensive discussions of the multivariate case.

Within the classical theory of calculus of variations, we give more weight to the Hamilton–Jacobi theory and the conservation law, since they are very useful in physics and geometry.

Aside from the classical theory of calculus of variations, we also emphasize the direct methods and their applications. The direct methods form the main body of modern calculus of variations; they are the foundation for establishing the existence of solutions of differential equations and for computing their numerical solutions. This approach is accessible to students with previous exposure to functional analysis, and it accounts for nearly half of the material in this volume.

The eigenvalue problem is one of the central problems in analysis. We present it as an application of constrained optimization, which mirrors the corresponding topics in functional analysis.

Furthermore, we carefully investigate some special topics, which may be considered optional.

Critical point theory has been the fastest growing branch of calculus of variations in the past few decades, and it has wide applications. It is particularly important in proving the existence of solutions of differential equations. This is a very rich area; we will only introduce one of the simplest results, the Mountain Pass Theorem, as a first introductory step to this exciting subject.

Periodic solutions of Hamiltonian systems, homoclinic orbits, and heteroclinic orbits are hot topics in dynamical systems and symplectic geometry. Under certain conditions, the existence of such solutions may be obtained via variational methods.

Both geodesics and minimal surfaces are simple geometric examples of a variational nature; the lecture on them may be regarded as an introduction to geometric analysis.

Finite element methods and optimization techniques are two commonly used numerical approaches to variational problems. It is worth noting that the theoretical background of the finite element method is built upon calculus of variations.

Some additional topics with real-life applications are presented toward the end of the book, for instance, some optimal control problems and problems from image processing. They are independent of each other, yet they help readers appreciate the role of this subject in modern society.

There are a total of 20 lectures. The first eight lectures cover classical calculus of variations; Lectures 9 through 14 introduce direct methods. Together they are the main focus of the book. Lectures 15 through 20 are special topics and may be treated as optional material. Topics marked with ∗ may be omitted during the first reading of this volume.
1.2 Functionals
Calculus of variations examines the extremal values (or more generally, the critical values) of functionals.

Generally speaking, a functional is a mapping from any set $M$ to the field of real numbers $\mathbb{R}$ or the field of complex numbers $\mathbb{C}$. In calculus of variations, however, a functional only takes values in $\mathbb{R}$, and its domain $M$ is a set of functions, i.e. $I : M \to \mathbb{R}$.

For example, let $\Omega \subset \mathbb{R}^n$ be an open set, $x_0 \in \Omega$ be a fixed point, $F \in C(\bar{\Omega})$, and $M = C^1(\bar{\Omega})$. Then
\[
I_1(u) = \max_{x \in \bar{\Omega}} |u(x)|, \qquad
I_2(u) = u(x_0), \qquad
I_3(u) = \int_{\Omega} \left[ |\nabla u(x)|^2 - F(u(x)) \right] dx
\]
are all functionals. However, regardless of the choices of $M$ and the single variable function $f$, the composite function
\[
I_4(u) = f(u(x))
\]
is not a functional, since for each $u$ it yields a function of $x$ rather than a single real number.

Given a function $L \in C^1(\Omega \times \mathbb{R}^N \times \mathbb{R}^{nN})$, one mainly considers the following functional:
\[
I(u) = \int_{\Omega} L(x, u(x), \nabla u(x))\, dx,
\]
where $M$ is a subset of $C^1(\bar{\Omega})$, the set of continuously differentiable functions on $\bar{\Omega}$. Sometimes $M$ may also be a subset of functions that are differentiable in some generalized sense, determined by certain prescribed constraints (such as integral form boundary conditions, pointwise boundary conditions, and boundary conditions with or without derivatives).

Occasionally, the integral expression of $I$ may also contain higher order derivative terms, in which case the set $M$ should be modified accordingly.
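
To make the idea that a functional assigns one real number to each function concrete, the following is a minimal numerical sketch in the one-dimensional case $n = N = 1$. The helper `evaluate_functional`, the sample integrand `lagrangian` (modeled on $I_3$ with the assumed choice $F(s) = s^2$), and the midpoint quadrature are illustrative assumptions, not part of the lectures.

```python
import math

def evaluate_functional(L, u, du, a, b, n=10000):
    """Midpoint-rule approximation of I(u) = integral over [a, b] of L(x, u(x), u'(x))."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += L(x, u(x), du(x))
    return total * h

# Hypothetical integrand modeled on I_3 with F(s) = s^2: L(x, u, p) = p^2 - u^2.
def lagrangian(x, u, p):
    return p * p - u * u

# Two different input functions u give two different real numbers I(u).
value_sin = evaluate_functional(lagrangian, math.sin, math.cos, 0.0, math.pi)
value_line = evaluate_functional(lagrangian, lambda x: x, lambda x: 1.0, 0.0, 1.0)
print(value_sin, value_line)
```

Here the derivative $u'$ is passed in explicitly to keep the sketch free of numerical differentiation; for $u(x) = x$ on $[0, 1]$ the exact value is $\int_0^1 (1 - x^2)\,dx = 2/3$.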
1.3 Typical examples
Example 1.1 (The line of steepest descent) Let $A = (x_1, y_1)$ and $B = (x_2, y_2)$ be two points in the $xy$-plane, where $x_1 < x_2$ and $y_1 > y_2$. A particle falls freely along a smooth curve joining $A$ and $B$. Assuming the initial velocity of the particle is zero, which trajectory gives the shortest travel time from $A$ to $B$? (See Figure 1.1.)

Fig. 1.1

Assume $u \in C^1([x_1, x_2])$ and let $\{(x, u(x)) \mid x \in [x_1, x_2],\ u(x_i) = y_i,\ i = 1, 2\}$ be a curve connecting $A$ and $B$. Since
\[
\begin{cases}
\dfrac{1}{2} m v^2 = m g h, \\[4pt]
v = \dfrac{ds}{dt},
\end{cases}
\]
where $h = y_1 - u(x)$ is the height through which the particle has fallen, we have
\[
v = \sqrt{2g(y_1 - u(x))}
\]
and
\[
dt = \frac{ds}{v} = \sqrt{\frac{1 + |u'(x)|^2}{2g(y_1 - u(x))}}\, dx.
\]
The total time is
\[
T = \int dt = \frac{1}{\sqrt{2g}} \int_{x_1}^{x_2} \sqrt{\frac{1 + |u'(x)|^2}{y_1 - u(x)}}\, dx.
\]
Let
\[
M = \{u \in C^1([x_1, x_2]) \mid u(x_i) = y_i,\ i = 1, 2\},
\]
then the mapping $M \to \mathbb{R}$ via $u \mapsto T$ is a “functional”. Here $u$ is the independent variable, $T = T(u)$ is the dependent variable, and we wish to find $u \in M$ such that $T$ attains its minimum.
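
The dependence of $T$ on the curve $u$ can be explored numerically. The sketch below is only an illustration under stated assumptions: the function name `descent_time`, the sample endpoints $A = (0, 0)$, $B = (1, -1)$, the value $g = 9.81$, and the two trial curves are our own choices. The substitution $x = x_1 + (x_2 - x_1)s^2$ is used to tame the integrable singularity of the integrand at $x = x_1$, where $y_1 - u(x)$ vanishes.

```python
import math

def descent_time(u, du, x1, x2, y1, g=9.81, n=20000):
    """Approximate T(u) = integral over [x1, x2] of sqrt((1 + u'^2) / (2 g (y1 - u))) dx.

    Uses the substitution x = x1 + (x2 - x1) * s^2 with s in (0, 1) and the
    midpoint rule in s, so the endpoint x = x1 is never evaluated.
    """
    width = x2 - x1
    total = 0.0
    for k in range(n):
        s = (k + 0.5) / n
        x = x1 + width * s * s
        integrand = math.sqrt((1.0 + du(x) ** 2) / (2.0 * g * (y1 - u(x))))
        total += integrand * 2.0 * width * s  # dx = 2 * width * s ds
    return total / n

# Trial curve 1: the straight line from A = (0, 0) to B = (1, -1).
t_line = descent_time(lambda x: -x, lambda x: -1.0, 0.0, 1.0, 0.0)

# Trial curve 2 (hypothetical): u(x) = -sqrt(x), which starts out steeper.
t_curve = descent_time(lambda x: -math.sqrt(x),
                       lambda x: -0.5 / math.sqrt(x), 0.0, 1.0, 0.0)
print(t_line, t_curve)
```

For the straight line the result can be checked against the elementary inclined-plane formula $T = \ell\sqrt{2/(g h)}$ with length $\ell = \sqrt{2}$ and drop $h = 1$. The steeper trial curve gives a smaller time, which shows that the straight line is not the minimizer.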
Example 1.2 (Geodesics) Given two points $P_0 = (x_1^0, x_2^0, x_3^0)$ and $P_1 = (x_1^1, x_2^1, x_3^1)$ on the unit sphere $S^2 = \{(x_1, x_2, x_3) \in \mathbb{R}^3 \mid \sum_{i=1}^{3} x_i^2 = 1\}$, find the path joining them whose arclength is the shortest.

We adopt spherical coordinates $v = (\theta, \varphi) \in [-\frac{\pi}{2}, \frac{\pi}{2}] \times [0, 2\pi)$ on $S^2$ such that
\[
\begin{cases}
x_1 = x_1(v) = \cos\theta \cos\varphi, \\
x_2 = x_2(v) = \cos\theta \sin\varphi, \\
x_3 = x_3(v) = \sin\theta,
\end{cases}
\]
and $v^i = (\theta_i, \varphi_i)$ corresponds to $P_i = (x_1(v^i), x_2(v^i), x_3(v^i))$ for $i = 0, 1$. Let $M = \{v \in C^1([0, 1], [-\frac{\pi}{2}, \frac{\pi}{2}] \times [0, 2\pi)) \mid v(i) = v^i,\ i = 0, 1\}$; then for all $v \in M$, $u(t) = (x_1(v(t)), x_2(v(t)), x_3(v(t)))$ for $t \in [0, 1]$ is an arc connecting $P_0$ and $P_1$.

The square of the line element is
\[
ds^2 = dx_1^2 + dx_2^2 + dx_3^2 = d\theta^2 + \cos^2\theta\, d\varphi^2 = \left(\theta'(t)^2 + \cos^2\theta(t)\, \varphi'(t)^2\right) dt^2.
\]
Hence, the arclength $L : M \to \mathbb{R}$ is given by
\[
L(u) = \int_0^1 |ds| = \int_0^1 \sqrt{\theta'(t)^2 + \cos^2\theta(t)\, \varphi'(t)^2}\, dt.
\]
The arclength $L$ is now a functional of the parametric function $v(t) = (\theta(t), \varphi(t))$, and we wish to find $v \in M$ such that $L(v)$ is a minimum.
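
As a small numerical illustration of this functional, the sketch below evaluates the arclength integral for a path that stays on the circle of latitude $\theta = \pi/4$ and compares it with the great-circle distance between the same endpoints. The function name `sphere_arclength`, the chosen endpoints, and the midpoint quadrature are assumptions made for this example only.

```python
import math

def sphere_arclength(theta, dtheta, dphi, n=2000):
    """Midpoint-rule approximation of the integral over [0, 1] of
    sqrt(theta'(t)^2 + cos^2(theta(t)) * phi'(t)^2)."""
    total = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        total += math.sqrt(dtheta(t) ** 2 + (math.cos(theta(t)) * dphi(t)) ** 2)
    return total / n

def to_cartesian(theta, phi):
    """Spherical coordinates (theta, phi) to a point on the unit sphere."""
    return (math.cos(theta) * math.cos(phi),
            math.cos(theta) * math.sin(phi),
            math.sin(theta))

# Hypothetical path along the latitude theta = pi/4, with phi running from 0 to pi.
lat = math.pi / 4
latitude_length = sphere_arclength(lambda t: lat, lambda t: 0.0, lambda t: math.pi)

# Great-circle distance between the same endpoints, from the angle between them.
p0 = to_cartesian(lat, 0.0)
p1 = to_cartesian(lat, math.pi)
dot = sum(a * b for a, b in zip(p0, p1))
great_circle_length = math.acos(max(-1.0, min(1.0, dot)))
print(latitude_length, great_circle_length)
```

The latitude path has length $\pi\cos(\pi/4)$, which exceeds the great-circle distance $\pi/2$ between the same two points, so this element of $M$ is not a minimizer of $L$.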
Example 1.3 (Minimal surfaces) Given a Jordan curve $\Gamma$ in $\mathbb{R}^3$, is it possible to find a surface $S$ bounded by $\Gamma$ whose area is a minimum?

We parametrize $S$ by $(u, v) \mapsto Z = (x, y, z)$, a map $\bar{D} \to \mathbb{R}^3$, where $D \subset \mathbb{R}^2$ is the unit disk and $\bar{D} = \{(u, v) \mid u^2 + v^2 \le 1\}$, such that
\[
\begin{cases}
x = x(u, v), \\
y = y(u, v), \\
z = z(u, v).
\end{cases}
\]
The area $A(Z)$ of $S$ is given by
\[
A(Z) = \int_D |Z_u \times Z_v|\, du\, dv
= \int_D \sqrt{(x_u y_v - y_u x_v)^2 + (y_u z_v - z_u y_v)^2 + (z_u x_v - x_u z_v)^2}\, du\, dv.
\]
The area $A$ is therefore a functional of the parametric function $Z$, and we wish to minimize $A$ under the condition that $Z|_{\partial D}$ is homeomorphic to ($\simeq$) $\Gamma$. Namely, we want to find the vector-valued function $Z(u, v) = (x(u, v), y(u, v), z(u, v)) \in M$, where $M = \{Z \in C^1(\bar{D}, \mathbb{R}^3) \mid Z|_{\partial D} \simeq \Gamma\}$, such that $A(Z)$ is a minimum.
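
The area functional can be evaluated numerically for concrete parametrizations. The following sketch is an illustration only: the helper names `cross_norm` and `surface_area`, the polar midpoint quadrature, and the two sample surfaces (a flat disk and a bowl $z = (u^2 + v^2 - 1)/2$, both spanning the unit circle in the plane $z = 0$) are assumptions introduced here.

```python
import math

def cross_norm(a, b):
    """Euclidean norm of the cross product a x b of two 3-vectors."""
    cx = a[1] * b[2] - a[2] * b[1]
    cy = a[2] * b[0] - a[0] * b[2]
    cz = a[0] * b[1] - a[1] * b[0]
    return math.sqrt(cx * cx + cy * cy + cz * cz)

def surface_area(Zu, Zv, nr=200, nt=200):
    """Midpoint rule in polar coordinates for the integral of |Z_u x Z_v| over the unit disk."""
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) / nr
        for j in range(nt):
            t = 2.0 * math.pi * (j + 0.5) / nt
            u, v = r * math.cos(t), r * math.sin(t)
            total += cross_norm(Zu(u, v), Zv(u, v)) * r  # du dv = r dr dt
    return total * (1.0 / nr) * (2.0 * math.pi / nt)

# Flat disk Z = (u, v, 0).
area_flat = surface_area(lambda u, v: (1.0, 0.0, 0.0),
                         lambda u, v: (0.0, 1.0, 0.0))

# Bowl Z = (u, v, (u^2 + v^2 - 1) / 2), with the same boundary curve.
area_bowl = surface_area(lambda u, v: (1.0, 0.0, u),
                         lambda u, v: (0.0, 1.0, v))
print(area_flat, area_bowl)
```

Both surfaces are bounded by the same circle, yet the flat disk has area $\pi$ while the bowl has area $\frac{2\pi}{3}(2\sqrt{2} - 1)$, which is larger.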
Example 1.4 (Eigenvalue problems and inequalities) Let $\Omega \subset \mathbb{R}^n$ be a bounded region. For all $u \in H_0^1(\Omega)$, where $H_0^1(\Omega)$ denotes the Sobolev space with zero boundary condition (for details, see Lecture 10), we define the energy
\[
E(u) = \int_{\Omega} |\nabla u(x)|^2\, dx
\]
and the constraint
\[
G(u) = \int_{\Omega} |u(x)|^2\, dx = 1.
\]
We define
\[
M = \{u \in H_0^1(\Omega) \mid G(u) = 1\}
\]
and we wish to find a function $u_1$ such that the functional $E : M \to \mathbb{R}$ attains its minimum at $u_1$. Furthermore,
\[
\lambda_1 = \min\{E(u) \mid G(u) = 1\}
\]
is called the first eigenvalue, with the corresponding first eigenfunction $u_1$. The eigenpair has great importance in geometry, physics, and engineering.
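
A one-dimensional sketch may help fix ideas. Assume $\Omega = (0, \pi)$, for which the first Dirichlet eigenvalue of $-u'' = \lambda u$ is $\lambda_1 = 1$ with eigenfunction proportional to $\sin x$. The code below rescales a trial function so that $G = 1$ and then evaluates $E$; the names `integrate` and `constrained_energy` and the parabolic trial function are hypothetical choices made for this illustration.

```python
import math

def integrate(f, a, b, n=20000):
    """Midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def constrained_energy(u, du, a, b):
    """Energy E of the rescaled function u / sqrt(G(u)), which satisfies G = 1."""
    G = integrate(lambda x: u(x) ** 2, a, b)
    E = integrate(lambda x: du(x) ** 2, a, b)
    return E / G

a, b = 0.0, math.pi
# The first eigenfunction sin(x) attains the minimum value lambda_1 = 1.
energy_sine = constrained_energy(math.sin, math.cos, a, b)
# Another admissible function vanishing at both endpoints: x * (pi - x).
energy_parabola = constrained_energy(lambda x: x * (math.pi - x),
                                     lambda x: math.pi - 2.0 * x, a, b)
print(energy_sine, energy_parabola)
```

The parabola yields $10/\pi^2 \approx 1.0132$, slightly above $\lambda_1 = 1$, as it must for any admissible function other than the eigenfunction.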
Oftentimes, inequalities in analysis and geometry can be posed as variational problems. For instance, Sobolev’s inequality asserts
\[
\int_{\mathbb{R}^N} |\nabla u(x)|^2\, dx \ge S_N \left( \int_{\mathbb{R}^N} |u(x)|^{\frac{2N}{N-2}}\, dx \right)^{\frac{N-2}{N}},
\]
where
\[
S_N = N(N-2)\pi \left( \frac{\Gamma(N)}{\Gamma(N/2)} \right)^{-2/N}.
\]
We may turn this into a variational problem as follows. Let
\[
M = \left\{ u \in H_0^1(\mathbb{R}^N) \;\middle|\; \int_{\mathbb{R}^N} |u(x)|^{\frac{2N}{N-2}}\, dx = 1 \right\}
\]
and
\[
E(u) = \int_{\mathbb{R}^N} |\nabla u(x)|^2\, dx;
\]
we then seek $u \in M$ at which the functional $E$ attains its minimum. If the minimal value is $S_N$, then $S_N$ is the best constant for the above inequality.
Example 1.5 (Vibrations of thin plates) We study the vibrations of a thin elastic plate under an exterior force. Here thin means that the ratio between the thickness $h$ and the minimal span $a$ of the plate satisfies $h/a \ll 1$; the plate is assumed to be homogeneous and isotropic. In continuum mechanics, Kirchhoff proposed the so-called “assumption of straight line”, i.e. straight lines normal to the plate remain straight, and neither stretch nor strain occurs on the plate during deformation.

Suppose a plane region $\Omega$ represents a thin plate with density function $\rho(x, y)$. Let $w(x, y)$ be the displacement of $(x, y) \in \Omega$. The stress–strain relation can then be expressed via the potential energy density, which depends on the Hessian matrix of $w(x, y)$:
\[
\begin{pmatrix}
w_{xx} & w_{xy} \\
w_{yx} & w_{yy}
\end{pmatrix}.
\]
Since all physical quantities remain independent of the coordinates chosen, the potential energy density can only depend on the quantities
\[
w_{xx} + w_{yy} \quad \text{and} \quad w_{xx} w_{yy} - w_{xy}^2.
\]
Let $f(x, y)$ be the density of the stress applied on $\Omega$; its contribution to the potential energy density is taken to be only the work done by this exterior force. If we ignore the stress on $\partial\Omega$, the boundary of $\Omega$, and the bending on the boundary, the total potential energy is then given by
\[
U(w) = \int_{\Omega} \left\{ \frac{1}{2} \left[ (w_{xx} + w_{yy})^2 - 2(1 - \mu)(w_{xx} w_{yy} - w_{xy}^2) \right] + f(x, y)\, w(x, y) \right\} dx\, dy,
\]
where $\mu$, known as the Poisson ratio, is determined by the material of the plate.

Suppose now we fix the boundary of the thin plate, i.e. $w|_{\partial\Omega} = \varphi(x)$ with $\varphi$ a given function on $\partial\Omega$. Let $M = \{w \in C^2(\bar{\Omega}) \mid w|_{\partial\Omega} = \varphi(x)\}$; then $U$ is a functional of the displacement function $w \in M$.

From the principles of mechanics, we know that the equilibrium of the plate must obey a variational principle. In other words, the potential energy functional $U$ of the plate achieves its minimum at the displacement $w$ where the plate is in equilibrium.
1.4 More examples
Unlike the traditional examples above, certain problems may not at first appear to be related to finding extremal values of functionals. With proper modifications, however, they can be transformed into variational problems.
Example 1.6 (Reinvestment) We assume the production rate of certain goods at time $t$ is $q = q(t)$, and the growth rate $\dot{q}$ is proportional to the percentage of reinvestment $u(t)$ at time $t$, i.e.
\[
\dot{q} = \alpha u q,
\]
where $\alpha > 0$ is a constant.

Consequently, the total number of goods produced on the time interval $[0, T]$ is given by
\[
J(u, q) = \int_0^T (1 - u(t))\, q(t)\, dt.
\]
Given the initial value $q(0) = q_0$, the question to consider is: how should we choose the reinvestment percentage $u(t)$ such that the total product $J$ is a maximum?

Once we set
\[
M = \{(u, q) \in C[0, T] \times C^1[0, T] \mid 0 \le u(t) \le 1,\ \dot{q} = \alpha u q,\ q(0) = q_0\},
\]
the mapping $(u, q) \mapsto J(u, q)$ becomes a functional on $M$. The question is to find $(u, q) \in M$ such that $J$ achieves its maximum. This is an optimal control problem, where the roles of $u$ and $q$ are unequal. We call $u$ the control variable, $q$ the state variable, and $J$ the target functional.
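
To see how the choice of control affects $J$, the sketch below simulates the state equation for a few sample controls. Everything in it is an illustrative assumption: the function name `total_output`, the parameter values $\alpha = 1$, $q_0 = 1$, $T = 2$, and the three trial controls. One of them is a piecewise constant "switching" control, which is discontinuous and therefore lies outside the set $M$ as defined above; it is included only for comparison.

```python
import math

def total_output(u, alpha, q0, T, n=10000):
    """Approximate J(u, q) = integral over [0, T] of (1 - u(t)) q(t) dt,
    subject to q' = alpha * u * q and q(0) = q0."""
    dt = T / n
    q = q0
    J = 0.0
    for k in range(n):
        t = (k + 0.5) * dt                       # control sampled at the step midpoint
        uk = min(1.0, max(0.0, u(t)))            # enforce 0 <= u <= 1
        q_next = q * math.exp(alpha * uk * dt)   # exact when u is constant on the step
        J += (1.0 - uk) * 0.5 * (q + q_next) * dt  # trapezoid rule for the integrand
        q = q_next
    return J

alpha, q0, T = 1.0, 1.0, 2.0
J_none = total_output(lambda t: 0.0, alpha, q0, T)      # never reinvest
J_half = total_output(lambda t: 0.5, alpha, q0, T)      # always reinvest half
J_switch = total_output(lambda t: 1.0 if t < 1.0 else 0.0, alpha, q0, T)
print(J_none, J_half, J_switch)
```

With these values, never reinvesting gives $J = 2$, constant half reinvestment gives $J = e - 1$, and reinvesting everything until $t = 1$ and nothing afterwards gives $J = e$. Among controls of this switching form, elementary calculus applied to $q_0 e^{\alpha t_s}(T - t_s)$ gives the best switching time $t_s = T - 1/\alpha$, which is the value used here.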
In dealing with variational problems, we often refer to the variables as “functions”, but the meaning of “functions” can be quite broad.
Example 1.7 (Image segmentation) Imagine the following scenario: a picture of someone (with possible damage) is presented to us and we are asked to detect the edge of the image of the person.

Suppose the background of the picture occupies a plane region $\Omega \subset \mathbb{R}^2$ and $g : \Omega \to \mathbb{R}^1$ is its image. We wish to find a function $u : \Omega \to \mathbb{R}^1$ such that $u$ is the best fit for $g$ along the edge of the image, without leaving special marks elsewhere.

In order to describe the edge of the image, we introduce a closed subset $K$ of $\bar{\Omega}$ with finite one dimensional Hausdorff measure $H^1(K)$. We define
\[
I(K, u) = \int_{\Omega \setminus K} |\nabla u|^2\, dx\, dy + \mu \int_{\Omega \setminus K} |u - g|^2\, dx\, dy + \lambda H^1(K),
\]
where $\lambda, \mu > 0$ are adjustable parameters.

It is worth noting that $I$ depends not only on the function $u$, but also on the closed subset $K$. Although the variable $K$ is not a function, we may replace it by its characteristic function
\[
\chi_K(x) =
\begin{cases}
1, & x \in K, \\
0, & x \in \Omega \setminus K,
\end{cases}
\]
and $I$ becomes a functional depending on $\chi_K$ and $u$. We may henceforth use $K$ and $\chi_K$ interchangeably.

We can thus turn this image segmentation problem into an extremal problem for the functional $I$. For this reason, we define the set
\[
M = \left\{ (K, u) \;\middle|\; K \subset \Omega \text{ is closed},\ H^1(K) < \infty,\ \int_{\Omega} (|\nabla u|^2 + |u|^2)\, dx\, dy < \infty \right\},
\]
where $\nabla u$ should be understood in the sense of distributions. Our goal is to find $(K, u) \in M$ such that $I : M \to \mathbb{R}$ achieves its minimum.
Example 1.8 (Harmonic mappings) Assume (M, g) and (N, h) are two compact
Riemannian manifolds. Given a mapping u ∈ C 1 (M, N ), its differential du ∈
Γ(T ∗ M × u−1 T N ) is a cross section of the product bundle, where u−1 T N is a
bundle over M with metric h ◦ u and T ∗ M is the cotangent bundle on M . In local
coordinates: x = (x1 , . . . , xm ) and u = (u1 , . . . , un ), we have:
\[ du = \frac{\partial u^i}{\partial x^\alpha}\, dx^\alpha \otimes \frac{\partial}{\partial u^i}. \]
We define the energy density to be
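\[ L(u, du) = \frac{1}{2} |du(x)|_h^2 = \frac{1}{2} \sum_{i,j=1}^{n} \sum_{\alpha,\beta=1}^{m} h_{ij}(u(x))\, g^{\alpha\beta}(x)\, \frac{\partial u^i}{\partial x^\alpha} \frac{\partial u^j}{\partial x^\beta}. \]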
Let dVg denote the volume element of M , then the total energy
\[ E(u) = \int_M L(u(x), du(x))\, dV_g \]
is a functional of u.
The mapping u itself can also be regarded as a “function”. If we take M =
C 1 (M, N ), then E : M → R is a functional. Those mappings u ∈ M for which
E attains its minimal value are called harmonic mappings. They play an important
role in differential geometry.
Example 1.9 (Hamiltonian system) Given a function H ∈ C¹(R¹ × Rⁿ × Rⁿ). We introduce the following notation: x = (x1, . . . , xn), p = (p1, . . . , pn), (t, x, p) ∈ R¹ × Rⁿ × Rⁿ, and
\[ (x, p)_{R^n} = \sum_{i=1}^{n} x_i p_i. \]
When both x and p are (vector-valued) functions of time t, the system of
ordinary differential equations
\[ \begin{cases} \dot{x} = H_p(t, x, p), \\ \dot{p} = -H_x(t, x, p) \end{cases} \]
is called a Hamiltonian system.
Let M be a collection of pairs of functions (x, p) ∈ C¹(R¹, Rⁿ × Rⁿ) such that the functions as well as their derivatives are integrable on R¹. We impose certain conditions on H such that
\[ I(x, p) = \int_{R^1} [(\dot{x}, p)_{R^n} - H(t, x, p)]\, dt \]
is a functional on M . Clearly, I is neither bounded above nor below, hence it has
no extremal values. However, in our subsequent lectures, we shall find out for
certain choices of M , those pairs (x(t), p(t)) for which I attains its critical values
(or stationary values) are precisely solutions of the Hamiltonian system.
Example 1.10 (Einstein field equation) In the theory of general relativity, it is
customary to use the signature (1, 3) Minkowski metric (gij ) on R4 to describe the
gravitational field. The Minkowski metric is a non-degenerate symmetric bilinear
form with sign convention (1, −1, −1, −1) and whose line element is given by
\[ ds^2 = \sum_{i,j} g_{ij}\, dx^i\, dx^j. \]
We denote the Euclidean coordinates by (x, y, z) and the time coordinate by t, then (x⁰, x¹, x², x³) = (ct, x, y, z), where c is the speed of light. In an inertial frame, we have:
\[ ds^2 = (dx^0)^2 - (dx^1)^2 - (dx^2)^2 - (dx^3)^2. \]
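We use the Christoffel symbols Γ^i_{jk} to compute the curvature tensor to be
\[ R_{iklm} = \frac{1}{2}\left( \frac{\partial^2 g_{im}}{\partial x^k \partial x^l} + \frac{\partial^2 g_{kl}}{\partial x^i \partial x^m} - \frac{\partial^2 g_{il}}{\partial x^k \partial x^m} - \frac{\partial^2 g_{km}}{\partial x^i \partial x^l} \right) + \sum_{n,p} g_{np} \left( \Gamma^n_{kl} \Gamma^p_{im} - \Gamma^n_{km} \Gamma^p_{il} \right), \]
the Ricci tensor is given by
\[ R_{ik} = \sum_{l} \left( \frac{\partial \Gamma^l_{ik}}{\partial x^l} - \frac{\partial \Gamma^l_{il}}{\partial x^k} \right) + \sum_{l,m} \left( \Gamma^l_{ik} \Gamma^m_{lm} - \Gamma^m_{il} \Gamma^l_{km} \right) \]
and the scalar curvature is
\[ R = \sum_{i,k} g^{ik} R_{ik} = \sum_{i,k} \sum_{l,m} g^{il} g^{km} R_{iklm}. \]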
Einstein introduced the following decomposition of S = S(g):
\[ S = S_g + S_m, \]
where
\[ S_g = \int R \, d\Omega, \qquad d\Omega = \sqrt{|\det(g_{ij})|}\, d^4x \]
is called the Einstein–Hilbert action; it represents the contribution of the gravitational field without any matter. The quantity S_m is given by
\[ S_m = \frac{1}{c} \int \Lambda \sqrt{-\det(g_{ij})}\, d^4x, \]
where Λ is a function determined by the matter and the metric; it measures the contribution of the gravitational field.
Note that the metric g is a variable, also a function taking its values in non-
degenerate real symmetric 4 × 4 matrices of signature (1, 3), whereas S = S(g)
is regarded as a functional of g.
The motion of matter obeys variational principles; hence the gravitational field, i.e. the metric g, makes S stationary.
In summary, variational problems have very rich content. From classical mechanics to gauge theory, the laws of motion follow variational principles. The reflection of a light beam, the curved surface of a fluid in a capillary, and soap bubbles are also examples of natural phenomena which abide by variational principles. Furthermore, from engineering design to social and economic life, we are continuously challenged by problems seeking maximal speed, maximal distance, minimal consumption, optimal shape, the lightest weight, optimal revenue, the sharpest image, etc., all of which ultimately lead to solving extremal value problems of functionals.
Lecture 2
The Euler–Lagrange equation
2.1 The necessary condition for the extremal values of functions: a review
Before we begin our study on the necessary condition for the extremal values
of a functional, we shall first review the necessary condition for the extremal val-
ues of a real-valued function. Let Ω ⊂ Rn be an open set. Suppose the function
f ∈ C 1 (Ω) attains its (local) minimum at some point x0 ∈ Ω, i.e. there exists an
open neighborhood U ⊂ Ω of x0 such that
f (x) ≥ f (x0 ), ∀ x ∈ U.
Hence, ∀ h ∈ Rⁿ \ {θ}, where θ denotes the zero vector in Rⁿ, ∃ ε(h) > 0 such that whenever 0 < |ε| < ε(h), x0 + εh ∈ U and
f(x0 + εh) ≥ f(x0).
For 0 < ε < ε(h), this gives
\[ \frac{1}{\varepsilon} [f(x_0 + \varepsilon h) - f(x_0)] \ge 0, \]
while for −ε(h) < ε < 0 the same quotient is ≤ 0. Letting ε → 0 from either side, it yields
\[ (\nabla f(x_0), h)_{R^n} = 0, \]
where (·, ·) denotes the standard inner product on Rⁿ. Since h ∈ Rⁿ \ {θ} is arbitrary, it follows that ∇f(x0) = θ; this is a necessary condition for x0 to be a local minimum of f.
2.2 The derivation of the Euler–Lagrange equation
In this section, we only consider functionals depending on a function of a single variable (which may be vector-valued). The multivariate case is discussed in detail in Lecture 6.
Given an interval J = [t0, t1] ⊂ R¹ and an open subset Ω ⊂ R^N. For a given continuously differentiable function L = L(t, u, p), L ∈ C¹(J × Ω × R^N, R¹), together with two points P0, P1 ∈ Ω, we define the set
M = {u ∈ C¹(J, Ω) | u(ti) = Pi, i = 0, 1}
and the functional I on M via
\[ I(u) = \int_J L(t, u(t), \dot{u}(t))\, dt. \]
We call u∗ a minimum of I on M if there exists an open neighborhood U of u∗ in M with respect to the C¹(J, Ω) topology such that
I(u) ≥ I(u∗), ∀ u ∈ M ∩ U.
Assuming the existence of u∗, we shall determine the necessary condition for which I attains its minimum at u∗.
Similar to the extremal value problems of functions, ∀ ϕ ∈ C¹₀(J, R^N) (C¹₀(J, R^N) is the closure of C^∞₀(J, R^N) with respect to the C¹(J, R^N) topology), ∃ ε(ϕ) > 0 such that whenever 0 < |ε| < ε(ϕ), u∗ + εϕ ∈ U and
I(u∗ + εϕ) − I(u∗) ≥ 0.
It follows, after integrating the first term by parts, that
\[ \begin{aligned} \delta I(u^*, \varphi) &= \lim_{\varepsilon \to 0} \frac{1}{\varepsilon} \big( I(u^* + \varepsilon\varphi) - I(u^*) \big) \\ &= \int_J \sum_{i=1}^{N} \big[ L_{u_i}(t, u^*(t), \dot{u}^*(t)) \varphi_i(t) + L_{p_i}(t, u^*(t), \dot{u}^*(t)) \dot{\varphi}_i(t) \big]\, dt \\ &= -\int_J \sum_{i=1}^{N} \Big( \int_{t_0}^{t} L_{u_i}(s, u^*(s), \dot{u}^*(s))\, ds - L_{p_i}(t, u^*(t), \dot{u}^*(t)) \Big) \dot{\varphi}_i(t)\, dt \\ &\ge 0, \end{aligned} \]
∀ ϕ ∈ C¹₀(J, R^N). We call δI(u∗, ϕ) the first order variation of I with respect to ϕ.
If we replace ε > 0 by ε < 0, it is equivalent to replacing ϕ by −ϕ in the above inequality. Hence, ∀ ϕ ∈ C¹₀(J, R^N), we have:
\[ \int_J \sum_{i=1}^{N} \Big( \int_{t_0}^{t} L_{u_i}(s, u^*(s), \dot{u}^*(s))\, ds - L_{p_i}(t, u^*(t), \dot{u}^*(t)) \Big) \dot{\varphi}_i(t)\, dt = 0. \]
In order to better understand this integral in relation to u∗, it is desirable to remove the arbitrary function ϕ. To do so, we shall need the following lemma.
Lemma 2.1 (du Bois–Reymond) If ψ ∈ C[t0, t1] satisfies
\[ \int_J \psi(t) \cdot \dot{\lambda}(t)\, dt = 0, \quad \forall\, \lambda \in C_0^1(J), \]
where C¹₀(J) = C¹₀(J, R¹), which is equal to {u ∈ C¹(J) | u(t0) = u(t1) = 0}, then ψ is a constant.
Proof Let
\[ c = \frac{1}{|J|} \int_J \psi(t)\, dt \quad \text{and} \quad \lambda(t) = \int_{t_0}^{t} (\psi(s) - c)\, ds, \]
then λ ∈ C¹₀(J). Since ∫_J (ψ(t) − c) dt = 0, we have
\[ \int_J (\psi(t) - c)^2\, dt = \int_J \psi(t)(\psi(t) - c)\, dt = \int_J \psi(t) \cdot \dot{\lambda}(t)\, dt = 0. \]
By continuity, ψ ≡ c, so ψ must be a constant.

This leads to
Theorem 2.1 Assume u∗ ∈ M is a minimizer of the functional I; then it must satisfy the integral form of the Euler–Lagrange equation (E-L equation for short henceforth):
\[ \int_{t_0}^{t} L_{u_i}(s, u^*(s), \dot{u}^*(s))\, ds - L_{p_i}(t, u^*(t), \dot{u}^*(t)) = \text{const.}, \quad 1 \le i \le N,\ \forall\, t \in J. \]
Note that the E-L equation is a necessary condition for u∗ to be a minimizer of I on M; it need not be sufficient. Moreover, the solutions of the E-L equation correspond to critical points of the functional I; they may be minima, maxima, or neither.
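The vanishing of the first order variation can be checked numerically. The following is only a minimal sketch under stated assumptions: the test Lagrangian L(t, u, p) = p²/2 on J = [0, 1], the candidate u∗(t) = t, the test function ϕ(t) = sin(πt), and the helper names `action` and `first_variation` are all hypothetical choices made for illustration, not taken from the text.

```python
import math

def action(u, du, lagrangian, t0, t1, n=2000):
    """Midpoint-rule approximation of I(u) = integral of L(t, u(t), u'(t)) over [t0, t1]."""
    h = (t1 - t0) / n
    total = 0.0
    for k in range(n):
        t = t0 + (k + 0.5) * h
        total += lagrangian(t, u(t), du(t)) * h
    return total

# Hypothetical test problem: L(t, u, p) = p^2 / 2 with u(0) = 0, u(1) = 1.
L = lambda t, u, p: 0.5 * p * p
u_star = lambda t: t            # candidate minimizer (solves u'' = 0)
du_star = lambda t: 1.0
phi = lambda t: math.sin(math.pi * t)                 # vanishes at both endpoints
dphi = lambda t: math.pi * math.cos(math.pi * t)

def perturbed_action(eps):
    """I(u* + eps * phi)."""
    return action(lambda t: u_star(t) + eps * phi(t),
                  lambda t: du_star(t) + eps * dphi(t), L, 0.0, 1.0)

def first_variation(eps=1e-4):
    """Central-difference estimate of the first order variation along phi."""
    return (perturbed_action(eps) - perturbed_action(-eps)) / (2.0 * eps)

delta = first_variation()
I_star = perturbed_action(0.0)
I_perturbed = perturbed_action(0.1)
```

In this example L_u = 0 and L_p = u̇∗ = 1, so the integral form of the E-L equation holds with the constant −1, and the estimated first variation is zero up to rounding. A finite perturbation raises the value of I, which is consistent with u∗ being a minimizer, although a single test function proves nothing in general.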
Remark 2.1 When L ∈ C¹ and u ∈ C¹, using the theory of distributions, the integral form of the E-L equation may also be written as
\[ -D L_p(t, u(t), \dot{u}(t)) + L_u(t, u(t), \dot{u}(t)) = 0, \]
where D is the generalized derivative. We can define the Euler–Lagrange operator E_L as follows:
\[ (E_L u)(t) = -D L_p(t, u(t), \dot{u}(t)) + L_u(t, u(t), \dot{u}(t)). \]
In particular, if L ∈ C² and u ∈ C², then the above expression is valid pointwise, hence we may simply replace D by the usual d/dt.
Remark 2.2 We can also relax the C¹(J) requirement in Theorem 2.1. For example, we may consider Lipschitz functions on J, Lip(J). Since a Lipschitz function u(t) is absolutely continuous on J, it has a derivative u̇(t) almost everywhere on J. The integral used in the functional I can therefore be interpreted in the sense of a Lebesgue integral, and we let
M = {u ∈ Lip(J, Ω) | u(ti) = Pi, i = 0, 1}.
Note that in M, ∀ δ > 0, U = {v ∈ Lip(J) | |v(t) − u∗(t)| + |v̇(t) − u̇∗(t)| < δ a.e., t ∈ J} is a neighborhood of u∗.
The E-L equation still holds in the sense of Lebesgue integral for almost all t ∈ J:
\[ \int_{t_0}^{t} L_{u_i}(s, u(s), \dot{u}(s))\, ds - L_{p_i}(t, u(t), \dot{u}(t)) = \text{const.}, \quad 1 \le i \le N. \]
In fact, since u̇ is bounded almost everywhere (a.e.) on J, there exists a compact neighborhood W ⊂ Ω × R^N of (u(t), u̇(t)) such that the derivative of L is bounded a.e. on J × W. Hence, we have:
\[ \begin{aligned} \delta I(u^*, \varphi) &= \lim_{s \to 0} \frac{1}{s} \big( I(u^* + s\varphi) - I(u^*) \big) \\ &= \lim_{s \to 0} \frac{1}{s} \int_J \big[ L(t, u^*(t) + s\varphi(t), \dot{u}^*(t) + s\dot{\varphi}(t)) - L(t, u^*(t), \dot{u}^*(t)) \big]\, dt \\ &= \int_J \sum_{i=1}^{N} \big[ L_{u_i}(t, u^*(t), \dot{u}^*(t)) \varphi_i(t) + L_{p_i}(t, u^*(t), \dot{u}^*(t)) \dot{\varphi}_i(t) \big]\, dt. \end{aligned} \]
To see this, we use the fact that the difference quotient inside the above integral is uniformly bounded, hence by Lebesgue's Dominated Convergence Theorem, we can pass the limit inside the integral.
Aside from this, in the du Bois–Reymond Lemma, we may replace the requirement ψ ∈ C(J) by ψ ∈ L^∞(J) and λ ∈ C¹₀(J) by λ ∈ AC₀(J), the space of all absolutely continuous functions on J which vanish on the boundary of J.
Remark 2.3 A piecewise C 1 continuous function is a Lipschitz function. By
a piecewise C 1 continuous function u, we mean there exists a finite set D =
{a1 , . . . , ak } such that u ∈ C 1 (J \ D) and u̇(ai ± 0) exists for 1 ≤ i ≤ k. In
this case, the integral form of the E-L equation continues to hold for the class of
piecewise C 1 continuous functions.
Many fundamental equations in mechanics and geometry are E-L equations.
We have the following examples:
Example 2.1 (The displacement of a moving particle) A force F acts on a particle of mass m. Assume the particle's displacement coordinate is given by x = (x1, x2, x3) ∈ R³ with |x|² = x1² + x2² + x3², the velocity is v = ẋ and the kinetic energy is T = ½mv². Suppose F is a potential force, i.e. there exists a function V such that −∇V = F; we then call the function
\[ L = T - V = \frac{1}{2} m v^2 - V = \frac{1}{2} m |\dot{x}|^2 - V(x) \]
the Lagrangian. On a properly chosen domain M, we consider the functional
\[ I(x) = \int_{t_1}^{t_2} L(x(t), \dot{x}(t))\, dt. \]
A minimizer x(t) of I satisfies the E-L equation
\[ F = m\ddot{x}, \]
which is exactly the orbit governed by Newton's second law of motion.
Given a collection of particles with n degrees of freedom, we denote the displacement coordinates by q = (q1, . . . , qn). It follows that
• the kinetic energy is given by T = ½ Σ_{i,j=1}^{n} a_ij q̇_i q̇_j, where (a_ij) is a positive definite matrix,
• the potential is given by V = V(q1, . . . , qn),
• the Lagrangian is given by L = T − V,
• the functional is given by
\[ I(q) = \int_{t_0}^{t_1} L(q(t), \dot{q}(t))\, dt = \int_{t_0}^{t_1} \Big( \frac{1}{2} \sum_{i,j=1}^{n} a_{ij} \dot{q}_i \dot{q}_j - V(q_1, \ldots, q_n) \Big)\, dt. \]
The derived E-L equations are
\[ \sum_{j=1}^{n} a_{ij} \ddot{q}_j = -\frac{\partial V}{\partial q_i} \quad (i = 1, 2, \ldots, n), \]
which are also Newtonian equations.

Example 2.2 (Geodesics) Suppose (M̃, g) is a Riemannian manifold equipped with a Riemannian metric g_ik(u), where (g_ik) is an N × N positive definite matrix. Given two points P1 and P2 belonging to the same coordinate chart U ⊂ M̃, where U is homeomorphic to an open subset of R^N. We then define L : R^N × R^N → R¹ by
\[ L(u, p) = \sum_{i,j=1}^{N} g_{ij}(u)\, p_i p_j. \]
Let M = {u ∈ C¹(J, U) | u(ti) = Pi, i = 1, 2}. For the functional
\[ I(u) = \int_J L(u(t), \dot{u}(t))\, dt, \]
its E-L equation looks like
\[ \frac{d}{dt} \sum_{j} \{ g_{ij}(u) \dot{u}_j \} = \frac{1}{2} \sum_{j,k} g_{jk,i}(u)\, \dot{u}_j \dot{u}_k, \]
where
\[ g_{jk,i}(u) = \frac{\partial}{\partial u_i} g_{jk}(u). \]
Consequently,
\[ \sum_{j} g_{ij}(u) \ddot{u}_j + \sum_{j,k} \Big( g_{ij,k}(u)\, \dot{u}_j \dot{u}_k - \frac{1}{2} g_{kj,i}(u)\, \dot{u}_j \dot{u}_k \Big) = 0. \]
Note that
\[ \sum_{j,k} g_{lj,k}(u)\, \dot{u}_k \dot{u}_j = \sum_{j} \frac{d}{dt} g_{lj}(u)\, \dot{u}_j = \sum_{k} \frac{d}{dt} g_{lk}(u)\, \dot{u}_k = \sum_{j,k} g_{kl,j}(u)\, \dot{u}_k \dot{u}_j. \]
By means of Christoffel symbols of the first kind
\[ \Gamma_{jlk}(u) = \frac{1}{2} \{ g_{lj,k} + g_{kl,j} - g_{jk,l} \}, \]
we see that
\[ \sum_{j} g_{ij}(u) \ddot{u}_j + \sum_{j,k} \Gamma_{jik}(u)\, \dot{u}_k \dot{u}_j = 0. \]
Let (g^{ik}) be the inverse matrix of (g_ik). Using the Christoffel symbols of the second kind
\[ \Gamma^i_{jk} = \sum_{l} g^{il}\, \Gamma_{jlk}, \]
the E-L equation becomes
\[ \ddot{u}_i + \sum_{j,k} \Gamma^i_{jk}(u)\, \dot{u}_j \dot{u}_k = 0 \quad \forall\, i. \]
This coincides with the geodesic equations in differential geometry.
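As a concrete check of the two kinds of Christoffel symbols, the sketch below evaluates them by central differences for one example metric. The metric (the Euclidean plane in polar coordinates, g = diag(1, r²)) and all function names are assumptions chosen for illustration; only the index formulas come from the example above.

```python
def metric(u):
    """Hypothetical example metric: Euclidean plane in polar coordinates u = (r, theta)."""
    r, _ = u
    return [[1.0, 0.0], [0.0, r * r]]

def d_metric(u, i, h=1e-5):
    """Central-difference approximation of the matrix (g_{jk,i}) = d g_{jk} / d u_i."""
    up, um = list(u), list(u)
    up[i] += h
    um[i] -= h
    gp, gm = metric(up), metric(um)
    return [[(gp[j][k] - gm[j][k]) / (2.0 * h) for k in range(2)] for j in range(2)]

def inverse_2x2(g):
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]

def christoffel_second_kind(u):
    """gamma[i][j][k] = sum_l g^{il} * (g_{lj,k} + g_{kl,j} - g_{jk,l}) / 2."""
    n = 2
    dg = [d_metric(u, i) for i in range(n)]      # dg[i][j][k] = g_{jk,i}
    ginv = inverse_2x2(metric(u))
    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                total = 0.0
                for l in range(n):
                    # Christoffel symbol of the first kind, Gamma_{jlk}
                    first_kind = 0.5 * (dg[k][l][j] + dg[j][k][l] - dg[l][j][k])
                    total += ginv[i][l] * first_kind
                gamma[i][j][k] = total
    return gamma

gamma = christoffel_second_kind((2.0, 0.3))
```

At r = 2 the only nonzero symbols should be Γ^r_{θθ} = −r and Γ^θ_{rθ} = Γ^θ_{θr} = 1/r, so the geodesic equations reduce to those of straight lines written in polar coordinates.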

In the following lectures, we will also use the E-L equation to derive many
fundamental equations in physics.
• The variational derivative
Our earlier derivation of the E-L equation took place on the whole interval J. However, ∀ c ∈ int(J), the pointwise E-L equation only depends on the behavior of L near c. This means it only depends on any open interval (c − h, c + h) ⊂ int(J). Since every test function ϕ with support inside (c − h, c + h) belongs to C¹₀(J, R^N), by the arbitrariness of ϕ, it yields the E-L equation on (c − h, c + h) (see Figure 2.1).
Fig. 2.1
Next, we examine such local behavior via limits. For simplicity, we shall take N = 1 and assume L ∈ C², u ∈ C². Note that
\[ \lim \frac{I(u + \varphi) - I(u)}{\Delta\sigma} = -\lim \frac{\displaystyle\int_{c-h}^{c+h} \int_{t_0}^{t} E_L(u)(s)\, ds\, \dot{\varphi}(t)\, dt}{\Delta\sigma} = \lim \frac{\displaystyle\int_{c-h}^{c+h} E_L(u)(t)\, \varphi(t)\, dt}{\Delta\sigma} = E_L(u)(c), \]
where θ = θ(t) ∈ (0, 1), the support of ϕ is contained in (c − h, c + h), and
\[ \Delta\sigma = \int_{c-h}^{c+h} \varphi(t)\, dt \]
is the area of the sector bounded by the curves u(t) + ϕ(t) and u(t) for t ∈ (c − h, c + h). It should also be noted that the limiting process is given by
\[ h \to 0, \qquad \sup_{t \in [c-h,\, c+h]} |\dot{\varphi}(t)| \to 0. \]
In light of the above calculations, we call the Euler–Lagrange operator of u at t
\[ E_L(u)(t) = -\frac{d}{dt} L_p(t, u(t), \dot{u}(t)) + L_u(t, u(t), \dot{u}(t)) \]
the variational derivative of I at t.
2.3 Boundary conditions
Recall that in §2.2, a function u from the domain
M = {u ∈ C¹(J, Ω) | u(ti) = Pi, i = 0, 1}
satisfies the boundary condition u(ti) = Pi (i = 0, 1) on the interval J. This implies the minimizer u∗ satisfies not only the E-L equation
\[ -\frac{d}{dt} L_p(t, u(t), \dot{u}(t)) + L_u(t, u(t), \dot{u}(t)) = 0, \]
but also the boundary conditions
u(t0) = P0 and u(t1) = P1.
If we change the domain to be M = C¹(J, R^N) instead, i.e. we impose no condition on the endpoints of J, it is interesting to find out what equation and boundary condition the minimizer u∗ of I would satisfy.
From §2.2, in the process of deriving the E-L equation, the crucial step is to choose a nearby function u of u∗ and compare their functional values. To be more specific, we choose
u = u∗ + εϕ
for ϕ ∈ C¹₀(J, R^N). Since ϕ vanishes at the endpoints of J, u and u∗ share the same boundary values.
In our current setting, since there is no need to impose any boundary condition on ϕ, any ϕ ∈ C¹(J, R^N) would work. Suppose u∗ ∈ C²(J, R^N); using integration by parts, we see that
\[ \begin{aligned} \delta I(u^*, \varphi) &= \int_J \sum_{i=1}^{N} \big[ L_{u_i}(t, u^*(t), \dot{u}^*(t)) \varphi_i(t) + L_{p_i}(t, u^*(t), \dot{u}^*(t)) \dot{\varphi}_i(t) \big]\, dt \\ &= \int_J \sum_{i=1}^{N} \Big[ L_{u_i}(t, u^*(t), \dot{u}^*(t)) - \frac{d}{dt} L_{p_i}(t, u^*(t), \dot{u}^*(t)) \Big] \varphi_i(t)\, dt \\ &\quad + \sum_{i=1}^{N} \Big[ L_{p_i}(t, u^*(t), \dot{u}^*(t))\, \varphi_i(t) \Big]_{t_0}^{t_1}. \end{aligned} \]
Since C¹₀(J, R^N) ⊂ C¹(J, R^N), we first choose an arbitrary ϕ ∈ C¹₀(J, R^N) to obtain the same E-L equation and then choose ϕ ∈ C¹(J, R^N) (with arbitrary boundary values). Since the first term on the right-hand side then vanishes and ϕi(tj) (j = 0, 1, i = 1, . . . , N) in the second term is arbitrary, it must be the case that
\[ L_{p_i}(t_j, u^*(t_j), \dot{u}^*(t_j)) = 0, \quad i = 1, \ldots, N,\ j = 0, 1. \]
Of course, there are many other choices for the set M ; for example, one may
choose to fix only one endpoint and leave the other one free; we may also im-
pose different boundary conditions on the different components of a vector-valued
function.
In some of our later discussions, we will also encounter other types of boundary conditions, such as periodic and free boundary conditions.
Caution: In all of our previous discussions, we have always assumed all func-
tions u ∈ M are continuously differentiable. If we replace the continuously differ-
entiable functions by piecewise C 1 functions, although the E-L equation remains
the same (locally), it is however necessary to insert corner conditions at those
points where the derivative function has a jump discontinuity (see Exercise 2.4).
2.4 Examples of solving the Euler–Lagrange equations
For N = 1, we shall consider the following special cases where the E-L equation can be simplified.
• Case 1. Suppose u is absent from L, then L = L(t, p) and
\[ \frac{d}{dt} L_p(t, \dot{u}(t)) = 0. \]
Since L_p(t, u̇(t)) = c is a first order equation without u, assuming we can solve for u̇ (e.g. L_pp(t, p) ≠ 0) to get
u̇(t) = g(t, c),
then by integration, u(t) is also solved.
Example 2.3 Let M = {u ∈ C¹([1, 2]) | u(1) = 0, u(2) = 1} and
\[ I(u) = \int_1^2 \sqrt{1 + \dot{u}^2}\, \frac{dt}{t}. \]
Find u such that it minimizes the functional I.
Solution Since L = t⁻¹√(1 + p²),
\[ L_p = \frac{p}{t \sqrt{1 + p^2}} = C. \]
It follows that
\[ \dot{u}^2 (1 - C^2 t^2) = C^2 t^2 \quad \text{or} \quad \dot{u} = \pm \frac{C t}{\sqrt{1 - C^2 t^2}}. \]
Taking the sign into account as part of the constant C, we integrate again to obtain
\[ u = \frac{1}{C} \sqrt{1 - C^2 t^2} + C_1. \]
Using the above boundary conditions, we deduce that C² = 1/5 and C1 = 2. Therefore,
\[ (u - 2)^2 + t^2 = 5. \]
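The result of Example 2.3 can be sanity-checked numerically. In the sketch below, the explicit branch u(t) = 2 − √(5 − t²) of the circle, the straight-line competitor u(t) = t − 1, and the helper `functional` are choices made here for illustration; the comparison with a single competitor supports, but does not prove, minimality.

```python
import math

def functional(u_dot, n=20000):
    """Midpoint-rule value of I(u) = integral over [1, 2] of sqrt(1 + u'(t)^2) / t."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        t = 1.0 + (k + 0.5) * h
        total += math.sqrt(1.0 + u_dot(t) ** 2) / t * h
    return total

# Branch of (u - 2)^2 + t^2 = 5 through (1, 0) and (2, 1).
arc = lambda t: 2.0 - math.sqrt(5.0 - t * t)
arc_dot = lambda t: t / math.sqrt(5.0 - t * t)

# First integral L_p = u' / (t * sqrt(1 + u'^2)), expected to be constant along the arc.
first_integral = [arc_dot(t) / (t * math.sqrt(1.0 + arc_dot(t) ** 2))
                  for t in (1.0, 1.25, 1.5, 1.75, 2.0)]

I_arc = functional(arc_dot)
I_line = functional(lambda t: 1.0)   # straight line u(t) = t - 1, same endpoints
```

Along the arc the quantity L_p stays equal to 1/√5, and the arc yields a smaller value of I than the straight line joining the same endpoints.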
• Case 2. (Autonomous systems) Suppose L is independent of t, then L = L(u, p). We introduce the Hamiltonian
H(u, p) = pL_p(u, p) − L(u, p).
Theorem 2.2 Assume L ∈ C² and it is independent of t. If u ∈ C²(J, R¹) is a solution of the E-L equation, then
H(u(t), u′(t)) = const., ∀ t.
Proof By direct calculation, we have:
\[ \frac{d}{dt} H(u(t), u'(t)) = u'(t) \Big[ \frac{d}{dt} L_p(u(t), u'(t)) - L_u(u(t), u'(t)) \Big] = -u'(t) \cdot E_L(u)(t) = 0. \]
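A minimal numerical illustration of Theorem 2.2 follows. The Lagrangian L(u, p) = p²/2 − u²/2 and the solution u(t) = cos t are assumptions picked for this sketch (they are not part of the text); for this choice the E-L equation reads u″ + u = 0.

```python
import math

def L(u, p):
    # Hypothetical autonomous Lagrangian: L(u, p) = p^2/2 - u^2/2.
    return 0.5 * p * p - 0.5 * u * u

def L_p(u, p):
    # Partial derivative of L with respect to p.
    return p

def hamiltonian(u, p):
    """H(u, p) = p * L_p(u, p) - L(u, p)."""
    return p * L_p(u, p) - L(u, p)

# u(t) = cos t solves the E-L equation u'' + u = 0, with u'(t) = -sin t.
samples = [hamiltonian(math.cos(t), -math.sin(t)) for t in (0.0, 0.7, 1.9, 3.3, 5.0)]
```

Here H = (p² + u²)/2, which equals 1/2 at every sampled time, as the theorem predicts.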

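Example 2.4 (The line of steepest descent) Here we have:
\[ L(u, p) = \frac{1}{\sqrt{2g}} \frac{\sqrt{1 + p^2}}{\sqrt{y_1 - u}}. \]
By Theorem 2.2, it implies
\[ (p L_p - L)\big|_{(u, u')} = \text{const.}, \]
i.e. ∃ a constant c such that
\[ -\frac{\sqrt{1 + u'^2}}{\sqrt{y_1 - u}} + \frac{u'^2}{\sqrt{(1 + u'^2)(y_1 - u)}} = c. \]
Hence,
\[ c^2 (1 + u'^2)(y_1 - u) = 1. \]
Let k be a positive constant yet to be determined. We make the following coordinate transformation: let θ be a parameter such that
\[ \begin{cases} x = x(\theta), \\ u = u(\theta), \end{cases} \]
and
\[ u(\theta) = y_1 - k(1 - \cos\theta). \]
Then, since u′ = u̇(θ)/ẋ(θ) = −k sin θ / ẋ(θ),
\[ c^2 \Big( 1 + \frac{k^2 \sin^2\theta}{\dot{x}(\theta)^2} \Big) k (1 - \cos\theta) = 1. \]
Taking c² = 1/(2k), we then have:
\[ \dot{x}(\theta) = k(1 - \cos\theta). \]
Thus, we have:
\[ \begin{cases} x(\theta) = x_1 + k(\theta - \sin\theta), \\ u(\theta) = y_1 - k(1 - \cos\theta), \end{cases} \qquad \theta \in [0, \Theta]. \]
Lastly, we use
\[ \begin{cases} x(\Theta) = x_2, \\ u(\Theta) = y_2 \end{cases} \]
to determine k and Θ.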
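The first integral of Example 2.4 can be verified along the cycloid. The numerical values of k, x1, y1 and the sample angles below are arbitrary assumptions, and `slope` is a hypothetical helper computing u′ = (du/dθ)/(dx/dθ).

```python
import math

k, x1, y1 = 1.5, 0.0, 2.0   # hypothetical parameter and starting point

def cycloid(theta):
    """Point (x, u) of the cycloid at parameter theta."""
    x = x1 + k * (theta - math.sin(theta))
    u = y1 - k * (1.0 - math.cos(theta))
    return x, u

def slope(theta):
    """u'(x) = (du/dtheta) / (dx/dtheta), valid for 0 < theta < 2*pi."""
    return (-k * math.sin(theta)) / (k * (1.0 - math.cos(theta)))

c_squared = 1.0 / (2.0 * k)
residuals = []
for theta in (0.3, 1.0, 2.0, 3.0):
    _, u = cycloid(theta)
    # Residual of the relation c^2 * (1 + u'^2) * (y1 - u) = 1.
    residuals.append(c_squared * (1.0 + slope(theta) ** 2) * (y1 - u) - 1.0)
```

Analytically, 1 + u′² = 2/(1 − cos θ) and y1 − u = k(1 − cos θ), so the product equals 2k and the residuals vanish up to rounding.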
Example 2.5 (A minimal surface generated by a surface of revolution) Given two points P1 = (x1, y1) and P2 = (x2, y2) in the xy-plane, where x1 < x2 and y1, y2 > 0, we are to find a function u ∈ C¹([x1, x2]) whose graph passes through these points such that the surface generated by revolving the graph of u about the x-axis has minimal area.
Without loss of generality, we may assume u(xi) = yi, i = 1, 2 and u(x) > 0. The area of the surface of revolution is given by
\[ I(u) = 2\pi \int_{x_1}^{x_2} u(x) \sqrt{1 + u'(x)^2}\, dx. \]
We now find u to minimize I.
Since the Hamiltonian is conserved (Theorem 2.2), we have, up to the factor 2π:
\[ L(u, \dot{u}) - \dot{u} L_p(u, \dot{u}) = C, \]
hence,
\[ u \sqrt{1 + \dot{u}^2} - \frac{u \dot{u}^2}{\sqrt{1 + \dot{u}^2}} = C, \quad \forall\, x, \]
or equivalently,
\[ \dot{u} = C^{-1} \sqrt{u^2 - C^2}. \]
After integration, we arrive at
\[ C \ln \frac{u + \sqrt{u^2 - C^2}}{C} = x + C_1, \]
or the equivalent form of
\[ u = C \cosh \frac{x + C_1}{C}, \]
which is the standard catenary equation.
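To get a feeling for the catenary, the sketch below compares the area of the surface generated by u(x) = cosh x on [−1, 1] with that of the cylinder through the same endpoints. The interval, the constant C = 1, C1 = 0 and the helper `area` are assumptions for illustration; comparing against one competitor does not establish minimality.

```python
import math

def area(u, du, a, b, n=20000):
    """Midpoint-rule value of 2*pi * integral of u(x) * sqrt(1 + u'(x)^2) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += u(x) * math.sqrt(1.0 + du(x) ** 2) * h
    return 2.0 * math.pi * total

C = 1.0
catenary_area = area(lambda x: C * math.cosh(x / C), lambda x: math.sinh(x / C), -1.0, 1.0)
cylinder_area = area(lambda x: C * math.cosh(1.0 / C), lambda x: 0.0, -1.0, 1.0)

# Closed form for the catenary with C = 1: 2*pi * integral of cosh(x)^2 over [-1, 1].
exact = 2.0 * math.pi * (1.0 + math.sinh(2.0) / 2.0)
```

The catenoid area agrees with the closed form and is smaller than the area of the cylinder of radius cosh 1.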
Example 2.6 (Geodesics on a sphere) As in Example 1.2, we adopt the parameter ϕ. By Theorem 2.2, we can rewrite the E-L equation as
\[ \frac{\theta'(\varphi)^2}{\sqrt{\cos^2\theta(\varphi) + \theta'(\varphi)^2}} - \sqrt{\cos^2\theta(\varphi) + \theta'(\varphi)^2} = c, \]
where c is a constant. By definition, −1 ≤ c ≤ 0. Hence,
\[ -\cos^2\theta(\varphi) = c \sqrt{\cos^2\theta(\varphi) + \theta'(\varphi)^2}, \]
namely,
\[ c^2 \theta'^2 = \cos^4\theta - c^2 \cos^2\theta. \]
By substituting t = tan θ in the above equation, it yields that
\[ \pm\varphi + \varphi_0 = \int \frac{c\, d\theta}{\cos\theta \sqrt{\cos^2\theta - c^2}} = \int \frac{c\, dt}{\sqrt{(1 - c^2) - c^2 t^2}} = \arcsin \frac{c t}{\sqrt{1 - c^2}}, \]
for 0 < c² < 1. This gives us
\[ \tan\theta(\varphi) = \frac{\sqrt{1 - c^2}}{c} \sin(\pm\varphi + \varphi_0) \]
or
\[ \theta(\varphi) = \arctan\Big( \frac{\sqrt{1 - c^2}}{c} \sin(\pm\varphi + \varphi_0) \Big). \]
The constants c and ϕ0 are determined by P0 and P1.
It turns out that this corresponds to the “great circles” on the sphere. When c = 0, θ = ±π/2, which corresponds to the north and south poles, a degenerate case as this is not a curve. When c = −1, it corresponds to the equator.
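One way to see that these curves are great circles is to check that they lie in a plane through the origin. The sketch below assumes that θ is the latitude and ϕ the longitude on the unit sphere, takes the + sign in ±ϕ, and uses arbitrary sample values of c and ϕ0; all names are hypothetical.

```python
import math

def point_on_sphere(theta, phi):
    """Unit-sphere point with latitude theta and longitude phi (assumed convention)."""
    return (math.cos(theta) * math.cos(phi),
            math.cos(theta) * math.sin(phi),
            math.sin(theta))

c, phi0 = -0.6, 0.4                      # hypothetical constants with 0 < c^2 < 1
a = math.sqrt(1.0 - c * c) / c

def theta_of(phi):
    """Solution curve theta(phi) = arctan(a * sin(phi + phi0))."""
    return math.atan(a * math.sin(phi + phi0))

# tan(theta) = a*sin(phi + phi0) is equivalent to z = a*sin(phi0)*x + a*cos(phi0)*y,
# i.e. a plane through the origin with normal (a*sin(phi0), a*cos(phi0), -1).
normal = (a * math.sin(phi0), a * math.cos(phi0), -1.0)
dots = []
for phi in (-1.0, 0.0, 0.5, 2.0):
    p = point_on_sphere(theta_of(phi), phi)
    dots.append(sum(n_i * p_i for n_i, p_i in zip(normal, p)))
```

Every sampled point is orthogonal to the fixed normal vector, so the curve is the intersection of the sphere with a plane through its center.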
• Case 3. Suppose p is absent from L, i.e. L = L(t, u), then a solution of the E-L equation
L_u(t, u) = 0
is a single curve or several curves.
Example 2.7 For the functional
\[ I(u) = \int_a^b (t - u)^2\, dt, \]
its E-L equation is
t − u = 0,
whose solution is the line u = t, ∀ t ∈ [a, b].
• Coordinate transformations In the following, we use the variational derivative to prove that the E-L equation is invariant under coordinate transformations. Let
\[ \begin{cases} s = s(t, u), \\ v = v(t, u), \end{cases} \]
whose inverse is
\[ \begin{cases} t = t(s, v), \\ u = u(s, v). \end{cases} \]
The Lagrangian L is now changed to
\[ \tilde{L}(s, v, q) = L\Big( t(s, v),\ u(s, v),\ \frac{u_s + u_v q}{t_s + t_v q} \Big) (t_s + t_v q). \]
Suppose the image of t ∈ [t0, t1] under the transformation is s ∈ [s0, s1], we then have:
\[ \int_{t_0}^{t_1} L(t, u(t), \dot{u}(t))\, dt = \int_{s_0}^{s_1} \tilde{L}(s, v(s), \dot{v}(s))\, ds. \]
Hence, the E-L equation is now of the form
\[ \tilde{L}_v - \frac{d}{ds} \tilde{L}_q = 0. \]
We can solve the latter equation first in the new coordinates and then convert it back under the inverse transformation.
Example 2.8 Consider the extremal values of the functional:
\[ I(r) = \int_{\theta_0}^{\theta_1} \sqrt{r^2 + \dot{r}^2}\, d\theta, \]
where r = r(θ). Its corresponding E-L equation is
\[ \frac{r}{\sqrt{r^2 + \dot{r}^2}} - \frac{d}{d\theta} \frac{\dot{r}}{\sqrt{r^2 + \dot{r}^2}} = 0. \]
Using polar coordinates:
x = r cos θ, u = r sin θ,
the functional I is of the form
\[ I(u) = \int_{x_0}^{x_1} \sqrt{1 + \dot{u}^2}\, dx, \]
whose E-L equation becomes
ü = 0.
A general solution to this second order equation is of the form
u = ax + b.
Substituting back to the original variables, it follows that
r sin θ = ar cos θ + b.
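The claim that the straight line solves the polar form of the E-L equation can be tested directly. In this sketch the line parameters a, b, the sample angles and the step size h are arbitrary assumptions; the derivative d/dθ is approximated by a central difference, so the residual is only expected to be small, not exactly zero.

```python
import math

a, b = 0.5, 1.0     # hypothetical line u = a*x + b

def r(theta):
    """Polar form of the line: r*sin(theta) = a*r*cos(theta) + b."""
    return b / (math.sin(theta) - a * math.cos(theta))

def r_dot(theta):
    s = math.sin(theta) - a * math.cos(theta)
    return -b * (math.cos(theta) + a * math.sin(theta)) / (s * s)

def L_rdot(theta):
    """Partial derivative of sqrt(r^2 + r'^2) with respect to r', along the line."""
    return r_dot(theta) / math.hypot(r(theta), r_dot(theta))

def el_residual(theta, h=1e-5):
    """r / sqrt(r^2 + r'^2) - d/dtheta [ r' / sqrt(r^2 + r'^2) ]."""
    d_term = (L_rdot(theta + h) - L_rdot(theta - h)) / (2.0 * h)
    return r(theta) / math.hypot(r(theta), r_dot(theta)) - d_term

residuals = [el_residual(t) for t in (1.0, 1.5, 2.0)]
```

The residuals vanish up to discretization error, which illustrates the invariance of the E-L equation under the change to polar coordinates.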
Exercises
1. Given an interval J = [t0, t1] ⊂ R¹ and an open subset Ω ⊂ R^N. Let L = L(t, u, p) ∈ C¹(J × Ω × R^N, R¹) be a continuously differentiable function. For two vectors ξ0, ξ1 ∈ R^N, we define
M1 = {u ∈ C¹(J, R^N) | u̇(ti) = ξi, i = 0, 1}
and the functional I on M1 by
\[ I(u) = \int_J L(t, u(t), \dot{u}(t))\, dt. \]
Find the necessary condition for which u0 ∈ M1 is a minimizer of I.
2. (The first Erdmann corner condition) Under the assumptions of Exercise 1, we further choose two points P0, P1 ∈ Ω. Assume
u0 ∈ M2 = {u ∈ PWC¹(J, R^N) | u(ti) = Pi, i = 0, 1}
is a minimizer of I, where PWC¹ denotes the set of piecewise C¹ continuous functions. If there exists t∗ ∈ (t0, t1) such that u̇0(t∗ − 0) ≠ u̇0(t∗ + 0), prove that
\[ L_{p_i}(t^*, u_0(t^*), \dot{u}_0(t^* - 0)) = L_{p_i}(t^*, u_0(t^*), \dot{u}_0(t^* + 0)), \quad i = 1, \ldots, N. \]
3. Let L ∈ C²(R^N × R^N) and J be a closed interval. Assume u ∈ C²(J, R^N) is a solution of the E-L equation of the functional I(u) = ∫_J L(u(t), u̇(t)) dt. Define the Hamiltonian to be
\[ H(u, p) = \sum_{i=1}^{N} p_i L_{p_i}(u, p) - L(u, p), \]
prove that
H(u(t), u̇(t)) = const.
Given a collection of particles with n degrees of freedom, we denote the displacement coordinates by q = (q1, . . . , qn), the kinetic energy by
\[ T = \frac{1}{2} \sum_{i,j=1}^{n} a_{ij} \dot{q}_i \dot{q}_j, \]
where (a_ij) is a positive definite matrix, and the potential by
V = V(q1, . . . , qn).
Let the Lagrangian be L = T − V. We may ask the following questions: what is the Hamiltonian H in this case? What physical meaning does it have?
4. (The second Erdmann corner condition) Under the assumptions of Exercise 2, prove that
\[ \begin{aligned} & L(t^*, u_0(t^*), \dot{u}_0(t^* - 0)) - \sum_{i=1}^{N} \dot{u}_{0,i}(t^* - 0)\, L_{p_i}(t^*, u_0(t^*), \dot{u}_0(t^* - 0)) \\ &\quad = L(t^*, u_0(t^*), \dot{u}_0(t^* + 0)) - \sum_{i=1}^{N} \dot{u}_{0,i}(t^* + 0)\, L_{p_i}(t^*, u_0(t^*), \dot{u}_0(t^* + 0)). \end{aligned} \]
Hint: Introduce the following coordinate transformation
t = v_{N+1}(s), ui(t) = vi(s), 1 ≤ i ≤ N, s ∈ Λ,
where v_{N+1} : Λ → J is a homeomorphism. Choose a function F ∈ C¹(R^{N+1} × R^{N+1}) such that
\[ F(y_1, \ldots, y_{N+1}, q_1, \ldots, q_{N+1}) = L\Big( y_{N+1},\ y_1, \ldots, y_N,\ \frac{q_1}{q_{N+1}}, \ldots, \frac{q_N}{q_{N+1}} \Big)\, q_{N+1}, \]
show that
(1) The functional K(v) = ∫_Λ F(v(s), v̇(s)) ds and the functional I(u) = ∫_J L(t, u(t), u̇(t)) dt have the same set of extremal values. Furthermore, their extrema can be derived from one another via the above coordinate transformation.
(2) ∀ λ > 0, F(y, λq) = λF(y, q).
(3) Using positive homogeneity, show that Euler's identity holds:
\[ F(y, q) = \sum_{i=1}^{N+1} F_{q_i}(y, q)\, q_i. \]
Lecture 3
The necessary condition and the sufficient condition on extremal values of functionals
3.1 The extremal values of functions: a revisit
Assume f ∈ C²(Ω, R¹), where Ω ⊂ Rⁿ is an open set, and x0 ∈ Ω is such that ∇f(x0) = 0. We may ask what the necessary condition and the sufficient condition are for x0 to be a (local) minimum of f.
Suppose x0 is a minimum of f; then there must be a neighborhood U ⊂ Ω of x0 satisfying
f(x) − f(x0) ≥ 0, ∀ x ∈ U.
This means that for all h ∈ Rⁿ \ {0}, ∃ ε0 > 0 such that when 0 < |ε| < ε0, x0 + εh ∈ U and
f(x0 + εh) ≥ f(x0), |ε| < ε0.
This implies the one variable function ε ↦ f(x0 + εh) has 0 as its minimum, hence,
\[ \frac{d^2}{d\varepsilon^2} f(x_0 + \varepsilon h) \Big|_{\varepsilon = 0} \ge 0. \]
Namely,
\[ (d^2 f(x_0) h, h) = \sum_{i,j=1}^{n} \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0)\, h_i h_j \ge 0. \]
Therefore, the matrix
\[ d^2 f(x_0) = \Big( \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0) \Big) \]
is positive semi-definite.
Conversely, suppose the matrix d²f(x0) is positive definite, then x0 must be a strict local minimum of f.
3.2 Second order variations
We now return to the discussion of the extremal values of functionals. We have shown that the E-L equation expresses the vanishing of the first order variation; it serves only as a necessary condition for a minimizer and is not sufficient. From both the functional analysis and differential topology points of view, a solution of the E-L equation is only a critical point of the functional. Just like its counterpart in the finite dimensional case, we also need the second order variation to determine whether it is a (local) minimum.
Let L ∈ C²(J × R^N × R^N) and
\[ I(u) = \int_J L(t, u(t), \dot{u}(t))\, dt. \]
We assume that u0 ∈ M is a solution of the E-L equation E_L(u0) = 0 of the functional I. For all ϕ ∈ C^∞₀(J, R^N), let
g(s) = I(u0 + sϕ);
if u0 is a minimizer, the one variable function s ↦ g(s) has 0 as its minimum. We call the following expression
\[ \begin{aligned} \delta^2 I(u_0, \varphi) &= \ddot{g}(0) = \frac{d^2}{ds^2} I(u_0 + s\varphi) \Big|_{s=0} \\ &= \frac{d^2}{ds^2} \int_J L(t, u_0(t) + s\varphi(t), \dot{u}_0(t) + s\dot{\varphi}(t))\, dt \Big|_{s=0} \\ &= \sum_{i,j} \int_J \big[ L_{u_i u_j}(t, u_0(t), \dot{u}_0(t))\, \varphi_i(t) \varphi_j(t) + 2 L_{u_i p_j}(t, u_0(t), \dot{u}_0(t))\, \varphi_i(t) \dot{\varphi}_j(t) \\ &\qquad\qquad + L_{p_i p_j}(t, u_0(t), \dot{u}_0(t))\, \dot{\varphi}_i(t) \dot{\varphi}_j(t) \big]\, dt \end{aligned} \]
the second order variation of I along ϕ at u0.
On one hand, suppose u0 is a minimizer, then g̈(0) ≥ 0, so
\[ \delta^2 I(u_0, \varphi) \ge 0, \quad \forall\, \varphi \in C_0^1(J, R^N). \tag{3.1} \]
On the other hand, suppose u0 ∈ M satisfies the E-L equation and suppose ∃ λ > 0 such that
\[ \delta^2 I(u_0, \varphi) \ge \lambda \int_J \{ |\varphi|^2 + |\dot{\varphi}|^2 \}\, dt, \quad \forall\, \varphi \in C_0^1(J, R^N), \tag{3.2} \]
then u0 must be a strict minimum of I. To see this, consider
\[ g(s) - g(0) = g(s) - g(0) - \dot{g}(0) s = \frac{s^2}{2} \ddot{g}(\theta s) = \frac{s^2}{2} [\ddot{g}(\theta s) - \ddot{g}(0)] + \frac{s^2}{2} \ddot{g}(0), \]
for some θ ∈ (0, 1) depending on ϕ and s. We introduce the following function-valued matrices:
A = (L_{p_i p_j}(t, u, p)),
B = (L_{p_i u_j}(t, u, p)),
C = (L_{u_i u_j}(t, u, p)),
together with their restrictions along the function u0(t):
A_{u0} = (L_{p_i p_j}(t, u0(t), u̇0(t))),
B_{u0} = (L_{p_i u_j}(t, u0(t), u̇0(t))),
C_{u0} = (L_{u_i u_j}(t, u0(t), u̇0(t))).
We then have:
\[ \delta^2 I(u_0, \varphi) = \int_J [(A_{u_0} \dot{\varphi}, \dot{\varphi}) + 2 (B_{u_0} \dot{\varphi}, \varphi) + (C_{u_0} \varphi, \varphi)]\, dt. \]
Since
\[ \ddot{g}(s) = \int_J [(A_{u_0 + s\varphi} \dot{\varphi}, \dot{\varphi}) + 2 (B_{u_0 + s\varphi} \dot{\varphi}, \varphi) + (C_{u_0 + s\varphi} \varphi, \varphi)]\, dt \]
and L ∈ C², for all ‖ϕ‖_{C¹(J)} ≤ 1, as s → 0, we have the uniform estimate:
|A_{u0+sϕ} − A_{u0}| + |B_{u0+sϕ} − B_{u0}| + |C_{u0+sϕ} − C_{u0}| = o(1),
which yields
\[ \ddot{g}(s) - \ddot{g}(0) = o(1) \int_J (|\dot{\varphi}|^2 + |\varphi|^2)\, dt. \]
Hence, ∀ ϕ ∈ C¹₀(J, R^N) with ‖ϕ‖_{C¹(J)} ≤ 1, as long as |s| is sufficiently small, there exists ε < λ such that
\[ I(u_0 + s\varphi) - I(u_0) \ge \frac{s^2}{2} (\lambda - \varepsilon) \int_J (|\dot{\varphi}|^2 + |\varphi|^2)\, dt. \]
Although (3.1) and (3.2) give us the necessary and sufficient conditions respec-
tively on u0 such that it minimizes I, its dependence on the arbitrary function ϕ is
nevertheless unsatisfying. We shall continue our study in the subsequent section.
3.3 The Legendre–Hadamard condition
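In our previous setting, notice that the roles of the three matrices A_{u0}, B_{u0}, and C_{u0} in determining whether u0 is a minimum are not all equal.
In fact, ∀ τ ∈ int(J), ∀ ξ ∈ R^N, ∀ µ > 0 sufficiently small, one may choose v ∈ C¹(R¹) with v(s) = 0 for |s| ≥ 1, satisfying ∫_{R¹} v̇(s)² ds = 1. Let
\[ \varphi(t) = \xi \mu\, v\Big( \frac{t - \tau}{\mu} \Big), \]
then
\[ \dot{\varphi}(t) = \xi\, \dot{v}\Big( \frac{t - \tau}{\mu} \Big). \]
For all µ > 0 sufficiently small, we have:
\[ \begin{aligned} \int_J \dot{\varphi}_i \dot{\varphi}_j\, dt &= \xi_i \xi_j \mu, \\ \int_J \dot{\varphi}_i \varphi_j\, dt &= \xi_i \xi_j \mu^2 \int_{R^1} v(t) \dot{v}(t)\, dt, \\ \int_J \varphi_i \varphi_j\, dt &= \xi_i \xi_j \mu^3 \int_{R^1} v(t)^2\, dt. \end{aligned} \]
Substituting into (3.1) and letting µ → 0, it shows that
\[ \delta^2 I(u_0, \varphi) = \mu (A_{u_0} \xi, \xi) + o(\mu), \]
where A_{u0} is evaluated at t = τ. Since δ²I(u0, ϕ) ≥ 0, dividing by µ and letting µ → 0 gives (A_{u0} ξ, ξ) ≥ 0.
We introduce the Legendre–Hadamard condition as follows:
\[ (A_{u_0} \xi, \xi) = \sum_{i,j=1}^{N} L_{p_i p_j}(\tau, u_0(\tau), \dot{u}_0(\tau))\, \xi^i \xi^j \ge 0, \quad \forall\, \tau \in J,\ \forall\, \xi \in R^N. \tag{3.3} \]
Suppose ∃ λ > 0 such that
\[ \sum_{i,j=1}^{N} L_{p_i p_j}(\tau, u(\tau), \dot{u}(\tau))\, \xi^i \xi^j \ge \lambda |\xi|^2, \quad \forall\, \tau \in J,\ \forall\, \xi \in R^N, \tag{3.4} \]
we then call it the strict Legendre–Hadamard condition.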

Theorem 3.1 Let L ∈ C²(J × R^N × R^N). Suppose u0 ∈ M is a minimizer of I, then the Legendre–Hadamard condition (3.3) holds. Conversely, if u0 ∈ M satisfies the E-L equation, and if there exists λ > 0 such that (3.2) holds, then u0 is a strict minimizer of I.
We have noted that (3.2) involves an arbitrary function ϕ. In order to remove the influence of ϕ, we must establish its relation with the strict Legendre–Hadamard condition (3.4).
It turns out, as seen in the next lemma, that we can remove the |ϕ|² term from the integral on the right-hand side of (3.2).
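Lemma 3.1 (Poincaré) Let ϕ ∈ C¹₀(J, R^N), then we have:
\[ \int_J |\varphi|^2\, dt \le \frac{(t_1 - t_0)^2}{2} \int_J |\dot{\varphi}|^2\, dt. \]
Proof Since
\[ \varphi(t) = \int_{t_0}^{t} \dot{\varphi}(s)\, ds, \]
by the Cauchy–Schwarz inequality, we have:
\[ |\varphi(t)|^2 \le \Big( \int_{t_0}^{t} |\dot{\varphi}(s)|\, ds \Big)^2 \le (t - t_0) \int_J |\dot{\varphi}(s)|^2\, ds. \]
After integrating, it gives that
\[ \int_J |\varphi(t)|^2\, dt \le \frac{(t_1 - t_0)^2}{2} \int_J |\dot{\varphi}|^2\, dt. \]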
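The inequality just proved can be illustrated numerically for one test function. In the sketch below, the interval J = [0, 1], the choice ϕ(t) = sin(π(t − t0)/(t1 − t0)) and the quadrature helper `integrate` are assumptions made for illustration only.

```python
import math

def integrate(f, a, b, n=20000):
    """Midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

t0, t1 = 0.0, 1.0
length = t1 - t0
phi = lambda t: math.sin(math.pi * (t - t0) / length)          # vanishes at t0 and t1
phi_dot = lambda t: math.pi / length * math.cos(math.pi * (t - t0) / length)

lhs = integrate(lambda t: phi(t) ** 2, t0, t1)
rhs = length ** 2 / 2.0 * integrate(lambda t: phi_dot(t) ** 2, t0, t1)
```

For this test function the left-hand side is 1/2 and the right-hand side is π²/4, so the inequality holds with room to spare; the constant (t1 − t0)²/2 obtained from the proof is convenient rather than optimal.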
If we remove the |ϕ|² term from the integral on the right-hand side of (3.2) and replace (3.2) by
\[ \delta^2 I(u, \varphi) \ge \lambda \int_J |\dot{\varphi}|^2\, dt, \quad \forall\, \varphi \in C_0^1(J, R^N), \tag*{(3.2)$'$} \]
for some λ > 0, then Theorem 3.1 remains valid.

Example 3.1 Let L = √(1 + p²) and M = {u ∈ C¹([0, b]) | u(0) = u(b) = 0}. The E-L equation of the functional
\[ I(u) = \int_0^b \sqrt{1 + \dot{u}^2(t)}\, dt \]
is given by
\[ \frac{d}{dt} \frac{\dot{u}}{\sqrt{1 + \dot{u}^2}} = 0, \]
which has a solution u = 0 ∈ M.
Since
\[ L_{uu} = L_{up} = L_{pu} = 0, \qquad L_{pp} = \frac{1}{(1 + p^2)^{3/2}}, \]
we have
\[ \delta^2 I(0, \varphi) = \int_0^b \dot{\varphi}^2\, dt. \]
By Theorem 3.1, u = 0 is a minimum.
On one hand, from the second order variation, it is not difficult to see that if the matrix
\[ \begin{pmatrix} A_{u_0} & B_{u_0} \\ B_{u_0} & C_{u_0} \end{pmatrix} \]
is positive definite, then the solution u0 of the E-L equation must be a minimum. However, from the next example, we see that the positive definiteness of the matrix is not necessary for u0 to be a minimum.
Example 3.2 Let I(u) = ∫_J (u̇(t)² − u(t)²) dt, then for u = 0,
\[ \begin{pmatrix} A & B \\ B & C \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \]
is not positive semi-definite.
However, when |J| = t1 − t0 is sufficiently small, from Poincaré's inequality, we still have:
\[ \delta^2 I(0, \varphi) = \int_J (\dot{\varphi}^2 - \varphi^2)\, dt \ge \Big( 1 - \frac{(t_1 - t_0)^2}{2} \Big) \int_J \dot{\varphi}^2\, dt. \]
So u = 0 is still the minimum.

On the other hand, it is not difficult to check (3.4) is not a sufficient
condition for u0 to be a minimum. In the following, we will investigate the addi-
tional requirement needed for (3.4) to be sufficient.
3.4 The Jacobi field
In this section, we introduce the notion of a Jacobi field.
Let L ∈ C³ and assume u0 is a solution of the E-L equation. Along u0, we define
\[ \Phi_{u_0}(t, \xi, \eta) = (A_{u_0} \eta, \eta) + 2 (B_{u_0} \xi, \eta) + (C_{u_0} \xi, \xi), \quad \forall\, (\xi, \eta) \in R^N \times R^N \]
to be the (accessory) Lagrangian.
Suppose u0 is a minimum; we examine the following integral based on the accessory Lagrangian:
\[ Q_{u_0}(\varphi) = \int_J \Phi_{u_0}(t, \varphi(t), \dot{\varphi}(t))\, dt, \quad \forall\, \varphi \in C_0^1(J, R^N). \]
Since Q_{u0}(ϕ) = δ²I(u0, ϕ) ≥ 0, ∀ ϕ ∈ C¹₀(J, R^N) and Q_{u0}(θ) = 0, the zero function θ must be a minimum of Q_{u0}.
We extend the domain of the functional Q_{u0} to Lip₀(J, R^N) (Lipschitz functions with vanishing boundary values); the integral form of its E-L equation reads
\[ A_{u_0} \dot{\varphi}(t) + B_{u_0}^{\top} \varphi(t) - \int_{t_0}^{t} \big( B_{u_0} \dot{\varphi}(s) + C_{u_0} \varphi(s) \big)\, ds = \text{const.} \]
If L along u0 satisfies the strict Legendre–Hadamard condition, i.e. A_{u0} is positive definite, then using the above integral form of the E-L equation, the solution ϕ ∈ C²(J, R^N). Furthermore, ϕ must satisfy the homogeneous second order ordinary differential equation:
\[ J_{u_0}(\varphi) = \frac{d}{dt} [A_{u_0} \dot{\varphi}(t) + B_{u_0}^{\top}(t) \varphi] - [B_{u_0} \dot{\varphi}(t) + C_{u_0} \varphi] = 0, \quad \forall\, t \in J. \]
We call this equation the Jacobi equation and the operator J_{u0} the Jacobi operator along u0 (a solution of the E-L equation).
The Jacobi operator is a linear ordinary differential operator of second order, and it plays a similar role in variational problems as that of the Hessian matrix in extremal problems of functions.
We call a C²-solution of the Jacobi equation a Jacobi field along the orbit u0(t). All Jacobi fields together constitute a linear space of dimension 2N.
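Theorem 3.2 If $\varphi_0$ is a Jacobi field along $u_0$, then $Q_{u_0}(\varphi_0) = 0$. Conversely, if $\varphi_0 \in \mathrm{Lip}_0(J,\mathbb{R}^N)$ satisfies $Q_{u_0}(\varphi_0) = 0$ and $Q_{u_0}(\varphi) \ge 0$ for all $\varphi \in C_0^1(J,\mathbb{R}^N)$, then $\varphi_0$ is a Jacobi field along $u_0$.
Proof "⇒" Since $\Phi_{u_0}$ is homogeneous of degree two with respect to $(\xi,\eta)$, by Euler's identity, we have:
$$2\Phi_{u_0}(t,\xi,\eta) = (\Phi_{u_0})_\xi(t,\xi,\eta)\,\xi + (\Phi_{u_0})_\eta(t,\xi,\eta)\,\eta.$$
Suppose $\varphi_0 \in C_0^1([a,b],\mathbb{R}^N)$, $[a,b]\subset \mathrm{int}(J)$, is a Jacobi field along $u_0$. Integrating by parts and using $\varphi_0(a)=\varphi_0(b)=0$, we obtain
$$\begin{aligned}
2\int_a^b \Phi_{u_0}(t,\varphi_0(t),\dot\varphi_0(t))\,dt
&= \int_a^b \big[(\Phi_{u_0})_\xi(t,\varphi_0(t),\dot\varphi_0(t))\,\varphi_0(t) + (\Phi_{u_0})_\eta(t,\varphi_0(t),\dot\varphi_0(t))\,\dot\varphi_0(t)\big]\,dt \\
&= \int_a^b \Big[(\Phi_{u_0})_\xi(t,\varphi_0(t),\dot\varphi_0(t)) - \frac{d}{dt}(\Phi_{u_0})_\eta(t,\varphi_0(t),\dot\varphi_0(t))\Big]\varphi_0(t)\,dt \\
&= -2\int_a^b J_{u_0}(\varphi_0)\cdot\varphi_0\,dt = 0.
\end{aligned}$$
Since $a$ and $b$ are arbitrary, it follows that $Q_{u_0}(\varphi_0) = 0$.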
“⇐” Using smooth function approximation, we have:
Qu0 (ϕ) ≥ 0, ∀ ϕ ∈ Lip0 (J, RN ).
Thus, ϕ0 is a minimum of Qu0 . From our previous argument, it must satisfy the
integral form of the E-L equation, hence it also satisfies the differential form of
the E-L equation Ju0 (ϕ0 ) = 0.
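Lemma 3.2 Given a sufficiently smooth Lagrangian $L$, suppose it satisfies the strict Legendre–Hadamard condition along a solution $u_0$ of the E-L equation, namely, $A_{u_0}$ is positive definite. Suppose there exists $\mu > 0$ such that
$$Q_{u_0}(\varphi) \ge \mu \int_J |\varphi|^2\,dt,$$
then there exists $\lambda > 0$ such that
$$Q_{u_0}(\varphi) \ge \lambda \int_J (|\dot\varphi|^2 + |\varphi|^2)\,dt.$$
Consequently, $u_0$ is a strict minimum of the functional
$$I(u) = \int_J L(t,u(t),\dot u(t))\,dt.$$
Proof For any two continuous functions $\phi, \psi$ on $J$, let $\langle\phi,\psi\rangle = \int_J \phi(t)\psi(t)\,dt$.
Since $A_{u_0}$ is positive definite, there exists $\alpha > 0$ such that
$$\langle A_{u_0}\dot\varphi,\dot\varphi\rangle \ge \alpha\int_J |\dot\varphi|^2\,dt.$$
From
$$Q_{u_0}(\varphi) = \langle A_{u_0}\dot\varphi,\dot\varphi\rangle + 2\langle B_{u_0}\dot\varphi,\varphi\rangle + \langle C_{u_0}\varphi,\varphi\rangle,$$
we can find two positive constants $C_1$ and $C_2$ such that
$$\begin{aligned}
\alpha\int_J |\dot\varphi|^2\,dt
&\le Q_{u_0}(\varphi) + C_1\Big[\Big(\int_J |\dot\varphi|^2\,dt\Big)^{1/2}\Big(\int_J |\varphi|^2\,dt\Big)^{1/2} + \int_J |\varphi|^2\,dt\Big] \\
&\le \frac{\alpha}{2}\int_J |\dot\varphi|^2\,dt + Q_{u_0}(\varphi) + C_2\int_J |\varphi|^2\,dt.
\end{aligned}$$
Using the assumption
$$\int_J |\varphi|^2\,dt \le \mu^{-1} Q_{u_0}(\varphi),$$
we find that
$$\int_J |\dot\varphi|^2\,dt \le \frac{2}{\alpha}\,(1 + C_2\mu^{-1})\,Q_{u_0}(\varphi).$$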
According to Theorem 3.1 and Poincaré’s inequality, u0 is a strict minimum of I.

3.5 Conjugate points
Definition 3.1 (Conjugate points) Let $u_0$ be a solution of the E-L equation of the functional $I(u) = \int_J L(t,u(t),\dot u(t))\,dt$. We call $(a,u_0(a))$ and $(b,u_0(b))$ a pair of conjugate points along the orbit $(t,u_0(t))$, if there exists a nonzero Jacobi field $\varphi \in C_0^1([a,b],\mathbb{R}^N)$ along $u_0(t)$ (see Figure 3.1).
Fig. 3.1
Sometimes, if there are no conjugate points on the orbit {(t, u0 (t)) | t ∈

(t0 , t1 )}, we simply say that u0 has no conjugate points.
Example 3.3 Given the metric
$$e(x,y)\,dx^2 + 2f(x,y)\,dx\,dy + g(x,y)\,dy^2$$
on a surface $S$ in $\mathbb{R}^3$. We choose a geodesic $\gamma$ in $S$ and without loss of generality, we may assume that it is the $x$-axis ($y = 0$) and the curves $x = \text{const}$ are perpendicular to $\gamma$. We furnish $S$ with an orthonormal frame, under which the square of the line element of the curve $y = u(x)$ is given by
$$ds^2 = e(x,y)\,dx^2 + dy^2,$$
where $e > 0$, $e(x,0) = 1$, and $e_y(x,0) = 0$. The arclength functional is
$$I(u) = \int_a^b \sqrt{e(x,u) + \dot u^2}\,dx,$$
i.e.
$$L(x,u,p) = \sqrt{e(x,u) + p^2}.$$
Hence,
$$L_{pp} = \frac{e}{(e+p^2)^{3/2}}, \quad L_{up} = L_{pu} = -\frac{p\,e_u}{2(e+p^2)^{3/2}}, \quad L_{uu} = \frac{2e_{uu}(e+p^2) - e_u^2}{4(e+p^2)^{3/2}}.$$
Along the geodesic $\gamma$: $y = 0$ (where $p = 0$, $e = 1$ and $e_u = 0$), we have:
$$\begin{pmatrix} A & B \\ B & C \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & \tfrac12 e_{uu} \end{pmatrix}.$$
In differential geometry, we call the quantity
$$K(x) = -\frac12 e_{uu}(x,0)$$
the Gaussian curvature. The accessory variational integral is
$$Q_0(\varphi) = \frac12\int_a^b [\dot\varphi^2 - K(x)\varphi^2]\,dx.$$
The Jacobi operator is then
$$J_0(\varphi) = \ddot\varphi + K\varphi.$$
When $K$ is a constant, the Jacobi field with $\varphi(0) = 0$, $\dot\varphi(0) = 1$ is of the form
$$\varphi(t) = \begin{cases}
\dfrac{1}{\sqrt{-K}}\sinh(\sqrt{-K}\,t), & K < 0, \\[1ex]
t, & K = 0, \\[1ex]
\dfrac{1}{\sqrt{K}}\sin(\sqrt{K}\,t), & K > 0.
\end{cases}$$
It follows that if $K \le 0$, then there are no conjugate points. However, when $K > 0$, the first conjugate point of $(0,0)$ along $\gamma$ is $(\pi/\sqrt{K}, 0)$.
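The location of the first conjugate point in Example 3.3 can be checked by integrating the Jacobi equation numerically. The sketch below is an illustration only, assuming constant $K$; the function name `first_conjugate_time`, the step size and the cut-off `t_max` are choices made here, not taken from the text. It uses a classical fourth-order Runge–Kutta step and reports the first sign change of $\varphi$.

```python
import math

def first_conjugate_time(K, dt=1e-3, t_max=20.0):
    """Integrate phi'' + K*phi = 0 with phi(0) = 0, phi'(0) = 1 by RK4 and
    return the first t > 0 where phi changes sign (None if none before t_max)."""
    def rhs(phi, v):
        # First-order system: phi' = v, v' = -K * phi
        return v, -K * phi

    t, phi, v = 0.0, 0.0, 1.0
    while t < t_max:
        k1p, k1v = rhs(phi, v)
        k2p, k2v = rhs(phi + 0.5 * dt * k1p, v + 0.5 * dt * k1v)
        k3p, k3v = rhs(phi + 0.5 * dt * k2p, v + 0.5 * dt * k2v)
        k4p, k4v = rhs(phi + dt * k3p, v + dt * k3v)
        new_phi = phi + dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6.0
        new_v = v + dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
        if t > 0.0 and phi > 0.0 and new_phi <= 0.0:
            # Linear interpolation of the zero inside the last step.
            return t + dt * phi / (phi - new_phi)
        t, phi, v = t + dt, new_phi, new_v
    return None

zero_positive_curvature = first_conjugate_time(4.0)   # compare with pi / sqrt(4)
zero_flat = first_conjugate_time(0.0)                 # no zero is found
zero_negative_curvature = first_conjugate_time(-1.0)  # no zero is found
```

For $K = 4$ the computed zero should agree with $\pi/\sqrt{K} = \pi/2$ up to discretization error, while for $K \le 0$ the search returns `None` on the window examined, which is consistent with (but does not prove) the absence of conjugate points.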
Remark 3.1 For a general Riemannian manifold $(M,g)$, where $g$ is a Riemannian metric on $M$, the Lagrangian of a geodesic is
$$L(u,p) = \sum_{i,j} g_{ij}(u)\,p^i p^j,$$
and the corresponding Jacobi equation is
$$\frac{d^2\varphi}{dt^2} + R(\dot u(t),\varphi(t))\,\dot u(t) = 0,$$
where $R(\cdot,\cdot)$ denotes the Riemann curvature operator.
Theorem 3.3 Suppose $u_0$ is a solution of the E-L equation of the functional $I(u) = \int_J L(t,u(t),\dot u(t))\,dt$ and $A_{u_0}$ is positive definite. If $\delta^2 I(u_0,\varphi) \ge 0$ for all $\varphi \in C_0^1(J,\mathbb{R}^N)$, then there is no $a \in (t_0,t_1)$ such that $(a,u_0(a))$ is conjugate to $(t_0,u_0(t_0))$.
Proof Suppose not. Then there exists $a \in (t_0,t_1)$ such that $(a,u_0(a))$ and $(t_0,u_0(t_0))$ are conjugate points, i.e. there exists a nonzero Jacobi field $\xi \in C^2([t_0,a],\mathbb{R}^N)$ along $u_0(t)$ satisfying $J_{u_0}(\xi) = 0$ and $\xi(t_0) = \xi(a) = 0$. Let
$$\tilde\xi(t) = \begin{cases} \xi(t), & t \in [t_0,a], \\ 0, & t \in [a,t_1], \end{cases}$$
then $\tilde\xi \in \mathrm{Lip}(J,\mathbb{R}^N)$ with $\tilde\xi(t_0) = \tilde\xi(t_1) = 0$ and
$$Q_{u_0}(\tilde\xi) = \int_{t_0}^{a} \Phi_{u_0}(t,\xi(t),\dot\xi(t))\,dt = 0.$$
By Theorem 3.2, $\tilde\xi \in C^2(J,\mathbb{R}^N)$ must satisfy the Jacobi equation $J_{u_0}(\tilde\xi) = 0$. Since $\tilde\xi$ vanishes on $[a,t_1]$, the uniqueness of the solution of the initial value problem of a second order ordinary differential equation gives $\tilde\xi \equiv 0$, which contradicts $\xi \ne 0$. This completes the proof.
For the special case $N = 1$, we can show that the converse of the above theorem is also true.
We shall henceforth assume $u_0$ is a solution of the E-L equation. Notice that if $u_0$ has no conjugate points on $(t_0,t_1]$, then there exists a positive Jacobi field $\psi(t) > 0$, $\forall\, t \in J$.
To see this, suppose $\lambda$ is a Jacobi field with the initial conditions $\lambda(t_0) = 0$ and $\dot\lambda(t_0) = 1$. By assumption, its next root $a$ satisfies $a > t_1$. Since the solution of an ordinary differential equation depends continuously on the initial values, there exist $\varepsilon > 0$ and a Jacobi field $\psi$ such that $\psi(t_0 - \varepsilon) = 0$, $\dot\psi(t_0 - \varepsilon) = 1$, and $\psi(t) > 0$, $\forall\, t \in J$.
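Lemma 3.3 Suppose $\psi(t) > 0$, $\forall\, t \in J$, is a Jacobi field along $u_0$. Then for all $\varphi \in C_0^1(J)$, we have:
$$Q_{u_0}(\varphi) = \int_J A_{u_0}(t)\,\psi^2(t)\Big(\Big(\frac{\varphi}{\psi}\Big)'(t)\Big)^2 dt.$$
Proof Let $\lambda = \varphi/\psi$, then $\varphi = \lambda\psi$ and $\varphi' = \lambda'\psi + \lambda\psi'$. Hence,
$$A_{u_0}\varphi'^2 + 2B_{u_0}\varphi'\varphi + C_{u_0}\varphi^2 = \lambda^2(A_{u_0}\psi'^2 + 2B_{u_0}\psi'\psi + C_{u_0}\psi^2) + 2\lambda'\lambda\psi(A_{u_0}\psi' + B_{u_0}\psi) + \lambda'^2 A_{u_0}\psi^2.$$
Now since $\psi$ satisfies the Jacobi equation and $\lambda(t_0) = \lambda(t_1) = 0$, we have:
$$\begin{aligned}
\int_J (A_{u_0}\varphi'^2 + 2B_{u_0}\varphi'\varphi + C_{u_0}\varphi^2)\,dt
&= \int_J \Big[\psi'\lambda^2(A_{u_0}\psi' + B_{u_0}\psi) + \psi\lambda^2\,\frac{d(A_{u_0}\psi' + B_{u_0}\psi)}{dt} \\
&\qquad\quad + 2\lambda'\lambda\psi(A_{u_0}\psi' + B_{u_0}\psi) + A_{u_0}\lambda'^2\psi^2\Big]\,dt \\
&= \int_J A_{u_0}\lambda'^2\psi^2\,dt + \psi\lambda^2(A_{u_0}\psi' + B_{u_0}\psi)\Big|_{t_0}^{t_1} \\
&= \int_J A_{u_0}\lambda'^2\psi^2\,dt.
\end{aligned}$$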
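The identity of Lemma 3.3 can be sanity-checked on a concrete case. The sketch below is an illustration under assumptions chosen here, not an example from the notes: take $A = 1$, $B = 0$, $C = -1$ on $J = [0,1]$, so the Jacobi equation is $\ddot\psi + \psi = 0$ and $\psi = \cos t$ is a positive Jacobi field on $J$; the test function is $\varphi(t) = t(1-t)$. Both sides are approximated with a midpoint rule (helper name `midpoint` is hypothetical).

```python
import math

def midpoint(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

def phi(t):
    return t * (1.0 - t)          # vanishes at t = 0 and t = 1

def dphi(t):
    return 1.0 - 2.0 * t

def dquot(t):
    # Derivative of phi / psi with psi = cos t, by the quotient rule.
    return (dphi(t) * math.cos(t) + phi(t) * math.sin(t)) / math.cos(t) ** 2

# Left side: Q(phi) = integral of (phi'^2 - phi^2).
lhs = midpoint(lambda t: dphi(t) ** 2 - phi(t) ** 2, 0.0, 1.0)
# Right side: integral of A * psi^2 * ((phi/psi)')^2 with A = 1.
rhs = midpoint(lambda t: math.cos(t) ** 2 * dquot(t) ** 2, 0.0, 1.0)
```

Here the left side can also be computed by hand: $\int_0^1 (1-2t)^2\,dt - \int_0^1 t^2(1-t)^2\,dt = \tfrac13 - \tfrac1{30} = 0.3$, and the two numerical values should agree with it up to quadrature error.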

Theorem 3.4 Let N = 1 and assume u0 ∈ C 1 (J) is a solution of the E-L equa-
tion. If ∃ λ > 0 such that Au0 (t) ≥ λ, ∀ t ∈ J and there exists a Jacobi field
ψ > 0 on J, then u0 is a strict minimum.
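Proof Denote
$$\alpha = \inf_J \big(A_{u_0}(t)\,\psi^2(t)\big) > 0.$$
For all $\varphi \in C_0^1(J)$, we use Lemma 3.3 and Poincaré's inequality to obtain:
$$\begin{aligned}
Q_{u_0}(\varphi) &= \int_J A_{u_0}\psi^2\Big(\Big(\frac{\varphi}{\psi}\Big)'\Big)^2 dt \\
&\ge \alpha\int_J \Big(\Big(\frac{\varphi}{\psi}\Big)'\Big)^2 dt \\
&\ge \alpha\,\frac{1}{|J|^2}\int_J \Big(\frac{\varphi}{\psi}\Big)^2 dt \\
&\ge \alpha\,\inf_J\frac{1}{\psi^2}\,\frac{1}{|J|^2}\int_J \varphi^2\,dt.
\end{aligned}$$
Thus, there exists $\mu > 0$ such that
$$Q_{u_0}(\varphi) \ge \mu\int_J |\varphi|^2\,dt.$$
The assertion now follows from Lemma 3.2.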
Example 3.4 Let $M = \{u \in C^1([0,1]) \mid u(0) = a,\ u(1) = b\}$ and consider the functional
$$I(u) = \int_0^1 (t\dot u + \dot u^2)\,dt.$$
Since $L_u = 0$, $L_p = 2p + t$, and $L_{pp} = 2$, its E-L equation
$$2\ddot u(t) + 1 = 0$$
has the solution
$$u(t) = -\frac{t^2}{4} + \Big(b - a + \frac14\Big)t + a$$
satisfying the boundary conditions. The accessory variational integral is
$$Q_u(\varphi) = \int_0^1 \dot\varphi^2\,dt,$$
with the corresponding Jacobi equation
$$\ddot\varphi = 0.$$
Using the initial conditions $(\varphi(0),\dot\varphi(0)) = (0,1)$, we see that $\varphi = t$. This Jacobi field has no conjugate point and hence $u$ is a strict minimum.
Exercises
1. Find the minimum of each of the following functionals:

(1) $I(u) = \int_0^1 (t\dot u + \dot u^2)\,dt$, $\quad M = C_0^1(0,1)$.
(2) $I(u) = \int_0^1 u\sqrt{1 + \dot u^2}\,dt$, $\quad M = \{v \in C_0^1[a,b] \mid v(a) = \cosh a,\ v(b) = \cosh b\}$, for $0 < a < b$.
(3) $I(u) = \int_0^1 (u^2 + \dot u^2)\,dt$, $\quad M = \{v \in C^1[0,b] \mid v(0) = 0,\ v(b) = B\}$.
2. Assume that $\varphi$ is an absolutely continuous function on $[a,b]$ whose a.e. derivative $\varphi'$ is square integrable on $[a,b]$. If $\varphi(a) = 0$, prove the following Poincaré inequality:
$$\int_a^b \varphi^2(x)\,dx \le \frac{(b-a)^2}{2}\int_a^b [\varphi'(x)]^2\,dx.$$
3. In $\mathbb{R}^3$, consider the surface of revolution generated by revolving the curve $r = r(z) > 0$ about the $z$-axis:
$$S:\ r = \sqrt{x^2 + y^2}.$$
(1) Find the metric on $S$.
(2) Write the equation of geodesics in $S$.
(3) First write the geodesic equations for the cylinder $r = \text{const}$ and the circular cone $r = z$ respectively, then determine whether they are minima.
4. Assume L = L(t, u, p) is a differentiable function which is bounded from
below. Furthermore, assume L is strictly convex with respect to (u, p). Let
M = C01 (J). Show that the solution u ∈ M of the E-L equation must be a
strict minimum of the functional.

Lecture 4
Strong minima and extremal fields
4.1 Strong minima and weak minima
Similar to a minimum of a given function, a minimum of a given functional

is also a local minimum. The notion of ‘local’ is determined by neighborhoods.
In calculus of variations, the space M is often an infinite dimensional function
space and an infinite dimensional space is usually equipped with many distinct
topologies; it is therefore crucial to specify what topology we are considering in
our investigations.
Definition 4.1 Let $J = [t_0,t_1]$. A function $u \in C^1(J,\mathbb{R}^N)$ is called a strong (weak) minimum of the functional
$$I = \int_J L(t,u(t),\dot u(t))\,dt$$
if there exists $\varepsilon > 0$ such that for all $\varphi \in C_0^1(J,\mathbb{R}^N)$ with
$$\|\varphi\|_{C^0(J)} < \varepsilon \quad (\|\varphi\|_{C^1(J)} < \varepsilon),$$
we have:
$$I(u + \varphi) \ge I(u).$$
The $C^1$ requirement can be replaced by Lip, using the Lip-norm instead of the $C^1$-norm. When there is no risk of confusion, we still call this a weak minimum. The notion of a minimum used in Lecture 3 agrees with that of a weak local minimum (cf. §3.2).
It is clear that a strong minimum is always a weak minimum and a weak min-
imum in the sense of Lipschitz is also a weak minimum under the C 1 -topology.
We have the following example.
Example 4.1 Let
$$I(u) = \int_0^1 (u'^2 + u'^3)\,dx$$
and
$$M = \{u \in \mathrm{Lip}([0,1],\mathbb{R}^1) \mid u(0) = u(1) = 0\},$$
then $u = 0$ is a weak minimum.
In fact, for $\|u\|_{\mathrm{Lip}} < \frac12$, we have:
$$I(u) - I(0) = \int_0^1 (u'^2 + u'^3)\,dx \ge \frac12\int_0^1 u'^2\,dx \ge 0.$$
Moreover, from
$$\delta^2 I(0,\varphi) = \int_0^1 \dot\varphi^2\,dt, \quad \forall\,\varphi \in C_0^1([0,1])$$
and Poincaré's inequality, we see that $u = 0$ must be a strict (weak) local minimum.
On the other hand, we claim $u = 0$ is not a strong minimum. For any $h$ with $0 < h < 1 - h^2$, let
$$u_h(x) = \begin{cases} -\dfrac{x}{h}, & x \in [0,h^2], \\[1ex] \dfrac{h(x-1)}{1-h^2}, & x \in [h^2,1], \end{cases}$$
then
$$(u_h'^2 + u_h'^3)(x) = \begin{cases} \dfrac{1}{h^2} - \dfrac{1}{h^3} \le -\dfrac{1}{2h^3}, & x \in [0,h^2], \\[1ex] \Big(\dfrac{h}{1-h^2}\Big)^2 + \Big(\dfrac{h}{1-h^2}\Big)^3 \le 2, & x \in [h^2,1]. \end{cases}$$
Thus, $\|u_h\|_{C^0} \le h$ and
$$I(u_h) - I(0) = I(u_h) \le 2 - \frac{1}{2h} \longrightarrow -\infty \quad (h \to 0).$$
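The behaviour of $I(u_h)$ in Example 4.1 is easy to tabulate, because $u_h$ is piecewise linear and the integrand depends only on the slope. The following sketch (function name `I_of_uh` and the sample values of $h$ are choices made here for illustration) evaluates $I(u_h)$ exactly on each piece and compares it with the bound $2 - \frac{1}{2h}$ stated above.

```python
def I_of_uh(h):
    """Value of I(u_h) for the piecewise linear u_h of Example 4.1:
    slope -1/h on [0, h^2] and slope h/(1 - h^2) on [h^2, 1]."""
    s1 = -1.0 / h
    s2 = h / (1.0 - h * h)
    left = h * h * (s1 ** 2 + s1 ** 3)             # contribution of [0, h^2]
    right = (1.0 - h * h) * (s2 ** 2 + s2 ** 3)    # contribution of [h^2, 1]
    return left + right

# Sample values of h (all satisfy 0 < h < 1 - h^2); the sup norm of u_h equals h.
samples = (0.2, 0.1, 0.05, 0.01)
values = {h: I_of_uh(h) for h in samples}
bounds = {h: 2.0 - 1.0 / (2.0 * h) for h in samples}
```

The left piece contributes exactly $1 - 1/h$, so `values[h]` decreases without bound as $h \to 0$ even though $\|u_h\|_{C^0} = h \to 0$, which is the mechanism behind the failure of the strong minimum.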
4.2 A necessary condition for strong minimal value and the Weierstrass excess function
Let L ∈ C 1 (J × RN × RN ) be a Lagrangian and suppose u is a solution of

the E-L equation of the functional
Z
I(u) = L(t, u(t), u̇(t)) dt,
J
we seek a necessary condition for u to be a strong minimum of I. On that note,
we compare I(u) to the values of I in a C 0 -neighborhood of u.
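Suppose $u$ is a strong minimum. Then there exists $\varepsilon > 0$ such that for all $\varphi \in C_0^1(J,\mathbb{R}^N)$ with $\|\varphi\|_{C^0} < \varepsilon$,
$$I(u + \varphi) \ge I(u).$$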
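We now construct such a function $\varphi$. For any $\xi \in \mathbb{R}^N$ and any $\tau \in (t_0,t_1)$, we choose $\lambda > 0$ sufficiently small so that $[\tau - \lambda^2, \tau + \lambda] \subset (t_0,t_1)$. Let
$$\psi_\lambda(s) = \begin{cases} 0, & s \in (-\infty,-\lambda^2] \cup [\lambda,\infty), \\ \lambda^2 + s, & s \in [-\lambda^2,0], \\ \lambda^2 - \lambda s, & s \in [0,\lambda], \end{cases}$$
then
$$\psi_\lambda'(s) = \begin{cases} 0, & s \in (-\infty,-\lambda^2] \cup [\lambda,\infty), \\ 1, & s \in [-\lambda^2,0], \\ -\lambda, & s \in [0,\lambda]. \end{cases}$$
We define
$$\varphi_\lambda(t) = \xi\,\psi_\lambda(t - \tau).$$
It is easy to check that $\|\varphi_\lambda\|_{C^0} = O(\lambda^2)$, $\|\dot\varphi_\lambda\|_{C^0} = \|\xi\|_{\mathbb{R}^N}$, and $\|\varphi_\lambda\|_{C^1([\tau,\tau+\lambda])} = O(\lambda)$. In particular, if we choose $\varphi = \varphi_\lambda$, then
$$I(u + \varphi_\lambda) - I(u) \ge 0.$$
It follows from the E-L equation that
$$\int_J F(t)\,dt := \int_J \big\{L(t,u(t)+\varphi_\lambda(t),\dot u(t)+\dot\varphi_\lambda(t)) - L(t,u(t),\dot u(t)) - \varphi_\lambda(t)L_u(t,u(t),\dot u(t)) - \dot\varphi_\lambda(t)L_p(t,u(t),\dot u(t))\big\}\,dt \ge 0$$
and
$$\int_J F(t)\,dt = \Big(\int_{t_0}^{\tau-\lambda^2} + \int_{\tau-\lambda^2}^{\tau} + \int_{\tau}^{\tau+\lambda} + \int_{\tau+\lambda}^{t_1}\Big)F(s)\,ds.$$
Note the first and fourth integrals are both equal to zero, whereas the integrand of the third integral is $o(\lambda)$, whence
$$\lim_{\lambda\to 0}\frac{1}{\lambda^2}\int_\tau^{\tau+\lambda} F(s)\,ds = 0.$$
Lastly, the second integral yields:
$$\lim_{\lambda\to 0}\frac{1}{\lambda^2}\int_{\tau-\lambda^2}^{\tau} F(s)\,ds = L(\tau,u(\tau),\dot u(\tau)+\xi) - L(\tau,u(\tau),\dot u(\tau)) - \xi L_p(\tau,u(\tau),\dot u(\tau)).$$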
Fig. 4.1
We call the following
$$E_L(t,u,p,q) = L(t,u,q) - L(t,u,p) - (q-p)\cdot L_p(t,u,p)$$
the Weierstrass excess function. Figure 4.1 illustrates its geometric meaning.
As seen in the graph, $t_0 \in J$, $u_0 = u(t_0)$, $p_0 = \dot u(t_0)$. For $f_{t_0}(p) = L(t_0,u_0,p)$, $E_L(t_0,u_0,p_0,q)$ is the difference between the value of $f_{t_0}$ at $p = q$ and the value of the tangent line $f_{t_0}(p_0) + (q-p_0)\frac{d}{dp}f_{t_0}(p_0)$ there, or simply, the difference between the curve and its tangent line.
In summary, we have:
Theorem 4.1 Suppose u ∈ C 1 (J, RN ) is a strong minimum of I, then
EL (t, u(t), u̇(t), u̇(t) + ξ) ≥ 0, ∀ ξ ∈ RN , ∀ t ∈ J. (4.1)
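As a quick illustration of condition (4.1), consider the Lagrangian $L(t,u,p) = \frac12|p|^2$ (chosen here only as an example). A direct computation from the definition of the excess function gives a perfect square, so the necessary condition holds for every $\xi$:

```latex
\begin{aligned}
E_L(t,u,p,q) &= \tfrac12|q|^2 - \tfrac12|p|^2 - (q-p)\cdot p
              = \tfrac12|q-p|^2, \\
E_L(t,u(t),\dot u(t),\dot u(t)+\xi) &= \tfrac12|\xi|^2 \;\ge\; 0,
\qquad \forall\,\xi\in\mathbb{R}^N .
\end{aligned}
```

More generally, whenever $p \mapsto L(t,u,p)$ is convex, the graph lies above each of its tangent hyperplanes, so $E_L \ge 0$.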
4.3 Extremal fields and strong minima
In this section, we turn our attention to the sufficient condition of a strong

minimum. Given a function u, we compare u with nearby C 0 functions sharing
common endpoints.
Conventionally, we call the graph $\gamma_0 = \{(t,u_0(t)) \in J\times\mathbb{R}^N \mid t \in J\}$ of $u_0$, a solution of the E-L equation, an extremal curve. We will embed the extremal curve $\gamma_0$ into its nearby extremal curves.
Let $J = [t_0,t_1]$, let $L \in C^1(J\times\mathbb{R}^N\times\mathbb{R}^N)$ be a Lagrangian with functional
$$I(u) = \int_J L(t,u(t),\dot u(t))\,dt,$$
and let $u_0$ be a solution of the E-L equation of $I$.

Fig. 4.2
Suppose we can extend $u_0$ to a larger interval $J_1 = (a,b) \supset J$ and suppose $\{(t,u(t,\alpha)) \mid t \in J_1,\ \alpha \in B_{\varepsilon_1}(\theta) \subset \mathbb{R}^N\}$, $\varepsilon_1 > 0$, is a family of sufficiently smooth extremal curves of $I$.
Definition 4.2 (A field of extremals) Let $\Omega$ be a simply connected neighborhood of $\{(t,u(t,\alpha)) \mid t \in J_1,\ \alpha \in B_\varepsilon(\theta) \subset \mathbb{R}^N\}$ ($\varepsilon \in (0,\varepsilon_1)$). If $\psi \in C^1(\Omega,\mathbb{R}^N)$ is a vector field which satisfies:
1. every solution of $\dot u(t) = \psi(t,u(t))$ is a solution of the E-L equation of $I$;
2. $\det(\partial_{\alpha_i}u_j(t,\alpha)) \ne 0$;
3. $\forall\,(t_1,u_1) \in \Omega$, $\exists$ a unique $\alpha_1 \in B_{\varepsilon_1}(\theta)$ such that $u(t_1,\alpha_1) = u_1$;
4. $u(t,0) = u_0(t)$,
then $\Omega$ is said to be a field of extremals and $\psi$ is said to be its directional field (flow) (see Figure 4.2).
Example 4.2 Let $L = \frac12 p^2$, then the E-L equation $\ddot u = 0$ has solutions $u_\lambda = mt + \lambda$. Hence, $\Omega = \{(t,mt+\lambda) \mid (t,\lambda) \in \mathbb{R}^1\times\mathbb{R}^1\}$ and $\psi(t,u) = m$ is a field of extremals and its directional field, where $m$ is a constant (see Figure 4.3).
Example 4.3 Let $L = \sqrt{1+p^2}$, then the E-L equation $\ddot u = 0$ has solutions $u_\lambda = \lambda t$. Hence, $\Omega = \{(t,\lambda t) \mid (t,\lambda) \in (t_0,\infty)\times\mathbb{R}^1\}$. When $t_0 > 0$, $\Omega$ and $\psi(t,u) = \frac{u}{t}$ is a field of extremals and its directional field (see Figure 4.4).
Fig. 4.3
Fig. 4.4
Example 4.4 Let $L = \frac12(p^2 - u^2)$, then the E-L equation $\ddot u = -u$ has solutions $u_\lambda = \sin(t+\lambda)$, $\forall\,\lambda \in \mathbb{R}^1$. For any open interval $J$, let $\Omega = \{(t,\sin(t+\lambda)) \mid (t,\lambda) \in J\times(-1,+1)\}$. Although $\Omega$ is covered by extremal curves, there are two distinct extremal curves passing through each point of $\Omega$, so it is not a field of extremals.
In contrast, the family of extremal curves $u_\lambda = \lambda\sin t$ ($\forall\,\lambda \in \mathbb{R}^1$) generates a field of extremals and its directional field, where $\Omega = \{(t,\lambda\sin t) \mid (t,\lambda) \in (\varepsilon,\pi-\varepsilon)\times\mathbb{R}^1\}$ ($\varepsilon \in (0,\frac{\pi}{2})$) and $\psi(t,u) = u\cot t$.
Suppose $\gamma_0 = \{(t,u_0(t)) \mid t \in J\} \subset \Omega$ and $\psi$ is its directional field. We compare it with its nearby piecewise $C^1$ curves $\gamma = \{(t,u(t)) \mid t \in J\} \subset \Omega$ with common endpoints ($u(t_0) = u_0(t_0)$, $u(t_1) = u_0(t_1)$). Since
$$I(u) = \int_\gamma L\,dt = \int_J L(t,u(t),\dot u(t))\,dt$$
and $\psi$ is a directional field of $\Omega$ (so that $du = \psi\,dt$ along $\gamma_0$), it follows that
$$I(u_0) = \int_{\gamma_0} \big[(L(t,u,\psi(t,u)) - \psi(t,u)L_p(t,u,\psi(t,u)))\,dt + L_p(t,u,\psi(t,u))\,du\big].$$
If the integral is independent of path, then
$$\begin{aligned}
I(u_0) &= \int_J L(t,u_0(t),\dot u_0(t))\,dt \\
&= \int_\gamma \big[(L(t,u,\psi(t,u)) - \psi(t,u)L_p(t,u,\psi(t,u)))\,dt + L_p(t,u,\psi(t,u))\,du\big] \\
&= \int_J \big[L(t,u(t),\psi(t,u(t))) + (\dot u(t) - \psi(t,u(t)))L_p(t,u(t),\psi(t,u(t)))\big]\,dt.
\end{aligned}$$
Thus,
$$\begin{aligned}
I(u) - I(u_0) &= \int_J \big[L(t,u(t),\dot u(t)) - L(t,u(t),\psi(t,u(t))) - (\dot u(t) - \psi(t,u(t)))L_p(t,u(t),\psi(t,u(t)))\big]\,dt \\
&= \int_J E(t,u(t),\psi(t,u(t)),\dot u(t))\,dt.
\end{aligned}$$
If we further assume
$$E(t,u,\psi(t,u),p) \ge 0, \quad \forall\,(t,u,p) \in \Omega\times\mathbb{R}^N,$$
then
$$I(u) \ge I(u_0),$$
which means $u_0$ is a strong minimum.
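The representation $I(u) - I(u_0) = \int_J E\,dt$ can be checked numerically in the simplest setting. The sketch below is an illustration only: it assumes $L = \frac12 p^2$ with the constant directional field $\psi(t,u) = m$ of Example 4.2, the extremal $u_0(t) = mt$ on $[0,1]$, and the comparison curve $u(t) = mt + c\sin(\pi t)$ with the same endpoints; the values of $m$ and $c$ and all function names are arbitrary choices made here.

```python
import math

M_SLOPE = 2.0   # slope m of the field psi(t, u) = m (illustrative value)
AMP = 0.3       # amplitude c of the comparison perturbation (illustrative value)

def lagrangian(p):
    return 0.5 * p * p

def lagrangian_p(p):
    return p

def excess(p, q):
    """Weierstrass excess E(p, q) = L(q) - L(p) - (q - p) * L_p(p)."""
    return lagrangian(q) - lagrangian(p) - (q - p) * lagrangian_p(p)

def midpoint(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

def u_dot(t):
    # Derivative of u(t) = m*t + c*sin(pi*t); u(0) = u0(0) and u(1) = u0(1).
    return M_SLOPE + AMP * math.pi * math.cos(math.pi * t)

# I(u) - I(u0), computed directly from the Lagrangian.
direct_difference = midpoint(lambda t: lagrangian(u_dot(t)) - lagrangian(M_SLOPE), 0.0, 1.0)
# The same quantity, computed as the integral of the excess function along u.
excess_integral = midpoint(lambda t: excess(M_SLOPE, u_dot(t)), 0.0, 1.0)
```

In this case $E(p,q) = \frac12(q-p)^2$, so both numbers should equal $\frac14 c^2\pi^2$ up to rounding, and in particular they are nonnegative.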
We next address the following questions:
1. Is it possible to embed the extremal curve of $u_0$ into a field of extremals?
By "embed" we mean there exist an open interval $J_1 \supset J$ and a continuous function $u : J_1\times B_\varepsilon \to \mathbb{R}^N$ such that $\forall\,\alpha = (\alpha_1,\alpha_2,\ldots,\alpha_N) \in B_\varepsilon \subset \mathbb{R}^N$, $u(t,\alpha)$ is an extremal curve for which $u_0(t) = u(t,0)|_J$, $\forall\, t \in J$, and $\Omega = \{(t,u(t,\alpha)) \mid t \in J_1,\ \alpha \in B_\varepsilon\}$ is a field of extremals.
2. Why is the above integral independent of path?
We now answer the first question: under what condition can a given extremal
curve γ0 be embedded into a field of extremals Ω?
For simplicity, we shall only present the argument for the case N = 1.
Fig. 4.5
Lemma 4.1 If $L \in C^3$ and $\{u(t,\alpha) \in C^2(J\times(-\varepsilon,\varepsilon))\}$ is a family of solutions of the E-L equation, then
$$\xi(t) = \frac{\partial}{\partial\alpha}u(t,\alpha)\Big|_{\alpha=0}$$
is a Jacobi field along $u_0(t) = u(t,0)$ (see Figure 4.5).
Proof Denote $u_\alpha = u(t,\alpha)$ and differentiate the E-L equation
$$\frac{d}{dt}L_p(t,u_\alpha(t),\dot u_\alpha(t)) = L_u(t,u_\alpha(t),\dot u_\alpha(t))$$
with respect to $\alpha$ at $\alpha = 0$. Letting $\tau = (t,u_0(t),\dot u_0(t))$, we obtain:
$$\frac{d}{dt}\big[L_{pp}(\tau)\dot\xi(t) + L_{pu}(\tau)\xi(t)\big] = L_{pu}(\tau)\dot\xi(t) + L_{uu}(\tau)\xi(t),$$
which means
$$J_{u_0}(\xi) = 0.$$
Lemma 4.2 Assume $N = 1$, $L \in C^3$, and $u_0 \in C^2$ is a solution of its E-L equation. Suppose the strict Legendre–Hadamard condition along $u_0(t)$ holds: $L_{pp}(t,u_0(t),\dot u_0(t)) > 0$ for all $t \in (t_0,t_1]$. If on $(t_0,t_1]$, $(t_0,u_0(t_0))$ has no conjugate point along the extremal curve $\gamma_0$ of $u_0$, then $\gamma_0$ can be embedded into a field of extremals $\Omega$, a simply connected region generated by the family of solution curves.
Proof 1. From $L_{pp}(t,u_0(t),\dot u_0(t)) > 0$ for all $t \in J = [t_0,t_1]$ and the E-L equation, the solution $u_0$ can be extended to a larger interval $J_1 = (a,b) \supset J$.
2. For $\alpha \in \mathbb{R}^1$ with $|\alpha| < \varepsilon_0$ sufficiently small, solving the initial value problem of the E-L equation
$$E_L(\varphi(\cdot,\alpha)) = 0, \quad \varphi(a,\alpha) = u_0(a), \quad \varphi_t(a,\alpha) = \dot u_0(a) + \alpha$$
yields a family of solutions $\varphi(t,\alpha)$, $t \in (a,b)$, $|\alpha| < \varepsilon_0$. By the uniqueness of solutions of ordinary differential equations, $\varphi(t,0) = u_0(t)$. This fulfills requirement (4) in Definition 4.2.
3. Define $\Omega_\varepsilon = \{(t,\varphi(t,\alpha)) \mid t \in (a,b),\ |\alpha| < \varepsilon\}$, $\varepsilon < \varepsilon_0$. By Lemma 4.1,
$$\xi(t) = \partial_\alpha\varphi(t,\alpha)|_{\alpha=0}$$
is a Jacobi field along $u_0$ which satisfies:
$$\xi(a) = 0 \quad\text{and}\quad \dot\xi(a) = 1.$$
By our assumption, it has no conjugate point, so $\xi(t) > 0$ for all $t \in (a,t_1]$. According to the continuous dependence on parameters, for some $0 < \varepsilon_1 < \varepsilon_0$,
$$\partial_\alpha\varphi(t,\alpha) > 0, \quad |\alpha| < \varepsilon_1.$$
This fulfills requirement (2) in Definition 4.2.
Using the Implicit Function Theorem, we can find $0 < \varepsilon_2 < \varepsilon_1$ such that $\forall\,(t,u) \in \Omega_{\varepsilon_2}$, the equation
$$u = \varphi(t,\alpha)$$
has a unique continuously differentiable solution $\alpha = w(t,u) \in B_{\varepsilon_2}(0)$. This fulfills requirement (3) in Definition 4.2.
4. Let $\Omega = \Omega_{\varepsilon_2}$. It contains $\gamma_0$ and it is homeomorphic to a quadrilateral, whence it is simply connected. Furthermore, the directional field $\psi(t,u)$ generated by $\varphi(t,\alpha)$ is
$$\psi(t,u) = \partial_t\varphi(t,w(t,u)).$$
It follows that $\psi$ is defined everywhere in $\Omega$ and $\varphi(t,\alpha)$ describes a family of solution curves of the equation
$$\dot u = \partial_t\varphi(t,\alpha) = \partial_t\varphi(t,w(t,u)) = \psi(t,u).$$
This fulfills requirement (1) of Definition 4.2 (see Figure 4.6).
Remark 4.1 When $N > 1$, Lemma 4.2 still holds. It states: let $L \in C^3([t_0,t_1]\times\mathbb{R}^N\times\mathbb{R}^N)$ and $u_0 \in C^2([t_0,t_1],\mathbb{R}^N)$ be a solution of its E-L equation. Suppose for all $t \in [t_0,t_1]$, the matrix $L_{pp}(t,u_0(t),\dot u_0(t))$ is invertible. If on $(t_0,t_1]$, $(t_0,u_0(t_0))$ has no conjugate point along the extremal curve $\gamma_0$ of $u_0$, then $\gamma_0$ can be embedded into a field of extremals $\Omega$, a simply connected region generated by the family of solution curves.
The proof resembles the proof of Lemma 4.2. We need only replace the scalar $\alpha \in \mathbb{R}^1$ in step 2 above by a vector $\alpha \in \mathbb{R}^N$. The resulting solution $\varphi(t,\alpha)$ of the E-L equation then satisfies:
$$\partial_{\alpha_i}\varphi_j(a,\alpha) = 0, \quad \partial_{\alpha_i}\partial_t\varphi_j(a,\alpha) = \delta_{ij}, \quad 1 \le i,j \le N.$$
Fig. 4.6
We replace the Jacobi field in the third step above by the family $\{w_i(t,\alpha) = \partial_{\alpha_i}\varphi(t,\alpha),\ i = 1,\ldots,N\}$. Since
$$w_i(a,0) = 0, \quad \partial_t w_i(a,0) = e_i, \quad i = 1,\ldots,N,$$
and on $J_1$ there is no conjugate point along $\gamma_0$, we have:
$$\det(\partial_{\alpha_i}\varphi_j(t,\alpha)) \ne 0, \quad \forall\, t \in J_1.$$
The rest of the proof remains the same.
4.4 Mayer field, Hilbert’s invariant integral
We now examine the second question: under what condition is the integral in
Theorem 4.1 independent of path?
Let
$$R_i(t,u) = L_{p_i}(t,u,\psi(t,u)),$$
$$H(t,u) = \langle\psi(t,u),L_p(t,u,\psi(t,u))\rangle - L(t,u,\psi(t,u)),$$
where $\langle\cdot,\cdot\rangle$ denotes the standard inner product on $\mathbb{R}^N$, and consider the 1-form
$$\omega = \sum_{i=1}^N R_i\,du_i - H\,dt.$$
We show that
$$d\omega = 0.$$
Definition 4.3 A field of extremals is called a Mayer field if it satisfies the fol-
lowing compatibility condition:
∂ui Lpj (t, u(t), ψ(t, u(t))) = ∂uj Lpi (t, u(t), ψ(t, u(t))), ∀ 1 ≤ i, j ≤ N.
Corollary 4.1 For N = 1, every field of extremals is a Mayer field.

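Given a field of extremals and its directional field $(\Omega,\psi)$, we introduce
$$D_\psi = \partial_t + \sum_{i=1}^N \psi_i\,\partial_{u_i} + \sum_{i=1}^N\Big(\partial_t\psi_i + \sum_{k=1}^N \psi_k\,\partial_{u_k}\psi_i\Big)\partial_{p_i}$$
and we have: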
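Lemma 4.3 Let $L \in C^2(J\times\mathbb{R}^N\times\mathbb{R}^N)$. Then $(\Omega,\psi)$ is a field of extremals and its directional field if and only if for any integral curve $(t,u(t))$, we have:
$$D_\psi L_p(t,u(t,\alpha),\psi(t,u(t,\alpha))) = L_u(t,u(t,\alpha),\psi(t,u(t,\alpha))).$$
Proof We henceforth denote $\tilde L := \tilde L(t,\alpha) = L(t,u(t,\alpha),\psi(t,u(t,\alpha)))$ and likewise for $\tilde L_{u_i}$ and $\tilde L_{p_i}$. $(\Omega,\psi)$ is a field of extremals and its directional field if and only if $\dot u(t) = \psi(t,u(t))$ together with the E-L equation $\tilde L_u = \frac{d}{dt}\tilde L_p$. We compute directly that
$$\begin{aligned}
\tilde L_{u_i} &= \frac{d}{dt}\tilde L_{p_i} \\
&= \Big(\partial_t + \sum_{j=1}^N\Big[\psi_j\,\partial_{u_j} + \frac{d}{dt}\psi_j(t,u(t))\,\partial_{p_j}\Big]\Big)\tilde L_{p_i} \\
&= \Big(\partial_t + \sum_{j=1}^N\Big[\psi_j\,\partial_{u_j} + \Big(\partial_t\psi_j + \sum_{k=1}^N\psi_k\,\partial_{u_k}\psi_j\Big)\partial_{p_j}\Big]\Big)\tilde L_{p_i} \\
&= D_\psi\tilde L_{p_i}.
\end{aligned}$$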
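Lemma 4.4 If $L \in C^2(J\times\mathbb{R}^N\times\mathbb{R}^N)$, then $\Omega$ is a Mayer critical field if and only if $d\omega = 0$, i.e.
$$\partial_t R_i = -\partial_{u_i}H, \quad 1 \le i \le N.$$
Proof By direct computation, we have:
$$\partial_t R_i = \Big(\partial_t + \sum_{j=1}^N \partial_t\psi_j\,\partial_{p_j}\Big)\tilde L_{p_i}.$$
It follows from the compatibility condition that
$$\begin{aligned}
-H_{u_i} &= \partial_{u_i}\big(L - \langle\psi,L_p\rangle\big)(t,u,\psi(t,u)) \\
&= \tilde L_{u_i} + \sum_{j=1}^N \partial_{u_i}\psi_j\,\partial_{p_j}\tilde L - \sum_{j=1}^N \partial_{u_i}\psi_j\,\tilde L_{p_j} - \sum_{j=1}^N \psi_j\,\partial_{u_j}\big[L_{p_i}(t,u,\psi(t,u))\big] \\
&= \tilde L_{u_i} - \sum_{j=1}^N \psi_j\Big(\partial_{u_j} + \sum_{k=1}^N \partial_{u_j}\psi_k\,\partial_{p_k}\Big)\tilde L_{p_i}.
\end{aligned}$$
Hence,
$$\begin{aligned}
\partial_t R_i + \partial_{u_i}H
&= \Big(\partial_t + \sum_{j=1}^N \partial_t\psi_j\,\partial_{p_j}\Big)\tilde L_{p_i} + \sum_{j=1}^N\Big(\psi_j\,\partial_{u_j} + \sum_{k=1}^N \psi_k\,\partial_{u_k}\psi_j\,\partial_{p_j}\Big)\tilde L_{p_i} - \tilde L_{u_i} \\
&= D_\psi\tilde L_{p_i} - \tilde L_{u_i}.
\end{aligned}$$
$d\omega = 0$ now follows immediately from Lemma 4.3. Conversely, using $d\omega = 0$ and the above equality, the reader can derive the compatibility condition.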
Given a Mayer field $(\Omega,\psi)$, both $R$ and $H$ are already defined. If $\gamma$ is any curve joining the points $(t_0,u(t_0))$ and $(t,u)$, the line integral
$$S(t,u) = \int_{(t_0,u(t_0))}^{(t,u)} \sum_{i=1}^N R_i(t,u)\,du_i - H(t,u)\,dt = \int_\gamma \big(L_p\cdot du + (L - \psi\cdot L_p)\,dt\big)$$
is independent of path. We shall call this line integral the Hilbert invariant integral.
4.5 A sufficient condition for strong minima
The case N = 1 is relatively straightforward, we have the following:
Theorem 4.2 For $N = 1$ and $L \in C^3$, suppose on $(t_0,t_1]$, $(t_0,u_0(t_0))$ has no conjugate point along $\gamma_0$, the extremal curve corresponding to the solution $u_0$ of the E-L equation. Let $(\Omega,\psi)$ be a field of extremals with a directional field, in which $u_0$ is embedded. If $L_{pp}(t,u,p) > 0$, $\forall\,(t,u,p) \in \Omega\times\mathbb{R}^1$, then $u_0$ is a strong minimum of $I$.
Proof By Taylor's formula,
$$\begin{aligned}
E(t,u,\psi(t,u),q) &= L(t,u,q) - L(t,u,\psi(t,u)) - (q - \psi(t,u))L_p(t,u,\psi(t,u)) \\
&= \tfrac12 L_{pp}(t,u,v)\,(q - \psi(t,u))^2 \ge 0, \quad \forall\,(t,u,q) \in \Omega\times\mathbb{R}^1,
\end{aligned}$$
where $v$ is in between $\psi(t,u)$ and $q$. Our assertion follows from Lemmas 4.2 and 4.3, and the argument used in §4.3.
Example 4.5 Let
$$M = \{v \in C^1([0,a]) \mid v(0) = (\cosh a)^{-1},\ v(a) = 1\}.$$
The E-L equation $\ddot u - u = 0$ corresponding to
$$I(u) = \int_0^a (\dot u^2 + u^2)\,dt$$
has a solution
$$u_0 = \frac{\cosh t}{\cosh a}.$$
Since the Jacobi equation $\ddot\varphi - \varphi = 0$ has no conjugate points and
$$E(t,u,p,q) = (u^2 + q^2) - (u^2 + p^2) - 2p(q-p) = (q-p)^2 \ge 0,$$
$u_0$ must be a strong minimum.

Example 4.6 Find the weak minima and the strong minima of the functional
$$I(u) = \int_0^1 \dot u^2(1 + \dot u)^2\,dt$$
with the boundary conditions $u(0) = 0$ and $u(1) = m$.

Clearly, $I$ has minimal value 0. The solution of the E-L equation which achieves the value zero must satisfy either $\dot u = 0$ or $\dot u = -1$.
Unless $m = 0$ and $u(t) = 0$, or $m = -1$ and $u(t) = -t$, there can be no other $C^1$ solution. However, for Lipschitz functions, if there exists a solution, then it is necessary to have $-1 \le m \le 0$, in which case, they are
$$u_1 = \begin{cases} -t, & 0 \le t \le -m, \\ m, & -m \le t \le 1, \end{cases}$$
or
$$u_2 = \begin{cases} 0, & 0 \le t \le 1+m, \\ 1+m-t, & 1+m \le t \le 1. \end{cases}$$
We calculate the second derivatives of the Lagrangian:
$$L_{uu} = L_{up} = 0, \quad L_{pp} = 2(6p^2 + 6p + 1) = 12(p - p_1)(p - p_2).$$
$L_{pp} = 0$ has zeros
$$p_1 = -\frac12 - \frac{\sqrt3}{6} \quad\text{and}\quad p_2 = -\frac12 + \frac{\sqrt3}{6}.$$
When $p \notin [p_1,p_2]$, $L_{pp} > 0$. Regardless of whether we are considering the $C^1$ solutions or the Lip solutions $u_1$ and $u_2$, their slopes are equal to 0 or $-1$, which are outside the interval, hence $L_{pp}(t,u(t),\dot u(t)) > 0$. Their corresponding Jacobi operator
$$J_u(\varphi) = 2(6p^2 + 6p + 1)\ddot\varphi$$
has no conjugate points, whence $u_1$ and $u_2$ are both strict weak minima.
Lastly, we take a look at the Weierstrass excess function
$$\begin{aligned}
E(t,u,p,q) &= q^2(1+q)^2 - p^2(1+p)^2 - (q-p)(4p^3 + 6p^2 + 2p) \\
&= [q(1+q) - p(1+p)]^2 + 2p(1+p)(q-p)^2 \\
&\ge 0.
\end{aligned}$$
This implies $u_1$ and $u_2$ are both strong minima as well.
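The algebraic identity for the excess function in Example 4.6 can be spot-checked numerically. The sketch below (function names and the evaluation grid are choices made here) compares the definition of $E$ with its factored form on a grid of slopes, and evaluates the sign of $E$ at the slopes $p = 0$ and $p = -1$ taken by the extremals. It also evaluates one point with $p(1+p) < 0$, to illustrate that the inequality $E \ge 0$ relies on the slope $p$ satisfying $p(1+p) \ge 0$.

```python
def lagrangian(p):
    return p ** 2 * (1.0 + p) ** 2

def lagrangian_p(p):
    return 4.0 * p ** 3 + 6.0 * p ** 2 + 2.0 * p

def excess(p, q):
    """Weierstrass excess from its definition: L(q) - L(p) - (q - p) * L_p(p)."""
    return lagrangian(q) - lagrangian(p) - (q - p) * lagrangian_p(p)

def factored(p, q):
    """Factored form stated in Example 4.6."""
    return (q * (1.0 + q) - p * (1.0 + p)) ** 2 + 2.0 * p * (1.0 + p) * (q - p) ** 2

grid = [k / 4.0 for k in range(-12, 13)]   # slopes from -3 to 3 in steps of 1/4

# Largest discrepancy between the two expressions on the grid.
max_gap = max(abs(excess(p, q) - factored(p, q)) for p in grid for q in grid)
# Smallest excess value when p is one of the extremal slopes 0 or -1.
min_on_extremal_slopes = min(excess(p, q) for p in (0.0, -1.0) for q in grid)
# A point with p(1 + p) < 0, where the excess turns out negative.
negative_example = excess(-0.5, -0.25)
```

For $p \in \{0,-1\}$ the second term of the factored form vanishes and $E = q^2(1+q)^2 \ge 0$, which is the case used in the example.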

Utilizing the concept of Mayer field, we have the following theorem regarding
N > 1.
Theorem 4.3 Suppose the E-L equation of I has a solution u0 . Suppose its cor-
responding extremal curve γ0 can be embedded into a family of extremal curves
such that they define a Mayer field (Ω, ψ). Furthermore, if
E(t, u, ψ(t, u), p) ≥ 0, ∀ (t, u, p) ∈ Ω × RN ,
then u0 is a strong minimum of I.

In fact, for N > 1, a result similar to Theorem 4.2 also holds.
Theorem 4.4 Let $L \in C^3(J\times\mathbb{R}^N\times\mathbb{R}^N,\mathbb{R}^1)$. Suppose
(1) $(t_0,u_0(t_0))$ has no conjugate points along the critical curve $\gamma_0$,
(2) $(L_{p_ip_j}(t,u_0(t),\dot u_0(t)))$ is positive definite $\forall\, t \in J$,
(3) $E(t,u,\psi(t,u),p) \ge 0$, $\forall\,(t,u,p) \in \Omega\times\mathbb{R}^N$, $p \ne \psi(t,u)$,
then $u_0$ is a strong minimum of $I$.
4.6∗ The proof of Theorem 4.4 (for the case N > 1)
So far, we already know: if N = 1, then any extremal field (Ω, ψ) is a Mayer

field. However, for N > 1, we are seeking the conditions for which an extremal
curve can be embedded into a Mayer field.
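For a given Lagrangian $L$, we denote $\tilde L(t,\alpha) = L(t,\varphi(t,\alpha),\dot\varphi(t,\alpha))$, where $\dot\varphi(t,\alpha) = \varphi_t(t,\alpha)$. Likewise, we denote $\tilde L_{u_i}$ and $\tilde L_{p_i}$. By direct computation, we have:
$$\begin{aligned}
d\omega &= d\Big(\sum_{i=1}^N R_i\,du_i - H\,dt\Big) \\
&= d\Big(\sum_{i=1}^N \tilde L_{p_i}\,du^i + \Big(\tilde L - \sum_{i=1}^N \dot\varphi^i\tilde L_{p_i}\Big)dt\Big) \\
&= d\Big(\tilde L\,dt + \sum_{i=1}^N \tilde L_{p_i}(du^i - \dot\varphi^i\,dt)\Big) \\
&= \sum_{k=1}^N \tilde L_{u_k}\,d\varphi^k\wedge dt + \sum_{k=1}^N \tilde L_{p_k}\,d\dot\varphi^k\wedge dt - \sum_{i=1}^N \tilde L_{p_i}\,d\dot\varphi^i\wedge dt + \sum_{i=1}^N\sum_{l=1}^N d\tilde L_{p_i}\wedge\varphi^i_{\alpha_l}\,d\alpha_l \\
&= \sum_{k=1}^N\sum_{l=1}^N \tilde L_{u_k}\varphi^k_{\alpha_l}\,d\alpha_l\wedge dt + \sum_{i=1}^N\sum_{l=1}^N \partial_t\tilde L_{p_i}\,\varphi^i_{\alpha_l}\,dt\wedge d\alpha_l + \sum_{m=1}^N\sum_{i=1}^N\sum_{l=1}^N \partial_{\alpha_m}\tilde L_{p_i}\,\varphi^i_{\alpha_l}\,d\alpha_m\wedge d\alpha_l \\
&= \sum_{i=1}^N\sum_{l=1}^N (\tilde L_{u_i} - \partial_t\tilde L_{p_i})\varphi^i_{\alpha_l}\,d\alpha_l\wedge dt + \sum_{m=1}^N\sum_{i=1}^N\sum_{l=1}^N \partial_{\alpha_m}\tilde L_{p_i}\,\varphi^i_{\alpha_l}\,d\alpha_m\wedge d\alpha_l.
\end{aligned}$$
Denote
$$[\alpha_l,\alpha_m] = \sum_{i=1}^N \big(\partial_{\alpha_l}\tilde L_{p_i}\,\varphi^i_{\alpha_m} - \partial_{\alpha_m}\tilde L_{p_i}\,\varphi^i_{\alpha_l}\big)$$
and we call it the Lagrange bracket. Using the Lagrange bracket, we can rewrite the above calculation as the following formula:
$$d\omega = \sum_{i=1}^N\sum_{l=1}^N E_L(\varphi)_i\,\varphi^i_{\alpha_l}\,d\alpha_l\wedge dt + \sum_{1\le l<m\le N} [\alpha_l,\alpha_m]\,d\alpha_l\wedge d\alpha_m. \tag{4.2}$$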
Consequently, we have:
Lemma 4.5 Let $L \in C^3(J\times\mathbb{R}^N\times\mathbb{R}^N)$. Suppose $(\Omega,\psi)$ is an extremal field determined by a family of extremal curves $\varphi(\cdot,\alpha)$, then $(\Omega,\psi)$ is a Mayer field if and only if
$$E_L(\varphi(\cdot,\alpha)) = 0, \ \forall\,\alpha \in \mathbb{R}^N, \qquad [\alpha_l,\alpha_m] = 0, \ 1 \le l,m \le N.$$
Proof Suppose $(\Omega,\psi)$ is a Mayer field, then by the invariance of differential forms, the left-hand side of (4.2) is equal to zero. Since the extremal curve $\varphi(\cdot,\alpha)$ satisfies the E-L equation, the first term on the right-hand side is zero. Hence, the Lagrange bracket equals zero.
Conversely, if
$$E_L(\varphi(\cdot,\alpha)) = 0, \ \forall\,\alpha \in \mathbb{R}^N, \qquad [\alpha_l,\alpha_m] = 0, \ 1 \le l,m \le N,$$
then $\omega$ is closed. Furthermore, since $(\Omega,\psi)$ is an extremal field, $(\Omega,\psi)$ is a Mayer field.
For the Lagrange bracket, we have:
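Lemma 4.6 Suppose $L \in C^3(J\times\mathbb{R}^N\times\mathbb{R}^N)$ and $(\Omega,\psi)$ is an extremal field, then
$$\frac{\partial}{\partial t}[\alpha_l,\alpha_m] = 0, \quad \forall\, l,m.$$
Proof Using conditions 1–4 in the definition of extremal fields, we compute directly:
$$\frac{\partial}{\partial t}[\alpha_l,\alpha_m] = \sum_{i=1}^N\Big(\frac{\partial\tilde L_{u_i}}{\partial\alpha_l}\varphi^i_{\alpha_m} + \frac{\partial\tilde L_{p_i}}{\partial\alpha_l}\dot\varphi^i_{\alpha_m} - \frac{\partial\tilde L_{u_i}}{\partial\alpha_m}\varphi^i_{\alpha_l} - \frac{\partial\tilde L_{p_i}}{\partial\alpha_m}\dot\varphi^i_{\alpha_l}\Big).$$
After differentiating $\frac{\partial\tilde L_{u_i}}{\partial\alpha_l}$, $\frac{\partial\tilde L_{u_i}}{\partial\alpha_m}$, $\frac{\partial\tilde L_{p_i}}{\partial\alpha_l}$, $\frac{\partial\tilde L_{p_i}}{\partial\alpha_m}$, and using the fact that $L$ is twice differentiable, all terms will cancel, which leaves us
$$\frac{\partial}{\partial t}[\alpha_l,\alpha_m] = 0.$$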
In the following, we will strengthen Remark 4.1 for the higher dimensional cases,
embedding a critical curve γ0 defined on J = [t0 , t1 ] into a Mayer field (Ω, ψ).
Theorem 4.5 Suppose the extremal curve $\gamma_0$ has no conjugate points on $J$. Suppose also $(L_{p_ip_j}(t,u(t),\dot u(t)))$ is invertible for all $t \in J$, then $\gamma_0$ can be embedded into a family of extremal curves such that this family of curves defines a Mayer field $(\Omega,\psi)$.
Proof Based on Remark 4.1, we obtain a field of extremals $(\Omega,\psi)$. We now show that this is a Mayer field. In fact, using $\partial_{\alpha_i}\varphi_j(a,\alpha) = 0$, $\forall\, i,j$, we can deduce that $[\alpha_l,\alpha_m](a,\alpha) = 0$, $\forall\, l,m \in [1,N]$. By Lemma 4.6, $\frac{d}{dt}[\alpha_l,\alpha_m](t,\alpha) = 0$, hence $[\alpha_l,\alpha_m](t,\alpha) = 0$. This means $(\Omega,\psi)$ is a Mayer field.
Exercises
1. Use the Weierstrass excess function to verify that $u_0$ is not a strong minimum of $I(u) = \int_J (\dot u^2 + \dot u^3)\,dt$.
2. Verify that for $I(u) = \int_0^2 \sqrt{u(1 + \dot u^2)}\,dt$ and $M = \{u \in C^1([0,2]) \mid u(0) = 2,\ u(2) = 5\}$, there is a two-parameter family of solutions to the E-L equation
$$u(t,\alpha,\beta) = \alpha + \frac{1}{\alpha}\Big(\frac{t+\beta}{2}\Big)^2.$$
When $(\alpha,\beta) = (1,2)$, $u(t,1,2) \in M$.
Use $u(t,\alpha,\beta)$ to determine two independent Jacobi fields.
3. Suppose $I(u) = \int_a^b \sqrt{1 + \dot u^2}\,dt$ is defined on $C_0^1(0,b)$; describe the field of extremals $(\psi(t,u))$ containing $u_0 = 0$. Also, verify that it is a strong minimum.
4. Let $I(u) = \int_1^2 (\dot u + t^2\dot u^2)\,dt$ and $M = \{v \in C^1([1,2]) \mid v(1) = 1,\ v(2) = 2\}$. Verify that
$$u_0(t) = -\frac{2}{t} + 3$$
is a strong minimum.
5. Suppose $u \in C^1(J\times\mathbb{R}^N,\mathbb{R}^N)$ satisfies $E_L(u(t,\alpha)) = 0$, $\forall\, t \in J$, $\forall\,\alpha \in \mathbb{R}^N$, $u(0,\alpha) = \theta$, $\partial_t u(0,\alpha) = \alpha$, and $u_0 = u(\cdot,0)$. Prove that
(1) $\partial_{\alpha_i}u(t,\alpha)|_{\alpha=0}$ ($i = 1,\ldots,N$) are Jacobi fields.
(2) If $(t,u_0(t))$ has no conjugate points on $J$, then $\det(\partial_{\alpha_i}u_j(t,\alpha)) \ne 0$, $\forall\, t \in J$.
6. Suppose $I(u) = \int_a^b [\frac12\dot u^2 - V(u)]\,dt$.
(1) When $V(u) = \cos u$, write down the Jacobi operator and the Jacobi field along the E-L equation solution $u = 0$.
(2) When $V \in C^2(\mathbb{R}^1)$, $V''(u) \le 0$, show that $u_0$ is a strict weak minimum of $I$.

Lecture 5
The Hamilton–Jacobi theory
5.1 Eikonal and the Carathéodory system of equations
Given a Lagrangian $L : J\times\mathbb{R}^N\times\mathbb{R}^N \longrightarrow \mathbb{R}^1$ and its corresponding functional
$$I(u) = \int_J L(t,u(t),\dot u(t))\,dt,$$
we know from the previous lecture that for a field of extremals and its directional field $(\Omega,\psi)$, there exists a 1-form
$$\omega = L_p(t,u,\psi(t,u))\,du - \big(\langle\psi(t,u),L_p(t,u,\psi(t,u))\rangle - L(t,u,\psi(t,u))\big)\,dt,$$
where $\langle\cdot,\cdot\rangle$ denotes the standard inner product on $\mathbb{R}^N$.

In order for (Ω, ψ) to be a Mayer field, it is necessary and sufficient to require
ω to be a closed form.
As a consequence, on a Mayer field, $\omega$ defines a single-valued function $g$:
$$g(t,u) - g(t_0,u_0) = \int_\gamma \omega, \tag{5.1}$$
where $\gamma$ is any curve connecting $(t_0,u_0)$ and $(t,u)$. We call this single-valued function $g$ an eikonal.
Remark 5.1 The eikonal has the following physical meaning: consider the line integral of $\omega$ along an extremal curve $\gamma = (t,u(t))$, $\dot u(t) = \psi(t,u(t))$:
$$\begin{aligned}
g(t_2,u(t_2)) - g(t_1,u(t_1)) &= \int_\gamma \omega \\
&= \int_J \big(\langle L_p(t,u(t),\psi(t,u(t))),du\rangle - [\langle\psi(t,u(t)),L_p(t,u(t),\psi(t,u(t)))\rangle - L(t,u(t),\psi(t,u(t)))]\,dt\big) \\
&= \int_J \big(\langle L_p(t,u(t),\dot u(t)),\dot u(t)\rangle - [\langle\dot u(t),L_p(t,u(t),\dot u(t))\rangle - L(t,u(t),\dot u(t))]\big)\,dt \\
&= \int_J L(t,u(t),\dot u(t))\,dt.
\end{aligned}$$
This shows that the difference of an eikonal $g$ between any two points along a given extremal curve is equal to the line integral of the Lagrangian along that curve.
In optics, the Lagrangian represents the distance traveled by a ray of light in an instant divided by its speed. The line integral is therefore equal to the time elapsed when the light ray travels from point one to point two. Consequently, the wavefront of a light ray from a point source (or the phase front of waves) can be expressed via the level sets $g(t,u) = \text{const}$.
The definition shows directly that on the field of extremals $(\Omega,\psi)$, the eikonal $g$ satisfies the following Carathéodory system of equations:
$$\begin{cases} \nabla_u g(t,u) = L_p(t,u,\psi(t,u)), \\ \partial_t g(t,u) = L(t,u,\psi(t,u)) - \langle\psi(t,u),L_p(t,u,\psi(t,u))\rangle. \end{cases} \tag{5.2}$$
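A small worked case may help fix the signs in (5.2). Take $N = 1$, $L = \frac12 p^2$ and the constant directional field $\psi(t,u) = m$ from Example 4.2 (this choice, and the normalization of the additive constant below, are made here for illustration). Then $L_p = m$ and $\langle\psi,L_p\rangle - L = \frac12 m^2$, so $\omega$ is exact and an eikonal can be written down explicitly:

```latex
\begin{aligned}
\omega &= m\,du - \tfrac12 m^2\,dt = d\big(mu - \tfrac12 m^2 t\big),
\qquad g(t,u) = mu - \tfrac12 m^2 t + \text{const}, \\
\partial_u g &= m = L_p(t,u,\psi), \qquad
\partial_t g = -\tfrac12 m^2 = L(t,u,\psi) - \psi\,L_p(t,u,\psi), \\
g(t_2,u(t_2)) - g(t_1,u(t_1)) &= m\cdot m\,(t_2 - t_1) - \tfrac12 m^2 (t_2 - t_1)
 = \int_{t_1}^{t_2} \tfrac12 m^2\,dt
 \quad\text{along } u(t) = mt + \lambda .
\end{aligned}
```

The last line agrees with Remark 5.1: along an extremal of the field, the increment of $g$ equals the integral of the Lagrangian.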
5.2 The Legendre transformation
Suppose the derivative function $f'$ of $f \in C^2(\mathbb{R}^1, \mathbb{R}^1)$ has an inverse function ψ. Denote $x = \psi(\xi)$. We call
\[ f^*(\xi) = \xi x - f(x) = \xi\psi(\xi) - f\circ\psi(\xi) \]
the Legendre transform of f.
The Legendre transformation can be extended to multivariate functions. Let $f \in C^2(\mathbb{R}^N, \mathbb{R}^1)$. Suppose the gradient $\xi = \nabla f(x)$ has inverse mapping ψ. Denote $x = \psi(\xi)$. We call
\[ f^*(\xi) = \langle \xi, x\rangle - f(x) = \langle \xi, \psi(\xi)\rangle - f\circ\psi(\xi) \]
the Legendre transform of f.
We describe the geometric meaning of the Legendre transformation as follows: denote by G the graph of f, $\{(x,y) \in \mathbb{R}^N \times \mathbb{R}^1 \mid y = f(x)\}$. The tangent hyperplane at the point $P = (x,y)$ is $\{(\alpha,\beta) \mid \beta - f(x) = \langle \nabla f(x), \alpha - x\rangle\}$. So any point $Q = (\alpha,\beta)$ on the hyperplane must satisfy:
\[ \beta - \langle \nabla f(x), \alpha\rangle = f(x) - \langle \nabla f(x), x\rangle, \]
i.e.
\[ \beta - \langle \xi, \alpha\rangle = -f^*(\xi). \]
Thus $-f^*(\xi)$ is the intercept of the hyperplane on the β-axis (see Figure 5.1).
Fig. 5.1
The Legendre transformation has the following properties:

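(1) If $f \in C^s$, $s \ge 2$, then $f^* \in C^s$.
Proof In fact, since $\psi \in C^{s-1}$, $f^* \in C^{s-1}$. But by definition,
\[ f^*(\xi) = \langle \xi, \psi(\xi)\rangle - f\circ\psi(\xi), \]
and since $\xi = \nabla f(\psi(\xi))$, we have:
\[ \nabla f^*(\xi) = \psi(\xi) + \langle \xi, \nabla\psi(\xi)\rangle - \langle \nabla f(\psi(\xi)), \nabla\psi(\xi)\rangle = \psi(\xi), \]
whence $f^* \in C^s$.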
(2) $f^{**} = f$, i.e. the Legendre transformation is reflexive.
In fact, by (1), $x = \psi(\xi) = \nabla f^*(\xi)$, so
\[ f^{**}(x) = \langle \xi, x\rangle - f^*(\xi) = f(x). \]
In order to emphasize this symmetry, we write it as
\[ f(x) + f^*(\xi) = \langle \xi, x\rangle, \quad \xi = \nabla f(x), \quad x = \nabla f^*(\xi). \]
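(3)
\[ \left(\frac{\partial^2 f^*}{\partial\xi_i\,\partial\xi_j}(\xi)\right)\bigg|_{\xi=\nabla f(x)} = \left(\frac{\partial^2 f}{\partial x_i\,\partial x_j}(x)\right)^{-1}. \]
This is because
\[ x = \psi(\nabla f(x)) = (\nabla f^*)((\nabla f)(x)); \]
differentiating both sides, we obtain the identity matrix
\[ I = \left(\frac{\partial^2 f^*}{\partial\xi_i\,\partial\xi_j}(\xi)\right)\bigg|_{\xi=\nabla f(x)} \cdot \left(\frac{\partial^2 f}{\partial x_i\,\partial x_j}(x)\right). \]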
Example 5.1 Let $f(x) = x^p/p$, $p > 1$, then $\xi = f'(x) = x^{p-1}$, so $x = \xi^{\frac{1}{p-1}}$. By definition,
\[ f^*(\xi) = \xi\cdot x - f(x) = \frac{1}{p'}\,\xi^{p'}, \]
where $\frac{1}{p'} + \frac{1}{p} = 1$.
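As a numerical sanity check of Example 5.1, the following minimal sketch approximates the Legendre transform by maximizing $\xi x - f(x)$ over $x \ge 0$ and compares it with $\xi^{p'}/p'$. It assumes the supremum characterization of the transform (valid for convex f, cf. Exercise 2 of this lecture) and restricts to $x \in [0, 50]$; the function names and the search interval are illustrative choices, not part of the text.

```python
def legendre_transform(f, xi, lo=0.0, hi=50.0, iters=200):
    """Approximate sup over x in [lo, hi] of xi*x - f(x) by ternary search.

    Assumes f is strictly convex on [lo, hi], so x -> xi*x - f(x) is
    strictly concave and has a single maximum.
    """
    def g(x):
        return xi * x - f(x)

    a, b = lo, hi
    for _ in range(iters):
        m1 = a + (b - a) / 3.0
        m2 = b - (b - a) / 3.0
        if g(m1) < g(m2):
            a = m1  # the maximizer lies to the right of m1
        else:
            b = m2  # the maximizer lies to the left of m2
    return g(0.5 * (a + b))


def check_power_example(p, xi):
    """Return (numerical f*(xi), closed form xi**p'/p') for f(x) = x**p / p."""
    def f(x):
        return x ** p / p

    p_conj = p / (p - 1.0)  # conjugate exponent: 1/p + 1/p' = 1
    return legendre_transform(f, xi), xi ** p_conj / p_conj
```

For instance, `check_power_example(3.0, 2.0)` returns two values that should agree up to the accuracy of the search.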
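Example 5.2 Suppose $A = (a_{ij})$ is a symmetric invertible $N \times N$ matrix. Let $f(x) = \frac12\langle Ax, x\rangle$, $\forall x \in \mathbb{R}^N$, then the map $\xi = \nabla f(x) = Ax$ is invertible and we can solve for $x = A^{-1}\xi$. So the Legendre transform is
\[ f^*(\xi) = \langle x, \xi\rangle - f(x) = \langle A^{-1}\xi, \xi\rangle - \frac12\langle \xi, A^{-1}\xi\rangle = \frac12\langle A^{-1}\xi, \xi\rangle. \]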
5.3 The Hamilton system of equations
Given a Lagrangian $L \in C^2(\mathbb{R}^1 \times \mathbb{R}^N \times \mathbb{R}^N, \mathbb{R}^1)$, suppose $\det(L_{p_i p_j}(t,u,p)) \ne 0$. Let
\[ \xi_i = L_{p_i}(t,u,p), \quad 1 \le i \le N. \]
Using the Implicit Function Theorem, we can solve the system of equations locally:
\[ p_i = \varphi_i(t,u,\xi), \quad \xi = (\xi_1, \xi_2, \ldots, \xi_N), \quad 1 \le i \le N. \]
Fixing (t, u) and regarding L as a function of p, we apply the Legendre transformation to L and let
\[ H(t,u,\xi) = L^*(t,u,\xi) = \Bigl[\sum_{i=1}^N \xi_i p_i - L(t,u,p)\Bigr]\Big|_{p=\varphi(t,u,\xi)}. \tag{5.3} \]
We call H the Hamiltonian.

Since the Legendre transformation is reflexive, if H is the Legendre transform of L, then conversely L is the Legendre transform of H.
The E-L equation corresponding to the Lagrangian L is a second order differential equation
\[ \frac{d}{dt} L_p(t,u(t),\dot u(t)) - L_u(t,u(t),\dot u(t)) = 0, \]
which in turn can be written as a system of first order equations:
\[
\begin{cases}
\dot u(t) = p(t), \\
\dfrac{d}{dt} L_p(t,u(t),p(t)) - L_u(t,u(t),p(t)) = 0,
\end{cases}
\]
whose solution is (u(t), p(t)).
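If we further assume $L \in C^3$, then by differentiating both sides of (5.3), we have:
\[
\begin{aligned}
H_t\,dt + \langle H_u, du\rangle + \langle H_\xi, d\xi\rangle
&= \langle \xi, dp\rangle + \langle p, d\xi\rangle - L_t\,dt - \langle L_u, du\rangle - \langle L_p, dp\rangle \\
&= -L_t\,dt - \langle L_u, du\rangle + \langle p, d\xi\rangle.
\end{aligned}
\]
Hence, the following relations hold:
\[ H_\xi(t,u,\xi) = p, \quad L_p(t,u,p) = \xi, \]
\[ H_u(t,u,\xi) + L_u(t,u,p) = 0, \quad H_t(t,u,\xi) + L_t(t,u,p) = 0. \]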
For (u(t), p(t)), a solution of the E-L equation, let $\xi(t) = L_p(t,u(t),p(t))$, then
\[
\begin{aligned}
\dot\xi(t) &= \frac{d}{dt} L_p(t,u(t),\dot u(t)) \\
&= L_u(t,u(t),\dot u(t)) \\
&= L_u(t,u(t),p(t)) \\
&= -H_u(t,u(t),\xi(t))
\end{aligned}
\]
and
\[ \dot u(t) = p(t) = H_\xi(t,u(t),\xi(t)). \]
Thus, (u(t), ξ(t)) satisfies the Hamilton system of equations (we shall abbreviate it by H-S):
\[ \dot\xi(t) = -H_u(t,u(t),\xi(t)), \quad \dot u(t) = H_\xi(t,u(t),\xi(t)). \tag{5.4} \]
Conversely, given an H-S solution (u(t), ξ(t)), by letting $p(t) = \dot u(t)$ and using $\xi(t) = L_p(t,u(t),p(t))$, we deduce that (u(t), p(t)) is a solution of the E-L equation:
\[ \frac{d}{dt} L_p(t,u(t),\dot u(t)) = \dot\xi(t) = -H_u(t,u(t),\xi(t)) = L_u(t,u(t),p(t)). \]
From this, we can establish the following one-to-one correspondence between the E-L equation and H-S:
\[
\begin{array}{ccc}
\text{(E-L)} & \longleftrightarrow & \text{(H-S)} \\
(u(t), p(t)) & \longleftrightarrow & (u(t), \xi(t)) = (u(t), L_p(t,u(t),p(t))).
\end{array}
\]
The H-S is also the E-L equation of the functional:
\[ F(u,\xi) = \int_J [\langle \dot u(t), \xi(t)\rangle - H(t,u(t),\xi(t))]\,dt. \]
Its corresponding 1-form is
\[ \alpha = \xi\,du - H\,dt, \]
which we call the Poincaré–Cartan invariant.
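The claim that the H-S is the E-L equation of F can be verified in two lines. The sketch below writes the integrand as a function Λ of $(t, u, \xi, \dot u, \dot\xi)$; the symbol Λ is introduced here only for this computation and does not appear in the text.

```latex
\[
\Lambda(t,u,\xi,\dot u,\dot\xi) := \langle \dot u, \xi\rangle - H(t,u,\xi).
\]
% E-L equation with respect to u:
\[
\frac{d}{dt}\Lambda_{\dot u} - \Lambda_u = \dot\xi + H_u(t,u,\xi) = 0
\quad\Longleftrightarrow\quad \dot\xi = -H_u(t,u,\xi).
\]
% E-L equation with respect to xi (Lambda does not depend on \dot\xi):
\[
\frac{d}{dt}\Lambda_{\dot\xi} - \Lambda_\xi = 0 - \bigl(\dot u - H_\xi(t,u,\xi)\bigr) = 0
\quad\Longleftrightarrow\quad \dot u = H_\xi(t,u,\xi).
\]
```

These are exactly the two equations of (5.4).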
Since H is the Legendre transform of L, L is also the Legendre transform of H. This implies:
\[ L(t,u(t),\dot u(t)) = \langle \dot u(t), \xi(t)\rangle - H(t,u(t),\xi(t)). \tag{5.5} \]
From this, we see that the integrands in the functional F and the functional I are in fact the same function expressed via different variables, while the Poincaré–Cartan invariant is the differential version of the Hilbert invariant integral from the previous lecture.
It is worth noting that the functional F(u, ξ) corresponding to the H-S is not bounded below, hence it cannot attain a minimum. The solutions of the H-S are merely "critical points" of the functional F.
Example 5.3 For a given collection of particles in classical mechanics (cf. Example 2.1 in Lecture 2), let $q = (q_1, \ldots, q_N)$, $p = (p_1, \ldots, p_N)$, $T(p) = \frac12\sum_{i,j=1}^N a_{ij} p_i p_j$, and $V = V(q_1, \ldots, q_N)$. Its Lagrangian is
\[ L(t,q,p) = T(p) - V(q), \]
whose corresponding E-L equation is
\[ \frac{d}{dt}\sum_{j=1}^N a_{ij}\dot q_j = -\partial_{q_i} V(q), \quad i = 1, \ldots, N. \]
Here the Hamiltonian
\[ H(q,\xi) = \frac12\sum_{i,j=1}^N a^{ij}\xi_i\xi_j + V(q) \]
is the energy of the collection of particles, where $(a^{ij})$ is the inverse matrix of $(a_{ij})$. The corresponding H-S is
\[
\begin{cases}
\dot\xi_i = -\partial_{q_i} V(q), \\
\dot q_i = \displaystyle\sum_{j=1}^N a^{ij}\xi_j,
\end{cases}
\quad i = 1, 2, \ldots, N.
\]
When the Hamiltonian H is independent of t, $\forall c \in \mathbb{R}^1$, let $H^{-1}(c) = \{(u,\xi) \in \mathbb{R}^N \times \mathbb{R}^N \mid H(u,\xi) = c\}$ be a level set of the Hamiltonian.
Theorem 5.1 The solution curve $\{(u(t), \xi(t)) \mid \forall t\}$ of a Hamiltonian system remains on the same level set.
Proof Suppose (u(t), ξ(t)) is a solution of
\[
\begin{cases}
\dot\xi(t) = -H_u(u(t),\xi(t)), \\
\dot u(t) = H_\xi(u(t),\xi(t)),
\end{cases}
\]
then using
\[ \frac{d}{dt} H(u(t),\xi(t)) = \langle H_u(u(t),\xi(t)), \dot u(t)\rangle + \langle H_\xi(u(t),\xi(t)), \dot\xi(t)\rangle = 0, \]
it is immediate that H(u(t), ξ(t)) = const. for all t.
Applying this theorem to a collection of moving particles, it asserts: the total
energy of an isolated system remains constant (the law of conservation of energy).
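Theorem 5.1 can be observed numerically. The sketch below integrates the Hamilton system of a one-dimensional oscillator with $H(u,\xi) = \frac12(\xi^2/m + ku^2)$ using the classical fourth-order Runge–Kutta scheme and records the largest deviation of H from its initial value. The constants m = 1, k = 4, the step size, and the choice of integrator are illustrative assumptions; the scheme conserves H only approximately, so a small nonzero drift is expected.

```python
def hamiltonian(u, xi, m=1.0, k=4.0):
    """H(u, xi) = (xi^2 / m + k u^2) / 2 for a one-dimensional oscillator."""
    return 0.5 * (xi * xi / m + k * u * u)


def rhs(u, xi, m=1.0, k=4.0):
    """Right-hand side of the Hamilton system: (H_xi, -H_u)."""
    return xi / m, -k * u


def rk4_step(u, xi, h):
    """One classical Runge-Kutta step of size h for (u, xi)."""
    k1u, k1x = rhs(u, xi)
    k2u, k2x = rhs(u + 0.5 * h * k1u, xi + 0.5 * h * k1x)
    k3u, k3x = rhs(u + 0.5 * h * k2u, xi + 0.5 * h * k2x)
    k4u, k4x = rhs(u + h * k3u, xi + h * k3x)
    u_next = u + h * (k1u + 2.0 * k2u + 2.0 * k3u + k4u) / 6.0
    xi_next = xi + h * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
    return u_next, xi_next


def max_energy_drift(u0=1.0, xi0=0.0, h=1e-3, steps=5000):
    """Largest |H(u(t), xi(t)) - H(u0, xi0)| along the computed trajectory."""
    u, xi = u0, xi0
    e0 = hamiltonian(u, xi)
    drift = 0.0
    for _ in range(steps):
        u, xi = rk4_step(u, xi, h)
        drift = max(drift, abs(hamiltonian(u, xi) - e0))
    return drift
```

Calling `max_energy_drift()` should return a value many orders of magnitude smaller than the initial energy, consistent with the trajectory staying on one level set of H.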
5.4 The Hamilton–Jacobi equation
Given a Hamiltonian H = H(t, u, ξ), we call the first order partial differential
equation
∂t S(t, u) + H(t, u, ∇u S(t, u)) = 0 (5.6)
the Hamilton–Jacobi equation (we shall abbreviate it by H-J equation), where
S = S(t, u) is a function of N + 1 variables.
The H-J equation is a fundamental equation in both classical mechanics and
quantum mechanics.
We know for a Mayer field (Ω, ψ), its eikonal g satisfies the Carathéodory
system of equations (5.2):

\[
\begin{cases}
\nabla_u g(t,u) = L_p(t,u,\psi(t,u)), \\
\partial_t g(t,u) = L(t,u,\psi(t,u)) - \langle \psi(t,u), L_p(t,u,\psi(t,u))\rangle.
\end{cases}
\]
Substituting the Legendre transform $\xi = L_p(t,u,\psi(t,u)) = \nabla_u g(t,u)$ into the Carathéodory system of equations, we see that g satisfies the first order partial differential equation:
\[ \partial_t g(t,u) + H(t,u,\nabla_u g(t,u)) = 0, \tag{5.6$'$} \]
where H is the Legendre transform of the Lagrangian L.
Based on the Legendre transform relationship between the Lagrangian and the Hamiltonian, for a given Hamiltonian H, we can write down L and transform a solution (u(t), ξ(t)) of the H-S:
\[
\begin{cases}
\dot\xi_i(t) = -H_{u_i}(u(t),\xi(t)), \\
\dot u_i(t) = H_{\xi_i}(u(t),\xi(t)),
\end{cases}
\quad 1 \le i \le N,
\]
into a solution (u(t), p(t)) of the E-L equation. After integrating, we can express the eikonal g(t, u).
Evidently, if we have a solution (u(t), ξ(t)) of the Hamiltonian system, then by letting
\[ p(t) = H_\xi(t,u(t),\xi(t)), \]
we obtain the solution (u(t), p(t)) of the E-L equation.
Moreover, using (5.5), we can obtain the Lagrangian L(t, u(t), p(t)), whence
\[ g(t,u(t)) - g(t_0,u(t_0)) = \int_{t_0}^{t} L(t,u(t),p(t))\,dt \]
is the solution of the eikonal equation.

That being said, we can derive solutions of the H-J equation from the solu-
tions of the Hamiltonian system by choosing arbitrary initial values. However, it
should be noted that the Hamiltonian system is a system of ordinary differential
equations, whereas the H-J equation is a first order partial differential equation.
Example 5.4 (Propagation of light through a medium) Denote the density of a medium at the point $(t,u) \in \mathbb{R}^1 \times \mathbb{R}^n$ by ρ(t, u). Using the speed of light in vacuum as a unit, suppose the light speed at the point is $\frac{1}{\rho(t,u)}$, then the corresponding Lagrangian is
\[ L(t,u,p) = \rho(t,u)\sqrt{1+p^2}; \]
here
\[ \xi = L_p = \frac{\rho\,p}{\sqrt{1+p^2}}, \quad p = \frac{\xi}{\sqrt{\rho^2-\xi^2}}, \]
\[ H(t,u,\xi) = p\xi - L(t,u,p) = -\sqrt{\rho^2-\xi^2}. \]
The eikonal g satisfies the eikonal equation
\[ g_t = \sqrt{\rho^2 - |\nabla_u g|^2}. \]
This is an H-J equation. Sometimes, we also write it as
\[ g_t^2 + |\nabla_u g|^2 = \rho^2, \]
and
\[ g(t,u) = \text{const.} \]
is the wavefront of light.
The corresponding directional field is
\[ \psi(t,u) = H_\xi(t,u,\nabla_u g) = \frac{\nabla_u g}{\sqrt{\rho^2 - |\nabla_u g|^2}} = \frac{\nabla_u g}{\partial_t g}. \]
Since
\[ \dot u = \psi(t,u), \quad (\dot t, \dot u) = (1, \dot u) = (\partial_t g)^{-1}(\partial_t g, \nabla_u g), \]
its integral curve follows the normal direction of the wavefront g(t, u) = const. in the (t, u)-space. In other words, "a ray of light travels perpendicularly to the wavefronts."
5.5∗ Jacobi’s Theorem
On the other hand, we can also derive the solutions of H-S from the solutions
of the H-J equation.
Definition 5.1 Let $g = g(t, u_1, \ldots, u_N; \lambda_1, \ldots, \lambda_N)$ be a family of solutions of the H-J equation depending on the N independent parameters $(\lambda_1, \ldots, \lambda_N) \in \Lambda$ (here $\Lambda \subset \mathbb{R}^N$ is a region). If $\det(g_{u_i\lambda_j}) \ne 0$, then it is called a complete integral.
Theorem 5.2 (Jacobi) Let the $C^2$ function $g(t, u_1, \ldots, u_N; \lambda_1, \ldots, \lambda_N)$ be a complete integral of the H-J equation. Given 2N parameters $(\alpha,\beta) = (\alpha_1, \ldots, \alpha_N, \beta_1, \ldots, \beta_N)$, suppose the functions
\[
\begin{cases}
u = U(t,\alpha,\beta), \\
p = P(t,\alpha,\beta)
\end{cases}
\]
satisfy
\[
\begin{cases}
g_{\alpha_i}(t, U(t,\alpha,\beta), \alpha) = -\beta_i, \\
P_i(t,\alpha,\beta) = g_{u_i}(t, U(t,\alpha,\beta), \alpha),
\end{cases}
\quad i = 1, 2, \ldots, N, \tag{5.7}
\]
then they form a family of solutions of the Hamiltonian system.
Proof 1. First, differentiating the H-J equation (5.6$'$) with respect to $\alpha_i$, we have:
\[ g_{t,\alpha_i} + \sum_{k=1}^N H_{\xi_k}(t,u,\nabla_u g)\,g_{\alpha_i,u_k} = 0, \quad i = 1, 2, \ldots, N. \]
Substituting $u = U(t,\alpha,\beta)$, we have:
\[ g_{t,\alpha_i}(t,U(t,\alpha,\beta),\alpha) + \sum_{k=1}^N H_{\xi_k}(t,U(t,\alpha,\beta),P(t,\alpha,\beta))\,g_{\alpha_i,u_k}(t,U(t,\alpha,\beta),\alpha) = 0. \tag{5.8} \]
Differentiating the first equation in (5.7) with respect to t, we have:
\[ g_{t,\alpha_i}(t,U(t,\alpha,\beta),\alpha) + \sum_{k=1}^N g_{\alpha_i,u_k}(t,U(t,\alpha,\beta),\alpha)\,\dot U_k(t,\alpha,\beta) = 0. \tag{5.9} \]
Subtracting (5.8) from (5.9) yields:
\[ \sum_{k=1}^N [\dot U_k(t,\alpha,\beta) - H_{\xi_k}(t,U(t,\alpha,\beta),P(t,\alpha,\beta))]\,g_{\alpha_i,u_k}(t,U(t,\alpha,\beta),\alpha) = 0. \]
Since the matrix $(g_{\alpha_i,u_k})$ is invertible, it follows that
\[ \dot U_k(t,\alpha,\beta) = H_{\xi_k}(t,U(t,\alpha,\beta),P(t,\alpha,\beta)), \]
which constitutes one set of equations of the H-S.
2. Next, by differentiating the H-J equation (5.6$'$) with respect to $u_i$, we get
\[ g_{t,u_i}(t,u,\alpha) + H_{u_i}(t,u,\nabla_u g(t,u,\alpha)) + \sum_{k=1}^N H_{\xi_k}(t,u,\nabla_u g(t,u,\alpha))\,g_{u_i,u_k}(t,u,\alpha) = 0. \]
Substituting $u = U(t,\alpha,\beta)$ and $P(t,\alpha,\beta) = \nabla_u g(t,U(t,\alpha,\beta),\alpha)$ into the above equation, and using $\dot U_k = H_{\xi_k}(t,U,P)$ from step 1, we have
\[ -H_{u_i}(t,U,P) = g_{t,u_i}(t,U,\alpha) + \sum_{k=1}^N g_{u_i,u_k}(t,U,\alpha)\,\dot U_k. \tag{5.10} \]
Differentiating the second equation in (5.7) with respect to t, we then have
\[ \dot P_i(t,\alpha,\beta) = g_{u_i,t}(t,U,\alpha) + \sum_{k=1}^N g_{u_i,u_k}(t,U,\alpha)\,\dot U_k(t,\alpha,\beta), \tag{5.11} \]
i.e., comparing with (5.10),
\[ \dot P_i(t,\alpha,\beta) = -H_{u_i}(t,U(t,\alpha,\beta),P(t,\alpha,\beta)). \]
This yields the other set of equations of the H-S.
Remark 5.2 The significance of Jacobi's Theorem is that one can express the general solutions of the H-S via the solutions of the H-J equation. The approach is to solve the system of N implicit function equations:
\[ g_{\alpha_i}(t,u,\alpha) = -\beta_i, \quad i = 1, \ldots, N \]
to obtain
\[ u = U(t,\alpha,\beta) \tag{5.12} \]
and then to substitute it back into the eikonal g to get
\[ p = P(t,\alpha,\beta) = \partial_u g(t,U(t,\alpha,\beta),\alpha). \tag{5.13} \]
The pair (u, p) is precisely the desired general solution of the H-S.
Remark 5.3 Complete integrals and general solutions have different mean-
ings. This is because the H-J equation is a first order partial differential equa-
tion, and according to the uniqueness of solution to a Cauchy problem, its
general solution should contain an arbitrary function ϕ(u), not just 2N indepen-
dent parameters.
However, the general solutions of the H-S can be determined by a complete integral g of the H-J equation. In other words, for a given initial value $(u_0, \xi_0)$, to solve the initial value problem of the H-S
\[ \dot u = H_\xi(t,u,\xi), \quad \dot\xi = -H_u(t,u,\xi), \quad u(0) = u_0, \quad \xi(0) = \xi_0, \]
we may proceed as follows:
Once $(u_0, \xi_0)$ is given, since $\det(g_{u_i\alpha_j}) \ne 0$, we can first apply the Implicit Function Theorem to the second equation of the system
\[
\begin{cases}
\nabla_\alpha g(0,u_0,\alpha) = -\beta, \\
\xi_0 = \nabla_u g(0,u_0,\alpha)
\end{cases}
\]
to solve for
\[ \alpha_0 = \alpha(u_0,\xi_0). \]
Setting
\[ \beta_0 = -\nabla_\alpha g(0,u_0,\alpha_0) \]
and plugging them into (5.12) and (5.13), we obtain the solution of the H-S with initial value $(u_0, \xi_0)$.
Example 5.5 (Harmonic Oscillators) Given the Lagrangian $L = \frac12(mp^2 - ku^2)$, where m and k are positive constants. Then the Hamiltonian is
\[ H = \frac12\left(\frac{p^2}{m} + ku^2\right), \]
where p now denotes the conjugate variable $\xi = L_p$. The corresponding H-S
\[
\begin{cases}
\dot u = \dfrac{p}{m}, \\
\dot p = -ku
\end{cases}
\]
has solution
\[
\begin{cases}
u = C\sin\Bigl(\sqrt{\dfrac{k}{m}}\,(t+t_0)\Bigr), \\
p = C\sqrt{mk}\,\cos\Bigl(\sqrt{\dfrac{k}{m}}\,(t+t_0)\Bigr),
\end{cases}
\]
where $t_0$ and C are arbitrary constants.
We now use the H-J equation
\[ g_t + H(t,u,g_u) = 0 \]
and Jacobi's Theorem to express the H-S solution.
Consider the special eikonal $g(t,u,\alpha) = \varphi(u) - \alpha t$, where α is a parameter and φ is a function yet to be determined. So the H-J equation is
\[ \frac12\left(\frac{\varphi'^2(u)}{m} + ku^2\right) = \alpha, \]
i.e.
\[ \varphi'(u) = \sqrt{m(2\alpha - ku^2)}. \]
Solving this, we have:
\[ g = g(t,u,\alpha) = \int_0^u \sqrt{m(2\alpha - ku^2)}\,du - \alpha t. \]
We now solve the equation
\[ \beta = -g_\alpha = t - \sqrt{\frac{m}{k}}\,\arcsin\Bigl(\sqrt{\frac{k}{2\alpha}}\,u\Bigr) \]
to get
\[ u = \sqrt{\frac{2\alpha}{k}}\,\sin\Bigl(\sqrt{\frac{k}{m}}\,(t-\beta)\Bigr). \]
Substituting it back into g, it follows that
\[ p = g_u = \varphi'(u) = \sqrt{2\alpha m}\,\cos\Bigl(\sqrt{\frac{k}{m}}\,(t-\beta)\Bigr). \]
This gives the solution of the Hamiltonian system of a harmonic oscillator involving the two parameters α, β.
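The two-parameter family of Example 5.5 can be checked numerically. The sketch below evaluates (u, p) from the formulas above, measures the residuals of the H-S by central differences, and also carries out the procedure of Remark 5.3 for this example: from initial data $(u_0, p_0)$ it computes $\alpha_0 = H(u_0, p_0)$ and $\beta_0 = -g_\alpha(0, u_0, \alpha_0)$. The constants m = 2, k = 3, the helper names, and the restriction $p_0 > 0$ (the branch $\varphi' = +\sqrt{\cdot}$ chosen in the example) are assumptions made for illustration.

```python
import math

M, K = 2.0, 3.0  # hypothetical mass and spring constant


def jacobi_solution(t, alpha, beta, m=M, k=K):
    """(u, p) obtained from the complete integral g = phi(u) - alpha * t."""
    w = math.sqrt(k / m)
    u = math.sqrt(2.0 * alpha / k) * math.sin(w * (t - beta))
    p = math.sqrt(2.0 * alpha * m) * math.cos(w * (t - beta))
    return u, p


def hs_residuals(t, alpha, beta, m=M, k=K, h=1e-5):
    """Residuals (u' - p/m, p' + k u), derivatives by central differences."""
    u_plus, p_plus = jacobi_solution(t + h, alpha, beta, m, k)
    u_minus, p_minus = jacobi_solution(t - h, alpha, beta, m, k)
    u, p = jacobi_solution(t, alpha, beta, m, k)
    du = (u_plus - u_minus) / (2.0 * h)
    dp = (p_plus - p_minus) / (2.0 * h)
    return du - p / m, dp + k * u


def parameters_from_initial_data(u0, p0, m=M, k=K):
    """Remark 5.3 for this example; requires p0 > 0."""
    alpha0 = 0.5 * (p0 * p0 / m + k * u0 * u0)  # from p0 = g_u(0, u0, alpha)
    s = math.sqrt(k / (2.0 * alpha0)) * u0
    beta0 = -math.sqrt(m / k) * math.asin(s)    # beta0 = -g_alpha(0, u0, alpha0)
    return alpha0, beta0
```

For example, with `alpha0, beta0 = parameters_from_initial_data(0.4, 1.1)`, evaluating `jacobi_solution(0.0, alpha0, beta0)` should reproduce the initial data up to rounding, and `hs_residuals` should be close to zero at any time t.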
Exercises
1. For each of the given Lagrangians, find the Hamiltonian and solve the Hamiltonian system.
(1) $L = (p + ku)^2$, $k \ne 0$.
(2) $L = e^{-u}\sqrt{1+p^2}$.
Write the H-J equation and find its complete solutions.
2. Suppose ∀ (t, u), L(t, u, p) is convex in p. Prove that
\[ H(t,u,\xi) = \sup_{p\in\mathbb{R}^N} \{\langle p, \xi\rangle - L(t,u,p)\}. \]
Lecture 6
Variational problems involving multivariate integrals
Previously, we have discussed variational problems involving single integrals.

However, when the unknown is a multivariate (or vector-valued) function, we
are confronted with variational problems involving multivariate integrals. In this
lecture, we extend the theory of calculus of variations from a single integral setting
to a multivariate integral setting, including the E-L equation, the criteria for weak
and strong minima, Jacobi fields, and the Weierstrass excess function, etc.
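Let $\Omega \subset \mathbb{R}^n$ be a bounded region with $\partial\Omega \in C^1$. Given a Lagrangian $L = L(x,u,p) \in C^2(\bar\Omega \times \mathbb{R}^N \times \mathbb{R}^{nN})$ and a boundary function $\Phi \in C^1(\partial\Omega, \mathbb{R}^N)$, we want to minimize the functional
\[ I(u) = \int_\Omega L(x,u(x),\nabla u(x))\,dx, \]
under the boundary condition $u \in M := \{v \in C^1(\bar\Omega, \mathbb{R}^N) \mid v|_{\partial\Omega} = \Phi|_{\partial\Omega}\}$.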

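For simplicity, we introduce the following notation:
\[
\begin{aligned}
x &= (x_\alpha)_1^n = (x_1, \ldots, x_n), \\
u &= (u^i)_1^N = (u^1, \ldots, u^N), \\
p &= (p^i_\alpha)_{1\le i\le N,\ 1\le\alpha\le n}, \\
p^i &= (p^i_1, \ldots, p^i_n), \quad 1 \le i \le N, \\
\partial_\alpha u &= \frac{\partial u}{\partial x_\alpha}, \quad \alpha = 1, \ldots, n, \\
\nabla u &= \left(\frac{\partial u^i}{\partial x_\alpha}\right) = (u^i_\alpha).
\end{aligned}
\]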
6.1 Derivation of the Euler–Lagrange equation
$u_0 \in M$ is said to be a minimum of I on M if
\[ I(u) \ge I(u_0), \quad \forall u \in U, \]
where $U \subset M$ is a neighborhood of $u_0$. When the $C^1$-topology is used, $u_0$ is called a weak minimum; when the $C^0$-topology is used, $u_0$ is called a strong minimum.
Similar to the single variable setting, assuming such a $u_0$ exists, we seek a necessary condition that it must satisfy, namely the E-L equation.
For brevity, we denote $\tau = (x, u_0(x), \nabla u_0(x))$.
Theorem 6.1 Suppose $L \in C^2$, $u_0 \in C^2$, and $u_0$ is a minimum of I on M, then it satisfies the following E-L equation
\[ \sum_{\alpha=1}^n \frac{\partial L_{p^i_\alpha}(x,u_0(x),\nabla u_0(x))}{\partial x_\alpha} - L_{u^i}(x,u_0(x),\nabla u_0(x)) = 0, \quad 1 \le i \le N. \]
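Proof $\forall \varphi \in C_0^1(\bar\Omega, \mathbb{R}^N)$, consider the 1-variable function $s \mapsto g(s) := I(u_0 + s\varphi)$. Since 0 is a minimum of g, we deduce that
\[
\begin{aligned}
0 = g'(0) &= \sum_{i=1}^N \int_\Omega \Bigl[L_{u^i}(\tau)\varphi^i(x) + \sum_{\alpha=1}^n L_{p^i_\alpha}(\tau)\,\partial_{x_\alpha}\varphi^i(x)\Bigr]dx \\
&= \sum_{i=1}^N \int_\Omega \Bigl[L_{u^i}(\tau) - \sum_{\alpha=1}^n \partial_{x_\alpha} L_{p^i_\alpha}(\tau)\Bigr]\varphi^i(x)\,dx.
\end{aligned}
\]
The assertion follows from a result (Lemma 6.1) similar to the Du Bois-Reymond lemma.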
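We will prove a generalized version of the Du Bois-Reymond lemma in higher dimensions. But first, we introduce the 1-variable bump function:
\[
\psi(t) =
\begin{cases}
\exp\left(\dfrac{-1}{1-|t|^2}\right), & |t| < 1, \\
0, & |t| \ge 1.
\end{cases}
\]
For the multivariable $x = (x_1, \ldots, x_n)$, let
\[ \varphi(x) = c_n^{-1}\psi(|x|), \]
where $|x| = (x_1^2 + \cdots + x_n^2)^{1/2}$ and $c_n = \int_{\mathbb{R}^n} \psi(|x|)\,dx$.
For ε > 0 sufficiently small, we further define
\[ \varphi_\varepsilon(x) = \varepsilon^{-n}\varphi\left(\frac{x}{\varepsilon}\right). \]
Given a region $\Omega \subset \mathbb{R}^n$, for any δ > 0 sufficiently small, we denote $\Omega_\delta = \{x \in \Omega \mid d(x,\partial\Omega) \ge \delta,\ |x| \le 1/\delta\}$.
Suppose $u \in L^1_{loc}(\Omega)$, let
\[ u_\delta(x) = \int_\Omega u(y)\varphi_\delta(x-y)\,dy, \quad \forall x \in \Omega_\delta. \]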
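Using the change of variable $z = (y - x)/\delta$, we obtain:
\[ u_\delta(x) = \int_{B_1(\theta)} u(x+\delta z)\varphi(z)\,dz, \quad \forall x \in \Omega_\delta, \]
where $B_1(\theta)$ is the unit ball in $\mathbb{R}^n$ centered at the origin. Hence, $\forall \delta_0 > 0$,
\[ \int_{\Omega_{\delta_0}} |u(x) - u_\delta(x)|\,dx \le \int_{B_1(\theta)} \varphi(z)\,dz \int_{\Omega_{\delta_0}} |u(x) - u(x+\delta z)|\,dx \to 0, \quad \text{as } \delta \to 0. \]
In fact, if u is replaced by a continuous function v defined on $\bar\Omega_\delta$, the above limit clearly holds. Furthermore, since $C(\bar\Omega_{\delta_0})$ is dense in $L^1(\Omega_{\delta_0})$, the limit remains zero for $u \in L^1_{loc}(\Omega)$.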
In summary, we have:
Lemma 6.1 Suppose $u \in L^1_{loc}(\Omega)$, then $\forall \delta_0 > 0$,
\[ \int_{\Omega_{\delta_0}} |u(x) - u_\delta(x)|\,dx \to 0 \quad \text{as } \delta \to 0. \tag{6.1} \]
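Lemma 6.1 can be illustrated in one dimension. The sketch below mollifies a step function on Ω = (−1, 1) with the bump function ψ and estimates the $L^1$ error over $\Omega_{\delta_0} = [-0.5, 0.5]$ by the midpoint rule. The quadrature sizes, the choice of test function, and all function names are illustrative assumptions; the computed numbers are approximations, not exact values.

```python
import math


def psi(t):
    """Bump function: exp(-1 / (1 - t^2)) for |t| < 1, zero otherwise."""
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0


def integrate(f, a, b, n=2000):
    """Midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


C1 = integrate(psi, -1.0, 1.0)  # normalizing constant c_n for n = 1


def phi(z):
    """Normalized mollifier with integral one."""
    return psi(z) / C1


def mollify(u, x, delta, n=400):
    """u_delta(x) = integral over |z| < 1 of u(x + delta z) phi(z) dz."""
    return integrate(lambda z: u(x + delta * z) * phi(z), -1.0, 1.0, n)


def l1_error(u, delta, a=-0.5, b=0.5, n=400):
    """Approximate integral of |u - u_delta| over [a, b]."""
    return integrate(lambda x: abs(u(x) - mollify(u, x, delta)), a, b, n)
```

With `step = lambda x: 1.0 if x > 0 else 0.0`, the values `l1_error(step, 0.2)`, `l1_error(step, 0.1)` and `l1_error(step, 0.05)` should decrease toward zero as δ shrinks, as (6.1) predicts.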
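Corollary 6.1 Suppose $u \in L^1_{loc}(\Omega)$ and
\[ \int_\Omega u(x)\varphi(x)\,dx = 0, \quad \forall \varphi \in C_0^\infty(\Omega), \]
then u(x) = 0 a.e. for x ∈ Ω.
Proof Note that $\forall \delta_0 > 0$, since
\[ \varphi_\delta(x - y) \in C_0^\infty(\Omega), \quad \forall x \in \Omega_{\delta_0}, \ \forall \delta < \delta_0, \]
by assumption one has
\[ u_\delta(x) = \int_\Omega u(y)\varphi_\delta(x-y)\,dy = 0, \quad \forall x \in \Omega_{\delta_0}, \ \forall \delta < \delta_0. \]
It follows from Lemma 6.1 that $u_\delta \to u$ in $L^1(\Omega_{\delta_0})$, hence u(x) = 0 a.e. for $x \in \Omega_{\delta_0}$. Since $\delta_0 > 0$ is arbitrary, u(x) = 0 a.e. for x ∈ Ω.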
Just like in the 1-variable setting, we call $E_L : u \mapsto v = (v_1, \ldots, v_N)$, where
\[ v_i = \sum_{\alpha=1}^n \frac{\partial L_{p^i_\alpha}(x,u(x),\nabla u(x))}{\partial x_\alpha} - L_{u^i}(x,u(x),\nabla u(x)), \quad i = 1, \ldots, N, \]
the Euler–Lagrange operator of L.

Remark 6.1 Without the hypothesis $u \in C^2$, one can still define the E-L operator by interpreting the term $\frac{\partial}{\partial x_\alpha}$ in front of $L_{p^i_\alpha}(x,u(x),\nabla u(x))$ as the derivative of a distribution, so the E-L equation holds in the sense of distributions.
Similar to Remarks 2.2 and 2.3 in Lecture 2, when dealing with variational problems involving multivariate integrals, we can change the domain M of a functional from $C^1(\bar\Omega, \mathbb{R}^N)$ to $\mathrm{Lip}(\bar\Omega, \mathbb{R}^N)$, or more specifically, to $PWC^1(\bar\Omega, \mathbb{R}^N)$, the class of piecewise $C^1$ functions. Here $u \in PWC^1(\bar\Omega, \mathbb{R}^N)$ if there exists a finite collection of (n − 1)-dimensional piecewise $C^1$-hypersurfaces $\{S_1, \ldots, S_k\}$ such that the continuous function u belongs to $C^1(\bar\Omega \setminus \bigcup_{j=1}^k S_j, \mathbb{R}^N)$ and u has normal derivatives on both sides of each $S_j$.
Example 6.1 (Dirichlet integrals) Assume N = 1, $L(p) = |p|^2/2$. For the functional
\[ D(u) = \frac12\int_\Omega |\nabla u(x)|^2\,dx, \]
its E-L equation is
\[ \Delta u = \sum_{\alpha=1}^n \frac{\partial^2 u}{\partial x_\alpha^2} = 0, \quad \forall x \in \Omega. \]
This is a harmonic equation, also called the Laplace equation.
We can also consider a more general variational problem, such as
\[ L(x,u,p) = \frac12|p|^2 - \frac{a(x)}{\alpha+1}|u|^{\alpha+1}, \quad \alpha > 0, \]
where $a \in C(\bar\Omega)$. Its E-L equation is
\[ -\Delta u(x) = a(x)|u|^{\alpha-1}u, \quad \forall x \in \Omega. \]
Example 6.2 (Wave equations) Denote by $\mathbb{R}^1 \times \mathbb{R}^3$ the space-time continuum. A given point has coordinates (t, x), where t represents time and $x = (x_1, x_2, x_3)$ represents its spatial location. We use $u = u(t, x_1, x_2, x_3)$ to represent the propagation of an elastic wave in the region $\Omega \subset \mathbb{R}^1 \times \mathbb{R}^3$.
The kinetic energy of the elastic wave is
\[ T = \frac12\int_\Omega |\partial_t u(t,x)|^2\,dt\,dx \]
and the potential energy is
\[ U = \frac12\int_\Omega |\nabla_x u(t,x)|^2\,dt\,dx. \]
The Lagrangian is
\[ L(u) = T(u) - U(u). \]
According to the principle of stationary action, the propagation of the elastic wave is a solution of the E-L equation
\[ \Box u = \partial_t^2 u - \Delta u = 0 \]
of L. This is known as the d'Alembert equation.
Likewise, if there are internal and/or external forces involved, then some terms are added to the potential energy. For instance,
\[ U = \frac12\int_\Omega (|\nabla_x u(t,x)|^2 + M^2|u(t,x)|^2)\,dt\,dx, \]
where M > 0 is a constant. In this case, the corresponding E-L equation is a Klein–Gordon equation:
\[ \Box u + M^2 u = \partial_t^2 u - \Delta u + M^2 u = 0. \]
Another example is
\[ U = \int_\Omega \left(\frac12|\nabla_x u(t,x)|^2 + \frac14|u(t,x)|^4\right)dt\,dx, \]
then the corresponding E-L equation is a nonlinear wave equation:
\[ \Box u + u^3 = \partial_t^2 u - \Delta u + u^3 = 0. \]
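As a small check of the linear equations in Example 6.2, the sketch below tests a plane wave against the Klein–Gordon equation $\partial_t^2 u - \Delta u + M^2 u = 0$ using second-order central differences. It assumes one space dimension for brevity, and the wave number 1.5, the constant M = 2, and the function names are hypothetical choices. A plane wave $\cos(kx - \omega t)$ solves the equation when $\omega^2 = k^2 + M^2$.

```python
import math


def plane_wave(t, x, kx=1.5, M=2.0):
    """cos(kx * x - omega * t) with omega^2 = kx^2 + M^2."""
    omega = math.sqrt(kx * kx + M * M)
    return math.cos(kx * x - omega * t)


def klein_gordon_residual(u, t, x, M=2.0, h=1e-3):
    """u_tt - u_xx + M^2 u at (t, x), derivatives by central differences."""
    utt = (u(t + h, x) - 2.0 * u(t, x) + u(t - h, x)) / (h * h)
    uxx = (u(t, x + h) - 2.0 * u(t, x) + u(t, x - h)) / (h * h)
    return utt - uxx + M * M * u(t, x)
```

Evaluating `klein_gordon_residual(plane_wave, 0.7, -0.3)` should give a value close to zero, limited only by the finite-difference error, whereas a wave with the wrong frequency leaves a visible residual.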
Example 6.3 (Minimal surfaces) Let $\Omega \subset \mathbb{R}^n$. Given $u \in C^1(\bar\Omega)$, its graph $\{(x, u(x)) \mid x \in \bar\Omega\}$ is a hypersurface, whose area is given by
\[ A(u) = \int_\Omega \sqrt{1 + |\nabla u(x)|^2}\,dx. \]
Under the boundary condition $u|_{\partial\Omega} = \Phi$, the hypersurface minimizing the area satisfies the E-L equation
\[ \operatorname{div}\left(\frac{\nabla u(x)}{\sqrt{1+|\nabla u(x)|^2}}\right) = 0, \quad \forall x \in \Omega. \tag{6.2} \]
Notice that the mean curvature of the hypersurface is
\[ H = \frac{1}{n}\operatorname{div}\left(\frac{\nabla u(x)}{\sqrt{1+|\nabla u(x)|^2}}\right). \]
Suppose we are given the mean curvature function H(x), x ∈ Ω, then u satisfies the mean curvature equation
\[ \operatorname{div}\left(\frac{\nabla u(x)}{\sqrt{1+|\nabla u(x)|^2}}\right) = nH(x), \quad \forall x \in \Omega. \tag{6.3} \]
It is worth noting that (6.3) is the E-L equation of the functional
\[ I(u) = \int_\Omega \left(\sqrt{1+|\nabla u(x)|^2} + nH(x)u(x)\right)dx. \]
In particular, when n = 2, equation (6.3) reduces to
\[ (1+u_y^2)u_{xx} - 2u_xu_yu_{xy} + (1+u_x^2)u_{yy} = 2H(x)(1+u_x^2+u_y^2)^{\frac32}. \tag{6.4} \]
Comparing (6.2) and (6.3), the mean curvature equation with zero mean curvature coincides with the minimal surface equation. Therefore, surfaces of zero mean curvature are usually called minimal surfaces.
We now find a special solution u(x, y) = f(x) + g(y) of the minimal surface equation. Substituting this into (6.4) with H = 0 yields
\[ (1+g'(y)^2)f''(x) + (1+f'(x)^2)g''(y) = 0, \]
i.e.
\[ \frac{f''(x)}{1+f'(x)^2} = c = -\frac{g''(y)}{1+g'(y)^2}. \]
From this, we find $\arctan f'(x) = cx$, or equivalently
\[ f(x) = -\frac{1}{c}\ln|\cos cx|. \]
Likewise,
\[ g(y) = \frac{1}{c}\ln|\cos cy|. \]
Thus,
\[ u(x,y) = \frac{1}{c}\ln\left|\frac{\cos cy}{\cos cx}\right|. \]
The minimal surface defined by u is known as the Scherk surface.
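The Scherk surface can be tested against the n = 2 minimal surface equation, i.e. (6.4) with H = 0, by finite differences. The sketch below assumes c = 1 and evaluates at a point where both cosines are positive; the step size and function names are illustrative. A paraboloid is included only as a contrasting example of a surface that is not minimal.

```python
import math


def scherk(x, y, c=1.0):
    """u(x, y) = (1/c) ln(cos(c y) / cos(c x)), where both cosines are positive."""
    return math.log(math.cos(c * y) / math.cos(c * x)) / c


def minimal_surface_residual(u, x, y, h=1e-3):
    """(1 + u_y^2) u_xx - 2 u_x u_y u_xy + (1 + u_x^2) u_yy by central differences."""
    ux = (u(x + h, y) - u(x - h, y)) / (2.0 * h)
    uy = (u(x, y + h) - u(x, y - h)) / (2.0 * h)
    uxx = (u(x + h, y) - 2.0 * u(x, y) + u(x - h, y)) / (h * h)
    uyy = (u(x, y + h) - 2.0 * u(x, y) + u(x, y - h)) / (h * h)
    uxy = (u(x + h, y + h) - u(x + h, y - h)
           - u(x - h, y + h) + u(x - h, y - h)) / (4.0 * h * h)
    return (1.0 + uy * uy) * uxx - 2.0 * ux * uy * uxy + (1.0 + ux * ux) * uyy


def paraboloid(x, y):
    """u = x^2 + y^2, which does not satisfy the minimal surface equation."""
    return x * x + y * y
```

At the sample point (0.3, −0.5), the residual for `scherk` should be near zero, while the residual for `paraboloid` is of order one.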
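Example 6.4 (Maxwell's equations) Consider a point with space-time coordinate $(x_0, x_1, x_2, x_3) \in \mathbb{R}^1 \times \mathbb{R}^3$, where $x = (x_1, x_2, x_3)$ denotes its spatial coordinate and $x_0 = ct$, t is time and c is the speed of light.
In an electromagnetic field, the electric charge ρ and the electric current j are both functions of space-time. Let $E = (E_1, E_2, E_3)$ and $B = (B_1, B_2, B_3)$ denote the electric field and the magnetic field respectively, which are also functions of space-time.
Maxwell's equations can be written as
\[
\begin{cases}
-\dfrac{1}{c}\dfrac{\partial B}{\partial t} = \nabla \times E & \text{(Faraday's law of induction)} \\
\nabla \cdot B = 0 & \text{(Gauss's law for magnetism)} \\
\nabla \times B = \dfrac{1}{c}\dfrac{\partial E}{\partial t} + \dfrac{4\pi}{c}\,j & \text{(Ampère's circuital law)} \\
\nabla \cdot E = 4\pi\rho & \text{(Gauss's law for electric charge)}
\end{cases}
\]
Since ∇ · B = 0, there exists a magnetic potential $A = (A_1, A_2, A_3)$ such that
\[ \nabla \times A = B. \]
From Faraday's law of induction,
\[ \nabla \times \left(E + \frac{\partial A}{\partial x_0}\right) = \nabla \times E + \frac{1}{c}\frac{\partial B}{\partial t} = 0, \]
we can deduce that there exists an electric potential $A_0$ such that
\[ E + \frac{\partial A}{\partial x_0} = \nabla A_0. \]
We call $A = (A_0, A_1, A_2, A_3)$ an electromagnetic potential and let
\[ F_{ij} = \frac{\partial A_j}{\partial x_i} - \frac{\partial A_i}{\partial x_j}; \]
it then follows that
\[
(F_{ij}) =
\begin{pmatrix}
0 & -E_1 & -E_2 & -E_3 \\
E_1 & 0 & B_3 & -B_2 \\
E_2 & -B_3 & 0 & B_1 \\
E_3 & B_2 & -B_1 & 0
\end{pmatrix}.
\]
We now define the Lagrangian
\[ L = -\frac{1}{c}\left(\frac{1}{c}\sum_{i=0}^3 j_iA_i + \frac{1}{16\pi}\sum_{i,k=0}^3 F_{ik}^2\right) \]
and $J = (j_0, j_1, j_2, j_3)$, where $j_0 = c\rho$ and $j = (j_1, j_2, j_3)$.
The corresponding functional is
\[ I(A) = \int_{\mathbb{R}^1\times\mathbb{R}^3} L\,d^4x. \]
From this, we can deduce the E-L equation
\[ \frac{\partial L}{\partial A_i} - \frac{\partial}{\partial x_j}\frac{\partial L}{\partial p_{ji}} = 0, \]
where
\[ p_{ji} = \frac{\partial A_i}{\partial x_j}. \]
Since
\[ \frac{\partial L}{\partial A_i} = -\frac{1}{c^2}\,j_i, \quad \frac{\partial L}{\partial p_{ji}} = \frac{1}{4c\pi}F_{ij}, \]
we have
\[ \frac{\partial F_{ij}}{\partial x_j} = -\frac{4\pi}{c}\,j_i, \quad i = 0, 1, 2, 3. \]
This is precisely Ampère's circuital law (i = 1, 2, 3) and Gauss's law for electric charge (i = 0).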
Remark 6.2 $(A_0, A_1, A_2, A_3)$ is not uniquely determined by E and B. In fact, for any function $f \in C^1$, if we use
\[ A_j'(x) = A_j(x) + \frac{\partial f}{\partial x_j}, \quad j = 0, 1, 2, 3 \]
to replace $(A_j)_0^3$, then the corresponding E, B remain unchanged. The above transformation is called a gauge transformation. If, in addition, we impose the Lorentz condition
\[ -\frac{\partial A_0}{\partial x_0} + \sum_{j=1}^3 \frac{\partial A_j}{\partial x_j} = 0, \]
it will resolve the non-uniqueness issue.
6.2 Boundary conditions
Similar to variational problems for single integrals, when dealing with variational problems involving multivariate integrals, depending on the requirement of the functional on the boundary of its domain, the resulting E-L equation also has to meet certain boundary conditions. Previously, we have discussed the scenario where a given domain M comes with a prescribed boundary function:
\[ M = \{u \in C^1(\bar\Omega, \mathbb{R}^N) \mid u|_{\partial\Omega} = \Phi\}. \]
As a consequence, the resulting $u^*$ has to satisfy not only the E-L equation
\[ E_L(u^*) = 0, \]
but also the Dirichlet boundary condition:
\[ u^*|_{\partial\Omega} = \Phi. \]
Suppose we change the domain to $M = C^1(\bar\Omega, \mathbb{R}^N)$; in other words, on the boundary ∂Ω of Ω, we impose no condition whatsoever. Then via an argument similar to that used in the single integral setting, we see that the $C^2$ extremal function $u^*$ of the functional I again satisfies the E-L differential equation
\[ E_L(u^*) = 0. \]
Furthermore, using integration by parts, we have:
\[
\begin{aligned}
\delta I(u^*,\varphi) &= \int_\Omega \sum_{i=1}^N \Bigl[L_{u^i}(x,u^*(x),\nabla u^*(x))\varphi^i(x) + \sum_{\alpha=1}^n L_{p^i_\alpha}(x,u^*(x),\nabla u^*(x))\,\partial_\alpha\varphi^i(x)\Bigr]dx \\
&= -\int_\Omega \langle E_L(u^*), \varphi\rangle\,dx + \int_{\partial\Omega} \sum_{\alpha=1}^n\sum_{i=1}^N \nu_\alpha(x)L_{p^i_\alpha}(x,u^*(x),\nabla u^*(x))\,\varphi^i(x)\,dH^{n-1}(x),
\end{aligned}
\]
where $dH^{n-1}(x)$ denotes the area element of ∂Ω and $\nu(x) = (\nu_1(x), \nu_2(x), \ldots, \nu_n(x))$ denotes the unit outward normal vector of ∂Ω. Since $E_L(u^*) = 0$ and the values of φ on ∂Ω are arbitrary, this gives rise to the Neumann boundary condition:
\[ \sum_{\alpha=1}^n \nu_\alpha L_{p^i_\alpha}(x,u^*(x),\nabla u^*(x))\Big|_{\partial\Omega} = 0, \quad i = 1, \ldots, N. \tag{6.5} \]
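To see why (6.5) deserves the name, it may help to specialize it to the Dirichlet integral of Example 6.1. The following short computation assumes N = 1 and $L(p) = |p|^2/2$.

```latex
\[
L(p) = \tfrac12 |p|^2 \;\Longrightarrow\; L_{p_\alpha}(p) = p_\alpha ,
\]
\[
\sum_{\alpha=1}^n \nu_\alpha L_{p_\alpha}(\nabla u^*)\Big|_{\partial\Omega}
= \sum_{\alpha=1}^n \nu_\alpha\,\partial_\alpha u^*\Big|_{\partial\Omega}
= \frac{\partial u^*}{\partial \nu}\Big|_{\partial\Omega} = 0 .
\]
```

So for the Dirichlet integral with no constraint on ∂Ω, the extremal is a harmonic function whose outward normal derivative vanishes on the boundary.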
6.3 Second order variations
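Given $\Omega \subset \mathbb{R}^n$ and a Lagrangian $L \in C^2(\bar\Omega \times \mathbb{R}^N \times \mathbb{R}^{nN})$, we define the functional $I(u) = \int_\Omega L(x,u(x),\nabla u(x))\,dx$. Suppose $u_0$ is a solution of the E-L equation; we now study the second order variation of the functional I.
$\forall \varphi \in C_0^\infty(\Omega, \mathbb{R}^N)$, we continue to use the previous 1-variable function $g(s) = I(u_0 + s\varphi)$ and we have:
\[
\begin{aligned}
\delta^2 I(u_0,\varphi) = g''(0) &= \frac{d^2}{ds^2} I(u_0 + s\varphi)\Big|_{s=0} \\
&= \sum_{i,j=1}^N \int_\Omega \Bigl[L_{u^iu^j}(\tau)\varphi^i(x)\varphi^j(x) + 2\sum_{\alpha=1}^n L_{u^ip^j_\alpha}(\tau)\varphi^i(x)\,\partial_\alpha\varphi^j(x) \\
&\qquad\qquad + \sum_{\alpha,\beta=1}^n L_{p^i_\alpha p^j_\beta}(\tau)\,\partial_\alpha\varphi^i(x)\,\partial_\beta\varphi^j(x)\Bigr]dx,
\end{aligned}
\]
where $\tau = (x, u_0(x), \nabla u_0(x))$.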

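For simplicity, we introduce the following notation.
\[ A = (L_{p^j_\alpha p^k_\beta}(x,u,p)), \quad B = (L_{p^j_\beta u^k}(x,u,p)), \quad C = (L_{u^ju^k}(x,u,p)), \]
and
\[
\begin{aligned}
A_{u_0} &= (a^{jk}_{\alpha\beta}) = (L_{p^j_\alpha p^k_\beta}(\tau)), \\
B_{u_0} &= (b^{jk}_\beta) = (L_{p^j_\beta u^k}(\tau)), \\
C_{u_0} &= (c^{jk}) = (L_{u^ju^k}(\tau)).
\end{aligned}
\]
Furthermore, we denote
\[ Q_{u_0}(\varphi) = \delta^2 I(u_0,\varphi) = \int_\Omega [A_{u_0}(\nabla\varphi,\nabla\varphi) + 2B_{u_0}(\nabla\varphi,\varphi) + C_{u_0}(\varphi,\varphi)]\,dx. \]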
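If $u_0$ is a weak minimum, then it is necessary to have
\[ Q_{u_0}(\varphi) \ge 0, \quad \forall \varphi \in C_0^1(\Omega, \mathbb{R}^N), \tag{6.6} \]
where $C_0^1(\Omega, \mathbb{R}^N)$ is the closure of $C_0^\infty(\Omega, \mathbb{R}^N)$ in $C^1(\bar\Omega, \mathbb{R}^N)$.
Conversely, suppose $u_0 \in C^1(\bar\Omega, \mathbb{R}^N)$ satisfies the E-L equation and there exists λ > 0 such that
\[ Q_{u_0}(\varphi) \ge \lambda\int_\Omega [|\nabla\varphi|^2 + |\varphi|^2]\,dx, \quad \forall \varphi \in C_0^1(\Omega, \mathbb{R}^N), \tag{6.7} \]
then $u_0$ is a strict minimum of I. The proof is identical to the case n = 1 (we refer to Theorem 3.1).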
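Analogous to the n = 1 case, we have a similar Legendre–Hadamard condition.
$\forall x_0 \in \Omega$, let $v \in C_0^\infty(B_1(\theta), \mathbb{R}^N)$. For μ > 0 sufficiently small, let
\[ \varphi(x) = \mu v\left(\frac{x - x_0}{\mu}\right). \]
Substituting it into (6.6) yields
\[
Q_{u_0}(\varphi) = \mu^n\int_{B_1(\theta)} \bigl[A_{u_0}(x_0+\mu y)\nabla v(y)\nabla v(y) + 2\mu B_{u_0}(x_0+\mu y)v(y)\nabla v(y) + \mu^2 C_{u_0}(x_0+\mu y)v(y)v(y)\bigr]\,dy \ge 0.
\]
Letting μ → 0, we see that
\[ \sum_{j,k=1}^N\sum_{\alpha,\beta=1}^n a^{jk}_{\alpha\beta}(x_0)\int_{B_1(\theta)} \partial_\alpha v^j(y)\,\partial_\beta v^k(y)\,dy \ge 0. \]
Now for any $\rho \in C_0^\infty(B_1(\theta), \mathbb{R}^1)$ satisfying $\int_{B_1(\theta)} \rho^2(y)\,dy = 1$, $\forall \xi \in \mathbb{R}^N$, and $\forall \eta \in \mathbb{R}^n$, we define
\[ v(y) = \xi\cos(t\eta\cdot y)\rho(y) \]
and
\[ v(y) = \xi\sin(t\eta\cdot y)\rho(y). \]
Substituting these into the above inequality respectively, adding the results, dividing by $t^2$, and letting t → ∞, we have:
\[ \sum_{j,k=1}^N\sum_{\alpha,\beta=1}^n a^{jk}_{\alpha\beta}(x_0)\xi^j\xi^k\eta_\alpha\eta_\beta + O(t^{-1}) \ge 0, \]
i.e.
\[ \sum_{j,k=1}^N\sum_{\alpha,\beta=1}^n a^{jk}_{\alpha\beta}(x_0)\xi^j\xi^k\eta_\alpha\eta_\beta \ge 0. \]
This is the Legendre–Hadamard condition
\[ \sum_{j,k=1}^N\sum_{\alpha,\beta=1}^n L_{p^j_\alpha p^k_\beta}(x,u_0(x),\nabla u_0(x))\,\xi^j\xi^k\eta_\alpha\eta_\beta \ge 0, \quad \forall (x,\xi,\eta) \in \Omega \times \mathbb{R}^N \times \mathbb{R}^n. \tag{6.8} \]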
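The step with the cosine and sine test functions in the derivation of (6.8) can be made explicit. The computation below writes $v_c$ and $v_s$ for the two choices of v (labels introduced here only for this calculation) and shows that the mixed terms cancel when the two contributions are added.

```latex
\[
\partial_\alpha v_c^j = \xi^j\bigl(-t\eta_\alpha \sin(t\eta\cdot y)\,\rho + \cos(t\eta\cdot y)\,\partial_\alpha\rho\bigr),
\qquad
\partial_\alpha v_s^j = \xi^j\bigl(t\eta_\alpha \cos(t\eta\cdot y)\,\rho + \sin(t\eta\cdot y)\,\partial_\alpha\rho\bigr).
\]
% The cross terms in sin*cos have opposite signs and cancel:
\[
\partial_\alpha v_c^j\,\partial_\beta v_c^k + \partial_\alpha v_s^j\,\partial_\beta v_s^k
= \xi^j\xi^k\bigl(t^2\eta_\alpha\eta_\beta\,\rho^2 + \partial_\alpha\rho\,\partial_\beta\rho\bigr).
\]
% Integrating over B_1(theta) with \int \rho^2 = 1 and dividing by t^2:
\[
\sum_{j,k=1}^N\sum_{\alpha,\beta=1}^n a^{jk}_{\alpha\beta}(x_0)\,\xi^j\xi^k
\Bigl(\eta_\alpha\eta_\beta + t^{-2}\int_{B_1(\theta)}\partial_\alpha\rho\,\partial_\beta\rho\,dy\Bigr) \ge 0 .
\]
```

The remainder is in fact of order $t^{-2}$, which is consistent with the $O(t^{-1})$ bound stated in the text, and letting t → ∞ gives the inequality at $x_0$.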
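If we adopt the rank-1 matrix notation
\[ \pi = (\pi^i_\alpha) = (\xi^i\eta_\alpha), \]
then (6.8) can be written in the following equivalent form:
\[ \sum_{j,k=1}^N\sum_{\alpha,\beta=1}^n L_{p^j_\alpha p^k_\beta}(x,u_0(x),\nabla u_0(x))\,\pi^j_\alpha\pi^k_\beta \ge 0, \quad \forall \pi, \ \operatorname{rank}(\pi) = 1. \]
If ∃ λ > 0 such that $\forall x \in \Omega$, $\forall \xi \in \mathbb{R}^N$, and $\forall \eta \in \mathbb{R}^n$,
\[ \sum_{j,k=1}^N\sum_{\alpha,\beta=1}^n L_{p^j_\alpha p^k_\beta}(x,u_0(x),\nabla u_0(x))\,\xi^j\xi^k\eta_\alpha\eta_\beta \ge \lambda|\xi|^2|\eta|^2, \tag{6.9} \]
then we call it the strict Legendre–Hadamard condition.
(6.9) also has the equivalent form
\[ \sum_{j,k=1}^N\sum_{\alpha,\beta=1}^n L_{p^j_\alpha p^k_\beta}(x,u_0(x),\nabla u_0(x))\,\pi^j_\alpha\pi^k_\beta \ge \lambda\|\pi\|^2, \quad \forall \pi, \ \operatorname{rank}(\pi) = 1. \]
Note that the norm of the matrix $\pi = (\pi^j_\alpha)$ is given by
\[ \|\pi\| = \Bigl(\sum_{\alpha=1}^n\sum_{j=1}^N (\pi^j_\alpha)^2\Bigr)^{\frac12}. \]
In addition, there is a stronger condition for multi-integral variational problems:
\[ \sum_{j,k=1}^N\sum_{\alpha,\beta=1}^n L_{p^j_\alpha p^k_\beta}(x,u_0(x),\nabla u_0(x))\,\pi^j_\alpha\pi^k_\beta \ge \lambda\|\pi\|^2, \quad \forall x \in \Omega, \ \forall \pi \in \mathbb{R}^{n\times N}, \]
known as the strong ellipticity condition.
However, for N = 1 or n = 1, the strict Legendre–Hadamard condition and the strong ellipticity condition agree with each other.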
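That the two conditions genuinely differ when N, n > 1 can be seen on a small constant-coefficient example. The sketch below takes N = n = 2 and the quadratic form $\|\pi\|^2 + \kappa\det\pi$ with κ = 3. This form is a standard illustration chosen here and is not taken from the text. On rank-1 matrices the determinant vanishes, so the strict Legendre–Hadamard condition holds with λ = 1, while the form is negative on π = diag(1, −1), so the inequality fails for general π.

```python
import random


def quad_form(pi, kappa=3.0):
    """Quadratic form ||pi||^2 + kappa * det(pi) on 2x2 matrices (hypothetical example)."""
    (a, b), (c, d) = pi
    norm2 = a * a + b * b + c * c + d * d
    det = a * d - b * c
    return norm2 + kappa * det


def rank_one(xi, eta):
    """Rank-1 matrix with entries xi[i] * eta[alpha]."""
    return [[xi[0] * eta[0], xi[0] * eta[1]],
            [xi[1] * eta[0], xi[1] * eta[1]]]


def min_ratio_on_rank_one(samples=1000, seed=0):
    """Smallest sampled value of quad_form(pi) / ||pi||^2 over rank-1 matrices."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(samples):
        xi = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        eta = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        pi = rank_one(xi, eta)
        n2 = sum(v * v for row in pi for v in row)
        if n2 > 1e-12:  # skip nearly zero matrices
            worst = min(worst, quad_form(pi) / n2)
    return worst
```

Here `min_ratio_on_rank_one()` should be 1 up to rounding, whereas `quad_form([[1.0, 0.0], [0.0, -1.0]])` is negative.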
In summary, we have the following.
Theorem 6.2 Let $L \in C^2$ and suppose $u_0 \in M$ is a weak minimum of I, then (6.8) holds. Conversely, if $u_0 \in M$ satisfies the E-L equation and if there exists λ > 0 such that
\[ \delta^2 I(u_0,\varphi) \ge \lambda\int_\Omega \{|\varphi(x)|^2 + |\nabla\varphi(x)|^2\}\,dx, \quad \forall \varphi \in C_0^1(\Omega, \mathbb{R}^N), \tag{6.10} \]
then $u_0$ is a strict minimum of I.
Similarly, (6.10) implies the strict Legendre–Hadamard condition (6.9), but (6.9) is not a sufficient condition for $u_0$ to be a weak minimum.
6.4 Jacobi fields
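For multi-integral variational problems, we also have the concept of a Jacobi field.
Let $L \in C^3$ and suppose $u_0 \in M$ is a weak minimum of I, then
\[ \delta^2 I(u_0,\varphi) \ge 0, \quad \forall \varphi \in C_0^1(\Omega, \mathbb{R}^N), \]
i.e. $Q_{u_0}(\varphi) \ge 0$.
The E-L equation of the functional $Q_{u_0}$ is a system of homogeneous second order partial differential equations:
\[ \sum_{k=1}^N \Bigl[\sum_{\alpha=1}^n \partial_\alpha\Bigl(\sum_{\beta=1}^n a^{jk}_{\alpha\beta}\,\partial_\beta\varphi^k + b^{jk}_\alpha\varphi^k\Bigr) - \Bigl(\sum_{\beta=1}^n b^{jk}_\beta\,\partial_\beta\varphi^k + c^{jk}\varphi^k\Bigr)\Bigr] = 0, \quad j = 1, 2, \ldots, N. \]
As for n = 1, we call this the Jacobi equation and call
\[ J_{u_0} : \varphi \mapsto (\psi^1, \ldots, \psi^N) \]
the Jacobi operator along $u_0$, where
\[ \psi^j = \sum_{k=1}^N \Bigl[\sum_{\alpha=1}^n \partial_\alpha\Bigl(\sum_{\beta=1}^n a^{jk}_{\alpha\beta}\,\partial_\beta\varphi^k + b^{jk}_\alpha\varphi^k\Bigr) - \Bigl(\sum_{\beta=1}^n b^{jk}_\beta\,\partial_\beta\varphi^k + c^{jk}\varphi^k\Bigr)\Bigr], \quad j = 1, \ldots, N. \]
Any $C^2$ solution of the Jacobi equation is called a Jacobi field along $u_0$.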
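For a differential operator satisfying the strict Legendre–Hadamard condition, we have the following inequality:
Lemma 6.2 (Gårding's inequality) Suppose $(a^{jk}_{\alpha\beta}(x))$ are uniformly continuous functions defined on $\bar\Omega \subset \mathbb{R}^n$ and there exists σ > 0 such that
\[ \sum_{\alpha,\beta=1}^n\sum_{j,k=1}^N a^{jk}_{\alpha\beta}\,\xi^j\xi^k\eta_\alpha\eta_\beta \ge \sigma|\xi|^2|\eta|^2, \quad \forall x \in \Omega, \]
then there exist α > 0 and $C_0 > 0$ such that for all $\varphi \in C_0^1(\Omega, \mathbb{R}^N)$,
\[ \int_\Omega \sum_{\alpha,\beta=1}^n\sum_{j,k=1}^N a^{jk}_{\alpha\beta}(x)\,\partial_\alpha\varphi^j\,\partial_\beta\varphi^k\,dx \ge \alpha\int_\Omega |\nabla\varphi(x)|^2\,dx - C_0\int_\Omega |\varphi(x)|^2\,dx. \]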
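Proof The inequality clearly holds for N = 1, in which case $C_0 = 0$. It follows readily from the assumption
\[ \sum_{\alpha,\beta=1}^n a_{\alpha\beta}\,\partial_\alpha\varphi(x)\,\partial_\beta\varphi(x) \ge \sigma|\nabla\varphi(x)|^2 \]
by integrating both sides.
For N > 1, if $(a^{jk}_{\alpha\beta}(x))$ are constants, we proceed by using the Fourier transform. Extend φ by zero outside Ω, so that it is defined on the entire $\mathbb{R}^n$. Let
\[ \hat\varphi(\xi) = \int_{\mathbb{R}^n} \varphi(x)\exp[-2\pi i\langle\xi,x\rangle_{\mathbb{R}^n}]\,dx, \]
then
\[ 2\pi i\xi_\alpha\hat\varphi(\xi) = \int_{\mathbb{R}^n} \partial_\alpha\varphi(x)\exp[-2\pi i\langle\xi,x\rangle_{\mathbb{R}^n}]\,dx. \]
According to Parseval's equality,
\[
\begin{aligned}
\sum_{j,k=1}^N\sum_{\alpha,\beta=1}^n \int_{\mathbb{R}^n} a^{jk}_{\alpha\beta}\,\partial_\alpha\varphi^j(x)\,\partial_\beta\varphi^k(x)\,dx
&= 4\pi^2\sum_{j,k=1}^N\sum_{\alpha,\beta=1}^n \int_{\mathbb{R}^n} a^{jk}_{\alpha\beta}\,\xi_\alpha\xi_\beta\,\hat\varphi^j(\xi)\overline{\hat\varphi^k(\xi)}\,d\xi \\
&\ge 4\pi^2\sigma\int_{\mathbb{R}^n} |\xi|^2|\hat\varphi(\xi)|^2\,d\xi \\
&= \sigma\int_\Omega |\nabla\varphi(x)|^2\,dx.
\end{aligned}
\]
When $(a^{jk}_{\alpha\beta}(x))$ are not constant, we can employ a partition of unity argument, treating the coefficients as if they were constants in each small neighborhood, and then piece the local estimates together using the estimate given above. The remainder can be absorbed into the second integral on the right-hand side by means of Schwarz's inequality. Since the details of this proof are rather complicated and beyond the scope of this course, we omit them and refer the interested reader to K. Yosida, Functional Analysis, pp. 175–177.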
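Lemma 6.3 Let $L \in C^2$ satisfy the strict Legendre–Hadamard condition, namely ∃ σ > 0 such that
\[ L_{p^i_\alpha p^j_\beta}(x,u,p)\,\xi^i\xi^j\eta_\alpha\eta_\beta \ge \sigma|\xi|^2|\eta|^2, \quad \forall (x,u,p) \in \Omega \times \mathbb{R}^N \times \mathbb{R}^{nN}. \]
Suppose $u_0$ is a solution of the E-L equation, and there exists μ > 0 such that
\[ Q_{u_0}(\varphi) \ge \mu\int_\Omega |\varphi|^2\,dx, \quad \forall \varphi \in C_0^1(\Omega, \mathbb{R}^N), \]
then there exists λ > 0 such that
\[ Q_{u_0}(\varphi) \ge \lambda\int_\Omega (|\nabla\varphi|^2 + |\varphi|^2)\,dx, \quad \forall \varphi \in C_0^1(\Omega, \mathbb{R}^N), \]
whence $u_0$ is a strict minimum of I.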

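Proof Since L satisfies the strict Legendre–Hadamard condition, according to Gårding's inequality, there exist α > 0 and $C_0 > 0$ such that
\[ \int_\Omega A_{u_0}(\nabla\varphi\cdot\nabla\varphi)\,dx \ge \int_\Omega [\alpha|\nabla\varphi|^2 - C_0|\varphi|^2]\,dx. \]
From
\[ Q_{u_0}(\varphi) = \int_\Omega [A_{u_0}(\nabla\varphi\cdot\nabla\varphi) + 2B_{u_0}(\nabla\varphi\cdot\varphi) + C_{u_0}(\varphi\cdot\varphi)]\,dx, \]
we deduce that there exist positive constants $C_1$ and $C_2$ such that
\[
\begin{aligned}
\alpha\int_\Omega |\nabla\varphi|^2\,dx
&\le Q_{u_0}(\varphi) + C_1\Bigl[\Bigl(\int_\Omega |\nabla\varphi|^2\,dx\Bigr)^{\frac12}\Bigl(\int_\Omega |\varphi|^2\,dx\Bigr)^{\frac12} + \int_\Omega |\varphi|^2\,dx\Bigr] + C_2\int_\Omega |\varphi|^2\,dx \\
&\le \frac{\alpha}{2}\int_\Omega |\nabla\varphi|^2\,dx + Q_{u_0}(\varphi) + C_2\int_\Omega |\varphi|^2\,dx,
\end{aligned}
\]
where the constant $C_2$ may be enlarged in the last step. Moreover, using the fact
\[ \int_\Omega |\varphi|^2\,dx \le \mu^{-1}Q_{u_0}(\varphi), \]
we then have
\[ \int_\Omega |\nabla\varphi(x)|^2\,dx \le \frac{2}{\alpha}(1 + C_2\mu^{-1})\,Q_{u_0}(\varphi). \]
Combining the two inequalities, it follows that there exists λ > 0 such that
\[ Q_{u_0}(\varphi) \ge \lambda\int_\Omega (|\nabla\varphi|^2 + |\varphi|^2)\,dx. \]
By Theorem 6.2, $u_0$ is a strict minimum of I.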

In addition, we provide a different criterion for strict minimum from the
“eigenvalue” point of view.
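Suppose $u_0 \in C^1(\bar\Omega, \mathbb{R}^N)$ is a solution of the E-L equation. We call
$$\lambda_1 = \inf\left\{ Q_{u_0}(\varphi) \;\Big|\; \varphi \in C_0^1(\Omega, \mathbb{R}^N),\ \int_\Omega |\varphi(x)|^2\,dx = 1 \right\}$$
the first eigenvalue of the Jacobi operator (details are given in Lecture 12).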
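Theorem 6.3 Let $L \in C^2$ satisfy the strict Legendre–Hadamard condition. Suppose $u_0 \in M$ is a weak minimum of $I$, then $\lambda_1 \ge 0$. Furthermore, if $\lambda_1 > 0$, then $u_0$ is a strict weak minimum of $I$.
Proof Suppose $\lambda_1 < 0$, then ∃ $\varphi_0 \in C_0^1(\Omega, \mathbb{R}^N)\setminus\{\theta\}$ such that
$$Q_{u_0}(\varphi_0) < \frac{\lambda_1}{2}\int_\Omega |\varphi_0(x)|^2\,dx < 0,$$
which contradicts (6.6). Thus, $\lambda_1 \ge 0$.
Suppose $\lambda_1 > 0$, then
$$Q_{u_0}(\varphi) \ge \frac{\lambda_1}{2}\int_\Omega |\varphi(x)|^2\,dx, \quad \forall\,\varphi \in C_0^1(\Omega, \mathbb{R}^N).$$
By Lemma 6.3, $u_0$ is a strict weak minimum.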
Remark 6.3 (Strong minimum) For multi-integral variational problems, we can also define the Weierstrass excess function $E_L \in C^1(\Omega\times\mathbb{R}^N\times\mathbb{R}^{nN}\times\mathbb{R}^{nN}, \mathbb{R}^1)$ and use it to state a necessary condition for a strong minimum. We define
$$E_L(x, u, p, q) = L(x, u, q) - L(x, u, p) - \sum_{\alpha=1}^{n}\sum_{i=1}^{N} (q^i_\alpha - p^i_\alpha)\,L_{p^i_\alpha}(x, u, p).$$
We have the following.

Theorem 6.4 Suppose $u_0 \in C^1(\bar\Omega, \mathbb{R}^N)$ is a strong minimum of $I$, then
$$E_L(x, u_0(x), \nabla u_0(x), \nabla u_0(x) + \pi) \ge 0, \quad \forall\,x \in \Omega,\ \forall\,\pi = (\pi^i_\alpha),\ \operatorname{rank}(\pi) = 1.$$
The idea of the proof closely follows that of the single-variable case; however, it is more complicated, so we shall omit it here.
When $n > 1$ but $N = 1$, there is also a similar sufficient condition for a strong minimum (the Lichtenstein theorem); we refer to [GH], p. 390.
Exercises
1. Let $I(u) = \int_\Omega \sqrt{1 + u_x^2 + u_y^2}\,dx\,dy$. For $\varphi \in C_0^\infty(\Omega)$, find the first variation $\delta I(u, \varphi)$ and the second variation $\delta^2 I(u, \varphi)$.
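2. Suppose $g = (g_{\alpha\beta}(x))_{1\le\alpha,\beta\le n}$ is a continuous positive definite matrix defined on a closed and bounded region $\Omega$. Denote by $\det(g)$ its determinant and by $(g^{\alpha\beta}(x))$ its inverse matrix. Let
$$I(u) = \frac12\int_\Omega \sum_{\alpha,\beta=1}^{n} g^{\alpha\beta}(x)\,\partial_\alpha u(x)\,\partial_\beta u(x)\,\det(g)\,dx_1\cdots dx_n.$$
(1) Find its E-L equation.
(2) Assume $\psi \in C^1(\partial\Omega, \mathbb{R}^1)$ and $u_0 \in M := \{v \in C^1(\Omega, \mathbb{R}^1) \mid v|_{\partial\Omega} = \psi\}$ is a critical point of $I$. Prove that $u_0$ is a weak minimum.
(3) Find $J_{u_0}$.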
3. Find the E-L equation of the functional
$$I(u) = \int_\Omega (|\nabla u|^p - |u|^q)\,dx, \quad 1 \le p, q < \infty.$$
Suppose $u_0$ is a critical point, find $J_{u_0}$.
4. Suppose $u \in C^1(\mathbb{R}^4)$ is a solution of $\Box u = 0$. Is $u$ a critical point or a minimum of the functional
$$I(u) = \int_{\mathbb{R}^4} [(\partial_t u)^2 - |\nabla_x u|^2]\,dt\,dx_1\,dx_2\,dx_3\,?$$
5. Suppose $F \in C^2(\mathbb{R}^1, \mathbb{R}^1)$ satisfies $|F''(t)| < \lambda_1$, where $\lambda_1$ is the first eigenvalue of the Laplace operator with zero boundary condition. Let
$$I(u) = \int_\Omega \left[\frac12|\nabla u|^2 + F(u)\right]dx.$$
Suppose $u_0 \in C_0^1(\Omega)$ is a critical point of $I$, show that it is a minimum.
6. Let $\Omega \subset \mathbb{R}^2$ be a plane region, $\nu$ be a constant, $f \in L^1(\Omega)$, and $M = \{w \in C^2(\Omega, \mathbb{R}^1) \mid w|_{\partial\Omega} = \partial_n w|_{\partial\Omega} = 0\}$, where $\partial_n$ denotes the normal derivative. Write the E-L equation of the functional
$$I(w) = \int_\Omega [(w_{xx} + w_{yy})^2 - 2\nu(w_{xx}w_{yy} - w_{xy}^2) + fw]\,dx\,dy.$$
Lecture 7
Constrained variational problems
Finding extremal values of functions includes both unconstrained and constrained problems. The extreme value problems of functionals also include both unconstrained and constrained problems. However, the constraints can be more varied in variational problems.
7.1 The isoperimetric problem
The so-called isoperimetric problem states: for a given target functional (M, I), a constraint functional (M, N), and a prescribed constant c, find the necessary and sufficient condition for I to attain its minimum under the constraint N(u) = c, namely,
$$\min\{I(u) \mid u \in M,\ N(u) = c\}.$$
The following example is the original source of isoperimetric problems.
Example 7.1 (The isoperimetric problem) Find a closed plane curve of a given
perimeter which encloses the greatest area.
We parametrize a closed plane curve as follows:
$$x = x(\theta), \quad y = y(\theta),$$
where $0 \le \theta \le 2\pi$ is the parameter. The enclosed area is then given by
$$A = \frac12\int_0^{2\pi} (xy' - yx')\,d\theta$$
and its arclength by
$$L = \int_0^{2\pi} \sqrt{x'^2 + y'^2}\,d\theta.$$
Since the perimeter l is given, our goal is to find a curve (x(θ), y(θ)) such that
functional A is a maximum under the constraint L = l.
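As an illustration of the two functionals, the following minimal sketch approximates $A$ and $L$ by an inscribed polygon and compares the ratio $4\pi A/L^2$ for a circle and an ellipse. The function names and the choice of the ellipse $x = 2\cos\theta$, $y = \sin\theta$ are assumptions made here for illustration only; they do not come from the text.

```python
import math

def area_and_length(x, y, n=20000):
    """Approximate A = 1/2 * integral of (x y' - y x') and L = integral of
    sqrt(x'^2 + y'^2) over [0, 2*pi] by a closed polygon with n vertices."""
    pts = [(x(2 * math.pi * k / n), y(2 * math.pi * k / n)) for k in range(n)]
    area = 0.0
    length = 0.0
    for k in range(n):
        x0, y0 = pts[k]
        x1, y1 = pts[(k + 1) % n]
        area += 0.5 * (x0 * y1 - y0 * x1)      # discrete version of (x y' - y x') / 2
        length += math.hypot(x1 - x0, y1 - y0)  # discrete arclength element
    return area, length

def isoperimetric_ratio(x, y):
    """Return 4*pi*A / L^2, which is scale invariant."""
    a, l = area_and_length(x, y)
    return 4 * math.pi * a / l ** 2

circle = isoperimetric_ratio(math.cos, math.sin)
ellipse = isoperimetric_ratio(lambda t: 2 * math.cos(t), math.sin)
print(circle, ellipse)
```

For a fixed perimeter, a larger ratio means a larger enclosed area, so the comparison is consistent with the circle being the maximizer, although it of course proves nothing.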
We first recall the method used for constrained optimization problems in mathematical analysis. Assume $f, g \in C^1(\Omega, \mathbb{R}^1)$, where $\Omega \subset \mathbb{R}^n$ is an open subset. Assume $g^{-1}(1) \ne \emptyset$. If $x_0 \in \Omega$ is such that the function $f$ attains its minimum under the constraint $g(x_0) = 1$ with $\nabla g(x_0) \ne 0$:
$$f(x_0) = \min_{x \in g^{-1}(1)} f(x),$$
then we can apply the Lagrange multiplier to turn this constrained minimization problem into an unconstrained minimization problem.
To be more explicit, there exists a Lagrange multiplier λ ∈ R1 such that
∇f (x0 ) + λ∇g(x0 ) = 0.
The Lagrange multiplier has a clear geometric meaning: if $M = g^{-1}(1)$ is a differentiable manifold, then $\nabla f(x)$ is parallel to the normal vector $\nabla g(x)$ of $M$ at the point $x$, whenever $x$ is the constrained minimizer. We restrict $f$ to $M$ and denote it by $\tilde f = f|_M$, whose differential is
$$d\tilde f(x) = \nabla f(x) - \frac{\nabla f(x)\cdot\nabla g(x)}{\|\nabla g(x)\|^2}\,\nabla g(x).$$
At an extreme point, it must satisfy
$$d\tilde f(x) = 0 \iff \nabla f(x) + \lambda\nabla g(x) = 0,$$
where
$$\lambda = -\frac{\nabla f(x)\cdot\nabla g(x)}{\|\nabla g(x)\|^2}$$
is the projection of $-\nabla f(x)$ onto the unit outward normal $\nabla g(x)$ (when $\|\nabla g(x)\| = 1$).
This indicates that, by means of the Lagrange multiplier, the solution of the constrained extreme value problem becomes a stationary point (critical point) of the adjusted function $f + \lambda g$.
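The formula for $\lambda$ can be checked on a small example. The sketch below takes the hypothetical choice $f(x, y) = x + y$ and $g(x, y) = x^2 + y^2$ (so the constraint $g = 1$ is the unit circle); these functions are assumptions for illustration, not taken from the text. At the constrained minimizer $(-1/\sqrt2, -1/\sqrt2)$ the vector $\nabla f + \lambda\nabla g$ vanishes, while at another point of the circle it does not.

```python
import math

def grad_f(x, y):
    """Gradient of the sample function f(x, y) = x + y."""
    return (1.0, 1.0)

def grad_g(x, y):
    """Gradient of the sample constraint function g(x, y) = x^2 + y^2."""
    return (2.0 * x, 2.0 * y)

def multiplier(x, y):
    """lambda = -(grad f . grad g) / |grad g|^2, as in the formula above."""
    fx, fy = grad_f(x, y)
    gx, gy = grad_g(x, y)
    return -(fx * gx + fy * gy) / (gx * gx + gy * gy)

def residual(x, y):
    """grad f + lambda * grad g, i.e. the tangential part of grad f."""
    lam = multiplier(x, y)
    return tuple(f + lam * g for f, g in zip(grad_f(x, y), grad_g(x, y)))

s = -1.0 / math.sqrt(2.0)
lam_min = multiplier(s, s)          # multiplier at the constrained minimizer
res_min = residual(s, s)            # vanishes at the minimizer
res_other = residual(1.0, 0.0)      # does not vanish at a non-critical point
print(lam_min, res_min, res_other)
```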
Constrained extreme value problem of a functional can also be turned into
an unconstrained extreme value problem of another functional via the Lagrange
multiplier. We have the following:
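Theorem 7.1 Given $L, G \in C^2(\bar\Omega\times\mathbb{R}^N\times\mathbb{R}^{nN})$, $\rho \in C^1(\partial\Omega, \mathbb{R}^N)$, and $M = \{u \in C^1(\bar\Omega, \mathbb{R}^N) \mid u|_{\partial\Omega} = \rho\}$, define on $M$ the functionals
$$I(u) = \int_\Omega L(x, u(x), \nabla u(x))\,dx$$
and
$$N(u) = \int_\Omega G(x, u(x), \nabla u(x))\,dx.$$
Suppose $c$ is a constant such that $N^{-1}(c)\cap M \ne \emptyset$. Suppose $u_0 \in M$ is a weak minimum of $I$ under the constraint $N(u) = c$, i.e.
$$I(u_0) = \min_{u \in M\cap N^{-1}(c)} I(u),$$
and suppose ∃ $\varphi_0 \in C_0^1(\Omega, \mathbb{R}^N)$ such that $\delta N(u_0, \varphi_0) \ne 0$, then ∃ $\lambda \in \mathbb{R}^1$ satisfying
$$\delta I(u_0, \varphi) + \lambda\,\delta N(u_0, \varphi) = 0, \quad \forall\,\varphi \in C_0^1(\Omega, \mathbb{R}^N).$$
Namely, if we let
$$Q = L + \lambda G$$
be the adjusted Lagrangian, then $u_0$ satisfies the corresponding E-L equation
$$\sum_{\alpha=1}^{n} \partial_\alpha Q_{p^i_\alpha}(x, u_0(x), \nabla u_0(x)) = Q_{u_i}(x, u_0(x), \nabla u_0(x)), \quad i = 1, \ldots, N. \tag{7.1}$$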
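Proof Note that the mapping $\varphi \mapsto \delta N(u_0, \varphi)$ is linear, so without loss of generality, we may assume $\delta N(u_0, \varphi_0) = 1$. We regard $N^{-1}(c)\cap M$ as a hypersurface in a function space. For any $\varphi \in C_0^1(\Omega, \mathbb{R}^N)$ linearly independent of $\varphi_0$, consider the plane $\pi$ at $u_0$ spanned by the vectors $\varphi_0$ and $\varphi$ (see Figure 7.1)
$$\pi = \{u_0 + \varepsilon\varphi + \tau\varphi_0 \mid (\varepsilon, \tau) \in \mathbb{R}^2\},$$
and the two functions $\Phi$ and $\Psi$ on $\pi$:
$$\Phi(\varepsilon, \tau) = I(u_0 + \varepsilon\varphi + \tau\varphi_0)$$
and
$$\Psi(\varepsilon, \tau) = N(u_0 + \varepsilon\varphi + \tau\varphi_0).$$
Fig. 7.1
Notice
$$\Psi(0, 0) = N(u_0) = c, \quad \Psi_\tau(0, 0) = \delta N(u_0, \varphi_0) = 1.$$
For $\varepsilon_0, \tau_0 > 0$ sufficiently small, we now apply the implicit function theorem on $R = (-\varepsilon_0, \varepsilon_0)\times(-\tau_0, \tau_0)$: ∃ $\xi \in C^1(-\varepsilon_0, \varepsilon_0)$ such that $(\varepsilon, \xi(\varepsilon)) \in R$ is the unique solution of
$$\Psi(\varepsilon, \tau) = c$$
inside $R$. This means
$$\Psi(\varepsilon, \xi(\varepsilon)) = c, \quad \xi(0) = 0, \quad \xi'(0) = -\Psi_\varepsilon(0, 0) = -\delta N(u_0, \varphi).$$
So inside a small enough $R$,
$$N^{-1}(c)\cap\pi = \{u_0 + \varepsilon\varphi + \xi(\varepsilon)\varphi_0\}.$$
Let
$$g(\varepsilon) = I(u_0 + \varepsilon\varphi + \xi(\varepsilon)\varphi_0).$$
Since $v = u_0 + \varepsilon\varphi + \tau\varphi_0 \in M$,
$$\|v - u_0\|_{C^1} \le |\varepsilon|\,\|\varphi\|_{C^1} + |\tau|\,\|\varphi_0\|_{C^1}.$$
However, $u_0$ is a weak minimum of $I$ on $M\cap N^{-1}(c)$, so for $\varepsilon_0, \tau_0$ sufficiently small,
$$\Phi(\varepsilon, \xi(\varepsilon)) \ge \Phi(0, 0).$$
Thus, 0 is a minimum of $g$.
From this, we deduce that
$$0 = g'(0) = \frac{d}{d\varepsilon}\Phi(\varepsilon, \xi(\varepsilon))\Big|_{\varepsilon=0} = \Phi_\varepsilon(0, 0) + \Phi_\tau(0, 0)\,\xi'(0) = \delta I(u_0, \varphi) + \lambda\,\delta N(u_0, \varphi), \tag{7.2}$$
where
$$\lambda = -\Phi_\tau(0, 0) = -\delta I(u_0, \varphi_0)$$
is independent of $\varphi$. Letting $Q = L + \lambda G$, (7.2) is the E-L equation associated to $Q$:
$$\operatorname{div} Q_p(x, u_0(x), \nabla u_0(x)) = Q_u(x, u_0(x), \nabla u_0(x)).$$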
Remark 7.1 We still call λ a Lagrange multiplier.

Remark 7.2 We can also consider the constrained extreme value problem with multiple constraints: given $L, G_1, \ldots, G_m \in C^2(\bar\Omega\times\mathbb{R}^N\times\mathbb{R}^{nN})$, $M \subset C^1(\bar\Omega, \mathbb{R}^N)$, and constants $c_1, c_2, \ldots, c_m$, let
$$I(u) = \int_\Omega L(x, u(x), \nabla u(x))\,dx,$$
$$N_j(u) = \int_\Omega G_j(x, u(x), \nabla u(x))\,dx, \quad j = 1, 2, \ldots, m.$$
If $u \in M$ is a minimum of the functional $I$ under the constraints
$$\int_\Omega G_j(x, u(x), \nabla u(x))\,dx = c_j, \quad j = 1, 2, \ldots, m,$$
and if ∃ $\varphi_k \in C_0^1(\Omega, \mathbb{R}^N)$ $(1 \le k \le m)$ such that $\det(a_{jk}) \ne 0$, where
$$a_{jk} = \delta N_j(u, \varphi_k) = \int_\Omega [\langle (G_j)_u(x, u(x), \nabla u(x)), \varphi_k(x)\rangle + \langle (G_j)_p(x, u(x), \nabla u(x)), \nabla\varphi_k(x)\rangle]\,dx, \quad j, k = 1, \ldots, m,$$
then there exist Lagrange multipliers $\lambda_1, \ldots, \lambda_m$ such that $u$ satisfies the E-L equation of the adjusted Lagrangian
$$Q = L + \lambda_1 G_1 + \cdots + \lambda_m G_m.$$
Example 7.1 (The isoperimetric problem continued) By Theorem 7.1, we introduce the Lagrange multiplier $\lambda$ and consider the E-L equation of the adjusted functional
$$I(x, y) = A(x, y) + \lambda L(x, y).$$
It follows that
$$\begin{cases} -x' = \lambda\left(\dfrac{y'}{\sqrt{x'^2 + y'^2}}\right)', \\[2mm] \phantom{-}y' = \lambda\left(\dfrac{x'}{\sqrt{x'^2 + y'^2}}\right)'. \end{cases}$$
Upon integrating, we arrive at
$$\begin{cases} x - c_1 = -\lambda\,\dfrac{y'}{\sqrt{x'^2 + y'^2}}, \\[2mm] y - c_2 = \lambda\,\dfrac{x'}{\sqrt{x'^2 + y'^2}}, \end{cases}$$
where $c_1$ and $c_2$ are constants. This is exactly the standard equation of a circle
$$(x - c_1)^2 + (y - c_2)^2 = \lambda^2,$$
with radius $r = |\lambda| = \frac{l}{2\pi}$ and center $(c_1, c_2)$.
Example 7.2 (The eigenvalue problem continued) In Lecture 1, we posed the following constrained extreme value problem: for a given domain $\Omega \subset \mathbb{R}^n$ and a bounded continuous function $q \in C(\bar\Omega)$, define the functionals
$$I(u) = \int_\Omega |\nabla u|^2\,dx, \quad N(u) = \int_\Omega q(x)|u(x)|^2\,dx, \quad M = C_0^1(\Omega).$$
Find
$$\min\{I(u) \mid u \in M,\ N(u) = 1\}.$$
According to Theorem 7.1, we introduce the Lagrange multiplier $\lambda$. The E-L equation of the adjusted Lagrangian $Q = |p|^2 - \lambda q(x)u^2$ is
$$-\Delta u = \lambda qu.$$
Here $\lambda$ precisely coincides with the eigenvalues of the Laplace operator $-\Delta$ with respect to the weight function $q$.
If $u_0 \in M$ is a minimum satisfying the constraint $N(u_0) = \int_\Omega q(x)|u_0(x)|^2\,dx = 1$, then $\lambda = \int_\Omega |\nabla u_0(x)|^2\,dx \ne 0$ and
$$-\Delta u_0(x) = \lambda q(x)u_0(x).$$
This minimum $u_0$ is an eigenfunction of the Laplace operator $-\Delta$ with respect to the weight function $q$, and the Lagrange multiplier $\lambda$ is its corresponding eigenvalue.
In fact, by introducing the Lagrange multiplier $\lambda$, all critical points of the functional
$$\int_\Omega [|\nabla u(x)|^2 - \lambda q(x)u^2(x)]\,dx$$
are eigenfunctions.
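To make the identification of the constrained minimum with the first eigenvalue concrete, here is a minimal one-dimensional sketch, assuming $\Omega = (0, 1)$ and $q \equiv 1$ (both choices, as well as the function names, are illustrative assumptions). It evaluates the quotient $I(u)/N(u)$ for the eigenfunction $\sin(\pi x)$ and for the trial function $x(1 - x)$; the first value should be close to $\pi^2$, the smallest eigenvalue of $-u'' = \lambda u$ with zero boundary values, and the second should be no smaller.

```python
import math

def rayleigh_quotient(u, du, q=lambda x: 1.0, n=2000):
    """I(u) / N(u) on (0, 1), with both integrals computed by the midpoint rule.

    u  : the function, assumed to vanish at 0 and 1
    du : its derivative
    q  : the weight function (q = 1 by default)
    """
    h = 1.0 / n
    num = 0.0
    den = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        num += du(x) ** 2 * h          # contribution to I(u)
        den += q(x) * u(x) ** 2 * h    # contribution to N(u)
    return num / den

rq_sine = rayleigh_quotient(lambda x: math.sin(math.pi * x),
                            lambda x: math.pi * math.cos(math.pi * x))
rq_poly = rayleigh_quotient(lambda x: x * (1.0 - x),
                            lambda x: 1.0 - 2.0 * x)
print(rq_sine, rq_poly, math.pi ** 2)
```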
7.2 Pointwise constraints
The constraint appearing in the isoperimetric problem is of integral form. There is, however, another kind of constraint, which is given pointwise.
For instance, given a function $M \in C^1(\bar\Omega\times\mathbb{R}^N\times\mathbb{R}^{nN}, \mathbb{R}^1)$, we want to find the extrema in M, under the pointwise constraint
$$M(x, u(x), \nabla u(x)) = 0, \quad \forall\,x \in \Omega,$$
of the functional
$$I(u) = \int_\Omega L(x, u(x), \nabla u(x))\,dx.$$
It is natural to ask: is there a Lagrange multiplier-like method available? In the following, we will only address this question in the case of holonomic constraints, namely when $M$ depends only on $u$ (and is independent of $x$ and $p$).
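Theorem 7.2 Let $\bar\Omega \subset \mathbb{R}^n$ be a closed and bounded set. Let $L \in C^2(\bar\Omega\times\mathbb{R}^N\times\mathbb{R}^{nN}, \mathbb{R}^1)$, $M \in C^2(\mathbb{R}^N, \mathbb{R}^1)$, $\rho \in C^1(\partial\Omega, \mathbb{R}^N)$, and
$$M = \{u \in PWC^1(\bar\Omega, \mathbb{R}^N) \mid u|_{\partial\Omega} = \rho\}.$$
Suppose $u_0 \in M$ is a local minimum under the above constraint and it is $C^2$ outside finitely many $(n-1)$-dimensional piecewise $C^1$ hypersurfaces. If ∀ $x \in \bar\Omega$, $\nabla M(u_0(x)) \ne 0$, then there exists a continuous function $\lambda \in C(\bar\Omega)$ such that $u_0$ satisfies the E-L equation of the adjusted Lagrangian $Q = L + \lambda M$:
$$L_{u_i} + \lambda M_{u_i} = \sum_{\alpha=1}^{n} \frac{\partial}{\partial x_\alpha} L_{p^i_\alpha}, \quad 1 \le i \le N. \tag{7.3}$$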
Proof We will first construct such a $\lambda$ to be continuous locally on each piece, and then glue the pieces together into a globally defined continuous function on $\bar\Omega$.
1. ∀ $x_0 \in \Omega$, there exists a ball $B_r(x_0) \subset \Omega$ such that $\nabla_x(M(u_0(x))) \ne 0$, ∀ $x \in B_r(x_0)$. Since $\nabla_u M(u_0(x))\cdot\nabla u_0(x) = \nabla_x(M(u_0(x)))$, we have $\nabla_u M(u) \ne 0$, ∀ $u \in u_0(B_r(x_0))$, and $\nabla u_0(x) \ne 0$, ∀ $x \in B_r(x_0)$. Without loss of generality, we may assume $M_{u_N}(u_0(x)) \ne 0$, ∀ $x \in B_r(x_0)$. If we adopt the notation $\tilde u = (u_1, \ldots, u_{N-1})$, then we can solve and obtain $u_N = U(\tilde u)$, where $U$ is a $C^2$ function (see Figure 7.2).
Fig. 7.2
We further adopt the notation
$$\tilde p = (p^i_\alpha)_{1\le i\le N-1,\ 1\le\alpha\le n}, \qquad p^N = \left(\sum_{i=1}^{N-1} U_{u_i}\,p^i_\alpha\right)_{1\le\alpha\le n},$$
and
$$P^N = \sum_{i=1}^{N-1} U_{u_i}\nabla u_i.$$
2. Since $u_0$ is a minimum, when the domain of the integral of $I$ is restricted to $B_r(x_0)$, $u_0|_{B_r(x_0)}$ is still a minimum. This can be justified as follows: suppose not, then ∃ $v \in PWC^1(\bar B_r(x_0))$ such that $v|_{\partial B_r(x_0)} = u_0|_{\partial B_r(x_0)}$ and
$$\int_{\bar B_r(x_0)} L(x, v(x), \nabla v(x))\,dx < \int_{\bar B_r(x_0)} L(x, u_0(x), \nabla u_0(x))\,dx.$$
Now, if we replace $u_0(x)$ by $v(x)$ on $\bar B_r(x_0)$, then we will end up with a new piecewise differentiable function whose functional value is strictly less than $I(u_0)$, which contradicts the fact that $u_0$ is a minimum.
Let
$$\Lambda(x, \tilde u, \tilde p) = L(x, \tilde u, U(\tilde u), \tilde p, p^N).$$
Since $u_0$ is the minimum of $I$ under the constraint $M(u) = 0$, $\tilde u_0$ must be a minimum of
$$J(\tilde u) = \int_{B_r(x_0)} \Lambda(x, \tilde u(x), \nabla\tilde u(x))\,dx.$$
The latter has the E-L equation
$$\Lambda_{u_i} = \operatorname{div}\Lambda_{p^i}(x, \tilde u(x), \nabla\tilde u(x)), \quad i = 1, \ldots, N-1.$$
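Namely,
$$L_{u_i} + L_{u_N}U_{u_i} + \sum_{\alpha=1}^{n} L_{p^N_\alpha}\frac{\partial p^N_\alpha}{\partial u_i} = \sum_{\alpha=1}^{n} \frac{\partial}{\partial x_\alpha}\left(L_{p^i_\alpha} + L_{p^N_\alpha}U_{u_i}\right), \quad i = 1, \ldots, N-1. \tag{7.4}$$
However,
$$\frac{\partial p^N_\alpha}{\partial u_i} = \sum_{j=1}^{N-1} \frac{\partial^2 U}{\partial u_i\,\partial u_j}\,u^j_\alpha = \frac{\partial U_{u_i}}{\partial x_\alpha}. \tag{7.5}$$
By (7.5), the right-hand side of (7.4) is equal to
$$\sum_{\alpha=1}^{n}\left[\frac{\partial}{\partial x_\alpha}L_{p^i_\alpha} + U_{u_i}\frac{\partial}{\partial x_\alpha}L_{p^N_\alpha} + L_{p^N_\alpha}\frac{\partial}{\partial x_\alpha}U_{u_i}\right]
= \sum_{\alpha=1}^{n}\left[\frac{\partial}{\partial x_\alpha}L_{p^i_\alpha} + U_{u_i}\frac{\partial}{\partial x_\alpha}L_{p^N_\alpha} + L_{p^N_\alpha}\frac{\partial p^N_\alpha}{\partial u_i}\right].$$
Hence, (7.4) is reduced to
$$L_{u_i} + U_{u_i}\left[L_{u_N} - \sum_{\alpha=1}^{n}\frac{\partial}{\partial x_\alpha}L_{p^N_\alpha}\right] = \sum_{\alpha=1}^{n}\frac{\partial}{\partial x_\alpha}L_{p^i_\alpha}. \tag{7.6}$$
Noting
$$M(\tilde u_0, U(\tilde u_0)) = 0,$$
upon differentiating, we obtain
$$M_{u_i} + M_{u_N}U_{u_i} = 0,$$
i.e.
$$U_{u_i} = -\frac{M_{u_i}}{M_{u_N}}. \tag{7.7}$$
We now define on $B_r(x_0)$ the function
$$\lambda = \frac{1}{M_{u_N}}\left(\operatorname{div} L_{p^N} - L_{u_N}\right). \tag{7.8}$$
Substituting (7.7) and (7.8) into (7.6), it follows that
$$L_{u_i} - \operatorname{div} L_{p^i} + \lambda M_{u_i} = 0, \quad i = 1, \ldots, N-1. \tag{7.9}$$
Combining (7.8) and (7.9) gives the local E-L equation (7.3) on $B_r(x_0)$.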
3. Since $\bar\Omega$ is compact, we have a finite open sub-covering of the covering by $\{B_r(x_0) \mid x_0 \in \Omega,\ r = r(x_0)\}$, together with the already defined functions $\lambda|_{B_r(x_0)}$. Suppose on $B_{r_1}(x_1)$, $M_{u_N} \ne 0$ and on $B_{r_2}(x_2)$, $M_{u_j} \ne 0$, then we have $\lambda|_{B_{r_2}(x_2)} = \frac{1}{M_{u_j}}(\operatorname{div} L_{p^j} - L_{u_j})$ and $\lambda|_{B_{r_1}(x_1)}$ is the $\lambda$ in (7.8). Thus, when any two such small balls have a nonempty intersection ($B_{r_1}(x_1)\cap B_{r_2}(x_2) \ne \emptyset$), by (7.9), it is immediate that on the intersection
$$\lambda|_{B_{r_1}(x_1)} = \lambda|_{B_{r_2}(x_2)}.$$
We now glue these $\lambda|_{B_r(x_0)}$ together to create a globally continuous function $\lambda$ as desired.
Remark 7.3 Just like the isoperimetric problem, we can also consider multiple pointwise constraints $M_1(u) = M_2(u) = \cdots = M_s(u) = 0$, as long as they satisfy
$$\operatorname{rank}\left(\frac{\partial M_i(u(x))}{\partial u_j}\right) = s, \quad \forall\,x \in \Omega.$$
The adjusted Lagrangian is then
$$Q = L + \sum_{i=1}^{s} \lambda_i M_i,$$
where $\lambda_i \in C(\Omega)$, $i = 1, \ldots, s$.
Remark 7.4 If the constraint functions also depend on other variables, then we call them non-holonomic constraints. For variational problems with non-holonomic constraints, is there still a Lagrange multiplier method? The answer to this question is far more complicated; we refer the interested reader to the book by Giaquinta and Hildebrandt [GH]. However, some special cases are known as optimal control problems, which we will address in the third part of this book.
Example 7.3 (Geodesics on spheres revisited) In our earlier discussions, we have studied the geodesic problem from the point of view of an unconstrained extreme value problem. We now provide a different viewpoint. Regard the sphere as the constraint
$$x^2 + y^2 + z^2 = 1.$$
Find the extrema of the functional
$$I(u) = \int_a^b \sqrt{\dot x^2 + \dot y^2 + \dot z^2}\,dt$$
given this pointwise constraint.
Let $u = (x, y, z)$ and $p = (\xi, \eta, \zeta)$. Introducing the Lagrange multiplier $\lambda(t)$, we have the adjusted Lagrangian
$$Q = \sqrt{\xi^2 + \eta^2 + \zeta^2} + \lambda(x^2 + y^2 + z^2).$$
Its E-L equation is
$$\frac{d}{dt}\frac{\dot u}{|\dot u|} = 2\lambda u;$$
furthermore, $u$ also satisfies $|u| = 1$. Since
$$\frac{d}{dt}\left(\frac{\dot u}{|\dot u|}\times u\right) = \left(\frac{d}{dt}\frac{\dot u}{|\dot u|}\right)\times u + \frac{\dot u}{|\dot u|}\times\frac{du}{dt} = 2\lambda u\times u + \frac{\dot u}{|\dot u|}\times\dot u = 0,$$
$v = \frac{\dot u}{|\dot u|}\times u$ must be a constant vector. Consequently, $u \perp v$, i.e. the geodesic $u$ must lie in a plane through the origin orthogonal to the constant vector $v$, hence this curve must be a piece of a great circle.
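The conserved vector $v = \frac{\dot u}{|\dot u|}\times u$ can be inspected numerically. The following sketch, with curves and helper names chosen here purely for illustration, evaluates $v$ along a great circle and along a circle of latitude on the unit sphere: it stays constant along the former and varies along the latter, which is therefore not a geodesic.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def v_along(curve, velocity, ts):
    """Evaluate v = (u' / |u'|) x u at the parameter values ts."""
    out = []
    for t in ts:
        u, du = curve(t), velocity(t)
        speed = math.sqrt(sum(c * c for c in du))
        out.append(cross(tuple(c / speed for c in du), u))
    return out

def spread(vectors):
    """Largest componentwise deviation from the first vector in the list."""
    v0 = vectors[0]
    return max(abs(v[i] - v0[i]) for v in vectors for i in range(3))

ts = [0.1 * k for k in range(30)]

# Great circle in the plane z = 0.
great = v_along(lambda t: (math.cos(t), math.sin(t), 0.0),
                lambda t: (-math.sin(t), math.cos(t), 0.0), ts)

# Circle of latitude at height b = 0.8 (radius a = 0.6, so a^2 + b^2 = 1).
a, b = 0.6, 0.8
latitude = v_along(lambda t: (a * math.cos(t), a * math.sin(t), b),
                   lambda t: (-a * math.sin(t), a * math.cos(t), 0.0), ts)

print(spread(great), spread(latitude))
```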
Example 7.4 On the slanted plane $x = y$, find the equation of the trajectory of a particle of unit mass moving under gravity. We denote the coordinates of the particle by $(x, y)$. The Lagrangian is
$$L = \frac12(p^2 + q^2) - gy,$$
where $g$ is the gravitational constant and the constraint is
$$M(x, y) = y - x = 0.$$
The adjusted Lagrangian is
$$Q = L + \lambda M = \frac12(p^2 + q^2) - gy + \lambda(y - x),$$
whose E-L equation is
$$\begin{cases} -\lambda = \dfrac{\partial Q}{\partial x} = \dfrac{d}{dt}Q_p = \ddot x, \\[2mm] -g + \lambda = \dfrac{\partial Q}{\partial y} = \dfrac{d}{dt}Q_q = \ddot y. \end{cases}$$
Since $y = x$, we have $\ddot x = \ddot y$ and hence $\lambda = g/2$, from which we solve to get
$$x = y = -\frac12\lambda t^2 + \dot x(0)t + x(0),$$
where $\lambda = g/2$.
Example 7.5 (Harmonic mappings to spheres) Let $\Omega$ be the unit ball in $\mathbb{R}^3$, $S^2$ be the unit sphere in $\mathbb{R}^3$, and $u = (u_1, u_2, u_3) : \Omega \to S^2$. If $u$ is a solution of the constrained extreme value problem
$$\min\{I(u) \mid M(u) = 0\},$$
where
$$I(u) = \int_\Omega |\nabla u|^2\,dx = \int_\Omega \sum_{i=1}^{3}\sum_{\alpha=1}^{3} |\partial_\alpha u_i|^2\,dx,$$
$$M(u) = |u|^2 - 1 = u_1^2 + u_2^2 + u_3^2 - 1,$$
then we call $u$ a harmonic mapping from the solid unit ball to the unit sphere. Find the differential equation that such a harmonic mapping satisfies.
Solution The E-L equation of the adjusted Lagrangian is
$$-\Delta u_i = \lambda u_i,$$
where $\Delta$ is the Laplace operator and the Lagrange multiplier $\lambda$ is a continuous function. From
$$u_1^2 + u_2^2 + u_3^2 = 1,$$
by differentiating, it yields
$$\langle u, \nabla u\rangle = 0,$$
where $\langle\cdot,\cdot\rangle$ denotes the standard inner product on $\mathbb{R}^3$. Differentiating once more, it yields
$$\langle u, \Delta u\rangle + |\nabla u|^2 = 0.$$
Taking the inner product of both sides of the above E-L equation with $u$, it is immediate that
$$\lambda = -\langle\Delta u, u\rangle = |\nabla u|^2,$$
i.e.
$$-\Delta u = u|\nabla u|^2.$$
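A standard candidate for checking this equation is the radial projection $u(x) = x/|x|$, which maps into $S^2$ away from the origin. The sketch below, using central finite differences at one sample point (the point, the step size $h$ and the helper names are assumptions for illustration), compares $-\Delta u$ with $u|\nabla u|^2$; for this map one expects $|\nabla u|^2 = 2/|x|^2$.

```python
import math

def u(x):
    """Radial projection x / |x| onto the unit sphere (x != 0)."""
    r = math.sqrt(sum(c * c for c in x))
    return [c / r for c in x]

def shifted(x, k, h):
    """Copy of x with the k-th coordinate shifted by h."""
    y = list(x)
    y[k] += h
    return y

def laplacian_and_energy(x, h=1e-3):
    """Central-difference approximations of (Delta u_1, Delta u_2, Delta u_3)
    and of |grad u|^2 = sum over i, alpha of (d u_i / d x_alpha)^2."""
    u0 = u(x)
    lap = [0.0, 0.0, 0.0]
    grad_sq = 0.0
    for k in range(3):
        up = u(shifted(x, k, h))
        um = u(shifted(x, k, -h))
        for i in range(3):
            lap[i] += (up[i] - 2.0 * u0[i] + um[i]) / h ** 2
            grad_sq += ((up[i] - um[i]) / (2.0 * h)) ** 2
    return lap, grad_sq

x = [0.3, 0.4, 0.5]                      # sample point, |x|^2 = 0.5
lap, grad_sq = laplacian_and_energy(x)
ux = u(x)
residual = max(abs(-lap[i] - ux[i] * grad_sq) for i in range(3))
print(grad_sq, residual)
```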

7.3 Variational inequalities
In addition to constraints presented in terms of equalities, there are also variational problems with inequality constraints.
Example 7.6 (The Obstacle problem) Let Ω ⊂ R2 be a bounded open subset.
As for the boundary, there is a given function ϕ ∈ C 1 (∂Ω).
As for the obstacle, there is a given function ψ ∈ C 1 (Ω̄).
As for the external force, there is a given function f ∈ C(Ω̄).
Over Ω, we consider a thin membrane, whose boundary is fixed, u|∂Ω = ϕ,
while applying the external force, it cannot move beyond the obstacle, i.e. u(x) ≤
ψ(x), ∀ x ∈ Ω̄.
We propose this as the following variational problem: find the equilibrium
position
u ∈ M = {u ∈ P W C 1 (Ω̄) | u|∂Ω = ϕ},
subject to the inequality constraint
u(x) ≤ ψ(x), ∀ x ∈ Ω, u ∈ M,
such that the energy I of the thin membrane achieves its minimum:
$$I(u) = \int_\Omega \left[\frac12|\nabla u(x)|^2 - f(x)u(x)\right]dx.$$
Generally speaking, let the domain M be a given set of functions defined on $\bar\Omega \subset \mathbb{R}^n$. For a given convex subset $C$ of M and a Lagrangian $L$, we ask to find
$$\min\left\{ I(u) = \int_\Omega L(x, u(x), \nabla u(x))\,dx \;\Big|\; u \in C \right\}.$$
Suppose $u \in C$ is a minimum; we want to derive a formula similar to the E-L equation. In fact, since $C$ is convex, ∀ $v \in C$, $tv + (1-t)u \in C$, ∀ $t \in [0, 1]$, hence
$$I(tv + (1-t)u) \ge I(u), \quad \forall\,t \in [0, 1].$$
Consequently,
$$\delta I(u, v - u) = \lim_{t\to0^+} \frac1t\,[I(tv + (1-t)u) - I(u)] \ge 0,$$
i.e. ∀ $v \in C$,
$$\int_\Omega [L_u(x, u(x), \nabla u(x))(v(x) - u(x)) + L_p(x, u(x), \nabla u(x))\cdot(\nabla v(x) - \nabla u(x))]\,dx \ge 0.$$
Note that the only difference between the E-L equation and this expression is that the former is an equality, whereas the latter is an inequality. This is why we call it a variational inequality.
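Returning to the obstacle problem, $C = \{u \in PWC^1(\bar\Omega) \mid u(x) \le \psi(x),\ \forall\,x \in \bar\Omega,\ u|_{\partial\Omega} = \varphi\}$ is a convex set, so the resulting variational inequality is
$$\int_\Omega [\nabla u\cdot\nabla(v - u) - f(v - u)]\,dx \ge 0, \quad \forall\,v \in C.$$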
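A discrete analogue of this variational inequality can be tested in one dimension. The sketch below is an illustration under assumptions that are not in the text: $\Omega = (0, 1)$, zero boundary values, constant force $f = 8$, constant obstacle $\psi = 0.5$, a uniform grid, and a projected Gauss–Seidel iteration as the solver. It then evaluates the discrete form of $\int [u'(v - u)' - f(v - u)]\,dx$ for two admissible test functions $v$.

```python
def solve_obstacle(n=40, f=8.0, psi=0.5, sweeps=4000):
    """Projected Gauss-Seidel for the discrete energy
    sum_i (u[i+1] - u[i])^2 / (2h) - h * sum_i f * u[i],
    subject to u[i] <= psi and u[0] = u[n] = 0."""
    h = 1.0 / n
    u = [0.0] * (n + 1)
    for _ in range(sweeps):
        for i in range(1, n):
            # Minimizer in the single variable u[i] without the constraint ...
            unconstrained = 0.5 * (u[i - 1] + u[i + 1] + h * h * f)
            # ... projected onto the admissible half-line u[i] <= psi.
            u[i] = min(psi, unconstrained)
    return u, h

def vi_form(u, v, h, f=8.0):
    """Discrete version of the integral of u'(v - u)' - f (v - u)."""
    n = len(u) - 1
    w = [v[i] - u[i] for i in range(n + 1)]
    stiff = sum((u[i + 1] - u[i]) * (w[i + 1] - w[i]) / h for i in range(n))
    load = sum(f * w[i] * h for i in range(1, n))
    return stiff - load

u, h = solve_obstacle()
n = len(u) - 1
zero = [0.0] * (n + 1)                                   # admissible: 0 <= psi
capped = [min(0.5, 4.0 * (i * h) * (1.0 - i * h)) for i in range(n + 1)]
print(max(u), vi_form(u, zero, h), vi_form(u, capped, h))
```

Both values of the discrete form should be nonnegative up to the iteration error, and the computed membrane touches the obstacle around the midpoint.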
Exercises
1. Find
$$\min\{I(u) \mid u \in C^1[0, 1],\ u(0) = 0,\ u(1) = 2,\ N(u) = L\},$$
where
$$I(u) = \int_0^1 \dot u^2(t)\,dt, \quad N(u) = \int_0^1 u(t)\,dt.$$
2. (The Dido problem) Find
$$\max\{I(u) \mid u \in C_0^1(0, b),\ N(u) = L\},$$
where
$$I(u) = \int_0^b u(t)\,dt, \quad N(u) = \int_0^b (1 + \dot u(t)^2)^{\frac12}\,dt.$$
3. Find the E-L equation of
$$I(u) = \int_a^b [(u'')^2(t) - p(t)(u')^2(t) + q(t)u^2(t)]\,dt,$$
subject to the isoperimetric constraint $\int_a^b r(t)u^2(t)\,dt = 1$, where $p, q, r$ are continuous functions on $[a, b]$, and $u$ satisfies the boundary condition $u(a) = \dot u(a) = u(b) = \dot u(b) = 0$.
4. Find the extreme values of the functional
$$I(u) = \int_0^1 (\dot u_1^2 + \dot u_2^2)\,dt, \quad M = \{(u_1, u_2) \in C_0^1((0, 1), \mathbb{R}^2)\},$$
under the constraint $u_2^2 + (u_1 - t) = 0$.

5. Let $\Omega \subset \mathbb{R}^n$, $u : \Omega \to \mathbb{R}^N$. Suppose $G \in C^1(\mathbb{R}^N)$ is such that $\nabla G(u) \ne 0$, ∀ $u \in G^{-1}(1)$. Let
$$I(u) = \int_\Omega |\nabla u(x)|^2\,dx.$$
Prove that the E-L equation of the constrained variational problem
$$\min\{I(u) \mid G(u(x)) = 1\ \forall\,x \in \Omega\}$$
is
$$-\Delta u_i = \lambda\,\partial_{u_i}G, \quad i = 1, \ldots, N, \qquad \lambda = \frac{\sum_{k,j=1}^{N}\sum_{\alpha=1}^{n} \partial^2_{u_k u_j}G(u)\,\partial_\alpha u_k\,\partial_\alpha u_j}{|\nabla_u G(u)|^2}.$$
This is the harmonic mapping equation of the hypersurface $G(u) = 1$.
6. Let $\Omega \subset \mathbb{R}^n$ be a bounded open subset and $u : \bar\Omega \to \mathbb{R}^1$ be a non-parametric equation of a surface. For a given boundary value $u|_{\partial\Omega} = \psi$ ($\psi$ is a continuous function), its area is $A(u) = \int_\Omega (1 + |\nabla u(x)|^2)^{\frac12}\,dx$ and its enclosed volume is $V(u) = \int_\Omega u(x)\,dx$. Determine the equation of the surface whose area is a minimum but whose volume is the given constant $V_0$.
7. Let $X = (X_1(u, v), X_2(u, v), X_3(u, v))$, ∀ $(u, v) \in B := \{(u, v) \mid u^2 + v^2 = 1\}$, be the parametric equation of a surface $S$. Define
$$D(X) = \frac12\int_B \sum_{i=1}^{3} |\nabla X^i|^2\,du\,dv$$
and
$$V(X) = \frac13\int_B X\cdot(X_u\wedge X_v)\,du\,dv.$$
Determine the equation of the surface $S$ such that $D(X)$ achieves its minimum under the constraint $V(X) = V_0$.
Lecture 8
The conservation law and Noether’s

theorem
In physics and mechanics, we frequently encounter various kinds of conservation laws, such as conservation of energy, conservation of momentum, and conservation of angular momentum. E. Noether discovered that such conservation laws arise because the Lagrangian is invariant under certain group actions.
8.1 One parameter diffeomorphisms and Noether’s theorem
1. A special one parameter family of functions

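Let a bounded open domain $\Omega \subset \mathbb{R}^n$, a function $u \in C^1(\bar\Omega, \mathbb{R}^N)$, and a Lagrangian $L \in C^2(\bar\Omega\times\mathbb{R}^N\times\mathbb{R}^{nN})$ be given. We now introduce a 1-parameter family of diffeomorphisms $\eta_\varepsilon : \Omega \to \Omega_\varepsilon$, $\varepsilon \in (-\varepsilon_0, \varepsilon_0)$, where $\Omega_\varepsilon = \eta_\varepsilon(\Omega) \subset \mathbb{R}^n$ is the family of deformations of $\Omega$ under $\eta_\varepsilon$ and $\eta_0 = \mathrm{id}$ (see Figure 8.1).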
Fig. 8.1
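Suppose
$$\frac{\partial}{\partial\varepsilon}\eta_\varepsilon\Big|_{\varepsilon=0} = \bar X(x).$$
Then the deformation is
$$y = \eta_\varepsilon(x) = x + \varepsilon\bar X(x) + o(\varepsilon).$$
Moreover, we define $v : \mathbb{R}^n\times(-\varepsilon_0, \varepsilon_0) \to \mathbb{R}^N$ satisfying $v(x, 0) = u(x)$. This induces a family of functions $v_\varepsilon = v(\cdot, \varepsilon) : \Omega_\varepsilon \to \mathbb{R}^N$, $(y, \varepsilon) \in \Omega_\varepsilon\times(-\varepsilon_0, \varepsilon_0)$. Suppose
$$\frac{\partial}{\partial\varepsilon}v_\varepsilon(x)\Big|_{\varepsilon=0} = \varphi(x).$$
In our previously discussed variational problems, all functions in the domain of the functional are defined over the same underlying region $\Omega$. However, this requirement is unnecessary; we can indeed allow the functions in M to have different domains $\Omega$. To emphasize the dependence of the functional on the region, we shall denote
$$I(u, \Omega) = \int_\Omega L(x, u(x), \nabla u(x))\,dx.$$
Consequently, on the family of functions $\{v_\varepsilon, \Omega_\varepsilon\}$, $I$ takes the values
$$\Phi(\varepsilon) = I(v_\varepsilon, \Omega_\varepsilon) = \int_{\Omega_\varepsilon} L(y, v_\varepsilon(y), \nabla v_\varepsilon(y))\,dy
= \int_\Omega L(\eta_\varepsilon(x), v_\varepsilon(\eta_\varepsilon(x)), \nabla_y v_\varepsilon(\eta_\varepsilon(x)))\,\det\left(\frac{\partial\eta_\varepsilon(x)}{\partial x}\right)dx.$$
Since
$$\frac{d}{d\varepsilon}\det\left(\frac{\partial\eta_\varepsilon(x)}{\partial x}\right)\Big|_{\varepsilon=0} = \operatorname{div}(\bar X),$$
when $u \in C^2$, we have the following Noether's identity:
$$\Phi'(0) = \int_\Omega [\partial_{x_\alpha}L\,\bar X^\alpha + L_{u_i}\varphi^i + L_{p^i_\alpha}\varphi^i_{x_\alpha} + L\operatorname{div}\bar X]\,dx
= \int_\Omega [\operatorname{div}(L\bar X) + L_{u_i}\varphi^i - \partial_\alpha(L_{p^i_\alpha})\varphi^i + \operatorname{div}(L_{p^i}\varphi^i)]\,dx,$$
i.e.
$$\Phi'(0) = \int_\Omega [E_L(u)\varphi + \operatorname{div}(L\bar X + L_{p^i}\varphi^i)]\,dx. \tag{8.1}$$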
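The key step in the derivation, $\frac{d}{d\varepsilon}\det(\partial\eta_\varepsilon/\partial x)|_{\varepsilon=0} = \operatorname{div}\bar X$, can be checked on a concrete planar example. In the sketch below the vector field $\bar X(x, y) = (x^2y,\ x + y^3)$ is a hypothetical choice made for illustration, and $\eta_\varepsilon$ is taken to be exactly $x + \varepsilon\bar X(x)$, i.e. the $o(\varepsilon)$ term is dropped.

```python
def DX(x, y):
    """Jacobian matrix of the sample field X(x, y) = (x^2 * y, x + y^3)."""
    return [[2.0 * x * y, x * x],
            [1.0, 3.0 * y * y]]

def div_X(x, y):
    """Divergence of the sample field, i.e. the trace of DX."""
    return 2.0 * x * y + 3.0 * y * y

def jacobian_det(x, y, eps):
    """det of the Jacobian of eta_eps(x, y) = (x, y) + eps * X(x, y)."""
    d = DX(x, y)
    a, b = 1.0 + eps * d[0][0], eps * d[0][1]
    c, e = eps * d[1][0], 1.0 + eps * d[1][1]
    return a * e - b * c

def ddeps_det(x, y, eps=1e-5):
    """Central difference approximation of d/d(eps) det at eps = 0."""
    return (jacobian_det(x, y, eps) - jacobian_det(x, y, -eps)) / (2.0 * eps)

point = (0.7, -0.4)
numeric = ddeps_det(*point)
exact = div_X(*point)
print(numeric, exact)
```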
2. General local 1-parameter transformation group

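We consider deformations in the “phase space” (the space where both $x$ and $u$ are changing simultaneously). Let $\Omega \subset \mathbb{R}^n$. Let $\{\phi_\varepsilon\} : \Omega\times\mathbb{R}^N \to \mathbb{R}^n\times\mathbb{R}^N$, $|\varepsilon| < \varepsilon_0$, be a family of mappings $\phi_\varepsilon = (Y(\cdot, \cdot, \varepsilon), W(\cdot, \cdot, \varepsilon))$ satisfying
$$Y(x, u, 0) = x, \quad W(x, u, 0) = u.$$
We call $(x, u) \mapsto (Y(x, u, \varepsilon), W(x, u, \varepsilon))$ a local 1-parameter transformation group. Its generating vector field is
$$\frac{d\phi_\varepsilon}{d\varepsilon}\Big|_{\varepsilon=0} = \sum_{\alpha=1}^{n} X^\alpha(x, u)\frac{\partial}{\partial x_\alpha} + \sum_{i=1}^{N} U^i(x, u)\frac{\partial}{\partial u_i}. \tag{8.2}$$
Thus,
$$X(x, u) = \frac{\partial}{\partial\varepsilon}Y(x, u, \varepsilon)\Big|_{\varepsilon=0}, \qquad U(x, u) = \frac{\partial}{\partial\varepsilon}W(x, u, \varepsilon)\Big|_{\varepsilon=0}.$$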
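For any $u \in C^1(\bar\Omega, \mathbb{R}^N)$, we want to convert it into the special 1-parameter family of functions as given above. Let
$$\eta(x, \varepsilon) = Y(x, u(x), \varepsilon), \quad \omega(x, \varepsilon) = W(x, u(x), \varepsilon),$$
then
$$\eta(x, 0) = x, \quad \omega(x, 0) = u(x).$$
We also let
$$\bar X(x) = \frac{\partial\eta(x, \varepsilon)}{\partial\varepsilon}\Big|_{\varepsilon=0} = X(x, u(x)), \qquad \bar U(x) = \frac{\partial\omega(x, \varepsilon)}{\partial\varepsilon}\Big|_{\varepsilon=0} = U(x, u(x)). \tag{8.3}$$
Introducing the variable
$$y = \eta_\varepsilon(x) = \eta(x, \varepsilon) = x + \varepsilon\bar X(x) + o(\varepsilon)$$
as well as the deformed region $\Omega_\varepsilon = \eta_\varepsilon(\Omega)$, then $\eta_\varepsilon : \bar\Omega \to \bar\Omega_\varepsilon$ is a diffeomorphism.
Denote the inverse mapping by $\xi_\varepsilon = \eta_\varepsilon^{-1}$, then we have
$$x = \xi_\varepsilon(y) = y - \varepsilon\bar X(y) + o(\varepsilon) \quad \forall\,y \in \Omega_\varepsilon$$
and another family of mappings
$$v_\varepsilon(y) = v(y, \varepsilon) = \omega(\xi_\varepsilon(y), \varepsilon) = \omega(x, \varepsilon) \quad \forall\,y \in \Omega_\varepsilon.$$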
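Let
$$\varphi(x) = \frac{\partial v(x, \varepsilon)}{\partial\varepsilon}\Big|_{\varepsilon=0} \quad \forall\,x \in \bar\Omega,$$
it follows that
$$\bar U = \frac{\partial v(\eta(x, \varepsilon), \varepsilon)}{\partial\varepsilon}\Big|_{\varepsilon=0} = \varphi(x) + \sum_{\alpha=1}^{n} \partial_\alpha u(x)\,\bar X^\alpha(x),$$
i.e.
$$\varphi(x) = \bar U - \sum_{\alpha=1}^{n} u_{x_\alpha}\bar X^\alpha. \tag{8.4}$$
That said, for a given $u \in C^2(\Omega, \mathbb{R}^N)$, we can turn the local 1-parameter transformation group $\phi_\varepsilon$, whose generating vector field is (8.2), into a special 1-parameter family of functions. Note that $\bar X$ and $\varphi$ are determined by (8.3) and (8.4). Substituting into (8.1), we obtain Noether's identity for a general local 1-parameter transformation group.
It is worth noting that the representation of $I(u, \Omega)$ under the transformation $\{\phi_\varepsilon\}$ is
$$I(v_\varepsilon, \Omega_\varepsilon) = \int_{\Omega_\varepsilon} L(y, v_\varepsilon(y), \nabla v_\varepsilon(y))\,dy.$$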
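Definition 8.1 For any $\Omega_0 \subset \bar\Omega_0 \subset \Omega$, let $(\Omega_0)_\varepsilon = \eta(\Omega_0, \varepsilon)$. If
$$I(v_\varepsilon, (\Omega_0)_\varepsilon) = \text{const. (independent of } \varepsilon\text{)}, \quad \forall\,u \in C^1(\bar\Omega_0, \mathbb{R}^N),$$
then $I$ is said to be invariant under $\{\phi_\varepsilon\}$.
In fact, if at each point the following holds:
$$L(\eta_\varepsilon(x), v_\varepsilon(\eta_\varepsilon(x)), \nabla_y v_\varepsilon(\eta_\varepsilon(x)))\,\det\left(\frac{\partial\eta_\varepsilon(x)}{\partial x}\right) = L(x, u(x), \nabla u(x)),$$
then $I$ is invariant under $\{\phi_\varepsilon\}$. In summary, we arrive at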
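Theorem 8.1 (Noether) Suppose the local 1-parameter transformation group $\{\phi_\varepsilon\}$ is generated by the vector field (8.2). Let
$$I(u) = \int_\Omega L(x, u(x), \nabla u(x))\,dx.$$
Then ∀ $u \in C^2(\Omega, \mathbb{R}^N)$, the Noether identity (8.1) holds, where
$$\begin{cases} \bar X(x) = X(x, u(x)), \\ \bar U(x) = U(x, u(x)), \\ \varphi^i(x) = \bar U^i(x) - \displaystyle\sum_{\alpha=1}^{n} \frac{\partial u^i}{\partial x_\alpha}\bar X^\alpha(x). \end{cases}$$
If the functional $I$ is invariant under $\{\phi_\varepsilon\}$, then
$$E_L(u)\varphi + \operatorname{div}(L\bar X + L_{p^i}\varphi^i) = 0.$$
Here we should substitute $(u(x), \nabla u(x))$ for $(u, p)$ in $L$ and $L_p$.
Furthermore, if $u \in C^2$ is a weak minimum of $I$, then the $(n-1)$-form
$$\nu = \sum_{\alpha=1}^{n}\left[L\bar X^\alpha + \sum_{i=1}^{N} L_{p^i_\alpha}\left(\bar U^i - \sum_{\beta=1}^{n} u^i_{x_\beta}\bar X^\beta\right)\right] dx_1\wedge\cdots\wedge\widehat{dx_\alpha}\wedge\cdots\wedge dx_n$$
is closed, i.e.
$$d\nu = 0.$$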
Corollary 8.1 When $n = 1$, let $\{\phi_\varepsilon\}$ be a local 1-parameter transformation group on $\mathbb{R}^N$, whose generating vector field is $U$ (this means $X = 0$). If $I$ is invariant under $\{\phi_\varepsilon\}$ and $u \in C^2$ is a weak minimum of $I$, then
$$\sum_{i=1}^{N} \bar U^i(u(t))\,L_{p_i}(t, u(t), \dot u(t)) = \text{const.}$$
Corollary 8.2 When $n = 1$, if $L$ is autonomous (i.e. $L$ is independent of $t$), then for the solution $u$ of the E-L equation of $I$, we have
$$\left(L - \sum_{i=1}^{N} L_{p_i}p_i\right)\Big|_{(u, p) = (u(t), \dot u(t))} = \text{const.}$$
Proof In fact, for the local 1-parameter transformation group $\{\phi_\varepsilon\}$ of time translations, the generating vector fields are $X = 1$, $U = 0$, whence by (8.4) $\varphi = -\frac{du}{dt}$.
8.2 The energy–momentum tensor and Noether’s theorem
When $n = 1$, the Legendre transform of the Lagrangian $L$ is the Hamiltonian $H$. The roles of $L$ and $H$ are usually symmetric in formulas. However, when $n > 1$, since $p$ is no longer a vector but a tensor instead, we introduce the following Hamilton energy–momentum tensor (or the energy–momentum tensor for short) to reflect such symmetry:
$$T(x, u, p) = (T_{\alpha\beta}(x, u, p)),$$
where
$$T_{\alpha\beta} = p^i_\alpha L_{p^i_\beta} - \delta_{\alpha\beta}L.$$
In other words, every component of $T$ is the Legendre transform of $L$. In this sense, the energy–momentum tensor is the generalization of the Hamiltonian to higher dimensions.
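Example 8.1 Let
$$L = \frac12\left(\sum_{i,j=0}^{3} g_{ij}p_ip_j - M^2u^2\right) = \frac12(p_0^2 - p_1^2 - p_2^2 - p_3^2 - M^2u^2),$$
where $g_{00} = 1$, $g_{ii} = -1$, $i = 1, 2, 3$, and $g_{\beta\gamma} = 0$ for $\beta \ne \gamma$. Then
$$T_{\beta\alpha} = \sum_{\gamma=0}^{3} g_{\alpha\gamma}p_\beta p_\gamma - \delta_{\alpha\beta}L.$$
In particular,
$$T_{00} = \frac12(p_0^2 + p_1^2 + p_2^2 + p_3^2 + M^2u^2), \quad T_{\beta0} = -p_0p_\beta, \quad \beta = 1, 2, 3.$$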
Using the energy–momentum tensor, we can rewrite Noether’s theorem as follows.
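Theorem 8.2 Let $L \in C^2(\Omega\times\mathbb{R}^N\times\mathbb{R}^{nN}, \mathbb{R}^1)$. Suppose
$$I(u) = \int_\Omega L(x, u(x), \nabla u(x))\,dx$$
is invariant under the local 1-parameter transformation group $\{\phi_\varepsilon\}$. If $u \in C^2(\bar\Omega, \mathbb{R}^N)$ is a weak minimum of $I$, then
$$\sum_{\alpha=1}^{n}\left(\sum_{i=1}^{N} L_{p^i_\alpha}\bar U^i - \sum_{\beta=1}^{n} T_{\beta\alpha}\bar X^\beta\right)_{x_\alpha} = 0.$$
Or simply,
$$\operatorname{div}\left(L_{p^i}\bar U^i - T\cdot\bar X\right) = 0.$$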
Example 8.2 Consider a system of $l$ particles with masses $m_1, \ldots, m_l$, whose space coordinates are $X = (X_1, \ldots, X_l)$, where $X_i = (x_i, y_i, z_i)$ $(1 \le i \le l)$ denotes the space coordinates of the $i$th particle. The kinetic energy is
$$T = \frac12\sum_j m_j|\dot X_j(t)|^2 = \frac12\sum_j m_j(\dot x_j^2 + \dot y_j^2 + \dot z_j^2).$$
The potential energy is
$$V = -k\sum_{i<j}\frac{m_im_j}{|X_i - X_j|} = -k\sum_{i<j}\frac{m_im_j}{[(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2]^{\frac12}},$$
where $k$ is a constant. The Lagrangian is
$$L = T - V,$$
whose associated functional is
$$I(X) = \int_{t_0}^{t_1} L(X(t), \dot X(t))\,dt.$$
• The space translation group $\{S_\varepsilon\}$.
Let $\{S_\varepsilon\}$ be a family of space coordinate transformations depending on the parameter $\varepsilon$:
$$\tilde x_i = x_i + \varepsilon, \quad \tilde y_i = y_i, \quad \tilde z_i = z_i, \quad 1 \le i \le l.$$
Since $L$ is independent of $\varepsilon$, $I$ is invariant under $\{S_\varepsilon\}$, whose generating vector fields are $X = 0$, $U = (e_1, \ldots, e_1)$, where $e_1 = (1, 0, 0)$. Let $P_i^1 = m_i\dot x_i(t)$. It follows from Noether's theorem that
$$\sum_{i=1}^{l} P_i^1 = \text{const.}$$
Likewise, applying the translations to $y$ and $z$ respectively, we obtain $\sum_{i=1}^{l} P_i^2 = \text{const.}$ and $\sum_{i=1}^{l} P_i^3 = \text{const.}$ Denote $P = (P^1, P^2, P^3)$, then
$$\sum_{i=1}^{l} P_i = \sum_{i=1}^{l} m_i\dot X_i(t) = \text{const.},$$
which is the conservation of momentum.

• The time translation group.
Let $\{T_\varepsilon\}$ be a family of space-time coordinate transformations depending on the parameter $\varepsilon$:
$$\tilde t = t + \varepsilon, \quad \tilde x_i = x_i, \quad \tilde y_i = y_i, \quad \tilde z_i = z_i, \quad 1 \le i \le l.$$
Then $I$ is invariant under $\{T_\varepsilon\}$. Now $X = 1$ and $U = 0$, so it follows from Noether's theorem that
$$H = pL_p - L = \sum_j m_j|\dot X_j|^2 - \frac12\sum_j m_j|\dot X_j|^2 - k\sum_{i<j}\frac{m_im_j}{[(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2]^{\frac12}} = T + V = \text{const.},$$
which is the conservation of energy.
• The 1-parameter rotational group $\{R_\varepsilon\}$.
Let $\{R_\varepsilon\}$ be a family of space-time coordinate transformations depending on the parameter $\varepsilon$:
$$\begin{cases} \tilde t = t, \\ \tilde x_i = x_i\cos\varepsilon + y_i\sin\varepsilon, \\ \tilde y_i = -x_i\sin\varepsilon + y_i\cos\varepsilon, \\ \tilde z_i = z_i, \end{cases} \quad 1 \le i \le l.$$
Then $I$ is invariant under $\{R_\varepsilon\}$. Now
$$X = 0, \quad U = (Z_1, Z_2, \ldots, Z_l),$$
where $Z_i = (y_i, -x_i, 0)$, $1 \le i \le l$, then
$$\sum_{i=1}^{l} m_i(y_i\dot x_i - x_i\dot y_i) = \sum_{i=1}^{l} L_{\dot X_i}\cdot Z_i = \text{const.}$$
Likewise, for rotations in the $yz$-plane and the $xz$-plane, we also have similar identities. This is the conservation of angular momentum:
$$\sum_{i=1}^{l} m_iX_i\wedge\dot X_i = \text{const.}$$
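The three conservation laws of this example can be observed in a short simulation. The sketch below is only an illustration under assumptions not in the text: three particles, $k = 1$, arbitrarily chosen masses and initial data, and velocity Verlet time stepping as the integrator. Total momentum and angular momentum should be preserved up to rounding, and the energy $T + V$ up to a small discretization error.

```python
import math

def accelerations(pos, mass, k=1.0):
    """Accelerations derived from V = -k * sum_{i<j} m_i m_j / |X_i - X_j|."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = [pos[j][c] - pos[i][c] for c in range(3)]
                r3 = sum(x * x for x in d) ** 1.5
                for c in range(3):
                    acc[i][c] += k * mass[j] * d[c] / r3
    return acc

def invariants(pos, vel, mass, k=1.0):
    """Return (total momentum, total angular momentum, energy T + V)."""
    n = len(pos)
    P = [sum(mass[i] * vel[i][c] for i in range(n)) for c in range(3)]
    J = [0.0, 0.0, 0.0]
    for i in range(n):
        x, v, m = pos[i], vel[i], mass[i]
        J[0] += m * (x[1] * v[2] - x[2] * v[1])
        J[1] += m * (x[2] * v[0] - x[0] * v[2])
        J[2] += m * (x[0] * v[1] - x[1] * v[0])
    T = 0.5 * sum(mass[i] * sum(c * c for c in vel[i]) for i in range(n))
    V = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            V -= k * mass[i] * mass[j] / math.dist(pos[i], pos[j])
    return P, J, T + V

def simulate(pos, vel, mass, dt=1e-3, steps=500):
    """Velocity Verlet time stepping; returns final positions and velocities."""
    n = len(pos)
    acc = accelerations(pos, mass)
    for _ in range(steps):
        pos = [[pos[i][c] + dt * vel[i][c] + 0.5 * dt * dt * acc[i][c]
                for c in range(3)] for i in range(n)]
        new_acc = accelerations(pos, mass)
        vel = [[vel[i][c] + 0.5 * dt * (acc[i][c] + new_acc[i][c])
                for c in range(3)] for i in range(n)]
        acc = new_acc
    return pos, vel

# Hypothetical sample data.
mass = [1.0, 2.0, 1.5]
pos0 = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 1.5, 0.5]]
vel0 = [[0.0, 0.5, 0.0], [0.0, -0.3, 0.1], [0.2, 0.0, -0.1]]

P0, J0, E0 = invariants(pos0, vel0, mass)
pos1, vel1 = simulate(pos0, vel0, mass)
P1, J1, E1 = invariants(pos1, vel1, mass)
print(P0, P1)
print(J0, J1)
print(E0, E1)
```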
Example 8.3 Consider a gravitational field $u : \mathbb{R}^1\times\mathbb{R}^3 \to \mathbb{R}^N$, where $u = u(x)$ denotes the distribution of the field with $x = (x_0, x_1, x_2, x_3)$, $x_0 = t$ being the time coordinate, and $\bar x = (x_1, x_2, x_3)$ being the space coordinate. Given a Lagrangian $L : (\mathbb{R}^1\times\mathbb{R}^3)\times\mathbb{R}^N\times\mathbb{R}^{4N}$, the corresponding functional is
$$I(u) = \int_\Omega L(x, u(x), \nabla u(x))\,dx.$$
For example, for $N = 1$,
$$L(x, u, p) = \frac12(p_0^2 - p_1^2 - p_2^2 - p_3^2 - M^2u^2).$$
This is the Klein–Gordon field.
According to special relativity, the Lagrangian of any gravitational field remains invariant under a positive Lorentz transformation. A positive Lorentz transformation is a time-orientation-preserving linear transformation on the Minkowski space-time which leaves the quadratic form $x_0^2 - x_1^2 - x_2^2 - x_3^2$ invariant. All positive Lorentz transformations form a group, called the positive Lorentz transformation group.
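Obviously, the Lagrangian $L$ is invariant under the space-time translation group, whose infinitesimal generators are
$$\bar X^\beta = (\delta_{\gamma\beta})_{0\le\gamma\le3}.$$
In addition, $U = 0$, so it follows from Noether's theorem that
$$\sum_{\alpha=0}^{3} (T_{\gamma\alpha})_{x_\alpha} = 0, \quad \gamma = 0, 1, 2, 3,$$
where
$$T_{\gamma\alpha} = \sum_\beta g_{\alpha\beta}\,\partial_\beta u\,\partial_\gamma u - \delta_{\alpha\gamma}L(x, u, \nabla u).$$
Or simply,
$$\operatorname{div} T_\gamma = 0, \quad \gamma = 0, 1, 2, 3,$$
where $T_\gamma = (T_{\gamma\alpha})_{0\le\alpha\le3}$.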
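Choose any $[t_1, t_2]\times B_R(\theta) \subset \mathbb{R}^4$ as our domain of integration, then by the divergence theorem
$$\int_{B_R(\theta)} T_{\gamma0}(t_2, \bar x)\,d\bar x - \int_{B_R(\theta)} T_{\gamma0}(t_1, \bar x)\,d\bar x + \int_{[t_1, t_2]\times\partial B_R(\theta)} \bar T_\gamma(t, \bar x)\cdot\frac{\bar x}{|\bar x|}\,dt\,d\sigma = 0,$$
where $\bar T_\gamma = (T_{\gamma\alpha})_{1\le\alpha\le3}$ denotes the spatial part of $T_\gamma$, $d\sigma$ is the area element of the 2-sphere, and $|\bar x|$ is the norm of $\bar x$. If, as $R \to \infty$, $\bar T_\gamma(t, \bar x)$, $|\bar x| = R$, tends to zero uniformly, and $T_{\gamma0}(t, \cdot)$ is integrable on $\mathbb{R}^3$, then
$$P_\gamma(t) = \int_{\mathbb{R}^3} T_{\gamma0}(t, \bar x)\,d\bar x = \text{const.}$$
In particular, take $\gamma = 0$, then
$$P_0(t) = \int_{\mathbb{R}^3} T_{00}(t, \bar x)\,d\bar x = \int_{\mathbb{R}^3} [L_{p_0}u_{x_0} - L](t, \bar x)\,d\bar x = \int_{\mathbb{R}^3} \frac12(|\partial_tu|^2 + |\nabla u|^2 + M^2u^2)\,d\bar x = \text{const.}, \quad \forall\,t \in \mathbb{R}^1.$$
This shows the conservation of energy. Next, take $\gamma = 1, 2, 3$, then
$$P_\gamma(t) = \int_{\mathbb{R}^3} T_{\gamma0}(t, \bar x)\,d\bar x = \int_{\mathbb{R}^3} [L_{p_0}u_{x_\gamma}](t, \bar x)\,d\bar x = -\int_{\mathbb{R}^3} \partial_tu\,\partial_\gamma u\,d\bar x = \text{const.}, \quad \forall\,t \in \mathbb{R}^1.$$
This shows the conservation of momentum.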
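The positive Lorentz transformation group includes the following six rotational generators:
$$\varepsilon_{\mu\nu} = -\varepsilon_{\nu\mu}, \quad 0 \le \mu \le \nu \le 3.$$
Consider the transformation
$$y_\mu = x_\mu + \sum_{\nu=0}^{3} g_{\nu\nu}\,\varepsilon_{\mu\nu}\,x_\nu,$$
whose corresponding vector field is
$$X_\mu^{\alpha\beta} = \frac{dy_\mu}{d\varepsilon_{\alpha\beta}}\Big|_{\varepsilon_{\alpha\beta}=0} = g_{\beta\beta}x_\beta\delta_{\mu\alpha} - g_{\alpha\alpha}x_\alpha\delta_{\mu\beta}, \quad 0 \le \beta < \alpha \le 3.$$
We call
$$M_{\alpha\beta\nu} = \sum_{\mu=0}^{3} (L_{p_\nu}u_{x_\mu} - \delta_{\mu\nu}L)X_\mu^{\alpha\beta} = \sum_{\mu=0}^{3} T_{\mu\nu}X_\mu^{\alpha\beta}$$
the angular momentum. According to Noether's theorem,
$$\sum_{\mu,\nu=0}^{3} (T_{\mu\nu}X_\mu^{\alpha\beta})_{x_\nu} = \sum_{\nu=0}^{3} \partial_\nu(g_{\beta\beta}T_{\alpha\nu}x_\beta - g_{\alpha\alpha}T_{\beta\nu}x_\alpha) = 0.$$
Hence,
$$\int_{\mathbb{R}^3} M_{\alpha\beta,0}(t, \bar x)\,d\bar x = \text{const.},$$
∀ $(\alpha, \beta)$, $0 \le \alpha < \beta \le 3$. This shows the conservation of angular momentum.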

Remark 8.1 For the electromagnetic field, complex vector fields, the Dirac field,
etc., the conservation laws can be deduced in a similar fashion using Noether’s
theorem. It is clear that Noether’s theorem is of fundamental importance.
8.3 Interior minima
In this section, we discuss another necessary condition for a functional to attain
its minimum. In our earlier discussion, for a given functional, we started with
the dependent variable u to derive a necessary condition for the functional to
achieve its minimum: the E-L equation. However, we can also view this from
a different angle: fix u and let x vary. As a result, we will end up with different
functions, hence the functional varies accordingly. We attempt to describe a
necessary condition for the functional to achieve its minimum from this point of
view.
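We adopt our earlier notation. Let $\eta_\varepsilon : \bar{\Omega} \to \bar{\Omega}$ be a self-diffeomorphism,
\[ y = \eta_\varepsilon(x) = x + \varepsilon \bar{X}(x) + o(\varepsilon), \]
where $\bar{X}|_{\partial\Omega} = 0$. $\eta_\varepsilon$ has the inverse mapping
\[ x = \xi_\varepsilon(y). \]
Given $u \in C^1(\bar{\Omega}, \mathbb{R}^N)$, let
\[ v_\varepsilon(y) = u(x), \]
i.e.
\[ v_\varepsilon = u \circ \xi_\varepsilon = u \circ \eta_\varepsilon^{-1}. \]
Hence,
\[
\begin{aligned}
I(v_\varepsilon, \Omega) &= \int_\Omega L(y, v_\varepsilon(y), \nabla v_\varepsilon(y))\,dy \\
&= \int_\Omega L\left(\eta_\varepsilon(x), u(x), \nabla u(x)\,\frac{\partial \xi_\varepsilon}{\partial y}\right) \det\left(\frac{\partial \eta_\varepsilon}{\partial x}\right) dx.
\end{aligned}
\]
Noting
\[ \frac{\partial \xi_\varepsilon}{\partial y} = \left(\frac{\partial \eta_\varepsilon}{\partial x}\right)^{-1} = I - \varepsilon\,\frac{\partial \bar{X}}{\partial x} + o(\varepsilon), \]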
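so for $u \in C^2$, we have
\[
\begin{aligned}
\frac{d}{d\varepsilon} I(v_\varepsilon, \Omega)\Big|_{\varepsilon=0}
&= \int_\Omega \sum_{\alpha=1}^{n} \left[ -\sum_{\beta=1}^{n} \sum_{i=1}^{N} L_{p^i_\beta}\,\frac{\partial u^i}{\partial x_\alpha}\,\frac{\partial \bar{X}^\alpha}{\partial x_\beta} + L_{x_\alpha} \bar{X}^\alpha + L\,\frac{\partial \bar{X}^\alpha}{\partial x_\alpha} \right] dx \\
&= \int_\Omega \sum_{\alpha=1}^{n} \left[ L_{x_\alpha} - \partial_{x_\alpha}\big(L(x, u(x), \nabla u(x))\big) + \sum_{\beta=1}^{n} \sum_{i=1}^{N} \partial_{x_\beta}\big(L_{p^i_\beta}\,u^i_{x_\alpha}\big) \right] \bar{X}^\alpha\,dx \\
&= \int_\Omega E_L(u) \cdot \left( -\sum_{\alpha=1}^{n} \frac{\partial u}{\partial x_\alpha}\,\bar{X}^\alpha \right) dx.
\end{aligned}
\tag{8.5}
\]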
We now introduce the following.

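Definition 8.2 $u \in C^1(\bar{\Omega}, \mathbb{R}^N)$ is said to be an interior minimum of I if, for all
$\bar{X} \in C_0^1(\Omega, \mathbb{R}^n)$, the functional I satisfies the following equation under the
transformation $v_\varepsilon$:
\[ \frac{d}{d\varepsilon} I(v_\varepsilon, \Omega)\Big|_{\varepsilon=0} = 0. \]
Consequently, when $u \in C^2$ is an interior minimum of I, by (8.5), we have
\[ E_L(u) \cdot \frac{\partial u}{\partial x_\alpha} = 0, \quad \alpha = 1, \ldots, n. \]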
It follows immediately that

Corollary 8.3 A C 2 weak minimum is an interior minimum.

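It is also worth noting that a necessary condition for having an interior
minimum in $C^2$ can be expressed via the energy–momentum tensor
$T_{\alpha\beta} = \sum_{i=1}^{N} L_{p^i_\beta}\,u^i_{x_\alpha} - \delta_{\alpha\beta} L$. For all $u \in C^2(\bar{\Omega}, \mathbb{R}^N)$ and all $\bar{X} \in C_0^1(\Omega, \mathbb{R}^n)$,
integration by parts gives
\[
\begin{aligned}
&\int_\Omega \sum_{\alpha=1}^{n} \left[ \sum_{\beta=1}^{n} \sum_{i=1}^{N} L_{p^i_\beta}\,\frac{\partial u^i}{\partial x_\alpha}\,\frac{\partial \bar{X}^\alpha}{\partial x_\beta} - L_{x_\alpha} \bar{X}^\alpha - L\,\frac{\partial \bar{X}^\alpha}{\partial x_\alpha} \right] dx \\
&\qquad = -\int_\Omega \sum_{\alpha=1}^{n} \left[ \sum_{\beta=1}^{n} \frac{\partial T_{\alpha\beta}(x, u(x), \nabla u(x))}{\partial x_\beta} + L_{x_\alpha}(x, u(x), \nabla u(x)) \right] \bar{X}^\alpha\,dx.
\end{aligned}
\]
Thus, for an interior minimum,
\[ \sum_{\beta=1}^{n} \frac{\partial T_{\alpha\beta}(x, u(x), \nabla u(x))}{\partial x_\beta} + L_{x_\alpha}(x, u(x), \nabla u(x)) = 0, \quad \alpha = 1, \ldots, n. \]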
As an application of the above necessary condition, we have the following.
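Example 8.4 (Conformal mapping condition) Let Ω be a planar region. Suppose
$u \in C^2(\Omega, \mathbb{R}^2)$ is a weak minimum of $D(u) = \int_\Omega |\nabla u|^2\,dx\,dy$, then
\[ \phi(z) := |\partial_x u|^2 - |\partial_y u|^2 - 2i\,\partial_x u \cdot \partial_y u \]
is an analytic function of z = x + iy ((x, y) ∈ Ω).
Proof Notice that $L = \sum_{\alpha=1}^{2} |p_\alpha|^2$ is independent of (x, y) and u, and that
$L_{p_\alpha} = 2 p_\alpha$. Since u is a weak minimum, it is an interior minimum, so according to
the first equality in (8.5), for all $X = (X^1, X^2) \in C_0^1(\Omega, \mathbb{R}^2)$,
\[
\begin{aligned}
0 &= \int_\Omega \Big[ L \operatorname{div} X - \sum_{\alpha=1}^{2} \sum_{\beta=1}^{2} L_{p_\alpha} \cdot \partial_\beta u\,\partial_\alpha X^\beta \Big]\,dx\,dy \\
&= \int_\Omega \big[ |\nabla u|^2 (\partial_x X^1 + \partial_y X^2) - 2\,\partial_x u \cdot (\partial_x u\,\partial_x X^1 + \partial_y u\,\partial_x X^2) \\
&\qquad\quad - 2\,\partial_y u \cdot (\partial_x u\,\partial_y X^1 + \partial_y u\,\partial_y X^2) \big]\,dx\,dy \\
&= -\int_\Omega \big[ (|\partial_x u|^2 - |\partial_y u|^2)(\partial_x X^1 - \partial_y X^2) + 2\,\partial_x u \cdot \partial_y u\,(\partial_x X^2 + \partial_y X^1) \big]\,dx\,dy.
\end{aligned}
\]
Let $\xi = |\partial_x u|^2 - |\partial_y u|^2$, $\eta = 2\,\partial_x u \cdot \partial_y u$. Integrating by parts, we obtain
\[ \int_\Omega \big[ -(\partial_x \xi + \partial_y \eta) X^1 + (\partial_y \xi - \partial_x \eta) X^2 \big]\,dx\,dy = 0. \]
Since $X \in C_0^1(\Omega, \mathbb{R}^2)$ is arbitrary, we have:
\[ \partial_x \xi + \partial_y \eta = 0, \qquad \partial_y \xi - \partial_x \eta = 0. \]
These are precisely the Cauchy–Riemann equations for $\xi - i\eta$. Hence, $\phi = \xi - i\eta$ is analytic.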
8.4∗ Applications
Example 8.5 (Clairaut’s Theorem) Let l be a geodesic on a smooth surface of
revolution S. For every P ∈ l, denote by r(P) the radius of the cross-section at the point P and
by α(P) the angle between l and the meridian at P, then
\[ r(P) \sin \alpha(P) = \text{const.} \]
Proof We parametrize S by
\[ (x, y, z) = (r \cos\theta, r \sin\theta, f(r)). \tag{8.6} \]
A curve on S can be represented by r = r(θ). Since the integrand of the arclength functional
\[ L(r) = \int_{\theta_1}^{\theta_2} \sqrt{r^2 + (1 + f'(r)^2)\,\dot{r}^2}\,d\theta \]
is independent of θ, we have the conservation law:
\[ \frac{r^2}{\sqrt{r^2 + (1 + f'(r)^2)\,\dot{r}^2}} = \text{const.} \]
Notice the differential of the arclength is
\[ ds = \sqrt{r^2 + (1 + f'(r)^2)\,\dot{r}^2}\,d\theta. \]
The conservation law can therefore be rewritten as
\[ r^2\,\frac{d\theta}{ds} = \text{const.} \tag{8.7} \]
We re-parametrize l by the arclength s: r = r(s), θ = θ(s). The equation of l
can then be obtained by substituting r(s) and θ(s) into the r and θ of (8.6), and the
tangent vector of l is (with the dot now denoting d/ds)
\[ a = (\cos\theta, \sin\theta, f')\,\dot{r} + (-\sin\theta, \cos\theta, 0)\,r\dot{\theta}. \]
The cross-section’s equation is
\[ (x, y, z) = (r \cos\theta, r \sin\theta, f(r)), \]
where r = const., and its tangent vector is
\[ b = (-r \sin\theta, r \cos\theta, 0). \]
The inner product of these two vectors is
\[ a \cdot b = r(s)^2\,\dot{\theta}(s). \tag{8.8} \]
However, |a| = 1, |b| = r(s), and the angle between l and the cross-section is
$\beta(s) = \frac{\pi}{2} - \alpha(s)$, i.e.
\[ a \cdot b = r(s) \cos(\beta(s)) = r(s) \sin(\alpha(s)). \]
Combining (8.7) and (8.8), we have
\[ r(s) \sin(\alpha(s)) = \text{const.} \]
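As a quick numerical illustration of Clairaut’s theorem, the following sketch checks the relation on the unit sphere, whose geodesics are great circles. The tilt angle, the sample parameters and the helper names are hypothetical choices made only for this check; they are not taken from the text.

```python
import math

def great_circle(t, e1, e2):
    """Point and unit tangent of the great circle cos(t)*e1 + sin(t)*e2."""
    p = [math.cos(t) * a + math.sin(t) * b for a, b in zip(e1, e2)]
    v = [-math.sin(t) * a + math.cos(t) * b for a, b in zip(e1, e2)]
    return p, v

def clairaut_quantity(p, v):
    """r * sin(alpha) = r * cos(beta), beta being the angle to the parallel."""
    x, y, _ = p
    r = math.hypot(x, y)                 # radius of the cross-section through p
    b_hat = (-y / r, x / r, 0.0)         # unit tangent of that cross-section
    cos_beta = sum(vi * bi for vi, bi in zip(v, b_hat))
    return r * cos_beta

# Orthonormal pair spanning a tilted plane through the origin (assumed tilt).
tilt = 0.7
e1 = (1.0, 0.0, 0.0)
e2 = (0.0, math.cos(tilt), math.sin(tilt))

values = []
for k in range(12):
    p, v = great_circle(0.3 + 0.5 * k, e1, e2)
    values.append(clairaut_quantity(p, v))
print(values)
```

All sampled values agree with cos(tilt), which is the vertical component of the normal to the plane of the great circle; this is the constant in Clairaut’s relation for this particular geodesic.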

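Example 8.6 (Pohozaev’s identity) Given a region $\Omega \subset \mathbb{R}^n$ with smooth boundary
and $g \in C(\mathbb{R}^1)$. Consider the following nonlinear elliptic equation
\[
\begin{cases}
-\Delta u = g(u) & \text{in } \Omega, \\
u = 0 & \text{on } \partial\Omega.
\end{cases}
\tag{8.9}
\]
We now prove that when n ≥ 3, its ($C^1$ weak) solution satisfies the identity
\[ \frac{n-2}{2} \int_\Omega |\nabla u|^2 - n \int_\Omega G(u) + \frac{1}{2} \int_{\partial\Omega} \left|\frac{\partial u}{\partial \nu}\right|^2 (x \cdot \nu)\,d\sigma = 0, \tag{8.10} \]
where G is an anti-derivative of g satisfying G(0) = 0, dσ is the area element of
∂Ω, and ν is the unit outward normal of ∂Ω.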
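Proof The Lagrangian is
\[ L(u, p) = \frac{1}{2} |p|^2 - G(u) \]
and $M = C_0^1(\Omega)$, whose corresponding E-L equation is exactly (8.9).
Without loss of generality, we may assume the origin θ ∈ Ω. Let $u \in C_0^1(\Omega)$
and consider $\Omega_\varepsilon = (1 + \varepsilon)\Omega$ together with the 1-parameter family of
diffeomorphisms $\eta_\varepsilon : \Omega \to \Omega_\varepsilon$, $\eta_\varepsilon(x) = (1 + \varepsilon)x$ (see Figure 8.2).
Fig. 8.2
Let $v_\varepsilon : \Omega_\varepsilon \to \mathbb{R}^1$,
\[ v_\varepsilon(y) = u((1 + \varepsilon)^{-1} y), \]
then $\bar{X} = x$ and
\[ \varphi(x) = \frac{d}{d\varepsilon}\,u((1 + \varepsilon)^{-1} x)\Big|_{\varepsilon=0} = -x \cdot \nabla u. \]
By Noether’s theorem, we have
\[
\begin{aligned}
\frac{d}{d\varepsilon} I(v_\varepsilon, \Omega_\varepsilon)\Big|_{\varepsilon=0}
&= \int_\Omega \operatorname{div}(L \bar{X} + L_p \varphi)\,dx \\
&= \int_\Omega \operatorname{div}\left[ \left( \frac{1}{2} |\nabla u|^2 - G(u) \right) x - \nabla u\,(x \cdot \nabla u) \right] dx.
\end{aligned}
\tag{8.11}
\]
On one hand, from (8.9), it is equal to
\[
\begin{aligned}
&\int_\Omega \left[ \frac{n}{2} |\nabla u|^2 - n G(u) + x \cdot \nabla\left( \frac{1}{2} |\nabla u|^2 - G(u) \right) - \Delta u\,(x \cdot \nabla u) - \nabla u \cdot \nabla(x \cdot \nabla u) \right] dx \\
&\quad = \int_\Omega \left[ \frac{n}{2} |\nabla u|^2 - n G(u) + \frac{x}{2} \cdot \nabla(|\nabla u|^2) - \frac{x}{2} \cdot \nabla(|\nabla u|^2) - |\nabla u|^2 \right] dx \\
&\quad = \int_\Omega \left[ \frac{n-2}{2} |\nabla u|^2 - n G(u) \right] dx,
\end{aligned}
\]
where we used that $-x \cdot \nabla G(u) = -g(u)(x \cdot \nabla u)$ and $-\Delta u\,(x \cdot \nabla u) = g(u)(x \cdot \nabla u)$ cancel.
On the other hand, by Green’s formula, since $u|_{\partial\Omega} = 0$, it follows that
\[ \frac{d}{d\varepsilon} I(v_\varepsilon, \Omega_\varepsilon)\Big|_{\varepsilon=0} = -\frac{1}{2} \int_{\partial\Omega} \left|\frac{\partial u}{\partial \nu}\right|^2 (x \cdot \nu)\,d\sigma. \]
Consequently,
\[ \frac{n-2}{2} \int_\Omega |\nabla u|^2 - n \int_\Omega G(u) + \frac{1}{2} \int_{\partial\Omega} \left|\frac{\partial u}{\partial \nu}\right|^2 (x \cdot \nu)\,d\sigma = 0. \]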

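As a concrete application, let us take a star-shaped region $\Omega \subset \mathbb{R}^n$ (n ≥ 3), that
is, every ray emanating from the origin θ intersects ∂Ω once and only once. Let
\[ g(u) = u^{\frac{n+2}{n-2}}. \]
The equation
\[
\begin{cases}
-\Delta u = u^{\frac{n+2}{n-2}}, & x \in \Omega, \\
u|_{\partial\Omega} = 0
\end{cases}
\tag{8.12}
\]
has no nontrivial solution.
In fact, let $G(u) = \frac{n-2}{2n}\,u^{\frac{2n}{n-2}}$. If u is a solution of (8.12), then multiplying the
equation by u and integrating by parts gives $\int_\Omega |\nabla u|^2 = \int_\Omega u^{\frac{2n}{n-2}} = \frac{2n}{n-2} \int_\Omega G(u)$,
so the first two terms of (8.10) cancel and
\[ \int_{\partial\Omega} \left|\frac{\partial u}{\partial \nu}\right|^2 (x \cdot \nu)\,d\sigma = 0. \]
Since Ω is a star-shaped region, x · ν > 0. Hence, $u|_{\partial\Omega} = \frac{\partial u}{\partial \nu}\big|_{\partial\Omega} = 0$. By the
uniqueness of the initial value problem of the Laplace equation, u ≡ 0.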
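The identity (8.10) can be checked numerically on an explicit solution. The sketch below uses an assumed example that is not from the text: on the unit ball of R^3, the radial function u(r) = sin(πr)/r solves −Δu = π²u with u = 0 on the boundary, so g(u) = π²u and G(u) = π²u²/2. The midpoint rule and all helper names are illustrative choices.

```python
import math

def midpoint(f, a, b, n=20000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

n_dim = 3
lam = math.pi ** 2            # assumed example: -Laplace(u) = lam * u on the unit ball

def u(r):
    return math.sin(math.pi * r) / r

def du(r):
    """Radial derivative u'(r)."""
    return (math.pi * r * math.cos(math.pi * r) - math.sin(math.pi * r)) / r ** 2

def G(s):
    """Antiderivative of g(s) = lam * s with G(0) = 0."""
    return 0.5 * lam * s * s

area = 4.0 * math.pi          # area of the unit sphere
grad_sq = area * midpoint(lambda r: du(r) ** 2 * r ** 2, 0.0, 1.0)
G_int = area * midpoint(lambda r: G(u(r)) * r ** 2, 0.0, 1.0)
boundary = area * du(1.0) ** 2   # x . nu = 1 on the unit sphere

residual = (n_dim - 2) / 2 * grad_sq - n_dim * G_int + 0.5 * boundary
print(grad_sq, G_int, boundary, residual)
```

For this example the three terms are π³, −3π³ and 2π³, so the residual vanishes up to quadrature error.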
Exercises
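1. Let $L(t, u, p) = t^2 (p^2 - \frac{1}{3} u^6)$. Suppose $\varphi_\varepsilon : \mathbb{R}^1 \times \mathbb{R}^N \to \mathbb{R}^1 \times \mathbb{R}^N$ and define
\[ Y(t, u, \varepsilon) = (1 + \varepsilon) t, \qquad W(t, u, \varepsilon) = \frac{u}{(1 + \varepsilon)^{1/2}}. \]
Prove:
(1)
\[ I(u) = \int_0^1 t^2 \left( \dot{u}^2 - \frac{1}{3} u^6 \right) dt \]
is invariant under $\{\varphi_\varepsilon\}$.
(2) If u is a solution of the E-L equation of I, then
\[ \frac{t^3}{3} u^6 + t^3 \dot{u}^2 + t^2 u \dot{u} = \text{const.} \]
2. Let $L = (p + ku)^2$, where k is a constant. Let $\{\varphi_\varepsilon\}$ be defined such that
\[ Y(t, u, \varepsilon) = t + \varepsilon, \qquad W(t, u, \varepsilon) = u + \varepsilon \alpha e^{-kt}, \quad \alpha \in \mathbb{R}^1. \]
(1) Verify that $I(u) = \int_0^1 (\dot{u} + ku)^2\,dt$ is invariant under $\{\varphi_\varepsilon\}$.
(2) If u is a solution of the E-L equation of I, find the conservation law which
u satisfies.
(3) Solve for u by means of the Hamiltonian system.
(4) For this u, verify the conclusion of (2).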
Lecture 9
Direct methods
9.1 The Dirichlet’s principle and minimization method
Before the 20th century, the calculus of variations was largely based on the study
of E-L equations. Just as in mathematical analysis, where finding the extrema of a
function is usually turned into solving the equation for its critical points, finding
the extrema of a functional can also be turned into solving the corresponding E-L
equation.
E-L equations are differential equations. When n = 1, these are ordinary
differential equations (or systems of ordinary differential equations); only under
some special circumstances can one find their analytic solutions. When n > 1,
E-L equations are partial differential equations, and the circumstances in which one
is able to find analytic solutions are extremely rare.
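In the 19th century, driven by the studies of electromagnetism and complex
variables, people were seeking solutions of the Laplace equation
\[
\begin{cases}
\Delta u = 0 & \text{in } \Omega, \\
u|_{\partial\Omega} = \varphi,
\end{cases}
\tag{9.1}
\]
and the Poisson equation
\[
\begin{cases}
\Delta u = f & \text{in } \Omega, \\
u|_{\partial\Omega} = \varphi.
\end{cases}
\tag{9.2}
\]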
The Riemann (conformal) mapping theorem is a famous example: for a non-empty
simply connected open domain $\Omega \subset \mathbb{C}$, $\Omega \neq \mathbb{C}$, there exists a biholomorphic
mapping f (i.e. a bijective holomorphic mapping whose inverse is also holomorphic)
from Ω onto the open unit disk. The fact that f is biholomorphic implies that it is
conformal. To establish the existence of the conformal mapping, Riemann turned
this into a boundary value problem of the harmonic equation (9.1). He noticed
that (9.1) is the E-L equation of the Dirichlet integral (regarded as a functional)
\[ D(u) = \int_\Omega |\nabla u(x)|^2\,dx \tag{9.3} \]
on the set $M = \{u \in C^1(\bar{\Omega}) \mid u|_{\partial\Omega} = \varphi\}$; the solution of (9.1) can thus be
obtained by finding the minimum of D.
This idea opened up a new path in solving partial differential equations. If
a differential equation is the E-L equation of some functional, then solving the
partial differential equation can be turned into finding the extrema of the corre-
sponding functional.
In order to prove the existence of solutions to the boundary value problem of
the harmonic equation (9.1), one is led to prove that the Dirichlet integral achieves its
minimum on M. However, why does such a minimum exist? Riemann’s argument
was based on the following Dirichlet’s principle, a widely accepted statement back
in the mid-19th century.
Dirichlet’s principle Since D is bounded below, it “must” attain its infimum.
That is, “there exists” u such that D achieves its minimum at u.
“Rationale”: Choose a sequence $\{u_n\} \subset M$ such that $D(u_n) \to \inf_{u \in M} D(u)$.
Since $\{u_n\}$ is bounded, there exists a convergent subsequence
$u_{n_k} \to u_0$. This $u_0$ is then the desired solution: $D(u_0) = \min_{u \in M} D(u)$.
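This argument is clearly flawed in a modern reader’s eyes. Shortly after the
publication of the Riemann mapping theorem, a heated debate began about the validity of
“Dirichlet’s principle”. In 1870, Weierstrass constructed the following
counterexample. Consider the following extreme-value problem:
\[ I(u) = \int_{-1}^{1} x^2 u'^2\,dx, \qquad M = \{u \in C^1[-1, 1] \mid u(-1) = -1,\ u(1) = 1\}. \]
(1) $\inf_{u \in M} I = 0$. In fact, I ≥ 0. Letting
\[ u_\varepsilon(x) = \frac{\arctan \frac{x}{\varepsilon}}{\arctan \frac{1}{\varepsilon}}, \quad \forall\, \varepsilon > 0, \]
it follows that
\[ I(u_\varepsilon) < \int_{-1}^{1} (x^2 + \varepsilon^2)\,u_\varepsilon'^2\,dx = \int_{-1}^{1} \frac{\varepsilon^2 (x^2 + \varepsilon^2)^{-1}}{(\arctan \frac{1}{\varepsilon})^2}\,dx = \frac{2\varepsilon}{\arctan \frac{1}{\varepsilon}} \to 0, \quad \text{as } \varepsilon \to 0. \]
(2) If $u_0$ is a minimum of I on M, then $I(u_0) = 0$, so $u_0' \equiv 0$ and $u_0$ = const. This contradicts
the boundary condition!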
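The estimate in (1) can be checked numerically. The following is a minimal sketch, assuming a composite midpoint rule with a hypothetical number of sample points; it evaluates I(u_ε) for a few values of ε and compares it with the bound 2ε / arctan(1/ε) from the text.

```python
import math

def I_of_u_eps(eps, n=40000):
    """Midpoint-rule value of I(u_eps) = int_{-1}^{1} x^2 u_eps'(x)^2 dx."""
    A = math.atan(1.0 / eps)
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        du = eps / ((x * x + eps * eps) * A)   # derivative of arctan(x/eps)/A
        total += x * x * du * du
    return h * total

def bound(eps):
    """Upper bound 2*eps / arctan(1/eps) derived in the text."""
    return 2.0 * eps / math.atan(1.0 / eps)

results = [(eps, I_of_u_eps(eps), bound(eps)) for eps in (1.0, 0.1, 0.01)]
for eps, val, b in results:
    print(eps, val, b)
```

The computed values decrease towards 0 and stay below the bound, while no admissible function attains the value 0.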
Any reader with a rigorous background in mathematical analysis understands
that it is in general not true that every minimizing sequence contains a
subsequence converging to a minimizer.
Given a topological space X, suppose $f : X \to \mathbb{R}^1$ is bounded below, that
is, ∃ M > 0 such that f(x) > −M for all x ∈ X; hence $m = \inf_{x \in X} f(x)$ exists. Let
$\{x_j\} \subset X$ be a minimizing sequence, i.e. $f(x_j) \to m$. Under what condition
does $\{x_j\}$ have a subsequence converging to a minimizer?
Denote $f_t = \{x \in X \mid f(x) \le t\}$, $\forall\, t \in \mathbb{R}^1$. If
\[ \exists\, t > m \text{ such that } f_t \text{ is “sequentially compact”}, \tag{9.4} \]
then for N sufficiently large, $\{x_j \mid j \ge N\} \subset f_t$. Hence, there exists a
subsequence $x_{j_k} \to x_0 \in f_t$.
As to whether such $x_0$ is a minimum, we require additionally
\[ f(x_0) \le \liminf_{k \to \infty} f(x_{j_k}). \]
The condition
\[ x_n \to x_0 \implies f(x_0) \le \liminf_{n \to \infty} f(x_n) \]
is called the sequential lower semicontinuity of f. (Sometimes, when no confusion
arises, we simply refer to it as lower semicontinuity.)
In summary, if $f : X \to \mathbb{R}^1$ is sequentially lower semicontinuous and if
∃ t > m and ∃ a sequentially compact set $K_t \supset f_t := \{x \in X \mid f(x) \le t\}$, then f
achieves its minimal value on X.
Next, we will apply the above abstract theorem to solve variational problems.
In finite dimensional Euclidean spaces, for a lower semicontinuous function
f, if we impose the coercive condition:
\[ f(x) \to +\infty, \quad \text{as } \|x\| \to \infty, \tag{9.5} \]
it then follows that ∃ t > m such that $f_t$ is “sequentially compact”, which is well
known in mathematical analysis.
This is however a completely different matter in infinite dimensional spaces.
For example, in an infinite dimensional Hilbert space, consider the norm-square
function $f(x) = \|x\|^2$. It is clearly coercive, but for every t > 0 the set
\[ f_t = \{x \mid \|x\| \le \sqrt{t}\} \]
is not sequentially compact with respect to the norm-topology! As an example, in
$X = L^2[0, \pi]$, $f_t$ is not sequentially compact, since any two distinct terms of the
sequence $\{\sqrt{\tfrac{2t}{\pi}} \sin nx\}_{n=1}^{\infty}$ are at distance $\sqrt{2t}$ from each other, so the sequence
has no convergent subsequence whatsoever.
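The lack of compactness can be seen numerically: the functions in the sequence above are mutually orthogonal with the same norm, so every pair is the same distance apart. The sketch below is an illustration under assumptions (t = 1, a midpoint rule, the first five terms), not part of the original argument.

```python
import math

def l2_distance(n, m, t=1.0, samples=20000):
    """L^2[0, pi] distance between sqrt(2t/pi) sin(nx) and sqrt(2t/pi) sin(mx)."""
    c = math.sqrt(2.0 * t / math.pi)
    h = math.pi / samples
    total = 0.0
    for i in range(samples):
        x = (i + 0.5) * h
        d = c * (math.sin(n * x) - math.sin(m * x))
        total += d * d
    return math.sqrt(h * total)

# All pairwise distances among the first five terms of the sequence.
distances = [l2_distance(n, m) for n in range(1, 6) for m in range(n + 1, 6)]
print(distances)
```

Every distance equals sqrt(2t), so no subsequence can be a Cauchy sequence in the norm topology.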
If we equip M with the $C^1$-topology, then on one hand, from the boundedness
of $\{D(u_n)\}$, it is not possible to conclude that $\|u_n\|_{C^1}$ is bounded. On the other hand,
it is not difficult to construct an example where the sequence is $C^1$-bounded, but
has no $C^1$-convergent subsequence. This is the reason why the previous “rationale”
is invalid.
9.2 Weak convergence and weak-∗ convergence
In finite dimensional Euclidean space $\mathbb{R}^m$, we usually write
$x_n = (\xi_1^n, \xi_2^n, \ldots, \xi_m^n) \to x = (\xi_1, \xi_2, \ldots, \xi_m)$ for $x_n$ converging to x. The meaning
of convergence (“→”) is understood in the sense of norm convergence:
\[ \|x_n - x\| = \left( \sum_{i=1}^{m} (\xi_i^n - \xi_i)^2 \right)^{1/2} \to 0. \]
This can also be interpreted as coordinate-wise convergence: $\xi_i^n \to \xi_i$,
i = 1, . . . , m, since these two notions of convergence are equivalent.
However, in infinite dimensional spaces, these two notions of convergence are
vastly different! For example, in $l^2$, given $x_n = (\xi_1^n, \xi_2^n, \ldots)$, $x = (\xi_1, \xi_2, \ldots)$,
where $\sum_{i=1}^{\infty} |\xi_i^n|^2 < \infty$ and $\sum_{i=1}^{\infty} |\xi_i|^2 < \infty$, we say $x_n$ converges to x in norm
(denoted $x_n \to x$) if
\[ \|x_n - x\| = \left( \sum_{i=1}^{\infty} (\xi_i^n - \xi_i)^2 \right)^{1/2} \to 0, \]
whereas $x_n$ converges to x coordinate-wise means
\[ \xi_i^n \to \xi_i, \quad i = 1, 2, 3, \ldots. \]
If we take $\xi_i^n = \delta_{in}$, n = 1, 2, . . . , then $x_n$ converges to 0 coordinate-wise, but it
does not converge in norm, since $\|x_n\| = 1$ for all n.
If in $l^2$ we use “coordinate-wise convergence” to define convergence, then every
bounded sequence indeed contains a convergent subsequence. This is the same
as in finite dimensional spaces, and the proof is based on Cantor’s “diagonal
method”. In fact, let $\{x_n\}$ be bounded, i.e. ∃ M > 0 such that $\sum_{i=1}^{\infty} (\xi_i^n)^2 \le M^2$ for all n.
It follows that $|\xi_i^n| \le M$ ∀ i, ∀ n. Hence,
\[
\begin{aligned}
&\exists\, \{n_k^1 \mid k \in \mathbb{N}\} \subset \mathbb{N} \text{ such that } \xi_1^{n_k^1} \to \xi_1^0, \\
&\exists\, \{n_k^2 \mid k \in \mathbb{N}\} \subset \{n_k^1 \mid k \in \mathbb{N}\} \text{ such that } \xi_2^{n_k^2} \to \xi_2^0, \\
&\cdots, \\
&\exists\, \{n_k^l \mid k \in \mathbb{N}\} \subset \{n_k^{l-1} \mid k \in \mathbb{N}\} \text{ such that } \xi_l^{n_k^l} \to \xi_l^0, \\
&\cdots
\end{aligned}
\]
Since ∀ N, ∃ K = K(N) such that for all k > K,
\[ \sum_{i=1}^{N} |\xi_i^0|^2 \le \sum_{i=1}^{N} |\xi_i^{n_k^k}|^2 + 1 \le M^2 + 1, \]
letting N → ∞ we obtain
\[ \sum_{i=1}^{\infty} |\xi_i^0|^2 \le M^2 + 1. \]
We have thus established that the diagonal subsequence $x_{n_k^k}$ “coordinate-wise
converges to” $x^0 = (\xi_1^0, \xi_2^0, \ldots, \xi_k^0, \ldots) \in l^2$.
Weak convergence and weak-∗ convergence both stem from the idea of
“coordinate-wise convergence”.
Definition 9.1 Let X be a normed linear space. The sequence $\{x_n\} \subset X$ is
said to converge weakly to x, denoted $x_n \rightharpoonup x$, if for any $x^* \in X^*$, we have
$\langle x^*, x_n - x \rangle \to 0$, where $X^*$ is the dual space of X.
Let $X^*$ be the dual space of a normed linear space X. A sequence $\{x_n^*\} \subset X^*$
is said to converge to $x^*$ in the weak-∗ topology, denoted $x_n^* \rightharpoonup^* x^*$, if for any
x ∈ X, we have $\langle x_n^* - x^*, x \rangle \to 0$.
Remark 9.1 In fact, on $X^*$, we have both the notion of weak convergence and
the notion of weak-∗ convergence. By weak convergence $x_n^* \rightharpoonup x^*$, we mean for
any $x^{**} \in X^{**}$, $\langle x^{**}, x_n^* - x^* \rangle \to 0$; by weak-∗ convergence, we mean for any
x ∈ X, $\langle x_n^* - x^*, x \rangle \to 0$. Since we have the continuous embedding $X \hookrightarrow X^{**}$,
weak convergence implies weak-∗ convergence.
It is evident that norm convergence implies both weak and weak-∗ convergence,
but not vice versa.
Example 9.1 In $L^2(-\infty, \infty)$, choose any nonzero function φ(t) with compact
support, let $\varphi_n(t) = \varphi(t + n)$, then $\varphi_n \rightharpoonup 0$, but $\|\varphi_n\| = \|\varphi\| \neq 0$.
Example 9.2 Note that $L^2[0, 2\pi]$ is self-dual, its dual space is again $L^2[0, 2\pi]$.
Consider the sequence $\{\sin(nt)\} \subset L^2[0, 2\pi]$. According to the Riemann–Lebesgue
lemma, $\forall\, f \in L^1[0, 2\pi]$,
\[ \int_0^{2\pi} f(t) \sin(nt)\,dt \to 0, \]
from which we may conclude:
(1) Take $X = L^p[0, 2\pi]$, 1 ≤ p < ∞, and regard the sequence
$\{\sin(nt)\} \subset L^{p'}[0, 2\pi] = (L^p[0, 2\pi])^*$, then
\[ \sin(nt) \rightharpoonup^* 0, \quad \text{as } n \to \infty. \]
(2) Take $X = L^p[0, 2\pi]$, 1 ≤ p < ∞, and regard the sequence
$\{\sin(nt)\} \subset L^\infty[0, 2\pi] \subset X$, $X^* = L^{p'}[0, 2\pi] \subset L^1[0, 2\pi]$, $1 < p' \le \infty$, then
\[ \sin(nt) \rightharpoonup 0, \quad \text{as } n \to \infty. \]
For 1 < p < ∞, $L^p[0, 2\pi]$ is reflexive, so weak and weak-∗ convergence coincide.
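A small numerical experiment makes the contrast between weak and norm convergence in Example 9.2 concrete. The sketch below assumes the test function f(t) = t and a midpoint rule (both hypothetical choices); it computes the pairing of f with sin(nt) and the L^2 norm of sin(nt) for growing n.

```python
import math

def integrate(f, a, b, n=50000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

two_pi = 2.0 * math.pi

def f(t):
    """A fixed test function in L^2[0, 2*pi] (assumed choice)."""
    return t

pairings = {n: integrate(lambda t: f(t) * math.sin(n * t), 0.0, two_pi)
            for n in (1, 4, 16, 64)}
norms = {n: math.sqrt(integrate(lambda t: math.sin(n * t) ** 2, 0.0, two_pi))
         for n in (1, 4, 16, 64)}
print(pairings)
print(norms)
```

For this f the pairing equals −2π/n, which tends to 0, while the norm of sin(nt) stays equal to the square root of π; the sequence converges weakly to 0 but not in norm.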
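More generally, we have

Example 9.3 Let $D = [0, 1]^N$ be the unit hypercube in $\mathbb{R}^N$. Let $\varphi \in L^p(D)$,
1 ≤ p ≤ ∞, and extend it periodically to $\mathbb{R}^N$. Let $\varphi_n(x) = \varphi(nx)$, ∀ n, and
\[ \bar{\varphi} = \int_D \varphi(x)\,dx, \]
then
\[ \varphi_n \rightharpoonup \bar{\varphi} \quad \text{in } L^p(D),\ 1 \le p < \infty, \]
and
\[ \varphi_n \rightharpoonup^* \bar{\varphi} \quad \text{in } L^\infty(D). \]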
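Proof First, without loss of generality, we may assume $\bar{\varphi} = 0$; for otherwise, we
can replace φ by $\tilde{\varphi} = \varphi - \bar{\varphi}$.
Next, notice
\[ \|\varphi_n\|_p^p = \int_D |\varphi(nx)|^p\,dx = \frac{1}{n^N} \int_{nD} |\varphi(y)|^p\,dy = \|\varphi\|_p^p, \quad \forall\, 1 \le p < \infty, \]
and $\|\varphi_n\|_\infty = \|\varphi\|_\infty$.
We now define the set function $\Phi(E) = \int_E \varphi(x)\,dx$, where E is any measurable
set. Φ is σ-additive, and since φ is periodic with mean zero, Φ(x + D) = 0, $\forall\, x \in \mathbb{R}^N$.
For any rectangular hypercube $Q = \prod_{i=1}^{N} (c_i, d_i)$, by translating D inside nQ
without overlapping, we obtain
\[ \left| \int_D \chi_Q\,\varphi_n\,dx \right| = \left| \int_Q \varphi_n\,dx \right| = \frac{1}{n^N} |\Phi(nQ)| \le \frac{N}{n} \int_D |\varphi|\,dx. \]
Thus, for the simple function $\xi = \sum_i \alpha_i \chi_{Q_i}$, $Q_i \cap Q_j = \emptyset$, i ≠ j, we have
\[ \int_D \varphi_n\,\xi\,dx \to 0, \quad n \to \infty. \]
For 1 < p ≤ ∞, the above simple functions form a dense subset in $L^{p'}(D)$
($p' = \frac{p}{p-1}$). That is, $\forall\, f \in L^{p'}(D)$, ∀ ε > 0, ∃ ξ a simple function such that
$\|f - \xi\|_{p'} < \frac{\varepsilon}{2 \|\varphi\|_p}$. For n sufficiently large,
\[ \left| \int_D \varphi_n f\,dx \right| \le \|\varphi_n\|_p\,\|f - \xi\|_{p'} + \left| \int_D \xi \varphi_n\,dx \right| \le \varepsilon. \]
The proof for p = 1 can be obtained by modifying the above argument; we omit it
and leave it as an exercise.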
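Example 9.3 can be illustrated in one dimension (N = 1). The sketch below assumes a hypothetical 1-periodic square wave φ with mean 1/4 and the test function f(x) = x²; both are illustrative choices, not taken from the text. The pairing of f with φ(nx) should approach the mean of φ times the integral of f, that is 1/12.

```python
import math

def phi(x):
    """1-periodic square wave: 1 on [0, 1/4), 0 on [1/4, 1) (assumed choice)."""
    return 1.0 if (x - math.floor(x)) < 0.25 else 0.0

def pairing(n, f, samples=100000):
    """Midpoint-rule value of int_0^1 f(x) * phi(n x) dx."""
    h = 1.0 / samples
    return h * sum(f((i + 0.5) * h) * phi(n * (i + 0.5) * h)
                   for i in range(samples))

def f(x):
    return x * x

mean_phi = 0.25
limit = mean_phi / 3.0        # mean(phi) * int_0^1 x^2 dx = 1/12
values = {n: pairing(n, f) for n in (1, 4, 16, 64)}
print(values, limit)
```

The oscillating functions φ(nx) never settle down pointwise or in norm, yet their averages against a fixed f approach the average of the constant function 1/4.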
9.3 Weak-∗ sequential compactness
In calculus of variations, we hope to utilize weak convergence or weak-∗
convergence to deduce weak sequential compactness or weak-∗ sequential
compactness. In the previous section, we have discussed the fact that in $l^2$, a bounded
sequence has a “coordinate-wise” convergent subsequence. We now extend this
statement as well as its proof to a more abstract theorem.
Theorem 9.1 (Banach–Alaoglu) Let $X^*$ be the dual space of a separable normed
linear space X. Suppose $\{x_n^* \mid n = 1, 2, \ldots\} \subset X^*$ is a norm-bounded sequence:
$M = \sup_n \|x_n^*\| < \infty$, then it has a weak-∗ convergent subsequence.
Proof Since X is separable, it has a countable dense subset $\{x_k \mid k = 1, 2, \ldots\}$.
For $x_1$, since $\{\langle x_n^*, x_1 \rangle\}$ is a bounded sequence, $\{x_n^*\}$ has a subsequence $\{x^*_{n_j^1}\}$ such
that $\langle x^*_{n_j^1}, x_1 \rangle$ converges.
For $x_2$, since $\{\langle x^*_{n_j^1}, x_2 \rangle\}$ is a bounded sequence, $\{x^*_{n_j^1}\}$ has a subsequence $\{x^*_{n_j^2}\}$ such
that $\langle x^*_{n_j^2}, x_2 \rangle$ converges.
Continuing in this fashion and applying the diagonal method, we can choose
a subsequence $\{x^*_{n_j^j}\}$ such that $\langle x^*_{n_j^j}, x_k \rangle$ converges, ∀ k = 1, 2, . . . .
However, since $\{x_k \mid k = 1, 2, \ldots\}$ is dense and $\{x_n^* \mid n = 1, 2, \ldots\} \subset X^*$ is
bounded in norm, for any x ∈ X, the sequence $\{\langle x^*_{n_j^j}, x \rangle\}$ converges.
Define
\[ f(x) = \lim_{j \to \infty} \langle x^*_{n_j^j}, x \rangle. \]
It is clear that f(x) is linear and continuous,
\[ |f(x)| \le \sup_j \|x^*_{n_j^j}\|\,\|x\| \le M \|x\| \quad \forall\, x \in X. \]
Thus, $\exists\, x^* \in X^*$ such that $f(x) = \langle x^*, x \rangle$, ∀ x ∈ X, i.e.
\[ x^*_{n_j^j} \rightharpoonup^* x^*. \]
Consequently, we have the following fundamental result.
Theorem 9.2 Let X be the dual space of a separable Banach space (e.g. a separable
reflexive Banach space). Let E ⊂ X be a non-empty weak-∗ sequentially closed
subset. If $f : E \to \mathbb{R}^1$ is sequentially weak-∗ lower semi-continuous (abbreviated
s.w∗.l.s.c.), and if f is coercive (for x ∈ E, f(x) → +∞ when $\|x\| \to \infty$), then
f attains its minimum on E.
Proof Choose a minimizing sequence $\{x_n\} \subset E$ of f,
\[ \lim_{n \to \infty} f(x_n) = \inf_{x \in E} f(x). \]
Since f is coercive, $\{x_n\}$ is bounded. By Theorem 9.1, $\{x_n\}$ contains a weak-∗
convergent subsequence
\[ x_{n_k} \rightharpoonup^* x_0. \]
By assumption, E is weak-∗ sequentially closed, so $x_0 \in E$.
Next, since f is s.w∗.l.s.c.,
\[ f(x_0) \le \liminf_{k \to \infty} f(x_{n_k}). \]
Thus,
\[ f(x_0) = \inf_{x \in E} f(x). \]
We now return to the Dirichlet integral. On the one hand, from the boundedness
of the Dirichlet integral D(u), it is impossible to deduce boundedness in
the $C^1$-norm; on the other hand, we do not know whether $C^1(\bar{\Omega})$ is the dual space
of some normed linear space. Because of this, the space $C^1(\bar{\Omega})$ is not a proper
choice for verifying the validity of Dirichlet’s principle.
Closely related to the Dirichlet integral D(u) is the following semi-norm:
\[ \|u\| = \left( \int_\Omega |\nabla u|^2\,dx \right)^{1/2}, \]
and the norm
\[ \|u\|_1 = \left( \int_\Omega (|\nabla u|^2 + |u|^2)\,dx \right)^{1/2}. \]
This norm corresponds to the inner product
\[ (u, v) = \int_\Omega (\nabla u \cdot \nabla v + uv)\,dx. \]
Unfortunately, $C^1(\bar{\Omega})$ is not complete with respect to such a norm. We denote its
completion (with respect to the above norm) by $H^1(\Omega)$, which is a Hilbert space.
Correspondingly, the Dirichlet inner product associated with the semi-norm $D(u)^{1/2}$
is
\[ D(u, v) = \int_\Omega \nabla u \cdot \nabla v\,dx; \]
they are related via D(u) = D(u, u).

When Ω is a bounded open set, on $C_0^1(\bar{\Omega})$, we can extend Poincaré’s inequality
encountered in Lecture 3 from one-variable functions to multivariable functions.
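Lemma 9.1 (Poincaré’s inequality) Let $\Omega \subset \mathbb{R}^n$ be a bounded open set,
$u \in C_0^1(\bar{\Omega})$, then ∃ C = C(Ω) such that
\[ \int_\Omega |u|^2\,dx \le C \int_\Omega |\nabla u|^2\,dx. \]
Proof Choose a hypercube $D \subset \mathbb{R}^n$ such that Ω ⊂ D. For every $\varphi \in C_0^\infty(\Omega)$, let
\[
\tilde{\varphi}(x) =
\begin{cases}
\varphi(x), & x \in \Omega, \\
0, & x \notin \Omega.
\end{cases}
\]
Then
\[ \int_\Omega |\varphi|^2 = \int_D |\tilde{\varphi}|^2, \qquad \int_\Omega |\nabla \varphi|^2 = \int_D |\nabla \tilde{\varphi}|^2. \]
Denote $x = (x_1, \tilde{x})$. By the single variable Poincaré’s inequality, we have
\[ \int_J |\tilde{\varphi}(x_1, \tilde{x})|^2\,dx_1 \le C \int_J |\partial_{x_1} \tilde{\varphi}(x_1, \tilde{x})|^2\,dx_1, \]
where J is the projection of D onto the $x_1$-axis. Integrating with respect to
$\tilde{x}$ yields
\[ \int_D |\tilde{\varphi}|^2\,dx \le C \int_D |\nabla \tilde{\varphi}|^2\,dx, \]
i.e.
\[ \int_\Omega |\varphi|^2\,dx \le C \int_\Omega |\nabla \varphi|^2\,dx. \]
By taking the limit, the inequality holds $\forall\, u \in C_0^1(\bar{\Omega})$.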
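The one-variable inequality used in the proof can be probed numerically. The following sketch works on the interval (0, 1), where the sharp constant is known to be 1/π² (a standard fact that is not stated in the text); the four test functions and the midpoint rule are hypothetical choices.

```python
import math

def integrate(f, a, b, n=20000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Pairs (u, u') of functions vanishing at 0 and 1; assumed test functions.
samples = {
    "x(1-x)": (lambda x: x * (1 - x), lambda x: 1 - 2 * x),
    "x^2(1-x)": (lambda x: x * x * (1 - x), lambda x: 2 * x - 3 * x * x),
    "sin(pi x)": (lambda x: math.sin(math.pi * x),
                  lambda x: math.pi * math.cos(math.pi * x)),
    "sin(3 pi x)": (lambda x: math.sin(3 * math.pi * x),
                    lambda x: 3 * math.pi * math.cos(3 * math.pi * x)),
}

C = 1.0 / math.pi ** 2        # sharp Poincare constant on (0, 1)
ratios = {}
for name, (u, du) in samples.items():
    num = integrate(lambda x: u(x) ** 2, 0.0, 1.0)
    den = integrate(lambda x: du(x) ** 2, 0.0, 1.0)
    ratios[name] = num / den  # ratio of int u^2 to int u'^2
print(ratios, C)
```

Each ratio stays below C, with equality (up to rounding) for sin(πx), the first Dirichlet eigenfunction.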

This shows that $D(u)^{1/2}$ is a norm on $C_0^1(\bar{\Omega})$.
We denote the closure of $C_0^1(\bar{\Omega})$
in $H^1(\Omega)$ by $H_0^1(\Omega)$. It is a closed subspace of the Hilbert space $H^1(\Omega)$. We
can therefore regard D(v), $D(v_0, v)$, $\forall\, v \in H_0^1(\Omega)$, as the continuous extensions
of the Dirichlet integral and the Dirichlet inner product respectively. For a given
$v_0 \in C^1(\bar{\Omega})$, on $M = v_0 + H_0^1(\Omega)$, the Dirichlet integral is defined as
\[ D(u) = D(v_0) + 2 D(v_0, v) + D(v), \]
where $u = v_0 + v \in M$.
If we regard the so-defined D(u) as a functional of $v \in H_0^1(\Omega)$, then
1. D(u) is coercive.
We wish to prove
\[ D(v) \to \infty \implies D(u) \to \infty. \]
It follows from Schwarz’s inequality and Young’s inequality that
\[ |D(v_0, v)| \le D(v_0) + \frac{1}{4} D(v), \]
which implies
\[ D(u) \ge \frac{1}{2} D(v) - D(v_0). \]
This confirms coerciveness.
2. D(u) is sequentially weakly lower semicontinuous.
Let $u_j = v_0 + v_j$ and $u = v_0 + v$, then
\[ u_j \rightharpoonup u \ (H^1(\Omega)) \iff v_j \rightharpoonup v \ (H_0^1(\Omega)). \]
Notice
\[ |D(v_0, \varphi)| \le D(v_0)^{1/2}\,D(\varphi)^{1/2} \quad \forall\, \varphi \in H_0^1(\Omega), \]
so $\varphi \mapsto D(v_0, \varphi)$ is a continuous linear functional on $H_0^1(\Omega)$. Thus,
\[ D(v_0, v_j) \to D(v_0, v). \]
Likewise,
\[ D(v, v_j) \to D(v, v). \]
By Schwarz’s inequality, we obtain
\[ D(v) = \lim_{j \to \infty} D(v, v_j) \le \liminf_{j \to \infty} D(v)^{1/2}\,D(v_j)^{1/2}, \]
i.e.
\[ D(v) \le \liminf_{j \to \infty} D(v_j). \]
That is,
\[ D(u) \le \liminf_{j \to \infty} D(u_j). \]
$H_0^1(\Omega)$ is a Hilbert space, hence is self-dual; that is, $H_0^1(\Omega)$ is the dual space of the
Banach space $H_0^1(\Omega)$ itself (this can also be verified directly). Furthermore, $H_0^1(\Omega)$
is separable (the detailed proof will be given in the next lecture). We now apply
Theorem 9.2 to deduce that the Dirichlet integral indeed attains its minimum on
$H_0^1(\Omega)$, which affirms Dirichlet’s principle.
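A finite dimensional analogue shows the minimization at work. The sketch below is an illustration under assumptions, not the method of the text: it discretizes the Dirichlet integral on a uniform grid of [0, 1] with hypothetical boundary values, and lowers the energy by Gauss–Seidel sweeps, each of which minimizes the energy with respect to one interior value.

```python
def dirichlet_energy(u, h):
    """Discrete Dirichlet integral: sum of ((u[i+1] - u[i]) / h)^2 * h."""
    return sum((u[i + 1] - u[i]) ** 2 for i in range(len(u) - 1)) / h

def relax(u, sweeps):
    """Gauss-Seidel sweeps; every update minimizes the energy in one variable."""
    u = list(u)
    h = 1.0 / (len(u) - 1)
    energies = []
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1])
        energies.append(dirichlet_energy(u, h))
    return u, energies

N = 20
a, b = -1.0, 2.0                        # boundary data (assumed values)
start = [a] + [0.0] * (N - 1) + [b]     # admissible, far from the minimizer
u_min, energies = relax(start, sweeps=2000)
linear = [a + (b - a) * i / N for i in range(N + 1)]
max_gap = max(abs(p - q) for p, q in zip(u_min, linear))
print(energies[0], energies[-1], max_gap)
```

The energies decrease monotonically to (b − a)² and the iterates approach the linear interpolant, which is the discrete harmonic function with the given boundary values. In finite dimensions compactness is automatic; the point of this lecture is that the same scheme needs weak compactness in $H_0^1$.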
Remark 9.2 To verify Dirichlet’s Principle, there is also a more direct method -
orthogonal projection. Geometrically speaking, it is equivalent to minimizing the
distance from a hyperplane to a given point outside the hyperplane (we refer to
Lecture 12). From which, we will derive the Riesz Representation Theorem and
the self-duality of a Hilbert space.
Remark 9.3 In the above example, we obtain the solution u0 ∈ H 1 (Ω). However,
we do not yet know if it is differentiable, nor do we know if it belongs to C 2 . In
our on-going discussions, we must address in what sense will u0 be a solution of
the harmonic equation.
As for a more general functional I and its boundary conditions, in order to
establish the existence of solutions via the minimizing sequence method, we must
address the following:
1. a suitable function space, which is a reflexive Banach space, or the dual
space of a separable Banach space;
2. whether the functional I is sequentially weak-∗ lower semi-continuous with
respect to the underlying topology;
3. whether I is coercive with respect to the topology;
4. in what sense the so-obtained minimal solution will indeed be a solution of
the original equation (satisfying the weak form of the E-L equation);
5. whether the minimal solution has the differentiability required
by the equation.
9.4∗ Reflexive spaces and the Eberlein–Šmulian theorem
In functional analysis, there is an in-depth study of weak sequential compactness.
It is possible to deduce weak sequential compactness directly from
reflexivity, avoiding the separability assumption. (In this case, there is no
difference between being weakly sequentially compact and being weakly-∗
sequentially compact.)
We recall the definition of a reflexive space. The dual space $X^*$ of a Banach
space X is also a Banach space. We can also consider the dual space $(X^*)^* = X^{**}$
of $X^*$, and we call it the second dual of X.
Notice ∀ x ∈ X, we can define a functional $F_x$ on $X^*$ via
\[ \langle F_x, x^* \rangle = \langle x^*, x \rangle \quad \forall\, x^* \in X^*. \]
$F_x$ is linear on $X^*$ and satisfies
\[ |\langle F_x, x^* \rangle| \le \|x^*\|\,\|x\|, \]
so $\|F_x\| \le \|x\|$. We call $T : x \mapsto F_x$ the natural mapping. In fact, $T : X \to X^{**}$ is a
continuous embedding. By the Hahn–Banach theorem, for x ≠ 0, $\exists\, x^* \in X^*$ such that $\|x^*\| = 1$,
$\langle x^*, x \rangle = \|x\|$; it follows that
\[ \|x\| = \langle x^*, x \rangle = \langle F_x, x^* \rangle \le \|F_x\|. \]
Thus, T is an isometry, that is, X is isometric to a closed linear subspace of its
second dual $X^{**}$.
Definition 9.2 A Banach space is said to be reflexive if the above isometry T is
surjective.
The following result on separability is due to Banach.
Theorem 9.3 (Banach) Let X be a normed linear space. If the dual space $X^*$ is
separable, then so is X itself.
Proof 1. Denote by $S_1^*$ the unit sphere in $X^*$, then $S_1^*$ is separable. In fact, let
$\{x_n^*\} \subset X^* \setminus \{0\}$ be a countable dense subset of $X^*$. Let $y_n^* = \frac{x_n^*}{\|x_n^*\|}$, then $\{y_n^*\}$ is a countable
dense subset of $S_1^*$. To see this, $\forall\, x^* \in S_1^*$, $\exists\, x^*_{n_j}$ such that $\|x^*_{n_j} - x^*\| \to 0$,
hence $\|x^*_{n_j}\| \to 1$ and
\[ \|y^*_{n_j} - x^*\| \le \left| 1 - \frac{1}{\|x^*_{n_j}\|} \right| \|x^*_{n_j}\| + \|x^*_{n_j} - x^*\| \to 0. \]
This shows $\{y_n^*\}$ is dense in $S_1^*$.
2. By the definition of the norm on $X^*$, there exists $x_n \in X$ such that $\|x_n\| = 1$ and
$\langle y_n^*, x_n \rangle \ge 1/2$. Let $X_0 = \overline{\operatorname{span}}\{x_1, x_2, \ldots, x_n, \ldots\}$. We want to show $X_0 = X$. Suppose not; by
the Hahn–Banach theorem, there exists $y^* \in S_1^*$ such that $\langle y^*, x \rangle = 0$, $\forall\, x \in X_0$.
However, for all n,
\[ \|y^* - y_n^*\| \ge |\langle y^* - y_n^*, x_n \rangle| \ge \frac{1}{2}, \]
which is impossible since $\{y_n^*\}$ is dense in $S_1^*$. Thus, $X_0 = X$, i.e. X is separable.
As a consequence of Theorems 9.1 and 9.3, we have the following corollary.
Corollary 9.1 If X is a separable, reflexive Banach space, then every bounded
sequence has a weakly convergent subsequence.
Moreover, the separable assumption can be removed, but we will need the
assistance of the following theorem.
Theorem 9.4 (Pettis) A closed linear subspace $X_0$ of a reflexive Banach space
X is reflexive.
Proof We want to show that $\forall\, x_0^{**} \in X_0^{**}$, $\exists\, x \in X_0$ such that
\[ \langle x_0^{**}, x_0^* \rangle = \langle x_0^*, x \rangle, \quad \forall\, x_0^* \in X_0^*. \tag{9.6} \]
1. Define the mapping $T : X^* \to X_0^*$ by $T x^* = x^*|_{X_0}$,
\[ \langle T x^*, x_0 \rangle = \langle x^*|_{X_0}, x_0 \rangle \quad \forall\, x_0 \in X_0. \]
T is linear and continuous. Its dual mapping $T^* \in L(X_0^{**}, X^{**})$ satisfies:
$\forall\, x_0^{**} \in X_0^{**}$, $T^* x_0^{**} \in X^{**}$.
2. Since X is reflexive, ∃ x ∈ X such that
\[ \langle x^*, x \rangle = \langle T^* x_0^{**}, x^* \rangle = \langle x_0^{**}, x^*|_{X_0} \rangle \quad \forall\, x^* \in X^*. \tag{9.7} \]
We want to show $x \in X_0$. We argue by contradiction. Suppose $x \notin X_0$, then
applying the Hahn–Banach theorem to the closed linear subspace $X_0$, $\exists\, x_1^* \in X^*$
such that
\[ \langle x_1^*, x \rangle \neq 0, \quad x_1^*|_{X_0} = 0. \]
But
\[ \langle x_1^*, x \rangle = \langle x_0^{**}, x_1^*|_{X_0} \rangle = 0, \]
which is a contradiction.
3. We now prove (9.6), i.e. $x_0^{**}$ is the image of $x \in X_0$ under the natural
mapping. According to the Hahn–Banach theorem, $\forall\, x_0^* \in X_0^*$, $\exists\, x^* \in X^*$ such
that $x^*|_{X_0} = x_0^*$. Noting (9.7), this implies
\[ \langle x_0^{**}, x_0^* \rangle = \langle x^*, x \rangle = \langle x_0^*, x \rangle \quad \forall\, x_0^* \in X_0^*. \]
This shows $X_0$ is reflexive.

Corollary 9.2 A Banach space X is reflexive if and only if $X^*$ is reflexive.
Proof “⇒” $(X^*)^{**} = (X^{**})^* = X^*$.
“⇐” Suppose $X^*$ is reflexive. Using the forward implication, $X^{**}$ is reflexive.
However, since X is (isometric to) a closed linear subspace of $X^{**}$, by the Pettis Theorem, X
is reflexive.
As a consequence, we have:
Theorem 9.5 (Eberlein–Šmulian) Every bounded sequence {xn } of a reflexive
Banach space X has a weakly convergent subsequence.
Corollary 9.3 Every bounded sequence {xn } of a Hilbert space has a weakly
convergent subsequence.
Besides Hilbert spaces, we consider the following reflexive Banach spaces.
Consider the Lebesgue function space $L^p(\Omega)$, where $\Omega \subset \mathbb{R}^n$ and 1 ≤ p < ∞.
We know
\[ (L^p(\Omega))^* = L^{p'}(\Omega), \quad 1 \le p < \infty, \quad \frac{1}{p} + \frac{1}{p'} = 1. \]
This means, for every continuous linear functional F on $L^p(\Omega)$, there exists a
function $v \in L^{p'}(\Omega)$, unique up to a.e. equality, such that F can be represented by
\[ F(u) = \int_\Omega v(x) u(x)\,dx \]
and, for 1 < p < ∞,
\[ \|F\|_{(L^p)^*} = \left( \int_\Omega |v(x)|^{p'}\,dx \right)^{1/p'} = \|v\|_{p'}. \]
For all $u \in L^p(\Omega)$, 1 < p < ∞,
\[ v \mapsto \int_\Omega u(x) v(x)\,dx \]
can be viewed as a continuous linear functional G on $L^{p'}(\Omega)$. Likewise, there
exists $w_u \in L^p(\Omega)$ such that
\[ G(v) = \int_\Omega w_u(x) v(x)\,dx \quad \forall\, v \in L^{p'}(\Omega), \]
i.e.
\[ \int_\Omega u(x) v(x)\,dx = G(v) = \int_\Omega w_u(x) v(x)\,dx \quad \forall\, v \in L^{p'}(\Omega). \]
Thus, $w_u(x) = u(x)$ a.e.

Following the above line of thought, when 1 < p < ∞,
\[ (L^p(\Omega))^{**} = (L^{p'}(\Omega))^* = L^p(\Omega), \]
whence the space $L^p(\Omega)$ is reflexive.
Exercises
1. Prove that for a bounded sequence in $l^2$, coordinate-wise convergence ⇔ weak
convergence.
2. Given a family of functions
\[ u_\varepsilon(x) = \frac{\arctan \frac{x}{\varepsilon}}{\arctan \frac{1}{\varepsilon}}, \quad \forall\, \varepsilon > 0. \]
In the space $L^2(-1, 1)$, determine whether it converges in norm or converges
weakly. In the space $H^1(-1, 1)$, determine whether it converges in norm or
converges weakly.
3. In $L^2([0, 2\pi])$, find
\[ \operatorname*{w-lim}_{n \to \infty} \sin nt \quad \text{and} \quad \operatorname*{w-lim}_{n \to \infty} \sin^2 nt. \]
4. In $H^1(0, 1)$, given a sequence of functions
\[
u_j(t) =
\begin{cases}
t - \dfrac{k}{j}, & t \in \left[ \dfrac{k}{j}, \dfrac{2k+1}{2j} \right], \\[2mm]
-t + \dfrac{k+1}{j}, & t \in \left[ \dfrac{2k+1}{2j}, \dfrac{k+1}{j} \right],
\end{cases}
\qquad k = 0, 1, 2, \ldots, j - 1,
\]
j = 1, 2, . . . . Determine whether it converges in norm or converges weakly.
Lecture 10
Sobolev spaces
We have pointed out in our earlier discussions that neither $C^1$ nor $C_0^1$ is an
appropriate space for verifying Dirichlet’s principle; instead, one should consider
the spaces $H^1$ and $H_0^1$.
Such a scenario frequently occurs when applying direct methods to solve
variational problems. This is because the functionals are usually variational integrals
involving derivatives, whereas the C-norm associated with the C-space of functions
with derivatives of the same order is determined by the maximal value of the pointwise
norm, and the C-norm cannot be controlled by such variational integrals.
Moreover, in order to possess weak sequential compactness, the underlying
space must be the dual space of a normed linear space, since such a space is at
least complete. The Sobolev spaces introduced in this lecture satisfy the above
requirements.
10.1 Generalized derivatives
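Let $u, v \in L^1_{\mathrm{loc}}(\Omega)$. For i = 1, 2, . . . , n, we call v the generalized derivative of
u with respect to $x_i$, denoted $v = D_{x_i} u$, if $\forall\, \varphi \in C_0^\infty(\Omega)$,
\[ \int_\Omega v \varphi\,dx = -\int_\Omega u\,\partial_{x_i} \varphi\,dx. \]
More generally, for a multi-index $\alpha = (\alpha_1, \ldots, \alpha_n)$, we denote
\[ |\alpha| = \sum_{i=1}^{n} \alpha_i, \qquad \partial^\alpha = \partial_{x_1}^{\alpha_1} \cdots \partial_{x_n}^{\alpha_n}. \]
Definition 10.1 We call v the αth-order generalized derivative of u, if
$\forall\, \varphi \in C_0^\infty(\Omega)$,
\[ \int_\Omega v \varphi\,dx = (-1)^{|\alpha|} \int_\Omega u\,\partial^\alpha \varphi\,dx. \]
Denote $v = D^\alpha u$.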
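Example 10.1 Let n = 1, J = (−1, 1), and u(x) = |x|. Since, for every $\varphi \in C_0^\infty(J)$,
\[
\begin{aligned}
\int_{-1}^{1} |x|\,\varphi'(x)\,dx &= \int_0^1 x\,\varphi'(x)\,dx + \int_{-1}^{0} (-x)\,\varphi'(x)\,dx \\
&= -\int_{-1}^{1} \operatorname{sgn} x\;\varphi(x)\,dx,
\end{aligned}
\]
we have D(|x|) = sgn x.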
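The defining identity of the generalized derivative can be tested numerically for u(x) = |x|. The sketch below assumes a particular smooth, compactly supported, non-symmetric test function (a standard bump multiplied by 1 + x) and a midpoint rule; both are hypothetical choices made for illustration.

```python
import math

def bump(x):
    """Smooth test function with compact support in (-1, 1); assumed choice."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x)) * (1.0 + x)

def bump_prime(x):
    """Classical derivative of bump."""
    if abs(x) >= 1.0:
        return 0.0
    w = 1.0 - x * x
    return math.exp(-1.0 / w) * (1.0 - 2.0 * x * (1.0 + x) / (w * w))

def integrate(f, a, b, n=40000):
    """Composite midpoint rule on [a, b]; n even keeps x = 0 on the grid."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def sgn(x):
    return (x > 0) - (x < 0)

lhs = integrate(lambda x: abs(x) * bump_prime(x), -1.0, 1.0)
rhs = -integrate(lambda x: sgn(x) * bump(x), -1.0, 1.0)
print(lhs, rhs)
```

Both sides agree up to quadrature error and are nonzero for this test function, consistent with D(|x|) = sgn x.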

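Example 10.2 If $u \in C^k(\Omega)$, then $D^\alpha u = \partial^\alpha u$, ∀ α, |α| ≤ k. This is because,
for every $\varphi \in C_0^\infty(\Omega)$,
\[ \int_\Omega \partial^\alpha u\;\varphi\,dx = (-1)^{|\alpha|} \int_\Omega u\,\partial^\alpha \varphi\,dx. \]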
10.2 The space W m,p (Ω)
Definition 10.2 (The space W m,p (Ω)) Suppose p ∈ [1, ∞] and m ∈ N, let
W m,p (Ω) := {u ∈ Lp (Ω) | Dα u ∈ Lp (Ω), ∀ α, |α| ≤ m},
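on which we define the norm by
\[ \|u\|_{m,p} = \Bigl( \sum_{|\alpha| \le m} \int_\Omega |D^\alpha u(x)|^p\,dx \Bigr)^{1/p}, \qquad 1 \le p < \infty, \]
\[ \|u\|_{m,\infty} = \sum_{|\alpha| \le m} \operatorname*{ess\,sup}_{x \in \Omega} |D^\alpha u(x)|. \]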
We call them Sobolev spaces.

Clearly, Sobolev spaces are normed linear spaces.
For a bounded region Ω, we have the following chains of containments:
W m,∞ (Ω) ⊂ W m,q (Ω) ⊂ W m,p (Ω) ⊂ W m,1 (Ω), 1 < p < q < ∞,
W m,p (Ω) ⊂ W l,p (Ω), 0 ≤ l ≤ m,

also,
if Ω1 ⊂ Ω2 , u ∈ W m,q (Ω2 ) then u|Ω1 ∈ W m,q (Ω1 ).
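A standard example, quoted here only as an illustration, shows how membership in $W^{1,p}$ depends on $p$ and $n$. The choices $\Omega = B_1(0) \subset \mathbb{R}^n$ and the exponent $a > 0$ are assumptions made for the example, and $\omega_{n-1}$ denotes the area of the unit sphere:

```latex
\begin{align*}
u(x) &= |x|^{-a}, \qquad |\nabla u(x)| = a\,|x|^{-a-1} \quad (x \neq 0), \\
\int_{B_1(0)} |\nabla u|^p\,dx
  &= a^p\,\omega_{n-1} \int_0^1 r^{-(a+1)p + n - 1}\,dr < \infty
  \iff (a+1)p < n .
\end{align*}
```

Granting that the pointwise gradient is also the generalized gradient when $a + 1 < n$, one obtains $u \in W^{1,p}(B_1(0))$ if and only if $a < (n-p)/p$. In particular, Sobolev functions may be unbounded when $p < n$.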
Theorem 10.1 The space W m,p (Ω) is complete, hence is a Banach space.
Proof Let {uj } be a Cauchy sequence, then ∀ α, |α| ≤ m,{Dα uj } is a Cauchy
sequence in Lp . Hence, there exists gα ∈ Lp (Ω) such that Dα uj → gα , Lp (Ω).
Now ∀ ϕ ∈ C0∞ (Ω), we have
hDα uj , ϕi = (−1)|α| huj , ∂ α ϕi.
It follows that
hDα uj , ϕi → hgα , ϕi, ∀ α, |α| ≤ m
and
huj , ∂ α ϕi → hg0 , ∂ α ϕi, ∀ α, |α| ≤ m.
That is,
hgα , ϕi = (−1)|α| hg0 , ∂ α ϕi.
From which, we deduce that
gα = Dα g0
and
k Dα uj − gα kp → 0 ∀ α, |α| ≤ m.
Thus, $g_0 \in W^{m,p}(\Omega)$ and
\[ \|u_j - g_0\|_{m,p} \to 0, \qquad j \to \infty. \]
It is worth mentioning that although the generalized derivatives are dually and
globally defined, they are nevertheless closely related to the locally defined ordi-
nary derivatives. In particular, for functions in the one dimensional Sobolev space,
their generalized derivatives are the original derivatives almost everywhere! As-
sume n = 1 and J = [a, b], we have:
Example 10.3 W 1,1 (J) = AC(J), the space of absolutely continuous functions
defined on J and
Du(x) = u0 (x), a.e. (10.1)
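Proof ∀ $u \in W^{1,1}(J)$, we show that
\[ u(x) - u(a) = \int_a^x Du(t)\,dt, \qquad \forall\, x \in J. \]
∀ $n \in \mathbb{N}$, let
\[
\varphi_n(t) = \begin{cases}
n(t - a), & t \in [a, a + \frac{1}{n}], \\
1, & t \in [a + \frac{1}{n}, x - \frac{1}{n}], \\
n(x - t), & t \in [x - \frac{1}{n}, x], \\
0, & t \in [x, b].
\end{cases}
\]
Since ∃ $\xi_{n,k} \in C_0^\infty(J)$ with $\|\xi_{n,k}\|_{C^1} \le 2n$ such that $|\varphi_n(t) - \xi_{n,k}(t)|$ converges to
0 uniformly on $J$ and $|\varphi_n'(t) - \xi_{n,k}'(t)| \to 0$ a.e. $t \in J$ as $k \to \infty$, from
\[ \int_J u(t)\,\xi_{n,k}'(t)\,dt = -\int_J Du(t)\,\xi_{n,k}(t)\,dt, \]
it follows that
\[ \int_J u(t)\,\varphi_n'(t)\,dt = -\int_J Du(t)\,\varphi_n(t)\,dt. \]
Letting $n \to \infty$, it follows immediately that
\[ u(x) - u(a) = \int_a^x Du(t)\,dt, \qquad \forall\, x \in J. \]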
Thus, u(x) is absolutely continuous on J; furthermore,
u0 (x) = Du(x), a.e. x ∈ J.
Conversely, ∀ $u \in AC(J)$, $u'(x)$ exists a.e. and belongs to $L^1(J)$. It suffices to show
\[ u'(x) = Du(x), \qquad \text{a.e. } x \in J. \]
In fact, since $u(x) = \int_y^x u'(t)\,dt + u(y)$, ∀ $x, y \in J$, we have
\[ \int_J u(x)\varphi'(x)\,dx = -\int_J u'(x)\varphi(x)\,dx, \qquad \forall\, \varphi \in C_0^\infty(J). \]
This is (10.1).
Example 10.4 W 1,∞ (J) = Lip(J), the space of Lipschitz functions on J.
Proof “⊃” Assume $u \in \mathrm{Lip}(J)$, then $u$ is absolutely continuous on $J$. Hence, $u'$
exists almost everywhere and satisfies $u(y) - u(x) = \int_x^y u'(t)\,dt$ as well as
\[ |u'(x)| \le \sup_{y \in J} \frac{|u(y) - u(x)|}{|y - x|} \le M. \]
From Example 10.3, we see that $Du \in L^\infty(J)$, i.e. $u \in W^{1,\infty}(J)$ and
\[ \|u\|_{1,\infty} \le \|u\|_{\mathrm{Lip}}. \]
“⊂” Conversely, assume $u \in W^{1,\infty}(J)$, then by (10.1),
\[ |u(y) - u(x)| \le \Bigl| \int_x^y |u'(t)|\,dt \Bigr| \le \|u'\|_\infty\,|y - x|, \qquad \forall\, x, y \in J. \]
Thus,
\[ \|u\|_{\mathrm{Lip}} \le \|u\|_{W^{1,\infty}}. \]
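Definition 10.3 We denote the closure of $C_0^\infty(\Omega)$ in $W^{m,p}(\Omega)$ by $W_0^{m,p}(\Omega)$.
Lemma 10.1 If $u \in W^{m,p}(\Omega)$, $\psi \in C_0^\infty(\Omega)$, then $\psi u \in W_0^{m,p}(\Omega)$ and
\[ \operatorname{supp}(D^\alpha(\psi u)) \subset \operatorname{supp}(\psi), \qquad |\alpha| \le m. \]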

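Proof We only prove the case $m = 1$; the rest can be proved by mathematical induction.
In fact, ∀ $\varphi \in C_0^\infty(\Omega)$,
\begin{align*}
\int_\Omega D_{x_i}(\psi u)\,\varphi\,dx &= -\int_\Omega \psi u\,\partial_{x_i}\varphi\,dx \\
&= -\int_\Omega u\,[\partial_{x_i}(\psi\varphi) - (\partial_{x_i}\psi)\varphi]\,dx \\
&= \int_\Omega [(\partial_{x_i}\psi)u + \psi D_{x_i}u]\,\varphi\,dx.
\end{align*}
Hence,
\[ D_{x_i}(\psi u) = \psi(D_{x_i} u) + (\partial_{x_i}\psi)u. \]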
10.3 Representations of functionals
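In order to understand the weak topology on the Banach space $W^{m,p}(\Omega)$ (or
$W_0^{m,p}(\Omega)$), we need to consider the representations of functionals on these spaces.
We know the dual space of $L^p(\Omega)$ is
\[ (L^p(\Omega))^* = L^{p'}(\Omega), \qquad 1 \le p < \infty, \quad \frac{1}{p} + \frac{1}{p'} = 1, \]
i.e. ∀ $f \in (L^p(\Omega))^*$, ∃ $v \in L^{p'}(\Omega)$ such that
\[ \langle f, u \rangle = \int_\Omega u(x)v(x)\,dx, \qquad \forall\, u \in L^p(\Omega), \]
and (for $1 < p < \infty$; when $p = 1$ the right-hand side is replaced by $\|v\|_\infty$)
\[ \|f\| = \sup_{\|u\|_p \le 1} \langle f, u \rangle = \Bigl( \int_\Omega |v|^{p'}\,dx \Bigr)^{1/p'}. \]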
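In order to examine continuous linear functionals on $W^{m,p}(\Omega)$, we isometrically
embed $W^{m,p}(\Omega)$ into the product space $\prod_{|\alpha| \le m} L^p(\Omega)$ as a closed linear subspace:
\[ i : W^{m,p}(\Omega) \to \prod_{|\alpha| \le m} L^p(\Omega), \qquad u \mapsto \{D^\alpha u,\ |\alpha| \le m\}. \]
By the Hahn–Banach theorem, $f \in (W^{m,p}(\Omega))^*$ if and only if there exists
$\{\psi_\alpha,\ |\alpha| \le m\} \in \prod_{|\alpha| \le m} L^{p'}(\Omega)$ such that
\[ \langle f, u \rangle = \int_\Omega \sum_{|\alpha| \le m} D^\alpha u(x)\,\psi_\alpha(x)\,dx. \]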
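Consequently, the weak convergence $u_j \rightharpoonup u$ in $W^{m,p}(\Omega)$ ($1 \le p < \infty$) can be
expressed as
\[
u_j \rightharpoonup u \text{ in } W^{m,p}(\Omega) \iff
\int_\Omega \sum_{|\alpha| \le m} D^\alpha(u_j - u)\,\psi_\alpha\,dx \to 0, \quad
\forall\, \{\psi_\alpha\} \in \prod_{|\alpha| \le m} L^{p'}(\Omega).
\]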
Theorem 10.2 $W^{m,p}(\Omega)$ ($1 < p < \infty$) is a reflexive Banach space.
Proof We define the embedding $i : W^{m,p}(\Omega) \to \prod_{|\alpha| \le m} L^p(\Omega)$ via
\[ i : u \mapsto \{D^\alpha u,\ |\alpha| \le m\}. \]
This is a closed map, therefore $i(W^{m,p}(\Omega)) \subset \prod_{|\alpha| \le m} L^p(\Omega)$ is a closed linear
subspace. Since $\prod_{|\alpha| \le m} L^p(\Omega)$ is reflexive, by the Pettis theorem, $W^{m,p}(\Omega)$ is
also reflexive.
10.4 Modifiers
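In Lecture 6, in the proof of the higher dimensional du Bois–Reymond lemma, we
have introduced the “bump function”
\[
\varphi(x) = \begin{cases}
c_n \exp\Bigl( \dfrac{-1}{1 - |x|^2} \Bigr), & |x| < 1, \\
0, & |x| \ge 1,
\end{cases}
\]
where $|x| = (x_1^2 + \cdots + x_n^2)^{1/2}$ and $c_n$ is a constant such that
\[ \int_{\mathbb{R}^n} \varphi(x)\,dx = 1. \]
∀ $\varepsilon > 0$, let
\[ \varphi_\varepsilon(x) = \varepsilon^{-n} \varphi\Bigl( \frac{x}{\varepsilon} \Bigr), \]
then $\operatorname{supp} \varphi_\varepsilon \subset B_\varepsilon(\theta)$.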
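The two basic facts about the rescaled bump function can be checked by a change of variables. The following short computation is a sketch supplied for illustration, where $\varepsilon > 0$ is the scaling parameter and $\theta$ denotes the origin:

```latex
\begin{align*}
\int_{\mathbb{R}^n} \varphi_\varepsilon(x)\,dx
  &= \varepsilon^{-n} \int_{\mathbb{R}^n} \varphi\Bigl(\frac{x}{\varepsilon}\Bigr)\,dx
   = \int_{\mathbb{R}^n} \varphi(y)\,dy = 1
   \qquad (x = \varepsilon y,\ dx = \varepsilon^n\,dy), \\
\varphi_\varepsilon(x) \neq 0
  &\implies \Bigl|\frac{x}{\varepsilon}\Bigr| < 1
   \implies |x| < \varepsilon .
\end{align*}
```

Thus each $\varphi_\varepsilon$ is a smooth, non-negative function of unit mass concentrated in the ball of radius $\varepsilon$ about the origin.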
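We utilize this function to smooth out any given function. Let $u \in L^1(\Omega)$,
$\operatorname{supp}(u) \subset \Omega_\delta := \{x \in \Omega \mid d(x, \partial\Omega) \ge \delta > 0\}$. We call the mapping $u \mapsto u_\varepsilon$
($0 < \varepsilon < \delta$) a modifier, where
\[ u_\varepsilon(x) = \int_\Omega \varphi_\varepsilon(x - y)u(y)\,dy. \]
Modifiers have the following properties:
(1) $\operatorname{supp}(u_\varepsilon) \subset (\operatorname{supp} u)_\varepsilon := \{x \in \Omega \mid d(x, \operatorname{supp} u) \le \varepsilon\}$.
(2) $u_\varepsilon \in C_0^\infty(\Omega)$ and
\[ \partial^\alpha u_\varepsilon(x) = \int_\Omega u(y)\,\partial_x^\alpha \varphi_\varepsilon(x - y)\,dy. \]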
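(3) If $u \in C_0^m(\Omega)$, then $\partial^\alpha(u_\varepsilon) = (\partial^\alpha u)_\varepsilon$, ∀ α, |α| ≤ m, for $\varepsilon > 0$ sufficiently
small.
(4) If $u \in C_0(\Omega)$, then $\|u - u_\varepsilon\|_C \to 0$ as $\varepsilon \to 0$.
(5) If $u \in L^p(\Omega)$, $1 \le p < \infty$, then $\|u - u_\varepsilon\|_{L^p} \to 0$ as $\varepsilon \to 0$.
(6) $C_0^\infty(\Omega)$ is dense in $L^p(\Omega)$, $1 \le p < \infty$.
(7) If $u, D^\alpha u \in L^p(\Omega)$, $p \in [1, \infty]$, $\operatorname{supp} u \subset \operatorname{int}(\Omega)$, then for $\varepsilon$ sufficiently
small, we have $(D^\alpha u)_\varepsilon = \partial^\alpha(u_\varepsilon)$.
In fact,
\begin{align*}
\text{LHS} &= \int_\Omega D^\alpha u(y)\,\varphi_\varepsilon(x - y)\,dy \\
&= (-1)^{|\alpha|} \int_\Omega u(y)\,\partial_y^\alpha \varphi_\varepsilon(x - y)\,dy \\
&= \int_\Omega u(y)\,\partial_x^\alpha \varphi_\varepsilon(x - y)\,dy \\
&= \partial_x^\alpha \int_\Omega u(y)\,\varphi_\varepsilon(x - y)\,dy = \text{RHS}.
\end{align*}
(8) $W_0^{m,p}(\mathbb{R}^n) = W^{m,p}(\mathbb{R}^n)$.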
10.5 Some important properties of Sobolev spaces and embedding theorems
Sobolev spaces are fundamental function spaces, which play an important role
in Harmonic Analysis, Partial Differential Equations, Functional Analysis, and
Calculus of Variations. The important properties of Sobolev spaces are discussed
in many textbooks. In this section, we only list some of the main results and refer
interested readers to the existing literature for the detailed proofs. However,
in order to assist the readers’ understanding of the significance of these results
as well as the essence of the proofs, we provide proofs for some
particular or simplified cases.
Extension Theorem
Sobolev spaces are function spaces. When the domain of these functions is
an arbitrary region Ω ⊂ Rn , the space is denoted W m,p (Ω); when the domain
of these functions is the whole Rn , it is denoted W m,p (Rn ). A natural question
comes to mind: can we extend each function u ∈ W m,p (Ω) to be a function
ũ ∈ W m,p (Rn ) such that ũ|Ω = u?
The answer is affirmative as long as Ω ⊂ Rn has sufficiently smooth boundary.
Theorem 10.3 (Extension Theorem) If Ω is a bounded region where ∂Ω is

uniformly C m , then ∀ 0 ≤ l ≤ m, ∀ 1 ≤ p < ∞, ∃ T ∈ L(W l,p (Ω), W l,p (Rn ))
such that
T u(x) = u(x), a.e. x ∈ Ω.
A detailed proof can be found in [Ad] Theorem 4.26. This theorem is due to
Lichtenstein, Hestenes, Seeley, and Calderon.
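In contrast, for $W_0^{m,p}$ spaces, regardless of the choice of Ω, such an extension is
always possible, that is, ∃ $T \in L(W_0^{l,p}(\Omega), W_0^{l,p}(\mathbb{R}^n))$ such that
\[
Tu(x) = \begin{cases}
u(x), & \text{a.e. } x \in \Omega, \\
0, & x \notin \Omega.
\end{cases}
\]
This is because $C_0^\infty(\Omega)$ is a dense subspace of $W_0^{l,p}(\Omega)$ and $C_0^\infty(\Omega)$ is a
subspace of $C_0^\infty(\mathbb{R}^n)$, whence $W_0^{l,p}(\Omega)$ is a closed subspace of $W_0^{l,p}(\mathbb{R}^n)$.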
Approximation Theorem
We have previously defined $H^1(\Omega)$ as the metric completion of the space
$C^1(\bar\Omega)$ with respect to the norm $\|u\| = \bigl( \int_\Omega (|\nabla u|^2 + |u|^2)\,dx \bigr)^{1/2}$. Additionally,
$W^{1,2}(\Omega)$ is a complete metric space containing $C^1(\bar\Omega)$, whence $H^1(\Omega) \subset W^{1,2}(\Omega)$.
It is natural to ask whether they are the same. We give a positive answer
to this question via the following approximation theorem.
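Theorem 10.4 (Meyers–Serrin Approximation Theorem) If $p \in [1, \infty)$, then
the set $S := C^\infty(\Omega) \cap W^{m,p}(\Omega)$ is dense in $W^{m,p}(\Omega)$.
Proof Choose a sequence of open subsets $\{\Omega_i\}$ satisfying
\[
\emptyset = \Omega_{-1} = \Omega_0 \subset \Omega_1 \subset \bar\Omega_1 \subset \Omega_2 \subset \bar\Omega_2 \subset \cdots
\subset \Omega_i \subset \bar\Omega_i \subset \Omega_{i+1} \subset \cdots, \qquad \bigcup_{i=1}^\infty \Omega_i = \Omega.
\]
For example, let $\Omega_i = \{x \in \Omega \mid |x| < i,\ \operatorname{dist}(x, \partial\Omega) > \frac{1}{i}\}$, $i = 1, 2, 3, \ldots$. For
the open covering of Ω
\[ \mathcal{O} = \{U_k = \Omega_{k+1} \setminus \bar\Omega_{k-1} \mid k = 1, 2, \ldots\}, \]
the corresponding partition of unity $\{\psi_i\}$ subordinate to $\mathcal{O}$ satisfies:
(1) $\psi_i \in C_0^\infty(\Omega)$,
(2) $\operatorname{supp}\psi_i \subset \Omega_{i+1} \setminus \bar\Omega_{i-1}$,
(3) $\sum_i \psi_i(x) \equiv 1$, ∀ x ∈ Ω.
Now ∀ $u \in W^{m,p}(\Omega)$, ∀ $\varepsilon > 0$, choose $\delta_i < \operatorname{dist}(\Omega_i, \partial\Omega_{i+1})$ small enough
such that
\[ \|(\psi_i u)_{\delta_i} - \psi_i u\|_{m,p} < \frac{\varepsilon}{2^i}. \tag{10.2} \]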
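This is possible due to properties (7), (5) in §10.4 and Lemma 10.1: ∀ α, |α| ≤ m,
we have
\[ \|\partial^\alpha(\psi_i u)_{\delta_i} - D^\alpha(\psi_i u)\|_p = \|(D^\alpha(\psi_i u))_{\delta_i} - D^\alpha(\psi_i u)\|_p \to 0 \quad (\delta_i \to 0). \]
Summing up the estimates (10.2), it follows that
\[
\Bigl\| \sum_{i=1}^\infty (\psi_i u)_{\delta_i} - u \Bigr\|_{m,p}
= \Bigl\| \sum_{i=1}^\infty \bigl[ (\psi_i u)_{\delta_i} - \psi_i u \bigr] \Bigr\|_{m,p}
\le \sum_{i=1}^\infty \frac{\varepsilon}{2^i} = \varepsilon.
\]
Since ∀ x ∈ Ω, $\sum_{i=1}^\infty (\psi_i u)_{\delta_i}(x)$ is a finite sum, $v = \sum_{i=1}^\infty (\psi_i u)_{\delta_i}$ is infinitely
many times differentiable.
Moreover, ∀ $k \in \mathbb{N}$, on each $\Omega_k$,
\[ u(x) = \sum_{i=1}^{k+2} (\psi_i u)(x), \qquad v(x) = \sum_{i=1}^{k+2} (\psi_i u)_{\delta_i}(x), \]
so we have the estimate
\[ \|u - v\|_{W^{m,p}(\Omega_k)} \le \sum_{i=1}^{k+2} \|(\psi_i u)_{\delta_i} - \psi_i u\|_{m,p} < \varepsilon. \]
Letting $k \to \infty$, it follows that $\|u - v\|_{m,p} \le \varepsilon$. It is evident that $v \in S$, as desired.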

Corollary 10.1 If Ω is a bounded region where ∂Ω is uniformly $C^m$, then ∀ $1 \le p < \infty$,
$W^{m,p}(\Omega)$ is a separable Banach space.
Proof Choose an open hypercube $D \subset \mathbb{R}^n$ such that $\Omega \subset \bar\Omega \subset D$. ∀ $u \in W^{m,p}(\Omega)$,
∃ $\tilde u \in W^{m,p}(\mathbb{R}^n)$ such that
\[ \|\tilde u\|_{W^{m,p}(\mathbb{R}^n)} \le C\|u\|_{W^{m,p}(\Omega)}, \qquad \tilde u|_\Omega = u. \]
For arbitrary $\varepsilon > 0$, applying Theorem 10.4, we can find $v \in C^\infty(\mathbb{R}^n) \cap W^{m,p}(\mathbb{R}^n)$ such that
\[ \|\tilde u - v\|_{W^{m,p}(\Omega)} \le \|\tilde u - v\|_{W^{m,p}(D)} \le \frac{\varepsilon}{3}. \]
Choose $\psi \in C_0^\infty(D)$ satisfying $\psi(x) = 1$, ∀ x ∈ Ω, then $\psi v \in C_0^\infty(D)$. By the
Weierstrass Approximation Theorem, there exists a polynomial $P$ on $\bar D$ such that
\[ \|\psi v - P\|_{W^{m,p}(D)} \le C\|\psi v - P\|_{C^m(\bar D)} \le \frac{\varepsilon}{3}. \]
Since any polynomial can be approximated in the $C^m$ sense by polynomials with
rational coefficients, this asserts that $W^{m,p}(\Omega)$ has a countable dense subset.
Corollary 10.2 H 1 (Ω) = W 1,2 (Ω).
Poincaré’s Inequality
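Poincaré’s inequality in Lecture 9 can be extended to $W_0^{1,p}(\Omega)$, $1 \le p < \infty$,
as follows.
Poincaré’s Inequality If $\Omega \subset \mathbb{R}^n$ is bounded, $u \in W_0^{1,p}(\Omega)$, $1 \le p < \infty$, then
∃ $C = C(p, \Omega)$ such that
\[ \int_\Omega |u|^p\,dx \le C \int_\Omega |\nabla u|^p\,dx. \]
Corollary 10.3 If $\Omega \subset \mathbb{R}^n$ is bounded, then
\[ \|u\| = \Bigl( \int_\Omega |\nabla u|^p\,dx \Bigr)^{1/p} \]
defines an equivalent norm on $W_0^{1,p}(\Omega)$ ($1 \le p < \infty$).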
Since W01,p (Ω) (1 < p < ∞) is a closed linear subspace of W 1,p (Ω), it is
itself a reflexive Banach space.
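To see where the constant in Poincaré's inequality comes from, here is a sketch of the simplest case, assuming $n = 1$, $\Omega = (0, 1)$ and $u \in C_0^\infty(0, 1)$; the general $u \in W_0^{1,p}(0,1)$ then follows by density:

```latex
\begin{align*}
u(x) &= \int_0^x u'(t)\,dt
  \quad\Longrightarrow\quad
  |u(x)| \le \int_0^1 |u'(t)|\,dt
  \le \Bigl( \int_0^1 |u'(t)|^p\,dt \Bigr)^{1/p}
  \quad \text{(H\"older, interval of length 1)}, \\
\int_0^1 |u(x)|^p\,dx &\le \int_0^1 |u'(t)|^p\,dt .
\end{align*}
```

In this special case $C = 1$ works. The argument uses the vanishing boundary value $u(0) = 0$ in an essential way: a nonzero constant function has zero gradient, so no such inequality can hold on all of $W^{1,p}(\Omega)$.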
Embedding Theorems
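Theorem 10.5 (Sobolev) Both embeddings
\[ W^{m,q}(\mathbb{R}^n) \hookrightarrow L^r(\mathbb{R}^n), \qquad \frac{1}{r} = \frac{1}{q} - \frac{m}{n} \quad (\text{if } mq < n), \]
and, ∀ $j \in \mathbb{N}$,
\[ W^{m+j,q}(\mathbb{R}^n) \hookrightarrow C^{j,\lambda}(\mathbb{R}^n), \qquad 0 < \lambda \le m - \frac{n}{q} \quad (\text{if } mq > n), \]
are continuous.
A proof can be found in [Ad] 5.4–5.10.
Combining the above theorem and the extension theorem, we arrive at
Theorem 10.6 (Sobolev Embedding Theorem) Assume Ω is a bounded region
with uniformly $C^m$ boundary, $1 \le q < \infty$, and $m \ge 0$ is an integer, then the
embeddings
\[ W^{m,q}(\Omega) \hookrightarrow L^r(\Omega), \qquad \frac{1}{r} = \frac{1}{q} - \frac{m}{n} \quad (\text{if } mq < n), \]
and, ∀ $j \in \mathbb{N}$,
\[ W^{m+j,q}(\Omega) \hookrightarrow C^{j,\lambda}(\bar\Omega), \qquad 0 < \lambda \le m - \frac{n}{q} \quad (\text{if } mq > n), \]
are both continuous.
The most frequently used version of this result is for $m = 1$. Denote
$q^* = \frac{nq}{n - q}$, then
\[ W^{1,q}(\Omega) \hookrightarrow L^r(\Omega), \qquad r \le q^*, \quad \text{if } n > q, \]
and
\[ W^{1,q}(\Omega) \hookrightarrow C(\bar\Omega), \qquad \text{if } q > n, \]
are both continuous.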
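For orientation, the critical exponent $q^*$ can be evaluated in a few low-dimensional cases. The specific values of $n$ and $q$ below are chosen only as illustrations:

```latex
\begin{align*}
n = 3,\ q = 2 &: \quad q^* = \frac{3 \cdot 2}{3 - 2} = 6,
   && W^{1,2}(\Omega) \hookrightarrow L^r(\Omega),\ 1 \le r \le 6, \\
n = 3,\ q = 1 &: \quad q^* = \frac{3 \cdot 1}{3 - 1} = \frac{3}{2},
   && W^{1,1}(\Omega) \hookrightarrow L^r(\Omega),\ 1 \le r \le \tfrac{3}{2}, \\
n = 2,\ q = 1 &: \quad q^* = \frac{2 \cdot 1}{2 - 1} = 2,
   && W^{1,1}(\Omega) \hookrightarrow L^r(\Omega),\ 1 \le r \le 2 .
\end{align*}
```

Equivalently, $\frac{1}{q^*} = \frac{1}{q} - \frac{1}{n}$: one derivative is traded for a gain of $\frac{1}{n}$ in the reciprocal of the integrability exponent.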
Remark 10.1 When $n = 1$, $\Omega = (a, b)$, the conclusion of the embedding theorem
follows easily from Hölder’s inequality and Example 10.3. Since in this case
the generalized derivatives coincide with the usual derivatives almost
everywhere, we have
\[
|u(x) - u(y)| = \Bigl| \int_y^x u'(t)\,dt \Bigr| \le |x - y|^{1/q'} \Bigl( \int_a^b |u'(t)|^q\,dt \Bigr)^{1/q},
\qquad \forall\, x, y \in (a, b).
\]

Compact Embeddings
Theorem 10.7 (Rellich–Kondrachov compact embedding theorem) Assume Ω
is a bounded region with uniformly $C^m$ boundary, $1 \le q \le \infty$, and $m \ge 0$ is an
integer, then the embeddings
\[ W^{m,q}(\Omega) \hookrightarrow L^r(\Omega), \qquad 1 \le r < \frac{nq}{n - mq} \quad (\text{if } m < n/q), \]
and
\[ W^{m,q}(\Omega) \hookrightarrow C(\bar\Omega) \quad (\text{if } m > n/q) \]
are both compact.
The proof can be found in [Ad] Theorem 6.2.
We now give a direct proof for a frequently used special case of the above
result.
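Theorem 10.8 (Rellich) If $\Omega \subset \mathbb{R}^n$ is a bounded region, $1 \le p < \infty$, then a
closed bounded ball in $W_0^{1,p}(\Omega)$ is sequentially compact in $L^p(\Omega)$.
Proof It suffices to show the closed unit ball is sequentially compact. Denote by $B$
the closed unit ball in $W_0^{1,p}(\Omega)$ centered at 0. We proceed by showing that ∀ $\varepsilon > 0$,
under the $L^p$-norm, there exists a finite $\varepsilon$-net of $B$.
The idea is to find a uniformly bounded and equicontinuous set of functions
in an arbitrary $L^p$-neighborhood of $B$.
1. By definition, $C_0^\infty(\Omega)$ is dense in $W_0^{1,p}(\Omega)$. Denote $S = C_0^\infty(\Omega) \cap B$. For
any $\delta > 0$, denote $S_\delta = \{v_\delta \mid v \in S\}$, where
\[ v_\delta(x) = \int_\Omega v(y)\varphi_\delta(x - y)\,dy. \]
∀ $v \in S$,
\[ \|v_\delta\|_{1,p} \le \|v\|_{1,p} \le 1. \]
Since
\[
|v_\delta(x) - v(x)| = \Bigl| \int_{|y| \le 1} \varphi(y)[v(x) - v(x - \delta y)]\,dy \Bigr|
\le \int_{|y| \le 1} \varphi(y) \int_0^{\delta|y|} \Bigl| \frac{\partial}{\partial r} v\Bigl( x - r\frac{y}{|y|} \Bigr) \Bigr|\,dr\,dy,
\]
by Hölder’s inequality,
\begin{align*}
\|v_\delta - v\|_p^p &\le \int_\Omega \Bigl( \int_{|y| \le 1} \varphi(y)^{1/p'} \varphi(y)^{1/p}
\int_0^{\delta|y|} \Bigl| \frac{\partial}{\partial r} v\Bigl( x - r\frac{y}{|y|} \Bigr) \Bigr|\,dr\,dy \Bigr)^p dx \\
&\le \int_{|y| \le 1} \int_\Omega \varphi(y)\,\delta^p |y|^p\,|\nabla v(x)|^p\,dx\,dy,
\end{align*}
i.e.
\[ \|v_\delta - v\|_p \le \delta\|\nabla v\|_p. \]
Thus, ∃ $\delta_0 = \delta(\varepsilon)$ such that
\[ \|v - v_\delta\|_p \le \frac{\varepsilon}{8}, \qquad \forall\, v \in S,\ \forall\, \delta \le \delta_0. \]
Fixing $\delta = \delta_0$, we have
\[ |v_\delta(x)| = \Bigl| \int_\Omega v(y)\varphi_\delta(x - y)\,dy \Bigr| \le \frac{C}{\delta^n}\|v\|_p \]
and
\[ |\nabla v_\delta(x)| = \Bigl| \int_\Omega v(y)\nabla\varphi_\delta(x - y)\,dy \Bigr| \le \frac{C}{\delta^{n+1}}\|v\|_p. \]
This shows $S_{\delta_0} \subset C(\bar\Omega)$ is uniformly bounded and equicontinuous. According
to the Arzelà–Ascoli theorem, under the C-norm, $S_{\delta_0}$ has a finite $\frac{\varepsilon}{4\,\mathrm{mes}(\Omega)}$-net
$\{w_1, w_2, \ldots, w_l\}$. That is, ∀ $v_\delta \in S_{\delta_0}$, ∃ $w_i \in S_{\delta_0}$ such that
\[ \|v_\delta - w_i\|_C < \frac{\varepsilon}{4\,\mathrm{mes}(\Omega)}. \]
Thus,
\[ \|v_\delta - w_i\|_p < \frac{\varepsilon}{4}. \]
2. However, $w_i$ may not lie in $B$, but for each $w_i$, there is a corresponding $v^i_{\delta_0}$,
where $v^i \in S = C_0^\infty(\Omega) \cap B$. If $v^i_{\delta_0} \notin B$, its support must lie beyond $\bar\Omega$; we
can then take $\delta_i \in (0, \delta_0]$ such that the support of $w_i' = v^i_{\delta_i}$ is confined in Ω.
Consequently, we have $w_i' \in B$ and
\[ \|w_i - w_i'\|_p \le \|w_i - v^i\|_p + \|w_i' - v^i\|_p \le \frac{\varepsilon}{4}. \]
∀ $u \in B$, ∃ $v \in S$ such that $\|u - v\|_{1,p} \le \frac{\varepsilon}{4}$, hence
\[ \|u - w_i'\|_p \le \|u - v\|_p + \|v - v_{\delta_0}\|_p + \|v_{\delta_0} - w_i\|_p + \|w_i - w_i'\|_p < \varepsilon. \]
$\{w_1', \ldots, w_l'\}$ is the finite $\varepsilon$-net we are seeking.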
10.6 The Euler–Lagrange equation
In this section, we extend the calculus of variations from the previously discussed
$C^1$-space to Sobolev spaces. Let a differentiable Lagrangian $L = L(x, u, p) \in C(\Omega \times \mathbb{R}^N \times \mathbb{R}^{nN})$
be given, where $\Omega \subset \mathbb{R}^n$. In order to define the functional
\[ I(u) = \int_\Omega L(x, u(x), \nabla u(x))\,dx \]
on some suitable Sobolev space $W^{1,q}(\Omega)$ and to make sure the E-L equation
corresponding to the necessary condition of its extremal values actually makes sense,
we must impose some additional assumptions on $L$ as follows:
(1) $L$, $L_u$, $L_p$ are continuous;
(2) $|L(x, u, p)| \le C(1 + |u|^q + |p|^q)$;
(3) $|L_u(x, u, p)| + |L_p(x, u, p)| \le C(1 + |u|^q + |p|^q)$.
In fact, by assumption (2), ∀ $u \in W^{1,q}(\Omega, \mathbb{R}^N)$,
\[ |I(u)| \le \int_\Omega |L(x, u(x), \nabla u(x))|\,dx \le C \int_\Omega (1 + |u(x)|^q + |\nabla u(x)|^q)\,dx < \infty. \]
Moreover, by assumption (3),
\[ \int_\Omega \bigl( |L_u(x, u(x), \nabla u(x))| + |L_p(x, u(x), \nabla u(x))| \bigr)\,dx < \infty, \]
i.e. $\lambda(x) := L_u(x, u(x), \nabla u(x))$ and $\mu(x) := L_p(x, u(x), \nabla u(x))$ belong to $L^1$.
Lemma 10.2 Under the assumptions (1), (2), and (3), ∀ $u_0 \in W^{1,q}(\Omega)$, ∀ $\varphi \in C_0^1(\Omega)$,
\begin{align*}
\delta I(u_0, \varphi) &= \frac{d}{ds} I(u_0 + s\varphi)\Big|_{s=0} \\
&= \int_\Omega [L_u(x, u_0(x), \nabla u_0(x))\varphi(x) + L_p(x, u_0(x), \nabla u_0(x))\nabla\varphi(x)]\,dx. \tag{10.3}
\end{align*}
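Proof Writing $u$ for $u_0$, we need only be concerned with
\[ s^{-1}[I(u + s\varphi) - I(u)] = \int_\Omega [\lambda_s(x)\varphi(x) + \mu_s(x)\nabla\varphi(x)]\,dx, \]
where
\[ \lambda_s(x) = \int_0^1 L_u(x, u(x) + \theta s\varphi(x), \nabla(u(x) + \theta s\varphi(x)))\,d\theta, \]
\[ \mu_s(x) = \int_0^1 L_p(x, u(x) + \theta s\varphi(x), \nabla(u(x) + \theta s\varphi(x)))\,d\theta. \]
By assumption (3), $\lambda_s, \mu_s \in L^1$, and as $s \to 0$, almost everywhere we have
\[ \lambda_s(x) \to L_u(x, u(x), \nabla u(x)), \qquad \mu_s(x) \to L_p(x, u(x), \nabla u(x)). \]
In addition, ∀ $\varphi \in C_0^1(\Omega)$, the integrand above is dominated, up to a constant
independent of $s \in [-1, 1]$, by the integrable function
\[ C(1 + |u(x)|^q + |\nabla u(x)|^q)(|\varphi(x)| + |\nabla\varphi(x)|). \]
Thus, the conclusion follows immediately from the Lebesgue Dominated Convergence
Theorem.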
Remark 10.2 The continuity requirement in all three variables (x, u, p) in Lemma
10.2 assumption (1) can be replaced by the weaker Carathéodory condition:
∀ (u, p) ∈ RN × RnN , x 7→ L(x, u, p) is Lebesgue measurable,

for a.e. x ∈ Ω, (u, p) 7→ L(x, u, p) is continuous. (10.4)
That is, assuming L, Lu , and Lp satisfy the Carathéodory condition.

Remark 10.3 Note that in the Sobolev Embedding Theorem, when $q < n$,
$W^{1,q}(\Omega) \hookrightarrow L^r(\Omega)$, $r \le q^* = \frac{nq}{n - q}$. The growth requirement in
Lemma 10.2 assumption (3) can be relaxed to:
\[
(3')\quad |L_u(x, u, p)| + |L_p(x, u, p)| \le
\begin{cases}
C(1 + |u|^r + |p|^q), & r \le q^*,\ q < n, \\
C(1 + |u|^r + |p|^q), & r \ge 1,\ q = n, \\
C(1 + |p|^q), & q > n.
\end{cases}
\]
As a consequence, we have:
Theorem 10.9 Suppose the Carathéodory condition (10.4), (2), and (3′) hold. If
for a given $\rho \in W^{1,q}(\Omega, \mathbb{R}^N)$, $u_0 \in M = \rho + W_0^{1,q}(\Omega, \mathbb{R}^N)$ is a minimum of
$I$ in $M$, then $u_0$ satisfies the following E-L equation:
\[
\int_\Omega [L_u(x, u_0(x), \nabla u_0(x))\varphi(x) + L_p(x, u_0(x), \nabla u_0(x))\nabla\varphi(x)]\,dx = 0,
\qquad \forall\, \varphi \in C_0^\infty(\Omega, \mathbb{R}^N). \tag{10.5}
\]
In this sense, we call solutions of (10.5) the generalized solutions of the E-L equation.
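As a worked illustration of a generalized E-L equation (the Lagrangian below is a standard example chosen here, not one treated in the text), take $N = 1$ and $L(x, u, p) = \frac{1}{q}|p|^q$ with $1 < q < \infty$:

```latex
\begin{align*}
L_u &= 0, \qquad L_p = |p|^{q-2}p, \qquad
  |L_u| + |L_p| = |p|^{q-1} \le 1 + |p|^q, \\
\int_\Omega |\nabla u_0(x)|^{q-2}\,\nabla u_0(x) \cdot \nabla\varphi(x)\,dx &= 0,
  \qquad \forall\, \varphi \in C_0^\infty(\Omega).
\end{align*}
```

The first line checks the growth bound on the derivatives; the second line is the resulting weak equation for a minimizer $u_0$, i.e. the generalized form of $-\operatorname{div}(|\nabla u|^{q-2}\nabla u) = 0$. For $q = 2$ it reduces to the weak form of Laplace's equation.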
Remark 10.4 The generalization of the concept of directional derivatives in differential
calculus to Banach spaces is the Gâteaux derivative.
Let $X$ be a Banach space, let $U \subset X$ be an open subset, and let $f \in C(U, \mathbb{R}^1)$ be
a function defined on $U$.
Definition 10.4 (The Gâteaux derivative) Let x0 ∈ U . We say f is Gâteaux

differentiable at x0 , if ∀ h ∈ X, ∃ df (x0 , h) ∈ R1 such that
|f (x0 + th) − f (x0 ) − tdf (x0 , h)| = o(t), (t → 0).
We call df (x0 , h) the Gâteaux derivative of f at x0 .

In calculus of variations, for a functional I in the space W 1,q (Ω, RN ), the
difference between the variation δI(u0 , ϕ) and the Gâteaux derivative dI(u0 , ϕ)
is that, in the variation, ϕ ∈ C0∞ (Ω) or C01 (Ω); while in the Gâteaux derivative,
ϕ ∈ W01,q (Ω).
In order to ensure the integral involved in $dI(u_0, \varphi)$ is well-defined, we must
modify the growth assumption (3) as follows:
\[
(3'')\quad |L_u(x, u, p)| + |L_p(x, u, p)| \le
\begin{cases}
C(1 + |u|^{r-1} + |p|^{q-1}), & r < q^*,\ q < n, \\
C(1 + |u|^r + |p|^{q-1}), & r \ge 1,\ q = n, \\
C(1 + |p|^{q-1}), & q > n.
\end{cases}
\]
We have the following theorem regarding the Gâteaux derivative of I:
Theorem 10.10 Suppose (1), (2) and (3″) hold. Given $\rho \in W^{1,q}(\Omega, \mathbb{R}^N)$,
let $I(u) = \int_\Omega L(x, u(x), \nabla u(x))\,dx$ be a functional defined on $M = \rho + W_0^{1,q}(\Omega, \mathbb{R}^N)$.
If $I$ attains its minimum at $u_0 \in M$, then $I$ is Gâteaux differentiable at $u_0$;
furthermore,
\[
dI(u_0, \varphi) = \int_\Omega [L_u(x, u_0(x), \nabla u_0(x))\varphi(x) + L_p(x, u_0(x), \nabla u_0(x))\nabla\varphi(x)]\,dx,
\qquad \forall\, \varphi \in W_0^{1,q}(\Omega, \mathbb{R}^N).
\]
Compared to (3′), (3″) is clearly stronger. So in calculus of variations, we often
use variational derivatives, rather than the Gâteaux derivatives.
Remark 10.5 When $n = 1$ and $J = (a, b)$, we can derive the following integral
form of the E-L equation via (10.5):
\[ \int_a^t L_u(s, u^*(s), \dot u^*(s))\,ds - L_p(t, u^*(t), \dot u^*(t)) = c, \qquad \text{a.e.} \tag{10.6} \]
In Lecture 2, Remark 2.2, we established the du Bois–Reymond lemma on $L^\infty(J)$;
we can now extend this to the Lebesgue space $L^1(J)$.
Lemma 10.3 Suppose $f \in L^1(J)$ satisfies
\[ \int_J f(t)\varphi'(t)\,dt = 0, \qquad \forall\, \varphi \in C_0^\infty(J), \]
then $f(t) = c$, a.e. $t \in J$.
Proof Fix any two Lebesgue points $a < t_0 < t_1 < b$ of $f$. Let $c = f(t_0)$. Choose
$\varepsilon > 0$ such that $(t_0 - \varepsilon, t_1 + \varepsilon) \subset J$. We construct a piecewise linear function ψ
as follows:
\[
\psi(t) = \begin{cases}
0, & t \notin [t_0 - \varepsilon, t_1 + \varepsilon], \\
1, & t \in [t_0, t_1],
\end{cases}
\]
and connecting the rest with straight lines. Clearly, there exists $\{\varphi_n\} \subset C_0^\infty(J)$
such that $\varphi_n \to \psi$ in $W^{1,1}(J)$. Hence,
\[ \int_J f(t)\varphi_n'(t)\,dt \to \int_J f(t)\psi'(t)\,dt; \]
consequently,
\[ \varepsilon^{-1} \int_{t_0 - \varepsilon}^{t_0} f(t)\,dt - \varepsilon^{-1} \int_{t_1}^{t_1 + \varepsilon} f(t)\,dt = 0. \]
Letting $\varepsilon \to 0$, we obtain $f(t_1) = f(t_0) = c$. Since $t_1$ can be an
arbitrary Lebesgue point, it follows that $f(t) = c$ for all Lebesgue points $t \in J$.
This completes the proof.
Thus, under the assumptions (1), (2) and (3′), the minimum $u^* \in u_0 + W_0^{1,q}(J, \mathbb{R}^N)$
of $I$ satisfies the integral form of the E-L equation (10.6).
Exercises
1. Verify the properties (1)–(6) and (8) listed in §10.4.

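2. Let $u \in W^{1,q}(\mathbb{R}^1)$. Show that
\[
u \in \begin{cases}
C^\lambda(\mathbb{R}^1), \quad \lambda = \dfrac{q - 1}{q}, & 1 \le q < \infty, \\[2mm]
\mathrm{Lip}(\mathbb{R}^1), & q = \infty.
\end{cases}
\]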
Lecture 11
Weak lower semi-continuity
In Lecture 9, we have demonstrated that the weak sequential lower semi-continuity
of a functional plays an important role in direct methods. In this lecture,
we focus on the criteria for determining whether a functional is weakly sequentially
lower semi-continuous.
11.1 Convex sets and convex functions
We shall investigate the lower semi-continuity of a function from the viewpoint
of point-set topology. By definition, on a topological space $X$, a function
$f : X \to \mathbb{R}^1$ is said to be lower semi-continuous if and only if, for every $t \in \mathbb{R}^1$,
its sublevel set $f_t = \{x \in X \mid f(x) \le t\}$ is closed. On a normed linear space $X$, a
function $f$ is weakly sequentially lower semi-continuous if and only if the set $f_t$
(∀ $t \in \mathbb{R}^1$) is weakly sequentially closed.
Generally speaking, being closed and weak closed for sets are quite different
concepts: a weakly closed set must be closed; however, the converse may not be
true.
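A convex set in a Banach space has the following important property: if $C \subset X$
is a convex subset of a Banach space $X$, then
\[ \text{closed} \iff \text{weakly sequentially closed}. \]
This is due to Mazur’s Theorem: In a Banach space, if $x_n \rightharpoonup x$, then some
sequence of convex combinations of the $x_n$ converges to $x$ in norm, i.e. there exist
$y_k = \sum_{i=1}^{N_k} \lambda_i^{(k)} x_i$ with $\lambda_i^{(k)} \ge 0$, $\sum_{i=1}^{N_k} \lambda_i^{(k)} = 1$, such that $y_k \to x$.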
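The way Mazur's theorem yields the equivalence for convex sets can be spelled out in a few lines. This is a sketch of the nontrivial direction, where the $y_k$ denote convex combinations of the $x_n$ (notation introduced here for the illustration):

```latex
\begin{align*}
&C \text{ convex and closed}, \quad x_n \in C, \quad x_n \rightharpoonup x
  \quad (\text{to show: } x \in C). \\
&\text{Mazur: } \exists\, y_k = \sum_{i=1}^{N_k} \lambda_i^{(k)} x_i, \quad
  \lambda_i^{(k)} \ge 0, \quad \sum_{i=1}^{N_k} \lambda_i^{(k)} = 1, \quad
  \|y_k - x\| \to 0. \\
&C \text{ convex} \implies y_k \in C; \qquad
  C \text{ closed} \implies x = \lim_{k \to \infty} y_k \in C .
\end{align*}
```

The converse direction needs no convexity: norm convergence implies weak convergence, so every weakly sequentially closed set is closed.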
We can apply this result to determine whether a functional is weakly sequen-
tially lower semi-continuous. Let us recall the definition of a convex function:
given a function f defined on a convex subset C of a linear space E, if
f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y), ∀ x, y ∈ C, ∀ λ ∈ [0, 1],
then f is said to be convex. It is not difficult to see that

f is a convex function =⇒ ft := {x ∈ X|f (x) ≤ t} is a convex set, ∀ t ∈ R1 .
Combining these two statements, we obtain the following theorem.
Theorem 11.1 Let $X$ be a Banach space. If $f : X \to \mathbb{R}^1$ is a convex function,
then
f is sequentially lower semi-continuous ⇐⇒ f is weakly sequentially lower
semi-continuous.
Proof We note that
f is lower semi-continuous ⇐⇒ ∀ $t \in \mathbb{R}^1$, $f_t$ is closed.
f is weakly sequentially lower semi-continuous ⇐⇒ ∀ $t \in \mathbb{R}^1$, $f_t$ is weakly
sequentially closed.
Since ∀ $t \in \mathbb{R}^1$, $f_t$ is a convex set, these two notions are equivalent, and our
assertion follows.
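The convexity hypothesis cannot simply be dropped. A minimal counterexample, supplied here as an illustration with the space $X = L^2(0, 2\pi)$ and the functional $f$ chosen for the purpose, is:

```latex
\begin{align*}
f(u) &= -\int_0^{2\pi} |u(t)|^2\,dt, \qquad u_n(t) = \sin nt, \\
u_n &\rightharpoonup 0 \ \text{ in } L^2(0, 2\pi)
  \quad \text{(Riemann--Lebesgue lemma)}, \\
f(u_n) &= -\int_0^{2\pi} \sin^2 nt\,dt = -\pi
  \quad\Longrightarrow\quad
  \liminf_{n \to \infty} f(u_n) = -\pi < 0 = f(0).
\end{align*}
```

Here $f$ is continuous in the norm topology but concave rather than convex, and it fails to be weakly sequentially lower semi-continuous.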
Based on Theorem 11.1, we now determine the weak sequential lower
semicontinuity of certain functionals.
From now on, without confusion, we will not distinguish the gradient ∇ and
the generalized gradient D (in the sense of generalized derivatives), and denote
both by ∇.
Example 11.1 Let $\Omega \subset \mathbb{R}^n$ be a bounded region, $f \in L^2(\Omega)$. On the Sobolev
space $H_0^1(\Omega)$, the functional
\[ I(u) = \int_\Omega \Bigl( \frac{1}{2}|\nabla u(x)|^2 + f(x)u(x) \Bigr)\,dx \]
is weakly sequentially lower semi-continuous.
Proof The map $u \mapsto \int_\Omega f(x)u(x)\,dx$ is linear, and by Poincaré’s inequality,
\[ \Bigl| \int_\Omega f(x)u(x)\,dx \Bigr| \le \|f\|_{L^2}\|u\|_{L^2} \le C(f)\|u\|_{H_0^1}. \]
It is both continuous and convex, hence it is weakly sequentially lower
semi-continuous.
Furthermore, since $u \mapsto \int_\Omega |\nabla u(x)|^2\,dx$ is continuous and convex, it is also
weakly sequentially lower semi-continuous.

Example 11.2 Let $\Omega \subset \mathbb{R}^n$ be a bounded region. On the Sobolev space $W_0^{1,q}(\Omega)$
($1 \le q < \infty$, $1 \le r \le q^* = \frac{nq}{n - q}$), the functional
\[ I(u) = \int_\Omega (|\nabla u(x)|^q + |u(x)|^r)\,dx \]
is weakly sequentially lower semi-continuous. (The proof is similar and left as an
exercise.)
If we replace the functional by
\[ I(u) = \int_\Omega (|\nabla u(x)|^q + c(x)|u(x)|^r)\,dx, \]
where $c \in L^\infty(\Omega)$, $c \ge 0$, then the conclusion still holds and the proof remains
the same.
However, if the non-negativity condition on $c$ is removed, then in general,
$I$ is not necessarily convex, so it may not be weakly sequentially lower
semi-continuous. But if we impose the condition $r < q^* = \frac{nq}{n - q}$, then $I$ will be weakly
sequentially lower semi-continuous.
In fact, if $u_j \rightharpoonup u$ in $W_0^{1,q}(\Omega)$, then
\[ \liminf_{j \to \infty} \int_\Omega |\nabla u_j(x)|^q\,dx \ge \int_\Omega |\nabla u(x)|^q\,dx. \]
We also have
\[ \liminf_{j \to \infty} \int_\Omega c(x)|u_j(x)|^r\,dx \ge \int_\Omega c(x)|u(x)|^r\,dx. \]
Suppose not; then there exist $\varepsilon > 0$ and a subsequence $\{u_{j'}\}$ such that
\[ \int_\Omega c(x)|u_{j'}(x)|^r\,dx < \int_\Omega c(x)|u(x)|^r\,dx - \varepsilon. \]
By the Rellich–Kondrachov embedding theorem, there exists a subsequence
$\{u_{n_j}\} \subset \{u_{j'}\}$ such that $u_{n_j} \to u$ in $L^r(\Omega)$. Thus,
\[ \int_\Omega c(x)|u_{n_j}(x)|^r\,dx \to \int_\Omega c(x)|u(x)|^r\,dx, \]
a contradiction.
11.2 Convexity and weak lower semi-continuity

In order to establish that the variational integral $I(u) = \int_\Omega L(x, u(x), \nabla u(x))\,dx$
is convex in $u$, one would require the Lagrangian $L(x, u, p)$ to be convex in $(u, p)$.
But from Example 11.2, we see that this requirement is too strong. Using compact
embedding, the convexity requirement in $u$ can be replaced by some growth
conditions. We will carry out this idea in solving general variational
problems.
Theorem 11.2 (Tonelli–Morrey) Suppose $L : \bar\Omega \times \mathbb{R}^N \times \mathbb{R}^{nN} \to \mathbb{R}^1$ satisfies
(1) $L \in C^1(\bar\Omega \times \mathbb{R}^N \times \mathbb{R}^{nN})$,
(2) $L \ge 0$,
(3) ∀ $(x, u) \in \Omega \times \mathbb{R}^N$, $p \mapsto L(x, u, p)$ is convex,
then $I(u) = \int_\Omega L(x, u(x), \nabla u(x))\,dx$ is weakly sequentially lower
semi-continuous in $W^{1,q}(\Omega, \mathbb{R}^N)$ ($1 \le q < \infty$).
We now need another important characteristic of convex functions.
Lemma 11.1 Let $X$ be a Banach space, $h \in X$, and $f : X \to \mathbb{R}^1$ be a convex
function. If $f$ has directional derivative $df(x_0, h)$ at $x_0$ in the direction of $h$, then
\[ f(x_0) + df(x_0, h) \le f(x_0 + h). \]
This is a direct application of the well-known result for convex functions of a
single variable.
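Proof of Theorem 11.2 Let $u_j \rightharpoonup u$ in $W^{1,q}$; we want to show
\[ \liminf_{j \to \infty} \int_\Omega L(x, u_j(x), \nabla u_j(x))\,dx \ge \int_\Omega L(x, u(x), \nabla u(x))\,dx. \]
1. By the Rellich–Kondrachov theorem, we can find a subsequence, still denoted
by $\{u_j\}$, such that
\[ u_j \to u \quad \text{in } L^q(\Omega, \mathbb{R}^N). \]
Then by the Riesz theorem, there exists a further subsequence, which is again
denoted by $\{u_j\}$, such that
\[ u_j(x) \to u(x) \quad \text{a.e. } x \in \Omega. \]
∀ $\varepsilon > 0$, there exists a compact subset $K \subset \Omega$ such that $\mathrm{mes}(\Omega \setminus K) < \varepsilon$ and
(1) $u_j \to u$ uniformly on $K$ (Egorov’s theorem),
(2) $u$ and $\nabla u$ are continuous on $K$ (Luzin’s theorem),
(3) if $\int_\Omega L(x, u(x), \nabla u(x))\,dx < +\infty$, then
\[ \int_K L(x, u(x), \nabla u(x))\,dx \ge \int_\Omega L(x, u(x), \nabla u(x))\,dx - \varepsilon \]
(the absolute continuity of Lebesgue integrals).
If $\int_\Omega L(x, u(x), \nabla u(x))\,dx = +\infty$, then we take $K$ such that
\[ \int_K L(x, u(x), \nabla u(x))\,dx > \frac{1}{\varepsilon}. \]
2. Since $L \ge 0$ and $L$ is convex in $p$, applying Lemma 11.1, we obtain
\begin{align*}
I(u_j) &\ge \int_K L(x, u_j(x), \nabla u_j(x))\,dx \\
&\ge \int_K L(x, u_j(x), \nabla u(x))\,dx + \int_K L_p(x, u_j(x), \nabla u(x))(\nabla u_j(x) - \nabla u(x))\,dx \\
&= \int_K L(x, u_j(x), \nabla u(x))\,dx + \int_K L_p(x, u(x), \nabla u(x))(\nabla u_j(x) - \nabla u(x))\,dx \\
&\quad + \int_K [L_p(x, u_j(x), \nabla u(x)) - L_p(x, u(x), \nabla u(x))](\nabla u_j(x) - \nabla u(x))\,dx \\
&= \mathrm{I} + \mathrm{II} + \mathrm{III}.
\end{align*}
3. By (1), on $K$, $u_j \to u$ uniformly, and $L$ is continuous, hence the first term
\[ \mathrm{I} = \int_K L(x, u_j(x), \nabla u(x))\,dx \to \int_K L(x, u(x), \nabla u(x))\,dx. \]
By (2), $L_p(x, u(x), \nabla u(x)) \in L^\infty(K)$. Taking $\chi_K$ to be the characteristic function
of $K$, it yields $\chi_K L_p(x, u(x), \nabla u(x)) \in L^\infty(\Omega) \subset L^{q'}(\Omega)$. Using the fact
$u_j \rightharpoonup u$ (in $W^{1,q}(\Omega)$), we can deduce
\[ \nabla u_j \rightharpoonup \nabla u \quad (\text{in } L^q(\Omega, \mathbb{R}^{nN})). \]
Thus, the second term satisfies
\[ \lim_{j \to \infty} \mathrm{II} = \lim_{j \to \infty} \int_K L_p(x, u(x), \nabla u(x))(\nabla u_j(x) - \nabla u(x))\,dx = 0. \]
Lastly, since a weakly convergent sequence is bounded,
\[ \|\nabla u_j - \nabla u\|_1 \le C_1\|\nabla u_j - \nabla u\|_q \le C_1(\|\nabla u_j\|_q + \|\nabla u\|_q) \le C_2; \]
furthermore, since $L_p(x, u_j(x), \nabla u(x)) \to L_p(x, u(x), \nabla u(x))$ uniformly on $K$,
it follows that the third term satisfies
\[ \lim_{j \to \infty} \mathrm{III} = \lim_{j \to \infty} \int_K [L_p(x, u_j(x), \nabla u(x)) - L_p(x, u(x), \nabla u(x))](\nabla u_j - \nabla u)\,dx = 0. \]
In summary,
\[ \liminf_{j \to \infty} I(u_j) \ge \int_K L(x, u(x), \nabla u(x))\,dx \ge I(u) - \varepsilon. \]
Since $\varepsilon > 0$ is arbitrary,
\[ \liminf_{j \to \infty} I(u_j) \ge I(u). \]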
Remark 11.1 It is worth noting that the theorem does not require the functional
to be bounded from above, this is because we are only concerned with lower semi-
continuity. Without the restriction on the growth of the functional, it is conceivable
that ∃ u ∈ W 1,q (Ω) such that I(u) = +∞. However, this does not interfere with
our discussion on lower semi-continuity.
Utilizing the concept of Carathéodory functions, assumption (1) in Theorem
11.2 can be replaced by
(1′): for a.e. $(x, u)$, $p \mapsto L(x, u, p)$ is differentiable, and $L$ and $L_p$ are both
Carathéodory functions.
11.3 An existence theorem
Theorem 11.3 (The existence of extreme values) Let $\Omega \subset \mathbb{R}^n$ be a bounded
measurable set. If $u_0 \in W^{1,q}(\Omega, \mathbb{R}^N)$, $1 < q < \infty$, and if
(1) $L$ and $L_p$ are Carathéodory functions,
(2′) ∃ $a \in L^1(\Omega)$, $b > 0$, such that $L(x, u, p) \ge -a(x) + b|p|^q$, ∀ $(x, u, p) \in \Omega \times \mathbb{R}^N \times \mathbb{R}^{nN}$,
(3) ∀ $(x, u) \in \Omega \times \mathbb{R}^N$, $p \mapsto L(x, u, p)$ is convex,
then the functional
\[ I(u) = \int_\Omega L(x, u(x), \nabla u(x))\,dx \]
attains its minimum on $u_0 + W_0^{1,q}(\Omega, \mathbb{R}^N)$.
Proof Consider the functional $v \mapsto I(u_0 + v)$ on the reflexive Banach space
$W_0^{1,q}(\Omega, \mathbb{R}^N)$.
1. Although assumption (2′) and assumption (2) of Theorem 11.2 are not quite the same, we can
consider the functional $I + \int_\Omega a(x)\,dx$ instead of $I$. Then applying the Tonelli–Morrey
theorem to $W_0^{1,q}(\Omega, \mathbb{R}^N)$, it follows that $I$ is weakly sequentially lower
semi-continuous.
2. $I$ is coercive, i.e.
\[ I(u) \to +\infty \quad \text{when } \|u\|_{1,q} \to \infty. \]
In fact, ∀ $v \in W_0^{1,q}(\Omega, \mathbb{R}^N)$, by Poincaré’s inequality, there exist $\alpha, \beta > 0$ such
that
\[ I(u_0 + v) \ge -\int_\Omega a(x)\,dx + b \int_\Omega |\nabla(u_0 + v)|^q\,dx \ge \alpha\|v\|_{1,q}^q - \beta. \]
As an example, we claim the Poisson equation has a generalized solution.
Given a bounded $\Omega \subset \mathbb{R}^n$ and a function $f \in L^2(\Omega)$, there exists $u \in H_0^1(\Omega)$
such that
\[ -\Delta u = f. \]
As a matter of fact, by Theorem 11.3, the functional
\[ I(u) = \int_\Omega \Bigl( \frac{1}{2}|\nabla u|^2 - fu \Bigr)\,dx \]
attains its minimum on $H_0^1(\Omega)$.
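The link between the minimizer and the Poisson equation can be made explicit by expanding the functional along a line through $u$. The following computation is a sketch, with $\varphi \in H_0^1(\Omega)$ an arbitrary direction and $s \in \mathbb{R}$:

```latex
\begin{align*}
I(u + s\varphi)
  &= I(u) + s \int_\Omega (\nabla u \cdot \nabla\varphi - f\varphi)\,dx
     + \frac{s^2}{2} \int_\Omega |\nabla\varphi|^2\,dx, \\
\frac{d}{ds} I(u + s\varphi)\Big|_{s=0} = 0
  &\iff \int_\Omega \nabla u \cdot \nabla\varphi\,dx = \int_\Omega f\varphi\,dx,
  \qquad \forall\, \varphi \in H_0^1(\Omega).
\end{align*}
```

The last identity is the generalized (weak) form of $-\Delta u = f$ with zero boundary values. Since the quadratic term is non-negative, any $u$ satisfying it is conversely a minimizer of $I$.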
11.4∗ Quasi-convexity
Although we have relaxed the convexity requirement to “∀ $(x, u) \in \Omega \times \mathbb{R}^N$,
$p \mapsto L(x, u, p)$ is convex”, this condition is still too strong in many
applications.
For instance, the Jacobian determinant $\det\bigl( \frac{\partial u^i}{\partial x_j} \bigr)$ clearly possesses both
geometric and mechanical meaning. When $\Omega \subset \mathbb{R}^n$, $u = (u^1, u^2, \ldots, u^n) : \Omega \to \mathbb{R}^n$,
Lagrangians containing the Jacobian determinant of $u$ often occur in both
geometry and mechanics. For example, let $f \in C(\mathbb{R}^1, \mathbb{R}^1)$ be a convex function,
\[ L : \mathbb{R}^{n^2} \to \mathbb{R}^1, \qquad A \mapsto L(A) = f(\det(A)), \]
where $A = \bigl( \frac{\partial u^i}{\partial x_j} \bigr)$. However, $L$ is in general not convex in $p$.
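A concrete instance of this failure of convexity, with the choices $n = 2$ and $f(t) = t^2$ made here purely for illustration, is:

```latex
\begin{align*}
L(A) &= (\det A)^2, \qquad
A_1 = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}, \quad
A_2 = \begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix}, \\
L(A_1) &= L(A_2) = 0, \qquad
L\Bigl( \tfrac{1}{2}A_1 + \tfrac{1}{2}A_2 \Bigr)
  = L\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = 1
  > \tfrac{1}{2}L(A_1) + \tfrac{1}{2}L(A_2) = 0 .
\end{align*}
```

So $L$ violates the convexity inequality at the midpoint of $A_1$ and $A_2$, even though $f$ itself is convex.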
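It is natural to ask whether the assumption on $L$ being convex in $p$ is indeed
necessary. We shall first take a look at the simple case where $L$ only depends on
$p$ and $n = 1$. The more general cases will be inspired by this simple case.
Suppose $I(u) = \int_J L(u'(t))\,dt$ is sequentially weak-∗ lower semi-continuous,
i.e.
\[ u_j \rightharpoonup^* u \ (\text{in } W^{1,\infty}(J, \mathbb{R}^N)) \implies \liminf_{j \to \infty} I(u_j) \ge I(u). \]
We seek conditions which $L$ must satisfy.
Without loss of generality, we may assume $J = (0, 1)$; in particular, ∀ $\eta \in \mathbb{R}^N$,
∀ $\lambda \in (0, 1)$, take
\[
\varphi(t) = \eta \cdot \begin{cases}
t(1 - \lambda), & t \in [0, \lambda), \\
(1 - t)\lambda, & t \in [\lambda, 1),
\end{cases}
\]
then $\varphi \in W_0^{1,\infty}(J, \mathbb{R}^N)$. We now periodically extend this function to the whole line,
and let
\[ \varphi_m(t) = \frac{1}{m}\varphi(mt). \]
For arbitrary $u_0, p_0 \in \mathbb{R}^N$, we construct the function
\[ u(t) = u_0 + p_0 t \]
and the sequence
\[ u_m(t) = u(t) + \varphi_m(t), \qquad m = 1, 2, \ldots. \]
We have:
\[ |(u_m - u)(t) - (u_m - u)(s)| = |\varphi_m(t) - \varphi_m(s)| \le \sup_{t \le \theta \le s} |\varphi'(m\theta)|\,|t - s|, \]
namely, the Lipschitz constants of $u_m - u$ are bounded by $\|\varphi'\|_{L^\infty(J)}$, so $\{u_m\}$ is
bounded in $W^{1,\infty}(J, \mathbb{R}^N)$, and
\[ u_m \to u \quad (\text{in } L^\infty(J)). \]
By further applying the facts $\dot\varphi_m(t) = \varphi'(mt)$, $\varphi \in W_0^{1,\infty}(J, \mathbb{R}^N)$, that the mean
value of $\varphi'$ over one period is $0$, and by Example 9.3 in Lecture 9, it follows that
\[ \varphi_m' \rightharpoonup^* 0 \quad (\text{in } L^\infty(J)). \]
Now each functional ξ defining the weak-∗ topology of $W^{1,\infty}(J, \mathbb{R}^N)$ is represented by
$\xi = (\xi_0, \xi_1) \in L^1(J, \mathbb{R}^{2N})$. Moreover,
since $Du_m \rightharpoonup^* Du$ in $L^\infty(J)$, it follows that
\[ \langle \xi, u_m - u \rangle = \int_J [\xi_0(u_m - u) + \xi_1 D(u_m - u)]\,dt \to 0. \]
Namely, $u_m \rightharpoonup^* u$ in $W^{1,\infty}(J, \mathbb{R}^N)$.
Notice that
\[ I(u) = \int_J L(p_0)\,dt = L(p_0), \]
and by the periodicity of φ, we have
\begin{align*}
I(u_m) &= \int_J L(p_0 + \varphi_m'(t))\,dt \\
&= \frac{1}{m} \int_{mJ} L(p_0 + \varphi'(t))\,dt \\
&= \int_J L(p_0 + \varphi'(t))\,dt.
\end{align*}
Since $I$ is sequentially weak-∗ lower semi-continuous, $\liminf I(u_m) \ge I(u)$;
this yields
\[ \int_J L(p_0 + \varphi'(t))\,dt \ge L(p_0). \]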
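For the particular sawtooth function $\varphi$ constructed above, the last inequality can be evaluated explicitly, and it turns out to be exactly the convexity of $L$. The computation below is a sketch; the points $a, b$ are notation introduced here:

```latex
\begin{align*}
\varphi'(t) &= \begin{cases} (1-\lambda)\eta, & t \in [0, \lambda), \\
                             -\lambda\eta, & t \in [\lambda, 1), \end{cases} \\
\int_0^1 L(p_0 + \varphi'(t))\,dt
  &= \lambda\,L\bigl(p_0 + (1-\lambda)\eta\bigr)
     + (1-\lambda)\,L\bigl(p_0 - \lambda\eta\bigr) \ \ge\ L(p_0), \\
a := p_0 + (1-\lambda)\eta, \quad b := p_0 - \lambda\eta
  &\implies \lambda a + (1-\lambda) b = p_0, \qquad
  L(\lambda a + (1-\lambda) b) \le \lambda L(a) + (1-\lambda) L(b).
\end{align*}
```

Given arbitrary $a, b \in \mathbb{R}^N$ and $\lambda \in (0, 1)$, the choice $\eta = a - b$, $p_0 = \lambda a + (1-\lambda)b$ realizes them, so in the case $n = 1$ weak-∗ lower semi-continuity forces $L$ to be convex.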
Inspired by the above argument, we now consider the case n > 1.
Theorem 11.4 Let Ω ⊂ Rn be a region and L ∈ C(Ω, RnN ). If

Z
I(u) = L(∇u(x))dx
Ω
is sequentially weak-∗ lower semi-continuous on W 1,∞ (Ω, RN ), i.e.
uj *∗ u (W 1,∞ (Ω, RN )) =⇒ limI(uj ) ≥ I(u),
then for any hypercube D ⊂ D̄ ⊂ Ω, ∀ A0 ∈ RnN (an n × N matrix),
Z
L(A0 + ∇ϕ(x))dx ≥ L(A0 )mes(D), ∀ ϕ ∈ W01,∞ (Ω, RN ).
D
Proof 1. We first extend $\varphi_m$ to higher dimensions. Without loss of generality, we may assume $D = [0, 1]^n$ is the unit hypercube. For any $k \in \mathbb{N}$, we partition $D$ into $2^{kn}$ subcubes of equal size, $D = \bigcup_{l=1}^{2^{kn}} D_l^k$, where each side of $D_l^k$ has length $2^{-k}$ and $D_l^k$ has center $c_l^k = 2^{-k}(y_1^l + \tfrac12, y_2^l + \tfrac12, \ldots, y_n^l + \tfrac12)$, where $(y_1^l, y_2^l, \ldots, y_n^l)$, $l = 1, 2, \ldots, 2^{kn}$, range over the lattice points $\{0, 1, 2, \ldots, 2^k - 1\}^n$. For any $v \in W_0^{1,\infty}(D, \mathbb{R}^N)$, we extend it periodically to the entire $\mathbb{R}^n$. Let
\[ w_k(x) = \frac{1}{2^k}\, v(2^k (x - c_l^k)), \quad x \in D_l^k, \ \forall\, l = 1, 2, \ldots, 2^{kn}, \]
then
\[ \nabla w_k(x) = \nabla v(2^k (x - c_l^k)), \quad x \in D_l^k, \ \forall\, l = 1, 2, \ldots, 2^{kn}, \]
and
\[ w_k \to 0 \ \text{ in } L^\infty(D, \mathbb{R}^N), \qquad \nabla w_k \overset{*}{\rightharpoonup} 0 \ \text{ in } L^\infty(D, \mathbb{R}^{nN}). \]
2. Define
uk (x) = A0 x + wk (x), k = 1, 2, . . . ,
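where $w_k$ is defined as above and vanishes outside $D$. It is clear that
\[ u_k \overset{*}{\rightharpoonup} u = A_0 x. \]
Hence,
\[ \mathrm{mes}(\Omega) L(A_0) = I(u) \le \liminf_{k \to \infty} I(u_k) \]
and
\[ \begin{aligned} I(u_k) &= \int_D L(A_0 + \nabla w_k(x))\,dx + \int_{\Omega \setminus D} L(A_0)\,dx \\ &= \sum_{l=1}^{2^{kn}} \int_{D_l^k} L(A_0 + \nabla v(2^k (x - c_l^k)))\,dx + L(A_0)\,\mathrm{mes}(\Omega \setminus D) \\ &= \int_D L(A_0 + \nabla v(x))\,dx + L(A_0)\,\mathrm{mes}(\Omega \setminus D). \end{aligned} \]
Our assertion now follows.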
This is why Morrey introduced the following concept of quasi-convexity of a

function.
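Definition 11.1 A function $L$ is said to be quasi-convex, if $\forall\, A \in \mathbb{R}^{nN}$ (an $n \times N$ matrix), $\forall$ hypercube $D \subset \mathbb{R}^n$, $\forall\, v \in W_0^{1,\infty}(D, \mathbb{R}^N)$, we have
\[ \mathrm{mes}(D)\, L(A) \le \int_D L(A + \nabla v(x))\,dx. \]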
The importance of quasi-convexity is that it ensures the weak sequential lower semi-continuity of a functional. The proof of the following theorem is rather lengthy; we refer interested readers to [Da], pp. 156–167.
Theorem 11.5 (Morrey–Acerbi–Fusco) When $1 \le p < \infty$, if
\[ I(u) = \int_\Omega L(\nabla u)\,dx \]
is weakly sequentially lower semi-continuous on $W^{1,p}(\Omega, \mathbb{R}^N)$ (or when $p = \infty$, $I$ is weak-∗ sequentially lower semi-continuous), then $L$ is quasi-convex. Conversely, if we add the growth conditions:
\[ \begin{cases} |L(A)| \le \alpha(1 + |A|), & p = 1, \\ -\alpha(1 + |A|^q) \le L(A) \le \alpha(1 + |A|^p), & 1 \le q < p < \infty, \\ |L(A)| \le \eta(|A|), & p = \infty, \end{cases} \]

where α > 0 is a constant, and η is a non-decreasing continuous function, and if

L is quasi-convex, then when 1 ≤ p < ∞, I is weakly sequentially lower semi-
continuous on W 1,p (Ω, RN ) (or when p = ∞, I is weak-∗ sequentially lower
semi-continuous).
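What kind of functions are quasi-convex? Suppose $f : \mathbb{R}^1 \to \mathbb{R}^1$ is convex and $u \in L^1(\Omega)$, then from Jensen’s inequality:
\[ f\Bigl( \frac{1}{\mathrm{mes}(\Omega)} \int_\Omega u(x)\,dx \Bigr) \le \frac{1}{\mathrm{mes}(\Omega)} \int_\Omega f(u(x))\,dx, \]
we see that if $L$ is convex, then $L$ must be quasi-convex. In fact, if $p \mapsto L(p)$ is convex, then, since $\int_D \nabla\varphi\,dx = 0$,
\[ L(p) = L\Bigl( \mathrm{mes}(D)^{-1} \int_D (p + \nabla\varphi(x))\,dx \Bigr) \le \mathrm{mes}(D)^{-1} \int_D L(p + \nabla\varphi(x))\,dx, \quad \forall\, \varphi \in W_0^{1,\infty}(D, \mathbb{R}^N). \]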
It is worth noting when n = 1 or N = 1, quasi-convexity and convexity coincide.

We first verify the above statement by showing quasi-convexity implies
convexity for n = 1.
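Let $\xi, \eta \in \mathbb{R}^N$ and $\lambda \in [0, 1]$. Let
\[ \xi_1 = \xi + (1 - \lambda)\eta, \qquad \xi_2 = \xi - \lambda\eta. \]
Then
\[ \xi = \lambda\xi_1 + (1 - \lambda)\xi_2, \qquad \eta = \xi_1 - \xi_2. \]
Define
\[ \varphi(t) = \eta \cdot \begin{cases} t(1 - \lambda), & t \in [0, \lambda), \\ (1 - t)\lambda, & t \in [\lambda, 1), \end{cases} \]
and substitute it into the quasi-convexity assumption:
\[ L(\xi) \le \int_0^1 L(\xi + \varphi'(t))\,dt = \int_0^\lambda L(\xi_1)\,dt + \int_\lambda^1 L(\xi_2)\,dt = \lambda L(\xi_1) + (1 - \lambda) L(\xi_2). \]
This shows that $p \mapsto L(p)$ is convex.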
Next, we verify quasi-convexity implies convexity for N = 1, n > 1. That is,
∀ ξ1 , ξ2 ∈ Rn , we want to show
L(λξ1 + (1 − λ)ξ2 ) ≤ λL(ξ1 ) + (1 − λ)L(ξ2 ).
Take a hypercube D ⊂ Ω, without loss of generality, we may assume D = [0, 1]n .
We continue to use the above defined function ϕ, ∀ x = (x1 , . . . , xn ) ∈ D, let
uk (x) = ηk −1 ϕ(kx1 ),
then
\[ \nabla u_k(x) = \eta \cdot \begin{cases} (1 - \lambda), & \{kx_1\} \in [0, \lambda), \\ -\lambda, & \{kx_1\} \in [\lambda, 1), \end{cases} \]
where {y} represents the fractional part of y ∈ R1 and η = ξ1 − ξ2 . Furthermore,
let
vk (x) = η min{k −1 ϕ(kx1 ), dist(x, ∂D)},
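where $\mathrm{dist}(x, \partial D) = \inf\{\sup_{1 \le i \le n} |x_i - y_i| \;|\; y = (y_1, \ldots, y_n) \in \partial D\}$. Thus, $v_k|_{\partial D} = 0$, and there exists a constant $K > 0$ such that $|u_k(x) - v_k(x)| \le K\|x - y\|$, whence, $v_k \in W_0^{1,p}(D)$, $1 < p < \infty$. In addition,
\[ \mathrm{mes}\{x \in D \;|\; \nabla u_k(x) \ne \nabla v_k(x)\} \to 0, \quad k \to \infty. \]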
If we take D1 = {x ∈ D | ∇uk (x) = (1 − λ)η}, D2 = {x ∈ D | ∇uk (x) =

−λη}, then D = D1 ∪ D2 .
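Since $L$ is quasi-convex,
\[ \mathrm{mes}(D)\, L(\lambda\xi_1 + (1 - \lambda)\xi_2) \le \int_D L(\lambda\xi_1 + (1 - \lambda)\xi_2 + \nabla v_k(x))\,dx. \]
Let $\xi = \lambda\xi_1 + (1 - \lambda)\xi_2$, then by the absolute continuity of integrals, we have
\[ \begin{aligned} \lim_{k \to \infty} \int_D L(\xi + \nabla v_k(x))\,dx &= \lim_{k \to \infty} \int_D L(\xi + \nabla u_k(x))\,dx \\ &= \int_{D_1} L(\xi_1)\,dx + \int_{D_2} L(\xi_2)\,dx \\ &= (\lambda L(\xi_1) + (1 - \lambda) L(\xi_2))\,\mathrm{mes}(D). \end{aligned} \]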
This proves our assertion.
However, there are quasi-convex functions which are not convex. For exam-
ple, the determinant function
A 7→ det(A).
We only verify this for the case N = n = 2. Given a matrix A = (aij ). Let
u = (u1 , u2 ), P = (pij ), and L(P ) = det(P ). Since
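\[ \det(\nabla\varphi) = \partial_1(\varphi_1 \partial_2\varphi_2) - \partial_2(\varphi_1 \partial_1\varphi_2), \]
we have:
\[ \int_D \det(\nabla\varphi)\,dx = 0, \quad \forall\, \varphi \in W_0^{1,\infty}(D, \mathbb{R}^2). \]
Consequently,
\[ \begin{aligned} \mathrm{mes}(D)^{-1} \int_D \det(A + \nabla\varphi(x))\,dx &= \mathrm{mes}(D)^{-1} \int_D [\det(A) + \det(\nabla\varphi(x)) + a_{11}\partial_2\varphi_2 + a_{22}\partial_1\varphi_1 \\ &\qquad\qquad - a_{12}\partial_1\varphi_2 - a_{21}\partial_2\varphi_1]\,dx \\ &= \det(A). \end{aligned} \]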
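The identity above lends itself to a quick numerical check. The sketch below is an illustration only: the test function $\varphi = (\sin \pi x \sin \pi y,\ \sin 2\pi x \sin \pi y)$ on $D = [0,1]^2$, the matrix $A$, and all function names are assumptions made for this example. It averages $\det(A + \nabla\varphi)$ over $D$ with a midpoint rule.

```python
import math

def grad_phi(x, y):
    """Gradient of phi = (phi1, phi2), which vanishes on the boundary of [0,1]^2.
    phi1 = sin(pi x) sin(pi y), phi2 = sin(2 pi x) sin(pi y) (illustrative choice).
    Row r holds (d1 phi_r, d2 phi_r)."""
    pi = math.pi
    d1p1 = pi * math.cos(pi * x) * math.sin(pi * y)
    d2p1 = pi * math.sin(pi * x) * math.cos(pi * y)
    d1p2 = 2 * pi * math.cos(2 * pi * x) * math.sin(pi * y)
    d2p2 = pi * math.sin(2 * pi * x) * math.cos(pi * y)
    return (d1p1, d2p1), (d1p2, d2p2)

def det2(m):
    """Determinant of a 2x2 matrix given as nested sequences."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def mean_det(A, n=200):
    """Midpoint-rule average of det(A + grad phi) over D = [0,1]^2, mes(D) = 1."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            g = grad_phi((i + 0.5) * h, (j + 0.5) * h)
            M = [[A[r][c] + g[r][c] for c in range(2)] for r in range(2)]
            total += det2(M)
    return total * h * h

A = [[1.0, 2.0], [3.0, 4.0]]
print(det2(A), mean_det(A))
```

Up to rounding, the two printed numbers agree, which is the equality case of the quasi-convexity inequality for $L = \det$.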
Recall the Legendre–Hadamard condition from Lecture 6 in determining whether
u0 is a weak minimum,
\[ L_{p^j_\alpha p^k_\beta}(x, u_0(x), \nabla u_0(x))\, \pi^j_\alpha \pi^k_\beta \ge 0, \quad \forall\, \pi \in \mathbb{R}^{n \times N}, \ \mathrm{rank}(\pi) = 1. \]
This is related to convexity. In fact, if ∀ (x, u) ∈ Ω × RN , p 7→ L(x, u, p) is

convex, and if L ∈ C 2 , then for any u, the Legendre–Hadamard condition holds.
However, the Legendre–Hadamard condition does not require $L$ to be convex along all $n \times N$ matrices; it only requires $L$ to be convex along all rank 1 directions. We shall call the Lagrangian $L$ rank 1 convex (for brevity, we omit $(x, u) \in \Omega \times \mathbb{R}^N$) if it satisfies the following condition:
\[ L(\lambda B + (1 - \lambda) C) \le \lambda L(B) + (1 - \lambda) L(C), \quad \forall\, \lambda \in [0, 1], \ \forall\, B, C \in M^{n \times N}, \ \mathrm{rank}(B - C) = 1. \]
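As a hedged illustration of this definition, the sketch below evaluates the gap $\lambda L(B) + (1-\lambda) L(C) - L(\lambda B + (1-\lambda) C)$ for $L = \det$ on $2 \times 2$ matrices. The matrices and helper names are assumptions chosen for the example. Along a rank 1 direction the gap vanishes, while along a rank 2 direction it can be negative, so $\det$ is rank 1 convex without being convex.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def combo(lam, B, C):
    """The matrix lam*B + (1-lam)*C."""
    return [[lam * B[r][c] + (1 - lam) * C[r][c] for c in range(2)] for r in range(2)]

def convexity_gap(L, lam, B, C):
    """lam*L(B) + (1-lam)*L(C) - L(lam*B + (1-lam)*C); >= 0 iff the inequality holds."""
    return lam * L(B) + (1 - lam) * L(C) - L(combo(lam, B, C))

# B1 - C1 = a (x) b with a = (1, 2), b = (3, -1): a rank 1 difference.
a, b = (1.0, 2.0), (3.0, -1.0)
C1 = [[1.0, 2.0], [0.5, -1.0]]
B1 = [[C1[r][c] + a[r] * b[c] for c in range(2)] for r in range(2)]
rank_one_gaps = [convexity_gap(det2, k / 10, B1, C1) for k in range(11)]

# B2 - C2 = diag(2, -2): a rank 2 difference.
B2 = [[1.0, 0.0], [0.0, -1.0]]
C2 = [[-1.0, 0.0], [0.0, 1.0]]
full_rank_gap = convexity_gap(det2, 0.5, B2, C2)

print(max(abs(g) for g in rank_one_gaps), full_rank_gap)
```

The second printed value is $-1$: at the midpoint of $B_2$ and $C_2$ the determinant is $0$, above the average $-1$ of the endpoint values.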
In summary,
\[ \text{convex} \Longrightarrow \text{quasi-convex} \Longrightarrow \text{rank 1 convex} \overset{n = 1 \text{ or } N = 1}{\Longrightarrow} \text{convex}. \]
Returning to the existence problem, with the assistance of the Morrey–Acerbi–
Fusco theorem, we have a more general existence theorem.
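Theorem 11.6 Suppose $L : M^{n \times N} \to \mathbb{R}^1$ is continuous and quasi-convex, and there exist constants $C_2 > C_1 > 0$ such that
\[ C_1 |A|^p \le L(A) \le C_2 (1 + |A|^p), \quad 1 < p < \infty. \]
If, further, we assume $v \in W^{1,p}(\Omega, \mathbb{R}^N)$, then in the set $E = W_v^{1,p} = \{u \in W^{1,p}(\Omega, \mathbb{R}^N) \;|\; u|_{\partial\Omega} = v|_{\partial\Omega}\}$,
\[ I(u) = \int_\Omega L(\nabla u(x))\,dx \]
achieves its minimum.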
Exercises
1. Suppose ∀ x ∈ Ω, (u, p) 7−→ L(x, u, p) is convex. Prove that the functional

Z
I(u) = L(x, u(x), ∇u(x))dx
Ω
is convex in u.
2. Find the weak limit of $\varphi_n(x) = \frac{1}{n} \sin nx$ in $W^{1,q}(0, 2\pi)$, $1 \le q < \infty$.
3. Let Ω ⊂ Rn be a bounded region. Suppose in W 1,q (Ω) (1 ≤ q < ∞), the
sequence {uj } converges weakly and {uj } converges almost everywhere to a
function u0 . Does {uj } contain a convergent subsequence in Lq (Ω)? If so,
what does it converge to? Is u0 ∈ W 1,q (Ω)?
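4. Let $J = (0, 1)$, $\lambda \in (0, 1)$, $\alpha, \beta \in \mathbb{R}^1$,
\[ \varphi(x) = \begin{cases} \alpha, & x \in (0, \lambda), \\ \beta, & x \in (\lambda, 1), \end{cases} \]
extended periodically to $\mathbb{R}^1$, and let $\varphi_n(x) = \varphi(nx)$. Show that $\varphi_n \rightharpoonup \lambda\alpha + (1 - \lambda)\beta$.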
5. Determine whether each of the following functionals is weakly sequentially lower semi-continuous on the specified space:
(1) Assume $\Phi$ is a convex function of a single variable and $c$ is a continuous function on the bounded set $\bar\Omega$,
\[ I(u) = \int_\Omega (\Phi(|\nabla u(x)|) + c(x)|u(x)|^4)\,dx \quad \text{in } W^{1,4}(\Omega). \]
(2) $I(u) = \int_\Omega [1 + |\nabla u(x)|^2]^{1/2}\,dx$ in $W^{1,1}(\Omega)$.
(3) $I(u) = \int_{-1}^{1} t^2\, \dot u(t)^2\,dt$ in $H^1(-1, 1)$.
Lecture 12
Boundary value problems and eigenvalue

problems of linear differential equations
12.1 Linear boundary value problems and orthogonal projections
In previous lectures, the methods used in analyzing the existence of extreme

values of functionals not only rely on the weak sequential compactness of the
domain but also the weak sequential lower semi-continuity of the functionals.
Interestingly, as mentioned in Lecture 9, for some special variational prob-
lems associated with a particular class of linear differential equations, the weak
sequential compactness as well as the weak sequential lower semi-continuity can
be replaced by the completeness of a Hilbert space, and we need only the orthog-
onal projections in a Hilbert space. In this lecture, we introduce the orthogonal
projection method and its applications in variational problems.
Recall the Poisson equation in Lecture 11: let $\Omega \subset \mathbb{R}^n$ be a bounded region and $f \in L^2(\Omega)$; we want to find $u : \Omega \to \mathbb{R}^1$ satisfying the equation
\[ \begin{cases} -\Delta u = f & \text{in } \Omega, \\ u = 0 & \text{on } \partial\Omega. \end{cases} \]
In the last lecture, we obtained the minimum by proving that a minimizing sequence of the functional
\[ I(u) = \int_\Omega \Bigl( \frac12 |\nabla u|^2 - f u \Bigr)\,dx \]
contains a convergent subsequence in $H_0^1(\Omega)$.
However, since $H_0^1(\Omega)$ is a Hilbert space equipped with the inner product
\[ ((u, v)) = \int_\Omega \nabla u \cdot \nabla v\,dx, \quad \forall\, u, v \in H_0^1(\Omega), \]
we note the first term in the functional $I$ precisely corresponds to the norm induced by this inner product, whereas the second term can also be expressed via this inner product. To see this, we regard
\[ \varphi \mapsto \int_\Omega f \cdot \varphi\,dx \]
as a linear functional on $H_0^1(\Omega)$. Using Schwarz’s inequality, we have
\[ \Bigl| \int_\Omega f \cdot \varphi\,dx \Bigr| \le \|f\|_2 \cdot \|\varphi\|_2, \]
and Poincaré’s inequality yields
\[ \Bigl| \int_\Omega f \cdot \varphi\,dx \Bigr| \le C \|f\|_2 \|\varphi\|_{H_0^1}, \]
which implies the continuity of the functional. Hence, there exists $F \in (H_0^1(\Omega))^*$ such that
\[ F(\varphi) = \int_\Omega f \cdot \varphi\,dx. \]
By the Riesz representation theorem from functional analysis, this continuous linear functional can be represented via the inner product as follows: $\exists\, u_0 \in H_0^1(\Omega)$ such that
\[ ((u_0, \varphi)) = F(\varphi) = \int_\Omega f \varphi\,dx. \]
Consequently, we can rewrite the E-L equation of $I$ in terms of the inner product:
\[ ((u, \varphi)) = \int_\Omega f \cdot \varphi\,dx = ((u_0, \varphi)), \quad \forall\, \varphi \in C_0^\infty(\Omega). \]
Since $C_0^\infty(\Omega)$ is dense in $H_0^1(\Omega)$, $u = u_0$ is the solution.
On the surface, this approach involves neither a minimizing sequence nor weak sequential compactness and weak sequential lower semi-continuity. The solution is obtained directly. However, the proof of the Riesz representation theorem is itself a variational problem, namely, finding the distance from a point outside a given hyperplane to the hyperplane in a Hilbert space, along with the projection of the point onto the hyperplane.
Let us recall this proof. Denote
M = {η ∈ H01 (Ω)|F (η) = 0}.
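$M$ is a closed linear subspace of $H_0^1(\Omega)$. For any $\varphi \in H_0^1(\Omega)$, we can find the orthogonal projection $\eta$ of $\varphi$ onto $M$, i.e. $\eta \in M$ and $\varphi - \eta = \xi \perp M$. If we take
\[ v_0 = \begin{cases} F(\xi)\, \dfrac{\xi}{[[\xi]]^2}, & \xi \ne 0, \\ 0, & \xi = 0, \end{cases} \]
then $((v_0, \varphi)) = F(\varphi)$ as desired.
Note, if $\xi = 0$, then the above equality clearly holds, since $\varphi = \eta \in M$. If $\xi \ne 0$, then, since $F(\eta) = 0$,
\[ F(\varphi) = F(\xi), \]
but, since $((\xi, \eta)) = 0$,
\[ ((v_0, \varphi)) = F(\xi) \Bigl(\Bigl( \frac{\xi}{[[\xi]]^2}, \varphi \Bigr)\Bigr) = F(\xi), \]
which is exactly what we want.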
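The construction can be tried out in finite dimensions. The following sketch is an illustration under stated assumptions, not the author's method: it replaces $H_0^1(\Omega)$ by $\mathbb{R}^3$ with the Euclidean inner product, takes a sample functional $F$, and uses hypothetical helper names. It projects a vector $\varphi$ onto $M = \ker F$ and forms $v_0 = F(\xi)\,\xi/[[\xi]]^2$.

```python
def dot(x, y):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(x, y))

def kernel_basis(F, n):
    """Orthonormal basis of M = {x : F(x) = 0} in R^n via Gram-Schmidt; assumes F != 0."""
    e = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    k = max(range(n), key=lambda i: abs(F(e[i])))  # coordinate with F(e_k) != 0
    basis = []
    for i in range(n):
        if i == k:
            continue
        # e_i - (F(e_i)/F(e_k)) e_k lies in the kernel of F
        b = [e[i][j] - F(e[i]) / F(e[k]) * e[k][j] for j in range(n)]
        for q in basis:
            c = dot(b, q)
            b = [bj - c * qj for bj, qj in zip(b, q)]
        norm = dot(b, b) ** 0.5
        basis.append([bj / norm for bj in b])
    return basis

def riesz_representer(F, phi):
    """v0 = F(xi) xi / |xi|^2 with xi = phi - (projection of phi onto ker F)."""
    n = len(phi)
    eta = [0.0] * n
    for q in kernel_basis(F, n):
        c = dot(phi, q)
        eta = [ej + c * qj for ej, qj in zip(eta, q)]
    xi = [p - ej for p, ej in zip(phi, eta)]
    s = F(xi) / dot(xi, xi)  # requires xi != 0, i.e. F(phi) != 0
    return [s * x for x in xi]

F = lambda x: 2.0 * x[0] - x[1] + 3.0 * x[2]  # sample linear functional
v0 = riesz_representer(F, [1.0, 1.0, 1.0])
print(v0)
```

For this $F$ the result is the coefficient vector $(2, -1, 3)$, so $((v_0, x)) = F(x)$ for every $x$, not only for the particular $\varphi$ used in the construction.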
If we take the norm induced by the inner product $((\cdot, \cdot))$ on the Hilbert space, specifically
\[ [[u]] = \Bigl( \int_\Omega |\nabla u|^2\,dx \Bigr)^{1/2}, \]
then the following argument precisely depicts the minimization process used in solving variational problems.
We now return to examine the existence of orthogonal projections (see
Figure 12.1).
Fig. 12.1
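This can be converted to a variational problem:
\[ \min_{x \in M} [[\varphi - x]]. \]
In fact, if there exists $\eta \in M$ such that $[[\varphi - \eta]] = \min_{x \in M} [[\varphi - x]]$, then $\xi = \varphi - \eta$ satisfies
\[ ((\xi, x - \eta)) = 0, \quad \forall\, x \in M. \]
This is because $\forall\, x \in M$ and $\forall\, t \in \mathbb{R}^1$ (note that $tx + (1 - t)\eta \in M$, since $M$ is a linear subspace), by letting
\[ g(t) = [[\varphi - (tx + (1 - t)\eta)]]^2 = [[\varphi - \eta]]^2 - 2t((\varphi - \eta, x - \eta)) + t^2 [[x - \eta]]^2, \]
$g$ becomes a quadratic function, which achieves its minimum at $t = 0$, whence,
\[ g'(0) = -2((\varphi - \eta, x - \eta)) = 0, \tag{12.1} \]
namely,
\[ ((\xi, x - \eta)) = 0. \]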
But why does this minimum exist? Let $m = \inf_{x \in M} [[\varphi - x]]$, and choose a minimizing sequence $\{\eta_j\} \subset M$ such that
\[ [[\eta_j - \varphi]] < m + \frac{1}{j}, \quad \forall\, j \]
(see Figure 12.2).
Fig. 12.2
The question is: does $\{\eta_j\}$ converge? $\forall\, \varepsilon > 0$, by the parallelogram law and the fact that $M$ is a linear subspace (so that $\frac12(\eta_j + \eta_k) \in M$), we have
\[ [[\eta_j - \eta_k]]^2 = 2([[\eta_j - \varphi]]^2 + [[\eta_k - \varphi]]^2) - 4 \Bigl[\Bigl[ \frac{\eta_j + \eta_k}{2} - \varphi \Bigr]\Bigr]^2 \le 4(m + \varepsilon)^2 - 4m^2, \quad \text{for } j, k \text{ sufficiently large.} \tag{12.2} \]
Hence, $\{\eta_j\}$ is a Cauchy sequence. By the completeness of $H_0^1(\Omega)$, $\{\eta_j\}$ is convergent, i.e. $\eta_j \to \eta$. Since $M$ is closed, $\eta \in M$ and it achieves
\[ [[\eta - \varphi]] = \min_{x \in M} [[\varphi - x]]. \]
This argument provides a different angle from which to verify Dirichlet’s principle. Owing to the special structure of the problem, it is unnecessary to employ weak convergence, while the metric completeness of the space plays a key role.
The same method can also be used in handling variational inequalities (cf. Lec-
ture 7). For instance, the obstacle problem of a thin membrane: on the bounded
region Ω ⊂ Rn , given a measurable function ψ(x) as an obstacle and the external
force function f ∈ L2 (Ω). We seek the equilibrium position of the thin membrane
u : Ω → R1 .
We restate this as a variational problem. Let $C = \{u \in H_0^1(\Omega) \;|\; u(x) \le \psi(x) \text{ a.e.}\}$ be a closed convex subset of the Hilbert space $H_0^1(\Omega)$. Denote $(f, v) = \int_\Omega f \cdot v\,dx$; we wish to find $u \in C$ such that
\[ ((u, v - u)) - (f, v - u) \ge 0, \quad \forall\, v \in C. \tag{12.3} \]
According to the above discussion, there exists $u_0 \in H_0^1(\Omega)$ such that
\[ ((u_0, v)) = (f, v), \quad \forall\, v \in H_0^1(\Omega). \]
Thus, it becomes
\[ ((u - u_0, v - u)) \ge 0, \quad \forall\, v \in C. \tag{12.4} \]
Based on the earlier argument, this can be accomplished by finding
\[ \min_{v \in C} [[u_0 - v]]. \tag{12.5} \]
In fact, suppose $u$ is a solution to problem (12.5). Since $\forall\, v \in C$ and $\forall\, t \in [0, 1]$, $tv + (1 - t)u \in C$, (12.1) becomes
\[ g'(0) \ge 0, \]
i.e. (12.4). As for the existence of a solution for (12.5), this can still be deduced via a minimizing sequence $\{\eta_j\} \subset C$. Since $C$ is convex, in (12.2), $\frac12(\eta_j + \eta_k) \in C$. By the parallelogram law, $\{\eta_j\}$ is a Cauchy sequence. Since $C$ is closed, it must converge to a limit in $C$, at which the minimal distance is attained. Furthermore, we already know the solution of (12.5) is indeed the solution of (12.3).
12.2 The eigenvalue problems
Let $\Omega \subset \mathbb{R}^n$ be a region. We want to know for which values of $\lambda \in \mathbb{R}^1$ the boundary value problem
\[ \begin{cases} -\Delta u = \lambda u & \text{in } \Omega, \\ u = 0 & \text{on } \partial\Omega, \end{cases} \tag{12.6} \]
has a nonzero solution $u$. Similar to eigenvalue problems of matrices, we shall call this the eigenvalue problem of the differential operator $-\Delta$.
The set of those $\lambda$ which correspond to nonzero solutions is called the “spectrum”. A nonzero solution $u \in L^2(\Omega)$ is called an eigenfunction, and the corresponding $\lambda$ is called an eigenvalue. Eigenvalue problems of this kind are frequently encountered in geometry, mechanics, physics, and many other branches of mathematics.
Eigenvalue problems can be handled using constrained variational methods.

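As stated in Lecture 6, setting
\[ I(u) = \int_\Omega |\nabla u|^2\,dx \]
and
\[ N(u) = \int_\Omega |u|^2\,dx, \]
we wish to find
\[ \min\{I(u) \;|\; u \in H_0^1(\Omega) \cap N^{-1}(1)\}. \tag{12.7} \]
If $\varphi_1 \in C^2(\bar\Omega)$ indeed achieves such minimum, then by the Lagrange multipliers, for the adjusted E-L equation of the Lagrangian, there exists $\lambda_1 \in \mathbb{R}^1$ such that
\[ -\Delta\varphi_1 = \lambda_1 \varphi_1 \quad \text{in } \Omega. \tag{12.8} \]
Since $N(\varphi_1) = \int_\Omega |\varphi_1|^2\,dx = 1$, $\varphi_1$ is nonzero. Multiplying both sides of (12.8) by $\varphi_1$ and then integrating, it yields
\[ I(\varphi_1) = \int_\Omega |\nabla\varphi_1|^2\,dx = \lambda_1. \]
This means
\[ \lambda_1 = \min\{I(u) \;|\; u \in H_0^1(\Omega) \cap N^{-1}(1)\}. \tag{12.9} \]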
In the following, we shall verify the existence of a solution for the minimization problem (12.7). Suppose $\Omega \subset \mathbb{R}^n$ is bounded. We already know that $I$ is coercive and weakly sequentially lower semi-continuous; it remains to show the set
\[ M_1 = \{u \in H_0^1(\Omega) \;|\; N(u) = 1\} \]
is weakly sequentially closed. That is, if $\{u_j\} \subset M_1$, $u_j \rightharpoonup u$ in $H_0^1(\Omega)$, then $u \in M_1$.
By the Rellich–Kondrachov compact embedding theorem, for the bounded region $\Omega$, $H_0^1(\Omega) \hookrightarrow L^2(\Omega)$ is a compact embedding. By assumption, $u_j \rightharpoonup u$ in $H_0^1(\Omega)$, so it contains a subsequence $\{u_j'\}$ such that
\[ u_j' \to u \quad \text{in } L^2(\Omega). \]
Moreover, since
\[ \int_\Omega u_j^2\,dx = 1, \]
it follows that
\[ \int_\Omega u^2\,dx = 1, \]
i.e. $u \in M_1$, hence $M_1$ is weakly closed.

Once the first eigenvalue has been found, motivated by the eigenvalue
problems of matrices, it is natural to ask: are there any other eigenvalues? If
so, how to find them?
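We continue to adopt the idea of constrained optimization. Consider the set
\[ M_2 = \Bigl\{ u \in M_1 \;\Bigl|\; \int_\Omega u \cdot \varphi_1\,dx = 0 \Bigr\}; \]
we seek
\[ \min\{I(u) \;|\; u \in M_2\}. \]
If $\varphi_2 \in M_2$ is a minimum, then $\varphi_2 \ne 0$, and by the Lagrange multipliers, there exist $\lambda_2, \mu_2 \in \mathbb{R}^1$ such that
\[ -\Delta\varphi_2 = \lambda_2 \varphi_2 + \mu_2 \varphi_1. \tag{12.10} \]
We now prove that $\mu_2 = 0$. First, multiplying both sides of (12.8) by $\varphi_2$ and then integrating, it yields
\[ \int_\Omega \nabla\varphi_2 \cdot \nabla\varphi_1\,dx = \int_\Omega \nabla\varphi_1 \cdot \nabla\varphi_2\,dx = \lambda_1 \int_\Omega \varphi_1 \varphi_2\,dx = 0. \]
Next, multiplying both sides of (12.10) by $\varphi_1$ and then integrating, and using $\int_\Omega \varphi_2 \varphi_1\,dx = 0$, it yields
\[ \int_\Omega \nabla\varphi_2 \cdot \nabla\varphi_1\,dx = \mu_2 \int_\Omega |\varphi_1|^2\,dx = \mu_2. \]
Thus, $\mu_2 = 0$.
Consequently,
\[ -\Delta\varphi_2 = \lambda_2 \varphi_2 \quad \text{in } \Omega. \]
This confirms that $\lambda_2$ is indeed an eigenvalue with its corresponding eigenfunction $\varphi_2$, and $\varphi_2 \ne \varphi_1$. In addition,
\[ \lambda_2 = I(\varphi_2) = \int_\Omega |\nabla\varphi_2|^2\,dx = \min\{I(u) \;|\; u \in M_2\} \ge \lambda_1. \]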
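In order to show $I$ attains its minimum on $M_2$, we must show $M_2$ is weakly closed. Suppose $\{u_j\} \subset M_2$ and $u_j \rightharpoonup u$ (in $H_0^1(\Omega)$). Since $\{u_j\} \subset M_1$ and $M_1$ is weakly closed, $u \in M_1$. From $\int_\Omega u_j \varphi_1\,dx = 0$, it follows that
\[ \int_\Omega u \varphi_1\,dx = 0. \]
Thus, $u \in M_2$, i.e. $M_2$ is weakly closed.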

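Continuing in this fashion, we let
\[ M_n = \Bigl\{ u \in M_{n-1} \;\Bigl|\; \int_\Omega u \varphi_{n-1}\,dx = 0 \Bigr\} \]
and, using a similar argument, we can show that each $M_n$ is weakly closed. Hence, the constrained optimization problem
\[ \min\{I(u) \;|\; u \in M_n\} \]
has a solution $\varphi_n \ne 0$ satisfying
\[ -\Delta\varphi_n = \lambda_n \varphi_n + \sum_{j=1}^{n-1} \mu_j \varphi_j. \]
Likewise, using mathematical induction, we can show $\mu_1 = \cdots = \mu_{n-1} = 0$ and
\[ -\Delta\varphi_n = \lambda_n \varphi_n, \]
where
\[ \lambda_n = I(\varphi_n) = \min\{I(u) \;|\; u \in M_n\} \ge \lambda_{n-1}. \]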
As a consequence, we have the following theorem.
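Theorem 12.1 Suppose $\Omega \subset \mathbb{R}^n$ is a bounded region, then Eq. (12.6) has an increasing sequence of eigenvalues $0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n \to +\infty$, with corresponding eigenfunctions $\{\varphi_1, \varphi_2, \ldots\} \subset H_0^1(\Omega)$ satisfying
\[ \begin{cases} -\Delta\varphi_i = \lambda_i \varphi_i, \\ \displaystyle\int_\Omega |\varphi_i|^2\,dx = 1, \end{cases} \quad i = 1, 2, \ldots \]
and
\[ ((\varphi_i, \varphi_j)) = \int_\Omega \nabla\varphi_i \cdot \nabla\varphi_j\,dx = \lambda_i \int_\Omega \varphi_i \varphi_j\,dx = 0, \quad \forall\, i \ne j. \]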
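Proof It suffices to show $\lambda_n \to +\infty$.
We argue by contradiction. Suppose $\exists\, C > 0$ such that $\lambda_n \le C$ for all $n$, then
\[ \int_\Omega |\nabla\varphi_j|^2\,dx = \lambda_j \int_\Omega |\varphi_j|^2\,dx = \lambda_j \le C. \]
This implies $\{\varphi_j\}_1^\infty$ is a bounded sequence in $H_0^1(\Omega)$, hence there is a weakly convergent subsequence
\[ \varphi_j' \rightharpoonup \varphi^* \quad (\text{in } H_0^1(\Omega)). \]
On one hand, by the Rellich–Kondrachov compact embedding theorem, we have
\[ \varphi_j' \to \varphi^* \quad (\text{in } L^2(\Omega)). \tag{12.11} \]
On the other hand, since $\int_\Omega \varphi_i \varphi_j\,dx = 0$, $i \ne j$, we see that
\[ \int_\Omega |\varphi_i - \varphi_j|^2\,dx = \int_\Omega (\varphi_i^2 + \varphi_j^2)\,dx = 2, \quad i \ne j. \tag{12.12} \]
Substituting (12.11) into (12.12), it follows that
\[ \int_\Omega |\varphi^* - \varphi_j|^2\,dx = 2, \quad \forall\, j. \]
In particular, if we take $j = j'$, then $\int_\Omega |\varphi^* - \varphi_j'|^2\,dx = 2$, which contradicts (12.11).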
Remark 12.1 According to the regularity theory of elliptic equations, the eigenfunctions $\varphi_1, \varphi_2, \ldots$ are indeed infinitely differentiable on $\Omega$. Furthermore, if the boundary of $\Omega$ is smooth, then they are infinitely differentiable on $\bar\Omega$.
12.3 The eigenfunction expansions
We have established that the collection of eigenfunctions $\{\varphi_i\}_1^\infty$ is an orthogonal family not only in $H_0^1(\Omega)$ but also in $L^2(\Omega)$. We now show that this family is complete, i.e. that it is a complete orthogonal basis for the Hilbert space $L^2(\Omega)$.
$\forall\, u \in L^2(\Omega)$, let
\[ c_n = \int_\Omega u(x) \varphi_n(x)\,dx, \quad \forall\, n, \]
and we call them generalized Fourier coefficients of $u$. Consider the partial sum
\[ s_m(x) = \sum_{n=1}^m c_n \varphi_n(x); \]
by the completeness of $\{\varphi_i\}_1^\infty$, we mean as $m \to \infty$,
\[ s_m(x) \to u(x) \quad (\text{in } L^2(\Omega)). \]

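On one hand, by orthogonality,
\[ \begin{aligned} \int_\Omega |\nabla(u - s_m)|^2\,dx &= \int_\Omega |\nabla u|^2\,dx - 2 \int_\Omega \nabla u \cdot \nabla s_m\,dx + \int_\Omega |\nabla s_m|^2\,dx \\ &= \|\nabla u\|_2^2 - 2 \sum_{n=1}^m \lambda_n |c_n|^2 + \sum_{n=1}^m \lambda_n |c_n|^2 \\ &= \|\nabla u\|_2^2 - \sum_{n=1}^m \lambda_n |c_n|^2, \end{aligned} \tag{12.13} \]
it follows that
\[ \sum_{n=1}^m \lambda_n |c_n|^2 \le \int_\Omega |\nabla u|^2\,dx. \tag{12.14} \]
On the other hand, since $u - s_m$ is orthogonal to $\varphi_1, \ldots, \varphi_m$ in $L^2(\Omega)$ (i.e. it lies in $M_{m+1}$ up to normalization), we have
\[ \int_\Omega |\nabla(u - s_m)|^2\,dx \ge \lambda_{m+1} \int_\Omega |u - s_m|^2\,dx. \]
Using (12.13), we deduce that
\[ \|u - s_m\|_2^2 \le \frac{1}{\lambda_{m+1}} \|\nabla u\|_2^2 \to 0, \quad m \to \infty. \]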
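Hence, we have the Fourier expansion
\[ u(x) = \sum_{n=1}^\infty c_n \varphi_n(x) \quad (\text{in the sense of } L^2 \text{ norm}) \]
and Parseval’s identity,
\[ \int_\Omega |u|^2\,dx = \lim_{m \to \infty} \|s_m\|_2^2 = \lim_{m \to \infty} \sum_{i,j=1}^m c_i c_j \int_\Omega \varphi_i \varphi_j\,dx = \sum_{m=1}^\infty |c_m|^2. \]
In addition, we can also obtain
(1) $s_m \to u$ in $H_0^1(\Omega)$,
(2) $\int_\Omega |\nabla u|^2\,dx = \sum_{n=1}^\infty \lambda_n |c_n|^2$.
To prove (1), combining (12.13) and (12.14) and letting $m > n \to \infty$, it yields that
\[ \|\nabla(s_m - s_n)\|_2^2 = \sum_{j=n+1}^m \lambda_j |c_j|^2 \to 0. \]
Hence, $s_m \to u$ in $H_0^1(\Omega)$. In the meantime,
\[ \|\nabla u\|_2^2 = \lim_{m \to \infty} \|\nabla s_m\|_2^2 = \sum_{j=1}^\infty \lambda_j |c_j|^2. \]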
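These two identities can be illustrated in one dimension. The sketch below makes several assumptions that are not in the text: $\Omega = (0, \pi)$, the normalized Dirichlet eigenfunctions $\varphi_n(x) = \sqrt{2/\pi}\,\sin nx$ with $\lambda_n = n^2$, the sample function $u(x) = x(\pi - x)$, and a midpoint quadrature. It compares the truncated sums $\sum |c_n|^2$ and $\sum \lambda_n |c_n|^2$ with $\int |u|^2$ and $\int |u'|^2$.

```python
import math

def integrate(f, a, b, n=4000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def phi(n, x):
    """Normalized Dirichlet eigenfunction of -u'' on (0, pi); eigenvalue n^2."""
    return math.sqrt(2.0 / math.pi) * math.sin(n * x)

u = lambda x: x * (math.pi - x)   # sample function in H_0^1(0, pi)
du = lambda x: math.pi - 2.0 * x  # its derivative

N = 40  # number of Fourier coefficients kept
c = [integrate(lambda x, n=n: u(x) * phi(n, x), 0.0, math.pi) for n in range(1, N + 1)]

norm_u_sq = integrate(lambda x: u(x) ** 2, 0.0, math.pi)   # exact value: pi^5 / 30
energy_u = integrate(lambda x: du(x) ** 2, 0.0, math.pi)   # exact value: pi^3 / 3
parseval_sum = sum(cn ** 2 for cn in c)
energy_sum = sum(n ** 2 * cn ** 2 for n, cn in enumerate(c, start=1))

print(norm_u_sq, parseval_sum)
print(energy_u, energy_sum)
```

The energy sum converges more slowly than the Parseval sum, since its terms decay like $n^{-4}$ rather than $n^{-6}$ for this $u$.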
Remark 12.2 In the special case of $n = 1$, we consider the following Sturm–Liouville problem. On the interval $J = [a, b]$, let functions $p \in C^1(J)$ and $q \in C(J)$ be given. Suppose there exists a constant $\alpha > 0$ such that
\[ p(x) \ge \alpha. \]
By further assuming $q(x) \ge 0$, $\forall\, x \in J$, we define the Sturm–Liouville operator to be
\[ Lu = -(pu')' + qu. \]
Consider the following eigenvalue problem:
\[ \begin{cases} Lu = \lambda u & \text{in } J, \\ u(a) = u(b) = 0. \end{cases} \]
As before, on $H_0^1(J)$, consider the functional
\[ I(u) = \int_J \frac12 (p(t)|u'(t)|^2 + q(t)|u(t)|^2)\,dt. \]
However, we define an equivalent norm and inner product on $H_0^1(J)$ via $I$ as follows:
\[ \|u\| = \Bigl( \int_J (p(t)|u'(t)|^2 + q(t)|u(t)|^2)\,dt \Bigr)^{1/2}, \]
\[ ((u, v)) = \int_J (p(t)u'(t)v'(t) + q(t)u(t)v(t))\,dt. \]
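Inductively, by introducing the constraints
\[ M_1 = \Bigl\{ u \in H_0^1(J) \;\Bigl|\; \int_J |u|^2\,dx = 1 \Bigr\}, \quad M_2 = \Bigl\{ u \in M_1 \;\Bigl|\; \int_J u \varphi_1\,dx = 0 \Bigr\}, \ \ldots, \]
we obtain the eigenvalues and eigenfunctions
\[ 0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n \to +\infty, \]
and
\[ \varphi_1, \varphi_2, \ldots, \varphi_n, \ldots. \]
It is not difficult to verify $\varphi_n \in C^2(J)$, $\forall\, n$.
They satisfy
\[ \int_J \varphi_n \varphi_m\,dx = \delta_{nm}, \quad \forall\, n, m \]
and
\[ L\varphi_n = \lambda_n \varphi_n, \quad \forall\, n. \]
Furthermore, $\forall\, u \in H_0^1(J)$, its generalized Fourier expansion is
\[ u(x) = \sum_{n=1}^\infty c_n \varphi_n(x), \]
which converges in the sense of $H_0^1(J)$, where
\[ c_n = \int_J u(x) \varphi_n(x)\,dx. \]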
Example 12.1 Let $p = 1$, $q = 0$, and $J = [0, \pi]$, then
\[ Lu = -\ddot u. \]
It is easy to see
\[ \lambda_n = n^2, \quad \varphi_n(x) = \sin nx, \quad n = 1, 2, \ldots. \]
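The variational characterization (12.9) can be illustrated on this example. The following is a small sketch with hypothetical helper names and a midpoint quadrature chosen for convenience. It evaluates the Rayleigh quotient $\int_0^\pi |u'|^2\,dx / \int_0^\pi |u|^2\,dx$ for $u = \sin kx$, which reproduces $k^2$, and for the trial function $x(\pi - x)$, whose quotient $10/\pi^2 \approx 1.0132$ lies slightly above $\lambda_1 = 1$.

```python
import math

def integrate(f, a, b, n=2000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def rayleigh(u, du):
    """Rayleigh quotient on (0, pi): integral of u'^2 divided by integral of u^2."""
    num = integrate(lambda x: du(x) ** 2, 0.0, math.pi)
    den = integrate(lambda x: u(x) ** 2, 0.0, math.pi)
    return num / den

# Quotients of the eigenfunctions sin(kx), k = 1, 2, 3.
eigen_quotients = [
    rayleigh(lambda x, k=k: math.sin(k * x), lambda x, k=k: k * math.cos(k * x))
    for k in (1, 2, 3)
]

# A trial function vanishing at 0 and pi; its quotient bounds lambda_1 from above.
trial_quotient = rayleigh(lambda x: x * (math.pi - x), lambda x: math.pi - 2.0 * x)

print(eigen_quotients, trial_quotient)
```

Any admissible trial function gives an upper bound for $\lambda_1$ in this way, which is the practical content of the minimization problem (12.7).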
Remark 12.3 Besides the Dirichlet problem, we can also consider eigenvalue problems with other types of boundary conditions. As an example, we consider the Neumann problem:
\[ \begin{cases} -(pu')' + qu = \lambda u & \text{in } (a, b), \\ u'(a) = u'(b) = 0. \end{cases} \]
As mentioned in Lectures 1 and 6, for the Neumann problem, we shall use the space $H^1(J)$ instead of $H_0^1(J)$. The reason is as follows: using integration by parts,
\[ \int_a^b (pu')' \varphi\,dx = (pu')\varphi \big|_a^b - \int_a^b (pu')\varphi'\,dx, \]
the integral form of the E-L equation turns out to be
\[ \int_a^b (pu'\varphi' + (q - \lambda)u\varphi)\,dx = 0, \quad \forall\, \varphi \in H^1(J). \]
Simultaneously, we derive the following ordinary differential equation
\[ -(pu')' + qu = \lambda u, \]
with boundary condition
\[ u'(a) = u'(b) = 0. \]
However, on the space $H^1(J)$, the functional
\[ I(u) = \int_a^b (p|u'|^2 + q|u|^2)\,dx \]
is not coercive. We use the subspace
\[ X = \Bigl\{ u \in H^1(J) \;\Bigl|\; \int_a^b u\,dx = 0 \Bigr\}, \]
i.e. replacing $H_0^1(J)$ by the subspace which is orthogonal to all constant functions. On the subspace $X$, Poincaré’s inequality still remains valid. The proof is identical: since $\int_a^b u\,dx = 0$, there exists $\xi \in (a, b)$ such that $u(\xi) = 0$. Thus,
\[ |u(x)| = \Bigl| \int_\xi^x u'(t)\,dt \Bigr| \le (b - a)^{1/2} \Bigl( \int_a^b |u'|^2\,dt \Bigr)^{1/2}. \]
We can now minimize $I$ on the subspace $X$. In other words, we insert, on $H^1(J)$, an integral constraint
\[ \int_J u(t)\,dt = 0. \]
It is worth noting the Lagrange multiplier associated with this constraint will naturally disappear in the equation.
Note that when $q = 0$, a nonzero constant is itself an eigenfunction of the eigenvalue 0. Under the Neumann boundary condition, the Sturm–Liouville problem has eigenvalues $\{0, \lambda_1, \lambda_2, \ldots\}$ and eigenfunctions $\{1/\sqrt{b - a}, \varphi_1, \varphi_2, \ldots\}$.
Next, we consider the eigenvalue problem for the Laplace operator with Neumann boundary condition:
\[ \begin{cases} -\Delta u = \lambda u & \text{in } \Omega, \\ \dfrac{\partial u}{\partial n}\Big|_{\partial\Omega} = 0, \end{cases} \]
where $n$ denotes the unit normal direction on $\partial\Omega$. First, we introduce the subspace
\[ X = \Bigl\{ u \in H^1(\Omega) \;\Bigl|\; \int_\Omega u(x)\,dx = 0 \Bigr\}. \]
We then extend Poincaré’s inequality to the subspace $X$; (after inserting an integral constraint), we minimize $I(u) = \int_\Omega |\nabla u|^2\,dx$ on $X$.
Remark 12.4 The same method applies to the eigenvalue problem of the Laplace–Beltrami operator on a closed (no boundary) compact Riemannian manifold $(M, g)$:
\[ \Delta_g u = \frac{1}{\sqrt{g}} \sum_{i,j} \partial_i (g^{ij} \sqrt{g}\, \partial_j u), \]
where $g = \det(g_{ij})$, without any boundary condition.

In particular, it is worth noting that the following eigenvalue problem with periodic boundary condition:
\[ \begin{cases} -(pu')' + qu = \lambda u & \text{in } (a, b), \\ u(a) = u(b), \quad u'(a) = u'(b), \end{cases} \]
can also be regarded as an eigenvalue problem on the closed compact manifold $S^1$.
In this case, we replace $H_0^1(J)$ by $H^1_{\mathrm{per}}(J)$, the space of periodic functions; we replace Poincaré’s inequality by Wirtinger’s inequality (see Lemma 13.4 in Lecture 13), which can be used in verifying the coerciveness of a functional.
Remark 12.5 The above steps used in finding the eigenvalues and eigenfunctions coincide with the geometric approach of diagonalizing a quadratic form determined by a quadratic hyper-surface in $\mathbb{R}^n$. Here, $\lambda_1^{-1}, \lambda_2^{-1}, \ldots$ are the principal axes.
12.4 The minimax description of eigenvalues
Our previous description of the eigenvalues {λn } is inductive; in other words,

assuming the n − 1 eigenfunctions ϕ1 , . . . , ϕn−1 have been found, then we can
determine λn . The following min-max theorem gives a direct approach to find λn .
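Theorem 12.2 (Courant’s Min-Max Theorem)
\[ \lambda_n = \max_{E_{n-1}} \ \min_{u \in E_{n-1}^\perp \setminus \{\theta\}} \frac{\displaystyle\int_\Omega |\nabla u|^2\,dx}{\displaystyle\int_\Omega |u|^2\,dx}, \]
where $E_{n-1}$ is any $(n - 1)$-dimensional linear subspace of $H_0^1(\Omega)$.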

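Proof Assume $v_1, \ldots, v_{n-1} \in H_0^1(\Omega)$ is a collection of linearly independent functions. Let
\[ E_{n-1} = \mathrm{span}\{v_1, v_2, \ldots, v_{n-1}\} \]
and
\[ \mu(E_{n-1}) = \min_{u \in E_{n-1}^\perp \setminus \{\theta\}} \frac{\displaystyle\int_\Omega |\nabla u|^2\,dx}{\displaystyle\int_\Omega |u|^2\,dx}. \]
On one hand, we will prove
\[ \mu(E_{n-1}) \le \lambda_n. \]
Let $\{\varphi_1, \ldots, \varphi_n\}$ be the first $n$ eigenfunctions, then $(E_{n-1}^\perp \setminus \{\theta\}) \cap \mathrm{span}\{\varphi_1, \ldots, \varphi_n\} \ne \emptyset$, i.e. there exists a nonzero $u$ such that
\[ u = \sum_{i=1}^n c_i \varphi_i, \qquad \int_\Omega u v_j\,dx = 0, \quad j = 1, \ldots, n - 1, \]
or
\[ \sum_{i=1}^n c_i \int_\Omega \varphi_i v_j\,dx = 0, \quad j = 1, \ldots, n - 1. \]
This system of $(n - 1)$ linear equations in the $n$ unknowns $c_1, \ldots, c_n$ with coefficients $\int_\Omega \varphi_i v_j\,dx$ must possess a nontrivial solution. From $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$, it follows that
\[ \mu(E_{n-1}) \le \frac{\displaystyle\int_\Omega |\nabla u|^2\,dx}{\displaystyle\int_\Omega |u|^2\,dx} = \frac{\sum_{i=1}^n \lambda_i |c_i|^2}{\sum_{i=1}^n |c_i|^2} \le \lambda_n. \]
On the other hand, if we choose the particular subspace $\tilde E_{n-1} = \mathrm{span}\{\varphi_1, \ldots, \varphi_{n-1}\}$, then
\[ \max_{E_{n-1}} \mu(E_{n-1}) \ge \mu(\tilde E_{n-1}) = \lambda_n. \]
This completes the proof.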
Exercises
1. Let $\Omega \subset \mathbb{R}^n$ be bounded. If $r \in C(\bar\Omega)$ is a positive continuous function, prove that
\[ \begin{cases} -\Delta u(x) = \lambda r(x) u(x), & x \in \Omega, \\ u(x) = 0, & x \in \partial\Omega, \end{cases} \]
has infinitely many eigenvalues and eigenfunctions. Furthermore, the eigenfunctions are mutually orthogonal with respect to the inner product
\[ (u, v) = \int_\Omega u(x) v(x) r(x)\,dx. \]
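2. We adopt the above notations. Assume $a_{ij} \in C^1(\bar\Omega)$, $c \in C(\bar\Omega)$. If there exists $\alpha > 0$ such that
\[ \sum_{i,j=1}^n a_{ij}(x) \xi_i \xi_j \ge \alpha |\xi|^2, \quad \forall\, \xi = (\xi_1, \ldots, \xi_n) \in \mathbb{R}^n, \]
prove that
\[ \begin{cases} -\displaystyle\sum_{i,j=1}^n \partial_i (a_{ij}(x) \partial_j u(x)) + c(x) u(x) = \lambda u(x), & x \in \Omega, \\ u(x) = 0, & x \in \partial\Omega, \end{cases} \]
has infinitely many eigenvalues and eigenfunctions.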

3. Let $H$ be a Hilbert space, $a : H \times H \to \mathbb{R}^1$ be a bounded bilinear functional, and $V \subset H$ be a linear subspace. Suppose there exists $c > 0$ such that
\[ a(v, v) \ge c\|v\|^2, \quad \forall\, v \in V, \]
prove that $(V, a(\cdot, \cdot))$ is a Hilbert space.
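4. In exercise 2 above, assume $f \in L^2(\Omega)$, show that
\[ \begin{cases} -\displaystyle\sum_{i,j=1}^n \partial_i (a_{ij}(x) \partial_j u(x)) + c(x) u(x) = f(x), & x \in \Omega, \\ u(x) = 0, & x \in \partial\Omega, \end{cases} \]
has a unique solution $u \in H_0^1(\Omega)$.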

Lecture 13
Existence and regularity
In the previous two lectures, we turned the existence of solutions of an E-L equation into a problem of finding extreme values of a functional. Under certain conditions, the minimum $u$ can be obtained by a minimizing sequence in some Sobolev space $W^{1,q}(\Omega)$, which satisfies the following equation only in the generalized sense:
\[ \int_\Omega [L_u(x, u(x), \nabla u(x)) \varphi(x) + L_p(x, u(x), \nabla u(x)) \nabla\varphi(x)]\,dx = 0, \quad \forall\, \varphi \in C_0^\infty(\Omega). \]
That is, the minimum is a generalized solution of the corresponding E-L equation.
However, for a functional containing first order derivatives, its E-L equation
is a second order differential equation. A generalized solution is a solution in
the ordinary sense only if it is twice differentiable. From a differential equation
perspective, we need also address whether such a generalized solution would have
enough differentiability to fulfill the E-L differential equation. In other words, is
it possible to deduce u ∈ C 2 from the generalized solution u? Or perhaps more
differentiability, or even analyticity? This is the so-called regularity problem.
We call the problem of finding a generalized solution an “existence” problem,
and call the problem of determining the differentiability of a generalized solution
a “regularity” problem.
That being said, in direct methods, the “existence” and “regularity” are twin
problems.
Among Hilbert’s 23 problems, problem 19 asks about the analyticity of solutions of some regular variational problems (in fact, it is about elliptic equations), whereas problem 20 is related to the solvability of a regular variational problem with boundary conditions (namely, the existence).
13.1 Regularity (n = 1)
In the following, we assume the Lagrangian $L \in C^2$ and ask: when does the minimum $u^*$ of the functional $I$ belong to $C^2(J)$?
Lemma 13.1 Suppose $u^* \in C^1(J)$ is a minimum of $I(u) = \int_J L(t, u(t), \dot u(t))\,dt$. If $\det(L_{p_i p_j}(t, u^*(t), \dot u^*(t))) \ne 0$, $\forall\, t \in J$, then $u^* \in C^2(J)$.

Proof Denote
\[ q(t) = \int_a^t L_u(s, u^*(s), \dot u^*(s))\,ds - c, \]
where $c$ is a constant. Using the integral form of the E-L equation, we have
\[ L_p(t, u^*(t), \dot u^*(t)) = q(t). \]
Define $\varphi : \bar J \times \mathbb{R}^N \to \mathbb{R}^N$ via
\[ \varphi(t, p) = L_p(t, u^*(t), p) - q(t). \]
We know that $\varphi \in C^1(\bar J \times \mathbb{R}^N, \mathbb{R}^N)$ satisfies
\[ \det(\varphi_p(t, \dot u^*(t))) = \det(L_{pp}(t, u^*(t), \dot u^*(t))) \ne 0 \]
and
\[ \varphi(t, \dot u^*(t)) = 0. \]
By the Implicit Function Theorem, the equation
\[ \varphi(t, p) = 0 \]
has a unique local $C^1$-solution, i.e. $\forall\, t_0 \in \bar J$, there exists a neighborhood $U = U(t_0)$ and a unique $\lambda \in C^1(U, \mathbb{R}^N)$ such that in a neighborhood of $(t_0, \dot u^*(t_0))$,
\[ \varphi(t, \lambda(t)) = 0. \]
Thus, $\dot u^*(t) = \lambda(t) \in C^1$, which implies $u^* \in C^2$.
The condition $\det(L_{p_i p_j}(t, u^*(t), \dot u^*(t))) \ne 0$ plays an important role; otherwise, there exists a functional whose minimum is of $C^1$ but not of $C^2$.
Example 13.1 Let $L(t, u, p) = u^2 (p - 2t)^2$ and $M = \{u \in C^1([-1, 1]) \;|\; u(-1) = 0, u(1) = 1\}$, then the functional
\[ I(u) = \int_{-1}^1 u^2 (\dot u(t) - 2t)^2\,dt \]
has a minimum
\[ u^*(t) = \begin{cases} 0, & t < 0, \\ t^2, & t \ge 0. \end{cases} \]
It is clear $u^* \in C^1 \setminus C^2([-1, 1])$. Moreover, $L_{pp}(t, u^*(t), \dot u^*(t)) = 2u^*(t)^2 = 0$, for $t < 0$.
The above lemma elevated the solution from being $C^1$ to being $C^2$, based on the Implicit Function Theorem. However, it is not enough to ensure the regularity of the solution. In Lecture 9, we pointed out that the space $C^1$ is not the suitable function space for direct methods. Using direct methods, we can only obtain a solution in some Sobolev space. In order to obtain its regularity, we have to further establish that the generalized solution is indeed a $C^1$ solution.
Theorem 13.1 For $1 < r < \infty$, assume $L$ satisfies the following growth condition:
\[ |L(t, u, p)| + |L_u(t, u, p)| + |L_p(t, u, p)| \le C(1 + |p|^r); \]
for $r = \infty$, no growth condition on $L$ is needed.
Furthermore, assume the matrix $(L_{p_i p_j}(t, u, p))$ is positive definite $\forall\, (t, u, p) \in \bar J \times \mathbb{R}^N \times \mathbb{R}^N$.
If $u^* \in W^{1,r}(J, \mathbb{R}^N)$ is a minimum of the functional $I(u) = \int_J L(t, u(t), \dot u(t))\,dt$, then by changing the values of $u^*$ on a set of measure zero, $u^* \in C^2$.
Proof By Lemma 13.1, it suffices to show that $u^* \in C^1$.
Define the function $\varphi : J \times \mathbb{R}^N \times \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}^N$ by
\[ \varphi(t, u, p, q) = L_p(t, u, p) - q. \]
By assumption, $\det(L_{p_i p_j}(t, u, p)) \ne 0$, $\forall\, (t, u, p) \in \bar J \times \mathbb{R}^N \times \mathbb{R}^N$. To solve the equation $\varphi(t, u, p, q) = 0$, we apply the Implicit Function Theorem, which provides a unique local $C^1$-solution
\[ p = \lambda(t, u, q). \tag{13.1} \]
On one hand, we show that this solution is globally unique. Suppose $p_1, p_2$ both satisfy (13.1), then $q = L_p(t, u, p_1) = L_p(t, u, p_2)$. Hence,
\[ 0 = (L_p(t, u, p_1) - L_p(t, u, p_2), p_1 - p_2) = (B(p_1 - p_2), p_1 - p_2), \]
where
\[ B = \int_0^1 L_{pp}(t, u, p_1 + \tau(p_2 - p_1))\,d\tau. \]
By assumption, $B$ is positive definite, therefore $p_1 = p_2$.
On the other hand, since $u^* \in W^{1,r}(J, \mathbb{R}^N)$, when $1 < r < \infty$, $L_u(t, u^*(t), \dot u^*(t)) \in L^1(J)$ and when $r = \infty$, $L_u(t, u^*(t), \dot u^*(t)) \in L^\infty(J)$. In any case,
\[ q(t) = \int_{t_0}^t L_u(s, u^*(s), \dot u^*(s))\,ds - c \quad \text{is absolutely continuous.} \]
By the integral form of the E-L equation,
\[ \int_{t_0}^t L_u(s, u^*(s), \dot u^*(s))\,ds - L_p(t, u^*(t), \dot u^*(t)) = c, \quad \text{a.e. } t \in J, \]
it follows that
\[ q(t) = L_p(t, u^*(t), \dot u^*(t)), \quad \text{a.e. } t \in J \]
and
\[ \dot u^*(t) = \lambda(t, u^*(t), q(t)), \quad \text{a.e. } t \in J. \]
Since $q(t)$ is absolutely continuous, substituting it into the above expression and changing the values of $u^*$ on a set of measure zero, we see that $\dot u^*(t)$ is continuous, i.e. $u^* \in C^1$. Thus, $u^* \in C^2$ follows readily from Lemma 13.1.
In the above proof, the global positive definiteness of the matrix
(Lpi pj (t, u, p)), ∀ (t, u, p) ∈ J¯ × RN × RN
played a crucial role in establishing both the global uniqueness of the solution as
well as its regularity. This is because the derivatives of functions in the Sobolev
space W 1,r may be discontinuous. Along the graphs of these functions, two
timewise nearby points may fall in different image neighborhoods. Since the Im-
plicit Function Theorem only works for neighborhoods in the image space, it is
no longer applicable.
The following example demonstrates that, once the global positive definiteness assumption is removed, a minimizer may fail to be differentiable.
Example 13.2 Let $L(p) = (p^2 - 1)^2$ and $M = \{u \in \mathrm{Lip}([0, 1]) \;|\; u(0) = u(1) = 0\}$, then the functional
\[ I(u) = \int_0^1 (\dot u^2(t) - 1)^2\,dt \]
has minimal value 0.
Note that $L_{pp}(t, u, p) = 4(3p^2 - 1)$ is not positive definite.
If u ∈ C 1 is a minimum, then u̇(t) = ±1. However, regardless of u̇(t) = 1 or
u̇(t) = −1, it is impossible to have a solution satisfying the boundary condition
u(0) = u(1) = 0. In other words, there cannot be a C 1 -solution to achieve
the minimal value. On the contrary, there are uncountably many solutions of
this variational problem, which are Lipschitz sawtooth-like functions satisfying
u̇(t) = ±1 (see Figure 13.1).
Lastly, we give an example to show that without the convexity of the La-
grangian, the weak sequential lower semi-continuity of the functional may not
hold.
Fig. 13.1
Example 13.3 (Bolza) Let $L(u, p) = u^2 + (p^2 - 1)^2$ and $M = W_0^{1,4}(0, 1)$, then
the functional
$$I(u) = \int_0^1 [u^2(t) + (\dot u(t)^2 - 1)^2]\,dt$$
has infimum zero, but I has no minimum in M.
To see this, note on one hand, I(u) ≥ 0; on the other hand, we define a
minimizing sequence of sawtooth-like functions:
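$$u_j(t) = \begin{cases} t - \dfrac{k}{j}, & t \in \left[\dfrac{k}{j}, \dfrac{2k+1}{2j}\right], \\[2mm] -t + \dfrac{k+1}{j}, & t \in \left[\dfrac{2k+1}{2j}, \dfrac{k+1}{j}\right], \end{cases}$$
where j = 2, 3, . . . and 0 ≤ k ≤ j − 1.
Since $|\dot u_j(t)| = 1$ a.e. and $|u_j| \le \frac{1}{2j}$, we see that
$$I(u_j) \le \frac{1}{4j^2} \to 0, \quad j \to \infty.$$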
Thus, inf u∈M I = 0.
However, I has no minimum in $W_0^{1,4}(0, 1)$. Suppose, on the contrary, that there
exists $u_0 \in W_0^{1,4}(0, 1)$ such that $I(u_0) = 0$. Then $u_0 = 0$ a.e., hence
$\dot u_0(t) = 0$ a.e. and $I(u_0) = 1$, a contradiction (see Figure 13.2).
Fig. 13.2
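The minimizing sequence of Example 13.3 can be checked numerically. The following is a minimal sketch, not part of the original notes: the names `sawtooth` and `bolza_energy` and the midpoint rule are illustrative choices. Since $|\dot u_j| = 1$ a.e., only the term $\int_0^1 u_j^2\,dt$ contributes, and a direct computation gives the exact value $1/(12j^2)$, which is consistent with the bound $1/(4j^2)$ used above.

```python
import math


def sawtooth(j, t):
    """Value u_j(t) of the sawtooth function with j teeth on [0, 1]."""
    s = (t * j) % 1.0           # position inside the current tooth, in [0, 1)
    return min(s, 1.0 - s) / j  # slope +1 on the first half, -1 on the second


def bolza_energy(j, cells_per_tooth=200):
    """Midpoint-rule value of I(u_j) = int_0^1 [u^2 + (u'^2 - 1)^2] dt.

    Because |u_j'| = 1 almost everywhere, the second term vanishes and only
    the integral of u_j^2 has to be approximated.
    """
    n = j * cells_per_tooth
    h = 1.0 / n
    return sum(sawtooth(j, (i + 0.5) * h) ** 2 for i in range(n)) * h


if __name__ == "__main__":
    for j in (2, 5, 10, 50):
        print(j, bolza_energy(j), 1.0 / (12.0 * j * j), 1.0 / (4.0 * j * j))
```

The printed values decrease like $1/j^2$, while the only candidate limit $u = 0$ has $I(0) = 1$; this is the failure of weak sequential lower semi-continuity described in the text.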
13.2 More on regularity (n > 1)
The regularity problems of partial differential equations are far more complicated
than those of ordinary differential equations, and they demand special knowledge.
In order to give the interested readers a taste of this subject, we provide the
following example.
Theorem 13.2 (Weyl) Suppose $u \in L^1_{loc}(\Omega)$ satisfies
$$\int_\Omega u \cdot \Delta\varphi\,dx = 0, \quad \forall\, \varphi \in C_0^\infty(\Omega), \tag{13.2}$$
then after changing its value on a set of measure zero, $u \in C^\infty(\Omega)$.
We call those locally L1 functions which satisfy Eq. (13.2) weakly harmonic
functions. The idea of the proof originates from the mean value property of har-
monic functions.
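If $u \in C^2(\Omega)$ and $\Delta u = 0$ in $\Omega$, then
$$u(x) = \frac{1}{r^{n-1}\omega_n}\int_{\partial B_r(x)} u(y)\,d\sigma = \frac{n}{r^n\omega_n}\int_{B_r(x)} u(y)\,dy, \quad \forall\, B_r(x) \subset \Omega, \tag{13.3}$$
where $\omega_n$ denotes the surface area of the unit sphere $S \subset \mathbb{R}^n$. We call (13.3) the
mean value property of u.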
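To verify (13.3), we assume $B_r(x) \subset \Omega$ and choose $\rho \in (0, r)$, then
$$0 = \int_{B_\rho(x)} \Delta u(y)\,dy = \int_{\partial B_\rho(x)} \frac{\partial u}{\partial n}\,d\sigma = \rho^{n-1}\frac{\partial}{\partial\rho}\int_{|w|=1} u(x + \rho w)\,dw.$$
This implies
$$\int_{|w|=1} u(x + \rho w)\,dw = \text{const.} = \omega_n u(x),$$
hence,
$$u(x) = \frac{1}{r^{n-1}\omega_n}\int_{\partial B_r(x)} u(y)\,d\sigma = \frac{n}{r^n\omega_n}\int_{B_r(x)} u(y)\,dy.$$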
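The mean value property (13.3) can be illustrated in the plane, where the surface area of the unit circle is $2\pi$ and the first integral in (13.3) is simply the average over the circle. The sketch below is only an illustration under assumptions not in the notes: the test functions `harmonic` and `not_harmonic` and the equally spaced sampling in `circle_mean` are hypothetical choices.

```python
import math


def circle_mean(u, x0, y0, r, samples=720):
    """Average of u over the circle of radius r centred at (x0, y0)."""
    total = 0.0
    for k in range(samples):
        angle = 2.0 * math.pi * k / samples
        total += u(x0 + r * math.cos(angle), y0 + r * math.sin(angle))
    return total / samples


def harmonic(x, y):
    """A harmonic polynomial: its Laplacian is 2 - 2 + 0 = 0."""
    return x * x - y * y + 3.0 * x * y


def not_harmonic(x, y):
    """A non-harmonic function: its Laplacian is 4."""
    return x * x + y * y


if __name__ == "__main__":
    x0, y0, r = 0.3, -0.2, 0.5
    print(circle_mean(harmonic, x0, y0, r), harmonic(x0, y0))
    print(circle_mean(not_harmonic, x0, y0, r), not_harmonic(x0, y0))
```

For the harmonic polynomial the circle average agrees with the value at the centre, whereas for $x^2 + y^2$ the average exceeds the central value by $r^2$.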
We now extend this result to prove the smoothness of a solution.

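Recall the properties of mollifiers introduced in Lecture 10: $\forall\, \delta > 0$, let
$\Omega_\delta = \{x \in \Omega \mid d(x, \partial\Omega) > \delta\}$ and $\varphi_\delta(x) = \delta^{-n}\varphi(\frac{x}{\delta})$, where $\varphi \in C_0^\infty(B_1(\theta))$
satisfies $\int_{B_1(\theta)} \varphi(x)\,dx = 1$ and, for some function $\rho$ of one variable, $\varphi(rw) = \rho(r)$
for all $|w| = 1$, then
$$u_\delta(x) = \int_\Omega u(y)\varphi_\delta(x - y)\,dy \in C^\infty(\Omega_\delta).$$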
Lemma 13.2 If u ∈ C(Ω) has the mean value property, then u ∈ C ∞ (Ω) and
u(x) = uδ (x), ∀ x ∈ Ωδ .
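Proof $\forall\, x \in \Omega_\delta$, by the mean value property of u, we have:
$$\begin{aligned} u_\delta(x) &= \delta^{-n}\int_{|y| \le \delta} u(x + y)\varphi\left(\frac{y}{\delta}\right)dy \\ &= \int_{|y| \le 1} u(x + \delta y)\varphi(y)\,dy \\ &= \int_0^1 r^{n-1}\rho(r)\int_{|w|=1} u(x + \delta r w)\,dw\,dr = u(x). \end{aligned}$$
Since $u_\delta \in C^\infty(\Omega_\delta)$ and $\delta > 0$ is arbitrary, $u \in C^\infty(\Omega)$.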

Lemma 13.3 If $u \in L^1(\Omega)$ is a weakly harmonic function, i.e. it satisfies (13.2),
then $u \in C^\infty(\Omega)$ and it has the mean value property.
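Proof (1) We first prove $\Delta u_\delta(x) = 0$, $\forall\, x \in \Omega_{\delta_0}$, $0 < \delta < \delta_0$.
$\forall\, \psi \in C_0^\infty(\Omega_{\delta_0})$, since supp $\psi_\delta \subset \Omega$, we have
$$\begin{aligned} \int_{\Omega_{\delta_0}} u_\delta(x)\Delta\psi(x)\,dx &= \int_{\mathbb{R}^n} u_\delta(x)\Delta\psi(x)\,dx \\ &= \int_{\mathbb{R}^n}\int_{\mathbb{R}^n} u(y)\varphi_\delta(x - y)\Delta\psi(x)\,dy\,dx \\ &= \int_\Omega u(y)(\Delta\psi)_\delta(y)\,dy \\ &= \int_\Omega u\,\Delta(\psi_\delta)\,dy = 0. \end{aligned}$$
Thus, $\Delta u_\delta(x) = 0$ in $\Omega_{\delta_0}$.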
(2) If we can show that {uδ } is uniformly bounded and equicontinuous on
Ω̄2δ0 , then by the Arzelà–Ascoli theorem, there must be a subsequence which
converges uniformly to a continuous function v. Since uδ → u in L1 (Ωδ0 ),
u(x) = v(x) a.e. in Ω. Furthermore, since uδ is harmonic, it has the mean
value property. Upon taking limits, v also possesses the mean value property. By
Lemma 13.2, it is immediate v ∈ C ∞ (Ω).
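(3) We now return to prove that $\{u_\delta\}$ is uniformly bounded and equicontinuous
on $\bar\Omega_{2\delta_0}$. By the mean value property of $u_\delta$,
$$|u_\delta(x)| \le \frac{n}{\delta_0^n\omega_n}\int_{B_{\delta_0}(x)} |u_\delta(y)|\,dy \le C_{\delta_0}\|u_\delta\|_1 \le C_{\delta_0}\|u\|_1$$
and
$$|u_\delta(x) - u_\delta(y)| \le \left|\int_\Omega\int_{B_{\delta_0}(\theta)} u(\xi)\,[\varphi(x + \delta(z - \xi)) - \varphi(y + \delta(z - \xi))]\,d\xi\,dz\right| \le C_{\delta_0}|x - y|\,\|u\|_1.$$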
Remark 13.1 Theorem 13.2 asserts that a generalized solution is smooth in Ω.
However, examining the smoothness on Ω̄ is considerably more difficult and is
beyond the scope of this book. The study of regularity problems
of partial differential equations is a specialized area of its own. There are ample
textbooks, monographs, and references on this subject; we omit any further
discussion and refer the interested readers to the introductory book by D. Gilbarg and
N. Trudinger [GT].
For linear strongly elliptic equations, assuming the domain and coefficients are
sufficiently smooth, solutions of regular boundary value problems have enough
differentiability. The most notable work is due to J. Schauder, S. Agmon, A.
Douglis, and L. Nirenberg. For quasi-linear elliptic equations, the most notable
work is due to J. Nash, E. De Giorgi, O. A. Ladyzhenskaya, and N. N. Ural'tseva.
For a system of elliptic equations, generally speaking, its solutions may not
possess regularity, cf. M. Giaquinta [Gi].
13.3 The solutions of some variational problems
In this section, we examine the existence and regularity of solutions of varia-

tional problems via the following examples.
Example 13.4 (The two-point boundary value problem) Denote the interval
J = (t0 , t1 ) and the 2-torus T 2 = R2 /Z2 . Given a0 , a1 ∈ T 2 and F ∈ C 2 (J ×
T 2 ), find a solution u ∈ C 2 (J, T 2 ) satisfying the boundary conditions u(ti ) = ai ,
i = 0, 1, and the equation:
ü(t) = ∇u F (t, u(t)). (13.4)
Solution In order to turn this into a standard variational problem, we first extend
the function F to the entire R2 , namely,
F (t, u1 + 1, u2 ) = F (t, u1 , u2 + 1) = F (t, u1 , u2 ), ∀ t ∈ J.
We still denote it by F. Define the functional and the corresponding space by
$$I(u) = \int_J \left[\frac{1}{2}|\dot u(t)|^2 + F(t, u(t))\right]dt,$$
$$M = u_0 + H_0^1(J, \mathbb{R}^2), \quad u_0(t) = \frac{a_0(t_1 - t) + a_1(t - t_0)}{t_1 - t_0}.$$
Its E-L equation is exactly (13.4).
Denote $u = u_0 + v$, and we use $\left(\int_J |\dot v|^2\right)^{1/2}$ as the norm on $H_0^1(J, \mathbb{R}^2)$.
Since F is bounded, I is coercive. By the compact embedding $H^1(J, \mathbb{R}^2) \hookrightarrow C(J, \mathbb{R}^2)$
and the continuity of F, I is also weakly sequentially lower semi-continuous.
Furthermore, since M is weakly closed, by the existence theorem,
I attains its minimum u on M. Since $(L_{p_i p_j}) = \mathrm{Id}$ is positive definite,
Theorem 13.1 on regularity gives $u \in C^2$, and clearly u satisfies (13.4).
Example 13.5 (Periodic solutions of forced oscillations) Assume $e \in C[0, T]$
has period T and mean value 0: $\int_0^T e(t)\,dt = 0$, and a is a constant. Find the
periodic solutions of period T > 0 of the following equation:
$$\ddot u(t) + a\sin u(t) = e(t).$$
Solution We define $H^1_{per}(0, T)$ to be the Sobolev space of all $H^1(0, T)$ functions
with period T > 0, i.e. the closure of T-periodic $C^\infty$ functions in $H^1(0, T)$.
Define the functional
$$I(u) = \int_0^T \left[\frac{1}{2}|u'(t)|^2 + a\cos u(t) - E(t)u'(t)\right]dt,$$
where
$$E(t) = \int_0^t e(s)\,ds.$$
E is also a function of period T > 0. Since
$$\int_0^T E(t)u'(t)\,dt = -\int_0^T e(t)u(t)\,dt,$$
the E-L equation of I is precisely
$$\ddot u(t) + a\sin u(t) = e(t).$$
To verify I attains its minimum in $H^1_{per}(0, T)$, similar to Poincaré's inequality, we
need the following lemma.
Lemma 13.4 (Wirtinger's inequality) Suppose $u \in H^1_{per}(0, T)$ and
$\bar u = \frac{1}{T}\int_0^T u(t)\,dt = 0$, then
$$\int_0^T |u'(t)|^2\,dt \ge \frac{4\pi^2}{T^2}\int_0^T |u(t)|^2\,dt.$$
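Proof On [0, T], we expand the periodic function u as its Fourier series. Since
$\bar u = 0$,
$$u(t) = \sum_{k=1}^{\infty}\left(a_k\cos\frac{2\pi kt}{T} + b_k\sin\frac{2\pi kt}{T}\right),$$
hence,
$$u'(t) = \frac{2\pi}{T}\sum_{k=1}^{\infty}\left(-ka_k\sin\frac{2\pi kt}{T} + kb_k\cos\frac{2\pi kt}{T}\right).$$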
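By Parseval's identity,
$$\begin{aligned} \int_0^T |u'(t)|^2\,dt &= \frac{(2\pi)^2}{2T}\sum_{k=1}^{\infty} k^2(|a_k|^2 + |b_k|^2) \\ &\ge \frac{(2\pi)^2}{2T}\sum_{k=1}^{\infty}(|a_k|^2 + |b_k|^2) \\ &= \frac{(2\pi)^2}{T^2}\int_0^T |u(t)|^2\,dt. \end{aligned}$$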
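Wirtinger's inequality can be tested on mean-zero trigonometric polynomials. The sketch below is illustrative only; the function name `wirtinger_sides` and the coefficient lists are hypothetical. It evaluates both sides with the rectangle rule, which integrates trigonometric polynomials of low degree over a full period without discretization error.

```python
import math


def wirtinger_sides(a, b, T, samples=4096):
    """Return (int_0^T |u'|^2 dt, (4 pi^2 / T^2) int_0^T |u|^2 dt) for
    u(t) = sum_k a[k-1] cos(2 pi k t / T) + b[k-1] sin(2 pi k t / T)."""
    w = 2.0 * math.pi / T
    h = T / samples
    lhs = rhs = 0.0
    for i in range(samples):
        t = i * h
        u = du = 0.0
        for k, (ak, bk) in enumerate(zip(a, b), start=1):
            c, s = math.cos(k * w * t), math.sin(k * w * t)
            u += ak * c + bk * s
            du += k * w * (-ak * s + bk * c)
        lhs += du * du * h
        rhs += u * u * h
    return lhs, w * w * rhs


if __name__ == "__main__":
    print(wirtinger_sides([1.0, 0.5, -0.25], [0.0, 2.0, 1.0], 3.0))
    print(wirtinger_sides([1.0], [0.0], 3.0))  # lowest mode only
```

In the second call only the mode k = 1 is present and the two sides coincide, which shows that the constant $4\pi^2/T^2$ cannot be improved.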
$\forall\, u \in H^1_{per}(0, T)$, we decompose
$$u = \tilde u + \bar u,$$
where $\bar u = \frac{1}{T}\int_0^T u(t)\,dt$ is a real number. From Wirtinger's inequality, we see
that $\bar u$ is not controlled by the values of I. In other words, if we use the $H^1$-norm
directly on $H^1_{per}(0, T)$, then I is not coercive.
Moreover, since the nonlinear term cos u in I is 2π-periodic, we have
I(u + 2π) = I(u).
Consequently, we need not consider I on the entire $H^1_{per}(0, T)$; instead, we set
$$M = \{u = \xi + \eta \mid \xi \in H^1_{per}(0, T),\ \bar\xi = 0,\ \eta \in [0, 2\pi]\}$$
and restrict I to M. The advantage is that $\bar u$ now only varies on the bounded
interval [0, 2π].
Noting that M is weakly sequentially closed and
$$I(u) \ge \frac{1}{2}\|\dot\xi\|_2^2 - \sqrt{T}\,\|E\|_\infty\|\dot\xi\|_2 - |a|T,$$
and that, by Wirtinger's inequality, $\|\dot\xi\|_2$ is an equivalent norm on the mean-zero
part of $H^1_{per}$, we see that I is coercive on M.
It is not difficult to verify that I is also weakly lower semi-continuous. According
to the existence theorem, there exists a minimum $u \in M \subset H^1_{per}$. Moreover,
since all conditions in Theorem 13.1 on regularity are met, it follows that $u \in C^2$.

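Example 13.6 Let $\Omega \subset \mathbb{R}^n$ be a bounded region, $1 < r < p < \infty$, and
$f \in L^{\frac{p}{p-r}}(\Omega)$, then
$$I(u) = \int_\Omega \left[\frac{1}{p}|\nabla u(x)|^p - \frac{1}{r}f(x)\,|u(x)|^{r-1}u(x)\right]dx, \quad \forall\, u \in W_0^{1,p}(\Omega)$$
has a minimum $u_0 \in W_0^{1,p}(\Omega)$, and it satisfies
$$\int_\Omega [|\nabla u(x)|^{p-2}\nabla u(x)\cdot\nabla\varphi(x) - |u|^{r-1}f(x)\varphi(x)]\,dx = 0, \quad \forall\, \varphi \in C_0^\infty(\Omega).$$
This is a generalized solution of the equation
$$-\Delta_p u = f|u|^{r-1},$$
where
$$\Delta_p u = \sum_{i=1}^{n} \partial_i(|\nabla u|^{p-2}\partial_i u)$$
is called the p-Laplace operator.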

Proof Denote $\|u\| = \left(\int_\Omega |\nabla u(x)|^p\,dx\right)^{1/p}$; we know that it is an equivalent norm
on $W_0^{1,p}(\Omega)$.
(1) We claim I is coercive.
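By Poincaré's inequality, Hölder's inequality, and Young's inequality, for every
$\varepsilon > 0$, $\exists\, C, C_\varepsilon > 0$ such that
$$\int_\Omega |f||u|^r\,dx \le \|f\|_{\frac{p}{p-r}}\|u\|_p^r \le C\|f\|_{\frac{p}{p-r}}\|u\|^r \le C_\varepsilon\|f\|_{\frac{p}{p-r}}^{\frac{p}{p-r}} + \varepsilon\|u\|^p.$$
Thus, choosing $\varepsilon < r/p$,
$$I(u) \ge \left(\frac{1}{p} - \frac{\varepsilon}{r}\right)\|u\|^p - \frac{C_\varepsilon}{r}\|f\|_{\frac{p}{p-r}}^{\frac{p}{p-r}} \to +\infty, \quad \text{as } \|u\| \to \infty.$$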
(2) Since $\int_\Omega |\nabla u(x)|^p\,dx$ is convex and lower semi-continuous, it is weakly
sequentially lower semi-continuous. Using the Rellich–Kondrachov compact
embedding theorem, the latter term is also weakly sequentially lower
semi-continuous. Thus, I is weakly sequentially lower semi-continuous.
Remark 13.2 Generally speaking, it is not possible to obtain $u \in C^2$. However,
for p = 2, r = 1, if $f \in C^\gamma(\bar\Omega)$ (0 < γ < 1) is a Hölder continuous function and if ∂Ω
is sufficiently smooth, then using Schauder estimates, one can prove
$u \in C^{2,\gamma}(\bar\Omega)$. When p = 2, r = 1, for $f \in C(\bar\Omega)$, this is the Poisson equation.
Example 13.7 (Harmonic mappings) Let Ω ⊂ Rn be a bounded open set with
smooth boundary ∂Ω. Denote the unit sphere in Rn+1 by S n . Consider the map-
ping u = (u1 , . . . , un+1 ) : Ω −→ S n ⊂ Rn+1 . Let ϕ = (ϕ1 , . . . , ϕn+1 ) :
∂Ω −→ S n be a C 1 mapping defined on the boundary. We define the set
M = {u ∈ H 1 (Ω, Rn+1 ) | u(x) ∈ S n , a.e. x ∈ Ω, u|∂Ω = ϕ},
and we want to find
inf{E(u) | u ∈ M },
where
$$E(u) = \frac{1}{2}\int_\Omega |\nabla u|^2\,dx.$$
Since E is convex and lower semi-continuous, it is weakly sequentially lower
semi-continuous. Furthermore, E is also coercive.
We now verify M is weakly sequentially closed. If
$$u_j \rightharpoonup u \quad \text{in } H^1(\Omega, \mathbb{R}^{n+1}),$$
then there exists a subsequence $u_{j'} \to u$ in $L^2(\Omega, \mathbb{R}^{n+1})$ and a further
subsequence $u_{j'}(x) \to u(x)$ a.e.
From $u_j(x) \in S^n$ a.e., it follows that $u(x) \in S^n$ a.e., whence $u \in M$.
Consequently, there exists a minimum $u_0 \in M$ satisfying the E-L equation:
$$\int_\Omega \nabla u_0(x)\cdot\nabla v(x)\,dx = 0$$
for all $v \in H_0^1(\Omega, \mathbb{R}^{n+1})$ satisfying $v(x) \in T_{u_0(x)}S^n$ a.e. in Ω, where $T_u S^n$ is the
tangent space to $S^n$ at $u \in S^n$.
If $u_0 \in C^2(\Omega, \mathbb{R}^{n+1})$, then
$$(\Delta u_0)^T(x) = 0, \quad \text{a.e. in } \Omega,$$
where $(\Delta u_0)^T(x)$ is the tangential projection of $\Delta u_0(x)$ at $u_0(x)$.
The normal projection $(\Delta u_0)^N(x)$ of $\Delta u_0(x)$ at $u_0(x)$ is given by
$$(\Delta u_0)^N(x) = (\Delta u_0(x)\cdot u_0(x))\,u_0(x).$$
Differentiating the equation $|u(x)|^2 = 1$ twice yields
$$\Delta u(x)\cdot u(x) = -|\nabla u(x)|^2,$$
thus, $u_0$ satisfies the equation
$$-\Delta u(x) = u(x)|\nabla u(x)|^2.$$
This is the so-called harmonic mapping equation (cf. Example 7.5 in Lecture 7).

Remark 13.3 When the dimension m of the domain is 2, Morrey proved that $u_0$ is
smooth. However, for m > 2, F. H. Lin proved that $u_0 = x/|x|$ is a minimum; in this
case, although $u_0$ is a minimum, it still has a singularity.
Example 13.8 (Nonlinear eigenvalue problems) In Lecture 12, we introduced
the variational approach to solving eigenvalue problems of linear differential equa-
tions. Similarly, for eigenvalue problems in nonlinear differential equations, vari-
ational methods are also successful. Consider the following example: given a
bounded domain $\Omega \subset \mathbb{R}^n$ and a Carathéodory function $f : \Omega \times \mathbb{R}^1 \to \mathbb{R}^1$
satisfying the growth condition:
$$|f(x, u)| \le C(1 + |u|^r), \quad 1 \le r < 2^* - 1,$$
where $2^* = \frac{2n}{n-2}$ (n > 2). Assume
$$f(x, 0) = 0,$$
find $u \in H_0^1(\Omega)\setminus\{0\}$ such that
$$-\Delta u(x) = \lambda f(x, u(x)),$$
where λ is a parameter. Furthermore, λ is called an eigenvalue if it corresponds to
a nonzero solution $u \in H_0^1(\Omega)$.
Solution Similar to linear equations, we regard this as a constrained variational
problem. On $H_0^1(\Omega)$, we consider the functionals
$$I(u) = \frac{1}{2}\int_\Omega |\nabla u(x)|^2\,dx,$$
$$N(u) = \int_\Omega F(x, u(x))\,dx,$$
where
$$F(x, t) = \int_0^t f(x, s)\,ds$$
is an anti-derivative of $f(x, \cdot)$. Since the growth of f is restricted, N(u) is
well-defined on $H_0^1(\Omega)$.
Likewise, we want to find the minimum of I on M = N −1 (1). Since I
is coercive and weakly lower semi-continuous, and by the Rellich–Kondrachov
compact embedding theorem, M is weakly closed, it follows that I attains its
minimum u0 . Moreover, using the Lagrange multipliers, there exists a real number
λ0 such that
−∆u0 (x) = λ0 f (x, u0 (x)).
When f(x, u) = u, this is precisely the eigenvalue problem in Lecture 12. However,
when $f(x, u) = u^r$ and $1 \le r < 2^* - 1$ with $2^* = \frac{2n}{n-2}$ (n > 2), it
generalizes the linear eigenvalue problem.
Remark 13.4 Recall in Lecture 12, we obtained an increasing sequence of eigen-
values for a linear differential equation. It is natural to ask: for what kind of
nonlinear differential equations would we have similar results. It is not a simple
task to answer this since for nonlinear problems, we no longer have orthogonal-
ity between two eigenvectors with distinct eigenvalues. However, if f is odd in
u, that is, f (x, −u) = −f (x, u); similar results are captured by the Liusternik–
Schnirelmann theory, which requires more in-depth knowledge of topology.
The following gives an example of a functional which is neither bounded
above nor below, so the common variational method does not seem applicable.
However, by a certain technique, we can turn it into a minimal value problem.
Example 13.9 (The polarization technique) Find u ∈ X = H01 (Ω) satisfying

the equation:
−∆u = |u|p−2 u,
where 2 < p < 2∗ and 2∗ is as defined above.
Notice that this is the E-L equation of the functional
$$I(u) = \int_\Omega \left[\frac{1}{2}|\nabla u|^2 - \frac{1}{p}|u|^p\right]dx.$$
However, I is neither bounded above nor below.
We use polar decomposition on the variable u as follows. Let S be the unit
sphere in X, ∀ u ∈ X\{θ}, there exists a unique pair (t, v) ∈ R1+ × S such that
u = tv.
Fix any $v \in S$ and consider the single variable function on the ray
$$t \mapsto I(tv) = \frac{t^2}{2} - \frac{t^p}{p}|v|_p^p,$$
where $|v|_p^p = \int_\Omega |v(x)|^p\,dx$. It attains its maximum at
$$t = t(v) = \frac{1}{|v|_p^{\frac{p}{p-2}}},$$
which also satisfies
$$\frac{d}{dt}I(tv)\Big|_{t=t(v)} = 0.$$
Substituting into the original functional I, we obtain a functional $\tilde I$ on the unit
sphere S by
$$\tilde I(v) = \left(\frac{1}{2} - \frac{1}{p}\right)\frac{1}{|v|_p^{\frac{2p}{p-2}}}.$$
Using the embedding theorem, there exists a constant C > 0 such that
$$|v|_p^p \le C^p\|v\|^p = C^p.$$
Thus,
$$\frac{1}{|v|_p^p} \ge \frac{1}{C^p}.$$
This means I˜ is indeed a continuous functional on S. Using compact embedding,
it is also weakly sequentially continuous. Note that S is itself weakly sequentially
compact, hence I˜ must attain its minimum at some v0 .
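The computation on a single ray can be verified numerically. In the sketch below, which is only an illustration, the scalar `A` stands for $|v|_p^p$ of a fixed unit vector v; the function names and the sample values of `A` and `p` are hypothetical.

```python
def fiber_energy(t, A, p):
    """I(t v) = t^2/2 - (t^p / p) * A for a unit vector v with A = |v|_p^p."""
    return 0.5 * t * t - (t ** p / p) * A


def t_star(A, p):
    """Critical point of t -> I(t v): t = A^(-1/(p-2)) = 1 / |v|_p^(p/(p-2))."""
    return A ** (-1.0 / (p - 2.0))


def reduced_energy(A, p):
    """Closed form (1/2 - 1/p) / |v|_p^(2p/(p-2)) = (1/2 - 1/p) * A^(-2/(p-2))."""
    return (0.5 - 1.0 / p) * A ** (-2.0 / (p - 2.0))


if __name__ == "__main__":
    A, p = 0.37, 4.0
    t = t_star(A, p)
    print(t, fiber_energy(t, A, p), reduced_energy(A, p))
    print([fiber_energy(s * t, A, p) for s in (0.5, 0.9, 1.1, 2.0)])
```

The value of `fiber_energy` at `t_star` agrees with `reduced_energy`, and moving along the ray in either direction lowers the energy, in line with the claim that t(v) is the maximum on the ray.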
It remains to show $u_0 = t(v_0)v_0$ is a solution of the original equation. To
justify this, on one hand, we have
$$0 = (\tilde I'(v_0), w) = (I'(t(v_0)v_0), w) - (I'(t(v_0)v_0), v_0)(v_0, w), \quad \forall\, w \perp v_0,$$
where (·, ·) is the inner product on X. On the other hand,
$$\frac{d}{dt}I(tv)\Big|_{t=t(v)} = (I'(t(v)v), v).$$
Thus,
$$(I'(t(v_0)v_0), w + tv_0) = 0, \quad \forall\, w \perp v_0,\ \forall\, t \in \mathbb{R}^1,$$
i.e.
$$(I'(u_0), \varphi) = 0, \quad \forall\, \varphi \in X.$$
This affirms that $u_0$ is a solution of the E-L equation of I.

Remark 13.5 In some references, the polarization technique is also termed the
fibering method.
13.4 The limitations of calculus of variations
At the end of this lecture, we point out, in particular, that the solutions of
differential equations are not always obtainable by direct methods. Hadamard gave the
following counterexample: let
$$g(\vartheta) = \sum_{m=1}^{\infty} \frac{\sin m!\vartheta}{m^2},$$
then
$$u(r, \vartheta) = \sum_{m=1}^{\infty} \frac{r^{m!}\sin m!\vartheta}{m^2}, \quad ((r, \vartheta) \in [0, 1] \times [0, 2\pi])$$
converges uniformly to a continuous function u on the closed unit disk.
The function u is smooth and harmonic in the interior of the unit disk and
takes on the boundary value g. However, the integral of the square of the
gradient of u is infinite! In other words, as a harmonic function, u satisfies the
E-L equation ∆u = 0 of the Dirichlet integral in some sense, but the
corresponding functional (the Dirichlet integral) itself takes on the value of infinity!
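The divergence in Hadamard's example can be made quantitative. Each mode $r^k\sin k\vartheta$ has Dirichlet integral $\pi k$ over the unit disk, and distinct modes are orthogonal, so the Dirichlet integral of the M-th partial sum of u equals $\pi\sum_{m=1}^{M} m!/m^4$. The sketch below (function names are illustrative, not from the notes) tabulates these partial sums alongside the bounded boundary data.

```python
import math
from fractions import Fraction


def dirichlet_partial_sum(M):
    """pi * sum_{m=1}^{M} m! / m^4: Dirichlet integral of the M-th partial sum of u."""
    s = sum(Fraction(math.factorial(m), m ** 4) for m in range(1, M + 1))
    return math.pi * float(s)


def boundary_partial_sum(M, theta):
    """M-th partial sum of g(theta) = sum_m sin(m! theta) / m^2."""
    return sum(math.sin(math.factorial(m) * theta) / m ** 2 for m in range(1, M + 1))


if __name__ == "__main__":
    for M in (5, 10, 15):
        print(M, dirichlet_partial_sum(M), boundary_partial_sum(M, 1.0))
```

The boundary values stay below $\sum 1/m^2 = \pi^2/6$ in absolute value, while the energies grow factorially.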
Exercises
1. Let $\Omega \subset \mathbb{R}^n$ be a bounded region with smooth boundary and $f \in C(\bar\Omega)$. On
$W_0^{1,4}(\Omega)$, consider the functional
$$I(u) = \int_\Omega \left[\frac{1}{4}|\nabla u|^4 - x^2|u|^2 - f(x)u\right]dx.$$
Prove that I is weakly lower semi-continuous and coercive, hence it has a
minimal solution.
2. Determine whether each of the following functionals is weakly lower
semi-continuous, or coercive, and whether the minimal solution exists. Explain why.
(1)
$$\int_{-1}^{1} t^2\dot u^2\,dt, \quad M = \{u \in H^1(-1, 1) \mid u(\pm 1) = \pm 1\}.$$
(2)
$$\int_0^1 (\dot u^2 - 1)^2\,dt, \quad M = W_0^{1,4}(0, 1).$$
Lecture 14
The dual least action principle and the Ekeland variational principle
The main focus of this lecture includes: the dual least action principle and the
Ekeland variational principle.
The dual least action principle is mainly applied to Hamiltonian systems and
related problems. In general, the functional associated with a Hamiltonian system
is neither bounded above nor below, so variational methods are difficult to apply.
However, if the Hamiltonian in a Hamiltonian system is a convex function, then
by means of convex conjugates, the problem can be transformed into a constrained
variational problem. This is the essence of the dual least action principle.
The Ekeland variational principle is a general minimization result with a broad
variety of applications. It provides a specific method in choosing a minimizing se-
quence; consequently, this minimizing sequence along with some other conditions
give rise to numerous applications.
14.1 The conjugate function of a convex function
In Lecture 3, we introduced the Legendre transform $f^*$ of a function f(x) on
$\mathbb{R}^n$ via
$$f^*(\xi) = \xi\cdot\psi(\xi) - f(\psi(\xi)), \tag{14.1}$$
where x = ψ(ξ) is the inverse function of ξ = ∇f(x).
The importance of the Legendre transformation is that it reveals the inverse
relation of the gradient ∇f of f and the gradient ∇f ∗ of f ∗ , namely,
ξ = ∇f (x) and x = ∇f ∗ (ξ)
are inverses of each other. Thus, we can recover f from f ∗ via the relation
$$f(x) + f^*(\xi) = \langle\xi, x\rangle.$$
However, it is worth noting that the applicability of the Legendre transformation

is rather restrictive, since it requires the existence of (∇f (x))−1 everywhere.
Consider a convex function f with domain $D(f) = \{x \in \mathbb{R}^n \mid f(x) < +\infty\}$.
The replacement for the gradient mapping is the sub-differential
operator $x \mapsto \partial f(x)$, which is a set-valued mapping:
$$\begin{aligned} \xi \in \partial f(x) &\iff f(y) \ge f(x) + \langle\xi, y - x\rangle, \quad \forall\, y \in D(f) \\ &\iff f(x) - \langle\xi, x\rangle = \min_{y \in D(f)}\{f(y) - \langle\xi, y\rangle\}. \end{aligned}$$
Its inverse is also a set-valued mapping. If we still denote the inverse (set-valued)
mapping of ∂f(x) by ψ(ξ), then
$$\psi(\xi) = \{x \in D(f) \mid \xi \in \partial f(x)\},$$
i.e.
x ∈ ψ(ξ) ⟺ x is a point at which $f(y) - \langle\xi, y\rangle$ achieves its minimal value.
Accordingly, we can extend the Legendre transformation to convex functions.
Comparing to (14.1), we introduce the following.
Definition 14.1 (Conjugate function) Let $f : \mathbb{R}^n \to \bar{\mathbb{R}} = \mathbb{R}^1 \cup \{+\infty\}$ be a
proper function, i.e. $D(f) \ne \emptyset$. We call
$$f^*(\xi) = \sup_{x \in \mathbb{R}^n}\{\langle\xi, x\rangle - f(x)\}$$
the conjugate function of f. Sometimes, it is also called the Fenchel transform
of f.
A conjugate function has the following properties.
(1) f ∗ is lower semi-continuous and convex.
This is because f ∗ is the supremum of a family of affine functions, and since
affine functions are convex and lower semi-continuous, the supremum of convex
lower semi-continuous functions is itself convex and lower semi-continuous.
(2) If f is a proper, lower semi-continuous, and convex function, then f ∗ is
proper.
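Proof Since f is proper, $\exists\, x_0 \in \mathbb{R}^n$ such that $f(x_0) < +\infty$. Since f is convex and
lower semi-continuous, the epigraph $\mathrm{epi}(f) = \{(x, t) \in \mathbb{R}^n \times \mathbb{R}^1 \mid f(x) \le t\}$ is a
closed convex set with non-empty complement. Choose $t_0 < f(x_0)$, then
$(x_0, t_0) \notin \mathrm{epi}(f)$. By Ascoli's
separation theorem, $\exists\, (x_0^*, \lambda) \in \mathbb{R}^n \times \mathbb{R}^1$ and $\exists\, \alpha \in \mathbb{R}^1$ such that
$$\langle x_0^*, x\rangle + \lambda t > \alpha > \langle x_0^*, x_0\rangle + \lambda t_0, \quad \forall\, (x, t) \in \mathrm{epi}(f). \tag{14.2}$$
In particular,
$$\langle x_0^*, x_0\rangle + \lambda f(x_0) > \alpha > \langle x_0^*, x_0\rangle + \lambda t_0.$$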
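It follows that λ > 0 and
$$\left\langle -\frac{1}{\lambda}x_0^*, x\right\rangle - f(x) < -\frac{\alpha}{\lambda}, \quad \forall\, x \in D(f),$$
i.e.
$$f^*\left(-\frac{x_0^*}{\lambda}\right) \le -\frac{\alpha}{\lambda} < +\infty.$$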

(3) If f ≤ g, then $g^* \le f^*$.
(4) (Young's inequality) $\langle x^*, x\rangle \le f(x) + f^*(x^*)$.
(5) $f(x) + f^*(x^*) = \langle x^*, x\rangle \iff x^* \in \partial f(x)$.
Proof By definition,
$$\begin{aligned} x^* \in \partial f(x) &\iff \langle x^*, y - x\rangle \le f(y) - f(x), \quad \forall\, y \in \mathbb{R}^n, \\ &\iff \langle x^*, y\rangle - f(y) \le \langle x^*, x\rangle - f(x), \quad \forall\, y \in \mathbb{R}^n, \\ &\iff f^*(x^*) \le \langle x^*, x\rangle - f(x). \end{aligned}$$
Combining with Young's inequality, the assertion follows.
(6) If $g(x) = f(x - x_0) + \langle x_0^*, x\rangle + a$, then
$$g^*(x^*) = f^*(x^* - x_0^*) + \langle x^*, x_0\rangle - (a + \langle x_0^*, x_0\rangle).$$
(7) If g(x) = f(λx), λ > 0, then $g^*(x^*) = f^*(x^*/\lambda)$.
As long as $f^*$ is a proper function, we can also define its conjugate $f^{**}$. By
properties (1) and (2), $f^{**}$ is proper, convex, and lower semi-continuous.
Theorem 14.1 (Fenchel–Moreau) If f is a proper, convex, lower semi-
continuous function, then f ∗∗ = f .
Proof By Young's inequality, we have $f^{**}(x) \le f(x)$, $\forall\, x \in \mathbb{R}^n$.
To prove the reversed inequality, we argue by contradiction. Suppose
there exists $x_0 \in \mathbb{R}^n$ such that $f^{**}(x_0) < f(x_0)$. Following the proof of property
(2), we have $(x_0, f^{**}(x_0)) \notin \mathrm{epi}(f)$, hence the inequality (14.2) holds with
$t_0 = f^{**}(x_0)$.
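If $f(x_0) < +\infty$, then λ > 0 in (14.2) and
$$\langle x_0^*, x\rangle + \lambda f(x) > \alpha, \quad \forall\, x \in D(f).$$
Likewise, we have
$$f^*\left(-\frac{x_0^*}{\lambda}\right) \le -\frac{\alpha}{\lambda}.$$
According to the definition of $f^{**}$,
$$f^{**}(x_0) \ge \left\langle -\frac{x_0^*}{\lambda}, x_0\right\rangle - f^*\left(-\frac{x_0^*}{\lambda}\right),$$
whence
$$\langle x_0^*, x_0\rangle + \lambda f^{**}(x_0) \ge \alpha. \tag{14.3}$$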
This contradicts (14.2).
If $f(x_0) = +\infty$ and λ > 0, then the same argument again yields a contradiction.
It remains to consider the case where $f(x_0) = +\infty$ and λ = 0.
From (14.2), ∃ ε > 0 such that
$$\langle x_0^*, x - x_0\rangle \ge \varepsilon, \quad \forall\, x \in D(f). \tag{14.4}$$
Since $f^*$ is proper, $\exists\, x_1^* \in \mathbb{R}^n$ such that $f^*(x_1^*) < +\infty$ and
$$\langle x_1^*, x\rangle - f(x) - f^*(x_1^*) \le 0, \quad \forall\, x \in D(f). \tag{14.5}$$
Combining (14.4) and (14.5), ∀ n ∈ N, it yields
$$\langle x_1^* - nx_0^*, x\rangle + n\langle x_0^*, x_0\rangle + n\varepsilon - f(x) - f^*(x_1^*) \le 0, \quad \forall\, x \in D(f).$$
Consequently,
$$f^*(x_1^* - nx_0^*) + n\langle x_0^*, x_0\rangle + n\varepsilon - f^*(x_1^*) \le 0$$
or
$$n\varepsilon + \langle x_1^*, x_0\rangle - f^*(x_1^*) \le \langle x_1^* - nx_0^*, x_0\rangle - f^*(x_1^* - nx_0^*) \le f^{**}(x_0).$$
Letting n → ∞, it follows that $f^{**}(x_0) = +\infty$, a contradiction.
Corollary 14.1 For a proper, convex, lower semi-continuous function f , we have
ξ ∈ ∂f (x) ⇐⇒ x ∈ ∂f ∗ (ξ).
This follows directly from the Fenchel–Moreau theorem and property (5).
Corollary 14.2 If f : Rn → R̄ is proper, then
f ∗∗ = conv(f ) = sup{ϕ : Rn → R̄ | ϕ(x) ≤ f (x), ∀ x ∈ Rn , ϕ is convex }.
Proof Suppose g ≤ f is convex, then it is proper and convex. By property (3),
f ∗ ≤ g ∗ ; moreover, g ∗∗ ≤ f ∗∗ . By the Fenchel–Moreau theorem, we see that
g = g ∗∗ ≤ f ∗∗ .
Example 14.1 Let $f(x) = |x|^p/p$, 1 < p < ∞, where $|x| = (x_1^2 + x_2^2 + \cdots + x_n^2)^{1/2}$, then
$$f^*(\xi) = \frac{1}{p'}|\xi|^{p'}, \quad \text{where } \frac{1}{p} + \frac{1}{p'} = 1.$$
Example 14.2 Let f(x) = |x|, then
$$f^*(\xi) = \begin{cases} 0 & \text{if } |\xi| \le 1, \\ +\infty & \text{if } |\xi| > 1. \end{cases}$$
Proof Recall
$$f^*(\xi) = \sup_x\{\langle\xi, x\rangle - |x|\}.$$
For |ξ| > 1, we let x = tξ and t → +∞, then $f^*(\xi) = +\infty$. For |ξ| ≤ 1, since
$\langle\xi, x\rangle - |x| \le 0$, by choosing x = 0, the result follows.
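Examples 14.1 and 14.2 can be checked in one dimension by replacing the supremum in Definition 14.1 with a maximum over a finite grid. This is only a rough sketch under assumptions that are not part of the notes: the helper `conjugate_on_grid`, the grid bounds `lo`, `hi` and the number of `steps` are hypothetical, and a bounded grid can only hint at the value +∞.

```python
def conjugate_on_grid(f, xi, lo=-50.0, hi=50.0, steps=100001):
    """Approximate f*(xi) = sup_x (xi * x - f(x)) by a maximum over a uniform grid."""
    h = (hi - lo) / (steps - 1)
    return max(xi * (lo + i * h) - f(lo + i * h) for i in range(steps))


def power_function(p):
    """Return f(x) = |x|^p / p as in Example 14.1."""
    return lambda x: abs(x) ** p / p


def conjugate_exponent(p):
    """Return p' with 1/p + 1/p' = 1."""
    return p / (p - 1.0)


if __name__ == "__main__":
    p = 3.0
    q = conjugate_exponent(p)
    for xi in (-2.0, 0.5, 2.0):
        print(xi, conjugate_on_grid(power_function(p), xi), abs(xi) ** q / q)
    print(conjugate_on_grid(abs, 0.5))               # close to 0
    print(conjugate_on_grid(abs, 1.5))               # grows with the grid ...
    print(conjugate_on_grid(abs, 1.5, -500.0, 500.0))  # ... as the bounds widen
```

For |ξ| > 1 the grid maximum for f(x) = |x| keeps increasing as the grid is enlarged, which is the numerical trace of $f^*(\xi) = +\infty$.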
14.2 The dual least action principle
Given a convex Hamiltonian $H \in C^1(\mathbb{R}^N \times \mathbb{R}^N)$ satisfying the following
growth condition:
$$0 \le H(u, \xi) \le C(|u|^2 + |\xi|^2), \tag{14.6}$$
find a vector-valued periodic function $(u(t), \xi(t)) \in \mathbb{R}^N \times \mathbb{R}^N$ which satisfies
the Hamiltonian system
$$\begin{cases} \dot\xi(t) = H_u(t, u(t), \xi(t)), \\ \dot u(t) = -H_\xi(t, u(t), \xi(t)). \end{cases} \tag{14.7}$$
As mentioned before, the minimization method cannot be applied directly to the
associated functional. We shall use the conjugate function H ∗ of the convex
Hamiltonian H to rephrase the problem.
In the above problem, the period is yet to be determined. Introducing a pa-
rameter λ, we wish to find a 2π-periodic solution (v(t), η(t)) ∈ RN × RN such
that

$$\begin{cases} \dot\eta(t) = \lambda H_v(t, v(t), \eta(t)), \\ \dot v(t) = -\lambda H_\eta(t, v(t), \eta(t)). \end{cases} \tag{14.8}$$
Suppose we have found a pair (v(t), η(t)) satisfying (14.8), then by letting
u(t) = v(λ−1 t), ξ(t) = η(λ−1 t),
the pair (u(t), ξ(t)) satisfies (14.7). If λ > 0, then both of them have period 2λπ.
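We now turn (14.8) into a constrained minimization problem. It is the dual
problem of the original problem: find $(w, \rho) \in X := H^1_{per}((0, 2\pi), \mathbb{R}^N)^2 =
\{(w, \rho) \in H^1((0, 2\pi), \mathbb{R}^{2N}) \mid w(0) = w(2\pi),\ \rho(0) = \rho(2\pi)\}$ which minimizes I
subject to the constraint G = −π, where
$$\begin{cases} I(w, \rho) = \displaystyle\int_0^{2\pi} H^*(\dot\rho(t), -\dot w(t))\,dt, \\[3mm] G(w, \rho) = \displaystyle\int_0^{2\pi} (-\dot\rho(t)\cdot w(t) + \dot w(t)\cdot\rho(t))\,dt = -\pi. \end{cases} \tag{14.9}$$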
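Denoting
$$\langle (w, \rho), (u, \xi)\rangle = \int_0^{2\pi} (w\cdot u + \rho\cdot\xi)\,dt,$$
a direct calculation shows
$$\langle G'((w, \rho)), (u, \xi)\rangle = 2\langle (-\dot\rho, \dot w), (u, \xi)\rangle = 2\langle (-\dot\xi, \dot u), (w, \rho)\rangle$$
and
$$\langle I'((w, \rho)), (u, \xi)\rangle = \int_0^{2\pi} (H^*)'(\dot\rho(t), -\dot w(t))\cdot(\dot\xi(t), -\dot u(t))\,dt.$$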
We have the following conclusions:

(1) $M := G^{-1}(-\pi) \ne \emptyset$.
This is because
$$G(w, \rho) = -2\int_0^{2\pi} \dot\rho(t)\cdot w(t)\,dt$$
is linear in each of w and ρ.
(2) $G'((w, \rho)) \ne (\theta, \theta)$, ∀ (w, ρ) ∈ M.
This is because from (14.9), we see (w, ρ) ≠ (θ, θ) on M.
(3) I is a convex functional, which is bounded below and continuous on X;
therefore, it is also weakly sequentially lower semi-continuous.
It suffices to verify I is bounded below. Combining (14.6), property (3) of a
conjugate function, and Example 14.1, we have
$$H^*(w, \rho) \ge \frac{1}{4C}(|w|^2 + |\rho|^2)$$
and
$$I(w, \rho) \ge \frac{1}{4C}\int_0^{2\pi} (|\dot w|^2 + |\dot\rho|^2)\,dt.$$
We note that M contains no constant vector-valued functions,
hence
$$\|(w, \rho)\| = \left(\int_0^{2\pi} (|\dot w|^2 + |\dot\rho|^2)\,dt\right)^{1/2}$$
defines an equivalent norm on X. This simultaneously justifies the following
(4) I is coercive.
(5) M is weakly sequentially closed.
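To verify this, suppose $\{(w_j, \rho_j)\} \subset M$ such that $(w_j, \rho_j) \rightharpoonup (w_0, \rho_0)$ in X.
Then in $L^2([0, 2\pi], \mathbb{R}^N)$, we have strong convergence $w_j \to w_0$ and $\rho_j \to \rho_0$.
Furthermore,
$$\begin{aligned} -\pi &= \lim G(w_j, \rho_j) \\ &= \lim\int_0^{2\pi} (-\dot\rho_j(t)\cdot w_j(t) + \dot w_j(t)\cdot\rho_j(t))\,dt \\ &= \int_0^{2\pi} (-\dot\rho_0(t)\cdot w_0(t) + \dot w_0(t)\cdot\rho_0(t))\,dt = G(w_0, \rho_0). \end{aligned}$$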
Consequently, there exists $(w_0, \rho_0) \in M$, which is the minimum of the
constrained problem (14.9). Using Lagrange multipliers, there exists $\lambda \in \mathbb{R}^1$ such
that
$$I'(w_0, \rho_0) + \frac{\lambda}{2}G'(w_0, \rho_0) = 0,$$
i.e.
$$(H^*)'(\dot\rho_0, -\dot w_0) = \lambda(w_0, \rho_0). \tag{14.10}$$
Since the sub-differential of the conjugate function and the sub-differential of
the original function are inverses of each other, as stated in Corollary 14.1, (14.10)
is equivalent to

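$$\begin{cases} \dot\rho_0(t) = H_w(t, \lambda w_0(t), \lambda\rho_0(t)), \\ \dot w_0(t) = -H_\rho(t, \lambda w_0(t), \lambda\rho_0(t)). \end{cases} \tag{14.11}$$
If we define
$$\begin{cases} \eta = \lambda\rho_0, \\ v = \lambda w_0, \end{cases}$$
by substituting into (14.11), it gives (14.8).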
Lastly, we verify λ > 0. In (14.10), first multiplying both sides by $(\dot\rho_0, -\dot w_0)$
and then integrating, we obtain
$$\langle (H^*)'(\dot\rho_0, -\dot w_0), (\dot\rho_0, -\dot w_0)\rangle = -\lambda G(w_0, \rho_0) = \lambda\pi.$$
It follows from (14.6) that ∇H(θ, θ) = (θ, θ). Using property (5) of a conjugate
function, we see that
$$H^*(\theta, \theta) = -H(\theta, \theta) = 0.$$
Since $H^*$ is convex, we have
$$H^*(\theta, \theta) - H^*(\dot\rho_0, -\dot w_0) \ge -\langle (H^*)'(\dot\rho_0, -\dot w_0), (\dot\rho_0, -\dot w_0)\rangle,$$
i.e.
$$\langle (H^*)'(\dot\rho_0, -\dot w_0), (\dot\rho_0, -\dot w_0)\rangle \ge H^*(\dot\rho_0, -\dot w_0) \ge 0.$$
Thus, λ ≥ 0.
It remains to show that λ ≠ 0. We argue by contradiction. Suppose λ = 0,
then
$$(H^*)'(\dot\rho_0, -\dot w_0) = (\theta, \theta).$$
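By Corollary 14.1, we see that
$$\begin{cases} \dot\rho_0 = H_u(\theta, \theta), \\ \dot w_0 = -H_\rho(\theta, \theta). \end{cases}$$
Using condition (14.6), we conclude that $(H_u(\theta, \theta), H_\rho(\theta, \theta)) = (\theta, \theta)$, so
$(w_0, \rho_0)$ is constant and $G'((w_0, \rho_0)) = (\theta, \theta)$, a contradiction to conclusion (2).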
14.3 The Ekeland variational principle
In view of our previous methods in finding extreme values, except in some

special circumstances (mostly linear problems) where the orthogonal projection
method is applicable, we almost always rely on the weak convergence (weak se-
quential lower semi-continuity, weak sequential compactness, etc.). However, the
weak topology is in general more complicated, hard to grasp, and often tedious
to verify. In the following, we introduce the Ekeland variational principle, a
fundamental result proposed by Ekeland in 1970. Although seemingly simple, this
result can be combined with the Palais–Smale condition, a compactness condition
in modern variational calculus, to produce a very useful method in finding extreme
values.
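Theorem 14.2 (Ekeland) Let (X, d) be a complete metric space. Let $f : X \to
\mathbb{R}^1 \cup \{+\infty\}$ be a proper function, i.e. $f \not\equiv +\infty$. If f is bounded below and lower
semi-continuous and if ∃ ε > 0 and $\exists\, x_\varepsilon \in X$ such that $f(x_\varepsilon) < \inf_X f + \varepsilon$,
then $\exists\, y_\varepsilon \in X$ such that
(1) $f(y_\varepsilon) \le f(x_\varepsilon)$,
(2) $d(y_\varepsilon, x_\varepsilon) \le 1$,
(3) $f(x) > f(y_\varepsilon) - \varepsilon d(y_\varepsilon, x)$, $\forall\, x \in X\setminus\{y_\varepsilon\}$.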
Proof We point out that the above-mentioned yε is itself the minimum of the
function f (x) + εd(yε , x), which depends on yε .
1◦ We choose a convergent sequence in X recursively.
First, we choose u0 = xε . Suppose un has been chosen, we define
Sn = {w ∈ X| f (w) ≤ f (un ) − εd(w, un )}.
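Since $u_n \in S_n$, $S_n \ne \emptyset$. Choose $u_{n+1} \in S_n$ satisfying
$$f(u_{n+1}) - \inf_{S_n} f \le \frac{1}{2}\left[f(u_n) - \inf_{S_n} f\right], \quad n = 0, 1, 2, \ldots.$$
2° We show $\{u_n\}$ is a Cauchy sequence. Since $u_{k+1} \in S_k$ for every k, summing the
defining inequalities of $S_n, \ldots, S_{m-1}$ and using the triangle inequality, we note
$$\varepsilon d(u_n, u_m) \le f(u_n) - f(u_m), \quad \forall\, m \ge n. \tag{14.12}$$
Since $\{f(u_n)\}$ is decreasing and f is bounded below, whenever n, m → ∞,
$f(u_n) - f(u_m) \to 0$.
Therefore, $\exists\, u^* \in X$ such that $u_n \to u^*$. Since f is lower semi-continuous,
we have
$$f(u^*) \le \lim_{n\to\infty} f(u_n) \le \lim_{n\to\infty}\inf_{S_n} f. \tag{14.13}$$
3° It remains to verify $y_\varepsilon = u^*$ satisfies (1)–(3).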
Since {f (un )} is decreasing,

f (yε ) = f (u∗ ) ≤ f (un ) ≤ f (u0 ) = f (xε ), ∀ n,
so (1) holds.
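From (14.12),
$$\varepsilon d(x_\varepsilon, y_\varepsilon) = \varepsilon d(u_0, u^*) \le f(x_\varepsilon) - f(u^*) \le f(x_\varepsilon) - \inf_X f < \varepsilon,$$
so (2) holds.
Lastly, we prove (3) by contradiction. Suppose $y_\varepsilon = u^*$ does not satisfy (3),
then $\exists\, w \ne u^*$ such that
$$f(w) \le f(u^*) - \varepsilon d(u^*, w). \tag{14.14}$$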
From (14.12), it follows that
εd(un , u∗ ) ≤ f (un ) − f (u∗ ),
i.e.
f (u∗ ) ≤ f (un ) − εd(un , u∗ ). (14.15)
Combining (14.14) and (14.15), we deduce that
f (w) ≤ f (un ) − εd(un , w), ∀ n.
Thus, $w \in \bigcap_{n=1}^{\infty} S_n$. Using (14.13), we have $f(u^*) \le f(w)$, which contradicts
(14.14).
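The recursive construction in the proof can be traced on a finite metric space, which is trivially complete. The following is a minimal sketch under assumptions not made in the notes: the space is a finite grid of real numbers, `sample_f` is an arbitrary test function, and $u_{n+1}$ is taken to be a minimizer of f on $S_n$, which is one admissible way to satisfy the halving condition when the infimum is attained.

```python
import math


def sample_f(x):
    """An arbitrary test function with several local minima (illustrative only)."""
    return (x - 2.0) ** 2 + 0.3 * math.sin(15.0 * x)


def ekeland_point(points, f, dist, eps, start):
    """Run the recursive construction of the proof on a finite metric space.

    S_n = {w : f(w) <= f(u_n) - eps * dist(w, u_n)} always contains u_n.
    Any other element of S_n has strictly smaller f, so the loop stops
    exactly when S_n = {u_n}, i.e. when condition (3) holds at u_n.
    """
    u = start
    while True:
        S = [w for w in points if f(w) <= f(u) - eps * dist(w, u)]
        nxt = min(S, key=f)   # minimiser of f on S_n
        if nxt == u:          # the sequence has become stationary
            return u
        u = nxt


if __name__ == "__main__":
    pts = [i / 100.0 for i in range(401)]
    eps = 0.5
    fmin = min(sample_f(x) for x in pts)
    x_eps = next(x for x in pts if sample_f(x) < fmin + eps)
    y_eps = ekeland_point(pts, sample_f, lambda a, b: abs(a - b), eps, x_eps)
    print(x_eps, y_eps, sample_f(x_eps), sample_f(y_eps))
```

The returned point need not be a global minimizer of f; it is only guaranteed to satisfy conclusions (1)–(3), which is all the principle asserts.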
Corollary 14.3 Let (X, d) be a complete metric space and f : X → R1 ∪ {+∞}
be a proper function, which is bounded below and lower semi-continuous. Then
∀ ε > 0, $\exists\, y_\varepsilon \in X$ such that $f(x) > f(y_\varepsilon) - \varepsilon d(x, y_\varepsilon)$, $\forall\, x \ne y_\varepsilon$.
It is worth noting that the Ekeland variational principle only employs
the metric topology; it involves neither the weak topology nor the various
notions of compactness. Consequently, the minimum of the functional f need not be
reached. However, its significance lies in that, by choosing this special minimizing
sequence, it produces a special sequence of approximate minima.
14.4 The Fréchet derivative and the Palais–Smale condition
In the following, we connect the Ekeland variational principle to the derivative

of a continuously differentiable function.
In Lecture 10, we introduced the Gâteaux derivative of a real-valued function

on a Banach space. Let $f : X \to \mathbb{R}^1$, $x_0 \in U \subset X$, where U is an open
neighborhood of $x_0$. We say that f has Gâteaux derivative at $x_0$ if ∀ h ∈ X,
$\exists\, df(x_0, h) \in \mathbb{R}^1$ such that
$$f(x_0 + th) - f(x_0) - t\,df(x_0, h) = o(t), \quad t \to 0, \quad \forall\, x_0 + th \in U.$$
The Fréchet derivative is closely related to the Gâteaux derivative. If
$$|f(x) - f(x_0) - \langle\xi, x - x_0\rangle| = o(\|x - x_0\|) \quad \text{as } x \to x_0,$$
then $\xi \in X^*$ is said to be the Fréchet derivative of f at $x_0$ and we denote
$f'(x_0) = \xi$.
If f has Fréchet derivative $f'(x_0)$, then it has Gâteaux derivative at $x_0$ and
$$df(x_0, h) = \langle f'(x_0), h\rangle, \quad \forall\, h \in X.$$
Furthermore,
$$\|f'(x_0)\| = \sup_{h \in X\setminus\{\theta\}} \frac{df(x_0, h)}{\|h\|}.$$
Conversely, suppose f has Gâteaux derivative df(x, h) everywhere in some
neighborhood U of $x_0$, and ∀ x ∈ U, $\exists\, \xi(x) \in X^*$ such that
$$\langle\xi(x), h\rangle = df(x, h).$$
If $x \mapsto \xi(x)$ is also continuous, then f has Fréchet derivative $f'(x_0) = \xi(x_0)$ at $x_0$.
If the Fréchet derivative $f'(x)$ exists everywhere and $x \mapsto f'(x)$ is continuous,
then we say that f is continuously differentiable, and we denote $f \in C^1(X, \mathbb{R}^1)$.
The Gâteaux derivative generalizes the directional derivative from $\mathbb{R}^n$ to a
Banach space, whereas the Fréchet derivative generalizes the differential from $\mathbb{R}^n$
to a Banach space.
In Lecture 11, we computed the Gâteaux derivatives of some functionals of
integral form. Their appearances are identical to the variational derivatives. As
for the Fréchet derivatives, as long as they exist, they should also have the same
expression.
Example 14.3 Given Ω ⊂ Rn , suppose F ∈ C¹(Ω × R1 ) satisfies
$$|F_s(x, s)| \le C(1 + |s|)^{\mu}, \qquad \mu \le 2^* - 1 = \frac{n+2}{n-2} \quad (n > 2).$$
On H01 (Ω), we find the Fréchet derivative of the functional
$$I(u) = \int_\Omega \left[ \frac{1}{2} |\nabla u(x)|^2 + F(x, u(x)) \right] dx.$$
We already computed its Gâteaux derivative to be
$$dI(u, v) = \int_\Omega [\nabla u \cdot \nabla v + F_u(x, u) v]\, dx.$$
On H01 (Ω), using the inner product
$$\langle u, v \rangle = \int_\Omega \nabla u \cdot \nabla v \, dx,$$
we can write it as
dI(u, v) = ⟨u + KFu (x, u), v⟩,
where
$$K = (-\Delta)^{-1} : H_0^1(\Omega)^* \to H_0^1(\Omega).$$
By the embedding
$$H_0^1(\Omega) \hookrightarrow L^{2^*}(\Omega) \tag{14.16}$$
and the growth condition
$$|F_s(x, u(x))| \le C(1 + |u(x)|)^{\mu} \in L^{\frac{2n}{n+2}}(\Omega) = (L^{2^*}(\Omega))^*,$$
we see that the nonlinear mapping $u \mapsto F_s(x, u(x)) : H_0^1(\Omega) \to (L^{2^*}(\Omega))^*$ is
continuous.
Moreover, by the continuity of the embedding (14.16), we deduce that its dual
mapping
$$(L^{2^*}(\Omega))^* \hookrightarrow (H_0^1(\Omega))^* \tag{14.17}$$
is also continuous. Thus, the Gâteaux derivative $u \mapsto u + K F_u(x, u) : H_0^1(\Omega) \to H_0^1(\Omega)$
is continuous. Consequently, the Fréchet derivative of I exists and is given
by
$$I'(u) = u + K F_u(x, u).$$
We call x0 a critical point of f if f ′(x0 ) = θ, and the value f (x0 ) a critical
value.
In variational problems, minima are critical points, and all critical points
are solutions of the E-L equations.
Definition 14.2 Let X be a Banach space and f ∈ C¹(X, R1 ). If every sequence
$\{x_j\}_1^\infty \subset X$ satisfying
f (xj ) → c, ‖f ′(xj )‖ → 0, (14.18)
has a convergent subsequence, then we say f satisfies the Palais–Smale
condition at c, denoted by PSc . We also call a sequence satisfying (14.18) a
Palais–Smale sequence (or briefly PS-sequence).
If ∀ c ∈ R1 , f satisfies PSc , then we say f satisfies PS.

The Palais–Smale condition can be extended to a general Banach manifold.
The following corollary is very important, since by combining the Ekeland
variational principle and the Palais–Smale condition, we give a criterion for the
existence of minimal value.
Corollary 14.4 Let X be a Banach space (or more generally, a Banach manifold).
Let f ∈ C¹(X, R1 ) be bounded below. Denote
$$c = \inf_{X} f.$$
If f satisfies the PSc condition, then f attains its minimum.

Proof According to the Ekeland variational principle, ∀ n ≥ 1, ∃ xn ∈ X such
that
$$f(x) > f(x_n) - \frac{1}{n}\|x - x_n\|, \quad \forall\, x \neq x_n,$$
$$f(x_n) < c + \frac{1}{n}.$$
The first inequality implies
$$\|f'(x_n)\| = \sup_{\|\varphi\| = 1} |df(x_n, \varphi)| \le \frac{1}{n}.$$
The second inequality implies
f (xn ) → c.
According to the PSc condition, there exists a convergent subsequence {xnj } such
that xnj → x∗ ∈ X. By continuity, it follows that f (x∗ ) = c.
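The step from the first inequality to the bound on ‖f ′(xn )‖ is stated without detail; one way to fill it in, sketched here with the auxiliary symbols t and ϕ, is to test the inequality along rays through xn :

```latex
% Take x = x_n + t\varphi with \|\varphi\| = 1 and t > 0 in the first inequality:
f(x_n + t\varphi) - f(x_n) > -\frac{t}{n}.
% Divide by t and let t \downarrow 0:
df(x_n, \varphi) = \lim_{t \downarrow 0} \frac{f(x_n + t\varphi) - f(x_n)}{t} \ge -\frac{1}{n}.
% Replace \varphi by -\varphi and use linearity of df(x_n, \cdot):
df(x_n, \varphi) \le \frac{1}{n},
\qquad\text{hence}\qquad
|df(x_n, \varphi)| \le \frac{1}{n} \quad \forall\, \|\varphi\| = 1 .
```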
14.5 The Nehari technique
Since many functionals are neither bounded above nor below, in appearance,
it seems difficult to apply variational methods to find their critical points. How-
ever, for some particular problems, Nehari provided a special technique, which
transforms a critical point problem to an extremal problem.
Let H be a Hilbert space, equipped with inner product (·, ·), and a given func-
tional I ∈ C 2 (H, R1 ). We look for the critical points of I, namely, those points
for which I 0 (u) = 0.
Define G(u) = hI 0 (u), ui. We note that all critical points u of I satisfy:
G(u) = 0.
If the set M = {u ∈ H | G(u) = 0} is a manifold (for example, if G′(u) ≠ θ,
∀ u ∈ M ), then we can restrict I to M to obtain a new functional Ĩ. Furthermore,
if Ĩ has an extreme value, then we can find the extreme value of Ĩ; in other words,
we find the constrained extreme value of I.
Some people may ask: how do we handle the Lagrange multiplier associated with
G(u) = 0? In fact,
$$\tilde I'(u) = I'(u) - \frac{(I'(u), u)}{\|G'(u)\|^2}\, G'(u),$$
and on the set M , we have (I ′(u), u) = G(u) = 0. Thus, if G′(u) ≠ θ, then
Ĩ ′(u) = 0 ⇐⇒ I ′(u) = 0.
We now give a concrete example to demonstrate how to apply this technique.
Example 14.4 Let Ω ⊂ Rn be a bounded domain. Let a ∈ C(Ω̄), a(x) ≥ α > 0,
2 < µ < 2∗ . On H01 (Ω), find the nontrivial critical points of the functional
$$I(u) = \int_\Omega \left[ \frac{1}{2} |\nabla u(x)|^2 - a(x)|u(x)|^{\mu} \right] dx.$$
We compute that
$$G(u) = \langle I'(u), u \rangle = \int_\Omega [\,|\nabla u|^2 - \mu a(x)|u(x)|^{\mu}\,]\, dx.$$
On $G^{-1}(0)$,
$$\tilde I(u) = \frac{\mu - 2}{2} \int_\Omega a(x)|u(x)|^{\mu}\, dx = \left( \frac{1}{2} - \frac{1}{\mu} \right) \|u\|^2$$
is nonnegative.
1◦ Note that
$$\langle G'(u), v \rangle = \int_\Omega [\,2\nabla u \cdot \nabla v - \mu^2 a(x)|u(x)|^{\mu-2} u(x) v(x)\,]\, dx, \quad \forall\, v \in H_0^1(\Omega),$$
so we have G(θ) = 0 and
$$G'(u) = 2u - \mu^2 (-\Delta)^{-1}\big(a|u|^{\mu-2} u\big).$$
This means: $\theta \in G^{-1}(0)$ and G′(θ) = θ.
However, $\forall\, u \in G^{-1}(0)$, by the embedding theorem, we have
$$\int_\Omega |\nabla u|^2\, dx = \mu \int_\Omega a(x)|u(x)|^{\mu}\, dx \le C \left( \int_\Omega |\nabla u|^2\, dx \right)^{\frac{\mu}{2}}.$$
It follows that either u = θ or
$$\left( \int_\Omega |\nabla u|^2\, dx \right)^{\frac{\mu}{2} - 1} \ge \frac{1}{C}.$$
This implies θ is an isolated point of $G^{-1}(0)$. Let $M = G^{-1}(0) \setminus \{\theta\}$, then
on M ,
$$\langle G'(u), u \rangle = 2\|u\|^2 - \mu^2 \int_\Omega a(x)|u(x)|^{\mu}\, dx = (2 - \mu)\|u\|^2 < 0,$$
whence G′(u) ≠ θ, ∀ u ∈ M .
2◦ It is clear that $\tilde I \in C^1(H_0^1(\Omega), \mathbb{R}^1)$. We now verify the Palais–Smale condition.
Suppose {uj } ⊂ M is a PS-sequence satisfying $\tilde I'(u_j) \to 0$ and $|\tilde I(u_j)| \le C$.
Since
$$\left( \frac{1}{2} - \frac{1}{\mu} \right) \|u_j\|^2 = \tilde I(u_j) \le C,$$
there exists a subsequence $u_{j'} \rightharpoonup u$. Moreover, since
$$\tilde I'(u) = I'(u) = u - \mu (-\Delta)^{-1}\big(a(x)|u(x)|^{\mu-2} u(x)\big)$$
and the embedding
$$H_0^1(\Omega) \hookrightarrow L^{\mu}(\Omega)$$
is compact, as $\tilde I'(u_j) \to 0$, {uj } has a subsequence which converges in $L^{\mu}(\Omega)$.
Composing it with $(-\Delta)^{-1}$, it follows that this subsequence converges in H01 (Ω)
to u0 ∈ M and
$$I'(u_0) = \tilde I'(u_0) = \lim \tilde I'(u_j) = 0.$$
Thus, u0 is the desired nontrivial critical point.
Exercises
1. Let R > a > 0 and $M = \{u \in C^1([-a, a]) \mid u(\pm a) = \sqrt{R^2 - a^2}\}$,
$$I(u) = \int_{-a}^{a} \left( \sqrt{1 + \dot u^2} - \frac{u}{R} \right) dt.$$
(1) Compute the first and second variations of I.
(2) Write the Euler–Lagrange equation.
(3) Verify that $u_0 = \sqrt{R^2 - t^2}$ is a weak minimal solution.
(4) Write the Jacobi operator along u0 .
2. Find inf{I(u) | u ∈ M } for each of the following:
(1)
$$I(u) = \int_1^2 t \sqrt{1 + \dot u^2}\, dt, \qquad M = \{u \in C^1([1, 2]) \mid u(j) = \cosh^{-1} j,\ j = 1, 2\}.$$
(2)
$$I(u) = \pi \int_a^b u^2\, dt, \qquad M = \left\{ u \in C_0^1([a, b]) \;\Big|\; 2\pi \int_a^b u \sqrt{1 + \dot u^2}\, dt = c \right\}.$$
3. Let $V \in C^1(\mathbb{R}^1, \mathbb{R}^1)$. $\forall\, (u, p) \in \mathbb{R}^N \times \mathbb{R}^N$, let
$$L(u, p) = \frac{1}{2} \sum_{i=1}^{N} p_i^2 - \sum_{i \neq j} V\big((u_i - u_j)^2\big).$$
(1) Write the corresponding functional.
(2) Write the Euler–Lagrange equation.
(3) Write the corresponding Hamiltonian.
(4) Write the corresponding Hamiltonian system.
(5) Suppose $\{u_i(t)\}_1^N$ is a solution of the E-L equation, prove that
$$\frac{1}{2} \sum_{i=1}^{N} \dot u_i^2(t) + \sum_{i \neq j} V\big((u_i(t) - u_j(t))^2\big) = \text{const.} \quad \forall\, t \in \mathbb{R}^1.$$
(6) Write the corresponding Hamilton–Jacobi equation.

4. Let r ∈ C([0, 1]), suppose ∃ t0 ∈ [0, 1] such that r(t0 ) > 0. Prove that there
exist infinitely many pairs $\{(\lambda_n, u_n)\}_1^\infty$ with un ≠ 0, λn → +∞, such that

ü = λru, in (0, 1),
u(0) = u(1) = 0.

Lecture 15
The Mountain Pass Theorem, its

generalizations, and applications
In this lecture, we introduce the theory of finding critical points other than
minima (or maxima). This theory carefully examines the changes of topological
structure taking place in the level sets of a functional, and subsequently provides
criteria for the existence of critical points. The theory is based on ideas and
machinery from both algebraic and differential topology. Since the 1970s, critical
point theory has undergone rapid development; in particular, it has found
profound applications in partial differential equations and dynamical systems with
variational structures.
We do not require the reader to possess the needed topological background;
instead, we would like to expose the reader to some of the most fundamental
and commonly used critical points theorems, such as the Mountain Pass Theo-
rem. To make this more accessible, we will take on a more geometrically intuitive
approach.
The standard critical point theory utilizes the gradient flow to accomplish the
deformation between the level sets of a functional. However, since this treatment
is beyond the scope of this book and does not fit well with our current content, we
will adopt a more direct approach: introducing these critical point theorems based
on the Ekeland variational principle.
15.1 The Mountain Pass Theorem
The following intuitive example illustrates the basic idea used in analyzing
saddle points, a special kind of critical point.
Imagine the following scenario: in a valley surrounded by mountains, if a
person starting from a point p1 outside the valley wants to reach a point p0 in
the valley, the optimal path would be one whose highest point is no higher than
the highest point on any nearby path. The highest point on this path is
likely to be a saddle point, i.e. a critical point which is neither a maximum nor a
minimum (see Figure 15.1).
Fig. 15.1
We formulate this mathematically as follows: Let X = Rn and Ω ⊂ Rn be an
open set. Given two points p0 ∈ Ω and p1 ∉ Ω̄, suppose there exists a function
f ∈ C¹(X, R1 ) such that
$$\alpha = \inf_{x \in \partial\Omega} f(x) > \max\{f(p_0), f(p_1)\}. \tag{15.1}$$
Let
Γ = {l ∈ C([0, 1], X)| l(i) = pi , i = 0, 1} (15.2)
be the set of all paths connecting the two points and let
$$c = \inf_{l \in \Gamma} \sup_{t \in [0,1]} f \circ l(t). \tag{15.3}$$
We intend to assert that c is a critical value of f , i.e. there exists x0 ∈ X such that
f ′(x0 ) = 0 and f (x0 ) = c.
Unfortunately, geometric intuition is not a proof. To validate the assertion, we
must impose other conditions on f .
Theorem 15.1 (The Mountain Pass Theorem) Let X be a Banach space and
f ∈ C¹(X, R1 ). Let Ω ⊂ X be an open set. Given two points p0 ∈ Ω and p1 ∉ Ω̄
satisfying (15.1), let c be defined as in (15.2) and (15.3). If f satisfies the PSc
condition, then c ≥ α is a critical value of f .
Proof We define the metric on Γ by
$$d(l_1, l_2) = \max_{t \in [0,1]} \|l_1(t) - l_2(t)\|,$$
then (Γ, d) is a complete metric space. Let
$$I(l) = \max_{t \in [0,1]} f \circ l(t).$$
Since every path in Γ crosses ∂Ω, assumption (15.1) gives I(l) ≥ α. Moreover, I is
locally Lipschitz:
$$|I(l_1) - I(l_2)| \le \max_{t \in [0,1]} |f \circ l_1(t) - f \circ l_2(t)|
\le \max_{t, \theta \in [0,1]} \|f'(\theta l_1(t) + (1 - \theta) l_2(t))\|\, \|l_1(t) - l_2(t)\|
\le C\, d(l_1, l_2),$$
where C is a constant, depending only on f ′, l1 , and l2 . Using the Ekeland
variational principle, we obtain a sequence {ln } ⊂ Γ such that
$$c \le I(l_n) < c + \frac{1}{n}, \tag{15.4}$$
$$I(l) > I(l_n) - \frac{1}{n}\, d(l, l_n), \quad \forall\, l \neq l_n, \quad n = 1, 2, \ldots \tag{15.5}$$
Let
M (l) = {t ∈ [0, 1]| f ◦ l(t) = I(l)}.
Then M (l) is a non-empty compact set and M (l) ⊂ (0, 1). To see this, suppose
t0 ∈ M (l) ∩ {0, 1}, then
$$f \circ l(t_0) = \max_{t \in [0,1]} f \circ l(t) \ge \inf_{\partial\Omega} f = \alpha,$$
but
f ◦ l(t0 ) ≤ max{f (p0 ), f (p1 )} < α,
a contradiction.
Denote Γ0 = {ψ ∈ C([0, 1], X)| ψ(i) = θ, i = 0, 1}; it is a closed linear
subspace of C([0, 1], X) with norm
$$\|\psi\|_{\Gamma_0} = \max_{t \in [0,1]} \|\psi(t)\|.$$
By (15.5), ∀ h ∈ Γ0 with $\|h\|_{\Gamma_0} = 1$, ∀ λj ↓ 0, ∀ ξj ∈ M (ln + λj h), we have
$$\lambda_j^{-1} [\, f \circ (l_n + \lambda_j h)(\xi_j) - f \circ l_n(\xi_j) \,] \ge -\frac{1}{n}.$$
Since {ξj } ⊂ [0, 1], there exists a subsequence, still denoted ξj , converging to some
ηn ∈ M (ln ). The limit ηn depends on ln , {λj }, and h. Taking the limit yields
$$df(l_n(\eta_n), h(\eta_n)) \ge -\frac{1}{n}. \tag{15.6}$$
We want to show $\exists\, \eta_n^* \in M(l_n)$ such that
$$df(l_n(\eta_n^*), \varphi) \ge -\frac{1}{n}, \quad \forall\, \varphi \in X,\ \|\varphi\| = 1. \tag{15.7}$$
If this is true, then by letting $x_n = l_n(\eta_n^*)$, it follows that
$$c \le f(x_n) < c + \frac{1}{n}, \qquad \sup_{\|\varphi\| = 1} |df(x_n, \varphi)| \le \frac{1}{n}.$$
Using the PSc condition, {xn } has a convergent subsequence {xnj } such that
xnj → x∗ ; consequently, f (x∗ ) = c and f ′(x∗ ) = 0.
We now prove (15.7) by contradiction. Suppose there does not exist such $\eta_n^*$
satisfying (15.7), then ∀ η ∈ M (ln ), ∃ vη ∈ X with ‖vη ‖ = 1 such that
$$df(l_n(\eta), v_\eta) < -\frac{1}{n}.$$
So there is a neighborhood Oη ⊂ (0, 1) of η such that
$$df(l_n(\xi), v_\eta) < -\frac{1}{n}, \quad \forall\, \xi \in O_\eta.$$
Since M (ln ) is compact, it has a finite covering, say, $\{O_{\eta_i} \mid i = 1, \ldots, m\}$ with
$M(l_n) \subset \bigcup_{i=1}^{m} O_{\eta_i}$, which corresponds to $\{v_{\eta_i}\}_1^m$, $\|v_{\eta_i}\| = 1$, such that
$$df(l_n(\xi), v_{\eta_i}) < -\frac{1}{n}, \quad \forall\, \xi \in O_{\eta_i},\ i = 1, \ldots, m. \tag{15.8}$$
We construct a partition of unity subordinate to $\{O_{\eta_i}\}$: 0 ≤ ρi ≤ 1, $\operatorname{supp} \rho_i \subset O_{\eta_i}$,
1 ≤ i ≤ m, such that
$$\sum_{i=1}^{m} \rho_i(\xi) = 1, \quad \forall\, \xi \in M(l_n).$$
Let
$$v = v(\xi) = \sum_{i=1}^{m} \rho_i(\xi)\, v_{\eta_i}.$$
Since M (ln ) ⊂ (0, 1), v ∈ Γ0 , ‖v‖ ≤ 1. In fact, we can choose a finite covering
and some ξ∗ ∈ M (ln ) such that there is only one i0 with $\xi^* \in O_{\eta_{i_0}}$. Hence,
$\|v\|_{\Gamma_0} = 1$, and (15.8) implies
$$df(l_n(\xi), v(\xi)) < -\frac{1}{n}, \quad \forall\, \xi \in M(l_n),$$
contradictory to (15.6). The proof is now complete.
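To make the minimax value (15.3) concrete, here is a small numerical sketch on a toy function that is not from the text: f(x, y) = x² − x⁴ + y² on R², with p0 = (0, 0), p1 = (2, 0) and Ω the open disc of radius 1/2 (on its boundary f = 1/4 − x⁴ ≥ 3/16 > 0 = max{f(p0), f(p1)}). Every path from p0 to p1 must cross the line x = 1/√2, where f ≥ 1/4, so c = 1/4. The code only scans a hypothetical one-parameter family of paths, which in general gives an upper bound for c; here the bound happens to be sharp.

```python
import math

def f(x, y):
    # Toy function: local minimum at (0, 0), f(2, 0) = -12, ridge along x = 1/sqrt(2).
    return x**2 - x**4 + y**2

def grad_f(x, y):
    return (2*x - 4*x**3, 2*y)

def path_max(h, steps=2000):
    # Path l_h(t) = (2t, h*sin(pi*t)) joins p0 = (0, 0) to p1 = (2, 0);
    # return the maximum of f sampled along it.
    return max(f(2*i/steps, h*math.sin(math.pi*i/steps)) for i in range(steps + 1))

heights = [k / 10 for k in range(-10, 11)]      # hypothetical family of detours
c_approx = min(path_max(h) for h in heights)    # upper bound for the minimax value
saddle = (1 / math.sqrt(2), 0.0)
print(c_approx, f(*saddle), grad_f(*saddle))
```

The smallest maximum is attained by the straight path (h = 0), and the level c ≈ 1/4 is the value of f at the saddle point (1/√2, 0), where the gradient vanishes.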
Remark 15.1 The Mountain Pass Theorem was stated in the above version by
A. Ambrosetti and P. Rabinowitz in 1974. Its generalizations as well as variations
have since been applied to solving various variational problems. The theorem
originated from the Wall Theorem, discovered by M. Morse while studying the
multiple-solution problem arising in minimal surfaces.
The proof of the Mountain Pass Theorem by means of the Ekeland variational
principle was independently provided by S. Z. Shi (Chinese Annals of
Mathematics 1, 1985, 348–355) and J. P. Aubin and I. Ekeland (Applied Nonlinear
Analysis, John Wiley and Sons, 1984).
Remark 15.2 In Theorem 15.1, the Palais–Smale condition plays a crucial role.
Without it, Brezis and Nirenberg gave the following counterexample:
In R2 , consider the function
f (x, y) = x² + (1 − x)³ y² .
Let $c = \inf_{x^2 + y^2 = \frac{1}{4}} f(x, y) > 0$. The function has a valley Ω = {(x, y) ∈
R2 | f (x, y) ≤ c}, with f (0, 0) = 0 and f (4, 1) = −11; however, by direct
computation, (0, 0) is the only critical point of f .
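The claims in this counterexample are easy to check numerically. The sketch below evaluates f and its gradient (f_x = 2x − 3(1 − x)²y², f_y = 2(1 − x)³y) and samples the circle x² + y² = 1/4; the sampling resolution is an arbitrary choice. Setting f_y = 0 forces y = 0 or x = 1; the first case gives x = 0 and the second gives f_x = 2 ≠ 0, which is why only the origin is critical.

```python
import math

def f(x, y):
    return x**2 + (1 - x)**3 * y**2

def grad_f(x, y):
    # Partial derivatives of f computed by hand.
    return (2*x - 3*(1 - x)**2 * y**2, 2*(1 - x)**3 * y)

# Sampled minimum of f on the circle of radius 1/2 (an approximation of c).
circle_min = min(
    f(0.5*math.cos(2*math.pi*k/3600), 0.5*math.sin(2*math.pi*k/3600))
    for k in range(3600)
)

print(f(0, 0), f(4, 1), grad_f(0, 0), circle_min)
```

So the mountain pass geometry holds (0 = f(0, 0) < c and f(4, 1) < 0), yet no critical point exists at a positive level; this is consistent with Theorem 15.1 only because the PS condition fails.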
The geometric structure involved in the Mountain Pass Theorem is a special
case of the more general linking structure.
Definition 15.1 Let X be a Banach space, Q ⊂ X be a compact manifold with
boundary ∂Q, and S ⊂ X be a closed subset. We say ∂Q and S link, if
(1) ∂Q ∩ S = ∅,
(2) for any continuous ϕ : Q → X satisfying ϕ|∂Q = id|∂Q , we have ϕ(Q) ∩
S ≠ ∅.
Linking is a property describing how two sets intersect with one another under
continuous deformations; therefore it is a topological property.
Example 15.1 In the Mountain Pass Theorem, Q = {tp0 + (1 − t)p1 | t ∈ [0, 1]},
S = ∂Ω and ∂Q = {p0 , p1 }, hence ∂Q and S link.
Example 15.2 Let X be a Banach space, X1 be a finite dimensional linear sub-
space, X2 be its complementary space, i.e. X = X1 ⊕ X2 . Let
S = X2 , Q = BR ∩ X1 ,
where BR is the closed ball in X centered at θ with radius R > 0, thus
∂Q = {x ∈ X1 | kxk = R}
(see Figure 15.2).
Fig. 15.2
We now show S and ∂Q link. As stated before, since linking is a topological
property, we must resort to topological machinery. A relatively simple tool from
topology is the Brouwer degree.
Given a continuous self mapping f on Rn and a bounded open set Ω ⊂ Rn ,
fix a point p ∉ f (∂Ω).
The Brouwer degree deg(f, Ω, p) is an integer-valued function depending on the
three variables (f, Ω, p). We list some of its important properties:
1. (Kronecker’s existence) If p ∉ f (∂Ω) and deg(f, Ω, p) ≠ 0, then $f^{-1}(p) \cap \Omega \neq \emptyset$.
2. (Homotopy invariance) If F ∈ C([0, 1] × Ω̄, Rn ), p ∉ F ([0, 1] × ∂Ω), then
deg(F (t, ·), Ω, p) = const.
3. (Additivity) Assume Ω1 , Ω2 ⊂ Ω are bounded open sets, Ω1 ∩ Ω2 = ∅, and
p ∉ f (Ω̄\(Ω1 ∪ Ω2 )), then
deg(f, Ω, p) = deg(f, Ω1 , p) + deg(f, Ω2 , p).
4. (Normality)
$$\deg(\mathrm{id}, \Omega, p) = \begin{cases} 1 & \text{if } p \in \Omega, \\ 0 & \text{if } p \notin \bar\Omega. \end{cases}$$
5. Suppose in addition, f ∈ C¹(Ω̄, Rn ) and p ∉ f (∂Ω) is a regular value of
f , i.e.
$$\det(\partial_{x_i} f_j(x)) \neq 0, \quad \forall\, x \in f^{-1}(p),$$
then
$$\deg(f, \Omega, p) = \sum_{x_k \in f^{-1}(p) \cap \Omega} \operatorname{sgn} \det(\partial_{x_i} f_j(x_k)).$$
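Property 5 can be tried out on a small example that is not from the text: the map F(x, y) = (x² − y², 2xy) on R² (complex squaring), the open disc Ω of radius 2 and the regular value p = (1, 0). On ∂Ω we have |F| = 4, so p ∉ F(∂Ω), and the preimages of p in Ω are (±1, 0). The helper names below are illustrative.

```python
def F(x, y):
    # The map z -> z^2 on R^2, viewed as the complex plane.
    return (x*x - y*y, 2*x*y)

def jac_det(x, y):
    # Determinant of the Jacobian [[2x, -2y], [2y, 2x]].
    return 4*(x*x + y*y)

def sign(v):
    return (v > 0) - (v < 0)

def degree_at_regular_value(preimages):
    # Property 5: sum the signs of the Jacobian determinant over the preimages.
    return sum(sign(jac_det(x, y)) for (x, y) in preimages)

p = (1.0, 0.0)
preimages = [(1.0, 0.0), (-1.0, 0.0)]   # solutions of F = p inside the disc of radius 2
deg = degree_at_regular_value(preimages)
print(deg)
```

Both preimages contribute +1, so the degree is 2; by Kronecker's existence property the equation F = p is therefore solvable in Ω, as it visibly is.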
We now return to prove that ∂Q and S link. Clearly, S ∩ ∂Q = ∅; it remains to
show that ∀ ϕ ∈ C(Q, X) satisfying ϕ|∂Q = id|∂Q , we have
ϕ(Q) ∩ S ≠ ∅.
Equivalently, we must show ∃ x0 ∈ Q such that
P ◦ ϕ(x0 ) = θ,
where P : X → X1 is a projection operator. For this, we identify X1 with Rn and
define F ∈ C([0, 1] × Q, Rn ) as
F (t, x) = tP ◦ ϕ(x) + (1 − t)x.
Since F (t, ·) is the identity on ∂Q,
θ ∉ ∂Q = F (t, ∂Q), ∀ t ∈ [0, 1],
so by the homotopy invariance property and normality, it follows that
deg(P ◦ ϕ, Q, θ) = deg(id, Q, θ) = 1.
Next, by Kronecker’s existence property, ∃ x0 ∈ Q such that
P ◦ ϕ(x0 ) = θ.
Thus, S and ∂Q are linked.
Example 15.3 Let X be a Banach space, X1 be a finite dimensional linear sub-
space, X2 be its complementary space, i.e. X = X1 ⊕ X2 . Let e ∈ X2 with
kek = 1, R > ρ > 0. Let
S = X2 ∩ ∂Bρ (θ)
and
Q = {x1 + te | (x1 , t) ∈ X1 × R1+ , kx1 k2 + t2 ≤ R2 },
then
∂Q = (BR (θ) ∩ X1 ) ∪ (∂BR (θ) ∩ (X1 ⊕ R1 e))+ ,
where
(∂BR (θ) ∩ (X1 ⊕ R1 e))+ = {x1 + te | (x1 , t) ∈ X1 × R1+ , kx1 k2 + t2 = R2 },
(see Figure 15.3).
Fig. 15.3
We want to show S and ∂Q link. Clearly, S ∩ ∂Q = ∅. It remains to show
∀ ϕ ∈ C(Q, X) satisfying ϕ|∂Q = id|∂Q , ∃ x0 ∈ Q such that
P ◦ ϕ(x0 ) = θ, ‖ϕ(x0 )‖ = ρ,
where P : X → X1 is a projection operator. This is equivalent to showing
P ◦ ϕ(x1 + se) = θ, ‖(I − P ) ◦ ϕ(x1 + se)‖ = ρ,
where x1 + se = x0 .
We define the deformation F ∈ C([0, 1] × Q, X1 × R1 e) via
F (t, x1 + se) = [(1 − t)x1 + tP ◦ ϕ(x1 + se)]
+ [(1 − t)s + tk(I − P ) ◦ ϕ(x1 + se)k − ρ]e,
then
F (1, x1 + se) = P ◦ ϕ(x1 + se) + [k(I − P ) ◦ ϕ(x1 + se)k − ρ]e,
F (0, x1 + se) = x1 + (s − ρ)e.
Moreover, when x1 + se ∈ ∂Q,
F (t, x1 + se) = x1 + (s − ρ)e 6= θ.
It follows from the homotopy invariance that
deg(F (1, ·), Q, θ) = deg(F (0, ·), Q, θ).
The latter can be computed directly by using property (5):
$$F(0, \cdot)^{-1}(\theta, 0) = (\theta, \rho), \qquad \det \frac{\partial(x_1, s - \rho)}{\partial(x_1, s)} = 1.$$
Thus,
deg(F (1, ·), Q, θ) = 1.
Next, by Kronecker’s existence property, the equation
F (1, x1 + se) = (θ, 0)
has a solution. Thus, S and ∂Q are linked.
Following the same proof of the Mountain Pass Theorem, we can also prove
the following.
Theorem 15.2 Let X be a Banach space, Q ⊂ X be compact manifold with
boundary ∂Q, and S ⊂ X be a closed subset linked with ∂Q. Let f ∈ C 1 (X, R1 ).
Suppose there exist α < β such that
$$\sup_{x \in \partial Q} f(x) \le \alpha < \beta \le \inf_{x \in S} f(x). \tag{15.9}$$
Let
Γ = {ϕ ∈ C(Q, X) | ϕ|∂Q = id|∂Q } (15.10)
and
$$c = \inf_{\varphi \in \Gamma} \max_{\xi \in Q} f(\varphi(\xi)). \tag{15.11}$$
If f also satisfies the PSc condition, then c (≥ β) is a critical value of f .
15.2 Applications
Both the Mountain Pass Theorem and the linking theorem have numerous ap-
plications in variational problems. However, we will only exhibit their usefulness
by a few examples to whet the reader’s appetite.
Example 15.4 Given a continuous function a on the real line with period T > 0,
define the potential function
$$V(t, x) = -\frac{1}{2}|x|^2 + \frac{a(t)}{p+1}|x|^{p+1}. \tag{15.12}$$
Suppose p > 1, a(t) ≥ α > 0. Find a non-trivial T -periodic solution
$x \in C^2([0, T], \mathbb{R}^N)$ of the system
ẍ + Vx (t, x) = 0. (15.13)
We define, on the space $H^1_{per}((0, T), \mathbb{R}^N) := \{x \in H^1((0, T), \mathbb{R}^N) \mid x(0) = x(T)\}$,
the functional
$$I(x) = \int_0^T \left[ \frac{1}{2}(|\dot x|^2 + |x|^2) - \frac{a(t)}{p+1}|x|^{p+1} \right] dt. \tag{15.14}$$
It is clear that (15.13) is its E-L equation.
We claim x = θ is a local minimum of I, therefore it is a trivial solution of
(15.13). First, we note
I(θ) = 0.
Next, by the embedding theorem, $\|x\|_{p+1} \le C\|x\|$, where C is a constant and
$\|x\| = \left( \int_0^T [\,|\dot x|^2 + |x|^2\,]\, dt \right)^{\frac{1}{2}}$. Since ∃ M > 0 such that |a(t)| ≤ M , ∀ t ∈ [0, T ],
it follows that
$$\int_0^T (p+1)^{-1} a(t)|x(t)|^{p+1}\, dt \le M C^{p+1} \|x\|^{p+1}.$$
Choose ε > 0 so small that whenever x ∈ Bε (θ)\{θ}, it holds that
$$I(x) = \frac{1}{2}\|x\|^2 - \int_0^T (p+1)^{-1} a(t)|x(t)|^{p+1}\, dt \ge \frac{1}{4}\|x\|^2.$$
Thus, x = θ is a local minimum.
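In the following, we want to obtain a non-trivial critical point by the
Mountain Pass Theorem. To do so, we choose a low point outside the “valley”:
$e = \lambda \xi \sin\frac{2\pi t}{T}$, $\xi = (1, 1, \ldots, 1) \in \mathbb{R}^N$. For λ > 0 sufficiently large, we have
$$I(e) = \frac{\lambda^2 N T}{4} \left[ \left( \frac{2\pi}{T} \right)^2 + 1 \right] - \lambda^{p+1} N^{\frac{p+1}{2}} (p+1)^{-1} \int_0^T a(t) \left| \sin\frac{2\pi t}{T} \right|^{p+1} dt < 0.$$
Let
$$\Gamma = \{ l \in C([0, 1], H^1_{per}((0, T), \mathbb{R}^N)) \mid l(0) = \theta,\ l(1) = e \}$$
and
$$c = \inf_{l \in \Gamma} \sup_{t \in [0,1]} I(l(t)) \ge \frac{1}{4}\varepsilon^2;$$
if we can verify the PSc condition, then c is a critical value.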
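The mountain pass geometry of the functional (15.14) can be observed numerically along the ray λ ↦ λ sin(2πt/T). The sketch below uses the illustrative choices N = 1, T = 2π, p = 3 and a(t) ≡ 1 (none of which are prescribed by the text) and a midpoint rule; with these choices the exact value is I = πλ² − (3π/16)λ⁴.

```python
import math

def I_on_ray(lam, T=2*math.pi, p=3, a=lambda t: 1.0, n=4000):
    # Midpoint-rule approximation of I(x) for x(t) = lam*sin(2*pi*t/T), N = 1.
    w = 2*math.pi/T
    h = T/n
    total = 0.0
    for k in range(n):
        t = (k + 0.5)*h
        x = lam*math.sin(w*t)
        dx = lam*w*math.cos(w*t)
        total += (0.5*(dx*dx + x*x) - a(t)*abs(x)**(p + 1)/(p + 1))*h
    return total

for lam in (0.5, 1.0, 2.0, 3.0):
    print(lam, I_on_ray(lam))
```

For small λ the quadratic term dominates and I > 0, while for large λ the term of order λ^{p+1} takes over and I < 0, which is exactly the "low point outside the valley" used above.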
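Suppose {xn } is a PS sequence with
$$I(x_n) \to c, \qquad \|I'(x_n)\| = \sup_{\|\varphi\| = 1} |dI(x_n, \varphi)| \to 0.$$
We want to show that {xn } has a convergent subsequence. Note
$$dI(x_n, \varphi) = \int_0^T [\, \dot x_n \cdot \dot\varphi + x_n \cdot \varphi - a(t)|x_n|^{p-1} x_n \cdot \varphi \,]\, dt \to 0, \quad \forall\, \varphi \in H^1_{per}((0, T), \mathbb{R}^N). \tag{15.15}$$
In (15.15), by taking ϕ = xn , it yields
$$\int_0^T [\,|\dot x_n|^2 + |x_n|^2 - a(t)|x_n|^{p+1}\,]\, dt = o(\|x_n\|) \tag{15.16}$$
and
$$\int_0^T \left[ \frac{|\dot x_n|^2 + |x_n|^2}{2} - a(t)\frac{|x_n|^{p+1}}{p+1} \right] dt \to c\ (\neq 0). \tag{15.17}$$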
Comparing (15.16) and (15.17), it yields
$$\left( \frac{1}{2} - \frac{1}{p+1} \right) \int_0^T (|\dot x_n|^2 + |x_n|^2)\, dt = C_1 + o(\|x_n\|),$$
where C1 is a constant. Hence, ‖xn ‖ is bounded and there exists a subsequence,
still denoted by xn , such that
$$x_n \rightharpoonup x_0 \quad \text{in } H^1_{per}((0, T), \mathbb{R}^N).$$
Lastly, we verify xn → x0 in $H^1_{per}((0, T), \mathbb{R}^N)$ as follows.
By the embedding theorem,
$$x_n \to x_0 \quad \text{in } L^\infty((0, T), \mathbb{R}^N).$$
Furthermore, by the assumption of the PS sequence,
$$x_n - \left( -\frac{d^2}{dt^2} + 1 \right)^{-1} \big( a(t)|x_n|^{p-1} x_n \big) \to 0 \quad \text{in } H^1_{per}((0, T), \mathbb{R}^N).$$
Thus, xn converges strongly in $H^1_{per}((0, T), \mathbb{R}^N)$, and the limit is x0 .
We have now verified all conditions stated in the Mountain Pass Theorem, so
this problem has a non-trivial solution $x_0 \in H^1_{per}((0, T), \mathbb{R}^N)$. Furthermore, by
regularity, we see that $x_0 \in C^2_{per}((0, T), \mathbb{R}^N)$.
Remark 15.3 The same proof also applies to the case where
$$V(t, x) = -\frac{c}{2}|x|^2 + \frac{a(t)}{p+1}|x|^{p+1},$$
for c > 0.
Example 15.5 In Example 15.4, we change the potential to be
$$V(t, x) = \frac{c}{2}|x|^2 + \frac{a(t)}{p+1}|x|^{p+1}, \tag{15.12'}$$
where c > 0, and p > 1, a(t) ≥ α > 0. Find a non-trivial T -periodic solution of
(15.13).
We still work with the space $H^1_{per}((0, T), \mathbb{R}^N)$, but we change the functional
to be
$$I(x) = \int_0^T \left[ \frac{1}{2}(|\dot x|^2 - c|x|^2) - \frac{a(t)}{p+1}|x|^{p+1} \right] dt. \tag{15.14'}$$
It is easy to see that x = θ is still a critical point. However, it is no longer a
minimum. In order to find a non-trivial critical point, we must consider how the
different level sets of the functional are actually linked.
We linearize Eq. (15.13) and the corresponding linear equation is

−ẍ(t) = cx(t), t ∈ [0, T ],
together with periodic boundary conditions.
We first turn our attention to the eigenvalue problem
−ẍ(t) = λx(t), t ∈ [0, T ], (15.18)
where x(0) = x(T ) and ẋ(0) = ẋ(T ).
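In Lecture 12, it was shown that the eigenvalues are
$$\lambda_k = \left( \frac{2k\pi}{T} \right)^2, \quad k = 0, 1, 2, \ldots$$
with corresponding eigenfunctions
$$\cos\frac{2k\pi t}{T} \otimes e, \qquad \sin\frac{2k\pi t}{T} \otimes f,$$
where e, f ∈ $\mathbb{R}^N$ for k ≥ 1; e ∈ $\mathbb{R}^N$ for k = 0. For each k ≥ 1, we denote
$$E_k = \operatorname{span}\left\{ \cos\frac{2k\pi t}{T} \otimes e,\ \sin\frac{2k\pi t}{T} \otimes f \;\Big|\; e, f \in \mathbb{R}^N \right\},$$
which is a 2N dimensional subspace of $H^1_{per}((0, T), \mathbb{R}^N)$, and $E_0 = \mathbb{R}^N$ is an N
dimensional subspace.
We have the direct sum decomposition of the space $X = H^1_{per}((0, T), \mathbb{R}^N)$:
X = X1 ⊕ X2 , where
$$X_1 = \bigoplus_{j=0}^{k} E_j, \tag{15.19}$$
X2 is the orthogonal complement of X1 , and k = max{j ∈ N | λj ≤ c}.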
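For concrete values of c and T, the index k in (15.19) and the dimension of X1 are easy to compute from λ_j = (2jπ/T)², dim E_0 = N and dim E_j = 2N for j ≥ 1. The helper names below are illustrative, and the sketch assumes c > 0 as in (15.12′).

```python
import math

def eigenvalue(j, T):
    # lambda_j = (2*j*pi/T)^2 for the periodic problem -x'' = lambda*x on [0, T].
    return (2*j*math.pi/T)**2

def split_index(c, T):
    # k = max{ j >= 0 : lambda_j <= c }; j = 0 always qualifies because c > 0.
    k = 0
    while eigenvalue(k + 1, T) <= c:
        k += 1
    return k

def dim_X1(c, T, N):
    # dim X1 = dim E_0 + sum_{j=1}^{k} dim E_j = N + 2*N*k.
    return N + 2*N*split_index(c, T)

T = 2*math.pi           # then lambda_j = j^2
print(split_index(5.0, T), dim_X1(5.0, T, 3))
```

With T = 2π and c = 5 we get λ_2 = 4 ≤ 5 < 9 = λ_3, so k = 2 and, for N = 3, dim X1 = 15; the gap λ_{k+1} − c > 0 is what the estimate on X2 relies on.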

On the subspace X2 ,
$$\int_0^T |\dot x(t)|^2\, dt \ge \lambda_{k+1} \int_0^T |x(t)|^2\, dt.$$
Hence,
$$I(x) \ge \frac{1}{2}\left( 1 - \frac{c}{\lambda_{k+1}} \right) \int_0^T |\dot x(t)|^2\, dt - \frac{1}{p+1} \int_0^T a(t)|x(t)|^{p+1}\, dt.$$
In Example 15.4, we estimated
$$\frac{1}{p+1} \int_0^T a(t)|x(t)|^{p+1}\, dt \le M C^{p+1} \|x\|^{p+1} = o(\|x\|^2) \quad (x \to \theta).$$
Since $1 - \frac{c}{\lambda_{k+1}} > 0$, there exist ρ > 0 and β > 0 such that
I(x) ≥ β, for x ∈ ∂Bρ (θ) ∩ X2 . (15.20)
Let S = ∂Bρ (θ) ∩ X2 .
On X1 , we always have
$$I(x) \le \int_0^T \frac{1}{2}(|\dot x|^2 - c|x|^2)\, dt \le \int_0^T \frac{1}{2}(\lambda_k - c)|x|^2\, dt \le 0.$$
We now take $e = \cos\frac{2(k+1)\pi t}{T} \otimes e_1$, where $e_1 = (1, 0, \ldots, 0) \in \mathbb{R}^N$. Since all
norms on a finite dimensional space are equivalent, on the space X1 ⊕ R1 e, as
‖x‖ → ∞, the following holds uniformly
$$I(x) \le (\lambda_{k+1} - c) \int_0^T |x|^2\, dt - \frac{\alpha}{p+1} \int_0^T |x|^{p+1}\, dt \to -\infty.$$
Next, as in Example 15.3, we take
$$Q = \{ x_1 + te \mid (x_1, t) \in X_1 \times \mathbb{R}^1_+,\ \|x_1\|^2 + t^2 \le R^2 \}.$$
For R > ρ > 0 sufficiently large,
I|∂Q ≤ 0. (15.21)
Additionally, according to Example 15.3, we see that S and ∂Q are linked.
Lastly, we verify the PSc condition. The argument is similar to that of
Example 15.4. The only difference is that it is less direct to verify the boundedness
of the PS sequence in $H^1_{per}((0, T), \mathbb{R}^N)$. Suppose $\{x_j\} \subset H^1_{per}((0, T), \mathbb{R}^N)$
satisfies I(xj ) → c and I ′(xj ) → θ, then
$$\int_0^T [\,|\dot x_j(t)|^2 - c|x_j(t)|^2\,]\, dt = C_1 + o(\|x_j\|)$$
and
$$\int_0^T a(t)|x_j(t)|^{p+1}\, dt = C_2 + o(\|x_j\|).$$
By Hölder’s inequality, we have
$$\int_0^T |x_j(t)|^2\, dt \le \left( \int_0^T a(t)|x_j(t)|^{p+1}\, dt \right)^{\frac{2}{p+1}} \left( \int_0^T a^{-\frac{2}{p-1}}(t)\, dt \right)^{\frac{p-1}{p+1}}.$$
Combining the three relations above, it follows that
$$\int_0^T |\dot x_j(t)|^2\, dt = C_3 + o(\|x_j\|),$$
whence ‖xj ‖ ≤ C4 .
According to (15.20), (15.21), and Theorem 15.2, we established that I has a
critical value c ≥ β > 0, which corresponds to a non-trivial solution of (15.13).
Similar methods can also be applied to partial differential equations.
Example 15.6 Let Ω ⊂ Rn be a bounded domain. Let 1 < p < 2∗ −1, a ∈ C(Ω̄),
and a(x) ≥ α > 0, ∀ x ∈ Ω. Find a weak solution of the equation
$$-\Delta u(x) = a(x)|u(x)|^{p-1} u(x), \tag{15.22}$$
where u ∈ H01 (Ω). On H01 (Ω), define the functional
$$I(u) = \int_\Omega \left[ \frac{1}{2}|\nabla u(x)|^2 - \frac{a(x)}{p+1}|u(x)|^{p+1} \right] dx; \tag{15.23}$$
(15.22) is its E-L equation.
Clearly, u = θ is a trivial solution. We want to find a non-trivial critical point
of I. By Poincaré’s inequality,
$$I(u) \ge \frac{1}{2} \int_\Omega |\nabla u(x)|^2\, dx - o(\|u\|^2) \ge \frac{1}{4} \int_\Omega |\nabla u(x)|^2\, dx \quad \text{as } \|u\| \to 0,$$
so for r > 0 sufficiently small,
$$I|_{\partial B_r(\theta)} \ge \frac{r^2}{4}.$$
Choose any nonzero function ϕ ∈ H01 (Ω); then, as t → +∞,
$$I(t\varphi) = t^2 \int_\Omega \frac{1}{2}|\nabla\varphi(x)|^2\, dx - t^{p+1} \int_\Omega \frac{a(x)}{p+1}|\varphi(x)|^{p+1}\, dx \to -\infty.$$
If we take p0 = θ and p1 = tϕ, then for t sufficiently large, we will have the
desired geometric structure in the Mountain Pass Theorem.
It remains to verify the PSc condition. Suppose {uj } satisfies I(uj ) → c and
I ′(uj ) → θ in H01 (Ω), we want to show that it has a convergent subsequence. In
fact, we have
$$\int_\Omega \left[ \frac{1}{2}|\nabla u_j(x)|^2 - \frac{a(x)}{p+1}|u_j(x)|^{p+1} \right] dx \to c \tag{15.24}$$
and
$$\int_\Omega [\, \nabla u_j(x) \cdot \nabla\varphi(x) - a(x)|u_j(x)|^{p-1} u_j(x)\varphi(x) \,]\, dx = o(\|\varphi\|), \quad \forall\, \varphi \in H_0^1(\Omega). \tag{15.25}$$
Substituting ϕ = uj into the equation, it yields
$$\int_\Omega [\,|\nabla u_j(x)|^2 - a(x)|u_j(x)|^{p+1}\,]\, dx = o(\|u_j\|). \tag{15.26}$$
Combining (15.24) and (15.26), it follows that
$$\left( \frac{1}{2} - \frac{1}{p+1} \right) \int_\Omega |\nabla u_j(x)|^2\, dx = C + o(\|u_j\|).$$
Thus, {uj } is a bounded sequence in H01 (Ω). Consequently, it has a weakly
convergent subsequence $u_{j'} \rightharpoonup u_0$.
We prove $u_{j'} \to u_0$ strongly in H01 (Ω). From (15.25), we see that
$$u_{j'} = (-\Delta)^{-1}\big( a|u_{j'}|^{p-1} u_{j'} + o(\|u_{j'}\|) \big).$$
We notice
(1) $H_0^1(\Omega) \hookrightarrow L^{p+1}(\Omega)$ is compact,
(2) $u \mapsto a|u|^{p-1} u$ gives a bounded continuous mapping $L^{p+1}(\Omega) \to L^{\frac{p+1}{p}}(\Omega) \hookrightarrow (H_0^1(\Omega))^*$,
(3) $(-\Delta)^{-1} \in L((H_0^1)^*, H_0^1)$ is continuous;
hence, $u_{j'} \to u_0$ strongly in H01 (Ω).
This completes the verification of the PSc condition. By the Mountain Pass
Theorem, we have the critical value $c \ge \frac{r^2}{4} > 0$, which corresponds to a
non-trivial critical point.
When c < 0, the equation
$$-\Delta u(x) = c\,u(x) + a(x)|u(x)|^{p-1} u(x) \tag{15.22'}$$
shares the same conclusion as that of (15.22). The proof is identical, hence
omitted.
When c > 0, we can apply the linking theorem to prove (15.22′) has a
non-trivial solution.
Remark 15.4 Both the Mountain Pass Theorem and its generalization — the
linking theorem are special cases of a more general minimax principle. The
original minimax principle can be traced back to G.D. Birkhoff while studying
closed geodesics. It then underwent a systematic development by L. Liusternik,
L. Schnirelmann, and M. Krasnoselski, and it has since become an important
part of critical point theory. In the 1960s, R.S. Palais extended the Liusternik–
Schnirelmann theory to infinite dimensional manifolds. Since the work of A.
Ambrosetti and P.H. Rabinowitz on the Mountain Pass Theorem, minimax prin-
ciple has matured rapidly and found numerous applications. At the same time,
vast development also took place in the parallel branch of critical point theory —
Morse Theory. Together they forge the new area of global variational calculus
or topological variational methods. However, due to its broad connection with
other subjects, we will omit it from this book and refer the interested reader to
references such as [Ch], [MW], [St1], and [Ra1], etc.

Lecture 16
Periodic solutions, homoclinic and

heteroclinic orbits
16.1 The simple pendulum
We begin this lecture with the motion of a pendulum (see Figure 16.1)
to introduce the concepts of periodic solutions, homoclinic and heteroclinic
orbits.
Fig. 16.1
A small ball of mass m is attached to a pendulum of length l and left to swing
freely from side to side under gravity. Denote the angle of the pendulum from
its stationary position by ϕ; then the kinetic energy is $\frac{1}{2} m(l\dot\varphi)^2$ and the potential
energy is mgl(1 − cos ϕ). The Lagrangian is $\frac{1}{2} m(l\dot\varphi)^2 - mgl(1 - \cos\varphi)$ and its
dynamical equation is
ϕ̈ + α sin ϕ = 0,
where $\alpha = \omega_0^2 = \frac{g}{l}$.
On the phase plane (x, y) = (ϕ, ϕ̇) (see Figure 16.2), we consider the system
$$\begin{cases} \dot x = y, \\ \dot y = -\alpha \sin x. \end{cases}$$
Fig. 16.2
Its energy is given by
$$E = \frac{m}{2} l^2 y^2 + mgl(1 - \cos x).$$
Figure 16.2 shows the level sets of the energy as well as the equilibrium
points (0, 0) and (±π, 0).
When A ∈ (0, π), the energy at the initial point (x, y) = (A, 0) is E =
mgl(1 − cos A), and the corresponding orbit is given by
$$y = \pm\omega_0 \sqrt{2(\cos x - \cos A)}.$$
The pendulum’s motion is periodic with period
$$T(A) = \frac{2\sqrt{2}}{\omega_0} \int_0^A \frac{dx}{\sqrt{\cos x - \cos A}}.$$
Furthermore,
$$\lim_{A \to 0} T(A) = \frac{2\pi}{\omega_0}, \qquad \lim_{A \to \pi - 0} T(A) = \infty.$$
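The period integral has an integrable singularity at x = A, so a direct quadrature is awkward. A standard alternative, sketched here as an illustration rather than as the method of the text, uses the substitution sin(x/2) = k sin θ with k = sin(A/2), which turns the formula into T(A) = 4K(k)/ω0 with K the complete elliptic integral of the first kind, and then evaluates K through the arithmetic-geometric mean, K(k) = π / (2·AGM(1, √(1 − k²))).

```python
import math

def agm(a, b, tol=1e-14):
    # Arithmetic-geometric mean of two positive numbers.
    while abs(a - b) > tol*a:
        a, b = 0.5*(a + b), math.sqrt(a*b)
    return a

def period(A, omega0=1.0):
    # T(A) = 4*K(k)/omega0 with k = sin(A/2) and K(k) = pi/(2*AGM(1, sqrt(1 - k^2))).
    k = math.sin(A/2)
    return 2*math.pi/(omega0*agm(1.0, math.sqrt(1.0 - k*k)))

for A in (1e-6, 1.0, math.pi/2, 3.0, 3.14):
    print(A, period(A))
```

The printed values start at 2π/ω0 for small amplitudes and grow without bound as A approaches π, in agreement with the two limits above.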
The orbit: (x(t), ẋ(t)) → (±π, 0) as t → ±∞, connecting (−π, 0) and (π, 0)
through the points (0, ±2ω0 ) in the upper or lower half plane is called a hetero-
clinic orbit.
If, in addition, we assume |B| > 2ω0 , the solution with energy $E = \frac{m l^2 B^2}{2}$
passing through the initial point (x, y) = (0, B) is given by
$$y = \pm\sqrt{B^2 - 2\omega_0^2 (1 - \cos x)}.$$
These are periodic curves in the upper and lower half planes respectively; if
we let
$$t(x) = \int_0^x \frac{ds}{\sqrt{B^2 - 2\omega_0^2 (1 - \cos s)}}$$
and
τ (B) = t(2π),
then
$$p(x) = t(x) - \frac{\tau(B)}{2\pi}\, x$$
is a 2π-periodic function. We have
$$\begin{cases} x(t + \tau(B)) = x(t) + 2\pi, \\ y(t + \tau(B)) = y(t). \end{cases}$$
In this sense, we call it a periodic solution of the second kind.
In a dynamical system, an orbit x(t) → p as t → ±∞ connecting a saddle
equilibrium point p to itself is called a homoclinic orbit (see Figure 16.3).
Fig. 16.3
16.2 Periodic solutions
In nonlinear oscillations, we are concerned with the periodic solutions of the
following equation
ü + ∇u V (t, u) = 0, u(0) = u(T ), u̇(0) = u̇(T ). (16.1)
For example, the gravitational acceleration g as a result of the moon’s gravitational
force (tidal force) is a T -periodic function, and the corresponding potential energy is
$$V(t, u) = \frac{g(t)}{2\pi} \cos 2\pi u.$$
Generally speaking, we assume $V \in C^1([0, T] \times \mathbb{R}^N, \mathbb{R}^1)$ and introduce the
Lagrangian
$$L(t, u, p) = \frac{|p|^2}{2} - V(t, u),$$
whose associated functional is
$$I(u) = \int_0^T \left[ \frac{|\dot u|^2}{2} - V(t, u) \right] dt.$$
We regard (16.1) as the E-L equation of I and choose the underlying space to be
$$H^1_{per}([0, T], \mathbb{R}^N) = \{ u \in H^1([0, T], \mathbb{R}^N) \mid u(0) = u(T) \}.$$
We consider the following simple cases separately.
I. Suppose u ↦ −V (t, u) is continuous and convex; furthermore,
$$F(u) = -\int_0^T V(t, u)\, dt \to +\infty, \quad \text{as } \|u\|_{\mathbb{R}^N} \to \infty. \tag{16.2}$$
For example, let N = 1 and $V(t, u) = -|u|^p (1 + \varepsilon \sin t)$, where p > 1 and |ε| < 1. Since
$$-V_{uu} = p(p-1)|u|^{p-2}(1 + \varepsilon \sin t) > 0 \quad (u \neq 0),$$
I is convex. Since I is convex and lower semi-continuous, it is weakly lower
semi-continuous, and hence weakly sequentially lower semi-continuous. We prove
I is bounded below and coercive as follows.
Since u ↦ −V (t, u) is continuous and convex, F is a continuous and convex
function on $\mathbb{R}^N$. So, by (16.2), ∃ x0 ∈ $\mathbb{R}^N$ such that F achieves its minimum at x0 .
We then have
$$0 = F'(x_0) = -\int_0^T V_u(t, x_0)\, dt. \tag{16.3}$$
By the convexity of −V , we have the inequality
$$-V(t, u) \ge -\big( V(t, x_0) + V_u(t, x_0)(u - x_0) \big). \tag{16.4}$$
$\forall\, u \in H^1_{per}(0, T)$, we have the decomposition u = ũ + ū, where $\bar u = \frac{1}{T} \int_0^T u\, dt$.
Combining (16.3) and (16.4), we have
$$-\int_0^T V(t, u(t))\, dt \ge -\int_0^T \big( V(t, x_0) + V_u(t, x_0)(u(t) - x_0) \big)\, dt
= -\int_0^T \big( V(t, x_0) + V_u(t, x_0)(u(t) - \bar u) \big)\, dt. \tag{16.5}$$
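Let
$$c_1 = \int_0^T V(t, x_0)\, dt \quad \text{and} \quad c_2 = \left( \int_0^T |V_u(t, x_0)|^2\, dt \right)^{\frac{1}{2}},$$
then
$$I(u) \ge \frac{1}{2} \int_0^T |\dot{\tilde u}|^2\, dt - \int_0^T [\,V(t, x_0) + V_u(t, x_0)\tilde u(t)\,]\, dt
\ge \frac{1}{2} \int_0^T |\dot{\tilde u}|^2\, dt - c_1 - c_2 \left( \int_0^T |\tilde u|^2\, dt \right)^{\frac{1}{2}}.$$
By Wirtinger’s inequality, $\int_0^T |\tilde u|^2\, dt \le c_3^2 \int_0^T |\dot{\tilde u}|^2\, dt$, it follows that
$$I(u) \ge \frac{1}{2} \int_0^T |\dot{\tilde u}|^2\, dt - c_1 - c_2 c_3 \left( \int_0^T |\dot{\tilde u}|^2\, dt \right)^{\frac{1}{2}}
\ge \frac{1}{4} \int_0^T |\dot{\tilde u}|^2\, dt - c_4.$$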
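The constant c4 in the last inequality is not specified; one admissible choice can be read off by completing the square. The abbreviations s and b below are introduced only for this computation.

```latex
% Write s = \Big(\int_0^T |\dot{\tilde u}|^2\,dt\Big)^{1/2} and b = c_2 c_3. Then
\frac{1}{2}s^2 - c_1 - b\,s
  = \frac{1}{4}s^2 - (c_1 + b^2) + \Big(\frac{s}{2} - b\Big)^2
  \;\ge\; \frac{1}{4}s^2 - (c_1 + b^2),
% so the estimate holds with
c_4 = c_1 + c_2^2 c_3^2 .
```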
Consequently, the coerciveness of I is determined by whether ū can be bounded
by I(u). By convexity,
$$-V\!\left(t, \frac{\bar u}{2}\right) = -V\!\left(t, \frac{u(t) + (-\tilde u(t))}{2}\right) \le -\frac{1}{2}\big( V(t, u(t)) + V(t, -\tilde u(t)) \big),$$
it follows that
$$I(u) \ge \frac{1}{2} \int_0^T |\dot{\tilde u}|^2\, dt - 2\int_0^T V\!\left(t, \frac{\bar u}{2}\right) dt + \int_0^T V(t, -\tilde u(t))\, dt.$$
Since $\|\tilde u\|_\infty \le C\|\tilde u\|_{H^1}$ is bounded by I(u), $-\int_0^T V(t, -\tilde u(t))\, dt$ is also bounded
by I(u), whence
$$-\int_0^T V\!\left(t, \frac{\bar u}{2}\right) dt \le c_5 I(u) + c_6.$$
By (16.2), ū is bounded by I(u); this proves that I is coercive.
From this, we see that Eq. (16.1) has a solution $u \in H^1_{per}([0,T])$. By regularity, $u \in C^2$.
Lastly, we examine the periodic condition. By the embedding theorem and $u \in H^1_{per}([0,T])$, $u(0) = u(T)$. It remains to show $\dot u(0) = \dot u(T)$. In the integral form of the E-L equation, we choose a $T$-periodic function $\varphi \in C^\infty([0,T])$; then we have
\[ \begin{aligned}
0 &= \int_0^T [\dot u\dot\varphi - V_u(t,u)\varphi]\,dt \\
&= \int_0^T [-\ddot u - V_u(t,u)]\varphi\,dt + \dot u\varphi\big|_0^T \\
&= \dot u(T)\varphi(T) - \dot u(0)\varphi(0).
\end{aligned} \]
Since $\varphi(0) = \varphi(T)$ is arbitrary, we have shown
\[ \dot u(T) = \dot u(0). \]
Theorem 16.1 Under the assumption of (16.2), Eq. (16.1) has a C 2 periodic
solution.
II. V is continuous and periodic. Suppose there exist linearly independent
vectors e1 , . . . , eN ∈ RN such that
V (t, u + ei ) = V (t, u), ∀ (t, u) ∈ [0, T ] × RN . (16.6)
We must again verify that
\[ I(u) = \int_0^T \Big(\frac12|\dot u(t)|^2 + V(t,u(t))\Big)\,dt \]
is weakly sequentially lower semi-continuous and coercive.
According to Morrey's Theorem in Lecture 11, $I$ is weakly sequentially lower semi-continuous.
Since V is continuous and satisfies (16.6), there exists a constant C such that
|V (t, u)| ≤ C.
However, we cannot deduce the coerciveness of $I$ directly from
\[ I(u) \ge \frac12\int_0^T |\dot u|^2 - CT. \]
Thus, we choose the decomposition
\[ u = \tilde u + \bar u, \]
where $\bar u = \frac{1}{T}\int_0^T u(t)\,dt$. Denote
\[ X = \{u \in H^1_T([0,T]) \mid \bar u = 0\}, \]
then $X$ is a closed linear subspace of $H^1_{per}([0,T])$.
It follows from Wirtinger's inequality that
\[ \int_0^T |\dot u|^2 = \int_0^T |\dot{\tilde u}|^2 \ge \alpha\int_0^T |\tilde u|^2; \]
namely, $I$ is coercive on $X$.
Suppose $\{u_j\}$ is a minimizing sequence of $I$. We decompose $u_j = \tilde u_j + \bar u_j$; then $\{\tilde u_j\}$ has a weakly convergent subsequence.
Noting that $V$ is periodic, $V(u + \sum \lambda_i e_i) = V(u)$, hence
\[ I\Big(u + \sum \lambda_i e_i\Big) = I(u), \quad \forall\, (\lambda_1,\dots,\lambda_N) \in \mathbb{Z}^N. \]
Although $\{\bar u_j\}$ could be unbounded, it becomes bounded after removing the integer parts, i.e. $\exists\, (\lambda_1^{(j)},\dots,\lambda_N^{(j)}) \in \mathbb{Z}^N$ such that
\[ \Big\|\bar u_j + \sum_{i=1}^N \lambda_i^{(j)} e_i\Big\|_{\mathbb{R}^N} \le \sum_{i=1}^N \|e_i\| \triangleq A. \]
Let $v_j = \tilde u_j + (\bar u_j + \sum \lambda_i^{(j)} e_i)$; then $I(v_j) = I(u_j)$, and $\{v_j\}$ is a bounded minimizing sequence. Consequently, it has a weakly convergent subsequence, still denoted $\{v_j\}$, such that
\[ v_j \rightharpoonup u^*. \]
The same argument can be used to show u∗ is a minimum. Likewise, the same
steps as above can be used to verify u∗ is a periodic solution. Lastly, by regularity,
u∗ ∈ C 2 (R1 ).
Theorem 16.2 Under the assumption of (16.6), Eq. (16.1) has a C 2 periodic
solution.
III. Periodic solutions on the torus
Using the same methods as above, we can study the periodic solutions of the
E-L equation of a functional defined on the torus $T^N = \mathbb{R}^N/\mathbb{Z}^N$. Given a Lagrangian on the torus
\[ L(t,u,p) : T^1 \times T^N \times \mathbb{R}^N \to \mathbb{R}^1 \]
which satisfies
\[ L(t + \mathbb{Z}, u + \mathbb{Z}^N, p) = L(t,u,p) \]
and the following conditions
\[ \begin{cases}
c^{-1} \le L_{pp} \le c, \\
|L_{pt}| + |L_{pu}| \le c(1 + |p|), \\
|L_u| \le c(1 + |p|^2),
\end{cases} \]
then for the functional
\[ I(u) = \int_0^1 L(t,u(t),\dot u(t))\,dt, \]
its E-L equation
\[ \int_0^1 \{L_p(t,u(t),\dot u(t))\dot\varphi(t) + L_u(t,u(t),\dot u(t))\varphi(t)\}\,dt = 0, \quad \forall\, \varphi \in H^1_{per}([0,1],\mathbb{R}^N) \]
has a periodic solution $u \in H^1_{per}([0,1],\mathbb{R}^N)$. Moreover, by regularity, we know $u \in C^2([0,1],\mathbb{R}^N)$.
IV. Mq,p -periodic solutions of the second kind on the torus
Under the same assumptions as before, we consider periodic solutions of the
second kind.
$\forall\, (q,p) \in \mathbb{Z}^2$, $q \neq 0$, $u$ is called a periodic solution of type $(q,p)$, if
\[ u(t + q) = u(t) + p. \]
A periodic solution of the free oscillation of a pendulum is a type (1, 0) periodic
solution, whereas a periodic solution of the second kind is a type (1, 1) periodic
solution.
In $M_{q,p} = \frac{p}{q}\,t + W^{1,2}_{per}([0,q])$, $t \in \mathbb{R}^1$, the minimum of
\[ I(u) = \int_0^q L(t,u(t),\dot u(t))\,dt \]
is a desired type (q, p) periodic solution.
Using the same minimization technique, the same argument affirms the
existence of the minimum of I.
16.3 Heteroclinic orbits
By definition, a heteroclinic orbit is an orbit connecting the (non-degenerate) zeros of a vector field. For instance, given a Lagrangian
\[ L(t,u,p) = \frac12|p|^2 - V(t,u), \]
suppose
1) V ∈ C 2 (RN , R1 );
2) V is T periodic in t;
3) V (t, u) ≤ 0; there are only two non-degenerate maxima θ and ξ such that
V (t, θ) = V (t, ξ) = 0, Vu (t, θ) = Vu (t, ξ) = 0,
where Vuu (t, θ) and Vuu (t, ξ) are both negative definite;
4) $\exists\, V_0 < 0$ such that
\[ \limsup_{|u| \to \infty} V(t,u) \le V_0. \]
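Find a solution connecting $\theta$ and $\xi$ of the following equation:
\[ \begin{cases}
\ddot u + V_u(t,u) = 0, \quad t \in \mathbb{R}^1, \\
\dot u(-\infty) = \dot u(+\infty) = 0, \quad u(-\infty) = \theta, \quad u(+\infty) = \xi.
\end{cases} \tag{16.7} \]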
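We solve this by variational methods. Choose the space
\[ \hat E = \Big\{u \in H^1_{loc}(\mathbb{R}^1,\mathbb{R}^N) \;\Big|\; \int_{\mathbb{R}^1} |\dot u|^2 < \infty\Big\}, \]
with norm
\[ \|u\|_{\hat E}^2 = \int_{\mathbb{R}^1} |\dot u|^2\,dt + |u(0)|^2. \]
We define the functional
\[ I(u) = \int_{-\infty}^{\infty} \Big(\frac12|\dot u|^2 - V(t,u)\Big)\,dt. \]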
I may take on value infinity on Ê. However, since we are only concerned with a
minimizing sequence {uj }, it suffices to consider the set on which the value of I
is finite. We define
\[ \Gamma(\theta,\xi) = \{u \in \hat E \mid u(-\infty) = \theta,\ u(+\infty) = \xi\}, \]
where $u(\pm\infty) = \lim_{t \to \pm\infty} u(t)$. We intend to find
\[ \min_{u \in \Gamma(\theta,\xi)} I(u). \]
Applying Morrey’s Theorem on any finite interval and by taking limits, it follows
that I is weakly sequentially lower semi-continuous.
Since I ≥ 0, it is bounded below.
1° We verify $I$ is coercive. Since
\[ \frac12\int_{\mathbb{R}^1} |\dot u|^2\,dt \le I(u) \le C, \]
it suffices to bound $|u(0)|$.
When $u \in \Gamma(\theta,\xi)$, we have
\[ |u(b) - u(a)| \le \int_a^b |\dot u(t)|\,dt \le (b-a)^{1/2}\Big(\int_a^b |\dot u(t)|^2\,dt\Big)^{1/2}. \]
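As long as
\[ -V(t,u(t)) \ge M_1, \tag{16.8} \]
we have
\[ \begin{aligned}
I(u) &\ge \int_a^b \Big(\frac12|\dot u|^2 - V(t,u)\Big)\,dt \\
&\ge \frac12\frac{|u(b) - u(a)|^2}{|b-a|} + M_1|b-a| \\
&\ge \sqrt{2M_1}\,|u(b) - u(a)|.
\end{aligned} \tag{16.9} \]
From assumptions (3) and (4), fixing $\varepsilon > 0$, there exists $M_1 > 0$ such that when
\[ u(t) \notin B_\varepsilon(\theta) \cup B_\varepsilon(\xi), \]
(16.8) holds.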
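Noting that $V$ is $T$-periodic in $t$, by letting
\[ (\tau_i u)(t) = u(t - iT), \quad \forall\, i \in \mathbb{Z}, \]
we have $I(\tau_i u) = I(u)$.
In order to bound $|u(0)|$, we use the fact that $I$ is $\tau_i$-invariant. $\forall\, u \in \Gamma(\theta,\xi)$, $\forall\, \varepsilon > 0$, $\exists\, i_0$ such that $\tau = \tau_{i_0}$ satisfies
\[ \tau u(t) \in B_\varepsilon(\theta), \ t < 0, \qquad \tau u(0) \in \partial B_\varepsilon(\theta). \]
Replacing $u$ by $\tilde u = \tau u$, it is immediate that $|\tilde u(0) - \theta| = \varepsilon$, so $|\tilde u(0)|$ is bounded. This shows $I$ is coercive.
Moreover, replacing any $u \in \Gamma(\theta,\xi)$ by $\tilde u$, we have $I(u) = I(\tilde u)$.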
2° Next, we verify $\Gamma(\theta,\xi)$ is weakly closed with respect to the minimizing sequence. That is, we want to prove: if $\{u_j\} \subset \Gamma(\theta,\xi)$, $u_j \rightharpoonup u$ and $I(u_j) \to \inf_{\Gamma(\theta,\xi)} I$, then $u \in \Gamma(\theta,\xi)$.
Since $u$ is the weak limit of the minimizing sequence and $I$ is weakly lower semi-continuous, it follows that
\[ I(u) \le \inf_{\Gamma(\theta,\xi)} I. \]
By (16.9), $u \in L^\infty(\mathbb{R}^1,\mathbb{R}^N)$. Hence, it has an $\omega$-limit point, i.e. $\exists\, t_i \to +\infty$, $\exists\, \alpha \in \mathbb{R}^N$ such that $u(t_i) \to \alpha$.
1) We claim the limit point is unique. Suppose $\exists\, t_i' \to +\infty$ such that $u(t_i') \to \beta$. Since $I(u) < \infty$, it follows from (16.9) that
\[ |u(t_i') - u(t_i)| \le \frac{1}{\sqrt{2M_1}}\Big|\int_{t_i}^{t_i'} \Big(\frac12|\dot u|^2 - V(t,u)\Big)\,dt\Big| \to 0, \]
hence $\alpha = \beta$.
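2) We claim $\alpha = \theta$ or $\xi$. We proceed by contradiction. Suppose not, then $\exists\, t_1 > 0$ such that when $t > t_1$, we have
\[ u(t) \notin B_\varepsilon(\theta) \cup B_\varepsilon(\xi). \]
Consequently,
\[ I(u) \ge -\int_{t_1}^\infty V(t,u(t))\,dt \ge \int_{t_1}^\infty M_1\,dt = \infty, \]
a contradiction.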
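3) We claim $\alpha = \xi$. Suppose not, then $\alpha = \theta$. $\forall\, \delta > 0$, $\exists\, t_\delta > 0$ such that when $t \ge t_\delta$, we have $u(t) \in B_\delta(\theta)$.
Since $u_j \rightharpoonup u \in \hat E$, according to the exposition in the last paragraph of 1°, we may choose $u_j$ such that $u_j(t) \in B_\varepsilon(\theta)$ for $t \le 0$ and $u_j(0) \in \partial B_\varepsilon(\theta)$. Choose $\delta < \varepsilon/4$ and $t_1 > t_\delta + 1$ such that $u(t_1) \in B_\delta(\theta)$.
$\exists\, j_0$ such that for $j \ge j_0$, $\|u_j - u\|_{L^\infty([0,t_1])} < \delta$, hence $u_j(t_1) \in B_{2\delta}(\theta)$.
It follows from (16.9) that
\[ I(u_j) \ge \int_{t_1}^\infty \Big(\frac12|\dot u_j|^2 - V(t,u_j)\Big)\,dt + \frac{\varepsilon}{2}\sqrt{2M_1}. \]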
We construct the sequence
\[ v_j(t) = \begin{cases}
0, & t < t_1 - 1, \\
(t - t_1 + 1)\,u_j(t_1), & t \in [t_1 - 1, t_1], \\
u_j(t), & t > t_1,
\end{cases} \]
then $v_j \in \Gamma(\theta,\xi)$ and
\[ I(v_j) = \int_{t_1-1}^{t_1} \Big(\frac12|u_j(t_1)|^2 - V(t,v_j(t))\Big)\,dt + \int_{t_1}^\infty \Big(\frac12|\dot u_j|^2 - V(t,u_j(t))\Big)\,dt. \]
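However,
\[ \int_{t_1}^\infty \Big(\frac12|\dot u_j|^2 - V(t,u_j(t))\Big)\,dt \le I(u_j) - \frac{\varepsilon}{2}\sqrt{2M_1} \]
and
\[ \int_{t_1-1}^{t_1} \Big(\frac12|u_j(t_1)|^2 - V(t,v_j(t))\Big)\,dt \le 2\delta^2 + \max_{|u| \le 2\delta}(-V(t,u)). \]
Choosing $\delta > 0$ such that
\[ 2\delta^2 + \max_{|u| \le 2\delta}(-V(t,u)) < \frac{\varepsilon}{4}\sqrt{2M_1}, \]
we then obtain
\[ I(v_j) \le I(u_j) - \frac{\varepsilon}{4}\sqrt{2M_1}, \]
but $I(u_j) \to \inf_{\Gamma(\theta,\xi)} I$, which contradicts the fact that $\{v_j\} \subset \Gamma(\theta,\xi)$.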
We have successfully established Γ(θ, ξ) is weakly closed with respect to the
minimizing sequence. Consequently, the functional I achieves its minimum, say
u∗ . By regularity, u∗ ∈ C 2 (RN ).
3° Lastly, we verify $\dot u^*(\pm\infty) = 0$.
We already know when $t > t_1$, $u^*(t) \in B_\varepsilon(\xi)$. Thus, by assumptions (3) and (1), there exist $\beta_1 > 0$ and $\beta_2 > 0$ such that
\[ -V(t,u^*(t)) \ge \beta_1|u^*(t) - \xi|^2, \qquad |V_u(t,u^*(t))| \le \beta_2|u^*(t) - \xi|. \]
It follows that
\[ \beta_1\int_{t_1}^\infty |u^*(t) - \xi|^2\,dt \le -\int_{t_1}^\infty V(t,u^*(t))\,dt \le I(u^*) \]
and
\[ \int_{t_1}^\infty |\ddot u^*|^2\,dt = \int_{t_1}^\infty |V_u(t,u^*(t))|^2\,dt \le \beta_2^2\int_{t_1}^\infty |u^*(t) - \xi|^2\,dt, \]
from which we can deduce $\int_{t_1}^\infty |\ddot u^*|^2\,dt < \infty$. Together with $\dot u^* \in L^2(\mathbb{R}^1)$, it follows that $\dot u^*(+\infty) = 0$. Likewise, $\dot u^*(-\infty) = 0$.

Theorem 16.3 Under the assumptions (1)–(4), Eq. (16.7) has a heteroclinic
orbit.
16.4 Homoclinic orbits
Let $a \in C^1(\mathbb{R}^1)$ be a periodic function with period $2T > 0$. Suppose $\mu > 2$ and $\exists\, \alpha > 0$ such that $a(t) \ge \alpha$. We define the potential function
\[ V(t,x) = -\frac12|x|^2 + a(t)|x|^\mu. \tag{16.10} \]
2
We want to find a homoclinic orbit x ∈ H 1 (R1 , R1 ) initiating from x = 0 which
satisfies the equation
ẍ + Vx (t, x) = 0. (16.11)
We adopt the following method. ∀ k ∈ N, we find a 2kT -periodic solution xk
of Eq. (16.11). Then by letting k → ∞, we examine whether the sequence of
solutions {xk } has a limit. If the limit exists, is it still a solution of (16.11)? Is it
a homoclinic orbit?
We begin by defining, $\forall\, k \in \mathbb{N}$, the space $X_k = H^1_{2kT}([-kT,kT])$ with norm
\[ \|x\|_k^2 = \int_{-kT}^{kT} (|\dot x(t)|^2 + |x(t)|^2)\,dt \]
and the functional
\[ I_k(x) = \frac12\|x\|_k^2 - \int_{-kT}^{kT} a(t)|x(t)|^\mu\,dt. \]
According to Example 15.5 in Lecture 15, $I_k$ possesses the geometric structure described in the Mountain Pass Theorem, that is,
\[ I_k(0) = 0, \quad I_k(x) = \frac12\|x\|_k^2 + o(\|x\|_k^2), \quad \exists\, \varphi \in X_k \setminus \{0\} \text{ such that } I_k(t\varphi) \to -\infty. \]
Furthermore, $I_k$ satisfies the Palais–Smale condition. Thus, there exists a critical point $x_k \in X_k$, i.e.
\[ \int_{-kT}^{kT} [\dot x_k\dot\varphi + x_k\varphi - \mu a|x_k|^{\mu-2}x_k\varphi]\,dt = 0, \quad \forall\, \varphi \in X_k, \tag{16.11$'$} \]
with
\[ c_k = I_k(x_k) > 0. \]
Noting that $a$ is $2T$-periodic, while $x_k(t)$ is $2kT$-periodic, $x_k(t - jT)|_{[-kT,kT]}$ ($j \in \mathbb{Z}$) are also solutions of (16.11) with the same period and the same critical value. Therefore, we can choose $x_k$ such that
\[ \max_{t \in [0,T]} x_k(t) = \max_{t \in [-kT,kT]} x_k(t). \tag{16.12} \]
We proceed by the following steps.

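1° We claim $\exists\, M > 0$, $c_k \le M$, $\forall\, k \in \mathbb{N}$.
To prove this, we choose $\varphi_1 \in X_1$ such that $\varphi_1(\pm T) = 0$ and $I_1(\varphi_1) \le 0$. Let
\[ M = \max_{t \in [0,1]} I_1(t\varphi_1) \]
and
\[ z_k(t) = \begin{cases}
\varphi_1(t), & t \in [-T,T], \\
0, & t \in [-kT,kT] \setminus [-T,T].
\end{cases} \]
Then
\[ z_k \in X_k, \quad I_k(z_k) = I_1(\varphi_1) \le 0. \]
Hence, we have the estimate
\[ c_k \le \max_{t \in [0,1]} I_k(tz_k) = \max_{t \in [0,1]} I_1(t\varphi_1) = M. \]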
2° We claim $\exists\, M_1 > 0$ such that $\|x_k\|_k \le M_1$.
In (16.11$'$), choosing $\varphi = x_k$ yields
\[ \int_{-kT}^{kT} [\dot x_k^2 + x_k^2 - \mu a(t)|x_k(t)|^\mu]\,dt = 0. \]
Combining this with the identity
\[ c_k = I_k(x_k) = \frac12\int_{-kT}^{kT} (\dot x_k^2 + x_k^2)\,dt - \int_{-kT}^{kT} a(t)|x_k(t)|^\mu\,dt, \]
we obtain that
\[ \Big(\frac{\mu}{2} - 1\Big)\int_{-kT}^{kT} a(t)|x_k(t)|^\mu\,dt = c_k. \]
Thus,
\[ \|x_k\|_k^2 = 2\Big(I_k(x_k) + \int_{-kT}^{kT} a(t)|x_k(t)|^\mu\,dt\Big) = \Big(2 + \frac{4}{\mu - 2}\Big)c_k \le M_1. \]
3° We want to prove there exists a solution $y \in H^1(\mathbb{R}^1)$. Notice that for any $x \in H^1_{loc}$, we have the embedding inequality
\[ |x(t)|^2 \le 2\int_{t-\frac12}^{t+\frac12} (\dot x(r)^2 + x(r)^2)\,dr. \]
This implies there exist constants $C$ and $M_2$ independent of $k$ such that
\[ \|x_k\|_{L^\infty[-kT,kT]} \le C\|x_k\|_k \le M_2. \]
Substituting into (16.11), we obtain a constant $M_3$ independent of $k$ such that
\[ \|x_k\|_{C^2[-kT,kT]} \le M_3. \]
Subsequently, by the Arzelà–Ascoli theorem, on any finite bounded interval $(-R,R)$ a subsequence of $x_k$, together with its derivative, converges uniformly to a continuously differentiable function $y$, i.e.
\[ x_k \to y \quad \text{in } C^1[-R,R]. \]
So $\forall\, k \in \mathbb{N}$,
\[ \int_{-kT}^{kT} [\dot y^2 + y^2]\,dt \le M_1. \]
Since $k$ is arbitrary, we must have
\[ \int_{-\infty}^{\infty} [\dot y^2 + y^2]\,dt \le M_1. \tag{16.13} \]
This means $y \in H^1(\mathbb{R}^1)$ and it satisfies Eq. (16.11) on the entire real line.
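4° We want to prove the solution $y$ is a homoclinic orbit. That is, we must show $y(t) \to 0$ as $t \to \pm\infty$. We use the Fourier transform of $y$:
\[ \tilde y(\xi) = \frac{1}{2\pi}\int_{-\infty}^{\infty} y(t)e^{i\xi t}\,dt. \]
By Plancherel's theorem,
\[ \int_{-\infty}^{\infty} (1 + |\xi|^2)|\tilde y(\xi)|^2\,d\xi = \int_{-\infty}^{\infty} [\dot y^2 + y^2]\,dt, \tag{16.14} \]
together with Schwarz's inequality, we have
\[ \int_{-\infty}^{\infty} |\tilde y(\xi)|\,d\xi \le \Big(\int_{-\infty}^{\infty} (1 + |\xi|^2)|\tilde y(\xi)|^2\,d\xi\Big)^{1/2}\Big(\int_{-\infty}^{\infty} (1 + |\xi|^2)^{-1}\,d\xi\Big)^{1/2} < \infty. \]
Moreover, by the Riemann–Lebesgue theorem, it follows immediately that
\[ y(t) \to 0 \quad \text{as } t \to \pm\infty. \]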
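5° Lastly, we explain why this solution $y$ is non-trivial. The main idea is to show there exists a constant $\delta > 0$ such that
\[ \|x_k\|_{L^\infty[-kT,kT]} \ge \delta. \tag{16.15} \]
To carry out this idea, we define the function
\[ \phi(r) = \max_{t \in [0,T],\ |x| \le r} \frac{a(t)|x|^\mu}{|x|^2}. \]
This is a monotone non-decreasing function which satisfies
\[ \phi(r) > 0 \ \text{for } r > 0, \qquad \phi(r) \to \infty \ \text{as } r \to \infty. \]
Since $x_k$ is a solution of (16.11), we have
\[ \|x_k\|_k^2 = \mu\int_{-kT}^{kT} a(t)|x_k(t)|^\mu\,dt. \]
$\phi$ and $\|x_k\|_k$ are related via
\[ \|x_k\|_k^2 \le \mu\,\phi(\|x_k\|_{L^\infty[-kT,kT]})\int_{-kT}^{kT} |x_k(t)|^2\,dt \le \mu\,\phi(\|x_k\|_{L^\infty[-kT,kT]})\,\|x_k\|_k^2. \]
Thus,
\[ \phi(\|x_k\|_{L^\infty[-kT,kT]}) \ge \frac{1}{\mu}. \]
Since $\mu > 2$ and $a$ is bounded, $\phi(r) \to 0$ as $r \to 0^+$, so there is a $\delta > 0$ with $\phi(r) < 1/\mu$ for $r < \delta$; with this $\delta$, (16.15) is fulfilled.
Consequently, (16.12) implies
\[ \max_{t \in [0,T]} |x_k(t)| \ge \delta, \]
whence
\[ \max_{t \in \mathbb{R}^1} |y(t)| \ge \delta. \]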
Theorem 16.4 Suppose $\mu > 2$ and $a \in C^1(\mathbb{R}^1)$ is a $2T$-periodic function ($T > 0$). If $\exists\, \alpha > 0$ such that $a(t) \ge \alpha$, $\forall\, t \in \mathbb{R}^1$, then Eq. (16.11) has a non-trivial homoclinic orbit.
Lecture 17
Geodesics and minimal surfaces
17.1 Geodesics
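Let $(M,g)$ be a Riemannian manifold. A curve
\[ \gamma : [0,1] \to M, \quad \dot\gamma \neq \theta \]
is called a geodesic, if the acceleration vector field
\[ \frac{D}{dt}\frac{d\gamma}{dt} = 0. \]
This means, along the curve $\gamma$, the velocity vector field is parallel. Hence, along $\gamma$,
\[ \frac{d}{dt}\,g\Big(\frac{d\gamma}{dt},\frac{d\gamma}{dt}\Big) = 2g\Big(\frac{D}{dt}\frac{d\gamma}{dt},\frac{d\gamma}{dt}\Big) = 0 \iff \Big\|\frac{d\gamma}{dt}\Big\|^2 = g(\dot\gamma,\dot\gamma) = \text{const}. \]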
Fig. 17.1
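In local coordinates, $\gamma = (u^1,\dots,u^n)$ and the geodesic equation is given by
\[ \frac{d^2u^k}{dt^2} + \Gamma^k_{ij}(u)\frac{du^i}{dt}\cdot\frac{du^j}{dt} = 0, \quad 1 \le k \le n, \]
where $\Gamma^k_{ij}$ are the Christoffel symbols on $M$, i.e.
\[ \Gamma^k_{ij} = \frac12 g^{kl}\Big(\frac{\partial g_{il}}{\partial u^j} + \frac{\partial g_{jl}}{\partial u^i} - \frac{\partial g_{ij}}{\partial u^l}\Big), \]
\[ g_{ij} = g\Big(\frac{\partial}{\partial u^i},\frac{\partial}{\partial u^j}\Big), \quad 1 \le i,j,k \le n, \qquad \sum_{j=1}^n g^{ij}g_{jk} = \delta^i_k. \]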
• The arclength functional and the energy functional

For a $C^1$ curve $\gamma$ in $(M,g)$, or more generally, $\gamma \in W^{1,1}([0,1],M)$, we call
\[ L(\gamma) = \int_0^1 \Big\|\frac{d\gamma}{dt}(t)\Big\|\,dt \]
the arclength functional. For $\gamma \in H^1([0,1],M)$, we call
\[ E(\gamma) = \frac12\int_0^1 \Big\|\frac{d\gamma}{dt}(t)\Big\|^2\,dt \]
the energy functional.
By Schwarz's inequality,
\[ L(\gamma)^2 \le 2E(\gamma), \]
and the equality holds if and only if $\big\|\frac{d\gamma}{dt}(t)\big\| = \text{const}$.
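• L is invariant under parameter diffeomorphisms
\[ \begin{aligned}
L(\gamma \circ s) &= \int_0^T \Big\|\frac{d(\gamma \circ s)}{d\tau}(\tau)\Big\|\,d\tau \\
&= \int_0^T \Big\|\frac{d\gamma}{dt}(s(\tau))\Big\|\,\Big|\frac{ds}{d\tau}(\tau)\Big|\,d\tau \\
&= \int_0^1 \Big\|\frac{d\gamma}{dt}(t)\Big\|\,dt \\
&= L(\gamma).
\end{aligned} \]
However, E is not invariant under diffeomorphisms!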
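The two facts just stated, $L(\gamma)^2 \le 2E(\gamma)$ and the invariance of $L$ (but not $E$) under reparametrization, can be illustrated on a concrete curve. The sketch below is an illustration only: it assumes a planar quarter circle as $\gamma$ and the diffeomorphism $s(t) = (t + t^2)/2$ of $[0,1]$, both chosen here for convenience, and approximates the integrals by chord lengths.

```python
import math

def length_and_energy(curve, n=20000):
    """Approximate L and E of a curve [0,1] -> R^2 by chords on a uniform grid."""
    h = 1.0 / n
    length = 0.0
    energy = 0.0
    prev = curve(0.0)
    for i in range(1, n + 1):
        cur = curve(i * h)
        d = math.dist(prev, cur)             # chord length ~ |gamma'(t)| * h
        length += d
        energy += 0.5 * (d / h) ** 2 * h     # (1/2) |gamma'(t)|^2 * h
        prev = cur
    return length, energy

def quarter_circle(t):
    """Constant-speed parametrization of a quarter of the unit circle."""
    return (math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t))

def reparametrized(t):
    """Same curve composed with s(t) = (t + t^2)/2, a diffeomorphism of [0,1]."""
    return quarter_circle(0.5 * (t + t * t))

L1, E1 = length_and_energy(quarter_circle)
L2, E2 = length_and_energy(reparametrized)
```

Here `L1` and `L2` agree (both approximate $\pi/2$), `2 * E1` matches `L1 ** 2` because the first parametrization has constant speed, and `2 * E2` is strictly larger than `L2 ** 2`.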

• The normalized arclength functional
\[ l(t) = \frac{1}{L(\gamma)}\int_0^t \Big\|\frac{d\gamma}{ds}(s)\Big\|\,ds, \quad t \in [0,1]. \]
Since $l : [0,1] \to [0,1]$ is a diffeomorphism, letting $s = l^{-1}$, then $\gamma \circ s : [0,1] \to M$ satisfies
\[ \Big\|\frac{d(\gamma \circ s)}{dl}(l)\Big\| = \Big\|\frac{d\gamma}{dt} \circ s(l)\Big\|\,\Big|\frac{ds}{dl}(l)\Big| = \Big\|\frac{d\gamma}{dt}(t)\Big\|\,\Big|\frac{dl}{dt}(t)\Big|^{-1} = L(\gamma). \]
If $\gamma$ is parametrized by its normalized arclength, i.e. it is parametrized by $\gamma \circ s$, then
\[ L(\gamma)^2 = 2E(\gamma). \]
Given two points $P_0, P_1 \in M$, let
\[ \Gamma = \{\gamma \in H^1([0,1],M) \mid \gamma(i) = P_i,\ i = 0,1\}, \]
then
\[ \inf_{\gamma \in \Gamma} L(\gamma) = \inf_{\gamma \in \Gamma} \sqrt{2E(\gamma)}. \]
Finding the curve with the shortest distance connecting the points P0 and P1 is
equivalent to finding the curve connecting P0 and P1 with minimal energy.
Thus, the curve γ with the shortest arclength also minimizes the energy.
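Consequently, $\gamma$ satisfies the E-L equation of $E$:
\[ \frac{d}{dt}L_p(t,u(t),\dot u(t)) = L_u(t,u(t),\dot u(t)), \]
where $L(t,u,p) = \frac12\sum_{i,j=1}^n g_{ij}(u)p^ip^j$. Namely,
\[ \begin{aligned}
& \sum_{j=1}^n \Big[\frac{d}{dt}\big(g_{ij}(u(t))\dot u^j(t)\big) - \frac12\sum_{k=1}^n \frac{\partial g_{kj}}{\partial u^i}(u(t))\,\dot u^k(t)\dot u^j(t)\Big] = 0 \\
\iff{}& \sum_{j=1}^n \Big[2g_{ij}\ddot u^j + 2\sum_{k=1}^n \frac{\partial g_{ij}}{\partial u^k}\dot u^k\dot u^j - \sum_{k=1}^n \frac{\partial g_{kj}}{\partial u^i}\dot u^k\dot u^j\Big] = 0 \\
\iff{}& \ddot u^i + \frac12\sum_{l=1}^n\sum_{k=1}^n\sum_{j=1}^n g^{il}\big(2g_{lj,k}\dot u^k\dot u^j - g_{kj,l}\dot u^k\dot u^j\big) = 0 \\
\iff{}& \ddot u^i + \frac12\sum_{l=1}^n\sum_{k=1}^n\sum_{j=1}^n g^{il}\big(g_{lj,k} + g_{kl,j} - g_{jk,l}\big)\dot u^k\dot u^j = 0 \\
\iff{}& \ddot u^i + \sum_{k,j=1}^n \Gamma^i_{jk}\dot u^k\dot u^j = 0.
\end{aligned} \]
This is precisely the geodesic equation. So the curves of the shortest arclengths are themselves geodesics!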
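As a concrete instance of the geodesic equation, consider the unit sphere with coordinates $(\vartheta,\varphi)$ and metric $d\vartheta^2 + \sin^2\vartheta\,d\varphi^2$, whose non-zero Christoffel symbols are $\Gamma^\vartheta_{\varphi\varphi} = -\sin\vartheta\cos\vartheta$ and $\Gamma^\varphi_{\vartheta\varphi} = \Gamma^\varphi_{\varphi\vartheta} = \cot\vartheta$. The sphere, the RK4 integrator, and all function names below are choices made for this sketch and are not taken from the text; it integrates the equation and records $g(\dot\gamma,\dot\gamma)$, which should stay constant along a geodesic.

```python
import math

def rhs(state):
    """Geodesic equation on the unit sphere as a first-order system."""
    theta, phi, dtheta, dphi = state
    ddtheta = math.sin(theta) * math.cos(theta) * dphi ** 2             # -Gamma^theta_{phi phi} dphi^2
    ddphi = -2.0 * math.cos(theta) / math.sin(theta) * dtheta * dphi    # -2 Gamma^phi_{theta phi} dtheta dphi
    return (dtheta, dphi, ddtheta, ddphi)

def rk4_step(state, h):
    """One classical Runge-Kutta step of size h."""
    def shifted(s, k, c):
        return tuple(si + c * ki for si, ki in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, h / 2))
    k3 = rhs(shifted(state, k2, h / 2))
    k4 = rhs(shifted(state, k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def speed_sq(state):
    """g(gamma', gamma') in spherical coordinates."""
    theta, _, dtheta, dphi = state
    return dtheta ** 2 + math.sin(theta) ** 2 * dphi ** 2

def integrate(state, steps=1000):
    """Integrate over t in [0,1]; return the final state and the list of speeds squared."""
    h = 1.0 / steps
    speeds = [speed_sq(state)]
    for _ in range(steps):
        state = rk4_step(state, h)
        speeds.append(speed_sq(state))
    return state, speeds

start = (math.pi / 3, 0.0, 0.4, 1.0)   # (theta, phi, theta', phi'), away from the poles
end, speeds = integrate(start)
```

Starting on the equator with purely azimuthal velocity, the same routine stays on the equator, which is the expected great-circle geodesic.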
• The existence of geodesics

Assume $(M,g)$ is a compact Riemannian manifold. Given two points $P_0, P_1 \in M$, does there exist a geodesic connecting $P_0$ and $P_1$?
By the Nash Embedding Theorem, we can isometrically embed $(M,g)$ into some $\mathbb{R}^N$ for $N$ sufficiently large. For the functional
\[ E(u) = \frac12\int_0^1 \|\dot u(t)\|^2_{\mathbb{R}^N}\,dt, \]
consider the set
\[ \begin{aligned}
S &= \{u \in H^1([0,1],\mathbb{R}^N) \mid u(0) = P_0,\ u(1) = P_1,\ u(t) \in M,\ \forall\, t \in [0,1]\} \\
&= C_*([0,1],M) \cap H^1([0,1],\mathbb{R}^N),
\end{aligned} \]
where
\[ C_*([0,1],M) = \{u \in C([0,1],M) \mid u(i) = P_i,\ i = 0,1\}. \]
This is because the embedding $i : H^1 \hookrightarrow C$ is continuous.
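Moreover, we have the estimate
\[ \begin{aligned}
\|u(t_1) - u(t_2)\|_{\mathbb{R}^N} &= \Big\|\int_{t_1}^{t_2} \dot u(t)\,dt\Big\| \\
&\le \int_{t_1}^{t_2} \|\dot u(t)\|\,dt \\
&\le |t_1 - t_2|^{1/2}\Big(\int_{t_1}^{t_2} \|\dot u(t)\|^2\,dt\Big)^{1/2} \\
&\le (2E(u))^{1/2}\,|t_1 - t_2|^{1/2}.
\end{aligned} \]
This implies $S$ is a weakly closed subset of $H^1([0,1],\mathbb{R}^N)$.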

Using the direct method, E attains its minimum, which is also the minimum
of L.
We arrive at the following.
Theorem 17.1 Let $(M,g)$ be a compact Riemannian manifold and let $P_0, P_1 \in M$ be given. Then there exists a curve joining $P_0$ and $P_1$ with the shortest arclength. Furthermore, this curve satisfies the geodesic equation.
Remark 17.1 In the above approach, we did not seek the minimum of L(γ)
directly, this is because L(γ) is invariant under any diffeomorphism on [0, 1].
However, the diffeomorphism group itself is vast and without compactness.
On the contrary, E(γ) is not invariant under the diffeomorphism group action.
Compactness is also available. In the original proof due to Hilbert, the use of
“parametrizing by the arclength” was intended to avoid the diffeomorphism group.
So in essence, it plays the same role as minimizing the energy functional.
17.2 Minimal surfaces
In geometry, a minimal surface is defined to be a surface whose mean curvature is zero.
For a surface X over the domain Ω ⊂ R2 , we adopt the parametric equation
X : Ω −→ Rn , (x, y) 7→ X = (X 1 , . . . , X n ).
In isothermal coordinates, the parametrization $X$ of a minimal surface satisfies
\[ \begin{cases}
\Delta X^i = 0, \quad 1 \le i \le n & \text{(harmonic equation)} \\
\displaystyle\sum_{i=1}^n [(X^i_x)^2 - (X^i_y)^2] = \sum_{i=1}^n X^i_x \cdot X^i_y = 0 & \text{(weak conformal condition)}.
\end{cases} \tag{17.1} \]
Example 17.1 The catenoid
\[ \begin{cases}
x = \cosh u \sin v, \\
y = \cosh u \cos v, \\
z = u.
\end{cases} \]
The origin of its name is related to the following problem.
• The Plateau problem of minimal surfaces
Given a Jordan curve $\Gamma$ in $\mathbb{R}^n$ (see Figure 17.2), find a (disklike) surface $X : \bar D \to \mathbb{R}^n$ bounded by $\Gamma$ such that its enclosed area is a minimum. Denote
\[ \bar D = \{w = (x,y) \in \mathbb{R}^2 \mid |w| \le 1\}, \qquad X = (X^1(w),\dots,X^n(w)). \]
Fig. 17.2
Recall the square of the area determined by the two vectors Xx and Xy is
|Xx ∧ Xy |2 = |Xx |2 |Xy |2 − (Xx · Xy )2 ,
hence the area of $X$ is
\[ A(X) = \int_D |X_x \wedge X_y|\,dx \wedge dy. \]
• Boundary conditions Let $\hat X = X|_{\partial D}$. Suppose $\hat X : \partial D \to \Gamma$ is an oriented parametrization; namely, it is an orientation preserving homeomorphism. Denote
\[ C(\Gamma) = \{X \in H^1(\bar D,\mathbb{R}^n) \mid \hat X = X|_{\partial D} \in C(\partial D,\Gamma) \text{ is a weakly monotone parametrization}\}. \]
Before applying variational methods to study minimal surfaces, let us first impose a condition on $\Gamma$ such that
\[ \inf_{X \in C(\Gamma)} A(X) < \infty, \]
which ensures $C(\Gamma) \neq \emptyset$.
For example, if we assume $\Gamma$ is rectifiable, then we prove there exists $X_0 \in C(\Gamma)$ such that
\[ A(X_0) = \inf_{X \in C(\Gamma)} A(X). \]
• A is invariant under diffeomorphisms

Let $z = (u,v)$ and $z = z(w)$, then
\[ \frac{dz}{dw} = \begin{pmatrix} \partial_x u & \partial_y u \\ \partial_x v & \partial_y v \end{pmatrix}. \]
Thus,
\[ X_x \wedge X_y\,dx \wedge dy = X_u \wedge X_v\,du \wedge dv. \]
Since the diffeomorphism group $\mathrm{Diff}(\bar D,\bar D)$ is vast, it has no compactness. We will employ the direct method to minimize the area functional $A$. As in the geodesic problem, we turn to the Dirichlet integral
\[ D(X) = \frac12\int_D |\nabla X|^2\,dx \wedge dy, \]
where $\nabla X = (X^i_x, X^i_y)$, $1 \le i \le n$.
• Conformal mappings A diffeomorphism
\[ g : \mathbb{R}^2 \to \mathbb{R}^2, \quad w \mapsto z \]
is conformal, if it satisfies the following weakly conformal condition:
\[ |g_x|^2 - |g_y|^2 = g_x \cdot g_y = 0. \]
A direct calculation shows that although the Dirichlet integral is not diffeomorphically invariant, it is conformally invariant!
\[ \int_D |\nabla_w X|^2\,dx \wedge dy = \int_D |\nabla_z X|^2\,du \wedge dv. \]
We also note the following relations:
\[ |\alpha \wedge \beta| \le \frac12(|\alpha|^2 + |\beta|^2), \quad \text{“=”} \iff |\alpha|^2 - |\beta|^2 = \alpha \cdot \beta = 0; \]
\[ A(X) \le D(X), \quad \text{“=”} \iff X \text{ is weakly conformal}. \]
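The pointwise inequality behind $A(X) \le D(X)$ is easy to test on explicit vectors. In the sketch below, the helper names `area_density` and `energy_density` are illustrative only; they stand for $|\alpha \wedge \beta|$ (computed from $|\alpha|^2|\beta|^2 - (\alpha\cdot\beta)^2$, as recalled earlier) and $\frac12(|\alpha|^2 + |\beta|^2)$.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def area_density(a, b):
    """|a ^ b| = sqrt(|a|^2 |b|^2 - (a.b)^2), the area of the parallelogram on a, b."""
    return math.sqrt(max(dot(a, a) * dot(b, b) - dot(a, b) ** 2, 0.0))

def energy_density(a, b):
    """(1/2)(|a|^2 + |b|^2), the integrand of the Dirichlet integral."""
    return 0.5 * (dot(a, a) + dot(b, b))

# Conformal pair: equal lengths and orthogonal, so equality holds.
conformal = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
# Orthogonal but unequal lengths: strict inequality.
stretched = ((2.0, 0.0, 0.0), (0.0, 1.0, 0.0))
# Equal lengths but not orthogonal: strict inequality.
sheared = ((1.0, 0.0, 0.0), (math.sqrt(0.5), math.sqrt(0.5), 0.0))

gaps = {name: energy_density(*pair) - area_density(*pair)
        for name, pair in [("conformal", conformal),
                           ("stretched", stretched),
                           ("sheared", sheared)]}
```

The gap vanishes only for the conformal pair, matching the equality case $|\alpha|^2 - |\beta|^2 = \alpha\cdot\beta = 0$.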
We have the following important result.

Theorem 17.2 (Morrey–Lichtenstein) Let $\Gamma \in C^2$ be a Jordan curve, then $C(\Gamma) \neq \emptyset$, and
\[ \inf_{X \in C(\Gamma)} A(X) = \inf_{X \in C(\Gamma)} D(X). \]
In order to avoid disrupting the proof of existence, we shall postpone the proof
of the above theorem until the end of this lecture.
The advantage of minimizing the Dirichlet integral instead of the area func-
tional is that the conformal group is far smaller than the diffeomorphism group.
The conformal group is the set of all conformal transformations of the disk, which forms a group. It is parametrized by three real parameters:
\[ G = \Big\{g(w) = e^{i\varphi}\,\frac{w + a}{1 + \bar a w} \;\Big|\; |a| < 1,\ \varphi \in \mathbb{R}^1\Big\}. \]
It is worth noting that when $a = a_j \to 1$, $g_j(w)$ is concentrated at one point on the unit circle. This means $\forall\, X \in C(\Gamma)$, the closure of the orbit $\{X \circ g \mid g \in G\}$ in the weak topology on $H^1(D,\mathbb{R}^n)$ contains constant functions. However, the latter cannot be a weakly monotone parametrization of $\Gamma$, so $C(\Gamma)$ cannot be a weakly closed subset in $H^1$. Even replacing the functional by the Dirichlet integral, it is still impossible to get around the weak closedness issue of $C(\Gamma)$.
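The concentration phenomenon can be seen directly from the formula for $g$. The following sketch is illustrative only (the function name `mobius` and the sample points are chosen here): it evaluates $g(w) = e^{i\varphi}(w + a)/(1 + \bar a w)$ with complex arithmetic, checks that $g$ preserves the unit circle, and shows that for $a$ close to $1$ a boundary point far from $-1$ is mapped close to $1$.

```python
import cmath

def mobius(w, a, phi=0.0):
    """Conformal automorphism of the unit disk: e^{i phi} (w + a) / (1 + conj(a) w)."""
    a = complex(a)
    return cmath.exp(1j * phi) * (w + a) / (1 + a.conjugate() * w)

# Boundary points stay on the unit circle.
boundary_moduli = [abs(mobius(cmath.exp(1j * t), 0.5 + 0.2j, 0.7))
                   for t in (0.0, 1.0, 2.5, 4.0)]

# An interior point stays inside the disk.
interior_modulus = abs(mobius(0.3 - 0.4j, 0.5 + 0.2j))

# Concentration: as a -> 1, the image of w = i approaches the single point 1.
distances_to_one = [abs(mobius(1j, a) - 1) for a in (0.9, 0.99, 0.999)]
```

The entries of `distances_to_one` decrease toward zero, which is the loss of compactness that the three-point condition below is designed to remove.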
To rectify this, we further reduce the influence of the conformal group G: by
imposing additional restriction on C(Γ), we hope to “mod out” the action of G.
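Choose three arbitrary points $P_0$, $P_1$, and $P_2$ on $\Gamma$ and let
\[ C^*(\Gamma) = \Big\{X \in C(\Gamma) \;\Big|\; X\big(e^{\frac{2k\pi}{3}i}\big) = P_k,\ 0 \le k \le 2\Big\}. \]
We call $X\big(e^{\frac{2k\pi}{3}i}\big) = P_k$ for $0 \le k \le 2$ the three-point condition. Since
\[ \inf_{X \in C(\Gamma)} A(X) = \inf_{X \in C(\Gamma)} D(X) = \inf_{X \in C^*(\Gamma)} D(X), \]
we instead find
\[ \min\{D(X),\ X \in C^*(\Gamma)\}. \]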
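If $X$ is the minimizing function, then it satisfies
(1)
\[ \frac{d}{d\varepsilon}D(X + \varepsilon\varphi)\Big|_{\varepsilon=0} = 0, \]
(2)
\[ \frac{d}{d\varepsilon}D(X \circ g_\varepsilon^{-1}, g_\varepsilon(D))\Big|_{\varepsilon=0} = \frac{d}{d\varepsilon}\,\frac12\int_{g_\varepsilon(D)} \big|\nabla_z(X \circ g_\varepsilon^{-1}(z))\big|^2\,du \wedge dv\,\Big|_{\varepsilon=0} = 0, \]
where $g_\varepsilon : D \to g_\varepsilon(D)$ is a diffeomorphism.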

The rationale is as follows. Although C ∗ (Γ) is not a differentiable manifold,
for any ϕ ∈ C0∞ (D, Rn ), X +ϕ ∈ C ∗ (Γ), so (1) holds. From this, we can deduce
the E-L equation
∆X = 0 in D,
which is the first equation in (17.1).

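As for (2), we argue by contradiction. Suppose $\frac{d}{d\varepsilon}D(X \circ g_\varepsilon^{-1}, g_\varepsilon(D))|_{\varepsilon=0} \neq 0$, then there must exist a diffeomorphism $\bar g_\varepsilon \in C^1(\bar D,\mathbb{R}^2)$ such that $\bar X_\varepsilon = X \circ \bar g_\varepsilon^{-1}$ satisfies
\[ D(\bar X_\varepsilon, \bar g_\varepsilon(D)) < D(X). \]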
However, ḡε (D) is simply connected, so by the Riemann Mapping Theorem, there
exists a conformal mapping hε : D → ḡε (D). Let X̃ε = X̄ε ◦ hε , then X̃ε ∈
C(Γ). Furthermore, by the conformal invariance of the Dirichlet integral, we have
D(X̃ε ) = D(X̄ε , ḡε (D)) < D(X).
This is a contradiction!
From (2) and a similar argument used in Example 8.4 in Lecture 8, it follows
that on D,
|Xx |2 = |Xy |2 ,

Xx · Xy = 0,
which is the second equation in (17.1).
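Suppose $\frac{dg_\varepsilon}{d\varepsilon}\big|_{\varepsilon=0} = \tau$, $\tau = (\tau^1,\tau^2)$. Applying Noether's formula in Lecture 8 to $L(p) = \frac12\sum_{i=1}^n \big((p^i_1)^2 + (p^i_2)^2\big)$, it yields
\[ \int_D \operatorname{div}(L\tau + L_{p^i} \cdot \varphi^i)\,dxdy = 0, \]
where $\varphi^i = -(X^i_x\tau^1 + X^i_y\tau^2)$.
Since
\[ \begin{aligned}
L\tau + L_{p^i} \cdot \varphi^i &= \frac12(|X_x|^2 + |X_y|^2)(\tau^1,\tau^2) - \sum_i (X^i_x\tau^1 + X^i_y\tau^2)(X^i_x, X^i_y) \\
&= \Big(-\frac12(|X_x|^2 - |X_y|^2)\tau^1 - X_x \cdot X_y\,\tau^2,\ -X_x \cdot X_y\,\tau^1 + \frac12(|X_x|^2 - |X_y|^2)\tau^2\Big),
\end{aligned} \]
it follows that
\[ \operatorname{div}(L\tau + L_{p^i} \cdot \varphi^i) = \frac12(|X_x|^2 - |X_y|^2)(\tau^2_y - \tau^1_x) - X_x \cdot X_y\,(\tau^1_y + \tau^2_x). \]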
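We have thus proved
\[ \begin{aligned}
\frac{d}{d\varepsilon}D(X \circ g_\varepsilon^{-1}, g_\varepsilon(D))\Big|_{\varepsilon=0} &= \frac{d}{d\varepsilon}\,\frac12\int_{g_\varepsilon(D)} \big|\nabla_z(X \circ g_\varepsilon^{-1})\big|^2\,du \wedge dv\,\Big|_{\varepsilon=0} \\
&= -\frac12\int_D \big[(|X_x|^2 - |X_y|^2)(\tau^1_x - \tau^2_y) + 2X_x \cdot X_y\,(\tau^1_y + \tau^2_x)\big]\,dxdy.
\end{aligned} \]
Since $\lambda = \tau^1_x - \tau^2_y$ and $\mu = \tau^1_y + \tau^2_x$ are arbitrary, we conclude
\[ |X_x|^2 - |X_y|^2 = X_x \cdot X_y = 0 \quad \text{a.e. in } D. \]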
We have shown the minimizing function of the energy functional is indeed a
generalized solution of the minimal surface equation.
• Replacing $\mathbb{R}^n$ by a more general Riemannian manifold $(N^n,h)$, while $\Gamma$ is an embedded Jordan curve in $N$. The corresponding equation becomes
\[ \begin{cases}
\operatorname{trace}(\nabla dX) = 0, \\
h(X_x,X_x) - h(X_y,X_y) = h(X_x,X_y) = 0.
\end{cases} \]
Via the isometric embedding $N \hookrightarrow \mathbb{R}^N$, it can also be written as
\[ \begin{cases}
\Delta X = A(X)(\nabla X,\nabla X), \\
|X_x|^2 - |X_y|^2 = X_x \cdot X_y = 0,
\end{cases} \]
where $A(X)(\cdot,\cdot)$ denotes the second fundamental form of $X$.
The existence of a solution for the Plateau problem of a minimal surface was proved by Douglas (1931), Radó (1933), Courant (1945), and Struwe (1988), etc. The regularity of the solution was proved by Hildebrandt (1969, 1971), which states: If $\Gamma \in C^{2,\alpha}$, then $X \in C^{2,\alpha}(\bar\Omega,\mathbb{R}^N)$ for $0 < \alpha < 1$.
Proof of existence It is of particular note that in order to minimize $D(X)$, it suffices to only consider its boundary value $\hat X$ instead of the whole function $X$. The reason is, from the E-L equation, we know a harmonic function achieves the minimal value of the Dirichlet integral $D(X)$, and a harmonic function is determined by its boundary values. In fact, consider the Fourier expansion of $\hat X$
\[ \hat X(e^{i\theta}) = a_0 + \sum_{k=1}^\infty (a_k\cos k\theta + b_k\sin k\theta), \tag{17.2} \]
where
\[ a_0 = \frac{1}{2\pi}\int_0^{2\pi} \hat X(e^{i\theta})\,d\theta, \quad a_k = \frac{1}{\pi}\int_0^{2\pi} \hat X(e^{i\theta})\cos k\theta\,d\theta, \quad b_k = \frac{1}{\pi}\int_0^{2\pi} \hat X(e^{i\theta})\sin k\theta\,d\theta, \quad k = 1,2,\dots, \]
\[ \sum_{k=1}^\infty k(|a_k|^2 + |b_k|^2) < \infty, \]
then
\[ X(re^{i\theta}) = a_0 + \sum_{k=1}^\infty (a_k\cos k\theta + b_k\sin k\theta)\,r^k. \tag{17.3} \]
It is evident that we may regard
\[ D(X) = \frac{\pi}{2}\sum_{k=1}^\infty k(|a_k|^2 + |b_k|^2) \]
as a functional defined on the set $C^*(\Gamma) \subset \mathcal{X} = \{\hat X = X|_{\partial D} \mid D(X) < \infty\}$; on $\mathcal{X}$, if we define the norm to be
\[ \|\hat X\| = \Big(\frac14 a_0^2 + \sum_{k=1}^\infty k(|a_k|^2 + |b_k|^2)\Big)^{1/2}, \]
then it is a Hilbert space. $D$ is clearly weakly sequentially lower semi-continuous.
Since the range of $\hat X$ falls on $\Gamma$, $|a_0|$ is bounded, hence $D(X)$ is coercive.
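The series representation of the Dirichlet integral can be checked against direct quadrature. With the normalization $D(X) = \frac12\int_D |\nabla X|^2$ used in this lecture, a single mode $r^k\cos k\theta$ has $\int_D |\nabla X|^2 = \pi k$, so the series carries the factor $\pi/2$. The sketch below is an illustration under that normalization; the scalar-valued test function, the grid sizes, and the function names are assumptions made here.

```python
import math

def dirichlet_numeric(coeffs, nr=200, nt=200):
    """Midpoint rule in polar coordinates for (1/2) * integral of |grad X|^2 over the unit disk.

    coeffs maps k >= 1 to (a_k, b_k) for X = sum_k r^k (a_k cos k*theta + b_k sin k*theta).
    """
    dr = 1.0 / nr
    dt = 2.0 * math.pi / nt
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) * dr
        for j in range(nt):
            th = (j + 0.5) * dt
            x_r = 0.0        # radial derivative
            x_t = 0.0        # tangential derivative (1/r) dX/dtheta
            for k, (a, b) in coeffs.items():
                x_r += k * r ** (k - 1) * (a * math.cos(k * th) + b * math.sin(k * th))
                x_t += k * r ** (k - 1) * (-a * math.sin(k * th) + b * math.cos(k * th))
            total += (x_r * x_r + x_t * x_t) * r * dr * dt
    return 0.5 * total

def dirichlet_series(coeffs):
    """(pi/2) * sum_k k (a_k^2 + b_k^2)."""
    return 0.5 * math.pi * sum(k * (a * a + b * b) for k, (a, b) in coeffs.items())

coeffs = {1: (1.0, 0.5), 2: (0.3, -0.2)}   # hypothetical Fourier coefficients
d_num = dirichlet_numeric(coeffs)
d_ser = dirichlet_series(coeffs)
```

The two values agree up to quadrature error, and the constant term $a_0$ does not enter, which is why boundedness of $|a_0|$ has to be argued separately for coerciveness.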
However, in order to apply the direct method, the notion of “compactness” is
of particular concern.
For a given $\Gamma$, $\hat X$ is merely a parametrization of $\Gamma$.
For a family of minimizing functions, their moduli of continuity are in fact moduli of continuity of $\Gamma$ under diffeomorphic transformations. The crucial step in proving the equicontinuity of a sequence of minimizing functions is that for a family of parametrizations of finite energy, intervals cannot be concentrated to a single point. The following lemma plays a key role.
Lemma 17.1 (Courant–Lebesgue) Let $X \in H^1(D,\mathbb{R}^n)$, then $\forall\, w \in \bar D$, $\forall\, \delta \in (0,1)$, $\exists\, \rho \in [\delta,\sqrt\delta]$ such that
\[ \int_{C_\rho} |\partial_s X|^2\,ds \le \frac{4D(X)}{\rho\,|\ln\delta|}, \]
where $\partial_s$ denotes the tangential derivative, $C_\rho$ is the arc whose center lies on the unit circle and whose radius is $\rho$ (see Figure 17.3).
Fig. 17.3
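Proof By Fubini's Theorem, for almost all $\rho \in (0,1)$, $|\partial_s X| \in L^2(C_\rho)$ and
\[ \begin{aligned}
2D(X) &\ge \int_{B_{\sqrt\delta}(w) \setminus B_\delta(w)} |\nabla X|^2\,dz \\
&\ge \int_\delta^{\sqrt\delta}\int_{C_\rho} |\partial_s X|^2\,ds\,d\rho \\
&\ge \operatorname*{ess\,inf}_{\delta < \rho < \sqrt\delta}\Big(\rho\int_{C_\rho} |\partial_s X|^2\,ds\Big)\int_\delta^{\sqrt\delta} \frac{d\rho}{\rho}.
\end{aligned} \]
Moreover, from
\[ \int_\delta^{\sqrt\delta} \frac{d\rho}{\rho} = \frac12|\ln\delta|, \]
the assertion now follows.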
As a consequence, we have the following.
Lemma 17.2 $\forall\, M > 0$, the set
\[ C_M = \{\hat X : \partial D \to \Gamma \mid \hat X = X|_{\partial D},\ X \in C^*(\Gamma),\ D(X) \le M\} \]
is equicontinuous.
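Proof Since $\Gamma$ is a Jordan curve, $\forall\, \varepsilon > 0$, $\exists\, d > 0$, $\forall\, P \neq P'$,
\[ |P - P'| < d \implies \text{among the two components of } \Gamma \setminus \{P,P'\}, \text{ at least one has diam} < \varepsilon. \]
1° $\forall\, \varepsilon > 0$, assume $\varepsilon < \min_{i \neq j} |P_i - P_j|$. Choose $d > 0$ as above, and choose $\delta > 0$ such that $\frac{8\pi M}{|\ln\delta|} < d^2$. $\forall\, X \in C_M$, $\forall\, w \in \partial D$, set $C_\rho = D \cap \partial B_\rho(w)$. By the Courant–Lebesgue lemma,
\[ \begin{aligned}
|X(w_1) - X(w_2)|^2 &\le [\operatorname{diam} X(C_\rho)]^2 \\
&\le \Big(\int_{C_\rho} |\partial_s X|\,ds\Big)^2 \\
&\le L(C_\rho)\int_{C_\rho} |\partial_s X|^2\,ds \\
&\le \frac{8\pi M}{|\ln\delta|}, \quad \forall\, w_1, w_2 \in C_\rho,
\end{aligned} \]
where $L(C_\rho)$ is the arclength of $C_\rho$. Hence, $\{w_1,w_2\} = \partial D \cap \partial B_\rho(w) \implies$ among the two components of $\Gamma \setminus \{X(w_1),X(w_2)\}$, at least one has diameter less than $\varepsilon$.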
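2° Choose $\delta > 0$ such that $\forall\, z \in \partial D$, at least two values of $k$ satisfy
\[ \big|z - e^{\frac{2\pi k}{3}i}\big| \ge \sqrt\delta. \]
Suppose $\partial D \setminus \{w_1,w_2\} = C_1 \cup C_2$, $\bar C_1 \cap \bar C_2 = \{w_1,w_2\}$, then at least one of $C_1$, $C_2$, say $C_1$, contains at most one point of $\{e^{\frac{2k\pi}{3}i} \mid k = 1,2,3\}$. Hence,
\[ \operatorname{diam} \hat X(C_1) < \varepsilon. \]
$\forall\, z, z' \in \partial D$, when $|z - z'| < \delta$, choose $w_0 \in \partial D$ and $\rho > 0$ such that $z, z' \in C_1$. Thus,
\[ |\hat X(z) - \hat X(z')| \le \operatorname{diam} \hat X(C_1) < \varepsilon \implies C_M \text{ is equicontinuous}. \]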
Theorem 17.3 If $\Gamma \in C^2$ is a Jordan curve, then $\exists\, X_0 \in C^*(\Gamma)$ such that
\[ D(X_0) = \inf_{X \in C^*(\Gamma)} D(X). \]
Proof We consider the functional $D(X)$ on $C^*(\Gamma)$
\[ \hat X \leftrightarrow X \mapsto D(X), \]
where the relation of X and X̂ is determined by (17.2) and (17.3). We already
know that it is both weakly sequentially lower semi-continuous and coercive, it
remains to show C ∗ (Γ) is weakly closed. Suppose {X̂j } converges to X̂0 weakly,
then their norms are bounded. By Lemma 17.2, the sequence is equicontinu-
ous. Furthermore, since the range of X̂ lies on Γ, {X̂j } is uniformly bounded.
By the Arzelà–Ascoli theorem, it has a subsequence {X̂j0 } which converges uni-
formly to a continuous function. Consequently, X0 : ∂D → Γ is continuous
and weakly monotone. Furthermore, it also satisfies the three-point condition, so
X0 ∈ C ∗ (Γ).
In summary, the Plateau problem of a minimal surface has a solution in C(Γ).
By Weyl’s theorem, the solution is C ∞ in D. As for the regularity of the solution
on D̄, it is beyond the scope of this book, hence omitted.
We now return to the proof of Theorem 17.2, that is, A(X) and D(X) share
the same infimum. We have shown previously A(X) ≤ D(X) and the equality
holds if and only if X is weakly conformal.
Lemma 17.3 Let Γ ∈ C 2 be a Jordan curve and X ∈ C(Γ) ∩ C 2 (D̄, Rn ),
then ∀ ε > 0, ∃ g ∈ H 1 ∩ C(D̄, R2 ) with g : D̄ → D̄ surjective and g :
∂D → ∂D monotone, such that Xε ◦ g is weakly conformal, where Xε (x, y) =
(X(x, y), εx, εy) ∈ C 2 (D̄, Rn+2 ).
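Proof Let
\[ S = \big\{g \in C^1(\bar D,\mathbb{R}^2) \mid g : \bar D \to \bar D \text{ is a diffeomorphism},\ g\big(e^{\frac{2k\pi}{3}i}\big) = e^{\frac{2k\pi}{3}i} \text{ for } 1 \le k \le 3\big\} \]
and $\bar S$ be the weak closure of $S$ in $H^1(D,\mathbb{R}^2)$. We also define
\[ E(g) = D(X_\varepsilon \circ g) = \frac12\int_D |((\nabla X_\varepsilon) \circ g) \cdot \nabla g|^2\,dxdy. \]
Since $|\nabla(X_\varepsilon \circ g)|^2 = |\nabla(X \circ g)|^2 + \varepsilon^2|\nabla g|^2$, we have
\[ E(g) \ge \varepsilon^2 D(g), \]
so $E$ is a weakly sequentially lower semi-continuous and coercive functional on $\bar S$. According to the proof of Theorem 17.3, $E$ has a minimum $g_0$ on $\bar S$, i.e.
\[ D(X_\varepsilon \circ g_0) = E(g_0) \le D(X_\varepsilon \circ g), \quad \forall\, g \in \bar S. \]
From (2), it follows that $X_\varepsilon \circ g_0$ is weakly conformal.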

At this point, we have already obtained
\[ A(X_\varepsilon \circ g_0) = D(X_\varepsilon \circ g_0). \]
Although $A(X_\varepsilon \circ g) = A(X_\varepsilon)$ is true for any diffeomorphism $g$, $g_0 \in \bar S$ may not be a diffeomorphism. In the following, we endeavor to extend the diffeomorphic invariance of the area to $\bar S$. If we are successful, then the proof of Theorem 17.2 is complete. But first, let us study $\bar S$.
Lemma 17.4 S̄ ⊂ C(D̄, D̄) ∩ C(∂D, ∂D). ∀ g ∈ S̄, g : D̄ → D̄ is surjective
and it is weakly monotone on ∂D; furthermore, it also satisfies the three-point
condition.
Proof 1. For boundary points, by the Courant–Lebesgue lemma, or for $\Gamma = \partial D$ directly by Lemma 17.2, we have: $\forall\, \varepsilon > 0$, $\exists\, \rho > 0$ such that
\[ \sup_{w,w' \in C_\rho} |g(w) - g(w')| < \varepsilon, \]
where $C_\rho = \partial B_\rho(w_0) \cap D$.
2. For interior points, since $g \in S$ is a diffeomorphism, it maps $B_\rho(w_0) \cap D$ into a small neighborhood enclosed by $g(C_\rho)$, so
\[ \begin{aligned}
\sup_{|w - w'| < \delta} |g(w) - g(w')| &\le \sup_{w_0 \in \bar D}\ \sup_{w,w' \in B_\rho(w_0) \cap D} |g(w) - g(w')| \\
&= \sup_{w_0 \in \bar D}\ \sup_{w,w' \in C_\rho} |g(w) - g(w')| < \varepsilon.
\end{aligned} \]
This implies that every $H^1$-bounded subset of $S$ is equicontinuous.
3. $\forall\, g \in \bar S$, $\exists\, \{g_j\}_1^\infty \subset S$ such that $g_j \rightharpoonup g$ in $H^1(D)$. We have proved earlier that $\{g_j\}$ is equicontinuous on $\bar D$. According to the Arzelà–Ascoli theorem, $g_j \to g$ uniformly on $\bar D$, so $g \in C(\bar D,\bar D)$ is surjective and $g \in C^*(\partial D)$.
By adding differentiability to X, we can further extend this invariance.
Lemma 17.5 Let Γ ∈ C 2 be a Jordan curve, then ∀ g ∈ S̄, ∀ X ∈ C ∗ (Γ) ∩
C 2 (D̄, Rn ), we have X ◦ g ∈ C ∗ (Γ) and A(X ◦ g) = A(X).
Proof From X ∈ C ∗ (Γ) and Lemma 17.4, g ∈ C ∗ (∂D), hence X ◦ g ∈ C ∗ (Γ).
We already knew the area functional is invariant under diffeomorphisms, we
now extend this property to S̄ via weak limits. To do so, we divide into two steps.
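1. Fix $w = h(z)$ with $h \in \bar S$; $\forall\, g \in S$, consider the integral
\[ \int_D |X_u \times X_v|_{w=h(z)}\det(\nabla g(z))\,dz. \]
Noting
\[ \det(\nabla g) = \partial_x(g_1\partial_y g_2) - \partial_y(g_1\partial_x g_2), \]
and using integration by parts, it yields
\[ \begin{aligned}
2\int_D |X_u \times X_v|_{w=h(z)}\det(\nabla g)\,dxdy ={}& -\int_D [G_h(g_1\partial_y g_2 - g_2\partial_y g_1) - H_h(g_1\partial_x g_2 - g_2\partial_x g_1)]\,dxdy \\
& + \int_{\partial D} |X_u \times X_v|_{w=h(e^{i\theta})}\,\psi'(\theta)\,d\theta,
\end{aligned} \]
where
\[ \begin{aligned}
\psi'(\theta) &= g_1\partial_\theta g_2 - g_2\partial_\theta g_1, \\
G_h &= |X_u \times X_v|_u\,\partial_x h_1 + |X_u \times X_v|_v\,\partial_x h_2, \\
H_h &= |X_u \times X_v|_u\,\partial_y h_1 + |X_u \times X_v|_v\,\partial_y h_2.
\end{aligned} \]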
Since Gh , Hh ∈ L2 (D), the area integral on the right-hand side of the equation
can be extended to g ∈ S̄.
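In the line integral on the right-hand side of the equation, we notice
\[ g_1^2 + g_2^2 = 1, \]
whence
\[ \psi(\theta) = \arctan\frac{g_2(e^{i\theta})}{g_1(e^{i\theta})}. \]
When $g \in \bar S$, $g \in C(\bar D,\bar D)$ and $g : \partial D \to \partial D$ is monotone, i.e. $\psi$ is monotone. So $\psi'(\theta)$ exists almost everywhere. In addition,
\[ \int_{\partial D} |\psi'(\theta)|\,d\theta = 2\pi. \]
From this, we conclude $\psi \in W^{1,1}(\partial D)$. This means the line integral on the right-hand side of the equation can also be extended to $g \in \bar S$. Namely, if $\{g^j\} \subset S$, $g \in \bar S$ such that $g^j \rightharpoonup g$, then
\[ \begin{aligned}
2\int_D |X_u \times X_v|_{w=g(z)}\det(\nabla g)\,dxdy ={}& \lim_{j \to \infty}\Big[-\int_D [G_g(g_1^j\partial_y g_2^j - g_2^j\partial_y g_1^j) - H_g(g_1^j\partial_x g_2^j - g_2^j\partial_x g_1^j)]\,dxdy \\
& \qquad + \int_{\partial D} |X_u \times X_v|_{w=g(e^{i\theta})}\,\psi_j'(\theta)\,d\theta\Big] \\
={}& \lim_{j \to \infty} 2\int_D |X_u \times X_v|_{w=g(z)}\det(\nabla g^j)\,dxdy,
\end{aligned} \]
where $\psi_j'(\theta) = g_1^j\partial_\theta g_2^j - g_2^j\partial_\theta g_1^j$.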
2. We now prove: if {g_j} ⊂ S, g_j ⇀ g in H¹ ∩ C(D̄, D̄), then
\[
\int_D \big[\,|X_u\times X_v|_{w=g_j(z)} - |X_u\times X_v|_{w=g(z)}\big]\det(\nabla g_j)\,dx\,dy \to 0 .
\]
On one hand, Lemma 17.4 implies g_j → g uniformly on D̄, hence
\[
|X_u\times X_v|_{w=g_j(z)} - |X_u\times X_v|_{w=g(z)} \to 0 \quad\text{uniformly on } \bar D .
\]
On the other hand, we have
\[
\int_D |\det(\nabla g_j)|\,dx\,dy \le \int_D |\nabla g_j|^2\,dx\,dy \le M,
\]
so the assertion follows.
3. From
A(X ◦ g) = A(X), ∀ g ∈ S,
it follows readily that
A(X ◦ g) = A(X), ∀ g ∈ S̄.
Proof of Theorem 17.2 For a Jordan curve Γ satisfying the differentiability assumption, solving the harmonic equation with Γ as boundary value shows that C(Γ) ≠ ∅.
From our earlier analysis, it suffices to show inf D(X) ≤ inf A(X). Combining Lemmas 17.3 and 17.5, for X ∈ C∗(Γ) ∩ C²(D̄, Rn), ∃ g ∈ S̄ such that
\[
A(X_\varepsilon) = A(X_\varepsilon\circ g) = D(X_\varepsilon\circ g).
\]
Since
\[
D(X\circ g) \le D(X_\varepsilon\circ g)
\]
and
\[
\lim_{\varepsilon\to 0} A(X_\varepsilon) = A(X),
\]
it follows that
\[
\inf_{X\in C(\Gamma)} D(X) \le \inf_{X\in C(\Gamma)\cap C^2(\bar D,\mathbb R^n)} A(X).
\]
Moreover, since Γ ∈ C², C²(D̄, Rn) ∩ C(Γ) is dense in C(Γ); this completes the proof.
Lecture 18
Numerical methods for variational problems
18.1 The Ritz method
The direct method for solving a variational problem consists of two main parts:
(1) the construction of a minimizing sequence,
(2) taking the limit of the minimizing sequence in order to obtain a solution.
L. Rayleigh and W. Ritz proposed a numerical method for finding a minimizing sequence, known as the Rayleigh–Ritz method. Let M0 be a linear subspace of the function space X over some domain Ω, and let I be the integral functional
\[
I(u) = \int_\Omega L(x, u(x), \nabla u(x))\,dx
\]
on Ω, whose domain is M = ϕ0 + M0, where ϕ0 is a function on Ω (determined by its inhomogeneous boundary value).
The idea of the Rayleigh–Ritz method is to choose a complete basis
{e1 , e2 , . . .} for M0 and let
Mn = ϕ0 + span{e1 , e2 , . . . , en } ⊂ M.
We then search, on Mn, for the minimum ϕn ∈ Mn of the functional I. Since any function u in Mn can be expressed as
\[
u = \varphi_0 + \sum_{i=1}^n \xi_i e_i, \qquad \xi = (\xi_1,\dots,\xi_n)\in\mathbb R^n, \tag{18.1}
\]
I|Mn is in fact a function of the n real variables ξ1, . . . , ξn:
\[
I(u) = I\Big(\varphi_0 + \sum_{i=1}^n \xi_i e_i\Big). \tag{18.2}
\]
Consequently, we can find the minimum ϕn of I|Mn either by directly applying an optimization technique, or by solving the system of n equations
\[
\frac{\partial}{\partial \xi_j}\, I\Big(\varphi_0 + \sum_{i=1}^n \xi_i e_i\Big) = 0, \qquad j = 1, 2, \dots, n. \tag{18.3}
\]
Closely related to this idea is the Ritz–Galerkin method. However, this method is not targeted at the functional I, but rather at its E-L equation
\[
\delta I(u,\varphi) = \int_\Omega \big(L_u(\tau)\,\varphi(x) + L_p(\tau)\,\nabla\varphi(x)\big)\,dx = 0, \qquad \forall\,\varphi\in M_0,
\]
where τ = (x, u(x), ∇u(x)). On M, we write u in the form of (18.1), then by choosing ϕ = ej, j = 1, . . . , n, successively, we obtain
\[
\int_\Omega \big(L_u(\tau_n(x))\,e_j(x) + L_p(\tau_n(x))\,\nabla e_j(x)\big)\,dx = 0, \qquad j = 1,\dots,n, \tag{18.4}
\]
where
\[
\tau_n(x) = \Big(x,\ \varphi_0(x) + \sum_{i=1}^n \xi_i e_i(x),\ \nabla\varphi_0(x) + \sum_{i=1}^n \xi_i \nabla e_i(x)\Big).
\]
Note that (18.4) is also a system of n equations in the n variables ξ = (ξ1, . . . , ξn).

Example 18.1 Let f ∈ L²(Ω) and define the functional
\[
I(u) = \int_\Omega \Big(\frac12 |\nabla u|^2 - f\cdot u\Big)\,dx, \qquad u\in H_0^1(\Omega). \tag{18.5}
\]
Let Vn = span{e1, . . . , en} be an n-dimensional linear subspace of H01(Ω). According to the Rayleigh–Ritz method, we convert it to the problem of finding the minimal value of
\[
\begin{aligned}
J_n(\xi_1,\dots,\xi_n) &= \int_\Omega \Big[\frac12\Big|\sum_{i=1}^n \xi_i \nabla e_i(x)\Big|^2 - f(x)\sum_{i=1}^n \xi_i e_i(x)\Big]\,dx \\
&= \frac12\sum_{i,j=1}^n a_{ij}\,\xi_i\xi_j - \sum_{j=1}^n b_j\,\xi_j,
\end{aligned}
\]
where a_{ij} = ∫_Ω ∇e_i(x) · ∇e_j(x) dx for i, j = 1, . . . , n, and b_j = ∫_Ω f(x) e_j(x) dx for j = 1, . . . , n.
Setting ∂J_n/∂ξ_i = 0, this is equivalent to solving the following system of linear equations:
\[
\sum_{j=1}^n a_{ij}\,\xi_j = b_i, \qquad i = 1, 2, \dots, n. \tag{18.6}
\]
Since the corresponding E-L equation is
\[
\int_\Omega \big[\nabla u\cdot\nabla\varphi - f\cdot\varphi\big]\,dx = 0, \qquad \forall\,\varphi\in H_0^1(\Omega),
\]
according to the Ritz–Galerkin method, we must solve the system of equations
\[
\sum_{i=1}^n \xi_i \int_\Omega \nabla e_i(x)\cdot\nabla e_j(x)\,dx - \int_\Omega f(x)\,e_j(x)\,dx = 0, \qquad j = 1,\dots,n,
\]
namely,
\[
\sum_{i=1}^n a_{ij}\,\xi_i - b_j = 0, \qquad j = 1,\dots,n. \tag{18.7}
\]
Since aij = aji for all i and j, (18.6) and (18.7) are identical to each other.
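As a hedged illustration of Example 18.1, the following minimal sketch assembles and solves (18.6) in one dimension. Everything specific here is an assumption made for the illustration and not taken from the text: Ω = (0, 1), the load f(x) = x, the polynomial basis e_k(x) = x^k(1 − x), and the helper names `stiffness`, `load`, `solve` and `ritz_coefficients`. With these choices the exact solution of −u″ = x, u(0) = u(1) = 0, is u = (x − x³)/6 = (e_1 + e_2)/6, so the Ritz coefficients can be checked by hand.

```python
from fractions import Fraction


def stiffness(i, j):
    # a_ij = integral over (0,1) of e_i'(x) e_j'(x), with e_k(x) = x^k (1 - x),
    # computed in closed form from e_k'(x) = k x^(k-1) - (k+1) x^k.
    F = Fraction
    return (F(i * j, i + j - 1) - F(i * (j + 1), i + j)
            - F((i + 1) * j, i + j) + F((i + 1) * (j + 1), i + j + 1))


def load(j):
    # b_j = integral over (0,1) of f(x) e_j(x) for the assumed load f(x) = x.
    return Fraction(1, j + 2) - Fraction(1, j + 3)


def solve(A, b):
    # Gaussian elimination in exact rational arithmetic.
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= m * M[c][k]
    x = [Fraction(0)] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def ritz_coefficients(n):
    # Assemble and solve the linear system (18.6) on span{e_1, ..., e_n}.
    A = [[stiffness(i, j) for j in range(1, n + 1)] for i in range(1, n + 1)]
    b = [load(j) for j in range(1, n + 1)]
    return solve(A, b)


xi = ritz_coefficients(3)
print(xi)
```

Because the exact solution already lies in the span of e_1 and e_2, the third coefficient vanishes; for a load outside the span one would instead observe the coefficients settling as n grows.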
Remark 18.1 The Ritz–Galerkin method is applicable not only to the E-L equation of a functional, but also to all differential equations whose weak solutions are expressed in terms of integrals. For example, suppose aij, bj, and c are continuous functions. We define a weak solution u ∈ H01(Ω) of the equation
\[
\begin{cases}
\displaystyle\sum_{i,j=1}^n \frac{\partial}{\partial x_i}\Big(a_{ij}(x)\,\frac{\partial}{\partial x_j}u(x)\Big) + \sum_{j=1}^n b_j(x)\,\frac{\partial}{\partial x_j}u(x) + c(x)\,u(x) = f(x), & x\in\Omega,\\[2mm]
u|_{\partial\Omega} = 0
\end{cases}
\tag{18.5$'$}
\]
to be a function which, ∀ ϕ ∈ H01(Ω), satisfies the integral equation
\[
\int_\Omega \Big[-\sum_{i,j=1}^n a_{ij}(x)\,\frac{\partial u(x)}{\partial x_j}\frac{\partial\varphi(x)}{\partial x_i} + \Big(\sum_{j=1}^n b_j(x)\,\frac{\partial u(x)}{\partial x_j} + c(x)\,u(x) - f(x)\Big)\varphi(x)\Big]\,dx = 0 .
\]
By substituting u = Σ_{k=1}^n ξk ek(x) and ϕ(x) = el(x), l = 1, 2, . . . , n, we can again obtain a system of n linear equations in n variables.
18.2 The finite element method
In view of the previous discussion, we are confronted with the challenge of how to construct such finite dimensional approximating subspaces. We propose a criterion according to which a sequence of finite dimensional subspaces approximating any given space can be constructed. For example, in variational problems we frequently encounter the space H01(Ω) (the Dirichlet problem with zero boundary value) and the space H¹(Ω) (the zero Neumann boundary value). We follow the steps outlined below to construct function spaces on Ω as finite dimensional subspaces of the above mentioned spaces.
For simplicity, we assume the domain Ω is a bounded polyhedron. In other
words, ∂Ω consists of piecewise hyperplanes. We select spaces consisting of
piecewise linear functions to approximate H01 (Ω) (or H 1 (Ω)).
We first introduce some terminology. We call K ⊂ Rn an n-simplex, denoted
K = {P0 , P1 , . . . , Pn }, if it is the closure of the convex hull of the n + 1 points
{P0 , P1 , . . . , Pn }, where Pi ∈ Rn for i = 0, 1, . . . , n such that
{Pi − P0 | 1 ≤ i ≤ n}
are linearly independent.

It is clear a 0-simplex is a point, a 1-simplex is a line segment, and a 2-simplex
is a triangle, etc.
We call J = {K1, . . . , Km} a triangulation of Ω if
(1) Ki ∩ Kj = ∅ or a p-simplex for 0 ≤ p ≤ n − 1 (i ≠ j),
(2) Ω = ⋃_{Ki ∈ J} Ki.
Denote hJ = max{diam(Ki) | 1 ≤ i ≤ m} and we call it the mesh of J.
For the space H¹(Ω), the vertices of {Ki}_{i=1}^m are called nodes; they are denoted by N = {Nj | 1 ≤ j ≤ M}.
For the space H01(Ω), we ignore all vertices on ∂Ω and take only the vertices in the interior of Ω; they are denoted by N = {Nj | 1 ≤ j ≤ M0}.
We correlate the nodes with the dual basis for the finite dimensional approximating subspace Vh as follows.
Choose piecewise affine functions as elements of Vh with basis functions {ei}_{i=1}^M; they are determined by the node set via
ei(Nj) = δij, i, j = 1, 2, . . . , M.
In fact, for any simplex K ∈ J, the value of the basis function ei on K is completely determined by its value at each vertex (if the vertex lies on ∂Ω, then set the value to zero for the homogeneous Dirichlet boundary condition and to a fixed constant for an inhomogeneous Dirichlet boundary condition).
In particular, taking K = {P0, . . . , Pn}, P_i = (p_i^1, . . . , p_i^n) for i = 0, 1, . . . , n, we define n + 1 functions via
\[
\begin{cases}
v_j(x) = x^j, & j = 1, 2, \dots, n,\\
v_0(x) = 1,
\end{cases}
\]
where x = (x^1, . . . , x^n). Since each of these functions is affine on K, it coincides with its own interpolant there, and it follows that
\[
\begin{cases}
x^j = \displaystyle\sum_{i=0}^n p_i^j\, e_i(x),\\[2mm]
1 = \displaystyle\sum_{i=0}^n e_i(x),
\end{cases}
\tag{18.8}
\]
where e_0, . . . , e_n denote the basis functions attached to the vertices P_0, . . . , P_n.
Thus, ∀ i, the support of ei is ⋃_{K∈J} {K | Ni ∈ K}; namely, the union of the simplices which have Ni as a vertex.
Fig. 18.1
An arbitrary function v ∈ Vh can be expressed as
\[
v(x) = \sum_{j=1}^M v(N_j)\,e_j(x).
\]
We call (Ω, Vh, N) a finite element (see Figure 18.1).

For a given function u ∈ C(Ω̄), how do we construct an element in Vh to
approximate u? A natural yet simple way is to use interpolation.
Denote by Πu the corresponding element of u in Vh. ∀ K ∈ J, denote ΠK u = Πu|K. Since the functions in Vh are piecewise affine, ΠK u is completely determined by the values of u at the vertices of K. Let
\[
\Pi_K u(x) = \sum_{j=0}^n u(P_j)\,e_j(x),
\]
where K = {P0, . . . , Pn}.
Consequently, Π : C(Ω̄) → H01(Ω) is a bounded linear operator with
\[
\|\Pi u\|_{H^1} \le C\,\|u\|_{C(\bar\Omega)},
\]
where C = n sup_{1≤j≤M} ‖e_j‖_{H¹}.
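Lemma 18.1 Let u ∈ C²(Ω̄), then
\[
\|\nabla u - \nabla(\Pi u)\|_{1,\infty} \le \frac{n^2(n+1)}{2}\,\frac{h_J^2}{\rho_J}\,\|\nabla^2 u\|_\infty, \tag{18.9}
\]
\[
\|u - \Pi u\|_\infty \le \frac{n^2(n+1)}{2}\,h_J^2\,\|\nabla^2 u\|_\infty, \tag{18.10}
\]
where ‖·‖_{1,∞} is the norm on W^{1,∞}, ‖·‖_∞ is the norm on L^∞,
\[
h_J = \sup_{K\in J} h_K, \qquad h_K = \operatorname{diam}(K),
\]
and
\[
\rho_J = \inf_{K\in J} \rho_K, \qquad \rho_K = \sup\{2R \mid B_R(x)\subset K,\ x\in K\}.
\]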
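Proof It suffices to consider the difference of ∇u(x) and ∇(ΠK u)(x) on each simplex K = {P0, P1, . . . , Pn}. Expanding each u(Pj) in a Taylor series about x, we have
\[
\begin{aligned}
\nabla(\Pi_K u)(x) &= \sum_{j=0}^n u(P_j)\,\nabla e_j(x) \\
&= \sum_{j=0}^n \Big[u(x) + \nabla u(x)(P_j - x) + \frac12 \nabla^2 u(\xi_x)(P_j - x)^2\Big]\nabla e_j(x),
\end{aligned}
\tag{18.11}
\]
where ξx ∈ K. Differentiating (18.8), we obtain
\[
\begin{cases}
\delta_k^j = \displaystyle\sum_{i=0}^n p_i^j\,\frac{\partial e_i}{\partial x^k}, & j, k = 1, 2, \dots, n,\\[2mm]
0 = \displaystyle\sum_{i=0}^n \frac{\partial e_i}{\partial x^k}.
\end{cases}
\]
Substituting into (18.11) yields
\[
\nabla(\Pi_K u)(x) = \nabla u(x) + \sum_{j=0}^n R_j(x)\,\nabla e_j(x),
\]
where
\[
R_j(x) = \frac12 \nabla^2 u(\xi_x)(P_j - x)^2 .
\]
Since
\[
\|P_j - x\| \le \operatorname{diam} K \le h_J, \qquad \Big|\frac{\partial e_i}{\partial x^k}\Big| \le \frac{1}{\rho_K} \le \frac{1}{\rho_J},
\]
this validates (18.9):
\[
\|\nabla u - \nabla(\Pi u)\|_{1,\infty} \le \frac{n^2(n+1)}{2}\,\frac{h_J^2}{\rho_J}\,\|\nabla^2 u\|_\infty .
\]
Likewise, we can prove (18.10).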
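The sup-norm estimate (18.10) can be checked numerically in the simplest setting. The sketch below is only an illustration under assumptions not made in the text: dimension n = 1 (so the constant n²(n + 1)/2 equals 1), a uniform mesh on [0, π], the test function u = sin (for which ‖u″‖∞ = 1), and a sampled rather than exact supremum. The helper name `interp_error` is hypothetical.

```python
import math


def interp_error(u, a, b, m, samples=50):
    # Sampled sup-norm distance between u and its piecewise-linear
    # interpolant on a uniform mesh of m subintervals of [a, b].
    h = (b - a) / m
    err = 0.0
    for k in range(m):
        x0 = a + k * h
        u0, u1 = u(x0), u(x0 + h)
        for s in range(samples + 1):
            t = s / samples
            exact = u(x0 + t * h)
            interpolant = (1 - t) * u0 + t * u1
            err = max(err, abs(exact - interpolant))
    return h, err


# Mesh sizes h = pi/8, pi/16, pi/32 for u = sin on [0, pi].
errors = [interp_error(math.sin, 0.0, math.pi, m) for m in (8, 16, 32)]
for h, e in errors:
    print(f"h = {h:.4f}   error = {e:.6f}   h^2 = {h * h:.6f}")
```

Halving the mesh size should divide the measured error by roughly four, in line with the h_J² factor in (18.10); the measured values sit well below h², since the constant in the lemma is not sharp.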
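Theorem 18.1 Let u ∈ C(Ω̄) ∩ H²(Ω). Suppose for a simplicial subdivision J, there exists β > 0 such that
\[
\frac{\rho_K}{h_K} \ge \beta, \qquad \forall\,K\in J, \tag{18.12}
\]
then
\[
\|u - \Pi u\|_{L^2} \le C\,h_J^2\,|u|_{H^2}
\]
and
\[
|u - \Pi u|_{H^1} \le C_\beta\,h_J\,|u|_{H^2},
\]
with the semi-norm
\[
|v|_{H^r} = \Big(\sum_{|\alpha|=r}\int_\Omega |D^\alpha v|^2\,dx\Big)^{1/2}, \qquad r = 1, 2.
\]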
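Proof According to Lemma 18.1, for any simplex K ∈ J we have
\[
\|u - \Pi u\|_{L^2(K)} \le C\,h_K^2\,|u|_{H^2(K)}
\]
and
\[
|u - \Pi u|_{H^1(K)} \le C\,\frac{h_K^2}{\rho_K}\,|u|_{H^2(K)} .
\]
Thus,
\[
\|u - \Pi u\|_{L^2(\Omega)}^2 = \sum_{K\in J}\|u - \Pi_K u\|_{L^2(K)}^2 \le \sum_{K\in J} C^2 h_K^4\,|u|_{H^2(K)}^2 \le C^2 h_J^4\,|u|_{H^2(\Omega)}^2
\]
and, using (18.12) in the form h_K/ρ_K ≤ 1/β,
\[
|u - \Pi u|_{H^1(\Omega)}^2 = \sum_{K\in J}|u - \Pi_K u|_{H^1(K)}^2 \le \sum_{K\in J} C^2\,\frac{h_K^4}{\rho_K^2}\,|u|_{H^2(K)}^2 \le C^2\,\frac{1}{\beta^2}\,h_J^2\,|u|_{H^2(\Omega)}^2 .
\]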
Remark 18.2 The geometric meaning of condition (18.12) is that in a simplicial
subdivision, the angle of each simplex cannot be arbitrarily small.
Remark 18.3 The advantage of adopting the interpolation method in construct-
ing approximating functions is evident not only in theoretical research but also in
practical computations. The reason is that the interpolation method only requires the solution to be continuous, i.e. u ∈ C(Ω̄), whereas the resulting approximation is in the sense of H¹. In particular, for linear second order elliptic equations, u ∈ H²(Ω). When n ≤ 3, by the embedding theorem, H²(Ω) ↪ C(Ω̄).
18.3 Cea’s theorem
We begin this section with an abstract theorem. Let H be a Hilbert space equipped with an inner product (·, ·) (whose induced norm is ‖·‖). Let V be a closed linear subspace of H, and let F ∈ V∗ be given.
Suppose
(1) a(u, v) is a bounded bilinear form on H, i.e.
|a(u, v)| ≤ C‖u‖‖v‖,
a(α1 u1 + α2 u2, v) = α1 a(u1, v) + α2 a(u2, v), ∀ α1, α2 ∈ R1,
a(u, β1 v1 + β2 v2) = β1 a(u, v1) + β2 a(u, v2), ∀ β1, β2 ∈ R1.

(2) a is coercive on V, i.e. ∃ α > 0 such that
a(v, v) ≥ α‖v‖², ∀ v ∈ V.
(3) Vh is a finite dimensional closed subspace of V .
According to the Lax–Milgram theorem, under assumptions (1) and (2), there
exists a unique u ∈ V such that
a(u, v) = F (v), ∀ v ∈ V.
Using the Ritz–Galerkin method, we can obtain a unique approximation solution
uh such that
a(uh , v) = F (v), ∀ v ∈ Vh .
We ask in what sense uh approximates u.
Theorem (Cea) Under the assumptions (1)–(3), if u ∈ V is the unique solution
of
a(u, v) = F (v), ∀v ∈ V
and if uh ∈ Vh is the solution of the approximated equation
a(uh , v) = F (v), ∀ v ∈ Vh ,
then
\[
\|u - u_h\| \le \frac{C}{\alpha}\,\min_{v\in V_h}\|u - v\| .
\]
Proof Since
a(u, v) = F (v), ∀ v ∈ V,
a(uh , v) = F (v), ∀ v ∈ Vh ,
it follows that
a(u − uh , v) = 0, ∀ v ∈ Vh .
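In particular, for any v ∈ Vh we have v − uh ∈ Vh, so the second term below vanishes and
\[
\begin{aligned}
\alpha\|u - u_h\|^2 &\le a(u - u_h, u - u_h) \\
&= a(u - u_h, u - v) + a(u - u_h, v - u_h) \\
&\le C\,\|u - u_h\|\,\|u - v\| .
\end{aligned}
\]
This means
\[
\|u - u_h\| \le \frac{C}{\alpha}\,\|u - v\|, \qquad \forall\,v\in V_h,
\]
i.e.
\[
\|u - u_h\| \le \frac{C}{\alpha}\,\min_{v\in V_h}\|u - v\| .
\]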
We now return to the linear second order elliptic equation (18.5′) mentioned in Remark 18.1. Suppose α, β, γ > 0 satisfy β² < αγ and
(1) Σ_{i,j=1}^n aij(x) ξi ξj ≥ α|ξ|², ∀ ξ = (ξ1, . . . , ξn) ∈ Rn, and |aij(x)| ≤ C for 1 ≤ i, j ≤ n, ∀ x ∈ Ω,
(2) c(x) ≥ γ, ∀ x ∈ Ω,
(3) max_{1≤i≤n} ‖bi‖∞ < 2β.
Let δ = α − β²/γ, then δ > 0. We also let
\[
a(u, v) = \int_\Omega \Big[\sum_{i,j=1}^n a_{ij}(x)\,\partial_j u\,\partial_i v + \sum_{i=1}^n b_i(x)\,\partial_i u\cdot v + c(x)\,u\cdot v\Big]\,dx,
\]
for u, v ∈ H01(Ω).

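The so-defined a(u, v) is now a bounded bilinear form on H01(Ω). Furthermore, from the inequality
\[
\Big|\int_\Omega \sum_{i=1}^n b_i(x)\,\partial_i u(x)\cdot u(x)\,dx\Big| \le \max_{1\le i\le n}\|b_i\|_\infty\,\|\nabla u\|_2\,\|u\|_2,
\]
it follows that a(u, v) is coercive on H01(Ω), as affirmed by
\[
a(v, v) \ge \alpha\|\nabla v\|_2^2 - 2\beta\|\nabla v\|_2\,\|v\|_2 + \gamma\|v\|_2^2 \ge \delta\|\nabla v\|_2^2 ,
\]
where the last step follows by completing the square: αs² − 2βst + γt² = δs² + γ(t − βs/γ)² ≥ δs².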
Now, ∀ f ∈ L2 (Ω), by the Lax–Milgram theorem, we obtain a unique solution
u ∈ H01 (Ω) which satisfies
Z
a(u, v) = f · v dx, ∀ v ∈ H01 (Ω).
Ω
Moreover, by the regularity of elliptic equations, we know u ∈ H²(Ω) and there exists a constant C such that
\[
|u|_{H^2(\Omega)} \le C\,\|f\|_{L^2(\Omega)} .
\]
If we apply the Ritz–Galerkin method to find the approximate solution, adopting the previously discussed finite element (Ω, Vh, N), we obtain uh ∈ Vh. Lastly, according to Cea's theorem and Theorem 18.1, we have:
\[
\|u - u_h\|_{H_0^1(\Omega)} \le \frac{C}{\delta}\,\min_{v\in V_h}\|u - v\|_{H_0^1(\Omega)} \le \frac{C}{\delta}\,h_J\,|u|_{H^2(\Omega)} \le \frac{C'}{\delta^2}\,h_J\,\|f\|_{L^2(\Omega)} .
\]
In summary, using the finite element method, the interpolated function is not only
an approximation of the original function in the sense of H01 -norm, but as an
approximated solution, it is also an approximation of the real solution u in the
sense of H01 -norm. The accuracy of the approximation is proportional to the mesh
size hJ .
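The first-order convergence in the H¹ sense can be observed on a small model problem. The following is a sketch under assumptions that are not part of the text: the one-dimensional problem −u″ + u = f on (0, 1) with u(0) = u(1) = 0, the manufactured solution u(x) = sin(πx), a uniform mesh, piecewise-linear hat functions, and the hypothetical helper name `fem_h1_error`. The load integrals are evaluated in closed form for this particular f.

```python
import math


def fem_h1_error(m):
    # Piecewise-linear finite elements on a uniform mesh of m cells for
    # -u'' + u = f on (0,1), u(0) = u(1) = 0, exact solution sin(pi x).
    h = 1.0 / m
    n = m - 1                                  # number of interior nodes
    diag = [2.0 / h + 4.0 * h / 6.0] * n       # a(e_j, e_j)
    off = [-1.0 / h + h / 6.0] * (n - 1)       # a(e_j, e_{j+1})
    # Exact load: integral of f e_j with f = (pi^2 + 1) sin(pi x).
    w = (math.pi ** 2 + 1.0) * 2.0 * (1.0 - math.cos(math.pi * h)) / (math.pi ** 2 * h)
    rhs = [w * math.sin(math.pi * (j + 1) * h) for j in range(n)]
    # Thomas algorithm for the symmetric tridiagonal Galerkin system.
    d, r = diag[:], rhs[:]
    for k in range(1, n):
        t = off[k - 1] / d[k - 1]
        d[k] -= t * off[k - 1]
        r[k] -= t * r[k - 1]
    U = [0.0] * n
    U[-1] = r[-1] / d[-1]
    for k in range(n - 2, -1, -1):
        U[k] = (r[k] - off[k] * U[k + 1]) / d[k]
    nodal = [0.0] + U + [0.0]
    # H^1 seminorm of u - u_h via two-point Gauss quadrature on each cell.
    g = 0.5 / math.sqrt(3.0)
    total = 0.0
    for k in range(m):
        slope = (nodal[k + 1] - nodal[k]) / h
        mid = (k + 0.5) * h
        for x in (mid - g * h, mid + g * h):
            total += 0.5 * h * (math.pi * math.cos(math.pi * x) - slope) ** 2
    return math.sqrt(total)


e8, e16, e32 = fem_h1_error(8), fem_h1_error(16), fem_h1_error(32)
print(e8, e16, e32)
```

Each halving of the mesh size should roughly halve the error, which is the behaviour the estimate predicts for a fixed right-hand side.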
18.4 An optimization method — the conjugate gradient method
Optimization is a common technique for finding the extremal values of a function numerically. Given a real-valued function f : Rn → R1, we intend to design an iterative algorithm producing a minimizing sequence x^0, x^1, . . . , x^i, . . . which will lead to the minimal value of f.
Starting from a point x^i, we want to choose x^{i+1} carefully so that the function value decreases in each step of the iteration, i.e. f(x^{i+1}) < f(x^i), if possible. In passing from x^i to x^{i+1}, we must take into account two important factors, the direction and the step length:
\[
x^{i+1} - x^i = \lambda^i\sigma^i, \qquad \lambda^i > 0, \quad \|\sigma^i\| = 1, \tag{18.13}
\]
where σ^i denotes the direction of descent and λ^i its step length (see Figure 18.2).
We linearize f near x^i by
\[
f(x) = f(x^i) + (\nabla f(x^i), x - x^i) + o(\|x - x^i\|).
\]
Locally speaking, if we choose the direction of the negative gradient
\[
\sigma^i = -\nabla f(x^i)/\|\nabla f(x^i)\|, \tag{18.14}
\]
then it gives the steepest descent. The step length λ^i can then be computed by minimizing the single variable function
\[
\varphi(t) = f(x^i + t\sigma^i) \tag{18.15}
\]
in the direction of σ^i.
Fig. 18.2
This iterative algorithm is known as the gradient method, which is outlined as follows.
1. Given an initial point x^0.
2. Starting from x^i, use (18.14) to compute σ^i; then use
\[
\varphi'(\lambda^i) = 0, \qquad \varphi(\lambda^i) < \varphi(\lambda), \quad \forall\,\lambda\in[0,\lambda^i),
\]
to compute λ^i > 0, where φ(t) is defined as in (18.15).
3. Use (18.13) to compute x^{i+1} based on x^i, σ^i, and λ^i.
If ∇f(x^{i+1}) = 0, then stop and x^{i+1} is the desired point. Otherwise, continue.
It is not difficult to show, if f is coercive with only one minimum, then starting
from any initial point x0 , the gradient method will either reach the minimum in
finite steps, or it will produce a minimizing sequence whose limit is the minimum.
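However, the gradient method is not ideal, since the rate of convergence could be painfully slow. The following example clearly portrays this weakness. Let
\[
f(x_1, x_2) = x_1^2 + \frac{1}{\varepsilon}\,x_2^2, \qquad \varepsilon > 0.
\]
This is a quadratic function whose level curves are all ellipses and whose minimum is at (0, 0). Starting from x^i = (x_1^i, x_2^i), by direct calculation, we have
\[
x^{i+1} = \Big((1 + \lambda^i)\,x_1^i,\ \Big(1 + \frac{\lambda^i}{\varepsilon}\Big)x_2^i\Big),
\]
where
\[
\lambda^i = -\,\frac{(x_1^i)^2 + \frac{1}{\varepsilon^2}(x_2^i)^2}{(x_1^i)^2 + \frac{1}{\varepsilon^3}(x_2^i)^2} .
\]
(Here the step is written along the unnormalized direction (x_1^i, x_2^i/ε) = ½∇f(x^i), which is why λ^i is negative.) So for ε > 0 very small,
\[
x^{i+1} \approx \Big((1 - \varepsilon)\,x_1^i,\ -\varepsilon^2\,\frac{(x_1^i)^2}{x_2^i}\Big).
\]
Fig. 18.3
This is why the descending path zigzags, slowly approaching (0, 0) (see Figure 18.3).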
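The zigzag can be reproduced with a few lines of code. This is a minimal sketch that iterates the update formula above; the value ε = 0.01, the starting point (1, ε), and the helper names `step`, `f` and `iterate` are assumptions chosen for the illustration. The starting point is picked because from there each step multiplies x_1 by exactly (1 − ε)/(1 + ε) while the sign of x_2 flips, which makes the slow regime easy to see; other starting points can behave much better.

```python
def step(x1, x2, eps):
    # One exact line-search step along (x1, x2/eps), i.e. along grad f / 2.
    lam = -(x1 ** 2 + x2 ** 2 / eps ** 2) / (x1 ** 2 + x2 ** 2 / eps ** 3)
    return (1 + lam) * x1, (1 + lam / eps) * x2


def f(x1, x2, eps):
    # The objective f(x1, x2) = x1^2 + x2^2 / eps.
    return x1 ** 2 + x2 ** 2 / eps


def iterate(x1, x2, eps, steps):
    # Record the whole descent path, starting point included.
    path = [(x1, x2)]
    for _ in range(steps):
        x1, x2 = step(x1, x2, eps)
        path.append((x1, x2))
    return path


eps = 0.01
path = iterate(1.0, eps, eps, 50)
for k in (0, 1, 2, 3, 50):
    print(k, path[k], f(path[k][0], path[k][1], eps))
```

After 50 exact line searches the first coordinate has only shrunk by a factor of about e⁻¹, even though the problem is two-dimensional and quadratic.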
The conjugate gradient method emerged to compensate for this weakness. Its main idea is that in a small neighborhood of the minimum, the function f can be approximated by its second order Taylor polynomial:
\[
f(x) = f(x_0) + (\nabla f(x_0), x - x_0) + \frac12\big(\nabla^2 f(x_0)(x - x_0), x - x_0\big) + o(\|x - x_0\|^2).
\]
The quadratic function
\[
c + (b, x) + \frac12 (Ax, x), \qquad c\in\mathbb R^1,\ b\in\mathbb R^n,\ A\in\mathbb R^{n\times n},
\]
where A is positive definite, has a unique minimum. When n = 2, its level curves are ellipses. We will take advantage of the special feature of this quadratic function to avoid the zigzagging descending path.
From analytic geometry, we know a given vector w = (w1 , w2 ) determines an
axis of a family of ellipses, while its conjugate axis is the trace of the midpoints
lying on the chords parallel to this vector. We call the direction u of the conjugate
axis the conjugate direction of w (see Figure 18.4).
We are inspired by a geometric fact: for a positive definite quadratic function
on the plane, regardless of the initial point x0 , given any direction w, the minimum
along w must be the midpoint of a chord in the ellipse parallel to w. The next step
is, if we can find the minimum in the conjugate direction u of w, then we can reach
the center of the family of ellipses in merely two steps, which coincides with the
minimum of the original function.
In order to extend this geometric fact to higher dimensions, we must be able to
analytically express the concept of “conjugate” direction. For simplicity, we take
b = 0.
Fig. 18.4
Given a vector w, ∀ v ∈ R², the parametric equation of the line passing through v parallel to w is x = v + tw. It intersects the ellipse
\[
c + \frac12 (Ax, x) = 0
\]
at the points
\[
x_i = v + t_i w, \qquad i = 1, 2,
\]
where t1 and t2 are the two real roots of the equation
\[
2c + t^2 (Aw, w) + 2t\,(Av, w) + (Av, v) = 0.
\]
If v lies on the conjugate axis, then v = (x_1 + x_2)/2. Thus,
\[
t_1 + t_2 = 0,
\]
namely,
\[
(Av, w) = 0.
\]
In other words, if we redefine an inner product on R2 using the positive definite
matrix A by
[v, w]A = (Av, w),
then
v is conjugate to w ⇐⇒ v is orthogonal to w with respect to [·, ·]A .
This can easily be extended to Rn . Let A be an n × n positive definite matrix,
∀ v, w ∈ Rn , define
[v, w]A = (Av, w).
It is straightforward to verify [·, ·]A is in fact an inner product on Rn .
We say v and w are conjugate with respect to A if
[v, w]A = (Av, w) = 0.
Consequently, on Rn, for a given positive definite matrix A, there exist n mutually conjugate vectors w1, . . . , wn such that
(Awi, wj) = 0, 1 ≤ i < j ≤ n.
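One way to produce such a family is to run the Gram–Schmidt process with respect to the inner product [·, ·]_A. The sketch below is an illustration only: the choice of Gram–Schmidt on the standard basis, the sample matrix A, and the helper names `matvec`, `dot`, `a_inner` and `conjugate_basis` are all assumptions, not a construction prescribed by the text.

```python
def matvec(A, v):
    # Matrix-vector product for a matrix stored as a list of rows.
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    # Euclidean inner product (u, v).
    return sum(a * b for a, b in zip(u, v))


def a_inner(A, u, v):
    # The inner product [u, v]_A = (Au, v).
    return dot(matvec(A, u), v)


def conjugate_basis(A):
    # Gram-Schmidt on the standard basis with respect to [., .]_A.
    n = len(A)
    basis = []
    for k in range(n):
        w = [1.0 if i == k else 0.0 for i in range(n)]
        for p in basis:
            c = a_inner(A, w, p) / a_inner(A, p, p)
            w = [wi - c * pi for wi, pi in zip(w, p)]
        basis.append(w)
    return basis


# A small symmetric positive definite matrix, chosen for the example.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
W = conjugate_basis(A)
for w in W:
    print(w)
```

The resulting vectors are not normalized; dividing each by the square root of [w, w]_A would give unit vectors in the A-norm.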
Consider the quadratic function f(x) = c + (b, x) + ½(Ax, x), where A is positive definite. As a routine exercise in linear algebra, it is straightforward to verify: starting from any x^0, first find a family of mutually conjugate unit vectors w1, . . . , wn, and then find the one dimensional minimum along these vectors iteratively; in n steps, we can reach the minimum of f.
Based on this idea, we outline the conjugate gradient method as follows:
1. Given an initial point x^0 ∈ Rn and an initial direction σ^0 = −∇f(x^0).
2. Suppose we have already found x^i and σ^i. Set
\[
r^i = \nabla f(x^i)\ (= Ax^i + b), \qquad \lambda^i = -\,\frac{(r^i, \sigma^i)}{(A\sigma^i, \sigma^i)}, \qquad x^{i+1} = x^i + \lambda^i\sigma^i,
\]
and then, with r^{i+1} = ∇f(x^{i+1}),
\[
\beta^i = \frac{(Ar^{i+1}, \sigma^i)}{(A\sigma^i, \sigma^i)}, \qquad \sigma^{i+1} = -r^{i+1} + \beta^i\sigma^i .
\]
3. If r^{i+1} = 0, then stop and x^{i+1} is the desired minimum; otherwise, continue. For a quadratic function, the above iteration will terminate in at most n steps.
In fact, it is not difficult to verify by mathematical induction: for m ≤ n, we have
(1) if φ(t) = f(x^i + tσ^i), then φ′(λ^i) = 0;
(2) span{σ^0, . . . , σ^m} = span{r^0, . . . , r^m} = span{r^0, Ar^0, . . . , A^m r^0};
(3) [σ^i, σ^j]_A = 0 for 0 ≤ i < j ≤ m.
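The iteration above translates almost line by line into code. The following is a minimal sketch for the quadratic f(x) = c + (b, x) + ½(Ax, x), written with plain lists; the sample data A and b, the tolerance, and the helper names `matvec`, `dot` and `conjugate_gradient` are assumptions made for the illustration. It uses the update formulas for λ^i and β^i exactly as stated, together with the identity r^{i+1} = r^i + λ^i Aσ^i, which holds because r = Ax + b.

```python
def matvec(A, v):
    # Matrix-vector product for a matrix stored as a list of rows.
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    # Euclidean inner product (u, v).
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, x0, tol=1e-12):
    # Minimizes c + (b, x) + (Ax, x)/2, i.e. solves Ax + b = 0.
    x = list(x0)
    r = [s + t for s, t in zip(matvec(A, x), b)]       # r^0 = A x^0 + b
    sigma = [-t for t in r]                            # sigma^0 = -r^0
    steps = 0
    while dot(r, r) ** 0.5 > tol and steps < len(b):
        As = matvec(A, sigma)
        lam = -dot(r, sigma) / dot(As, sigma)
        x = [xi + lam * si for xi, si in zip(x, sigma)]
        r = [ri + lam * ai for ri, ai in zip(r, As)]   # r^{i+1}
        beta = dot(matvec(A, r), sigma) / dot(As, sigma)
        sigma = [-ri + beta * si for ri, si in zip(r, sigma)]
        steps += 1
    return x, steps


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, steps = conjugate_gradient(A, b, [0.0, 0.0, 0.0])
print(x, steps)
```

In exact arithmetic the loop stops after at most n = 3 passes; in floating point the step cap plays the same role, and the final residual Ax + b is at the level of rounding error.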
Remark 18.4 Since the optimization problem of a quadratic function can be
reduced to solving a system of linear equations, the conjugate gradient method is
also a very effective iterative algorithm in solving systems of linear equations.
The systems of linear equations generated by the finite element method in
general correspond to sparse matrices, where the conjugate gradient method is
proven to be particularly effective.
Remark 18.5 The conjugate gradient method is not only applicable to optimiza-
tion problems of quadratic functions, but is also applicable to convex functions.
Despite the fact that it may not reach the minimum in only n steps, the rate of
convergence is nevertheless notably improved.
Remark 18.6 Of particular note, in calculations of quantum mechanics and math-
ematical physics, it is often required to compute the spectrum of a positive definite
self-adjoint operator. Under certain compactness assumption, the spectrum con-
sists only of eigenvalues. In which case, based on the minimax description of the
eigenvalues (see Lecture 12), we can apply the Ritz method to obtain approxi-
mated solutions. Both the finite element method as well as the conjugate gradient
method are numerical implementations of the Ritz method.

Lecture 19
Optimal control problems
Optimal control problems are a special kind of variational problems, which have profound applications in engineering, economics, and other areas. In general, since the function realizing the extremal value of a functional need not be continuous (it usually involves jump discontinuity, such as switching), it cannot be obtained by solving the traditional Euler–Lagrange equation. Instead, the necessary condition is replaced by the Pontryagin Maximal Principle. The foundation of optimal control theory has been developed into a subject of its own. In this lecture, we only give a brief introduction.
19.1 The formulation of problems
In order to assist the reader with an intuitive understanding of the formulation and possible solution of an optimal control problem, we begin with a rather simple example.
Example 19.1 (A rolling cart problem) A small rolling cart with mass m = 1 is
placed on the x-axis. Starting from the stationary point x = 0, the cart is to reach
the point x = L at exactly time T . Without friction and air resistance, we seek
how to apply a nonnegative external force u ≥ 0 on the cart such that the total
work done W is minimum.
Denote the position of the cart at time t by x(t). Let v(t) be the velocity of the cart at time t and u(t) be the external force applied to the cart at time t. Their relations are given by
\[
\begin{cases}
\dfrac{dx}{dt} = v,\\[2mm]
\dfrac{dv}{dt} = u,
\end{cases}
\tag{19.1}
\]
\[
\begin{pmatrix} x(0)\\ v(0)\end{pmatrix} = \begin{pmatrix} 0\\ 0\end{pmatrix}, \tag{19.2}
\]
\[
x(T) = L.
\]
The controlled external force belongs to the set
U = {u ∈ L∞([0, T]) | u ≥ 0}.
The work done by the external force is
\[
W = \int_0^L u\,dx = \int_0^T u\,v\,dt = \int_0^T \dot v\,v\,dt = \frac{v^2(T)}{2} .
\]
We are asked to solve the constrained minimization problem
\[
\min\Big\{ W = \int_0^T u\,v\,dt \ \Big|\ (19.1),\ (19.2),\ \int_0^T v\,dt = L,\ u\in U \Big\}.
\]
Analysis 1° Under the constraint, we claim W ≥ L²/(2T²).
To prove this, since v is continuous, by the Mean Value Theorem for integrals, ∃ t0 ∈ (0, T) such that v(t0) = L/T. Since u ≥ 0, v is non-decreasing, hence
\[
W = \frac{v^2(T)}{2} \ge \frac{v^2(t_0)}{2} = \frac{L^2}{2T^2} .
\]
2° We claim W > L²/(2T²).
To prove this, suppose there exists a control function u such that W = L²/(2T²), then ∃ t0 ∈ (0, T) which satisfies
\[
\begin{cases}
v(t) = v(t_0) = \dfrac{L}{T}, & \forall\,t\in[t_0, T],\\[2mm]
v(t_0) = \displaystyle\max_{t\in[0,t_0]} v(t).
\end{cases}
\]
It follows that
\[
L = \int_0^{t_0} v(t)\,dt + \frac{L}{T}\,(T - t_0),
\]
i.e.
\[
\frac{L}{T}\,t_0 = \int_0^{t_0} v(t)\,dt .
\]
Thus,
\[
v(t) \equiv \frac{L}{T} .
\]
This contradicts (19.2).
3° We consider a discontinuous u. Let
\[
u(t) = \begin{cases} a, & t\in[0, T_0],\\ 0, & t\in(T_0, T],\end{cases}
\]
where a and T0 are parameters. The solution is
\[
v(t) = \begin{cases} at, & t\in[0, T_0],\\ aT_0, & t\in(T_0, T].\end{cases}
\]
Now we have
\[
W = \int_0^{T_0} a^2 t\,dt = \frac{a^2T_0^2}{2},
\]
\[
L = \int_0^T v\,dt = \frac{aT_0^2}{2} + aT_0(T - T_0) = aTT_0 - \frac{aT_0^2}{2} .
\]
Letting T0 → 0 with aT0 → L/T, we have W → L²/(2T²), and the controlled external force tends to u(t) = (L/T) δ(t), where δ(t) is a pulse function.
Therefore, the pulse force u(t) = (L/T) δ(t) produces the constant velocity v = L/T, which realizes the minimal work W = L²/(2T²). However, a pulse force is not physical.
4° If u is allowed to change signs, then the control set becomes U = L∞([0, T]).
If we set u = a(T/2 − t), then
\[
v = a\Big(\frac{T}{2}\,t - \frac{t^2}{2}\Big), \qquad v(T) = 0,
\]
hence W = 0. However, there are many solutions with W = 0. For example, if we take
\[
u(t) = \begin{cases} a, & t\in[0, T_0],\\ 0, & t\in(T_0, T - T_0),\\ -a, & t\in[T - T_0, T],\end{cases}
\]
then
\[
v(t) = \begin{cases} at, & t\in[0, T_0],\\ aT_0, & t\in(T_0, T - T_0),\\ a(T - t), & t\in[T - T_0, T],\end{cases}
\]
is also a solution with W = 0. In particular, if we choose T0 = T/2, then the control function u switches between the positive and negative constants ±a, which is why it is also called a Bang-Bang control.
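The limiting behaviour described in 3° can be tabulated directly. The sketch below assumes the sample values L = 1 and T = 2 and uses a hypothetical helper `work`; for each push duration T0 it fixes the force a from the terminal condition L = aTT0 − aT0²/2 and evaluates W = a²T0²/2.

```python
def work(L, T, T0):
    # Constant force a on [0, T0], then coasting; a is fixed by x(T) = L,
    # i.e. L = a*T0*(T - T0/2). Returns the force and the work done.
    a = L / (T0 * (T - T0 / 2))
    return a, (a * T0) ** 2 / 2


L, T = 1.0, 2.0
bound = L ** 2 / (2 * T ** 2)
table = [(T0,) + work(L, T, T0) for T0 in (1.0, 0.1, 0.01, 0.001)]
for T0, a, W in table:
    print(f"T0 = {T0:<6} a = {a:10.3f}   W = {W:.6f}   bound = {bound:.6f}")
```

The work decreases toward L²/(2T²) without reaching it, while the required force grows without bound, which is the numerical face of the pulse control.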
From this example, we discover in the formulation of an optimal control
problem, variables can be divided into two groups: state variables and control
variables.
A state variable describes the state of a system, which is typically a vector (for
instance the variable x in our previous example). Its range is a subset Y of the
vector space Rn .
A control variable is controllable (for instance the variable u in our previous
example). All permissible control variables constitute a set U . It is a subset of
L1 ([0, T ]); it is often comprised of piecewise continuous functions.
In an optimal control problem, it is required to prescribe a state equation which dictates the system's changes:
\[
\dot x = f(t, x, u), \qquad t\in(0, T). \tag{19.3}
\]
It is also required to prescribe the initial state x(0) = x0 and the terminal set x(T) ∈ B. Using the Lagrangian L : [0, T] × Rn × RN → R1, we can define the price functional
\[
J(u) = \int_0^T L(t, x(t), u(t))\,dt;
\]
the goal is to find u ∈ U attaining
\[
\min_{u\in U} J(u).
\]
Example 19.1 (continued) In the above example, the state equations are
\[
\begin{cases}
\dot x = v,\\
\dot v = u,
\end{cases}
\qquad t\in[0, T],
\]
the initial state and the terminal set are respectively given by
\[
\begin{pmatrix} x(0)\\ v(0)\end{pmatrix} = \begin{pmatrix} 0\\ 0\end{pmatrix}, \qquad B = \begin{pmatrix} L\\ *\end{pmatrix}.
\]
The set of permissible control functions is
U = {u ∈ L∞([0, T], R1) | u(t) ≥ 0}.
The Lagrangian is L = u · v, which defines the price functional
\[
J(u) = W = \int_0^T u\cdot v\,dt = \int_0^T L(v, u)\,dt .
\]
Example 19.2 We find how to reinvest the products in order to maximize the
overall production of the goods.
Suppose the production rate of a certain goods (the unit time production) is
q = q(t). Suppose its rate of increase q̇ is proportional to the percentage of
reinvestment u,
q̇ = αuq, t ∈ [0, T ],
where α > 0 is given. The initial state is also given,
q(0) = q0 .
However, the terminal state has no restriction whatsoever, i.e. B = R1 .
The overall production is given by
\[
I = \int_0^T (1 - u)\,q\,dt .
\]
The controlled reinvestment percentage, expressed as a fraction, satisfies
u ∈ U = {v ∈ L∞(0, T) | v(t) ∈ [0, 1]}.
The goal is to maximize the overall production, i.e.
max{I | u ∈ U }.
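To get a feel for this problem before any theory is applied, one can compare a restricted family of controls numerically. The sketch below is an exploration under assumptions that are not part of the problem statement: only controls that reinvest all output (u = 1 as a fraction) up to a switching time s and nothing afterwards are considered, and the sample values α = 0.5, q0 = 1, T = 10 and the helper name `production` are chosen for illustration. For such a control, q(t) = q0 e^{αs} for t ≥ s, so I(s) = q0 e^{αs}(T − s).

```python
import math


def production(s, alpha, q0, T):
    # Overall production when u = 1 on [0, s) and u = 0 on [s, T]:
    # nothing is kept while reinvesting, then q stays at q0*exp(alpha*s).
    return q0 * math.exp(alpha * s) * (T - s)


alpha, q0, T = 0.5, 1.0, 10.0
grid = [k * T / 1000 for k in range(1001)]
best = max(grid, key=lambda s: production(s, alpha, q0, T))
print(best, production(best, alpha, q0, T))
```

Within this family the best switching time found on the grid is T − 1/α, which can be confirmed by setting dI/ds = q0 e^{αs}(α(T − s) − 1) to zero; whether the optimum over all of U has this form is a separate question that the grid search does not settle.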
Example 19.3 (The Ramsey economic growth model) In 1928, Ramsey pro-
posed the following economic growth model.
Suppose the investment per worker is i, the consumption per worker is c, and
the output per worker is x = i + c.
Suppose x is a function f (k) of the capital intensity (capital per worker) k.
Suppose the population growth rate is α, then the labor L, as a function of time t,
follows the exponential growth L(t) = L0 eαt .
Since the total capital is K = kL, the total investment is I = dK/dt. They are related via
\[
i = \frac{I}{L} = \frac{1}{L}\frac{dK}{dt} = \frac{dk}{dt} + \frac{k}{L}\frac{dL}{dt} = \frac{dk}{dt} + \alpha k .
\]
The target function measuring the consumption of the society is
\[
W(c) = \int_0^\infty e^{-pt}\,u(c(t))\,dt,
\]
where p > 0 is called the depreciation rate of capital, while u(c) is called the consumer's utility function. Ramsey proposed to maximize W(c) under the constraint
\[
\frac{dk}{dt} = f(k) - \alpha k - c, \qquad k(0) = k_0 .
\]
19.2 The Pontryagin Maximal Principle
Consider the following optimal control problem.

Assume the state variable x ∈ W^{1,∞}((0, T), Rn) and the control variable u ∈ U = {v is a piecewise continuous function taking values in RN | v(t) ∈ C}, where C ⊂ RN is compact. The state equation is ẋ = F(x, u), where F ∈ C¹(Rn × RN, Rn). Both the initial state x(0) = x0 and the terminal state x(T) = x1 are given.
We also assume L ∈ C(Rn × RN, R1) is a given Lagrangian. Define the target functional
\[
J(x, u) = \int_0^T L(x(t), u(t))\,dt .
\]
We have the following maximal principle.

Theorem 19.1 Suppose u = u∗ is a solution of the above optimal control problem which corresponds to the orbit x = x∗(t) of the optimal state variable, i.e. (x∗(t), u∗(t)) attains the minimum of
\[
J(x, u) = \int_0^T L(x(t), u(t))\,dt .
\]
Then there exists a piecewise continuously differentiable function λ = λ(t) ∈ Rn such that
\[
\dot\lambda = -H_x(x^*, \lambda, u^*),
\]
where
\[
H(x, \lambda, u) = -L(x, u) + \lambda F(x, u). \tag{19.4}
\]
Furthermore, along the optimal orbit {(x∗(t), λ(t), u∗(t)) | t ∈ (0, T)}, H is the constant
\[
E = \sup_{v\in C} H(x^*(t), \lambda(t), v).
\]
The function H(x, λ, u) is also called the Hamiltonian, whereas λ is called the conjugate variable.
Proof (The proof is rather lengthy and it is beyond the scope of this book. We hereby adopt the outline of the proof given by Macki and Strauss [MS]; we also refer to MacCluer [Mac] for details.)
1° Suppose for any initial state x^0, ∃ u = u(x^0, t) ∈ U which minimizes the functional. This means there exists an optimal path y = x(x^0, t) for t ∈ [0, T0] such that ẏ = F(y, u), y(0) = x^0, y(T0) = x1, and that
\[
I(x^0) := \min_{v\in U}\int_0^{T_0} L(y(t), v(t))\,dt = \int_0^{T_0} L(y(t), u(t))\,dt .
\]
According to the local minimizing principle, (u(x^0, t), x(x^0, t)) is also optimal in (t, T0], hence
\[
I(x(x^0, t)) = \int_t^{T_0} L(x(x^0, s), u(x^0, s))\,ds .
\]
2° Suppose I is differentiable and u is piecewise continuous. Then, differentiating the last identity with respect to t, for all t except finitely many points we have
\[
\begin{aligned}
0 &= L(x(x^0, t), u(x^0, t)) + \nabla I(x(x^0, t))\cdot\dot x(x^0, t) \\
&= L(x(x^0, t), u(x^0, t)) + \nabla I(x(x^0, t))\,F(x(x^0, t), u(x^0, t)).
\end{aligned}
\]
When x^0 = x0, by setting u = u∗(x0, t), we have x = x∗(x0, t) and
\[
\begin{aligned}
0 &= L(x^*(x_0, t), u^*(x_0, t)) - \lambda F(x^*(x_0, t), u^*(x_0, t)) \\
&= -H(x^*(x_0, t), \lambda(t), u^*(x_0, t)),
\end{aligned}
\]
where λ(t) = −∇I(x∗(x0, t)). Since x∗(x0, t) is piecewise differentiable, λ is piecewise continuous and along the optimal orbit (x∗, λ, u∗),
\[
H(x^*(x_0, t), \lambda(t), u^*(x_0, t)) = 0.
\]
3° We now prove
\[
H(x^*(t), \lambda(t), v) \le 0, \qquad \forall\,v\in C,\ \forall\,t\in(0, T).
\]
In fact, ∀ v ∈ C, the system
\[
\begin{cases}
\dot x = F(x, v),\\
x(0) = x_0,
\end{cases}
\qquad t\in[0, T_0),
\]
has a local solution x = x̃(t). Generally speaking, ∀ v ∈ C, x̃ is not the optimal orbit from x0 to x̃(t); we only have
\[
I(x_0) \le \int_0^t L(\tilde x, v)\,ds + I(\tilde x(t)).
\]
So for t > 0 sufficiently small,
\[
-\,\frac{I(\tilde x(t)) - I(x_0)}{t} \le \frac{1}{t}\int_0^t L(\tilde x, v)\,ds .
\]
From this, we can deduce that
\[
-\,\frac{d}{dt} I(\tilde x(t))\Big|_{t=0} \le L(x_0, v),
\]
while the left-hand side is equal to
\[
-\nabla I(x_0)\cdot\dot{\tilde x}(0) = -\nabla I(x_0)\cdot F(x_0, v).
\]
Since we can choose any state x∗(x0, t) along the orbit to be an initial value, it follows that
\[
H(x^*(x_0, t), \lambda(t), v) \le 0, \qquad \forall\,v\in C,\ \forall\,t\in[0, T].
\]
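4° Lastly, we prove λ satisfies the conjugate equation. Let
\[
h(x, u) = -L(x, u) - \nabla I(x)\,F(x, u).
\]
From our earlier discussion, we know that ∀ t ∈ (0, T), except for finitely many points, the function h achieves its maximum when x = x∗(t) and u = u∗(t).
Suppose I ∈ C². By direct computation, using λ_j = −∂I/∂x_j and the symmetry of the second derivatives of I, we obtain
\[
\begin{aligned}
0 &= \frac{\partial h}{\partial x_i} \\
&= -\frac{\partial L}{\partial x_i} - \frac{\partial(\nabla I\cdot F)}{\partial x_i} \\
&= -\frac{\partial L}{\partial x_i} - \frac{\partial}{\partial x_i}\Big(\sum_{j=1}^n \frac{\partial I}{\partial x_j}\,F_j\Big) \\
&= -\frac{\partial L}{\partial x_i} - \sum_{j=1}^n \frac{\partial^2 I}{\partial x_i\,\partial x_j}\,F_j - \sum_{j=1}^n \frac{\partial F_j}{\partial x_i}\frac{\partial I}{\partial x_j} \\
&= -\frac{\partial L}{\partial x_i} + \sum_{j=1}^n \frac{\partial\lambda_i}{\partial x_j}\,\dot x_j + \sum_{j=1}^n \frac{\partial F_j}{\partial x_i}\,\lambda_j ,
\end{aligned}
\]
whence
\[
\dot\lambda_i = \sum_{j=1}^n \frac{\partial\lambda_i}{\partial x_j}\,\dot x_j = \frac{\partial L}{\partial x_i} - \sum_{j=1}^n \lambda_j\,\frac{\partial F_j}{\partial x_i} = -\frac{\partial H}{\partial x_i} .
\]
That is,
\[
\dot\lambda = -H_x(x^*(x_0, t), \lambda(t), u^*(x_0, t)).
\]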
Remark 19.1 We call the quantity H(x, λ, u) defined in (19.4) the Hamiltonian, since the orbit (x∗(t), λ(t), u∗(t)) satisfies the Hamiltonian system
\[
\begin{cases}
\dot x = F(x, u) = H_\lambda(x, \lambda, u),\\
\dot\lambda = -H_x(x, \lambda, u).
\end{cases}
\]
Remark 19.2 In this theorem, if the terminal vector x(T ) = x1 has some com-
ponents which are not pre-determined, then the corresponding components in the
conjugate variable λ(T ) must be zero.
We now revisit the previous two examples using the maximal principle.
Example 19.1 (further continuation) Suppose the control variable has a con-
straint 0 ≤ u(t) ≤ a, where a > 0 is a prescribed constant. By introducing the
conjugate variable λ(t) = (α(t), β(t)), the Hamiltonian is

v
H = −L + λF = −uv + (α, β) = αv + u(β − v).
u
Suppose (x∗ (t), v ∗ (t)) is the optimal orbit with conjugate orbit (α, β) such that it
achieves the maximum of the Hamiltonian. Since v ∗ (t) is fixed, u(β − v ∗ ) must
attain its maximum. Thus,
\[
u = \begin{cases} a, & \beta > v^*,\\ 0, & \beta < v^*. \end{cases}
\]
Since v̇ = u and v(0) = 0, it follows that there exists a constant v0 such that
\[
v^* = \begin{cases} at, & \beta > v^*,\\ v_0, & \beta < v^*. \end{cases}
\]
In addition, the conjugate variable satisfies the equation
λ̇ = (α̇, β̇) = −H(x,v) = (0, u − α).
From λ(t) = (α(t), β(t)), we deduce α = α0 must be a constant, and
\[
\beta = \begin{cases} (a - \alpha_0)t + c, & u = a,\\ -\alpha_0 t + d, & u = 0, \end{cases}
\]
where c and d are constants. Since both v ∗ and β are continuous, it follows that
β − v ∗ = −α0 t + c, d = c + v0 .
Noting the terminal v(T ) is indeterminate, according to Remark 19.2, β(T ) = 0.
However, since
\[
-\alpha_0 t + c = 0
\]
has only one real root T0 = c/α0, this implies c ≤ α0 T. Consequently,
\[
v^*(t) = \begin{cases} at, & t \in [0, T_0),\\ aT_0, & t \in [T_0, T], \end{cases}
\]
\[
u(t) = \begin{cases} a, & t \in [0, T_0),\\ 0, & t \in [T_0, T]. \end{cases}
\]
Moreover, by the terminal condition x(T ) = L, we also have
\[
L = \int_0^T v\,dt = aTT_0 - \frac{aT_0^2}{2},
\]
\[
T_0 = \begin{cases}
T - \sqrt{T^2 - \dfrac{2}{a}L}, & a \ge \dfrac{2L}{T^2},\\[2mm]
\text{no solution}, & a < \dfrac{2L}{T^2}.
\end{cases}
\]
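The switching time in Example 19.1 can be checked numerically. The following is a minimal sketch, not part of the original text: the function names `switching_time` and `distance` and the sample values of a, T, L are illustrative assumptions. It evaluates the closed form for T0 and integrates the velocity profile v∗ (equal to at before T0 and aT0 afterwards) to confirm that the terminal condition x(T) = L is met.

```python
import math

def switching_time(a, T, L):
    """Closed form T0 = T - sqrt(T^2 - 2L/a); None when a < 2L/T^2."""
    disc = T * T - 2.0 * L / a
    if disc < 0:
        return None
    return T - math.sqrt(disc)

def distance(a, T, T0, n=100000):
    """Midpoint-rule integral of v*(t): a*t on [0, T0), a*T0 on [T0, T]."""
    h = T / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += (a * t if t < T0 else a * T0) * h
    return total

# Illustrative values (assumed, not from the text)
a, T, L = 2.0, 3.0, 5.0
T0 = switching_time(a, T, L)
print("T0 =", T0, " integral of v* =", distance(a, T, T0))
```

With these sample values the bound a ≥ 2L/T² holds, so a switching time exists; choosing a smaller a makes `switching_time` return `None`, which corresponds to the "no solution" branch.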
Example 19.2 (continued) We return to analyze how to maximize the overall
production.
Introducing the conjugate variable λ, the Hamiltonian is
H = −L + λF = (1 − u)q + λ(αuq) = (1 − u + λαu)q.
1◦ Since q ≥ 0, maximizing H is equivalent to maximizing −u + λαu =
(λα − 1)u. Hence, the optimal control variable should be
\[
u(t) = \begin{cases} 1, & \alpha\lambda(t) > 1,\\ 0, & \alpha\lambda(t) < 1. \end{cases}
\]
2◦ The conjugate variable λ satisfies the equation
λ̇ = −Hq = −(1 − u + λαu),
namely, λ̇ + λαu = u − 1, therefore,
\[
\begin{cases} \dot\lambda + \alpha\lambda = 0, & u = 1,\\ \dot\lambda = -1, & u = 0. \end{cases}
\]
It yields the solution
\[
\lambda(t) = \begin{cases} ce^{-\alpha t}, & u = 1,\\ -t + d, & u = 0, \end{cases}
\]
where c and d are constants.
3◦ The terminal condition. Since q(T) is indeterminate, λ(T) = 0. This
implies u(T) ≠ 1, which is only possible if
u = 0, d = T.
This conclusion is reasonable, since at the end of a production cycle, there is no
need to reinvest.
4◦ The change of the control variable. From 1◦, we already know that the value
of the control variable u should change when αλ(t) = 1, i.e. at t_s = T − 1/α.
Thus, the final solution is given by
\[
u(t) = \begin{cases} 1, & t \in (0, t_s),\\ 0, & t \in (t_s, T), \end{cases}
\]
\[
q(t) = \begin{cases} q_0 e^{\alpha t}, & 0 < t < t_s,\\ q_0 e^{\alpha t_s}, & t_s < t < T, \end{cases}
\]
\[
I = \int_{t_s}^{T} q_0 e^{\alpha t_s}\,dt = q_0 e^{\alpha t_s}(T - t_s) = \frac{q_0}{\alpha}\,e^{\alpha(T - \frac{1}{\alpha})}.
\]
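As a sanity check on the switching time of Example 19.2, the sketch below compares the formula t_s = T − 1/α with a brute-force search. It is only an illustration under stated assumptions: the search is restricted to controls with a single switch from u = 1 to u = 0 (it does not test all admissible controls), and the names `total_output`, `best_ts` and the values of α, q0, T are hypothetical choices with T > 1/α.

```python
import math

def total_output(alpha, q0, T, ts):
    """Output I when u = 1 on (0, ts) and u = 0 on (ts, T)."""
    return q0 * math.exp(alpha * ts) * (T - ts)

# Illustrative parameters (assumed); T > 1/alpha so that ts lies in (0, T)
alpha, q0, T = 0.5, 1.0, 10.0

# Brute-force search over single-switch controls on a uniform grid
grid = [T * i / 10000 for i in range(10001)]
best_ts = max(grid, key=lambda ts: total_output(alpha, q0, T, ts))

ts_formula = T - 1.0 / alpha
I_formula = q0 / alpha * math.exp(alpha * (T - 1.0 / alpha))

print("grid maximizer:", best_ts, " formula:", ts_formula)
print("I at formula ts:", total_output(alpha, q0, T, ts_formula), " closed form:", I_formula)
```

The grid search is deliberately crude; it only confirms that, among single-switch controls, the maximizer agrees with the switching time obtained from the conjugate equation.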
Example 19.3 (continued) Returning to the Ramsey’s economic growth model,
to maximize W (c) is equivalent to minimizing −W (c). By introducing the con-
jugate variable λ and the Hamiltonian
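\[
H(k, \lambda, c) = e^{-pt}u(c) + \lambda(f(k) - \alpha k - c),
\]
we obtain the optimal conditions
\[
\frac{\partial H}{\partial c} = u'(c)e^{-pt} - \lambda = 0,
\]
and
\[
\begin{cases}
k'(t) = f(k) - \alpha k - c,\\[1mm]
\lambda'(t) = -\dfrac{\partial H}{\partial k} = -\lambda(f'(k) - \alpha)
\end{cases}
\]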
together with the initial condition k(0) = k0 . Furthermore, since the problem is
proposed on the positive half line, we also have the terminal condition
\[
\lim_{t\to\infty} \lambda(t)k(t) = 0.
\]
In 1928, Ramsey applied variational methods to discuss this problem; however,
its importance was neglected. It was not until 1965 that Ramsey’s original idea
was rediscovered and improved independently by the economists Cass and Koopmans,
and it has since been termed the Ramsey–Cass–Koopmans model.
19.3 The Bang-Bang principle
A specific control model is called a Bang-Bang control, if each component of
the control variable u ∈ RN takes on at most two different values in the entire
process.
Given an n × n matrix A and an n × N matrix B, let x ∈ Rn be the state
variable and u ∈ RN be the control variable. Consider the following linear system
\[
\begin{cases}
\dot x = Ax + Bu,\\
x(0) = x_0.
\end{cases}
\tag{19.5}
\]
Theorem 19.2 In the linear system (19.5), if there exists a control variable u0 ∈
L∞ (0, T ) that steers the initial state x(0) = x0 to the terminal state x(T ) =
x1 , then the same transfer can be realized by a bang-bang control u.
This theorem is in fact a corollary of the Krein–Milman theorem in functional
analysis. The Krein–Milman theorem concerns the extreme points of a compact
convex subset in a locally convex topological vector space. It is an equivalent
statement to the Hahn–Banach theorem.
Let X be a locally convex topological vector space and E ⊂ X be a convex
subset. A point x ∈ E is called an extreme point, if there are no y, z ∈ E with
y ≠ z such that x = (y + z)/2. The set of all extreme points is called the
extremal set of E, denoted by D(E).
The Krein–Milman theorem Let X be a locally convex topological vector space
and E ⊂ X be a compact convex subset, then E is the closed convex hull of the
extremal set D(E). In particular, D(E) is nonempty whenever E is nonempty.
Proof of Theorem 19.2 Without loss of generality, we may assume ‖u0‖∞ ≤ 1.
Let
\[
U = \Big\{ v \in L^\infty(0, T) \;\Big|\; \|v\|_\infty \le 1,\ x_1 = e^{AT}x_0 + \int_0^T e^{A(T-s)}Bv(s)\,ds \Big\}.
\]
Clearly, U is a weak-∗ closed convex set and u0 ∈ U .
According to the Banach–Alaoglu theorem, the unit ball in L∞ (0, T ) is
weak-∗ compact, and so is U . According to the Krein–Milman theorem,
U must have extreme points.
In the following, we prove: if u ∈ U is an extreme point of U , then u must
be a bang-bang control. For simplicity, we only treat the case N = 1. Choose any
ε ∈ (0, 1) and consider the set
S = {t ∈ (0, T ) | u(t) ∈ (−1 + ε, 1 − ε)}.
Note that the solution of the linear system can be expressed as
\[
x(t) = e^{At}x_0 + \int_0^t e^{A(t-s)}Bu(s)\,ds.
\]
Suppose S has positive measure. Choose a nonzero function v ∈ L∞ (0, T ) such that
\[
\int_S e^{-As}Bv(s)\,ds = 0.
\]
We define v to be zero outside S and, rescaling if necessary, assume ‖v‖∞ = 1.
Then |u ± εv| ≤ 1 and the terminal state is unchanged, since
∫_0^T e^{A(T−s)}Bv(s) ds = e^{AT} ∫_S e^{−As}Bv(s) ds = 0.
Thus, u ± εv ∈ U and u = ½[(u + εv) + (u − εv)], which contradicts the assumption
that u is an extreme point. This implies that S has measure zero. Since ε > 0 is
arbitrary, |u(t)| = 1 for almost every t, so u must be a bang-bang control as claimed.
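Theorem 19.2 can be illustrated in the simplest scalar case. The sketch below is an assumption-laden example rather than the construction used in the proof: it takes the scalar system ẋ = ax + bu with a ≠ 0 and b > 0, a constant control u0 with |u0| ≤ 1, and searches by bisection for a switching time τ such that the bang-bang control (u = +1 on [0, τ), u = −1 on [τ, T]) reaches the same terminal state. All function names and numerical values are hypothetical.

```python
import math

def terminal_state_bang_bang(a, b, x0, T, tau):
    """x(T) for x' = a x + b u with u = +1 on [0, tau), u = -1 on [tau, T]."""
    plus = (math.exp(a * T) - math.exp(a * (T - tau))) / a   # integral over [0, tau)
    minus = (math.exp(a * (T - tau)) - 1.0) / a              # integral over [tau, T]
    return math.exp(a * T) * x0 + b * (plus - minus)

def terminal_state_constant(a, b, x0, T, u0):
    """x(T) for the constant control u = u0."""
    return math.exp(a * T) * x0 + b * u0 * (math.exp(a * T) - 1.0) / a

def find_switching_time(a, b, x0, T, target, iters=200):
    """Bisection on tau; x(T) is increasing in tau when b > 0."""
    lo, hi = 0.0, T
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if terminal_state_bang_bang(a, b, x0, T, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative data (assumed)
a, b, x0, T, u0 = -1.0, 1.0, 0.5, 2.0, 0.3
target = terminal_state_constant(a, b, x0, T, u0)
tau = find_switching_time(a, b, x0, T, target)
print("switching time:", tau)
print("terminal states:", target, terminal_state_bang_bang(a, b, x0, T, tau))
```

The bisection works here because the terminal state depends continuously and monotonically on τ; in higher dimensions the existence of a bang-bang control rests on the extreme point argument above, not on such a one-parameter search.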

Lecture 20
Functions of bounded variations and image processing
In many modern applied problems (such as problems arising from image
restoration, phase transition, etc.), the solutions of variational problems are not
always continuous; more specifically, the discontinuities often occur along
rectifiable curves (or measurable hypersurfaces in higher dimensions) across which
the solution visibly breaks. This phenomenon suggests that the desired solutions do
not belong to any appropriate Sobolev space; instead, the focus should be turned to
the space of functions of bounded variations.
Starting from the mid-20th century, De Giorgi et al. adopted methods from
measure theory to study Plateau problems of minimal surfaces. Functions of
bounded variations and measures are naturally related, and the theoretical framework
revealing their intertwined connection has been systematically developed into the
area of geometric measure theory. However, these topics are far beyond the scope
of this book; we can only demonstrate the essence by some concrete examples
with applications.
We first recall the definition as well as some chief properties of functions of
bounded variations in one variable, and then we will extend this to the case of
several variables.
20.1 Functions of bounded variations in one variable — a review
Definition 20.1 A function u : J = (a, b) → R1 is said to be of bounded
variation, denoted BV, if there is a constant M > 0 such that
\[
S_\pi(u) = \sum_{i=1}^{n} |u(t_{i+1}) - u(t_i)| \le M
\]
for every partition π = {a < t1 < · · · < tn+1 < b} of (a, b). We call
\[
V_a^b(u) = \sup_\pi S_\pi(u)
\]
the total variation of u on J.
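The partition sums S_π(u) in Definition 20.1 are easy to compute. The sketch below is a hedged illustration: the helper names `partition_sum` and `uniform_partition` are assumptions, and uniform partitions are used only for convenience (the definition takes the supremum over all partitions). For u = sin on (0, 2π) the sums increase towards the total variation 4 as the partition is refined, while for a monotone function every partition sum is just the difference of the end values.

```python
import math

def partition_sum(u, points):
    """S_pi(u) = sum of |u(t_{i+1}) - u(t_i)| over consecutive partition points."""
    return sum(abs(u(points[i + 1]) - u(points[i])) for i in range(len(points) - 1))

def uniform_partition(a, b, n):
    """n + 1 equally spaced points with a < t_1 < ... < t_{n+1} < b."""
    h = (b - a) / (n + 2)
    return [a + (i + 1) * h for i in range(n + 1)]

# sin on (0, 2*pi): partition sums approach the total variation 4 from below
coarse = partition_sum(math.sin, uniform_partition(0.0, 2 * math.pi, 4))
fine = partition_sum(math.sin, uniform_partition(0.0, 2 * math.pi, 4000))
print("coarse:", coarse, " fine:", fine)

# A monotone function: the sum telescopes to u(t_{n+1}) - u(t_1)
cube = lambda t: t ** 3
pts = uniform_partition(0.0, 1.0, 50)
mono = partition_sum(cube, pts)
print("monotone:", mono, " end difference:", cube(pts[-1]) - cube(pts[0]))
```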

The set of BV functions on J is denoted by BV(J), which is itself a linear
space. The following simple properties are straightforward to verify:
(1) A bounded monotone function on J is a BV function.
(2) Lip(J, R1 ) ⊂ BV(J).
(3) ∀ u ∈ BV(J), u can be decomposed as the difference of two monotone
increasing functions:
\[
u(x) = \frac{1}{2}\big(T_u(x) + u(x)\big) - \frac{1}{2}\big(T_u(x) - u(x)\big),
\]
where
\[
T_u(x) = \sup\Big\{ \sum_{i=1}^{m} |u(t_{i+1}) - u(t_i)| \;\Big|\; \sigma : a < t_1 < \cdots < t_{m+1} < x \text{ is any partition of } (a, x) \Big\}.
\]
(4) ∀ u ∈ BV(J), the set of jumping points Du = {x ∈ J | u(x − 0) ≠
u(x + 0)} is at most countable.
Since the value of a BV function at a jump discontinuity is not determined, in
order to avoid unnecessary fuss caused by this, we choose to normalize
u(x) = u(x − 0), ∀ x ∈ J.
To further signify the variation part of u on J, we also normalize u(a + 0) = 0.

The set of all normalized functions of bounded variations on J is still a linear
space, denoted by NBV(J), on which we define the norm
\[
\|u\| = V_a^b(u).
\]
Then NBV(J) is a Banach space.

In the following, we establish the one-to-one correspondence between the
NBV functions and the σ-additive Borel set functions.
Let u : J → R1 be a monotone increasing normalized BV function, define
µu ((a, x)) = u(x).
We claim that µu can be extended to a measure on J, i.e. it is a σ-additive
non-negative set function. In fact, ∀ [α, β) ⊂ J, let
\[
\mu_u([\alpha, \beta)) = u(\beta) - u(\alpha),
\]
then µu ({α}) = u(α + 0) − u(α). Hence, for any Borel set E ⊂ J, we have
\[
\mu_u(E) = \mathcal{L}^1\Big(\bigcup_{x \in E} [u(x), u(x + 0)]\Big),
\]
where L1 denotes the Lebesgue measure on R1 .
Conversely, for a given Borel measure µ on J, u(x) = µ((a, x)) is monotonically
increasing and left continuous; moreover, it satisfies u(a + 0) = 0.
From property (3), ∀ u ∈ BV(J), there exist two monotone increasing, left
continuous functions u1 and u2 such that u = u1 − u2 . By the above statement,
they correspond to two respective Borel measures µi = µui for i = 1, 2. By
defining
µ(E) = µ1 (E) − µ2 (E),
it follows that µ is a σ-additive Borel measurable set function, which satisfies
µ((a, x)) = µ1 ((a, x)) − µ2 ((a, x)) = u(x), ∀ x ∈ J.
Subsequently, for any σ-additive Borel measurable set function µ,
u(x) = µ((a, x)) (20.1)
defines a unique normalized BV function.
As a consequence, for any σ-additive Borel measurable set function µ and any
Borel measurable set E ∈ B, we can also define a “total variation”:

\[
|\mu|(E) = \sup\Big\{ \sum_{i=1}^{\infty} |\mu(B_i)| \;\Big|\; B_i \in \mathcal{B},\ B_i \cap B_j = \emptyset\ (i \ne j),\ \bigcup_{i=1}^{\infty} B_i = E \Big\}.
\]
It is not difficult to verify:
(1) |µ| is a Borel measure,
(2) |µ(E)| ≤ |µ|(E), ∀ E ∈ B,
(3) |µ|(J) = V_a^b(u).
We denote the space of all σ-additive Borel measurable set functions on J by
M(J, R1 ). On M(J, R1 ), define the norm to be
kµkM = |µ|(J),
then it is also a Banach space.
The above one-to-one correspondence u ↦ µ is an isomorphism between the
Banach spaces NBV(J) and M(J, R1 ).
Recall the Riesz representation theorem for the space of continuous functions
on J:
\[
C_0(J)^* \cong M(J, \mathbb{R}^1),
\]
where C0 (J) = {ϕ ∈ C(J) | ϕ(a) = ϕ(b) = 0}.
Since C0 (J) is separable, M(J, R1 ) is the dual space of a separable Banach
space. This means: ∀ F ∈ C0 (J)∗ , there exists a unique µ ∈ M(J, R1 ) such that
\[
F(\varphi) = \int_J \varphi\,d\mu, \quad \forall\, \varphi \in C_0(J),
\]
with ‖F‖ = ‖µ‖M , and the map F ↦ µ is surjective.
By the Lebesgue–Nikodym decomposition theorem, every BV function has
the following decomposition:
u(x) = v(x) + r(x) + s(x),
where v(x) is the absolutely continuous part of u, r(x) is the Cantor part of u,
and s(x) is the jumping part of u. In particular, s is itself a jump function, while r
is a non-constant function of bounded variation, but whose derivative equals zero
almost everywhere.
Since a BV function u is itself a bounded measurable function, u ∈ L1 (J). As
a function in L1 (J), it has a generalized derivative (in the sense of distribution)
Du, which satisfies
\[
\langle Du, \varphi\rangle = -\int_J u\,\varphi'\,dx, \quad \forall\, \varphi \in C_0^\infty(J). \tag{20.2}
\]
It is worth noting that, in contrast to Sobolev spaces, for a BV function u, Du and
the almost everywhere derivative u′ are not identical to each other! Only when
r = s = 0 do we have u ∈ AC(J), which implies Du = u′.
If s ≠ 0, then Du is a genuine measure rather than a function. For example,
let J = (−1, 1) and
\[
u(x) = \begin{cases} 0, & x \le 0,\\ 1, & x > 0, \end{cases}
\]
then Du = δ(x). That is, if we define µ = Du, then µ({0}) = 1 and µ(B) = 0
for every B ∈ B with 0 ∉ B.
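The identity Du = δ for the step function can be checked against the defining formula (20.2). The following sketch is illustrative only: the bump function `phi`, its derivative `dphi`, and the quadrature routine `pairing_Du` are assumptions introduced here, and the midpoint rule is a crude approximation. It evaluates −∫ u ϕ′ dx over J = (−1, 1) and compares the result with ϕ(0), which is the value of the pairing ⟨δ, ϕ⟩.

```python
import math

def phi(x):
    """Smooth bump supported in (-1, 1); an illustrative test function."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def dphi(x):
    """Derivative of phi, from the closed form phi(x) * (-2x) / (1 - x^2)^2."""
    if abs(x) >= 1.0:
        return 0.0
    return phi(x) * (-2.0 * x) / (1.0 - x * x) ** 2

def step(x):
    """The step function u of the example: 0 for x <= 0, 1 for x > 0."""
    return 1.0 if x > 0 else 0.0

def pairing_Du(n=200000):
    """Midpoint-rule approximation of <Du, phi> = -integral of u * phi' over (-1, 1)."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        total += step(x) * dphi(x) * h
    return -total

print("<Du, phi> ~", pairing_Du(), "  phi(0) =", phi(0.0))
```

The two printed numbers should agree up to quadrature error, which reflects the fact that all of the mass of Du sits at the jump point x = 0.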
We now reveal the explicit relation between a BV function u and its corre-
sponding measure µ. Notice as the dual space of C0 (J), we have a natural dual
pairing:
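\[
\langle \mu, \varphi\rangle = \int_J \varphi\,d\mu, \quad \forall\, (\mu, \varphi) \in M(J, \mathbb{R}^1) \times C_0(J).
\]
Thus,
\[
\|\mu\|_M = \sup\Big\{ \int_J \varphi\,d\mu \;\Big|\; \|\varphi\|_{C_0} \le 1 \Big\}.
\]
Furthermore, according to the one-to-one correspondence of M(J, R1 ) and
NBV(J), we also have
\[
\langle \mu, \varphi\rangle = \int_J \varphi\,du(x) = -\int_J u(x)\varphi'(x)\,dx.
\]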
Combining with (20.2), it follows immediately that

Du = µ.
Since there is no notion of end (left and right) points in the domain of functions
of several variables, the idea of normalizing the function value seems impractical.
However, the above formula sheds new light on how to generalize functions of
bounded variations from a single variable to several variables. More precisely, we
are led to the following definition:
Definition 20.2 On BV(J), we define the norm
kukBV = kukL1 + kDukM .
It is not difficult to verify that BV(J) is a Banach space.
20.2 Functions of bounded variations in several variables
Let Ω ⊂ Rn be a bounded open subset. Denote by C01 (Ω, Rn ) the space
of all continuously differentiable functions from Ω to Rn with compact support in Ω.
Analogous to the single variable situation, we make the following definition.
Definition 20.3 A function u ∈ L1 (Ω) is said to be of bounded variation, if
\[
\|Du\|(\Omega) = \sup\Big\{ \int_\Omega u\,\mathrm{div}\,\varphi\,dx \;\Big|\; \varphi \in C_0^1(\Omega, \mathbb{R}^n),\ \|\varphi\|_\infty \le 1 \Big\} < \infty.
\]
We call ‖Du‖(Ω) the total variation of u.

Denote the space of all BV functions on Ω by BV(Ω) with the norm
kukBV = kukL1 + kDuk(Ω).
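Just like in the single variable setting, for any BV function u, we consider the
relation between the generalized gradient Du, defined by
\[
\langle Du, \varphi\rangle = -\langle u, \mathrm{div}\,\varphi\rangle := -\int_\Omega u\,\mathrm{div}\,\varphi\,dx, \quad \forall\, \varphi \in C_0^\infty(\Omega, \mathbb{R}^n),
\]
and the corresponding measure.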
From the fact that total variation of u is bounded, we can affirm Du is a vector-
valued Radon measure.
This is because ∀ ϕ0 ∈ C0 (Ω, Rn ), in the Banach space consisting of all
continuous functions on Ω̄ with zero boundary values, there exists a sequence
{ϕk } ⊂ C01 (Ω, Rn ) such that ϕk → ϕ0 uniformly on Ω̄, and |ϕk |∞ ≤ |ϕ0 |∞ ,
∀ k. Define the linear functional
\[
L(\varphi) = \int_\Omega u\,\mathrm{div}\,\varphi\,dx, \quad \forall\, \varphi \in C_0^1(\Omega, \mathbb{R}^n),
\]
it satisfies
|L(ϕ)| ≤ Ckϕk∞ , ∀ ϕ ∈ C01 (Ω, Rn ).
By letting
\[
L(\varphi_0) = \lim_{k\to\infty} L(\varphi_k),
\]
we see that not only does the limit L(ϕ0 ) exist, but it is also independent of the
particular choice of ϕk . This means L has a unique linear continuous extension
\[
L : C_0(\Omega, \mathbb{R}^n) \to \mathbb{R}^1.
\]
Namely, L is a linear continuous functional on C0 (Ω, Rn ). By the Riesz repre-
sentation theorem, there exists a vector-valued Radon measure µ = (µ1 , . . . , µn )
such that
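\[
\int_\Omega u\,\mathrm{div}\,\varphi\,dx = -\int_\Omega \sum_{i=1}^{n} \varphi_i\,d\mu_i = -\int_\Omega \varphi\,d\mu, \quad \forall\, \varphi = (\varphi_1, \ldots, \varphi_n) \in C_0^1(\Omega, \mathbb{R}^n).
\]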
This means
Du = dµ.
We henceforth denote the space of all Radon measures on Ω taking values in Rn+1
by M(Ω, Rn+1 ), on which, by defining the norm
\[
\|\mu\|_M = \sup\Big\{ \int_\Omega \varphi\,d\mu \;\Big|\; \|\varphi\|_{C_0(\Omega, \mathbb{R}^{n+1})} \le 1 \Big\},
\]
it becomes a Banach space.
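Example 20.1 Let u ∈ W 1,1 (Ω), then u ∈ BV(Ω) and ‖u‖W 1,1 = ‖u‖BV . To
prove this, note that on one hand,
\[
\int_\Omega u\,\mathrm{div}\,\phi\,dx = -\int_\Omega Du \cdot \phi\,dx \le \|Du\|_1, \quad \forall\, \phi \in C_0^1(\Omega, \mathbb{R}^n),\ \|\phi\|_\infty \le 1.
\]
On the other hand, ∀ ε > 0, ∃ ϕε ∈ C01 (Ω, Rn ) with ‖ϕε ‖∞ ≤ 1, such that
\[
\int_\Omega |Du|\,dx \le -\int_\Omega Du \cdot \varphi_\varepsilon\,dx + \varepsilon = \int_\Omega u\,\mathrm{div}\,\varphi_\varepsilon\,dx + \varepsilon \le \|Du\|(\Omega) + \varepsilon.
\]
Consequently, ‖Du‖(Ω) = ‖Du‖1 .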
Example 20.2 Let S ⊂ Rn be a C ∞ compact closed hypersurface of dimension
n − 1. We equip S with the metric induced from Rn and denote its (n − 1) dimensional
Hausdorff measure by Hn−1 (S).
Assume Ω is the region bounded by S with characteristic function χΩ . Then
χΩ ∈ BV(Rn ) and
kDχΩ k(Rn ) = Hn−1 (S).
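Proof 1◦ ∀ φ ∈ C01 (Rn , Rn ), by Gauss’s theorem,
\[
\int_{\mathbb{R}^n} \chi_\Omega\,\mathrm{div}\,\phi\,dx = \int_\Omega \mathrm{div}\,\phi\,dx = \int_S n(x) \cdot \phi(x)\,d\mathcal{H}^{n-1},
\]
where n(x) is the unit normal vector. Thus,
\[
\|D\chi_\Omega\|(\mathbb{R}^n) \le \mathcal{H}^{n-1}(S).
\]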
2◦ On the other hand, we can extend n(x) to be a C ∞ vector field V on Rn
using the standard partition of unity argument. Furthermore, we can make sure
kV (x)k ≤ 1, ∀ x ∈ Rn .
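∀ ρ ∈ C0∞ (Rn , R1 ), |ρ(x)| ≤ 1, ∀ x ∈ Rn , let φ = ρV , then
\[
\int_{\mathbb{R}^n} \chi_\Omega\,\mathrm{div}\,\phi\,dx = \int_S \rho\,d\mathcal{H}^{n-1}.
\]
Thus,
\[
\begin{aligned}
\|D\chi_\Omega\|(\mathbb{R}^n)
&\ge \sup\Big\{ \int_{\mathbb{R}^n} \chi_\Omega\,\mathrm{div}\,\phi\,dx \;\Big|\; \phi \in C_0^\infty(\mathbb{R}^n, \mathbb{R}^n),\ \|\phi\|_\infty \le 1 \Big\} \\
&\ge \sup\Big\{ \int_S \rho\,d\mathcal{H}^{n-1} \;\Big|\; \rho \in C_0^\infty(\mathbb{R}^n, \mathbb{R}^1),\ |\rho(x)| \le 1,\ \forall\, x \in \mathbb{R}^n \Big\} \\
&= \mathcal{H}^{n-1}(S).
\end{aligned}
\]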
From the above two examples, we see that W 1,1 (Ω) ⊂ BV(Ω), but they are
not equal.
BV(Ω) has the following properties.
(1) (Lower semi-continuity) Suppose {uj } ⊂ BV(Ω) with uj → u in L1 (Ω),
then for any open subset U ⊂ Ω,
\[
\|Du\|(U) \le \liminf_{j\to\infty} \|Du_j\|(U).
\]
Moreover, from sup{‖Duj ‖(Ω) | j = 1, 2, . . .} < ∞, we can deduce that
‖Du‖(Ω) < ∞.
Proof ∀ φ ∈ C01 (Ω, Rn ), ‖φ‖∞ ≤ 1, we have
\[
\int_\Omega u\,\mathrm{div}\,\phi\,dx = \lim_{j\to\infty} \int_\Omega u_j\,\mathrm{div}\,\phi\,dx \le \liminf_{j\to\infty} \|Du_j\|(\Omega).
\]
The assertion now follows.
(2) BV(Ω) is complete.
Proof Let {uj } ⊂ BV(Ω) be a Cauchy sequence. By definition, this is also a
Cauchy sequence in L1 (Ω). Hence, uj → u in L1 (Ω). By Property (1), u ∈
BV(Ω). It remains to show
kD(uj − u)k(Ω) → 0.
Again by property (1), ∀ ε > 0, ∃ j0 > 0 such that
\[
\|D(u_j - u)\|(\Omega) \le \liminf_{k\to\infty} \|D(u_j - u_k)\|(\Omega) < \varepsilon, \quad j \ge j_0,
\]
which completes the proof.

(3) (Approximation) ∀ u ∈ BV(Ω), ∃ {uj } ⊂ BV(Ω) ∩ C ∞ (Ω) such that
(a) uj → u in L1 (Ω).
(b) kDuj k → kDuk in the sense of Radon measure.
In particular, kDuj k(Ω) → kDuk(Ω).
For the proof, we refer to [AFP], pp. 122–123.
(4) (Compactness) Suppose {uj } ⊂ BV(Ω) satisfies ‖uj ‖BV ≤ M , then
there exists a subsequence {unj } and some u such that unj → u in L1 (Ω) and
‖Du‖(Ω) ≤ M .
Proof By property (3), ∃ {vj } ⊂ BV(Ω) ∩ C ∞ (Ω) such that ‖vj − uj ‖1 ≤ 1/j and
‖vj ‖BV ≤ ‖uj ‖BV + 1/j.
Since ‖vj ‖W 1,1 = ‖vj ‖BV , {vj } is W 1,1 -bounded. By the Rellich–Kondrachov
compact embedding theorem, there exists a subsequence {vnj } such
that vnj → u in L1 (Ω), whence unj → u in L1 (Ω). By property (1), we have
\[
\|Du\|(\Omega) \le \liminf_{j\to\infty} \|Dv_{n_j}\|(\Omega) \le \liminf_{j\to\infty} \|Du_{n_j}\|(\Omega) \le M.
\]
In addition, Poincaré’s inequality also holds for BV functions.

(5) (Poincaré’s inequality) Let Ω ⊂ Rn be a bounded, connected, and
extendable region, then the following inequality holds:
\[
\Big\| u - \frac{1}{|\Omega|} \int_\Omega u\,dx \Big\|_p \le C_p \|Du\|(\Omega), \quad \forall\, u \in \mathrm{BV}(\Omega),\ 1 \le p \le \frac{n}{n-1}.
\]
For the proof, we refer to [AFP], p. 153.
Next, we examine the weak-∗ convergence of BV(Ω). Denote X =
C0 (Ω, Rn+1 ). Let
\[
E = \big\{ \phi = (\varphi_0, \varphi_1, \ldots, \varphi_n) \in C_0^\infty(\Omega, \mathbb{R}^{n+1}) \;\big|\; \mathrm{div}\,\hat\varphi = \varphi_0,\ \hat\varphi = (\varphi_1, \ldots, \varphi_n) \big\}.
\]
Let Y be the closure of E in X, then Y is a closed linear subspace of X.

Consider the linear mapping
\[
T : \mathrm{BV}(\Omega) \to M(\Omega, \mathbb{R}^{n+1}), \qquad u \mapsto (u\mathcal{L}^n, D_1 u, \ldots, D_n u),
\]
where Ln is the n-Lebesgue measure, while Di u is the ith component of the
Radon vector-valued measure Du, then in the sense of distribution, we have
hDi u, ϕi = −hu, ∂xi ϕi, ∀ ϕ ∈ C0∞ (Ω), 1 ≤ i ≤ n.
We point out T u ∈ (X/Y )∗ . This is because ∀ φ ∈ E,

hT u, φi = huLn , ϕ0 i + hDu, ϕ̂i = huLn , ϕ0 − div ϕ̂i = 0.
Thus, T u|Y = 0, i.e. T u ∈ (X/Y )∗ .
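Furthermore, T is surjective, i.e. ∀ (µ0 , µ1 , . . . , µn ) ∈ M(Ω, Rn+1 ), if
\[
\langle \mu_0, \varphi_0\rangle + \sum_{i=1}^{n} \langle \mu_i, \varphi_i\rangle = 0, \quad \forall\, (\varphi_0, \ldots, \varphi_n) \in E,
\]
then
\[
\langle \mu_i, \varphi\rangle = -\langle \mu_0, \partial_i \varphi\rangle, \quad i = 1, 2, \ldots, n,\ \forall\, \varphi \in C_0^1(\Omega).
\]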
Using techniques from measure theory, it is not difficult to verify that µ0 is indeed
absolutely continuous. Therefore, there exists u ∈ L1 (Ω) such that µ0 = uLn .
Since
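\[
\|Tu\|_{M(\Omega, \mathbb{R}^{n+1})} = \sup\Big\{ \int_\Omega [u \cdot \varphi_0 + Du \cdot \hat\varphi]\,dx \;\Big|\; \|\phi\|_{C_0(\Omega, \mathbb{R}^{n+1})} \le 1 \Big\},
\]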
it follows that
kukBV = kukL1 + kDuk(Ω) ≤ 2kT uk ≤ 2kukBV .
As a consequence, we can regard BV(Ω) as the dual space of the separable Banach
space (X/Y ). By the Banach–Alaoglu theorem, every bounded set is weak-∗
sequentially compact.
We now analyze the weak-∗ topology on BV(Ω).
Theorem 20.1
\[
u_j \xrightarrow{\ \mathrm{BV\ weak}^{*}\ } u \iff
\begin{cases}
u_j \to u \text{ in } L^1,\\
\{u_j\} \text{ is bounded in BV}.
\end{cases}
\]
Proof “⇒” By the Banach–Steinhaus theorem, {uj } is bounded in BV. By
property (4), uj → u in L1 (Ω).
“⇐” By the Banach–Alaoglu theorem, {uj } contains a weak-∗ convergent
subsequence {unj }. It remains to show all possible weak-∗ limits of {Duj } are the
same as Du.
Suppose there exists a subsequence {unj } such that lim Dunj = µ in the weak-∗
sense, then
\[
\int_\Omega u_{n_j}\,\mathrm{div}\,\varphi\,dx = -\int_\Omega Du_{n_j} \cdot \varphi\,dx,
\]
which implies
\[
\int_\Omega u\,\mathrm{div}\,\varphi\,dx = -\langle \mu, \varphi\rangle.
\]
Thus, µ = Du.
The generalized gradient Du of a BV function u in several variables also admits
a Lebesgue decomposition in the sense of distribution:
\[
Du = D^a u + D^j u + D^c u,
\]
where
\[
D^a u = \nabla u\,\mathcal{L}^n, \quad \nabla u \in L^1(\Omega, \mathbb{R}^n),
\]
is the absolutely continuous part, D^j u and D^c u are the jump part and the Cantor
part respectively. As usual, Ln denotes the Lebesgue measure on Rn .
Fig. 20.1
They are defined as follows. Denote Br (x) the ball of radius r > 0 with center
x in Rn . Define two functions (see Figure 20.1)
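\[
u^+(x) = \inf\Big\{ t \in [-\infty, +\infty] \;\Big|\; \lim_{r\to 0} \frac{\mathcal{L}^n\{y \in B_r(x) \mid u(y) > t\}}{r^n} = 0 \Big\},
\]
\[
u^-(x) = \sup\Big\{ t \in [-\infty, +\infty] \;\Big|\; \lim_{r\to 0} \frac{\mathcal{L}^n\{y \in B_r(x) \mid u(y) < t\}}{r^n} = 0 \Big\}.
\]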
A point x ∈ Ω is called a Lebesgue point if
\[
\lim_{r\to 0} \frac{1}{|B_r(x)|} \int_{B_r(x)} |u(y) - u(x)|\,dy = 0.
\]
In this case, we have
\[
u(x) = \lim_{r\to 0} \frac{1}{|B_r(x)|} \int_{B_r(x)} u(y)\,dy
\]
and
u(x) = u+ (x) = u− (x).
We call the set
\[
S_u = \{x \in \Omega \mid u^+(x) > u^-(x)\}
\]
the jump set of u. It can be shown that Su is a countably rectifiable set.
In connection with the (n − 1) dimensional Hausdorff measure Hn−1 , we can
define a normal direction nu (x) at Hn−1 -almost every x ∈ Su . Consequently, we can
define
\[
D^j u = (u^+ - u^-)\,n_u\,\mathcal{H}^{n-1}\big|_{S_u}
\]
and
\[
D^c u = (Du - D^a u)\,\llcorner\,(\Omega \setminus S_u).
\]
We refer to [AFP], pp. 187–189 for details.
20.3 The relaxation function
Under certain growth restriction, we know the weak sequential lower semi-
continuity of a functional I is equivalent to the quasi-convexity of the Lagrangian
in p. In Lecture 13, we mentioned that Bolza gave the counterexample (Example
13.3) of L(u, p) = u² + (p² − 1)². It is not convex in p: the functional I has a
minimizing sequence {un } in W 1,4 (0, 1) which converges weakly to 0, and the
corresponding functional values I(un ) → 0, but I(0) = 1.
However, from the viewpoint of extremal values of a functional, in this exam-
ple, the value of I(0) is not of particular importance, but instead, the minimizing
sequence itself reflects how small the values of I may become.
In order to showcase the characteristic of the functional in this regard, we
introduce the concept of a relaxation function.
Definition 20.4 Let (X, τ ) be a topological space and f : X → R̄ = R1 ∪{+∞}.
We call Rf (x) = sup{ϕ(x) | ϕ : X → R̄ is lower semi-continuous, ϕ(y) ≤
f (y), ∀ y ∈ X} the relaxation function of f .
The following are immediate from the definition.
(1) If f is lower semi-continuous, then Rf = f .
Proof Choosing ϕ = f , then f ≤ Rf . Conversely, by definition, Rf ≤ f .
(2) Rf (x) is lower semi-continuous. This is because the supremum of a family
of lower semi-continuous functions is itself lower semi-continuous.
From this, we see that the relaxation function Rf of f is the largest among all
lower semi-continuous functions which are not greater than f .
(3) Let f : X → R̄ be bounded below. Suppose f has a minimizing sequence
{xj } and x0 is a limit point of {xj }, then x0 is a minimum point of Rf and
Rf (x0 ) = inf f .
Proof Without loss of generality, we may assume c = inf f = lim f (xj ). On one
hand,
\[
Rf(x_0) \le \liminf_{j\to\infty} Rf(x_j) \ \text{(by (2))} \ \le \liminf_{j\to\infty} f(x_j) = c \ \text{(by definition)}.
\]
On the other hand, c ≤ f (x), ∀ x ∈ X. Since the constant function c is
lower semi-continuous, c ≤ Rf (x), ∀ x ∈ X. This means Rf (x0 ) = c and x0
is a minimum point of Rf .
Theorem 20.2 Let X be the dual space of a separable Banach space and
f : X → R1 be bounded below. If f is coercive, then in the weak-∗ topology, the
relaxation function Rf has a minimum. Furthermore,
\[
\min_X Rf = \inf_X f.
\]
Proof Choose a minimizing sequence {xj } of f such that lim f (xj ) = inf X f .
Since f is coercive, {xj } is bounded in X, so it contains a weak-∗ convergent
subsequence xnj → x0 (weak-∗). By definition, the relaxation function Rf with
respect to the weak-∗ topology is weak-∗ lower semi-continuous; by property (3),
x0 is a minimum point of Rf , i.e.
\[
\min_X Rf = Rf(x_0) = \inf_X f.
\]
Returning to a variational problem, we consider the Lagrangian L : Rn → R1 ,
which is continuous and satisfies the growth condition
\[
c_0|p|^q \le L(p) \le c_1|p|^q + c_2, \quad 1 < q < \infty,
\]
where c0 , c1 , c2 > 0 are constants. Define the functional
\[
I(u) = \int_\Omega L(\nabla u(x))\,dx, \quad u \in W^{1,q}(\Omega),\ 1 < q < \infty.
\]
Noting that
I is weakly lower semi-continuous ⇔ L is quasi-convex ⇔ L is convex,
thus, we have
\[
RI(u) = \int_\Omega \mathrm{conv}(L)(\nabla u(x))\,dx,
\]
where conv(L) = sup{ϕ | ϕ(p) ≤ L(p), ∀ p ∈ Rn , ϕ is convex} is the convex hull
of L; that is, the supremum of all convex functions no greater than L.
Fig. 20.2
Example 20.3 (The double well) Let L(t, u, p) = (1 − p²)² (see Figure 20.2),
\[
I(u) = \int_{-1}^{1} L(t, u(t), \dot u(t))\,dt.
\]
Define
\[
\tilde L(t, u, p) = \begin{cases} (1 - p^2)^2, & |p| > 1,\\ 0, & |p| \le 1, \end{cases}
\]
which is the convex hull of L, then
\[
RI(u) = \int_{-1}^{1} \tilde L(t, u(t), \dot u(t))\,dt.
\]
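The convex hull of the double-well Lagrangian can be verified numerically. The sketch below is an illustration under stated assumptions: it samples L(p) = (1 − p²)² on a uniform grid over [−2, 2], computes the lower convex envelope of the sampled points with a standard monotone-chain scan, and compares it with the piecewise formula for L̃. The function names `lower_convex_envelope`, `double_well`, and `relaxed` are hypothetical, and a grid computation is of course not a proof.

```python
def lower_convex_envelope(xs, ys):
    """Lower convex envelope of the points (xs[i], ys[i]), xs strictly increasing.

    Returns the envelope evaluated at every xs[i] (piecewise linear between
    the vertices of the lower convex hull).
    """
    hull = []  # indices of the vertices of the lower hull
    for i in range(len(xs)):
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            # Cross product of (p1 - p0) and (p - p0); keep p1 only on a strict left turn
            cross = (xs[i1] - xs[i0]) * (ys[i] - ys[i0]) - (ys[i1] - ys[i0]) * (xs[i] - xs[i0])
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    env = list(ys)
    for a, b in zip(hull, hull[1:]):
        for i in range(a, b + 1):
            s = (xs[i] - xs[a]) / (xs[b] - xs[a])
            env[i] = (1 - s) * ys[a] + s * ys[b]
    return env

def double_well(p):
    return (1 - p * p) ** 2

def relaxed(p):
    """The candidate convex hull from the example."""
    return double_well(p) if abs(p) > 1 else 0.0

ps = [(i - 200) / 100 for i in range(401)]          # grid on [-2, 2]
env = lower_convex_envelope(ps, [double_well(p) for p in ps])
max_gap = max(abs(e - relaxed(p)) for e, p in zip(env, ps))
print("max |envelope - relaxed| on the grid:", max_gap)
```

On this grid the computed envelope coincides with L̃: the hull keeps every sample with |p| ≥ 1 and replaces the hump between the two wells by the flat segment at height 0.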
In modern calculus of variations, there is abundant literature dedicated to the
concrete representations of relaxation functions; for example, we refer to Buttazzo
[Bu].
20.4 Image restoration and the Rudin–Osher–Fatemi model
Optical instruments often cause distortions in imaging. Besides noise, the
instruments themselves can also cause incorrect magnifications of the image functions.
Let Ω ⊂ R2 be a domain, and let real images be represented by functions
u : Ω → R. The optical image of the object is
\[
u_d(x) = Ku(x) + n(x) = \int K(x - y)u(y)\,dy + n(x),
\]
where K is the convolution operator derived from the point spread function
(the dispersion function of a point source) of the optical instrument itself. It can
be viewed as a bounded linear operator on L2 (Ω). Moreover, n(x) denotes the noise.
The so-called image restoration is the process of recovering u from ud .
The most natural approach is to employ the least squares method, that is, to find
\[
\inf_u \int_\Omega |u_d - Ku|^2\,dx, \tag{20.3}
\]
whose E-L equation is
\[
K^* u_d - K^* K u = 0, \tag{20.4}
\]
where K ∗ is the adjoint operator of K. Since K ∗ K does not have a bounded
inverse, (20.4) is in general ill-posed.
Tikhonov and Arsenin proposed a regularization method in 1977, which adds
an energy term to the target functional in (20.3):
\[
T_\lambda(u) = \int_\Omega |u_d - Ku|^2\,dx + \lambda \int_\Omega |\nabla u|^2\,dx,
\]
where λ is a parameter.
This method not only overcomes the ill-posedness of (20.4), but also smooths
out the solution u.
If we seek the minimum of the functional on H 1 (Ω), then the E-L equation is
\[
K^* u_d - K^* K u + \lambda \Delta u = 0,
\]
with the Neumann boundary condition
\[
\frac{\partial u}{\partial n}\Big|_{\partial\Omega} = 0.
\]
The regularity of solutions of an elliptic equation ensures the smoothness of the
solution u. By doing so, some noise can also be filtered out. However, this
homogenized smoothing process has a flaw: it inevitably blurs the boundaries
(edges) in the image.
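The blurring effect of the quadratic regularization can be seen in a one-dimensional toy problem. The sketch below rests on several assumptions that are not in the text: K is taken to be the identity, the image is a discrete step signal, the gradient is replaced by forward differences with unit spacing, and the resulting tridiagonal system (I + λDᵀD)u = u_d, a discrete analogue of the E-L equation with Neumann boundary condition, is solved by the Thomas algorithm. The name `tikhonov_1d` and the value of λ are hypothetical.

```python
def tikhonov_1d(d, lam):
    """Minimize sum (u_i - d_i)^2 + lam * sum (u_{i+1} - u_i)^2.

    Solves (I + lam * D^T D) u = d, where D^T D is the discrete Neumann
    Laplacian (tridiagonal), using the Thomas algorithm.
    """
    n = len(d)
    diag = [1.0 + lam * (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
    off = -lam
    c = [0.0] * n   # modified super-diagonal
    g = [0.0] * n   # modified right-hand side
    c[0] = off / diag[0]
    g[0] = d[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        g[i] = (d[i] - off * g[i - 1]) / denom
    u = [0.0] * n
    u[-1] = g[-1]
    for i in range(n - 2, -1, -1):
        u[i] = g[i] - c[i] * u[i + 1]
    return u

# A sharp edge: jump of height 1 in the middle of the signal (illustrative data)
d = [0.0] * 50 + [1.0] * 50
u = tikhonov_1d(d, 10.0)
max_jump = max(abs(u[i + 1] - u[i]) for i in range(len(u) - 1))
print("largest jump in data: 1.0, after smoothing:", max_jump)
print("mean preserved:", sum(u) / len(u), sum(d) / len(d))
```

The smoothed signal keeps the mean of the data but spreads the unit jump over many grid points, which is the discrete counterpart of the edge blurring described above.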
Several attempts were made to remedy this issue. For example, one may replace
the L2 norm of the gradient ∇u by the Lp norm; by decreasing the value of p, one
can hope to retain the boundaries in the image. Rudin–Osher–Fatemi suggested
using the L1 norm, namely, using ∫_Ω |∇u| dx instead of ∫_Ω |∇u|² dx.
Unfortunately, W 1,1 (Ω) is neither reflexive, nor is it the dual space of any
Banach space. Consequently, it is very difficult to determine the existence of
minimum of a functional on it.
We have stated that BV(Ω) is the dual space of a separable Banach space.
Moreover, a BV solution u may have jump-type discontinuities, and the set of
such discontinuities is a rectifiable curve. If we solve the problem in BV(Ω),
then not only is the existence of a solution relatively easy to establish, but the
edges of the objects in the image are also easier to capture. The model they
proposed is to find the minimum u of the functional
\[
E(u) = \frac{1}{2} \int_\Omega |u_d - Ku|^2\,dx + \lambda \int_\Omega \phi(|\nabla u|)\,dx, \tag{20.5}
\]
where φ : R+ → R+ is a strictly convex, monotone increasing function satisfying

φ(0) = 0 (20.6)
and
∃ c > 0, ∃ b ≥ 0 such that cs − b ≤ φ(s) ≤ cs + b, ∀ s. (20.7)
However, on BV(Ω), we must know how to interpret the second integral in (20.5).
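Theorem 20.3 (E. De Giorgi, L. Ambrosio, G. Buttazzo) In BV(Ω), the relaxation
function of E is
\[
\begin{aligned}
RE(u) = {}& \frac{1}{2} \int_\Omega |u_d - Ku|^2\,dx + \lambda \int_\Omega \phi(|\nabla u|)\,dx \\
& + \lambda c \int_{S_u} (u^+ - u^-)\,d\mathcal{H}^1 + \lambda c \int_{\Omega \setminus S_u} |C_u|,
\end{aligned}
\tag{20.8}
\]
where
\[
Du = \nabla u\,\mathcal{L}^2 + (u^+ - u^-)\,n_u\,\mathcal{H}^1\big|_{S_u} + C_u
\]
is the Lebesgue decomposition. We refer to [Bu].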
Theorem 20.4 Under the assumptions given above, if we further assume K ∈
L(L2 , L2 ) and ‖K · 1‖ ≠ 0, then the functional RE(u) has a minimum u0 ∈
BV(Ω).
Proof It suffices to show RE is coercive on BV(Ω). Let {uj } ⊂ BV(Ω) be
a minimizing sequence of RE. By (20.7) and (20.8), ‖Duj ‖(Ω) is bounded
and ‖ud − Kuj ‖2 is also bounded. It remains to show that ‖uj ‖1 is bounded.
Decompose
\[
u_j = v_j + w_j,
\]
where
\[
w_j = \frac{1}{|\Omega|} \int_\Omega u_j\,dx.
\]
By Poincaré’s inequality (property (5)), ‖vj ‖2 ≤ C‖Duj ‖(Ω), so we need
only to show |wj | is bounded. Let ‖K · 1‖ = r and ‖K‖ = k. From
\[
\|u_d - Ku_j\|_2^2 \le M,
\]
it follows that
\[
r|w_j|\big(r|w_j| - 2(k\|v_j\|_2 + \|u_d\|_2)\big) \le \big[\|Kv_j - u_d\|_2 - |w_j|r\big]^2 \le \|u_d - Ku_j\|_2^2 \le M.
\]
Since ‖vj ‖2 is bounded, so is |wj |.
Finally, by the weak-∗ sequential lower semi-continuity of the relaxation
function RE, there must exist a minimum u0 ∈ BV(Ω). This completes the proof.


Bibliography
The main references for Lectures 1–8: [BGH], [LL], [GF], [JJ], [Ka], [Mac],
[Mos].
The main references for Lectures 9–14: [Ad], [AE], [CL], [Da], [Ek], [Gi],
[JJ], [Mos], [Mor], [Ne], [Po].
The main references for Lectures 15–20:

Lecture 15: [Ch], [JJ], [Ra1], [St1], [MW]
Lecture 16: [BGH], [HT], [Maw], [Mos], [Ra2], [Ra3].
Lecture 17: [Co], [Jo], [Ha], [Hi], [Mor], [Os], [St2].
Lecture 18: [Hu], [Joh].
Lecture 19: [BM], [Mac], [Le], [PBGM].
Lecture 20: [AFP], [AK], [BGH], [Bu], [Fe].
[Ad] Adams, R. A. (1975). Sobolev Spaces, Acad. Press. (Chinese translation by Ye,
Q. X., Wang, Y. D., Ying, L. A., Han, H. D., and Wu, L. C., Beijing, People’s
Education Press, 1983.)
[AFP] Ambrosio, L., Fusco, N., and Pallara, D. (2000). Functions of Bounded
Variation and Free Discontinuity Problems, Clarendon Press, Oxford.
[AE] Aubert G. and Ekeland I. (1984). Applied Nonlinear Analysis, A John Wiley-
Interscience Publication.
[AK] Aubert, G. and Kornprobst, P. (2002). Mathematical Problems in Image Pro-
cessing, Partial Differential Equations and the Calculus of Variations, Applied
Math Sciences 147, Springer.
[BM] Brechtken, U. and Manderschild, U. (1991). Introduction to the Calculus of
Variations, Chapman Hill.
[Bu] Buttazzo, G. (1989). Semicontinuity, relaxation and integral representation in
the Calculus of Variations, Pitman Research Notes in Mathematics 207, Long-
man Scientific and Technical.
[BGH] Buttazzo, G., Giaquinta, M., and Hildebrandt, S. (1998). One-dimensional
Variational Problems, An Introduction, Clarendon Press. Oxford.
[CL] Chang, K. C. and Lin, Y. Q. (1987). Lecture notes on Functional Analysis I
(Chinese), Beijing, Peking University Press.
[Ch] Chang, K. C. (1993). Infinite Dimensional Morse Theory and Multiple Solution
Problems, Birkhäuser.
[Co] Courant, R. (1950). Dirichlet’s Principle, Conformal Mapping, and Minimal
Surfaces, New York, Interscience.
[Da] Dacorogna, B. (1989). Direct Methods in the Calculus of Variations, Applied
Math. Sciences, 78, Springer Verlag.
[Ek] Ekeland, I. (1979). Non-convex Minimization Problems, Bull. Amer. Math.
Soc., pp. 443–474.
[Fe] Federer, H. (1969). Geometric Measure Theory, Springer-Verlag.
[GF] Gelfand, I. M. and Fomin, S. V. Calculus of Variations, (English translation by
Silverman, R. A., Prentice Hall, 1964).
[Gi] Giaquinta, M. (1983). The regularity problem of extremals of variational inte-
grals, Proc. NATO/LMS Advance Study Inst. on “Systems of nonlinear partial
differential equations”, Reidel Publ. Co.
[GH] Giaquinta, M. and Hildebrandt, S. (1996). Calculus of Variations I The La-
grangian formalism, Grundlehren der mathematischen Wissenschaften, 310,
Springer.
[GT] Gilbarg, D. and Trudinger, N. (1983). Elliptic Partial Differential Equations of
Second Order, 2nd ed. Grundlehren der mathematischen Wissenschaften, 224,
Springer.
[Ha] Hardt, R. (Ed. 2004). Six Theorems in Variations, Student Math. Lib. 26 AMS.
[Hi1] Hilbert, D. (1900). Über das Dirichletsche Prinzip, Jber Deutsch Math. Vere.,
8, pp. 184–188.
[Hi] Hildebrandt, S. (1969). Boundary behavior of Minimal Surfaces, Arch. Rat.
Mech. Anal., 35, pp. 47–82.
[HT] Hofer, H. and Toland, J. F. (1984). Homoclinic, heteroclinic and periodic so-
lutions for indefinite Hamiltonian systems, Math. Ann., 268, pp. 387–403.
[Hu] Hughes, T. (2000). Finite Element method, Linear Static and Dynamic Finite
Element Analysis, Dover Publications.
[JJ] Jost, J. and Jost, X. Li. (1988). Calculus of Variations, Cambridge University
Press.
[Jo] Jost, J. (1990). Two Dimensional Geometric Variational Problems, John Wiley
and Sons.
[Joh] Johnson, C. (1987). Numerical Solution of Partial Differential Equations by
the Finite Element Method, Studentlitteratur, Lund.
[Ka] Kato, T. Calculus of Variations and its Applications (Chinese Translation by
Zhou, H. S., Shanghai Science and Technology Press, 1961).
[LL] Lavrentiev, M. A. and Liusternik, L. A. Lecture Notes on the Calculus of
Variations (Chinese Translation by Zeng, D. H., Deng, H. Y. and Wang, Z. K., Higher
Education Press, Beijing, 1955).
[Le] Leitmann, G. (1981). The Calculus of Variations and Optimal Control: An
Introduction, Plenum Press.
[Mac] MacCluer, C. R. (2005). Calculus of Variations, Mechanics, Control, and
Other Applications, Pearson Education, Ltd.
[MS] Macki, J. and Strauss, A. (1982). Introduction to Optimal Control Theory,
Springer-Verlag.
[MT] Malliavin, P. and Thalmaier, A. (2006). Stochastic Calculus of Variations in
Mathematical Finance, Springer Finance, Springer.
[Maw] Mawhin, J. Global Results for the Forced Pendulum Equation, Handbook of
Differential Equations, Ordinary Differential Equations, (Ed. by A. Cañada, P.
Drábek, A. Fonda), Vol. 1, pp. 533–589.
[MW] Mawhin, J. and Willem, M. (1989). Critical Point Theory and Hamiltonian
Systems, Applied Math. Sciences, 74, Springer Verlag.
[Mor] Morrey, C. B. Jr. (1966). Multiple Integrals in the Calculus of Variations,
Springer Verlag.
[Mos] Moser, J. (2003). Selected Chapters in the Calculus of Variations, Birkhäuser
Verlag.
[Ne] Nehari, Z. (1961). Characteristic Values Associated with a Class of Nonlinear
Second Order Differential Equations, Acta Math., 105, pp. 141–175.
[Os] Osserman, R. (1986). A Survey of Minimal Surfaces, Dover Publications.
[Po] Pohozaev, S. (2008). Nonlinear Variational Problems via the Fibering Method,
Handbook of Differential Equations, Stationary Partial Differential Equations
(Ed. by M. Chipot), Vol. 5, pp. 49–208.
[PBGM] Pontryagin, L., Boltyanskii, V., Gamkrelidze, R., and Mishchenko, E. (1962).
The Mathematical Theory of Optimal Processes, Interscience Publishers, John
Wiley and Sons.
[Ra1] Rabinowitz, P. (1986). Minimax methods in critical point theory with
applications to differential equations, CBMS Regional Conference Series Math. 65,
AMS, Providence.
[Ra2] Rabinowitz, P. (1990). Homoclinic Orbits for a Class of Hamiltonian Systems,
Proc. Royal Soc. Edinburgh, 114 A, pp. 33–38.
[Ra3] Rabinowitz, P. (1989). Periodic and Heteroclinic Solutions for a Periodic
Hamiltonian System, Ann. Inst. Henri Poincaré, Anal. Non Linéaire, 6,
pp. 331–346.
[St1] Struwe, M. (1988). Plateau’s Problem and the Calculus of Variations,
Mathematical Notes, Princeton University Press.
[St2] Struwe, M. (1990). Variational Methods, Applications to Nonlinear Partial
Differential Equations and Hamiltonian Systems, Springer Verlag.

Index
1-parameter family of diffeomorphisms, 103
1-parameter rotational group, 109

A field of extremals, 47
absolutely continuous part, 292, 298
Additivity, 216
Autonomous systems, 22

Banach, 130
Banach–Alaoglu theorem, 286, 297
Banach–Alaoglu, 125
Bang-Bang control, 278, 285
biholomorphic mapping, 119
Bolza, 183
boundary condition, 20, 248
bounded variation, 289, 293
Brezis and Nirenberg gave the following counterexample, 215
Brouwer degree, 216

Cantor part, 292, 298
Carathéodory condition, 146
Carathéodory system of equations, 62
Cauchy–Riemann equation, 114
Cea, 266
Clairaut’s Theorem, 115
coercive condition, 121
coerciveness, 128
conformal group, 249
conformal mapping condition, 114
conformal mappings, 249
conjugate function, 196
conjugate gradient method, 268, 270, 272
conjugate points, 37
conservation of angular momentum, 110
conservation of energy, 109
conservation of momentum, 109
control variable, 278
convex, 150
convolution operator, 301
coordinate transformations, 24
counterexample, 120, 193
Courant’s Min-Max Theorem, 176
Courant–Lebesgue, 253
Courant–Lebesgue lemma, 256
critical point, 205
critical value, 205

d’Alembert equation, 77
Direct methods, 119
directional field (flow), 47
Dirichlet boundary condition, 80
Dirichlet integral, 76, 126
Dirichlet’s principle, 120, 128
displacement of a moving particle, 16
Du Bois Reymond’s lemma, 16, 74
du Bois–Reymond, 15
dual least action principle, 195, 199

E-L equation, 16, 74
E. De Giorgi, L. Ambrosio, G. Buttazzo, 303
Eberlein–Šmulian, 131
eigenfunction, 167
eigenvalue, 167
Eigenvalue problems and inequalities, 7
eikonal, 61
Einstein field equation, 11
Ekeland, 202
Ekeland variational principle, 203
energy–momentum tensor, 107
Euler–Lagrange equation, 15
Euler–Lagrange operator, 19
existence, 179
Extension Theorem, 140
extremal curve, 46
extreme point, 286
extreme set, 286

Fenchel transform, 196
Fenchel–Moreau, 197
finite element, 263
first eigenvalue, 86
first order variation, 14
Fréchet derivative, 204

The Gâteaux derivative, 147
Gårding’s inequality, 84
generalized derivative, 133, 292
generalized Fourier coefficients, 171
generalized gradient, 150, 293
Geodesic, 6, 17, 243
geodesic equation, 245
Geodesics on a sphere, 24
geometric measure theory, 289
global variational calculus, 225
gradient method, 269

Hadamard, 193
Hamilton system of equations, 65
Hamilton–Jacobi equation, 67
Hamiltonian, 64, 282
Hamiltonian system, 10
Harmonic mappings, 10, 189
Harmonic mappings to spheres, 99
Harmonic oscillators, 71
Hausdorff measure, 294
heteroclinic orbit, 228
holonomic constraints, 94
homoclinic orbit, 229
Homotopy invariance, 216

image restoration, 301
image segmentation, 9
interior minimum, 113
inverse (set-valued) mapping of ∂f(x), 196
isoperimetric problem, 89

Jacobi equation, 35, 84
Jacobi field, 34
Jacobi operator, 35, 84
jumping points, 290

Klein–Gordon field, 110
Krein–Milman theorem, 286
Kronecker’s existence, 216

Lagrange bracket, 57
Lagrange multiplier, 90, 93
Laplace equation, 76
Lebesgue measure, 291
Lebesgue point, 298
Lebesgue–Nikodym decomposition theorem, 292
Legendre transform, 62
Legendre–Hadamard condition, 32
level sets, 211
line of steepest descent, 4
link, 215, 216
Liusternik–Schnirelmann theory, 191
local 1-parameter transformation group, 105
lower semi-continuous, 149
lower semicontinuity, 121

Maxwell’s equations, 78
Mayer field, 52
Mazur’s Theorem, 149
mean curvature, 77
mean curvature equation, 77
mean value property, 184
mesh, 262
minimal surface generated by a surface of revolution, 23
minimal surfaces, 6, 77, 78
modifier, 138
Morrey–Acerbi–Fusco, 158
Morrey–Lichtenstein, 249
Morse Theory, 225
Mountain Pass Theorem, 212

n-simplex, 261
Nehari technique, 206
Neumann boundary condition, 80
nodes, 262
Noether, 106
Noether’s identity, 104, 106
Noether’s theorem, 108
Nonlinear eigenvalue problems, 190
Normality, 216

Obstacle problem, 100

Palais–Smale condition, 202, 205
Parseval’s equality, 85
Parseval’s identity, 172
periodic solution of the second kind, 229
periodic solutions of forced oscillations, 187
permissible control functions, 278
Pettis, 130
phase space, 105
Pohozaev’s identity, 116
Poincaré, 33
Poincaré’s inequality, 36, 127, 142
Poincaré–Cartan invariant, 66
point source dispersion function, 301
Pointwise constraints, 94
Poisson equation, 119
polarization technique, 192
Pontryagin Maximal Principle, 280
positive Lorentz transformation, 110
price functional, 278
propagation of light through a medium, 68

quasi-convex, 158

Ramsey economic growth model, 279
rank 1 convex, 161
Rayleigh–Ritz method, 259
regularity, 179
reinvestment, 9
relaxation function, 299
Rellich, 143
Rellich–Kondrachov compact embedding theorem, 143
Riemann (conformal) mapping theorem, 119
Riemann–Lebesgue theorem, 241
Riemannian manifold, 251
Rudin–Osher–Fatemi model, 301

Scherk surface, 78
second order variation, 30, 81
sequentially compact, 121
sequentially lower semicontinuity, 121
Serrin–Meyers Approximation Theorem, 140
Sobolev, 142
Sobolev Embedding Theorem, 142
Sobolev spaces, 134
space W^{m,p}(Ω), 134
space translation group, 108
spectrum, 167
state equation, 278
state variable, 278
steepest descent, 22
strict Legendre–Hadamard condition, 32, 83
strong (weak) minimum, 43
strong elliptical condition, 83
Strong minimum, 74, 87
Sturm–Liouville operator, 173
sub-differential operator, 196

three-point condition, 250
time translation group, 109
Tonelli–Morrey, 152
topological variational methods, 225
total variation, 290, 293
triangulation, 262
two-point boundary value problem, 186

variational derivative, 18
variational inequality, 101
vibrations of thin plates, 8

Wall Theorem, 215
wave equations, 76
weak convergence, 123
weak minimum, 74
weak-∗ convergence, 123
Weak-∗ sequential compactness, 125
weakly conformal condition, 249
weakly harmonic functions, 184
weakly semi-lower continuous, 128
weakly sequentially lower semi-continuous, 149
Weierstrass, 120
Weierstrass excess function, 46
Weyl, 184
Wirtinger’s inequality, 187

Young’s inequality, 127, 197
568 THE MATHEMATICAL GAZETTE
There is a section that makes use of the Johnson solids (called the Johnson-
Zalgeller solids in the book). These are the 92 convex polyhedra that are more or less
irregular, but whose faces are regular polygons. Edge models of all 92, made from
coloured plastic rods, are shown in a single picture about 12 cm by 10 cm. It is
impossible to make out much detail: they are unlabelled, some overlap others, and
the smallest is a triangular blob about 4 mm by 2 mm. In the text some are referred
to only by number, e.g. J3, J7, J65, with no illustration or other indication of their
shape. Although the numbering is standard and the book claims to be self-contained,
we have to look elsewhere to find what solids these represent.
The English slips occasionally (‘... every repeated pattern has one of two
dimensional symmetry groups ... ’), and there are a few misprints. I was quietly
amused to see a British newspaper that was once renowned for its misprints referred
to as ‘The Gardian’. The mathematician Hallard Croft might not be so amused to see
his name given consistently in the references as H. T. Craft.
There is a lot of jargon, probably unavoidable, but wading through uni-trunk
holders, canonical parallelohedra, (P,Q)-chimera superimpositions,
equicomplementability and equidecomposability is perhaps not for the faint-hearted.
There are chapters that stand alone, and can easily be dipped into. But sections such
as the work leading to Hilbert's third problem and Dehn's theorem are serious
mathematics that has to be worked at. You have only to look at the given definition
of the Dehn invariant to see that it makes nonsense of the claim on the back cover
that ‘It does not require more than a high school level of knowledge ...’.
Despite my strictures this is an interesting work, with much that cannot be found
elsewhere. I shall certainly try to fathom how the 858-box works. This is a book,
perhaps, for the library, with a notice ‘Handle with care’.
10.1017/mag.2017.164 MICHAEL FOX
2 Leam Road, Leamington Spa CV31 3PA
e-mail: vulpius1@gmail.com
Lecture notes on calculus of variations by Kung Ching Chang; translated by Tan
Zhang, pp. 324, $56.00 (paper), ISBN 978-9-81314-623-5, World Scientific (2017).
Calculus of variations deals with maximising or minimising functionals, which
are mappings from a set of functions to real numbers. Particular variational problems
and their solutions, such as the brachistochrone, already appeared soon after the
invention of calculus. Such problems are much more complex than those in ordinary
calculus, with the necessary condition for the attainment of critical values taking the
form of the Euler-Lagrange equation. Many variational problems then become ones
of solving differential equations, and university students will have come across some
of them in the study of mathematical methods. At the more advanced level, one has
to study the foundation of the subject, much of it being built up by Weierstrass; for
example, the space of functions involved being infinite dimensional, even the
existence of an extremal solution often becomes a non-trivial problem. In more
recent times great strides in both the theoretical and applied aspects have taken place
in the development of sophisticated ‘direct methods’. However, because of the
required mathematical maturity from the students, only a handful of universities
offer calculus of variations as an optional fourth year course.
The book being reviewed is a good translation of the lecture notes on calculus of
variations for students at Peking University. There are 20 lectures, with Lectures 1-8
on the classical theory; the foundation and development are covered in Lectures 9-
14, and a wide range of applications are given in the remaining six lectures. The
classical theory includes sufficient conditions for extremal solutions, the Legendre-
Hadamard condition, the isoperimetric problem, Mayer field, Hilbert's invariant
integral, and the Hamilton-Jacobi theory. There is also a lecture on Noether's
theorem, which gives a unified approach to various conservation laws in physics and
mechanics from the invariant properties of the Lagrangian under certain group
actions. Dirichlet's principle, the analysis of the existence of extreme values of
functionals, and the need to introduce notions such as weak sequential lower semi-
continuity are considered in the lectures on foundation and development, which also
include lectures on Sobolev spaces and the Ekeland variational principle. The
lectures on applications include the Mountain Pass theorem, geodesics and minimal
surfaces, numerical methods (Rayleigh-Ritz, finite element, optimisation, conjugate
gradient), optimal control, and image processing.
I learned much from studying the lectures on applications. Take, as an example,
Lecture 16, with the title ‘Periodic solutions, homoclinic and heteroclinic orbits’,
which opens with the consideration of the simple pendulum. The dynamical equation
concerned is ϕ̈ + (g/l) sin ϕ = 0, where g and l have their usual meanings; readers
will be familiar with the simple harmonic solution if sin ϕ is replaced by its first
order approximation ϕ. Here we are given a proper treatment for the problem, and
the topics in the title are considered in some depth.
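For reference, the first order approximation mentioned above can be written out explicitly. This is only a sketch of the standard linearisation; the symbols ω, A, B and T are notation introduced here and are not taken from the book or the review.

```latex
% Linearised pendulum: replace \sin\phi by \phi in \ddot{\phi} + (g/l)\sin\phi = 0
\ddot{\phi} + \frac{g}{l}\,\phi = 0,
\qquad \omega = \sqrt{g/l},
\qquad \phi(t) = A\cos(\omega t) + B\sin(\omega t),
\qquad T = \frac{2\pi}{\omega} = 2\pi\sqrt{l/g}.
```

The constants A and B are fixed by the initial angle and angular velocity, and T is the period of the resulting simple harmonic motion.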
The prerequisite is perhaps too high for most undergraduates—readers are
expected to be thoroughly familiar with concepts from measure theory, functional
analysis, differential geometry and topology, as well as various techniques in applied
mathematics. Sometimes esoteric theorems and methods are used with neither
comment nor reference, for example, the embedding theorems of Rellich-
Kondrachov, and of Nash. The presentation can be somewhat economical in places,
where supplementary work is required for a proper understanding. However, there
are suitably chosen examples throughout the text to illustrate new notions and
concepts, and also exercises at the end of most of the lectures. The book should be
useful to advanced undergraduates and graduate students, and it will also serve as a
reference text.
10.1017/mag.2017.165 PETER SHIU
353 Fulwood Road, Sheffield S10 3BQ
e-mail: p.shiu@yahoo.co.uk
Finding Fibonacci by Keith Devlin, pp. 241, £24.95, ISBN 978-0-691-17486-0,
Princeton University Press (2017).
No, not another book about Fibonacci series and the golden ratio, I'm glad to say.
Keith Devlin, author of Mathematics: The New Golden Age and many other books,
has in recent years devoted a substantial amount of time in search of the historical
Leonardo of Pisa, filius Bonacci (the popular name is a nineteenth-century
construct), and this has given rise to two books: The Man of Numbers [1], and the
present volume. Devlin was fascinated by what he knew of Leonardo's Liber abbaci
(the spelling used throughout). The book was the first to promulgate Hindu and
Arabic methods of computation in the West, and Devlin argues convincingly that by
doing so its publication in 1202 had a revolutionary effect on trade and society. For
example, the normal methods of calculation in the West at that time involved either a
complete absence of ‘working’ or successive obliterations, whereas the new methods
provided a paper audit trail of a sort indispensable to the development of complex
finance. He compares this effect, entertainingly, to the revolution effected by the
introduction of modern computing operating systems which moved computing
power away from the domain of experts and into that of the general public.
