Calculus of Variations
071113 Frank Porter
Revision 171116
1 Introduction
Many problems in physics have to do with extrema. When the problem
involves finding a function that satisfies some extremum criterion, we may
attack it with various methods under the rubric of “calculus of variations”.
The basic approach is analogous with that of finding the extremum of a
function in ordinary calculus.
First, it seems that such a path must exist: the two outer paths in
Fig. 1(b) presumably bracket the correct path, or at least can be made to
bracket the path. For example, the upper path can be adjusted to take an
arbitrarily long time by making the first part more and more horizontal. The
lower path can also be adjusted to take an arbitrarily long time by making
the dip deeper and deeper. The straight-line path from a to b must take
a shorter time than both of these alternatives, though it may not be the
shortest.
It is also readily observed that the optimal path must be single-valued in
x; see Fig. 1(c). A path that wiggles back and forth in x can be shortened in
time simply by dropping a vertical path through the wiggles. Thus, we can
describe path C as a function y(x).
Figure 1: The Brachistochrone Problem: (a) Illustration of the problem; (b)
Schematic to argue that a shortest-time path must exist; (c) Schematic to
argue that we needn’t worry about paths folding back on themselves.
We’ll choose a coordinate system with the origin at point a and the y axis
directed downward (Fig. 1). We choose the zero of potential energy so that
it is given by:
V (y) = −mgy.
The kinetic energy is

T(y) = −V(y) = (1/2) m v²,

for zero total energy. Thus, the speed of the particle is

v(y) = √(2gy).
Different functions y(x) will typically yield different values for T ; we call
T a “functional” of y. Our problem is to find the minimum of this functional
with respect to possible functions y. Note that y must be continuous: it
would require an infinite speed to generate a discontinuity. Also, the
acceleration must exist, and hence the second derivative d²y/dx². We'll
proceed to formulate this problem as an example of a more general class of
problems in "variational calculus".
Consider all functions, y(x), with fixed values at two endpoints: y(x0) = y0
and y(x1) = y1. We wish to find that y(x) which gives an extremum for the
integral:

I(y) = ∫_{x0}^{x1} F(y, y′, x) dx,

where F(y, y′, x) is some given function of its arguments. We'll assume "good
behavior" as needed.
In ordinary calculus, when we want to find the extrema of a function
f (x, y, . . .) we proceed as follows: Start with some candidate point (x0 , y0 , . . .),
Compute the total differential, df , with respect to arbitrary infinitesimal
changes in the variables, (dx, dy, . . .):
df = (∂f/∂x)|_{(x0, y0, ...)} dx + (∂f/∂y)|_{(x0, y0, ...)} dy + ...
Y(x0) = y0,
Y(x1) = y1. (1)
[Figure: a trial function Y(x), a variation h(x) that vanishes at the
endpoints, and the varied function Y + εh.]
We’ll use “δI” to denote the change in I due to this change in functional
form:
δI = ∫_{x0}^{x1} F(Y + h, Y′ + h′, x) dx − ∫_{x0}^{x1} F(Y, Y′, x) dx
   ≈ ∫_{x0}^{x1} [ (∂F/∂y) h + (∂F/∂y′) h′ ]_{y=Y, y′=Y′} dx. (4)
where we have used h(x0 ) = h(x1 ) = 0. Thus,
δI = ∫_{x0}^{x1} [ ∂F/∂y − (d/dx)(∂F/∂y′) ]_{y=Y, y′=Y′} h dx. (6)
Proof: Imagine that f(χ) > 0 for some x0 < χ < x1. Since f is continuous,
there exists ε > 0 such that f(x) > 0 for all x ∈ (χ − ε, χ + ε). Let

h(x) = { (x − χ + ε)²(x − χ − ε)²,  χ − ε ≤ x ≤ χ + ε;
       { 0,                          otherwise. (9)
Note that h(x) is continuously differentiable in [x0 , x1 ] and vanishes at x0
and x1 . We have that
∫_{x0}^{x1} f(x)h(x) dx = ∫_{χ−ε}^{χ+ε} f(x)(x − χ + ε)²(x − χ − ε)² dx (10)
                        > 0, (11)
since f(x) is larger than zero everywhere in this interval. This contradicts
the hypothesis that the integral vanishes for every admissible h; thus, f(x)
cannot be larger than zero anywhere in the interval. The parallel argument
follows for f(x) < 0.
This theorem then permits the assertion that
" !#
∂F d ∂F
+ = 0. (12)
∂y dx ∂y 0 y=Y
y 0 =Y 0
whenever y = Y is such that I is an extremum, at least if this expression is
continuous. We call this expression the "Lagrangian derivative" of F(y, y′, x)
with respect to y(x), and denote it by δF/δy.
The extremum condition, relabeling Y → y, is then:
δF/δy ≡ ∂F/∂y − (d/dx)(∂F/∂y′) = 0. (13)
(∂²F/∂y′²) y″ + (∂²F/∂y∂y′) y′ + ∂²F/∂x∂y′ − ∂F/∂y = 0. (17)
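As a small numeric illustration (a sketch, not from the original notes): for F = √(1 + y′²), whose integral I(y) is the arc length, the Euler-Lagrange equation gives y″ = 0, a straight line. Discretizing I and comparing the straight line against perturbed paths with the same endpoints shows the extremal property directly; the grid size and perturbation shapes below are arbitrary choices.

```python
import math

def path_length(xs, ys):
    # Discrete I(y) = integral of sqrt(1 + y'^2) dx: sum of segment lengths.
    return sum(math.hypot(xs[i + 1] - xs[i], ys[i + 1] - ys[i])
               for i in range(len(xs) - 1))

N = 1000
xs = [i / N for i in range(N + 1)]
straight = [2 * x for x in xs]  # extremal y = 2x, endpoints (0,0) and (1,2)

# Any perturbation vanishing at the endpoints increases the arc length.
for eps in (0.05, 0.2, 0.5):
    wiggled = [2 * x + eps * math.sin(math.pi * x) for x in xs]
    assert path_length(xs, straight) < path_length(xs, wiggled)
```

Here the extremum happens to be a minimum; in general the Euler-Lagrange equation only guarantees stationarity.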
Let us now apply this to the brachistochrone problem, finding the ex-
tremum of:
√(2g) T = ∫_0^{x_b} √( (1 + y′²)/y ) dx. (18)

That is:

F(y, y′, x) = √( (1 + y′²)/y ). (19)
Notice that, in this case, F has no explicit dependence on x, and we can
take a short-cut. Starting with the Euler-Lagrange equation, if F has no
explicit x-dependence we find:
" #
∂F d ∂F 0
0 = − y (20)
∂y dx ∂y 0
6
∂F 0 d ∂F
= y − y0 (21)
∂y dx ∂y 0
dF ∂F d ∂F
= − 0 y 00 − y 0 (22)
dx ∂y dx ∂y 0
!
d ∂F
= F − y0 0 . (23)
dx ∂y
Hence,
F − y′ (∂F/∂y′) = constant = C. (24)
In this case,
y′ (∂F/∂y′) = y′² / √( y(1 + y′²) ). (25)

Thus,

√( (1 + y′²)/y ) − y′² / √( y(1 + y′²) ) = C, (26)

or

y (1 + y′²) = 1/C² ≡ A. (27)
Solving for x, we find
x = ∫ √( y/(A − y) ) dy. (28)
We may perform this integration with the trigonometric substitution: y =
(A/2)(1 − cos θ) = A sin²(θ/2). Then,

x = ∫ √( sin²(θ/2) / (1 − sin²(θ/2)) ) A sin(θ/2) cos(θ/2) dθ (29)
  = ∫ A sin²(θ/2) dθ (30)
  = (A/2)(θ − sin θ) + B. (31)
We determine integration constant B by letting θ = 0 at y = 0. We
chose our coordinates so that xa = ya = 0, and thus B = 0. Constant A is
determined by requiring that the curve pass through (xb , yb ):
x_b = (A/2)(θ_b − sin θ_b), (32)
y_b = (A/2)(1 − cos θ_b). (33)
This pair of equations determines A and θb . The brachistochrone is given
parametrically by:
x = (A/2)(θ − sin θ), (34)
y = (A/2)(1 − cos θ). (35)
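Equations (32)-(33) are transcendental in θ_b, but their ratio y_b/x_b = (1 − cos θ_b)/(θ_b − sin θ_b) decreases monotonically on (0, 2π), so bisection suffices. A minimal sketch (the function name and sample endpoint are my own choices, not from the notes):

```python
import math

def brachistochrone_constants(xb, yb):
    # Solve (1 - cos t)/(t - sin t) = yb/xb for t = theta_b by bisection;
    # the left-hand side decreases monotonically from infinity to 0 on (0, 2*pi).
    target = yb / xb
    lo, hi = 1e-9, 2 * math.pi - 1e-9
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        ratio = (1 - math.cos(mid)) / (mid - math.sin(mid))
        if ratio > target:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    A = 2 * yb / (1 - math.cos(t))   # from eq. (33)
    return A, t

# Recover the constants for a sample endpoint (x_b, y_b) = (1.0, 0.6):
A, theta_b = brachistochrone_constants(1.0, 0.6)
assert abs(A / 2 * (theta_b - math.sin(theta_b)) - 1.0) < 1e-9  # eq. (32)
assert abs(A / 2 * (1 - math.cos(theta_b)) - 0.6) < 1e-9        # eq. (33)
```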
In classical mechanics, Hamilton's principle for conservative systems, which
states that the action is stationary, gives the familiar Euler-Lagrange
equations of classical mechanics. For a system with generalized coordinates
q1, q2, ..., qn, the action is

S = ∫_{t0}^{t} L({q_i}, {q̇_i}, t′) dt′, (36)
Substituting into the Euler-Lagrange equation gives
" #
d d
p(x) f (x) − q(x)f (x) = 0. (43)
dx dx
This is the Sturm-Liouville equation! That is, the Sturm-Liouville differential
equation is just the Euler-Lagrange equation for the functional J.
We have the following theorem:
Theorem: The solution to
" #
d d
p(x) f (x) − q(x)f (x) = g(x), (44)
dx dx
Thus,

∫_0^U [ p(x) d′(x)² + q(x) d(x)² ] dx = 0. (47)
Since p d′² ≥ 0 and q d² ≥ 0, we must thus have p d′² = 0 and q d² = 0
in order for the integral to vanish. Since p > 0 and p d′² = 0, it must
be true that d′ = 0; that is, d is a constant. But d(0) = 0, therefore
d(x) = 0. The solution, if it exists, is unique.
The issue for existence is the boundary conditions. We presume that
a solution to the differential equation exists for some boundary con-
ditions, and must show that a solution exists for the given boundary
condition. From elementary calculus we know that two linearly inde-
pendent solutions to the homogeneous differential equation exist. Let
h1 (x) be a non-trivial solution to the homogeneous differential equation
with h1 (0) = 0. This must be possible because we can take a suitable
linear combination of our two solutions. Because the solution to the
inhomogeneous equation is unique, it must be true that h1 (U ) 6= 0.
Likewise, let h2 (x) be a solution to the homogeneous equation with
h2 (U ) = 0 (and therefore h2 (0) 6= 0). Suppose f0 (x) is a solution to
the inhomogeneous equation satisfying some boundary condition. Form
the function:
f (x) = f0 (x) + k1 h1 (x) + k2 h2 (x). (48)
We adjust constants k1 and k2 in order to satisfy the desired boundary
conditions.
That is,
k1 = (b − f0(U)) / h1(U), (51)
k2 = (a − f0(0)) / h2(0). (52)
with p(x) > 0 and q(x) ≥ 0, attains its minimum if and only if f (x) is
the solution of the corresponding Sturm-Liouville equation.
Proof: Let s(x) be the unique solution to the Sturm-Liouville equation sat-
isfying the given boundary conditions. Let f (x) be any other continu-
ously differentiable function satisfying the boundary conditions. Then
d(x) ≡ f (x) − s(x) is continuously differentiable and d(0) = d(U ) = 0.
Solving for f, squaring, and doing the same for the derivative equation,
yields

f² = d² + s² + 2sd, (54)
f′² = d′² + s′² + 2s′d′. (55)
Let
But
∫_0^U (p d′s′ + q d s + g d) dx
  = [d p s′]_0^U + ∫_0^U [ −d(x) (d/dx)(p s′) + q d s + g d ] dx
  = ∫_0^U d(x) [ −(d/dx)(p s′) + q s + g ] dx,  since d(0) = d(U) = 0
  = 0;  the integrand is zero by the differential equation. (60)
of complete functions, {βn(x)} (not necessarily eigenfunctions):

f(x) = Σ_{n=1}^∞ An βn(x).

p f′² = Σ_m Σ_n Am An p(x) β′m(x) β′n(x). (64)
Let

Cmn ≡ ∫_0^U p β′m β′n dx, (65)
Bmn ≡ ∫_0^U q βm βn dx, (66)
Gn ≡ ∫_0^U g βn dx. (67)
Assume that we can interchange the sum and integral, obtaining, for example,

∫_0^U p f′² dx = Σ_m Σ_n Cmn Am An. (68)

Then

J = Σ_m Σ_n (Cmn + Bmn) Am An + 2 Σ_n Gn An. (69)
Let Dmn ≡ Cmn + Bmn = Dnm . The Dmn and Gn are known, at least in
principle. We wish to solve for the expansion coefficients {An }. To accom-
plish this, use the condition that J is a minimum, that is,
∂J/∂An = 0,  ∀n. (70)
Thus,

0 = (1/2) ∂J/∂An = Σ_{m=1}^∞ Dnm Am + Gn,  n = 1, 2, ... (71)
This is an infinite system of coupled inhomogeneous equations. If Dnm is
diagonal, the solution is simple:

An = −Gn/Dnn. (72)
The reader is encouraged to demonstrate that this occurs if the βn are the
eigenfunctions of the Sturm-Liouville operator.
It may be too difficult to solve the eigenvalue problem. In this case, we can
look for an approximate solution via the “Rayleigh-Ritz” approach: Choose
some finite number of linearly independent functions {α1 (x), α2 (x), . . . , αN (x)}.
In order to find a function

f̄(x) = Σ_{n=1}^N Ān αn(x) (73)

that approximates closely f(x), we find the values for Ān that minimize

J(f̄) = Σ_{n,m=1}^N D̄nm Ām Ān + 2 Σ_{n=1}^N Ḡn Ān, (74)
where now

D̄nm ≡ ∫_0^U (p α′n α′m + q αn αm) dx, (75)
Ḡn ≡ ∫_0^U g αn dx. (76)
In this method, it is important to make a good guess for the set of functions
{αn }.
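As a sketch of the machinery (the concrete choices p = 1, q = 0, g = −1, U = 1 and the sine basis are mine, not the notes'): with αn(x) = sin(nπx), the matrix D̄nm of eq. (75) is diagonal, so minimizing J decouples into D̄nn Ān = −Ḡn, and the result can be compared with the exact solution f(x) = x(1 − x)/2 of f″ = −1, f(0) = f(1) = 0.

```python
import math

U, N = 1.0, 50

def simpson(fn, a, b, m=2000):
    # Composite Simpson quadrature.
    h = (b - a) / m
    s = fn(a) + fn(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * fn(a + i * h)
    return s * h / 3

def alpha(n, x):
    return math.sin(n * math.pi * x / U)

A = []
for n in range(1, N + 1):
    D = (n * math.pi / U) ** 2 * (U / 2)             # D_nn = int p alpha_n'^2 dx
    G = simpson(lambda x: -1.0 * alpha(n, x), 0, U)  # G_n = int g alpha_n dx, g = -1
    A.append(-G / D)                                 # minimizing J: D_nn A_n = -G_n

def f_bar(x):
    return sum(A[n - 1] * alpha(n, x) for n in range(1, N + 1))

exact = lambda x: x * (U - x) / 2  # solves f'' = g = -1 with f(0) = f(U) = 0
assert abs(f_bar(0.5) - exact(0.5)) < 1e-4
```

With a less fortunate basis, D̄nm is a full matrix and a linear system must be solved instead.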
It may be remarked that the Rayleigh-Ritz method is similar in spirit
but different from the variational method we typically introduce in quantum
mechanics, for example when attempting to compute the ground state energy
of the helium atom. In that case, we adjust parameters in a non-linear
function, while in the Rayleigh-Ritz method we adjust the linear coefficients
in an expansion.
5 Adding Constraints
As in ordinary extremum problems, constraints introduce correlations, now
in the possible variations of the function at different points. As with the
ordinary problem, we may employ the method of Lagrange multipliers to
impose the constraints.
We consider the case of the “isoperimetric problem”, to find the stationary
points of the functional:
Z b
J= F (f, f 0 , x) dx, (78)
a
Theorem: (Euler) The function f that solves this problem also makes the
functional I = J + λC stationary for some λ, as long as δC/δf ≠ 0 (i.e., f
does not satisfy the Euler-Lagrange equation for C).
The solution must minimize the potential energy:
V = g ∫_1^2 y dm (84)
  = ρg ∫_1^2 y ds (85)
  = ρg ∫_{x1}^{x2} y √(1 + y′²) dx, (86)
With the substitution y + λ = C cosh θ, we obtain θ = (x + k)/C, where k is
an integration constant, and thus

y + λ = C cosh( (x + k)/C ). (94)
There are three unknown constants to determine in this expression, C, k,
and λ. We have three equations to use for this:
y1 + λ = C cosh( (x1 + k)/C ), (95)
y2 + λ = C cosh( (x2 + k)/C ), and (96)
L = ∫_{x1}^{x2} √(1 + y′²) dx. (97)
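These three conditions can be reduced to a single transcendental equation: subtracting (95) from (96), and evaluating (97) with y′ = sinh((x + k)/C), gives L² − (y2 − y1)² = 4C² sinh²(Δx/(2C)). A numerical sketch under those identities (the function name and sample numbers are mine, not from the notes):

```python
import math

def catenary_constants(x1, y1, x2, y2, L):
    # Solve eqs (95)-(97) for C, k, lambda; assumes x2 > x1 and
    # L greater than the straight-line distance between the endpoints.
    dx, dy = x2 - x1, y2 - y1
    # Reduced equation: L^2 - dy^2 = 4 C^2 sinh^2(dx/(2C)).
    # With u = dx/(2C): sinh(u)/u = sqrt(L^2 - dy^2)/dx, solvable by
    # bisection since sinh(u)/u increases monotonically from 1 on u > 0.
    r = math.sqrt(L * L - dy * dy) / dx
    lo, hi = 1e-12, 1.0
    while math.sinh(hi) / hi < r:
        hi *= 2
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.sinh(mid) / mid < r:
            lo = mid
        else:
            hi = mid
    u = 0.5 * (lo + hi)
    C = dx / (2 * u)
    # dy = 2C sinh(u) sinh(m) and L = 2C sinh(u) cosh(m), m = (x1+x2+2k)/(2C):
    m = math.atanh(dy / L)
    k = C * m - (x1 + x2) / 2
    lam = C * math.cosh((x1 + k) / C) - y1
    return C, k, lam

C, k, lam = catenary_constants(0.0, 0.0, 2.0, 1.0, 3.0)
# The recovered curve passes through both endpoints and has length L:
assert abs(C * math.cosh((2.0 + k) / C) - lam - 1.0) < 1e-8
assert abs(C * (math.sinh((2.0 + k) / C) - math.sinh(k / C)) - 3.0) < 1e-8
```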
6 Eigenvalue Problems
We may treat the eigenvalue problem as a variational problem. As an exam-
ple, consider again the Sturm-Liouville eigenvalue equation:
" #
d df (x)
p(x) − q(x)f (x) = −λw(x)f (x), (98)
dx dx
Lf = −λwf. (99)
Notice here that we may take C = 1, corresponding to normalized eigenfunc-
tions f , with respect to weight w.
Let’s attempt to find approximate solutions using the Rayleigh-Ritz method.
Expand

f(x) = Σ_{n=1}^∞ An un(x), (104)

where un(0) = un(U) = 0. The un are some set of expansion functions, not the
eigenfunctions; if they are the eigenfunctions, then the problem is already
solved! Substitute this into I, giving
I = Σ_{m=1}^∞ Σ_{n=1}^∞ (Cmn − λ Dmn) Am An, (105)

where

Cmn ≡ ∫_0^U (p u′m u′n + q um un) dx, (106)
Dmn ≡ ∫_0^U w um un dx. (107)
Requiring I to be stationary,
∂I/∂Am = 0,  m = 1, 2, ..., (108)
yields the infinite set of coupled homogeneous equations:
Σ_{n=1}^∞ (Cmn − λ Dmn) An = 0,  m = 1, 2, ... (109)
Solve for the “best” approximation of this form by finding those {Ān } that
satisfy
Σ_{n=1}^N ( C̄mn − λ̄ D̄mn ) Ān = 0,  m = 1, 2, ..., N, (111)
where
C̄mn ≡ ∫_0^U (p α′m α′n + q αm αn) dx, (112)
D̄mn ≡ ∫_0^U w αm αn dx. (113)
This looks like N equations in the N +1 unknowns λ̄, {Ān }, but the overall
normalization of the An ’s is arbitrary. Hence there are enough equations in
principle, and we obtain
λ̄ = ( Σ_{m,n=1}^N C̄mn Ām Ān ) / ( Σ_{m,n=1}^N D̄mn Ām Ān ). (114)
where we have used both the boundary condition f(0) = f(U) = 0 and the
Sturm-Liouville equation (d/dx)(p f′) = q f − λ w f to obtain the third
line. Also,

λ̄ = J(f̄)/C(f̄), (117)
how we do. For simplicity, we’ll try a Rayleigh-Ritz approximation with only
one term in the sum.
As we noted earlier, it is a good idea to pick the functions with some
care. In this case, we know that the lowest eigenfunction won’t wiggle much,
and a good guess is that it will be symmetric with no zeros in the interval
(−1, 1). Such a function, which satisfies the boundary conditions, is:
f̄(x) = Ā (1 − x²), (119)
becomes
(C − λ̄D)Ā = 0. (124)
If Ā ≠ 0, then

λ̄ = C/D = 5/2. (125)
We are within 2% of the actual lowest eigenvalue of λ1 = π 2 /4 = 2.467. Of
course this rather good result is partly due to our good fortune at picking a
close approximation to the actual eigenfunction, as may be seen in Fig. 6.
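The numbers behind eq. (125), C = ∫ α′² dx = 8/3 and D = ∫ α² dx = 16/15 on (−1, 1), are easy to verify numerically (a sketch; the quadrature routine is my own choice):

```python
import math

def simpson(fn, a, b, m=2000):
    # Composite Simpson quadrature.
    h = (b - a) / m
    s = fn(a) + fn(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * fn(a + i * h)
    return s * h / 3

# Trial function alpha(x) = 1 - x^2 on (-1, 1), with p = w = 1, q = 0:
C = simpson(lambda x: (-2 * x) ** 2, -1, 1)      # int alpha'^2 dx = 8/3
D = simpson(lambda x: (1 - x * x) ** 2, -1, 1)   # int alpha^2 dx = 16/15
lam = C / D
assert abs(lam - 5 / 2) < 1e-9
assert abs(lam - math.pi ** 2 / 4) / (math.pi ** 2 / 4) < 0.02  # within 2%
```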
[Figure 6: the trial function 1 − x² compared with the exact lowest
eigenfunction on (−1, 1); the two curves nearly coincide.]
ux ≡ ∂u/∂x,  uy ≡ ∂u/∂y,  hx ≡ ∂h/∂x,  etc. (128)

Then

dI/dε = ∫∫_D [ (∂F/∂u) h + (∂F/∂ux) hx + (∂F/∂uy) hy ] dx dy. (129)
We want to "integrate by parts" the last two terms, in analogy with the
single-variable case. Recall Green's theorem:

∮_S (P dx + Q dy) = ∫∫_D ( ∂Q/∂x − ∂P/∂y ) dx dy, (130)

and let

P = −h (∂F/∂uy),  Q = h (∂F/∂ux). (131)

With some algebra, we find that

dI/dε = ∮_S h [ (∂F/∂ux) dy − (∂F/∂uy) dx ]
      + ∫∫_D h [ ∂F/∂u − (D/Dx)(∂F/∂ux) − (D/Dy)(∂F/∂uy) ] dx dy, (132)

where

Df/Dx ≡ ∂f/∂x + (∂f/∂u)(∂u/∂x) + (∂f/∂ux)(∂²u/∂x²) + (∂f/∂uy)(∂²u/∂x∂y) (133)

is the "total partial derivative" with respect to x.
The boundary integral over S is zero, since h(x ∈ S) = 0. The remaining
double integral over D must be zero for arbitrary functions h, and hence,

∂F/∂u − (D/Dx)(∂F/∂ux) − (D/Dy)(∂F/∂uy) = 0. (134)
This result is once again called the Euler-Lagrange equation.
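As a short worked example (my own, not from the notes): choosing F = (ux² + uy²)/2 in this two-dimensional Euler-Lagrange equation yields Laplace's equation.

```latex
% With F = (u_x^2 + u_y^2)/2 we have
%   dF/du = 0,  dF/du_x = u_x,  dF/du_y = u_y,
% and, since (D/Dx) u_x = u_{xx} and (D/Dy) u_y = u_{yy} by eq. (133),
% eq. (134) becomes Laplace's equation:
\frac{\partial F}{\partial u}
 - \frac{D}{Dx}\frac{\partial F}{\partial u_x}
 - \frac{D}{Dy}\frac{\partial F}{\partial u_y}
 = -\left(\frac{\partial^2 u}{\partial x^2}
        + \frac{\partial^2 u}{\partial y^2}\right) = 0.
```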
8 Exercises
1. Suppose you have a string of length L. Pin one end at (x, y) = (0, 0)
and the other end at (x, y) = (b, 0). Form the string into a curve such
that the area between the string and the x axis is maximal. Assume
that b and L are fixed, with L > b. What is the curve formed by the
string?
2. We considered the application of the Rayleigh-Ritz method to finding
approximate eigenvalues satisfying
y″ = −λ y, (135)

with boundary conditions y(−1) = y(1) = 0. Repeat the method, now with two
functions:

α1(x) = 1 − x², (136)
α2(x) = x²(1 − x²). (137)
You should get estimates for two eigenvalues. Compare with the exact
eigenvalues, including a discussion of which eigenvalues you have man-
aged to approximate and why. If the eigenvalues you obtain are not
the two lowest, suggest another function you might have used to get
the lowest two.
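After working the exercise by hand, a numerical cross-check can be sketched as follows (the quadrature and the closed-form 2 × 2 determinant solution are my own choices; the matrices follow eqs. (112)-(113) with p = w = 1 and q = 0):

```python
import math

def simpson(fn, a, b, m=2000):
    # Composite Simpson quadrature.
    h = (b - a) / m
    s = fn(a) + fn(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * fn(a + i * h)
    return s * h / 3

alpha = [lambda x: 1 - x ** 2, lambda x: x ** 2 * (1 - x ** 2)]
dalpha = [lambda x: -2 * x, lambda x: 2 * x - 4 * x ** 3]

C = [[simpson(lambda x: dalpha[i](x) * dalpha[j](x), -1, 1)
      for j in range(2)] for i in range(2)]
D = [[simpson(lambda x: alpha[i](x) * alpha[j](x), -1, 1)
      for j in range(2)] for i in range(2)]

# det(C - lam D) = 0 is a quadratic: a lam^2 - b lam + c = 0.
a = D[0][0] * D[1][1] - D[0][1] * D[1][0]
b = (C[0][0] * D[1][1] + C[1][1] * D[0][0]
     - C[0][1] * D[1][0] - C[1][0] * D[0][1])
c = C[0][0] * C[1][1] - C[0][1] * C[1][0]
disc = math.sqrt(b * b - 4 * a * c)
lam_lo, lam_hi = (b - disc) / (2 * a), (b + disc) / (2 * a)

assert abs(lam_lo - math.pi ** 2 / 4) < 1e-3  # close to the lowest eigenvalue
assert 20 < lam_hi < 30
```

Compare the two roots with the exact spectrum (nπ/2)², as the exercise asks.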
d²y/dx² + (1/x)(dy/dx) + ( k² − m²/x² ) y = 0. (138)