Jean-Louis Basdevant
Springer
Professor Jean-Louis Basdevant
Physics Department
Ecole Polytechnique
91128 Palaiseau
France
jean-louis.basdevant@polytechnique.edu
springer.com
Preface
Preface
1 Introduction
1.1 Esthetics and Physics
1.2 Metaphysics and Science
1.3 Numbers, Music, and Quantum Physics
1.4 The Age of Enlightenment and the Principle of the Best
1.5 The Fermat Principle and Its Consequences
1.6 Variational Principles
1.7 The Modern Era, from Lagrange to Einstein and Feynman
2.4 Problems
Introduction
The same need explains why physics is deeply filled with esthetic consider-
ations. In fact, the beauty of a theory has very often been considered as a
decisive argument in its favor. Albert Einstein's general relativity is a famous
example. It was formulated in 1916 but only got its true experimental
verifications 70 years later.1 Nevertheless, nobody seriously thought that the
theory could really be disproved.2 As Lev Davidovich Landau says ([1],
Section 82), "[It] is probably the most beautiful of existing theories. It is
remarkable that Einstein constructed it purely by deductive arguments and that
it is only afterwards that it was confirmed by astronomical observations."
The ingredients of esthetics have many origins. Of course, the beauty of
an idea in itself is difficult if not impossible to define in general. However,
two factors are easier to identify. These are the simplicity of a theory and
its unifying nature. Below, we will mention the archetype of such an
intellectual achievement, the Pythagorean musical scale. There are numerous
other examples.
1 One usually makes a distinction between the verifications of the equivalence
principle, such as the deviation of light rays by a gravitational field, the
variation of the pace of a clock in a gravitational potential, or the general
relativistic corrections to celestial mechanics, and the true predictions of
Einstein's equations, such as the radiation of gravitational waves.
2 This does not mean that one should give up finding experimental proofs.
After extensive work, both observational3 and calculational,4 Johannes
Kepler established his famous laws on the motion of planets in the solar system.
The discovery that, from a Copernican viewpoint, the orbits are the pure and
legendary ellipses of the geometry of Apollonius, Euclid and Archimedes has
a beauty and a simplicity that Kepler could not resist. He naturally conceived
the universe as being constructed in a mathematical esthetic that exhibits
both purity and unity. He expressed his emotion in his celebrated phrase
"Nature likes simplicity."5 It was both a triumph and a wonder when in the
framework of his Principia Isaac Newton was able to deduce Kepler's laws.
The same thing happened in the unification of electricity and magnetism
by André-Marie Ampère, followed by that of electromagnetism and light by
James Clerk Maxwell. This amazing adventure of the 19th century lasted
for a long time. The mathematical structure of Maxwell's equations revealed
relativity. The unification of electro-weak interactions by Sheldon Glashow,
Steven Weinberg and Abdus Salam in the 1960s was the next stage of this
fascinating endeavor. It led to the prospect of unifying all fundamental
interactions, including gravitation. At each step, simplicity, unity, and
esthetics are dominant features.
Simplicity does not mean that things can become understandable by the
layman. It is quite the contrary. The simplicity appears within the mathemat-
ical language. Galileo was the first to realize that:
Philosophy is written in the immense book which is constantly open
in front of us (I mean the universe), however, one cannot understand
it without first learning the language and the characters in which it is
written. It is written in the mathematical language and its characters
are triangles, circles and other geometrical figures in the absence of
which it is not possible for a human being to understand a single word
of it.
It is tempting to recall the words of Leonardo da Vinci in his Treatise on
Painting: "Non mi legga chi non è matematico, nelli mia principi."6 Simplicity
lies in the mysterious possibility of representing natural phenomena by
more and more general mathematical structures. If one can say that one of
the most fundamental mathematical structures of quantum mechanics lies in
the simplest of the four basic operations (i.e. addition), it is a consequence
3 Galileo's telescope was invented in 1609, ten years later than Kepler's works.
4 Kepler dedicated his memoir Mysterium cosmographicum to John Napier, who
invented logarithms. Kepler said that without logarithms he never would have
been able to perform his accurate and difficult calculations.
5 Natura simplicitatem amat.
6 Those who are not mathematicians should not read my principles.
7 "What really is time? If no one asks me, I know. But if someone asks the question
and that I must explain it, I no longer know." Saint Augustine, Confessions XI,
XIV, 17.
The birth of modern physics is commonly placed in the 17th century with
Galileo. In fact, he laid its two founding stones: the experimental method and
the formulation of the theory in a mathematical language.
However, the starting point of experimental and theoretical physics lies
2200 years before that. In fact, the Pythagorean theorem overshadows what is,
in precisely Galileo's terms, the first modern discovery in physics: the theory
of sounds and the musical scale. It is modern in the sense that this discovery
possesses the two properties of having an experimental foundation and of
being expressed in a mathematical manner.
Music is the first abstract art. It is fascinating because it reaches the
subconscious directly. It escapes any attempt to be verbalized. Apart from
technical discussions between experts, one cannot put music into words. Musical
writing is a permanent source of amazement, as one can see in Figure 1.1.
Fig. 1.1. Sylvano Bussotti, "Piano pieces for David Tudor # 4," excerpt from
"Pieces de Chair II" (Pieces of Flesh). (Courtesy of Casa Ricordi-BMG Ricordi
Milan; all rights reserved.)
It is difficult to put a date on the birth of this art, but for sure, very
quickly, humans in their songs understood the existence of harmony. The
octave, which is the simplest example, consists in the amazing discovery that
the same sound can be reproduced at a high pitch as well as at a low pitch.
The legend is that Pythagoras discovered the explanation of the musical
pitch by noticing that the pitch was directly related to the length of the ob-
ject that produced the sound. He used to pass daily in front of a blacksmith's
workshop in Samos, his native island.8 He observed that rods of iron of
different length gave different sounds under the blacksmith's hammer. As Arthur
Koestler says ([2], Chapters V and VII), "The ear-splitting crashes and bangs
in the workshop which, since the Bronze Age had yielded to the Iron Age,
had been regarded by ordinary mortals as a mere nuisance, were suddenly
lifted out of their habitual context: the 'bangs' became 'clangs' of music. In
the technical language of the communication engineer, Pythagoras had turned
'noise' into 'information.'"
Back home, Pythagoras proceeded to an experimental verification of his
ideas on musical objects, in particular the vibrating strings of a lyre. He
understood that if he divided a string by integers belonging to the tetraktys,
the set of integers 1, 2, 3, 4 whose sum is the "perfect" number 10, he obtained
what had for a long time been named the "harmony": the octave, the fifth,
and the fourth.
As Denis Diderot wrote in the entry Pythagorism of his encyclopedia
"L'Encyclopédie":
After studying all subtleties he could find on the harmonics of a sound, and
the way he could reduce them to the interval of one octave by dividing them
by powers of 2, Pythagoras ended up with musical scales, in particular the one
that bears his name, which is shown in Table 1.1. The numbers indicate ratios
of frequencies. (The Greek modes were expressed in a decreasing sequence
according to the length of the string.)
note                  C    D     E       F     G     A       B         C
ratio of frequencies  1    9/8   81/64   4/3   3/2   27/16   243/128   2
In this scale, the intervals between two consecutive notes can take only
two values: the tone (ratio 9/8) and the half-tone (ratio 256/243). Pythagoras
considered it particularly important that the numerators and the denomina-
tors of these fractions were powers of the elements of the tetraktys (in that
case 2 and 3). For him, this scale has a much greater beauty than all others.
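As a hedged illustration of the construction described above, the following Python sketch rebuilds the scale by stacking fifths (ratio 3/2) and multiplying or dividing by powers of 2 to bring each note back into a single octave. The function names, and the choice of starting one fifth below the base note, are assumptions made for this example; they are not taken from the original text.

```python
from fractions import Fraction


def reduce_to_octave(ratio):
    """Bring a frequency ratio into the interval [1, 2) using powers of 2."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio


def pythagorean_scale():
    """Seven notes obtained from successive fifths, plus the upper octave."""
    fifth = Fraction(3, 2)
    # One fifth below the base note up to five fifths above it (assumed range).
    notes = [reduce_to_octave(fifth ** k) for k in range(-1, 6)]
    return sorted(notes) + [Fraction(2)]


scale = pythagorean_scale()
# Interval between each pair of consecutive notes.
steps = [upper / lower for lower, upper in zip(scale, scale[1:])]

print("scale:", ", ".join(str(r) for r in scale))
print("distinct intervals:", sorted(set(steps)))
```

Exact rational arithmetic (`Fraction`) is used so that the two distinct intervals, the tone and the half-tone, appear as exact ratios rather than rounded decimals.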
In fact, at this point, we must give a further observation of Diderot, in his
last sentence of the entry:
The motion of celestial orbits, which carries the seven planets, forms
a perfect concert.
As Aristoxenus put it, one merit of Pythagoras was that he "elevated
arithmetic above the needs of merchants." He transformed a set of empirical and
utilitarian rules, in particular for trade, into a genuine deductive science. How-
ever, starting from his analysis of musical harmony, which can be reduced to
integer numbers, he could not resist believing that numbers are the principle,
the source, and the root of all things. Therefore, we are back to metaphysics.
According to this principle, the Pythagoreans elaborated a mystical "arith-
mology" by assigning qualitative properties to numbers. For instance, they
arrived at the idea that they could conceive and describe the cosmos and
its origin by the harmony of spheres. The principle of harmony invaded all
the philosophy of the Pythagoreans. They believed that all the universe is
determined by integers and the resulting harmony.
Pythagoras himself is one of the most mysterious personalities in Greek
antiquity. We do not possess any written text of his. For a long time, his
thought was known through oral tradition. Aristotle avoided mentioning his
name and only spoke about the Pythagoreans. Pythagoras was born in the 6th
century B.C. in Samos, in Asia Minor. Around the age of 40, he emigrated to
Crotone, in Italy. He founded a community, which was both religious and
political. The community was massacred during a revolt of the population. He
elevated integer numbers to the rank of foundations of the world. One legend
is that he committed suicide on the day he discovered he had proven that
$\sqrt{2}$ was irrational; i.e. that he could not express the diagonal of a
square in terms
of its side as the ratio of two integers.
Numerology played a considerable role in the development of science in
the 19th century. John Dalton's law of multiple proportions enabled chemical
reactions to be reduced to an interplay of integers and secured atomic theory.
The classification of species in zoology as well as in botany rested on the
counting of various elements such as the number of petals in botany and the
number of teeth or nails in zoology.
The phenomenological analysis of atomic spectra involved rational frac-
tions. This adventure led to one of the most amazing breakthroughs, Johann
Balmer's integer number formula and its role in the birth of quantum me-
chanics.
It was by chance that, in 1885, Balmer, who was a high school teacher
in Basel and passionate about numerology, became aware of the first four
lines of the visible part of the spectrum of atomic hydrogen. He noticed that
the wavelengths of these lines could be fitted to one part in a thousand by a
formula involving integers: $1/\lambda \propto (n^2 - 4)/n^2$, $n \geq 3$. Although he was not a
physicist, this result struck him with its simplicity and its beauty. In his 1885
paper, he wrote: "It appears to me that hydrogen ... more than any other
substance, will open new roads in the knowledge of matter, of its structure
and of its properties."
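The quality of this fit can be checked with a few lines of Python. The sketch below writes Balmer's formula as wavelength = B n^2 / (n^2 - 4), which is the same proportionality as above. The constant B of about 364.56 nm and the rounded measured wavelengths are assumptions introduced for this illustration; neither is quoted in the text.

```python
# Assumed approximate value of Balmer's constant, in nanometres.
BALMER_CONSTANT_NM = 364.56

# Assumed rounded measured wavelengths (nm) of the four visible hydrogen lines.
MEASURED_NM = {3: 656.3, 4: 486.1, 5: 434.0, 6: 410.2}


def balmer_wavelength_nm(n):
    """Wavelength from Balmer's formula, so that 1/lambda ~ (n^2 - 4)/n^2."""
    if n < 3:
        raise ValueError("Balmer's formula requires n >= 3")
    return BALMER_CONSTANT_NM * n**2 / (n**2 - 4)


relative_errors = {
    n: abs(balmer_wavelength_nm(n) - measured) / measured
    for n, measured in MEASURED_NM.items()
}

for n, measured in MEASURED_NM.items():
    print(f"n={n}: formula {balmer_wavelength_nm(n):.2f} nm, "
          f"measured {measured} nm, relative error {relative_errors[n]:.1e}")
```

With these assumed inputs, every line agrees with the formula to better than one part in a thousand, in line with the accuracy stated above.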
In fact, in 1912, when the 27-year-old Niels Bohr was working at Ernest
Rutherford's laboratory on a model of an atom, he was completely unaware of
Balmer's formula and of the analogous formulas of Johannes Rydberg concern-
ing alkali atoms. When, by accident, he was informed of the Balmer formula,
it took him a few weeks to construct his celebrated model of the hydrogen
atom, one of the turning points of quantum physics.
An amusing enigma remains to be settled, namely the empirical law that
Titius found in 1772 and that was also published by Bode in 1778. This law
is a relation between the distances a of planets from the sun (more exactly
the semi-major axes of the ellipses), expressed in astronomical units
(1 A.U. = 150 million km), and their ranks n, assuming the rank of Mercury is
$n = -\infty$ and the rank of Venus n = 1. The original form of the law was
a = (n + 4)/10 with n = 0, 3, 6, 12, 24, 48, ... ; the present form is
$$a = 0.4 + 0.3 \times 2^{n-1}$$
where a is the distance between the planet and the sun. For Mercury,
$n = -\infty$ and a = 0.4; n = 1 for Venus; n = 2 for the Earth; n = 3 for
Mars; and n = 5 for Jupiter. The "gap" observed for n = 4 led to the discovery
of the belt of asteroids when astronomers tried to observe a planet at a
distance of 2.8 A.U. The Titius-Bode law, which is accurate up to Uranus,
fails at larger distances (it gives a = 77.2 A.U. for Pluto, whose actual
distance from the sun is 39.2 A.U. and which is now classified as a "dwarf
planet," after the August 26, 2006, Congress of the International Astronomical
Union). It is currently debated whether or not it holds for extra-solar
planetary systems
discovered in recent years. No dynamical calculation has ever been able to
recover this formula from the theory of celestial mechanics.
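The law is easy to tabulate. The following Python sketch evaluates the present form of the formula for each rank and sets it beside approximate observed distances. Apart from Pluto's 39.2 A.U., which is quoted above, the observed values are rounded figures assumed for this illustration, and the rank n = 8 for Neptune is inferred from the sequence rather than stated in the text.

```python
import math


def titius_bode_au(n):
    """Titius-Bode distance a = 0.4 + 0.3 * 2**(n - 1), in astronomical units."""
    # For n = -infinity the power of two evaluates to 0.0, giving a = 0.4.
    return 0.4 + 0.3 * 2.0 ** (n - 1)


# (name, rank n, approximate observed distance in A.U.); observed values other
# than Pluto's are assumptions added for comparison.
BODIES = [
    ("Mercury", -math.inf, 0.39),
    ("Venus", 1, 0.72),
    ("Earth", 2, 1.00),
    ("Mars", 3, 1.52),
    ("asteroid belt", 4, 2.8),
    ("Jupiter", 5, 5.20),
    ("Saturn", 6, 9.54),
    ("Uranus", 7, 19.2),
    ("Neptune", 8, 30.1),
    ("Pluto", 9, 39.2),
]

for name, rank, observed in BODIES:
    predicted = titius_bode_au(rank)
    print(f"{name:14s} n={rank!s:>4}  law: {predicted:6.1f}  observed: {observed:6.2f}")
```

The table makes the breakdown beyond Uranus visible at a glance: the predicted distance doubles at each step while the observed distances do not.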
Philosophers of the 18th century were fond of the idea of balance and
equilibrium. Let us mention, for its topicality, the following assertion of
Charles de Secondat de Montesquieu in The Spirit of the Laws: "In any public
office, one must compensate the might of the power by the brevity of its
duration."
With the philosophy of Gottfried Wilhelm Leibniz (1646-1716), there ap-
pears an acknowledgment that optimal conditions appear in Nature. Coming
back to Diderot and the entry Leibnizianism in the "Encyclopédie," one can
read the following:
him. He was convinced that things could be done properly. As Fermat said,
"It seems to me that a little geometry can help us solve the problem."
When he managed to formulate the law of refraction $n_1 \sin i_1 = n_2 \sin i_2$, in
a geometrical manner, Fermat was fascinated: "The fruits of my work were the
most unexpected and the most extraordinary that ever were. In fact ... I have
found that my principle gives exactly and precisely the same proportion of
refractions that Monsieur Descartes established." At the end of 1661, Fermat
wrote his principle of least time, which started everything. He called it the
principle of natural economy.
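The link between least time and the law of refraction can be verified numerically. In the sketch below, a ray goes from a point at height h1 above a flat interface to a point at depth h2 below it, crossing the interface at abscissa x. The travel time is minimized over x and the resulting angles are compared with the law of refraction. The geometry, the refractive indices, and the simple ternary search are all assumptions chosen for illustration; this is not Fermat's own geometrical argument.

```python
import math


def travel_time(x, n1, n2, h1, h2, d):
    """Optical path length (travel time multiplied by c) through point x."""
    return n1 * math.hypot(x, h1) + n2 * math.hypot(d - x, h2)


def minimize_unimodal(f, lo, hi, iterations=200):
    """Ternary search for the minimum of a unimodal function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


# Hypothetical configuration: air above (n1), water below (n2).
n1, n2 = 1.0, 1.33
h1, h2, d = 1.0, 1.0, 2.0

x_min = minimize_unimodal(lambda x: travel_time(x, n1, n2, h1, h2, d), 0.0, d)

# Sines of the angles of incidence and refraction, measured from the normal.
sin_i1 = x_min / math.hypot(x_min, h1)
sin_i2 = (d - x_min) / math.hypot(d - x_min, h2)

print(f"n1 sin i1 = {n1 * sin_i1:.6f}")
print(f"n2 sin i2 = {n2 * sin_i2:.6f}")
```

The travel time is a convex function of x, which is why a ternary search is sufficient here; the two printed products agree to within the numerical precision of the search.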
In 1744, Pierre-Louis Moreau de Maupertuis (1698-1759), who in 1730 had
introduced the ideas of Newton into France and continental Europe, stated
for the first time the principle of the least amount of action in mechanics.
Even though the initial form and justification given by Maupertuis are
obscure, it is a historical landmark in the evolution of ideas in physics and
likewise, at the time, in philosophy.
In the same line as Fermat, Maupertuis understood that, in some well-
defined conditions, Newton's equations were equivalent to the fact that a
quantity, which he called the action, was minimal. His statement is the fol-
lowing:
The Action is proportional to the product of the mass by the velocity
and by space. Now, here is this principle, so wise, so worthy of the
Supreme Being: when some change occurs in Nature, the amount of
Action used for this change is always the smallest possible.
As Philip M. Morse and Herman Feshbach say [10], variational principles are
the mathematical formulation of the superlative. This formulation of physical
laws consists in imposing that some typical physical quantity of the system
under consideration is optimal for the actual performance of the system com-
pared with the value it would take if one were to imagine a different perfor-
mance. In a certain sense, owing to their universality, variational principles
can appear as a general "metatheory" of physics and perhaps, one day, of other
branches of science such as biology, psychology, and social phenomena. They
play a central role in economics. The first formulation of a physical theory
consists in explaining a phenomenon by a local law. This is the case for Newton's
laws of dynamics, for the Snell-Descartes laws, and for the differential laws
of electromagnetism or thermodynamics. After this first formulation has been
performed and exploited, one always seeks the underlying basic principles and
their relations with other theoretical schemes. "Variational principles" express
physical laws in a global manner. The corresponding formulation can restore
the local laws; however, one discovers that it is richer and more powerful. It
allows us to bring out the fundamental principles of the laws under consider-
ation. This provides a more fruitful view both of fundamental principles and
their applications.
This way of considering physical processes and structures can be traced
back to Greek mathematicians and philosophers. The Greeks characterized a
straight line as the shortest path between its endpoints. In the first century
B.C., Hero of Alexandria had discovered and proved the remarkable fact that
the equality of the angles of incidence and reflection in geometrical optics boils
down to the fact that the length of the path between the source and the eye
of the observer is the shortest possible. In the same spirit, the Aristotelians
thought they could "justify" that celestial orbits are circular by the fact that,
for a given value of the perimeter of closed planar curves, the circle is the
one that surrounds the largest area (this is called the isoperimetric problem in
mathematics).10 Considering a straight line as the shortest path between two
points or a circle as the shortest line around a given area are simple ways to
define these geometric objects.
Similarly, in physics, saying that electric current is distributed in a network
in such a way that the energy loss by Joule heating is as small as possible is a
simple and direct description of the flow of electric current that encompasses
a variety of particular cases without using any complicated mathematics. Of
course, calculations reappear as soon as one applies these principles to specific
cases. The assertion that a physical system acts or evolves in such a way that
some function related to it is minimum or maximum is very often the starting
point of theoretical investigations, and it enables one to uncover the ultimate
relations between physical facts.
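The statement about Joule heating can be illustrated on the smallest possible network. The sketch below takes two resistors in parallel carrying a fixed total current, minimizes the dissipated power over the way the current is shared, and compares the result with the usual current-divider rule. The resistor values, the total current, and the ternary search are hypothetical choices made for this example.

```python
def joule_power(i1, r1, r2, total_current):
    """Power dissipated when branch 1 carries i1 and branch 2 the remainder."""
    i2 = total_current - i1
    return r1 * i1**2 + r2 * i2**2


def minimize_unimodal(f, lo, hi, iterations=200):
    """Ternary search for the minimum of a unimodal function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


# Hypothetical values: resistances in ohms, total current in amperes.
r1, r2, total_current = 2.0, 3.0, 5.0

i1_min = minimize_unimodal(
    lambda i1: joule_power(i1, r1, r2, total_current), 0.0, total_current
)
i2_min = total_current - i1_min

# Current-divider rule that follows from Kirchhoff's laws.
i1_kirchhoff = total_current * r2 / (r1 + r2)

print(f"minimum dissipation: i1 = {i1_min:.6f} A, i2 = {i2_min:.6f} A")
print(f"current divider:     i1 = {i1_kirchhoff:.6f} A")
print(f"branch voltages:     {r1 * i1_min:.6f} V and {r2 * i2_min:.6f} V")
```

At the minimum, the two branch voltages coincide, which is exactly the loop condition one would otherwise impose by hand.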
Therefore, variational principles present natural phenomena as problems
of optimization under constraints. They are present in all sectors of physics (a
10 The legend says that, when she founded Carthage, Dido had to satisfy the con-
dition that the city should be contained within a bull's skin. She cut the skin in
narrow strips in order to make an enormous circle with it.
11 In the same line of thought, he gave a famous argument in favor of free choice.
The argument is known as "Buridan's Ass," where two piles of hay are set at
equal distances from a starving donkey. Nobody, even God, can predict which
pile the poor beast will choose. In order to express such ideas at that time in the
Sorbonne, some amount of courage, authority, and skill was necessary.
also straight; it is due to the impetus of gravity, also called natural impetus,
and the cannonball falls down.12
Fig. 1.2. Successive phases of the motion of a cannonball in the theory of impetus.
The metaphysical enthusiasm did not last very long. This was not due to any
lack of intellectual richness or esthetics but because since then variational prin-
ciples have never stopped producing important physical results. Our ambition
in this book is to describe a few of them.
Leonhard Euler (1707-1783) and Joseph-Louis Lagrange (1736-1813),
whose works were pursued by William R. Hamilton (1805-1865), set the
mathematical foundations of the subject. They laid a foundation stone of
present-day theoretical physics.
The consequences of this conception of physics can be found in Einstein's
general relativity as well as in gauge theories of fundamental interactions.
The central mathematical tool is the variational calculus (also called calcu-
lus of variations). This is the work of Euler, who understood the mathematical
foundation, and Lagrange who made a decisive contribution in 1766. 13 Vari-
12 The fall is steeper than the rise of the violent impetus. Fortunately, air friction
does produce this effect!
13 Euler, who had been partially blind since the age of 28, became completely
blind in that same year of 1766. The 18-year-old Lagrange visited him in 1754
and told him about his work. Euler was filled with wonder over the talent of
this young man, and he hid his own results for some time so that the full
credit would go to Lagrange. This is nearly a unique example, nonexistent
nowadays, of human courtesy and passion for science.
Fig. 1.3. Diagram from the Polish 16th century artillery handbook Ars Magne
Artilleriae pars prima: DellAqua Praxis: examples of shots. (One can imagine
that 20th century colliding beam facilities were already in the minds of
people.) Archives of Casimir Siemienowicz, General of the artillery of the
Polish and Lithuanian Crown. (Courtesy of Richard J. Orli.)
work was the starting point of the work of Hamilton who called it a "scientific
poem written by the Shakespeare of Mathematics."
Hamilton was born in Dublin. Like Lagrange, he was also a precocious
child. At the age of 19, he wrote a remarkable paper on optics. At the age of
23, he became Professor of Astronomy at Dublin and Royal Astronomer at the
Dunsink Observatory. He spent all his life in Dublin and in his observatory.
Hamilton's interest in optics came from the instruments in his laboratory.
His memoir On caustics, written in 1824, is a milestone of optics. Soon after
that, he developed and amplified the analytical mechanics of Lagrange, and
he gave it its modern form.
Hamilton was fascinated by variational principles and, in particular, by the
similarity between Maupertuis's principle in mechanics and Fermat's principle
in optics. In 1830, he made the remarkable observation that the formalisms
of optics and mechanics could be unified and that Newtonian mechanics cor-
responds to the same limit or approximation as geometrical optics compared
with wave optics.
His contemporaries paid no attention to that remark, and the great math-
ematician Felix Klein said in 1890 that it was a shame. Of course, in 1830,
there was no experimental evidence for Planck's constant. Nevertheless, to
a large extent, Hamilton's work can be considered a precursor of quantum
mechanics.
Our main purpose here is to give an instructive account of the analytical
mechanics of Lagrange and Hamilton. These are inescapable chapters in the
culture of physicists. We shall also show the many spinoffs in other sectors.
We shall, in particular, show the relation of analytical mechanics with optics
and with quantum mechanics.
In Chapter 2, we recall Fermat's principle that was given in 1661 as a
least time principle. Fermat poses the problem of the propagation of light by
asking what is the effective path followed by a light ray to go from one point
to another. This will bring us in a natural way to the mathematical core of
our purpose, the variational calculus of Euler and Lagrange. It is a very rich
chapter of mathematics. Here, we only wish to obtain physical results in a
simple and straightforward manner.
We will investigate some simple examples in order to get acquainted with
the matter. These will be the Maupertuis least action principle and other
more unexpected examples, such as Kirchhoff's laws or Poisson's equation in
electrostatics.
Finally, we shall turn to a case that is quite analogous in its spirit but is
fascinating because of the number and power of its results compared with the
simplicity of its starting point, the foundations of statistical thermodynam-
ics. Introducing the technique of Lagrange multipliers and the principle of
equiprobability of configurations, we will obtain a very remarkable definition
of temperature, together with its first physical property, that temperatures
of systems in thermal contact equalize. Next, we will give the statistical ab-
solute definition of entropy due to Ludwig Boltzmann. This will lead us to a
momenta or generalized momenta, which plays a central role in all that will
follow.
Finally, we will extend these considerations to the case of a relativistic
particle. Our starting point will precisely be relativistic invariance. The least
action principle can only be meaningful if it determines the motion of a particle
in the same way, whatever the relative state of motion of the observer. This
will enable us to construct the Lagrangian of a relativistic particle. We shall
see how the energy and momentum of a free particle are related to its mass
and velocity. We will prove that the set {E/c, p} is a four-vector of space-time
in relativity.
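For orientation, the standard textbook results that this program leads to can be summarized as follows. The notation (m for the mass, v for the velocity, c for the speed of light) is an assumption made here, and the formulas are given as a reminder rather than as the derivation developed later in the book.

```latex
\begin{align}
  L &= -mc^{2}\sqrt{1 - v^{2}/c^{2}}, \\
  \mathbf{p} &= \frac{\partial L}{\partial \mathbf{v}}
             = \frac{m\mathbf{v}}{\sqrt{1 - v^{2}/c^{2}}}, \\
  E &= \mathbf{p}\cdot\mathbf{v} - L
     = \frac{mc^{2}}{\sqrt{1 - v^{2}/c^{2}}}, \\
  (E/c)^{2} - \mathbf{p}^{2} &= m^{2}c^{2}.
\end{align}
```

The last line is the invariant associated with the four-vector {E/c, p}: its value is the same for all observers in uniform relative motion.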
Chapter 4 leads us to the next stage, in the 1830s, and to the so-called
canonical formulation of analytical mechanics due to Hamilton. The canonical
formalism was elaborated in 1834. It is more convenient for a series of problems
such as the dynamics of point-like particles. But it is impressive, above all, in
the number of its developments, both in physics and in mathematics. In the
present book, we are mainly concerned with applications to mechanics, but
we shall describe several other spinoffs of Hamilton's work. We will establish
the canonical formalism that consists in describing the state of a system by
conjugate variables (i.e., positions {x} and Lagrange conjugate momenta {p})
and not by positions and velocities. In other words, a system is described by a
point in phase space, and it is characterized by a Hamiltonian that is obtained
from the Lagrangian by a Legendre transformation.
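As a minimal worked example of this Legendre transformation, consider a single particle of mass m in a potential V(x). The one-dimensional setting and the symbols are assumptions chosen for illustration; the general treatment is the subject of Chapter 4.

```latex
\begin{align}
  L(x,\dot{x}) &= \tfrac{1}{2}m\dot{x}^{2} - V(x), \\
  p &= \frac{\partial L}{\partial \dot{x}} = m\dot{x}, \\
  H(x,p) &= p\,\dot{x} - L = \frac{p^{2}}{2m} + V(x), \\
  \dot{x} &= \frac{\partial H}{\partial p} = \frac{p}{m}, \qquad
  \dot{p} = -\frac{\partial H}{\partial x} = -\frac{dV}{dx}.
\end{align}
```

The last line gives two coupled first-order equations, which together are equivalent to Newton's second-order equation for x.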
After finding Hamilton's canonical equations, which are first order coupled
differential equations for the evolution of the state variables, we shall present
some aspects of dynamical systems. In fact, this type of physical problem has
been an amazing source of discoveries, both in mathematics and in physics.
Henri Poincaré founded this field of research in 1885 when he studied the
three-body problem. This leads to fascinating developments, such as the behavior
for $t = \infty$, attractors and strange attractors, bifurcations, chaos, etc. The most
famous strange attractor is the Lorenz attractor, named after its inventor,
Edward N. Lorenz, who discovered it in 1963 in a mathematical model for the
evolution of the atmosphere. Lorenz generated a new and spectacular source
of interest in chaos with his "butterfly" effect in meteorology.
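The sensitivity to initial conditions behind the "butterfly" effect can be seen in a short numerical experiment. The sketch below integrates the Lorenz equations for two initial states that differ by one part in 10^8 and records their separation. The parameter values (sigma = 10, rho = 28, beta = 8/3), the fourth-order Runge-Kutta scheme, the time step, and the initial state are assumptions chosen for illustration and are not taken from the text.

```python
import math


def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (assumed standard parameters)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(f, state, dt):
    """One fourth-order Runge-Kutta step for an autonomous system."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def distance(a, b):
    """Euclidean distance between two states."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def separation_history(initial, perturbation=1e-8, dt=0.01, steps=4000):
    """Distance between two nearby trajectories, recorded at every step."""
    a = initial
    b = (initial[0] + perturbation, initial[1], initial[2])
    history = [distance(a, b)]
    for _ in range(steps):
        a = rk4_step(lorenz, a, dt)
        b = rk4_step(lorenz, b, dt)
        history.append(distance(a, b))
    return history


history = separation_history((1.0, 1.0, 1.0))
for step in range(0, len(history), 500):
    print(f"t = {step * 0.01:5.1f}   separation = {history[step]:.3e}")
```

The separation grows roughly exponentially until it saturates at the size of the attractor, after which the two trajectories are effectively unrelated.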
Next, we will introduce the Poisson brackets, which bear a mathemati-
cal structure of great interest and whose applications are closer to what we
are concerned with here. Jacobi considered that to be Poisson's greatest dis-
covery. In fact, Poisson brackets are the starting point of the theory of Lie
groups. We shall use them to define canonical transformations, which have
many applications and show that there is a complete equivalence between
the two types of state variables: positions {x} and momenta {p}. From the
mathematical point of view, phase space is the space that is appropriate to
describe the evolution of a set of points, as opposed to the "empirical" space
of positions and velocities. We will then be able to understand in a natural
way the amazing property discovered by Dirac in 1925. There is a remark-
able similarity between analytical mechanics and quantum mechanics if one
in common are equal or strictly proportional. These quantities are the two
concepts of mass. One is the inertial mass, or the coefficient of inertia, and
the other is the coupling coefficient to the gravitational field, or the gravita-
tional mass. There is no a priori argument that can explain why this equality
occurs. In a gravitational field, this equality eliminates the mass from the
equations of motion. Two bodies placed with the same initial conditions have
the same motion whatever their masses. It took some time to realize how deep
this observation is. The historical experiment of Eötvös14 in 1890 has been
systematically redone and improved since then. It is still performed with more
and more sophisticated techniques.
The underlying idea of general relativity is that the equality becomes nat-
ural if what we call the "gravitational" motion is actually a free motion in a
curved space-time. Einstein used to say that in 1907, when he was working
on how to incorporate Newtonian gravitation in relativity (the incorporation
of electromagnetism was by construction automatic), he had the "happiest
thought of his life." He was thinking of what a carpenter falling from the roof
would feel. For such an "observer" (and of course as long as he does not en-
counter any obstacle), there is no gravitational field. If this observer lets any
object "fall" from his pocket, this object stands still or has a uniform linear
motion with respect to him, whatever its nature or physical and chemical
composition. (The resistance of the atmosphere is of course neglected in this
example.)
The ambition of this chapter is to show how the notion of motion in a
curved space can lead to a theory where the equality of the "two masses"
emerges naturally. We start by studying the free motion of a particle in a
curved space and the notion of the metric of the space. We will then write the
motion of a free particle in such a space. This will lead us to a fundamental
result: The physical trajectories are the geodesics of the space, the curves of
minimal (or extremal) length. As we shall see, this is how the motion of a
particle of constant energy E in a Euclidean space-time can be transformed
into the free motion of the same particle in a curved space, which is equivalent
to the Maupertuis principle.
This will allow us to understand the reasoning of Einstein when he con-
structed general relativity and some consequences of this theory. We will dis-
play three historical examples: The variation of the beat of a clock due to the
gravitational field, the corrections to Newton's celestial mechanics, and the
deviation of light rays by a gravitational field. These examples are histori-
cal. They are also very important in present-day astrophysics and cosmology.
The deviation of light by a gravitational field plays an important role via the
gravitational lensing effect that it induces. One application is the search for
a baryonic component in the "missing mass" of the universe. Another is that
the mass distribution in the universe, be it the visible mass or the missing
mass, can act as a natural telescope that can enable us to see faraway objects
and therefore much younger objects. Through this natural cosmic telescope
(or microscope), the universe appears as an endless gallery of gravitational
mirages.
14 Roland Eötvös, "Über die Anziehung der Erde auf verschiedene Substanzen,"
Math. nat. Ber. Ungarn, 8, 65 (1890).
Finally, Chapter 7 is devoted to Feynman's variational formulation of
quantum mechanics. Richard P. Feynman was probably the greatest theo-
retical physicist of the second half of the 20th century. In his thesis work in
1942 at Princeton, Feynman attempted to solve the problem of the self-mass
of the electron, which is infinite in second-order perturbation theory in quan-
tum electrodynamics. Feynman discovered a "least action principle," which
enabled him to solve the problem by using both retarded and advanced po-
tentials. In order to do this, he introduced the mathematical concept of path
integrals, which has been a field of extensive interest since then. The first tri-
umph of this method came when it led to the correct calculation of the Lamb
shift in the hydrogen atom without introducing any arbitrary cutoff parame-
ters. The infinities were dealt with in a systematic and well-defined manner in
terms of basic physical parameters. Since then, the renormalization group has
acquired a depth that places it at the forefront of present theoretical physics.
It was only a few years later that Feynman understood that he could apply
his ideas to a variational formulation of nonrelativistic quantum mechanics. In
an article published in 1948 15 followed a few years later by the book Quantum
Mechanics and Path Integrals by Feynman and Hibbs [20], which corresponds
to the course Feynman gave on quantum mechanics at Caltech for a few years,
one can find the essence and the beauty of his ideas and results.
The two pillars of this approach are the following. First, Feynman is not
interested in states of a system but rather in amplitudes of processes. This is a
more realistic attitude in the sense that any phenomenon, any measurement,
consists in a process. Second, Feynman addresses the problem of quantum
mechanics in space-time.
Feynman's approach relies on the superposition principle. To any physical
process there correspond a number of complex amplitudes that add up. The
probability of observing an event is the modulus squared of the sum of ampli-
tudes that can lead to that event. The Feynman principle consists in assuming
that the phase of the amplitude for a given process is given by the classical
action along the path under consideration divided by Planck's constant $\hbar$.
The sum of all amplitudes that contribute to the process under consideration
is a mathematically complicated object called a path integral.
Feynman shows how one recovers the Einstein and de Broglie relations,
together with the Schrödinger equation, observables, and all usual quantum
mechanics in this framework. If one considers systems and processes where
the classical action S(b, a) is macroscopic (i.e., much larger than Planck's
constant $\hbar$), the contributions of paths that may seem very close to each
other classically but are such that the difference of the classical action along
these paths is much larger than $\hbar$ will interfere destructively.
The total contribution of the sum of such paths will therefore vanish in the
global amplitude.
However, in the vicinity of the classical trajectory $x_{cl}(t)$, the action
$S_{cl}(b, a)$ is stationary. Therefore, the only paths that contribute appreciably
are those for which the action S(b, a) is sufficiently close to the classical
action $S_{cl}(b, a)$, the difference being small compared with $\hbar$. In other words,
under these considerations, it is only an infinitesimal vicinity of the classical
trajectory, impossible to resolve experimentally, that occurs. The "probabil-
ity" of the classical trajectory is therefore equal to one. In this way, classical
mechanics appears as the limit of quantum mechanics for macroscopic values
of the action. In addition, as we shall see, the amplitude satisfies identically
a modern version of the Huygens-Fresnel principle in optics.
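The cancellation argument above can be illustrated numerically. The following is only a minimal sketch under stated assumptions: the paths are restricted to a hypothetical one-parameter family labeled by a deformation amplitude a, and the action is taken to be quadratic around the classical path, so that the phase is kappa * a^2 with an arbitrary large value kappa = 50 standing for a macroscopic action in units of Planck's constant. None of these names or values come from the text.

```python
import cmath

def window_amplitude(center, half_width=0.5, kappa=50.0, steps=20000):
    """Midpoint-rule sum of exp(i * kappa * a^2) over a window of deformations a.

    kappa plays the role of (S - S_cl) / hbar per unit a^2 (assumed toy model).
    """
    da = 2.0 * half_width / steps
    total = 0j
    for k in range(steps):
        a = center - half_width + (k + 0.5) * da
        total += cmath.exp(1j * kappa * a * a) * da
    return total

# Window centered on the stationary point a = 0 (the classical path).
near = abs(window_amplitude(0.0))
# Window of the same width far from the classical path.
far = abs(window_amplitude(3.0))
print(near, far)
```

In this toy model the window around the classical path gives a contribution far larger than an equally wide window away from it, where the rapidly rotating phases cancel.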
Therefore, Feynman's principle contains an amazing unifying esthetics af-
ter the five previous chapters of this book. It consists in taking into account,
in the calculation of an amplitude, the "largest number" of possible paths con-
strained by the fact that paths that are too far apart interfere destructively.
One can also visualize this as the fact that an amplitude increases when the
"volume" of the space of alternative paths that contribute in a coherent man-
ner is larger. From that point of view, the phase of an amplitude acquires a
physical role and an essence that is perhaps not fully appreciated.
2
Variational Principles
The remarkable aspects of variational principles are twofold. First, they re-
veal that natural structures and processes result from principles of optimal
conditions. Second, they are universal. All physical laws can be expressed in
such a global form. This form leads to the local expression of physical laws,
but it is richer and more powerful. In particular, it reveals the fundamental
principles that govern physical laws.
Variational principles possess the common feature of presenting natural
phenomena as a result of optimization under constraints. The founding idea
in modern physics and its first formalization are due to Fermat and the prin-
ciple he proposed in geometrical optics. This was followed by the variational
calculus developed by Euler and Lagrange in the 18th century.
In this chapter, we review a number of examples and introduce the neces-
sary mathematical tools. In Section 2.1, we turn back to the Fermat principle,
in particular Fermat's proof of the laws of refraction. Fermat did not know
the velocity of light and the existence of an index of refraction. He assumed
that the time it takes light to travel a certain distance in a medium is propor-
tional to the "resistance" of that medium to the propagation of light. Fermat
stated his "least time principle" at the end of 1661. He called it the "prin-
ciple of natural economy." We know that this principle explains curved light
rays and mirages, which the Snell-Descartes laws cannot account for. This
will directly lead us to the central underlying mathematical foundation of the
problem under consideration: the variational calculus of Euler and Lagrange.
It is an amazing chapter of mathematics, both in its unifying aspect and in
the number of problems that it allows one to solve. Deliberately, we shall not
go into any mathematical details. Such details can be found in the literature,
and we shall focus on physical applications and results.
In Section 2.2, we will give a series of examples. First is the "least action
principle," as first stated by Maupertuis for mechanics in 1744. This was a
landmark in the evolution of ideas in physics as well as philosophy at the
time. Then we shall display simple, but more original, applications such as
Kirchhoff's laws in electricity or Poisson's equation in electrostatics.
In Section 2.3, we consider a physical problem that is very similar in its
spirit but is fascinating in the number and importance of its consequences
compared with the simplicity of the starting point. This concerns the foun-
dations of statistical thermodynamics. We shall introduce the technique of
Lagrange multipliers and the principle of equiprobability of configurations.
From this, a very simple definition of the notion of temperature will emerge,
together with its first physical property: The temperatures of two systems in
thermal contact at equilibrium are equal. Then, we will obtain the statistical
and absolute definition of entropy, due to Boltzmann. This will lead us to the
simple but striking principle that Thermodynamical equilibrium corresponds
to a situation where the entropy is maximum, given the constraints imposed
on the system; in other words, a situation where disorder is maximum, given
the constraints.
Fig. 2.1. Possible light rays between an emitter A and an observer B when there
is a reflection on a plane. Since B' is symmetric to B with respect to the plane of
the mirror, the length of AOB' is equal to that of AOB. The shortest path between
A and B' is a straight line. A path AFB is longer whenever $F \neq O$.
Refraction
Concerning the laws of refraction, Descartes had assumed that the velocity
of light in matter (a dense medium) is greater than in a vacuum (or in a
dilute medium).1 That fact, together with the lack of rigor of Descartes's
"proof," had made Fermat angry. He was convinced that things could be
done properly. As he said, "It seems to me that some geometry can help us
solve this problem."
Fermat solved the problem of refraction only much later, in 1661, annoyed
by the critiques of Descartes's supporters. The key point of his proof lies in
the assumption that the velocity of light is, on the contrary, smaller in a dense
medium than in a dilute one.
Let (X, Y) be the plane separating the two media of indices $n_1$ and $n_2$.2
The source is at point A and the observer is at point B, as represented in
Figure 2.2.
Let H and H' be the projections of A and B on the (x, y) plane. We denote
by h the distance of A to the surface and h' that of B. The distance HH' is
l. We consider a path AOB and we denote by x the distance HO. We want
to minimize the optical path $n_1\, AO + n_2\, OB$.
By the Pythagorean theorem, we have
$$AO = \sqrt{h^2 + x^2}, \qquad OB = \sqrt{h'^2 + (l - x)^2}.$$
Fig. 2.2. Possible light ray between an emitter A and an observer B when there is
refraction across a plane surface between two media of indices $n_1$ and $n_2$. H and H'
are the projections of A and B on the surface. h is the distance between A and the
surface, and h' that between B and the surface. The distance HH' is l.
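The optical path is extremal when its derivative with respect to x vanishes.
We note that
$$\frac{d\,AO}{dx} = \frac{x}{\sqrt{h^2 + x^2}} = \sin i_1$$
and
$$\frac{d\,OB}{dx} = -\frac{l - x}{\sqrt{h'^2 + (l - x)^2}} = -\sin i_2,$$
where the angles $\theta_1$ and $\theta_2$ are indicated in the figure, and $i_1$ and $i_2$ are the
angles of incidence and refraction.
Therefore, we obtain the Snell-Descartes law
$$n_1 \sin i_1 = n_2 \sin i_2. \tag{2.4}$$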
Rescuing a swimmer
This result can be transposed into many other situations. One example is the
optimal path that a rescuer must follow on a beach and in the water in order
to rescue a bather in difficulty. The velocities of the rescuer on the beach, $v_1$,
and in the water, $v_2$, are not the same. The optimal trajectory, which can be
sketched as in Figure 2.2, obeys the law 3
$$\frac{\sin i_1}{v_1} = \frac{\sin i_2}{v_2}.$$
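The rescuer's problem can be checked numerically. The sketch below is an illustration under assumed values (the distances and speeds are hypothetical, chosen only for the example): it searches for the point where the rescuer should enter the water so that the total time is least, and then compares the two ratios sin(i)/v on the beach and in the water.

```python
import math

# Assumed illustrative values, not taken from the text (meters and m/s).
H, H_PRIME, L = 30.0, 20.0, 50.0   # distance of A and of B to the shoreline, and HH'
V1, V2 = 5.0, 1.0                  # running speed on the beach, swimming speed

def travel_time(x):
    """Time to run from A to the entry point O (HO = x), then swim from O to B."""
    return math.hypot(H, x) / V1 + math.hypot(H_PRIME, L - x) / V2

def best_entry_point(iterations=200):
    """Ternary search; the travel time is a convex function of x."""
    lo, hi = 0.0, L
    for _ in range(iterations):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if travel_time(a) < travel_time(b):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)

x_opt = best_entry_point()
ratio_beach = (x_opt / math.hypot(H, x_opt)) / V1                  # sin(i1) / v1
ratio_water = ((L - x_opt) / math.hypot(H_PRIME, L - x_opt)) / V2  # sin(i2) / v2
print(x_opt, ratio_beach, ratio_water)
```

At the optimal entry point the two printed ratios agree to numerical precision, which is the refraction law with the index of each medium replaced by the inverse of the rescuer's speed.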
Curved Rays
Fig. 2.3. Light ray between an emitter A and an observer B in a medium whose
index of refraction varies with the altitude z. The variable x is the horizontal dis-
tance. We assume that the problem is translation invariant in the perpendicular y
direction. The apparent direction of point A as seen by B is the tangent to the light
ray reaching B.
The light rays propagate along curved paths and not straight lines, and
the optical angular position of an object differs from its geometrical direction.
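From the mathematical point of view, we need to find the path z = Z(x)
of a light ray propagating in a medium of index n(z, x) (or simply n(z) if the
system is translation invariant along the x direction) and going from a point
A at $(z_0, x_0)$ to an observer B at $(z_1, x_1)$. The time dT that it takes light to
go from [x, z] to [x + dx, z + dz] is
$$dT = \frac{1}{c}\, n(z)\, \sqrt{dx^2 + dz^2},$$
so that the total travel time is
$$T = \frac{1}{c} \int_A^B n(z)\, \sqrt{1 + \dot z(x)^2}\; dx. \tag{2.5}$$
More generally, the variational problem consists in finding the function z(x)
that makes extremal an integral of the form
$$I = \int_A^B \mathcal{L}(z, \dot z, x)\; dx,$$
where the endpoints A and B are fixed, where $\dot z(x) = dz(x)/dx$, and where $\mathcal{L}$
is a known function, called the Lagrange function. Needless to emphasize, it
is exactly the problem of equation (2.5).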
Let us assume there exists a solution, which we denote as z = Z(x). We
want that, for any infinitesimal variation $\delta z(x)$ of Z(x), there corresponds
a second-order (or more) variation of the integral I. In the transformation
$Z \to Z + \delta z(x)$, $\dot Z \to \dot Z + \delta \dot z(x)$, where it is assumed that the endpoints of
the integration do not change, $\delta z(A) = \delta z(B) = 0$, the variation $\delta I$ of the
integral is
$$\delta I = \int_A^B \left[ \frac{\partial \mathcal{L}}{\partial z} - \frac{d}{dx} \left( \frac{\partial \mathcal{L}}{\partial \dot z} \right) \right] \delta z(x)\; dx. \tag{2.7}$$
2.1 The Fermat Principle and Variational Calculus 27
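We want this integral to vanish for any infinitesimal variation $\delta z$. The
integrand must vanish identically. Therefore, the solution z = Z(x) must
satisfy the second-order differential equation
$$\frac{\partial \mathcal{L}}{\partial z} = \frac{d}{dx} \left( \frac{\partial \mathcal{L}}{\partial \dot z} \right). \tag{2.8}$$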
Let us come back to the case of curved rays considered above. Consider the
integral (2.5) and let us assume for definiteness that the index of refraction
varies with the altitude as $n(z) = 1 + \nu z$, with $\nu$ constant.
(This formula is only valid for a finite range in z; we could use $n(z) = n_0 + \nu z$
for negative $\nu$.) We also assume that the endpoints are at the same height,
z(x = 0) = h and z(x = l) = h. The Lagrange function is
$$\mathcal{L} = \frac{1}{c}\, (1 + \nu z)\, \sqrt{1 + \dot z(x)^2},$$
from which we deduce the Lagrange-Euler equation $(1 + \nu z)\, \ddot z = \nu\, (1 + \dot z^2)$,
or, in terms of $u = z + 1/\nu$,
$$1 + \dot u^2 = u\, \ddot u, \tag{2.11}$$
whose solution is $u = C \cosh((x - x_0)/C)$, with C and $x_0$ constants.
In this simplistic model, the trajectory of the curved ray is a cosh function
whose minimum (or maximum) altitude is attained at x = l/2 (the symmetry
of the problem).
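As a numerical illustration of this model, the sketch below compares the optical path along a cosh-shaped ray with the optical path along the straight line joining the endpoints and along a slightly deformed ray. It rests on assumptions that are not in the text: the values of the index gradient, of the endpoint height, and of the horizontal span are arbitrary, the ray is written as z(x) = C cosh((x - l/2)/C) - 1/nu, and the factor 1/c is dropped since it does not affect the comparison.

```python
import math

# Assumed illustrative values: index gradient, endpoint height, horizontal span.
NU, H, L = 0.1, 1.0, 2.0

def solve_c(lo=5.0, hi=11.0):
    """Bisection for C in C * cosh(L / (2C)) = H + 1/NU (shallow branch)."""
    target = H + 1.0 / NU
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mid * math.cosh(L / (2.0 * mid)) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

C = solve_c()

def ray(x):
    """Cosh-shaped ray through (0, H) and (L, H), symmetric about x = L/2."""
    return C * math.cosh((x - L / 2.0) / C) - 1.0 / NU

def optical_path(z, steps=4000):
    """Sum of n(z) * (segment length) over short straight segments."""
    dx = L / steps
    total = 0.0
    for k in range(steps):
        z0, z1 = z(k * dx), z((k + 1) * dx)
        n_mid = 1.0 + NU * 0.5 * (z0 + z1)
        total += n_mid * math.hypot(dx, z1 - z0)
    return total

straight = optical_path(lambda x: H)
curved = optical_path(ray)
wiggled = optical_path(lambda x: ray(x) + 0.05 * math.sin(math.pi * x / L))
print(curved, straight, wiggled)
```

With these values the cosh ray dips slightly below the straight line, toward the region of smaller index, and its optical path is shorter than both alternatives.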
This situation is encountered in mirages. Perhaps the most common is
highway shimmer. Parts of a hot highway can appear as "lakes." This sort
of mirage is sketched in Figure 2.4. The index of refraction is smaller near
the highway, where the temperature is high and the air less dense, whereas it
increases with the altitude, where the temperature is lower. The "lake" is a
reflection of the sky. Such a case is called an inferior mirage. The apparent
image is below the actual direction of the object. This is depicted in Figure
2.4.
Fig. 2.4. Diagrams of an inferior mirage (left) and a superior mirage (right).
As one can understand from this simple example, a more complex variation
of the index of refraction n( z) will lead to a variety of phenomena. The reverse
happens if the index is smaller at high altitudes than at lower altitudes. This
type of situation happens when light rays propagate near a hot hill. These are
called superior mirages. One can then see an object that should be hidden
geometrically by the hill, such as the famous mirages in the desert.
At sunset, one can see the sun for quite a long time after it has gone below
the geometrical horizon. As shown in Figure 2.5, when the sun is close to
the horizon, light rays cross an atmosphere whose index of refraction varies
considerably with the altitude. At sunset, the angle between the apparent
direction of the sun and its actual direction is roughly half a degree. The sun
is far below the horizon (see [3] for other examples).
Mirages happen frequently in the Arctic and Antarctic. For a long time, the
line of sight crosses a large thickness of the atmosphere. Over that distance, the
density and chemical composition of the atmosphere can vary considerably.
This results in spectacular effects.
Fig. 2.5. Actual and apparent directions of the sun near the horizon. They differ
by ~0.5 degrees. (Labels in the figure: apparent direction of sun, path of light, Sun, Earth.)
Figure 2.6 is a picture taken during a German expedition led by the ship
Germania in the Arctic in 1888. It is particularly rich, since for both ships
there are two superior mirages, inverted with respect to one another. Between
the ships, one can see an iceberg. This picture is reminiscent of the legend
of the Flying Dutchman 4 at the Cape of Good Hope (in the Southern Hemisphere).
Fig. 2.6. Double superior mirages observed by sailors of the Germania expedition
in the Arctic in 1888. (Courtesy of Roger Lapthorn.)
Figure 2.7 shows two mirages: The superior mirage of an iceberg in the
arctic and a remarkable double sunset mirage, where the lower, inferior, image
of the sun in the forefront is caused by the strong density variations inside a
layer of clouds visible on the picture (see http://www.atoptics.co.uk/).
The variations of the index of refraction of the atmosphere generate a series
of effects, in particular lensing effects. It is possible to observe islands, ships,
and coasts that are several hundred kilometers away. The variation of the index
allows one to see and take pictures of the famous "Green ray" at sunset (see
Pekka Parviainen at http://virtual.finland.fi/finfo/english/mirage2.html).
4 The "Flying Dutchman" was a famous sailor. He claimed he could sail around
the Cape whatever the weather conditions. Years after he disappeared in a huge
storm, many sailors claimed they had seen his ship, in particular in the sky, which
was proof that storms were unable to beat him.
Fig. 2.7. Above: superior mirage of icebergs in the Arctic. (Courtesy of Pekka Parviainen.)
Below: remarkable double sunset mirage observed at Paranal Observatory
in the Atacama Desert, Chile, by Luc Arnold in 2002 at the site of the European
Southern Observatory Very Large Telescope. (Courtesy of Luc Arnold.)
In 1744, Maupertuis stated for the first time his principle of the least quantity
of action in mechanics. Even though the initial version and justification of
Maupertuis are confused, it is a historical landmark in the evolution of ideas,
both in physics and, at the time, in philosophy.
Consider a particle of mass m and velocity v. The action of Maupertuis
is the product of three terms: the mass, the velocity, and the distance covered.
Actually, it is the integral of the linear momentum projected along the
trajectory: $A = \int m v\, dl$.
The correct formulation and the proof of Maupertuis's principle were given
a little later by Euler. In present terminology, consider a point-like particle
of mass m in a potential $V(\mathbf{r})$. We denote by $\mathbf{v}$ the velocity and $v$ its norm.
Assuming (this is essential) that the energy E is a constant of the motion, we
have
$$E = \frac{1}{2} m v^2 + V(\mathbf{r}).$$
The action of Maupertuis is
$$A = \int_a^b \sqrt{2m\, (E - V(\mathbf{r}))}\; dl, \tag{2.14}$$
where dl is the length element along the trajectory. The principle of Maupertuis
is that the physical trajectory that the particle follows to go from a to b
with a fixed energy E is the path that makes (2.14) minimum.
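There are many proofs. We parameterize the state variables $\{\mathbf{r}, \dot{\mathbf{r}}\}$, where
$\mathbf{r} = (x, y, z)$, by the time t on the physical trajectory (i.e., we work with
$\{\mathbf{r}(t), \dot{\mathbf{r}}(t)\}$). The times of departure $t_a$ and arrival $t_b$ are therefore well defined.
We have $dl = v\, dt = \sqrt{\dot x^2 + \dot y^2 + \dot z^2}\; dt$, and the action (2.14) is
$$A_{a,b} = \int_{t_a}^{t_b} \sqrt{2m\, (E - V(\mathbf{r}))}\; \sqrt{\dot x^2 + \dot y^2 + \dot z^2}\; dt. \tag{2.15}$$
The Lagrange-Euler equations for this integral are
$$\begin{aligned}
-m v\, \frac{\partial V}{\partial x}\, \frac{1}{\sqrt{2m\, (E - V(\mathbf{r}))}} &= \frac{d}{dt} \left( \frac{\dot x}{v}\, \sqrt{2m\, (E - V(\mathbf{r}))} \right), \\
-m v\, \frac{\partial V}{\partial y}\, \frac{1}{\sqrt{2m\, (E - V(\mathbf{r}))}} &= \frac{d}{dt} \left( \frac{\dot y}{v}\, \sqrt{2m\, (E - V(\mathbf{r}))} \right), \\
-m v\, \frac{\partial V}{\partial z}\, \frac{1}{\sqrt{2m\, (E - V(\mathbf{r}))}} &= \frac{d}{dt} \left( \frac{\dot z}{v}\, \sqrt{2m\, (E - V(\mathbf{r}))} \right).
\end{aligned} \tag{2.16}$$
Since $\sqrt{2m\, (E - V(\mathbf{r}))} = m v$ on the physical trajectory, these three equations reduce to
$$-\nabla V = m\, \frac{d\mathbf{v}}{dt} \qquad \text{QED}. \tag{2.17}$$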
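The principle can be tried out on a case where the Newtonian trajectory is known in closed form. The sketch below is an illustration under assumptions of our own choosing: a projectile in a uniform gravitational field with potential V = m g z, arbitrary launch velocities, and a test deformation of the parabola by a small sine bump that keeps the endpoints and the energy fixed. It evaluates the action (2.14) on the parabola and on the two deformed paths.

```python
import math

# Assumed illustrative values: mass, gravity, launch velocity components.
M, G = 1.0, 9.81
VX, VZ = 10.0, 3.0
E = 0.5 * M * (VX**2 + VZ**2)   # conserved energy, with V = M * G * z and z = 0 at launch
X_END = 2.0 * VX * VZ / G       # horizontal range back to z = 0

def parabola(x):
    """Newtonian trajectory z(x), obtained by eliminating the time t = x / VX."""
    t = x / VX
    return VZ * t - 0.5 * G * t * t

def maupertuis_action(z, steps=4000):
    """Sum of sqrt(2m(E - V)) * dl over short straight segments of the path z(x)."""
    dx = X_END / steps
    total = 0.0
    for k in range(steps):
        z0, z1 = z(k * dx), z((k + 1) * dx)
        momentum = math.sqrt(2.0 * M * (E - M * G * 0.5 * (z0 + z1)))
        total += momentum * math.hypot(dx, z1 - z0)
    return total

def bumped(eps):
    """Parabola deformed by a sine bump that vanishes at both endpoints."""
    return lambda x: parabola(x) + eps * math.sin(math.pi * x / X_END)

a_true = maupertuis_action(parabola)
a_up = maupertuis_action(bumped(0.1))
a_down = maupertuis_action(bumped(-0.1))
print(a_true, a_up, a_down)
```

For this low, short arc the action on the parabola is smaller than on either deformed path, as the principle states.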
Consider a massive string of constant linear mass density $\mu$ and length L whose
endpoints are fixed at A $(x = 0, z = z_0)$ and B $(x = a, z = z_1)$. The string
lies in the vertical plane (x, z), and it is in the gravitational field, oriented
along the vertical z axis. We want to determine the shape of the string at
equilibrium. (Of course, we assume that $(z_1 - z_0)^2 + a^2 \le L^2$.)
Equilibrium corresponds to the configuration where the gravitational potential
energy of the string is minimal. Consider an arbitrary shape of the
string z(x). An element of the string in the interval [x, x + dx] has a length dl given by
$dl^2 = dx^2 + dz^2 = (1 + \dot z(x)^2)\, dx^2$, and its potential energy is $dV = \mu g z\, dl$ (g
is the acceleration of gravity). We must therefore minimize the integral
$$V = \mu g \int_0^a z\, \sqrt{1 + \dot z(x)^2}\; dx. \tag{2.18}$$
The corresponding Lagrange-Euler equation is
$$1 + \dot z^2 = z\, \ddot z, \tag{2.19}$$
whose solution is
$$z = c \cosh((x - x_0)/c),$$
where the parameters c and $x_0$ are determined by the constraints $z(0) = z_0$,
$z(a) = z_1$, and the length of the string $L = \int_0^a \sqrt{1 + \dot z(x)^2}\; dx$.
The minimum is located in the interval $x \in [0, a]$ according to the relative
positions of the endpoints.
In Problem 2.2 one can see that by using the technique of Lagrange mul-
tipliers, which we will define in Section 2.3.3, the problem can be cast as a
translation-invariant problem along the z axis since the length L is an intrinsic
quantity of the string.
The variational principle here consists in imposing that the energy losses
by Joule heating are as small as possible. In other words, we want to find the
minimum value of
impose that the potential difference V between the two nodes is given. Notice
that we do not need the notion of electric potential. We have replaced the
local notion of potential difference by a global energetic condition and a very
simple principle.
Considering an arbitrary circuit, the principle is that the global heating
loss $\sum_k R_k I_k^2$ is minimal. Of course, one recovers the Kirchhoff laws. For a
relatively simple network, the two approaches are equivalent. In practice, they
may be very different if we consider a large network of electricity transportation,
with, for instance, 10 million elements. Inverting a $10^7 \times 10^7$ matrix in
real time is not realistic, whereas mathematical optimization procedures are
extremely efficient and easy to handle.
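For the smallest possible network, the statement can be verified directly. The sketch below assumes a hypothetical circuit of two resistors in parallel carrying a given total current (the resistances and the current are arbitrary illustrative values). It minimizes the Joule loss over the way the current splits and then compares the two voltage drops, which Kirchhoff's laws require to be equal.

```python
def joule_loss(i1, total_current, r1, r2):
    """Heating R1*I1^2 + R2*I2^2 when the total current splits into I1 and I2."""
    i2 = total_current - i1
    return r1 * i1**2 + r2 * i2**2

def optimal_split(total_current, r1, r2, iterations=200):
    """Ternary search for the I1 that minimizes the (convex) Joule loss."""
    lo, hi = 0.0, total_current
    for _ in range(iterations):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if joule_loss(a, total_current, r1, r2) < joule_loss(b, total_current, r1, r2):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)

# Assumed illustrative values (ohms and amperes).
R1, R2, I_TOTAL = 2.0, 6.0, 4.0
i1 = optimal_split(I_TOTAL, R1, R2)
i2 = I_TOTAL - i1
print(i1, i2, R1 * i1, R2 * i2)
```

The minimizing split is the usual current divider, I1 = I R2/(R1 + R2), and the two printed voltage drops coincide. For a large network the same idea applies with a general-purpose optimizer in place of the one-dimensional search.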
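$$\Delta \Phi = -\frac{\rho}{\varepsilon_0}. \tag{2.20}$$
$$\mathcal{E}[\Phi] = \int \left[ \frac{\varepsilon_0}{2}\, (\nabla \Phi)^2 - \rho\, \Phi \right] d^3 r. \tag{2.21}$$
The problem under consideration is to find the potential $\Phi(\mathbf{r})$ that minimizes
this expression.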
We remark on the following points:
1. As usual, we assume there are no charges at infinity, so that $\Phi$ can be
chosen to vanish at infinity. The integrals run over all three-dimensional
space.
2. Since the first term is positive, if there exists a minimum of this expression
for a function $\Phi(\mathbf{r})$, this minimum corresponds to an equilibrium situation.
In this respect, it is similar to the case of the massive string in Section
2.2.2. There is an equilibrium between two contributions to the total energy
that compete with one another. Any excess of one form of energy
corresponds to an unstable situation.
3. In comparison with the mirage (2.5) or the massive string (2.18), it is the
potential $\Phi$ and its gradient $\nabla \Phi$ that play the role of the previous single
variable z and its derivative $\dot z$. The variable x of the previous simple
examples is now a point $\mathbf{r}$ of three-dimensional space (i.e., $\mathbf{r} \in \mathbb{R}^3$).
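Let $\Phi$ be the solution and $\eta(\mathbf{r})$ an infinitesimal variation of this potential.
In the variation $\Phi \to \Phi + \eta$, we have, to first order,
$$\delta \mathcal{E} = \int \left[ \varepsilon_0\, \nabla \Phi \cdot \nabla \eta - \rho\, \eta \right] d^3 r. \tag{2.22}$$
Integrating the first term by parts, and taking into account the fact that $\Phi$
vanishes at infinity, we obtain
$$\delta \mathcal{E} = -\int \left[ \varepsilon_0\, \Delta \Phi + \rho \right] \eta\; d^3 r = 0.$$
Therefore,
$$\Delta \Phi = -\frac{\rho}{\varepsilon_0}. \tag{2.23}$$
A particular case is when the charge density vanishes. By that, we mean
that there are a certain number of charged conductors each of which is at
a given potential $V_1, V_2, \ldots, V_n$. There is a surface charge density, but the
volume density $\rho$ vanishes everywhere. Let $\Sigma_1, \Sigma_2, \ldots, \Sigma_n$ be the surfaces of
the conductors. Then equation (2.23) boils down to
$$\Delta \Phi = 0,$$
with $\Phi = V_i$ on the surface $\Sigma_i$ of each conductor.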
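The link between minimizing an energy and solving Poisson's equation can be seen on a small discretized example. The sketch below is an illustration under several assumptions that are not in the text: the problem is reduced to one dimension on the interval [0, 1], the potential is held at zero at both ends instead of at infinity, the charge density is uniform, units are chosen so that epsilon_0 = 1, and the energy is taken to be the integral of (epsilon_0/2)(dPhi/dx)^2 - rho Phi.

```python
EPS0 = 1.0              # assumed units in which epsilon_0 = 1
N = 20                  # number of grid intervals on [0, 1]
H = 1.0 / N
RHO = [1.0] * (N + 1)   # assumed uniform charge density

def energy(phi):
    """Discretized (eps0/2) * integral of phi'^2 minus integral of rho * phi."""
    grad = sum((phi[k + 1] - phi[k]) ** 2 for k in range(N)) / H
    source = sum(RHO[k] * phi[k] for k in range(1, N)) * H
    return 0.5 * EPS0 * grad - source

def relax(sweeps=3000):
    """Minimize the energy one node at a time, with phi = 0 held at both ends."""
    phi = [0.0] * (N + 1)
    for _ in range(sweeps):
        for k in range(1, N):
            # Setting d(energy)/d(phi[k]) = 0 gives the discrete Poisson equation
            # (phi[k-1] - 2 phi[k] + phi[k+1]) / H^2 = -rho[k] / eps0.
            phi[k] = 0.5 * (phi[k - 1] + phi[k + 1] + H * H * RHO[k] / EPS0)
    return phi

phi = relax()
# A trial potential that differs from the minimizer at a single node.
trial = list(phi)
trial[5] += 0.01
print(phi[N // 2], energy(phi), energy(trial))
```

The relaxed potential reproduces the parabola rho x (1 - x) / (2 epsilon_0), which solves the one-dimensional Poisson equation with these boundary values, and any trial potential with the same boundary values has a higher energy.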
Consider the interval {z , z+ dz} and let r(z) be the radius of a transverse
section of the surface. We want to minimize the energy
This surface, which is rotation invariant around the z axis, bears the sweet
name of a catenoid.
One can attempt to determine shapes of bubbles attached to more com-
plicated structures. (Needless to emphasize again, the present problem has an
analytic solution.)
Let us turn to a case that is similar in its motivation but that has fascinating
consequences compared with the simplicity of the starting point.
As Schrödinger wrote [6], there is, basically, only one problem in statistical
thermodynamics: the distribution of a given amount of energy E over N
identical systems.
We only consider here classical statistical thermodynamics. Quantum
statistics is outside the scope of this book. The only "quantum" feature lies
in the fact that we assume there are discrete energy levels.
We consider an isolated assembly of N identical systems $\{S_1, S_2, \ldots, S_N\}$,
each of which can occupy one of the energy levels $\varepsilon_k$ (for instance, the energy
levels in a box where we place the atoms of a monatomic gas).
We assume that the pairwise interactions of these systems are weak in the
sense that they do not affect their energy levels. The energy of the assembly
is therefore the sum of the energies of the N systems.
Let us call the state or configuration of the assembly the fact that
The $e_i$ belong to the set $\{\varepsilon_k\}$ and, of course, the sum is equal to the (given)
total energy E.
We call distribution of the N systems the fact that
(2.24)
more probable than any other distribution. In other words, if one were to
inspect the state of the assembly, one would most of the time find a state in
the vicinity of the most probable distribution.
This distribution (more correctly, this vicinity) corresponds to the ther-
modynamic equilibrium of the assembly.
We therefore want to determine the distribution that maximizes W. Ac-
tually, W is a very large number. It is convenient to maximize its logarithm
rather than W itself.
Since the numbers $\{n_i\}$ are very large, we can use Stirling's formula $N! \sim
N^N e^{-N} (2\pi N)^{1/2}$ (where the last factor doesn't play any significant role),
which leads to
$$\ln W \simeq -N \sum_i p_i \ln p_i, \tag{2.25}$$
where we have introduced the probabilities $p_i = n_i/N$.
We want to find the distribution $\{n_i\}$ that maximizes this expression under
the constraints
$$\sum_i n_i = N, \qquad \sum_i n_i\, \varepsilon_i = E. \tag{2.26}$$
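$$\frac{d}{dx} f(x, y_0(x)) = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\, \frac{d}{dx} \big( y_0(x) \big) = 0. \tag{2.27}$$
$$\frac{\partial f}{\partial x} + \lambda \frac{\partial g}{\partial x} = 0 \;\; (1), \qquad \frac{\partial f}{\partial y} + \lambda \frac{\partial g}{\partial y} = 0 \;\; (2), \qquad g = 0 \;\; (3). \tag{2.29}$$
$$\frac{\partial f}{\partial x} + \lambda \frac{\partial y_0}{\partial x} = 0 \;\; (1), \qquad \frac{\partial f}{\partial y} - \lambda = 0 \;\; (2). \tag{2.30}$$
Eliminating $\lambda$ between (1) and (2) obviously amounts to solving the initial
equation (2.27).
This method applies in the case of a function $f(\{x_i\})$ of any number of
variables $x_i$, i = 1, ..., n related by any number p of constraints $g_k(\{x_i\}) = 0$,
k = 1, ..., p (with p < n).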
It is simpler to work not with the occupation numbers $n_i$ but with the probabilities
$p_i = n_i/N$, which can be considered continuous quantities since N is
very large.
In terms of these probabilities, the two constraints are
$$\sum_i p_i = 1, \qquad \sum_i p_i\, \varepsilon_i = \frac{E}{N}.$$
We must therefore introduce two Lagrange multipliers, $\alpha$ and $\beta$, and the
probability law $\{p_i\}$, which maximizes this expression under the constraints
above, is the function for which the variation of the quantity
(2.32)
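$$\sum_i e^{-\alpha - \beta \varepsilon_i} = 1, \qquad e^{-\alpha} = \frac{1}{\sum_i e^{-\beta \varepsilon_i}}, \tag{2.33}$$
$$E = N\, \frac{\sum_i \varepsilon_i\, e^{-\beta \varepsilon_i}}{\sum_i e^{-\beta \varepsilon_i}}. \tag{2.34}$$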
We will see that this number $\beta$ defines the temperature T of the assembly
by
$$\beta = 1/kT, \tag{2.35}$$
where k is Boltzmann's constant. It is, in particular, a quantity that equalizes
when two assemblies are put in thermal contact (which is the first property
of temperature).
Therefore, the probability $p_i$ of finding a system in the energy level $\varepsilon_i$ at
equilibrium is given by Boltzmann's factor
$$p_i = \frac{e^{-\varepsilon_i/kT}}{Z}, \tag{2.36}$$
where $Z = \sum_i e^{-\varepsilon_i/kT}$
is called the partition function of the system (from the German Zustandssumme,
sum over states). This function plays an important role in statistical
physics; $-k \ln Z$ is the free energy divided by T. The form (2.36) is called the
Boltzmann-Gibbs distribution.
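A small numerical experiment illustrates why this distribution is singled out. The sketch below is an illustration under assumed values: four equally spaced energy levels in arbitrary units and an energy per system E/N = 1. It finds the multiplier beta that reproduces this mean energy, builds the corresponding probabilities, and compares their entropy, taken here as minus the sum of p ln p, with that of a rival distribution that satisfies the same two constraints.

```python
import math

LEVELS = [0.0, 1.0, 2.0, 3.0]   # assumed energy levels, arbitrary units

def boltzmann(beta):
    """Probabilities p_i = exp(-beta * eps_i) / Z."""
    weights = [math.exp(-beta * e) for e in LEVELS]
    z = sum(weights)
    return [w / z for w in weights]

def mean_energy(beta):
    return sum(p * e for p, e in zip(boltzmann(beta), LEVELS))

def solve_beta(target, lo=0.0, hi=50.0):
    """Bisection: the mean energy decreases monotonically as beta grows."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_energy(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0.0)

beta = solve_beta(1.0)   # assumed energy per system E/N = 1.0
p = boltzmann(beta)
# Rival distribution: the shift (+0.01, -0.02, +0.01, 0) changes neither the
# normalization nor the mean energy for these equally spaced levels.
rival = [p[0] + 0.01, p[1] - 0.02, p[2] + 0.01, p[3]]
print(beta, p, entropy(p), entropy(rival))
```

The rival distribution obeys both constraints but has a lower entropy, as expected if the Boltzmann form is the constrained maximum.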
Consider two assemblies $\mathcal{E}$ and $\mathcal{E}'$, which may be of different natures, formed
respectively of N and N' systems S and S'. The energy levels of S are $\varepsilon_i$
and those of S' are $\varepsilon'_j$. These two assemblies are in "thermal contact," which
means that they can exchange energy but that their interaction is sufficiently
weak that it does not change the individual energy levels $\varepsilon_i$ and $\varepsilon'_j$ of the two
systems considered separately.
Furthermore, these two systems are isolated. We denote by E the total
energy.
1. The number W of states in a distribution $(\{n_i\}, \{n'_j\})$ of the systems S
and S' is
$$W = \frac{N!\, N'!}{\prod_i (n_i!)\, \prod_j (n'_j!)}.$$
2. There are now three constraints on the distributions $(\{n_i\}, \{n'_j\})$:
$$\sum_i n_i = N, \qquad \sum_j n'_j = N', \qquad \sum_i n_i\, \varepsilon_i + \sum_j n'_j\, \varepsilon'_j = E,$$
or equivalently, with $p_i = n_i/N$ and $p'_j = n'_j/N'$,
$$\sum_i p_i = 1, \qquad \sum_j p'_j = 1, \qquad N \sum_i p_i\, \varepsilon_i + N' \sum_j p'_j\, \varepsilon'_j = E. \tag{2.38}$$
3. This expression must vanish for all infinitesimal $\delta p_i$ and $\delta p'_j$. Therefore,
one obtains
$$p_i = e^{-\alpha - \beta \varepsilon_i}, \qquad p'_j = e^{-\alpha' - \beta \varepsilon'_j}. \tag{2.39}$$
We notice that it is the same Lagrange multiplier $\beta$ that appears in both
expressions. This is due to the fact that it is the total energy that is a
given quantity. The constants $\alpha$, $\alpha'$, and $\beta$ are fixed by the constraints as
above. Therefore, the two temperatures are equal if $\beta = 1/kT$.
i.e.,
(2.40)
$$E = N\, \frac{\sum_i \varepsilon_i\, e^{-\varepsilon_i/kT}}{\sum_i e^{-\varepsilon_i/kT}} \tag{2.42}$$
(all appropriate care is assumed in the counting of degenerate states and in
taking the continuum limit).
Finally, this method allows us to define a thermostat by considering the
limit where one of the assemblies is much larger than the other. Establishing
thermal contact with the second, small assembly does not change the tem-
perature of the first one. We therefore recover the usual treatment of ther-
modynamics of assemblies in thermal contact with a thermostat at a given
temperature.
The notion of heat, which is very intuitive and has been known since very
ancient times, was viewed for a long time as emanating from some fluid that
could flow from one body to another. The first principle of thermodynamics
tells us that it is a particular form of energy. Statistical thermodynamics allows
us to understand this in a very natural manner.
Indeed, consider an assembly at equilibrium whose total energy is $E =
N \sum_i p_i\, \varepsilon_i$ and whose temperature is T. In any infinitesimal evolution of this
assembly through a contact with the outside, two things can happen. One
is the variation of the energy levels $\varepsilon_i$ if the total volume changes, or if an
electric field is applied, etc. Another is the reorganization of the populations
of the various energy levels $n_i = N p_i$. The corresponding variation dE of the
total energy of the assembly is
$$dE = \sum_i n_i\, d\varepsilon_i + \sum_i \varepsilon_i\, dn_i. \tag{2.44}$$
The first term is obvious. It corresponds simply to the work of the external
forces
(2.45)
(we avoid the traditional dW in order to avoid confusion). Under the external
action, the energy levels vary, resulting in a variation (2.45) of the total energy
of the system.
The second term is less obvious. It comes from the fact that, even if the
energy levels $\varepsilon_i$ do not change (in the absence of external work), the total
energy can be modified by a rearrangement of the populations $n_i$ of the levels.
This variation of the (internal) energy without any intervention of external
forces is what we call "heat." We obtain the statistical definition of heat as
(2.46)
In order to relate Boltzmann's entropy equation (2.43) and the usual for-
mula of thermodynamics, consider a variation dS = kd(ln W). We obtain
(2.47)
(of course, $\sum_i dn_i = 0$). Suppose the evolution is sufficiently slow that at any
time thermodynamical equilibrium is achieved. (The temperature may evolve
during the process.) This is called a reversible transformation in macroscopic
thermodynamics. If this is the case, the $n_i$ are proportional to $\exp(-\varepsilon_i/kT)$,
which yields
(2.48)
$$dS_{\text{rev}} = \left( \frac{dQ}{T} \right)_{\text{rev}}. \tag{2.49}$$
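The relation between the entropy change and the heat received in a reversible transformation can be checked on a toy assembly. The sketch below rests on assumptions that are not in the text: four fixed energy levels (so that no external work is done), units in which Boltzmann's constant is 1, an assembly of 1000 systems, and the Stirling form of the entropy, S = -k sum of n_i ln(n_i/N). It raises the temperature by a small amount, keeping Boltzmann populations throughout, and compares the entropy change with the heat divided by T, the heat being the sum of eps_i dn_i.

```python
import math

K_B = 1.0                       # assumed units in which Boltzmann's constant is 1
LEVELS = [0.0, 1.0, 2.0, 3.0]   # assumed fixed energy levels (no external work)
N_SYSTEMS = 1000.0

def populations(temperature):
    """Equilibrium populations n_i, proportional to exp(-eps_i / kT)."""
    weights = [math.exp(-e / (K_B * temperature)) for e in LEVELS]
    z = sum(weights)
    return [N_SYSTEMS * w / z for w in weights]

def entropy(n):
    """S = k ln W in the Stirling approximation: -k * sum of n_i ln(n_i / N)."""
    return -K_B * sum(x * math.log(x / N_SYSTEMS) for x in n)

T, D_T = 2.0, 1e-4
n_before, n_after = populations(T), populations(T + D_T)
# Heat: change of the total energy due to the rearrangement of the populations.
heat = sum(e * (b - a) for e, a, b in zip(LEVELS, n_before, n_after))
d_s = entropy(n_after) - entropy(n_before)
print(d_s, heat / T)
```

The two printed numbers agree up to terms of second order in the temperature step.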
2.4 Problems
2.3. Brachistochrone
A popular problem for mathematicians is the brachistochrone curve. Consider
two points A and B in a vertical plane, joined by a curve C. In A, a massive
particle is dropped with zero initial velocity, and it slides without friction
along the curve under the effect of gravity. We want to determine the curve C
such that the time for the particle to go from A to B is minimum. We note z
the altitude and x the abscissa of a point on the curve. The endpoints A and
B correspond respectively to $(x = a, z = \alpha)$ and $(x = b, z = \beta)$.
We neglect friction of air and the track, as well as the efforts of the skier to
maintain his trajectory. Therefore, the total energy of the skier is a constant
of the motion.
1. Check that with this definition of the variable x, the potential energy of
the skier at point (x, y) is $V = -m g x \sin\alpha$.
2. Write the expression of the skier's total energy at a given time. We denote
$\dot x \equiv dx/dt$, $\dot y \equiv dy/dt$. What is the relation between the potential energy
and the kinetic energy owing to energy conservation?
3. Use the previous expression to express the square of the time interval dt
between two positions, (x, y) and (x + dx, y + dy), of the skier, in terms
of $dx^2$, $dy^2$, x, y, g, and $\alpha$.
4. Calculate the time it takes to go from O to A if the skier follows a trajectory
defined by a function y(x) (note $y' \equiv dy/dx$).
5. What is the equation of the optimal trajectory?
6. Show that along the optimal trajectory the quantity $C = y'/\sqrt{x\, (1 + (y')^2)}$
is a constant. Deduce from this that along the trajectory the quantity
$f(t) = \dot y/x$ is a constant K, and express its value in terms of C, g, and $\alpha$.
7. Check that the parametric form $x(\theta) = (1 - \cos 2\theta)/(2C^2)$ and $y(\theta) =
(2\theta - \sin 2\theta)/(2C^2)$ is a solution. Use the result of the previous question
to calculate the function $\theta(t)$.
8. What kind of curve is it? Draw the trajectory qualitatively in the case
y'(A) 1.
9. Explain the result physically. (It is not necessary to do all previous calcu-
lations in order to answer this question.)
2.5. Strategy of a Regatta
A sailboat has velocity v(θ), which is a function of the angle θ between the
direction of the wind and the direction of the boat and also of the norm w
of the velocity of the wind. We assume that the velocity of the boat v is
proportional to the velocity of the wind w and that it depends on the angle
θ chosen by the skipper. For convenience, in what follows, we shall write this
velocity in the form
\[ v(\theta) = \frac{w}{\cos\theta \; h(\tan\theta)}. \tag{2.50} \]
We are interested in the strategy where the sailboat tacks to the wind (i.e.,
θ ≤ π/2), as shown in Figure 2.11. We assume that the x component $v_x$ of
the velocity of the boat is opposite to that of the wind and that the position
of the sailboat along the x axis always increases with time. We assume the
coast is linear (land = half-plane z < 0, sea = half-plane z > 0).
We assume the wind is parallel to the coast, of direction opposite to the
x axis, and that the norm of its velocity w(z) depends only on the distance z
to the coast.
[Figure: x axis along the shore, z axis perpendicular to it; the arrival point
is at x = L, z = z_1.]
Fig. 2.11. Diagram of the direction of the sailboat compared with that of the wind.
Here, we assume that the velocity of the wind has the form
\[ w(z) = w_0 - w_1 \frac{z_0}{z + z_0}, \tag{2.51} \]
where $w_0$ is the velocity far from the coast, which is larger than the velocity
$(w_0 - w_1) \geq 0$ on the coast z = 0.
1. We denote
\[ \dot{x} = \frac{dx}{dt}, \qquad \dot{z} = \frac{dz}{dt}, \qquad z' = \frac{dz}{dx}. \]
Show that $z' = \tan\theta$.
2. We first assume the wind is uniform (w = constant, $w_1 = 0$). Write the
expression of the velocity of the boat along the axis of the wind $v_x = \dot{x}$
in terms of w and h(tan θ). For what values of θ and z' is this velocity
maximum? What is its value?
3. We now assume $w_1 \neq 0$. The boat sails from the origin (x = 0, z = 0)
to a given point (x = L, z = $z_1$). We assume that z' ≥ 0 for all t (i.e.,
the boat never changes tack). We want to determine the fastest trajectory
z(x). Write the expression of the time dt to go, on this trajectory, from x
to x + dx in terms of the functions w and h. Give the value of the total
time T between the starting point and the arrival.
4. Deduce from (3) the equation that determines the optimal trajectory
(which minimizes T).
5. Show that the translation invariance of the problem along the x direction
yields
\[ \frac{h'(z')\,z' - h(z')}{w(z)} = A, \]
where A is a constant.
6. Use the previous result to calculate the trajectory in the form of a function
x(z) (and not a function z(x)). Fix the value of the constant A.
7. Calculate the value of z' = dz/dx as a function of z. We assume that
Zl Land Zl zoo Do you think the result corresponds to the best
strategy? If not, what modifications must the skipper make?
3 The Analytical Mechanics of Lagrange
2 There are many books on analytic mechanics, or dynamics. One can refer to the
classics of Landau and Lifshitz, [8] and [1], and the book Classical Mechanics [9]
by Herbert Goldstein, which is clear and complete.
3.1 Lagrangian Formalism and the Least Action Principle
[Fig. 3.1: a possible trajectory]
In order to make things simple, let us consider first the case of only one space
dimension. Among the infinite class of possible trajectories (see Figure 3.1),
what is the law that determines the physical one? Lagrange knew that the
answer to this question lies in the "principle of natural economy" of Fermat,
further developed by Maupertuis, as we said in Chapter 2.
The variational principle we present here is not the original one used by
Lagrange; it was formulated by Hamilton in 1834 and is simpler in this dis-
cussion. In order not to complicate things, we reverse chronology.
One assumes the following:
1. Any mechanical system is characterized by a Lagrange function, or
Lagrangian $\mathcal{L}(x, \dot{x}, t)$, which depends on the position x, on its
time derivative $\dot{x} = dx/dt$, and possibly on time. The quantities x and
$\dot{x}$ are called the state variables of the particle. For a particle in a
potential V(x, t), we have for instance
\[ \mathcal{L} = \frac{1}{2} m \dot{x}^2 - V(x, t). \tag{3.1} \]
2. For any trajectory x(t), one can define the action S by the integral
\[ S = \int_{t_1}^{t_2} \mathcal{L}(x, \dot{x}, t)\, dt. \tag{3.2} \]
The Least Action Principle states that the physical trajectory X(t) followed
by the particle is such that S is minimum, or, more generally, has an
extremum.
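The statement can be checked numerically on a simple case. The sketch below assumes a particle in a uniform gravitational field, V = m g x (an illustrative choice, with hypothetical numerical values), discretizes the action (3.2), and compares the action of the physical trajectory with that of neighboring trajectories sharing the same endpoints.

```python
import math

def discrete_action(xs, dt, mass, g):
    """Discretized action: sum of (kinetic - potential) * dt, with V = m g x."""
    total = 0.0
    for x0, x1 in zip(xs[:-1], xs[1:]):
        velocity = (x1 - x0) / dt
        x_mid = 0.5 * (x0 + x1)
        total += (0.5 * mass * velocity ** 2 - mass * g * x_mid) * dt
    return total

mass, g, T, n = 1.0, 9.81, 1.0, 200   # illustrative values (assumptions)
dt = T / n
times = [i * dt for i in range(n + 1)]

# Physical trajectory with x(0) = x(T) = 0: it solves m x'' = -m g.
physical = [0.5 * g * t * (T - t) for t in times]

def perturbed(eps):
    """Neighboring trajectory; the variation vanishes at both endpoints."""
    return [x + eps * math.sin(math.pi * t / T) for x, t in zip(physical, times)]

S_phys = discrete_action(physical, dt, mass, g)
S_plus = discrete_action(perturbed(+0.05), dt, mass, g)
S_minus = discrete_action(perturbed(-0.05), dt, mass, g)
print(S_phys, S_plus, S_minus)
```

Both perturbed trajectories should have a larger action than the physical one, and by the same amount, since the term of first order in the variation vanishes.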
We call X(t) the physical trajectory, and we proceed as in Section 2.1.2, except
that the variable is now the time t. Consider a trajectory x(t) infinitely close
to X(t), which also starts from $x_1$ at $t_1$ and reaches $x_2$ at $t_2$,
\[ x(t) = X(t) + \delta x(t), \quad \dot{x}(t) = \dot{X}(t) + \delta\dot{x}(t), \quad \delta\dot{x}(t) = \frac{d}{dt}\,\delta x(t), \tag{3.3} \]
where by assumption
\[ \delta x(t_1) = \delta x(t_2) = 0. \tag{3.4} \]
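To first order in δx, the variation of S is
\[ \delta S = \int_{t_1}^{t_2} \left( \frac{\partial\mathcal{L}}{\partial x}\,\delta x(t) + \frac{\partial\mathcal{L}}{\partial\dot{x}}\,\delta\dot{x}(t) \right) dt. \tag{3.5} \]
We integrate the second term by parts and take into account (3.4), so that
the integrated term vanishes. This leads to
\[ \delta S = \int_{t_1}^{t_2} \left( \frac{\partial\mathcal{L}}{\partial x} - \frac{d}{dt}\left(\frac{\partial\mathcal{L}}{\partial\dot{x}}\right) \right) \delta x(t)\, dt. \tag{3.6} \]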
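The least action principle states that δS must vanish whatever the infinitesimal
variation δx(t). Therefore, the equation of motion (i.e., the equation that
determines the physical trajectory) is the Lagrange-Euler equation
\[ \frac{\partial\mathcal{L}}{\partial x} = \frac{d}{dt}\left(\frac{\partial\mathcal{L}}{\partial\dot{x}}\right). \tag{3.7} \]
For the Lagrangian (3.1), this gives
\[ m\ddot{x} = -\frac{\partial V}{\partial x} = f, \]
where f is the force.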
Generalization
Remarks
\[ \mathcal{L}' = \mathcal{L} + \frac{d}{dt} f(\{x_i\}, t), \]
the equations of motion are unchanged.
2. Form of the Lagrangian
It is mainly invariance considerations that dictate the form of the
Lagrangian, in particular translation or rotation invariance. We shall come
back to this point. The kinetic term $mv^2/2$ comes from the principle of
inertia, or equivalently from invariance under Galilean transformations.
Consider the simple case of a free particle in space.
a) Time has no privileged origin, and therefore $\partial\mathcal{L}/\partial t = 0$.
b) Space has no privileged origin, and therefore $\partial\mathcal{L}/\partial x_i = 0$.
\[ \mathcal{L}' = \mathcal{L} + 2\,\mathbf{v}\cdot\boldsymbol{\epsilon}\,\frac{\partial\mathcal{L}}{\partial v^2}, \]
and the equations of motion are the same in both reference frames.
f) If the particle is in a field of force, the potential energy term in (3.1)
is merely a definition of the force. We wish to recover Newton's law,
and this choice guarantees it for forces that derive from potentials.
3. Generalization
The Lagrangian of a set of N particles in a potential
$V(\mathbf{r}_1, \ldots, \mathbf{r}_N; t)$ (which includes the mutual
interactions of the particles) is
\[ \mathcal{L} = \frac{1}{2}\sum_{i=1}^{N} m_i\,\dot{\mathbf{r}}_i^{\,2} - V(\mathbf{r}_1, \ldots, \mathbf{r}_N; t). \tag{3.11} \]
4. Change of System of Coordinates
The Lagrange-Euler equations keep the same form in all systems of
coordinates (for instance (x, y, z) → (r, θ, φ)). This feature is particularly
useful in order to perform changes of variables. One calls a system of
coordinates $\{q_i\}$ generalized coordinates.
minimum for $\dot{x}$ = constant, the motion is linear and uniform. In the absence
of inertia, on the contrary, the particle would go to the maximum of the
potential at the initial point and come back at the final point. The presence
of the potential can be considered as a property of space that curves the
trajectory. Inertia and force can be viewed as conflicting effects. The particle
follows a path of minimal "length," this length being measured by the action
S.
We see here how the mechanical problem can be transformed into a geometric
problem. As we shall see later on, the motion of a particle in a flat
Euclidean space can be transformed into the free motion in a curved space,
where it moves along geodesics. We will come back extensively to this point
in Chapter 6. Einstein had this idea in mind in 1908 when he was constructing
general relativity. It took him seven years to elaborate the mathematical
details of the final theory.
(3.13)
or
In this change of variables, the Lagrange-Euler equations keep the same form.
We can define the conjugate momentum $p_i$ of the generalized variable $q_i$ by
the relation
\[ p_i = \frac{\partial\mathcal{L}}{\partial\dot{q}_i}. \tag{3.14} \]
This quantity satisfies the same equation as (3.13); i.e.,
$\dot{p}_i = \partial\mathcal{L}/\partial q_i$.
A cyclic variable is a variable $q_i$ that does not appear explicitly in the
Lagrangian $\mathcal{L}$. This means that
\[ \frac{\partial\mathcal{L}}{\partial q_i} = 0. \]
In that case, the conjugate momentum $p_i = \partial\mathcal{L}/\partial\dot{q}_i$
is conserved; we have $p_i$ = constant.
It is useful to find cyclic variables owing to the resulting conservation laws.
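A standard illustration, sketched here under assumptions not taken from the text: for a particle in a central potential V(r), the polar angle φ does not appear in the Lagrangian, so its conjugate momentum $p_\varphi = m r^2 \dot{\varphi}$ is conserved. The code below integrates a bound orbit in the potential V = -k/r in Cartesian coordinates (the velocity-Verlet scheme and all numerical values are illustrative choices) and compares $p_\varphi$ at the beginning and at the end.

```python
import math

def acceleration(x, y, k=1.0, mass=1.0):
    """Central force from V(r) = -k / r; V depends on r only, not on the polar angle."""
    r3 = (x * x + y * y) ** 1.5
    return -k * x / (mass * r3), -k * y / (mass * r3)

def p_phi(x, y, vx, vy, mass=1.0):
    """Conjugate momentum of the polar angle: m r^2 dphi/dt = m (x vy - y vx)."""
    return mass * (x * vy - y * vx)

x, y, vx, vy = 1.0, 0.0, 0.0, 0.8   # bound orbit (hypothetical initial data)
dt, steps = 1e-3, 20000

p_phi_start = p_phi(x, y, vx, vy)
ax, ay = acceleration(x, y)
for _ in range(steps):              # velocity-Verlet integration
    vx += 0.5 * dt * ax
    vy += 0.5 * dt * ay
    x += dt * vx
    y += dt * vy
    ax, ay = acceleration(x, y)
    vx += 0.5 * dt * ax
    vy += 0.5 * dt * ay
p_phi_end = p_phi(x, y, vx, vy)
print(p_phi_start, p_phi_end)
```

The two printed values should agree up to rounding errors, whatever the details of the orbit.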
Assume the system is isolated (i.e., $\partial\mathcal{L}/\partial t = 0$). Another
way to describe this assumption is to say that the system is invariant under
translations in time or that time is homogeneous.
We evaluate the evolution of $\mathcal{L}(x, \dot{x})$ along the physical
trajectory x(t),
where we have transformed the first term by taking into account the Lagrange
equation (3.7). We deduce that
(3.16)
\[ E = \dot{x}(t)\,\frac{\partial\mathcal{L}}{\partial\dot{x}} - \mathcal{L} \tag{3.17} \]
3.2 Invariances and Conservation Laws 55
(3.18)
Examples
Consider the massive string of Chapter 2 and equation (2.18). The Lagrangian
is (up to factors) $\mathcal{L} \propto z(x)\sqrt{1 + \dot{z}(x)^2}$ (here the
variable is x, and the dot denotes the derivative with respect to x). This
Lagrangian does not depend on the variable x. Therefore, the quantity
$p\dot{z} - \mathcal{L}$, where p is the conjugate momentum of z, is constant
along the curve (it is a "constant of the motion" in the language of the
present chapter). One obtains with no difficulty
$p = z\dot{z}/\sqrt{1 + \dot{z}(x)^2}$ and
$p\dot{z} - \mathcal{L} = -z/\sqrt{1 + \dot{z}(x)^2} = -c$, where c is
a constant.
We deduce from this that
J(z)z
p= -,====~=;:: (3.21 )
Jl + z(X)2
Since the Lagrangian does not depend explicitly on variable x, the quantity
J(z)
A = pz -.c = (3.22)
Jl + z(x)2'
i.e., (3.23)
i.e., (3.24)
Suppose the problem is invariant under translations in space. This is the case
for a free particle, and it is also the case for a system of particles whose
interactions depend only on the relative coordinates: V ( { r i - r j } ).
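In this case, for any infinitesimal transformation
$\mathbf{r}_i \to \mathbf{r}_i + \boldsymbol{\epsilon}$, the Lagrangian is
invariant:
\[ \delta\mathcal{L} = \sum_i \frac{\partial\mathcal{L}}{\partial\mathbf{r}_i}\cdot\boldsymbol{\epsilon} = 0 \quad \forall\,\boldsymbol{\epsilon}; \quad \text{i.e.,} \quad \sum_i \frac{\partial\mathcal{L}}{\partial\mathbf{r}_i} = 0, \tag{3.25} \]
(3.26)
where the gradient is taken with respect to the vector variable $\mathbf{r}_i$.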
If the Lagrangian of the system is of the form (3.11), this relation is simply
the principle of action and reaction of Newton. Indeed, if we consider a system
of two particles interacting via a potential $V(\mathbf{r}_1 - \mathbf{r}_2)$, we obtain
(3.27)
(3.28)
(3.29)
\[ \sum_i \left( \dot{\mathbf{r}}_i \times \mathbf{p}_i + \mathbf{r}_i \times \dot{\mathbf{p}}_i \right) = 0. \]
In other words,
where the angular momentum Li of each particle and the total angular mo-
mentum L are defined by
(3.31)
angular momentum and therefore lies in the plane of the trajectory, fixes the
two others. Therefore, the solution of the problem does not necessitate any
quadrature. One consequence is that in the case of bound states, the trajectory
is closed, which is exceptional: Only the harmonic potential ($\propto r^2$) and
the Newtonian potential ($\propto 1/r$) lead to this property.
The invariance law of the Lagrangian that corresponds to this conservation
law is, in a sophisticated mathematical language, an O(4) symmetry. We
shall not deal with that here. This necessitates some Lie-group theoretical
considerations that are beyond the scope of this book.
One can convince oneself that the formalisms of Lagrange and Newton coin-
cide in the case of conservative forces, which derive from potentials. However,
the Lagrangian formalism does not easily accommodate dissipative forces that
depend on the velocity, such as friction. Dissipative forces belong to the me-
chanics of continuous media, and we are not much concerned with that here. 3
We can nevertheless give, as a concrete example, a Lagrangian method
that can deal with simple dissipative systems by a trick. Consider a system
that loses energy by friction, Joule heating, or any other process. The trick
consists in coupling the system appropriately with a fictitious mirror system
that formally absorbs the energy in such a way that the total energy of the two
systems remains constant. Naturally, one only attributes a physical meaning
to quantities or results that possess one.
Consider, for definiteness, a damped harmonic oscillator in one dimension,
of coordinate x, whose equation of motion is
\[ m\ddot{x} + R\dot{x} + kx = 0. \tag{3.33} \]
In order to obtain this result in the Lagrangian formalism, we introduce a
"mirror" oscillator of coordinate x* and the formal Lagrangian for the set of
the two coupled systems
\[ \mathcal{L} = m\,\dot{x}\dot{x}^* - \frac{1}{2} R\,(x^*\dot{x} - x\dot{x}^*) - k\,x x^*. \tag{3.34} \]
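As a check on (3.34), the sketch below evaluates numerically the Lagrange-Euler equation associated with the mirror variable x*, along a known solution of the damped equation (3.33). The parameter values and the underdamped solution used here are illustrative assumptions. Since the Lagrangian is linear in x* and in its velocity, the partial derivatives with respect to these variables are obtained exactly by a central difference with unit step.

```python
import math

m, R, k = 1.0, 0.4, 2.0                  # illustrative parameter values (assumptions)

def lagrangian(x, xs, v, vs):
    """Formal Lagrangian (3.34); xs and vs stand for x* and its time derivative."""
    return m * v * vs - 0.5 * R * (xs * v - x * vs) - k * x * xs

gamma = R / (2.0 * m)
Omega = math.sqrt(k / m - gamma ** 2)    # underdamped case

def x_of(t):
    """A solution of m x'' + R x' + k x = 0."""
    return math.exp(-gamma * t) * math.cos(Omega * t)

def v_of(t):
    """Its time derivative."""
    return -math.exp(-gamma * t) * (gamma * math.cos(Omega * t) + Omega * math.sin(Omega * t))

def dL_dvs(t):
    # L is linear in vs, so a central difference with unit step is exact.
    return 0.5 * (lagrangian(x_of(t), 0.0, v_of(t), 1.0)
                  - lagrangian(x_of(t), 0.0, v_of(t), -1.0))

def dL_dxs(t):
    # L is linear in xs as well.
    return 0.5 * (lagrangian(x_of(t), 1.0, v_of(t), 0.0)
                  - lagrangian(x_of(t), -1.0, v_of(t), 0.0))

def residual(t, h=1e-5):
    """d/dt (dL/dvs) - dL/dxs, which equals m x'' + R x' + k x."""
    return (dL_dvs(t + h) - dL_dvs(t - h)) / (2.0 * h) - dL_dxs(t)

max_residual = max(abs(residual(0.1 * i)) for i in range(1, 50))
print(max_residual)
```

The residual should vanish up to the error of the finite-difference time derivative, which confirms that varying x* in (3.34) reproduces the damped equation of motion for x.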
They have nothing to do with the linear momentum of the damped oscillator
(3.33).
3 For a general treatment of dissipative forces, we refer to Chapter 3, Section 2, of
Morse and Feshbach [10].
3.3 Velocity-Dependent Forces 59
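we obtain
\[ m\ddot{x} = \dot{y}\left(\frac{\partial A_y(\mathbf{r}, t)}{\partial x} - \frac{\partial A_x(\mathbf{r}, t)}{\partial y}\right) - \dot{z}\left(\frac{\partial A_x(\mathbf{r}, t)}{\partial z} - \frac{\partial A_z(\mathbf{r}, t)}{\partial x}\right) - \frac{\partial A_x(\mathbf{r}, t)}{\partial t}. \tag{3.41} \]
Therefore, if we introduce the vector field
\[ \mathbf{B}(\mathbf{r}, t) = \nabla \times \mathbf{A}(\mathbf{r}, t), \tag{3.42} \]
we obtain the vector expression
\[ m\ddot{\mathbf{r}} = \dot{\mathbf{r}} \times \mathbf{B}(\mathbf{r}, t) - \frac{\partial\mathbf{A}(\mathbf{r}, t)}{\partial t}, \tag{3.43} \]
whose form is of obvious interest.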
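\[ \nabla\cdot\mathbf{B} = 0, \qquad \nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t} \tag{3.44} \]
allow us to express the fields E and B in terms of the scalar and the vector
potentials Φ and A,
\[ \mathbf{B} = \nabla\times\mathbf{A}, \qquad \mathbf{E} = -\nabla\Phi - \frac{\partial\mathbf{A}}{\partial t}. \tag{3.45} \]
One thing may, however, seem surprising. We have expressed the Lagrangian
in terms of the potentials Φ and A. However, these are not unique. The fields
E and B are invariant under gauge transformations,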
Therefore, a gauge transformation does not affect the physics of the problem.
This is of course obvious in the equations of motion. It becomes less obvious
when one transposes the result in quantum mechanics. 4 Gauge invariance is
a dynamical symmetry that one can visualize as defining field theories. This
is the starting point of modern theories of fundamental interactions.
3.3.4 Momentum
Consider first a free particle of mass m. We know the result: The motion is
linear and uniform.
The Lagrangian is
\[ \mathcal{L} = -mc^2\sqrt{1 - \frac{v^2}{c^2}}. \tag{3.53} \]
The action is
\[ S = -mc^2 \int_{t_1}^{t_2} \sqrt{1 - \frac{v^2}{c^2}}\; dt. \tag{3.54} \]
This action is Lorentz invariant, whereas the Lagrangian (3.53) is not. This
comes from the fact that in the present approach, time, over which we
integrate, plays a special role. One can get rid of this problem, but we shall
not do it here.
We remark that in the limit of small velocities, we recover the
nonrelativistic Lagrangian up to a constant:
$\mathcal{L} \simeq -mc^2 + mv^2/2$.
The conjugate momentum is
\[ \mathbf{p} = \frac{\partial\mathcal{L}}{\partial\mathbf{v}} = \frac{m\mathbf{v}}{\sqrt{1 - v^2/c^2}}. \tag{3.55} \]
The energy is
(3.56)
(3.60)
\[ \mathcal{L} = -mc^2\sqrt{1 - \frac{v^2}{c^2}} + q\,(\mathbf{v}\cdot\mathbf{A} - \Phi). \tag{3.61} \]
1. Conjugate Momentum
Let p be the momentum in the absence of the field, as defined by (3.55):
\[ \mathbf{p} = \frac{m\mathbf{v}}{\sqrt{1 - v^2/c^2}}. \tag{3.62} \]
2. Lagrange-Euler Equations
The equation of motion follows from the Lagrange-Euler equations.
(3.64)
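We have
\[ \frac{\partial\mathcal{L}}{\partial\mathbf{r}} = q\,\big(\nabla(\mathbf{v}\cdot\mathbf{A}) - \nabla\Phi\big), \tag{3.65} \]
which yields
\[ \frac{d\mathbf{P}}{dt} = \frac{d(\mathbf{p} + q\mathbf{A})}{dt} = q\,\big(\nabla(\mathbf{v}\cdot\mathbf{A}) - \nabla\Phi\big). \tag{3.66} \]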
3. Equations of Motion
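We use the relations
\[ \frac{d\mathbf{A}}{dt} = \frac{\partial\mathbf{A}}{\partial t} + \left(\dot{x}\,\frac{\partial\mathbf{A}}{\partial x} + \dot{y}\,\frac{\partial\mathbf{A}}{\partial y} + \dot{z}\,\frac{\partial\mathbf{A}}{\partial z}\right) = \frac{\partial\mathbf{A}}{\partial t} + (\mathbf{v}\cdot\nabla)\mathbf{A}, \tag{3.67} \]
and
\[ \nabla(\mathbf{v}\cdot\mathbf{A}) = (\mathbf{v}\cdot\nabla)\mathbf{A} + \mathbf{v}\times(\nabla\times\mathbf{A}). \tag{3.68} \]
This leads to the equation of motion
\[ \frac{d\mathbf{p}}{dt} = q\,(\mathbf{E} + \mathbf{v}\times\mathbf{B}), \tag{3.69} \]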
(3.70)
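by taking the derivative of this equation with respect to time, and taking
into account the definition (3.62), we obtain
\[ \frac{d\mathcal{E}_{\text{kin}}}{dt} = \mathbf{v}\cdot\frac{d\mathbf{p}}{dt}. \tag{3.71} \]
Inserting the equation of motion (3.69), and noting that
$\mathbf{v}\cdot(\mathbf{v}\times\mathbf{B}) = 0$, we find
\[ \frac{d\mathcal{E}_{\text{kin}}}{dt} = q\,\mathbf{v}\cdot\mathbf{E}, \tag{3.72} \]
where E is the electric field. Only the electric field works and modifies the
kinetic energy and the norm of the velocity.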
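The last statement can be illustrated numerically. The sketch below integrates the equation of motion (3.69) for a relativistic particle in a uniform magnetic field with no electric field, and checks that the energy $\sqrt{p^2c^2 + m^2c^4}$ does not change. The units (q = m = c = 1), the field, the initial momentum, and the Runge-Kutta integrator are all assumptions made for the example.

```python
import math

Q, M, C = 1.0, 1.0, 1.0          # units with q = m = c = 1 (assumption)
B = (0.0, 0.0, 1.0)              # uniform magnetic field along z; E = 0

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def velocity(p):
    """Invert p = m v / sqrt(1 - v^2/c^2): v = p c / sqrt(p^2 + m^2 c^2)."""
    p2 = sum(comp * comp for comp in p)
    return tuple(C * comp / math.sqrt(p2 + (M * C) ** 2) for comp in p)

def force(p):
    """Right-hand side of (3.69) with E = 0: dp/dt = q v x B."""
    return tuple(Q * comp for comp in cross(velocity(p), B))

def energy(p):
    return math.sqrt(sum(comp * comp for comp in p) * C ** 2 + (M * C ** 2) ** 2)

def rk4_step(f, s, dt):
    """One classical fourth-order Runge-Kutta step for ds/dt = f(s)."""
    def shift(k, a):
        return tuple(si + a * ki for si, ki in zip(s, k))
    k1 = f(s)
    k2 = f(shift(k1, 0.5 * dt))
    k3 = f(shift(k2, 0.5 * dt))
    k4 = f(shift(k3, dt))
    return tuple(si + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))

p = (1.0, 0.0, 0.5)              # initial momentum (hypothetical)
energy_start = energy(p)
for _ in range(2000):
    p = rk4_step(force, p, 0.01)
energy_end = energy(p)
print(energy_start, energy_end, p)
```

The momentum rotates around the direction of B while its norm, and therefore the energy, stays constant up to the integration error; the component of p along B is untouched.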
3.5 Problems
for the evolution of the atmosphere. Lorenz created a new and spectacular
source of interest in chaos with his "butterfly" effect in meteorology.
In Section 4.3, we introduce the Poisson brackets, which bear a mathematical
structure of great interest and whose applications are closer to what
we are concerned with here. Jacobi considered them as Poisson's greatest
discovery. In fact, Poisson brackets are the starting point of the theory of Lie
groups. We shall use them to define canonical transformations, which have
many applications and show that there is a complete equivalence between
the two types of state variables: positions {x} and momenta {p}. From the
mathematical point of view, phase space is the space that is appropriate to
describe the evolution of a set of points, as opposed to the "empirical" space
of positions and velocities. We shall establish the Liouville theorem, which is
a remarkable geometric property of the evolution of a system in phase space.
We will then be able to see in a natural way the amazing discovery of Dirac
in 1925. There is a remarkable similarity between analytical mechanics and
quantum mechanics if one replaces the classical Poisson brackets by the
commutators (divided by iħ) of quantum observables. In Section 4.4, we shall
extend these considerations to the case of a charged particle in a magnetic
field, where precisely the conjugate momentum and the linear momentum
differ radically.
Section 4.5 is devoted to the Hamilton-Jacobi equation, where one chooses
to work directly with the action as a function of the variables (x,p) and no
longer with the Lagrangian or the Hamiltonian. After we have established the
major properties and the Hamilton-Jacobi equation, we will discover an im-
pressive series of results. We shall see how, for conservative systems, the flow
of trajectories is orthogonal to the surfaces of constant action. From that point
of view, we will see that the Maupertuis principle can be cast in a completely
geometric form. At that point, we will be able to understand how geometrical
optics appears as the limit of wave optics, as was discovered by Hamilton.
The proof involves what is called the eikonal, which is the optical analog of
the action (divided by the wavelength). In the approximation of small wave-
lengths, called the eikonal approximation, the wave propagates with a wave
vector that is locally perpendicular to the surfaces of constant eikonal. The
surfaces are the geometric wave fronts. We will see that the eikonal equation
corresponds exactly to the Fermat principle. The geometric interpretation is
nothing but the Huygens-Fresnel principle. Finally, we will show that the
same methodology can be applied to the Schrödinger equation in wave
mechanics. This constitutes the famous semiclassical approximation of Gregor
Wentzel, Hendrik Anthony Kramers, and Léon Brillouin.
Actually, the formulation (3.2) of the least action principle is not due to
Lagrange (who used a more complicated form). It was formulated by Hamilton
4.1 Hamilton's Canonical Formalism 69
Suppose that we invert equations (3.12) and that we can calculate the
$\{\dot{x}_i\}$ in terms of the $\{x_i\}$ and $\{p_i\}$, which are our new state
variables. 2
The problem is to obtain the equations of motion of the $\{x_i\}$ and $\{p_i\}$ in
terms of these same variables by eliminating the $\{\dot{x}_i\}$.
The solution consists in performing what is called a Legendre transformation.
Let us introduce the Hamilton function, or Hamiltonian,
\[ H = \sum_i p_i\,\dot{x}_i - \mathcal{L}. \tag{4.1} \]
Consider, for simplicity, a one-dimensional problem, and let us write the total
differential of H,
\[ dH = p\,d\dot{x} + \dot{x}\,dp - \frac{\partial\mathcal{L}}{\partial x}\,dx - \frac{\partial\mathcal{L}}{\partial\dot{x}}\,d\dot{x} - \frac{\partial\mathcal{L}}{\partial t}\,dt. \]
Taking into account (3.12) and (3.13), the first and fourth terms cancel, and
the third one is $-\dot{p}\,dx$. Therefore, we have
2 Conjugate momenta always exist since the Lagrangian contains a quadratic term
in $\dot{x}_i$.
Legendre transformations are often used for performing changes of variables. One
chooses the most convenient set of variables according to the nature of the phys-
ical problem under consideration. A simple example is that of thermodynamic
potentials. Starting from the energy U = W +Q, which is convenient if one works
with the volume and the entropy, dU = -PdV + TdS, one goes to the enthalpy
H = U + PV if one works with the pressure and the entropy dH = V dP + TdS,
to the free energy F = U - T S if one works with the volume and the temperature
dF = -PdV - SdT, and the free enthalpy, the Gibbs function G = F + PV, if
one works with the temperature and the pressure dG = -SdT + VdP.
(4.5)
More generally, if we denote by $X(t) = (\mathbf{r}_i(t), \mathbf{p}_i(t))$ the
position of the system at time t in phase space, Hamilton's equations are of
the form $\dot{X} = F(X)$; i.e., a first-order differential equation for the
evolution of the 2N-component vector X(t). This is called a dynamical system.
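A minimal sketch of such a dynamical system, under assumptions chosen only for illustration: a pendulum with unit mass, length, and gravity, whose phase-space point X = (x, p) obeys $\dot{X} = F(X)$ with $F = (\partial H/\partial p, -\partial H/\partial x)$, integrated with a standard fourth-order Runge-Kutta scheme.

```python
import math

def hamiltonian(x, p):
    """Pendulum with unit mass, length, and gravity (illustrative choice)."""
    return 0.5 * p * p + (1.0 - math.cos(x))

def F(state):
    """Hamilton's equations as a vector field: (dx/dt, dp/dt) = (dH/dp, -dH/dx)."""
    x, p = state
    return (p, -math.sin(x))

def rk4_step(f, s, dt):
    """One classical fourth-order Runge-Kutta step for ds/dt = f(s)."""
    def shift(k, a):
        return tuple(si + a * ki for si, ki in zip(s, k))
    k1 = f(s)
    k2 = f(shift(k1, 0.5 * dt))
    k3 = f(shift(k2, 0.5 * dt))
    k4 = f(shift(k3, dt))
    return tuple(si + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))

state = (1.0, 0.0)               # initial point in phase space (hypothetical)
E0 = hamiltonian(*state)
trajectory = [state]
for _ in range(5000):
    state = rk4_step(F, state, 0.01)
    trajectory.append(state)
E1 = hamiltonian(*state)
print(E0, E1)
```

The list `trajectory` samples one curve of the flow in phase space; since H does not depend explicitly on time, its value should be the same at both ends up to the integration error.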
This type of problem has been an amazing source of discoveries both in
mathematics and in physics; one can refer to the book by I. Percival and
D. Richards [11]. This field of investigations was founded by Henri Poincaré
in 1885, in particular when he made his celebrated analysis of the three-body
problem (the really difficult problem of mechanics). A large number of
famous mathematicians have studied this type of problem, which is still at
the forefront of research in mathematics. J.-C. Yoccoz was awarded the Fields
Medal in 1994 for his results on this subject, which he studied together with
Michael Herman.
One studies the whole set of possible motions, which is called the flow of
these vectors. This leads to fascinating problems such as limiting problems
at t = ∞, attractors, and strange attractors; bifurcations, which are sudden
changes in the nature of these flows for certain values of the parameters
entering the function F(X); and chaos and the "butterfly effect" in meteorology,
for example.
4.2 Dynamical Systems 71
Poincaré considered a gravitating system involving more than two bodies, say
planets around the sun. Considering two sets of initial conditions as close to
one another as one wishes, Poincaré proved that there is a time when two
of the planets can be as far away from each other (and from their starting
point) as one wishes. 3 This effect is called chaos. It is encountered in many
other physical problems. Depending on the system under consideration, the
characteristic time for chaos to show up varies considerably.
A very simple example of a chaotic system is playing dice. In classical
mechanics, if we were to determine extremely accurately the conditions of the
problem (i.e., the initial conditions, the way to throw the dice, the
geometry of the dice, etc.), we could in principle predict the result of a throw
of dice, and the phenomenon would lose its probabilistic character. However,
it is quite obvious and intuitive that the outcome of different experiments
would be highly sensitive to the initial conditions and that it would require
an enormous amount of information to make the prediction. It is therefore
much more efficient in practice to perform a probabilistic description of the
problem, where one imposes some ignorance on the initial conditions, which
are said to be chosen "at random." This phenomenon is encountered in
celestial mechanics, and many other problems, when initial conditions are close
but not "infinitesimally" close, provided the time of evolution is long enough.
The case of three planets of unequal masses orbiting around a "sun,"
taking into account their mutual interactions, is shown in Figure 4.1. At the
beginning, everything evolves rather smoothly. However, after some time, the
lightest planet is simply ejected from the system; this is of course compatible
with energy conservation, which would not be the case for a two-body system.
By letting the computer run for a longer time, the two other planets, which
have a smooth motion at first, also reach unexpected configurations.
As noted earlier, the most famous strange attractor is probably the Lorenz
attractor, and it generated a spectacular amount of new interest in chaos.
Consider the evolution of a rectangular slice of the atmosphere that is
heated from below and cooled from above. There are three variables: x, the
convective flow of the atmosphere; y, the horizontal temperature distribution;
and z, the vertical temperature distribution.
The details of the physics involved are of little interest here. In the Lorenz
model, the evolution of these variables is given by the nonlinear differential
system (which, being dissipative, is not Hamiltonian)
3 In the 19th century, Laplace and others had extensively developed perturbation
theory, which provided extremely accurate predictions for celestial mechanics.
Actually, in the course of his work, Poincaré showed that the perturbation
expansion never converges; it is only an asymptotic expansion.
Fig. 4.1. Evolution of three planets around a star, taking into account their mutual
interactions. The time sequence of the pictures must be read from left to right
and from bottom to top. The time interval between two pictures is the same. One
sees that at the eleventh stage, the third planet, lighter and initially close to the
second one, is expelled from the system. Pictures are from Jean-François Colonna,
colonna@cmap.polytechnique.fr, http://www.lactamme.polytechnique.fr; all rights
reserved.
\[ \frac{dx}{dt} = \sigma\,(y - x), \qquad \frac{dy}{dt} = \rho\,x - y - xz, \qquad \frac{dz}{dt} = xy - \beta z, \tag{4.6} \]
where σ is the ratio of the viscosity to the thermal conductivity, ρ is the
temperature difference between the top and the bottom of the slice, and β is
the ratio of the width to the height of the slice.
Lorenz used to solve this problem numerically, using hours of computer
time at night,4 by standard successive iteration techniques
$(x_i, y_i, z_i) \to (x_{i+1}, y_{i+1}, z_{i+1})$. At that time, this generated
kilograms of paper called computer listings. One day, Lorenz had the idea of
redoing a calculation whose solution he had found the day before, using as a
starting point not the last point obtained the day before but some intermediate
value $(x_i, y_i, z_i)$ obtained in the calculation. To his great surprise,
after a relatively small number of iterations, the following values appeared
completely different from those obtained previously. Lorenz had rediscovered
chaos, due, in that case, to round-off errors of the numbers he used.
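This experiment is easy to imitate. The sketch below iterates the system (4.6) from two starting points that differ by 10^-8 in x, which plays the role of a round-off error, and records the largest distance reached between the two solutions. The parameter values σ = 10, ρ = 28, β = 8/3 are the ones commonly quoted for this model; they, the initial point, and the Runge-Kutta scheme are assumptions of the example, not taken from the text.

```python
import math

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0   # commonly quoted values (assumed here)

def lorenz(s):
    """Right-hand side of the system (4.6)."""
    x, y, z = s
    return (SIGMA * (y - x), RHO * x - y - x * z, x * y - BETA * z)

def rk4_step(f, s, dt):
    """One classical fourth-order Runge-Kutta step for ds/dt = f(s)."""
    def shift(k, a):
        return tuple(si + a * ki for si, ki in zip(s, k))
    k1 = f(s)
    k2 = f(shift(k1, 0.5 * dt))
    k3 = f(shift(k2, 0.5 * dt))
    k4 = f(shift(k3, dt))
    return tuple(si + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))

a = (1.0, 1.0, 1.0)
b = (1.0 + 1e-8, 1.0, 1.0)       # same point, up to a tiny "round-off" offset
initial_sep = math.dist(a, b)

dt, steps = 0.01, 4000
max_sep = 0.0
for _ in range(steps):
    a = rk4_step(lorenz, a, dt)
    b = rk4_step(lorenz, b, dt)
    max_sep = max(max_sep, math.dist(a, b))
print(initial_sep, max_sep)
```

With these choices, the separation should grow from 10^-8 to a value comparable to the size of the attractor itself, which is the sensitivity to initial conditions described above.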
The sensitivity of the results to the initial conditions induces the same
type of difference between two solutions initially close to one another. Lorenz
called that the "butterfly effect." Actually, the title of one of his talks was:
Can the beat of a butterfly's wing in Brazil cause a tornado in Texas? Whether
or not it is a coincidence, the "Lorenz attractor" has the shape of the wings
of a butterfly.
In Figures 4.2 and 4.3, one can see the result of an iteration of the equations
(4.6). We notice that the time evolution of the point (x, y, z) has a perfectly
quiet behavior (the point turns around on a wing of the attractor) but that
unexpectedly it "jumps" from one wing to the other at certain times. This
occurs unexpectedly in space as well as in time, in the sense that the
trajectories of two points that are initially very close (in position and
velocity) can become completely different at a later time. In particular, the
two positions can be on different wings of the attractor.
Fig. 4.2. Lorenz attractor viewed from two different sides. The points correspond
to a discrete numerical iteration of (4.6). One can follow the points and observe the
sudden and unexpected transition from one wing of the attractor to the other, which
was not possible to predict half a semiperiod before. (Courtesy of Jean-François
Colonna.)
Fig. 4.3. Projection of the Lorenz attractor on the (x, z) plane. (Courtesy of
Jean-François Colonna.)
Consider two physical quantities f and g, which are functions of the state
variables $(x_i, p_i)$, i = 1, ..., N and possibly of time. One calls a Poisson
bracket of f and g the quantity
\[ \{f, g\} = \sum_{i=1}^{N} \left( \frac{\partial f}{\partial x_i}\frac{\partial g}{\partial p_i} - \frac{\partial f}{\partial p_i}\frac{\partial g}{\partial x_i} \right). \tag{4.7} \]
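\[ \{x_i, f\} = \frac{\partial f}{\partial p_i} \tag{4.10} \]
and
\[ \{p_i, f\} = -\frac{\partial f}{\partial x_i}. \tag{4.11} \]
\[ \{f, \{g, h\}\} + \{g, \{h, f\}\} + \{h, \{f, g\}\} = 0. \tag{4.12} \]
(4.13)
\[ \dot{f} = \{f, H\}. \tag{4.16} \]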
Poisson Theorem
Theorem 1. If f and g are two constants of the motion, then their Poisson
bracket is also a constant of the motion.
This theorem of Poisson can be derived from the Jacobi identity (4.12).
Setting h = H in (4.12) gives
$\{f, \{g, H\}\} + \{g, \{H, f\}\} + \{H, \{f, g\}\} = 0$.
We assume that f and g are constants of the motion; i.e., {g, H} = 0 and
{H, f} = 0. Therefore,
{H, {f, g}} = 0
and {f, g} is a constant of the motion. In certain cases, this allows one to find
new constants of the motion.
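A classic illustration, added here as a sketch under stated assumptions: for a particle in a central potential, the components $L_x$ and $L_y$ of the angular momentum are constants of the motion, and their Poisson bracket is the third component $L_z$, which is therefore conserved as well. The code evaluates the bracket (4.7) by central finite differences at an arbitrary phase-space point; the Hamiltonian $H = p^2/2 - 1/r$ and the point are hypothetical choices.

```python
import math

def poisson_bracket(f, g, state, h=1e-4):
    """{f, g} = sum_i (df/dx_i dg/dp_i - df/dp_i dg/dx_i), by central differences.

    state = (x_1, ..., x_n, p_1, ..., p_n)."""
    n = len(state) // 2

    def partial(func, j):
        up, down = list(state), list(state)
        up[j] += h
        down[j] -= h
        return (func(up) - func(down)) / (2.0 * h)

    return sum(partial(f, i) * partial(g, n + i) - partial(f, n + i) * partial(g, i)
               for i in range(n))

def Lx(s):
    x, y, z, px, py, pz = s
    return y * pz - z * py

def Ly(s):
    x, y, z, px, py, pz = s
    return z * px - x * pz

def Lz(s):
    x, y, z, px, py, pz = s
    return x * py - y * px

def H(s):
    """Central-potential Hamiltonian with unit mass and coupling (assumption)."""
    x, y, z, px, py, pz = s
    return 0.5 * (px * px + py * py + pz * pz) - 1.0 / math.sqrt(x * x + y * y + z * z)

point = (0.7, -0.4, 1.1, 0.3, 0.9, -0.5)   # arbitrary phase-space point
bracket_LxLy = poisson_bracket(Lx, Ly, point)
bracket_Lx_H = poisson_bracket(Lx, H, point)
bracket_Ly_H = poisson_bracket(Ly, H, point)
bracket_Lz_H = poisson_bracket(Lz, H, point)
print(bracket_LxLy, Lz(point), bracket_Lx_H, bracket_Ly_H, bracket_Lz_H)
```

The first two printed numbers should coincide, and the last three should vanish up to the finite-difference error.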
(4.18)
such that Hamilton's equations keep the same form in the new variables. Let
$H'(X_1, \ldots, X_N, P_1, \ldots, P_N; t)$ be Hamilton's function expressed in
terms of the new variables $[X_i, P_i]$. Then, in a canonical transformation, by
definition one has
\[ \dot{X}_i = \frac{\partial H'}{\partial P_i}, \qquad \dot{P}_i = -\frac{\partial H'}{\partial X_i}. \tag{4.19} \]
Comments
Cyclic Variable
which amounts to
\[ A = \frac{X^2 + P^2}{2}, \qquad \varphi = \arctan\left(\frac{X}{P}\right). \tag{4.28} \]
The variables (A, φ) are canonical conjugate variables, as one can check with
no difficulty. In these variables, the Hamiltonian reduces to the simple
expression H = ωA. Hence we have the equations of motion
Here, E is the energy of the oscillator, a constant of the motion. The
interesting point about this operation is that we have reduced the problem to a
single time-dependent variable, the angle φ. Since the energy, which is
proportional to the action A, is conserved, only the angular variable φ evolves.
The variable φ is a cyclic variable. It does not appear explicitly in the
Hamiltonian, and this results in the properties (4.29) and (4.30).
The geometric interpretation in the (X, P) space, which here is equivalent
to phase space, is simple. Since $X^2 + P^2 = 2A$, the motion occurs on a circle
of radius $\sqrt{2A} = \sqrt{2E/\omega}$, which depends on the time-independent
value of the energy E. On this circle, the motion of the point (X, P) is
uniform and of angular velocity ω: $\varphi = \omega t + \varphi_0$.
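The uniform rotation of the angle can be checked with a short numerical sketch. It assumes the Hamiltonian $H = \omega (X^2 + P^2)/2$, integrates Hamilton's equations with a Runge-Kutta scheme, and converts the result to the variables (A, φ) of (4.28); the value of ω, the initial point, and the integrator are illustrative choices.

```python
import math

OMEGA = 2.0                       # angular frequency (hypothetical value)

def F(state):
    """Hamilton's equations for H = OMEGA * (X^2 + P^2) / 2."""
    X, P = state
    return (OMEGA * P, -OMEGA * X)

def rk4_step(f, s, dt):
    """One classical fourth-order Runge-Kutta step for ds/dt = f(s)."""
    def shift(k, a):
        return tuple(si + a * ki for si, ki in zip(s, k))
    k1 = f(s)
    k2 = f(shift(k1, 0.5 * dt))
    k3 = f(shift(k2, 0.5 * dt))
    k4 = f(shift(k3, dt))
    return tuple(si + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))

def action_angle(X, P):
    """A = (X^2 + P^2)/2 and phi = arctan(X/P), using atan2 to keep the quadrant."""
    return 0.5 * (X * X + P * P), math.atan2(X, P)

state = (0.3, 0.4)
A0, phi0 = action_angle(*state)
dt, steps = 1e-3, 1000
for _ in range(steps):
    state = rk4_step(F, state, dt)
A1, phi1 = action_angle(*state)

elapsed = dt * steps
# Deviation of the angle from omega * t + phi0, reduced to the interval [-pi, pi).
phase_error = (phi1 - phi0 - OMEGA * elapsed + math.pi) % (2.0 * math.pi) - math.pi
print(A0, A1, phase_error)
```

The action A should be unchanged and the angle should have advanced by ω t, up to the integration error.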
We already mentioned cyclic variables in Section 3.2.2. This is a simple
example of the role played by such variables, in particular in the investigation
of integrable systems.
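of the system. When the system evolves, this point moves in phase space. A
volume element of phase space is defined by
\[ d\mu = dx_1 \cdots dx_N\, dp_1 \cdots dp_N. \tag{4.31} \]
Consider an arbitrary volume μ of phase space, $\mu = \int d\mu$. We claim that
this volume is invariant under canonical transformations,
\[ \int dx_1 \cdots dx_N\, dp_1 \cdots dp_N = \int dX_1 \cdots dX_N\, dP_1 \cdots dP_N. \tag{4.32} \]
For a single pair of conjugate variables, the Jacobian of the transformation
(x, p) → (X, P) is
\[ J = \frac{\partial X}{\partial x}\frac{\partial P}{\partial p} - \frac{\partial X}{\partial p}\frac{\partial P}{\partial x} = \{X, P\} = 1, \tag{4.33} \]
which is equal to one by definition. The extension to N conjugate variables
$X_1 \ldots X_N$, $P_1 \ldots P_N$ is more lengthy but proceeds from the same
observation.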
Consider now a volume μ of phase space. Each point of this volume evolves
according to Hamilton's equations. Consequently, at any time t, the variables
$(x_i(t), p_i(t))$ are canonical variables. Therefore, the motion can be seen as
performing at each time interval a canonical transformation of the state
variables in phase space $(x_i(t), p_i(t)) \to (x_i(t'), p_i(t'))$. We therefore
obtain the Liouville theorem, which is of great importance in statistical physics:
Theorem 3. A volume of phase space remains unchanged during the Hamil-
tonian evolution of the system.
This remarkable geometric property derives from the structure of Hamil-
ton's equations. It is independent of the specific form of the Hamiltonian
itself.
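The theorem can be tested numerically on a one-dimensional example. The sketch below assumes a pendulum Hamiltonian $H = p^2/2 + 1 - \cos x$ (an illustrative choice), computes the map that sends an initial point (x, p) to its position after a time t, and evaluates the Jacobian determinant of this map by finite differences. A determinant equal to one means that an infinitesimal area of phase space is preserved.

```python
import math

def F(state):
    """Hamilton's equations for the pendulum H = p^2/2 + 1 - cos x (assumed example)."""
    x, p = state
    return (p, -math.sin(x))

def rk4_step(f, s, dt):
    """One classical fourth-order Runge-Kutta step for ds/dt = f(s)."""
    def shift(k, a):
        return tuple(si + a * ki for si, ki in zip(s, k))
    k1 = f(s)
    k2 = f(shift(k1, 0.5 * dt))
    k3 = f(shift(k2, 0.5 * dt))
    k4 = f(shift(k3, dt))
    return tuple(si + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))

def flow(state, t, dt=0.001):
    """Position in phase space after a time t, starting from `state`."""
    for _ in range(round(t / dt)):
        state = rk4_step(F, state, dt)
    return state

def jacobian_det(state, t, h=1e-5):
    """Determinant of d(X, P)/d(x, p) for the time-t flow, by central differences."""
    x, p = state
    x_plus, x_minus = flow((x + h, p), t), flow((x - h, p), t)
    p_plus, p_minus = flow((x, p + h), t), flow((x, p - h), t)
    dX_dx = (x_plus[0] - x_minus[0]) / (2.0 * h)
    dP_dx = (x_plus[1] - x_minus[1]) / (2.0 * h)
    dX_dp = (p_plus[0] - p_minus[0]) / (2.0 * h)
    dP_dp = (p_plus[1] - p_minus[1]) / (2.0 * h)
    return dX_dx * dP_dp - dX_dp * dP_dx

det = jacobian_det((1.0, 0.5), 3.0)
print(det)
```

The printed determinant should be equal to one up to the numerical error, independently of the initial point and of the elapsed time.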
Another interesting geometric property in phase space is the following.
Hamilton's function H(x, p) is defined in phase space. In this space, consider
a vector field whose components are $(\dot{x}, \dot{p})$; i.e.,
\[ \dot{x} = \frac{\partial H}{\partial p}, \qquad \dot{p} = -\frac{\partial H}{\partial x}. \]
One calls the flow of this vector field the set of curves whose tangents at each
point are collinear with the vector at this point. We notice that the flow of
$(\dot{x}, \dot{p})$, also called the Hamiltonian flow, is orthogonal in each point
to the gradient of the Hamiltonian at this point,
\[ \nabla H = \left(\frac{\partial H}{\partial x}, \frac{\partial H}{\partial p}\right), \]
since $\dot{x}\,\partial H/\partial x + \dot{p}\,\partial H/\partial p = 0$ by
Hamilton's equations.
In the example (4.26) above, the result is very simple. The trajectories
in the (X, P) plane are circles centered at the origin, and the gradient of
$H = (P^2 + X^2)/2$ lies along straight lines going through the origin. This can
be stated in the converse way: The gradient of $H = (P^2 + X^2)/2$ lies along
straight lines going through the origin, and therefore the trajectories are
circles centered at the origin. This result can be generalized to any number of
variables. One can express the conservation laws of energy, momentum, and
angular momentum with geometrical considerations of that kind (using the
corresponding invariance properties).
The formulas above reveal an amazing fact. There is a strong analogy, if not
more, between the structures of analytical mechanics and quantum mechanics.
In quantum mechanics, one proves quite easily what is called the Ehrenfest
theorem: 6 The time derivative of the expectation value ⟨a⟩ of a physical
quantity A is related to the commutator of the observable $\hat{A}$ and the
Hamiltonian $\hat{H}$ by the relation
\[ \frac{d}{dt}\langle a \rangle = \frac{1}{i\hbar}\,\langle [\hat{A}, \hat{H}] \rangle + \left\langle \frac{\partial\hat{A}}{\partial t} \right\rangle. \tag{4.34} \]
\[ \dot{\hat{A}} = \frac{1}{i\hbar}\,[\hat{A}, \hat{H}] + \frac{\partial\hat{A}}{\partial t}, \tag{4.35} \]
which has the same structure as (4.14) if one replaces the Poisson brackets by
the commutators of the quantum observables, divided by iħ.
The same remark applies to the canonical commutation relations of the
conjugate variables of position x and momentum p,
This similarity, if not identity, between the structures of the two mechanics
was one of the first major discoveries of Paul Adrien Maurice Dirac during the
summer of 1925 (he was 23). Dirac, after finding that the noncommutativity
of quantum observables was actually the foundation of Werner Heisenberg's
matrix mechanics, had decided to construct a new formulation of mechanics
that would incorporate this noncommutativity in a well-defined way. One day,
he remembered the structure of Poisson brackets and saw that they played,
formally, a role similar to that of quantum commutators, divided by $i\hbar$.
In September 1925, he was able to construct what is called "quantum mechanics"
in its present form. Of course, the mathematical nature and the physical
interpretation of the quantities are different in the two cases, but the
equations that relate them are the same if one postulates the correspondence
between Poisson brackets in analytical mechanics and the quantum commutators,
divided by $i\hbar$, in quantum mechanics.
More generally, in complex problems (large numbers of degrees of freedom,
constraints between variables, etc.), the systematic method of obtaining the
commutation relations of quantum observables consists in referring to the
classical Poisson brackets and replacing them by the quantum commutators
(divided by $i\hbar$).
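On the classical side of this correspondence, Poisson brackets are easy to evaluate numerically. The sketch below assumes the convention $\{f, g\} = \sum_i (\partial f/\partial q_i\,\partial g/\partial p_i - \partial f/\partial p_i\,\partial g/\partial q_i)$, which is the one compatible with $\{x, p\} = 1$, and uses central finite differences. The helper names are hypothetical. It checks the canonical bracket and the bracket of two components of the angular momentum $L = r \times p$, whose structure mirrors that of the quantum commutator divided by $i\hbar$.

```python
def partial(func, q, p, which, i, h=1e-5):
    """Central-difference partial derivative of func(q, p) w.r.t. q[i] or p[i]."""
    q_plus, q_minus = list(q), list(q)
    p_plus, p_minus = list(p), list(p)
    if which == "q":
        q_plus[i] += h
        q_minus[i] -= h
    else:
        p_plus[i] += h
        p_minus[i] -= h
    return (func(q_plus, p_plus) - func(q_minus, p_minus)) / (2.0 * h)

def poisson_bracket(f, g, q, p):
    """{f, g} = sum_i (df/dq_i dg/dp_i - df/dp_i dg/dq_i), assumed convention."""
    total = 0.0
    for i in range(len(q)):
        total += partial(f, q, p, "q", i) * partial(g, q, p, "p", i)
        total -= partial(f, q, p, "p", i) * partial(g, q, p, "q", i)
    return total

# Coordinates, momenta, and angular momentum components as functions of (q, p).
def x(q, p):
    return q[0]

def p_x(q, p):
    return p[0]

def L_x(q, p):
    return q[1] * p[2] - q[2] * p[1]

def L_y(q, p):
    return q[2] * p[0] - q[0] * p[2]

def L_z(q, p):
    return q[0] * p[1] - q[1] * p[0]

if __name__ == "__main__":
    q0, p0 = [0.3, -1.2, 0.7], [1.1, 0.4, -0.5]
    print("{x, p_x}   =", poisson_bracket(x, p_x, q0, p0))
    print("{L_x, L_y} =", poisson_bracket(L_x, L_y, q0, p0))
    print("L_z        =", L_z(q0, p0))
```

With this convention, the numerical values of $\{L_x, L_y\}$ and $L_z$ coincide at any point of phase space.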
4.4.1 Hamiltonian
which is expressed in terms of the potentials A and $\Phi$, and not the fields E
and B.
As an exercise, one can write the Hamiltonian in the relativistic case (3.61),
and the result is
\[ H = \sqrt{m^2c^4 + c^2(\mathbf{p} - q\mathbf{A})^2} + q\Phi, \tag{4.40} \]
The least action principle consists in finding the equations of motion by min-
imizing the action, which is itself defined in terms of the Lagrangian and the
endpoints of the trajectory by (3.2).
It is useful, however, to work directly with the action itself as a physical
quantity. In order to do this, we will first express the action in terms of the
coordinates and time; i.e., $S(x_1, x_2, \ldots, x_n; t)$. In the case of one
degree of freedom, this amounts to calculating the values of S along the set of
physical trajectories as a function of the time and position of arrival (x, t),
the starting time and position being fixed. Equivalently, we want to
characterize the various trajectories starting at $(x_1, t_1)$ and arriving at
(x, t) with the value of the action $S(x, t; x_1, t_1)$.
The action is defined by
\[ S = \int_{t_1}^{t} \mathcal{L}(x, \dot{x}, t')\, dt'. \tag{4.41} \]
In this expression, the variables $(x(t), \dot{x}(t))$ are assumed to take their
physical values that satisfy the Lagrange-Euler equations.
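We integrate the second term by parts, but we do not impose that we end
up at the same point x(t) but rather in its vicinity $x(t) + \delta x(t)$ (we
maintain $\delta x(t_1) = 0$). The integrated term does not cancel out and we
obtain
\[ \delta S = \frac{\partial \mathcal{L}}{\partial \dot{x}}\,\delta x(t) + \int_{t_1}^{t} \left( \frac{\partial \mathcal{L}}{\partial x} - \frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot{x}} \right) \delta x(t')\, dt'. \tag{4.43} \]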
Therefore, the partial derivatives of the action with respect to time and to
the coordinates are simply the conjugate momenta
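\[ \frac{\partial S}{\partial x_i} = p_i, \quad \text{or in general} \quad \frac{\partial S}{\partial q_i} = p_i. \tag{4.46} \]
By definition of the action, its total time derivative along the trajectory is
\[ \frac{dS}{dt} = \mathcal{L}. \tag{4.47} \]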
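Now, if we consider the action as a function of coordinates and time, we have
\[ \frac{dS}{dt} = \frac{\partial S}{\partial t} + \sum_{i=1}^{N} \frac{\partial S}{\partial x_i}\,\dot{x}_i = \frac{\partial S}{\partial t} + \sum_{i=1}^{N} p_i \dot{x}_i. \tag{4.48} \]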
Putting together the two equalities, we see that the (partial) derivative of the
action with respect to time is, up to a sign, the Hamiltonian
\[ \frac{\partial S}{\partial t} = \mathcal{L} - \sum_{i=1}^{N} p_i \dot{x}_i = -H, \tag{4.49} \]
and the total differential of the action can therefore be written in terms of the
coordinates and time as
\[ dS = \sum_{i=1}^{N} p_i\, dx_i - H\, dt. \tag{4.50} \]
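The two relations $\partial S/\partial x = p$ and $\partial S/\partial t = -H$ can be verified on the simplest case. For a free particle of mass m going from $(x_1, t_1)$ to $(x, t)$, the action along the physical (uniform) trajectory is $S = m(x - x_1)^2 / 2(t - t_1)$, a standard result that we take as the starting assumption of this sketch. The code differentiates S numerically and compares with the momentum and the energy; function names are illustrative.

```python
def free_particle_action(x, t, x1=0.0, t1=0.0, m=1.0):
    """Action of a free particle along the straight trajectory (x1, t1) -> (x, t)."""
    return 0.5 * m * (x - x1) ** 2 / (t - t1)

def action_derivatives(x, t, x1=0.0, t1=0.0, m=1.0, h=1e-5):
    """Central-difference estimates of dS/dx and dS/dt at the arrival point."""
    dS_dx = (free_particle_action(x + h, t, x1, t1, m)
             - free_particle_action(x - h, t, x1, t1, m)) / (2.0 * h)
    dS_dt = (free_particle_action(x, t + h, x1, t1, m)
             - free_particle_action(x, t - h, x1, t1, m)) / (2.0 * h)
    return dS_dx, dS_dt

def momentum_and_energy(x, t, x1=0.0, t1=0.0, m=1.0):
    """Momentum p = m v and energy H = p^2 / 2m of the uniform motion."""
    p = m * (x - x1) / (t - t1)
    return p, p * p / (2.0 * m)

if __name__ == "__main__":
    x, t, m = 2.0, 4.0, 1.5
    dS_dx, dS_dt = action_derivatives(x, t, m=m)
    p, H = momentum_and_energy(x, t, m=m)
    print("dS/dx =", dS_dx, " p  =", p)
    print("dS/dt =", dS_dt, " -H =", -H)
```

For these values, the momentum is 0.75 and the energy 0.1875, and the numerical derivatives reproduce p and -H.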
We notice that no reference is made to the initial position and time. The
formal expression of the action is therefore
(4.51 )
(4.52)
which is the form (3.2) we used as a starting point in the previous chapter.
However, we work here with the conjugate variables (x, p) and not the
variables $(x, \dot{x})$ of Chapter 3. The canonical equations of Hamilton follow
directly from the expression of the action (4.51). Indeed, consider the variables
x and p as independent variables, and consider for simplicity a single degree
of freedom. The action is then
\[ S = \int_{(1)}^{(2)} (p\, dx - H\, dt). \tag{4.53} \]
The variation of S is
\[ \delta S = \int_{(1)}^{(2)} \left( \delta p\, dx + p\, d(\delta x) - \frac{\partial H}{\partial x}\,\delta x\, dt - \frac{\partial H}{\partial p}\,\delta p\, dt \right). \tag{4.54} \]
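The second term in the integral can be integrated by parts. The integrated
term $(p\,\delta x)$ disappears since we assume $\delta x(2) = \delta x(1) = 0$.
We therefore obtain
\[ \delta S = \int_{(1)}^{(2)} \left[ \delta p \left( dx - \frac{\partial H}{\partial p}\, dt \right) - \delta x \left( dp + \frac{\partial H}{\partial x}\, dt \right) \right], \tag{4.55} \]
which vanishes for any variation $(\delta x, \delta p)$ if and only if the
integrands vanish identically, i.e.
\[ dx - \frac{\partial H}{\partial p}\, dt = 0, \qquad dp + \frac{\partial H}{\partial x}\, dt = 0, \]
where we recognize Hamilton's canonical equations.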
4.5 The Action and the Hamilton-Jacobi Equation 85
The Hamilton-Jacobi equation can be read off from (4.49) and (4.46). In the
Hamilton function, we can replace the momenta Pi by the partial derivatives
of the action. This leads to
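\[ H = \frac{1}{2m}\left( p_r^2 + \frac{p_\theta^2}{r^2} + \frac{p_\phi^2}{r^2 \sin^2\theta} \right) + V(r, \theta, \phi). \tag{4.57} \]
\[ V = V_0(r) + \frac{f(\theta)}{r^2} \tag{4.58} \]
(in full generality, one can add a term of the form $g(\phi)/r^2 \sin^2\theta$). The
Hamilton-Jacobi equation is
\[ \frac{1}{2m}\left( \frac{\partial S_0}{\partial r} \right)^2 + V_0(r) + \frac{1}{2mr^2}\left[ \left( \frac{\partial S_0}{\partial \theta} \right)^2 + 2m f(\theta) \right] + \frac{1}{2mr^2 \sin^2\theta}\left( \frac{\partial S_0}{\partial \phi} \right)^2 = E, \tag{4.59} \]
where E is the constant value of the energy.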
The $\phi$ variable is cyclic. We denote by $\ell = L_z$ the constant value of
$p_\phi$. In other words,
\[ \left( \frac{\partial S_0}{\partial \phi} \right)^2 = \ell^2. \tag{4.60} \]
(4.62)
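We obtain
\[ \left( \frac{dS_1}{d\theta} \right)^2 + 2m f(\theta) + \frac{\ell^2}{\sin^2\theta} = \alpha, \tag{4.63} \]
\[ \frac{1}{2m}\left( \frac{dS_2}{dr} \right)^2 + V_0(r) + \frac{\alpha}{2mr^2} = E, \tag{4.64} \]
where $\alpha$ is, like E and $\ell$, a constant of the motion, determined by the
initial conditions.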
Integrating these equations yields
(3 = as therefore (4.66)
aa'
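Since $\dot{q}$ is, by definition, the derivative of q along the physical
trajectory, we have
\[ \dot{q} = \frac{\partial H}{\partial p} \quad \text{and} \quad \frac{d}{dt}\beta = \frac{\partial}{\partial t}\frac{\partial S}{\partial \alpha} + \frac{\partial H}{\partial p}\,\frac{\partial^2 S}{\partial q\, \partial \alpha}. \]
On the other hand, we have
\[ \frac{\partial}{\partial \alpha} H\!\left( q, \frac{\partial S(q, \alpha, t)}{\partial q} \right) = \frac{\partial H}{\partial p}\,\frac{\partial^2 S}{\partial \alpha\, \partial q}. \tag{4.67} \]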
Injecting this into (4.66), and taking into account the Hamilton-Jacobi equa-
tion (4.56), we obtain
Going back to the result (4.65) we consider the three constants of the
motion $(E, \ell, \alpha)$. From the expression (4.65) of the action, we define
the three constants $\beta_E, \beta_\ell, \beta_\alpha$ by
\[ \beta_E = \frac{\partial S}{\partial E}, \qquad \beta_\ell = \frac{\partial S}{\partial \ell}, \qquad \beta_\alpha = \frac{\partial S}{\partial \alpha}. \]
The values of these constants are fixed by the initial conditions of the problem.
We therefore obtain the trajectory and the equation of motion from the three
equations by taking the derivatives of (4.65) with respect to E, $\ell$, and
$\alpha$.
Reduced action
Suppose the Hamiltonian H does not depend explicitly on time. Then, the
energy is conserved. Let E be its value in the problem under consideration.
Then equation (4.49) yields
\[ \frac{\partial S}{\partial t} = -E; \tag{4.69} \]
i.e.,
\[ S = -Et + S_0(x_1, \ldots, x_N). \tag{4.70} \]
The quantity So is called the reduced action. It satisfies the equation
(4.72)
For a conservative system, we see that the variational principle concerns this
quantity: $\delta S_0 = 0$.
Geometric Interpretation
The relation (4.46) can also be written in terms of the reduced action
\[ \frac{\partial S_0}{\partial x_i} = p_i. \tag{4.73} \]
linear momenta $p_i = m\dot{x}_i$. In coordinate space, $(x_1, x_2, \ldots, x_N)$,
consider the surfaces on which the reduced action is constant, $S_0 =$ constant.
The relation (4.73) implies that the vector $\mathbf{p} \equiv (p_1, p_2, \ldots, p_N)$
is everywhere orthogonal to these surfaces.
Considering the simple case of a particle in three-dimensional space, we see
that the trajectory is, at each point, orthogonal to the surface $S_0 =$ constant.
At a given time, this property is also true for the action S. If we denote by
$d\mathbf{r}$ an elementary vector tangent to the surface $S_0 =$ constant at
point r, we have by definition $\nabla S_0 \cdot d\mathbf{r} = 0$ or
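Maupertuis Principle
\[ \frac{1}{2m}(\nabla S_0)^2 + V(\mathbf{r}) = E, \quad \text{or also} \quad (\nabla S_0)^2 = 2m(E - V(\mathbf{r})). \tag{4.75} \]
In this problem, the momentum is simply $\mathbf{p} = m\dot{\mathbf{r}}$. The
reduced action (4.72) is therefore
\[ S_0 = \int \mathbf{p} \cdot d\mathbf{r} = m \int \dot{\mathbf{r}} \cdot d\mathbf{r}. \tag{4.76} \]
(4.77)
Hence we have the simple form of the Maupertuis principle given in Section
2.2.1,
\[ \delta \int \sqrt{2m(E - V)}\; d\ell = 0. \tag{4.80} \]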
4.6 Analytical Mechanics and Optics 89
The previous consideration will allow us to show how geometrical optics ap-
pears as the limit of wave optics in the small-wavelength limit.
Scalar wave
(4.81)
(4.82)
(4.83)
where
\[ k_0 = \frac{\omega}{c} = \frac{2\pi}{\lambda} \tag{4.84} \]
is the modulus of the wave vector at point r. The quantity S in (4.83) is called
the eikonal (from the Greek word εἰκών, image or picture). Inserting (4.83)
into (4.82), we obtain, after simplifying by $e^{i k_0 S(\mathbf{r})}$ and dividing
by $k_0^2$,
i ( 1 2
tpo ( \7 S ) 2 + ko 2) 2
2\7 tpo . \7 S + tpo \7 S - k6 \7 tpo = n . (4.85)
Of course, one notices the great similarity between the eikonal equation (4.87)
and the Hamilton-Jacobi equation (4.75) for a massive point-like particle. The
reduced action So of the particle and the eikonal S for a light wave obey the
same law if one makes the correspondence
The same type of argument can be used in wave mechanics and the Schrödinger
equation. This is called the semiclassical approximation and is due to Wentzel,
Kramers, and Brillouin (WKB). One can for instance refer to Volume 1,
Chapter VI of Albert Messiah's book Quantum Mechanics [13] for any details,
particularly practical applications of the method.
Consider the Schrödinger equation
\[ i\hbar \frac{\partial}{\partial t}\psi(\mathbf{r}, t) = -\frac{\hbar^2}{2m}\Delta\psi(\mathbf{r}, t) + V(\mathbf{r})\,\psi(\mathbf{r}, t). \tag{4.91} \]
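Substituting in (4.91) and considering separately the real and imaginary parts,
we obtain
\[ \frac{\partial S}{\partial t} + \frac{1}{2m}(\nabla S)^2 + V = \frac{\hbar^2}{2m}\,\frac{\nabla^2 A}{A}, \tag{4.93} \]
\[ m\frac{\partial A}{\partial t} + \nabla A \cdot \nabla S + \frac{1}{2} A\, \nabla^2 S = 0. \tag{4.94} \]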
(4.97)
\[ \frac{\partial S}{\partial t} + \frac{1}{2m}(\nabla S)^2 + V = 0, \tag{4.98} \]
4.7 Problems
\[ H = \frac{p_1^2}{2m} + \frac{p_2^2}{2m} + \frac{m\omega^2 x_1^2}{2} + \frac{m\omega^2 x_2^2}{2} + \frac{m\Omega^2 (x_1 - x_2)^2}{4} \]
1. Show that the transformation
\[ X = \frac{x_1 + x_2}{\sqrt{2}}, \qquad P = \frac{p_1 + p_2}{\sqrt{2}}, \qquad Y = \frac{x_1 - x_2}{\sqrt{2}}, \]
is a canonical transformation and express the Hamiltonian with these new
variables.
2. Find the eigenfrequencies of the system.
3. Write the general form of the motion $(x_1(t), x_2(t))$.
(4.99)
p _ PI - P2 p _ PI + P2 + P3
1- y2 , 3 - J3 '
is canonical.
2. Write the Hamiltonian with these new variables. Deduce the eigenfrequen-
cies and the general form of the motion.
(4.100)
(4.101)
\[ Y_k = \frac{1}{\sqrt{N}} \sum_{n=1}^{N} e^{2ik\pi n/N}\, x_n, \qquad Q_k = \frac{1}{\sqrt{N}} \sum_{n=1}^{N} e^{-2ik\pi n/N}\, p_n, \tag{4.102} \]
a) Show that
b) Show that
N N
LYkYZ = LX~ and (4.104)
k=l n=l k=l n=l
c) Show that
(4.105)
(4.106)
\[ \langle f \rangle = \frac{1}{T} \int_0^T f(t)\, dt. \tag{4.107} \]
(4.108)
3. What does this equality become if the potential V is a central power law
function $V = g\, r^n$ with $r = |\mathbf{r}|$?
4. In the case above, what is the relation between the total energy E, the
mean kinetic energy $\langle E_k \rangle$, and the mean potential energy
$\langle V \rangle$ for
a) a harmonic oscillator n = 2, and
b) a Newtonian (or Coulomb) potential n = -1?
5. In general, for an arbitrary potential, the orbits of bound states are not
closed curves, but they nevertheless remain confined in space. At all times,
$|\mathbf{r}| \le r_0$ and $|\mathbf{p}| \le p_0$, where $r_0$ and $p_0$ are fixed.
Give a generalization of the definition (4.107) such that the result (4.108)
remains true.
4.6. Calculate the Poisson brackets of the three components of the angular
momentum L = r x p.
\[ \mathbf{A} = \frac{\mathbf{p} \times \mathbf{L}}{m} - e^2\, \frac{\mathbf{r}}{r} \]
between each other, with the components of the angular momentum, and with
the Hamiltonian. What can one conclude?
4.8. Verify with the Hamiltonian (4.39) that Hamilton's equations give the
expected equation of motion.
5
Lagrangian Field Theory
The Lagrangian formalism acquires its real power when one deals with systems
that possess a large, possibly infinite, number of degrees of freedom. That is
the case in mechanics of continuous media. We will now examine how this
formalism deals with field theory.
In itself, field theory is a vast domain that acquires its completeness when
one considers the quantization of fields and the theory of fundamental inter-
actions. In the present chapter, which is deliberately rather short, we want
to explain the principles of Lagrangian field theory and its application to the
electromagnetic field. The classical theory of gravitation is beyond the scope of
this book. It is thoroughly treated in the literature, and we refer the interested
reader to Landau and Lifshitz [1], for instance.
In Section 5.1, we will study the principle of the Lagrangian formulation
of field theory, starting with the case of a vibrating string. Actually, the pro-
cedure is rather simple. One starts by considering a discrete problem with
finite elements of the string. One then takes the continuum limit such that
a Lagrangian space density appears. It is in this limiting procedure that one
appreciates how well the Lagrangian formalism is adapted to this type of
problem.
The extension to three space dimensions, as well as several degrees of
freedom, is dealt with in Section 5.2. One can easily guess the extension of
the method to four-dimensional space-time and relativistic fields. In Section
5.3, we will consider a scalar field, and in Section 5.4 the electromagnetic
field and the Maxwell equations. In Section 5.5, we shall say a few words
about field equations that are of first order in time. The first example is the
Fourier diffusion equation, which corresponds to a nonreversible problem; i.e.,
a dissipative problem. This example is interesting because of the similarity
between the Fourier equation and the Schrödinger equation. We shall see that
a Lagrangian approach can be constructed for the latter but that essentially
it leads nowhere in nonrelativistic quantum mechanics.
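\[ dE_k = \frac{1}{2}(\rho\, dx)\left( \frac{\partial \psi}{\partial t} \right)^2, \tag{5.1} \]
\[ V = \frac{1}{2}\, T \int_0^l \left( \frac{\partial \psi}{\partial x} \right)^2 dx. \tag{5.2} \]
(5.3)
\[ d\mathcal{L} = L\!\left( \psi, \frac{\partial \psi}{\partial t}, \frac{\partial \psi}{\partial x} \right) dx = \frac{1}{2}\left[ \rho \left( \frac{\partial \psi}{\partial t} \right)^2 - T \left( \frac{\partial \psi}{\partial x} \right)^2 \right] dx. \tag{5.4} \]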
5.2 Field Equations 99
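\[ \frac{\partial L}{\partial \psi} = \frac{\partial}{\partial t}\left( \frac{\partial L}{\partial(\partial \psi/\partial t)} \right) + \frac{\partial}{\partial x}\left( \frac{\partial L}{\partial(\partial \psi/\partial x)} \right). \tag{5.6} \]
In the case under consideration, $\partial L/\partial \psi = 0$ so that if we define
the propagation velocity c by
\[ c^2 = \frac{T}{\rho}, \tag{5.7} \]
we obtain the propagation equation of vibrations along the string
\[ \frac{\partial^2 \psi}{\partial t^2} - c^2\, \frac{\partial^2 \psi}{\partial x^2} = 0. \tag{5.8} \]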
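As a quick numerical illustration, one can verify that a profile travelling at the velocity $c = \sqrt{T/\rho}$ satisfies the string equation $\rho\, \partial^2\psi/\partial t^2 - T\, \partial^2\psi/\partial x^2 = 0$, whereas the same profile moving at another speed does not. The sketch below uses second-order finite differences; the Gaussian profile, the numerical values of T and $\rho$, and the function names are all illustrative assumptions.

```python
import math

def profile(u):
    """Arbitrary smooth pulse shape (assumption: a Gaussian)."""
    return math.exp(-u * u)

def travelling_wave(x, t, speed):
    """Displacement psi(x, t) = f(x - speed * t) of a pulse moving to the right."""
    return profile(x - speed * t)

def wave_residual(x, t, speed, tension, density, h=1e-3):
    """Finite-difference estimate of density * psi_tt - tension * psi_xx."""
    def psi(xx, tt):
        return travelling_wave(xx, tt, speed)
    psi_tt = (psi(x, t + h) - 2.0 * psi(x, t) + psi(x, t - h)) / (h * h)
    psi_xx = (psi(x + h, t) - 2.0 * psi(x, t) + psi(x - h, t)) / (h * h)
    return density * psi_tt - tension * psi_xx

if __name__ == "__main__":
    tension, density = 4.0, 1.0
    c = math.sqrt(tension / density)  # propagation velocity, here c = 2
    print("residual at speed c:  ", wave_residual(0.3, 0.1, c, tension, density))
    print("residual at speed c/2:", wave_residual(0.1, 0.1, 0.5 * c, tension, density))
```

The first residual is zero up to discretization errors; the second is of order one, which shows that the propagation velocity is fixed by the ratio of the tension to the linear mass density.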
The previous case is slightly more complex than the equations we saw in (2.8)
and (2.9). Indeed, for a field, the dynamical variable $\psi$ depends on several
variables. In the example (5.8), the field $\psi$ depends on two variables, t and x.
More generally, consider n dynamical variables $\psi_k$, k = 1, ..., n, that
depend on m variables $x_s$, s = 1, ..., m (including time); i.e., $\psi_k(x_s)$,
s = 1, ..., m.
We define
(5.9)
and we denote by $[\psi_k]$ the set of partial derivatives of
$\psi_k(x_1, \ldots, x_m)$. The Lagrangian density is of the form
It is a bit tedious but not difficult to convince oneself that the determination of
the extremum of the action S under the set of all infinitesimal transformations
$\psi_k \to \psi_k + \delta\psi_k$, k = 1, ..., n, which vanish on the edge of the
integration volume once one has performed all integrations by parts, leads to
the generalized Lagrange-Euler equations
(5.10)
we obtain
\[ \frac{\partial}{\partial t}\left( \frac{\partial L}{\partial \dot{\psi}_k} \right) = \frac{\partial L}{\partial \psi_k} - \sum_{s=1}^{m-1} \frac{\partial}{\partial x_s}\left( \frac{\partial L}{\partial(\partial \psi_k/\partial x_s)} \right), \tag{5.11} \]
Consider again the vibrating string, adding for more generality a linear term
in $\psi$ (which can come from an external force F(x) that we apply at each
point). For simplicity, we define
(5.13)
(5.14)
where $G = F/\rho$.
Since we are interested in the time evolution of the system, we define the
density of conjugate momentum p by
5.3 Scalar Field 101
(5.16)
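This density depends on $\psi$ and p, but also on $\psi'$, and the form of the
canonical equations must be modified. Inserting (5.16) (i.e., $L = p\dot{\psi} - H$),
in the least action principle, and integrating by parts in the two variables x
and t, we obtain
\[ \delta S = \iint dt\, dx \left[ \delta p\, \dot{\psi} + p\, \delta\dot{\psi} - \frac{\partial H}{\partial p}\,\delta p - \frac{\partial H}{\partial \psi}\,\delta\psi - \frac{\partial H}{\partial \psi'}\,\delta\psi' \right] \]
\[ \phantom{\delta S} = \iint dt\, dx \left[ \left( \dot{\psi} - \frac{\partial H}{\partial p} \right)\delta p - \left( \dot{p} + \frac{\partial H}{\partial \psi} - \frac{\partial}{\partial x}\frac{\partial H}{\partial \psi'} \right)\delta\psi \right]. \tag{5.17} \]
One can check that they yield the propagation equation (5.14).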
(5.19)
Notice that, compared with the vibrating string, space and time derivatives are
interchanged. The kinetic term (local velocity) comes from a vector quantity,
whereas the potential (the pressure) is a scalar.
With the Lagrangian density (5.19), one obtains the propagation equation
(5.20)
The case of the electromagnetic field is more complex and deeper. In fact, one
must take into account the fact that it involves two vector fields, and above
all, we must take care of relativistic invariance, which is the fundamental
property of Maxwell's equations. This problem is treated thoroughly in the
book by Landau and Lifshitz [1], for instance. Here we want to point out the
major features.
Physically, the electromagnetic field cannot be separated from its sources,
the charges, on which it furthermore acts. For a system of charged particles
in an electromagnetic field, the action is written in full generality as
(5.21)
where $S_{\text{field}}$ is the action of free fields, $S_{\text{part}}$ is the
action of the free particles in the absence of fields, and $S_{\text{int}}$
corresponds to the interaction of these particles and the field, which we know
already from Section 3.3.2. We recall that in an electromagnetic field derived
from the potentials A and $\Phi$, the Lagrangian of a particle of charge q and
mass m is expressed in terms of the potentials A and $\Phi$,
\[ L_{\text{int}} = q\, \dot{\mathbf{r}} \cdot \mathbf{A}(\mathbf{r}, t) - q\, \Phi(\mathbf{r}, t). \tag{5.22} \]
This form transforms as wanted in a Lorentz transformation. If we introduce
the current four-vector
\[ \{j^\mu\} = (c\rho, \mathbf{j}), \tag{5.23} \]
where $\rho$ and j are respectively the charge density and current, and the
potential four-vector
\[ \{A^\mu\} = (\Phi/c, \mathbf{A}), \tag{5.24} \]
the Lagrangian density that corresponds to (5.22) is
(5.25)
which is manifestly invariant. (We keep the same symbol L for the Lagrangian
density; the integration runs along space and time.) The action is invariant
since $d^3r\, dt$ is a relativistic invariant.
The fields are expressed in terms of the potentials $\Phi$ and A by
\[ \mathbf{E} = -\nabla\Phi - \frac{\partial \mathbf{A}}{\partial t}, \qquad \mathbf{B} = \nabla \times \mathbf{A}. \tag{5.26} \]
Using the notation $\partial^\mu = \partial/\partial x_\mu$, one expresses the
electromagnetic tensor field as
\[ F^{\mu\nu} = \partial^\mu A^\nu - \partial^\nu A^\mu; \tag{5.27} \]
i.e., the antisymmetric tensor
5.4 Electromagnetic Field 103
(5.28)
which lead to
\[ \nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}, \qquad \nabla \cdot \mathbf{B} = 0. \tag{5.29} \]
The inhomogeneous Maxwell equations relate the fields to the charge
densities and currents. Suppose there is a given charge density and current
$\{j^\mu\} = (c\rho, \mathbf{j})$. Then the Lagrangian density for the
electromagnetic field in the presence of these sources is
(5.30)
(5.31 )
the action S is defined as the integral over all space and time,
(5.32)
We have
(5.33)
One can check that the equations of motion of the electromagnetic field are,
in a covariant form,
(5.34)
where we have restored the coefficient $\varepsilon_0$, which we previously took
equal to one for convenience. This boils down to
\[ \nabla \cdot \mathbf{E} = \frac{\rho}{\varepsilon_0}, \qquad c^2\, \nabla \times \mathbf{B} = \frac{\mathbf{j}}{\varepsilon_0} + \frac{\partial \mathbf{E}}{\partial t}. \tag{5.35} \]
We see from (5.31) that the physical electromagnetic field in the vacuum,
away from charges, minimizes the difference $(E^2 - c^2 B^2)$ given the constraints
imposed by the presence of the sources. This was implicit in the example of
the simple electrostatic field in (2.2.4).
In order to deal with equations of first order in time, such as the Fourier
diffusion equation or the Schrödinger equation, we use the technique described
in Section 3.3.1 for dissipative systems.
i.e., the action is
\[ S(t_0, t_1) = \int_{t_0}^{t_1} dt \int L\; d^3r. \tag{5.37} \]
The equation satisfied by $\psi$ is the usual diffusion equation. That written
for $\psi^*$ would represent a diffusion reversed in time, or a "concentration".
It is necessary to use similar techniques in order to write in a Lagrangian
form the flow of a viscous fluid (see [10], Chapter 3, Section 3).
1 One says that the Schrödinger equation is a Fourier equation with an imaginary
time.
This form is appealing since its integral over space is simply the expectation
value of the quantum energy
(5.42)
5.6 Problems
5.1. The Telegraph Equation
The equation for neutron transport in matter has the form
2 ap 3 a2 p
a at + v 2 at 2 - LJ.p = 0, (5.43)
called the telegraph equation (see, for instance, Appendix D of [14]) which
shows a propagation term of the neutron density, of individual velocities v
which we assume to be the same and constant here. In the diffusive regime,
in reactor cores, this term is negligible. There exist situations (for instance,
neutrino transport in supernovae) where all terms must be kept owing to the
discontinuities of the diffusive medium.
Proceeding as in (3.34), write the form of a Lagrangian from which this
equation is derived.
6
Motion in a Curved Space
1 Roland Eötvös, "Über die Anziehung der Erde auf verschiedene Substanzen,"
Math. nat. Ber. Ungarn, 8, 65 (1890)
2 See, for instance, A. Pais, Subtle Is the Lord, Chapter 9, Oxford University Press,
New York, 1982. The original letter of Einstein to R. W. Lawson in January 1920
t, has been found. The published article, A. Einstein, Nature, 106, 782 (1921), is
not as light in spirit.
lets any object "fall" from his pocket, this object stands still or has a uniform
linear motion with respect to him, whatever its nature, and its physical and
chemical composition; (the resistance of the atmosphere is neglected).
The "equivalence principle" and its consequences can be found in many
books, for instance one by Hans Stephani [15]. The ambition of this chapter is
to show how the notion of motion in a curved space can lead to a theory such
that the equality of the "two masses" emerges naturally.
The equivalence principle can be stated in the following way. For a short
time, the laws of physics in a small laboratory in free fall are the same as
in the same laboratory in an inertial reference frame in the absence of
gravitation. One usually makes a distinction between this principle, which only
concerns the motion, and the theory of general relativity itself (i.e., the
Einstein equations that relate the curvature tensor of space-time to the
energy-momentum tensor of matter). In this book, we shall not describe
Einstein's equations and their consequences.
We will start by studying the free motion of a particle in a curved space.
In Section 6.1, we define what one calls a curved space and introduce the
fundamental notion of the metric of the space. In Section 6.2, we will write
the motion of a free particle in such a space. This will lead us, in Section
6.3, to a fundamental result: The physical trajectories are the geodesics of the
space; i.e., the curves of minimal (or extremal) length. As we shall see, this is
how the motion of a particle of constant energy E in a Euclidean space-time,
can be transformed into the free motion of the same particle in a curved space,
which is equivalent to the Maupertuis principle.
This will allow us to understand the reasoning of Einstein when he con-
structed general relativity, and some consequences of this theory. We will
display three historical examples: the variation of the beat of a clock due to
the gravitational field, the corrections to Newton's celestial mechanics, and
the deviation of light rays by a gravitational field.
These examples are historical. They are also very important in present-
day astrophysics and cosmology. The deviation of light by a gravitational field
plays an important role via the gravitational lensing effect that it induces. One
application is the search for a baryonic component in the "missing mass" of
the universe. Another is that the mass distribution in the universe, be it the
visible mass or the missing mass, acts as a natural telescope that can enable
us to see faraway objects, and therefore much younger objects. Through this
natural cosmic telescope (or microscope), the universe appears as an endless
gallery of gravitational mirages.
4. The necessary tool for this type of measurement is to have straight lines
(i.e., geodesics) of the space. It appears that always, whether it was Thales
measuring the height of the Great Pyramid, Eratosthenes, or Gauss, it was
implicit in the minds of people that light rays are physical entities that
possess the "perfect" mathematical property of propagating along straight
lines.
In his celebrated memoir on the theory of surfaces, Gauss understood that
the geometry of a surface is an intrinsic property of the surface, independent
of whether this surface is embedded in a Euclidean space or not. Gauss's ideas
were the starting points for the developments performed by Riemann.
In order to see whether or not a space is Euclidean, one can check whether
the Pythagorean theorem, the triangle inequality, and the angle formula above
are satisfied or not. Analyzing this further shows that everything boils down
to measuring distances and comparing sets of them. Hence the importance of
what is called the metric tensor or simply the metric of the space, which we
shall introduce below.
A famous example, due to Einstein, illustrates this fact. Consider four
points in a space, which we denote 1, 2, 3, 4, and let us denote $d_{ij}$ the
distance between points i and j. In a flat, Euclidean space, the following
relation is always satisfied
~~+~4+~4+4~+~4+4~
+ +
+di2d~3d~1 di2d~4d~1 di3d~4d~1 d~3d~4d~2 +
-di2d~3d~4 - di3d~2d~4 - di2d~4d~3 - di4d~2d~3
-di3d~4d~2 - di4d~3d~2 - d~3d~1 di4 - d~l di3d~4
-d~4d~1 di3 - d~l di4d~3 - d~l di2d~4 - d~2d~1 di4 = O.
One can use an airline schedule (and some courage) to verify that this equality
is not satisfied by Paris, New York, Johannesburg, and Shanghai (or any
other set of four airports), provided one uses the actual distances covered by
airplanes going as "straight" as possible from one place to the other.
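A relation of this kind can be tested numerically. The sketch below rests on an assumption that should be stated plainly: we take the flatness condition for four points of a two-dimensional space to be the vanishing of the Cayley-Menger determinant built from the squared distances (proportional to the squared volume of the tetrahedron having these distances as edges), which has the same structure as the relation above, a polynomial of degree six in the distances. The code compares four points of a Euclidean plane with four points of a unit sphere, the north pole and three equally spaced points of the equator, for which the distances are measured along great circles. Function names are hypothetical.

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def cayley_menger(points, distance):
    """Cayley-Menger determinant of four points for a given distance function."""
    n = len(points)
    matrix = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [distance(points[i], points[j]) ** 2 for j in range(n)]
        matrix.append(row)
    return determinant(matrix)

def euclidean_distance(a, b):
    """Straight-line distance between two points of the plane."""
    return math.dist(a, b)

def great_circle_distance(a, b):
    """Geodesic distance between two unit vectors on the unit sphere."""
    cosine = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, cosine)))

planar_points = [(0.0, 0.0), (3.0, 0.0), (1.0, 2.0), (-1.0, 1.5)]
sphere_points = [(0.0, 0.0, 1.0)] + [
    (math.cos(2.0 * math.pi * k / 3.0), math.sin(2.0 * math.pi * k / 3.0), 0.0)
    for k in range(3)
]

if __name__ == "__main__":
    print("plane: ", cayley_menger(planar_points, euclidean_distance))
    print("sphere:", cayley_menger(sphere_points, great_circle_distance))
```

The determinant vanishes, up to rounding errors, for the planar configuration and is clearly different from zero on the sphere: measuring distances alone is enough to detect the curvature.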
(6.2)
6.1.3 Examples
Sphere $S^2$ in $R^3$
\[ g_{xx} = 1 + \frac{x^2}{R^2 - x^2 - y^2}, \qquad g_{yy} = 1 + \frac{y^2}{R^2 - x^2 - y^2}, \qquad g_{xy} = g_{yx} = \frac{xy}{R^2 - x^2 - y^2}. \]
i.e., (6.3)
2. Hyperbolic spaces:
Two-sheet hyperboloid:
i.e., (6.4)
One-sheet hyperboloid:
i.e.,
\[ dw^2 = \frac{\rho^2\, d\rho^2}{\rho^2 - R^2}. \tag{6.5} \]
3. Parabolic space:
i.e.,
\[ dw^2 = \frac{\rho^2\, d\rho^2}{a^2}. \tag{6.6} \]
(6.7)
General Case
One could, of course, continue playing the same type of game as in these
examples by imposing any constraint of the type p(x, y, z, w) = 0 in the space
$R^4$. Actually, one would be far from discovering all three-dimensional curved
spaces.
The definition of a curved space consists of choosing the metric
$\{g_{\alpha\beta}\}$; the simple examples above are only illustrations.
Historically, the most famous example was given by Felix Klein in 1890.
It was a concrete example of the geometries of Gauss, János Bolyai, and
Lobatchevsky. Klein's model consists of an analytical geometry where each
point is represented by two real numbers, $x_1$ and $x_2$, such that
$x_1^2 + x_2^2 < 1$ and where the distance d(x, y) between two points is
defined to be
(6.8)
6.2.1 Lagrangian
Since the particle is free, the Lagrangian boils down to its kinetic part,
$E_{\text{kin}} = mv^2/2$; i.e.,
\[ \mathcal{L} = \frac{1}{2}\, m \left( \frac{ds}{dt} \right)^2 = \frac{1}{2}\, m\, g_{\alpha\beta}\, \frac{dx^\alpha}{dt}\frac{dx^\beta}{dt}. \tag{6.9} \]
Note that if the space variables do not seem to appear explicitly in this
Lagrangian, they are present in the metric $g_{\alpha\beta}$.
The conjugate momenta are obtained with no difficulty. Assuming the
metric is symmetric, $g_{\alpha\beta} = g_{\beta\alpha}$, which does not restrain
the generality, one obtains
(6.10)
The Hamiltonian is
(6.11)
The value of the Hamiltonian is the same as the value of the Lagrangian
as it should be since we consider a free particle. (Of course, the Lagrangian
and Hamiltonian functions are not expressed with the same variables.) We
deduce a consequence which is both obvious and important. Because of energy
conservation, the square of the velocity is a constant of the motion along the
trajectory.
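The constancy of the speed is easy to observe numerically. The sketch below treats the simplest curved space, the sphere $S^2$ with metric $ds^2 = R^2(d\theta^2 + \sin^2\theta\, d\phi^2)$, for which the Lagrange-Euler equations of the free Lagrangian read $\ddot{\theta} = \sin\theta \cos\theta\, \dot{\phi}^2$ and $\ddot{\phi} = -2 \cot\theta\, \dot{\theta}\dot{\phi}$. These equations are integrated with a fourth-order Runge-Kutta scheme; the initial conditions, the step size, and the function names are illustrative assumptions. The code also checks that the motion stays in a plane through the center, i.e., on a great circle.

```python
import math

def derivative(state):
    """Time derivative of (theta, phi, theta_dot, phi_dot) for free motion on S2."""
    theta, _, theta_dot, phi_dot = state
    theta_ddot = math.sin(theta) * math.cos(theta) * phi_dot ** 2
    phi_ddot = -2.0 * math.cos(theta) / math.sin(theta) * theta_dot * phi_dot
    return (theta_dot, phi_dot, theta_ddot, phi_ddot)

def rk4_step(state, dt):
    """One fourth-order Runge-Kutta step."""
    def shifted(k, factor):
        return tuple(s + factor * ki for s, ki in zip(state, k))
    k1 = derivative(state)
    k2 = derivative(shifted(k1, 0.5 * dt))
    k3 = derivative(shifted(k2, 0.5 * dt))
    k4 = derivative(shifted(k3, dt))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def speed_squared(state, radius=1.0):
    """v^2 = (ds/dt)^2 = R^2 (theta_dot^2 + sin^2(theta) phi_dot^2)."""
    theta, _, theta_dot, phi_dot = state
    return radius ** 2 * (theta_dot ** 2 + math.sin(theta) ** 2 * phi_dot ** 2)

def position(state, radius=1.0):
    """Cartesian position of the point (theta, phi) on the sphere."""
    theta, phi = state[0], state[1]
    return (radius * math.sin(theta) * math.cos(phi),
            radius * math.sin(theta) * math.sin(phi),
            radius * math.cos(theta))

def velocity(state, radius=1.0):
    """Cartesian velocity, decomposed on the unit vectors e_theta and e_phi."""
    theta, phi, theta_dot, phi_dot = state
    e_theta = (math.cos(theta) * math.cos(phi), math.cos(theta) * math.sin(phi), -math.sin(theta))
    e_phi = (-math.sin(phi), math.cos(phi), 0.0)
    return tuple(radius * (theta_dot * a + math.sin(theta) * phi_dot * b)
                 for a, b in zip(e_theta, e_phi))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def simulate(state=(1.0, 0.0, 0.3, 0.5), dt=1e-3, n_steps=10000):
    """Return the largest relative drift of v^2 and the largest distance to the initial plane."""
    normal = cross(position(state), velocity(state))
    norm = math.sqrt(sum(c * c for c in normal))
    v2_initial = speed_squared(state)
    max_drift, max_off_plane = 0.0, 0.0
    for _ in range(n_steps):
        state = rk4_step(state, dt)
        max_drift = max(max_drift, abs(speed_squared(state) - v2_initial) / v2_initial)
        off_plane = abs(sum(n * r for n, r in zip(normal, position(state)))) / norm
        max_off_plane = max(max_off_plane, off_plane)
    return max_drift, max_off_plane

if __name__ == "__main__":
    print(simulate())
```

Both quantities remain at the level of the integration error: the motion is uniform, and it takes place on a great circle.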
(6.12)
(6.13)
We shall use this in the next section. The expanded form of this equation is
(6.14)
We notice, and it is not surprising, that the mass cancels off identically.
One obtains the equation for the $\ddot{x}^\mu$ by multiplying (6.14) by
$g^{\mu\nu}$ (6.2),
\[ \ddot{x}^\mu + \Gamma^\mu_{\alpha\beta}\, \dot{x}^\alpha \dot{x}^\beta = 0, \tag{6.15} \]
We shall make no further use of these symbols in this book, but it is a good
example to show that the formal complexity of general relativity is a matter
of writing; what is subtle is the physics.
1. Motion on S2
One can, as an exercise, recover that the motion on a usual sphere S2 is
a uniform motion on a great circle.
2. Motion on S3
Consider now the case of the three-dimensional "spherical" space of (6.3);
i.e., the free motion on the sphere $S^3$.
Obviously, the volume of this space is finite since
$\rho^2 = x^2 + y^2 + z^2 \le R^2$.
In spherical coordinates, the Lagrangian of the problem is
(6.17)
(6.18)
(6.19)
(6.20)
6.2 Free Motion in a Curved Space 115
We notice that the two constants of the motion E and A satisfy the
inequality
\[ A^2 \le \frac{2R^2 E}{m}, \tag{6.21} \]
which is a direct consequence of the fact that the energy is greater
than the rotational energy $mA^2/2\rho^2$. This is a consequence of (6.20);
i.e., $E \ge mA^2/2\rho^2 \ge mA^2/2R^2$.
The equations (6.19) and (6.20) are first-order differential equations that
determine the motion in terms of the constants of the motion E and A.
The solution is simple. We define parameters $\omega$ and $\gamma$ by
\[ \omega^2 = \frac{2E}{mR^2} \quad \text{and} \quad \gamma^2 = \frac{mA^2}{2ER^2}, \tag{6.22} \]
\[ \gamma^2 \le 1. \tag{6.23} \]
We set
\[ \rho = R\cos(\omega\psi); \quad \text{i.e.,} \quad \dot{\rho} = -\omega\dot{\psi}\, R\sin(\omega\psi). \tag{6.24} \]
If we insert this in equation (6.20), we obtain
\[ \omega^2 = \omega^2\dot{\psi}^2 + \frac{\omega^2\gamma^2}{\cos^2(\omega\psi)}; \tag{6.25} \]
i.e.,
\[ \omega^2\dot{\psi}^2\cos^2(\omega\psi) = \omega^2\left( \cos^2(\omega\psi) - \gamma^2 \right). \tag{6.26} \]
We now make the change of functions
The choice
\[ u(t) = \sin(\omega\psi(t)) \tag{6.28} \]
leads with no difficulty to
\[ \dot{\phi} = \frac{A}{R^2\left( \cos^2\omega(t - t_0) + \gamma^2 \sin^2\omega(t - t_0) \right)}; \tag{6.31} \]
i.e.,
\[ \tan(\phi(t) - \phi_0) = \gamma\tan\omega(t - t_0), \tag{6.32} \]
which is also periodic and of frequency $\omega$.
We conclude the following.
a) Consider the Euclidean plane of the motion (i.e., $x = \rho\cos\phi$,
$y = \rho\sin\phi$). For simplicity, we choose the initial parameters as
$t_0 = 0$, $\phi_0 = 0$, and we have
\[ x = R\cos\omega t, \qquad y = \gamma R\sin\omega t. \]
n2 = 2E
J& mR2.
H = _1 (p2R2 2+2
- P Po + 2) .
Pc/> (6.34)
2m p R2 p2 p2 sin2 ()
6.3.1 Definition
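i.e., the path $\{x^\alpha\}$ such that $\delta s_{AB} = 0$ for any infinitesimal
variation $\{\delta x^\alpha, \delta\dot{x}^\alpha\}$. Considering an arbitrary
parameterization $\{x^\alpha(\lambda)\}$ of the path, we must find the path that
minimizes the integral
\[ s_{AB} = \int_A^B \sqrt{ g_{\alpha\beta}\, \frac{dx^\alpha}{d\lambda}\frac{dx^\beta}{d\lambda} }\; d\lambda. \tag{6.36} \]
The assertion above is that these paths are the same as those along which
the action
\[ S = \int_A^B \mathcal{L}\, dt = \frac{1}{2}\, m \int_A^B g_{\alpha\beta}\, \frac{dx^\alpha}{dt}\frac{dx^\beta}{dt}\, dt \tag{6.37} \]
is stationary.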
5 In Minkowski space, light rays follow trajectories of vanishing "length." The no-
tion of parallel transport allows us to overcome this apparent difficulty; see [5],
[16], or [18].
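The variational problem posed in equation (6.36) is similar in every way to
those considered in Chapter 2. Consider a variation
where
\[ \dot{\epsilon}^\nu = \frac{d\epsilon^\nu}{d\lambda} \quad \text{and} \quad \epsilon^\nu(A) = \epsilon^\nu(B) = 0. \tag{6.39} \]
(6.40)
where we have set $\dot{x}^\alpha \equiv (d/d\lambda)x^\alpha$. We now integrate the
second term by parts.
Consider the quantity
We have
\[ \delta s_{AB} = \int_A^B \left[ \frac{1}{2F}\frac{\partial g_{\alpha\beta}}{\partial x^\nu}\, \dot{x}^\alpha \dot{x}^\beta - \frac{1}{2F}\frac{d}{d\lambda}\left( 2 g_{\nu\beta}\dot{x}^\beta \right) - \left( 2 g_{\nu\beta}\dot{x}^\beta \right)\frac{d}{d\lambda}\left( \frac{1}{2F} \right) \right] \epsilon^\nu\, d\lambda = 0. \tag{6.41} \]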
This variation must vanish for any $\{\epsilon^\nu\}$, and we obtain the equations
(6.42)
These equations are simplified if one makes an appropriate choice of the
parameter $\lambda$. Consider the choice $\lambda = s$; i.e., $\lambda$ is the
length along the geodesic and $d\lambda = ds$ (see footnote 6). Then, by
definition, inserting this in equation (6.36), we have along the geodesic
\[ F = 1, \quad \text{and} \quad \frac{dF}{d\lambda} = 0. \]
Consequently, the equation of the geodesic becomes
(6.43)
Not only does this equation have the same form as the equation of motion
(6.13), but it is equivalent to it. Indeed, we can choose $\lambda = t$. In the
case of a free motion, we have seen that $v = ds/dt$ is a constant along the
trajectory. Therefore, $ds = v\, dt$ and the factor $1/v^2$ cancels off identically
in (6.43).
We have proven our assertion. The trajectories of a free particle in a curved
space are the geodesics of the space. In other words, the trajectory followed
6 Actually, it suffices that $\lambda$ be an affine function of s.
6.3 Geodesic Lines 119
6.3.3 Examples
If we keep in mind how we have treated example 2 of the free motion on S3,
we can use the constants of the motion in order to determine the geodesics in
simple but non trivial cases that are not totally academic.
1. Isotropic spaces
Metrics of the form (6.7), or more generally
(6.44)
i
r da
po
g(a)
( (2Ejm-A2ja2) )
=t-to (6.46)
The only difficulty lies in the inversion of this formula in order to obtain
the dependence $\rho(t)$.
2. Hyperbolic geodesics
Consider the metric
(6.47)
(6.49)
(6.50)
(6.51)
(6.52)
(6.53)
The solution of the problem is obtained rather easily. One defines the
parameters $\omega$ and $\gamma$, as before,
2 2E
w = mR2 and ,2 = 2ER2.
mA 2
(6.54)
(6.55)
and
$$\tan(\varphi(t) - \varphi_0) = \gamma\,\coth\omega(t - t_0). \tag{6.56}$$
We notice that the distance to the origin increases exponentially when
$|t| \to \infty$. The geodesics of the metric (6.47) are hyperbolas
(6.57)
$$\mathcal{L} = \frac{1}{2}\sum_{i,j=1}^{N} m_{i,j}(q)\,\dot{q}_i\dot{q}_j - V(q). \tag{6.58}$$
Here, we denote by $m_{i,j}(q)$ the coefficients of the quadratic form that
constitutes the kinetic energy. In Cartesian coordinates, $m_{i,j}(q)$ is diagonal and
does not depend on the coordinates. This is no longer true in general.
The conjugate momenta are
(6.59)
$$E = \frac{1}{2}\sum_{i,j=1}^{N} m_{i,j}(q)\,\dot{q}_i\dot{q}_j + V(q). \tag{6.60}$$
(6.62)
So = J (6.63)
In this form, we see that the trajectory that minimizes the reduced action $S_0$
is a geodesic of a curved space whose metric, which depends on $E$, is given by
$$ds^2 = 2\bigl(E - V(q)\bigr)\sum_{i,j=1}^{N} m_{i,j}(q)\,dq_i\,dq_j. \tag{6.64}$$
$$t - t_0 = \int_{q_0}^{q} \sqrt{\frac{\sum_{i,j} m_{i,j}(q')\,dq'_i\,dq'_j}{2\bigl(E - V(q')\bigr)}} \tag{6.65}$$
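As a quick numerical check of the time formula (6.65), here is a minimal sketch for a single degree of freedom, where the formula reduces to $t - t_0 = \int \sqrt{m/(2(E - V(q)))}\,dq$. The example (a particle falling in a uniform field with an initial speed), the numerical values, and the function name `travel_time` are illustrative assumptions, not taken from the text.

```python
import math

def travel_time(q0, q1, energy, mass, potential, steps=100_000):
    """Midpoint-rule evaluation of t - t0 = integral of sqrt(m / (2 (E - V(q)))) dq."""
    dq = (q1 - q0) / steps
    total = 0.0
    for k in range(steps):
        q = q0 + (k + 0.5) * dq
        total += math.sqrt(mass / (2.0 * (energy - potential(q)))) * dq
    return total

# Assumed illustrative values: uniform field g, initial downward speed v0, drop height h.
m, g, v0, h = 1.0, 9.81, 2.0, 10.0

def V(q):
    # q is measured downward from the starting point, so V decreases with q.
    return -m * g * q

E = 0.5 * m * v0**2  # total energy at q = 0

t_numeric = travel_time(0.0, h, E, m, V)
t_exact = (math.sqrt(v0**2 + 2.0 * g * h) - v0) / g  # elementary kinematics
print(t_numeric, t_exact)
```

The two numbers should agree closely, which illustrates that the time dependence can be recovered from the geometry of the trajectory once the energy is fixed.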
The scheme we have studied up to now is appealing since the equality between
the inertial and the gravitational masses follows automatically. However, the
theory has an embarrassing by-product in that the norm of the velocity is a
constant in time.
In order to get rid of this defect, we must introduce the time variable in
the problem and extend it to space-time and not only space.
Our purpose here is not to enter the domain of relativistic gravitation and
general relativity as a whole (see, for instance, [1], [15], [16], or [17]). We only
want to introduce a curvature of space-time that, at least to lowest order in
$v^2/c^2$, $v$ being a characteristic velocity of the problem, allows us to recover
Newton's usual equations while maintaining the nice properties encountered
above, in particular the fact that the mass drops out from the equations of
motion.
What metric of space-time can we choose in order to achieve this program?
We have seen in Chapter 3 that the Lagrangian of a free relativistic particle
is
$$\mathcal{L} = -mc^2\sqrt{1 - \frac{v^2}{c^2}}. \tag{6.67}$$
The Lorentz-invariant action is
(6.69)
$$\mathcal{L} = -mc^2 + \frac{1}{2}mv^2 - m\Phi, \tag{6.71}$$
and an action
$$S = \int \mathcal{L}\,dt = -mc\int\left(c - \frac{v^2}{2c} + \frac{\Phi}{c}\right)dt. \tag{6.72}$$
$$S = -mc\int ds, \tag{6.73}$$
(6.74)
This is the simplest, or most naive, extension of the metric (6.66) which ac-
counts for the phenomena that interest us
(6.75)
$$\ddot{\mathbf{r}} = -\nabla\Phi, \tag{6.76}$$
In the theory of general relativity, the metric is related to the mass distribution
(actually to the energy-momentum tensor) by Einstein's equations.
An exact solution of these equations was given in 1916 by Karl Schwarzschild.
It is the metric generated by the static gravitational field of an isotropic mass
distribution of total mass M. This metric leads to
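$$ds^2 = \left(1 + \frac{2\Phi(\rho)}{c^2}\right)c^2dt^2 - \frac{d\rho^2}{1 + \dfrac{2\Phi(\rho)}{c^2}} - \rho^2\left(d\theta^2 + \sin^2\theta\,d\varphi^2\right), \tag{6.79}$$
$$d_{12} = \int_{\rho_1}^{\rho_2}\frac{d\rho}{\sqrt{1 - \dfrac{r_0}{\rho}}} > \rho_2 - \rho_1 \tag{6.80}$$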
Some arbitrariness remains as far as the couple of variables (p, t) are con-
cerned. Here, these variables are chosen so that there is no off-diagonal term
dp dt in the metric.
The proof of this formula is, of course, beyond the scope of this book.
One can refer to Landau and Lifshitz [1], Section 97, and to Misner, Thorne,
and Wheeler [18], Chapter 25. One can find the complete description of black
holes (i.e. physics inside the Schwarzschild radius) in [18].
The "naive" metric (6.74) is the approximation of (6.77) to lowest order
in $v^2/c^2$ and $\Phi/c^2$.
We note from the form (6.77) that its spatial part is not locally Euclidean.
There is no local rotation invariance, which is intuitive since the radial
variable plays a special role. When fields are weak (i.e., $r_0/\rho \ll 1$), or at large
distances, one can use locally Euclidean space variables (x, y, z), and, to a
good approximation, the Schwarzschild metric (6.77) is of the form
6.4 Gravitation and the Curvature of Space-Time 125
where $(r, \theta, \varphi)$ are the usual spherical coordinates. (The proof of this result
can be found in [1], Section 97.)
We notice that if the metric (6.75) gives us the classical Newton equation, it
"curves" time at each point in space. In that respect, it is in full agreement
with the general solution of Schwarzschild, which predicts a dilation of the
proper time in an algebraically increasing gravitational potential
(6.82)
This effect, as well as the "twin effect" of special relativity, has been
measured with great accuracy by R.F.C. Vessot and collaborators.⁹ A hydrogen
maser was sent to an altitude of 10,000 km by a Scout rocket, and the
variation in time of its frequency was measured as the gravitational potential increased
(algebraically). There are many corrections, in particular due to the Doppler
effect of the spacecraft and to the Earth's rotation. It was possible to test the
predictions of general relativity on the variation of the pace of a clock as a
function of the gravitational field with a relative accuracy of $7 \times 10^{-5}$. This
was done by comparison with atomic clocks, or masers, on Earth. Up to now,
it has been one of the best verifications of general relativity. The recording of
the beats between the onboard maser and a test maser on Earth is shown in
Figure 6.1. (These are actually beats between signals, which are first recorded
and then treated in order to take into account all physical corrections.)
To next order, the Schwarzschild metric curves space. This causes a variety
of observable phenomena in celestial mechanics. Among these is the famous
precession of the perihelion of planets and comets.
Here we choose to work with the form (6.81). In fact, the value of
Schwarzschild's radius is $r_s = 2GM/c^2$, $r_s \approx 3$ km for the sun and $r_s \approx 0.89$ cm
for the Earth. It is very small compared to the orders of magnitude
of celestial mechanics in the solar system (1 AU $= 150 \times 10^6$ km). The effects
are small corrections to the Newtonian terms.
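The orders of magnitude above can be checked directly. The following minimal sketch evaluates $r_s = 2GM/c^2$ for the sun and the Earth and compares the solar value with 1 AU; the rounded physical constants and the function name are assumptions of this sketch.

```python
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2 (rounded)
C = 2.998e8          # speed of light, m/s (rounded)
M_SUN = 1.989e30     # kg (rounded)
M_EARTH = 5.972e24   # kg (rounded)
AU = 1.496e11        # m (rounded)

def schwarzschild_radius(mass_kg):
    """Return r_s = 2 G M / c^2 in meters."""
    return 2.0 * G * mass_kg / C**2

rs_sun_km = schwarzschild_radius(M_SUN) / 1e3
rs_earth_cm = schwarzschild_radius(M_EARTH) * 1e2
ratio_sun_to_au = schwarzschild_radius(M_SUN) / AU

print(f"sun:   r_s = {rs_sun_km:.2f} km")
print(f"earth: r_s = {rs_earth_cm:.2f} cm")
print(f"r_s(sun) / 1 AU = {ratio_sun_to_au:.1e}")
```

The ratio of the solar Schwarzschild radius to 1 AU is of order $10^{-8}$, which is why the relativistic effects appear as small corrections to the Newtonian terms.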
9 R. F. C. Vessot, M. W. Levine, E. M. Mattison, E. L. Blomberg, T. E. Hoffman,
G. U. Nystrom, B. F. Farrel, R. Decher, P. B. Eby, C. R. Baugher, J. W. Watts, D.
L. Teuber, and F. D. Wills, "Test of Relativistic Gravitation with a Space-Borne
Hydrogen Maser", Phys. Rev. Lett. 45, 2081, (1980).
Fig. 6.1. Beats between a maser onboard the spacecraft launched by a Scout rocket
and a maser on Earth at various instants in GMT. (a) Signal of the dipole antenna;
the pointer shows the delicate moment when the spacecraft separated from the rocket
(it was important that the maser onboard had not been damaged by vibrations
during takeoff). During this first phase, the special relativity effect due to the velocity
is dominant. (b) Time interval of "zero beat" during ascent when the velocity effect
and the gravitational effect, of opposite signs, cancel each other. (c) Beat at the
apogee, entirely due to the gravitational effect of general relativity. Its frequency is
0.9 Hz. (d) Zero beat at descent. (e) End of the experiment. The spacecraft enters
the atmosphere and the maser onboard ceases to work. (Courtesy of R.F.C. Vessot.)
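with
$$\alpha = \frac{GM}{rc^2} \ll 1. \tag{6.84}$$
$$\mathcal{L} = -mc\,\frac{ds}{dt} = -mc^2 + \frac{GMm}{r} + \frac{m}{2}\left(1 + \frac{3GM}{rc^2}\right)\left[\dot{r}^2 + r^2\left(\dot{\theta}^2 + \sin^2\theta\,\dot{\varphi}^2\right)\right]. \tag{6.85}$$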
The first and most famous application is the calculation of the precession
of Mercury's perihelion.
Classical Calculation
Let $\mathcal{E}$ be the energy of the planet and $A$ the norm of its angular momentum.
For convenience, we define
$$E = \frac{\mathcal{E}}{m} \quad\text{and}\quad L = \frac{A}{m}. \tag{6.86}$$
Conservation of angular momentum yields
(6.88)
$$\frac{2E}{L^2} = u'^2 + u^2 - \frac{2GMu}{L^2}. \tag{6.89}$$
The trajectory is obtained by a simple quadrature (one can take the
derivative of (6.89) with respect to $\varphi$, which leads to a linear equation whose general
solution is inserted into (6.89) in order to fix the constants):
$$u = \frac{1 + e\cos\varphi}{p} \quad\text{with}\quad p = \frac{L^2}{GM} \quad\text{and}\quad e = \sqrt{1 + \frac{2EL^2}{G^2M^2}}. \tag{6.90}$$
Relativistic Correction
With the curvature of space-time, the motion remains planar. One chooses as
above $\theta = \pi/2$, and the Lagrangian is given by (6.85); i.e.,
$$\mathcal{L} = \frac{1}{2}m\left(1 + \frac{3GM}{rc^2}\right)\left(\dot{r}^2 + r^2\dot{\varphi}^2\right) + \frac{GMm}{r}. \tag{6.92}$$
3GM 3G2M2
e= and A -- - - -
pc2 - L 2 c2 (6 . 94)
$$E = \frac{1}{2}\left(1 + \frac{3GM}{c^2 r}\right)\left(\dot{r}^2 + r^2\dot{\varphi}^2\right) - \frac{GM}{r}. \tag{6.95}$$
We still define $r' \equiv dr/d\varphi$ and $\dot{r} = r'\dot{\varphi}$ so that the energy is expressed, in
terms of the variables and parameters defined in (6.93) and (6.94), as
$$\frac{2E}{L^2} = \frac{u'^2 + u^2}{1 + 3GMu/c^2} - \frac{2u}{p}. \tag{6.96}$$
(6.98)
Of course, we notice that in the absence of the relativistic correction ($\lambda = 0$),
the solution is
$$v_0 = 1 + e\cos\varphi. \tag{6.99}$$
In order to calculate the relativistic correction, we start by taking the
derivative of (6.98) with respect to $\varphi$. We obtain
First-Order Perturbation
(6.101)
II 3 + e2
vI + VI = - 2 - + 2e cos , (6.102)
whose solution is
VI
3+
=-
2 .e
2 - +esm+ (
00+"2 sm, e). (6.103)
1 GM [
-:;. = 1:2 1 + e cos 1 -
( 3 G 2 M2 )
c2 L2
3G 2 M2
+ c2 L2
(3 +e2 e. ) ]
- 2 - + "2 sm .
(6.104)
where $a$ is half of the major axis of the ellipse and $e$ its eccentricity.
The parameters of the planet Mercury are $a = 57.9 \times 10^6$ km, $\nu = 1/T =$
415 revolutions per century, and its eccentricity is $e = 0.2056$ (the mass of
the sun is $M_\odot = 2 \times 10^{30}$ kg). The calculated value is
compared with the observed $43.11 \pm 0.45''$ per century. Einstein said that this
result was the strongest emotional experience of his scientific life.¹⁰
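The order of magnitude of the advance can be checked numerically. The sketch below assumes the standard first-order expression for the perihelion advance per revolution, $\delta\varphi = 6\pi GM/(c^2 a(1 - e^2))$, together with rounded constants and the commonly tabulated semi-major axis of Mercury ($57.9 \times 10^6$ km); these inputs and the function name are assumptions of this sketch rather than values taken from the derivation above.

```python
import math

G = 6.674e-11      # m^3 kg^-1 s^-2 (rounded)
C = 2.998e8        # m/s (rounded)
M_SUN = 1.989e30   # kg (rounded)
ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi

def perihelion_advance_per_orbit(a_m, e, mass_kg=M_SUN):
    """First-order advance per revolution, in radians (assumed standard formula)."""
    return 6.0 * math.pi * G * mass_kg / (C**2 * a_m * (1.0 - e**2))

a_mercury = 57.9e9          # semi-major axis in meters (assumed tabulated value)
e_mercury = 0.2056
revs_per_century = 415

advance = perihelion_advance_per_orbit(a_mercury, e_mercury)
advance_arcsec_century = advance * revs_per_century * ARCSEC_PER_RAD
print(f"{advance_arcsec_century:.1f} arcsec per century")
```

With these inputs the result comes out close to 43 arcseconds per century, the same order as the observed value.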
Another effect of the metric (6.77) and the corresponding geodesics is the
deviation of light rays by a gravitational field. This effect, which was one of
the first verifications of general relativity, in 1919, has regained considerable
interest in recent years because of its astrophysical and cosmological
consequences through the gravitational lensing effect that it induces.¹¹ We use the
weak-field approximation
(6.106)
(6.107)
where $(r, t)$ are the space-time coordinates of the photon as seen by an
observer. From this equation, we can calculate the velocity $v$ of a photon in a
gravitational potential,
$$v = c\sqrt{\frac{1 + 2\Phi(r)/c^2}{1 - 2\Phi(r)/c^2}} \simeq c\left[1 + \frac{2\Phi(r)}{c^2}\right]. \tag{6.108}$$
With this expression, we can calculate the photon trajectory by the Fermat
principle exactly as for curved rays in equation (2.5); i.e., by minimizing the
integral
$$T = \int_A^B \frac{d\ell}{v}, \tag{6.109}$$
where $A$ and $B$ are the endpoints of the photon trajectory and $d\ell$ is the length
element along the trajectory.
We assume that the potential $\Phi(r)$ is spherically symmetric and centered
at the origin. We consider the motion in the plane (AOB) and we use polar
coordinates $(r, \theta)$ as shown in Figure 6.2. We consider a situation where A
and B are symmetric to each other, so that $\theta = 0$ corresponds to the point
of shortest distance to the origin. It is convenient to determine the function
$r(\theta)$ that minimizes the time $T$. Under these conditions, equation (6.109) can
be written as
$$T = \frac{1}{c}\int_A^B \frac{\sqrt{1 + r^2\dot{\theta}^2}\,dr}{\left[1 + \dfrac{2\Phi(r)}{c^2}\right]}, \tag{6.110}$$
where $\dot{\theta} = d\theta/dr$.
We consider the potential created by a total mass M, and we assume the
photon path is outside the mass distribution so that we can set
2<1>(r) 2GM A
, (6.111)
c2 r
(6.112)
(6.113)
(6.115)
$$\frac{d\theta}{dx} = \frac{1}{x\sqrt{x^2 - 1}} + \frac{\mu}{\left(\sqrt{x^2 - 1}\right)^3}, \tag{6.116}$$
whose solution is
$$\theta = \arccos\frac{R}{r} - \frac{\lambda r}{R\sqrt{r^2 - R^2}} \tag{6.117}$$
(we have come back to the variable $r$).
The value of the constant $R$ is obtained from the closest distance $r_0$ of the
photon to the origin, which corresponds to $\theta = 0$. We obtain
$$R = r_0(1 - \varepsilon) \quad\text{with}\quad \varepsilon = \frac{GM}{r_0c^2}.$$
One can check that, to the same order, $R$ is nothing but the impact parameter
of the photon (i.e., the distance between its trajectory, which is linear at long
distances, and the parallel line going through the center $r = 0$).
What is more interesting is the angular deflection compared with a straight
line. In the absence of the gravitational field, the photon follows a straight
line, so that the difference between the direction of arrival and the direction
of departure is $\Delta\theta^{0}_{\infty} = \pi$. This direction of departure is also the (Euclidean)
direction of observation of the source that emits the photon.
6.5 Gravitational Optics and Mirages 133
In solution (6.117), in the presence of the gravitational potential, this same
difference is twice the difference $\theta(r = \infty) - \theta(r = r_0)$. By definition,
$\theta(r = r_0) = 0$. For $r \to \infty$, equation (6.117) gives $\theta(r = \infty) = \pi/2 - \lambda/R$, according
to whether it is the initial or final direction of the photon. The difference
between the direction of reception of the photon and the geometrical direction
of its source is $\Delta\theta^{GM}_{\infty} = \pi - 2\lambda/R$. In other words, one observes a deflection of
the light rays compared with a straight line, due to the gravitational potential,
of
$$\Delta\theta = \Delta\theta^{GM}_{\infty} - \Delta\theta^{0}_{\infty} = \frac{4GM}{Rc^2}, \tag{6.118}$$
where $R$ is the impact parameter or, to good approximation, the closest
distance between the photon and the center of the potential.¹²
For a light ray coming from a star and grazing the edge of the sun, the
calculated deflection is $1.75''$. In the case of Jupiter, it is $0.02''$.
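These two figures follow from the deflection formula $4GM/(Rc^2)$ evaluated at the limb of each body. A minimal sketch, assuming rounded values for the constants, masses, and radii (all of which are assumptions of this sketch):

```python
import math

G = 6.674e-11      # m^3 kg^-1 s^-2 (rounded)
C = 2.998e8        # m/s (rounded)
ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi

def deflection_arcsec(mass_kg, impact_parameter_m):
    """Deflection angle 4 G M / (R c^2), converted to arcseconds."""
    return 4.0 * G * mass_kg / (impact_parameter_m * C**2) * ARCSEC_PER_RAD

# Assumed rounded masses (kg) and radii (m) for a grazing ray.
sun_deflection = deflection_arcsec(1.989e30, 6.957e8)
jupiter_deflection = deflection_arcsec(1.898e27, 7.149e7)

print(f"sun:     {sun_deflection:.2f} arcsec")
print(f"jupiter: {jupiter_deflection:.3f} arcsec")
```

The solar value is about 1.75 arcseconds and the value for Jupiter is roughly a hundred times smaller, consistent with the figures quoted above.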
The first measurement of this effect was performed by teams led by Sir
Arthur Stanley Eddington.¹³ It was done at Sobral in Brazil and on the
island of Príncipe in the Gulf of Guinea on May 29, 1919. The experiment
consisted in observing the apparent motion of stars (seven at Sobral and five at
Príncipe) during a total eclipse of the sun. The results, $1.98 \pm 0.16''$ at Sobral
and $1.61 \pm 0.31''$ at Príncipe, were in agreement with Einstein's prediction.
It is most probably this experiment that generated the public's interest in
relativity and Einstein himself.
The most precise measurement at present comes from interferometric
radioastronomical observation of radio waves coming from the source 3C 279.¹⁴
It gives the result $1.77 \pm 0.20''$.
As we shall see, the most important cosmological use of this effect is through
gravitational lensing of light on remote objects in the universe. This effect is
due to the fact that mass (not only mass of galaxies but also of "dark matter")
in the universe acts as an optical instrument that can enable one to observe
faraway objects and therefore very "young" objects.
Two potentials are of particular interest. The first is that of a point-like
mass M:
12 It is amusing that this is exactly twice as much as the deflection that a Newtonian
argument would give using the Rutherford scattering formula.
13 F.W. Dyson, A.S. Eddington, and C. Davidson, Philos. Trans. R. Soc. London,
Ser. A 220, 291 (1920); Mem. R. Astron. Soc., 62, 291 (1920).
14 G.A. Seielstad, R.A. Sramek, and K.W. Weiler, Phys. Rev. Lett., 24, 1373 (1970).
$$\frac{\partial\Phi}{\partial r} = \frac{GM}{r^2} \;\Rightarrow\; \alpha = \frac{4GM}{Rc^2}. \tag{6.119}$$
As shown in Figure 6.3, the gravitational deflection can yield two images of
a source. The two images have impact parameters $b_1$ and $b_2$. The potential
created by a point-like mass always gives two images because the angle of
deflection diverges for small values of the impact parameter. We will see later
that one of the images is in general much more luminous than the other.
In the case of an extended mass distribution, such as a cluster of galaxies
(6.120), one can only observe two images separated by an angle $\alpha$ if the
undeflected impact parameter satisfies $b_0 < b_{\max} = L\alpha/2$, where $L$ is the
distance between the source and the lens (which we take here to be equal to
the distance between the observer and the lens for simplicity). The reason is
that if $b_0$ were larger than $L\alpha/2$, the two images would be on the same side,
which is impossible.
The large clusters correspond to a = v~ j c2 rv 10- 5 , and the two images
can be separated by terrestrial telescopes of resolution CJ() rv 3 X 10- 6 .
The "cross-section" necessary for a double image to occur is CJ rv 1fb;;'ax =
1f L2V~ j c2 . This cross-section increases with L because the necessary angle of
deflection decreases with L.
The probability for a given object to have two images because of the lensing
effect due to a cluster of galaxies is simply equal to the probability P that this
object hides behind a cluster. This probability is proportional to the cross-
section, to the number density n of clusters, and to the total length of the
path rv L:
(6.121)
The fact that this probability increases rapidly with $L$ makes the number of
double-image quasars sensitive to the value of the undeflected impact
parameter $b_0$.
A second practical application of deflection by clusters is that the time of
flight is not the same for both images. Quasars have an intrinsic variability
and by comparing the light curves (light flux as a function of time) one can
determine the difference $\Delta t$ of the two times of flight.
The time it takes light to go from one point to another can be deduced
with no difficulty from the calculations performed in Section 6.4.5. We have
The first term is the obvious term in the absence of a gravitational effect.
If we consider a mirage, the time delay is the difference between the
integrals calculated along each path. The first-order term vanishes obviously. This
leaves a "gravitational" term, which is proportional to the potential difference,
and a "geometric term."
For an angle of deflection independent of the point of impact, which is
approximately the case for clusters of galaxies (6.120), the geometric term
vanishes, leaving
(6.124)
where $y_1(z)$ and $y_2(z)$ are the photon trajectories in the two images. Consider
the nearly symmetric case $|y_1| \sim |y_2|$. Going back to Figure 6.3, we see that
Llt rv 1
4b o
00
-00
dz
a<P
a Y
. (6.126)
The integral is simply the deflection angle 28 given by (6.118), and the time
delay is
bo
Llt = 4L L 28. (6.127)
The factor $b_0/L$ is the angular separation between the center of the cluster
and the average position of the two images.
In order to estimate the length of the delay, we can take $b_0/L \sim \theta \sim 10^{-5}$
and $L \sim d_{\rm hub}$ (where $d_{\rm hub}$ is the Hubble distance, $d_{\rm hub} = c/H_0 \simeq 4300$ Mpc,
$H_0$ being the Hubble constant), which gives $\Delta t \sim 1$ year.
The first historical observation of this effect was the observation in 1979
of the "double quasar" caused by the gravitational lens Q0957+561. 15 The
original image is shown in Figure 6.4.
Fig. 6.4. Top left: first picture of the double quasar Q0957+561. Top right: com-
parison of the spectra of the two objects with a time delay of 417 days. Bottom
picture: the galaxy that acts as a gravitational lens after subtracting the pictures of
the quasars. (Picture from P. Magain , Liege University.)
15 D. Walsh, R.F. Carswell, and R.J. Weymann, Nature, 279, 381 (1979).
The two quasars have exactly the same spectrum. However, the time vari-
ation of the signals emitted is the same except for a delay of 417 days. Once
the two images are subtracted from each other, taking this delay into account,
the galaxy that acts as a gravitational lens appears clearly.
One can observe images with a multiplicity greater than two. The most
spectacular example is perhaps the Einstein cross shown in Figure 6.5. Four
images of the quasar Q2237+0305 appear, together with the spiral galaxy that
causes the mirage.
Fig. 6.5. The Einstein cross. The four different images of the same quasar at a
redshift of 1. 7 are due to the central galaxy, which is at a redshift of 0.04 and
therefore much closer. One can wonder about the probability of finding such an
alignment, but the vastness of the universe and the perseverance of astronomers are
such that events of small probability are observed in appreciable amounts. (Credit
NASA and ESA.)
Fig. 6.6. Einstein ring caused by the lensing by a close galaxy of the light emitted
by the galaxy B1938+666 located behind it. The actual size of the visible object is
several tens of thousands of light-years. The picture comes from the Hubble Space
Telescope. (Credit L. J. King (U. Manchester), NICMOS, HST, NASA.)
The last effect of gravitational lensing comes from the distortion of the
image of an extended object. The distortion along the radial and tangent
directions is illustrated in Figure 6.7. This distortion causes the arcs that can
be seen in Figure 6.8.
This effect can be used to determine the mass of the cluster. The masses
determined by this effect can be compared with the visible masses and with the
masses one estimates with the virial theorem and the dispersion in velocities.
It is a method to evaluate the amount of dark matter in the cluster.
In this respect, the universe appears as an endless gallery of mirages.
Fig. 6.8. In this picture, obtained by the Hubble Space Telescope, practically all
luminous objects are galaxies of the cluster Abell 2218. This cluster is massive
enough that its gravitational field focuses the light of galaxies located behind it.
This results in images that are stretched along arcs similar to what one can see at
night through a white wine glass. The Abell 2218 cluster is 3 billion light-years from
us in the constellation of Draco. The spectrum of the arcs is strongly blueshifted
compared with that of the stars in the cluster (this cannot be seen on the black
and white picture) because the light that is focused comes from very young and
hot stars at the beginning of their evolution. (Credit: W. Couch (University of New
South Wales), R. Ellis (Cambridge University), NASA.)
The theory of nucleosynthesis indicates that the baryon density in the universe
is 0.04 times the critical density. This leads to the idea that baryons cannot
account for all the dark matter. It is nevertheless possible that baryons are
a component of the galactic dark matter if they are in a form that does not
emit light in appreciable amounts. The simplest way this can happen is if the
baryons are hidden in objects that either do not burn (for instance, brown
dwarfs) or have ceased to burn (for instance, white dwarfs, neutron stars, or
black holes). Brown dwarfs have a mass $< 0.07\,M_\odot$, which is not sufficient to
create high enough temperatures for the burning of hydrogen. Initially, they
were the preferred candidates because they completely avoid the problems
associated with background light emission or the pollution of the interstellar
medium by heavy elements caused by mass loss or supernova explosions.
The dark objects located in galactic halos are called "machos", for "mas-
sive compact halo objects."
Paczynski¹⁶ has suggested that machos could be detected through the
gravitational lensing effect they induce on individual stars of the Large
Magellanic Cloud (LMC) (Figure 6.9). This small galaxy is 50 kpc away from the
Earth.
Fig. 6.9. Sketch of a gravitational lensing effect in the Large Magellanic Cloud
(LMC) by an invisible object in the galactic halo. The two images cannot be resolved,
but the combined luminosity of the two images gives rise to a time-dependent
amplification of the light of the star when the invisible object crosses the line of sight.
The corresponding light curve is shown in Figure 6.10.
The theory of gravitational lensing was developed above. For a point-like lens, it
is simple to show that the two amplifications¹⁷ depend on the reduced impact
parameter $u = b_0/R_E$,
$$A_\pm = \frac{u^2 + 2 \pm u\sqrt{u^2 + 4}}{2u\sqrt{u^2 + 4}}, \tag{6.128}$$
where the "Einstein radius" is given by
$$R_E^2 = \frac{4GMLx(1 - x)}{c^2}, \tag{6.129}$$
where $Lx$ is the distance between the observer and the lens, $L$ being the
distance between the observer and the source. We see that for $b_0 \gg R_E$,
$A_+ = 1$ and $A_- = 0$, as expected. For $b_0 \to 0$, the amplifications formally
become infinite, and this actually results from the fact that a point-like source
is deformed into a ring. The divergence ceases if one takes into account the
finite extension of the source, which gives an effective extension to $b_0$.
17 Professionals prefer the term "magnification."
In the case of "lensing" by stellar objects of the galactic halo, the angle
between the two images is very small (< 1 milliarcsec). This type of effect is
therefore called "microlensing". Terrestrial telescopes cannot resolve the two
images because atmospheric turbulence blurs the images and stellar objects
have an angular dimension of the order of 1 arcsec. Therefore, the only
observable effect is a temporary amplification of the total intensity when the macho
comes close to the line of sight, and then recedes from it. The amplification is
$$A = \frac{u^2 + 2}{u\sqrt{u^2 + 4}}, \tag{6.130}$$
where $u$ is the closest distance to the (undeflected) line of sight that the
deflector reaches in units of "Einstein's radius," $R_E = \sqrt{4GMLx(1 - x)/c^2}$,
where $L$ is the distance between the source and the observer, $Lx$ is the distance
between the observer and the deflector, and $M$ is the mass of the macho.
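The behavior of the amplification can be explored numerically. The sketch below assumes the standard point-lens expressions $A_\pm = (u^2 + 2 \pm u\sqrt{u^2+4})/(2u\sqrt{u^2+4})$ for the two images and their sum $A = (u^2+2)/(u\sqrt{u^2+4})$; the function names are illustrative.

```python
import math

def image_amplifications(u):
    """Amplifications (A_plus, A_minus) of the two images of a point lens."""
    root = u * math.sqrt(u * u + 4.0)
    a_plus = (u * u + 2.0 + root) / (2.0 * root)
    a_minus = (u * u + 2.0 - root) / (2.0 * root)
    return a_plus, a_minus

def total_amplification(u):
    """Total amplification A = A_plus + A_minus of the unresolved images."""
    return (u * u + 2.0) / (u * math.sqrt(u * u + 4.0))

a_plus_1, a_minus_1 = image_amplifications(1.0)
a_total_1 = total_amplification(1.0)       # equals 3 / sqrt(5)
a_plus_far, a_minus_far = image_amplifications(100.0)

for u in (0.5, 0.7, 1.0, 1.5):
    print(f"u = {u:3.1f}  A = {total_amplification(u):.3f}")
```

At $u = 1$ the total amplification is $3/\sqrt{5} \approx 1.34$, and for large $u$ the two images tend to amplifications 1 and 0, so the lens has no visible effect on a star far from the line of sight.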
The amplification is larger than 1.34 when the distance to the line of sight
is less than $R_E$. This amplification corresponds to an acceptable observational
threshold since photometry can be performed quite easily to better than 10%
accuracy. At a given moment, the probability $P$ for a star to be amplified by
a factor 1.34 is equal to the probability that its undeflected light path passes
within one Einstein radius of the macho,
$$P \sim n_{\rm macho}\,L\,\pi R_E^2, \tag{6.131}$$
where $n_{\rm macho}$ is the average number density of machos lying between the LMC
and us, and $L$ is the distance of the LMC. The macho density is $n_{\rm macho} \sim
M_{\rm halo}/(ML^3)$, where $M_{\rm halo}$ is the total mass of the halo up to the position of
the LMC. Using the expression of the Einstein radius, one finds that $P$ does
not depend on the mass $M$ but is determined only by the velocity of the LMC:
$$P \sim \frac{GM_{\rm halo}}{Lc^2} \sim \frac{v_{\rm LMC}^2}{c^2}. \tag{6.132}$$
The LMC is believed to orbit around the Milky Way with a velocity of $v_{\rm LMC} \sim
200$ km/s. (This corresponds to a flat rotation curve up to the LMC.) In that
case, $P$ is of the order of $10^{-6}$. More refined calculations give $P = 0.5 \times 10^{-6}$
[19].
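The cancellation of the macho mass in this estimate can be made explicit. The sketch below combines the crude density estimate $n_{\rm macho} \sim M_{\rm halo}/(ML^3)$ with $R_E^2 = 4GMLx(1-x)/c^2$, and takes $GM_{\rm halo}/L = v^2$ for a flat rotation curve; the rounded constants, the choice $x = 0.5$, and the function name are assumptions of this sketch, and only the order of magnitude is meaningful.

```python
import math

G = 6.674e-11       # m^3 kg^-1 s^-2 (rounded)
C = 2.998e8         # m/s (rounded)
KPC = 3.0857e19     # m
M_SUN = 1.989e30    # kg (rounded)

def lensing_probability(macho_mass, halo_mass, distance, x=0.5):
    """Order-of-magnitude estimate P ~ n_macho * L * pi * R_E^2."""
    number_density = halo_mass / (macho_mass * distance**3)
    einstein_radius_sq = 4.0 * G * macho_mass * distance * x * (1.0 - x) / C**2
    return number_density * distance * math.pi * einstein_radius_sq

L_LMC = 50.0 * KPC
v_lmc = 2.0e5                          # m/s
halo_mass = v_lmc**2 * L_LMC / G       # flat rotation curve: G M_halo / L = v^2

p_light = lensing_probability(0.01 * M_SUN, halo_mass, L_LMC)
p_heavy = lensing_probability(1.0 * M_SUN, halo_mass, L_LMC)
print(p_light, p_heavy, (v_lmc / C) ** 2)
```

Both calls return the same number, of order $10^{-6}$, whatever macho mass is assumed: heavier machos are fewer but each covers a proportionally larger area.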
Since the observer, the star, and the deflector are in relative motion with
respect to one another, a noticeable amplification lasts only as long as the
undeflected ray remains within one Einstein radius. The resulting light curve,
which is achromatic and symmetric, is represented in Figure 6.10 for a variety
of values of the impact parameter. The timescale of the amplification is the
time $\Delta t$ that it takes the object to cross one Einstein radius between the
observer and the source. For the lensing of stars of the LMC and deflectors
of our halo, the relative velocities are of the order of 200 km s$^{-1}$ and the
position of the deflector is roughly halfway between the observer and the
source ($x \sim 0.5$). The average $\Delta t$ is then
$$\Delta t \sim \frac{R_E}{200\ \text{km/s}} \sim 75\ \text{days}\,\sqrt{\frac{M}{M_\odot}}. \tag{6.133}$$
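The timescale can be estimated from the definition of the Einstein radius. The following sketch uses rounded constants, a source distance of 50 kpc, $x = 0.5$, and a relative velocity of 200 km/s; these inputs and the function names are assumptions of this sketch, so only the order of magnitude and the $\sqrt{M}$ scaling should be read from it.

```python
import math

G = 6.674e-11       # m^3 kg^-1 s^-2 (rounded)
C = 2.998e8         # m/s (rounded)
KPC = 3.0857e19     # m
M_SUN = 1.989e30    # kg (rounded)
DAY = 86400.0       # s

def einstein_radius(mass_kg, source_distance_m, x=0.5):
    """R_E = sqrt(4 G M L x (1 - x) / c^2), in meters."""
    return math.sqrt(4.0 * G * mass_kg * source_distance_m * x * (1.0 - x) / C**2)

def crossing_time_days(mass_kg, source_distance_m, velocity_m_s, x=0.5):
    """Time to cross one Einstein radius at the given relative velocity."""
    return einstein_radius(mass_kg, source_distance_m, x) / velocity_m_s / DAY

L_LMC = 50.0 * KPC
v_rel = 2.0e5  # m/s

dt_solar = crossing_time_days(M_SUN, L_LMC, v_rel)
dt_quarter = crossing_time_days(0.25 * M_SUN, L_LMC, v_rel)
print(f"M = 1 M_sun:    {dt_solar:.0f} days")
print(f"M = 0.25 M_sun: {dt_quarter:.0f} days")
```

For one solar mass the crossing time is a few months, the same order as the value quoted in (6.133), and a deflector four times lighter gives an event half as long.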
The duration distribution can therefore be used to estimate the mass of ma-
chos, assuming they are in the galactic halo (and not in the LMC).
Fig. 6.10. Microlensing curves for a point-like source. The curves correspond to
four values of the closest approach distance (0.5, 0.7, 1.0, and 1.5 times the
Einstein radius). The duration timescale $\Delta t$, which depends on the mass, is normalized
according to (6.133). (Courtesy of James Rich.)
Two experimental groups, the MACHO collaboration and the EROS
collaboration, have published results of searches for events in the directions of
the LMC and the SMC (the nearby Small Magellanic Cloud). The absence
of events lasting less than 15 days allowed both groups to exclude objects
of masses in the interval $10^{-7}\,M_\odot < M < 10^{-1}\,M_\odot$ as the main component
of the halo.¹⁸ These limits exclude brown dwarfs of masses $\sim 0.07\,M_\odot$ as the
major components of the halo.
One important aspect of the phenomenon is that the amplification should
be symmetric and achromatic. Figure 6.11 shows an event recorded by the
EROS experiment that possesses both properties.
The MACHO collaboration, however, observed events of a duration of $\sim 50$
days.¹⁹ If these are interpreted as originating from dark lenses of the galactic
halo, this rate corresponds to a fraction $f = 0.2$ of machos contributing to the
total mass of the halo. The timescale corresponds to objects of mass $\sim 0.4\,M_\odot$.
EROS has only published upper bounds on the fraction of the halo
components made of machos.²⁰
The results of the two experiments show that it is unlikely that the halo is
made up predominantly of objects of the order of stellar masses. The present
challenge is to prove that the events observed by the MACHO collaboration
are caused by lensing by objects of the halo and not, for instance, by lensing
objects in the clouds themselves. If that is the case, the mass estimation
implies that they correspond to very old white dwarfs, perhaps the oldest
stars.
18 C. Alcock, R.A. Allsman, D. Alves, et al., Astrophys. J. Lett. 499, L9 (1998).
19 C. Alcock, R.A. Allsman, D.R. Alves, et al., Astrophys. J. 542, 281 (2000).
20 T. Lasserre, et al., EROS Collaboration, Astron. Astrophys. 355, L39 (2000).
Fig. 6.11. Gravitational microlensing event in the EROS experiment. The upper
picture is in the blue part of the optical spectrum and the lower picture is in the
red part. The phenomenon is symmetric and achromatic, as expected. (Courtesy of
James Rich.)
The information on the localization of the lenses (in the galactic halo or
in the Magellanic Clouds) is difficult to obtain. The simplest case is that of
events with a very large amplification, in particular the events due to binary
lenses. In such events, the light curve is modified in a way that depends on
the relative distance of the lens and the source stars.21 It is also possible to
obtain information on the distance of lenses in events of very long duration
when the motion of the Earth around the sun modifies the light curve. 22 In
the future, it will also be possible to resolve the two microlensing images
with interferometric space telescopes. Such measurements will give enough
information to determine the distances of the lensing objects and to draw a
definite conclusion.
The search for dark objects by microlensing is under way in the Andromeda
Nebula, the spiral galaxy M31, which is close to our galaxy.23
6.6 Problems
$w = R\cos\psi$, $z = R\sin\psi\cos\theta$, $x = R\sin\psi\sin\theta\cos\varphi$, $y = R\sin\psi\sin\theta\sin\varphi$.
6.2. Geodesics
Consider the metric
(6.134)
(6.135)
(6.136)
years, one can find the essence and the beauty of his ideas and results. Feyn-
man showed that quantum transition amplitudes can be calculated with path
integrals and that this method is more efficient than working with the notion
of a wave function, in particular when one considers systems made of several
interacting particles. Statistical physics also profited from path integral tech-
niques. Many results have been obtained, and this is a central tool in modern
quantum field theory.
As far as teaching quantum mechanics is concerned, Feynman acknowl-
edged that the traditional approach is more efficient. However, it is useful and
instructive, as we shall see, to provide in this last chapter Feynman's ideas
and the path integral technique. We will, in particular, discover its unifying
aspects after having worked through the previous five chapters.
The book by Feynman and Hibbs is remarkably well written. It proceeds
by considering specific examples at each step. This allows the reader to get
acquainted gradually with the basic ideas of the theory. Here, we wish to
remain at an elementary level and to elicit the structure of the theory and
its relation with analytical mechanics. This is why we will treat very few
examples; in particular, we will not enter into the application to perturbation
theory, which is extremely powerful and elegant.
One reason for the success of the method in field theory stems from the fact
that Feynman directly casts the problem of quantum mechanics in space-time.
integral
$$S = \int_{t_1}^{t_2} \mathcal{L}(x, \dot{x}, t)\, dt. \qquad (7.1)$$
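4. The least action principle states that the actual physical trajectory $x(t)$
renders $S$ minimal (extremal).
5. The equation of motion that determines the actual trajectory is the
Lagrange-Euler equation
$$\frac{d}{dt}\left(\frac{\partial\mathcal{L}}{\partial\dot{x}}\right) = \frac{\partial\mathcal{L}}{\partial x}. \qquad (7.2)$$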
7.1 Feynman's Principle 147
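6. For a free particle, $\mathcal{L} = m\dot{x}^2/2$, the classical action between $(x_1, t_1)$ and
$(x_2, t_2)$ is
$$S = \frac{m}{2}\,\frac{(x_2 - x_1)^2}{t_2 - t_1}. \qquad (7.3)$$
$$p_i = \frac{\partial S}{\partial x_i}, \qquad H = -\frac{\partial S}{\partial t}. \qquad (7.4)$$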
Fig. 7.1. Successive Young interferences across a series of screens (only a subset of
possible paths is represented).
The modulus of all these amplitudes is roughly the same,3 but the phase
differs appreciably from one path to another. The amplitude K(b, a) is the
sum of individual amplitudes
where x(t) defines a path between a and b. Of course, the specific structure
of the setup in Figure 7.1 does not matter.
The Feynman principle consists in stating that, in full generality, in any
experimental setup, the phase of the amplitude $\phi(x(t))$ corresponding to a
given path is the classical action along this path, calculated according to
equation (7.1), divided by $\hbar$:
$$\phi(x(t)) = C\, e^{\frac{i}{\hbar} S(x(t))}. \qquad (7.7)$$
We shall see later on how one fixes the normalization constant C (which is
essential). We stress the fact that the quantity S(x(t)) in this expression is the
value of the action (7.1) along the path x(t). It does not necessarily correspond
to an extremum of the classical action.
This leads us to a central point, which is the evaluation of the sum (7.6) with
the definition (7.7). In fact, the family of possible trajectories x(t) between a
3 Of course, it is only after we have understood the physical and mathematical
structure of the problem that this claim appears justified in good approximation.
and b is a complicated set. The result does not correspond to a simple limit
of the discrete set, which we could calculate in the case of Figure 7.1, to a
continuous set.
In order to define the sum over all paths, we first divide the time interval
$t_b - t_a$ into $N$ equal intervals of length $\epsilon$, delimited by the times
$t_i$, $i = 0, \ldots, N$:
$$t_b - t_a = N\epsilon, \qquad \epsilon = t_{i+1} - t_i, \qquad t_0 = t_a, \qquad t_N = t_b. \qquad (7.8)$$
For each value $t_i$, we choose a value $x_i$ of the variable $x$. This gives a set
of $N - 1$ values since the endpoints are fixed,
By joining the successive $x_i$'s by straight lines (we shall come back to this
point), we define a trajectory in the form of a broken line that joins the
points a and b. Each set $\{x_i\}$ defines a different possible trajectory.
If we integrate over the values of each $x_i$ from $-\infty$ to $+\infty$, we sum over
all "trajectories" corresponding to this particular discretization of the time
variable. This procedure is illustrated in Figure 7.2.
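The time-slicing construction can be made concrete with a short numerical
sketch. Assuming a free particle with Lagrangian $m\dot{x}^2/2$ and units where
$m = 1$ (both choices, and the helper names `discretized_action` and
`straight_path`, are ours for illustration), the code below evaluates the
action of a broken-line trajectory and compares the straight line with a
perturbed path that has the same endpoints.

```python
def discretized_action(xs, t_a, t_b, m=1.0):
    """Action of a free particle along the broken line through xs[0..N].

    On each of the N equal time slices of length eps the velocity is
    (x_{i+1} - x_i) / eps, so a slice contributes m (x_{i+1} - x_i)^2 / (2 eps).
    """
    n = len(xs) - 1
    eps = (t_b - t_a) / n
    return sum(0.5 * m * (xs[i + 1] - xs[i]) ** 2 / eps for i in range(n))


def straight_path(x_a, x_b, n):
    """Points x_0..x_N of the uniform straight line from x_a to x_b."""
    return [x_a + (x_b - x_a) * i / n for i in range(n + 1)]


x_a, x_b, t_a, t_b, n = 0.0, 2.0, 0.0, 1.0, 8
classical = straight_path(x_a, x_b, n)
s_classical = discretized_action(classical, t_a, t_b)

# Displace every odd-indexed point; with n even the endpoints a and b stay fixed.
wiggled = [x + (0.3 if i % 2 == 1 else 0.0) for i, x in enumerate(classical)]
s_wiggled = discretized_action(wiggled, t_a, t_b)

print(s_classical, s_wiggled)
```

For the straight line the discretized action coincides with
$m(x_b - x_a)^2 / 2(t_b - t_a)$, whatever the number of slices, and the perturbed
broken line has a larger action.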
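For a given value of $\epsilon$, let $C(\epsilon)$ be the normalization constant of (7.7). The
amplitude $K_\epsilon(b, a)$ is given by
$$K_\epsilon(b, a) = \frac{1}{C(\epsilon)} \int\!\!\int\!\cdots\!\int e^{\frac{i}{\hbar} S(b,a)}\, \frac{dx_1}{C(\epsilon)}\, \frac{dx_2}{C(\epsilon)} \cdots \frac{dx_{N-1}}{C(\epsilon)}, \qquad (7.9)$$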
where, for each value of the set {Xi(ti)}, S(b,a) is the action calculated on
the trajectory defined by this set, as represented in Figure 7.2.
150 7 Feynman's Principle in Quantum Mechanics
The end of the calculation consists in taking the limit $\epsilon \to 0$. This is where
the normalization factor $C(\epsilon)$ enters, as well as the number of such factors.
Indeed, the limit must exist and only involve physical quantities. Assuming
this is achieved, the amplitude $K(b, a)$ is given by
where the symbol $\mathcal{D}x(t)$ characterizes the mathematical nature of this
expression.
The form (7.11) is called a path integral. In this expression, $S(b, a)$ is a
number whose value depends on the function $x(t)$. The "integration" over this
function $x(t)$, which is represented by $\mathcal{D}x(t)$, is called a functional integral.
In other words, the amplitudes for two successive events going through the
same given intermediate point c, $(a \to c)$ and $(c \to b)$, are multiplied. The
amplitude $K(b, a)$ is the sum of these products over all possible values of the
intermediate point. This is simply the superposition principle.
This argument can be extended to any number $N$ of intervals, with intermediate
points $x_i$, $i = 1, \ldots, (N - 1)$, which leads to
$$K(b, a) = \int K(b, N-1)\, K(N-1, N-2) \cdots K(i+1, i) \cdots K(1, a)\; dx_1\, dx_2 \cdots dx_{N-1}. \qquad (7.15)$$
Assuming these intervals are infinitesimal and of equal length $\epsilon$, the
corresponding expression resembles equation (7.9). It is not identical, however,
since the latter form is a limit, whereas (7.15) is an equality. This, however,
enables us to obtain an infinitesimal form of the amplitude $K$ between two
points separated by an infinitesimal time interval $\epsilon$. In fact, when $t_2 - t_1 = \epsilon$
is infinitesimal, the action (7.1) is, to first order in $\epsilon$,
$$S(2, 1) = \epsilon\, \mathcal{L}\!\left(\frac{x_2 + x_1}{2}, \frac{x_2 - x_1}{\epsilon}, \frac{t_2 + t_1}{2}\right), \qquad (7.16)$$
or
$$K(2, 1) = \frac{1}{C(t_2 - t_1 = \epsilon)} \exp\!\left(\frac{i\epsilon}{\hbar}\, \mathcal{L}\!\left(\frac{x_2 + x_1}{2}, \frac{x_2 - x_1}{\epsilon}, \frac{t_2 + t_1}{2}\right)\right). \qquad (7.17)$$
Inserting this result into the formula (7.15), and assuming we can exchange
the order of the integration and the limit $\epsilon \to 0$, we indeed obtain an equality
between the two expressions. This justifies the method (7.9) and (7.10) in all
cases where the expressions converge sufficiently well.
(7.18)
This result is obvious. In this problem, the velocity $\dot{x}$ is a constant and the
Lagrangian is $\mathcal{L} = m\dot{x}^2/2 = m[(x_2 - x_1)/(t_2 - t_1)]^2/2$, which leads directly to
(7.18) since $S = \int \mathcal{L}\, dt$.
In order to calculate the propagator of a free particle, we could use the
limiting form (7.9). However, in this case, it is completely equivalent to use the
expression (7.15) because the result is independent of the value of $\epsilon = t_{i+1} - t_i$.
We will also obtain the value of the normalization coefficient $C(\epsilon)$ in (7.7).
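The final result is that the propagator of a free particle between points a
and b is
$$K(b, a) = \sqrt{\frac{m}{2\pi i\hbar (t_b - t_a)}}\, \exp\!\left(\frac{i m (x_b - x_a)^2}{2\hbar (t_b - t_a)}\right). \qquad (7.19)$$
This gives the value of the normalization factor. For a free particle and a time
interval $(t_b - t_a)$, we read in the formula above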
(7.20)
(7.21 )
(7.22)
(7.23)
(7.24)
The second expression is obtained using the Cauchy theorem in the complex
plane.
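In expression (7.15) we set $x_0 \equiv x_a$. We first calculate the integral over
$x_1$. Using (7.17), we obtain
$$K(2, a) = \left(\frac{1}{C(\epsilon)}\right)^2 \int \exp\!\left(\frac{i m}{2\hbar\epsilon}\left[(x_2 - x_1)^2 + (x_1 - x_0)^2\right]\right) dx_1. \qquad (7.25)$$
Since
$$(x_2 - x_1)^2 + (x_1 - x_0)^2 = \frac{1}{2}\left((x_2 - x_0)^2 + 4\left(x_1 - \frac{x_2 + x_0}{2}\right)^2\right),$$
this reduces to a simple Gaussian integral on $x_1$, which gives
$$K(2, a) = \left(\frac{1}{C(\epsilon)}\right)^2 \sqrt{\frac{i\pi\hbar (2\epsilon)}{2m}}\; \exp\!\left(\frac{i m}{4\hbar\epsilon}(x_2 - x_0)^2\right). \qquad (7.26)$$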
(7.27)
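Consequently, the equality of (7.28) and (7.27) for infinitesimal time
intervals $t_b - t_a = \epsilon$ imposes the choice
$$C(\epsilon) = \left(\frac{2 i\pi\hbar\epsilon}{m}\right)^{1/2} \equiv \left(\frac{2 i\pi\hbar (t_b - t_a)}{m}\right)^{1/2}. \qquad (7.29)$$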
It is straightforward to obtain
$$\frac{(x_b - x)^2}{(t_b - t)} + \frac{(x - x_a)^2}{(t - t_a)} = \frac{(x_b - x_a)^2}{(t_b - t_a)} + \frac{(t_b - t_a)}{(t_b - t)(t - t_a)}\left(x - \frac{x_b (t - t_a) + x_a (t_b - t)}{(t_b - t_a)}\right)^2. \qquad (7.31)$$
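The identity (7.31) is a completion of the square in $x$, and it can be checked
mechanically. The following sketch compares both sides in exact rational
arithmetic for randomly drawn values with $t_a < t < t_b$; the function names,
the seed, and the sampling ranges are arbitrary choices made for the
illustration.

```python
from fractions import Fraction
import random


def two_step_sum(x, xa, xb, t, ta, tb):
    """Left-hand side of (7.31)."""
    return (xb - x) ** 2 / (tb - t) + (x - xa) ** 2 / (t - ta)


def completed_square(x, xa, xb, t, ta, tb):
    """Right-hand side of (7.31)."""
    x_bar = (xb * (t - ta) + xa * (tb - t)) / (tb - ta)
    return ((xb - xa) ** 2 / (tb - ta)
            + (tb - ta) / ((tb - t) * (t - ta)) * (x - x_bar) ** 2)


rng = random.Random(0)  # fixed seed so that the run is reproducible
mismatches = 0
for _ in range(200):
    ta = Fraction(rng.randint(-5, 5))
    t = ta + Fraction(rng.randint(1, 9), rng.randint(1, 9))   # ensures t > ta
    tb = t + Fraction(rng.randint(1, 9), rng.randint(1, 9))   # ensures tb > t
    x, xa, xb = (Fraction(rng.randint(-20, 20), rng.randint(1, 7)) for _ in range(3))
    if two_step_sum(x, xa, xb, t, ta, tb) != completed_square(x, xa, xb, t, ta, tb):
        mismatches += 1

print(mismatches)
```

Because `Fraction` arithmetic is exact, any discrepancy would signal an
algebraic error rather than rounding noise.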
The first term, which is independent of x, factorizes in the integral, which
boils down to a Gaussian integral. The value I of this integral (without the
prefactors of (7.30)) is therefore
One can check with no difficulty that the free propagator obeys the partial
differential equation
$$i\hbar\frac{\partial K}{\partial t} = -\frac{\hbar^2}{2m}\frac{\partial^2 K}{\partial x^2} \qquad (7.34)$$
for $t > 0$ (or $t_b > t_a$).
Equation (7.34) has the same form as the Schrödinger equation for a free
particle. We must, however, be careful since we do not yet know the physical
nature of the amplitude K and how it is related to a physical probability
amplitude.
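The statement that the free propagator obeys (7.34) can be tested numerically.
The sketch below assumes the standard free-particle form
$K(x, t) = \sqrt{m / 2\pi i\hbar t}\, \exp(i m x^2 / 2\hbar t)$ for a particle started at
$(0, 0)$, works in units $\hbar = m = 1$, and evaluates the derivatives by central
finite differences. The step size and the sample point are arbitrary choices.

```python
import cmath
import math

HBAR = 1.0  # units chosen for the illustration
M = 1.0


def free_propagator(x, t):
    """K(x, t) for a free particle started at (0, 0), valid for t > 0."""
    prefactor = cmath.sqrt(M / (2j * math.pi * HBAR * t))
    return prefactor * cmath.exp(1j * M * x * x / (2 * HBAR * t))


def schrodinger_residual(x, t, h=1e-3):
    """i hbar dK/dt + (hbar^2 / 2m) d^2K/dx^2, estimated by central differences."""
    dk_dt = (free_propagator(x, t + h) - free_propagator(x, t - h)) / (2 * h)
    d2k_dx2 = (free_propagator(x + h, t) - 2 * free_propagator(x, t)
               + free_propagator(x - h, t)) / h ** 2
    return 1j * HBAR * dk_dt + HBAR ** 2 / (2 * M) * d2k_dx2


x, t = 0.7, 1.3
residual = abs(schrodinger_residual(x, t))
scale = abs(free_propagator(x, t))
print(residual / scale)
```

The printed ratio is the residual of (7.34) relative to $|K|$; it is limited
only by the finite-difference error, which shrinks as `h` is reduced.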
7.2 Free Particle 155
$$\int_{-\infty}^{+\infty} K(x, t)\, dx = \sqrt{\frac{m}{2\pi i\hbar t}} \int_{-\infty}^{+\infty} \exp\!\left(\frac{i m x^2}{2\hbar t}\right) dx = 1 \quad \forall t > 0. \qquad (7.35)$$
Some authors stress the fact that the free Schrödinger equation can be
considered as a Fourier diffusion equation,
$$\frac{\partial\rho}{\partial t} = D\nabla^2\rho,$$
for a purely imaginary time $t = i\tau$. This remark is interesting in that the same
mathematical techniques apply to both and that the solutions have obvious
formal similarities.
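On the diffusion side, the analogues of the normalization (7.35) and of the
composition law of amplitudes involve real Gaussians, so they can be verified
by ordinary quadrature. The sketch below is an illustration under our own
assumptions: a one-dimensional diffusion equation, an arbitrary value of $D$,
and a plain midpoint rule on a finite interval.

```python
import math

D = 0.5  # diffusion coefficient, arbitrary illustrative value


def heat_kernel(x, tau):
    """Gaussian solution of d(rho)/d(tau) = D d^2(rho)/dx^2 for a point source at 0."""
    return math.exp(-x * x / (4 * D * tau)) / math.sqrt(4 * math.pi * D * tau)


def integrate(func, lo=-20.0, hi=20.0, steps=4000):
    """Midpoint rule; adequate for these smooth, rapidly decaying integrands."""
    dx = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * dx) for i in range(steps)) * dx


tau1, tau2, x = 0.4, 0.9, 0.6
total = integrate(lambda y: heat_kernel(y, tau1))
composed = integrate(lambda y: heat_kernel(x - y, tau2) * heat_kernel(y, tau1))
direct = heat_kernel(x, tau1 + tau2)
print(total, composed, direct)
```

The first number checks that the total quantity deposited at the origin is
conserved; the last two compare a propagation in two successive steps, summed
over the intermediate point, with a single propagation over the total time.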
Two points are in order. First, the function K that we use here becomes a
density $\rho$ (of heat or matter), which is positive. The solution is then real and
positive, or zero. The result (7.35) expresses the conservation of energy, and
the limit (7.36) represents an initial condition where some quantity of heat has
been deposited on a given point, which avoids any problem of interpretation.
Second, and this is perhaps more interesting, this is an example of the fact
that path integral techniques are useful in a large category of problems.
One can refer to the remarkable book Techniques and Applications of Path
Integration by Lawrence S. Schulman [21]. In the present case, the solution of
a partial differential equation of first order in time can be cast quite directly
into the form of a path integral. This is the case for the Fourier equation as
well as the Schrödinger equation.
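wave
$$K(x, t) = \sqrt{\frac{m}{2\pi i\hbar t}}\, \exp i\phi(x, t) \propto e^{i(kx - \omega t)}, \qquad (7.37)$$
where $k$ is the wave vector and $\omega$ the frequency. These are locally related to the phase by
$$k = \frac{\partial\phi}{\partial x}. \qquad (7.38)$$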
Here, the value of the phase is
Therefore, we obtain
$$\lambda = \frac{h}{p}. \qquad (7.41)$$
(7.42)
(7.45)
In the book by Feynman and Hibbs, there are several examples and cal-
culations of this type. We shall not elaborate further on this aspect.
amplitude $\psi(x, t)$ obviously satisfies all the conditions we have found
previously. By definition, it is square integrable, which avoids by construction the
limiting procedures seen in Section 7.2.3. Apart from these problems, the
amplitude K of (7.33) is a particular wave function for which we know that the
particle started at $a \equiv (0, 0)$.
The wave function is a probability amplitude. Therefore, it satisfies the
law of composition of successive amplitudes (7.14); i.e., the integral equation
$$\psi(x, t) = \int_{-\infty}^{+\infty} K(x, t; x', t')\, \psi(x', t')\, dx'. \qquad (7.46)$$
The physical content of this formula is important. The amplitude $\psi(x, t)$ for
the particle to arrive at $(x, t)$ is the sum over all possible values of an
intermediate point $x'$ of the product of the total amplitude $\psi(x', t')$ and the
amplitude $K(x, t; x', t')$ to go from $(x', t')$ to $(x, t)$.
In other words (we intentionally keep the enthusiastic presentation of
Feynman), the effect of all the past history of a particle is contained in a single
function $\psi(x, t)$. One can forget everything one knows about the past history
of a particle. If one knows its wave function at a given time t, one can calculate
and "read" in it all that can happen to the particle in the future. 6
In fact, equation (7.46) is nothing else than the modern expression of the
Huygens-Fresnel principle in optics (see, for instance, Born and Wolf [12],
Chapter VIII), which founded wave optics. The Huygens principle, given in
1690, was that "Each infinitesimal element of a wave front can be considered as
a secondary perturbation which radiates spherical wavelets. The wave front
at a later time is the envelope of these wavelets". Fresnel completed this
principle later, in 1818, by postulating that the secondary wavelets "are in
mutual interference". The fundamental principles of wave optics were stated.
We have abundantly treated the case of a free particle above. The propagator
can be calculated with no difficulty:
$$K(x_2, t_2; x_1, t_1) = \sqrt{\frac{m}{2\pi i\hbar (t_2 - t_1)}}\, \exp\!\left(\frac{i m}{2\hbar}\frac{(x_2 - x_1)^2}{(t_2 - t_1)}\right). \qquad (7.47)$$
$$i\hbar\frac{\partial\psi(x, t)}{\partial t} = -\frac{\hbar^2}{2m}\frac{\partial^2\psi(x, t)}{\partial x^2}. \qquad (7.48)$$
6 Feynman added, with his legendary sense of humor, "The effect of the entire
History on the future of the universe could be obtained from a single gigantic
wave function."
7.3 Wave Function and the Schrodinger Equation 159
The integral equation satisfied by the wave function is still
The action is
$$S = \int_{t'}^{t} \mathcal{L}(x, \dot{x}, t)\, dt,$$
(7.51 )
(7.53)
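We insert this into (7.49) and we recall that if the phase of the propagator $K$
is well defined, $\exp(iS/\hbar)$, on the contrary its normalization $C$ is not, and we
obtain
$$\psi(x, t + \epsilon) = \int_{-\infty}^{+\infty} \frac{1}{C} \exp\!\left(\frac{i}{\hbar}\left[\frac{m (x - y)^2}{2\epsilon}\right] - \frac{i\epsilon}{\hbar} V\!\left(\frac{x + y}{2}\right)\right) \psi(y, t)\, dy. \qquad (7.54)$$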
In the argument of the exponential, the first term, $(i/\hbar)(m(x - y)^2/2\epsilon)$,
becomes large as soon as $y$ becomes appreciably different from $x$. Within this
assumption, the phase of the integrand in (7.54) varies rapidly and this
integrand oscillates very quickly. On average, these various contributions to the
integral cancel each other. In other words, it is only the sufficiently small
values of $|x - y|$ that will give appreciable contributions to the integral. We can
rewrite (7.54), setting $y - x = \eta$ and keeping in mind that only small values
of $|\eta|$ will contribute, as
$$\psi(x, t + \epsilon) = \int_{-\infty}^{+\infty} \frac{1}{C} \exp\!\left[\frac{i m \eta^2}{2\hbar\epsilon}\right] \exp\!\left[-\frac{i\epsilon}{\hbar} V(x + \eta/2)\right] \psi(x + \eta, t)\, d\eta. \qquad (7.55)$$
(7.56)
The identification in each order of $\epsilon$ gives the following result, using the
Gaussian integrals (7.22) and (7.23).
1. Order 0 in $\epsilon$
The coefficients of $\psi$ on both sides are concerned. We obtain
$$C = \left(\frac{2\pi i\hbar\epsilon}{m}\right)^{1/2}, \qquad (7.57)$$
$$i\hbar\frac{\partial\psi}{\partial t} = -\frac{\hbar^2}{2m}\frac{\partial^2\psi}{\partial x^2} + V(x)\,\psi(x, t), \qquad (7.58)$$
Both from the conceptual and the technical points of view, the method of
Feynman path integrals has an undeniable elegance and richness. We have
mentioned that it extends to many other physical problems such as quantum
field theory, Brownian motion, polarons, spin physics, statistical mechanics,
and critical phenomena, as one can see in the book by Schulman [21]. This
book contains, in particular, a very pleasant discussion of quantum mechanics
in curved spaces. We end this chapter with a series of remarks that the present
results have induced after going through the previous five chapters of this
book.
There is no hierarchical relationship between the depth of the various
approaches and different chapters of physics, neither do we wish to discuss
any axiomatics of physics. It is a personal matter of taste to prefer such and
such a line of thought. What is interesting here is to see the unifying character
of what we have discussed, from the Fermat principle up to the Feynman path
integrals.
and suppose the classical action $S(b, a)$ is macroscopic (i.e., it is much larger
than the Planck constant $\hbar$). Consider the contribution of several paths that
can perfectly well be close to each other in the classical sense but whose
actions differ by much more than $\hbar$. The contributions of these paths to the phase
will be completely different (and very difficult to determine with an accuracy
better than, say, $\pi$). With great probability, they will interfere destructively. If
one considers the set of all those paths, their total contribution to the integral
will vanish.
However, in the vicinity of the classical trajectory $x_{cl}(t)$, the action
$S_{cl}(b, a)$ is stationary. Therefore, paths that are sufficiently close to the
classical trajectory will give contributions that will interfere in a constructive way.
Only those paths along which the action $S(b, a)$ is sufficiently close to the
classical action $S_{cl}(b, a)$ will contribute, the difference being noticeably smaller
than the unit of action $\hbar$. Notice that for all processes involving macroscopic
values of the action, this quantity will be larger than, say, $10^{25}$ to $10^{30}\,\hbar$.
In other words, under these conditions, the only appreciable contribution
will come from an infinitesimal vicinity of the classical trajectory that cannot
be resolved experimentally. Consequently, the "probability" of the classical
trajectory is equal to one. The probability for any trajectory that can be
distinguished from the classical one vanishes.
$$d\phi = \frac{1}{\hbar}\frac{\partial S_{cl}}{\partial x_b}\, dx_b = k\, dx_b, \qquad (7.62)$$
where $k$ is the local wave vector of (7.61). But we know (see (7.4)) that the
classical momentum of a particle at $x_b$ is related to the action by $p = (\partial S/\partial x_b)$.
We therefore obtain the de Broglie relation
$$p = \hbar k,$$
as announced in (7.41).
In the same way, by varying the time of arrival by $\delta t_2$, we obtain (referring
to the definition (7.37))
(7.63)
$$E = \hbar\omega.$$
7.4 Concluding Remarks 163
(7.64)
For large values of $\mu$, the phase of $e^{i\mu f(t)}$ varies very rapidly unless $f'(t) = 0$.
Therefore, the dominant contributions to the integral will come from values of
$t$ for which $f'(t)$ vanishes. If $f'(t)$ vanishes at a single point $t_0$, we can expand
$f$ as a power series in the vicinity of $t_0$; i.e.,
(7.65)
If one neglects higher-order terms in the expansion, one obtains the result
(7.66)
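The stationary phase argument can be illustrated numerically. The sketch below
is not taken from the text: it assumes an integral of the form
$\int g(t)\, e^{i\mu f(t)}\, dt$, with a Gaussian envelope $g$ added by us so that the
numerical integral converges, and a phase $f(t) = \cosh(t - t_0)$ chosen because it
has a single stationary point. It compares a brute-force quadrature with the
standard leading-order estimate $g(t_0)\, e^{i\mu f(t_0)} \sqrt{2\pi i / \mu f''(t_0)}$.

```python
import cmath
import math


def f(t):
    return math.cosh(t - 0.5)  # single stationary point at t0 = 0.5, with f''(t0) = 1


def g(t):
    return math.exp(-t * t)  # smooth envelope, our assumption, for convergence


def oscillatory_integral(mu, lo=-4.0, hi=4.0, steps=200_000):
    """Midpoint-rule estimate of the integral of g(t) exp(i mu f(t)) dt."""
    dt = (hi - lo) / steps
    total = 0j
    for i in range(steps):
        t = lo + (i + 0.5) * dt
        total += g(t) * cmath.exp(1j * mu * f(t))
    return total * dt


def stationary_phase(mu, t0=0.5, f2=1.0):
    """Leading-order estimate g(t0) exp(i mu f(t0)) sqrt(2 pi i / (mu f''(t0)))."""
    return g(t0) * cmath.exp(1j * mu * f(t0)) * cmath.sqrt(2j * math.pi / (mu * f2))


mu = 200.0
exact = oscillatory_integral(mu)
approx = stationary_phase(mu)
print(abs(exact - approx) / abs(approx))
```

The printed relative difference is small and decreases as $\mu$ grows, which is
the content of the approximation: only the neighborhood of $t_0$ contributes
appreciably.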
7.5 Problems
$$F(T) = \left(\frac{m\omega}{2\pi i\hbar\sin\omega T}\right)^{1/2}.$$
Check that one obtains the formula (7.61).
Solutions
yields $z' = \sinh\phi(x)$, i.e., $\mu g z + \lambda = C\cosh\phi$ with $C\phi' = \mu g$. The solution is
$$z = -\frac{\lambda}{\mu g} + \frac{C}{\mu g}\cosh\!\left(\frac{\mu g}{C}(x - x_0)\right). \qquad (7.71)$$
The constants $x_0$, $C$, and $\lambda$ are fixed by the conditions $z(0) = z_0$, $z(a) = z_1$,
and $\int_0^a \sqrt{1 + z'(x)^2}\, dx = L$.
2.3 Brachistochrone
Energy conservation gives
$$\frac{1}{2}\left(\frac{ds}{dt}\right)^2 + g(z - a) = 0. \qquad (7.72)$$
We want to minimize
$$T = \int_a^b \left(\frac{1 + z'^2}{2g(a - z)}\right)^{1/2} dx \qquad (7.73)$$
(7.74)
$$T = \int_0^A dt = \frac{1}{\sqrt{2g\sin\alpha}} \int_0^A \sqrt{\frac{1 + (y')^2}{x}}\, dx$$
6. We deduce
$$T = \int_0^L \frac{h(z')}{w(z)}\, dx. \qquad (7.77)$$
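$$\frac{\partial\Phi}{\partial z} = \frac{d}{dx}\left(\frac{\partial\Phi}{\partial z'}\right).$$
5. The function $\Phi$ does not depend explicitly on $x$. Therefore, we have
Consequently,
$$\frac{d}{dx}\left(\Phi - z'\frac{\partial\Phi}{\partial z'}\right) = 0,$$
which gives $(h'(z')z' - h(z'))/w(z) = \text{constant}$.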
6. We have $z'h' - h = -2/z'$. We therefore obtain the first-order differential
equation for the function $x(z)$, $(-2/A)\,dx/dz = w(z)$, and hence the result
$$z' = \frac{dz}{dx} = \frac{w_0 z_1 - w_1 z_0 \ln(1 + (z_1/z_0))}{w_0 L - w_1 L z_0/(z + z_0)}.$$
If $z_1 \ll L$ and $z_1 \ll z_0$, the velocity of the wind does not vary appreciably
over the whole path, and one has $z' \sim z_1/L \ll 1$.
In the second question, we have seen that the optimal velocity for a
constant wind velocity is attained for $z' = 1$. The present configuration
certainly does not correspond to the best strategy. One must tack at some
point $(x_1, Z)$ with $0 < x_1 < L$ and $Z \gg z_1$, as represented in Figure 7.2
in order to benefit fully from the power of the wind (this possibility was
excluded in the text).
3. Constant force
5. One varies t2, taking into account that the variation of the time of arrival
yields a variation of the trajectory.
$$p_r = \frac{\partial\mathcal{L}}{\partial\dot{r}} = m\dot{r}, \qquad p_\theta = \frac{\partial\mathcal{L}}{\partial\dot{\theta}} = m r^2\dot{\theta}, \qquad p_\phi = \frac{\partial\mathcal{L}}{\partial\dot{\phi}} = m r^2\sin^2\theta\,\dot{\phi}.$$
3. Taking the derivative of (3.73) with respect to time, and taking into
account that in Cartesian coordinates $p = mv$, one obtains directly the
result $L_z = m r^2\sin^2\theta\,\dot{\phi} = p_\phi$.
4. The conservation of $p_\phi$, or $L_z$, corresponds to the invariance under
translation in $\phi$; i.e., rotation invariance around the $z$ axis.
5. If a charged particle is in a magnetic field $B$ parallel to $Oz$, there is
rotational invariance around the $z$ axis and the component $L_z$ is conserved.
$$H = \frac{P^2}{2m} + \frac{m\omega^2 X^2}{2} + \frac{Q^2}{2m} + \frac{m(\omega^2 + \Omega^2) Y^2}{2}.$$
2. The eigenfrequencies of the system are therefore $\omega_1 = \omega$ and
$\omega_2 = \sqrt{\omega^2 + \Omega^2}$.
3. The general form of the motion follows from
m mw 2 3m,n2 (2
2
H = 2(PI +P2 +P3)
2 2
+ -2-(XI
2
+X2 +X3 + - 2 - Xl +X2 .
2 2) 2)
{X,P} = 1.
2. In these variables, which are the same as those used by Dirac in the
quantum harmonic oscillator,
$$H = \omega\,(a^* a).$$
$$\dot{a} = \{a, H\} = -i\omega a,$$
which is a first-order differential equation. The general solution is
Yk = y'N-k,
b) We have
N N
L,qkqk = L,p;. (7.80)
k=l n=l
Similarly
t t (~ t
k=l
qkqk =
k=l VN n=l
e-2ikmr/N pn) ( ~
VN n'=l
t e2ikn'7r/N p~) .
(7.81)
The summation over k gives bnn" and hence the result.
with
b) We have
{Yj, qd = bjk' {Yj, qk} = bjk, {Yj, qF,r -d = bjk, {yj, qN -d = bjk.
(7.83)
c) We obtain
Yk = {Yk, H} = ; (qk + qN-k) = mqk'
fl,2 ( + * )
.* _ { * H} - m k Yk YN-k _ fl,2
qk - qkl - - 2 - m kYk
-_ VN
1 [
cos (2!?7rtN+ 2n7r) + cos (2!?7rtN- 2n7r)] . (7.85 )
(2!?7rt + 2Y7r/a)
f( t,y ) -__1_ [
VN cos N + cos (2!?7rt -N 2Y7r/a)]
and satisfies the wave equation
1 82 f 82 f
------=0.
!?2a 2 8t 2 8x2
In this chain of coupled oscillators, a progressive wave of velocity !?a
propagates.
4.5 Virial Theorem
1. One obtains
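$$\{A, H\} = \frac{p^2}{m} - r\cdot\nabla V.$$
The time evolution of $A$ is simply
$$\frac{dA}{dt} = \{A, H\} = \frac{p^2}{m} - r\cdot\nabla V.$$
$$2\left\langle\frac{p^2}{2m}\right\rangle = \langle r\cdot\nabla V\rangle.$$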
3. If $V = g r^n$, we have
$$r\cdot\nabla V = r\frac{\partial V}{\partial r} = nV.$$
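As a concrete instance, the relation $2\langle p^2/2m\rangle = \langle r\cdot\nabla V\rangle = n\langle V\rangle$ can be checked
on the one-dimensional harmonic oscillator, for which $n = 2$. The sketch below
interprets the brackets as time averages over one period of a bounded motion
(our assumption), uses the exact trajectory, and works in illustrative units
$m = k = 1$.

```python
import math

M, K_SPRING = 1.0, 1.0  # illustrative units; V(x) = K_SPRING x^2 / 2, i.e. n = 2
OMEGA = math.sqrt(K_SPRING / M)


def trajectory(t, amplitude=1.5):
    """Exact solution x(t), p(t) of the one-dimensional harmonic oscillator."""
    x = amplitude * math.cos(OMEGA * t)
    p = -M * OMEGA * amplitude * math.sin(OMEGA * t)
    return x, p


def period_average(quantity, steps=10_000):
    """Average of quantity(x, p) over one period, sampled uniformly in time."""
    period = 2 * math.pi / OMEGA
    samples = (quantity(*trajectory(period * (i + 0.5) / steps)) for i in range(steps))
    return sum(samples) / steps


twice_kinetic = period_average(lambda x, p: p * p / M)          # 2 <p^2 / 2m>
r_grad_v = period_average(lambda x, p: x * K_SPRING * x)         # <x dV/dx>
n_times_v = period_average(lambda x, p: 2 * 0.5 * K_SPRING * x * x)  # n <V>, n = 2
print(twice_kinetic, r_grad_v, n_times_v)
```

The three averages agree, as expected for a potential that is homogeneous of
degree 2.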
(7.86)
where $\psi^*$ is the "mirror" density which concentrates instead of diffusing. This
leads to the propagation equation
$$\frac{3}{v^2}\frac{\partial^2\psi}{\partial t^2} - \Delta\psi + a^2\frac{\partial\psi}{\partial t} = 0. \qquad (7.87)$$
This equation can be solved by Fourier transformation if the coefficients $v$
and $a$ are constants. (This is not the case if the medium is inhomogeneous or
discontinuous.)
Problems of Chapter 6
6.2 Geodesics
Solutions exist only for $\rho \geq R$ (which is explained by equation (6.136)).
The energy is
(7.88)
$$\omega^2 = \frac{2E}{mR^2}. \qquad (7.89)$$
We obtain
(7.90)
and
tanh((t) - o) = ,tanhw(t - to). (7.91)
The calculation of the propagator involves only Gaussian integrals, and the
result follows directly. One recovers (7.61).
References
16. Steven Weinberg, Gravitation and Cosmology, John Wiley & Sons, New
York (1972).
17. P. A. M. Dirac, General Theory of Relativity, John Wiley & Sons, New
York (1975).
18. Charles W. Misner, Kip S. Thorne, and John Archibald Wheeler,
Gravitation, W.H. Freeman and Company, New York (1973).
19. James Rich, Fundamentals of Cosmology, Springer-Verlag, Heidelberg
(2001).
20. R.P. Feynman and A.R. Hibbs, Quantum Mechanics and Path Integrals,
McGraw-Hill, New York (1965).
21. Lawrence S. Schulman, Techniques and Applications of Path Integra-
tion, John Wiley & Sons, New York (1981).
22. Julian Schwinger, Selected Papers on Quantum Electrodynamics, Dover,
New York (1958).
Index