
THE PRINCIPLE OF LEAST ACTION
History and Physics
The principle of least action originates in the idea that Nature has a purpose and
thus should follow a minimum or critical path. This basic principle, with its variants
and generalizations, applies to optics, mechanics, electromagnetism, relativity and
quantum mechanics, and provides a guide to understanding the beauty of physics.
This text provides an accessible introduction to the action principle across these
various fields of physics and examines its history and fundamental role in science.
It includes explanations from historical sources, discussions of classic papers, and
original worked examples.
Different sections require different levels of mathematical sophistication. However,
the main story line is accessible not only to researchers and students in physics
and the history of physics, but also to those with a more modest mathematical
background.
ALBERTO ROJO is an Associate Professor at Oakland University. He is a Fulbright
Specialist in Physics Education and he was awarded the Jack Williams
Endowed Chair in Science and Humanities from Eastern New Mexico
University. His research focuses primarily on theoretical condensed matter, and he
has previously published popular science books.
ANTHONY BLOCH is the Alexander Ziwet Collegiate Professor of Mathematics
at the University of Michigan where he is the Department Chair. He has served on
the editorial boards of various journals, he is a Fellow of several professional societies,
and he has received a Presidential Young Investigator Award, a Guggenheim
Fellowship and a Simons Fellowship.
THE PRINCIPLE OF LEAST ACTION
History and Physics
ALBERTO ROJO
Oakland University, Michigan
ANTHONY BLOCH
University of Michigan
University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
79 Anson Road, #06–04/06, Singapore 079906
Cambridge University Press is part of the University of Cambridge.
It furthers the University’s mission by disseminating knowledge in the pursuit of
education, learning, and research at the highest international levels of excellence.
www.cambridge.org
Information on this title: www.cambridge.org/9780521869027
DOI: 10.1017/9781139021029

© Alberto Rojo and Anthony Bloch 2018
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2018
Printed in the United Kingdom by Clays, St Ives plc
A catalogue record for this publication is available from the British Library.
Library of Congress Cataloging-in-Publication Data
Names: Rojo, Alberto G., author. | Bloch, Anthony, author.
Title: The principle of least action : history and physics / Alberto Rojo
(Oakland University, Michigan), Anthony Bloch (University of Michigan).
Description: Cambridge, United Kingdom ; New York, NY :
Cambridge University Press, 2018.
| Includes bibliographical references and index.
Identifiers: LCCN 2017023575| ISBN 9780521869027 (hardback ; alk. paper) |
ISBN 0521869021 (hardback ; alk. paper)
Subjects: LCSH: Least action. | Variational principles. | Mechanics. |
Lagrange equations. | Hamilton-Jacobi equations.
Classification: LCC QA871 .R65 2017 | DDC 530.1–dc23
LC record available at https://lccn.loc.gov/2017023575
ISBN 978-0-521-86902-7 Hardback
Cambridge University Press has no responsibility for the persistence or accuracy
of URLs for external or third-party internet websites referred to in this publication
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
Contents
List of Illustrations
Acknowledgments
1 Introduction
2 Prehistory of Variational Principles
2.1 Queen Dido and the Isoperimetric Problem
2.1.1 Zenodorus’s Solution*
2.2 Hero of Alexandria and the Law of Reflection
2.3 Galileo and the Curve of Swiftest Descent
2.4 Bending of Light Rays and Fermat’s Minimum Principle
2.4.1 Fermat’s Method of Maxima and Minima
2.4.2 Huygens’ Simplified Derivation of Snell’s Law
2.5 Newton and the Solid of Least Resistance*
2.5.1 The Sphere and the Cylinder
2.5.2 An Application “in the Building of Ships”
2.5.3 The First Genuine Variational Calculation
3 An Excursion to Newton’s Principia
3.1 Newton’s Propositions on the Laws of Motion
3.2 Geometrical Derivation of Kepler’s Laws of Planetary Motion
3.2.1 Proposition 1: Equal Areas are Swept Out in Equal Times
3.2.2 Proposition 6: The Force Law and the Geometry of the Orbit*
3.2.3 Circular Orbits
3.2.4 Proposition 10: Elliptical Orbit with the Center of Force at the Center of the Ellipse
3.2.5 Proposition 11: Center of Force at the Focus of the Ellipse*
4 The Optical-Mechanical Analogy, Part I
4.1 Bernoulli’s Challenge and the Brachistochrone
4.1.1 Huygens and the Horologium Oscillatorium
4.1.2 Leibniz’s Solution of the Brachistochrone
4.1.3 Bernoulli’s Solution: Particle Paths as Light Rays
4.2 Maupertuis, Least Action, and Metaphysical Mechanics
4.3 Euler and the Method of Maxima and Minima*
4.3.1 Euler’s Derivation of Orbits from the Least Action Principle
4.4 Examples of the Optical-Mechanical Analogy
4.4.1 Conservation of “Angular Momentum” for Light Rays
4.4.2 The Terrestrial Brachistochrone
4.5 The String Analogy and the Principle of Least Action
4.5.1 The Least Action Principle and Stretchable Strings
5 D’Alembert, Lagrange, and the Statics-Dynamics Analogy
5.1 The Principle of Virtual Work
5.2 Statics Meets Dynamics: Bernoulli’s Calculation of the Center of Oscillation
5.3 D’Alembert’s Principle
5.4 Lagrange’s Dynamics
5.4.1 Lagrange’s “Scientific Poem”
5.4.2 Symmetries
5.5 Lagrange versus d’Alembert: Dissipative and Nonholonomic Systems
5.5.1 Dissipation in a Reversible System: Lamb’s Model
5.5.2 Nonholonomic Systems
5.6 Gauss’s Principle of Least Constraint
5.7 Least Action with a Twist: the Elasticity of the Ether and Maxwell’s Equations*
6 The Optical-Mechanical Analogy, Part II: The Hamilton-Jacobi Equation
6.1 Hamilton’s “Theory of Systems of Rays”
6.2 Conical Refraction*
6.2.1 Fresnel’s Equations for Anisotropic Crystals
6.2.2 Analytical Derivation of the Wave Surface
6.2.3 Hamilton’s Derivation of the Conical Cusp
6.2.4 Internal Conical Refraction: “The Plum Laid Down on a Table”
6.3 Hamilton’s Law of Varying Action*
6.4 An Example from Hamilton: The Characteristic Function V for a Parabolic Orbit*
6.5 Hamilton’s “Second Essay on a General Method in Dynamics”*
6.5.1 Example: Particle in a Uniform Gravitational Field
6.6 Hamilton-Jacobi and Huygens’ Principle*
6.7 Applications and Examples
6.7.1 The Equation of a Light Ray
6.7.2 Hamiltonian of the Harmonic Oscillator
6.7.3 Hamilton-Jacobi Equation for a Particle in a Magnetic Field
6.8 When the Principle of Least Action Loses its “Least”
6.8.1 Focus and Kinetic Focus
6.8.2 Kinetic Focus for a Free Particle on a Sphere
6.8.3 Saddle Paths for the Harmonic Oscillator
6.8.4 Kinetic Focus of Elliptic Planetary Orbits
6.8.5 Gouy’s Phase and Critical Action
6.8.6 Caustics
7 Relativity and Least Action
7.1 Simultaneity and the Relativity of Time
7.2 The Relativistic “F = ma”
7.2.1 The Energy-Momentum Four-Vector
7.2.2 Invariance of the Relativistic Action
7.3 Hamilton-Jacobi Equation for a Relativistic Particle
7.4 The Principle of Equivalence
7.4.1 Bending of Light Rays According to the Equivalence Principle
7.4.2 Bending of Light Rays, Newtonian Calculation
7.5 Space-Time is Curved
7.6 Weak Gravity around a Static, Spherical Star
7.6.1 Precession of the Perihelion of Mercury
7.6.2 Bending of the Light Rays in the General Theory
7.7 Hilbert’s Least Action Principle for General Relativity*
8 The Road to Quantum Mechanics
8.1 The Need for a New Mechanics
8.2 Bohr’s “Trilogy” of 1913 and Sommerfeld’s Generalization
8.2.1 Sommerfeld and the Kepler Problem
8.2.2 The Fine Structure of the Hydrogen Spectrum
8.3 Adiabatic Invariants
8.4 De Broglie’s Matter Waves
8.5 Schrödinger’s Wave Mechanics
8.5.1 The Eikonal Equation
8.5.2 Schrödinger’s Derivation
8.6 Dirac’s Lagrangian View of Quantum Mechanics
8.7 Feynman’s Thesis and Path Integrals
8.8 Huygens’ Principle in Optics and Quantum Mechanics*
8.8.1 First-Order (in Time) Propagator for the Wave Equation
8.8.2 Huygens’ Principle and Spherical Wavelets
8.8.3 Cancellation of the Backwards Wave
Appendix A Newton’s Solid of Least Resistance, Using Calculus
Appendix B Original Statement of d’Alembert’s Principle
Appendix C Equations of Motion of McCullagh’s Ether
Appendix D Characteristic Function for a Parabolic Keplerian Orbit
Appendix E Saddle Paths for Reflections on a Mirror
Appendix F Kinetic Caustics from Quantum Motion in One Dimension
Appendix G Einstein’s Proof of the Covariance of Maxwell’s Equations
Appendix H Relativistic Four-Vector Potential
Appendix I Ehrenfest’s Proof of the Adiabatic Theorem
References
Index
Illustrations
2.1 Adapted from Zenodorus: equiangular polygon
2.2 From Zenodorus: isosceles triangle
2.3 Adapted from Zenodorus: equiangular polygon
2.4 Adapted from Zenodorus: similar isosceles triangles
2.5 From Zenodorus: area of a regular polygon
2.6 Adapted from Zenodorus: largest regular octagon
2.7 Area of the circle according to Archimedes
2.8 A variational problem without a solution
2.9 Hero of Alexandria’s original proof of the law of reflection
2.10 Galileo’s curve of swiftest descent
2.11 Particle falling on an inclined plane
2.12 Galileo’s law of chords
2.13 Swiftest descent, from Galileo’s Two New Sciences
2.14 Descent on a circle, from Galileo’s Two New Sciences
2.15 Descartes’ derivation of the law of refraction, reproduced from Descartes (1637)
2.16 Fermat’s method of tangents
2.17 From Fermat’s Analyse pour les réfractions
2.18 From Huygens’ Treatise on Light
2.19 Adapted from Feynman’s Lectures on Physics
2.20 From Newton’s Principia, Proposition 34, Book II
2.21 Archimedes’ calculation of the volume of a paraboloid
2.22 Volume of a paraboloid inscribed on a cylinder
2.23 First figure in the scholium to Proposition 34 in Newton’s Principia
2.24 Second figure in the scholium to Proposition 34
2.25 Adapted from Whiteside (1974)
2.26 Newton’s variational method
2.27 Newton’s variational calculation II
3.1 Equal areas in equal times
3.2 From Robert Hooke’s manuscript
3.3 Equal areas in equal times for central forces
3.4 From Proposition 6 of the Principia
3.6 An ellipse is a rescaled circle
3.7 The intersecting chord theorem
3.8 Scaling proof of the intersecting chord theorem
3.10 Reflective property of the ellipse
4.1 Adapted from Huygens’ Horologium Oscillatorium
4.2 Path independence of the final velocity
4.3 Diagram from Huygens’ original work on the isochronous clock
4.4 From Leibniz’s solution of the brachistochrone
4.5 From the Beilage to Leibniz’s letter to Bernoulli
4.6 Light ray in a variable medium
4.7 A light ray refracting at horizontal interfaces
4.8 Bernoulli’s synchronous curve
4.9 Maupertuis’s law of reflection
4.10 From Euler, 1744 treatise on maxima and minima
4.11 Optical mechanical analogy
4.12 Ray in a “central” index of refraction
4.13 Terrestrial brachistochrone
4.14 Bernoulli’s string analogy of Snell’s law
4.15 String analogy for fixed tension
4.16 String analogy for variable tension
5.1 Galileo’s virtual velocity for an inclined plane
5.2 Virtual work according to John Bernoulli
5.3 Virtual work and the Fermat-Torricelli problem
5.4 Impressed, unchanged, and lost motions
5.5 Lost velocities and the center of oscillation
5.6 D’Alembert’s principle
5.7 Virtual versus real displacements
5.8 Infinite chain of oscillators
5.9 Lamb’s model
5.10 Curl and mean rotation of elastic displacements
6.1 Law of reflection for a parabolic mirror
6.2 Reflection on a curved mirror
6.3 Malus’s theorem for refraction
6.4 Variation of Hamilton’s characteristic function
6.5 Phase velocity of a birefringent crystal
6.6 Normal slowness
6.7 Fresnel wave surface
6.8 Internal conical refraction
6.9 Hamilton’s law of varying action
6.10 Eccentric anomaly
6.11 Huygens’ construction
6.12 Huygens’ construction and Fermat’s principle
6.13 Particle in a magnetic field
6.14 Focus and saddle path for a mirror
6.15 Jacobi’s example of kinetic focus
6.16 Kinetic focus for the harmonic oscillator
6.17 Kinetic focus of elliptical orbits
6.18 Saddle point and Gouy’s phase
6.19 Caustic on a cup of coffee
6.20 Caustic for a particle in one dimension
7.1 Time dilation
7.2 Vertical ray seen from a moving frame
7.3 Bending of a light ray grazing a star of radius R
7.4 Newtonian bending of a light ray
7.5 Radial potential for Schwarzschild’s metric
8.1 Stepwise change of the spring constant of a harmonic oscillator
8.2 Sudden change of the spring constant of a harmonic oscillator
8.3 Adiabatic invariance of the harmonic oscillator
8.4 Adiabatic invariance of p dx
8.5 Huygens’ construction
8.6 Huygens’ construction for spherical wavelets
A.1 Meridian of the solid of least resistance, from equations (A.12)
E.1 Saddle path for a mirror
F.1 Quantum one-dimensional delta function potential
I.1 Adiabatic theorem for closed orbits
Acknowledgments
We thank Pablo Amster for useful input, Michael V. Berry for very valuable comments
on a preliminary version of our manuscript, Danilo Capecchi for answering
our questions on the history of virtual work and d’Alembert’s principle, Olivier
Darrigol for comments on McCullagh’s theory of elasticity, David Garfinkle for
his suggestions on the relativity chapter, Ursula Goldenbaum for her remarks on
Maupertuis’s controversy, William Kentridge for his wonderful cover illustration,
Mario Mariscotti for his feedback, Guillermo Martínez for useful comments and
for his input on the cover, Jeffrey K. McDonough for helpful comments on Leibniz,
Luis Navarro Veguillas for remarks on the adiabatic principle in quantum
physics, Bernardino Orio de Miguel for sending us his translation from Latin of
the Bernoulli-Leibniz correspondence, Peter Pesic for useful discussions, Roshdi
Rashed for comments on Islamic science, Jeffrey Rauch for discussions on Huygens’
principle, Ignacio Silva for references on Aristotle, and Alejandro Uribe for
his comments on optics and saddle paths.
1
Introduction
The idea of writing a book on the principle of least action came to us after many
conversations over coffee, while we pondered ways of communicating to students
the ideas of mechanics with an historical flavor. We chose the principle of least
action because we think that its importance and aesthetic value as a unifying idea
in physics is not sufficiently emphasized in regular courses. To the general public,
even to those interested in science at a popular level, the beautiful notion that the
fundamental laws of physics can be expressed as the minimum (or an extremum) of
something often seems foreign. Nature loves extremes. Soap films seek to minimize
their surface area, and adopt a spherical shape; a large piece of matter tends to
maximize the gravitational attraction between its parts, and as a result the planets
are also spherical; light rays refracting in a glass window bend and follow the path
of least time; the orbits of the planets are those that minimize something called
the “action;” and the path that a relativistic particle chooses to follow between two
events in space-time is the one that maximizes the time measured by a clock on the
particle.
Our initial intention was to write a popular book, but the project morphed into
a more technical presentation. Nevertheless we have tried to keep sophisticated
mathematics to a minimum: nothing more than freshman calculus is needed for
most of the book, and a good part of the book requires only high school algebra.
Some familiarity with differential equations would be useful in certain sections.
While the different sections have various levels of difficulty, the book does not need
to be read in a linear fashion. It is quite feasible to browse through this book, as
most of the chapters and many of the sections are relatively self-contained. Sections
and subsections that are a bit more technical and that can easily be omitted on a
first reading include 2.1.1, 2.5, 3.2.2, 3.2.5, 4.3, 5.7, 6.2 to 6.6, 7.7 and 8.8. These
are marked in the text with an asterisk.
The gold standards on the topic of our book are The Variational Principles
of Mechanics by Cornelius Lanczos and Variational Principles in Dynamics and
Quantum Theory by Wolfgang Yourgrau and Stanley Mandelstam. Our book can
be regarded as a supplement to these two masterpieces, with expositions that follow
the historical development of minimum principles, some elementary examples, an
invitation to read the primary sources, and to appreciate science, in the words of
Isidor Isaac Rabi, as a “human endeavor in its historic context, . . . as an intellectual
pursuit rather than as a body of tricks.”
The metaphysical roots of the least action principle are in Aristotle’s statement
from De caelo and Politics: “Nature does nothing in vain.” If there is a purpose in
Nature, she should follow a minimum path. At least that is the notion pursued by
Hero of Alexandria in the first century AD to deduce the law of reflection: light
follows the path that minimizes the travel time. Later, in 1657, Pierre de Fermat
extended this idea to the refraction of light rays. “There is nothing as probable or
apparent,” says Fermat, “as the assumption that Nature always acts by the easiest
means, which is to say either along the shortest lines when time is not a consideration,
or in any case by the shortest time.” The Arabic astronomer, Ibn al-Haytham,
also uses the principle of “the simplest way” to explain refraction. Galileo, in postulating
the uniform acceleration of freely falling bodies, in 1638, also echoes
Aristotle: “we have been led by the hand to the investigation of naturally acceler-
ated motion by consideration of the custom and procedure of Nature herself in all
her other works, in the performance of which she habitually employs the first, simplest,
and easiest means.” In 1746, Pierre Louis Moreau de Maupertuis postulated
the principle of least action. His proposal, based on metaphysical and religious
views, reflected his adherence to notions of simplicity that had guided Fermat and
Galileo: “Nature, in the production of its effects,” he wrote, “does so always by the
simplest means.” More specifically: “in Nature, the amount of action (la quantité
d’action) necessary for change is the smallest possible. Action is the product of the
mass of a body times its velocity times the distance it moves.” His formulation was
vague, but, in the hands of Leonhard Euler, it later became a well-formulated principle.
Gottfried Leibniz used similar (but not identical) ideas to study refraction of
light. Leibniz’s idea is of a “most determined” path and this reflects “God’s intentions
to create the best of all possible worlds.” “This principle of Nature,” he says
in his Tentamen Anagogicum, “is purely architectonic,” and then he adds: “Assume
the case that Nature were obliged in general to construct a triangle and that, for this
purpose, only the perimeter or the sum of the sides were given and nothing else; then Nature
would construct an equilateral triangle.”
The formulation of mechanics in terms of minimum principles originates in the
optical mechanical analogy first used by John Bernoulli to solve the “brachistochrone
problem:” what path between two fixed points in a vertical plane does
a particle follow in order to minimize the time taken? Bernoulli maps the problem
to that of a light ray refracting in a medium of varying index of refraction,
where light follows the path of least time. The mapping between mechanics and
optics becomes an isomorphism with Maupertuis’s formulation. The minimization
of action for a particle and the minimization of time for a light ray become the
same mathematical problem provided the index of refraction is identified with the
momentum of the particle: the paths are isomorphic. The principle of least action
then may be viewed as an alternative and equivalent formulation to Newton’s laws
of motion.
In the Age of Enlightenment, Newton’s ideas were extended to incorporate constraints
in mechanical systems. The key figures are James Bernoulli, Jean le Rond
d’Alembert, and Joseph-Louis Lagrange. The central concept for these developments
is the principle of virtual work, which establishes the conditions of static
equilibrium and its extension to dynamics. The work of Lagrange, starting in 1760,
is of supreme importance. For a constrained system with r degrees of freedom
(for example, a particle constrained to move on the surface of a sphere has r = 2
since at a given position it can move in only two directions), he is able to express
the dynamics in terms of a single function L (the Lagrangian) through r equations
identical in structure. Lagrange’s equations can be derived from a minimum
principle, giving rise to an expanded version of the principle of least action: Maupertuis’s
minimum principle gives the path between two points in space for a fixed
value of the energy, while Lagrange’s integral gives the path that takes a given
time t between two fixed points in space. Lagrange’s ideas were extended, starting
in the 1820s, by William Rowan Hamilton (and also by Carl Jacobi). Hamilton
and Jacobi put the optical mechanical analogy in a broader conceptual frame: the
end points of paths that emanate from a given origin at t = 0 (each path being a
minimum of the Lagrangian action) create, at a later time t, a “wave-front” that
propagates. This wave-front is a surface that intersects the particle trajectories (just
like a wave-front for light is perpendicular to the light rays) but does not include
interference or diffraction effects peculiar to waves. However, it invites a “natural”
question: if light rays are the small wavelength limit of wave optics, what is
the wave theory of particles whose small wavelength limit gives the particle trajectories?
Hamilton did not have an experimental reason to entertain the question
in the mid-nineteenth century, but the answer came in the 1920s with Louis de
Broglie’s and Erwin Schrödinger’s quantum theory of wave mechanics. In 1923
de Broglie wrote: “Dynamics must undergo the same evolution that optics has
undergone when undulations took the place of purely geometrical optics,” and in
1926 Schrödinger considered the “general correspondence which exists between
the Hamilton-Jacobi differential equation and the ‘allied’ wave equation.” In 1942
Richard Feynman established an even deeper connection between least action and
quantum physics: a quantum particle, in propagating between two fixed points in
space and time, does not follow a single path but all possible paths “at the same
time.” The contribution of each path to the total propagation is the (complex)
exponential of Hamilton’s action.
The fact that many fundamental laws of physics can be expressed in terms of
the least action principle (with the appropriate action) led Max Planck to say that,
“Among the more or less general laws which manifest the achievements of physical
science in the course of the last centuries, the principle of least action is probably
the one which, as regards form and content, may claim to come nearest to that
final ideal goal of theoretical research.” And Arthur Eddington, in 1920, wrote:
“the law of gravitation, the laws of mechanics, and the laws of electromagnetic
fields have all been summed up in a principle of least action. . . . Action is one
of the two terms in pre-relativity physics which survive unmodified in a description
of the absolute world. The only other survival is entropy.” Although Einstein
didn’t follow a least action approach in his theories of relativity (special and general),
Max Planck, in 1907, in the first relativity paper not written by Einstein,
formulated the dynamics of the special theory in terms of the least action principle.
One of the most interesting applications of the least action principle is the derivation,
by David Hilbert, of the field equations of general relativity. Hilbert knew,
from Einstein, that the relativistic theory of gravitation had to involve the curvature
of a four-dimensional space-time. Einstein had struggled for eight years and
he had eventually arrived at the solution by analyzing the properties of the field
equations themselves. Hilbert followed the approach of the least action principle,
guessed the “most natural” Lagrangian and, in 1915, derived the field equations
before Einstein.
Our purpose in writing this book is to tell the above stories with some mathematical
rigor while staying as close as possible to the sources. Chapter 2 visits
some ancient incarnations of minimum principles before moving on to Galileo’s
curve of swiftest descent and Fermat’s precalculus ideas. We also include Newton’s
calculation of the solid of least resistance, which anticipates the calculus of
variations used in the principle of least action. In chapter 3 we take an excursion
to Newton’s Principia, even though this work is not directly related to variational
principles. We do so for two reasons: the monumental importance of this work on
mechanics, and the fact that Newton’s ideas are crucial in the development of the
principle of least action. Chapter 4 tells the story of the optical mechanical analogy
and the true beginnings of variational principles. In chapter 5 we visit the principle
of virtual work and Lagrange’s equations. Here we point out that the principle of
least action fails to give the dynamics of nonholonomic systems, where the constraints
are expressed in terms of the possible motions rather than in terms of the
possible configurations. Chapter 5 and the ones that follow require familiarity with
calculus. In writing chapter 6, we decided to follow Hamilton’s crucial papers as
closely as possible, making some sections of this chapter perhaps less accessible
than the material found in previous chapters. We included a discussion on kinetic
foci, which emphasizes the fact that the principle of least action should perhaps be
called the principle of extremal (or critical) action since action is not always least.
For classical (non-quantum) systems, the action is an extremum that can never be
a maximum; that leaves us with a minimum or a saddle point, and both are possible.
Chapter 7 is an overview of relativity in connection with least action. Finally,
in chapter 8 we narrate the development of quantum theory and, in particular, we
trace its connection with the optical mechanical analogy and the principle of least
action.
2
Prehistory of Variational Principles
2.1 Queen Dido and the Isoperimetric Problem

Consider a loop of thread lying on a table. How can we distort the loop, without
stretching the thread, so that it encloses the maximum area?
The problem appears in a story told by the Roman poet, Virgil, in his epic poem,
The Aeneid (Virgil, 19 BC):
They sailed to the place where today you’ll see
Stone walls going higher and the citadel
Of Carthage, the new town. They bought the land,
Called Drumskin [Byrsa] from the bargain made, a tract
They could enclose with one bull’s hide.
These verses refer to the legend of Queen Dido who fled her home because her
brother, Pygmalion, had killed her husband and was plotting to steal all her money.
She ended up on the north coast of Africa, where she was given permission to rule
over whatever area of land she was able to enclose using the hide of only one bull.
She cut the hide into thin strips, tying them together to form the longest loop she
could make, in order to enclose the largest possible kingdom. Queen Dido seems to
have discovered how to use this loop to maximize the area of her kingdom: using
straight coastline as her side border, she enclosed the largest area of land possible
by placing the loop in the shape of a semi-circle.
Queen Dido’s story is now the emblem of the so-called isoperimetric problem:
for a fixed perimeter, determine the shape of the closed, planar curve that encloses
the maximum area. The answer is the circle. Aristotle, in De caelo, while dis-
cussing the motion of the heavens, displays some knowledge or intuition of this
result (Aristotle, 350 BC/1922, Book II):
Again, if the motion of the heavens is the measure of all movement . . . and the minimum
movement is the swiftest, then, clearly, the movement of the heavens must be the swiftest
of all movements. Now of the lines which return upon themselves the line which bounds
the circle is the shortest; and that movement is the swiftest which follows the shortest line.
However, a common assumption in ancient times was that the area of a figure is
determined entirely by its perimeter (Gandz, 1940). For example, Thucydides, the
great ancient historian, estimated the size of Sicily from its circumnavigation time
which is proportional to the perimeter (Thucydides, 431 BC, Book VI):
For the voyage round Sicily in a merchantman is not far short of eight days; and yet, large
as the island is, there are only two miles of sea to prevent its being mainland.
The confusion persisted even up to the times of Galileo, who expresses the
problem in Sagredo’s voice (Galilei, 1638/1974, p. 61):
people who lack knowledge of geometry . . . make the error when speaking of surfaces; for
in determining the size of different cities, they often imagine that everything is known when
the lengths [quantità] of the city boundaries are given, not knowing that one boundary
might be equal to another, while the area contained by one be much greater than that in the
other.
The isoperimetric problem was solved by the Greek mathematician Zenodorus
(ca. 200 BC – ca. 140 BC). Although his work has been lost, we know of his proof
through Pappus and Theon of Alexandria (Pappus, 1888; Heath, 1921). Zenodorus
starts by shaping Queen Dido’s loop into straight lines of different lengths to form
an arbitrary irregular polygon. And then he shows that one can increase the area
enclosed by that polygon by changing the lengths of the sides – without altering its
perimeter or the number of sides – until they are all the same. In other words, he
shows that, of all polygons of a given perimeter and a given number of sides, the
equilateral polygon encloses the largest area. However, even if we fix the perimeter
and the number of sides, there are still an infinite number of possible equilateral
polygons. Zenodorus shows that, from this infinite set of equilaterals, the equiangular
one encloses the largest area. And for the final piece of the proof: if we start
with a regular polygon and increase its number of sides (keeping the perimeter
fixed), the new polygon encloses a larger area. And this leads us naturally to the
circle, which we can think of as a regular polygon with an infinite number of sides.
The reader who is not interested in the details of Zenodorus’s proof can skip the
next section without loss of continuity.
2.1.1 Zenodorus’s Solution*

Zenodorus showed that a non-equilateral polygon can be manipulated to enclose
a larger area without changing its perimeter. Pick a triangle of the irregular poly-
gon – the shaded triangle of Figure 2.1. Keeping the base AB fixed, reshape the
triangle into an isosceles triangle of the same perimeter by moving the vertex from
Figure 2.1 Adapted from Zenodorus. The equilateral polygon encloses a larger
area than any irregular polygon with the same perimeter and the same number of
sides.
Figure 2.2 From Zenodorus. Among the triangles on a fixed base AB and fixed
perimeter AB + a + b, the isosceles triangle ADB has the largest area. Triangle
ADB, whose legs have length (a + b)/2, has the same perimeter as the starting
(scalene) triangle ACB. Prolong AD to F so that AD = DF. The dashed line
DE, parallel to AB, is a “line of symmetry”: the segment DB is the reflection
of DF on the “mirror” DE, and all points below the line DE are closer to B
than to F. Now consider the triangle ACF and use the “triangular inequality”
(in any triangle the sum of any two sides is greater than the remaining side):
CF + a > a + b, so the segment CF > b. Point C is therefore below DE; the
height of the triangle ACB is therefore smaller than the height of ADB. Since
both triangles have the same base, ADB has the larger area.
C to D. Zenodorus shows that this reshaping increases the area. The specifics of the
proof (see Figure 2.2) draw from the repertoire of “conjuring tricks” of the Greek
geometers – the choreography of auxiliary lines and symmetric angles that reveal
sometimes unexpected and paradoxical relations. The process can be repeated for
all triangles of consecutive vertices of the polygon, allowing one to conclude that
the polygon enclosing the largest area is equilateral.
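The first step of the argument can be checked numerically. The following is a minimal sketch, not part of Zenodorus's proof: it fixes the base AB and the sum a + b of the other two sides, and evaluates the area with Heron's formula. The function names and the sample lengths (base 4, a + b = 6) are assumptions chosen for the illustration.

```python
import math

def triangle_area(base, a, b):
    """Area from Heron's formula for a triangle with sides base, a, b."""
    s = (base + a + b) / 2.0
    return math.sqrt(max(s * (s - base) * (s - a) * (s - b), 0.0))

def areas_on_fixed_base(base, leg_sum, steps=9):
    """Triangles on a fixed base whose other two sides add up to leg_sum.

    Returns a list of (a, b, area). Splits that violate the triangle
    inequality are skipped.
    """
    rows = []
    for k in range(1, steps + 1):
        a = leg_sum * k / (steps + 1)   # one leg; the other is leg_sum - a
        b = leg_sum - a
        if abs(a - b) < base < a + b:
            rows.append((a, b, triangle_area(base, a, b)))
    return rows

if __name__ == "__main__":
    # Hypothetical sample values: base AB = 4, a + b = 6.
    for a, b, area in areas_on_fixed_base(4.0, 6.0):
        print(f"a = {a:.1f}  b = {b:.1f}  area = {area:.4f}")
```

In this sample the largest area in the table belongs to the split a = b, the isosceles triangle, in agreement with the geometric argument of Figure 2.2.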
The second step is to show that the maximum polygon is also equiangular. This
Zenodorus proves by considering two consecutive triangles from the polygon (see
Figure 2.3). Zenodorus proves that, given two non-similar isosceles triangles, if we
construct, on the same bases, two similar triangles with the same total perimeter as
the first two triangles, then the sum of the areas of the similar triangles is greater
Figure 2.3 Adapted from Zenodorus. The area of a polygon can be increased by
making it equiangular.
Figure 2.4 Adapted from Zenodorus. The sum of the areas of two isosceles
triangles with different bases AC = 2b1 and CE = 2b2, but otherwise equal sides a,
is smaller than the sum of the areas of two similar triangles with the same bases
and equal total perimeter.
than the sum of the areas of the non-similar triangles. According to the account
by Heath (1921), Zenodorus’s proof is restricted. But we can show that a slight
modification makes it valid in general.
Let us start with the non-similar isosceles triangles ABC and CDE (see
Figure 2.4). Their bases are $AC = 2b_1$ and $CE = 2b_2$ respectively, their heights are
$h_1$ and $h_2$, and all the legs are of length a. Following Zenodorus’s logic, construct
the triangle $A\bar{B}C$, which is the “mirror” image of triangle ABC, the mirror being
along the line of the common bases.
Now construct the two similar triangles $A\bar{B}'C$ and $CD'E$ with new heights $h_1'$
and $h_2'$, keeping the total perimeter the same:
\[ \bar{B}'D' = 2a. \tag{2.1} \]
Since the original triangles are not similar, the line $\bar{B}D$ joining their vertices is
shorter than 2a:1
\[ \bar{B}D < 2a. \tag{2.2} \]
1 This is due to the triangular inequality; the side $\bar{B}D$ is smaller than the sum of BC and CD.
Using the Pythagorean theorem,
\[ (\bar{B}'D')^2 = (b_1 + b_2)^2 + (h_1' + h_2')^2 > (\bar{B}D)^2 = (b_1 + b_2)^2 + (h_1 + h_2)^2, \tag{2.3} \]
and
\[ (h_1' + h_2') > (h_1 + h_2). \tag{2.4} \]
This relation between the final and the initial heights is general and was obtained
by Zenodorus. All we need for equation (2.4) to be valid is that $\bar{B}'D' > \bar{B}D$, and
this inequality holds even for isosceles triangles of unequal legs ($BC \neq CD$).
In a slight deviation from the proof reproduced by Heath, consider the total
change in area for the triangles,
\[ \Delta A = b_1 (h_1' - h_1) + b_2 (h_2' - h_2). \tag{2.5} \]
Now take the larger of the bases ($b_1$ in our case) and replace it by the smaller one
in equation (2.5) to obtain
\[ \Delta A > b_2 (h_1' - h_1) + b_2 (h_2' - h_2) = b_2 (h_1' + h_2' - h_1 - h_2) > 0. \tag{2.6} \]
The sum of the areas of two isosceles triangles is always increased if they are
made similar (keeping the sum of the perimeters and the bases fixed): the shaded
“hollow-angled (figure)” (κοιλογώνιον) $A\bar{B}C\bar{B}'$ is larger than the figure $CDED'$.
This does not mean that the maximum possible area is attained by making them
similar.2 All we have shown is that the area increases. Zenodorus proved that the
largest polygon of a given number of sides is regular. The underlying argument
used here is “one of a mathematician’s finest weapons” (Hardy, 1967, p. 94), reduc-
tio ad absurdum, where an assertion is established by deriving an absurdity from
its denial. We first assume that there exists a polygon of maximum area, but we
find a way to make it even larger (by making it equilateral). We then assume that,
from the set of equilateral polygons, we choose the one of maximum area. But we
then show that making it equiangular makes it larger still. Thus we show that the
only equilateral that cannot be further enlarged is the equiangular one. The proof
“shuts all doors one after another, and leaves open only one, through which merely
for that reason we must now pass” (Schopenhauer, 1969, p. 70). There is one more
door to shut before we get to the circle: the proof that the area of a regular polygon
is increased if we increase the number of sides (always for fixed perimeter).
2 The maximum area, as can be proven using elementary calculus and also by a geometric method due to
Steiner, is attained when sin ∠BC A/ sin ∠DC E = b1 /b2 .
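The area comparison of equations (2.4) to (2.6) can be illustrated with numbers. The sketch below assumes two isosceles triangles with equal legs a and half-bases b1 and b2, and builds the similar triangles by making the new legs proportional to the bases while keeping their sum equal to 2a. The function names and the sample values are hypothetical, chosen only for the illustration.

```python
import math

def equal_leg_heights(a, b1, b2):
    """Heights of the isosceles triangles with legs a and half-bases b1, b2."""
    return math.sqrt(a * a - b1 * b1), math.sqrt(a * a - b2 * b2)

def similar_heights(a, b1, b2):
    """Heights after making the two triangles similar on the same bases.

    The new legs add up to 2a (same total perimeter) and are proportional
    to the half-bases, so both triangles share the ratio leg / half-base.
    """
    ratio = 2.0 * a / (b1 + b2)
    factor = math.sqrt(ratio * ratio - 1.0)   # height / half-base
    return b1 * factor, b2 * factor

def area_change(a, b1, b2):
    """Change in total area, as in equation (2.5)."""
    h1, h2 = equal_leg_heights(a, b1, b2)
    g1, g2 = similar_heights(a, b1, b2)
    return b1 * (g1 - h1) + b2 * (g2 - h2)

if __name__ == "__main__":
    a, b1, b2 = 5.0, 4.0, 3.0   # hypothetical sample values
    print("old heights:", equal_leg_heights(a, b1, b2))
    print("new heights:", similar_heights(a, b1, b2))
    print("area change:", area_change(a, b1, b2))
```

For these sample values the sum of the heights grows and the area change is positive, as equations (2.4) and (2.6) require; when b1 = b2 the triangles are already similar and the change vanishes.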
Figure 2.5 The area of a regular polygon of n sides (n = 5 in this case) is one half
the perimeter times the apothem h (the height of each of the n identical triangles
making the polygon).
Figure 2.6 Adapted from Zenodorus.
A regular polygon of n sides can be decomposed into n equal isosceles triangles
(see Figure 2.5). The total area of the polygon is one half the perimeter times the
height h of each triangle (the apothem of the polygon). If we increase the number
of sides of the polygon from n to m, the angle is decreased by the ratio n/m and,
since the perimeter is constant, the base of each triangle is shortened by the same
ratio. Zenodorus shows that the height of the triangles of the new “m-gon” is larger
than the height of the triangles of the n-gon. In Figure 2.6 a hexagon (n = 6) is
changed into an octagon (m = 8). The ratio of the (half) angles of the vertices
is β/α = 6/8. Draw the arc of a circle of radius OD, where D lies on the
base AB. The areas of the circular sectors are (Sector OaD) = OD² × β/2 and
(Sector Oab) = OD² × α/2; their ratio is also n/m = 6/8. What Zenodorus shows
is that the ratio AB/AD > m/n. In other words, if we keep the height constant,
the perimeter will decrease: in order to keep the perimeter constant, the height has
to increase; and therefore the area will increase. Zenodorus compares the areas of
the triangles OAD and ODB with their corresponding sectors. From Figure 2.6,
\[ OA \times (AB - AD)/2 > OD^2 \times (\alpha - \beta)/2 \tag{2.7a} \]
\[ OD^2 \times \beta/2 > OA \times AD/2. \tag{2.7b} \]
Multiplying the inequalities of equations (2.7), we obtain Zenodorus’s result:
\[ \frac{AB}{AD} > \frac{\alpha}{\beta} = \frac{m}{n}. \tag{2.8} \]
The largest regular polygon of a given perimeter is the one with an infinite num-
ber of sides: the circle. And the area of that circle, as shown by Archimedes (see
Figure 2.7), is one half the area of the rectangle having the perimeter and radius of
the circle as its length and width respectively.
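The last step of the argument, that for a fixed perimeter the area of a regular polygon grows with the number of sides and approaches the area of the circle, can be illustrated with a few lines of code. This is a minimal sketch using the "half the perimeter times the apothem" rule of Figure 2.5; the function names are our own.

```python
import math

def apothem(n, perimeter):
    """Height of each of the n isosceles triangles of a regular n-gon."""
    side = perimeter / n
    return side / (2.0 * math.tan(math.pi / n))

def regular_polygon_area(n, perimeter):
    """One half the perimeter times the apothem."""
    return 0.5 * perimeter * apothem(n, perimeter)

def circle_area(perimeter):
    """One half the perimeter times the radius (Archimedes)."""
    radius = perimeter / (2.0 * math.pi)
    return 0.5 * perimeter * radius

if __name__ == "__main__":
    perimeter = 1.0
    for n in (3, 4, 6, 8, 15, 100):
        print(n, regular_polygon_area(n, perimeter))
    print("circle", circle_area(perimeter))
```

Running the loop shows the areas increasing with n while staying below the area of the circle of the same perimeter.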
Zenodorus’s solution, as well as a very elegant proof from Steiner (1842), are
vulnerable to a subtle but important flaw: the shape of maximum area for a given
perimeter is assumed to exist (without proof). The fact that from a given n-gon we
can construct a new one enclosing a larger area, does not guarantee that the area-
maximizing n-gon exists. As pointed out later by Weierstrass (1927), we could be
finding an upper bound and not the actual solution. Consider, for example (Blan-
chard and Brüning, 1982), the problem of finding the shortest curve C joining
points A and B with the restriction that C is perpendicular to the straight line AB
at both A and B (see Figure 2.8). If we call a the length of the straight line AB, we
can construct a curve connecting A and B that satisfies the constraint, consisting
of two arcs of a circle of small radius plus a straight line. For each of these curves we
can find a smaller radius that shortens the length, but the limiting straight line of length
a is never reached.
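A short computation makes the point concrete. The sketch below assumes that the two arcs are quarter circles of radius r, joined by a straight segment of length a − 2r parallel to AB; this specific construction, the symbol r and the function name are assumptions made for the illustration. The total length is then a + (π − 2)r, which decreases with r but stays above a.

```python
import math

def constrained_length(a, r):
    """Length of a curve from A to B that is perpendicular to AB at both ends.

    Assumed construction: a quarter circle of radius r leaving A, a straight
    segment of length a - 2r parallel to AB, and a quarter circle arriving at B.
    """
    if not 0.0 < r <= a / 2.0:
        raise ValueError("r must satisfy 0 < r <= a/2")
    return math.pi * r + (a - 2.0 * r)

if __name__ == "__main__":
    a = 1.0
    for k in range(6):
        r = 0.5 / 2 ** k
        print(f"r = {r:.5f}  length = {constrained_length(a, r):.6f}")
```

Every admissible r gives a length larger than a, and halving r always gives a shorter curve: the lengths have a lower bound, but no curve in the family attains it.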
It turns out that for n-gons, the proof that one of maximum area exists is rel-
atively simple (Blåsjö, 2005; Courant and Robbins, 1996), since the area and the
perimeter are continuous functions of the 2n coordinates of its vertices, which can
be restricted to a “compact” set of points in a 2n-dimensional space. For example,
each point can be thought of as being inside a square. Weierstrass showed that a
continuous function (the area in our case) on a closed and bounded interval is itself
bounded and attains its bounds. Zenodorus gave a beautiful proof but missed the
delicate proof of existence, and so Weierstrass usually gets the credit.
Figure 2.7 Archimedes’ proof that the area of a circle is half that of the rectangle
having the perimeter and radius of the circle as its length and width respec-
tively. A regular polygon with an infinite number of sides (of which we show
an approximation, with 15 sides) is a circle.

Figure 2.8 A variational problem without a solution.

2.2 Hero of Alexandria and the Law of Reflection

The first formulation of a minimum principle in a physical problem comes from
Hero of Alexandria (ca. 10 AD – ca. 70 AD). In his Catoptrics, Hero presents a
proof that the law of reflection for light rays on a flat mirror follows from the
minimization of travel time (Heronis, 1976). His starting point is:
what moves with constant velocity follows a straight line. An example is an arrow which
we see shot from the bow. For, because of the forward moving force the moving body
strives to follow the shortest path since it cannot afford the time for a slower motion, that is
a longer path. The moving force does not allow such a delay. Thus the body tries to follow
the shortest path because of its speed, but between the same endpoints the shortest of all
lines is the straight line.3
For Hero, light propagates at a finite velocity, an assumption that goes back as
early as 490 BC, attributed to Empedocles of Agrigentum (in Sicily), two millennia
before the finite speed of light was verified by the Danish astronomer, Ole Rømer,
in 1676 (Hildebrandt and Tromba, 1985). The laws of reflection were known by the
Greeks before Hero. Euclid, in his Optics, states that light propagates in straight
lines and that the angle of incidence equals the angle of reflection. But Hero is
the first to derive the law from a minimization principle: he seeks the shortest path
between two points, subject to the condition of touching an intermediate point on
a plane. Hero uses the metaphysical principle of economy (Eastwood, 1970) to
derive a physical law, an approach at the core of the history of the principle of
least action. According to Damianus of Larissa (fourth century AD), author of On
the Hypotheses in Optics, Hero applies a principle that Aristotle mentions in many
places of his work: Nature does nothing in vain.4 If “Nature did not wish to lead
our sight in vain, she would incline it so as to make equal angles”(Damianus, 1897,
p. 21).
Hero considers the trajectory of a ray that connects points d and g and reflects
on the plane eh (see Figure 2.9). The model of vision for the Greeks was inspired
by a popular analogy between the sun and the eye: the light rays are “visual rays”
that originate in the eye, and the sensation of sight is produced when those linear
tentacles touch the object. Today we know that this model is wrong, but the geome-
try of these rays is maintained if we invert the direction of propagation, and Hero’s
treatment is valid. With an ingenuity that echoes Zenodorus’s proof in Figure 2.2,
Hero draws a line perpendicular to the plane eh through g and considers a sym-
metric point z such that ze = eg. For any point of reflection b, we have zb = bg
3 Translation from Pedersen (1993).
4 For example, in Politics, Aristotle says “Nature, as we often say, makes nothing in vain” (Aristotle, 350 BC,
Book I, Part II), and in De caelo we read “But God and Nature create nothing that has not its use” (Aristotle,
350 BC/1922, Book I, 4).
Figure 2.9 Hero of Alexandria’s original proof of the law of reflection.
and the angles ∠zbe = ∠ebg. These equalities reduce the problem to finding the
shortest path between the initial point d and the reflected, auxiliary point z. The
answer is evident: the straight line dz that intersects the reflecting plane at a. Since
the angles ∠had and ∠zae are equal, and since, by the symmetrical construction
∠zae = ∠eag one concludes with Hero that
∠had = ∠eag; (2.9)
in other words, the angle of incidence is the same as the angle of reflection.
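Hero's construction can be compared with a direct numerical minimization of the path length. In the sketch below the mirror is placed on the x axis, the points d and g are given hypothetical coordinates above it, and the reflection point is found by a ternary search (a technique we substitute here for Hero's geometric argument; it works because the path length is a convex function of the reflection point).

```python
import math

def path_length(x, d, g):
    """Length of the broken path d -> (x, 0) -> g, mirror on the x axis."""
    return math.hypot(x - d[0], d[1]) + math.hypot(g[0] - x, g[1])

def ternary_minimum(f, lo, hi, iterations=200):
    """Minimum of a convex function f on [lo, hi] by ternary search."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def image_point_crossing(d, g):
    """Hero's construction: where the straight line from d to the mirror
    image z of g crosses the mirror."""
    z = (g[0], -g[1])
    t = d[1] / (d[1] - z[1])
    return d[0] + t * (z[0] - d[0])

def angles_with_mirror(x, d, g):
    """Angles that the incoming and outgoing rays make with the mirror."""
    incidence = math.atan2(d[1], abs(x - d[0]))
    reflection = math.atan2(g[1], abs(g[0] - x))
    return incidence, reflection

if __name__ == "__main__":
    d, g = (0.0, 1.0), (3.0, 2.0)   # hypothetical sample points
    x_search = ternary_minimum(lambda x: path_length(x, d, g), d[0], g[0])
    x_hero = image_point_crossing(d, g)
    print(x_search, x_hero, angles_with_mirror(x_hero, d, g))
```

For the sample points the numerical minimum coincides, within the accuracy of the search, with the crossing point given by the mirror-image construction, and the two angles with the mirror are equal there.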
2.3 Galileo and the Curve of Swiftest Descent

One of the first minimum problems connected to dynamics was considered by
Galileo in his masterpiece, Two New Sciences. In the scholium (a brief comment)
to Proposition 36, he states that, for a particle falling between two points in a ver-
tical plane, the “swiftest movement of all from one terminus to the other is not
through the shortest line of all which is the straight line [AC]” (see Figure 2.10),
“but through the circular arc” (Galilei, 1638/1974, p. 212). Galileo’s Proposition
36 antecedes the classical “problem of the brachistochrone” – the curve of swiftest
descent among all possible curves connecting two points (see Section 4.1). The
brachistochrone is a cycloid and not the arc of a circle studied by Galileo. Galileo
compares the time of fall for the straight line AC with that of an arc of a circle
and not with all possible curves – at least there is no attempt to do so in
Two New Sciences. Galileo finds his minimum curve (or the minimum subject to
having polygons touching the arc of a circle) through a sequence of lemmas that
follow from Euclidean geometry and his postulate of uniform acceleration. And
his justification of the postulate has the principle of least action in a pre-embryonic
stage.
On the third day of Two New Sciences, Galileo introduces us to the notion of
bodies falling with uniform acceleration. It was known before Galileo that “Nature
does employ a certain kind of acceleration for descending heavy things” (Galilei,
1638/1974, p. 197). But what kind of acceleration was not clear. Galileo tells us
Figure 2.10 The descent of a particle from A to C is faster through the arc of the
circle than through the straight line.
that, through experimentation (naturalia experimenta), he reached the conclusion
that “the intensification of speed is made according to the extension of time,” and
that the distance traveled increases with the square of the time.5 He does not use
the term “uniform” acceleration; he rather talks about “natural” acceleration, and
proposes a metaphysical justification:
we have been led by the hand to the investigation of naturally accelerated motion by con-
sideration of the custom and procedure of Nature itself in all her other works, in the
performance of which she habitually employs the first, simplest, and easiest means . . . .
Thus when I consider that a stone, falling from rest at some height, successively acquires
new increments of speed, why should I not believe that those additions are made by the
simplest and most evident rule? (Galilei, 1638/1974, pp. 153–154).
Galileo studied uniform acceleration by considering the motion on inclined
planes. In one of the first postulates, he assumes that “the degrees of speed acquired
by the same moveable over different inclinations of planes are equal whenever the
heights of those planes are equal.” Later, in 1638, the year of publication of Two
New Sciences (the dialogue parts were written in 1633–1635), Galileo, blind and
under house arrest, added a section expanding on this postulate. In this section he
uses what today we would call the component of the acceleration along the plane.
In contemporary algebraic language (which Galileo never used), the distance d
traveled along an inclined plane is

\[ d = \frac{1}{2}\, g \times \frac{H}{D}\, t^2, \tag{2.10} \]
where g × (H/D) (see Figure 2.11) is the component of the acceleration g along
the plane.6 The total time ttotal to travel the inclined distance is (setting d = D)
5 Proposition II, Theorem II.
6 The only “modern” addition here is to introduce g and the factor 1/2. Strictly speaking, Galileo is saying that
d ∝ t 2 and that the constant of proportionality is in turn proportional to H/D.
Figure 2.11 The total time of descent of a particle falling from rest on an inclined plane
is proportional to $D/\sqrt{H}$.
\[ t_{\text{total}} = \sqrt{\frac{2}{g}} \times \frac{D}{\sqrt{H}}. \tag{2.11} \]
On the other hand, the final velocity $v_{\text{final}}$ is
\[ v_{\text{final}} = g \times \frac{H}{D} \times t_{\text{total}} = \sqrt{2gH} \quad \text{(independent of } D\text{!)} \tag{2.12} \]
Equation (2.12) expresses the content of Galileo’s second postulate of accelerated
motion. And in Proposition I, Theorem I, Galileo states that the time t required to
travel a distance D under uniformly accelerated motion is equal to the time required
to travel the same distance, but at a constant velocity equal to the average between
the initial velocity $v_i$ and the final velocity $v_f$. Galileo shows this geometrically
starting at rest ($v_i = 0$), but the result applies even if the initial velocity is not
zero:7
\[ t = \frac{D}{v_{\text{av}}}, \tag{2.13} \]
where $v_{\text{av}} = (v_i + v_f)/2$.
The particular geometrical relation of equation (2.11), $D/\sqrt{H}$ = constant,
appears in chords of circles (see Figure 2.12), leading Galileo to his “law of
chords:” for an arc of a circle “the times of descent through all chords from the
terminals P and R” (see the right panel of Figure 2.12) “are equal.”
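The law of chords follows directly from equation (2.11), and a short numerical check may help. The sketch below considers chords drawn from the top point of a circle of radius R at an angle φ from the vertical, so that D = 2R cos φ and H = D cos φ. The value of g, the radius, the angles and the function names are assumptions chosen for the illustration.

```python
import math

G = 9.81  # m/s^2, assumed value of the acceleration of gravity

def descent_time(D, H, g=G):
    """Time to slide from rest down an incline of length D and height H,
    equation (2.11)."""
    return math.sqrt(2.0 / g) * D / math.sqrt(H)

def final_speed(H, g=G):
    """Speed at the bottom of the incline, equation (2.12)."""
    return math.sqrt(2.0 * g * H)

def chord_from_top(radius, phi):
    """Length D and vertical drop H of a chord that starts at the top of a
    circle and makes an angle phi with the vertical diameter."""
    D = 2.0 * radius * math.cos(phi)
    H = D * math.cos(phi)
    return D, H

if __name__ == "__main__":
    radius = 1.0
    for phi in (0.0, 0.3, 0.8, 1.2):
        D, H = chord_from_top(radius, phi)
        print(f"phi = {phi:.1f}  D = {D:.4f}  H = {H:.4f}  "
              f"t = {descent_time(D, H):.6f}  v = {final_speed(H):.4f}")
```

All chords give the same descent time, equal to the time of free fall along the vertical diameter, while the final speeds differ because the heights differ.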
Galileo then compares a direct path with a broken path touching a point on an
arc of a circle (see Figure 2.13). If the particle starts at D at rest, it reaches points
F and B at the same time (by the law of chords). Now, at B the particle is moving
faster than at F [by equation (2.12)]. Since the final velocity at C is the same for
7 For accelerated motion $D = v_i t + g t^2/2 \equiv v_i t + (v_f - v_i)t/2$, from which equation (2.13) follows.
Figure 2.12 Galileo’s law of chords. Left: Thales’ theorem: If A, B, and P are
points on a circle, and AB its diameter, the angle APB (equal to α + β) is a
right angle. Center: By similarity of the triangles ABP and PBQ, PB/BQ =
AB/PB: (PB)²/BQ is constant for a circle, and equal to the diameter AB. This
geometrical relation is the same one that appears in the time of fall on an inclined
plane: equation (2.11). Right: Galileo’s law of chords: For all points on the arc of
the circle, $D/\sqrt{H}$ is constant, and the descent time from rest is the same for all
inclined planes.
Figure 2.13 Adapted from Galileo’s Two New Sciences. Point B is on the arc of
circle C B D. A particle starting at rest from D falls faster through the broken path
D B + BC than through the direct, shorter path DC.
both paths,8 the average velocity is larger for path BC than for path FC. Also,9
FC > C B, and from equation (2.13) the time BC is smaller than the time FC: the
broken path is faster.
Galileo then considers the fall through a polygonal path of five sides and con-
jectures that adding more sides, and in this way approaching an arc of a circle,
8 Galileo shows this by extending the path C B to C A. Particles starting at rest from D and A reach B with the
same velocity (they fall from the same height). The velocity at C from the paths D BC and AC is therefore
the same. And since the planes AC and DC fall from the same height, the final velocities at C are
independent of the paths taken.
9 Galileo shows this inequality with a detailed argument in the [Third] Lemma of Proposition 35. Galileo
didn’t number the lemmas, but we follow Drake’s ordering.
Figure 2.14 Descent on a circle, from Galileo’s Two New Sciences.
makes the descent time get shorter. He is probably motivated by the isoperimetric
problem. In previous passages, in the First Day of Two New Sciences, he discusses
Zenodorus’s proof; Sagredo says that he has “seen the proof of this with partic-
ular satisfaction” (Galilei, 1638/1974, p. 62). However, Galileo does not give a
proof that a polygonal path of more sides gives a faster path on an arc of a cir-
cle. “It appears that one can deduce,” he says, “that the swiftest movement is . . .
along the circular arc.” Galileo adds more planes, as shown in Figure 2.14, and
uses the argument of the law of chords to break plane DC into the faster path
D E + EC. But the particle is not starting from rest at D, so the law of chords does
not apply (Erlichson, 1998). A full proof of this problem requires techniques that
were beyond Galileo’s mathematical horizon but that were developed within a few
decades.
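Galileo's conjecture about polygonal paths can be explored numerically. The sketch below is an illustration under stated assumptions, not Galileo's argument: the particle starts from rest at the end of a quadrant of a circle, slides along n equal chords inscribed in the arc down to the lowest point, keeps its speed when passing from one chord to the next, and the time on each chord is computed with the mean-speed rule of equation (2.13). The function name and the parameter values are our own.

```python
import math

def polygon_descent_time(n, radius=1.0, g=9.81, start_angle=math.pi / 2):
    """Descent time from rest along n equal chords inscribed in a circular
    arc that ends at the lowest point of the circle.

    Angles are measured from the downward vertical at the center.
    Assumption: the speed is unchanged at each corner of the polygon.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    total, speed = 0.0, 0.0
    for k in range(n):
        psi0 = start_angle * (n - k) / n
        psi1 = start_angle * (n - k - 1) / n
        x0, y0 = -radius * math.sin(psi0), -radius * math.cos(psi0)
        x1, y1 = -radius * math.sin(psi1), -radius * math.cos(psi1)
        length = math.hypot(x1 - x0, y1 - y0)
        drop = y0 - y1
        new_speed = math.sqrt(speed * speed + 2.0 * g * drop)
        total += length / (0.5 * (speed + new_speed))   # equation (2.13)
        speed = new_speed
    return total

if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(n, polygon_descent_time(n))
```

With one chord the function reproduces the law-of-chords time 2√(R/g); with two and four chords the time comes out shorter, in line with Galileo's comparison of the broken and the direct paths.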
2.4 Bending of Light Rays and Fermat’s Minimum Principle

Reflection on a mirror is not the only way for a light ray to deviate from a perfect
straight line. When a light ray enters a medium like water or glass, the path
also bends. This phenomenon, the refraction of light rays, was also studied by the
Greeks. For the astronomer Ptolemy (ca. 100 AD – ca. 170 AD), the visual rays
“may be altered in two ways: (1) by reflection, i.e. the rebound from objects, called
mirrors, which do not permit penetration, and (2) by bending in the case of media
which permit penetration”(Cohen, 1965). Ptolemy included tables of the change in
angle of a light ray passing from air to glass and from air to water (Mark Smith,
1982), but he was working under the wrong assumption that the angle of refraction
was proportional to the angle of incidence. It was only in the seventeenth century
– but arguably as early as 984 by the Arabic geometer Ibn Sahl (Rashed, 1990)
and secretly by the English astronomer Thomas Harriot (Lohne, 1959; Darrigol,
2012, p. 41) – that the correct law of refraction was published by Willebrord Snell
in 1621 and by Descartes (1637) in his Dioptrique.
Figure 2.15 Descartes’ derivation of the law of refraction. Reproduced from
Descartes (1637).
It is not the angles of incidence and refraction that are proportional; the sines
of the angles, measured with respect to the normal to the refracting surface, are
proportional. Descartes proves that a law of this sort results from a mechanical
model that treats light as particles that change their velocity as they go from one
medium to another. His analogy is the inflection of the motion of a tennis ball
upon entering water (see Figure 2.15). In a particle model, the refraction problem
is treated as analogous to the bouncing (or reflection) of a ball on a surface: the wall
inverts the direction of the velocity perpendicular to the wall, and the component
parallel to the plane remains the same. Descartes assumes that the same happens for
refraction: as the tennis ball penetrates the surface, the component of the velocity
parallel to the plane of the water does not change. Assume (see Figure 2.15) that
the tennis ball changes its velocity from vair to vwater as it penetrates the surface
BE. Since BI and AB are of equal length (Descartes chooses to put these points on
a circle), the travel times tair and twater for the two segments are different and must
satisfy:
vwater × twater = BI = vair × tair = AB. (2.14)
If the projection of the velocities parallel to the water surface, vparallel , is constant,
we have
AH = vparallel × tair (2.15a)

HF = vparallel × twater , (2.15b)
and Descartes obtains the “correct” law:

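\[ \frac{AH}{HF} \equiv \frac{\sin \angle ABH}{\sin \angle GBI} = \frac{t_{\text{air}}}{t_{\text{water}}} = \frac{v_{\text{water}}}{v_{\text{air}}}, \tag{2.16} \]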
or, which is equivalent:
vwater sin θwater = vair sin θair . (2.17)
Descartes obtains the correct law in the sense that the ratio of the sines is con-
stant. However, the experiments showed that light rays bend towards the normal
(AH > HF) as they penetrate the surface of water or glass (and not away from
the normal, as in Figure 2.15). In order to obtain agreement with experiments,
Descartes has to assume that light, like sound, propagates faster in water (or in
any dense medium, like glass) than in air. A radically different approach was fol-
lowed by Pierre de Fermat, who objected to Descartes’ argument that light should
propagate faster in denser media. The difference in the approach was not only in
the models for speed of propagation in denser media. In contrast with Descartes’
mechanistic approach, Fermat’s solution has the spirit of Hero’s treatment, with
Aristotle’s idea that Nature does nothing in vain as its ultimate justification.
In 1657, seven years after Descartes’ death, Fermat received the treatise La
lumière by Marin Cureau de la Chambre, which contains some bold statements
about minimum paths. After discussing the law of reflection, de la Chambre says: “We see
therefore by all this reasoning that the equality of angles in reflection is made
along the shortest lines, and that this is not something particular to Light, since
nature observes the same order in all the movements that she causes” (De la
Chambre, 1662). Shortly afterward, he qualifies his statement: “if nature makes
its movements by the shortest lines, it would be necessary that they be made in
refraction as well.” Fermat replies in agreement: “The principle of physics is that
Nature performs her movements by the most simple paths”(Fermat, 1657/1894). In
a subsequent letter, from 1662, he adds:
There is nothing as probable or apparent as the assumption that Nature always acts by
the easiest means, which is to say either along the shortest lines when time is not a
consideration, or in any case by the shortest time.
As pointed out by Rashed (1970), the Arabic astronomer Ibn al-Haytham (Alhacen
is the Latinized version) used the principle of ‘the simplest way’ to explain
refraction. Ibn al-Haytham thought of light as tiny hard spheres that move in
straight lines and propagate in different media according to their density. The
denser the medium, the greater the resistance to penetration by light (Mark Smith,
2009). Interestingly, de la Chambre knew the Book of Optics of Ibn al-Haytham in
the Risner edition or in Witelo’s version (Rashed, 2016).
Fermat is able to derive the law of refraction by minimizing the travel time for
the light ray from its initial point to its final point. And he does so using his own
“precalculus” mathematical invention, the method of maxima and minima.
2.4.1 Fermat’s Method of Maxima and Minima

In his discussion of maxima and minima, Fermat briefly explains his algorithm and
presents a simple example: for the line AC, find a point E such that AE × EC is
a maximum.
A E C
Note, in passing, that Fermat is solving the “isoperimetric problem” for a rectangle
of half perimeter AC. Call AC = b, and AE = a, with a the length to be found.
The product of the two segments is $a(b - a) = ab - a^2$. Now he changes a by
a small amount e so that the first segment is a + e and the second is b − a − e.
The new product is $ab - a^2 + eb - 2ae - e^2$. At this point Fermat introduces
a concept (adaequare) that he took from Diophantus of Alexandria, and whose
interpretation was the subject of considerable debate (Breger, 1994; Giusti, 2009).
Fermat equates approximately, or sets adequal, the original product to the new one:
\[ eb - 2ae - e^2 \sim 0, \tag{2.18} \]
then divides by e
\[ b - 2a - e \sim 0. \tag{2.19} \]
Finally,10 he sets e = 0 to obtain
\[ a = \frac{b}{2}, \quad \text{or} \quad AE = EC. \tag{2.20} \]
(The rectangle of largest area with a given perimeter is a square.) “It is impossible
to give a more general method,” he says.
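The algebra of equations (2.18) to (2.20) can be followed with exact rational arithmetic. The sketch below is only an illustration of the steps described above; the function names and the sample values of a, b and e are assumptions. It evaluates the quotient [f(a + e) − f(a)]/e for f(a) = a(b − a) and confirms that it equals b − 2a − e, so that setting e = 0 and equating to zero gives a = b/2.

```python
from fractions import Fraction

def product(a, b):
    """Fermat's example: the product AE x EC for AE = a and AC = b."""
    return a * (b - a)

def adequality_quotient(a, b, e):
    """[f(a + e) - f(a)] / e for f(a) = a(b - a), evaluated exactly."""
    return (product(a + e, b) - product(a, b)) / e

def stationary_point(b):
    """Value of a at which b - 2a, the quotient with e set to zero, vanishes."""
    return Fraction(b) / 2

if __name__ == "__main__":
    b, a = Fraction(10), Fraction(3)   # hypothetical sample values
    for e in (Fraction(1), Fraction(1, 10), Fraction(1, 1000)):
        print(e, adequality_quotient(a, b, e), b - 2 * a - e)
    print("stationary point:", stationary_point(b))
```

The two printed columns agree for every e, and the product at a = b/2 is at least as large as at any other point of the segment that one cares to try.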
Fermat’s method of finding the maximum of a function f (a) (equal to a(b − a)
in his introductory example) consists of finding the tangent of the function and
identifying the point a at which that tangent is equal to zero (see Figure 2.16).
Fermat’s method of tangents is very close to the infinitesimal calculus later devel-
oped by Newton and Leibniz. Newton, in his correspondence, acknowledges that
he obtained a hint for his method (of fluxions) “from Fermat’s way of drawing tan-
gents,” and that he just made it more general (Sabra, 1981, p. 144). In contemporary
mathematical language, Fermat’s method is to expand the ratio
\[ \frac{f(a + e) - f(a)}{e} \tag{2.21} \]
in powers of e and to take the constant term: he is finding $f'(a)$, and then setting
it equal to zero to locate the extremum of the function. Fermat never spoke of
e as an infinitesimal, although it was a concept he knew from Galileo (Galilei,
10 In a parenthetical comment, Fermat adds that he divides by e “or by the highest common factor of e.”
Figure 2.16 Fermat’s method of tangents. Left: The tangent to a curve, or the
slope of the curve, is given by [ f (a + e) − f (a)]/e when e is set to zero. At
a maximum M, the tangent is zero. Right: The tangent to a curve is zero for
“stationary” points: maxima like A or C, minima like B, or points like D which
are neither maxima nor minima.
1638/1974, p. 54). As told by Alexander (2014), in 1632 (at the time when Galileo
was being tried for defending heliocentrism), Jesuit clergymen banned the use of
infinitesimals. This might be the reason why Fermat, a staunch Catholic, does not
mention the term explicitly (Bascelli et al., 2014).
Fermat applies his method to several examples, including functions that include
square roots. As an example, let us apply Fermat’s method of tangents to the
function $f(a) = \sqrt{a}$. We first need $\sqrt{a + e} - \sqrt{a}$. Multiplying and dividing by
$\sqrt{a + e} + \sqrt{a}$, we obtain
\[ \sqrt{a + e} - \sqrt{a} = \left(\sqrt{a + e} - \sqrt{a}\right) \times \frac{\sqrt{a + e} + \sqrt{a}}{\sqrt{a + e} + \sqrt{a}} = \frac{e}{\sqrt{a + e} + \sqrt{a}}. \tag{2.22} \]
Dividing by e and setting e = 0, we obtain $1/(2\sqrt{a})$ for the tangent, or the slope of
$\sqrt{a}$.
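The step of equation (2.22) has a practical counterpart in numerical work: once the difference of square roots has been rationalized, e can be set to zero without dividing by zero. The sketch below compares the rationalized quotient with the direct one; the function names and the sample values are our own.

```python
import math

def sqrt_quotient(a, e):
    """Rationalized form of [sqrt(a + e) - sqrt(a)] / e, from equation (2.22).

    It remains well defined when e is set to zero."""
    return 1.0 / (math.sqrt(a + e) + math.sqrt(a))

def naive_quotient(a, e):
    """Direct difference quotient; e must be nonzero."""
    return (math.sqrt(a + e) - math.sqrt(a)) / e

if __name__ == "__main__":
    a = 4.0   # hypothetical sample value
    for e in (1.0, 1e-2, 1e-4, 1e-6):
        print(e, naive_quotient(a, e), sqrt_quotient(a, e))
    print("e = 0:", sqrt_quotient(a, 0.0), 1.0 / (2.0 * math.sqrt(a)))
```

As e shrinks, both quotients approach the value obtained by setting e = 0 in the rationalized form, which is the slope 1/(2√a).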
In his letter of 1662, Fermat tells Cureau de la Chambre that his derivation of
the law of refraction is long and tedious and “involves four lines by their square
roots.” In his Analyse pour les réfractions, Fermat (1657/1894b) indicates where
these square roots come from. The figure he uses is very similar to the one used by
Descartes in his tennis ball analogy. He puts the initial and final points, C and I ,
on a circle (see Figure 2.17). Just as in Descartes’ derivation, the minimum path
goes through the center D of the circle, and the unknown is the relation between
the segments D H and F D. For simplicity, let us take the radius of the circle
C D = D I = 1. And let us use Fermat’s notation: D H = a, D F = b. If we call v1
and v2 the velocities in the upper and lower media, the time for the minimum path is
\[ T_{\min} = \frac{CD}{v_1} + \frac{DI}{v_2} \equiv \frac{1}{v_1} + \frac{1}{v_2}. \tag{2.23} \]
Figure 2.17 From Fermat’s Analyse pour les réfractions (left). On the right, we
indicate the angles of incidence (θ1 ) and refraction (θ2 ).
Fermat writes his equation in terms of what he calls the “resistances” in each
medium, which will be just the inverse velocities 1/v1 and 1/v2 . His resistances are,
in the contemporary notation, the indices of refraction of each medium. In order to
apply his method of maxima and minima, Fermat considers the nearby path C O I ,
where O D = e, the quantity later to be set to zero. The time Te for this path is
\[ T_e = \frac{CO}{v_1} + \frac{OI}{v_2} = \frac{\sqrt{1 - 2eb + e^2}}{v_1} + \frac{\sqrt{1 + 2ea + e^2}}{v_2}. \tag{2.24} \]
In order to apply his method of tangents, we need to equate Tmin = Te , divide
by e, and set e equal to zero. Fermat does not provide the details of his “tedious”
calculation, but they are relatively easy (although somewhat lengthy) to carry out.
A reconstruction of the steps followed by Fermat, squaring twice in order to elim-
inate the square roots, was carried out by Sabra (1981) and Andersen (1983). We
offer a simpler derivation, in the spirit of the method of tangents, that Fermat could
well have followed.
The difference in travel times is
\[
T_e - T_{\min} = \frac{\sqrt{1 - 2eb + e^2} - 1}{v_1} + \frac{\sqrt{1 + 2ea + e^2} - 1}{v_2}. \tag{2.25}
\]
This is equivalent to
\[
T_e - T_{\min} = \frac{1}{v_1}\,\frac{-2eb + e^2}{\sqrt{1 - 2eb + e^2} + 1} + \frac{1}{v_2}\,\frac{2ea + e^2}{\sqrt{1 + 2ea + e^2} + 1}. \tag{2.26}
\]
Dividing by e and setting e = 0, we obtain
\[
\frac{T_e - T_{\min}}{e} \sim -\frac{b}{v_1} + \frac{a}{v_2} = 0, \tag{2.27}
\]
and, since b/a = sin θ1 / sin θ2 (see Figure 2.17), we obtain
\[
\frac{1}{v_1} \sin\theta_1 = \frac{1}{v_2} \sin\theta_2. \tag{2.28}
\]
The result has the same functional form, the same dependence on the angles of
incidence and refraction, as the one obtained by Descartes, but the velocities are
reversed.11 If Fermat’s method is correct, then the speed of light should be lower
in denser media.
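Fermat's result can be illustrated by minimizing the travel time directly, without differentiating. The sketch below is our own and makes several assumptions that are not in the text: the source sits at height h1 above a flat interface, the target at depth h2 below it and a horizontal distance d away, and the minimum is located by a simple ternary search. The names `travel_time`, `fastest_crossing` and `snell_sides`, and all numerical values, are hypothetical.

```python
import math

def travel_time(x, h1, h2, d, v1, v2):
    """Time from (0, h1) to (d, -h2), crossing the interface y = 0 at (x, 0)."""
    return math.hypot(x, h1) / v1 + math.hypot(d - x, h2) / v2

def fastest_crossing(h1, h2, d, v1, v2, steps=200):
    """Ternary search for the crossing point of least time on [0, d]."""
    lo, hi = 0.0, d
    for _ in range(steps):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if travel_time(m1, h1, h2, d, v1, v2) < travel_time(m2, h1, h2, d, v1, v2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def snell_sides(x, h1, h2, d, v1, v2):
    """Return (sin(theta1)/v1, sin(theta2)/v2) for the path crossing at x."""
    sin1 = x / math.hypot(x, h1)
    sin2 = (d - x) / math.hypot(d - x, h2)
    return sin1 / v1, sin2 / v2

# Illustrative values: light is slower in the lower medium.
h1, h2, d, v1, v2 = 1.0, 1.0, 2.0, 1.0, 0.75
x_min = fastest_crossing(h1, h2, d, v1, v2)
left, right = snell_sides(x_min, h1, h2, d, v1, v2)
print(x_min, left, right)  # the two sides of equation (2.28) should nearly agree
```

The search never uses equation (2.28); the agreement of the two printed quantities is a consequence of the minimization, up to the limited precision of the search.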
Fermat’s approach met with a serious objection from Claude Clerselier, an expert
in optics who edited the works of Descartes. For Clerselier, the principle used by
Fermat “is merely a moral, and not a physical one, which is not and cannot be
the cause of any natural effect” (Fermat, 1657/1894, p. 465). The objection is rea-
sonable: how, and why, does Nature choose a path between an initial and a final
point? Why does she choose time as the quantity to be minimized? If the princi-
ple of economy is a fundamental law of Nature, then there are final causes in the
universe. Clerselier defended the mechanistic, Cartesian point of view: given an
initial direction, a light ray will keep moving in a straight line until it reaches the
refracting surface, and at that point it will change direction. Here is Fermat’s rather
sarcastic conclusion to a letter he wrote on 21 May 1662 to Clerselier, a proponent
of Descartes’ approach (Sabra, 1981, p. 154).
I do not pretend, nor have I ever pretended to be in the inner confidence of Nature. She has
obscure and hidden ways which I have never undertaken to penetrate. I would only have
offered her a little geometrical aid on the subject of refraction, should she have been in
need of it. But since you assure me, Sir, that she can manage her affairs without it and that
she is content to follow the way that has been prescribed to her by M. Descartes, I willingly
hand over to you my alleged conquest of physics; and I am satisfied that you allow me to
keep my geometrical problem – pure and in abstracto, by means of which one can find the
path of a thing moving through two different media and seeking to complete its movement
as soon as it can.
2.4.2 Huygens’ Simplified Derivation of Snell’s Law

In his Treatise on Light, Christiaan Huygens (1690/1945) gives a simple and
elegant geometric version of Fermat’s proof.
11 The Austrian mathematician Georg von Peuerbach published sine tables in 1450; but the symbol “sin” for
the sine of the angle was used for the first time by Cavalieri in 1634, twenty-one years before Fermat’s
publication of the law of refraction.
Figure 2.18 From Huygens’ Treatise on Light.
Huygens starts with a path ABC (see Figure 2.18) that satisfies Snell’s law: the
sines of the angles θ1 = ∠ABP and θ2 = ∠CBQ are in the same ratio as the
velocities in the upper and the lower media. Then he shows that any path AFC
obtained from ABC by shifting the refraction point to the right, from B to F, has a
longer travel time than ABC. And the same is true for any path AKC obtained from
ABC by shifting the refraction point to the left, from B to K. So he concludes that
the path that satisfies Snell’s law minimizes the travel time between A and C. His
proof is indeed simple. Consider the shift to the right first. Huygens draws the two
right triangles BHF and BGF, with their common hypotenuse being the shift BF.
The distance traveled by the ray in medium 1 increases by HF, while in medium 2
it changes by FC − (BG + GC). If we call, as before, v1 the velocity in the upper
medium, and v2 the velocity in the lower medium, the change in time between AFC
and ABC is
\begin{align}
T_{AFC} - T_{ABC} &= \frac{HF}{v_1} - \frac{BG}{v_2} + \frac{FC - CG}{v_2} \notag \\
&= FB \left\{ \frac{\sin\theta_1}{v_1} - \frac{\sin\theta_2}{v_2} \right\} + \frac{FC - CG}{v_2}, \tag{2.29}
\end{align}
where we used the geometrical construction of Figure 2.18: HF/FB = sin θ1 and
BG/FB = sin θ2. The same analysis applies to the path AKC:
\[
T_{AKC} - T_{ABC} = KB \left\{ \frac{\sin\theta_2}{v_2} - \frac{\sin\theta_1}{v_1} \right\} + \frac{AK - AL}{v_1}. \tag{2.30}
\]
If the path obeys Snell’s law, the expression in the curly brackets in equations (2.29)
and (2.30) is zero. Huygens notes that FC>CG, since FC is the hypotenuse of the
right triangle FGC and, for the same reason, AK>AL. The difference in times is
always larger than zero for arbitrary displacements to the left and to the right of B:
the path that satisfies Snell’s law is a minimum.
To connect Huygens’ approach to Fermat’s method of tangents,
we repeat Huygens’ analysis starting from a path that is not a minimum. In order
to relate to Fermat’s notation, call FB = e, and FG = e cos θ2. From Figure 2.18
we have
\[
FC - CG = \frac{FC^2 - CG^2}{FC + CG} = \frac{e^2 \cos^2\theta_2}{FC + CG}. \tag{2.31}
\]
Substituting equation (2.31) in (2.29), we obtain
\[
T_{AFC} - T_{ABC} = e \left\{ \frac{\sin\theta_1}{v_1} - \frac{\sin\theta_2}{v_2} \right\} + \frac{e^2 \cos^2\theta_2}{v_2\,(FC + CG)}, \tag{2.32}
\]
and dividing by e and setting e = 0, we obtain Snell’s law (note that, for e = 0,
FC = CG). One nice feature about Huygens’ treatment is that he is not considering
infinitesimal displacements from the reference path. Instead, he shows that any
path that deviates from the one obeying Snell’s law takes a longer time. This is
true because, as we move the refraction point along the interface between the two
media, the time taken has only one minimum; the “time” function is convex (see
Figure 2.19).
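Huygens' claim concerns finite shifts of the refraction point, and this too can be illustrated numerically. The sketch below is our own construction under assumed values (flat interface, heights H1 and H2, horizontal separation D, speeds V1 and V2, all hypothetical). It first locates the point that obeys Snell's law by bisection, then checks that every finite shift to the left or to the right lengthens the travel time, and that the second differences of the time function are positive.

```python
import math

# Illustrative geometry and speeds (assumptions, not values from the text).
H1, H2, D, V1, V2 = 1.0, 1.0, 2.0, 1.0, 0.75

def travel_time(x):
    """Time from (0, H1) to (D, -H2) when the ray crosses the interface at x."""
    return math.hypot(x, H1) / V1 + math.hypot(D - x, H2) / V2

def snell_mismatch(x):
    """sin(theta1)/v1 - sin(theta2)/v2 for the path crossing at x."""
    return x / (V1 * math.hypot(x, H1)) - (D - x) / (V2 * math.hypot(D - x, H2))

def snell_point(steps=100):
    """Bisection for the crossing point that satisfies Snell's law."""
    lo, hi = 0.0, D  # the mismatch is negative at 0 and positive at D
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if snell_mismatch(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x_snell = snell_point()
t_snell = travel_time(x_snell)

# Finite shifts to the left (like K) and to the right (like F) of the Snell point.
shifts = [k * 0.05 for k in range(-60, 61) if k != 0]
excess = [travel_time(x_snell + s) - t_snell for s in shifts]

# Second differences: positive values indicate a convex time function.
h = 0.01
curvature = [
    travel_time(x_snell + s - h) - 2.0 * travel_time(x_snell + s) + travel_time(x_snell + s + h)
    for s in shifts
]
print(min(excess) > 0.0, min(curvature) > 0.0)
```

A grid check of this kind is of course not a proof; it only illustrates what Huygens establishes geometrically.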
In his Lectures on Physics, Richard Feynman (1963) presents a nice synthesis of
Fermat’s and Huygens’ geometrical argument. We reproduce Feynman’s version
in Figure 2.19.
Figure 2.19 Adapted from Feynman’s Lectures on Physics. The refracted ray
ABC is displaced to AFC. If BF is much smaller than AB and BC, then
AB ≈ AD and GC ≈ FC. The path length changes by DF − BG, while the
change in time is DF/v1 − BG/v2 = BF(sin θ1/v1 − sin θ2/v2). Here θ1 is the
angle of the ray AB with the normal N1, and θ2 is the angle of the refracted ray
BC with the normal N2. For a stationary point the change in time is zero, and we
obtain Snell’s law, sin θ1/v1 = sin θ2/v2.
2.5 Newton and the Solid of Least Resistance*

In Proposition 34 of the second book of his masterpiece, Philosophiæ Naturalis
Principia Mathematica (Principia for short), Isaac Newton proposes and solves
“the first genuine problem in the calculus of variations” (Goldstine, 1980). In this
section of the Principia, which has a strong practical flavor, Newton considers
the resistance experienced by a solid of revolution as it moves through a gas of
“equally spaced” particles. Proposition 34 contains a theorem and a scholium with
three statements of which Newton offers no proof in the Principia. The proofs
of two of these statements appeared later. In the translation of the Principia into
English by Andrew Motte (1729), at the end of the proposition, we read:
The demonstration of these curious Theorems being omitted by the author, the analysis
thereof, communicated by a friend, is added at the end of this volume.
Later, in 1888, in an appendix to “The Portsmouth Collection of Newton’s
Unpublished Papers” (Hall and Hall, 1888/1962), the same proofs appear (with
some notational differences). The Portsmouth collection contains notes and
correspondence on various subjects that, after Newton’s death, came into the possession
of “Mr Conduitt, who had married Catherine Barton, Newton’s favorite and
accomplished niece” (Hall and Hall, 1888/1962, p. ix). Their only child married
a lord of the Portsmouth family and, in 1888, her descendant who was the Earl of
Portsmouth at that time, presented to the University of Cambridge the “portion of
the papers and correspondence which related to science.” The papers, for the most
part, presented little scientific novelty, with the exception of subjects related to the
lunar theory, the theory of atmospheric refraction, and the solid of least resistance.
In the preface, the editors of the Portsmouth Collection remark:
It is well known that in the Principia Newton determines the form of the solid of least
resistance, thus affording the first example of a class of problems which we now solve by
means of the Calculus of Variations.
Newton proves that the resistance of a sphere is half the resistance of a cylinder
of the same diameter, moving in the direction of its axis. In the first statement of
the scholium, he finds the angle of a truncated cone (a frustum of a cone) “which
should meet with less resistance than any other frustum constructed with the same
base and altitude.” In the second statement, he remarks that the resistance of an
ellipsoid or an oval of revolution is lowered if the front is replaced by a frustum
forming an angle of 135°. He proposes that this property “may be of use in the
building of ships.” And in the third somewhat puzzling statement, he gives – in
geometric form – the differential equation of the solid of least resistance. To the
best of our knowledge, this is the first occurrence of a curve defined by means of
a differential equation. In the following paragraphs, we will visit Proposition 34,
with particular focus on Newton’s variational calculation which anticipates Euler’s
treatment of variational calculus of 1744.
Motivated by a mundane application that amounts to finding the optimal shape
of a bullet, Newton presents what is “probably the most sophisticated calculation
in the Principia” (Chandrasekhar, 1995, p. 567).
2.5.1 The Sphere and the Cylinder

Newton assumes that, in its motion, the surface of the solid collides elastically,
and only once, with each of the particles of the gas. So it suffices to consider the
collision of a single particle with the solid. Newton starts by using the principle of
Galilean relativity, which he had discussed previously in the Principia: a solid
moving at velocity v and striking a particle at rest is equivalent to a particle
moving at velocity −v and striking the solid at rest. Newton here is
invoking Corollary 5 of his well-known laws of motion: “The motions of bodies
included in a given space are the same among themselves, whether that space is
at rest or moves uniformly forwards in a right line without any circular motion.”
He then proceeds to analyze, geometrically, the effect of one particle incident on
a cylinder and on a sphere. The resistance of the solid will be proportional to the
momentum transfer from the particle to the solid. Consider the momentum of the
particle whose direction is parallel to the axis of symmetry of the cylinder, to be
represented by the line BL (see Figure 2.20), a distance EC from the axis. Newton
takes BL to be the same length as CB, the radius of the sphere. The momentum
and the radius have different units, and there is no reason to take them as having
the same length. But since Newton is trying to find a ratio between the resistance
of a cylinder and that of a sphere, it is very convenient to take the momentum and
the radius of the sphere as equal. After the collision, the particle “reflects” from the
surface of the sphere due to the effect of a momentum perpendicular to the tangent
to the sphere. The change in momentum is in the direction of the line LD (and
the magnitude is 2 × LD). From Newton’s third law (“for every action there is an
equal and opposite reaction”), after the collision, the sphere receives a momentum
proportional to −2 × LD along the direction of LD. Only the component of this
momentum along the axis CD will have an effect on the resistance. The component
perpendicular to the axis will be cancelled by the component of momentum
change perpendicular to the axis due to a collision located symmetrically below
the axis. Since the triangles BLD and CBE are congruent, the component of LD
along BL is LD × EB/CB = EB²/CB. On the other hand, the effect of the
particle on the face of the cylinder, represented by the square in Figure 2.20, is
always proportional to −2 × BL on all points of the (circular) cross section of the
cylinder. In conclusion, the effect of the particle on the sphere depends on the
distance EC to the axis, and the ratio of the effects on the sphere and the cylinder is
EB²/BC² = sin² θ. Since both the cylinder and the sphere are solids of revolution,
the ratio of the resistance of the sphere to the resistance of the cylinder is the ratio
of the volume of a paraboloid (as indicated in Figure 2.20) to the volume of the
circumscribing cylinder. Here, Newton states: “and it is known that a paraboloid is
half its circumscribed cylinder.” This result is easy to prove using calculus, which
Newton himself invented. But, in stating that the result is known, he is referring to
proofs from Greek geometry, and most probably to Archimedes’ proof. It is
interesting that this proof (which we present in Figure 2.21) is based not on the axioms
of geometry, but on Archimedes’ law of the lever. The law is derivable from
Newton’s second law of motion but is otherwise not justified, as emphasized by the
philosopher Ernst Mach, who has argued that the proof provided by Archimedes
(and later revisited by Huygens) is circular (Mach, 1960, pp. 11–28).12 In Figure
2.22 we offer a much simpler proof using a symmetry argument.
Figure 2.20 Copied from Newton’s Principia, Proposition 34, Book II (θ = ∠LBD = ∠ECB).
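The factor of one half can also be recovered by a direct sum over rings, which is the calculus proof in discretized form. The sketch below is ours, not Newton's: it assumes a sphere of radius R, writes the relative effect of a particle striking at distance r from the axis as EB²/CB² = 1 − r²/R², and weights it by the area of a thin ring. The name `resistance_ratio` and the number of rings are hypothetical choices.

```python
import math

def resistance_ratio(radius=1.0, rings=100_000):
    """Ratio of the resistance of a sphere to that of a cylinder of equal radius.

    Each thin ring of the cross section, at distance r from the axis,
    contributes its area times EB^2 / CB^2 = 1 - (r / radius)^2 on the sphere,
    and its full area on the flat face of the cylinder.
    """
    dr = radius / rings
    sphere = 0.0
    for i in range(rings):
        r = (i + 0.5) * dr                      # midpoint of the ring
        ring_area = 2.0 * math.pi * r * dr
        sphere += (1.0 - (r / radius) ** 2) * ring_area
    cylinder = math.pi * radius ** 2
    return sphere / cylinder

print(resistance_ratio())  # close to 1/2
```

The sum over rings plays the role of the volume of the paraboloid, and the area of the disc that of the circumscribing cylinder.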
In 1841, most likely inspired by the Principia (Gould, 1985), Edgar Allan Poe
wrote “A Descent into the Maelstrom.” In the story, the protagonist survives a storm
using knowledge of the resistance of solids that he had received from a
schoolmaster; but he states that the sphere offers less resistance than the cylinder (Poe,
1975, p. 138):
a cylinder, swimming in a vortex, offered more resistance to its suction, and was drawn in
with greater difficulty than an equally bulky body, of any form whatever.
Aside from the fact that one should not demand scientific rigor in a work of
fiction, Newton’s treatment applies to regimes of high velocities, where the resis-
tance is independent of the viscosity of the fluid. That is unlikely to be the case for
cylinders in water with vortices.
12 Recent scholarship revisits Archimedes’ proof and questions Mach’s arguments (Palmieri, 2008).
Figure 2.21 Archimedes’ proof that the volume V_P of a paraboloid of revolution
is one half of the volume V_C = π D² × H/4 of the circumscribing cylinder of
diameter D and height H. The paraboloid and the cylinder are sliced up into a very
large number of parallel thin discs. For a disc of diameter MN a distance x from
the vertex of the paraboloid, we have MN²/x = D²/H, or (1/4)(π × MN²) × H =
(1/4)(π × D²) × x: the “torque” of the disc of the paraboloid at a (fixed) distance H
from an imaginary fulcrum equals the torque of a disc of the cylinder a distance
x from the fulcrum. The torque of the cylinder is therefore the same as the torque
of the whole paraboloid concentrated at a point a distance H from the fulcrum.
Since the center of gravity of the cylinder is at a distance H/2 from the fulcrum,
V_P = V_C/2.
Figure 2.22 A paraboloid is inscribed in a cylinder of radius R and height R
(left). The volume of the cylinder is π R³. The axis of revolution for both the
cylinder and the paraboloid is AB. In the spirit of Archimedes’ proof, divide the
paraboloid into a large number of discs of radii r = √(Ry), equispaced along
the axis of revolution (the y direction). Since the area of a disc at a height y is
πr² = π Ry, the sum of the areas of those discs is equal to the sum of the lengths
shown in the figure on the right. The volume of the paraboloid is then equal to
the area of the triangle of base π R² and height R, which equals π R³/2, half the
volume of the cylinder.
2.5.2 An Application “in the Building of Ships”

In the first statement of the scholium, Newton poses the following problem. For
a truncated cone or frustum, fix the height (OD) and radius of the base (OC), as
in Figure 2.23. What is the value of the angle θ of the frustum that minimizes the
resistance? Newton states the solution, without indicating the proof, in geometric
language: the angle is such that CQ = QS, where Q is the midpoint of the height
and S is the vertex of the cone. More specifically, the angle is such that
\[
OS = \frac{OD}{2} + \sqrt{\left(\frac{OD}{2}\right)^2 + CO^2}. \tag{2.33}
\]
Figure 2.23 First figure in the scholium to Proposition 34 in Newton’s Principia.
The proof is simple using elementary calculus. We adapt here Newton’s proof –
he uses calculus in this case – as presented in Whiteside (1974). Since all the points
of the cone form the same angle θ with the axis of symmetry (tan θ = OC/OS),
the resistance of the frustum is the sum of two terms: the resistance of a disc of
radius FD (proportional to the area of the disc), and the resistance of the conical
surface (proportional to the product of the area of the annulus bounded by the
circles of radii FD and OC with sin² θ):
\begin{align}
R &= C\pi \left[ \left( OC^2 - DF^2 \right) \sin^2\theta + DF^2 \right] \tag{2.34} \\
&= C\pi \left[ OC^2 \sin^2\theta + DF^2 \cos^2\theta \right], \tag{2.35}
\end{align}
and, since OC − DF = OD tan θ, we have
\begin{align}
R &= C\pi \left[ OC^2 \sin^2\theta + (OC - OD \tan\theta)^2 \cos^2\theta \right] \tag{2.36} \\
&= C\pi \left[ OC^2 + OD \left( OD \sin^2\theta - 2\,OC \sin\theta \cos\theta \right) \right]. \tag{2.37}
\end{align}
The angle of the cone of minimum resistance is determined by the condition
dR/dθ = 0, which we write, after very little algebra, as
\[
\cot^2\theta - \frac{OD}{OC} \cot\theta - 1 = 0, \tag{2.38}
\]
and
\[
\cot\theta = \frac{OS}{OC} = \frac{OD}{2\,OC} + \sqrt{\left(\frac{OD}{2\,OC}\right)^2 + 1}, \tag{2.39}
\]
which is Newton’s statement as expressed in equation (2.33).
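As a check on the algebra, the bracket of equation (2.37) can be minimized numerically and compared with equation (2.39). This is a minimal sketch under our own assumptions: the constant Cπ is dropped since it does not affect the location of the minimum, the search is a plain ternary search on (0, π/2), and the function names and the sample values of OD and OC are hypothetical.

```python
import math

def frustum_resistance(theta, od, oc):
    """Bracket of equation (2.37); the overall constant C*pi is dropped."""
    return oc ** 2 + od * (od * math.sin(theta) ** 2
                           - 2.0 * oc * math.sin(theta) * math.cos(theta))

def optimal_angle(od, oc, steps=200):
    """Ternary search for the angle of least resistance on (0, pi/2)."""
    lo, hi = 0.0, 0.5 * math.pi
    for _ in range(steps):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if frustum_resistance(m1, od, oc) < frustum_resistance(m2, od, oc):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def cot_from_newton(od, oc):
    """Right-hand side of equation (2.39)."""
    u = od / (2.0 * oc)
    return u + math.sqrt(u * u + 1.0)

for od, oc in ((1.0, 1.0), (2.0, 1.0), (0.01, 1.0)):
    theta = optimal_angle(od, oc)
    # The numerical cot(theta) should agree with equation (2.39);
    # for small OD the angle in degrees approaches 45.
    print(od, oc, math.degrees(theta), 1.0 / math.tan(theta), cot_from_newton(od, oc))
```

The ternary search is adequate here because the bracket of (2.37) has a single stationary point on the interval considered.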
Figure 2.24 Second figure in the scholium to Proposition 34.
Note that, for an infinitesimal cone, for which OD → 0, we have θ = π/4: the optimal
angle for an infinitesimal cone is 45° (or 135°, as indicated by Newton, if the
angle is measured with respect to the positive x axis). In order to verify that this is
a minimum (and not a maximum), we take OD ≪ OC in equation (2.37):
\[
R \simeq C \left( OC^2 - OD \times OC \sin 2\theta \right), \tag{2.40}
\]
which is minimized for θ = π/4. Newton notes that the angle θ is acute; cot θ > 1,
as can be seen from equation (2.39), and he states (see Figure 2.24):
it follows, that, if the solid ADBE be generated by the convolution of an elliptical or oval
figure ADBE about its axis AB, and the generating figure be touched by three right lines
FG, GH, HI, in the points F, B, and I, so that GH shall be perpendicular to the axis
in the point of contact B, and FG, HI may be inclined to GH in the angles FGB, BHI
of 135 degrees: the solid arising from the convolution of the figure ADFGHIE about the
same axis AB will be less resisted than the former solid; if so be that both move forward
in the direction of their axis AB, and that the extremity B of each go foremost. Which
Proposition I conceive may be of use in the building of ships.
Newton does not offer a proof of this statement. We will prove it below. But
before that we mention Newton’s remarkable statement about a general solid of
least resistance. Given the previous statement, the optimal solid has to end with a
slope of 45° and to have a circular “nose.” Call BG the radius of the nose, as in
Figure 2.24, and write the slope at M in terms of BG as GB/BR. Newton states
that at every point N of the curve the following relation holds:
\[
MN \times \frac{BR \times GB^2}{GR^4} = \frac{1}{4}. \tag{2.41}
\]
Newton does not prove this statement in the Principia, but a proof appears
in an Appendix to Motte’s translation and in unpublished papers including the
manuscripts of the Portsmouth Collection. His calculation is a notable anticipation
of Euler’s treatment of variational calculus. Note that equation (2.41), written in a
geometrical language, is a relation between the slope and the height of the curve
that can be cast in modern notation as a differential equation. So Newton is not
giving us the form of the curve, but a differential equation satisfied by the curve.
If we call x the coordinate along the axis of revolution and make the following
identifications,
\begin{align}
MN &= y, \tag{2.42a} \\
\frac{GB}{BR} &= -\frac{dy}{dx} \equiv -y', \tag{2.42b} \\
\frac{GB}{GR} &= -\frac{dy}{\sqrt{dx^2 + dy^2}} = -\frac{y'}{\sqrt{1 + y'^2}}, \tag{2.42c}
\end{align}
equation (2.41) becomes
\[
\frac{y\,y'^3}{\left(1 + y'^2\right)^2} = -\frac{GB}{4}, \tag{2.43}
\]
with GB a constant, the radius of the circular front, which in turn equals the value
of y when the slope is y′ = −1.
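Equation (2.43) can be explored with a parametrization by the slope. The closed forms used below are not given in the text; they are our own sketch, obtained by writing y′ = −p, solving (2.43) for y as a function of p, and integrating dx = dy/y′ with the assumed convention x = 0 at the nose (p = 1). The names `y_of_p`, `x_of_p` and `slope_by_differences`, and the choice GB = 1, are hypothetical.

```python
import math

def y_of_p(p, gb=1.0):
    """Height y where the slope is y' = -p, solved from equation (2.43)."""
    return gb * (1.0 + p * p) ** 2 / (4.0 * p ** 3)

def x_of_p(p, gb=1.0):
    """Abscissa from integrating dx = dy / y', with x = 0 at the nose (p = 1).

    This closed form is our own integration, not a formula from the text.
    """
    return 0.25 * gb * (1.75 - 0.75 / p ** 4 - 1.0 / p ** 2 - math.log(p))

def slope_by_differences(p, gb=1.0, h=1e-6):
    """Finite-difference estimate of dy/dx along the parametrized curve."""
    dy = y_of_p(p + h, gb) - y_of_p(p - h, gb)
    dx = x_of_p(p + h, gb) - x_of_p(p - h, gb)
    return dy / dx

for p in (1.0, 0.8, 0.5, 0.3):
    yp = slope_by_differences(p)
    lhs = y_of_p(p) * yp ** 3 / (1.0 + yp ** 2) ** 2
    # yp should be close to -p, and lhs close to -GB/4 as in equation (2.43).
    print(p, x_of_p(p), y_of_p(p), yp, lhs)
```

At p = 1 the sketch returns y = GB, in agreement with the statement that GB is the value of y where the slope is −1.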
For the proof of the statement that the solid of least resistance always has an
angle θ < 45°, consider the curve FB of Figure 2.25. The (infinitesimal) segment
ab of the curve generates a conical surface upon revolution about the axis
OP. If the segment is broken into two segments ac and cb, where ac is at 45°
with the horizontal, the resistance decreases (the optimal infinitesimal frustum has
θ = 45°). From equation (2.40), the change ΔR is the difference between the
resistances of the solids of revolution generated by the trapezia macn and mabn:
ΔR = C × am × mn(sin 2θ0 − 1), where θ0 is the angle of ab with the horizontal.
Now consider the surfaces S1 and S2 obtained respectively by revolving the broken
lines a′ac and a′c′c. Surface S2 is obtained from S1 by “lifting” the segment
ac to the line FG (at 45° with the axis OP). This lifting process decreases the
resistance by (one half) the difference of the resistance of the annuli obtained by
revolving the segments ad and a′d′. (Recall that the resistance of a conical surface
that forms an angle of 45° with the axis of revolution is half the resistance
of the annulus projected by the surface on the plane perpendicular to the direction
of propagation.) Now imagine dividing the whole curve FB into infinitesimal
segments and sequentially performing this lifting process from left to right from
F, where the curved surface intersects the cone FG (the surface having a slope θ
equal to or larger than 45° at that point). As a result, we obtain the truncated cone
FGB with lower resistance, as stated by Newton. This proof is inspired by (but
different from) the one presented by Whiteside (1974, p. 663) in a paragraph that
concludes, “While no record of Newton’s own demonstration of this inequality has
survived, we have no reason to think that it could have been greatly different in its
structure.”
Figure 2.25 Adapted from Whiteside (1974).
Now we turn to Newton’s anticipatory treatment of the calculus of variations.
2.5.3 The First Genuine Variational Calculation

Newton’s variational calculation of the solid of least resistance is a clear anticipa-
tion of Euler’s method: rather than finding the value x for which a function f (x)
is a minimum, the problem here is to find a function – let’s call it y(x) – that mini-
mizes a given integral of a function of y (and derivatives of y) over a certain range
of y. In particular, Newton is interested in the functional form y(x) defining a solid
of revolution that minimizes the resistance of the (whole) solid. Newton’s reason-
ing is to vary the function at arbitrary points in the domain x and to ask that, for
small values δ f of the variation, the resistance not change. In Figure 2.26 we show
a rendering of Newton’s variation procedure, adapted from the figure in Cajori’s
edition (1729/1934, p. 659) of the Principia. Newton takes two points, N and g, in
the curve of the solid of revolution, and displaces them horizontally, to N′ and g′
respectively, so that NN′ = gg′ is an infinitesimally small quantity. He chooses to
displace the points horizontally – in the direction of propagation, which in turn is
the direction of the axis of symmetry of the solid – because in this way the only
change to the resistance comes from the change in the angles of the segments nN
and gG. If we designate by {A} the resistance of the surface generated by the
revolution of A about the axis BC, we have {Ng} = {N′g′}. This equality holds
because we are displacing Ng parallel to the direction of motion. This means that
the x coordinates M and b of N and g can be chosen to be equal. This procedure is
identical to the one followed by Euler in 1744, which we will visit in Section 4.3.
Figure 2.26 Newton’s variational method for the curve DnNgGP of the solid of
least resistance.
Figure 2.27 Newton’s variational calculation taking the segments mM and bB
from Figure 2.26 as consecutive.
In a slight rephrasing of Newton’s solution, we take the two points M and b
of Figure 2.26 to coincide, as in Figure 2.27. For the infinitesimals, we use the
following notation: mM = a1, MB = a2, on = b1, hg = b2.13 For the vertical
(finite) coordinates we set mo = y1, MN = y2.
When the arc nNG, which we discretize in the two straight lines nN and NG
(see Figure 2.27), is rotated around the x axis, it generates a surface with the
following resistance:
\begin{align}
R &= 2\pi C \left( y_1 b_1 \sin^2\theta_1 + y_2 b_2 \sin^2\theta_2 \right) \tag{2.44} \\
&= 2\pi C \left( y_1 \frac{b_1^3}{a_1^2 + b_1^2} + y_2 \frac{b_2^3}{a_2^2 + b_2^2} \right) \tag{2.45} \\
&\equiv 2\pi C \left( y_1 \frac{b_1^3}{(nN)^2} + y_2 \frac{b_2^3}{(NG)^2} \right). \tag{2.46}
\end{align}
13 Newton takes the vertical intervals b1 = b2. We find that keeping them different makes the emergence of
Newton’s differential equation clearer.
Now we consider the modified arc nN′G, obtained by shifting the point N
horizontally to N′ by an amount δ = NN′. Notice the similarity with Fermat’s method
of tangents (see Section 2.4.1), with NN′ playing the role of Fermat’s “e.”
The new, modified, resistance $R_\delta$ is
\begin{align}
R_\delta &= 2\pi C \left( y_1 \frac{b_1^3}{(a_1 + \delta)^2 + b_1^2} + y_2 \frac{b_2^3}{(a_2 - \delta)^2 + b_2^2} \right) \tag{2.47} \\
&\simeq 2\pi C \left( y_1 \frac{b_1^3}{(nN)^2 + 2a_1\delta} + y_2 \frac{b_2^3}{(NG)^2 - 2a_2\delta} \right), \tag{2.48}
\end{align}
where we have neglected the terms of order δ² since we are considering infinitesimal
increments. Multiplying and dividing the first term in parentheses in equation
(2.48) by (nN)² − 2a1δ and the second by (NG)² + 2a2δ, we obtain
\begin{align}
R_\delta &\simeq 2\pi C \left( y_1 b_1^3 \frac{(nN)^2 - 2a_1\delta}{(nN)^4 - (2a_1\delta)^2} + y_2 b_2^3 \frac{(NG)^2 + 2a_2\delta}{(NG)^4 - (2a_2\delta)^2} \right) \tag{2.49} \\
&\simeq R + 4\pi C \delta \left( -\frac{y_1 b_1^3 a_1}{(nN)^4} + \frac{y_2 b_2^3 a_2}{(NG)^4} \right), \tag{2.50}
\end{align}
where, again, we neglected the terms of order δ². In order for the curve to be an
extremum, the variation linear in δ should be zero, meaning
\[
\frac{y_1 b_1^3 a_1}{(nN)^4} = \frac{y_2 b_2^3 a_2}{(NG)^4} = \text{constant}. \tag{2.51}
\]
In other words, there is a quantity that is a function of the small increments a and
b in the x and y variables that remains constant over the curve:
\[
y \frac{b^3 a}{\left(a^2 + b^2\right)^2} = K, \tag{2.52}
\]
where K is a constant for all points on the curve, and a and b refer to infinitesimal
increments around the point (x, y) on the curve. And now we can see that the
enigmatic equation (2.41) is nothing but the differential equation (2.52). Since we
know that, at the nose of the solid where the height y = GB (see Figure 2.24), the
slope of the curve is 45°, we have b/a = −1, which implies:
\[
K = -\frac{BG}{4}, \tag{2.53}
\]
and, since b/a = dy/dx, we get that equation (2.52) is equivalent to equations
(2.43) and (2.41).
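The first-order expansion behind the stationarity condition can be verified numerically. The sketch below is ours and uses arbitrary illustrative values for a1, b1, a2, b2, y1 and y2 (none are taken from the text): it compares a central-difference derivative of the modified resistance (2.47) at δ = 0 with the coefficient of δ in equation (2.50), and then picks y2 so that condition (2.51) holds, in which case the linear variation should vanish.

```python
import math

def resistance(delta, a1, b1, a2, b2, y1, y2, c=1.0):
    """Modified resistance of equation (2.47) for a horizontal shift delta."""
    return 2.0 * math.pi * c * (
        y1 * b1 ** 3 / ((a1 + delta) ** 2 + b1 ** 2)
        + y2 * b2 ** 3 / ((a2 - delta) ** 2 + b2 ** 2)
    )

def linear_coefficient(a1, b1, a2, b2, y1, y2, c=1.0):
    """Coefficient of delta in equation (2.50)."""
    nN2 = a1 ** 2 + b1 ** 2   # (nN)^2
    NG2 = a2 ** 2 + b2 ** 2   # (NG)^2
    return 4.0 * math.pi * c * (-y1 * b1 ** 3 * a1 / nN2 ** 2
                                + y2 * b2 ** 3 * a2 / NG2 ** 2)

def numerical_derivative(a1, b1, a2, b2, y1, y2, h=1e-6):
    """Central difference of the resistance with respect to delta at delta = 0."""
    return (resistance(h, a1, b1, a2, b2, y1, y2)
            - resistance(-h, a1, b1, a2, b2, y1, y2)) / (2.0 * h)

# Illustrative increments and heights (assumptions, not values from the text).
a1, b1, a2, b2, y1 = 0.30, 0.20, 0.25, 0.15, 1.0

y2_generic = 0.8
print(numerical_derivative(a1, b1, a2, b2, y1, y2_generic),
      linear_coefficient(a1, b1, a2, b2, y1, y2_generic))

# Choose y2 so that equation (2.51) holds; the linear variation then vanishes.
y2_stationary = (y1 * b1 ** 3 * a1 * (a2 ** 2 + b2 ** 2) ** 2
                 / ((a1 ** 2 + b1 ** 2) ** 2 * b2 ** 3 * a2))
print(numerical_derivative(a1, b1, a2, b2, y1, y2_stationary))
```

The finite values of a and b here stand in for Newton's infinitesimal increments, so the check concerns only the algebra of the expansion in δ.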
In the proof in Newton’s letter (probably to David Gregory) reproduced in
Cajori’s edition (1729/1934), Newton takes b1 = b2 and the relation that is constant
along the curve becomes
\[
y_1 \frac{a_1}{(nN)^4} = y_2 \frac{a_2}{(NG)^4}. \tag{2.54}
\]
It is a simple exercise – proceeding in the same manner as we did for (2.52) – to
show that equation (2.54) implies equation (2.41). Newton’s differential equation
can be derived with the rules of variational calculus, later developed by Euler and
Lagrange. We present such a derivation in Appendix A.
3
An Excursion to Newton’s Principia
The Principia is one of the greatest books of all time. In it Newton formulates a
“System of the World,” incorporating his own new mathematical ideas as well as
displaying a masterful command of geometry. “Newton was the greatest genius
that ever existed,” the mathematician Joseph-Louis Lagrange is alleged to have
said, adding, with a grain of humor, “and the most fortunate, for we cannot find
more than once a system of the world to establish.”
We include a chapter on Newton’s laws of motion and the Principia even though
this work is not directly related to variational principles. We do so for two reasons:
the monumental importance of his work on mechanics and the fact that his ideas
are crucial in the development of the principle of least action. Moreover, Newton
himself was a proponent of the Aristotelian simplicity and economy (Lyssy, 2015).
In his first rule for the study of Natural Philosophy we read: “As the philosophers
say: Nature does nothing in vain, and more causes are in vain when fewer suffice.
For nature is simple and does not indulge in the luxury of superfluous causes”
(Cohen and Whitman, 1999, p. 794).
3.1 Newton’s Propositions on the Laws of Motion

Book I of the Principia starts with three axioms, the proverbial Newton’s laws of
motion. After some definitions, Newton states (Motte, 1729, p. 83):
LAW I
Every body perseveres in its state of rest, or of uniform motion in a right line, unless it is
compelled to change that state by forces impressed thereon.
LAW II
The alteration of motion is ever proportional to the motive force impressed; and is made
in the direction of the right line in which that force is impressed.
LAW III
To every action there is always an opposed and equal reaction: or the mutual actions of
two bodies upon each other are always equal, and directed to contrary parts.
In many derivations of the Principia, Newton uses his second law following
a method of impulses. The body moves on straight lines that are broken by the
actions of impulses acting at regular intervals of time. The orbits are firstly poly-
gons, much in the style of Zenodorus’s proof that we presented in Chapter 2. And
these polygons become continuous curves when the time between impulsive forces
shrinks to zero. Newton takes this procedure from Descartes’s tennis ball analogy
of the Dioptrique (Cohen and Whitman, 1999) where the motion of the tennis ball
is broken by an impulsive force at the interface of air and water (see Figure 2.15).
It is interesting that Newton, although being the creator of differential calculus,
uses geometrical methods in all his derivations in the Principia. In the rest of the
chapter we take a brief excursion through these derivations.
3.2 Geometrical Derivation of Kepler’s Laws of Planetary Motion

3.2.1 Proposition 1: Equal Areas are Swept Out in Equal Times
If no forces act on a particle, it will move in a straight line at constant velocity.
This means that at constant intervals of time Δt, it will be displaced a constant
distance vΔt (see Figure 3.1). Since the triangles of sides vΔt and a common vertex
S have the same base and the same height (d), they have the same area: if we
displace the vertex of a triangle in a direction parallel to the opposite side, the area
does not change. And the same applies to a parallelogram when the sides are
displaced parallel to each other. This perhaps curious result was shown by Euclid in
his Elements, in Proposition 35 of Book 1. In his commentary to Euclid’s work,
Pappus of Alexandria (ca. 300 – ca. 350) refers to Proposition 35 as one of the
“paradoxical theorems of mathematics, since the uninstructed might well regard it
as impossible that the area of the parallelograms should remain the same while the
length of the sides other than the base and the side opposite to it may increase
indefinitely” (Heath, 1926, p. 329). This simple theorem of Euclidean geometry allows
one to prove that a particle moving in a straight line sweeps out equal areas in equal
times. And Newton uses the same theorem to prove that equal areas are swept out
in equal times in the presence of a force that points to a fixed point.
Figure 3.1 Equal areas in equal times. A particle moving at constant velocity v
on a straight line travels a distance vΔt at each interval Δt. If one considers a
fixed point S at a distance d from the line, the area swept out with respect to that
point during Δt is constant and given by dvΔt/2.
If a force acts on a particle, by the second law, its motion is no longer going
to be of constant velocity. If the force is in the direction of motion, its velocity
will change, but the particle will keep moving in the same direction. If the force
makes an angle with the direction of motion, the particle will change its direction.
Let us consider the case where the force is always directed to the fixed position
of point S (Newton labeled the point “S” because he was thinking of the sun).
Newton approximates the motion as a sequence of straight segments, as though
the force were acting in the form of periodic pulses. At the end of the calculation,
he takes the limit of the time between pulses to zero, and both the force and the
trajectory become continuous. The idea of impulsive forces was used by Robert
Hooke (Nauenberg, 1994), although the priority over Newton is debated (Erlichson,
1997). The idea is to compose the velocities: after each pulse the velocity is
the sum of two (vector!) components, the velocity the particle had just before the
pulse (which we will call v) and the change imparted by the impulsive force (which
we will call Δv).
At a time Δt after the pulse at a point B (see Figure 3.3), the displacement of
the particle will be given by the composition of two displacements: vΔt in the
direction of the velocity before the pulse, and ΔvΔt in the direction of the impulsive
force at B. Since the displacement cC (see Figure 3.3) is in the direction of
SB, the triangles SBc and SBC have the same area. Regardless of the size of the changes
Δv, provided all these changes are directed to a fixed point, equal areas are swept
out in equal times. Simultaneously with Newton, Robert Hooke was using a
similar kind of polygonal diagram to study accelerated motion (see Figure 3.2),
and stated that, for elliptical orbits, the force should vary as the inverse square of
the distance. Later Hooke accused Newton of plagiarizing the inverse square law.
Newton responded that Hooke had only hinted at the idea and had offered no proof,
and he omitted almost completely any reference to him in the
Principia.
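Newton's polygonal construction lends itself to a quick numerical check. The following is a minimal sketch in Python, not something found in the Principia: the function name `polygon_areas`, the kick size `strength(r) * dt`, and the initial conditions are illustrative assumptions. It alternates free flight during Δt with an impulsive velocity change directed to S (placed at the origin) and records the area of each triangle swept about S.

```python
import math

def polygon_areas(pos, vel, dt, n_steps, strength):
    """Polygonal orbit: free flight for dt, then an impulsive kick toward S.

    S sits at the origin. `strength(r)` is a hypothetical force law giving the
    size of the kick per unit time at distance r. Returns the triangle areas
    swept about S in each interval.
    """
    areas = []
    x, y = pos
    vx, vy = vel
    for _ in range(n_steps):
        nx, ny = x + vx * dt, y + vy * dt           # displacement v*dt (segment Bc)
        areas.append(0.5 * abs(x * ny - y * nx))    # area of the triangle S-old-new
        r = math.hypot(nx, ny)
        k = strength(r) * dt                        # magnitude of the change in v
        vx -= k * nx / r                            # the change points from the particle to S
        vy -= k * ny / r
        x, y = nx, ny
    return areas

inverse_square = polygon_areas((1.0, 0.0), (0.2, 1.0), 0.01, 500, lambda r: 1.0 / r**2)
linear = polygon_areas((1.0, 0.0), (0.2, 1.0), 0.01, 500, lambda r: r)
print(max(inverse_square) - min(inverse_square))
print(max(linear) - min(linear))
```

For either force law the recorded areas agree to rounding error: only the direction of the kicks matters, not their size, which is the content of Proposition 1.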
Figure 3.2 A page from Robert Hooke’s manuscript, dated September 1685,
showing a graphical evaluation of orbital motion under a force which varies linearly
with distance.
3.2.2 Proposition 6: The Force Law and the Geometry of the Orbit*
After proving Kepler’s second law, Newton derives a geometric relation for the
orbit of an object subject to a central force. Figure 3.4 shows Newton’s diagram of
the orbit. When the particle is at point P, its velocity is along the tangent at P (the
line ZY). As we discussed in the previous section, the position of the particle an
instant later is Q, which results from the composition of two displacements: PR,
the displacement of the particle without the force at P; and RQ, the displacement
Figure 3.3 Equal areas in equal times. Top: Since the triangle SBC is obtained
from the triangle SBc by displacing the vertex c parallel to their common base
SB, their areas are the same. For “central forces” (forces that are directed to a
fixed point, S in the figure), equal areas are swept out in equal times. Bottom:
Reproduced from Proposition 1 of the Principia. Using his second law and a
simple extension of the argument of Figure 3.1, Newton derives Kepler’s second
law of planetary motion.
Figure 3.4 From Newton’s Principia. Illustration of Proposition 6.
due to the force at P. Since the force points always in the direction of S, QR is
in the direction of PS. Since the second law states that the alteration of motion is
proportional to the force, the segment QR, for a fixed time (or, which is equivalent,
for fixed R), is proportional to the force. Then Newton states that, if we fix the
force, the displacement QR is proportional to the square of the time. The segment
QR is the deviation of the body from the tangential path PR due to the action of
the force. Newton knew the results of Galileo’s experiments with inclined planes:
for a constant force, the distance traveled is proportional to the square of the time.
“Newton is assuming that, as the point Q shrinks back to the point P, the force
can be treated as if it were constant” (Brackenridge, 1996), and the point describes
a parabolic motion. Thus the segment QR is proportional both to the square of the
time and to the magnitude of the force at point P:
$$ QR \propto (\text{Force}) \times (\Delta t)^2 . \tag{3.1} $$
And here Newton uses Kepler’s second law (his Proposition 1): Δt is proportional
to the area of the triangle SRP, which is approximately SP × QT/2, with QT the
segment drawn from Q perpendicular to SP. Substituting $\Delta t \propto SP \times QT$ in
equation (3.1) gives
$$ \text{Force} \propto \frac{1}{(SP)^2}\,\frac{QR}{(QT)^2} . \tag{3.2} $$
The geometry of the orbit is encoded in the relation of the infinitesimal quantities
$(QT)^2/QR$ with the distance SP to the center of force. Different orbits and different
positions of the center S with respect to the orbit give different laws of force.
One of the breakthroughs in Newton’s Principia is the proof that, if an orbit is elliptical
and the force is directed to the focus of the ellipse, the force should decrease
as the inverse square of the distance. Newton proves this by showing that (when
SP points to the focus) $(QT)^2/QR$ is a constant (equal to $2b^2/a$, the latus rectum,
with a the semi-major axis of the ellipse and b the semi-minor axis). But before that
he considers some simpler cases, which we now discuss.
3.2.3 Circular Orbits

As a preliminary example, consider the simplest case: a circular orbit with the
center of force at the center of the circle, with all the points having the same SP
(the radius a of the circle). Using the Pythagorean theorem (see Figure 3.5),
$$ a^2 = (QT)^2 + (a - QR)^2 . \tag{3.3} $$
Expanding the square and using $QR \ll QT$ to neglect the term $(QR)^2$, we see that the ratio
$$ \frac{(QT)^2}{QR} = 2a \tag{3.4} $$
is certainly constant on all the points of the circle. For the center of force at the
center of the circle, any dependence of the force on SP is an allowed solution.
Figure 3.5 Left: Circular orbit with the center of force S at the center of the circle.
Right: Center of force S at a point on the circle (treated in Proposition 7 of the
Principia).

What happens if the orbit is circular but the center of force is not at the center of
the circle? Newton considers this problem in Proposition 7. Let us consider the case
where the center of force is at a point on the circle. This is a peculiar orbit where
the object passes periodically through the very center of the force. The solution for
this problem is simple because of the similarities of the triangles CPy, QxT and
Pvx:
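$$ \frac{QT}{Qx} = \frac{Py}{CP} \equiv \frac{SP}{2a}, \qquad \frac{QR}{Pv} = \frac{2a}{SP} . \tag{3.5} $$
For the law of force, we need the ratio $(QT)^2/QR$:
$$ \frac{(QT)^2}{QR} = \left(\frac{SP}{2a}\right)^3 \frac{(Qx)^2}{Pv} , \tag{3.6} $$
but, since $Qx \approx Qv$, the ratio $(Qx)^2/Pv = 2a$, the ratio we obtained in equation
(3.4) for the center of force at the center of the circle, and
$$ \frac{(QT)^2}{QR} = \frac{(SP)^3}{(2a)^2} . \tag{3.7} $$
The force is
$$ F \propto \frac{1}{(SP)^2}\,\frac{QR}{(QT)^2} \sim \frac{1}{(SP)^5} . \tag{3.8} $$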
With a force that decreases as the inverse fifth power of the distance, we can obtain
a circular orbit that passes through the center of force. But we stress that Newton
is deriving a law of force from the geometry of the orbit, and not the converse.
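The limits in equations (3.4) and (3.7) can be checked numerically. The sketch below is an illustration under stated assumptions rather than Newton's procedure: the helper name `qt2_over_qr`, the placement of the circle at the origin, and the small step `eps` are choices made here. It builds Q a small angle away from P, decomposes the displacement PQ into a piece along the tangent (PR) and a piece along PS (RQ), and measures QT as the distance from Q to the line SP.

```python
import math

def qt2_over_qr(a, S, phi, eps=1e-4):
    """Return ((QT)^2 / QR, SP) for a circle of radius a centered at the origin.

    S is the center of force, P the point at angle phi, Q the point at phi + eps.
    """
    P = (a * math.cos(phi), a * math.sin(phi))
    Q = (a * math.cos(phi + eps), a * math.sin(phi + eps))
    t = (-math.sin(phi), math.cos(phi))              # unit tangent at P
    SP = math.hypot(S[0] - P[0], S[1] - P[1])
    u = ((S[0] - P[0]) / SP, (S[1] - P[1]) / SP)     # unit vector from P to S
    dx, dy = Q[0] - P[0], Q[1] - P[1]
    # Solve Q - P = s*t + q*u; |q| is the deviation QR along the force direction.
    det = t[0] * u[1] - t[1] * u[0]
    q = (t[0] * dy - t[1] * dx) / det
    QT = abs(dx * u[1] - dy * u[0])                  # distance from Q to the line SP
    return QT ** 2 / abs(q), SP

a = 1.5
center_ratio, _ = qt2_over_qr(a, (0.0, 0.0), 1.0)    # center of force at the center
rim_ratio, SP = qt2_over_qr(a, (-a, 0.0), 1.0)       # center of force on the circle
print(center_ratio, 2 * a)
print(rim_ratio, SP ** 3 / (2 * a) ** 2)
```

Each printed pair should agree up to corrections that vanish with `eps`, the first reproducing the constant 2a and the second the cubic dependence on SP that leads to the inverse fifth power.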
In the following propositions, Newton considers elliptical orbits and uses his
masterful command of the properties of conics to relate the ratio $(QT)^2/QR$ to the
geometry of the orbit.
3.2.4 Proposition 10: Elliptical Orbit with the Center of Force at the Center of
the Ellipse
In his treatment of elliptical orbits, Newton uses several properties of conics. Some
of them he derives, and some of them he mentions without proof. In order to make
this presentation self-contained, we present simple proofs of all the properties used
by Newton.
The area of a circumscribed parallelogram is constant.

We show this property by a “scaling” argument, treating the ellipse as a rescaled
circle. The equation for a circle of radius a is $(x/a)^2 + (y/a)^2 = 1$. If we rescale the
vertical axis by some factor λ, the equation becomes $(x/a)^2 + (y/\lambda a)^2 = 1$, which
is the equation of an ellipse. This rescaling changes the angles: the perpendicular
diameters GP and KD of the circle become the conjugate diameters GP and KD
of the ellipse (see Figure 3.6). An interesting feature of this process is that, after
the rescaling, all the initial areas are changed by the same factor λ (any
“microscopic” square of side δ and area $\delta^2$ becomes a rectangle of sides δ and λδ).
This means that all the circumscribed squares of the circle become circumscribed
parallelograms, all of them with the same area $4\lambda a^2$. And since the area of a
parallelogram is the product of the base and the height, for the conjugate semi-diameter
DC of the ellipse we have (see Figure 3.6):
$$ (DC) \times (PF) = \text{Constant} = \frac{1}{4}\,(\text{Area of the circumscribed parallelogram}) = \lambda a^2 , \tag{3.9} $$
where PF is the segment perpendicular to the conjugate diameter DK from P.
Figure 3.6 An ellipse is a rescaled circle. Upon rescaling (contracting the vertical
direction in this case), all the areas are changed by the same factor. Since the
circumscribed (shaded) parallelogram (right) is a contracted square, it has constant
area (its ratio with the area of the ellipse is always $4/\pi$), irrespective of the
choice of conjugate axes.
Figure 3.7 The intersecting chord theorem: $(Pv \times vG)/(Qv)^2 = (PC)^2/(DC)^2$.
Intersecting Chord Theorem

One of the results used by Newton is the so-called intersecting chord theorem
(Brackenridge, 1996, p. 114).
For the conjugate semi-diameters PC and DC and the chord QQ intersecting
PG in v (see Figure 3.7), we have:
$$ \frac{(Pv) \times (vG)}{(Qv)^2} = \frac{(PC)^2}{(DC)^2} . \tag{3.10} $$
This is a property that Newton cites without proof. In Proposition 10, Problem
5, he invokes this property, stating “by the property of conic sections.” According
to the recent translation by I. B. Cohen and Anne Whitman (1999), authors of
books on conic sections in the eighteenth and nineteenth centuries supplied a proof
of this theorem to help the readers of the Principia. And, as common readers of
the Principia, at this point of the derivation we relate to the words Galileo puts
in Simplicio’s mouth during one of his geometrical proofs: “You
proceed too grandly in your demonstrations; it seems to me that you always assume
that all of Euclid’s propositions are as familiar and ready at hand to me as his
very first axioms” (Galilei, 1638/1974, p. 220). Here we offer a proof using the rescaling
idea. First we prove this for a circle, where the property can be expressed
as a ratio of areas. Since the areas are changed by the same factor upon rescaling,
the property remains valid for the ellipse.
Figure 3.8 Rescaling method to prove the intersecting chord theorem.

Consider the areas $A_1$, $A_2$, and $A_3$ of Figure 3.8. For the circle, using PC = CD
(equal to the radius of the circle) and the Pythagorean theorem, we have that the
ratio of the areas is
$$ \frac{A_1 A_2}{A_3^2} = \frac{[(Gv)(CD)][(Pv)(CD)]}{(PC)^2 (RP)^2} \tag{3.11} $$
$$ = \frac{[(PC + Cv)(PC)][(PC - Cv)(PC)]}{(PC)^2 (RP)^2} \tag{3.12} $$
$$ = \frac{(PC)^2 - (Cv)^2}{(RP)^2} \tag{3.13} $$
$$ = 1 \tag{3.14} $$
$$ = \frac{A_1' A_2'}{A_3'^2} , \tag{3.15} $$
where the step to (3.14) uses $(PC)^2 - (Cv)^2 = (Qv)^2 = (RP)^2$ in the circle, the primes
denote the corresponding areas after rescaling, and in the last line we used the fact that
all the areas are rescaled by the same
factor. Since the sides of the parallelograms of areas $A_1'$, $A_2'$, and $A_3'$ form the same
angle α (the angle between the conjugate semi-diameters), their areas are the product of their
sides times sin α. The factor sin α cancels in computing $A_1' A_2'/A_3'^2$, and we obtain,
for the ellipse:
$$ \frac{(Gv)(Pv)(CD)^2}{(CP)^2 (RP)^2} = 1 . \tag{3.16} $$
The theorem of equation (3.16) is valid regardless of the quantities Pv and RP
being infinitesimally small. Since Newton is interested in infinitesimal values of
Pv and RP, one can replace Gv by GP = 2CP, and then the theorem adopts a
simpler form (note that Qv = RP):
$$ \frac{(Pv)(CD)^2}{(PC)(Qv)^2} = \text{Constant} \; (= 1/2) . \tag{3.17} $$
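Both properties of the ellipse used so far can be verified numerically. The sketch below is only an illustration: the function name `conjugate_check`, the parametrization of the ellipse by its semi-axes a and b, and the parameter c that places v on the diameter PG are assumptions introduced here. It evaluates the product DC × PF (which equals ab for semi-axes a and b, whatever the choice of conjugate diameters) and both sides of the intersecting chord relation (3.10), finding Q by intersecting the chord with the ellipse.

```python
import math

def conjugate_check(a, b, phi, c):
    """Check the constant-area and intersecting-chord properties on
    the ellipse (x/a)^2 + (y/b)^2 = 1 with center C at the origin.

    P and D are conjugate semi-diameters (images of perpendicular radii of the
    circle); v = c*P is a point on the diameter PG, with -1 < c < 1.
    Returns (DC*PF, Pv*vG/Qv^2, PC^2/DC^2).
    """
    P = (a * math.cos(phi), b * math.sin(phi))
    D = (-a * math.sin(phi), b * math.cos(phi))
    G = (-P[0], -P[1])
    v = (c * P[0], c * P[1])
    PC = math.hypot(*P)
    DC = math.hypot(*D)
    PF = abs(P[0] * D[1] - P[1] * D[0]) / DC         # distance from P to the line CD
    # Intersect the line v + w*D (parallel to CD) with the ellipse:
    # A w^2 + B w + C0 = 0.
    A = (D[0] / a) ** 2 + (D[1] / b) ** 2
    B = 2.0 * (v[0] * D[0] / a ** 2 + v[1] * D[1] / b ** 2)
    C0 = (v[0] / a) ** 2 + (v[1] / b) ** 2 - 1.0
    w = (-B + math.sqrt(B * B - 4.0 * A * C0)) / (2.0 * A)
    Qv = abs(w) * DC
    Pv = math.hypot(P[0] - v[0], P[1] - v[1])
    vG = math.hypot(v[0] - G[0], v[1] - G[1])
    return DC * PF, (Pv * vG) / Qv ** 2, PC ** 2 / DC ** 2

for phi in (0.4, 1.3, 2.2):
    print(conjugate_check(3.0, 1.2, phi, 0.3))
```

In each printed triple the first entry stays the same as phi changes, and the second and third entries coincide, as the rescaling argument predicts.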
Equipped with these two properties of the ellipse, Newton derives the law of
force for the center S at the center of the ellipse (see Figure 3.9).
Figure 3.9 From Proposition 10 of the Principia. Elliptical orbit with center of
force at the center of the ellipse.

We are interested in the ratio $(QT)^2/QR \equiv (QT)^2/Pv$. By similarity of the
triangles QTv and PFC, we have
$$ \frac{QT}{Qv} = \frac{PF}{PC} = \frac{A_0}{(PC) \times (CD)} , \tag{3.18} $$
where $A_0$ is the constant (PF) × (CD), one-fourth the area of the circumscribed
parallelogram (see equation (3.9)); here the center of force S coincides with the
center C. Using $QT \propto Qv/[(PC) \times (CD)]$ we have
$$ \frac{QR}{(QT)^2} \equiv \frac{Pv}{(QT)^2} \propto Pv\,\frac{(PC)^2 (CD)^2}{(Qv)^2} \tag{3.19} $$
$$ = \underbrace{\frac{(Pv)(CD)^2}{(PC)(Qv)^2}}_{\text{constant, by equation (3.17)}} \times (PC)^3 . \tag{3.20} $$
Since $QR/(QT)^2 \propto (PC)^3$, the force is
$$ F \propto \frac{1}{(PC)^2}\,\frac{QR}{(QT)^2} \propto PC ; \tag{3.21} $$
the force law for an elliptical orbit with the center of force at the center of the
ellipse is proportional to the first power of the distance.
And now for the climax: the proof that an elliptical orbit with the center of force
at the focus implies that the force decreases as the inverse square of the distance.
3.2.5 Proposition 11: Center of Force at the Focus of the Ellipse*

In order to arrive at one of the great results of the Principia, Newton uses one
more property of the ellipse: if S is one of the foci, a line drawn from the center
of the ellipse parallel to the tangent at some point P intersects PS at E, and PE
is constant (see Figure 3.10). This theorem is easy to prove using the so-called
reflective property of the ellipse: the lines PS and PH connecting a point of the
ellipse with the foci make the same angle with the tangent. In other words, all light
rays that emerge from one of the foci are reflected by the ellipse toward the other focus.
The proof of this property uses the result of Hero of Alexandria that we discussed in
section 2.2, and the
definition of the ellipse as the shape traced by a point so that the sum of its distances
from two foci, let us call it $L_0$, is constant. For a given ellipse, draw a set of confocal
ellipses (sharing the two foci) but with larger sum $L > L_0$ of the distances to the
foci. The tangent to the first ellipse will meet these confocal ellipses. If we move
a point along the tangent, the smallest total distance to the foci corresponds to P (the
ellipse with the smallest L). And, according to Hero, the minimum distance occurs
when the angle of incidence equals the angle of reflection. Now, for the second
property, draw HI parallel to the tangent at P through the focus H (see Figure
3.10). Due to the reflective property PI = PH (the triangle IPH is isosceles).
Also, since EC is parallel to HI and C is the midpoint of SH, ES = EI. The total
length is then $L_0 = PS + PH = 2PI + 2EI = 2PE$, so PE is constant (equal to the
ellipse’s semi-major axis).

Figure 3.10 The reflective property of the ellipse. Left: The dotted lines touch
confocal ellipses and have a total length greater than PS + PH. Right: If EC is
parallel to the tangent at P, PE is constant.
Proof of the Inverse Square Law for the Center at S

Using the properties of the ellipse, finding the force law is now a matter of using
proportionalities between triangles.
Since the triangles Pvx and PCE (see Figure 3.11) are similar, we have:
$$ \frac{Px}{Pv} = \frac{PE}{PC} , \tag{3.22} $$
and the similarity between triangles QTx and PFE implies
$$ \frac{QT}{Qx} = \frac{PF}{PE} . \tag{3.23} $$
Now, PE is constant as we have proved, and $PF \propto 1/CD$ (equation (3.9)). Finally,
using $QR = Px \propto Pv/PC$ and $Qx \approx Qv = RP$, we have:
$$ \frac{(QT)^2}{QR} \propto \frac{(Qx)^2 (PF)^2 (PC)}{Pv} \propto \frac{(RP)^2 (PC)}{(Pv)(CD)^2} = \text{Constant!} \tag{3.24} $$
(by equation (3.17)).

Figure 3.11 From Proposition 11 of the Principia.
Substituting this constant in equation (3.2) we obtain:
$$ \text{Force} \propto \frac{1}{(SP)^2} . \tag{3.25} $$
For an elliptical orbit with the center of force at the focus of the ellipse, the
force decreases as the inverse square of the distance. This is the law of universal
gravitation, one of Newton’s greatest contributions.
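The two force laws obtained for the ellipse can be compared numerically through equation (3.2). What follows is a hedged sketch, not Newton's geometric argument: the helper names `qt2_over_qr` and `force_measure`, the parametrization of the ellipse by semi-axes a and b, and the step `eps` are assumptions made for illustration. The code constructs QR and QT directly from their definitions at several points of the orbit and evaluates the right-hand side of (3.2), once with S at the center and once with S at a focus.

```python
import math

def qt2_over_qr(a, b, S, phi, eps=1e-4):
    """Return ((QT)^2 / QR, SP) on the ellipse (a cos phi, b sin phi)."""
    P = (a * math.cos(phi), b * math.sin(phi))
    Q = (a * math.cos(phi + eps), b * math.sin(phi + eps))
    t = (-a * math.sin(phi), b * math.cos(phi))      # tangent direction at P
    SP = math.hypot(S[0] - P[0], S[1] - P[1])
    u = ((S[0] - P[0]) / SP, (S[1] - P[1]) / SP)     # unit vector from P to S
    dx, dy = Q[0] - P[0], Q[1] - P[1]
    # Solve Q - P = s*t + q*u; |q| is the deviation QR along the force direction.
    det = t[0] * u[1] - t[1] * u[0]
    q = (t[0] * dy - t[1] * dx) / det
    QT = abs(dx * u[1] - dy * u[0])                  # distance from Q to the line SP
    return QT ** 2 / abs(q), SP

def force_measure(a, b, S, phi):
    """Right-hand side of equation (3.2), up to a constant factor."""
    ratio, SP = qt2_over_qr(a, b, S, phi)
    return 1.0 / (SP ** 2 * ratio), SP

a, b = 2.0, 1.0
focus = (math.sqrt(a * a - b * b), 0.0)
for phi in (0.5, 1.5, 2.5):
    f_focus, sp_focus = force_measure(a, b, focus, phi)
    f_center, sp_center = force_measure(a, b, (0.0, 0.0), phi)
    # Inverse square law: f * SP^2 constant. Linear law: f / SP constant.
    print(f_focus * sp_focus ** 2, f_center / sp_center)
```

Both printed columns stay constant along the orbit (up to corrections of order `eps`), the first for the focus and the second for the center; in this sketch the ratio $(QT)^2/QR$ at the focus comes out as 2b*b/a.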
4
The Optical-Mechanical Analogy, Part I
4.1 Bernoulli’s Challenge and the Brachistochrone

At the end of the seventeenth century, prominent mathematicians gravitated
towards problems formulated in terms of maxima and minima. The interest ger-
minated from a combination (of uncertain proportions) of aesthetic preferences,
theological reasons, and the sheer allure of mathematical challenges. The most
famous of such problems appears in a letter of June 9, 1696,¹ from John Bernoulli
to his friend Gottfried Leibniz: “Given two points A and B in a vertical plane, find
the path AMB down which a moveable point M must, by virtue of its weight, fall
from A to B in the shortest possible time” (Leibniz, 1962; Orio, 2009). In just a
week, on June 16, Leibniz wrote back with the solution, adding that he solved the
problem against his will, but that he was attracted to its beauty like Eve before the
apple. He expressed his solution in the form of a differential equation, and proposed
to name the curve tachystoptota (curve of quickest descent). Bernoulli responded
a few days later taking up the biblical reference. He was “very happy about this
comparison provided that he was not regarded as the snake that had offered the
apple” (Knobloch, 2012). Bernoulli points out that Leibniz’s solution corresponds
to the cycloid (the curve traced out by a point on the rim of a circle rolling on
a flat plane) and proposed to name the curve brachistochrone. The cycloid was
admired by Galileo “as a very gracious curve to be adapted to the arches of a
bridge” (Drake, 1978, p. 406) and was proven by Huygens (1673) to correspond to
the isochronous pendulum. In the meantime, Bernoulli had already communicated
the problem as a challenge to Rudolf Christian von Bodenhausen in Florence, in
Switzerland to his brother Jacob Bernoulli, and in France to Pierre Varignon. In
his response to Leibniz, John Bernoulli mentions two solutions. The first is similar
to Leibniz’s. The second uses a clever map from the swiftest path to the problem
of finding the trajectory of a light ray propagating in a medium of continuously
varying index of refraction. This mapping is a hallmark of the optical-mechanical
analogy, which had a profound influence on later formulations of mechanics.
In the nineteenth century, the analogy was used by William Rowan Hamilton, and
in the twentieth century it played a fundamental role in the formulation of wave
mechanics by Louis de Broglie and Erwin Schrödinger.

1 In the letter’s signature we read “Gröningen June 9/19.” The duplicated dates correspond to the Julian and
Gregorian calendars, which were out of phase by 10 or 11 days until 1701. Leibniz and Bernoulli discuss
calendars and the division of the day in many places of their correspondence. In letter 120, of January 1701,
Bernoulli jokes that he received Leibniz’s letter the day after it was written, “a notable day with a night of
eleven days” (Orio, 2009, p. 497).
Bernoulli published his challenge in the December issue of the Acta Erudito-
rum and announced that he would suppress his own solution until Easter 1697.
The May 1697 issue of the Acta Eruditorum contained an introductory historical
paper by Leibniz on the brachistochrone. Leibniz omits his own solution because
it corresponded, he said, with the other solutions. The five solutions submitted by
John and Jacob Bernoulli, the Marquis de l’Hospital, Ehrenfried Walther von Tschirnhaus,
and Isaac Newton were published. Newton had not revealed his name; John
Bernoulli recognized the author “from the claw of the lion.” In this section we
visit Huygens’ elegant proof that the cycloid corresponds to the isochronous pendulum.
We also discuss Leibniz’s solution of the brachistochrone, included in the
Beilage (a supplement or appendix) to his letter of June 16, 1696, as well as John
Bernoulli’s optical-mechanical solution.
4.1.1 Huygens and the Horologium Oscillatorium

In his masterpiece Horologium Oscillatorium (“The Pendulum Clock”), Huygens
(1673) considers a particle that starts at rest at a point P of the inverted cycloid of
Figure 4.1. The cycloid is generated by the counterclockwise rotation of a circle of
diameter ℓ. Point P is at a distance d below the highest point of the curve. Huygens
cleverly maps the time of passage between two very close points G and R of the
cycloid (see Figure 4.1) to the angle between two very close points S and T on
an auxiliary circle of diameter ℓ − d, equal to the total vertical fall of the particle.
The resulting time for the particle to slide from P to the bottom of the cycloid is
proportional to the arc of the auxiliary circle, independent of the starting position:
the oscillation is isochronous.
Since the instantaneous center of gyration of the rolling circle is the contact point
D, the direction of the velocity at G is perpendicular to DG. From Thales’ theorem
(see Figure 2.12), ∠AGD is a right angle, and the velocity is along the line GA. At
this point Huygens uses the fact that for a particle that starts at rest at P, the velocity
at G is $\sqrt{2gh}$. In other words, the magnitude of the velocity is independent of the
path; it only depends on the vertical distance from the starting point. This path
independence follows from repeated applications of Galileo’s second postulate of
accelerated motion, equation (2.12), and by discretizing the curve into a polygon
(see Figure 4.2).

Figure 4.1 Adapted from Huygens’ Horologium Oscillatorium.

Figure 4.2 Path independence of the final velocity. According to Galileo’s second
postulate, equation (2.12), particles that start at rest at P and at A, and travel
respectively along the inclined planes PQ and AQ, reach Q with the same velocity.
This means that the velocity at R is the same for the particle following the
path PQR or the inclined plane AR. But the velocity at R is the one acquired by
a particle falling from rest along the plane BR, and the velocity at S is the same
for the path PQRS and for the inclined plane BS.
Since the segment GR is very small ($GR \ll h$), the velocity is approximately
constant along the segment, and given by $\sqrt{2gh}$. The travel time from G to R is
$$ \text{Time}_{GR} \approx \frac{GR}{\sqrt{2gh}} . \tag{4.1} $$
Using the similarity of the triangles GΣR and GAQ, we have
$$ \frac{GR}{\Sigma R} = \frac{GA}{QA} = \sqrt{\frac{\ell}{QA}} , \tag{4.2} $$
where we have also used Galileo’s law of chords of Figure 2.12: GA is the
geometric mean between QA and the diameter ℓ of the circle. Huygens’ ingenious
step is to map the points of the cycloid to points of a circle of radius
OS = (ℓ − d)/2 (see Figure 4.3). Consider the segment ST, whose vertical projection
Sx is equal to ΣR. Using similarity of the triangles SxT and SFO, and the
fact that $SF = \sqrt{h \times AF} \equiv \sqrt{h \times QA}$, we have:
$$ \frac{ST}{\Sigma R} = \frac{OS}{SF} = \frac{OS}{\sqrt{QA \times h}} . \tag{4.3} $$

Figure 4.3 Diagram from Huygens’ original work on the isochronous clock.
Combining equations (4.2) and (4.3),
$$ GR = \frac{ST}{OS} \times \sqrt{\ell \times h} \approx \angle SOT \times \sqrt{\ell \times h} . \tag{4.4} $$
Substituting equation (4.4) in the expression for the time of passage of equation
(4.1), the factor $\sqrt{h}$ cancels out and we obtain:
$$ \text{Time}_{RG} \approx \angle SOT \times \sqrt{\frac{\ell}{2g}} . \tag{4.5} $$
For a whole period of oscillation ∠SOT = 4π (two turns of S around the auxiliary
circle) and the period T of Huygens’ pendulum is
$$ T = 4\pi \sqrt{\frac{\ell}{2g}} \equiv 2\pi \sqrt{\frac{2\ell}{g}} , \tag{4.6} $$
independent of the amplitude. Notice that T corresponds to the period for small
oscillations of a pendulum of length 2ℓ (equal to the radius of curvature of the
cycloid at the lowest point).
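Huygens' isochronism can be confirmed by direct numerical integration of the travel time. The sketch below is an illustration, not Huygens' construction: the function name `descent_time`, the parametrization of the cycloid by the rolling angle theta of a circle of radius r (so that the diameter is 2r), the value of g, and the change of variable used to tame the integrable singularity at the starting point are all choices made here. It sums the elementary times (arc length over speed) from the starting angle theta0 to the bottom of the cycloid at theta = pi.

```python
import math

def descent_time(theta0, r=1.0, g=9.81, n=20000):
    """Time to slide from rest at rolling angle theta0 to the bottom (theta = pi)
    of the cycloid x = r(theta - sin theta), y = r(1 - cos theta), y measured downward.

    Uses the midpoint rule in s, with theta = theta0 + (pi - theta0) * s**2, so that
    the 1/sqrt singularity of 1/v at the starting point is removed.
    """
    span = math.pi - theta0
    total = 0.0
    for i in range(n):
        s = (i + 0.5) / n
        theta = theta0 + span * s * s
        arc_per_theta = 2.0 * r * math.sin(theta / 2.0)        # ds/dtheta on the cycloid
        v = math.sqrt(2.0 * g * r * (math.cos(theta0) - math.cos(theta)))  # sqrt(2 g h)
        dtheta_ds = 2.0 * span * s
        total += arc_per_theta / v * dtheta_ds / n
    return total

for theta0 in (0.3, 1.5, 2.8):
    print(theta0, descent_time(theta0), math.pi * math.sqrt(1.0 / 9.81))
```

The three starting points give the same time, pi * sqrt(r/g), which is one quarter of the period in equation (4.6) when the diameter is written as 2r.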
Figure 4.4 From Leibniz’s solution of the brachistochrone.
4.1.2 Leibniz’s Solution of the Brachistochrone

In the appendix to his letter of June 16, 1696, to Bernoulli, Leibniz (1962) treats
the curve of fastest descent as a polygon (see Figure 4.4). Consider the descent of
a particle along the inclined planes AD and DB of Figure 4.4. Leibniz fixes the
points A and B and varies D horizontally so that the descent time through these
two planes is a minimum.
Using Galileo’s result for the descent time through inclined planes, Leibniz
expresses the descent time along AD and DB in terms of the descent times for
vertical descent along the segments AE and EC. He calls r the descent time along
AE and n the descent time along EC. Since the paths AE and AD correspond to
the same vertical fall (and the same applies for EC and DB), we can use equation
(2.13) to relate the times for the inclined with the vertical planes:
$$ T_{AD} = r\,\frac{AD}{AE} , \tag{4.7a} $$
$$ T_{DB} = n\,\frac{DB}{EC} . \tag{4.7b} $$
The time for the two segments is $T_{ADB} = T_{AD} + T_{DB}$:
$$ T_{ADB} = \frac{r}{AE}\sqrt{(AE)^2 + (ED)^2} + \frac{n}{EC}\sqrt{(EC)^2 + (CB - ED)^2} . \tag{4.8} $$
Leibniz, applying his very own invention, differentiates $T_{ADB}$ and equates the
result to zero to find the minimum:
$$ \frac{dT_{ADB}}{d(ED)} = \frac{r}{AE}\,\frac{ED}{\sqrt{(AE)^2 + (ED)^2}} - \frac{n}{EC}\,\frac{CB - ED}{\sqrt{(EC)^2 + (CB - ED)^2}} \equiv \frac{r}{AE}\,\frac{ED}{AD} - \frac{n}{EC}\,\frac{FB}{DB} = 0 . \tag{4.9} $$
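Leibniz's two-plane minimization is easy to reproduce numerically. The following sketch rests on assumptions that are not in the letter: the function names `two_plane_time` and `minimize`, the sample lengths, and the use of uniform acceleration g to compute the vertical descent times r and n. It evaluates the total time from the relations (4.7a) and (4.7b), minimizes it over the horizontal position ED with a ternary search, and then checks that the two terms of the stationarity condition balance.

```python
import math

def two_plane_time(ED, AE, EC, CB, g=9.81):
    """Descent time along AD then DB, built from equations (4.7a) and (4.7b)."""
    r = math.sqrt(2.0 * AE / g)                      # vertical fall time along AE
    n = math.sqrt(2.0 * (AE + EC) / g) - r           # vertical fall time along EC
    AD = math.hypot(AE, ED)
    DB = math.hypot(EC, CB - ED)
    return r * AD / AE + n * DB / EC, (r, n, AD, DB)

def minimize(f, lo, hi, iters=200):
    """Ternary search for the minimum of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

AE, EC, CB = 1.0, 1.0, 2.0                           # hypothetical sample lengths
ED = minimize(lambda x: two_plane_time(x, AE, EC, CB)[0], 0.0, CB)
_, (r, n, AD, DB) = two_plane_time(ED, AE, EC, CB)
# The two sides of the stationarity condition, with FB = CB - ED:
print((r / AE) * ED / AD, (n / EC) * (CB - ED) / DB)
```

At the minimizing ED the two printed numbers coincide: the sine of each plane's angle with the vertical, weighted by the corresponding inverse velocity, is the same on both planes.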
Not unexpectedly, equation (4.9) resembles Fermat’s formula for a refracted ray
discussed in Section 2.4.1. The quantities n/EC and r/AE play the role of the
inverse velocities in the two media. The main difference is that n and r are varying
quantities as we move along the vertical line AM and the fact that Leibniz
is implicitly considering variations over all possible curves. The analysis can be
repeated for further vertical segments as shown in Figure 4.5, where Leibniz shows
both the curve of fastest descent on the right and a parabola representing the time
for vertical fall on the left (we change the notation for subindices from Leibniz’s
nB to the contemporary $B_n$). If we call $T_n$ the time for vertical fall from $B_n$ to
$B_{n+1}$, the ratio $T_n/B_nB_{n+1}$ is the inverse of the average velocity of the particle
traveling from $B_n$ to $B_{n+1}$. If the intervals $T_n$ and $B_nB_{n+1}$ are very small we have:
$$ \frac{T_n}{B_n B_{n+1}} \approx \frac{1}{\sqrt{2g\,AB_n}} = \frac{1}{\sqrt{2gy}} . \tag{4.10} $$

Figure 4.5 From the Beilage to Leibniz’s letter to Bernoulli.
Following Leibniz,² we call $dx = D_{n-1}C_n$ and $dy = C_nD_n$. Using these
definitions of dx and dy, equation (4.9) implies
$$ \frac{1}{\sqrt{2gy}}\,\frac{D_nC_{n+1}}{C_nC_{n+1}} = \frac{1}{\sqrt{2gy}}\,\frac{dx}{\sqrt{dx^2 + dy^2}} = \text{Constant} \equiv k . \tag{4.11} $$
Equation (4.11) can be rewritten as
$$ dx = dy\,\sqrt{\frac{y}{2a - y}} , \tag{4.12} $$
with $a = 1/(4gk^2)$. Leibniz expresses his solution in the form of equation (4.12)
without noticing that it corresponds to a cycloid.
2 We exchange x with y so that x is the horizontal coordinate. Leibniz calls y the horizontal coordinate.
4.1.3 Bernoulli’s Solution: Particle Paths as Light Rays

In his response to Leibniz, Bernoulli remarks with amazement that the solution
to the problem is the same curve that Huygens had found for the isochronous
pendulum. The correspondence is immediate if we notice, from Figure 4.1, that
$$ \frac{dx}{dy} \approx \frac{G\Sigma}{\Sigma R} = \frac{DQ}{QG} = \frac{DQ}{\sqrt{DG^2 - DQ^2}} , \tag{4.13} $$
with DQ = y, the vertical distance of a point of the cycloid. From the law of
chords $DG^2 = DQ \times \ell$, and we obtain
$$ dx = dy\,\sqrt{\frac{DQ}{\ell - DQ}} . \tag{4.14} $$
Equation (4.14) is equivalent to equation (4.12), with a the radius of the circle that
generates the cycloid.
Bernoulli tells Leibniz that he found the solution in two ways. He refers to the
first method as a discovery of an “admirable coincidence” between the curvature
of a light ray that propagates in a non-uniform medium and the brachistochrone.
He cites the letter to De La Chambre where Fermat establishes that a light ray
refracts towards the perpendicular, traveling, from the point of view of time,
the shortest path. He also mentions Snell’s law (equation 2.28) and proposes an
extension to a medium where the velocity varies continuously (see Figure 4.6).
Bernoulli is extending Snell’s law to all points of the path, implicitly using the fact
that “any portion of a path of quickest descent must itself be a path of quickest
descent” (McDonough, 2009).
Figure 4.6 John Bernoulli divides space into horizontal regions on which the
velocity is constant. The path is a sequence of small straight segments. The curve
of minimum time for a particle traveling from A to B on a vertical plane is the
same as the light ray from A to B, provided the particle and ray velocities are
proportional.
Figure 4.7 A light ray refracting at horizontal interfaces.
Bernoulli discretizes the problem into thin horizontal layers. We call d the thickness
of each layer (see Figure 4.7). According to Bernoulli, the particle’s velocity
in each layer is constant. The interfaces between regions are at heights $y_n = nd$
(here n = 1, 2, ...) and the particle’s velocity in the n-th region is approximately
$v_n = \sqrt{2g \times nd}$. Since one is looking for a path of least time, Bernoulli maps the
problem to that of a light ray that moves at a velocity $v_n$ in the n-th region and uses
Snell’s law at the interface. We call $\theta_n^i$ and $\theta_n^r$ the angles of incidence and refraction
of the ray at the n-th interface (see Figure 4.7). According to Snell’s law, we have
$$ \frac{\sin\theta_n^i}{v_n} = \frac{\sin\theta_n^r}{v_{n+1}} . \tag{4.15} $$
Since the interfaces are parallel, the refracted angle at the n-th interface is equal to
the incidence angle at the next interface:
$$ \theta_n^r = \theta_{n+1}^i , \tag{4.16} $$
which implies that the ratio of the sine of the angle of incidence to the velocity is
the same constant at each interface. In the continuum limit, with $v = \sqrt{2gy}$,
$$ \frac{\sin\theta}{\sqrt{y}} = C , \tag{4.17} $$
with C a constant. From our discussion of Leibniz’s solution, it is clear that
equation (4.17) represents a cycloid: since
$$ \sin\theta = \frac{dx}{\sqrt{dx^2 + dy^2}} , \tag{4.18} $$
equation (4.17) is equivalent to Leibniz’s solution of equation (4.11).
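Bernoulli's layered construction can be traced numerically and compared with the cycloid. The sketch below is an illustration under assumptions introduced here: the function names `layered_ray` and `cycloid_x`, the choice of the Snell constant as $C = 1/\sqrt{2a}$ (so that the limiting curve is the cycloid generated by a circle of radius a), and the number of layers. Within each thin layer the ray is a straight segment whose angle with the vertical is fixed by the Snell invariant.

```python
import math

def layered_ray(a, y_end, n_layers):
    """Horizontal distance covered by a ray crossing n_layers thin horizontal
    layers down to depth y_end, with sin(theta)/sqrt(y) = 1/sqrt(2a) in each layer."""
    d = y_end / n_layers                             # layer thickness
    x = 0.0
    for n in range(1, n_layers + 1):
        sin_t = math.sqrt(n * d / (2.0 * a))         # Snell invariant in the n-th layer
        x += d * sin_t / math.sqrt(1.0 - sin_t * sin_t)   # d * tan(theta)
    return x

def cycloid_x(a, y):
    """Horizontal coordinate of the cycloid x = a(phi - sin phi), y = a(1 - cos phi)
    at depth y, on the first half arch (0 <= y <= 2a)."""
    phi = math.acos(1.0 - y / a)
    return a * (phi - math.sin(phi))

print(layered_ray(1.0, 1.5, 20000), cycloid_x(1.0, 1.5))
```

As the layers are made thinner the polygonal ray converges to the cycloid; the depth is kept below 2a so that the sine of the angle stays below one.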
In the last part of the letter, by way of an appendix, Bernoulli mentions his solution
to another problem in which he anticipates ideas that William Rowan Hamilton
would study in the nineteenth century. He discusses a curve, which he calls “synchronous,”
generated by points B (see Figure 4.8) of cycloids of a common origin
A, so that the travel times from A to B are equal. The synchronous curve cuts
all the cycloids perpendicularly and corresponds, says Bernoulli, to the “wave”
that Huygens (1690/1945) had discussed in his Treatise on Light. Using the results
of Huygens’ isochronous pendulum, he indicates how to construct such a curve.
From equation (4.5), the time to fall from A to B on a cycloid is proportional
to the product $\angle(GOL) \times \sqrt{GK}$ (see Figure 4.8). Since the time is also proportional
to $\sqrt{AP}$, Bernoulli indicates that the synchronous curve can be obtained by
intersecting the cycloid with a horizontal line through L, with the arc GL proportional
to the geometric mean of GK with AP. This elegant construction is
possible due to the coincidental fact that the brachistochrone and the isochronous
pendulum correspond to the same curve. Bernoulli regards this coincidence as a
motive for metaphysical speculation: Nature, which operates according to the simplest
means, has chosen uniform acceleration for falling bodies, and one curve
to fulfill both functions. If the velocity, instead of increasing as the square root of the
height, were proportional to the height (a force proportional to the
height), then the isochronous curve would be a straight line. On the other hand, Snell’s
law would give sin θ = Cy (the equation of a circle) and the brachistochrone,
rather than a cycloid, would then be a circle. Given its historical importance as well as the
counter-intuitive property of being faster than a straight line, the brachistochrone
has received much prominence outside standard mathematical venues. A brachistochrone
monument was unveiled at the University of Groningen in 1996 and, in the
Academy building in Groningen, the brachistochrone is depicted in a stained-glass
window (Sussmann and Willems, 1997).

Figure 4.8 Bernoulli’s synchronous curve.
4.2 Maupertuis, Least Action, and Metaphysical Mechanics

In 1746, Pierre Louis Moreau de Maupertuis, John Bernoulli’s disciple and the
first French Newtonian, proposed a notion that he thought of as having universal
application: the principle of least action. His proposal, based on metaphysical and
religious views (Jourdain, 1912), reflected his adherence to notions of simplicity
that had previously guided Fermat and Galileo: “Nature, in the production of its
effects,” he wrote, “does so always by the simplest means” (Maupertuis, 1744).
More specifically: “in Nature, the quantity action (la quantité d’action) necessary
for change is the smallest possible. Action is the product of the mass of a body
times its velocity times the distance it moves” (Maupertuis, 1746).

Figure 4.9 From Maupertuis (1744).
In his 1744 article, Maupertuis derives the law of refraction using a minimization
process. His procedure uses calculus, replicating Fermat’s method. However, in his
calculation, rather than minimizing time, he minimizes the action V × A R + W ×
R B (see Figure 4.9), where V and W are the velocities of light in the different
media. Maupertuis minimizes the action treating the length of the segment C R as
a variable, the procedure being identical to Leibniz’s calculation of equation (4.9).
His result is Snell’s law in Descartes’s version: the ratio of the sines of the angles
is equal to the reciprocal of the ratio of the velocities:
$$ \frac{\sin\angle ARE}{\sin\angle FRB} = \frac{W}{V}. \tag{4.19} $$
Even though Maupertuis gets the wrong result for light, his expression, from a
corpuscular point of view, is correct. In contemporary language, as we discussed
in Section 2.4, conservation of momentum in the direction parallel to the interface
gives Maupertuis’s (and Descartes’s) law:
$$ V \sin\angle ARE = W \sin\angle FRB. \tag{4.20} $$
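As a numerical illustration of this minimization, the following Python sketch locates the crossing point R that minimizes V × AR + W × RB and checks that equation (4.20) holds there. The geometry (heights a and b of A and B from the interface, horizontal separation d) and the values of V and W are assumptions chosen only for the example, and the golden-section search is simply one convenient minimizer, not Maupertuis's procedure.

```python
import math

def action(x, V, W, a, b, d):
    """Maupertuis's action V*AR + W*RB for a crossing point R at horizontal position x.

    A lies a height a above the interface at x = 0; B lies a depth b below it at x = d.
    """
    AR = math.hypot(x, a)
    RB = math.hypot(d - x, b)
    return V * AR + W * RB

def golden_section_min(f, lo, hi, tol=1e-10):
    """Golden-section search for the minimizer of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        e = lo + inv_phi * (hi - lo)
        if f(c) < f(e):
            hi = e   # minimum lies in [lo, e]
        else:
            lo = c   # minimum lies in [c, hi]
    return 0.5 * (lo + hi)

# Illustrative values (assumptions, not taken from Maupertuis's paper)
V, W = 1.0, 1.33
a, b, d = 1.0, 1.0, 2.0

x_min = golden_section_min(lambda x: action(x, V, W, a, b, d), 0.0, d)
sin_i = x_min / math.hypot(x_min, a)            # sine of angle ARE
sin_r = (d - x_min) / math.hypot(d - x_min, b)  # sine of angle FRB

print(V * sin_i, W * sin_r)  # the two products agree at the minimum
```

Because the action is a sum of two convex functions of x, the search interval always contains the unique minimum.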
Maupertuis’s novelty is to show (albeit with the wrong premise) that, for the
restricted case of a single interface, the bending of a particle path can be obtained
from a minimum principle. In a subsequent paper entitled “Derivation of the laws of
motion and equilibrium from a metaphysical principle” (Maupertuis, 1746), Maupertuis
applied his principle of least action to simple problems: the collision of
bodies on a plane and the equilibrium of two bodies on a lever. He finds the equilibrium
point for two bodies attached to a lever by requiring that, for slight motions
of the lever, the action is the smallest possible. Let L be the length of the lever. Let
two masses $m_1$ and $m_2$ be placed at either end and call z the distance from the first
mass $m_1$ to the equilibrium point. If the lever rotates slightly, by an angle δ, about
the equilibrium point, the masses 1 and 2 describe arcs of length zδ and (L − z)δ
respectively. Since these arcs, says Maupertuis, will be proportional to the speed
of each particle, the quantity of action is proportional to $[m_1 z^2 + m_2 (L - z)^2]\,\delta$.
Minimization of the action with respect to z, that is, $2 m_1 z - 2 m_2 (L - z) = 0$,
gives $z = m_2 L/(m_1 + m_2)$, the correct
equilibrium condition. This simple calculation is at the end of the paper, the core
of which is metaphysical. Maupertuis showed that his principle applied to bodies
at rest and to light. He was convinced that it was indeed universal and displayed
the workings of God in the very construction of the universe. God, in Maupertuis’s
view, acted in such a way as to spend the least amount of this mysterious fuel, mv
(Ekeland, 2006).
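The lever calculation is easy to check numerically. The sketch below scans the quantity $m_1 z^2 + m_2 (L - z)^2$ over a fine grid of pivot positions and compares the minimizer with $m_2 L/(m_1 + m_2)$; the masses, the lever length, and the grid resolution are illustrative assumptions.

```python
def lever_action(z, m1, m2, L):
    """Quantity proportional to Maupertuis's action for the lever (common factor delta dropped)."""
    return m1 * z**2 + m2 * (L - z)**2

# Illustrative values (assumptions): two masses on a lever of unit length
m1, m2, L = 2.0, 3.0, 1.0

# Brute-force scan of the pivot position z between the two ends of the lever
steps = 100000
candidates = [i * L / steps for i in range(steps + 1)]
z_numeric = min(candidates, key=lambda z: lever_action(z, m1, m2, L))

z_formula = m2 * L / (m1 + m2)
print(z_numeric, z_formula)

# At the minimum the moments balance: m1 * z = m2 * (L - z)
print(m1 * z_formula, m2 * (L - z_formula))
```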
Maupertuis’s ideas were far from being unanimously accepted. Amongst his
prominent detractors was Voltaire. Before becoming his enemy, Voltaire was an
ally of Maupertuis in spreading Newton’s ideas in the 1730s. But tensions grew,
partly out of jealousy of Maupertuis’s romantic involvement with mathematician
Emilie du Châtelet, Voltaire’s lover (Terrall, 2002). In a pamphlet entitled “The
Diatribe of Doctor Akakia, Physician to the Pope,” Voltaire (1753) satirizes Mau-
pertuis and accuses him of plagiarism. Voltaire was siding with another alleged
rival, Johann Samuel König, who, in a review published in Nova Acta Eruditorum
(König, 1751), claimed that Leibniz had enunciated the principle of least action
in a 1707 letter to Jacob Hermann. Leibniz’s original manuscript of the letter
was never found, and König was accused of forgery (Brunet, 1938). Recent schol-
arship by historian Ursula Goldenbaum (2016) challenges this widespread view.
Goldenbaum holds that Leibniz’s letter was not forged, that König did not accuse
Maupertuis of plagiarism, and calls the whole issue “the most successful fake news
in modern history” (Goldenbaum, 2017). Despite this petty dispute, Leibniz’s work
does anticipate the principle of least action. In one of his most important scientific
papers, “A Unitary Principle of Optics, Catoptrics, and Dioptrics,” Leibniz (1682)
attempts a reconciliation of the Cartesian and Fermatian views on the phenom-
ena of reflection (catoptrics) and refraction (dioptrics) of light (McDonough, 2008,
2009). Leibniz takes the point of view of Descartes that light travels faster in denser
media, and at the same time follows a Fermatian approach in insisting that the path
followed by light be obtained by the minimization of some quantity. This quantity
cannot be length, because light bends when it refracts. It cannot be time because
light bends towards the perpendicular as it enters a denser – and, in this view, faster
– medium taking a longer time than a straight line. So he proposes that light “radiating
from a point reaches an illuminated point by the easiest path” (Leibniz, 1682)
– or “most determined” path (Leibniz, 1696/1952) – where by easiest he means the
one that minimizes the total “resistance” opposed by the media. For a refracting ray
in an air-water interface like the one shown in Figure 4.9, the easiest path corre-
sponds to the minimum of m × A R + n × R B, where m and n are the resistances of
the upper (air) and lower (water) media respectively. Leibniz argues that the resis-
tance is higher in a denser medium and gets the correct result. His result agrees
in spirit with Fermat in that the physical path is obtained through a minimization
process and not by a local, mechanically efficient method, as favored by Descartes.
However, in order to agree with Descartes, Leibniz has to assume that light travels
faster in more resistant media, because “greater resistance prevents the diffusion
of light rays,” in a manner similar to a “river that flows in a narrow bed and thus
acquires a larger velocity” (Dugas, 1955, p. 260). In a deeper sense, Leibniz maintains
that a principle like the most determined path is reflecting “God’s intentions
to create the best of all possible worlds” (McDonough, 2008). “This principle of
nature,” he says in his Tentamen Anagogicum, “is purely architectonic,” and then he
adds: “Assume the case that nature were obliged in general to construct a triangle
and that for this purpose only the perimeter or the sum were given, and nothing
else; then nature would construct an equilateral triangle” (Leibniz, 1696/1952).
The biggest supporter of Maupertuis’s ideas was Leonhard Euler, who condemned
König’s accusations and applied Maupertuis’s naive formulation to curvilinear
paths to show that planetary orbits can be obtained by requiring the action to be
a minimum.
4.3 Euler and the Method of Maxima and Minima*

In his book A Method for Finding Curved Lines having some Properties of Max-
imum and Minimum . . . – according to mathematician Constantin Carathéodory
(1937), one of the most wonderful books that has ever been written about a math-
ematical subject – Leonhard Euler (1744) sets up a general method for finding
curves that maximize or minimize a given quantity. At the end of the book, in
Additamentum II, he derives the paths that minimize Maupertuis’s action.
Euler considers the curve az of Figure 4.10, which we call y(x), with x repre-
senting the horizontal axis AZ. Suppose for concreteness that one wants to find
the curve of fastest descent from a to z (or from z to a). Euler’s procedure is to
discretize the horizontal axis into small equal segments HI = IJ = ⋯ = RS = dx,
and evaluate y(x) at the discrete points H, I, ⋯. The continuum curve az is
now a sequence of straight segments. Given that the continuum problem has been
discretized, it is convenient to number the points so that $M = x_{j-1}$, $N = x_j$,
$O = x_{j+1}$, etc. With the same numbering convention, $Mm = y_{j-1}$, $Nn = y_j$,
$Oo = y_{j+1}$, etc.

Figure 4.10 From Euler’s A Method for Finding Curved Lines having some Properties
of Maximum and Minimum (Euler, 1744). In order to determine the curve
y = y(x), with A ≤ x ≤ Z, which minimizes (or maximizes) the definite integral
$\int_A^Z F(x, y, y')\,dx$, Euler divides the interval AZ into many small subintervals,
each of width Δx. He then replaces the integral by a sum $\sum_i F(x_i, y_i, y'_i)\,\Delta x$.
In each term of this sum, he approximates the derivative $y'_i$ by the slope of the
straight line between initial and final points of the subinterval. He then takes the
variation on a single point (N in the figure), changing y from n to ν, and asks that
the variation in the integral (the sum in the discretized version) is zero.

Following the logic that we used in previous discretizations, we
assume with Euler that the particle velocity v(y) (a function of the vertical height)
is constant along each straight segment of the curve. The travel time from m to n
will be $mn/v(y_j)$; from n to o, it will be $no/v(y_{j+1})$, etc. Using the Pythagorean
theorem, we have:
$$ mn = \sqrt{(dx)^2 + (y_j - y_{j-1})^2}, \tag{4.21a} $$
$$ no = \sqrt{(dx)^2 + (y_{j+1} - y_j)^2}, \tag{4.21b} $$
and similarly for all the other segments of the curve. The total travel time T is now
a sum over travel times on the individual segments:
$$ T = \cdots + n(y_j)\sqrt{(dx)^2 + (y_j - y_{j-1})^2} + n(y_{j+1})\sqrt{(dx)^2 + (y_{j+1} - y_j)^2} + \cdots, \tag{4.22} $$
where for simplicity we called n(y) = 1/v(y) (the index of refraction in the optical-mechanical
analogy). Euler designated by $p_j$ the slope, or the derivative of the curve:
$$ p_j = \frac{y_j - y_{j-1}}{dx}, \tag{4.23} $$
in terms of which the travel time becomes
$$ T = \left[\cdots + n(y_j)\sqrt{1 + p_j^2} + n(y_{j+1})\sqrt{1 + p_{j+1}^2} + \cdots\right] dx. \tag{4.24} $$
And now comes the minimization procedure. The travel time T is a function of
a large – infinite in principle but finite upon discretization – number of variables
$y_j$. The minimum (or extremum) of T has to be an extremum with respect to variations
of each individual $y_j$, changes that Euler represents as nν (see Figure 4.10):
$$ \frac{dT}{dy_j} = 0. \tag{4.25} $$
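Since $y_j$ appears in both $p_j$ and $p_{j+1}$, the derivative of T with respect to $y_j$
involves three terms. Noting that $dp_j/dy_j = 1/dx$ and $dp_{j+1}/dy_j = -1/dx$, and
dividing by the overall factor dx of equation (4.24), we obtain
$$ \frac{1}{dx}\frac{dT}{dy_j} = \frac{dn(y_j)}{dy_j}\sqrt{1 + p_j^2} + \frac{1}{dx}\left\{ n(y_j)\frac{p_j}{\sqrt{1 + p_j^2}} - n(y_{j+1})\frac{p_{j+1}}{\sqrt{1 + p_{j+1}^2}} \right\}. \tag{4.26} $$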
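At this point, notice that the second term in equation (4.26), in the limit where dx
is infinitesimal, becomes
$$ \frac{1}{dx}\left\{ n(y_j)\frac{p_j}{\sqrt{1 + p_j^2}} - n(y_{j+1})\frac{p_{j+1}}{\sqrt{1 + p_{j+1}^2}} \right\} \approx -\frac{d}{dx}\frac{d}{dp}\left[ n(y)\sqrt{1 + p^2} \right]. \tag{4.27} $$
Finally, introducing partial derivatives, we obtain the following differential
equation for the fastest path of a particle moving with an inverse velocity n(y):
$$ \frac{\partial}{\partial y}\left[ n(y)\sqrt{1 + p^2} \right] - \frac{d}{dx}\frac{\partial}{\partial p}\left[ n(y)\sqrt{1 + p^2} \right] = 0. \tag{4.28} $$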
Following Euler, we can repeat the same discretization procedure for a general
integral of the form
$$ \int Z(y(x), p(x))\,dx. \tag{4.29} $$
The procedure leads us to the following differential equation for the function y(x)
that extremizes the integral $\int Z\,dx$:
$$ \frac{\partial Z}{\partial y} - \frac{d}{dx}\frac{\partial Z}{\partial p} = 0. \tag{4.30} $$
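Euler's discretization lends itself to a direct numerical check. The sketch below builds the discretized travel time of equation (4.24) for a broken line, differentiates it with respect to a single ordinate $y_j$ term by term as in the text, and compares the result with a finite-difference derivative. The choice $n(y) = 1/\sqrt{y}$ (fall from rest, in units where 2g = 1), the sample ordinates, and the step size are assumptions made only for this illustration.

```python
import math

def n(y):
    """Inverse speed 1/v(y); v = sqrt(y) is an assumed example (fall from rest, 2g = 1)."""
    return 1.0 / math.sqrt(y)

def dn(y):
    """Derivative of n with respect to y."""
    return -0.5 * y ** -1.5

def travel_time(ys, dx):
    """Discretized travel time, equation (4.24): sum of n(y_j) sqrt(1 + p_j^2) dx."""
    total = 0.0
    for j in range(1, len(ys)):
        p = (ys[j] - ys[j - 1]) / dx
        total += n(ys[j]) * math.sqrt(1.0 + p * p) * dx
    return total

def gradient_formula(ys, dx, j):
    """dT/dy_j for an interior point, from differentiating (4.24) term by term.

    This equals dx times the right-hand side of equation (4.26).
    """
    p_j = (ys[j] - ys[j - 1]) / dx
    p_next = (ys[j + 1] - ys[j]) / dx
    return (dn(ys[j]) * math.sqrt(1.0 + p_j**2) * dx
            + n(ys[j]) * p_j / math.sqrt(1.0 + p_j**2)
            - n(ys[j + 1]) * p_next / math.sqrt(1.0 + p_next**2))

def gradient_numeric(ys, dx, j, h=1e-6):
    """Central finite-difference estimate of dT/dy_j."""
    up = list(ys)
    down = list(ys)
    up[j] += h
    down[j] -= h
    return (travel_time(up, dx) - travel_time(down, dx)) / (2.0 * h)

# Illustrative broken line (assumed ordinates, all positive) with spacing dx
ys = [1.0, 1.3, 1.7, 2.0, 2.6]
dx = 0.5
g_formula = gradient_formula(ys, dx, 2)
g_numeric = gradient_numeric(ys, dx, 2)
print(g_formula, g_numeric)
```

Setting this derivative to zero for every interior point is the discrete counterpart of equation (4.28).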
Euler applies his method to many examples. The most relevant one for our
discussion is his derivation of the Keplerian orbits from a minimization of
Maupertuis’s action.
4.3.1 Euler’s Derivation of Orbits from the Least Action Principle

In the Additamentum II of “The Method of Maxima and Minima,” Euler applies
his technique to a particle moving in a central force describing a planar orbit. He
does it in two ways. In the second – which we will follow here – he uses polar
coordinates. He shows that his differential equation (4.30) is the same one obtained
by the “direct method.” For a direct derivation, one can use, following Euler (1736),
the so-called vis viva (Latin for “living force”) theorem. We first use Cartesian
coordinates. Consider motion in two dimensions and decompose the force into an
x and a y component, which Euler calls X and Y. According to Newton’s laws, if
$v_x$ and $v_y$ are the components of the velocity in each direction,
$$ X = m\frac{dv_x}{dt}, \tag{4.31a} $$
$$ Y = m\frac{dv_y}{dt}. \tag{4.31b} $$
Euler considers
$$ X\,dx + Y\,dy = m(v_x\,dv_x + v_y\,dv_y) = \frac{m}{2}\,d(v_x^2 + v_y^2) \equiv d\left(\frac{mv^2}{2}\right), \tag{4.32} $$
and obtains the vis viva theorem (Smith, 2006). For a gravitational attraction
directed at the point (x, y) = (0, 0), the components of the force F – the magnitude
of which is $F(r) = GmM/r^2$, with $r^2 = x^2 + y^2$ – are³
$$ X = -F(r)\cos\theta = -\frac{GmM}{x^2 + y^2}\,\frac{x}{\sqrt{x^2 + y^2}} \equiv \frac{\partial}{\partial x}\frac{GmM}{\sqrt{x^2 + y^2}}, \tag{4.33a} $$
$$ Y = -F(r)\sin\theta = -\frac{GmM}{x^2 + y^2}\,\frac{y}{\sqrt{x^2 + y^2}} \equiv \frac{\partial}{\partial y}\frac{GmM}{\sqrt{x^2 + y^2}}, \tag{4.33b} $$
with θ the polar angle. Substituting equations (4.33) in (4.32), we obtain
$$ d\left(\frac{GMm}{r}\right) = d\left(\frac{mv^2}{2}\right), \tag{4.34} $$
which is equivalent to
$$ E = \frac{mv^2}{2} - \frac{GMm}{r} = \text{Constant}. \tag{4.35} $$
In contemporary language, E is the total mechanical energy, mv 2 /2 is the kinetic
energy, and −G Mm/r the potential energy. Euler uses the idea of potential energy
(Yurkina, 1985) with another name; the term was used for the first time by Rankine
at the end of the 19th century (Lanczos, 1962).
3 Here G is the gravitational constant, m the mass of the orbiting particle, and M the mass of the center of
force.
On top of the constant E, there is another constant associated with equal areas
covered in equal times (the angular momentum in contemporary language). In polar
coordinates, the components of the velocity are $v_r$ in the radial direction and $v_\theta$ in
the tangential direction. Constant areas in equal times imply that
$$ m r v_\theta \equiv m r^2 \frac{d\theta}{dt} = L = \text{Constant}, \tag{4.36a} $$
$$ v_r = \frac{dr}{dt} = \frac{dr}{d\theta}\frac{d\theta}{dt} = \frac{dr}{d\theta}\frac{L}{mr^2}. \tag{4.36b} $$
Substituting equations (4.36) in equation (4.35), and noticing that $v^2 = v_r^2 + v_\theta^2$,
we obtain a differential equation for the trajectory r = r(θ) that does not involve
the time variable:
$$ E = \frac{L^2}{2mr^4}\left(\frac{dr}{d\theta}\right)^2 + \frac{L^2}{2mr^2} - \frac{GMm}{r}, \tag{4.37} $$
or, in Euler’s notation,
$$ \frac{dr}{d\theta} = \frac{r}{\sqrt{C}}\sqrt{r^2\,(A + V(r)) - C}, \tag{4.38} $$
with $V(r) = GMm/r$, $C = L^2/2m$ and A = E. Euler shows that this differential
equation results from a minimization of Maupertuis’s action,
$$ \int m v\,d\ell. \tag{4.39} $$
In polar coordinates $d\ell$ is given by
$$ d\ell = \sqrt{dr^2 + r^2 d\theta^2}. \tag{4.40} $$
Also, using equation (4.35), we write
$$ v(r) = \sqrt{\frac{2}{m}}\,\sqrt{A + V(r)}, \tag{4.41} $$
and obtain
$$ m v\,d\ell = \left\{ \sqrt{2m}\,\sqrt{A + V(r)}\,\sqrt{1 + r^2\left(\frac{d\theta}{dr}\right)^2} \right\} dr. \tag{4.42} $$
Euler calls x the radial coordinate. The function to be found, y(x), is our θ(r)
(the polar angle of the orbit as a function of the radius), and Euler’s p = dθ/dr.
This means that the function Z is given by
$$ Z(y(x), p(x)) = \sqrt{2m}\,\sqrt{A + V(x)}\,\sqrt{1 + x^2 p^2}. \tag{4.43} $$
Since Z from equation (4.43) is not a function of y, Euler’s equation (4.30) reduces to
$\frac{d}{dx}\frac{\partial Z}{\partial p} = 0$, or, equivalently,
$$ \frac{\partial Z}{\partial p} = \text{Constant} \;\rightarrow\; \frac{p\,x^2\sqrt{A + V(x)}}{\sqrt{1 + x^2 p^2}} = \sqrt{C}, \tag{4.44} $$
where we wrote the constant as $\sqrt{C}$ so as to stay close to Euler’s notation. Equation
(4.44), after simple algebra, gives
$$ \frac{1}{p} = \frac{dx}{d\theta} = \frac{x}{\sqrt{C}}\sqrt{x^2\,(A + V(x)) - C}, \tag{4.45} $$
which is the same result as that of equation (4.38), obtained from the direct method.
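As a consistency check, one can verify numerically that a Keplerian conic satisfies the orbit equation (4.38). The sketch below uses the standard polar form r(θ) = ℓ/(1 + e cos θ), with semi-latus rectum ℓ = L²/(mk) and eccentricity e = √(1 + 2EL²/(mk²)), where k stands for GMm; these textbook expressions, the symbol k, and the numerical values are assumptions brought in for the illustration and are not part of Euler's derivation.

```python
import math

# Illustrative values (assumptions): k stands for G*M*m, and E < 0 gives a bound orbit
m, k, L, E = 1.0, 1.0, 0.9, -0.3

A = E                   # Euler's constant A
C = L**2 / (2.0 * m)    # Euler's constant C

def V(r):
    """Euler's V(r) = G M m / r (minus the potential energy)."""
    return k / r

# Standard Kepler conic: r(theta) = ell / (1 + ecc * cos(theta))
ell = L**2 / (m * k)
ecc = math.sqrt(1.0 + 2.0 * E * L**2 / (m * k**2))

def r_of(theta):
    return ell / (1.0 + ecc * math.cos(theta))

def dr_dtheta(theta):
    """Analytic derivative of the conic with respect to theta."""
    return ell * ecc * math.sin(theta) / (1.0 + ecc * math.cos(theta))**2

def rhs_438(r):
    """Right-hand side of equation (4.38)."""
    return (r / math.sqrt(C)) * math.sqrt(r**2 * (A + V(r)) - C)

# Compare both sides on the outgoing half of the orbit (0 < theta < pi), where dr/dtheta > 0
max_mismatch = max(abs(dr_dtheta(t) - rhs_438(r_of(t)))
                   for t in [0.3, 0.8, 1.5, 2.2, 2.9])
print(max_mismatch)
```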
Euler stresses that this calculation is valid as long as there is no resistance to
the motion.4 In other words, the motion involves a constant of motion E, the total
energy, and the velocity v of the particle is a simple function of the coordinates
determined by (4.35), the vis viva equation. This restriction to conservative motion
was not mentioned by Maupertuis. Euler (1751) later published a paper entitled
“Dissertation on the least action principle, with an examination of the objections
made by Professor König,” where he yields all the honor of the discovery of the
principle of least action to Maupertuis. He cites Aristotle’s notion that Nature does
nothing in vain. The preference for “least” is metaphysical; a maximum would be
evidence of “imperfection of the Creator’s wisdom” (Dugas, 1955, p. 275). How-
ever, what Euler showed in his calculation is that orbits in a central potential are
extrema of the Maupertuis action, and not necessarily minima. It is clear that the
paths cannot be maxima. One could always increase the value of the action by
adding “wiggles” to the path over a region small enough in size that the potential
V is constant. This process increases the kinetic energy of the path (the integral of
the potential energy remains constant), and the action increases. As we will dis-
cuss in more detail in Section 6.8, the physical paths are either minima or saddle
“points” of the action. It is also interesting that Euler does not claim to have proven
or discovered a wider principle. On the contrary, he remarks that he has “not discov-
ered this beautiful property a priori but (using logical terms) a posteriori, deducing
after many trials the formula which must become a minimum” (Euler, 1751). Mach
had good things to say about Euler’s modesty and accomplishments: “Euler mag-
nanimously left the principle its name, Maupertuis the glory of the invention, and
converted it into something new and really serviceable” (Mach, 1960, p. 550).
4 Here the term “resistance” refers to friction, or non-energy-conserving forces, and not to resistance in the
Leibnizian sense.
4.4 Examples of the Optical-Mechanical Analogy

In Section 4.1.3 we discussed Bernoulli’s solution of the problem of quickest
descent using the connection to Snell’s law. This beautiful solution establishes
an equivalence between the trajectories of particles (at constant energy) and light
rays. For light, Fermat’s principle asks us to minimize the travel time between the
extremes of the path. The integral to be minimized is given by
$$ \int \frac{1}{v}\,ds, \tag{4.46} $$
where v is the velocity of the light ray, which in general is some function of position.
On the other hand, for particles, Maupertuis’s principle of least action states
that the physical path is the one that minimizes the action, given by equation (4.39),
$$ m\int v\,ds. \tag{4.47} $$
The analogy is immediate. If we know the functional form of $v_{\text{particle}}(x)$ as a
function of position x, the path connecting two points A and B will be that of a
light ray with a position-dependent velocity $v_{\text{light}}(x)$ so that
$$ v_{\text{particle}}(x) \propto \frac{1}{v_{\text{light}}(x)}. \tag{4.48} $$
Equivalently, in terms of the index of refraction n, defined as $n = c/v_{\text{light}}$, with
c the velocity of light in a vacuum, the paths of light rays and those of particles
are the same provided v ∝ n (from now on we use v for particles, omitting the
subscript).
The optical-mechanical analogy expresses a geometric relation between portions
of the trajectory. Although the paths are geometrically equivalent, the time evolution
of a light ray and that of the particle will be clearly different: the light ray
is fastest where the particle is slowest. If we start with the particle at a given point
with a given velocity (direction and magnitude), we can construct its trajectory
by breaking it into small straight segments that correspond to the propagation in
regions of constant potential. These segments will bend at each interface according
to $m v_1 \sin\theta_1 = m v_2 \sin\theta_2$. If we start with a light ray at the same position and
pointing in the same direction as the velocity of the particle, we will get the same
trajectory if for each of the straight segments
$$ n_i = m v_i, \tag{4.49} $$
as depicted in Figure 4.11. Since the energy of the particle is the same at all points
of the trajectory,
$$ \frac{1}{2} m v_i^2 = E - V_i. \tag{4.50} $$

Figure 4.11 Bending of light rays and particles.

This means that, if we know the trajectory of a particle with energy E connecting
two points of a medium in which the potential is V(x), that identical trajectory will
be the one followed by a light ray connecting those points in a medium in which
the index of refraction is⁵
$$ n(x) \propto \sqrt{m\,[E - V(x)]}. \tag{4.51} $$
Conversely, the trajectory of a light ray in a medium in which the index of refraction
is n(x) will be identical to that followed by a particle of zero energy moving in a
potential
$$ m V(x) \propto -\frac{n^2(x)}{2}. \tag{4.52} $$
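The converse statement can be checked in one line. Setting E = 0 in the energy relation (4.50), written for a continuous potential, and inserting the potential of equation (4.52) gives back the identification of momentum with index of refraction; the following is only a short worked check under those two assumptions:

```latex
% Consistency check of (4.52), using (4.50) with E = 0
\frac{1}{2} m v^2 = E - V(x) = -V(x)
\quad\Longrightarrow\quad
(mv)^2 = -2\,m\,V(x) \propto n^2(x)
\quad\Longrightarrow\quad
mv \propto n(x).
```

This is the continuous version of the segment-by-segment identification (4.49).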
As we already mentioned, the optical mechanical analogy was later used by
William Rowan Hamilton, and played a crucial role in the formulation of quan-
tum mechanics in the twentieth century. However, and remarkably, the idea “is
nowhere to be found again before the time of Hamilton in the nineteenth century”
(Carathéodory, 1937). From a pedagogical point of view, it is interesting that we
can obtain the trajectories of light rays in media with varying indexes of refrac-
tion by solving Newton’s equation of motion for a particle (Evans and Rosenquist,
1986).
4.4.1 Conservation of “Angular Momentum” for Light Rays

Let us consider the case where the index of refraction has spherical symmetry, that
is, n depends only on the distance with respect to a fixed point. For particles this
corresponds to motion in a central field of force.
5 Notice that the proportionality constant implicit in equation (4.51) is irrelevant since the minimization
process equates derivatives of integrals (4.46) and (4.47) and the path will be independent of the
proportionality constant, which appears in turn as a constant multiplying the whole integral.
Figure 4.12 A light ray refracting in a region in which the index of refraction
varies radially.
From Snell’s law at point a of Figure 4.12 we have
$$ \frac{\sin\theta_n^{i}}{v_n} = \frac{\sin\theta_n^{r}}{v_{n-1}}. \tag{4.53} $$
In contrast with the brachistochrone problem, now $\theta_{n-1}^{i} \neq \theta_n^{r}$. In reference to
Figure 4.12 we see that, for small angles Δθ,
$$ \sin\theta_n^{r} = \frac{cd}{ad} = \frac{r_{n-1}\,\Delta\theta}{ad}, $$
and
$$ \sin\theta_{n-1}^{i} = \frac{ab}{ad} = \frac{r_n\,\Delta\theta}{ad} \equiv \frac{r_n}{r_{n-1}}\,\sin\theta_n^{r}. $$
Substituting the above relations in equation (4.53) we have
$$ r_n \frac{\sin\theta_n^{i}}{v_n} = r_{n-1} \frac{\sin\theta_{n-1}^{i}}{v_{n-1}} = \text{Constant}. \tag{4.54} $$
We can now omit the index “i”, since in the above equation θ is the angle that
the light ray forms with the line connecting the “center of force” with the point of
refraction. The idea is now to take the limit of Δr very small so the trajectory of
the light ray becomes a continuous line. Also, we can write the velocity in terms of
the index of refraction, v(r) = c/n(r), and rewrite (4.54) as
$$ r\,n(r)\sin\theta(r) = \text{Constant}. \tag{4.55} $$

In some optics textbooks this equation is called the formula of Bouguer (Born and
Wolf, 1999, p. 123), and is identical to the conservation of angular momentum⁶ if
we follow the optical-mechanical identification mv(r) ↔ n(r).
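The invariant can be illustrated by tracing a ray through a few concentric shells of constant index, applying Snell's law in vector form at each spherical interface and evaluating n r sin θ after each refraction. The shell radii, the indices, and the starting ray in this sketch are assumptions chosen so that the ray crosses every shell without total internal reflection.

```python
import math

def refract(d, normal, n_in, n_out):
    """Vector form of Snell's law in two dimensions.

    d is the unit direction of the incoming ray; normal is the unit normal of the
    interface oriented against the ray (normal . d < 0).
    """
    eta = n_in / n_out
    cos_i = -(normal[0] * d[0] + normal[1] * d[1])
    sin2_t = eta**2 * (1.0 - cos_i**2)
    if sin2_t > 1.0:
        raise ValueError("total internal reflection")
    cos_t = math.sqrt(1.0 - sin2_t)
    f = eta * cos_i - cos_t
    return (eta * d[0] + f * normal[0], eta * d[1] + f * normal[1])

def hit_circle(p, d, radius):
    """First intersection of the ray p + t d with a circle centered at the origin (p outside)."""
    b = p[0] * d[0] + p[1] * d[1]
    c = p[0]**2 + p[1]**2 - radius**2
    disc = b * b - c
    if disc < 0.0:
        raise ValueError("ray misses the shell")
    t = -b - math.sqrt(disc)
    return (p[0] + t * d[0], p[1] + t * d[1])

def bouguer(p, d, n):
    """n r sin(theta), with theta the angle between the ray and the radial line through p."""
    return n * abs(p[0] * d[1] - p[1] * d[0])

# Illustrative shells (assumptions): index increasing toward the center
radii = [4.0, 3.0, 2.0]
indices = [1.0, 1.2, 1.4, 1.6]   # indices[0] is the index outside the largest shell

p, d = (-5.0, 1.0), (1.0, 0.0)   # starting point and unit direction of the ray
invariants = [bouguer(p, d, indices[0])]
for k, R in enumerate(radii):
    p = hit_circle(p, d, R)
    normal = (p[0] / R, p[1] / R)   # outward normal; the ray travels inward
    d = refract(d, normal, indices[k], indices[k + 1])
    invariants.append(bouguer(p, d, indices[k + 1]))

print(invariants)   # the same value after every refraction
```

Inside each shell the ray is straight, so r sin θ (its distance of closest approach to the center) is constant there, and each refraction rescales it by the ratio of the indices.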
4.4.2 The Terrestrial Brachistochrone

Suppose we want to dig a tunnel from New York to Los Angeles and use only the
gravitational attraction to accelerate a person through the tunnel. This is the equiva-
lent of Bernoulli’s brachistochrone problem except that here the gravitational force
is radial and varies in magnitude. If we assume that the velocity is zero at the beginning
of the tunnel, the velocity depends on the distance r to the center of the Earth
as⁷
$$ v(r) = \sqrt{\frac{g\,(R^2 - r^2)}{R}}, \tag{4.56} $$
where R is the radius of the Earth. Using the result on optical angular momentum,
the equation of the curve has to satisfy
$$ \frac{r\sin\theta}{\sqrt{R^2 - r^2}} = A, \tag{4.57} $$
with A a constant. From simple geometrical considerations we can see that this
equation corresponds to a hypocycloid, the curve described by a point on a circle
rolling inside of a larger circle.
From Figure 4.13 we see that
r sin θ = (R − D) cos α, (4.58)
and
r cos θ = R sin α. (4.59)
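A couple of algebra steps combining the two equations above (squaring and adding
them gives $R^2 - r^2 = [R^2 - (R - D)^2]\cos^2\alpha$) gives
$$ r\sin\theta = \frac{R - D}{\sqrt{R^2 - (R - D)^2}}\,\sqrt{R^2 - r^2}, \tag{4.60} $$
which has the form of equation (4.57) and proves that the terrestrial brachistochrone is a hypocycloid.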
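The elimination of α can also be checked numerically. In the sketch below, r and θ are computed from equations (4.58) and (4.59) for several values of α, and the ratio r sin θ/√(R² − r²) of equation (4.57) is evaluated for each; the radii R and D and the sample angles are illustrative assumptions.

```python
import math

# Illustrative values (assumptions): Earth radius R and rolling-circle diameter D
R, D = 1.0, 0.3

ratios = []
for alpha in [0.2, 0.5, 0.9, 1.3]:
    r_sin_theta = (R - D) * math.cos(alpha)   # equation (4.58)
    r_cos_theta = R * math.sin(alpha)         # equation (4.59)
    r = math.hypot(r_sin_theta, r_cos_theta)  # distance to the center of the Earth
    ratios.append(r_sin_theta / math.sqrt(R**2 - r**2))

# Value obtained by eliminating alpha between (4.58) and (4.59)
expected = (R - D) / math.sqrt(R**2 - (R - D)**2)
print(ratios, expected)
```

The ratio is the same for every α, which is the constant A of equation (4.57).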
6 Conservation of angular momentum, or equal areas in equal times, means that $m v_\theta r$ = Constant, with $v_\theta$ the
tangential velocity given by $v_\theta = v\sin\theta$. Since we are treating cases where the vis viva equation is valid, v is
a function of r, and we obtain $m\,v(r)\,r\sin\theta$ = Constant.
7 If we assume that the Earth is a perfect sphere of uniform density, at a radius r < R, the acceleration of
gravity g(r ) is determined by the mass at points smaller than r , as though all the mass were concentrated at
the center of the Earth. This means that g(r ) = gr/R. Since g(r ) is obtained from the potential V (r ) through
g(r ) = d V /dr , we obtain V (r ) = gr 2 /2R. Using the vis viva equation (conservation of energy), equation
(4.56) follows.
Figure 4.13 When a circle of diameter D = QT rolls inside of a circle of radius
R = CQ, point P describes a hypocycloid. From the same arguments discussed
in Section 4.1, the instantaneous motion of P is along a circle centered at the point
of contact Q. That means that the tangent to the curve is the line TP, which is
perpendicular to QP.
4.5 The String Analogy and the Principle of Least Action

Around the same time as the discussion of the least action principle, Bernoulli
(1742) published Disquisitio Catoptico-Dioptrica, in which he presents a deriva-
tion of both the law of reflection and Snell’s law, using a beautiful analogy with the
static equilibrium of a string under tension (see Figure 4.14). This analogy was also
noted later by Möbius (1837) (see also Gray, 1993; Lyusternik, 1964). A rephras-
ing of the derivation is the following. Call T1 and T2 the weights hanging from
points A and B and let the string be constrained to slide without friction along the
line C D (see Figure 4.14). Since the supports A and B are frictionless, the tensions
on the portions AE and E B are respectively T1 and T2 . The equilibrium condition
corresponds to zero force exerted by the string in the horizontal direction at the
point of support E. Since the tensions T1 and T2 are different, the segments AE
and E B are not in the same direction; the string “refracts” at the interface C D. We
call the angles ∠AE R = θ1 , and ∠S E B = θ2 . Since the portion AE exerts on E
a horizontal force T1 sin θ1 to the left, and portion E B exerts force T2 sin θ2 to the
right, the equilibrium condition is given by
T1 sin θ1 = T2 sin θ2 . (4.61)
Equation (4.61) is equivalent to Snell’s law if we identify the indices of refraction
$n_1$ and $n_2$ (or the inverse velocities) in the different regions with the corresponding
tensions of the string. As proven by Fermat and revisited by Huygens, Maupertuis,
and others, Snell’s law is the result of the minimization of the time t from A to
B, given by $ct = n_1 \ell_1 + n_2 \ell_2$, where $\ell_1 = AE$ and $\ell_2 = EB$. The analogy is
immediate; the quantity to be minimized for Bernoulli’s string is given by
$$ U = T_1 \ell_1 + T_2 \ell_2. \tag{4.62} $$
The quantity U is the potential energy of the system. If we use the optical-mechanical
analogy, the tension $T_i$ of the string can be identified with the momentum $m v_i$
of the particle, which, in turn, is given by $m v_i = \sqrt{2m(E - V_i)}$, where $V_i$ is the
corresponding potential energy for a particle in each region.

Figure 4.14 John Bernoulli’s proof of Snell’s law using the mechanical equilibrium
of a tense string. Reproduced from Bernoulli (1742).

Table 4.1 Corresponding quantities in the analogy used in the principle of least
action between mechanics, geometric optics and the equilibrium of a
non-stretchable string.

| Particle | Light Ray | Non-Stretchable String |
|---|---|---|
| $mv$ (momentum) | $n$ (refractive index) | $T$ (tension) |
| $mv_1 \sin\theta_1 = mv_2 \sin\theta_2$ | $n_1 \sin\theta_1 = n_2 \sin\theta_2$ (Snell’s law) | $T_1 \sin\theta_1 = T_2 \sin\theta_2$ (equilibrium) |
| $dA = mv\,d\ell$ (action) | $c\,dt = n\,d\ell$ (optical length) | $dU = T\,d\ell$ (potential energy) |

Figure 4.15 Frictionless pulleys that can slide along horizontal lines, with a string
passing through them a sufficient number of times, give the trajectory of the particle
if we identify $T_i$ with $m v_i$ at each segment. Since the string can only pass
through each pulley an integer number of times, the ratios of the velocities are
approximated by the ratios of the number of times the rope passes through each
segment (see Mach, 1960, p. 473).
In the spirit of the brachistochrone calculation, we are interested in paths that
traverse many regions where the particle velocities are different. In order to use
the string analogy, we break up the trajectory into many straight segments. The
arrangement will correspond to frictionless pulleys that can slide on horizontal
rods (see Figure 4.15), with the string passing through them a sufficient number of
times. Finally, there is a weight W attached to the extremity of the string (see Mach,
1960, p. 473). In Table 4.1, we summarize the non-stretchable string analogy with
Maupertuis’s principle of least action and Snell’s law.
In Chapter 6 we will visit the work of William Rowan Hamilton, who, starting
with Maupertuis’s action, extended the optical-mechanical analogy to an all-encompassing
formulation of mechanics. In anticipation, we present a simple way
to extend Bernoulli’s equilibrium analogy to stretchable strings, which leads us to a
preliminary derivation of Hamilton’s principle (Rojo, 2005).
4.5.1 The Least Action Principle and Stretchable Strings

Maupertuis’s principle of least action, as we discussed in this chapter, gives the
optimal trajectory for a particle of a given energy E between two fixed spatial
points. It doesn’t say anything about the time it takes to travel from one point to the
other. Now consider the problem of finding a path that will connect point P to point
Q in a fixed time t. In other words, what is the path that the particle will “choose”
in going from point P = (x P , 0) to point Q = (x Q , t)? In extending the treatment
to paths that go between two fixed space–time points, it is useful to treat t as a new
geometrical dimension. To simplify the analysis, and to retain the two-dimensional
picture of the previous section, consider motion in one (spatial) dimension.
We will discretize the problem and break the particle trajectory x(t), into small
straight segments connecting points separated by a fixed time interval dt. Just as
Newton does in the Principia, the fact that the segments are straight means that the
motion is of constant velocity during that interval, then changes due to an impulsive
force. This force will be different from zero if the potential is changing as a function
of x at the particle position. Consider for simplicity two broken segments as in
Figure 4.16(a).
Before the force F acts on the particle, the velocity is given by the slope of the
curve in the (x, t) plot:
$$ v_P = \frac{x_i - x_P}{dt}, \tag{4.63} $$
where points $x_i$ and $x_P$ should be thought of as very close to each other. The effect
of F is to change the particle’s velocity from $v_P$ to $v_Q$, given by
$$ v_Q = \frac{x_Q - x_i}{dt}. \tag{4.64} $$

Figure 4.16 Stretchable string analogy for Hamilton’s principle. [Two panels, each plotting x against t with ticks at 0, dt and 2dt: (a) the broken path from P = (x_P, 0) through (x_i, dt) to Q = (x_Q, t), with the force F; (b) the same points joined by springs of constant k = m/(dt)², with the force −F.]

Notice that, since the force is downward, the slope decreases: downward force
means that, at xi , the potential is increasing as a function of x.
The path (in space-time (x, t)) is a solution of Newton’s second law, according to
which the rate of change of the velocity times the particle mass is the force acting
on it:
$$ F = m\,\frac{v_Q - v_P}{dt}. \tag{4.65} $$
Substituting the above expressions for the velocity:
$$ F = \frac{m}{(dt)^2}\,(x_Q - x_i) - \frac{m}{(dt)^2}\,(x_i - x_P). \tag{4.66} $$
At this point we take a step towards abstraction, and forget the space-time picture
for a moment. Equation (4.66) can be thought of as describing the force of a system
of two springs of identical “spring constant” k = m/(dt)2 , the first spring connect-
ing point (xi , dt) with (x P , 0), the second connecting (x Q , 2dt) with (xi , dt), as
sketched in Figure 4.16(b). In order for the system to be in equilibrium, or, in other
words, for the intermediate coordinate to have the value xi (the other two are fixed),
there has to be a force of precisely magnitude F but of opposite sign.
The path given by Newton’s law corresponds to the equilibrium condition of a
mechanical model of two springs in the presence of a potential of opposite sign
to that of V(x). The equilibrium configuration is the one that minimizes the potential
energy of the entire system, springs plus “external” potential −V(x). Since the
potential energy for a spring of spring constant k connecting two points separated
by a distance δ is $k\delta^2/2$, the total potential energy of the system (that we will call
$\tilde{S}$) is given by
$$ \tilde{S} = \frac{m}{2}\left(\frac{x_i - x_P}{dt}\right)^2 + \frac{m}{2}\left(\frac{x_Q - x_i}{dt}\right)^2 - V(x_i). \tag{4.67} $$
Just as in Euler’s treatment, the equilibrium condition of equation (4.66) is
obtained from $d\tilde{S}/dx_i = 0$, noting that F = −dV/dx.
Coming back to the original world line picture, the optimum path in space-time
is the one that minimizes the difference between kinetic and potential energy. This
is Hamilton’s principle, which in this case we obtained using a mechanical analogy
similar to the principle of least action in the sense that there is a correspondence
between the kinetic energy and the potential energy of the fictitious springs. In
other words, the stretchable string is in equilibrium due to two types of forces in
space-time: the external force due to (minus) the real external potential, and the
elastic force of fictitious springs playing the role of the kinetic energy.
For a longer path with N straight segments, each of them traversed by the particle
in a time dt, the velocity at the i-th segment will be $v_i = (x_{i+1} - x_i)/dt$ and the
equivalent potential energy will be given by
$$\tilde{S} = \frac{m v_1^2}{2} - V(x_1) + \frac{m v_2^2}{2} - V(x_2) + \cdots + \frac{m v_N^2}{2}. \tag{4.68}$$
In the continuum limit we will have to minimize the quantity S[x(t)]:
$$S[x(t)] = \int_{x(0)}^{x(t)} dt \left(\frac{m v^2}{2} - V(x)\right), \tag{4.69}$$
which amounts to finding the path x(t) that minimizes the integral S.
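The discrete picture behind equation (4.68) can be checked numerically. The following is a minimal sketch and not part of the original argument: it assumes a uniform gravitational potential V(x) = m g x (chosen only for illustration), weights each term by dt so that the sum approximates the integral in (4.69), and uses hypothetical helper names (`discrete_action`, `stationary_path`). Setting the derivative of the discrete sum with respect to each interior point to zero gives the spring-equilibrium condition, which for this potential is a linear system.

```python
import numpy as np

def discrete_action(x, dt, m, V):
    """Discrete version of S: sum of (m v_i^2 / 2) dt over the segments
    minus V(x_i) dt over the interior points (the endpoints are fixed)."""
    v = np.diff(x) / dt                      # v_i = (x_{i+1} - x_i) / dt
    return 0.5 * m * np.sum(v**2) * dt - np.sum(V(x[1:-1])) * dt

def stationary_path(x_start, x_end, T, N, g):
    """Solve dS/dx_i = 0 for V(x) = m g x (illustrative assumption).
    The condition reads x_{i+1} - 2 x_i + x_{i-1} = -g dt^2, which is the
    discrete form of Newton's second law with F = -m g."""
    dt = T / N
    n = N - 1                                # number of interior points
    A = (np.diag(-2.0 * np.ones(n))
         + np.diag(np.ones(n - 1), 1)
         + np.diag(np.ones(n - 1), -1))
    rhs = np.full(n, -g * dt**2)
    rhs[0] -= x_start                        # fixed endpoints move to the right-hand side
    rhs[-1] -= x_end
    interior = np.linalg.solve(A, rhs)
    return np.concatenate(([x_start], interior, [x_end])), dt

m, g, T, N = 1.0, 9.8, 1.0, 50
x_start, x_end = 0.0, 2.0
path, dt = stationary_path(x_start, x_end, T, N, g)

# Newtonian trajectory through the same endpoints in the same time.
t = np.linspace(0.0, T, N + 1)
newton = x_start + (x_end - x_start) * t / T + 0.5 * g * t * (T - t)

V = lambda x: m * g * x
bump = 0.01 * np.sin(np.pi * t / T)          # perturbation that vanishes at both endpoints
S_path = discrete_action(path, dt, m, V)
S_bumped = discrete_action(path + bump, dt, m, V)

print(np.allclose(path, newton))             # stationary path vs. Newtonian parabola
print(S_bumped > S_path)                     # a nearby path with the same endpoints
```

For this particular potential the discrete sum is a convex function of the interior points, so the stationary path is a true minimum; for other potentials or longer times the stationary path need not be a minimum.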
We can also derive Hamilton’s principle using a slightly more sophisticated
approach while still keeping it elementary. In reference to Figure 6.11(a), Mau-
pertuis’s and Euler’s principle of least action tells us the path a particle of fixed
energy E will choose in going from A to B. Call $V_1$ and $V_2$ the potential energies
in the upper and lower parts of the line CD, and $v_1 = \sqrt{2(E - V_1)/m}$ and
$v_2 = \sqrt{2(E - V_2)/m}$ the corresponding velocities. Now consider paths with different
energies and ask for which of those paths the particle will satisfy Newton’s laws
and spend a fixed amount of time t going from A to B. Following the logic of the
principle of least action, we want to find a function of the paths that will give the
desired one upon minimization. For the special case under consideration, the path
consists of two straight segments, and the function has to be such that, of all paths
that take a time t in going from A to B, the particle chooses the one that satisfies
the “Snell’s law for particles”: mv1 sin θ1 = mv2 sin θ2 .
Call a and b the perpendicular distances of A and B to the interface CD, L the
horizontal distance between A and B, and x the distance CE. Maupertuis’s action
$A = m v_1 \ell_1 + m v_2 \ell_2$, with $\ell_1$ and $\ell_2$ the lengths of the two segments, can be thought of as a function of x and the energy:
$$A(x, E) = m v_1(E)\sqrt{x^2 + a^2} + m v_2(E)\sqrt{(L - x)^2 + b^2}. \tag{4.70}$$
In order to explore whether A(x, E) is the desired function, compute the
variations of A with respect to x and E,
$$dA = (m v_1 \sin\theta_1 - m v_2 \sin\theta_2)\,dx + \frac{\partial A}{\partial E}\,dE. \tag{4.71}$$
It is clear that minimizing A (or equivalently setting dA = 0) does not give us
Snell’s law for particles because of the second term above. However, notice that
$dv_i/dE = 1/(m v_i)$, and
$$\frac{\partial A}{\partial E} = \frac{\sqrt{x^2 + a^2}}{v_1} + \frac{\sqrt{(L - x)^2 + b^2}}{v_2} = t_1 + t_2 = t, \tag{4.72}$$
with $t_1 \equiv \ell_1/v_1$ and $t_2 \equiv \ell_2/v_2$ the times it takes the particle to go from A to E
and from E to B.
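This means that if Et is subtracted from A, the desired quantity is obtained:
S = A − Et. (Notice that d(Et) = t dE since the paths considered last a constant
time.) Therefore,
$$\begin{aligned} S &= (m v_1 \ell_1 - E t_1) + (m v_2 \ell_2 - E t_2) \\ &= (m v_1^2 - E)\,t_1 + (m v_2^2 - E)\,t_2 \\ &= (K_1 - U_1)\,t_1 + (K_2 - U_2)\,t_2, \end{aligned} \tag{4.73}$$
where $K_i = m v_i^2/2$ is the kinetic energy and $U_i = V_i$ the potential energy in each region, so that $m v_i^2 - E = K_i - U_i$. This
is the quantity to be minimized according to Hamilton’s principle.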
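The claim that minimizing S = A − Et at fixed travel time reproduces the “Snell’s law for particles” can be tested numerically. The sketch below is an illustration under stated assumptions, not the authors’ calculation: the numerical values of m, V1, V2, a, b, L and T are arbitrary, the helper names are hypothetical, and the angles are measured from the normal to the interface, so that sin θ1 = x/ℓ1 and sin θ2 = (L − x)/ℓ2. For each crossing point x the energy E is fixed by bisection so that the path takes the time T, and S is then minimized over x by a golden-section search.

```python
import math

# Illustrative values (assumptions, not taken from the text).
m, V1, V2 = 1.0, 0.0, -1.0        # mass and the two constant potentials
a, b, L, T = 1.0, 1.0, 2.0, 2.0   # geometry and the fixed travel time

def speeds(E):
    return math.sqrt(2 * (E - V1) / m), math.sqrt(2 * (E - V2) / m)

def lengths(x):
    return math.hypot(x, a), math.hypot(L - x, b)

def travel_time(x, E):
    (l1, l2), (v1, v2) = lengths(x), speeds(E)
    return l1 / v1 + l2 / v2

def energy_for_time(x):
    """Bisection for the energy at which the two-segment path through x takes time T."""
    lo = max(V1, V2) + 1e-12          # slow limit: the travel time is very large
    hi = lo + 1.0
    while travel_time(x, hi) > T:     # enlarge the bracket until the path is fast enough
        hi = lo + 2 * (hi - lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if travel_time(x, mid) > T:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def S(x):
    """S = A - E t, with A = m v1 l1 + m v2 l2 evaluated at the energy that gives time T."""
    E = energy_for_time(x)
    (l1, l2), (v1, v2) = lengths(x), speeds(E)
    return m * v1 * l1 + m * v2 * l2 - E * T

def golden_section_min(f, lo, hi, tol=1e-9):
    inv_phi = (math.sqrt(5) - 1) / 2
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)

x_min = golden_section_min(S, 0.0, L)
E_min = energy_for_time(x_min)
(l1, l2), (v1, v2) = lengths(x_min), speeds(E_min)
p1 = m * v1 * x_min / l1            # m v1 sin(theta1)
p2 = m * v2 * (L - x_min) / l2      # m v2 sin(theta2)
print(p1, p2)                        # the two values should agree closely
```

The search over x never imposes Snell’s law directly; the agreement of the two tangential momenta at the minimum is the content of the argument leading to equation (4.73).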
5 D’Alembert, Lagrange, and the Statics-Dynamics Analogy
In this chapter we visit mechanics in the Age of Enlightenment. In that period,

Newton’s ideas, which allowed only the study of motion of bodies free in space,
were extended to incorporate constraints in mechanical systems. The key figures
are James Bernoulli, Jean le Rond d’Alembert and Joseph-Louis Lagrange. The
central concept is the principle of virtual work, which establishes the conditions of
static equilibrium and its extension to dynamics.
5.1 The Principle of Virtual Work

The idea of treating a static equilibrium problem using ideas from dynamics goes
back to Aristotle’s text “Mechanical Problems.” His authorship is disputed, and the
text is probably the product of his contemporaries of the Peripatetic School.
In the discussion of the lever – although the word equilibrium is never used – we
read “the ratio of the weight moved to the weight moving it is the inverse ratio of
the distances from the center” (Aristotle, 350 BC/1955, p. 353). This statement is
regarded by many as a precursor to the so-called method of virtual velocities, or virtual
displacements (Capecchi, 2012). In “On Mechanics,” one of his early works,
Galileo (1600/1960) borrows Aristotle’s idea and treats equilibrium on an inclined
plane as an invariance under hypothetical displacements. In the “Discourses,” he
uses notions of statics and dynamics in the same sentence: “when equilibrium (that
is, rest) is to prevail between two moveables, their [overall] speeds or their propen-
sities to motion – that is, the spaces they would pass in the same time – must be
inverse to their weights [gravità]” (Galilei, 1638/1974, p. 173). Since Galileo talks
about the “propensity” to move, and the system is at rest, the velocity refers to
a hypothetical motion in a time different from the time of our universe. Galileo
realized that, for weights on an inclined plane, the determining factor for equilibrium
is their motion away from or “removal from the center of the earth” (Galileo,
1600/1960, p. 177). For the inclined plane with two masses of Figure 5.1, the ratio
of the masses is 2. In order for the system to be at equilibrium, the ratio of the
Figure 5.1 Adapted from Galileo’s “On Mechanics.” Virtual displacements for an
inclined plane. According to Galileo, the system is at equilibrium if the ratio of
their vertical virtual velocities is reciprocal to the ratio of the weights.
vertical velocities (if they were displaced in the same amount of time) is 1/2. In
general, says Galileo, two weights will be in equilibrium on the inclined plane of
Figure 5.1 when the ratio of the forces E and F is equal to the ratio of the lines
CB and AC.
Interestingly, although Galileo is usually portrayed as rejecting Aristotle’s
dynamics, the principle of virtual displacements, the exclusive basis of Galileo’s
science of motion – at least according to historian Pierre Duhem – is of Aristotelian
heritage (Duhem, 1905, p. 260).
The term “virtual” appears for the first time in a letter of John Bernoulli to Pierre
Varignon, dated January 26, 1717. Varignon was able to solve numerous problems
of statics using the parallelogram rule for the decomposition of forces. Bernoulli
used a different approach. He didn’t see a clear way of introducing the reactive
forces in statics and created his “rule of energies” (later called the “principle of
virtual velocities” by Lagrange). For Bernoulli, a virtual velocity is a tendency, or
a propensity, to move, that the acting forces have on the system at equilibrium, and
his proposition is: “In any equilibrium of any forces in any way they are applied and
following any directions, either they interact with each other indirectly or directly,
the sum of the positive energies will be equal to the sum of the negative energies
taken positively” (Varignon, 1735, p. 176). What Bernoulli calls “energies” is what
we call today virtual work. In Figure 5.2 we reproduce Bernoulli’s definition of
virtual velocity. A force F, represented by the line FP, is acting on a point P of
the system in equilibrium. He now calls Pp the (infinitesimal) movement of the
point when it is displaced from equilibrium. Since the point has moved from P to
p, according to Bernoulli the direction of the force at p will be along the line fp,
which in general will not be colinear with FP. He calls the segment Cp the virtual
velocity of the force at P, and the energy is given by F × Cp (where “×” means the
usual product). The energy can be positive or negative depending on the direction
of the force at P, which can be from P to F or from F to P. In contemporary
notation, we denote the point P by the coordinate x and use F(x) for the force F.
Figure 5.2 Virtual work, from Bernoulli’s letter to Varignon (1735).
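For the displacement Pp we use δx and F(x + δx) for the force F at p. In this
notation $Cp = \delta\mathbf{x}\cdot\mathbf{F}(\mathbf{x} + \delta\mathbf{x})/|\mathbf{F}(\mathbf{x} + \delta\mathbf{x})|$. If we now use the fact that the segment
Cp is infinitesimal, keeping the terms to lowest order in δx, we have:
$$\begin{aligned} \text{Bernoulli's energy} &= F \times Cp && (5.1)\\ &= |\mathbf{F}(\mathbf{x})| \times \frac{\delta\mathbf{x}\cdot\mathbf{F}(\mathbf{x} + \delta\mathbf{x})}{|\mathbf{F}(\mathbf{x} + \delta\mathbf{x})|} && (5.2)\\ &\simeq \delta\mathbf{x}\cdot\mathbf{F}(\mathbf{x}). && (5.3) \end{aligned}$$
In this notation, for a system of N points that can undergo N virtual displacements
Bernoulli’s statement reads
$$\mathbf{F}_1\cdot\delta\mathbf{x}_1 + \mathbf{F}_2\cdot\delta\mathbf{x}_2 + \cdots + \mathbf{F}_N\cdot\delta\mathbf{x}_N = 0. \tag{5.4}$$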
The quantities δxi are not arbitrary displacements of the system but those dis-
placements allowed by the constraints. It is crucial that each δxi is infinitesimal
in order for equation (5.4) to express a condition of equilibrium. When the dis-
placements are infinitesimal, we can divide equation (5.4) by an infinitesimal of
time “dt” (a differential of a “virtual time” t of sorts since the system is at rest)
and call Bernoulli’s statement the principle of virtual velocities. Varignon praised
Bernoulli’s idea but considered the rule of energies as a corollary to his parallel-
ogram rule. He was not able to prove that statement in general but showed the
equivalence between the two methods in many specific examples, one of which
we show in Figure 5.3. Three equal weights of magnitude W hang from pulleys
on a horizontal table, and the strings from which they hang are knotted at O. The
forces $\mathbf{W}_A$, $\mathbf{W}_B$, and $\mathbf{W}_C$ acting on O are of equal magnitude but point in different
directions. We want to find the angle between the strings when the system is at
equilibrium. The principle of virtual work for this example can be expressed as
$$(\mathbf{W}_A + \mathbf{W}_B + \mathbf{W}_C)\cdot\delta\mathbf{x} = 0, \tag{5.5}$$
Figure 5.3 Virtual displacements and equilibrium for three equal weights con-
nected by strings and knotted at a point O. Modified from Mach (1960).
where we used the notation δx for an arbitrary infinitesimal displacement within
the plane. If we choose, as in Figure 5.3, the displacement δx perpendicular to $\mathbf{W}_B$,
we have $\mathbf{W}_B\cdot\delta\mathbf{x} = 0$ and equation (5.5) becomes:
$$W\,\delta x\cos\alpha - W\,\delta x\cos\beta = 0, \tag{5.6}$$
implying α = β, or, which is equivalent, ∠AOB = ∠COB. If we repeat the
process by taking δx perpendicular to $\mathbf{W}_A$ and to $\mathbf{W}_C$, we find that the three angles
∠AOB, ∠COB, and ∠AOC are the same and equal to 120°.
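The 120° result can be checked against the virtual-work condition (5.5) directly. The sketch below is only an illustration: the function name `net_virtual_work`, the string directions, and the “skewed” comparison arrangement are assumptions introduced here. It evaluates the total virtual work of three forces of equal magnitude for displacements δx in many directions within the plane.

```python
import math

def net_virtual_work(angles_deg, delta_angle_deg, W=1.0, delta=1e-3):
    """(W_A + W_B + W_C) . delta_x for three forces of equal magnitude W
    pointing along the given directions, and a displacement of size delta."""
    dx = (delta * math.cos(math.radians(delta_angle_deg)),
          delta * math.sin(math.radians(delta_angle_deg)))
    work = 0.0
    for ang in angles_deg:
        fx = W * math.cos(math.radians(ang))
        fy = W * math.sin(math.radians(ang))
        work += fx * dx[0] + fy * dx[1]
    return work

symmetric = (90.0, 210.0, 330.0)   # strings 120 degrees apart
skewed = (90.0, 200.0, 340.0)      # hypothetical arrangement that is not in equilibrium

probe_directions = range(0, 360, 15)
max_symmetric = max(abs(net_virtual_work(symmetric, d)) for d in probe_directions)
max_skewed = max(abs(net_virtual_work(skewed, d)) for d in probe_directions)
print(max_symmetric, max_skewed)
```

For the symmetric arrangement the virtual work vanishes (to rounding error) for every probed direction, whereas the skewed arrangement leaves a nonzero virtual work for displacements along the unbalanced direction.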
The concept of virtual work was used and developed by many other scientists,
including Torricelli, Descartes, and Wallis (see Capecchi, 2012), but it was in the
hands of Lagrange, with his treatment of dynamics as a problem of static equi-
librium based on the work of James Bernoulli and d’Alembert, that the principle
acquired the status of a fundamental principle of mechanics. At a foundational
level, the logical status of the principle of virtual work is debatable. Although the
arguments leading to the principle are plausible, it cannot be rigorously proven
starting from Newton’s laws, and should be regarded as an axiom of mechanics
(Drago, 1993). In his Lectures on Theoretical Physics, Arnold Sommerfeld is elo-
quent: “Far from us to give a general proof of this postulate. Rather we regard it
practically as a definition of a ‘mechanical system’ ” (Sommerfeld, 1952, p. 53).
5.2 Statics Meets Dynamics: Bernoulli’s Calculation of the Center of Oscillation
One of the breakthrough papers in solving the motion of mechanical systems
with restrictions, or constraints, on their motion, is by James Bernoulli (1703).
Bernoulli’s paper, a precursor of Jean le Rond d’Alembert’s formulation of dynam-
ics, and “second only to the Principia itself in influence on the later growth of
Figure 5.4 Impressed, unchanged, and lost motions (or velocities) for a rigid bar
that is free to rotate around C.
the discipline” (Truesdell, 1960b), is based on the idea of unchanged and “lost”
motions (or velocities) during a collision. For example, consider a bar on a plane,
constrained to rotate around a point C, as shown in Figure 5.4.
Imagine that a velocity is impressed on the extreme of the bar in the vertical
direction. This impressed velocity could come from a collision or an impact with
another particle. From the point of view of Newton’s second law, we can think that
an impulsive vertical force F acts on a mass m on the extreme of the (massless) bar
over an infinitesimal time dt. As a result, a momentum mdv = Fdt is transferred
to the mass. The impressed velocity in this case is dv, in the vertical direction.
Using the parallelogram rule, the impressed velocity can be expressed as the sum
of two components, one in the direction of the bar, and the other perpendicular
to the bar. The crucial assumption is that the component in the direction of the
bar is lost in the rigid bar. The component perpendicular to the bar will be the
acquired velocity of the mass at the extreme. In this picture, a lost velocity (or
motion) is one that, if impressed on the bar at rest, will have no effect on the
motion. Bernoulli extends this notion to impressed velocities at different points of
a compound system.
In his paper of 1703, Bernoulli finds the center of oscillation of a compound
pendulum: a pendulum with masses of different magnitudes at different distances
from the center of oscillation, all of them constrained to be part of the same, solid
object. By center of oscillation we mean the following: for a simple pendulum with
a point mass on the extreme point of a bar, the period of oscillation is independent
of the mass, and depends on the acceleration of gravity and on the length of the bar.
Finding the center of oscillation amounts to finding the length of a simple pendulum
that would be isochronous with the compound pendulum. Bernoulli considers
two masses on a plane. For simplicity let us consider the two masses to be along the
same line (see Figure 5.5). The extension to a compound pendulum with the posi-
tions of the masses forming an angle with the center of oscillation is immediate.
Masses $m_1$ and $m_2$ are at distances $AC = \ell_1$ and $BC = \ell_2$ from C. The bar is
Figure 5.5 The center of oscillation X of a bar with two masses is such that,
if an impulsive force (or impressed velocity) g is applied at X , the compound
system will behave as a simple pendulum with a mass at X . The motions (arcs)
AM = X T = BG are all proportional to g. a1 and a2 are the lost motions.
horizontal, and the force of gravity acts impulsively.1 Since the masses acquire
momenta proportional to $m_1 g$ and $m_2 g$ respectively, their impressed velocities will
be equal, and proportional to g.
If the masses, rather than being constrained to be along the same line, were
free to rotate independently around C, a short time after the impulses they would
describe arcs of the same length, proportional to g. Mass 1 would be at position
M, and mass 2 would be at G. Since all of the calculation is based on proportion-
alities, we take the arcs AM = BG = g. In order for the constraint to be satisfied,
two extra motions, a1 and a2 , are required (see Figure 5.5). In their real position
after the impulses, the positions of the masses form the same angle θx with the
horizontal:
$$\frac{g + a_1}{\ell_1} = \frac{g - a_2}{\ell_2} = \theta_x. \tag{5.7}$$
The quantities a1 and a2 are the lost motions of this problem, and originate in
internal, impulsive forces, F1 and F2 , that act perpendicularly to the bar. Since
these forces act in the same infinitesimal time interval dt as the impulsive velocity
g dt, we have, using Newton’s law:2 $a_1 = F_1/m_1$, and $a_2 = F_2/m_2$. Bernoulli
assumes that the lost motions acting on the system at rest will not alter the static
equilibrium; the forces $F_1$ and $F_2$ obey the law of the lever, $F_1\ell_1 = F_2\ell_2$, or which
is equivalent:
$$m_1 a_1 \ell_1 = m_2 a_2 \ell_2. \tag{5.8}$$
1 Bernoulli considers the bar at an angle with the horizontal. The algebra is slightly simpler when the bar is
horizontal and is enough to capture the essence of the method.
2 Again, we are using equalities where we mean proportionality. Strictly speaking, if we call $a_1$ a
displacement, it is a quantity of “second order.” First, the impressed velocity $dv_1$ comes from a force $F_1$
acting in an infinitesimal time dt: $dv_1 = F_1\,dt/m_1$. Second, the displacement $a_1$ is the distance traveled in a
small time that we can call δt: $a_1 = F_1\,\delta t\,dt/m_1$. The factors dt, δt are omitted, and do not enter into the
final calculation, provided one is talking about infinitesimal displacements.
Combining equations (5.8) and (5.7) we obtain $\theta_x = g/\ell_X$, with $\ell_X \equiv CX$ given by
$$\ell_X = \frac{m_1\ell_1^2 + m_2\ell_2^2}{m_1\ell_1 + m_2\ell_2}. \tag{5.9}$$
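The elimination that leads from equations (5.7) and (5.8) to (5.9) can be verified by solving the three linear conditions for the unknowns a1, a2 and θ. This is a small sketch with hypothetical function names and arbitrary sample values; g is set to 1 because, as in the text, only proportionalities matter.

```python
import numpy as np

def center_of_oscillation(m1, l1, m2, l2):
    """Closed form of equation (5.9)."""
    return (m1 * l1**2 + m2 * l2**2) / (m1 * l1 + m2 * l2)

def solve_lost_motions(m1, l1, m2, l2, g=1.0):
    """Solve equations (5.7) and (5.8) for (a1, a2, theta):
         g + a1 = l1 * theta
         g - a2 = l2 * theta
         m1 * l1 * a1 = m2 * l2 * a2
    """
    A = np.array([[1.0, 0.0, -l1],
                  [0.0, -1.0, -l2],
                  [m1 * l1, -m2 * l2, 0.0]])
    rhs = np.array([-g, -g, 0.0])
    a1, a2, theta = np.linalg.solve(A, rhs)
    return a1, a2, theta

# Sample values (assumptions): mass 1 farther from C than mass 2.
m1, l1, m2, l2, g = 1.0, 2.0, 2.0, 1.0, 1.0
a1, a2, theta = solve_lost_motions(m1, l1, m2, l2, g)
l_X = center_of_oscillation(m1, l1, m2, l2)
print(g / theta, l_X)
```

The length g/θ obtained from the linear system coincides with the closed form (5.9). In modern terms the same expression is the moment of inertia about C divided by the total mass times the distance of the center of mass from C.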
The result of equation (5.9) was previously obtained by Huygens (1673) using the
conservation of vis viva. Bernoulli derives the result using equation (5.8), the law
of the lever applied to dynamics. We justified the transition from the usual law
of static equilibrium of the lever ($F_1\ell_1 = F_2\ell_2$) to an equation that expresses an
equilibrium of momenta3 using a Newtonian argument. Bernoulli takes it as a prin-
ciple in itself, and cites as a demonstration Proposition 13 of the second part of the
Treatise on the Percussion or Impact of Bodies by Edme Mariotte (1673). However,
Mariotte’s “demonstration” is by experiment. We regard Mariotte’s ingenious argu-
ment as a precursor of f = dp/dt. He starts with a lever at equilibrium with two
equal weights at equal distances from the fulcrum.4 Then he replaces one of the
weights by a jet of water dripping from a bucket until equilibrium is reached. He is
actually equating the force with the rate of change of the quantity of motion. Then
he repeats the same with two unequal weights and concludes that, at equilibrium,
the ratio of the quantities of motion (mv) is reciprocal to the lengths. Bernoulli,
drawing from Mariotte, is using a “pre-Newtonian” approach; in fact Newton cites
Mariotte when he discusses the axioms of the Principia (Cohen and Whitman,
1999, p. 495). Moreover, Bernoulli’s paper of 1703 is a corrected version of a paper
published in Acta Eruditorum (Bernoulli, 1686), the year Newton published the
Principia. Later, d’Alembert uses the law of the lever applied to dynamics as one
of his basic principles (Vollgraff, 1915), and refers to this extension from statics
to dynamics as “the fundamental principle of M. Bernoulli’s solution” (d’Alembert,
1743, p. 71). In this important paper in the history of dynamics, Bernoulli is using
the extended principle of the lever, the “moment of the momentum” as a “new and
more general method” in kinetics (Truesdell, 1968).
5.3 D’Alembert’s Principle

In his Treatise on Dynamics, d’Alembert (1743) proposes a foundation of mechan-
ics alternative to Newton’s laws of motion. Today we consider Newton’s Principia
the great synthesis of all mechanics developed before him by Galileo, Huygens,
Descartes and Leibniz, but in the eighteenth century that was not the general view.
Scientists were still looking for new principles that could reduce mechanics to a
set of axioms from which the rest could be derived (Hankins, 1967). D’Alembert
rejected the Newtonian idea of force as a cause of the changes of motion. The
3 $m_1 v_1 \ell_1 = m_2 v_2 \ell_2$ if we divide both sides of equation (5.8) by dt.
4 The relevant passage is reprinted on pp. 135–136 of the Lectures on Mechanics by Jouguet (1908).
idea of causes motrices (motive causes) was for him “obscure and metaphysical”
(d’Alembert, 1743, p. xvi). His program was to reduce all dynamics to kinematics
and to describe motion from geometry only, without physical concepts such as
force, which are drawn from experience. He didn’t object to the notion of mass,
which is also physical, but more tolerable than force, since it can be conceived of
as the number of particles in a body. For him, changes of motion of particles origi-
nate in collisions with other particles, which in turn collided previously with other
particles. In this sense, the “cause” of the change in motion is a consequence of a
previous consequence. The exception is gravity (and other forces like magnetism
that were not conceptualized at the time). He even tried to develop an impact the-
ory of gravitation and failed (Hankins, 1970, p. 167). If one treats the motion under
gravity as collisions with an invisible stream of particles, the resulting “force”
will depend on the velocity of the attracted body, and that is not observed. He
insisted that the motion under gravity could be described without forces, and that
the causes of motion are “known only through the effects, and we are completely
ignorant of their real nature” (d’Alembert, 1743, p. x). D’Alembert’s discomfort
with the notion of force was founded as follows. In order for the relation f = ma
(or f = dp/dt) to be a physical law, mass, acceleration, and force have to be
defined independently. If we exclude the law of gravitation, and define force as the
rate of change of momentum, Newton’s second law says that the rate of change
of momentum is equal to the rate of change of momentum – a tautology. Since “forces”
of constraint, for example, are not gravitational, d’Alembert formulates his theory
avoiding the notion of force. It is interesting that Einstein’s theory of gravitation,
developed centuries later, is a theory based on geometry, without forces.
In his Treatise, d’Alembert postulates two laws and two theorems as the foun-
dation of dynamics. The first law is the law of inertia: a body at rest will remain
at rest unless an external influence acts on it. He explicitly states that he takes this
law from Newton. The second law states that a body, once it is put in motion, will
“persevere uniformly in a straight line” unless an external influence acts on it. The first
theorem is the parallelogram rule for the composition of velocities: if two “forces”
(he uses the term puissance and not force) act to change the velocity of a body at A
so that one will make it move uniformly from A to B and the other from A to C, the
body will change its velocity along the diagonal AD of the parallelogram formed
by AB and AC. The second theorem, which he calls the “law of equilibrium,”
and refers to impenetrable bodies, is as follows: “If two bodies whose velocities
are in inverse ratio of the masses, such that one cannot move without shifting the
other, there is equilibrium between these two bodies.” In his article “Equilibre”
in the monumental Encyclopédie he admits that he is using the term equilibrium
that comes from statics (from the Latin aequus and libra or ‘equal balance’) to
mean something from dynamics: if two impenetrable bodies of equal momenta
(mv) collide and their motions are destroyed by the collision, “after the instant
of the collision these two bodies have lost their tendency to move” (d’Alembert,
1755). Today we generally choose Newton’s laws of motion over d’Alembert’s.
We accept mechanics as an experimental science and not a branch of geometry and
think of forces as acting continuously and not by impacts. D’Alembert remains a
major name in the history of mechanics, not because of his laws, but because of his
principle, stated in Part II of the Treatise.
D’Alembert formulates his principle (we reproduce the original statement
in Appendix B) in terms of “motions” and not of forces. Consider the motions
a, b, c, · · · imparted on a system of masses A, B, C, · · · , as shown in Figure 5.6.
By motions d’Alembert means the momentum mv. In fact, since in general the
particles were already moving, we should think of a, b, c, · · · as changes in the
momenta of particles A, B, C, · · · . Due to the constraints of the system, part of
those motions will be lost and part will remain unchanged. D’Alembert uses the
parallelogram rule to decompose the impressed motion into the unchanged motions
ā, b̄, c̄ · · · and the lost motions α, β, γ · · · . The principle states that, if we think of
the system at rest and act upon it with the lost motions only, the system will remain
at rest: the lost motions cancel each other out. In general, these motions will not
be colinear, and we have to use the law of the lever to establish that cancellation.
With this simple prescription, one should be able to find the motions ā, b̄, c̄ · · · of
the system.
The first example presented by d’Alembert is the calculation of the center of
oscillation, where the calculation is essentially the same as the one by Bernoulli,
in the version we presented in Section 5.2. In solving the examples, d’Alembert
applies his principle using a geometrical approach that resembles that in the
Principia and which looks complicated for the modern reader (Fraser, 1985).
Figure 5.6 Impressed, unchanged, and lost motions (or velocities) for a system
with constraints. D’Alembert’s principle states that, if the system at rest is acted
upon by the lost motions only, it will remain at rest.
What is today called d’Alembert’s principle is a combination of the interpreta-

tions that Lagrange and, later, Mach give of the principle in terms of forces rather
than motions. Following Mach, the impressed motions a, b, c · · · are the exter-
nal forces F, which Lagrange calls the “forces of acceleration which act upon
each body” (Lagrange, 1811/1995, p. 186). The unchanged (or actual) motions
ā, b̄, c̄ · · · correspond to the forces Fa needed to create the accelerations actually
acquired by each part of the system. For a particle of mass m, Fa = ma, with a
the acceleration. Finally, the lost motions α, β, γ , · · · are the reactions Fc from the
forces of constraint,5 the forces that limit the possible configurations of the system.
In this language of forces, d’Alembert’s parallelogram rule gives, for each of the
N particles of the system:
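$$\mathbf{F}_i = \mathbf{F}_{a,i} + \mathbf{F}_{c,i} \equiv m_i\mathbf{a}_i + \mathbf{F}_{c,i}, \qquad (i = 1, \ldots, N). \tag{5.10}$$
D’Alembert’s great insight is that the system is in static equilibrium under the
action of forces of constraint (or the lost motions in his language). The condition
of static equilibrium, according to the principle of virtual work, is
$$\mathbf{F}_{c,1}\cdot\delta\mathbf{x}_1 + \mathbf{F}_{c,2}\cdot\delta\mathbf{x}_2 + \cdots + \mathbf{F}_{c,N}\cdot\delta\mathbf{x}_N = 0, \tag{5.11}$$
which, using equation (5.10), becomes:
$$(\mathbf{F}_1 - m_1\mathbf{a}_1)\cdot\delta\mathbf{x}_1 + (\mathbf{F}_2 - m_2\mathbf{a}_2)\cdot\delta\mathbf{x}_2 + \cdots + (\mathbf{F}_N - m_N\mathbf{a}_N)\cdot\delta\mathbf{x}_N = 0. \tag{5.12}$$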
Equation (5.12) is d’Alembert’s principle expressed as a principle of virtual work,

and is the foundation of Lagrange’s Mécanique Analytique. A wedding between
statics and dynamics has occurred: the system’s dynamics is described as the static
equilibrium under the action of two types of forces: the external forces $\mathbf{F}_i$ and
the inertial forces $-m_i\mathbf{a}_i$. If there are no constraints on the system, the virtual displacements $\delta\mathbf{x}_i$ are
independent of each other. In this case, in order for equation (5.12) to be valid,
each term must be zero, and d’Alembert’s principle gives Newton’s second law:
$\mathbf{F}_i = m_i\mathbf{a}_i$.
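When constraints are present, the displacements are no longer independent and equation (5.12) must be used as a whole. As an illustration that is not part of the original text, consider an Atwood machine (our choice of example): two masses $m_1$ and $m_2$ joined by an inextensible string over a massless, frictionless pulley, with coordinates $x_1$ and $x_2$ measured downward and gravity as the only external force. A sketch of the calculation:

```latex
\begin{aligned}
&\text{Constraint: } x_1 + x_2 = \text{const}
  \;\Rightarrow\; \delta x_2 = -\delta x_1, \quad a_2 = -a_1 . \\
&\text{Equation (5.12): } (m_1 g - m_1 a_1)\,\delta x_1 + (m_2 g - m_2 a_2)\,\delta x_2 = 0 \\
&\Rightarrow\; \bigl[(m_1 - m_2)\,g - (m_1 + m_2)\,a_1\bigr]\,\delta x_1 = 0 \\
&\Rightarrow\; a_1 = \frac{m_1 - m_2}{m_1 + m_2}\,g .
\end{aligned}
```

The tension in the string, which is the force of constraint here, never appears in the calculation.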
In the application of d’Alembert’s principle, it is important to notice the differ-
ence between virtual and real displacements. First, a real displacement can be finite
or infinitesimal, whereas we usually talk of a virtual displacement as an infinites-
imally small quantity. When the displacements are infinitesimal, the principle of
virtual work and the principle of virtual velocities are equivalent. Second, since we
are dealing with a static equilibrium problem, the virtual displacement is a displace-
ment consistent with the constraints of the system at a given time, that is, ignoring
the velocities of the particles at that instant. Virtual displacements δx at different
times of the dynamical history of a system are, in principle, uncorrelated with each
5 The forces of constraint acting on the system, from Newton’s third law, are just $-\mathbf{F}_c$.
Figure 5.7 Virtual (δx) versus real (dx) displacements for a particle constrained
to move on a circle, the radius of which, r(t), varies according to an externally
imposed function of t. The dashed line shows a possible path of the particle.
other. On the other hand, the real displacements dx, being tangent to a real path
x(t) are not independent of each other at different times. A third difference arises
regarding the work done by the forces of constraint. Equation (5.11) can be read
as saying that the forces of constraint do no work under virtual displacements. In
order to illustrate this last point, consider the example of Figure 5.7: the motion of
a particle constrained to move on a circle whose radius r (t) is some given function
of t and there is no external force. At a given time, the virtual displacement δx is
in the direction tangent to the circle. Since the force of constraint, Fc , is along the
radial direction we have: Fc · δx = 0. On the other hand, the real velocity v in gen-
eral can have a radial component and so will the real displacement dx = vdt. This
means that the force of constraint can in fact do work under real displacements, but
not under virtual displacements.
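The distinction can be made concrete with a few lines of arithmetic. The following sketch uses assumed sample values and a hypothetical function name; the size of the virtual displacement is arbitrary, and only its direction (tangent to the circle frozen at that instant) matters.

```python
import math

def displacement_work(r, r_dot, phi_dot, phi, dt=1e-3, Fc=1.0):
    """Work of a radial constraint force of magnitude Fc under a virtual and a
    real displacement, for a particle at polar angle phi on a circle of radius r
    that is growing at the rate r_dot."""
    radial = (math.cos(phi), math.sin(phi))
    tangent = (-math.sin(phi), math.cos(phi))
    force = (Fc * radial[0], Fc * radial[1])
    # Virtual displacement: along the tangent of the circle frozen at this instant.
    delta_x = (r * phi_dot * dt * tangent[0], r * phi_dot * dt * tangent[1])
    # Real displacement dx = v dt, with v = r_dot * radial + r * phi_dot * tangent.
    dx = ((r_dot * radial[0] + r * phi_dot * tangent[0]) * dt,
          (r_dot * radial[1] + r * phi_dot * tangent[1]) * dt)
    virtual_work = force[0] * delta_x[0] + force[1] * delta_x[1]
    real_work = force[0] * dx[0] + force[1] * dx[1]
    return virtual_work, real_work

virtual_work, real_work = displacement_work(r=2.0, r_dot=0.5, phi_dot=1.0, phi=0.7)
print(virtual_work, real_work)
```

The virtual work vanishes up to rounding error, while the real work equals Fc times the radial displacement r_dot·dt, which is nonzero whenever the radius changes with time.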
The connection between static equilibrium and dynamics appears also in Euler’s
work, where he sets out to clarify what is really meant by the “action” to be
minimized (Euler, 1748a,b). Euler considers the integral of the force over the
displacement:
$$\Phi = \int \mathbf{F}\cdot d\mathbf{x}, \tag{5.13}$$
which he calls the “action of the forces” (Euler, 1748b) or “effort of the
forces” (Euler, 1752). Later Lagrange will call this quantity the “potential.” In equilibrium,
this quantity is a minimum. Euler states that it is “natural to maintain that
the principle of equilibrium should also hold for the motion of bodies acted upon
by similar forces.” Since, he says, “the intent of Nature is to economize the total
effort as much as possible,” then, “if dt denotes the element of time, the integral
$\int \Phi\,dt$ must be a minimum. Thus if in the state of equilibrium, the quantity $\Phi$ is
a minimum, the same laws of nature seem to require that for motion, the integral
$\int \Phi\,dt$ should also be a minimum” (Euler, 1752). He then uses the vis viva theorem:6
$\frac{1}{2}Mv^2 = C - \Phi$ to obtain that $2Ct - \int Mv^2\,dt$ should be a minimum. The
term Ct “does not enter into the consideration of the maximum or minimum” and
$\int Mv^2\,dt = \int Mv\,ds$, Maupertuis’s action, should be a minimum.7
5.4 Lagrange’s Dynamics

In one of his early works, almost two decades before the publication of his master-
piece Mécanique analytique, Lagrange applied his own generalization of Euler’s
calculus of variation to the principle of least action (Lagrange, 1760/1761). He fol-
lows Euler (see Section 4.3) and starts by considering Maupertuis’s action extended
to the case of many particles of masses $M$, $M'$, $M''$, etc., and stating that the
formula
$$M\int v\,ds + M'\int v'\,ds' + M''\int v''\,ds'' + \cdots, \tag{5.14}$$
should assume a maximum or a minimum. As we will discuss in some detail in

Section 6.8, while the formula cannot assume a maximum, it can assume a mini-
mum or a saddle point for the physical path. In equation (5.14), s is the distance
traveled by each particle, v = ds/dt is the velocity, and the assumption is that the
vis viva theorem holds for each “point” of the path of this multi-particle system:
$$\frac{1}{2}Mv^2 + \frac{1}{2}M'v'^2 + \frac{1}{2}M''v''^2 + \cdots + V = E = \text{Constant}. \tag{5.15}$$
For the variational calculus, Lagrange introduced the symbol “δ,” to be distin-
guished from “d.” For a function f (x) of the independent variable x, the quantity
$df$ symbolizes the variation of f upon changes of x: $df(x) = f(x + dx) - f(x)$.
On the other hand, $\delta f$ represents a change when the functional form of f changes
infinitesimally: $f(x) \to f(x) + \epsilon\,g(x)$, so that $\delta f = \epsilon\,g(x)$, with $\epsilon$ an infinitesimally small positive number
and g(x) an arbitrary function. For simplicity, in order to present the essentials of Lagrange’s calculation,
let us consider the variation of the Maupertuis action for one particle. Lagrange
calculates $\delta\int v\,ds$, where the integral is between two fixed points in space, then
compares the values of the integral evaluated on two nearby paths connecting those
fixed points. He then equates the variation to zero:
$$\int M\,(\delta v\,ds + v\,\delta ds) = 0. \tag{5.16}$$
6 He does not write the factor $\frac{1}{2}$ in the kinetic energy, but that does not affect the result.
7 Evidently Euler is referring here to an extreme, since the minimum of $\Phi$ is the maximum of $-\Phi$.
A delicate point that emerges in the evaluation of equation (5.16) is the choice
of independent variable for the paths one wishes to compare. Choosing time as
the independent variable is problematic: since we are comparing paths of the same
energy, the elapsed times will be different for different paths, and time will have to
be varied. Since Lagrange does not vary time, his derivation was re-analyzed criti-
cally in the nineteenth century by several researchers including Rodrigues (1816),
Mayer (1877), and notably by Jacobi (1884). The main point is that, for paths of
the same energy, if one uses time as the variable, the changes dδx and δdx are not
equal. One can use the “operators” “$d\delta$” or “$\delta d$” interchangeably when comparing
paths that take the same time in going from the initial to the final point, which is
not the case for $\int v\, ds$. As we anticipated in Section 4.5, the path connecting two
points at a given time is the one that minimizes the quantity
$$\int dt\, (T - V), \tag{5.17}$$
which we call today “Hamilton’s principle.” It is interesting that Rodrigues (1816),
many years before Hamilton, found equation (5.17) using the method of multipli-
ers developed by Lagrange, extending Maupertuis’s principle to paths of varying
energy and fixed time.
As pointed out by Jacobi (1884), Lagrange’s calculation is simpler to understand
if one first eliminates time using the vis viva equation:⁸
$$\int v\, ds = \int \sqrt{\frac{2}{M}(E - V)}\; \sqrt{dx^2 + dy^2 + dz^2}. \tag{5.18}$$
8 The remark appears in the sixth of a series of lectures from 1842 to 1843 published posthumously in 1884.
In order to evaluate $\int v\, ds$ as a purely geometrical integral, Jacobi chooses one of
the variables $x$, $y$, or $z$ as independent. The integral, in this one-particle example,
becomes:
$$\int v\, ds = \int_{x_i}^{x_f} \sqrt{\frac{2}{M}(E - V)}\; \sqrt{1 + y'^2 + z'^2}\; dx, \tag{5.19}$$
with $y' = dy/dx$, $z' = dz/dx$, and $x_i$, $x_f$ the $x$ coordinates of the initial and final
points. An objection to this approach is the implicit assumption that the functions
$y(x)$ and $z(x)$ are single valued, which in general may not be the case. Let us
introduce a minor modification to Jacobi’s treatment and represent the path in terms of
a parameter $\lambda$ that runs from zero to one: $(x, y, z) = \mathbf{x} = \mathbf{x}(\lambda)$, with
$\mathbf{x}_i = \mathbf{x}(0)$ and $\mathbf{x}_f = \mathbf{x}(1)$. We can now write Maupertuis’s action as:
$$\int v\, ds = \int_0^1 \sqrt{\frac{2}{M}\bigl(E - V(\mathbf{x})\bigr)}\; \sqrt{x'^2 + y'^2 + z'^2}\; d\lambda \tag{5.20}$$
$$\equiv \int_0^1 \sqrt{\frac{2}{M}\bigl(E - V(\mathbf{x})\bigr)}\; |\mathbf{x}'|\; d\lambda, \tag{5.21}$$
with $\mathbf{x}' = d\mathbf{x}/d\lambda$. The integral of equation (5.21) is now expressed in purely
geometrical terms, without reference to time. Now consider a nearby path
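$$\mathbf{x}_1(\lambda) = \mathbf{x}(\lambda) + \delta\mathbf{x} \equiv \mathbf{x}(\lambda) + \boldsymbol{\delta}(\lambda), \tag{5.22}$$
with $\boldsymbol{\delta}(\lambda) = (\delta_x, \delta_y, \delta_z)$ a continuous function of the variable $\lambda$
satisfying⁹ $\boldsymbol{\delta}(0) = \boldsymbol{\delta}(1) = 0$: the extremes of the path are fixed. The points
$\mathbf{x}(\lambda)$ and $\mathbf{x}_1(\lambda)$ of each path correspond to the same value of the parameter
$\lambda$ but to different elapsed times from the initial position. The Maupertuis action for
the path $\mathbf{x}_1(\lambda)$ is:
$$\int_0^1 d\lambda\, \sqrt{\frac{2}{M}\bigl(E - V(\mathbf{x} + \boldsymbol{\delta})\bigr)}\; |\mathbf{x}' + \boldsymbol{\delta}'|. \tag{5.23}$$
Expanding to lowest order¹⁰ in the variations $\boldsymbol{\delta}$ and $\boldsymbol{\delta}'$, we have for the difference
between the integrals of equations (5.23) and (5.21):
$$\delta \int v\, ds = \int_0^1 d\lambda \left[ \frac{-\nabla V}{M v} \cdot \boldsymbol{\delta}\, |\mathbf{x}'| + v\, \frac{\mathbf{x}' \cdot \boldsymbol{\delta}'}{|\mathbf{x}'|} \right] = 0. \tag{5.24}$$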
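Since the vector $\mathbf{x}'/|\mathbf{x}'|$ is of unit length and along the tangent to the path, we can
write:
$$\delta \int v\, ds = \int_0^1 d\lambda \left[ \frac{\mathbf{F}}{M v} \cdot \boldsymbol{\delta}\, |\mathbf{x}'| + \mathbf{v} \cdot \boldsymbol{\delta}' \right], \tag{5.25}$$
where $\mathbf{F} = -\nabla V$ is the external force acting on the particle, originating from the
potential $V$, and $\mathbf{v} = v\, \mathbf{x}'/|\mathbf{x}'|$ is the velocity vector. Now we can write the identity
$$\frac{d}{d\lambda} (\mathbf{v} \cdot \boldsymbol{\delta}) = \frac{d\mathbf{v}}{d\lambda} \cdot \boldsymbol{\delta} + \mathbf{v} \cdot \boldsymbol{\delta}', \tag{5.26}$$
and use the fact that $\boldsymbol{\delta}$ vanishes at the extremes¹¹ of the integral to write
$$\delta \int v\, ds = \int_0^1 d\lambda \left[ \frac{|\mathbf{x}'|}{M v}\, \mathbf{F} - \frac{d\mathbf{v}}{d\lambda} \right] \cdot \boldsymbol{\delta}, \tag{5.27}$$
which, given that the variations $\boldsymbol{\delta}$ are arbitrary, implies
$$\left[ \frac{|\mathbf{x}'|}{M v}\, \mathbf{F} - \frac{d\mathbf{v}}{d\lambda} \right] \cdot \boldsymbol{\delta} = 0. \tag{5.28}$$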
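9 The function $\boldsymbol{\delta}(\lambda)$ is to be regarded as of infinitesimal magnitude. We can also write it as
$\boldsymbol{\delta}(\lambda) = \epsilon\, \boldsymbol{\eta}(\lambda)$, with $\epsilon$ infinitesimally small.
10 Recall that both $\boldsymbol{\delta}$ and $\boldsymbol{\delta}'$ are infinitesimal if one thinks of both as proportional to some $\epsilon$.
11 Notice that, had we chosen time to parameterize the paths $\mathbf{x}(t)$ and $\mathbf{x}_1(t) = \mathbf{x}(t) + \boldsymbol{\delta}(t)$, this last
step would not be licit. The total elapsed times $T$ and $T_1$ are different for the different paths, and $\boldsymbol{\delta}(T) \neq 0$.
If we now use $\mathbf{x}' = d\mathbf{x}/d\lambda$, $|\mathbf{x}'| = ds/d\lambda$, and $ds/v = dt$, we can write equation
(5.28) as
$$\left[ \mathbf{F} - M \frac{d\mathbf{v}}{dt} \right] \cdot \delta\mathbf{x} = 0, \tag{5.29}$$
where we went back to the notation $\boldsymbol{\delta} = \delta\mathbf{x}$. Equation (5.29) is, in contemporary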
notation, the result Lagrange obtained in his 1760/1761 paper. If the variations of
the different components of δ along the x, y, z axes are independent, we obtain
Newton’s law: F = Mdv/dt. On the other hand, if the particle is subject to geo-
metrical constraints – for example, if it is restricted to move on a certain surface –
then the components of the possible variations of δx are not independent of each
other, and equation (5.29) becomes d’Alembert’s principle. Lagrange appears to
have noticed that his derivation corresponded to d’Alembert’s principle (or the
principle of virtual velocities), and therefore the principle of least action would
become a result derivable from a more fundamental principle.
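The geometrical form of Maupertuis’s action in equation (5.19) lends itself to a quick
numerical check. The sketch below is only an illustration under assumed values that do
not come from the text: $M = 1$, a uniform gravitational potential $V = g\,y$ with $g = 1$,
energy $E = 1$, and end points $(0, 0)$ and $(1, 0)$. It evaluates the integral on the
parabolic trajectory of that energy and on a straight line, each perturbed by
$\epsilon \sin(\pi x)$, and estimates the first variation by a central difference.

```python
import math

G = 1.0                # gravitational acceleration (assumed units, M = 1)
E = 1.0                # total energy, so the launch speed satisfies v0**2 = 2*E
THETA = math.pi / 12   # launch angle for which the range is exactly 1

def y_true(x):
    """Parabolic trajectory of energy E from (0, 0) to (1, 0)."""
    return x * math.tan(THETA) - G * x**2 / (4.0 * E * math.cos(THETA)**2)

def dy_true(x):
    """Slope dy/dx of the parabolic trajectory."""
    return math.tan(THETA) - G * x / (2.0 * E * math.cos(THETA)**2)

def jacobi_action(eps, path, dpath, n=2000):
    """Midpoint-rule estimate of eq. (5.19) for the path y(x) + eps*sin(pi*x)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        y = path(x) + eps * math.sin(math.pi * x)
        dy = dpath(x) + eps * math.pi * math.cos(math.pi * x)
        speed = math.sqrt(2.0 * (E - G * y))     # v from the vis viva theorem
        total += speed * math.sqrt(1.0 + dy * dy) * h
    return total

def first_variation(path, dpath, eps=1e-4):
    """Central-difference estimate of dS/d(eps) at eps = 0."""
    return (jacobi_action(eps, path, dpath)
            - jacobi_action(-eps, path, dpath)) / (2.0 * eps)

dS_true = first_variation(y_true, dy_true)                  # physical path
dS_line = first_variation(lambda x: 0.0, lambda x: 0.0)     # straight line
print(dS_true, dS_line)
```

Under these assumptions the first variation should come out close to zero for the
parabola and close to $-\sqrt{2}/\pi \approx -0.45$ for the straight line, which does not
obey the equations of motion.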
In a prize essay on the libration of the moon, Lagrange (1764) formulates the
dynamic equations of motion using a “new principle of mechanics:” the principle
of virtual velocities. From then on, Lagrange shifts to the principle of virtual veloc-
ities, probably because of his rejection of the teleological speculations associated
with least action (Fraser, 1983).
5.4.1 Lagrange’s “Scientific Poem”

One of the great achievements of the Mécanique analytique is Lagrange’s rewriting
of d’Alembert’s principle – equation (5.29) – in arbitrary coordinates. Since time
is not varied, Lagrange writes the acceleration term $\frac{d\mathbf{v}}{dt} \cdot \delta\mathbf{x}$ as
$d^2x\, \delta x + d^2y\, \delta y + d^2z\, \delta z$, eliminating the denominator $dt^2$. Next he expresses the
variables $x$, $y$, $z$ (we will consider first the case of only one particle) in terms of
arbitrary coordinates $\xi$, $\psi$, $\phi$, so that
$$\delta x = A\, \delta\xi + B\, \delta\psi + C\, \delta\phi \tag{5.30a}$$
$$\delta y = A'\, \delta\xi + B'\, \delta\psi + C'\, \delta\phi \tag{5.30b}$$
$$\delta z = A''\, \delta\xi + B''\, \delta\psi + C''\, \delta\phi, \tag{5.30c}$$
and the same expressions for $dx$, $dy$, $dz$ in terms of $d\xi$, $d\psi$, $d\phi$. Lagrange does
not use the language of partial derivatives, but we identify
$A = \partial x/\partial\xi$, $B = \partial x/\partial\psi$, etc. Using equations (5.30) in the expression of the
acceleration we obtain:

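$$\begin{aligned} d^2x\, \delta x + d^2y\, \delta y + d^2z\, \delta z &= \left( A\, d^2x + A'\, d^2y + A''\, d^2z \right) \delta\xi \\ &\quad + \left( B\, d^2x + B'\, d^2y + B''\, d^2z \right) \delta\psi \\ &\quad + \left( C\, d^2x + C'\, d^2y + C''\, d^2z \right) \delta\phi. \end{aligned} \tag{5.31}$$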
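Lagrange’s purpose is to write the second variations $d^2x$ (the variation of the
variation) in terms of first variations $dx$. Since $dx = A\, d\xi + B\, d\psi + C\, d\phi$, note that
$$\frac{\partial}{\partial (d\xi)} (dx)^2 = 2A\, dx, \tag{5.32}$$
from which we have:
$$\frac{1}{2}\, d \left[ \frac{\partial}{\partial (d\xi)} (dx)^2 \right] = dA\, dx + A\, d^2x. \tag{5.33}$$
If we now use¹² $dA = d(\partial x/\partial\xi) = (\partial/\partial\xi)\, dx$, so that
$dA\, dx = \frac{1}{2} \frac{\partial}{\partial\xi} (dx)^2$, equation (5.33) becomes:
$$A\, d^2x = \frac{1}{2}\, d \left[ \frac{\partial}{\partial (d\xi)} (dx)^2 \right] - \frac{1}{2} \frac{\partial}{\partial\xi} (dx)^2. \tag{5.34}$$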
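Repeating the same procedure for $dy$ and $dz$ and substituting in equation (5.31),
we obtain:
$$\begin{aligned} d^2x\, \delta x + d^2y\, \delta y + d^2z\, \delta z &= \left[ d\, \frac{\partial \varphi}{\partial (d\xi)} - \frac{\partial \varphi}{\partial\xi} \right] \delta\xi \\ &\quad + \left[ d\, \frac{\partial \varphi}{\partial (d\psi)} - \frac{\partial \varphi}{\partial\psi} \right] \delta\psi \\ &\quad + \left[ d\, \frac{\partial \varphi}{\partial (d\phi)} - \frac{\partial \varphi}{\partial\phi} \right] \delta\phi, \end{aligned} \tag{5.35}$$
with
$$\varphi = \frac{1}{2} \left( dx^2 + dy^2 + dz^2 \right). \tag{5.36}$$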
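The transformation of the force term of equation (5.29) is much simpler. Using
equations (5.30) and the fact that the force is derived from a potential, $\mathbf{F} = -\nabla V$,
we obtain:
$$\frac{\partial V}{\partial x} \delta x + \frac{\partial V}{\partial y} \delta y + \frac{\partial V}{\partial z} \delta z = \frac{\partial V}{\partial \xi} \delta\xi + \frac{\partial V}{\partial \psi} \delta\psi + \frac{\partial V}{\partial \phi} \delta\phi. \tag{5.37}$$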
12 There is no problem with exchanging $d$ with $\partial/\partial\xi$ in this case because they are geometrical derivatives
performed, according to d’Alembert’s principle, at a fixed time. The relation is still valid in the case when the
relation between the variables $x$, $y$, $z$, and $\xi$, $\psi$, $\phi$ depends on time, provided time is regarded as a constant,
as dictated by d’Alembert’s principle.
In order to write Lagrange’s equations in the form used in contemporary texts,
on both sides of equation (5.35) we multiply by $M$, divide by $dt^2$ and write the
operators of the form $d\, \frac{\partial}{\partial (d\xi)}$ as $\frac{d}{dt} \frac{\partial}{\partial \dot\xi}$ (with $\dot\xi = d\xi/dt$), all valid
operations as long as $t$ is an independent variable. With these substitutions,
equation (5.29) becomes
$$\left[ \frac{\partial L}{\partial \xi} - \frac{d}{dt} \frac{\partial T}{\partial \dot\xi} \right] \delta\xi + \left[ \frac{\partial L}{\partial \psi} - \frac{d}{dt} \frac{\partial T}{\partial \dot\psi} \right] \delta\psi + \left[ \frac{\partial L}{\partial \phi} - \frac{d}{dt} \frac{\partial T}{\partial \dot\phi} \right] \delta\phi = 0, \tag{5.38}$$
with $L = T - V$, and $T$ the kinetic energy $\frac{1}{2} M \dot{\mathbf{x}}^2$ expressed in terms of the new
variables $\xi$, $\psi$, $\phi$. The procedure we indicated can be extended without modifications
to many particles, where the kinetic energy is $T = \frac{1}{2} (M \dot{\mathbf{x}}_1^2 + M \dot{\mathbf{x}}_2^2 + \cdots)$ and
equation (5.38) will contain $3N$ terms, three terms per particle. When the system
has constraints, the power of Lagrange’s casting of d’Alembert’s principle in
arbitrary coordinates is made evident. For a constrained system, the coordinates
$\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_N$ are not independent; the configuration of the system can be
specified in terms of a smaller number of variables $q_1, q_2, \cdots, q_r$ with $r < 3N$. Since
the virtual displacements $\delta q_i$ of these new variables are independent, each term of
equation (5.38) is zero. If, in addition, the potential energy is independent of the
velocities $\dot{\mathbf{x}}_i$ (and therefore independent of $\dot q_i$), we obtain:
$$\frac{\partial L}{\partial q_i} - \frac{d}{dt} \frac{\partial L}{\partial \dot q_i} = 0. \tag{5.39}$$
Equation (5.39) is of particular elegance; it expresses the motion in terms of

a single function L (the Lagrangian) through r equations identical in structure
(they are “covariant”), one for each variable. Lagrange points out that equation
(5.39) “can be found more simply and generally by the principles of the Method
of Variations” (Lagrange, 1811/1995, p. 224), his own invention. Consider the
integral:13
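$$I = \int_{t_a}^{t_b} dt\, f\bigl( x(t), \dot x(t), z, t \bigr). \tag{5.40}$$
The variation of $I$ is
$$\delta I = \int_{t_a}^{t_b} dt \left[ \frac{\partial f}{\partial x} \delta x(t) + \frac{\partial f}{\partial \dot x} \delta \dot x(t) \right] \tag{5.41}$$
$$\equiv \int_{t_a}^{t_b} dt \left[ \frac{\partial f}{\partial x} \delta x(t) + \frac{d}{dt} \left( \frac{\partial f}{\partial \dot x} \delta x(t) \right) - \left( \frac{d}{dt} \frac{\partial f}{\partial \dot x} \right) \delta x(t) \right]. \tag{5.42}$$
The middle term in equation (5.42) vanishes upon integration if the variation $\delta x$
is zero at the extremes of the integral. Since the variations are arbitrary, we obtain
$\frac{\partial f}{\partial x} - \frac{d}{dt} \frac{\partial f}{\partial \dot x} = 0$. In other words, the structure of equation (5.39) can be regarded
as resulting not from a dynamical principle, but from the minimization of $\int L\, dt$,
with $L$ an arbitrary function of $q_i$, $\dot q_i$ and time.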
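As a brief illustration of how equation (5.39) is used in practice, consider planar motion
in a central potential $V(r)$ described by the polar coordinates $q_1 = r$ and $q_2 = \theta$.
This example is an assumption chosen for its simplicity and is not taken from Lagrange’s
text; it is only meant to show the mechanics of the recipe.

```latex
% Worked example (illustrative): central potential V(r) in polar coordinates.
\begin{align*}
L &= \tfrac{1}{2} M \left( \dot r^2 + r^2 \dot\theta^2 \right) - V(r), \\
\frac{\partial L}{\partial r} - \frac{d}{dt} \frac{\partial L}{\partial \dot r}
  &= M r \dot\theta^2 - V'(r) - M \ddot r = 0, \\
\frac{\partial L}{\partial \theta} - \frac{d}{dt} \frac{\partial L}{\partial \dot\theta}
  &= - \frac{d}{dt} \left( M r^2 \dot\theta \right) = 0 .
\end{align*}
```

The first equation is the radial component of Newton’s law, including the centrifugal
term, and the second states that $M r^2 \dot\theta$ is constant; both follow from the single
function $L$ without any vector decomposition of the forces.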
William Rowan Hamilton, whose work was inspired by that of Lagrange, had
words of high praise for the Mécanique analytique: “[Lagrange showed] that the
most varied consequences respecting motions of systems of bodies may be derived
from one radical formula; the beauty of the method so suiting the dignity of the
results, as to make of his great work a kind of scientific poem” (Hamilton, 1834a).
Presumably Hamilton, a poet himself, is referring to equation (5.39) as the “one
radical formula.”
13 Lagrange considers $f$ to be a function of more variables, $y$, $z$ and higher order derivatives $\ddot x$, $\ddot y$, etc. but in
order to illustrate that the implied structure is that of equation (5.39) it is enough to consider the integral of
equation (5.40).
5.4.2 Symmetries
Symmetries play a key role in both classical and quantum physics. In the con-
text of Lagrangian mechanics, a symmetry is some mathematical operation or
transformation that leaves the Lagrangian or the equations of motion unchanged.
For example, right after deriving his scientific poem, Lagrange remarks that if
one adds to $L$ a function that is a total derivative, $dA(q)/dt$, equations (5.39)
remain unaltered. Consider for simplicity of notation the case of only one variable:
$dA(q)/dt = A'(q)\dot q$, and:
$$L' = L + A'(q)\dot q. \tag{5.43}$$
The added term is innocuous since:
$$\frac{\partial}{\partial q} \left[ A'(q)\dot q \right] = A''(q)\dot q, \tag{5.44a}$$
$$\frac{d}{dt} \frac{\partial}{\partial \dot q} \left[ A'(q)\dot q \right] = \frac{d}{dt} A'(q) = A''(q)\dot q, \tag{5.44b}$$
implying that both $L$ and $L'$ give rise to the same equations of motion, just as
adding a constant to a function $f(x)$ does not alter its derivative, and therefore
leaves the position of the minimum of $f$ unchanged. This freedom of choice of $L$
was later baptized as “gauge symmetry” (Weyl, 1919) and plays a crucial role in
modern physics.
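One way to see why the added term is innocuous is that its contribution to the action
depends only on the end points of the path. The following sketch checks this numerically;
the function $A(q) = \sin q + q^3$ and the two test paths are arbitrary choices made for
illustration and do not appear in the text.

```python
import math

def A(q):
    """Illustrative gauge function A(q); any smooth function would do."""
    return math.sin(q) + q**3

def dA(q):
    """Derivative A'(q) of the gauge function."""
    return math.cos(q) + 3.0 * q**2

def gauge_action(q, qdot, ta=0.0, tb=1.0, n=20000):
    """Midpoint-rule integral of A'(q) * qdot dt along the path q(t)."""
    h = (tb - ta) / n
    total = 0.0
    for i in range(n):
        t = ta + (i + 0.5) * h
        total += dA(q(t)) * qdot(t) * h
    return total

# Two different paths with the same end points q(0) = 0 and q(1) = 1.
s1 = gauge_action(lambda t: t, lambda t: 1.0)
s2 = gauge_action(lambda t: t**2 + math.sin(math.pi * t),
                  lambda t: 2.0 * t + math.pi * math.cos(math.pi * t))
expected = A(1.0) - A(0.0)
print(s1, s2, expected)
```

Both integrals should agree with $A(q_b) - A(q_a)$ to within the quadrature error, so the
extra term shifts the action of every path by the same constant and cannot change which
path makes it stationary.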
Another simple mathematical signature of a symmetry is a cyclic variable. This
means that the Lagrangian is independent of that variable. For example, the
variable could be a direction in space, say the $x$-axis. The Lagrange equation for that
variable becomes:
$$\frac{d}{dt} \frac{\partial L}{\partial \dot x} = 0, \tag{5.45}$$
and hence the quantity $p_x = \partial L/\partial \dot x$ (the particle’s momentum in the $x$ direction) is
conserved along the motion.
In a famous paper, the mathematician Emmy Noether (1918) generalized the
concept underlying the cyclic variables and proved that symmetry implies
conservation. In our simple example, the cyclic nature of the variable means that
$\partial L/\partial x = 0$, which is equivalent to $L(x + dx) = L(x)$: we have invariance with
respect to translation in the $x$ direction, or invariance with respect to the group
of translations in that direction. (This is a group in the sense that one can compose
two translations and obtain another translation or take the inverse of a translation by
translating in the negative direction of the original translation.) Noether generalizes
this result and considers the invariance of $L$ upon a general change in coordinates,
from $\mathbf{q}$ to $\mathbf{q}'$. Symmetry of $L$ under this change of coordinates means that $L$
preserves its shape, or functional form:
$$L(\mathbf{q}', \dot{\mathbf{q}}') = L(\mathbf{q}, \dot{\mathbf{q}}). \tag{5.46}$$
Noether considers continuous symmetries (like rotations and translations) where
the variations of the coordinates are infinitesimal, $\mathbf{q}' = \mathbf{q} + \epsilon \mathbf{f}$, with $\mathbf{f}$ a function
that, in principle, could depend on $\mathbf{q}$, $\dot{\mathbf{q}}$ and time. She also allows for the change
in the independent variable $t$; we will return to this. Substituting this variation in
equation (5.46) and keeping the terms up to order $\epsilon$, we have:¹⁴
$$\delta L = \epsilon \left[ \frac{\partial L}{\partial \mathbf{q}} \cdot \mathbf{f} + \frac{\partial L}{\partial \dot{\mathbf{q}}} \cdot \dot{\mathbf{f}} \right] = 0. \tag{5.47}$$
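Equation (5.47) expresses the symmetry property of the function $L$ under the
change generated by the function $\mathbf{f}$, and so far does not say anything about
conservation. The magic of Noether’s theorem comes from noticing that, if we evaluate
equation (5.47) along a path $\mathbf{q}(t)$ that obeys the equations of motion
$\frac{\partial L}{\partial \mathbf{q}} = \frac{d}{dt} \frac{\partial L}{\partial \dot{\mathbf{q}}}$, the variation $\delta L$ becomes a total derivative:
$$\delta L = \epsilon\, \frac{d}{dt} \left( \frac{\partial L}{\partial \dot{\mathbf{q}}} \cdot \mathbf{f} \right) = 0, \tag{5.48}$$
and
$$\frac{\partial L}{\partial \dot{\mathbf{q}}} \cdot \mathbf{f} = \text{Constant of motion}. \tag{5.49}$$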
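The constant of motion is sometimes called the “conserved charge” corresponding
to the symmetry $\mathbf{f}$. For example, consider the Lagrangian of a particle of mass
$M$ moving in a central potential:
$$L = \frac{1}{2} M \dot{\mathbf{q}}^2 - V(q), \tag{5.50}$$
with $q = |\mathbf{q}|$ the distance to the origin. Consider the following infinitesimal
variation in coordinates: $\delta\mathbf{q} = \epsilon\, \hat{\mathbf{n}} \times \mathbf{q}$, corresponding to $\mathbf{f} = \hat{\mathbf{n}} \times \mathbf{q}$, with $\hat{\mathbf{n}}$ a fixed
unit vector in space. The variation is an infinitesimal rotation around an axis in
the direction of $\hat{\mathbf{n}}$. Using $\partial L/\partial \mathbf{q} = -V'(q)\, \mathbf{q}/q$ and $\partial L/\partial \dot{\mathbf{q}} = M \dot{\mathbf{q}}$, the variation
becomes:
$$\delta L = \epsilon \left[ -V'(q)\, \frac{\mathbf{q} \cdot \mathbf{f}}{q} + M \dot{\mathbf{q}} \cdot \dot{\mathbf{f}} \right]. \tag{5.51}$$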
Since $\mathbf{f}$ is perpendicular to $\mathbf{q}$, and $\dot{\mathbf{f}} = \hat{\mathbf{n}} \times \dot{\mathbf{q}}$ is perpendicular to $\dot{\mathbf{q}}$, we have
$\delta L = 0$: the variation corresponds in fact to a symmetry of this Lagrangian.
Noether’s theorem tells us that the associated constant of motion, from equation
(5.49), is
$$M \dot{\mathbf{q}} \cdot (\hat{\mathbf{n}} \times \mathbf{q}) \equiv M \hat{\mathbf{n}} \cdot (\mathbf{q} \times \dot{\mathbf{q}}), \tag{5.52}$$
which is the component of the angular momentum along $\hat{\mathbf{n}}$. Since the direction
of $\hat{\mathbf{n}}$ is arbitrary, all the components of the angular momentum are constant for an
orbit in a central potential: rotational symmetry implies conservation of the angular
momentum. For more on Noether’s theorem see, for example, Marsden and Ratiu
(1999).
14 For compactness, we use the dot product notation:
$\frac{\partial L}{\partial \mathbf{q}} \cdot \mathbf{f} = \frac{\partial L}{\partial q_1} f_1 + \frac{\partial L}{\partial q_2} f_2 + \cdots + \frac{\partial L}{\partial q_N} f_N$, etc.
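The conservation law of equation (5.52) can be observed directly in a numerical orbit.
The sketch below is illustrative only: it assumes $M = 1$, the central potential
$V(q) = q^4/4$, arbitrary initial conditions and a hand-written fourth-order Runge-Kutta
integrator, none of which come from the text.

```python
def accel(q):
    """Acceleration for the central potential V(q) = |q|**4 / 4 with M = 1."""
    r2 = q[0] ** 2 + q[1] ** 2 + q[2] ** 2
    return [-r2 * c for c in q]          # -V'(q) q/|q| = -|q|**2 q

def axpy(a, b, s):
    """Return the vector a + s * b."""
    return [x + s * y for x, y in zip(a, b)]

def rk4_step(q, v, h):
    """One fourth-order Runge-Kutta step for dq/dt = v, dv/dt = accel(q)."""
    k1q, k1v = v, accel(q)
    k2q, k2v = axpy(v, k1v, h / 2), accel(axpy(q, k1q, h / 2))
    k3q, k3v = axpy(v, k2v, h / 2), accel(axpy(q, k2q, h / 2))
    k4q, k4v = axpy(v, k3v, h), accel(axpy(q, k3q, h))
    q_new = [q[i] + h / 6 * (k1q[i] + 2 * k2q[i] + 2 * k3q[i] + k4q[i]) for i in range(3)]
    v_new = [v[i] + h / 6 * (k1v[i] + 2 * k2v[i] + 2 * k3v[i] + k4v[i]) for i in range(3)]
    return q_new, v_new

def angular_momentum(q, v, M=1.0):
    """M q x qdot; its component along any unit vector n is eq. (5.52)."""
    return [M * (q[1] * v[2] - q[2] * v[1]),
            M * (q[2] * v[0] - q[0] * v[2]),
            M * (q[0] * v[1] - q[1] * v[0])]

q, v = [1.0, 0.0, 0.2], [0.0, 0.8, 0.3]
L0 = angular_momentum(q, v)
max_drift = 0.0
for _ in range(5000):
    q, v = rk4_step(q, v, 0.001)
    L = angular_momentum(q, v)
    max_drift = max(max_drift, max(abs(a - b) for a, b in zip(L, L0)))
print(L0, max_drift)
```

All three components should stay constant up to the integrator’s truncation error, in
line with the statement that the direction of $\hat{\mathbf{n}}$ is arbitrary.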
In some interesting cases, the transformation generated by $\mathbf{f}$ is not strictly a
symmetry – that is, $\delta L \neq 0$ – but is such that it gives rise to a time derivative when the
variation is evaluated along an orbit:
$$\delta L = \epsilon\, \frac{dA(\mathbf{q})}{dt}. \tag{5.53}$$
For such cases the constant of motion $C$ has the form:
$$\frac{\partial L}{\partial \dot{\mathbf{q}}} \cdot \mathbf{f} - A(\mathbf{q}) = C. \tag{5.54}$$
We refer to the symmetry in this case as a “dynamical” symmetry because there
is an operation that leaves $L$ unchanged, not in general, but along an orbit. Recall
that, in general, adding a “gauge” term $dA(\mathbf{q})/dt$ to the Lagrangian does not affect
the equations of motion. If after performing an infinitesimal change of coordinates
(from $\mathbf{q}$ to $\mathbf{q} + \epsilon \mathbf{f}$) along an orbit the Lagrangian changes by $\delta L = \epsilon\, dA/dt$ for
some particular $A$, then, along that orbit, we can simultaneously subtract a term
$\epsilon\, dA/dt$ (a licit gauge transformation) leaving $L$ unchanged to first order in $\epsilon$. The
constant of motion $C$ of equation (5.54) reflects the dynamical invariance of $L$ – the
invariance under a change in coordinates accompanied by a gauge transformation
along the orbit.
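The principle of conservation of energy can be regarded as emerging from a
dynamical symmetry of the Lagrangian. Consider the variation of $L$ under a
“translation in time,” from $t$ to $t + \epsilon$. If the Lagrangian does not depend explicitly on
time (for example, if the constraints are independent of time), the variation is from
$L(\mathbf{q}(t), \dot{\mathbf{q}}(t))$ to $L(\mathbf{q}(t + \epsilon), \dot{\mathbf{q}}(t + \epsilon))$:
$$\delta L = \epsilon \left[ \frac{\partial L}{\partial \mathbf{q}} \cdot \dot{\mathbf{q}} + \frac{\partial L}{\partial \dot{\mathbf{q}}} \cdot \ddot{\mathbf{q}} \right] = \epsilon\, \frac{d}{dt} L \neq 0. \tag{5.55}$$
The variation is not zero, but is equal to a total derivative, with $A = L$, and
equation (5.54) applies. Since the change of coordinates is $\delta\mathbf{q} = \epsilon\, \dot{\mathbf{q}}$, we have $\mathbf{f} = \dot{\mathbf{q}}$
and the constant of motion is:
$$\frac{\partial L}{\partial \dot{\mathbf{q}}} \cdot \dot{\mathbf{q}} - L = E, \tag{5.56}$$
with $E$ the energy of the system. For the cases where $L = \frac{1}{2} M \dot{\mathbf{q}}^2 - V(\mathbf{q})$, we have
the vis viva theorem: $E = T + V$ is constant along all paths that obey the equations
of motion.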
The Laplace-Runge-Lenz Vector

A peculiar example of dynamical symmetry is the Laplace-Runge-Lenz vector, a
constant of motion of the Kepler problem noticed by Laplace (1799) in his Trea-
tise on Dynamics, later mentioned by Tisserand (1899), and popularized by Runge
(1919) and Lenz (1924).
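Laplace realized that, for the potential of the Kepler problem,¹⁵ $V(r) = -k/r$,
in addition to the energy $E$ and the angular momentum $\mathbf{L}$, there is a third constant
of motion, a vector $\mathbf{R}$ – called the Laplace-Runge-Lenz (LRL) vector – defined as:
$$\mathbf{R} = \mathbf{p} \times \mathbf{L} - M k\, \hat{\mathbf{u}}_r, \tag{5.57}$$
where $\mathbf{p} = M \dot{\mathbf{x}}$ is the particle’s momentum and $\hat{\mathbf{u}}_r$ is the unit vector in the particle’s
direction ($\mathbf{x} = r\, \hat{\mathbf{u}}_r$).
The constancy of $\mathbf{R}$ is easy to verify. If we choose the $z$ axis in the direction of
the (constant) angular momentum ($\mathbf{L} = L \hat{\mathbf{k}}$), we have, using Newton’s law:
$$\dot{\mathbf{p}} \times \mathbf{L} = -\frac{k}{r^2}\, L\, \hat{\mathbf{u}}_r \times \hat{\mathbf{k}} = \frac{k L}{r^2}\, \hat{\mathbf{u}}_\theta, \tag{5.58}$$
with $\hat{\mathbf{u}}_\theta$ the azimuthal unit vector. On the other hand, using $L = M r^2 \dot\theta$:
$$M k\, \dot{\hat{\mathbf{u}}}_r = M k\, \hat{\mathbf{u}}_\theta\, \dot\theta = \frac{k L}{r^2}\, \hat{\mathbf{u}}_\theta, \tag{5.59}$$
and $\dot{\mathbf{R}} = 0$. At the perihelion and aphelion of an orbit, $\mathbf{p}$ is perpendicular to $\hat{\mathbf{u}}_r$,
and $\mathbf{p} \times \mathbf{L}$ will be parallel to $\hat{\mathbf{u}}_r$: the LRL vector is parallel to the semi-major axis
of the ellipse.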
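The constancy of $\mathbf{R}$ can also be checked numerically. The sketch below is an
illustration under assumed values ($M = k = 1$, planar motion, arbitrary initial
conditions, a hand-written Runge-Kutta integrator). For comparison it repeats the run
with a force law $k/r^{2.2}$, an arbitrary non-Keplerian choice, for which the vector
built from equation (5.57) is not expected to be conserved.

```python
import math

def accel(q, power):
    """Attractive central force of magnitude k / r**power with k = M = 1."""
    r = math.hypot(q[0], q[1])
    return (-q[0] / r ** (power + 1), -q[1] / r ** (power + 1))

def rk4_step(q, v, h, power):
    """One fourth-order Runge-Kutta step for planar motion."""
    def shift(a, b, s):
        return (a[0] + s * b[0], a[1] + s * b[1])
    k1q, k1v = v, accel(q, power)
    k2q, k2v = shift(v, k1v, h / 2), accel(shift(q, k1q, h / 2), power)
    k3q, k3v = shift(v, k2v, h / 2), accel(shift(q, k2q, h / 2), power)
    k4q, k4v = shift(v, k3v, h), accel(shift(q, k3q, h), power)
    q_new = tuple(q[i] + h / 6 * (k1q[i] + 2 * k2q[i] + 2 * k3q[i] + k4q[i]) for i in range(2))
    v_new = tuple(v[i] + h / 6 * (k1v[i] + 2 * k2v[i] + 2 * k3v[i] + k4v[i]) for i in range(2))
    return q_new, v_new

def lrl(q, v):
    """R = p x L - M k u_r of eq. (5.57) for planar motion, with M = k = 1."""
    r = math.hypot(q[0], q[1])
    Lz = q[0] * v[1] - q[1] * v[0]
    return (v[1] * Lz - q[0] / r, -v[0] * Lz - q[1] / r)

def lrl_drift(power, steps=8000, h=0.001):
    """Largest change of R along the orbit, starting from the aphelion."""
    q, v = (1.0, 0.0), (0.0, 0.8)
    R0 = lrl(q, v)
    drift = 0.0
    for _ in range(steps):
        q, v = rk4_step(q, v, h, power)
        R = lrl(q, v)
        drift = max(drift, math.hypot(R[0] - R0[0], R[1] - R0[1]))
    return drift

kepler_drift = lrl_drift(2.0)       # inverse-square force
perturbed_drift = lrl_drift(2.2)    # assumed non-Keplerian force law
print(kepler_drift, perturbed_drift)
```

With the inverse-square force the drift should be at the level of the integration error,
while in the perturbed case the same combination of $\mathbf{p}$, $\mathbf{L}$ and $\hat{\mathbf{u}}_r$ changes
appreciably within two orbital periods.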
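In order to see the constancy of the LRL vector as a dynamical symmetry, start
with the Lagrangian written in Cartesian coordinates:
$$L = \frac{1}{2} M \left( \dot x^2 + \dot y^2 + \dot z^2 \right) + \frac{k}{r}, \tag{5.60}$$
with $r = \sqrt{x^2 + y^2 + z^2}$. Now change $\mathbf{x}$ to $\mathbf{x} + \delta\mathbf{x} \equiv \mathbf{x} + \epsilon\, \hat{\boldsymbol{\jmath}}$, corresponding to $\mathbf{f} = \hat{\boldsymbol{\jmath}}$.
The variation of $L$, keeping terms up to order $\epsilon$, is:
$$\delta L = \epsilon \left( -k\, \frac{y}{r^3} \right) = -\epsilon\, k\, \frac{\sin\theta}{r^2}. \tag{5.61}$$
15 Since there are no constraints on this system, we use $r$ for $q$ and $\mathbf{x}$ for $\mathbf{q}$.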
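Clearly the Lagrangian is not invariant under this change: $L$ is not invariant
under translations. However, along an orbit of angular momentum $L$ we can use
$1/r^2 = M \dot\theta / L$, and $\delta L$ becomes
$$\delta L = -\epsilon\, \frac{M k}{L}\, \dot\theta \sin\theta = \epsilon\, \frac{d}{dt} \left( \frac{M k}{L} \cos\theta \right) \equiv \epsilon\, \frac{dA}{dt}. \tag{5.62}$$
Since $\delta L$ is a total derivative along the orbit, we can speak of the translation in the
$y$ direction generated by $\mathbf{f} = \hat{\boldsymbol{\jmath}}$ as a dynamical symmetry of the Kepler problem.
The constant of motion, from equation (5.54), is
$$C = \frac{\partial L}{\partial \dot{\mathbf{x}}} \cdot \hat{\boldsymbol{\jmath}} - \frac{M k}{L} \cos\theta \tag{5.63}$$
$$= M \dot y - \frac{M k}{L}\, \hat{\mathbf{u}}_r \cdot \hat{\boldsymbol{\imath}}. \tag{5.64}$$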
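With simple manipulations we can verify that the constant $C$ of equation (5.64)
is proportional to the $x$ component of the LRL vector. Since the angular momentum
is in the $z$ direction, we have:
$$\dot{\mathbf{x}} \times \mathbf{L} = L \left( \dot y\, \hat{\boldsymbol{\imath}} - \dot x\, \hat{\boldsymbol{\jmath}} \right), \tag{5.65}$$
and
$$L \dot y = (\dot{\mathbf{x}} \times \mathbf{L}) \cdot \hat{\boldsymbol{\imath}}. \tag{5.66}$$
Multiplying equation (5.64) by $L$ we obtain, in agreement with the definition (5.57):
$$\left( \mathbf{p} \times \mathbf{L} - M k\, \hat{\mathbf{u}}_r \right) \cdot \hat{\boldsymbol{\imath}} = \text{Const}. \tag{5.67}$$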
Since the problem has rotational symmetry, we can choose any direction for f
and obtain that all the components of the LRL vector are constant. This dynamical
symmetry is unique to the Kepler problem. There are only two central poten-
tials that give closed orbits for all energies: the Kepler problem and the harmonic
potential where the force is proportional to the distance. And both have dynamical
symmetries. A vector is constant for the Kepler problem, and a more complicated
mathematical object, a tensor, is constant for the harmonic potential.¹⁶
16 For a Lagrangian $L = \frac{1}{2} M \dot{\mathbf{x}}^2 - \frac{1}{2} k r^2$, the quantity $k x y + M \dot x \dot y$ and the other two permutations of the
coordinates are constant.
5.5 Lagrange versus d’Alembert: Dissipative and Nonholonomic Systems

With all their poetic simplicity, Lagrange’s equations don’t apply universally; the
dynamics of some systems cannot be obtained from a minimum principle. In this
section, we discuss two of those cases: particles subject to viscous forces, and
the so-called “nonholonomic” systems, where the constraints are in the virtual
displacements themselves rather than on the possible configurations. In the first
case, even though a frictional force cannot be obtained from a minimization
process, the irreversible mechanism of friction and dissipation of energy of a
particle can be obtained from its reversible interaction with an infinite medium.
Irreversibility in this case arises from the choice of initial conditions, but the sys-
tem as a whole is reversible. On the other hand, the dynamics of nonholonomic
systems can be solved using d’Alembert’s principle, but cannot be obtained from a
minimum principle.
5.5.1 Dissipation in a Reversible System: Lamb’s Model

A simple way to model irreversibility for a particle moving in one dimension is
by adding a force $f = -\alpha \dot X$, where $\alpha$ is a viscosity coefficient. For example, the
equation of motion of a particle of mass $M$ attached to a spring of constant $K$ with
a frictional force is:
$$M \ddot X = -K X - \alpha \dot X. \tag{5.68}$$
If, for example, we start with the particle with zero displacement and some finite
velocity $v_0$, the subsequent motion is damped:
$$X(t) = X_0\, e^{-\frac{\alpha}{2M} t} \sin \omega t, \tag{5.69}$$
with $X_0 = v_0/\omega$. (For small values of $\alpha$, the frequency of oscillation is
$\omega = \sqrt{K/M}$.) The important point for the present discussion is that the force $f$ cannot
be expressed as $f = -\partial V/\partial X$ for some potential $V(X)$, as is the case for forces
that give rise to the vis viva theorem. There is no Lagrangian for the equations
of motion (5.68). We could be tempted to try a potential $V(X) = -c X \dot X$ but we
immediately notice that it has the form of a derivative, $V(X) = d(-c X^2/2)/dt$,
and a term like this does not affect the equations of motion of the Lagrangian
$L_O = \frac{1}{2} M \dot X^2 - \frac{1}{2} K X^2$ of the oscillator. However, damping can be obtained by coupling
the harmonic oscillator of Lagrangian $L_O$ with an open system – an infinite string –
as in the model proposed by Horace Lamb (1900).
Figure 5.8 Portion of a discrete, infinite chain of atoms separated by a distance $a$ in
the $x$ direction; the atoms at $x = (n-1)a$, $na$, $(n+1)a$ have displacements $u_{n-1}$, $u_n$,
$u_{n+1}$. The vibrations of the atoms in this model are transverse to the propagation.
In order to introduce a Lagrangian with an infinite number of degrees of
freedom – the $q_i$’s of equation (5.39) – let us start with the infinite one-dimensional
chain of “atoms” of mass $m$ coupled to each other with springs of constant $k$ (see
Figure 5.8). The Lagrangian of the string plus the oscillator is:
$$L = \frac{M}{2} \dot X^2 - \frac{K}{2} X^2 + \sum_{n=-\infty}^{\infty} \left[ \frac{m}{2} \dot u_n^2 - \frac{k}{2} (u_n - u_{n+1})^2 \right], \tag{5.70}$$
where $u_n$ represents the (vertical) displacement of the $n$-th mass with respect to
the equilibrium position. We are choosing the $u_n$’s to be vertical so that the string
oscillations are transverse, but we could have also chosen the oscillator coupled
to longitudinal oscillations of a medium. In the Lagrangian of equation (5.70) the
string and the oscillator are decoupled. Lamb chooses a coupling in the form of a
holonomic constraint:
$$X = u_0, \tag{5.71}$$
which amounts to “gluing” the oscillator to one of the points of the string (the one
at x = 0), and decreasing by one the number of degrees of freedom of the system.
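The Lagrangian becomes:¹⁷
$$L = \sum_{n=-\infty}^{\infty} \left[ \frac{m}{2} \dot u_n^2 - \frac{k}{2} (u_n - u_{n+1})^2 + \delta_{n,0} \left( \frac{(M - m)}{2} \dot u_n^2 - \frac{K}{2} u_n^2 \right) \right]. \tag{5.72}$$
The equations of motion for each $u_n$ are (replacing $M - m$ by $M$, since $M \gg m$):
$$\left( m + M \delta_{n,0} \right) \ddot u_n = k \left( u_{n+1} + u_{n-1} - 2 u_n \right) - \delta_{n,0} K u_n. \tag{5.73}$$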
Now take the limit $a \to 0$, where the chain becomes a continuous string. The
discrete difference becomes a second derivative: $u_{n+1} + u_{n-1} - 2u_n = a^2\, \partial^2 u/\partial x^2$,
and we obtain the following parameters for the string: $\rho = m/a$ (the mass per unit
length) and $T = ka$ (the tension). Also, the Kronecker delta becomes the Dirac
delta: $\delta_{n,0} = a\, \delta(x)$, and Lagrange equations become:
$$\left( \rho + \delta(x) M \right) \frac{\partial^2 u}{\partial t^2} = T\, \frac{\partial^2 u}{\partial x^2} - K \delta(x)\, u(x). \tag{5.74}$$
When $x \neq 0$, equation (5.74) becomes the free wave equation:
$$\rho\, \frac{\partial^2 u}{\partial t^2} = T\, \frac{\partial^2 u}{\partial x^2}, \tag{5.75}$$
with general solutions:
$$u(x, t) = f(x - ct) + g(x + ct), \tag{5.76}$$
where $c = \sqrt{T/\rho}$ is the velocity of the waves, and $f$ and $g$ are arbitrary functions
determined by the initial conditions.
17 In equation (5.72) the function $\delta_{i,j}$ is the “Kronecker delta,” which is equal to one when $i = j$ and zero
when $i \neq j$.
In order to extract the equation for $x = 0$, integrate¹⁸ equation (5.74) between
$x = 0^-$ and $x = 0^+$:
$$M \frac{d^2}{dt^2} u(0, t) = T \left[ \frac{\partial u(0^+, t)}{\partial x} - \frac{\partial u(0^-, t)}{\partial x} \right] - K u(0, t). \tag{5.77}$$
At this point, we choose a particular form of the initial conditions for the string:
$u(x, t < 0) = 0$ for all points except $x = 0$, where our particle of mass $M$ lives.
The system is going to acquire what Lamb calls a “sudden blow”; only the mass $M$
(and not the string) is going to have some finite initial conditions. For later times,
due to the symmetry of the problem around $x = 0$, we choose $u(x, t) = u(-x, t)$.
This means that, in equation (5.76), we are choosing $g = 0$ for $x > 0$; the string
vibrations correspond to an “outgoing” wave $u(x, t) = f(x - ct)$. For this outgoing
wave, we have $\partial u/\partial x = -(1/c)\, \partial u/\partial t$ for $x > 0$ (and the opposite sign for $x < 0$),
which, substituted in equation (5.77), gives:¹⁹
$$M \ddot X = -K X - \frac{2T}{c}\, \dot X. \tag{5.78}$$
We have obtained equation (5.68) with a viscosity coefficient $\alpha = 2T/c$. Equation
(5.78) is “non-Lagrangian” because we are focusing on $x = 0$, rendering the string
invisible, and staying within a restricted set of initial conditions. The system as
a whole is Lagrangian, and admits motions at $x = 0$ described by the equation
$M \ddot X = -K X + (2T/c) \dot X$. These motions correspond to “incoming waves” that are
part of the reversible solutions of the system. If we focus on the outgoing waves,
we obtain dissipation, the string carrying the energy “lost” by the particle (see
Figure 5.9).
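The emergence of damping from a reversible system can be illustrated by integrating the
discrete chain of equation (5.73) directly. The sketch below makes several assumptions
that are not in the text: a finite chain with clamped ends (long enough that no
reflection returns during the run), lattice spacing $a = 1$, the parameter values listed
in the code, and a velocity-Verlet integrator. It compares the motion of the oscillator
with the damped solution of equation (5.68), using the coefficient $\alpha = 2T/c$ that
follows from the outgoing-wave condition $\partial u/\partial x = \mp (1/c)\, \partial u/\partial t$.

```python
import math

k, m = 4.0, 1.0          # chain springs and atom masses (assumed values, a = 1)
M, K = 100.0, 9.0        # oscillator mass and spring constant (assumed values)
N = 120                  # atoms on each side of the oscillator
v0, dt, steps = 1.0, 0.05, 2000

T = k                    # tension T = k a
c = math.sqrt(k / m)     # wave speed c = sqrt(T / rho), with rho = m / a
alpha = 2.0 * T / c      # predicted viscosity coefficient
gamma = alpha / (2.0 * M)
omega = math.sqrt(K / M - gamma ** 2)
X0 = v0 / omega

size = 2 * N + 1
mass = [m] * size
mass[N] = m + M          # the site n = 0 carries the oscillator, as in eq. (5.73)
u = [0.0] * size         # "quiet" string
v = [0.0] * size
v[N] = v0                # sudden blow on the oscillator only

def forces(u):
    """Right-hand side of eq. (5.73); the two end atoms stay clamped."""
    f = [0.0] * size
    for n in range(1, size - 1):
        f[n] = k * (u[n + 1] + u[n - 1] - 2.0 * u[n])
    f[N] -= K * u[N]
    return f

max_err = 0.0
f = forces(u)
for step in range(1, steps + 1):
    for n in range(size):                      # velocity-Verlet update
        v[n] += 0.5 * dt * f[n] / mass[n]
        u[n] += dt * v[n]
    f = forces(u)
    for n in range(size):
        v[n] += 0.5 * dt * f[n] / mass[n]
    t = step * dt
    x_pred = X0 * math.exp(-gamma * t) * math.sin(omega * t)
    max_err = max(max_err, abs(u[N] - x_pred))
print(X0, max_err)
```

With these values the chain is well inside the long-wavelength regime, so the simulated
displacement of the heavy mass is expected to follow the damped oscillation to within a
few percent of $X_0$, even though every equation being integrated is reversible.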
18 We use the properties of Dirac’s delta function: $\int_{-\infty}^{\infty} dx\, \delta(x) f(x) = f(0)$, and admit the possibility that
the derivative $\partial u/\partial x$ can be discontinuous due to the presence of a delta function in the equation.
19 Recall that $u(x = 0, t) = X(t)$.
Figure 5.9 Lamb’s model: harmonic oscillator coupled to an infinite string. We
start at $t = 0$ with a “quiet” string: $u(x, 0) = \dot u(x, 0) = 0$. If the particle starts
with zero amplitude and finite velocity, the subsequent motion of the particle is
given by equation (5.69). The string’s oscillation, for $0 < x < ct$, will be
$u(x, t) = X(t - x/c) = X_0\, e^{-\frac{\alpha}{2M}(t - x/c)} \sin \omega (t - x/c)$, and $u(x, t) = 0$ for $|x| > ct$.
5.5.2 Nonholonomic Systems

The terms “holonomic” and “nonholonomic” were coined by Heinrich Hertz
(1894/1956) in his book The Principles of Mechanics. The word “holonomic” (or
“holonomous”) is composed of the Greek words meaning “integral” (or “whole”)
and “law,” and refers to the fact that such constraints can be expressed as limitations
on the configuration variables of the system as a whole. As discussed in Section
5.4.1, substituting these constraints in d’Alembert’s principle leads to Lagrange’s
equations for each of the unconstrained variables. On the other hand, for
nonholonomic systems, the values of the unconstrained variables are not a function of the
constrained variables and the limitations are on the (virtual) displacements
themselves and not on the possible configurations. For example, following the logic of
the derivation of Lagrange equations, consider a holonomic transformation of
variables from $x, y, z$ to $\xi, \psi$: $x = f(\xi, \psi)$, $y = g(\xi, \psi)$, $z = h(\xi, \psi)$. The variations
of the constrained variables are of the form $\delta x = (\partial f/\partial\xi)\, \delta\xi + (\partial f/\partial\psi)\, \delta\psi$ and
similarly for $y$ and $z$. The fact that we can write a variation $\partial(\delta x)/\partial(\delta\xi)$ as $(\partial f/\partial\xi)$
for some $f$ allowed Lagrange to write equation (5.32) and from that equality derive
the scientific poem. For nonholonomic systems, the function $f$ does not exist, and
the crucial step of equation (5.32) cannot be taken. Let us see this fundamental
difference in a simple example.
Consider a particle in three dimensions in the absence of external forces.
d’Alembert’s principle reads:
$$m \ddot x\, \delta x + m \ddot y\, \delta y + m \ddot z\, \delta z = 0. \tag{5.79}$$
Now consider the following “velocity constraint”:
$$\delta z = y\, \delta x, \tag{5.80a}$$
$$\dot z = y\, \dot x. \tag{5.80b}$$
This is a nonholonomic constraint in that a function $z = f(x, y)$ does not exist
that would give us $\partial f/\partial x = y$ and $\partial f/\partial y = 0$: there is no Lagrangian for this
system. Substituting equation (5.80a) in d’Alembert’s principle, equation (5.79),
we obtain:
$$\left( \ddot x + \ddot z\, y \right) \delta x + \ddot y\, \delta y = 0. \tag{5.81}$$
Since the virtual displacements $\delta x$ and $\delta y$ are independent, we have
$$\ddot x + \ddot z\, y = 0, \tag{5.82a}$$
$$\ddot y = 0. \tag{5.82b}$$
Now we can use the constraint of equation (5.80b), which gives
$\ddot z = \dot y \dot x + y \ddot x$, to get
$$\ddot x + \frac{y}{1 + y^2}\, \dot x \dot y = 0. \tag{5.83}$$
Note this system gives free motion in the $y$ direction while the $x$ equation can
be integrated to give
$$\dot x = \frac{c}{(1 + y^2)^{1/2}}. \tag{5.84}$$
If, as in holonomic systems, we naively replace the constraint in the Lagrangian
$L = \frac{m}{2} (\dot x^2 + \dot y^2 + \dot z^2)$, we obtain a completely different (and incorrect) dynamics.
For more on nonholonomic systems see Bloch (2003).
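The first integral of equation (5.84) is easy to confirm numerically. The sketch below
integrates equations (5.82b) and (5.83), together with the constraint (5.80b) for $z$,
with a hand-written Runge-Kutta scheme; the initial conditions and step size are
arbitrary illustrative choices.

```python
import math

def deriv(s):
    """Right-hand side for the state s = (x, y, z, vx, vy)."""
    x, y, z, vx, vy = s
    ax = -y * vx * vy / (1.0 + y * y)   # eq. (5.83)
    ay = 0.0                            # eq. (5.82b)
    vz = y * vx                         # velocity constraint, eq. (5.80b)
    return (vx, vy, vz, ax, ay)

def rk4_step(s, h):
    """One fourth-order Runge-Kutta step for the full state."""
    def shifted(k, f):
        return tuple(si + f * ki for si, ki in zip(s, k))
    k1 = deriv(s)
    k2 = deriv(shifted(k1, h / 2))
    k3 = deriv(shifted(k2, h / 2))
    k4 = deriv(shifted(k3, h))
    return tuple(si + h / 6 * (a + 2 * b + 2 * c + d)
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))

def invariant(s):
    """vx * sqrt(1 + y**2), i.e. the constant c of eq. (5.84)."""
    _, y, _, vx, _ = s
    return vx * math.sqrt(1.0 + y * y)

state = (0.0, 0.0, 0.0, 1.0, 0.5)       # assumed initial conditions
c0 = invariant(state)
max_drift = 0.0
for _ in range(4000):
    state = rk4_step(state, 0.001)
    max_drift = max(max_drift, abs(invariant(state) - c0))
print(c0, max_drift, state)
```

After four time units $y$ has grown from 0 to 2, so $\dot x$ should have dropped from 1 to
$1/\sqrt{5}$ while the combination $\dot x (1 + y^2)^{1/2}$ stays fixed.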
5.6 Gauss’s Principle of Least Constraint

In a short paper, the great Johann Carl Friedrich Gauss (1829) proposed an alter-
native minimum principle that does not require the evaluation of an integral.
Motivated by his least squares method for the theory of errors, Gauss defines a
“constraint” at every point in the path as the deviation, due to the forces of con-
straint, with respect to the otherwise free (unconstrained) motion. He then requires
that constraint be a minimum:
Let $m$, $m'$, $m''$, and so on the masses of the points; $a$, $a'$, $a''$, and so on their positions at
time $t$; $b$, $b'$, $b''$, and so on the positions they would take on if they were completely free
after an infinitesimal $dt$ because of the forces acting on them during this time and of the
velocities and directions which they had at the instant $t$. The actual position $c$, $c'$, $c''$, and
so on will then be those for which, under all conditions eligible for the system, $m(bc)^2 +
m'(b'c')^2 + m''(b''c'')^2$ and so on is a minimum (Gauss, 1829).
We recognize the lengths $bc$, $b'c'$, $b''c''$ and so on as proportional to Bernoulli’s
“lost motions.” Gauss’s principle requires therefore the minimization of the lost
motions and is close in spirit to d’Alembert’s principle. For example, consider one
of the particles (particle 1), of mass m 1 which at time t has position x(t) and veloc-
ity ẋ(t). For this particle, the positions b and c – or xb and xc in our notation – are:
1 F1 2
xb = x(t) + ẋ(t)dt + dt (5.85a)
2m
1
xc = x(t) + ẋ(t)dt + a1 dt 2 . (5.85b)
2
In equation (5.85), the coordinate $x_b$ does not satisfy the constraint, whereas $x_c$
does. Since the factor $dt^2/2$ is common to both $x_b$ and $x_c$, we obtain
$$(bc)^2 = (x_b - x_c)^2 \propto \left(\frac{F_1}{m_1} - a_1\right)^2. \tag{5.86}$$
In this notation, Gauss tells us that the values of $a_1, a_2, \cdots$ are the ones that
minimize the constraint given by:
$$m_1\left(\frac{F_1}{m_1} - a_1\right)^2 + m_2\left(\frac{F_2}{m_2} - a_2\right)^2 + \cdots + m_N\left(\frac{F_N}{m_N} - a_N\right)^2. \tag{5.87}$$
If we compute variations of equation (5.87) with respect to the variables $a_i$ and
equate to zero we obtain:
$$(F_1 - m_1 a_1)\cdot\delta a_1 + (F_2 - m_2 a_2)\cdot\delta a_2 + \cdots + (F_N - m_N a_N)\cdot\delta a_N = 0 \tag{5.88}$$
Now we notice, from equations (5.85), that $\delta a_1 = \frac{2}{dt^2}\delta x_c \equiv \frac{2}{dt^2}\delta x_1$: for each particle,
the variation of the acceleration $a_i$ is proportional to the virtual displacement
of the same particle. In other words, equation (5.88) is equivalent to d’Alembert’s
principle of equation (5.12).
As an example consider a particle in two dimensions constrained to move on a
parabola:
$$y = \frac{1}{2}x^2 \tag{5.89}$$
We assume that the only applied force is gravity, $-mg$, acting in the downward y direction.
Gauss’s principle tells us that we need to minimize (with respect to the
accelerations) the expression
$$(m\ddot{x})^2 + (-mg - m\ddot{y})^2, \tag{5.90}$$
incorporating the constraint, which in our case is $y = x^2/2$, giving $\ddot{y} = \dot{x}^2 + x\ddot{x}$.
The only independent acceleration is $\ddot{x}$, so the minimization of equation (5.90) becomes
$$\frac{\partial}{\partial\ddot{x}}\left[\ddot{x}^2 + \left(g + \dot{x}^2 + x\ddot{x}\right)^2\right] = 0. \tag{5.91}$$
The result is the following equation of motion:
$$\ddot{x}\,(1 + x^2) + x\dot{x}^2 + gx = 0. \tag{5.92}$$
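The minimization in equation (5.91) can be checked numerically. The following is a minimal sketch, not part of the original text: the function names, the sample state (x, v) and the value of g are illustrative assumptions. It minimizes the constraint over the trial acceleration and confirms that the minimizer satisfies equation (5.92).

```python
def gauss_constraint(a, x, v, g=9.81):
    """Gauss's constraint of eq. (5.90), divided by m^2.

    a is the trial acceleration x-ddot and v is x-dot; the constraint
    y = x^2/2 fixes y-ddot = v^2 + x*a.
    """
    return a**2 + (g + v**2 + x * a) ** 2


def minimizing_acceleration(x, v, g=9.81, h=1.0):
    """Minimize the constraint over a.

    The constraint is quadratic in a, so central differences give its
    first and second derivatives exactly (up to rounding), and a single
    Newton step from a = 0 lands on the minimum.
    """
    z_minus = gauss_constraint(-h, x, v, g)
    z_zero = gauss_constraint(0.0, x, v, g)
    z_plus = gauss_constraint(h, x, v, g)
    slope = (z_plus - z_minus) / (2 * h)
    curvature = (z_plus - 2 * z_zero + z_minus) / h**2
    return -slope / curvature


def residual_5_92(a, x, v, g=9.81):
    """Left-hand side of eq. (5.92); zero when a obeys the equation of motion."""
    return a * (1 + x**2) + x * v**2 + g * x


x, v = 0.7, -1.3                      # arbitrary sample state (assumed values)
a_star = minimizing_acceleration(x, v)
print(a_star, residual_5_92(a_star, x, v))
```

The residual printed in the last line should vanish to rounding error, which is the statement that the acceleration of least constraint is the one given by equation (5.92).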
Equation (5.92) can also be derived from the constrained Lagrangian, which
is in turn obtained by substituting the constraint (5.89) into the Lagrangian
$L = \frac{1}{2}m(\dot{x}^2 + \dot{y}^2) - mgy$:
$$L = \frac{1}{2}m\dot{x}^2\left(1 + x^2\right) - \frac{1}{2}mgx^2. \tag{5.93}$$
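As a check, the Euler-Lagrange equation for the constrained Lagrangian of equation (5.93) can be written out step by step. This is a short sketch that uses only the quantities already defined:

```latex
\begin{align*}
\frac{\partial L}{\partial \dot{x}} &= m\dot{x}\,(1 + x^2), \\
\frac{d}{dt}\frac{\partial L}{\partial \dot{x}} &= m\ddot{x}\,(1 + x^2) + 2m x\dot{x}^2, \\
\frac{\partial L}{\partial x} &= m x\dot{x}^2 - m g x, \\
\frac{d}{dt}\frac{\partial L}{\partial \dot{x}} - \frac{\partial L}{\partial x}
  &= m\left[\ddot{x}\,(1 + x^2) + x\dot{x}^2 + g x\right] = 0 .
\end{align*}
```

Dividing by m reproduces equation (5.92).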
Gauss’s principle is attractive since it deals with a true minimum and not merely an
extremum of a function. As pointed out by Gauss himself, the principle is not new
and is equivalent to d’Alembert’s principle. That, says Gauss, does not make it
less interesting, since “it is always interesting and instructive to regard the laws
of nature from a new and advantageous point of view, so as to solve this or that
problem more simply.” An interesting interpretation of Gauss’s principle was given
by Heinrich Hertz (1894/1956). In the absence of external forces, the kinetic energy
is constant (for time-independent constraints), so the speed v of the particle is
constant and the length s of the path is proportional to the time, s = vt. The
acceleration is therefore given by
$$\mathbf{a} = \frac{d^2\mathbf{x}}{dt^2} \propto \frac{d^2\mathbf{x}}{ds^2} = \frac{d\hat{\mathbf{t}}}{ds} = \boldsymbol{\kappa}, \tag{5.94}$$
with t̂ = dx/ds the tangent to the curve, and κ the curvature. In the absence of
forces, the curve followed by a constrained particle (or system of particles) is the
one with minimum possible curvature. For a particle on a sphere, the path connecting
two points is a great circle of radius equal to the radius R of the sphere, with
curvature κ = 1/R, the smallest possible.
5.7 Least Action with a Twist: the Elasticity of the Ether and Maxwell’s Equations*
The power of the Lagrangian approach “consists in its allowing us to ignore or
leave out of the account altogether the details of the mechanism . . . in the phe-
nomena under discussion” (Larmor, 1893). One could follow a heuristic route,
postulate a Lagrangian with a plausible potential and kinetic energies (for exam-
ple, that respects the symmetry of the system) and obtain the dynamics through
equation (5.39). The first people to exploit this “blackboxing” property of the prin-
ciple of least action (Darrigol, 2014, p. 93) were George Green (1838) and James
McCullagh (1846). McCullagh studied the propagation of light in anisotropic crys-
tals. When light enters an anisotropic crystal, the ray splits into two rays (and, in
some cases, an infinite number of rays, as we discuss in Section 6.2). This phe-
nomenon – the “birefringence” – is a clear indication that light is a transverse
wave. McCullagh’s idea is to treat light as the vibration of an elastic solid, just as
in the case we discussed in Section 5.5.1. In a crystal, this solid is anisotropic, and
in a vacuum this solid (the ether) is isotropic. In order to eliminate the longitudinal
vibrations, he proposes that the ether is incompressible. For example, consider
two atoms of a solid at positions x and x + dx, whose displacements from their
equilibrium positions in the x direction are $u_x(x)$ and $u_x(x + dx)$ respectively. If
$u_x(x) \neq u_x(x + dx)$, there is potential energy corresponding to compression, proportional
to $(u_x(x) - u_x(x + dx))^2 \propto (\partial u_x/\partial x)^2$. If we now repeat the same for
all three directions, we obtain that the potential energy of a compressible solid will
have a term proportional to $(\nabla\cdot\mathbf{u})^2$. A term like this should not be part of the potential
energy describing an incompressible solid. McCullagh, in order to account for
the boundary conditions upon reflection and refraction of a ray, proposes a peculiar
form for the potential energy V of the ether: $V \propto (\nabla\times\mathbf{u})^2$. His reasoning
for including this term was phenomenological and not microscopic; it could only
be justified because it gave rise to the correct propagation of vibrations. McCullagh’s
theory fell into oblivion for quite some time, mostly because of a problem
unnoticed by him, and emphasized by Stokes (1862): a term that depends on the
curl violates basic principles of elasticity. We can understand Stokes’s objection by
interpreting the curl as a rotation of an element of the medium, an interpretation
unknown to McCullagh and first offered by Cauchy (1843).
Consider first an isolated infinitesimal square of a two-dimensional elastic system,
as in Figure 5.10. Each vertex of the square corresponds to the equilibrium
position of an atom of the system. When the atoms are displaced, the square distorts.
The potential energy associated with the distortion is a function of the relative
displacements between atoms: one can think of the square having internal springs
connecting the atoms that provide a restoring force to compression or shear of the
square. However, since the system is isolated, there should not be a restoring force
to uniform rotations or translations. For the square, the “mean rotation” (the rotation
moyenne in Cauchy’s language) amounts to the angular momentum (or the
torque) of the displacements with respect to the center C of the square. At vertices
1 and 2, the torque of the x components of the displacements is counterclockwise,
and it is clockwise at vertices 3 and 4. The torque $T_1$ of the x component with
respect to C is therefore
$$T_1 = \frac{dx}{2}\left[u_x(x, y) + u_x(x + dx, y) - u_x(x + dx, y + dy) - u_x(x, y + dy)\right] \simeq -dx\,dy\,\frac{\partial u_x}{\partial y}. \tag{5.95}$$
[Figure 5.10: a square with vertices 1 at (x, y), 2 at (x + dx, y), 3 at (x + dx, y + dy) and 4 at (x, y + dy), center C, and the displacement u drawn at each vertex.]
Figure 5.10 The mean rotation of the distorted square is the “angular momentum”
of the elastic displacements u with respect to the center C.
Similarly, the torque of the y components is $T_2 = +dx\,dy\,\partial u_y/\partial x$. Adding the two
terms (for a square of side dx = dy), we get that the mean rotation of the square is
proportional to $\partial u_y/\partial x - \partial u_x/\partial y$, the z component of ∇ × u. For a cube, the mean rotation
will be precisely a vector proportional to ∇ × u. From the point of view of elasticity
theory, a term $(\nabla\times\mathbf{u})^2$ in the potential energy implies that there is some “spring”
that opposes the rotation of an element of the ether with respect to the vacuum!
Despite this feature, McCullagh’s theory gives the correct result and anticipates
Maxwell’s equations.
The Lagrangian proposed by McCullagh is:²⁰
$$L(t) = \int dv \left[\frac{1}{2}\rho\,\dot{\mathbf{u}}^2 - \frac{1}{2}h\,(\nabla\times\mathbf{u})^2\right]. \tag{5.96}$$
The first term in equation (5.96) is the kinetic energy, with ρ the density of the ether.
In the second term, h is a rotational elasticity, a measure of the ether’s resistance to
twisting. This Lagrangian has an infinite number of variables, u(x, t), one for each
value of the coordinate x. Applying Lagrange’s equations (see Appendix C for the
details of the derivation), we obtain:
ρ ü = −h∇ × (∇ × u) . (5.97)
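A brief check of what equation (5.97) describes, assuming the incompressibility condition ∇ · u = 0 introduced above and a plane-wave trial solution (the symbols $\mathbf{u}_0$, $\mathbf{k}$ and $\Omega$ are introduced here only for illustration):

```latex
\begin{align*}
\nabla\times(\nabla\times\mathbf{u})
  &= \nabla(\nabla\cdot\mathbf{u}) - \nabla^2\mathbf{u}
   = -\nabla^2\mathbf{u} \quad\text{when } \nabla\cdot\mathbf{u} = 0, \\
\rho\,\ddot{\mathbf{u}} &= h\,\nabla^2\mathbf{u}, \\
\mathbf{u} = \mathbf{u}_0\,e^{i(\mathbf{k}\cdot\mathbf{x} - \Omega t)},\quad
\mathbf{k}\cdot\mathbf{u}_0 = 0
  &\;\Longrightarrow\; \Omega^2 = \frac{h}{\rho}\,k^2 .
\end{align*}
```

Under these assumptions the medium of equation (5.96) carries transverse waves that propagate at speed $\sqrt{h/\rho}$.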
Vindication for McCullagh came from FitzGerald (1879), who noted that equa-
tion (5.97) is identical in structure to Maxwell’s equations in free space. FitzGerald
pointed out that under the following correspondence,²¹
B ∝ u̇, (5.98a)
E ∝ ∇ × u, (5.98b)
equation (5.97) becomes ∂B/∂t = −∇ × E (Faraday’s law of induction). Since

the ether is incompressible we have ∇ · u̇ = ∇ · B = 0 (absence of mag-
netic monopoles), and since E is given by a curl we have ∇ · E = 0 (Gauss’s
law in vacuum). If we take the time derivative of equation (5.98b), we obtain
Ampère’s law:
$$\frac{\partial}{\partial t}\underbrace{(\nabla\times\mathbf{u})}_{\propto\,\mathbf{E}} = \nabla\times\dot{\mathbf{u}} \propto \nabla\times\mathbf{B}. \tag{5.99a}$$
²⁰ For simplicity we write the isotropic case, but McCullagh considers the anisotropic situation, where the
“springs” for rotation are different for each of the cartesian axes.
²¹ The correspondence of equations (5.98) is not unique. One can also choose u ∝ A, with A the vector
potential: E ∝ −u̇, B ∝ ∇ × u, and Maxwell’s equations also follow.
The correspondence does not imply mechanical properties of the ether. Rather,
this is a mechanical analogy insinuating that the ether (at least in McCullagh’s
time) should be a unique fluid, different from ordinary matter. Later, William
Thomson (1890) (Lord Kelvin) attempted to justify McCullagh’s ether with a
system of microscopic liquid gyrostats, but the model becomes “desperately complicated”
(Sommerfeld, 1950, p. 111). In a letter to astronomer John Herschel,
McCullagh wrote: “One thing only I am persuaded of, is that the constitution of
the ether, if it ever would be discovered, will be found to be quite different from
any thing that we are in the habit of conceiving, though at the same time very
simple and very beautiful.” For us it is most important that FitzGerald initiated the
practice of formulating electromagnetism in terms of a Lagrangian of the fields (see
Darrigol, 2014, p. 94). Hermann Helmholtz (1892), an advocate of the principle of
least action, used the Lagrangian formalism to derive Maxwell’s equations by anal-
ogy. Hendrik Lorentz (1892, 1903) derived his “Lorentz force” (F = eE + ev × B)
using d’Alembert’s principle applied to the change in kinetic energy of the mag-
netic field. Karl Schwarzschild (1903), using the Lorentz force as an ansatz, wrote
the Lagrangian of the electromagnetic field in the presence of charges of density ρ
that move at velocity v:
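$$L = \int dv \left[\frac{1}{2}\left(\mathbf{B}^2 - \mathbf{E}^2\right) + \rho\left(\phi - \frac{1}{c}\,\mathbf{v}\cdot\mathbf{A}\right)\right]. \tag{5.100}$$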
In equation (5.100) the functions φ and A are potentials from which the elec-
tric and magnetic fields are derived: E = −∇φ − ∂A/∂t and B = ∇ × A. The
Lagrangian of equation (5.100) is a function of the potentials and of the coor-
dinates and velocities of the particles. If, following Schwarzschild, we minimize
with respect to the potentials, we obtain Maxwell’s equations. If we minimize with
respect to coordinates of one of the particles, we obtain the Lorentz force acting on
that particle. In a vacuum ρ = 0, and the Lagrangian is the same as McCullagh’s.
Max Planck (1909/1915) devotes the seventh of his Eight Lectures on Theoret-
ical Physics (delivered in 1909 at Columbia University) to the principle of least
action. He uses McCullagh’s Lagrangian to show that the “significance of the prin-
ciple of least action may be extended beyond ordinary mechanics and therefore it
can be utilized as the foundation of general dynamics, since it governs all known
reversible processes.” More recently, Richard Feynman remarks on the rotational
ether: “It is interesting that the correct equations for the behavior of light were
worked out by McCullagh in 1839. But people said to him, ‘Yes, but there is no
real material whose mechanical properties could possibly satisfy those equations,
and since light is an oscillation that must vibrate in something, we cannot believe in
this abstract equation business.’ If people had been more open-minded, they might
have believed in the right equations for the behavior of light much earlier than they
did” (Feynman, 2013).
6 The Optical-Mechanical Analogy, Part II: The Hamilton-Jacobi Equation
6.1 Hamilton’s “Theory of Systems of Rays”

A major development in the principle of least action is William Rowan Hamilton’s
“characteristic function,” which, with Carl Gustav Jacobi’s contributions,
leads to the Hamilton-Jacobi equation. During the course of his work, in which
he started focusing on optics when he was seventeen, Hamilton realized that optics
and dynamics are essentially a single mathematical subject, or two aspects of the
calculus of variations. According to the Irish mathematician John Lighton Synge,
Hamilton’s idea of the characteristic function placed him in the same category as
Descartes, Laplace, and Lagrange (Synge, 1945): Descartes characterized a curve
by a single equation, Lagrange did so for dynamical systems by describing them
using a single energy function, Laplace described the gravitational field using a
single potential function, and Hamilton introduced his characteristic or principal
function for mechanics.
The goal of Hamilton’s investigation on optics was to find a curve giving
stationary values for integrals of the type
$$V = \int v\left(x, y, z, \frac{dx}{du}, \frac{dy}{du}, \frac{dz}{du}\right) du. \tag{6.1}$$
If we evaluate the integral between fixed points A = (x, y, z) and B =
(x′, y′, z′) and consider all possible curves connecting A and B, is there a curve
that gives a smaller value of V than the others? “In general” such a curve exists
and corresponds to a ray in optics. As we will see, the curve is an extremum in the
calculus of variations, which includes curves that are “stationary” or “critical,” and
not necessarily giving an absolute minimum.
Hamilton’s central idea is the following: Regard the minimum (or stationary)
value of the integral as a function of the six coordinates (x, y, z) and (x′, y′, z′) of
A and B. So, in the usual calculus of variations the coordinates of A and B are
fixed parameters, and the variables on which V depends are the curves connecting
A with B. In Hamilton’s approach, the integral is evaluated at an extremum of V and
the coordinates of A and B are the variables.
Hamilton introduced his characteristic function, which provided an algebraic
tool for solving optical problems, in his series of four papers, “Theory of sys-
tems of rays,” published between 1828 and 1835. The first part of the series
is intended as an addition to the theorem of Malus. This states that a family
of rays emitted by a luminous point source (the rays being perpendicular to a
family of concentric spheres) will remain perpendicular to a family of surfaces
after one reflection on a smooth surface or one refraction through a smooth
surface. Hamilton calls such a family a “rectangular system of rays;” in con-
temporary language, they are called a normal congruence. Malus was not sure
whether this property remains true for several reflections or refractions, and
Hamilton shows that indeed the property is satisfied after many reflections and
refractions.
Hamilton starts from the law of reflection on a mirror: the normal to the mirror
bisects the angle between the incident and reflected rays. In modern notation, if n̂
is the unit vector normal to the mirror, and û and û′ are the unit vectors along the
incident and reflected rays (both taken pointing away from the mirror), the law of
reflection (which Hamilton writes in components) is:
$$\hat{\mathbf{u}} + \hat{\mathbf{u}}' = 2\,(\hat{\mathbf{n}}\cdot\hat{\mathbf{u}})\,\hat{\mathbf{n}}. \tag{6.2}$$
(Hamilton uses the notation (cos α, cos β, cos γ) for unit vectors in terms of the
cosines with the coordinate axes.) Now consider a point Q within the mirror
infinitesimally close to the point of incidence P (see Figure 6.1): Q = P + δx.
Since δx is in the tangent plane of the mirror, δx · n̂ = 0, and the law of reflection
implies:
$$(\hat{\mathbf{u}} + \hat{\mathbf{u}}')\cdot\delta\mathbf{x} = 0. \tag{6.3}$$
Figure 6.1 The law of reflection for a parabolic mirror of focal length f. In general,
F(x, y) = C, with F given by equation (6.6), gives the family of parabolas
$y = x^2/(2(f + C)) + f/2 - C/2$.
Hamilton asks how to find a mirror defined by the equation F(x, y, z) = C that
will reflect a system of rays of any given congruence into a focus at a point A′. He
solves the problem formally by proving that equation (6.3) is an exact differential.
Before showing Hamilton’s proof, let us look at an example that illustrates the
spirit of Hamilton’s approach. Consider for simplicity a two-dimensional problem.
Take the incident rays as parallel and in the y direction, û = ĵ. We want the rays to
converge to a focus on the y axis: A′ = f ĵ. Calling P = (x, y) the point of reflection
(see Figure 6.1), we have
$$\hat{\mathbf{u}}' = \frac{-x\,\hat{\imath} + (f - y)\,\hat{\jmath}}{\sqrt{x^2 + (f - y)^2}}. \tag{6.4}$$
Equation (6.3) becomes
$$\frac{x}{\sqrt{x^2 + (f - y)^2}}\,dx - \left(\frac{f - y}{\sqrt{x^2 + (f - y)^2}} + 1\right)dy = 0. \tag{6.5}$$
Equation (6.5), by simple inspection, is in fact an exact differential, dF = 0, with
$$F(x, y) = \sqrt{x^2 + (f - y)^2} - y, \tag{6.6}$$
and F(x, y) = C describes a family of parabolas of focal length (f + C)/2.
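The parabolic-mirror example can be verified numerically. The sketch below is illustrative only: the function names and the sample values of f and C are assumptions, and the parabola is obtained by solving F(x, y) = C for y. It checks that F stays equal to C along the curve and that the law of reflection (6.3) holds at each point.

```python
import math


def mirror_height(x, f, C):
    """Solve F(x, y) = C for y: y = x^2 / (2(f + C)) + (f - C)/2."""
    return x**2 / (2 * (f + C)) + (f - C) / 2


def F(x, y, f):
    """Eq. (6.6): distance from (x, y) to the focus (0, f), minus y."""
    return math.hypot(x, f - y) - y


def reflection_residual(x, f, C):
    """(u + u') . tangent at the mirror point; zero if eq. (6.3) holds."""
    y = mirror_height(x, f, C)
    slope = x / (f + C)                   # dy/dx along the parabola
    norm = math.hypot(1.0, slope)
    tangent = (1.0 / norm, slope / norm)
    u = (0.0, 1.0)                        # incident ray direction, u = j
    rho = math.hypot(x, f - y)
    u_prime = (-x / rho, (f - y) / rho)   # eq. (6.4), points toward the focus
    return (u[0] + u_prime[0]) * tangent[0] + (u[1] + u_prime[1]) * tangent[1]


f, C = 2.0, 0.5                           # assumed sample values
samples = (-1.5, 0.3, 2.0)
for x in samples:
    y = mirror_height(x, f, C)
    print(x, F(x, y, f) - C, reflection_residual(x, f, C))
```

Both printed residuals should vanish to rounding error for every sample point, whatever positive value of f + C is chosen.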
Hamilton’s motivation for introducing the characteristic function was probably
his noticing that, in general, equation (6.3) is an exact differential on a plane of
reflection (Darrigol, 2012). Hamilton first notes that û′ · δx is an exact differential,
since it simply represents δρ′, with ρ′ the distance $\overline{A'P}$ from the point of incidence
to the focus. Since δx is on the tangent to the mirror, û · δx = −δρ′ is also an exact
differential, but only on the surface of the mirror. So, in principle, while û′ · δx is
an exact differential outside of the mirror, û · δx is only so on the surface of the
mirror; in other words, the component $\mathbf{u}_\parallel$ of û on the tangent plane is the gradient
of some function G. This implies that $\nabla\times\mathbf{u}_\parallel = 0$, and (since the curl of a vector
that lives on a surface is perpendicular to that surface, or $\nabla\times\mathbf{u}_\parallel \propto \hat{\mathbf{n}}$), û · δx = dG
is equivalent to
$$\hat{\mathbf{n}}\cdot\nabla\times\hat{\mathbf{u}} = 0. \tag{6.7}$$
Hamilton then notes that the variation of û along the direction of the incident
ray is zero since û is precisely along the ray; formally: (û · ∇)û = 0. Then he
remarks that, since û² = 1, ∇û² = 0. This implies¹ that û × (∇ × û) = 0, and
therefore ∇ × û is parallel to û. But, according to equation (6.7), ∇ × û is on the
tangent plane, and therefore ∇ × û = 0 on the mirror. Furthermore, the condition
(û · ∇)û = 0 implies (û · ∇)(∇ × û) = 0, implying that ∇ × û remains the same
along the ray (as pointed out by Darrigol (2012), Hamilton overlooked this step in
the proof). Consequently ∇ × û = 0 everywhere outside of the mirror and û · δx
is an exact differential of some function V: the incident rays are perpendicular to
surfaces V = constant. Integrating û · δx along the reflected ray to the incident
point P (see Figure 6.1) gives $V = \overline{MP} + \overline{PA'}$, where the overline indicates the
length of the segment.
¹ Using the identity $\hat{\mathbf{u}}\times(\nabla\times\hat{\mathbf{u}}) = \frac{1}{2}\nabla\hat{\mathbf{u}}^2 - (\hat{\mathbf{u}}\cdot\nabla)\hat{\mathbf{u}}$.
Hamilton’s next step leads to his characteristic function. Rather than considering
a variation along the mirror, consider now allowing the end points of the rays to
vary. For a single reflection (see Figure 6.2), V is given by
$$V(\mathbf{x}, \mathbf{x}') = \rho + \rho' \equiv \overline{P\mathbf{x}} + \overline{\mathbf{x}'P}. \tag{6.8}$$
Now consider the variation of V upon changing one of the extremes by δx. The
new value of V is
$$V(\mathbf{x} + \delta\mathbf{x}, \mathbf{x}') = \overline{QR} + \overline{\mathbf{x}'Q}. \tag{6.9}$$
Since the path x′QR is minimal (or extremal)
$$\overline{QR} + \overline{\mathbf{x}'Q} = \overline{PR} + \rho', \tag{6.10}$$
and
$$V(\mathbf{x} + \delta\mathbf{x}, \mathbf{x}') - V(\mathbf{x}, \mathbf{x}') \equiv \nabla V\cdot\delta\mathbf{x} = \overline{PR} - \rho = \hat{\mathbf{u}}\cdot\delta\mathbf{x}. \tag{6.11}$$
Since the variation δx is arbitrary, we have ∇V(x, x′) = û (a unit vector).
If we now simply extend the above treatment and consider the differential of V
as a function of both the initial and the final points, we have
$$dV = d\mathbf{x}\cdot\hat{\mathbf{u}} - d\mathbf{x}'\cdot\hat{\mathbf{u}}'. \tag{6.12}$$
Figure 6.2 Reflection on a curved mirror.

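For a given luminous point source at x′, the gradient ∇V(x′, x) gives the “vector
of normal slowness,” in the direction of the ray. For this case of reflection and
propagation in vacuum, we showed that this vector is of unit length, which implies
the following two differential equations:
$$\left(\frac{\partial V}{\partial x}\right)^2 + \left(\frac{\partial V}{\partial y}\right)^2 + \left(\frac{\partial V}{\partial z}\right)^2 = 1 \tag{6.13a}$$
$$\left(\frac{\partial V}{\partial x'}\right)^2 + \left(\frac{\partial V}{\partial y'}\right)^2 + \left(\frac{\partial V}{\partial z'}\right)^2 = 1. \tag{6.13b}$$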
The above equations also apply for multiple reflections. As a simple example,
Hamilton solves V for the case of a flat mirror (Hamilton, 1833). Consider the
source to be at a point A = (x′, y′, z′), an “eye” to be at B = (x, y, z), and the
mirror to be located at z = 0. The characteristic function is given by the length L of
the path that connects A and B and reflects on the mirror. We know that L, according
to the theorem of Hero of Alexandria discussed in Section 2.2, is given by the
length of the straight line connecting A with the reflected point B̃ = (x, y, −z),
and
$$V(\mathbf{x}', \mathbf{x}) = \sqrt{(x - x')^2 + (y - y')^2 + (z + z')^2}. \tag{6.14}$$
Another interesting case is a set of three mirrors at right angles to one another,
coincident with the planes x = 0, y = 0 and z = 0. Taking into account the three
reflections that take place for the ray to travel from A to B, we have
$$V(\mathbf{x}', \mathbf{x}) = \sqrt{(x + x')^2 + (y + y')^2 + (z + z')^2}. \tag{6.15}$$
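The flat-mirror result (6.14) lends itself to a quick numerical check. In this sketch the function names and the sample positions of the source and the eye are assumptions made for illustration. It compares V with the length of the broken path through the reflection point given by Hero's construction, and evaluates |∇V| by finite differences to test equation (6.13a).

```python
import math


def V_flat_mirror(src, eye):
    """Eq. (6.14): characteristic function for a mirror in the plane z = 0."""
    (xs, ys, zs), (x, y, z) = src, eye
    return math.sqrt((x - xs) ** 2 + (y - ys) ** 2 + (z + zs) ** 2)


def path_length(src, eye, px, py):
    """Length of the broken path src -> (px, py, 0) -> eye."""
    hit = (px, py, 0.0)
    return math.dist(src, hit) + math.dist(hit, eye)


def gradient_norm(src, eye, h=1e-6):
    """|grad V| with respect to the eye position, by central differences."""
    total = 0.0
    for i in range(3):
        plus = list(eye)
        minus = list(eye)
        plus[i] += h
        minus[i] -= h
        partial = (V_flat_mirror(src, plus) - V_flat_mirror(src, minus)) / (2 * h)
        total += partial**2
    return math.sqrt(total)


src, eye = (0.0, 0.0, 1.0), (3.0, 1.0, 2.0)   # assumed sample positions, z > 0

# Hero's construction: the reflection point lies on the straight line from
# the source to the mirror image of the eye, (x, y, -z).
t = src[2] / (src[2] + eye[2])
px = src[0] + t * (eye[0] - src[0])
py = src[1] + t * (eye[1] - src[1])

V = V_flat_mirror(src, eye)
shortest = path_length(src, eye, px, py)
norm = gradient_norm(src, eye)
print(V, shortest, norm)
```

The first two numbers should agree, and the third should be 1 to within the accuracy of the finite differences, as equation (6.13a) requires.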
In the case of refraction, the path of least time is clearly not the shortest path,
since light moves at different speeds in different media. The quantity to be mini-
mized is the “optical path,” the product of the length with the index of refraction.
The extension of Malus’s theorem to a refracting surface is immediate. A simple
derivation is the following. Consider a set of rays that form a congruence. For
example, in reference to Figure 6.3 the rays AP and BQ are perpendicular to a
surface C. The rays propagate in a medium of index of refraction n and refract on a
surface S into a medium of index n′. Consider two points A′ and B′ on the refracted
rays so that the optical paths of both rays are the same:
$$n\,\overline{AP} + n'\,\overline{PA'} = n\,\overline{BQ} + n'\,\overline{QB'}. \tag{6.16}$$
If the segments are adjacent, the optical paths of the segments AQA′ and APA′ are
equal, since Snell’s law is obeyed at P and, to first order, the change in optical path
for an infinitesimal displacement along the refracting surface is zero:
$$n\,\overline{AQ} + n'\,\overline{QA'} = n\,\overline{BQ} + n'\,\overline{QB'}. \tag{6.17}$$
In turn, since AB is perpendicular to BQ the lengths $\overline{AQ}$ and $\overline{BQ}$ are equal, which
upon substitution in equation (6.17) gives $\overline{QA'} = \overline{QB'}$. This means that QB′ is
perpendicular to A′B′: the refracted rays also form a congruence.
Figure 6.3 Malus’s theorem for refraction: a normal congruence will remain
normal after a refraction.
The “point characteristic” V(x, x′) defined by Hamilton for refracting rays is the
optical length of the ray connecting point x′ with point x:
$$V(\mathbf{x}, \mathbf{x}') = \int_{\mathbf{x}'}^{\mathbf{x}} n(\mathbf{x})\,ds, \tag{6.18}$$
where n(x) is the (position dependent) index of refraction, ds is the arc length
of the path, and the integral is evaluated on the trajectory that minimizes V. The
function V(x, x′)/c, with c the velocity of light, is the time spent by the light ray in
going from x′ to x. The function V is then Fermat’s integral evaluated at the path
that extremizes the optical length.
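For two homogeneous media separated by a flat interface, the point characteristic can be computed by a one-dimensional minimization. The sketch below is an illustration under assumed values: the interface is placed at y = 0, and the end points, the indices and the function names are hypothetical. It minimizes the optical length over the crossing point and checks that Snell's law holds at the minimum.

```python
import math


def optical_length(xq, a, b, n1, n2):
    """n1*|AQ| + n2*|QB| for a ray crossing the interface y = 0 at (xq, 0)."""
    return n1 * math.hypot(xq - a[0], a[1]) + n2 * math.hypot(b[0] - xq, b[1])


def point_characteristic(a, b, n1, n2, iterations=200):
    """Minimize the optical length over xq by golden-section search.

    The optical length is a convex function of xq, so the search converges
    to the unique minimum, which lies between the two end points.
    """
    lo, hi = min(a[0], b[0]), max(a[0], b[0])
    ratio = (math.sqrt(5) - 1) / 2
    for _ in range(iterations):
        x1 = hi - ratio * (hi - lo)
        x2 = lo + ratio * (hi - lo)
        if optical_length(x1, a, b, n1, n2) < optical_length(x2, a, b, n1, n2):
            hi = x2
        else:
            lo = x1
    xq = 0.5 * (lo + hi)
    return xq, optical_length(xq, a, b, n1, n2)


a, b = (0.0, 1.0), (2.0, -1.5)     # A above the interface, B below it (assumed)
n1, n2 = 1.0, 1.5                  # assumed indices of refraction
xq, V = point_characteristic(a, b, n1, n2)

# Sines of the angles of incidence and refraction, measured from the normal.
sin_in = (xq - a[0]) / math.hypot(xq - a[0], a[1])
sin_out = (b[0] - xq) / math.hypot(b[0] - xq, b[1])
print(xq, V, n1 * sin_in, n2 * sin_out)
```

The last two printed numbers should coincide, which is Snell's law recovered from the minimum of Fermat's integral. A golden-section search is used only because it needs no derivatives; any one-dimensional minimizer would serve.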
In analogy with the case of reflection, consider the variation of V upon an
infinitesimal change in x (see Figure 6.4). If the optical length between x′ and x
is V, the new path x′R has optical length V + dV. Since the new path is minimal,
the change in optical path is
$$dV \equiv \nabla V\cdot\delta\mathbf{x} = n(\mathbf{x})\,ds = n(\mathbf{x})\,\hat{\mathbf{u}}\cdot\delta\mathbf{x} \tag{6.19}$$
with û the unit vector normal to the surfaces of constant V. Extending the variation
with respect to x′, as in the case of reflection, we obtain the following differential
equations,
$$\left(\frac{\partial V}{\partial x}\right)^2 + \left(\frac{\partial V}{\partial y}\right)^2 + \left(\frac{\partial V}{\partial z}\right)^2 = n(\mathbf{x})^2 \tag{6.20a}$$
$$\left(\frac{\partial V}{\partial x'}\right)^2 + \left(\frac{\partial V}{\partial y'}\right)^2 + \left(\frac{\partial V}{\partial z'}\right)^2 = n(\mathbf{x}')^2, \tag{6.20b}$$
also called the eikonal equations of Hamilton. However, according to J. L. Synge
(1937), “it is not necessary or desirable to use the word ‘eikonal,’ ” a word invented
by Heinrich Bruns (1895) “in ignorance of Hamilton’s work.”
Figure 6.4 Variation of Hamilton’s characteristic function.
Hamilton’s description of geometric optics in terms of a characteristic function
is very elegant from the point of view of a general theory, but its calculation for
a given instrument presented difficulties. For example, according to M. Herzberger
(1936), “no one has succeeded until now, except for the case of a plane mirror,
in finding this Hamiltonian function for a given optical system · · · only a few
scientists, for instance, Maxwell, Rayleigh, Clausius, and Larmor, have developed
optical problems by Hamilton’s method.” Herzberger is referring to the calculation
of the characteristic function V, but Hamilton, in the “Supplement to an essay on
the theory of systems of rays,” also introduces two other functions, W (the “mixed
characteristic”) and T (the “angle characteristic”), that are Legendre transforms
of V and that are easier to calculate in simple mathematical systems. The mixed
characteristic depends on the initial coordinates and final direction of the ray:
$$W(\hat{\mathbf{u}}, \mathbf{x}') = \hat{\mathbf{u}}\cdot\mathbf{x} - V \tag{6.21a}$$
$$dW = \mathbf{x}\cdot d\hat{\mathbf{u}} + d\mathbf{x}'\cdot\hat{\mathbf{u}}'. \tag{6.21b}$$
Notice that there are two mixed characteristics, the second one depending on the
final coordinates and the initial direction of the light ray. The angle characteristic
depends on the initial and final angles of the ray:
$$T(\hat{\mathbf{u}}, \hat{\mathbf{u}}') = \hat{\mathbf{u}}\cdot\mathbf{x} - \hat{\mathbf{u}}'\cdot\mathbf{x}' - V \tag{6.22a}$$
$$dT = \mathbf{x}\cdot d\hat{\mathbf{u}} - \mathbf{x}'\cdot d\hat{\mathbf{u}}'. \tag{6.22b}$$
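The differentials (6.21b) and (6.22b) follow from equation (6.12) in a few lines. The sketch below simply spells out the two Legendre transforms, with no assumption beyond $dV = d\mathbf{x}\cdot\hat{\mathbf{u}} - d\mathbf{x}'\cdot\hat{\mathbf{u}}'$:

```latex
\begin{align*}
dW &= d(\hat{\mathbf{u}}\cdot\mathbf{x}) - dV
    = \mathbf{x}\cdot d\hat{\mathbf{u}} + \hat{\mathbf{u}}\cdot d\mathbf{x}
      - \left(d\mathbf{x}\cdot\hat{\mathbf{u}} - d\mathbf{x}'\cdot\hat{\mathbf{u}}'\right)
    = \mathbf{x}\cdot d\hat{\mathbf{u}} + d\mathbf{x}'\cdot\hat{\mathbf{u}}', \\
dT &= dW - d(\hat{\mathbf{u}}'\cdot\mathbf{x}')
    = \mathbf{x}\cdot d\hat{\mathbf{u}} + d\mathbf{x}'\cdot\hat{\mathbf{u}}'
      - \mathbf{x}'\cdot d\hat{\mathbf{u}}' - \hat{\mathbf{u}}'\cdot d\mathbf{x}'
    = \mathbf{x}\cdot d\hat{\mathbf{u}} - \mathbf{x}'\cdot d\hat{\mathbf{u}}' .
\end{align*}
```

Each transform trades a position variable for the corresponding ray direction, which is why W and T are functions of (û, x′) and (û, û′) respectively.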
Although Hamilton’s papers in optics were formal, and his interest was mainly
in analysis rather than practical applications, in his “Third Supplement to an Essay
on the Theory of Systems of Rays” (Hamilton, 1837), he describes a remark-
able prediction for refraction in birefringent crystals: the phenomenon of conical
refraction.
Following Sarton (1932), the simplest way of explaining this is to borrow the
words of a contemporary account in the Dublin University Magazine of January,
1842, as quoted in a footnote in Robert Percival Graves’ biography of Hamilton
(Graves, 1882, pp. 623–624):
The law of the reflection of light at ordinary mirrors appears to have been known to
EUCLID; that of ordinary refraction at a surface of water, glass, or other uncrystallized
medium, was discovered at a much later age by SNELLIUS; HUYGENS discovered, and
MALUS confirmed, the law of extraordinary refraction produced by uniaxal crystals, such
as Iceland spar; and finally the law of the extraordinary double refraction at the faces of
biaxial crystals, such as topaz or arragonite, was found in our own time by FRESNEL. But
even in these cases of extraordinary or crystalline refraction, no more than two refracted
rays had ever been observed or even suspected to exist, if we except a theory of CAUCHY,
that there might possibly be a third ray, though probably imperceptible to our senses. Pro-
fessor HAMILTON, however, in investigating by his general method the consequences of
the law of FRESNEL, was led to conclude that there ought to be in certain cases, which
he assigned, not merely two, nor three, nor any finite number, but an infinite number, or a
cone of refracted rays within a biaxial crystal, corresponding to and resulting from a single
incident ray; and that in certain other cases, a single ray within such a crystal should give
rise to an infinite number of emergent rays, arranged in a certain other cone. He was led,
therefore, to anticipate from theory two new laws of light, to which he gave the names of
Internal and External Conical Refraction.
Let us follow Hamilton’s derivation in the following section.
6.2 Conical Refraction*

6.2.1 Fresnel’s Equations for Anisotropic Crystals
Hamilton considers the propagation of a transverse wave in an anisotropic crys-
tal using a simplified approach already considered by Augustin-Jean Fresnel and
George Biddell Airy: the idea that a light wave passing through a medium causes
a small displacement of the atoms within that medium. If we think in terms of an
anisotropic crystal, anisotropy is in the fact that the atomic displacements are not in
the direction of the restoring force. In a simple picture, if we consider the displacements
δx = (δx, δy, δz) the restoring force is given by $\mathbf{f} = -(a^2\delta x, b^2\delta y, c^2\delta z)$,
which is clearly not parallel to δx. In modern notation, the relation between force
and displacement can be written as follows:
$$\mathbf{f} = -K\,\delta\mathbf{x} \tag{6.23}$$
where K is the diagonal matrix $K = \mathrm{diag}(a^2, b^2, c^2)$. Since we are dealing with a
transverse wave, the direction of propagation of the wave û is perpendicular to δx.
In turn, the restoring force can be decomposed as $\mathbf{f} = \mathbf{f}_\parallel + \mathbf{f}_\perp$ in components parallel
and perpendicular to û. The idea followed by Hamilton is that the component
$\mathbf{f}_\parallel$ is not important, since that motion has no effect on the eye; all that matters is
the transverse displacement. The reasoning is that, in order for the motion to be
harmonic, the direction of the wave û has to be such that the component $\mathbf{f}_\perp$ is along
the direction of the displacement δx. In that way one can write an equation of the
form of a harmonic oscillator,
$$\mathbf{f}_\perp = -\omega^2\,\delta\mathbf{x}, \tag{6.24}$$
with ω (proportional to) the velocity of the wave. Using $\mathbf{f}_\perp = \mathbf{f} - (\mathbf{f}\cdot\hat{\mathbf{u}})\hat{\mathbf{u}}$ and
equation (6.23), we obtain:
$$K\,\delta\mathbf{x} - (K\,\delta\mathbf{x}\cdot\hat{\mathbf{u}})\,\hat{\mathbf{u}} = \omega^2\,\delta\mathbf{x}, \tag{6.25}$$
and
$$\delta\mathbf{x} = (K\,\delta\mathbf{x}\cdot\hat{\mathbf{u}})\,(K - \omega^2)^{-1}\hat{\mathbf{u}}. \tag{6.26}$$
Multiplying both sides by K and then taking the dot product with û we get
$$\hat{\mathbf{u}}\cdot K(K - \omega^2)^{-1}\hat{\mathbf{u}} = 1, \tag{6.27}$$
or, which is equivalent:
$$\hat{\mathbf{u}}\cdot(K - \omega^2)^{-1}\hat{\mathbf{u}} = 0. \tag{6.28}$$
Since K is a diagonal matrix, another way of writing equation (6.28) is:
$$\frac{u_x^2}{\omega^2 - a^2} + \frac{u_y^2}{\omega^2 - b^2} + \frac{u_z^2}{\omega^2 - c^2} = 0, \tag{6.29}$$
which is the equation derived by Fresnel, and the starting point of Hamilton’s dis-
cussion. Equation (6.29) gives the values of the normal (or phase) velocity ω(û)
of propagation of a plane wave in the direction of the unit vector û. Since the
system is anisotropic, ω is in general different from the velocity of a ray. More
precisely, consider that at t = 0 a system of rays is emitted from the origin x = 0
in all directions. The rays are straight lines because the system is homogeneous,
but their velocities depend on direction because the system is anisotropic. At time
t, the tips of the rays generate a wave surface V (0, x) whose normal at x is û, not
necessarily in the direction of the ray, and the normal velocity of the front at that
point is given by ω(û). It is relatively easy to realize that, as noticed by Fresnel, for
a given direction there are two possible normal velocities that correspond in turn
to the two orthogonal polarizations of the wave. For example, if the propagation
is along the xy plane, multiplying equation (6.29) by the product of denominators
and setting $u_z = 0$, $u_x = \cos\theta$, $u_y = \sin\theta$ we obtain
$$(\omega^2 - c^2)\left(\omega^2 - b^2\cos^2\theta - a^2\sin^2\theta\right) = 0, \tag{6.30}$$
which corresponds to a circle of radius c and an oval that intersects the axes at
x = b and y = a. The same analysis applies to $u_x = 0$ and $u_y = 0$. If, for example,
a > b > c, in the xz plane the oval will intersect the circle in four points. We
indicate those points in Figure 6.5 as $I_1$, $I_2$, $I_3$, and $I_4$, in directions forming an
angle $\alpha_0$ with the x axis. The angle $\alpha_0$ given by
$$\cos^2\alpha_0 = \frac{a^2 - b^2}{a^2 - c^2}, \tag{6.31a}$$
$$\sin^2\alpha_0 = \frac{b^2 - c^2}{a^2 - c^2}, \tag{6.31b}$$
corresponds to a direction of a single normal velocity and is called the “opti-
cal axis.” Hamilton noticed that a function of the form given by equation (6.29)
gives rise to a surface with four cusps. Remarkably, Hamilton calculates V from
ω(û) using a purely analytical method and rederives Fresnel’s equation for the
wave surface. It turns out that the wave surface has a similar structure to that
given by equation (6.29) and therefore has cusps as well. Hamilton’s origi-
nal contribution was to realize that those cusps are the vertices of cones that
give rise to infinitely many possible directions of the refracted rays for some
special directions of the incident ray. We visit his derivation in the following
section.
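Fresnel's equation (6.29) can be explored numerically. The following is a minimal sketch in which the function name and the sample angle are assumptions. Multiplying (6.29) by its denominators gives a quadratic in ω², whose two roots are the two normal velocities for a given direction. The code checks the xy-plane result (6.30) and the coincidence of the two velocities along the optical axis of equations (6.31), using the values a = 3, b = 2, c = 1 of Figure 6.5.

```python
import math


def normal_velocities(u, a, b, c):
    """Two phase velocities solving eq. (6.29) for a unit direction u.

    Multiplying (6.29) by its denominators gives a quadratic in w = omega^2,
    w^2 - p*w + q = 0, where u_x^2 + u_y^2 + u_z^2 = 1 has been used.
    """
    ux2, uy2, uz2 = (comp**2 for comp in u)
    a2, b2, c2 = a**2, b**2, c**2
    p = ux2 * (b2 + c2) + uy2 * (a2 + c2) + uz2 * (a2 + b2)
    q = ux2 * b2 * c2 + uy2 * a2 * c2 + uz2 * a2 * b2
    disc = max(p * p - 4 * q, 0.0)   # guard against rounding at the optical axis
    root = math.sqrt(disc)
    return math.sqrt((p - root) / 2), math.sqrt((p + root) / 2)


a, b, c = 3.0, 2.0, 1.0              # values used for Figure 6.5

# Propagation in the xy plane: eq. (6.30) predicts omega = c and the oval.
theta = 0.4                          # assumed sample angle
slow, fast = normal_velocities((math.cos(theta), math.sin(theta), 0.0), a, b, c)
oval = math.sqrt(b**2 * math.cos(theta) ** 2 + a**2 * math.sin(theta) ** 2)
print(slow, fast, oval)

# Along the optical axis of eqs. (6.31) the two velocities coincide.
cos_a0 = math.sqrt((a**2 - b**2) / (a**2 - c**2))
sin_a0 = math.sqrt((b**2 - c**2) / (a**2 - c**2))
axis_slow, axis_fast = normal_velocities((cos_a0, 0.0, sin_a0), a, b, c)
print(axis_slow, axis_fast)
```

In the xy plane the slower velocity equals c and the faster one follows the oval. Along the optical axis both velocities reduce to b, up to rounding, which is the degeneracy responsible for the cusps discussed in the text.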
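[Figure 6.5: three polar plots, labeled $u_z = 0$ (xy plane), $u_y = 0$ (xz plane, showing the intersection points $I_1$, $I_2$, $I_3$, $I_4$ and the angle $\alpha_0$) and $u_x = 0$ (yz plane).]
Figure 6.5 Polar graph of the normal velocity ω(û) given by equation (6.29) with
û projected on the three planes for a = 3, b = 2 and c = 1. The thick line is the
optical axis which makes an angle $\alpha_0$ [given by equations (6.31)] with the x axis.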
[Figure 6.6: sketch showing the points A, B, P and Q and the unit normal û.]
Figure 6.6 As the point P describes the wave surface, the point Q describes the
surface of normal slowness.
6.2.2 Analytical Derivation of the Wave Surface

Consider the wave surface V(x, O) = 1, described by a pulse propagated from
the origin of coordinates O in unit time. A ray in the direction OP (see Figure
6.6) has a velocity $v_R$ that depends on direction. The dashed lines in Figure 6.6
represent the wave-front a time dt earlier, so $\overline{AP} = v_R\,dt$. The normal velocity
ω (the velocity of the tangent to the wave surface) is in the direction û, so that
$\overline{AB} = \omega\,dt$. Call x the vector of the ray OP: $\mathbf{x} = v_R(\hat{\mathbf{x}})\,\hat{\mathbf{x}}$. Since x · û = ω,
the vector s = û/ω (represented by the line OQ) describes the surface of normal
slowness (slowness because the radius is inverse to the normal velocity in that
direction) and spans a surface that is polar reciprocal to the wave surface: s · x = 1.
This relation was noted by Cauchy in his works on optics: the wave surface “is
the envelope of the planes which cut perpendicularly the radii of the surface of
normal slowness at distances from the center equal to the reciprocal of these radii”
(Cauchy, 1830).
Interestingly, Hamilton notes a duality between the two surfaces “which he
[Cauchy] does not seem to have perceived: namely, that the surface of compo-
nents is the envelope of the planes which cut perpendicularly the radii of the wave
at distances from its center equal to the reciprocals of those radii, that is, equal
to the slowness of the rays” (Hamilton, 1837). This theorem follows from simple
considerations: since the tangent to $V$ is perpendicular to $\mathbf{s}$, we have $\mathbf{s}\cdot d\mathbf{x} = 0$;
and since $\mathbf{s}\cdot\mathbf{x} = 1$, we also have $\mathbf{x}\cdot d\mathbf{s} = 0$: “if one surface $S'$ be deduced from
another $S$ by drawing radii vectors to the latter from an arbitrary origin $O$, and
altering the lengths of these radii to their reciprocals without changing their directions,
and seeking the envelope $S'$ of the planes perpendicular at the extremities
to these altered radii of $S$, then reciprocally, the surface $S$ may be deduced from
$S'$ by a repetition of the same construction, employing the same origin $O$, and the
same arbitrary unit of length.” This duality means that one can treat the surface of
normal slowness, described by the surface G(s) = 0, as a characteristic function
itself, and deduce from it the “wave equation” for V (x).
Written in terms of $s = 1/\omega$, Fresnel’s equation of wave normals (6.28) becomes
\[
G(\mathbf{s}) = \mathbf{s}\cdot(1 - s^2K)^{-1}\mathbf{s} = 0. \tag{6.32}
\]
According to the duality argument, the normal to this surface will be proportional
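to $\mathbf{x}$ (the ray direction). Following the notation by Darrigol (2012) we define
$M = (1 - s^2K)^{-1}$:
\[
\nabla_s G = 2M\mathbf{s} + 2\left(\mathbf{s}\cdot M^2K\mathbf{s}\right)\mathbf{s} = \lambda\mathbf{x}, \tag{6.33}
\]
with $\lambda$ a proportionality constant determined by the reciprocal condition $\mathbf{s}\cdot\mathbf{x} = 1$.
Using $\mathbf{s}\cdot M\mathbf{s} = 0$ in equation (6.33) we have
\begin{align}
\lambda &= 2\left(\mathbf{s}\cdot M^2K\mathbf{s}\right)s^2 \notag\\
&\equiv 2\,\mathbf{s}\cdot M^2\left(Ks^2 - 1 + 1\right)\mathbf{s} \notag\\
&= 2\,\mathbf{s}\cdot M^2\left(-M^{-1} + 1\right)\mathbf{s} \notag\\
&= 2\,\mathbf{s}\cdot M^2\mathbf{s}. \tag{6.34}
\end{align}
In the notation $\langle A\rangle = \mathbf{s}\cdot A\mathbf{s}$, the above equations are
\[
\frac{\lambda}{2} = \langle M^2\rangle \equiv \langle M^2K\rangle s^2. \tag{6.35}
\]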
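Substituting this expression for $\lambda$ in equation (6.33), the ray vector $\mathbf{x}$ is given by
\begin{align}
\mathbf{x} &= \frac{1}{\langle M^2\rangle}\left[M\mathbf{s} + \langle KM^2\rangle\,\mathbf{s}\right] \tag{6.36}\\
&\equiv \frac{1}{\langle M^2\rangle}\left[M\mathbf{s} + \langle KM^2\rangle M^{-1}M\mathbf{s}\right] \notag\\
&= \frac{1 + \langle KM^2\rangle}{\langle M^2\rangle}M\mathbf{s} - \frac{\langle KM^2\rangle s^2}{\langle M^2\rangle}KM\mathbf{s}. \tag{6.37}
\end{align}
Using equation (6.35) and noticing that after squaring equation (6.36) we have
$r^2 = \mathbf{x}^2 = \left(1 + \langle KM^2\rangle\right)/\langle M^2\rangle$, equation (6.37) becomes
\[
\mathbf{x} = (r^2 - K)M\mathbf{s}. \tag{6.38}
\]
Using, from equation (6.36), $\mathbf{x}\cdot M\mathbf{s} = 1$ we obtain
\[
\mathbf{x}\cdot(r^2 - K)^{-1}\mathbf{x} = 1, \tag{6.39}
\]
which is equivalent to
\[
\mathbf{x}\cdot(1 - r^2K^{-1})^{-1}\mathbf{x} = 0, \tag{6.40}
\]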
meaning that the wave surface V can be obtained from the equation of normal
slowness by replacing K by K −1 : an “inversion,” or rescaling of the axes.
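The chain from equation (6.32) to equation (6.39) can be checked numerically. The sketch below is only an illustration under stated assumptions: $K$ is taken to be diagonal with entries $(a^2, b^2, c^2)$, the wave normal is an arbitrary generic direction, and all function names are hypothetical. It solves Fresnel's equation for $\omega^2$, builds $\mathbf{s} = \hat{u}/\omega$, evaluates the ray vector from equation (6.36), and tests the reciprocity $\mathbf{s}\cdot\mathbf{x} = 1$ together with equation (6.39).

```python
import math

def ray_from_normal(u, k):
    """Map a wave normal u to the ray vectors x of equation (6.36).

    k = (a^2, b^2, c^2) are the diagonal entries of K (an assumption).
    Returns one (s, x) pair for each root of Fresnel's equation.
    """
    norm = math.sqrt(sum(ui * ui for ui in u))
    u = [ui / norm for ui in u]
    # Fresnel: sum_i u_i^2 / (w - k_i) = 0 with w = omega^2.
    # Clearing denominators gives w^2 + p*w + q = 0.
    p = -sum(u[i]**2 * (k[(i + 1) % 3] + k[(i + 2) % 3]) for i in range(3))
    q = sum(u[i]**2 * k[(i + 1) % 3] * k[(i + 2) % 3] for i in range(3))
    disc = math.sqrt(p * p - 4.0 * q)
    pairs = []
    for w in ((-p + disc) / 2.0, (-p - disc) / 2.0):
        s = [ui / math.sqrt(w) for ui in u]
        s2 = 1.0 / w                                   # |s|^2
        m = [1.0 / (1.0 - s2 * ki) for ki in k]        # diagonal of M
        avg_m2 = sum(m[i]**2 * s[i]**2 for i in range(3))          # <M^2>
        avg_km2 = sum(k[i] * m[i]**2 * s[i]**2 for i in range(3))  # <K M^2>
        x = [(m[i] * s[i] + avg_km2 * s[i]) / avg_m2 for i in range(3)]
        pairs.append((s, x))
    return pairs

def wave_surface_residual(x, k):
    """Left-hand side of equation (6.39) minus 1."""
    r2 = sum(xi * xi for xi in x)
    return sum(x[i]**2 / (r2 - k[i]) for i in range(3)) - 1.0

k = (9.0, 4.0, 1.0)                        # a = 3, b = 2, c = 1
pairs = ray_from_normal((1.0, 2.0, 3.0), k)
for s, x in pairs:
    reciprocity = sum(si * xi for si, xi in zip(s, x))
    print(reciprocity, wave_surface_residual(x, k))
```

Both printed numbers per root should be 1 and 0 up to rounding, for either sheet of the surface.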
6.2.3 Hamilton’s Derivation of the Conical Cusp

From the discussion in the previous section, the equation for the wave surface is
given by:
\[
\frac{a^2x^2}{r^2 - a^2} + \frac{b^2y^2}{r^2 - b^2} + \frac{c^2z^2}{r^2 - c^2} = 0. \tag{6.41}
\]
In order to concentrate on the plane $xz$ of intermediate “elasticity,” where the two
curves of the projection of the wave surface intersect, we write equation (6.41) as
(see Figure 6.7)
\[
(r^2 - b^2)\left[a^2x^2(r^2 - c^2) + c^2z^2(r^2 - a^2)\right] + (r^2 - a^2)(r^2 - c^2)\,b^2y^2 = 0. \tag{6.42}
\]
If $y = 0$ the solutions are a circle of radius $b$, and an ellipse given by the
equation:
\[
r^2\left(a^2\cos^2\theta + c^2\sin^2\theta\right) - a^2c^2 = 0. \tag{6.43}
\]
Notice that, whereas equation (6.29) gives rise to ovals upon projection onto the
planes of elasticity, the equation for 1/ω gives rise to ellipses. In other words, if
r = f (θ) is an ellipse, then r = 1/ f (θ) is an oval. The ellipse and the circle
intersect at the point
\[
\mathbf{x}_0 = b(\cos\theta_0, 0, \sin\theta_0) \equiv b\hat{u}_0, \tag{6.44}
\]
which we find by setting $r = b$ in the equation of the ellipse. The angle $\theta_0$ is
therefore given by the following equations:
\[
\cos^2\theta_0 = \frac{b^{-2} - a^{-2}}{c^{-2} - a^{-2}}, \tag{6.45a}
\]
\[
\sin^2\theta_0 = \frac{c^{-2} - b^{-2}}{c^{-2} - a^{-2}}. \tag{6.45b}
\]
Figure 6.7 Fresnel wave surface of equation (6.42) for $a^2 = 0.12$, $b^2 = 0.4$,
$c^2 = 0.75$, showing four “cusps.”
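A small numerical sketch can make the singular nature of the intersection point concrete. Under the assumption $a = 3$, $b = 2$, $c = 1$ (values chosen here only for illustration) and with hypothetical function names, the code evaluates the left-hand side of equation (6.42) at the point of equation (6.44) and estimates its gradient there by central differences. Both should vanish, which is why the lowest-order expansion around this point is quadratic.

```python
import math

def wave_surface(x, y, z, a, b, c):
    """Left-hand side of equation (6.42)."""
    r2 = x * x + y * y + z * z
    return ((r2 - b * b) * (a * a * x * x * (r2 - c * c) + c * c * z * z * (r2 - a * a))
            + (r2 - a * a) * (r2 - c * c) * b * b * y * y)

def cusp_point(a, b, c):
    """Singular point x0 = b (cos theta0, 0, sin theta0), equations (6.44)-(6.45)."""
    cos2 = (b**-2 - a**-2) / (c**-2 - a**-2)
    theta0 = math.acos(math.sqrt(cos2))
    return (b * math.cos(theta0), 0.0, b * math.sin(theta0))

def gradient(point, a, b, c, h=1e-4):
    """Central-difference estimate of the gradient of wave_surface."""
    grad = []
    for i in range(3):
        plus, minus = list(point), list(point)
        plus[i] += h
        minus[i] -= h
        grad.append((wave_surface(*plus, a, b, c) - wave_surface(*minus, a, b, c)) / (2.0 * h))
    return grad

a, b, c = 3.0, 2.0, 1.0
x0 = cusp_point(a, b, c)
value_at_cusp = wave_surface(*x0, a, b, c)
grad_at_cusp = gradient(x0, a, b, c)
print(value_at_cusp, grad_at_cusp)
```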
Next we expand around the intersection point:

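\[
\mathbf{x} = b\hat{u}_0 + \delta\mathbf{x},
\]
substitute in equation (6.42), and keep the lowest orders in $\delta\mathbf{x}$ (which will be
quadratic in $\delta\mathbf{x}$):
\begin{align}
2b\,\hat{u}_0\cdot\delta\mathbf{x}\,\Big[&a^2\,2b\cos\theta_0\,\delta_x\,(b^2 - c^2) + c^2\,2b\sin\theta_0\,\delta_z\,(b^2 - a^2) \notag\\
&+ 2b\,\hat{u}_0\cdot\delta\mathbf{x}\,\left(a^2b^2\cos^2\theta_0 + c^2b^2\sin^2\theta_0\right)\Big] \notag\\
&+ (b^2 - a^2)(b^2 - c^2)\,b^2\delta_y^2 \simeq 0. \tag{6.46}
\end{align}
Following Hamilton, we choose a system of axes rotated with respect to the $y$
axis in such a way that the new $x$ axis ($x'$) is in the direction $\theta_0$. In other words,
$\hat{u}_0\cdot\delta\mathbf{x} = \delta_{x'}$ and $\delta_{z'} = \cos\theta_0\,\delta_z - \sin\theta_0\,\delta_x$. Expressing equation (6.46) in this new
coordinate system, and using the expressions (6.45), we obtain after a little algebra
\[
4b^{-2}\delta_{x'}^2 - 4C\delta_{x'}\delta_{z'} + b^2C^2\delta_{z'}^2 - b^2C^2\left(\delta_y^2 + \delta_{z'}^2\right) = 0, \tag{6.47}
\]
with
\[
C = \sqrt{(b^{-2} - a^{-2})(c^{-2} - b^{-2})}. \tag{6.48}
\]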
Equation (6.47) describes an elliptical cone and coincides with Hamilton’s expres-
sion G19 from his “Third Supplement to an Essay on the Theory of Systems of
Rays” (after permutation of the names of the axes x and z ). Hamilton follows a
slightly different route for the derivation, but the essential point is that he expands
around the singular point and obtains this unnoticed property of the curve: “Fres-
nel does not appear to have been aware of the existence of this tangent cone to his
wave” (Hamilton, 1837).
6.2.4 Internal Conical Refraction: “The Plum Laid Down on a Table”

The second startling calculation by Hamilton is to show (analytically) that there
is a particular tangent plane (four planes actually, one per singular point or cusp)
that touches the wave surface not at two points but on a circle. That plane is per-
pendicular to the optical axis, so that, when the angle of incidence is such that the
refracted ray is along the optical axis, there are infinitely many possibilities for the
refracted ray, all lying in a cone. Let us review how Hamilton derives that result.
First, note that a factor $r^2$ can be eliminated from equation (6.42), and defining
$\rho^2 = x^2 + z^2$ it can be written in the form
\[
y^2b^2\left[r^2 - a^2 - c^2 + \frac{a^2x^2 + c^2z^2}{b^2}\right] + (\rho^2 - b^2)\left(a^2x^2 + c^2z^2 - a^2c^2\right) = 0, \tag{6.49}
\]
which is appropriate for analyzing the intersection with a tangent plane
perpendicular to the $xz$ plane. Now rotate with respect to the $y$ axis by an angle
$\alpha_0$ given by equations (6.31) (corresponding to the directions of the single normal
velocity) and set $x' = b$ (we are exploring a tangent to the circle of intermediate
normal velocity $b$). The wave equation now describes a curve in the $z'y$ plane:
\[
y^2b^2\left[y^2 + z'^2 + b^2 - a^2 - c^2 + \frac{a^2(Rx)^2 + c^2(Rz)^2}{b^2}\right] + z'^2\left(a^2(Rx)^2 + c^2(Rz)^2 - a^2c^2\right) = 0, \tag{6.50}
\]
with $Rx = \cos\alpha_0\,b + \sin\alpha_0\,z'$, and $Rz = \cos\alpha_0\,z' - \sin\alpha_0\,b$. After a few algebraic
manipulations, we obtain
\[
y^4b^2 + 2y^2bz'(z'b + C) + z'^2(z'b + C)^2 = 0, \tag{6.51}
\]
with $C = \sqrt{(a^2 - b^2)(b^2 - c^2)}$. Equation (6.51) is a perfect square and implies
\[
y^2 + z'^2 + \frac{C}{b}z' = 0, \tag{6.52}
\]
which is equation O19 from the “Third Supplement,” and corresponds to a circle of
diameter $D$ given by
\[
D = \frac{\sqrt{(a^2 - b^2)(b^2 - c^2)}}{b}. \tag{6.53}
\]
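The circle of contact can be verified directly against the wave surface. The following sketch, with hypothetical function names and the illustrative values $a = 3$, $b = 2$, $c = 1$, places points on the circle of equation (6.52) in the plane $x' = b$, maps them back to the original axes with the expressions for $Rx$ and $Rz$ quoted above, and evaluates the left-hand side of equation (6.42). If the construction is right, every such point lies on the wave surface.

```python
import math

def wave_surface(x, y, z, a, b, c):
    """Left-hand side of equation (6.42)."""
    r2 = x * x + y * y + z * z
    return ((r2 - b * b) * (a * a * x * x * (r2 - c * c) + c * c * z * z * (r2 - a * a))
            + (r2 - a * a) * (r2 - c * c) * b * b * y * y)

def contact_circle_point(phi, a, b, c):
    """Point of the circle (6.52), parametrized by an angle phi, in the original axes."""
    cos_a = math.sqrt((a * a - b * b) / (a * a - c * c))   # equation (6.31a)
    sin_a = math.sqrt((b * b - c * c) / (a * a - c * c))   # equation (6.31b)
    big_c = math.sqrt((a * a - b * b) * (b * b - c * c))
    radius = big_c / (2.0 * b)                 # half the diameter D of (6.53)
    z_prime = -radius + radius * math.cos(phi)
    y = radius * math.sin(phi)
    x = cos_a * b + sin_a * z_prime            # Rx with x' = b
    z = cos_a * z_prime - sin_a * b            # Rz with x' = b
    return x, y, z

a, b, c = 3.0, 2.0, 1.0
max_residual = max(
    abs(wave_surface(*contact_circle_point(2.0 * math.pi * n / 12, a, b, c), a, b, c))
    for n in range(12)
)
print(max_residual)
```

The residual is compared with zero only up to rounding; the individual terms of equation (6.42) are of order $10^2$ for these parameters.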
In a letter to astronomer John Herschel, Hamilton describes the touching of the
surface by the tangent plane with a nice metaphor: “somewhat as a plum can be laid
down on a table so as to touch and rest on the table on a whole circle of contact,
and has, in the interior of the circular space, a kind of conical cusp” (Graves, 1882,
p. 282). Hamilton called this phenomenon internal conical refraction, because the
cone is formed inside the crystal. If instead of a single ray a cone is incident on the
crystal at the appropriate angle, then a single ray would be refracted into the crystal
in the direction of the optical axis and emerge on the other side as a hollow cone.
Hamilton called this experiment external conical refraction.
Hamilton realized that if his prediction could be verified experimentally it would
be a major breakthrough. An effect like this was not only never seen before, but
was one of the few qualitatively new phenomena (perhaps the first) predicted by
pure mathematical reasoning. As emphasized by M. V. Berry “By the early 1800’s,
it was widely appreciated that mathematics is essential to understanding the natu-
ral world. However, the phenomena to which mathematics had been applied were
already familiar (e.g., tides, eclipses, and planetary orbits). Prediction of qualita-
tively new effects by mathematics may be commonplace today, but in the 1830’s it
was startling” (Berry and Jeffrey, 2007). The experimental confirmation of conical
refraction came from Hamilton’s colleague Humphrey Lloyd, who, in 1832, mea-
sured external conical refraction with a specimen of arragonite (O’Hara, 1982).
On top of observing the emerging cone, Lloyd made a remarkable discovery: all
the rays of the cone were polarized in different planes. After analyzing the cone
of emergent rays through a tourmaline polarizer, he writes: “I was surprised to
observe that one radius only of the circular section vanished in a given position
of the axis of the tourmaline, and that the ray which disappeared ranged through
360° as the tourmaline plate turned through 180° . . . the angle between the planes
of polarization of any two of the rays of the cone is half the angle contained by
the planes passing through the rays themselves and its axis” (Lloyd, 1833). After
observing this result, Lloyd himself is able to reconcile his finding with Fresnel’s
theory. In modern terminology, the polarization of the ray is on the plane formed
by the wave vector and the Poynting vector. In Hamilton’s words: “the vibrations at
the circle of contact on Fresnel’s wave are in the chords of that circle drawn from
the extremity of the normal ω of single velocity” (point B on Figure 6.8). As a
result, and as explained by Hamilton, as one completes one turn along the circle of
the transmitted rays, the polarization completes half a turn (as shown in the insert
of Figure 6.8). Or, if one selects a point on the circle by passing the outgoing beam
through a polarizer, as the polarizer rotates by 180°, the point completes a whole
turn.
[Figure 6.8: diagram with the labels $N$, $M$, $P$, $B$, $A$, $L$, $C$ and “Incident beam”.]
Figure 6.8 An incident beam, or ray, entering perpendicularly to a sample
side which in turn is perpendicular to the optical axis, will be refracted in a
cone of rays and will emerge on the other side as a cylinder of rays. The
diameter of the emerging cylinder is $L\tan\chi$ with $L$ the sample thickness and
$\tan\chi = \sqrt{(a^2 - b^2)(b^2 - c^2)}/b^2$. The insert shows how the polarization direction
completes half a turn as one walks around the circle of the emerging
rays.
The observation of conical refraction not only provided powerful evidence con-
firming that light is a transverse wave, but the π-phase change of polarization
after a full turn around the circle might also be considered the first realization
of a geometrical phase, or ‘Berry phase’ (Berry, 1984). In addition, Hamilton’s
cone is an anticipation of conical intersections that appear in chemistry and
condensed matter physics, where they are referred to as “Dirac cones” (Berry,
2015).
6.3 Hamilton’s Law of Varying Action*

In his path towards a unification of optics and mechanics, Hamilton considers an
unconstrained motion due to conservative forces and a trajectory x(t) that obeys
Newton’s law of motion m ẍ = F.
Hamilton considers the variation between two nearby paths (see Figure 6.9),
let’s call them $\mathbf{x}(t)$ and $\mathbf{x}'(t)$, each of them obeying the equations of motion, with
$\delta\mathbf{x}(t) = \mathbf{x}'(t) - \mathbf{x}(t)$:
\[
m\ddot{\mathbf{x}}(t)\cdot\delta\mathbf{x}(t) = \mathbf{F}\{\mathbf{x}(t)\}\cdot\delta\mathbf{x}(t) = \delta U \equiv U\{\mathbf{x}'(t)\} - U\{\mathbf{x}(t)\}, \tag{6.54}
\]
where $U$ is what Hamilton calls the force function, and, in modern notation, corresponds
to minus the potential energy. Next he considers variations of $2T$, the
“living force” of the system (in modern nomenclature, twice the kinetic energy),
with
\[
T = \frac{1}{2}m\dot{\mathbf{x}}^2(t). \tag{6.55}
\]
[Figure 6.9: two neighboring trajectories $\mathbf{x}(t)$ and $\mathbf{x}'(t)$, separated by $\delta\mathbf{x}(t)$.]
Figure 6.9 Schematic representation of two possible motions of a system “with
the same dynamical relations between the accelerations and positions of its
points, but with different initial data.” The variation $\delta\mathbf{x}(t)$ is between these two
mechanically allowed trajectories.
Hamilton then uses the “celebrated law of living force” (the modern conservation
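of mechanical energy). For both paths we have
\[
\frac{1}{2}m\dot{\mathbf{x}}^2(t) = U\{\mathbf{x}(t)\} + E \tag{6.56}
\]
\[
\frac{1}{2}m\dot{\mathbf{x}}'^2(t) = U\{\mathbf{x}'(t)\} + E', \tag{6.57}
\]
with $E$ and $E'$ constants of motion (the total energy). The difference between the
above equations, to lowest order in $\delta\mathbf{x}$, is
\begin{align}
\delta T = m\dot{\mathbf{x}}(t)\cdot\delta\dot{\mathbf{x}}(t) &= \delta U + \delta E \notag\\
&= m\ddot{\mathbf{x}}(t)\cdot\delta\mathbf{x}(t) + \delta E, \tag{6.58}
\end{align}
where, in equation (6.58), following Hamilton, we have used equation (6.54), and
$\delta E = E' - E$.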
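Using the identity
\[
m\frac{d}{dt}\left[\dot{\mathbf{x}}(t)\cdot\delta\mathbf{x}(t)\right] = m\dot{\mathbf{x}}(t)\cdot\delta\dot{\mathbf{x}}(t) + m\ddot{\mathbf{x}}(t)\cdot\delta\mathbf{x}(t) \tag{6.59}
\]
we can rewrite equation (6.58) as
\[
2(\delta T) = m\frac{d}{dt}\left[\dot{\mathbf{x}}(t)\cdot\delta\mathbf{x}(t)\right] + \delta E. \tag{6.60}
\]
Integrating equation (6.60) between times $t = 0$ and $t$, Hamilton obtains
\[
\int_0^t 2(\delta T)\,dt = \delta V = m\dot{\mathbf{x}}(t)\cdot\delta\mathbf{x}(t) - m\dot{\mathbf{x}}(0)\cdot\delta\mathbf{x}(0) + t\,\delta E, \tag{6.61}
\]
with
\[
V = \int_0^t 2T\,dt. \tag{6.62}
\]
Notice that, in equations (6.61) and (6.62), we have exchanged the variation of
$T$ with the integration over time ($\int\delta T\,dt = \delta\int T\,dt$), which is valid since “δ”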
refers to variations between dynamically allowed paths and not arbitrary paths.
This distinguishes Hamilton’s approach from the formulation of the principle of
least action, where the extremes are fixed, and arbitrary variations over the paths are
considered in order to minimize (or extremize) the action. This procedure, Hamil-
ton emphasizes, leads to the equations of motion. Hamilton’s approach is different.
He introduces what he calls the “law of varying action” – expressed by equation
(6.61) – “in which we pass from an actual motion to another motion dynamically
possible, by varying the extreme positions of the system, and (in general) the quan-
tity H ” (the energy E in our notation). In analogy with the treatment of light rays,
V is a characteristic function that depends on the initial and final coordinates –

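$\mathbf{x}(0)$ and $\mathbf{x}(t)$ respectively – and also on the energy $E$. In Hamilton’s notation, with
$\mathbf{x}(t) = (x, y, z)$ and $\mathbf{x}(0) = (a, b, c)$, equation (6.61) implies:
\[
\frac{\delta V}{\delta x} = m\dot{x}, \quad \frac{\delta V}{\delta y} = m\dot{y}, \quad \frac{\delta V}{\delta z} = m\dot{z}, \tag{6.63a}
\]
\[
\frac{\delta V}{\delta a} = -m\dot{a}, \quad \frac{\delta V}{\delta b} = -m\dot{b}, \quad \frac{\delta V}{\delta c} = -m\dot{c}, \tag{6.63b}
\]
and
\[
\frac{\delta V}{\delta E} = t. \tag{6.64}
\]
Using the law of living force (or conservation of energy), equation (6.63a) implies
\[
\frac{1}{2m}\left[\left(\frac{\delta V}{\delta x}\right)^2 + \left(\frac{\delta V}{\delta y}\right)^2 + \left(\frac{\delta V}{\delta z}\right)^2\right] = U(x, y, z) + E, \tag{6.65}
\]
and equation (6.63b) implies
\[
\frac{1}{2m}\left[\left(\frac{\delta V}{\delta a}\right)^2 + \left(\frac{\delta V}{\delta b}\right)^2 + \left(\frac{\delta V}{\delta c}\right)^2\right] = U(a, b, c) + E. \tag{6.66}
\]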
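The law of varying action, equation (6.61), can be tested on a case where everything is known in closed form. The sketch below assumes a particle of unit mass in one dimension with force function $U = -gx$ (an example chosen here, not one taken from this section), so that $x(t') = x_0 + v_0t' - gt'^2/2$ and $E = v^2/2 + gx$. The function names are hypothetical. It compares the change of $V$ between two neighboring true motions with the right-hand side of equation (6.61).

```python
def characteristic_function(x0, v0, t, g):
    """V = integral of 2T dt' from 0 to t along x(t') = x0 + v0 t' - g t'^2 / 2."""
    # 2T = (v0 - g t')^2 for unit mass; the integral is elementary.
    # x0 does not enter V for a uniform force; it is kept for a uniform signature.
    return v0 * v0 * t - v0 * g * t * t + g * g * t**3 / 3.0

def varying_action_rhs(x0, v0, dx0, dv0, t, g):
    """Right-hand side of equation (6.61), to first order in (dx0, dv0)."""
    dx_final = dx0 + dv0 * t              # delta x(t)
    d_energy = v0 * dv0 + g * dx0         # delta E, with E = v^2/2 + g x
    return (v0 - g * t) * dx_final - v0 * dx0 + t * d_energy

x0, v0, t, g = 1.0, 3.0, 2.0, 9.8
dx0, dv0 = 1e-6, 1e-6                     # change of the initial data
delta_v = (characteristic_function(x0 + dx0, v0 + dv0, t, g)
           - characteristic_function(x0, v0, t, g))
rhs = varying_action_rhs(x0, v0, dx0, dv0, t, g)
print(delta_v, rhs)
```

The two numbers agree to first order in the displacement of the initial data; the residual difference is of second order in `dv0`.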
In the last two pages of “On a General Method in Dynamics” Hamilton (1834a)
introduces time into his expression of the characteristic function and defines the
auxiliary function S, which he would later call his principal function, as
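\begin{align}
S &= V - Et \tag{6.67}\\
&\equiv \int(2T - E)\,dt \notag\\
&= \int(T + U)\,dt, \tag{6.68}
\end{align}
where he is using the law of living force $T = U + E$ (recall that Hamilton calls $U$
what we today call minus the potential energy). From equations (6.67) and (6.61),
the variation of $S$ is
\begin{align}
\delta S &= \delta V - t\,\delta E - E\,\delta t \tag{6.69}\\
&= m\dot{\mathbf{x}}(t)\cdot\delta\mathbf{x}(t) - m\dot{\mathbf{x}}(0)\cdot\delta\mathbf{x}(0) - E\,\delta t, \tag{6.70}
\end{align}
which means $\delta S/\delta t = -E$, $\delta S/\delta\mathbf{x}(t) = m\dot{\mathbf{x}}(t)$ and $\delta S/\delta\mathbf{x}(0) = -m\dot{\mathbf{x}}(0)$. He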
finishes the article with equations for S, which follow directly from the living force
equation:
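\[
\frac{\delta S}{\delta t} + \frac{1}{2m}\left[\left(\frac{\delta S}{\delta x}\right)^2 + \left(\frac{\delta S}{\delta y}\right)^2 + \left(\frac{\delta S}{\delta z}\right)^2\right] = U(x, y, z), \tag{6.71}
\]
and
\[
\frac{\delta S}{\delta t} + \frac{1}{2m}\left[\left(\frac{\delta S}{\delta a}\right)^2 + \left(\frac{\delta S}{\delta b}\right)^2 + \left(\frac{\delta S}{\delta c}\right)^2\right] = U(a, b, c). \tag{6.72}
\]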
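A short worked case may help fix the meaning of equations (6.70) to (6.72). The free particle ($U = 0$) is not discussed in this section; it is used here only as an illustration, with the standard closed form of $S$ assumed rather than derived.

```latex
% Free particle, U = 0: straight-line motion from (a,b,c) to (x,y,z) in time t.
\[
S = \int_0^t T\,dt' = \frac{m}{2t}\left[(x-a)^2 + (y-b)^2 + (z-c)^2\right].
\]
% Derivatives with respect to the end points reproduce the momenta:
\[
\frac{\delta S}{\delta x} = \frac{m(x-a)}{t} = m\dot{x}, \qquad
\frac{\delta S}{\delta a} = -\frac{m(x-a)}{t} = -m\dot{a}.
\]
% The time derivative gives minus the energy, so (6.71) and (6.72) hold with U = 0:
\[
\frac{\delta S}{\delta t} = -\frac{m}{2t^2}\left[(x-a)^2 + (y-b)^2 + (z-c)^2\right] = -E,
\qquad
\frac{\delta S}{\delta t} + \frac{1}{2m}\left[\left(\frac{\delta S}{\delta x}\right)^2
+ \left(\frac{\delta S}{\delta y}\right)^2 + \left(\frac{\delta S}{\delta z}\right)^2\right] = 0.
\]
```

Because the velocity is constant, $\dot{x} = \dot{a} = (x - a)/t$, and the two equations (6.71) and (6.72) coincide term by term in this case.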
6.4 An Example from Hamilton: The Characteristic Function

V for a Parabolic Orbit*
In an interesting section titled “On the Undisturbed Motion of a Planet or Comet
about the Sun: Dependence of the Characteristic Function of such Motion, on the
chord and the sum of the Radii” Hamilton (1837) calculates his characteristic func-
tion V for the case of a central field of force. In particular, he finds an interesting
way of writing V for a parabolic orbit. He considers, as in the previous sections of
the paper (calling $V$ the “action,” or “accumulated living force”)
\[
V = 2\int_0^t T\,dt, \tag{6.73}
\]
which is a recasting of the Maupertuis integral, evaluated for a path that connects
two points in a time $t$. Since energy is conserved (and the integral is evaluated on a
path that is a solution of the equations of motion), we have
\begin{align}
V &= 2\int_0^t (E - U)\,dt \notag\\
&= 2Et - 2\int_0^t U\,dt, \tag{6.74}
\end{align}
where $E$ is the total energy and $U$ is the potential energy $U = -m\mu/r$, where
$\mu$ is the mass of the Sun and $m$ the mass of the planet or orbiting comet.$^2$ Also,
Hamilton calls $\frac{1}{2}h$ the areal velocity, with
\[
h = r^2\dot{\theta}, \tag{6.75}
\]
a constant of motion proportional to the angular momentum of the particle. The
total energy is therefore
\[
E = \frac{1}{2}m\dot{r}^2 + \frac{1}{2}m\frac{h^2}{r^2} - \frac{m\mu}{r}. \tag{6.76}
\]
Hamilton uses the parameters $a$ (the semi-major axis of the ellipse of the orbit)
and $p = b^2/a$ (with $b$ the semi-minor axis of the ellipse). Also, the total energy
is simply related to the semi-major axis by $E = -m\mu/2a$, and the areal velocity
satisfies $p = h^2/\mu$.$^3$
2 Hamilton uses $\mu = m + M$, but we will simplify the calculation slightly by taking $M \gg m$.
3 These relations are sometimes called the vis viva equations and can be deduced using simple expressions for
the angular momentum and the energy at the turning points (call them 1 and 2) of the orbit:
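Using these definitions he obtains
\[
\frac{dr}{dt} = \sqrt{\mu}\,\sqrt{\frac{2}{r} - \frac{1}{a} - \frac{p}{r^2}}. \tag{6.77}
\]
From the expression above we can evaluate the integral of the potential energy
\begin{align}
-2\int_0^t U\,dt &= 2\sqrt{\mu}\,m\int_{r_0}^{r}\frac{1}{r}\,\frac{dr}{\sqrt{\dfrac{2}{r} - \dfrac{1}{a} - \dfrac{p}{r^2}}} \notag\\
&\equiv 2m\sqrt{\mu a}\int_{r_0}^{r}\frac{dr}{\sqrt{2ar - r^2 - pa}}. \tag{6.78}
\end{align}
Next, Hamilton uses some “well known” relations for conic orbits (which we
review in Figure 6.10). The eccentricity $e$ of the orbit is related to $p$ and $a$ through
\[
e = \sqrt{1 - \frac{p}{a}} \equiv \sqrt{1 - \left(\frac{b}{a}\right)^2}. \tag{6.79}
\]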
[Figure 6.10: ellipse with the labels $a$, $b$, $ae$, the eccentric anomaly $v$, and the points $A$, $B$, $C$, $F$, $P$.]
Figure 6.10 The eccentric anomaly $v$. As the point $P$ orbits around $F$, a focus of
the ellipse, it sweeps out equal areas in equal times. The area of the sector $FPA$
is $A_{FPA} = \frac{1}{2}ht \equiv \frac{1}{2}b\sqrt{\mu/a}\,t$. Since the ellipse as shown in the figure is a circle of
radius $a$ that has been uniformly compressed in the vertical direction by a scaling
factor $b/a$, the areas of the sectors $FPA$ and $FQA$ are proportional: $A_{FPA} = (b/a)A_{FQA}$.
And since $A_{FQA} = va^2/2 - \mathrm{Triangle}\{CFQ\} = \frac{1}{2}va^2 - \frac{1}{2}a^2e\sin v$, we
obtain $t = \sqrt{a^3/\mu}\,(v - e\sin v)$.
$v_1r_1 = v_2r_2 = h$, and $E/m = v_1^2/2 - \mu/r_1 = v_2^2/2 - \mu/r_2$. With simple manipulations of these equations,
one gets: $h^2 = 2\mu r_1r_2/(r_1 + r_2)$, $E/m = -\mu/(r_1 + r_2)$.
Also, he uses the eccentric anomaly $v$, in terms of which the radii $r_0$ and $r$ can be
written as$^4$
\[
r = a(1 - e\cos v), \qquad r_0 = a(1 - e\cos v_0). \tag{6.80}
\]
From equation (6.80) we have
\[
\cos v = \frac{a - r}{ae}, \tag{6.81}
\]
or, which is equivalent,
\[
ae\sin v = \sqrt{2ar - r^2 - pa}. \tag{6.82}
\]
Notice that, expressed in terms of $v$, the integral of the potential energy is immediate:
from equation (6.80) we have $dr = ae\sin v\,dv$ and, since equation (6.82) is
just the denominator that appears in equation (6.78), we have
\[
-2\int_0^t U\,dt = 2m\sqrt{\mu a}\int_{v_0}^{v}dv = 2m\sqrt{\mu a}\,(v - v_0). \tag{6.83}
\]
Kepler’s law of equal areas in equal times gives the elapsed time $t$ in terms of
the initial and final eccentric anomalies (see Figure 6.10 for a derivation of the
equation below):
\[
t = \sqrt{\frac{a^3}{\mu}}\,(v - v_0 - e\sin v + e\sin v_0), \tag{6.84}
\]
and from this equation Hamilton obtains an interesting expression for the characteristic
function in terms of the eccentric anomalies:
\begin{align}
V &= 2Et + 2m\sqrt{\mu a}\,(v - v_0) \notag\\
&= m\sqrt{\mu a}\,(v - v_0 + e\sin v - e\sin v_0). \tag{6.85}
\end{align}
The expression for $V$ does not yet have the desired form, since we want a characteristic
function that depends on the initial and final coordinates and not on the
parameters of the conic (which, in principle, we don’t know). Hamilton takes the
limit $a \to \infty$ (and $e \to 1$), which corresponds to parabolic motion. In this limit,
the angles $v$ and $v_0$ are small, and $V$, from equation (6.85), has the form
\[
V = 2m\sqrt{\mu a}\,(v - v_0). \tag{6.86}
\]
He then writes $V$ in terms of $r$, $r_0$ and the chord $\tau$ joining the initial and final
points. Since the coordinates of the initial and final points are $(a\cos v, b\sin v)$ and
$(a\cos v_0, b\sin v_0)$, the chord is
\begin{align}
\tau^2 &= a^2(\cos v - \cos v_0)^2 + b^2(\sin v - \sin v_0)^2 \notag\\
&\simeq a^2(\cos v - \cos v_0)^2 \notag\\
&\simeq \left(\frac{a}{2}\right)^2\left(v^2 - v_0^2\right)^2, \tag{6.87}
\end{align}
where we used $b/a \sim 0$ and small angles $v$ and $v_0$. Also, in this limit, equation
(6.80) implies
\[
r + r_0 \simeq \frac{a}{2}\left(v^2 + v_0^2\right). \tag{6.88}
\]
Equations (6.88) and (6.87) imply
\[
v = \sqrt{\frac{r + r_0 + \tau}{a}}, \qquad v_0 = \sqrt{\frac{r + r_0 - \tau}{a}}, \tag{6.89}
\]
and substituting in equation (6.86) we obtain Hamilton’s result for the characteristic
function for parabolic motion:
\[
V = 2\sqrt{\mu}\,m\left(\sqrt{r + r_0 + \tau} - \sqrt{r + r_0 - \tau}\right). \tag{6.90}
\]
4 These relations are easily derived from the equation of the ellipse $(x/a)^2 + (y/b)^2 = 1$, and (see Figure 6.10)
$x = a\cos v$. Since $r^2 = (x - ae)^2 + y^2$, equation (6.80) follows.
In the same section of the paper, Hamilton refers to a theorem by Euler that relates
the time $t$ taken by a particle to describe a parabolic arc to the focal distances $r$, $r_0$
of its endpoints and the chord $\tau$ joining them. From equation (6.84) we have (expanding the
sine functions to lowest non-vanishing order in the eccentric anomalies)
\begin{align}
t &\simeq \sqrt{\frac{a^3}{\mu}}\,\frac{1}{6}\left(v^3 - v_0^3\right) \notag\\
&= \frac{1}{6\sqrt{\mu}}\left[(r + r_0 + \tau)^{3/2} - (r + r_0 - \tau)^{3/2}\right], \tag{6.91}
\end{align}
which is Euler’s theorem.
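Euler's theorem can be checked on an explicit parabola. The sketch below is an illustration under assumptions that are not part of the text: the orbit is written as $r = p/(1 + \cos\nu)$ with the focus at the origin, the time since perihelion is taken from Barker's equation, and the two points are chosen on an arc that subtends less than half a turn at the focus. Function names and the values of $p$ and $\mu$ are hypothetical.

```python
import math

def parabola_point(nu, p):
    """Cartesian position on the parabola r = p / (1 + cos nu), focus at the origin."""
    r = p / (1.0 + math.cos(nu))
    return r * math.cos(nu), r * math.sin(nu)

def barker_time(nu, p, mu):
    """Time since perihelion from Barker's equation (assumed, not derived in the text)."""
    d = math.tan(nu / 2.0)
    return 0.5 * math.sqrt(p**3 / mu) * (d + d**3 / 3.0)

def euler_time(r, r0, tau, mu):
    """Right-hand side of equation (6.91)."""
    return ((r + r0 + tau)**1.5 - (r + r0 - tau)**1.5) / (6.0 * math.sqrt(mu))

p, mu = 1.3, 2.0
nu0, nu = 0.3, 1.2                       # true anomalies of the two end points
x0, y0 = parabola_point(nu0, p)
x1, y1 = parabola_point(nu, p)
r0, r = math.hypot(x0, y0), math.hypot(x1, y1)
tau = math.hypot(x1 - x0, y1 - y0)       # chord joining the two points
t_orbit = barker_time(nu, p, mu) - barker_time(nu0, p, mu)
t_euler = euler_time(r, r0, tau, mu)
print(t_orbit, t_euler)
```

The agreement is exact up to rounding, even though the derivation above reaches equation (6.91) through a limit of small eccentric anomalies.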
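Hamilton states that the characteristic function for a parabola, $V$, from equation
(6.90), satisfies equation (6.63a) for $E = 0$ (we present the details in Appendix D):
\[
\frac{1}{2m}\left[\left(\frac{\partial V}{\partial x}\right)^2 + \left(\frac{\partial V}{\partial y}\right)^2\right] = \frac{\mu m}{r}, \tag{6.92a}
\]
\[
\frac{1}{2m}\left[\left(\frac{\partial V}{\partial x_0}\right)^2 + \left(\frac{\partial V}{\partial y_0}\right)^2\right] = \frac{\mu m}{r_0}. \tag{6.92b}
\]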
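Equation (6.92a) can also be checked without any algebra by differentiating equation (6.90) numerically. In the sketch below the end points, the values of $\mu$ and $m$, the step size and the function names are arbitrary choices made for illustration; the derivatives are central finite differences with respect to the final coordinates.

```python
import math

def parabolic_v(x, y, x0, y0, mu, m):
    """Characteristic function of equation (6.90) in two dimensions, focus at the origin."""
    r = math.hypot(x, y)
    r0 = math.hypot(x0, y0)
    tau = math.hypot(x - x0, y - y0)
    return 2.0 * math.sqrt(mu) * m * (math.sqrt(r + r0 + tau) - math.sqrt(r + r0 - tau))

def residual_final(x, y, x0, y0, mu, m, h=1e-6):
    """Left-hand side minus right-hand side of equation (6.92a)."""
    dv_dx = (parabolic_v(x + h, y, x0, y0, mu, m)
             - parabolic_v(x - h, y, x0, y0, mu, m)) / (2.0 * h)
    dv_dy = (parabolic_v(x, y + h, x0, y0, mu, m)
             - parabolic_v(x, y - h, x0, y0, mu, m)) / (2.0 * h)
    return (dv_dx**2 + dv_dy**2) / (2.0 * m) - mu * m / math.hypot(x, y)

residual = residual_final(1.0, 0.5, 0.3, -0.4, mu=2.0, m=1.5)
print(residual)
```

The printed residual should be zero up to the finite-difference error, which is far below the size of $\mu m/r$ for these values.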
6.5 Hamilton’s “Second Essay on a General Method in Dynamics”*

Hamilton begins the article with a rederivation of Lagrange’s equation in arbitrary
coordinates. The derivation has become standard, but we present it, since it entails
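one central aspect of Lagrange’s equations: its invariance under change of coordinates.
Hamilton considers a set of $3n$ rectangular coordinates $\mathbf{x}_i = (x_i, y_i, z_i)$
to be functions of another set of $3n$ coordinates, or “marks of position,”
$(\eta_1, \eta_2, \cdots, \eta_{3n})$:
\begin{align}
\mathbf{x} &= \mathbf{x}[\eta] \tag{6.93}\\
\dot{\mathbf{x}} &= \frac{\partial\mathbf{x}}{\partial\eta}\cdot\dot{\eta} \tag{6.94}\\
\frac{\partial\dot{\mathbf{x}}}{\partial\dot{\eta}} &= \frac{\partial\mathbf{x}}{\partial\eta}. \tag{6.95}
\end{align}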
Notice that the components of x and of ẋ are both independent variables, and equa-
tion (6.95) – the “law of cancellation of dots” – follows immediately from equation
(6.94). Using the chain rule we have
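\begin{align}
\frac{\partial U}{\partial\eta} &= \frac{\partial U}{\partial\mathbf{x}}\cdot\frac{\partial\mathbf{x}}{\partial\eta} \tag{6.96}\\
&= m\ddot{\mathbf{x}}\cdot\frac{\partial\mathbf{x}}{\partial\eta} \tag{6.97}\\
&= \frac{d}{dt}\left[m\dot{\mathbf{x}}\cdot\frac{\partial\mathbf{x}}{\partial\eta}\right] - m\dot{\mathbf{x}}\cdot\frac{d}{dt}\frac{\partial\mathbf{x}}{\partial\eta}. \tag{6.98}
\end{align}
Hamilton relates the term in square brackets in equation (6.98) to derivatives of the
kinetic energy:
\[
\frac{\partial T}{\partial\dot{\eta}} = m\dot{\mathbf{x}}\cdot\frac{\partial\dot{\mathbf{x}}}{\partial\dot{\eta}} \equiv m\dot{\mathbf{x}}\cdot\frac{\partial\mathbf{x}}{\partial\eta}, \tag{6.99}
\]
where we used the law of cancellation of dots of equation (6.95). Also
\[
\frac{\partial T}{\partial\eta} = m\dot{\mathbf{x}}\cdot\frac{\partial\dot{\mathbf{x}}}{\partial\eta} \equiv m\dot{\mathbf{x}}\cdot\frac{d}{dt}\frac{\partial\mathbf{x}}{\partial\eta}, \tag{6.100}
\]
where we used the fact that the derivatives $d/dt$ and $\partial/\partial\eta$ commute.$^5$ Substituting
equations (6.100) and (6.99) in equation (6.98), Hamilton obtains
\[
\frac{\partial U}{\partial\eta} = \frac{d}{dt}\frac{\partial T}{\partial\dot{\eta}} - \frac{\partial T}{\partial\eta}. \tag{6.101}
\]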
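5 Since $\partial\dot{\eta}_k/\partial\eta_j = 0$ we have $\dfrac{\partial}{\partial\eta_j}\dfrac{d}{dt}x_i = \dfrac{\partial}{\partial\eta_j}\sum_k\dot{\eta}_k\dfrac{\partial x_i}{\partial\eta_k} = \sum_k\dot{\eta}_k\dfrac{\partial}{\partial\eta_k}\dfrac{\partial}{\partial\eta_j}x_i = \dfrac{d}{dt}\dfrac{\partial}{\partial\eta_j}x_i$.
If one adds the assumption that the potential energy is independent of the velocity,
we obtain the “covariance” (see page 95) of Lagrange’s equations (now written in
terms of each of the variables $\eta_i$):
\[
\frac{d}{dt}\frac{\partial L}{\partial\dot{\eta}_i} = \frac{\partial L}{\partial\eta_i}, \tag{6.102}
\]
with $L = T + U$ the Lagrangian. Hamilton does not use $L$ in showing the invariance;
he uses the form of equation (6.101). Each coordinate $\eta_i$ has its own Lagrange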
equation, and each equation has the same form, or the same appearance, regard-
less of the transformation from one set of independent variables to another. This
invariance can seem mysterious. What is its origin? The answer is the principle of
least action. The Lagrange equations can be obtained from the stationarity of the
action at some trajectory, the important point being that the condition of stationar-
ity is the same in both coordinate systems. A very simple example illustrates this
point. Consider a function $f(x)$ with a minimum at $x_0$. Now change coordinates
to $Q = x^2$. As a function of $Q$, the function $f$ becomes $F(Q) = f(\sqrt{Q})$, with a
different functional form, and a minimum at $Q_0 = x_0^2$. But obviously the condition
of minimization has the same appearance for both coordinates: $df/dx = 0$ and
$dF/dQ = 0$. We can extend this analysis to functions $f$ of many variables, as long
as f is a scalar (it does not have components). The Lagrangian function L behaves
like a scalar under a coordinate transformation; it changes its functional depen-
dence on the coordinates, but its numerical value at a given point remains the same.
Even though the root of this property is mathematically simple, “the invariance
of the Lagrangian equations with respect to arbitrary point-transformations gives
these equations a unique position in the development of mathematical thought.
These equations stand out as the first example of that ‘principle of invariance’
which was one of the leading ideas of the 19th century mathematics, and which
has become of dominant importance in contemporary physics” (Lanczos, 1962,
p. 117).
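The one-variable example above is easy to reproduce numerically. The sketch below picks a concrete function, $f(x) = (x - 2)^2 + 1$, which is an arbitrary choice made here for illustration, and confirms that the stationarity condition has the same form in the variable $x$ and in the variable $Q = x^2$, even though $F(Q)$ has a different functional form.

```python
import math

def f(x):
    """A sample function with its minimum at x0 = 2 (arbitrary illustrative choice)."""
    return (x - 2.0)**2 + 1.0

def big_f(q):
    """The same quantity expressed in the coordinate Q = x^2, for x > 0."""
    return f(math.sqrt(q))

def derivative(func, at, h=1e-6):
    """Central-difference derivative of a one-variable function."""
    return (func(at + h) - func(at - h)) / (2.0 * h)

x0 = 2.0
q0 = x0**2
df_dx = derivative(f, x0)
dbigf_dq = derivative(big_f, q0)
print(df_dx, dbigf_dq, f(x0), big_f(q0))
```

Both derivatives vanish at corresponding points, and the value of the function there is the same, which is the scalar behavior attributed to the Lagrangian in the text.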
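After showing the invariance of $L$, Hamilton uses a property of the kinetic energy
used by Lagrange in his Mécanique analytique:$^6$
\[
2T = \dot{\eta}\cdot\frac{\partial T}{\partial\dot{\eta}}, \tag{6.103}
\]
where $T$ is taken to be a function of both variables $\eta$ and $\dot{\eta}$: $T = T(\eta, \dot{\eta})$. The
differential of $T$ is
\[
dT = \frac{\partial T}{\partial\eta}\cdot d\eta + \frac{\partial T}{\partial\dot{\eta}}\cdot d\dot{\eta}. \tag{6.104}
\]
Also, from equation (6.103)
\[
d(2T) = \dot{\eta}\cdot d\!\left(\frac{\partial T}{\partial\dot{\eta}}\right) + \frac{\partial T}{\partial\dot{\eta}}\cdot d\dot{\eta}, \tag{6.105}
\]
and Hamilton writes
\[
d(2T) - dT = dT = \dot{\eta}\cdot d\!\left(\frac{\partial T}{\partial\dot{\eta}}\right) - \frac{\partial T}{\partial\eta}\cdot d\eta. \tag{6.106}
\]
6 The proof of equation (6.103) is simpler in components. Starting with the equation for $T$ in cartesian
coordinates, $T = \sum_j m_j\dot{x}_j^2/2$, we have $\partial T/\partial\dot{\eta}_i = \sum_j m_j\dot{x}_j(\partial\dot{x}_j/\partial\dot{\eta}_i) = \sum_j m_j\dot{x}_j(\partial x_j/\partial\eta_i)$,
by the law of cancellation of dots. Now notice that $\sum_i(\partial x_j/\partial\eta_i)\dot{\eta}_i = dx_j/dt \equiv \dot{x}_j$, which implies
$\dot{\eta}\cdot\partial T/\partial\dot{\eta} = \sum_j m_j\dot{x}_j^2 = 2T$.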
At this point Hamilton introduces a new set of variables $\varpi_i$ (with $i = 1, \cdots, 3n$)
defined as
\[
\varpi = \frac{\partial T}{\partial\dot{\eta}}, \tag{6.107}
\]
and considers $T$ as a function of the variables $\eta$ and $\varpi$. In other words, the
differential of equation (6.106) becomes
\[
dT = \dot{\eta}\cdot d\varpi - \frac{\partial T}{\partial\eta}\cdot d\eta. \tag{6.108}
\]
This new variable later became the “conjugate momentum,” since it constitutes
the generalization of the quantity $m\dot{\mathbf{x}}$, the momentum, in cartesian coordinates. He
calls $F$ the new functional form of the kinetic energy:
\[
T(\eta, \dot{\eta}) = F(\eta, \varpi). \tag{6.109}
\]
Using $dT = dF$, and using equation (6.108), one has
\[
\frac{\partial T}{\partial\eta} = -\frac{\partial F}{\partial\eta}, \tag{6.110}
\]
and
\[
\dot{\eta} = \frac{\partial F}{\partial\varpi}. \tag{6.111}
\]
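Equations (6.107) to (6.111) can be made concrete with plane polar coordinates. This example is not taken from the text; it assumes a single particle of mass $m$ with marks of position $\eta = (r, \theta)$, and writes the conjugate variables as $\varpi_r$ and $\varpi_\theta$.

```latex
% Kinetic energy in the coordinates (r, theta) and the new variables of (6.107):
\[
T(\eta,\dot{\eta}) = \tfrac{1}{2}m\left(\dot{r}^2 + r^2\dot{\theta}^2\right), \qquad
\varpi_r = \frac{\partial T}{\partial\dot{r}} = m\dot{r}, \qquad
\varpi_\theta = \frac{\partial T}{\partial\dot{\theta}} = mr^2\dot{\theta}.
\]
% The same kinetic energy as a function of (eta, varpi), equation (6.109):
\[
F(\eta,\varpi) = \frac{\varpi_r^2}{2m} + \frac{\varpi_\theta^2}{2mr^2}.
\]
% Check of (6.110): the two partial derivatives with respect to r differ by a sign.
\[
\frac{\partial T}{\partial r} = mr\dot{\theta}^2, \qquad
\frac{\partial F}{\partial r} = -\frac{\varpi_\theta^2}{mr^3} = -mr\dot{\theta}^2 .
\]
% Check of (6.111): the velocities are recovered from F.
\[
\frac{\partial F}{\partial\varpi_r} = \frac{\varpi_r}{m} = \dot{r}, \qquad
\frac{\partial F}{\partial\varpi_\theta} = \frac{\varpi_\theta}{mr^2} = \dot{\theta}.
\]
```

The sign change in equation (6.110) appears because the derivative of $T$ is taken at fixed $\dot{\eta}$ while the derivative of $F$ is taken at fixed $\varpi$.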
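Next, Hamilton uses equation (6.101) to derive the equations of motion of the
variables $\varpi$ and $\eta$:
\begin{align}
\dot{\varpi} &= \frac{d}{dt}\frac{\partial T}{\partial\dot{\eta}} \notag\\
&= \frac{\partial(U + T)}{\partial\eta} \notag\\
&= \frac{\partial(U - F)}{\partial\eta}. \tag{6.112}
\end{align}
Hamilton introduces “for abridgement” the function $H$ (later to be called the
Hamiltonian, and labeled with the same letter),
\[
H(\eta, \varpi) = F(\eta, \varpi) - U(\eta), \tag{6.113}
\]
in terms of which one gets two sets of equations:
\begin{align}
\dot{\eta} &= \frac{\partial H}{\partial\varpi} \tag{6.114}\\
\dot{\varpi} &= -\frac{\partial H}{\partial\eta}, \tag{6.115}
\end{align}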
which became known as the Canonical Equations of Motion. While there are 3n
Lagrange equations of second order, there are 6n canonical equations of first order.
At first sight this does not represent an advantage, but it leads to an important con-
sequence in the structure of the dynamics. Also there is an element of elegance
in these equations: the (first-order) time derivatives are on only one side of the
equations, and the derivatives with respect to the generalized coordinates of this
enlarged space on the other. The Hamiltonian plays the role of a “potential” from
which the system trajectories are derived. The solution of the dynamical problem
now consists of integrating the canonical equations with assigned initial conditions
for the $6n$ variables $\eta_i$ and $\varpi_i$. Hamilton presents a way of integrating these
equations using his principal function $S$. By equation (6.113) we have:
\begin{align}
T + U &= 2T - H \tag{6.116}\\
&= \varpi\cdot\frac{\partial H}{\partial\varpi} - H, \tag{6.117}
\end{align}
where we have used equations (6.103) for $2T$ together with equations (6.111) and
(6.107) (the definition of canonical momentum). Equation (6.117) is a rewriting
of the Lagrangian in terms of the canonical variables, and the principal function
will be:
\[
S = \int_0^t dt\left(\varpi\cdot\frac{\partial H}{\partial\varpi} - H\right). \tag{6.118}
\]
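It is important to note that $S$ is evaluated over a path $\{\eta(t), \varpi(t)\}$ that is a solution
of the equations of motion and that $S$ is still a function of the initial and final
coordinates, $\eta(0)$ and $\eta(t)$, of that path. Hamilton considers the variation of $S$
between two paths (he is taking virtual displacements, just as in the case of his
principle of varying action). Using
\[
\delta H = \frac{\partial H}{\partial\eta}\cdot\delta\eta + \frac{\partial H}{\partial\varpi}\cdot\delta\varpi, \tag{6.119}
\]
we have for the variation of $S$, keeping the time $t$ constant:
\begin{align}
\delta S &= \int_0^t dt\left[\varpi\cdot\delta\!\left(\frac{\partial H}{\partial\varpi}\right) - \frac{\partial H}{\partial\eta}\cdot\delta\eta\right] \tag{6.120}\\
&= \int_0^t dt\left(\varpi\cdot\delta\dot{\eta} + \dot{\varpi}\cdot\delta\eta\right) \tag{6.121}\\
&= \int_0^t dt\,\frac{d}{dt}\left(\varpi\cdot\delta\eta\right) \tag{6.122}\\
&= \varpi(t)\cdot\delta\eta(t) - \varpi(0)\cdot\delta\eta(0), \tag{6.123}
\end{align}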
which is a generalization of equation (6.70) for arbitrary coordinates, now written
in terms of the initial and final values of the canonical momenta. Hamilton writes
the implications of equation (6.123) as
\[
\varpi(t) = \frac{\delta S}{\delta\eta(t)}, \qquad \varpi(0) = -\frac{\delta S}{\delta\eta(0)} \tag{6.124}
\]
although at this point it is probably appropriate to use “∂” instead of “δ,” since one
is considering variations of S with respect to its arguments. Notice that we could
have arrived at the same result considering variations δL of the Lagrangian:
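\begin{align}
\delta S &= \int_0^t dt\left(\frac{\partial L}{\partial\eta}\cdot\delta\eta + \frac{\partial L}{\partial\dot{\eta}}\cdot\delta\dot{\eta}\right) \tag{6.125}\\
&= \int_0^t dt\left(\frac{\partial L}{\partial\eta} - \frac{d}{dt}\frac{\partial L}{\partial\dot{\eta}}\right)\cdot\delta\eta + \int_0^t dt\,\frac{d}{dt}\left(\frac{\partial L}{\partial\dot{\eta}}\cdot\delta\eta\right). \tag{6.126}
\end{align}
The integrand of the first integral in equation (6.126) is zero since $\eta(t)$ is a solution
of Lagrange’s equations, and we obtain:
\[
\delta S = \left.\frac{\partial L}{\partial\dot{\eta}}\cdot\delta\eta\,\right|_t - \left.\frac{\partial L}{\partial\dot{\eta}}\cdot\delta\eta\,\right|_0, \tag{6.127}
\]
which is equivalent to equation (6.123), since $\varpi = \partial L/\partial\dot{\eta}$.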

Next, Hamilton considers time variations, and looks for a differential equation
for S in the arbitrary coordinates ηi that generalizes equations (6.71) and (6.72)
which he had obtained for cartesian coordinates in the “First Essay.” The principal
function is now assumed to be an explicit function of t:
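\begin{align}
\frac{dS}{dt} &= \frac{\partial S}{\partial t} + \frac{\partial S}{\partial\eta}\cdot\dot{\eta} \tag{6.128}\\
&= \varpi\cdot\frac{\partial H}{\partial\varpi} - H, \tag{6.129}
\end{align}
where we made use of equation (6.118). By the canonical equations, $\dot{\eta}\cdot\partial S/\partial\eta =
(\partial H/\partial\varpi)\cdot\varpi$, and the equations above imply
\[
\frac{\partial S}{\partial t} = -H. \tag{6.130}
\]
The Hamiltonian is a function of the coordinates and the momenta, $H =
H(\varpi, \eta)$, but the momenta (at time $t$) are related to spatial derivatives of $S$, as indicated
in equation (6.124). Hamilton substitutes the expression for these momenta
in $H$ and writes his celebrated differential equation for $S$ in terms of the final
coordinates:
\[
\frac{\partial S}{\partial t} + H\!\left(\frac{\partial S}{\partial\eta(t)}, \eta(t)\right) = 0, \tag{6.131}
\]
and another for the initial conditions:
\[
\frac{\partial S}{\partial t} + H\!\left(-\frac{\partial S}{\partial\eta(0)}, \eta(0)\right) = 0, \tag{6.132}
\]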
where we have included the minus sign in the second equation. Hamilton
assumes that H is quadratic in the momenta, and the structure of both equations
becomes equivalent. Equation (6.131), in terms of the final, time-dependent coor-
dinates, is known today as the Hamilton-Jacobi equation. The second equation
turns out to be redundant, as shown by Jacobi (1837), who criticized Hamil-
ton for giving two equations for the same function S. Jacobi’s criticism was,
on the one hand, the lack of proof that a simultaneous solution exists, and, on
the other, the fact that a solution to the first equation solves the mechanical
problem. Jacobi also points out that for a non-conservative system, the second
equation will not be valid, but the first holds also for time-dependent poten-
tials U . For reversible (conservative) systems, which are the ones considered
by Hamilton, one could solve either of the two equations. We illustrate this
with a simplified version of the example presented by Hamilton in the “Second
Essay.”
6.5.1 Example: Particle in a Uniform Gravitational Field

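Consider the one-dimensional (vertical) motion of a particle of unit mass in a
constant gravitational field of magnitude $g$. The force function (minus the potential
energy, in Hamilton’s convention) is $U = -g\eta$. Let’s call
$\eta(0) = \eta_i$ and $\eta(t) = \eta_f$. The Lagrangian of the problem is
\[
L(\dot{\eta}, \eta) = \frac{1}{2}\dot{\eta}^2 - g\eta, \tag{6.133}
\]
and the Hamiltonian is
\[
H(\varpi, \eta) = \frac{1}{2}\varpi^2 + g\eta. \tag{6.134}
\]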
First, let us find $S(\eta_i, \eta_f, t)$, following Hamilton, by first integrating the equations of motion. Since the motion is of constant acceleration, the general solution for $\eta(t')$ is
$$\eta(t') = \eta_i + v_0 t' - \frac{1}{2} g t'^2. \tag{6.135}$$
Note that we are using $t'$ for the intermediate time and $t$ for the total elapsed time. Here $v_0$ (the initial velocity) is a constant of motion, which we express simply in terms of the variables $\eta_i$, $\eta_f$ and $t$ as
$$v_0 = \frac{\eta_f - \eta_i}{t} + \frac{1}{2} g t. \tag{6.136}$$
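The Lagrangian, evaluated over this path $\eta(t')$, treating $\eta_i$, $\eta_f$ and $t$ as parameters, is
$$L(t') = \frac{1}{2}\left[\frac{\eta_f - \eta_i}{t} + \frac{1}{2} g t - g t'\right]^2 - g\left[\eta_i + \frac{\eta_f - \eta_i}{t}\, t' - \frac{1}{2} g\,(t' - t)\,t'\right],$$
from which we obtain
$$S(\eta_i, \eta_f, t) = \int_0^t L(t')\,dt' \tag{6.137}$$
$$\phantom{S(\eta_i, \eta_f, t)} = \frac{1}{2}\frac{(\eta_f - \eta_i)^2}{t} - \frac{\eta_f + \eta_i}{2}\, g t - \frac{1}{24} g^2 t^3. \tag{6.138}$$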
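It is straightforward to show that the function $S(\eta_i, \eta_f, t)$ satisfies both differential equations
$$\frac{\partial S}{\partial t} + \frac{1}{2}\left(\frac{\partial S}{\partial \eta_i}\right)^2 + g\eta_i = 0, \tag{6.139}$$
$$\frac{\partial S}{\partial t} + \frac{1}{2}\left(\frac{\partial S}{\partial \eta_f}\right)^2 + g\eta_f = 0. \tag{6.140}$$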
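The claim that $S$ of equation (6.138) solves both equations (6.139) and (6.140) can be spot-checked numerically. The following is a minimal sketch, not part of the original text: the function names, the finite-difference step and the sample values of $\eta_i$, $\eta_f$, $t$ and $g$ are illustrative assumptions.

```python
def principal_function(eta_i, eta_f, t, g=9.8):
    """Hamilton's principal function S of equation (6.138), unit mass."""
    return (0.5 * (eta_f - eta_i) ** 2 / t
            - 0.5 * g * t * (eta_f + eta_i)
            - g ** 2 * t ** 3 / 24.0)


def central_difference(f, args, index, h=1e-5):
    """Central finite-difference derivative of f with respect to args[index]."""
    up = list(args)
    down = list(args)
    up[index] += h
    down[index] -= h
    return (f(*up) - f(*down)) / (2.0 * h)


def hamilton_jacobi_residuals(eta_i, eta_f, t, g=9.8):
    """Left-hand sides of equations (6.139) and (6.140); both should be ~0."""
    def s(a, b, c):
        return principal_function(a, b, c, g)

    args = (eta_i, eta_f, t)
    dS_dt = central_difference(s, args, 2)
    dS_deta_i = central_difference(s, args, 0)
    dS_deta_f = central_difference(s, args, 1)
    residual_initial = dS_dt + 0.5 * dS_deta_i ** 2 + g * eta_i
    residual_final = dS_dt + 0.5 * dS_deta_f ** 2 + g * eta_f
    return residual_initial, residual_final
```

Calling `hamilton_jacobi_residuals(1.0, 3.0, 0.7)` should return two numbers that vanish up to finite-difference error, one for the initial-coordinate equation and one for the final-coordinate equation.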
Another approach to obtaining S is to integrate the differential equation (only
one of them, the Hamilton-Jacobi equation in terms of the final coordinates, for
example). This is not the procedure followed by Hamilton in his essay (for more
elaboration on this point, see Nakane and Fraser (2002)). For the present problem,
since the potential is time independent, we can separate variables and write
$$S(\eta_i, \eta_f, t) = V(\eta_i, \eta_f, E) - Et, \tag{6.141}$$
where $E$ is a constant, which will have to be expressed in terms of the initial and final conditions through equation (6.64): $t = \partial V/\partial E$. This separation leads us to the equation for the characteristic function $V$:
$$\frac{1}{2}\left(\frac{\partial V}{\partial \eta_f}\right)^2 + g\eta_f = E, \tag{6.142}$$
from which, after integrating, we get
$$V = -\frac{2\sqrt{2}}{3g}\,(E - g\eta_f)^{3/2} + C, \tag{6.143}$$
where $C$ is an integration constant (constant with respect to $\eta_f$) that we find by imposing the boundary condition of vanishing $V$ for $\eta_f = \eta_i$, $V(\eta_i, \eta_i, E) = 0$:
$$V = \frac{2\sqrt{2}}{3g}\left[(E - g\eta_i)^{3/2} - (E - g\eta_f)^{3/2}\right]. \tag{6.144}$$
Using equation (6.64),
$$t = \frac{\partial V}{\partial E} = \frac{1}{g}\sqrt{2(E - g\eta_i)} - \frac{1}{g}\sqrt{2(E - g\eta_f)}. \tag{6.145}$$
Even for this simple case, it takes some algebra⁷ to go from equations (6.144) and (6.145) to the expression of equation (6.138).
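The route through the characteristic function can also be followed numerically: invert $t = \partial V/\partial E$ for $E$, then form $V - Et$ and compare with equation (6.138). The sketch below is an illustration under stated assumptions, not the authors' procedure: it takes the branch in which the particle moves upward throughout (so $\eta_f > \eta_i$ and $t^2 < 2(\eta_f - \eta_i)/g$), solves for $E$ by bisection, and uses hypothetical function names.

```python
import math


def principal_function(eta_i, eta_f, t, g):
    """S of equation (6.138), used here as the reference value."""
    return (0.5 * (eta_f - eta_i) ** 2 / t
            - 0.5 * g * t * (eta_f + eta_i)
            - g ** 2 * t ** 3 / 24.0)


def characteristic_function(eta_i, eta_f, energy, g):
    """V of equation (6.144)."""
    prefactor = 2.0 * math.sqrt(2.0) / (3.0 * g)
    return prefactor * ((energy - g * eta_i) ** 1.5
                        - (energy - g * eta_f) ** 1.5)


def travel_time(eta_i, eta_f, energy, g):
    """t = dV/dE on the branch where the particle moves upward throughout."""
    return (math.sqrt(2.0 * (energy - g * eta_i))
            - math.sqrt(2.0 * (energy - g * eta_f))) / g


def energy_for_time(eta_i, eta_f, t, g, iterations=200):
    """Invert travel_time(E) = t by bisection (travel time decreases with E)."""
    low = g * eta_f                  # particle barely reaches eta_f
    high = low + 1.0
    while travel_time(eta_i, eta_f, high, g) > t:
        high = low + 2.0 * (high - low)
    for _ in range(iterations):
        middle = 0.5 * (low + high)
        if travel_time(eta_i, eta_f, middle, g) > t:
            low = middle
        else:
            high = middle
    return 0.5 * (low + high)
```

For example, with $\eta_i = 0$, $\eta_f = 1$, $t = 0.3$ and $g = 9.8$, the value `characteristic_function(0.0, 1.0, E, 9.8) - 0.3 * E` with `E = energy_for_time(0.0, 1.0, 0.3, 9.8)` should agree with `principal_function(0.0, 1.0, 0.3, 9.8)` to rounding error.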
6.6 Hamilton-Jacobi and Huygens’ Principle*

In his Traité de la lumière (Treatise on Light) Christiaan Huygens (1690/1945)
stated that the future wave-front of a propagating wave of light can be found by
assuming that each point of the surface emits a (secondary) spherical wave, and by
constructing the envelope of all these spherical waves. Using his principle, he was
able to account for the rectilinear propagation of light and for the laws of refrac-
tion at an interface. In this section, we discuss the equivalence between Huygens’
construction and the Hamilton-Jacobi equation for particles.
Whereas in Fermat’s principle for optics, as well as in Hamiltonian optics, one
considers a single optimal path between two points, the Hamilton-Jacobi theory
extends this notion by considering a whole field of solutions, that is, line integrals
along all the curves that build a wave-front. The central idea of the Hamilton-
Jacobi theory is that, given a complete integral of the Hamilton-Jacobi equation,
one can obtain solutions of Hamilton’s equations just by differentiation. Both Fer-
mat’s principle (light rays follow the path of least time) and Huygens’ principle
(given a wave-front, a later wave-front is the envelope of spherical waves spread-
ing from the points of the given wave-front) stand at the center of Hamilton-Jacobi
theory (Butterfield, 2005).
Consider the wave-front V (x, t) that originates at a point q0 and time t = 0.
Now consider every point of this front as a new source. If s is small, the fronts
generated are spherical waves (see Figure 6.11). The new wave-front V (x, t + s)
is the envelope of the wave-fronts starting at time s.
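7 Squaring equation (6.145) we have
$$\sqrt{2(E - g\eta_{i,f})} = \frac{\eta_f - \eta_i}{t} \pm \frac{gt}{2}, \tag{6.146}$$
(upper sign for $\eta_i$, lower sign for $\eta_f$) which, substituted in equation (6.144), gives
$$V = \frac{(\eta_f - \eta_i)^2}{t} + \frac{g^2 t^3}{12}. \tag{6.147}$$
Squaring equation (6.146) we have
$$E = \frac{(\eta_f - \eta_i)^2}{2t^2} + \frac{g^2 t^2}{8} + \frac{g}{2}(\eta_i + \eta_f), \tag{6.148}$$
and from these expressions for $V$ and $E$, forming $S = V - Et$, we obtain equation (6.138).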
Figure 6.11 Huygens’ construction.
In Huygens’ construction, the light rays are perpendicular to the wave-front but,
as discussed by Hamilton in his 1833 essay, this can be extended to the so-called
“extraordinary refraction,” where the family of light rays are not perpendicular to
the wave-front. An important observation made by Hamilton in his essay is the
equivalence between Fermat’s least action principle and Huygens’ construction,
which we illustrate for the simple case of refraction into an ordinary medium in
Figure 6.12. Since the Hamilton-Jacobi equation is derived from Fermat’s princi-
ple, the propagation of the “wave-front” S(x, t) of particles can be obtained from
Huygens’ construction.
Malus and Laplace extended the treatment outlined in Figure 6.12 to show the
equivalence of Huygens’ principle and Fermat’s principle of least time (Darrigol,
2012). Laplace considers a medium for which the velocity of propagation depends
on direction and is given by the following function: v 2 = α 2 + β 2 cos2 θ. For a
medium like this, the wave-fronts from a given point q0 are ellipses centered at q0
and the rays – straight lines that radiate from q0 – are clearly not perpendicular to
the ellipses, a situation we will consider in more detail later in this chapter.
The great contribution of Hamilton’s series of articles, summarized in his 1833
essay, is to formalize Huygens’ principle and extend it from light rays to mechan-
ical trajectories. Whereas Laplace’s formulation of the principle of least action
leads to second-order differential equations for the trajectories, Hamilton, follow-
ing the logic of Huygens’ principle, formulates the problem in terms of a first order
differential equation for the wave-fronts.
6.7 Applications and Examples

6.7.1 The Equation of a Light Ray
Figure 6.12 Equivalence between Huygens’ construction and Fermat’s least action principle. According to Huygens, the envelope $E$ of the wave-fronts emitted at a refracting surface is perpendicular to the refracted rays. Consider two infinitesimally close rays $oci$ and $o'c'i'$ which, by construction, correspond to the same traveling time: $T(oci) = T(o'c'i')$. According to Fermat’s principle, since the trajectory $oci$ is an extreme, to first order $T(oci) = T(oc'i)$. Since $oo'$ is perpendicular to $oc$, the times $T(oc')$ and $T(o'c')$ differ in second order. Also, by Huygens’ construction, $i$ and $i'$ belong to the tangent to the wave-front centered at $c'$. This means that, to first order, $T(c'i')$ and $T(c'i)$ are the same, implying Fermat’s principle. Notice that this argument is still applicable if the wave-fronts centered at $c$ and $c'$ are ellipses and not circles, like in the “extraordinary refraction” mentioned by Hamilton.
For an isotropic medium of index of refraction $n(\mathbf{x})$, light rays minimize the optical path given by the integral:
$$\int ds\, n(\mathbf{x}) = \int \sqrt{dx^2 + dy^2 + dz^2}\; n(\mathbf{x}) \tag{6.149}$$
$$\phantom{\int ds\, n(\mathbf{x})} = \int_0^1 d\lambda\, L(\dot{\mathbf{x}}, \mathbf{x}), \tag{6.150}$$
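with $\lambda$ a variable that parameterizes the ray given by the curve $\mathbf{x}(\lambda)$: the ray starts at $\mathbf{x}(0)$ and ends at $\mathbf{x}(1)$. Here $\dot{\mathbf{x}} = d\mathbf{x}/d\lambda$, and $L$, the Lagrangian of the problem, is given by:
$$L(\dot{\mathbf{x}}, \mathbf{x}) = \sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}\; n(\mathbf{x}) \equiv \sqrt{\dot{\mathbf{x}}^2}\; n(\mathbf{x}). \tag{6.151}$$
The “equation of motion” of $\mathbf{x}(\lambda)$ is given by Lagrange’s equations with the (unvaried) parameter $\lambda$ playing the role of time. Note that:
$$\frac{d}{d\lambda}\frac{\partial L}{\partial \dot{\mathbf{x}}} = \frac{d}{d\lambda}\left(\hat{\mathbf{t}}\, n(\mathbf{x})\right) \tag{6.152}$$
$$\phantom{\frac{d}{d\lambda}\frac{\partial L}{\partial \dot{\mathbf{x}}}} = \frac{\partial L}{\partial \mathbf{x}} = \sqrt{\dot{\mathbf{x}}^2}\; \nabla n(\mathbf{x}), \tag{6.153}$$
with $\hat{\mathbf{t}} = \dot{\mathbf{x}}/\sqrt{\dot{\mathbf{x}}^2} = d\mathbf{x}/ds$ the unit vector tangent to the ray. Since $\sqrt{\dot{\mathbf{x}}^2}\, d\lambda = ds$, equations (6.152) and (6.153) give
$$n\,\frac{d\hat{\mathbf{t}}}{ds} + \hat{\mathbf{t}}\left(\hat{\mathbf{t}}\cdot\nabla n\right) = \nabla n, \tag{6.154}$$
which we can write as:
$$\frac{d\hat{\mathbf{t}}}{ds} = \hat{\mathbf{t}} \times \left(\frac{\nabla n}{n} \times \hat{\mathbf{t}}\right), \tag{6.155}$$
the equation of motion of the ray.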
We will return to equation (6.155) in Section 7.4.1 when we calculate the
bending of a light ray due to relativistic effects.
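To see the ray equation at work, one can integrate it numerically in two dimensions, in the form $d\hat{\mathbf{t}}/ds = [\nabla n - \hat{\mathbf{t}}(\hat{\mathbf{t}}\cdot\nabla n)]/n$ that follows from equation (6.154). The sketch below is only an illustration: the linear index profile $n = 1 + 0.1\,y$, the Runge-Kutta integrator and all function names are assumptions, not taken from the text. Because this $n$ depends on $y$ only, the $x$ component of $d(n\hat{\mathbf{t}})/ds = \nabla n$ implies that $n\,\hat{t}_x$ is conserved along the ray, which gives a simple check.

```python
import math


def refractive_index(x, y):
    """Hypothetical index profile, chosen only for illustration."""
    return 1.0 + 0.1 * y


def index_gradient(x, y):
    """Gradient of the hypothetical profile above."""
    return (0.0, 0.1)


def ray_rhs(state):
    """d/ds of (x, y, t_x, t_y) from dt/ds = [grad n - t (t . grad n)] / n."""
    x, y, tx, ty = state
    n = refractive_index(x, y)
    gx, gy = index_gradient(x, y)
    along = tx * gx + ty * gy
    return (tx, ty, (gx - tx * along) / n, (gy - ty * along) / n)


def rk4_step(state, ds):
    """One classical fourth-order Runge-Kutta step of arc length ds."""
    def advance(slope, fraction):
        return tuple(s + fraction * ds * k for s, k in zip(state, slope))

    k1 = ray_rhs(state)
    k2 = ray_rhs(advance(k1, 0.5))
    k3 = ray_rhs(advance(k2, 0.5))
    k4 = ray_rhs(advance(k3, 1.0))
    return tuple(
        s + ds / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def trace_ray(x0, y0, angle, length, steps):
    """List of states (x, y, t_x, t_y) along a ray of total arc length `length`."""
    state = (x0, y0, math.cos(angle), math.sin(angle))
    ds = length / steps
    path = [state]
    for _ in range(steps):
        state = rk4_step(state, ds)
        path.append(state)
    return path
```

For a ray launched with `trace_ray(0.0, 0.0, 0.3, 5.0, 500)`, the product `refractive_index(x, y) * t_x` evaluated at the first and last states should coincide, and the tangent should remain a unit vector.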
6.7.2 Hamiltonian of the Harmonic Oscillator

We now consider the action for the harmonic oscillator, following Levi (2002). The Lagrangian for the harmonic oscillator is (we take for simplicity the mass of the particle equal to unity):
$$L = \frac{1}{2}\dot{x}^2 - \frac{\omega^2 x^2}{2}. \tag{6.156}$$
The Hamiltonian H of this simple system can be derived from Hamilton’s
equations of section 6.5. In contemporary notation:
$$H = T(p) + V(x), \tag{6.157}$$
with
$$p = \frac{\partial L}{\partial \dot{x}} = \dot{x}, \tag{6.158}$$
giving
$$H = \frac{1}{2}p^2 + \frac{1}{2}\omega^2 x^2. \tag{6.159}$$
As an exercise let us derive the Hamiltonian from the explicit form of the action $S(q, t)$, for a particle that starts at $x = 0$ at $t = 0$, and ends at $x = q$ at time $t$. Let us call $\tau$ the intermediate times of the path. The general solutions (the paths that minimize the action) for the motion of a harmonic oscillator are
$$x(\tau) = x_0\cos\omega\tau + \frac{\dot{x}_0}{\omega}\sin\omega\tau. \tag{6.160}$$
For the initial and final conditions $x(\tau = 0) = 0$ and $x(\tau = t) = q$, the solution is:
$$x(\tau) = q\,\frac{\sin\omega\tau}{\sin\omega t}. \tag{6.161}$$
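The action integral then becomes
$$S = \int_0^t d\tau\left[\frac{\dot{x}(\tau)^2}{2} - \frac{\omega^2 x(\tau)^2}{2}\right] = \frac{\omega q^2}{2}\cot\omega t. \tag{6.162}$$
From the above equation we have
$$p = \frac{\partial S}{\partial q} = \omega q\cot\omega t \tag{6.163}$$
and
$$H = -\frac{\partial S}{\partial t} \tag{6.164}$$
$$\phantom{H} = \frac{1}{2}\frac{q^2\omega^2}{\sin^2\omega t} \equiv \frac{1}{2}q^2\omega^2 + \frac{1}{2}q^2\omega^2\cot^2\omega t = \frac{1}{2}q^2\omega^2 + \frac{1}{2}p^2, \tag{6.165}$$
in correspondence with equation (6.159).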
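The chain from the path (6.161) to the action (6.162) and on to the Hamiltonian (6.165) can be checked numerically. The sketch below is illustrative only: it integrates the Lagrangian along the path with Simpson's rule and differentiates the closed-form action by central differences; the function names, step sizes and sample values are assumptions, and $\omega t$ must not be a multiple of $\pi$.

```python
import math


def action_closed_form(q, t, omega):
    """S(q, t) of equation (6.162), unit mass."""
    return 0.5 * omega * q * q / math.tan(omega * t)


def action_by_quadrature(q, t, omega, n=2000):
    """Simpson integration of L along the path (6.161); n must be even."""
    def lagrangian(tau):
        x = q * math.sin(omega * tau) / math.sin(omega * t)
        v = q * omega * math.cos(omega * tau) / math.sin(omega * t)
        return 0.5 * v * v - 0.5 * omega ** 2 * x * x

    h = t / n
    total = lagrangian(0.0) + lagrangian(t)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * lagrangian(i * h)
    return total * h / 3.0


def momentum_and_hamiltonian(q, t, omega, h=1e-6):
    """p = dS/dq and H = -dS/dt, estimated by central differences."""
    p = (action_closed_form(q + h, t, omega)
         - action_closed_form(q - h, t, omega)) / (2.0 * h)
    hamiltonian = -(action_closed_form(q, t + h, omega)
                    - action_closed_form(q, t - h, omega)) / (2.0 * h)
    return p, hamiltonian
```

With, say, `q, t, omega = 1.3, 0.4, 2.0`, the quadrature should reproduce the closed form, and the returned `hamiltonian` should match `0.5 * p**2 + 0.5 * omega**2 * q**2`.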
6.7.3 Hamilton-Jacobi Equation for a Particle in a Magnetic Field

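Consider the motion of a particle of unit charge in a constant magnetic field $\mathbf{B} = B\hat{\mathbf{k}} \equiv \omega\hat{\mathbf{k}}$. We call the magnetic field $\omega$, a symbol usually reserved for frequency, since the orbits for this problem correspond to circles, all of them with the same frequency $\omega$.
One possible choice for the Lagrangian is
$$L = \frac{1}{2}m\dot{\mathbf{x}}^2 - \frac{\omega}{2}\left(x\dot{y} - y\dot{x}\right). \tag{6.166}$$
The corresponding equations of motion are
$$\frac{d}{d\tau}\frac{\partial L}{\partial \dot{x}} = m\ddot{x} + \frac{\omega}{2}\dot{y} = \frac{\partial L}{\partial x} = -\frac{\omega}{2}\dot{y} \tag{6.167}$$
$$\frac{d}{d\tau}\frac{\partial L}{\partial \dot{y}} = m\ddot{y} - \frac{\omega}{2}\dot{x} = \frac{\partial L}{\partial y} = \frac{\omega}{2}\dot{x}. \tag{6.168}$$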
Figure 6.13 The paths for a particle in a magnetic field are not perpendicular to the wave-fronts of constant action $S(\mathbf{q}, t)$.
We concentrate on the motion in the $xy$ plane. We derive the Hamilton-Jacobi equation using some simple geometrical considerations. As in the previous example, we will evaluate the action $S(\mathbf{q}, t)$ for a particle that starts at $(x, y) = (0, 0)$ at $t = 0$, and ends at $(x, y) = (q_1, q_2)$ at time $t$. The physical orbits of this problem are circles of radius $r_0$ given by (see Figure 6.13)
$$r_0 = \frac{|\mathbf{q}(t)|}{2\sin(\omega t/2)} \tag{6.169}$$
$$\phantom{r_0} = \frac{\sqrt{q_1^2 + q_2^2}}{2\sin(\omega t/2)}, \tag{6.170}$$
which are traversed at a frequency $\omega = B/m$. The speed of the particle is constant along the trajectory:
$$|\dot{\mathbf{x}}(t)| = |\dot{\mathbf{x}}(\tau)| = \omega r_0, \tag{6.171}$$
where we use $\tau$ for the intermediate time of the trajectory connecting the origin and the final point $\mathbf{q}(t)$. The term $(x\dot{y} - y\dot{x})$ for the intermediate time $\tau$ can also be evaluated geometrically by noting that
$$|\mathbf{x}(\tau)| = 2r_0\sin(\omega\tau/2),$$
and that the angle between $\mathbf{x}(\tau)$ and $\dot{\mathbf{x}}(\tau)$ is $\omega\tau/2$:
$$x(\tau)\dot{y}(\tau) - y(\tau)\dot{x}(\tau) = 2\omega r_0^2\sin^2\frac{\omega\tau}{2}. \tag{6.172}$$
Substituting in the Lagrangian (evaluated at the minimum path, with $m = 1$) we get
$$L(\tau) = \frac{1}{2}\omega^2 r_0^2\cos\omega\tau, \tag{6.173}$$
and for the action
$$S(\mathbf{q}, t) = \int_0^t L(\tau)\,d\tau = \frac{1}{2}\omega r_0^2\sin\omega t = \frac{1}{4}\omega\left(q_1^2 + q_2^2\right)\cot\frac{\omega t}{2}. \tag{6.174}$$
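The surfaces of constant $S$ are circles in the $(q_1, q_2)$ plane. From equation (6.174) we have
$$-\frac{\partial S(\mathbf{q}, t)}{\partial t} = \frac{1}{8}\omega^2\,\frac{q_1^2 + q_2^2}{\sin^2(\omega t/2)} = \frac{1}{2}\dot{\mathbf{x}}^2(t), \tag{6.175}$$
and also
$$\nabla S(\mathbf{q}, t) = \frac{1}{2}\omega\mathbf{q}\cot\frac{\omega t}{2}, \qquad |\nabla S(\mathbf{q}, t)| = |\dot{\mathbf{x}}(t)|\cos\frac{\omega t}{2}. \tag{6.176}$$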
Since the angle between ∇ S and ẋ(t) is ωt/2, we can therefore write (see Figure
6.13)
ẋ(t) = ∇ S + A, (6.177)
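where $\mathbf{A}$ is a vector perpendicular to $\nabla S$ with
$$|\mathbf{A}| = |\dot{\mathbf{x}}(t)|\sin\frac{\omega t}{2} = \frac{1}{2}\omega|\mathbf{q}|. \tag{6.178}$$
Substituting (6.177) in (6.175), we obtain the Hamilton-Jacobi equation for a particle in a constant magnetic field:
$$\frac{\partial S(\mathbf{q}, t)}{\partial t} + \frac{1}{2}\left[\nabla S(\mathbf{q}, t) + \mathbf{A}\right]^2 = 0, \tag{6.179}$$
with
$$\mathbf{A} = \frac{\omega}{2}\left(-q_2, q_1\right). \tag{6.180}$$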
This example gives a nice geometrical interpretation of the vector potential, which in our case is $-\mathbf{A}$: $\nabla \times (-\mathbf{A}) = -\omega\hat{\mathbf{k}}$ (the magnetic field in this example points into the plane, in the $-z$ direction).
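Both the Hamilton-Jacobi equation (6.179) and the decomposition $\dot{\mathbf{x}}(t) = \nabla S + \mathbf{A}$ of equation (6.177) can be spot-checked numerically. The sketch below is an illustration with assumed names and sample values; it takes unit mass and assumes that the circular orbit leaving the origin is traversed counterclockwise, which is the sense compatible with $\mathbf{A} = (\omega/2)(-q_2, q_1)$.

```python
import math


def action(q1, q2, t, omega):
    """S(q, t) of equation (6.174)."""
    return 0.25 * omega * (q1 * q1 + q2 * q2) / math.tan(0.5 * omega * t)


def vector_a(q1, q2, omega):
    """The vector A of equation (6.180)."""
    return (-0.5 * omega * q2, 0.5 * omega * q1)


def gradient_of_action(q1, q2, t, omega, h=1e-6):
    """Central-difference gradient of S with respect to (q1, q2)."""
    d1 = (action(q1 + h, q2, t, omega) - action(q1 - h, q2, t, omega)) / (2.0 * h)
    d2 = (action(q1, q2 + h, t, omega) - action(q1, q2 - h, t, omega)) / (2.0 * h)
    return d1, d2


def hamilton_jacobi_residual(q1, q2, t, omega, h=1e-6):
    """Left-hand side of equation (6.179); should be ~0."""
    dS_dt = (action(q1, q2, t + h, omega) - action(q1, q2, t - h, omega)) / (2.0 * h)
    g1, g2 = gradient_of_action(q1, q2, t, omega, h)
    a1, a2 = vector_a(q1, q2, omega)
    return dS_dt + 0.5 * ((g1 + a1) ** 2 + (g2 + a2) ** 2)


def velocity_from_action(q1, q2, t, omega):
    """grad S + A, which equation (6.177) identifies with the final velocity."""
    g1, g2 = gradient_of_action(q1, q2, t, omega)
    a1, a2 = vector_a(q1, q2, omega)
    return g1 + a1, g2 + a2


def circular_orbit(speed, beta, t, omega):
    """Position and velocity at time t of a counterclockwise circular orbit
    that leaves the origin with the given speed at angle beta (assumption)."""
    phase = beta + omega * t
    position = (speed / omega * (math.sin(phase) - math.sin(beta)),
                speed / omega * (math.cos(beta) - math.cos(phase)))
    velocity = (speed * math.cos(phase), speed * math.sin(phase))
    return position, velocity
```

Taking the end point `q` from `circular_orbit(2.0, 0.3, 0.9, 1.5)` and feeding it to `velocity_from_action(q[0], q[1], 0.9, 1.5)` should return the orbit's final velocity, while `hamilton_jacobi_residual` should vanish at any point where $\sin(\omega t/2) \neq 0$.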
6.8 When the Principle of Least Action Loses its “Least”

In the Lagrange formulation, we require δS = 0 for the physical path. That does
not necessarily imply that the trajectory is a minimum. This feature was noted as
early as 1662 by De la Chambre, who pointed out that for a curved mirror the path
that obeys the law of reflection is not necessarily the one with the smallest length.
He presented this as an objection to the idea that Nature always follows the shortest
path (De la Chambre, 1662, p. 314). In Appendix E, we describe an example that
illustrates De la Chambre’s point.
When the trajectories are “critical,” i.e. they have a focal point as occurs in lenses
and mirrors in optics, the path can be a saddle. And for particles, when the paths
go beyond the kinetic focus, the “action loses its least” (Lanczos, 1962, p. 272). In
this section, we give some examples that illustrate this point.
6.8.1 Focus and Kinetic Focus

The saddle nature of paths beyond a focus is clear in the optical case. Consider light rays that emerge from a point source $O$, propagate in a vacuum, and after reflecting on a mirror, converge to a focus $F$. Consider a specific path $\mathcal{P}$ that starts at $O$ and finishes at a point $P$ beyond the focus. As opposed to paths that end before the focus, there is always a nearby (unphysical) path that has a shorter length than $\mathcal{P}$. This results from the fact that, in general, there are many (infinitely many in this case) paths of the same length as $\mathcal{P}$. These are unphysical paths that have a “kink” at the focus. The length of any of these paths can be decreased by smoothing the kink, as illustrated in Figure 6.14.
Figure 6.14 The physical path $OQP$, where $P$ is beyond the focus $F$ of mirror $M$, is not a minimum path for a light ray. Consider another, unphysical, nearby path, $OQ'FP$. Since this path also goes through the focus, it has the same length as $OQP$. But the segment $Q'P$ is clearly shorter than the broken path $Q'FP$, and the (unphysical) path $OQ'P$ has a length shorter than $OQP$.
Figure 6.15 Jacobi’s example of a kinetic focus.
For mechanical systems, the idea of a focus is extended to a “kinetic focus,” considered first by Jacobi (1837b). Jacobi discusses the case of planetary motion, where the particle starts at a point $a$ of an elliptical orbit, and ends at a point $b$ along a chord that passes through the focus of the ellipse, as shown in Figure 6.15. There are infinitely many paths that start at $a$ and end at $b$, obtained by rotating the ellipse around the chord $aFb$, that (by symmetry) have the same values of both the Maupertuis action $V$ and Hamilton’s action $S$. Point $b$ is a “conjugate point” of $a$, or the kinetic focus of $a$.
Whittaker (1988, p. 252), in a classic work on mechanics, gives a precise def-
inition: “Consider any point a on an actual trajectory, and let another actual
trajectory be drawn through a making a very small angle with the first. If this
intersects the first trajectory again, say at a point b, then the limiting position
of the point b when the angle between the trajectories diminishes indefinitely
is called the kinetic focus of a on the first trajectory, or the point conjugate
to a.”
For the example proposed by Jacobi, and by the same logic discussed in the
optics example, a clockwise moving path that starts at a and ends at c, beyond the
kinetic focus, is not a minimum. There is always a nearby, unphysical path that
starts at a, follows a (physical) rotated ellipse up to a point close to b, and then
(this is the unphysical part) “avoids” the kink at b to reach c “spending” less action
than the original physical path.
Whittaker proposes the example of a particle moving “under no forces on a smooth sphere. The trajectories are great circles on the sphere, and the action taken along any path (whether a trajectory or not) is proportional to the length of the path. The kinetic focus of any point $A$ is the diametrically opposite point $A'$ on the sphere, since any two great circles through $A$ intersect again at $A'$.” We elaborate on Whittaker’s free particle on a sphere in the following example, where we show that a point diametrically opposed to the space-time point $A$ at time $t = 0$ can be the kinetic focus of the action $S$.
6.8.2 Kinetic Focus for a Free Particle on a Sphere

Consider a trajectory of an otherwise free particle moving on a sphere of radius unity. The Lagrangian is given by
$$L = \frac{1}{2}\left(\dot{\theta}^2 + \dot{\phi}^2\sin^2\theta\right). \tag{6.181}$$
The solutions (the geodesics), as pointed out by Whittaker, are great circles. Consider one of the solutions, corresponding to the particle moving along the equator:
$$X_0(t) = (\theta(t), \phi(t)) = \left(\frac{\pi}{2},\, \omega t\right). \tag{6.182}$$
A physical path of this sort, connecting the spacetime points on the equator of the sphere, $P_i = (\theta = \pi/2,\ \phi = 0,\ t = 0)$ and $P_f = (\theta = \pi/2,\ \phi = \phi_f,\ t = t_f)$, has frequency $\omega = \phi_f/t_f$. The action for this path is
$$S_0 = \int_0^{t_f} L(t)\,dt = \frac{1}{2}\omega^2 t_f = \frac{1}{2}\frac{\phi_f^2}{t_f}. \tag{6.183}$$
Now consider the variation of the action for a path that deviates slightly from this physical path. Choose a particular deviation in which the motion remains free for $\phi(t)$, but deviates slightly out of the equatorial plane:
$$X_1(t) = \left(\frac{\pi}{2} + \alpha\theta_1(t),\, \omega t\right), \tag{6.184}$$
with $\alpha \ll 1$. In order to ensure that the initial and final spacetime coordinates remain the same, we impose the following condition:
$$\theta_1(0) = \theta_1(t_f) = 0. \tag{6.185}$$
To the lowest order in $\alpha$, the Lagrangian evaluated on points of this path is
$$L \simeq \frac{1}{2}\left[\alpha^2\dot{\theta}_1^2 + \left(1 - \alpha^2\theta_1^2\right)\omega^2\right], \tag{6.186}$$
which gives an action, to lowest order in $\alpha$,
$$S = S_0 + \frac{\alpha^2}{2}\int_0^{t_f}\left(\dot{\theta}_1^2 - \omega^2\theta_1^2\right)dt. \tag{6.187}$$
(There is no first-order term in $\alpha$ since the path is an extremum of the action.) Notice that the variation to order $\alpha^2$ happens to be the same as the action of a harmonic oscillator of frequency $\omega$. In order to illustrate the saddle point beyond the focus, consider a specific path that satisfies the conditions of equation (6.185):
$$\theta_1(t) = C\sin\left(\pi\frac{t}{t_f}\right), \tag{6.188}$$
with $C$ an arbitrary constant. Substituting (6.188) in (6.187) we get
$$S = S_0 + \frac{1}{4}\alpha^2 C^2\left(\frac{\pi^2}{t_f} - \omega^2 t_f\right) \equiv S_0 + \frac{1}{4}\alpha^2\frac{C^2}{t_f}\left(\pi^2 - \phi_f^2\right). \tag{6.189}$$
Equation (6.189) implies that, for paths that are great circles of less than half a turn around the sphere ($\phi_f < \pi$), the action is a minimum with respect to this variation. But for those that go beyond $\phi_f = \pi$ the action is a saddle.
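The change of sign at $\phi_f = \pi$ can be seen directly by evaluating the full action (6.181) on the perturbed path (6.184) with the trial function (6.188), without expanding in $\alpha$. The sketch below is illustrative: Simpson's rule, the function names and the sample values are assumptions. For comparison it also evaluates the quadratic estimate obtained by integrating the Lagrangian (6.186) over the trial path.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def action_on_sphere(phi_f, t_f, alpha, amplitude=1.0):
    """Action of L = (theta_dot^2 + phi_dot^2 sin^2 theta)/2 on the path
    theta = pi/2 + alpha*C*sin(pi t/t_f), phi = (phi_f/t_f) t."""
    omega = phi_f / t_f

    def lagrangian(t):
        theta = 0.5 * math.pi + alpha * amplitude * math.sin(math.pi * t / t_f)
        theta_dot = alpha * amplitude * (math.pi / t_f) * math.cos(math.pi * t / t_f)
        return 0.5 * (theta_dot ** 2 + (omega * math.sin(theta)) ** 2)

    return simpson(lagrangian, 0.0, t_f)


def quadratic_estimate(phi_f, t_f, alpha, amplitude=1.0):
    """Order-alpha^2 change of the action, from integrating (6.186)."""
    return 0.25 * (alpha * amplitude) ** 2 * (math.pi ** 2 - phi_f ** 2) / t_f
```

With `alpha = 0.05` and `t_f = 1.0`, the perturbed action should exceed the equatorial value `0.5 * phi_f**2 / t_f` for `phi_f = 2.5` and fall below it for `phi_f = 3.5`, in both cases close to `quadratic_estimate`.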
Another interesting case where the kinetic focus appears, not for reasons of sym-
metry but because of the isochronicity of the orbits, is the harmonic oscillator (Gray
and Taylor, 2007), which we discuss in the following example.
6.8.3 Saddle Paths for the Harmonic Oscillator

Consider the paths of a harmonic oscillator for which $x(0) = 0$ and $x(T) = x_T$. If $T$ is half the period of the oscillator (see Figure 6.16), $x_T = 0$, and there is an infinite number of paths that make the action stationary. The physical paths for this case are of the form
$$x(t) = \frac{v_0}{\omega_0}\sin\omega_0 t, \tag{6.190}$$
with $v_0$ the initial velocity and $\omega_0 = \pi/T_0$ the frequency of the oscillator. All the paths of this form cross at $(x = 0, t = T_0)$. Note that the crossing of the paths at the final point does not by itself guarantee that this final point is a kinetic focus. The action for all these paths has to be the same (in particular, for those arbitrarily close).⁸
Figure 6.16 For a harmonic oscillator of half-period T0 , the space-time point (x =

0, t = T0 ) is a kinetic focus of (x = 0, t = 0). The unphysical path O Q P has
lower action than the physical path O T0 P.
8 As a counterexample, consider a one-dimensional infinite potential well, where the particle is free for values
of x in the interval (0, a). There is an infinite number of physical paths that connect the space-time point
For the harmonic oscillator of frequency $\omega_0$, the Lagrangian (which we encountered in the previous section) is given by
$$L = \frac{1}{2}m\dot{x}^2 - \frac{1}{2}m\omega_0^2 x^2. \tag{6.191}$$
All physical paths connecting $(0, 0)$ with $(0, T_0)$ have zero action:
$$S = \int_0^{T_0} L\,dt = \frac{m v_0^2}{2}\int_0^{T_0}\left(\cos^2\omega_0 t - \sin^2\omega_0 t\right)dt = 0. \tag{6.192}$$
So this is a very degenerate problem. In the spirit of the previous example, we will
show that trajectories that end after a half period are saddle “points” of the action.
The degeneracy of the harmonic oscillator originates in the fact that $L$ is quadratic in both $x$ and $\dot{x}$, meaning that the second variation of $S$ is of the same form as that of $L$. As in the previous example, consider a path of the form
$$x(t) = x_0(t) + \alpha\phi(t), \tag{6.193}$$
where $x_0(t)$ is the physical path, for which the action $S$ is stationary, $\alpha$ a small parameter and $\phi(t)$ a function that satisfies $\phi(0) = \phi(T) = 0$. The second variation of the action is
$$\delta^2 S = \frac{\alpha^2 m}{2}\int_0^T\left(\dot{\phi}^2 - \omega_0^2\phi^2\right)dt. \tag{6.194}$$
Now set, just as we did for the particle on a sphere,⁹
$$\phi(t) = C\sin\left(\pi\frac{t}{T}\right), \tag{6.195}$$
which satisfies $\phi(0) = \phi(T) = 0$, and recall that $\omega_0 = \pi/T_0$, with $T_0$ the half-period. Substitution of $\phi$ into the expression for the second variation gives
$$\delta^2 S = \frac{\alpha^2 m\omega_0^2 C^2 T}{4}\left[\left(\frac{T_0}{T}\right)^2 - 1\right]. \tag{6.196}$$
If $T < T_0$ (before the focus), $\delta^2 S > 0$. But beyond the kinetic focus, for $T > T_0$, we obtain a negative second variation. We can clearly choose other variations, with a lot of wiggles, so that the second variation is positive. Hence we have a saddle beyond the kinetic focus.
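The saddle can be made explicit by evaluating the second variation (6.194) for trial functions with different numbers of wiggles, $\phi(t) = C\sin(n\pi t/T)$. The sketch below is illustrative (Simpson's rule, function names and sample values are assumptions) and takes $T_0$ to be the half-period, $\omega_0 = \pi/T_0$, as in Figure 6.16.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def second_variation(mode, T, omega0, m=1.0, alpha=1.0, amplitude=1.0):
    """(alpha^2 m / 2) * integral of (phi_dot^2 - omega0^2 phi^2) over [0, T]
    for the trial function phi = C sin(mode * pi * t / T)."""
    k = mode * math.pi / T

    def integrand(t):
        phi = amplitude * math.sin(k * t)
        phi_dot = amplitude * k * math.cos(k * t)
        return phi_dot ** 2 - (omega0 * phi) ** 2

    return 0.5 * alpha ** 2 * m * simpson(integrand, 0.0, T)


def second_variation_closed_form(mode, T, omega0, m=1.0, alpha=1.0, amplitude=1.0):
    """Same quantity evaluated analytically: alpha^2 m C^2 T (k^2 - omega0^2) / 4."""
    k = mode * math.pi / T
    return 0.25 * alpha ** 2 * m * amplitude ** 2 * T * (k ** 2 - omega0 ** 2)
```

For `T = 0.8 * T0` the `mode = 1` variation is positive; for `T = 1.5 * T0` it is negative while the `mode = 3` variation is positive, so the physical path beyond the kinetic focus is neither a minimum nor a maximum.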
P0 = (x = 0, t = 0) with a point P f = (x f , t). These correspond to the particle bouncing from the wall n
times before reaching P f . For no bounce, the direct path has action S0 = mx 2 /t f ; for one bounce the action
is S1 = m(2a − x)2 /t f ; for two bounces S2 = m(2a + x)2 /t f and so on.
9 The general form of the variation satisfying the initial and final conditions is $\phi(t) = \sum_{n=1}^{\infty} a_n\sin(\pi n t/T)$, as considered in Gray and Taylor (2007). However, if our purpose is to show that, beyond the kinetic focus, there exists a path for which the action is smaller (to second order in $\alpha$), this simpler function suffices.
Another example of a kinetic focus can be found in planetary elliptical orbits:

the space-time point (x, t = T ), where T is the period of the orbit, is the kinetic
focus of (x, t = 0).
6.8.4 Kinetic Focus of Elliptic Planetary Orbits

Kepler’s third law states that the period of an elliptical orbit depends on the semi-
major axis a and not on the eccentricity of the ellipse. This means that the problem
has a degeneracy similar to that of the harmonic oscillator: for a given point P of
an elliptical orbit, there is an infinite number of elliptical orbits, of the same focal
point F, that return to P in the same time (see Figure 6.17). Since these orbits
form a continuum, they satisfy Whittaker’s condition of diverging and converging
angles, and (P, T ) is a kinetic focus of (P, 0). We can establish this explicitly by
computing the action $S$ for an elliptical path:
$$S = \int_0^T\left(\frac{1}{2}m\dot{\mathbf{x}}^2 + \frac{m\mu}{r}\right)dt = ET + 2\int_0^T\frac{m\mu}{r}\,dt, \tag{6.197}$$
where $E$ is the energy of the orbit, and we used the notation of equation (6.4) for the potential energy. According to Hamilton’s derivation discussed in Section 6.4, the integral of the potential energy for an elliptical orbit is given by equation (6.83); for a whole revolution the initial and final eccentric anomalies are $v_0 = 0$, $v = 2\pi$, giving
$$S = ET + 4\pi m\sqrt{\mu a}. \tag{6.198}$$
Figure 6.17 Elliptical orbits of the same energy and the same focus F, passing
through a common point P. Since the action depends only on the semi-major axis
(equation (6.199)), (P, T) is a kinetic focus of (P, 0).
From equation (6.84) we have $T = 2\pi a^{3/2}/\mu^{1/2}$, and since $E = -m\mu/(2a)$, the action $S$ for an elliptical orbit that starts and ends at the same spatial point after one revolution is
$$S = 3\pi m\sqrt{\mu a}. \tag{6.199}$$
$S$ is a function of the semi-major axis only and therefore $(P, T)$ is a kinetic focus.
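That the action over one revolution does not depend on the eccentricity can be checked by direct quadrature. The sketch below is an illustration under stated assumptions: it parameterizes the orbit by the eccentric anomaly $u$ and uses the standard Kepler-problem relations $r = a(1 - e\cos u)$, $\dot{\mathbf{x}}^2 = \mu(2/r - 1/a)$ and $dt/du = \sqrt{a^3/\mu}\,(1 - e\cos u)$, which are not derived in this section; the function name and sample values are hypothetical.

```python
import math


def orbit_action(a, eccentricity, mu=1.0, m=1.0, n=1000):
    """Integral of (m v^2 / 2 + m mu / r) dt over one revolution,
    evaluated as a uniform sum over the eccentric anomaly u."""
    total = 0.0
    for j in range(n):
        u = 2.0 * math.pi * j / n
        r = a * (1.0 - eccentricity * math.cos(u))
        speed_squared = mu * (2.0 / r - 1.0 / a)           # vis-viva relation
        dt_du = math.sqrt(a ** 3 / mu) * (1.0 - eccentricity * math.cos(u))
        total += (0.5 * m * speed_squared + m * mu / r) * dt_du
    return total * 2.0 * math.pi / n
```

For fixed `a`, `mu` and `m`, calls such as `orbit_action(2.0, 0.0, mu=1.5, m=0.7)` and `orbit_action(2.0, 0.8, mu=1.5, m=0.7)` should both return $3\pi m\sqrt{\mu a}$, in agreement with equation (6.199).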
6.8.5 Gouy’s Phase and Critical Action

In two influential papers, Louis Georges Gouy (1890, 1891) described an anoma-
lous behavior of the phase of a wave as it passes through a focus. He wrote
(translated from French (Visser and Wolf, 2010)):
If one considers a converging wave that has passed through a focus and has then become
divergent, a simple calculation shows that the vibration of that wave has advanced half a
period compared to what it should be according to the distance traveled and the speed of
light.
Since its first description, the Gouy phase has been observed under a wide variety
of circumstances, but the origin of this anomaly continues to be a matter of some
debate (see, for example, Visser (2010) and references therein). In this section,
we present a simple exposition that connects the origin of the Gouy phase to the
minimum versus saddle point problem for the action before and after the focus.
Interestingly, it turns out that Gouy himself had remarked on this point in his 1891
paper. We will consider the simpler case of a focal line (two-dimensional wave
propagation) rather than a focal point, a situation also considered by Gouy, where
the wave advances a quarter of a period rather than half a period.
Consider the superposition of coherent sources of light of wavelength $\lambda$ located on an arc of a circle of radius $R$, as shown in Figure 6.18. For this example, we will assume that the angle $\alpha_0$ of the arc is small ($\alpha_0 \ll \pi$), but that the length of the arc is much larger than the wavelength ($R\alpha_0 \gg \lambda$). From each point of the arc emanates a cylindrical wave whose amplitude decreases as $\sin(k\rho)/\sqrt{\rho}$, where $\rho$ is the distance to the source, and $k = 2\pi/\lambda$. At the end of the section, we argue that the extension from an arc to a curved surface is immediate. For simplicity, we will consider the superposition of all these cylindrical waves evaluated along an axis $x$ that starts from the center of symmetry of the arc (point $O$ in Figure 6.18) and goes through the focus $R$. Along the $x$ axis, the amplitude of the wave originating at $O$ is $\sin(kx)/\sqrt{x}$. This will be our reference wave. At the focus (the center of the circle) all the waves have the same phase and add constructively, giving rise to a total amplitude proportional to $\alpha_0\sin kR/\sqrt{R}$. We are interested in points along the $x$ axis at distances $d$ away from the focus that are much larger than the wavelength of light ($kd \gg 1$), and much smaller than the radius of the arc ($d/R \ll 1$). The second of these conditions implies that, for points close to the focus, the denominator of the amplitude can be replaced by $\sqrt{R}$. With this approximation (which was also considered by Gouy), the amplitude of the wave at a point $x$ is proportional to the following superposition of waves originating from points within the arc:
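$$I(x) \propto \frac{1}{\sqrt{R}}\int_{-\alpha_0}^{\alpha_0} d\theta\,\sin k\rho_x(\theta), \tag{6.200}$$
where the distance $\rho_x(\theta)$ is given by
$$\rho_x(\theta) = \sqrt{R^2 + d^2 \pm 2dR\cos\theta} \simeq \sqrt{R^2 \pm 2dR\cos\theta} \simeq R \pm d\cos\theta, \tag{6.201}$$
and $x = R \pm d$. The plus (minus) sign corresponds to a point after (before) the focus. Since the angle of the arc is small we can replace $\cos\theta \simeq 1 - \theta^2/2$, and obtain
$$\rho_x(\theta) = \begin{cases} x + \dfrac{d\theta^2}{2} & \text{(before the focus)} \\[2mm] x - \dfrac{d\theta^2}{2} & \text{(after the focus).} \end{cases} \tag{6.202}$$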
The amplitude at a point x before (after) the focus is the result of a superposition
of waves that have longer (shorter) path lengths than the “reference” wave sin kx.
This difference, rooted in the different nature of the stationary points before and
after the focus, is behind the dephasing of the wave as it passes through the focus
(see Figure 6.18). Before the focus, the amplitude results from the superposition
of phases whose optical paths are larger than the reference optical path x. The
converse happens after the focus. The precise value of the phase shift requires
evaluating the integral for I (x):
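$$I(x) \propto \frac{1}{\sqrt{R}}\int_{-\alpha_0}^{\alpha_0} d\theta\,\sin\left(kx \pm \frac{kd\theta^2}{2}\right) \tag{6.203}$$
$$\phantom{I(x)} = \frac{1}{\sqrt{R}}\sqrt{\frac{2}{kd}}\int_{-\alpha_0\sqrt{kd/2}}^{\alpha_0\sqrt{kd/2}} dz\,\sin\left(kx \pm z^2\right) \tag{6.204}$$
$$\phantom{I(x)} \simeq \frac{1}{\sqrt{R}}\sqrt{\frac{2}{kd}}\int_{-\infty}^{\infty} dz\,\sin\left(kx \pm z^2\right), \tag{6.205}$$
where the upper (lower) sign now corresponds to a point before (after) the focus, and we used the condition $kd \gg 1$ to boldly extend the limits of integration to plus and minus infinity. Expanding the sine function, we obtain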
$$I(x) \propto \frac{1}{\sqrt{R}}\sqrt{\frac{2}{kd}}\left[\sin kx\int_{-\infty}^{\infty} dz\,\cos z^2 \pm \cos kx\int_{-\infty}^{\infty} dz\,\sin z^2\right]. \tag{6.206}$$
Figure 6.18 Schematic rendition of the origin of the Gouy phase shift. Coherent
sources originating on the arc C interfere constructively at the focus R (thick line
wave). At a distance d away from the focus, the superposition is between waves
of different path lengths. At a point a before the focus, waves (represented by
the solid line) originating at an arbitrary point P have longer path lengths than
the wave originating at O (dashed line, or our reference wave). Since Pa > Oa,
the superposition gives a resultant wave shifted to the left of the reference wave.
The converse happens after the focus, where Pb < Ob, and the resulting wave is
shifted to the right. (The figure is out of scale with respect to the analysis in the text, where $R \gg d \gg \lambda$ is assumed.)
This expression was obtained by Gouy in his 1891 paper (Gouy (1891), p. 192). The integrals in equation (6.206) are the so-called Fresnel integrals, which, for enigmatic reasons, are equal:
$$\int_{-\infty}^{\infty} dz\,\sin z^2 = \int_{-\infty}^{\infty} dz\,\cos z^2 = \sqrt{\frac{\pi}{2}}, \tag{6.207}$$
and the equality of these two integrals (and not their precise value) is what gives rise to the shift of $\pi/2$. Using the trigonometric identity $\sin kx \pm \cos kx = \sqrt{2}\sin(kx \pm \pi/4)$, we obtain our final result:
$$I(x) \propto \frac{1}{\sqrt{R}}\sqrt{\frac{2\pi}{kd}} \times \begin{cases} \sin\left(kx + \dfrac{\pi}{4}\right) & \text{(before the focus)} \\[2mm] \sin\left(kx - \dfrac{\pi}{4}\right) & \text{(after the focus).} \end{cases} \tag{6.208}$$
For this one-dimensional arc we obtain a phase difference of $\pi/2$ for the waves before and after the focus. Gouy (1891) points out that everything happens as though the waves, in the vicinity of the focal line, propagate at a greater velocity, gaining an advance of λ/2 over a usual plane wave that displaces on the same line. Of course, this is an advance that refers to the phase velocity; the group velocity is never higher than the speed of light in vacuum. Otherwise, we could send superluminal messages across the focus! Notice also the large factor $\sqrt{kd}$ in the denominator of the resulting amplitude (also obtained by Gouy), meaning that, due to cancellations, the amplitude is much smaller than the amplitude at the focus, as expected. If we consider an ellipsoidal surface rather than a circle, with two foci, we will see two phase changes, each of them of $\pi/2$. For a single focus and a two-dimensional surface, the total phase change is $\pi$.
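The two ingredients of the phase shift, the equality of the Fresnel integrals in equation (6.207) and the resulting phases $\pm\pi/4$ in equation (6.208), can be illustrated numerically. The sketch below is only an approximation under stated assumptions: the integrals are truncated at a finite $|z| \le z_{\max}$ (mimicking the finite arc of equation (6.204)) and evaluated with a midpoint rule, so the agreement is limited by a truncation error of order $1/z_{\max}$; the function names and the choices $z_{\max} = 60$ and $2\times10^5$ sub-intervals are illustrative.

```python
import math


def truncated_fresnel(z_max, n=200000):
    """Midpoint-rule estimates of the integrals of cos(z^2) and sin(z^2)
    over the finite interval [-z_max, z_max]."""
    h = 2.0 * z_max / n
    cos_sum = 0.0
    sin_sum = 0.0
    for i in range(n):
        z = -z_max + (i + 0.5) * h
        cos_sum += math.cos(z * z)
        sin_sum += math.sin(z * z)
    return cos_sum * h, sin_sum * h


def phase_shift(z_max, sign):
    """Phase delta for which the integral of sin(kx + sign * z^2) over z
    is proportional to sin(kx + delta); sign is +1 or -1."""
    cos_part, sin_part = truncated_fresnel(z_max)
    return math.atan2(sign * sin_part, cos_part)
```

With `z_max = 60.0`, both values returned by `truncated_fresnel` should lie close to $\sqrt{\pi/2} \approx 1.2533$, and `phase_shift(60.0, +1) - phase_shift(60.0, -1)` should be close to $\pi/2$.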
6.8.6 Caustics
In addition to the focusing phenomena, where a family of reflecting or refracting
rays (or paths) converges to a single point, there is a more general focusing mecha-
nism where the rays concentrate on surfaces or lines rather than points. These focal
surfaces are called “caustics” (from the Greek word meaning capable of burning),
because they are the places where light is most intense. Just as in the focal point,
the action is a minimum between the source and the caustic, and a saddle after
the caustic is crossed. Caustics are observed in everyday phenomena like rainbows
(Boyer, 1987), mirages (Young, 2012), and light reflected on the surface of water
(Berry, 2015).
As an illustration, consider rays propagating in two dimensions, giving rise to a
caustic line. Rays originate at a point M = x1 , reflect at a point P of a mirror, and
reach an “eye” A = x2 , as in Figure 6.1. If we call ρ1 = M P and ρ2 = P A , the
action, or optical path, is S = ρ1 + ρ2 . If the mirror is generated by a vector x(s),
with s the arc length, we have:
S = |x1 − x(s)| + |x2 − x(s)| ≡ ρ1 + ρ2 . (6.209)
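In order to find the point $P$ where the reflection takes place, we compute the first variation of the action by shifting infinitesimally the reflection point along the mirror:
$$\frac{dS}{ds} = -\left[\frac{\mathbf{x}_1 - \mathbf{x}(s)}{|\mathbf{x}_1 - \mathbf{x}(s)|} + \frac{\mathbf{x}_2 - \mathbf{x}(s)}{|\mathbf{x}_2 - \mathbf{x}(s)|}\right]\cdot\frac{d\mathbf{x}(s)}{ds} \tag{6.210}$$
$$\phantom{\frac{dS}{ds}} = -\left(\hat{\mathbf{u}}_1 + \hat{\mathbf{u}}_2\right)\cdot\hat{\mathbf{t}}, \tag{6.211}$$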
where t̂ = dx(s)/ds is the tangent to the mirror, and û1 and û2 are respectively
the unit vectors in the direction of the incident and reflected ray. Not surprisingly,
equating d S/ds to zero we obtain Hero’s law of reflection: the (sine of the) angle
of incidence is equal to the (sine of the) angle of reflection.10 In order to investigate
whether this zero derivative corresponds to a minimum or a saddle, we compute the
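second variation of $S$:
$$\frac{d^2S}{ds^2} = -\left(\frac{d\hat{\mathbf{u}}_1}{ds} + \frac{d\hat{\mathbf{u}}_2}{ds}\right)\cdot\hat{\mathbf{t}} - \kappa\left(\hat{\mathbf{u}}_1 + \hat{\mathbf{u}}_2\right)\cdot\hat{\mathbf{n}}, \tag{6.212}$$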
10 Recall that the angle θ of incidence is the angle the ray makes with the normal to the mirror: sin θ = cos α,
where α is the angle the ray forms with the tangent to the curve.
where we used d t̂/ds = κ n̂, with κ the curvature of the mirror at P, and n̂ the
normal to the mirror, as in Figure 6.11. Using ûi = (xi − x(s)) / |xi − x(s)| we
obtain:
d ûi 1 % &
= −t̂ + (ûi · t̂)ûi . (6.213)
ds ρi
Substituting expression (6.213) in equation (6.212), and using û1 · t̂ = −û2 · t̂ =
sin θ we obtain:

\[ \frac{d^2 S}{ds^2} = \cos\theta \left[ \cos\theta \left( \frac{1}{\rho_1} + \frac{1}{\rho_2} \right) - 2\kappa \right]. \tag{6.214} \]
Let us analyze equation (6.214) for a fixed point of reflection, taking ρ1 and θ
constant, and varying ρ2 , that is, walking away from the reflection point along
the direction of the reflected ray. For very small ρ2 , the 1/ρ2 term dominates
and d 2 S/ds 2 > 0. As we increase ρ2 , we could reach a critical point where
d 2 S/ds 2 = 0. If we keep increasing ρ2 , then d 2 S/ds 2 < 0, and the action is a
saddle. The caustic for a mirror in two dimensions is therefore described by the
following equation:

\[ \cos\theta \left( \frac{1}{\rho_1} + \frac{1}{\rho_2} \right) - 2\kappa = 0. \tag{6.215} \]
It is clear that different positions of the originating light rays with respect to the
mirror will give rise to different caustics.
As a particular example, let us consider parallel rays incident on a circle of radius
R: ρ1 → ∞ and κ = 1/R (see Figure 6.19). The caustic generated is the curve
usually seen at the bottom of a cup of coffee. From equation (6.215), the equation
for the caustic of the cup of coffee is (renaming ρ2 = ρ)
\[ \rho(\theta) = \frac{R}{2} \cos\theta. \tag{6.216} \]
Figure 6.19 Caustic in a cup of coffee.

For rays incident vertically, the coordinate of the reflection point is R(θ) =
R(sin θ, − cos θ) (see Figure 6.19). Since the reflected ray forms an angle 2θ with
the incident ray, the vector ρ(θ) connecting the reflection point with the point in
the caustic is ρ(θ) = (R/2) cos θ (− sin 2θ, cos 2θ). The parametric equations of the
caustic, which we plot in Figure 6.19, are then:
\[ x = \frac{R}{2} \left( 2 \sin\theta - \cos\theta \sin 2\theta \right), \tag{6.217a} \]
\[ y = \frac{R}{2} \left( \cos\theta \cos 2\theta - 2 \cos\theta \right). \tag{6.217b} \]
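The parametric form (6.217) can be checked numerically. The following is a minimal sketch, not part of the original text: the function names `caustic_point` and `tangency_residual` are illustrative choices, and the derivative along the caustic is approximated by a central difference. It evaluates the caustic and verifies that its tangent is parallel to the reflected ray direction (− sin 2θ, cos 2θ), which is the defining property of an envelope of rays.

```python
import math

def caustic_point(theta, R=1.0):
    """Point on the caustic, eqs. (6.217a,b), for parallel rays hitting a circle of radius R."""
    x = 0.5 * R * (2.0 * math.sin(theta) - math.cos(theta) * math.sin(2.0 * theta))
    y = 0.5 * R * (math.cos(theta) * math.cos(2.0 * theta) - 2.0 * math.cos(theta))
    return x, y

def tangency_residual(theta, R=1.0, h=1e-6):
    """Cross product of the caustic tangent (central difference in theta) with the
    reflected-ray direction (-sin 2θ, cos 2θ). It vanishes if the ray is tangent."""
    xp, yp = caustic_point(theta + h, R)
    xm, ym = caustic_point(theta - h, R)
    dx, dy = (xp - xm) / (2.0 * h), (yp - ym) / (2.0 * h)
    ux, uy = -math.sin(2.0 * theta), math.cos(2.0 * theta)
    return dx * uy - dy * ux

if __name__ == "__main__":
    for deg in (0, 20, 40, 60, 80):
        th = math.radians(deg)
        print(deg, caustic_point(th), tangency_residual(th))
```

For θ = 0 the point is the cusp of the caustic, located a distance R/2 below the center of the circle, in agreement with equation (6.216).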
As a second exactly solvable example, we move away from light rays and
consider motion in one dimension for a particle in a potential of the form:
V (x) = λ|x|. (6.218)
This example was considered by Gray and Taylor (2007); see also Gray (2009).
The motion corresponds to a constant force, like constant gravity, that reverses
direction at x = 0.
For a particle starting at x = 0 at t = 0 with initial velocity ẋ(0) = v0, the orbit
is a parabola curving downwards:
\[ x(t) = v_0 t - \frac{1}{2} \lambda t^2, \qquad t < t_0 = \frac{2 v_0}{\lambda}. \tag{6.219} \]
At t = t0, when the particle comes back to the origin, the force reverses direction,
and the orbit becomes an upward curving parabola:
\[ x(t) = v_0 (t_0 - t) + \frac{1}{2} \lambda (t - t_0)^2, \qquad t_0 < t < 2 t_0. \tag{6.220} \]
The motion is periodic, with half period t0 = 2v0/λ and amplitude v0²/(2λ) = λt0²/8. For a
given orbit, there is a critical point at tC = 4t0/3 (see Figure 6.20): for t < tC the
action is a minimum, and for t > tC the action is a saddle. The caustic is described
by the curve,
\[ x_C(t) = -\frac{1}{16} \lambda t^2. \tag{6.221} \]
Figure 6.20 Caustic for a particle moving in the potential V (x) = λ|x|, for initial
conditions x(0) = 0, ẋ(0) = λt0/2.
In Appendix F we derive the caustic for one-dimensional motion treating the
second variation as a quantum mechanical problem.
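The caustic (6.221) is the envelope of the family of orbits launched from x = 0 with different initial velocities. As a hedged numerical sketch (the function names and the brute-force scan over v0 are illustrative assumptions, with unit mass as in equations (6.219) and (6.220)), one can fix a time t, scan the launch speeds whose second branch contains t, and compare the lowest position reached with −λt²/16.

```python
def position(t, v0, lam=1.0):
    """x(t) for V = lam*|x| (unit mass), x(0) = 0, velocity v0 > 0.
    Valid for 0 <= t <= 2*t0, with t0 = 2*v0/lam (eqs. 6.219 and 6.220)."""
    t0 = 2.0 * v0 / lam
    if t <= t0:
        return v0 * t - 0.5 * lam * t**2
    return v0 * (t0 - t) + 0.5 * lam * (t - t0) ** 2

def caustic(t, lam=1.0):
    """Equation (6.221)."""
    return -lam * t**2 / 16.0

def envelope_by_scan(t, lam=1.0, n=4001):
    """Lowest x reached at time t by orbits with t0 <= t <= 2*t0,
    i.e. launch speeds lam*t/4 <= v0 <= lam*t/2, sampled on a uniform grid."""
    lo, hi = lam * t / 4.0, lam * t / 2.0
    return min(position(t, lo + (hi - lo) * k / (n - 1), lam) for k in range(n))

if __name__ == "__main__":
    for t in (1.0, 2.0, 3.0):
        print(t, envelope_by_scan(t), caustic(t))
```

The minimum over the family is attained for v0 = 3λt/8, that is, at t = 4t0/3, which is the critical time quoted above.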
7
Relativity and Least Action
In 1905, Albert Einstein published a series of papers that constitute an unprecedented
display of creativity. The paper Einstein (1905) published in June of this
annus mirabilis deals with the theory of relativity, a work that eventually brought
Einstein rock-star fame. The first sentence of the essay makes an aesthetic obser-
vation: “It is known that Maxwell’s electrodynamics, as usually understood at the
present time, when applied to moving bodies, leads to asymmetries which do not
appear to be inherent in the phenomena” (italics added). Einstein discusses the
meaning of this asymmetry with an example from the theory of electromagnetism.
He observes that a magnet in motion relative to a wire loop produces an electri-
cal current in the wire. According to Maxwell’s theory, different equations apply
when the magnet moves and the wire is stationary and vice versa. In one case, the
magnet is moving with respect to the ether (a universal static substance that acts as
the medium for transmission of light) and in the other, the magnet is at rest with
respect to the ether. This asymmetry was unacceptable to Einstein. If the current is
the same in both cases, then one is looking at the same phenomenon from different
perspectives, or from different reference frames, thus making the idea of an ether
superfluous. McCullagh’s prediction (see the quotation on page 110) of an unex-
pected, simple and beautiful ether is perhaps fulfilled; non-existence is the ultimate
simplicity. The new theory is based on two simple postulates:
1. The laws of physics take the same form for “all reference frames for which the
equations of mechanics hold good” (inertial frames).
2. Light always propagates in empty space with a velocity c which is independent
of the state of motion of the emitting body.
From this starting point, as simple as it is audacious, Einstein leads us through a
path of impeccable logic that culminates in the notion that time, represented by the
ticking of a wristwatch, is not an absolute phenomenon.
It is remarkable that the equations that Einstein derives existed before his work.
In 1895, the Dutch physicist Hendrik A. Lorentz, in order to explain some experiments
by Michelson and Morley, wrote a set of equations (identical to Einstein’s)
in which time appeared as a mathematical variable that depended on velocity and
position. Lorentz distinguished between a true time (the one measured by a clock
at rest) and local time (the one dependent on the location of an event). The crucial
point is that Lorentz considered the local time a mere mathematical fiction used
to simplify an equation. Einstein accepts that fiction as real – a “suspension of
disbelief” of sorts – and incorporates it into his relativistic universe.
In the next few sections, we discuss Einstein’s special relativity paper with a
focus on its bearing on the principle of least action. In the second part of the chap-
ter, we visit the general theory, where the principle of least action is realized as
a “principle of maximal (or extremal) aging:” the paths followed by particles in
curved space-time are geodesics which extremize the time on the wristwatch of a
traveler following this path.
7.1 Simultaneity and the Relativity of Time

Einstein defines the synchronization of two stationary clocks A and B, separated by a
distance $r_{AB}$, by sending a light ray from A to B and back: the ray leaves A at time
$t_A$, reaches B at $t_B$, bounces back immediately and reaches A at time $t'_A$. Since the
time for a light ray to travel from A to B is the same as the time to travel from B to
A, the clocks are synchronized when the relation between the times of these three
events ($t_A$, $t_B$ and $t'_A$) is
\[ t_B - t_A = t'_A - t_B, \tag{7.1} \]
and the universal constant, c, given by
\[ \frac{1}{2}\, c = \frac{r_{AB}}{t'_A - t_A}, \tag{7.2} \]
is the velocity of light. Now assume that we observe these three events from a
reference frame moving in the direction of r AB . Since c is a constant, the light ray
takes a longer time to go from A to B than from B to A: clocks that are synchronous
in one frame are not synchronous when viewed from a moving frame.
Since the degree of this lack of synchronization will depend on the distance
between the clocks, Einstein proposes a linear relation between times t in frame
K and times τ in frame k, in motion at velocity v (see Figure 7.1), of the form
τ (x, y, z, t) = α1 x + α2 y + α3 z + α4 t, where the quantities α1 , α2 , α3 , α4 depend
on the velocity v. The relation has to be linear; otherwise the relation between the
coordinates in different frames will depend on the origin of coordinates, violating
the homogeneity of space.
Figure 7.1 Viewed from the rest frame K, the distance traveled by the ray (thick
line) is ct′ = 2ct0 − vt′, and t′ = 2ct0/(c + v).
Suppose that the light ray of the (imaginary) synchronization experiment is in
the x direction: the events have coordinates y = z = 0 in the rest frame, and are
synchronized in the moving frame:
\[ \frac{1}{2} \left( \tau_A + \tau'_A \right) = \tau_B. \tag{7.3} \]
In order to find the coefficients α1 and α4 let us consider the (space and time)
coordinates of the three events of the synchronization experiment as viewed from
the rest frame: “1”: the light ray leaves at a point (x, t) = (0, 0); “2”: the ray
reaches some point with coordinates (x0, t0 ≡ x0/c); “3”: the ray comes back to
the origin of the coordinates of the moving frame at a time t′ = 2ct0/(c + v) and
spatial coordinate x′ = vt′ (see Figure 7.1).
Substituting these time and space coordinates in equation (7.3),
\[ \frac{1}{2} \left[ \tau(0, 0, 0, 0) + \tau(x', 0, 0, t') \right] = \tau(x_0, 0, 0, t_0), \tag{7.4} \]
we obtain a ratio between α1 and α4: α1/α4 = −v/c². The first relation obtained by
Einstein is then τ(x, 0, 0, t) = a(v)(t − vx/c²), with a(v) a constant to be determined.
Next he looks for a linear relation between each of the spatial coordinates
ξ, η, ζ of the moving frame and x, y, z, t of the stationary frame. For points with
y = z = 0, let us write the linear relation for the ξ coordinate as ξ = b1 x + b2 t.
Since the origin of coordinates of the moving frame (ξ = 0) has coordinates
x = vt in the stationary frame, we have b2 /b1 = −v, or ξ = b(v)(x − vt),
with b(v) a constant to be determined. Now, since the coordinate of a light ray
is ξ = cτ in the moving frame and x = ct in the stationary frame, we have
b(v)(c−v)t = ca(v)(t −vt/c), implying a(v) = b(v): ξ(x, 0, 0, t) = a(v)(x −vt).
In order to obtain the relations between the coordinates perpendicular to the
motion in the moving frame and x, y, z, t, Einstein considers a light ray moving in
the η direction in the moving frame: η = cτ . Viewed from the rest frame, the ray
follows a tilted path, as shown in Figure 7.2.
We write the linear relation between η and the coordinates in the rest frame as
η = d1 x + d2 y + d3 z + d4 t. Consider the following three events (see Figure 7.2):
“1”: the ray is emitted vertically (in the moving frame) at x = y = z = t = 0; “2”:
the ray reaches a point with coordinates y = √(c² − v²) t, x = vt, at time t; “3”: the
ray is back at the origin (x = 2vt) at time 2t. Using equation (7.3) we now have
\[ \frac{1}{2} \left( \tau(0, 0, 0, 0) + \tau(2vt, 0, 0, 2t) \right) = \tau\!\left( vt, \sqrt{c^2 - v^2}\, t, 0, t \right). \tag{7.5} \]
Figure 7.2 A ray moving vertically in the moving frame follows a tilted path as
viewed from the rest frame.
Since we already know the dependence of τ on x and t for y = 0, using
τ (x, y, 0, t) = a(v)(t − vx/c²) + α2 y, we obtain α2 = 0: the moving clocks,
viewed from the stationary frame, remain synchronized in the y direction. Repeating
the same argument for a ray in the ζ direction, we obtain that τ is independent
of the coordinates perpendicular to the direction of motion.
In order to obtain a relation between y and the vertical coordinate η, we write:
\[ \eta = c\tau = c\, a(v) \left( t - \frac{v}{c^2} x \right) = c\, a(v) \left( 1 - \frac{v^2}{c^2} \right) t \tag{7.6} \]
\[ \phantom{\eta} = a(v) \sqrt{1 - \frac{v^2}{c^2}}\; y, \tag{7.7} \]
where in equation (7.6) we used x = vt and in equation (7.7) we used (see Figure
7.2) y = √(c² − v²) t. An identical argument gives ζ = a(v) √(1 − v²/c²) z.
Einstein switches notation from the factor a(v) to φ(v) = a(v) √(1 − v²/c²) and
writes the following relation for the transformed coordinates:
τ = φ(v)γ (t − vx/c²), (7.8a)
ξ = φ(v)γ (x − vt), (7.8b)
η = φ(v)y, (7.8c)
ζ = φ(v)z, (7.8d)
where γ = 1/√(1 − v²/c²). In order to determine the prefactor φ(v), he proceeds in
two steps. First he introduces a third frame, K′, moving at velocity −v with respect
to the moving frame k. Using the transformation equations (7.8), we have that the
time t′ in that frame, viewed from the k frame, is t′ = φ(−v)γ (τ + vξ/c²) =
φ(−v)φ(v)t, and we have analogous relations for the other coordinates. Since the
variables (x, y, z, t) and (x′, y′, z′, t′) refer to systems at rest with respect to one
another, we have
φ(v)φ(−v) = 1. (7.9)
For the second step, he considers a rod of length l (measured in the moving system)
and oriented in the direction η, perpendicular to the motion. From equation (7.8c),
the length of the rod measured in the stationary system is y = l/φ(v). For “reasons
of symmetry,” the length of the rod in the stationary system has to be the same for
the frame k moving in the +x or −x direction: l/φ(v) = l/φ(−v), from which
it follows that φ(v) = φ(−v), and, together with equation (7.9), Einstein obtains
φ(v) = 1. His final expression for the transformation of coordinates is:
τ = γ (t − vx/c2 ), (7.10a)
ξ = γ (x − vt), (7.10b)
η = y, (7.10c)
ζ = z. (7.10d)
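The transformation (7.10) can be exercised numerically. The sketch below is illustrative only: the function names and the choice of units with c = 1 are assumptions, not part of the original text. It checks three properties used in the derivation: a light ray x = ct maps to ξ = cτ, a boost by v followed by a boost by −v returns the original coordinates (the content of φ(v)φ(−v) = 1 with φ = 1), and the combination x² + y² + z² − (ct)² is unchanged.

```python
import math

def boost(x, y, z, t, v, c=1.0):
    """Equations (7.10): coordinates (xi, eta, zeta, tau) in a frame moving at velocity v along x."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return gamma * (x - v * t), y, z, gamma * (t - v * x / c**2)

def interval(x, y, z, t, c=1.0):
    """Quadratic form x^2 + y^2 + z^2 - (ct)^2."""
    return x * x + y * y + z * z - (c * t) ** 2

if __name__ == "__main__":
    event = (1.0, 2.0, 3.0, 4.0)
    moved = boost(*event, v=0.6)
    print("boosted event:", moved)
    print("intervals:", interval(*event), interval(*moved))
    print("round trip:", boost(*moved, v=-0.6))
    print("light ray:", boost(5.0, 0.0, 0.0, 5.0, v=0.6))
```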
7.2 The Relativistic “F = ma”

Part II of Einstein’s historical paper has the title “Electrodynamical Part.” He starts
by showing the covariance of Maxwell’s equations in a vacuum which, in the rest
frame, are $\frac{1}{c}\frac{\partial \mathbf{E}}{\partial t} = \nabla \times \mathbf{B}$ and $\frac{1}{c}\frac{\partial \mathbf{B}}{\partial t} = -\nabla \times \mathbf{E}$. Using the transformation
equations (7.10), he shows that, in the moving frame, the equations retain their form and
become$^{1}$ $\frac{1}{c}\frac{\partial \mathbf{E}'}{\partial t'} = \nabla' \times \mathbf{B}'$ and $\frac{1}{c}\frac{\partial \mathbf{B}'}{\partial t'} = -\nabla' \times \mathbf{E}'$, with the new fields given by:
\[ E'_x = E_x, \qquad B'_x = B_x, \tag{7.11a} \]
\[ E'_y = \gamma \left( E_y - \frac{v}{c} B_z \right), \qquad B'_y = \gamma \left( B_y + \frac{v}{c} E_z \right), \tag{7.11b} \]
\[ E'_z = \gamma \left( E_z + \frac{v}{c} B_y \right), \qquad B'_z = \gamma \left( B_z - \frac{v}{c} E_y \right). \tag{7.11c} \]
1 We switch notations from (ξ, η, ζ, τ ) to (x′, y′, z′, t′).
We present the details of the derivation in Appendix G. The invariance of
Maxwell’s equations under the transformations of equations (7.10) was first derived
by Hendrik Lorentz (1895), and, in what appears to be a reference to this work,
Einstein says, “we have thus shown that . . . the electrodynamic foundation of
Lorentz’s theory . . . agrees with the principle of relativity” (Einstein, 1952, p. 60).
Einstein uses these transformed fields to derive the relativistic version of
Newton’s second law. He considers the motion of a charged particle in an elec-
tromagnetic field. He wants to find out how the particle would accelerate under
the action of an electromagnetic force. Can we still apply F = m ẍ? The answer is
no. For example, under the action of a constant force, according to Newton’s law a
particle will increase its velocity indefinitely, whereas the transformation equations
(7.10) set a limit to the velocity: at v = c, time, viewed from the stationary sys-
tem, will stand still, and “all moving objects shrivel up into plane figures, . . . the
velocity of light plays the part, physically, of an infinitely great velocity” (Einstein,
1952, p. 48).
Consider a particle accelerated under the action of an electromagnetic force.
Einstein starts with the assumption that Newton’s second law is valid when the
particle is instantaneously at rest and is acted upon by an electric field E: m ẍ = eE,
with e the charge of the electron. Next he assumes, without loss of generality, that at
a certain instant t the (accelerated) particle is moving at velocity v in the x direction
as viewed from the stationary frame K . Since the particle is instantaneously at rest
in another frame k (moving at a velocity v with respect to K ), he writes:
\[ m \frac{d^2 x'}{dt'^2} = e E'_x, \tag{7.12a} \]
\[ m \frac{d^2 y'}{dt'^2} = e E'_y, \tag{7.12b} \]
\[ m \frac{d^2 z'}{dt'^2} = e E'_z, \tag{7.12c} \]
where the primes refer to the coordinates and fields measured by an observer in
k. Since the motion of k with respect to K is in the x direction, we have x′ =
γ (x − vt), y′ = y, z′ = z, t′ = γ (t − vx/c²), and the primed fields given by
equations (7.11).
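Since dx′ = γ (dx − v dt) and dt′ = γ (dt − v dx/c²), we have:
\[ \frac{dx'}{dt'} = \frac{\dot{x} - v}{1 - v\dot{x}/c^2}, \tag{7.13} \]
and
\[ \frac{d^2 x'}{dt'^2} = \frac{1}{dt'/dt} \frac{d}{dt}\!\left( \frac{dx'}{dt'} \right) = \frac{1}{\gamma (1 - v\dot{x}/c^2)}\, \frac{\ddot{x}(1 - v\dot{x}/c^2) + (\dot{x} - v)\, v\ddot{x}/c^2}{(1 - v\dot{x}/c^2)^2}. \tag{7.14} \]
Now, since we are considering an instant for which, in the unprimed frame, the
particle is moving at velocity ẋ = v, we have (1 − v ẋ/c²) = 1/γ², and Einstein
writes
\[ m \frac{d^2 x'}{dt'^2} = \gamma^3 m \ddot{x}, \qquad m \frac{d^2 y'}{dt'^2} = \gamma^2 m \ddot{y}, \qquad m \frac{d^2 z'}{dt'^2} = \gamma^2 m \ddot{z}. \tag{7.15} \]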
Using equations (7.15), together with the transformed fields of equation (G.8c)
Einstein obtains:
\[ \gamma^3 m \ddot{x} = e E_x, \tag{7.16a} \]
\[ \gamma m \ddot{y} = e E_y - e \frac{v}{c} B_z, \tag{7.16b} \]
\[ \gamma m \ddot{z} = e E_z + e \frac{v}{c} B_y, \tag{7.16c} \]
where we are using the notation ẍ = d 2 x/dt 2 etc. Einstein remarks that, if one
insists on using Newton’s expression, where force equals mass times acceleration,
then special relativity gives a “longitudinal mass” (the mass in the direction of the
instantaneous velocity) of magnitude γ 3 m and a “transverse mass” equal to γ m for
the acceleration transverse to the velocity. And he remarks that “with a different
definition of force and acceleration we should naturally obtain other values for the
masses” (Einstein, 1952, p. 68). In a footnote – most probably due to Sommerfeld
– to the 1905 article included in the collection The Principle of Relativity (Einstein,
1952), we read “The definition of force here given is not advantageous, as was first
shown by Planck.” In fact, in the first paper on relativity by someone other than
Einstein, Max Planck (1907) wrote the relativistic version of Newton’s second law
where the force is proportional to the rate of change of the momentum mγ ẋ and
derived the relativistic Lagrangian.
Following Planck, we decompose the acceleration into its component parallel to
the velocity, given by (ẍ · v̂)v̂, and the transverse component, given by ẍ − (ẍ · v̂)v̂,
with v̂ = ẋ/v a unit vector in the direction of the velocity. Using these definitions,
the left-hand side of equations (7.16) can be written (omitting the factor m) in
vector form:
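\[ \gamma^3 (\ddot{\mathbf{x}} \cdot \hat{\mathbf{v}})\hat{\mathbf{v}} + \gamma \left[ \ddot{\mathbf{x}} - (\ddot{\mathbf{x}} \cdot \hat{\mathbf{v}})\hat{\mathbf{v}} \right] = \gamma \ddot{\mathbf{x}} + \frac{1}{c^2} \gamma^3 (\ddot{\mathbf{x}} \cdot \dot{\mathbf{x}})\, \dot{\mathbf{x}} \tag{7.17} \]
\[ \phantom{\gamma^3} = \frac{d}{dt} \left( \gamma \dot{\mathbf{x}} \right), \tag{7.18} \]
where we used γ³ − γ = γ³v²/c², and γ = 1/√(1 − ẋ²/c²).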
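The identity between (7.17) and (7.18) can be spot-checked numerically. This is a sketch under stated assumptions: units with c = 1, illustrative function names, and a central-difference derivative of γẋ along the straight line v(t) = v + at in velocity space. It also reproduces the "longitudinal" factor γ³ and the "transverse" factor γ discussed above.

```python
import math

def gamma(v, c=1.0):
    """Lorentz factor for a velocity vector v."""
    return 1.0 / math.sqrt(1.0 - sum(vi * vi for vi in v) / c**2)

def rhs_eq_717(v, a, c=1.0):
    """gamma*a + gamma^3 (a.v) v / c^2, the right-hand side of eq. (7.17)."""
    g = gamma(v, c)
    a_dot_v = sum(ai * vi for ai, vi in zip(a, v))
    return [g * ai + g**3 * a_dot_v * vi / c**2 for ai, vi in zip(a, v)]

def d_dt_gamma_v(v, a, c=1.0, h=1e-6):
    """Central-difference estimate of d(gamma v)/dt for velocity v and acceleration a."""
    def gamma_v(t):
        vt = [vi + ai * t for vi, ai in zip(v, a)]
        g = gamma(vt, c)
        return [g * x for x in vt]
    plus, minus = gamma_v(h), gamma_v(-h)
    return [(p - m) / (2.0 * h) for p, m in zip(plus, minus)]

if __name__ == "__main__":
    v, a = (0.3, 0.4, 0.0), (0.1, -0.2, 0.05)
    print(rhs_eq_717(v, a))
    print(d_dt_gamma_v(v, a))
```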
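The force equation now has the symmetric form
\[ \frac{d}{dt} \left( \frac{m \dot{\mathbf{x}}}{\sqrt{1 - \dot{\mathbf{x}}^2/c^2}} \right) = e\mathbf{E} + \frac{e}{c}\, \dot{\mathbf{x}} \times \mathbf{B}. \tag{7.19} \]
It is interesting that the force $\mathbf{F} = e\mathbf{E} + e \frac{\mathbf{v}}{c} \times \mathbf{B}$, which Lorentz had derived using
d’Alembert’s principle (Lorentz, 1903), emerges as a consequence of relativity from
the purely electrostatic force on a particle at rest.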
Planck remarks that the left-hand side of equation (7.19) can be obtained using
Lagrange’s equation $\frac{d}{dt} \frac{\partial L_0}{\partial \dot{\mathbf{x}}}$, with the kinetic Lagrangian given by:
\[ L_0 = -mc^2 \sqrt{1 - \frac{\dot{\mathbf{x}}^2}{c^2}}. \tag{7.20} \]
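The Lorentz force term on the right-hand side of equation (7.19) can be obtained,
as we saw in Section 5.7, by adding the term −eφ + (e/c)ẋ · A to the Lagrangian.$^{2}$
In other words, the equation of motion (7.19) is equivalent to
\[ \frac{d}{dt} \frac{\partial L}{\partial \dot{\mathbf{x}}} = \frac{\partial L}{\partial \mathbf{x}}, \tag{7.21} \]
with
\[ L = L_0 - e\phi(\mathbf{x}, t) + \frac{e}{c}\, \dot{\mathbf{x}} \cdot \mathbf{A}(\mathbf{x}, t), \tag{7.22} \]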
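and the relation between the fields and the potentials is given by:
\[ \mathbf{E} = -\nabla \phi - \frac{1}{c} \frac{\partial \mathbf{A}}{\partial t}, \tag{7.23a} \]
\[ \mathbf{B} = \nabla \times \mathbf{A}. \tag{7.23b} \]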
7.2.1 The Energy-Momentum Four-Vector

The transformation equations (or Lorentz’s transformations) relating the coordinates
(x, y, z, t) and (x′, y′, z′, t′) in different frames imply:
\[ x^2 + y^2 + z^2 - (ct)^2 = x'^2 + y'^2 + z'^2 - (ct')^2. \tag{7.24} \]
That is, even though the four coordinates of the event change, there is an invariant
given by $\mathbf{x}^2 - (ct)^2$. The invariance can be simply verified using the transformations
x′ = γ (x − vt), y′ = y, z′ = z, t′ = γ (t − vx/c²). Also, using the same
transformation for the differences dx and dt between infinitesimally close events
we have:
\[ (dx)^2 + (dy)^2 + (dz)^2 - (c\, dt)^2 = (dx')^2 + (dy')^2 + (dz')^2 - (c\, dt')^2. \tag{7.25} \]
2 Lagrange’s equations for the term ẋ · A comprise a nice vector calculus exercise.
$\frac{d}{dt} \frac{\partial (\dot{\mathbf{x}} \cdot \mathbf{A})}{\partial \dot{\mathbf{x}}} = \frac{d\mathbf{A}}{dt} = \frac{\partial \mathbf{A}}{\partial t} + (\dot{\mathbf{x}} \cdot \nabla)\mathbf{A}$. On the other hand, $\frac{\partial (\dot{\mathbf{x}} \cdot \mathbf{A})}{\partial \mathbf{x}} \equiv \nabla(\dot{\mathbf{x}} \cdot \mathbf{A})$. The magnetic force results from
the vector identity ∇(ẋ · A) − (ẋ · ∇)A = ẋ × (∇ × A).
The interpretation of the invariant of equation (7.25) is simple: if (dx)² − (c dt)² <
0, we can find a reference frame K0 in which dx = 0. In K0, the “length squared” of
the invariant is −(c dt0)². The value of dt0 is the “proper time interval,” the tick of
a clock at rest in K0. Similarly, if (dx)² − (c dt)² > 0 we can find a reference frame
in which dt = 0 and the value of |dx| is the “proper length” of the interval, the
distance measured with a ruler between two events in a frame in which they occur
at the same time. In relativity we can construct physical magnitudes that share their
transformation properties with space and time upon change between frames. Let us
consider the relativistic momentum p of a free particle, which in the Hamiltonian
language, is given by:
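\[ \mathbf{p} = \frac{\partial L_0}{\partial \dot{\mathbf{x}}} = \frac{m \dot{\mathbf{x}}}{\sqrt{1 - \dot{\mathbf{x}}^2/c^2}}. \tag{7.26} \]
We can invert equation (7.26) to express ẋ in terms of p: $\dot{\mathbf{x}} = c\mathbf{p}/\sqrt{(mc)^2 + \mathbf{p}^2}$.
Using this inversion, the Hamiltonian of the free particle is:
\[ H = \mathbf{p} \cdot \dot{\mathbf{x}} - L = \frac{mc^2}{\sqrt{1 - \dot{\mathbf{x}}^2/c^2}} = c \sqrt{(mc)^2 + \mathbf{p}^2} = E, \tag{7.27} \]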
where we have equated H to the constant of motion E (the energy) since the Hamil-
tonian is independent of time. Notice that, from equations (7.26) and (7.27), and
using ẋ = dx/dt, we can construct the following “four-vector”:

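\[ \left( \mathbf{p}, \frac{E}{c} \right) = \underbrace{\frac{mc}{\sqrt{(c\, dt)^2 - (d\mathbf{x})^2}}}_{\text{invariant}}\; (d\mathbf{x}, c\, dt). \tag{7.28} \]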
This structure of (p, E/c) implies that the four-vector transforms as (dx, c dt) upon
change in coordinates. More precisely, if $(p_x, p_y, p_z, E/c)$ is the four-momentum
in frame K, the components of the vector $(p'_x, p'_y, p'_z, E'/c)$ viewed from a frame
moving in the x direction are
\[ \frac{E'}{c} = \gamma \left( \frac{E}{c} - \beta p_x \right) \tag{7.29a} \]
\[ p'_x = \gamma \left( p_x - \beta \frac{E}{c} \right) \tag{7.29b} \]
\[ p'_y = p_y \tag{7.29c} \]
\[ p'_z = p_z, \tag{7.29d} \]
with β = v/c. Just as in the case of the invariants of equation (7.24), we have the
invariant “length” of the energy-momentum vector
\[ \mathbf{p}^2 - \frac{E^2}{c^2} = \text{Invariant} = -(mc)^2. \tag{7.30} \]
The invariant, the “proper length” of the four momentum, is its value in a frame
when the particle is at rest (p = 0), which leads us to perhaps the most famous
equation in the world:
E 0 = mc2 , (7.31)
with E 0 the energy of the particle at rest. For a particle with velocity v, or
momentum p the energy is
\[ E = \frac{mc^2}{\sqrt{1 - v^2/c^2}} = \sqrt{(cp)^2 + (mc^2)^2}. \tag{7.32} \]
The mass of the particle emerges as a relativistic invariant, a “scalar” of the theory.
It is common in many textbooks to interpret equation (7.19), F = d(mγ ẋ)/dt, in
terms of a mass that increases with the velocity. It is more consistent with relativity
(Taylor and Wheeler, 1999, pp. 246–252) to refer to m as an invariant (Okun, 1989,
2009) and speak of a momentum that is not related linearly with the velocity.
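The transformation (7.29) and the invariant (7.30) lend themselves to a short numerical illustration. The following sketch assumes units with c = 1 by default and uses illustrative function names. It builds the four-momentum from equations (7.26) and (7.27), boosts it along x, and checks that p² − E²/c² stays equal to −(mc)². Boosting into the rest frame of the particle leaves only the component E0/c = mc, as in equation (7.31).

```python
import math

def four_momentum(m, v, c=1.0):
    """(px, py, pz, E/c) for mass m and velocity v = (vx, vy, vz), eqs. (7.26)-(7.27)."""
    g = 1.0 / math.sqrt(1.0 - sum(vi * vi for vi in v) / c**2)
    return (m * g * v[0], m * g * v[1], m * g * v[2], m * g * c)

def boost_four_momentum(p, beta):
    """Equations (7.29): components seen from a frame moving along x with beta = v/c."""
    px, py, pz, e_over_c = p
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return (g * (px - beta * e_over_c), py, pz, g * (e_over_c - beta * px))

def invariant(p):
    """p^2 - E^2/c^2, which equals -(mc)^2 by eq. (7.30)."""
    px, py, pz, e_over_c = p
    return px * px + py * py + pz * pz - e_over_c**2

if __name__ == "__main__":
    p = four_momentum(2.0, (0.5, 0.2, 0.1))
    print(invariant(p), invariant(boost_four_momentum(p, 0.3)))
    print(boost_four_momentum(four_momentum(2.0, (0.6, 0.0, 0.0)), 0.6))
```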
7.2.2 Invariance of the Relativistic Action

The notion of invariants also applies to two events with coordinates (x1, y1, z1, t1)
and (x2, y2, z2, t2) in the rest frame, whose corresponding coordinates in the moving
frame are (x′1, y′1, z′1, t′1) and (x′2, y′2, z′2, t′2). One can immediately verify that
the “generalized scalar product” of the vectors, defined as x1 x2 + y1 y2 + z1 z2 −
(ct1)(ct2), is also an invariant:
\[ x_1 x_2 + y_1 y_2 + z_1 z_2 - (ct_1)(ct_2) = x'_1 x'_2 + y'_1 y'_2 + z'_1 z'_2 - (ct'_1)(ct'_2). \tag{7.33} \]
It is interesting that, while the transformation properties of the fields are not the
same as those of the four coordinates of the events, the transformation property of
the four-vector (A x , A y , A z , φ) of the potentials is identical to those of the coordi-
nates (x, y, z, ct) (notice that we wrote ct for the fourth coordinate to ensure the
components of the four-vectors have units of length). That is, the new coordinates
of the fields for a frame moving in the x direction are:
\[ \phi' = \gamma (\phi - \beta A_x), \tag{7.34a} \]
\[ A'_x = \gamma (A_x - \beta \phi), \tag{7.34b} \]
\[ A'_y = A_y, \tag{7.34c} \]
\[ A'_z = A_z, \tag{7.34d} \]
with β = v/c. We present the details of the derivation of equations (7.34) in
Appendix H. Since the four vector (A, φ) transforms in the same way as (x, ct),
the following “dot product” is an invariant:
\[ x A_x + y A_y + z A_z - \phi\, ct \equiv \mathbf{x} \cdot \mathbf{A} - (ct)\phi = \text{Invariant}. \tag{7.35} \]
A dot product of this kind appears in the action integral of a relativistic particle,
since:
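\[ L\, dt = \underbrace{-mc \sqrt{(c\, dt)^2 - (d\mathbf{x})^2}}_{\text{invariant}} + \frac{e}{c} \underbrace{\left[ d\mathbf{x} \cdot \mathbf{A} - (c\, dt)\phi \right]}_{\text{invariant}}. \tag{7.36} \]
Planck noticed the relativistic invariance of the action $\int_1^2 L\, dt$ and proposed an
interpretation of his quantum mechanical constant h, introduced in 1900 to account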
for the unexplained properties of the radiation of a black body. Since the time
integral of the Lagrangian from a definite initial (space-time) state 1 to a final
state 2 is an invariant, that is, it is the same independent of the choice of reference
frame, “it may be said,” writes Planck, “that every change in nature corresponds to
a definite number of elements of action”:
\[ \int_1^2 L\, dt = nh, \tag{7.37} \]
and h is, reasons Planck, a relativistic invariant. “It is evident that because of this
theorem the significance of the principle of least action is extended in a new direc-
tion” (Planck, 1907). We will return to this discussion in Chapter 8. For Planck,
the principle of least action, “by its form and comprehensiveness, may be said
to have approached most closely to the ideal aim of theoretical inquiry” (Planck,
1909/1915, p. 69) and attains its full clarity in the context of relativity where the
principle “contains all four world coordinates in fully symmetrical order” (Planck,
1910).
7.3 Hamilton-Jacobi Equation for a Relativistic Particle

For the relativistic Hamilton-Jacobi equation, we follow the discussion of Section
6.5. Once we determine the Hamiltonian H (p, x) of the system, the Hamilton-
Jacobi equation of the action S(x, t) is given by equation (6.131):
\[ \frac{\partial S}{\partial t} + H(\nabla S, \mathbf{x}, t) = 0. \tag{7.38} \]
For the Lagrangian of equation (7.22), the momentum is given by

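\[ \mathbf{p} = \frac{\partial L}{\partial \dot{\mathbf{x}}} = m\gamma \dot{\mathbf{x}} + \frac{e}{c} \mathbf{A}. \tag{7.39} \]
We can invert the relation of equation (7.39) as
\[ \sqrt{1 - \dot{\mathbf{x}}^2/c^2} = \frac{mc}{\sqrt{\left( \mathbf{p} - (e/c)\mathbf{A} \right)^2 + (mc)^2}}, \]
and use it to derive the expression for the Hamiltonian H:
\[ H = \mathbf{p} \cdot \dot{\mathbf{x}} - L \tag{7.40} \]
\[ \phantom{H} = m\gamma v^2 + \frac{mc^2}{\gamma} + e\phi \tag{7.41} \]
\[ \phantom{H} = mc^2 \gamma \underbrace{\left( \frac{v^2}{c^2} + \frac{1}{\gamma^2} \right)}_{=1} + e\phi \tag{7.42} \]
\[ \phantom{H} = c \sqrt{\left( \mathbf{p} - \frac{e}{c}\mathbf{A} \right)^2 + m^2 c^2} + e\phi. \tag{7.43} \]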
The resulting Hamilton-Jacobi equation is:
\[ \frac{\partial S}{\partial t} + c \sqrt{\left( \nabla S - \frac{e}{c}\mathbf{A} \right)^2 + m^2 c^2} + e\phi = 0. \tag{7.44} \]
As discussed in Section 5.4.2, if the fields are independent of time, then H
is constant along the paths, and its value is the total energy E. For these time-
independent cases, the function S can be obtained by separation of variables:
S(x, t) = S(x) − Et. For example, for an electron in a Coulomb potential
(φ = −e/r ) and for zero magnetic field (A = 0), we obtain the following
expression for the Hamilton-Jacobi equation:
\[ (\nabla S)^2 = \frac{1}{c^2} \left( E + \frac{e^2}{r} \right)^2 - m^2 c^2. \tag{7.45} \]
We will return to equation (7.45) in Section 8.2.2 when we discuss Sommerfeld’s
quantum derivation of the energy spectrum of the relativistic hydrogen atom.
7.4 The Principle of Equivalence

Einstein’s theory of General Relativity (Einstein, 1915a,b) incorporates the prin-
ciple of relativity into the theory of gravitation. The need to modify Newtonian
mechanics arises in part because in Newtonian mechanics the gravitational force
is an instantaneous action at a distance. In other words, if we suddenly move a
planet that is thousands of miles away from the Earth, the gravitational effects
(however small) would be felt instantaneously on the Earth. This is in conflict with
the Special Theory of Relativity which establishes that information cannot be prop-
agated faster than the speed of light. In addition, in Newtonian mechanics, gravity
is a peculiar force in the following sense. On the one hand, Newton’s second law
states that
Force = Mass × Acceleration, (7.46)
where this “Mass” is the inertial mass $m_I$ of the body, which expresses the extent to
which it resists a change in motion. On the other hand, the gravitational attraction
on the same body due to a second body of mass M gives rise to a force
\[ \frac{G m_G M}{r^2}, \tag{7.47} \]
where $m_G$ is the gravitational mass of the body. The acceleration of this body
will be
\[ \text{Acceleration} = \frac{m_G}{m_I} \frac{G M}{r^2}. \tag{7.48} \]
Now, from Galileo’s experiments, and to current experimental accuracy,
\[ m_G = m_I. \]
This means that, given some initial conditions, the motion of a body under the
action of gravitational forces is independent of its nature. This is called the “weak
equivalence principle:” the dynamics of the particle is specified by a single world
line. This suggests that we can locally eliminate gravity by transforming to a mov-
ing frame that is in free fall with the particle. For example, consider that you are in
an elevator in free fall. Since all the particles feel the same acceleration, the effect
of gravity has disappeared. A change of frame of reference has eliminated grav-
ity. Gravity must therefore be a fictitious force that arises in non-inertial frames of
reference.
We will examine the principle of equivalence for an elevator in free (vertical)
fall assuming that the gravitational field is constant. Later we will see that this is a
limited approximation: if it were valid, bodies could be accelerated indefinitely to
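infinite momenta. Consider two clocks A and B in the elevator separated by a distance
h in the vertical direction. The two clocks have velocities $v_A$ and $v_B$ as they pass
successively very close to a clock C at rest. We apply special relativity to relate
the time between clicks of the different clocks. As shown in the previous chapter,
for an observer at rest with clock C, the time $\Delta t_A$ is slower than the clicks of her
clock $\Delta t_C$:
\[ \Delta t_A = \frac{\Delta t_C}{\sqrt{1 - v_A^2/c^2}} \simeq \Delta t_C \left( 1 + \frac{1}{2} \frac{v_A^2}{c^2} \right), \tag{7.49} \]
and
\[ \Delta t_B = \frac{\Delta t_C}{\sqrt{1 - v_B^2/c^2}} \simeq \Delta t_C \left( 1 + \frac{1}{2} \frac{v_B^2}{c^2} \right), \tag{7.50} \]
which means that
\[ \Delta t_B \simeq \Delta t_A \left[ 1 + \frac{1}{2c^2} \left( v_B^2 - v_A^2 \right) \right] = \Delta t_A \left( 1 + \frac{gh}{c^2} \right), \tag{7.51} \]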
where we have used the principle of conservation of energy, which is valid in this
form provided the velocities are small. If the gravitational potential is not equal to
gh, more generally, we have

\[ \Delta t_B = \Delta t_A \left( 1 + \frac{\Phi_A - \Phi_B}{c^2} \right), \tag{7.52} \]
where $\Phi_{A,B}$ denote the gravitational potentials at points A and B. Now we can
compare the (infinitesimal) time between ticks dt(x) of a clock in a region with a
potential Φ(x) with that of a clock at infinity (which we will call dt),
where the potential is zero:
\[ dt(\mathbf{x}) = \left( 1 + \frac{\Phi(\mathbf{x})}{c^2} \right) dt. \tag{7.53} \]
Equation (7.53) implies that if a source at x emits light with a frequency f (x)
(measured with respect to clocks at x), the frequency f upon arrival at a region
where Φ = 0 is given by:
\[ f = f(\mathbf{x}) \left( 1 + \frac{\Phi(\mathbf{x})}{c^2} \right). \tag{7.54} \]
Einstein derives equation (7.54) in his paper “On the Influence of Gravitation on
the Propagation of Light” (Einstein, 1911) and states the first of the three canon-
ical tests of his theory of gravitation: “according to our view the spectral lines of
sunlight, as compared with the corresponding spectral lines of terrestrial sources of
light, must be somewhat displaced towards the red.” It turns out that this prediction
is very difficult to measure astrophysically and it was conclusively tested by Pound
and Rebka (1959) who used a very sensitive technique to measure the relative red
shift of two sources situated only meters away, at the top and bottom of Harvard
University’s Jefferson tower.
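To get a feeling for the size of the effect in equation (7.54), the fractional frequency shift can be evaluated for the two situations mentioned above. This is a rough sketch with assumed inputs that are not given in the text: a tower height of about 22.5 m, g = 9.81 m/s², and nominal solar values GM = 1.32712440018 × 10²⁰ m³/s² and R = 6.957 × 10⁸ m. The function names are illustrative.

```python
C = 299_792_458.0          # speed of light in m/s
G_SURFACE = 9.81           # assumed gravitational acceleration at the Earth's surface, m/s^2
GM_SUN = 1.32712440018e20  # assumed nominal solar GM, m^3/s^2
R_SUN = 6.957e8            # assumed nominal solar radius, m

def fractional_shift(potential_difference, c=C):
    """(f - f(x))/f(x) = Phi/c^2, from eq. (7.54), for a potential difference Phi."""
    return potential_difference / c**2

def tower_shift(height_m, g=G_SURFACE, c=C):
    """Magnitude of the shift between top and bottom of a tower (uniform field, Phi = g*h)."""
    return fractional_shift(g * height_m, c)

def solar_shift(gm=GM_SUN, radius=R_SUN, c=C):
    """Shift of light emitted at the solar surface, Phi = -GM/R (negative means redshift)."""
    return fractional_shift(-gm / radius, c)

if __name__ == "__main__":
    print("tower (22.5 m assumed):", tower_shift(22.5))
    print("solar surface:", solar_shift())
```

With these assumed inputs the shift for the tower is a few parts in 10¹⁵, while for sunlight it is about two parts in a million, which illustrates how demanding the terrestrial measurement is.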
In the following paragraphs of his 1911 paper, Einstein discusses a “consequence
which is of fundamental importance for our theory”: the velocity of light can be
written as a function of position. He stresses that the velocities of light measured
by local observers (at r and at infinity) are the same. However, if an observer at
r sends two pulses at the consecutive times when a light ray passes the ends of a
rigid rod, those times would be different upon arrival at the position of a distant
observer. For the distant observer, the velocity at x, says Einstein, will be given by
the relation

\[ c(\mathbf{x}) = c_0 \left( 1 + \frac{\Phi(\mathbf{x})}{c_0^2} \right), \tag{7.55} \]
with Φ(x) the gravitational potential and c0 the velocity of light in vacuum. (For a
spherically symmetric star, Φ = −G M/r where r is the distance to the center of
the star.)
7.4.1 Bending of Light Rays According to the Equivalence Principle

Since the velocity depends on position, Einstein infers that, “by means of Huygens’
principle,” light propagating across a gravitational field will undergo deflection.
This is the second test of his theory.
Since, as we saw in Section 6.6, Huygens’ principle is equivalent to Fermat’s
principle, in order to calculate the deflection as proposed by Einstein, we can use
the “equation of motion” of the light ray (equation 6.155):
\[ \frac{d\hat{\mathbf{t}}}{ds} = \hat{\mathbf{t}} \times \left( \frac{\nabla n}{n} \times \hat{\mathbf{t}} \right), \tag{7.56} \]
with n the index of refraction, given by:
\[ n(r) = \frac{1}{1 - \dfrac{GM}{c^2 r}} \approx 1 + \frac{GM}{c^2 r}, \tag{7.57} \]
where we used c for the speed of light in vacuum. We will compute the deflection
of a light ray that, for x = 0 (see Figure 7.3) is tangential to the star of radius R.
Since we expect a small deflection, to lowest order in G M/c2r , in the right hand
side of equation (7.56) we can write t̂ ≈ t̂0 = ı̂. Also:
$$\frac{\nabla n}{n} \approx \nabla n = -\frac{GM}{c^2 r^3}\left(\hat{\imath}\,x + \hat{\jmath}\,y\right). \tag{7.58}$$
[Figure 7.3: Bending of a light ray grazing a star of radius R.]
Substituting equation (7.58) in equation (7.56) we obtain:
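$$\int_0^\infty \frac{d\hat{t}}{ds}\,ds \equiv \hat{t}_\infty - \hat{\imath} = -\hat{\jmath}\int_0^\infty ds\; y\,\frac{GM}{c^2 r^3}, \tag{7.59}$$
which coincides with Einstein’s integral for the deflection. Since the deflection is
small, we can use $ds = dx$, $y \approx R$ and $r = \sqrt{R^2 + x^2}$. Also, $\hat{t}_\infty - \hat{\imath} \approx -\theta\,\hat{\jmath}$, and we obtain
Einstein’s expression for the total deflection, $\alpha = 2\theta$:
$$\alpha = 2\,\frac{GMR}{c^2}\int_0^\infty \frac{dx}{\left(R^2 + x^2\right)^{3/2}} = \frac{2GM}{c^2 R}. \tag{7.60}$$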
It is very interesting that, unknown to Einstein, this expression was calculated
more than a century earlier by Johann Georg Soldner (1801). And the history of the
deflection of light problem began perhaps even earlier, with Newton, who, in the
conclusions of his Treatise on Opticks, proposed the following query: “Do not Bodies
act upon Light at a distance, and by their action bend its Rays; and is not this
action strongest at the least distance?” (Newton, 1718, p. 313). Soldner’s calculation
seems to have been prompted by Laplace’s idea (from 1798!) that a star could
be made invisible by its gravitational field. Laplace’s proposal (strikingly resembling
the current notion of black holes) is in the first edition of his Exposition du
système du monde (Hawking & Ellis, 1973). Laplace calculates the radial motion
of a particle, for which the equation of motion is $d^2r/dt^2 = -GM/r^2$ (independent
of the mass, as we have already seen). He proves and uses the conservation
of energy and shows that the escape velocity is given by $v^2 = 2GM/R$, where R
is the radius of the star. If $2GM/R > c^2$, light would not be able to escape, and
the star would be invisible. Soldner follows Laplace’s idea applied to the bending
and obtains the result of equation (7.60). The fact that a Newtonian calculation
(Soldner) for the bending and an index of refraction calculation (Einstein) give
the same result is expected from our discussion of the optical mechanical analogy.
According to the analogy, the path of a particle and that of a light ray are identical
if we identify the index of refraction n(x) with the momentum p(x), given by

$$p(x) \propto \sqrt{E - m\Phi(x)} \propto \sqrt{1 - \frac{m\Phi(x)}{E}} \approx 1 - \frac{m\Phi(x)}{2E}, \tag{7.61}$$
where we used the approximation for a very fast particle ($E \gg m|\Phi(x)|$ for all points
of the path). The energy E is the kinetic energy at infinity of a particle moving at
the velocity of light, which, from a Newtonian perspective, is $E = \frac{1}{2}mc^2$. Using
$\Phi(x) = -GM/r$ we obtain:
$$p(r) \approx 1 + \frac{GM}{c^2 r}, \tag{7.62}$$
which coincides with equation (7.57).
[Figure 7.4: Newtonian bending of a particle of mass m. Using the constancy of the
Laplace-Runge-Lenz vector A along the trajectory, we obtain for the deflection
angle $\tan\theta = GMm^2/(R\,p_0 p_\infty)$.]
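As a quick numerical check of equation (7.60), the following sketch evaluates the deflection integral for a ray grazing the Sun and compares it with the closed form $2GM/c^2R$. The rounded constants and the function names are assumptions introduced here for illustration; they do not come from Einstein's paper or from the text.

```python
import math

# Rounded SI values; these numbers are assumptions of this sketch.
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light in vacuum, m/s
M_SUN = 1.989e30     # solar mass, kg
R_SUN = 6.957e8      # solar radius, m


def deflection_closed_form(mass, radius):
    """Total deflection alpha = 2GM/(c^2 R) of equation (7.60), in radians."""
    return 2.0 * G * mass / (C**2 * radius)


def deflection_numeric(mass, radius, steps=100_000):
    """Evaluate alpha = 2 (GMR/c^2) * integral_0^inf dx / (R^2 + x^2)^(3/2).

    The substitution x = R tan(u) maps the half-line onto 0 <= u < pi/2 and
    turns the integrand into cos(u) / R^2, which the midpoint rule handles well.
    """
    h = (math.pi / 2.0) / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += math.cos(u)
    integral = total * h / radius**2
    return 2.0 * G * mass * radius / C**2 * integral


def to_arcsec(angle_rad):
    """Convert an angle from radians to seconds of arc."""
    return math.degrees(angle_rad) * 3600.0


if __name__ == "__main__":
    print(to_arcsec(deflection_closed_form(M_SUN, R_SUN)))
    print(to_arcsec(deflection_numeric(M_SUN, R_SUN)))
```

With these inputs both routes give a deflection a little below 0.9 seconds of arc, the value associated with the 1911 (and Soldner) calculation.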
7.4.2 Bending of Light Rays, Newtonian Calculation
As an exercise, let us use Newtonian mechanics to compute the bending of a light
ray by first solving the bending for a particle and then taking the particle velocity
to be very large. In Section 5.4.2, we showed that the Laplace-Runge-Lenz vector
R (we will call it A in the present section) given by equation (5.57) is a constant
of motion. For an unbound orbit that corresponds to the path of a light ray, we can
evaluate A at two points: at the grazing point, where the particle’s linear momentum
is $\mathbf{p}_0$ and the magnitude of the angular momentum is $\ell = p_0 R$ (see Figure 7.4), and
at infinity, where the momentum $\mathbf{p}_\infty$ and the radial unit vector $\hat{u}_{r,\infty}$ are collinear.
For the choice of axes of Figure 7.4, A points in the y direction, and since it is a
constant of motion, we have:
$$\mathbf{A} = \boldsymbol{\ell} \times \mathbf{p}_0 + GMm^2\,\hat{u}_{r,0} = \boldsymbol{\ell} \times \mathbf{p}_\infty + GMm^2\,\hat{u}_{r,\infty}, \tag{7.63}$$
with
$$\boldsymbol{\ell} = -p_0 R\,\hat{k}, \tag{7.64a}$$
$$\mathbf{p}_0 = p_0\,\hat{\imath}, \tag{7.64b}$$
$$\hat{u}_{r,0} = \hat{\jmath}, \tag{7.64c}$$
$$\mathbf{p}_\infty = p_\infty\,\hat{u}_{r,\infty} \equiv p_\infty\left(\cos\theta\,\hat{\imath} - \sin\theta\,\hat{\jmath}\right). \tag{7.64d}$$
Since the x component of A is zero for all points of the path, we use equations
(7.64) and evaluate $A_x$ at a point “at infinity,”
$$\left(\boldsymbol{\ell} \times \mathbf{p}_\infty + GMm^2\,\hat{u}_{r,\infty}\right)_x = GMm^2\cos\theta - R\,p_0 p_\infty\sin\theta = 0, \tag{7.65}$$
and obtain:
$$\tan\theta = \frac{GM}{R\,v_0 v_\infty}. \tag{7.66}$$
Equation (7.66) coincides with Soldner’s expression.[3] For a very fast particle, the
velocities at infinity and at closest approach are approximately equal, $v_0 \sim v_\infty \sim c$,
and the angle θ is small ($\tan\theta \sim \theta$), so that
$$\alpha = 2\theta = \frac{2GM}{Rc^2}, \tag{7.67}$$
in agreement with Einstein’s 1911 result. Later, in the culmination of several
attempts, Einstein (1916a) would present his general theory of relativity and predict
that the bending is twice as big as his equivalence calculation predicted.
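Equation (7.66) can be cross-checked against the geometry of the Kepler hyperbola. The sketch below is only an illustration: it computes θ from (7.66), with $v_\infty$ obtained from Newtonian energy conservation, and compares it with the asymptote relation $\sin\theta = 1/e$. The eccentricity formula $e = Rv_0^2/GM - 1$ for periapsis distance R and periapsis speed $v_0$ is a standard Kepler-problem result brought in for the comparison, not something derived in this section, and the function names are hypothetical.

```python
import math


def theta_from_lrl(gm, radius, v0):
    """Half-deflection angle from equation (7.66): tan(theta) = GM / (R v0 v_inf).

    v_inf follows from Newtonian energy conservation between the grazing point
    (speed v0 at distance R) and infinity; the orbit must be unbound.
    """
    v_inf = math.sqrt(v0**2 - 2.0 * gm / radius)
    return math.atan(gm / (radius * v0 * v_inf))


def theta_from_hyperbola(gm, radius, v0):
    """Same angle from the Kepler hyperbola, using sin(theta) = 1/e.

    The eccentricity e = R v0^2 / GM - 1 (periapsis distance R, periapsis
    speed v0) is an assumption imported from the standard Kepler problem.
    """
    eccentricity = radius * v0**2 / gm - 1.0
    return math.asin(1.0 / eccentricity)


if __name__ == "__main__":
    # Units with GM = R = 1; v0 must exceed the escape speed sqrt(2).
    for v0 in (2.0, 3.0, 10.0):
        print(v0, theta_from_lrl(1.0, 1.0, v0), theta_from_hyperbola(1.0, 1.0, v0))
```

The two columns agree to rounding error, as they should, since $e^2 - 1 = (R\,v_0 v_\infty/GM)^2$.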
A Remark: The Actual “Newtonian” Bending Is Zero

The alert reader might wonder if there is an inconsistency in the “Newtonian” calculation
of the bending of a light ray: we took the limit v → c, and in doing so one
should have incorporated special relativity. If we do so, we obtain zero bending.
Consider Newton’s equation $\mathbf{F} = d\mathbf{p}/dt$, which is valid in both classical Newtonian
mechanics and special relativity. The difference, of course, is that in special relativity
$\mathbf{p} = \gamma m\mathbf{v}$ and not $\mathbf{p} = m\mathbf{v}$. Let us compute the bending in a simplified way, considering
a fast particle that feels the effect of the gravitational force only during the
short time Δt that it takes to graze the star: $\Delta t = 2R/v \approx 2R/c$, with R the radius
of the star. During that short interval, the magnitude of the gravitational force is
approximately constant, given by:
$$F \approx \frac{GMm}{R^2}, \tag{7.68}$$
and we obtain for the change in momentum,
$$\Delta p \approx \frac{GMm}{R^2}\,\Delta t = \frac{2GMm}{Rc}. \tag{7.69}$$
The change in momentum of equation (7.69) is the same for both the classical
Newtonian and special relativity cases. In the non-relativistic case, $\Delta p = m\,\Delta v$,
and taking into account that Δp is perpendicular to the velocity, the bending angle
is given by
$$\alpha = \frac{\Delta v}{v} \approx \frac{\Delta v}{c} = \frac{2GM}{Rc^2}. \tag{7.70}$$
In the special relativity case, however,
$$\Delta p = m\gamma\,\Delta v + mv\,\Delta\gamma = m\gamma^3\,\Delta v. \tag{7.71}$$
As v approaches c, the factor γ approaches infinity: even though Δp is a finite
quantity, Δv vanishes. The special relativistic bending is obtained, as in the
classical Newtonian case, from the change of the velocity:
$$\alpha \approx \frac{\Delta v}{c} = \left(1 - \frac{v^2}{c^2}\right)^{3/2}\frac{2GM}{Rc^2} \to 0 \quad \text{(special relativity)}. \tag{7.72}$$
The result is an indication that special relativity is not the correct theory for gravity.
Newton’s law for special relativity was deduced by Einstein from the covariance of
Maxwell’s equations, so our calculation would apply for the bending due to a static
electric charge but not due to a static gravitational field. Special relativity then tells
us that a static electric field does not bend light. The bending of light therefore is a
general relativistic, non-Newtonian effect.
[3] Soldner chooses the units so that R = 1. Otherwise the expression is identical.
7.5 Space-Time is Curved

Einstein’s use of the principle of equivalence implies that time is affected by
gravitation. We can interpret this result as implying that the gravitational field
changes the geometry of space-time. If we consider two events separated by
(dt, dx, dy, dz), with dt the time interval measured by a distant clock, the “distance”
ds between these points depends on position, and is given by (assuming
that the gravitational field is small, or $|\Phi|/c^2 \ll 1$):
$$(ds)^2 = (c\,d\tau)^2 = \left(1 + \frac{2\Phi(r)}{c^2}\right)(c\,dt)^2 - \left[(dx)^2 + (dy)^2 + (dz)^2\right]. \tag{7.73}$$
The interval ds of equation (7.73) implies that space (but not space-time) is flat:
the distance between events for which dt = 0 obeys the Pythagorean theorem.
However, Einstein had good reasons to abandon the idea of a flat space in the
presence of gravity. In the third paragraph of his 1916 paper, he invites us to consider
two reference frames, K and K′, having a common origin O of coordinates,
with K′ rotating uniformly with respect to K. Consider a circle centered at O and
“suppose that the circumference and diameter of this circle have been measured
with a unit measure infinitely small compared with the radius and that we have
the quotient of the two results.” For an observer on K the ratio is π. However,
due to the Lorentz contraction (a special relativity concept) in the direction of
motion (and perpendicular to the radius of the circle), an observer on K′ would
measure a larger ratio. A circle of radius R and perimeter $P \neq 2\pi R$ is possible
in a curved space, just as the distance D along a meridian from the North Pole to
the Equator is larger than P/2π, with P the perimeter of the equator. Now, according
to the principle of equivalence, K′ “may also be considered as a system at rest
with respect to which there is a gravitational field. . . . We therefore arrive at the
result: the gravitational field influences and even determines the metrical laws of
the space-time continuum. . . . In the presence of a gravitational field the geometry
is not Euclidean” (Einstein, 1922, p. 64). Einstein’s idea is that in the presence of
gravity the description of space-time is specified by a metric tensor $g_{\alpha\beta}$ in such a
way that
$$(ds)^2 = g_{\alpha\beta}\,dx^\alpha dx^\beta, \tag{7.74}$$
with the indices α and β running from 1 to 4, and the convention that whenever
we see repeated indices we sum over those indices. When there is no gravitational
field, the tensor $g_{\alpha\beta}$ is diagonal. For example, in Cartesian coordinates
$g_{xx} = g_{yy} = g_{zz} = -1$, and $g_{tt} = 1$. Once the metric tensor is known, the (four-dimensional)
path followed by a particle between two space-time points, the “geodesic,” is the
one for which the length s (or the proper time τ = s/c), given by
$$s \equiv \int ds = \int \sqrt{g_{\alpha\beta}\,dx^\alpha dx^\beta}, \tag{7.75}$$
is an extremum. The physical paths are therefore determined by the principle of
extremal aging: “The path a free object takes between two events in space-time is
the path for which the time lapse between these events, recorded on the object’s
wristwatch, is an extremum” (Taylor and Wheeler, 2000).
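The principle of extremal aging is easiest to see when there is no gravitational field, so that $g_{tt} = 1$ and $g_{xx} = -1$. The sketch below is a toy illustration under that assumption, with units chosen so that c = 1 and hypothetical function names: it compares the wristwatch time of a traveller who stays at x = 0 with that of travellers who make an out-and-back detour between the same two events.

```python
import math

C = 1.0  # speed of light; units with c = 1 are an assumption of this sketch


def proper_time_via(x_mid, total_time=10.0):
    """Wristwatch time between the events (t=0, x=0) and (t=total_time, x=0)
    for a traveller who moves at constant speed to x_mid, reached at half
    time, and then returns. In flat space-time each straight leg contributes
    sqrt(dt^2 - dx^2 / c^2), from equation (7.75) with the flat metric."""
    half = total_time / 2.0
    return 2.0 * math.sqrt(half**2 - (x_mid / C) ** 2)


# Detours must satisfy |x_mid| < c * total_time / 2 to stay slower than light.
detours = [0.0, 1.0, 2.0, 3.0, 4.0]
elapsed = [proper_time_via(x) for x in detours]

if __name__ == "__main__":
    for x, tau in zip(detours, elapsed):
        print(f"x_mid = {x:.1f}  ->  proper time = {tau:.4f}")
```

The straight worldline (no detour) records the longest proper time, so in this flat case the extremum of the free path is a maximum.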
In four-dimensional space there are 16 components of the metric tensor, but,
since $g_{\alpha\beta} = g_{\beta\alpha}$, the number of independent functions reduces to 10. The metric
tensor, with its 10 fields, plays the role of the (scalar) potential Φ in Newton’s
theory. And, just as in Newton’s theory, where the potential is related to
matter through the equation $\nabla^2\Phi(x) = 4\pi G\rho(x)$, with ρ(x) the mass density,
Einstein was able to find a specification of the metric in terms of matter.
The precise relation is expressed in Einstein’s field equations, which state that
a certain tensor describing the distribution of matter equals a tensor describing
the curvature of space-time, with the curvature tensor involving $g_{\alpha\beta}$. We will
not explore the full generality of these equations, which are justly considered
the most beautiful theory in physics, but will exemplify their predictions
in the case of weak gravity for a spherically symmetric, static star. In his 1916
article, Einstein considered the weak gravity correction to the Newtonian theory
arising from his theory, and was able to calculate the precession of the
perihelion of Mercury and to correct his earlier result for the bending of a
light ray. We will present Einstein’s result using the metric $g_{\alpha\beta}$ found by Karl
Schwarzschild (1916) and published a little more than a month after the publication
of Einstein’s theory. Whereas Einstein used rectangular coordinates to
approximate the gravitational field around a spherically symmetric, static mass,
Schwarzschild chose a polar coordinate system and found the following exact result
for ds:
$$(ds)^2 = (c\,dt)^2\left(1 - \frac{2GM}{rc^2}\right) - \frac{(dr)^2}{1 - \dfrac{2GM}{rc^2}} - r^2\left[(d\theta)^2 + \sin^2\theta\,(d\phi)^2\right]. \tag{7.76}$$
Schwarzschild’s metric has the features mentioned in Einstein’s rotating frame
argument: for the choice of coordinates of equation (7.76), a spherical surface of
radial coordinate r has an area $4\pi r^2$, but the radial distance between two concentric
spheres of radial coordinates r + dr and r is $dr/\sqrt{1 - 2GM/rc^2}$ and not just
dr, which would be the case for Euclidean geometry.
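To get a feeling for this non-Euclidean feature, one can add up the radial distance $dr/\sqrt{1 - 2GM/rc^2}$ between two spheres of finite separation. The sketch below does so numerically and compares with a closed-form antiderivative. The shorthand `r_s` for $2GM/c^2$, the choice of units and the function names are assumptions made here for illustration.

```python
import math


def proper_radial_distance(r1, r2, r_s, steps=200_000):
    """Midpoint-rule integral of dr / sqrt(1 - r_s / r) from r1 to r2,
    where r_s stands for 2GM/c^2 and r_s < r1 < r2."""
    h = (r2 - r1) / steps
    total = 0.0
    for i in range(steps):
        r = r1 + (i + 0.5) * h
        total += 1.0 / math.sqrt(1.0 - r_s / r)
    return total * h


def proper_radial_distance_exact(r1, r2, r_s):
    """Same integral from the antiderivative
    F(r) = sqrt(r (r - r_s)) + r_s * ln(sqrt(r) + sqrt(r - r_s))."""
    def antiderivative(r):
        return math.sqrt(r * (r - r_s)) + r_s * math.log(math.sqrt(r) + math.sqrt(r - r_s))
    return antiderivative(r2) - antiderivative(r1)


if __name__ == "__main__":
    # Units in which r_s = 1; spheres at radial coordinates 2 and 10.
    print(proper_radial_distance(2.0, 10.0, 1.0))
    print(proper_radial_distance_exact(2.0, 10.0, 1.0))
    print("coordinate difference:", 10.0 - 2.0)
```

In this example the measured radial distance exceeds the coordinate difference of 8, even though the two spheres have the Euclidean areas $4\pi r^2$.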
7.6 Weak Gravity around a Static, Spherical Star

For weak gravity ($GM/rc^2 \ll 1$), we can approximate the metric of equation
(7.76) by
$$(ds)^2 \approx (c\,dt)^2\left(1 - \frac{2GM}{rc^2}\right) - (dr)^2\left(1 + \frac{2GM}{rc^2}\right) - r^2\left[(d\theta)^2 + \sin^2\theta\,(d\phi)^2\right]. \tag{7.77}$$
We are now in a position to calculate the orbits of a planet around a static, non-rotating
spherical star of mass M, which is a very good approximation of our sun,
even though our sun is not a perfect sphere and has some rotation. We have to
find the path (in four dimensions) that is an extremum of $\int ds$, with ds given by
equation (7.77).
7.6.1 Precession of the Perihelion of Mercury

Without loss of generality, we will consider planar motion (θ = π/2). Also, just as
we did in our treatment of Maupertuis’s action in equation (5.21), we parameterize
the curve with a parameter λ that runs from zero to one between the (fixed, four-
dimensional) extremes of the curve:
$$s = \int_0^1 d\lambda\; L(r, \phi, t; \dot{r}, \dot{\phi}, \dot{t}), \tag{7.78}$$
with a Lagrangian L given by:
$$L = \frac{ds}{d\lambda} = \left[c^2\dot{t}^2\left(1 - \frac{2GM}{rc^2}\right) - \dot{r}^2\left(1 + \frac{2GM}{rc^2}\right) - r^2\dot{\phi}^2\right]^{1/2}, \tag{7.79}$$
and $\dot{t} = dt/d\lambda$, $\dot{r} = dr/d\lambda$, $\dot{\phi} = d\phi/d\lambda$. Finding the orbits in the present problem
is formally equivalent to finding the path of a light ray following a minimum
principle, as we did in Section 6.7.1. The equations of motion for r, φ and t are
given by Lagrange’s equations (5.39). From the structure of L, we spot two cyclic
variables: t and φ (L does not depend on these variables; it only depends on
their velocities). We therefore have two constants of motion, $\partial L/\partial\dot{t}$ and $\partial L/\partial\dot{\phi}$:

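$$\frac{\partial L}{\partial\dot{t}} = \frac{1}{L}\,c^2\dot{t}\left(1 - \frac{2GM}{rc^2}\right) = c\,\frac{dt}{d\tau}\left(1 - \frac{2GM}{rc^2}\right) = E, \tag{7.80a}$$
$$-\frac{\partial L}{\partial\dot{\phi}} = \frac{1}{L}\,r^2\dot{\phi} = \frac{1}{c}\,r^2\frac{d\phi}{d\tau} = \frac{\ell}{c}, \tag{7.80b}$$
where we have used $\frac{1}{L}\frac{d}{d\lambda} = \frac{d}{ds} = \frac{1}{c}\frac{d}{d\tau}$, with τ the proper, “wristwatch” time.
We need a third equation of motion for r(λ). We have two ways of finding this
equation: from the Lagrangian ($\frac{d}{d\lambda}\frac{\partial L}{\partial\dot{r}} = \frac{\partial L}{\partial r}$), or from the invariant interval ds of
equation (7.77):
$$c^2\left(\frac{dt}{d\tau}\right)^2\left(1 - \frac{2GM}{rc^2}\right) - \left(\frac{dr}{d\tau}\right)^2\left(1 + \frac{2GM}{rc^2}\right) - r^2\left(\frac{d\phi}{d\tau}\right)^2 = c^2. \tag{7.81}$$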
Using equation (7.80) for the constants of motion and staying within the weak
gravity approximation we obtain:
$$E^2\left(1 + \frac{2GM}{rc^2}\right) - \left(\frac{dr}{d\tau}\right)^2\left(1 + \frac{2GM}{rc^2}\right) - \frac{\ell^2}{r^2} = c^2, \tag{7.82}$$
and, for the radial equation we obtain:
$$\frac{1}{2}\left(\frac{dr}{d\tau}\right)^2 = \bar{E} + \frac{GM}{r} - \frac{\ell^2}{2r^2} + \frac{\ell^2 GM}{c^2 r^3}, \tag{7.83}$$
with $\bar{E} = (E^2 - c^2)/2$. Equation (7.83) can be regarded as the radial equation of
a Newtonian particle of energy $\bar{E}$ and angular momentum ℓ moving in a potential
V(r) given by:
$$V(r) = -\frac{GM}{r} + \frac{\ell^2}{2r^2} - \frac{\ell^2 GM}{c^2 r^3}, \tag{7.84}$$
which is the Keplerian problem with a perturbative term given by $-\ell^2 GM/c^2 r^3$.
The only difference is that the evolution of r, instead of being expressed in terms
of the Newtonian time t, is given in terms of the proper time τ. But, as we will see
shortly, even with that difference we can compute the precession of the perihelion
of Mercury using a Newtonian approach with a perturbed potential.
In Figure 7.5 we show the difference between the relativistic V(r) and the Keplerian
potential, which is obtained from V(r) by taking the limit c → ∞. For given
values of $\bar{E}$ and ℓ, the minimum of the potential corresponds to the radius $r = r_0$
of a circular orbit:
$$V'(r_0) = \frac{GM}{r_0^2} - \frac{\ell^2}{r_0^3} + 3\,\frac{\ell^2 GM}{c^2 r_0^4} = 0. \tag{7.85}$$
[Figure 7.5: Equivalent radial potential V(r) for Schwarzschild’s metric, from
equation (7.84). Curves: relativistic and Newtonian; the radius $r_0$ and the energy $\bar{E}$ are marked.]
For that circular orbit, the (proper) period is $\tau_0 = 2\pi/\omega_\phi$, the proper time for a particle
to come back to the same point in the circle. From equation (7.80b), we have
$$\omega_\phi = \frac{\ell}{r_0^2}. \tag{7.86}$$
On the other hand, for those same values of $\bar{E}$ and ℓ the radial component will
also describe an oscillatory motion as a function of τ. Since Mercury’s orbit has a
small eccentricity, we can compute the oscillation for small variations with respect
to the circular orbit. In this regime, the potential can be approximated by a parabola
(see Figure 7.5), and the radial motion will be harmonic in τ, with a frequency
$\omega_R^2 \equiv V''(r_0)$ given by:
$$V''(r_0) = -2\,\frac{GM}{r_0^3} + 3\,\frac{\ell^2}{r_0^4} - 12\,\frac{\ell^2 GM}{c^2 r_0^5} \tag{7.87}$$
$$= \frac{\ell^2}{r_0^4} - 6\,\frac{\ell^2 GM}{c^2 r_0^5} \tag{7.88}$$
$$= \omega_\phi^2 - 6\,\omega_\phi^2\,\frac{GM}{c^2 r_0}, \tag{7.89}$$
where the second line follows from eliminating $GM/r_0^3$ with equation (7.85).
Always within the weak gravity approximation, we obtain that the radial and
angular frequencies differ in the relativistic case:
$$\omega_R \approx \omega_\phi - 3\,\omega_\phi\,\frac{GM}{c^2 r_0}. \tag{7.90}$$
In other words, whereas in the Keplerian problem the orbits are closed ($\omega_R = \omega_\phi$),
in the relativistic case there is a small angular change δφ (the precession) after a
period $\tau_R$ in the radial coordinate. From equation (7.90), using $\omega_R\tau_R = 2\pi$ and
$\omega_\phi\tau_R = 2\pi + \delta\phi$, we obtain
$$\delta\phi = \frac{6\pi GM}{c^2 r_0}. \tag{7.91}$$
This is one of the most famous results of the General Theory of Relativity.
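The chain of equations (7.85) to (7.91) can be checked with a few lines of arithmetic. The sketch below first finds $r_0$ from $V'(r_0) = 0$ and confirms that $\omega_R/\omega_\phi \approx 1 - 3GM/c^2r_0$, and then evaluates (7.91) for Mercury. The rounded orbital data are assumptions of this sketch, not values quoted in the text. Using the semi-latus rectum $a(1 - e^2)$ in place of $r_0$ is the standard result for an eccentric orbit and goes beyond the nearly circular treatment given here.

```python
import math

# Rounded values; all of them are assumptions of this sketch.
GM_SUN = 1.32712e20      # G times the solar mass, m^3 s^-2
C = 2.99792458e8         # speed of light, m/s
A_MERCURY = 5.7909e10    # semi-major axis of Mercury's orbit, m
E_MERCURY = 0.2056       # orbital eccentricity of Mercury
PERIOD_DAYS = 87.969     # orbital period of Mercury, days


def circular_radius(ell, gm=GM_SUN, c=C):
    """Outer root of V'(r0) = 0, equation (7.85), written as the quadratic
    GM r0^2 - ell^2 r0 + 3 ell^2 GM / c^2 = 0."""
    disc = ell**4 - 12.0 * (ell * gm / c) ** 2
    return (ell**2 + math.sqrt(disc)) / (2.0 * gm)


def frequency_ratio(ell, gm=GM_SUN, c=C):
    """Return (omega_R / omega_phi, r0) using V''(r0) of (7.87) and (7.86)."""
    r0 = circular_radius(ell, gm, c)
    v2 = (-2.0 * gm / r0**3 + 3.0 * ell**2 / r0**4
          - 12.0 * ell**2 * gm / (c**2 * r0**5))
    omega_phi = ell / r0**2
    return math.sqrt(v2) / omega_phi, r0


def precession_per_orbit(r0, gm=GM_SUN, c=C):
    """Equation (7.91): delta_phi = 6 pi GM / (c^2 r0), in radians."""
    return 6.0 * math.pi * gm / (c**2 * r0)


def arcsec_per_century(r0):
    """Accumulated precession over 36525 days, in seconds of arc."""
    orbits = 36525.0 / PERIOD_DAYS
    return math.degrees(precession_per_orbit(r0)) * 3600.0 * orbits


if __name__ == "__main__":
    ell = math.sqrt(GM_SUN * A_MERCURY)   # near-circular estimate of ell
    ratio, r0 = frequency_ratio(ell)
    print(1.0 - ratio, 3.0 * GM_SUN / (C**2 * r0))
    print(arcsec_per_century(A_MERCURY))                        # r0 = a
    print(arcsec_per_century(A_MERCURY * (1 - E_MERCURY**2)))   # r0 = a(1 - e^2)
```

With $r_0$ set to the semi-major axis the result is about 41 seconds of arc per century; with the semi-latus rectum it comes out close to the familiar value of about 43.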
7.6.2 Bending of the Light Rays in the General Theory

For his calculation of the bending of the light rays within his general theory, Einstein
first derives the metric for a point mass in the approximation of weak gravity
and obtains:
$$(ds)^2 = (c\,dt)^2\left(1 - \frac{2GM}{rc^2}\right) - \left(1 + \frac{2GM}{rc^2}\right)\left[(dx)^2 + (dy)^2 + (dz)^2\right]. \tag{7.92}$$
Notice that this metric is not the same as the weak gravity limit of Schwarzschild’s
metric of equation (7.77). However, always in the weak gravity approximation, a
relatively simple exercise in change of variables brings equation (7.77) to (7.92).
The change in variables “renames” the radial coordinate as follows:
$$r \to r\left(1 + \frac{GM}{2rc^2}\right)^2. \tag{7.93}$$
Einstein then uses the “law of propagation of light in general coordinates”:
$$(ds)^2 = 0, \tag{7.94}$$
and obtains, in the same spirit as his principle of equivalence law, a new velocity
of light
$$c(r) = c\,\frac{1 - GM/rc^2}{1 + GM/rc^2} \simeq c\left(1 - \frac{2GM}{rc^2}\right), \tag{7.95}$$
and a corresponding index of refraction
$$n(r) = 1 + \frac{2GM}{rc^2}, \tag{7.96}$$
whose deviation from unity is twice the amount he obtained in his principle of
equivalence calculation (equation 7.57). His calculation of the bending is exactly
the one in the principle of equivalence, giving, for the bending angle, the celebrated:
$$\alpha = \frac{4GM}{Rc^2}. \tag{7.97}$$
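The step from the metric (7.92) to the index of refraction (7.96) can be illustrated numerically. In the sketch below, setting $(ds)^2 = 0$ in (7.92) gives the coordinate speed of light $c\sqrt{(1 - x)/(1 + x)}$ with $x = 2GM/rc^2$; its inverse ratio is compared with the first-order index (7.96) at the solar limb, and (7.97) is then evaluated. The rounded constants and the function names are assumptions introduced for illustration.

```python
import math

# Rounded SI values; these numbers are assumptions of this sketch.
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light in vacuum, m/s
M_SUN = 1.989e30     # solar mass, kg
R_SUN = 6.957e8      # solar radius, m


def light_speed_ratio(r, mass):
    """c(r)/c obtained by setting (ds)^2 = 0 in the weak-field metric (7.92)."""
    x = 2.0 * G * mass / (r * C**2)
    return math.sqrt((1.0 - x) / (1.0 + x))


def index_linear(r, mass):
    """First-order index of refraction of equation (7.96)."""
    return 1.0 + 2.0 * G * mass / (r * C**2)


def bending_angle(mass, radius):
    """Equation (7.97): alpha = 4GM/(R c^2), in radians."""
    return 4.0 * G * mass / (radius * C**2)


if __name__ == "__main__":
    n_from_metric = 1.0 / light_speed_ratio(R_SUN, M_SUN)
    print(n_from_metric - 1.0, index_linear(R_SUN, M_SUN) - 1.0)
    print(math.degrees(bending_angle(M_SUN, R_SUN)) * 3600.0, "arcsec")
```

The two values of n − 1 agree to first order in $GM/rc^2$, and the bending angle for a ray grazing the Sun comes out near 1.75 seconds of arc, twice the 1911 value.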
7.7 Hilbert’s Least Action Principle for General Relativity*

One of the most interesting applications of the least action principle is the
derivation of Einstein’s field equations of general relativity from an appropriate
Lagrangian. Remarkably, Einstein was not the first to derive these equations. Five
days before Einstein, David Hilbert, inspired by Einstein’s work, derived the equations
using a least action approach. Hilbert regarded the principle of least
action as “the key concept in his axiomatizations of physical theories” (Stölzer,
2003). Einstein was not following the Lagrangian route for his equations. He knew
that gravity was due to the curvature of space-time, that this curvature is determined
by the metric $g_{\alpha\beta}$ of space-time, and that the source of this curvature is the energy,
or more precisely, the “energy momentum tensor” $T_{\alpha\beta}$, a tensor of order two. So he
looked to formulate his theory as an equality, or a proportionality, between energy
and curvature:
$$\underbrace{G_{\alpha\beta}}_{\text{curvature}} \propto \underbrace{T_{\alpha\beta}}_{\text{energy}}, \tag{7.98}$$
that in the limit of weak gravity would reduce to the classical equations of Newtonian
gravity. The relation, or relations, between metric and curvature were known
to the geometers, whose work Einstein studied patiently. The idea of curvature is intuitive
for a two-dimensional surface, like the surface of a sphere. But in higher
dimensions, like the four dimensions of space-time, the notion is far from obvious.
The most natural generalization to curvature in higher dimensions is the so-called
“Riemann tensor,” named after Bernhard Riemann, one of the creators of the idea
of n-dimensional curvature. The Riemann tensor is of order four, that is, it involves
four indices, rather than the two indices of equation (7.98). Qualitatively, the idea of
Riemann’s tensor being of order four is the following: one way of computing cur-
vature is to “transport” a vector parallel to itself along a closed curve. If, after this
journey, the vector returns to its original orientation, independent of the curve cho-
sen, space is flat. If it returns rotated with respect to its original orientation, space
is curved. Now imagine transporting a vector following a path made of a small
(infinitesimal) “parallelogram” (this parallelogram is made of four geodesics). The
parallelogram has an outgoing direction û
and an arriving direction v̂, both directions described by vectors. In addition, we
transport our initial vector P along the parallelogram, and we have our final, arriving,
vector P′. The matrix connecting P with P′ has four indices, one for each of the
vectors involved: $R^{\alpha}{}_{\beta\gamma\delta}$. A “standard” result from differential geometry is that this
tensor can be constructed from the metric tensor and its first and second derivatives.
For two dimensions, there is only one independent component of the Riemann tensor,
and there is only one curvature. In higher dimensions, one has to think in terms
of many curvatures. Since the Riemann tensor has four indices, Einstein reasoned,
it cannot be the curvature that is proportional to $T_{\alpha\beta}$. There is, however, another
“natural” good candidate for the curvature: the Ricci tensor $R_{\alpha\beta}$ (named after Italian
mathematician Gregorio Ricci), a tensor of order two that can be obtained from
a “contraction” of the Riemann tensor: $R_{\alpha\beta} = \sum_{j=1}^{4} R^{j}{}_{\alpha j\beta}$. Finally, from the Ricci
tensor, a scalar can be constructed, the Ricci scalar, with no indices: $R = g^{\alpha\beta}R_{\alpha\beta}$,
where $g^{\alpha\beta}$ is the inverse of the metric tensor and where we used Einstein’s convention
of summation over repeated indices. Einstein’s guess, after many attempts, for
the left-hand side of equation (7.98), is $G_{\alpha\beta} = R_{\alpha\beta} - \frac{1}{2}R\,g_{\alpha\beta}$. In his 1915 paper,
Einstein was able to show that this guess gave the correct limit for weak gravity,
and he computed the precession of the perihelion of Mercury and the bending of
light rays. Simultaneously, Hilbert followed a more “natural” approach: guess the
Lagrangian L, from it write a principle of least action, and derive the equations
of motion by minimizing the action. The action of the curvature term has to be,
in analogy with what we discussed for Maxwell’s theory (see Appendix C), an
integral over space and time
$$S = \int dx_1\,dx_2\,dx_3\,dx_4\; L. \tag{7.99}$$
Since the Lagrangian is a scalar, the most natural quantity that depends on the
metric and its second-order derivatives is the Ricci scalar, and Hilbert wrote:
$$S_{\text{curvature}} = \frac{c^4}{16\pi G}\int R\,\sqrt{-g}\;d^4x, \tag{7.100}$$
where R is the Ricci scalar and the integral is taken over space-time
($d^4x = dx_1\,dx_2\,dx_3\,dx_4$). The factor $\sqrt{-g}$, where g is the determinant of the metric, is
included so that the action is an invariant: the Ricci scalar is an invariant, but $d^4x$
by itself depends on the choice of coordinates. The volume element $\sqrt{-g}\,d^4x$ is
invariant under change in coordinates. With this guess for the action, Hilbert computed
the variations with respect to the metric at each point of space and time (the
components of the metric are the fields of the problem):
$$\frac{\delta}{\delta g_{\alpha\beta}}\,S_{\text{curvature}} = 0, \tag{7.101}$$
and obtained, for the first time, Einstein’s equations. Later, citing Hilbert, Einstein
(1916b) followed a slightly different approach and rederived his equations using
the least action principle.
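For the reader who wants to see how the combination $R_{\alpha\beta} - \frac{1}{2}R\,g_{\alpha\beta}$ emerges from (7.100), the following is a sketch of the variation, written with respect to the inverse metric $g^{\alpha\beta}$ (equivalent to varying $g_{\alpha\beta}$). The two identities in the first lines are standard results of tensor calculus quoted without proof. This is an outline under those assumptions, not a reconstruction of Hilbert's own presentation.

```latex
% Outline of the variation of the curvature action (7.100).
% The first two lines are standard identities, assumed here without proof.
\begin{align}
\delta\sqrt{-g} &= -\tfrac{1}{2}\,\sqrt{-g}\;g_{\alpha\beta}\,\delta g^{\alpha\beta},\\
\delta R &= R_{\alpha\beta}\,\delta g^{\alpha\beta} + g^{\alpha\beta}\,\delta R_{\alpha\beta},\\
\delta S_{\text{curvature}} &= \frac{c^4}{16\pi G}\int
  \left(R_{\alpha\beta} - \tfrac{1}{2}R\,g_{\alpha\beta}\right)\delta g^{\alpha\beta}\,
  \sqrt{-g}\;d^4x \;+\; \text{(boundary terms)}.
\end{align}
% The term g^{\alpha\beta}\,\delta R_{\alpha\beta} is a total divergence and only
% contributes to the boundary terms. Requiring \delta S_{\text{curvature}} = 0 for
% arbitrary \delta g^{\alpha\beta} gives G_{\alpha\beta} = 0 in the absence of matter.
```

Adding a matter action to $S_{\text{curvature}}$ is what supplies the energy side of equation (7.98).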
8
The Road to Quantum Mechanics
8.1 The Need for a New Mechanics

The early development of quantum ideas takes as a point of departure Max Planck’s
derivation of the energy distribution of black body radiation, presented in 1900 at
a talk at the German Physical Society in Berlin. Planck proposed a division of
the energies of an oscillator with frequency ν into discrete values given by nhν,
with n an integer and h the now famous Planck’s constant. With this assumption,
Planck derived a form of the spectrum of the radiation emitted by a black
body in agreement with experiment, and found h as a fitting parameter between
the experimental and theoretical curves. There is substantial debate among historians
about the real meaning attributed by Planck to his quantization formula (Kuhn,
1978). In his derivation, Planck had to count the number of ways a given amount
of energy E could be distributed over a given number N of resonators of the same
frequency ν. Since the energy can be partitioned infinitely, the number of ways
in which the energy can be distributed is infinite. In order to find a finite answer,
Planck divides the interval E into M small intervals of energy ε = E/M, a discretization
“trick” Ludwig Boltzmann had used in his treatment of the classical
gas (Boltzmann, 1872). Whereas Boltzmann took the limit M → ∞ at the end of
his calculation, Planck discovered that his empirical black body law emerged if he
assumed the relation ε = hν (Landsman, 2007). In 1905, Albert Einstein published
“On a Heuristic Point of View about the Creation and Conversion of Light” (Einstein,
1905), in which he uses the same constant h, conferring on it a new meaning.
Whereas in Planck’s treatment h appears as a discretization parameter, for Einstein
hν represents physical quanta of radiation. In his words:
According to the assumption considered here, the energy of a light wave emitted from a
point source is not spread continuously over ever larger volumes, but consists of a finite
number of energy quanta that are spatially localized at points of space, move without
dividing, and are absorbed or generated only as a whole.
In this sense, it represented a return to the corpuscular description of light. However,
Einstein expected the quanta of light to play no role in the phenomena of
diffraction and propagation in vacuo.
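Planck's counting step described above, distributing M indistinguishable energy elements over N resonators, has a finite answer once the energy is discretized. The sketch below illustrates it with the standard combinatorial ("stars and bars") count $\binom{N+M-1}{M}$ and checks that formula by brute-force enumeration for small cases. The formula is quoted here as a known combinatorial result rather than taken from the text, and the function names are hypothetical.

```python
import math
from itertools import product


def count_distributions(n_resonators, m_elements):
    """Number of ways to share m identical energy elements among n resonators:
    the stars-and-bars count C(n + m - 1, m)."""
    return math.comb(n_resonators + m_elements - 1, m_elements)


def count_by_enumeration(n_resonators, m_elements):
    """Brute-force check: list every tuple of occupation numbers and keep
    those whose entries add up to m_elements."""
    return sum(
        1
        for occupation in product(range(m_elements + 1), repeat=n_resonators)
        if sum(occupation) == m_elements
    )


if __name__ == "__main__":
    for n, m in [(2, 3), (3, 4), (4, 5)]:
        print(n, m, count_distributions(n, m), count_by_enumeration(n, m))
```

The enumeration is only practical for small N and M, which is enough to confirm the closed-form count.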
In subsequent years, until 1925, the so-called old quantum theory was devel-
oped both as a search for other quantum phenomena and in attempts to incorporate
and extend Planck’s and Einstein’s idea into the theory of mechanical systems. In
1911, at the Solvay Conference in Brussels, Arnold Sommerfeld gave a report on
quantum theory entitled “Planck’s Quantum of Action and Its General Significance
for Molecular Physics” (Sommerfeld, 1911a). Until then, quantum treatments
were exclusively of periodic systems, such as the black body radiation, and even
the vibrations of atoms in solids responsible for specific heats. Sommerfeld pro-
posed extending the treatment to aperiodic processes. The fundamental quantity
h, having units of action (or Energy × time), was taken by Sommerfeld as a
reflection of a deeper meaning: h is more basic than the energy quantum hν
for each resonator. Picking up the invariance argument of the action by Planck
(1907) (see Section 7.2), he postulated that “in very purely molecular processes
a definite and universal amount of action is taken or given up,” in such a way
that:
$$S = \int_0^\tau L\,dt = \frac{h}{2\pi}, \tag{8.1}$$
where τ is the duration (the “accumulation time”) of the process, and L = T − V
is the Lagrangian. In a reflection of his practical approach to problems, he clarifies
that he includes 2π in order to get agreement in the application of his postulate with
the photoelectric effect (Sommerfeld, 1911b). (In other applications presented in
his paper, like the production of X- and γ -rays, he tries h/4 rather than h/4π.) His
treatment of the photoelectric effect received mixed reactions, but we describe it
here since it is a nice example of the early usage of the action integral in quantum
physics. Sommerfeld considers the incident light as a field E that exerts a force
eE cos ωt on one-dimensional harmonic oscillators. The oscillators are initially at
rest, and assumed to be located at the walls of the metal (e is the charge of the
electron). According to his postulate, when the action S reaches the value h/2π,
the electrons are released from the atom with the kinetic energy attained at that
instant. The equation of motion of coordinate x of the oscillator is
$$\ddot{x}(t) = -\omega_0^2\,x(t) + \frac{eE}{m}\cos\omega t. \tag{8.2}$$
In the resonance situation, when $\omega_0 = \omega$, x is given by
$$x(t) = \frac{eE}{2m\omega}\,t\sin\omega t. \tag{8.3}$$
In other words, the motion consists of a “rapid” oscillation modulated by an
increasing amplitude. Sommerfeld’s postulate is that the emission of the electron
occurs at the time τ when the action reaches the value h/2π. Using $T = \frac{1}{2}m\dot{x}^2$
and $V = \frac{1}{2}m\omega^2x^2$, integrating equation (8.1) by parts, and using the equation of
motion (8.2), we get, with Sommerfeld:
$$S = \frac{m}{2}\,x(\tau)\,\dot{x}(\tau) - eE\int_0^\tau dt\;x(t)\cos\omega t. \tag{8.4}$$
Since the action oscillates as it increases, it will attain the value h/2π for the first
time close to a maximum of S(τ); otherwise it would have attained that value in the
previous oscillation. At the maximum, $S'(\tau) = 0$, $T = V$, and $\dot{x} = \omega x$, implying
that $m\dot{x}x/2 = T/\omega$. For large values of ωτ, the maxima occur for $\cos\omega\tau \simeq \sin\omega\tau$
[$\omega\tau \simeq \pi(n + 1/4)$]. Evaluating the integral in equation (8.4) at a maximum of S
we obtain:
$$S = \frac{T}{\omega} - \frac{(eE)^2}{16m\omega^3}, \tag{8.5a}$$
$$\equiv \frac{(eE)^2}{16m\omega^3}\,(\omega\tau)^2 - \frac{(eE)^2}{16m\omega^3}. \tag{8.5b}$$
For large values of ωτ (the number of oscillations during the accumulation time, of
“at least a million” (Sommerfeld, 1911b)), the kinetic energy dominates over the
second term, and we obtain Einstein’s law:
$$T = \hbar\omega \equiv \frac{h}{2\pi}\,\omega. \tag{8.6}$$
The encouraging side of the result is that the kinetic energy of the emitted elec-
tron (hν) is independent of the intensity of the radiation, a result consistent with
observations. The treatment has, however, discrepancies with Einstein’s theory in
that the phenomenon has to take place in resonance between the field and the
atom, and due to the fact that the process is not instantaneous, but takes a time τ
(Stuewer, 1970). These objections were the subject of a lively debate at the Solvay
conference, taking more than 20 pages in the conference report.
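The resonant solution (8.3) used in Sommerfeld's argument can be confirmed by integrating the equation of motion (8.2) directly. The sketch below does this with a classical fourth-order Runge-Kutta scheme, starting from rest at x = 0 with $\omega_0 = \omega$, and compares the result with the closed form. The parameter `drive` stands for eE/m; the numerical values and the function names are arbitrary choices made for illustration.

```python
import math


def driven_oscillator_rk4(omega, drive, t_end, steps):
    """Integrate x'' = -omega^2 x + drive * cos(omega t) from x = 0, x' = 0
    with the classical fourth-order Runge-Kutta scheme; returns x(t_end)."""
    def accel(t, x):
        return -omega**2 * x + drive * math.cos(omega * t)

    h = t_end / steps
    t, x, v = 0.0, 0.0, 0.0
    for _ in range(steps):
        k1x, k1v = v, accel(t, x)
        k2x, k2v = v + 0.5 * h * k1v, accel(t + 0.5 * h, x + 0.5 * h * k1x)
        k3x, k3v = v + 0.5 * h * k2v, accel(t + 0.5 * h, x + 0.5 * h * k2x)
        k4x, k4v = v + h * k3v, accel(t + h, x + h * k3x)
        x += h * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v += h * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        t += h
    return x


def resonant_solution(omega, drive, t):
    """Closed form (8.3) with drive = eE/m: x(t) = drive / (2 omega) * t * sin(omega t)."""
    return drive / (2.0 * omega) * t * math.sin(omega * t)


if __name__ == "__main__":
    omega, drive, t_end = 2.0, 1.0, 7.3
    print(driven_oscillator_rk4(omega, drive, t_end, 20_000))
    print(resonant_solution(omega, drive, t_end))
```

The agreement reflects the fact that (8.3) satisfies both the equation of motion and the initial conditions of an oscillator at rest, with an amplitude that grows linearly in time.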
The second phase of the development of quantum theory starts in 1913, when
interest shifted to the problem of atomic and molecular structure. It was known
from the end of the nineteenth century and the beginning of the twentieth century
that different elements emit and absorb light at certain frequencies only: the spec-
trum of the light emitted was discrete. In 1913, Niels Bohr proposed a theory that
predicted the emitted frequencies of an atom in great (but not full) agreement with
experiment. Says biographer Abraham Pais (1991):
Up to that time no one had ever produced anything like it in the realm of spectroscopy,
agreement between theory and experiment to five significant figures.
Bohr’s approach of 1913 was extended very soon after by Sommerfeld, who
also found striking agreement with experiments. Sommerfeld’s beautiful extension
incorporates the theory of relativity, and accounts for experimental discrepancies
(close doublets, not predicted by Bohr) that were put in evidence when the lines of
hydrogen’s spectrum were observed in a powerful spectroscope. The developments
of the quantum theory (the old and the “new,” post 1925) proceed to a great extent
by analogy with classical motion, seeking the minimal modifications in the
Newtonian and Hamiltonian mechanics that will agree with observations. In the
next section, we visit the results obtained by Bohr and Sommerfeld in their main
papers, and concentrate on their use of classical concepts derived from the principle
of least action.
8.2 Bohr’s “Trilogy” of 1913 and Sommerfeld’s Generalization

Between July and November 1913, Niels Bohr published his most outstanding
achievement: a three-part paper on the constitution of atoms and molecules (Bohr,
1913a,b,c),which later became famous as “The Trilogy.” In the second and third
parts, Bohr considers the periodic arrangements of the elements and molecular
binding. But the memorable paper is the first one, where he lays out his original
ideas on the spectrum of the hydrogen atom. Bohr’s methodology was to employ
“ordinary mechanics” as a guide to constructing a quantum theory of the radiation
process, seeking for the minimal modifications of the classical theory that could
account for the experimental observations. He starts the paper by invoking Ruther-
ford’s theory of the atom (presented in 1911) as consisting of a positively charged
nucleus surrounded by electrons kept together by attractive forces, and abandoning
previous ideas that treated the atom as a harmonic oscillator. According to
classical physics, an electron orbiting around a nucleus will radiate energy, and
its kinetic energy will continuously increase as it falls towards the nucleus, in dis-
agreement with experiments. Bohr, following Planck’s ideas of discontinuity of the
emitted radiation, assumes that “during the binding of the electron, a homogeneous
radiation is emitted of a frequency ν equal to half the frequency of revolution of
the electron in its final orbit.” Why one half the final frequency? Bohr remarks
that half the frequency is an average of the frequencies between initial and final
states, but it certainly looks more like an ad hoc assumption determined by the
need to match his results to the ones deduced by experiment (Kuhn and Heilbron,
1969). The steps of Bohr’s calculation for this part of the paper are the following.
An electron in circular motion of radius $r$, subject to an electrical force of magnitude
$F = e^2/r^2$, will (assuming it does not radiate energy) revolve around the
nucleus at an angular frequency $\omega$ given by $\omega^2 = e^2/mr^3$. For such an orbit, the total
energy (kinetic plus potential) is $E = mv^2/2 - e^2/r$, which reduces in the case of a
circular orbit to $E = -e^2/2r$. The kinetic energy $T = mv^2/2 = e^2/2r$ is related to the
total energy by a simple expression:
$$T = -E. \tag{8.7}$$
The frequency of rotation is therefore related to the energy $E$ by:
$$\omega = 2\pi\nu = \left(\frac{8}{m}\right)^{1/2}\frac{|E|^{3/2}}{e^2}. \tag{8.8}$$
If we now assume, with Bohr, that in falling to that orbit the electron emits $n$ quanta
of frequency $\omega/2$,
$$E = -n\hbar\omega/2, \tag{8.9}$$
we get:
$$E = -\frac{me^4}{2\hbar^2}\frac{1}{n^2}, \tag{8.10}$$
which already looks like the current formula for the levels of the hydrogen atom.
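The step from (8.8) and (8.9) to (8.10) is short enough to spell out. What follows is only a sketch of the algebra, in Gaussian units, writing $\hbar = h/2\pi$ and using $|E| = n\hbar\omega/2$:

```latex
\begin{align*}
|E| &= \frac{n\hbar}{2}\,\omega
     = \frac{n\hbar}{2}\left(\frac{8}{m}\right)^{1/2}\frac{|E|^{3/2}}{e^{2}}
 \;\Longrightarrow\;
 |E|^{1/2} = \frac{2e^{2}}{n\hbar}\left(\frac{m}{8}\right)^{1/2}, \\
|E| &= \frac{4e^{4}}{n^{2}\hbar^{2}}\,\frac{m}{8}
     = \frac{me^{4}}{2\hbar^{2}}\,\frac{1}{n^{2}} .
\end{align*}
```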
Bohr points out that |E| is greatest when n = 1, which will correspond to the falling
of an electron to its lowest possible energy within the atom. Using the known values
of e, h, and m he gets for the ionization potential E/e = 13 volts, very similar to
the observed values. However, Bohr realizes that, with the hypothesis of equation
(8.9), he cannot derive Balmer’s formula for the frequencies (or wavelengths of
light λ) of the emitted radiation,

$$\frac{1}{\lambda} = R\left(\frac{1}{n^2} - \frac{1}{m^2}\right). \tag{8.11}$$
In this part of Bohr’s treatment, n is the number of quanta of fixed frequency
ω/2. But if one wishes to interpret Balmer’s formula as given by the energy dif-
ference between stationary (non-radiating) states, different frequencies have to be
involved: “Considering systems in which the frequency is a function of the energy,
this assumption, however, must be regarded improbable” (Bohr, 1913a). So, in the
second part, Bohr changes his approach, using what later would become one of his
fundamental contributions to quantum physics: The principle of correspondence.
The essence of Bohr’s reasoning is the following: assume that the atom has “sta-
tionary” (non-radiating) states and that, when the atom makes a transition between
two of those states, a single quantum is emitted of frequency $\nu = c/\lambda$, with $\lambda$ given
by equation (8.11), and R to be determined. Bohr uses ordinary mechanics as a
guide for constructing his theory of radiation; the exact mechanism of transition
during which a definite amount of energy is emitted or absorbed is not specified,
and was one of “the blind spots” of his theory (Perez, 2009). Given the form of
Balmer’s formula, the energy of each stationary state will be given by
$$E_n = -\frac{hcR}{n^2}, \tag{8.12}$$
with n an integer. Using equation (8.8), an electron with this energy in a circular
orbit will have a frequency ω̄ given by
$$\bar{\omega} = \left(\frac{8}{m}\right)^{1/2}\frac{1}{e^2}\frac{(hcR)^{3/2}}{n^3}. \tag{8.13}$$
In contrast to the initial treatment, the frequency ω of the emitted radiation is not
in principle related to the frequency ω̄ of the electron orbiting around the nucleus.
This is a clear deviation from ordinary electrodynamics, where we expect the fre-
quency of radiation to be that of the emitting sources. Now, says Bohr, consider the
transition between two successive states, i.e. m = n+1 of very low frequency (very
large n). For low frequencies one expects to recover the classical result: the emitted
radiation should have the same frequency as the electron’s orbital frequency. Using
Balmer’s formula, we have

$$\omega = 2\pi cR\left[\frac{1}{n^2} - \frac{1}{(n+1)^2}\right] \simeq 4\pi c\,\frac{R}{n^3}. \tag{8.14}$$
If we require the correspondence principle (ω̄ = ω), we obtain from equations
(8.13) and (8.14)
$$R = \frac{2\pi^2 me^4}{ch^3}, \tag{8.15}$$
which agrees with the experimentally observed values, and with the values obtained
in the initial treatment for E, but with a fundamentally different interpretation.
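As a numerical illustration of equations (8.11) to (8.15), the sketch below evaluates the Rydberg constant, a Balmer wavelength and the ionization energy, and checks that the orbital and emitted frequencies approach each other at large $n$. The constants are standard values in Gaussian units; the function names are ours and not part of Bohr's or the authors' treatment.

```python
import math

# Physical constants in Gaussian (CGS) units, as used in the text.
M_E = 9.1093837e-28         # electron mass, g
E_CHARGE = 4.80320471e-10   # elementary charge, esu
H = 6.62607015e-27          # Planck's constant, erg s
C = 2.99792458e10           # speed of light, cm/s
ERG_PER_EV = 1.602176634e-12


def rydberg():
    """Rydberg constant R = 2 pi^2 m e^4 / (c h^3) in cm^-1, equation (8.15)."""
    return 2 * math.pi**2 * M_E * E_CHARGE**4 / (C * H**3)


def balmer_wavelength_nm(n, m):
    """Wavelength of the m -> n line from Balmer's formula (8.11), in nm."""
    inv_lambda = rydberg() * (1 / n**2 - 1 / m**2)   # cm^-1
    return 1e7 / inv_lambda                          # 1 cm = 1e7 nm


def ionization_energy_ev():
    """|E_1| = h c R from equation (8.12) with n = 1, in eV."""
    return H * C * rydberg() / ERG_PER_EV


def orbital_over_emitted(n):
    """Orbital frequency (8.13) divided by the exact emitted frequency in (8.14)."""
    r = rydberg()
    orbital = math.sqrt(8 / M_E) * (H * C * r) ** 1.5 / (E_CHARGE**2 * n**3)
    emitted = 2 * math.pi * C * r * (1 / n**2 - 1 / (n + 1) ** 2)
    return orbital / emitted


print(f"R = {rydberg():.1f} cm^-1")
print(f"H-alpha (3 -> 2) = {balmer_wavelength_nm(2, 3):.1f} nm")
print(f"ionization energy = {ionization_energy_ev():.2f} eV")
for n in (1, 10, 1000):
    print(n, orbital_over_emitted(n))
```

The ratio printed in the loop is far from 1 for $n = 1$ and tends to 1 as $n$ grows, which is the content of the correspondence argument.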
In the same paper, Bohr offers an alternative, “very simple interpretation” of his
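results: in a circular orbit of frequency $\omega$ the kinetic energy $T = m\omega^2r^2/2$ and the
angular momentum $\ell = m\omega r^2$ are related by
$$\ell = \frac{2T}{\omega}. \tag{8.16}$$
Combining $T = -E$, which holds for the circular orbit since $T = e^2/2r$ and
$E = -e^2/2r$, with $E = -n\hbar\omega/2$ from equation (8.9) we have
$$\ell = n\hbar. \tag{8.17}$$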
This relation is what is popularly known as Bohr’s quantization condition, a
relation that, he mentions, had also been developed by John William Nicholson
(1912), a mathematical physicist he had met in Cambridge. In his study of the
solar corona, Nicholson had initially set the ratio of energy to frequency equal
to a multiple of Planck’s constant. To a certain extent, Bohr wrote his trilogy in

response to Nicholson’s papers (Heilbron, 2013).
Bohr concludes the first paper of the trilogy by stating his condition as a general
quantum principle:
In any molecular system consisting of positive nuclei and electrons in which the nuclei
are at rest relative to each other and the electrons move in circular orbits, the angular
momentum of every electron round the center of its orbit will in the permanent state of the
system be equal to $h/2\pi$, where $h$ is Planck’s constant.
The extension to more general cases beyond the case of circular motions was
later developed by Arnold Sommerfeld. We discuss Sommerfeld’s treatment in the
following section.
8.2.1 Sommerfeld and the Kepler Problem

In 1906, Planck, after studying Josiah Gibbs’s work on entropy and the concept
of “phase space,” considered the curves in ( p, x) space for the “one-dimensional
resonator.” During the oscillatory motion, the momentum p(t) and position x(t)
describe ellipses in the ( p, x) plane,
$$E = \frac{1}{2m}p^2 + \frac{m\omega^2}{2}x^2, \tag{8.18}$$
where E is the energy, a constant of motion. Planck assumed the phase space was
divided up by a large number of these ellipses, and that “the area bounded by two
successive ellipses become equal to one another, such that $\Delta E/\nu$ = constant.”
In this way, he identifies the regions in between ellipses as having “equal probability.”
If we identify the magnitude $\Delta E$ as an “energy element” through $\Delta E/\nu = h$, the
“elementary quantum of action h acquires a new meaning; namely it gives the area
of an elementary region in the phase-space of a resonator, no matter what its fre-
quency” (Planck, 1906a). Planck introduces the notion of “elementary quantum of
action” because it has the same dimension as the quantity which owes its name to
the principle of least action (Planck, 1906a). Whereas Planck identified regions of
equal probability in phase space, Sommerfeld postulates a restriction of the paths
in phase space which, formally, resembles Planck’s proposal and extends Bohr’s
treatment to more complicated motions. Notice that the area of the ellipse of equa-
tion (8.18) is 2π E/ω. Since the allowed energies of an oscillator are nhω/2π,
it follows that the area bounded by the nth ellipse is nh. The area between two
consecutive ellipses can be written as
$$\oint_n p\,dx - \oint_{n-1} p\,dx = h, \tag{8.19}$$
where the sign $\oint$ means the integration along a closed path of the trajectory.
Sommerfeld assumes that $\oint_0 p\,dx = 0$ and finds that Planck’s formulation can be
written as:
$$\oint p\,dx = nh, \tag{8.20}$$
which corresponds to a quantization condition over the Maupertuis integral.
Sommerfeld then extends this idea, generalizing it to canonically conjugate variables
that describe cyclic motions:
$$\oint p_i\,dq_i = n_i h, \tag{8.21}$$
with $n_i$ quantum numbers associated with each motion.
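The statement that the ellipse of equation (8.18) encloses an area $2\pi E/\omega$ can be checked numerically. The sketch below evaluates $\oint p\,dx$ over one period of a harmonic oscillator with a midpoint rule; the parameter values and the function name are arbitrary choices made here for illustration.

```python
import math


def phase_space_area(m, omega, x0, steps=100_000):
    """Midpoint-rule estimate of the loop integral of p dx over one period of
    x(t) = x0 cos(omega t), p(t) = m dx/dt = -m omega x0 sin(omega t)."""
    period = 2 * math.pi / omega
    dt = period / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        dx_dt = -omega * x0 * math.sin(omega * t)
        p = m * dx_dt
        total += p * dx_dt * dt          # p dx = p (dx/dt) dt
    return total


m, omega, x0 = 2.0, 3.0, 0.5             # arbitrary illustrative values
energy = 0.5 * m * omega**2 * x0**2      # E at the turning point, where p = 0
area = phase_space_area(m, omega, x0)
print(area, 2 * math.pi * energy / omega)
```

Setting the area equal to $nh$ then reproduces the oscillator energies $E = nh\omega/2\pi$ quoted in the text.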

Bohr’s theory showed a surprising agreement with the measured Balmer series,
but the agreement was not exact. Measurements by Michelson in 1891 showed
a small splitting in the emitted lines of around 0.3 cm$^{-1}$ (Kragh, 1985). As an
attempt to investigate whether these splittings could be due to the eccentricities of
the orbits, Sommerfeld, using his generalization for multiperiodic orbits, treats the
case of elliptical orbits, where the momentum has a radial component pr as well
as an angular component pθ . For planetary motion both the radial coordinate r and
the angle θ describe cyclical (although not periodic) motions. For this situation,
Sommerfeld’s proposed rules are:
$$\oint p_\theta\,d\theta = kh, \tag{8.22a}$$
$$\oint p_r\,dr = nh, \tag{8.22b}$$
with k and n integers. In order to apply these equations, one needs expressions
for pr and pθ . Sommerfeld (1916) obtains these expressions using the Hamilton-
Jacobi equation (Section 6.5), with S = S(x) + Et. Since the potential is central,
we can safely assume that the motion is in a plane, and S depends only on r
and θ:

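$$\frac{1}{r^2}\left(\frac{\partial S(r,\theta)}{\partial\theta}\right)^2 + \left(\frac{\partial S(r,\theta)}{\partial r}\right)^2 = 2m\left(E + \frac{e^2}{r}\right). \tag{8.23}$$
The above equation can be solved using the method of separation of variables
$S(r,\theta) = S_1(r) + S_2(\theta)$, from which we see that:
$$\frac{\partial S_2}{\partial\theta} = p_\theta = L, \tag{8.24}$$
with $L$ a constant. Sommerfeld’s quantum condition on $p_\theta$ gives:
$$\int_0^{2\pi} p_\theta\,d\theta = kh \;\rightarrow\; L = k\hbar, \tag{8.25}$$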
with k an integer. Notice that a graph of pθ versus θ is a horizontal line and not
a closed curve. Following Sommerfeld, we restricted the integral over θ to values
between 0 and 2π, since θ and θ + 2π correspond to the same angular position.
Applying the quantum condition on pr is more complicated due to the r
dependence of the momentum:
$$\oint p_r\,dr = 2\sqrt{2m}\int_{r_1}^{r_2} dr\,\sqrt{E + \frac{e^2}{r} - \frac{L^2}{2mr^2}}, \tag{8.26}$$
where $r_1$ and $r_2$ are the turning points of the (classical) trajectory:
$$r_1 + r_2 = \frac{e^2}{|E|}, \tag{8.27a}$$
$$r_1 r_2 = \frac{L^2}{2m|E|}, \tag{8.27b}$$
which are obtained by setting $p_r(r_1) = p_r(r_2) = 0$. Sommerfeld solves this integral
using the rather sophisticated technique of contour integration, extending the
integrand to the complex plane. We present the calculation because we find it
illustrative and elegant, although with current software the integral can be evaluated
with a simple click.
Notice that equation (8.26) can be written as
$$I = \oint p_r\,dr = 2\sqrt{2m|E|}\int_{r_1}^{r_2}\frac{dr}{r}\sqrt{(r-r_1)(r_2-r)}, \tag{8.28}$$
so that the integrand, when extended to the complex plane, has a branch cut in the
real axis between the points $r_1$ and $r_2$. The integral can be evaluated by computing
the residues at singularities in the complex plane. The integrand has two singularities:
a simple pole at $z = 0$, with residue $\mathrm{Res}(0) = \sqrt{-r_1 r_2}$ (taken on the branch
where $\sqrt{-r_1 r_2} = i\sqrt{r_1 r_2}$), and a residue at infinity, since there is a term of the
integrand that decreases as $\sim 1/z$ for very large $z$:
$$\frac{1}{z}\sqrt{(z-r_1)(r_2-z)} \simeq -\frac{i}{z}\,\frac{r_1+r_2}{2} + \cdots, \tag{8.29}$$
where “$\cdots$” is shorthand for terms (not necessarily small) that vanish after
integrating over a very large circle centered at $z = 0$ in the complex plane. The above
term therefore gives rise to a residue $\mathrm{Res}(\infty) = -i(r_1+r_2)/2$ upon integration
over a large circle. The result of the contour integration is:
$$\begin{aligned}
I &= \sqrt{2m|E|}\;2\pi i\,[\mathrm{Res}(\infty) + \mathrm{Res}(0)] \\
  &= \sqrt{2m|E|}\;\pi\left(r_1 + r_2 - 2\sqrt{r_1 r_2}\right) \\
  &= \pi\sqrt{\frac{2me^4}{|E|}} - 2\pi L \\
  &= nh.
\end{aligned} \tag{8.30}$$
Since $L = k\hbar$, the energies are given by
$$|E| = \frac{2\pi^2 me^4}{h^2}\frac{1}{(n+k)^2}, \tag{8.31}$$
a rather disappointing result for Sommerfeld, since the way in which the two
indices appear does not change the spectrum; rather it is just a relabeling of the
quantum number n. This means that the small observed deviations in the structure
of spectral lines from Bohr’s treatment are not due to deviations from his circular
orbits but to something else.
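The contour-integration result can be cross-checked by brute force. The sketch below evaluates the real integral appearing in equation (8.28), $\int_{r_1}^{r_2}\sqrt{(r-r_1)(r_2-r)}\,dr/r$, and compares it with $(\pi/2)(r_1 + r_2 - 2\sqrt{r_1 r_2})$, the value implied by equation (8.30). The change of variable and the function names are choices made here; this is not Sommerfeld's method.

```python
import math


def radial_integral(r1, r2, steps=200_000):
    """Midpoint-rule value of the integral of sqrt((r - r1)(r2 - r)) / r
    from r1 to r2.

    With r = c - a cos(theta), c = (r1 + r2) / 2 and a = (r2 - r1) / 2, the
    integrand becomes a^2 sin^2(theta) / r(theta) on 0 <= theta <= pi, which
    is smooth at both turning points.
    """
    c, a = 0.5 * (r1 + r2), 0.5 * (r2 - r1)
    dtheta = math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * dtheta
        total += (a * math.sin(theta)) ** 2 / (c - a * math.cos(theta))
    return total * dtheta


def contour_value(r1, r2):
    """Closed form implied by the residues: (pi / 2)(r1 + r2 - 2 sqrt(r1 r2))."""
    return 0.5 * math.pi * (r1 + r2 - 2 * math.sqrt(r1 * r2))


for r1, r2 in [(1.0, 4.0), (0.3, 2.7)]:
    print(radial_integral(r1, r2), contour_value(r1, r2))
```

Multiplying either number by $2\sqrt{2m|E|}$ gives the action integral $I$ of equation (8.28).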
8.2.2 The Fine Structure of the Hydrogen Spectrum

In a very successful calculation, Sommerfeld approached the problem of the fine
structure, incorporating relativity into the picture (Sommerfeld, 1916).
Following the logic of the previous section, Sommerfeld uses the relativistic
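version of the Hamilton-Jacobi equation, which we derived in Section 7.3:
$$\frac{L^2}{r^2} + \left(\frac{\partial S_1}{\partial r}\right)^2 = \frac{1}{c^2}\left(E + \frac{e^2}{r} + mc^2\right)^2 - m^2c^2, \tag{8.32}$$
with $L$ a constant¹. Sommerfeld’s quantum condition on $p_\theta$ is unchanged from the
non-relativistic treatment:
$$\int_0^{2\pi} d\theta\,p_\theta = kh \;\rightarrow\; L = k\hbar, \tag{8.33}$$
with $k$ an integer.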
Expanding the squares, equation (8.32) above can be cast in the same form as
the integrand of equation (8.26):
$$\frac{p_r^2}{2m} \equiv \frac{1}{2m}\left(\frac{\partial S_1}{\partial r}\right)^2 = \bar{E} + \frac{\bar{e}^2}{r} - \frac{\bar{L}^2}{2mr^2}, \tag{8.34}$$
with
$$\bar{E} = E\left(1 + \frac{E}{2mc^2}\right), \tag{8.35a}$$
$$\bar{e} = e\sqrt{1 + \frac{E}{mc^2}}, \tag{8.35b}$$
$$\bar{L} = \hbar\bar{k} \equiv \hbar\sqrt{k^2 - \alpha^2}, \tag{8.35c}$$
and $\alpha = e^2/\hbar c$ the so-called “fine structure constant”² ($\alpha \simeq 1/137$).
1 In comparing with equation (7.45), notice that Sommerfeld adds a constant $mc^2$ to $E$. This is just a shift so
that the zero of energy corresponds to the particle at rest.
2 The fine structure constant gave rise to numerological speculations and inspired Guido Beck, Hans Bethe,
and Wolfgang Riezler to write a parody relating α = 1/137 to the value of the absolute zero temperature T0 ,
expressed in degrees Kelvin: T0 = −(2/α − 1). The spoof was accepted in good faith by the editor of Die
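The contour integration proceeds in the same way as in the previous section, and
we obtain $\bar{E}$ from equation (8.31) substituting $e \rightarrow \bar{e}$, $k \rightarrow \bar{k}$:
$$|\bar{E}| = \frac{2\pi^2 m\bar{e}^4}{h^2}\frac{1}{\left(n + \sqrt{k^2-\alpha^2}\right)^2}, \tag{8.36}$$
or, with a little algebra in the above equation we obtain:
$$E = mc^2\left\{\left[1 + \frac{\alpha^2}{\left(n+\sqrt{k^2-\alpha^2}\right)^2}\right]^{-1/2} - 1\right\}. \tag{8.37}$$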
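The “little algebra” leading from (8.36) to (8.37) can be sketched as follows. The shorthand $\epsilon = E/mc^2$ and $N = n + \sqrt{k^2-\alpha^2}$ is introduced here for convenience and does not appear in the text; the steps use $\bar{e}^4 = e^4(1+\epsilon)^2$, the identity $2\pi^2 me^4/h^2 = mc^2\alpha^2/2$, and the fact that $\bar{E} < 0$ for a bound state.

```latex
\begin{align*}
-\bar{E} = |\bar{E}|
 \;&\Longrightarrow\;
 -mc^{2}\,\epsilon\left(1+\frac{\epsilon}{2}\right)
   = \frac{mc^{2}\alpha^{2}}{2}\,\frac{(1+\epsilon)^{2}}{N^{2}}, \\
 &\Longrightarrow\;
 1-(1+\epsilon)^{2} = \frac{\alpha^{2}}{N^{2}}\,(1+\epsilon)^{2}, \\
 &\Longrightarrow\;
 1+\epsilon = \left(1+\frac{\alpha^{2}}{N^{2}}\right)^{-1/2},
 \qquad
 E = mc^{2}\left[\left(1+\frac{\alpha^{2}}{N^{2}}\right)^{-1/2}-1\right].
\end{align*}
```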
The two quantum numbers n and k give rise to orbits which before had the
same energy but now have very slightly different energies. For example, the orbits
corresponding to (k = 2, n = 1) and (k = 1, n = 2) have the same energy
according to equation (8.31), but not according to equation (8.37). Sommerfeld’s
formula later showed very good agreement with experiment, and this anticipation
received high praise from Planck, who in his Nobel lecture of 1920 said:
“that magic formula arose before which both the hydrogen and the helium spectrum had
to reveal the riddle of their fine structure, to such an extent that the finest present-day mea-
surements, those of F. Paschen, could be explained generally through it - an achievement
fully comparable with that of the famous discovery of the planet Neptune whose existence
and orbit was calculated by Leverrier before the human eye had seen it.”
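To get a feeling for the size of the effect, the sketch below evaluates equation (8.37) for the two orbits mentioned above, $(k = 2, n = 1)$ and $(k = 1, n = 2)$, and compares them with the non-relativistic value from equation (8.31), rewritten here as $-mc^2\alpha^2/2(n+k)^2$. The numerical constants are standard values; the function names are ours.

```python
import math

ALPHA = 7.2973525693e-3      # fine structure constant
MC2_EV = 510998.95           # electron rest energy, eV


def sommerfeld_energy(n, k):
    """Equation (8.37), with n the radial and k the angular quantum number (eV)."""
    denom = n + math.sqrt(k**2 - ALPHA**2)
    return MC2_EV * ((1 + ALPHA**2 / denom**2) ** -0.5 - 1)


def bohr_energy(n, k):
    """Equation (8.31) written as -m c^2 alpha^2 / (2 (n + k)^2), in eV."""
    return -0.5 * MC2_EV * ALPHA**2 / (n + k) ** 2


e_12 = sommerfeld_energy(1, 2)   # n = 1, k = 2
e_21 = sommerfeld_energy(2, 1)   # n = 2, k = 1
print("non-relativistic:", bohr_energy(1, 2), bohr_energy(2, 1))
print("relativistic:    ", e_12, e_21)
print("splitting (eV):  ", e_12 - e_21)
```

The two levels coincide in the non-relativistic formula and are separated by roughly $10^{-5}$ eV in the relativistic one, with the more eccentric orbit ($k = 1$) the more tightly bound.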
8.3 Adiabatic Invariants

In 1911, at the first Solvay Conference in Brussels, Albert Einstein and Hendrik
A. Lorentz discussed a simple problem that later led to a central research tool
within the old quantum theory. At the time, it was assumed that mechanical systems
subject to the as yet embryonic quantum laws could only make all or nothing jumps
between allowed states of different energy. What would happen, Lorentz asked,
if one takes a pendulum with an allowed energy and shortens the length of the
string by grasping it with two fingers? Einstein remarked (without proof) that, even
though the pendulum’s period would decrease and its energy would increase, if the
string is shortened very slowly (adiabatically), the product of the two quantities
will remain constant: a pendulum whose frequency changes adiabatically does not
undergo a quantum jump, and the product of the period and the energy is quantized.
The first formal connection between adiabatic transformations and quantum theory
is due to Paul Ehrenfest (1911, 1913) who proposed, after many conversations
with Einstein, the adiabatic hypothesis (Navarro and Pérez, 2006; Pié i Valls and
Pérez, 2016): the adiabatic invariants of mechanical systems are the quantities to
be quantized.
2 (continued) Naturwissenschaften (Beck, Bethe, and Riezler, 1931). As an amusing coincidence, we note that the value of
1/α appears in H. G. Wells’ short story “The Crystal Egg,” published in 1897: “Suffice that the effect was
this: the crystal, being peered into at an angle of about 137 degrees from the direction of the illuminating ray,
gave a clear and consistent picture of a wide and peculiar country-side.”
In order to understand the essentials of an adiabatic invariant, let us consider a
harmonic oscillator of frequency $\omega$. Let us think (rather than of a pendulum) of
a mass $m$ attached to a spring with constant $K$, and $\omega = \sqrt{K/m}$. Given some
initial conditions, the displacement x(t) with respect to the equilibrium position
will execute a harmonic motion. For example, if we start the oscillator at zero
velocity and a displacement x0 , we will have
x(t) = x0 cos ωt.
The velocity v(t) in this case is
v(t) = −ωx0 sin ωt.
The energy E is a constant of motion:

$$E = \frac{1}{2}mv^2(t) + \frac{1}{2}Kx^2(t) = \frac{1}{2}Kx_0^2.$$
Now we want to change the spring constant very slowly, in such a way that
$$\frac{1}{K}\frac{\Delta K}{\Delta t} \ll \omega,$$
which means that, in a period of oscillation of the spring and the mass, the relative
change in the spring constant is very small.
We will carry out the variation of $K$ in small constant steps separated by intervals
of time $\Delta t$ much larger than the period of oscillation (see Figure 8.1). Notice
that, as we vary $K$, the period of oscillation is changing, so at each step we want
the condition $\omega\Delta t \gg 1$, where now $\omega = \omega(t)$. In this process, $\omega$ will change
“abruptly” at certain points in time and will remain constant until the next change.
Figure 8.1 Stepwise change of the spring constant of a harmonic oscillator.

Figure 8.2 Sudden change of spring constant of a harmonic oscillator.
Consider the first change in the spring constant, from $K$ to $K + \Delta K$, taking
place at some time $t_1$. The velocity $v$ and the coordinate $x$ have the same
values immediately before and immediately after the change. This means that the
change in energy due to the change in K is all potential at the moment of the
change:
$$E \rightarrow E + \frac{1}{2}\Delta K\,x_0^2\cos^2(\omega t_1) = E + E\,\frac{\Delta K}{K}\cos^2\phi_1,$$
where we set $\phi_1 = \omega t_1$. This energy remains constant until the next change.
We can write the first energy change in terms of the change in spring constant as
$$\frac{\Delta E}{E} = \frac{\Delta K}{K}\cos^2\phi_1,$$
or, in terms of the corresponding change in frequency:
$$\frac{\Delta E}{E} = 2\,\frac{\Delta\omega}{\omega}\cos^2\phi_1. \tag{8.38}$$
Notice that the relative change in energy and the relative change in frequency
are proportional, but the proportionality constant ($2\cos^2\phi_1$) depends on the instant
at which the change took place. After $n$ changes, the energy will have increased to
some value $E_n$ and the frequency to some value $\omega_n$. In the $(n+1)$th change, we
will have
$$\frac{\Delta E_n}{E_n} = 2\,\frac{\Delta\omega}{\omega_n}\cos^2\phi_n, \tag{8.39}$$
where, for simplicity, we have assumed that the changes $\Delta\omega$ are all the same.
Again, the relative changes depend on the instant when the change takes place
through the factor $2\cos^2\phi_n$. Now, since we are assuming that the changes are
separated by time intervals that are very long compared with the period of oscillation,
the phases $\phi_n$ can be taken as random, and we can say that, on average
$$\frac{\Delta E}{E} = \frac{\Delta\omega}{\omega}, \tag{8.40}$$
where we have used the fact that the average of $\cos^2\phi$ is $\langle\cos^2\phi\rangle = 1/2$.
Another way of writing the above equation is
$$\frac{\Delta E}{\Delta\omega} = \frac{E}{\omega} = \text{Constant}. \tag{8.41}$$
The ratio E/ω is the adiabatic invariant of the harmonic oscillator. In order to
visualize the fact that this is an invariant on average, with some small fluctuations,
in Figure 8.3 we show the results of iterating equation (8.39) for three cases using
φn as random numbers.
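The iteration of equation (8.39) is easy to reproduce. The sketch below is a minimal version under assumptions of our own (uniformly distributed phases, a fixed step $\Delta\omega$, a seeded pseudo-random generator); it is not the code used to produce Figure 8.3.

```python
import math
import random


def iterate_adiabatic(e0, w0, w_final, dw, seed=0):
    """Apply equation (8.39) repeatedly with random phases phi_n.

    Each step raises the frequency by dw and the energy by
    2 E (dw / omega) cos^2(phi). Returns the final (energy, omega).
    """
    rng = random.Random(seed)
    energy, omega = e0, w0
    while omega < w_final:
        phi = rng.uniform(0.0, 2.0 * math.pi)
        energy += 2.0 * energy * (dw / omega) * math.cos(phi) ** 2
        omega += dw
    return energy, omega


e0, w0 = 1.0, 1.0
energy, omega = iterate_adiabatic(e0, w0, w_final=2.0, dw=1e-4)
ratio = (energy / omega) / (e0 / w0)
print(f"omega: {w0} -> {omega:.4f}, E: {e0} -> {energy:.4f}, (E/omega) ratio = {ratio:.4f}")
```

With these settings the frequency doubles and the energy roughly doubles with it, so that $E/\omega$ stays close to its initial value; the residual scatter shrinks as the step $\Delta\omega$ is made smaller.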
For a harmonic oscillator, the orbits of constant energy (in the position-momentum
plane $(x, p)$) are the ellipses of equation (8.18), whose area $2\pi E/\omega$ is
the adiabatic invariant. See Figure 8.4. Ehrenfest extended the argument to multi-
periodic orbits and put the adiabatic principle on a firmer basis. He showed that for
periodic orbits, the adiabatic invariant is the average kinetic energy during a period:
“the average kinetic energy increases in the same proportion as the frequency under
adiabatic influencing.” Notice the equivalence between the adiabatic principle and
Sommerfeld’s condition:
Figure 8.3 Energy versus frequency for a harmonic oscillator for which the fre-
quency is increased very slowly, starting from A: frequency ω/2 and energy E,
B: frequency ω and energy E, and C: frequency ω and energy E/2. Notice that
there are some small fluctuations, but on average the energy increases in such a
way that E/ω retains its initial value. The dashed lines show the corresponding
straight lines E = Constant × ω.
Figure 8.4 In an adiabatic transformation from an ellipse A into a different ellipse
B of the harmonic oscillator, the form changes but the area does not. (Axes: position $x$
and momentum $p$; ellipse A has semi-axes $\sqrt{2E/m}/\omega$ along $x$ and $\sqrt{2mE}$ along $p$.)
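$$\frac{2\bar{T}}{\nu} = 2\int_0^\tau T\,dt = \int_0^\tau mv^2\,dt = \oint mv\,ds, \tag{8.42}$$
with $\tau = 1/\nu$ the period of the orbit and $\bar{T}$ the kinetic energy averaged over
one period. We show the details of the derivation in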
Appendix I. The concept of invariance (relativistic invariance of the action and adi-
abatic invariance) brings some unity to the fragmentary models of the old quantum
theory. The adiabatic principle, in most historical accounts, has less significance
than the correspondence principle, which is regarded as a precursor of the full
theory (Pié i Valls and Pérez, 2016).
8.4 De Broglie’s Matter Waves

A major breakthrough in the development of quantum theory is due to Louis de
Broglie who, in a series of papers published in 1923 (de Broglie, 1923a,b,c),
and later summarized in his doctoral thesis (de Broglie, 1924b), proposed wave
mechanics as a theory of matter.
His initial goal was to introduce quantum mechanics into relativistic dynamics.
In reference to the relationship energy = h × frequency, in the introduction to his
thesis he writes, “It seems to us that the fundamental idea pertaining to quanta is
the impossibility of considering an isolated quantity of energy without associating
a particular frequency to it” (de Broglie, 1924b, p. 11). He stresses that the devel-
opment of quantum theory is in terms of the action rather than energy: Planck’s
constant has units of action (ML$^2$T$^{-1}$) and that “can be no accident since Relativity
theory reveals that the action is one of the principal ‘invariants’ of physics.”
The very basis of de Broglie’s theory is that, by some meta law (grande loi)
of Nature, the particle (in its rest frame) is the “seat” of some periodic, internal
phenomenon of frequency $\nu_0$. The relation between $\nu_0$ and $m$ is given by both
Einsteinian relations: the quantum relation $E_0 = h\nu_0$ and the relativistic relation
$E_0 = mc^2$ (de Broglie, 1923a):
$$h\nu_0 = mc^2. \tag{8.43}$$
This “is worth as much, like all hypotheses, as can be deduced from its conse-
quences.”
Now consider the particle moving at velocity $v$. The stationary observer sees,
due to time dilation, a reduced frequency $\nu_1 = \nu_0\sqrt{1-\beta^2}$, with $\beta = v/c$. On the
other hand, since we are assuming $\nu_0 = E_0/h$, the relativistic transformation
of the energy – equation (7.32) – gives an increased frequency,
$\nu = \nu_0/\sqrt{1-\beta^2}$. To solve this
enigma, de Broglie, drawing some inspiration from an earlier paper by Marcel

Brillouin (1919), proposes a brilliant idea: there is a “fictitious wave – onde fic-
tive – associated with the motion” of the particle. Both the phase velocity and the
frequency of the wave will be different from the particle velocity and its internal
frequency, but, as they move, particle and fictitious wave will stay in phase.
Suppose that the internal periodic motion associated with the particle at rest has the
form $\sin(2\pi\nu_0 t)$. If the particle moves at velocity $v$, the motion, viewed from the
frame of the stationary observer, is $\sin(2\pi\nu_1 t) = \sin(2\pi\nu_1 x/v)$, where the change
in frequency comes from time dilation. Now suppose we associate with the particle a
wave of frequency $\nu$ and phase velocity³ $v_{ph} = c^2/v$, and evaluate it at the position
$x = vt$ of the particle:
$$\sin\left[2\pi\nu\left(t - \frac{v}{c^2}x\right)\right]_{t=x/v} = \sin\left(2\pi\nu_1\frac{x}{v}\right). \tag{8.44}$$
The fictitious wave is in phase with the “internal phenomenon” of the particle.
de Broglie calls this equality “the theorem of phase harmony.”
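The theorem of phase harmony can be verified with a few lines of arithmetic. In the sketch below the rest-frame frequency, the velocity and the position are arbitrary illustrative numbers (the physical $\nu_0 = mc^2/h$ is enormous and would only add rounding noise); the function name is ours.

```python
import math

C = 299_792_458.0   # speed of light, m/s


def phases(nu0, v, x):
    """Phase of the associated wave and of the internal oscillation, in radians,
    when a particle of rest-frame frequency nu0 and speed v reaches position x."""
    beta = v / C
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    nu = nu0 * gamma          # wave frequency, from the energy transformation
    nu1 = nu0 / gamma         # internal frequency slowed by time dilation
    t = x / v                 # time at which the particle is at x
    wave = 2 * math.pi * nu * (t - v * x / C**2)   # phase velocity c^2 / v
    internal = 2 * math.pi * nu1 * x / v
    return wave, internal


wave, internal = phases(nu0=5.0, v=0.6 * C, x=1.0e8)
print(wave, internal)
```

The two phases agree because $\nu(1 - \beta^2) = \nu_0\sqrt{1-\beta^2} = \nu_1$, which is exactly the content of equation (8.44).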
Relating the momentum of the particle with the phase velocity we have:
$$p = \frac{mv}{\sqrt{1-\beta^2}} = \frac{E}{c^2}v = \frac{h\nu}{v_{ph}}, \tag{8.45}$$
and, using $v_{ph} = \lambda\nu$, de Broglie obtains $p = h/\lambda$. The non-relativistic version of
his famous momentum-wavelength relation,
$$\lambda = \frac{h}{mv}, \tag{8.46}$$
appears explicitly for the first time in his thesis (de Broglie, 1924b, p. 93).
3 de Broglie emphasizes that the wave has to be fictitious since $v_{ph}$ is larger than the velocity of light and
therefore cannot correspond to transport of energy.
Next, de Broglie moves on to particles describing closed paths of variable velocities,
keeping in mind the optical-mechanical analogy: “The rays of the phase wave
are identical with the paths which are dynamically possible” (de Broglie, 1924a).
In other words, he is addressing a question that is latent in Hamilton’s work: what
is the wave theory whose limit corresponds to the Newtonian paths? Unknown (as
far as we know) to de Broglie, is a striking remark in Ehrenfest’s notebooks that
raises this issue (in 1904!): “The Hamilton-Jacobi equation of Lagrangian mechan-
ics corresponds to diffractionless optics. What is the super-Lagrangian mechanics
whose Hamilton-Jacobi equation is adequate for describing the diffracted wave?”
(Klein, 1970, p. 161).
Starting from the equivalence between Fermat’s principle and Maupertuis’s least
action principle,
$$\underbrace{\delta\int\frac{ds}{\lambda}}_{\text{Fermat}} = \underbrace{\delta\int p\,ds}_{\text{Maupertuis}} = 0, \tag{8.47}$$
de Broglie proposes to interpret the stability of Bohr’s orbits as a “tuning”
condition: after completing an orbit the electron has to be in phase with itself. If the
wavelength is constant, the tuning condition is $\ell = n\lambda$ with $\ell$ the length of the
orbit. In the general case
$$\oint\frac{ds}{\lambda} = \oint\frac{p}{h}\,ds = n. \tag{8.48}$$
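For circular Bohr orbits the tuning condition can be made concrete. The sketch below builds the $n$-th orbit from the force balance $mv^2/r = e^2/r^2$ and the angular momentum $mvr = n\hbar$, then counts how many de Broglie wavelengths $\lambda = h/mv$ fit along the circumference. Constants are in Gaussian units and the function names are ours.

```python
import math

M_E = 9.1093837e-28         # electron mass, g
E_CHARGE = 4.80320471e-10   # elementary charge, esu
H = 6.62607015e-27          # Planck's constant, erg s
HBAR = H / (2 * math.pi)


def bohr_orbit(n):
    """Radius (cm) and speed (cm/s) of the n-th circular orbit, from
    m v^2 / r = e^2 / r^2 together with m v r = n hbar."""
    radius = n**2 * HBAR**2 / (M_E * E_CHARGE**2)
    speed = n * HBAR / (M_E * radius)
    return radius, speed


def wavelengths_around_orbit(n):
    """Orbit length divided by the de Broglie wavelength h / (m v)."""
    radius, speed = bohr_orbit(n)
    wavelength = H / (M_E * speed)
    return 2 * math.pi * radius / wavelength


for n in (1, 2, 3):
    print(n, bohr_orbit(n)[0], wavelengths_around_orbit(n))
```

The count comes out equal to $n$, so Bohr's condition on the angular momentum and de Broglie's condition on the phase are the same statement for these orbits.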
de Broglie, a connoisseur of chamber music, regards the atom as “some kind of
musical instrument which, depending on the way it is constructed, can emit a
certain basic tone and a sequence of overtones” (Gamow, 1966, p. 81). With de
Broglie, “the fundamental bond which unites two great principles of geometric
optics and of dynamics is thus fully brought to light” (de Broglie, 1923c) and will
be fully developed a few years later by Erwin Schrödinger.
Schrödinger, who was very familiar with Hamiltonian mechanics, even prior to
1924, was able to utilize Hamilton’s version of the optical mechanical analogy to
derive a wave equation, something that was missing in de Broglie’s theory (Kragh,
1982).
8.5 Schrödinger’s Wave Mechanics

The development of wave mechanics by Erwin Schrödinger represents the success
of pursuing and extending the Hamilton analogy of mechanics to optics: to each
path of a particle in the space of configurations corresponds a light ray in geometric
optics. The idea of a light ray requires that the dimensions of the path be large
compared with the wavelength of light; otherwise a full undulatory description
is required. In his seminal papers, Schrödinger (1926a,b,c,d) follows the optical
mechanical analogy proposed by de Broglie, and searches for a wave equation that,
in the limit of small wavelengths, corresponds to classical mechanics in the same

way as the small wavelength limit of undulatory optics corresponds to geometrical
optics. This idea was anticipated in a paper by Sommerfeld and Runge (1911), who
showed that the Hamilton–Jacobi equation can be obtained for a scalar wave when
the wavelength is small compared with the typical spatial variations of the medium.
This similarity is expressed in the so-called eikonal equation⁴.
8.5.1 The Eikonal Equation

Consider the simple case of a wave propagating in one dimension:
$$\frac{\partial^2\phi(x,t)}{\partial t^2} = \frac{c^2}{n^2(x)}\frac{\partial^2\phi(x,t)}{\partial x^2}. \tag{8.49}$$
For $n = 1$ the stationary states are
$$\phi(x,t) = e^{ik(x-ct)}.$$
Now consider a stationary state for the case $n = n(x)$:
$$\phi(x,t) = \phi(x)e^{-ikct}.$$
The wave equation for $\phi(x)$ becomes:
$$\frac{\partial^2\phi(x)}{\partial x^2} = -k^2n^2(x)\phi(x). \tag{8.50}$$
Sommerfeld and Runge consider $n$ to be a slowly varying function at the scale
of $1/k$ and write the stationary state as
$$\phi(x) = A(x)e^{ikW(x)}, \tag{8.51}$$
where $A(x)$ is a slowly varying function. Inserting (8.51) in (8.50) we obtain by
simple computation of the derivatives:
$$\frac{d^2A(x)}{dx^2} + 2ik\frac{dA(x)}{dx}\frac{dW(x)}{dx} + ikA(x)\frac{d^2W(x)}{dx^2} - k^2A(x)\left(\frac{dW(x)}{dx}\right)^2 = -k^2n(x)^2A(x). \tag{8.52}$$
The above equation is exact. We can divide equation (8.52) by $k^2$ and take the limit
$k \rightarrow \infty$ (the small wavelength limit), and we get
$$\left(\frac{dW(x)}{dx}\right)^2 = n(x)^2, \tag{8.53}$$
which, immediately extended to three dimensions, leads to:
$$\left(\frac{\partial W}{\partial x}\right)^2 + \left(\frac{\partial W}{\partial y}\right)^2 + \left(\frac{\partial W}{\partial z}\right)^2 = n(x)^2, \tag{8.54}$$
the Hamilton-Jacobi equation for light.
4 The word “eikonal” is the German form of the Greek word εἰκών, meaning likeness.
8.5.2 Schrödinger’s Derivation

Schrödinger considers the “general correspondence which exists between the
Hamilton-Jacobi differential equation and the “allied” wave equation.” He then
exploits the fact that Hamilton’s principle corresponds to Fermat’s principle for
wave propagation, and that the Hamilton-Jacobi equation expresses Huygens’ prin-
ciple for this wave propagation. In his view, the “micro-mechanical” laws represent
a breakdown of classical mechanics much as “the laws of geometry are not applica-
ble for diffraction.” It is notable that the optical mechanical analogy leads Hamilton
to formulate his theory of mechanics, and later leads Schrödinger to his theory of
wave mechanics.
It is also worth noting that in his papers Schrödinger follows the optical
mechanical analogy to derive a time independent equation. He then introduces the
time-dependent equation in a rather ad hoc way aiming at a description of quantum
systems subject to a time-varying perturbation (Briggs, 2001).
In his 1926 papers Schrödinger presents three derivations of his equation. In the
first paper (Schrödinger, 1926a) he presents a rather cryptic derivation following
a variational principle. According to his notebooks, he applied the method “after
he had found a candidate for the differential sought for” (Kragh, 1982). In the
second paper (Schrödinger, 1926b), he follows the reverse of the transformation
of Sommerfeld and Runge, and looks for a wave equation that has as its eikonal
equation the Hamilton-Jacobi equation. It is interesting that Schrödinger first writes
a wave equation that is second order in time, just like (8.49). In the Hamilton-Jacobi
formulation for light, c/n(x) is the normal velocity of the surface of constant W .
Consider a particle moving in a time-independent potential, at constant
energy E. In the spirit of the eikonal derivation, the Hamilton-Jacobi equation can
be treated as an equation for the phase φ(x, t):
$$S \propto \phi(x,t) \propto \int_{x_0}^{x} p(x)\,dx - Et. \tag{8.55}$$
The wave-front corresponds to a surface (a point in one dimension) for which $\phi$ is
constant, or
$$d\phi = 0 = p(x)\,dx - E\,dt. \tag{8.56}$$
The velocity $u = dx/dt = E/p(x)$ of the wave-front is
$$u(x) = \frac{E}{\sqrt{2m(E - V(x))}}. \tag{8.57}$$
If we take $u(x)$ as equivalent to $c/n(x)$ (notice that the units are the same) and
substitute equation (8.57) in equation (8.49) we get the following time-dependent
equation for a wave function $\psi(x,t)$
$$\frac{\partial^2\psi(x,t)}{\partial x^2} = \frac{2m[E - V(x)]}{E^2}\frac{\partial^2\psi(x,t)}{\partial t^2}. \tag{8.58}$$
The above equation is not a “good” time-dependent equation for a quantum particle:
on the one hand, it contains the energy $E$ as a parameter associated with a
stationary, time independent equation; on the other hand, it does not contain $\hbar$! For
a stationary state, Schrödinger proposes
$$\psi(x,t) = e^{i\omega t}\psi(x),$$
which, together with Einstein’s relation
$$\omega = \frac{E}{\hbar}$$
gives rise to his celebrated time-independent equation:
$$-\frac{\hbar^2}{2m}\frac{\partial^2\psi(x)}{\partial x^2} = [E - V(x)]\psi(x). \tag{8.59}$$
Notice that this is the same differential equation we would get from the optical
mechanical analogy, replacing $k^2n^2(x)$ in equation (8.50) by $p^2(x)/\hbar^2 =
2m[E - V(x)]/\hbar^2$ (here $p(x)$ is the classical particle momentum, so that
$\partial^2\psi/\partial x^2 = -(p^2/\hbar^2)\psi$). The optical-mechanical analogy therefore carries
over seamlessly to wave mechanics as long
as one is dealing with stationary states. It is worth stressing that the optical-
mechanical analogy embodied in Maupertuis’s and Fermat’s principles can be
considered a geometrical isomorphism between paths of particles and trajectories
of light rays. However, the correspondence breaks down when one considers the
dynamics: the light ray’s velocity ($v_L$) at a point $x$ varies inversely to that of the
particle ($v_P$) following the same path: $v_L \propto 1/n(x)$, and $v_P \propto p(x) \propto n(x)$. This
lack of correspondence between the dynamics of particles and rays is mirrored in
the difference between the time-dependences of the wave equations. If one insists
on a differential equation for particles with a second-order derivative in time, as
in equation (8.58), the parameter E, which one would expect to label a station-
ary state through Einstein’s relation, lingers into the time-independent equation,
signaling an inconsistency.
It is only later, in the fourth of his 1926 papers in Annalen der Physik that
Schrödinger (1926d) proposes a first-order time-dependent equation that would
“get rid of the energy parameter” and that acquires the form of (8.59) for stationary
states:
$$ -\frac{\hbar^2}{2m}\,\frac{\partial^2 \psi(x,t)}{\partial x^2} + V(x)\,\psi(x,t) = \pm i\hbar\,\frac{\partial \psi(x,t)}{\partial t}. \tag{8.60} $$
8.6 Dirac’s Lagrangian View of Quantum Mechanics

In a very influential paper, the English physicist Paul Dirac (1933) considered the
role of the Lagrangian in quantum mechanics. Up to then, quantum mechanics
was “built up on a foundation of analogy with the Hamiltonian theory of clas-
sical mechanics.” The theories of Schrödinger and Heisenberg took the classical
notions of coordinates and momenta, which evolve according to the Hamiltonian
rules of dynamics, and looked for modifications that can account for the exper-
imental observations. Experimentally measured quantities are, for example, the
frequencies of the light emitted by an atom: the energy levels of the atom are
an example of an observable quantity. In the words of Max Born, “In the case
of atomic theory, we have certainly introduced, as fundamental constituents, mag-
nitudes of very doubtful observability, as, for instance, the position, velocity and
period of the electron.” The modification introduced by Heisenberg is an attempt
to translate classical mechanics, “as slightly altered as possible, into matrix form”
(Born, 1926). Coordinates q and momenta p are replaced by matrix operators p̂
and q̂. Products of matrices are not commutative ( p̂q̂ is not in general q̂ p̂), and the
condition
$$ \hat q\,\hat p - \hat p\,\hat q = i\hbar \tag{8.61} $$
is introduced. The above relation implies directly that the operator p̂, acting on a function of q̂, computes (up to the factor −iħ) its derivative with respect to q̂:
$$ \hat p = -i\hbar\,\frac{\partial}{\partial \hat q}. \tag{8.62} $$
These operators act, as matrices, on a “space” of states, which Dirac denotes |ψ⟩ (the “ket”) and ⟨φ| (the “bra”), with the inner product denoted by ⟨φ|ψ⟩. In a representation where the position operators are diagonal, one has
$$ \hat q\,|q\rangle = q\,|q\rangle \tag{8.63} $$
$$ \hat Q\,|Q\rangle = Q\,|Q\rangle \tag{8.64} $$
In his paper Dirac relates the inner product ⟨q|Q⟩ with the so-called contact
transformations of the Lagrangian theory. The notion of contact transformation is
closely related to Huygens’ principle for wave propagation and, at the same time, to
Hamilton’s view of the propagation of the surfaces of constant action discussed in
Chapter 6. A contact transformation can be thought of as a mapping from a surface σ to a new surface σ′, or, in Huygens’ context, a transformation from the wave-front σ into the wave-front σ′. In the Hamilton-Jacobi theory, the contact transformation is defined by the action S(Q, q) between two points separated by a time interval t:
$$ S(Q,q) = \int_{Q}^{q} L\,dt. \tag{8.65} $$
If one considers the point Q as fixed, the equation S = constant defines, as a function of q, a surface of constant action. If now the point Q takes all possible positions on the surface σ, the different surfaces of constant S envelop the surface σ′. In this way, the function S(Q, q) “carries” the wave-front σ to σ′ a time t later. In the classical view, the momenta are:
$$ p = \frac{\partial S}{\partial q}, \qquad P = -\frac{\partial S}{\partial Q}. \tag{8.66} $$
The quantum analog to S would “carry” the quantum wave function from a time t0
to t0 + t.
In his search for the quantum contact transformation, Dirac starts by writing the
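following identities
$$ \begin{aligned}
\langle q|\hat q|Q\rangle &= q\,\langle q|Q\rangle \\
\langle q|\hat Q|Q\rangle &= Q\,\langle q|Q\rangle \\
\langle q|\hat p|Q\rangle &= -i\hbar\,\frac{\partial}{\partial q}\langle q|Q\rangle \\
\langle q|\hat P|Q\rangle &= i\hbar\,\frac{\partial}{\partial Q}\langle q|Q\rangle
\end{aligned} \tag{8.67} $$
and then considers a function α of the two position operators:
$$ \langle q|\alpha(\hat q,\hat Q)|Q\rangle = \alpha(q,Q)\,\langle q|Q\rangle, \tag{8.68} $$
where α(q, Q) is now a function of the numbers (not operators) q and Q. It is important that for this expression to be valid the function α has to be written in terms of q̂ and Q̂ in a “well-ordered way”; that is, in a power series expansion of the function α, wherever a term of the form Q^n q^m appears, all the Q̂ factors are written to the right and all the q̂ factors to the left: q̂^m Q̂^n. He then puts
$$ \langle q|Q\rangle = e^{iU(q,Q)/\hbar}. \tag{8.69} $$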
Since Dirac is not assuming U to be real, the above expression does not lose
generality by being written as an exponential. A simple substitution of (8.69) in
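Equations (8.67) gives:
$$ \langle q|\hat p|Q\rangle = -i\hbar\,\frac{\partial}{\partial q}\langle q|Q\rangle = \frac{\partial U(q,Q)}{\partial q}\,\langle q|Q\rangle = \langle q|\,\frac{\partial U(\hat q,\hat Q)}{\partial \hat q}\,|Q\rangle. \tag{8.70} $$
From the above equation Dirac makes the following operator identifications:
$$ \hat p = \frac{\partial U(\hat q,\hat Q)}{\partial \hat q}, \qquad \hat P = -\frac{\partial U(\hat q,\hat Q)}{\partial \hat Q}, \tag{8.71} $$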
which are of the form of the classical equations (8.66). Therefore, Dirac concludes,
the function U (q, Q) defined in equation (8.69) is “the analogue” of the classical
action S(Q, q). Notice that Dirac is stating an analogy and not an equality.
8.7 Feynman’s Thesis and Path Integrals

In his doctoral thesis entitled “The Principle of Least Action in Quantum
Mechanics,” Richard Feynman (1942/2005) considers the fundamental role of the
Lagrangian rather than the Hamiltonian and develops in more elaborate detail the
idea outlined in Dirac’s 1933 paper. Feynman’s interest at the time was to formu-
late a quantum theory of electrodynamics. The basic issue was that the action to
be minimized involves interaction of particles at different times: the Lagrangian is
not local, there is no momentum, and the quantization becomes problematic. As an
example Feynman considers the action S given by:
$$ S = \int_{-\infty}^{\infty} \left\{ \frac{m\,\dot x^2(t)}{2} - V[x(t)] + k^2\,\dot x(t)\,\dot x(t+T_0) \right\} dt, \tag{8.72} $$
which corresponds to a particle in a potential V(x), and interacting with itself in a distant mirror: T0/2 is the (constant) time it takes for light to reach the mirror. Since the quantity in curly brackets is not local in time, there is no good definition of momentum. The equations of motion derived from δS/δx(t) = 0 are
$$ m\,\ddot x(t) = -V'[x(t)] - k^2\left[\ddot x(t+T_0) + \ddot x(t-T_0)\right]. \tag{8.73} $$
The force acting on the particle at time t depends on the motion of the particle
at times other than t, and the equations of motion cannot be described directly in
Hamiltonian form. “The least action principle does not imply a Hamiltonian form
if the action is a function of anything more than positions and velocities at the
same moment” (Feynman, 1965). Feynman confronts the need to quantize systems
which in general have no Hamiltonian form, but whose (classical) dynamics can be
described in terms of a least action principle. In his Nobel prize acceptance speech,
he narrates the story eloquently:
When I was struggling with this problem, I went to a beer party in the Nassau Tavern in
Princeton. There was a gentleman, newly arrived from Europe (Herbert Jehle) who came
and sat next to me. Europeans are much more serious than we are in America because
they think that a good place to discuss intellectual matters is a beer party. So, he sat by
me and asked, “what are you doing” and so on, and I said, “I’m drinking beer.” Then I
realized that he wanted to know what work I was doing and I told him I was struggling
with this problem, and I simply turned to him and said, “listen, do you know any way of
doing quantum mechanics, starting with action—where the action integral comes into the
quantum mechanics?” “No,” he said, “but Dirac has a paper in which the Lagrangian, at
least, comes into quantum mechanics. I will show it to you tomorrow.”
Feynman goes on to relate his bafflement with Dirac’s statement about the quantity ⟨q|Q⟩ as being “analogous” to the exponential of the action. Since ⟨q|Q⟩ propagates the wave function ψ from point q (or x) to a different point Q (or x′) at a later time, consider the propagation from point (x, t) to point (x′, t + ε) an infinitesimal time ε later (Feynman, 1942/2005):
$$ \psi(x,t+\varepsilon) = A\int_{-\infty}^{\infty} dx'\,\psi(x',t)\,e^{\frac{i}{\hbar}\int_{x'}^{x} dt\,L}, \tag{8.74} $$
where Feynman is taking Dirac’s statement at face value, replacing the “analogous”
by an equality through a proportionality constant A. Since the integral of L is
between two infinitesimally close times one can write:
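$$ \int_{x'}^{x} dt\,L \simeq \varepsilon\,L_{\text{average}}. \tag{8.75} $$
Replacing the average velocity between the two very close points x and x′ by (x′ − x)/ε ≡ η/ε one obtains
$$ \int_{x'}^{x} dt\,L \simeq \frac{m}{2}\,\frac{(x-x')^2}{\varepsilon} - \varepsilon\,V\!\left(\frac{x+x'}{2}\right) \simeq \frac{m}{2}\,\frac{\eta^2}{\varepsilon} - \varepsilon\,V(x). \tag{8.76} $$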
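Substituting the above expression in (8.74), and expanding to lowest order in ε and η one obtains:
$$ \begin{aligned}
\psi(x,t+\varepsilon) &\simeq A\int_{-\infty}^{\infty} d\eta\,\psi(x+\eta,t)\,e^{\frac{i}{\hbar}\left[\frac{m\eta^2}{2\varepsilon} - \varepsilon V(x)\right]} \\
&\simeq A\left[1 - \frac{i\varepsilon}{\hbar}V(x)\right]\int_{-\infty}^{\infty} d\eta\,\psi(x+\eta,t)\,e^{\frac{i m\eta^2}{2\hbar\varepsilon}} \\
&\simeq A\left[1 - \frac{i\varepsilon}{\hbar}V(x)\right]\left[\psi(x,t)\int_{-\infty}^{\infty} d\eta\,e^{\frac{i m\eta^2}{2\hbar\varepsilon}} + \frac{1}{2}\frac{\partial^2\psi(x,t)}{\partial x^2}\int_{-\infty}^{\infty} d\eta\,\eta^2\,e^{\frac{i m\eta^2}{2\hbar\varepsilon}}\right] \\
&= A\left[1 - \frac{i\varepsilon}{\hbar}V(x)\right]\sqrt{\frac{2\pi i\hbar\varepsilon}{m}}\left[\psi(x,t) + \frac{1}{2}\frac{\partial^2\psi(x,t)}{\partial x^2}\,\frac{i\hbar\varepsilon}{m}\right].
\end{aligned} \tag{8.77} $$
With the particular choice of the constant of proportionality
$$ A = \sqrt{\frac{m}{2\pi i\hbar\varepsilon}}, $$
and keeping the terms in Eq. (8.77) up to first order in ε:
$$ \psi(x,t) + \varepsilon\,\frac{\partial\psi(x,t)}{\partial t} \simeq \psi(x,t) - \frac{i\varepsilon}{\hbar}\left[V(x)\,\psi(x,t) - \frac{\hbar^2}{2m}\frac{\partial^2\psi(x,t)}{\partial x^2}\right], $$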
from which the Schrödinger equation follows immediately. With this nice calcu-
lation, Feynman is able to make quantitative sense of Dirac’s somewhat cryptic
statement connecting the propagator to the exponential of the action.
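The two Gaussian (Fresnel) integrals used in the last step of (8.77) can be checked numerically. The sketch below is only an illustration under stated assumptions: it writes a for m/(2ħε), adds a small damping δ (so the exponent is i(a + iδ)η²) to make the oscillatory integrals converge on a finite grid, and compares a trapezoid sum with the closed forms √(iπ/a) and (i/2a)√(iπ/a). The function names and the values of δ, the grid width and the step count are choices made here, not taken from the text.

```python
import cmath
import math

def damped_moments(a, delta=0.05, half_width=25.0, steps=100_000):
    """Trapezoid estimates of I0 = ∫ exp(i z η²) dη and I2 = ∫ η² exp(i z η²) dη,
    with z = a + i*delta (delta is an assumed regulator, not part of the text)."""
    z = complex(a, delta)
    h = 2.0 * half_width / steps
    i0 = 0j
    i2 = 0j
    for n in range(steps + 1):
        eta = -half_width + n * h
        weight = 0.5 if n in (0, steps) else 1.0  # trapezoid end-point weights
        term = weight * cmath.exp(1j * z * eta * eta)
        i0 += term
        i2 += eta * eta * term
    return i0 * h, i2 * h

def closed_forms(a, delta=0.05):
    """Closed forms used in (8.77), evaluated at the same damped z."""
    z = complex(a, delta)
    i0 = cmath.sqrt(1j * math.pi / z)   # sqrt(i*pi/a); equals sqrt(2*pi*i*hbar*eps/m)
    i2 = (1j / (2.0 * z)) * i0          # (i/(2a)) * I0; equals (i*hbar*eps/m) * I0
    return i0, i2

numeric = damped_moments(1.0)
exact = closed_forms(1.0)
print(abs(numeric[0] - exact[0]), abs(numeric[1] - exact[1]))
```

Letting δ tend to zero recovers the undamped integrals; the ratio I2/I0 = iħε/m is what produces the second-derivative term of the Schrödinger equation.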
After clarifying the operational meaning of the exponential of the action, Feynman proceeds, following Dirac’s logic. Dirac states that the quantity B(t_b, t_a) given by
$$ B(t_b,t_a) = \exp\left\{\frac{i}{\hbar}\int_{t_a}^{t_b} L\,dt\right\} = \exp\left\{i\,S(t_b,t_a)/\hbar\right\} \tag{8.78} $$
“corresponds to” ⟨x_{t_b}|x_{t_a}⟩ in the quantum theory. Feynman shows that this correspondence is strict provided a) a proportionality constant is introduced and b) the time interval t_b − t_a is infinitesimal. For a finite interval, Feynman follows Dirac, and divides the interval t_b − t_a into a large number N + 1 of small time intervals of length ε:
$$ t_1 = t_a + \varepsilon, \quad t_2 = t_a + 2\varepsilon, \quad \cdots \quad t_N = t_b - \varepsilon, $$
and uses, following Dirac, the composition law (or completeness relation)
$$ \int dx_i\,|x_i\rangle\langle x_i| = 1, \tag{8.79} $$
to obtain
$$ \langle x_{t_b}|x_{t_a}\rangle = \int dx_1\,dx_2\cdots dx_N\,\langle x_{t_b}|x_N\rangle\langle x_N|x_{N-1}\rangle\langle x_{N-1}|\cdots|x_1\rangle\langle x_1|x_{t_a}\rangle. \tag{8.80} $$
Since each intermediate propagator ⟨x_{i+1}|x_i⟩ involves times that are infinitesimally close, one can replace them by the expression
$$ \langle x_{i+1}|x_i\rangle = \sqrt{\frac{m}{2\pi i\hbar\varepsilon}}\,\exp\left\{\frac{i\varepsilon}{\hbar}\left[\frac{m}{2}\left(\frac{x_{i+1}-x_i}{\varepsilon}\right)^2 - V(x_i)\right]\right\}. $$
The limit N → ∞ corresponds to ε → 0, and one obtains:
$$ \langle x_{t_b}|x_{t_a}\rangle = \lim_{\varepsilon\to 0} A^N \int dx_1\,dx_2\cdots dx_N\,e^{(i/\hbar)S[b,a]}, \tag{8.81} $$
where
$$ S[b,a] = \int_{t_a}^{t_b} L(\dot x, x)\,dt \tag{8.82} $$
is the line integral of the Lagrangian taken over the trajectory connecting x_a with x_b and passing through the points x_i with straight sections in between. The kernel, or propagator, ⟨x_{t_b}|x_{t_a}⟩ is then a sum over all paths, each of them entering with equal weight and with a phase set by its action.
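To make the time-sliced action in the exponent of (8.81) concrete, here is a minimal sketch, with hypothetical helper names, that evaluates the discretized action of a piecewise-straight path and the corresponding phase factor. It assumes unit mass, ħ = 1 and a free particle (V = 0) in the example; none of these choices come from the text.

```python
import cmath

def discretized_action(xs, eps, m=1.0, potential=lambda x: 0.0):
    """Sum over slices of eps * [ (m/2) ((x_{i+1} - x_i)/eps)^2 - V(x_i) ]."""
    total = 0.0
    for x_now, x_next in zip(xs, xs[1:]):
        velocity = (x_next - x_now) / eps
        total += eps * (0.5 * m * velocity ** 2 - potential(x_now))
    return total

def path_phase(xs, eps, hbar=1.0, m=1.0, potential=lambda x: 0.0):
    """Contribution exp(i S / hbar) of one path to the sum over paths."""
    return cmath.exp(1j * discretized_action(xs, eps, m, potential) / hbar)

eps = 0.1
# Straight path from x = 0 to x = 1 in ten slices.
straight = [0.1 * i for i in range(11)]
# Same end points, but odd interior points are displaced by 0.05.
wiggly = [x + (0.05 if 0 < i < 10 and i % 2 else 0.0) for i, x in enumerate(straight)]

print(discretized_action(straight, eps), discretized_action(wiggly, eps))
print(abs(path_phase(straight, eps)), abs(path_phase(wiggly, eps)))
```

Both paths contribute a factor of unit modulus; they differ only in phase, and for the free particle the straight path has the smaller action.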
8.8 Huygens’ Principle in Optics and Quantum Mechanics*

Huygens’ principle discussed in Section 6.6 is popular in pedagogical expositions,
and many current introductory books use Huygens’ construction to derive Snell’s
law (Serway and Jewett, 2013). This theory, at least as commonly presented, has
some difficulties that Huygens overcomes by making special ad hoc hypothe-
ses. First, he assumes that the secondary wave emitted by the front has no effect
except at the point where it touches the envelope. Second, the envelope of the
secondary waves has two sheets, one on each side of the surface where the disturbance originates (see Figure 8.5). If one follows Huygens’ construction strictly,
one should obtain two disturbances, one propagating forward and one propagat-
ing backwards. Huygens overcomes this difficulty by assuming that only one sheet
has to be considered. Another criticism of Huygens’ construction goes to the very
essence of the principle; for example, Melvin Schwartz (1972) writes that to con-
sider each point on a wave-front as a new source of radiation, and to add the
[Figure 8.5 Huygens’ construction, showing the two wave-fronts generated, only one of which should be kept. Labels in the figure: wave-front at time t; wave-front at time t + Δt, at a distance cΔt; “backwards” wave.]
radiation from all the new sources together, “makes no sense at all. Light does not
emit light; only accelerating charges emit light. Later we will see that it actually
does give the right answer for the wrong reasons.” Finally, the incarnation of Huy-
gens’ principle in quantum mechanics gave rise to some confusing remarks in the
literature. In the classic “Variational Principles in Dynamics and Quantum Theory,”
by W. Yourgrau and S. Mandelstam, we read: “this [quantum mechanics] is the only
discipline of physics which is susceptible to a consistent treatment by Huygens’
concept. The wave equations of classical physics are differential equations of the
second order with respect to time. Huygens’ principle – which determines the wave
function at any time once it is known throughout space at an earlier time – is applicable without modification exclusively to first-order differential equations” (Yourgrau and Mandelstam, 1968, p. 136).
To complete our tour of the principle of least action, here we revisit these
questions with the purpose of clarifying the nature of Huygens’ principle at an
introductory level. We show, through explicit calculations, that: a) Huygens’ principle can in fact be written in terms of a first-order propagator, provided we treat
the two “components” of the wave-front at a given time (the function and its time
derivative) as the source; b) The future wave-front can be written as an envelope
of spherical waves emerging from each point of the (two component) source wave
function; c) The cancellation of the backwards wave results explicitly from the pre-
cise relation that exists for a wave-front between the wave and its time-derivative;
and d) The secondary wave has effect only where it touches the envelope. We
present the calculations for scalar waves of constant velocity and later argue that,
for an inhomogeneous medium, although the mathematics gets more complicated,
the basic principle can still be applied.
8.8.1 First-Order (in Time) Propagator for the Wave Equation

Consider the wave equation for a scalar function u(x, t):
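$$ \frac{1}{c^2}\,\frac{\partial^2 u(\mathbf{x},t)}{\partial t^2} = \nabla^2 u(\mathbf{x},t), \tag{8.83} $$
and call u_t(x, t) = ∂u(x, t)/∂t. The second-order (in time) equation (8.83) can be rewritten as the following two equations:
$$ \frac{\partial}{\partial t}\,u_t(\mathbf{x},t) = c^2\,\nabla^2 u(\mathbf{x},t) \tag{8.84a} $$
$$ \frac{\partial}{\partial t}\,u(\mathbf{x},t) = u_t(\mathbf{x},t). \tag{8.84b} $$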
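We can rewrite the above equations in terms of the two-state function Ψ(x, t)
$$ \Psi(\mathbf{x},t) = \begin{pmatrix} u(\mathbf{x},t) \\ u_t(\mathbf{x},t) \end{pmatrix}, \tag{8.85} $$
as
$$ \frac{\partial}{\partial t}\,\Psi(\mathbf{x},t) = \begin{pmatrix} 0 & 1 \\ c^2\nabla^2 & 0 \end{pmatrix}\Psi(\mathbf{x},t) \equiv \hat U\,\Psi(\mathbf{x},t), \tag{8.86} $$
with the following solution:
$$ \Psi(\mathbf{x},t) = e^{\hat U t}\,\Psi(\mathbf{x},0). \tag{8.87} $$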

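We can obtain an explicit expression of the propagator e^{Ût} in terms of the Fourier components of Ψ(x, 0):
$$ \Psi(\mathbf{x},t) = e^{\hat U t}\,\frac{1}{(\sqrt{2\pi})^3}\int d^3k\,e^{i\mathbf{k}\cdot\mathbf{x}}\,\bar\Psi(\mathbf{k}). \tag{8.88} $$
Since the action of Û on the exponentials e^{ik·x} is:
$$ \hat U\,e^{i\mathbf{k}\cdot\mathbf{x}} = \begin{pmatrix} 0 & 1 \\ -(ck)^2 & 0 \end{pmatrix} e^{i\mathbf{k}\cdot\mathbf{x}}, \tag{8.89} $$
we have
$$ \begin{aligned}
e^{\hat U t}\,e^{i\mathbf{k}\cdot\mathbf{x}} &= \exp\begin{pmatrix} 0 & t \\ -(ck)^2 t & 0 \end{pmatrix} e^{i\mathbf{k}\cdot\mathbf{x}} \\
&= \begin{pmatrix} \cos(ckt) & \frac{1}{ck}\sin(ckt) \\ -ck\,\sin(ckt) & \cos(ckt) \end{pmatrix} e^{i\mathbf{k}\cdot\mathbf{x}} \\
&\equiv \hat G(\mathbf{k},t)\,e^{i\mathbf{k}\cdot\mathbf{x}}.
\end{aligned} \tag{8.90} $$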
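The exponentiation calculated above follows immediately using the series expansion of the exponential, and noticing that:
$$ \begin{pmatrix} 0 & t \\ -(ck)^2 t & 0 \end{pmatrix}^{2n} = (-1)^n (ckt)^{2n} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \tag{8.91} $$
$$ \begin{pmatrix} 0 & t \\ -(ck)^2 t & 0 \end{pmatrix}^{2n+1} = (-1)^n (ckt)^{2n+1} \begin{pmatrix} 0 & 1/ck \\ -ck & 0 \end{pmatrix}. \tag{8.92} $$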
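The series argument behind (8.90) to (8.92) is easy to confirm numerically. The following sketch, written with plain Python lists and hypothetical function names, sums the power series of the 2×2 generator and compares it with the closed form Ĝ(k, t); the values of c, k and t are arbitrary choices.

```python
import math

def mat_mul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def exp_series(mat, terms=60):
    """Truncated power series sum_{n < terms} mat^n / n!."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = mat_mul(term, mat)
        term = [[entry / n for entry in row] for row in term]  # now mat^n / n!
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

def generator_times_t(c, k, t):
    """The matrix [[0, t], [-(ck)^2 t, 0]] that is exponentiated in (8.90)."""
    return [[0.0, t], [-(c * k) ** 2 * t, 0.0]]

def propagator_closed_form(c, k, t):
    """G(k, t) as given in (8.90)."""
    ck = c * k
    return [[math.cos(ck * t), math.sin(ck * t) / ck],
            [-ck * math.sin(ck * t), math.cos(ck * t)]]

series = exp_series(generator_times_t(1.0, 2.0, 0.7))
closed = propagator_closed_form(1.0, 2.0, 0.7)
print(max(abs(series[i][j] - closed[i][j]) for i in range(2) for j in range(2)))
```

The even powers rebuild the cosine series on the diagonal and the odd powers the sine series off the diagonal, exactly as (8.91) and (8.92) state.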
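The function at time t is therefore given by:
$$ \Psi(\mathbf{x},t) = \frac{1}{(\sqrt{2\pi})^3}\int d^3k\,e^{i\mathbf{k}\cdot\mathbf{x}}\,\hat G(\mathbf{k},t)\,\bar\Psi(\mathbf{k}). \tag{8.93} $$
More explicitly, given the (two) general initial conditions:
$$ u(\mathbf{x},0) = f(\mathbf{x}) = \frac{1}{(\sqrt{2\pi})^3}\int d^3k\,\bar f(\mathbf{k})\,e^{i\mathbf{k}\cdot\mathbf{x}} \tag{8.94} $$
$$ u_t(\mathbf{x},0) = g(\mathbf{x}) = \frac{1}{(\sqrt{2\pi})^3}\int d^3k\,\bar g(\mathbf{k})\,e^{i\mathbf{k}\cdot\mathbf{x}}, \tag{8.95} $$
the functions at time t are given by
$$ u(\mathbf{x},t) = \frac{1}{(\sqrt{2\pi})^3}\int d^3k\,e^{i\mathbf{k}\cdot\mathbf{x}}\left[\bar f(\mathbf{k})\cos ckt + \frac{\bar g(\mathbf{k})}{ck}\sin ckt\right] \tag{8.96} $$
$$ u_t(\mathbf{x},t) = \frac{1}{(\sqrt{2\pi})^3}\int d^3k\,e^{i\mathbf{k}\cdot\mathbf{x}}\left[-ck\,\bar f(\mathbf{k})\sin ckt + \bar g(\mathbf{k})\cos ckt\right]. \tag{8.97} $$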
In other words, the propagation of a second-order equation can be written in
terms of a first-order propagator of a two-component wave function, with two ini-
tial conditions u and u t . Having established the first-order nature of the evolution
operator for the two-state wave-function, in the next subsection we compute the
spatial representation of the evolution operator, and write it in the form:
$$ \Psi(\mathbf{x},t) = \int d^3x'\,\hat G(\mathbf{x}-\mathbf{x}',t)\,\Psi(\mathbf{x}',0), \tag{8.98} $$
where Ĝ(x − x′, t) corresponds to spherical waves emanating from point x′.
8.8.2 Huygens’ Principle and Spherical Wavelets

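In this subsection we compute explicitly the form of Ĝ(x − x′, t) that appears in equation (8.98). We first note that
$$ \hat G(\mathbf{k},t) = \begin{pmatrix} \cos ckt & \frac{1}{ck}\sin ckt \\ -ck\,\sin ckt & \cos ckt \end{pmatrix} \equiv \begin{pmatrix} \frac{\partial}{\partial t} & 1 \\ \frac{\partial^2}{\partial t^2} & \frac{\partial}{\partial t} \end{pmatrix} \frac{\sin ckt}{ck}, \tag{8.99} $$
which enables us to write
$$ \hat G(\mathbf{x}-\mathbf{x}',t) = \begin{pmatrix} \frac{\partial}{\partial t} & 1 \\ \frac{\partial^2}{\partial t^2} & \frac{\partial}{\partial t} \end{pmatrix} G_0(\mathbf{x}-\mathbf{x}',t), \tag{8.100} $$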
with G_0 given by the following explicit calculation:
$$ \begin{aligned}
G_0(\mathbf{x}-\mathbf{x}',t) &= \frac{1}{(2\pi)^3}\int d^3k\,\frac{\sin ckt}{ck}\,e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{x}')} \\
&= \frac{2\pi}{(2\pi)^3 c}\int_0^\infty k\,dk\,\sin ckt\int_0^\pi d\theta\,\sin\theta\,e^{ik|\mathbf{x}-\mathbf{x}'|\cos\theta} \\
&= \frac{1}{(2\pi)^2 c\,|\mathbf{x}-\mathbf{x}'|}\int_0^\infty dk\,2\sin ckt\,\sin k|\mathbf{x}-\mathbf{x}'| \\
&= \frac{1}{(2\pi)^2 c\,|\mathbf{x}-\mathbf{x}'|}\int_0^\infty dk\,\left[\cos k(|\mathbf{x}-\mathbf{x}'| - ct) - \cos k(|\mathbf{x}-\mathbf{x}'| + ct)\right] \\
&= \frac{1}{4\pi c}\,\frac{\delta(|\mathbf{x}-\mathbf{x}'| - ct)}{|\mathbf{x}-\mathbf{x}'|}.
\end{aligned} \tag{8.101} $$
[Figure 8.6 Huygens’ construction: at each point x′, the wave-front u(x′, 0) and its time derivative u_t(x′, 0) emit spherical wavelets. As a result, the wave-front u(x, t) is the average of u(x′, 0) and u_t(x′, 0) on a spherical surface of radius ct centered at x. The small arrows are meant to represent the time derivative of the wave-front.]
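Notice that the second term, given by δ(|x − x′| + ct), vanishes for positive t (we are propagating in one direction of time only). The propagator (for t > 0) is therefore given by
$$ \hat G(\mathbf{x}-\mathbf{x}',t) = \frac{1}{4\pi c\,|\mathbf{x}-\mathbf{x}'|}\begin{pmatrix} \delta_t(|\mathbf{x}-\mathbf{x}'| - ct) & \delta(|\mathbf{x}-\mathbf{x}'| - ct) \\ \delta_{tt}(|\mathbf{x}-\mathbf{x}'| - ct) & \delta_t(|\mathbf{x}-\mathbf{x}'| - ct) \end{pmatrix}, \tag{8.102} $$
where the subscripts t denote derivatives with respect to time.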
So, in a generalized way, Huygens’ principle works, and each point gener-
ates “two-component spherical pulses,” one from the function and one from the
time-derivative. Substituting (8.102) in (8.98), we obtain
$$ u(\mathbf{x},t) = \frac{\partial}{\partial t}\left[t\int_S \frac{dS'}{4\pi}\,u(\mathbf{x}',0)\right] + t\int_S \frac{dS'}{4\pi}\,u_t(\mathbf{x}',0), \tag{8.103} $$
where the surface integrals above correspond to the integrals on a spherical surface
of radius ct centered on x. The wave function at (x, t) is therefore a superposition
of spherical wavelets that originate at points from earlier times. We represent this
average pictorially in Figure 8.6. Equation (8.103) appears in specialized books
(Baker and Copson, 1950) and, interestingly, goes back to Poisson (1818) (who
does not cite Huygens).
8.8.3 Cancellation of the Backwards Wave

Consider a planar wave-front that, at t = 0, is nonzero only on the plane z = 0:
$$ u(\mathbf{x},0) = A\,\delta(z) \tag{8.104a} $$
$$ u_t(\mathbf{x},0) = -cA\,\delta'(z). \tag{8.104b} $$
The choice u_t = −cAδ′(z) originates in the fact that we expect this condition to be that of a propagating front of the form u(x, t) = Aδ(z − ct). Our purpose in this section is to identify the cancellation of the backwards wave from the contribution of the spherical wavelets.
Let’s evaluate the averages that appear in Eq. (8.103). Since the problem has
translational invariance in the x y plane, let us compute, without loss of generality,
the wave function at a point (0, 0, z), along the z axis. For the integral in the first
term in (8.103) we have
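$$ t\int_S \frac{dS'}{4\pi}\,u(\mathbf{x}',0) = t\,\frac{A}{4\pi}\,2\pi\int_0^\pi d\theta\,\sin\theta\,\delta(z - ct\cos\theta) = \frac{A}{2c}\left[1 - \Theta(|z| - ct)\right], \tag{8.105} $$
with Θ(x) the Heaviside function. From equation (8.105) we see that the contribution to the wave-front u at time t originating from the wave-front at t = 0 (the first term in Eq. (8.103)) is given by:
$$ \frac{\partial}{\partial t}\left[t\int_S \frac{dS'}{4\pi}\,u(\mathbf{x}',0)\right] = \frac{A}{2}\,\delta(|z| - ct) \equiv \frac{A}{2}\left[\delta(z - ct) + \delta(z + ct)\right]. \tag{8.106} $$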
2
In other words, the wavelets from the front itself give rise to two sheets, one above
and one below the front, each of them with half the amplitude.
Now consider the contribution from the second term in equation (8.103):
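$$ \begin{aligned}
t\int_S \frac{dS'}{4\pi}\,u_t(\mathbf{x}',0) &= -ct\,\frac{A}{2}\int_0^\pi d\theta\,\sin\theta\,\delta'(z - ct\cos\theta) \\
&= -\frac{A}{2}\,\frac{\partial}{\partial z}\left[1 - \Theta(|z| - ct)\right] \\
&\equiv \frac{A}{2}\left[\delta(z - ct) - \delta(z + ct)\right].
\end{aligned} \tag{8.107} $$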
The second set of wavelets gives rise to two sheets as well, but of opposite sign.
As a result, adding the two contributions, the backwards wave cancels and the
wave-front propagates in the “forward” direction.
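The cancellation can also be seen numerically if the delta-function front is replaced by a narrow smooth profile. The sketch below is an illustration under assumptions made here: a Gaussian f(z) of width 0.2 stands in for Aδ(z) with A = 1 at the peak, g = −c f′ plays the role of (8.104b), c = 1, and the spherical means of (8.103) are evaluated with a midpoint rule in cos θ and a central difference in t. All names and parameter values are hypothetical.

```python
import math

SIGMA = 0.2  # width of the smooth stand-in for the delta-function front (assumption)

def f(z):
    """Initial profile u(x', 0), depending on z' only."""
    return math.exp(-z * z / (2.0 * SIGMA ** 2))

def g(z, c=1.0):
    """Initial time derivative u_t(x', 0) = -c f'(z'), the analogue of (8.104b)."""
    return c * z / SIGMA ** 2 * f(z)

def sphere_mean(func, z, radius, n=4000):
    """Average of func(z') over a sphere of the given radius centred at (0, 0, z).
    Only s = cos(theta) matters, so the mean is (1/2) * integral of func(z + radius*s) ds."""
    h = 2.0 / n
    return 0.5 * h * sum(func(z + radius * (-1.0 + (i + 0.5) * h)) for i in range(n))

def poisson_terms(z, t, c=1.0, dt=1e-4):
    """The two terms of (8.103) at the point (0, 0, z) and time t."""
    def t_mean_f(time):
        return time * sphere_mean(f, z, c * time)
    first = (t_mean_f(t + dt) - t_mean_f(t - dt)) / (2.0 * dt)
    second = t * sphere_mean(g, z, c * t)
    return first, second

forward = poisson_terms(z=2.0, t=2.0)     # at z = +ct
backward = poisson_terms(z=-2.0, t=2.0)   # at z = -ct
print(forward, sum(forward))
print(backward, sum(backward))
```

At z = +ct the two terms are each close to half the peak amplitude and add up to the full front; at z = −ct they have opposite signs and cancel, mirroring (8.106) and (8.107).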
The calculation of this section shows as well that, even though Huygens’ con-
struction calls for a superposition of wavelets, the secondary wave has effect only at
the point (0, 0, z), where the source from (0, 0, 0) touches the envelope. So, due to
the structure of the wave equation, the evolution of the wave-front in fact proceeds
as though light “were emitting light.”
Having made this comment on Huygens’ principle, it is worth mentioning that what Huygens was indeed referring to as a “wave-front” is a well defined progressing boundary in front of which there is no wave. Such a wave-front is realized in the Hamilton-Jacobi theory, since it is composed of the tips of the particle paths that start at a given origin. On the other hand, the solutions of Schrödinger’s equation do not consist of a wave-front in this sense. One can construct a front at t = 0, but it disperses immediately into a diffuse wave packet. So, to be rigorous, the path integral described by Feynman is not the true incarnation of Huygens’ idea.
Appendix A
Newton’s Solid of Least Resistance, Using Calculus
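The resistance of a solid of revolution is
$$ F = C\,2\pi\int_0^{y_M} dy\,y\,\sin^2\theta \tag{A.1} $$
$$ \phantom{F} = C\,2\pi\int_0^{y_M} dy\,y\,\frac{dy^2}{dx^2 + dy^2} \tag{A.2} $$
$$ \phantom{F} = C\,2\pi\int_{t_a}^{t_b} \frac{y\,\dot y^3}{\dot x^2 + \dot y^2}\,dt, \tag{A.3} $$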
where, in going from equation (A.2) to equation (A.3) we introduced a parameter t, and wrote ẋ = dx/dt and ẏ = dy/dt. The resistance of the solid is written now as an “action,” with a Lagrangian given by
$$ L(\dot x, \dot y, y) = \frac{y\,\dot y^3}{\dot x^2 + \dot y^2}. \tag{A.4} $$
Since the above L is independent of x, we have
$$ \frac{\partial L}{\partial \dot x} = -\frac{2y\,\dot y^3\,\dot x}{\left(\dot x^2 + \dot y^2\right)^2} = -2K, \tag{A.5} $$
with K a constant. It is interesting that from equation (A.5) we can extract a parametric expression for the curve of the solid of least resistance. Notice that equation (A.5) can be written as
$$ y(q) = K\,\frac{\left(dx^2 + dy^2\right)^2}{dy^3\,dx} = K\,\frac{\left(1 + q^2\right)^2}{q} \tag{A.6} $$
$$ \phantom{y(q)} \equiv K\left(q^3 + 2q + \frac{1}{q}\right), \tag{A.7} $$
with q = dx/dy. Given this definition of q we have
$$ dx = q\,\frac{dy}{dq}\,dq = qK\left(3q^2 + 2 - \frac{1}{q^2}\right)dq \tag{A.8} $$
$$ \phantom{dx} = K\left(3q^3 + 2q - \frac{1}{q}\right)dq. \tag{A.9} $$
q
[Figure A.1 Meridian of the solid of least resistance, from equations (A.12). Labels in the figure: axes x and y; points O, D, C1, C2 and C3.]
Integrating equation (A.9) we obtain
$$ x(q) = K\left(\frac{3}{4}q^4 + q^2 - \ln|q|\right) + x_0. \tag{A.10} $$
The constant of integration x_0 is fixed so that the slope q = −1 corresponds to the height OD of the solid:
$$ OD = \frac{7}{4}K + x_0. \tag{A.11} $$
4
Finally, the parametric equations of the meridian of the solid of least resistance are
$$ x(q) = K\left(\frac{7}{4} - \frac{3}{4}q^4 - q^2 + \ln q\right) + OD, \tag{A.12a} $$
$$ y(q) = K\left(q^3 + 2q + \frac{1}{q}\right), \tag{A.12b} $$
with OD the height of the solid, and q varying in the interval¹ (1, q_M) so that x(q_M) = 0. The constant K is related to the base of the solid, since OC = y(q_M). In Figure A.1, we show three curves, corresponding to the ratios OC1/OD = 2.39, OC2/OD = 1.68 and OC3/OD = 0.666, and values of K = 0.25, 0.12 and 0.02 respectively.
¹ We changed the sign of the range of q simply in order to plot the curve with positive values of x.
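As a quick consistency check on the parametric form, one can verify numerically that the curve defined by (A.7) and (A.10) really has slope dx/dy = q. The sketch below uses central differences in the parameter q; the function names, the value of K and the sample values of q are arbitrary choices made for this illustration.

```python
import math

def y_of_q(q, K):
    """Equation (A.7): y(q) = K (q^3 + 2q + 1/q)."""
    return K * (q ** 3 + 2.0 * q + 1.0 / q)

def x_of_q(q, K, x0=0.0):
    """Equation (A.10): x(q) = K (3/4 q^4 + q^2 - ln|q|) + x0."""
    return K * (0.75 * q ** 4 + q ** 2 - math.log(abs(q))) + x0

def slope_dx_dy(q, K, h=1e-6):
    """Ratio of central differences in q, which approximates dx/dy along the curve."""
    dx = x_of_q(q + h, K) - x_of_q(q - h, K)
    dy = y_of_q(q + h, K) - y_of_q(q - h, K)
    return dx / dy

for q in (1.2, 2.0, 3.5):
    print(q, slope_dx_dy(q, K=0.25))
```

The check avoids q = 1/√3, where dy/dq in (A.8) vanishes and the ratio of differences is ill defined.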
Appendix B
Original Statement of d’Alembert’s Principle
The original statement, as translated from the original French, and given by Fraser
(1985) is:
General Principle:
Given a system of bodies arranged mutually in any manner whatever; let us suppose that a
particular motion is impressed on each of the bodies, that it cannot follow because of the
action of the others, to find that motion that each body should take.
Solution.
Let A, B, C, etc. be the bodies composing the system, and let us suppose that the motions a, b, c, etc. be impressed on them, and which be forced because of the mutual action of the bodies to be changed into the motions ā, b̄, c̄ etc. It is clear that the motion a impressed on the body A can be regarded as composed of the motion ā that it takes, and of another motion α; similarly, the motions b, c, etc. can be regarded as composed of the motions b̄, β; c̄, κ; etc.; from which it follows that the motions of the bodies A, B, C, etc. would have been the same, if instead of giving the impulses a, b, c, one had given simultaneously the double impulses ā, α; b̄, β; c̄, κ etc. Now by supposition the bodies A, B, C, etc. took among themselves the motions ā, b̄, c̄, etc. Therefore the motions α, β, κ, etc. must be such that they do not disturb the motions ā, b̄, c̄, etc., that is, that if the bodies had received only the motions α, β, κ etc. these motions would have destroyed each other and the system would remain at rest.
From this results the following principle for finding the motion of several bodies which act on one another. Decompose the motions a, b, c, etc. impressed on each body into two others ā, α; b̄, β; c̄, κ etc. which are such that if the motions ā, b̄, c̄, etc. were impressed alone on the bodies they would retain these motions without interfering with each other; and that if the motions α, β, κ were impressed alone, the system would remain at rest; it is clear that ā, b̄, c̄ will be the motions that the bodies will take by virtue of their action.
Appendix C
Equations of Motion of McCullagh’s Ether
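Consider the variation of the potential energy V originated by a variation δu at point x.
$$ \delta V = \frac{1}{2}\,h\int dv\,\left[(\nabla\times(\mathbf{u}+\delta\mathbf{u}))^2 - (\nabla\times\mathbf{u})^2\right] \tag{C.1} $$
$$ \phantom{\delta V} \simeq h\int dv\,\left\{(\nabla\times\mathbf{u})\cdot(\nabla\times\delta\mathbf{u})\right\} \tag{C.2} $$
$$ \phantom{\delta V} = h\int dv\,\left\{-\nabla\cdot\left[(\nabla\times\mathbf{u})\times\delta\mathbf{u}\right] + \left[\nabla\times(\nabla\times\mathbf{u})\right]\cdot\delta\mathbf{u}\right\} \tag{C.3} $$
If u is zero at infinity, a valid assumption, the first integral in equation (C.3), being that of a total divergence, vanishes, and we obtain:
$$ \delta V = h\int dv\,\left[\nabla\times(\nabla\times\mathbf{u})\right]\cdot\delta\mathbf{u}. \tag{C.5} $$
Substituting in the variation of ∫dt (T − V) we obtain:
$$ \delta S = -\int dt\int dv\,\left\{\rho\,\ddot{\mathbf{u}} + h\,\nabla\times(\nabla\times\mathbf{u})\right\}\cdot\delta\mathbf{u} \tag{C.6} $$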
Since the variations δu are arbitrary, equation (5.97) follows.
Appendix D
Characteristic Function for a Parabolic Keplerian Orbit
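In this appendix, we prove equations (6.92). Call (x, y) and (x′, y′) the coordinates of r and r0 with respect to the focus.
$$ r^2 = x^2 + y^2 \tag{D.1} $$
$$ r_0^2 = x'^2 + y'^2 \tag{D.2} $$
$$ \tau^2 = (x - x')^2 + (y - y')^2 \tag{D.3} $$
$$ \phantom{\tau^2} \equiv r^2 + r_0^2 - 2rr_0\cos\theta \tag{D.4} $$
Call
$$ w = \frac{V}{2\sqrt{\mu m}} = \sqrt{r + r_0 + \tau} - \sqrt{r + r_0 - \tau} \equiv \sqrt{+} - \sqrt{-} $$
and notice that
$$ (+)(-) = (r + r_0)^2 - \tau^2 = 2rr_0(1 + \cos\theta) $$
$$ \frac{\partial w}{\partial x} = \frac{1}{2}\left[\left(\frac{1}{\sqrt{+}} - \frac{1}{\sqrt{-}}\right)\frac{\partial r}{\partial x} + \left(\frac{1}{\sqrt{+}} + \frac{1}{\sqrt{-}}\right)\frac{\partial \tau}{\partial x}\right] \tag{D.5} $$
$$ \phantom{\frac{\partial w}{\partial x}} = \frac{1}{2}\left[\left(\frac{1}{\sqrt{+}} - \frac{1}{\sqrt{-}}\right)\frac{x}{r} + \left(\frac{1}{\sqrt{+}} + \frac{1}{\sqrt{-}}\right)\frac{(x - x')}{\tau}\right] \tag{D.6} $$
$$ \begin{aligned}
\left(\frac{\partial w}{\partial x}\right)^2 = \frac{1}{8rr_0(1 + \cos\theta)}\Bigg\{ & 2\left[(r + r_0) - \sqrt{2rr_0(1 + \cos\theta)}\right]\frac{x^2}{r^2} \\
& + 2\left[(r + r_0) + \sqrt{2rr_0(1 + \cos\theta)}\right]\frac{(x - x')^2}{\tau^2} \\
& - 2\times 2\times\tau\,\frac{x(x - x')}{r\tau}\Bigg\}
\end{aligned} \tag{D.7} $$
$$ \begin{aligned}
\left(\frac{\partial w}{\partial x}\right)^2 + \left(\frac{\partial w}{\partial y}\right)^2 = \frac{1}{4rr_0(1 + \cos\theta)}\Bigg\{ & (r + r_0) - \sqrt{2rr_0(1 + \cos\theta)} \\
& + (r + r_0) + \sqrt{2rr_0(1 + \cos\theta)} \\
& - 2\,\frac{x(x - x') + y(y - y')}{r}\Bigg\}
\end{aligned} \tag{D.8} $$
$$ \phantom{\left(\frac{\partial w}{\partial x}\right)^2} = \frac{1}{2rr_0(1 + \cos\theta)}\left\{r + r_0 - r + r_0\cos\theta\right\} \tag{D.9} $$
$$ \phantom{\left(\frac{\partial w}{\partial x}\right)^2} = \frac{1}{2r} \tag{D.10} $$
from which equations (6.92) follow immediately (note the symmetry between r and r0).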
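The result (D.10) can be spot-checked numerically. The following sketch evaluates w = √(r + r0 + τ) − √(r + r0 − τ) and its gradient with respect to (x, y) by central differences, and compares |∇w|² with 1/(2r). The function names and the sample points are arbitrary choices made for the illustration.

```python
import math

def w(x, y, xp, yp):
    """w = sqrt(r + r0 + tau) - sqrt(r + r0 - tau), with the focus at the origin."""
    r = math.hypot(x, y)
    r0 = math.hypot(xp, yp)
    tau = math.hypot(x - xp, y - yp)
    return math.sqrt(r + r0 + tau) - math.sqrt(r + r0 - tau)

def grad_squared(x, y, xp, yp, h=1e-6):
    """(dw/dx)^2 + (dw/dy)^2 from central differences in the unprimed coordinates."""
    dw_dx = (w(x + h, y, xp, yp) - w(x - h, y, xp, yp)) / (2.0 * h)
    dw_dy = (w(x, y + h, xp, yp) - w(x, y - h, xp, yp)) / (2.0 * h)
    return dw_dx ** 2 + dw_dy ** 2

for point in ((1.3, 0.7, -0.4, 2.0), (1.0, 0.0, 0.0, 1.0)):
    x, y, xp, yp = point
    print(grad_squared(x, y, xp, yp), 1.0 / (2.0 * math.hypot(x, y)))
```

The two printed numbers agree for each sample point, independently of the position of the second point (x′, y′), as (D.10) requires.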
Appendix E
Saddle Paths for Reflections on a Mirror
The following exercise illustrates a case where the action is not a minimum. Con-
sider (see Figure E.1) a light ray that travels in the x y plane from P to Q reflecting
on a mirrored surface whose equation is
$$ y = \frac{1}{2}\,\alpha x^2. \tag{E.1} $$
The optimal path will consist of two straight segments PR and QR. We choose points P and Q symmetrical with respect to the y axis. We want to find the trajectory PRQ that minimizes the time T(x) given by
$$ cT(x) = \sqrt{(x + a)^2 + \left(b - \frac{1}{2}\alpha x^2\right)^2} + \sqrt{(x - a)^2 + \left(b - \frac{1}{2}\alpha x^2\right)^2}. \tag{E.2} $$
We are interested, for concreteness, in the reflection at x = 0. Consider x close to zero and expand T(x) to obtain:
$$ \begin{aligned}
cT(x) &\simeq \sqrt{(x + a)^2 + b^2 - b\alpha x^2} + \sqrt{(x - a)^2 + b^2 - b\alpha x^2} \\
&= \sqrt{a^2 + b^2 + 2ax + (1 - b\alpha)x^2} + \sqrt{a^2 + b^2 - 2ax + (1 - b\alpha)x^2} \\
&\simeq 2\sqrt{a^2 + b^2} - \frac{b}{\sqrt{a^2 + b^2}}\left(\alpha - \frac{b}{a^2 + b^2}\right)x^2 \\
&= 2\sqrt{a^2 + b^2} - \frac{b}{\sqrt{a^2 + b^2}}\,(\alpha - \alpha_e)\,x^2,
\end{aligned} \tag{E.3} $$
where α_e = b/(a² + b²) corresponds to the curvature of an ellipse E whose foci are at P and Q and of major and minor semi-axes given by √(a² + b²) and b respectively. If the curvature α of the mirror is larger than that of the ellipse E (α_e), the law of reflection is followed at x = 0, but the action is a maximum with respect to small variations of the point R around x = 0. If α = α_e, the action is critical, as expected if P and Q are focal points: the law of reflection is followed for all points in the mirror. We stress that the action, although being a maximum with respect to variations of the “reflection” point, cannot be an overall maximum, since the path PR or QR can always be made longer by adding wiggles to the straight line. The path length is therefore a minimum or a saddle.
[Figure E.1 The action (length) for a light ray reflecting from a curved surface could be a saddle. Labels in the figure: points P, C and Q, horizontal distances a and a, and the x axis with origin 0.]
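The sign change in (E.3) can be checked directly from the exact path length (E.2). The sketch below, with hypothetical function names, estimates the coefficient of x² in cT(x) by a finite difference and compares it with −(b/√(a² + b²))(α − α_e), taking α_e = b/(a² + b²), the value implied by the expansion. The values of a, b and α are arbitrary.

```python
import math

def path_length(x, a, b, alpha):
    """Exact cT(x) of (E.2): P = (-a, b), Q = (a, b), R = (x, alpha x^2 / 2)."""
    y = 0.5 * alpha * x * x
    return math.hypot(x + a, b - y) + math.hypot(x - a, b - y)

def quadratic_coefficient(a, b, alpha, h=1e-3):
    """Coefficient of x^2 near x = 0; cT is even in x, so one difference suffices."""
    return (path_length(h, a, b, alpha) - path_length(0.0, a, b, alpha)) / h ** 2

def predicted_coefficient(a, b, alpha):
    """The x^2 coefficient read off from (E.3)."""
    alpha_e = b / (a * a + b * b)
    return -(b / math.hypot(a, b)) * (alpha - alpha_e)

a, b = 1.0, 2.0   # so alpha_e = 0.4
for alpha in (0.2, 0.4, 0.6):
    print(alpha, quadratic_coefficient(a, b, alpha), predicted_coefficient(a, b, alpha))
```

For α below α_e the coefficient is positive (a minimum with respect to the reflection point), for α above α_e it is negative (a maximum), and at α = α_e it vanishes.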
Appendix F
Kinetic Caustics from Quantum Motion
in One Dimension
In this Appendix we analyze the interesting mapping between the second variation of the action of a classical path in a one-dimensional potential V(x), and the quantum mechanical problem of a particle moving in a potential −V″(x)/2 = −(1/2) d²V(x)/dx² (Hussein, Pereira, Stojanoff, and Takai, 1980). Let us take the mass of the particle as unity. The action S, given by
$$ S = \int_0^t dt\,\left[\frac{1}{2}\dot x^2 - V(x)\right], \tag{F.1} $$
has a vanishing variation, δS = 0, for a path x0(t). If we consider a small variation around the path x0:
$$ x(t) = x_0(t) + \epsilon\,\phi(t), \tag{F.2} $$
with ε ≪ 1, the action is:
$$ S \approx S_0 + \epsilon^2 S_2, \tag{F.3} $$
with
$$ S_2 = \int_0^t dt\,\frac{1}{2}\left[\dot\phi^2 - V''(x_0(t))\,\phi^2(t)\right]. \tag{F.4} $$
Integrating by parts the first term, and noting that φ(0) = φ(t) = 0, we obtain:
$$ S_2 = \int_0^T dt\,\phi(t)\left[-\frac{1}{2}\frac{d^2}{dt^2} + \tilde V(t)\right]\phi(t), \tag{F.5} $$
where, for simplicity of notation, we have called T the final time in order to distinguish it from the intermediate, integrated time t, and Ṽ(t) = −V″(x0(t))/2. Equation (F.5) is precisely the expectation value of the energy of a “quantum” particle, if we think of the variable t as x and take ħ²/m = 1. The potential in the x (or t) direction depends on the classical path x0(t) minimizing the action.
The analysis of the possible signs of S2 is reduced now to studying the spectrum
of eigenvalues (the “energies” E n ) of the problem whose stationary Schrödinger
equation is given by:
$$ \left[-\frac{1}{2}\frac{d^2}{dt^2} + \tilde V(t)\right]\phi_n(t) = E_n\,\phi_n(t). \tag{F.6} $$
Since the set of eigenfunctions φ_n(t) is orthonormal, we can write any arbitrary variation φ(t) as a linear combination of the form
$$ \phi(t) = \sum_{n=0}^{\infty} a_n\,\phi_n(t), \tag{F.7} $$
which, substituted in equation (F.5) gives:
$$ S_2 = \sum_{n=0}^{\infty} a_n^2\,E_n. \tag{F.8} $$
The condition for a caustic or for a kinetic focus (S2 = 0) is then that the corresponding quantum problem has an eigenvalue E_n = 0. Let us apply this treatment to the problem considered in section 6.8.6, where V(x) = λ|x|, giving
$$ \tilde V(t) = -\frac{1}{2}V''(x_0(t)) = -\lambda\,\delta(x_0(t)), \tag{F.9} $$
where x0(t) is the classical orbit given by equations (6.219) and (6.220). Since in the interval 0 < t < 2t0 the particle has a value x0(t) = 0 for t = t0, we write
$$ \delta(x_0(t)) = \frac{\delta(t - t_0)}{|v_0|}. \tag{F.10} $$
Using v0 = λt0/2 the equivalent Schrödinger equation becomes
$$ \left[-\frac{1}{2}\frac{d^2}{dt^2} - \frac{2}{t_0}\,\delta(t - t_0)\right]\phi_n(t) = E_n\,\phi_n(t), \tag{F.11} $$
which corresponds to an otherwise free particle with an attractive delta function
potential of intensity 2/t0 located at t = t0 . (In addition, in order to impose the
boundary conditions φn (0) = φn (T ) = 0, we add infinite potential walls at t = 0
and at t = T .)
First note that if t0 > T the delta function is outside the well, and the quantum
particle is free inside the well. The eigenfunctions are those of the potential well, a
classical textbook quantum problem:
$$ \phi_n(t) \propto \sin\left(\frac{\pi n t}{T}\right), \tag{F.12} $$
with positive eigenvalues given by:
$$ E_n = \frac{1}{2}\,\frac{\pi^2 n^2}{T^2}, \qquad (n = 1, 2, \cdots). \tag{F.13} $$
2 T2
In other words, for times $T$ shorter than $t_0$ we have $S_2 > 0$, and the action is a
minimum. For $T > t_0$, the attractive delta function falls inside the potential well,
and there is the possibility of a zero (or negative) energy. If we integrate equation
(F.11) between $t_0^-$ and $t_0^+$ we obtain:
$$
\phi_n'(t_0^-) - \phi_n'(t_0^+) = \frac{4}{t_0}\,\phi_n(t_0), \tag{F.14}
$$
so the delta function imposes a discontinuity in the derivative. We concentrate now
on the possibility of having a state function $\phi_0(t)$ with zero energy. If we look at
the Schrödinger equation for times $t \neq t_0$, for $E_0 = 0$ we have simply:
$$
\phi_0''(t) = 0, \tag{F.15}
$$
whose nontrivial solution is
$$
\phi_0(t) = a + bt. \tag{F.16}
$$
The function $\phi_0$ will be of the form (see Figure F.1):
$$
\phi_0(t) =
\begin{cases}
a\,t & \text{if } 0 < t < t_0;\\[4pt]
a\,t_0\,\dfrac{T - t}{T - t_0} & \text{if } t_0 < t < T.
\end{cases} \tag{F.17}
$$
[Figure F.1 plots $\tilde{V}(t) = -\frac{2}{t_0}\,\delta(t - t_0)$ and $\phi_0(t)$ against $t$ for $0 \le t \le T$, with the delta function located at $t_0$.]
Figure F.1 Quantum analogue for the second variation of the action for a potential
$V(x) = \lambda|x|$.

In order for $\phi_0$ of equation (F.17) to be an allowed solution, it has to satisfy the
discontinuity of the derivative of equation (F.14):
$$
a + a\,\frac{t_0}{T - t_0} = 4a, \tag{F.18}
$$
giving
$$
T = \frac{4}{3}\,t_0. \tag{F.19}
$$
In other words, for times shorter than $4t_0/3$ the lowest eigenvalue is still positive
and the action is a minimum. For $T = 4t_0/3$, the second variation is zero, and that
point corresponds to the caustic of this problem (see Figure 6.20).
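The value $T = 4t_0/3$ can be checked numerically. The following is a minimal sketch, not part of the original treatment: it discretizes the operator of equation (F.11) on a uniform grid with $\phi(0) = \phi(T) = 0$, represents the delta function by a single grid node of weight $2/t_0$, and finds the lowest eigenvalue by bisection on a Sturm count. The function names `count_below` and `lowest_eigenvalue`, the grid size `n0` and the choice $t_0 = 1$ are assumptions made for illustration.

```python
import math


def count_below(diag, off, x):
    """Number of eigenvalues below x for a symmetric tridiagonal matrix
    with diagonal `diag` and constant off-diagonal `off` (Sturm count)."""
    count = 0
    q = 1.0
    for i, d in enumerate(diag):
        # Pivots of the LDL^T factorization of (A - x I).
        q = d - x - (off * off / q if i > 0 else 0.0)
        if q == 0.0:
            q = 1e-30  # avoid a division by zero on the next step
        if q < 0.0:
            count += 1
    return count


def lowest_eigenvalue(n_total, n0=300, t0=1.0):
    """Lowest eigenvalue of -(1/2) d^2/dt^2 - (2/t0) delta(t - t0) on [0, T],
    with phi(0) = phi(T) = 0 and T = n_total * t0 / n0 (grid step h = t0 / n0)."""
    h = t0 / n0
    diag = [1.0 / h**2] * (n_total - 1)   # interior nodes t_i = i*h
    if n0 < n_total:                       # delta inside the well only if t0 < T
        diag[n0 - 1] -= (2.0 / t0) / h     # grid delta of weight 2/t0 at t = t0
    off = -0.5 / h**2
    lo = min(diag) - 2.0 * abs(off)        # Gershgorin bounds on the spectrum
    hi = max(diag) + 2.0 * abs(off)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if count_below(diag, off, mid) >= 1:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for n_total in (150, 360, 400, 450):   # T = 0.5, 1.2, 4/3 and 1.5 (t0 = 1)
        print(n_total / 300.0, lowest_eigenvalue(n_total))
    print("free well, T = 0.5:", math.pi**2 / (2 * 0.5**2))
```

With these assumptions the lowest eigenvalue should be positive for $T < 4t_0/3$, vanish (to rounding error) at $T = 4t_0/3$, and become negative beyond it, while for $T < t_0$ it should reproduce the $n = 1$ value of equation (F.13) up to discretization error.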
Appendix G
Einstein’s Proof of the Covariance
of Maxwell’s Equations
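Einstein writes Maxwell’s equations in a vacuum in the form¹
$$
\begin{aligned}
\frac{1}{c}\frac{\partial E_x}{\partial t} &= \frac{\partial B_z}{\partial y} - \frac{\partial B_y}{\partial z}, &\qquad
\frac{1}{c}\frac{\partial B_x}{\partial t} &= \frac{\partial E_y}{\partial z} - \frac{\partial E_z}{\partial y},\\
\frac{1}{c}\frac{\partial E_y}{\partial t} &= \frac{\partial B_x}{\partial z} - \frac{\partial B_z}{\partial x}, &\qquad
\frac{1}{c}\frac{\partial B_y}{\partial t} &= \frac{\partial E_z}{\partial x} - \frac{\partial E_x}{\partial z},\\
\frac{1}{c}\frac{\partial E_z}{\partial t} &= \frac{\partial B_y}{\partial x} - \frac{\partial B_x}{\partial y}, &\qquad
\frac{1}{c}\frac{\partial B_z}{\partial t} &= \frac{\partial E_x}{\partial y} - \frac{\partial E_y}{\partial x}.
\end{aligned} \tag{G.1}
$$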
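The transformation equations (7.10), using the notation $x', y', z', t'$ in place of
$\xi, \eta, \zeta, \tau$, imply:
\begin{align}
\frac{\partial}{\partial t} &= \gamma\left(\frac{\partial}{\partial t'} - v\,\frac{\partial}{\partial x'}\right), \tag{G.2a}\\
\frac{\partial}{\partial x} &= \gamma\left(\frac{\partial}{\partial x'} - \frac{v}{c^2}\,\frac{\partial}{\partial t'}\right), \tag{G.2b}\\
\frac{\partial}{\partial y} &= \frac{\partial}{\partial y'}, \tag{G.2c}\\
\frac{\partial}{\partial z} &= \frac{\partial}{\partial z'}. \tag{G.2d}
\end{align}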
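Using equations (G.2) in (G.1) we obtain:
\begin{align}
\frac{1}{c}\,\gamma\left(\frac{\partial}{\partial t'} - v\,\frac{\partial}{\partial x'}\right)E_x &= \frac{\partial B_z}{\partial y'} - \frac{\partial B_y}{\partial z'}, \tag{G.3a}\\
\frac{1}{c}\,\gamma\left(\frac{\partial}{\partial t'} - v\,\frac{\partial}{\partial x'}\right)E_y &= \frac{\partial B_x}{\partial z'} - \gamma\left(\frac{\partial}{\partial x'} - \frac{v}{c^2}\,\frac{\partial}{\partial t'}\right)B_z, \tag{G.3b}\\
\frac{1}{c}\,\gamma\left(\frac{\partial}{\partial t'} - v\,\frac{\partial}{\partial x'}\right)E_z &= \gamma\left(\frac{\partial}{\partial x'} - \frac{v}{c^2}\,\frac{\partial}{\partial t'}\right)B_y - \frac{\partial B_x}{\partial y'}. \tag{G.3c}
\end{align}
¹ Einstein uses the notation (X, Y, Z) for the vector of the electric field and (L, M, N) for that of the magnetic
field, in what Abraham Pais calls the “horrible but not uncommon notation in which each component of the
electric and magnetic field has its own name” (Pais, 1982, p. 145).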
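Einstein now uses the vanishing divergence of the electric field, expressed in the
primed coordinates:
$$
\frac{\partial E_x}{\partial x} + \frac{\partial E_y}{\partial y} + \frac{\partial E_z}{\partial z}
= \gamma\left(\frac{\partial}{\partial x'} - \frac{v}{c^2}\,\frac{\partial}{\partial t'}\right)E_x + \frac{\partial E_y}{\partial y'} + \frac{\partial E_z}{\partial z'} = 0, \tag{G.4}
$$
which, substituted in equation (G.3a), gives
$$
\frac{1}{c}\frac{\partial E_x}{\partial t'} = \frac{\partial}{\partial y'}\left[\gamma\left(B_z - \frac{v}{c}E_y\right)\right] - \frac{\partial}{\partial z'}\left[\gamma\left(B_y + \frac{v}{c}E_z\right)\right]. \tag{G.5}
$$
On the other hand, equations (G.3b) and (G.3c) can be rewritten as
\begin{align}
\frac{1}{c}\frac{\partial}{\partial t'}\left[\gamma\left(E_y - \frac{v}{c}B_z\right)\right] &= \frac{\partial B_x}{\partial z'} - \frac{\partial}{\partial x'}\left[\gamma\left(B_z - \frac{v}{c}E_y\right)\right], \tag{G.6a}\\
\frac{1}{c}\frac{\partial}{\partial t'}\left[\gamma\left(E_z + \frac{v}{c}B_y\right)\right] &= \frac{\partial}{\partial x'}\left[\gamma\left(B_y + \frac{v}{c}E_z\right)\right] - \frac{\partial B_x}{\partial y'}. \tag{G.6b}
\end{align}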
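Equations (G.5) and (G.6) can be written as
$$
\frac{1}{c}\frac{\partial \mathbf{E}'}{\partial t'} = \nabla' \times \mathbf{B}', \tag{G.7}
$$
with
\begin{align}
E'_x &= \psi(v)\,E_x, & B'_x &= \psi(v)\,B_x, \tag{G.8a}\\
E'_y &= \psi(v)\,\gamma\left(E_y - \frac{v}{c}B_z\right), & B'_y &= \psi(v)\,\gamma\left(B_y + \frac{v}{c}E_z\right), \tag{G.8b}\\
E'_z &= \psi(v)\,\gamma\left(E_z + \frac{v}{c}B_y\right), & B'_z &= \psi(v)\,\gamma\left(B_z - \frac{v}{c}E_y\right). \tag{G.8c}
\end{align}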
Einstein includes the prefactor $\psi(v)$ because equations (G.5) and (G.6) are
linear in the fields, and therefore they determine the primed fields up to a
multiplicative constant. In order to find $\psi(v)$, he proceeds in the same fashion as in
the transformation of fields. Applying the transformation followed by the inverse
transformation, an operation that brings the fields to the original values, he obtains
$\psi(v)\psi(-v) = 1$. “From reasons of symmetry” (Einstein, 1952, p. 53) $\psi(v) = \psi(-v)$,
which therefore implies $\psi(v) = 1$, and we obtain equations (G.8c).
Applying the same logic we obtain:
$$
\frac{1}{c}\frac{\partial \mathbf{B}'}{\partial t'} = -\nabla' \times \mathbf{E}'.
$$
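As a quick numerical illustration of equations (G.8) with $\psi(v) = 1$, the sketch below applies the transformation for a velocity $v = \beta c$ and then for $-v$, and checks that the fields return to their original values. It also checks that $\mathbf{E}\cdot\mathbf{B}$ and $E^2 - B^2$ are unchanged, a standard property of this transformation that is not derived in the text. The function name `boost_fields` and the sample field values are assumptions made for illustration.

```python
import math


def boost_fields(E, B, beta):
    """Fields seen in a frame moving with velocity v = beta*c along x,
    following equations (G.8) with psi(v) = 1."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    E_primed = (Ex, g * (Ey - beta * Bz), g * (Ez + beta * By))
    B_primed = (Bx, g * (By + beta * Ez), g * (Bz - beta * Ey))
    return E_primed, B_primed


def dot(a, b):
    return sum(u * w for u, w in zip(a, b))


# Arbitrary sample fields (illustrative values only).
E = (0.3, -1.2, 0.7)
B = (0.9, 0.4, -0.5)
beta = 0.6

E1, B1 = boost_fields(E, B, beta)
E2, B2 = boost_fields(E1, B1, -beta)   # inverse transformation

round_trip_error = max(abs(u - w) for u, w in zip(E2 + B2, E + B))
invariant_error = max(
    abs(dot(E1, B1) - dot(E, B)),
    abs((dot(E1, E1) - dot(B1, B1)) - (dot(E, E) - dot(B, B))),
)
print(round_trip_error, invariant_error)
```

The round trip is the numerical counterpart of the condition $\psi(v)\psi(-v) = 1$ used above: with $\psi = 1$ the composition of the two transformations is the identity.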
Appendix H
Relativistic Four-Vector Potential
Equations (G.8c) for the transformation of the electromagnetic fields are different
in structure from the transformation of the coordinates $x' = \gamma(x - vt)$, $y' = y$,
$z' = z$, $t' = \gamma(t - vx/c^2)$: the fields change in the transverse direction but remain
unchanged in the direction of motion. Now, we can write the six components of the
fields $\mathbf{E}$ and $\mathbf{B}$ in terms of two potentials, a scalar $\phi(\mathbf{x}, t)$ and a vector $\mathbf{A}(\mathbf{x}, t)$:
\begin{align}
\mathbf{E} &= -\nabla\phi - \frac{1}{c}\frac{\partial \mathbf{A}}{\partial t}, \tag{H.1a}\\
\mathbf{B} &= \nabla \times \mathbf{A}. \tag{H.1b}
\end{align}
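Since $\phi$ and $\mathbf{A}$ have the same units, let us write the transformation of coordinates
in terms of $x$, $y$, $z$ and $\bar{t}$, with $\bar{t} = ct$ having units of length:
\begin{align}
\bar{t}' &= \gamma(\bar{t} - \beta x), \tag{H.2a}\\
x' &= \gamma(x - \beta\bar{t}), \tag{H.2b}\\
y' &= y, \tag{H.2c}\\
z' &= z, \tag{H.2d}
\end{align}
where $\beta = v/c$. Using the chain rule for the derivatives we have:
\begin{align}
\frac{\partial}{\partial \bar{t}} &= \frac{\partial \bar{t}'}{\partial \bar{t}}\frac{\partial}{\partial \bar{t}'} + \frac{\partial x'}{\partial \bar{t}}\frac{\partial}{\partial x'}
= \gamma\left(\frac{\partial}{\partial \bar{t}'} - \beta\,\frac{\partial}{\partial x'}\right), \tag{H.3a}\\
\frac{\partial}{\partial x} &= \frac{\partial x'}{\partial x}\frac{\partial}{\partial x'} + \frac{\partial \bar{t}'}{\partial x}\frac{\partial}{\partial \bar{t}'}
= \gamma\left(\frac{\partial}{\partial x'} - \beta\,\frac{\partial}{\partial \bar{t}'}\right), \tag{H.3b}\\
\frac{\partial}{\partial y} &= \frac{\partial}{\partial y'}, \tag{H.3c}\\
\frac{\partial}{\partial z} &= \frac{\partial}{\partial z'}. \tag{H.3d}
\end{align}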
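Let us consider the transformation laws for the fields obtained by Einstein,
equations (G.8c), and express them in terms of the potentials of equations (H.1b). For
the electric field in the $x$ direction we have:
$$
\begin{aligned}
E'_x = E_x &= -\frac{\partial \phi}{\partial x} - \frac{\partial A_x}{\partial \bar{t}}\\
&= -\gamma\left(\frac{\partial}{\partial x'} - \beta\,\frac{\partial}{\partial \bar{t}'}\right)\phi - \gamma\left(\frac{\partial}{\partial \bar{t}'} - \beta\,\frac{\partial}{\partial x'}\right)A_x\\
&= -\frac{\partial}{\partial x'}\underbrace{\gamma(\phi - \beta A_x)}_{\phi'} - \frac{\partial}{\partial \bar{t}'}\underbrace{\gamma(A_x - \beta\phi)}_{A'_x}.
\end{aligned} \tag{H.4}
$$
In other words, the field in the moving frame can be expressed as
$$
E'_x = -\frac{\partial \phi'}{\partial x'} - \frac{\partial A'_x}{\partial \bar{t}'}, \tag{H.5}
$$
with $\phi' = \gamma(\phi - \beta A_x)$ and $A'_x = \gamma(A_x - \beta\phi)$: the transformation properties of
$(\phi, A_x)$ are the same as those of $(ct, x)$. For the field in the $y$ component, we have: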

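\begin{align}
E'_y &= \gamma\left(E_y - \beta B_z\right) \tag{H.6}\\
&= -\gamma\left(\frac{\partial \phi}{\partial y} + \frac{\partial A_y}{\partial \bar{t}}\right) - \gamma\beta\left(\frac{\partial A_y}{\partial x} - \frac{\partial A_x}{\partial y}\right) \tag{H.7}\\
&= -\frac{\partial}{\partial y}\underbrace{\gamma(\phi - \beta A_x)}_{\phi'} - \underbrace{\gamma\left(\frac{\partial}{\partial \bar{t}} + \beta\,\frac{\partial}{\partial x}\right)}_{\partial/\partial \bar{t}'}A_y \tag{H.8}\\
&= -\frac{\partial \phi'}{\partial y'} - \frac{\partial A'_y}{\partial \bar{t}'}, \tag{H.9}
\end{align}
where we have used $y' = y$ and $A'_y = A_y$. Let us consider the transformation of
the magnetic field in the $y$ direction: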

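\begin{align}
B'_y &= \gamma\left(B_y + \beta E_z\right) \tag{H.10}\\
&= \gamma\left(\frac{\partial A_x}{\partial z} - \frac{\partial A_z}{\partial x}\right) - \gamma\beta\left(\frac{\partial \phi}{\partial z} + \frac{\partial A_z}{\partial \bar{t}}\right) \tag{H.11}\\
&= \frac{\partial}{\partial z}\underbrace{\gamma(A_x - \beta\phi)}_{A'_x} - \underbrace{\gamma\left(\frac{\partial}{\partial x} + \beta\,\frac{\partial}{\partial \bar{t}}\right)}_{\partial/\partial x'}A_z \tag{H.12}\\
&= \frac{\partial A'_x}{\partial z'} - \frac{\partial A'_z}{\partial x'} \equiv \left(\nabla' \times \mathbf{A}'\right)_y. \tag{H.13}
\end{align}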
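The same calculation can be done for all the components of the electric and
magnetic fields. We conclude that, if in one frame the fields are given in terms of the
potentials as:
\begin{align}
\mathbf{E} &= -\nabla\phi - \frac{1}{c}\frac{\partial \mathbf{A}}{\partial t}, \tag{H.14a}\\
\mathbf{B} &= \nabla \times \mathbf{A}, \tag{H.14b}
\end{align}
in a primed frame moving at velocity $v$ with respect to the first one, the fields are
given by
\begin{align}
\mathbf{E}' &= -\nabla'\phi' - \frac{1}{c}\frac{\partial \mathbf{A}'}{\partial t'}, \tag{H.15a}\\
\mathbf{B}' &= \nabla' \times \mathbf{A}', \tag{H.15b}
\end{align}
with
\begin{align}
\phi' &= \gamma(\phi - \beta A_x), \tag{H.16a}\\
A'_x &= \gamma(A_x - \beta\phi), \tag{H.16b}\\
A'_y &= A_y, \tag{H.16c}\\
A'_z &= A_z. \tag{H.16d}
\end{align}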
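The conclusion above can be tested numerically for all six field components at once. The sketch below is an illustration under stated assumptions, not part of the original derivation: it picks arbitrary smooth test potentials (the function `potentials` is hypothetical), builds the primed potentials from equations (H.16) as functions of the primed coordinates of (H.2), differentiates both sets with central finite differences, and compares the result with the field transformation (G.8) for $\psi = 1$. Time enters only through $\bar{t} = ct$, so $(1/c)\,\partial/\partial t = \partial/\partial\bar{t}$.

```python
import math


def potentials(tb, x, y, z):
    """Arbitrary smooth test potentials (phi, Ax, Ay, Az); tb stands for c*t."""
    phi = math.sin(x + 2.0 * y - tb) + 0.3 * z * x
    ax = math.cos(y - z) * tb + 0.5 * x * y
    ay = math.sin(x * z) + 0.2 * tb * y
    az = math.cos(x + tb) * y
    return (phi, ax, ay, az)


def boosted_potentials(tbp, xp, yp, zp, beta):
    """Primed potentials of (H.16), written as functions of primed coordinates."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    tb = g * (tbp + beta * xp)            # inverse of (H.2a)
    x = g * (xp + beta * tbp)             # inverse of (H.2b)
    phi, ax, ay, az = potentials(tb, x, yp, zp)
    return (g * (phi - beta * ax), g * (ax - beta * phi), ay, az)


def fields(pot, point, h=1e-5):
    """E = -grad(phi) - dA/d(tb) and B = curl(A), by central differences."""
    d = []                                # d[i][k] = d(component k)/d(coordinate i)
    for i in range(4):
        up, dn = list(point), list(point)
        up[i] += h
        dn[i] -= h
        d.append([(a - b) / (2.0 * h) for a, b in zip(pot(*up), pot(*dn))])
    E = tuple(-d[k][0] - d[0][k] for k in (1, 2, 3))
    B = (d[2][3] - d[3][2], d[3][1] - d[1][3], d[1][2] - d[2][1])
    return E, B


def max_mismatch(beta=0.6, primed_point=(0.3, -0.2, 0.5, 0.1)):
    """Largest difference between fields derived from the primed potentials
    and the fields obtained from the transformation (G.8) with psi = 1."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    tbp, xp, yp, zp = primed_point
    point = (g * (tbp + beta * xp), g * (xp + beta * tbp), yp, zp)
    (Ex, Ey, Ez), (Bx, By, Bz) = fields(potentials, point)
    expected = (Ex, g * (Ey - beta * Bz), g * (Ez + beta * By),
                Bx, g * (By + beta * Ez), g * (Bz - beta * Ey))
    Ep, Bp = fields(lambda *X: boosted_potentials(*X, beta), primed_point)
    return max(abs(u - w) for u, w in zip(Ep + Bp, expected))


print(max_mismatch())
```

The printed mismatch should be at the level of the finite-difference error, which is what one expects if $(\phi, \mathbf{A})$ transforms like $(ct, \mathbf{x})$.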
Appendix I
Ehrenfest’s Proof of the Adiabatic Theorem
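As we saw in section 5.4.2, if the Lagrangian is independent of time, the energy $E$
is a constant of motion:
$$
\frac{\partial L}{\partial \dot{x}} \cdot \dot{x} - L = E. \tag{I.1}
$$
If the Lagrangian $L(x, \dot{x}, t)$ depends explicitly on time, we have:
$$
\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}} \cdot \dot{x} - L\right) = -\frac{\partial L}{\partial t}. \tag{I.2}
$$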
Now suppose that $L$ depends on a parameter $a$ and that, for fixed values of the
parameter, the paths are periodic orbits, not necessarily harmonic. These orbits
have energies that depend on the parameter $a$. Next, assume that $a(t)$ varies slowly
with time, so that $\delta a/a \ll 1$ in a time equal to the period of one of the orbits. This
means that, during a period, the energy is approximately constant and corresponds
to the instantaneous value of the parameter $a$: $E \approx E(a(t))$, and equation (I.2) can
be approximated as
$$
\frac{dE}{dt} = -\frac{\partial L}{\partial t} = -\frac{\partial L}{\partial a}\,\dot{a}. \tag{I.3}
$$
The term $-\partial L/\partial a$ is equivalent to the external force associated with the parameter
$a$, and equation (I.3) constitutes the adiabatic approximation. For the variation of
the energy between very close periods (where the external force is approximately
constant), we integrate equation (I.3):
$$
\Delta E = -\int_0^t \frac{\partial L}{\partial a}\,\dot{a}\,dt \approx -\delta a\,\overline{\frac{\partial L}{\partial a}} \equiv -\delta a\,\frac{1}{t}\int_0^t \frac{\partial L}{\partial a}\,dt. \tag{I.4}
$$
Following Ehrenfest, let us now consider two very close orbits 1 and 2 (see
Figure I.1) corresponding to the parameters $a$ and $a + \delta a$, and assume that they
are solutions of the equations of motion for time-independent, fixed values of the
respective parameters. The respective orbits have periods $t_A$ and $t_B$:
\begin{align}
x_1(t_A) &= x_1(0), \tag{I.5a}\\
x_2(t_B) &= x_2(0), \tag{I.5b}\\
x_2(t) &= x_1(t) + \delta(t). \tag{I.5c}
\end{align}
[Figure I.1 shows the two closed orbits, with the point $x_1(0) = x_1(t_A)$ on orbit 1, the points $x_2(0) = x_2(t_B)$ and $x_2(t_A)$ on orbit 2, and the displacements $\delta(0)$ and $\delta(t_A)$ between them.]
Figure I.1 Adiabatic theorem for closed orbits.
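Ehrenfest considers the difference between the actions, $\delta S = S_2 - S_1$, for the
two orbits during their periods $t_A$ and $t_B$. Expanding to lowest order in $\delta$, $\delta a$ and
$\Delta t = t_B - t_A$, we have:
\begin{align}
S_1 &= \int_0^{t_A} L(x_1, \dot{x}_1, a)\,dt, \tag{I.6a}\\
S_2 &= \int_0^{t_B} L(x_1 + \delta, \dot{x}_1 + \dot{\delta}, a + \delta a)\,dt, \tag{I.6b}
\end{align}
\begin{align}
\delta S &= \Delta t\,L(x_1, \dot{x}_1, a) + \int_0^{t_A}\Bigg\{\underbrace{\left[\frac{\partial L}{\partial x} - \frac{d}{dt}\frac{\partial L}{\partial \dot{x}}\right]}_{=0}\cdot\,\delta + \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}}\cdot\delta\right) + \frac{\partial L}{\partial a}\,\delta a\Bigg\}\,dt \tag{I.7}\\
&= \Delta t\,L(0) + \left.\frac{\partial L}{\partial \dot{x}}\right|_{t_A}\cdot\delta(t_A) - \left.\frac{\partial L}{\partial \dot{x}}\right|_{0}\cdot\delta(0) + \delta a\int_0^{t_A}\frac{\partial L}{\partial a}\,dt. \tag{I.8}
\end{align}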
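Since the orbits are periodic
$$
\left.\frac{\partial L}{\partial \dot{x}}\right|_{t_A} = \left.\frac{\partial L}{\partial \dot{x}}\right|_{0}. \tag{I.9}
$$
Also (see Figure I.1)
$$
\delta(t_A) - \delta(0) = -\dot{x}(0)\,(t_B - t_A) \equiv -\dot{x}(0)\,\Delta t, \tag{I.10}
$$
and we obtain, with Ehrenfest:
\begin{align}
\delta S &= \Delta t\left[L - \frac{\partial L}{\partial \dot{x}}\cdot\dot{x}\right]_{0} + \delta a\int_0^{t_A}\frac{\partial L}{\partial a}\,dt \tag{I.11}\\
&= -E_1\,\Delta t + \delta a\int_0^{t_A}\frac{\partial L}{\partial a}\,dt. \tag{I.12}
\end{align}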
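Now consider the variation of the total energy
$$
\delta\int (T + V)\,dt = E_2\,t_B - E_1\,t_A = \Delta E\,t_B + E_1\,\Delta t, \tag{I.13}
$$
with $\Delta E = E_2 - E_1$ and $\Delta t = t_B - t_A$.
Adding equations (I.13) and (I.12), and recalling that $S = \int (T - V)\,dt$, we obtain:
$$
2\,\delta\int_0^{t_P} T\,dt = \Delta E\,t_B + \delta a\int_0^{t_A}\frac{\partial L}{\partial a}\,dt. \tag{I.14}
$$
Using the adiabatic result of equation (I.4) we arrive at Ehrenfest’s result:
$$
2\,\delta\int_0^{t_P} T\,dt = 0, \tag{I.15}
$$
or, equivalently,
$$
\delta\left(\frac{\overline{T}}{\omega}\right) = 0, \tag{I.16}
$$
where $\overline{T}$ is the average kinetic energy during a period $t_P$.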
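Equation (I.16) can be illustrated with a simple simulation. The sketch below is an assumed example rather than Ehrenfest's own: a unit-mass harmonic oscillator, $\ddot{x} = -\omega(t)^2 x$, whose frequency plays the role of the slowly varying parameter and is ramped linearly from `omega0` to `omega1`. For a harmonic oscillator $\overline{T} = E/2$, so (I.16) predicts that $E/\omega$ stays nearly constant even though $E$ itself changes substantially. The function name `ramp_energy`, the integrator (velocity Verlet) and all parameter values are illustrative choices.

```python
def ramp_energy(omega0=1.0, omega1=2.0, ramp_time=1000.0, dt=0.01):
    """Integrate x'' = -omega(t)^2 x while omega ramps linearly from omega0
    to omega1 over ramp_time; return the energy before and after the ramp."""

    def omega(t):
        return omega0 + (omega1 - omega0) * t / ramp_time

    x, p, t = 1.0, 0.0, 0.0
    e_start = 0.5 * (p * p + (omega0 * x) ** 2)
    for _ in range(int(round(ramp_time / dt))):
        # Velocity Verlet step with a time-dependent force.
        p -= 0.5 * dt * omega(t) ** 2 * x
        x += dt * p
        t += dt
        p -= 0.5 * dt * omega(t) ** 2 * x
    e_end = 0.5 * (p * p + (omega1 * x) ** 2)
    return e_start, e_end


e_start, e_end = ramp_energy()
energy_ratio = e_end / e_start                       # energy is not conserved
invariant_ratio = (e_end / 2.0) / (e_start / 1.0)    # E/omega, end over start
print(energy_ratio, invariant_ratio)
```

Shortening `ramp_time` to a value comparable to the oscillation period breaks the adiabatic assumption $\delta a/a \ll 1$ per period, and $E/\omega$ is then no longer expected to stay constant.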
References
Abraham, R., and Marsden, J. E. 1978. Foundations of Mechanics. Addison-Wesley.
Reprinted by Perseus Press, 1995.
Alexander, A. 2014. Infinitesimal: How a Dangerous Mathematical Theory Shaped the
Modern World. Scientific American / Farrar, Straus and Giroux.
Andersen, K. 1983. The Mathematical Technique in Fermat’s Deduction of the Law of
Refraction. Historia Mathematica, 10: 48–62.
Aristotle, 350 BC/1922. De caelo. In The Works of Aristotle. Translated by J. L. Stocks and
H. H. Joachim. Oxford University Press.
Aristotle, 350 BC/1955. Aristotle: Mechanical Problems. In Aristotle. Minor Works.
Translated by W. S. Hett. Harvard University Press.
Aristotle, 350 BC. Politics. Available at http://classics.mit.edu/Aristotle/politics.html.
Baker, B. B., and Copson, E. T. 1950. The Mathematical Theory of Huygens’ Principle.
Second edn. Clarendon Press, 14.
Bascelli, T., Bottazzi, E., Herzberg, F., Kanovei, V., Katz, K. U., Katz, M. G., Nowik, T.,
Sherry, T., and Shnider, S. 2014. Fermat, Leibniz, Euler, and the Gang: The
True History of the Concepts of Limit and Shadow. Notices of the AMS 61 (8):
848–864.
Beck, G., Bethe, H., and Riezler, W. 1931. Remarks on the Quantum Theory of the
Absolute Zero of Temperature. Die Naturwissenschaften, 2: 38–39. A translation
appears in A Random Walk in Science (1973). Compiled by R. L. Weber, edited by E.
Mendoza. Institute of Physics.
Bellver-Cebreros, C., and Rodriguez-Danta, M. 2001. Eikonal Equation from Continuum
Mechanics and Analogy between Equilibrium of a String and Geometrical Light
Rays. Am. J. Phys. 69: 360–367.
Bernoulli, J. 1686. Narratio Controversiæ inter Dn. Hugenium et Abbatem Catelanum
agitatæ de Centro Oscillationis quæ loco animadversionis esse poterit in
Responsionem Dr. Catelani, num. 27, Ephem. Gallic. anni 1684, insertam. Acta Eruditorum,
356–360. Available in Google Books.
Bernoulli, J. 1703. Démonstration générale du centre de balancement ou d’oscillation,
tirée de la nature du levier. Letter of 13 March 1703 Mém. acad. sci. Paris, 1703.
Fourth edn: 78–84.
Bernoulli, J. 1742. Disquisitio Catoptrico-Dioptrica. Opera Omnia. Vol. 1. Printed by
Marci-Michaelis Bousquet & sociorum, 1742: 369–376.
Berry, M. V. 1984. Quantal Phase Factors Accompanying Adiabatic Changes. Proc. Roy.
Soc. Lond. A392: 45–57.
Berry, M. V., and Jeffrey, M. R. 2007. Conical Diffraction: Hamilton’s Diabolical Point
at the Heart of Crystal Optics. In Progress in Optics 50, edited by E. Wolf: 13–50.
Elsevier B. V.
Berry, M.V. 2015. Nature’s Optics and Our Understanding of Light. Contemp. Phys. 56:
2–16.
Blanchard, P. and Brüning, E. 1982. Variational Methods in Mathematical Physics: A
Unified Approach. Springer-Verlag.
Blåsjö, V. 2005. The Isoperimetric Problem. American Mathematical Monthly 112, June-
July: 526–566.
Bloch, A. 2003. Nonholonomic Mechanics and Control. With J. Baillieul, P. Crouch., J.
Marsden., and D. Zenkov. 2nd edn., 2015. Springer.
Bohr, N. 1913a. On the Constitution of Atoms and Molecules: Part I. Phil. Mag. 26 (6):
1–25.
Bohr, N. 1913b. On the Constitution of Atoms and Molecules: Part II, Systems Containing
Only a Single Nucleus. Phil. Mag. 26 (6): 476–502.
Bohr, N. 1913c. On the Constitution of Atoms and Molecules: Part III, Systems Containing
Several Nuclei. Phil. Mag. 26 (6): 857–875.
Bohr, N. 1922. The Theory of Spectra and Atomic Constitution: Three Essays. Cambridge
University Press.
Boltzmann, L. 1872. Further Studies on the Thermal Equilibrium of Gas Molecules.
Reprinted in Brush, S. G. 2003. The Kinetic Theory of Gases, an Anthology of
Classic Papers with Historical Commentary. Imperial College Press: 262–349. Orig-
inally published under the title Weitere Studien über das Wärmegleichgewicht unter
Gasmolekülen. Sitzungberichte Akad. Wiss. Vienna, part II, 66: 275–370.
Born, M. 1926. The Problems of Atomic Dynamics: 75. Dover Publications.
Born, M., and Wolf, E. 1999. Principles of Optics, 7th edn. Cambridge University
Press.
Boyer, C. B. 1987. The Rainbow: From Myth to Mathematics. Princeton University Press.
Brackenridge, J. B. 1996. The Key to Newton’s Dynamics: The Kepler Problem and the
Principia. University of California Press.
Breger, H. 1994. The Mysteries of Adaequare: A Vindication of Fermat. Arch. Hist. Exact
Sci. 46 (3): 193–219.
Briggs, J. S., and Rost, J. M. 2001. On the Derivation of the Time-Dependent Equation of
Schrödinger. Foundations of Physics 31: 693–712.
Brillouin, Marcel. 1919. Actions mécaniques à hérédité discontinue par propagation; essai
de théorie dynamique de l’atome à quanta. Comptes rendus 168: 1318–1320.
de Broglie, L. 1923a. Ondes et quanta. Comptes rendus 177: 507–510.
de Broglie, L. 1923b. Quanta de lumière, diffraction et interférences. Comptes rendus 177:
548–550.
de Broglie, L. 1923c. Les quanta, la théorie cinétique des gaz et le principe de Fermat.
Comptes rendus 177: 630–632.
de Broglie, L. 1924a. A Tentative Theory of Light Quanta. Philosophical Magazine 47:
446–458.
de Broglie, L. 1924b. Ph. D. Thesis. Université de Paris.
Brunet, P. 1938. Etude historique sur le principe de la moindre action. Hermann & Cie,
Éditeurs.
Bruns, H. 1895. The Eikonal. Translated by D. H. Delphenich. S. Hirzel.
Butterfield, J. 1995. On Hamilton-Jacobi Theory as a Classical Root of Quantum Theory.
In Quo Vadis Quantum Mechanics? Edited by Elitzur, A. C., Dolev, S. and Kolenda,
N., 239–273. Springer.
Byers, N. 1978. E. Noether’s Discovery of the Deep Connection Between Symmetries and
Conservation Laws. Arxiv. hep-th 980744.
Cajori, F. 1729/1934. Sir Isaac Newton’s Mathematical Principles of Natural Philosophy
and his System of the World (Andrew Motte’s translation of the Principia, of 1729,
revised.) University of California Press.
Capecchi, D. 2012. History of Virtual Work Laws. A History of Mechanics Prospective.
Springer Verlag.
Carathéodory, C. 1937. The Beginning of Research in the Calculus of Variations. Osiris,
3: 224–240.
Cauchy, A. 1830. Oeuvres complètes d’Augustin Cauchy. Ser. 1, vol. 9, p. 410.
Cauchy, A. 1843. Mémoire sur les dilatations, les condensations et les rotations produites
par un changement de forme dans un système de points matériels. Comptes rendus,
Vol. 16, p. 12. Reprinted in Oeuvres Complètes d’Augustin Cauchy. Vol. 7 (1892),
235–246.
Chandrasekhar, S. 1995. Newton’s Principia for the Common Reader. Oxford University
Press.
Cohen, I. B. 1974. Isaac Newton, The Calculus of Variations, and the Design of Ships. In
For Dirk Struik, Scientific, Historical and Political Essays in Honor of Dirk J. Struik.
Edited by 000, 169–187. Boston Studies in the Philosophy of Science Vol. 15. Reidel
Publishing Company.
Cohen, I. B., and Whitman, A. 1999. A Guide to Newton’s Principia. In Isaac Newton, The
Principia, p. 46. University of California Press.
Cohen, M., and Drabkin, I. E. 1965. A Source Book in Greek Science. Harvard University
Press, 271–272.
Courant, R., and Robbins, H. 1996. What is Mathematics? An Elementary Approach to
Ideas and Methods. Revised by Ian Stewart from the original 1941 edition. Oxford
University Press.
D’Alembert, J. L. R. 1743. Traité de dynamique.
D’Alembert, J. L. R. 1755. Equilibre. In Encyclopédie ou Dictionnaire raisonné des sci-
ences, des arts et des métiers. Available online at http://encyclopedie.uchicago.edu/.
D’Alembert, J. L. R. 1758. Traité de dynamique (Second edition). First edition 1743.
Damianus, 1897. Schrift über Optik: Mit Auszügen aus Geminos, Edited by R. Schöne,
Berlin, Reichsdruckerei.
Darrigol, O. 2010. James MacCullagh’s ether: An optical route to Maxwell’s equations?
European Physical Journal H, 35: 133–172.
Darrigol, O. 2012. A History of Optics from Greek Antiquity to the Nineteenth Century.
Oxford University Press.
Darrigol, O. 2014. Physics and Necessity: Rationalist Pursuits from the Cartesian Past to
the Quantum Present. Oxford University Press.
De la Chambre, M. C. 1662. La Lumière, 313–314. Available in Google Books.
Descartes, R. 1637. Discours de la méthode pour bien conduire sa raison, et chercher la
vérité dans les sciences plus la dioptrique, les météores et la géométrie qui sont des
essais de cette méthode. Leyde: Maire. p. 73.
Dirac, P. A. M. 1933. The Lagrangian in Quantum Mechanics. Phys. Zeits. Sowjetunion,
3 (1): 64–72.
Drago, A. 1993. The Principle of Virtual Works as a Source of Two Traditions in 18th
Century Mechanics. In Bevilacqua F. (ed.), 1992. History of Physics in Europe in the
19th and 20th Centuries, Como, Italy, 1992, F. Bevilacqua ed., Società Italiana di
Fisica, Bologna, 69–80, Bologna.
Drake, S. 1978. Galileo at Work: His Scientific Biography. University of Chicago Press.
Dugas, R. 1955. A History of Mechanics. Dover.
Duhem, P. 1905. Les origines de la statique.
Eastwood, B. S. 1971. Metaphysical Derivations of a Law of Refraction: Damianos and
Grosseteste. Archive for History of Exact Sciences, 6 (3): 224–236.
Ehrenfest, P. 1911. Welche Züge der Lichtquantenhypothese spielen in der Theorie der
Wärmestrahlung eine wesentliche Rolle? Annalen der Physik, 36: 91–118.
Ehrenfest, P. 1913. A mechanical theorem of Boltzmann and its relation to the theory of
energy quanta. Proceedings of the Amsterdam Academy 16: 591–597.
Einstein, A. 1905. Über einen die Erzeugung und Verwandlung des Lichtes betreffenden
heuristischen Gesichtspunkt. Annalen der Physik, 17: 132–148.
Einstein, A. 1911. Über den Einfluss der Schwerkraft auf die Ausbreitung des Lichtes.
Annalen der Physik, 35: 898–908. Translated as “On the Influence of Gravitation on
the Propagation of Light,” in Einstein, A. 1952. The Principle of Relativity. Dover. pp.
99–108.
Einstein, A. 1915a. Erklärung der Perihelbewegung des Merkur aus der allgemeinen
Relativitätstheorie. Preussische Akademie der Wissenschaften, 1915 (part 2):
831–839.
Einstein, A. 1915b. Die Feldgleichungen der Gravitation. Preussischen Akademie der
Wissenschaften 1915 (part 2): 844–847.
Einstein, A. 1916a. Die Grundlage der allgemeinen Relativitätstheorie. Annalen der Physik,
49: 769–822. Translated as “The Foundation of the General Theory of Relativity,” in
Einstein, A. 1952. The Principle of Relativity. Dover. pp. 111–164.
Einstein, A. 1916b. Hamilton’s Principle and the General Theory of Relativity. Königliche
preußische Akademie der Wissenschaften (Berlin), 1111–1116. Reprinted in “The
Principle of Relativity” Einstein (1952).
Einstein, A. 1922. The Meaning of Relativity. Four Lectures Delivered at Princeton
University, May, 1921. Princeton University Press.
Einstein, A. 1952. The Principle of Relativity. A Collection of Original Papers on the
Special and General Theory of Relativity, with notes by A. Sommerfeld. Dover.
Ekeland, I. 2006. The Best of all possible worlds: Mathematics and Destiny. The University
of Chicago Press.
Erlichson, H. 1997. Hooke’s September 1685 Ellipse Vertices Construction and Newton’s
Instantaneous Impulse Construction. Historia Mathematica, 24: 167–184.
Erlichson, H. 1988. Galileo’s work on swiftest descent from a circle and how he almost
proved the circle itself was the minimum time path. The American Mathematical
Monthly, 105: 338–347.
Euler, L. 1736. Mechanica sive motus scientia analytice exposita. Auctore Leonhardo
Eulero academiae imper. scientiarum membro et matheseos sublimioris profes-
sore. Vol. 1: Instar supplementi ad commentar. acad. scient. imper. Petropoli. Ex
typographia academiae scientiarum. Available online at http://eulerarchive.maa.org/
docs/originals/E015intro.pdf.
Euler, L. 1744. Methodus Inveniendi Lineas Curvas Maximi Minimive Proprietate Gau-
dentes, sive Solutio Problematis Isoperimetrici Latissimo Sensu Accepti (A Method
of Finding Plane Curves that Show Some Property of Maximum or Minimum, or
Solution of Isoperimetric Problems in the Broadest Accepted Sense). Lausanne and
Geneva. Also in L. Euler, Opera Omnia I, Vol. XXIV, C. Carathéodory, ed. Bern,
1952.
Euler, L. 1748a. Recherches sur les plus grands et plus petits qui se trouvent dans les
actions des forces. Mémoires de l’Académie des sciences de Berlin, 4:149–188.
Euler, L. 1748b. Reflexions sur quelques loix générales de la nature qui s’observent dans
les effets des forces quelconques. Mémoires de l’Académie des sciences de Berlin, 4:
189–218.
Euler, L. 1751. Dissertation sur le principe de la moindre action, avec l’examen des objec-
tions de M. le Professeur Koenig faites contre ce principe. Berlin. Bilingual edition
available online at http://eulerarchive.maa.org/docs/originals/E186a.pdf
Euler, L. 1752. Harmonie entre les principes generaux de repos et de mouvement de M. de
Maupertuis. Mémoires de l’Académie des sciences de Berlin, 7: 169–198. Available
at http://eulerarchive.maa.org/.
Evans, J. and Rosenquist, M. 1986. “F = ma” optics. Am. J. Phys. 54: 876–882.
Fermat, P. 1657/1894. Œuvres. Vol. 1. Translated by Paul Tannery Correspondance. Paris.
Available in Google books.
Fermat, P. 1657/1894b. Œuvres. Vol. 3. Translated by Paul Tannery. Paris. Available in
Google books. 149–151.
Feynman, R. 1942/2005. Feynman’s Thesis–A New Approach to Quantum Theory. Edited
by Laurie M Brown. World Scientific. Available online at https://cds.cern.ch/record/
101498/files/Thesis-1942-Feynman.pdf
Feynman, R. 1965. “The Development of the Space-Time View of Quantum Electrody-
namics.” Nobel Lecture. http://www.nobelprize.org/nobel_prizes/physics/laureates/
1965/feynman-lecture.html.
Feynman, R. 1963. The Feynman Lectures on Physics. Vol. I, Addison Wesley. Section
26–3.
Feynman, R. 2013. The Feynman Lectures on Physics. Vol. II, The Millennium Edition.
http://www.feynmanlectures.caltech.edu/II_01.html.
FitzGerald, G. F. 1879. On the electromagnetic theory of the reflection and refraction of
light. Proceedings of the Royal Society. Reprinted in FitzGerald, G. F. 1902. The
Scientific Papers of the Late G. F. FitzGerald, ed. J. Larmor. Dublin, 41–44.
Fraser, C. 1983. J. L. Lagrange’s Early Contributions to the Principles and Methods of
Mechanics. Archive for History of Exact Sciences, 28: 197–241.
Fraser, C. 1985. D’Alembert’s Principle: The Original Formulation and Application in Jean
d’Alembert’s Traité de Dynamique (1743), parts 1 and 2. Centaurus 28: 31–61.
Galilei, G. 1600/1960. On Motion and On Mechanics. Translated with Introduction and
notes by I. E. Drabkin and Stillman Drake. University of Wisconsin Press.
Galilei, G. 1638/1974. Discourses and Mathematical Demonstrations Two New Sciences.
Translated by Stillman Drake. University of Wisconsin Press.
Gamow, R. I. 1966. Thirty Years That Shook Physics: The Story of Quantum Theory. Dover
Publications.
Gandz, S. 1940. Studies in Babylonian Mathematics III: Isoperimetric Problems and the
Origin of the Quadratic Equations. Isis, 32: 103–115.
Gauss, C. F. 1829. Über ein neues allgemeines Grundgesetz der Mechanik. Crelles Journal,
4: 232–235.
Giusti, E. 2009. Les méthodes des maxima et minima de Fermat. Ann. Fac. Sci. Toulouse
Math. 18 (6): Fascicule Special, 59–85.
Goldenbaum, U. 2016. Ein gefälschter Leibnizbrief?: Plaidoyer für seine Authentizität.
Wehrhahn Verlag, Hannover.
Goldenbaum, U. 2017. Private communication.
Goldstine, H. H. 1980. A History of the Calculus of Variations from the 17th through the
19th Century. Springer Verlag.
Gould, S. H. 1985. Newton, Euler, and Poe in the Calculus of Variations. In Differential
Geometry, Calculus of Variations, and their Applications. Edited by Rassias, G. M.
and Rassias T. M., 267–282. Marcel Dekker, Inc., 1985.
Gouy, L. G. 1890. Sur une propriété nouvelle des ondes lumineuses. Comptes rendus
hebdomadaires des séances de l’Académie des sciences, 110: 1251–1253.
Gouy, L. G. 1891. Sur la propagation anomale des ondes. Annales de chimie et de physique
24 6e série, 145–213.
Graves, R. P. 1882. Life of Sir William Rowan Hamilton, Andrews Professor of Astronomy
in the University of Dublin, and Royal Astronomer of Ireland, Including Selections
from his Poems, Correspondence, and Miscellaneous Writings. Hodges, Figgis, Vol. I.
Gray, J. 1993. Möbius’s Geometrical Mechanics. In Möbius and His Band. Edited by
Fauvel, J. R. Flood, R. and Wilson, R., 79–103 Oxford University Press.
Gray, C. G. and Taylor, E. F. 2007. When Action Is Not Least. Am. J. Phys. 75: 434–458.
Gray, C. G. 2009. Principle of Least Action. Scholarpedia, 4 (12): 8291. Available at
http://www.scholarpedia.org/article/Principle_of_least_action.
Green, G. 1838. On the Laws of Reflection and Refraction of Light at the Common Surface
of Two Non-Crystallized Media (read 11 Dec 1837). Transactions of the Cambridge
Philosophical Society. Also in The Mathematical Papers of the Late George Green.
MacMillan and Co. 243–269.
Hall, A. R. and Hall, M. B. 1888/1962. A Catalog of the Portsmouth Collection of Books
and Papers Written by or Belonging to Sir Isaac Newton. Cambridge University
Press.
Hamilton, W. R. 1823. On a General Method of Expressing the Paths of Light, and of the
Planets, by the Coefficients of a Characteristic Function. Dublin University Review
and Quarterly Magazine, 1: 795–826.
Hamilton, W. R. 1833. On Some Results of the View of a Characteristic Function in Optics.
Report of the Third Meeting of the British Association for the Advancement of Science
held at Cambridge in 1833 John Murray, 360–370.
Hamilton, W. R. 1834. On a General Method in Dynamics. Philosophical Transactions of
the Royal Society, part II, 247–308.
Hamilton, W. R. 1834b. On the Application to Dynamics of a General Mathematical
Method Previously Applied to Optics. British Association Report, 513–518.
Hamilton, W. R. 1835. Second Essay on a General Method in Dynamics. Philosophical
Transactions of the Royal Society, I: 95–144.
Hamilton, W. R. 1837. Third Supplement to an Essay on the Theory of Systems of Rays.
Transactions of the Royal Irish Academy, 17: 1–144.
Hanc, J., and Taylor, E. F. 2004. From Conservation of Energy to the Principle of Least
Action: A Story Line. Am. J. Phys., 72: 514–521.
Hanc, J., Taylor, E. F., and Tuleja, S. 2004. Deriving Lagrange’s Equations Using
Elementary Calculus. Am. J. Phys., 72: 510–513.
Hanc, J., Taylor, E. F., and Tuleja, S. 2005. Variational Mechanics in One and Two
Dimensions. Am. J. Phys., 73: 603–610.
Hanc, J., Tuleja, S., and Hancova, M. 2004. Simple Derivation of Newtonian Mechanics
from the Principle of Least Action. Am. J. Phys., 71: 386–391.
Hankins, T. L. 1967. The Reception of Newton’s Second Law of Motion in the 18th
Century. Archives internationales d’histoire des sciences, 20: 55–56.
Hankins, T. L. 1970. Jean d’Alembert, Science and the Enlightenment. Clarendon Press.
Hankins, T. L. 1980. Sir William Rowan Hamilton. The Johns Hopkins University Press.
Hardy, G. H. 1967. A Mathematician’s Apology. Cambridge University Press.
Hawking, S. and Ellis, G.F.R. 1973. The Large Scale Structure of Space-Time. Cambridge
University Press. pp. 365–368.
Heath, T. 1921. A History of Greek Mathematics. Vol. 2. Oxford University Press, pp.
206–213.
Heath, T. 1926. The Thirteen Books of Euclid’s Elements. Books 1–2. Cambridge
University Press.
Heilbron, J. 2013. The Path to the Quantum Atom. Nature, 498: 27–30.
Helmholtz, H., von. 1887. Zur Geschichte des Princips der kleinsten Action.
Helmholtz, H., von. 1892. Das Princip der kleinsten Wirkung in der Electrodynamik.
Annalen der Physik, 283: 1–26.
Heronis Alexandrini. 1976. Opera Qvae Supersvnt Omnia. Vol. 2. Teubner, 396. Available
at http://gallica.bnf.fr/ark:/12148/bpt6k25187r
Hertz, H. R. 1894/1956. Gesammelte Werke, Vol. 3: Die Prinzipien der Mechanik in neuem
Zusammenhange dargestellt. Barth. English translation: Dover.
Herzberger, M. 1936. On the Characteristic Function of Hamilton, the Eiconal of Bruns,
and Their Use in Optics. J. Opt. Soc. Am., 26: 177–178.
Hildebrandt S. and Tromba A. 1985. Mathematics and Optimal Form. Scientific American
Books.
Holm, D. D. 2008. Geometric Mechanics. Imperial College Press.
Hussein, M. S., Pereira, J. G., Stojanoff, V., and Takai, H. 1980. The Sufficient Condition
for an Extremum in the Classical Action Integral as an Eigenvalue Problem. American
Journal of Physics, 48: 767–770.
Huygens, C. 1673. Horologium Oscillatorium; sive, de Motu Pendulorum ad Horologia
Aptato Demonstrationes Geometricae.
Huygens, C. 1690/1945. Treatise on Light. In Which Are Explained the Causes of That
Which Occurs in Reflexion, & in Refraction. And Particularly in the Strange Refrac-
tion of Iceland Crystal. Rendered into English by Silvanus P. Thompson. University
of Chicago Press, 42–44.
Jacobi, C. G. 1837. Über die Reduction der Integration der partiellen
Differentialgleichungen erster Ordnung zwischen irgend einer Zahl Variabeln auf die Integration eines
einzigen Systemes gewöhnlicher Differentialgleichungen. Journal für die Reine und
Angewandte Mathematik, 17: 97–162. In Werke, 4: 57–127.
Jacobi, C. G. 1837. Zur Theorie der Variations-Rechnung und der
Differential-Gleichungen. J. f. Math. XVII: 68–82. An English translation is given in Todhunter
(1861/2005): p. 243.
Jacobi, C. G. 1884. Vorlesungen über Dynamik. For the English version see Balagan-
gadharan, K. (translator), 2009. Jacobi’s Lectures on Dynamics. Hindustan Book
Agency.
Jammer, M. 1999. Concepts of Force. Dover.
Jourdain, P. E. B. 1912. Maupertuis and the Principle of Least Action. The Monist, 22:
414–459.
Jouguet, E. 1908. Lectures de Mécanique. La Mécanique Enseignée par les Auteurs
Originaux. Vol. 1. Gauthier-Villars.
Klein, F. 1918. Über die Differentialgesetze für die Erhaltung von Impuls und Energie
in der Einsteinschen Gravitationstheorie. Nachr. d. König. Gesellsch. d. Wiss. zu
Göttingen, Math.-phys. Klasse.
Klein, M. J. 1970. Paul Ehrenfest, The Making of a Theoretical Physicist. Vol. 1. North
Holland.
Knobloch, E. 2012. Leibniz and the Brachistochrone. Documenta Mathematica. Extra
Volume ISMP, 15–18.
König, S. 1751. De universali principio aequilibrii et motus. Nova acta eruditorum,
162–176. Available at http://gallica.bnf.fr/.
Kragh, H. 1982. Erwin Schrödinger and the Wave Equation: The Crucial Phase. Centaurus,
26: 154–197.
Kragh, H. 1985. The Fine Structure of Hydrogen and the Gross Structure of the Physics
Community, 1916–26. Historical Studies in the Physical Sciences, 15: No. 2, 67–125.
Kuhn, T. S., and Heilbron, J. L. 1969. The Genesis of the Bohr Atom. Historical Studies
in the Physical Sciences, 1: 211–290.
Kuhn, T. S. 1978. Black-body Theory and the Quantum Discontinuity: 1894–1912. Oxford
University Press.
Lagrange, J. L. 1760/1761. Application de la méthode exposée dans le mémoire précédent
à la solution des problèmes de dynamique différents. Miscellanea Taurinensia,
196–298.
Lagrange, J. L. 1764. Recherches sur la libration de la Lune. Œuvres de Lagrange, Vol. 6,
5–61. Available at http://gallica.bnf.fr/.
Lagrange, J. L. 1768. Mécanique Analytique, Seconde Partie. The equations first appeared
in Miscell. Tourin, 11.
Lagrange, J. L. 1811/1995. Analytical Mechanics. Translated by A. Boissonnade and V. N.
Vagliente from the Mécanique analytique. New edn. 1811. Springer.
Lamb, H. 1900. On a Peculiarity of the Wave-System due to the Free Vibrations of a
Nucleus in an Extended Medium. Proc. London Math. Soc., 53: 208–211.
Lanczos, C. 1962. The Variational Principles of Mechanics. Second edn. University of
Toronto Press, first edn. 1949.
Landsman, N. P. 2007. Between Classical and Quantum. In Handbook of the Philosophy of
Science, Vol. 2: Philosophy of Physics, Edited by John Earman & Jeremy Butterfield,
417–554. North Holland.
Laplace, P. S. 1799. Traité de mécanique céleste. Vol. 1, First Part, Book 2. 165 ff.
Larmor, J. 1893. A Dynamical Theory of the Electric and Luminiferous Medium.
Proceedings of the Royal Society of London, 14: 438–461.
Leibniz, G. W. 1682. Unicum Opticae, Catoptricae & Dioptricae Principium. Acta
eruditorum. June. Reprinted in Acta Eruditorum. Vol. 1. Johnson Reprint
Corporation. English translation by Jeffrey K. McDonough. Available at
http://philosophyfaculty.ucsd.edu/faculty/rutherford/Leibniz/unitary-principle.htm.
Leibniz, G. W. 1962. Mathematische Schriften, Vol. 3/1. Edited by C. I. Gerhardt. Reprint.
Georg Olms Verlagsbuchhandlung, Hildesheim.
Leibniz, G. W. 1696/1952. Tentamen Anagogicum. In Philosophical Papers and Letters.
Translated by Loemker, L. E., 777–788. University of Chicago Press.
Lenz, W. 1924. Über den Bewegungsverlauf und die Quantenzustände der gestörten
Keplerbewegung. Zeitschrift für Physik, 24: 197–207.
Levi, M. 2002. Lectures on Geometrical Methods in Mechanics. In Classical and Celestial
Mechanics. Edited by H. Cabral and F. Diacu, 239–280, Princeton University Press.
Levi, M. 2012. The Mathematical Mechanic: Using Physical Reasoning to Solve Problems.
Princeton University Press.
Lewis, A. 1998. The Geometry of the Gibbs-Appell Equations and Gauss’ Principle of
Least Constraint. Reports on Math. Phys, 38: 11–28.
Liberzon, D. 2012. Calculus of Variations and Optimal Control Theory. Princeton
University Press.
Lloyd, H. 1833. On the Phenomena Presented by Light in its Passage along the Axes of
Biaxial Crystals. Trans. R. Irish Acad., 17: 145–158. Reprinted in Lloyd, H., 1877.
Miscellaneous Papers Connected with Physical Science. Longman Green: 1–18.
Lohne, J. 1959. Thomas Harriott (1560–1621), The Tycho Brahe of Optics. Centaurus,
6 (2): 113–121.
Lorentz, H. A. 1892. La théorie électromagnétique de Maxwell et son application aux corps
mouvants. E.J. Brill.
Lorentz, H. A. 1895. Michelson’s Interference Experiment. Reprinted in Einstein (1952),
1–7.
Lorentz, H. A. 1903. Contributions to the Theory of Electrons, Proc. Roy. Acad. Amster-
dam. 608: 132–154.
Lyssy, A. 2015. L’Économie de la nature—Maupertuis et Euler sur le Principe de Moindre
Action, Philosophiques, 42: 31–51.
Lyusternik, L. A. 1964. Shortest Paths, Variational Problems. MacMillan.
Mach, E. 1960. The Science of Mechanics: Account of its Development. Translated by
Thomas J. McCormack. Sixth edn. Open Court Publishing Company.
Mariotte, E. 1673. Traité de la percussion ou chocq des corps, dans lequel les principales
règles du mouvement contraires à celles que Mr. Des Cartes, & quelques autres
modernes ont voulu établir, sont demonstrées par leurs veritables causes.
Mark Smith, A. 1982. Ptolemy’s Search for a Law of Refraction: A Case-Study in the
Classical Methodology of ‘Saving the Appearances’ and Its Limitations. Archive for
History of Exact Sciences, 26: No. 3, 221–240.
Mark Smith, A. 2009. Alhacen on Refraction: A Critical Edition, with English Transla-
tion and Commentary, of Book 7 of Alhacen’s De Aspectibus. Transactions of the
American Philosophical Society, 100 (3): 213–331.
Marsden, J. E. and T. S. Ratiu. 1999. Introduction to Mechanics and Symmetry. Springer-
Verlag, Texts in Applied Mathematics, 17; First Edition 1994, Second Edition, 1999.
de Maupertuis, P. L. M. 1744. Accord de différentes loix de la nature, qui avoient jusqu’ici
paru incompatibles. Mémoires de l’Académie Royale des Sciences (Paris), 417–426.
Reprinted in Oeuvres, 4, pp. 1–23. Reprografischer Nachdruck der Ausg. (1768).
de Maupertuis, P. L. M. 1746. Les Loix du mouvement et du repos déduites d’un principe
métaphysique. Histoire de l’Académie Royale des Sciences et des Belles Lettres,
267–294.
Mayer, A. 1877. Geschichte des Princips der kleinsten Action. Leipzig. Available in
Google books.
MacCullagh, J. 1846. An Essay towards a Dynamical Theory of Crystalline Reflexion and
Refraction (read 9 Dec. 1839). The Transactions of the Royal Irish Academy, 21:
17–50.
McDonough, J. K. 2008. Leibniz’s Two Realms Revisited. Noûs, 42 (4): 673–696.
McDonough, J. K. 2009. Leibniz on Natural Teleology and the Laws of Optics. Philosophy
and Phenomenological Research, 78 (3): 505–544.
Mehra, J. and Rechenberg, H. 1982. The Historical Development of Quantum Theory. Part
I, Springer-Verlag, 58.
Möbius, A. F. 1837. Lehrbuch der Statik. Part 2, 217–313.
Moore, T. A. 2004. Getting the Most Action Out of Least Action: A Proposal. Am. J. Phys.
72: 522–527.
Motte, A. 1729. Mathematical Principles of Natural Philosophy by Sir Isaac Newton,
translated into English. Vol. 2, Appendix pp. i–vii.
Nakane, M. and Fraser, C. G. 2002. The Early History of Hamilton-Jacobi Dynamics 1834–
1837. Centaurus, 44: 161–227.
Nadderd, L., Davidovic, M. and Davidovic, D. 2014. A direct derivation of the relativistic
Lagrangian for a system of particles using d’Alembert’s principle. Am. J. Phys, 82:
1083–1086.
Nauenberg, M. 1994. Hooke, Orbital Motion, and Newton’s Principia. American Journal
of Physics 62 (4): 331–350.
Neimark, J. I. and N. A. Fufaev. 1972. Dynamics of Nonholonomic Systems. Translations
of Mathematical Monographs, AMS, 33.
Navarro, L. and Pérez, E. 2006. Paul Ehrenfest: The Genesis of the Adiabatic Hypothesis,
1911–1914. Arch. Hist. Exact Sci. 60: 209–267.
Neumann, J. G. 1888. Leipzig Berichte XL, Vierkandt Monatshefte für Math u. Phys. III.
Newton, I. 1687. Philosophiae Naturalis Principia Mathematica. Londini, Jussu Societatis
Regiae ac Typis Josephi Streater.
Newton, I. 1718. Opticks: A Treatise of the Reflections, Refractions, Inflections & Colours
of Light. Second Edition. Available in Google books.
Noether, E. 1918. Invariante Variationsprobleme, Nachr. d. König. Gesellsch. d. Wiss. zu
Göttingen, Math-phys. Klasse, 235–257.
Nicholson, J. W. 1912. The Constitution of the Solar Corona. II. Month. Not. Roy. Astr.
Soc., 72: 677–692.
O’Hara, J. 1979. Analysis versus Geometry: William Rowan Hamilton, James MacCullagh
and the Elucidation of the Fresnel Wave Surface in the Theory of Double Refraction.
Available at
https://halshs.archives-ouvertes.fr/halshs-00004274v3/file/15_OHara.tif.pdf.
O’Hara, J. 1982. The Prediction and Discovery of Conical Refraction by William Rowan
Hamilton and Humphrey Lloyd (1832–1833). Proc. Roy. Ir. Acad. 82: 231–257.
Okun, L.B. 1989. The Concept of Mass. Physics Today, 42: 31–36.
Okun, L.B. 2009. Mass versus Relativistic and Rest Masses. Am. J. Phys. 77: 430–431.
Olver, P. 2015. Introduction to the Calculus of Variations. Notes.
Orio de Miguel, B. (Translator). 2009. Correspondencia G. W. Leibniz–Johann Bernoulli.
Cartas 1–275, 20 de diciembre de 1693–11 de noviembre de 1716. Registro de la
Propiedad Intelectual de la Comunidad de Madrid.
Pais, A. 1982. ‘Subtle is the Lord...’: The Science and the Life of Albert Einstein. Oxford
University Press, 154.
Pais, A. 1991. Niels Bohr’s Times. Oxford University Press, 154.
Palmieri, P. 2008. The Empirical Basis of Equilibrium: Mach, Vailati, and the Lever. Stud.
Hist. Phil. Sci. 39: 42–53.
Pappus of Alexandria, 1888. Pappi Alexandrini Collectionis: quae supersunt. Vol. 2, 1189–
1211.
Pars, L. A. 1965. A Treatise on Analytical Dynamics. Heinemann.
Pedersen, O. 1993. Early Physics and Astronomy: A Historical Introduction. Cambridge
University Press, 115.
Pérez, E. 2009. Ehrenfest’s Adiabatic Theory and the Old Quantum Theory, 1916–1918.
Arch. Hist. Exact Sci. 63: 81–125.
Pié i Valls, B. and Pérez, E. 2016. The Historical Role of the Adiabatic Principle in Bohr’s
Quantum Theory. Ann. Phys., 528: 530–534.
Planck, M. 1906a. Vorlesungen über die Theorie der Wärmestrahlung, J. A. Barth, 155–
156. For the translation of the second edition, see M. Planck. 1914. The Theory of
Radiation. Blakiston’s Son & Co., 160–166.
Planck, M. 1906b. Das Prinzip der Relativität und die Grundgleichungen der Mechanik.
Verhandlungen der Deutschen Physikalischen Gesellschaft, 8: 136–141.
Planck, M. 1907. Zur Dynamik bewegter Systeme. Sitzungsberichte der Preussischen
Akademie der Wissenschaften, (January-June), 542–570.
Planck, M. 1910. Die Stellung der neueren Physik zur mechanischen Naturanschauung. In
Planck, M. 1944. Wege zur physikalischen Erkenntnis. Reden und Vorträge. S. Hirzel,
25–41.
Planck, M. 1915. Eight Lectures on Theoretical Physics. Columbia University Press.
Planck, M. 1915/1993. The Principle of Least Action. In A Survey of Physical Theory.
Dover.
Poe, E. A. 1975. The Complete Tales and Poems. Vintage Books.
Poisson, S. D. 1818. Mémoire sur l’intégration de quelques équations linéaires aux
différences partielles, et particulièrement de l’équation générale du mouvement des
fluides élastiques. Mémoires de l’Académie royale des sciences, 3: 121–176.
Pound, R. V. and Rebka Jr., G. A. 1959. Gravitational Red-Shift in Nuclear Resonance.
Physical Review Letters, 3: 439–441.
Rashed, R. 1970. Optique géométrique et doctrine optique chez Ibn al-Haytham. Archive
for History of Exact Sciences, 6: 271–298.
Rashed, R. 1990. A Pioneer in Anaclastics: Ibn Sahl on Burning Mirrors and Lenses. Isis,
81: 464–491.
Rashed, R. 2016. Private communication.
Rodrigues, O. 1816. De la manière d’employer le principe de la moindre action, pour
obtenir les équations du mouvement, rapportées aux variables indépendantes.
Correspondance sur l’École Royale Polytechnique. Vol. III, 159–162.
Rojo, A. G. 2005. Hamilton’s Principle: Why Is the Integrated Difference of the Kinetic
and Potential Energy Minimized? Am. J. Phys., 73: 831–836.
Runge, C. 1919. Vektoranalysis. Vol. 1.
Sabra, A. I. 1981. Theories of Light from Descartes to Newton. Cambridge University
Press.
Sarton, G. 1932. Discovery of Conical Refraction by William Rowan Hamilton and
Humphrey Lloyd (1833). Isis, 17: 154–170.
Schopenhauer, A. 1969. The World as Will and Representation. Vol. 1. Dover Publications.
Schrödinger, E. 1926a. Quantisation as a Problem of Proper Values (Part I). Annalen der
Physik, 79: 361–376.
Schrödinger, E. 1926b. Quantisation as a Problem of Proper Values (Part II). Annalen der
Physik, 79: 489–527.
Schrödinger, E. 1926c. Quantisation as a Problem of Proper Values (Part III). Annalen der
Physik, 80: 437–490.
Schrödinger, E. 1926d. Quantisation as a Problem of Proper Values (Part IV). Annalen der
Physik, 81: 109–139.
Schrödinger, E. 1928. Collected Papers on Wave Mechanics. Blackie & Son Limited,
London.
Schwartz, M. 1972. Principles of Electrodynamics. Dover Publications.
Schwarzschild, K. 1903. Zur Elektrodynamik. 1. Zwei Formen des Prinzips der kleinsten
Wirkung in der Elektronentheorie. Königliche Gesellschaft der Wissenschaften und
der Georg August Universität zu Göttingen. 126–131. Available at
http://gdz.sub.uni-goettingen.de/.
Schwarzschild, K. 1916. On the Gravitational Field of a Point-Mass, According to
Einstein’s Theory. Sitzungsberichte der Königlich Preußischen Akademie der Wis-
senschaften, 49: 189–196.
Serway, Raymond A. and Jewett, John W. 2013. Physics for Scientists and Engineers.
Ninth edn. Cengage Learning, 1071–1072.
Sklar, L. 2013. Philosophy and the Foundations of Mechanics. Cambridge University
Press.
Soldner, H. J. von. 1801. On the Deflection of a Light Ray from its Motion along a Straight
Line through the Attraction of a Celestial Body Which Passes Nearby. Translated
from Soldner, H. J., Astronomisches Jahrbuch für das Jahr 1804, 161–172, in Jaki, S.
L. 1978. Johann Georg von Soldner and the Gravitational Bending of Light, with an
English Translation of His Essay on It Published in 1801. Foundations of Physics, 8:
Nos. 11/12, 927–950.
Sommerfeld, A. and Runge, I. 1911. Anwendung der Vektorrechnung auf die Grundlagen
der geometrischen Optik. Annalen der Physik, 4th ser. 35: 289–293.
Sommerfeld, A. 1911a. Das Plancksche Wirkungsquantum und seine allgemeine
Bedeutung für die Molekularphysik. Physikalische Zeitschrift, 12: 1057–1069.
Sommerfeld, A. 1911b. Application de la théorie de l’élément d’action aux phénomènes
moléculaires non périodiques. In La théorie du rayonnement et les quanta. Rapports
et discussions de la réunion tenue à Bruxelles, du 30 octobre au 3 novembre 1911,
sous les auspices de M. E. Solvay. Pub. par MM. P. Langevin et M. de Broglie. Ulan
Press, 2012, 313–392.
Sommerfeld, A. 1916. Zur Theorie des Zeeman-Effekts der Wasserstofflinien, mit einem
Anhang über den Stark-Effekt. Phys. Zs., 17: 491–507.
Sommerfeld, A. 1950. Mechanics of Deformable Bodies: Lectures on Theoretical Physics.
Vol. 2. Academic Press.
Sommerfeld, A. 1952. Mechanics. Lectures on Theoretical Physics. Vol. 1. Academic
Press.
Smith, G. E. 2006. The vis viva dispute: A controversy at the dawn of dynamics. Phys.
Today, October, 31–36.
Steiner, J. 1842. Sur le maximum et le minimum des figures dans le plan, sur la sphère et
dans l’espace en général. Second mémoire. J. Reine Angew. Math. 24: 189–250.
Stokes, G. G. 1862. Report on double refraction. Report of the British Association for the
Advancement of Science. Reprinted in Mathematical and Physical Papers, by the Late
George Gabriel Stokes. Vol. 4. Cambridge University Press (1904). 127–202.
Stöltzner, M. 2003. The Principle of Least Action as the Logical Empiricist’s Shibboleth.
Studies in History and Philosophy of Modern Physics, 34: 285–318.
Stuewer, R. 1970. Non-Einsteinian Interpretations of the Photoelectric Effect. Minnesota
Studies in the Philosophy of Science. Edited by R. Stuewer. Vol. 5, 246–263.
Synge, J. L. 1937. Geometrical Optics: An Introduction to Hamilton’s Method. Cambridge
University Press, vii.
Synge, J. L. 1945. The Life and Early Work of Sir William Rowan Hamilton. In a collection
of papers in memory of Sir William Rowan Hamilton, Scripta Mathematica Studies,
13–24.
Sussmann, H. J. and Willems, J. C. 1997. 300 Years of Optimal Control: From the
Brachistochrone to the Maximum Principle. IEEE Control Systems Magazine, 17:
32–44.
Taylor, E. F. and Wheeler, J. A. 1999. Spacetime Physics: Introduction to Special
Relativity. Second edn. W. H. Freeman.
Taylor, E. F. and Wheeler, J. A. 2000. Exploring Black Holes. Addison Wesley, 5.
Terrall, M. 2002. The Man Who Flattened the Earth: Maupertuis and the Sciences in the
Enlightenment. University of Chicago Press.
Thomson, William (Lord Kelvin). 1890. On a Mechanism for the Constitution of Ether.
Proceedings Royal Society of Edinburgh, 17: 122–132.
Thucydides, 431 BC. The History of the Peloponnesian War. Available at
http://classics.mit.edu/Thucydides/pelopwar.html
Tisserand, F. 1899. Traité de mécanique céleste, 95–97.
Todhunter, I. 1861/2005. A History of the Progress of the Calculus of Variations During the
Nineteenth Century. Cambridge University Press.
Truesdell, C. 1960. Rational Mechanics of Flexible or Elastic Bodies. 1638–1788 –
Introduction to Work of Euler. Springer.
Truesdell, C. 1960b. A Program toward Rediscovering the Rational Mechanics of the Age
of Reason. Archive for History of Exact Sciences, 1: 3–36.
Truesdell, C. 1968. Whence the Law of Moment of Momentum? Chapter V in Truesdell C.,
1968. Essays in the History of Mechanics. Springer-Verlag.
Varignon, P. 1735. Nouvelle Mecanique ou Statique. Vol. 2.
Vierkandt, A. 1892. Über gleitende und rollende Bewegung. Monatshefte der Math. und
Phys. 3: 31–54.
Virgil. 19 BC. The Aeneid. Book I. Translated by Robert Fitzgerald. New York: Random
House, 1981.
Visser, T. D. and Wolf, E. 2010. The origin of the Gouy phase anomaly and its
generalization to astigmatic wavefields. Optics Communications, 283: 3371–3375.
Voltaire, 1753. Diatribe du docteur Akakia, médecin du Pape: Decret de l’inquisition.
Rome. Available in Google books.
Vollgraff, J.A. 1915. Christiaan Huygens (1629–1695) et Jean le Rond d’Alembert (1715–
1783). Janus, 20: 269–313.
Weierstrass, K. 1927. Mathematische Werke. Vol. 7. Mayer & Müller.
Weyl, H. 1919. Eine neue Erweiterung der Relativitätstheorie. Ann. der Physik, 59: 101–
133.
Whiteside, D. T. 1974. The Mathematical Papers of Isaac Newton. Vol. 6. Cambridge
University Press.
Whittaker, E. T. 1988. A Treatise on the Analytical Dynamics of Particles and Rigid Bod-
ies. Fourth Edition, Cambridge University Press; first edn. 1904, fourth edn., 1937,
reprinted by Dover 1944 and Cambridge University Press 1988.
Young, A. T. 2012. An introduction to mirages. Available at
http://mintaka.sdsu.edu/GF/mirages/mirintro.html.
Yourgrau, W. and Mandelstam, S. 1968. Variational Principles in Dynamics and Quantum
Theory. W.H. Saunders and Co., first published 1955.
Yurkina, M. I. 1985. Sur l’histoire de la notion du potentiel. Journal of Geodesy, 59 (2):
150–166.
Index
action, 89
action of the forces, 89
adiabatic hypothesis, 200
adiabatic invariants, 199
Airy, 119
angle characteristic, 118

Balmer’s formula, 193
Bernoulli, 51, 79, 80
Bernoulli’s challenge, 51
Berry’s phase, 128
birefringent crystals, 119
Bohr, 191
Bohr’s trilogy, 192
Boltzmann, 189
Born, 209
brachistochrone, 51
Brillouin, 204

canonical equations of motion, 138
caustics, 158
center of oscillation, 82
characteristic function, 112
circular orbits, 43
compound pendulum, 83
conical cusp, 124
conical refraction, 119
conjugate momentum, 137
contact transformation, 209
covariance, 95, 135
critical, 112
curvature of space-time, 186
cycloid, 51

d’Alembert, 79
d’Alembert’s principle, 85
De Broglie, 203
Dirac, 209
Dirac cones, 128
dissipation, 101
dissipative systems, 100

eccentric anomaly, 133
eccentricity, 132
Ehrenfest, 199
eikonal, 206
eikonal equation, 118
Einstein, 162, 189
Einstein’s field equations, 181
elliptical orbits, 44, 45
energy-momentum, 169
energy-momentum tensor, 186
ether: elasticity, 107

Fermat’s method of maxima and minima, 21
Fermat’s minimum principle, 18
Fermat’s principle, 142
Feynman, 211
fine structure, 198
focus, 149
Fresnel, 119
Fresnel’s equations, 119

Galileo, 14, 51
gauge symmetry, 96
Gauss’s principle, 105
General Relativity, 173
geometrical phase, 128
gravitation, 173

Hamilton, 112
Hamilton’s principle, 91
Hamilton-Jacobi, 112, 172
Hamilton-Jacobi equation, 140
Hamiltonian, 137
Heisenberg, 209
Hero of Alexandria, 13, 116
Hilbert’s least action principle, 186
Horologium Oscillatorium, 52
Huygens, 24, 51
Huygens’ principle, 142
Huygens’ principle in optics and quantum mechanics, 214
hydrogen, 198
hypocycloid, 71

internal conical refraction, 126
intersecting chord theorem, 46
isochronous pendulum, 51
isoperimetric problem, 6

Jacobi, Carl Gustav, 112

Kepler’s laws, 39
ket and bra, 209
kinetic focus, 149

Lagrange’s dynamics, 90
Lagrange’s scientific poem, 93
Lagrangian view of quantum mechanics, 209
Lamb’s model, 101
Laplace-Runge-Lenz vector, 99
law of reflection, 113
law of varying action, 128, 129
least constraint, 105
Leibniz, 51, 61
living force, 128
Lloyd, H., 126
Lorentz, 163, 199

MacCullagh, 107
Malus’s theorem, 113
matter waves, 203
Maupertuis, 59
Maxwell’s equations, 107, 166
metaphysical mechanics, 59
Michelson and Morley, 163
mixed characteristic, 118

Newton, 27, 38
Newton’s Laws, 38
Nicholson, 194
Noether, 96
nonholonomic systems, 100, 103
normal congruence, 113
normal slowness, 116, 122
normal velocity, 120

onde fictive, 204
optical-mechanical Analogy, 51

Pappus, 7
parabolic orbit, 131
path integrals, 211
perihelion of Mercury, 181, 182
phase harmony, 204
phase velocity, 120
Planck, 168, 189, 190
Planck’s constant, 189
planetary orbits, 62
point characteristic, 117
Poisson, 218
principal function, 112
Principia, 38
principle of equivalence, 173
proper length, 170
proper time, 170

quantum contact transformation, 210
quantum theory of electrodynamics, 211
Queen Dido, 6

rectangular system of rays, 113
reflection, law of, 13
relativistic particle, 172
reversible system, 101
Ricci tensor, 187
Riemann, 186
Riemann tensor, 186

Schrödinger, 205
Simultaneity, 163
Snell’s law, 24
solid of least resistance, 27
Solvay Conference, 199
Sommerfeld, 190, 195
stationary, 112
string analogy, 72
supplement to an essay on rays, 118
swiftest descent, 14
symmetries, 96
synchronization, 163
Synge, 112
system of rays, 112, 118
System of the World, 38

tachystoptota, 51
terrestrial brachistochrone, 71
Theon, 7
theory of rays, 113
Thucydides, 7

virtual work, 79
vis viva, 90
Voltaire, 61

wave mechanics, 205
wave surface, 120
Weyl, 96

Zenodorus, 7
