Structure and Interpretation
of Classical Mechanics
This book was set by the authors using the LaTeX typesetting system and was printed and bound in the United States of America.
This book is dedicated,
in respect and admiration,
to
Contents vii
Preface xiii
Acknowledgments xvii
1 Lagrangian Mechanics 1
1.1 The Principle of Stationary Action 4
1.2 Configuration Spaces 9
1.3 Generalized Coordinates 11
1.4 Computing Actions 16
1.5 The Euler-Lagrange Equations 26
1.5.1 Derivation of the Lagrange Equations 27
1.5.2 Computing Lagrange’s Equations 34
1.6 How to Find Lagrangians 37
1.6.1 Coordinate Transformations 44
1.6.2 Systems with Rigid Constraints 48
1.6.3 Constraints as Coordinate Transformations 60
1.6.4 The Lagrangian is Not Unique 62
1.7 Evolution of Dynamical State 67
1.8 Conserved Quantities 76
1.8.1 Conserved Momenta 76
1.8.2 Energy Conservation 78
1.8.3 Central Forces in Three Dimensions 81
1.8.4 Noether’s Theorem 84
1.9 Abstraction of Path Functions 88
1.10 Constrained Motion 93
1.10.1 Coordinate Constraints 95
1
In his book on mathematical pedagogy [15], Hans Freudenthal argues that the reliance on ambiguous, unstated notational conventions in such expressions as f(x) and df(x)/dx makes mathematics, and especially introductory calculus, extremely confusing for beginning students; and he enjoins mathematics educators to use more formal modern notation.
2
In his beautiful book Calculus on Manifolds (1965), Michael Spivak uses
functional notation. On p.44 he discusses some of the problems with classical
notation. We excerpt a particularly juicy quote:
The mere statement of [the chain rule] in classical notation requires the introduction of irrelevant letters. The usual evaluation for D₁(f ∘ (g, h)) runs as follows:

If f(u, v) is a function and u = g(x, y) and v = h(x, y), then

  ∂f(g(x, y), h(x, y))/∂x = (∂f(u, v)/∂u)(∂u/∂x) + (∂f(u, v)/∂v)(∂v/∂x)

[The symbol ∂u/∂x means ∂/∂x g(x, y), and ∂/∂u f(u, v) means D₁f(u, v) = D₁f(g(x, y), h(x, y)).] This equation is often written simply

  ∂f/∂x = (∂f/∂u)(∂u/∂x) + (∂f/∂v)(∂v/∂x).
Note that f means something different on the two sides of the equation!
3
This is presented here without explanation, to give the flavor of the notation.
The text gives a full explanation.
4
“It is necessary to use the apparatus of partial derivatives, in which even the notation is ambiguous.” From V. I. Arnold, Mathematical Methods of Classical Mechanics (1980), Section 47, p. 258. See also the footnote on that page.
1
A stationary point of a function is a point where the function’s value does not
vary as the input is varied. Local maxima or minima are stationary points.
2
The variational formulation successfully describes all of the Newtonian mechanics of particles and rigid bodies. The variational formulation has also been usefully applied in the description of many other systems such as classical electrodynamics, the dynamics of inviscid fluids, and the design of mechanisms such as four-bar linkages. In addition, modern formulations of quantum mechanics and quantum field theory build on many of the same concepts. However, the variational formulation does not appear to apply to all dynamical systems. For example, there is no simple prescription to apply the variational apparatus to systems with dissipation, though in special cases variational methods still apply.
3
Experience with systems on an atomic scale suggests that at this scale systems
do not travel along well-defined configuration paths. To describe the evolution
of systems on the atomic scale we employ quantum mechanics. Here, we
restrict attention to systems for which the motion is well described by a smooth
configuration path.
4
Extrapolation of the orbit of the Moon backward in time cannot determine
the point at which the Moon was placed on this trajectory. To determine
the origin of the Moon we must supplement dynamical evidence with other
physical evidence such as chemical compositions.
5
We suspect that this argument can be promoted to a precise constraint on
the possible ways of making this path-distinguishing function.
6
Historically, Huygens was the first to use the term “action” in mechanics. He
used the term to refer to “the effect of a motion.” This is an idea that came
from the Greeks. In his manuscript “Dynamica” (1690) Leibnitz enunciated a
“Least Action Principle” using the “harmless action,” which was the product
of mass, velocity, and the distance of the motion. Leibnitz also spoke of a
“violent action” in the case where things collided.
7
The definite integral of a real-valued function f of a real argument is written ∫_a^b f. This can also be written ∫_a^b f(x) dx. The first notation emphasizes that a function is being integrated.
8
Traditionally, square brackets are put around functional arguments. In this
case, the square brackets remind us that the value of S may depend on the
function γ in complicated ways, such as through its derivatives.
9
In the case of a real-valued function the value of the function and its derivatives at some point can be used to construct a power series. For sufficiently nice functions (real analytic) the power series constructed in this way converges in some interval containing the point. Not all functions can be locally represented in this way. For example, the function f(x) = exp(−1/x²), with f(0) = 0, is zero and has all derivatives zero at x = 0, but this infinite number of derivatives is insufficient to determine the function value at any other point.
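This classic counterexample is easy to probe numerically. The following is a small Python sketch (Python rather than the book's Scheme, purely for a standalone check); the function and values are exactly those of the footnote:

```python
import math

def f(x):
    # f(x) = exp(-1/x^2), with f(0) = 0: infinitely differentiable,
    # and every derivative vanishes at x = 0.
    return 0.0 if x == 0 else math.exp(-1.0 / x ** 2)

# The Taylor series of f at 0 is identically zero, so it cannot
# determine f's value anywhere else: f is flat at 0 but not zero.
# Near the origin f is astronomically small (f(0.1) = exp(-100)),
# yet f(1) = exp(-1) is of order one.
```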
10
Here ◦ denotes composition of functions: (f ◦g)(t) = f (g(t)). In our notation
the application of a path-dependent function to its path is of higher precedence
than the composition, so L ◦ T [γ] = L ◦ (T [γ]).
11
The derivative Dγ of a configuration path γ can be defined in terms of ordinary derivatives by specifying how it acts on sufficiently smooth real-valued functions f of configurations. The exact definition is unimportant at this stage. If you are curious see footnote 23.
12
We will later discover that an initial segment of the local tuple will be
sufficient to determine the future evolution of the system. That a configuration
and a finite number of derivatives determines the future means that there is
a way of determining all of the rest of the derivatives of the path from the
initial segment.
13
The classical Lagrangian plays a fundamental role in the path-integral formulation of quantum mechanics (due to Dirac and Feynman), where the complex exponential of the classical action yields the relative probability amplitude for a path. The Lagrangian is the starting point for the Hamiltonian formulation of mechanics (discussed in chapter 3), which is also essential in the Schrödinger and Heisenberg formulations of quantum mechanics and in the Boltzmann–Gibbs approach to statistical mechanics.
14
The principle is often called the “Principle of Least Action” because its
initial formulations spoke in terms of the action being minimized rather than
the more general case of taking on a stationary value. The term “Principle of
Least Action” is also commonly used to refer to a result, due to Maupertuis,
Euler, and Lagrange, which says that free particles move along paths for which
the integral of the kinetic energy is minimized among all paths with the given
endpoints. Correspondingly, the term “action” is sometimes used to refer
specifically to the integral of the kinetic energy. (Actually, Euler and Lagrange
used the vis viva, or twice the kinetic energy.)
15
Other ways of stating the principle of stationary action make it sound teleological and mysterious. For instance, one could imagine that the system considers all possible paths from its initial configuration to its final configuration and then chooses the one with the smallest action. Indeed, the underlying vision of a purposeful, economical, and rational universe played no small part in the philosophical considerations that accompanied the initial development of
mechanics. The earliest action principle that remains part of modern physics is Fermat's Principle, which states that the path traveled by a light ray between two points is the path that takes the least amount of time. Fermat formulated this principle around 1660 and used it to derive the laws of reflection and refraction. Motivated by this, the French mathematician and astronomer Pierre-Louis Moreau de Maupertuis enunciated the Principle of Least Action as a grand unifying principle in physics. In his Essai de cosmologie (1750) Maupertuis appealed to this principle of “economy in nature” as evidence of the existence of God, asserting that it demonstrated “God's intention to regulate physical phenomena by a general principle of the highest perfection.” For a historical perspective of Maupertuis's, Euler's, and Lagrange's roles in the formulation of the principle of least action, see Jourdain [25].
16
For reflection the angle of incidence is equal to the angle of reflection. Refraction is described by Snell's law: when light passes from one medium to another, the ratio of the sines of the angles made to the normal to the interface is the inverse of the ratio of the refractive indices of the media. The refractive index is the ratio of the speed of light in a vacuum to the speed of light in the medium.
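Snell's law, n₁ sin θ₁ = n₂ sin θ₂, is a one-liner to apply. Here is a hedged Python sketch (not the book's code; the numbers are an illustrative example, not from the text):

```python
import math

def snell(theta1, n1, n2):
    # Angle of refraction theta2 from Snell's law:
    # n1 sin(theta1) = n2 sin(theta2)
    return math.asin(n1 * math.sin(theta1) / n2)

# Light entering glass (n2 = 1.5) from vacuum (n1 = 1.0) at 30 degrees
# bends toward the normal, as the law predicts:
theta2 = snell(math.radians(30.0), 1.0, 1.5)
```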
17
We often refer to a point particle with mass but no internal structure as a
point mass.
18
Strictly speaking the dimension of the configuration space and the number of degrees of freedom are not the same. The number of degrees of freedom is the dimension of the space of configurations that are “locally accessible.” For systems with integrable constraints the two are the same. For systems with non-integrable constraints the configuration dimension can be larger than the number of degrees of freedom. For further explanation see the discussion of systems with non-integrable constraints below (section 1.10.3). Apart from that discussion, all of the systems we will consider have integrable constraints (they are “holonomic”). This is why we have chosen to blur the distinction between the number of degrees of freedom and the dimension of the configuration space.
19
A tuple of functions that all have the same domain is itself a function on
that domain: Given a point in the domain the value of the tuple of functions
is a tuple of the values of the component functions at that point.
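The idea is mechanical enough to show in a few lines of Python (a sketch, not the book's Scmutils up-tuples):

```python
import math

def tuple_of_functions(*fs):
    # A tuple of functions with a common domain is itself a function on
    # that domain: its value at a point is the tuple of the values of
    # the component functions at that point.
    return lambda x: tuple(f(x) for f in fs)

# e.g. a coordinate path built from three component functions:
q = tuple_of_functions(math.sin, math.cos, lambda t: t ** 2)
# q(0.0) is the tuple (sin 0, cos 0, 0^2) = (0.0, 1.0, 0.0)
```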
20
The use of superscripts to index the coordinate components is traditional, even though there is potential confusion, say, with exponents. We use zero-based indexing.
21
More precisely, the generalized coordinates identify open subsets of the configuration space with open subsets of Rⁿ. It may require more than one set of generalized coordinates to cover the entire configuration space. For example, if the configuration space is a two-dimensional sphere, we could have one set of coordinates that maps (a little more than) the northern hemisphere to a disk, and another set that maps (a little more than) the southern hemisphere to a disk, with a strip near the equator common to both coordinate systems. A space that can be locally parametrized by smooth coordinate functions is called a differentiable manifold. The theory of differentiable manifolds can be used to formulate a coordinate-free treatment of variational mechanics. An introduction to mechanics from this perspective can be found in [2] or [5].
22
The derivative of a function f is a function. It is denoted Df . Our notational
convention is that D is a high-precedence operator. Thus D operates on the
adjacent function before any other application occurs: Df (x) is the same as
(Df )(x).
23
The formal definition of Dγ is unimportant to the discussion, but if you really want to know, here is one way to do it:
First, we define the derivative Dγ of a configuration path γ in terms of ordinary derivatives by specifying how it acts on sufficiently smooth real-valued functions f of configurations: (Dⁿγ)(t)(f) = Dⁿ(f ∘ γ)(t). Then we define χ̄(a, b, c, d, ...) = (a, χ(b), c(χ), d(χ), ...). With this definition:

  χ̄(t, γ(t), Dγ(t), D²γ(t), ...) = (t, χ(γ(t)), Dγ(t)(χ), D²γ(t)(χ), ...)
                                  = (t, χ ∘ γ(t), D(χ ∘ γ)(t), D²(χ ∘ γ)(t), ...)
                                  = (t, q(t), Dq(t), D²q(t), ...).
The action is

  S[γ](t1, t2) = ∫_t1^t2 L ∘ T[γ].   (1.8)

then²⁵
24
The coordinate function χ is locally invertible, and so is χ̄.
25
L ∘ T[γ] = L ∘ χ̄⁻¹ ∘ χ̄ ∘ T[γ] = Lχ ∘ Γ[χ ∘ γ] = Lχ ∘ Γ[q].
26
Here we are making a function definition. A definition specifies the value
of the function for arbitrarily chosen formal parameters. One may change
the name of a formal parameter, so long as the new name does not conflict
with any other symbol in the definition. For example, the following definition
specifies exactly the same free-particle Lagrangian:
L(a, b, c) = ½ m (c · c).
27
The Lagrangian is formally a function of the local tuple, but any particular
Lagrangian only depends on a finite initial segment of the local tuple. We
define functions of local tuples by explicitly declaring names for the elements
of the initial segment of the local tuple that includes the elements upon which
the function depends.
28
We represent the local tuple as a composite data structure, the components of which are the time, the generalized coordinates, the generalized velocities, and possibly higher derivatives. We do not want to be bothered by the details of packing and unpacking the components into these structures, so we provide utilities for doing this. The constructor ->local takes the time, the coordinates, and the velocities and returns a data structure representing a local tuple. The selectors time, coordinate, and velocity extract the appropriate pieces from the local structure. They are implemented as time = (component 0), coordinate = (component 1), and velocity = (component 2).
29
Be careful. The x in the definition of q is not the same as the x that was used
as a formal parameter in the definition of the free-particle Lagrangian above.
There are only so many letters in the alphabet, so we are forced to reuse them.
We will be careful to indicate where symbols are given new meanings.
30
A tuple of coordinate or velocity components is made with the procedure
up. Component i of the tuple q is (ref q i). All indexing is zero based. The
word up is to remind us that in mathematical notation these components are
indexed by superscripts. There are also down tuples of components that are
indexed by subscripts. See the appendix on notation.
31
In our system, arithmetic operators are generic over symbols and expressions as well as numeric values, so arithmetic procedures can work uniformly with numbers or expressions. For example, if we have the procedure (define (cube x) (* x x x)) we can obtain its value for a number (cube 2) => 8 or for a literal symbol (cube 'a) => (* a a a).
32
Derivatives of functions yield functions. For example, ((D cube) 2) => 12
and ((D cube) ’a) => (* 3 (expt a 2)).
½ m (Dx(t))² + ½ m (Dy(t))² + ½ m (Dz(t))²
33
The display is generated with TeX.
34
For very complicated expressions the prefix notation of Scheme is often better, but simplification is almost always useful. We can separate the functions of simplification and infix display. We will see examples of this later.
35
Scmutils includes a variety of numerical integration procedures. The examples in this section were computed by rational-function extrapolation of Euler–MacLaurin formulas with a relative error tolerance of 10⁻¹⁰.
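Scmutils' actual quadrature extrapolates Euler–MacLaurin formulas with rational functions; as a simpler sketch of the same idea (accelerating a low-order rule by extrapolating in the step size), here is Romberg integration in Python:

```python
def romberg(f, a, b, n=8):
    # Romberg integration: Richardson extrapolation of the trapezoid
    # rule. Scmutils instead extrapolates Euler-MacLaurin formulas with
    # rational functions, but the spirit is the same.
    R = [[0.0] * (n + 1) for _ in range(n + 1)]
    R[0][0] = 0.5 * (b - a) * (f(a) + f(b))
    for i in range(1, n + 1):
        h = (b - a) / 2 ** i
        # refine the trapezoid estimate by adding the new midpoints
        s = sum(f(a + (2 * k - 1) * h) for k in range(1, 2 ** (i - 1) + 1))
        R[i][0] = 0.5 * R[i - 1][0] + h * s
        for j in range(1, i + 1):
            # extrapolate away the h^(2j) error term
            R[i][j] = R[i][j - 1] + (R[i][j - 1] - R[i - 1][j - 1]) / (4 ** j - 1)
    return R[n][n]
```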
m (xb − xa)² / (2 (tb − ta)).
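This closed form is easy to confirm: along the uniform-velocity path the Lagrangian ½mv² is constant, so the action is just that constant times the time interval. A Python sketch (the sample numbers are arbitrary, not from the text):

```python
def free_particle_action(m, xa, xb, ta, tb):
    # Closed form for the uniform-velocity path:
    # S = m (xb - xa)^2 / (2 (tb - ta))
    return m * (xb - xa) ** 2 / (2.0 * (tb - ta))

# Direct integration of L = (1/2) m v^2 along the straight path:
# v is constant, so the integral is just L * (tb - ta).
m, xa, xb, ta, tb = 3.0, 1.0, 7.0, 0.0, 10.0
v = (xb - xa) / (tb - ta)
S_integral = 0.5 * m * v ** 2 * (tb - ta)
# S_integral agrees with the closed form.
```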
36
Surely for a real physical situation we would have to specify units for these
quantities. In this illustration we do not give units.
37
Here we use decimal numerals to specify the parameters. This forces the
representations to be floating point, which is efficient for numerical calculation.
If symbolic algebra is to be done it is essential that the numbers be exact
integers or rational fractions, so that expressions can be reliably reduced to
lowest terms. Such numbers are specified without a decimal point.
38
The squared magnitude of the velocity is v⃗ · v⃗, the vector dot product of the velocity with itself. The square of a structure of components is defined to be the sum of the squares of the individual components, so we write simply v² = v · v.
We can use this to compute the action for a free particle over a path varied from the given path, as a function of ε:⁴⁰
(define ((varied-free-particle-action mass q nu t1 t2) epsilon)
(let ((eta (make-eta nu t1 t2)))
(Lagrangian-action (L-free-particle mass)
(+ q (* epsilon eta))
t1
t2)))
The action for the varied path, with ν(t) = (sin t, cos t, t²), and ε = 0.001 is, as expected, larger than for the test path:
((varied-free-particle-action 3.0 test-path
(up sin cos square)
0.0 10.0)
0.001)
436.29121428571153
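The same experiment can be sketched in Python rather than the book's Scheme: approximate the action by a midpoint-rule quadrature and check that a varied free-particle path has larger action than the uniform-velocity path. The one-dimensional path and the variation sin t (which vanishes at both endpoints) are made up for this illustration:

```python
import math

def action(L, q, dq, t1, t2, n=1000):
    # Midpoint-rule approximation to S[q](t1, t2), the integral of
    # L(t, q(t), Dq(t)) dt; dq is the analytic derivative of q.
    h = (t2 - t1) / n
    s = 0.0
    for k in range(n):
        t = t1 + (k + 0.5) * h
        s += L(t, q(t), dq(t)) * h
    return s

m = 3.0
L_free = lambda t, x, v: 0.5 * m * v * v

# A uniform-velocity path, and the same path varied by eps*sin(t);
# the variation vanishes at the endpoints t = 0 and t = pi.
eps = 0.01
q0, dq0 = (lambda t: 4.0 * t + 7.0), (lambda t: 4.0)
q1 = lambda t: 4.0 * t + 7.0 + eps * math.sin(t)
dq1 = lambda t: 4.0 + eps * math.cos(t)

S0 = action(L_free, q0, dq0, 0.0, math.pi)
S1 = action(L_free, q1, dq1, 0.0, math.pi)
# S1 exceeds S0: varying the path increases the free-particle action.
```

The increase is second order in ε, as the stationarity of the action predicts: it is ½m ε² ∫₀^π cos² t dt.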
39
Note that we are doing arithmetic on functions. We extend the arithmetic operations so that the combination of two functions of the same type (same domains and ranges) is the function on the same domain that combines the values of the argument functions in the range. For example, if f and g are functions of t, then fg is the function t ↦ f(t)g(t). A constant multiple of a function is the function whose value is the constant times the value of the function for each argument: cf is the function t ↦ cf(t).
40
Note that we are adding procedures. Paralleling our extension of arithmetic operations to functions, arithmetic operations are extended to compatible procedures.
41
The arguments to minimize are a procedure implementing the univariate function in question, and the lower and upper bounds of the region to be searched. Scmutils includes a choice of methods for numerical minimization; the one used here is Brent's algorithm, with an error tolerance of 10⁻⁵. The value returned by minimize is a list of 3 numbers: the first is the argument at which the minimum occurred, the second is the minimum obtained, and the third is the number of iterations of the minimization algorithm required to obtain the minimum.
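As a hedged Python sketch of such a bracketing minimizer, here is golden-section search, a simpler relative of Brent's method (Brent adds a parabolic-interpolation speedup); the return value mirrors the three-element list described above:

```python
import math

def minimize(f, a, b, tol=1e-5):
    # Golden-section search: repeatedly shrink the bracket [a, b]
    # around the minimum, keeping the golden-ratio spacing so only
    # one new function evaluation is needed per iteration (here we
    # re-evaluate for clarity). Returns (argmin, minimum, iterations).
    gr = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - gr * (b - a)
    d = a + gr * (b - a)
    n = 0
    while abs(b - a) > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - gr * (b - a)
        else:
            a, c = c, d
            d = a + gr * (b - a)
        n += 1
    x = 0.5 * (a + b)
    return (x, f(x), n)

# minimizing (u - 2)^2 on [0, 5] locates the minimum near u = 2:
x, fx, n = minimize(lambda u: (u - 2.0) ** 2, 0.0, 5.0)
```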
42
Yes, -1.5987211554602254e-14 is zero for the tolerance required of the min-
imizer. And the 435.0000000000237 is arguably the same as 435 obtained
before.
43
There are lots of good ways to make such a parametric set of approximating
trajectories. One could use splines or higher-order interpolating polynomials;
one could use Chebyshev polynomials; one could use Fourier components. The
choice depends upon the kinds of trajectories one wants to approximate.
44
Here is one way to implement make-path:
(define (make-path t0 q0 t1 q1 qs)
(let ((n (length qs)))
(let ((ts (linear-interpolants t0 t1 n)))
(Lagrange-interpolation-function
(append (list q0) qs (list q1))
(append (list t0) ts (list t1))))))
The procedure linear-interpolants produces a list of elements that linearly
interpolate the first two arguments. We use this procedure here to specify ts,
the n evenly spaced intermediate times between t0 and t1 at which the path
will be specified. The parameters being adjusted, qs, are the positions at these
intermediate times. The procedure Lagrange-interpolation-function takes
a list of values and a list of times and produces a procedure that computes
the Lagrange interpolation polynomial that goes through these points.
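The same construction can be sketched in Python (the Scmutils procedures are real; these Python counterparts are illustrative stand-ins with the same shapes):

```python
def lagrange_interpolation_function(ys, xs):
    # Returns the polynomial through the points (xs[i], ys[i]),
    # evaluated with the Lagrange interpolation formula.
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

def linear_interpolants(t0, t1, n):
    # n evenly spaced times strictly between t0 and t1
    h = (t1 - t0) / (n + 1)
    return [t0 + (k + 1) * h for k in range(n)]

def make_path(t0, q0, t1, q1, qs):
    # A path through (t0, q0) and (t1, q1), with adjustable positions
    # qs at the evenly spaced intermediate times.
    ts = linear_interpolants(t0, t1, len(qs))
    return lagrange_interpolation_function([q0] + qs + [q1],
                                           [t0] + ts + [t1])

# Interior points taken on the line q(t) = t + 1 reproduce that line:
path = make_path(0.0, 1.0, 3.0, 4.0, [2.0, 3.0])
```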
45
The minimizer used here is the Nelder-Mead downhill simplex method. As
usual with numerical procedures, the interface to the nelder-mead procedure
is complex, with lots of optional parameters to allow the user to control errors
effectively. For this presentation we have specialized nelder-mead by wrapping
it in the more palatable multidimensional-minimize. Unfortunately, you will
have to learn to live with complicated numerical procedures someday.
L(t, q, v) = ½mv² − ½kq²,   (1.16)
46
Don’t worry. We know that you don’t yet know why this is the right Lagrangian. We will get to this in section 1.6.
47
By convention, named constants have names that begin with a colon. The constants named :pi and :-pi are what we would expect from their names.
[Plot residue: vertical axis from −0.0002 to +0.0002; horizontal axis from 0 to π/2.]
48
This result was initially discovered by Euler and later rederived by Lagrange.
49
The derivative or partial derivative of a function that takes structured arguments is a new function that takes the same number and type of arguments. The range of this new function is itself a structure with the same number of components as the argument with respect to which the function is differentiated.
50
Lagrange’s equations are traditionally written in the form

  d/dt (∂L/∂q̇) − ∂L/∂q = 0,

or, if we write a separate equation for each component of q, as

  d/dt (∂L/∂q̇ⁱ) − ∂L/∂qⁱ = 0,   i = 0, ..., n − 1.
In this way of writing Lagrange’s equations the notation does not distinguish
between L, which is a real-valued function of three variables (t, q, q̇), and L ◦
Γ[q], which is a real-valued function of one real variable t. If we do not realize
this notational pun, the equations don’t make sense as written—∂L/∂ q̇ is a
function of three variables, so we must regard the arguments q, q̇ as functions
of t before taking d/dt of the expression. Similarly, ∂L/∂q is a function of
three variables, which we must view as a function of t before setting it equal
to d/dt(∂L/∂ q̇). These implicit applications of the chain rule pose no problem
in performing hand computations—once you understand what the equations
represent.
51
The variation operator δη is like the derivative operator in that it acts on
the immediately following function: δη f [q] = (δη f )[q].
δη S[q](t1, t2) = 0.   (1.29)
This follows from the fact that variation commutes with integration.
which follows from equations (1.20) and (1.21), and using the chain rule for variations (1.26) we get⁵²

  δη S[q](t1, t2) = ∫_t1^t2 (DL ∘ Γ[q]) δη Γ[q]
                  = ∫_t1^t2 ((∂₁L ∘ Γ[q]) η + (∂₂L ∘ Γ[q]) Dη).   (1.32)
52
A function of multiple arguments is considered a function of a tuple of its
arguments. Thus, the derivative of a function of multiple arguments is a
tuple of the partial derivatives of that function with respect to each of the
arguments. So in the case of a Lagrangian L
DL(t, q, v) = [∂₀L(t, q, v), ∂₁L(t, q, v), ∂₂L(t, q, v)].
Then
So
53
To make this argument more precise requires careful analysis.
and
54
When we write a definition that names the components of the local tuple, we
indicate that these are grouped into time, position, and velocity components
by separating the groups with semicolons.
55
The derivative with respect to a tuple is a tuple of the partial derivatives
with respect to each component of the tuple (see the appendix on notation).
So
  ∂₁L ∘ Γ[q](t) = [ −μx(t) / ((x(t))² + (y(t))²)^(3/2),  −μy(t) / ((x(t))² + (y(t))²)^(3/2) ],
  ∂₂L ∘ Γ[q](t) = [ mDx(t), mDy(t) ]   (1.46)

and

  D(∂₂L ∘ Γ[q])(t) = [ mD²x(t), mD²y(t) ].   (1.47)
56
The symbol θ̇ is just a mnemonic symbol; the dot over the θ is not intended to indicate differentiation. To define L we could just as well have written: L(a, b, c) = ½ml²c² + mgl cos b. However, we use a dotted symbol to remind us that the argument matching a formal parameter, such as θ̇, is a rate of change of an angle, such as θ.
57
In traditional notation these equations read
  d²/dt² (∂L/∂q̈) − d/dt (∂L/∂q̇) + ∂L/∂q = 0.
58
The Lagrange-equations procedure uses the operations (partial 1) and
(partial 2), which implement the partial derivative operators with respect
to the second and third argument positions (those with indices 1 and 2).
(print-expression
(((Lagrange-equations (L-free-particle ’m))
test-path)
’t))
(down 0 0 0)
That the residuals are zero indicates that the test-path satisfies
the Lagrange equations.59
Instead of checking the equations for an individual path in
three-dimensional space, we can also apply the Lagrange-equations
procedure to an arbitrary function:60
(show-expression
(((Lagrange-equations (L-free-particle ’m))
(literal-function ’x))
’t))
(* (((expt D 2) x) t) m)
59
There is a Lagrange equation for every degree of freedom. The residuals of
all the equations are zero if the path is realizable. The residuals are arranged
in a down tuple because they result from derivatives of the Lagrangian with
respect to argument slots that take up tuples. See the appendix on notation.
60
Observe that the second derivative is indicated as the square of the derivative
operator (expt D 2). Arithmetic operations in Scmutils extend over operators
as well as functions.
mD²x(t)
(show-expression
(((Lagrange-equations (L-harmonic ’m ’k))
proposed-solution)
’t))
  (k − mω²) a cos(ωt + ϕ)
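The displayed residual (k − mω²) a cos(ωt + ϕ) vanishes for all t exactly when ω² = k/m. A quick Python check (not the book's code; the sample parameter values are arbitrary):

```python
import math

def harmonic_residual(m, k, a, omega, phi, t):
    # Euler-Lagrange residual m x''(t) + k x(t) for the proposed
    # solution x(t) = a cos(omega t + phi); analytically this equals
    # (k - m omega^2) a cos(omega t + phi).
    x = a * math.cos(omega * t + phi)
    xdd = -a * omega ** 2 * math.cos(omega * t + phi)
    return m * xdd + k * x

m, k, a, phi, t = 2.0, 8.0, 0.5, 0.3, 1.7
r_good = harmonic_residual(m, k, a, math.sqrt(k / m), phi, t)  # zero
r_bad = harmonic_residual(m, k, a, 1.0, phi, t)   # nonzero otherwise
```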
Exercise 1.11:
Compute Lagrange’s equations for the Lagrangians in exercise 1.9 using
the Lagrange-equations procedure. Additionally, use the computer to
perform each of the steps in the Lagrange-equations procedure and
show the intermediate results. Relate these steps to the ones you showed
in the hand derivation of exercise 1.9.
Exercise 1.12:
a. Write a procedure to compute the Lagrange equations for Lagrangians
that depend upon acceleration, as in exercise 1.10.
b. Use your procedure to compute the Lagrange equations for the Lagrangian
where V(t, x(t)) = V(t; x₀(t), ..., x_{N−1}(t)) and ∂_{1,α}V(t, x(t)) is the tuple of the components of the derivative of V with respect to the coordinates of the particle with index α, evaluated at time t and coordinates x(t). These conditions are satisfied if for every a_α and b_α

  ∂₂L(t; a₀, ..., a_{N−1}; b₀, ..., b_{N−1}) = [m₀b₀, ..., m_{N−1}b_{N−1}]   (1.56)

and

  ∂₁L(t; a₀, ..., a_{N−1}; b₀, ..., b_{N−1}) = [−∂_{1,0}V(t, a), ..., −∂_{1,N−1}V(t, a)],   (1.57)
61
Remember that x and v are just formal parameters of the Lagrangian. This
x is not the path x used earlier in the derivation, though it could be the value
of that path at a particular time.
62
We can always give a function extra arguments that are not used so that it
can be algebraically combined with other functions of the same shape.
63
Hamilton formulated the fundamental variational principle for time-independent systems in 1834–1835. Jacobi gave this principle the name “Hamilton's principle.” For systems subject to generic, nonstationary constraints Hamilton's principle was investigated in 1848 by Ostrogradsky. In the Russian literature Hamilton's principle is often called the Hamilton–Ostrogradsky principle.
William Rowan Hamilton (1805–1865) was a brilliant 19th-century mathematician. His early work on geometric optics (based on Fermat's principle) was so impressive that he was elected to the post of Professor of Astronomy at Trinity College and Royal Astronomer of Ireland while he was still an undergraduate. He produced two monumental works of 19th-century mathematics. His discovery of quaternions revitalized abstract algebra and sparked the development of vector techniques in physics. His 1835 memoir “On a General Method in Dynamics” put variational mechanics on a firm footing, finally giving substance to Maupertuis's vaguely stated Principle of Least Action of 100 years before. Hamilton also wrote poetry and carried on an extensive correspondence with Wordsworth, who advised him to put his energy into writing mathematics rather than poetry.
(show-expression
(((Lagrange-equations
(L-uniform-acceleration ’m ’g))
(up (literal-function ’x)
(literal-function ’y)))
’t))
  [ mD²x(t),  gm + mD²y(t) ]
As a procedure:
(define ((L-central-rectangular m U) local)
(let ((q (coordinate local))
(v (velocity local)))
(- (* 1/2 m (square v))
(U (sqrt (square q))))))
  mD²x(t) + DU(√((x(t))² + (y(t))²)) x(t) / √((x(t))² + (y(t))²)

  mD²y(t) + DU(√((x(t))² + (y(t))²)) y(t) / √((x(t))² + (y(t))²)
  x = r cos ϕ
  y = r sin ϕ.   (1.62)

  Dx̃(t) = Dr̃(t) cos ϕ̃(t) − r̃(t) Dϕ̃(t) sin ϕ̃(t)
  Dỹ(t) = Dr̃(t) sin ϕ̃(t) + r̃(t) Dϕ̃(t) cos ϕ̃(t).   (1.64)
These relations are valid for any configuration path at any moment, so we can abstract them to relations among coordinate representations of an arbitrary velocity. Let vx and vy be the rectangular components of the velocity, and let ṙ and ϕ̇ be the rates of change of r and ϕ. Then
  mD²r(t) − mr(t)(Dϕ(t))² + DU(r(t))
Exercise 1.13:
Check that the Lagrange equations for central force motion in polar
coordinates and the Lagrange equations in rectangular coordinates are
equivalent. Determine the relationship among the second derivatives
by substituting paths into the transformation equations and computing
derivatives, then substitute these relations into the equations of motion.
64
We will talk much more about angular momentum later.
L′ = L ∘ C.   (1.70)

  (t, x, v, ...) = C(t, x′, v′, ...)
                 = (t, F(t, x′), ∂₀F(t, x′) + ∂₁F(t, x′)v′, ...).   (1.74)

L′ = L ∘ C   (1.75)
Exercise 1.14:
Show by direct calculation that the Lagrange equations for L′ are satisfied if the Lagrange equations for L are satisfied.
65
As described in footnote 28 the procedure ->local constructs a local tuple
from an initial segment of time, coordinates, and velocities.
In terms of the polar coordinates and the rates of change of the polar coordinates, the rates of change of the rectangular components are:
(show-expression
(velocity
((F->C p->r)
(->local ’t (up ’r ’phi) (up ’rdot ’phidot)))))
à !
−ϕ̇r sin (ϕ) + ṙ cos (ϕ)
We can use F->C to find the Lagrangian for central force motion in
polar coordinates from the Lagrangian in rectangular components,
using equation (1.70),
(define (L-central-polar m U)
(compose (L-central-rectangular m U) (F->C p->r)))
(show-expression
((L-central-polar ’m (literal-function ’U))
(->local ’t (up ’r ’phi) (up ’rdot ’phidot))))
  ½ m ϕ̇² r² + ½ m ṙ² − U(r)
Lagrangians analytically, then check the results with the computer by generalizing the programs that we have presented.
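One such check can be sketched directly in Python (a stand-in for the book's F->C composition, not its actual code): the polar Lagrangian obtained above must agree with the rectangular Lagrangian composed with the coordinate transformation at every state.

```python
import math

def L_rect(m, U, local):
    # L = (1/2) m (v . v) - U(|q|) in rectangular coordinates;
    # local = (t, (x, y), (vx, vy)), mimicking the book's local tuples.
    t, (x, y), (vx, vy) = local
    return 0.5 * m * (vx ** 2 + vy ** 2) - U(math.hypot(x, y))

def L_polar(m, U, local):
    # (1/2) m phidot^2 r^2 + (1/2) m rdot^2 - U(r), as derived above.
    t, (r, phi), (rdot, phidot) = local
    return 0.5 * m * (rdot ** 2 + r ** 2 * phidot ** 2) - U(r)

def p_to_r(local):
    # The transformation x = r cos(phi), y = r sin(phi) together with
    # its induced velocity transformation (equation 1.64).
    t, (r, phi), (rdot, phidot) = local
    x, y = r * math.cos(phi), r * math.sin(phi)
    vx = rdot * math.cos(phi) - r * phidot * math.sin(phi)
    vy = rdot * math.sin(phi) + r * phidot * math.cos(phi)
    return (t, (x, y), (vx, vy))

# Sample state (arbitrary values): the two Lagrangians agree on
# corresponding states, i.e. L_polar = L_rect composed with p_to_r.
m, U = 2.0, (lambda r: 1.0 / r)
local = (0.0, (1.5, 0.7), (0.2, -0.3))
```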
66
See section 1.6.1.
[Figure: a pendulum of length l and mass m, with angle θ measured from the vertical; the support moves vertically with height ys(t); g is the acceleration of gravity; x and y are the rectangular axes.]
A Lagrangian is L = T − V .
The Lagrangian is expressed as
(define ((T-pend m l g ys) local)
(let ((t (time local))
(theta (coordinate local))
(thetadot (velocity local)))
(let ((vys (D ys)))
(* 1/2 m
(+ (square (* l thetadot))
(square (vys t))
(* 2 l (vys t) thetadot (sin theta)))))))
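The kinetic energy in T-pend can be verified against first principles. A Python sketch (the sign convention y = ys(t) − l cos θ, with the bob hanging below the moving support, is an assumption consistent with the figure):

```python
import math

def T_pend(m, l, ys_dot, local):
    # Kinetic energy of the driven pendulum, as in the T-pend
    # procedure; local = (t, theta, thetadot), ys_dot = D(ys).
    t, theta, thetadot = local
    vys = ys_dot(t)
    return 0.5 * m * ((l * thetadot) ** 2 + vys ** 2
                      + 2.0 * l * vys * thetadot * math.sin(theta))

# Direct check: with x = l sin(theta), y = ys(t) - l cos(theta),
# the rectangular velocities are vx = l thetadot cos(theta) and
# vy = ys'(t) + l thetadot sin(theta), so T = (1/2) m (vx^2 + vy^2).
m, l = 1.3, 0.8
ys_dot = lambda t: 0.5 * math.cos(t)     # arbitrary drive
t, theta, thetadot = 0.4, 0.9, -1.1      # arbitrary state
vx = l * thetadot * math.cos(theta)
vy = ys_dot(t) + l * thetadot * math.sin(theta)
T_direct = 0.5 * m * (vx ** 2 + vy ** 2)
# T_direct agrees with T_pend: the cross terms expand to
# 2 l ys'(t) thetadot sin(theta), exactly the term in the procedure.
```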
Exercise 1.16:
Derive the Lagrangians in exercise 1.9.
67
We hope you appreciate the TeX magic here. A symbol with an underline character is converted by show-expression to a subscript. Symbols with carets, the names of Greek letters, and symbols terminating in the characters “dot” are similarly mistreated.
[Figures: exercise systems with masses m1, m2, m3, lengths l1, l2, an angle θ, a horizontal coordinate x, and gravity g.]
Why it works
In this section we show that L = T − V is in fact a suitable
Lagrangian for rigidly constrained systems. We do this by requir-
{β|β↔α}
68
We will simply accept the Newtonian procedure for systems with rigid constraints and find Lagrangians that are equivalent. Of course, actual bodies are never truly rigid, so we may wonder what detailed approximations have to be made to treat them as truly rigid. For instance, a more satisfying approach would be to replace the rigid distance constraints by very stiff springs. We could then immediately write the Lagrangian as L = T − V, and we should be able to derive the Newtonian procedure for systems with rigid constraints as an approximation. However, this is too complicated to do at this stage, so we accept the Newtonian idealization.
where Fαβ (t) is the scalar magnitude of the tension in the constraint at time t. Note that F⃗αβ = −F⃗βα. In general, the scalar constraint forces change as the system evolves.
Formally, we can reproduce Newton's equations with the Lagrangian69

L(t; x, F; ẋ, Ḟ) = Σα ½ mα ẋα² − V (t, x)
        − Σ{α,β | α<β, α↔β} [Fαβ /(2lαβ)] [(xβ − xα)² − lαβ²].   (1.89)
69
This Lagrangian is purely formal and does not represent a model of the
constraint forces. In particular, note that the constraint terms do not look
like a potential of constraint with a minimum when the constraint is exactly
satisfied. Rather, the constraint terms in the Lagrangian are zero when the
constraint is satisfied, and can be either positive or negative depending on
whether the distance between the particles is larger or smaller than the con-
straint distance.
we find
70
Typically the number of components of x is equal to the sum of the number of components of q and c; adding a strut removes a degree of freedom and adds a distance constraint. However, there are singular cases in which the addition of a single strut can remove more than a single degree of freedom. We do not consider the singular cases here.
71
Consider a function g of, say, three arguments, and let g0 be a function of two
arguments satisfying g0 (x, y) = g(x, y, 0). Then (∂0 g0 )(x, y) = (∂0 g)(x, y, 0).
The substitution of a value in an argument commutes with the taking of
the partial derivative with respect to a different argument. In deriving the
Lagrange equations for q we can set c = l and ċ = 0 in the Lagrangian, but we
cannot do this in deriving the Lagrange equations associated with c, because
we have to take derivatives with respect to those arguments.
D(mα Dxα )(t) = −∂1,α V (t, x(t)) + λ(t)∂1,α ϕ(t, x(t)). (1.100)
[Figure: two particles, m0 at (x0, y0) and m1 at (x1, y1), connected by a rigid rod of length l oriented at angle θ.]
such that Lagrange’s equations will yield the Newton’s equations that
you derived in part a.
c. Make a change of coordinates to a coordinate system with center of
mass coordinates xcm , ycm , angle θ, distance between the particles c, and
tension force F . Write the Lagrangian in these coordinates, and write
the Lagrange equations.
d. You may deduce from one of these equations that c(t) = l. From this fact we get that Dc = 0 and D²c = 0. Substitute these into the Lagrange equations you just computed to get the equations of motion for xcm, ycm, and θ.
e. Make a Lagrangian (= T − V ) for the system described with the irre-
dundant generalized coordinates xcm , ycm , θ and compute the Lagrange
equations from this Lagrangian. They should be the same equations as
you derived for the same coordinates from part d.
Lf (t; x0, . . . , xN−1; v0, . . . , vN−1)
        = Σα ½ mα vα² − V (t; x0, . . . , xN−1; v0, . . . , vN−1)   (1.104)
L = Lf ◦ C. (1.106)
The Lagrangian is
(show-expression
 ((L-pend ’m ’l ’g (literal-function ’ys))
  (->local ’t ’theta ’thetadot)))

glm cos (θ) − gm ys (t) + ½ l²m θ̇² + lm θ̇ Dys (t) sin (θ) + ½ m (Dys (t))²
Dt F (t, q, v, a, . . .) = ∂0 F (t, q, v, a, . . .)
+ ∂1 F (t, q, v, a, . . .) v
+ ∂2 F (t, q, v, a, . . .) a + · · · , (1.114)
72
Components of a tuple structure, such as the value of Γ[q](t), can be selected with selector functions: Ii gets the element with index i from the tuple.
L′ = L + Dt F. (1.115)
Dt F = ∂0 F + ∂1 F Q̇. (1.118)
Show explicitly that the Lagrange equations for Dt F are identically zero,
and thus that the addition of Dt F to a Lagrangian does not affect the
Lagrange equations.
The difference ∆L = L′ − L is
∂0 G1 = ∂0 ∂1 F
∂1 G0 = ∂1 ∂0 F. (1.123)
∂ 0 G1 = ∂ 1 G0 . (1.124)
Furthermore, G1 = ∂1 F , so
∂1 G1 = ∂1 ∂1 F. (1.125)
Note that we have not shown that these conditions are sufficient
for determining that a function is a total time derivative, only that
they are necessary.
73
For example, the Lipschitz condition is that the rate of change of the derivative is bounded by a constant in an open set around each point of the trajectory. See [22] for a good treatment of the Lipschitz condition.
D2 q = A ◦ Γ[q], (1.126)
74
If the coordinates are redundant we cannot, in general, solve for the highest-order derivative. However, since we can transform to irredundant coordinates, solve the initial-value problem in the irredundant coordinates, and construct the redundant coordinates from the irredundant coordinates, we can in general solve the initial-value problem for redundant coordinates. The only hitch is that we may not specify arbitrary initial conditions: the initial conditions must be consistent with the constraints.
1.7 Evolution of Dynamical State 69
∂1 L ◦ Γ[q]
    = ∂0 ∂2 L ◦ Γ[q] + (∂1 ∂2 L ◦ Γ[q]) Dq + (∂2 ∂2 L ◦ Γ[q]) D²q.

Solving for the highest-order derivative, we find

D²q = [∂2 ∂2 L ◦ Γ[q]]⁻¹ [∂1 L ◦ Γ[q] − (∂1 ∂2 L ◦ Γ[q]) Dq − ∂0 ∂2 L ◦ Γ[q]].
75
In Scmutils division by a matrix is interpreted as multiplication on the left
by the inverse matrix.
the state (t, q(t), Dq(t)) at the moment t the derivative of the state
is (1, Dq(t), D2 q(t)) = (1, Dq(t), A(t, q(t), Dq(t))). The procedure
Lagrangian->state-derivative takes a Lagrangian and returns
a procedure that takes a state and returns the derivative of the
state:
(define (Lagrangian->state-derivative L)
(let ((acceleration (Lagrangian->acceleration L)))
(lambda (state)
(up 1
(velocity state)
(acceleration state)))))
(print-expression
 ((harmonic-state-derivative ’m ’k)
  (up ’t (up ’x ’y) (up ’v_x ’v_y))))
(up 1 (up v_x v_y) (up (/ (* -1 k x) m) (/ (* -1 k y) m)))
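In plain Python the same state derivative is a short function over nested tuples; this sketch (not Scmutils; names ours) mirrors harmonic-state-derivative for the two-dimensional harmonic oscillator:

```python
def harmonic_state_derivative(m, k):
    # state = (t, (x, y), (vx, vy)); the derivative is (1, velocities, accelerations)
    def dstate(state):
        t, (x, y), (vx, vy) = state
        return (1.0, (vx, vy), (-k * x / m, -k * y / m))
    return dstate

d = harmonic_state_derivative(2.0, 3.0)
assert d((0.0, (1.0, 0.5), (2.0, -1.0))) == (1.0, (2.0, -1.0), (-1.5, -0.75))
```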
(define ((qv->state-path q v) t)
(up t (q t) (v t)))
(show-expression
 (((Lagrange-equations-first-order (L-harmonic ’m ’k))
   (up (literal-function ’x)
       (literal-function ’y))
   (up (literal-function ’v_x)
       (literal-function ’v_y)))
  ’t))

⎛ 0 ⎞
⎜ Dx (t) − vx (t) ⎟
⎜ Dy (t) − vy (t) ⎟
⎜ Dvx (t) + k x (t)/m ⎟
⎝ Dvy (t) + k y (t)/m ⎠
The zero in the first element of the structure of the Lagrange-equation residuals is just the tautology that time advances uniformly: the time function is just the identity, so its derivative is 1
and the residual is zero. The equations in the second element
constrain the velocity path to be the derivative of the coordinate
path. The equations in the third element give the rate of change
of the velocity in terms of the applied forces.
Numerical integration
A set of first-order ordinary differential equations that give the
state derivative in terms of the state can be integrated to find the
state path that emanates from a given initial state. Numerical
integrators find approximate solutions of such differential equa-
tions by a process illustrated in figure 1.6. The state derivative
produced by Lagrangian->state-derivative can be used by a
package that numerically integrates systems of first-order ordinary
differential equations.
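As a stand-in for such an integrator, here is a classical fourth-order Runge-Kutta step in plain Python (not the Scmutils interface; all names ours), advancing the flat state (t, x, v) of a one-dimensional harmonic oscillator through one full period:

```python
import math

def rk4_step(f, y, dt):
    # one classical Runge-Kutta step for y' = f(y), y a flat list of floats
    k1 = f(y)
    k2 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
    k3 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
    k4 = f([yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt * (a + 2 * b + 2 * c + d) / 6
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def harmonic(m, k):
    # flat state (t, x, v) for a one-dimensional harmonic oscillator
    return lambda s: [1.0, s[2], -k * s[1] / m]

f = harmonic(1.0, 1.0)
n = 6283
dt = 2.0 * math.pi / n                 # exactly one period in n steps
state = [0.0, 1.0, 0.0]
for _ in range(n):
    state = rk4_step(f, state, dt)
# after one period the oscillator returns to its initial state
assert abs(state[1] - 1.0) < 1e-9 and abs(state[2]) < 1e-9
```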
The procedure state-advancer can be used to find the state of
a system at a specified time, given an initial state, which includes
the initial time, and a parametric state-derivative procedure.76
76
The Scmutils system provides a stable of numerical integration routines
that can be accessed through this interface. These include quality-controlled
Runge-Kutta (QCRK4) and Bulirsch-Stoer. The default integration method
is Bulirsch-Stoer.
Figure 1.6 The input to the system derivative is the state. The func-
tion A gives the acceleration as a function of the components that de-
termine the state. The output of the system derivative is the derivative
of the state. The integrator takes the derivative of the state as its in-
put and produces the integrated state, starting at the initial conditions.
Notice how the second-order system is put into first-order form by the
routing of the Dq(t) components in the system derivative.
77
The procedure state-advancer automatically compiles state-derivative procedures the first time they are encountered. The first time a new state derivative is used there is a delay while compilation occurs.
(show-expression
((pend-state-derivative ’m ’l ’g ’a ’omega)
(up ’t ’theta ’thetadot)))
⎛ 1 ⎞
⎜ θ̇ ⎟
⎝ (aω² cos (ωt) sin (θ))/l − (g sin (θ))/l ⎠
((evolve pend-state-derivative
1.0 ;m=1kg
1.0 ;l=1m
9.8 ;g=9.8m/s2
0.1 ;a=1/10 m
(* 2.0 (sqrt 9.8)) ) ;omega
(up 0.0 ;t0 =0
1. ;theta0 =1 radian
0.) ;thetadot0 =0 radians/s
(monitor-theta plot-win)
0.01 ;step between plotted points
100.0 ;final time
1.0e-13) ;local error tolerance
Figure 1.7 shows the angle θ versus time for a couple of orbits for
the driven pendulum. The initial conditions for the two runs are
the same except that in one the bob is given a tiny velocity equal to
10−10 m/s, about one atom width per second. The initial segments
78
The results are plotted in a plot-window that is created by the procedure frame with arguments xmin, xmax, ymin, ymax, which specify the limits of the plotting area. Points are added to the plot with the procedure plot-point, which takes a plot-window and the abscissa and ordinate of the point to be plotted.
The procedure principal-value is used to reduce an angle to a standard
interval. The argument to principal-value is the point at which the circle is
to be cut. Thus (principal-value :pi) is a procedure that reduces an angle
θ to the interval −π ≤ θ < π.
[Figure 1.7: two plots of θ (from −π to +π) versus time (0 to 100) for the two driven-pendulum runs.]
79
In the older literature conserved quantities are sometimes called first inte-
grals.
1.8.1 Conserved Momenta 77
So we see that p = P ◦ Γ[q], with components

pi = Pi ◦ Γ[q]. (1.135)
The momentum path is well defined for any path q. If the path is
realizable and the Lagrangian does not depend on qⁱ, then pi is a
constant function
Dpi = 0. (1.136)
80
The derivative of a component is equal to the component of the derivative.
81
Observe that we indicate a component of the generalized momentum with
a subscript, and indicate a component of the generalized coordinates with a
superscript. These conventions are consistent with the ones that are commonly
used in tensor algebra, which is sometimes helpful in working out complex
problems.
82
In general, conserved quantities in a physical system are associated with
continuous symmetries, whether or not one can find a coordinate system in
which the symmetry is apparent. This powerful notion was formalized and a
theorem linking conservation laws with symmetries was proved by E. Noether
early in the 20th century. See section 1.8.4 on Noether’s theorem.
1.8.2 Energy Conservation 79
…constant of the motion, the energy, if the Lagrangian L(t, q, q̇) does
not depend explicitly on the time: ∂0 L = 0.
Consider the time derivative of the Lagrangian along a solution
path q:
Isolating ∂0 L and combining the first two terms on the right side
E = P Q̇ − L, (1.140)
83
The sign of the energy state function is a matter of convention.
E = P Q̇ − L = P Q̇ − T + V. (1.144)
E = 2T − T + V = T + V. (1.145)
84
Euler’s theorem says that if f is a function of x = (x0, x1, . . .) that is homogeneous of degree n in the xi, then

Σi (∂f/∂xi)(x) xi = n f (x).
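The identity E = T + V can be checked numerically without doing any algebra. This plain-Python sketch (names and the sample potential U(r) = r² are ours) forms E = Σi q̇i ∂L/∂q̇i − L for the polar central-force Lagrangian, taking the velocity partials by central differences, and compares with T + V:

```python
def L(q, qdot, m=1.3):
    # polar central-force Lagrangian with the sample potential U(r) = r^2
    (r, phi), (rdot, phidot) = q, qdot
    return 0.5 * m * (rdot**2 + (r * phidot)**2) - r**2

def energy(q, qdot, eps=1e-6):
    # E = sum_i qdot_i dL/dqdot_i - L, velocity partials by central differences
    E = -L(q, qdot)
    for i in range(2):
        up = list(qdot); dn = list(qdot)
        up[i] += eps; dn[i] -= eps
        E += qdot[i] * (L(q, tuple(up)) - L(q, tuple(dn))) / (2 * eps)
    return E

q, qdot = (1.7, 0.4), (0.2, 0.9)
T = 0.5 * 1.3 * (0.2**2 + (1.7 * 0.9)**2)   # kinetic energy at the sample point
V = 1.7**2                                   # potential energy U(r) = r^2
assert abs(energy(q, qdot) - (T + V)) < 1e-6
```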
1.8.3 Central Forces in Three Dimensions 81
Exercise 1.28:
An analogous result holds when the fα do depend explicitly on time.
a. Show that in this case the kinetic energy contains terms that are
linear in the generalized velocities.
b. Show that, by adding a total time derivative, the Lagrangian can
be written in the form L = A − B, where A is a homogeneous quadratic
form in the generalized velocities, and B is velocity independent.
c. Show, using Euler’s theorem, that the energy function is E = A + B.
An example where terms that were linear in the velocity were removed
from the Lagrangian by adding a total time derivative has already been
given: the driven pendulum.
Exercise 1.29:
A particle of mass m slides off a horizontal cylinder of radius R in a
uniform gravitational field with acceleration g. If the particle starts
close to the top with zero initial speed, with what angular velocity does
the particle leave the cylinder?
As a procedure:
(define ((T3-spherical m) state)
(let ((t (time state))
(q (coordinate state))
(qdot (velocity state)))
(let ((r (ref q 0))
(theta (ref q 1))
(phi (ref q 2))
(rdot (ref qdot 0))
(thetadot (ref qdot 1))
(phidot (ref qdot 2)))
(* 1/2 m
(+ (square rdot)
(square (* r thetadot))
(square (* r (sin theta) phidot)))))))
Let’s first look at the generalized forces (the derivatives of the La-
grangian with respect to the generalized coordinates). We com-
pute these with a partial derivative with respect to the coordinate
argument of the Lagrangian:
(show-expression
(((partial 1) (L3-central ’m (literal-function ’V)))
(up ’t
(up ’r ’theta ’phi)
(up ’rdot ’thetadot ’phidot))))
mϕ̇2 r (sin (θ))2 + mrθ̇2 − DV (r)
mϕ̇2 r2 cos (θ) sin (θ)
0
mṙ
mr2 θ̇
mr2 ϕ̇ (sin (θ))2
(show-expression
((compose (ang-mom-z ’m) (F->C s->r))
(up ’t
(up ’r ’theta ’phi)
(up ’rdot ’thetadot ’phidot))))
½ mϕ̇² r² (sin (θ))² + ½ mr² θ̇² + ½ mṙ² + V (r)
x′ = F̃(s)(t, x). (1.146)
85
Noether’s theorem is more general than we state and prove it here. We assume the transformations F̃(s) have no dependence on the generalized velocities. Properly, we should also consider velocity-dependent symmetries.
1.8.4 Noether’s Theorem 85
For s = 0 the paths q and q′ are the same, so Γ[q] = Γ[q′], and
this equation becomes
86
The total time derivative is like a derivative with respect to a real-number argument in that it does not generate structure, so it can commute with derivatives that generate structure. Be careful, though: it may not commute with some derivatives for other reasons. For example, Dt ∂1(F̃(s)) is the same as ∂1 Dt(F̃(s)), but Dt ∂2(F̃(s)) is not the same as ∂2 Dt(F̃(s)). The reason is that F̃(s) does not depend on the velocity, but Dt(F̃(s)) does.
L(t; x, y, z; vx, vy, vz)
        = ½ m (vx² + vy² + vz²) − U (√(x² + y² + z²)), (1.155)
x² + y² + z² = (x′)² + (y′)² + (z′)². (1.157)
and

DF̃(0)(t; x, y, z) = DRz(0)(x, y, z) = (y, −x, 0). (1.161)
87
The definition of the procedure Rx is
(define ((Rx angle) q)
(let ((ca (cos angle)) (sa (sin angle)))
(let ((x (ref q 0)) (y (ref q 1)) (z (ref q 2)))
(up x
(- (* ca y) (* sa z))
(+ (* sa y) (* ca z))))))
The definitions of Ry and Rz are similar.
(define Noether-integral
(let ((L (L-central-rectangular
’m (literal-function ’U))))
(* ((partial 2) L) ((D F-tilde) 0 0 0))))
(print-expression
(Noether-integral
(up ’t
(up ’x ’y ’z)
(up ’vx ’vy ’vz))))
(down (+ (* m vy z) (* -1 m vz y))
(+ (* m vz x) (* -1 m vx z))
(+ (* m vx y) (* -1 m vy x)))
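That such Noether integrals are conserved can also be seen numerically: along an orbit integrated in a central potential, the component m(x vy − y vx) stays constant. A plain-Python sketch (names ours), using a velocity-Verlet step, which preserves angular momentum for central forces, and the hypothetical potential U(r) = −1/r:

```python
def accel(x, y):
    # acceleration for the hypothetical attractive potential U(r) = -1/r
    r3 = (x * x + y * y) ** 1.5
    return -x / r3, -y / r3

m = 1.0
x, y, vx, vy = 1.0, 0.0, 0.0, 1.2            # start on a bound orbit
L0 = m * (x * vy - y * vx)                   # the conserved quantity
dt = 1e-3
ax, ay = accel(x, y)
for _ in range(5000):
    # velocity-Verlet step
    vx += 0.5 * dt * ax; vy += 0.5 * dt * ay
    x += dt * vx; y += dt * vy
    ax, ay = accel(x, y)
    vx += 0.5 * dt * ax; vy += 0.5 * dt * ay
assert abs(m * (x * vy - y * vx) - L0) < 1e-9
```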
that generates an osculating path with the given local tuple com-
ponents. So O(t, q, v, . . .)(t) = q, D(O(t, q, v, . . .))(t) = v, and in
general
f = Γ̄(f̄). (1.166)
(show-expression
((F->C p->r)
(->local ’t (up ’r ’theta) (up ’rdot ’thetadot))))
⎛ t ⎞
⎜ (r cos (θ), r sin (θ)) ⎟
⎝ (−rθ̇ sin (θ) + ṙ cos (θ), rθ̇ cos (θ) + ṙ sin (θ)) ⎠
Ē[L][q] = 0. (1.170)
E[L](t, q, v, . . .) = Γ̄(Ē[L])(t, q, v, . . .)
= D(∂2 L ◦ Γ[O(t, q, v, . . .)])
− ∂1 L ◦ Γ[O(t, q, v, . . .)]
E[L] = Dt ∂2 L − ∂1 L. (1.174)
88
Notice that Gamma has one more argument than it usually has. This argument
gives the length of the initial segment of the local tuple needed. The default
length is 3, giving components of the local tuple up to and including the
velocities.
1.10 Constrained Motion 93
δη (ϕ̄) = 0. (1.178)
89
Given any acceptable variation we may make another acceptable variation by
multiplying the given one by a bump function that emphasizes any particular
time interval.
1.10.1 Coordinate Constraints 95
Note that these are functions of time; the variation at a given time
is tangent to the constraint at that time.
∂2 ϕ ≡ 0.
(∂1 ϕ ◦ Γ) η = 0. (1.180)
That the two vectors are parallel everywhere along the path does
not guarantee that the proportionality factor is the same at each
moment along the path, so the proportionality factor λ is some
function of time, which may depend on the path under consider-
ation. These equations, with the constraint equation ϕ ◦ Γ[q] = 0,
are the governing equations. These equations are sufficient to de-
termine the path q and to eliminate the unknown function λ.
Now watch this
Suppose we form an augmented Lagrangian treating λ as one of
the coordinates
90
We take two tuple-valued functions of time to be orthogonal if at each instant
the dot product of the tuples is zero. Similarly, tuple-valued functions are
considered parallel if at each moment one of the tuples is a scalar multiple of
the other. The scalar multiplier is in general a function of time.
91
Recall that the Euler-Lagrange operator E has the property
E [F G] = F E[G] + E[F ] G + Dt F ∂2 G + ∂2 F Dt G.
[Figure 1.8: the pendulum as a constrained particle; rod of length l at angle θ from the vertical.]
The Lagrange equations are the same as those derived from the
augmented Lagrangian L0 . The difference is that now we see that
λ = Λ ◦ Γ[q] is determined by the unaugmented state. This is the
same as saying that λ can be eliminated.
Considering only the formal validity of the Lagrange equations
for the augmented Lagrangian, we could not deduce that λ could
be written as the composition of a state-dependent function Λ with
Γ[q]. The explicit Lagrange equations derived from the augmented
Lagrangian depend on the accelerations D2 q as well as λ so we
may not deduce separately that either is the composition of a
state-dependent function and Γ[q]. However, now we see that λ is
such a composition. This allows us to deduce that D2 q is also a
state-dependent function composed with the path. The evolution
of the system is determined from the dynamical state.
The pendulum using constraints
The pendulum can be formulated as the motion of a massive par-
ticle in a vertical plane subject to the constraint that the distance
to the pivot is constant (see figure 1.8).
In this formulation, the kinetic and potential energies in the
Lagrangian are those of an unconstrained particle in a uniform
x2 + y 2 − l2 = 0. (1.188)
These equations are sufficient to solve for the motion of the pen-
dulum.
It should not be surprising that these equations simplify if we
switch to “polar” coordinates
Multiplying the first by cos θ and the second by sin θ and adding,
we find
92
This constraint has the same form as the constraints used in the demonstra-
tion that L = T − V can be used for rigid systems. Here it is a particular
example of a more general set of constraints.
93
Indeed, if we had scaled the constraint equations as we did in the discussion of Newtonian constraint forces, we could have identified λ with the magnitude of the constraint force F. However, though λ will in general be related to the constraint forces, it will not be one of them. We chose to leave the scaling as it naturally appeared rather than make things turn out artificially pretty.
[Figure: a chain of two spring–mass subsystems with spring constants k1, k2, masses m1, m2, and coordinates X1, x1, X2, x2; ξ marks where the second subsystem attaches to the first.]
Let’s see how this works. The Lagrangian for the subsystem
attached to the wall is
m1 D2 x1 = −k1 x1 − λ (1.201)
m2 (D2 ξ + D2 x2 ) = −k2 x2 (1.202)
m2 (D2 ξ + D2 x2 ) = λ (1.203)
0 = ξ − (X1 + x1 ) (1.204)
m1 D2 x1 + m2 (D2 x1 + D2 x2 ) + k1 x1 = 0 (1.205)
m2 (D2 x1 + D2 x2 ) + k2 x2 = 0 (1.206)
ψ = Dt ϕ = ∂0 ϕ + ∂1 ϕQ̇. (1.208)
∂1 ϕ = ∂2 ψ, (1.211)
L′ = L + λ′ψ. (1.213)
[Figure 1.10: a hoop rolling without slipping down an inclined plane; θ is the rotation of the hoop and x its progress down the plane.]
Exercise 1.37:
Show that the augmented Lagrangian (1.213) does lead to the Lagrange
equations (1.214), taking into account the fact that ψ is a total time
derivative of ϕ.
Goldstein’s hoop
Here we consider a problem for which the constraint can be rep-
resented as a time derivative of a coordinate constraint: a hoop
of mass M rolling, without slipping, down a (one-dimensional)
inclined plane (see figure 1.10).94
We will formulate this problem in terms of the two coordinates
θ, the rotation of an arbitrary point on the hoop from an arbitrary
reference direction, and x, the linear progress down the inclined
plane. The constraint is that the hoop does not slip. Thus a
change in θ is exactly reflected in a change in x; the constraint
function is:
94
This example appears in [18], pages 49–51.
1.10.2 Derivative Constraints 105
The kinetic energy has two parts, the energy of rotation of the
hoop and the energy of the motion of its center of mass.95 The
potential energy of the hoop decreases as the height decreases.
Thus we may write the augmented Lagrangian:
M D2 x − Dλ = M g sin ϕ (1.217)
M R2 D2 θ + R Dλ = 0 (1.218)
R Dθ − Dx = 0. (1.219)
D2 x = RD2 θ. (1.220)
D²x = ½ g sin ϕ (1.221)
is just half of what it would have been if the mass had just slid
down a frictionless plane without rotating. Note that for this hoop
D2 x is independent of both M and R. We see from the Lagrange
equations that Dλ can be interpreted as the friction force involved
in enforcing the constraint. The frictional force of constraint is
Dλ = ½ M g sin ϕ. (1.222)
95
We will see in chapter 2 how to compute the kinetic energy of rotation, but for now the answer is ½ M R² θ̇².
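The elimination that leads to (1.221) can be verified mechanically. Treating D²x, D²θ, and Dλ as three unknowns in the linear system (1.217)–(1.219), the values D²x = ½ g sin ϕ, D²θ = D²x/R, and Dλ = −M D²x satisfy the three equations as printed; the sign attached to Dλ depends on the sign convention chosen for the constraint. A plain-Python check (names ours):

```python
import math

M, R, g = 2.0, 0.5, 9.8
s = math.sin(0.3)                  # sin(phi) for an arbitrary incline angle

ax = 0.5 * g * s                   # claimed D^2 x
ath = ax / R                       # D^2 theta from the constraint (1.220)
F = -M * ax                        # D(lambda) consistent with (1.218) and (1.220)

# residuals of (1.217), (1.218), (1.219 differentiated), as printed above
r1 = M * ax - F - M * g * s
r2 = M * R * R * ath + R * F
r3 = ax - R * ath
assert abs(r1) < 1e-12 and abs(r2) < 1e-12 and abs(r3) < 1e-12
```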
96
For some treatments of non-holonomic systems see, for example, Whit-
taker [43], Goldstein [18], Gantmakher [17], or Arnold et al. [6].
1.10.3 Non-Holonomic Systems 107
and
0 = E[L] ◦ Γ[q]
+ Dλ(∂2 ψ) ◦ Γ[q] + λD((∂2 ψ) ◦ Γ[q]) − λ(∂1 ψ) ◦ Γ[q]. (1.234)
ψ ◦ Γ[q] = 0. (1.235)
97
Arnold et al. [6] call the variational mechanics with the constraints added to the Lagrangian vakonomic mechanics.
1.11 Summary
1.12 Projects
Exercise 1.38: A numerical investigation
Consider a pendulum: a mass m supported on a massless rod of length
l, in a uniform gravitational field. A Lagrangian for the pendulum is:
L(t, θ, θ̇) = ½ m (lθ̇)² + mgl cos θ
For the pendulum, the period of the motion depends on the amplitude.
We wish to find trajectories of the pendulum with a given frequency.
Three methods of doing this present themselves: (1) solution by the
principle of least action, (2) numerical integration of Lagrange’s equa-
tion, and (3) analytic solution (which requires some exposure to elliptic
functions). We will carry out all three, and compare the solution trajec-
tories.
To be specific, consider the parameters m = 1 kg, l = 1 m, g = 9.8 m s⁻². The frequency of small-amplitude oscillations is ω0 = √(g/l). Let's find the non-trivial solution that has the frequency ω1 = (4/5)ω0.
a. The angle is periodic in time, so a Fourier series representation is
appropriate. We can choose the origin of time so that a zero crossing
of the angle is at time zero. Since the potential is even in the angle,
the angle is an odd function of time. Thus we need only a sine series.
Since the angle returns to zero after one-half period the angle is an odd
function of time about the midpoint. Thus only odd terms of the series
are present:
θ(t) = Σ_{n≥1} An sin((2n − 1)ω1 t).

The amplitude of the trajectory is A = θmax = Σ_{n≥1} (−1)ⁿ⁺¹ An.
Find approximations to the first few coefficients An by minimizing
the action. You will have to write a program similar to the find-path
procedure in section 1.4. Watch out: there is more than one trajectory
that minimizes the action.
b. Write a program to numerically integrate Lagrange’s equations for
the trajectories of the pendulum. The trouble with using numerical
integration to solve this problem is that we do not know how the fre-
quency of the motion depends on the initial conditions. So we have to
guess, and then gradually improve our guess. Define a function Ω(θ̇)
that numerically computes the frequency of the motion as a function of
the initial angular velocity (with θ = 0). Find the trajectory by solving
Ω(θ̇) = ω, for the initial angular velocity of the desired trajectory. Meth-
ods of solving this equation include successive bisection, minimizing the
squared residual, etc.—choose one.
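One way to implement Ω, sketched in plain Python rather than Scmutils (names and step size ours): integrate the pendulum equation from θ = 0 with a Runge-Kutta step and detect the first return of θ to zero, which occurs at half a period.

```python
import math

g, l = 9.8, 1.0

def pend_step(th, w, dt):
    # one classical Runge-Kutta step for theta'' = -(g/l) sin(theta)
    f = lambda th, w: (w, -(g / l) * math.sin(th))
    k1 = f(th, w)
    k2 = f(th + 0.5 * dt * k1[0], w + 0.5 * dt * k1[1])
    k3 = f(th + 0.5 * dt * k2[0], w + 0.5 * dt * k2[1])
    k4 = f(th + dt * k3[0], w + dt * k3[1])
    return (th + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6,
            w + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6)

def Omega(thetadot0, dt=1e-4):
    # frequency of the orbit with theta(0) = 0, Dtheta(0) = thetadot0
    th, w, t = 0.0, thetadot0, 0.0
    while True:
        th2, w2 = pend_step(th, w, dt)
        t += dt
        if th > 0.0 and th2 <= 0.0:
            # interpolate the zero crossing; it occurs at half a period
            tc = t + dt * th2 / (th - th2)
            return math.pi / tc
        th, w = th2, w2

# small amplitude: Omega approaches omega_0 = sqrt(g/l)
assert abs(Omega(0.01) - math.sqrt(g / l)) < 1e-3
```

Solving Ω(θ̇) = ω1 for the initial angular velocity can then be done by bisection on this function.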
Figure 1.11 The double pendulum is pinned in two joints so that its
members are free to move in a plane.
c. Now let’s formulate the analytic solution for the frequency as a func-
tion of amplitude. The period of the motion is simply
T = 4 ∫₀^(T/4) dt = 4 ∫₀^A (1/θ̇) dθ.
Using the energy, solve for θ̇ in terms of the amplitude A and θ to write
the required integral explicitly. This integral can be written in terms
of elliptic functions, but in a sense this does not solve the problem—we
still have to compute the elliptic functions. Let’s avoid this excursion
into elliptic functions and just do the integral numerically using the
procedure definite-integral. We still have the problem that we can
specify the amplitude A and get the frequency; to solve our problem
we need the inverse, but that can be done as in part b.
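For reference, the elliptic-function route is itself only a few lines if the complete elliptic integral is evaluated with the arithmetic-geometric mean: a standard form of the pendulum period is T = 4√(l/g) K(sin(A/2)), with K(k) = π/(2 agm(1, √(1 − k²))). A plain-Python sketch (names ours):

```python
import math

def agm(a, b):
    # arithmetic-geometric mean
    while abs(a - b) > 1e-15:
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return a

def period(A, l=1.0, g=9.8):
    # T = 4 sqrt(l/g) K(sin(A/2)), with K(k) = pi / (2 agm(1, sqrt(1 - k^2)))
    k = math.sin(0.5 * A)
    K = math.pi / (2.0 * agm(1.0, math.sqrt(1.0 - k * k)))
    return 4.0 * math.sqrt(l / g) * K

# as A -> 0 the period approaches the small-amplitude value 2 pi sqrt(l/g)
assert abs(period(1e-4) - 2.0 * math.pi * math.sqrt(1.0 / 9.8)) < 1e-8
```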
l2 = 0.9 m
m1 = 1.0 kg
m2 = 3.0 kg
1
We put a rubber band around the book so that it does not open.
114 Chapter 2 Rigid Bodies
It turns out that the kinetic energy of a rigid body can be sepa-
rated into two pieces: a kinetic energy of translation and a kinetic
energy of rotation. Let’s see how this comes about.
The configuration of a rigid body is fully specified given the
location of any point in the body and the orientation of the body.
This suggests that it would be useful to decompose the position
vectors for the constituent particles as the sum of the vector X⃗ to some reference position in the body and the vector ξ⃗α from the reference position to the particular constituent element with index α:

x⃗α = X⃗ + ξ⃗α, (2.2)

with velocities

ẋ⃗α = Ẋ⃗ + ξ̇⃗α. (2.3)
The kinetic energy is the sum of the kinetic energy of the motion
of the total mass at the center of mass
½ M Ẋ⃗ · Ẋ⃗, (2.9)
2
For an elementary geometric proof of Euler’s theorem see Whittaker [43].
with
Iij = Σα mα (êi × ξ⃗α) · (êj × ξ⃗α). (2.14)
The quantities Iij are the components of the inertia tensor with
respect to the chosen coordinate system. Note what a remarkable
form the kinetic energy has taken. All we have done is interchange
the order of summations, but now the kinetic energy is written as
a sum of products of components of the angular velocity vector,
which completely specify how the orientation of the body is chang-
ing, and the quantity Iij , which depends solely on the distribution
of mass in the body relative to the chosen coordinate system.
We will deduce a number of properties of the inertia tensor.
First, we find a somewhat simpler expression for it. The components of the vector ξ⃗α are (ξα, ηα, ζα).3 Rewriting ξ⃗α as a sum
over its components, and simplifying the elementary vector prod-
ucts of basis vectors, the components of the inertia tensor can be
arranged in the inertia matrix I, which looks like:
⎡ Σα mα (ηα² + ζα²)     −Σα mα ξα ηα          −Σα mα ξα ζα       ⎤
⎢ −Σα mα ηα ξα          Σα mα (ξα² + ζα²)     −Σα mα ηα ζα       ⎥   (2.15)
⎣ −Σα mα ζα ξα          −Σα mα ζα ηα          Σα mα (ξα² + ηα²)  ⎦
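Definition (2.14) and the matrix (2.15) can be checked against each other numerically. A plain-Python sketch (names ours) computes Iij from the cross-product definition for a pair of point masses and compares two representative entries with the closed forms in the matrix:

```python
def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

E3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

def inertia(ms, xis):
    # I_ij = sum_alpha m_alpha (e_i x xi_alpha) . (e_j x xi_alpha)   -- (2.14)
    return [[sum(m * dot(cross(E3[i], xi), cross(E3[j], xi))
                 for m, xi in zip(ms, xis))
             for j in range(3)] for i in range(3)]

ms = [1.0, 2.0]
xis = [(0.3, -0.2, 0.5), (-0.1, 0.4, 0.2)]   # (xi, eta, zeta) for each mass
I = inertia(ms, xis)
# closed forms from the matrix (2.15)
I00 = sum(m * (p[1]**2 + p[2]**2) for m, p in zip(ms, xis))
I01 = -sum(m * p[0] * p[1] for m, p in zip(ms, xis))
assert abs(I[0][0] - I00) < 1e-12 and abs(I[0][1] - I01) < 1e-12
```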
where ξα⊥ is the perpendicular distance from the line to the con-
stituent with index α. The diagonal components of the inertia
tensor Iii are recognized as the moments of inertia about the lines
coinciding with the coordinate axes êi . The off-diagonal compo-
nents of the inertia tensor are called products of inertia.
The rotational kinetic energy of a body depends on the distri-
bution of mass of the body solely through the inertia tensor. Re-
markably, the inertia tensor involves only second order moments
of the mass distribution with respect to the center of mass. We
might have expected the kinetic energy to depend in a complicated
way on all the moments of the mass distribution, interwoven in
some complicated way with the components of the angular ve-
locity vector, but this is not the case. This fact has a remarkable
consequence: for the motion of a free rigid body the detailed shape
of the body does not matter. If a book and a banana have the
same inertia tensor, that is, the same second order mass moments,
then if they are thrown in the same way the subsequent motion
will be the same, however complicated that motion is. The fact
that the book has corners and the banana has a stem does not affect
the motion except for their contributions to the inertia tensor. In
general, the potential energy of an extended body is not so simple
3
Here we avoid the more consistent notation (ξα⁰, ξα¹, ξα²) for the components of ξ⃗α because it is awkward to write expressions involving powers of the components written this way.
where M is the mass and R is the radius of Jupiter. Find the moment
of inertia of Jupiter in terms of M and R.
4
An orthogonal matrix R satisfies Rᵀ = R⁻¹; for a rotation we also have det R = 1.
5
The last equality follows from the fact that rotation preserves the dot product of two vectors: x⃗ · y⃗ = (Rx⃗) · (Ry⃗), or (R⁻¹x⃗) · y⃗ = x⃗ · (Ry⃗).
vector with respect to the rotated basis vectors ê′i are x′ = R⁻¹x, or equivalently x = Rx′. A rotation that actively rotates the basis vectors, leaving other vectors unchanged, is called a passive rotation. For a passive rotation the components of a fixed vector change as if the vector were actively rotated by the inverse rotation.
With respect to the rectangular basis êi the rotational kinetic
energy is written
½ Σij ωⁱ ωʲ Iij. (2.19)
ω = Rω′. (2.21)

However, if we had started with the basis ê′i, we would have written the kinetic energy directly as

½ (ω′)ᵀ I′ ω′, (2.23)

where the components are taken with respect to the ê′i basis. Comparing the two expressions, we see that

I′ = Rᵀ I R. (2.24)
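The transformation rule (2.24) is exactly what makes the kinetic energy basis-independent, and this can be seen numerically. A plain-Python sketch (names ours), using a rotation about ẑ and an arbitrary symmetric sample inertia matrix:

```python
import math

def Rz(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def mv(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def T_rot(I, w):
    # (1/2) w^T I w
    return 0.5 * sum(w[i] * I[i][j] * w[j] for i in range(3) for j in range(3))

I = [[2.0, 0.1, 0.0], [0.1, 3.0, 0.2], [0.0, 0.2, 4.0]]  # sample inertia matrix
R = Rz(0.8)
wp = [0.5, -1.0, 2.0]                        # omega' in the rotated basis
w = mv(R, wp)                                # omega = R omega'      (2.21)
Ip = matmul(transpose(R), matmul(I, R))      # I' = R^T I R          (2.24)
assert abs(T_rot(I, w) - T_rot(Ip, wp)) < 1e-12
```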
6
We take a 1-by-1 matrix as a number.
7
That the inertia tensor transforms in this manner could have been deduced
from its definition (2.14). However, it seems that this argument, based on the
coordinate-system independence of the kinetic energy, provides insight.
2.5 Principal Moments of Inertia 123
for i ≠ j. Let's assume that I′ is diagonal and solve for the rotation matrix R that does the job. Multiplying both sides of (2.24) on the left by R, we have

x′ = Rᵀ x. (2.29)
8
If two eigenvalues are not distinct then linear combinations of the associ-
ated eigenvectors are eigenvectors. This gives us the freedom to find linear
combinations of the eigenvectors that are orthonormal.
2.6 Representation of the Angular Velocity Vector 125
Now let’s rewrite the kinetic energy in terms of the principal mo-
ments of inertia. If we choose our rectangular coordinate system
so that it coincides with the principal axes then the calculation
is simple. Let the components of the angular velocity vector on
the principal axes be (ω a , ω b , ω c ). Then, keeping in mind that the
inertia tensor is diagonal with respect to the principal axis basis,
the kinetic energy is just
TR = ½ [A(ωᵃ)² + B(ωᵇ)² + C(ωᶜ)²]. (2.30)
Figure 2.1 The rotation M(q(t)) rotates the body from a reference
orientation in which the principal axes are aligned with the basis êi
(labeled by x, y, and z here) to the orientation specified by q(t).
Recall that the velocity results from a rotation, and that the velocities are (see equation 2.11)

Dξ⃗α(t) = ω⃗(t) × ξ⃗α(t). (2.35)
We can get the procedures of local state that give the angu-
lar velocity components by abstracting these procedures along ar-
bitrary paths that have given coordinates and velocities. The
abstraction of a procedure of a path to a procedure of state is
accomplished by Gamma-bar (see section 1.6.1):
(define (M->omega M-of-q)
(Gamma-bar
(M-of-q->omega-of-t M-of-q)))
the rotation that takes the body from some reference orientation
and rotates it to the orientation specified by the generalized coor-
dinates. Here we take the reference orientation so that principal-
axis unit vectors â, b̂, ĉ are coincident with the basis vectors êi
labeled here by x̂, ŷ, ẑ.
We define the Euler angles in terms of simple rotations about
the coordinate axes. Let Rx (ψ) be a right-handed rotation about
the x̂ axis by the angle ψ, and let Rz (ψ) be a right-handed rotation
about the ẑ axis by the angle ψ. The function M for Euler angles
is written as a composition of three of these simple coordinate axis
rotations:
Dϕ (t) sin (θ (t)) sin (ψ (t)) + cos (ψ (t)) Dθ (t)
Dϕ (t) sin (θ (t)) cos (ψ (t)) − sin (ψ (t)) Dθ (t)
cos (θ (t)) Dϕ (t) + Dψ (t)
ϕ̇ sin (ψ) sin (θ) + θ̇ cos (ψ)
ϕ̇ sin (θ) cos (ψ) − θ̇ sin (ψ)
ϕ̇ cos (θ) + ψ̇
where ~xα , ~x˙ α , and mα are the positions, velocities, and masses
of the constituent particles. It turns out that the vector angular
momentum decomposes into the sum of the angular momentum
of the center of mass and the rotational angular momentum about
the center of mass, just as the kinetic energy separates into the
kinetic energy of the center of mass and the kinetic energy of
rotation. As in the kinetic energy demonstration, decompose the
position into the vector to the center of mass X ~ and the vectors
from the center of mass to the constituent mass elements ξ~α :
x⃗α = X⃗ + ξ⃗α, (2.45)
with velocities
ẋ⃗α = Ẋ⃗ + ξ̇⃗α. (2.46)
X⃗ × (M Ẋ⃗), (2.49)
where Ijk are the components of the inertia tensor (2.14). The
angular momentum and the kinetic energy are expressed in terms
of the same inertia tensor.
With respect to the principal axis basis, the angular momentum
components have a particularly simple form:
Lᵃ = A ωᵃ (2.53)
Lᵇ = B ωᵇ (2.54)
Lᶜ = C ωᶜ. (2.55)
Exercise 2.9:
Verify that the expression (2.52) for the components of the rotational
angular momentum (2.51) in terms of the inertia tensor is correct.
134 Chapter 2 Rigid Bodies
(show-expression
(ref (((partial 2) (T-rigid-body ’A ’B ’C)) Euler-state)
1))
A ϕ̇ sin²(θ) sin²(ψ) + A θ̇ cos(ψ) sin(θ) sin(ψ)
+ B ϕ̇ cos²(ψ) sin²(θ) − B θ̇ cos(ψ) sin(θ) sin(ψ)
+ C ϕ̇ cos²(θ) + C ψ̇ cos(θ)
is the same for any choice of coordinate system. Thus the situa-
tion meets the requirements of Noether’s theorem, which tells us
that there is a conserved quantity. In particular, the family of
rotations around each coordinate axis gives us conservation of the
angular momentum component on that axis. We construct the
vector angular momentum by combining these contributions.
The following program monitors the errors in the energy and the
components of the angular momentum:
(define ((monitor-errors win A B C L0 E0) state)
(let ((t (time state))
(L ((Euler-state->L-space A B C) state))
(E ((T-rigid-body A B C) state)))
(plot-point win t (relative-error (ref L 0) (ref L0 0)))
(plot-point win t (relative-error (ref L 1) (ref L0 1)))
(plot-point win t (relative-error (ref L 2) (ref L0 2)))
(plot-point win t (relative-error E E0))))
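The Scheme monitor plots relative errors against reference values; the error computation itself is the only nontrivial part. A minimal Python analogue, assuming the conventional definition (value − reference)/reference (the plotting is omitted):

```python
# Relative error of a monitored quantity against its initial (reference)
# value, as used by monitor-errors; assumed definition, not scmutils source.

def relative_error(value, reference):
    if reference == 0:
        raise ValueError("reference must be nonzero")
    return (value - reference) / reference

# e.g. an energy that has drifted by 1 percent:
assert abs(relative_error(1.01, 1.0) - 0.01) < 1e-12
```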
Figure 2.2 The relative error in energy and in the three spatial com-
ponents of the angular momentum versus time. It is interesting to note
that the energy error is one of the three falling curves.
9. We expect that each constant of the motion reduces by one the dimension of the region of the state space explored by a trajectory, because a constant of the motion can be used to locally solve for one of the state variables in terms of the others.
is conserved.
Using the expressions (2.53 - 2.55) for the angular momentum
in terms of the components of the angular velocity vector on the
2.9.2 Qualitative Features 141
tum to stay there. These points are equilibrium points for the body
components of the angular momentum. However, these points are
not equilibrium points for the system as a whole. At these points
the body is still rotating even though the body components of the
angular momentum are not changing. This kind of equilibrium is
called a relative equilibrium. We can also see that if the angular
momentum is initially slightly displaced from one of these relative
equilibria then the angular momentum is constrained to stay near
it on one of the intersection curves. The angular momentum vec-
tor is fixed in space, so the principal axis of the equilibrium point
of the body rotates stably about the angular momentum vector.
At the principal axis with intermediate moment of inertia, the b̂
axis, the intersection curves cross. As we observed, the dynamics
of the components of the angular momentum on the principal axes
form a self-contained dynamical system. Trajectories of a dynam-
ical system cannot cross,10 so the most that can happen is that
if the equations of motion carry the system along the intersec-
tion curve then the system can only asymptotically approach the
crossing point. So without solving any equations we can deduce
that the point of crossing is another relative equilibrium. If the
angular momentum is initially aligned with the intermediate axis,
then it stays aligned. If the system is slightly displaced from the
intermediate axis, then the evolution along the intersection curve
will take the system far from the relative equilibrium. So rotation
about the axis of intermediate moment of inertia is unstable—
initial displacements of the angular momentum, however small
initially, become large. Again, the angular momentum vector is
fixed in space, but now the principal axis with the intermediate
principal moment does not stay close to the angular momentum,
so the body executes a complicated tumbling motion.
This gives some insight into the mystery of the thrown book
mentioned at the beginning of the chapter. If one throws a book
so that it is initially rotating about either the axis with the largest
or the smallest moment of inertia (the smallest and largest physi-
cal axes, respectively), the book rotates regularly about that axis.
However, if the book is thrown so that it is initially rotating about
the axis of intermediate moment of inertia (the intermediate phys-
ical axis), then the book tumbles, however carefully the book is
10. Systems of ODEs that satisfy a Lipschitz condition have unique solutions.
thrown. You can try it with this book (but put a rubber band
around it first).
Before moving on, we can make some further physical deduc-
tions. Suppose a freely rotating body is subject to some sort of
internal friction that dissipates energy, but conserves the angular
momentum. For example, real bodies flex as they spin. If the
spin axis moves with respect to the body then the flexing changes
with time, and this changing distortion converts kinetic energy
of rotation into heat. Internal processes do not change the total
angular momentum of the system. If we hold the magnitude of
the angular momentum fixed but gradually decrease the energy
then the curve of intersection on which the system moves gradu-
ally deforms. For a given angular momentum there is a lower limit
on the energy; the energy cannot be so low that there are no in-
tersections. For this lowest energy the intersection of the angular
momentum sphere and the energy ellipsoid is a pair of points on
the axis of maximum moment of inertia. With energy dissipation,
a freely rotating physical body eventually ends up with the lowest
energy consistent with the given angular momentum, which is ro-
tation about the principal axis with the largest moment of inertia
(typically the shortest physical axis).
Thus, we expect that given enough time all freely rotating phys-
ical bodies will end up rotating about the axis of largest moment of
inertia. You can demonstrate this to your satisfaction by twirling
a small bottle containing some viscous fluid, such as correction
fluid. What you will find is that, whatever spin you try to put
on the bottle, it will reorient itself so that the axis of the largest
moment of inertia is aligned with the spin axis. Remarkably, this
is very nearly true of almost every body in the solar system for
which there is enough information to decide. The deviations from
principal-axis rotation for the Earth are tiny: the angle between the angular momentum vector and the ĉ axis for the Earth is less than one arc-second.¹¹ In fact, the evidence is that all of the planets, the Moon and all of the other natural satellites, and almost
all of the asteroids rotate very nearly about the largest moment
of inertia. We have deduced that this is to be expected using
an elementary argument. There are exceptions. Comets typically
do not rotate about the largest moment. As they are heated by
11. The deviation of the angular momentum from the principal axis may be due to a number of effects: earthquakes, atmospheric tides, ... .
the sun, material spews out from localized jets, and the back reac-
tion from these jets changes the rotation state. Among the natural
satellites, the only known exception is Saturn’s satellite Hyperion,
which is tumbling chaotically. Hyperion is especially out-of-round
and subject to strong gravitational torques from Saturn.
We have all played with a top at one time or another. For the
purposes of analysis we will consider an idealized top that does
not wander around. Thus, an ideal top is a rotating rigid body,
one point of which is fixed in space. Furthermore, the center of
mass of the top is not at the fixed point, which is the center of
rotation, and there is a uniform gravitational acceleration.
For our top we can take the Lagrangian to be the difference
of the kinetic energy and the potential energy. We already know
how to write the kinetic energy—what is new here is that we must
express the potential energy in terms of the configuration. In the
case of a body in a uniform gravitational field this is easy. The
potential energy is the sum of “mgh” for all the constituent particles:
Σα mα g hα, (2.59)
where the last sum is zero because the center of mass is the origin
of ξ~α . So the potential energy of a body in a gravitational field
with uniform acceleration is very simple: it is just M gh, where M
is the total mass, and h = X⃗ · ẑ is the height of the center of mass.
2.10 Axisymmetric Tops 145
½ A ϕ̇² sin²(θ) + cos(θ) [½ C ϕ̇² cos(θ) + C ϕ̇ ψ̇] + ½ A θ̇² + ½ C ψ̇²
12. That the axisymmetric top can be solved in Euler angles is, no doubt, the reason for the traditional choice of the definition of the Euler angles. For other problems, the Euler angles may offer no particular advantage.
13. Here we do not require that C be larger than A = B, because the moments of inertia are not measured with respect to the center of mass.
where R is the distance of the center of mass from the pivot. The
Lagrangian is L = T − V . We see that the Lagrangian is indeed
independent of ψ and ϕ, as expected.
There is no particular reason to look at the Lagrange equations.
We can assign that job to the computer when needed. However, we
have already seen that it may be useful to examine the conserved
quantities associated with the symmetries.
The energy is conserved, because the Lagrangian has no ex-
plicit time dependence. Also, the energy is the sum of the kinetic
and potential energy E = T + V , because the kinetic energy is
a homogeneous quadratic form in the generalized velocities. The
energy is
E = ½ A (θ̇² + ϕ̇² sin²θ) + ½ C (ψ̇ + ϕ̇ cos θ)² + MgR cos θ. (2.63)
14. Traditionally, evaluating a definite integral is known as performing a quadrature.
Figure 2.5 The tilt angle π − θ of the top versus time. The tilt of the
top varies periodically.
Figure 2.6 The precession angle ϕ of the top versus time. The top
precesses nonuniformly—the rate of precession varies as the tilt varies.
Figure 2.7 The rate of rotation ψ̇ of the top versus time. The rate of
rotation of the top changes periodically, as the tilt of the top varies.
Figure 2.8 An idea of the actual motion of the top is obtained by
plotting the tilt angle π − θ versus the precession angle ϕ. This is a
“latitude-longitude” map showing the path of the center of mass of the
top. We see that though the top has a net precession it executes a
looping motion as it precesses.
Figure: The geometry for the potential energy of a body interacting with a distant point mass M′. The body's center of mass is at X⃗ and the point mass at x⃗, separated by distance R; a constituent mα sits at ξ⃗α from the center of mass, at distance rα from the point mass.
x⃗, and the center of mass has position X⃗. The vector from the center of mass to the constituent with index α is ξ⃗α, and has magnitude ξα. The distance rα is then given by the law of cosines
rα² = R² + ξα² − 2 ξα R cos θα,
where θα is the angle between x⃗ − X⃗ and ξ⃗α. The potential energy is then
−GM′ Σα [ mα / (R² + ξα² − 2 ξα R cos θα)^(1/2) ]. (2.68)
15. The Legendre polynomials Pₗ may be obtained by expanding (1 + y² − 2yx)^(−1/2) as a power series in y. The coefficient of yˡ is Pₗ(x). The first few Legendre polynomials are P₀(x) = 1, P₁(x) = x, P₂(x) = (3/2)x² − 1/2, and so on. The rest satisfy the recurrence relation
l Pₗ(x) = (2l − 1) x Pₗ₋₁(x) − (l − 1) Pₗ₋₂(x).
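The recurrence can be spot-checked numerically against the generating function. A plain Python sketch (not the book's Scheme; the evaluation point and the truncation order are arbitrary choices):

```python
# Verify that the Legendre recurrence l P_l = (2l-1) x P_{l-1} - (l-1) P_{l-2}
# generates the coefficients of the expansion of (1 + y^2 - 2yx)^(-1/2) in y.

def legendre(l, x):
    if l == 0:
        return 1.0
    p0, p1 = 1.0, x
    for k in range(2, l + 1):
        p0, p1 = p1, ((2*k - 1) * x * p1 - (k - 1) * p0) / k
    return p1

x, y = 0.37, 0.01        # sample point; y small so the series converges fast
series = sum(legendre(l, x) * y**l for l in range(20))
closed = (1 + y*y - 2*y*x) ** -0.5
assert abs(series - closed) < 1e-12
assert abs(legendre(2, x) - (1.5*x*x - 0.5)) < 1e-12
```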
Exercise 2.14:
a. Fill in the details that show that the sum over constituents in equation (2.72) can be expressed as written in terms of moments of inertia. In particular, show that
Σα mα ξα cos θα = 0,
Σα mα ξα² = ½ (A + B + C),
and that
Σα mα ξα² (sin θα)² = I.
16. This approximate representation of the potential energy is sometimes called MacCullagh's formula.
17. Watch out: we just reused α. It was also used as the constituent index.
I = α²A + β²B + γ²C.
Figure 2.10 The spin-orbit model problem in which the spin axis is
constrained to be perpendicular to the orbit plane has a single degree
of freedom, the orientation of the body in the orbit plane. Here the
orientation is specified by the generalized coordinate θ.
We are assuming that the orbit does not change or precess. The
orbit is an ellipse with the point mass at a focus of the ellipse. The
angle f (see figure 2.10) measures the position of the rigid body
in its orbit relative to the point in the orbit at which the two
bodies are closest.18 We assume the orbit is a fixed ellipse, so the
angle f and the distance R are periodic functions of time, with
period equal to the orbit period. With the spin axis constrained
to be perpendicular to the orbit plane, the orientation of the rigid
body is specified by a single degree of freedom: the orientation of
the body about the spin axis. We specify this orientation by the
generalized coordinate θ that measures the angle to the â principal
axis from the same line as we measure f , the line through the point
of closest approach.
Having specified the coordinate system, we can work out the
details of the kinetic and potential energies, and thus find the
Lagrangian. The kinetic energy is
T = ½ C θ̇²,
where C is the moment of inertia about the spin axis, and the angular velocity of the body about the ĉ axis is θ̇. There is no component of angular velocity on the other principal axes.
To get an explicit expression for the potential energy we must
write the direction cosines in terms of θ and f : α = cos θa =
− cos(θ − f ), β = cos θb = sin(θ − f ), and γ = cos θc = 0 because
the ĉ axis is perpendicular to the orbit plane. The potential energy
is then
−GMM′/R − ½ (GM′/R³) [(1 − 3 cos²(θ − f)) A + (1 − 3 sin²(θ − f)) B + C].
Since we are assuming that the orbit is given, we only need to
keep terms that depend on θ. Expanding the squares of the cosine
and the sine in terms of the double angles, and dropping all the
18. Traditionally, the point in the orbit at which the two bodies are closest is called the pericenter, and the angle f is called the true anomaly.
2.11.2 Rotation of the Moon and Hyperion 159
L(t, θ, θ̇) = ½ C θ̇² + (n²ε²C/4) (a³/R³(t)) cos 2(θ − f(t)). (2.79)
This is a problem with one degree of freedom with terms that vary
periodically with time.
The Lagrange equations are derived in the usual manner. The
equations are
C D²θ(t) = −(n²ε²C/2) (a³/R³(t)) sin 2(θ(t) − f(t)). (2.80)
The equation of motion is very similar to that of the periodically
driven pendulum. The main difference here is that not only is the
strength of the acceleration changing periodically, but in the spin-
orbit problem the center of attraction is also varying periodically.
We can give a physical interpretation of this equation of motion.
It states that the rate of change of the angular momentum is equal
to the applied torque. The torque on the body arises because the
19. The given potential energy differs from the actual potential energy in that non-constant terms that do not depend on θ, and consequently do not affect the evolution of θ, have been dropped.
Figure 2.11 The angle θ − f versus time for 50 orbit periods. The
ordinate scale is ±1 radian. The Moon has been kicked so that the initial
rotational angular velocity is 1.01 times the orbital frequency. The trace
with fewer wiggles was computed with zero lunar orbital eccentricity;
the other trace was computed with lunar orbital eccentricity of 0.05.
The rapid oscillations have a period equal to the lunar orbit period and are due mostly to the nonuniform motion of f.
D²ϕ = −(n²ε²/2) sin 2ϕ. (2.82)
For small deviations from synchronous rotation (small ϕ) this is
D²ϕ = −n²ε² ϕ, (2.83)
Figure 2.12 The angle θ − f versus time for 50 orbit periods. The ordinate scale is ±π radians. The out-of-roundness parameter is large, ε = 0.89, with an orbital eccentricity of e = 0.1. The system is strongly driven. The rotation is apparently chaotic.
For a free rigid body we have seen that the components of the
angular momentum on the principal axes comprise a self-contained
dynamical system: the variation of the principal axis components
depends only on the principal axis components. Here we derive
equations that govern the evolution of these components.
The starting point for the derivation is the conservation of the
vector angular momentum. The components of the angular mo-
mentum on the principal axes are
L′ = I′ ω′ (2.84)
L = M L′, (2.86)
0 = DL = DM L′ + M DL′. (2.87)
Solving, we find
DL′ = −Mᵀ DM L′. (2.88)
In terms of ω′ this is
I′ Dω′ = −Mᵀ DM I′ ω′
= −Mᵀ A(Mω′) M I′ ω′, (2.89)
20. Rotating the cross product of two vectors gives the same vector as is obtained by taking the cross product of the two rotated vectors: R(u⃗ × v⃗) = (Ru⃗) × (Rv⃗).
2.12 Euler’s Equations 165
for any vector with components v and any rotation with matrix
representation R. Using this property of A we find Euler’s equa-
tions:
I′ Dω′ = −A(ω′) I′ ω′. (2.91)
A Dωᵃ = (B − C) ωᵇ ωᶜ
B Dωᵇ = (C − A) ωᶜ ωᵃ
C Dωᶜ = (A − B) ωᵃ ωᵇ. (2.92)
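Euler's equations (2.92) are easy to integrate numerically. The sketch below uses plain Python with a hand-rolled RK4 step rather than the book's scmutils tools, and made-up moments of inertia; it starts the body spinning nearly about the intermediate axis and checks that the kinetic energy and the squared angular momentum, which Euler's equations must conserve, are indeed conserved by the numerical flow.

```python
# Free rigid body: integrate A Dw_a = (B-C) w_b w_c (and cyclic) with RK4,
# then check conservation of kinetic energy and |L|^2. Sample moments.

A, B, C = 1.0, 2.0, 3.0

def deriv(w):
    wa, wb, wc = w
    return ((B - C) * wb * wc / A,
            (C - A) * wc * wa / B,
            (A - B) * wa * wb / C)

def rk4(w, dt):
    def add(u, v, s): return tuple(ui + s * vi for ui, vi in zip(u, v))
    k1 = deriv(w)
    k2 = deriv(add(w, k1, dt / 2))
    k3 = deriv(add(w, k2, dt / 2))
    k4 = deriv(add(w, k3, dt))
    return tuple(w[i] + dt / 6 * (k1[i] + 2*k2[i] + 2*k3[i] + k4[i])
                 for i in range(3))

def energy(w): return 0.5 * (A*w[0]**2 + B*w[1]**2 + C*w[2]**2)
def L2(w):     return (A*w[0])**2 + (B*w[1])**2 + (C*w[2])**2

w = (0.1, 1.0, 0.1)          # mostly about the intermediate axis: tumbles
E0, L0 = energy(w), L2(w)
for _ in range(10000):       # integrate to t = 10
    w = rk4(w, 0.001)
assert abs(energy(w) - E0) < 1e-9
assert abs(L2(w) - L0) < 1e-9
```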
DM = M A(ω′). (2.94)
Exercise 2.15:
Fill in the details of the derivation of equation (2.96). You may want to
use the computer to help with the algebra.
21. In this equation we have a partial derivative with respect to a component of the coordinate argument of the potential energy function. The first subscript on the ∂ symbol indicates the coordinate argument. The second one selects the ϕ component.
DL = T = DM L′ + M DL′. (2.102)
T′ = M⁻¹ T. (2.104)
In terms of ω′ this is
I′ Dω′ + A(ω′) I′ ω′ = T′. (2.105)
In components,
A Dωᵃ − (B − C) ωᵇ ωᶜ = Tᵃ (2.106)
B Dωᵇ − (C − A) ωᶜ ωᵃ = Tᵇ (2.107)
C Dωᶜ − (A − B) ωᵃ ωᵇ = Tᶜ. (2.108)
Note that the torque entered only the equations for the body
angular momentum or alternately for the body angular velocity
vector. The equations that relate the derivative of the orientation
to the angular velocity vector are not modified by the torque. In a
sense, Euler’s equations contain the dynamics, and the equations
governing the orientation are kinematic. Of course, Lagrange’s
equations must be modified by the potential that gives rise to the
torques; in this sense Lagrange’s equations contain both dynamics
and kinematics.
u⃗ = M u⃗′. (2.109)
AA = S − I (2.114)
SS = S (2.115)
SA = 0 (2.116)
AS = 0. (2.117)
ω′ = W Do. (2.123)
Solving, we find
Do = W⁻¹ ω′. (2.124)
N = aI + bA + cS (2.125)
that we wish to invert. Let's guess that the inverse matrix has a similar form:
N⁻¹ = a′ I + b′ A + c′ S. (2.126)
with solution
a′ = a / (a² + b²) (2.130)
b′ = −b / (a² + b²) (2.131)
c′ = (b² − ac) / (a³ + a²c + ab² + b²c). (2.132)
We can now invert the matrix W using its representation in terms of primitive matrices to find
W⁻¹ = ½ (o sin o / (1 − cos o)) I + (o/2) A + ½ (2 − o sin o / (1 − cos o)) S. (2.133)
Note that all terms have finite limits as o → 0. There is however
a new singularity. As o → 2π two of the denominators become
singular, but there the zeros in the numerators are not strong
enough to kill the singularity. This is the expected singularity that
corresponds to the fact that at radius 2π the orientation vector
corresponds to no rotation, but nevertheless specifies a rotation
axis. This singularity is easy to avoid. Whenever the orientation
vector develops a magnitude larger than π, simply replace it by the equivalent orientation vector o⃗ − 2πô.
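The suggested fix can be sketched directly. A plain Python illustration (the threshold π and the replacement o⃗ − 2πô come from the text; the sample vector is arbitrary):

```python
from math import sqrt, pi

# When the orientation vector's magnitude exceeds pi, replace it by the
# equivalent rotation o - 2*pi*o_hat: same axis, angle reduced by 2*pi.

def wrap_orientation(o):
    mag = sqrt(sum(c * c for c in o))
    if mag > pi:
        scale = (mag - 2 * pi) / mag
        return tuple(scale * c for c in o)
    return o

o = (0.0, 0.0, 3.5)              # rotation by 3.5 > pi about the z axis
w = wrap_orientation(o)
# the wrapped angle is 3.5 - 2*pi (negative: same axis, opposite sense)
assert abs(w[2] - (3.5 - 2 * pi)) < 1e-12
```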
We can write the equations governing the evolution of the orientation as a vector equation in terms of ω⃗′ = M⁻¹ ω⃗:
Do⃗ = f(o) ω⃗′ + ½ o⃗ × ω⃗′ + g(o) o⃗ (o⃗ · ω⃗′) (2.134)
with two auxiliary functions
f(x) = ½ x sin x / (1 − cos x) (2.135)
g(x) = (1 − f(x)) / x². (2.136)
2.13 Nonsingular Generalized Coordinates 173
lim_{x→0} f(x) = 1
lim_{x→0} g(x) = 1/12. (2.137)
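These limits are easy to confirm numerically. With the definitions (2.135)-(2.136), the small-angle expansion gives f(x) = 1 − x²/12 + ..., so g(x) tends to 1/12. A plain Python check at a small sample point:

```python
from math import sin, cos

# f(x) = (1/2) x sin x / (1 - cos x) and g(x) = (1 - f(x))/x^2;
# near zero, f -> 1 and g -> 1/12.

def f(x): return 0.5 * x * sin(x) / (1 - cos(x))
def g(x): return (1 - f(x)) / (x * x)

x = 0.01
assert abs(f(x) - 1.0) < 1e-4
assert abs(g(x) - 1/12) < 1e-5
```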
22. This notation has the potential for great confusion: q is not the magnitude of the vector q⃗. Watch out!
2.14 Summary
2.15 Projects
Exercise 2.18: Free rigid body
Write and demonstrate a program that reproduces diagrams like fig-
ure 2.3. Can you find trajectories that are asymptotic to the unstable
relative equilibrium on the intermediate principal axis?
Mercury a bit to see how far off the rotation rate can be and still be
trapped in this spin-orbit resonance. If the mismatch in angular velocity
is too great, Mercury’s rotation is no longer resonantly locked to its orbit.
Set ² = 0.026 and e = 0.2.
a. Write a program for the spin-orbit problem so that this resonance dynamics can be investigated numerically. You will need to know (or, better,
show!) that f satisfies the equation
Df = n (1 − e²)^(1/2) (a/r)², (2.152)
with
a/r = (1 + e cos f) / (1 − e²). (2.153)
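Equations (2.152)-(2.153) can be checked by integrating Df over one orbit period T = 2π/n: the true anomaly should advance by exactly 2π. A plain Python sketch (sample values for n and e; the book's program would use scmutils instead):

```python
from math import cos, pi

# Integrate Df = n (1-e^2)^(1/2) (a/r)^2 with a/r = (1 + e cos f)/(1 - e^2)
# over one period T = 2*pi/n using RK4; f should advance by 2*pi.
n, e = 1.0, 0.2

def Df(f):
    return n * (1 + e * cos(f)) ** 2 / (1 - e * e) ** 1.5

f, steps = 0.0, 20000
dt = (2 * pi / n) / steps
for _ in range(steps):
    k1 = Df(f); k2 = Df(f + dt/2 * k1)
    k3 = Df(f + dt/2 * k2); k4 = Df(f + dt * k3)
    f += dt / 6 * (k1 + 2*k2 + 2*k3 + k4)
assert abs(f - 2 * pi) < 1e-8
```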
where
1. Here we restrict our attention to Lagrangians that depend only on the time, the coordinates, and the velocities.
182 Chapter 3 Hamiltonian Mechanics
then V satisfies2
Equations (3.5) and (3.6) give the rate of change of q and p along
realizable paths as functions of t, q, and p along the paths.
Though fulfilling our goal of expressing the equations of motion
entirely in terms of coordinates and momenta, we can find a more
convenient representation. Define the function
L̃(t, q, p) = L(t, q, V(t, q, p)), (3.7)
where we used the chain rule in the first step and the inverse
property of V in the second step. Introducing the momentum
selector3 P (t, q, p) = p, and using the property ∂1 P = 0, we have
∂₁L(t, q, V(t, q, p)) = ∂₁L̃(t, q, p) − P(t, q, p) ∂₁V(t, q, p)
2. The following properties hold: d = V(b, c, ∂₂L(b, c, d)) and a = ∂₂L(b, c, V(b, c, a)).
3. P = I₂
3.1 Hamilton’s Equations 183
= ∂₁(L̃ − P V)(t, q, p)
= −∂₁H(t, q, p), (3.9)
4. The overall minus sign in the definition of the Hamiltonian is traditional.
The Hamiltonian has the same value as the energy function E (see
equation 1.140), except that the velocities are expressed in terms
of time, coordinates, and momenta by V:
Illustration
Let’s try something simple: the motion of a particle of mass m
with potential energy V (x, y). A Lagrangian is
H(t; x, y; px, py) = (px² + py²) / (2m) + V(x, y). (3.20)
5. In traditional notation Hamilton's equations are written
dq/dt = ∂H/∂p and dp/dt = −∂H/∂q,
or as separate equations for each component:
dqⁱ/dt = ∂H/∂pᵢ and dpᵢ/dt = −∂H/∂qⁱ.
6. Traditionally, the Hamiltonian is written
H = pq̇ − L.
This way of writing the Hamiltonian confuses the values of functions with the functions that generate them: both q̇ and L have to be reexpressed as functions of the time, coordinates, and momenta.
Dx = px /m
Dy = py /m. (3.21)
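For a concrete check of Hamilton's equations, the sketch below picks a specific potential, V(x, y) = ½k(x² + y²) (chosen for illustration because the harmonic case has a closed-form solution, not because the text requires it), and integrates Dx = px/m, Dy = py/m, Dpx = −kx, Dpy = −ky in plain Python:

```python
from math import cos, sqrt

# Hamilton's equations for a mass m in V(x,y) = (1/2) k (x^2 + y^2):
# the motion is simple harmonic, so the integrated x can be compared
# with the closed form x(t) = x0 cos(w t) for px(0) = 0.
m, k = 2.0, 3.0
w = sqrt(k / m)

def deriv(s):
    x, y, px, py = s
    return (px / m, py / m, -k * x, -k * y)

s = (1.0, 0.0, 0.0, 0.5)         # initial (x, y, px, py)
t, dt = 0.0, 1e-4
for _ in range(10000):           # RK4 to t = 1
    k1 = deriv(s)
    k2 = deriv(tuple(si + dt/2 * ki for si, ki in zip(s, k1)))
    k3 = deriv(tuple(si + dt/2 * ki for si, ki in zip(s, k2)))
    k4 = deriv(tuple(si + dt * ki for si, ki in zip(s, k3)))
    s = tuple(si + dt/6 * (a + 2*b + 2*c + d)
              for si, a, b, c, d in zip(s, k1, k2, k3, k4))
    t += dt
x_exact = 1.0 * cos(w * t)
assert abs(s[0] - x_exact) < 1e-8
```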
Hamiltonian state
Given a coordinate path q, and a Lagrangian L, the corresponding
momentum path p is given by equation (3.2). Equation (3.15) ex-
presses the same relationship in terms of the corresponding Hamil-
tonian H. That these relations are valid for any path, whether
or not it is a realizable path, allows us to abstract to arbitrary
velocity and momentum at a moment. At a moment, the mo-
mentum p for the state tuple (t, q, v) is p = ∂2 L(t, q, v). We also
have v = ∂2 H(t, q, p). In the Lagrangian formulation the state
7. In the construction of the Lagrangian state derivative from the Lagrange equations we must solve for the highest-order derivative. The solution process requires the inversion of the matrix ∂₂∂₂L. In the construction of Hamilton's equations, the construction of V from the momentum state function ∂₂L requires the inversion of the same matrix. If the Lagrangian formulation has singularities, they cannot be avoided by going to the Hamiltonian formulation.
ΠL [q](t) = (t, q(t), ∂2 L(t, q(t), Dq(t))) = (t, q(t), p(t)) . (3.24)
8. The term phase space was introduced by Josiah Willard Gibbs in his formulation of statistical mechanics. The Hamiltonian plays a fundamental role in the Boltzmann-Gibbs formulation of statistical mechanics and in both the Heisenberg and Schrödinger approaches to quantum mechanics.
The momentum p can be viewed as the coordinate representation of a linear form on the tangent space. Thus pq̇ is a scalar quantity, which is invariant under time-independent coordinate transformations of the configuration space. The momentum forms comprise an n-dimensional vector space at each point of configuration space, called the cotangent space. The collection of all cotangent spaces of a configuration space forms a space called the cotangent bundle of the configuration manifold.
9. By default, literal functions map reals to reals; the default type for a literal function is (-> Real Real). Here the potential energy V takes two real arguments and returns a real.
3.1.1 The Legendre Transformation 189
(show-expression
  (((Hamilton-equations
      (H-rectangular
        'm
        (literal-function 'V (-> (X Real Real) Real))))
    (up (literal-function 'x) (literal-function 'y))
    (down (literal-function 'p_x) (literal-function 'p_y)))
   't))
0
Dx(t) − px(t)/m
Dy(t) − py(t)/m
Dpx(t) + ∂₀V(x(t), y(t))
Dpy(t) + ∂₁V(x(t), y(t))
or
G = IV − F̃, (3.28)
then we have
V = DG. (3.29)
10. The Legendre transformation is more general than its use in mechanics in that it captures the relationship between conjugate variables in systems as diverse as thermodynamics, circuits, and field theory.
11. This can be done so long as the derivative is not zero.
v = ∂1 G(x, w) (3.34)
giving
then define
G = W V − F̃. (3.38)
∂₁G = ∂₁(W V − F̃)
= V + W ∂₁V − ∂₁F̃, (3.39)
but, since ∂₁F(x, V(x, w)) = W(x, w), the chain rule gives
∂₁F̃ = W ∂₁V. (3.41)
So
∂₁G = V, (3.42)
∂0 (W V) = W ∂0 V (3.43)
w = ∂1 F (x, v)
wv = F (x, v) + G(x, w)
v = ∂1 G(x, w)
0 = ∂0 F (x, v) + ∂0 G(x, w). (3.46)
argument. Show that the Legendre transform relations hold for your
solution, including the relations among passive arguments, if any.
a. F(x) = a sin x + b cos x; there are no passive arguments.
b. F(x, y) = a sin x cos y, with x active.
c. F(x, y, ẋ, ẏ) = x ẋ² + 3 ẋ ẏ + y ẏ², with ẋ and ẏ active.
and
This relation is purely algebraic and is valid for any path. The
passive equation (3.51) gives
but the left-hand side can be rewritten using the Lagrange equa-
tions, so
Exercise 3.5:
Using Hamilton’s equations, show directly that the Hamiltonian is a
conserved quantity if the Hamiltonian has no explicit time dependence.
w = DF(v) = vM + b. (3.61)
v = V(w) = M⁻¹ (w − b) (3.62)
Computing Hamiltonians
We implement the Legendre transform for quadratic functions by
the procedure:13
(define (Legendre-transform F)
(let ((w-of-v (D F)))
(define (G w)
(let ((z (dual-zero w)))
(let ((M ((D w-of-v) z))
(b (w-of-v z)))
(let ((v (/ (- w b) M)))
(- (* w v) (F v))))))
G))
12. Let M be the matrix representation of M; then M = Mᵀ.
13. The division operation, denoted by / in the Legendre-transform procedure, is generic over mathematical objects. We interpret the division in the matrix representation: if a vector y is divided by a matrix M, this is interpreted as a request to solve the linear system Mx = y, where x is the unknown vector.
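The Legendre-transform procedure relies on generic arithmetic; a scalar-only Python sketch of the same idea (illustrative, with the derivative supplied by hand rather than by automatic differentiation) makes the affine structure w = Mv + b explicit:

```python
# Scalar Legendre transform for a quadratic F: w-of-v is affine, so we
# recover M and b from two evaluations, invert, and form G(w) = w v - F(v).
# (The Scheme version handles tuples via generic division.)

def legendre_transform(F, dF):
    M = dF(1.0) - dF(0.0)       # slope of the affine map w-of-v
    b = dF(0.0)                 # its value at v = 0
    def G(w):
        v = (w - b) / M
        return w * v - F(v)
    return G

m = 3.0
F = lambda v: 0.5 * m * v * v   # kinetic energy of a free particle
dF = lambda v: m * v            # its derivative: the momentum
G = legendre_transform(F, dF)
assert abs(G(6.0) - 6.0**2 / (2 * m)) < 1e-12   # G(p) = p^2/(2m)
```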
(show-expression
  ((Lagrangian->Hamiltonian
     (L-rectangular
       'm
       (literal-function 'V (-> (X Real Real) Real))))
   (up 't (up 'x 'y) (down 'p_x 'p_y))))
V(x, y) + ½ px²/m + ½ py²/m
Figure 3.2
a. What are the degrees of freedom of this system? Pick and describe
a convenient set of generalized coordinates for this problem. Write a
Lagrangian to describe the dynamical behavior. It may help to know
that the moment of inertia of the cylinder around its axis is ½ M R². You
3.1.2 Hamiltonian Action Principle 199
which is one of Hamilton’s equations, the one that does not depend
on the path being a realizable path. Using
ΠL [q](t) = (t, q(t), ∂2 L(t, q(t), Dq(t))) = (t, q(t), p(t)) , (3.67)
the integrand is
δS[q](t1 , t2 )
Z t2
= δ(pDq − H ◦ ΠL [q])
t1
Z t2
= (δp Dq + p δDq − (DH ◦ ΠL [q])δΠL [q])
t1
Z t2
= {δp Dq + p Dδq
t1
−(∂1 H ◦ ΠL [q])δq − (∂2 H ◦ ΠL [q])δp} , (3.69)
δS[q](t1, t2) = p δq |_{t1}^{t2}
Z t2
+ {δp Dq − Dp δq
t1
−(∂1 H ◦ ΠL [q])δq − (∂2 H ◦ ΠL [q])δp} . (3.70)
δS[q](t1 , t2 ) (3.71)
Z t2
= ((Dq − ∂2 H ◦ ΠL [q]) δp − (Dp + ∂1 H ◦ ΠL [q]) δq) .
t1
14. The variation of the momentum δp does not need to be further expanded in this argument because it turns out that the factor multiplying it is zero. However, it is handy to see how it is related to the variations in the coordinate path δq:
δp(t) = ∂₁∂₂L(t, q(t), Dq(t)) δq(t) + ∂₂∂₂L(t, q(t), Dq(t)) Dδq(t).
3.1.3 A Wiring Diagram 201
15. It is sometimes asserted that the momenta have a different status in the Lagrangian and Hamiltonian formulations; that in the Hamiltonian framework the momenta are "independent" of the coordinates. From this it is argued that the variations δq and δp are arbitrary and independent, therefore implying that the factor multiplying each of them in the action integral (3.72) must independently be zero, apparently deriving both of Hamilton's equations. The argument is fallacious: we can write δp in terms of δq (see footnote 14).
Figure: A wiring diagram connecting the Lagrangian (inputs t, q, q̇; outputs L and its partials ∂₀L, ∂₁L, ∂₂L) and the Hamiltonian (inputs t, q, p; outputs H and its partials ∂₀H, ∂₁H, ∂₂H), with integrators producing q and p from the initial values t₀, q₀, p₀.
{F, H} = ∂1 F ∂2 H − ∂2 F ∂1 H. (3.76)
Note that the Poisson bracket of two functions on the phase state
space is also a function on the phase state space.
The coordinate selector Q = I1 is an example of a function on
phase state space: Q(t, q, p) = q. According to equation (3.75)
D(Q ◦ σ) = {Q, H} ◦ σ
D(P ◦ σ) = {P, H} ◦ σ. (3.81)
16. In traditional notation the Poisson bracket is written
{F, H} = Σᵢ (∂F/∂qⁱ ∂H/∂pᵢ − ∂F/∂pᵢ ∂H/∂qⁱ).
3.2 Poisson Brackets 205
where all but the last can be immediately verified from the def-
inition. Jacobi’s identity requires a little more effort to verify.
We can use the computer to avoid this work. Define some literal
phase-space functions of Hamiltonian type:
(define F
(literal-function ’F
(-> (UP Real (UP Real Real) (DOWN Real Real)) Real)))
(define G
(literal-function ’G
(-> (UP Real (UP Real Real) (DOWN Real Real)) Real)))
(define H
(literal-function ’H
(-> (UP Real (UP Real Real) (DOWN Real Real)) Real)))
The residual is zero, so the Jacobi identity is satisfied for any three
phase space functions for two degrees of freedom.
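For an independent check in Python (the book performs this computation symbolically in Scheme; here we spot-check numerically at a point, for one degree of freedom and sample polynomial phase-space functions):

```python
# Numerical spot check of the Jacobi identity
#   {F,{G,H}} + {G,{H,F}} + {H,{F,G}} = 0
# using a finite-difference Poisson bracket. For the polynomial functions
# below, central differences are exact up to roundoff.

def poisson(F, G, h=1e-3):
    def bracket(t, q, p):
        dFdq = (F(t, q + h, p) - F(t, q - h, p)) / (2 * h)
        dFdp = (F(t, q, p + h) - F(t, q, p - h)) / (2 * h)
        dGdq = (G(t, q + h, p) - G(t, q - h, p)) / (2 * h)
        dGdp = (G(t, q, p + h) - G(t, q, p - h)) / (2 * h)
        return dFdq * dGdp - dFdp * dGdq
    return bracket

F = lambda t, q, p: q * q * p
G = lambda t, q, p: p * p + q
H = lambda t, q, p: q * p

point = (0.0, 0.7, -0.4)        # an arbitrary phase-space point
residual = (poisson(F, poisson(G, H))(*point)
            + poisson(G, poisson(H, F))(*point)
            + poisson(H, poisson(F, G))(*point))
assert abs(residual) < 1e-3
```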
0 = D(F ◦ σ) = {F, H} ◦ σ
0 = D(G ◦ σ) = {G, H} ◦ σ. (3.89)
and thus
D({F, G} ◦ σ) = 0, (3.91)
17
Separatrices is the plural of separatrix.
Figure 3.4 The phase plane of the pendulum has three regions dis-
playing two distinct kinds of behavior. In this figure there are a number
of different trajectories. Trajectories lie on the contours of the Hamilto-
nian. Trajectories may oscillate, making ovoid curves around the equi-
librium point, or they may circulate, producing wavy tracks outside the
eye-shaped region. The eye-shaped region is delimited by the separatrix.
This pendulum has length 1 m, a bob of mass 1 kg, and the acceleration
of gravity is 9.8 m s⁻².
18
The pendulum has only one unstable equilibrium. Remember that the co-
ordinate is an angle.
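Whether a state oscillates or circulates is determined by its energy relative to that of the separatrix, which is the energy of the unstable equilibrium. A Python sketch (an independent check, assuming the pendulum Hamiltonian H = p²/(2ml²) − mgl cos θ and the figure's parameters):

```python
from math import cos, pi

# Book's pendulum parameters: m = 1 kg, l = 1 m, g = 9.8 m/s**2.
m, l, g = 1.0, 1.0, 9.8

def energy(theta, p):
    # H = p**2/(2*m*l**2) - m*g*l*cos(theta)
    return p * p / (2 * m * l * l) - m * g * l * cos(theta)

# Separatrix energy = energy of the unstable equilibrium (theta = pi, p = 0).
E_sep = energy(pi, 0.0)          # = m*g*l = 9.8 J

def kind(theta, p):
    """States inside the eye-shaped region oscillate; outside, they circulate."""
    return "oscillate" if energy(theta, p) < E_sep else "circulate"

assert kind(0.1, 0.0) == "oscillate"
assert kind(0.0, 20.0) == "circulate"   # cf. the ±20 momentum scale of fig. 3.4
```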
3.4 Phase Space Reduction 209
19
If a Lagrangian does not depend on a particular coordinate then neither does
the corresponding Hamiltonian, because the coordinate is a passive variable
in the Legendre transform. Such a Hamiltonian is said to be cyclic in that
coordinate.
20
Traditionally, when a problem has been reduced to the evaluation of a def-
inite integral it is said to be reduced to a “quadrature.” Thus, the determi-
nation of the evolution of a cyclic coordinate q i is reduced to a problem of
quadrature.
Dzⁱ = Fⁱ(z¹, z², . . . , zᵐ)   (3.92)
C(z¹, z², . . . , zᵐ) = 0.   (3.93)
The momenta are pr = mṙ and pϕ = mr²ϕ̇. The kinetic energy is
a homogeneous quadratic form in the velocities, so the Hamiltonian
is T + V with the velocities rewritten in terms of the momenta:

H(t; r, ϕ; pr, pϕ) = pr²/(2m) + pϕ²/(2mr²) + V(r).   (3.95)
Hamilton’s equations are:
Dr = pr/m
Dϕ = pϕ/(mr²)
Dpr = pϕ²/(mr³) − DV(r)
Dpϕ = 0.   (3.96)
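These equations are ready to integrate. A Python sketch (not from the book) with an RK4 step, using the sample potential V(r) = −1/r (an assumption; the text leaves V general), confirms that pϕ is conserved exactly and the Hamiltonian to integration accuracy:

```python
# Integrate Hamilton's equations (3.96) for a central force with RK4.
m = 1.0
V  = lambda r: -1.0 / r          # hypothetical sample potential
DV = lambda r: 1.0 / r**2

def field(s):
    r, phi, pr, pphi = s
    return (pr / m,                         # Dr
            pphi / (m * r**2),              # Dphi
            pphi**2 / (m * r**3) - DV(r),   # Dpr
            0.0)                            # Dpphi = 0

def rk4(s, dt):
    add = lambda a, b, c: tuple(x + c * y for x, y in zip(a, b))
    k1 = field(s)
    k2 = field(add(s, k1, dt / 2))
    k3 = field(add(s, k2, dt / 2))
    k4 = field(add(s, k3, dt))
    return tuple(x + dt / 6 * (a + 2*b + 2*c + d)
                 for x, a, b, c, d in zip(s, k1, k2, k3, k4))

def H(s):
    r, phi, pr, pphi = s
    return pr**2 / (2*m) + pphi**2 / (2*m*r**2) + V(r)

s = (1.0, 0.0, 0.0, 1.2)   # r, phi, p_r, p_phi (a bound orbit, E < 0)
E0 = H(s)
for _ in range(2000):
    s = rk4(s, 0.001)
assert s[3] == 1.2              # p_phi exactly constant: Dp_phi = 0
assert abs(H(s) - E0) < 1e-8    # energy conserved to integration accuracy
```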
21
It is not always possible to choose a set of generalized coordinates in which
all symmetries are simultaneously manifest. For these systems, the reduction
of the phase space is more complicated. We have already encountered such
a problem: the motion of a free rigid body. The system is invariant under
a rotation about any axis, yet no single coordinate system can reflect this
symmetry. Nevertheless, we have already found that the dynamics is described
by a system of lower dimension than the full phase space: the Euler equations.
V(r) + pϕ²/(2mr²) + pr²/(2m)
Axisymmetric top
We reconsider the axisymmetric top (see section 2.10) from the
Hamiltonian point of view. Recall that a top is a rotating rigid
body, one point of which is fixed in space. The center of mass is not
at the fixed point, and there is a uniform gravitational field. An
axisymmetric top is a top with an axis of symmetry. We consider
here an axisymmetric top with the fixed point on the symmetry
axis.
The axisymmetric top has two continuous symmetries that we
would like to exploit. It has the symmetry that neither the ki-
netic nor potential energy are sensitive to the orientation of the
top about the symmetry axis. The kinetic and potential energy
are also insensitive to a rotation of the physical system about the
vertical axis, because the gravitational field is uniform. We take
advantage of these symmetries by choosing coordinates that nat-
urally express them. We already have an appropriate coordinate
system that does the job—the Euler angles. We choose the refer-
ence orientation of the top so that the symmetry axis is vertical.
The first Euler angle ψ expresses a rotation about the symmetry
axis. The next Euler angle θ is the tilt of the symmetry axis of
the top from the vertical. The third Euler angle ϕ expresses a
rotation of the top about the fixed z axis. The symmetries of the
problem imply that the first and third Euler angles do not appear
in the Hamiltonian. As a consequence the momenta conjugate to
these angles are conserved quantities. The problem of determining
the motion of the axisymmetric top is reduced to the problem of
determining the evolution of θ and pθ . Let’s work out the details.
In terms of Euler angles a Lagrangian for the axisymmetric top
is (see section 2.10):
(define ((L-axisymmetric-top A C gMR) local)
(let ((q (coordinate local))
(qdot (velocity local)))
(let ((theta (ref q 0))
(thetadot (ref qdot 0))
(phidot (ref qdot 1))
(psidot (ref qdot 2)))
(+ (* 1/2 A
(+ (square thetadot)
(square (* phidot (sin theta)))))
(* 1/2 C
(square (+ psidot (* phidot (cos theta)))))
(* -1 gMR (cos theta))))))
A Legendre transformation of this Lagrangian gives the Hamiltonian

H(t; θ, ϕ, ψ; pθ, pϕ, pψ) = pθ²/(2A) + pψ²/(2C) + (pϕ − pψ cos θ)²/(2A sin²θ) + gMR cos θ.

For the case pϕ = pψ = p this simplifies, using the identity
(1 − cos θ)/sin θ = tan(θ/2), to

H = pθ²/(2A) + p²/(2C) + (p²/(2A)) tan²(θ/2) + gMR cos θ.   (3.97)
Defining the effective potential energy

Ueff(θ) = p²/(2C) + (p²/(2A)) tan²(θ/2) + gMR cos θ,   (3.98)

which depends parametrically on p, A, C, and gMR, the Hamiltonian is

H = pθ²/(2A) + Ueff(θ).   (3.99)
For ω > ωc the top can stand vertically; for ω < ωc the top
falls if slightly displaced from the vertical. The top which stands
vertically is called the “sleeping” top. For a more realistic top
friction gradually slows the rotation, and the rotation rate of the
top eventually falls below the critical rotation rate and the top
“wakes up.”
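The critical rotation rate can be obtained by expanding Ueff about the vertical. A sketch of the calculation (assuming the identification p = Cω for the spin of the sleeping top, consistent with the figure captions):

```latex
% Near \theta = 0: \tan^2(\theta/2) \approx \theta^2/4 and
% \cos\theta \approx 1 - \theta^2/2, so
U_{\mathrm{eff}}(\theta) \approx \frac{p^2}{2C} + gMR
    + \left( \frac{p^2}{8A} - \frac{gMR}{2} \right) \theta^2 .
% The vertical is a minimum of U_{\mathrm{eff}} (the top sleeps) when the
% coefficient of \theta^2 is positive:
p^2 > 4A\,gMR , \qquad p = C\omega
    \quad\Longrightarrow\quad
\omega_c = \frac{2\sqrt{A\,gMR}}{C} .
```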
Figure 3.6 The θ, pθ phase plane for the axisymmetric top with
pϕ = pψ and ω = 130 rad/s. The parameters are A = 0.0000328 kg m²,
C = 0.000066 kg m², gMR = 0.0456 kg m² s⁻². For these parameters the
critical frequency ωc is about 117.2 rad/s.
Figure 3.7 The θ, pθ phase plane for the axisymmetric top with
pϕ = pψ and ω = 90 rad/sec. The other parameters are as before.
Figure 3.8 The θ, pθ phase plane for the axisymmetric top with pϕ >
pψ. Most of the parameters are the same as before, but here pϕ =
0.00726 kg m² s⁻¹ and pψ = 0.00594 kg m² s⁻¹.
We get additional insight into the sleeping top and the awake
top by looking at the trajectories in the θ, pθ phase plane. The
trajectories in this plane are simply contours of the Hamiltonian,
because the Hamiltonian is conserved. Figure 3.6 shows a phase
portrait for ω > ωc . All of the trajectories are loops around the
vertical (θ = 0). Displacing the top slightly from the vertical
simply places the top on a nearby loop, so the top stays nearly
vertical. Figure 3.7 shows the phase portrait for ω < ωc . Here
the vertical position is an unstable equilibrium. The trajectories
that approach the vertical are asymptotic—they take an infinite
amount of time to reach it, just as a pendulum with just the right
initial conditions can approach the vertical but never reach it. If
the top is displaced slightly from the vertical then the trajectories
loop around another center with nonzero θ. A top started at the
center point of the loop stays there, and one started near this
equilibrium point loops stably around it. Thus we see that when
the top “wakes up” the vertical is unstable, but the top does not
fall to the ground. Rather, it oscillates around a new equilibrium.
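The new equilibrium is the minimum of Ueff away from θ = 0. A Python sketch with made-up parameters (A = C = gMR = 1, chosen for convenience rather than taken from the figures, giving ωc = 2) locates it:

```python
from math import cos, tan, pi

# Hypothetical parameters, not the figures' values: A = C = gMR = 1,
# so the critical frequency is omega_c = 2*sqrt(A*gMR)/C = 2.
A = C = gMR = 1.0

def U_eff(theta, p):
    # Equation (3.98) with p = C*omega for the spinning top.
    return p*p/(2*C) + p*p/(2*A) * tan(theta/2)**2 + gMR * cos(theta)

def argmin_theta(p):
    thetas = [i * 1e-3 for i in range(1, 3000)]   # grid search on (0, 3) rad
    return min(thetas, key=lambda th: U_eff(th, p))

# omega = 1 < omega_c: the minimum sits at a tilted angle (here exactly pi/2).
assert abs(argmin_theta(1.0) - pi/2) < 1e-2
# omega = 3 > omega_c: U_eff is increasing, the minimum is pushed to theta -> 0.
assert argmin_theta(3.0) == 1e-3
```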
It is also interesting to consider the axisymmetric top when
pϕ ≠ pψ. Consider the case pϕ > pψ. Some trajectories in the θ,
pθ plane are shown in figure 3.8. Note that in this case trajectories
do not go through θ = 0. The phase portrait for pϕ < pψ is similar
and will not be shown.
We have reduced the motion of the axisymmetric top to quadra-
tures by choosing coordinates that express the symmetries. It
turns out that the resulting integrals can be expressed in terms of
elliptic functions. Thus, the axisymmetric top can be analytically
solved. We do not dwell on this solution because it is not very il-
luminating. In fact, most problems cannot be solved analytically,
so there is not much profit in dwelling on the analytic solution of
one of the rare problems which is analytically solvable. Rather,
our discussion has focused on the geometry of the solutions in the
phase space, and the use of integrals to reduce the dimension of
the problem. With the phase space portrait we have found some
interesting qualitative features of the motion of the top.
where

H(t, θ, pθ) = pθ²/(2l²m) + (aω pθ sin θ sin ωt)/l
    − (1/2) a²mω² cos²θ sin²ωt + agm cos ωt − glm cos θ.
22
The surface of section technique was introduced by Poincaré in his Méthodes
Nouvelles de la Mécanique Céleste. Poincaré proved remarkable results about
dynamical systems using the surface of section technique, and we shall return
to those later. The surface of section technique is a key tool in the modern
study of dynamical systems, for both analytical and numerical investigations.
3.6.1 Poincaré Sections for Periodically-Driven Systems 225
23
The surface of section technique was put to spectacular use in the 1964
landmark paper [19] by astronomers Michel Hénon and Carl Heiles. In their
numerical investigations they found that some trajectories are chaotic, and
show exponential divergence with time, while other trajectories are regular,
showing linear divergence with time. They found that these two types of
trajectories are typically clustered in the phase space into regions of chaotic
behavior and regions of regular behavior.
24
That solutions of ordinary differential equations can show exponential
sensitivity to initial conditions was independently discovered by Edward
Lorenz ([28]) in the context of a simplified model of convection in the Earth’s
atmosphere. Lorenz coined the picturesque term the “butterfly effect” to de-
scribe this sensitivity. The weather system model of Lorenz is so sensitive to
initial conditions that “the flapping of a butterfly’s wings in Brazil can change
the course of a typhoon in Japan.”
[Schematic: the phase-space trajectory (q, p) of a driven system is sampled stroboscopically, once per drive period; the state (q(t + T), p(t + T)) is recorded one drive period T after (q(t), p(t)).]
Figure 3.12 Surface of section for the driven pendulum. The angle is
plotted on the abscissa; the momentum conjugate to this angle is plotted
on the ordinate. For this section the parameters are: m = 1 kg, l = 1 m,
g = 9.8 m/s², A = 0.05 m, ω = 4.2ω0, with ω0 = √(g/l).
25
Regular trajectories are also called quasiperiodic trajectories.
The trajectories that appear to fill areas are called chaotic tra-
jectories. For these points the distance in phase space between ini-
tially nearby points grows, on average, exponentially with time.26
In contrast, for the regular trajectories, the distance in phase space
between initially nearby points grows, on average, linearly with
time.
The phase space seems to be grossly clumped into different re-
gions. Initial conditions in some regions seem to predominately
yield regular trajectories, and other regions seem to predominately
yield chaotic trajectories. This gross division of the phase space
into qualitatively different types of trajectories is called the di-
vided phase space. We will see later that there is much more
structure here than is apparent at this scale, and that upon mag-
nification there is a complicated interweaving of chaotic and reg-
ular regions on finer and finer scales. Indeed, we shall see that
many trajectories which appear to generate curves on the surface
of section are, upon magnification, actually chaotic and fill a tiny
area. We shall also find that there are trajectories which lie on
one-dimensional curves on the surface of section, but which only
explore a subset of this curve formed by cutting out an infinite
number of holes.27
The features seen on the surface of section of the driven pen-
dulum are quite general. The same phenomena are seen in most
dynamical systems. In general, there are both regular and chaotic
trajectories, and there is the clumping characteristic of the divided
phase space. The specific details depend upon the system, but the
basic phenomena are generic. Of course we are interested in both
aspects: the phenomena which are generic to all systems, and the
specific details for particular systems of interest.
The surface of section for the periodically driven pendulum has
specific features that give us qualitative information about how
this system behaves. The central island in figure 3.12 is the rem-
nant of the oscillation region for the unforced pendulum (see fig-
ure 3.4). There is a sizable region of regular trajectories here that
are, in a sense, similar to the trajectories of the unforced pendu-
26
We saw an example of this extreme sensitivity to initial conditions in fig-
ure 1.7.
27
One-dimensional invariant sets with an infinite number of holes are some-
times called cantori, by analogy to the Cantor sets, but it really doesn’t
Mather.
lum. In this region, the pendulum oscillates back and forth, much
as the undriven pendulum does, but the drive makes it wiggle as
it does so. The section points are all collected at the same phase
of the drive so we do not see these wiggles on the section.
The central island is surrounded by a large chaotic zone. Thus
the region of phase space with regular trajectories similar to the
unforced trajectories has finite extent. On the section, the bound-
ary of this “stable” region is apparently rather well defined—there
is a sudden transition from smooth regular invariant curves to
chaotic motion that can take the system far from this region of
regular motion.
There are two other sizeable regions of regular behavior. The
trajectories in these regions are resonant with the drive, on av-
erage executing one full rotation per cycle of the drive. The two
islands differ in the direction of the rotation. In these regions
the pendulum is making complete rotations, but the rotation is
locked to the drive so that points on the section appear only in the
islands with finite angular extent. The fact that points for partic-
ular trajectories loop around the islands means that the pendulum
sometimes completes a cycle faster than the drive and sometimes
slower than the drive, but never loses lock.
Each regular region has finite extent. So from the surface of
section we can see directly the range of initial conditions which
remain in resonance with the drive. Outside of the regular region
initial conditions lead to chaotic trajectories which evolve far from
the resonant regions.
Various higher order resonance islands are also visible, as are
non-resonant regular circulating orbits. So, the surface of section
has provided us with an overview of the main types of motion that
are possible and their relationship.
If we change the parameters we can see other interesting phe-
nomena. Figure 3.13 shows the surface of section when the drive
frequency is twice the natural small amplitude oscillation fre-
quency of the undriven pendulum. The section has a large chaotic
zone, with an interesting set of islands. The central equilibrium
has undergone an instability and instead of a central island we
find two off-center islands. These islands are alternately visited
one after the other. As the support goes up and down the pendu-
lum alternately tips to one side and then the other. It takes two
periods of the drive before the pendulum visits the same island.
Thus, the system has “period doubled.” An island has been replaced by a pair of islands that the orbit visits alternately.
Figure 3.13 Another surface of section for the driven pendulum, il-
lustrating a period-doubled central island. For this section the frequency
of the drive is resonant with the frequency of small amplitude oscilla-
tions of the undriven pendulum. The angle is plotted on the abscissa
(scale −π to π); the momentum conjugate to this angle is plotted on the
ordinate (scale −10 to 10 kg m²/s). For this section the parameters are:
m = 1 kg, l = 1 m, g = 9.8 m/s², A = 0.1 m, ω = 2ω0.
28
In the particular case of the driven pendulum there is no reason to call fail.
This contingency is reserved for systems where orbits escape or cease to satisfy
some constraint.
3.6.3 Poincaré Sections for Autonomous Systems 233
tum p~.29 Integrating this density over some finite volume of phase
space gives the probability of finding a star in that phase-space
volume (in that region of space within a specified region of mo-
menta). We assume the probability density is normalized so that
the integral over all of phase space gives unit probability; the star
is somewhere and has some momentum with certainty. In terms
of f the statistical average of any dynamical quantity w over some
volume of phase space V is just
⟨w⟩_V = ∫_V f w.   (3.123)
29
We will see that it is convenient to look at distribution functions in the phase-
space coordinates because the consequences of conserved momenta are more
apparent, but also because volume in phase space is conserved by evolution
(see section 3.8).
There was good reason to believe that this might be correct. First,
it is clear that the distribution function surely depends at least on
E and pθ . The problem is “Given an energy E and angular mo-
mentum pθ what motion is allowed?” The integrals clearly confine
the evolution. Does the evolution carry the system everywhere
in the phase space subject to these known constraints? In the
early part of the 20th century this appeared plausible. Statistical
mechanics was successful, and statistical mechanics made exactly
this assumption. Perhaps there are other integrals of the mo-
tion which exist, but we have not yet discovered them? Poincaré
proved an important theorem with regard to integrals of the mo-
tion. Poincaré proved that most integrals of a dynamical system
typically do not persist upon perturbation of the system. That
is, if a small perturbation is added to a problem, then most of
the integrals of the original problem do not have analogs in the
perturbed problem. The integrals are destroyed. Of course, in-
tegrals which result from symmetries of the problem continue to
be preserved if the perturbed system has the same symmetries.
Thus angular momentum continues to be preserved upon appli-
cation of any axisymmetric perturbation. Poincaré’s theorem is
correct, but what came next was not. As a corollary to Poincaré’s
theorem, in 1920 Fermi published a proof of an ergodic theorem,
σz = σr . (3.128)
σr ≈ 2σz . (3.129)
30
A system is ergodic if time averages along trajectories are the same as phase
space averages over the region explored by the trajectories.
31
As before, upon close examination we may find that trajectories that appear
to be confined to a curve on the section are chaotic trajectories that explore
themselves as far apart as they can get. Once this happens the
distance no longer grows. The estimate of the rate of divergence
of trajectories is limited by this “saturation.”
We can improve on this method by studying a variational sys-
tem of equations. Let
32
In strongly chaotic systems w may become so large that the computer can
no longer represent it. To prevent this we can replace w by w/c whenever the
size of w becomes uncomfortably large. The equation governing w is linear
so, except for the scale change, the evolution is unchanged. Of course we have
to keep track of these scale changes when computing the average growth rate.
This process is called “renormalization” to make it sound impressive.
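A Python sketch of this renormalization bookkeeping, applied not to the book's variational equations but to the tangent map of the Chirikov standard map (an assumption; any system with a computable linearization works the same way):

```python
from math import sin, cos, log, hypot

# Estimate a Lyapunov exponent by evolving a tangent vector w with the
# linearized map, dividing out its size whenever it gets large, and
# accumulating the log of the scale factors (the "renormalization").
# Map (an assumption, not from the book): y' = y + K sin x, x' = x + y'.
K = 5.0
x, y = 0.3, 0.2          # a point in the chaotic sea for K = 5
w = (1.0, 0.0)
log_growth = 0.0
steps = 10000
for _ in range(steps):
    # tangent map at the current point: dy' = dy + K cos(x) dx; dx' = dx + dy'
    dy = w[1] + K * cos(x) * w[0]
    dx = w[0] + dy
    w = (dx, dy)
    y = y + K * sin(x)
    x = x + y
    size = hypot(*w)
    if size > 1e6:       # renormalize before w overflows; log the scale change
        w = (w[0] / size, w[1] / size)
        log_growth += log(size)
log_growth += log(hypot(*w))
lyapunov = log_growth / steps
assert lyapunov > 0.3    # chaotic: average exponential divergence
```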
H(t, θ, pθ) = pθ²/(2l²m) − glm cos θ.   (3.141)
In figure 3.25 we see the evolution of an elliptic region around a
point on the θ-axis, in the oscillation region of the pendulum.
Three later positions of the region are shown. The region is
stretched and sheared by the flow, but the area is preserved. After
many cycles, the starting region will be stretched to be a thin layer
distributed in the phase angle of the pendulum. Figure 3.26 shows
a similar evolution (for smaller time intervals) of a region strad-
dling the separatrix33 near the unstable equilibrium point. The
phase-space region rapidly stretches along the separatrix, while
preserving the area. The initial conditions that start in the oscil-
lation region (inside of the separatrix) will continue to spread into
a thin ring-shaped region, while the initial conditions that start
outside of the separatrix will spread into a thin region of rotation
on the outside of the separatrix.
Proof of Liouville’s theorem
Consider a set of ordinary differential equations of the form
33
The separatrix is the curve that separates the oscillating motion from the
circulating motion. It is made up of several trajectories that are asymptotic
to the unstable equilibrium.
3.8 Liouville’s Theorem 251
Figure 3.26 The pendulum here is the same as in the previous figure,
but now the swarm of initial points surrounds the unstable equilibrium
point for the pendulum in phase space, where θ = π and pθ = 0. The
swarm is stretched out along the separatrix. The time interval between
successively plotted contours is 0.3 seconds.
The volume V(t) of a region R(t) is ∫_{R(t)} 1. The volume of the
evolved region R(t + ∆t) is

V(t + ∆t) = ∫_{R(t+∆t)} 1 = ∫_{g_{t,∆t}(R(t))} 1 = ∫_{R(t)} Jac(g_{t,∆t}),   (3.145)
Expanding the Jacobian determinant for small ∆t gives
Jac(g_{t,∆t}) = 1 + ∆t Gt + o(∆t²), where Gt is the trace of the matrix of
partial derivatives of the phase-flow vector field. Thus

V(t + ∆t) = ∫_{R(t)} [1 + ∆t Gt + o(∆t²)]
          = V(t) + ∆t ∫_{R(t)} Gt + o(∆t²).   (3.151)
see that the trace, which is the sum of these diagonal components,
is zero. Thus the integral of Gt over the region R(t) is zero, so the
derivative of the volume at time t is zero. Because t is arbitrary,
the volume does not change. This proves Liouville’s theorem: the
phase-space flow conserves phase-space volume.
Notice that the proof of Liouville’s theorem does not depend
upon whether the Hamiltonian has explicit time dependence. Li-
ouville’s theorem holds for systems with time-dependent Hamil-
tonians.
We may think of the ensemble of all possible states as a fluid
flowing around under the control of the dynamics. Liouville’s theo-
rem says that this fluid is incompressible for Hamiltonian systems.
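The vanishing trace at the heart of the proof is easy to check numerically. A Python sketch for the pendulum (assuming H = p²/(2ml²) − mgl cos q, with m = l = 1 and g = 9.8) confirms that the Hamiltonian vector field has zero divergence:

```python
from math import cos

# Pendulum Hamiltonian and its flow field (Dq, Dp) = (∂2H, -∂1H),
# with the partial derivatives taken by central differences.
m, l, g = 1.0, 1.0, 9.8
H = lambda q, p: p*p/(2*m*l*l) - m*g*l*cos(q)

h = 1e-5
def flow(q, p):
    dHdq = (H(q + h, p) - H(q - h, p)) / (2*h)
    dHdp = (H(q, p + h) - H(q, p - h)) / (2*h)
    return dHdp, -dHdq

def divergence(q, p):
    # trace of the matrix of partial derivatives of the flow field
    dq = (flow(q + h, p)[0] - flow(q - h, p)[0]) / (2*h)
    dp = (flow(q, p + h)[1] - flow(q, p - h)[1]) / (2*h)
    return dq + dp

assert abs(divergence(0.7, 1.3)) < 1e-4   # incompressible flow
```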
34
It is reported that when Boltzmann was confronted with this problem he
responded, “You should wait that long!”
Since the exponential is never zero this equation has the same
trajectories as equation (3.155) above.
The momentum conjugate to x is
p = mẋ e^{(α/m)t},   (3.158)

Dx(t) = (p(t)/m) e^{−(α/m)t}
Dp(t) = −k x(t) e^{(α/m)t}.   (3.160)
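A Python sketch (not from the book) integrating these equations with RK4 and comparing x(t) against the underdamped solution of mẍ + αẋ + kx = 0, for sample values m = 1, α = 0.2, k = 1:

```python
from math import exp, cos, sin, sqrt

m, alpha, k = 1.0, 0.2, 1.0     # sample values, an assumption

def field(t, s):
    # (Dx, Dp) from equations (3.160) with the exotic momentum p
    x, p = s
    return ((p / m) * exp(-alpha * t / m),
            -k * x * exp(alpha * t / m))

def rk4(t, s, dt):
    k1 = field(t, s)
    k2 = field(t + dt/2, (s[0] + dt/2*k1[0], s[1] + dt/2*k1[1]))
    k3 = field(t + dt/2, (s[0] + dt/2*k2[0], s[1] + dt/2*k2[1]))
    k4 = field(t + dt, (s[0] + dt*k3[0], s[1] + dt*k3[1]))
    return (s[0] + dt/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
            s[1] + dt/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))

# underdamped analytic solution of m x'' + alpha x' + k x = 0
# with x(0) = 1, x'(0) = 0
gamma = alpha / (2*m)
omega = sqrt(k/m - gamma**2)
x_exact = lambda t: exp(-gamma*t) * (cos(omega*t) + gamma/omega * sin(omega*t))

t, s, dt = 0.0, (1.0, 0.0), 0.001    # p(0) = m*x'(0)*e**0 = 0
for _ in range(5000):
    s = rk4(t, s, dt)
    t += dt
assert abs(s[0] - x_exact(5.0)) < 1e-6   # same trajectory as the damped oscillator
```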
35
This is just the product of the Lagrangian for the undamped harmonic
oscillator with an increasing exponential of time.
Thus we can form the transformation from the initial state to the
final state: resolving the initial state (x(0), p(0)) into the two
eigendirections and multiplying each component by its decaying
exponential, e^{−t/10} or e^{−t/2}, gives (x(t), p(t)) (equation 3.163).
Distribution functions
We only know the state of a system approximately. It is reasonable
to model our state of knowledge by a probability density function
on the set of possible states. Given such incomplete knowledge,
what are the probable consequences? As the system evolves, the
density function also evolves. Liouville’s theorem gives us a handle
on this kind of problem.
Let f (t, q, p) be a probability density function on the phase
space at time t. For this to be a good probability density function
we require that the integral of f over all coordinates and momenta
is 1—that the system is somewhere is certain.
3.9 Standard Map 259
∂0 f ◦ σ + {f, H} ◦ σ = 0. (3.165)
36
This question was also addressed in the remarkable paper by Hénon and
Heiles, but with a different map than we use here.
37
The standard map has been extensively studied. Early investigations were
by Chirikov [11] and by Taylor [41]. So the map is sometimes called the
Chirikov-Taylor map. Chirikov coined the term “standard map,” which we
adopt.
Figure 3.27 Surface of section for the standard map for K = 0.6. The
section shows mostly regular trajectories, with a few dominant islands,
but also shows a number of small chaotic zones.
x′ = x cos α − (y − x²) sin α
y′ = x sin α + (y − x²) cos α
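This quadratic map is area preserving: its Jacobian determinant is identically 1. A quick numerical check in Python (α = 1.2 is just a sample value):

```python
from math import cos, sin

alpha = 1.2   # sample rotation angle, an assumption

def T(x, y):
    # the quadratic (Hénon-style) map from the text
    return (x * cos(alpha) - (y - x*x) * sin(alpha),
            x * sin(alpha) + (y - x*x) * cos(alpha))

h = 1e-6
def jacobian_det(x, y):
    # central-difference Jacobian of T; exact here since T is quadratic
    dxdx = (T(x + h, y)[0] - T(x - h, y)[0]) / (2*h)
    dxdy = (T(x, y + h)[0] - T(x, y - h)[0]) / (2*h)
    dydx = (T(x + h, y)[1] - T(x - h, y)[1]) / (2*h)
    dydy = (T(x, y + h)[1] - T(x, y - h)[1]) / (2*h)
    return dxdx * dydy - dxdy * dydx

assert abs(jacobian_det(0.4, -0.3) - 1.0) < 1e-6   # area preserved
```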
Figure 3.28 Surface of section for the standard map for K = 1.4.
The dominant feature is a large chaotic zone. There are also some large
islands of regular behavior. In this case there are also some interesting
secondary islands: islands around islands.
3.10 Summary
3.11 Projects
Exercise 3.14: Periodically driven pendulum
Explore the dynamics of the driven pendulum, using the surface of sec-
tion method. We are interested in exploring the regions of parameter
space over which various phenomena occur. Consider a pendulum of
length 9.8 m, mass 1 kg, and acceleration of gravity g = 9.8 m s⁻², giving
ω0 = 1 rad/s. Explore the parameter plane of the amplitude A and
frequency ω of the periodic drive.
Examples of the phenomena to be investigated:
a. Inverted equilibrium. Show the region of parameter space (A, ω) in
which the inverted equilibrium is stable. If the inverted equilibrium is
stable there is some range of stability, i.e. there is a maximum angle
Figure 4.1 The phase plane of the pendulum has three regions dis-
playing two distinct kinds of behavior. Trajectories lie on the contours
of the Hamiltonian. Trajectories may oscillate, making ovoid curves
around the equilibrium point, or they may circulate, producing wavy
tracks outside the eye-shaped region. The eye-shaped region is delimited
by the separatrix. This pendulum has length 1 m, and the acceleration
of gravity is 9.8 m s⁻².
Figure 4.2 A surface of section for the driven pendulum, with zero-
amplitude drive. The effect is to sample the trajectories of the undriven
pendulum, which lie on the contours of the Hamiltonian. Only a small
number of points are plotted for each trajectory to illustrate the fact that
for zero-amplitude drive the surface of section samples the continuous
trajectories of the undriven pendulum.
Figure 4.3 A surface of section for the driven pendulum, with non-
zero drive amplitude A = 0.001m and drive frequency 4.2ω0 . Many tra-
jectories apparently generate invariant curves, as in the zero-amplitude
drive case. Here, in addition, some orbits belong to island chains and
others are chaotic. The most apparent chaotic orbit is near the separa-
trix of the undriven pendulum.
point.1 For these orbits the pendulum rotates on average once per
drive, but the phase of the pendulum is sometimes ahead of the
drive and sometimes behind it.
There are other islands that appear with non-zero amplitude
drive. In the central oscillation region there is a six-fold chain
of secondary islands. For this orbit the pendulum is oscillating,
and the period of the oscillation is commensurate with the drive.
The six islands are all generated by a single orbit. In fact, the
islands are visited successively in a clockwise direction. After six
cycles of the drive the section point returns to the same island
but falls at a different point on the island curve, accumulating the
island curve after many iterations. The motion of the pendulum
is not periodic, but is locked in a resonance so that on average it
oscillates once for every six cycles of the drive.
1
Keep in mind that the abscissa is an angle.
4.2 Linear Stability of Fixed Points 271
with components
0 = F (t, ze ). (4.3)
That this is zero at all moments for the equilibrium solution im-
plies ∂0 F (t, ze ) = 0.
Next consider a state path z 0 which passes near the equilibrium
point. The path displacement ζ is defined so that at time t
We have
M α = λα, (4.12)
2
Actually, all we need is ∂0 ∂1 F (t, ze ) = 0.
274 Chapter 4 Phase Space Structure
ζc(t) = (u + iv) e^{(a+ib)t}
     = (u + iv) e^{at} (cos bt + i sin bt)
     = e^{at} (u cos bt − v sin bt) + i e^{at} (u sin bt + v cos bt).   (4.14)
3
If the eigenvalues are not unique then the form of the solution is modified.
4.2.2 Fixed Points of Maps 275
x0 = T (x0 ). (4.17)
4
The map T is being used as an operator: multiplication is interpreted as
composition.
but x0 = T(x0), so
ξ(n) = ρⁿα,   (4.22)
or
ξ(n) = ρⁿα.   (4.26)
5
We assume the eigenvalues are distinct for now.
4.2.3 Relations Among Exponents 277
done for the equilibrium solutions. Let ρ = exp(A + iB) with real
A and B, and ξ = u + iv. A calculation similar to that for the
equilibrium case shows that there are two real solutions
MJMᵀ = J,   (4.29)
Mᵀα0 = ρα0.   (4.31)
Figure 4.5 If there is more than one degree of freedom the eigenvalues
for fixed points of a Hamiltonian map may lie in a quartet, with two
complex-conjugate pairs. The magnitudes of the pairs must be inverses.
This enforces the constraint that the expansion produced by the roots
with magnitude greater than one is counterbalanced by the contraction
produced by the roots with magnitude smaller than one.
So
MJα0 = J(Mᵀ)⁻¹α0 = (1/ρ) Jα0,   (4.34)

and we can conclude that 1/ρ is an eigenvalue of M with the
eigenvector Jα0. From the fact that for every eigenvalue its inverse
is also an eigenvalue we deduce that the determinant of the
transformation M, which is the product of the eigenvalues, is one.
The constraints that the eigenvalues must be associated with
inverses and complex conjugates yields exactly one new pattern of
eigenvalues in higher dimensions. Figure 4.5 shows the only new
pattern that is possible.
We have seen that the Lyapunov exponents for fixed points
are related to the characteristic multipliers for the fixed points,
so the Hamiltonian constraints on the multipliers correspond to
Hamiltonian constraints for Lyapunov exponents at fixed points.
Figure 4.6 The neighborhood of the unstable fixed point of the pen-
dulum shows the stable and unstable manifolds of the nonlinear pen-
dulum and of the linearized variational system around the fixed point.
The axes are centered at the fixed point (±π, 0). The linear stable and
unstable manifolds are labeled by Vs and Vu ; the nonlinear stable and
unstable manifolds are labeled by Ws and Wu .
the unstable fixed point. So in this case the stable and unstable
manifolds coincide.
If the drive amplitude is non-zero then there are still one-
dimensional sets of points that are asymptotic to the unstable
fixed point forward and backward in time: there are still stable
and unstable manifolds. Why? The behavior near the fixed point
is described by the linearized variational system. For the linear
variational system, points in the space spanned by the unstable
eigenvector, when mapped backwards in time, are asymptotic to
the fixed point. Points slightly off this curve may initially ap-
proach the unstable equilibrium, but eventually will fall away to
one side or the other. For the driven system with small drive,
there must still be a curve which separates the points that fall
away to one side from the points that fall away to the other side.
Points on the dividing curve must be asymptotic to the unstable
equilibrium. The dividing set cannot have positive area because
the map is area preserving.
For the zero-amplitude drive case the stable and unstable man-
ifolds are contours of the conserved Hamiltonian. For non-zero
amplitude the Hamiltonian is no longer conserved. For non-zero
drive the stable manifolds and unstable manifolds no longer coin-
cide. This is generally true for non-integrable systems: stable and
unstable manifolds do not coincide.
If the stable and unstable manifolds no longer coincide where
do they go? In general, the stable and unstable manifolds must
cross one another. The only other possibilities are that they run
off to infinity or spiral around. Area preservation can be used to
exclude the spiraling case. We will see that in general there are
barriers to running away. So the only possibility is that the stable
and unstable manifolds cross. This is illustrated in figure 4.7. The
point of crossing of a stable and unstable manifold is called a ho-
moclinic intersection if the stable and unstable manifolds belong
to the same unstable fixed point. It is called a heteroclinic in-
tersection if the stable and unstable manifolds belong to different
fixed points.
If the stable and unstable manifolds cross once then there are an
infinite number of other crossings. The intersection point belongs
to both the stable and unstable manifolds. That it is on the
unstable manifold means that all images forward and backward in
time also belong to the unstable manifold, and likewise for points
on the stable manifold. Thus all images of the intersection belong
Figure 4.7 For non-zero drive the stable and unstable manifolds no
longer coincide and in general cross. The dashed circle indicates the
central intersection. Forward and backward images of this intersection
are themselves intersections. Because the orbits are asymptotic to the
fixed point there are an infinity of such intersections.
That would be ok, but what happens as the loop gets close to
the fixed point? There would still have to be loops, but then the
stable and unstable manifolds would not have the right behavior:
the stable and unstable manifolds of the linearized map do not
have loops. Therefore, the stable and unstable manifolds cannot
cross themselves.6
We are not done yet! The lobes that are defined by successive
crossings of the stable and unstable manifolds enclose a certain
area. The map is area preserving so all images of these lobes must
have the same area. So there are an infinite number of images of
these lobes, all with the same area. Furthermore, the boundaries
of these images cannot cross. As the lobes approach the fixed
point we get an infinite number of lobes with a base with an
exponentially shrinking length. In order to pack these together
on the plane, without the boundaries crossing each other, the
lobes must stretch out to preserve area. We see that the length of
the lobe must grow roughly exponentially (It may not be uniform
in width so it need not be exactly exponential.) This exponential
lengthening of the lobes no doubt bears some responsibility for the
exponential divergence of nearby trajectories of chaotic orbits, but
6
Sometimes it is argued that the stable and unstable manifolds cannot cross
themselves on the basis of the uniqueness of solutions of differential equations.
This is an incorrect argument. The stable and unstable manifolds are not
themselves solutions of a differential equation; they are sets of points whose
evolutions are asymptotic to the unstable fixed points.
4.3.1 Computation of Stable and Unstable Manifolds
The near? argument is a test for whether two points are within
a given distance of each other in the graph. Because some co-
ordinates are angle variables, this may involve a principal value
comparison. For example, for the driven pendulum section, the
horizontal axis is an angle but the vertical axis is not, so the pic-
ture is on a cylinder:
(define (cylinder-near? eps)
(let ((eps2 (square eps)))
(lambda (x y)
(< (+ (square ((principal-value pi)
(- (car x) (car y))))
(square (- (cdr x) (cdr y))))
eps2))))
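For instance, assuming the scmutils principal-value procedure reduces the angle difference to an interval of width 2π, two representations of the same point on the cylinder test as near (a sketch in the book's Scheme dialect; it is not runnable outside that environment):

```scheme
;; The angle coordinates 0 and 2pi name the same point on the
;; cylinder, so the principal-value comparison makes them near;
;; the momentum coordinate is compared directly.
((cylinder-near? 1e-10)
 (cons 0. 2.)
 (cons (* 2 pi) 2.))
```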
Figure 4.9 The computed homoclinic tangle for the driven pendulum
exhibits the features described in the text. Notice how the excursions
of the stable and unstable manifolds become longer and thinner as they
approach the unstable fixed point. A surface of section with the same
parameters is also shown.
4.4 Integrable Systems
J(t) = J(t0 )
θ(t) = ω(J(t0 ))(t − t0 ) + θ(t0 ). (4.37)
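Solutions (4.37) are straightforward to express as a program; here is a sketch (the helper evolve-action-angle and the frequency function omega are hypothetical, not from the book):

```scheme
(define ((evolve-action-angle omega J0 theta0 t0) t)
  ;; The action is conserved; the angle advances at the
  ;; constant rate omega(J0).
  (up (+ theta0 (* (omega J0) (- t t0)))
      J0))
```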
Figure 4.10 The solid and dotted lines show two periodic trajectories
on the configuration coordinate plane. For commensurate frequencies
the configuration motion is periodic, independent of the initial angles.
In this illustration the frequencies satisfy 3ω 0 (J(t0 )) = 2ω 1 (J(t0 )). The
orbit closes after 3 cycles of θ0 and 2 cycles of θ1 , for any initial θ0 and
θ1 .
If the frequencies ω^i(J(t0)) satisfy an integer-coefficient relation
Σ_i n_i ω^i(J(t0)) = 0 we say that the frequencies
satisfy a commensurability. If there is no commensurability
for any non-zero integer coefficients we say that the frequencies
are linearly independent (with respect to the integers) and the so-
lution is quasiperiodic. One can prove that for n incommensurate
frequencies all solutions come arbitrarily close to every point in
the configuration space.7
For a system with two degrees of freedom the solutions in a
region described by a particular set of action-angle variables are
either equilibrium solutions, periodic solutions, or quasiperiodic
solutions.8 For systems with more than two degrees of freedom
there are trajectories that are neither periodic nor
quasiperiodic with n frequencies. These are quasiperiodic with
fewer frequencies and dense over a corresponding lower dimen-
sional torus.
Surfaces of section for integrable systems
As we have seen, in action-angle coordinates the angles move
with constant angular frequencies, and the momenta are constant.
Thus surfaces of section in action-angle coordinates are particu-
larly simple. We can make surfaces of section for time-independent
two degree of freedom systems or one degree of freedom systems
with periodic drive. In the latter case, one of the angles in the
action-angle system is the phase of the drive. We make surfaces
of section by accumulating points in one pair of canonical coordi-
nates as the other coordinate goes through some particular value,
such as zero. If we plot the section points with the angle coordi-
nate on the abscissa and the conjugate momentum on the ordinate
then the section points for all trajectories lie on horizontal lines,
as illustrated in figure 4.11.
For definiteness, let the plane of the surface of section be the
(θ0 , J0 ) plane, and the section condition be θ1 = 0. The other
7
Motion with n incommensurate frequencies is dense on the n-torus. Further-
more, such motion is ergodic on the n-torus. This means that time averages of
time independent phase space functions computed along trajectories are equal
to the phase space average of the same function over the torus.
8
For time-independent systems with two degrees of freedom the boundary
between regions described by different action-angle coordinates has asymptotic
solutions and unstable periodic orbits or equilibrium points. The solutions on
the boundary are not described by the action-angle Hamiltonian.
[Figure 4.11: Section points for several trajectories of an integrable
system, with the angle θ0 on the abscissa and the action J0 on the
ordinate. The section points of each trajectory fall on a horizontal line.]
θ̂(i) = θ0(i∆t + t0)
Ĵ(i) = J0(i∆t + t0), (4.38)
9
The coordinate θ̂(i) is an angle. It can be brought to a standard interval such
as 0 to 2π.
10
Actually, to be a twist map we require |Dν(J)| > K > 0 over some interval
of J.
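In this notation, the section map of an integrable system acts on the cylinder as a twist map (a standard definition, stated here for reference; ν is the rotation number as a function of the action):

```latex
T(\theta, J) = (\theta + 2\pi\,\nu(J),\; J),
\qquad |D\nu(J)| > K > 0 .
```

Each invariant circle J = const is rotated rigidly, and the twist condition says the rotation rate varies monotonically from circle to circle over the interval of J considered.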
generated is filled densely. Again, this is the case for any initial
coordinates, because the frequencies depend only on the momenta.
There are infinitely many such orbits which are distinct for a given
set of frequencies.11
H = H0 + εH1. (4.40)
11
The section points for any particular orbit are countable and dense, but they
have zero measure on the line.
4.5 Poincaré-Birkhoff Theorem
Figure 4.12 The map T^k has a line of fixed points if the rotation
number is the rational j/k. Points above this line map to larger θ0;
points below this line map to smaller θ0.
Figure 4.13 The map T_ε^k is slightly different from T^k, but above the
central region points still map to larger θ0 and below the central region
they map to smaller θ0. By continuity there are points between for
which θ0 does not change.
Figure 4.14 The curve C0 of points that map to the same θ0 under
T_ε^k is indicated by the solid line. The image of this curve under T_ε^k
is the dotted curve C1. Area preservation implies these curves cross.
12
If Ĵ+ were not periodic in θ0 then it would have to spiral. Suppose it
spirals. The region enclosed by two successive turns of the spiral is mapped
to a region between successive turns further down the spiral.
The map preserves area, so the spiral cannot approach a limiting circle but
must progress infinitely down the cylinder. This is impossible because of the
twist condition: sufficiently far down the cylinder the rotation number is too
different to allow the angle to be the same under T_ε^k. So Ĵ+ does not spiral.
Figure 4.15 The fixed point on the left is linearly unstable. The one
on the right is linearly stable.
Figure 4.16 The curves C0 (solid) and C1 (dotted) for the 1:1 com-
mensurability.
Figure 4.17 A surface of section displaying the 1:1 commensurability.
4.6 Invariant Curves
Figure 4.18 The curves C0 (solid) and C1 (dotted) for the 1:3 com-
mensurability. The angle runs from −π to π. The momentum runs from
3.5 to 4.5 in appropriate units.
Figure 4.19 A surface of section displaying the 1:3 commensurability.
The angle runs from −π to π. The momentum runs from 3.5 to 4.5 in
appropriate units.
13
This depends on the assumptions that Jmin and Jmax bracket the actual mo-
mentum, and that the rotation number is sufficiently continuous in momentum
in that region.
4.6.1 Finding Invariant Curves
The maps are evolved and built into a stream by a simple recursive
procedure. The maps are represented in the same way that they
appeared in section 3.6.
(define (orbit-stream the-map x y)
(cons-stream (list x y)
(the-map x y
(lambda (nx ny)
(orbit-stream the-map nx ny))
(lambda () ’fail))))
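For example, the first few section points of a standard-map orbit can be collected with stream-head (a sketch for the book's scmutils environment, where standard-map is as in section 3.6):

```scheme
;; A list of the first five (x y) points of the orbit
;; of (1.0, 1.0) under the standard map with parameter 0.95:
(stream-head (orbit-stream (standard-map 0.95) 1.0 1.0) 5)
```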
14
The insert procedure is ugly:
(define (insert! x set cont)
(cond ((null? set)
(cont (list x) 1))
((< x (car set))
(cont (cons x set) 0))
(else
(let lp ((i 1) (lst set))
(cond ((null? (cdr lst))
(set-cdr! lst (cons x (cdr lst)))
(cont set i))
((< x (cadr lst))
(set-cdr! lst (cons x (cdr lst)))
(cont set i))
(else
(lp (+ i 1) (cdr lst))))))))
15
The principal-range procedure is implemented as follows:
(define ((principal-range period) index)
(let ((t (- index (* period (floor (/ index period))))))
(if (< t (/ period 2.))
t
(- t period))))
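As a worked check (assuming pi is bound, as in scmutils): with period 2π, an index of 7.0 exceeds one period, so it is first reduced to 7 − 2π ≈ 0.717; since this is less than π, half the period, it is returned unchanged:

```scheme
((principal-range (* 2 pi)) 7.0)
;; about .717
```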
Once we have created this mess we can use it to find the initial
momentum (for a given initial angle) for an invariant curve with
a given rotation number. We search the standard map for an
invariant curve with a golden rotation number:16
(find-invariant-curve (standard-map 0.95)
(- 1 (/ 1 golden-mean))
0.0
2.0
2.2
1e-5)
;Value: 2.114462280273437
16
There is no invariant curve in the standard map with rotation number φ =
1.618.... However 1 − 1/φ has the same continued-fraction tail as φ and there
are rotation numbers of this size in the standard map.
interest then we check to see if the angle of the other map is in the
corresponding interval. If so, the intervals for the uniform circle
map and the other map are narrowed and the iteration proceeds. If
the angle is not in the required interval, a discrepancy is noted and
the sign of the discrepancy is reported. For this process to make
sense the differences between the angles for successive iterations
of both maps must be less than π.
(define (which-way? rotation-number x0 y0 the-map)
(let ((pv (principal-value (+ x0 pi))))
(let lp ((z x0) (zmin (- x0 :2pi)) (zmax (+ x0 :2pi))
(x x0) (xmin (- x0 :2pi)) (xmax (+ x0 :2pi))
(y y0))
(let ((nz (pv (+ z (* :2pi rotation-number)))))
(the-map x y
(lambda (nx ny)
(let ((nx (pv nx)))
(cond ((< x0 z zmax)
(if (< x0 x xmax)
(lp nz zmin z nx xmin x ny)
(if (> x xmax) 1 -1)))
((< zmin z x0)
(if (< xmin x x0)
(lp nz z zmax nx x xmax ny)
(if (< x xmin) -1 1)))
(else
(lp nz zmin zmax nx xmin xmax ny)))))
(lambda ()
(error "Map failed" x y)))))))
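Given which-way?, the find-invariant-curve used earlier is presumably a straightforward search on the initial momentum: the sign reported by which-way? tells which way to move the bracket. A sketch (the sign convention and argument order are assumptions; any root-bracketing search will do):

```scheme
(define (find-invariant-curve the-map rn theta0 Jmin Jmax eps)
  ;; Narrow [Jmin, Jmax] until the bracket is smaller than eps,
  ;; keeping the invariant curve with rotation number rn inside.
  (let loop ((Jmin Jmin) (Jmax Jmax))
    (let ((J (/ (+ Jmin Jmax) 2)))
      (if (< (- Jmax Jmin) eps)
          J
          (if (negative? (which-way? rn theta0 J the-map))
              (loop J Jmax)
              (loop Jmin J))))))
```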
Figure 4.21 Here is a small portion of the same invariant curve shown
in figure 4.20. The curve is magnified by 2π × 107 . We see that even at
this magnification the points appear to lie on a line. We also see that
the visitation frequency of points is highly nonuniform.
tation number will not exist. Indeed, if the invariant set persists
with the given rotation number it will have an infinite number of
holes (because it has an irrational winding number). Such a set is
sometimes called a cantorus.
5.1 Point Transformations
p′ = ∂2L′(t, q′, v′)
p = ∂2L(t, q, v)
= ∂2L(t, F(t, q′), ∂0F(t, q′) + ∂1F(t, q′)v′). (5.4)
(t, q, p) = C(t, q′, p′)
= (t, F(t, q′), p′ (∂1F(t, q′))⁻¹). (5.6)
H′(t, q′, p′) = p′v′ − L′(t, q′, v′)
= (p ∂1F(t, q′)) ((∂1F(t, q′))⁻¹ (v − ∂0F(t, q′)))
− L(t, q, v)
= pv − L(t, q, v) − p ∂0F(t, q′)
= H(t, q, p) − p ∂0F(t, q′), (5.7)
using relations (5.1) and (5.5) in the second step. Fully expressed
in terms of the transformed coordinates and momenta the trans-
formed Hamiltonian is
1
Solving for p in terms of p0 involves multiplying equation (5.3) on the right by
(∂1 F (t, q 0 ))−1 . This inverse is the structure that when multiplying ∂1 F (t, q 0 )
on the right gives an identity structure. Structures representing linear trans-
formations may be represented in terms of matrices. In this case, the matrix
representation of the inverse structure is the inverse matrix of the matrix
representing the given structure.
2
In chapter 1 the transformation C takes a local tuple in one coordinate system
and gives a local tuple in another coordinate system. In this chapter C is a
phase-space transformation.
v = ∂1 F (t, q 0 )v 0 . (5.9)
H(t, q, p) = pv − L(t, q, v)
= p0 v 0 − L0 (t, q 0 , v 0 )
= H 0 (t, q 0 , p0 ). (5.11)
3
The velocities and the momenta are dual geometric objects with respect to
time-independent point transformations. The velocities comprise a vector field
on the configuration manifold, and the momenta comprise a covector field on
the configuration manifold. The invariance of the inner product pv under point
transformations provides the motivation for the use of superscripts for velocity
components and subscripts for momentum components in our notation.
V(r) + p_r²/(2m) + p_φ²/(2mr²).
There are three terms. There is the potential energy, which de-
pends on the radius, there is the kinetic energy due to radial mo-
tion, and there is the kinetic energy due to tangential motion. As
expected, the angle φ does not appear and thus the angular mo-
mentum is a conserved quantity. By going to polar coordinates we
have decoupled one of the two degrees of freedom in the problem.
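This polar-coordinate Hamiltonian can be produced mechanically by composing a rectangular central-field Hamiltonian with the lifted point transformation (a sketch in the book's dialect; an H-central constructor for p²/(2m) + V(|q|) is assumed here):

```scheme
(print-expression
 ((compose (H-central 'm (literal-function 'V))
           (F->CT p->r))
  (up 't (up 'r 'phi) (down 'p_r 'p_phi))))
```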
σ = C ◦ σ0. (5.12)
Dσ = Ds H ◦ σ, (5.15)
Using σ = C ◦ σ 0 we find
Ds H ◦ C = DC · (Ds H 0 ). (5.19)
[Diagram: the phase-space transformation C carries the primed state path
to the unprimed one; the Hamiltonians H and H′ map states (q, p) and
(q′, p′) to ℝ, and the corresponding Hamiltonian state derivatives Ds H
and Ds H′ are related through DC.]
Ds H ◦ C = DC Ds (H ◦ C). (5.20)
The value of T̃ does not depend on its arguments, and for time-
independent transformations T̃ = DC · T̃, so the canonical condi-
tion becomes
Φ(A)(v) = A · v. (5.28)
Φ∗ (A)(p) = p · A. (5.29)
where
x = √(2I/α) sin θ (5.33)
p_x = √(2αI) cos θ. (5.34)
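As a program, the transformation (5.33–5.34) might be written as follows (a sketch consistent with the polar-canonical procedure used below; the book's actual definition may differ in detail):

```scheme
(define ((polar-canonical alpha) state)
  (let ((t (time state))
        (theta (coordinate state))
        (I (momentum state)))
    (let ((x (* (sqrt (/ (* 2 I) alpha)) (sin theta)))
          (p_x (* (sqrt (* 2 alpha I)) (cos theta))))
      (up t x p_x))))
```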
4
It is in principle impossible to determine, in general, whether two functions
are the same, but in this case, since Φ(DC(s)) is linear, this test is valid.
5
The shape of DH(s) is compatible with the shape of s: if they are
multiplied the result is a real number. The procedure compatible-shape takes
any structure and produces another structure that is guaranteed to multiply
with the given structure to produce a real number. The structure produced
is filled with unique real literals, so if the residual is zero then the functions
are the same.
(print-expression
((time-independent-canonical? (polar-canonical ’alpha))
(up ’t ’theta ’I)))
(up 0 0 0)
(print-expression
((time-independent-canonical? a-non-canonical-transform)
(up ’t ’theta ’p)))
H(t, x, p_x) = p_x²/(2m) + (1/2) k x². (5.35)
6
Actually, for I = 0 the transform is not well defined and so it is not composi-
tional canonical for that value. This transformation is “locally compositional
canonical” in that it is compositional canonical for nonzero values of I. We
will ignore this essentially topological problem.
7
The mysterious symbols such as x8102 are unique real literals introduced to
test functional equalities. That they appeared in a residual demonstrates that
the equality is invalid.
5.2.1 Time-Independent Canonical Transformations
Dx = px /m
Dpx = −kx, (5.36)
mD2 x + kx = 0. (5.37)
The solution is
x(t) = A sin(ωt + φ), (5.38)
where
ω = √(k/m) (5.39)
So
θ(t) = ωt + φ. (5.44)
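That the angle rate is the constant ω can be checked directly: substituting (5.33–5.34) into (5.35) with the choice α = √(km) = mω gives

```latex
H = \frac{2\alpha I\cos^2\theta}{2m} + \frac{k}{2}\,\frac{2I}{\alpha}\sin^2\theta
  = \omega I\cos^2\theta + \omega I\sin^2\theta
  = \omega I ,
```

so Hamilton's equations give DI = 0 and Dθ = ∂H/∂I = ω, consistent with (5.44).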
8
The derivative of a linear transformation is a constant function, independent
of the argument.
9
The procedure s->m takes three arguments: (s->m s* A s). The s* and s
specify the shapes of objects that multiply A on the left and right to give a
numerical value; these specify the basis.
(print-expression
((symplectic? (F->CT p->r))
(up ’t
(up ’r ’varphi)
(down 'p_r 'p_varphi))))
(matrix-by-rows (list 0 0 0 0 0)
(list 0 0 0 0 0)
(list 0 0 0 0 0)
(list 0 0 0 0 0)
(list 0 0 0 0 0))
J_nᵀ = J_n⁻¹ = −J_n. (5.48)
J_n = A J_n Aᵀ (5.49)
10
The qp submatrix of a (2n+1)-dimensional square matrix is the
2n-dimensional matrix obtained by deleting the first row and the first column
of the given matrix. This can be computed by:
(define (qp-submatrix m)
(m:submatrix m 1 (m:num-rows m) 1 (m:num-cols m)))
5.2.3 Time-Dependent Transformations
(define ((symplectic-transform? C) s)
(symplectic-matrix?
(qp-submatrix
(s->m (compatible-shape s)
((D C) s)
s))))
(define (symplectic-matrix? M)
(let ((2n (m:dimension M)))
(let ((J (symplectic-unit (quotient 2n 2))))
(- J (* M J (m:transpose M))))))
H 0 = H ◦ C + K, (5.50)
and
(define ((canonical-K? C K) s)
(let ((s* (compatible-shape s)))
(- (T-func s*)
(+ (* ((D C) s) (J-func ((D K) s)))
(((partial 0) C) s)))))
Rotating coordinates
Consider a time-dependent point transformation to uniformly ro-
tating coordinates:
q = R(Ω)(t, q 0 ), (5.55)
with components
x = x0 cos(Ωt) − y 0 sin(Ωt)
y = x0 sin(Ωt) + y 0 cos(Ωt). (5.56)
As a program this is
(define ((rotating n) state)
(let ((t (time state))
(q (coordinate state)))
(let ((x (ref q 0))
(y (ref q 1))
(z (ref q 2)))
(up (+ (* (cos (* n t)) x) (* (sin (* n t)) y))
(- (* (cos (* n t)) y) (* (sin (* n t)) x))
z))))
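The transformation C-rotating tested below is presumably the lift of this point transformation to phase space:

```scheme
(define (C-rotating Omega)
  (F->CT (rotating Omega)))
```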
(pe
((symplectic-transform? (C-rotating ’Omega))
(up ’t
(coordinate-tuple ’x ’y ’z)
(momentum-tuple ’px ’py ’pz))))
(matrix-by-rows (list 0 0 0 0 0 0)
(list 0 0 0 0 0 0)
(list 0 0 0 0 0 0)
(list 0 0 0 0 0 0)
(list 0 0 0 0 0 0)
(list 0 0 0 0 0 0))
The Poisson bracket can be written in terms of J̃:
q = A(t, q 0 , p0 ) (5.59)
p = B(t, q 0 , p0 ). (5.60)
δ^i_j = {A^i, B_j}
0 = {A^i, A^j}
0 = {B_i, B_j} (5.61)
where δ^i_j is one if i = j and zero otherwise. These are called the
fundamental Poisson brackets. If a transformation satisfies these
fundamental Poisson bracket relations then it is symplectic.
We have found that a time-dependent transformation is canon-
ical if its position-momentum part is symplectic and we modify
the Hamiltonian by the addition of a suitable K. We can rewrite
these conditions in terms of Poisson brackets. If the Hamiltonian
is
Exercise 5.8:
Fill in the details to show that the symplectic condition (5.31) is equiv-
alent to the fundamental Poisson brackets (5.61) and that the condition
on K (5.53) is equivalent to the Poisson bracket condition on K (5.63).
and so Dx is
Dx(t) = √(2I(t)/α) Dθ(t) cos θ(t) + DI(t) (1/√(2I(t)α)) sin θ(t). (5.65)
p_x(t)Dx(t) − I(t)Dθ(t)
= I(t)Dθ(t)(2 cos²θ(t) − 1) + DI(t) sin θ(t) cos θ(t). (5.66)
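The right-hand side of (5.66) is a total time derivative: since D(I sin θ cos θ) = DI sin θ cos θ + I Dθ (2cos²θ − 1), equation (5.66) can be summarized as

```latex
p_x(t)\,Dx(t) - I(t)\,D\theta(t) = D\bigl( I \sin\theta \cos\theta \bigr)(t) ,
```

and, using x² = (2I/α) sin²θ, the quantity I sin θ cos θ is just (α/2) x² cot θ, the generating function that appears in (5.164).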
{f ∘ C, g ∘ C}(s)
= (D(f ∘ C))(s) · J̃((D(g ∘ C))(s))
= ((Df ∘ C)(s) · DC(s)) · J̃(((Dg ∘ C)(s)) · DC(s))
= ((Df ∘ C)(s)) · J̃(((Dg ∘ C)(s)))
= ({f, g} ∘ C)(s), (5.67)
{f ◦ C, g ◦ C} = {f, g} ◦ C. (5.68)
Volume preservation
Consider a canonical transformation C. Let Ĉt be a function with
parameter t such that (q, p) = Ĉt (q 0 , p0 ) if (t, q, p) = C(t, q 0 , p0 ).
The function Ĉt maps phase space coordinates to alternate phase
space coordinates at a given time. Consider regions R in (q, p)
and R0 in (q 0 , p0 ) such that R = Ĉt (R0 ). The volume of region R0
is
V(R) = ∫_R 1 = ∫_{R′} det(DĈt). (5.69)
as can be seen by writing out the components. We use the fact that
Poisson brackets are invariant under canonical transformations
11
The ω form can also be written as a sum over degrees of freedom:
ω(ζ1, ζ2) = Σ_i (P_i(ζ2)Q^i(ζ1) − P_i(ζ1)Q^i(ζ2)).
Notice that the contributions for each i do not mix components from different
degrees of freedom.
5.3 Invariants of Canonical Transformations
where we have used the useful relation (5.74). The right-hand side
of equation (5.76) is
Now the left-hand side must equal the right-hand side for any f
and g, so the equation must also be true for arbitrary ζi0 of the
form:
The ζi0 are arbitrary incremental states with zero time components.
So we have proven that
for canonical C and incremental states ζi0 with zero time compo-
nents. Using equation (5.72) we have
(define zeta2
(up 0
(typical-object (coordinate a-polar-state))
(typical-object (momentum a-polar-state))))
Note that the time components of zeta1 and zeta2 are zero. We
evaluate the residual:
(print-expression
(let ((DCs ((D (F->CT p->r)) a-polar-state)))
(- (omega zeta1 zeta2)
(omega (* DCs zeta1) (* DCs zeta2)))))
0
That is, the sum of the projected areas on the canonical planes is
preserved by canonical transformations. Another way to say this
is
Σ_i ∫_{R(q^i,p_i)} dq^i dp_i = Σ_i ∫_{R′(q′^i,p′_i)} dq′^i dp′_i. (5.83)

[Figure: a phase-space region R and its preimage R′ (with R = Ĉt(R′)),
together with their projections R1, R2 and R′1, R′2 onto the canonical
planes (q^1, p_1) and (q^2, p_2).]
To see why this is true we first consider how the area of an incre-
mental parallelogram in phase space transforms under canonical
transformation. Let (∆q, ∆p) and (δq, δp) represent small incre-
ments in phase space, originating at (q, p). Consider the incre-
mental parallelogram with vertex at (q, p) with these two phase
space increments as edges. The sum of the areas of the canonical
projections of this incremental parallelogram can be written
Σ_i ∆A_i = Σ_i (∆q^i δp_i − ∆p_i δq^i). (5.84)
The right hand side is the sum of the areas on the canonical planes;
for each i we see the area of a parallelogram computed from the
components of the vectors defining its adjacent sides. Let ζ1 =
(0, ∆q, ∆p) and ζ2 = (0, δq, δp), then the sum of the areas of the
incremental parallelograms is just
Σ_i ∆A_i = ω(ζ1, ζ2), (5.85)
The canonical planes are disjoint except at the origin, so the pro-
jected areas only intersect in at most one point. Thus we may
5.4 Extended Phase Space
with
We have
The Lagrange equations for qe are satisfied for exactly the same
trajectories that satisfy the original Lagrange equations for q.
The extended system is subject to a constraint that relates the
time to the new independent variable. We assume the constraint
is of the form φ(τ ; qe , qt ; ve , vt ) = qt − f (τ ) = 0. The constraint is
a holonomic constraint involving the coordinates and time, so we
can incorporate this constraint by augmenting the Lagrangian:12
Pe (τ ; qe , qt , λ; ve , vt , vλ ) = ∂2,0 L0e (τ ; qe , qt , λ; ve , vt , vλ )
= ∂2 L(qt , qe , ve /vt )
= P(qt , qe , ve /vt ) (5.93)
Pt (τ ; qe , qt , λ; ve , vt , vλ ) = ∂2,1 L0e (τ ; qe , qt , λ; ve , vt , vλ )
12
We augment the Lagrangian with the total time derivative of the constraint
so that the Legendre transform will be well defined.
∂2 Le (τ ; qe , qt ; ve , vt ) · (ve , vt ) = Le (τ ; qe , qt ; ve , vt ), (5.96)
∂2 L0e (τ ; qe , qt , λ; ve , vt , vλ ) · (ve , vt , vλ )
= ∂2 Le (τ ; qe , qt ; ve , vt ) · (ve , vt ) + vλ vt + (vt − Df (τ ))vλ
= Le (τ ; qe , qt ; ve , vt ) + vλ vt + (vt − Df (τ ))vλ . (5.97)
He0 (τ ; qe , qt , λ; pe , pt , pλ ) = vλ vt
= (pt + H(qt , qe , pe ))(pλ + Df (τ )). (5.98)
We have used the fact that at corresponding states the momenta
have the same values, so on paths pe = p ∘ t, and
13
Once we have made this reduction, taking pλ to be zero, we can no longer
perform a Legendre transform back to the extended Lagrangian system; we
cannot solve for pt in terms of vt . However, the Legendre transform in the
extended system from He0 to L0e , with associated state variables, is well defined.
14
If f is strictly increasing then Df is never zero.
r0 = r (5.109)
θ0 = θ − Ωt (5.110)
p0r = pr (5.111)
p0θ = pθ (5.112)
with
He (τ ; r, θ, t; pr , pθ , pt ) = H(t; r, θ; pr , pθ ) + pt
= f (r, θ − Ωt, pr , pθ ) + pt (5.114)
t0 = t (5.115)
pt = −Ωp0θ + p0t . (5.116)
15
Actually, the traditional Jacobi constant is C = −2H 0 .
and
∮_{∂R} (Σ_{i=0}^{n−1} p_i dq^i − E dt) = ∮_{∂R′} (Σ_{i=0}^{n−1} p′_i dq′^i − E′ dt′). (5.122)
Relations (5.121) and (5.122) are two formulations of the Poincaré-
Cartan integral invariant.
q_r^i = q^i ∘ τ
p_{r,i} = p_i ∘ τ,
and thus
Note that in the reduced phase space we will have indices for the
structured variables in the range 0 . . . n − 1 whereas in the original
phase space the indices are in the range 0 . . . n. We will show that
Hr is an appropriate Hamiltonian for the given dynamical system
in the reduced phase space. To compute Hamilton’s equations we
must expand the implicit definition of Hr . We define an auxiliary
function
∂0 g = (∂0 f )n − (∂1 f )n ∂0 Hr = 0
(∂1 g)i = (∂0 f )i − (∂1 f )n (∂1 Hr )i = 0
(∂2 g)i = (∂1 f )i − (∂1 f )n (∂2 Hr )i = 0, (5.130)
Dq_r^i(x) = Dq^i(τ(x)) / Dq^n(τ(x))
= (∂2H(τ(x), q(τ(x)), p(τ(x))))^i / (∂2H(τ(x), q(τ(x)), p(τ(x))))_n
= (∂2H_r(x, q_r(x), p_r(x)))^i (5.133)
Dp_{r,i}(x) = Dp_i(τ(x)) / Dq^n(τ(x))
= −(∂1H(τ(x), q(τ(x)), p(τ(x))))_i / (∂2H(τ(x), q(τ(x)), p(τ(x))))_n
= −(∂1H_r(x, q_r(x), p_r(x)))_i. (5.134)
H(t; r, φ; p_r, p_φ) = p_r²/(2m) + p_φ²/(2mr²) + V(r) (5.135)
There are two degrees of freedom and the Hamiltonian is time-
independent. Thus the energy, the value of the Hamiltonian,
is conserved on realizable paths. Let’s forget about time and
reparametrize this system in terms of the orbital radius r.16 To
do this we solve
H(t; r, φ; pr , pφ ) = E (5.136)
for pr , obtaining
H′(r; φ; p_φ) = −p_r = −(2m(E − V(r)) − p_φ²/r²)^(1/2) (5.137)
which is the Hamiltonian in the reduced phase space.
Hamilton’s equations are now quite simple:
dφ/dr = ∂H′/∂p_φ = (p_φ/r²) (2m(E − V(r)) − p_φ²/r²)^(−1/2) (5.138)
dp_φ/dr = −∂H′/∂φ = 0. (5.139)
We see that pφ is independent of r (as it was with t), so for any
particular orbit we may define a constant angular momentum L.
Thus our problem ends up as a simple quadrature:
φ(r) = ∫^r (L/r²) (2m(E − V(r)) − L²/r²)^(−1/2) dr + φ0. (5.140)
16
We could have chosen to reparametrize in terms of φ, but then both pr
and r would occur in the resulting time-independent Hamiltonian. The path
we have chosen takes advantage of the fact that φ does not appear in our
Hamiltonian, so pφ is a constant of the motion. This structure suggests that
to solve this kind of problem we need to look ahead, as in playing chess.
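As a concrete instance (a standard result, not worked in the surrounding text): for the Kepler potential V(r) = −μ/r, the substitution u = 1/r reduces the quadrature (5.140) to an inverse cosine, and the orbit is a conic section

```latex
r(\phi) = \frac{p}{1 + e\cos(\phi - \phi_0)},
\qquad p = \frac{L^2}{m\mu},
```

where for an ellipse the semi-latus rectum p is the p = a(1 − e²) of (5.146).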
p = a(1 − e²) (5.146)
p = ∂1 F1 (t, q, q 0 ) (5.148)
p0 = −∂2 F1 (t, q, q 0 ) (5.149)
H 0 (t, q 0 , p0 ) − H(t, q, p) = ∂0 F1 (t, q, q 0 ). (5.150)
then
q 0 = A(t, q, p) (5.152)
p0 = −∂2 F1 (t, q, A(t, q, p)). (5.153)
q = B(t, q 0 , p0 ) (5.155)
p = ∂1 F1 (t, B(t, q 0 , p0 ), q 0 ). (5.156)
px = ∂1 F1 (t, x, θ) (5.159)
I = −∂2 F1 (t, x, θ). (5.160)
Using the relations (5.157) and (5.158), which specify the canoni-
cal transformation, the first equation (5.159) can be rewritten
I = −∂2F1(t, x, θ) = (α/2) (x²/sin²θ) − ∂1φ(t, θ), (5.163)
but we see that if we set φ = 0 the desired relations are recovered.
So the generating function
F1(t, x, θ) = (α/2) x² cot θ (5.164)
generates the polar-canonical transformation. This shows that
this transformation is canonical.
px = ∂1 F1 (t, x, y)
py = −∂2 F1 (t, x, y) (5.165)
−(∂2 H 0 )j (∂1 (∂2 F1 )j )i = (∂1 H)i + (∂2 H)j (∂1 (∂1 F1 )j )i + (∂1 ∂0 F1 )i
(∂1 H 0 )i − (∂2 H 0 )j (∂2 (∂2 F1 )j )i = (∂2 H)j (∂2 (∂1 F1 )j )i + (∂2 ∂0 F1 )i
(5.168)
17
Here we use indices to select particular components of structured objects.
If an index symbol appears both as a superscript and as a subscript in an
expression, the value of the expression is the sum over all possible values of the
index symbol of the designated components (Einstein summation convention).
Thus, for example, if q̇ and p are of dimension n then the indicated product
p_i q̇^i is to be interpreted as Σ_{i=0}^{n−1} p_i q̇^i.
18
A structure is non-singular if the determinant of the matrix representation
of the structure is non-zero.
and let γ10 and γ20 be two paths with the same endpoints. Then
G_t(γ′2) − G_t(γ′1) = ∮_{∂R} Σ_i p_i dq^i − ∮_{∂R′} Σ_i p′_i dq′^i
= 0. (5.177)
p = fp (t, q, q 0 )
p0 = fp0 (t, q, q 0 ) (5.181)
The function F1 has the same value as F but has different argu-
ments. We will show that this F1 is in fact the generating function
for canonical transformations introduced in section 5.6. Let’s be
explicit about the definition of F1 in terms of a line integral
The two line integrals can be combined into this one because they
are both expressed as integrals along a curve in (q, q 0 ).
We can use the path independence of F1 to compute the par-
tial derivatives of F1 with respect to particular components and
19
Point transformations are not in this class: we cannot solve for the momenta
in terms of the positions for point transformations, because for a point trans-
formation the primed and unprimed coordinates can be deduced from each
other, so there is not enough information in the coordinates to deduce the
momenta.
and
These are just the configuration and momentum parts of the gen-
erating function relations for canonical transformation. So start-
ing with a canonical transformation, we can find a generating
function that gives the coordinate-momentum part of the trans-
formation through its derivatives.
Starting from a general canonical transformation, we have con-
structed an F1 generating function from which the canonical trans-
20
Let F be defined as the path-independent line integral
F(x) = ∫_{x0}^{x} Σ_i f_i(x) dx^i + F(x0);
then
∂_i F(x) = f_i(x).
The partial derivatives of F do not depend on the constant point x0 or the
path from x0 to x, so we can choose a path that is convenient for evaluating
each partial derivative. Let
H(x)(∆x^i) = F(x^0, . . . , x^i + ∆x^i, . . . , x^{n−1}) − F(x^0, . . . , x^i, . . . , x^{n−1}).
The partial derivative of F with respect to the ith component of x is
∂_i F(x) = D(H(x))(0).
The function H is defined by the line integral
H(x)(∆x^i) = ∫_{(x^0,...,x^i,...,x^{n−1})}^{(x^0,...,x^i+∆x^i,...,x^{n−1})} Σ_j f_j(x) dx^j
= ∫ f_i(x) dx^i,
where the second equality follows because the line integral is along the coordinate
direction x^i. This is now an ordinary integral, so
∂_i F(x) = f_i(x).
The minus sign arises because by flipping the axes we are travers-
ing the area in the opposite sense. Repeating the argument just
given, we can define a function
F(t, q′, p′) − F(t, q′0, p′0) = ∫_{γ=C(t,γ′)} Σ_i p_i dq^i + ∫_{γ′} Σ_i q′_i dp′_i,  (5.187)

q′ = f_{q′}(t, q, p′)
p = f_p(t, q, p′)  (5.188)
and define
and
21
There may be some singular cases and topological problems that prevent
this from being rigorously true.
as are F1 and F2
p = ∂1F2(t, q, p′).  (5.200)

p = ∂1F1(t, q, q′)  (5.202)
p′ = −∂2F1(t, q, q′)  (5.203)
H′(t, q′, p′) − H(t, q, p) = ∂0F1(t, q, q′)  (5.204)

p = ∂1F2(t, q, p′)  (5.205)
q′ = ∂2F2(t, q, p′)  (5.206)
H′(t, q′, p′) − H(t, q, p) = ∂0F2(t, q, p′)  (5.207)
and
22
The various generating functions are traditionally known by the names: F1 ,
F2 , F3 , and F4 . Please don’t blame us.
5.6.3 Classes of Generating Functions 369
The relations between the coordinates and the momenta are the
same as before. We also have
with
so
F1(t, q, q′) = F2(t, q, p′) − p′q′
            = p′S(t, q) − p′q′
            = 0.  (5.227)

x = r cos θ  (5.228)
y = r sin θ.

(x, y) = ∂2F2(t; r, θ; px, py) = (r cos θ, r sin θ)
[pr, pθ] = ∂1F2(t; r, θ; px, py) = [px cos θ + py sin θ, −px r sin θ + py r cos θ].  (5.230)
r′ = r
θ′ = θ − Ωt,  (5.232)

r′ = r
θ′ = θ − Ωt
pr = p′r
pθ = p′θ,  (5.234)
which show that the momenta are the same in both coordinate
systems. However, here the Hamiltonian is not a simple composi-
tion:
Two-body problem
In this example we illustrate how canonical transformations can
be used to eliminate some of the degrees of freedom, leaving an
essential problem with fewer degrees of freedom.
Suppose only certain combinations of the coordinates appear in
the Hamiltonian. We make a canonical transformation to a new
set of phase-space coordinates such that these combinations of
the old phase space coordinates are some of the new phase space
coordinates. We choose other independent combinations of the
coordinates to complete the set. The advantage is that these other
independent coordinates do not appear in the new Hamiltonian,
so the momenta conjugate to them are conserved quantities.
Let’s see how this idea lets us reduce the problem of two gravi-
tating bodies to the simpler problem of the relative motion of the
two bodies, and in the process discover that the momentum of the
center of mass is conserved.
Consider the motion of two masses m1 and m2 , subject only to
a mutual gravitational attraction described by the potential V (r).
This problem has six degrees of freedom. The rectangular coor-
dinates of the particles are x1 and x2 , with conjugate momenta
p1 and p2 . Each of these is a structure of the three rectangular
components. The distance between the particles is r = kx1 − x2 k.
The Hamiltonian for the two-body problem is:
H(t; x1, x2; p1, p2) = p1²/(2m1) + p2²/(2m2) + V(r).  (5.236)
We do not need to further specify V at this point.
x = x2 − x1 (5.237)
and
1/M = a²/m1 + b²/m2.  (5.245)
We recognize µ as the usual “reduced mass.”
Notice that if the term proportional to pP were not present
then the x and X degrees of freedom would not be coupled at all,
and furthermore, the X part of the Hamiltonian would be just
the Hamiltonian of a free particle which is trivial to solve. The
condition that the “cross terms” disappear is
b/m2 − a/m1 = 0,  (5.246)
which is satisfied by
a = cm1 (5.247)
b = cm2 (5.248)
with
Hx(t, x, p) = p²/(2µ) + V(r)  (5.250)

and

HX(t, X, P) = P²/(2M).  (5.251)

The reduced mass is the same as before, and now

M = 1/(c²(m1 + m2)).  (5.252)
Notice that without further specifying c the problem has been
separated into the problem of determining the relative motion of
the two masses, and the problem of the other degrees of freedom.
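A quick numeric check (a Python sketch, not the book's Scheme) that with a = cm1 and b = cm2 the kinetic energy separates as claimed. The momentum relations p1 = −p + aP and p2 = p + bP are assumed here, consistent with a generating function of the form F2 = p(x2 − x1) + P(a x1 + b x2):

```python
import random

# With a = c*m1, b = c*m2, the kinetic energy p1^2/2m1 + p2^2/2m2 should
# equal p^2/2mu + P^2/2M, where p1 = -p + a*P and p2 = p + b*P (assumed
# momentum relations), mu is the reduced mass, and M is given by (5.252).
m1, m2, c = 3.0, 5.0, 0.7
a, b = c * m1, c * m2
mu = 1.0 / (1.0 / m1 + 1.0 / m2)      # reduced mass
M = 1.0 / (c * c * (m1 + m2))         # equation (5.252)

random.seed(1)
for _ in range(10):
    p, P = random.uniform(-1, 1), random.uniform(-1, 1)
    p1, p2 = -p + a * P, p + b * P
    ke_old = p1 * p1 / (2 * m1) + p2 * p2 / (2 * m2)
    ke_new = p * p / (2 * mu) + P * P / (2 * M)
    assert abs(ke_old - ke_new) < 1e-12   # cross terms cancel
```

The cross terms cancel exactly because b/m2 − a/m1 = c − c = 0, as in equation (5.246).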
We did not need to have a priori knowledge that the center of
and
V(t; x0, x1, ..., xn−1; p0, p1, ..., pn−1) = Σ_{i<j} fij(‖xi − xj‖),  (5.255)
for some constants m0i , and that the potential V can be written solely
in terms of the Jacobi coordinates x0i with indices i > 0.
c. Are there any other canonical transformations that isolate the center
of mass and leave the kinetic energy as a sum of squares of momenta?
Epicyclic motion
It is often useful to compose a sequence of canonical transforma-
tions to make up the transformation we need for any particular
mechanical problem. The transformations we have supplied are
especially useful as components in these computations.
We will illustrate the use of canonical transformations to learn
about planar motion in a central field. The strategy will be to
consider perturbations of circular motion in the central field. The
analysis will proceed by transforming to a rotating coordinate sys-
tem that rides on a circular reference orbit, and then to make ap-
proximations that restrict the analysis to orbits that differ from
the circular orbit only slightly.
Recall that in rectangular coordinates we could easily write a
Hamiltonian for the motion of a particle of mass m in a field
defined by a potential energy that is only a function of the distance
from the origin as follows:
H(t; x, y; px, py) = (px² + py²)/(2m) + V(√(x² + y²)).  (5.262)
x = r cos φ  (5.267)
y = r sin φ  (5.268)

H′(t; r, φ; pr, pφ) = pr²/(2m) + pφ²/(2mr²) + V(r).  (5.269)
We can now write Hamilton’s equations in these new coordinates,
and they are much more illuminating than the equations expressed
in rectangular coordinates:
Dr = pr/m  (5.270)
Dφ = pφ/(mr²)  (5.271)
Dpr = pφ²/(mr³) − DV(r)  (5.272)
Dpφ = 0.  (5.273)
r′ = r  (5.275)
φ′ = φ − Ωt  (5.276)
p′r = pr  (5.277)
p′φ = pφ.  (5.278)
Using the formulas developed in the last section we can now write
the new Hamiltonian directly:
H″(t; r′, φ′; p′r, p′φ) = p′r²/(2m) + p′φ²/(2mr′²) + V(r′) − p′φΩ.  (5.279)
We see that H 00 is not time dependent, and therefore it is con-
served, but it is not energy. Energy is not conserved in the moving
coordinate system, but what is conserved here is a new quantity
which combines the energy with the product of the angular mo-
mentum of the particle in the new frame and the angular velocity
of the frame. We will want to keep track of this term.
Next, we return to rectangular coordinates, but they are rotat-
ing with the reference circular orbit:
x′ = r′ cos φ′  (5.280)
y′ = r′ sin φ′  (5.281)
p′x = p′r cos φ′ − (p′φ/r′) sin φ′  (5.282)
p′y = p′r sin φ′ + (p′φ/r′) cos φ′.  (5.283)
The Hamiltonian is
ξ = x′ − R0  (5.285)
η = y′  (5.286)
pξ = p′x  (5.287)
pη = p′y.  (5.288)

H′′′′(t; ξ, η; pξ, pη) = (pξ² + pη²)/(2m) + Ω(ηpξ − (ξ + R0)pη)
                      + V(√((ξ + R0)² + η²)),  (5.289)

V(√((ξ + R0)² + η²)) = V(R0) + DV(R0)(ξ + η²/(2R0))
                     + D²V(R0) ξ²/2 + ···.  (5.292)
So the (negated) generalized forces are:
D³ξ + ω²Dξ = 0,  (5.301)
where
ω² = 3Ω² + D²V(R0)/m.  (5.302)
Thus we have a simple harmonic oscillator with frequency ω as
one of the components of the solution. The general solution has
three parts:

[ ξ(t) ]        [ 0 ]        [   1   ]        [     sin(ωt + ϕ0)     ]
[ η(t) ] = η0 [ 1 ] + ξ0 [ −2At ] + C0 [ (2Ω/ω) cos(ωt + ϕ0) ]   (5.303–5.305)

where

A = (Ω²m − D²V(R0)) / (4Ωm).  (5.306)
The constants η0 , ξ0 , C0 , and ϕ0 are determined by the initial
conditions. If C0 = 0 the particle of interest is on a circular trajec-
tory, but not necessarily the same one as the reference trajectory.
If C0 = 0 and ξ0 = 0 we have a “fellow traveler”, a particle in
the same circular orbit as the reference orbit, but with different
phase. If C0 = 0 and η0 = 0 we have a particle in a circular orbit
that is interior or exterior to the reference orbit and shearing away
from the reference orbit. The shearing is due to the fact that the
angular velocity for a circular orbit varies with the radius. The
constant A gives the rate of shearing at each radius. If both η0 = 0
and ξ0 = 0 but C0 ≠ 0 then we have "epicyclic motion". A particle
in a nearly circular orbit may be seen to move in an ellipse around
the circular reference orbit. The ellipse will be elongated in the
direction of circular motion by the factor 2Ω/ω and it will rotate
in the direction opposite the direction of the circular motion. The
initial phase of the epicycle is ϕ0 . Of course, any combination of
these solutions may exist.
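The geometry of the pure epicyclic component can be checked directly: sampling the C0 term of the solution shows the point staying on an ellipse whose along-track semi-axis is larger by the factor 2Ω/ω. A Python sketch with made-up parameter values (not from the text):

```python
import math

# Pure epicyclic component (eta0 = xi0 = 0, C0 != 0) of the solution:
# xi = C0 sin(omega t + phi0), eta = C0 (2 Omega/omega) cos(omega t + phi0).
# The point should stay on an ellipse with semi-axes C0 and 2*Omega*C0/omega.
Omega, omega, C0, phi0 = 1.0, 1.5, 0.1, 0.3
for k in range(100):
    t = 0.1 * k
    xi = C0 * math.sin(omega * t + phi0)
    eta = C0 * (2 * Omega / omega) * math.cos(omega * t + phi0)
    assert abs((xi / C0)**2 + (eta * omega / (2 * Omega * C0))**2 - 1.0) < 1e-12
```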
The epicyclic frequency ω and the shearing rate A are deter-
mined by the force law (the radial derivative of the potential en-
ergy). For a force law proportional to a power of the radius
F ∝ r^(1−n)  (5.307)

n:     0    1     2     3    4    5
A/Ω:   0   1/4   1/2   3/4   1   5/4
ω/Ω:   2   √3    √2    1    0   ±i
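The table follows from equations (5.302) and (5.306) once we note that a circular orbit with F ∝ r^(1−n) has DV(R0) = mΩ²R0, so D²V(R0) = (1 − n)mΩ². A small Python check (not from the text):

```python
import math

# For F ∝ r^(1-n): D^2 V(R0) = (1-n) m Omega^2 on a circular orbit.
# Substituting into (5.302) and (5.306) gives A/Omega = n/4 and
# omega/Omega = sqrt(4 - n).  (n = 5 gives imaginary omega; loop n = 0..4.)
m, Omega = 1.0, 1.0
for n in range(5):
    d2V = (1 - n) * m * Omega**2
    omega = math.sqrt(3 * Omega**2 + d2V / m)     # equation (5.302)
    A = (Omega**2 * m - d2V) / (4 * Omega * m)    # equation (5.306)
    assert abs(A / Omega - n / 4.0) < 1e-12
    assert abs(omega / Omega - math.sqrt(4.0 - n)) < 1e-12
```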
We can get some insight into the kinds of orbits that are pro-
duced by the epicyclic approximation by examining a few exam-
ples. For some force laws we have integer ratios of epicyclic fre-
quency to orbital frequency. In those cases we have closed orbits.
For an inverse-square force law (n = 3) we get elliptical orbits
with the center of the field at a focus of the ellipse. Figure 5.3
shows how an approximation to such an orbit can be constructed
by superposition of the motion on an elliptical epicycle with the
motion of the same frequency on a circle. If the force is propor-
tional to the radius (n = 0) we get a two-dimensional harmonic
oscillator. Here the epicyclic frequency is twice the orbital fre-
quency. Figure 5.4 shows how this yields elliptical orbits that are
centered on the source of the central force. An orbit is closed
when ω/Ω is a rational fraction. If the force is proportional to the
−3/4 power of the radius the epicyclic frequency is 3/2 the or-
bital frequency. This yields a 3-lobed pattern that can be seen
in figure 5.5. For other force laws the orbits predicted by this
analysis are multi-lobed patterns produced by precessing approx-
imate ellipses. Most of the cases have incommensurate epicyclic
and orbital frequencies, leading to orbits that do not close in finite
time.
The epicyclic approximation gives a very good idea of what ac-
tual orbits look like. Figure 5.6, drawn by numerical integration
of the orbit produced by integrating the original rectangular equa-
tions of motion for a particle in the field, shows the rosette-type
picture characteristic of incommensurate epicyclic and orbital fre-
quencies for an F = −r^(−2.3) force law.
We can directly compare a numerically integrated system with
one of our epicyclic approximations. For example the result of
numerically integrating our F ∝ r^(−3/4) system is very similar to
then the Lagrange equations of motion are the same. The gener-
alized coordinates used in the two Lagrangians are the same, but
the momenta conjugate to the coordinates are different. In the
usual way, define
and
So we have
q′ = ∂2F2(t, q, p′) = q  (5.316)
p = ∂1F2(t, q, p′) = p′ − ∂1G(t, q)  (5.317)
H′(t, q′, p′) = H(t, q, p) + ∂0F2(t, q, p′)
             = H(t, q, p) − ∂0G(t, q).  (5.318)
H(t, x, p) = p²/(2m) + V(x),  (5.320)

then the transformed Hamiltonian is

H′(t, x′, p′) = (p′ − ∂1G(t, x′))²/(2m) + V(x′) − ∂0G(t, x′).  (5.321)
We see that this transformation may be used to modify terms in
the Hamiltonian that are linear in the momenta. Starting from H
the transformation introduces linear momentum terms; starting
from H 0 the transformation eliminates the linear terms.
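One way to see that the momentum-shift transformation (5.316–5.317) is harmless is that it is a shear in phase space with unit Jacobian determinant, so it preserves phase-space area. A numeric Python sketch (not from the text) with a hypothetical G(t, q) = tq³:

```python
# The coordinate part of the transformation is the identity and the momentum
# is shifted by the q-derivative of G; such a shear has Jacobian determinant 1.
def d1G(t, q):
    return 3 * t * q * q        # partial of the made-up G(t, q) = t q^3

def transform(t, q, p):
    return q, p + d1G(t, q)     # q' = q, p' = p + d1G(t, q)

t, q, p, h = 0.5, 1.2, -0.3, 1e-6
# Jacobian of (q, p) -> (q', p') by central finite differences
dqp_dq = [(a - b) / (2 * h) for a, b in
          zip(transform(t, q + h, p), transform(t, q - h, p))]
dqp_dp = [(a - b) / (2 * h) for a, b in
          zip(transform(t, q, p + h), transform(t, q, p - h))]
det = dqp_dq[0] * dqp_dp[1] - dqp_dq[1] * dqp_dp[0]
assert abs(det - 1.0) < 1e-6
```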
We illustrate the use of this transformation with the driven
pendulum. The Hamiltonian for the driven pendulum was derived
automatically in section 3.1.1. We repeat the result here (cleaned
up a bit)
H(t, θ, pθ) = pθ²/(2ml²) − glm cos θ
            + gm ys(t) − (pθ/l) sin θ Dys(t) − (m/2)(cos θ)²(Dys(t))²,  (5.322)
where ys is the drive function. The Hamiltonian is rather messy,
and includes a term that is linear in the angular momentum with
a coefficient that depends on both the angular coordinate and the
time. Let’s see what happens if we apply our transformation to
the problem to eliminate the linear term. We can identify the
transformation function G by requiring that the linear term in
momentum is killed:
H′(t, θ, p′θ) = (p′θ)²/(2ml²) − ml(g + D²ys) cos θ.  (5.326)
So we have found, by a straightforward canonical transformation,
a Hamiltonian for the driven pendulum with the rather simple
form of a pendulum with gravitational acceleration that is mod-
ified by the acceleration of the pivot. It is, in fact, the Hamilto-
nian that corresponds to the alternate form of the Lagrangian for
the driven pendulum we found earlier by inspection (see equation
1.120). Here the derivation is by a simple canonical transforma-
tion, motivated by a desire to eliminate unwanted terms that are
linear in the momentum.
Show that these transformations are just the point transformations, and
that the corresponding F1 is zero.
b. Other linear canonical transformations can be generated by
F1(t; x1, x2; x′1, x′2) = x′1 a x1 + x′1 b x2 + x′2 c x1 + x′2 d x2.

F1(t; x1, x2; x′1, x′2) = x′1 a x1 + x′1 b x2 + x′2 c x1 + x′2 d x2,
and the general parallelogram, with a vertex at the origin and with
adjacent sides starting at the origin and extending to the phase-space
points (x1a , x2a , p1a , p2a ) and (x1b , x2b , p1b , p2b ).
a. Find the area of the given parallelogram, and find the area of the
target parallelogram under the canonical transformation. Notice that
the area of the parallelogram is not preserved.
b. Find the areas of the projections of the given parallelogram, and the
areas of the projections of the target under canonical transformation.
Show that the sum of the areas of the projections on the action-like
planes is preserved.
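The invariance asserted in part b can be checked numerically for a simple linear symplectic map. The Python sketch below (not from the text) uses a point transformation q′ = Sq, p′ = (S⁻¹)ᵀp, for which the sum of oriented projected areas on the canonical planes is exactly preserved:

```python
# omega(z1, z2) = sum_i (q1_i p2_i - p1_i q2_i) is the sum of oriented
# projected areas on the (q_i, p_i) planes; a symplectic map leaves it fixed.
s11, s12, s21, s22 = 2.0, 1.0, 0.5, 1.5
det = s11 * s22 - s12 * s21
# entries of the inverse-transpose of S
it11, it12, it21, it22 = s22 / det, -s21 / det, -s12 / det, s11 / det

def apply(z):
    q1, q2, p1, p2 = z
    return (s11 * q1 + s12 * q2, s21 * q1 + s22 * q2,
            it11 * p1 + it12 * p2, it21 * p1 + it22 * p2)

def omega(z1, z2):
    return (z1[0] * z2[2] - z1[2] * z2[0]) + (z1[1] * z2[3] - z1[3] * z2[1])

z1 = (0.3, -1.0, 0.7, 0.2)
z2 = (1.1, 0.4, -0.5, 0.9)
assert abs(omega(apply(z1), apply(z2)) - omega(z1, z2)) < 1e-12
```

Note that the individual projected areas change under the map; only their sum is invariant.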
that is obtained in the following way. Let σ(t) = (t, q̄(t), p̄(t)) be a
solution of Hamilton’s equations. The transformation C∆ satisfies
23
Many texts further muddy the matter by introducing an unjustified independence
argument here: they argue that because q̇ and q̇′ are independent the
relations (5.148–5.150) must hold. This is silly, because p and p′ are functions
of q̇ and q̇′, respectively, so there are implied dependencies of the velocities
in many places, so it is unjustified to separately set pieces of this equation to
zero. However, notwithstanding this problem, the derivation of the fact that
the transformation is canonical is fallacious.
or, equivalently,
q′ = q̄(t′)
p′ = p̄(t′).  (5.334)

The value (t, q, p) of C∆(t′, q′, p′) is then (t′ + ∆, q̄(t′ + ∆), p̄(t′ + ∆)).
Time evolution is canonical if the transformation C∆ is symplec-
tic and if the Hamiltonian transforms in an appropriate manner.
The transformation C∆ is symplectic if the bilinear antisymmet-
ric form ω is invariant (see equation 5.73) for a general pair of
linearized state variations with zero time component.
Let ζ′ be an increment with zero time component of the state
(t′, q′, p′). The linearized increment in the value of C∆(t′, q′, p′) is
24
Our theorems about which transformations are canonical are still valid, be-
cause they only required that the derivative of the independent variable be 1.
5.7 Time Evolution is Canonical 393
DA(t) = 0. (5.341)
25
Partial derivatives of structured arguments do not generally commute, so
this deduction is not as simple as it may appear. It is helpful to introduce
component indices and consider the equation componentwise.
C′∆ = C∆ ∘ S−∆,  (5.342)
26
The transformation S∆ is an identity on the qp components, so it is symplec-
tic. Although it adjusts the time, it is not a time-dependent transformation
in that the qp components do not depend upon the time. Thus, if we adjust
the Hamiltonian by composition with S∆ we have a canonical transformation.
or

H′∆ = H ∘ S∆.  (5.348)

Notice that if H is time independent then H′∆ = H.
Let us assume we have a procedure ((C delta-t) state) that
implements a time-evolution transformation of the state state
with time interval delta-t.
We can get a procedure ((Cp delta-t) state) that implements C′∆
from the ((C delta-t) state) that implements C∆ using the procedure
(define ((C->Cp C) delta-t)
(compose (C delta-t) (shift-t (- delta-t))))
For C∆ the Hamiltonian is unchanged. For C′∆ the Hamiltonian is
time-shifted.
The general solution for a given initial state (t0, q0, p0) evolved for a time
∆ is

[ q(t0 + ∆)    ]   [  cos ω0∆   sin ω0∆ ] [ q0 − α′ cos ωt0           ]
[ p(t0 + ∆)/ω0 ] = [ −sin ω0∆   cos ω0∆ ] [ (1/ω0)(p0 + α′ω sin ωt0) ]

                     [  α′ cos ω(t0 + ∆)        ]
                   + [ −α′(ω/ω0) sin ω(t0 + ∆)  ]

where α′ = α/(ω0² − ω²).
a. Fill in the details of the procedure
(define (((C alpha omega omega0) delta-t) state)
... )
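A Python sketch of such a procedure (not the book's Scheme, and assuming the driven-oscillator Hamiltonian H = p²/2 + ω0²q²/2 − αq cos ωt, which the exercise does not restate here), checked against Hamilton's equations by finite differences:

```python
import math

# Closed-form flow of the driven oscillator, built from the general solution:
# rotate the deviation from the particular solution alpha' cos(omega t).
def C(alpha, omega, omega0):
    alphap = alpha / (omega0**2 - omega**2)   # alpha'
    def flow(dt):
        def step(state):
            t0, q0, p0 = state
            a = q0 - alphap * math.cos(omega * t0)
            b = (p0 + alphap * omega * math.sin(omega * t0)) / omega0
            cs, sn = math.cos(omega0 * dt), math.sin(omega0 * dt)
            q = cs * a + sn * b + alphap * math.cos(omega * (t0 + dt))
            p = omega0 * (-sn * a + cs * b) \
                - alphap * omega * math.sin(omega * (t0 + dt))
            return (t0 + dt, q, p)
        return step
    return flow

# check Dq = p and Dp = -omega0^2 q + alpha cos(omega t) at the initial state
alpha, omega, omega0 = 0.5, 2.0, 3.0
t0, q0, p0 = 0.4, 1.0, -0.2
h = 1e-6
_, qp, pp = C(alpha, omega, omega0)(h)((t0, q0, p0))
_, qm, pm = C(alpha, omega, omega0)(-h)((t0, q0, p0))
assert abs((qp - qm) / (2 * h) - p0) < 1e-6
assert abs((pp - pm) / (2 * h)
           - (-omega0**2 * q0 + alpha * math.cos(omega * t0))) < 1e-6
```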
D(F ◦ σ) = ∂0 F ◦ σ + {F, H} ◦ σ
Show, by writing a short program to test it, that this is true of the
function implemented by (C delta) for the driven oscillator. Why is
this interesting?
d. Verify that both C and Cp are symplectic using symplectic?.
e. Use the procedure canonical? to verify that both C and Cp are canon-
ical with the appropriate transformed Hamiltonian.
sum of
the oriented projected areas for R′. We will show that
Σ_i Ai = Σ_i A′i, and thus the Poincaré integral invariant is pre-
served by time evolution. By showing that the Poincaré integral
invariant is preserved we will have shown that the qp part of the
transformation generated by time evolution is symplectic. From
this we can construct canonical transformations from time evolu-
tion as before.
In the extended phase space we see that the evolution sweeps
out a cylindrical volume with endcaps the regions R′ and R, each
at a fixed time. Let R″ be the two-dimensional region swept out
by the trajectories that map the boundary of region R′ to the
27
By Stokes’ theorem we may compute the area of a region by a line integral
around the boundary of the region. We define the positive sense of the area
to be the area enclosed by a curve that is traversed in a counterclockwise
direction, when drawn on a plane with the coordinate on the abscissa and the
momentum on the ordinate.
[Figure: time evolution in the extended phase space sweeps out a cylinder
along the time axis, with endcap R′ at (t′, q′, p′), endcap R at (t, q, p),
and lateral surface R″ swept out by the boundary trajectories.]
28
We can see this is the following way. Let γ be any closed curve in the
boundary. This curve divides the boundary into two regions. By Stokes’
theorem the integral invariant over both of these pieces can be written as a
line integral along this boundary, but they have opposite signs, because γ is
traversed in opposite directions to keep the surface on the left. So we conclude
that the integral invariant over the entire surface is zero.
5.7.1 Another View of Time Evolution 399
The ω form applied to these incremental states that form the edges
of this parallelogram gives the area of the parallelogram:
ω(ζ1, ζ2) = Q(ζ1)P(ζ2) − P(ζ1)Q(ζ2)
  = (∆q, 0) · (−∂1H(t, q, p)∆t, −∂0H(t, q, p)∆t)
  − (∆p, −∂1H(t, q, p)∆q − ∂2H(t, q, p)∆p) · (∂2H(t, q, p)∆t, ∆t)
  = 0.  (5.356)
the section this may take a different amount of time. Compute the
sum of the areas again for the mapped region. Again, all points
of the mapped region have the same q2 so the area on the (q2 , p2 )
plane is zero, and they continue to have the same energy so the
area on the (t, T ) plane is zero. So the area of the mapped re-
gion is again just the area on the surface of section, the (q1 , p1 )
plane. Time evolution preserves the sum of areas, so the area on
the surface of section is the same as the mapped area.
So surfaces of section preserve area provided that the section
points are entirely on a canonical plane. For example, for the
Hénon-Heiles surfaces of section we plotted py versus y when x = 0
with px ≥ 0. So for all section points the x coordinate has the
fixed value 0, the trajectories all have the same energy, and the
points accumulated are entirely in the (py , y) canonical plane. So
the Hénon-Heiles surfaces of section preserve area.
Recall that p and η are structures, and the product implies a sum
of products of components.
be the value of the action from t1 to t2 for path q̃(s). The deriva-
tive of the action along this parametric family of paths is 29
DS̃(s) = δ_{η̃(s)} S[q̃(s)]
      = (∂2L ∘ Γ[q̃(s)]) η̃(s) |_{t1}^{t2} − ∫_{t1}^{t2} (E[L] ∘ Γ[q̃(s)]) η̃(s),  (5.361)
where
29
Let f be a path dependent function, η̃(s) = Dq̃(s), and g(s) = f [q̃(s)]. The
variation of f at q̃(s) in the direction η̃(s) is δη̃(s) f [q̃(s)] = Dg(s).
5.8 Hamilton-Jacobi Equation 403
For a loop family of paths (such that q̃(s2 ) = q̃(s1 )), the differ-
ence of actions at the endpoints vanishes, so we deduce
∮_{γ2} Σ_i pi dq^i = ∮_{γ1} Σ_i pi dq^i,  (5.366)
where R_j^i are the regions in the ith canonical plane. We have found
that the time evolution preserves the integral invariants, thus time
evolution generates a canonical transformation.
q′ = ∂2F2(t, q, p′)  (5.368)
p = ∂1F2(t, q, p′)  (5.369)
H′(t, q′, p′) = H(t, q, p) + ∂0F2(t, q, p′).  (5.370)
and are able to solve for W then the problem is essentially solved.
In this case, the primed momenta are all constant, and the primed
positions are linear in time. This is an alternate form of the
Hamilton-Jacobi equation.
These forms are related. Suppose that we have a W that sat-
isfies the second form of the Hamilton-Jacobi equation (5.372).
Then the F2 constructed from W
so the primed momenta are the same in the two formulations. But
q′ = ∂2F2(t, q, p′)
   = ∂2W(t, q, p′) − DE(p′)t
   = q″ − DE(p′)t,  (5.375)
H(t, x, p) = p²/(2m) + kx²/2.  (5.377)
We form the Hamilton-Jacobi equation for this problem
x′ = ∂2W(t, x, p′)
   = ∫^x m DE(p′) / √(2m(E(p′) − kz²/2)) dz  (5.383)
with solution
for initial conditions x′0 and p′0. If we plug these expressions for
x′(t) and p′(t) into equation (5.385) we find

x(t) = √(2E(p′)/k) sin[ (1/DE(p′)) √(k/m) (DE(p′)t + x′0 − C(p′)) ]
     = √(2E(p′)/k) sin[ √(k/m) (t − t0) ]
     = A sin(ωt + φ),  (5.389)
5.8.1 Harmonic Oscillator 407
where the angular frequency is ω = √(k/m), the amplitude is
A = √(2E(p′)/k), and the phase is φ = −ωt0 = ω(x′0 − C(p′))/DE(p′).
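As a sanity check, the recovered solution indeed satisfies the oscillator equation of motion. A small numeric Python verification (not part of the text):

```python
import math

# Check that x(t) = A sin(omega t + phi), omega = sqrt(k/m), satisfies
# m D^2 x = -k x, using a central second difference.
m, k, A, phi = 2.0, 3.0, 0.7, 0.4
omega = math.sqrt(k / m)
x = lambda t: A * math.sin(omega * t + phi)
t, h = 1.1, 1e-4
d2x = (x(t + h) - 2 * x(t) + x(t - h)) / (h * h)
assert abs(m * d2x + k * x(t)) < 1e-5
```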
We can also use F2 = W − Et as the generating function. The
new Hamiltonian is zero, so both x0 and p0 are constant, but the
relationship between the old and new variables is
x′ = ∂2F2(t, x, p′)
   = ∂2W(t, x, p′) − DE(p′)t
   = ∫^x m DE(p′) / √(2m(E(p′) − kz²/2)) dz − DE(p′)t
   = DE(p′) √(m/k) sin⁻¹( √(k/(2E(p′))) x ) + C(p′) − DE(p′)t.  (5.390)
L′(t, x′, ẋ′) = ẋ′²/(2ω).  (5.400)
Of course, there may be additional properties that make one choice
more useful than others for particular applications.
H(t; x, y, z; px, py, pz) = p²/(2m) − µ/r,  (5.401)

where r² = x² + y² + z² and p² = px² + py² + pz². The Kepler problem
describes the relative motion of two bodies; it is also encountered
in the formulation of other problems involving orbital motion such
as the n-body problem.
We try a generating function of the form W (t; x, y, z; p0x , p0y , p0z ).
The Hamilton-Jacobi equation is then30
E(p′) = (1/2m) [ (∂1,0W(t; x, y, z; p′x, p′y, p′z))²
               + (∂1,1W(t; x, y, z; p′x, p′y, p′z))²
               + (∂1,2W(t; x, y, z; p′x, p′y, p′z))² ] − µ/r.  (5.402)
30
Remember that ∂1,0 means the derivative with respect to the first coordinate
position.
W(t; r, θ, φ; p′1, p′2, p′3) = f(r, θ, p′1, p′2, p′3) + p′3 φ,  (5.405)

then ∂1,2W(t; r, θ, φ; p′1, p′2, p′3) = p′3, and then φ does not appear
in the remaining equation for f :
Any function of the p0i could have been used as the coefficient of
φ in the generating function. This particular choice has the nice
feature that p03 is the z component of the angular momentum.
We can eliminate the θ dependence if we choose
f (r, θ, p01 , p02 , p03 ) = R(r, p01 , p02 , p03 ) + Θ(θ, p01 , p02 , p03 ) (5.407)
5.8.2 Kepler Problem 411
(∂0Θ(θ, p′1, p′2, p′3))² + (p′3)²/sin²θ = (p′2)².  (5.408)
We are free to choose the right-hand side to be any function of
the new momenta. This choice reflects the fact that the left-hand
side is non-negative. It turns out that p02 is the total angular
momentum. This equation for Θ can be solved by quadrature.
The remaining equation that determines R is
E(p′1, p′2, p′3) = (1/2m) [ (∂1,0R(r, p′1, p′2, p′3))² + (1/r²)(p′2)² ] − µ/r,  (5.409)
which also can be solved by quadrature.
Altogether the solution of the Hamilton-Jacobi equation reads
W(r, θ, φ, p′1, p′2, p′3) = ∫^r ( 2mE(p′1, p′2, p′3) + 2mµ/r − (p′2)²/r² )^(1/2) dr
                          + ∫^θ ( (p′2)² − (p′3)²/sin²θ )^(1/2) dθ
                          + p′3 φ.  (5.410)
E(p′1, p′2, p′3) = −mµ²/(2(p′1)²).  (5.415)

H′(t; q′1, q′2, q′3; p′1, p′2, p′3) = E(p′1, p′2, p′3) = −mµ²/(2(p′1)²).  (5.416)
Thus

q′1 = nt + q′10  (5.417)
q′2 = q′20  (5.418)
q′3 = q′30,  (5.419)
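The mean motion n = DE(p′1) appearing here can be checked against Kepler's third law: with p′1 = √(mµa) one finds n²a³ = µ/m. A numeric Python sketch (not from the text):

```python
import math

# E(p1') = -m mu^2 / (2 p1'^2); at p1' = sqrt(m mu a) the mean motion
# n = DE(p1') satisfies n^2 a^3 = mu/m (Kepler's third law in this notation).
m, mu, a = 2.0, 5.0, 3.0
p1 = math.sqrt(m * mu * a)
E = lambda p: -m * mu**2 / (2 * p * p)
h = 1e-6
n = (E(p1 + h) - E(p1 - h)) / (2 * h)   # numerical DE(p1')
assert abs(n * n * a**3 - mu / m) < 1e-6
```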
31
The canonical phase space coordinates can be written in terms of the pa-
rameters that specify an orbit. We will just summarize the results. For further
explanation see [33] or [35].
Assume we have a bound orbit, with semimajor axis a, eccentricity e,
inclination i, longitude of ascending node Ω, argument of pericenter ω,
and mean anomaly M. The three canonical momenta are p′1 = √(mµa),
p′2 = √(mµa(1 − e²)), and p′3 = √(mµa(1 − e²)) cos i. The first momentum
is related to the energy, the second momentum is the total angular momen-
tum, and the third momentum is the component of the angular momentum
5.8.3 F2 and the Lagrangian 413
DF̃2(t) = p(t)Dq(t) − H(t, q(t), p(t)) + ∂2F2(t, q(t), p′(t))Dp′(t)
        = L(t, q(t), Dq(t)) + ∂2F2(t, q(t), p′(t))Dp′(t).  (5.421)
For variations η that are not necessarily zero at the end times
and for realizable paths q the variation of the action is
δηS[q](t1, t2) = ∂2L ∘ Γ[q] η |_{t1}^{t2}
              = p(t2)η(t2) − p(t1)η(t1).  (5.424)
Comparing equations (5.424) and (5.425), and using the fact that
the variation η is arbitrary, we find
∂1 F̄ (t1 , q1 , t2 , q2 ) = −p1
∂3 F̄ (t1 , q1 , t2 , q2 ) = p2 . (5.427)
Therefore
∂0 F̄ (t1 , q1 , t2 , q2 ) = H(t1 , q1 , p1 )
= H(t1 , q1 , −∂1 F̄ (t1 , q1 , t2 , q2 )). (5.429)
And similarly
∂2 F̄ (t1 , q1 , t2 , q2 ) = −H(t2 , q2 , p2 )
= −H(t2 , q2 , ∂3 F̄ (t1 , q1 , t2 , q2 )). (5.430)
[Diagram: a commuting square. The time evolutions C∆,H and C∆,H′ carry
(t0, q0, p0) to (t, q, p) and (t0, q′0, p′0) to (t, q′, p′); the canonical
transformation C′ε,W connects the unprimed and primed states at each time.]

H′ = H ∘ C′ε,W.  (5.435)
We will only work with Lie transforms with generators that are
independent of the independent variable.
Lie transforms of functions
The value of a phase-space function F changes if its arguments
change. We define the function E′ε,W of a function F of phase-
space coordinates (t, q, p) by

E′ε,W F = F ∘ C′ε,W.  (5.436)

We say that E′ε,W F is the Lie transform of the function F.
In particular, the Lie transform advances the coordinate and
momentum selector functions Q = I1 and P = I2:

(E′ε,W Q)(t, q′, p′) = (Q ∘ C′ε,W)(t, q′, p′) = Q(t, q, p) = q
(E′ε,W P)(t, q′, p′) = (P ∘ C′ε,W)(t, q′, p′) = P(t, q, p) = p.  (5.437)
32
In general, the generator W could depend on its independent variable. If
so, it would be necessary to specify a rule that gives the initial value of the
independent variable for the W evolution. This rule may or may not depend
upon the time. If the specification of the independent variable for the W evo-
lution does not depend on time then the resulting canonical transformation
C′ε,W is time independent and the Hamiltonians transform by composition. If
the generator W depends on its independent variable and the rule for speci-
fying its initial value depends on time, then the transformation C′ε,W is time
dependent. In this case there may need to be an adjustment to the relation
between the Hamiltonians H and H′. In the extended phase space all these
complications disappear. There is only one case. We can assume all generators
W are independent of the independent variable.
5.9 Lie Transforms 419
In terms of E′ε,W we have the canonical transformation:

q = (E′ε,W Q)(t, q′, p′)
p = (E′ε,W P)(t, q′, p′)
H′ = E′ε,W H.  (5.440)

The identity I is

I = E′0,W.  (5.443)
W (τ ; r, θ; pr , pθ ) = pθ (5.446)
33
The set of transformations E′ε,W with the operation composition and with
parameter ε is a one-parameter Lie group.
Dr = 0
Dθ = 1
Dpr = 0
Dpθ = 0  (5.447)
r = r′
θ = θ′ + ε
pr = p′r
pθ = p′θ  (5.448)
Dx = −y
Dy = x
Dz = 0
Dpx = −py
Dpy = px
Dpz = 0  (5.450)
x = x′ cos ε − y′ sin ε
y = x′ sin ε + y′ cos ε
z = z′  (5.451)

px = p′x cos ε − p′y sin ε
py = p′x sin ε + p′y cos ε
pz = p′z  (5.452)
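Numerically integrating the flow (5.450) over a finite ε should reproduce the finite rotation (5.451–5.452). A small RK4 sketch in Python (not the book's Scheme):

```python
import math

# Flow generated by the z angular momentum: rotate (x, y) and (px, py).
def deriv(s):
    x, y, z, px, py, pz = s
    return (-y, x, 0.0, -py, px, 0.0)

def rk4(s, eps, steps=1000):
    h = eps / steps
    for _ in range(steps):
        k1 = deriv(s)
        k2 = deriv(tuple(a + 0.5 * h * b for a, b in zip(s, k1)))
        k3 = deriv(tuple(a + 0.5 * h * b for a, b in zip(s, k2)))
        k4 = deriv(tuple(a + h * b for a, b in zip(s, k3)))
        s = tuple(a + h / 6 * (b + 2 * c + 2 * d + e)
                  for a, b, c, d, e in zip(s, k1, k2, k3, k4))
    return s

eps = 0.8
s0 = (1.0, 0.5, 0.3, -0.2, 0.7, 0.1)
x, y, z, px, py, pz = rk4(s0, eps)
assert abs(x - (s0[0] * math.cos(eps) - s0[1] * math.sin(eps))) < 1e-9
assert abs(py - (s0[3] * math.sin(eps) + s0[4] * math.cos(eps))) < 1e-9
assert abs(z - 0.3) < 1e-12 and abs(pz - 0.1) < 1e-12
```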
Dx = px
Dy = py
Dpx = −a(x − y) − b(x + y)
Dpy = a(x − y) − b(x + y). (5.454)
34
We are playing fast-and-loose with differential operators here. In a formal
treatment it is essential to prove that these games are mathematically well-
defined and have appropriate convergence properties.
5.10 Lie Series 423
(f t)
(* ((D f) t) epsilon)
(* 1/2 (((expt D 2) f) t) (expt epsilon 2))
(* 1/6 (((expt D 3) f) t) (expt epsilon 3))
(* 1/24 (((expt D 4) f) t) (expt epsilon 4))
(* 1/120 (((expt D 5) f) t) (expt epsilon 5))
...
(series:for-each print-expression
(((exp (* ’epsilon D)) sin) 0)
6)
0
epsilon
0
(* -1/6 (expt epsilon 3))
0
(* 1/120 (expt epsilon 5))
...
It is often instructive to expand functions we usually don't
remember, such as f(x) = √(1 + x).
(series:for-each print-expression
(((exp (* ’epsilon D))
(lambda (x) (sqrt (+ x 1))))
0)
6)
1
(* 1/2 epsilon)
(* -1/8 (expt epsilon 2))
(* 1/16 (expt epsilon 3))
(* -5/128 (expt epsilon 4))
(* 7/256 (expt epsilon 5))
...
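The printed coefficients are just the binomial series for (1 + x)^(1/2). A Python check using exact rational arithmetic (not from the text):

```python
from fractions import Fraction

# Binomial series for (1 + x)^(1/2): c_k = (1/2 choose k), built by the
# recurrence c_{k+1} = c_k * (1/2 - k) / (k + 1).
coeffs = []
c = Fraction(1)
for k in range(6):
    coeffs.append(c)
    c = c * (Fraction(1, 2) - k) / (k + 1)
assert coeffs == [Fraction(1), Fraction(1, 2), Fraction(-1, 8),
                  Fraction(1, 16), Fraction(-5, 128), Fraction(7, 256)]
```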
Dynamics
Now to play this game with dynamical functions we want to pro-
vide a derivative-like operator that we can exponentiate, which
will give us the advance operator. The key idea is to write the
derivative of the function in terms of the Poisson bracket. Equa-
tion (3.75) shows how to do this in general:
so
Dⁿ(F ∘ σ) = D_Hⁿ F ∘ σ  (5.466)

L_H F = {F, H}  (5.468)
D_H = ∂0 + L_H  (5.469)
D(F ∘ σ) = L_H F ∘ σ  (5.470)

Df = (L_H F) ∘ σ  (5.472)
D²f = (L_H² F) ∘ σ  (5.473)
···  (5.474)
35
Our LH is a special case of what is referred to as a Lie derivative in differ-
ential geometry. The more general idea is that a vector field defines a flow.
The Lie derivative of an object with respect to a vector field gives the rate of
change of the object as it is dragged along with the flow. In our case the flow
is the evolution generated by Hamilton’s equations, with Hamiltonian H.
Let’s start by examining the beginning of the Lie series for the
position of a simple harmonic oscillator of mass m and spring
constant k. Note that we make up the Lie transform (series)
operator by passing it an appropriate Hamiltonian function and
an interval to evolve for. The resulting operator is then given the
position selector procedure. The Lie transform operator returns
the new position selector procedure, that when given the phase-
space coordinates x0 and p0 returns the position selected from the
result of advancing those coordinates by the interval dt.
36
Actually, we define the Lie derivative slightly differently, as follows:
(define ((Lie-derivative-procedure H) F)
(Poisson-bracket F H))
(define Lie-derivative
(make-operator Lie-derivative-procedure ’Lie-derivative))
The reason is that we want Lie-derivative to be an operator, which is just like
a function except that the product of operators is interpreted as composition
while the product of functions is the function computing the product of their
values.
37
The Lie-transform procedure here is also defined to be an operator, just
like Lie-derivative, but in this case the operator declaration is purely formal
because the exp procedure will produce a series, and we do not currently have
a way of iterating that process.
(series:for-each print-expression
(((Lie-transform (H-harmonic ’m ’k) ’dt)
coordinate)
(up 0 ’x0 ’p0))
6)
x0
(/ (* dt p0) m)
(/ (* -1/2 (expt dt 2) k x0) m)
(/ (* -1/6 (expt dt 3) k p0) (expt m 2))
(/ (* 1/24 (expt dt 4) (expt k 2) x0) (expt m 2))
(/ (* 1/120 (expt dt 5) (expt k 2) p0) (expt m 3))
...
We should recognize the terms of this series. We start with the
initial position x0. The first-order correction (p0/m) dt is due to
the initial velocity. Next we find an acceleration term (−k x0/2m) dt²
due to the restoring force of the spring at the initial position.
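These are just the Taylor coefficients of the familiar closed-form solution x(dt) = x0 cos(ω dt) + (p0/(mω)) sin(ω dt), where ω = √(k/m). A quick numerical check of this correspondence (a Python sketch, not the book's Scheme; the parameter values are arbitrary):

```python
import math

# Arbitrary (illustrative) oscillator parameters and interval.
m, k, x0, p0, dt = 2.0, 3.0, 0.7, -0.4, 0.01
w = math.sqrt(k / m)

# The six series terms printed above, transcribed directly.
terms = [x0,
         dt * p0 / m,
         -dt**2 * k * x0 / (2 * m),
         -dt**3 * k * p0 / (6 * m**2),
         dt**4 * k**2 * x0 / (24 * m**2),
         dt**5 * k**2 * p0 / (120 * m**3)]

# Closed-form solution with x(0) = x0, p(0) = p0; the truncated
# Lie series should agree with it to high order in dt.
closed_form = x0 * math.cos(w * dt) + p0 / (m * w) * math.sin(w * dt)
assert abs(sum(terms) - closed_form) < 1e-12
```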
The Lie transform is just as appropriate for showing us how the
momentum evolves over the interval:
(series:for-each print-expression
  (((Lie-transform (H-harmonic 'm 'k) 'dt)
    momentum)
   (up 0 'x0 'p0))
  6)
p0
(* -1 dt k x0)
(/ (* -1/2 (expt dt 2) k p0) m)
(/ (* 1/6 (expt dt 3) (expt k 2) x0) m)
(/ (* 1/24 (expt dt 4) (expt k 2) p0) (expt m 2))
(/ (* -1/120 (expt dt 5) (expt k 3) x0) (expt m 2))
...
(series:for-each print-expression
  (((Lie-transform (H-harmonic 'm 'k) 'dt)
    (H-harmonic 'm 'k))
   (up 0 'x0 'p0))
  6)

The leading term of the resulting series is the initial energy,
p0²/(2m) + (1/2) k x0², and every higher-order term vanishes: the
Hamiltonian generates its own evolution, so it is conserved along
the flow.

The Lie transform applies just as well to other systems. For a
particle moving in a central potential U, with polar coordinates
(r, φ) and conjugate momenta (p_r, p_φ), the first terms of the Lie
series for the coordinates are:

(up r_0 phi_0)
(up (/ (* dt p_r_0) m)
    (/ (* dt p_phi_0) (* m (expt r_0 2))))
(up
 (+ (/ (* -1/2 ((D U) r_0) (expt dt 2)) m)
    (/ (* 1/2 (expt dt 2) (expt p_phi_0 2))
       (* (expt m 2) (expt r_0 3))))
 (/ (* -1 (expt dt 2) p_phi_0 p_r_0)
    (* (expt m 2) (expt r_0 3))))
(up
 (+ (/ (* -1/6 (((expt D 2) U) r_0) (expt dt 3) p_r_0)
       (expt m 2))
    (/ (* -1/2 (expt dt 3) (expt p_phi_0 2) p_r_0)
       (* (expt m 3) (expt r_0 4))))
 (+ (/ (* 1/3 ((D U) r_0) (expt dt 3) p_phi_0)
       (* (expt m 2) (expt r_0 3)))
    (/ (* -1/3 (expt dt 3) (expt p_phi_0 3))
       (* (expt m 3) (expt r_0 6)))
    (/ (* (expt dt 3) p_phi_0 (expt p_r_0 2))
       (* (expt m 3) (expt r_0 4)))))
...
e^A e^B ≠ e^{A+B}.  (5.477)

[A, [B, C]] + [B, [C, A]] + [C, [A, B]] = 0,  (5.479)

An important identity is

e^C A e^{−C} = e^{Δ_C} A
             = A + [C, A] + (1/2)[C, [C, A]] + · · · .  (5.482)

We can check this term by term.
We see that

e^C A² e^{−C} = (e^C A e^{−C})(e^C A e^{−C}) = (e^C A e^{−C})²,  (5.483)

e^C e^A e^{−C} = e^{e^C A e^{−C}}.  (5.486)

e^{Δ_C} e^A = e^{e^{Δ_C} A}.  (5.487)
b. Consider the phase-space state functions that give the components
of the angular momentum in terms of rectangular canonical coordinates:

J_x(t; x, y, z; p_x, p_y, p_z) = y p_z − z p_y
J_y(t; x, y, z; p_x, p_y, p_z) = z p_x − x p_z
J_z(t; x, y, z; p_x, p_y, p_z) = x p_y − y p_x

Show that

[L_{J_x}, L_{J_y}] + L_{J_z} = 0.  (5.489)

c. Relate the Jacobi identity for operators to the Poisson-bracket Jacobi
identity.
5.12 Summary
Canonical transformations can be used to reformulate a problem
in coordinates that are easier to understand or that expose some
symmetry of the problem.
In this chapter we have investigated different representations
of a dynamical system. We have found that different representations
are equivalent if the coordinate-momentum part of the
transformation has a symplectic derivative and if the Hamiltonian
transforms in a specified way. If the phase-space transformation
is time-independent, then the Hamiltonian transforms by composition
with the phase-space transformation. The symplectic condition
can be equivalently expressed in terms of the fundamental
Poisson brackets. The Poisson bracket and the ω function are
invariant under canonical transformations. The invariance of ω
implies that the sum of the areas of the projections onto the fundamental
coordinate-momentum planes is preserved (the Poincaré integral
invariant) by canonical transformations.
We can formulate an extended phase space in which time is
treated as another coordinate. Time-dependent transformations
are simple in the extended phase space. In the extended phase
space the Poincaré integral invariant is the Poincaré-Cartan integral
invariant. We can also reformulate a time-independent problem
as a time-dependent problem with fewer degrees of freedom,
with one of the original coordinates taking on the role of time;
this is the reduced phase space.
A generating function is a real-valued function of the phase-space
coordinates and time that represents a canonical transformation
through its partial derivatives. We found that all canonical
transformations can be represented in this way.
H = H₀ + εH₁  (6.1)

∂₁H₀ = 0.  (6.3)

H′ = E_{ε,W} H = e^{εL_W} H

q = (E_{ε,W} Q)(t, q′, p′) = (e^{εL_W} Q)(t, q′, p′)
p = (E_{ε,W} P)(t, q′, p′) = (e^{εL_W} P)(t, q′, p′)
(t, q, p) = (E_{ε,W} I)(t, q′, p′) = (e^{εL_W} I)(t, q′, p′),  (6.4)

H′ = e^{εL_W} H
   = H₀ + εL_W H₀ + (1/2)ε²L_W²H₀ + · · ·
     + εH₁ + ε²L_W H₁ + · · ·
   = H₀ + ε(L_W H₀ + H₁) + ε²((1/2)L_W²H₀ + L_W H₁) + · · · .  (6.6)

The first-order term in ε is zero if W satisfies the condition

L_W H₀ + H₁ = 0,  (6.7)
H(t, θ, p) = p²/(2α) − εβ cos(θ),  (6.10)

with coordinate θ and conjugate angular momentum p, and where
α = ml² and β = mgl. The parameter ε allows us to scale the perturbation;
it is 1 for the actual pendulum. We divide the Hamiltonian
into the free-rotor Hamiltonian and the perturbation from
gravity:

H = H₀ + εH₁,  (6.11)
where

H₀(t, θ, p) = p²/(2α)
εH₁(t, θ, p) = −εβ cos θ.  (6.12)

{H₀, W} + H₁ = 0,  (6.13)

or

−(p/α) ∂₁W(t, θ, p) − β cos θ = 0.  (6.14)

So

W(t, θ, p) = −(αβ sin θ)/p,  (6.15)

where the arbitrary integration constant is ignored.
The transformed Hamiltonian is H′ = H₀ + O(ε²). If we can
ignore the ε² contributions, then the transformed Hamiltonian is
simply

H′(t, θ′, p′) = (p′)²/(2α),  (6.16)

with solutions

θ′ = θ′₀ + (p′₀/α)(t − t₀)
p′ = p′₀.  (6.17)

θ = (e^{εL_W} Q)(t, θ′, p′)
  = θ′ + ε{Q, W}(t, θ′, p′) + · · ·
  = θ′ + ε ∂₂W(t, θ′, p′) + · · ·
  = θ′ + ε (αβ sin θ′)/(p′)² + · · · .  (6.18)
Similarly,

p = p′ + ε (αβ cos θ′)/p′ + · · · .  (6.19)
Note that if the Lie series is truncated, it is not exactly a canonical
transformation; only the infinite series is canonical.
The initial values θ′₀ and p′₀ are determined from the initial
values of θ and p by the inverse Lie transformation:
θ′ = (e^{−εL_W} Q)(t, θ, p)
   = θ − ε (αβ sin θ)/p² + · · · ,  (6.20)

and

p′ = p − ε (αβ cos θ)/p + · · · .  (6.21)
Note that if we truncate the coordinate transformations after the
first-order terms in ε (or any finite order), then the inverse transformation
is not exactly the inverse of the transformation.
The approximate solution for given initial conditions (t₀, θ₀, p₀)
is obtained by finding the corresponding (t₀, θ′₀, p′₀) using the
transformations (6.20) and (6.21). Then the system is evolved
using the solutions (6.17). The phase-space coordinates of the
evolved point are transformed back to the original variables using
the transformations (6.18) and (6.19).
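The three steps can be sketched as follows (a Python sketch, not the book's Scheme; the parameter values and helper names are illustrative assumptions):

```python
import math

alpha, beta, eps = 1.0, 9.8, 0.1   # illustrative values

def to_primed(theta, p):
    """Inverse transformation (6.20), (6.21), truncated at first order."""
    return (theta - eps*alpha*beta*math.sin(theta)/p**2,
            p - eps*alpha*beta*math.cos(theta)/p)

def evolve(theta0p, p0p, t, t0=0.0):
    """Free-rotor solution (6.17) in the primed variables."""
    return (theta0p + p0p*(t - t0)/alpha, p0p)

def from_primed(thetap, pp):
    """Forward transformation (6.18), (6.19), truncated at first order."""
    return (thetap + eps*alpha*beta*math.sin(thetap)/pp**2,
            pp + eps*alpha*beta*math.cos(thetap)/pp)

def approx_solution(t, theta0, p0):
    th_p, p_p = to_primed(theta0, p0)
    return from_primed(*evolve(th_p, p_p, t))

# At t = 0 the round trip is the identity only up to O(eps^2),
# illustrating that the truncated transformations are not exact inverses.
th, p = approx_solution(0.0, 1.0, 5.0)
assert abs(th - 1.0) < 1e-2 and abs(p - 5.0) < 1e-2
assert (th, p) != (1.0, 5.0)
```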
We define the two parts of the pendulum Hamiltonian:

(define ((H0 alpha) state)
  (let ((ptheta (momentum state)))
    (/ (square ptheta) (* 2 alpha))))

(define ((H1 beta) state)
  (let ((theta (coordinate state)))
    (* -1 beta (cos theta))))
(1/2) p_θ²/α + (1/2) αβ²ε² sin²(θ)/p_θ²
Indeed, the order-ε term has been removed, and an order-ε² term
has been introduced.
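This can be spot-checked numerically. The sketch below (Python, not the book's code; the parameters and sample point are arbitrary) evaluates the Poisson brackets by central differences and confirms that the order-ε term {H₀, W} + H₁ vanishes while the order-ε² term (1/2){{H₀, W}, W} + {H₁, W} equals (1/2)αβ² sin²(θ)/p_θ²:

```python
import math

alpha, beta = 1.3, 2.7   # illustrative values

def H0(th, p): return p*p/(2*alpha)
def H1(th, p): return -beta*math.cos(th)
def W(th, p):  return -alpha*beta*math.sin(th)/p

def bracket(F, G, h=1e-5):
    """Poisson bracket {F, G} = dF/dth dG/dp - dF/dp dG/dth,
    computed by central differences."""
    def FG(th, p):
        dF_th = (F(th+h, p) - F(th-h, p))/(2*h)
        dF_p  = (F(th, p+h) - F(th, p-h))/(2*h)
        dG_th = (G(th+h, p) - G(th-h, p))/(2*h)
        dG_p  = (G(th, p+h) - G(th, p-h))/(2*h)
        return dF_th*dG_p - dF_p*dG_th
    return FG

th, p = 0.8, 1.7   # an arbitrary phase-space point
order1 = bracket(H0, W)(th, p) + H1(th, p)
order2 = 0.5*bracket(bracket(H0, W), W)(th, p) + bracket(H1, W)(th, p)
expected = 0.5*alpha*beta**2*math.sin(th)**2/p**2
assert abs(order1) < 1e-6
assert abs(order2 - expected) < 1e-3
```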
Ignoring the ε² terms in the new Hamiltonian, the solution is
(define (((solution0 alpha beta) t) state0)
(let ((t0 (time state0))
(theta0 (coordinate state0))
(ptheta0 (momentum state0)))
(up t
(+ theta0 (/ (* (- t t0) ptheta0) alpha))
ptheta0)))
We use the typical pendulum state
(define a-state (up 't 'theta 'p_theta))
t
θ + αβε sin(θ)/p_θ² − α²β²ε² cos(θ) sin(θ)/(2 p_θ⁴)
p_θ + αβε cos(θ)/p_θ − α²β²ε²/(2 p_θ³)
Figure 6.1 The perturbative solution in the phase plane, including
terms of first, second, third, and fourth order in the phase-space coordi-
nate transformation. The solutions appear to converge.
Figure 6.3 The perturbative solution does not converge in the os-
cillation region. As we include more terms in the Lie series for the
phase-space transformation the resulting trajectory develops loops near
the hyperbolic fixed point that increase in size with the order.
This sets the scale for the validity of the perturbative solution.
We can compare this scale to the size of the oscillation region
(see figure 6.4). We can calculate the extent of the region of
oscillation of the pendulum by considering the separatrix. The
value of the Hamiltonian on the separatrix is the same as the
value at the unstable equilibrium: H(t, θ = π, p_θ = 0) = βε. The
separatrix has maximum momentum p_θ^sep at θ = 0:

H(t, 0, p_θ^sep) = H(t, π, 0).  (6.23)
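Spelling out this condition (a step made explicit here, using the pendulum Hamiltonian above):

```latex
\frac{(p^{\mathrm{sep}}_\theta)^2}{2\alpha} - \epsilon\beta = \epsilon\beta ,
\qquad\text{so}\qquad
p^{\mathrm{sep}}_\theta = 2\sqrt{\alpha\beta\epsilon},
```

which is the maximum half-width of the oscillation region in momentum.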
[Figure 6.4: the oscillation region of the pendulum in the (θ, p) phase
plane, for θ between −π and +π; the separatrix reaches maximum
momentum 2(αβε)^{1/2} at θ = 0.]
H′(t, θ′, p′) = (p′)²/(2α) + ε² αβ² (sin θ′)²/(2(p′)²) + · · ·
             = (p′)²/(2α) + ε² αβ² (1 − cos(2θ′))/(4(p′)²) + · · ·
             = H₀(p′) + ε²H₂(t, θ′, p′) + · · · .  (6.25)
H″ = e^{ε²L_{W′}} H′
   = H₀ + ε²(L_{W′}H₀ + H₂) + · · · .  (6.26)

L_{W′}H₀ + H₂ = 0.  (6.27)
This is

−(p′/α) ∂₁W′(t, θ′, p′) + (αβ²/(4(p′)²))(1 − cos(2θ′)) = 0.  (6.28)
A generator that satisfies this condition is

W′(t, θ′, p′) = (α²β²/(4(p′)³)) θ′ − (α²β²/(8(p′)³)) sin(2θ′).  (6.29)
There are two contributions to this generator, one proportional to
θ′ and the other involving a trigonometric function of θ′.
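As a consistency check (an illustration, not from the book), the condition can be verified numerically; note the relative minus sign on the sine term, which is forced by (6.28):

```python
import math

alpha, beta = 1.3, 2.7   # illustrative values

def Wp(th, p):
    """Generator W'(t, theta', p') of (6.29)."""
    c = alpha**2 * beta**2
    return c/(4*p**3)*th - c/(8*p**3)*math.sin(2*th)

th, p, h = 0.8, 1.7, 1e-6   # arbitrary sample point
dWp_dth = (Wp(th+h, p) - Wp(th-h, p))/(2*h)   # central difference
residual = -(p/alpha)*dWp_dth + alpha*beta**2/(4*p**2)*(1 - math.cos(2*th))
assert abs(residual) < 1e-8   # condition (6.28) holds
```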
The phase-space coordinate transformation resulting from this
Lie transform is found as before. For given initial conditions, we
first carry out the inverse transformation corresponding to W ,
then that for W 0 , solve for the evolution of the system using H0 ,
then transform back using W 0 and then W . The approximate
solution is
(t, θ, p) = (E_{ε,W} E_{ε²,W′} E_{(t−t₀),H₀} E_{−ε²,W′} E_{−ε,W} I)(t₀, θ₀, p₀)
W″(t, θ′, p′) = −(α²β²/(8(p′)³)) sin(2θ′).  (6.32)
After performing a Lie transformation with this generator the new
Hamiltonian is

H″(t, θ″, p″) = (p″)²/(2α) + ε² αβ²/(4(p″)²) + · · · .  (6.33)
Figure 6.5 The solution using a second perturbation step, eliminating
ε² terms from the Hamiltonian, is compared to the actual solution. The
initial agreement is especially good, but the error increases with time.
Figure 6.6 The two-step perturbative solution is shown over longer
time. The actual solution is a closed curve in the phase plane; this
perturbative solution wanders all over the place and gets worse with
time.
H = H₀ + εH₁,  (6.37)

H′ = e^{εL_W} H
   = H₀ + ε(L_W H₀ + H₁) + · · · ,  (6.38)

{H₀, W} + H₁ = 0,  (6.39)

Substituting these into the condition that order-ε terms are eliminated,
we find

Σ_k B_k(p) (ω₀(p) · k) cos(k · θ) = Σ_k A_k(p) cos(k · θ).  (6.44)
2. In general, we need to include sine terms as well, but the cosine expansion is
enough for this illustration.
H′ = H₀ + εA₀ + · · · ,  (6.47)

and

W(t, θ, p) = Σ_{k≠0} (A_k(p)/(k · ω₀(p))) sin(k · θ).  (6.48)
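The step from (6.44) to (6.48), made explicit here: matching the coefficients of each cos(k · θ) gives

```latex
B_k(p) = \frac{A_k(p)}{k \cdot \omega_0(p)},
```

so the generator has the denominators k · ω₀(p). These are the small denominators: wherever k · ω₀(p) ≈ 0 for some k with A_k(p) ≠ 0, the corresponding term in W blows up and the perturbative solution cannot be trusted there.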
H(τ; θ, t; p, T)
 = T + p²/(2ml²) − ml(g − Aω² cos(ωt)) cos θ
 = T + p²/(2α) − β cos(θ) + γ cos(θ − ωt) + γ cos(θ + ωt),  (6.53)

with the constants α = ml², β = mlg, and γ = (1/2) mlAω².
Notice that the perturbation H₁ is particularly simple: it has only
three terms in its Poisson series, and the coefficients are constants.
So in the first perturbation step there will be only three regions
excluded from the domain of applicability.
The Lie series generator that eliminates the terms in H₁ to first
order in ε, satisfying

{H₀, W} + H₁ = 0,  (6.55)

is

W(τ; θ, t; p, T) = −(β/ω_r(p)) sin θ
                 + (γ/(ω_r(p) + ω)) sin(θ + ωt)
                 + (γ/(ω_r(p) − ω)) sin(θ − ωt),  (6.56)
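One can check by direct differentiation that this W satisfies (6.55). Here is a numerical spot-check (a Python sketch with illustrative parameter values, not the book's code), using {H₀, W} = −(p/α) ∂W/∂θ − ∂W/∂t for H₀ = T + p²/(2α) in the extended phase space:

```python
import math

alpha, beta, gamma, omega = 1.1, 2.3, 0.7, 5.0   # illustrative values

def W(th, t, p):
    """Generator (6.56), with omega_r(p) = p/alpha."""
    wr = p/alpha
    return (-beta/wr*math.sin(th)
            + gamma/(wr+omega)*math.sin(th+omega*t)
            + gamma/(wr-omega)*math.sin(th-omega*t))

def H1(th, t, p):
    """Perturbation from (6.53)."""
    return (-beta*math.cos(th) + gamma*math.cos(th-omega*t)
            + gamma*math.cos(th+omega*t))

th, t, p, h = 0.8, 0.3, 1.7, 1e-6   # arbitrary sample point
dW_dth = (W(th+h, t, p) - W(th-h, t, p))/(2*h)
dW_dt  = (W(th, t+h, p) - W(th, t-h, p))/(2*h)
residual = -(p/alpha)*dW_dth - dW_dt + H1(th, t, p)
assert abs(residual) < 1e-7   # condition (6.55) holds
```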
H = H₀ + εH₁,  (6.57)

H′_n(t, θ, p) = Ĥ₀(p) + εA₀(p) + εA_n(p) cos(n · θ) + · · · ,  (6.60)
The transformation is

p₁ = n₁Σ
p₂ = n₂Σ + Θ′
σ = n₁θ₁ + n₂θ₂
θ′ = θ₂.  (6.63)
3. Any linearly independent combination will be acceptable here.
and

H″_{n,1}(t; σ, θ′; Σ, Θ′) = A_n(n₁Σ, n₂Σ + Θ′) cos(σ).  (6.67)

H″_n = H″_{n,0} + εH″_{n,1}.  (6.68)
Now expand both parts of the resonance Hamiltonian about the
resonance center:

H″_{n,0}(t; σ, θ′; Σ, Θ′) = H″_{n,0}(t; σ, θ′; Σ_n, Θ′)
  + ∂_{2,0}H″_{n,0}(t; σ, θ′; Σ_n, Θ′) (Σ − Σ_n)
  + (1/2) ∂²_{2,0}H″_{n,0}(t; σ, θ′; Σ_n, Θ′) (Σ − Σ_n)² + · · · ,  (6.70)
and

H″_{n,1}(t; σ, θ′; Σ, Θ′) = H″_{n,1}(t; σ, θ′; Σ_n, Θ′) + · · · .  (6.71)
W₊ = W₀ + W₋.  (6.74)
H₊(τ; θ, t; p, T) = T + p²/(2α) + γ cos(θ − ωt) + · · · .  (6.75)
Excluding the higher order terms, this Hamiltonian has only
a single combination of coordinates, and so can be transformed
into a Hamiltonian that is cyclic in all but one degree of freedom.
Define the transformation through the mixed variable generating
function
F₂(τ; t, θ; Σ, T′) = (θ − ωt)Σ + tT′,  (6.76)

σ = θ − ωt
t′ = t
p = Σ
T = T′ − ωΣ.  (6.77)
H′₊(τ; σ, t′; Σ, T′) = T′ − ωΣ + Σ²/(2α) + γ cos σ
 = (Σ − αω)²/(2α) + γ cos σ + T′ − (1/2)αω².  (6.78)
This Hamiltonian is cyclic in t′, so the solutions are level curves
of H′₊ in (σ, Σ). Actually more can be said here, because H′₊
is already of the form of a pendulum shifted in the Σ direction
by αω, and shifted by π in phase. The shift by π comes about
because the sign of the cosine term is positive rather than negative,
as in the usual pendulum. A sketch of the level curves is given in
figure 6.8.
Figure 6.8 Contours of the resonance Hamiltonian H′₊ give the motion
in the (σ, Σ) plane, for σ between −π and +π. In this case the resonance
Hamiltonian is a generalized pendulum shifted in momentum and phase:
the resonance zone is centered at Σ = αω, and its half-width is 2(αγε)^{1/2}.
show the chaotic zone near the separatrix apparent in the surface
of section for the actual driven pendulum.
We see, from the comparisons of the sections of the first-order
perturbative solutions for the various resonance regions, that the
section for the actual driven pendulum can be approximately constructed
by combining the approximations developed for each resonance.
The shapes of the resonance regions are distorted by
resulting pieces fit together consistently. The predicted width of
each resonance region agrees with the actual width: it was not sub-
stantially changed by the distortion of the region introduced by
the elimination of the other resonance terms. Not all the features
of the actual section are reproduced in this composite of first-order
approximations: there are chaotic zones and islands that are not
accounted for in this collage of first-order approximations.
For larger drives the approximations derived by first-order per-
turbations are worse. In figure 6.11, with a factor of five larger
drive we lose the invariant curves that separate the resonance re-
gions. The main resonance islands persist, but the chaotic zones
near the separatrices have merged into one large chaotic sea.
The first-order perturbative solution for the more strongly driven
pendulum in figure 6.11 still approximates the centers of the main
resonance islands reasonably well, but it fails as we move out and
encounter the secondary islands that are visible in the resonance
region for ωr (p) = ω. Here the approximations for the two regions
do not fit together so well. The chaotic sea is found in the region
where the perturbative solutions do not match.
Figure 6.12 Resonance overlap occurs when the sum of the half-widths
of adjacent resonances is larger than the spacing between them. Here
the resonances centered at Σ = 0 and Σ = αω have half-widths
2(αβε)^{1/2} and 2(αγε)^{1/2}, respectively.
H₂:₁(τ; θ, t; p, T)
 = p²/(2α) + T + (αβγ/(4p²)) ((α²ω² + 2αωp + 2p²)/((αω + p)²)) cos(2θ + ωt)  (6.81)

This is solvable because there is only a single combination of coordinates.
We can get an analytic solution by making the pendulum approximation.
The Hamiltonian is already quadratic in the momentum p,
so all we need to do is evaluate the coefficient of the
potential term at the resonance center p₂:₁ = −αω/2. The resonance
Hamiltonian, in the pendulum approximation, is

H′₂:₁(τ; θ, t; p, T) = p²/(2α) + (2βγ/(αω²)) cos(2θ + ωt).  (6.82)
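The evaluation can be checked numerically (an illustration, not from the book). With the cos(2θ + ωt) argument, the resonance condition 2ω_r(p) + ω = 0 puts the resonance center at p₂:₁ = −αω/2; at that value the coefficient in (6.81) reduces to the 2βγ/(αω²) of (6.82):

```python
alpha, beta, gamma, omega = 1.1, 2.3, 0.7, 5.0   # illustrative values

def coeff(p):
    """Coefficient of cos(2 theta + omega t) in (6.81)."""
    return (alpha*beta*gamma
            * (alpha**2*omega**2 + 2*alpha*omega*p + 2*p**2)
            / (4*p**2*(alpha*omega + p)**2))

p_res = -alpha*omega/2   # resonance center for 2 omega_r + omega = 0
assert abs(coeff(p_res) - 2*beta*gamma/(alpha*omega**2)) < 1e-12
```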
Carrying out the transformation to the resonance variable σ =
2θ + ωt reduces this to a pendulum Hamiltonian with a single
degree of freedom. Combining the analytic solution of this pendulum
Hamiltonian with the transformations generated by the
full W, we get an approximate perturbative solution

(τ; θ, t; p, T) = (E_{ε,W} E_{τ−τ₀,H′₂:₁} E_{−ε,W} I)(τ₀; θ₀, t₀; p₀, T₀).  (6.83)
H′_V(τ; θ, t; p, T)
 = p²/(2α) − βε cos θ + (αγ²ε²(α²ω² + p²)/(2(α²ω² − p²)²)) cos(2θ) + · · · .  (6.84)

H″_V(τ; θ, t; p, T) = p²/(2α) − βε cos θ + (γ²ε²/(2αω²)) cos(2θ) + · · · .  (6.85)
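The relation between the two cos(2θ) coefficients can be checked numerically (an illustration, not from the book): evaluating the coefficient in (6.84) at p = 0, the neighborhood of the vertical equilibria, gives the coefficient in (6.85):

```python
alpha, gamma, eps, omega = 1.1, 0.7, 0.1, 5.0   # illustrative values

def coeff(p):
    """Coefficient of cos(2 theta) in (6.84)."""
    return (alpha*gamma**2*eps**2*(alpha**2*omega**2 + p**2)
            / (2*(alpha**2*omega**2 - p**2)**2))

assert abs(coeff(0.0) - gamma**2*eps**2/(2*alpha*omega**2)) < 1e-15
```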
Linear stability analysis of the inverted vertical equilibrium indi-
cates stability for
[Figures: surfaces of section for the driven pendulum near the inverted
equilibrium for several drive parameters, and a summary of the stability
of the inverted equilibrium in the (log₁₀(ω/ωs), log₁₀(A/l)) plane.]
6.5 Projects
Exercise 6.4: Periodically driven pendulum
a. Work out the details of the perturbation theory for the primary driven
pendulum resonances, as displayed in figure 6.10.
b. Work out the details of the perturbation theory for the stability of
the inverted vertical equilibrium. Derive the resonance Hamiltonian,
and plot its contours. Compare these contours to surfaces of section for
a variety of parameters.
c. Carry out the linear stability analysis leading to equation (6.87).
What is happening in the upper part of figure fig:dpend-inverted-summary?
Why is the system unstable when criterion (6.87) predicts stability? Use
surfaces of section to investigate this parameter regime.
(h 2)
.7518269446689928
(g 2)
7.274379414605454
Symbolic values
As in usual mathematical notation, arithmetic is extended to al-
low the use of symbols that represent unknown or incompletely
specified mathematical objects. These symbols are manipulated
as if they had values of a known type. By default, a Scheme
symbol is assumed to represent a real number. So the expression
'a is a literal Scheme symbol that represents an unspecified real
number.
(print-expression
 ((compose cube sin) 'a))
(expt (sin a) 3)
(print-expression
 ((compose (literal-function 'f) (literal-function 'g)) 'x))
(f (g x))
(print-expression (g 'x 'y))
(g x y)
p = [p0 , p1 , p2 ] . (7.5)
(print-expression v)
(up v^0 v^1 v^2)

(print-expression p)
(down p_0 p_1 p_2)
I(s) = s
I₀(s) = t
I₁(s) = (x, y)
I₂(s) = [p_x, p_y]
I₁,₀(s) = x
...
I₂,₁(s) = p_y.  (7.7)
pv = p₀v⁰ + p₁v¹ + p₂v².  (7.8)
(print-expression
 (* p v))
(+ (* p_0 v^0) (* p_1 v^1) (* p_2 v^2))
1. The arrangement of the components of a tuple structure is not significant,
as it is in matrix notation: we might just as well have written this tuple as
[(cos θ, sin θ), (− sin θ, cos θ)].
and

(D + 1)(D − 1) = D² − 1,
Dg(x, y)(Δx, Δy) = [∂₀g(x, y), ∂₁g(x, y)] · (Δx, Δy)
                 = ∂₀g(x, y)Δx + ∂₁g(x, y)Δy.  (7.18)
∂0 g = I0 ◦ Dg (7.19)
∂1 g = I1 ◦ Dg. (7.20)
Concretely, if
g(x, y) = x³y⁵  (7.21)

then

Dg(x, y) = [3x²y⁵, 5x³y⁴]  (7.22)
(print-expression
 (h (up 'x 'y)))
(g x y)
DH(s) = [∂₀H(s), [∂₁,₀H(s), ∂₁,₁H(s)], (∂₂,₀H(s), ∂₂,₁H(s))],
where ∂1,0 indicates the partial derivative with respect to the first
component (index 0) of the second argument (index 1) of the func-
tion, and so on. Indeed ∂z F = Iz ◦ DF , for any function F and
access chain z. So, if we let ∆s be an incremental phase-space
state tuple,
then
DH(s)∆s = ∂0 H(s)∆t
+ ∂1,0 H(s)∆x + ∂1,1 H(s)∆y
+ ∂2,0 H(s)∆px + ∂2,1 H(s)∆py . (7.27)
(print-expression
(H s))
(H (up t (up x y) (down p x p y)))
(print-expression
((D H) s))
(down
(((partial 0) H) (up t (up x y) (down p x p y)))
(down (((partial 1 0) H) (up t (up x y) (down p x p y)))
(((partial 1 1) H) (up t (up x y) (down p x p y))))
(up (((partial 2 0) H) (up t (up x y) (down p x p y)))
(((partial 2 1) H) (up t (up x y) (down p x p y)))))
Structured results
Some functions produce structured outputs. A function whose
output is a tuple is equivalent to a tuple of component functions
each of which produces one component of the output tuple.
For example, a function that takes one numerical argument and
produces a structure of outputs may be used to describe a curve
through space. A helix can be defined by

(define (helix t)
  (up (cos t) (sin t) t))

or just
(define helix (up cos sin identity))
In Scheme:
(define (g x y)
(up (square (+ x y)) (cube (- y x)) (exp (+ x y))))
(define (g x y)
(up (f x y) y))
(define (h x y)
(f (f x y) y))
(define (f v)
(let ((x (ref v 0))
(y (ref v 1)))
(* (square x) (cube y))))
(define (g v)
(let ((x (ref v 0))
(y (ref v 1)))
(up (f v) y)))
1. Many of the statements here are only valid assuming that there are no
assignments.
(+ 1 2.14)
3.14
(+ 1 (* 2 1.07))
3.14
2. In examples we show the value that would be printed by the Scheme system
using an italic face following the input expression.
3. In Scheme every parenthesis is essential: you cannot add extra parentheses
or remove any.
4. The logician Alonzo Church [12] invented λ notation to allow the specification
of an anonymous function of a named parameter: λx[expression in x]. This
is read "That function of one argument that is obtained by substituting the
argument for x in the indicated expression."
we can then use the symbols pi and square wherever the numeral
or the λ-expression could appear. For example, the area of the
surface of a sphere of radius 5 meters is:
(* 4 pi (square 5))
314.1592653589793
5. The examples are indented to help with readability. Scheme does not care
about extra whitespace, so we may add as much as we please to make things
easier to read.
(define compose
(lambda (f g)
(lambda (x)
(f (g x)))))
Using the syntactic sugar shown above we can write the defini-
tion more conveniently. The following are both equivalent to the
definition above:
(define (compose f g)
(lambda (x)
(f (g x))))
(define ((compose f g) x)
(f (g x)))
Conditionals
Conditional expressions may be used to choose among several ex-
pressions to produce a value. For example, a procedure that im-
plements the absolute value function may be written:
(define (abs x)
(cond ((< x 0) (- x))
((= x 0) x)
((> x 0) x)))
For example, a recursive procedure for computing factorials may be
written:

(define (factorial n)
  (if (= n 0)
      1
      (* n (factorial (- n 1)))))

(factorial 6)
720

(factorial 40)
815915283247897734345611269596115894272000000000
Local names
The let expression is used to give names to objects in a local
context. For example,
(define (f radius)
(let ((area (* 4 pi (square radius)))
(volume (* 4/3 pi (cube radius))))
(/ volume area)))
(f 3)
1
The value of the let expression is the value of the body expression
in the context where the variables variable-i have the values of
the expressions expression-i. The expressions expression-i may
not refer to the variables variable-i.
A slight variant of the let expression provides a convenient
way to express looping constructs. We can write a procedure that
implements an alternative algorithm for computing factorials as
follows:
(define (factorial n)
(let clp ((count 1) (answer 1))
(if (> count n)
answer
(clp (+ count 1) (* count answer)))))
(factorial 6)
720
Here, the symbol following the let (in this case clp) is locally de-
fined to be a procedure that has the variables count and answer
as its formal parameters. It is called the first time with the ex-
pressions 1 and 1, initializing the loop. Whenever the procedure
named clp is called later, these variables get new values, which are
the values of the operand expressions (+ count 1) and (* count
answer).
Compound data—lists and vectors
Data can be glued together to form compound data structures.
A list is a data structure in which the elements are linked sequentially.
A Scheme vector is a data structure in which the elements
are packed in a linear array. New elements can be added
to lists, but accessing an element of a list takes computing time
proportional to the element's position in the list. Scheme vectors
can be accessed in constant time, but
a Scheme vector is of fixed length. All data structures in this
book are implemented as combinations of lists and Scheme vec-
tors. Compound data objects are constructed from components by
procedures called constructors and the components are accessed
by selectors.
A list may be constructed with the list procedure:

(define a-list (list 6 946 8 356 12 620))

a-list
(6 946 8 356 12 620)
(list-ref a-list 3)
356
(list-ref a-list 0)
6
Lists are built from pairs. A pair is made using the constructor
cons. The selectors for the two components of the pair are car
and cdr.6 A list is a chain of pairs, such that the car of each pair
is the list element and the cdr of each pair is the next pair, except
for the last cdr, which is a distinguishable value called the empty
list and which is written (). Thus,
(car a-list)
6
(cdr a-list)
(946 8 356 12 620)
(define another-list
(cons 32 (cdr a-list)))
another-list
(32 946 8 356 12 620)
Both a-list and another-list share the same tail (their cdr).
6. These names are accidents of history. They stand for "the Contents of the
Address Register" and "the Contents of the Decrement Register" of the IBM 704
computer, which was used for the first implementation of Lisp in the late
1950s.
A Scheme vector may be constructed with the vector procedure:

(define a-vector (vector 37 63 49 21 88 56))

a-vector
#(37 63 49 21 88 56)
(vector-ref a-vector 3)
21
(vector-ref a-vector 0)
37
[1] Harold Abelson and Gerald Jay Sussman with Julie Sussman, Struc-
ture and Interpretation of Computer Programs, 2nd edition, MIT
Press and McGraw-Hill, 1996.
[2] Ralph H. Abraham and Jerrold E. Marsden, Foundations of Me-
chanics, 2nd edition, Addison-Wesley, 1978.
[3] Ralph H. Abraham, Jerrold E. Marsden, and Tudor Raţiu, Mani-
folds, Tensor Analysis, and Applications, 2nd edition, Springer Ver-
lag, 1993.
[4] V. I. Arnold, “Small Denominators and Problems of Stability of
Motion in Classical and Celestial Mechanics,” in Russian Math. Sur-
veys, 18, 6 (1963).
[5] V. I. Arnold, Mathematical Methods of Classical Mechanics,
Springer Verlag, 1980.
[6] V. I. Arnold, V. V. Kozlov, and A. I. Neishtadt, “Mathematical
Aspects of Classical and Celestial Mechanics,” in Dynamical Systems
III, Springer Verlag, 1988.
[7] Max Born, Vorlesungen über Atommechanik, J. Springer, Berlin,
1925-30.
[8] Constantin Carathéodory, Calculus of variations and partial differ-
ential equations of the first order. Translated by Robert B. Dean and
Julius J. Brandstatter, Holden-Day, San Francisco, 1965-67.
[9] Constantin Carathéodory, Geometrische Optik. Series title: Ergeb-
nisse der Mathematik und ihrer Grenzgebiete, 4. Bd., J. Springer,
Berlin, 1937.
[10] Élie Cartan, Leçons sur les invariants intégraux, Hermann, Paris,
1922; reprinted in 1971.
[11] Boris V. Chirikov, "A Universal Instability of Many-Dimensional
Oscillator Systems," in Physics Reports 52, 5, pp. 263–379 (1979).
[12] Alonzo Church, The Calculi of Lambda-Conversion, Princeton Uni-
versity Press, 1941.
[13] Richard Courant and David Hilbert, Methods of Mathematical
Physics, 2 volumes., Wiley-Interscience, 1957.
[14] Jean Dieudonné, Treatise on Analysis, Academic Press, 1969.
[15] Hans Freudenthal, Didactical Phenomenology of Mathematical
Structures, Kluwer Publishing Co., Boston, 1983.
List of Exercises (exercise, page):
2.1 120 2.5 125 2.9 133 2.13 151 2.17 176
2.2 120 2.6 125 2.10 136 2.14 155 2.18 177
2.3 120 2.7 125 2.11 151 2.15 166 2.19 177
2.4 120 2.8 129 2.12 151 2.16 176 2.20 178
3.1 185 3.4 193 3.7 199 3.10 232 3.13 261
3.2 185 3.5 195 3.8 218 3.11 254 3.14 263
3.3 189 3.6 198 3.9 220 3.12 258 3.15 264
4.1 275 4.3 277 4.5 287 4.7 303 4.9 313
4.2 277 4.4 282 4.6 289 4.8 311 4.10 313
5.1 321 5.8 337 5.15 376 5.22 396 5.29 425
5.2 324 5.9 345 5.16 386 5.23 396 5.30 425
5.3 330 5.10 349 5.17 389 5.24 405 5.31 431
5.4 330 5.11 349 5.18 389 5.25 409 5.32 432
5.5 330 5.12 349 5.19 390 5.26 414
5.6 333 5.13 352 5.20 390 5.27 416
5.7 333 5.14 373 5.21 390 5.28 424
6.1 446 6.2 460 6.3 461 6.4 472 6.5 473